Traffic Management
Routing
- Gateway Proxy
- Load Balancer (LB)
Aspect | API Gateway | Forward Proxy | Reverse Proxy |
---|---|---|---|
Definition | A server that acts as an API front end: it receives API requests, enforces throttling and security policies, passes requests to the back-end service, and returns the response to the requester | A forward proxy, often simply called a proxy, is an intermediary server that sits between the client and the internet. It intercepts requests from the client and forwards them to the internet on the client's behalf | A reverse proxy is a proxy server that sits between clients and one or more backend servers. It accepts client requests, forwards them to the appropriate server, and returns the server's response to the client |
Functionality |
|
|
|
Use Cases |
|
|
|
Advantages |
|
|
|
Disadvantages |
|
|
|
Vendors |
|
|
|
- Algorithms
- Components
- Techniques
- Deployment Architecture
- Protocols
- Best Practices
Algorithm | Type | Definition | Use Cases |
---|---|---|---|
Round Robin | Static |
|
| |
Sticky Round Robin | Static |
|
| |
Weighted Round Robin | Static |
|
| |
IP/URL Hash | Static |
|
| |
Least Connections | Dynamic |
|
| |
Least Time | Dynamic |
|
|
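The selection logic behind several of the algorithms in the table above can be sketched in a few lines. This is a minimal illustration, not a production balancer: the class and function names, the server labels, and the weights are all assumptions made for the example, and Sticky Round Robin and Least Time are not shown.

```python
import hashlib
import itertools


class RoundRobin:
    """Static: cycle through the servers in order."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


class WeightedRoundRobin:
    """Static: a server with weight w appears w times per cycle."""

    def __init__(self, weights):
        expanded = [s for s, w in weights.items() for _ in range(w)]
        self._cycle = itertools.cycle(expanded)

    def pick(self):
        return next(self._cycle)


def ip_hash(servers, client_ip):
    """Static: the same client IP always maps to the same server."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


class LeastConnections:
    """Dynamic: pick the server with the fewest open connections."""

    def __init__(self, servers):
        self.active = {s: 0 for s in servers}

    def acquire(self):
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        self.active[server] -= 1
```

The static strategies decide without looking at server state, while `LeastConnections` needs the caller to report when a connection closes (`release`), which is the extra bookkeeping that dynamic algorithms generally require.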
Aspect | Hardware LB | Software LB | Virtual LB | Cloud LB | Application LB | Network LB |
---|---|---|---|---|---|---|
Definition | Physical devices dedicated to distributing incoming network traffic across multiple servers | Applications or services that perform load balancing functions using software running on standard hardware | Operate as software instances within virtualized environments | Load balancing services provided by cloud service providers | Operate at the application layer, making routing decisions based on application-specific data | Operate at the transport layer, distributing traffic based on IP addresses and ports |
Features |
|
|
|
|
|
|
Use Cases |
|
|
|
|
|
|
Load Balancing Technique | Description | Benefits | Pros | Cons | Considerations | Use Cases |
---|---|---|---|---|---|---|
Session Persistence | Ensures that subsequent requests from a client are directed to the same backend server |
|
|
|
| Maintains user sessions by routing requests from the same client to the same server |
SSL Offloading | SSL/TLS decryption and encryption processes are offloaded from backend servers to the Load Balancer |
|
|
|
| Relieves servers from decrypting SSL/TLS traffic, enhancing performance |
Health Checks | Periodically probes backend servers to monitor their availability and status, so that traffic is routed only to healthy servers
|
|
|
| Monitors server health, removes failed servers, ensures high availability |
Content-Based Routing | Traffic is routed based on specific content attributes, such as URL paths or headers, allowing for more granular control over traffic distribution |
|
|
|
| Routes traffic based on request content for optimized distribution |
Global Server Load Balancing (GSLB) | Technique for distributing traffic across multiple geographically dispersed data centers or points of presence (PoPs), improving performance and reliability for global users |
|
|
|
| Distributes traffic across multiple data centers for performance and availability |
Queue-based Load Balancing | Incoming requests are queued and distributed to backend servers based on predefined algorithms, such as round-robin or least connections |
|
|
|
| Regulates request rates, ensures fair workload distribution |
Dynamic Load Balancing | Adjusts traffic distribution based on real-time metrics, such as server load, network latency, or user location, to keep performance and resource utilization near optimal
|
|
|
| Scales resources dynamically based on real-time demand to optimize performance and cost |
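As one example of the techniques above, health checking can be sketched as a pool that counts consecutive failed probes and hides a server once it crosses a threshold. The `probe` callable, the `unhealthy_after` threshold, and the class name are hypothetical; a real load balancer would typically issue an HTTP or TCP probe on a timer.

```python
class HealthCheckedPool:
    """Tracks consecutive probe failures and exposes only healthy servers."""

    def __init__(self, servers, probe, unhealthy_after=2):
        self.servers = list(servers)
        self.probe = probe                      # callable: server -> bool
        self.unhealthy_after = unhealthy_after  # failures before removal
        self.failures = {s: 0 for s in self.servers}

    def run_checks(self):
        """Probe every server once; a success resets its failure count."""
        for server in self.servers:
            if self.probe(server):
                self.failures[server] = 0
            else:
                self.failures[server] += 1

    def healthy(self):
        """Servers that are still eligible to receive traffic."""
        return [s for s in self.servers
                if self.failures[s] < self.unhealthy_after]
```

Requiring more than one consecutive failure is a common way to avoid removing a server because of a single dropped probe; a recovered server rejoins the pool on its next successful check.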
Deployment Architecture | Description | Advantages | Disadvantages | Use Cases |
---|---|---|---|---|
Single Load Balancer | Central traffic director for all requests | Simple deployment, low cost | Single point of failure (SPOF), limited scalability |
|
Multiple Load Balancers | Multiple devices for redundancy and scalability | High availability (HA), improved scalability | Increased complexity, management overhead |
|
Active-Passive Load Balancers | One active, one passive load balancer for failover | High availability, fast failover | Passive LB underutilized, capacity limited to the single active LB
|
Active-Active Load Balancers | Multiple load balancers actively handle traffic | Highest availability, excellent scalability | Most complex configuration, requires careful health check implementation
|
Protocol | Layer | Description | Considerations | Load Balancing Algorithms (Common) | Typical Applications | Vendors |
---|---|---|---|---|---|---|
HTTP Load Balancing | Application (Layer 7) | Distributes incoming HTTP requests across a pool of web servers. Analyzes request content (URLs, headers, etc.) for intelligent routing |
|
|
|
|
TCP Load Balancing | Transport (Layer 4) | Distributes incoming TCP connections across a pool of servers. Operates at the transport layer without inspecting application data |
|
|
|
|
UDP Load Balancing | Transport (Layer 4) | Distributes incoming UDP datagrams across a pool of servers. Offers minimal processing overhead |
|
|
|
|
SSL/TLS Load Balancing | Application (Layer 7) | Terminates and decrypts incoming SSL/TLS connections, then forwards traffic to backend servers using another load balancing protocol (often HTTP or TCP) |
|
|
|
|
WebSocket Load Balancing | Application (Layer 7) | Distributes WebSocket connections across a pool of servers. Handles the HTTP upgrade handshake and the long-lived, stateful nature of WebSocket connections
|
|
|
|
MQTT Load Balancing | Application (Layer 7) | Distributes MQTT (Message Queuing Telemetry Transport) messages across a pool of message brokers. Handles topics, QoS levels, and client subscriptions |
|
|
|
|
Other Protocols | Varies | Load balancing can also be extended to other protocols, depending on specific application needs
|
|
|
|
- Scalability Considerations
  - Choose a load balancer that supports horizontal scaling (adding more instances) and vertical scaling (increasing resource allocation per instance)
  - Identify scaling triggers based on metrics like CPU, memory, or connection volume
  - Consider autoscaling features that automatically adjust resources based on real-time load
- Redundancy and High Availability
  - Implement redundant load balancers in an active/active or active/passive configuration
  - Utilize health checks to monitor server health and automatically remove unhealthy servers from the pool
  - Design for failover capabilities to seamlessly reroute traffic in case of failures
- Load Testing and Performance Tuning
  - Perform load testing under various traffic scenarios (peak hours, sudden spikes)
  - Analyze metrics like response times, throughput, and error rates
  - Fine-tune load balancing algorithms (round robin, least connections) based on application behavior
- Regular Maintenance and Updates
  - Schedule regular updates to pick up new features, bug fixes, and security patches
  - Implement configuration management tools for consistent and repeatable deployments
  - Monitor system logs for potential issues and troubleshoot proactively
- Disaster Recovery Planning
  - Design a disaster recovery plan that outlines actions for restoring the load balancing service
  - Consider geographically dispersed deployments for redundancy in case of regional outages
  - Test the disaster recovery plan regularly to ensure effectiveness
Rate Limiter
- Overview
- Strategies
- Geo-Fencing
- Definition
- Workflow
- Benefits
- Granularity
A rate limiter regulates incoming and outgoing traffic. By enforcing a maximum number of requests within a specific time frame, it controls flow at various system levels, such as APIs, servers, and networks.
Core Concepts
- Request Rate: Maximum number of requests allowed in a set time
- Time Window: Duration over which the rate restriction applies
Workflow
- Tracking Requests: The system monitors incoming requests, typically keyed by the requester's IP address or another unique identifier
- Time Window & Limits: A time window is defined, and a limit is set on the number of requests a single source (IP address or identifier) may make within it
- Throttling & Blocking: If a requester exceeds the limit within the time window, further requests are throttled or blocked for a predetermined period, typically until the next window opens
Benefits
- Prevent DoS Attacks: Shields servers from overload caused by excessive requests, thwarting malicious attempts to render services unavailable
- Mitigate Abusive Usage: Curbs server overload from web scrapers and data miners, promoting fair resource allocation
- Enhance Scalability & Performance: Regulating traffic flow improves system stability and performance by preventing server overload
- Protect Logins & Accounts: On login pages, rate limiting slows down brute-force attacks, bolstering account security by impeding rapid login attempts
- Manage API Access: Prevents resource monopolization, ensuring equitable access for all applications
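The tracking, limit, and blocking steps listed above can be sketched as a small per-client limiter. This is an illustrative sketch only: the class name, the `block_s` penalty period, and the choice to pass the current time in as `now` (in seconds) rather than read a clock are assumptions made to keep the example deterministic.

```python
class BlockingLimiter:
    """Per-client request tracking with a block period after a violation."""

    def __init__(self, limit, window_s, block_s):
        self.limit = limit          # max requests per window
        self.window_s = window_s    # window length in seconds
        self.block_s = block_s      # how long a violator stays blocked
        self.windows = {}           # client_id -> (window_start, count)
        self.blocked_until = {}     # client_id -> timestamp

    def allow(self, client_id, now):
        # Step 3 (blocking): reject while the client is still blocked.
        if now < self.blocked_until.get(client_id, 0.0):
            return False
        # Step 1 (tracking): look up, or start, this client's window.
        start, count = self.windows.get(client_id, (now, 0))
        if now - start >= self.window_s:
            start, count = now, 0
        count += 1
        self.windows[client_id] = (start, count)
        # Step 2 (limit): exceeding the limit triggers a block period.
        if count > self.limit:
            self.blocked_until[client_id] = now + self.block_s
            return False
        return True
```

For example, with `BlockingLimiter(limit=2, window_s=60, block_s=30)`, a client's third request within the same minute is rejected and the client stays blocked for the next 30 seconds, while other clients are unaffected.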
Aspect | IP Address-Based Rate Limiting | User ID-Based Rate Limiting | API Key-Based Rate Limiting | Combining Granularity Levels |
---|---|---|---|---|
Definition | Restricts requests based on the source IP address | Limits requests based on user identity | Controls access using API keys issued by the service | Applies multiple levels of granularity simultaneously |
Cons |
|
|
|
|
Use Cases |
|
|
|
|
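Combining granularity levels can be pictured as checking one request against several keys at once and admitting it only if every applicable limit still has room. The sketch below covers a single time window and leaves window expiry out; the limit values and function names are hypothetical.

```python
from collections import Counter

# Hypothetical per-window limits for each granularity level.
LIMITS = {"ip": 100, "user": 20, "api_key": 50}


def rate_limit_keys(ip, user_id=None, api_key=None):
    """Build one (level, value) key per granularity level that applies."""
    keys = [("ip", ip)]
    if user_id is not None:
        keys.append(("user", user_id))
    if api_key is not None:
        keys.append(("api_key", api_key))
    return keys


def allow(counts, keys, limits=LIMITS):
    """Admit the request only if every level is under its limit."""
    if any(counts[key] >= limits[key[0]] for key in keys):
        return False
    for key in keys:
        counts[key] += 1
    return True
```

Counters are only incremented when the request is admitted, so a rejection at one level (for example the user limit) does not consume quota at another (for example the shared IP limit).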
- Fixed Window Counter
- Leaky Bucket
- Sliding Window
- Token Bucket
Aspect | Fixed Window Counter |
---|---|
Definition | Counts the number of requests within fixed time windows and compares it to a preset limit |
Process |
|
Example | Prerequisites: 1-minute window with capacity of 2
|
Functionality |
|
Pros | Easy to understand and implement |
Cons | Prone to request bursts at window boundaries, where up to twice the limit can pass in a short span |
Streaming APIs | Not ideal, bursts that straddle two windows can go undetected |
Geo-fencing | Can be combined with IP address tracking |
Use Cases |
|
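A minimal sketch of a fixed window counter, using the same prerequisites as the example above (1-minute window, capacity of 2). The class name is illustrative, and time is passed in as `now` (seconds) as an assumption so the behavior is easy to trace.

```python
class FixedWindowCounter:
    """Counts requests per client in aligned, fixed-size time windows."""

    def __init__(self, limit, window_s):
        self.limit = limit
        self.window_s = window_s
        self.state = {}  # client_id -> (window_index, count)

    def allow(self, client_id, now):
        index = int(now // self.window_s)  # which window 'now' falls in
        stored_index, count = self.state.get(client_id, (index, 0))
        if stored_index != index:
            count = 0                      # a new window resets the counter
        if count >= self.limit:
            return False
        self.state[client_id] = (index, count + 1)
        return True
```

With `FixedWindowCounter(limit=2, window_s=60)`, requests at 58 s and 59 s are admitted, a third at 59.5 s is rejected, and two more at 60 s and 61 s are admitted again because a new window has started. Four requests pass within roughly three seconds, which is the boundary burst noted under Cons.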
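Aspect | Leaky Bucket |
---|---|
Definition | Holds incoming requests in a fixed-capacity bucket that drains ("leaks") at a constant rate; requests that arrive while the bucket is full are rejected. Similar to the Token Bucket, but it meters requests directly instead of tokens |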
Process |
|
Example | Prerequisites: Capacity = 2 |
Functionality |
|
Pros | Simplicity in implementation |
Cons | A burst can fill the bucket, so newer requests are rejected or delayed until it drains |
Streaming APIs | Suitable, smooths bursts into a steady processing rate |
Geo-fencing | Can be combined with location-based rate limit |
Use Cases |
|
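A leaky bucket can be sketched as a level that rises by one per admitted request and drains at a constant rate. This uses the "meter" formulation (admit or reject immediately) rather than an explicit queue; the class name, the `leak_rate` parameter, and the explicit `now` argument are assumptions for the example, with the capacity of 2 taken from the prerequisites above.

```python
class LeakyBucket:
    """Bucket level rises per request and drains at a constant rate."""

    def __init__(self, capacity, leak_rate):
        self.capacity = capacity    # max requests the bucket can hold
        self.leak_rate = leak_rate  # requests drained per second
        self.level = 0.0
        self.last = None            # time of the previous call

    def allow(self, now):
        if self.last is not None:
            drained = (now - self.last) * self.leak_rate
            self.level = max(0.0, self.level - drained)
        self.last = now
        if self.level + 1 > self.capacity:
            return False            # bucket is full: reject
        self.level += 1
        return True
```

With `LeakyBucket(capacity=2, leak_rate=1.0)`, two simultaneous requests fill the bucket and a third is rejected; one second later enough has drained for one more request to be admitted.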
Aspect | Sliding Window Counter | Sliding Window Log |
---|---|---|
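Definition | Keeps request counters for the current and previous fixed windows and estimates the number of requests in the sliding window by weighting the previous count by its overlap with the window | Maintains a log of request timestamps and counts the entries that fall within the sliding window |
Distinction | Maintains only counters for each event or category of events, so the result is an approximation with constant memory | Instead of counting occurrences, it records a timestamp (or other metadata) for each event within the window, so the result is exact but memory grows with request volume |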
Process |
|
|
Example |
|
Prerequisites: Window Size = 1 min, Rate Limit = 2 requests/min |
Functionality |
|
|
Pros |
|
|
Cons |
|
|
Streaming APIs | More suitable, handles bursts within window segments | |
Geo-fencing | Can be combined with dynamic window adjustments based on location | |
Use Cases |
|
|
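The two sliding window variants can be compared side by side in a short sketch. Both use the prerequisites from the example above (1-minute window, 2 requests/min); the class names and the explicit `now` argument (seconds) are assumptions, and the counter variant uses the common weighted-overlap estimate.

```python
from collections import deque


class SlidingWindowLog:
    """Exact: keeps one timestamp per admitted request."""

    def __init__(self, limit, window_s):
        self.limit = limit
        self.window_s = window_s
        self.log = deque()

    def allow(self, now):
        # Drop timestamps that have slid out of the window.
        while self.log and self.log[0] <= now - self.window_s:
            self.log.popleft()
        if len(self.log) >= self.limit:
            return False
        self.log.append(now)
        return True


class SlidingWindowCounter:
    """Approximate: weights the previous window's count by its overlap."""

    def __init__(self, limit, window_s):
        self.limit = limit
        self.window_s = window_s
        self.index = 0   # index of the current fixed window
        self.curr = 0    # admitted requests in the current window
        self.prev = 0    # admitted requests in the previous window

    def allow(self, now):
        index = int(now // self.window_s)
        if index != self.index:
            # Carry the old count over only if the windows are adjacent.
            self.prev = self.curr if index == self.index + 1 else 0
            self.curr = 0
            self.index = index
        overlap = 1.0 - (now % self.window_s) / self.window_s
        estimated = self.prev * overlap + self.curr
        if estimated + 1 > self.limit:
            return False
        self.curr += 1
        return True
```

After requests at 0 s and 30 s, the log variant admits a request at 60 s because the first timestamp has expired, while the counter variant still rejects it (the previous window counts in full) and only admits again around 90 s. This illustrates the trade-off between exactness and constant memory.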
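Aspect | Token Bucket |
---|---|
Definition | Classic algorithm in which tokens are added to a bucket at a fixed rate up to its capacity; each request consumes a token and is rejected when the bucket is empty, so bursts up to the bucket capacity are allowed |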
Process |
|
Example | Prerequisites: Capacity = 5 tokens with burst allowance
|
Functionality |
|
Pros | Precise control over request rates |
Cons | Complex implementation |
Streaming APIs | Most suitable, pre-allocate tokens for expected data volume |
Geo-fencing | Can be combined with location-based token allocation |
Use Cases |
|
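A token bucket can be sketched with lazy refill: tokens are topped up on each call according to the time elapsed since the previous one. The capacity of 5 follows the prerequisites above; the class name, the `refill_rate` parameter, the optional `cost`, and the explicit `now` argument (seconds) are assumptions for the example.

```python
class TokenBucket:
    """Tokens refill at a fixed rate; each request spends tokens."""

    def __init__(self, capacity, refill_rate):
        self.capacity = capacity        # max tokens, i.e. max burst size
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = float(capacity)   # start with a full bucket
        self.last = None                # time of the previous call

    def allow(self, now, cost=1):
        if self.last is not None:
            refill = (now - self.last) * self.refill_rate
            self.tokens = min(self.capacity, self.tokens + refill)
        self.last = now
        if self.tokens < cost:
            return False                # not enough tokens: reject
        self.tokens -= cost
        return True
```

With `TokenBucket(capacity=5, refill_rate=1.0)`, a burst of five requests is admitted at once and the sixth is rejected; two seconds later two tokens have been refilled, so two more requests pass. The `cost` argument is one way to charge more for expensive requests, for instance larger streaming payloads.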
Approach | Description | Pros | Cons |
---|---|---|---|
IP-based | Rate limiting is applied based on the geographic location of the client IP address. Requests originating from specific regions or countries may be subject to different rate limits or access controls |
|
|
Geofencing APIs | Third-party geolocation APIs are utilized to determine the physical location of the client device or network. Rate limiting rules are then applied based on the detected location |
|
|
DNS-based | Rate limiting rules are enforced based on the DNS resolution of client requests. DNS records are analyzed to determine the geographic origin of the request, and rate limits are applied accordingly |
|
|
Geofencing Rules | Custom geofencing rules are defined based on geographical boundaries, such as countries, regions, or proximity to specific locations. Requests originating from within or outside these boundaries are subject to different rate limits or access controls |
|
|
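The IP-based and geofencing-rule approaches both reduce to resolving a client to a region and then selecting that region's limit. The sketch below assumes a tiny hand-written lookup table built on reserved documentation address ranges; the region codes, limits, and blocked-region set are hypothetical, and a real deployment would use a geolocation database or API instead.

```python
import ipaddress

# Hypothetical mapping of networks to regions (documentation IP ranges).
GEO_TABLE = {
    ipaddress.ip_network("192.0.2.0/24"): "EU",
    ipaddress.ip_network("198.51.100.0/24"): "US",
    ipaddress.ip_network("203.0.113.0/24"): "XX",
}
REGION_LIMITS = {"EU": 100, "US": 200}  # requests per window (assumed)
BLOCKED_REGIONS = {"XX"}                # geofenced out entirely (assumed)
DEFAULT_LIMIT = 10                      # used when the region is unknown


def region_for(ip):
    """Return the region for an IP, or None if it is not in the table."""
    addr = ipaddress.ip_address(ip)
    for network, region in GEO_TABLE.items():
        if addr in network:
            return region
    return None


def limit_for(ip):
    """Pick the rate limit that applies to a client IP."""
    region = region_for(ip)
    if region in BLOCKED_REGIONS:
        return 0
    return REGION_LIMITS.get(region, DEFAULT_LIMIT)
```

The value returned by `limit_for` would then be passed as the limit to whichever rate limiting strategy is in use; applying a conservative default to unknown regions is one possible policy, not the only one.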