Distributed System Design - Throttling
Throttling is a crucial technique in distributed systems to control the rate at which a client can access a service or resource. It prevents abuse, ensures fair usage, protects backend systems from being overwhelmed, and maintains service availability. Throttling can be implemented on the server side, on the client side, or, for the most robust control, on both.
Server-Side Throttling Methods
Server-side throttling mechanisms are implemented by the service provider to protect their infrastructure and ensure quality of service for all users.
1. Rate Limiting:
- Description: Limits the number of requests a client can make within a specific time window (e.g., 100 requests per minute). If the limit is exceeded, subsequent requests are rejected, often with an HTTP 429 Too Many Requests status code.
- Common Implementations:
- Fixed Window: Simplest, but "bursty" at window edges: a client can send up to twice the limit in a short span straddling a window boundary.
- Sliding Log: Tracks individual request timestamps. More accurate but resource-intensive.
- Sliding Window Counter: Approximates a sliding window by combining the current window's count with the previous window's count, weighted by how much of the previous window still overlaps the sliding window. Good balance of accuracy and efficiency.
- Token Bucket/Leaky Bucket Algorithms: A token bucket permits bursts up to the bucket's capacity while capping the long-term rate; a leaky bucket additionally smooths output to a steady rate.
- Identification: Often based on IP address, API key, user ID, or client certificate.
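The sliding window counter above can be sketched in a few lines of Python. This is a minimal single-client illustration, not a production limiter; the class name and the injectable clock are assumptions made for clarity and testability.

```python
import time

class SlidingWindowCounter:
    """Approximate sliding-window rate limiter.

    Weights the previous fixed window's count by the fraction of it that
    still overlaps the sliding window, then adds the current count.
    """

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock          # injectable clock, handy for testing
        self.current_start = None
        self.current_count = 0
        self.previous_count = 0

    def allow(self):
        now = self.clock()
        if self.current_start is None:
            self.current_start = now
        elapsed = now - self.current_start
        # Roll forward if one or more fixed windows have expired.
        if elapsed >= self.window:
            windows_passed = int(elapsed // self.window)
            self.previous_count = self.current_count if windows_passed == 1 else 0
            self.current_count = 0
            self.current_start += windows_passed * self.window
            elapsed = now - self.current_start
        # Fraction of the previous window still inside the sliding window.
        overlap = 1.0 - (elapsed / self.window)
        estimated = self.previous_count * overlap + self.current_count
        if estimated < self.limit:
            self.current_count += 1
            return True
        return False
```

In a real deployment the counters would typically live in a shared store such as Redis, keyed by the client identifier (IP address, API key, or user ID).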
2. Concurrency Limiting:
- Description: Restricts the maximum number of simultaneous or active requests a server will process from a single client or across all clients.
- Use Case: Prevents a single client from monopolizing server resources (e.g., database connections, CPU threads).
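A concurrency limit is often just a semaphore around request handling. Below is a minimal sketch; the non-blocking acquire (reject rather than queue) and the 429 response code are assumptions about how an overloaded server might behave.

```python
import threading

class ConcurrencyLimiter:
    """Rejects new work once `max_concurrent` requests are already in flight."""

    def __init__(self, max_concurrent):
        self._slots = threading.Semaphore(max_concurrent)

    def try_acquire(self):
        # Non-blocking: an overloaded server should reject, not queue forever.
        return self._slots.acquire(blocking=False)

    def release(self):
        self._slots.release()

def handle(limiter, work):
    """Run `work` only if a concurrency slot is free; otherwise reject."""
    if not limiter.try_acquire():
        return 429  # hypothetical "too many concurrent requests" response
    try:
        return work()
    finally:
        limiter.release()  # always free the slot, even if work() raises
```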
3. Resource Quotas:
- Description: Limits the total amount of a specific resource a client can consume over a longer period (e.g., total data transferred per month, total storage used, total number of API calls per day).
- Use Case: Managing billing, preventing long-term resource exhaustion, and controlling capacity.
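A per-client daily quota can be sketched as a keyed counter. This in-memory version is illustrative only; the class and method names are assumptions, and a real system would persist usage in a database keyed by billing period.

```python
import datetime

class DailyQuota:
    """Tracks per-client API-call quotas that reset each day."""

    def __init__(self, limit_per_day):
        self.limit = limit_per_day
        self._usage = {}  # (client_id, date) -> calls used

    def consume(self, client_id, today=None, amount=1):
        """Record `amount` calls if the client has quota left; else refuse."""
        day = today or datetime.date.today()
        key = (client_id, day)
        used = self._usage.get(key, 0)
        if used + amount > self.limit:
            return False
        self._usage[key] = used + amount
        return True
```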
4. Bandwidth Limiting (Traffic Shaping):
- Description: Restricts the total network throughput (data transfer rate) that a client can utilize.
- Use Case: Ensures fair sharing of network capacity, especially for file transfers or streaming services.
5. Load Shedding:
- Description: During extreme overload conditions, the server deliberately drops a percentage of incoming requests (or rejects requests from lower-priority clients) to maintain stability for the remaining requests.
- Use Case: Last resort to prevent a complete system collapse, ensuring some requests succeed rather than all failing.
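One common load-shedding policy is probabilistic: shed nothing below a utilization threshold, then drop a linearly growing fraction of requests as load approaches capacity. A minimal sketch, with the threshold value and function name as illustrative assumptions:

```python
import random

def should_shed(current_load, capacity, threshold=0.8, rng=random.random):
    """Drop a growing fraction of requests as load climbs past `threshold`.

    Below the threshold nothing is shed; at (or beyond) full capacity
    everything is shed; the drop probability rises linearly in between.
    """
    utilization = current_load / capacity
    if utilization <= threshold:
        return False
    drop_probability = min(1.0, (utilization - threshold) / (1.0 - threshold))
    return rng() < drop_probability
```

A priority-aware variant would apply a lower threshold to low-priority clients first, shedding their traffic before touching critical requests.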
6. Circuit Breaker (for Downstream Services):
- Description: While primarily for calling downstream services, a server can implement a circuit breaker pattern internally. If a particular downstream dependency (e.g., a database, another microservice) starts failing or slowing down, the circuit breaker opens, causing the server to immediately fail requests that depend on that service, rather than waiting for timeouts.
- Use Case: Prevents cascading failures when a dependency is unhealthy.
7. Prioritization/Queueing:
- Description: Requests are placed into queues, and the server processes them based on priority. Higher-priority clients or requests (e.g., paid users, critical transactions) are processed first.
- Use Case: Ensures essential services or users get preferential treatment during peak loads.
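A priority queue with stable ordering captures the core of this idea. The sketch below uses Python's heapq with a monotonic counter as a tie-breaker so that requests within the same priority tier are served in arrival order; the class name and priority scheme (lower number = more important) are assumptions.

```python
import heapq
import itertools

class PriorityRequestQueue:
    """Serves requests lowest-priority-number first; FIFO within a tier."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker preserves arrival order

    def enqueue(self, priority, request):
        heapq.heappush(self._heap, (priority, next(self._order), request))

    def dequeue(self):
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```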
8. Authentication and Authorization:
- Description: While not strictly "throttling," robust authentication and authorization ensure that only legitimate and authorized clients can make requests, indirectly reducing unwanted traffic and potential abuse.
Client-Side Throttling Methods
Client-side throttling mechanisms are implemented by the client application to avoid overwhelming the server, reduce errors, and ensure a better user experience. These methods often complement server-side throttling.
1. Exponential Backoff with Jitter:
- Description: When a client receives an error (e.g., 429 Too Many Requests, 503 Service Unavailable) or experiences a timeout, it waits for an increasingly longer period before retrying the request. "Jitter" (adding a random delay) helps prevent all clients from retrying simultaneously, which would create a "thundering herd" problem.
- Use Case: Robust error handling and graceful degradation, preventing the client from continuously hammering an overloaded server.
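A retry loop with "full jitter" (a uniformly random delay between zero and the exponential backoff) can be sketched as follows. The delay parameters, cap, and the injectable sleep/random hooks are assumptions chosen for illustration and testability.

```python
import random
import time

def retry_with_backoff(call, max_attempts=5, base_delay=0.5, cap=30.0,
                       sleep=time.sleep, rng=random.uniform):
    """Retry `call`, sleeping base_delay * 2**attempt (with full jitter)
    between attempts; re-raise once max_attempts is exhausted."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of retries: surface the last error
            backoff = min(cap, base_delay * (2 ** attempt))
            sleep(rng(0, backoff))  # "full jitter": uniform in [0, backoff]
```

In production the `except` clause would normally be narrowed to retryable errors only (e.g., 429/503 responses and timeouts), since retrying a 400 Bad Request will never succeed.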
2. Token Bucket/Leaky Bucket Algorithms (Client-side Implementation):
- Description: The client itself maintains a virtual "bucket" of tokens. A token is required for each request. Tokens are added to the bucket at a fixed rate, and the bucket has a maximum capacity. If the bucket is empty, the client waits until a new token is available.
- Use Case: Adhering to known server-side rate limits proactively to avoid receiving 429 errors.
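The client-side bucket described above can be sketched in a few lines; the class name and the non-blocking try_send interface (return False instead of waiting) are assumptions made for a compact example.

```python
import time

class TokenBucket:
    """Client-side token bucket: tokens refill at `rate` per second,
    up to `capacity`; each request spends one token."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity      # start full so an initial burst is allowed
        self.last_refill = clock()

    def try_send(self):
        now = self.clock()
        # Refill based on elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A blocking variant would sleep for `(1 - tokens) / rate` seconds instead of returning False, which is the usual behavior when proactively pacing requests to a known server limit.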
3. Caching:
- Description: The client stores frequently accessed data locally. Before making a request to the server, the client checks its cache. If the data is available and fresh, it uses the cached version, reducing the number of requests to the server.
- Use Case: Reduces load on the server for read-heavy workloads, improves client-side performance.
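A tiny read-through cache with a time-to-live (TTL) illustrates the check-cache-before-request pattern; the class name, TTL policy, and injectable clock are assumptions for this sketch.

```python
import time

class TTLCache:
    """Tiny read-through cache: entries expire `ttl` seconds after storage."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._store = {}  # key -> (value, expiry time)

    def get_or_fetch(self, key, fetch):
        entry = self._store.get(key)
        if entry is not None and entry[1] > self.clock():
            return entry[0]            # fresh: serve locally, no request made
        value = fetch(key)             # stale or missing: go to the server
        self._store[key] = (value, self.clock() + self.ttl)
        return value
```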
4. Batching Requests:
- Description: Instead of sending multiple individual requests, the client combines several operations into a single, larger request (if the API supports it).
- Use Case: Reduces the overall number of API calls, particularly useful for updating multiple items or fetching related data.
5. Circuit Breaker (for Upstream Services):
- Description: The client maintains a "circuit" for each upstream service. If calls to a service start failing or timing out consistently, the circuit "opens," causing subsequent calls to immediately fail without even attempting to contact the server. After a timeout, it goes into a "half-open" state, allowing a few test requests to see if the service has recovered.
- Use Case: Prevents the client from wasting resources on an unavailable server and provides faster failure responses.
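The closed / open / half-open state machine described above can be sketched as a small class. The threshold, timeout, and single-probe half-open behavior are simplifying assumptions; real implementations often allow several probes and track failure rates rather than consecutive failures.

```python
import time

class CircuitBreaker:
    """Closed -> open after `failure_threshold` consecutive failures;
    half-open (one probe allowed) after `reset_timeout` seconds."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.state = "closed"
        self.opened_at = None

    def call(self, fn):
        if self.state == "open":
            if self.clock() - self.opened_at >= self.reset_timeout:
                self.state = "half-open"   # let one probe request through
            else:
                raise RuntimeError("circuit open: failing fast")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            # A failed probe, or too many consecutive failures, opens the circuit.
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.state = "open"
                self.opened_at = self.clock()
            raise
        self.failures = 0
        self.state = "closed"   # success closes the circuit again
        return result
```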
6. Debouncing and Throttling Input:
- Description: Common in UI development:
- Debouncing: Ensures a function is only called after a certain period of inactivity (e.g., only call search API after user stops typing for 300ms).
- Throttling: Ensures a function is called at most once within a specified period (e.g., only update position once every 100ms during a drag operation).
- Use Case: Reduces the number of requests generated by frequent user interactions.
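Although these techniques are most familiar from JavaScript UIs, the throttling half can be sketched as a small Python decorator; the name, drop-instead-of-delay behavior, and injectable clock are assumptions for this example.

```python
import time

def throttle(min_interval, clock=time.monotonic):
    """Decorator: run the function at most once per `min_interval` seconds;
    calls arriving sooner are dropped (the wrapper returns None)."""
    def decorator(fn):
        last_run = [None]  # mutable cell so the closure can update it
        def wrapper(*args, **kwargs):
            now = clock()
            if last_run[0] is not None and now - last_run[0] < min_interval:
                return None  # dropped: too soon after the previous call
            last_run[0] = now
            return fn(*args, **kwargs)
        return wrapper
    return decorator
```

A debounce variant would instead (re)start a timer on every call and fire only after `min_interval` of silence, e.g. with `threading.Timer`.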