Distributed Systems - Backpressure
In a distributed system, backpressure is the mechanism by which a system component (the "Downstream" or Receiver) tells an upstream component (the "Upstream" or Sender) to slow down because it can no longer process data at the incoming rate.
Without backpressure, the downstream component eventually suffers from resource exhaustion: it runs out of memory (RAM) while buffering requests, its CPU thrashes, or it crashes entirely, leading to cascading failures across the entire system.
The Core Problem: Producer > Consumer
Imagine a Fast Producer sending 1,000 messages/sec to a Slow Consumer that can only process 500 messages/sec.
- No Backpressure: The consumer tries to buffer the extra 500 messages in RAM. Eventually, the consumer hits an OOM (Out of Memory) error and dies.
- With Backpressure: The consumer tells the producer: "Stop sending so much; my buffer is 80% full." The producer then buffers the data locally or slows down its own upstream.
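The bounded-buffer version of this scenario can be sketched in a few lines. This is an illustrative simulation, not any particular framework's API: a queue capped at 100 items makes the fast producer block (slow down) whenever the slow consumer falls behind, so memory use stays bounded.

```python
import queue
import threading
import time

# A bounded queue forces the producer to wait (backpressure) instead of
# letting unread messages pile up in unbounded memory.
buf = queue.Queue(maxsize=100)  # buffer capped at 100 messages

def producer(n_messages):
    for i in range(n_messages):
        buf.put(i)  # blocks while the queue is full: the producer slows down

def consumer(n_messages, results):
    for _ in range(n_messages):
        results.append(buf.get())
        time.sleep(0.0001)  # simulate a consumer slower than the producer

results = []
t1 = threading.Thread(target=producer, args=(1000,))
t2 = threading.Thread(target=consumer, args=(1000, results))
t1.start(); t2.start(); t1.join(); t2.join()
print(f"delivered {len(results)} messages; buffer never exceeded {buf.maxsize}")
```

With an unbounded queue (`maxsize=0`), the same producer would happily race ahead of the consumer, which is exactly the OOM failure mode described above.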
1. Server-Side Techniques (The Defense)
The server (downstream) must protect its own resources.
A. Server-Side Throttling (Policy-Based)
This is an intentional "Speed Limit" based on Quotas.
- The Technique: The server tracks how many requests a specific client (User A) is sending.
- The Action: If User A exceeds their "Fair Share" (e.g., 100 requests/sec), the server returns an HTTP 429 (Too Many Requests).
- Goal: To prevent a "Noisy Neighbor" from hogging all the resources and affecting other users.
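A minimal sketch of per-client quota enforcement, assuming a simple fixed-window counter (real servers often use token buckets or sliding windows instead). The class and method names are illustrative, not a real library:

```python
import time
from collections import defaultdict

class PerClientLimiter:
    """Fixed-window counter: each client gets `limit` requests per second."""
    def __init__(self, limit):
        self.limit = limit
        self.windows = defaultdict(lambda: [0.0, 0])  # client -> [window_start, count]

    def handle(self, client_id):
        now = time.monotonic()
        window = self.windows[client_id]
        if now - window[0] >= 1.0:      # a new one-second window begins
            window[0], window[1] = now, 0
        if window[1] >= self.limit:     # client exceeded its "Fair Share"
            return 429                  # Too Many Requests
        window[1] += 1
        return 200

limiter = PerClientLimiter(limit=100)
codes = [limiter.handle("user-a") for _ in range(150)]
print(codes.count(200), codes.count(429))  # 100 requests served, 50 rejected
```

Note that only "user-a" is throttled here; other client IDs keep their own independent quota, which is the point of noisy-neighbor isolation.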
B. Load Shedding (Dropping)
When the server is overwhelmed, it simply rejects new requests immediately with an error (like HTTP 503 Service Unavailable or 429 Too Many Requests).
- Why: It is better to serve 1,000 users successfully and reject 500, than to try to serve 1,500 and crash for everyone.
- Example: Google Front Ends (GFE) and Cloud Load Balancers do this automatically when backends are unhealthy.
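The "serve 1,000, reject 500" trade-off can be expressed as a simple admission check. This is a hypothetical sketch (the class name is made up), assuming the server tracks in-flight requests and fails fast once a capacity ceiling is hit:

```python
class LoadShedder:
    """Reject immediately with 503 once in-flight work exceeds capacity."""
    def __init__(self, max_in_flight):
        self.max_in_flight = max_in_flight
        self.in_flight = 0

    def try_admit(self):
        if self.in_flight >= self.max_in_flight:
            return 503          # shed: fail fast instead of queueing forever
        self.in_flight += 1
        return 200

    def done(self):             # called when a request finishes
        self.in_flight -= 1

shedder = LoadShedder(max_in_flight=1000)
admitted = [shedder.try_admit() for _ in range(1500)]
print(admitted.count(200), admitted.count(503))  # serve 1000, shed 500
```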
C. Reactive Pull (Demand-based)
Instead of the server having data "pushed" to it, the server "pulls" only what it can handle.
- Example: In RSocket or Java Reactive Streams, the consumer sends a `request(n)` signal to the producer saying, "Send me exactly 10 more items." The producer is forbidden from sending the 11th item until the consumer asks for more.
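A toy sketch of the `request(n)` credit mechanism, not real RSocket or Reactive Streams API (those interfaces are richer); it only shows the core invariant, that the producer may never emit more items than it has been granted:

```python
class CreditedProducer:
    """Producer that may only emit items it holds credits for."""
    def __init__(self, subscriber):
        self.subscriber = subscriber
        self.credits = 0
        self.next_item = 0

    def request(self, n):           # the consumer's request(n) signal
        self.credits += n
        while self.credits > 0:     # forbidden to exceed the granted demand
            self.subscriber(self.next_item)
            self.next_item += 1
            self.credits -= 1

received = []
producer = CreditedProducer(received.append)
producer.request(10)    # "Send me exactly 10 more items."
print(received)         # items 0..9; item 10 waits for the next request(n)
```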
D. Internal Bounded Queues
Every internal buffer must have a maximum size.
- Technique: When the queue is full, the server can either:
- Block: Stop reading from the network socket (which triggers TCP backpressure).
- Drop Newest: Reject the incoming request.
- Drop Oldest: Throw away the oldest item in the queue to make room for the new one (common in real-time metrics).
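The "Drop Newest" and "Drop Oldest" policies can be sketched in one small class (the "Block" policy is just what `queue.Queue.put` does by default, as shown earlier). The class is illustrative, not a standard-library type:

```python
from collections import deque

class BoundedBuffer:
    """Bounded buffer with explicit overflow policies."""
    def __init__(self, maxsize, policy):
        self.items = deque()
        self.maxsize = maxsize
        self.policy = policy    # "drop_newest" or "drop_oldest"

    def offer(self, item):
        if len(self.items) < self.maxsize:
            self.items.append(item)
            return True
        if self.policy == "drop_oldest":    # common for real-time metrics
            self.items.popleft()            # evict the stalest entry
            self.items.append(item)
            return True
        return False                        # drop_newest: reject the arrival

buf = BoundedBuffer(maxsize=3, policy="drop_oldest")
for i in range(5):
    buf.offer(i)
print(list(buf.items))  # [2, 3, 4]: the two oldest items were evicted
```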
2. Client-Side Techniques
A "good citizen" client detects that the server is struggling and adjusts its behavior.
A. Client-Side Throttling (Rate Limiting)
The client limits itself before it even makes a call.
- The Technique: The client maintains its own "Token Bucket." If it knows the downstream service only allows 50 requests/sec, it queues or delays its own outgoing requests locally.
- Goal: To avoid wasting network bandwidth and processing power on requests that it knows will be rejected with a 429.
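A minimal client-side Token Bucket sketch, assuming the client knows the downstream limit up front. Unlike the server-side limiter, this one never returns an error; it delays the caller locally so requests that would be rejected are never sent:

```python
import time

class TokenBucket:
    """Client-side token bucket: the caller delays itself before sending."""
    def __init__(self, rate, burst):
        self.rate = rate                # tokens refilled per second
        self.tokens = burst
        self.burst = burst
        self.last = time.monotonic()

    def acquire(self):
        now = time.monotonic()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens < 1:
            # Not enough budget: sleep just long enough to earn (and spend) 1 token.
            time.sleep((1 - self.tokens) / self.rate)
            self.last = time.monotonic()
            self.tokens = 0
            return
        self.tokens -= 1

bucket = TokenBucket(rate=50, burst=5)   # known downstream limit: 50 req/sec
start = time.monotonic()
for _ in range(20):
    bucket.acquire()                     # each call waits here if needed
elapsed = time.monotonic() - start
print(f"20 calls paced over {elapsed:.2f}s")  # ~0.3s: 5 burst + 15 at 50/sec
```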
B. Exponential Backoff
When a client receives an error (like a 429 or 503), it shouldn't retry immediately. It should wait: 1s, then 2s, then 4s, then 8s.
- Why: This gives the server "breathing room" to recover from a spike.
C. Jitter (Randomness)
If 1,000 clients all wait exactly 2 seconds and then retry, they will hit the server at the exact same time, causing a "Thundering Herd" problem.
- Technique: Add a random delay (e.g., wait between 1.5s and 2.5s). This spreads the load out over time.
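Backoff and jitter combine naturally into one function. The sketch below uses the "full jitter" variant (wait a random amount between zero and the exponential cap); other variants add a smaller random offset around the exponential value, as in the 1.5s-2.5s example above:

```python
import random

def backoff_with_jitter(attempt, base=1.0, cap=30.0):
    """Exponential backoff (1s, 2s, 4s, ...) with full-jitter randomization."""
    exp = min(cap, base * (2 ** attempt))
    return random.uniform(0, exp)   # spread retries out over the whole window

# 1,000 clients all on retry attempt 1 would each pick a different point
# in [0, 2.0) instead of hammering the server at exactly t = 2s.
waits = [backoff_with_jitter(attempt=1) for _ in range(1000)]
print(f"min={min(waits):.3f}s max={max(waits):.3f}s")  # spread, not a spike
```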
D. Circuit Breaking
If the server is consistently failing, the client "trips the circuit."
- How it works: For a set period (e.g., 30 seconds), the client doesn't even attempt to call the server. It returns a local error or a cached value immediately.
- Why: This prevents the client from wasting its own threads waiting for a server that is clearly down.
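A bare-bones circuit breaker sketch (the class name and thresholds are illustrative; production libraries like Resilience4j or Envoy's built-in breaker add half-open probing, metrics, and thread safety):

```python
import time

class CircuitBreaker:
    """Trips open after `threshold` consecutive failures; then fails fast."""
    def __init__(self, threshold=5, reset_after=30.0):
        self.threshold = threshold
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, fn, fallback):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                return fallback()       # open: don't even attempt the server
            self.opened_at = None       # reset window elapsed: allow a probe
            self.failures = 0
        try:
            result = fn()
            self.failures = 0
            return result
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()   # trip the circuit
            return fallback()

def failing_server():
    raise ConnectionError("server is down")

breaker = CircuitBreaker(threshold=3, reset_after=30.0)
results = [breaker.call(failing_server, lambda: "cached value") for _ in range(10)]
print(results.count("cached value"))  # 10: after 3 failures, the server is left alone
```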
3. The Buffer Layer: External Queues / PubSub (The "Storage Tank")
When the Producer and Consumer run at vastly different speeds, you insert a "Shock Absorber" between them.
- The Solution: Use a managed service like GCP Pub/Sub, Kafka, or Cloud Tasks.
- How it handles Backpressure:
- Decoupling: The Producer dumps data into the Queue and moves on.
- Reactive Pull: The Consumer only "Pulls" messages when it has free internal capacity.
- Accumulation: If the Consumer is slow, messages safely build up in the External Queue (which has massive storage) rather than in the Consumer's fragile RAM.
- The Limit: Backpressure still exists here! If the External Queue hits its storage limit (e.g., 7 days of retention or 10TB of data), it will finally apply backpressure to the Producer by rejecting new messages.
4. Protocol-Level Backpressure
Some of the most effective backpressure happens automatically in the networking stack.
TCP Windowing (The Original Backpressure)
TCP uses a "Receive Window". The receiver tells the sender exactly how many bytes of data it can fit in its buffer.
- If the receiver's application is slow, the buffer fills up.
- The receiver sends a packet with a `Window Size: 0`.
- The sender's OS kernel physically stops sending packets until the window opens up again.
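You can observe kernel-level flow control directly. The sketch below uses a Unix socket pair rather than a real TCP connection (so there is no literal Receive Window on the wire), but the mechanism is the same: the "receiver" never reads, its kernel buffer fills, and the kernel then refuses further sends:

```python
import socket

# A socket pair where the "receiver" never calls recv(): its kernel buffer
# fills up, and the kernel then refuses further sends from the "sender".
sender, receiver = socket.socketpair()
sender.setblocking(False)
# Shrink the buffers so they fill quickly (sizes are hints; the OS may round them).
sender.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 4096)
receiver.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 4096)

sent = 0
try:
    while True:
        sent += sender.send(b"x" * 1024)
except BlockingIOError:
    # With a blocking socket this would be a silent pause instead of an error:
    # the sending process simply stops until the receiver drains its buffer.
    pass
print(f"kernel accepted {sent} bytes, then stopped the sender")
sender.close(); receiver.close()
```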
HTTP/2 and gRPC Flow Control
Unlike HTTP/1.1 (which handled one request at a time per connection), HTTP/2 multiplexes many streams over one connection.
- The Problem: One slow stream shouldn't block other fast streams.
- The Solution: HTTP/2 implements Stream-level Flow Control. The receiver can tell the sender to "pause" Stream A while continuing to send data on Stream B.
Backpressure Management Toolbox
Backpressure management is a core pillar of "Cloud Native" architecture. Because cloud environments are dynamic and prone to "noisy neighbor" effects, several projects and managed services have emerged specifically to handle traffic overflow.
Here are the top open-source projects and cloud-managed solutions for dealing with backpressure.
1. Open Source Cloud-Native Projects
A. Envoy Proxy & Istio (Service Mesh)
Envoy (the data plane) and Istio (the control plane) are the gold standards for backpressure in microservices.
- How they handle it: They use Sidecars to intercept all traffic.
- Key Features:
- Circuit Breaking: Automatically stops sending traffic to a failing service.
- Outlier Detection: Removes "sick" instances from the load-balancing pool.
- Rate Limiting: Global and local throttling of users.
- Adaptive Concurrency: Dynamically limits how many requests a service can handle based on current latency.
B. Cilium (eBPF-based)
Cilium is an eBPF-based networking and security project that handles backpressure at the Kernel level.
- How it handles it: It uses eBPF to monitor the health of the networking stack.
- Key Feature: If a node is overwhelmed, Cilium can perform Load Shedding at the XDP/NIC layer, dropping packets before they even consume CPU cycles in the application or the standard Linux network stack.
C. Alibaba Sentinel
Sentinel is a powerful, lightweight "flow control" library designed specifically for microservices.
- Key Features:
- Flow Control: Throttling based on QPS (queries per second) or thread count.
- Load Shedding: If the system’s "System Load" (CPU/Wait) is too high, it automatically rejects new traffic.
- Hot Parameter Isolation: Throttles specific "hot" users or products (e.g., a specific "deal of the day" item) without affecting other traffic.
D. NATS & Apache Kafka (Pull-based Messaging)
While not "backpressure tools" in the traditional sense, they are the architectural solution to it.
- The "Pull" Model: Unlike HTTP (Push), these systems require the consumer to Pull data. This is the ultimate backpressure: a consumer simply doesn't ask for a new message until it has finished the previous one.
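The pull loop itself is trivially simple, which is the appeal. This sketch uses a plain in-process queue as a stand-in for a Kafka partition or NATS subject (the real clients add acknowledgements, offsets, and batching):

```python
import queue

broker = queue.Queue()          # stands in for a Kafka/NATS topic
for i in range(5):
    broker.put(f"msg-{i}")      # producers dump messages and move on

processed = []
while not broker.empty():
    msg = broker.get()          # the consumer pulls only when it is ready
    processed.append(msg)       # ...finish processing this message...
    broker.task_done()          # only now does it ask for the next one
print(processed)
```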
2. Managed Solutions on GCP (Google Cloud)
GCP handles backpressure primarily through its global infrastructure and serverless offerings.
- Cloud Load Balancing (Global Shedding): Google's GFE (Google Front End) uses Cascading Load Balancing. If the `us-east1` region is at capacity, the load balancer will automatically "shed" that load to `us-central1` or `europe-west1` rather than letting the original region crash.
- Cloud Run (Max Concurrency): This is a built-in backpressure setting. You can set `max-concurrency` (e.g., 80). If more than 80 requests hit one container, Cloud Run will automatically queue them or spin up a new container. If you hit your `max-instances` limit, it will return a 429 (Throttled).
- Cloud Pub/Sub: The ultimate "Shock Absorber." It can ingest millions of messages per second and store them for up to 7 days, allowing your backends to process them at their own pace.
3. Managed Solutions on AWS (Amazon Web Services)
AWS provides similar tools but with a heavy focus on the API Gateway and SQS.
- Amazon API Gateway (Throttling): You can set "Usage Plans" and "Throttling" at the API level. It uses a Token Bucket algorithm to ensure clients don't exceed their allocated rate, returning a 429 before the request ever reaches your Lambda or EC2.
- AWS Lambda (Reserved Concurrency): You can limit a specific function to, say, 100 concurrent executions. This prevents one Lambda from "eating" your entire account's concurrency limit and protects your downstream databases from being overwhelmed.
- Amazon SQS (Simple Queue Service): Similar to Pub/Sub, SQS acts as the buffer. AWS also offers FIFO Queues to ensure that backpressure is applied while maintaining strict ordering.
Recommendation
- Envoy will teach you about Layer 7 backpressure (retries, circuit breaking).
- Cilium will teach you about Layer 3/4 backpressure (packet dropping and kernel-level protection).
By combining an External Queue (Pub/Sub) with Service Mesh Sidecars (Envoy), you create a system that is almost impossible to crash via overflow.