Throttling vs Load Shedding

Throttling and Load Shedding are not the same, though they are closely related tools in the "Reliability" toolbox.

Think of Throttling as a Speed Limit (preventative) and Load Shedding as an Ejection Seat (emergency).

Throttling (The Governor)

Throttling is the intentional restriction of the rate of requests. It is usually based on a pre-defined contract or quota.

Goal: Fairness and resource allocation.
When it happens: When a specific user or service exceeds their "fair share" (e.g., 100 requests per second).
Action: The system usually returns an HTTP 429 (Too Many Requests).
Analogy: A cell phone plan that slows your data down to 3G speeds once you hit 10GB. The network is fine; you just hit your limit.
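
As a rough illustration of the quota check described above, here is a minimal sketch using a token bucket, which is one common way to enforce a requests-per-second limit. The text does not prescribe an algorithm, so the token bucket itself, the `Throttler` class, its `check` method, and the injectable `clock` parameter are all assumptions made for this example.

```python
import time


class Throttler:
    """Per-user token bucket (hypothetical helper, not a real library API).

    Each user gets `burst` tokens that refill at `rate` tokens per second.
    A request costs one token; with no token left, the caller gets a 429.
    """

    def __init__(self, rate=100.0, burst=100.0, clock=time.monotonic):
        self.rate = rate      # sustained requests per second allowed per user
        self.burst = burst    # maximum tokens a user can accumulate
        self.clock = clock    # injectable so the sketch can be tested
        self.buckets = {}     # user_id -> (tokens, last_update_time)

    def check(self, user_id):
        """Return 200 if the request is within quota, otherwise 429."""
        now = self.clock()
        tokens, last = self.buckets.get(user_id, (self.burst, now))
        # Refill for the time elapsed since this user's last request.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self.buckets[user_id] = (tokens - 1.0, now)
            return 200
        self.buckets[user_id] = (tokens, now)
        return 429
```

Note that the decision depends only on who is asking and how much they have already used. The health of the server never enters into it.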

Load Shedding (The Safety Valve)

Load Shedding is the intentional dropping of requests because the server is overwhelmed and at risk of crashing.

Goal: System survival and preventing "death spirals."
When it happens: When the server’s CPU is at 95%, its memory is full, or its request queue is backed up. The server decides it cannot safely handle more work.
Action: The system usually returns an HTTP 503 (Service Unavailable).
Analogy: A crowded nightclub that stops letting anyone in (even VIPs) because the building is at maximum fire-code capacity and the floor might collapse.
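
The "server decides it cannot safely handle more work" step can be sketched as a cap on in-flight requests. This is only one possible overload signal; CPU, memory, or queue depth would work in the same way. The `LoadShedder` class, the `handle` function, and the `max_in_flight` parameter are hypothetical names chosen for this example.

```python
import threading


class LoadShedder:
    """Reject new work once too many requests are already in flight."""

    def __init__(self, max_in_flight):
        self.max_in_flight = max_in_flight  # assumed capacity of this server
        self.in_flight = 0
        self.lock = threading.Lock()

    def try_acquire(self):
        """Claim a slot; return False if the server is already at capacity."""
        with self.lock:
            if self.in_flight >= self.max_in_flight:
                return False
            self.in_flight += 1
            return True

    def release(self):
        """Give the slot back once the request has finished."""
        with self.lock:
            self.in_flight -= 1


def handle(shedder, work):
    """Run `work` if there is capacity, otherwise fail fast with a 503."""
    if not shedder.try_acquire():
        return 503, None  # shed: reject immediately instead of queueing
    try:
        return 200, work()
    finally:
        shedder.release()
```

Here the identity of the caller is irrelevant. Like the nightclub at capacity, the server turns everyone away until a slot frees up.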

Feature	Throttling	Load Shedding
Trigger	User-specific quotas / API limits.	System-wide resource exhaustion (CPU, memory, queue depth).
Focus	Who is sending the request?	Can I handle any more work?
Latency	Usually stays low (rejected early).	Often preceded by high latency (the "warning").
Priority	Usually ignores request priority (contract-based).	Can be priority-aware (keep high-value traffic).
Outcome	"You've used too much."	"I'm about to break."

In a well-designed distributed system, a request passes through both layers:

First, Throttling (The Gatekeeper): The load balancer (or API gateway) checks whether User A has exceeded their quota. If so, it rejects the request with a 429. This protects the backend from a single "noisy" user.
Next, Load Shedding (The Safety Valve): Even if no single user is over their quota, if 1,000,000 "legal" users show up at once, the backend will start Load Shedding to stay alive. It drops low-priority requests (like background analytics) first, so that it can keep serving high-priority ones (like checkout/payment).
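
The two layers can be sketched together as a single admission function. This is a simplified illustration, not a production design: the quota, the utilisation thresholds, the `admit` function, and the two priority labels are all assumed values and names, and `load` stands in for whatever overload signal the backend actually measures.

```python
from collections import Counter

QUOTA_PER_WINDOW = 3                    # hypothetical per-user quota per window
SHED_AT = {"low": 0.80, "high": 0.95}   # hypothetical utilisation thresholds


def admit(user_id, priority, usage, load):
    """Return the HTTP status a request would receive.

    usage: Counter of requests already counted per user in this window.
    load:  current backend utilisation between 0.0 and 1.0.
    """
    # Layer 1, throttling: "who is sending the request?"
    if usage[user_id] >= QUOTA_PER_WINDOW:
        return 429
    usage[user_id] += 1  # counted at the edge, before the backend sees it

    # Layer 2, load shedding: "can I handle any more work?"
    # Low-priority traffic is shed at a lower utilisation than high-priority.
    if load >= SHED_AT[priority]:
        return 503
    return 200
```

For example, at 85% utilisation a background analytics call tagged "low" would receive a 503 while a checkout call tagged "high" still goes through. One design choice to note: in this sketch a shed request still counts against the user's quota, because the throttling layer runs first and has no view of what the backend later does.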