GCP - Quotas

Actual Usage vs Quota vs Quota Limit vs Capacity

Actual Usage: This is the amount of resource you genuinely consume. For example, if your Compute Engine VM runs for 10 hours, you're charged for 10 hours of VM time. If you store 100 GB in Cloud Storage, you're charged for 100 GB of storage.
Quota: A quota is a limit on the maximum amount of a specific resource your project (or organization) can use. It's a ceiling, not a reservation or a commitment to pay. If you have the permission, you can change the quotas in the Cloud Console.
Quota Limit: You can raise a quota only up to a limit set by Google, which depends on several factors. To go beyond this quota limit, you must submit a request to Google.
Capacity: This is the physical capacity available in Google's data centers. Quota and quota limit cap how much you are allowed to use, but staying within quota does not guarantee that the resources exist. If Google lacks underlying capacity in a specific data center, zone, or region, your request can still fail with a stockout.
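
The four terms can be told apart by asking which check a request fails. The following is a minimal sketch, not how Google implements it: the ZoneState class, its field names, and the returned messages are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ZoneState:
    """Hypothetical snapshot of one resource type in one zone."""
    usage: int        # what the project currently consumes (and pays for)
    quota: int        # the project's current ceiling
    quota_limit: int  # highest value the quota can be raised to without asking Google
    capacity: int     # free physical capacity in the zone (assumed known here)


def check_request(state: ZoneState, requested: int) -> str:
    """Classify a provisioning request against quota, quota limit, and capacity."""
    new_usage = state.usage + requested
    if new_usage > state.quota_limit:
        return "exceeds quota limit: request an increase from Google"
    if new_usage > state.quota:
        return "exceeds quota: raise the quota first"
    if requested > state.capacity:
        return "stockout: within quota, but the zone lacks capacity"
    return "ok"


state = ZoneState(usage=18, quota=20, quota_limit=50, capacity=100)
print(check_request(state, 2))   # fits under the quota
print(check_request(state, 3))   # one over the quota, still under the quota limit
print(check_request(state, 40))  # over the quota limit
```

Note that the capacity check is independent of the other two: a request can pass both quota checks and still fail.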

Google Cloud Platform (GCP) quotas are system-defined limits on the amount of a specific resource that your project (or organization) can consume. They are a fundamental part of how GCP manages its resources and helps users control their spending.

Here's a detailed explanation of how GCP Cloud Quotas work:

What are Quotas and Why Do They Exist?

Quotas are limits, not reservations or guarantees of resource availability. They serve several critical purposes:

Protecting Users from Accidental Overspending: Quotas act as a safety net. If a rogue process, a bug in your code, or unauthorized access leads to excessive resource creation or API calls, hitting a quota prevents your bill from skyrocketing uncontrollably.
Preventing Resource Exhaustion: GCP is a multi-tenant environment. Quotas ensure that no single project can consume a disproportionate amount of a shared resource, thus maintaining fair access and stability for all GCP users.
Capacity Planning for Google: Quotas help Google manage its underlying infrastructure effectively by providing insights into demand patterns and ensuring enough physical resources are available.
System Stability: By setting limits on operations (like the rate of VM creation or API calls), quotas help prevent individual projects from inadvertently causing performance degradation for shared GCP services.

Types of Quotas

GCP quotas generally fall into two main categories:

1. API Rate Quotas

Definition: These limit the number of API requests you can make to a specific Google Cloud service within a given time window (e.g., requests per second, requests per minute).
Purpose: To prevent API abuse, denial-of-service attacks, and to manage the load on Google's control plane services.
Example: "Compute Engine API calls per minute," "Cloud Storage bucket creation requests per 100 seconds."
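
A "requests per time window" quota can be pictured as a counter that resets when a new window starts. The sketch below uses a simple fixed-window counter; this is an assumption chosen for clarity, not a description of the algorithm GCP actually uses, and the class name FixedWindowRateQuota is hypothetical.

```python
from typing import Optional


class FixedWindowRateQuota:
    """Allow at most `limit` requests per `window_seconds` (fixed-window counter)."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.window_start: Optional[float] = None
        self.count = 0

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at time `now` (seconds) is within quota."""
        # Start a fresh window on the first request or once the current one has expired.
        if self.window_start is None or now - self.window_start >= self.window_seconds:
            self.window_start = now
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False


quota = FixedWindowRateQuota(limit=3, window_seconds=60)
print([quota.allow(t) for t in (0, 1, 2, 3)])  # the fourth request in the window is refused
print(quota.allow(60))                         # a new window has started
```

Passing the timestamp in explicitly, rather than reading the clock inside the class, keeps the example deterministic.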

2. Resource Quotas

Definition: These limit the total number or quantity of a specific resource that your project can provision.
Purpose: To manage the underlying physical capacity of GCP's data centers and ensure fair distribution of resources.
Example: "Number of Compute Engine VM instances in us-central1," "Total GB of SSD persistent disk in asia-east1," "Number of external IP addresses."

Scope of Quotas

Quotas can be applied at different levels:

Project-level: Most common. A quota applies to a specific project.
Folder/Organization-level: Quota increases are generally requested per project, but Organization Policies and other settings higher in the resource hierarchy can sometimes impose broader restrictions or influence default quotas.
Region-level / Zone-level: Many resource quotas are specific to a particular GCP region (e.g., number of VMs in europe-west1) or even a zone within a region (e.g., number of GPUs in us-central1-a). Some are global (e.g., number of projects you can create).
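
One way to picture these scopes is a table keyed by metric and location, consulted from the most specific scope to the least. This is only an illustrative sketch: the metric names, the limits, and the zone-then-region-then-global lookup order are assumptions made for the example, not GCP's data model.

```python
from typing import Dict, List, Optional, Tuple

QuotaTable = Dict[Tuple[str, str], int]

# Hypothetical quota table; the metric names and values are made up.
QUOTAS: QuotaTable = {
    ("GPUS", "us-central1-a"): 4,        # zonal
    ("VM_INSTANCES", "europe-west1"): 24,  # regional
    ("PROJECTS", "global"): 30,          # global
}


def candidate_scopes(location: str) -> List[str]:
    """List scopes from most to least specific for a zone, region, or 'global'."""
    scopes = [location]
    if location.count("-") == 2:  # zone names look like us-central1-a
        scopes.append(location.rsplit("-", 1)[0])
    if location != "global":
        scopes.append("global")
    return scopes


def lookup_quota(quotas: QuotaTable, metric: str, location: str) -> Optional[int]:
    """Return the first quota found for the metric, or None if none is defined."""
    for scope in candidate_scopes(location):
        if (metric, scope) in quotas:
            return quotas[(metric, scope)]
    return None


print(candidate_scopes("us-central1-a"))
print(lookup_quota(QUOTAS, "VM_INSTANCES", "europe-west1-b"))
```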

How Quotas Are Enforced

When you make an API call or attempt to provision a resource:

Real-time Check: GCP's quota system checks against your project's current usage and the defined quotas for that specific resource/API and location.
Decision:
- If your usage is within the quota, the request proceeds.
- If your usage would exceed the quota, the request is denied.
Error Message: You (or your application) will receive an error message, typically an HTTP 429 (Too Many Requests) for rate limits or a QUOTA_EXCEEDED error code for resource limits.
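
The two error types call for different client behavior. A rate-limit error is usually transient, so retrying with exponential backoff is reasonable. A resource-quota error will not clear on its own, so retrying is pointless. The sketch below assumes two hypothetical exception classes standing in for an HTTP 429 response and a QUOTA_EXCEEDED error; real client libraries expose their own exception types.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for an HTTP 429 (Too Many Requests) response."""


class ResourceQuotaExceeded(Exception):
    """Hypothetical stand-in for a QUOTA_EXCEEDED error on a resource quota."""


def call_with_backoff(
    request: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `request` on RateLimitError with exponential backoff plus jitter.

    ResourceQuotaExceeded is deliberately not caught: it propagates at once,
    because waiting does not free up resource quota.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return request()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error
            # Delay doubles each attempt; jitter avoids synchronized retries.
            sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
    raise AssertionError("unreachable")
```

The sleep function is a parameter so that the retry logic can be exercised without actually waiting.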

Important Note: Quotas are not a billing mechanism. You are charged only for the resources you actually consume. For example, if your project has a quota of 20 VMs but you run only 5, you are billed for the 5 VMs you use, not for the capacity of 20. The quota simply prevents you from creating a 21st VM without an increase.

Managing Quotas

You can manage your quotas primarily through the Google Cloud Console or the gcloud CLI.

Viewing Quotas:
- Cloud Console: Navigate to IAM & Admin > Quotas. Here you can filter by service, metric, region, and see your current usage against your limits.
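- gcloud CLI: Use gcloud compute project-info describe --project=YOUR_PROJECT_ID for project-wide Compute Engine quotas and gcloud compute regions describe REGION for regional ones. For other services, gcloud alpha services quota list --service=SERVICE_NAME --consumer=projects/YOUR_PROJECT_ID can list quotas (this is an alpha command, so its availability and flags may change).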
Requesting Quota Increases:
- Most quotas are soft limits and can be increased by submitting a request to Google Cloud support.
- Process (via Cloud Console):
  1. Go to IAM & Admin > Quotas.
  2. Select the desired service, metric, and region.
  3. Click "EDIT QUOTAS" (or "REQUEST INCREASE").
  4. Fill out the request form, providing a clear and detailed justification for why you need the increase (e.g., "Deploying 10 new instances for a production application," "Anticipating 50% traffic increase next month"). This justification is crucial for approval.
  5. Submit the request. Google Cloud support reviews these requests, and approval can take anywhere from a few minutes to several business days, depending on the complexity and resource availability.
- Reasons for Denial: Requests might be denied if there isn't sufficient capacity in the requested region/zone, if the justification is unclear, or if there's suspicious activity on the account.
Non-Adjustable Quotas (Hard Limits): Some limits are fixed and cannot be increased (e.g., certain system limits or physical hardware constraints). In these cases, you might need to rethink your architecture or contact your Google Cloud sales representative for specialized solutions.
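
To see usage against limits outside the Console, the JSON form of the project description can be parsed directly. The sketch below assumes output captured with gcloud compute project-info describe --project=YOUR_PROJECT_ID --format=json, which contains a quotas list with metric, limit, and usage fields. The sample values are made up, and the utilization helper is a hypothetical name.

```python
import json
from typing import Dict

# Illustrative excerpt only; a real response contains many more fields and metrics.
SAMPLE_OUTPUT = """
{
  "name": "YOUR_PROJECT_ID",
  "quotas": [
    {"metric": "SNAPSHOTS", "limit": 1000.0, "usage": 10.0},
    {"metric": "NETWORKS", "limit": 8.0, "usage": 6.0},
    {"metric": "IN_USE_ADDRESSES", "limit": 8.0, "usage": 2.0}
  ]
}
"""


def utilization(project_info: dict) -> Dict[str, float]:
    """Map each quota metric to usage / limit, skipping metrics with a zero limit."""
    result: Dict[str, float] = {}
    for quota in project_info.get("quotas", []):
        limit = quota["limit"]
        if limit > 0:
            result[quota["metric"]] = quota["usage"] / limit
    return result


ratios = utilization(json.loads(SAMPLE_OUTPUT))
# Print the most heavily used quotas first.
for metric, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{metric}: {ratio:.0%}")
```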

Best Practices for Quota Management

Proactive Planning: Don't wait until you hit a quota limit in production. Plan your resource needs in advance and request increases well before they are critical.
Monitor Quota Usage: Set up Cloud Monitoring alerts to notify you when your usage approaches a quota limit (e.g., 80% or 90% utilization).
Understand Default Quotas: Familiarize yourself with the default quotas for the services you use in each region.
Justify Thoroughly: Provide comprehensive business justifications for all quota increase requests.
Clean Up Unused Resources: Periodically delete resources you no longer need to free up quota.
Consider Committed Use Discounts (CUDs): While not quotas, CUDs allow you to commit to a certain usage level for a discount. This is a billing commitment, distinct from a quota limit.
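
The 80% and 90% thresholds mentioned under "Monitor Quota Usage" reduce to a simple ratio check. The sketch below shows only that local logic; it is not a Cloud Monitoring alert policy, and the function name alert_level and its labels are hypothetical.

```python
def alert_level(usage: float, limit: float,
                warn_at: float = 0.8, critical_at: float = 0.9) -> str:
    """Classify quota utilization as 'ok', 'warning', or 'critical'."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    ratio = usage / limit
    if ratio >= critical_at:
        return "critical"
    if ratio >= warn_at:
        return "warning"
    return "ok"


# With a quota of 20 VMs:
for vms in (5, 17, 19):
    print(vms, alert_level(vms, 20))
```

Running such a check on a schedule, before a deployment for example, gives time to request an increase before the quota is actually hit.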