HTTP Error 429 (Too Many Requests) – How to Fix
HTTP Error 429 Explained:
HTTP 429 Too Many Requests means you’ve exceeded the API’s rate limit. Solutions include implementing exponential backoff, respecting Retry-After headers, using request queuing, caching responses, and distributing load over time.
| Question | Answer |
|---|---|
| What does 429 mean? | You’ve exceeded the API’s rate limit and need to slow down requests. |
| How long should I wait? | Check the Retry-After header. If it's absent, use exponential backoff starting at 1 second. |
| Should I retry immediately? | No. Always wait before retrying, following Retry-After guidance. |
| Can I prevent 429 errors? | Yes, by implementing client-side rate limiting and request queuing. |
| What’s exponential backoff? | A retry strategy where wait time doubles: 1s, 2s, 4s, 8s, 16s. |
| Is caching a good solution? | Yes, for GET requests. Cache responses to reduce API calls. |
When an API starts throwing 429s in production, it’s a reliability signal. Understanding rate limits early in your testing and monitoring workflows helps teams prevent user-facing downtime before it happens.
The 429 Too Many Requests error code (or HTTP error 429) means you’ve hit the API’s rate limit, where the server temporarily blocks requests to protect itself from overload. This guide shows you how to understand, prevent, and handle 429 errors with practical retry strategies and code examples.
What is HTTP 429 Too Many Requests?
HTTP 429 Too Many Requests is a client error status code indicating you’ve exceeded the allowed request rate. It’s part of the 4xx family of status codes that signal client-side problems rather than server errors.
Rate limiting acts as traffic control. Just as highways have speed limits, APIs enforce request limits to maintain stability for all users. When you exceed these limits, the server responds with 429 instead of processing your request.
Rate limiting also protects backend CPU and network capacity from abusive traffic such as aggressive web scraping, bots, brute-force login attempts, and some denial-of-service attacks, any of which can overload servers or degrade response times for everyone else.
Anatomy of a 429 response
A properly formatted 429 response includes headers that tell you how to handle the situation:
```http
HTTP/1.1 429 Too Many Requests
Retry-After: 60
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1699564800
Content-Type: application/json

{
  "error": "rate_limit_exceeded",
  "message": "You have exceeded the rate limit of 100 requests per minute",
  "retry_after": 60
}
```
Key response headers include:
- Retry-After → seconds to wait before retrying, or an HTTP date
- X-RateLimit-Limit → maximum requests allowed in the window
- X-RateLimit-Remaining → requests left in the current window
- X-RateLimit-Reset → Unix timestamp when the quota refreshes
Header names vary by API. Some use RateLimit-* without the X- prefix, while others use custom conventions. Always check the API documentation.
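Because header names differ between providers, it can help to read them through one small helper. The sketch below assumes only the two conventions mentioned above (X-RateLimit-* and RateLimit-*) and a Fetch API Headers object; readRateLimitInfo is a hypothetical name, and the reset value is returned as a plain number because some APIs send a Unix timestamp while others send seconds until reset.

```javascript
// Hypothetical helper: normalizes rate-limit headers from a Fetch API Headers object.
// Assumes the API uses either the X-RateLimit-* or the RateLimit-* naming convention.
function readRateLimitInfo(headers) {
  // Try the X- prefixed name first, then the unprefixed one (Headers.get returns null if absent)
  const pick = (name) =>
    headers.get(`X-RateLimit-${name}`) ?? headers.get(`RateLimit-${name}`);

  // Convert a header value to an integer, or null if it's missing or non-numeric
  const toInt = (value) => {
    const n = Number.parseInt(value, 10);
    return Number.isNaN(n) ? null : n;
  };

  return {
    limit: toInt(pick('Limit')),
    remaining: toInt(pick('Remaining')),
    // Not interpreted: may be a Unix timestamp or seconds until reset, depending on the API
    reset: toInt(pick('Reset')),
    retryAfter: headers.get('Retry-After'), // raw value: seconds or an HTTP date
  };
}

const info = readRateLimitInfo(new Headers({
  'Retry-After': '60',
  'X-RateLimit-Limit': '100',
  'X-RateLimit-Remaining': '0',
  'X-RateLimit-Reset': '1699564800',
}));
console.log(info);
```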
Common causes of 429 errors
Exceeding request rate limits
Sending too many requests in a short period is the most common cause. This happens with loops that fetch data for multiple resources without pacing. For example, retrieving user data for 1,000 users in a tight loop quickly triggers rate limits.
Burst traffic patterns also cause problems. Even if your average request rate stays within limits, sudden spikes can exceed per-second thresholds.
Concurrent request limits
Some APIs limit the number of simultaneous connections, not just the frequency of requests. Making 50 parallel requests might violate concurrent limits even if your per-minute rate is acceptable.
This becomes problematic in distributed systems where multiple servers share API credentials. Each component individually stays reasonable, but collectively they overwhelm the concurrent request threshold.
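When the constraint is simultaneous connections rather than requests per minute, capping the number of in-flight requests on the client helps. Here is a minimal sketch; runWithConcurrency is a hypothetical helper, the right cap depends on the API's documented concurrent limit, and in a distributed system the cap only holds per process unless the instances coordinate.

```javascript
// Runs async task functions with at most `maxConcurrent` in flight at once.
// Each task is a function returning a promise, e.g. () => fetch(url).
// Results come back in the same order as `tasks`; if any task rejects,
// the returned promise rejects with that error.
async function runWithConcurrency(tasks, maxConcurrent) {
  const results = new Array(tasks.length);
  let next = 0; // index of the next task to start

  // Each worker keeps pulling the next unstarted task until none remain
  async function worker() {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }

  const workerCount = Math.min(maxConcurrent, tasks.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Usage (example URL from this article): at most 5 user requests in flight at a time
// const users = await runWithConcurrency(
//   userIds.map(id => () => fetch(`https://api.example.com/users/${id}`)),
//   5
// );
```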
Endpoint-specific limits
Write operations, such as POST, PUT, and DELETE, often have stricter limits than read operations. Search and query endpoints tend to impose lower limits because they’re resource-intensive. An API might allow 1,000 GET requests per hour but only 100 POST requests.
Aggressive retry logic
Poorly implemented retry logic often makes rate limiting worse. Applications that immediately retry failed requests create retry storms that amplify the problem. Each retry consumes another request from your quota.
How rate limiting works
APIs use different rate-limiting algorithms:
- Fixed window limits reset at specific intervals. For example, a limit of 100 requests per minute resets at the top of each minute. This can cause traffic spikes at reset boundaries.
- Sliding window limits track requests over a rolling time period, calculating your rate at any moment from the requests made in, for example, the past 60 seconds. This distributes traffic more smoothly but requires more bookkeeping.
- Token bucket algorithms offer the most flexibility. Tokens are added at a constant rate, and each request consumes one token. The bucket has a maximum capacity, allowing brief bursts when tokens have accumulated. AWS API Gateway, Stripe, and many modern APIs use token bucket algorithms.
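To make the token bucket description concrete, here is a minimal single-process sketch. TokenBucket and tryRemoveToken are illustrative names, not any provider's API, and a real service spread across several servers would need to keep bucket state somewhere shared rather than in memory.

```javascript
// Minimal token bucket: refills at `refillPerSecond`, holds at most `capacity` tokens.
// `now` is injectable (returns milliseconds) so behavior can be tested without real time.
class TokenBucket {
  constructor(capacity, refillPerSecond, now = Date.now) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.now = now;
    this.tokens = capacity; // start full, which permits an initial burst
    this.lastRefill = now();
  }

  // Add tokens for the time elapsed since the last refill, capped at capacity
  refill() {
    const current = this.now();
    const elapsedSeconds = (current - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefill = current;
  }

  // Consumes a token and returns true if one is available; false is where a server would send 429
  tryRemoveToken() {
    this.refill();
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// Usage: capacity 5, refilling 1 token per second, driven by a fake clock
let fakeTime = 0;
const bucket = new TokenBucket(5, 1, () => fakeTime);
const burst = Array.from({ length: 6 }, () => bucket.tryRemoveToken());
console.log(burst); // [ true, true, true, true, true, false ]
fakeTime += 2000; // two seconds later, two tokens have been added
console.log(bucket.tryRemoveToken(), bucket.tryRemoveToken(), bucket.tryRemoveToken()); // true true false
```

The burst of five succeeds because the bucket starts full; after that, throughput settles at the refill rate.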
APIs apply rate limits at different scopes: per API key, per user account, per IP address, per endpoint, or globally across all users. Whatever the scope, the limit is usually expressed as a number of requests allowed within a time window.
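Seen from the server side, a fixed-window limit scoped per key can be sketched in a few lines. This minimal sketch also reproduces the boundary effect described above: a client that spends its full quota just before a window ends, and again right after, gets twice the limit through in about a second. FixedWindowLimiter and the key strings are hypothetical.

```javascript
// Minimal fixed-window counter keyed by scope (API key, user ID, IP, endpoint...).
// `now` is injectable (returns milliseconds) for testing.
class FixedWindowLimiter {
  constructor(limit, windowMs, now = Date.now) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.now = now;
    this.counters = new Map(); // key -> { windowStart, count }
  }

  allow(key) {
    // Windows are aligned to multiples of windowMs (e.g. the top of each minute)
    const windowStart = Math.floor(this.now() / this.windowMs) * this.windowMs;
    let entry = this.counters.get(key);
    if (!entry || entry.windowStart !== windowStart) {
      entry = { windowStart, count: 0 }; // new window: counter starts over
      this.counters.set(key, entry);
    }
    if (entry.count >= this.limit) return false; // a server would respond with 429 here
    entry.count++;
    return true;
  }
}

// Usage: 100 requests per minute per key, driven by a fake clock
let clock = 59_000; // 59 seconds into the first minute
const windowLimiter = new FixedWindowLimiter(100, 60_000, () => clock);
let allowed = 0;
for (let i = 0; i < 100; i++) if (windowLimiter.allow('api-key-A')) allowed++;
clock = 60_000; // one second later, a new window starts
for (let i = 0; i < 100; i++) if (windowLimiter.allow('api-key-A')) allowed++;
console.log(allowed); // 200
console.log(windowLimiter.allow('api-key-B')); // true: each key has its own counter
```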
Diagnosing 429 errors
When you encounter a 429 error, start by extracting response headers:
```javascript
fetch('https://api.example.com/users')
  .then(response => {
    if (response.status === 429) {
      console.log('Rate Limit:', response.headers.get('X-RateLimit-Limit'));
      console.log('Remaining:', response.headers.get('X-RateLimit-Remaining'));
      console.log('Retry After:', response.headers.get('Retry-After'));
    }
    return response.json();
  });
```
Review each API’s documentation to understand its published rate limits, which can vary widely by provider and endpoint. Many APIs specify limits for authenticated versus unauthenticated requests, and some offer burst capacity for short periods of higher traffic. Always check the official documentation for current rate-limit policies and recommended retry strategies.
Log timestamps of your API calls to identify patterns and calculate your request rate over different windows. You might think you’re sending 50 requests per minute, but bursts could spike to 150 requests in 10 seconds.
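Once you have a log of timestamps, a single sliding two-pointer pass finds the busiest window. This is a minimal sketch; peakRequestsInWindow is a hypothetical helper and the timestamps below are synthetic, chosen to show how an average rate can hide a burst.

```javascript
// Given request timestamps in milliseconds (sorted ascending), returns the largest
// number of requests whose first and last timestamps are less than `windowMs` apart.
function peakRequestsInWindow(timestamps, windowMs) {
  let peak = 0;
  let start = 0; // left edge of the sliding window (index into timestamps)
  for (let end = 0; end < timestamps.length; end++) {
    // Shrink the window from the left until it spans less than windowMs
    while (timestamps[end] - timestamps[start] >= windowMs) start++;
    peak = Math.max(peak, end - start + 1);
  }
  return peak;
}

// Synthetic log: one request per second for a minute, plus a burst of 40 in under a second
const spread = Array.from({ length: 60 }, (_, i) => i * 1000);
const burst = Array.from({ length: 40 }, (_, i) => 30_000 + i * 25);
const log = [...spread, ...burst].sort((a, b) => a - b);
console.log(log.length); // 100
console.log(peakRequestsInWindow(log, 10_000)); // 50
```

About 100 requests per minute on average, yet half of them land inside a single 10-second window, which is what a per-second or per-10-second limit would see.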
How to handle 429 errors
Whether you're debugging in Python, JavaScript, or another language, the fix is the same: make your client slow down and retry deliberately instead of immediately.
Implement exponential backoff
Exponential backoff is the industry-standard retry strategy. Instead of retrying immediately, you progressively increase wait times: 1 second, 2 seconds, 4 seconds, 8 seconds. This gives servers time to recover while avoiding retry storms.
These retry strategies are best defined and tested collaboratively before release. In Postman, you can simulate 429 responses across environments to validate that your client logic behaves correctly under throttling conditions.
```javascript
async function fetchWithRetry(url, maxRetries = 5) {
  for (let retries = 0; ; retries++) {
    const response = await fetch(url);
    if (response.status !== 429) {
      return response;
    }
    if (retries >= maxRetries) {
      throw new Error('Max retries exceeded');
    }

    // Prefer the server's Retry-After value in seconds. An HTTP-date value (or a
    // missing header) parses as NaN, so it falls back to exponential backoff: 1s, 2s, 4s...
    const retryAfter = Number.parseInt(response.headers.get('Retry-After'), 10);
    const waitTime = Number.isNaN(retryAfter)
      ? Math.pow(2, retries) * 1000
      : retryAfter * 1000;

    console.log(`Rate limited. Waiting ${waitTime}ms before retry ${retries + 1}`);
    await new Promise(resolve => setTimeout(resolve, waitTime));
  }
}
```
This checks for Retry-After headers first, using server guidance when available. If missing, it falls back to exponential backoff.
Adding jitter improves the approach. When many clients hit limits simultaneously with identical backoff, they synchronize retries and create thundering herd problems. Jitter introduces randomness:
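```javascript
function calculateBackoff(retryCount, baseDelay = 1000, maxDelay = 32000) {
  // Exponential delay (1s, 2s, 4s, ...) capped at maxDelay
  const exponentialDelay = Math.min(baseDelay * Math.pow(2, retryCount), maxDelay);
  // Add up to 10% random jitter so clients don't retry in lockstep
  const jitter = Math.random() * exponentialDelay * 0.1;
  return exponentialDelay + jitter;
}
```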
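The calculateBackoff function above adds at most 10% on top of each delay, so retries from many clients still land in a narrow band. A common alternative, often called "full jitter", picks a random delay anywhere between zero and the capped exponential value, which spreads clients out more widely. This is a minimal sketch of that swapped-in variant; fullJitterBackoff is a hypothetical name, its defaults mirror calculateBackoff, and the random source is injectable only so the range can be checked.

```javascript
// "Full jitter" backoff: a random delay in [0, min(baseDelay * 2^retryCount, maxDelay)).
function fullJitterBackoff(retryCount, baseDelay = 1000, maxDelay = 32000, random = Math.random) {
  const cap = Math.min(baseDelay * 2 ** retryCount, maxDelay);
  return Math.floor(random() * cap);
}

// Print the upper bound and one sampled delay for the first six retries
for (let retry = 0; retry < 6; retry++) {
  const upper = Math.min(1000 * 2 ** retry, 32000);
  console.log(`retry ${retry}: 0-${upper}ms, sampled ${fullJitterBackoff(retry)}ms`);
}
```

The trade-off is that an individual retry may come sooner than with plain exponential backoff, in exchange for less synchronized load on the server.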
Respect Retry-After headers
Always check Retry-After headers before implementing your own backoff. Servers know their rate limit windows better than client-side algorithms. The header can contain either seconds to wait or an HTTP date:
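```python
import time
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

import requests

def make_request_with_retry(url, max_retries=5):
    for attempt in range(max_retries + 1):
        response = requests.get(url)
        # Done if not rate limited, or out of retries (the last 429 is returned)
        if response.status_code != 429 or attempt == max_retries:
            return response

        retry_after = response.headers.get('Retry-After')
        wait_seconds = 2 ** attempt  # fallback: exponential backoff
        if retry_after:
            try:
                wait_seconds = int(retry_after)  # delay in seconds
            except ValueError:
                try:
                    # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
                    retry_at = parsedate_to_datetime(retry_after)
                    wait_seconds = max(0, (retry_at - datetime.now(timezone.utc)).total_seconds())
                except (TypeError, ValueError):
                    pass  # unparseable date: keep the backoff fallback
        time.sleep(wait_seconds)
```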
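In JavaScript, the same Retry-After parsing, covering both the seconds form and the HTTP-date form, might look like this. parseRetryAfter is a hypothetical helper; it returns milliseconds to wait, or null when the header is missing or unparseable so the caller can fall back to exponential backoff.

```javascript
// Converts a Retry-After header value into milliseconds to wait.
// Accepts delay-seconds ("120") or an HTTP date ("Wed, 21 Oct 2015 07:28:00 GMT").
// Returns null if the value is missing or unparseable.
function parseRetryAfter(value, now = Date.now()) {
  if (value == null) return null;
  const trimmed = String(value).trim();

  // delay-seconds form: a non-negative integer
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000;

  // HTTP-date form: wait until that moment, never returning a negative delay
  const date = Date.parse(trimmed);
  if (Number.isNaN(date)) return null;
  return Math.max(0, date - now);
}

const now = Date.parse('Wed, 21 Oct 2015 07:27:00 GMT');
console.log(parseRetryAfter('120', now));                           // 120000
console.log(parseRetryAfter('Wed, 21 Oct 2015 07:28:00 GMT', now)); // 60000
console.log(parseRetryAfter('soon', now));                          // null
```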
Implement request queuing
Queue requests and process them at a controlled rate. This proactive approach prevents 429 errors instead of reacting to them:
```javascript
class RateLimitedQueue {
  constructor(requestsPerSecond) {
    this.minInterval = 1000 / requestsPerSecond; // ms between request starts
    this.queue = [];
    this.lastRequestTime = 0;
    this.processing = false; // prevents overlapping processing loops and stray timers
  }

  enqueue(requestFn) {
    return new Promise((resolve, reject) => {
      this.queue.push({ requestFn, resolve, reject });
      this.processQueue();
    });
  }

  async processQueue() {
    if (this.processing) return;
    this.processing = true;
    while (this.queue.length > 0) {
      // Wait until minInterval has passed since the previous request started
      const wait = this.lastRequestTime + this.minInterval - Date.now();
      if (wait > 0) await new Promise(resolve => setTimeout(resolve, wait));

      const { requestFn, resolve, reject } = this.queue.shift();
      this.lastRequestTime = Date.now();
      try {
        resolve(await requestFn());
      } catch (error) {
        reject(error);
      }
    }
    this.processing = false;
  }
}
```
Batch requests when possible
Many APIs offer batch endpoints. Instead of 100 individual calls, make one batch request:
```javascript
// Instead of one API call per user...
for (const userId of userIds) {
  await fetch(`https://api.example.com/users/${userId}`);
}

// ...send a single batch request
await fetch('https://api.example.com/users/batch', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ user_ids: userIds })
});
```
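Batch endpoints often cap how many items a single call may contain, so a large ID list may still need splitting. A minimal sketch, assuming a hypothetical maximum of 100 IDs per call and the same example /users/batch endpoint; chunk and fetchUsersInBatches are illustrative names, and the real batch size should come from the API's documentation.

```javascript
// Splits an array into consecutive chunks of at most `size` items.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) throw new RangeError('size must be a positive integer');
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Hypothetical limit: assume the batch endpoint accepts at most 100 IDs per call
const MAX_BATCH_SIZE = 100;

// 1,000 user IDs become 10 requests instead of 1,000
async function fetchUsersInBatches(userIds) {
  const results = [];
  for (const ids of chunk(userIds, MAX_BATCH_SIZE)) {
    // One request per chunk, sent sequentially to avoid bursts
    const response = await fetch('https://api.example.com/users/batch', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ user_ids: ids }),
    });
    results.push(await response.json());
  }
  return results;
}
```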
Cache responses appropriately
Caching reduces API calls for data that doesn’t change frequently:
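```javascript
const cache = new Map();
const CACHE_DURATION = 5 * 60 * 1000; // 5 minutes

async function fetchWithCache(url) {
  const cached = cache.get(url);
  if (cached && Date.now() - cached.timestamp < CACHE_DURATION) {
    return cached.data; // fresh cache hit: no API call
  }
  const response = await fetch(url);
  if (!response.ok) {
    // Don't cache error responses such as 429; let the caller retry later
    throw new Error(`Request failed with status ${response.status}`);
  }
  const data = await response.json();
  cache.set(url, { data, timestamp: Date.now() });
  return data;
}
```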
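One gap in the cache above is that two calls for the same URL made at the same moment both miss and both hit the API. Caching the pending promise instead of the parsed data closes that gap. This is a variation rather than part of the original example: createPromiseCache is a hypothetical name, the fetcher is injectable so the logic can be tested without a network, and failed requests are evicted so they can be retried.

```javascript
// Caches the *promise* for each key, so concurrent callers share one in-flight request.
// `fetcher` is any async function of the key; entries older than `ttlMs` are refetched.
function createPromiseCache(fetcher, ttlMs = 5 * 60 * 1000, now = Date.now) {
  const cache = new Map(); // key -> { promise, timestamp }

  return function get(key) {
    const cached = cache.get(key);
    if (cached && now() - cached.timestamp < ttlMs) {
      return cached.promise; // fresh result or still in flight: reuse it
    }
    const promise = fetcher(key).catch((error) => {
      // Don't keep failures cached (only evict if this promise is still the cached one)
      if (cache.get(key)?.promise === promise) cache.delete(key);
      throw error;
    });
    cache.set(key, { promise, timestamp: now() });
    return promise;
  };
}

// Usage with fetch (example URL from this article):
// const getJson = createPromiseCache(async (url) => {
//   const response = await fetch(url);
//   if (!response.ok) throw new Error(`HTTP ${response.status}`);
//   return response.json();
// });
// const [a, b] = await Promise.all([getJson(url), getJson(url)]); // one API call
```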
Preventing 429 errors
Implement proactive rate limiting
For organizations managing APIs at scale, rate limiting serves as both traffic control and a key component of a larger API governance and observability strategy, ensuring fair usage and consistent reliability across teams.
Client-side rate limiting prevents sending requests too quickly:
```javascript
class RateLimiter {
  constructor(maxRequests, windowMs) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
    this.requests = []; // timestamps of requests in the current sliding window
  }

  async acquire() {
    const now = Date.now();
    // Drop timestamps that have left the window
    this.requests = this.requests.filter(time => now - time < this.windowMs);
    if (this.requests.length >= this.maxRequests) {
      // Wait until the oldest request leaves the window, then check again
      const oldestRequest = this.requests[0];
      const waitTime = this.windowMs - (now - oldestRequest);
      await new Promise(resolve => setTimeout(resolve, waitTime));
      return this.acquire();
    }
    this.requests.push(now);
  }
}

// At most 100 requests in any rolling 60-second window
const limiter = new RateLimiter(100, 60000);

async function makeRequest(url) {
  await limiter.acquire();
  return fetch(url);
}
```
If you run an API-driven site on WordPress, use a shared hosting provider, or serve traffic through a CDN, your hosting plan may impose its own request limits, so you can hit 429 errors sooner than the API's published limits alone would suggest.
Monitor your usage
Track request counts and set up alerts at 80% of rate limits:
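```javascript
class RateLimitMonitor {
  constructor(limit, windowMs, alertThreshold = 0.8) {
    this.limit = limit;
    this.windowMs = windowMs; // should match the API's rate-limit window
    this.alertThreshold = alertThreshold;
    this.windowStart = Date.now();
    this.requestCount = 0;
  }

  recordRequest() {
    // Start a new count once the window has elapsed, so usage doesn't grow forever
    if (Date.now() - this.windowStart >= this.windowMs) {
      this.windowStart = Date.now();
      this.requestCount = 0;
    }
    this.requestCount++;
    const usage = this.requestCount / this.limit;
    if (usage >= this.alertThreshold) {
      console.warn(`Rate limit usage: ${(usage * 100).toFixed(1)}%`);
    }
  }
}
```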
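A local counter can drift from the server's view, for example when several processes share one API key. Where the API returns quota headers, a check can read them directly after every response. A minimal sketch, assuming X-RateLimit-Limit and X-RateLimit-Remaining headers as in the example response earlier; checkRateLimitUsage is a hypothetical name and the 80% default mirrors the threshold above.

```javascript
// Computes quota usage from response headers and warns past a threshold.
// Returns the fraction used (0 to 1), or null if the headers are missing or invalid.
function checkRateLimitUsage(headers, alertThreshold = 0.8, warn = console.warn) {
  const limit = Number.parseInt(headers.get('X-RateLimit-Limit'), 10);
  const remaining = Number.parseInt(headers.get('X-RateLimit-Remaining'), 10);
  if (!Number.isFinite(limit) || !Number.isFinite(remaining) || limit <= 0) return null;

  const used = (limit - remaining) / limit;
  if (used >= alertThreshold) {
    warn(`Rate limit usage: ${(used * 100).toFixed(1)}% (${remaining}/${limit} remaining)`);
  }
  return used;
}

// Usage: call after each response, e.g. checkRateLimitUsage(response.headers)
const usage = checkRateLimitUsage(new Headers({
  'X-RateLimit-Limit': '100',
  'X-RateLimit-Remaining': '15',
}));
console.log(usage); // 0.85 (and a warning is printed)
```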
Rate limiting in the real world
Different APIs communicate rate limits in different ways. Some use headers like Retry-After or X-RateLimit-Remaining while others document their limits in dashboards or developer portals. The key is to detect these signals and build your client logic around them. For instance, you can inspect the headers in your test calls within Postman to understand how the API enforces throttling.
Major platforms such as Microsoft Graph, GitHub, and OpenAI’s ChatGPT API implement strict rate limiting to ensure fair access.
Simulating rate limits before deployment
Rate limits, like tests and monitors, are part of a healthy API lifecycle. You can use Postman to better understand, simulate, and improve those limits before your APIs reach production.
Create a collection with test scripts that handle 429 responses:
```javascript
pm.test('Handle rate limit responses', function () {
  if (pm.response.code === 429) {
    const retryAfter = pm.response.headers.get('Retry-After') || 5;
    // Environment values may come back as strings, so convert before incrementing
    const retryCount = Number(pm.environment.get('retryCount')) || 0;
    console.log(`Rate limited (attempt ${retryCount + 1}), Retry-After: ${retryAfter}s`);
    pm.environment.set('retryCount', retryCount + 1);
  } else {
    pm.response.to.have.status(200);
    pm.environment.set('retryCount', 0);
  }
});

// Log the remaining quota on every response
const limit = pm.response.headers.get('X-RateLimit-Limit');
const remaining = pm.response.headers.get('X-RateLimit-Remaining');
console.log(`Rate limit: ${remaining}/${limit} remaining`);
```
Use Collection Runner with high iteration counts to trigger rate limits. Set the delay between requests to 0ms to create burst traffic that tests your retry logic.
Common mistakes to avoid
Ignoring Retry-After headers
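```javascript
// Assumes a helper such as: const sleep = ms => new Promise(r => setTimeout(r, ms));

// Wrong: a fixed delay that ignores the server's guidance
if (response.status === 429) {
  await sleep(5000);
}

// Correct: honor Retry-After (in seconds), falling back to 60s if it's absent or not a number
if (response.status === 429) {
  const retryAfter = Number.parseInt(response.headers.get('Retry-After'), 10);
  await sleep((Number.isNaN(retryAfter) ? 60 : retryAfter) * 1000);
}
```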
Implementing aggressive retry logic
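```javascript
// Wrong - creates a retry storm: up to 100 immediate retries on any error
for (let i = 0; i < 100; i++) {
  try {
    return await makeRequest();
  } catch (error) {
    continue;
  }
}

// Correct - a few retries with exponential backoff, only for 429
// (assumes makeRequest throws an error with a `status` property on HTTP errors)
const maxAttempts = 5;
for (let i = 0; i < maxAttempts; i++) {
  try {
    return await makeRequest();
  } catch (error) {
    if (error.status !== 429 || i === maxAttempts - 1) {
      throw error; // not a rate limit, or out of attempts
    }
    await sleep(Math.pow(2, i) * 1000);
  }
}
```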
Not monitoring usage
Track rate limit headers proactively. Don’t wait until hitting 429 errors to discover problems.
Retrying non-idempotent operations
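POST and PATCH requests are not guaranteed to be idempotent, so a retry could apply the same change twice. If the API supports idempotency keys, generate one key per logical operation and send the same key with every retry so the server can deduplicate: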
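```javascript
// retryWithBackoff is a hypothetical helper that calls the given function
// and retries it with exponential backoff on 429 responses.
async function retryRequest(method, url, data) {
  if (method === 'GET') {
    return retryWithBackoff(() => fetch(url));
  }

  // Generate the key once so every retry carries the same value
  // (crypto.randomUUID() is available in modern browsers and recent Node.js versions)
  const idempotencyKey = crypto.randomUUID();
  return retryWithBackoff(() =>
    fetch(url, {
      method,
      headers: {
        'Idempotency-Key': idempotencyKey,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify(data)
    })
  );
}
```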
Final thoughts
HTTP 429 Too Many Requests errors don’t have to cause application failures. Understanding rate-limiting mechanics, implementing exponential backoff with appropriate retry logic, and proactively monitoring your usage result in resilient integrations that handle throttling gracefully.
The essential practices for a reliable API integration are respecting Retry-After headers, using exponential backoff with jitter, queuing requests to control the rate in advance, batching operations when possible, caching responses appropriately, and monitoring usage patterns before you reach limits.
By testing rate limit scenarios directly in Postman, whether manually or as part of your CI/CD pipeline, you can catch throttling risks before users do. It’s another way to make sure your APIs are not only functional, but also resilient and ready for production.
