Configure multiple API keys per provider so the gateway automatically fails over to the next key when one hits a rate limit or encounters an error.
Basic example
on_http_request:
  - type: ai-gateway
    config:
      providers:
        - id: openai
          api_keys:
            - value: ${secrets.get('openai', 'key-one')}
            - value: ${secrets.get('openai', 'key-two')}
How it works
When a request fails with the first key, the gateway automatically tries the next key:
1. Request arrives for gpt-4o
2. Try with openai/key-one → 429 Rate Limit
3. Automatically retry with openai/key-two → Success ✓
Keys are tried in the order they’re listed. Put your highest-capacity or preferred keys first.
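The ordered failover described above can be sketched in a few lines of Python. This is only an illustration of the behavior, not the gateway's implementation: `call_with_failover`, `fake_send`, and the `RateLimited` exception are hypothetical names introduced for the example.

```python
from typing import Callable, Optional, Sequence


class RateLimited(Exception):
    """Stand-in for a 429 response from the provider (hypothetical)."""


def call_with_failover(keys: Sequence[str], send: Callable[[str], str]) -> str:
    """Try each key in listed order and return the first successful response."""
    last_error: Optional[Exception] = None
    for key in keys:
        try:
            return send(key)
        except RateLimited as err:
            last_error = err  # remember the failure and move on to the next key
    raise RuntimeError("all keys failed") from last_error


def fake_send(key: str) -> str:
    # Hypothetical upstream: key-one is rate limited, key-two works.
    if key == "key-one":
        raise RateLimited("429 Rate Limit")
    return f"ok via {key}"


print(call_with_failover(["key-one", "key-two"], fake_send))  # ok via key-two
```

Because the loop walks the list front to back, the first key absorbs all traffic until it fails, which is why the preferred key belongs at the top.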
Benefits
- No downtime when hitting rate limits
- Automatic failover without manual intervention
- Load distribution across multiple billing accounts
- Increased capacity by combining quotas
Three-key failover
Add more keys for additional resilience:
providers:
  - id: openai
    api_keys:
      - value: ${secrets.get('openai', 'team-a-key')}
      - value: ${secrets.get('openai', 'team-b-key')}
      - value: ${secrets.get('openai', 'backup-key')}
Failover sequence:
1. Try team-a-key → Rate limited
2. Try team-b-key → Timeout
3. Try backup-key → Success ✓
Multiple providers with multiple keys
Combine multi-key and multi-provider failover for maximum resilience:
providers:
  - id: openai
    api_keys:
      - value: ${secrets.get('openai', 'key-one')}
      - value: ${secrets.get('openai', 'key-two')}
  - id: anthropic
    api_keys:
      - value: ${secrets.get('anthropic', 'key-one')}
      - value: ${secrets.get('anthropic', 'key-two')}
Failover cascade:
1. openai/key-one → Fails
2. openai/key-two → Fails
3. anthropic/key-one → Success ✓
For cross-provider failover, clients must specify multiple models in the request or use a model selection strategy.
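To make the cascade order concrete, the sketch below flattens a provider list into the sequence of attempts shown above. It assumes every key of one provider is exhausted before the next provider is tried, as the cascade suggests. The real order across providers also depends on the models the client requests or on the model selection strategy, which this example does not model. `failover_order` is a hypothetical helper.

```python
from typing import Dict, List, Tuple


def failover_order(providers: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Flatten providers into (provider, key) attempts, one provider at a time."""
    return [
        (provider, key)
        for provider, keys in providers.items()  # dicts keep insertion order
        for key in keys
    ]


attempts = failover_order({
    "openai": ["key-one", "key-two"],
    "anthropic": ["key-one", "key-two"],
})
for provider, key in attempts:
    print(f"{provider}/{key}")
```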
Real-world scenario
High-traffic production application:
on_http_request:
  - type: ai-gateway
    config:
      per_request_timeout: "30s"
      total_timeout: "2m"
      providers:
        - id: openai
          api_keys:
            # Production keys with high quotas
            - value: ${secrets.get('openai', 'prod-key-1')}
            - value: ${secrets.get('openai', 'prod-key-2')}
            - value: ${secrets.get('openai', 'prod-key-3')}
            # Backup key for emergencies
            - value: ${secrets.get('openai', 'emergency-key')}
This provides:
- 4 failover keys for OpenAI
- Up to 2 minutes of retry attempts
- Automatic key rotation on failures
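A quick arithmetic check shows how the two timeouts interact with the number of keys. The sketch assumes that `per_request_timeout` bounds each individual attempt and `total_timeout` bounds the sum of all attempts. Those semantics are inferred from the setting names and the "up to 2 minutes" note, and `max_full_timeouts` is a hypothetical helper.

```python
def max_full_timeouts(per_request_timeout_s: int, total_timeout_s: int) -> int:
    """Number of attempts that can each run to their full timeout within the total budget."""
    return total_timeout_s // per_request_timeout_s


per_request_s = 30   # per_request_timeout: "30s"
total_s = 120        # total_timeout: "2m"
key_count = 4        # three production keys plus one emergency key

worst_case_attempts = max_full_timeouts(per_request_s, total_s)
print(worst_case_attempts)                  # 4
print(min(worst_case_attempts, key_count))  # 4
```

Under these assumptions, even if every attempt runs to its full 30 seconds, the 2-minute budget still leaves room to reach the emergency key. A longer per-request timeout or a fifth key would not fit without raising `total_timeout`.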
Intelligent key selection
For smarter key selection based on runtime metrics, use api_key_selection:
on_http_request:
  - type: ai-gateway
    config:
      providers:
        - id: openai
          api_keys:
            - value: ${secrets.get('openai', 'key-one')}
            - value: ${secrets.get('openai', 'key-two')}
            - value: ${secrets.get('openai', 'key-three')}
      api_key_selection:
        strategy:
          # Prefer keys with remaining quota
          - "ai.keys.filter(k, k.quota.remaining_requests > 100)"
          # Fall back to all keys
          - "ai.keys"
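The strategy list reads as a tiered fallback: each expression narrows the key set, and a later entry only matters when the earlier ones match nothing. The Python sketch below mimics that reading, assuming the first expression that yields a non-empty list wins. `Key`, `select_candidates`, and the sample quota values are hypothetical and not part of the gateway.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Key:
    name: str
    remaining_requests: int  # stand-in for k.quota.remaining_requests


Strategy = Callable[[Sequence[Key]], List[Key]]


def select_candidates(keys: Sequence[Key], strategies: Sequence[Strategy]) -> List[Key]:
    """Return the result of the first strategy that matches at least one key."""
    for strategy in strategies:
        candidates = strategy(keys)
        if candidates:
            return candidates
    return []


strategies: List[Strategy] = [
    lambda ks: [k for k in ks if k.remaining_requests > 100],  # prefer keys with quota
    lambda ks: list(ks),                                       # fall back to all keys
]

keys = [Key("key-one", 20), Key("key-two", 350), Key("key-three", 80)]
print([k.name for k in select_candidates(keys, strategies)])  # ['key-two']
```

Ending the list with an unfiltered entry, as the config does with `ai.keys`, guarantees that some key is always available even when every key is low on quota.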
Quota-aware selection
Route to keys with the most remaining capacity:
api_key_selection:
  strategy:
    # Keys with plenty of quota
    - "ai.keys.filter(k, k.quota.remaining_requests > 500)"
    # Keys with some quota
    - "ai.keys.filter(k, k.quota.remaining_requests > 50)"
    # Fall back to all keys
    - "ai.keys"
Error rate-aware selection
Avoid keys that are hitting rate limits:
api_key_selection:
  strategy:
    # Keys with low rate limit errors
    - "ai.keys.filter(k, k.error_rate.rate_limit < 0.05)"
    # Keys with acceptable overall error rates
    - "ai.keys.filter(k, k.error_rate.total < 0.2)"
    # Fall back to all keys
    - "ai.keys"
Combined strategy
Use both quota and error rate for optimal selection:
api_key_selection:
  strategy:
    # Best keys: high quota AND low errors
    - "ai.keys.filter(k, k.quota.remaining_requests > 500 && k.error_rate.rate_limit < 0.05)"
    # Good keys: decent quota
    - "ai.keys.filter(k, k.quota.remaining_requests > 100)"
    # Acceptable keys: low errors
    - "ai.keys.filter(k, k.error_rate.total < 0.3)"
    # Fall back to all keys
    - "ai.keys"
Load distribution
Randomize selection to spread load across keys:
api_key_selection:
  strategy:
    # Randomly select from healthy keys
    - "ai.keys.filter(k, k.quota.remaining_requests > 100).randomize()"
    # Fall back to any key
    - "ai.keys.randomize()"
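The effect of randomizing can be seen in a small simulation. It assumes that `randomize()` shuffles the filtered list and that the first key in the shuffled list is tried first. `randomized_healthy` and the quota numbers are hypothetical.

```python
import random
from typing import Dict, List


def randomized_healthy(quota: Dict[str, int], threshold: int, rng: random.Random) -> List[str]:
    """Shuffle the keys above the quota threshold, or all keys if none qualify."""
    healthy = [name for name, remaining in quota.items() if remaining > threshold]
    pool = healthy if healthy else list(quota)
    rng.shuffle(pool)
    return pool


rng = random.Random(0)  # seeded so the run is repeatable
quota = {"key-one": 500, "key-two": 300, "key-three": 40}

first_picks = {name: 0 for name in quota}
for _ in range(1000):
    first_picks[randomized_healthy(quota, 100, rng)[0]] += 1

print(first_picks)
```

In this sketch the two healthy keys share the first attempts roughly evenly, while `key-three` is never tried first because it falls below the threshold.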
Best practices
- Use at least 2-3 keys per provider for reliability
- Order keys by capacity, with the highest-quota key first
- Use different billing accounts for true isolation
- Monitor usage to identify when keys need rotation
- Rotate keys regularly for security