Configure multiple API keys per provider so the gateway automatically fails over when a key hits a rate limit or encounters an error.
This example uses Bring Your Own Keys (BYOK) with your own provider keys. If you’re using AI Gateway API Keys, ngrok manages provider key failover automatically.

Basic example

on_http_request:
  - type: ai-gateway
    config:
      providers:
        - id: openai
          api_keys:
            - value: ${secrets.get('openai', 'key-one')}
            - value: ${secrets.get('openai', 'key-two')}

How it works

When a request fails with the first key, the gateway automatically tries the next key:
1. Request arrives for gpt-4o
2. Try with openai/key-one → 429 Rate Limit
3. Automatically retry with openai/key-two → Success ✓
Keys are tried in the order they’re listed. Put your highest-capacity or preferred keys first.
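The loop above can be sketched in a few lines of Python. This is an illustrative model of the behavior, not the gateway's implementation; the names (`try_with_failover`, `RETRIABLE`) and the exact set of retriable statuses are assumptions:

```python
from typing import Callable, List

RETRIABLE = {429, 500, 502, 503, 504}  # statuses that trigger failover to the next key

def try_with_failover(keys: List[str], send: Callable[[str], int]) -> str:
    """Try keys in listed order; return the first one whose request succeeds."""
    last = None
    for key in keys:
        status = send(key)             # issue the upstream request with this key
        if status == 200:
            return key                 # success: stop failing over
        if status not in RETRIABLE:
            raise RuntimeError(f"non-retriable error {status}")
        last = status                  # rate limited or transient error: next key
    raise RuntimeError(f"all keys exhausted (last status {last})")
```

For example, if `key-one` returns a 429 and `key-two` returns a 200, `try_with_failover(["key-one", "key-two"], ...)` returns `"key-two"`, matching the sequence shown above.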

Benefits

  • No downtime when hitting rate limits
  • Automatic failover without manual intervention
  • Load distribution across multiple billing accounts
  • Increased capacity by combining quotas

Three-key failover

Add more keys for additional resilience:
providers:
  - id: openai
    api_keys:
      - value: ${secrets.get('openai', 'team-a-key')}
      - value: ${secrets.get('openai', 'team-b-key')}
      - value: ${secrets.get('openai', 'backup-key')}
Failover sequence:
1. Try team-a-key → Rate limited
2. Try team-b-key → Timeout
3. Try backup-key → Success ✓

Multiple providers with multiple keys

Combine multi-key and multi-provider failover for maximum resilience:
on_http_request:
  - type: ai-gateway
    config:
      only_allow_configured_providers: true

      providers:
        - id: openai
          api_keys:
            - value: ${secrets.get('openai', 'key-one')}
            - value: ${secrets.get('openai', 'key-two')}
        
        - id: anthropic
          api_keys:
            - value: ${secrets.get('anthropic', 'key-one')}
            - value: ${secrets.get('anthropic', 'key-two')}
      
      model_selection:
        strategy:
          - "ai.models.filter(m, m.provider_id == 'openai')"
          - "ai.models.filter(m, m.provider_id == 'anthropic')"
Failover cascade:
1. openai/key-one → Fails
2. openai/key-two → Fails
3. anthropic/key-one → Success ✓
For cross-provider failover, clients must specify multiple models in the request or use a model selection strategy. Providers are tried in alphabetical order by default; use model_selection.strategy to specify a custom order.
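The cascade is the same loop nested one level deeper: every key of a provider is exhausted before the next provider is tried. A minimal sketch, with illustrative names that are not the gateway's API:

```python
from typing import Callable, List, Tuple

def cascade(providers: List[Tuple[str, List[str]]],
            ok: Callable[[str, str], bool]) -> Tuple[str, str]:
    """Try every key of each provider in order; return the first
    (provider, key) pair whose request succeeds."""
    for provider, keys in providers:   # providers in strategy order
        for key in keys:               # keys in listed order
            if ok(provider, key):
                return provider, key
    raise RuntimeError("all providers and keys exhausted")
```

With both openai keys failing and `anthropic/key-one` succeeding, `cascade` returns `("anthropic", "key-one")`, matching the sequence above.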

Real-world scenario

High-traffic production application:
on_http_request:
  - type: ai-gateway
    config:
      per_request_timeout: "30s"
      total_timeout: "2m"
      
      providers:
        - id: openai
          api_keys:
            # Production keys with high quotas
            - value: ${secrets.get('openai', 'prod-key-1')}
            - value: ${secrets.get('openai', 'prod-key-2')}
            - value: ${secrets.get('openai', 'prod-key-3')}
            # Backup key for emergencies
            - value: ${secrets.get('openai', 'emergency-key')}
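With `per_request_timeout: "30s"` and `total_timeout: "2m"`, at most four worst-case attempts fit in the budget, and the final attempt is capped by whatever time remains. A rough sketch of that deadline accounting (hypothetical names, not the gateway's implementation):

```python
import time
from typing import Callable, List

def failover_with_deadline(keys: List[str],
                           send: Callable[..., int],
                           per_request: float = 30.0,
                           total: float = 120.0) -> str:
    """Try keys in order, capping each attempt at per_request seconds
    and the whole failover sequence at total seconds."""
    deadline = time.monotonic() + total
    for key in keys:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError("total_timeout exhausted before all keys were tried")
        # each attempt gets the smaller of its own timeout and what's left
        if send(key, timeout=min(per_request, remaining)) == 200:
            return key
    raise RuntimeError("all keys failed within the budget")
```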

Advanced key selection

For smarter key selection based on runtime metrics, use api_key_selection:
on_http_request:
  - type: ai-gateway
    config:
      providers:
        - id: openai
          api_keys:
            - value: ${secrets.get('openai', 'key-one')}
            - value: ${secrets.get('openai', 'key-two')}
            - value: ${secrets.get('openai', 'key-three')}
      
      api_key_selection:
        strategy:
          # Prefer keys with remaining quota
          - "ai.keys.filter(k, k.quota.remaining_requests > 100)"
          # Fall back to all keys
          - "ai.keys"
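Each strategy entry acts as a tiered filter: the gateway draws candidates from the first expression that matches any key, and later entries are only consulted when earlier ones come up empty. A sketch of that evaluation order, with keys modeled as plain dicts (illustrative, not the gateway's data model):

```python
from typing import Callable, Dict, List

Key = Dict[str, float]

def select_keys(keys: List[Key],
                filters: List[Callable[[Key], bool]]) -> List[Key]:
    """Return the candidate set from the first filter that matches any key."""
    for keep in filters:
        subset = [k for k in keys if keep(k)]
        if subset:
            return subset              # this tier has candidates: use them
    return list(keys)                  # no filter matched: fall back to every key
```

With one key at 40 remaining requests and another at 900, the filter `remaining_requests > 100` selects only the second; ending the list with a catch-all (like `"ai.keys"` above) guarantees a non-empty result.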

Quota-aware selection

Route to keys with the most remaining capacity:
api_key_selection:
  strategy:
    # Keys with plenty of quota
    - "ai.keys.filter(k, k.quota.remaining_requests > 500)"
    # Keys with some quota
    - "ai.keys.filter(k, k.quota.remaining_requests > 50)"
    # Fall back to all keys
    - "ai.keys"

Error rate-aware selection

Avoid keys that are hitting rate limits:
api_key_selection:
  strategy:
    # Keys with low rate limit errors
    - "ai.keys.filter(k, k.error_rate.rate_limit < 0.05)"
    # Keys with acceptable overall error rates
    - "ai.keys.filter(k, k.error_rate.total < 0.2)"
    # Fall back to all keys
    - "ai.keys"

Combined strategy

Use both quota and error rate for optimal selection:
api_key_selection:
  strategy:
    # Best keys: high quota AND low errors
    - "ai.keys.filter(k, k.quota.remaining_requests > 500 && k.error_rate.rate_limit < 0.05)"
    # Good keys: decent quota
    - "ai.keys.filter(k, k.quota.remaining_requests > 100)"
    # Acceptable keys: low errors
    - "ai.keys.filter(k, k.error_rate.total < 0.3)"
    # Fall back to all keys
    - "ai.keys"

Load distribution

Randomize selection to spread load across keys:
api_key_selection:
  strategy:
    # Randomly select from healthy keys
    - "ai.keys.filter(k, k.quota.remaining_requests > 100).randomize()"
    # Fall back to any key
    - "ai.keys.randomize()"
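The effect of `.randomize()` on a filtered tier can be modeled as a random choice from the healthy subset, falling back to the full key list when no key passes the filter (names here are illustrative):

```python
import random
from typing import Callable, List

def pick_key(keys: List[str],
             healthy: Callable[[str], bool],
             rng: random.Random = random) -> str:
    """Pick a random key from the healthy subset; fall back to any key."""
    pool = [k for k in keys if healthy(k)] or list(keys)
    return rng.choice(pool)
```

Over many requests this spreads load roughly evenly across the healthy keys instead of always hammering the first one in the list.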

Best practices

  1. Use at least 2-3 keys per provider for reliability
  2. Order keys by capacity: highest quota first
  3. Use different billing accounts for true isolation
  4. Monitor usage to identify when keys need rotation
  5. Rotate keys regularly for security

See also