Configure multiple API keys per provider so the gateway automatically fails over when a key hits a rate limit or encounters an error.
This example uses Bring Your Own Keys (BYOK) with your own provider keys. If you’re using AI Gateway API Keys, ngrok manages provider key failover automatically.

Basic example

on_http_request:
  - type: ai-gateway
    config:
      providers:
        - id: openai
          api_keys:
            - value: ${secrets.get('openai', 'key-one')}
            - value: ${secrets.get('openai', 'key-two')}

How it works

When a request fails with the first key, the gateway automatically tries the next key:
1. Request arrives for gpt-4o
2. Try with openai/key-one → 429 Rate Limit
3. Automatically retry with openai/key-two → Success ✓
Keys are tried in the order they’re listed. Put your highest-capacity or preferred keys first.
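The loop above can be sketched in a few lines of Python. This is an illustrative model of the behavior, not the gateway's implementation; the names (`try_with_failover`, `RETRIABLE`) and the exact set of retriable statuses are assumptions:

```python
from typing import Callable, List

RETRIABLE = {429, 500, 502, 503, 504}  # statuses that trigger failover to the next key

def try_with_failover(keys: List[str], send: Callable[[str], int]) -> str:
    """Try keys in listed order; return the first one whose request succeeds."""
    last = None
    for key in keys:
        status = send(key)             # issue the upstream request with this key
        if status == 200:
            return key                 # success: stop failing over
        if status not in RETRIABLE:
            raise RuntimeError(f"non-retriable error {status}")
        last = status                  # rate limited or transient error: next key
    raise RuntimeError(f"all keys exhausted (last status {last})")
```

For example, if `key-one` returns a 429 and `key-two` returns a 200, `try_with_failover(["key-one", "key-two"], ...)` returns `"key-two"`, matching the sequence shown above.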

Benefits

  • No downtime when hitting rate limits
  • Automatic failover without manual intervention
  • Load distribution across multiple billing accounts
  • Increased capacity by combining quotas

Three-key failover

Add more keys for additional resilience:
providers:
  - id: openai
    api_keys:
      - value: ${secrets.get('openai', 'team-a-key')}
      - value: ${secrets.get('openai', 'team-b-key')}
      - value: ${secrets.get('openai', 'backup-key')}
Failover sequence:
1. Try team-a-key → Rate limited
2. Try team-b-key → Timeout
3. Try backup-key → Success ✓

Multiple providers with multiple keys

Combine multi-key and multi-provider failover for maximum resilience:
on_http_request:
  - type: ai-gateway
    config:
      only_allow_configured_providers: true

      providers:
        - id: openai
          api_keys:
            - value: ${secrets.get('openai', 'key-one')}
            - value: ${secrets.get('openai', 'key-two')}
        
        - id: anthropic
          api_keys:
            - value: ${secrets.get('anthropic', 'key-one')}
            - value: ${secrets.get('anthropic', 'key-two')}
      
      model_selection:
        strategy:
          - "ai.models.filter(m, m.provider_id == 'openai')"
          - "ai.models.filter(m, m.provider_id == 'anthropic')"
Failover cascade:
1. openai/key-one → Fails
2. openai/key-two → Fails
3. anthropic/key-one → Success ✓
For cross-provider failover, clients must specify multiple models in the request or use a model selection strategy. Providers are tried in alphabetical order by default; use model_selection.strategy to specify a custom order.
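The cascade is the same loop nested one level deeper: every key of a provider is exhausted before the next provider is tried. A minimal sketch, with illustrative names that are not the gateway's API:

```python
from typing import Callable, List, Tuple

def cascade(providers: List[Tuple[str, List[str]]],
            ok: Callable[[str, str], bool]) -> Tuple[str, str]:
    """Try every key of each provider in order; return the first
    (provider, key) pair whose request succeeds."""
    for provider, keys in providers:   # providers in strategy order
        for key in keys:               # keys in listed order
            if ok(provider, key):
                return provider, key
    raise RuntimeError("all providers and keys exhausted")
```

With both openai keys failing and `anthropic/key-one` succeeding, `cascade` returns `("anthropic", "key-one")`, matching the sequence above.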

Real-world scenario

High-traffic production application:
on_http_request:
  - type: ai-gateway
    config:
      per_request_timeout: "30s"
      total_timeout: "2m"
      
      providers:
        - id: openai
          api_keys:
            # Production keys with high quotas
            - value: ${secrets.get('openai', 'prod-key-1')}
            - value: ${secrets.get('openai', 'prod-key-2')}
            - value: ${secrets.get('openai', 'prod-key-3')}
            # Backup key for emergencies
            - value: ${secrets.get('openai', 'emergency-key')}
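With `per_request_timeout: "30s"` and `total_timeout: "2m"`, at most four worst-case attempts fit in the budget, and the final attempt is capped by whatever time remains. A rough sketch of that deadline accounting (hypothetical names, not the gateway's implementation):

```python
import time
from typing import Callable, List

def failover_with_deadline(keys: List[str],
                           send: Callable[..., int],
                           per_request: float = 30.0,
                           total: float = 120.0) -> str:
    """Try keys in order, capping each attempt at per_request seconds
    and the whole failover sequence at total seconds."""
    deadline = time.monotonic() + total
    for key in keys:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError("total_timeout exhausted before all keys were tried")
        # each attempt gets the smaller of its own timeout and what's left
        if send(key, timeout=min(per_request, remaining)) == 200:
            return key
    raise RuntimeError("all keys failed within the budget")
```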

Advanced key selection

For smarter key selection based on runtime metrics, use api_key_selection:
on_http_request:
  - type: ai-gateway
    config:
      providers:
        - id: openai
          api_keys:
            - value: ${secrets.get('openai', 'key-one')}
            - value: ${secrets.get('openai', 'key-two')}
            - value: ${secrets.get('openai', 'key-three')}
      
      api_key_selection:
        strategy:
          # Prefer keys with remaining quota
          - "ai.keys.filter(k, k.quota.remaining_requests > 100)"
          # Fall back to all keys
          - "ai.keys"
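Each strategy entry acts as a tiered filter: the gateway draws candidates from the first expression that matches any key, and later entries are only consulted when earlier ones come up empty. A sketch of that evaluation order, with keys modeled as plain dicts (illustrative, not the gateway's data model):

```python
from typing import Callable, Dict, List

Key = Dict[str, float]

def select_keys(keys: List[Key],
                filters: List[Callable[[Key], bool]]) -> List[Key]:
    """Return the candidate set from the first filter that matches any key."""
    for keep in filters:
        subset = [k for k in keys if keep(k)]
        if subset:
            return subset              # this tier has candidates: use them
    return list(keys)                  # no filter matched: fall back to every key
```

With one key at 40 remaining requests and another at 900, the filter `remaining_requests > 100` selects only the second; ending the list with a catch-all (like `"ai.keys"` above) guarantees a non-empty result.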

Quota-aware selection

Route to keys with the most remaining capacity:
api_key_selection:
  strategy:
    # Keys with plenty of quota
    - "ai.keys.filter(k, k.quota.remaining_requests > 500)"
    # Keys with some quota
    - "ai.keys.filter(k, k.quota.remaining_requests > 50)"
    # Fall back to all keys
    - "ai.keys"

Error rate-aware selection

Avoid keys that are hitting rate limits:
api_key_selection:
  strategy:
    # Keys with low rate limit errors
    - "ai.keys.filter(k, k.error_rate.rate_limit < 0.05)"
    # Keys with acceptable overall error rates
    - "ai.keys.filter(k, k.error_rate.total < 0.2)"
    # Fall back to all keys
    - "ai.keys"

Combined strategy

Use both quota and error rate for optimal selection:
api_key_selection:
  strategy:
    # Best keys: high quota AND low errors
    - "ai.keys.filter(k, k.quota.remaining_requests > 500 && k.error_rate.rate_limit < 0.05)"
    # Good keys: decent quota
    - "ai.keys.filter(k, k.quota.remaining_requests > 100)"
    # Acceptable keys: low errors
    - "ai.keys.filter(k, k.error_rate.total < 0.3)"
    # Fall back to all keys
    - "ai.keys"

Load distribution

Randomize selection to spread load across keys:
api_key_selection:
  strategy:
    # Randomly select from healthy keys
    - "ai.keys.filter(k, k.quota.remaining_requests > 100).randomize()"
    # Fall back to any key
    - "ai.keys.randomize()"
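The effect of `.randomize()` on a filtered tier can be modeled as a random choice from the healthy subset, falling back to the full key list when no key passes the filter (names here are illustrative):

```python
import random
from typing import Callable, List

def pick_key(keys: List[str],
             healthy: Callable[[str], bool],
             rng: random.Random = random) -> str:
    """Pick a random key from the healthy subset; fall back to any key."""
    pool = [k for k in keys if healthy(k)] or list(keys)
    return rng.choice(pool)
```

Over many requests this spreads load roughly evenly across the healthy keys instead of always hammering the first one in the list.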

Best practices

  1. Use at least 2-3 keys per provider for reliability
  2. Order keys by capacity: highest quota first
  3. Use different billing accounts for true isolation
  4. Monitor usage to identify when keys need rotation
  5. Rotate keys regularly for security

See also