Configure multiple providers for automatic failover when your primary provider experiences issues.

Basic example

on_http_request:
  - type: ai-gateway
    config:
      providers:
        - id: openai
          api_keys:
            - value: ${secrets.get('openai', 'key-one')}
        - id: anthropic
          api_keys:
            - value: ${secrets.get('anthropic', 'key')}

How it works

When the primary provider fails, the gateway automatically tries the next provider:
1. Request arrives specifying compatible fallback models
2. Try OpenAI → Timeout
3. Automatically try Anthropic → Success ✓
Important: The client must specify compatible models for cross-provider failover to work:
const res = await openai.chat.completions.create({
  model: "gpt-4o",
  models: ["claude-3-5-sonnet-20241022"],  // Fallback models
  messages: [{ role: "user", content: "Hello!" }]
});

Three-provider setup

Add multiple providers for maximum reliability:
providers:
  - id: openai
    api_keys:
      - value: ${secrets.get('openai', 'key')}
  
  - id: anthropic
    api_keys:
      - value: ${secrets.get('anthropic', 'key')}
  
  - id: google
    api_keys:
      - value: ${secrets.get('google', 'key')}

Provider order

Providers are tried in the order they are configured, not alphabetically. Put your preferred provider first.
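For example, this minimal sketch (reusing only fields from the examples above) tries google first and anthropic second, even though "anthropic" sorts first alphabetically:
providers:
  - id: google        # tried first
    api_keys:
      - value: ${secrets.get('google', 'key')}
  
  - id: anthropic     # tried second
    api_keys:
      - value: ${secrets.get('anthropic', 'key')}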

Combining multi-key and multi-provider

Maximum resilience with multiple keys per provider:
providers:
  - id: openai
    api_keys:
      - value: ${secrets.get('openai', 'key-one')}
      - value: ${secrets.get('openai', 'key-two')}
  
  - id: anthropic
    api_keys:
      - value: ${secrets.get('anthropic', 'key-one')}
      - value: ${secrets.get('anthropic', 'key-two')}
Failover cascade:
1. openai/key-one → Rate limited
2. openai/key-two → Success ✓

If both OpenAI keys fail:
3. anthropic/key-one → Success ✓

Performance-based selection

Use model selection strategies to prefer providers based on metrics:
on_http_request:
  - type: ai-gateway
    config:
      providers:
        - id: openai
        - id: anthropic
        - id: google
      
      model_selection:
        strategy:
          # Prefer low-latency models
          - "ai.models.filter(m, m.metrics.global.latency.upstream_ms_p95 < 2000)"
          # Prefer reliable models
          - "ai.models.filter(m, m.metrics.global.error_rate.total < 0.02)"
          # Fall back to any model
          - "ai.models"

Regional failover

Use multiple providers across different regions:
providers:
  - id: openai
    api_keys:
      - value: ${secrets.get('openai', 'us-east')}
    metadata:
      region: "us-east"
  
  - id: openai-eu
    base_url: "https://eu.api.openai.com"
    id_aliases: ["openai"]
    api_keys:
      - value: ${secrets.get('openai', 'eu-west')}
    metadata:
      region: "eu-west"
  
  - id: anthropic
    api_keys:
      - value: ${secrets.get('anthropic', 'key')}
    metadata:
      region: "us-west"
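The same alias pattern extends to other providers. A sketch, assuming a hypothetical anthropic-eu deployment (the base_url below is a placeholder, not a real Anthropic endpoint):
  - id: anthropic-eu
    base_url: "https://eu.api.anthropic.com"  # hypothetical placeholder URL
    id_aliases: ["anthropic"]
    api_keys:
      - value: ${secrets.get('anthropic', 'eu-key')}
    metadata:
      region: "eu-west"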

Cost optimization

Prefer cheaper providers with fallback to premium:
providers:
  - id: deepseek      # Cost-effective primary
  - id: openai        # Premium fallback
  - id: anthropic     # Alternative premium

model_selection:
  strategy:
    # Prefer mini/turbo models
    - "ai.models.filter(m, m.id.contains('mini') || m.id.contains('turbo'))"
    # Fall back to any model
    - "ai.models"

Real-world production example

Enterprise setup with multiple providers:
on_http_request:
  - type: ai-gateway
    config:
      per_request_timeout: "45s"
      total_timeout: "3m"
      on_error: "halt"
      
      providers:
        # Primary: OpenAI with 3 keys
        - id: openai
          api_keys:
            - value: ${secrets.get('openai', 'prod-1')}
            - value: ${secrets.get('openai', 'prod-2')}
            - value: ${secrets.get('openai', 'prod-3')}
          metadata:
            tier: "primary"
        
        # Secondary: Anthropic with 2 keys
        - id: anthropic
          api_keys:
            - value: ${secrets.get('anthropic', 'prod-1')}
            - value: ${secrets.get('anthropic', 'prod-2')}
          metadata:
            tier: "secondary"
        
        # Tertiary: Google with 1 key
        - id: google
          api_keys:
            - value: ${secrets.get('google', 'prod')}
          metadata:
            tier: "tertiary"
      
      model_selection:
        strategy:
          - "ai.models.filter(m, m.metrics.global.error_rate.total < 0.05)"
          - "ai.models.filter(m, m.metrics.global.latency.upstream_ms_p95 < 3000)"
          - "ai.models"
This provides:
  • 6 total provider API keys across 3 providers
  • Automatic failover at both key and provider levels
  • Performance-based selection
  • Up to 3 minutes of retry attempts

Client configuration

For cross-provider failover, clients must specify multiple models:
// TypeScript
const res = await openai.chat.completions.create({
  model: "gpt-4o",
  models: [
    "claude-3-5-sonnet-20241022",
    "gemini-2.5-pro"
  ],
  messages: [...]
});
# Python
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    # With the openai>=1.0 SDK, the gateway's non-standard `models`
    # fallback parameter is passed through via extra_body
    extra_body={"models": ["claude-3-5-sonnet-20241022", "gemini-2.5-pro"]},
    messages=[...],
)

Best practices

  1. Configure at least 2 providers for reliability
  2. Order providers by preference (fastest/cheapest first)
  3. Use multiple keys per provider for key-level failover
  4. Monitor provider metrics to optimize order
  5. Test failover regularly to ensure it works
  6. Set appropriate timeouts to fail fast (see the sketch after this list)
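A minimal sketch of practice 6, reusing the per_request_timeout and total_timeout fields from the production example above (values are illustrative, not recommendations):
on_http_request:
  - type: ai-gateway
    config:
      per_request_timeout: "10s"   # give up on a slow provider quickly
      total_timeout: "60s"         # overall budget across all failover attempts
      providers:
        - id: openai
        - id: anthropic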

See also