When a request to a provider fails, the AI Gateway automatically attempts failover to the next available candidate. This page explains how failover works and how to configure error behavior.

Automatic failover

The gateway automatically tries the next candidate when a request fails due to:
  • Timeouts - Request exceeded per_request_timeout
  • HTTP errors - Any 4xx or 5xx response from providers
  • Connection errors - Network failures, DNS issues, TLS errors
The gateway never retries the same model/key combination. It always moves to the next candidate.
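Conceptually, the decision to fail over is a classification of the failure. The Python sketch below is purely illustrative (it is not the gateway's implementation), and the result fields it inspects are hypothetical:

# Illustrative only: a simplified view of which failures cause the gateway
# to move on to the next candidate. The field names here are hypothetical.
def should_failover(result) -> bool:
    if result.error_type in ("timeout", "connection", "dns", "tls"):
        return True   # timeouts and network-level failures
    if result.status_code is not None and result.status_code >= 400:
        return True   # any 4xx or 5xx response from the provider
    return False      # successful responses are returned to the client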

Failover order

When a request fails, the gateway follows this order:

1. Try another API key

If multiple API keys are configured for the current model’s provider, the gateway tries the next key:
providers:
  - id: "openai"
    api_keys:
      - value: ${secrets.get('openai', 'key-one')}   # Try first
      - value: ${secrets.get('openai', 'key-two')}   # Try if first fails

2. Try another model

After exhausting all keys for a model, the gateway moves to the next model candidate. Candidates come from:
  • The client’s models array in the request body
  • Model selection strategies that return multiple models
{
  "model": "gpt-4o",
  "models": ["anthropic:claude-3-5-sonnet-20241022", "google:gemini-2.0-flash"],
  "messages": [{"role": "user", "content": "Hello"}]
}
The model field is tried first, then entries in models as fallbacks.
Cross-provider failover requires the client to specify models from different providers, or model selection strategies that return candidates from multiple providers.
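From the client side, cross-provider failover only requires sending the models array shown above. As a sketch, here is one way to pass it with the OpenAI Python SDK using extra_body; the base_url is a placeholder for wherever your gateway endpoint is exposed, and the api_key value depends on your own auth setup:

from openai import OpenAI

# Point the SDK at the gateway instead of api.openai.com.
# The URL below is a placeholder; substitute your gateway endpoint.
client = OpenAI(base_url="https://your-gateway.example.com/v1", api_key="placeholder")

response = client.chat.completions.create(
    model="gpt-4o",  # tried first
    messages=[{"role": "user", "content": "Hello"}],
    # Fallback candidates from other providers, tried in order if gpt-4o fails.
    extra_body={
        "models": ["anthropic:claude-3-5-sonnet-20241022", "google:gemini-2.0-flash"]
    },
)
print(response.choices[0].message.content)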

Error behavior configuration

on_error: "halt" (default)

Stop processing and return the error to the client:
on_http_request:
  - type: ai-gateway
    config:
      on_error: "halt"

on_error: "continue"

Continue to the next action in the Traffic Policy, allowing custom error handling:
on_http_request:
  - type: ai-gateway
    config:
      on_error: "continue"
  - type: custom-response
    config:
      status_code: 503
      body: "AI service temporarily unavailable"
When using on_error: "continue", you can inspect the error details using action result variables.
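On the client side, the fallback produced by the custom-response action above is an ordinary HTTP response, so graceful degradation can key off the status code. A minimal sketch using requests, with a placeholder gateway URL:

import requests

# Placeholder endpoint; substitute your gateway's chat completions URL.
GATEWAY_URL = "https://your-gateway.example.com/v1/chat/completions"

resp = requests.post(
    GATEWAY_URL,
    json={"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]},
    timeout=60,
)

if resp.status_code == 503:
    # The custom-response action returned the fallback body.
    print("Degraded mode:", resp.text)
else:
    resp.raise_for_status()
    print(resp.json()["choices"][0]["message"]["content"])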

Timeout configuration

Control failover timing with these settings:
on_http_request:
  - type: ai-gateway
    config:
      per_request_timeout: "30s"   # Each provider attempt
      total_timeout: "5m"          # All attempts combined
Setting             | Default | Description
per_request_timeout | 30s     | Maximum time for a single provider attempt
total_timeout       | 5m      | Maximum time for all failover attempts combined
If total_timeout is reached, failover stops immediately even if more candidates remain.
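To see how the two timeouts interact, here is an illustrative deadline loop in Python. It is a sketch of the behavior described above, not the gateway's implementation, and it assumes the simplest reading: each attempt gets up to per_request_timeout, and no attempt continues past the total budget:

import time

PER_REQUEST_TIMEOUT = 30.0   # seconds per provider attempt
TOTAL_TIMEOUT = 5 * 60.0     # budget for all attempts combined

def try_candidates(candidates, attempt):
    """attempt(candidate, timeout) returns a response or raises on failure."""
    deadline = time.monotonic() + TOTAL_TIMEOUT
    for candidate in candidates:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # total_timeout reached: stop even if candidates remain
        try:
            # A single attempt never runs longer than per_request_timeout,
            # and never past the overall deadline.
            return attempt(candidate, timeout=min(PER_REQUEST_TIMEOUT, remaining))
        except Exception:
            continue  # failover: move to the next candidate
    raise RuntimeError("all candidates failed or total_timeout reached")

With the defaults, that budget allows roughly ten full-length attempts before the gateway gives up.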

Errors that skip failover

These errors return immediately without attempting failover:
Error                 | Description
Invalid request body  | Request JSON could not be parsed
No models available   | No models matched the gateway configuration and client request
Model selection empty | All model selection strategies returned empty results
Configuration errors  | Invalid provider or model configuration
Once failover begins, all provider errors (including 4xx) trigger a move to the next candidate until the candidate list is exhausted.
Token limit and API key errors for a specific model trigger failover to the next model, not immediate failure.

Best practices

  1. Configure multiple API keys per provider for key-level failover
  2. Use the models array in client requests for cross-provider failover
  3. Set appropriate timeouts based on your latency requirements
  4. Use on_error: "continue" with custom responses for graceful degradation
  5. Monitor with log exports to track failover patterns

Next steps