
Automatic failover behavior

When a request to a provider fails, the gateway automatically tries the next candidate. This includes failures from:
  • Timeouts - Request exceeded per_request_timeout
  • HTTP errors - Any non-2xx/3xx response from providers (4xx, 5xx)
  • Connection errors - Network failures, DNS issues, etc.
The gateway never retries the same model/key combination—it always moves to the next candidate in its list.

Failover cascade

When a request fails, the gateway follows this failover order:

1. Try another key

If multiple API keys are configured for the current model’s provider, the gateway tries the next key:
providers:
  - id: "openai"
    api_keys:
      - value: ${secrets.get('openai', 'key-one')}   # Try first
      - value: ${secrets.get('openai', 'key-two')}   # Try if first fails

2. Try another model candidate

If all keys for the current model fail, the gateway moves to the next model in its candidate list. This happens when:
  • The client specified multiple models using the models array
  • Model selection strategies returned multiple candidates
{
  "model": "gpt-4o",
  "models": ["anthropic:claude-3-5-sonnet-20241022", "google:gemini-2.0-flash"],
  "messages": [{"role": "user", "content": "Hello"}]
}
The model field is tried first; entries in the models array are then tried as fallbacks, in order.
Provider failover requires either that the client specify models from different providers, or that model selection strategies return candidates from multiple providers.

Error behavior configuration

Control what happens when all candidates are exhausted:

on_error: "halt" (default)

Stop processing and return the error to the client:
on_http_request:
  - type: ai-gateway
    config:
      on_error: "halt"
The client receives the last error from the failover cascade.

on_error: "continue"

Continue to the next action in the Traffic Policy:
on_http_request:
  - type: ai-gateway
    config:
      on_error: "continue"
  - type: custom-error-handler
    # This action runs if ai-gateway fails
Useful for custom error handling or fallback logic.
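For example, the follow-up action could return a stable fallback response instead of the last provider error. The sketch below assumes ngrok's custom-response Traffic Policy action as the fallback; adapt the action and its config fields to your own error handling:
on_http_request:
  - type: ai-gateway
    config:
      on_error: "continue"
  - type: custom-response
    # Runs after the ai-gateway action fails, returning a predictable
    # fallback payload instead of the provider's error.
    config:
      status_code: 503
      headers:
        content-type: "application/json"
      content: '{"error": "AI providers are temporarily unavailable. Please retry shortly."}'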

Error response format

When all candidates fail, the gateway returns an error. The response preserves the last provider’s error response when available, or returns an ngrok error message.
  • Provider error passthrough - If the last attempt received an HTTP error response from the provider, that response (status code and body) is returned to the client.
  • Gateway error - If the failure was due to timeouts, connection errors, or configuration issues, the gateway returns an error message:
All AI providers failed to respond successfully. The request could not be completed. Errors:
[429] openai/gpt-4o: 429 Too Many Requests
[500] anthropic/claude-3-5-sonnet-20241022: 500 Internal Server Error
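In the passthrough case, the client receives the provider’s own status code and body. As a purely illustrative example, a passed-through 429 body from an OpenAI-style API might look something like the following (real provider error bodies vary by provider and endpoint):
{
  "error": {
    "message": "Rate limit reached for gpt-4o, please try again later.",
    "type": "rate_limit_exceeded"
  }
}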

Common error scenarios

Scenario 1: Key exhaustion

All keys for a provider hit rate limits:
1. Try openai/key-one → 429 Rate Limit
2. Try openai/key-two → 429 Rate Limit
3. Try openai/key-three → 429 Rate Limit
4. Move to next provider or return error

Scenario 2: Provider timeout

Provider times out, fallback to another:
1. Try openai → Timeout after 30s
2. Try anthropic → Success

Scenario 3: All providers fail

All configured providers and keys fail:
1. Try openai/key-one → 500 Server Error
2. Try openai/key-two → 500 Server Error
3. Try anthropic/key → 503 Service Unavailable
4. Return error to client (if on_error: "halt")

Timeout configuration

Control failover timing with timeout settings:
on_http_request:
  - type: ai-gateway
    config:
      per_request_timeout: "30s"   # Each provider attempt
      total_timeout: "5m"           # All attempts combined
  • per_request_timeout: Maximum time for a single provider attempt
  • total_timeout: Maximum time for all failover attempts
If total_timeout is reached, failover stops even if more candidates are available.
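For example, with the values above, each attempt is cut off after 30 seconds, so the cascade can make at most roughly ten attempts that each hit the per-request limit before the 5-minute total budget runs out.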

Errors that skip failover

These errors occur before the failover cascade begins and are returned immediately:
  • Invalid request body - Request JSON could not be parsed
  • No models available - No configured providers support the requested model
  • Token limit exceeded - Request exceeds max_input_tokens
  • No API key - No key configured and none provided by client
  • Configuration errors - Invalid provider or model configuration
Once the failover cascade starts, all provider errors (including 4xx) trigger the next candidate until all are exhausted.
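For example, a request that names a model no configured provider serves is rejected immediately with a "no models available" error instead of entering the cascade (the model name below is deliberately fictional):
{
  "model": "nonexistent-model",
  "messages": [{"role": "user", "content": "Hello"}]
}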

Monitoring errors

Track errors using the error_rate metrics available in model selection strategies:
Metric                  Description
error_rate.total        Fraction of requests that failed (0.0 to 1.0)
error_rate.timeout      Fraction of requests that timed out
error_rate.rate_limit   Fraction of requests that returned HTTP 429
error_rate.client       Fraction of requests that returned 4xx errors
error_rate.server       Fraction of requests that returned 5xx errors
Use these metrics to:
  • Filter out problematic providers in model selection
  • Identify keys hitting rate limits
  • Adjust timeout values based on timeout rates
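As a rough sketch of how these metrics could feed candidate ordering, a model selection strategy might filter out candidates whose recent error rate is high. The model_selection keys and expression below are illustrative assumptions, not the documented syntax; see the model selection strategies page for the real configuration:
on_http_request:
  - type: ai-gateway
    config:
      # Illustrative only: the strategy keys and expression here are
      # assumptions, not the documented configuration.
      model_selection:
        - type: filter
          expression: "error_rate.total < 0.2 && error_rate.rate_limit < 0.1"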

Best practices

  1. Configure multiple keys per provider for key-level failover
  2. Specify multiple models in client requests for cross-provider failover
  3. Set appropriate timeouts based on your use case
  4. Monitor error metrics to catch issues early
  5. Use model selection strategies to order candidates by preference

Example: Comprehensive error handling

on_http_request:
  - type: ai-gateway
    config:
      per_request_timeout: "45s"
      total_timeout: "3m"
      on_error: "halt"
      
      providers:
        # Primary provider with multiple keys
        - id: "openai"
          api_keys:
            - value: ${secrets.get('openai', 'key-a')}
            - value: ${secrets.get('openai', 'key-b')}
            - value: ${secrets.get('openai', 'key-c')}
        
        # Backup provider
        - id: "anthropic"
          api_keys:
            - value: ${secrets.get('anthropic', 'primary')}
            - value: ${secrets.get('anthropic', 'backup')}
        
        # Emergency fallback
        - id: "google"
          api_keys:
            - value: ${secrets.get('google', 'key')}
This configuration provides:
  • 3 failover keys for OpenAI
  • 2 failover keys for Anthropic
  • 1 emergency key for Google
  • Up to 3 minutes for all failover attempts
  • Clear error messages on complete failure
For cross-provider failover, clients must specify models from multiple providers using the models array, or model selection strategies must return candidates from multiple providers.
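For example, pairing the configuration above with a client request like this one (model names reused from earlier examples) lets the gateway walk the full cascade: the three OpenAI keys, then both Anthropic keys, then the Google key, before returning an error:
{
  "model": "gpt-4o",
  "models": ["anthropic:claude-3-5-sonnet-20241022", "google:gemini-2.0-flash"],
  "messages": [{"role": "user", "content": "Hello"}]
}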