Automatic failover behavior
When a request to a provider fails, the gateway automatically tries the next candidate. This includes failures from:
- Timeouts - Request exceeded per_request_timeout
- HTTP errors - Any non-2xx/3xx response from providers (4xx, 5xx)
- Connection errors - Network failures, DNS issues, etc.
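As a rough illustration of the three failure categories, the following Python sketch classifies the outcome of a single attempt. The function name and its parameters are hypothetical and are not part of the gateway; only the rule itself (timeouts, connection errors, and non-2xx/3xx responses count as failures) comes from the list above.

```python
from typing import Optional


def triggers_failover(status: Optional[int] = None,
                      timed_out: bool = False,
                      connection_error: bool = False) -> bool:
    """Return True if this attempt's outcome should move on to the next candidate.

    Hypothetical helper for illustration only.
    """
    # Timeouts and connection errors never produce a usable response.
    if timed_out or connection_error:
        return True
    # Assumption: an attempt with no status and no recorded error is treated as failed.
    if status is None:
        return True
    # 2xx and 3xx count as success; everything else (4xx, 5xx) is a failure.
    return not (200 <= status < 400)
```

Under this rule a 429 or 503 response moves to the next candidate, while a 200 or 302 does not.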
Failover cascade
When a request fails, the gateway follows this failover order:
1. Try another key
If multiple API keys are configured for the current model’s provider, the gateway tries the next key.
2. Try another model candidate
If all keys for the current model fail, the gateway moves to the next model in its candidate list. This happens when:
- The client specified multiple models using the models array
- Model selection strategies returned multiple candidates
The model field is tried first, then entries in the models array as fallbacks.
Provider failover requires the client to specify models from different providers, or for model selection strategies to return candidates from multiple providers.
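The cascade described above can be sketched in Python as two nested loops: keys are the inner loop, model candidates the outer loop. This is a minimal simulation, not the gateway's implementation. The function names, the `provider/model` naming scheme, and the `attempt` callback are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Optional, Tuple


def candidate_models(body: dict) -> List[str]:
    """Order candidates: the model field first, then the models array as fallbacks."""
    ordered: List[str] = []
    if body.get("model"):
        ordered.append(body["model"])
    ordered.extend(body.get("models", []))
    return ordered


def run_cascade(models: List[str],
                keys_by_provider: Dict[str, List[str]],
                provider_of: Callable[[str], str],
                attempt: Callable[[str, str], bool]) -> Optional[Tuple[str, str]]:
    """Try every key for a model before moving to the next model.

    Returns the (model, key) pair that succeeded, or None if all candidates failed.
    """
    for model in models:
        for key in keys_by_provider.get(provider_of(model), []):
            if attempt(model, key):
                return model, key
    return None
```

For example, with two keys for a hypothetical `provider-a` and one for `provider-b`, a request listing one model from each provider makes at most three attempts: both `provider-a` keys first, then the `provider-b` key.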
Error behavior configuration
Control what happens when all candidates are exhausted:
on_error: "halt" (default)
Stop processing and return the error to the client.
on_error: "continue"
Continue to the next action in the Traffic Policy.
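To make the difference between the two settings concrete, here is a small Python sketch of the decision made once every candidate has failed. It only models the semantics described above. The `Action` type and `after_exhaustion` function are hypothetical, and the behavior when no later action responds is an assumption.

```python
from typing import Callable, List, Optional

# Hypothetical stand-in for a Traffic Policy action: returns a response string,
# or None if it does not produce a response.
Action = Callable[[], Optional[str]]


def after_exhaustion(on_error: str, last_error: str,
                     next_actions: List[Action]) -> str:
    """Decide what the client receives once all candidates are exhausted."""
    if on_error == "halt":
        # Default: stop processing and return the error to the client.
        return last_error
    if on_error == "continue":
        # Hand control to the remaining actions in the policy.
        for action in next_actions:
            response = action()
            if response is not None:
                return response
        # Assumption: if no later action responds, the error still surfaces.
        return last_error
    raise ValueError(f"unknown on_error value: {on_error!r}")
```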
Error response format
When all candidates fail, the gateway returns an error. The response preserves the last provider’s error response when available, or returns an ngrok error message.
Provider error passthrough: If the last attempt received an HTTP error response from the provider, that response (status code and body) is returned to the client.
Gateway error: If the failure was due to timeouts, connection errors, or configuration issues, the gateway returns an error message.
Common error scenarios
Scenario 1: Key exhaustion
All keys for a provider hit rate limits.
Scenario 2: Provider timeout
A provider times out and the gateway falls back to another.
Scenario 3: All providers fail
All configured providers and keys fail.
Timeout configuration
Control failover timing with timeout settings:
- per_request_timeout: Maximum time for a single provider attempt
- total_timeout: Maximum time for all failover attempts
When total_timeout is reached, failover stops even if more candidates are available.
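The interaction between the two timeouts can be sketched as a deadline check in front of every attempt. The Python below is a simulation under stated assumptions: the function name, the `attempt` callback, and the injectable `clock` are hypothetical. Capping the last attempt at the remaining budget is also an assumption, since the text does not say how the gateway treats an attempt that would cross the total_timeout boundary.

```python
import time
from typing import Callable, List, Optional, Tuple


def failover_with_budget(candidates: List[str],
                         attempt: Callable[[str, float], bool],
                         per_request_timeout: float,
                         total_timeout: float,
                         clock: Callable[[], float] = time.monotonic
                         ) -> Tuple[Optional[str], List[str]]:
    """Try candidates in order until one succeeds or the total budget runs out.

    Returns (winning candidate or None, list of candidates actually tried).
    """
    deadline = clock() + total_timeout
    tried: List[str] = []
    for candidate in candidates:
        remaining = deadline - clock()
        if remaining <= 0:
            # total_timeout reached: stop even though candidates remain.
            break
        tried.append(candidate)
        # Assumption: one attempt gets the smaller of the two limits.
        if attempt(candidate, min(per_request_timeout, remaining)):
            return candidate, tried
    return None, tried
```

With a 30-second per-request timeout and a 70-second total budget, four candidates that all time out yield three attempts (30, 30, and 10 seconds). The fourth candidate is never tried.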
Errors that skip failover
These errors occur before the failover cascade begins and are returned immediately:
- Invalid request body - Request JSON could not be parsed
- No models available - No configured providers support the requested model
- Token limit exceeded - Request exceeds max_input_tokens
- No API key - No key configured and none provided by client
- Configuration errors - Invalid provider or model configuration
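One way to picture these checks is as a validation pass that runs before any provider is contacted. The following Python sketch is illustrative only. The function name, its parameters, the order of the checks, and the returned strings are assumptions, and configuration errors are left out because they depend on the gateway's own configuration.

```python
import json
from typing import Optional, Set


def preflight_error(raw_body: str,
                    available_models: Set[str],
                    has_api_key: bool,
                    input_tokens: int,
                    max_input_tokens: int) -> Optional[str]:
    """Return an error label if the request fails before failover starts, else None."""
    try:
        body = json.loads(raw_body)
    except json.JSONDecodeError:
        return "invalid request body"
    if not isinstance(body, dict):
        return "invalid request body"

    # Collect the model field plus any fallbacks in the models array.
    requested = [m for m in [body.get("model"), *body.get("models", [])] if m]
    if not any(m in available_models for m in requested):
        return "no models available"
    if input_tokens > max_input_tokens:
        return "token limit exceeded"
    if not has_api_key:
        return "no API key"
    return None
```

A request that passes all of these checks returns None here and would then enter the failover cascade.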
Monitoring errors
Track errors using the error_rate metrics available in model selection strategies:
| Metric | Description |
|---|---|
| error_rate.total | Fraction of requests that failed (0.0 to 1.0) |
| error_rate.timeout | Fraction of requests that timed out |
| error_rate.rate_limit | Fraction of requests that returned HTTP 429 |
| error_rate.client | Fraction of requests that returned 4xx errors |
| error_rate.server | Fraction of requests that returned 5xx errors |
Use these metrics to:
- Filter out problematic providers in model selection
- Identify keys hitting rate limits
- Adjust timeout values based on timeout rates
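As an illustration of the first use, the Python sketch below filters and orders candidates by their error rates. It is not the gateway's selection-strategy syntax. The function name, the thresholds, and the shape of the `metrics` dictionary are assumptions, and only the metric names come from the table above.

```python
from typing import Dict, List


def healthy_candidates(metrics: Dict[str, Dict[str, float]],
                       max_total: float = 0.2,
                       max_timeout: float = 0.1) -> List[str]:
    """Keep candidates under both thresholds, ordered by lowest total error rate.

    `metrics` maps a candidate name to its error_rate values (hypothetical shape).
    """
    kept = [
        name for name, m in metrics.items()
        if m["error_rate.total"] <= max_total
        and m["error_rate.timeout"] <= max_timeout
    ]
    return sorted(kept, key=lambda name: metrics[name]["error_rate.total"])
```

The threshold values here are placeholders. Suitable values depend on the traffic and on how aggressively unhealthy providers should be excluded.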
Best practices
- Configure multiple keys per provider for key-level failover
- Specify multiple models in client requests for cross-provider failover
- Set appropriate timeouts based on your use case
- Monitor error metrics to catch issues early
- Use model selection strategies to order candidates by preference
Example: Comprehensive error handling
This example provides:
- 3 failover keys for OpenAI
- 2 failover keys for Anthropic
- 1 emergency key for Google
- Up to 3 minutes for all failover attempts
- Clear error messages on complete failure
For cross-provider failover, clients must specify models from multiple providers using the models array, or model selection strategies must return candidates from multiple providers.