Automatic failover
The gateway automatically tries the next candidate when a request fails due to:
- Timeouts - Request exceeded `per_request_timeout`
- HTTP errors - Any 4xx or 5xx response from providers
- Connection errors - Network failures, DNS issues, TLS errors
Failover order
When a request fails, the gateway follows this order:
1. Try another API key
If multiple API keys are configured for the current model’s provider, the gateway tries the next key.
2. Try another model
After exhausting all keys for a model, the gateway moves to the next model candidate. Candidates come from:
- The client’s `models` array in the request body
- Model selection strategies that return multiple models
The `model` field is tried first, then entries in `models` as fallbacks.
Cross-provider failover requires the client to specify models from different providers, or model selection strategies that return candidates from multiple providers.
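The ordering described above can be sketched in Python. This is an illustration under stated assumptions, not the gateway's actual code: `AttemptFailed`, `candidate_models`, `run_with_failover`, the model names, and the key names are all hypothetical. Keys are looked up per model here for brevity, whereas the gateway configures them per provider.

```python
from typing import Callable, Dict, List, Tuple


class AttemptFailed(Exception):
    """Stand-in for a timeout, an HTTP 4xx/5xx response, or a connection error."""


def candidate_models(body: dict) -> List[str]:
    """Return `model` first, then the entries of `models` as fallbacks."""
    first = [body["model"]] if body.get("model") else []
    return first + list(body.get("models", []))


def run_with_failover(
    body: dict,
    keys_by_model: Dict[str, List[str]],
    send: Callable[[str, str], str],
) -> Tuple[str, List[Tuple[str, str]]]:
    """Try every key of a model before moving on to the next model."""
    attempts: List[Tuple[str, str]] = []
    for model in candidate_models(body):
        for key in keys_by_model.get(model, []):
            attempts.append((model, key))
            try:
                return send(model, key), attempts
            except AttemptFailed:
                continue  # next key, or next model once keys run out
    raise AttemptFailed(f"all {len(attempts)} attempts failed")


# Hypothetical request body and keys; every name is a placeholder.
body = {"model": "provider-a:model-x", "models": ["provider-b:model-y"]}
keys = {
    "provider-a:model-x": ["key-a1", "key-a2"],
    "provider-b:model-y": ["key-b1"],
}


def fake_send(model: str, key: str) -> str:
    # Simulate an outage at the first provider only.
    if model.startswith("provider-a"):
        raise AttemptFailed("simulated provider outage")
    return f"ok from {model}"


result, attempts = run_with_failover(body, keys, fake_send)
```

In this run both keys for the first model are tried before the fallback model is attempted, which is also a cross-provider failover because the client listed a model from a second provider.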
Error behavior configuration
on_error: "halt" (default)
Stop processing and return the error to the client.
on_error: "continue"
Continue to the next action in the Traffic Policy, allowing custom error handling.
With `on_error: "continue"`, you can inspect the error details using action result variables.
Timeout configuration
Control failover timing with these settings:

| Setting | Default | Description |
|---|---|---|
| `per_request_timeout` | 30s | Maximum time for a single provider attempt |
| `total_timeout` | 5m | Maximum time for all failover attempts combined |
When `total_timeout` is reached, failover stops immediately even if more candidates remain.
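To see how the two limits interact, the following Python sketch computes how much time each failing attempt would consume. It is a simplified model, not the gateway's implementation: `plan_attempts` is a hypothetical helper, and it assumes an attempt is cut off by whichever limit is hit first.

```python
PER_REQUEST_TIMEOUT = 30.0  # seconds, the documented default (30s)
TOTAL_TIMEOUT = 300.0       # seconds, the documented default (5m)


def plan_attempts(durations, per_request_timeout=PER_REQUEST_TIMEOUT,
                  total_timeout=TOTAL_TIMEOUT):
    """Return the time spent on each attempt until the total budget runs out.

    `durations` lists how long each failing attempt would take if it were
    never cut off, in seconds.
    """
    spent = []
    elapsed = 0.0
    for wanted in durations:
        remaining = total_timeout - elapsed
        if remaining <= 0:
            break  # total budget used up: stop even though candidates remain
        # Assumption: the attempt ends at the earliest of its own duration,
        # the per-attempt limit, and the remaining total budget.
        used = min(wanted, per_request_timeout, remaining)
        spent.append(used)
        elapsed += used
    return spent


# Twelve candidates that would each hang for 60 seconds.
default_plan = plan_attempts([60] * 12)
```

With the default values, at most ten attempts that each run into `per_request_timeout` fit inside `total_timeout` (10 × 30s = 5m), so the remaining two candidates in this example are never tried.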
Errors that skip failover
These errors return immediately without attempting failover:

| Error | Description |
|---|---|
| Invalid request body | Request JSON could not be parsed |
| No models available | No models matched the gateway configuration and client request |
| Model selection empty | All model selection strategies returned empty results |
| Configuration errors | Invalid provider or model configuration |
Token limit and API key errors for a specific model trigger failover to the next model, not immediate failure.
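The split between errors that trigger failover and errors that return immediately can be summarized as a small lookup. This Python sketch is illustrative only: the `Failure` enum, its member names, and both helper functions are hypothetical and do not correspond to identifiers exposed by the gateway.

```python
from enum import Enum, auto


class Failure(Enum):
    # Provider-side failures: the gateway tries the next candidate.
    TIMEOUT = auto()
    HTTP_ERROR = auto()
    CONNECTION_ERROR = auto()
    TOKEN_LIMIT = auto()
    API_KEY_ERROR = auto()
    # Request or configuration failures: returned to the client immediately.
    INVALID_REQUEST_BODY = auto()
    NO_MODELS_AVAILABLE = auto()
    MODEL_SELECTION_EMPTY = auto()
    CONFIGURATION_ERROR = auto()


SKIPS_FAILOVER = frozenset({
    Failure.INVALID_REQUEST_BODY,
    Failure.NO_MODELS_AVAILABLE,
    Failure.MODEL_SELECTION_EMPTY,
    Failure.CONFIGURATION_ERROR,
})


def triggers_failover(failure: Failure) -> bool:
    """True when the next key or model should be tried."""
    return failure not in SKIPS_FAILOVER


def is_http_failure(status: int) -> bool:
    """Any 4xx or 5xx provider response counts as a failed attempt."""
    return 400 <= status <= 599
```

The errors that skip failover share one trait: trying another key or model could not change the outcome, because the problem lies in the request or the configuration rather than in a single provider.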
Best practices
- Configure multiple API keys per provider for key-level failover
- Use the `models` array in client requests for cross-provider failover
- Set appropriate timeouts based on your latency requirements
- Use `on_error: "continue"` with custom responses for graceful degradation
- Monitor with log exports to track failover patterns