When a request to a provider fails, the AI Gateway automatically attempts failover to the next available candidate. This page explains how failover works and how to configure error behavior.
## Automatic failover
The gateway automatically tries the next candidate when a request fails due to:

- **Timeouts**: the request exceeded `per_request_timeout`
- **HTTP errors**: any 4xx or 5xx response from the provider
- **Connection errors**: network failures, DNS issues, TLS errors
## Failover order
When a request fails, the gateway follows this order:

1. **Try another API key**

   If multiple API keys are configured for the current model's provider, the gateway tries the next key.

2. **Try another model**

   After exhausting all keys for a model, the gateway moves to the next model candidate. Candidates come from:

   - The client's `models` array in the request body
   - Model selection strategies that return multiple models

The client's `model` field is tried first, then entries in `models` as fallbacks.

Cross-provider failover requires the client to specify models from different providers, or model selection strategies that return candidates from multiple providers.
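As a sketch, a client request body that enables cross-provider failover might look like the following. This assumes an OpenAI-compatible chat completions body; the model names here are purely illustrative:

```json
{
  "model": "gpt-4o",
  "models": ["claude-sonnet-4-20250514", "gemini-2.0-flash"],
  "messages": [
    { "role": "user", "content": "Hello" }
  ]
}
```

The gateway attempts `gpt-4o` first, then falls back to each entry in `models` in order.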
## Error behavior configuration
### `on_error: "halt"` (default)

Stop processing and return the error to the client.
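A minimal Traffic Policy sketch of the default behavior. The `ai-gateway` action type name and the placement of `on_error` under `config` are assumptions; check the gateway action reference for the exact schema:

```yaml
on_http_request:
  - actions:
      - type: ai-gateway      # assumed action type name
        config:
          on_error: "halt"    # default: return the provider error to the client
```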
### `on_error: "continue"`

Continue to the next action in the Traffic Policy, allowing custom error handling.
With `on_error: "continue"`, you can inspect the error details using action result variables.
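A sketch of graceful degradation with a follow-up action. The `ai-gateway` action type name and the `custom-response` config keys shown here are assumptions, not confirmed names; consult the Traffic Policy actions reference before using them:

```yaml
on_http_request:
  - actions:
      - type: ai-gateway          # assumed action type name
        config:
          on_error: "continue"    # keep processing subsequent actions on failure
      - type: custom-response     # assumed fallback action for graceful degradation
        config:
          status_code: 503
          body: "The AI backend is temporarily unavailable. Please retry shortly."
```

Inside the fallback action you could branch on the gateway's action result variables to tailor the response to the specific error.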
## Timeout configuration
Control failover timing with these settings:

| Setting | Default | Description |
|---|---|---|
| `per_request_timeout` | 30s | Maximum time for a single provider attempt |
| `total_timeout` | 5m | Maximum time for all failover attempts combined |

Once `total_timeout` is reached, failover stops immediately even if more candidates remain.
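The two timeouts might be tightened together like this. As above, the `ai-gateway` action type name and the exact nesting of the timeout keys are assumptions:

```yaml
on_http_request:
  - actions:
      - type: ai-gateway              # assumed action type name
        config:
          per_request_timeout: 10s    # fail a single provider attempt after 10 seconds
          total_timeout: 1m           # stop all failover attempts after 1 minute
```

Setting `per_request_timeout` well below `total_timeout` leaves room for several failover attempts before the overall budget is exhausted.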
## Errors that skip failover
These errors return immediately without attempting failover:

| Error | Description |
|---|---|
| Invalid request body | Request JSON could not be parsed |
| No models available | No models matched the gateway configuration and client request |
| Model selection empty | All model selection strategies returned empty results |
| Configuration errors | Invalid provider or model configuration |
Token limit and API key errors for a specific model trigger failover to the next model, not immediate failure.
## Best practices
- Configure multiple API keys per provider for key-level failover
- Use the `models` array in client requests for cross-provider failover
- Set timeouts appropriate to your latency requirements
- Use `on_error: "continue"` with custom responses for graceful degradation
- Monitor with log exports to track failover patterns
## Next steps
- **Troubleshooting**: error codes, causes, and solutions
- **Debugging**: inspect action results and diagnose issues