Configure multiple providers for automatic failover when your primary provider experiences issues.
Basic example
```yaml
on_http_request:
  - type: ai-gateway
    config:
      only_allow_configured_providers: true
      providers:
        - id: openai
          api_keys:
            - value: ${secrets.get('openai', 'key-one')}
        - id: anthropic
          api_keys:
            - value: ${secrets.get('anthropic', 'key')}
```
How it works
When the primary provider fails, the gateway automatically tries the next provider:
1. Request arrives for compatible models
2. Try OpenAI → Timeout
3. Automatically try Anthropic → Success ✓
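The retry behavior above can be sketched as a simple loop. This is an illustrative sketch of the idea, not the gateway's actual implementation; the stub functions and `ProviderError` type are hypothetical.

```python
# Sketch of provider failover: try each provider in order until one succeeds.
# Everything here (ProviderError, the stubs) is illustrative, not gateway code.

class ProviderError(Exception):
    pass

def call_with_failover(providers, request):
    errors = []
    for provider in providers:
        try:
            return provider(request)          # first success wins
        except ProviderError as exc:          # timeout, rate limit, 5xx, ...
            errors.append(exc)                # record the failure and fall through
    raise ProviderError(f"all providers failed: {errors}")

# Simulate the sequence above: OpenAI times out, Anthropic answers.
def openai_stub(req):
    raise ProviderError("openai: timeout")

def anthropic_stub(req):
    return {"provider": "anthropic", "content": "Hello!"}

result = call_with_failover([openai_stub, anthropic_stub], {"prompt": "Hello!"})
# result["provider"] == "anthropic"
```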
Important: The client must specify compatible models for cross-provider failover to work:
```typescript
const res = await openai.chat.completions.create({
  model: "gpt-4o",
  models: ["claude-3-5-sonnet-20241022"], // Fallback models
  messages: [{ role: "user", content: "Hello!" }],
});
```
Three-provider setup
Add multiple providers for maximum reliability:
```yaml
on_http_request:
  - type: ai-gateway
    config:
      only_allow_configured_providers: true
      providers:
        - id: openai
          api_keys:
            - value: ${secrets.get('openai', 'key')}
        - id: anthropic
          api_keys:
            - value: ${secrets.get('anthropic', 'key')}
        - id: google
          api_keys:
            - value: ${secrets.get('google', 'key')}
```
Provider order
Providers are tried in alphabetical order, not in the order they are configured. To control the order, use model_selection.strategy:
```yaml
on_http_request:
  - type: ai-gateway
    config:
      only_allow_configured_providers: true
      providers:
        - id: openai
        - id: anthropic
        - id: google
      model_selection:
        strategy:
          - "ai.models.filter(m, m.provider_id == 'openai')"
          - "ai.models.filter(m, m.provider_id == 'anthropic')"
          - "ai.models.filter(m, m.provider_id == 'google')"
```
Combining multi-key and multi-provider
Maximum resilience with multiple keys per provider and explicit ordering:
```yaml
on_http_request:
  - type: ai-gateway
    config:
      only_allow_configured_providers: true
      providers:
        - id: openai
          api_keys:
            - value: ${secrets.get('openai', 'key-one')}
            - value: ${secrets.get('openai', 'key-two')}
        - id: anthropic
          api_keys:
            - value: ${secrets.get('anthropic', 'key-one')}
            - value: ${secrets.get('anthropic', 'key-two')}
      model_selection:
        strategy:
          # Try OpenAI models first
          - "ai.models.filter(m, m.provider_id == 'openai')"
          # Then fall back to Anthropic models
          - "ai.models.filter(m, m.provider_id == 'anthropic')"
```
Failover cascade:
1. openai/key-one → Rate limited
2. openai/key-two → Success ✓
If both OpenAI keys fail:
3. anthropic/key-one → Success ✓
Note: The model_selection.strategy ensures providers are tried in the specified order, not alphabetically. The only_allow_configured_providers option restricts requests to only the configured providers.
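The cascade amounts to exhausting all keys of one provider before moving to the next. A minimal sketch of that attempt order, using the key names from the config above:

```python
# Sketch of key-then-provider failover order. Provider and key names mirror
# the config above; the flattening logic is illustrative, not gateway internals.

providers = [
    ("openai", ["key-one", "key-two"]),
    ("anthropic", ["key-one", "key-two"]),
]

def attempt_order(providers):
    """All keys of a provider are tried before moving to the next provider."""
    return [f"{provider}/{key}" for provider, keys in providers for key in keys]

print(attempt_order(providers))
# ['openai/key-one', 'openai/key-two', 'anthropic/key-one', 'anthropic/key-two']
```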
Use model selection strategies to prefer providers based on metrics:
```yaml
on_http_request:
  - type: ai-gateway
    config:
      only_allow_configured_providers: true
      providers:
        - id: openai
        - id: anthropic
        - id: google
      model_selection:
        strategy:
          # Prefer low-latency models
          - "ai.models.filter(m, m.metrics.global.latency.upstream_ms_p95 < 2000)"
          # Prefer reliable models
          - "ai.models.filter(m, m.metrics.global.error_rate.total < 0.02)"
          # Fall back to any model
          - "ai.models"
```
Regional failover
For providers that offer regional availability, you can use the custom providers feature to add specific regions:
```yaml
on_http_request:
  - type: ai-gateway
    config:
      only_allow_configured_providers: true
      providers:
        - id: openai
          api_keys:
            - value: ${secrets.get('openai', 'us-east')}
          metadata:
            region: "us-east"
        - id: openai-eu
          base_url: "https://eu.api.openai.com"
          id_aliases: ["openai"]
          api_keys:
            - value: ${secrets.get('openai', 'eu-west')}
          metadata:
            region: "eu-west"
      model_selection:
        strategy:
          - "ai.models.filter(m, m.metadata.region == 'us-east')"
          - "ai.models.filter(m, m.metadata.region == 'eu-west')"
```
Cost optimization
Prefer cheaper providers with fallback to premium:
```yaml
on_http_request:
  - type: ai-gateway
    config:
      only_allow_configured_providers: true
      providers:
        - id: deepseek # Cost-effective primary
        - id: openai # Premium fallback
        - id: anthropic # Alternative premium
      model_selection:
        strategy:
          # Prefer mini/turbo models
          - "ai.models.filter(m, m.id.contains('mini') || m.id.contains('turbo'))"
          # Fall back to any model
          - "ai.models"
```
Real-world production example
Enterprise setup with multiple providers:
```yaml
on_http_request:
  - type: ai-gateway
    config:
      only_allow_configured_providers: true
      per_request_timeout: "45s"
      total_timeout: "3m"
      on_error: "halt"
      providers:
        # Primary: OpenAI with 3 keys
        - id: openai
          api_keys:
            - value: ${secrets.get('openai', 'prod-1')}
            - value: ${secrets.get('openai', 'prod-2')}
            - value: ${secrets.get('openai', 'prod-3')}
          metadata:
            tier: "primary"
        # Secondary: Anthropic with 2 keys
        - id: anthropic
          api_keys:
            - value: ${secrets.get('anthropic', 'prod-1')}
            - value: ${secrets.get('anthropic', 'prod-2')}
          metadata:
            tier: "secondary"
        # Tertiary: Google with 1 key
        - id: google
          api_keys:
            - value: ${secrets.get('google', 'prod')}
          metadata:
            tier: "tertiary"
      model_selection:
        strategy:
          - "ai.models.filter(m, m.metrics.global.error_rate.total < 0.05)"
          - "ai.models.filter(m, m.metrics.global.latency.upstream_ms_p95 < 3000)"
          - "ai.models"
```
The patterns in this example provide:
- 6 total provider API keys across 3 providers
- Automatic failover at both the key and provider levels
- Performance-based model selection
- Up to 3 minutes of retry attempts
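The timeout arithmetic behind that last point: with a 45-second per-request timeout and a 3-minute total budget, at most four fully timed-out attempts fit (worst case, ignoring per-attempt overhead):

```python
# How many worst-case (fully timed-out) attempts fit in the total budget,
# using the per_request_timeout and total_timeout values from the config above.
per_request_timeout_s = 45
total_timeout_s = 3 * 60

max_full_attempts = total_timeout_s // per_request_timeout_s
print(max_full_attempts)  # 4
```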
Client configuration
For cross-provider failover, clients must specify multiple models:
```typescript
// TypeScript
const res = await openai.chat.completions.create({
  model: "gpt-4o",
  models: [
    "claude-3-5-sonnet-20241022",
    "gemini-2.5-pro",
  ],
  messages: [...],
});
```
```python
# Python (openai-python v1; the legacy openai.ChatCompletion API is removed,
# and the non-standard `models` field is passed through extra_body)
response = client.chat.completions.create(
    model="gpt-4o",
    extra_body={"models": ["claude-3-5-sonnet-20241022", "gemini-2.5-pro"]},
    messages=[...],
)
```
Best practices
- Configure at least 2 providers for reliability
- Order providers by preference (fastest/cheapest first)
- Use multiple keys per provider for key-level failover
- Monitor provider metrics to optimize order
- Test failover regularly to ensure it works
- Set appropriate timeouts to fail fast
See also