The ngrok AI Gateway sits between your application and AI providers like OpenAI, Anthropic, and Google. It routes requests using your provider API keys, adding automatic failover, load balancing, and observability—without changing your code. If one provider fails, it automatically tries another. If one API key hits rate limits, it switches to the next. You keep full control of your provider relationships and billing.

Why use the AI Gateway?

Failover & Routing

Automatic retries on errors and timeouts. Customize the routing logic to prefer cheaper models, specific providers, or your own criteria.

One Endpoint, Many Providers

Use the same endpoint for OpenAI, Anthropic, Google, and others. Switch providers without changing your code.

Compatible With Popular SDKs

Works with official and third-party SDKs. Simply change the base URL configuration option and you’re connected.

Self-Hosted Models

Route to local models like Ollama or vLLM alongside cloud providers.
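As a point of reference, Ollama serves an OpenAI-compatible API on its default local port, so reaching a local model is the same base-URL swap as any other provider. A minimal sketch (the address is Ollama's default; the model name assumes you have pulled one locally):

```python
from openai import OpenAI

# Point the standard OpenAI SDK at a local Ollama instance.
client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
    api_key="ollama",  # required by the SDK, ignored by Ollama
)

# Usage is then identical to the cloud-provider example:
# client.chat.completions.create(model="llama3", messages=[...])
```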

Quick example

Point your SDK at your ngrok endpoint:
from openai import OpenAI

client = OpenAI(
    base_url="https://your-ai-gateway.ngrok.app/v1",
    api_key="your-api-key"
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)
Behind the scenes, the AI Gateway:
  1. Receives your request
  2. Selects which model and provider to use (based on request path and your configuration)
  3. Forwards the request with the appropriate provider API key
  4. If it fails, retries with the next option in your failover chain
  5. Returns the response
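The steps above can be sketched in a few lines. This is an illustrative model of the failover behavior, not the gateway's actual implementation; each callable stands in for a request forwarded upstream with that provider's API key:

```python
# Conceptual sketch of a failover chain: try providers in order,
# return the first successful response.

def complete_with_failover(request, providers):
    """providers is a list of (name, send) pairs tried in order."""
    errors = []
    for name, send in providers:
        try:
            return send(request)  # forward with that provider's API key
        except Exception as exc:  # a real gateway matches timeouts/5xx, not everything
            errors.append((name, exc))
    raise RuntimeError(f"all providers failed: {errors}")

# Example: the primary times out, so the backup serves the request.
def flaky_primary(req):
    raise TimeoutError("upstream timeout")

def healthy_backup(req):
    return {"provider": "backup", "content": f"echo: {req}"}

chain = [("openai", flaky_primary), ("anthropic", healthy_backup)]
result = complete_with_failover("Hello!", chain)
print(result["provider"])  # -> backup
```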

What can you do?

Multi-provider failover: Configure OpenAI as primary, Anthropic as backup.
Multi-key rotation: Use multiple provider API keys to avoid rate limits.
Custom selection strategies: Define exactly how models are selected using CEL expressions.
Cost-based routing: Route to the cheapest available model automatically.
Access control: Restrict which providers and models clients can use.
Self-hosted models: Route to Ollama, vLLM, or other local inference servers.
Content modification: Redact PII, sanitize responses, or inject prompts.
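Multi-key rotation can be pictured as cycling through a pool of keys and skipping any that are currently rate limited. A conceptual sketch (key names are made up for illustration; this is not the gateway's internal logic):

```python
from itertools import cycle

class KeyRotator:
    """Round-robin over provider API keys, skipping rate-limited ones."""

    def __init__(self, keys):
        self._count = len(keys)
        self._keys = cycle(keys)

    def next_key(self, rate_limited=()):
        for _ in range(self._count):
            key = next(self._keys)
            if key not in rate_limited:
                return key
        raise RuntimeError("all keys are rate limited")

rotator = KeyRotator(["sk-key-1", "sk-key-2", "sk-key-3"])
first = rotator.next_key()                            # -> "sk-key-1"
second = rotator.next_key(rate_limited={"sk-key-2"})  # skips key 2 -> "sk-key-3"
```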

Next steps