The ngrok AI Gateway lets you route requests to AI providers like OpenAI, Anthropic, and Google through a single endpoint. If one provider fails, it automatically fails over to another. If an API key hits its rate limit, it rotates to the next one.

Why use the AI Gateway?

Failover & Routing

Automatic retries on errors and timeouts. Customize the routing logic to prefer cheaper models, specific providers, or your own criteria.

One Endpoint, Many Providers

Use the same endpoint for OpenAI, Anthropic, Google, and others. Switch providers without changing your code.

OpenAI SDK Compatible

Works with any OpenAI SDK. Just change the base URL and you're connected.

Self-Hosted Models

Route to local models like Ollama or vLLM alongside cloud providers.
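Self-hosted servers slot in because they speak the same OpenAI-compatible protocol. As a quick illustration of that compatibility, here is the OpenAI SDK pointed directly at a local Ollama server (assuming Ollama's default port; the model name stands in for whatever you have pulled locally):

from openai import OpenAI

# Ollama exposes an OpenAI-compatible API on localhost:11434 by default.
local = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama",  # required by the SDK, ignored by Ollama
)

response = local.chat.completions.create(
    model="llama3",  # assumed: any model you have pulled locally
    messages=[{"role": "user", "content": "Hello!"}],
)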

Quick example

Point your OpenAI SDK at your ngrok endpoint:
from openai import OpenAI

# Any OpenAI SDK works; only the base URL changes.
client = OpenAI(
    base_url="https://your-endpoint.ngrok.app/v1",
    api_key="your-api-key",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
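Switching providers is just a different model name in the same call. For example, assuming your gateway is configured with an Anthropic backend (the exact model identifiers you can request depend on your configuration; the name below is illustrative):

response = client.chat.completions.create(
    model="claude-3-5-sonnet",  # illustrative name; routed to Anthropic by your config
    messages=[{"role": "user", "content": "Hello!"}],
)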
Behind the scenes, the AI Gateway:
  1. Receives your request
  2. Selects which model and provider to use (based on your configuration)
  3. Forwards the request with the appropriate provider API key
  4. If it fails, retries with the next option in your failover chain
  5. Returns the response
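Conceptually, that loop looks like the sketch below. This is an illustration of the failover behavior, not the gateway's actual implementation; the provider objects and their send method are hypothetical:

class ProviderError(Exception):
    """Stand-in for any upstream failure: errors, timeouts, rate limits."""

def route(request, failover_chain):
    """Illustrative only: try each configured option in order."""
    last_error = None
    for provider in failover_chain:  # step 2: selection comes from your config
        try:
            return provider.send(request)  # step 3: forward with that provider's key
        except ProviderError as err:
            last_error = err  # step 4: fall through to the next option
    raise last_error  # every option in the chain failed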

What can you do?

Multi-provider failover: configure OpenAI as primary, Anthropic as backup.
Multi-key rotation: use multiple provider API keys to avoid rate limits (see the sketch after this list).
Custom selection strategies: define exactly how models are selected using CEL expressions.
Cost-based routing: route to the cheapest available model automatically.
Access control: restrict which providers and models clients can use.
Self-hosted models: route to Ollama, vLLM, or other local inference servers.
Content modification: redact PII, sanitize responses, or inject prompts.
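
To make one of these concrete: multi-key rotation is the kind of logic you would otherwise hand-roll in every client. A minimal sketch of the pattern the gateway automates, assuming two placeholder OpenAI keys:

from openai import OpenAI, RateLimitError

API_KEYS = ["sk-key-one", "sk-key-two"]  # placeholder keys

def chat_with_rotation(messages):
    """Illustrative only: move to the next key when one is throttled."""
    last_error = None
    for key in API_KEYS:
        client = OpenAI(api_key=key)
        try:
            return client.chat.completions.create(model="gpt-4o", messages=messages)
        except RateLimitError as err:
            last_error = err  # this key hit its limit; rotate to the next
    raise last_error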

Next steps