The ngrok AI Gateway lets you route requests to AI providers like OpenAI, Anthropic, and Google through a single endpoint. If one provider fails, it automatically fails over to another. If an API key hits its rate limit, it rotates to the next one.

Why use the AI Gateway?

Failover & Routing

Automatic retries on errors and timeouts. Customize the routing logic to prefer cheaper models, specific providers, or your own criteria.

One Endpoint, Many Providers

Use the same endpoint for OpenAI, Anthropic, Google, and others. Switch providers without changing your code.

OpenAI SDK Compatible

Works with any OpenAI SDK. Just change the base URL and you're connected.

Self-Hosted Models

Route to local models like Ollama or vLLM alongside cloud providers.
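Self-hosted servers slot in because they speak the same OpenAI-compatible protocol. As a quick illustration of that compatibility, here is the OpenAI SDK pointed directly at a local Ollama server (assuming Ollama's default port; the model name stands in for whatever you have pulled locally):

from openai import OpenAI

# Ollama exposes an OpenAI-compatible API on localhost:11434 by default.
local = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama",  # required by the SDK, ignored by Ollama
)

response = local.chat.completions.create(
    model="llama3",  # assumed: any model you have pulled locally
    messages=[{"role": "user", "content": "Hello!"}],
)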

Quick example

Point your OpenAI SDK at your ngrok endpoint:
from openai import OpenAI

# Any OpenAI SDK works; only the base URL changes.
client = OpenAI(
    base_url="https://your-endpoint.ngrok.app/v1",
    api_key="your-api-key",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
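Switching providers is just a different model name in the same call. For example, assuming your gateway is configured with an Anthropic backend (the exact model identifiers you can request depend on your configuration; the name below is illustrative):

response = client.chat.completions.create(
    model="claude-3-5-sonnet",  # illustrative name; routed to Anthropic by your config
    messages=[{"role": "user", "content": "Hello!"}],
)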
Behind the scenes, the AI Gateway:
  1. Receives your request
  2. Selects which model and provider to use (based on your configuration)
  3. Forwards the request with the appropriate provider API key
  4. If it fails, retries with the next option in your failover chain
  5. Returns the response
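Conceptually, that loop looks like the sketch below. This is an illustration of the failover behavior, not the gateway's actual implementation; the provider objects and their send method are hypothetical:

class ProviderError(Exception):
    """Stand-in for any upstream failure: errors, timeouts, rate limits."""

def route(request, failover_chain):
    """Illustrative only: try each configured option in order."""
    last_error = None
    for provider in failover_chain:  # step 2: selection comes from your config
        try:
            return provider.send(request)  # step 3: forward with that provider's key
        except ProviderError as err:
            last_error = err  # step 4: fall through to the next option
    raise last_error  # every option in the chain failed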

What can you do?

Multi-provider failover: configure OpenAI as primary, Anthropic as backup.
Multi-key rotation: use multiple provider API keys to avoid rate limits (see the sketch after this list).
Custom selection strategies: define exactly how models are selected using CEL expressions.
Cost-based routing: route to the cheapest available model automatically.
Access control: restrict which providers and models clients can use.
Self-hosted models: route to Ollama, vLLM, or other local inference servers.
Content modification: redact PII, sanitize responses, or inject prompts.
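
To make one of these concrete: multi-key rotation is the kind of logic you would otherwise hand-roll in every client. A minimal sketch of the pattern the gateway automates, assuming two placeholder OpenAI keys:

from openai import OpenAI, RateLimitError

API_KEYS = ["sk-key-one", "sk-key-two"]  # placeholder keys

def chat_with_rotation(messages):
    """Illustrative only: move to the next key when one is throttled."""
    last_error = None
    for key in API_KEYS:
        client = OpenAI(api_key=key)
        try:
            return client.chat.completions.create(model="gpt-4o", messages=messages)
        except RateLimitError as err:
            last_error = err  # this key hit its limit; rotate to the next
    raise last_error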

Next steps