Prerequisite: You need an AI Gateway endpoint before continuing. Create one using the dashboard quickstart or follow the manual setup guide.
The AI Gateway is compatible with OpenAI’s official SDKs. Change the base URL to route requests through your gateway.

Installation

pip install openai

Basic usage

Point the SDK at your AI Gateway endpoint. The API key you pass depends on your setup:
  • AI Gateway API Keys (recommended): use your AI Gateway API Key (format: ng-xxxxx-g1-xxxxx). ngrok manages provider keys for you.
  • BYOK (passthrough mode): use your provider API key (for example, sk-... from OpenAI). See Bring Your Own Keys.
from openai import OpenAI

client = OpenAI(
    base_url="https://your-ai-gateway.ngrok.app/v1",
    api_key="ng-xxxxx-g1-xxxxx"  # Your AI Gateway API Key
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(response.choices[0].message.content)

Streaming

The AI Gateway supports streaming responses:
from openai import OpenAI

client = OpenAI(
    base_url="https://your-ai-gateway.ngrok.app/v1",
    api_key="ng-xxxxx-g1-xxxxx"  # Your AI Gateway API Key
)

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a haiku about APIs"}],
    stream=True
)

for chunk in stream:
    # Some chunks (for example, a trailing usage chunk) may carry no choices.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
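If you also need the complete text once the stream finishes, accumulate the deltas as they arrive. A minimal sketch; the `collect` helper below is illustrative, not part of the SDK:

```python
def collect(stream) -> str:
    """Print deltas as they arrive and return the full response text."""
    parts = []
    for chunk in stream:
        # Guard: some chunks may carry no choices.
        if chunk.choices and chunk.choices[0].delta.content:
            parts.append(chunk.choices[0].delta.content)
            print(chunk.choices[0].delta.content, end="")
    return "".join(parts)
```

Call it as `full_text = collect(stream)` in place of the loop above.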

Using different providers

Route to different providers using model prefixes:
from openai import OpenAI

client = OpenAI(
    base_url="https://your-ai-gateway.ngrok.app/v1",
    api_key="ng-xxxxx-g1-xxxxx"  # Your AI Gateway API Key
)

# OpenAI
response = client.chat.completions.create(model="openai:gpt-4o", messages=[...])

# Anthropic (through the gateway)
response = client.chat.completions.create(model="anthropic:claude-3-5-sonnet-latest", messages=[...])

# Your self-hosted Ollama
response = client.chat.completions.create(model="ollama:llama3.2", messages=[...])

Automatic model selection

Let the gateway choose the best model:
from openai import OpenAI

client = OpenAI(
    base_url="https://your-ai-gateway.ngrok.app/v1",
    api_key="ng-xxxxx-g1-xxxxx"  # Your AI Gateway API Key
)

response = client.chat.completions.create(
    model="ngrok/auto",  # Gateway selects based on your strategy
    messages=[{"role": "user", "content": "Hello!"}]
)

Embeddings

Generate embeddings through the gateway:
response = client.embeddings.create(
    model="openai:text-embedding-3-small",
    input="The quick brown fox jumps over the lazy dog"
)

embedding = response.data[0].embedding
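Embedding vectors are typically compared with cosine similarity. A dependency-free sketch; `cosine_similarity` is an illustrative helper, not an SDK function, and takes two vectors such as `response.data[0].embedding`:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# similarity = cosine_similarity(embedding, other_embedding)
```

Values near 1.0 indicate semantically similar texts; values near 0 indicate unrelated ones.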

Function calling

Tool/function calling works exactly as documented by OpenAI:
from openai import OpenAI

client = OpenAI(
    base_url="https://your-ai-gateway.ngrok.app/v1",
    api_key="ng-xxxxx-g1-xxxxx"  # Your AI Gateway API Key
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"}
                }
            }
        }
    }]
)
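The example above only requests a tool call; your code still has to detect it, run the function, and send the result back. A sketch of that loop, assuming a hypothetical local `get_weather` implementation:

```python
import json

def get_weather(location: str) -> dict:
    # Hypothetical local implementation.
    return {"location": location, "forecast": "sunny"}

def run_tool_call(tool_call) -> str:
    """Execute the requested tool and return its result as a JSON string."""
    args = json.loads(tool_call.function.arguments)
    if tool_call.function.name == "get_weather":
        return json.dumps(get_weather(**args))
    raise ValueError(f"unknown tool: {tool_call.function.name}")

# message = response.choices[0].message
# if message.tool_calls:
#     messages.append(message)  # echo the assistant's tool-call turn
#     for tc in message.tool_calls:
#         messages.append({"role": "tool", "tool_call_id": tc.id,
#                          "content": run_tool_call(tc)})
#     # ...then call client.chat.completions.create again with `messages`.
```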

Async usage

Use async clients for better performance:
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI(
    base_url="https://your-ai-gateway.ngrok.app/v1",
    api_key="ng-xxxxx-g1-xxxxx"  # Your AI Gateway API Key
)

async def main():
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}]
    )
    print(response.choices[0].message.content)

asyncio.run(main())
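The async client also lets you fan several requests out concurrently with asyncio.gather. A sketch of the pattern; `ask` here is a stand-in for the `client.chat.completions.create` call above:

```python
import asyncio

async def ask(prompt: str) -> str:
    # Stand-in for: (await client.chat.completions.create(
    #     model="gpt-4o",
    #     messages=[{"role": "user", "content": prompt}],
    # )).choices[0].message.content
    await asyncio.sleep(0)  # placeholder for network I/O
    return f"answer to {prompt!r}"

async def ask_all(prompts: list[str]) -> list[str]:
    # gather runs the requests concurrently and preserves input order.
    return await asyncio.gather(*(ask(p) for p in prompts))

answers = asyncio.run(ask_all(["Hello!", "Write a haiku about APIs"]))
```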

Error handling

The gateway handles many errors automatically through failover. For errors that reach your app:
from openai import OpenAI, APIError, RateLimitError

client = OpenAI(
    base_url="https://your-ai-gateway.ngrok.app/v1",
    api_key="ng-xxxxx-g1-xxxxx"  # Your AI Gateway API Key
)

try:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}]
    )
except RateLimitError:
    # All configured keys exhausted
    print("Rate limited across all providers")
except APIError as e:
    print(f"API error: {e}")
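If you want client-side retries on top of the gateway's failover, a simple exponential-backoff wrapper works. `with_retries` is an illustrative helper, not part of the SDK:

```python
import time

def with_retries(fn, retry_on=(Exception,), max_attempts=3, base_delay=1.0):
    """Call fn(), retrying with exponential backoff on the given exceptions."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: re-raise the last error
            time.sleep(base_delay * 2 ** attempt)

# response = with_retries(
#     lambda: client.chat.completions.create(
#         model="gpt-4o",
#         messages=[{"role": "user", "content": "Hello!"}],
#     ),
#     retry_on=(RateLimitError,),
# )
```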

Supported endpoints

The AI Gateway supports these OpenAI API endpoints:
Endpoint                Description
/v1/chat/completions    Chat completions
/v1/completions         Legacy completions
/v1/embeddings          Text embeddings
/v1/responses           Responses

Next steps