Prerequisite: You need an AI Gateway endpoint before continuing. Create one using the dashboard quickstart or follow the manual setup guide.
The AI Gateway is fully compatible with OpenAI’s official SDKs. Point the client’s base_url at your gateway to route all requests through it and get automatic failover, key rotation, and observability.

Installation

pip install openai

Basic usage

Point the SDK at your AI Gateway endpoint:
from openai import OpenAI

client = OpenAI(
    base_url="https://your-ai-subdomain.ngrok.app/v1",
    api_key="your-api-key"
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(response.choices[0].message.content)

Streaming

The AI Gateway fully supports streaming responses:
from openai import OpenAI

client = OpenAI(
    base_url="https://your-ai-subdomain.ngrok.app/v1",
    api_key="your-api-key"
)

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a haiku about APIs"}],
    stream=True
)

for chunk in stream:
    # Some chunks (such as the final one) may carry no choices or no content.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
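If you need the full text after the stream finishes, the deltas can be accumulated as they arrive. A minimal sketch; collect_stream is a hypothetical helper, not part of the SDK:

```python
def collect_stream(stream):
    """Accumulate streamed delta text into the full response string."""
    parts = []
    for chunk in stream:
        # Some chunks (such as the final one) may carry no choices or no content.
        if chunk.choices and chunk.choices[0].delta.content:
            parts.append(chunk.choices[0].delta.content)
    return "".join(parts)
```

Because it only iterates over chunk-shaped objects, this helper works with any stream from the SDK and can be tested without a network call.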

Using different providers

Route to different providers using model prefixes:
from openai import OpenAI

client = OpenAI(
    base_url="https://your-ai-subdomain.ngrok.app/v1",
    api_key="unused"  # Gateway handles auth
)

# OpenAI
response = client.chat.completions.create(model="openai:gpt-4o", messages=[...])

# Anthropic (through the gateway)
response = client.chat.completions.create(model="anthropic:claude-3-5-sonnet-latest", messages=[...])

# Your self-hosted Ollama
response = client.chat.completions.create(model="ollama:llama3.2", messages=[...])
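The provider prefix is just part of the model string, so it can be built programmatically. A small sketch of the provider:model convention shown above; PROVIDERS and qualified_model are illustrative, not part of the SDK or gateway:

```python
# Default model per provider (illustrative mapping).
PROVIDERS = {
    "openai": "gpt-4o",
    "anthropic": "claude-3-5-sonnet-latest",
    "ollama": "llama3.2",
}

def qualified_model(provider: str) -> str:
    """Build the 'provider:model' string the gateway uses for routing."""
    return f"{provider}:{PROVIDERS[provider]}"

# response = client.chat.completions.create(
#     model=qualified_model("anthropic"),
#     messages=[{"role": "user", "content": "Hello!"}],
# )
```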

Automatic model selection

Let the gateway choose the best model:
from openai import OpenAI

client = OpenAI(
    base_url="https://your-ai-subdomain.ngrok.app/v1",
    api_key="unused"
)

response = client.chat.completions.create(
    model="ngrok/auto",  # Gateway selects based on your strategy
    messages=[{"role": "user", "content": "Hello!"}]
)

Embeddings

Generate embeddings through the gateway:
response = client.embeddings.create(
    model="text-embedding-3-small",
    input="The quick brown fox jumps over the lazy dog"
)

embedding = response.data[0].embedding
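Embeddings are typically compared with cosine similarity. A self-contained sketch using only the standard library; cosine_similarity is an illustrative helper, not an SDK call:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# similarity = cosine_similarity(embedding, other_embedding)
```

Values near 1.0 indicate semantically similar texts; values near 0 indicate unrelated ones.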

Function calling

Tool/function calling works exactly as documented by OpenAI:
from openai import OpenAI

client = OpenAI(
    base_url="https://your-ai-subdomain.ngrok.app/v1",
    api_key="your-api-key"
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"}
                }
            }
        }
    }]
)
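When the model decides to call a tool, the response carries tool_calls with JSON-encoded arguments that your code must parse and dispatch. A minimal sketch; dispatch_tool_call and the get_weather stub are hypothetical helpers, not part of the SDK:

```python
import json

def get_weather(location: str) -> str:
    """Stub implementation; replace with a real weather lookup."""
    return f"Sunny in {location}"

# Map tool names (as declared in the request) to local functions.
TOOLS = {"get_weather": get_weather}

def dispatch_tool_call(tool_call):
    """Parse a tool call's JSON arguments and invoke the matching function."""
    fn = TOOLS[tool_call.function.name]
    args = json.loads(tool_call.function.arguments)
    return fn(**args)

# message = response.choices[0].message
# if message.tool_calls:
#     for call in message.tool_calls:
#         result = dispatch_tool_call(call)
```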

Async usage

Use the async client to run multiple requests concurrently:
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI(
    base_url="https://your-ai-subdomain.ngrok.app/v1",
    api_key="your-api-key"
)

async def main():
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}]
    )
    print(response.choices[0].message.content)

asyncio.run(main())
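The async client pays off when several prompts run at once. A sketch of the fan-out pattern; gather_completions is a hypothetical helper that accepts any async fetch function, so the same shape works with client.chat.completions.create:

```python
import asyncio

async def gather_completions(fetch, prompts):
    """Run one fetch per prompt concurrently; results keep prompt order."""
    return await asyncio.gather(*(fetch(p) for p in prompts))

# async def fetch(prompt):
#     response = await client.chat.completions.create(
#         model="gpt-4o",
#         messages=[{"role": "user", "content": prompt}],
#     )
#     return response.choices[0].message.content
#
# answers = asyncio.run(gather_completions(fetch, ["Hello!", "What is an AI gateway?"]))
```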

Error handling

The gateway handles many errors automatically through failover. For errors that reach your app:
from openai import OpenAI, APIError, RateLimitError

client = OpenAI(
    base_url="https://your-ai-subdomain.ngrok.app/v1",
    api_key="your-api-key"
)

try:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}]
    )
except RateLimitError:
    # All configured keys exhausted
    print("Rate limited across all providers")
except APIError as e:
    print(f"API error: {e}")
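When failover still surfaces a RateLimitError, retrying with exponential backoff is a common client-side fallback. A sketch under the assumption that retrying is acceptable for your workload; backoff_delays and with_retries are illustrative helpers, not SDK features:

```python
import time

def backoff_delays(retries, base=1.0):
    """Exponential backoff schedule: base, 2*base, 4*base, ..."""
    return [base * (2 ** i) for i in range(retries)]

def with_retries(call, retries=3, retry_on=(Exception,), base=1.0):
    """Invoke call(), retrying on the given exceptions with backoff."""
    for delay in backoff_delays(retries, base):
        try:
            return call()
        except retry_on:
            time.sleep(delay)
    return call()  # final attempt; let any exception propagate

# response = with_retries(
#     lambda: client.chat.completions.create(
#         model="gpt-4o",
#         messages=[{"role": "user", "content": "Hello!"}],
#     ),
#     retry_on=(RateLimitError,),
# )
```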

Supported endpoints

The AI Gateway supports these OpenAI API endpoints:
Endpoint                Description
/v1/chat/completions    Chat completions (GPT-4, Claude, etc.)
/v1/completions         Legacy completions
/v1/embeddings          Text embeddings
/v1/models              List available models
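Because /v1/models is supported, the SDK's client.models.list() works against the gateway. A sketch that filters the returned IDs by provider prefix; models_for_provider is a hypothetical helper, and the prefix convention follows the routing examples above:

```python
def models_for_provider(model_ids, provider):
    """Keep only model IDs carrying the given 'provider:' prefix."""
    prefix = provider + ":"
    return [m for m in model_ids if m.startswith(prefix)]

# ids = [m.id for m in client.models.list().data]
# print(models_for_provider(ids, "anthropic"))
```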

Next steps