Prerequisite: You need an AI Gateway endpoint before continuing. Create one using the dashboard quickstart or follow the manual setup guide.
The AI Gateway is compatible with OpenAI’s official SDKs. Change the base URL to route requests through your gateway.

Installation

pip install openai

Basic usage

Point the SDK at your AI Gateway endpoint. The API key you pass depends on your setup:
  • AI Gateway API Keys (recommended): use your AI Gateway API Key (format: ng-xxxxx-g1-xxxxx). ngrok manages provider keys for you.
  • BYOK (passthrough mode): use your provider API key (for example, sk-... from OpenAI). See Bring Your Own Keys.
from openai import OpenAI

client = OpenAI(
    base_url="https://your-ai-gateway.ngrok.app/v1",
    api_key="ng-xxxxx-g1-xxxxx"  # Your AI Gateway API Key
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(response.choices[0].message.content)

Streaming

The AI Gateway supports streaming responses:
from openai import OpenAI

client = OpenAI(
    base_url="https://your-ai-gateway.ngrok.app/v1",
    api_key="ng-xxxxx-g1-xxxxx"  # Your AI Gateway API Key
)

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a haiku about APIs"}],
    stream=True
)

for chunk in stream:
    # Some chunks (for example, a trailing usage chunk) may carry no choices.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
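If you also need the complete text once the stream finishes, accumulate the deltas as they arrive. A minimal sketch; the `collect` helper below is illustrative, not part of the SDK:

```python
def collect(stream) -> str:
    """Print deltas as they arrive and return the full response text."""
    parts = []
    for chunk in stream:
        # Guard: some chunks may carry no choices.
        if chunk.choices and chunk.choices[0].delta.content:
            parts.append(chunk.choices[0].delta.content)
            print(chunk.choices[0].delta.content, end="")
    return "".join(parts)
```

Call it as `full_text = collect(stream)` in place of the loop above.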

Using different providers

Route to different providers using model prefixes:
from openai import OpenAI

client = OpenAI(
    base_url="https://your-ai-gateway.ngrok.app/v1",
    api_key="ng-xxxxx-g1-xxxxx"  # Your AI Gateway API Key
)

# OpenAI
response = client.chat.completions.create(model="openai:gpt-4o", messages=[...])

# Anthropic (through the gateway)
response = client.chat.completions.create(model="anthropic:claude-3-5-sonnet-latest", messages=[...])

# Your self-hosted Ollama
response = client.chat.completions.create(model="ollama:llama3.2", messages=[...])

Automatic model selection

Let the gateway choose the best model:
from openai import OpenAI

client = OpenAI(
    base_url="https://your-ai-gateway.ngrok.app/v1",
    api_key="ng-xxxxx-g1-xxxxx"  # Your AI Gateway API Key
)

response = client.chat.completions.create(
    model="ngrok/auto",  # Gateway selects based on your strategy
    messages=[{"role": "user", "content": "Hello!"}]
)

Embeddings

Generate embeddings through the gateway:
response = client.embeddings.create(
    model="openai:text-embedding-3-small",
    input="The quick brown fox jumps over the lazy dog"
)

embedding = response.data[0].embedding
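Embedding vectors are typically compared with cosine similarity. A dependency-free sketch; `cosine_similarity` is an illustrative helper, not an SDK function, and takes two vectors such as `response.data[0].embedding`:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# similarity = cosine_similarity(embedding, other_embedding)
```

Values near 1.0 indicate semantically similar texts; values near 0 indicate unrelated ones.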

Function calling

Tool/function calling works exactly as documented by OpenAI:
from openai import OpenAI

client = OpenAI(
    base_url="https://your-ai-gateway.ngrok.app/v1",
    api_key="ng-xxxxx-g1-xxxxx"  # Your AI Gateway API Key
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"}
                }
            }
        }
    }]
)
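The example above only requests a tool call; your code still has to detect it, run the function, and send the result back. A sketch of that loop, assuming a hypothetical local `get_weather` implementation:

```python
import json

def get_weather(location: str) -> dict:
    # Hypothetical local implementation.
    return {"location": location, "forecast": "sunny"}

def run_tool_call(tool_call) -> str:
    """Execute the requested tool and return its result as a JSON string."""
    args = json.loads(tool_call.function.arguments)
    if tool_call.function.name == "get_weather":
        return json.dumps(get_weather(**args))
    raise ValueError(f"unknown tool: {tool_call.function.name}")

# message = response.choices[0].message
# if message.tool_calls:
#     messages.append(message)  # echo the assistant's tool-call turn
#     for tc in message.tool_calls:
#         messages.append({"role": "tool", "tool_call_id": tc.id,
#                          "content": run_tool_call(tc)})
#     # ...then call client.chat.completions.create again with `messages`.
```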

Async usage

Use async clients for better performance:
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI(
    base_url="https://your-ai-gateway.ngrok.app/v1",
    api_key="ng-xxxxx-g1-xxxxx"  # Your AI Gateway API Key
)

async def main():
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}]
    )
    print(response.choices[0].message.content)

asyncio.run(main())
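The async client also lets you fan several requests out concurrently with asyncio.gather. A sketch of the pattern; `ask` here is a stand-in for the `client.chat.completions.create` call above:

```python
import asyncio

async def ask(prompt: str) -> str:
    # Stand-in for: (await client.chat.completions.create(
    #     model="gpt-4o",
    #     messages=[{"role": "user", "content": prompt}],
    # )).choices[0].message.content
    await asyncio.sleep(0)  # placeholder for network I/O
    return f"answer to {prompt!r}"

async def ask_all(prompts: list[str]) -> list[str]:
    # gather runs the requests concurrently and preserves input order.
    return await asyncio.gather(*(ask(p) for p in prompts))

answers = asyncio.run(ask_all(["Hello!", "Write a haiku about APIs"]))
```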

Error handling

The gateway handles many errors automatically through failover. For errors that reach your app:
from openai import OpenAI, APIError, RateLimitError

client = OpenAI(
    base_url="https://your-ai-gateway.ngrok.app/v1",
    api_key="ng-xxxxx-g1-xxxxx"  # Your AI Gateway API Key
)

try:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}]
    )
except RateLimitError:
    # All configured keys exhausted
    print("Rate limited across all providers")
except APIError as e:
    print(f"API error: {e}")
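If you want client-side retries on top of the gateway's failover, a simple exponential-backoff wrapper works. `with_retries` is an illustrative helper, not part of the SDK:

```python
import time

def with_retries(fn, retry_on=(Exception,), max_attempts=3, base_delay=1.0):
    """Call fn(), retrying with exponential backoff on the given exceptions."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: re-raise the last error
            time.sleep(base_delay * 2 ** attempt)

# response = with_retries(
#     lambda: client.chat.completions.create(
#         model="gpt-4o",
#         messages=[{"role": "user", "content": "Hello!"}],
#     ),
#     retry_on=(RateLimitError,),
# )
```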

Supported endpoints

The AI Gateway supports these OpenAI API endpoints:
Endpoint                Description
/v1/chat/completions    Chat completions
/v1/completions         Legacy completions
/v1/embeddings          Text embeddings
/v1/responses           Responses

Next steps