The AI Gateway is compatible with OpenAI’s official SDKs. Change the base URL to route requests through your gateway.
## Installation
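The gateway works with the standard OpenAI Python SDK, so installation is the usual:

```shell
pip install openai
```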
## Basic usage
Point the SDK at your AI Gateway endpoint.
### Which API key do I use?

- **AI Gateway API Keys (recommended):** use your AI Gateway API Key (format: `ng-xxxxx-g1-xxxxx`). ngrok handles provider keys for you.
- **BYOK (passthrough mode):** use your provider API key (for example, `sk-...` from OpenAI). See Bring Your Own Keys.
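Rather than hard-coding the key, a common pattern is to read it from an environment variable. A minimal sketch — the variable name `AI_GATEWAY_API_KEY` is an illustrative choice, not something the gateway requires:

```python
import os


def gateway_api_key() -> str:
    """Read the AI Gateway API Key from the environment, failing fast if unset."""
    # AI_GATEWAY_API_KEY is a hypothetical variable name; use whatever fits your deployment.
    key = os.environ.get("AI_GATEWAY_API_KEY")
    if not key:
        raise RuntimeError("Set AI_GATEWAY_API_KEY before starting the app")
    return key
```

Pass the result as `api_key=` when constructing the client.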
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://your-ai-gateway.ngrok.app/v1",
    api_key="ng-xxxxx-g1-xxxxx",  # Your AI Gateway API Key
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)

print(response.choices[0].message.content)
```
## Streaming
The AI Gateway supports streaming responses:
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://your-ai-gateway.ngrok.app/v1",
    api_key="ng-xxxxx-g1-xxxxx",  # Your AI Gateway API Key
)

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a haiku about APIs"}],
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```
## Using different providers
Route to different providers using model prefixes:
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://your-ai-gateway.ngrok.app/v1",
    api_key="unused",  # Gateway handles auth
)

# OpenAI
response = client.chat.completions.create(model="openai:gpt-4o", messages=[...])

# Anthropic (through the gateway)
response = client.chat.completions.create(model="anthropic:claude-3-5-sonnet-latest", messages=[...])

# Your self-hosted Ollama
response = client.chat.completions.create(model="ollama:llama3.2", messages=[...])
```
## Automatic model selection
Let the gateway choose the best model:
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://your-ai-gateway.ngrok.app/v1",
    api_key="unused",
)

response = client.chat.completions.create(
    model="ngrok/auto",  # Gateway selects based on your strategy
    messages=[{"role": "user", "content": "Hello!"}],
)
```
## Embeddings
Generate embeddings through the gateway:
```python
response = client.embeddings.create(
    model="openai:text-embedding-3-small",
    input="The quick brown fox jumps over the lazy dog",
)

embedding = response.data[0].embedding
```
## Function calling
Tool/function calling works exactly as documented by OpenAI:
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://your-ai-gateway.ngrok.app/v1",
    api_key="ng-xxxxx-g1-xxxxx",  # Your AI Gateway API Key
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"}
                }
            }
        }
    }]
)
```
## Async usage
Use the async client to issue requests concurrently without blocking:
```python
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI(
    base_url="https://your-ai-gateway.ngrok.app/v1",
    api_key="ng-xxxxx-g1-xxxxx",  # Your AI Gateway API Key
)

async def main():
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(response.choices[0].message.content)

asyncio.run(main())
```
## Error handling
The gateway handles many errors automatically through failover. For errors that reach your app:
```python
from openai import OpenAI, APIError, RateLimitError

client = OpenAI(
    base_url="https://your-ai-gateway.ngrok.app/v1",
    api_key="ng-xxxxx-g1-xxxxx",  # Your AI Gateway API Key
)

try:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}],
    )
except RateLimitError:
    # All configured keys exhausted
    print("Rate limited across all providers")
except APIError as e:
    print(f"API error: {e}")
```
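For transient failures that do reach your app, a client-side retry with exponential backoff is a common complement to the gateway's failover. A sketch — the helper name, retry budget, and jitter amount are assumptions, not gateway features:

```python
import random
import time


def with_backoff(fn, max_attempts: int = 4, base_delay: float = 0.5):
    """Call fn(); on failure, retry with exponentially growing, jittered delays."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # budget exhausted; surface the last error
            # Delays grow 0.5s, 1s, 2s, ... plus up to 100ms of jitter
            # so concurrent clients don't retry in lockstep.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
```

Usage: `with_backoff(lambda: client.chat.completions.create(...))`. In production you would narrow the `except` clause to retryable errors (e.g. `RateLimitError`).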
## Supported endpoints
The AI Gateway supports these OpenAI API endpoints:
| Endpoint | Description |
|---|---|
| `/v1/chat/completions` | Chat completions |
| `/v1/completions` | Legacy completions |
| `/v1/embeddings` | Text embeddings |
| `/v1/responses` | Responses |
## Next steps