The AI Gateway can route requests to any OpenAI-compatible endpoint beyond the built-in providers. This enables you to use self-hosted models, private deployments, or alternative AI services.
Requirements
Custom providers must:
- Expose an OpenAI-compatible API - Same request/response format as OpenAI
- Be reachable from ngrok - Either via an HTTPS URL or an ngrok internal endpoint
- Support the endpoints you need - /v1/chat/completions, /v1/embeddings, etc.
URL requirements
The base_url configuration has specific requirements based on the URL type:
| URL Type | Scheme | Example |
|---|---|---|
| External HTTPS | https:// only | https://api.custom.com/v1 |
| ngrok internal endpoint | http:// or https:// | https://my-service.internal |
HTTP URLs are only allowed for ngrok .internal endpoints. All external URLs must use HTTPS.
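For example, a service exposed on an ngrok internal endpoint can use plain HTTP (a minimal sketch; the endpoint name is hypothetical):

```yaml
providers:
  - id: "my-local-service"
    # http:// is allowed here only because this is an ngrok .internal endpoint
    base_url: "http://my-service.internal"
```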
External HTTPS URLs
For publicly accessible services with valid TLS certificates:
```yaml
providers:
  - id: "my-provider"
    base_url: "https://api.my-service.com/v1"
    api_keys:
      - value: ${secrets.get('my-provider', 'api-key')}
```
ngrok internal endpoints
For services running behind ngrok (local services, private networks), use internal endpoints:
```yaml
providers:
  - id: "my-local-service"
    base_url: "https://my-service.internal"
    # No API key needed if your service doesn't require one
```
Internal endpoints let you:
- Route to local services without public exposure
- Connect to other ngrok endpoints in your account
- Use HTTP for services without TLS
Basic configuration
```yaml
on_http_request:
  - type: ai-gateway
    config:
      providers:
        - id: "my-custom-provider"
          base_url: "https://my-ai-service.example.com/v1"
          api_keys:
            - value: ${secrets.get('custom', 'api-key')}
          models:
            - id: "my-model"
```
Configuration fields
| Field | Required | Description |
|---|---|---|
| id | Yes | Unique identifier for the provider |
| base_url | Yes | Base URL for the provider's API |
| api_keys | No | API keys for authentication |
| models | No | List of available models |
| id_aliases | No | Alternative names for the provider |
| metadata | No | Custom metadata for selection strategies |
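The optional fields can be combined on a single provider. The sketch below illustrates id_aliases and provider-level metadata; the values are illustrative, and id_aliases is assumed to take a list of alternative names:

```yaml
providers:
  - id: "my-custom-provider"
    # Hypothetical alternative names for referring to this provider
    id_aliases:
      - "custom"
    base_url: "https://my-ai-service.example.com/v1"
    # Illustrative metadata for use in selection strategies
    metadata:
      region: "us-west"
      deployment: "self-hosted"
```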
Defining models
Specify which models your custom provider offers:
```yaml
providers:
  - id: "my-provider"
    base_url: "https://my-service.internal"
    models:
      - id: "llama3-70b"
      - id: "mistral-7b"
      - id: "codellama-34b"
```
Add metadata to models for use in selection strategies:
```yaml
providers:
  - id: "my-provider"
    base_url: "https://my-service.internal"
    models:
      - id: "llama3-70b"
        metadata:
          gpu: "A100"
          quantization: "none"
          max_context: 8192
```
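Selection strategies can then filter on that metadata. The sketch below assumes model metadata is exposed to strategy expressions as m.metadata; the exact field path is an assumption:

```yaml
model_selection:
  strategy:
    # Assumes model metadata is addressable as m.metadata in strategy expressions
    - "ai.models.filter(m, m.metadata.gpu == 'A100')"
```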
Authentication
With provider API keys
If your service requires authentication:
```yaml
providers:
  - id: "my-provider"
    base_url: "https://my-service.example.com"
    api_keys:
      - value: ${secrets.get('my-provider', 'api-key')}
```
Without provider API keys
Some self-hosted services don’t require authentication:
```yaml
providers:
  - id: "my-provider"
    base_url: "https://my-service.internal"
    # No api_keys needed
```
For services requiring non-standard authentication, set custom headers:
```yaml
on_http_request:
  - type: ai-gateway
    config:
      headers:
        X-Custom-Auth: ${secrets.get('my-provider', 'token')}
      providers:
        - id: "my-provider"
          base_url: "https://my-service.example.com"
```
Timeouts
Self-hosted models can be slower than cloud providers. Adjust timeouts as needed:
```yaml
on_http_request:
  - type: ai-gateway
    config:
      per_request_timeout: "120s"
      total_timeout: "5m"
      providers:
        - id: "my-provider"
          base_url: "https://my-service.internal"
```
Restricting access
Allow only your custom provider and block cloud providers:
```yaml
on_http_request:
  - type: ai-gateway
    config:
      only_allow_configured_providers: true
      only_allow_configured_models: true
      providers:
        - id: "my-provider"
          base_url: "https://my-service.internal"
          models:
            - id: "llama3"
            - id: "mistral"
```
Failover patterns
Custom provider with cloud fallback
Use your self-hosted model as primary with cloud backup:
```yaml
on_http_request:
  - type: ai-gateway
    config:
      providers:
        - id: "my-provider"
          base_url: "https://my-service.internal"
          models:
            - id: "llama3"
        - id: "openai"
          api_keys:
            - value: ${secrets.get('openai', 'api-key')}
      model_selection:
        strategy:
          - "ai.models.filter(m, m.provider_id == 'my-provider')"
          - "ai.models.filter(m, m.provider_id == 'openai')"
```
The first strategy that returns models wins: if your custom provider has matching models, only those are tried, and OpenAI is used only when no custom provider models match. For cross-provider failover when requests fail, have clients specify multiple models in the request, for example models: ["my-provider:llama3", "openai:gpt-4o"].
Multiple custom providers
Load balance across multiple self-hosted instances:
```yaml
on_http_request:
  - type: ai-gateway
    config:
      only_allow_configured_providers: true
      providers:
        - id: "inference-1"
          base_url: "https://inference-1.internal"
          models:
            - id: "llama3"
        - id: "inference-2"
          base_url: "https://inference-2.internal"
          models:
            - id: "llama3"
      model_selection:
        strategy:
          # Randomize across configured providers only
          - "ai.models.random()"
```
Troubleshooting
Connection refused
- Verify the base_url is correct and reachable
- For internal endpoints, ensure the ngrok tunnel is running
- Check firewall rules allow traffic
HTTPS required error
External URLs must use HTTPS. For local services, use ngrok internal endpoints:
```bash
# Expose local service with internal endpoint
ngrok http 8000 --url https://my-service.internal
```
Authentication errors
- Verify API key is correct in secrets
- Check if the service requires specific headers
- Some services expect Authorization: Bearer while others expect a different header such as Api-Key
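If your service expects its key in a header other than Authorization, the custom headers configuration shown earlier can help. A sketch, assuming a hypothetical Api-Key header (the header name and whether it replaces the default depend on your service):

```yaml
on_http_request:
  - type: ai-gateway
    config:
      headers:
        # Hypothetical header name; use whatever your service expects
        Api-Key: ${secrets.get('my-provider', 'api-key')}
      providers:
        - id: "my-provider"
          base_url: "https://my-service.example.com"
```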
Model not found
- Verify the model ID matches exactly what the service expects
- Check if the model is loaded/downloaded on the service
- Some services require specific model name formats
Integration guides
Step-by-step setup instructions for specific platforms: