
Configuration and setup

Why am I getting “provider not allowed”?

This error occurs when only_allow_configured_providers: true is set and you’re trying to use a provider that isn’t explicitly configured. Solution:
only_allow_configured_providers: true
providers:
  - id: openai      # Add your provider here
    api_keys:
      - value: ${secrets.get('openai', 'key')}
Or set only_allow_configured_providers: false to allow all providers.

Why am I getting “model unknown”?

This error can occur for several reasons:
  1. Model restrictions enabled - only_allow_configured_models: true is set and the model isn’t explicitly listed
  2. Missing provider prefix - For unknown models (not in catalog), you must include a provider prefix like openai:new-model
  3. Provider not configured - The provider for the unknown model isn’t configured or allowed
Solutions: Add the model to your configuration:
only_allow_configured_models: true
providers:
  - id: openai
    models:
      - id: "gpt-4o"      # Add your model here
      - id: "gpt-4o-mini"
Or use a provider prefix for unknown models:
{
  "model": "openai:gpt-5-preview",
  "messages": [{"role": "user", "content": "Hello"}]
}
Or set only_allow_configured_models: false to allow all models.

How do I limit which models users can access?

Use model restrictions:
only_allow_configured_models: true
providers:
  - id: openai
    models:
      - id: "gpt-4o-mini"  # Only allow specific models

Can I use the gateway with the Vercel AI SDK?

Yes, the gateway is fully compatible:
import { createOpenAI } from "@ai-sdk/openai"

const openai = createOpenAI({
  baseURL: "https://ai-gateway.ngrok.app",
})
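With the client above, a minimal sketch using the AI SDK's generateText (the model name is illustrative):
import { generateText } from "ai"

// The gateway applies routing and failover before the request reaches the provider
const { text } = await generateText({
  model: openai("gpt-4o"),
  prompt: "Hello",
})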
See SDK Integration for details.

Failover

How does failover work?

When a request fails, the gateway tries the next candidate in its list. This includes failures from:
  • Timeouts
  • HTTP errors (4xx, 5xx)
  • Connection failures
Failover order:
  1. Next API key for the same model/provider
  2. Next model candidate (from model selection or client models array)
The gateway never retries the same model/key combination—it always moves to the next candidate. See Error Handling for details.
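For example, with two keys configured, a failed request on the first key is retried on the second before the gateway moves to the next model candidate (a sketch; secret names are illustrative):
providers:
  - id: openai
    api_keys:
      - value: ${secrets.get('openai', 'primary')}    # attempt 1
      - value: ${secrets.get('openai', 'secondary')}  # attempt 2, then the next model candidate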

How long does the gateway try before giving up?

Controlled by total_timeout (default: 5 minutes):
total_timeout: "3m"  # Allow up to 3 minutes for all attempts

Can I disable automatic failover?

Not directly, but you can limit the attempts by:
  • Configuring only one API key per provider
  • Having clients specify a single model
  • Setting short timeouts:
per_request_timeout: "5s"
total_timeout: "10s"

Will the gateway failover to a different provider automatically?

Yes, if the client specifies multiple models that span different providers:
  1. Using the models array: "models": ["openai:gpt-4o", "anthropic:claude-3-5-sonnet-20241022"]
  2. Or using just the model name (like "model": "gpt-4o") when selection strategies return candidates from multiple providers
The gateway automatically tries alternative providers when the primary fails. You can also specify fallback models in the request body:
{
  "model": "gpt-4o",
  "models": ["claude-3-5-sonnet-20241022"],
  "messages": [{"role": "user", "content": "Hello"}]
}

Keys and authentication

Do I need to configure API keys in the gateway?

No, it’s optional. By default (passthrough mode), the gateway forwards whatever key your SDK sends. Configure keys in the gateway for:
  • Key rotation and failover
  • Hiding keys from clients
  • Key-level metrics
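In passthrough mode, the client keeps its own provider key and only the base URL changes. A minimal sketch with the OpenAI Node SDK (the gateway URL is illustrative):
import OpenAI from "openai"

// The gateway forwards this key to the provider as-is
const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: "https://ai-gateway.ngrok.app",
})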

How do I secure my gateway when using server-side API keys?

When you configure API keys in the gateway, anyone who can reach your public endpoint can make requests using your keys. Add authorization to protect it:
on_http_request:
  - expressions:
      - req.headers['authorization'][0] != 'Bearer ' + secrets.get('gateway-auth', 'access-token')
    actions:
      - type: custom-response
        config:
          status_code: 401
          body: '{"error": {"message": "Unauthorized"}}'

  - actions:
      - type: ai-gateway
        config:
          # ...
See Securing Your Gateway for complete examples.

Can I use different keys for different teams?

You can configure multiple keys per provider, but they are used for failover rather than per-team attribution:
providers:
  - id: openai
    api_keys:
      - value: ${secrets.get('openai', 'team-a')}
      - value: ${secrets.get('openai', 'team-b')}
Note: Per-key metadata is not currently supported. Use separate endpoints or providers for team-based tracking.

How do I rotate API keys?

Add the new key, deploy, then remove the old key:
api_keys:
  - value: ${secrets.get('openai', 'old-key')}
  - value: ${secrets.get('openai', 'new-key')}  # Add new
After confirming the new key works:
api_keys:
  - value: ${secrets.get('openai', 'new-key')}  # Remove old

Performance and costs

Does the gateway add latency?

Yes, but the overhead is minimal (~10-15ms) for parsing, token counting, and routing. Provider response time dominates total latency.

How are tokens counted?

Using tiktoken (OpenAI’s tokenizer) for estimation, with provider-reported counts used when available. Token counts are available in metrics and event destinations.

Does the gateway charge for token usage?

No, you only pay providers for actual token usage. The gateway itself doesn’t charge per token. Check ngrok’s pricing for gateway usage costs.

Can I cache responses to reduce costs?

Currently no. Caching may be added in future versions.

Security and privacy

Is my data stored by ngrok?

No, unless Traffic Inspector is enabled. By default:
  • Request/response bodies processed in-memory only
  • Data immediately discarded after processing
  • Only metadata (token counts, latencies) retained
See Traffic Inspector for details.

Can I use the gateway for sensitive data?

Yes, with considerations:
  • Disable Traffic Inspector in production
  • Review your compliance requirements
  • Understand that data still goes to AI providers

How do I redact PII automatically?

Use Traffic Policy’s find-and-replace actions before the AI Gateway action:
on_http_request:
  - actions:
      - type: request-body-find-replace
        config:
          replacements:
            - from: "\\b\\d{3}-\\d{2}-\\d{4}\\b"  # SSN pattern
              to: "[REDACTED]"
      - type: ai-gateway
        config:
          # ...
For streaming responses, use sse-find-replace in on_event_stream_message. See Modifying Requests and Modifying Responses for details.
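As a sketch, assuming sse-find-replace accepts the same replacements shape as the request-body action above:
on_event_stream_message:
  - actions:
      - type: sse-find-replace
        config:
          replacements:
            - from: "\\b\\d{3}-\\d{2}-\\d{4}\\b"  # same SSN pattern, applied to streamed chunks
              to: "[REDACTED]"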

Models and providers

Which providers are supported?

Built-in support for:
  • OpenAI
  • Anthropic
  • Google
  • DeepSeek
  • OpenRouter
  • Hyperbolic
  • Inception Labs
  • Inference Net
Plus any self-hosted OpenAI-compatible endpoint. See the Model Catalog for the full list.
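For example, enabling two built-in providers side by side (a sketch; secret names are illustrative):
providers:
  - id: openai
    api_keys:
      - value: ${secrets.get('openai', 'key')}
  - id: anthropic
    api_keys:
      - value: ${secrets.get('anthropic', 'key')}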

Can I use self-hosted models?

Yes, you can configure a custom provider:
providers:
  - id: ollama
    base_url: "https://ollama.internal"
    models:
      - id: "llama3"
See Custom Providers for details.

How do I know which models are available?

Built-in providers have pre-configured model catalogs. For custom providers, you must specify models manually.

What is ngrok/auto?

ngrok/auto is a special model name that tells the gateway to choose the model based on your model selection strategy:
{
  "model": "ngrok/auto",
  "messages": [{"role": "user", "content": "Hello"}]
}
This is equivalent to omitting the model field. Use it when you want the gateway to select the best model based on latency, cost, or other criteria you’ve configured.

Can clients use models not in the catalog?

Yes, if a client includes a provider prefix (like openai:new-model), the gateway passes the request through to that provider even if the model isn’t in the catalog. This lets clients use new models immediately. To restrict this behavior, use a model selection strategy:
model_selection:
  strategy:
    - "ai.models.filter(m, m.known)"  # Only allow catalog models
The m.known field is false for pass-through models not in the catalog.

Why isn’t my custom model working?

Ensure (see the configuration sketch below the list):
  1. Your endpoint is OpenAI-compatible
  2. The model name is configured correctly
  3. Authentication is set up
  4. The endpoint is reachable from ngrok
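A configuration covering points 2 through 4 might look like this (a sketch; the hostname, secret, and model names are illustrative):
providers:
  - id: my-llm
    base_url: "https://llm.internal.example.com/v1"  # must be OpenAI-compatible and reachable from ngrok
    api_keys:
      - value: ${secrets.get('my-llm', 'key')}
    models:
      - id: "my-model"  # must match the name your endpoint serves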

Monitoring and debugging

How do I view request metrics?

View in the ngrok dashboard:
  1. Navigate to your AI Gateway endpoint
  2. Click “Metrics” tab
Or export to external systems via Log Exporting.

Why are my requests failing silently?

Check:
  1. Traffic Inspector (if enabled) for error details
  2. Event destinations for error logs
  3. Provider-level error metrics

How do I debug which provider was used?

Enable event destinations at the endpoint level to stream detailed request logs. Configure event destinations in your ngrok endpoint configuration (not in the ai-gateway action config).

Can I see individual request/response bodies?

Yes, enable Traffic Inspector in development (not recommended for production with sensitive data).

Advanced usage

Can I implement custom routing logic?

Yes, using CEL expressions in model selection strategies:
model_selection:
  strategy:
    - "ai.models.filter(m, m.metrics.global.latency.upstream_ms_avg < 1000)"
    - "ai.models.random()"
See Model Selection Strategies.

How do I prioritize certain models or providers?

Use selection strategies to define preference order:
model_selection:
  strategy:
    - "ai.models.filter(m, m.provider_id == 'openai')"
    - "ai.models.filter(m, m.provider_id == 'anthropic')"
    - "ai.models"
The first strategy that returns models wins. If OpenAI models exist, only those are tried. Anthropic is only considered if no OpenAI models match.
Strategies control which models are considered, not failover order. For cross-provider failover when requests fail, have clients specify multiple models: models: ["openai:gpt-4o", "anthropic:claude-3-5-sonnet-20241022"].
See Model Selection Strategies.

Can I route based on request content?

Model selection is based on model/provider, but you can use Traffic Policy expressions to route different requests to different configurations:
on_http_request:
  - expressions:
      - req.headers['x-priority'][0] == 'high'
    actions:
      - type: ai-gateway
        config:
          model_selection:
            strategy:
              - "ai.models.filter(m, m.provider_id == 'openai')"
  - actions:
      - type: ai-gateway
        config:
          # Default configuration
You cannot currently route based on the message content itself.

Does the gateway support streaming?

Yes, streaming is fully supported. The gateway forwards SSE streams transparently.
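For example, with the AI SDK client from earlier, streamed tokens arrive incrementally (a sketch; the model name is illustrative):
import { createOpenAI } from "@ai-sdk/openai"
import { streamText } from "ai"

const openai = createOpenAI({
  baseURL: "https://ai-gateway.ngrok.app",
})

// Each chunk arrives as the provider streams it back through the gateway
const result = streamText({
  model: openai("gpt-4o"),
  prompt: "Hello",
})

for await (const chunk of result.textStream) {
  process.stdout.write(chunk)
}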

Troubleshooting

The gateway is timing out

Increase timeout values:
per_request_timeout: "60s"
total_timeout: "5m"
Or check provider health metrics.

I’m getting rate limited even with multiple keys

Configure multiple keys for the provider so the gateway can fail over automatically:
providers:
  - id: openai
    api_keys:
      - value: ${secrets.get('openai', 'key-one')}
      - value: ${secrets.get('openai', 'key-two')}
Keys are tried in order when the previous key fails.

Failover isn’t working

For key failover (same provider, different keys):
  • Configure multiple keys for the provider
  • Keys are tried in order when one fails
For provider failover (different providers):
  • Client must specify fallback models using the models array
  • Or the same model must be available from multiple providers
{
  "model": "gpt-4o",
  "models": ["claude-3-5-sonnet-20241022"],
  "messages": [{"role": "user", "content": "Hello"}]
}
For model selection failover:
  • Configure multiple strategies in model_selection
  • Strategies are tried in order until one returns models
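For example, mirroring the prioritization strategies shown earlier:
model_selection:
  strategy:
    - "ai.models.filter(m, m.provider_id == 'openai')"  # tried first
    - "ai.models"                                       # fallback: all models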
