The Traffic Policy configuration reference for the AI Gateway action.

Supported phases

on_http_request

Type

ai-gateway

Basic structure

on_http_request:
  - type: ai-gateway
    config:
      max_input_tokens: 4096
      max_output_tokens: 8192
      headers: {}
      query_params: {}
      body: {}
      on_error: "halt"
      total_timeout: "5m"
      per_request_timeout: "30s"
      providers: []
      only_allow_configured_providers: false
      only_allow_configured_models: false
      model_selection:
        strategy: []
      api_key_selection:
        strategy: []

Configuration fields

max_input_tokens
integer

Maximum number of tokens allowed in the prompt and context. Requests exceeding this limit will be rejected.

No limit is applied if not specified. Maximum allowed value is 500,000.

max_input_tokens: 4096
max_output_tokens
integer

Maximum number of tokens allowed in the completion response.

No limit is applied if not specified. Maximum allowed value is 500,000.

max_output_tokens: 2048
headers

Additional HTTP headers to include in requests to AI providers.

headers:
  X-Custom-Header: "value"
  X-Request-ID: "${req.id}"
query_params

Additional query parameters to append to provider requests.

query_params:
  api_version: "2023-10-01"
body

Additional JSON fields to merge into the request body.

body:
  temperature: 0.7
  top_p: 0.9
on_error
enum
default:halt

Behavior when all failover attempts are exhausted.

Supported values

halt (default) - Stop processing and return an error to the client
continue - Continue to the next action in the Traffic Policy

on_error: "continue"
total_timeout
string
default:5m

Maximum total time for all failover attempts across all models and keys. Must be specified as a duration string (for example, "2m", "90s").

total_timeout: "2m"
per_request_timeout
string
default:30s

Timeout for a single request to a provider. Must be specified as a duration string (for example, "45s", "1m").

per_request_timeout: "45s"
providers
array

List of AI provider configurations. When empty, all built-in providers are allowed in passthrough mode.

See Provider Configuration below for detailed field definitions.

providers:
  - id: "openai"
    api_keys:
      - value: ${secrets.get('openai', 'key-one')}
only_allow_configured_providers
boolean
default:false

When true, only providers explicitly listed in providers are allowed. Requests to other providers are rejected with an error.

only_allow_configured_providers: true
only_allow_configured_models
boolean
default:false

When true, only models explicitly listed in provider configurations are allowed. Requests for other models are rejected.

only_allow_configured_models: true
model_selection
object

Strategy for selecting model candidates using CEL expressions. The first strategy that returns models wins; subsequent strategies are used only if earlier ones return no models.

See Model Selection Strategies for details and CEL Functions Reference for available functions.

model_selection:
  strategy:
    - "ai.models.filter(m, m.provider_id == 'openai')"
    - "ai.models"
api_key_selection
object

Strategy for selecting API keys using CEL expressions. Enables intelligent key selection based on metrics like quota usage and error rates.

When not specified, keys are tried in the order listed. See CEL Functions Reference for available functions.

api_key_selection:
  strategy:
    - "ai.keys.filter(k, k.quota.remaining_requests > 100)"
    - "ai.keys.filter(k, k.error_rate.rate_limit < 0.1)"
    - "ai.keys"

Provider configuration

Each provider in the providers array supports these fields:
providers[].id
string
Required

Provider identifier. Use built-in names (openai, anthropic, google, deepseek) or custom names for self-hosted providers.

- id: "openai"
providers[].id_aliases
array of strings

Alternative identifiers for this provider. Allows clients to reference the same provider by different names.

- id: "custom-openai"
  id_aliases: ["openai", "gpt"]
providers[].base_url
string

Custom endpoint URL for self-hosted or alternative provider endpoints. Required for custom providers.

- id: "ollama"
  base_url: "https://ollama.internal.company.com"
providers[].display_name
string

Human-readable name for the provider.

providers[].description
string

Description of the provider.

providers[].website
string

Provider’s website URL.
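
These descriptive fields are optional and purely informational. A combined example (the values shown are illustrative):

- id: "openai"
  display_name: "OpenAI"
  description: "Primary OpenAI account for production traffic"
  website: "https://openai.com"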

providers[].disabled
boolean
default:false

Temporarily disable this provider without removing its configuration.

- id: "openai"
  disabled: true
providers[].metadata
object

Custom metadata for tracking and organization. Not sent to providers. Available in selection strategies via m.getMetadata().

- id: "openai"
  metadata:
    team: "ml-platform"
    environment: "production"
providers[].api_keys
array

List of API keys for this provider. Keys are tried in order for automatic failover.

api_keys:
  - value: ${secrets.get('openai', 'primary')}
  - value: ${secrets.get('openai', 'backup')}
providers[].models
array

List of model configurations for this provider. See Model Configuration below.

API key configuration

Each API key in providers[].api_keys supports:
providers[].api_keys[].value
string
Required

The API key value. Use secrets.get() for secure storage.

api_keys:
  - value: ${secrets.get('openai', 'key-one')}

Model configuration

Each model in providers[].models supports:
providers[].models[].id
string
Required

Model identifier as recognized by the provider.

models:
  - id: "gpt-4o"
providers[].models[].id_aliases
array of strings

Alternative identifiers for this model.

models:
  - id: "gpt-4o-2024-11-20"
    id_aliases: ["gpt-4o", "gpt-4-latest"]
providers[].models[].author_id
string

ID of the model author (for third-party models).
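
For example, a self-hosted model served by one provider but authored by another (the author value here is hypothetical):

models:
  - id: "llama3-70b"
    author_id: "meta"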

providers[].models[].display_name
string

Human-readable name for the model.

providers[].models[].description
string

Description of the model.

providers[].models[].disabled
boolean
default:false

Temporarily disable this model.

models:
  - id: "gpt-3.5-turbo"
    disabled: true
providers[].models[].metadata
object

Custom metadata for the model. Available in selection strategies.

models:
  - id: "gpt-4o"
    metadata:
      tier: "premium"
      approved: true
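
Assuming the metadata above, a model selection strategy could reference these values via m.getMetadata() (the key names are illustrative, taken from the example):

model_selection:
  strategy:
    - "ai.models.filter(m, m.getMetadata()['tier'] == 'premium')"
    - "ai.models"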
providers[].models[].input_modalities
array of strings

Input types supported by the model (for example, "text", "image", "audio").

providers[].models[].output_modalities
array of strings

Output types supported by the model.

providers[].models[].max_context_window
integer

Maximum context window size in tokens.

providers[].models[].max_output_tokens
integer

Maximum output tokens the model can generate.

providers[].models[].supported_features
array of strings

Features supported by the model (for example, "tool-calling", "coding").
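
The capability fields above can be combined on a single model entry. A sketch (the values are illustrative, not authoritative for any particular provider or model):

models:
  - id: "gpt-4o"
    input_modalities: ["text", "image"]
    output_modalities: ["text"]
    max_context_window: 128000
    max_output_tokens: 16384
    supported_features: ["tool-calling"]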

Complete example

on_http_request:
  - type: ai-gateway
    config:
      max_input_tokens: 4096
      max_output_tokens: 2048
      total_timeout: "3m"
      per_request_timeout: "30s"
      on_error: "halt"
      only_allow_configured_providers: true
      only_allow_configured_models: true
      
      providers:
        - id: "openai"
          metadata:
            team: "ml"
          api_keys:
            - value: ${secrets.get('openai', 'primary')}
            - value: ${secrets.get('openai', 'backup')}
            - value: ${secrets.get('openai', 'emergency')}
          models:
            - id: "gpt-4o"
              metadata:
                approved: true
            - id: "gpt-4o-mini"
              metadata:
                approved: true
        
        - id: "anthropic"
          api_keys:
            - value: ${secrets.get('anthropic', 'key')}
          models:
            - id: "claude-3-5-sonnet-20241022"
        
        - id: "ollama-internal"
          base_url: "https://ollama.company.internal"
          models:
            - id: "llama3-70b"
      
      model_selection:
        strategy:
          - "ai.models.filter(m, m.provider_id == 'openai')"
          - "ai.models.filter(m, m.provider_id == 'anthropic')"
          - "ai.models.filter(m, m.provider_id == 'ollama-internal')"
      
      api_key_selection:
        strategy:
          # Prefer keys with remaining quota
          - "ai.keys.filter(k, k.quota.remaining_requests > 100)"
          # Fall back to keys with low error rates
          - "ai.keys.filter(k, k.error_rate.rate_limit < 0.1)"
          # Fall back to all keys
          - "ai.keys"