## Basic example

Select only fast models:
```yaml
on_http_request:
  - type: ai-gateway
    config:
      model_selection:
        strategy:
          - "ai.models.filter(m, m.metrics.global.latency.upstream_ms_p95 < 1200)"
          - "ai.models.filter(m, m.metrics.global.error_rate.total < 0.01)"
```
### How it works
Strategies execute in order:
1. Filter to models with p95 latency < 1200ms
2. From those, filter to models with < 1% error rate
3. Select from the remaining models
If a strategy returns no models, the gateway tries the next strategy.
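Later examples on this page append a final catch-all strategy so a request is never left without a candidate model. Here is a minimal sketch of the basic example with that fallback added (the thresholds are the same illustrative values as above):

```yaml
on_http_request:
  - type: ai-gateway
    config:
      model_selection:
        strategy:
          - "ai.models.filter(m, m.metrics.global.latency.upstream_ms_p95 < 1200)"
          - "ai.models.filter(m, m.metrics.global.error_rate.total < 0.01)"
          # Catch-all: if the filters above match no models, accept any model
          - "ai.models"
```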
## Low latency

Prefer the fastest models:
```yaml
model_selection:
  strategy:
    - "ai.models.filter(m, m.metrics.global.latency.upstream_ms_avg < 500)"
    - "ai.models.filter(m, m.metrics.global.latency.upstream_ms_avg < 1000)"
    - "ai.models.random()"
```
## High reliability

Prefer models with low error rates:
```yaml
model_selection:
  strategy:
    - "ai.models.filter(m, m.metrics.global.error_rate.total < 0.01 && m.metrics.global.error_rate.timeout < 0.005)"
    - "ai.models.random()"
```
## Balanced

Balance speed and reliability:
```yaml
model_selection:
  strategy:
    - "ai.models.filter(m, m.metrics.global.latency.upstream_ms_p95 < 2000 && m.metrics.global.error_rate.total < 0.02)"
    - "ai.models.random()"
```
## Cost optimization

### Prefer cheaper models

Select mini/turbo models when available:
```yaml
model_selection:
  strategy:
    - "ai.models.filter(m, m.id.contains('mini') || m.id.contains('turbo'))"
    - "ai.models.random()"
```
### Cost tiers

Define cost tiers with fallback:
```yaml
model_selection:
  strategy:
    # Try budget models first
    - "ai.models.filter(m, m.id.contains('3.5-turbo'))"
    # Fall back to standard models
    - "ai.models.filter(m, m.id.contains('4o-mini'))"
    # Fall back to any model
    - "ai.models"
```
## Provider preference

### Prefer specific provider

Try a specific provider first:
```yaml
model_selection:
  strategy:
    - "ai.models.filter(m, m.provider_id == 'openai')"
    - "ai.models.random()"
```
### Avoid specific provider

Exclude a provider unless necessary:
```yaml
model_selection:
  strategy:
    # Try everything except Google first
    - "ai.models.filter(m, m.provider_id != 'google')"
    # Fall back to Google if needed
    - "ai.models.filter(m, m.provider_id == 'google')"
```
## Multi-criteria filtering

Combine multiple conditions in a single filter with `&&`:
```yaml
model_selection:
  strategy:
    # Must be fast AND reliable
    - "ai.models.filter(m, m.metrics.global.latency.upstream_ms_p95 < 1500 && m.metrics.global.error_rate.total < 0.01)"
    - "ai.models.random()"
```
### Tiered selection

Prioritize different criteria:
```yaml
model_selection:
  strategy:
    # Tier 1: Fast and reliable
    - "ai.models.filter(m, m.metrics.global.latency.upstream_ms_p95 < 1000 && m.metrics.global.error_rate.total < 0.005)"
    # Tier 2: Moderate speed, good reliability
    - "ai.models.filter(m, m.metrics.global.latency.upstream_ms_p95 < 2000 && m.metrics.global.error_rate.total < 0.01)"
    # Tier 3: Any model
    - "ai.models"
```
## Custom metadata

Use custom metadata to categorize and filter models:
```yaml
providers:
  - id: openai
    models:
      - id: "gpt-4o"
        metadata:
          tier: "premium"
          compliance: "hipaa"
      - id: "gpt-4o-mini"
        metadata:
          tier: "budget"
          compliance: "standard"
```
Strategy:
```yaml
model_selection:
  strategy:
    # Use budget tier for cost savings
    - "ai.models.filter(m, m.metadata.tier == 'budget')"
    # Fall back to any tier
    - "ai.models"
```
### Compliance filtering

Filter by compliance requirements:
```yaml
model_selection:
  strategy:
    - "ai.models.filter(m, m.metadata.compliance == 'hipaa')"
    - "ai.models"
```
## Feature-based filtering

Filter by supported capabilities:
```yaml
model_selection:
  strategy:
    # Only models that support tool calling
    - "ai.models.filter(m, 'tool-calling' in m.supported_features)"
    - "ai.models"
```
### Vision models

Select models that can process images:
```yaml
model_selection:
  strategy:
    - "ai.models.filter(m, 'image' in m.input_modalities)"
    - "ai.models"
```
## Real-world examples

### High-volume production

Optimize for cost and speed:
```yaml
on_http_request:
  - type: ai-gateway
    config:
      providers:
        - id: openai
        - id: anthropic
        - id: google
      model_selection:
        strategy:
          # Prefer cheap, fast models with good reliability
          - "ai.models.filter(m, m.id.contains('turbo') || m.id.contains('mini')).filter(m, m.metrics.global.latency.upstream_ms_avg < 800 && m.metrics.global.error_rate.total < 0.02)"
          - "ai.models.random()"
```
### High-reliability application

Optimize for reliability over cost:
```yaml
model_selection:
  strategy:
    # Only the most reliable models, prefer premium
    - "ai.models.filter(m, m.metrics.global.error_rate.total < 0.005 && m.metrics.global.error_rate.timeout < 0.001).filter(m, !m.id.contains('mini'))"
    - "ai.models.random()"
```
### Development environment

Prefer self-hosted models for development:
```yaml
model_selection:
  strategy:
    # Use self-hosted Ollama
    - "ai.models.filter(m, m.provider_id == 'ollama')"
    # Fall back to cloud for testing
    - "ai.models.filter(m, m.provider_id == 'openai' && m.id.contains('mini'))"
    - "ai.models"
```
## Monitoring strategy effectiveness

Track which models are selected:
```
request_count{model="gpt-4o"}: 45,234
request_count{model="gpt-4o-mini"}: 123,456
request_count{model="claude-3-5-sonnet"}: 5,432
latency_avg{model="gpt-4o"}: 1100ms
latency_avg{model="gpt-4o-mini"}: 650ms
```
Use metrics to refine your strategy.
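For example, given the sample numbers above (`gpt-4o-mini` averaging 650ms and carrying most of the traffic), you might tighten the average-latency threshold so selection keeps favoring the faster models. A sketch, with an illustrative 700ms cutoff rather than a recommended value:

```yaml
model_selection:
  strategy:
    # Tightened based on the observed averages; tune against your own metrics
    - "ai.models.filter(m, m.metrics.global.latency.upstream_ms_avg < 700)"
    - "ai.models.random()"
```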
## Available variables
| Variable | Type | Description |
|---|---|---|
| `m.id` | string | Model identifier |
| `m.provider_id` | string | Provider identifier |
| `m.author_id` | string | Model author identifier |
| `m.display_name` | string | Human-readable model name |
| `m.custom` | boolean | Whether this is a custom model |
| `m.metadata` | object | Custom metadata |
| `m.input_modalities` | list | Supported input types |
| `m.output_modalities` | list | Supported output types |
| `m.supported_features` | list | Supported capabilities |
| `m.metrics.global.request_count` | number | Total requests |
| `m.metrics.global.latency.upstream_ms_avg` | number | Average latency (ms) |
| `m.metrics.global.latency.upstream_ms_p95` | number | P95 latency (ms) |
| `m.metrics.global.error_rate.total` | number | Overall error rate (0-1) |
| `m.metrics.global.error_rate.timeout` | number | Timeout error rate (0-1) |
| `m.metrics.global.error_rate.rate_limit` | number | Rate limit error rate (0-1) |
## Best practices

- **Start simple** - Begin with basic filters and add complexity as needed
- **Include fallbacks** - Always end with a final strategy that accepts any model, such as `"ai.models"`
- **Monitor metrics** - Use metrics to validate that your strategy is working
- **Test strategies** - Test with production traffic patterns
- **Use `request_count`** - Filter by `m.metrics.global.request_count > 0` to ensure metrics exist before comparing against them (see the sketch below)
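A minimal sketch combining several of these practices: guard metric comparisons with `request_count > 0` so models without recorded traffic aren't evaluated against empty metrics, and end with a catch-all. The thresholds are illustrative:

```yaml
model_selection:
  strategy:
    # Only consider models that have metrics and meet latency/error targets
    - "ai.models.filter(m, m.metrics.global.request_count > 0 && m.metrics.global.latency.upstream_ms_p95 < 2000 && m.metrics.global.error_rate.total < 0.02)"
    # Final fallback: accept any model, including ones with no metrics yet
    - "ai.models"
```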