# Metrics Reference

> Real-time metrics for intelligent model selection.

The AI Gateway collects real-time performance metrics that you can use in [model selection strategies](/ai-gateway/guides/model-selection-strategies) to make intelligent routing decisions.

## Availability

**Important**: Metrics are **only available within `model_selection.strategy` CEL expressions**. They are not available in:

* General `expression` fields in Traffic Policies
* Other action configurations

This is because metrics are populated at runtime during AI Gateway request processing, specifically when evaluating model selection strategies.

## Accessing metrics

Metrics are available on each model through the `metrics` field:

```yaml
model_selection:
  strategy:
    # Filter by latency
    - "ai.models.filter(m, m.metrics.global.latency.upstream_ms_avg < 1000)"
    # Filter by error rate
    - "ai.models.filter(m, m.metrics.global.error_rate.total < 0.05)"
    # Fallback
    - "ai.models"
```

## Metric scopes

Metrics are collected at multiple scopes, allowing you to make decisions based on global trends or your specific usage:

| Scope    | CEL Path                       | Description                                    |
| -------- | ------------------------------ | ---------------------------------------------- |
| Global   | `m.metrics.global`             | Aggregated across all ngrok accounts           |
| Region   | `m.metrics.region`             | Aggregated for the region handling the request |
| Account  | `m.metrics.account`            | Your ngrok account's usage                     |
| Endpoint | `m.metrics.endpoint`           | This specific endpoint's usage                 |
| API Key  | `m.metrics.api_keys["key_id"]` | Per-provider API key metrics                   |

### Scope selection guidelines

* **Global** - Best for understanding overall provider health and comparing models you haven't used yet
* **Region** - Useful when latency varies by geographic region
* **Account** - Reflects your specific usage patterns and rate limit status
* **Endpoint** - Most specific, useful for per-application decisions
* **API Key** - Track quota and usage for specific provider API keys

## Available metrics

### Base metrics (all scopes)

| Field           | Type   | Description                              |
| --------------- | ------ | ---------------------------------------- |
| `provider`      | string | Provider ID (for example, `"openai"`)    |
| `model`         | string | Model ID (for example, `"gpt-4o"`)       |
| `request_count` | uint64 | Total requests in the aggregation window |
| `start_time`    | uint32 | Window start (Unix timestamp in seconds) |
| `end_time`      | uint32 | Window end (Unix timestamp in seconds)   |

### Latency metrics

Access via `m.metrics.<scope>.latency`:

| Field                          | Type     | Description                                                        |
| ------------------------------ | -------- | ------------------------------------------------------------------ |
| `gateway_ms_avg`               | uint32   | Average gateway processing time (request received → upstream sent) |
| `gateway_ms_p95`               | uint32   | P95 gateway processing time                                        |
| `upstream_ms_avg`              | uint32   | Average time to receive full response from provider                |
| `upstream_ms_p95`              | uint32   | P95 upstream response time                                         |
| `time_to_first_token_ms_avg`   | uint32\* | Average TTFT (streaming responses only)                            |
| `time_to_first_token_ms_p95`   | uint32\* | P95 TTFT (streaming responses only)                                |
| `time_per_output_token_ms_avg` | uint32\* | Average inter-token time (streaming only)                          |
| `time_per_output_token_ms_p95` | uint32\* | P95 inter-token time (streaming only)                              |

\*Fields marked with `*` may be null if no streaming requests have been recorded.

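
Because the streaming latency fields can be null, a strategy that reads them may need a guard. The following is a minimal sketch, assuming the gateway's CEL environment accepts a `!= null` comparison on these fields (standard CEL allows it for nullable values, but this page does not confirm it). The 300 ms threshold is an illustrative value, not a recommendation from the source.

```yaml
model_selection:
  strategy:
    # Assumption: comparing the field to null is supported here.
    # Keep only models with a recorded average TTFT under 300 ms.
    - "ai.models.filter(m, m.metrics.global.latency.time_to_first_token_ms_avg != null && m.metrics.global.latency.time_to_first_token_ms_avg < 300)"
    # Fallback: any available model
    - "ai.models"
```

The null check comes first so that the `&&` short-circuits before the numeric comparison is evaluated for models with no streaming data.
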
### Error rate metrics

Access via `m.metrics.<scope>.error_rate`. All values are fractions from 0.0 to 1.0 (for example, 0.05 = 5% error rate):

| Field        | Type    | Description                                          |
| ------------ | ------- | ---------------------------------------------------- |
| `total`      | float32 | Overall error rate (any non-2xx/3xx response)        |
| `timeout`    | float32 | Timeout errors (no response received within timeout) |
| `rate_limit` | float32 | Rate limit errors (HTTP 429)                         |
| `client`     | float32 | Client errors (4xx excluding 429)                    |
| `server`     | float32 | Server errors (5xx)                                  |

### Token metrics (account/endpoint/API key scopes)

Access via `m.metrics.<scope>.token`:

| Field              | Type   | Description                                  |
| ------------------ | ------ | -------------------------------------------- |
| `provider_input`   | uint64 | Input tokens as reported by the provider     |
| `provider_output`  | uint64 | Output tokens as reported by the provider    |
| `estimated_input`  | uint64 | Input tokens estimated by ngrok's tokenizer  |
| `estimated_output` | uint64 | Output tokens estimated by ngrok's tokenizer |

Token metrics are only available at Account, Endpoint, and API Key scopes. Global and Region scopes do not include token counts.

### Quota metrics (API key scope only)

**Safe API key access**: `m.metrics.api_keys` is only populated for keys that have made requests recently, so a given model may have no entry for a given key. Accessing a missing map entry in CEL throws an error, so always check that the key exists first.

❌ This throws an error if the key hasn't issued a request for this model yet:

```yaml
- "ai.models.filter(m, m.metrics.api_keys['my-key-id'].latency.upstream_ms_avg < 1000)"
```

✅ Check the key exists first:

```yaml
- "ai.models.filter(m, 'my-key-id' in m.metrics.api_keys && m.metrics.api_keys['my-key-id'].latency.upstream_ms_avg < 1000)"
```

Keys in this map are AI Gateway API Key IDs, not raw key values.

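
The same existence guard can be combined with a broader fallback, so that a model with no entry for the key is still considered using global data. This is a sketch under two assumptions: that error-rate fields are populated at the API key scope in the same way as the latency fields in the example above, and that later strategy entries are used when earlier ones match no models, as the fallback comments elsewhere on this page suggest. `my-key-id` is the placeholder key ID from the example above.

```yaml
model_selection:
  strategy:
    # Prefer models where 'my-key-id' has recorded data and is rarely rate limited.
    - "ai.models.filter(m, 'my-key-id' in m.metrics.api_keys && m.metrics.api_keys['my-key-id'].error_rate.rate_limit < 0.05)"
    # Otherwise use the global rate-limit signal, which needs no map lookup.
    - "ai.models.filter(m, m.metrics.global.error_rate.rate_limit < 0.05)"
    # Last resort: any available model
    - "ai.models"
```
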
Access via `m.metrics.api_keys["key_id"].quota`:

| Field                | Type     | Description                                  |
| -------------------- | -------- | -------------------------------------------- |
| `remaining_requests` | uint64\* | Requests remaining before hitting rate limit |
| `remaining_tokens`   | uint64\* | Tokens remaining before hitting rate limit   |
| `limit_requests`     | uint64\* | Max requests allowed in rate limit period    |
| `limit_tokens`       | uint64\* | Max tokens allowed in rate limit period      |

\*Fields may be null if quota information is not available from the provider.

## Examples

### Route to fastest models

Prefer models with low average latency:

```yaml
model_selection:
  strategy:
    - "ai.models.filter(m, m.metrics.global.latency.upstream_ms_avg < 500)"
    - "ai.models.filter(m, m.metrics.global.latency.upstream_ms_avg < 2000)"
    - "ai.models"
```

### Avoid high error rates

Skip models with too many errors:

```yaml
model_selection:
  strategy:
    - "ai.models.filter(m, m.metrics.global.error_rate.total < 0.01)"
    - "ai.models.filter(m, m.metrics.global.error_rate.total < 0.10)"
    - "ai.models"
```

### Avoid rate-limited providers

Skip models currently hitting rate limits:

```yaml
model_selection:
  strategy:
    - "ai.models.filter(m, m.metrics.global.error_rate.rate_limit < 0.05)"
    - "ai.models"
```

### Combined performance criteria

Use multiple criteria for optimal routing:

```yaml
model_selection:
  strategy:
    # Ideal: fast and reliable
    - "ai.models.filter(m, m.metrics.global.latency.upstream_ms_avg < 1000 && m.metrics.global.error_rate.total < 0.01)"
    # Good: reasonably fast with acceptable errors
    - "ai.models.filter(m, m.metrics.global.latency.upstream_ms_avg < 3000 && m.metrics.global.error_rate.total < 0.05)"
    # Fallback: any available model
    - "ai.models"
```

### Sort by latency

Order models by speed instead of filtering:

```yaml
model_selection:
  strategy:
    - "ai.models.sortBy(m, m.metrics.global.latency.upstream_ms_avg)"
```

### Use account-specific metrics

Base decisions on your own usage data:

```yaml
model_selection:
  strategy:
    # Use your account's error rate data
    - "ai.models.filter(m, m.metrics.account.error_rate.total < 0.05)"
    - "ai.models"
```

## Metric availability notes

1. **New models**: Models without historical data will have zero values for metrics. Include a fallback strategy that doesn't filter by metrics.
2. **Custom providers**: Metrics for custom providers (Ollama, vLLM, etc.) are only available after you've sent traffic through them.
3. **Aggregation windows**: Metrics are aggregated over rolling time windows. The exact window size may vary by scope.
4. **Metric freshness**: Metrics are cached and updated periodically. There may be a brief delay before recent requests are reflected.

## See also

* [Model Selection Strategies](/ai-gateway/guides/model-selection-strategies) - Full guide on selection strategies
* [CEL Functions Reference](/ai-gateway/reference/cel-functions) - All available CEL functions
* [Configuration Reference](/ai-gateway/reference/configuration-schema) - Full configuration options