When troubleshooting AI Gateway issues, you can access detailed information about what happened during request processing using action result variables. This page explains how to capture and interpret this data.

Action result variables

After the ai-gateway action runs, detailed results are available in ${actions.ngrok.ai_gateway}. This includes:
  • Model selection process and filtering steps
  • Every model and request attempted
  • API key selection details
  • Token counts
  • Latency measurements
  • Error details for failed attempts

Accessing action results

To access action results, configure on_error: "continue" so subsequent actions can inspect the data:
on_http_request:
  - type: ai-gateway
    config:
      on_error: continue
  - type: log
    config:
      metadata:
        ai_gateway_result: ${actions.ngrok.ai_gateway}
  - type: deny
Cloud Endpoints require a terminal action such as deny, custom-response, redirect, or forward-internal to complete the request. See Cloud Endpoints for more details.

Debugging patterns

Return results as response (development)

During development, return the full action result to the client for inspection:
on_http_request:
  - type: ai-gateway
    config:
      on_error: continue
  - type: custom-response
    config:
      status_code: 503
      headers:
        content-type: application/json
      body: ${actions.ngrok.ai_gateway}
Example response:
{
  "status": "error",
  "error": {
    "code": "ERR_NGROK_3807",
    "message": "All AI providers failed to respond successfully."
  },
  "client": {
    "method": "POST",
    "path": "/v1/chat/completions",
    "user_agent": "curl/8.17.0",
    "model": "gpt-4o",
    "models": ["gpt-4.1", "gpt-5"],
    "api_key_hash": "sk-proj...7890" // If the actual key was sk-proj-abcdefghijklmnopqrstuvwxyz1234567890
  },
  "input_ngrok_tokens": 150,
  "gateway_latency_ms": 45,
  "model_selection": [
    {
      "strategy": "ai.models.filter(x, x.id in [\"gpt-4o\", \"claude-3-5-sonnet-20241022\"])",
      "candidates_returned": ["gpt-4o", "claude-3-5-sonnet-20241022"],
      "candidates_after_allowed_filter": ["gpt-4o"],
      "candidates_after_client_filter": ["gpt-4o"],
      "models_to_try": ["gpt-4o"]
    }
  ],
  "models_tried": [
    {
      "model": "gpt-4o",
      "provider": "openai",
      "author": "openai",
      "api_key_selection": [
        {
          "strategy": "ai.keys",
          "keys_to_try": ["sk-proj...2315"]
        }
      ],
      "requests_made": [
        {
          "status": "error",
          "error": "rate limit exceeded",
          "upstream_input_tokens": 0,
          "upstream_output_tokens": 0,
          "request": {
            "url": "https://api.openai.com/v1/chat/completions",
            "api_key": "sk-proj...2315"
          },
          "response": {
            "status_code": 429,
            "headers": {"content-type": ["application/json"]},
            "body_on_error": "{\"error\":{\"message\":\"Rate limit exceeded\",\"type\":\"rate_limit_error\"}}"
          },
          "upstream_latency": {
            "time_to_first_byte_ms": 120,
            "total_ms": 125
          }
        }
      ]
    }
  ]
}
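
If you capture this JSON (for example via the custom-response pattern above), a short script can flatten it into one line per upstream attempt. This is a minimal sketch: `summarize_attempts` is a hypothetical helper, and the field names simply follow the example payload above.

```python
def summarize_attempts(result: dict) -> list[str]:
    """Flatten models_tried into one human-readable line per upstream request."""
    lines = []
    for model in result.get("models_tried", []):
        for req in model.get("requests_made", []):
            status = req.get("response", {}).get("status_code", 0)
            lines.append(
                f"{model.get('provider')}/{model.get('model')}: "
                f"status={req.get('status')} http={status} "
                f"error={req.get('error', '-')}"
            )
    return lines

# Example usage with a trimmed result payload:
result = {
    "models_tried": [
        {
            "model": "gpt-4o",
            "provider": "openai",
            "requests_made": [
                {"status": "error", "error": "rate limit exceeded",
                 "response": {"status_code": 429}}
            ],
        }
    ]
}
for line in summarize_attempts(result):
    print(line)  # prints: openai/gpt-4o: status=error http=429 error=rate limit exceeded
```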

Send to log exports (production)

In production, send action results to your logging infrastructure:
on_http_request:
  - type: ai-gateway
    config:
      on_error: continue
  - type: log
    config:
      metadata:
        ai_gateway_result: ${actions.ngrok.ai_gateway}
  - type: deny
This fires a log event that can be exported to your observability platform. See Log Exporting for setup.

Combined approach

Log the results and return a user-friendly error:
on_http_request:
  - type: ai-gateway
    config:
      on_error: continue
  - type: log
    config:
      metadata:
        ai_gateway_result: ${actions.ngrok.ai_gateway}
  - type: custom-response
    config:
      status_code: 503
      headers:
        content-type: application/json
      body: |
        {
          "error": "AI service temporarily unavailable",
          "code": "${actions.ngrok.ai_gateway.error.code}"
        }

Interpreting results

Identifying rate limits

Look for status_code: 429 in request responses:
{
  "models_tried": [
    {
      "model": "gpt-4o",
      "provider": "openai",
      "requests_made": [
        {
          "status": "error",
          "response": {
            "status_code": 429,
            "body_on_error": "{\"error\":{\"type\":\"rate_limit_error\"}}"
          },
          "request": {
            "api_key": "sk-proj...2315"
          }
        }
      ]
    }
  ]
}
Solution: Attach additional keys for the same provider. The most recently attached key is tried first, with older keys used as fallback on failure. See Multi-Key Failover.
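
When an export contains many attempts, scanning by eye gets tedious. A small sketch that collects the (masked) keys behind any HTTP 429 responses; `rate_limited_keys` is a hypothetical helper, assuming the result shape shown above:

```python
def rate_limited_keys(result: dict) -> set[str]:
    """Collect the masked API keys whose requests came back HTTP 429."""
    keys = set()
    for model in result.get("models_tried", []):
        for req in model.get("requests_made", []):
            if req.get("response", {}).get("status_code") == 429:
                keys.add(req.get("request", {}).get("api_key", "<unknown>"))
    return keys

result = {
    "models_tried": [{
        "model": "gpt-4o",
        "provider": "openai",
        "requests_made": [{
            "status": "error",
            "response": {"status_code": 429},
            "request": {"api_key": "sk-proj...2315"},
        }],
    }]
}
print(rate_limited_keys(result))  # the keys that need extra capacity or failover
```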

Identifying model filtering issues

When you get ERR_NGROK_3804 or want to understand what the gateway did, check model_selection to understand how models were filtered:
{
  "status": "error",
  "model_selection": [
    {
      "strategy": "round-robin",
      "candidates_returned": ["gpt-4o", "claude-3-5-sonnet-20241022"],
      "candidates_after_allowed_filter": ["gpt-4o"],
      "candidates_after_client_filter": [], // After filtering by client model/models we have no remaining models
      "models_to_try": []
    }
  ],
  "models_tried": [],
  "error": {
    "code": "ERR_NGROK_3804",
    "message": "Unable to route request - no models matched"
  }
}
Solution: The client requested models that did not survive filtering. Check client.model and client.models for typos or unrecognized model names, then adjust your configuration or the client request. For new models, or models not in the gateway config or model catalog, try adding the provider prefix.

Identifying timeout issues

Look for attempts with status_code: 0 (no response was received) or timeout errors such as context deadline exceeded:
{
  "models_tried": [
    {
      "model": "gpt-4o",
      "provider": "openai",
      "requests_made": [
        {
          "status": "error",
          "error": "context deadline exceeded",
          "response": {
            "status_code": 0
          }
        }
      ]
    }
  ]
}
Solution: Increase per_request_timeout or investigate provider latency using upstream_latency metrics.
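
A status_code of 0 is the signature of a request that never got a response. A minimal sketch that lists the affected models; `timed_out_attempts` is a hypothetical helper, assuming the result shape shown above:

```python
def timed_out_attempts(result: dict) -> list[str]:
    """List models whose requests got no HTTP response (status_code 0)."""
    out = []
    for model in result.get("models_tried", []):
        for req in model.get("requests_made", []):
            if req.get("response", {}).get("status_code", 0) == 0:
                out.append(model.get("model", "<unknown>"))
    return out

result = {
    "models_tried": [{
        "model": "gpt-4o",
        "provider": "openai",
        "requests_made": [{
            "status": "error",
            "error": "context deadline exceeded",
            "response": {"status_code": 0},
        }],
    }]
}
print(timed_out_attempts(result))  # prints: ['gpt-4o']
```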

Understanding latency

Use the latency measurements to identify bottlenecks:
{
  "gateway_latency_ms": 45,
  "models_tried": [
    {
      "requests_made": [
        {
          "upstream_latency": {
            "time_to_first_byte_ms": 2500,
            "total_ms": 8000
          }
        }
      ]
    }
  ]
}
Solution: High time_to_first_byte_ms or total_ms indicates a slow provider response; high gateway_latency_ms indicates gateway processing overhead. Increase per_request_timeout or total_timeout if requests are being cut off, or investigate latency with the provider directly.
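
These measurements can be turned into alerts. A minimal sketch that flags slow attempts against a threshold; `latency_report` is a hypothetical helper, the 2000 ms threshold is an arbitrary example, and the field names follow the payload above:

```python
def latency_report(result: dict, slow_ms: int = 2000) -> list[str]:
    """Flag gateway overhead and time-to-first-byte values above slow_ms."""
    report = []
    gw = result.get("gateway_latency_ms", 0)
    if gw > slow_ms:
        report.append(f"gateway overhead high: {gw}ms")
    for model in result.get("models_tried", []):
        for req in model.get("requests_made", []):
            ttfb = req.get("upstream_latency", {}).get("time_to_first_byte_ms", 0)
            if ttfb > slow_ms:
                report.append(f"slow provider first byte: {ttfb}ms")
    return report

# Using the numbers from the example above:
result = {
    "gateway_latency_ms": 45,
    "models_tried": [{
        "requests_made": [{
            "upstream_latency": {"time_to_first_byte_ms": 2500, "total_ms": 8000},
        }],
    }]
}
print(latency_report(result))  # prints: ['slow provider first byte: 2500ms']
```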

Next steps

  • Troubleshooting: Error codes and solutions
  • Log Exporting: Export logs to your observability platform