Inference.net provides a distributed inference network for running AI models at scale. It requires you to bring your own key; ngrok-managed keys are not available.

Setup

1. Create an AI Gateway endpoint

If you don’t have one yet, follow the quickstart to create your AI Gateway endpoint.
2. Get an Inference.net API key

Sign up at inference.net and generate an API key.
3. Make a request

Pass your key directly—the gateway forwards it to Inference.net.
from openai import OpenAI

client = OpenAI(
    base_url="https://your-ai-gateway.ngrok.app/v1",
    api_key="..."  # Your Inference.net key, forwarded by gateway
)

response = client.chat.completions.create(
    model="inference-net:meta-llama/llama-3.1-8b-instruct/fp-8",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(response.choices[0].message.content)
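Under the hood, the OpenAI client is making an ordinary HTTPS call and the gateway forwards the bearer token verbatim. A hedged sketch of the equivalent raw request, for debugging with other HTTP tooling (the gateway URL and key below are placeholders, and `build_chat_request` is a hypothetical helper, not part of any SDK):

```python
import json

GATEWAY = "https://your-ai-gateway.ngrok.app"  # your AI Gateway endpoint
INFERENCE_NET_KEY = "..."                      # your Inference.net key (placeholder)

def build_chat_request(model, user_message):
    """Return (url, headers, body) matching what the OpenAI client sends."""
    url = f"{GATEWAY}/v1/chat/completions"
    headers = {
        # The gateway forwards this bearer token to Inference.net as-is.
        "Authorization": f"Bearer {INFERENCE_NET_KEY}",
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    })
    return url, headers, body

url, headers, body = build_chat_request(
    "inference-net:meta-llama/llama-3.1-8b-instruct/fp-8", "Hello!"
)
```

The `inference-net:` prefix on the model name is what routes the request to the Inference.net provider; everything after it is the provider's own model identifier.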

Store key in the gateway

Instead of each client passing their own key, you can store it once in ngrok Secrets and have the gateway inject it automatically.
Storing your provider key in the gateway makes your endpoint publicly accessible. You must add authorization to prevent unauthorized use and unexpected charges. See Protecting BYOK Endpoints.
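In concrete terms, that means placing an authorization action before the ai-gateway action in your policy. The sketch below is illustrative only: the `basic-auth` action name, its config shape, and the credentials are assumptions here, so check ngrok's Traffic Policy documentation for the exact schema before using it.

```yaml
on_http_request:
  # Reject unauthenticated requests before the AI Gateway runs.
  - type: basic-auth
    config:
      credentials:
        - "client:a-strong-password"
  - type: ai-gateway
    config:
      providers:
        - id: "inference-net"
          api_keys:
            - value: ${secrets.get('inference-net', 'api-key')}
```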
1. Store your key in ngrok Secrets

ngrok api secrets create \
  --name inference-net \
  --secret-data '{"api-key": "..."}'
Or use the Vaults & Secrets dashboard.
2. Configure your traffic policy

on_http_request:
  - type: ai-gateway
    config:
      providers:
        - id: "inference-net"
          api_keys:
            - value: ${secrets.get('inference-net', 'api-key')}
3. Make a request

Clients no longer need an Inference.net key—pass any value for api_key.
from openai import OpenAI

client = OpenAI(
    base_url="https://your-ai-gateway.ngrok.app/v1",
    api_key="unused"  # Gateway injects your Inference.net key
)

response = client.chat.completions.create(
    model="inference-net:meta-llama/llama-3.1-8b-instruct/fp-8",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(response.choices[0].message.content)
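With authorization in front of the gateway, clients can see failures from two layers: the gateway rejecting their credentials, or the provider call failing upstream. A small hedged helper for telling them apart when deciding whether to retry (the status-code mapping below follows general HTTP conventions and is an assumption, not gateway-documented behavior):

```python
def should_retry(status: int) -> bool:
    """Decide whether an HTTP error from the gateway is worth retrying."""
    if status in (401, 403):
        return False  # credentials problem: retrying won't help, fix the key
    if status == 429 or status >= 500:
        return True   # rate limited or transient upstream failure
    return False      # other client errors (bad model name, bad payload, ...)
```

Surfacing 401/403 to the caller immediately, rather than retrying, keeps a misconfigured client from hammering your endpoint and burning requests.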

Next steps