What you’ll need
Your model server must:- Expose a supported API surface, such as OpenAI Chat Completions, OpenAI Responses, or Anthropic Messages.
- Be reachable from the AI Gateway.
- Have at least one model ID configured on the provider.
Connect a local model with an internal endpoint
If your model runs on your machine or private network, expose it with an ngrok internal endpoint. For example, if your model server listens on port11434:
Internal endpoints (
.internal domains) are private to your ngrok account, meaning they’re not reachable from the public internet. Use the same ngrok account here and in the AI Gateway, otherwise the gateway can’t reach the endpoint.Create a custom provider
In app.ngrok.ai:- Go to Providers.
- Open the Custom tab.
- Select Add provider.
- Choose External URL or Local.
- Enter a provider ID, base URL, API format, and model IDs.
- Add a provider key if the upstream requires authentication.
Create a custom provider with the API
Allow the provider on an access key
Creating a custom provider doesn’t automatically make every access key able to call it. To route traffic to the provider:- Add a provider key if the upstream requires authentication.
- Create or update an access key configuration.
- Allow the custom provider ID.
- Assign the configuration to an access key.
Call the model
Useprovider:model in the request.
URL requirements
| URL type | Scheme | Example |
|---|---|---|
| External | https:// only | https://api.example.com/v1 |
| ngrok internal | http:// or https:// | https://my-ollama.internal |
.internal endpoints. External URLs must use HTTPS.
Next steps
- Ollama guide: Connect Ollama
- vLLM guide: Connect vLLM
- Provider keys: Store upstream credentials
- Restrict providers and models: Allow the custom provider on a key