Introducing Endpoint Pools: Load balance anything, anywhere

Today, I’m excited to announce that we’ve added load balancing to ngrok with a new feature we call Endpoint Pooling. Load balancing is now as simple as starting two endpoints with the same URL. 

Let’s try it! First, fire up one endpoint.

ngrok http 8080 --url https://api.example.com --pooling-enabled

Then, run the same command again in a separate terminal (or another machine):

ngrok http 8080 --url https://api.example.com --pooling-enabled

Traffic to https://api.example.com will now be load balanced between the two.
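
To sanity-check the pool, send a handful of requests and watch the responses alternate (this assumes each backend returns something that identifies it, such as its hostname):

for i in $(seq 1 6); do curl -s https://api.example.com/; echo; done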

That's it. Truly. When two or more endpoints share a URL, ngrok load-balances between them whether they're running on different machines, environments, networks, or even in different clouds.

Endpoint Pools are available on all non-legacy paid plans, and you can also try them out on the free tier. On paid plans, they're available at no additional charge to start, but we'll add pricing as early as June. Read the Endpoint Pooling documentation to get started.

How ngrok's load balancer works

When two or more endpoints share a URL, ngrok groups them into an Endpoint Pool. Traffic to the pool's URL is balanced equally among the constituent endpoints via random distribution. Endpoints are automatically added to the pool as they come online and removed from it as they go offline.

Don’t worry if you don’t want your endpoints to pool. Pooling is disabled by default, and endpoints must explicitly enable pooling to be pooled together. If an existing endpoint has pooling disabled and a new endpoint is created on the same URL with pooling enabled (or vice-versa), creation of the new endpoint will fail.
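
As a quick illustration, assume the first command below is already running. The second will then be rejected, because the existing endpoint on that URL was started without pooling:

ngrok http 8080 --url https://api.example.com                    # pooling disabled (the default)
ngrok http 9090 --url https://api.example.com --pooling-enabled  # fails: the URL is held by a non-pooled endpoint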

Endpoint pooling is supported for all endpoints.

  • You can pool cloud and agent endpoints, including agent endpoints started via different mechanisms (CLI, SDK, k8s operator).
  • You can pool endpoints with any binding, including internal endpoints.
  • You can pool endpoints of all protocols. Traffic to HTTP/S endpoints is balanced at layer 7 (on a per-request basis), and traffic to TCP and TLS endpoints is balanced at layer 4 (on a per-connection basis); see the sketch after this list.
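
Here's a minimal sketch of layer 4 pooling (the reserved TCP address is a placeholder; substitute your own). Run the same command on two machines, and each client connection is balanced to one of them:

ngrok tcp 5432 --url tcp://1.tcp.ngrok.io:12345 --pooling-enabled   # machine A
ngrok tcp 5432 --url tcp://1.tcp.ngrok.io:12345 --pooling-enabled   # machine B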

Lastly, endpoints with different Traffic Policies can be pooled together. Traffic is balanced to an endpoint in the pool first, and then that endpoint's Traffic Policy rules are applied.

Balance across multi-cluster, multicloud, and hybrid environments

Today’s cloud-native workloads often span multiple clusters, container runtimes, networks and even clouds, but existing load balancers are limited to balancing only within a single cluster or a single network.

We designed Endpoint Pools so that you can balance across endpoints that are running on different networks, regions and cloud providers. You can even balance across cloud infrastructure and your own hardware. Host one replica in your office's server closet, another on a Droplet in Frankfurt, and a half-dozen in us-east-1.

If you’re running Kubernetes, it’s as easy as installing the ngrok Kubernetes Operator and applying the exact same Ingress or Gateway manifest in every cluster you want to balance between.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: demo-k3d
  namespace: demo
  annotations:
    k8s.ngrok.com/pooling-enabled: "true"
spec:
  ingressClassName: ngrok
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: hello-world
                port:
                  number: 80
Then apply it in each cluster you want in the pool:

kubectl apply -f ingress.yml --context eks-us-east-1
kubectl apply -f ingress.yml --context lke-eu-west
kubectl apply -f ingress.yml --context k3d

And just like that, ngrok load balances incoming requests among all these endpoints in the same pool—your multi-cluster and multicloud networking is done and dusted.

Resize pools with zero configuration

We designed Endpoint Pooling for today’s cloud-native world: backends run on elastic infrastructure like containers, pods, and entire clusters that scale up and down across heterogeneous environments. Dynamic infrastructure is the rule, not the exception.

Endpoint Pools require zero up-front configuration. Unlike other gateways, there is no list of backends to define. When you create endpoints with the same URL, they’re grouped into an Endpoint Pool and traffic begins balancing among them.

Endpoints automatically register and unregister themselves from the pool. Unlike other gateways, you do not need to keep ngrok in sync with a service registry. Endpoints join the pool when they come online and are removed from the pool when they shut down. Heartbeats ensure that endpoints that don’t exit cleanly are also removed from the pool.

Want to load balance between many endpoints? No problem. Just fire up more endpoints; there’s nothing to configure. We’ve begun to see load balancing used to farm out inference tasks to LLMs spread across a fleet of GPUs, which is why we architected Endpoint Pools to scale.
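
As a sketch of what that looks like in practice (the hostnames and inference port here are hypothetical), adding capacity is just a loop that starts more agents:

# start one pooled endpoint per GPU host; each joins the pool as it comes online
for host in gpu-01 gpu-02 gpu-03 gpu-04; do
  ssh "$host" "ngrok http 11434 --url https://inference.example.com --pooling-enabled" &
done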

On our pay-as-you-go plan, you can add up to 1000 endpoints to an Endpoint Pool. We plan to increase that limit as we optimize Endpoint Pools to scale to more demanding workloads.

Derisk your Traffic Policy changes

Updating a gateway’s configuration is a sweaty-palms proposition for even the most experienced engineers: one wrong configuration change and all of your traffic goes poof.

We’ve integrated Endpoint Pools with Traffic Policy to derisk these changes by allowing each endpoint in a pool to define its own Traffic Policy. As a reminder, Traffic Policy is ngrok’s CEL-based rules engine that enables you to filter, route, manage, and orchestrate traffic through your endpoints.

When traffic is received on a URL with an Endpoint Pool, it is first balanced to an endpoint in the pool and then that endpoint's Traffic Policy is applied. This means you can canary-test a new gateway configuration and build confidence in its correctness before rolling it out completely. If your configuration is broken, only the portion of traffic reaching that endpoint is disrupted.
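
Here's a minimal sketch of that canary flow (the policy file names are hypothetical). Run most of the pool on your known-good policy and one endpoint on the candidate; with N pooled endpoints and random equal balancing, the canary receives roughly 1/N of requests:

# on each stable machine: the known-good policy
ngrok http 8080 --url https://api.example.com --pooling-enabled --traffic-policy-file policy-v1.yml

# on one canary machine: the candidate policy
ngrok http 8080 --url https://api.example.com --pooling-enabled --traffic-policy-file policy-v2.yml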

Geo-aware global balancing

If you run globally distributed replicas of your application, ngrok tries to strike the delicate balance between performance and fairness by prioritizing the endpoints in your pool closest to where a given request came from.

ngrok will accelerate your traffic by balancing among the endpoints in a pool that are closest to the point of presence where the incoming traffic originates. To improve latency for global clients, you can spin up new geo-distributed replicas of your application. After you do so, ngrok will route traffic from nearby clients to the new replicas.

And more …

Custom load balancing strategies

The single most common question we get about endpoint pools is: “Can I customize how traffic is balanced?” The answer is: not yet. Custom load balancing strategies are coming soon—hop to the Early Access page in your dashboard to request a preview when they're ready.

Balance with the CLI, SDKs, and K8s Operator

Endpoint Pools work no matter how you use ngrok. That means you can load balance between endpoints created via the agent CLI, one of our SDKs in Go, Rust, JavaScript, and Python, or the Kubernetes Operator. You can even mix and match endpoints in the same pool.

Balance internal endpoints, too

Endpoint Pools are supported on all Endpoint Bindings, so you can balance your Internal Endpoints as well. When traffic is forwarded to an internal URL via the forward-internal action, it is balanced among those internal endpoints if they are pooled. This lets you balance among agent endpoints while keeping your Traffic Policy centrally managed on a Cloud Endpoint.
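
As a sketch (the URLs are illustrative): attach a Traffic Policy like the following to a Cloud Endpoint such as https://api.example.com, then start two or more pooled agent endpoints on the internal URL it forwards to.

on_http_request:
  - actions:
      - type: forward-internal
        config:
          url: https://service.internal

ngrok http 8080 --url https://service.internal --pooling-enabled   # run on each machine you want in the pool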

Ready to hop in?

Endpoint Pools are perfect for helping you:

  • Add capacity to your services
  • Tolerate failures with failover and redundancy
  • Geo-distribute your load based on where your customers are
  • Go multi-region, multi-cluster, or multicloud with no extra networking
  • Derisk gateway-level traffic management changes

I'm excited to see what you bring to the endpoint pool party after you've gotten yourself situated with our docs and guides.

Have questions about pools? Jump in on our new Discord server to chat with me and a bunch of other ngrokkers.

Alan Shreve
Alan Shreve is no stranger to building distributed systems at scale. He organically grew ngrok from zero to 5 million users before raising a $50 million Series A.