Drive application performance and stability with global rate limiting

Ensuring that over-eager or actively malicious clients cannot overwhelm your service is a key component of production-quality network ingress. Globally rate limiting API requests and incoming HTTP traffic is one way to keep your service reliable in the face of high request volume. This is a key capability of our API Gateway that we launched last month.

Rate limiting can be used to ensure that a resource-intensive service cannot be overwhelmed by incoming requests, that a particular user cannot demand all the attention of your software, and that capacity is reserved for paying users of your system. Each tenant deserves their fair share of resources; rate limiting gives you the power to prioritize clients that remain within their agreed-upon usage, plan for future traffic, and scale efficiently. This becomes especially critical for AI apps, where you want to deter excessive or suspicious activity by throttling the number of requests.

You can now apply rate limits with our Traffic Policy module. With our global rate limiter, when you host your service with ngrok, you get globally distributed traffic management while retaining the ability to control the total rate of requests to your software.

Configure a Rate Limit action

Rate limiting actions can be added to the inbound policy rule on any Traffic Policy module. You can configure the number of requests a particular endpoint can receive within a specified period of time and apply that limit to all requests together, per client IP address, or per HTTP request header value.

Let’s look at an example:

  - type: rate-limit
    config:
      name: Ursula of strong grip
      algorithm: sliding_window
      capacity: 93
      rate: 5m
      bucket_key:
        - conn.ClientIP

Our Rate Limit policy action limits the number of requests sent to your upstream server over a specified time period. Each rate limiter needs a distinctive name for us to call it by. Pick one carefully; your rate limiter will have to live with it its whole life! 

In this example, Ursula the rate limiter will forward an incoming request to your upstream service as long as fewer than 93 requests (the capacity property) have been sent in the last 5 minutes (the rate property).

The rate limiter action uses a sliding window algorithm to determine whether to limit a request to an endpoint. When Ursula makes her decision, she counts the requests in the last five minutes and allows the current request through if that count is less than 93. Her approach ensures that, on average, the number of requests to your upstream application over any five-minute interval will not exceed 93.
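To make Ursula's decision concrete, here is a minimal sketch of a sliding window limiter in Python. It is an illustration only, not ngrok's implementation: the class name, the in-memory timestamp log, and the choice to count only requests that were let through are all assumptions made for the example.

```python
from collections import deque


class SlidingWindowLimiter:
    """Hypothetical limiter: allow a request only if fewer than `capacity`
    requests were allowed during the last `window` seconds."""

    def __init__(self, capacity: int, window: float) -> None:
        self.capacity = capacity
        self.window = window
        self._allowed = deque()  # timestamps of requests that were let through

    def allow(self, now: float) -> bool:
        # Forget requests that have slid out of the window.
        while self._allowed and self._allowed[0] <= now - self.window:
            self._allowed.popleft()
        if len(self._allowed) < self.capacity:
            self._allowed.append(now)
            return True
        return False


# Same numbers as the example policy: 93 requests per 5 minutes (300 seconds).
limiter = SlidingWindowLimiter(capacity=93, window=300)
burst = [limiter.allow(now=0.0) for _ in range(100)]
print(burst.count(True))          # 93 pass, the remaining 7 are limited
print(limiter.allow(now=299.0))   # False: the burst is still inside the window
print(limiter.allow(now=300.0))   # True: the burst has slid out of the window
```

A real global limiter also has to share this state across many points of presence, which this single-process sketch does not attempt.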

Intuitively, configuring Ursula with a large number of requests over a long period of time allows bursty traffic while protecting against a protracted, excessive stream of requests. Conversely, configuring her with a small number of requests over a short period of time will prevent bursts of requests while allowing a continuous stream of reasonable requests.
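The trade-off can be checked with a small simulation. The sketch below is hypothetical: `count_allowed` is a helper written for this example, and the traffic patterns and the second configuration (3 requests per 10 seconds) are made-up values chosen to have a long-run rate similar to 93 requests per 5 minutes.

```python
from collections import deque


def count_allowed(timestamps, capacity, window):
    """Count how many requests a simple sliding window limiter lets through."""
    allowed = deque()  # timestamps of requests that were let through
    passed = 0
    for now in timestamps:
        # Drop requests that are no longer inside the window.
        while allowed and allowed[0] <= now - window:
            allowed.popleft()
        if len(allowed) < capacity:
            allowed.append(now)
            passed += 1
    return passed


burst = [0.0] * 20                      # 20 requests arriving at once
steady = [i * 5.0 for i in range(60)]   # one request every 5 seconds for 5 minutes

# Large capacity over a long window: the burst gets through.
print(count_allowed(burst, capacity=93, window=300))    # 20
print(count_allowed(steady, capacity=93, window=300))   # 60

# Small capacity over a short window: the burst is cut off, the steady stream is not.
print(count_allowed(burst, capacity=3, window=10))      # 3
print(count_allowed(steady, capacity=3, window=10))     # 60
```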

Use buckets to ensure fairness

Buckets allow you to specify the criteria to apply to incoming requests to determine whether the rate limit has been exceeded. Specifying no bucket key applies the same rate limit across all requests to your endpoint. This limits the total number of requests in a specified time period. However, this will still allow one abusive client to deny service to all other clients. 

To ameliorate this, requests can be divided across different buckets. In the example above, Ursula has one bucket per client IP address. Each bucket allows 93 requests over the last five minutes, so a single IP address spamming requests will be swiftly limited while Ursula will continue to permit other, more polite, IP addresses to continue making requests.
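One way to picture per-IP buckets is as a separate sliding window per key. The following sketch assumes a simple in-memory dictionary of buckets; the class name and the example IP addresses (taken from the documentation address ranges) are hypothetical and do not describe how ngrok stores its state.

```python
from collections import defaultdict, deque


class BucketedLimiter:
    """Hypothetical limiter that keeps one sliding window per bucket key."""

    def __init__(self, capacity, window):
        self.capacity = capacity
        self.window = window
        self._buckets = defaultdict(deque)  # bucket key -> timestamps of allowed requests

    def allow(self, key, now):
        bucket = self._buckets[key]
        # Drop requests that have slid out of this bucket's window.
        while bucket and bucket[0] <= now - self.window:
            bucket.popleft()
        if len(bucket) < self.capacity:
            bucket.append(now)
            return True
        return False


limiter = BucketedLimiter(capacity=93, window=300)

# One client sends 500 requests in quick succession.
spammer = [limiter.allow("203.0.113.7", now=1.0) for _ in range(500)]
print(spammer.count(True))                       # 93

# A different client is unaffected because it has its own bucket.
print(limiter.allow("198.51.100.22", now=2.0))   # True
```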

Bucket keys can be composed of the hostname to which the request is made, the client IP address, and values from the request’s HTTP headers. Including the hostname in the bucket key is important if you are configuring a Rate Limit policy action on an Edge serving multiple Domains and want each domain to have its own set of buckets. Limiting users of your upstream service by OAuth user can be done by including req.Headers['Authorization'] in the bucket key.
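Conceptually, a composite bucket key is just a tuple of the selected request attributes. The sketch below shows that idea in Python; the `bucket_key` function, its parameters, and the example hostnames are hypothetical and are not part of ngrok's configuration syntax.

```python
def bucket_key(host, client_ip, headers, header_names=()):
    """Build a hashable bucket key from the host, client IP, and chosen header values."""
    # HTTP header names are case-insensitive, so normalize before looking them up.
    lowered = {name.lower(): value for name, value in headers.items()}
    parts = [host, client_ip]
    for name in header_names:
        parts.append(lowered.get(name.lower(), ""))
    return tuple(parts)


a = bucket_key("api.example.com", "203.0.113.7",
               {"Authorization": "Bearer alice"}, ["Authorization"])
b = bucket_key("docs.example.com", "203.0.113.7",
               {"Authorization": "Bearer alice"}, ["Authorization"])
c = bucket_key("api.example.com", "203.0.113.7",
               {"Authorization": "Bearer bob"}, ["Authorization"])

print(a == b)   # False: same user, different hostname, separate buckets
print(a == c)   # False: same hostname and IP, different user, separate buckets
```

Requests that produce equal keys share a bucket, so every attribute added to the key makes the limit more fine-grained.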

Multiple rate limiters can be configured as part of the same policy. Applying a permissive rate limit action with no bucket key followed by a more restrictive rate limit action keyed by client IP can protect your service against a deluge of requests from a plethora of locations while also ensuring that individual offenders are limited far in advance of your upstream service being exhausted.
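The layering can be sketched as a list of limiters that are evaluated in order. Everything here is an assumption for illustration: the `Limiter` class, the `allow_all` helper, and the capacities (1000 per minute overall, 20 per minute per IP) are invented, and the sketch says nothing about how ngrok evaluates consecutive actions internally.

```python
from collections import defaultdict, deque


class Limiter:
    """Hypothetical sliding window limiter with a pluggable bucket key function."""

    def __init__(self, capacity, window, key_fn):
        self.capacity, self.window, self.key_fn = capacity, window, key_fn
        self._buckets = defaultdict(deque)  # bucket key -> timestamps of allowed requests

    def allow(self, request, now):
        bucket = self._buckets[self.key_fn(request)]
        while bucket and bucket[0] <= now - self.window:
            bucket.popleft()
        if len(bucket) < self.capacity:
            bucket.append(now)
            return True
        return False


def allow_all(limiters, request, now):
    # all() short-circuits: a request rejected by one limiter never reaches the next.
    return all(limiter.allow(request, now) for limiter in limiters)


policy = [
    Limiter(capacity=1000, window=60, key_fn=lambda req: "all"),            # no bucket key
    Limiter(capacity=20, window=60, key_fn=lambda req: req["client_ip"]),   # per client IP
]

noisy = sum(allow_all(policy, {"client_ip": "203.0.113.7"}, now=0.0) for _ in range(300))
polite = allow_all(policy, {"client_ip": "198.51.100.22"}, now=1.0)
print(noisy)    # 20
print(polite)   # True
```

In this sketch, requests rejected by the per-IP limiter have already been counted by the overall limiter, which is one reason to keep the first limit comfortably permissive.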

Interpret responses from a rate limiter

When Ursula chooses to limit a request, instead of forwarding it to your upstream service, she responds to it with an HTTP 429 Too Many Requests response. This response contains a Retry-After header with the number of seconds a well-behaved client should wait before retrying the request. Attempting to retry sooner will result in the request being limited again.
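On the client side, honoring that header can be as simple as the following sketch. The `retry_delay` helper and its one-second fallback are hypothetical; the sketch only assumes that the client has the status code and response headers at hand.

```python
def retry_delay(status, headers, default=1.0):
    """Return how many seconds to wait before retrying, or None if not rate limited."""
    if status != 429:
        return None
    # HTTP header names are case-insensitive, so normalize before looking them up.
    lowered = {name.lower(): value for name, value in headers.items()}
    try:
        return max(0.0, float(lowered["retry-after"]))
    except (KeyError, ValueError):
        # Header missing or not a plain number of seconds: fall back to a default wait.
        return default


print(retry_delay(200, {}))                      # None
print(retry_delay(429, {"Retry-After": "12"}))   # 12.0
print(retry_delay(429, {}))                      # 1.0
```

A client would sleep for the returned number of seconds before sending the request again.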

Get started with the ngrok Rate Limit Policy today!

This blog post should help you get started using ngrok’s Rate Limit policy action to ensure fairness to your upstream services. So, what else are you waiting for? You’ve already got this as part of the policy module! Quit reading and go have fun. If you don’t have an account, you can sign up and get started with ngrok. Connect with us on Twitter or in the ngrok community on Slack, or contact us to share your feedback.
