Protect services with a circuit breaker at your API gateway
You can now better protect your upstream services from attacks, abuse attempts, and cascading error conditions with our new circuit-breaker Traffic Policy action. A circuit breaker helps improve the reliability of your systems, beyond the DDoS protection included with every ngrok account, by rejecting requests when your services respond with 500-level error codes, then re-evaluating the health of your upstream service before resuming normal traffic flows.
You can now add and customize a circuit breaker in front of your APIs and apps with a single Traffic Policy rule:
```yaml
---
on_http_request:
  - actions:
      - type: circuit-breaker
        config:
          error_threshold: 0.25
```
If at least 25% of your upstream service's responses are errors, ngrok steps in to pause traffic and potentially prevent much bigger problems.
What is circuit breaking?
Unsurprising to anyone who’s linked too many strings of Christmas lights on the same outlet, the theory behind circuit breaking in software comes directly from the switches behind your breaker box.
When either system detects an unhealthy or unsustainable state, whether from overcurrent or error response codes, it pauses flow to prevent even worse conditions and allow components to return to their normal state. From a software engineering and resilience perspective, that’s incredibly useful in a few ways:
- ngrok’s network blocks malicious attacks before they have enough of an impact on your upstream services to degrade the experience for others.
- Your systems save on CPU cycles trying to complete tasks that will inevitably fail.
- You prevent cascading failures across services, particularly in a microservices environment, by stopping requests before they even ingress into your system.
- Users receive informative error messages instead of blank screens or hanging curl requests, which is a better user experience.
Where does circuit breaking happen—the service or the API gateway?
Does every device you plug into your outlets have a separate circuit breaker? Or do you have one place—a breaker box—to funnel current to many circuits from one convenient place?
Software circuit breakers work best the same way, but unlike with your laptops and refrigerators, you do get some choice in the matter.
With a service-level circuit breaker, you get fine-grained control and all the custom failure logic you can imagine. No matter the state of the rest of your infrastructure, you know this in-service circuit-breaking function will respond as expected. That control comes with some downsides—notably, it’s a lot of work to build this custom code, maintain state against the rest of your system, and observe what’s going on inside the system to make sure it's running the way you expect.
When you implement circuit breaking at your API gateway, you get consistent failure handling across your API services without having to build the same function over and over again. With a configuration language like Traffic Policy, you can pair circuit breaking with rate limiting, global load balancing, and DDoS protection to quickly implement multiple layers of protection for your services and systems.
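For example, here’s a sketch of a policy that pairs the two actions. The rate-limit fields shown (name, algorithm, capacity, rate, bucket_key) follow ngrok’s documented rate-limit action, but the specific values are illustrative assumptions, not recommendations:

```yaml
---
on_http_request:
  # First, reject any single client that sends more than 100 requests
  # per minute (hypothetical numbers; tune for your traffic).
  - actions:
      - type: rate-limit
        config:
          name: per-client-fairness
          algorithm: sliding_window
          capacity: 100
          rate: 60s
          bucket_key:
            - conn.client_ip
  # Then, pause all traffic if the upstream's error rate passes 25%.
  - actions:
      - type: circuit-breaker
        config:
          error_threshold: 0.25
```

The ordering matters: the rate limit filters out abusive clients before their requests can count toward the error rate the circuit breaker watches.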
Either way, you must first instrument your API service with proper error handling so it sends relevant success and error codes.
Does every API need a circuit breaker?
You get the most value from a circuit breaker when you have:
- API services with critical external dependencies.
- High-throughput systems where failure in one place can cascade into others.
- Systems with limited resources, like IoT devices or network appliances, which need extra protection from entering out-of-memory states.
- Microservice architectures with many possible failure points.
With proper observability data, you can also gauge how badly you need a circuit breaker by looking for increased latency during peak loads or resource exhaustion during partial outages. Cascading failures across services are a surefire sign that pausing the flow of traffic, and allowing your services time to recover, will help you from an operational perspective.
That said, you benefit every time you add circuit breaking to your API services.
What holds most folks back from implementing circuit breakers everywhere isn’t misplaced confidence that their services are “too small to fail” or have some built-in resilience on an architectural level. The big problem is that circuit breakers are difficult to implement in many situations.
That’s often because you try to implement circuit breaking at the service level and get stuck custom-wiring these complex patterns. In other cases, you realize you need to upgrade to a more expensive edition, or install third-party plugins, to enable the feature with your current API gateway or reverse proxy. That’s unfortunate, because the use case for, and benefit of, circuit breakers is pretty definitive.
But, instead of pondering over observability data and trying to decide when it’s the “right” time to finally implement circuit breaking for your API services, ngrok makes it so easy you have no excuse not to.
What can you do with ngrok’s circuit breaker?
As with all of ngrok’s Traffic Policy actions, you have plenty of opportunities to flexibly control exactly how circuit breaking works and at which points in your routing topology.
Tweak thresholds based on observability data
When you use ngrok as your API gateway, you can also use Traffic Inspector to view every request and response sent to your upstream services. That includes seeing the entire lifecycle of your circuit breaker action, including which requests triggered error codes from your upstream service, which requests were blocked by the tripped breaker (with the associated ngrok error 3202), and when they started to flow normally again.
This built-in observability helps you quickly configure the exact conditions to open your circuit breaker and debug common errors by replaying specific requests against a development environment.
Replace rate limits with a circuit breaker
If you're currently using rate limiting on your endpoints to restrict usage and protect your services' health as a side effect, a circuit breaker may be a better option.
In most cases, rate limiting is most useful to enforce fairness, in that every user gets equal access to your API or app. That's particularly important in a multitenant architecture, where everyone is pinging the same backend service—if one user becomes a "noisy neighbor" by sending too many requests, they can negatively impact the experience for others.
A circuit breaker is better suited to explicitly protect service health. In this case, you’re less concerned about a noisy neighbor than about any user activity that could take your services offline, which could include an accidental DDoS attack.
Another benefit of using a circuit breaker for service health is that it's based solely on the error rate of your service, not your overall infrastructure. If you use a rate limit to protect your services, you'll need to tweak your capacity/rate settings every time you scale up your infrastructure or make your code more efficient.
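To make that contrast concrete, here’s a hypothetical before-and-after. The capacity and rate numbers in the first policy are exactly the kind of values you’d have to re-tune after every scaling change, while the second policy only ever references the error rate:

```yaml
---
# Before: a rate limit sized to today's capacity (hypothetical numbers
# that must be revisited whenever the backend scales).
on_http_request:
  - actions:
      - type: rate-limit
        config:
          name: protect-backend
          algorithm: sliding_window
          capacity: 500
          rate: 60s
          bucket_key:
            - conn.client_ip
---
# After: a circuit breaker keyed only to the upstream's error rate,
# which stays meaningful no matter how much capacity you add.
on_http_request:
  - actions:
      - type: circuit-breaker
        config:
          error_threshold: 0.10
```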
Compose circuit breaker configurations on cloud and internal endpoints
One of the most powerful functions of Traffic Policy is that it’s inherently composable. That means you can have one circuit breaker policy attached to a cloud endpoint for all your services, then also configure different thresholds for specific upstream APIs that are more “fragile.”
Let’s walk through a quick example, using a combination of cloud and internal endpoints, to help you test circuit breaking at the API gateway level.
We’ll assume you have multiple internal endpoints running at https://foo.internal, https://bar.internal, and so on, each of which points to an upstream API service.
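If you need a quick way to stand those internal endpoints up, one option is the ngrok agent’s configuration file. The shape below follows the agent’s version 3 endpoints configuration, but the endpoint names and upstream ports are placeholder assumptions for wherever your API services actually listen:

```yaml
# ngrok agent config (version 3): two internal endpoints, one per upstream API.
version: 3
agent:
  authtoken: <your-authtoken>
endpoints:
  - name: foo
    url: https://foo.internal
    upstream:
      url: 8080   # placeholder port for the foo service
  - name: bar
    url: https://bar.internal
    upstream:
      url: 8081   # placeholder port for the bar service
```

With this in place, starting the agent brings both internal endpoints online so the cloud endpoint below has somewhere to forward traffic.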
In the ngrok dashboard, reserve a domain, then head over to the Endpoints section to create a new cloud endpoint. Leave the binding as Public and enter your reserved domain as the URL.
You’ll see an editable IDE with some example YAML, which you should replace with the following:
```yaml
---
on_http_request:
  - actions:
      - type: circuit-breaker
        config:
          error_threshold: 0.20
  - actions:
      - type: forward-internal
        config:
          url: https://${req.url.path}.internal
```
Click Save to apply this policy, which first checks whether to trip the circuit breaker on every HTTP request. An error rate of 20% within 10 seconds will trip the circuit breaker for an additional 10 seconds before traffic resumes flowing normally.
If the circuit breaker remains closed (that is, traffic is not paused), ngrok forwards requests to internal endpoints based on the path of the request (e.g. https://api.example.com/foo -> https://foo.internal and https://api.example.com/bar -> https://bar.internal).
You can then attach a separate policy, with fine-tuned thresholds, to https://foo.internal:
```yaml
---
on_http_request:
  - actions:
      - type: circuit-breaker
        config:
          error_threshold: 0.05
          tripped_duration: 2m
          window_duration: 1m
          volume_threshold: 25000
```
The circuit breaker on the cloud endpoint then triggers on more widespread error states, while the circuit breaker on https://foo.internal steps in only during the situations you want to prepare for in advance.
A tripped circuit breaker is better than a broken system
The circuit breaker action is available with every ngrok account—just sign up for free to get started. From there, we highly recommend our developer documentation on traffic management, from circuit breakers to mTLS and beyond:
- Get started with Traffic Policy and the ngrok agent
- Core concepts of Traffic Policy
- Variables available to conditionally handle traffic
Have questions or want to request a new feature in Traffic Policy? We’d love to hear from you. Hit us up at support@ngrok.com or create an issue or discussion on the ngrok community repo. If you prefer a more in-person experience, be sure to check out Office Hours, our monthly livestream for answering your questions with demos straight from our DevRel and Product folks.