Earlier this year, we released our owasp-crs-request and owasp-crs-response Traffic Policy actions for you to protect your ngrok endpoints with a Web Application Firewall (WAF).
In this post I'll explain how we built these WAF actions and how we dogfood them today.
A WAF is a type of firewall.
Like all firewalls, a WAF inspects traffic and decides whether to allow or deny it. What makes a WAF unique is that it understands the protocol web applications speak: HTTP. While traditional firewalls work with lower-level details like IP addresses, ports, and packets, a WAF understands the contents of each HTTP request, such as the headers, query params, and body.
A WAF defends against web application attacks. These are attacks that aim to trigger unintended application behavior by embedding malicious payloads inside otherwise valid HTTP. One example is SQL injection, where the request includes a SQL snippet meant to trick your application into running an unintended database command.
Even before adding these WAF actions, ngrok already had many of the core capabilities of a WAF.
First, ngrok already sees and understands HTTP. Because we sit in the traffic path between the internet and your upstream services, we see every HTTP request and response. And because we terminate TLS, we have access to the decrypted HTTP that a WAF needs.
Second, ngrok lets you write logic against HTTP contents. Today we support this through Traffic Policy. With Traffic Policy variables, you have access to the contents of an HTTP request. With Traffic Policy expressions, you can write logic against those variables. And with Traffic Policy rules, you can define what to do based on the evaluation.
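The split between variables, expressions, and rules can be pictured with a simplified Go model. This is not ngrok's engine. Traffic Policy expressions are CEL, while here each expression is a Go func, and the Vars, Rule, and evaluate names plus the flattened req.headers.user-agent key are all hypothetical. The model assumes rules run in order, every rule whose expressions all match runs its action, an empty expression list matches everything, and a terminating action (like custom-response) ends processing.

```go
package main

import (
	"fmt"
	"strings"
)

// Vars is a hypothetical, flattened stand-in for Traffic Policy variables.
type Vars map[string]string

// Expr is a predicate over variables. Traffic Policy uses CEL expressions;
// a Go func stands in for one here.
type Expr func(Vars) bool

// Rule pairs expressions with an action. Terminal marks actions, such as
// custom-response, that end request processing.
type Rule struct {
	Name        string
	Expressions []Expr
	Terminal    bool
}

// allMatch reports whether every expression is true. An empty list
// matches every request.
func allMatch(exprs []Expr, v Vars) bool {
	for _, e := range exprs {
		if !e(v) {
			return false
		}
	}
	return true
}

// evaluate walks rules in order, records every rule whose expressions all
// match (i.e. whose action runs), and stops after a terminal action.
func evaluate(rules []Rule, v Vars) []string {
	var ran []string
	for _, r := range rules {
		if !allMatch(r.Expressions, v) {
			continue
		}
		ran = append(ran, r.Name)
		if r.Terminal {
			break
		}
	}
	return ran
}

func main() {
	rules := []Rule{
		{Name: "log"}, // no expressions: runs for every request
		{
			Name: "block-sqli",
			Expressions: []Expr{func(v Vars) bool {
				return strings.Contains(v["req.headers.user-agent"], "' OR '1'='1 ")
			}},
			Terminal: true,
		},
		{Name: "forward"},
	}
	fmt.Println(evaluate(rules, Vars{"req.headers.user-agent": "x' OR '1'='1 --"}))
	// prints: [log block-sqli]
	fmt.Println(evaluate(rules, Vars{"req.headers.user-agent": "curl/8.0"}))
	// prints: [log forward]
}
```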
For example, here's a Traffic Policy rule that configures ngrok to protect your upstream application against a SQL injection attempt.
```yaml
on_http_request:
  - name: "Check user agent header for sql injection attempt"
    expressions:
      - req.headers["user-agent"].contains("' OR '1'='1 ")
    actions:
      - type: "custom-response"
        config:
          status_code: 403
          body: "you have been blocked by our WAF"
```

However, users told us they wanted more. They didn't want to write or maintain their own rules for every web attack pattern. They wanted ngrok to provide a "one-click" way to defend against the most common attacks.
What we were missing was a way to tell our system what to look for.
Maintained by the Open Worldwide Application Security Project (OWASP), a nonprofit dedicated to improving application security, the OWASP Top Ten is a widely referenced list of the most important web application risks. While this list is great for awareness, it doesn't directly translate into rules a WAF can execute.
The OWASP CRS, however, fills this gap. The CRS ("Core Rule Set") is a collection of rules and signatures that includes phrase matches, regex patterns, and heuristics designed to identify attacks.
As an example, below is a CRS rule that detects SQL injection attempts by looking for common database names. It uses a regex to scan fields in an HTTP request. If it finds a match, it increments an "anomaly score." The CRS isn't pass/fail; instead, it maintains a score for how likely a request is to be malicious.
```
SecRule REQUEST_COOKIES|!REQUEST_COOKIES:/__utm/|REQUEST_COOKIES_NAMES|ARGS_NAMES|ARGS|XML:/* "@rx (?i)\b(?:d(?:atabas|b_nam)e[^0-9A-Z_a-z]*\(|(?:information_schema|m(?:aster\.\.sysdatabases|s(?:db|ys(?:ac(?:cess(?:objects|storage|xml)|es)|modules2?|(?:object|querie|relationship)s))|ysql\.db)|northwind|pg_(?:catalog|toast)|tempdb)\b|s(?:chema(?:_name\b|[^0-9A-Z_a-z]*\()|(?:qlite_(?:temp_)?master|ys(?:aux|\.database_name))\b))" \
    "id:942140,\
    phase:2,\
    block,\
    capture,\
    t:none,t:urlDecodeUni,\
    msg:'SQL Injection Attack: Common DB Names Detected',\
    logdata:'Matched Data: %{TX.0} found within %{MATCHED_VAR_NAME}: %{MATCHED_VAR}',\
    tag:'application-multi',\
    tag:'language-multi',\
    tag:'platform-multi',\
    tag:'attack-sqli',\
    tag:'paranoia-level/1',\
    tag:'OWASP_CRS',\
    tag:'OWASP_CRS/ATTACK-SQLI',\
    tag:'capec/1000/152/248/66',\
    tag:'PCI/6.5.2',\
    ver:'OWASP_CRS/4.14.0',\
    severity:'CRITICAL',\
    setvar:'tx.sql_injection_score=+%{tx.critical_anomaly_score}',\
    setvar:'tx.inbound_anomaly_score_pl1=+%{tx.critical_anomaly_score}'"
```

While we could have expanded our earlier Traffic Policy to catch more cases with a regex, that gets tedious fast. The CRS provides a broadly applicable ruleset you can enable with a few lines of Traffic Policy. The ruleset is also open source, and the community behind it keeps it updated as real-world attack patterns change.
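The setvar lines in rule 942140 feed the CRS's anomaly scoring. A simplified Go sketch of that idea follows. It assumes the CRS default severity points (critical 5, error 4, warning 3, notice 2) and the default inbound threshold of 5, and it omits the per-paranoia-level bookkeeping real CRS does. Match and decide are hypothetical names, and rule ID 1 in the example is a placeholder.

```go
package main

import "fmt"

// Severity-to-points mapping, using the CRS default anomaly score values.
var points = map[string]int{
	"CRITICAL": 5,
	"ERROR":    4,
	"WARNING":  3,
	"NOTICE":   2,
}

// Match records one rule that fired, e.g. 942140 with severity CRITICAL.
type Match struct {
	ID       int
	Severity string
}

// decide sums the points of every matched rule and denies once the total
// reaches the threshold, instead of failing on any single match.
func decide(matches []Match, threshold int) (score int, deny bool) {
	for _, m := range matches {
		score += points[m.Severity]
	}
	return score, score >= threshold
}

func main() {
	// 5 is the CRS default inbound anomaly score threshold.
	fmt.Println(decide([]Match{{ID: 942140, Severity: "CRITICAL"}}, 5))
	// prints: 5 true
	fmt.Println(decide([]Match{{ID: 1, Severity: "NOTICE"}}, 5))
	// prints: 2 false
}
```

Because scores accumulate, several low-severity matches can still add up to a deny, while one weak signal on its own does not.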
Having decided on our ruleset, we still needed a way to evaluate these rules. CRS rules are written in SecLang, a domain-specific language for WAFs.
Fortunately, OWASP Coraza solves this. It's an open-source, high-performance WAF engine that parses and executes CRS rules. It also has a native Go library, which is perfect for us, as most of ngrok is written in Go.
From the start we knew if we were going to build a WAF product, we wanted to make it part of the Traffic Policy system.
We decided to add two Traffic Policy actions, owasp-crs-request and owasp-crs-response. The former runs CRS rules on the HTTP request and the latter on the HTTP response.
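The two actions divide the CRS's numbered processing phases between them. Assuming the mapping follows SecLang's standard numbering (1 request headers, 2 request body, 3 response headers, 4 response body, 5 logging), a tiny Go lookup captures it. phasesFor is a hypothetical helper, not ngrok code, and it never schedules phase 5 because the logging phase is disabled.

```go
package main

import "fmt"

// CRS (SecLang) processing phases.
const (
	PhaseRequestHeaders  = 1
	PhaseRequestBody     = 2
	PhaseResponseHeaders = 3
	PhaseResponseBody    = 4
	PhaseLogging         = 5 // never scheduled in this sketch
)

// phasesFor maps a Traffic Policy phase to the CRS phases the matching
// WAF action would run. Other Traffic Policy phases get no WAF phases.
func phasesFor(tpPhase string) []int {
	switch tpPhase {
	case "on_http_request": // owasp-crs-request
		return []int{PhaseRequestHeaders, PhaseRequestBody}
	case "on_http_response": // owasp-crs-response
		return []int{PhaseResponseHeaders, PhaseResponseBody}
	default:
		return nil
	}
}

func main() {
	for _, p := range []string{"on_http_request", "on_http_response"} {
		fmt.Println(p, phasesFor(p))
	}
}
```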
Embedding Coraza into Traffic Policy brought several advantages:
- Traffic Policy phases map naturally onto CRS phases: on_http_request to CRS's request headers and request body phases, and on_http_response to the response headers and response body phases.
- The actions support a dry-run mode (via the on_error config option) where the action runs but does not actually deny traffic. This is how WAF deployments typically work: first you run the WAF in detection mode to make sure you don't have false positives, and then, once you're satisfied, you run it in block mode.
- owasp-crs-request and owasp-crs-response return result variables to explain not only that a request was blocked, but why.

ngrok runs a multi-tenant platform that serves thousands of customers and hundreds of thousands of endpoints. While building out these WAF actions, we identified two main risks to running at scale.
ngrok endpoints are backed by handler chains which run the logic defined in each endpoint's Traffic Policy. These are lazily initialized, but once someone hits an endpoint with a request, the compiled handlers live in memory. At hundreds of thousands of active endpoints, we need to be careful about the memory footprint of every compiled handler.
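Lazy initialization means a handler chain is compiled on an endpoint's first request and then stays resident. A minimal Go sketch of that caching pattern, with a hypothetical Handler type standing in for a compiled chain (ngrok's actual handler types aren't described here):

```go
package main

import (
	"fmt"
	"sync"
)

// Handler is a hypothetical compiled Traffic Policy handler chain.
type Handler struct {
	EndpointID string
}

// HandlerCache compiles an endpoint's chain on first use and keeps it in
// memory afterwards.
type HandlerCache struct {
	mu       sync.Mutex
	handlers map[string]*Handler
	compiles int // counts compilations, to show each endpoint compiles once
}

func NewHandlerCache() *HandlerCache {
	return &HandlerCache{handlers: make(map[string]*Handler)}
}

// Get returns the cached chain, compiling it only on the first request.
func (c *HandlerCache) Get(endpointID string) *Handler {
	c.mu.Lock()
	defer c.mu.Unlock()
	if h, ok := c.handlers[endpointID]; ok {
		return h
	}
	c.compiles++
	h := &Handler{EndpointID: endpointID} // stand-in for real compilation
	c.handlers[endpointID] = h
	return h
}

func main() {
	c := NewHandlerCache()
	c.Get("ep_a")
	c.Get("ep_a")
	c.Get("ep_b")
	fmt.Println(len(c.handlers), c.compiles)
	// prints: 2 2
}
```

In this sketch nothing is evicted, so resident memory grows with every endpoint that has received traffic. That growth is why the size of each compiled chain matters.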
Profiling showed that a compiled Coraza instance consumes around 25MB of memory. This is driven primarily by the compiled pattern matchers (~33% of the memory cost) and regex structures (~10%) that enable efficient scanning at runtime. 25MB for a WAF instance doesn't sound that bad, but across hundreds of thousands of endpoints, it quickly adds up.
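A rough back-of-the-envelope calculation shows how quickly. Assume, hypothetically, that 100,000 endpoints each held their own compiled instance (the real count and how endpoints spread across nodes aren't given here):

```latex
\[
25\,\text{MB/instance} \times 100{,}000\ \text{instances} = 2{,}500{,}000\,\text{MB} \approx 2.5\,\text{TB}
\]
```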
This led us to design the WAF actions around a singleton Coraza instance per node, shared across all endpoints that use the WAF actions. Coraza makes this safe because the WAF instance stores only the global CRS rule state, while each HTTP request gets its own Coraza Transaction to hold request-specific context.
We also disable the logging phase of the CRS, which eliminates the possibility of sharing logs across tenants. Instead, we store this data in action result variables per request.
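The shape of that design can be sketched in Go. Coraza's real API expresses it as one WAF built with coraza.NewWAF and a per-request transaction from the WAF's NewTransaction method. To keep the sketch dependency-free, it uses hypothetical Engine and Transaction types and a toy substring matcher (the pattern is one of the names rule 942140 looks for). The point is the split: read-only rule state is built once per process via sync.Once, and matches live only on the per-request transaction.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// Engine is a hypothetical stand-in for a compiled WAF. It holds only
// read-only, global rule state, so one instance can be shared.
type Engine struct {
	rules map[int]string // rule ID -> substring pattern (toy matcher)
}

var (
	engineOnce sync.Once
	engine     *Engine
	builds     int // how many times the engine was built
)

// sharedEngine builds the engine once per process ("node") and returns
// the same pointer to every endpoint afterwards.
func sharedEngine() *Engine {
	engineOnce.Do(func() {
		builds++
		engine = &Engine{rules: map[int]string{942140: "information_schema"}}
	})
	return engine
}

// Transaction holds everything specific to one request. Matches live
// here, not on the Engine, so tenants never see each other's results.
type Transaction struct {
	engine  *Engine
	Matched []int
}

func (e *Engine) NewTransaction() *Transaction {
	return &Transaction{engine: e}
}

// Inspect checks one request field against every rule, case-insensitively.
func (tx *Transaction) Inspect(field string) {
	lower := strings.ToLower(field)
	for id, pattern := range tx.engine.rules {
		if strings.Contains(lower, pattern) {
			tx.Matched = append(tx.Matched, id)
		}
	}
}

func main() {
	a := sharedEngine().NewTransaction()
	b := sharedEngine().NewTransaction()
	a.Inspect("SELECT * FROM INFORMATION_SCHEMA.tables")
	b.Inspect("hello")
	fmt.Println(builds, a.Matched, b.Matched)
	// prints: 1 [942140] []
}
```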
We had to make a trade-off between how much of each body we give Coraza to scan and the stability of our platform. The more of the body we process, the more attacks we can catch, but the more memory we need to devote to it.
Bodies need to be buffered in memory to be scanned. Every additional buffered byte increases memory pressure across the platform, and too much memory pressure can cause more frequent garbage collection, which increases latency, or pods that run out of memory and crash.
To determine the largest amount we could safely handle, we gathered data by running load tests against production ingress nodes. These tests varied along dimensions like the Traffic Policy used on the endpoint and body size.
As a proxy for balancing performance and stability, our success criterion was handling the 99th-percentile HTTP request rate without triggering alerts from failures in our end-to-end test suite, which runs continuously and exercises the functionality our customers rely on.
Through this testing, we arrived at the current limit of 4KB. We chose to err on the conservative side as a starting point.
We love to dogfood our products, not only because they're useful to us, but also to make sure they're actually ready for use. Ahead of releasing the WAF actions publicly, we ran them on ngrok.com for several months.
1. We first enabled the actions in dry-run mode (on_error: continue). Immediately we saw in the logs that traffic was being blocked.
2. We excluded rule 953100 (exclude_rule_ids: 953100 for owasp-crs-response).
3. We switched the actions to block mode (on_error: halt).
4. A request for /downloads/docker-desktop to ngrok.com then triggered rule 932260, which uses a regex-based check to scan for remote command execution attacks. Here the substring docker- was matching.
5. We excluded that rule as well (exclude_rule_ids: 932260 for owasp-crs-request) and then re-enabled the actions to run in block mode, which is where we are today.

In our original design, owasp-crs-request and owasp-crs-response didn't give you any control over the rules they ran. You could configure whether they ran in dry-run mode or block mode, but you couldn't disable specific rules if they caused you trouble.
Since all of the false positives on our docs had been triggered by only a few rules, which were very much causing us trouble, we added the exclude_rule_ids parameter. It lets you disable specific rules that produce false positives for you.
If you really wanted to, you could use Traffic Policy to disable specific rules only for specific paths. We didn't, because we're not running PHP, those particular rules aren't likely to help us, and we want to keep our configuration simple.
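Conceptually, excluding a rule just removes it from the set of matches before a decision is made. The Go sketch below assumes that behavior for exclude_rule_ids and adds the per-path variant as a hypothetical ByPath map. The Exclusions type and the /legacy/ prefix are illustrative, not part of the action's configuration.

```go
package main

import (
	"fmt"
	"strings"
)

// Exclusions models global rule exclusions plus a hypothetical per-path
// variant (path prefix -> rule IDs).
type Exclusions struct {
	Global []int
	ByPath map[string][]int
}

// active returns the matched rule IDs that still count for this path,
// preserving their original order.
func (x Exclusions) active(path string, matched []int) []int {
	skip := make(map[int]bool)
	for _, id := range x.Global {
		skip[id] = true
	}
	for prefix, ids := range x.ByPath {
		if strings.HasPrefix(path, prefix) {
			for _, id := range ids {
				skip[id] = true
			}
		}
	}
	var out []int
	for _, id := range matched {
		if !skip[id] {
			out = append(out, id)
		}
	}
	return out
}

func main() {
	x := Exclusions{
		Global: []int{932260},
		ByPath: map[string][]int{"/legacy/": {953100}},
	}
	fmt.Println(x.active("/downloads/docker-desktop", []int{932260}))
	// prints: []
	fmt.Println(x.active("/legacy/info.php", []int{953100, 942140}))
	// prints: [942140]
	fmt.Println(x.active("/other", []int{953100}))
	// prints: [953100]
}
```

The trade-off the paragraph describes shows up directly here: global exclusions are one list to maintain, while per-path exclusions are more precise but add another map to keep correct.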
Today we run the WAF actions in dry-run mode, using action result variables to drive log and custom-response actions that trigger on a deny. The log action lets us send deny metadata to Datadog, and the custom-response action actually blocks the traffic by responding with a nicely formatted error page.
Sending deny logs to Datadog also lets us set up alerts for when there's a high level of denies. These alerts page our on-call engineers so that they can quickly investigate the cause of the denies.
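This dry-run-plus-follow-up-rules pattern can be modeled in a few lines of Go. Everything here is a hypothetical stand-in. runWAF plays the role of owasp-crs-request, the Result fields mirror the decision, anomaly_score, and error.code variables used in the policy, and the threshold of 5 is the CRS default. The 403 returned in halt mode is an assumption about the action's built-in deny.

```go
package main

import "fmt"

// Result mirrors a few action result variables; the struct is hypothetical.
type Result struct {
	Decision     string // "allow" or "deny"
	AnomalyScore int
	ErrorCode    string
}

// runWAF is a toy stand-in for owasp-crs-request. With onError == "halt"
// a deny stops processing (block mode); with "continue" it only records
// the decision so later rules can act on it (dry-run mode).
func runWAF(score, threshold int, onError string) (res Result, halted bool) {
	res = Result{Decision: "allow", AnomalyScore: score}
	if score >= threshold {
		res.Decision = "deny"
		res.ErrorCode = "ERR_NGROK_3700"
		return res, onError == "halt"
	}
	return res, false
}

// handle runs the WAF, then a log rule and a custom-response rule keyed
// on the recorded decision, and returns the status and whether it logged.
func handle(score int, onError string) (status int, logged bool) {
	res, halted := runWAF(score, 5, onError)
	if halted {
		return 403, false // block mode: later rules never run
	}
	if res.Decision == "deny" {
		logged = true // stand-in for the log action shipping metadata
	}
	if res.Decision == "deny" && res.ErrorCode == "ERR_NGROK_3700" {
		return 403, logged // stand-in for the custom-response error page
	}
	return 200, logged
}

func main() {
	fmt.Println(handle(5, "continue")) // prints: 403 true
	fmt.Println(handle(5, "halt"))     // prints: 403 false
	fmt.Println(handle(0, "continue")) // prints: 200 false
}
```

Running in continue mode is what lets the log rule see every deny; in halt mode the request ends at the WAF action and the metadata never reaches the log rule.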
The WAF actions form one layer of our defenses for ngrok.com. Within the same Traffic Policy document for ngrok.com, we also run the rate-limit and close-connection actions to protect us against volume-based attacks.
```yaml
on_http_request:
  - name: run waf on all requests in continue mode
    # scan all traffic
    expressions: []
    actions:
      - config:
          exclude_rule_ids:
            - 932260
          # in order to run the log action
          on_error: continue
          process_body: true
        type: owasp-crs-request
  - name: log all waf deny decisions
    expressions:
      - actions.ngrok.owasp_crs_request.decision == 'deny'
    actions:
      - config:
          metadata:
            action: waf deny
            anomaly_score: ${actions.ngrok.owasp_crs_request.anomaly_score}
            first_matched_data: ${actions.ngrok.owasp_crs_request.matched_rules[0].data}
            first_matched_id: ${actions.ngrok.owasp_crs_request.matched_rules[0].id}
            first_matched_message: ${actions.ngrok.owasp_crs_request.matched_rules[0].message}
            first_matched_severity: ${actions.ngrok.owasp_crs_request.matched_rules[0].severity}
            ngrok_error_message: ${actions.ngrok.owasp_crs_request.error.message}
        type: log
  - name: "return ngrok error page for all requests denied with 403 (exceeded anomaly threshold)"
    expressions:
      # check whether the request was denied due to exceeding the anomaly threshold
      - actions.ngrok.owasp_crs_request.decision == 'deny' &&
        actions.ngrok.owasp_crs_request.error.code == 'ERR_NGROK_3700'
    actions:
      - config:
          # this is just a big ol' blob of HTML that we want to return to the client
          body: >
            <!DOCTYPE html>
            <html class="h-full" lang="en-US" dir="ltr">
            <head>
            <meta charset="utf-8">
            <meta name="viewport" content="width=device-width, initial-scale=1">
            <link rel="preload" href="https://assets.ngrok.com/static/fonts/euclid-square/EuclidSquare-Regular-WebS.woff" as="font" type="font/woff" crossorigin="anonymous" />
            <link rel="preload" href="https://assets.ngrok.com/static/fonts/euclid-square/EuclidSquare-RegularItalic-WebS.woff" as="font" type="font/woff" crossorigin="anonymous" />
            <link rel="preload" href="https://assets.ngrok.com/static/fonts/euclid-square/EuclidSquare-Medium-WebS.woff" as="font" type="font/woff" crossorigin="anonymous" />
            <link rel="preload" href="https://assets.ngrok.com/static/fonts/euclid-square/EuclidSquare-MediumItalic-WebS.woff" as="font" type="font/woff" crossorigin="anonymous" />
            <link rel="preload" href="https://assets.ngrok.com/static/fonts/ibm-plex-mono/IBMPlexMono-Text.woff" as="font" type="font/woff" crossorigin="anonymous" />
            <link rel="preload" href="https://assets.ngrok.com/static/fonts/ibm-plex-mono/IBMPlexMono-TextItalic.woff" as="font" type="font/woff" crossorigin="anonymous" />
            <link rel="preload" href="https://assets.ngrok.com/static/fonts/ibm-plex-mono/IBMPlexMono-SemiBold.woff" as="font" type="font/woff" crossorigin="anonymous" />
            <link rel="preload" href="https://assets.ngrok.com/static/fonts/ibm-plex-mono/IBMPlexMono-SemiBoldItalic.woff" as="font" type="font/woff" crossorigin="anonymous" />
            <meta name="author" content="ngrok">
            <meta name="description" content="ngrok is the fastest way to put anything on the internet with a single command.">
            <meta name="robots" content="noindex, nofollow">
            <link id="style" rel="stylesheet" href="https://cdn.ngrok.com/static/css/error.css">
            <noscript>The request was blocked by the WAF. (ERR_NGROK_3700)</noscript>
            <script id="script" src="https://cdn.ngrok.com/static/js/error.js" type="text/javascript"></script>
            </head>
            <body class="h-full" id="ngrok">
            <div id="root" data-payload="eyJjZG5CYXNlIjoiaHR0cHM6Ly9jZG4ubmdyb2suY29tLyIsImNvZGUiOiIzNzAwIiwibWVzc2FnZSI6IlRoZSByZXF1ZXN0IHdhcyBibG9ja2VkIGJ5IHRoZSBXQUYuIiwidGl0bGUiOiJGb3JiaWRkZW4ifQ=="></div>
            </body>
            </html>
          headers:
            Content-Type: text/html
            Referrer-Policy: no-referrer
            ngrok-error-code: ERR_NGROK_3700
          status_code: 403
        type: custom-response
on_http_response:
  # The changes we made here mirror the above, cut here to save space.
```

We've been running these WAF actions on ngrok.com for more than six months. In that time, they have run the CRS rules on every one of the 300M+ requests and responses. Of those, we've blocked ~1.2% (~3.6M) of requests from reaching the upstream services of ngrok.com.
Sampling from the deny logs, we see that the attempted attacks vary widely. These attacks have triggered 98 different CRS rules, and the number of matches per rule follows a power-law distribution. The attack patterns also vary over time.
The two most frequently matched rules, 920440 and 930130, both protect against attempts to access sensitive files. The first uses a regex to extract the file type from the last part of the path and compares it against a list of restricted extensions (e.g. .log, .bak). The second uses a phrase match to look for sensitive filenames and directories (e.g. .env, .git/) in the request path.
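Both detection styles are easy to approximate. This Go sketch uses tiny hypothetical lists in place of the CRS's much longer data files. It illustrates the two techniques, extension extraction plus a lookup and a substring phrase match, rather than reproducing rules 920440 and 930130 exactly.

```go
package main

import (
	"fmt"
	"path"
	"regexp"
	"strings"
)

// extRe captures the extension of the last path segment.
var extRe = regexp.MustCompile(`\.([^./]+)$`)

// Hypothetical, heavily trimmed lists; the real CRS ships much longer ones.
var restrictedExts = map[string]bool{"log": true, "bak": true}
var sensitivePhrases = []string{".env", ".git/"}

// restrictedExtension reports whether the last path segment ends in a
// restricted extension (a 920440-style check).
func restrictedExtension(p string) bool {
	m := extRe.FindStringSubmatch(path.Base(p))
	return m != nil && restrictedExts[strings.ToLower(m[1])]
}

// sensitivePath reports whether any sensitive phrase occurs anywhere in
// the path (a 930130-style phrase match).
func sensitivePath(p string) bool {
	lp := strings.ToLower(p)
	for _, phrase := range sensitivePhrases {
		if strings.Contains(lp, phrase) {
			return true
		}
	}
	return false
}

func main() {
	for _, p := range []string{"/backup/db.bak", "/.git/config", "/.env", "/downloads/docker-desktop"} {
		fmt.Println(p, restrictedExtension(p), sensitivePath(p))
	}
}
```

The two checks complement each other: /.git/config has no suspicious extension, so only the phrase match catches it, while /backup/db.bak is caught only by the extension check.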
We have come a long way from our initial design of the WAF actions to validating that they run at scale in front of ngrok.com. But we know we are still early in our WAF journey. We want to do more, including:
We'd also love to hear what you're looking for in a WAF. If you have feedback on our Traffic Policy actions or feature requests, please share them in our GitHub community repo.