December 14, 2023

Get started with User Agent Filtering

Mandy Hubbard

ngrok users can now use the User Agent Filter module to filter traffic destined for upstream services based on the value of the HTTP <code>user-agent</code> request header. 

What is a User Agent?

A User Agent is any application that can interact with another application on behalf of an end user. This includes other applications, such as front-end applications that access backend APIs, as well as bots and scrapers. Command line tools such as <code>curl</code> are also user agents, as are download managers.

User Agent HTTP header

When a client makes an HTTP request to your service, that request includes a <code>user-agent</code> header containing information about the application making the request, such as the type of application, its operating system, software vendor name, and software version. The value of the <code>user-agent</code> header is known as the User Agent (UA) string. Applications use the UA string to decide which content, if any, to return to the client in the HTTP response.
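To make that structure concrete, here is a minimal Python sketch that splits a UA string into product tokens (a name with an optional version) and parenthesized comments, following the User-Agent grammar in RFC 9110. The function name <code>parse_user_agent</code> is hypothetical, and real-world UA strings do not always follow the grammar, so treat this as an illustration rather than a robust parser.

```python
import re
from typing import List, Optional, Tuple

# RFC 9110 (section 10.1.5): User-Agent = product *( RWS ( product / comment ) )
# where product = token [ "/" product-version ]. The character set below is
# the RFC's "tchar" definition for tokens.
_TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
_PRODUCT = re.compile(rf"({_TOKEN})(?:/({_TOKEN}))?")


def parse_user_agent(ua: str) -> Tuple[List[Tuple[str, Optional[str]]], List[str]]:
    """Split a UA string into (name, version) products and parenthesized comments.

    Simplification: backslash escapes inside comments are not handled.
    """
    products: List[Tuple[str, Optional[str]]] = []
    comments: List[str] = []
    i, n = 0, len(ua)
    while i < n:
        ch = ua[i]
        if ch.isspace():
            i += 1
        elif ch == "(":
            # Comments may nest, so track parenthesis depth until it returns to zero.
            start, depth = i, 0
            while i < n:
                if ua[i] == "(":
                    depth += 1
                elif ua[i] == ")":
                    depth -= 1
                    if depth == 0:
                        break
                i += 1
            comments.append(ua[start + 1 : i])
            i += 1  # step past the closing ")"
        else:
            m = _PRODUCT.match(ua, i)
            if m is None:
                i += 1  # skip a character that cannot start a product token
                continue
            products.append((m.group(1), m.group(2)))
            i = m.end()
    return products, comments
```

For Google’s documented desktop crawler string, <code>Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)</code>, this returns <code>([('Mozilla', '5.0')], ['compatible; Googlebot/2.1; +http://www.google.com/bot.html'])</code>: the crawler’s own name and version live inside the comment, which is why filter rules usually search the whole string rather than only the leading product.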

For example, Googlebot, Google’s web crawler, might make a request that includes this User Agent string in the <code>user-agent</code> HTTP request header:

Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

User Agent filtering

Filtering based on the UA string lets you keep bots and crawlers from accessing your service. Most bots do no harm, and some benefit you by monitoring your application’s performance and availability or by indexing your website to optimize search engine results. However, some bots crawl your website or application in search of email addresses to spam, or probe for a backdoor into your application, such as an exposed development login page. You can also filter out known malicious User Agents or block old clients or browsers that are below a minimum version.
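Blocking clients below a minimum version takes some care, because regular expressions compare characters, not numbers. The sketch below assumes a hypothetical minimum Chrome major version of 100 and builds a pattern that matches majors 0 through 99. It uses only syntax shared by Python’s <code>re</code> module and RE2; whether ngrok accepts this exact pattern as a <code>deny</code> rule is an assumption, and the helper name <code>below_minimum_chrome</code> is illustrative.

```python
import re

# Hypothetical deny rule: "Chrome/" followed by a one- or two-digit major
# version and then a dot, i.e. any major version below an assumed minimum of 100.
OLD_CHROME = r"Chrome/(?:[0-9]|[1-9][0-9])\."


def below_minimum_chrome(user_agent: str) -> bool:
    """Return True when the UA string advertises a Chrome major version below 100."""
    return re.search(OLD_CHROME, user_agent) is not None
```

Requiring the trailing <code>\.</code> is what stops <code>Chrome/120.0</code> from matching on its first two digits. Note that other Chromium-based browsers also include a <code>Chrome/</code> token, so a rule like this affects them too.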

Whatever your reason for restricting certain traffic, ngrok’s User Agent Filter module lets you do it without modifying your application code.

How does the User Agent Filter module work?

ngrok’s User Agent Filter module equips you to filter traffic to your endpoints based on the User Agent string by matching the value against a list of <code>allow</code> and <code>deny</code> regular expression rules you define. You can use the module with the ngrok agent, the agent SDKs, or the ngrok Ingress Controller for Kubernetes. 

  • ngrok agent: Define the rules in the agent configuration file or pass them as parameters on the command line
  • SDKs: Describe rules declaratively within your application code
  • Ingress Controller for Kubernetes: Declare the rules in your manifest and apply them to your cluster

When a request destined for your upstream service arrives in ngrok’s global network, the User Agent Filter module checks its <code>user-agent</code> header against your <code>allow</code> and <code>deny</code> regular expression rules. Requests that the rules permit are forwarded to your upstream service; requests that the rules block are rejected with a <code>403 Forbidden</code> error.

If the request does not include a <code>user-agent</code> header, or the header’s value is empty, the module forwards the request to your service. If you’d prefer not to receive requests without a User Agent, you can add a <code>deny</code> rule that matches the empty string, such as <code>^$</code>. Requests may contain multiple <code>user-agent</code> headers, which the module matches as a single value.

In this example, HTTP request 1 comes from Googlebot Images, a special type of web crawler used to index images for display in “Image Search” results, and HTTP request 2 comes from the regular Googlebot, which indexes web pages for display in typical Google search results.

ngrok blocks HTTP request 2 because it matches the <code>deny</code> rule. However, ngrok forwards HTTP request 1 to the upstream service even though it matches the <code>deny</code> rule: it also matches the <code>allow</code> rule, which overrides the <code>deny</code> rule when the two conflict.
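The precedence described above can be modeled in a few lines of Python. This is a simplified sketch of the behavior described in this post, not ngrok’s implementation. It assumes that a rule matches when <code>re.search</code> finds the pattern anywhere in the UA string, that a missing header is treated as an empty string, and that a UA matching no rule is forwarded; combining multiple <code>user-agent</code> headers is not modeled. The function name <code>ua_filter_allows</code> is hypothetical.

```python
import re
from typing import Iterable, Optional


def ua_filter_allows(user_agent: Optional[str], allow: Iterable[str], deny: Iterable[str]) -> bool:
    """Return True to forward the request, False to reject it with 403 Forbidden.

    Simplified model (an assumption, not ngrok's code): an allow match overrides
    any deny match; otherwise a deny match blocks; matching no rule forwards.
    """
    ua = user_agent or ""  # assumption: a missing or empty header becomes ""
    if any(re.search(pattern, ua) for pattern in allow):
        return True
    return not any(re.search(pattern, ua) for pattern in deny)


# Rules matching the agent configuration example in this post.
ALLOW = [r"(?i)googlebot-image"]
DENY = [r"(?i)google", r"[Bb]ing"]
```

With these rules, <code>ua_filter_allows("Googlebot-Image/1.0", ALLOW, DENY)</code> returns <code>True</code> (the allow rule wins), the regular Googlebot string returns <code>False</code>, and a request with no User Agent returns <code>True</code> unless you add <code>^$</code> to the deny list.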

Get started with ngrok and the User Agent Filter module today

To get started using the User Agent Filter module to control access to your services, check out these examples:

You can override <code>deny</code> rules by defining a more specific <code>allow</code> rule. For example, you could add the following to your agent configuration to block all Google bots except Googlebot Images, and to block Bing bots as well:

tunnels:
  example:
    proto: http
    addr: 80
    user-agent-filter:
      allow:
        - "(?i)googlebot-image"
      deny:
        - "(?i)google"
        - "[Bb]ing"

You could achieve the same thing by passing command line flags when you start the agent:

ngrok http 80 --ua-allow '(?i)googlebot-image' --ua-deny '(?i)google' --ua-deny '[Bb]ing'

You can also modify the <code>allow</code> and <code>deny</code> lists for any of your endpoints directly in the ngrok dashboard.

The examples above filter out Google bots as an illustration. The module does not block bots or crawlers on its own: it applies only the <code>allow</code> and <code>deny</code> rules you define, so you can write rules for whichever User Agents you want to admit or block.
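To check that your rules behave as expected, you can send requests with different User Agent values to your endpoint and compare the status codes. Here is a minimal sketch using only Python’s standard library; the helper name <code>probe</code> is hypothetical, and you would pass your own endpoint URL. A <code>403</code> suggests the filter rejected the request, although your upstream service could also return one.

```python
import urllib.error
import urllib.request


def probe(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """GET `url` with the given User-Agent header and return the HTTP status code."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses; return its status code.
        return err.code
```

With the example rules in place, you would expect <code>probe(url, "Googlebot-Image/1.0")</code> to return the upstream’s normal status and a regular Googlebot string to return <code>403</code>.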

If you have any questions, don’t hesitate to reach out. Connect with us on Twitter, the ngrok community on Slack, or contact us at support@ngrok.com. And if you want to sign up for an account, you can do so here.

Mandy Hubbard
Mandy Hubbard is a seasoned technologist with a strong QA and developer advocacy background. She is passionate about software quality, CI/CD, good processes, and great documentation. Mandy is currently a Sr. Technical Marketing Engineer at ngrok, where she combines her technical experience and creative skills to help bring new features to customers.