Corin Caliendo

Like all things, regular expressions come with a cost

Discussion created by Corin Caliendo Employee on May 5, 2016
Latest reply on Sep 13, 2017 by Dhilip Venkatesh Uvarajan

Cloudlet rules that include a regular expression (regex) match require more resources during evaluation than other match conditions. Because of this resource cost, review the following general considerations and best practices before using regex.

Note: Currently the Audience Segmentation, Edge Redirector, Forward Rewrite, and Input Validation Cloudlets support regex matches.

 

General Considerations When Using Regular Expressions

Before using the URL Regular Expression match with Cloudlets, consider the following:

  • There is a maximum processing cost per policy. Regular expressions have a very high processing cost, often 100 times that of other match criteria. The number of rules a policy can process depends on the complexity of the regular expressions defined. You can exceed the maximum cost for the policy by using as few as 50 to 100 regular expressions.

  • Use regular expressions and capture groups only if you need to extract a value and use it in either a redirect or a forward path. Both add significant cost.

Note: The regex match for Cloudlets supports a maximum of nine numbered substitutions using capture groups. Also, if you need to use parentheses for grouping, use a non-capturing group, like this: (?:x).

 

  • When using regular expressions, you can reduce the cost by constructing your rule to first match based on path or query string before matching on the regex. The path and query string matches both allow wildcards.

  • Don’t include the incoming protocol in the regex if the redirect path uses the same protocol. In the regex implementation for Cloudlets, the incoming protocol is included by default.

For example, if your regex is ^(https?)://www.test.com/(.*) and \1://www.test1.com/\2 is the redirect, you can use www.test.com/(.*) as the regex and www.test1.com/\1 as the redirect instead.

  • Because processing errors occur at runtime, the only way currently to determine whether your policy will exceed the maximum processing cost is thorough testing. If you hit the maximum during testing, try making your regular expressions more efficient, and follow the best practices listed below.
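
To illustrate the note about capture groups above, here is a minimal sketch using Python's re module. Python's engine is not the Cloudlets regex implementation, so this only shows how numbered groups are consumed, not how Cloudlets evaluates or prices a rule. The hostnames and paths are hypothetical.

```python
import re

# Hypothetical pattern: both parenthesized parts capture,
# so two of the nine numbered substitutions are used.
capturing = re.compile(r"^https?://www\.example\.com/(en|fr)/(.*)$")

# Same match, but the language alternation is a non-capturing group,
# so only one numbered substitution is used.
non_capturing = re.compile(r"^https?://www\.example\.com/(?:en|fr)/(.*)$")

url = "https://www.example.com/en/products/42"

print(capturing.groups)      # 2
print(non_capturing.groups)  # 1

# With the non-capturing version, the value you need is \1.
redirected = non_capturing.sub(r"https://shop.example.com/\1", url)
print(redirected)            # https://shop.example.com/products/42
```

If you only need parentheses for alternation or grouping, the (?:x) form keeps the numbered substitutions free for the values you actually reuse.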


Best Practices When Using Regular Expressions

 

If you need to use a regex match, consider following these best practices:

  • Don’t use a regular expression just to reduce the number of rules. A single regular expression may replace several rules, but it increases the cost significantly. Remember, evaluating one regular expression often costs 100 times more than evaluating the corresponding set of rules rewritten without regular expressions.

  • Review the list of rules for the entire policy version and sort based on the following order of precedence:
    1. Protocol (HTTP/HTTPS)
    2. Hostname
    3. Path
    4. Query String
  • If you have to use regular expressions in a rule, include a combination of hostname, path, and query string matches whenever possible to reduce the cost. For example:

Example: You want to extract the product ID from a query string parameter, and redirect using the ID as a path parameter.
Match Structure:
  1. Query String match: prod_id=*
  2. Regex match: ^https?://host1.example.com/path1(?:.*)[?&]prod_id(?:=([^&]*))?
Substitution Pattern for Redirect: https://host2.example.com/products/\1

Example: You want to capture everything after /path1/* on host1.example.com and re-route to /path2/ on host2.example.com.
Match Structure:
  1. Path match: /path1/*
  2. Regex match: ^https?://host1.example.com/path1/?(.*)?
Substitution Pattern for Redirect: https://host2.example.com/path2/\1
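
The two examples above can be roughly simulated in Python to check what the patterns capture before you test them in a policy. This is only a sketch: the redirect() helper is hypothetical, fnmatchcase stands in for the Cloudlets wildcard match, and Python's re engine stands in for the Cloudlets regex engine, so actual matching and cost behavior may differ. The test URLs are made up; the patterns and substitutions are copied from the examples.

```python
import re
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

def redirect(url, field, wildcard, regex, template):
    """Run a cheap wildcard match first; evaluate the regex only if it passes.

    Hypothetical helper: 'field' is "query" or "path".
    Returns the redirect URL, or None if the rule does not match.
    """
    parts = urlsplit(url)
    value = parts.query if field == "query" else parts.path
    if not fnmatchcase(value, wildcard):
        return None  # regex is never evaluated for this request
    m = re.match(regex, url)
    return m.expand(template) if m else None

# Example 1: query string match, then regex.
by_product = redirect(
    "http://host1.example.com/path1/item?prod_id=123",
    "query", "prod_id=*",
    r"^https?://host1.example.com/path1(?:.*)[?&]prod_id(?:=([^&]*))?",
    r"https://host2.example.com/products/\1",
)
print(by_product)  # https://host2.example.com/products/123

# Example 2: path match, then regex.
by_path = redirect(
    "http://host1.example.com/path1/a/b",
    "path", "/path1/*",
    r"^https?://host1.example.com/path1/?(.*)?",
    r"https://host2.example.com/path2/\1",
)
print(by_path)  # https://host2.example.com/path2/a/b

# A request that fails the cheap match skips the regex entirely.
skipped = redirect(
    "http://host1.example.com/other?x=1",
    "query", "prod_id=*",
    r"^https?://host1.example.com/path1(?:.*)[?&]prod_id(?:=([^&]*))?",
    r"https://host2.example.com/products/\1",
)
print(skipped)  # None
```

The early return is the point of combining matches: requests that fail the path or query string check never reach the expensive regex step.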
