Security Content Generating Too Many False Positives? Make the Machine Tune Your Rules!

How many logs does a typical SOC index daily? Honestly, too many. Endpoint, network, cloud, application: each of these sources is extremely chatty. The cybersecurity community deals with this firehose by writing content rules that filter logs into a smaller number of (hopefully) security-relevant events. Still, even a fraction of a lot is too many for triage or correlation. There are also rules that will always be noisy, such as those covering encoded PowerShell, email, and registry queries: really anything that is widely used by both legitimate applications and threat actors (check out our “Detecting at the Apex” white paper, where we explain how to better deal with those).

Right now, many content engineers in the SOC have to tune rules by hand using allowlists (i.e., exceptions for rules). This manual effort takes a lot of time, because each search must be fine-tuned so that security-relevant logs are not accidentally ignored. We have seen content engineers open dozens of tabs, misplace searches, and revise queries multiple times to create these exceptions. Anvilogic has developed a solution that automatically generates entries for your allowlists to further reduce the number of events your rules surface. Our auto allowlisting uses machine learning and intuition (from years of detection rule engineering) to do much of the grunt work of identifying strings and creating regexes that safely ignore non-security-relevant logs. Turn once-noisy rules into high-fidelity alerts or useful events for correlation. Ultimately, our goal is to empower your SOC to create new content, which our platform will help tune automatically.

Leverage Machine Learning for Process Strings of Endpoint Logs

Our initial recommendations for allowlisting focus on the command line of endpoint logs (e.g., “cmd.exe /e something interesting”). Many machine learning techniques could be applied to this task, including unsupervised, semi-supervised, and reinforcement learning. For us, however, a learned model that is somewhat of a black box risks inadvertently ignoring security-relevant logs (e.g., an attacker using living-off-the-land (LOTL) techniques). Instead, we chose classic machine learning techniques that find commonly occurring patterns in the logs, from which we can easily build regexes. Since many SOC analysts are comfortable using regexes to hunt through logs, they can easily verify the correctness of any recommendation we provide.
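As a rough illustration of what "finding commonly occurring patterns" can mean for command lines, the sketch below counts how many command lines each token appears in. This is not Anvilogic's implementation: the function name token_document_frequency and the sample command lines are invented for this example, and a real pipeline would work on far more data.

```python
import re
from collections import Counter

def token_document_frequency(command_lines):
    """Return, for each token, the number of command lines that contain it."""
    counts = Counter()
    for line in command_lines:
        # Tokens are runs of 4+ letters, digits, '_', '.', or '-'.
        # Wrapping in set() counts a token once per line, even if it repeats.
        counts.update(set(re.findall(r"[A-Za-z0-9_.-]{4,}", line.lower())))
    return counts

# Hypothetical benign command lines, invented for illustration only.
sample_logs = [
    r"cmd.exe /c C:\Tools\inventory-agent\collect.bat --site east",
    r"cmd.exe /c C:\Tools\inventory-agent\collect.bat --site west",
    r"cmd.exe /c C:\Tools\inventory-agent\upload.bat --site east",
]

frequencies = token_document_frequency(sample_logs)

# Tokens present in every sample are candidates for an analyst to review.
recurring = sorted(t for t, n in frequencies.items() if n == len(sample_logs))
print(recurring)  # ['--site', 'cmd.exe', 'inventory-agent', 'tools']
```

Note that cmd.exe recurs in every line but is also something an attacker would type, so frequency alone is not enough to pick a safe allowlist entry. An analyst can still read the output at a glance, which is the appeal of simple, transparent techniques over a black-box model.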

The Move Towards Automating Allowlisting: Simplifying RegEx 

Ultimately, our regex-based tuning recommendations drive down the number of entries our customers need to maintain for each alert. Fewer entries let our customers verify the correctness of these exceptions more quickly. But what do these regexes typically look like?

Let’s take an example inspired by multiple customers. Adversaries may deploy a container in order to avoid detection. Monitoring Kubernetes is no small feat considering how often stacks are modified, particularly to scale across many machines (thanks, CI/CD!). There’s a lot of variability in container naming, making it difficult to allowlist on one specific, full-length command-line string. However, these deployments give regular expressions the opportunity to key in on keywords that are common across these commands, since humans still write these stacks and need to know what is running.

For example:

  • kubectl exec -i application-worker-z987s -n fancynetwork -- bash -c your-command

From this command, the algorithm can pull out a regex like the following (here * is shorthand for any run of characters):

  • *application-worker*fancynetwork*your-command
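To make the pattern above concrete, here is a minimal sketch that treats each * as "any run of characters" and checks a few command lines against it with Python's standard re module. The helper wildcard_to_regex is hypothetical, and the second and third command lines are invented to show one unseen benign variant and one unrelated command.

```python
import re

def wildcard_to_regex(pattern):
    """Compile a '*'-wildcard pattern into a regex; all other characters are literal."""
    # re.escape keeps characters such as '-' and '.' from acting as regex syntax.
    parts = [re.escape(piece) for piece in pattern.split("*")]
    return re.compile(".*".join(parts))

allow_pattern = wildcard_to_regex("*application-worker*fancynetwork*your-command")

seen = "kubectl exec -i application-worker-z987s -n fancynetwork -- bash -c your-command"
# Hypothetical commands: a new pod suffix, and an unrelated exec into another namespace.
unseen = "kubectl exec -i application-worker-k21bd -n fancynetwork -- bash -c your-command"
other = "kubectl exec -i debug-shell-1 -n kube-system -- bash -c whoami"

for command in (seen, unseen, other):
    # search() matches anywhere in the string, so the keywords only need to
    # appear in order; the random pod suffix is absorbed by '.*'.
    verdict = "allowlisted" if allow_pattern.search(command) else "kept for triage"
    print(f"{verdict}: {command}")
```

The first two commands are allowlisted even though their pod suffixes differ, while the third is kept for triage because none of the chosen keywords appear in it.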

This regex will be laser-focused on the benign activity in your network and able to generalize to unseen examples. Specifically, we want a regex to match only substrings that are clearly related to a specific task and will not be present in attack commands. In the above example, the algorithm will choose substrings like application-worker and fancynetwork that are extremely unlikely to be present in a command issued by an attacker. The algorithm takes pains to avoid common substrings like “Windows,” “Files,” and “Temp” that occur very often, including in attacks. In addition, the algorithm tries to select substrings that result in the fewest allowlist entries. For example, application-worker is a better substring than application-worker-z9 since it will match many more logs while still matching very specific words in the command.
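The selection criteria described above can be sketched as a greedy cover: skip substrings that are too common, then repeatedly pick the substring that matches the most not-yet-covered commands, preferring the longer one on ties. This is a simplified stand-in under stated assumptions, not the actual Anvilogic algorithm: it picks one substring per entry rather than a multi-keyword regex, and the TOO_COMMON list, the function names, and the billing-cron commands are all invented for illustration.

```python
# Words assumed too common (in benign and attack commands alike) to anchor an entry.
# "windows", "files", and "temp" come from the text; the rest are illustrative.
TOO_COMMON = {"windows", "files", "temp", "kubectl", "exec", "bash"}

def candidate_substrings(command):
    """Return each token plus its prefixes cut at hyphen boundaries."""
    found = set()
    for token in command.lower().split():
        pieces = token.split("-")
        for end in range(1, len(pieces) + 1):
            candidate = "-".join(pieces[:end])
            # Drop short fragments (flags such as "-i") and overly common words.
            if len(candidate) >= 6 and candidate not in TOO_COMMON:
                found.add(candidate)
    return found

def pick_entries(commands):
    """Greedy cover: repeatedly take the substring matching the most uncovered commands."""
    uncovered = set(range(len(commands)))
    entries = []
    while uncovered:
        coverage = {}
        for index in uncovered:
            for candidate in candidate_substrings(commands[index]):
                coverage.setdefault(candidate, set()).add(index)
        if not coverage:
            break  # the remaining commands offer no usable substring
        # Prefer wider coverage; break ties with the longer, more specific substring.
        best = max(coverage, key=lambda c: (len(coverage[c]), len(c), c))
        entries.append(best)
        uncovered -= coverage[best]
    return entries

# Hypothetical benign commands: three pods of one deployment, two of another.
benign = [
    "kubectl exec -i application-worker-z987s -n fancynetwork -- bash -c your-command",
    "kubectl exec -i application-worker-k21bd -n fancynetwork -- bash -c your-command",
    "kubectl exec -i application-worker-q55xt -n fancynetwork -- bash -c your-command",
    "kubectl exec -i billing-cron-ab12c -n payments -- bash -c rotate-keys",
    "kubectl exec -i billing-cron-zz90q -n payments -- bash -c rotate-keys",
]

print(pick_entries(benign))  # ['application-worker', 'billing-cron']
```

Five command lines collapse into two entries. The full pod name application-worker-z987s is a candidate too, but it covers only one command, so the shorter prefix that covers all three wins.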

Spend Less Time Managing Rules and More Time Creating

Content engineering for the SOC is hard. Most rules are not one and done: after deployment, they need to be tuned periodically to reduce volume for analysts. Anvilogic can generate recommendations for allowlist entries on process fields (with more to come), so you can spend less time managing rules and more time creating content that addresses the ever-expanding attack vectors that adversaries and criminals employ.

We’ve seen a great deal of positive customer sentiment about our allowlist functionality. Take St. George University as an example: “Allowlisting, version control, and easy rollout of detections made Anvilogic stick out. These are features that our SIEM was severely lacking.”

To learn more about how you can leverage allowlisting to save your team a lot of manual effort, check out this video from our Director of Customer Success, Michael Monte.

Good hunting,

Mike Hart
