Security Correlation Then and Now: A Sad Truth About SIEM
(By Anton Chuvakin — Originally posted at Anton on Security)
We all know David Bianco Pyramid of Pain, a classic from 2013. The focus of this famous visual is on indicators that you “latch onto” in your detection activities. This post will reveal a related mystery connected to SIEM detection evolution and its current state. So, yeah, this is another way of saying that a very small number of people are perhaps very passionate about it …
But who am I kidding? I plan to present a dangerously long rant about the state of detection content today. So, yes, of course there will be jokes, but ultimately that is a serious thing that had been profoundly bothering me lately.
First, let’s travel to 1999 for a brief minute. Host IDS is very much a thing (but the phrase “something is a thing” has not yet been born), the term “SIEM” is barely a twinkle in a Gartner analyst eye. However, some vendors are starting to develop and sell “SIM” and “SEM” appliances (It is 1999! Appliances are HOT!).
Some of the first soon-to-be-called-SIEM tools have very basic “correlation” rules (really, just aggregation and counting of a single attribute like username or source IP) and have rules like “many connections to the same port across many destinations”, “Cisco PIX log message containing SYNflood, repeated 50 times” and “SSH login failure.” Most of these rules are very fragile i.e. a tiny deviation in attacker activities will cause it to not trigger. They are also very device dependent (i.e. you need to write such rules for every firewall device, for example). So the SIM / SEM vendor had to load up many hundreds of these rules. And customers had to suffer through enabling/disabling and tuning them. Yuck!
While we are still in 1999, a host IDS like say Dragon Squire, a true wonder of 1990s security technology, scoured logs for things like “FTP:NESSUS-PROBE” and “FTP:USER-NULL-REFUSED.” For this post, I reached deep into my log archives and actually reviewed some ancient (2002) Dragon HIDS logs to refresh my memory, and got into the vibe of that period (no, I didn’t do it on a Blackberry or using Crystal Reports — I am not that dedicated).
Now fast forward to about 2003–2004 — and the revolution happened! SIEM products unleashed normalized events and event taxonomies. I spent some of that time categorizing device event IDs (where does Windows Event ID 1102 go?) into SIEM taxonomy event types, and then writing detection rules on them. SIEM detection content writing became substantially more fun!
This huge advance in SIEM gave us the famous correlation rules like “Several Events of The Exploit Category Followed By an Event of Remote Access Category to Same Destination” that delivered generic detection logic across devices. Life was becoming great! These rules were supposed to be a lot more resilient (such as “any Exploit” and “any Remote Access” vs a specific attack and, say, VNC access). They also worked across devices — write it once, was the promise, and then even if you change the type of the firewall you use, your correlation still detects badness.
Wow, magic! Now you can live (presumably) with dozens of good rules without digging deep into regexes and substrings and device event IDs across 70 system and OS version types deployed. This was (then) perceived as essential progress of security products, like perhaps a horse-and-buggy to a car evolution.
But you won’t believe what happened next!
Now, let’s fast forward to today — 2020 is almost here. Most of the detection content I see today is in fact written in the 1990s style of exact and narrow matching to raw logs. Look at all the sexy Sigma content, will you? A fellow Network Intelligence enVision SIM user from 1998 will recognized many of the detections! Sure, we have ATT&CK today, but it is about solving a different problem.
An extra bizarre angle here is that as machine learning and analytics rise, the need for clean, structured data rises if we were to crack more security use cases, not just detection. Instead, we just get more data overall, but less data that you can feed your pet ML unicorn with. We need more clean, enriched data, not merely more data!
To me, this feels like the evolution got us from a horse and buggy to a car, then a better car, then a modern car — and then again a horse and buggy …
So, my question is WHY? What happened?
I’ve been polling a lot of my industry peers about it, ranging from old ArcSight hands that did correlation magic 15 years ago (and who can take a good joke about kurtosis) and people who run detection teams today on modern tools [I am happy to provide shout-outs, please ping me if I missed somebody, because I very likely did due to some of you saying that you want to NOT be mentioned]
But first, before we get to the answer I finally arrived at, after much agonizing, let’s review some of the things I’ve heard during my data gathering efforts:
- Products that either lack event normalization or do it poorly (or lazily rely on clients to do this work) won the market battle for unrelated reasons (such as overall volume of data collected), and a new generation of SOC analysts have never seen anything else. So they get by with what they have. Let’s call this line of reasoning “the raw search won.”
- Threat hunters beat up the traditional detection guys because “hunting is cool” and chucked them out of the window. Now, they try to detect the same way they hunt — by searching for random bits of knowledge of the attack they’ve heard of. Let’s call this line of thinking “the hunters won.”
- Another thought was that tolerance for “false positives” (FP) has decreased (due to growing talent shortages) and so writing more narrow detections with lower FP rates became more popular (‘“false negatives” be damned — we can just write more rules to cover them’). These narrow rules are also easier to test. Let’s calls this “false positives won.”
- Another hypothesis was related to the greater diversity of modern threats and also a greater variety of data being collected. This supposedly left the normalized and taxonomized events behind since we needed to detect more things of more types. Let’s call this one “the data/threat diversity won.”
So, what do you think? Are you seeing the same in your detection work?
Now, to me all the above explanations left something to be desired — so I kept digging and agonizing. Frankly, they sort of make some sense, but my experience and intuition suggested that the magic was still missing…
What do I think really happened? I did arrive at a very sad thought, the one I was definitely in denial about, but the one that ultimately “clicked” and many puzzle pieces slid into place!
The normalized and taxonomized approach in SIEM never actually worked! It didn’t work back in 2003 when it was invented, and it didn’t work in any year since then. And it still does not work now. It probably cannot work in today’s world unless some things change in a big way.
When I realized this, I cried a bit. Given how much I invested in building, evangelizing and improving it, then actually trying to globally standardize it (via CEE), it feels kinda sad…
Now, is this really true? Sadly, I think so! SIEM event taxonomization is …
- always behind the times and more behind now than ever
- inconsistent across events and log sources — for every vendor today
- remains to be seriously different between vendors — and hence cannot be learned once
- contains an ever-increasing number of errors and omissions that accumulate over time
- is impossible to test effectively vs real threats people face today.
So, I cannot even say “SIEM event taxonomy is dead”, because it seems like it was never really alive. For example, “Authentication Failure” event category from a SIEM vendor may miss events from a new version of software (such as a new event type introduced in a Windows update), miss events from an uncommon log source (SAP login failed), or miss events erroneously mapped to something else (say to “Other Authentication” category).
In essence, people write stupid string-matching and regex-based content because they trust it. They do not — en masse — trust the event taxonomies if their lives and breach detections depend on it. And they do.
What can we do? Well, I am organizing my thinking about it, so wait for another post, will you?