Schrödinger's Alert: Finding Risks vs. Finding Threats
Schrödinger's Alert (noun): an alert that is neither a true positive nor a false positive until organizational context is added.
You're a security analyst. You log on to see what alerts have fired recently. You pick one at random: AWS Successful Console Login Without MFA. According to the alert details, "This alert can be used to identify potential unauthorized access attempts, as logging in without MFA can indicate compromised credentials or misconfigured security settings." That sounds pretty serious! But... has anything bad actually happened here?
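To make the ambiguity concrete, here's a minimal sketch of the logic behind a detection like this, written against CloudTrail's ConsoleLogin event schema (eventName, responseElements, and additionalEventData.MFAUsed are real CloudTrail fields; the harness around them is my illustration, not any vendor's actual rule):

```python
# Minimal sketch of the detection logic over CloudTrail records.
# The field names follow CloudTrail's ConsoleLogin event schema;
# the matching function itself is illustrative.

def is_successful_login_without_mfa(event: dict) -> bool:
    """True for a successful AWS console login that did not use MFA."""
    response = event.get("responseElements") or {}
    extra = event.get("additionalEventData") or {}
    return (
        event.get("eventName") == "ConsoleLogin"
        and response.get("ConsoleLogin") == "Success"
        and extra.get("MFAUsed") != "Yes"
    )

# Abbreviated example record that would fire this alert:
sample = {
    "eventName": "ConsoleLogin",
    "responseElements": {"ConsoleLogin": "Success"},
    "additionalEventData": {"MFAUsed": "No"},
    "userIdentity": {"type": "IAMUser", "userName": "jdoe"},
}
assert is_successful_login_without_mfa(sample)
```

Notice what the rule can't tell you: whether "jdoe" is a compromised credential or just an engineer who skipped MFA. That missing context is the whole problem.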
The Typical Investigation Flow
In my experience, ambiguous alerts like these are extraordinarily frustrating to deal with. For the investigator, the process often looks like this:
- Find the human identity that used the AWS identity for the login. If you're lucky, there's some sort of systematic mapping here; otherwise, this can be a real adventure in IdP logs (see the sketch after this list).
- Reach out to the person who logged into the console. Let's face it, most engineers ignore email, so this is probably happening over Slack or Teams.
- Wait for a response from the end user. If you're lucky, they'll reply before you log off for the day. If not, you'll have to add it to your list of things to hand off to the next person, or else follow up the next day.
- Close out the alert as "True Positive - Benign" when the end user confirms that it was their activity.
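For what that identity mapping can look like when you do have one, here's a hedged sketch. It assumes an IAM Identity Center-style setup where the role session name carries the federated user's email, which is a common pattern but by no means universal:

```python
# Hypothetical helper: map a CloudTrail userIdentity back to a human.
# Assumption: an IAM Identity Center-style setup where the role session
# name is the user's email. Your environment may differ.

def human_identity_from_event(event: dict) -> str | None:
    ident = event.get("userIdentity") or {}
    arn = ident.get("arn", "")
    # Assumed-role ARNs look like:
    #   arn:aws:sts::<account>:assumed-role/<role-name>/<session-name>
    if ":assumed-role/" in arn:
        session_name = arn.rsplit("/", 1)[-1]
        if "@" in session_name:  # session name is an email in many SSO setups
            return session_name
    # Plain IAM users need a separate lookup (a tag, an HR directory, etc.);
    # that mapping is entirely environment-specific.
    return None
```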
This is a ton of effort to chase down something that doesn't actually represent malicious behavior. Our detection fired correctly, but what it found was risky behavior, not a threat actor in our environment.
The Tyranny of Binary Classification
I think there are two major contributing factors behind the prevalence of Schrödinger's Alerts in our environments:
- First, vendors with a detection product have to make sure it's suitable across a diverse range of environments. I don't want to pick on any vendor specifically here - you can easily find plenty of ambiguous alerts in the open-source Sigma repository. But ultimately, "risky behavior" tends to be a lowest common denominator that's true everywhere, whereas actual malicious behaviors beyond IOCs tend to be much more environment-specific.
- Second, just because the action wasn't performed by a threat actor doesn't mean we're fine with it happening! Ideally, nobody would ever log into AWS without MFA, and if someone is unwittingly violating company policy here, we'd want to correct that behavior.
Our instinctual desire to fit everything into a True Positive/False Positive framework (one the tools we use reinforce) incentivizes us to identify security risk and mix it in with identifying threats.
(Automated) Context is the Way Forward
Before putting a detection into production, it's important to ask ourselves one simple question: "What is the likelihood that this will be caused by threat actor activity when it fires?"
For most out-of-the-box detections like the one I mentioned at the start of this post, the reality is that the likelihood is a coin flip at best, absent additional information. That's not high enough fidelity to be handled by humans.
Instead, we should be looking at leveraging more automation and AI tools to answer questions like:
- Did the human behind the activity recognize it, and did they confirm (in a timely way) that it was them?
- What is our company policy around this type of action? Can someone get an exception during an incident, or for an approved business use case?
- Does this person in fact have an approved policy exception?
From there, we can make a decision about what happens next. If the end user doesn't recognize the activity, now we can escalate this to humans for incident response. If they forgot to ask for an exception, our workflows can assist them in asking for one.
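Here's an illustrative sketch of that triage flow. Every helper in it (has_policy_exception, ask_user_to_confirm, open_incident, start_exception_request) is a hypothetical stub standing in for whatever SOAR, chat-ops, or GRC tooling you actually run; none of these are real APIs:

```python
# Illustrative triage flow for the no-MFA login alert. All helpers
# below are stubs for your own automation tooling, not real APIs.

def has_policy_exception(user: str, action: str) -> bool:
    """Stub: look up the user in your exception register (GRC tool, etc.)."""
    return False

def ask_user_to_confirm(user: str, summary: str, timeout_minutes: int) -> bool | None:
    """Stub: e.g. a Slack message with Yes/No buttons. Returns True/False,
    or None if the user never responds within the timeout."""
    return True

def open_incident(alert: dict, reason: str) -> None:
    """Stub: page a human responder, with the enrichment already attached."""
    print(f"INCIDENT: {reason} ({alert['summary']})")

def start_exception_request(user: str, alert: dict) -> None:
    """Stub: kick off the policy-exception workflow for the user."""
    print(f"Exception workflow started for {user}")

def triage_no_mfa_login(alert: dict) -> str:
    user = alert["human_identity"]

    # Does this person already have an approved exception?
    if has_policy_exception(user, action="console-login-without-mfa"):
        return "closed: approved exception on file"

    # Does the human recognize the activity, in a timely way?
    confirmed = ask_user_to_confirm(user, alert["summary"], timeout_minutes=60)
    if not confirmed:
        # Unrecognized (or unanswered) activity: escalate to humans.
        open_incident(alert, reason="unrecognized no-MFA console login")
        return "escalated to incident response"

    # Recognized but out of policy: nudge toward an exception request.
    start_exception_request(user, alert)
    return "closed: benign, exception workflow started"
```

The design choice that matters is the return path: humans only see the alert when the user denies the activity or goes silent, and the policy nudge happens without ever paging the SOC.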
In other words, we can still have security interventions with end users, nudging them towards security risk reduction and creating a fidelity filter for identifying threat actor activity. We just don't need a SOC analyst or security incident responder to do it.
Let our humans focus on actual threat behavior, and let the robots handle the rest.