Cybersecurity · SIEM · SOAR · Detection Engineering · SOC

SIEM and SOAR Without Alert Fatigue: A Detection Engineering Approach

Pelican Tech · 5 min read

Almost every SOC we walk into has the same Slack channel: 200 to 600 high-severity alerts a week, an on-call rotation that has stopped reading them carefully, and a SIEM dashboard that the team checks once at the start of each shift. Management asks why detection coverage seems low when the SIEM is firing constantly. The answer is the gap between detection and alerting, and that gap is what detection engineering closes.

This is the operating model we use to take a fatigued SOC to a measured one. It works because it borrows the right ideas from software engineering (versioning, testing, deprecation) and applies them to detection content rather than to the security stack as a whole. The technology pieces are the same as before; the operating discipline is what changes.

What detection engineering actually is

Detection engineering treats every alert rule, correlation, and SOAR automation as a maintained piece of code. Each detection has an owner, a documented adversary behaviour it claims to detect, a test case that demonstrates the detection fires, a measured false-positive rate over rolling time windows, a deprecation policy, and a recorded contribution to incidents.

In a healthy programme this looks operationally identical to a microservice repository: pull requests for new detections, code review by the lead detection engineer, a CI pipeline that runs each detection against a synthetic event corpus, version-controlled deployment, and dashboards that report on detection-level health metrics over time.
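
To make that concrete, here is a minimal sketch of what detection-as-code can look like in the repository: a metadata contract every rule carries, plus a CI-style test that replays a tiny synthetic corpus through the rule's match logic. All names, fields, and the example rule are illustrative assumptions, not any particular SIEM's schema.

```python
# Minimal sketch of "detection as code": every rule ships with metadata,
# match logic, and a test corpus that CI replays on every change.
# All names and fields are illustrative, not any specific SIEM's schema.
from dataclasses import dataclass
from typing import Callable

Event = dict  # a normalised log event, however your pipeline parses it


@dataclass
class DetectionRule:
    rule_id: str
    owner: str                        # engineer accountable for this rule
    technique: str                    # documented adversary behaviour it claims to detect
    severity: str
    runbook_url: str
    matches: Callable[[Event], bool]  # the actual detection logic
    max_fp_rate: float = 0.05         # fidelity threshold before it may page


def suspicious_logon(event: Event) -> bool:
    """Example rule logic (illustrative): a burst of failed authentications."""
    return (
        event.get("event_type") == "authentication"
        and event.get("outcome") == "failure"
        and event.get("failed_count_10m", 0) >= 20
    )


RULE = DetectionRule(
    rule_id="auth-bruteforce-001",
    owner="detections@example.com",
    technique="T1110 Brute Force",
    severity="high",
    runbook_url="https://wiki.example.com/runbooks/auth-bruteforce-001",
    matches=suspicious_logon,
)


def test_rule_fires_on_corpus(rule: DetectionRule = RULE) -> None:
    """CI check: fire on the true-positive corpus, stay quiet on benign events."""
    true_positives = [
        {"event_type": "authentication", "outcome": "failure", "failed_count_10m": 35},
    ]
    benign = [
        {"event_type": "authentication", "outcome": "failure", "failed_count_10m": 2},
        {"event_type": "process_start", "outcome": "success"},
    ]
    assert all(rule.matches(e) for e in true_positives)
    assert not any(rule.matches(e) for e in benign)
```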

Most SOCs do not work this way. They work the way SIEM marketing implied they should: rules are added by analysts in response to incidents or vendor recommendations, never deprecated, never measured for fidelity, and never tested. The result, after 18 to 36 months of operation, is the alert volume problem.

Why high alert volume is rarely a tooling problem

The reflex when alert volume is high is to look for a tool that summarises better. UEBA, ML-driven SIEM features, AI-assisted triage: all of these promise to reduce noise. They sometimes do, on the margin. But the structural cause of alert fatigue is almost never that the SIEM is bad at maths. It is that the detection content is unmaintained.

A typical SOC we audit has between 200 and 1,500 detection rules in production. When we sample them, we usually find:

  • 30–45% of rules have produced no alerts in the last 90 days
  • 15–30% of rules have produced more than 100 alerts in the last 30 days, of which fewer than 1% were investigated
  • 10–25% of rules are duplicates with slightly different filter logic, all firing on the same source events
  • 5–15% of rules detect adversary techniques that the environment is not actually exposed to (e.g., on-prem AD attack patterns in a 100% cloud-native shop)

Buying a tool to summarise this corpus produces summarised noise. Cleaning the corpus produces fewer and better detections. The work is unglamorous but it is the work that moves the metric.
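
The first pass at cleaning does not need new tooling. Most SIEMs can export per-rule alert counts, and a short script over that export surfaces the silent, noisy, and duplicate rules. A rough sketch, assuming a hypothetical CSV export with columns like alerts_90d and investigated_30d:

```python
# Rough rule-pile audit over a per-rule export from the SIEM. Column names
# (alerts_90d, alerts_30d, investigated_30d, query_text) are assumptions;
# adapt to whatever your SIEM actually exports.
import csv
from collections import defaultdict


def audit(path: str) -> None:
    with open(path, newline="") as f:
        rules = list(csv.DictReader(f))
    if not rules:
        return

    silent = [r for r in rules if int(r["alerts_90d"]) == 0]
    noisy = [
        r for r in rules
        if int(r["alerts_30d"]) > 100
        and int(r["investigated_30d"]) / int(r["alerts_30d"]) < 0.01
    ]

    # Crude duplicate check: rules whose whitespace-normalised query text collides.
    by_query = defaultdict(list)
    for r in rules:
        by_query["".join(r["query_text"].lower().split())].append(r["rule_id"])
    duplicates = [ids for ids in by_query.values() if len(ids) > 1]

    total = len(rules)
    print(f"{total} rules in production")
    print(f"silent (no alerts in 90 days):  {len(silent)} ({len(silent) / total:.0%})")
    print(f"noisy and uninvestigated (30d): {len(noisy)} ({len(noisy) / total:.0%})")
    print(f"duplicate query groups:         {len(duplicates)}")


if __name__ == "__main__":
    audit("rule_alert_counts.csv")  # hypothetical export path
```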

The detection lifecycle

A maintained detection moves through five named states. Knowing which state each rule is in is the precondition for managing the set.

Proposed: Adversary technique exists in the threat model, no rule exists yet. The detection engineer writes a rule and a test corpus.

Tuning: Rule is in production but emitting to a non-paging stream. Engineer measures FP rate against real traffic for two to four weeks. Tunes filter logic. Adds context fields the analyst will need.

Active: Rule meets its fidelity threshold (typically under a 5% FP rate; a looser threshold is sometimes tolerated for high-severity rules), pages on-call, has a runbook, has an owner. Counted in coverage metrics.

At-risk: Rule has degraded, either via rising FP rate, declining fire rate, or environmental change. Triggers a review.

Deprecated: Rule removed from production with a recorded reason. In a healthy programme, deprecations run at roughly 10–30% of the rule base per year.
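
The lifecycle is simple enough to encode directly, and encoding it is useful: a rule that cannot be assigned a state is, by definition, part of the pile. A minimal sketch, with the thresholds above as illustrative defaults rather than universal constants:

```python
# Minimal sketch of the five-state lifecycle and the checks that move a rule
# between states. Thresholds mirror the text and are starting points only.
from enum import Enum


class State(Enum):
    PROPOSED = "proposed"
    TUNING = "tuning"
    ACTIVE = "active"
    AT_RISK = "at-risk"
    DEPRECATED = "deprecated"


ALLOWED_TRANSITIONS = {
    State.PROPOSED: {State.TUNING},
    State.TUNING: {State.ACTIVE, State.DEPRECATED},
    State.ACTIVE: {State.AT_RISK, State.DEPRECATED},
    State.AT_RISK: {State.TUNING, State.ACTIVE, State.DEPRECATED},
    State.DEPRECATED: set(),
}


def suggested_state(current: State, fp_rate: float, fires_90d: int,
                    fp_threshold: float = 0.05) -> State:
    """Suggest the next review outcome for a rule from its measured health."""
    if current is State.TUNING and fp_rate <= fp_threshold:
        return State.ACTIVE            # promote: fidelity threshold met
    if current is State.ACTIVE and (fp_rate > fp_threshold or fires_90d == 0):
        return State.AT_RISK           # degrade: noisy or silent, needs review
    return current


def transition(current: State, target: State, reason: str) -> State:
    """Every move between states is explicit and carries a recorded reason."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    print(f"{current.value} -> {target.value}: {reason}")
    return target
```

The value is not in the code itself; it is that every rule has exactly one state and every transition has a recorded reason.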

A SOC that cannot tell you what stage each rule is in does not have a detection programme; it has a rule pile. The rule pile produces alert fatigue regardless of how good the underlying SIEM is.

SOAR is not what most teams think it is

The SOAR market originally promised "automate triage." In practice, that promise translated into playbooks that enrich alerts with reputation data, query a few sources, and present the analyst with a slightly fancier alert. This is not nothing, but it is also not the leverage SOAR can provide.

Where SOAR earns its keep is in two narrow patterns:

1. Containment automation for high-confidence detections. When a detection has a measured false-positive rate under 2% over six months, you can move beyond enrichment into action: automatically isolating an endpoint that fires a credential-theft detection, automatically suspending an identity that performs a known-bad sequence of cloud actions, automatically rotating a credential found exposed. The economics of this require trust in the underlying detection, which means the detection-lifecycle discipline above must be in place first (both patterns are sketched after this list).

2. Investigation playbooks that codify analyst expertise. Instead of automating decisions, automate the gathering. On a phishing alert, a SOAR playbook can pull the original email, the user's authentication history, the device posture, and any email-cluster matches, and present all of it as a single artefact. The analyst still decides; the analyst just saves 20 minutes of pivoting per alert. Across a thousand alerts a month, that is real capacity.
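
Both patterns fit in a few lines of orchestration logic. The sketch below is illustrative only: the enrichment lookups and the isolate_endpoint call are stand-ins for whatever your SOAR, EDR, and identity provider actually expose, and the 2% ceiling is the measured-fidelity gate described above.

```python
# Sketch of the two SOAR patterns that pay off: gather context on every alert,
# act automatically only when the detection has earned trust. Every
# integration call below is a stand-in, not a real product API.
from dataclasses import dataclass


@dataclass
class Alert:
    rule_id: str
    user: str
    host: str
    measured_fp_rate_6m: float  # rolling fidelity from the detection programme


AUTO_CONTAIN_FP_CEILING = 0.02  # only act automatically below this measured FP rate


def fetch_auth_history(user: str) -> list:
    """Stand-in for a query against the identity provider's sign-in logs."""
    return []


def fetch_device_posture(host: str) -> dict:
    """Stand-in for an EDR/MDM posture lookup."""
    return {}


def fetch_related_alerts(rule_id: str) -> list:
    """Stand-in for a SIEM query over recent alerts from the same rule."""
    return []


def isolate_endpoint(host: str) -> None:
    """Stand-in for the EDR isolation API call."""


def handle_alert(alert: Alert) -> dict:
    # Pattern 2: codify the gathering, not the decision.
    context = {
        "auth_history": fetch_auth_history(alert.user),
        "device_posture": fetch_device_posture(alert.host),
        "related_alerts": fetch_related_alerts(alert.rule_id),
    }

    # Pattern 1: containment only for detections with a proven false-positive rate.
    if alert.measured_fp_rate_6m < AUTO_CONTAIN_FP_CEILING:
        isolate_endpoint(alert.host)
        context["containment"] = "endpoint isolated automatically"
    else:
        context["containment"] = "none; analyst decision required"

    return context  # presented to the analyst as a single artefact
```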

SOAR is most often deployed as the first thing teams buy after a SIEM, before either is producing reliable signal. This is the wrong sequence. Automate after the detection content is healthy, not before.

What a measured SOC looks like

In our experience, a SOC running detection engineering well shows specific quantitative signatures:

  • Active detection count is stable or slowly growing, with regular deprecation
  • Mean false-positive rate across active detections is under 10%
  • Median time-to-triage is under 20 minutes for paging detections
  • 70%+ of alerts that page have a runbook the analyst can follow without escalation
  • Coverage metrics are tied to a threat model, not to a vendor's MITRE heatmap
  • Each major incident over the last two years has produced at least one new detection

If a SOC cannot produce these numbers, the answer to "why is detection coverage low?" is mechanical. The work is in the discipline of detection content, not in another tool purchase.
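
None of these numbers require another product; they fall out of alert and case data the SOC already has. A sketch of the kind of monthly health report we mean, over hypothetical case-management exports:

```python
# Sketch of a monthly detection-health report built from data the SOC already
# has: per-detection alert and false-positive counts, plus paging/triage
# timestamps from the case-management system. The input shapes are hypothetical.
from statistics import median


def health_report(detections: list[dict], paged_alerts: list[dict]) -> dict:
    """detections: [{"rule_id", "state", "alerts", "false_positives", "has_runbook"}]
    paged_alerts: [{"rule_id", "paged_at_min", "triaged_at_min"}] (times in minutes)."""
    active = [d for d in detections if d["state"] == "active"]
    fp_rates = [d["false_positives"] / d["alerts"] for d in active if d["alerts"]]
    triage_times = [a["triaged_at_min"] - a["paged_at_min"] for a in paged_alerts]
    with_runbook = sum(1 for d in active if d["has_runbook"])

    return {
        "active_detections": len(active),
        "mean_fp_rate": sum(fp_rates) / len(fp_rates) if fp_rates else None,
        "median_time_to_triage_min": median(triage_times) if triage_times else None,
        "runbook_coverage": with_runbook / len(active) if active else None,
    }
```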

Where this connects to our practice

Pelican Tech's SIEM & SOAR practice builds the detection-engineering operating model above on whatever SIEM the client already runs (Sentinel, Splunk, Elastic, Chronicle). We start with a detection inventory, classify the rule pile against the five-stage lifecycle, deprecate aggressively, and rebuild the active set with proper test corpora and fidelity tracking. We then layer SOAR for the two patterns where it actually pays. We work alongside our cloud security team when cloud control-plane logs are the dominant source, and with our risk management practice to align coverage with the threat model rather than with a vendor's framework heatmap.

If your SOC is running 200+ alerts a week and analysts are visibly fatigued, that is the conversation to have with us before another tool purchase.