Introduction:
In the modern Security Operations Center (SOC), data enrichment is the critical process of adding context to raw logs to distinguish between a genuine threat and background noise. However, security teams face a strategic dilemma: should you enrich data at ingestion in your SIEM, or wait until an alert fires in your SOAR? A recent discussion among security architects highlights that doing it exclusively in one layer leads to cost inefficiencies or alert fatigue. The consensus is a hybrid model, but executing it requires a deep understanding of your data pipeline and tooling.
Learning Objectives:
- Understand the strategic differences between SIEM-level (detection) and SOAR-level (response) enrichment.
- Learn how to configure lookup tables and API calls for real-time context.
- Identify the operational risks of enriching with unreliable data sources like a poorly maintained CMDB.
You Should Know:
1. SIEM-Level Enrichment: Baking Context into the Detection Layer
Enrichment at the SIEM level involves augmenting log data as it is ingested or indexed. This provides immediate context for correlation rules, allowing the SIEM to make intelligent decisions before an alert is ever generated.
Why do it?
If a firewall log shows a connection made by a service account, the SIEM needs to know instantly whether the originating server was decommissioned last week. Without this internal context (CMDB, IAM), your detection logic is flying blind.
How to implement it (Example using Logstash/Elastic):
You can enrich data streams using lookup tables. This example takes an incoming log, checks the source IP against an asset lookup, and adds a `cmdb.status` field.
# Logstash Configuration Snippet (Linux)
filter {
  # Example: enrich with CMDB data.
  # The CMDB export (/etc/logstash/cmdb_assets.csv with columns
  # ip_address, server_status, owner) cannot be loaded by the csv filter:
  # that filter parses CSV text inside an event field and has no "path" option.
  # Convert the export to a dictionary file and use the translate filter.
  # For multi-column data, you'd use a pipeline to an external DB instead.

  # Static enrichment with the translate filter
  translate {
    # Older plugin versions name these options "field" and "destination".
    source => "[source][ip]"
    target => "[cmdb][status]"
    dictionary_path => "/etc/logstash/ip_status.yml"
    fallback => "unknown"
  }
}
What this does: It adds a new field `cmdb.status` to every log. If the IP is in the dictionary, the field is set to “active” or “decommissioned”; otherwise it falls back to “unknown.” This allows a detection rule to simply check `cmdb.status: decommissioned` to trigger a high-fidelity alert instantly.
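The same lookup-with-fallback logic can be sketched in Python to make the behaviour concrete. This is a minimal illustration, not how Logstash implements it: the `IP_STATUS` table, the nested `source.ip` event shape, and both function names are assumptions made for the example.

```python
from typing import Any

# Hypothetical lookup table; in practice this would be loaded from a CMDB export.
IP_STATUS: dict[str, str] = {
    "10.0.0.5": "active",
    "10.0.0.9": "decommissioned",
}


def enrich_event(event: dict[str, Any], table: dict[str, str]) -> dict[str, Any]:
    """Return a copy of the event with cmdb.status set from the lookup table."""
    enriched = dict(event)
    source_ip = event.get("source", {}).get("ip")
    # Unknown or missing IPs fall back to "unknown", mirroring the fallback option.
    status = table.get(source_ip, "unknown")
    enriched["cmdb"] = {**event.get("cmdb", {}), "status": status}
    return enriched


def is_high_fidelity(event: dict[str, Any]) -> bool:
    """Detection rule: any traffic from a decommissioned asset is suspicious."""
    return event.get("cmdb", {}).get("status") == "decommissioned"


if __name__ == "__main__":
    raw = {"source": {"ip": "10.0.0.9"}, "message": "outbound connection"}
    print(is_high_fidelity(enrich_event(raw, IP_STATUS)))
```

Because the status is attached before the rule runs, the rule itself stays a one-line field comparison.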
2. SOAR/AI SOC Enrichment: Context for the Responder
Enrichment at the SOAR layer happens after a detection rule has fired. Instead of querying external threat intel feeds for every single log (which is expensive and slow), you wait until you have a potential incident.
Why do it?
Threat Intelligence lookups (VirusTotal, AbuseIPDB) are API-call heavy and often rate-limited. Performing these on an alert-by-alert basis, rather than on a log-by-log basis, reduces costs and pipeline latency.
How to implement it (Example using a SOAR Playbook – Python/Pseudo-code):
A SOAR playbook triggers when an alert for “Malicious Outbound Connection” fires. It pulls the destination IP and enriches it.
# Python script for SOAR enrichment (Cross-Platform)
import requests

# Function to enrich an IP with VirusTotal
def enrich_ip_virustotal(ip_address):
    api_key = "YOUR_VT_API_KEY"  # Placeholder; load from a secrets store in practice
    url = f"https://www.virustotal.com/api/v3/ip_addresses/{ip_address}"
    headers = {"x-apikey": api_key}
    try:
        # A timeout keeps a slow API from stalling the playbook
        response = requests.get(url, headers=headers, timeout=10)
        if response.status_code == 200:
            data = response.json()
            malicious_count = data['data']['attributes']['last_analysis_stats']['malicious']
            # Add the result back to the alert
            return f"VT Malicious Detections: {malicious_count}"
        else:
            return f"VT Enrichment Failed (HTTP {response.status_code})"
    except (requests.RequestException, KeyError, ValueError) as e:
        return f"Error: {e}"

# Example usage within a SOAR action
alert_ip = "8.8.8.8"  # Placeholder for IP from alert
enrichment_result = enrich_ip_virustotal(alert_ip)
print(f"Enrichment added to case: {enrichment_result}")
On Windows, you might run this as a scheduled task or within a SOAR engine like Splunk SOAR (Phantom) or Tines.
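Since threat intelligence APIs are rate-limited, a playbook can avoid repeat lookups when several alerts share the same indicator. The sketch below wraps any lookup function in a time-limited cache; the `CachedLookup` class, its one-hour default TTL, and the `fake_lookup` stand-in are all assumptions for illustration, and a real SOAR platform may already provide its own caching.

```python
import time
from typing import Callable


class CachedLookup:
    """Wrap a lookup function with a time-limited cache keyed by indicator."""

    def __init__(self, lookup: Callable[[str], str], ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lookup = lookup
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: dict[str, tuple[float, str]] = {}
        self.api_calls = 0  # counts real lookups, useful for quota tracking

    def get(self, indicator: str) -> str:
        now = self._clock()
        cached = self._cache.get(indicator)
        # Serve from the cache while the entry is younger than the TTL.
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]
        self.api_calls += 1
        result = self._lookup(indicator)
        self._cache[indicator] = (now, result)
        return result


if __name__ == "__main__":
    # Stand-in for a real threat-intel call, so the sketch runs offline.
    def fake_lookup(ip: str) -> str:
        return f"verdict for {ip}"

    cache = CachedLookup(fake_lookup, ttl_seconds=600)
    for ip in ["203.0.113.7", "203.0.113.7", "198.51.100.2"]:
        print(cache.get(ip))
    print("API calls made:", cache.api_calls)
```

The TTL is a trade-off: a longer value saves more quota, but a verdict can change while the cached copy is still being served.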
3. The “Both” Approach: Building the Pipeline
Implementing a “both” strategy requires a clear demarcation of data types. As highlighted by industry experts, static internal data (asset criticality, owner) lives in the SIEM. Dynamic, investigative data (threat intel, geo-location) lives in the SOAR.
Linux Command Line: Testing Enrichment Sources
Before automating, validate your data sources manually.
# Test if a CMDB is reachable and returning data (Linux)
curl -X GET "http://cmdb.internal/api/assets?ip=192.168.1.10" -H "accept: application/json" | jq .

# Test DNS reverse lookup enrichment (Windows PowerShell)
# A reverse (PTR) lookup returns the hostname in the NameHost property
Resolve-DnsName 192.168.1.10 -ErrorAction SilentlyContinue | Select-Object Name, NameHost
4. The Catch: The Garbage In, Garbage Out (GIGO) Principle
A critical point raised in the discussion is the reliability of the enrichment source. Enriching an alert with data from a CMDB that hasn’t been updated in six months is dangerous—it leads to false negatives (suppressing real threats) or false positives (investigating ghosts).
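One way to guard against stale sources is to carry a "last verified" timestamp with each record and refuse to trust anything older than a threshold. The sketch below assumes a hypothetical record shape with `status` and `last_verified` keys and a 30-day limit; neither comes from a specific CMDB product.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_AGE = timedelta(days=30)  # assumed freshness threshold; tune per source


def trusted_status(record: Optional[dict], now: datetime,
                   max_age: timedelta = MAX_AGE) -> str:
    """Return the CMDB status only if the record was verified recently."""
    if record is None:
        return "unknown"
    age = now - record["last_verified"]
    if age > max_age:
        # Do not suppress or escalate on stale data; flag it instead.
        return "stale"
    return record["status"]


if __name__ == "__main__":
    now = datetime(2024, 6, 1, tzinfo=timezone.utc)
    fresh = {"status": "active", "last_verified": now - timedelta(days=3)}
    old = {"status": "decommissioned", "last_verified": now - timedelta(days=180)}
    print(trusted_status(fresh, now), trusted_status(old, now), trusted_status(None, now))
```

Returning an explicit "stale" value lets detection rules treat outdated context differently from both confirmed and missing context.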
Windows Command: Auditing AD for Stale Objects
If your IAM feeds your SIEM enrichment, ensure the source is clean.
# PowerShell (Windows) - Find computers inactive for 90 days
Search-ADAccount -ComputersOnly -AccountInactive -TimeSpan 90.00:00:00 |
    Where-Object { $_.Enabled -eq $true } |
    Select-Object Name, LastLogonDate |
    Export-Csv -Path "stale_computers.csv" -NoTypeInformation
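What this does: This script audits Active Directory for enabled computer accounts that haven’t logged on in 90 days and exports them to a CSV file. Inactivity alone does not prove a machine was decommissioned, so review the list before updating anything. Running this monthly helps keep your SIEM’s “decommissioned” lookup table accurate.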
5. Data Pipeline Enrichment: The Emerging Middle Ground
Modern architectures are moving enrichment to a dedicated data pipeline (e.g., Logstash, Cribl, Fluentd) sitting between the log source and the SIEM. This allows you to normalize data and add context once, then route it to multiple destinations (SIEM, Data Lake, Analytics) without re-processing.
Configuration (Cribl/Logstash Example):
- Input: Raw Windows Event Logs (Event ID 4624).
- Processing: Parse the username, query an HR database via API (or local lookup) to get the user’s department and manager.
- Output: Send the enriched log to the SIEM and a long-term S3 bucket. The SIEM pays for the storage of the core log, but the department field helps with routing alerts to the correct team.
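The input, processing, and output steps above can be sketched as a small Python function chain. This is an illustration of the "enrich once, route many" idea rather than a Cribl or Logstash configuration; `HR_DIRECTORY`, the `user_department` and `user_manager` field names, and the list-based sinks are hypothetical.

```python
from typing import Any, Callable

# Hypothetical HR directory; a real pipeline would query an API or a local lookup file.
HR_DIRECTORY = {
    "jdoe": {"department": "Finance", "manager": "asmith"},
}


def enrich_logon(event: dict[str, Any],
                 directory: dict[str, dict[str, str]]) -> dict[str, Any]:
    """Add department and manager to a 4624 logon event; leave other events untouched."""
    if event.get("event_id") != 4624:
        return event
    user = str(event.get("username", "")).lower()
    hr = directory.get(user, {})
    return {
        **event,
        "user_department": hr.get("department", "unknown"),
        "user_manager": hr.get("manager", "unknown"),
    }


def route(event: dict[str, Any],
          sinks: list[Callable[[dict[str, Any]], None]]) -> None:
    """Enrich once, then deliver the same enriched event to every destination."""
    enriched = enrich_logon(event, HR_DIRECTORY)
    for sink in sinks:
        sink(enriched)


if __name__ == "__main__":
    siem: list[dict[str, Any]] = []
    archive: list[dict[str, Any]] = []
    route({"event_id": 4624, "username": "JDoe"}, [siem.append, archive.append])
    print(siem[0]["user_department"], len(archive))
```

Each sink receives an identical enriched event, so the SIEM and the archive never disagree about a user's department.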
6. Mitigation: Avoiding Enrichment Sprawl
Enriching in too many places leads to “enrichment sprawl,” where data fields conflict. Standardize your taxonomy.
Example: Using jq (Linux) to compare enriched vs. raw fields
# View only the new fields added by enrichment
jq 'with_entries(select(.key | startswith("enrichment_")))' enriched_log.json
If you see `asset_status` from the pipeline and `cmdb.asset.status` from the SIEM and `asset_data.status` from the SOAR, you have a problem. Consolidate to a single source of truth per data type.
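A taxonomy can be enforced in code by mapping every known variant of a field to one canonical name and flagging disagreements. The sketch below assumes events with flat, dotted keys and picks `cmdb.asset.status` as the canonical name purely for illustration; the `ALIASES` map and `normalize` function are hypothetical.

```python
from typing import Any

# Hypothetical alias map: every known variant points at one canonical name.
ALIASES = {
    "asset_status": "cmdb.asset.status",
    "asset_data.status": "cmdb.asset.status",
    "cmdb.asset.status": "cmdb.asset.status",
}


def normalize(event: dict[str, Any]) -> tuple[dict[str, Any], list[str]]:
    """Rename aliased fields to canonical names and report conflicting values."""
    normalized: dict[str, Any] = {}
    conflicts: list[str] = []
    for key, value in event.items():
        canonical = ALIASES.get(key, key)
        if canonical in normalized and normalized[canonical] != value:
            conflicts.append(canonical)
            continue  # keep the first value seen; a real pipeline needs a precedence rule
        normalized[canonical] = value
    return normalized, conflicts


if __name__ == "__main__":
    event = {"asset_status": "active", "asset_data.status": "decommissioned", "host": "web01"}
    clean, conflicts = normalize(event)
    print(clean)
    print("conflicting fields:", conflicts)
```

A non-empty conflict list signals that two layers enriched the same attribute from sources that disagree, which is the case worth investigating.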
What Undercode Say:
- Context is King, but Data Hygiene is God: Enrichment is only as good as the source data. Prioritize cleaning your CMDB and IAM systems before architecting complex enrichment pipelines.
- Don’t Pay for Investigation Data: Threat intelligence is for investigation, not detection at scale. Keep it out of your expensive SIEM ingest if possible; query it when you have a suspect.
Analysis:
The debate over where to enrich reveals a maturity curve in SOC architecture. Teams that dump everything into the SIEM are paying for storage they don’t need, while teams that rely solely on SOAR are overworking their analysts with low-fidelity alerts. The strategic takeaway is to treat enrichment as a “need-to-have” versus “nice-to-have” split. Foundational context (asset criticality, network ownership) must be available in milliseconds at the detection layer. Investigative context (who owns this IP, is this file malicious) can take seconds and belongs in the response layer. The rise of AI SOC layers will further blur this line, as AI agents will need immediate access to both types of context to make autonomous decisions.
Prediction:
The next evolution will be “GenAI-Enriched Pipelines,” where large language models analyze incoming logs and enrichment data simultaneously to generate natural language summaries of an alert’s context before a human ever sees it. This will move enrichment from a simple key-value lookup to a complex reasoning layer, further reducing the distinction between SIEM and SOAR.
Reported By: Filipstojkovski Friday – Hackers Feeds


