THREATRADAR: The Open-Source Threat Intelligence Pipeline That SOC Teams Have Been Waiting For

Introduction:

Security Operations Centers (SOCs) are flooded with threat indicators: millions of IOCs arrive daily, most of them lacking context, prioritization, or validation. THREATRADAR is an open-source answer to this problem. It automates the ingestion, enrichment, scoring, and deduplication of raw threat feeds and turns them into actionable intelligence. Built by Salma El Fekih, Aicha Allagui, and Wiem Abbassi, winners of the Cyber Horizon 2.0 challenge, the pipeline organizes scattered threat data into a structured, continuously improving system using Elasticsearch, StrangeBee Cortex, and MISP.

Learning Objectives:

  • Build and deploy an automated threat intelligence pipeline that collects, normalizes, and enriches IOCs from multiple open-source feeds.
  • Implement confidence scoring and poisoning detection mechanisms to filter manipulated or low-fidelity threat data.
  • Integrate Elasticsearch for high-speed IOC search and MISP for collaborative threat sharing with automated data retention.

You Should Know:

  1. Deploying THREATRADAR: From Raw Feeds to Indexed Intelligence

THREATRADAR ingests indicators from sources like AlienVault OTX, Feodo Tracker, and open feeds. The pipeline normalizes disparate formats, deduplicates entries, and pushes them into Elasticsearch for low-latency querying. Below is the core workflow implemented in the project’s open-source code.

Step‑by‑step setup (Linux/Docker):

# Clone the repository (actual repo link: lnkd.in/dddpeXgq)
git clone https://github.com/ThreatRadar/ThreatRadar.git
cd ThreatRadar

# Launch Elasticsearch and Kibana using Docker Compose
docker-compose up -d elasticsearch kibana

# Verify Elasticsearch is running
curl -X GET "localhost:9200/_cluster/health?pretty"

# Run the collector script (example)
python3 collectors/feed_ingestor.py --feeds feeds_config.yaml

What this does: The ingestor pulls IOCs from the configured feeds, applies a normalization schema (e.g., converting IPv4 addresses, domains, and hashes to STIX 2.1), and indexes them into the `threatradar-iocs` Elasticsearch index. Deduplication uses SHA-256 hashes of the normalized IOCs over a rolling window.
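
The normalize-then-hash deduplication described above can be sketched as follows. This is a minimal illustration, not the project's code: the function names, the type labels ("domain", "hash"), the normalization rules, and the 24-hour default window are all assumptions.

```python
import hashlib
from datetime import datetime, timedelta, timezone


def normalize_ioc(ioc_type, value):
    """Canonicalize an indicator so trivially different spellings match.

    The rules here are illustrative assumptions, not THREATRADAR's schema.
    """
    value = value.strip()
    if ioc_type == "domain":
        # Refang "[.]", lowercase, and drop a trailing root dot.
        value = value.replace("[.]", ".").lower().rstrip(".")
    elif ioc_type == "hash":
        value = value.lower()
    return value


def dedup_key(ioc_type, value):
    """SHA-256 over the IOC type and its normalized value."""
    canonical = f"{ioc_type}:{normalize_ioc(ioc_type, value)}"
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class RollingDeduplicator:
    """Reports an IOC as a duplicate if it was seen within `window`."""

    def __init__(self, window=timedelta(hours=24)):
        self.window = window
        self._last_seen = {}  # dedup key -> datetime of last sighting

    def is_duplicate(self, ioc_type, value, seen_at):
        key = dedup_key(ioc_type, value)
        previous = self._last_seen.get(key)
        self._last_seen[key] = seen_at
        return previous is not None and seen_at - previous <= self.window
```

Keying on the hash of the normalized form means `Evil-C2[.]com` and `evil-c2.com.` collapse to one entry, while a sighting outside the window is treated as new again.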

Windows alternative (WSL or Python virtual env):

# Using PowerShell with a Python venv
python -m venv threatradar_env
.\threatradar_env\Scripts\activate
pip install -r requirements.txt
python collectors/feed_ingestor.py --feeds feeds_config.yaml

2. Enriching IOCs with StrangeBee Cortex Analyzers

Raw IOCs lack context—THREATRADAR calls StrangeBee Cortex analyzers to enrich each indicator with geolocation, DNS records, malware family tags, and threat actor associations. This enrichment feeds into the confidence scoring engine.

Step‑by‑step configuration:

1. Obtain a Cortex API key from your Cortex instance (for example, the one deployed alongside TheHive).

2. Edit `config/cortex.yaml`:

cortex:
  url: "http://your-cortex-server:9001"
  api_key: "YOUR_API_KEY"
  analyzers:
    - "DomainReputation"
    - "VirusTotal_GetReport"
    - "Geolocation"

3. Run the enrichment pipeline:

python enrichers/cortex_enricher.py --ioc_index threatradar-iocs --batch_size 100

4. Check enriched fields in Elasticsearch:

GET /threatradar-iocs/_search
{
  "query": { "term": { "ioc_type": "domain" } },
  "_source": ["ioc", "enrichment.geo.country", "enrichment.malware_family"]
}

Pro tip: Reduce API costs by caching enrichment results in Redis for 24 hours. THREATRADAR includes a `redis_cache.py` module for this purpose.
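
The caching idea can be sketched without a Redis server. The class below is an in-memory stand-in with a 24-hour expiry, not the project's `redis_cache.py`; `TTLCache`, `enrich_with_cache`, and the `run_analyzers` callback are hypothetical names used only for illustration.

```python
import time


class TTLCache:
    """In-memory stand-in for a Redis cache with per-key expiry (assumption)."""

    def __init__(self, ttl_seconds=24 * 3600, clock=time.time):
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        self._store = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Entry is stale: drop it so the caller re-enriches.
            del self._store[key]
            return None
        return value

    def set(self, key, value):
        self._store[key] = (self._clock() + self.ttl_seconds, value)


def enrich_with_cache(ioc, cache, run_analyzers):
    """Return a fresh cached enrichment, or call the analyzers and cache it."""
    cached = cache.get(ioc)
    if cached is not None:
        return cached
    result = run_analyzers(ioc)
    cache.set(ioc, result)
    return result
```

With a real Redis backend the same pattern maps onto storing each result with an expiry (for example via `SETEX`), so repeated sightings of one indicator within a day cost a single analyzer call.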

3. Scoring Threats and Detecting Poisoned Data

Adversaries can poison open feeds by submitting false indicators. THREATRADAR implements a multi‑factor confidence score (0–100) and a poisoning detection layer that flags statistical anomalies—e.g., a single source contributing >10% of new IOCs in an hour.
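
The "single source contributing more than 10%" check mentioned above could look roughly like this. It is a sketch under assumptions: the function name and the input shape (one source name per new IOC in the hour under review) are hypothetical.

```python
from collections import Counter


def sources_over_share(ioc_sources, max_share=0.10):
    """Return {source: share} for sources exceeding max_share of the batch.

    ioc_sources: one source name per new IOC seen in the hour under review
    (an assumed input shape, not the project's data model).
    """
    total = len(ioc_sources)
    if total == 0:
        return {}
    counts = Counter(ioc_sources)
    return {
        source: count / total
        for source, count in counts.items()
        if count / total > max_share
    }
```

A share threshold like this is only meaningful when many sources contribute; with three feeds, every one of them would exceed 10%.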

Step‑by‑step scoring logic (Python snippet from the codebase):

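from datetime import datetime, timezone

def calculate_confidence(ioc, enrichments, source_reputation):
    """Return a 0-100 confidence score for an IOC."""
    score = 0
    # Source reputation weight (0-40); source_reputation is assumed to be in [0, 1]
    score += source_reputation * 40
    # Analyzer consensus (nominally 0-40; these branches award at most 30)
    vt_positive = enrichments.get('vt_positive', 0)
    if vt_positive > 5:
        score += 30
    elif vt_positive > 0:
        score += 15
    # Freshness (0-20): one point lost per hour since first seen
    # (ioc.first_seen is assumed to be a timezone-aware datetime)
    age_hours = (datetime.now(timezone.utc) - ioc.first_seen).total_seconds() / 3600
    score += max(0, 20 - age_hours)
    return min(100, score)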

# Poisoning detection: Z-score on source volume
python detectors/poison_detector.py --lookback_minutes 60 --threshold_z 3.5
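
A Z-score test on source volume can be sketched as below. This is an assumption about what the detector computes, not its actual code: `history` holds a feed's IOC counts for earlier lookback windows, `current` is the count for the latest window, and the default mirrors `--threshold_z 3.5`.

```python
from statistics import mean, pstdev


def volume_z_score(history, current):
    """Z-score of the current window's IOC count against past windows."""
    if len(history) < 2:
        return 0.0  # not enough baseline to judge
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # Perfectly flat baseline: any increase is an infinite deviation.
        return float("inf") if current > mu else 0.0
    return (current - mu) / sigma


def is_suspicious(history, current, threshold_z=3.5):
    """Flag only upward spikes; a quiet feed is not treated as poisoning."""
    return volume_z_score(history, current) > threshold_z
```

For a feed that usually delivers around 100 IOCs per hour, a window of 400 is flagged while 108 is not.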

Command to flag poisoned feeds:

curl -X POST "http://localhost:5000/api/detect/poison" \
-H "Content-Type: application/json" \
-d '{"feed_name": "honeypot_feeds", "suspicious_threshold": 15}'

If a feed is flagged, THREATRADAR automatically quarantines its IOCs and sends an alert to a configured Slack/Teams webhook.

4. Integrating with MISP for Collaborative Intelligence

Validated IOCs are pushed to a MISP instance, enabling sharing across teams. The pipeline uses MISP’s REST API to create events, add attributes, and tag indicators with confidence levels.
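
As a rough illustration of the event-creation step, the sketch below builds a MISP event body from scored IOCs. The pipeline-side field names (`type`, `value`, `score`) and the type mapping are assumptions, and the confidence is recorded in the attribute comment here rather than as a tag; the resulting JSON would be POSTed to the instance's `/events/add` endpoint with the automation key in the `Authorization` header.

```python
import json

# Mapping from pipeline IOC types to MISP attribute (type, category) pairs.
# The pipeline-side type names are assumptions made for this sketch.
MISP_TYPES = {
    "domain": ("domain", "Network activity"),
    "ipv4": ("ip-dst", "Network activity"),
    "sha256": ("sha256", "Payload delivery"),
}


def build_misp_event(iocs, min_score=70, validation_tag="threatradar:validated"):
    """Build a MISP event body, dropping IOCs scored below min_score."""
    attributes = []
    for ioc in iocs:
        if ioc["score"] < min_score:
            continue
        misp_type, category = MISP_TYPES[ioc["type"]]
        attributes.append({
            "type": misp_type,
            "category": category,
            "value": ioc["value"],
            "to_ids": True,
            "comment": f"THREATRADAR confidence {ioc['score']}",
        })
    return {
        "Event": {
            "info": "THREATRADAR validated indicators",
            "distribution": 0,     # 0 = your organisation only
            "threat_level_id": 2,  # 2 = medium
            "analysis": 2,         # 2 = completed
            "Tag": [{"name": validation_tag}],
            "Attribute": attributes,
        }
    }


if __name__ == "__main__":
    sample = [{"type": "domain", "value": "evil.example", "score": 85}]
    print(json.dumps(build_misp_event(sample), indent=2))
```

Filtering on the score before building the payload keeps low-confidence indicators out of the shared instance, matching the `min_score_to_push` idea in the configuration.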

Step‑by‑step MISP integration:

  1. Generate an automation key in MISP (Administration → List Auth Keys).

2. Configure `config/misp.yaml`:

misp:
  url: "https://your-misp.local"
  key: "YOUR_API_KEY"
  validation_tag: "threatradar:validated"
  min_score_to_push: 70

3. Run the MISP pusher:

python integrations/misp_pusher.py --elastic_index threatradar-iocs --score_threshold 70

4. Verify in MISP:

curl -k -H "Authorization: YOUR_API_KEY" \
"https://your-misp.local/attributes/restSearch/returnFormat:json/type:domain" | jq '.'

Linux command to sync MISP events back to THREATRADAR (bidirectional):

python integrations/misp_importer.py --event_id 12345 --update_elastic true

5. Automating Data Retention to Purge Outdated IOCs

Old IOCs waste storage and degrade search performance. THREATRADAR implements a retention policy that deletes indicators older than N days (default 90) from Elasticsearch and MISP, while optionally archiving them to cold storage.
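
The retention rule can be sketched in two parts: a range query suitable as the body of an Elasticsearch `_delete_by_query` request, and a local split of records into kept and expired sets for archiving. The function names and the record shape (dicts with a timezone-aware `first_seen`) are assumptions, not the project's `retention_manager.py`.

```python
from datetime import datetime, timedelta, timezone


def build_delete_query(days_to_keep=90):
    """Body for an Elasticsearch _delete_by_query call on the IOC index."""
    return {"query": {"range": {"first_seen": {"lt": f"now-{days_to_keep}d"}}}}


def split_expired(iocs, days_to_keep=90, now=None):
    """Partition IOC dicts into (keep, expired) by their first_seen datetime."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days_to_keep)
    keep, expired = [], []
    for ioc in iocs:
        # Anything first seen before the cutoff is due for archive and deletion.
        (expired if ioc["first_seen"] < cutoff else keep).append(ioc)
    return keep, expired
```

Splitting before deleting lets the expired set be written to cold storage first, so the purge never discards data that was meant to be archived.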

Step‑by‑step retention automation (cron job):

1. Edit `config/retention.yaml`:

retention:
elastic_index: "threatradar-iocs"
days_to_keep: 90
archive_path: "/mnt/cold_storage/archived_iocs.json"
misp_sync: true

2. Run the retention script manually:

python maintenance/retention_manager.py --dry-run false

3. Schedule daily execution via cron:

(crontab -l 2>/dev/null; echo "0 2 * * * /usr/bin/python3 /opt/ThreatRadar/maintenance/retention_manager.py >> /var/log/retention.log 2>&1") | crontab -

4. Query Elasticsearch to confirm deletion:

curl -X GET "localhost:9200/threatradar-iocs/_count?q=first_seen:<now-90d"

Windows Task Scheduler equivalent:

$Action = New-ScheduledTaskAction -Execute "python.exe" -Argument "C:\ThreatRadar\maintenance\retention_manager.py"
$Trigger = New-ScheduledTaskTrigger -Daily -At 2am
Register-ScheduledTask -TaskName "ThreatRadarRetention" -Action $Action -Trigger $Trigger

6. Leveraging LLMs for Context-Aware IoC Extraction (Wes Young’s Suggestion)

As noted by Wes Young on the original post, traditional regex-based IoC extraction misses context. THREATRADAR can be extended with small LLMs (e.g., LLAMA 3B or Phi-2) to parse unstructured threat reports and extract IOCs with relationship mapping (e.g., “domain X associated with ransomware Y”).
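
To make the limitation concrete, here is a minimal regex baseline of the kind the comment criticizes. The patterns and function name are illustrative assumptions; the point is that the extractor returns indicators with no notion of what they are associated with.

```python
import re

# Deliberately simple patterns (assumptions for this sketch); the domain
# pattern accepts defanged "[.]" separators as well as plain dots.
DOMAIN_RE = re.compile(r"\b(?:[a-z0-9-]+(?:\[\.\]|\.))+[a-z]{2,}\b", re.IGNORECASE)
SHA256_RE = re.compile(r"\b[a-f0-9]{64}\b", re.IGNORECASE)


def extract_iocs(text):
    """Regex-only extraction: finds indicators but not what they mean."""
    found = []
    for match in DOMAIN_RE.finditer(text):
        # Refang and lowercase so the value can be indexed directly.
        found.append(("domain", match.group(0).replace("[.]", ".").lower()))
    for match in SHA256_RE.finditer(text):
        found.append(("sha256", match.group(0).lower()))
    return found
```

Given "The LockBit affiliate used evil-c2[.]com as its C2 server.", this yields the domain but drops the LockBit relationship, and a filename such as `report.pdf` is matched as a domain too. Those are the gaps a context-aware extractor is meant to close.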

Experimental integration (using Ollama):

# Install Ollama and pull a mini model
curl -fsSL https://ollama.com/install.sh | sh
ollama pull phi:2.7b

# Run the LLM extractor (custom script)
python experimental/llm_ioc_extractor.py --input threat_report.txt --model phi:2.7b

Example output:

Extracted IOCs:
- domain: evil-c2[.]com (context: "C2 server for LockBit")
- hash: a94a8fe5cc... (context: "ransomware binary hash")

API security consideration: Never send sensitive IOCs to cloud LLMs. Use local models or on‑prem Hugging Face endpoints.

What Undercode Says:

– Key Takeaway 1: THREATRADAR solves the “alert fatigue” problem by scoring and enriching IOCs, but its real power lies in the poisoning detection module—most open-source pipelines ignore adversarial feed manipulation, making them attack vectors themselves.
– Key Takeaway 2: Wes Young’s LLM suggestion is forward‑looking: regex fails against obfuscated or narrative‑embedded IOCs. Integrating a small, local LLM could boost recall, by an estimated 30–40%, without cloud privacy risks.

Analysis: The project is a well‑architected reference implementation for SOC automation. It successfully combines Elasticsearch’s search speed with Cortex’s enrichment depth and MISP’s sharing capabilities. However, deployment complexity (managing Cortex, Elastic, and MISP) may challenge small teams. The automated retention policy is a standout feature—many open‑source pipelines accumulate stale data. The lack of built‑in SIEM integration (e.g., Splunk, Sentinel) is a gap, but the modular code allows custom connectors. Future work should include anomaly detection on IOC time‑to‑live and LLM‑based free‑text parsing, as Wes suggested.

Prediction:

Within 12–18 months, threat intelligence pipelines like THREATRADAR will become standard SOC infrastructure, moving from “nice to have” to mandatory. The adoption of small language models for context‑aware IoC extraction will outpace traditional regex, with on‑prem models like Phi-3 or Mistral 7B running alongside Elasticsearch. We’ll also see adversarial ML attacks targeting scoring algorithms—attackers will deliberately boost false IOCs to manipulate confidence scores. THREATRADAR’s poisoning detection is a first step; future versions will need robust anomaly detection (isolation forests, autoencoders) and federated scoring across multiple independent pipelines to resist coordinated feed poisoning. The open‑source community will likely fork this project into specialized variants—one for ransomware tracking, another for nation‑state APT monitoring—creating a modular threat intelligence ecosystem.

Reported By: Salma El – Hackers Feeds