Demystifying Phishing Emails: How to Build a Python-Powered Email Header Analysis & Threat Intelligence Tool

Introduction:

In the relentless battle against phishing and Business Email Compromise (BEC), manual email header analysis is a tedious and error-prone process for cybersecurity professionals. A single investigation often involves juggling multiple command-line tools, online lookup services, and mapping utilities. This article explores the development of an automated, all-in-one Email Header Analysis Tool using Python, which consolidates parsing, threat intelligence enrichment, and geolocation visualization into a single, powerful workflow, transforming a fragmented process into an efficient investigative pipeline.

Learning Objectives:

  • Understand the critical components of an email header and how they can be forged or analyzed for authenticity.
  • Learn how to integrate multiple Threat Intelligence Platforms (TIPs) like VirusTotal and AbuseIPDB into a custom Python tool.
  • Gain practical skills in automating the extraction, parsing, and geolocation mapping of sender IP addresses from header chains.

You Should Know:

1. Anatomy of an Email Header: The Digital Envelope

An email header is a log of the message’s journey from sender to recipient. It contains a series of fields added by each mail transfer agent (MTA). Key fields for forensic analysis include `From` (often spoofed), `Reply-To`, `Return-Path`, and, most importantly, the `Received` headers, which form a hop-by-hop chain. Each MTA prepends its own `Received` header, so the chain reads newest-first and the bottom-most entry records the earliest hop. The `Received` header written by the first server you trust usually reveals the true originating IP address; entries below it were written by systems the sender may control and can be forged. This tool automates the parsing of this complex structure.
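One quick check that follows from these fields is comparing the domain in `From` with the domains in `Reply-To` and `Return-Path`. The sketch below is a minimal illustration, not part of the original tool: the helper names `address_domain` and `sender_mismatches` and the sample message are hypothetical. A mismatch is only a hint, since legitimate bulk senders often use a different `Return-Path` domain.

```python
from email import message_from_string, policy
from email.utils import parseaddr


def address_domain(header_value):
    """Return the lower-cased domain of the address in a header value, or None."""
    if not header_value:
        return None
    _, addr = parseaddr(str(header_value))
    if "@" not in addr:
        return None
    return addr.rsplit("@", 1)[1].lower()


def sender_mismatches(msg):
    """List Reply-To / Return-Path domains that differ from the From domain."""
    from_domain = address_domain(msg.get("From"))
    findings = []
    for field in ("Reply-To", "Return-Path"):
        domain = address_domain(msg.get(field))
        if domain and from_domain and domain != from_domain:
            findings.append(
                f"{field} domain {domain} differs from From domain {from_domain}"
            )
    return findings


# Hypothetical sample headers, for illustration only.
sample = (
    "Return-Path: <bounce@mailer.example.net>\n"
    'From: "Example Bank" <support@example.com>\n'
    "Reply-To: helpdesk@example.org\n"
    "Subject: Action required\n"
    "\n"
)
msg = message_from_string(sample, policy=policy.default)
for finding in sender_mismatches(msg):
    print(finding)
```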

Step-by-Step Guide & Commands:

Step 1: Extract Raw Headers. First, obtain the full email headers. In Gmail, open the email, click the three-dot `More` menu, and choose `Show original`. On Linux, you might fetch a message from a mail server with `curl`: curl --ssl -u 'user:pass' imaps://imap.server.com/INBOX -X "FETCH 1 BODY". The Python tool accepts this raw header text as input.

Step 2: Parse with Python's `email` Library. The core parsing uses Python's built-in modules.

from email import policy
from email.parser import BytesParser

# Assuming raw_headers is a bytes object
msg = BytesParser(policy=policy.default).parsebytes(raw_headers)

# Extract specific headers
from_header = msg.get('From')
received_chain = msg.get_all('Received', [])  # List of all Received headers (empty if none)

Step 3: Regex for IP Extraction. Use regular expressions to mine IP addresses from the `Received` headers.

import re
# IPv4 only; the dots are escaped so they match a literal "."
ip_pattern = re.compile(r'\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b')

all_ips = []
for received in received_chain:
    ips = ip_pattern.findall(str(received))
    all_ips.extend(ips)
# The first IP in the last 'Received' entry is often the originator
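A bare regex also matches strings that are not real addresses (such as `999.1.1.1`) and internal relay addresses that are useless for reputation lookups. The following sketch, under the assumption that only globally routable IPv4 addresses are of interest, validates candidates with the standard `ipaddress` module and walks the chain from the earliest hop upward. The function names and the sample chain are hypothetical, and the public addresses are stand-ins chosen only because documentation ranges are not globally routable.

```python
import ipaddress
import re

IPV4_CANDIDATE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def public_ips(text):
    """Return valid, globally routable IPv4 addresses found in text, in order."""
    found = []
    for candidate in IPV4_CANDIDATE.findall(text):
        try:
            ip = ipaddress.ip_address(candidate)
        except ValueError:
            continue  # matched the regex but is not a valid address
        if ip.is_global:
            found.append(str(ip))
    return found


def candidate_origin_ip(received_chain):
    """Walk from the earliest hop (last entry) upward; return the first public IP."""
    for received in reversed(received_chain):
        ips = public_ips(str(received))
        if ips:
            return ips[0]
    return None


# Hypothetical chain, newest hop first, as get_all('Received') would return it.
chain = [
    "from mx.example.net (mx.example.net [1.1.1.1]) by inbound.example.com; Tue, 2 Jan 2024 10:00:02 +0000",
    "from localhost (unknown [10.0.0.5]) by relay.example.net; Tue, 2 Jan 2024 10:00:01 +0000",
    "from [8.8.8.8] (helo=laptop) by smtp.example.net; Tue, 2 Jan 2024 10:00:00 +0000",
]
print(candidate_origin_ip(chain))
```

The result is a candidate only: hops below the first server you trust can be fabricated by the sender.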

2. Authenticating the Sender: SPF, DKIM, and DMARC

Email authentication protocols are the first line of defense against spoofing. SPF (Sender Policy Framework) lists authorized sending IPs for a domain. DKIM (DomainKeys Identified Mail) uses cryptographic signatures. DMARC (Domain-based Message Authentication, Reporting & Conformance) defines policies for handling failures. The analysis tool can check for the presence and alignment of these headers.

Step-by-Step Guide:

Step 1: Header Inspection. Look for authentication results headers, often added by the receiving mail server (e.g., Gmail, Office 365).

spf_header = msg.get('Received-SPF')
dkim_header = msg.get('DKIM-Signature')
# Authentication-Results carries the receiving server's SPF, DKIM, and DMARC verdicts
dmarc_header = msg.get('Authentication-Results')

Step 2: Interpret Results. Parse the values. For example, `Received-SPF: pass (google.com: domain of [email protected] designates 192.0.2.1 as permitted sender)` means the receiving server found the sending IP listed in the domain's SPF record. The mere presence of a `DKIM-Signature` header proves little, because anyone can add one; confirming it requires cryptographic validation with external libraries, or a `dkim=pass` verdict recorded by a receiving server you trust. The tool can flag emails that are missing these headers or that contain `fail` results.
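As a rough sketch of that flagging logic, the code below pulls `spf=`, `dkim=`, and `dmarc=` verdicts out of `Authentication-Results` values with a regular expression. It is a simplification rather than a full parser for the header grammar; the names `parse_auth_results` and `auth_flags` and the sample value are hypothetical. Only values added by your own receiving server should be trusted, since a sender can insert a fake `Authentication-Results` header.

```python
import re

# Matches "spf=pass", "dkim=fail", "dmarc=none", ... inside a header value.
AUTH_RESULT = re.compile(r"\b(spf|dkim|dmarc)\s*=\s*([a-z]+)", re.IGNORECASE)


def parse_auth_results(header_values):
    """Collect spf/dkim/dmarc verdicts from Authentication-Results values."""
    verdicts = {"spf": [], "dkim": [], "dmarc": []}
    for value in header_values or []:
        for method, result in AUTH_RESULT.findall(str(value)):
            verdicts[method.lower()].append(result.lower())
    return verdicts


def auth_flags(verdicts):
    """Return a note for each method that is missing or has a non-pass result."""
    flags = []
    for method, results in verdicts.items():
        if not results:
            flags.append(f"{method}: no result recorded")
        elif any(result != "pass" for result in results):
            flags.append(f"{method}: {', '.join(results)}")
    return flags


# Hypothetical header value, for illustration only.
sample_values = [
    "mx.example.com; dkim=pass header.i=@example.com; "
    "spf=fail (sender IP is 192.0.2.1) smtp.mailfrom=example.com; "
    "dmarc=fail (p=REJECT) header.from=example.com"
]
verdicts = parse_auth_results(sample_values)
print(verdicts)
print(auth_flags(verdicts))
```

With a parsed message, the input would typically be `msg.get_all('Authentication-Results')`.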

3. Enriching IPs with Threat Intelligence APIs

An IP address is just a number without context. Threat intelligence APIs provide reputation scoring, historical abuse data, and malware associations. This tool centralizes queries to services like AbuseIPDB and VirusTotal.

Step-by-Step Guide & Code:

Step 1: API Setup. Obtain free API keys from AbuseIPDB and VirusTotal. Store them securely as environment variables.

# Linux/macOS
export ABUSEIPDB_KEY='your_key_here'
export VT_KEY='your_key_here'

Step 2: Query AbuseIPDB. Check if an IP is reported for malicious activity.

import requests
import os
ABUSEIPDB_KEY = os.getenv('ABUSEIPDB_KEY')
# Use the first IP in the last (earliest) Received entry as the likely originating IP
ip_to_check = ip_pattern.findall(str(received_chain[-1]))[0]

url = 'https://api.abuseipdb.com/api/v2/check'
headers = {'Key': ABUSEIPDB_KEY, 'Accept': 'application/json'}
params = {'ipAddress': ip_to_check, 'maxAgeInDays': '90'}

response = requests.get(url, headers=headers, params=params, timeout=10)
response.raise_for_status()  # Stop early on HTTP errors such as a bad key or rate limiting
data = response.json()
abuse_score = data['data']['abuseConfidenceScore']
total_reports = data['data']['totalReports']

Step 3: Query VirusTotal. Retrieve the report for the IP address, which includes per-engine analysis results and summary statistics.

url = f'https://www.virustotal.com/api/v3/ip_addresses/{ip_to_check}'
headers = {'x-apikey': os.getenv('VT_KEY')}
vt_response = requests.get(url, headers=headers, timeout=10)
vt_data = vt_response.json()
# Analyze vt_data['data']['attributes']['last_analysis_stats']
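To turn the two responses into something an analyst can scan quickly, they can be reduced to a single summary. The sketch below is one possible approach under stated assumptions: the function name `summarize_reputation`, the threshold of 50, and the verdict labels are all hypothetical choices, and the payloads are hand-written mock data shaped like the fields used above, not real API results.

```python
def summarize_reputation(abuse_payload, vt_payload, abuse_threshold=50):
    """Combine AbuseIPDB and VirusTotal responses into one small summary dict."""
    abuse = abuse_payload.get("data")
    stats = (
        vt_payload.get("data", {})
        .get("attributes", {})
        .get("last_analysis_stats")
    )
    if abuse is None and stats is None:
        # Neither lookup returned usable data (error response, bad key, ...).
        return {"verdict": "no data"}

    abuse = abuse or {}
    stats = stats or {}
    score = abuse.get("abuseConfidenceScore", 0)
    malicious = stats.get("malicious", 0)
    suspicious = stats.get("suspicious", 0)

    # Assumed thresholds; tune them to your own tolerance for false positives.
    if score >= abuse_threshold or malicious > 0:
        verdict = "likely malicious"
    elif score > 0 or suspicious > 0:
        verdict = "needs review"
    else:
        verdict = "no known reports"

    return {
        "verdict": verdict,
        "abuse_score": score,
        "total_reports": abuse.get("totalReports", 0),
        "vt_malicious": malicious,
        "vt_suspicious": suspicious,
    }


# Mock payloads for illustration only.
mock_abuse = {"data": {"abuseConfidenceScore": 12, "totalReports": 3}}
mock_vt = {
    "data": {
        "attributes": {
            "last_analysis_stats": {
                "harmless": 60, "malicious": 0, "suspicious": 1,
                "undetected": 30, "timeout": 0,
            }
        }
    }
}
summary = summarize_reputation(mock_abuse, mock_vt)
print(summary)
```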

4. Visualizing the Threat: IP Geolocation Mapping

Mapping an IP to a geographic location helps identify improbable travel distances (e.g., a "local" bank email originating from a different continent) and visualizes attack infrastructure. Python's `folium` library can generate interactive Leaflet maps.

Step-by-Step Guide & Code:

Step 1: Get Geolocation Data. Use a free service like ip-api.com.

geo_url = f'http://ip-api.com/json/{ip_to_check}'
geo_resp = requests.get(geo_url, timeout=10)
geo_data = geo_resp.json()
# lat/lon are None when the lookup fails (for example, for a private address)
lat, lon = geo_data.get('lat'), geo_data.get('lon')
city, country = geo_data.get('city'), geo_data.get('country')
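The service reports failures inside the JSON body through a `status` field rather than only through the HTTP status code, so it helps to check that field before using the coordinates. A minimal sketch, with the helper name `extract_location` and the sample dictionaries as hypothetical stand-ins for real responses:

```python
def extract_location(geo_data):
    """Return a location dict from an ip-api.com style response, or None on failure."""
    if geo_data.get("status") != "success":
        return None
    lat, lon = geo_data.get("lat"), geo_data.get("lon")
    if lat is None or lon is None:
        return None
    return {
        "lat": float(lat),
        "lon": float(lon),
        "city": geo_data.get("city") or "unknown",
        "country": geo_data.get("country") or "unknown",
    }


# Hand-written sample responses, for illustration only.
ok_response = {
    "status": "success", "country": "Exampleland", "city": "Sampleville",
    "lat": 12.5, "lon": -45.25, "query": "203.0.113.7",
}
failed_response = {"status": "fail", "message": "private range", "query": "10.0.0.5"}

print(extract_location(ok_response))
print(extract_location(failed_response))
```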

Step 2: Generate Interactive Map.

import folium
if lat is not None and lon is not None:
    threat_map = folium.Map(location=[lat, lon], zoom_start=10)
    folium.Marker([lat, lon], popup=f"{ip_to_check}<br>{city}, {country}<br>Abuse Score: {abuse_score}%").add_to(threat_map)
    # Save to HTML file
    threat_map.save('threat_geolocation.html')
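The "improbable distance" idea from the start of this section can also be expressed as a number. The sketch below uses the haversine formula to estimate the great-circle distance between the geolocated sender and wherever the claimed organization is based. The function name, the Earth radius of 6371 km (a common mean value), and the approximate city coordinates are assumptions for illustration; IP geolocation itself is often only accurate to the city or country level.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Approximate coordinates for London and Paris, used only as sample points.
sender = (51.5074, -0.1278)
claimed_office = (48.8566, 2.3522)
distance = haversine_km(*sender, *claimed_office)
print(f"Sender is roughly {distance:.0f} km from the claimed office")
```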

5. Building the Consolidated Workflow

The final tool stitches all these modules together into a command-line or simple GUI application.

Step-by-Step Guide:

Step 1: Create Argument Parser. Accept input via file or pasted text.

import argparse
parser = argparse.ArgumentParser(description='Email Header Analyzer')
# Exactly one input source is required
source = parser.add_mutually_exclusive_group(required=True)
source.add_argument('-f', '--file', help='Path to file containing raw email headers')
source.add_argument('-t', '--text', help='Raw header text as a string')
args = parser.parse_args()

Step 2: Orchestrate Modules. Call the parsing, analysis, API lookup, and mapping functions in sequence, passing data between them.
Step 3: Generate a Consolidated Report. Output a formatted report (JSON, HTML, or console print) summarizing sender info, authentication results, threat scores, and a link to the generated map.
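A rough sketch of Steps 2 and 3 is shown below. It is not the original tool's code: `build_report` and its `enrich` callback are hypothetical names, the sample headers are invented, and the enrichment step is stubbed out so the example runs without network access or API keys. In a real run, `enrich` would wrap the threat intelligence and geolocation lookups from the earlier sections.

```python
import json
import re
from email import message_from_string, policy

IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def build_report(raw_headers, enrich=None):
    """Parse raw header text and assemble a report dictionary.

    enrich is an optional callable that takes an IP string and returns a dict.
    """
    msg = message_from_string(raw_headers, policy=policy.default)
    received = [str(value) for value in msg.get_all("Received", [])]

    ips = []
    for hop in received:
        for ip in IPV4.findall(hop):
            if ip not in ips:  # keep first-seen order, drop duplicates
                ips.append(ip)

    report = {
        "from": str(msg.get("From", "")),
        "reply_to": str(msg.get("Reply-To", "")),
        "return_path": str(msg.get("Return-Path", "")),
        "hops": len(received),
        "ips": ips,
        "enrichment": {},
    }
    if enrich is not None:
        report["enrichment"] = {ip: enrich(ip) for ip in ips}
    return report


# Hypothetical headers; the addresses come from documentation ranges.
sample = (
    "Received: from mx.example.net ([203.0.113.7]) by inbound.example.com; Tue, 2 Jan 2024 10:00:02 +0000\n"
    "Received: from [198.51.100.23] (helo=laptop) by mx.example.net; Tue, 2 Jan 2024 10:00:00 +0000\n"
    "From: Example Bank <support@example.com>\n"
    "\n"
)
report = build_report(sample, enrich=lambda ip: {"note": "lookup skipped in this sketch"})
print(json.dumps(report, indent=2))
```

Passing the lookups in as a callable keeps the parsing testable offline and makes it easy to swap intelligence sources later.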

What Undercode Says:

  • Automation is Force Multiplication: This tool exemplifies how scripting repetitive tasks frees up analyst time for higher-level investigation and response, moving from data collection to insight generation.
  • The Power of Integration: The true value lies not in any single function, but in the seamless flow from raw header to enriched threat intelligence and visual context, creating a narrative around the email's origin.
  • Foundation for SOAR: Such a custom tool can be modularized and integrated into Security Orchestration, Automation, and Response (SOAR) platforms as a custom playbook action for automated phishing triage.

Analysis:

The development of this tool highlights a critical evolution in defensive cybersecurity practices: the shift from manual, siloed analysis to integrated, automated workflows. While commercial Security Information and Event Management (SIEM) and Email Security Gateways offer similar features, the custom-built approach provides unmatched flexibility, cost-effectiveness for smaller teams, and deep educational value. It forces the developer-analyst to understand the underlying protocols and data flows, making them better investigators. The tool's core weakness, like any that relies on external APIs, is rate-limiting and potential changes to the APIs themselves. However, its modular design allows for easy swapping of intelligence sources as the threat landscape evolves.

Prediction:

The future of such investigative tools lies in deeper integration with AI. Machine learning models could be trained on the aggregated data (IP reputation, authentication failures, header anomalies) to not just report facts but predict the likelihood of an email being malicious with a confidence score. Furthermore, as API standards mature, we may see a move towards decentralized, blockchain-based threat intelligence sharing, where tools like this one can both query and contribute anonymized indicators of compromise (IoCs) in real-time, creating a collective defense network. The role of the cybersecurity professional will increasingly be that of a tool-smith and interpreter of automated systems, rather than a manual data gatherer.

Reported By: Vivek Vishwakarma - Hackers Feeds