AI vs Human in Threat Hunting: A Side-by-Side KQL Showdown Reveals Where LLMs Excel and Where They Still Fail

Introduction:

The integration of Large Language Models (LLMs) into security operations is rapidly transforming how threat hunters and detection engineers approach their craft. While AI promises to accelerate query writing and uncover patterns at scale, a critical question remains: can these models truly replace the nuanced reasoning of an experienced human analyst? In a recent real-world experiment, Alex Teixeira, a seasoned Detection Engineering and Security Analytics SME, put two leading AI models head-to-head against his own hunting logic in a KQL (Kusto Query Language) matchup, revealing both the impressive capabilities and the significant blind spots of current AI in the context of Microsoft Defender threat hunting.

Learning Objectives:

  • Understand the core challenges AI faces when generating production-ready KQL queries for threat hunting, including training data limitations and schema awareness.
  • Learn how to construct effective KQL hunts for detecting PowerShell-based defense evasion techniques, such as Defender exclusion manipulations.
  • Discover practical strategies for integrating AI into the hunting workflow, including using AI for code review and path normalization, while maintaining human oversight.

You Should Know:

  1. The Training Data Gap: Why AI Excels at Python but Struggles with KQL

A model's performance is directly tied to the quality and quantity of its training data. For a language like Python, there is an enormous public corpus: GitHub, Stack Overflow, and countless tutorials. For KQL and SPL, the gap in available data is "brutal." KQL only gained widespread adoption in cybersecurity after Microsoft launched its SIEM (Azure Sentinel, now Microsoft Sentinel) in 2019. As a result, far fewer public repositories and community projects publish solid, production-grade code. Because of this scarcity, even when a model produces syntactically correct KQL, it often lacks the domain expertise to write queries that work in a live environment. For instance, a model might hunt for the literal string `Invoke-Mimikatz` in the command line. Real attackers rarely leave that pattern behind, because they obfuscate their commands or load tooling directly into memory. Such queries create a false sense of security: they look like threat hunting but miss actual attacks.
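A minimal Python sketch makes the `Invoke-Mimikatz` point concrete. The `naive_hunt` helper and the sample command lines are hypothetical. The sketch relies only on the documented behavior of PowerShell's `-EncodedCommand` parameter, which accepts base64-encoded UTF-16LE text.

```python
import base64


def naive_hunt(command_line: str) -> bool:
    """Hypothetical 'literal IOC' hunt: flag only the exact tool name."""
    return "invoke-mimikatz" in command_line.lower()


payload = "Invoke-Mimikatz"

# Plain invocation: the tool name appears verbatim on the command line.
plain = f"powershell.exe -NoProfile -Command {payload}"

# Same payload via -EncodedCommand (base64 over UTF-16LE text).
encoded = base64.b64encode(payload.encode("utf-16-le")).decode("ascii")
encoded_cmd = f"powershell.exe -NoProfile -EncodedCommand {encoded}"

# Same payload split with string concatenation and run via the call operator.
split_cmd = "powershell.exe -Command \"& ('Invoke-' + 'Mimikatz')\""

for label, cmd in [("plain", plain), ("encoded", encoded_cmd), ("split", split_cmd)]:
    print(f"{label:8} detected={naive_hunt(cmd)}")
```

Only the plain variant is flagged. The encoded and concatenated variants carry the same payload, yet the literal match misses them.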

  2. Schema Awareness and Data Normalization: A Critical Failure Point

One of the most underappreciated failure modes of LLMs on data tasks is schema awareness. A model can memorize query syntax, but if it does not know where the data lives and how it is structured, it writes queries against tables or fields that may not exist. This is less of an issue on schema-on-write platforms like Microsoft Defender, where the `Device*` tables are structured consistently across environments. It remains acute with unstructured data or disparate log sources. The experiment also exposed a related blind spot: both LLMs scoped their hunts exclusively to the `DeviceProcessEvents` table, which accounted for only about 30% of the potential results. The human hunter also queried the `DeviceEvents` table. Its PowerShell command and script-content events provide rich telemetry that both models missed.
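A cheap guard against invented tables and fields is to check a generated query's references against a known schema before running it. The sketch below is illustrative only. `KNOWN_SCHEMA` is a hand-picked subset of Defender advanced hunting columns, not the full schema. `unknown_references` is a hypothetical helper that receives the table and column names a query uses; extracting those names from KQL text would require a real parser.

```python
# Hypothetical subset of Microsoft Defender advanced hunting columns.
# A real check would load the complete schema for the tables you query.
KNOWN_SCHEMA: dict[str, set[str]] = {
    "DeviceProcessEvents": {
        "Timestamp", "DeviceId", "ActionType",
        "ProcessCommandLine", "InitiatingProcessCommandLine",
    },
    "DeviceEvents": {
        "Timestamp", "DeviceId", "ActionType", "AdditionalFields",
        "ProcessCommandLine", "InitiatingProcessCommandLine",
    },
}


def unknown_references(table: str, columns: list[str]) -> list[str]:
    """Return the names that KNOWN_SCHEMA cannot resolve for this table."""
    if table not in KNOWN_SCHEMA:
        return [table]  # the table itself does not exist in the schema
    return [col for col in columns if col not in KNOWN_SCHEMA[table]]


# A generated query that invents a 'CommandLine' column:
print(unknown_references("DeviceProcessEvents", ["Timestamp", "CommandLine"]))
# A generated query against a made-up table name:
print(unknown_references("PowerShellEvents", ["Command"]))
```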

  3. The Hunt: Robot vs. Human – A Side-by-Side Analysis

The core of the experiment involved a specific threat hunting prompt: identifying PowerShell commands that add or set Defender exclusions to user-writable paths. The goal was to detect potential defense evasion. The prompt was deliberately simple to simulate an average user’s interaction with an LLM. The responses from ChatGPT (GPT-5.5) and Claude (Sonnet 4.6) were then compared against a query crafted by Alex Teixeira.

  • ChatGPT (GPT-5.5) Query: The model generated a 55-line query that attempted to normalize paths and compute prevalence. However, the query raised an error and failed to execute.
  • Claude (Sonnet 4.6) Query: Claude produced a 121-line, well-commented query that executed successfully. It demonstrated a strong coding style and even incorporated some advanced techniques like path normalization and environment variable monitoring.
  • Human Query: Alex’s query was a concise 15 lines (excluding comments). It used a different approach, focusing on the `ActionType` field and using regular expressions to match against command lines across multiple tables, regardless of the interpreter process used.

  4. Step-by-Step Guide: Building an Effective KQL Hunt for Defender Exclusions

To illustrate the gap between AI and human-crafted queries, let’s break down the core components of an effective KQL hunt, drawing from the human query in the experiment.

Step 1: Define the Attack Pattern

Identify the core behavior you are hunting for. In this case, it’s the use of `Add-MpPreference` or `Set-MpPreference` with the `-ExclusionPath` parameter, which attackers use to exclude malicious folders from Windows Defender scans.

Step 2: Craft a Robust Regular Expression

A strong regular expression (regex) is the heart of the hunt. It must be flexible enough to catch variations in command-line syntax.

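// Exclusion cmdlet followed, anywhere later (including across line breaks), by ExclusionPath
let IOARegex = @"(?i)(add|set)-MpPreference[\s\S]+ExclusionPath";
// User-writable or commonly abused locations, as literal paths or $env: variables
let PathRegex = @"(?i)(c:|\$env:(HOMEDRIVE|SYSTEMROOT))[\\]+(users|programdata|windows[\\]+(temp|tracing)|\$Recycle\.Bin)[\\]|\$env:(TEMP|TMP|APPDATA|LOCALAPPDATA|PROGRAMDATA|PUBLIC|USERPROFILE|HOMEPATH|ALLUSERSPROFILE|ONEDRIVE|DESKTOP|DOCUMENTS|DOWNLOADS|FAVORITES)";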

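IOARegex: Matches `Add-MpPreference` or `Set-MpPreference` followed anywhere later by `ExclusionPath`, case-insensitively. The `[\s\S]+` lets the match span line breaks in multi-line scripts.
PathRegex: Matches user-writable or commonly abused locations that an exclusion might target. These can appear as literal paths (`C:\Users\`, `C:\ProgramData\`, `C:\Windows\Temp\`, `C:\Windows\Tracing\`, `C:\$Recycle.Bin\`) or as PowerShell environment variables such as `$env:TEMP` and `$env:APPDATA`.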
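To sanity-check the two patterns outside Defender, here is a Python port run against a few hypothetical sample commands. Two assumptions apply. First, the path separators are written as `[\\]+`, which is an assumption about the intended pattern. Second, Python's `re` engine stands in for KQL's RE2. The constructs used here (`(?i)`, character classes, alternation) behave the same way in both engines.

```python
import re

# Exclusion cmdlet followed, anywhere later (including across line breaks), by ExclusionPath.
IOA_REGEX = re.compile(r"(?i)(add|set)-MpPreference[\s\S]+ExclusionPath")

# User-writable or commonly abused locations, as literal paths or $env: variables.
PATH_REGEX = re.compile(
    r"(?i)(c:|\$env:(HOMEDRIVE|SYSTEMROOT))[\\]+"
    r"(users|programdata|windows[\\]+(temp|tracing)|\$Recycle\.Bin)[\\]"
    r"|\$env:(TEMP|TMP|APPDATA|LOCALAPPDATA|PROGRAMDATA|PUBLIC|USERPROFILE"
    r"|HOMEPATH|ALLUSERSPROFILE|ONEDRIVE|DESKTOP|DOCUMENTS|DOWNLOADS|FAVORITES)"
)


def is_hit(command: str) -> bool:
    """An exclusion command that targets a user-writable path."""
    return bool(IOA_REGEX.search(command)) and bool(PATH_REGEX.search(command))


# Hypothetical sample command lines and whether the hunt should flag them.
SAMPLES = {
    "users_public": (r"powershell.exe -c Add-MpPreference -ExclusionPath C:\Users\Public\Tools", True),
    "env_temp": (r"Set-MpPreference -ExclusionPath $env:TEMP", True),
    "multiline": ("Add-MpPreference `\n    -ExclusionPath $env:APPDATA", True),
    "windows_tracing": (r"Add-MpPreference -ExclusionPath C:\Windows\Tracing\x", True),
    "program_files": (r"Add-MpPreference -ExclusionPath 'C:\Program Files\Vendor'", False),
    "extension_only": (r"Add-MpPreference -ExclusionExtension .ps1", False),
}

for name, (command, expected) in SAMPLES.items():
    print(f"{name:16} hit={is_hit(command)} expected={expected}")
```

The `'C:\Program Files\Vendor'` exclusion is not flagged: it matches the IOA pattern but not the path pattern. That split separates "an exclusion was added" from "an exclusion was added to a risky location."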

Step 3: Search Across Relevant Tables

Don’t limit yourself to a single table. Use the `search in()` operator to look across both `DeviceProcessEvents` and `DeviceEvents` to maximize coverage.
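A second table matters because a command typed into an already-running PowerShell session typically does not spawn a new process. Such a command can surface as a `PowerShellCommand` event in `DeviceEvents` without any matching `DeviceProcessEvents` row. The toy sketch below uses hypothetical records. It applies a plain substring check in place of the regexes, just to show how a single-table scope drops hits.

```python
# Hypothetical toy telemetry: (table, DeviceId, command text).
EVENTS = [
    ("DeviceProcessEvents", "dev-01", r"powershell.exe -c Add-MpPreference -ExclusionPath C:\Users\Public\x"),
    ("DeviceEvents", "dev-02", "Add-MpPreference -ExclusionPath $env:TEMP"),
    ("DeviceEvents", "dev-03", "Set-MpPreference -ExclusionPath $env:APPDATA"),
    ("DeviceProcessEvents", "dev-04", "notepad.exe report.txt"),
]


def hunt(events, tables):
    """Keep events from the given tables whose command mentions ExclusionPath."""
    return [e for e in events if e[0] in tables and "exclusionpath" in e[2].lower()]


single = hunt(EVENTS, {"DeviceProcessEvents"})
both = hunt(EVENTS, {"DeviceProcessEvents", "DeviceEvents"})
print(f"DeviceProcessEvents only: {len(single)} hit(s)")
print(f"both tables:              {len(both)} hit(s)")
```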

Step 4: Filter by Time and Action Type

Scope the hunt to a relevant time range, such as the last 30 days. Then, filter for specific `ActionType` values that are likely to contain the command-line data.

| where Timestamp > ago(30d)
| where ActionType has_any("PowerShellCommand", "ProcessCreated", "ScriptContent")

Step 5: Extract and Normalize the Command

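Use `parse_json()` to pull the command text out of the `AdditionalFields` column. In `DeviceEvents`, this column holds event-specific details such as the PowerShell command or script content. Then use `case()` to choose the most relevant command source: the first of `ScriptContent`, `ProcessCommandLine`, or the `Command` field that matches `IOARegex`. If none matches, it falls back to `InitiatingProcessCommandLine`.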

| extend ScriptContent = tostring(parse_json(AdditionalFields)["ScriptContent"])
| extend AFCommand = tostring(parse_json(AdditionalFields)["Command"])
// case() requires every branch to return the same type, hence tostring() above
| extend PsCommand = case(
    ScriptContent matches regex IOARegex, ScriptContent,
    ProcessCommandLine matches regex IOARegex, ProcessCommandLine,
    AFCommand matches regex IOARegex, AFCommand,
    InitiatingProcessCommandLine)
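The `case()` logic amounts to "first matching source wins." Below is a Python sketch of the same selection. It assumes an event is a dict keyed by the column names used above and that `AdditionalFields` is a JSON string. The `pick_command` helper is hypothetical.

```python
import json
import re

IOA_REGEX = re.compile(r"(?i)(add|set)-MpPreference[\s\S]+ExclusionPath")


def pick_command(event: dict) -> str:
    """Mirror the KQL case(): the first source matching IOA_REGEX wins."""
    extra = json.loads(event.get("AdditionalFields") or "{}")
    candidates = [
        extra.get("ScriptContent") or "",   # script text from the event details
        event.get("ProcessCommandLine") or "",
        extra.get("Command") or "",         # PowerShell command from the event details
    ]
    for text in candidates:
        if IOA_REGEX.search(text):
            return text
    # Else branch: fall back to the parent process command line.
    return event.get("InitiatingProcessCommandLine") or ""


# Hypothetical PowerShellCommand-style event: the match lives in AdditionalFields.
event = {
    "AdditionalFields": json.dumps({"Command": "Add-MpPreference -ExclusionPath $env:TEMP"}),
    "ProcessCommandLine": "",
    "InitiatingProcessCommandLine": "powershell.exe -NoProfile",
}
print(pick_command(event))
```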

Step 6: Apply the Path Regex and Summarize

Filter for commands that match the path regex, then summarize the results to find the most prevalent (most common) commands across devices.

| where PsCommand matches regex PathRegex
// DevCount = distinct devices; arg_max keeps the latest full row per command
| summarize DevCount = dcount(DeviceId), arg_max(Timestamp, *) by PsCommand
| sort by DevCount desc
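`arg_max(Timestamp, *)` keeps the entire most recent row for each group, alongside the aggregates. The Python sketch below reproduces the `summarize` step on hypothetical rows. One difference: KQL's `dcount()` returns an estimate, while this sketch counts distinct devices exactly.

```python
from collections import defaultdict
from datetime import datetime


def summarize(rows: list[dict]) -> list[dict]:
    """Per PsCommand: distinct-device count plus the latest full row (like arg_max)."""
    devices: dict[str, set] = defaultdict(set)
    latest: dict[str, dict] = {}
    for row in rows:
        cmd = row["PsCommand"]
        devices[cmd].add(row["DeviceId"])
        if cmd not in latest or row["Timestamp"] > latest[cmd]["Timestamp"]:
            latest[cmd] = row
    out = [{"DevCount": len(devices[cmd]), **latest[cmd]} for cmd in latest]
    # Equivalent of: sort by DevCount desc
    return sorted(out, key=lambda r: r["DevCount"], reverse=True)


TEMP_CMD = "Set-MpPreference -ExclusionPath $env:TEMP"
PUBLIC_CMD = r"Add-MpPreference -ExclusionPath C:\Users\Public\x"

# Hypothetical rows that already passed the PathRegex filter.
rows = [
    {"PsCommand": TEMP_CMD, "DeviceId": "dev-01", "Timestamp": datetime(2024, 5, 1)},
    {"PsCommand": TEMP_CMD, "DeviceId": "dev-02", "Timestamp": datetime(2024, 5, 3)},
    {"PsCommand": TEMP_CMD, "DeviceId": "dev-01", "Timestamp": datetime(2024, 5, 2)},
    {"PsCommand": PUBLIC_CMD, "DeviceId": "dev-03", "Timestamp": datetime(2024, 5, 4)},
]

for r in summarize(rows):
    print(r["DevCount"], r["Timestamp"].date(), r["DeviceId"], r["PsCommand"])
```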

5. The Prevalence Component: A Key Differentiator

How a hunter defines “prevalence” significantly impacts the value of the hunt. Claude correctly assumed that prevalence should be drawn from the number of distinct devices or users. This approach helps identify commands that are rare and potentially malicious, as opposed to common administrative tasks. In contrast, ChatGPT defaulted to using the total number of events, which is a less effective metric for threat hunting and can be easily skewed by a single noisy host.
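The following Python sketch uses hypothetical telemetry to show how the two prevalence definitions diverge. An administrative command runs once on each of 20 hosts, and a suspect command fires 50 times on one host. Ranked by events, the suspect command looks like the most common activity. Ranked by distinct devices, it is the rarest.

```python
from collections import Counter, defaultdict

# Hypothetical commands, for illustration only.
ADMIN_CMD = "Set-MpPreference -ExclusionPath $env:PROGRAMDATA"
SUSPECT_CMD = "Add-MpPreference -ExclusionPath $env:APPDATA"

# Hypothetical telemetry: the admin command runs once on each of 20 hosts,
# while the suspect command fires 50 times on a single host.
events = [(f"host-{i:02d}", ADMIN_CMD) for i in range(20)]
events += [("host-99", SUSPECT_CMD)] * 50

# Prevalence by raw event count (skewed by the noisy host).
event_counts = Counter(cmd for _, cmd in events)

# Prevalence by distinct devices.
hosts_per_cmd = defaultdict(set)
for device, cmd in events:
    hosts_per_cmd[cmd].add(device)
device_counts = {cmd: len(hosts) for cmd, hosts in hosts_per_cmd.items()}

print("most common by events :", event_counts.most_common(1)[0])
print("rarest by devices     :", min(device_counts.items(), key=lambda kv: kv[1]))
```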

6. What the Robot Taught the Human

Despite the AI's shortcomings, the experiment highlighted a powerful use case: using LLMs to review and improve human-written queries. Claude suggested several additional writable paths that Alex's initial query missed, including `\Windows\Tracing\` and `C:\$Recycle.Bin`. This demonstrates that AI can be an invaluable tool for augmenting human expertise, acting as a second set of eyes to catch oversights and suggest improvements. The most effective current use cases for AI in this domain include code-to-documentation generation, automatic hunt deployment, and code review against best practices.
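One practical way to keep an AI reviewer's suggestions is to turn them into regression tests for the hunt's path pattern. In this Python sketch, `BEFORE` is a hypothetical, simplified earlier pattern without the Tracing and Recycle Bin alternatives; it is not Alex's actual query. `AFTER` adds those alternatives back, and `uncovered` reports which candidate paths a pattern misses.

```python
import re

# Hypothetical, simplified earlier path pattern without the Tracing and Recycle Bin alternatives.
BEFORE = re.compile(r"(?i)c:[\\]+(users|programdata|windows[\\]+temp)[\\]")
# The same pattern with the reviewer's suggested locations added.
AFTER = re.compile(r"(?i)c:[\\]+(users|programdata|windows[\\]+(temp|tracing)|\$Recycle\.Bin)[\\]")

# Candidate user-writable locations, including the two the review surfaced.
CANDIDATES = [
    r"C:\Users\Public\x",
    r"C:\ProgramData\x",
    r"C:\Windows\Temp\x",
    r"C:\Windows\Tracing\x",
    r"C:\$Recycle.Bin\x",
]


def uncovered(pattern: re.Pattern, paths: list[str]) -> list[str]:
    """Return the candidate paths the pattern fails to match."""
    return [p for p in paths if not pattern.search(p)]


print("missed before review:", uncovered(BEFORE, CANDIDATES))
print("missed after review :", uncovered(AFTER, CANDIDATES))
```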

7. Linux and Windows Commands for Complementary Analysis

While the core of this article focuses on KQL within the Microsoft Defender ecosystem, threat hunting often requires a multi-platform approach. Here are some complementary commands for both Linux and Windows environments that can aid in detecting similar defense evasion techniques.

Windows Commands (PowerShell):

List Defender Path Exclusions (requires an elevated session):

Get-MpPreference | Select-Object -ExpandProperty ExclusionPath

Monitor for Defender Configuration Changes (event ID 5007 in the Defender Operational log):

Get-WinEvent -LogName "Microsoft-Windows-Windows Defender/Operational" | Where-Object { $_.Id -eq 5007 }

Query for Suspicious PowerShell Commands (PowerShell Operational log; script block logging, event ID 4104, records the script text):

Get-WinEvent -LogName "Microsoft-Windows-PowerShell/Operational" | Where-Object { $_.Message -match "Add-MpPreference|Set-MpPreference" }

Linux Commands (Bash):

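Review SELinux and AppArmor Access Denials (AVC records), which can reveal attempts to tamper with security tooling. AppArmor logs "DENIED" in uppercase, hence the case-insensitive grep:

sudo ausearch -m avc -ts recent | grep -i "denied"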

Monitor for Suspicious Process Execution (using `auditd`):

sudo auditctl -a always,exit -F arch=b64 -S execve -k process_monitor
sudo ausearch -k process_monitor -ts recent

Check for Cron Jobs or Systemd Timers that could be used for Persistence:

crontab -l
systemctl list-timers --all

What Undercode Says:

  • AI is a Powerful Assistant, Not a Replacement: Current LLMs excel at generating syntactically correct code and can significantly speed up the prototyping process. However, they lack the deep domain knowledge, contextual understanding, and critical thinking required to produce production-ready hunt queries that are both effective and efficient. The human hunter’s query was not only shorter but also captured four times as many true positives as the best AI-generated query.
  • The Human Edge Lies in Experience and Adaptability: An experienced hunter knows which tables to query, understands the nuances of different log sources, and can craft flexible regex patterns that catch real-world adversary behavior. This expertise, built over years of analyzing attacks, allows for the creation of concise, high-fidelity detections that AI models, limited by their training data, cannot yet replicate.

Analysis: The experiment underscores a critical point for the cybersecurity community: AI is not a magic bullet. While it can democratize access to complex query languages and help bridge the skills gap, it cannot replace the intuition and adaptability of a seasoned professional. The most effective approach is a symbiotic one, where hunters use AI to augment their workflow—for code review, path normalization, and rapid prototyping—while maintaining ultimate responsibility for the logic and efficacy of their hunts. The future of threat hunting lies not in replacing humans with machines, but in creating powerful human-machine teams that leverage the strengths of both.

Prediction:

  • +1 The integration of AI into Security Operations Centers (SOCs) will continue to accelerate, with AI agents taking on more routine tasks such as alert triage, data normalization, and initial query drafting. This will free up human analysts to focus on complex, high-value investigations and strategic threat hunting.
  • -1 A significant skills gap will emerge as organizations become over-reliant on AI-generated content. Junior analysts may lack the foundational knowledge to critically evaluate AI output, leading to a proliferation of “false sense of security” detections that miss sophisticated attacks. This will necessitate a renewed emphasis on fundamental security training and hands-on experience.
  • +1 The development of specialized, fine-tuned models for cybersecurity will be a major growth area. Models trained exclusively on security telemetry, query languages, and adversary tradecraft will outperform general-purpose LLMs, bringing us closer to the goal of effective AI-assisted threat hunting.
  • -1 The current limitations of AI, particularly in schema awareness and contextual reasoning, will be exploited by adversaries. Attackers will develop techniques specifically designed to evade AI-driven detection systems, creating a new cat-and-mouse game that requires constant human oversight and adaptation.

Reported By: Inode Ai – Hackers Feeds