2025-02-15
Attackers don't break in; they log in. They move laterally, chaining misconfigurations, privilege escalations, and trust relationships to reach their objective. Successful attack path reasoning requires a system that can:
- Recall prior findings dynamically: if credentials are found on Machine A, the system must remember to test them when reaching Machine G.
- Adapt when an attack fails: if a password spray fails, pivot to Kerberoasting or check NTLM relays.
- Track multi-hop relationships: if an identity provider is compromised, understand how it impacts lateral movement into cloud environments.
- Persist state across an entire network: enterprise networks are complex, with thousands of interdependencies that can't fit in a single prompt.
Humans do this instinctively, but LLMs struggle because of limited context windows, weak memory retention, and failures in multi-step reasoning.
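
The multi-hop requirement above can be sketched as a small graph search. This is a minimal illustration, not a real tool: the host names, relationship labels, and the `find_attack_path` helper are all hypothetical and do not come from the article. The point is only that each hop depends on state discovered at an earlier hop.

```python
from collections import deque

# Hypothetical trust relationships: (source, relationship, target).
# None of these names come from the article; they are placeholders.
EDGES = [
    ("workstation-a", "stored_credentials_for", "svc-backup"),
    ("svc-backup", "local_admin_on", "file-server"),
    ("file-server", "has_session_of", "idp-admin"),
    ("idp-admin", "administers", "identity-provider"),
    ("identity-provider", "federates_to", "cloud-tenant"),
]


def build_graph(edges):
    """Turn the edge list into an adjacency map: node -> [(relationship, target)]."""
    graph = {}
    for src, rel, dst in edges:
        graph.setdefault(src, []).append((rel, dst))
    return graph


def find_attack_path(graph, start, goal):
    """Breadth-first search; returns the shortest list of (src, rel, dst) hops, or None."""
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for rel, nxt in graph.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, path + [(node, rel, nxt)]))
    return None


graph = build_graph(EDGES)
path = find_attack_path(graph, "workstation-a", "cloud-tenant")
for src, rel, dst in path:
    print(f"{src} --{rel}--> {dst}")
```

Keeping the relationships in an explicit structure like this, rather than in the prompt, is one way to persist state that would not fit in a single context window.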
Very Long Context Windows
Current LLMs suffer from fixed context windows and struggle to retain key information across multi-step reasoning tasks. This leads to:
1. Truncation Killing Multi-Step Attacks: If a model loses track of credentials found early in a pentest, it won't attempt lateral movement later.
2. Attention Decay Breaking Attack Chains: Transformer-based models often deprioritize older information in favor of recent input.
3. Non-Trivial Querying of Past Findings: External memory and RAG (Retrieval-Augmented Generation) are needed to fetch past discoveries dynamically.
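
As a rough sketch of the external-memory idea, the snippet below keeps findings outside the prompt and looks them up when a new host is reached. It uses a plain tag filter as a stand-in for the retrieval step; a real RAG setup would more likely use embeddings and a vector index. The `Finding` and `FindingsStore` classes, the host names, and the account name are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    host: str
    kind: str      # for example "credential" or "open_port"
    detail: str
    tags: set = field(default_factory=set)


class FindingsStore:
    """Holds findings outside the prompt so they survive context truncation."""

    def __init__(self):
        self._findings = []

    def add(self, finding):
        self._findings.append(finding)

    def query(self, kind=None, tags=None):
        """Return findings matching the kind (if given) and carrying all requested tags."""
        wanted = set(tags or [])
        return [
            f for f in self._findings
            if (kind is None or f.kind == kind) and wanted <= f.tags
        ]


def credentials_to_try(store, domain):
    """Recall every credential previously found for the given domain."""
    return [f.detail for f in store.query(kind="credential", tags={domain})]


store = FindingsStore()
# Found early in the engagement on Machine A (placeholder values).
store.add(Finding("machine-a", "credential", "CORP\\svc-backup", {"corp.example"}))
store.add(Finding("machine-b", "open_port", "445/tcp", {"smb"}))

# Much later, on reaching Machine G, the agent asks the store instead of its own context.
print(credentials_to_try(store, "corp.example"))
```

The lookup is explicit and deterministic, so the credentials from Machine A are still available at Machine G no matter how much conversation has scrolled out of the window.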
A pentest with 20k hosts could generate a graph with 1B nodes and edges, consuming ~100B tokens in current LLM architectures. This is why infrastructure pentesting requires a fit-for-purpose stack.
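
The scale in that estimate can be checked with some back-of-the-envelope arithmetic. The host, graph, and token figures are taken from the paragraph above; the 128,000-token context window is an assumption chosen only for illustration.

```python
HOSTS = 20_000                      # from the article
GRAPH_ELEMENTS = 1_000_000_000      # nodes and edges, from the article
TOTAL_TOKENS = 100_000_000_000      # ~100B tokens, from the article
ASSUMED_CONTEXT_WINDOW = 128_000    # assumption: tokens per prompt, not from the article

# Implied averages behind the article's figures.
elements_per_host = GRAPH_ELEMENTS // HOSTS
tokens_per_element = TOTAL_TOKENS // GRAPH_ELEMENTS

# Ceiling division: how many full prompts it would take to read the graph once.
windows_needed = -(-TOTAL_TOKENS // ASSUMED_CONTEXT_WINDOW)

print(f"graph elements per host: {elements_per_host:,}")
print(f"tokens per graph element: {tokens_per_element}")
print(f"context windows needed: {windows_needed:,}")
```

Under these assumptions the graph works out to about 100 tokens per element and would need several hundred thousand separate prompts just to be read once, before any reasoning across them.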
Why Application Pentesting is Easier
Web and API pentests operate in short, stateless interactions:
- SQL Injection: Send a payload, check for a response.
- XSS: Inject JavaScript, see if it executes.
- Broken Authentication: Try a few crafted requests.
Each vulnerability is detected in isolation, meaning the model doesn't need long-term memory or multi-step reasoning. This is why application pentesting is easier for LLMs:
1. Small context windows: A web request plus a response typically fits within a few kilobytes of memory.
2. Minimal state tracking: Each attack is a single atomic event.
3. Massive training data availability: Open-source web apps provide extensive training data.
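
To make the "single atomic event" point concrete, here is a minimal sketch of a stateless check: one payload, one response, one verdict. The request and response strings are made up for illustration, and `is_reflected` is a hypothetical helper. An unescaped reflection is only a hint of possible XSS, not proof that the script executes.

```python
import html


def is_reflected(payload: str, response_body: str) -> bool:
    """True if the payload appears verbatim (unescaped) in the response body."""
    return payload in response_body


def exchange_size_bytes(request: str, response: str) -> int:
    """Size of one request/response pair when encoded as UTF-8."""
    return len(request.encode("utf-8")) + len(response.encode("utf-8"))


payload = "<script>alert(1)</script>"
request = f"GET /search?q={payload} HTTP/1.1\r\nHost: example.com\r\n\r\n"

# Two made-up responses: one echoes the payload raw, one HTML-escapes it.
vulnerable_response = f"<html><body>Results for {payload}</body></html>"
escaped_response = f"<html><body>Results for {html.escape(payload)}</body></html>"

print(is_reflected(payload, vulnerable_response))
print(is_reflected(payload, escaped_response))
print(exchange_size_bytes(request, vulnerable_response), "bytes")
```

The function takes no history and keeps no state, and the whole exchange is well under a kilobyte here, which is the contrast with the infrastructure case.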
Practice-Verified Commands and Code
For infrastructure pentesting, here are some practical commands:
- Kerberoasting:
GetUserSPNs.py -request -dc-ip <DC_IP> <DOMAIN>/<USER>:<PASSWORD>
- NTLM Relay:
ntlmrelayx.py -t <TARGET_IP> -smb2support
- Password Spraying:
crackmapexec smb <TARGET_IP> -u <USER_LIST> -p <PASSWORD_LIST>
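
The "adapt when an attack fails" behavior described earlier could be sketched as a simple fallback order over the three commands above. This is an illustrative assumption, not the author's method: the ordering, the `next_technique` helper, and the builder functions are hypothetical, and the code only assembles argument lists without running anything. The flags are the ones shown in the commands above, and the angle-bracket placeholders are left as-is.

```python
import shlex

# Assumed fallback order, following the article's example:
# if the spray fails, try Kerberoasting, then NTLM relay.
FALLBACK_ORDER = ["password_spray", "kerberoast", "ntlm_relay"]


def spray_cmd(target_ip, user_list, password_list):
    return ["crackmapexec", "smb", target_ip, "-u", user_list, "-p", password_list]


def kerberoast_cmd(dc_ip, domain, user, password):
    return ["GetUserSPNs.py", "-request", "-dc-ip", dc_ip, f"{domain}/{user}:{password}"]


def relay_cmd(target_ip):
    return ["ntlmrelayx.py", "-t", target_ip, "-smb2support"]


def next_technique(failed):
    """Return the first technique in FALLBACK_ORDER not yet marked as failed, or None."""
    for name in FALLBACK_ORDER:
        if name not in failed:
            return name
    return None


failed = {"password_spray"}  # pretend the spray produced no valid logins
choice = next_technique(failed)
print(choice)
if choice == "kerberoast":
    print(shlex.join(kerberoast_cmd("<DC_IP>", "<DOMAIN>", "<USER>", "<PASSWORD>")))
```

The set of failed techniques is exactly the kind of state that has to be tracked outside the model's context for the pivot to happen reliably.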
For application pentesting:
- SQL Injection:
sqlmap -u "http://example.com/page?id=1" --dbs
- XSS Testing:
xsstrike -u "http://example.com/search?q=test"
What Undercode Say
AI in cybersecurity, particularly for infrastructure pentesting, faces significant challenges due to the complexity of enterprise networks and the limitations of current LLMs. While application pentesting benefits from smaller context windows and stateless interactions, infrastructure pentesting requires advanced memory retention, multi-step reasoning, and dynamic adaptation.
To address these challenges, cybersecurity professionals must leverage fit-for-purpose tools and techniques. For example, using Kerberoasting and NTLM relay attacks requires precise command execution and understanding of network trust relationships. Similarly, tools like `crackmapexec` and `sqlmap` are indispensable for password spraying and SQL injection testing.
As AI continues to evolve, integrating external memory systems like RAG and improving context window sizes will be critical for advancing infrastructure pentesting capabilities. Until then, human expertise remains irreplaceable in navigating the intricate web of enterprise networks.
References:
Hackers Feeds, Undercode AI


