
Introduction:
The middle‑end of the LLVM compiler is a battlefield. Obfuscators try to hide program logic by inserting junk code, opaque predicates and mixed Boolean arithmetic (MBA), while optimizers relentlessly simplify everything back to its most trivial form. This tug‑of‑war reached a new level with LLVM 19: a single commit that teaches InstCombine to apply De Morgan’s law recursively turned a previously robust MBA‑based obfuscation into a plain `ret i8 0` – and changed the game for both red and blue teams.
Learning Objectives:
– Understand how LLVM’s middle‑end optimisation passes (Early‑CSE, InstCombine, Reassociate, GVN, BDCE) dismantle obfuscated code.
– Learn to generate and inspect LLVM IR, run custom pass pipelines, and identify why a specific commit broke a whole class of obfuscation.
– Explore practical counter‑measures – from inline assembly and live variables to subtle constant tweaks – that can still survive the optimiser.
You Should Know:
1. LLVM IR: The Language of the Arms Race
The battlefield is LLVM Intermediate Representation (IR). It is largely target‑independent, simple, and the place where all middle‑end optimisations happen. If you can produce IR that the optimiser cannot simplify, your obfuscation holds; if an analyst can run the same passes, they can de‑obfuscate your code.
Step‑by‑step guide to generating and inspecting LLVM IR
1. Generate unoptimised IR from C code
clang -O0 -Xclang -disable-O0-optnone victim.c -S -emit-llvm -o victim.ll
The `-disable-O0-optnone` flag prevents the compiler from marking functions as “optnone”, which would block later optimisation passes.
2. Run a full O2 optimisation pipeline
opt victim.ll -O2 -S -o victim_opt.ll
This applies a carefully tuned sequence of passes (Early‑CSE, InstCombine, GVN, BDCE, etc.) and shows what a standard optimisation pipeline can simplify.
3. Run only a specific pass to isolate its effect
opt victim.ll -passes=early-cse -S
Early‑CSE (Common Subexpression Elimination) is often the first pass that folds constants and eliminates redundant code. In the blog’s first example, Early‑CSE alone reduced the whole mystery function to `ret i8 0`.
4. Use Compiler Explorer (godbolt.org) for interactive learning
– Select “LLVM IR” on the left and “opt 18.1.0” (or 19.1.7) on the right.
– Add `-O2` as a compiler option.
– Click “Add New → Opt Pipeline” to see exactly which passes run and in what order.
5. Generate an object file from IR
llc -filetype=obj victim.ll -o victim.o
This step lowers the IR to machine code for a specific architecture (e.g. x86).
What this does and how to use it for obfuscation or de‑obfuscation
By examining the IR before and after optimisation, you see exactly which constructs survive and which get eliminated. For an obfuscator author, this is a feedback loop: keep tweaking until the IR resists the pipeline. For a reverse engineer, it is a way to lift binary code into IR and let the same passes simplify the logic before decompilation.
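To try the commands above, a minimal, hypothetical `victim.c` could look like the sketch below. It is not the blog's code: it simply hides `x + y` behind the well‑known linear MBA identity `x + y = (x ^ y) + 2 * (x & y)`, so you can compare `victim.ll` with `victim_opt.ll` and see whether your LLVM version recovers the plain addition.

```c
#include <stdint.h>

/* Hypothetical victim function: computes x + y (mod 256) without a
   direct '+' between x and y, using the MBA identity
   x + y == (x ^ y) + 2 * (x & y).
   XOR adds the bits without carries; AND finds the carry positions,
   and multiplying by 2 moves each carry into the next bit. */
uint8_t mba_add(uint8_t x, uint8_t y) {
    return (uint8_t)((x ^ y) + 2 * (x & y));
}
```

Compile it with the `clang -O0 -Xclang -disable-O0-optnone` command from step 1, then diff the IR before and after `opt -O2`.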
2. The Classic Mistake: Pure Constant Expressions
The blog’s first mystery function was a long chain of bitwise and arithmetic operations on constants:
define i8 @mystery() {
%notx = xor i8 40, -1
%a = or i8 %notx, -101
%b = and i8 %a, 65
; ... and so on ...
ret i8 %r
}
Even though the code looked “complex”, the optimiser saw that every input was a compile‑time constant and simply folded the whole function into `ret i8 0`.
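The same folding happens even before LLVM sees the code. As a hedged illustration, the first three IR instructions of the excerpt above can be written in C as an integer constant expression, which the C compiler must evaluate at translation time; `_Static_assert` (C11) makes that visible. The name `MYSTERY_PREFIX` is hypothetical, and the elided rest of the chain is not reproduced.

```c
#include <stdint.h>

/* First three steps of the constant chain, mirrored from the IR:
     %notx = xor i8 40, -1      -> (uint8_t)~40          == 215
     %a    = or  i8 %notx, -101 -> 215 | (uint8_t)-101   == 223
     %b    = and i8 %a, 65      -> 223 & 65              == 65   */
enum {
    MYSTERY_PREFIX = ((uint8_t)~40 | (uint8_t)-101) & 65
};

/* Compilation fails unless the value is 65: every input is a
   compile-time constant, which is exactly what constant folding exploits. */
_Static_assert(MYSTERY_PREFIX == 65, "constant chain folds to 65");

uint8_t mystery_prefix(void) {
    return MYSTERY_PREFIX;
}
```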
Step‑by‑step guide to defeating constant folding (and why it fails)
1. Identify the problem – constant propagation (SCCP) and instruction combining (InstCombine) will evaluate expressions that have no runtime dependencies.
2. Introduce a runtime‑unknown value – even a single byte read from a register prevents the compiler from pre‑computing the result.
%unknown = call i8 asm "", "=r"()
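This inline assembly emits no instructions; it simply hands back whatever happens to be in the chosen register. Because LLVM treats inline assembly as opaque, the optimiser cannot predict that value, so the expression that consumes it stays alive.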
3. Weave the unknown value into the existing data flow – instead of an isolated `asm` block, use a function parameter or a value loaded from memory:
define i8 @mystery(i8 %arg1) { ... }
Now the result depends on the input, and the optimiser can no longer fold everything to a constant.
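A C‑level sketch of step 2, assuming GCC or Clang extended inline assembly (MSVC does not support this syntax). Instead of the `"=r"` output used in the IR above, which yields whatever is already in the register, this variant uses the common `"+r"` idiom: the empty `asm` claims to read and modify `x`, so the optimiser must treat the value as unknown, yet at run time it is unchanged. The function names are hypothetical.

```c
#include <stdint.h>

/* Hide a value from the optimiser. The asm body is empty, so nothing
   happens at run time, but the "+r" constraint tells the compiler that
   x may have been rewritten, which blocks constant propagation. */
static inline uint8_t launder_u8(uint8_t x) {
    __asm__ volatile("" : "+r"(x));
    return x;
}

/* The first steps of the constant chain, now fed from a laundered 40.
   The compiler can no longer fold this result at compile time. */
uint8_t mystery_prefix_opaque(void) {
    uint8_t v    = launder_u8(40);
    uint8_t notx = (uint8_t)~v;             /* xor i8 %v, -1                */
    uint8_t a    = (uint8_t)(notx | 0x9B);  /* or  i8 %notx, -101 (== 0x9B) */
    uint8_t b    = (uint8_t)(a & 65);       /* and i8 %a, 65                */
    return b;
}
```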
Why this still fails in LLVM 19
When the expression is straight‑line code (no branching) and uses the unknown value only once, the new recursive De Morgan inversion in InstCombine can still prove that it evaluates to zero for every input and simplify it accordingly. The blog shows that with LLVM 19, even the parameterised version becomes `ret i8 0`.
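Because an `i8` argument has only 256 possible values, you can confirm this kind of collapse by brute force before the optimiser does it symbolically. The sketch below assumes you have a C re‑implementation of the expression; `poly_zero` is a hypothetical stand‑in built from the identity `x + c = (x ^ c) + 2 * (x & c)`, not the blog's actual expression, which the excerpt elides.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical expression that looks input-dependent but is always 0:
   (x ^ c) + 2*(x & c) equals x + c, so subtracting x + c cancels it. */
uint8_t poly_zero(uint8_t x) {
    const uint8_t c = 0x5A;
    return (uint8_t)((x ^ c) + 2 * (x & c) - (x + c));
}

/* Evaluate f on all 256 inputs. Returns true and stores the common
   result in *out when f is constant, i.e. when an optimiser would be
   entitled to replace the whole function with "ret i8 <value>". */
bool constant_over_u8(uint8_t (*f)(uint8_t), uint8_t *out) {
    uint8_t first = f(0);
    for (int x = 1; x < 256; ++x) {
        if (f((uint8_t)x) != first)
            return false;
    }
    *out = first;
    return true;
}
```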
3. Mixed Boolean Arithmetic (MBA) and Its Fragile Assumptions
MBA expressions mix arithmetic (`+`, `-`) and bitwise operations (`&`, `|`, `^`) in a way that is supposed to be hard to simplify. The obfuscation in the blog uses a semi‑linear MBA that was resistant to LLVM 18 but collapsed under LLVM 19 because of a single algebraic rule: De Morgan’s law.
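A linear MBA rewrites a simple operation as a weighted sum of bitwise terms. The hypothetical sketch below (not the blog's construction) encodes `x - y` with the standard identity `x - y = (x ^ y) - 2 * (~x & y)` and checks it exhaustively over all 8‑bit input pairs, which is a cheap way for either side to validate an MBA rewrite.

```c
#include <stdbool.h>
#include <stdint.h>

/* Linear MBA form of x - y (mod 256): XOR gives the borrow-free
   difference, and (~x & y) marks the bit positions that need a borrow. */
uint8_t mba_sub(uint8_t x, uint8_t y) {
    return (uint8_t)((x ^ y) - 2 * ((uint8_t)~x & y));
}

/* Exhaustively compare the MBA form with plain subtraction. */
bool mba_sub_is_correct(void) {
    for (int x = 0; x < 256; ++x) {
        for (int y = 0; y < 256; ++y) {
            if (mba_sub((uint8_t)x, (uint8_t)y) != (uint8_t)(x - y))
                return false;
        }
    }
    return true;
}
```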
Step‑by‑step guide to understanding the kill‑chain
1. Start with the obfuscated expression (simplified from the blog):
%r = xor ( ( ( ( ( ( (xor %arg1, -1) | -101 ) & 65 ) + ... ) ) ), -1 )
2. LLVM 19’s InstCombine applies De Morgan
The commit [`cf5cd98`](https://github.com/llvm/llvm-project/commit/cf5cd98e74275ed6198b4bbe76cec250ade2c186) teaches `getFreelyInvertedImpl` to push `~(A | B)` → `(~A & ~B)` recursively through the expression.
3. The NOT layers peel away
Once the outer `xor` with `-1` (a bitwise NOT) is pushed through the ORs and ANDs, two identical terms end up on opposite sides of a subtraction, and the expression reduces to something like `x - x`.
4. GVN (Global Value Numbering) recognises that both terms compute the same value, and the resulting subtraction of a value from itself folds to 0, leaving `ret i8 0`.
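The kill chain above can be reproduced in miniature. In the hypothetical sketch below (not the blog's expression), the left operand hides `x & y` behind a double NOT; once De Morgan's law turns `~(~x | ~y)` into `x & y`, both sides of the subtraction are the same value and the result is 0 for every input, which an exhaustive check confirms.

```c
#include <stdbool.h>
#include <stdint.h>

/* Before simplification this looks like two unrelated terms. */
uint8_t peel(uint8_t x, uint8_t y) {
    uint8_t left  = (uint8_t)~((uint8_t)~x | (uint8_t)~y); /* De Morgan: == x & y */
    uint8_t right = x & y;
    return (uint8_t)(left - right);                         /* x - x == 0 */
}

/* Check De Morgan's law and the cancellation for all 8-bit pairs. */
bool peel_always_zero(void) {
    for (int x = 0; x < 256; ++x) {
        for (int y = 0; y < 256; ++y) {
            uint8_t a = (uint8_t)x, b = (uint8_t)y;
            if ((uint8_t)~(a | b) != (uint8_t)(~a & ~b))
                return false;   /* ~(A | B) must equal ~A & ~B */
            if (peel(a, b) != 0)
                return false;   /* identical terms must cancel */
        }
    }
    return true;
}
```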
How to survive this (temporarily)
The blog’s final experiment changes the constants `65` and `-65` to `66` and `-66`. That small tweak misaligns the bit‑masks that InstCombine needs for its algebraic cancellation, and even LLVM 22.1.0 cannot fold it.
Lesson: Obfuscation that depends on a specific algebraic pattern is inherently fragile – the moment the compiler learns one new rewrite rule, the whole construction breaks.
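Before trusting a constant tweak, it is worth checking whether it changed what the code computes; the excerpt does not say whether the blog's 65→66 variant is semantically equivalent to the original. The hypothetical sketch below counts the inputs on which two 8‑bit functions disagree. `orig_expr` and `tweaked_expr` are illustrative stand‑ins, not the blog's expressions.

```c
#include <stdint.h>

/* Hypothetical original: (x | 65) splits exactly into (x & 65) + (x ^ 65),
   so the result is 0 for every x and can fold to a constant. */
uint8_t orig_expr(uint8_t x) {
    return (uint8_t)((x | 65) - (x & 65) - (x ^ 65));
}

/* Same shape with one mask changed to 66: the exact cancellation is gone. */
uint8_t tweaked_expr(uint8_t x) {
    return (uint8_t)((x | 66) - (x & 65) - (x ^ 65));
}

/* Number of 8-bit inputs on which f and g return different values. */
int count_mismatches(uint8_t (*f)(uint8_t), uint8_t (*g)(uint8_t)) {
    int n = 0;
    for (int x = 0; x < 256; ++x) {
        if (f((uint8_t)x) != g((uint8_t)x))
            ++n;
    }
    return n;
}
```

A nonzero count means the tweak changed the function's behaviour, not just its shape, so the surrounding code must still produce the intended result.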
4. Linux / Windows Commands for Analysing Obfuscated Binaries
The same LLVM passes that break obfuscation can be used to de‑obfuscate binaries – after they are lifted back to IR.
Linux (using Ghidra or Binary Ninja + LLVM)
– Lift the binary to LLVM IR (simplified): use `llvm-mctoll` (a Microsoft‑maintained binary raiser built on LLVM, not part of the main LLVM distribution) or `retdec`.
– Run optimisation passes on the lifted IR
opt lifted.ll -O2 -S -o cleaned.ll
The optimiser will fold constant expressions, eliminate dead stores and simplify control flow, often revealing the original logic.
– Compile the cleaned IR back to an object file with `llc` and open it in a decompiler, or read the simplified IR directly.
Windows (using IDA + Hex‑Rays or Ghidra + LLVM)
– Load the PE file in Ghidra and export LLVM IR through a plugin, or use `retdec` (which runs on Windows as well).
– Run the same `opt` pipeline (available via the LLVM Windows binaries) to simplify the expression.
A short PowerShell sequence to test a suspicious function, assuming RetDec and the LLVM tools are on your PATH (RetDec option names vary between releases; confirm them with `retdec-decompiler --help`):
retdec-decompiler.exe suspect.dll --target raw --backend-llvmir-output suspect.ll
opt.exe suspect.ll -O2 -S -o suspect_opt.ll
findstr "ret i8" suspect_opt.ll
If the result is a constant (e.g., `ret i8 0`), the “secret” value is no secret at all.
5. Hardening Obfuscation Against Future Optimisers
The arms race never stops. What works today against LLVM 19 may be broken in LLVM 20. Here are practical steps to increase resilience.
Step‑by‑step guide to building more durable obfuscation
1. Avoid pure linear MBA expressions – introduce real control flow (e.g., opaque predicates that look like they depend on input but always evaluate the same way).
volatile int x = arg1; // prevent constant propagation
if ((x & 1) == 0) { ... } else { ... } // both paths lead to same result
2. Use live variables that are not trivially eliminated – route obfuscated values through memory (the stack) instead of keeping them in SSA registers. SROA (Scalar Replacement of Aggregates) and MemCpyOpt may still promote or forward them, but at least you force the compiler to work harder.
3. Insert opaque predicates that are hard to prove – for example, a comparison that uses an input value in a way that the compiler cannot prove is always true or always false.
if (((arg1 0x12345678) ^ 0xDEADBEEF) == 0x12345678) { ... }
4. Apply the “constant tweak” trick from the blog – change constants slightly (e.g., 65→66) to break the exact algebraic identities that InstCombine relies on. This is fragile but demonstrates the principle.
5. Write your own LLVM pass that re‑obfuscates after optimisation – run the standard `-O2` pipeline, then add a custom pass that re‑scrambles the IR. This way you benefit from the compiler’s own simplification while still producing obfuscated output.
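Steps 1 and 3 can be combined in a small C sketch, assuming GCC or Clang. `opaque_true` uses a textbook opaque predicate: `x * (x + 1)` multiplies two consecutive integers, so it is always even, and that parity survives wrap‑around modulo 2^32. The `volatile` copy keeps the input opaque, as in step 1. All names and constants are hypothetical, and whether a given LLVM version's known‑bits analysis can prove the predicate is exactly what you would check with `opt -O2`.

```c
#include <stdint.h>

/* Textbook opaque predicate: x * (x + 1) is a product of two consecutive
   integers, so it is always even; parity survives wrap-around mod 2^32. */
static int opaque_true(uint32_t x) {
    return ((x * (x + 1u)) & 1u) == 0u;
}

/* Hypothetical protected function: the decoy branch is never taken at run
   time, but the optimiser must first prove that opaque_true() always holds. */
uint32_t protected_value(uint32_t arg1) {
    volatile uint32_t x = arg1;   /* volatile: the read below is an opaque load */
    if (opaque_true(x))
        return arg1 ^ 0x5Au;      /* real computation */
    return arg1 + 0x1234u;        /* decoy computation, dead at run time */
}
```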
What Undercode Says:
– The middle‑end of a compiler is a double‑edged sword: the same passes that obfuscators try to evade are the ones reverse engineers use to de‑obfuscate code. Mastering LLVM IR and its optimisation pipeline is a superpower for both sides.
– A single algebraic rewrite (recursive De Morgan inversion) added in LLVM 19 was enough to collapse an MBA‑based obfuscation that LLVM 18 could not simplify. This shows that relying on “hard” mathematical identities is a losing strategy unless you constantly adapt.
– The only durable approach is to embrace the arms race: understand exactly what the compiler can prove about your code, and introduce runtime randomness or control‑flow constructs that resist symbolic evaluation. Meanwhile, defenders should aggressively lift binaries into LLVM IR and apply the full `-O2` pipeline to de‑obfuscate malware – the compiler is the best de‑obfuscator you never knew you had.
Prediction:
– +1 Obfuscation will move away from pure MBA and constant‑based tricks toward dynamic, self‑modifying code that relies on runtime values – forcing LLVM and other compilers to add new analysis passes, which in turn will be used by RE tools.
– +1 The LLVM middle‑end will continue to incorporate more algebraic identities (associativity, commutativity, distributive laws), making a larger class of “hard” expressions trivial to simplify. This will benefit both optimising compilers and de‑obfuscation pipelines.
– -1 Many currently deployed obfuscation tools (including commercial protectors) that rely on static MBA expressions will become ineffective against LLVM‑19+ based de‑obfuscators, forcing vendors to push emergency updates or risk losing customers.
– -1 Reverse engineering will become more accessible to junior analysts: with a simple `opt -O2` command, they can now break obfuscation that previously required manual analysis. This lowers the barrier to malware analysis but also makes it easier for attackers to test and break protections.
IT/Security Reporter URL:
Reported By: [Aleborges Obfuscation](https://www.linkedin.com/posts/aleborges_obfuscation-llvm-infosec-share-7469383796774035456-M0_x/) – Hackers Feeds


