
Introduction:
Large language models have a verbosity problem. When you ask Claude Code to debug a React re-render or fix authentication middleware, it often responds with paragraphs of pleasantries, hedging, and redundant explanations—all of which consume expensive output tokens without adding technical value. The Caveman plugin (84.3K GitHub stars and climbing) solves this by forcing AI coding agents to communicate like prehistoric humans: stripped of filler, packed with signal. What started as a viral joke has become one of the most practical token-optimization tools in the AI developer ecosystem, backed by real benchmarks and a March 2026 academic paper showing that brevity constraints can improve model accuracy by up to 26 points on certain benchmarks.
Learning Objectives:
- Understand how Caveman achieves 65–75% output token reduction while preserving 100% technical accuracy
- Master the one-line installation process across Claude Code, Codex, Gemini, Cursor, and 30+ other agents
- Learn to configure intensity levels from `lite` to `ultra` and even `wenyan` (classical Chinese) for maximum compression
- Implement Caveman’s companion tools: commit message compression, one-line PR reviews, and input-file compression for CLAUDE.md
- Evaluate when Caveman delivers massive savings and when it should be turned off (per the project’s own HONEST-NUMBERS.md)
1. What Caveman Actually Does, and What It Leaves Alone
Caveman is not a model replacement or a fine-tune. It is a skill/plugin that injects a system-level instruction into your AI agent, telling it to drop verbose output and follow a caveman-style communication pattern. The plugin preserves everything that matters:
- Code blocks — written normally, not compressed
- Technical terms — “polymorphism” stays “polymorphism”
- Error messages — quoted exactly as they appear
- Git commits and PRs — written in standard format
What Caveman removes is the fluff: articles (“a,” “an,” “the”), pleasantries (“Sure, I’d be happy to help”), hedging (“It might be worth considering”), and any narrative filler that doesn’t convey technical information.
The before/after comparison is stark. A normal Claude response explaining a React re-render bug uses 69 tokens: “The reason your React component is re-rendering is likely because you’re creating a new object reference on each render cycle…” Caveman Claude delivers the same fix in 19 tokens: “New object ref each render. Inline object prop = new ref = re-render. Wrap in useMemo.” That’s a 72% reduction with zero loss in technical accuracy.
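The percentage quoted above is plain arithmetic on the two token counts. A minimal sketch for checking it against your own before/after counts; the function name `reduction_pct` is illustrative and not part of Caveman:

```python
def reduction_pct(normal_tokens: int, caveman_tokens: int) -> float:
    """Percentage of output tokens removed relative to the normal response."""
    if normal_tokens <= 0:
        raise ValueError("normal_tokens must be positive")
    return 100.0 * (normal_tokens - caveman_tokens) / normal_tokens


# Token counts quoted in the React re-render example above.
print(f"{reduction_pct(69, 19):.0f}% fewer output tokens")
```

The same function works for any pair of counts reported by your agent's usage stats.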
2. Installation: One Line, 30 Seconds, Zero Telemetry
Caveman installs across every supported agent on your machine automatically. The installation script detects Claude Code, Cursor, Codex, Gemini, Windsurf, Cline, Copilot, and 30+ other agents, then runs each agent’s native installation path.
For macOS / Linux / WSL / Git Bash:
curl -fsSL https://raw.githubusercontent.com/JuliusBrussee/caveman/main/install.sh | bash
For Windows (PowerShell 5.1+):
irm https://raw.githubusercontent.com/JuliusBrussee/caveman/main/install.ps1 | iex
Alternative installation via npx:
npx skills add JuliusBrussee/caveman
Or via Claude Code plugin system:
claude plugin marketplace add JuliusBrussee/caveman
claude plugin install caveman@caveman
Requirements: Node.js ≥18. The installation is idempotent—safe to re-run anytime. Everything stays local: no telemetry, no network calls after installation.
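Before running the installer, it can help to confirm the Node.js requirement. The following is a hypothetical pre-flight check, not part of the Caveman installer; it only assumes that `node --version` prints a string such as `v18.17.0`:

```python
import re
import shutil
import subprocess

MIN_NODE_MAJOR = 18  # requirement stated above


def node_major(version_text: str) -> int:
    """Extract the major version from output such as 'v18.17.0'."""
    match = re.match(r"v?(\d+)\.", version_text.strip())
    if match is None:
        raise ValueError(f"unrecognised Node.js version string: {version_text!r}")
    return int(match.group(1))


def node_is_supported() -> bool:
    """Return True if a `node` binary on PATH reports a major version >= 18."""
    node = shutil.which("node")
    if node is None:
        return False
    result = subprocess.run(
        [node, "--version"], capture_output=True, text=True, check=False
    )
    if result.returncode != 0:
        return False
    return node_major(result.stdout) >= MIN_NODE_MAJOR


if __name__ == "__main__":
    print("Node.js OK" if node_is_supported() else "Node.js >= 18 not found")
```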
3. Triggering Caveman: Modes, Commands, and Intensity Levels
Once installed, Caveman activates with simple commands:
- Type `/caveman` in Claude Code
- Say “talk like caveman,” “caveman mode,” or “less tokens please”
- For Codex, use `$caveman`
To stop: “stop caveman” or “normal mode”
Caveman offers four intensity levels, switchable with a single command:
| Mode | Description | Best For |
|---|---|---|
| `lite` | Drop filler only, keep professional tone | Everyday coding |
| `full` | Default caveman—aggressive compression | Most technical tasks |
| `ultra` | Telegraphic, maximum brevity | High-volume, repetitive work |
| `wenyan` | Classical Chinese (even shorter per token) | Maximum compression |
The `wenyan` mode is particularly interesting: classical Chinese is one of the densest languages per token, and Caveman offers three variants—wenyan-lite, wenyan-full, and wenyan-ultra—with character reductions of 80–90%.
4. Real-World Benchmarks: 65% Average, 87% Peak Reduction
The Caveman repository publishes transparent benchmarks across multiple task categories:
| Task | Normal Tokens | Caveman Tokens | Reduction |
|---|---|---|---|
| Explain React re-render bug | 1,180 | 159 | 87% |
| Fix authentication middleware | 704 | 121 | 83% |
| Set up PostgreSQL connection pool | 2,347 | 380 | 84% |
| Docker multi-stage build | 1,042 | 290 | 72% |
Average across 10 tasks: 65% output token reduction.
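The four rows shown all sit above the 10-task average, which says something about the six tasks not listed. A rough sketch that recomputes the listed reductions and infers the mean of the unlisted tasks; it assumes the 65% figure is an unweighted mean of per-task percentages, which the source does not state:

```python
# (task, normal tokens, caveman tokens) as listed in the table above
ROWS = [
    ("Explain React re-render bug", 1180, 159),
    ("Fix authentication middleware", 704, 121),
    ("Set up PostgreSQL connection pool", 2347, 380),
    ("Docker multi-stage build", 1042, 290),
]
TOTAL_TASKS = 10         # from the text
OVERALL_MEAN_PCT = 65.0  # from the text

# Per-task reduction in percent for the rows that are shown.
shown = [100.0 * (normal - caveman) / normal for _, normal, caveman in ROWS]
shown_mean = sum(shown) / len(shown)

# Assumption: the overall figure is a simple mean of per-task percentages,
# so the unlisted tasks must make up the difference.
hidden_mean = (OVERALL_MEAN_PCT * TOTAL_TASKS - sum(shown)) / (TOTAL_TASKS - len(ROWS))

for (task, _, _), pct in zip(ROWS, shown):
    print(f"{task}: {pct:.0f}%")
print(f"mean of listed tasks: {shown_mean:.1f}%")
print(f"implied mean of unlisted tasks: {hidden_mean:.1f}%")
```

Under that assumption the unlisted tasks average noticeably less than the headline rows, which is worth keeping in mind when estimating savings for your own workload.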
One developer testing Caveman on a medium TypeScript refactoring task reported that standard Claude Code used 4,287 output tokens and Caveman mode used 1,498, a 65% reduction. At Claude Code’s pricing (~$0.075 per 1K output tokens), the 2,789 tokens saved come to roughly $0.21 per task. At 50 tasks per day, that’s about $10.50/day, or roughly $210 over a 20-workday month.
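The dollar estimate scales linearly with the assumed output price, so it is worth recomputing with your own rate and volume. A minimal sketch using the token counts and per-1K rate quoted above; the 20-workday month is an assumption, not a figure from the source:

```python
def output_cost_usd(tokens: int, usd_per_1k: float) -> float:
    """Cost of generating `tokens` output tokens at a per-1K-token rate."""
    return tokens / 1000 * usd_per_1k


NORMAL_TOKENS = 4287       # reported for standard Claude Code
CAVEMAN_TOKENS = 1498      # reported for Caveman mode
USD_PER_1K_OUTPUT = 0.075  # rate quoted above; substitute your model's price
TASKS_PER_DAY = 50         # from the text
WORKDAYS_PER_MONTH = 20    # assumption, not from the source

saved_per_task = output_cost_usd(NORMAL_TOKENS - CAVEMAN_TOKENS, USD_PER_1K_OUTPUT)
saved_per_day = saved_per_task * TASKS_PER_DAY
saved_per_month = saved_per_day * WORKDAYS_PER_MONTH

print(f"per task:  ${saved_per_task:.2f}")
print(f"per day:   ${saved_per_day:.2f}")
print(f"per month: ${saved_per_month:.2f}")
```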
Speed improvements follow from the same numbers: with 65–75% fewer tokens to generate, responses arrive roughly 3x to 4x faster when generation time dominates the wait.
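The speed claim can be sanity-checked with a one-line model. This sketch assumes response time is proportional to the number of output tokens, ignoring prompt processing and network latency, so it is an upper bound rather than a measurement:

```python
def speedup(reduction_fraction: float) -> float:
    """Generation speedup if time is proportional to output tokens."""
    if not 0 <= reduction_fraction < 1:
        raise ValueError("reduction_fraction must be in [0, 1)")
    return 1.0 / (1.0 - reduction_fraction)


for r in (0.65, 0.75):
    print(f"{r:.0%} fewer tokens -> about {speedup(r):.1f}x faster generation")
```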
5. The Honest Numbers: When Caveman Costs You Tokens
What sets Caveman apart from hype-driven tools is its transparency. The repository includes a document called HONEST-NUMBERS.md that admits exactly when Caveman saves tokens and when it costs them.
Caveman saves tokens when:
- The task involves conversational or discursive output (roughly 25% of a typical coding session’s total tokens)
- You need quick, mechanical fixes, boilerplate, or well-scoped bug fixes
Caveman can be net-negative when:
- The reply is already short (Caveman only shrinks output, not input)
- You’re dealing with complex architectural decisions that require nuanced explanation
As one tester noted: “I asked it to design a microservices communication strategy and got back ‘Many service. Talk slow. Use queue.’ Technically correct? Sure. Actually useful? Not even close.” The HONEST-NUMBERS.md page tells you exactly when to turn it off.
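One way to reason about the trade-off is a session-level estimate. In this rough sketch the prose share (about 25%) and the reduction (about 65%) come from the figures above, while `overhead_share`, the extra input tokens the skill instruction itself adds as a share of the session, is a hypothetical parameter introduced for illustration:

```python
def session_savings_pct(
    prose_share: float,
    prose_reduction: float,
    overhead_share: float = 0.0,
) -> float:
    """Net change in total session tokens, in percent (positive = saved).

    prose_share:     fraction of session tokens that are compressible output
    prose_reduction: fraction of those tokens Caveman removes
    overhead_share:  hypothetical cost of the injected instruction,
                     as a fraction of session tokens
    """
    return 100.0 * (prose_share * prose_reduction - overhead_share)


# Typical session per the figures above: 25% prose, 65% reduction.
print(f"typical session: {session_savings_pct(0.25, 0.65):.2f}%")

# Short replies: little prose to shrink, so a small overhead dominates.
print(f"short replies:   {session_savings_pct(0.02, 0.65, 0.02):.2f}%")
```

Under these assumptions a 65% output reduction translates to a much smaller saving across the whole session, and the result turns negative once the compressible share falls below the overhead.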
6. Beyond Output Compression: The Full Caveman Ecosystem
Caveman is more than a single skill—it’s an ecosystem of token-optimization tools:
caveman-compress: Compresses your CLAUDE.md, todos, and preference files by ~46%, so every future session starts with fewer input tokens.
caveman-commit: Generates terse, one-line commit messages that still convey the full intent.
caveman-review: Produces one-line PR reviews—no fluff, just actionable feedback.
caveman-code: A full terminal coding agent built from the ground up with caveman principles—~2× fewer tokens than Codex on identical tasks, supporting 20+ providers, plan mode, and an autopilot goal loop.
Install caveman-code:
npm install -g @juliusbrussee/caveman-code
Statusline badge: Claude Code shows `[bash] ⛏ 12.4k` (lifetime tokens saved), updated with every `/caveman-stats` run.
7. The Academic Backing: Why Brevity Improves Accuracy
A March 2026 paper, “Brevity Constraints Reverse Performance Hierarchies in Language Models,” found that constraining large models to brief responses improved accuracy by 26 points on certain benchmarks. The implication is counterintuitive: verbose isn’t always smarter. Sometimes it’s just slower, more expensive, and less accurate.
Caveman operationalizes this insight. By stripping away the filler that models generate to sound helpful or conversational, it forces the agent to focus on what actually matters—the technical content. The result is not just cheaper output; it’s often better output.
What Undercode Says:
- Token efficiency is a competitive advantage. In high-volume AI coding workflows, 65–75% output reduction translates directly to faster iterations, lower costs, and more productive sessions. The developers starring Caveman aren’t doing it for the memes—they’re doing it because the math works.
- Transparency builds trust. Publishing HONEST-NUMBERS.md to admit when the tool doesn’t work is a masterclass in open-source integrity. Most projects bury their limitations; Caveman links them in the README. That’s why 84.3K developers trust it.
- The “caveman” approach is a lens on LLM behavior. The fact that you can slash tokens by 75% without losing technical accuracy reveals how much of typical LLM output is noise. Caveman isn’t just a plugin; it’s a critique of how we’ve been using these models.
- Context matters. Caveman excels at mechanical, well-scoped tasks and fails at architectural judgment. Use it strategically: activate for bug fixes and boilerplate, deactivate for design decisions. The HONEST-NUMBERS.md guide is your roadmap.
- The ecosystem is expanding. With caveman-code, caveman-compress, and support for 30+ agents, this isn’t a one-off trick. It’s a paradigm shift in how we interact with AI coding assistants, prioritizing signal over noise and efficiency over verbosity.
Prediction:
+1 Caveman-style compression will become a standard feature in every major AI coding agent within 18 months. Anthropic, OpenAI, and Google will either build similar brevity controls natively or acquire the startups that do.
+1 The “wenyan” (classical Chinese) mode points to a broader trend: multilingual token optimization. As models expand to more languages, we’ll see compression techniques that exploit the density of specific languages to drive down costs.
-1 The 65–75% savings figures will be walked back in production environments as developers discover that Caveman’s effectiveness varies wildly by task type. The HONEST-NUMBERS.md admission is honest, but many users won’t read it, leading to frustration when Caveman fails on complex tasks.
+1 The academic finding that brevity improves accuracy will spark a new wave of research into “minimum viable response” generation. Caveman is the first practical implementation of a principle that will reshape how LLMs are trained and prompted.
-1 Over-reliance on caveman-style output could degrade developers’ ability to understand complex reasoning. When every response is telegraphic, the nuance that helps junior developers learn may be lost. The tool is a productivity multiplier for experts—but a potential crutch for beginners.
IT/Security Reporter URL:
Reported By: Curiouslearner Someone – Hackers Feeds


