The caveman skill matters more now that Copilot bills by tokens
GitHub is moving Copilot to usage-based billing. Pairing the caveman skill with grill-me planning keeps output cheap and intent sharp.
GitHub announced that Copilot is moving to usage-based billing on June 1, 2026. Seat prices stay the same, but agent work is no longer a flat buffet. Premium request units go away. Instead, GitHub AI Credits are consumed according to token usage: input, output, and cached tokens, each billed at the model's API rate.
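To make the billing shape concrete, here is a minimal sketch of per-token pricing. The rates, the token counts, and the names `ModelRates`, `Usage`, and `request_cost` are all assumptions for illustration; they are not GitHub's published numbers, and the conversion from dollars to AI Credits is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRates:
    """Hypothetical USD prices per million tokens (not real Copilot rates)."""
    input_per_mtok: float
    cached_per_mtok: float
    output_per_mtok: float


@dataclass(frozen=True)
class Usage:
    """Token counts for one request or one whole session."""
    input_tokens: int
    cached_tokens: int
    output_tokens: int


def request_cost(usage: Usage, rates: ModelRates) -> float:
    """Sum each token class multiplied by its own per-million rate."""
    return (
        usage.input_tokens * rates.input_per_mtok
        + usage.cached_tokens * rates.cached_per_mtok
        + usage.output_tokens * rates.output_per_mtok
    ) / 1_000_000


# Made-up rates: output priced well above input, cached input discounted.
EXAMPLE_RATES = ModelRates(input_per_mtok=3.0, cached_per_mtok=0.3, output_per_mtok=15.0)

# Made-up sessions: a quick question versus a long agent run.
short_chat = Usage(input_tokens=2_000, cached_tokens=0, output_tokens=500)
long_crawl = Usage(input_tokens=400_000, cached_tokens=1_600_000, output_tokens=60_000)

if __name__ == "__main__":
    print(f"short chat: ${request_cost(short_chat, EXAMPLE_RATES):.4f}")
    print(f"long crawl: ${request_cost(long_crawl, EXAMPLE_RATES):.4f}")
```

Under a flat per-request model these two sessions would have been charged the same. Under per-token pricing they differ by orders of magnitude, whatever the real rates turn out to be.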
That changes the math for anyone running long agent sessions. A short chat and a multi-hour repo crawl used to cost the same under the old premium-request model. Soon they will not. The product direction is clear: agentic coding is the default, and inference cost follows usage.
I have been using two skills together that fit this shift well: caveman for compressed replies, and grill-me for planning before code.
Why output tokens suddenly matter
Most of the advice around AI coding still focuses on prompts and context windows. That covers only the input side of the bill.
Under usage-based pricing, verbose agents are not just annoying. They are expensive. Every preamble, recap, and "I'd be happy to help" paragraph is billed output. So is every long implementation summary when a short one would do.
The caveman skill attacks that side directly. It tells the agent to drop filler, keep technical substance, and answer in short fragments. The project's own benchmarks claim roughly 65–75% less output on comparable tasks without losing accuracy. Your mileage will vary by model and session, but the direction is right: shrink what the agent says, not what it can do.
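A quick way to sanity-check what an output cut is worth is to run the arithmetic on a whole session. The sketch below assumes hypothetical rates and token counts, and the function names are mine; only the 65–75% range comes from the project's own claim.

```python
def session_cost(input_tokens: float, output_tokens: float,
                 input_rate: float, output_rate: float) -> float:
    """USD cost of a session; rates are USD per million tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


def cost_after_output_cut(input_tokens: float, output_tokens: float,
                          input_rate: float, output_rate: float,
                          reduction: float) -> float:
    """Cost when output shrinks by `reduction` (0.0 to 1.0); input is unchanged."""
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be between 0 and 1")
    return session_cost(input_tokens, output_tokens * (1.0 - reduction),
                        input_rate, output_rate)


# Hypothetical session and rates, chosen only to illustrate the shape.
INPUT_TOKENS, OUTPUT_TOKENS = 200_000, 40_000
INPUT_RATE, OUTPUT_RATE = 3.0, 15.0

if __name__ == "__main__":
    baseline = session_cost(INPUT_TOKENS, OUTPUT_TOKENS, INPUT_RATE, OUTPUT_RATE)
    for reduction in (0.65, 0.75):
        reduced = cost_after_output_cut(INPUT_TOKENS, OUTPUT_TOKENS,
                                        INPUT_RATE, OUTPUT_RATE, reduction)
        saved = (baseline - reduced) / baseline
        print(f"{reduction:.0%} less output -> {saved:.1%} off the session bill")
```

With these made-up numbers, cutting output by 65–75% takes roughly a third off the total, not two thirds, because input tokens are untouched. The real ratio depends on how output-heavy your sessions are and on the model's actual rates.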
Install it the same way as other agent skills:
npx skills@latest add JuliusBrussee/caveman -a github-copilot
For Cursor or a repo-wide always-on rule:
npx skills@latest add JuliusBrussee/caveman -a cursor --with-init
Change the value passed to -a to match whatever editor or CLI you use. The installer maps the skill into each tool's native format (Copilot instructions, Cursor rules, Claude plugin, and so on).
Caveman does not replace good architecture or tests. It reduces token burn on the response side, and output is one of the things usage-based Copilot will charge you for on every turn.
Grill first, then talk small
Caveman alone would still let you burn credits on the wrong work. A terse agent that implements the wrong feature is cheap noise.
That is where grill-me fits. I wrote about this pattern in "AI-assisted coding needs smaller loops": make the agent interview you before it touches the repo. One question at a time. Product shape, edge cases, data model, failure modes. Blurry intent becomes a design you can inspect.
Install from Matt Pocock's skills collection:
npx skills@latest add mattpocock/skills
Then invoke grill-me (or the equivalent prompt) at the start of a feature. The agent spends tokens on questions and decisions up front instead of on large wrong diffs later.
Used together, the loop looks like this:
- Grill-me forces deep planning and shared shape before implementation.
- Caveman keeps each reply short once coding starts, so output tokens stay down across a long session.
You can plan in full sentences if you want, and the agent can still answer in compressed prose. Planning quality goes up; billed chatter goes down.
What this does not solve
A few limits are worth stating plainly.
Caveman compresses agent output, not your context files, not every tool call, and not input tokens from a huge AGENTS.md or thread history. Usage-based billing still punishes bloated context. Compaction, vertical slices, and clearing stale sessions still matter. See the same smaller loops post for that side of the workflow.
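To see why context still dominates long sessions, here is a rough model of cumulative input tokens. It assumes every turn re-sends a fixed context (instructions, AGENTS.md) plus all history so far, and it ignores caching discounts and automatic compaction; the function name and the numbers are illustrative, not measurements.

```python
def cumulative_input_tokens(fixed_context: int, tokens_added_per_turn: int, turns: int) -> int:
    """Total input tokens across a session under a simplified model.

    Assumption: each turn sends the fixed context plus the full history,
    and the history grows by a constant amount after every turn.
    """
    total = 0
    history = 0
    for _ in range(turns):
        total += fixed_context + history   # what this turn sends as input
        history += tokens_added_per_turn   # the thread grows for the next turn
    return total


if __name__ == "__main__":
    # Hypothetical numbers: 8k tokens of fixed context, 1.5k tokens added per turn.
    one_long_session = cumulative_input_tokens(8_000, 1_500, turns=40)
    two_fresh_sessions = 2 * cumulative_input_tokens(8_000, 1_500, turns=20)
    print(f"one 40-turn session:  {one_long_session:,} input tokens")
    print(f"two 20-turn sessions: {two_fresh_sessions:,} input tokens")
```

In this model the history term grows with the square of the turn count, so splitting the same work into two fresh sessions sends noticeably fewer input tokens even though the fixed context is paid for twice. No amount of terse output changes that.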
Grill-me adds turns at the start. Those cost tokens too. The bet is that cheaper, sharper implementation later pays for the interview. On token-metered plans, that bet is easier to justify than it was under flat-rate premium request units (PRUs).
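The bet can be written as a simple expected-value check. This is a sketch under stated assumptions: the probabilities and dollar figures are invented, and `break_even_interview_cost` and `interview_pays_off` are hypothetical helpers, not part of either skill.

```python
def break_even_interview_cost(rework_cost: float,
                              p_wrong_without: float,
                              p_wrong_with: float) -> float:
    """Largest interview cost that still pays for itself in expectation.

    The interview is worth it when it costs less than the expected rework
    it avoids: (drop in the chance of building the wrong thing) * rework cost.
    """
    return (p_wrong_without - p_wrong_with) * rework_cost


def interview_pays_off(interview_cost: float, rework_cost: float,
                       p_wrong_without: float, p_wrong_with: float) -> bool:
    """True when the expected rework saved exceeds the interview's cost."""
    return interview_cost < break_even_interview_cost(
        rework_cost, p_wrong_without, p_wrong_with
    )


if __name__ == "__main__":
    # Hypothetical: a wrong implementation costs $2.00 in tokens to redo, and
    # the interview drops the chance of that from 40% to 10%.
    limit = break_even_interview_cost(2.00, 0.40, 0.10)
    print(f"interview is worth up to ${limit:.2f} in tokens")
    print("a $0.25 interview pays off:", interview_pays_off(0.25, 2.00, 0.40, 0.10))
```

The inputs are guesses you have to supply yourself, but the structure is the useful part: the larger the diffs an agent can produce on a wrong path, the more interview you can afford.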
Business and Enterprise customers get pooled credits and budget caps starting in June, with promotional extra credits for a few months. Individuals on Pro get credits aligned to the monthly price. None of that removes the incentive to waste fewer tokens per task.
Practical stack for token-metered Copilot
If you are on Copilot Business or Enterprise when the switch hits, this is a reasonable default:
- Run grill-me (or your own planning skill) before non-trivial features.
- Enable caveman so agent replies stay dense during implementation.
- Keep tasks in vertical slices with fast feedback (tests, lint, smoke) so wrong paths get killed early.
- Watch the preview bill GitHub is rolling out in May so you see projected spend before June 1.
The caveman bit is memey on purpose. The billing change is not. Copilot is priced like an API now. Skills that tighten planning and shrink output are a cheap way to align your workflow with that reality.