Don't argue with the hype. Let your history settle it.
Months ago I wrote that your Claude Code session history is a goldmine, and that almost nobody bothers to mine it. Then I left my own gold sitting in the ground. Like you do.
This week I went digging again. Not for knowledge this time, but to settle a bet: everyone’s saying you should convert your MCP servers’ JSON to TOON to save tokens. The benchmarks look incredible, 30 to 60 percent fewer tokens. I wanted to believe it.
So I asked the only source that actually knows how my tools behave: months of my own history.
What TOON even is
Same data as JSON, far fewer tokens, because it names the columns once and lays the rows out like a table. Three issues in JSON, 257 characters:
{
"issues": [
{ "key": "RD-1", "status": "Open", "type": "Bug" },
{ "key": "RD-2", "status": "Done", "type": "Task" },
{ "key": "RD-3", "status": "Open", "type": "Story" }
]
}
XML is worse, 285:
<issues>
<issue><key>RD-1</key><status>Open</status><type>Bug</type></issue>
<issue><key>RD-2</key><status>Done</status><type>Task</type></issue>
<issue><key>RD-3</key><status>Open</status><type>Story</type></issue>
</issues>
TOON, 79:
issues[3]{key,status,type}:
RD-1,Open,Bug
RD-2,Done,Task
RD-3,Open,Story
Gorgeous. Hold onto that picture though, because it’s TOON at its best: rows that all share the same columns. Real data rarely looks like that.
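To make “names the columns once” concrete, here is a minimal sketch of a tabular encoder in Python. It is not the official TOON library: the function name to_toon_table is made up for illustration, it follows the layout exactly as printed above, and it skips the quoting, escaping, and indentation rules a real encoder needs. It also refuses anything that isn’t a flat, uniform row, which is the catch the rest of this post is about.

```python
def to_toon_table(name, rows):
    """Encode a list of flat, same-shaped dicts in the tabular layout shown above.

    Hypothetical helper for illustration only; no quoting or escaping.
    """
    if not rows:
        return name + "[0]:"
    columns = list(rows[0].keys())
    for row in rows:
        # The table form only works when every row has exactly these columns...
        if list(row.keys()) != columns:
            raise ValueError("rows do not share the same columns")
        # ...and every cell is a plain scalar, not a nested object or list.
        if any(isinstance(value, (dict, list)) for value in row.values()):
            raise ValueError("nested values cannot be laid out as a table")
    header = name + "[" + str(len(rows)) + "]{" + ",".join(columns) + "}:"
    body = [",".join(str(row[column]) for column in columns) for row in rows]
    return "\n".join([header] + body)


issues = [
    {"key": "RD-1", "status": "Open", "type": "Bug"},
    {"key": "RD-2", "status": "Done", "type": "Task"},
    {"key": "RD-3", "status": "Open", "type": "Story"},
]
print(to_toon_table("issues", issues))
```

Feed it one row with an extra field or a nested object and it raises instead of producing a table.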
I asked my history first
Before building anything, I had Claude find the tool that was actually hurting. Not a guess: my own traffic, ~1,600 real Atlassian calls.
So I pulled one real search result, 100 issues, 428 KB of pretty-printed JSON, and measured every option.
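A rough sketch of what “measure every option” can look like for the two JSON variants, assuming the payload is already parsed into Python objects. The function name size_report is hypothetical, and it counts UTF-8 bytes, which is only a proxy for tokens; a real comparison would run each candidate through the model’s own tokenizer.

```python
import json


def size_report(payload):
    """Return the UTF-8 byte size of one payload under two JSON encodings."""
    # Pretty-printed: what many tools return by default.
    pretty = json.dumps(payload, indent=2)
    # Compact: same data, no indentation, no spaces after separators.
    compact = json.dumps(payload, separators=(",", ":"))
    return {
        "pretty": len(pretty.encode("utf-8")),
        "compact": len(compact.encode("utf-8")),
    }


sample = {"issues": [{"key": "RD-1", "fields": {"status": {"name": "Open"}}}]}
report = size_report(sample)
print(report["pretty"], report["compact"])
```

Any other candidate format can be added as one more entry in the returned dict, so every option is measured against the same payload.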
What the data actually said
TOON lost to plain compact JSON, because the benchmark’s happy path, flat uniform rows, is not what a Jira issue is. Real issues are deeply nested, every one a different shape. With no table to lay out, TOON falls back to an indented, YAML-like layout, and the indentation costs more than it saves.
So I asked a better question: if the format isn’t the problem, what is? Look at what one issue actually carries.
{
"expand": "renderedFields,names,schema,operations,editmeta,changelog",
"self": "https://your-co.atlassian.net/rest/api/3/issue/84745",
"key": "PROJ-142",
"fields": {
"summary": "Checkout fails on expired coupon",
"issuetype": { "self": "https://.../issuetype/10018", "id": "10018",
"iconUrl": "https://.../avatar/10303?size=medium", "name": "Bug" },
"assignee": { "self": "https://.../user", "accountId": "5f2a...",
"avatarUrls": { "48x48": "https://...", "24x24": "https://...", "16x16": "https://..." },
"displayName": "Maria P.", "active": true, "timeZone": "Europe/Bucharest" },
"...": "and 40 more fields shaped just like this"
}
}
All of that, for one ticket. Icons, avatars, self-links, schema plumbing, none of which the model reads. Strip it to the fields a human actually uses and the same row becomes one line:
issues[1]{key,summary,status,type,assignee}:
PROJ-142,Checkout fails on expired coupon,In Progress,Bug,Maria P.
Do that across the whole payload and measure again:
The win was never the format. It was throwing away the junk. TOON earns its keep, but it rides on the trimming, not the other way around. I had the lever backwards. So does the hype.
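The trimming step can be sketched in a few lines of Python. This is an illustration, not the shipped code: the function name trim_issue is made up, the five kept fields are the ones from the one-line row above, and it assumes the status lives at fields.status.name, which the excerpt above does not show.

```python
def trim_issue(issue):
    """Keep only the fields a human actually reads; drop links, icons, avatars."""
    fields = issue.get("fields") or {}

    def pick(value, key):
        # Nested objects like issuetype or assignee may be missing or null.
        return value.get(key) if isinstance(value, dict) else None

    return {
        "key": issue.get("key"),
        "summary": fields.get("summary"),
        # Assumption: status is an object with a "name", like issuetype.
        "status": pick(fields.get("status"), "name"),
        "type": pick(fields.get("issuetype"), "name"),
        "assignee": pick(fields.get("assignee"), "displayName"),
    }


raw_issue = {
    "self": "https://your-co.atlassian.net/rest/api/3/issue/84745",
    "key": "PROJ-142",
    "fields": {
        "summary": "Checkout fails on expired coupon",
        "status": {"name": "In Progress"},
        "issuetype": {"id": "10018", "name": "Bug"},
        "assignee": {"displayName": "Maria P.", "active": True},
    },
}
print(trim_issue(raw_issue))
```

Once every issue has been reduced to the same five flat fields, the rows are finally uniform, which is exactly the shape a table layout needs.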
The boring version that works
What I shipped isn’t “convert JSON to TOON.” It’s three tiers, each doing the cheapest correct thing:
And on any error it returns the original response, byte for byte. This sits in front of live work. A clever optimization that occasionally breaks a tool call is worse than no optimization at all.
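The fail-safe is the simplest part to sketch. Assuming the optimization is a function from bytes to bytes (the names safe_transform and transform are hypothetical, not the interceptor’s real API), the wrapper looks roughly like this:

```python
def safe_transform(raw, transform):
    """Apply transform to a response body; on any failure return raw untouched."""
    try:
        result = transform(raw)
        # A transform that returns the wrong type counts as a failure too.
        if not isinstance(result, bytes):
            raise TypeError("transform must return bytes")
        return result
    except Exception:
        # Never let the optimization break the tool call: original bytes out.
        return raw


def broken(body):
    raise ValueError("parser choked")


original = b'{"key": "PROJ-142"}'
print(safe_transform(original, broken) is original)
print(safe_transform(original, lambda body: body.replace(b" ", b"")))
```

Catching every exception is normally a smell, but here it is the design: the worst case has to be “no savings”, never “no answer”.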
Then Metabase said no too
Same trick, my BI tool, another 28 MB. The data said no again, louder. Those responses are already markdown tables where every byte is real data, rows averaging 2.7 KB because the cells hold full JSON blobs. TOON saved nothing, and on the biggest one made it worse. No formatting fat to cut. The only lever there is fewer rows, which belongs in the query, not in a gateway quietly hiding data from me. So I left it alone.
Letting the data talk you out of work is the same muscle as letting it talk you into it. I just got to use it twice in one afternoon.
What this is actually about
TOON is fine. I’ll reach for it the day my data is shaped right for it.
The post is about the move that keeps paying off: when a shiny technique is making the rounds, don’t argue with the benchmark and don’t adopt it on faith. Ask your own history, which already knows how your tools behave, and let the numbers pick the direction. Claude does the mining and the measuring in minutes. You just have to be willing to be wrong before you build.
On a good day I’m a Thinker, the kind that gets twitchy around claims without a number attached. The old version of that was slow: argue from first principles, trust your gut. The new version is fast and honest. Pair the instinct with data and an agent that can crunch it, and you get to be skeptical and done by lunch. Smart, not dumb.
How I’ll actually know (the next 30 days)
Here’s the part I’m a little smug about: the interceptor grades itself. Every single call it touches, it writes one line to a log, the size before and the size after:
{"in": 459642, "out": 9216}
{"in": 88311, "out": 19452}
{"in": 3284, "out": 2106}
“in” is what would have hit the model without it. “out” is what actually did. No estimate, no sample data, no benchmark I have to trust. Real traffic, every call, for 30 days. At the end I sum both columns and print one line shaped like this:
4,127 calls · 31.4 MB in · 6.1 MB out · saved 81%
That’s the whole point. Every tool call is now its own tiny experiment, and the claim in this post gets graded by reality instead of by me. The lab said 80 percent. In a month I’ll know what production says, and the two are allowed to disagree.
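Summing the two columns really is that small. A minimal sketch, assuming the log is one JSON object per line with the “in” and “out” keys shown above, and treating 1 MB as 1,000,000 bytes; the function names are made up for illustration:

```python
import json


def summarize(lines):
    """Sum the in/out columns of a JSON-lines log; return totals and saved share."""
    calls = total_in = total_out = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the log
        record = json.loads(line)
        calls += 1
        total_in += record["in"]
        total_out += record["out"]
    # Guard against an empty log so there is no division by zero.
    saved = 0.0 if total_in == 0 else 1 - total_out / total_in
    return calls, total_in, total_out, saved


def format_summary(calls, total_in, total_out, saved):
    mb = 1_000_000  # assumption: decimal megabytes
    return (f"{calls:,} calls · {total_in / mb:.1f} MB in · "
            f"{total_out / mb:.1f} MB out · saved {saved:.0%}")


log = [
    '{"in": 459642, "out": 9216}',
    '{"in": 88311, "out": 19452}',
    '{"in": 3284, "out": 2106}',
]
print(format_summary(*summarize(log)))
```

For the real thing, pass an open log file instead of the list; iterating over a file yields the same lines.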
In 30 days I’ll either confirm that number on real traffic or eat my words in public, and I genuinely don’t know which yet. If I forget to publish it, and knowing my todo list that’s a live possibility, poke me and I’ll pull the receipts. The whole point was to not trust the number until I’d seen it on my own data. Only fair I show you mine.