Don't argue with the hype. Let your history settle it.

Months ago I wrote that your Claude Code session history is a goldmine, and that almost nobody bothers to mine it. Then I left my own gold sitting in the ground. Like you do.

This week I went digging again. Not for knowledge this time, but to settle a bet: everyone’s saying you should convert your MCP servers’ JSON to TOON to save tokens. The benchmarks look incredible, 30 to 60 percent fewer tokens. I wanted to believe it.

So I asked the only source that actually knows how my tools behave: months of my own history.

What TOON even is

Same data as JSON, far fewer tokens, because it names the columns once and lays the rows out like a table. Three issues in JSON, 257 characters:

{
  "issues": [
    { "key": "RD-1", "status": "Open", "type": "Bug" },
    { "key": "RD-2", "status": "Done", "type": "Task" },
    { "key": "RD-3", "status": "Open", "type": "Story" }
  ]
}

XML is worse, 285:

<issues>
  <issue><key>RD-1</key><status>Open</status><type>Bug</type></issue>
  <issue><key>RD-2</key><status>Done</status><type>Task</type></issue>
  <issue><key>RD-3</key><status>Open</status><type>Story</type></issue>
</issues>

TOON, 79:

issues[3]{key,status,type}:
  RD-1,Open,Bug
  RD-2,Done,Task
  RD-3,Open,Story

Gorgeous. Hold onto that picture, though: it’s TOON at its best, rows that all share the same columns. That detail matters once we hit real data.
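For the curious, the table layout above is simple enough to sketch in a few lines of Python. This is a minimal illustration of the uniform-rows case only, not a full TOON encoder: the function name to_toon_table is made up for this post, and it assumes every value is a plain string or number with no commas or newlines, refusing anything else rather than guessing at quoting rules.

```python
def to_toon_table(name, rows):
    """Encode a non-empty list of flat, uniform dicts as a TOON-style table.

    Covers only the happy path: same keys in every row, scalar values,
    nothing that would need quoting. Anything else raises ValueError.
    """
    if not rows:
        raise ValueError("this sketch only handles non-empty lists")
    fields = list(rows[0].keys())
    header = ",".join(fields)
    lines = [f"{name}[{len(rows)}]{{{header}}}:"]
    for row in rows:
        if list(row.keys()) != fields:
            raise ValueError("rows are not uniform; no table to lay out")
        cells = []
        for field in fields:
            value = row[field]
            if isinstance(value, (dict, list)):
                raise ValueError(f"nested value in {field!r}")
            text = str(value)
            if "," in text or "\n" in text:
                raise ValueError(f"value would need quoting: {text!r}")
            cells.append(text)
        lines.append("  " + ",".join(cells))
    return "\n".join(lines)


issues = [
    {"key": "RD-1", "status": "Open", "type": "Bug"},
    {"key": "RD-2", "status": "Done", "type": "Task"},
    {"key": "RD-3", "status": "Open", "type": "Story"},
]
print(to_toon_table("issues", issues))
```

Note how much of the function is refusing input. The moment rows stop sharing columns, or a cell holds a nested object, there is no table left to write.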

I asked my history first

Before building anything, I had Claude find the tool that was actually hurting. Not a guess. My traffic, ~1,600 real Atlassian calls.

Loudest tools · 28 MB of JSON across my history

Jira search: 16.8 MB
Get one issue: 4.9 MB
Confluence page: 2.2 MB
Confluence search: 1.0 MB

One tool, the Jira search, was 60 percent of everything. That's where any win had to come from.

So I pulled one real search result, 100 issues, 428 KB of pretty-printed JSON, and measured every option.

What the data actually said

Just changing the format · one 428 KB payload, tokens saved

Convert to TOON (the plan): 13%
Just compact the JSON: 24%

The fancy format lost to just removing whitespace.

TOON lost to plain compact JSON, because the benchmark’s happy path, flat uniform rows, is not what a Jira issue is. Real issues are deeply nested, every one a different shape. There is no table to lay out, so TOON falls back to a YAML-like indented layout, and the indentation costs more than it saves.
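"Just compact the JSON" needs nothing beyond the standard library. Here is a rough sketch of how to measure it on a payload of your own. One loud assumption: it counts characters, which is only a proxy. The percentages in this post are token counts, and the two will not line up exactly.

```python
import json


def compact(obj):
    """Serialize without the spaces and newlines pretty-printing adds."""
    return json.dumps(obj, separators=(",", ":"), ensure_ascii=False)


def compaction_saving(obj):
    """Fraction of characters saved versus 2-space pretty-printed JSON."""
    pretty = json.dumps(obj, indent=2, ensure_ascii=False)
    return 1 - len(compact(obj)) / len(pretty)


# A small nested stand-in for a real response (made-up values).
payload = {"issues": [{"key": "RD-1", "fields": {"status": {"name": "Open"}}}]}
print(f"{compaction_saving(payload):.0%} fewer characters")
```

The deeper the nesting, the more indentation there is to remove, which is why this does well on exactly the payloads where a table layout has nothing to grab onto.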

So I asked a better question: if the format isn’t the problem, what is? Look at what one issue actually carries.

{
  "expand": "renderedFields,names,schema,operations,editmeta,changelog",
  "self": "https://your-co.atlassian.net/rest/api/3/issue/84745",
  "key": "PROJ-142",
  "fields": {
    "summary": "Checkout fails on expired coupon",
    "issuetype": { "self": "https://.../issuetype/10018", "id": "10018",
      "iconUrl": "https://.../avatar/10303?size=medium", "name": "Bug" },
    "assignee": { "self": "https://.../user", "accountId": "5f2a...",
      "avatarUrls": { "48x48": "https://...", "24x24": "https://...", "16x16": "https://..." },
      "displayName": "Maria P.", "active": true, "timeZone": "Europe/Bucharest" },
    "...": "and 40 more fields shaped just like this"
  }
}

All of that, for one ticket. Icons, avatars, self-links, schema plumbing, none of which the model reads. Strip it to the fields a human actually uses and the same row becomes one line:

issues[1]{key,summary,status,type,assignee}:
  PROJ-142,Checkout fails on expired coupon,In Progress,Bug,Maria P.

Do that across the whole payload and measure again:

Changing what's in the payload · same 428 KB, tokens saved

Convert to TOON: 13%
Just compact the JSON: 24%
Trim to the fields that matter: 96%
Trim it, then TOON: 98%

The cliff is projection, not format. TOON only adds the last two points, once the data is already flat.

The win was never the format. It was throwing away the junk. TOON earns its keep, but it rides on the trimming, not the other way around. I had the lever backwards. So does the hype.
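The trimming itself is unglamorous. A minimal sketch, assuming the issue shape shown earlier: issuetype.name and assignee.displayName appear in that excerpt, while fields.status.name is my assumption about where the status lives, since the excerpt elides it. The name project_issue is made up for illustration.

```python
def project_issue(issue):
    """Keep the five fields a human reads; drop icons, avatars, self-links."""
    fields = issue.get("fields") or {}

    def pick(field, attr):
        # Nested objects can be missing or null (e.g. an unassigned issue).
        value = fields.get(field) or {}
        return value.get(attr, "")

    return {
        "key": issue.get("key", ""),
        "summary": fields.get("summary", ""),
        "status": pick("status", "name"),  # assumed location, see above
        "type": pick("issuetype", "name"),
        "assignee": pick("assignee", "displayName"),
    }


raw_issue = {
    "self": "https://your-co.atlassian.net/rest/api/3/issue/84745",
    "key": "PROJ-142",
    "fields": {
        "summary": "Checkout fails on expired coupon",
        "status": {"name": "In Progress"},
        "issuetype": {"id": "10018", "name": "Bug"},
        "assignee": {"displayName": "Maria P.", "active": True},
    },
}
print(project_issue(raw_issue))
```

Every row that comes out has the same five flat keys, which is exactly the shape where a table layout starts to pay.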

The boring version that works

What I shipped isn’t “convert JSON to TOON.” It’s three tiers, each doing the cheapest correct thing:

Jira searches · trim + TOON: 98%
Single issues · strip junk + compact: 78%
Everything else · compact: ~32%

Roughly 80 percent off across my real traffic. Anything it can't parse, it passes through untouched.

And on any error it returns the original response, byte for byte. This sits in front of live work. A clever optimization that occasionally breaks a tool call is worse than no optimization at all.
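The fail-open rule is the part worth copying, so here is a sketch of the shape. It is not the code I shipped: the tool name jira_search and the handlers mapping are hypothetical, and the only thing it is meant to show is that every failure path ends at the original text.

```python
import json


def compact(data):
    """Default tier: same data, no pretty-printing whitespace."""
    return json.dumps(data, separators=(",", ":"), ensure_ascii=False)


def shrink(tool_name, raw, handlers):
    """Return a smaller payload, or the original text on any failure.

    handlers maps a tool name to a function taking parsed JSON and
    returning a string. Tools without a handler get compact().
    """
    try:
        data = json.loads(raw)
        handler = handlers.get(tool_name, compact)
        return handler(data)
    except Exception:
        # Fail open: unparseable input or a buggy handler must never
        # break the tool call, so hand back the response untouched.
        return raw


print(shrink("some_tool", '{ "a": 1,  "b": [1, 2] }', {}))
print(shrink("some_tool", "<html>not json</html>", {}))
```

Catching a bare Exception is usually a code smell. Here it is the whole design: the interceptor is allowed to be useless, never harmful.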

Then Metabase said no too

Same trick, my BI tool, another 28 MB. The data said no again, louder. Those responses are already markdown tables where every byte is real data, rows averaging 2.7 KB because the cells hold full JSON blobs. TOON saved nothing, and on the biggest one made it worse. No formatting fat to cut. The only lever there is fewer rows, which belongs in the query, not in a gateway quietly hiding data from me. So I left it alone.

Letting the data talk you out of work is the same muscle as letting it talk you into it. I just got to use it twice in one afternoon.

What this is actually about

TOON is fine. I’ll reach for it the day my data is shaped right for it.

The post is about the move that keeps paying off: when a shiny technique is making the rounds, don’t argue with the benchmark and don’t adopt it on faith. Ask your own history, which already knows how your tools behave, and let the numbers pick the direction. Claude does the mining and the measuring in minutes. You just have to be willing to be wrong before you build.

On a good day I’m a Thinker, the kind that gets twitchy around claims without a number attached. The old version of that was slow: argue from first principles, trust your gut. The new version is fast and honest. Pair the instinct with data and an agent that can crunch it, and you get to be skeptical and done by lunch. Smart, not dumb.

How I’ll actually know (the next 30 days)

Here’s the part I’m a little smug about: the interceptor grades itself. Every single call it touches, it writes one line to a log, the size before and the size after:

{"in": 459642, "out": 9216}
{"in": 88311, "out": 19452}
{"in": 3284, "out": 2106}

in is the size that would have hit the model without the interceptor. out is the size that actually did. No estimate, no sample data, no benchmark I have to trust. Real traffic, every call, for 30 days. At the end I sum both columns and get one line, something like:

4,127 calls · 31.4 MB in · 6.1 MB out · saved 81%

That’s the whole point. Every tool call is now its own tiny experiment, and the claim in this post gets graded by reality instead of by me. The lab said 80 percent. In a month I’ll know what production says, and the two are allowed to disagree.
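Summing the log is a few lines of Python. A minimal sketch, assuming one JSON object per line with the in and out fields shown above; the function name summarize is made up, and the three sample lines are the ones from this post, not a month of traffic.

```python
import json


def summarize(lines):
    """Total the before/after sizes from JSON-lines log records."""
    calls = total_in = total_out = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the log
        record = json.loads(line)
        calls += 1
        total_in += record["in"]
        total_out += record["out"]
    saved = 1 - total_out / total_in if total_in else 0.0
    return calls, total_in, total_out, saved


sample = [
    '{"in": 459642, "out": 9216}',
    '{"in": 88311, "out": 19452}',
    '{"in": 3284, "out": 2106}',
]
calls, total_in, total_out, saved = summarize(sample)
print(f"{calls} calls · {total_in} in · {total_out} out · saved {saved:.0%}")
```

The saving is computed on the totals, not as an average of per-call percentages, so one huge search response counts for as much as it actually cost.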


In 30 days I’ll either confirm that number on real traffic or eat my words in public, and I genuinely don’t know which yet. If I forget to publish it, and knowing my todo list that’s a live possibility, poke me and I’ll pull the receipts. The whole point was to not trust the number until I’d seen it on my own data. Only fair I show you mine.