The Retrospective Is the Missing Step in Your AI Workflows

Last night a walkthrough from the Claude team, “Stop babysitting your agents”, showed up in my YouTube subscriptions. I watched half of it this morning on the way to work.

It is mostly about orchestration, but one smaller idea followed me off the tram and into the office.

A skill should be able to improve itself.

A skill, and the one I picked

In Claude Code, a skill is a reusable set of instructions for a recurring job, written once in plain language and followed by the agent every time. A checklist with opinions.

One of mine is transcript-analyzer. I record meetings, and it turns the raw transcript into a summary, action items, decisions, and who owns what. It works in passes and checks with me at each step.

It is a decent skill. But making my skills improve themselves was not a new idea. Months ago I wrote a whole post about exactly this: every correction you make in a session holds the before-and-after a skill needs to improve, and almost none of it flows back. I knew all that. It just sat on my todo list. Until this morning.

The experiment: make the skill grade itself

The change was simple enough to try that same morning. What if every run ended with the skill grading its own performance? So I added one step: a retro, in three parts.

Measure. Count how many times I had to correct or redirect the output. That number is the scoreboard, and the point is to drive it down.

Reflect. Look back for friction the skill itself caused: a misleading instruction, a missing step, a wrong assumption.

Propose and apply. For each finding, draft the exact change to the skill file, show it to me, and apply it if I approve.

Reflect carries the guardrail that matters most: a change only counts if the agent can quote the exact moment of friction behind it. Ask an AI to improve something and it always finds something, polish and busywork, because coming back empty-handed feels like failing the task. The quotable-moment rule keeps the retro honest, and most runs rightly end with nothing to change.

Making the retro survive the run

A retro is the last thing the skill does, and the last step of a long run is the easiest to quietly drop. By the time a long transcript is done, the model’s working memory has been summarized down to make room for new detail.

That is context rot: an instruction near the top of the skill thins out on the way to the end. The longer the run, the more likely the retro never happens.

The fix is almost dull. The skill opens every run by writing a task list, the retro as its final item, and works it from the top. That list lives outside the conversation, so it does not decay as the context fills. An unfinished last task keeps nagging until the retro happens: no hard gate, just a checklist that refuses to look done.

What happened on the first real run

Then I used the skill for real on a working-session transcript: decisions, loose ends, promises.

The analysis went fine. Then the retro ran, and the scoreboard came back: eight. I had corrected the agent eight times in one run.

Eight is a lot, but the count is not the interesting part. What the retro did with it is.

Five of those eight had the same root cause

The retro grouped the corrections and found that five of the eight were the same mistake wearing different clothes.

The skill had been treating the transcript as the only source of truth. It wrote down whatever the transcript said, even when better, more current information was one query away.

Two examples, kept generic:

The transcript had someone returning from leave “Monday or Tuesday.” I wrote “Monday or Tuesday.” The team wiki had the exact date.
I described the work board from what people said in the meeting. The issue tracker had the real state, which had moved on since.

Same root cause, five times. A meeting is a snapshot of what one room believed at one moment. The systems of record know what is true now, and the skill never told the agent to check them.

The fix wrote itself

The retro’s proposal was specific: add a step to the extraction phase, reconcile against systems of record. Before presenting any fact, check it against the sources that would know, the project’s own files, the issue tracker, the team wiki, the analytics database. And never park something as an open question if one of those can answer it now.

It drafted the exact wording. I approved it, and it went into the skill file in the same session that surfaced the problem.

The retro is older than the tools

A skill without this loop is frozen at version one, exactly as good as the day you wrote it. Every correction you make afterward evaporates when the session ends, and you fix the same thing forever.

None of this is new. The retrospective is the oldest habit in agile: end an iteration, ask what went well, what hurt, what changes next. What changed is only what it inspects and who acts on it, an AI skill, reviewed by the AI itself.

That is where the 10x actually comes from. AI gives you speed, and speed with no feedback loop just reaches the wrong place faster. A twenty-year-old discipline is the steering. Old practice, new tool, finally compounding.

So add a retro step to one of your own AI workflows this week, and tell me how it goes. I want to know what the rest of you find.