Experiment·Mar 11, 2026·5 min

Who Reviews the Reviewers?

Echo — a seven-agent PR army that has to earn the right to merge.

Here's the problem nobody wants to say out loud: agentic coding broke code review.

Not subtly. Once you hand engineers proper agentic tooling, the writing of code stops being the bottleneck and the reviewing of it becomes one. The firehose is real: AI ships diffs faster than any human can read them, and the team's senior people, the ones whose review actually means something, become the single slowest part of the pipeline. You can see where this goes. Either review becomes a rubber stamp, which is worse than no review because now you've got the illusion of safety, or it becomes a queue, and your two-to-four-times coding speedup evaporates while perfectly good PRs sit waiting for a human with the right context to wake up.

In a regulated insurer, "rubber stamp" isn't an option and "queue forever" isn't either. So I built Echo.

Echo is a seven-agent PR review system, modelled on the multi-agent research patterns Anthropic published — the insight being that you get better results from a small team of specialists than from one enormous prompt trying to hold every concern in a single context window at once. One model asked to simultaneously check security, test coverage, architectural fit, style, governance and blast radius does all of them at about a B-minus. Seven agents each carrying one lens, coordinated, do each of them properly and then reconcile. (The exact roster — who owns which lens and how they hand off — is the one bit I've left out of this post on purpose; it's the part most worth getting right and I'd rather describe it accurately than from memory.)

The interesting parts of Echo aren't the agents, though. They're the two things bolted to either side of them.

On one side: it knows the rules instead of guessing them. Echo doesn't have a vibe about your security posture — it reads it. It consumes the same policy-as-code that the Governor Module injects at invocation and asserts in CI, which means cyber, cloud and design each get to encode what "acceptable" means in their domain, and Echo reviews against that, not against whatever a base model thinks good practice looked like at training time. Governance stops being a human gate bolted on at the end and becomes a thing the reviewer already carries. A finding from Echo isn't "this feels off" — it's "this violates the policy your security team wrote, here's the clause."
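To make "here's the clause" concrete, here is a minimal sketch of reviewing against policy-as-code. Everything in it is an assumption for illustration: the `Policy`, `Change` and `Finding` types, the clause ids like `CYBER-4.2`, and the two toy rules are hypothetical, and none of it reflects the Governor Module's real format. The only point is the shape: each team owns its clauses, and every finding carries the clause it came from.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Change:
    """One changed file in a PR (hypothetical shape)."""
    path: str
    added_lines: tuple[str, ...]


@dataclass(frozen=True)
class Policy:
    """A single clause owned by one team (hypothetical shape)."""
    clause: str                        # made-up id, e.g. "CYBER-4.2"
    owner: str                         # the team that wrote the rule
    description: str
    violates: Callable[[Change], bool]


@dataclass(frozen=True)
class Finding:
    clause: str
    owner: str
    path: str
    message: str


def review(changes: list[Change], policies: list[Policy]) -> list[Finding]:
    """Check every change against every clause; cite the clause on a hit."""
    findings = []
    for change in changes:
        for policy in policies:
            if policy.violates(change):
                findings.append(
                    Finding(policy.clause, policy.owner, change.path, policy.description)
                )
    return findings


# Toy rules, invented for this sketch only.
POLICIES = [
    Policy(
        "CYBER-4.2", "cyber", "No hard-coded credentials in source",
        lambda c: any("password=" in line.replace(" ", "").lower()
                      for line in c.added_lines),
    ),
    Policy(
        "CLOUD-1.7", "cloud", "Storage buckets must not be public",
        lambda c: c.path.endswith(".tf")
        and any('acl = "public-read"' in line for line in c.added_lines),
    ),
]
```

The design choice worth noticing is that the reviewer holds no opinions of its own here. Swap the list of policies and the same loop enforces a different rulebook, which is also why wrong rules get enforced just as faithfully as right ones.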

On the other side: it can see past the diff. A reviewer blind to the rest of the org will happily approve a change that's locally perfect and breaks three downstream teams next sprint. Echo is wired into Skynet, the cross-repo context graph, so it assesses upstream and downstream impact across all 3,600-odd repos rather than the handful of files in front of it. That's the difference between "this PR is fine" and "this PR is fine and also nothing it touches detonates elsewhere."
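Seeing past the diff is, at its simplest, a graph walk. The sketch below assumes the context graph can be reduced to a mapping from each repo to the repos that depend on it directly; that mapping, the function name `downstream_impact` and the repo names in the usage note are all hypothetical, and the real graph is certainly richer than this. It does a breadth-first search and reports how many hops away each affected repo sits.

```python
from __future__ import annotations

from collections import deque


def downstream_impact(dependents: dict[str, set[str]], changed: str) -> dict[str, int]:
    """Return every repo that transitively depends on `changed`.

    `dependents` maps a repo to the repos that consume it directly.
    The result maps each affected repo to its hop distance from the change.
    """
    distance: dict[str, int] = {}
    seen = {changed}                      # guards against dependency cycles
    queue = deque([(changed, 0)])
    while queue:
        repo, hops = queue.popleft()
        for consumer in sorted(dependents.get(repo, ())):
            if consumer not in seen:
                seen.add(consumer)
                distance[consumer] = hops + 1
                queue.append((consumer, hops + 1))
    return distance
```

With a made-up graph where `quote-api` and `claims-batch` consume `pricing-lib`, and `broker-portal` consumes `quote-api`, a change to `pricing-lib` comes back with two repos at one hop and `broker-portal` at two. A diff-only reviewer would never have seen that last one.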

But the part I'm proudest of — and the part that's genuinely a bit novel — is that Echo has to earn the right to auto-approve.

This is the whole game, really. Everyone wants the "danger zone" auto-merge dream where the robots just ship and you go to the pub. The flag is right there in my name; I'm not philosophically opposed to a bit of velocity. But in a regulated environment you cannot just flip auto-approve on and hope. So Echo doesn't get auto-approve as a setting. It earns it, narrowly, over time. It starts conservative — flags everything, merges nothing on its own — and it runs a learning loop that tunes its own behaviour against the outcomes: where did its calls agree with the humans, where did it overstep, where was it needlessly cautious. As that track record builds, its lane widens, but only in the specific, low-risk shapes it's demonstrably good at. Trivial dependency bumps, mechanical refactors, the nothing-burger PRs — it can earn the right to wave those through unattended. The gnarly architectural stuff stays firmly in front of a human, and stays there until the evidence says otherwise.
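One way to picture "earns it, narrowly, over time" is a per-category ledger. This is a sketch under assumptions, not Echo's actual loop: the `TrustLedger` and `Lane` names, the category labels, the sample minimum and the 0.95 threshold are all invented here, and the Wilson score lower bound is my own choice of a conservative estimate rather than anything the system is stated to use. The idea it illustrates is that a lane only opens for a category that is both on the low-risk list and backed by enough agreement with human reviewers.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class Lane:
    """Track record for one category of PR."""
    agreed: int = 0
    total: int = 0

    def lower_bound(self, z: float = 1.96) -> float:
        """Wilson score lower bound on the agreement rate (pessimistic estimate)."""
        if self.total == 0:
            return 0.0
        n = self.total
        p = self.agreed / n
        centre = p + z * z / (2 * n)
        margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
        return (centre - margin) / (1 + z * z / n)


@dataclass
class TrustLedger:
    eligible: frozenset[str]              # low-risk shapes that may ever be auto-approved
    min_samples: int = 50                 # assumed threshold, not a real setting
    min_lower_bound: float = 0.95         # assumed threshold, not a real setting
    lanes: dict[str, Lane] = field(default_factory=dict)

    def record(self, category: str, agent_verdict: str, human_verdict: str) -> None:
        """Log one reviewed PR: did the agent's call match the human's?"""
        lane = self.lanes.setdefault(category, Lane())
        lane.total += 1
        lane.agreed += int(agent_verdict == human_verdict)

    def may_auto_approve(self, category: str) -> bool:
        """Starts False everywhere; flips only for eligible, well-evidenced lanes."""
        lane = self.lanes.get(category)
        if lane is None or category not in self.eligible:
            return False
        return (lane.total >= self.min_samples
                and lane.lower_bound() >= self.min_lower_bound)
```

Because the bound is pessimistic, a short perfect streak is not enough to open a lane, and a run of disagreements narrows it again. Categories outside `eligible` never open no matter how good the record looks, which keeps the architectural calls in front of a human.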

Trust isn't a toggle. It's a thing you accrue. Echo is built so that an agent accrues it the same way a junior engineer does — by being right, repeatedly, in public, where its mistakes are cheap and visible — rather than by being granted it on day one because the demo looked good.

I'll be honest about what Echo doesn't do, because the failure mode of posts like this is to imply it solved everything. It didn't replace senior review; it changed what senior review is for. The humans stopped reading mechanical diffs and started spending their attention on the genuinely ambiguous calls (the architectural forks, the "should we even do this" questions), which is where a senior engineer's judgement was always meant to go. It also doesn't fix a broken policy: if the rules cyber and design encode are wrong, Echo will enforce wrong rules with great consistency. Garbage in, beautifully reviewed garbage out. And the learning loop needs guarding: an agent that learns from outcomes can learn the wrong lesson if the outcomes it's learning from were themselves rubber stamps. So the loop is supervised, deliberately, because an unsupervised self-tuning reviewer is just a polite way to build a machine that slowly talks itself into anything.

The bigger picture is the one I keep banging on about: the PR firehose doesn't slow down. AI commit volume only goes up from here, and the answer was never going to be "hire more reviewers" or "review less." It was always going to be better refereeing — review that scales with the volume, knows the actual rules, sees the whole field, and earns its autonomy one correct call at a time.

That's Echo. Seven agents, a rulebook it actually reads, a map of everything it might break, and a probation period it never fully graduates from.

Which is, when you think about it, exactly how you'd want to manage any reviewer you didn't entirely trust yet. Robot or otherwise.

#ai #sdlc #code-review #multi-agent #governance #prometheus
