An illustrated figure of a woman who is examining a code pipeline. She is scratching her head to indicate that she's confused.

Code review is theater now

John Bristowe

Back in March, Gene Kim shared a conversation he had with Jez Humble on LinkedIn. Jez made a beautifully sarcastic remark:

Don’t worry about code reviews, Gene. Code reviews and approvals have always involved a lot of theater. We just need to perpetuate that illusion a little longer and keep pretending that humans are actually reviewing all that agent-generated code.

Jez is absolutely right; code reviews do involve a lot of theater, especially now in the era of AI-generated code. In the short time since that post, the trend has only become more pronounced. Code review used to be considered a solid approach to ensuring quality and compliance. It just isn’t anymore, and we need to be honest about its effectiveness for development teams today.

The chocolate belt wrappers

Consider the all-too-familiar process of reviewing a pull request (PR). The notification bell icon lights up, indicating that you have something to review. You open it, review the code, slap “LGTM” on it, and click “approve.” It compiles. Ship it.

Now consider the scenario in which agents write the majority of the PRs. You probably know how this ends up if you’ve ever seen the “Job Switching” episode of I Love Lucy.

Lucy and Ethel wrapping chocolates

In the episode, Lucy and Ethel take jobs on an assembly line wrapping chocolates. Everything starts fine until the belt speeds up. Lucy and Ethel can’t keep pace, so they start hiding chocolates wherever they can. The chocolates keep coming. The wrapping of chocolates, what we call code review, becomes theater.

In our world, these chocolates are PRs, AI coding agents are the belt, and code review is Lucy, frantically trying to keep up while the quality of what’s getting through drops with every passing minute. A lot of what’s coming off that belt can be slop. It compiles. (Or, sometimes not.) If you’re lucky, it passes your test matrix. Looking closely at the code, it looks fine until you realize the model copied a pattern from its training data that doesn’t actually fit your problem. The person reviewing would likely not realize this. The reviewer would likely check whether the syntax and structure are correct, not whether the code should exist in the first place.

Agents can produce huge chunks of code in the time it takes to read this sentence. The PRs reflect this. Now consider the burden this places on a reviewer. Is it reasonable to evaluate a 40,000-line change? Does it get better if we atomize it into 4,000 tiny 10-line diffs? You can read each diff and still miss whether it’s the right change. That’s because you weren’t part of the reasoning that produced it. You have no context whatsoever. It’s like flipping to the middle of a book and claiming you know where you are in the story.

Yes, AI makes producing code much, much faster. However, reviewing that code has become much, much harder. As an industry, we tout and celebrate the speed. But we don’t talk about the PRs piling up, putting everyone downstream under pressure.

The chocolate belt speeds up

If you take a look at the 2026 DORA report, 90% of developers now use AI tools at work. Developers are spending 2+ hours a day with these tools, completing 21% more tasks and merging 98% more pull requests.

With great power comes great responsibility. The average number of bugs per developer is up 54%. Faros AI’s analysis of 10,000+ developers found incidents per pull request are up 242.7%. We’ve essentially doubled our merge rates while each merged pull request is more than three times as likely to cause an incident. We see the impact of AI-generated code in our own data, too. Our 2026 AI Pulse report found that AI reduces task hours across every part of the delivery pipeline except for code review. 72% of developers use AI to write code, but only 56% bother using it for their reviews. The chocolate belt is accelerating, and Lucy and Ethel are starting to look nervous.
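As a back-of-the-envelope check on those percentages, here is a small sketch that turns each "up N%" figure into a multiplier. The last line multiplies the two figures together, which assumes they can be combined as if they described the same teams; that is my assumption for illustration, not something either report claims.

```python
def multiplier(percent_increase: float) -> float:
    """Convert an 'up N%' figure into a multiplier (up 100% means 2x)."""
    return 1 + percent_increase / 100


# Figures quoted in the text above.
merge_rate = multiplier(98)          # 98% more pull requests merged
incidents_per_pr = multiplier(242.7) # incidents per pull request up 242.7%

# Assumption for illustration only: both figures apply to the same team,
# so total incidents scale with (PRs merged) x (incidents per PR).
total_incidents = merge_rate * incidents_per_pr

print(f"merge rate:        {merge_rate:.2f}x")
print(f"incidents per PR:  {incidents_per_pr:.2f}x")
print(f"total incidents:   {total_incidents:.2f}x")
```

Under that assumption, a team merging almost twice as many PRs, each more than three times as likely to cause an incident, would see total incidents grow by well over six times.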

To be fair, Daniel Stenberg, the creator of curl, recently noted that AI-generated contributions have gone from slop to genuinely good. Problem solved, right? Not quite. PRs are arriving faster than his team can review them. We have better chocolates, but the same belt speed problem. Our review queue is starting to resemble a backlog.

So what do we do about it? The prevailing sentiment right now is to chuck AI at the review problem, too. Make Ethel check Lucy’s work. But think about that. They’re standing at the same belt and they’ve trained on the same data. They have the same blind spots. “AI reviewed it so we’re good” is the new “the dog ate my homework.” Except now the dog wrote the homework, ate it, barfed it up, and gave it an A+.

That’s the real takeaway from the DORA data. AI is an amplifier. It can amplify our intelligence or our stupidity. We need to be careful. Right now, a lot of us have the chocolate belt of PRs cranked up to full speed.

Enter the wrapping machine

The chocolate belt does exactly what it’s supposed to do. The wrapping process (code review) is what failed. And the fix has been staring us in the face since the Continuous Delivery (CD) movement began. Our deployment pipeline is the assurance mechanism, not the human with the approve button. If quality and security requirements are missing from the pipeline as automated checks, code review will never ensure they are met. We are simply hoping that a human – somewhere in the chain – might catch the problem.

Yes, we still need people who can look at a system and say, “This is the wrong approach.” That’s not going away. But we’re expecting that same person also to be the last line of defense against every bug and every security gap in every deployment. That was never going to work. We just didn’t have a reason to admit it until now.

What’s actually in the chocolate

Let’s stop pretending code review is something it isn’t.

Code review is great for knowledge sharing and catching design-level issues. But it’s horrible at catching every bug in a 40,000-line diff. Bugs matter when code is shipping to production.

So the question we should be asking ourselves isn’t “how do we make code review scale?” It’s “how do we build a pipeline that can verify what it’s shipping, regardless of who or what wrote the code?”

Policy-as-code is one way to get there. We write rules that define our deployment standards, and the pipeline checks every deployment against them. The developer sees what went wrong and how to fix it. There’s no waiting around for someone to review a diff.
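To make that concrete, here is a minimal sketch of the idea in Python. Everything in it is hypothetical: the Deployment fields, the three rules, and the evaluate function are names I made up for illustration, not the API of any particular policy engine. The point is only the shape: rules are data, the pipeline runs them against every deployment, and each failure comes with a fix.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Deployment:
    """Facts the pipeline knows about a release candidate (hypothetical fields)."""
    tests_passed: bool
    critical_vulnerabilities: int
    has_rollback_step: bool


@dataclass(frozen=True)
class Policy:
    """One deployment rule, plus the advice shown when it fails."""
    name: str
    is_satisfied: Callable[[Deployment], bool]
    how_to_fix: str


# Hypothetical deployment standards, expressed as code instead of tribal knowledge.
POLICIES = [
    Policy(
        "tests-must-pass",
        lambda d: d.tests_passed,
        "Fix the failing tests and re-run the pipeline.",
    ),
    Policy(
        "no-critical-vulnerabilities",
        lambda d: d.critical_vulnerabilities == 0,
        "Upgrade or patch the flagged dependencies.",
    ),
    Policy(
        "rollback-step-required",
        lambda d: d.has_rollback_step,
        "Add a rollback step to the deployment process.",
    ),
]


def evaluate(deployment: Deployment, policies: list[Policy] = POLICIES) -> list[str]:
    """Return one 'what went wrong and how to fix it' message per violated policy."""
    return [
        f"{policy.name}: {policy.how_to_fix}"
        for policy in policies
        if not policy.is_satisfied(deployment)
    ]


candidate = Deployment(tests_passed=True, critical_vulnerabilities=2, has_rollback_step=False)
for violation in evaluate(candidate):
    print(violation)
```

Notice that nothing in the check asks who or what wrote the code. An empty list means the deployment meets the standards; anything else blocks the release and tells the developer what to do next.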

Learning to wrap chocolate

It would be foolish of me not to mention that there’s something a chocolate wrapping machine can’t teach you: how to wrap chocolate. In our world, that’s the act of conducting a code review. It’s how junior engineers develop judgment.

Mentorship comes from reading other people’s code, getting feedback on your own, and absorbing the unwritten reasons behind certain decisions. That pipeline is already breaking. 73% of organizations have reduced junior developer hiring in the past two years. Junior devs dropped from 32.8% to 24.8% of Stack Overflow respondents between 2024 and 2025. If we let that continue without figuring out another way for juniors to learn, we’re in trouble. We end up with a generation of engineers who can prompt effectively but can’t reason about a system’s design.

I’m not saying we need to remove code review. But we need to stop kidding ourselves that it’s the all-seeing, all-knowing quality gate we’ve built it up to be.

Wrapping up

To reiterate, Jez was right. Code review worked well enough when humans wrote the code and the volume stayed within what humans could review. It was good enough when a team merged a handful of PRs a day. However, it’s not good enough when AI is generating them.

The answer isn’t a better performance. It’s a better pipeline. One that can prove our software works before it hits production. The CD community has been saying this for years. Most of us just didn’t have a reason urgent enough to listen. But with the advent of AI and code generation, we’re now compelled to.

If our quality gates live in our pipeline, it doesn’t matter whether the code was written by a human, an AI, or a very determined cat walking across a keyboard.

If the quality of our code reviews is determined by a human’s abilities, we’re in trouble, because AI sped up the belt, and the chocolates aren’t going to wrap themselves.

Happy deployments!

John Bristowe
