Code Review Is Your New Bottleneck (And AI Made It Worse)
By Dave O’Dell
There’s a pattern I keep seeing. An engineering team adopts AI coding tools. Developers start shipping code faster. Everyone’s excited for about six weeks. Then the metrics come in and cycle time is… flat. Sometimes worse.
What happened? The bottleneck moved.
Your engineers can write code 2-3x faster now. Great. But every one of those PRs still needs to go through the same code review process, the same CI pipeline, the same deployment queue. You didn’t speed up delivery — you just created a traffic jam at the next choke point.
And the data backs this up in a way that should worry every engineering leader paying attention.
The DORA Numbers Are Alarming
Google’s DORA team — the gold standard for engineering performance research — released findings that should be mandatory reading for anyone managing an AI adoption strategy. The headlines:
- 91% increase in code review time. PRs are taking nearly twice as long to get reviewed.
- 154% increase in PR size. AI-assisted PRs are significantly larger than manually written ones.
- Review quality is declining. Reviewers are spending more time per PR but catching less, because the cognitive load of reviewing AI-generated code is different from reviewing human-written code.
Think about what this means practically. Your developer writes a feature in two hours instead of six. They open a PR that’s roughly two and a half times the size it would have been. The reviewer — who isn’t using AI for review — now needs to understand and validate a massive diff that was generated by a system optimized for producing correct-looking code, not readable code.
The reviewer is now your bottleneck. And they’re drowning.
Why AI-Generated Code Is Harder to Review
There’s a subtle problem with AI-generated code that most teams haven’t internalized yet: it’s optimized for correctness, not comprehension.
When a human writes code, they leave fingerprints everywhere. Variable names that reflect their mental model. Comments where they struggled. A refactor path you can trace through the commit history. Patterns that match the rest of the codebase because the human has been reading it for months.
AI-generated code doesn’t have those fingerprints. It’s syntactically correct, passes the linter, probably runs the tests. But it often:
- Uses generic patterns instead of project-specific conventions
- Mixes abstraction levels in ways that technically work but confuse readers
- Generates verbose code where an experienced developer would have used a well-known library or pattern
- Lacks the “why” — there’s no comment explaining the design choice because there wasn’t a design choice, there was a prompt
Reviewers are catching this intuitively. They look at a 400-line PR and something feels off. They can’t skim it the way they’d skim a colleague’s code because the patterns aren’t familiar. So they have to read every line carefully. That’s where the 91% increase in review time comes from.
The Math Nobody’s Doing
Let me walk through a concrete example.
Before AI:
- Developer writes a feature: 6 hours
- PR size: ~150 lines
- Review time: 30 minutes
- CI + deploy: 45 minutes
- Total cycle time: ~8 hours
After AI (naive adoption):
- Developer writes the same feature: 2 hours
- PR size: ~375 lines (154% increase)
- Review time: 57 minutes (91% increase)
- CI + deploy: 45 minutes (unchanged)
- Review queue wait: 4 hours (because every other developer is also shipping 3x more PRs)
- Total cycle time: ~8 hours
The developer saved 4 hours writing code. The system consumed those 4 hours in review queue wait time and longer reviews. Net improvement to delivery speed: zero.
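The arithmetic above is worth making explicit, because it generalizes: any stage you speed up just hands its saved time to the slowest stage downstream. A quick sanity-check of the numbers from the example (all figures illustrative, taken from the scenario above, not measurements):

```python
# Toy cycle-time model for the before/after scenario above.
# All numbers come from the worked example in the text; none are real data.

def cycle_time_hours(write: float, review: float, ci: float,
                     queue_wait: float = 0.0) -> float:
    """Sum the stages of one PR's journey from first keystroke to production."""
    return write + review + ci + queue_wait

# Before AI: 6h writing, 30min review, 45min CI/deploy, negligible queue.
before = cycle_time_hours(write=6, review=0.5, ci=0.75)

# After AI: 2h writing, 57min review (91% increase), same CI, 4h queue wait.
after = cycle_time_hours(write=2, review=57 / 60, ci=0.75, queue_wait=4)

print(f"before AI: {before:.2f}h, after AI: {after:.2f}h")
# The 4 hours saved writing code were absorbed almost entirely by
# queue wait and longer reviews.
```

The model is trivial, which is the point: the constraint is visible in one line of addition, yet most teams never write it down.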
This is the “CI pipeline bottleneck” pattern we wrote about, but shifted one stage upstream. You fixed — or in this case, accelerated — one part of the pipeline without addressing the downstream capacity. The system’s throughput is still gated by its slowest stage.
Three Ways to Actually Fix This
The fix isn’t to slow down code generation. You paid for those AI tools, and they are making your developers more productive at writing code. The fix is to accelerate the stages that are now the constraint.
1. AI-Assisted Code Review
If AI is generating the code, AI should also be helping review it.
I don’t mean replacing human reviewers — that’s a terrible idea and I’ve seen the bugs that produces. I mean using AI as a first pass: catch the style violations, the convention mismatches, the obvious issues. By the time a human reviewer opens the PR, the trivial stuff is already flagged and the developer has fixed it.
This can cut human review time by 30-40% in practice. Not by catching subtle logic bugs (AI is mediocre at that in large PRs), but by eliminating the noise that makes reviewers spend 10 minutes on formatting nitpicks instead of focusing on the actual logic.
The key: the AI reviewer needs context infrastructure. A generic linter won’t cut it. The AI needs to know your team’s conventions, your architecture patterns, your common anti-patterns. Without that context, AI-assisted review produces a wall of generic suggestions that reviewers learn to ignore.
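To make that concrete, here is a minimal sketch of what feeding team context into a first-pass AI reviewer might look like. The convention list, file names, and prompt wording are all invented for illustration; the point is that the reviewing model sees *your* rules, not generic ones.

```python
# Sketch of a context-aware first-pass review prompt.
# TEAM_CONVENTIONS, the helper path, and the prompt shape are hypothetical.

TEAM_CONVENTIONS = [
    "Use the shared retry helper in lib/retry.py; never hand-roll backoff loops.",
    "Database access goes through the repository layer, not raw SQL in handlers.",
    "Public functions need type hints and a one-line docstring.",
]

def build_review_prompt(diff: str) -> str:
    """Assemble a first-pass review prompt that carries project context."""
    rules = "\n".join(f"- {c}" for c in TEAM_CONVENTIONS)
    return (
        "You are a first-pass code reviewer. Flag only violations of the\n"
        "team conventions below and obvious defects; do not restyle code.\n\n"
        f"Team conventions:\n{rules}\n\n"
        f"Diff to review:\n{diff}"
    )

prompt = build_review_prompt("+ def get_user(id): ...")
```

Notice what the prompt does *not* ask for: general style opinions. Scoping the AI reviewer to your explicit conventions is what keeps its output signal instead of the ignorable wall of suggestions.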
2. Smaller, More Frequent PRs
The 154% increase in PR size is a choice, not an inevitability. AI makes it easy to generate a lot of code at once, but nothing forces you to ship it all in one PR.
Teams that manage this well establish explicit PR size limits — 200 lines max, for example — and use AI to help break large changes into logical, reviewable chunks. The AI that helped you write 500 lines of a feature can also help you split it into three PRs with clear boundaries.
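A size limit only works if it is enforced mechanically, not by nagging. A minimal sketch of a CI gate, assuming a git checkout where `git diff --numstat` against the base branch works; the 200-line budget mirrors the example limit above:

```python
# Hypothetical CI gate that fails a PR exceeding a changed-line budget.
# Assumes it runs inside a git checkout with the base branch fetched.
import subprocess
import sys

MAX_CHANGED_LINES = 200  # illustrative budget from the text

def count_numstat(numstat: str) -> int:
    """Sum added + deleted lines from `git diff --numstat` output."""
    total = 0
    for line in numstat.splitlines():
        added, deleted, _path = line.split("\t", 2)
        if added != "-":  # binary files report "-" for line counts
            total += int(added) + int(deleted)
    return total

def check_pr_size(base: str = "origin/main") -> None:
    """Exit nonzero if the PR's total diff exceeds the budget."""
    out = subprocess.run(
        ["git", "diff", "--numstat", f"{base}...HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    n = count_numstat(out)
    if n > MAX_CHANGED_LINES:
        sys.exit(f"PR changes {n} lines; limit is {MAX_CHANGED_LINES}. Split it.")
```

Wired in as a required check, `check_pr_size()` turns "keep PRs small" from a cultural aspiration into a property of the pipeline.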
Smaller PRs review faster (superlinearly, not linearly — a 100-line PR reviews in 5 minutes, a 400-line PR reviews in 45). They’re also easier to revert if something breaks, which reduces the blast radius of AI-generated bugs.
The cultural shift: stop measuring developer productivity by PR size. Start measuring it by PRs-merged-to-production. A developer who ships five 80-line PRs that each get reviewed in 10 minutes is moving faster than one who ships a single 400-line PR that sits in review for three days.
3. Fix the Review Process Itself
Most code review processes were designed for a world where developers produced 3-5 PRs per week. They assumed reviews would be a small fraction of engineering time. That assumption is now wrong.
Things that need to change:
Dedicated review time blocks. If reviews are “whenever you get to it,” they’re always the thing that gets deprioritized when you’re in flow on your own work. Teams that allocate 30-60 minutes per day specifically for reviews maintain healthier review queues.
Tiered review requirements. Not every PR needs the same level of scrutiny. An AI-generated test file doesn’t need the same review depth as a hand-written authentication flow. Create explicit tiers: automated-only review for low-risk changes, lightweight review for standard changes, deep review for critical paths.
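The tiers only work if the mapping from change type to review depth is explicit and auditable, not tribal knowledge. A sketch of one way to encode it, with invented path patterns and tier names; each PR gets the strictest tier any of its files falls into:

```python
# Sketch of tiered review routing. Path patterns and tier names are
# hypothetical; the point is an explicit, auditable change-type -> depth map.
from fnmatch import fnmatch

TIERS = [  # first pattern match wins per file
    ("deep",      ["auth/*", "payments/*", "*/migrations/*"]),
    ("automated", ["tests/*", "docs/*", "*.md"]),
]
DEFAULT_TIER = "lightweight"
STRICTNESS = {"deep": 0, "lightweight": 1, "automated": 2}  # lower = stricter

def file_tier(path: str) -> str:
    """Tier for one changed file, falling back to the default."""
    for tier, patterns in TIERS:
        if any(fnmatch(path, p) for p in patterns):
            return tier
    return DEFAULT_TIER

def review_tier(changed_paths: list[str]) -> str:
    """A PR inherits the strictest tier among all its changed files."""
    return min((file_tier(p) for p in changed_paths),
               key=STRICTNESS.__getitem__)
```

So a PR touching only `tests/` gets automated-only review, but the moment it also touches `auth/`, the whole PR escalates to deep review.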
Review load balancing. In most teams, 2-3 senior engineers end up reviewing 70% of all PRs. That’s always been a problem, but AI amplified it because now there are more PRs. Distribute review load explicitly. Track who’s reviewing what. Make it visible.
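The mechanical core of load balancing is simple once the load is tracked at all. A sketch with invented names and counts: route each new PR to the qualified reviewer currently carrying the fewest open reviews, and record the assignment so the load stays visible.

```python
# Sketch of explicit review load balancing. Reviewer names and open-review
# counts are illustrative; in practice this state would come from your
# code-review tool's API.
open_reviews = {"alice": 6, "bob": 5, "carol": 1}

def assign_reviewer(qualified: list[str]) -> str:
    """Pick the qualified reviewer with the lightest current review load."""
    choice = min(qualified, key=lambda r: open_reviews[r])
    open_reviews[choice] += 1  # record it so the next assignment sees the load
    return choice

reviewer = assign_reviewer(["alice", "carol"])  # carol: 1 open vs alice's 6
```

Even this naive greedy policy beats the default, which is "whoever the author always tags", i.e. the same two or three seniors.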
Review the prompt, not just the output. This is an emerging practice that I think will become standard. When a PR is AI-generated, include the prompt or description of what the AI was asked to do. Reviewing “I asked Claude to implement retry logic with exponential backoff for the payment API” plus the output is dramatically faster than reviewing 200 lines of retry logic cold.
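One lightweight way to operationalize this is a section in the PR template. The fields below are a hypothetical example, not a standard, using the retry-logic scenario from above:

```markdown
<!-- Hypothetical PR template section for AI-assisted changes -->
## AI assistance
- Tool used: Claude
- Task given: "Implement retry logic with exponential backoff for the
  payment API client."
- What I verified by hand: error classification, max-attempt cap, backoff timing
```

The "verified by hand" line matters as much as the prompt: it tells the reviewer which parts the author has already taken ownership of.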
The Bigger Picture: Velocity Engineering
Here’s the thing that most AI adoption conversations miss. Writing code faster is a local optimization. Velocity engineering is about the whole system.
When we work with engineering teams, the first thing we do is map the entire delivery pipeline — from ticket creation to production deployment. We measure where time actually goes. And without exception, code writing is never the longest stage. It’s usually third or fourth behind review, CI, and deployment.
AI tools just made that already-shorter stage even shorter. If you don’t address the stages that were already slower, you’ve invested in acceleration and gotten nowhere.
The teams that see real cycle time improvements from AI adoption are the ones that treat it as a pipeline problem:
- AI accelerates code writing ✓
- AI-assisted review + smaller PRs accelerate review ←
- Parallelized CI and fast test suites accelerate validation ←
- Automated deployment with good rollback accelerates release ←
You need all four. Most teams have only done #1 and are wondering why nothing changed.
The Counter-Intuitive Move
Here’s what I tell engineering leaders who show me their flat cycle time numbers post-AI-adoption: your AI investment is working. Your pipeline can’t absorb it yet.
That’s actually good news. It means the technology delivers on its promise. The engineering practice around it just needs to catch up. And catching up on review processes, CI optimization, and deployment automation is a known problem with known solutions. It’s not speculative — we’ve been solving these problems in DevOps for a decade.
The companies that crack this see the results show up fast. Once you unclog the review bottleneck and your pipeline can handle the throughput, all those developer productivity gains that were being absorbed by queue wait times suddenly flow through to actual delivery speed.
The AI adoption story isn’t “we made developers faster and everything got better.” It’s “we made developers faster, found the next bottleneck, fixed it, found the next one, fixed that.” It’s iterative. It’s engineering. It’s the whole point.
Stop celebrating faster code writing. Start measuring faster delivery.
Dave O’Dell is co-founder of App Vitals, where he and Dan McAulay help engineering teams identify and fix the bottlenecks that keep AI investments from translating into real delivery speed. When your pipeline can’t absorb your developers’ new velocity, let’s fix that.
Want to accelerate your engineering team?
Book a 30-minute discovery call to discuss your team's AI adoption strategy.
Get in Touch