When a developer writes code, a peer reviews it. That process works because humans produce code at a pace humans can review. But when AI generates code 10-25x faster, review becomes the bottleneck for the entire pipeline. We learned this the hard way, and then we built a system to fix it.
The Volume Problem
On a typical agentic development project, a single developer directing AI agents can produce what used to take a team of five. That is great for velocity. It is terrible for code review. You cannot ask one human reviewer to carefully examine 5x their normal volume and expect them to catch the same percentage of issues. The math does not work.
We tried scaling human review for about two months. Reviewers started rubber-stamping. Subtle bugs slipped through. A credentials file made it into a staging deployment. That was the incident that made us build automated review into the pipeline.
What Our Automated Review Catches
At CenterConsulting, every commit goes through our /review-code process before it can be merged. Here is what it checks:
Security scanning. This is the highest priority. The review checks for hardcoded credentials, API keys, database connection strings, and PII (personally identifiable information) in code, comments, and test fixtures. AI agents are surprisingly prone to generating example data that includes realistic-looking email addresses, phone numbers, or even test credentials from their training data. We flag and block these automatically.
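A scan like this can be sketched with a small set of regular expressions. The patterns below are illustrative assumptions, not the actual rule set behind /review-code; a production scanner would use a vetted, regularly updated library of detectors.

```python
import re

# Illustrative patterns only -- a real scanner would use a vetted rule set.
SECRET_PATTERNS = {
    "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "generic_api_key": re.compile(r"(?i)(api[_-]?key|secret)\s*[:=]\s*['\"][^'\"]{12,}['\"]"),
    "email_pii": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "db_conn_string": re.compile(r"(?i)(postgres|mysql|mongodb)://\S+:\S+@\S+"),
}

def scan_text(text: str) -> list[tuple[str, int]]:
    """Return (pattern_name, line_number) for every suspicious match."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((name, lineno))
    return findings
```

Running this over code, comments, and test fixtures alike is what catches the realistic-looking example data that AI agents tend to emit.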
Style and convention enforcement. Every project has a CodingStyle.md and often a project-specific StyleGuide.md. The automated review checks that AI-generated code follows these conventions — naming patterns, error handling approaches, import organization, file structure. This catches the drift problem where different AI sessions produce code in subtly different styles.
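One mechanical piece of this check can be illustrated with Python's ast module: walking a file and flagging function names that break a naming rule. The snake_case rule here is an assumed example of the kind of convention a CodingStyle.md might specify, not a quote from ours.

```python
import ast
import re

# Assumed convention for illustration: function names must be snake_case.
SNAKE_CASE = re.compile(r"^[a-z_][a-z0-9_]*$")

def check_function_naming(source: str) -> list[str]:
    """Flag function definitions that break the assumed snake_case rule."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef) and not SNAKE_CASE.match(node.name):
            violations.append(f"line {node.lineno}: '{node.name}' is not snake_case")
    return violations
```

Checks like this are what make style drift visible across AI sessions: each session's output is measured against the same machine-readable rule rather than a reviewer's memory.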
Documentation coverage. If a commit adds a new public API endpoint, there must be corresponding documentation. If it modifies an existing interface, the docs must be updated. Stale documentation is flagged as a blocking issue. This enforces the documentation-first principle of the CenCon Method at the commit level.
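At its simplest, this check only needs the list of files touched by a commit. The sketch below assumes a hypothetical layout where public API code lives under api/ and its documentation under docs/; the real rule would come from the project's own structure.

```python
def check_doc_coverage(changed_files: set[str]) -> list[str]:
    """If public API code changed, require a docs change in the same commit.

    The api/ and docs/ prefixes are assumed for illustration.
    """
    api_changed = any(f.startswith("api/") for f in changed_files)
    docs_changed = any(f.startswith("docs/") for f in changed_files)
    if api_changed and not docs_changed:
        return ["blocking: api/ changed without a docs/ update"]
    return []
```

Because the finding is blocking, a commit that modifies an interface without touching its docs simply cannot proceed, which is how the documentation-first principle gets enforced mechanically rather than by reviewer vigilance.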
Dependency and import analysis. AI agents sometimes introduce unnecessary dependencies or import modules that create circular references. The review checks for new dependencies that were not in the original requirements and flags them for human decision.
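The "new dependency" half of this check can be sketched by diffing a file's imports against an approved set, again using the ast module. The approved set here stands in for whatever the project's declared requirements are.

```python
import ast

def find_new_imports(source: str, approved: set[str]) -> set[str]:
    """Top-level modules imported by `source` that are not in the approved set."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module.split(".")[0])
    return found - approved
```

Anything this returns is flagged for a human decision rather than auto-blocked, since an unapproved dependency is sometimes the right call and sometimes an agent reaching for a library nobody wanted.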
What Humans Still Catch Better
Automated review is not a replacement for human judgment. It is a filter. After thousands of commits through this system, here is what the automation reliably catches that humans miss: credential leaks, PII in test data, style drift across sessions, and missing documentation updates.
Here is what humans still catch better: architectural misalignment (the code works but should not live in that module), over-engineering (the AI built an abstraction layer nobody asked for), and business logic errors where the code does exactly what was specified but the specification was wrong.
The Three-Layer Approach
Our process works in three layers:
Layer 1: AI generates. The developer directs the AI agent to implement a feature based on documented requirements and coding standards.
Layer 2: Automation validates. The /review-code process runs automatically. Blocking issues must be resolved before the commit can proceed. Non-blocking issues are flagged for the human reviewer.
Layer 3: Human approves. A developer reviews the code with the automated report in hand. They focus on architecture, business logic, and design decisions — the things automation cannot evaluate. The automated review has already handled the mechanical checks.
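The handoff between layers 2 and 3 can be sketched as a simple gate: blocking findings stop the commit, non-blocking findings travel with it to the reviewer. The Finding shape and field names here are assumptions for illustration, not the actual /review-code data model.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    message: str
    blocking: bool  # True = must be fixed before the commit proceeds

def gate(findings: list[Finding]) -> dict:
    """Layer 2 decision: block on any blocking finding, pass the rest along."""
    blockers = [f.message for f in findings if f.blocking]
    advisory = [f.message for f in findings if not f.blocking]
    return {
        "can_proceed": not blockers,   # layer 2 verdict
        "must_fix": blockers,          # resolved before merge
        "for_reviewer": advisory,      # attached to the layer 3 report
    }
```

The point of the split is that the human reviewer never sees a clean-looking diff with a buried credential; by the time the report reaches them, the mechanical problems are already resolved or itemized.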
This layered approach means human reviewers spend their time on judgment calls, not on spotting a misplaced API key in line 847 of a generated file. The result is faster review cycles with higher quality outcomes than either automation or humans could achieve alone.
If you are generating code with AI at any meaningful scale, you need automated review in your pipeline. Human review alone will not keep up, and the consequences of what slips through get more expensive the longer they sit in your codebase.