Why AI Review Needs More Than One Model

In earlier posts, I wrote about deterministic execution, browser verification, and stop conditions for AI-driven workflows. Those addressed specific failure modes around execution and validation. Over time, another category of problems started to appear during review.

Using one AI to review the output of another was not a recent change in my workflow. Before cross-ai-review became a released workflow, I was already routinely having one model review implementation plans, prompts, architecture notes, and generated artifacts produced by another model before execution continued. In many cases, I would send the same instructions through multiple systems first to see where the interpretations diverged.

What changed was the degree of structure around it. The workflows moved from ad hoc experimentation into something more repeatable and governed, because the same patterns kept recurring: different models surfaced different classes of issues, fixes introduced regressions, and ambiguity propagated downstream into later artifacts. The released cross-ai-review workflow was simply a formalization of patterns that had emerged through repeated use.
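The core of that formalization can be sketched in a few lines: run the same artifact past two independent reviewers and surface where their findings diverge. The reviewer functions below are hypothetical local stand-ins for model calls, not real APIs; the point is the divergence report, not the checks themselves.

```python
def review_with_model_a(artifact: str) -> set[str]:
    # Stand-in for sending the artifact to one model for review.
    issues = set()
    if "TODO" in artifact:
        issues.add("unresolved TODO")
    if "password" in artifact.lower():
        issues.add("possible hardcoded secret")
    return issues

def review_with_model_b(artifact: str) -> set[str]:
    # Stand-in for a second, independent reviewer with different blind spots.
    issues = set()
    if "TODO" in artifact:
        issues.add("unresolved TODO")
    if len(artifact.splitlines()) > 200:
        issues.add("artifact too long to review reliably")
    return issues

def cross_review(artifact: str) -> dict[str, set[str]]:
    a = review_with_model_a(artifact)
    b = review_with_model_b(artifact)
    return {
        "agreed": a & b,    # both reviewers flagged these
        "only_a": a - b,    # divergent findings worth a closer look
        "only_b": b - a,
    }

report = cross_review("TODO: set password = 'hunter2'")
```

The divergent buckets are the interesting output: agreement is cheap confirmation, while issues only one reviewer caught are exactly the class of problem a single-model review would have missed.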

Running AI Where It Doesn’t Exist

The Problem

I was working on a Windows target where Claude simply would not run. This was not a degraded experience or a partial failure. It would not start at all due to a dependency issue. At the same time, the code I was building had to run in that environment, so avoiding the platform was not an option.

The obvious next step was to try to fix Claude on that system. I spent some time going down that path, but it soon became clear that this was not going to be a quick fix. Even if I managed to get it working once, there was no guarantee it would continue working across updates or configuration changes. At that point, the problem started to look different.

Why AI Review Needs Stop Conditions

In the previous post, I described how structured manifests and browser verification make execution deterministic.

That solves execution. It does not solve review.

Deterministic execution without deterministic review is incomplete.
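One way to make review deterministic is to bound it the same way execution is bounded: review, apply fixes, re-review, and stop either when a round comes back clean or when a round budget runs out. This is a minimal sketch of that loop; the reviewer and fixer here are toy stand-ins, not the actual workflow.

```python
from typing import Callable

def review_until_stable(
    artifact: str,
    reviewer: Callable[[str], set[str]],
    fixer: Callable[[str, set[str]], str],
    max_rounds: int = 3,
) -> tuple[str, int]:
    """Return the revised artifact and the number of rounds used."""
    for round_num in range(1, max_rounds + 1):
        issues = reviewer(artifact)
        if not issues:                       # stop condition 1: clean review
            return artifact, round_num
        artifact = fixer(artifact, issues)   # may itself introduce regressions
    return artifact, max_rounds              # stop condition 2: budget exhausted

# Toy reviewer/fixer pair to exercise the loop.
reviewer = lambda text: {"TODO left in text"} if "TODO" in text else set()
fixer = lambda text, issues: text.replace("TODO", "DONE")

result, rounds = review_until_stable("TODO: handle errors", reviewer, fixer)
# result == "DONE: handle errors", reached in 2 rounds
```

The round budget matters because fixes can introduce regressions: without it, a reviewer and fixer that keep trading issues back and forth would loop forever.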

Teaching Claude Code to Run Scripts and Check Browsers

In earlier posts, I described how I used ChatGPT for architectural reasoning and Claude Code for implementation. That workflow continues to evolve.

Recently I ran into two friction points that exposed a larger issue.

The problem was not capability. It was structure.

A Better Way to Use AI in Software Development

There’s a lot of hype right now around using AI in software development. And to be fair, the hype isn’t entirely misplaced. AI can write code. It can summarize, generate, and even refactor. But there’s a major problem with the way most people use it. The problem is guessing.

When AI Has to Guess

If your prompt is not sufficiently clear, the AI has no choice but to guess what you meant. This guessing introduces subtle, and sometimes major, errors. It might guess wrong about your naming conventions, your error-handling preferences, your architectural style, or your end goals. Even if you think the prompt is clear, the AI may read it differently than you do.
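One way to close those gaps is to state the conventions explicitly and render them into the prompt, rather than hoping the model infers them. This is a sketch of that idea; the field names are illustrative, not a prescribed schema.

```python
from dataclasses import dataclass

@dataclass
class PromptSpec:
    task: str
    naming_convention: str
    error_handling: str
    architecture_note: str

    def render(self) -> str:
        # Every convention the model might otherwise guess at is spelled out.
        return (
            f"Task: {self.task}\n"
            f"Naming convention: {self.naming_convention}\n"
            f"Error handling: {self.error_handling}\n"
            f"Architecture: {self.architecture_note}\n"
        )

spec = PromptSpec(
    task="Add a retry wrapper around the HTTP client",
    naming_convention="snake_case functions, PascalCase classes",
    error_handling="raise typed exceptions; never swallow errors",
    architecture_note="keep transport concerns out of business logic",
)
prompt = spec.render()
```

The structure is doing the work here: a required field you leave blank is a visible gap in your spec, whereas an unstated convention in free-form prose is an invisible invitation to guess.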