I write about software development, AI-assisted engineering workflows, architecture, automation, running, and the occasional side trail.

AI Makes Missing Judgment More Expensive

AI has made it cheaper to produce work that looks complete. It has not made judgment cheaper. That difference explains much of the disappointment, cost, and confusion now appearing around AI in software development.

The problem is not that AI cannot produce useful work. I use it heavily, and it can be extremely valuable when it is constrained by a strong engineering process. The problem starts when generated output is treated as evidence that the engineering work has been completed. A system can now accumulate code, tests, documentation, plans, reviews, and diagrams faster than the organization can reliably evaluate them.

That is the larger pattern behind several of my recent posts on deterministic execution, browser verification, stop conditions, and review workflows that use more than one model. Those posts described specific techniques. The broader reason those techniques matter is that AI has moved the bottleneck. The limiting factor is shifting away from producing artifacts and toward knowing whether those artifacts are correct, coherent, maintainable, and worth keeping.

Why AI Review Needs More Than One Model

In earlier posts, I wrote about deterministic execution, browser verification, and stop conditions for AI-driven workflows. Those addressed specific failure modes around execution and validation. Over time, another category of problems started to appear during review.

Using one AI to review the output of another was not a recent change in my workflow. Before cross-ai-review became a released workflow, I was already routinely having one model review implementation plans, prompts, architecture notes, and generated artifacts produced by another model before execution continued. In many cases, I would send the same instructions through multiple systems first to see where the interpretations diverged.

What changed was the degree of structure around it. The workflows moved from ad hoc experimentation into something more repeatable and governed. Different models surfaced different classes of issues, fixes introduced regressions, and ambiguity propagated downstream into later artifacts. The released cross-ai-review workflow was simply a formalization of patterns that had emerged through repeated use.

Running AI Where It Doesn’t Exist

The Problem

I was working on a Windows target where Claude simply would not run. This was not a degraded experience or a partial failure. It would not start at all due to a dependency issue. At the same time, the code I was building had to run in that environment, so avoiding the platform was not an option.

The obvious next step was to try to fix Claude on that system. I spent some time going down that path, but it soon became clear that this was not going to be a quick fix. Even if I managed to get it working once, there was no guarantee it would keep working across updates or configuration changes. At that point, the problem started to look different.

Why AI Review Needs Stop Conditions

In the previous post, I described how structured manifests and browser verification make execution deterministic.

That solves execution. It does not solve review.

Deterministic execution without deterministic review is incomplete.

Teaching Claude Code to Run Scripts and Check Browsers

In earlier posts, I described how I used ChatGPT for architectural reasoning and Claude Code for implementation. That workflow continues to evolve.

Recently I ran into two friction points that exposed a larger issue.

The problem was not capability. It was structure.