improve turns AI code audits into an execution backlog instead of another one-shot review

June 11, 2026 · updates

shadcn's improve uses a stronger model to audit a codebase, verify findings, and write self-contained plans that cheaper agents or humans can execute with much less ambiguity.

Figure: GitHub README capture for shadcn/improve

A lot of AI coding workflows still break at the handoff point. One model finds problems, another model edits code, and somewhere in the middle the real engineering judgment gets lost in a pile of chat context. That is why improve caught my eye. Instead of trying to be yet another agent that jumps straight from analysis to edits, it treats the missing artifact as the real product: a tight, executable plan that weaker or cheaper agents can carry out without guessing.

That is a sharper idea than it first sounds. Most teams do not actually need their most expensive model writing every line of implementation. They need that model to understand the repo, separate signal from noise, decide what matters, and describe the work clearly enough that execution can be delegated. improve is built around that boundary.

What the project actually ships

At the top level, improve is an agent skill that audits a repository and writes implementation plans into a plans/ directory. The README frames it around a simple division of labor: use the strongest model for repo understanding and planning, then hand the resulting spec to a cheaper executor. That framing makes the project read as a workflow tool rather than a showcase for any one model.

The command surface is also more complete than you would expect from a gimmick repo. The main /improve flow audits the whole codebase, but there are narrower modes for quick passes, deep audits, security-focused reviews, branch-only review, and feature-direction exploration. There is also a plan mode for turning one idea into a spec, a review-plan mode for tightening an existing plan, an execute mode for dispatching the implementation, and a reconcile mode for cleaning up what happened after plans met reality.

That last part matters. A lot of agent tooling still behaves like the work ends once a model produces text. improve is more realistic about the lifecycle. Plans drift. Findings become stale. A fix lands independently. An executor gets blocked. The repo is trying to manage those states explicitly instead of pretending the first generation pass is enough.
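
One way to picture those lifecycle states is as an explicit status that a reconcile pass assigns to each plan. The sketch below is only an illustration in Python, not the project's actual logic: the PlanStatus values, the PlanObservation fields, and the precedence order are all assumptions made for this example.

```python
from dataclasses import dataclass
from enum import Enum


class PlanStatus(Enum):
    """Hypothetical states a plan can be in after it meets the real repo."""
    READY = "ready"      # safe to hand to an executor
    DRIFTED = "drifted"  # the repo changed since the plan was written
    STALE = "stale"      # the finding no longer reproduces (fix landed elsewhere)
    BLOCKED = "blocked"  # an executor stopped partway and needs help
    DONE = "done"        # executed and verified


@dataclass
class PlanObservation:
    """Facts gathered about one plan; every field name is illustrative."""
    repo_changed_since_plan: bool
    finding_still_reproduces: bool
    executor_blocked: bool
    verification_passed: bool


def reconcile(obs: PlanObservation) -> PlanStatus:
    """Map observations to a single status, most decisive signal first."""
    if obs.verification_passed:
        return PlanStatus.DONE
    if obs.executor_blocked:
        return PlanStatus.BLOCKED
    if not obs.finding_still_reproduces:
        return PlanStatus.STALE
    if obs.repo_changed_since_plan:
        return PlanStatus.DRIFTED
    return PlanStatus.READY
```

The point of the ordering is that a plan never sits in two states at once: a blocked executor outranks drift, and a finding that no longer reproduces retires the plan before anyone spends time on it.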

Why the planning layer is the interesting part

The strongest product choice here is that improve treats planning as a durable interface, not a side effect of one conversation. The generated plan files are plain markdown, self-contained, and meant to be picked up later by any agent or human. That sounds simple, but it solves a real operational problem in agent-heavy teams.

Chat context is fragile. A beautiful analysis inside one expensive session is not very reusable if another model needs the same context reconstructed later. improve tries to compress that context into something more like a work ticket written by a very technical lead: exact file paths, current-state excerpts, verification commands, expected outcomes, and explicit boundaries about what not to touch.
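
To make that work-ticket shape concrete, here is a rough Python sketch of a plan as data, rendered to plain markdown. The field names, headings, and layout are my own assumptions for illustration; the real plan files may be organized quite differently.

```python
from dataclasses import dataclass, field


@dataclass
class PlanStep:
    """One unit of work in a hypothetical plan file."""
    file_path: str         # exact path the executor should edit
    current_excerpt: str   # what the code looks like today
    change: str            # what to do
    verify_command: str    # command that proves the step worked
    expected_outcome: str  # what a passing run looks like


@dataclass
class Plan:
    title: str
    steps: list[PlanStep]
    out_of_scope: list[str] = field(default_factory=list)


def render_plan(plan: Plan) -> str:
    """Render the plan as plain markdown so any agent or human can read it."""
    lines = [f"# {plan.title}", ""]
    for number, step in enumerate(plan.steps, start=1):
        lines += [
            f"## Step {number}: {step.file_path}",
            "",
            "Current state:",
            "",
            # Indent the excerpt so markdown treats it as a literal block.
            *[f"    {line}" for line in step.current_excerpt.splitlines()],
            "",
            f"Change: {step.change}",
            f"Verify: `{step.verify_command}`",
            f"Expected: {step.expected_outcome}",
            "",
        ]
    if plan.out_of_scope:
        lines += ["## Do not touch", ""]
        lines += [f"- {item}" for item in plan.out_of_scope]
    return "\n".join(lines)
```

Everything an executor needs sits in the rendered text itself, which is what lets the file outlive the session that produced it.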

That is where the repo becomes more than a clever prompt. It is packaging judgment into an artifact that survives model switching. In practice, that is much closer to how real teams want to use frontier models anyway. The expensive thinking happens once, then execution can become cheaper, repeatable, and easier to review.

The repo understands where smaller agents fail

The README is especially good at acknowledging why weaker executors go off the rails. Smaller agents do not just need a high-level summary. They need the current state inlined, the test commands verified, the done criteria made machine-checkable, and the stop conditions spelled out when reality diverges from the plan.
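
A stop condition is easier to reason about when it is written down as code. The following Python sketch shows one possible executor loop, assuming each step pairs an edit with a verification command; the names run_with_gates and command_passes are hypothetical and not taken from the repo.

```python
import subprocess
from typing import Callable, Sequence

# (function that applies the edit, verification command to run afterwards)
Step = tuple[Callable[[], None], list[str]]


def command_passes(cmd: list[str]) -> bool:
    """Run a verification command; exit code 0 counts as a pass."""
    return subprocess.run(cmd, capture_output=True, text=True).returncode == 0


def run_with_gates(
    steps: Sequence[Step],
    check: Callable[[list[str]], bool] = command_passes,
) -> int:
    """Apply steps in order, gating each on its verification command.

    Returns how many steps passed. Stops at the first failing gate so the
    executor reports back instead of improvising a fix.
    """
    completed = 0
    for apply_step, verify_cmd in steps:
        apply_step()
        if not check(verify_cmd):
            break
        completed += 1
    return completed
```

The check parameter is injectable so the loop can be exercised without running real commands; in normal use the default shells out and treats a non-zero exit code as the signal to stop.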

improve appears to lean into exactly that. The project emphasizes self-contained specs, verification gates after each step, and hard boundaries around out-of-scope work. It even stamps plans against a specific git commit so an executor can detect drift before it starts editing.
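
The commit stamp is simple to sketch. Assuming a plan records the commit hash it was written against, an executor could compare that hash with the current HEAD before touching anything. The function names below are hypothetical; the only real command used is git rev-parse HEAD.

```python
import subprocess


def current_head(repo_dir: str = ".") -> str:
    """Return the full commit hash of HEAD in repo_dir."""
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,  # raise if this is not a git repository
    )
    return result.stdout.strip()


def has_drifted(stamped_commit: str, head_commit: str) -> bool:
    """True when HEAD is not the commit the plan was written against.

    An abbreviated stamp is accepted and compared as a prefix of HEAD.
    """
    stamped = stamped_commit.strip().lower()
    head = head_commit.strip().lower()
    if not stamped:
        return True  # assumption: an unstamped plan is treated as drifted
    return not head.startswith(stamped)
```

An executor would call has_drifted(plan_stamp, current_head()) as its first action and refuse to edit when it returns True. This is a deliberately strict check: any new commit counts as drift, even one that never touches the files the plan names.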

That is a very grounded product decision. Many agent workflows fail not because the model is too weak in the abstract, but because the task description leaves too much room for improvisation. This repo is basically an attempt to narrow that ambiguity window.

What makes the audit loop more credible

Another good sign is that improve does not present the audit as magic. The README describes parallel review across categories like correctness, security, performance, coverage, technical debt, dependencies, documentation, and direction, then says the advisor re-checks the cited evidence before surfacing findings. Whether every run is perfect is not the point. The important thing is that the repo is structurally aware of false positives and over-reporting.
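
The re-check step can be pictured as a filter over raw findings. This Python sketch assumes each finding quotes a snippet from the file it cites and keeps only those whose quote actually appears there. The Finding fields and the whitespace-insensitive match are my assumptions; the README does not specify how the advisor verifies evidence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    category: str         # e.g. "security" or "performance"
    file_path: str        # file the reviewer cites
    quoted_evidence: str  # snippet the reviewer claims is in that file
    summary: str


def _normalize(text: str) -> str:
    """Collapse runs of whitespace so reformatting does not hide a match."""
    return " ".join(text.split())


def verify_findings(findings: list[Finding], files: dict[str, str]) -> list[Finding]:
    """Keep only findings whose quoted evidence appears in the cited file.

    `files` maps a path to its current contents.
    """
    verified = []
    for finding in findings:
        content = files.get(finding.file_path)
        evidence = _normalize(finding.quoted_evidence)
        if content is None or not evidence:
            continue  # cited file is missing, or there is nothing to check
        if evidence in _normalize(content):
            verified.append(finding)
    return verified
```

A substring match is a low bar, since it confirms the code exists but not that the diagnosis is right. Even so, it is enough to drop findings that cite files or lines a reviewer hallucinated.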

That awareness makes the backlog more trustworthy. If a tool is going to create actionable plans, it cannot behave like a noisy lint pass with prettier prose. It has to act more like a reviewer that can justify why something deserves engineering time. The evidence-first posture and prioritization framing make improve feel closer to that bar.

I also like the optional GitHub issue publishing path. It suggests the author understands that a useful planning artifact should not stay trapped inside a local agent session. Once the plan is good enough, it can move into the same work systems the team already uses.

Why this feels timely now

As coding agents get more capable, the bottleneck is shifting. The hard part is less about whether an agent can produce code at all and more about whether the work arrives with enough structure, scope control, and verification to be safely parallelized. That is especially true if a team wants to mix frontier models for high-leverage reasoning with cheaper models for implementation.

improve sits right on that transition. It is not trying to win by being the agent that does everything. It is trying to make multi-model delegation less chaotic. That is a stronger product insight than yet another repo promising autonomous coding in one command.

There is also a subtle cultural point here. Good engineering organizations already know that clear specs, reviewable tasks, and explicit verification criteria scale better than heroic intuition. improve is interesting because it tries to turn that same management logic into an agent-era primitive.

Where the boundaries are

The tradeoff is also clear. If a team wants fully autonomous implementation with minimal review, improve is probably not the right frame. It deliberately adds a planning step, and that means more upfront rigor before code changes happen. Some teams will see that as friction.

But for anything non-trivial, that friction may be the point. The repo is optimized for reducing waste later, not maximizing the speed of the first flashy demo. It assumes that better task framing is worth real effort because execution quality, scope discipline, and reviewability improve downstream.

It is also still a skill-shaped workflow, not a complete management platform. Teams that want dashboards, assignments, and broader orchestration will still need other layers around it. The value here is narrower and more focused: better plans, better delegation, and a cleaner boundary between thinking and doing.

Why builders should care

For builders, improve is a useful reminder that the next wave of agent tooling may not look like ever-more-autonomous bots. It may look like better interfaces between models, humans, and work artifacts. This repo is compelling because it turns the planning layer into something concrete, reusable, and testable instead of leaving it buried in ephemeral chat.

That is why it stood out to me. The most valuable use of a strong model is often not the final patch. It is the ability to understand a messy codebase, decide what is worth changing, and produce an execution plan that survives the handoff. improve treats that handoff as the real design problem, and that makes it much more interesting than another one-shot AI code review demo.

Repo

GitHub: https://github.com/shadcn/improve