Deep Review: Teaching AI Agents to Read the Article, Then Think It Through

You read an excellent technical article, get excited, decide the whole approach should be adopted, then three days later you completely forget it. Or you copy a bunch of ideas into your system, only to realize a month later that you never needed them. Many people have lived this plot.

Where the Problem Comes From

When reading articles, it is easy to get pulled along by a few things:

Reputation: a famous expert wrote it, so it must be right
Novelty: new methods always sound more powerful
Action impulse: after reading, you feel like you “should do something”

The root problem is that readers skip analysis. Whether the article itself is well written is secondary. Between “finished reading” and “implementation,” one step is missing: thinking it through.

But thinking carefully takes time and energy. Humans usually do not have enough patience to lay out every argument and inspect it.

AI does.

What Deep Review Is

Deep Review is a research methodology for AI agents. It is a structured prompt designed specifically to answer one question: “Should we adopt the suggestions in this article?”

Its core job is analysis, not summary. Any AI can summarize an article. Deep Review instead works through questions like these:

Do I actually have the problem this article is solving?
Are its suggestions supported by evidence, or are they just opinions?
Compared with the current system, what is different?
What are the adoption cost and risk?
If the change turns out badly, can we roll it back?

After the process finishes, every suggestion gets a clear verdict: adopt, experiment, reject, or needs discussion.

Not intuition. Process.

Six Phases

Phase 0: Filter

The first step is also the most important: does the problem this article solves exist in your own system?

Most articles solve problems that do not exist in the reader’s own system. If the answer to the first question is “no,” the entire analysis ends immediately. No wasted time, no wasted tokens.

“Change nothing” is a completely valid result.
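
The early exit can be pictured as a tiny Python sketch. This is only an illustration, not how deep-review.md is implemented: the function name `phases_to_run` and the `problem_exists` flag are hypothetical, standing in for whatever the agent concludes during the filter.

```python
# Phase names as listed in this article, in order.
PHASES = ["filter", "extract", "compare", "debate", "decide", "audit"]


def phases_to_run(problem_exists: bool) -> list[str]:
    """Return the phases a review goes through.

    If the article's problem does not exist in the local system,
    the review stops right after the filter phase.
    """
    if not problem_exists:
        return PHASES[:1]  # only "filter"; result is "change nothing"
    return list(PHASES)
```

Calling `phases_to_run(False)` returns only the filter phase, which is the “change nothing” outcome.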

Phase 1: Extract

Break the article into independent claims, then label the evidence type for each claim:

Evidence type	Example
Experimental data	“We tested 500 runs and latency dropped 40%”
Case study	“The team’s productivity improved after using it”
Logical reasoning	“Because A, B should hold”
Pure opinion	“This is better”

This step only extracts. It does not judge. First, faithfully present what the author said.

It also asks a subtle question: If this article were published anonymously, would you still find it equally convincing? This is there to fight authority bias. Sometimes we are persuaded not by the argument, but simply because the author is famous.
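
One way to picture the output of this phase is a list of claims, each tagged with one of the four evidence types from the table above. The sketch below is an assumption about shape only: the names `Evidence`, `Claim`, and `count_by_evidence` are hypothetical and do not come from deep-review.md.

```python
from dataclasses import dataclass
from enum import Enum


class Evidence(Enum):
    EXPERIMENTAL_DATA = "experimental data"
    CASE_STUDY = "case study"
    LOGICAL_REASONING = "logical reasoning"
    PURE_OPINION = "pure opinion"


@dataclass(frozen=True)
class Claim:
    text: str           # what the author said, quoted faithfully
    evidence: Evidence  # a label only; no judgment happens in this phase


def count_by_evidence(claims: list[Claim]) -> dict[Evidence, int]:
    """Count how many claims fall under each evidence type."""
    counts = {kind: 0 for kind in Evidence}
    for claim in claims:
        counts[claim.evidence] += 1
    return counts


# Example claims reuse the sample sentences from the table above.
claims = [
    Claim("We tested 500 runs and latency dropped 40%", Evidence.EXPERIMENTAL_DATA),
    Claim("Because A, B should hold", Evidence.LOGICAL_REASONING),
    Claim("This is better", Evidence.PURE_OPINION),
]
```

A count skewed toward pure opinion is a hint worth noticing before the later phases, but the labels themselves carry no verdict.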

Phase 2: Compare

Compare every claim against the current state of the system. Each comparison must cite specific files and line numbers.

Vague wording like “the system seems to have something similar” is not accepted. Either point to config.yaml:42, or admit it has not been found yet.
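
The “file plus line number” rule is easy to check mechanically. Here is a minimal sketch, assuming citations are written in the `config.yaml:42` style shown above. The regular expression and the name `has_concrete_citation` are illustrative choices, not part of deep-review.md.

```python
import re

# Matches citations such as "config.yaml:42":
# a path ending in a file extension, a colon, then a line number.
CITATION = re.compile(r"[\w./-]+\.\w+:\d+")


def has_concrete_citation(comparison_note: str) -> bool:
    """Return True if the note points at a specific file and line."""
    return CITATION.search(comparison_note) is not None
```

With this check, “Retry limit is set in config.yaml:42” passes, while “the system seems to have something similar” does not. A note that honestly says the relevant code has not been found yet would also fail the pattern, so it needs to be handled as its own accepted case.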

Phase 3: Debate

For every claim, list both sides:

For: the article’s evidence + concrete benefits for your own context
Against: implementation cost, conflicts with the current system, situations the author did not consider
Missing: information still needed before making the decision

It also evaluates impact across reliability, maintainability, operability, and complexity, plus one often-ignored question: If we regret adopting this, how expensive is rollback?
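
A debate entry for one claim might be recorded roughly like this. The field names mirror the three lists above plus the rollback question. The class `DebateEntry` and its `is_two_sided` check are hypothetical, a sketch of the idea rather than the prompt's actual format.

```python
from dataclasses import dataclass, field


@dataclass
class DebateEntry:
    claim: str
    points_for: list[str] = field(default_factory=list)
    points_against: list[str] = field(default_factory=list)
    missing: list[str] = field(default_factory=list)
    rollback_cost: str = "unknown"  # how expensive is it to undo?

    def is_two_sided(self) -> bool:
        """A debate needs at least one point on each side."""
        return bool(self.points_for) and bool(self.points_against)


# Hypothetical entry; the claim and the arguments are made up.
entry = DebateEntry(
    claim="Cap agent retries at 3",
    points_for=["Article reports fewer runaway loops"],
    points_against=["Some long tasks may fail earlier"],
    missing=["How often retries exceed 3 today"],
    rollback_cost="low: a single config value",
)
```

An entry with an empty “against” list is a sign the debate has not really happened yet.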

Phase 4: Decide

Each claim gets one decision card with a consistent format:

Claim: what the claim says
Decision: adopt / experiment / reject / needs discussion
Reasons: the top 2-3 reasons
Concrete change: which part of which file would change
Expected consequences: expected positive and negative effects

The answer is not only “good” or “bad.” Sometimes it is “try this in a small experiment first.” Sometimes it is “good idea, but not useful here.”
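
To make the card format concrete, here is a small Python sketch of the five fields and the four allowed decisions. The class names and the `render` method are assumptions for illustration, and the example card is invented.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ADOPT = "adopt"
    EXPERIMENT = "experiment"
    REJECT = "reject"
    NEEDS_DISCUSSION = "needs discussion"


@dataclass
class DecisionCard:
    claim: str
    decision: Decision
    reasons: list[str]  # the top 2-3 reasons
    concrete_change: str
    expected_consequences: str

    def render(self) -> str:
        """Format the card with its five fields in a fixed order."""
        return "\n".join([
            f"Claim: {self.claim}",
            f"Decision: {self.decision.value}",
            f"Reasons: {'; '.join(self.reasons)}",
            f"Concrete change: {self.concrete_change}",
            f"Expected consequences: {self.expected_consequences}",
        ])


# Hypothetical card; the claim, reasons, and file reference are made up.
card = DecisionCard(
    claim="Cap agent retries at 3",
    decision=Decision.EXPERIMENT,
    reasons=["Evidence is a single case study", "Rollback is cheap"],
    concrete_change="retry limit in config.yaml:42",
    expected_consequences="Fewer runaway loops; some tasks may fail earlier",
)
print(card.render())
```

Restricting the decision to an enum keeps every card inside the four verdicts, so a vague “maybe” cannot slip through.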

Phase 5: Audit

The final step is also the most important design choice: the audit must be performed by an independent subagent.

Why? When AI checks its own output in the same conversation, it will almost always say, “Looks fine.” Research shows this kind of self-checking has nearly zero discriminative power.

The independent subagent checks several common failure modes:

Every claim was adopted, or every claim was rejected, which shows no discrimination
The comparison table has no concrete file paths, only vague claims
All objections are just “needs more data,” avoiding judgment
Supporting arguments merely restate the article without connecting to the local context

It also asks a harsh question: If we skipped the whole analysis and made the decision by gut in 30 seconds, would the conclusion be the same? If yes, the analysis did not add value.
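
Two of the failure modes listed above are mechanical enough to sketch in code. This is only an illustration of what the auditor looks for. In Deep Review the audit is done by an independent subagent, not by a script, and the function name `audit_flags` is hypothetical.

```python
def audit_flags(decisions: list[str], objections: list[str]) -> list[str]:
    """Return red flags found in a finished review.

    decisions: one verdict per claim, e.g. "adopt" or "reject".
    objections: the "against" points collected during the debate.
    """
    flags = []

    # Every claim adopted, or every claim rejected: no discrimination.
    if (
        len(decisions) > 1
        and len(set(decisions)) == 1
        and decisions[0] in {"adopt", "reject"}
    ):
        flags.append(f"every claim was '{decisions[0]}'")

    # All objections are just "needs more data": judgment avoided.
    if objections and all("needs more data" in o.lower() for o in objections):
        flags.append("all objections are 'needs more data'")

    return flags
```

A review with three “adopt” verdicts and only “Needs more data” objections raises both flags. A review with mixed verdicts and a specific objection raises none.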

The Point Is Learning, Not Reviewing Sources

Someone might ask: is this a way to “audit” articles?

No. Deep Review’s core attitude is learning. The goal is not copying, and it is not criticism either.

It answers this question: “What in this article is useful for the user’s system?” Deciding whether the article is “right” or “wrong” is never the point.

Phase 5 audits the quality of the analysis process, not the article itself. Because the article is read with a learning mindset, it only needs a lightweight red-flag check.

How to Start

The simplest path:

Put deep-review.md in your project directory or ~/.claude/
Type deep-review in Claude Code and paste the article
Wait for it to finish all six phases and return the conclusion

That’s it. One file. Nothing to install.

It is also fine if you are not using Claude Code. deep-review.md is just a structured prompt, so you can use it in Cursor, Windsurf, or any AI tool that can read markdown.

Why I Built This

People who run long-term AI agent systems read a lot of technical articles and other people’s practices every day. Some are genuinely good. Some sound good but do not fit their own setup.

The problem is that there is not enough time or energy to analyze every article carefully. Gut judgment is also unreliable: sometimes too optimistic, sometimes completely dismissive.

So Deep Review exists: turning “intuition” into “process.”

Not every article needs this full analysis. For simple tips, just read them and move on. But when an article might change your system architecture or workflow, spending a few minutes on Deep Review can save hours of mistakes later.

Research Behind It

The design is grounded in existing work:

CheckEval: why checklists work better than open-ended scoring
LLM-as-Judge research: known biases when AI acts as judge
Multi-agent debate research: why AI “roleplay debate” often backfires
Heilmeier Catechism: DARPA’s proposal evaluation method
Architecture Decision Records: the standard engineering-team format for recording decisions

Penchan’s Take

This skill is something I wrote and use every day. The most common use case is reading a technical article that sounds powerful, dropping it into Deep Review, and only then deciding whether to change the system. About 80% of articles end at Phase 0 because my own system simply does not have the problem. Of the remaining 20%, roughly half land on “needs discussion” or “reject” in Phase 4. The percentage that truly reaches “adopt” is much lower than expected, and that is exactly the value: turning excitement into an evidence-based decision.

FAQ

Q: What is Deep Review?

It is a structured analysis method for AI agents. After reading a technical article, it uses six phases, filter → extract → compare → debate → decide → audit, to decide which suggestions are worth adopting and which should be skipped.

Q: Do I need a specific AI tool to use it?

It is mainly designed for Claude Code, but deep-review.md is just a structured prompt. You can use it in Cursor, Windsurf, or any AI assistant that can read markdown.

Q: How is it different from directly asking AI, ‘Is this article good?’

Asking AI directly usually gives you a one-sided positive answer. Deep Review forces AI through a full analysis process, including comparison with the current system, counterarguments, and an independent agent audit to reduce self-confirmation bias.

Q: Is ‘change nothing’ a valid outcome?

Completely valid. Phase 0 of Deep Review asks, ‘Do we actually have this problem?’ If the answer is no, the review ends there. Not every article needs action.

— Penchan