Quick Answer: Boris Cherny, creator of Claude Code, exclusively uses Opus 4.5 with thinking enabled—Anthropic's largest and slowest model. His reasoning: "A wrong fast answer is slower than a right slow answer." Because you spend less time steering and correcting errors, the bigger model is actually faster for real-world development tasks despite higher latency per response.
In an industry obsessed with speed—faster inference, lower latency, quicker responses—one of the world's most productive AI-assisted developers made a counterintuitive choice. Boris Cherny, who ships 22-27 pull requests daily using 100% AI-generated code, deliberately uses the slowest model available.
Here's why that decision makes him faster, not slower, and what it means for how you should think about AI model selection.
Watch: Understanding Opus and Claude's Capabilities
Dario Amodei explains the philosophy behind Claude's different model tiers and why capability often matters more than speed.
The Industry Obsession with Speed
Watch any AI demo or read any product announcement. The metrics they highlight are almost always about speed:
- "50% faster inference"
- "Reduced latency by 40%"
- "Sub-second response times"
This focus on speed makes sense for certain use cases. Chatbots need to feel responsive. Real-time applications can't wait. User experience matters.
But for coding? The calculus is completely different.
The Hidden Cost of Fast, Wrong Answers
When you're building software, what matters isn't how fast you get a response. What matters is how fast you get the right response.
Consider two scenarios:
Scenario A: Fast Model
- Response time: 5 seconds
- Quality: Needs 3 rounds of corrections
- Each correction: 5 seconds
- Total time: 20 seconds + debugging time
- Mental overhead: Context switching, frustration, lost flow
Scenario B: Slow Model
- Response time: 30 seconds
- Quality: Correct on first attempt
- Total time: 30 seconds
- Mental overhead: None—move on to next task
The "fast" model took less time per response but more time overall. And that doesn't account for the cognitive cost of debugging and redirecting.
Boris Cherny's Philosophy: Quality Over Latency
"I use Opus 4.5 with thinking for everything. It's the best coding model I've ever used." — Boris Cherny
Cherny's choice isn't based on speculation. It's based on shipping thousands of pull requests and observing what actually works.
Why Opus 4.5 Specifically?
Opus 4.5 is Anthropic's flagship model—the largest, most capable, and yes, slowest. When combined with "thinking" mode (extended reasoning), it gets even slower but substantially more capable.
Key characteristics:
- Deeper reasoning on complex problems
- Better understanding of full codebase context
- More accurate on first attempt
- Superior tool use and multi-step tasks
- Fewer hallucinations and errors
The "Steering" Problem with Smaller Models
"Since you have to steer it less and it's better at tool use, it is almost always faster than using a smaller model in the end." — Boris Cherny
"Steering" is the work you do to correct, redirect, and guide an AI when it doesn't get things right. With smaller models, you spend significant time:
- Clarifying what you meant
- Pointing out errors
- Providing additional context
- Re-explaining requirements
- Fixing generated code manually
This steering overhead is invisible in benchmarks but dominates real-world development time.
The Math Behind the Counterintuitive Choice
Let's break down why slower can be faster:
Time Comparison: Simple Task
| Model | Response Time | Iterations | Total Time |
|---|---|---|---|
| Sonnet (faster) | 8 seconds | 3 rounds | 24+ seconds |
| Opus 4.5 (slower) | 25 seconds | 1 round | 25 seconds |
For simple tasks, they're roughly equivalent. But simple tasks aren't where you spend most of your time.
Time Comparison: Complex Task
| Model | Response Time | Iterations | Total Time |
|---|---|---|---|
| Sonnet (faster) | 15 seconds | 5 rounds | 75+ seconds |
| Opus 4.5 (slower) | 45 seconds | 1-2 rounds | 45-90 seconds |
For complex tasks the totals flip: five fast rounds already cost more wall-clock time than a single Opus pass, and that's before counting any debugging between rounds.
Time Comparison: Architecture Decisions
| Model | Response Time | Typical Outcome | Rework Cost |
|---|---|---|---|
| Sonnet (faster) | 20 seconds | Often wrong approach | Hours of rework |
| Opus 4.5 (slower) | 60 seconds | Better initial approach | Minimal rework |
For architectural decisions, wrong initial directions can cost hours or days. The model that gets the approach right the first time saves enormous amounts of time.
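One way to see why the architecture row dwarfs the others is to treat rework as an expected cost: multiply the time a wrong approach costs by the chance of taking a wrong approach. A minimal sketch, with every probability and duration assumed purely for illustration:

```python
# Expected cost of a decision = time waiting + P(wrong approach) x rework time.
def expected_cost_s(latency_s: float, p_wrong: float, rework_s: float) -> float:
    return latency_s + p_wrong * rework_s

TWO_HOURS = 2 * 3600  # assumed rework cost of a wrong architectural call

faster_model = expected_cost_s(latency_s=20, p_wrong=0.30, rework_s=TWO_HOURS)
slower_model = expected_cost_s(latency_s=60, p_wrong=0.05, rework_s=TWO_HOURS)

print(f"{faster_model:.0f}s vs {slower_model:.0f}s")  # 2180s vs 420s expected
```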
The Thinking Mode Advantage
Cherny specifically uses Opus 4.5 with thinking enabled. This mode allows the model to reason through problems before responding, similar to how humans work through complex problems step by step.
How Thinking Mode Works
Without Thinking:
Input → Immediate Response
With Thinking:
Input → Internal Reasoning → Response
The thinking process:
- Analyzes the full context
- Considers multiple approaches
- Evaluates tradeoffs
- Selects best solution
- Generates implementation
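If you call the model through the API rather than through Claude Code, extended thinking is a request parameter. A minimal sketch using the Anthropic Python SDK; the model id is illustrative, so check the current model list before copying:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-5",  # illustrative id; verify against the model list
    max_tokens=4096,
    # Give the model an explicit token budget to reason with before answering.
    thinking={"type": "enabled", "budget_tokens": 2048},
    messages=[{"role": "user", "content": "Refactor this parser to be iterative."}],
)

# The reply interleaves "thinking" blocks (internal reasoning) with "text"
# blocks (the answer shown to the user).
for block in response.content:
    if block.type == "text":
        print(block.text)
```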
Why This Matters for Coding
Coding problems are rarely straightforward. They involve:
- Understanding existing code
- Considering edge cases
- Maintaining consistency with patterns
- Avoiding regressions
- Optimizing for multiple factors
Thinking mode excels at this multi-factor analysis. The extra time spent "thinking" often eliminates entire rounds of iteration.
Tool Use: The Underrated Factor
"It's better at tool use." — Boris Cherny
Modern AI coding assistants don't just generate text—they use tools:
- Reading files
- Searching codebases
- Running commands
- Checking test results
- Browsing documentation
Larger models are significantly better at knowing which tools to use and when to use them. This compounds over a development session.
Tool Use Comparison
Smaller Model:
- Generates code
- Misses important context in another file
- Developer points out the file
- Model reads file
- Regenerates with correct context
Larger Model:
- Realizes relevant context might exist elsewhere
- Proactively searches for related files
- Generates code with full context
The larger model's superior judgment about tool use eliminates entire cycles of back-and-forth.
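In API terms, "tool use" means you describe tools in the request and the model decides when to call them. A minimal sketch with one hypothetical `read_file` tool (the tool name, schema, and prompt are assumptions for illustration, not part of any shipped toolset):

```python
import anthropic

client = anthropic.Anthropic()

# A hypothetical tool the model can choose to invoke; your code executes it.
tools = [{
    "name": "read_file",
    "description": "Read a file from the repository by relative path.",
    "input_schema": {
        "type": "object",
        "properties": {"path": {"type": "string"}},
        "required": ["path"],
    },
}]

response = client.messages.create(
    model="claude-opus-4-5",  # illustrative id
    max_tokens=2048,
    tools=tools,
    messages=[{"role": "user", "content": "Update the config loader in src/config.py."}],
)

# A model with good tool judgment stops here and asks to read related files
# before generating code, instead of guessing at the missing context.
if response.stop_reason == "tool_use":
    calls = [b for b in response.content if b.type == "tool_use"]
    print([(c.name, c.input) for c in calls])  # e.g. [("read_file", {"path": ...})]
```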
When to Use Which Model: A Practical Guide
Cherny's preference for Opus 4.5 doesn't mean smaller models have no place. Here's a framework for choosing:
Use Larger Models (Opus 4.5) For:
| Use Case | Why |
|---|---|
| Complex features | Multi-file changes, architecture decisions |
| Debugging | Requires deep understanding of system |
| Refactoring | Needs to understand patterns and implications |
| New codebases | Learning phase benefits from thorough analysis |
| Code review | Quality matters more than speed |
Use Smaller Models (Sonnet, Haiku) For:
| Use Case | Why |
|---|---|
| Simple syntax changes | Low risk, clear requirements |
| Formatting | Mechanical, well-defined |
| Quick questions | Information retrieval, not generation |
| Batch operations | Cost optimization for many simple tasks |
| Interactive chatting | Conversational latency matters |
The Key Question
Ask yourself: "What's the cost of getting this wrong?"
- High cost of wrong answer → Use Opus 4.5
- Low cost of wrong answer → Use smaller model
For Cherny, building production software where bugs cost time and customer trust, the answer is almost always Opus 4.5.
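The framework fits in a few lines. A hypothetical helper that encodes the tables above; the model names and task categories are just labels for the idea:

```python
HIGH_STAKES = {"complex feature", "debugging", "refactoring",
               "new codebase", "code review"}

def pick_model(task_type: str) -> str:
    """Route by the cost of a wrong answer, mirroring the tables above."""
    if task_type in HIGH_STAKES:
        return "opus-4.5 (thinking)"  # high cost of a wrong answer
    return "sonnet-or-haiku"          # formatting, quick questions, batch work

print(pick_model("debugging"))  # opus-4.5 (thinking)
```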
Implementing This Philosophy
Ready to adopt Cherny's approach? Here's how:
Step 1: Set Your Default to Opus
In Claude Code settings, configure Opus 4.5 with thinking as your default model. Remove the friction of choosing.
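In practice that means pinning the model in configuration rather than picking per session. The snippet below assumes the `model` key in `~/.claude/settings.json`; the exact model string varies by release, so treat it as illustrative and confirm against your version's settings reference:

```json
{
  "model": "claude-opus-4-5"
}
```

Claude Code also offers a `/model` command for switching within a session when you deliberately want to drop down for a quick task.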
Step 2: Track Your Iterations
For one week, note how many rounds of iteration each task takes. You'll likely find complex tasks require multiple rounds with smaller models but fewer with Opus.
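A plain CSV is enough for this. A minimal sketch (the file name and columns are just a suggestion):

```python
import csv
import datetime

def log_task(path: str, task: str, model: str, rounds: int) -> None:
    """Append one row per completed task; review the file after a week."""
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([datetime.date.today(), task, model, rounds])

log_task("iterations.csv", "refactor auth middleware", "opus-4.5", 1)
```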
Step 3: Calculate True Cost
Factor in:
- Time per response
- Number of iterations
- Debugging time
- Context-switching overhead
- Cognitive load
The "slower" model often has lower true cost.
Step 4: Use the Wait Time
When Opus is thinking, don't sit idle:
- Review previous output
- Think about next steps
- Check other parallel sessions
- Grab coffee
The wait time isn't wasted if you use it productively.
The Broader Lesson
Cherny's model choice reflects a deeper principle: optimize for outcomes, not intermediate metrics.
Response latency is an intermediate metric. What matters is:
- Working code shipped
- Time to complete features
- Quality of solutions
- Developer satisfaction
When you optimize for these true outcomes, counterintuitive choices often make sense.
Frequently Asked Questions
Isn't Opus 4.5 much more expensive?
Yes, per token. But if you need fewer tokens (less iteration, fewer corrections), the total cost can be similar or lower. More importantly, developer time is usually more expensive than API costs.
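As a worked example with placeholder numbers (these are not Anthropic's list prices; substitute current pricing and your own loaded labor rate):

```python
def task_cost_usd(price_per_mtok: float, tokens_per_round: int,
                  rounds: int, dev_minutes: float,
                  dev_rate_hr: float = 100) -> float:
    api = price_per_mtok * tokens_per_round * rounds / 1_000_000
    labor = dev_minutes / 60 * dev_rate_hr
    return api + labor

# Cheaper model: four rounds plus 25 minutes of steering and debugging.
small = task_cost_usd(price_per_mtok=3, tokens_per_round=8_000,
                      rounds=4, dev_minutes=25)
# Pricier model: one round plus 8 minutes of review.
large = task_cost_usd(price_per_mtok=25, tokens_per_round=10_000,
                      rounds=1, dev_minutes=8)

print(f"${small:.2f} vs ${large:.2f}")  # $41.76 vs $13.58 with these assumptions
```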
Does this apply to all types of development?
It applies most strongly to complex, production-quality development. For learning, experimentation, or simple scripts, the calculus might differ.
What about hybrid approaches?
Some developers use smaller models for initial drafts and Opus for refinement. This can work but adds complexity. Cherny's simplicity (always Opus) eliminates decision fatigue.
How do I enable thinking mode in Claude Code?
Thinking mode is enabled by default with Opus 4.5 in Claude Code. You can verify in settings or by observing the "thinking..." indicator before responses.
Will smaller models catch up?
Models improve constantly. But larger models also improve. The quality gap may narrow in some areas but is likely to persist for complex reasoning tasks.
Bottom Line
Boris Cherny's choice to use the "slowest" model isn't about ignoring speed—it's about understanding where speed actually matters.
Key takeaways:
- "A wrong fast answer is slower than a right slow answer"
- Steering and correction time dominates real-world development
- Opus 4.5 with thinking reduces iterations significantly
- Superior tool use compounds the advantages
- Optimize for outcomes, not intermediate metrics
The counterintuitive insight: by choosing the slowest model, Cherny becomes one of the fastest developers.
Ready to optimize your AI development workflow for real outcomes? Contact Houston IT Developers to discuss how we help teams implement effective AI-first development practices.