Quick Answer: Boris Cherny, creator of Claude Code, exclusively uses Opus 4.5 with thinking enabled—Anthropic's largest and slowest model. His reasoning: "A wrong fast answer is slower than a right slow answer." Because you spend less time steering and correcting errors, the bigger model is actually faster for real-world development tasks despite higher latency per response.
In an industry obsessed with speed—faster inference, lower latency, quicker responses—one of the world's most productive AI-assisted developers made a counterintuitive choice. Boris Cherny, who ships 22-27 pull requests daily using 100% AI-generated code, deliberately uses the slowest model available.
Here's why that decision makes him faster, not slower, and what it means for how you should think about AI model selection.
Watch: Understanding Opus and Claude's Capabilities
Dario Amodei explains the philosophy behind Claude's different model tiers and why capability often matters more than speed.
The Industry Obsession with Speed
Watch any AI demo or read any product announcement. The metrics they highlight are almost always about speed:
- "50% faster inference"
- "Reduced latency by 40%"
- "Sub-second response times"
This focus on speed makes sense for certain use cases. Chatbots need to feel responsive. Real-time applications can't wait. User experience matters.
But for coding? The calculus is completely different.
The Hidden Cost of Fast, Wrong Answers
When you're building software, what matters isn't how fast you get a response. What matters is how fast you get the right response.
Consider two scenarios:
Scenario A: Fast Model
- Response time: 5 seconds
- Quality: Needs 3 rounds of corrections
- Each correction: 5 seconds
- Total time: 20 seconds + debugging time
- Mental overhead: Context switching, frustration, lost flow
Scenario B: Slow Model
- Response time: 30 seconds
- Quality: Correct on first attempt
- Total time: 30 seconds
- Mental overhead: None—move on to next task
The "fast" model took less time per response but more time overall. And that doesn't account for the cognitive cost of debugging and redirecting.
Boris Cherny's Philosophy: Quality Over Latency
"I use Opus 4.5 with thinking for everything. It's the best coding model I've ever used." — Boris Cherny
Cherny's choice isn't based on speculation. It's based on shipping thousands of pull requests and observing what actually works.
Why Opus 4.5 Specifically?
Opus 4.5 is Anthropic's flagship model—the largest, most capable, and yes, slowest. When combined with "thinking" mode (extended reasoning), it gets even slower but substantially more capable.
Key characteristics:
- Deeper reasoning on complex problems
- Better understanding of full codebase context
- More accurate on first attempt
- Superior tool use and multi-step tasks
- Fewer hallucinations and errors
The "Steering" Problem with Smaller Models
"Since you have to steer it less and it's better at tool use, it is almost always faster than using a smaller model in the end." — Boris Cherny
"Steering" is the work you do to correct, redirect, and guide an AI when it doesn't get things right. With smaller models, you spend significant time:
- Clarifying what you meant
- Pointing out errors
- Providing additional context
- Re-explaining requirements
- Fixing generated code manually
This steering overhead is invisible in benchmarks but dominates real-world development time.
The Math Behind the Counterintuitive Choice
Let's break down why slower can be faster:
Time Comparison: Simple Task
| Model | Response Time | Iterations | Total Time |
|---|---|---|---|
| Sonnet (faster) | 8 seconds | 3 rounds | 24+ seconds |
| Opus 4.5 (slower) | 25 seconds | 1 round | 25 seconds |
For simple tasks, they're roughly equivalent. But simple tasks aren't where you spend most of your time.
Time Comparison: Complex Task
| Model | Response Time | Iterations | Total Time |
|---|---|---|---|
| Sonnet (faster) | 15 seconds | 5 rounds | 75+ seconds |
| Opus 4.5 (slower) | 45 seconds | 1-2 rounds | 45-90 seconds |
For complex tasks the totals flip: five fast rounds already cost more wall-clock time than a single Opus pass, and that's before counting any debugging between rounds.
Time Comparison: Architecture Decisions
| Model | Response Time | Typical Outcome | Rework Cost |
|---|---|---|---|
| Sonnet (faster) | 20 seconds | Often wrong approach | Hours of rework |
| Opus 4.5 (slower) | 60 seconds | Better initial approach | Minimal rework |
For architectural decisions, wrong initial directions can cost hours or days. The model that gets the approach right the first time saves enormous amounts of time.
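One way to see why the architecture row dwarfs the others is to treat rework as an expected cost: multiply the time a wrong approach costs by the chance of taking a wrong approach. A minimal sketch, with every probability and duration assumed purely for illustration:

```python
# Expected cost of a decision = time waiting + P(wrong approach) x rework time.
def expected_cost_s(latency_s: float, p_wrong: float, rework_s: float) -> float:
    return latency_s + p_wrong * rework_s

TWO_HOURS = 2 * 3600  # assumed rework cost of a wrong architectural call

faster_model = expected_cost_s(latency_s=20, p_wrong=0.30, rework_s=TWO_HOURS)
slower_model = expected_cost_s(latency_s=60, p_wrong=0.05, rework_s=TWO_HOURS)

print(f"{faster_model:.0f}s vs {slower_model:.0f}s")  # 2180s vs 420s expected
```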
The Thinking Mode Advantage
Cherny specifically uses Opus 4.5 with thinking enabled. This mode allows the model to reason through problems before responding, similar to how humans work through complex problems step by step.
How Thinking Mode Works
Without Thinking:
Input → Immediate Response
With Thinking:
Input → Internal Reasoning → Response
The thinking process:
- Analyzes the full context
- Considers multiple approaches
- Evaluates tradeoffs
- Selects best solution
- Generates implementation
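If you call the model through the API rather than through Claude Code, extended thinking is a request parameter. A minimal sketch using the Anthropic Python SDK; the model id is illustrative, so check the current model list before copying:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-5",  # illustrative id; verify against the model list
    max_tokens=4096,
    # Give the model an explicit token budget to reason with before answering.
    thinking={"type": "enabled", "budget_tokens": 2048},
    messages=[{"role": "user", "content": "Refactor this parser to be iterative."}],
)

# The reply interleaves "thinking" blocks (internal reasoning) with "text"
# blocks (the answer shown to the user).
for block in response.content:
    if block.type == "text":
        print(block.text)
```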
Why This Matters for Coding
Coding problems are rarely straightforward. They involve:
- Understanding existing code
- Considering edge cases
- Maintaining consistency with patterns
- Avoiding regressions
- Optimizing for multiple factors
Thinking mode excels at this multi-factor analysis. The extra time spent "thinking" often eliminates entire rounds of iteration.
Tool Use: The Underrated Factor
"It's better at tool use." — Boris Cherny
Modern AI coding assistants don't just generate text—they use tools:
- Reading files
- Searching codebases
- Running commands
- Checking test results
- Browsing documentation
Larger models are significantly better at knowing which tools to use and when to use them. This compounds over a development session.
Tool Use Comparison
Smaller Model:
- Generates code
- Misses important context in another file
- Developer points out the file
- Model reads file
- Regenerates with correct context
Larger Model:
- Realizes relevant context might exist elsewhere
- Proactively searches for related files
- Generates code with full context
The larger model's superior judgment about tool use eliminates entire cycles of back-and-forth.
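In API terms, "tool use" means you describe tools in the request and the model decides when to call them. A minimal sketch with one hypothetical `read_file` tool (the tool name, schema, and prompt are assumptions for illustration, not part of any shipped toolset):

```python
import anthropic

client = anthropic.Anthropic()

# A hypothetical tool the model can choose to invoke; your code executes it.
tools = [{
    "name": "read_file",
    "description": "Read a file from the repository by relative path.",
    "input_schema": {
        "type": "object",
        "properties": {"path": {"type": "string"}},
        "required": ["path"],
    },
}]

response = client.messages.create(
    model="claude-opus-4-5",  # illustrative id
    max_tokens=2048,
    tools=tools,
    messages=[{"role": "user", "content": "Update the config loader in src/config.py."}],
)

# A model with good tool judgment stops here and asks to read related files
# before generating code, instead of guessing at the missing context.
if response.stop_reason == "tool_use":
    calls = [b for b in response.content if b.type == "tool_use"]
    print([(c.name, c.input) for c in calls])  # e.g. [("read_file", {"path": ...})]
```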
When to Use Which Model: A Practical Guide
Cherny's preference for Opus 4.5 doesn't mean smaller models have no place. Here's a framework for choosing:
Use Larger Models (Opus 4.5) For:
| Use Case | Why |
|---|---|
| Complex features | Multi-file changes, architecture decisions |
| Debugging | Requires deep understanding of system |
| Refactoring | Needs to understand patterns and implications |
| New codebases | Learning phase benefits from thorough analysis |
| Code review | Quality matters more than speed |
Use Smaller Models (Sonnet, Haiku) For:
| Use Case | Why |
|---|---|
| Simple syntax changes | Low risk, clear requirements |
| Formatting | Mechanical, well-defined |
| Quick questions | Information retrieval, not generation |
| Batch operations | Cost optimization for many simple tasks |
| Interactive chatting | Conversational latency matters |
The Key Question
Ask yourself: "What's the cost of getting this wrong?"
- High cost of wrong answer → Use Opus 4.5
- Low cost of wrong answer → Use smaller model
For Cherny, building production software where bugs cost time and customer trust, the answer is almost always Opus 4.5.
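The framework fits in a few lines. A hypothetical helper that encodes the tables above; the model names and task categories are just labels for the idea:

```python
HIGH_STAKES = {"complex feature", "debugging", "refactoring",
               "new codebase", "code review"}

def pick_model(task_type: str) -> str:
    """Route by the cost of a wrong answer, mirroring the tables above."""
    if task_type in HIGH_STAKES:
        return "opus-4.5 (thinking)"  # high cost of a wrong answer
    return "sonnet-or-haiku"          # formatting, quick questions, batch work

print(pick_model("debugging"))  # opus-4.5 (thinking)
```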
Implementing This Philosophy
Ready to adopt Cherny's approach? Here's how:
Step 1: Set Your Default to Opus
In Claude Code settings, configure Opus 4.5 with thinking as your default model. Remove the friction of choosing.
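In practice that means pinning the model in configuration rather than picking per session. The snippet below assumes the `model` key in `~/.claude/settings.json`; the exact model string varies by release, so treat it as illustrative and confirm against your version's settings reference:

```json
{
  "model": "claude-opus-4-5"
}
```

Claude Code also offers a `/model` command for switching within a session when you deliberately want to drop down for a quick task.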
Step 2: Track Your Iterations
For one week, note how many rounds of iteration each task takes. You'll likely find complex tasks require multiple rounds with smaller models but fewer with Opus.
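A plain CSV is enough for this. A minimal sketch (the file name and columns are just a suggestion):

```python
import csv
import datetime

def log_task(path: str, task: str, model: str, rounds: int) -> None:
    """Append one row per completed task; review the file after a week."""
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([datetime.date.today(), task, model, rounds])

log_task("iterations.csv", "refactor auth middleware", "opus-4.5", 1)
```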
Step 3: Calculate True Cost
Factor in:
- Time per response
- Number of iterations
- Debugging time
- Context-switching overhead
- Cognitive load
The "slower" model often has lower true cost.
Step 4: Use the Wait Time
When Opus is thinking, don't sit idle:
- Review previous output
- Think about next steps
- Check other parallel sessions
- Grab coffee
The wait time isn't wasted if you use it productively.
The Broader Lesson
Cherny's model choice reflects a deeper principle: optimize for outcomes, not intermediate metrics.
Response latency is an intermediate metric. What matters is:
- Working code shipped
- Time to complete features
- Quality of solutions
- Developer satisfaction
When you optimize for these true outcomes, counterintuitive choices often make sense.
Frequently Asked Questions
Isn't Opus 4.5 much more expensive?
Yes, per token. But if you need fewer tokens (less iteration, fewer corrections), the total cost can be similar or lower. More importantly, developer time is usually more expensive than API costs.
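As a worked example with placeholder numbers (these are not Anthropic's list prices; substitute current pricing and your own loaded labor rate):

```python
def task_cost_usd(price_per_mtok: float, tokens_per_round: int,
                  rounds: int, dev_minutes: float,
                  dev_rate_hr: float = 100) -> float:
    api = price_per_mtok * tokens_per_round * rounds / 1_000_000
    labor = dev_minutes / 60 * dev_rate_hr
    return api + labor

# Cheaper model: four rounds plus 25 minutes of steering and debugging.
small = task_cost_usd(price_per_mtok=3, tokens_per_round=8_000,
                      rounds=4, dev_minutes=25)
# Pricier model: one round plus 8 minutes of review.
large = task_cost_usd(price_per_mtok=25, tokens_per_round=10_000,
                      rounds=1, dev_minutes=8)

print(f"${small:.2f} vs ${large:.2f}")  # $41.76 vs $13.58 with these assumptions
```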
Does this apply to all types of development?
It applies most strongly to complex, production-quality development. For learning, experimentation, or simple scripts, the calculus might differ.
What about hybrid approaches?
Some developers use smaller models for initial drafts and Opus for refinement. This can work but adds complexity. Cherny's simplicity (always Opus) eliminates decision fatigue.
How do I enable thinking mode in Claude Code?
Thinking mode is enabled by default with Opus 4.5 in Claude Code. You can verify in settings or by observing the "thinking..." indicator before responses.
Will smaller models catch up?
Models improve constantly. But larger models also improve. The quality gap may narrow in some areas but is likely to persist for complex reasoning tasks.
Bottom Line
Boris Cherny's choice to use the "slowest" model isn't about ignoring speed—it's about understanding where speed actually matters.
Key takeaways:
- "A wrong fast answer is slower than a right slow answer"
- Steering and correction time dominates real-world development
- Opus 4.5 with thinking reduces iterations significantly
- Superior tool use compounds the advantages
- Optimize for outcomes, not intermediate metrics
The counterintuitive insight: by choosing the slowest model, Cherny becomes one of the fastest developers.
Ready to optimize your AI development workflow for real outcomes? Contact Houston IT Developers to discuss how we help teams implement effective AI-first development practices.