# Recommended Models
## Codex (gpt-5.3-codex) — Implementation

Codex is the strongest model for writing and modifying code. Use it for:

- Implementing features
- Fixing bugs
- Running and interpreting tests
- Refactoring code
- Fixing review issues
Keep its reasoning effort at `high` by default. Use `xhigh` for especially complex tasks — architectural refactors, multi-file changes with tricky dependencies.
## Claude Opus (claude-opus-4-6) — Planning and Review

Claude Opus is the strongest model for reasoning about architecture and evaluating code quality. Use it for:

- Research and codebase exploration
- Planning implementation steps
- Code review
- Report generation
- Orchestration logic and tool calling
## Claude Sonnet (claude-sonnet-4-5-20250929) — Simple Tasks

Sonnet is fast, cheap, and good enough for straightforward work. Use it for:

- Simple tool calling (reading files, running commands)
- Lightweight reviews where deep reasoning is not needed
- Report aggregation from structured data
- Tasks where a more expensive model would be wasteful
## Summary Table
| Task Type | Recommended Model | Why |
|---|---|---|
| Implementing code | Codex | Strongest at code generation |
| Reviewing code | Claude Opus + Codex (parallel) | Two models catch more issues |
| Research and planning | Claude Opus | Strongest at architectural reasoning |
| Running tests / validation | Codex | Good at interpreting build output |
| Simple tool calls | Claude Sonnet | Fast, cheap, sufficient |
| Report generation | Claude Sonnet or Opus | Depends on complexity |
| Ticket discovery | Codex or Claude Opus | Both work well for codebase analysis |
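The table above can be encoded as a small lookup. A minimal sketch — the task keys mirror the table rows, and the values are shorthand labels rather than real model identifiers or a Smithers API:

```typescript
// The summary table above as a lookup. Task names mirror the table rows;
// the values are shorthand labels, not real model identifiers.
const recommendedModel: Record<string, string> = {
  "Implementing code": "codex",
  "Reviewing code": "claude-opus + codex (parallel)",
  "Research and planning": "claude-opus",
  "Running tests / validation": "codex",
  "Simple tool calls": "claude-sonnet",
  "Report generation": "claude-sonnet or claude-opus",
  "Ticket discovery": "codex or claude-opus",
};

// Returns undefined for task types the table does not cover.
function modelFor(task: string): string | undefined {
  return recommendedModel[task];
}
```

This keeps the routing decision in one place, so changing a recommendation means editing one entry rather than hunting through workflow code.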
## CLI Agents vs AI SDK Agents

Smithers supports two ways to run each model. The choice depends on how you pay.

### CLI Agents (subscription-based)

Use `ClaudeCodeAgent`, `CodexAgent`, and `KimiAgent` when you have a subscription to the respective service. The agent runs as a subprocess using the CLI binary, which provides its native tool ecosystem — file editing, shell access, and everything else the CLI supports.
### AI SDK Agents (API billing)

Use `AnthropicAgent` and `OpenAIAgent` when you want per-token billing instead of a subscription, or when you want sandboxed tools from Smithers:
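As an illustration only — Smithers' actual constructors, option names, and tool names may differ — an AI SDK agent definition might look like the following, with a stand-in type so the sketch is self-contained:

```typescript
// Stand-in type so this sketch is self-contained. Smithers' real
// AnthropicAgent/OpenAIAgent constructors and options are assumptions here.
interface AgentConfig {
  model: string;
  billing: "api";
  tools: string[]; // sandboxed tools from Smithers (names assumed)
}

// Per-token billing against the Anthropic API
const anthropic: AgentConfig = {
  model: "claude-opus-4-6",
  billing: "api",
  tools: ["read_file", "write_file", "run_command"],
};

// Per-token billing against the OpenAI API
const openai: AgentConfig = {
  model: "gpt-5.3-codex",
  billing: "api",
  tools: ["read_file", "write_file", "run_command"],
};
```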
### Dual-Agent Setup

In practice, you want the flexibility to switch between CLI and API agents without rewriting your workflow. Define both and let an environment variable decide which one runs.

## Assigning Models to Steps
In a typical workflow with a review loop, assign models by what they are good at:

| Step | Agent | Reasoning |
|---|---|---|
| Discover | codex | Good at codebase analysis and structured output |
| Research | claude | Strong at finding patterns and synthesizing information |
| Plan | claude | Best at architectural reasoning |
| Implement | codex | Strongest at writing code |
| Validate | codex | Good at running and interpreting tests |
| Review (parallel) | claude + codex | Two models catch different issue types |
| ReviewFix | codex | Fixing code is implementation work |
| Report | claude | Good at summarization |
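The dual-agent switch and the step assignments above can be sketched together. The environment variable name and the helper below are illustrative assumptions, not part of Smithers:

```typescript
// Choose CLI agents (subscription) or AI SDK agents (API billing) from an
// environment variable. SMITHERS_AGENTS is an assumed name, not a real flag.
type AgentKind = "cli" | "api";

function agentKind(env: Record<string, string | undefined>): AgentKind {
  return env.SMITHERS_AGENTS === "api" ? "api" : "cli"; // CLI by default
}

// Step-to-agent assignments, mirroring the table above.
const stepAgents: Record<string, string[]> = {
  discover: ["codex"],
  research: ["claude"],
  plan: ["claude"],
  implement: ["codex"],
  validate: ["codex"],
  review: ["claude", "codex"], // run in parallel; they catch different issue types
  reviewFix: ["codex"],
  report: ["claude"],
};
```

Because the step table only names logical agents ("claude", "codex"), flipping between CLI and API execution is a single environment-variable change rather than a workflow rewrite.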
## Codex Reasoning Effort

The `model_reasoning_effort` config controls how much thinking Codex does before it generates. Higher effort produces better results but costs more time and tokens.
| Level | Use when |
|---|---|
| `medium` | Simple, well-defined changes with clear instructions |
| `high` | Default. Most implementation and review tasks |
| `xhigh` | Complex architectural changes, multi-file refactors, tricky edge cases |
Start at `high`. You can always bump it to `xhigh` for the tasks that keep failing.
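One way to pick a level programmatically. The heuristic, signature, and thresholds below are purely illustrative assumptions — nothing here is Codex or Smithers behavior:

```typescript
type Effort = "medium" | "high" | "xhigh";

// Illustrative heuristic: escalate effort on retries or broad changes.
// The `simple` flag and the file-count threshold are assumptions.
function reasoningEffort(opts: {
  simple?: boolean;
  filesTouched: number;
  retries: number;
}): Effort {
  if (opts.retries > 0 || opts.filesTouched > 5) return "xhigh"; // keeps failing, or a broad refactor
  if (opts.simple && opts.filesTouched <= 1) return "medium"; // well-defined single-file change
  return "high"; // default, per the table
}
```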
## Next Steps
- Implement-Review Loop — The recommended review loop pattern.
- CLI Agents — Full reference for `ClaudeCodeAgent`, `CodexAgent`, `GeminiAgent`, `PiAgent`, `KimiAgent`.
- Built-in Tools — Sandboxed tools for AI SDK agents.