Choosing the right model for each task has a significant impact on workflow quality and cost. This guide covers recommended models, when to use CLI agents vs the AI SDK, and how to set up a dual-agent configuration.

Codex (gpt-5.3-codex) — Implementation

Codex is the strongest model for writing and modifying code. Use it for:
  • Implementing features
  • Fixing bugs
  • Running and interpreting tests
  • Refactoring code
  • Fixing review issues
Reasoning effort: Set to high by default. Use xhigh for especially complex tasks (architectural refactors, multi-file changes with tricky dependencies).
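For especially complex tasks, the override is a one-line config change. A minimal sketch, reusing the CodexAgent options introduced later in this guide:
// A minimal sketch, using the CodexAgent options shown later in this guide:
// raise reasoning effort to xhigh for an architecturally complex task.
const codexXhigh = new CodexAgent({
  model: "gpt-5.3-codex",
  systemPrompt: SYSTEM_PROMPT,
  config: { model_reasoning_effort: "xhigh" },
});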

Claude Opus (claude-opus-4-6) — Planning and Review

Claude Opus is the strongest model for reasoning about architecture and evaluating code quality. Use it for:
  • Research and codebase exploration
  • Planning implementation steps
  • Code review
  • Report generation
  • Orchestration logic and tool calling
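As a sketch of what a planning pass looks like, here is a single call against the Opus agent. This assumes the AI SDK agent setup shown later in this guide, and that the agent exposes a generate({ prompt }) method returning a result with a text field:
// A minimal sketch, assuming the AI SDK agent setup shown later in
// this guide and its generate({ prompt }) API.
const plan = await claude.generate({
  prompt: "Read the ticket and the relevant modules, then produce a step-by-step implementation plan.",
});
console.log(plan.text);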

Claude Sonnet (claude-sonnet-4-5-20250929) — Simple Tasks

Sonnet is fast, cheap, and good enough for straightforward tasks. Use it for:
  • Simple tool calling (reading files, running commands)
  • Lightweight reviews where deep reasoning is not needed
  • Report aggregation from structured data
  • Tasks where a more expensive model would be wasteful
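A minimal sketch of a Sonnet agent for such tasks, following the AI SDK pattern used later in this guide (the step limit is an illustrative choice):
import { ToolLoopAgent as Agent, stepCountIs } from "ai";
import { anthropic } from "@ai-sdk/anthropic";
import { tools } from "smithers-orchestrator/tools";
import { SYSTEM_PROMPT } from "./system-prompt";

// Fast, cheap agent for simple tool calls and report aggregation.
const sonnet = new Agent({
  model: anthropic("claude-sonnet-4-5-20250929"),
  tools,
  instructions: SYSTEM_PROMPT,
  stopWhen: stepCountIs(25), // illustrative: simple tasks need only short tool loops
});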

Summary Table

Task Type                  | Recommended Model              | Why
Implementing code          | Codex                          | Strongest at code generation
Reviewing code             | Claude Opus + Codex (parallel) | Two models catch more issues
Research and planning      | Claude Opus                    | Strongest at architectural reasoning
Running tests / validation | Codex                          | Good at interpreting build output
Simple tool calls          | Claude Sonnet                  | Fast, cheap, sufficient
Report generation          | Claude Sonnet or Opus          | Depends on complexity
Ticket discovery           | Codex or Claude Opus           | Both work well for codebase analysis
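For the parallel review row, both reviewers can simply be awaited together and their findings merged. A minimal sketch, again assuming the agents expose the AI SDK-style generate({ prompt }) API:
// A minimal sketch, assuming the AI SDK agents' generate({ prompt }) API.
const reviewPrompt = "Review the current diff and list concrete issues with file and line references.";
const [opusReview, codexReview] = await Promise.all([
  claude.generate({ prompt: reviewPrompt }),
  codex.generate({ prompt: reviewPrompt }),
]);
// The two models tend to flag different issue types; merge both lists.
console.log(opusReview.text, codexReview.text);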

CLI Agents vs AI SDK Agents

Smithers supports two ways to run each model:

CLI Agents (subscription-based)

Use ClaudeCodeAgent and CodexAgent when you have a Claude Code or Codex subscription. The agent runs as a subprocess using the CLI binary, which provides its native tool ecosystem (file editing, shell access, etc.).
import { ClaudeCodeAgent, CodexAgent } from "smithers-orchestrator";

const claude = new ClaudeCodeAgent({
  model: "claude-opus-4-6",
  systemPrompt: SYSTEM_PROMPT,
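  // Skip the CLI's interactive permission prompts so the agent can run unattended.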
  dangerouslySkipPermissions: true,
  timeoutMs: 30 * 60 * 1000,
});

const codex = new CodexAgent({
  model: "gpt-5.3-codex",
  systemPrompt: SYSTEM_PROMPT,
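  // Bypass approval prompts and sandboxing (the Codex CLI's --yolo flag).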
  yolo: true,
  config: { model_reasoning_effort: "high" },
  timeoutMs: 30 * 60 * 1000,
});

AI SDK Agents (API billing)

Use ToolLoopAgent from the ai package when you want API billing instead of a subscription, or when you want sandboxed tools from Smithers:
import { ToolLoopAgent as Agent, stepCountIs } from "ai";
import { anthropic } from "@ai-sdk/anthropic";
import { openai } from "@ai-sdk/openai";
import { tools } from "smithers-orchestrator/tools";

const claude = new Agent({
  model: anthropic("claude-opus-4-6"),
  tools,
  instructions: SYSTEM_PROMPT,
  stopWhen: stepCountIs(100),
});

const codex = new Agent({
  model: openai("gpt-5.3-codex"),
  tools,
  instructions: SYSTEM_PROMPT,
  stopWhen: stepCountIs(100),
});

Dual-Agent Setup

The recommended pattern is to define both CLI and API versions and switch with an environment variable:
// agents.ts
import { ToolLoopAgent as Agent, stepCountIs, type ToolSet } from "ai";
import { anthropic } from "@ai-sdk/anthropic";
import { openai } from "@ai-sdk/openai";
import { ClaudeCodeAgent, CodexAgent } from "smithers-orchestrator";
import { tools as smithersTools } from "smithers-orchestrator/tools";
import { SYSTEM_PROMPT } from "./system-prompt";

const tools = smithersTools as ToolSet;
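// CLI agents are the default; set USE_CLI_AGENTS=0 (or "false") to switch to API billing.
// SMITHERS_UNSAFE=1 opts in to skipping the CLIs' permission prompts.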
const USE_CLI = process.env.USE_CLI_AGENTS !== "0" && process.env.USE_CLI_AGENTS !== "false";
const UNSAFE = process.env.SMITHERS_UNSAFE === "1";

// --- Codex ---
const CODEX_MODEL = process.env.CODEX_MODEL ?? "gpt-5.3-codex";

const codexApi = new Agent({
  model: openai(CODEX_MODEL),
  tools,
  instructions: SYSTEM_PROMPT,
  stopWhen: stepCountIs(100),
  maxOutputTokens: 8192,
});

const codexCli = new CodexAgent({
  model: CODEX_MODEL,
  systemPrompt: SYSTEM_PROMPT,
  yolo: UNSAFE,
  config: { model_reasoning_effort: "high" },
  timeoutMs: 30 * 60 * 1000,
});

export const codex = USE_CLI ? codexCli : codexApi;

// --- Claude ---
const CLAUDE_MODEL = process.env.CLAUDE_MODEL ?? "claude-opus-4-6";

const claudeApi = new Agent({
  model: anthropic(CLAUDE_MODEL),
  tools,
  instructions: SYSTEM_PROMPT,
  stopWhen: stepCountIs(100),
  maxOutputTokens: 8192,
});

const claudeCli = new ClaudeCodeAgent({
  model: CLAUDE_MODEL,
  systemPrompt: SYSTEM_PROMPT,
  dangerouslySkipPermissions: UNSAFE,
  timeoutMs: 30 * 60 * 1000,
});

export const claude = USE_CLI ? claudeCli : claudeApi;
Switch between modes at launch:
# Use CLI agents (subscription)
USE_CLI_AGENTS=1 SMITHERS_UNSAFE=1 bunx smithers run workflow.tsx

# Use API agents
USE_CLI_AGENTS=0 bunx smithers run workflow.tsx

Assigning Models to Steps

In a typical workflow with a review loop, assign models by their strengths:
Step              | Agent          | Reasoning
Discover          | codex          | Good at codebase analysis and structured output
Research          | claude         | Strong at finding patterns and synthesizing information
Plan              | claude         | Best at architectural reasoning
Implement         | codex          | Strongest at writing code
Validate          | codex          | Good at running and interpreting tests
Review (parallel) | claude + codex | Two models catch different issue types
ReviewFix         | codex          | Fixing code is implementation work
Report            | claude         | Good at summarization
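Wiring this up is a matter of importing the two agents and assigning each step. A hypothetical sketch (the mapping shape and step keys are illustrative, not a Smithers API):
import { claude, codex } from "./agents";

// Illustrative only: adapt to however your workflow defines its steps.
const stepAgents = {
  discover: codex,
  research: claude,
  plan: claude,
  implement: codex,
  validate: codex,
  review: [claude, codex], // run both reviewers in parallel
  reviewFix: codex,
  report: claude,
};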

Codex Reasoning Effort

The model_reasoning_effort config controls how much thinking Codex does. Higher effort produces better results but is slower and more expensive:
const codex = new CodexAgent({
  model: "gpt-5.3-codex",
  config: { model_reasoning_effort: "high" },  // default recommendation
});
Level  | Use when
medium | Simple, well-defined changes with clear instructions
high   | Default. Most implementation and review tasks
xhigh  | Complex architectural changes, multi-file refactors, tricky edge cases
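When a workflow mixes routine and complex implementation steps, the effort level can be made tunable at launch. A minimal sketch (CODEX_EFFORT is a hypothetical variable name, not a Smithers convention):
// CODEX_EFFORT is a hypothetical variable name, not a Smithers convention.
const effort = process.env.CODEX_EFFORT ?? "high";

const codex = new CodexAgent({
  model: "gpt-5.3-codex",
  systemPrompt: SYSTEM_PROMPT,
  config: { model_reasoning_effort: effort },
});
Then at launch:
# e.g. raise effort for a big refactor
CODEX_EFFORT=xhigh bunx smithers run workflow.tsx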

Next Steps