Skip to main content
Planned Feature - This component is not yet implemented. See GitHub issues for usage context and design.

Codex Component

Wraps OpenAI Codex CLI for code-focused AI agent capabilities with structured output, tool use, and Smithers task lifecycle integration. Provides alternative to Claude with OpenAI’s code-specialized models.

Planned API

interface CodexProps<TSchema extends z.ZodType = z.ZodType> {
  /**
   * Prompt/task for the agent.
   */
  children: ReactNode

  /**
   * Model selection.
   * @default 'gpt-4-turbo'
   */
  model?: 'gpt-4' | 'gpt-4-turbo' | 'gpt-3.5-turbo'

  /**
   * Temperature for randomness (0-2).
   * @default 1
   */
  temperature?: number

  /**
   * System prompt for agent behavior.
   */
  systemPrompt?: string

  /**
   * Working directory for code operations.
   */
  cwd?: string

  /**
   * Environment variables.
   */
  env?: Record<string, string>

  /**
   * Timeout in milliseconds.
   * @default 300000 (5 minutes)
   */
  timeout?: number

  /**
   * Zod schema for structured output.
   */
  schema?: TSchema

  /**
   * Callback on successful completion.
   */
  onFinished?: (result: AgentResult<TSchema>) => void

  /**
   * Callback on error.
   */
  onError?: (error: Error) => void
}

interface AgentResult<TSchema extends z.ZodType = z.ZodType> {
  output: string
  parsedOutput?: z.infer<TSchema>
  toolCalls?: ToolCall[]
  durationMs: number
}

export function Codex<TSchema extends z.ZodType = z.ZodType>(
  props: CodexProps<TSchema>
): JSX.Element

Proposed Usage

Basic Code Generation

import { Codex, Phase, Step } from 'smithers-orchestrator'

export function ImplementFeature() {
  return (
    <Phase name="Implementation">
      <Step name="implement">
        <Codex model="gpt-4-turbo">
          Implement user authentication with JWT tokens.
          Include login, logout, and token refresh endpoints.
        </Codex>
      </Step>
    </Phase>
  )
}

With Structured Output

import { z } from 'zod'

const reviewSchema = z.object({
  decision: z.enum(['approve', 'request_changes']),
  issues: z.array(z.object({
    severity: z.enum(['critical', 'major', 'minor']),
    file: z.string(),
    message: z.string()
  })),
  summary: z.string()
})

<Codex
  model="gpt-4"
  schema={reviewSchema}
  onFinished={(result) => {
    const review = result.parsedOutput
    console.log(`Decision: ${review.decision}`)
    console.log(`Issues: ${review.issues.length}`)
  }}
>
  Review this pull request and provide structured feedback
</Codex>

Custom System Prompt

<Codex
  model="gpt-4-turbo"
  systemPrompt="You are a senior software engineer specializing in React and TypeScript. Focus on performance, type safety, and maintainability."
>
  Refactor this component to use modern React patterns
</Codex>

In Fallback Chain

import { FallbackAgent, Codex, Claude } from 'smithers-orchestrator'

<FallbackAgent>
  <Codex model="gpt-4-turbo">{prompt}</Codex>
  <Claude model="sonnet">{prompt}</Claude>
</FallbackAgent>

Props (Planned)

children
ReactNode
required
Prompt or task for the agent.Can be string or JSX elements. Converted to string for CLI.
<Codex>Implement feature X</Codex>
<Codex>
  Analyze this code:
  {codeSnippet}

  Suggest improvements for performance.
</Codex>
model
'gpt-4' | 'gpt-4-turbo' | 'gpt-3.5-turbo'
default:"gpt-4-turbo"
OpenAI model selection.Options:
  • gpt-4 - Original GPT-4 (high quality, slower, more expensive)
  • gpt-4-turbo - Faster GPT-4 variant with extended context
  • gpt-3.5-turbo - Faster, cheaper, less capable
Use cases:
  • Complex code: gpt-4
  • Balanced: gpt-4-turbo
  • Simple tasks: gpt-3.5-turbo
temperature
number
default:"1"
Controls randomness in responses (0-2).Lower (0-0.5): More deterministic, focused
  • Code generation
  • Refactoring
  • Bug fixing
Medium (0.5-1): Balanced creativity
  • General tasks
  • Documentation
Higher (1-2): More creative, diverse
  • Brainstorming
  • Exploration
systemPrompt
string
System-level instructions defining agent behavior and role.Examples:
systemPrompt="You are an expert in database optimization and SQL performance tuning."
systemPrompt="Focus on security best practices. Flag any potential vulnerabilities."
Overrides default Codex system prompt.
cwd
string
Working directory for code operations.Respects worktree context if inside <Worktree>.Priority: Explicit cwd > Worktree context > process.cwd()
env
Record<string, string>
Environment variables for CLI execution.Merged with process.env.
<Codex env={{ DEBUG: "true", LOG_LEVEL: "verbose" }}>
  Debug this issue
</Codex>
timeout
number
default:"300000"
Timeout in milliseconds (default 5 minutes).Agent killed if exceeds timeout.
<Codex timeout={600000}>  {/* 10 minutes for complex task */}
  Implement large refactoring
</Codex>
schema
z.ZodType
Zod schema for structured output validation.Agent output parsed and validated against schema.
const schema = z.object({
  files: z.array(z.string()),
  changes: z.number(),
  summary: z.string()
})

<Codex schema={schema}>Analyze changes</Codex>
Benefits:
  • Type-safe output
  • Validation errors caught early
  • Better agent prompting (schema in prompt)
onFinished
(result: AgentResult<TSchema>) => void
Callback on successful completion.Receives parsed output if schema provided.
onFinished={(result) => {
  console.log(`Agent completed in ${result.durationMs}ms`)
  if (result.parsedOutput) {
    // Type-safe access
    processResults(result.parsedOutput)
  }
}}
onError
(error: Error) => void
Callback on error (timeout, API error, validation failure).
onError={(error) => {
  console.error(`Codex failed: ${error.message}`)
  metrics.recordAgentFailure('codex', error)
}}
Note: Prevents error propagation if provided. Omit to throw.

Implementation Status

1

Design Phase

Component designed for multi-provider failover in review system. View on GitHub
2

CLI Wrapper (Pending)

Implement Codex CLI wrapper similar to ClaudeCodeCLI.ts pattern.
3

Structured Output (Pending)

JSON schema generation from Zod, response validation.
4

Task Integration (Pending)

Smithers task lifecycle, database logging, observability.
5

Testing (Future)

Unit tests with mocked API, integration tests with real Codex.

Design Rationale

Why Codex Component?

Provider diversity: Reduce dependency on single AI provider Cost optimization: OpenAI pricing may be more favorable for certain workloads Model capabilities: GPT-4 excels at certain code tasks Fallback resilience: Alternative when Claude unavailable

Implementation Pattern

Similar to Claude component:
export function Codex<TSchema extends z.ZodType = z.ZodType>(
  props: CodexProps<TSchema>
): ReactNode {
  const { db } = useSmithers()
  const worktree = useWorktree()
  const taskIdRef = useRef<string | null>(null)

  useMount(() => {
    ;(async () => {
      taskIdRef.current = db.tasks.start('codex_agent', extractPrompt(props.children))

      try {
        const result = await executeCodexCLI({
          prompt: extractPrompt(props.children),
          model: props.model ?? 'gpt-4-turbo',
          temperature: props.temperature,
          systemPrompt: props.systemPrompt,
          cwd: props.cwd ?? worktree?.cwd,
          env: props.env,
          timeout: props.timeout ?? 300000,
          schema: props.schema
        })

        props.onFinished?.(result)
      } catch (error) {
        props.onError?.(error)
        throw error
      } finally {
        db.tasks.complete(taskIdRef.current)
      }
    })()
  })

  return <codex-agent model={props.model} status="running" />
}

Authentication

Uses OPENAI_API_KEY environment variable:
export OPENAI_API_KEY="sk-..."
Required in GitHub Actions secrets for CI integration.

Examples of Use Cases

Use Case 1: Code Review

const reviewSchema = z.object({
  decision: z.enum(['approve', 'request_changes']),
  blocking: z.boolean(),
  issues: z.array(z.object({
    severity: z.enum(['critical', 'major', 'minor']),
    message: z.string(),
    file: z.string().optional()
  })),
  summary: z.string()
})

<Codex
  model="gpt-4"
  schema={reviewSchema}
  systemPrompt="You are a senior code reviewer. Focus on correctness, security, and maintainability."
  onFinished={async (result) => {
    const review = result.parsedOutput
    await db.state.set('lastReview', review)

    if (review.decision === 'request_changes') {
      await postReviewComment(review.issues)
    }
  }}
>
  Review this PR:

  {prDiff}

  Check for:
  - Security vulnerabilities
  - Performance issues
  - Type safety
  - Test coverage
</Codex>

Use Case 2: Refactoring

<Codex
  model="gpt-4-turbo"
  temperature={0.3}  // Lower temp for focused refactoring
  systemPrompt="Expert in React hooks and modern patterns. Preserve functionality while improving code quality."
>
  Refactor this component to:
  1. Use modern React hooks (no class components)
  2. Extract custom hooks for reusable logic
  3. Add proper TypeScript types
  4. Improve performance with memoization

  {componentCode}
</Codex>

Use Case 3: Bug Analysis

const bugAnalysisSchema = z.object({
  rootCause: z.string(),
  affectedFiles: z.array(z.string()),
  suggestedFix: z.string(),
  testCases: z.array(z.string())
})

<Codex
  model="gpt-4"
  schema={bugAnalysisSchema}
  systemPrompt="Expert debugger. Provide root cause analysis and concrete fixes."
>
  Analyze this bug:

  Error: {errorMessage}
  Stack trace: {stackTrace}
  Recent changes: {gitDiff}

  Provide root cause, affected files, and suggested fix.
</Codex>

Alternatives Considered

  • Direct OpenAI API calls: No CLI abstraction, harder to test
  • Generic LLM component: Less specialized, more configuration needed
  • Wrap existing tools: Codex CLI not widely available, need custom wrapper
  • Only support Claude: Single point of failure, vendor lock-in

Migration Path

Current pattern (manual OpenAI API):
// Before (manual API calls)
const completion = await openai.chat.completions.create({
  model: 'gpt-4',
  messages: [
    { role: 'system', content: systemPrompt },
    { role: 'user', content: prompt }
  ],
  temperature: 0.7
})
const result = completion.choices[0].message.content
With Codex component:
// After (declarative)
<Codex
  model="gpt-4"
  temperature={0.7}
  systemPrompt={systemPrompt}
>
  {prompt}
</Codex>
Benefits: Task tracking, worktree integration, error handling, observability.

Feedback

If you have feedback on this planned component, please open an issue.