Planned Feature - This component is not yet implemented. See GitHub issues for usage context and design.

Gemini Component

Wraps the Google Gemini API to provide AI agent capabilities with multi-modal input support (text, images, code), structured output, and Smithers task lifecycle integration. Provides an alternative to Claude and Codex using Google's latest models.

Planned API

interface GeminiProps<TSchema extends z.ZodType = z.ZodType> {
  /**
   * Prompt/task for the agent.
   */
  children: ReactNode

  /**
   * Model selection.
   * @default 'gemini-2.5-pro'
   */
  model?: 'gemini-2.5-pro' | 'gemini-2.5-flash' | 'gemini-1.5-pro'

  /**
   * Temperature for randomness (0-2).
   * @default 1
   */
  temperature?: number

  /**
   * System prompt for agent behavior.
   */
  systemPrompt?: string

  /**
   * Working directory for code operations.
   */
  cwd?: string

  /**
   * Environment variables.
   */
  env?: Record<string, string>

  /**
   * Timeout in milliseconds.
   * @default 300000 (5 minutes)
   */
  timeout?: number

  /**
   * Zod schema for structured output.
   */
  schema?: TSchema

  /**
   * Multi-modal inputs (images, files).
   */
  media?: Array<{
    type: 'image' | 'file'
    path: string
  }>

  /**
   * Callback on successful completion.
   */
  onFinished?: (result: AgentResult<TSchema>) => void

  /**
   * Callback on error.
   */
  onError?: (error: Error) => void
}

interface AgentResult<TSchema extends z.ZodType = z.ZodType> {
  output: string
  parsedOutput?: z.infer<TSchema>
  durationMs: number
}

export function Gemini<TSchema extends z.ZodType = z.ZodType>(
  props: GeminiProps<TSchema>
): JSX.Element

Proposed Usage

Basic Text Generation

import { Gemini, Phase, Step } from 'smithers-orchestrator'

export function AnalyzeCode() {
  return (
    <Phase name="Analysis">
      <Step name="analyze">
        <Gemini model="gemini-2.5-pro">
          Analyze this codebase and identify architectural issues.
          Focus on scalability and maintainability concerns.
        </Gemini>
      </Step>
    </Phase>
  )
}

With Structured Output

import { z } from 'zod'

const architectureSchema = z.object({
  issues: z.array(z.object({
    category: z.enum(['scalability', 'maintainability', 'performance', 'security']),
    severity: z.enum(['critical', 'major', 'minor']),
    description: z.string(),
    recommendation: z.string()
  })),
  summary: z.string(),
  overallScore: z.number().min(0).max(10)
})

<Gemini
  model="gemini-2.5-pro"
  schema={architectureSchema}
  onFinished={(result) => {
    // parsedOutput is optional on AgentResult, so guard before using it
    const analysis = result.parsedOutput
    if (!analysis) return
    console.log(`Architecture score: ${analysis.overallScore}/10`)
    console.log(`Critical issues: ${analysis.issues.filter(i => i.severity === 'critical').length}`)
  }}
>
  Analyze codebase architecture
</Gemini>

Multi-Modal Analysis (Images)

<Gemini
  model="gemini-2.5-pro"
  media={[
    { type: 'image', path: './screenshots/ui-bug.png' },
    { type: 'image', path: './screenshots/expected.png' }
  ]}
>
  Compare these two UI screenshots.
  Identify differences and suggest CSS fixes for the bug.
</Gemini>

Fast Model for Simple Tasks

<Gemini model="gemini-2.5-flash" temperature={0.3}>
  Extract function names from this code file
</Gemini>

In Fallback Chain

import { FallbackAgent, Claude, Codex, Gemini } from 'smithers-orchestrator'

<FallbackAgent>
  <Claude model="sonnet">{prompt}</Claude>
  <Codex model="gpt-4">{prompt}</Codex>
  <Gemini model="gemini-2.5-pro">{prompt}</Gemini>
</FallbackAgent>
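The source does not spell out how FallbackAgent chooses a provider. A minimal sketch of one plausible behavior, assuming children are tried in order and the first success wins; `runWithFallback` and the `Provider` type are hypothetical names for illustration, not part of smithers-orchestrator:

```typescript
// Hypothetical provider shape: a name plus an async run function.
type Provider = { name: string; run: (prompt: string) => Promise<string> }

// Try providers in order and return the first successful output.
// If every provider fails, throw one error that lists each failure.
async function runWithFallback(
  providers: Provider[],
  prompt: string
): Promise<{ provider: string; output: string }> {
  const failures: string[] = []
  for (const p of providers) {
    try {
      return { provider: p.name, output: await p.run(prompt) }
    } catch (err) {
      failures.push(`${p.name}: ${err instanceof Error ? err.message : String(err)}`)
    }
  }
  throw new Error(`All providers failed (${failures.join('; ')})`)
}
```

Collecting the per-provider failures keeps the final error useful when the whole chain is exhausted.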

Props (Planned)

children
ReactNode
required
Prompt or task for the agent. Converted to a string for the API call.
<Gemini>Analyze this code for security issues</Gemini>
Can include dynamic content:
<Gemini>
  Review changes in PR #{prNumber}:
  {prDiff}
</Gemini>
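How children become a prompt string is not specified in the source. A rough sketch, assuming the same rules React uses when rendering text (null, undefined, and booleans produce nothing; arrays are concatenated). `PromptNode` and `childrenToPrompt` are hypothetical, and `PromptNode` is a simplified stand-in for ReactNode that leaves out elements:

```typescript
// Simplified stand-in for ReactNode (assumption: no JSX elements, only text-like values).
type PromptNode = string | number | boolean | null | undefined | PromptNode[]

// Flatten text-like children into a single prompt string.
function childrenToPrompt(node: PromptNode): string {
  // React renders nothing for null, undefined, and booleans.
  if (node === null || node === undefined || typeof node === 'boolean') return ''
  if (Array.isArray(node)) return node.map(childrenToPrompt).join('')
  return String(node)
}

// Mixed static and dynamic content, as in the PR example above.
const prompt = childrenToPrompt(['Review changes in PR #', 42, ':\n', null, 'diff body'])
```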
model
'gemini-2.5-pro' | 'gemini-2.5-flash' | 'gemini-1.5-pro'
default:"gemini-2.5-pro"
Google Gemini model selection. Options:
  • gemini-2.5-pro - Latest, most capable (1M token context)
  • gemini-2.5-flash - Faster, cheaper
  • gemini-1.5-pro - Previous generation (stable)
Use cases:
  • Complex analysis: gemini-2.5-pro
  • Large codebases: gemini-2.5-pro (huge context)
  • Simple tasks: gemini-2.5-flash
  • Stable/production: gemini-1.5-pro
temperature
number
default:"1"
Controls randomness (0-2).
Lower (0-0.5): Deterministic, focused
  • Code analysis
  • Bug fixing
  • Security reviews
Medium (0.5-1): Balanced
  • General tasks
  • Documentation
Higher (1-2): Creative
  • Design suggestions
  • Brainstorming
systemPrompt
string
System-level instructions for agent behavior. Examples:
systemPrompt="You are a security expert. Flag all potential vulnerabilities, even minor ones."
systemPrompt="Focus on performance optimization. Suggest concrete improvements with benchmarks."
cwd
string
Working directory for code operations. Respects worktree context if inside <Worktree>.
Priority: Explicit cwd > Worktree context > process.cwd()
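The priority order can be sketched as a single fallthrough. `resolveCwd` is a hypothetical helper, and passing the process directory in as an argument is a choice made here to keep the example self-contained:

```typescript
// Pick the working directory: explicit prop first, then worktree context, then the process cwd.
function resolveCwd(
  explicitCwd: string | undefined,
  worktreeCwd: string | undefined,
  processCwd: string
): string {
  return explicitCwd ?? worktreeCwd ?? processCwd
}
```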
env
Record<string, string>
Environment variables merged with process.env.
<Gemini env={{ LOG_LEVEL: "debug" }}>
  Debug this issue with verbose logging
</Gemini>
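A minimal sketch of the merge, assuming values from the env prop override same-named variables from the process environment (the source only says "merged"). `mergeEnv` is a hypothetical name:

```typescript
// Environment values may be undefined, as they are in process.env.
type Env = Record<string, string | undefined>

// Later spreads win, so overrides replace base values with the same key.
function mergeEnv(base: Env, overrides: Record<string, string> = {}): Env {
  return { ...base, ...overrides }
}
```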
timeout
number
default:"300000"
Timeout in milliseconds (default 5 minutes).
<Gemini timeout={600000}>  {/* 10 minutes */}
  Analyze large codebase
</Gemini>
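One way a timeout like this could be enforced is by racing the work against a timer. This is a sketch under that assumption, not the planned implementation; `withTimeout` is a hypothetical helper, and it only rejects the returned promise without cancelling the underlying work:

```typescript
// Reject if `work` does not settle within `timeoutMs` milliseconds.
function withTimeout<T>(work: Promise<T>, timeoutMs: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${timeoutMs}ms`)), timeoutMs)
  })
  // Clear the timer either way so it does not keep the process alive.
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer))
}
```

To actually abort an in-flight HTTP request, an AbortController signal passed to fetch would be the usual complement to this pattern.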
schema
z.ZodType
Zod schema for structured output validation. Gemini supports JSON mode with schema constraints.
const issueSchema = z.object({
  type: z.enum(['bug', 'feature', 'refactor']),
  priority: z.number().min(1).max(5),
  description: z.string()
})

<Gemini schema={issueSchema}>
  Categorize this issue
</Gemini>
Type safety: result.parsedOutput typed as z.infer<typeof issueSchema>
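How the raw model output turns into parsedOutput is not shown in the source. A sketch of one approach: parse the text as JSON, then validate it. To stay runnable without dependencies, the example uses `SchemaLike`, a hypothetical structural subset of Zod's `safeParse` interface, which a real Zod schema would satisfy; `parseStructuredOutput` is also a hypothetical name:

```typescript
// Structural subset of Zod's safeParse result, so the sketch runs without importing zod.
type SafeParseResult<T> = { success: true; data: T } | { success: false; error: unknown }
interface SchemaLike<T> {
  safeParse(input: unknown): SafeParseResult<T>
}

// Parse the model's text output as JSON, then validate it against the schema.
function parseStructuredOutput<T>(raw: string, schema: SchemaLike<T>): T {
  let json: unknown
  try {
    json = JSON.parse(raw)
  } catch {
    throw new Error('Model output is not valid JSON')
  }
  const result = schema.safeParse(json)
  if (!result.success) throw new Error('Model output does not match the schema')
  return result.data
}
```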
media
Array<{ type: 'image' | 'file', path: string }>
Multi-modal inputs for vision and document analysis.
Image analysis:
<Gemini
  media={[
    { type: 'image', path: './screenshot.png' }
  ]}
>
  Describe what's wrong with this UI
</Gemini>
Multiple images:
<Gemini
  media={[
    { type: 'image', path: './before.png' },
    { type: 'image', path: './after.png' }
  ]}
>
  Compare these screenshots and list differences
</Gemini>
File analysis:
<Gemini
  media={[
    { type: 'file', path: './large-dataset.csv' }
  ]}
>
  Analyze this CSV and identify patterns
</Gemini>
Supports images (PNG, JPEG, WebP) and documents (PDF, TXT, CSV).
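Inline media needs a MIME type alongside the data. A small sketch of deriving it from the file extension for the formats listed above; `mimeTypeForPath` and the lookup table are hypothetical, and rejecting unknown extensions is an assumption about desired behavior:

```typescript
// Extension-to-MIME lookup for the formats the docs list as supported.
const MIME_BY_EXTENSION = new Map<string, string>([
  ['png', 'image/png'],
  ['jpg', 'image/jpeg'],
  ['jpeg', 'image/jpeg'],
  ['webp', 'image/webp'],
  ['pdf', 'application/pdf'],
  ['txt', 'text/plain'],
  ['csv', 'text/csv']
])

// Resolve the MIME type for a media path, failing fast on unsupported files.
function mimeTypeForPath(path: string): string {
  const ext = path.slice(path.lastIndexOf('.') + 1).toLowerCase()
  const mime = MIME_BY_EXTENSION.get(ext)
  if (!mime) throw new Error(`Unsupported media type: ${path}`)
  return mime
}
```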
onFinished
(result: AgentResult<TSchema>) => void
Callback on successful completion.
onFinished={(result) => {
  console.log(`Gemini completed in ${result.durationMs}ms`)
  if (result.parsedOutput) {
    // Type-safe structured output
    processData(result.parsedOutput)
  }
}}
onError
(error: Error) => void
Callback on error.
onError={(error) => {
  console.error(`Gemini error: ${error.message}`)
  metrics.recordProviderFailure('gemini', error)
}}
Prevents error propagation if provided.
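The propagation rule can be sketched as follows, assuming the component rethrows only when no handler is supplied. `handleAgentError` is a hypothetical helper name:

```typescript
// Route an error to the handler if one exists; otherwise let it propagate.
function handleAgentError(error: Error, onError?: (error: Error) => void): void {
  if (onError) {
    onError(error) // the handler owns the error, so nothing is rethrown
    return
  }
  throw error
}
```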

Implementation Status

1. Design Phase
   Component designed for multi-provider fallback in review workflows.
2. API Client (Pending)
   Implement Gemini API client using Bun native fetch.
3. Multi-Modal Support (Pending)
   Image/file upload, base64 encoding, media API integration.
4. Structured Output (Pending)
   JSON schema generation, response validation with Zod.
5. Task Integration (Pending)
   Smithers task lifecycle, database logging, observability.
6. Testing (Future)
   Unit tests with mocked API, integration tests with real Gemini.

Design Rationale

Why Gemini Component?

  • Massive context window: 1M tokens in Gemini 2.5 Pro (analyze entire codebases)
  • Multi-modal capabilities: Native image/video/audio understanding
  • Speed: Gemini Flash is competitive on speed at lower cost
  • Provider diversity: Reduce single-provider dependency
  • Free tier: Gemini offers generous free quota for development

API vs CLI

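Unlike Claude (CLI) and Codex (CLI), Gemini calls the API directly:
async function executeGeminiAPI(props: GeminiProps): Promise<AgentResult> {
  // Use the model from props rather than a hardcoded one
  const model = props.model ?? 'gemini-2.5-pro'
  const url = `https://generativelanguage.googleapis.com/v1beta/models/${model}:generateContent`
  const response = await fetch(url, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      // API keys are sent in the x-goog-api-key header, not as a Bearer token
      'x-goog-api-key': process.env.GOOGLE_API_KEY ?? ''
    },
    body: JSON.stringify({
      contents: buildContents(props),
      generationConfig: {
        temperature: props.temperature,
        // responseSchema only takes effect together with a JSON response MIME type
        ...(props.schema && {
          responseMimeType: 'application/json',
          responseSchema: zodToJsonSchema(props.schema)
        })
      },
      // systemInstruction is a Content object, not a bare string
      systemInstruction: props.systemPrompt
        ? { parts: [{ text: props.systemPrompt }] }
        : undefined
    })
  })

  if (!response.ok) {
    throw new Error(`Gemini API request failed: ${response.status} ${response.statusText}`)
  }
  return parseResponse(await response.json())
}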
Rationale: No official Gemini CLI, API well-documented and stable.
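The `parseResponse` helper above is not defined in the source. A sketch of its text-extraction step, assuming the response shape used later in the Migration Path example (`candidates[0].content.parts[n].text`); `GenerateContentResponse` is a deliberately partial type and `extractText` is a hypothetical name:

```typescript
// Partial view of a generateContent response: only the fields this sketch reads.
interface GenerateContentResponse {
  candidates?: Array<{ content?: { parts?: Array<{ text?: string }> } }>
}

// Join all text parts of the first candidate; fail loudly if there is no text.
function extractText(response: GenerateContentResponse): string {
  const parts = response.candidates?.[0]?.content?.parts ?? []
  const text = parts.map((p) => p.text ?? '').join('')
  if (text === '') throw new Error('Gemini response contained no text')
  return text
}
```

Joining every part instead of reading only `parts[0]` avoids silently dropping output when the model splits its answer across several parts.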

Authentication

Uses GOOGLE_API_KEY environment variable:
export GOOGLE_API_KEY="AIza..."
Required in GitHub Actions secrets for CI integration.
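A sketch of turning that variable into request headers, failing early when it is missing. The Gemini API accepts API keys in the `x-goog-api-key` header; `buildGeminiHeaders` is a hypothetical helper, and the environment is passed in as an argument so the example stays self-contained:

```typescript
// Build request headers from an environment map (for example, process.env).
function buildGeminiHeaders(env: Record<string, string | undefined>): Record<string, string> {
  const apiKey = env['GOOGLE_API_KEY']
  if (!apiKey) throw new Error('GOOGLE_API_KEY is not set')
  return {
    'Content-Type': 'application/json',
    'x-goog-api-key': apiKey
  }
}
```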

Multi-Modal Content Format

interface Content {
  role: 'user' | 'model'
  parts: Array<
    | { text: string }
    | { inlineData: { mimeType: string; data: string } }
  >
}
Text and images are combined in a single request.
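The `buildContents` step is not defined in the source. A sketch of assembling one user turn from a prompt and already-encoded media, using the Content shape above; `EncodedMedia` and `buildUserContent` are hypothetical, and placing media parts before the text is a choice of this sketch:

```typescript
type Part = { text: string } | { inlineData: { mimeType: string; data: string } }
interface Content {
  role: 'user' | 'model'
  parts: Part[]
}

// Hypothetical input: media that has already been read and base64-encoded.
interface EncodedMedia {
  mimeType: string
  base64Data: string
}

// Build a single user turn: media parts first, then the text prompt.
function buildUserContent(prompt: string, media: EncodedMedia[] = []): Content[] {
  const mediaParts: Part[] = media.map((m) => ({
    inlineData: { mimeType: m.mimeType, data: m.base64Data }
  }))
  return [{ role: 'user', parts: [...mediaParts, { text: prompt }] }]
}
```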

Examples of Use Cases

Use Case 1: UI Screenshot Analysis

<Gemini
  model="gemini-2.5-pro"
  media={[
    { type: 'image', path: './ui-bug.png' }
  ]}
  schema={z.object({
    issues: z.array(z.object({
      element: z.string(),
      problem: z.string(),
      cssFix: z.string()
    })),
    severity: z.enum(['critical', 'minor'])
  })}
>
  Analyze this screenshot for UI bugs.
  Identify misaligned elements, broken styles, and provide CSS fixes.
</Gemini>

Use Case 2: Entire Codebase Analysis

// Gemini 2.5 Pro: 1M token context - can analyze huge codebases
<Gemini
  model="gemini-2.5-pro"
  timeout={900000}  // 15 minutes for large analysis
>
  Analyze entire codebase structure:

  {await getAllSourceFiles()}

  Provide:
  1. Architectural overview
  2. Dependencies graph
  3. Code quality metrics
  4. Refactoring opportunities
</Gemini>

Use Case 3: Document Understanding

<Gemini
  model="gemini-2.5-pro"
  media={[
    { type: 'file', path: './requirements.pdf' }
  ]}
  schema={z.object({
    features: z.array(z.string()),
    technicalRequirements: z.array(z.string()),
    timeline: z.string(),
    complexity: z.enum(['low', 'medium', 'high'])
  })}
>
  Extract structured information from this requirements document.
  List all features, technical requirements, and estimate complexity.
</Gemini>

Use Case 4: Fast Fallback

<FallbackAgent>
  {/* Try expensive model first */}
  <Claude model="opus">{complexPrompt}</Claude>

  {/* Fast, cheap fallback */}
  <Gemini model="gemini-2.5-flash" temperature={0.5}>
    {complexPrompt}
  </Gemini>
</FallbackAgent>

Alternatives Considered

  • CLI wrapper: No official Gemini CLI, would need custom tool
  • Generic LLM component: Less specialized, more configuration
  • Only support Claude/Codex: Missing multi-modal and huge context benefits
  • Vertex AI: More complex auth, less suitable for open-source tool

Migration Path

Current pattern (manual Gemini API):
// Before (manual)
const response = await fetch('https://generativelanguage.googleapis.com/...', {
  method: 'POST',
  // API keys go in the x-goog-api-key header, not in a Bearer token
  headers: { 'Content-Type': 'application/json', 'x-goog-api-key': apiKey },
  body: JSON.stringify({
    contents: [{ parts: [{ text: prompt }] }]
  })
})
const data = await response.json()
const result = data.candidates[0].content.parts[0].text
With Gemini component:
// After (declarative)
<Gemini>{prompt}</Gemini>
Benefits: Task tracking, error handling, structured output, multi-modal support, observability.

Feedback

If you have feedback on this planned component, please open an issue.