This guide walks you through adding scorers to an existing workflow. By the end you will have live scoring on every task run, with results visible in the CLI and TUI.

Prerequisites

An existing workflow with at least one <Task>.

Step 1: Import Scorers

import {
  schemaAdherenceScorer,
  latencyScorer,
  relevancyScorer,
} from "smithers-orchestrator/scorers";

Step 2: Attach Scorers to a Task

Add the scorers prop to any <Task>:
<Task
  id="analyze"
  agent={claude}
  output={outputs.analysis}
  scorers={{
    schema: { scorer: schemaAdherenceScorer() },
    latency: { scorer: latencyScorer({ targetMs: 5000, maxMs: 30000 }) },
  }}
>
  <AnalysisPrompt />
</Task>
These two scorers are code-based and require no additional LLM calls.
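How latencyScorer maps elapsed time onto a 0–1 score is internal to the library, but one plausible model is full credit at or below targetMs, falling linearly to zero at maxMs. A minimal sketch under that assumption (the latencyScore helper is hypothetical, not a library export):

```typescript
// Hypothetical sketch: full score up to targetMs, linear falloff to 0 at maxMs.
function latencyScore(elapsedMs: number, targetMs: number, maxMs: number): number {
  if (elapsedMs <= targetMs) return 1;
  if (elapsedMs >= maxMs) return 0;
  // Linearly interpolate between the target and the hard maximum.
  return 1 - (elapsedMs - targetMs) / (maxMs - targetMs);
}
```

With targetMs: 5000 and maxMs: 30000 this rewards anything at or under five seconds and penalizes runs proportionally as they approach the thirty-second cap; the library's actual curve may differ.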

Step 3: Add LLM-based Scoring (Optional)

For LLM-as-judge evaluation, pass an agent to the scorer factory:
import { AnthropicAgent } from "smithers-orchestrator";

const judge = new AnthropicAgent({
  model: "claude-sonnet-4-20250514",
});

<Task
  id="analyze"
  agent={claude}
  output={outputs.analysis}
  scorers={{
    schema: { scorer: schemaAdherenceScorer() },
    relevancy: {
      scorer: relevancyScorer(judge),
      sampling: { type: "ratio", rate: 0.2 },  // Score 20% of runs
    },
  }}
>
  <AnalysisPrompt />
</Task>
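The sampling setting controls how often the judge is invoked, so only a fraction of runs incur the extra LLM call. The exact mechanism is internal to the library, but ratio sampling can be thought of as a per-run coin flip; a minimal sketch (the shouldScore helper is hypothetical, not part of the API):

```typescript
// Hypothetical sketch of ratio sampling: score a run with probability `rate`.
function shouldScore(rate: number): boolean {
  // Math.random() is uniform in [0, 1), so rate: 1 always scores
  // and rate: 0 never does.
  return Math.random() < rate;
}
```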

Step 4: Run Your Workflow

smithers up workflow.tsx
If you are running a discovered workflow from .smithers/workflows, use smithers workflow run <name> instead. Scorers run asynchronously after each task finishes, so they do not delay the workflow's execution.

Step 5: View Scores

CLI

# List all scores for a run
smithers scores <run_id>
Example output:
Scores for run abc123
┌──────────┬────────────────────┬───────┬───────────────────────────────┐
│ Node     │ Scorer             │ Score │ Reason                        │
├──────────┼────────────────────┼───────┼───────────────────────────────┤
│ analyze  │ Schema Adherence   │  1.00 │ Output matches schema         │
│ analyze  │ Latency            │  0.85 │ 7200ms (target: 5000ms)       │
│ analyze  │ Relevancy          │  0.92 │ Output directly addresses ... │
└──────────┴────────────────────┴───────┴───────────────────────────────┘

TUI

Open the TUI with smithers tui, navigate to a task, and switch to the Scores tab to see per-task scoring results.

Step 6: Custom Scorers

Build your own scorer with createScorer:
import { createScorer } from "smithers-orchestrator/scorers";

const wordCountScorer = createScorer({
  id: "word-count",
  name: "Word Count",
  description: "Scores based on output word count",
  score: async ({ output }) => {
    // Trim and filter so empty output counts as zero words.
    const words = String(output).trim().split(/\s+/).filter(Boolean).length;
    const score = Math.min(words / 200, 1); // Full score at 200+ words
    return {
      score,
      reason: `Output contains ${words} words`,
    };
  },
});
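Because the scoring logic is plain code, it is easy to sanity-check outside the workflow. A standalone version of the word-count math (no library imports, for quick unit tests):

```typescript
// Standalone copy of the word-count scoring math for testing in isolation.
function wordCountScore(output: unknown): { score: number; reason: string } {
  const words = String(output).trim().split(/\s+/).filter(Boolean).length;
  const score = Math.min(words / 200, 1); // Caps at 1 once output reaches 200 words
  return { score, reason: `Output contains ${words} words` };
}
```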

Step 7: LLM-as-Judge Custom Scorers

Use llmJudge to build custom LLM-based scorers:
import { llmJudge } from "smithers-orchestrator/scorers";

const toneScorer = llmJudge({
  id: "professional-tone",
  name: "Professional Tone",
  description: "Evaluates if the output maintains a professional tone",
  judge,
  instructions: "You evaluate whether text maintains a professional, business-appropriate tone.",
  promptTemplate: ({ input, output }) =>
    `Rate the professionalism of this response on a scale of 0-1.\n\nInput: ${String(input)}\n\nOutput: ${String(output)}\n\nRespond with a JSON object: { "score": <number>, "reason": "<explanation>" }`,
});
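The prompt template asks the judge for a JSON object, so somewhere the reply must be parsed and the score bounded. How llmJudge does this internally is not documented here; a hedged sketch of such parsing (parseJudgeResponse is hypothetical, not a library export):

```typescript
interface JudgeVerdict {
  score: number;
  reason: string;
}

// Hypothetical sketch: extract the JSON object from a judge reply and
// clamp the score into [0, 1], since models occasionally wrap JSON in prose
// or return out-of-range numbers.
function parseJudgeResponse(text: string): JudgeVerdict {
  const match = text.match(/\{[\s\S]*\}/);
  if (!match) throw new Error("No JSON object found in judge response");
  const parsed = JSON.parse(match[0]);
  const score = Math.min(Math.max(Number(parsed.score), 0), 1);
  return { score, reason: String(parsed.reason ?? "") };
}
```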

Batch Evaluation

For testing and offline evaluation, call runScorersBatch directly with a scorer map, a scoring context, and a storage adapter:
import { runScorersBatch } from "smithers-orchestrator/scorers";

const results = await runScorersBatch(
  {
    myScorer: { scorer: schemaAdherenceScorer() },
  },
  {
    runId: "test-run",
    nodeId: "analyze",
    iteration: 0,
    attempt: 0,
    input: "Analyze this code",
    output: { summary: "The code is clean" },
    outputSchema: analysisSchema,
  },
  adapter,
);