A scorer grades a task’s output and returns a number between 0 and 1. Attach scorers to a Task via its scorers prop; they run after the task completes and never block the workflow. Each result is persisted to the _smithers_scorers table so you can aggregate scores across runs. All scorer values and types are re-exported from the smithers-orchestrator facade, which is canonical. The smithers-orchestrator/scorers subpath exports the same surface.
import {
  createScorer,
  llmJudge,
  faithfulnessScorer,
  schemaAdherenceScorer,
  latencyScorer,
  aggregateScores,
  runScorersBatch,
} from "smithers-orchestrator";
import type { Scorer, ScoreResult, ScorersMap } from "smithers-orchestrator";
The component that hosts scorers (Task) is returned by the factory, not imported. See the Components reference for the scorers prop and ScorersMap for its shape.

Concepts

A Scorer is a named, self-describing evaluator. Its score function is a ScorerFn: given a ScorerInput, it returns a Promise<ScoreResult>.
Scorer (object): A named scorer.
ScorerInput (object): The argument passed to a ScorerFn. It is built from the task’s input, output, and metadata at scoring time.
ScoreResult (object): What a ScorerFn returns.
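To make the relationship between these types concrete, here is a local model of them with one hand-written scorer. It is a sketch, not the library's definitions: the field names are limited to the ones this page mentions (id, name, description, score, input, output, outputSchema, latencyMs, reason), the real types in types.ts may carry more, and `exactMatch` is a hypothetical helper.

```typescript
// Local model of the scorer types, for illustration only.
type ScorerInput = {
  input: unknown;
  output: unknown;
  outputSchema?: unknown; // assumed optional; used by schemaAdherenceScorer
  latencyMs?: number;     // assumed optional; used by latencyScorer
};

type ScoreResult = {
  score: number;   // expected to be in 0..1
  reason?: string; // optional explanation, e.g. from an LLM judge
};

// A ScorerFn maps a ScorerInput to a Promise<ScoreResult>.
type ScorerFn = (input: ScorerInput) => Promise<ScoreResult>;

type Scorer = {
  id: string;
  name: string;
  description: string;
  score: ScorerFn;
};

// Hypothetical scorer: 1 when the trimmed output equals `expected`, else 0.
function exactMatch(expected: string): Scorer {
  return {
    id: "exact-match",
    name: "Exact Match",
    description: "1 if the output equals the expected string, else 0",
    score: async ({ output }) => {
      const hit = String(output).trim() === expected;
      return { score: hit ? 1 : 0, reason: hit ? "match" : "mismatch" };
    },
  };
}
```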
A scorer is bound to a task through a ScorerBinding, and a ScorersMap is the keyed set of bindings you pass to the scorers prop. Each binding may carry a SamplingConfig controlling how often the scorer runs.
type ScorerBinding = { scorer: Scorer; sampling?: SamplingConfig };
type ScorersMap    = Record<string, ScorerBinding>;
type SamplingConfig =
  | { type: "all" }                  // run every time (default)
  | { type: "ratio"; rate: number }  // run with probability `rate`
  | { type: "none" };                // never run
<Task
  id="analyze"
  output={outputs.analysis}
  agent={analyst}
  scorers={{
    schema: { scorer: schemaAdherenceScorer() },
    latency: { scorer: latencyScorer({ targetMs: 5000, maxMs: 20000 }) },
    safety: { scorer: toxicityScorer(judge), sampling: { type: "ratio", rate: 0.1 } },
  }}
>
  Analyze the report.
</Task>
Source types.ts · See also ScorersMap, Components
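The sampling rules can be read as a small predicate. The sketch below is hypothetical (`shouldSample` is not exported by smithers-orchestrator) and assumes that a "ratio" binding draws one uniform random number per run and runs when it falls below `rate`; the random source is injectable so the decision is testable.

```typescript
// Local copy of SamplingConfig as documented on this page.
type SamplingConfig =
  | { type: "all" }
  | { type: "ratio"; rate: number }
  | { type: "none" };

// Decide whether a binding runs this time. `random` defaults to Math.random,
// which returns a value in [0, 1).
function shouldSample(
  sampling: SamplingConfig | undefined,
  random: () => number = Math.random,
): boolean {
  if (!sampling || sampling.type === "all") return true; // default: always run
  if (sampling.type === "none") return false;
  // "ratio": run with probability `rate`, clamped to [0, 1].
  const rate = Math.min(Math.max(sampling.rate, 0), 1);
  return random() < rate;
}
```

With `rate: 0.1`, roughly one run in ten is scored, which keeps the cost of an expensive judge such as toxicityScorer bounded.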

createScorer

Build a custom Scorer from a plain config object. The returned scorer is just its config; the work lives in your score function.
function createScorer(config: CreateScorerConfig): Scorer;
config (CreateScorerConfig, required): The scorer’s id, name, description, and score function.
Returns Scorer (object): The named scorer, ready to bind to a task.
const wordCount = createScorer({
  id: "word-count",
  name: "Word Count",
  description: "Scores toward 1.0 as output approaches 200 words",
  score: async ({ output }) => {
    // Trim and drop empty tokens so "" counts as 0 words, not 1.
    const words = String(output).trim().split(/\s+/).filter(Boolean).length;
    return { score: Math.min(words / 200, 1) };
  },
});
Source createScorer.js · Tests create-scorer.test.js · See also llmJudge
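Because the returned scorer is just its config, you can prototype a scorer locally with an identity stand-in for createScorer and call its score function directly. In this sketch the stand-in and `keywordCoverage` are hypothetical; it also shows returning a `reason` alongside the score.

```typescript
type ScorerInput = { input: unknown; output: unknown };
type ScoreResult = { score: number; reason?: string };
type CreateScorerConfig = {
  id: string;
  name: string;
  description: string;
  score: (input: ScorerInput) => Promise<ScoreResult>;
};

// Stand-in for createScorer: the scorer is its config, so identity suffices
// for local experiments. Use the real export in a workflow.
const createScorer = (config: CreateScorerConfig) => config;

// Hypothetical scorer: fraction of required keywords present in the output.
function keywordCoverage(keywords: string[]) {
  return createScorer({
    id: "keyword-coverage",
    name: "Keyword Coverage",
    description: "Fraction of required keywords present in the output",
    score: async ({ output }) => {
      const text = String(output).toLowerCase();
      const missing = keywords.filter((k) => !text.includes(k.toLowerCase()));
      const score = keywords.length === 0 ? 1 : 1 - missing.length / keywords.length;
      return {
        score,
        reason: missing.length ? `missing: ${missing.join(", ")}` : "all keywords present",
      };
    },
  });
}
```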

llmJudge

Build an LLM-as-judge scorer that delegates evaluation to an agent. The judge is prompted with your instructions plus the output of promptTemplate, and is expected to reply with JSON { "score": <0-1>, "reason": "<text>" }. The reply is parsed leniently (a bare number works, and braces inside reason do not truncate the match), the score is clamped to 0–1, and an unparseable reply scores 0.
function llmJudge(config: LlmJudgeConfig): Scorer;
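config (LlmJudgeConfig, required): The judge agent, instructions, and promptTemplate, plus the scorer’s id, name, and description.
Returns Scorer (object): A scorer whose score function calls judge.generate(...) and parses the reply.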
const tone = llmJudge({
  id: "tone",
  name: "Professional Tone",
  description: "Evaluates professional tone",
  judge,
  instructions: "You evaluate text for professional tone.",
  promptTemplate: ({ output }) =>
    `Rate the professionalism (0-1 JSON):\n\n${String(output)}`,
});
Source llmJudge.js · Tests create-scorer.test.js · See also Built-in scorers
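The lenient parsing rules above can be approximated as follows. This is an illustrative re-implementation, not the code in llmJudge.js: `parseJudgeReply` is hypothetical, and the strategy (parse from the first "{" to the last "}", then fall back to the first number in the reply) is one way to satisfy the documented behaviour.

```typescript
type JudgeVerdict = { score: number; reason: string };

// Clamp to [0, 1]; NaN becomes 0.
const clamp01 = (n: number): number =>
  Number.isNaN(n) ? 0 : Math.min(Math.max(n, 0), 1);

function parseJudgeReply(reply: string): JudgeVerdict {
  // 1. Parse the span from the first "{" to the LAST "}" so braces inside
  //    `reason` do not cut the object short.
  const start = reply.indexOf("{");
  const end = reply.lastIndexOf("}");
  if (start !== -1 && end > start) {
    try {
      const obj = JSON.parse(reply.slice(start, end + 1));
      const score = Number(obj?.score);
      if (!Number.isNaN(score)) {
        return { score: clamp01(score), reason: String(obj.reason ?? "") };
      }
    } catch {
      // Not valid JSON: fall through to the bare-number path.
    }
  }
  // 2. Accept a bare number such as "0.7".
  const m = reply.match(/-?\d+(?:\.\d+)?/);
  if (m) return { score: clamp01(Number(m[0])), reason: "" };
  // 3. Unparseable reply: score 0.
  return { score: 0, reason: "unparseable judge reply" };
}
```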

Built-in scorers

Each built-in is a factory that returns a Scorer. The three judge-based scorers take an AgentLike judge; the two deterministic ones do not call an agent.
| Scorer | What it measures | Factory |
| --- | --- | --- |
| faithfulnessScorer | Output is grounded in context, with no hallucinations | faithfulnessScorer(judge) |
| relevancyScorer | Output addresses the input | relevancyScorer(judge) |
| toxicityScorer | Toxic, harmful, or inappropriate content (higher = more toxic) | toxicityScorer(judge) |
| schemaAdherenceScorer | Output passes the task’s outputSchema (1 valid, 0 invalid) | schemaAdherenceScorer() |
| latencyScorer | Execution time vs. budget (1 at/below target, 0 at/above max) | latencyScorer({ targetMs, maxMs }) |
const grounded = faithfulnessScorer(judge);
const onSchema = schemaAdherenceScorer();
const fast = latencyScorer({ targetMs: 5000, maxMs: 20000 });
schemaAdherenceScorer is a no-op that returns 1 when the input has no outputSchema, and latencyScorer likewise returns 1 when the input has no latencyMs. toxicityScorer measures the level of toxicity, so lower is better: clean text scores near 0.
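The page fixes latencyScorer only at its endpoints (1 at or below targetMs, 0 at or above maxMs). The sketch below assumes a straight-line falloff in between; that interpolation, and the `latencyScore` function itself, are hypothetical.

```typescript
// Hypothetical re-implementation of the latency rule; the linear ramp between
// targetMs and maxMs is an assumption, not documented behaviour.
function latencyScore(
  latencyMs: number | undefined,
  { targetMs, maxMs }: { targetMs: number; maxMs: number },
): number {
  if (latencyMs === undefined) return 1; // no timing data: no-op, as documented
  if (latencyMs <= targetMs) return 1;   // within budget
  if (latencyMs >= maxMs) return 0;      // over the hard limit
  return (maxMs - latencyMs) / (maxMs - targetMs);
}
```

With `{ targetMs: 5000, maxMs: 20000 }`, a 12.5 s run lands halfway along the ramp and scores 0.5 under this assumption.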

smithersScorers

smithersScorers is the Drizzle table that backs scorer persistence (_smithers_scorers). Every scorer result is inserted here as a ScoreRow, and aggregateScores reads from it. Use it for direct queries against your store.
Source faithfulnessScorer.js · schema.js · Tests builtins.test.js · See also ScoreRow

Running scorers

Bound scorers run automatically when a task completes, so you rarely call these directly. They are exported for custom hosts, batch evaluation, and tooling.

runScorersAsync

Fire-and-forget execution for live scoring. Runs every binding concurrently via Effect.runFork and returns immediately, so scoring never blocks the workflow. Failures are logged, not thrown.
function runScorersAsync(
  scorers: ScorersMap,
  ctx: ScorerContext,
  adapter: SmithersDb | null,
  eventBus?: EventBus | null,
): void;
scorers (ScorersMap, required): The keyed bindings to run.
ctx (ScorerContext, required): Run/node coordinates plus the data the scorers grade. See ScorerContext.
adapter (SmithersDb | null, required): Database adapter used to persist results, or null to skip persistence.
eventBus (EventBus | null, optional): Bus that receives ScorerStarted, ScorerFinished, and ScorerFailed events.
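The fire-and-forget pattern can be sketched with plain Promises in place of Effect.runFork. This is a hypothetical helper (`runDetached` is not part of smithers-orchestrator) that shows the two properties described above: the call returns immediately, and failures are logged rather than thrown.

```typescript
type Job = () => Promise<unknown>;

// Start every job without awaiting it and return immediately. Failures,
// including synchronous throws, are passed to `log` instead of the caller.
function runDetached(
  jobs: Record<string, Job>,
  log: (key: string, err: unknown) => void = (key, err) =>
    console.error(`scorer ${key} failed:`, err),
): void {
  for (const [key, job] of Object.entries(jobs)) {
    // Promise.resolve().then(job) turns a synchronous throw into a rejection.
    void Promise.resolve()
      .then(job)
      .catch((err) => log(key, err));
  }
}
```

The Effect-based original can additionally interrupt or supervise the forked fibers; this sketch only reproduces the non-blocking, log-on-failure behaviour.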

runScorersBatch

Blocking execution for batch and test evaluation. Runs every binding concurrently and resolves to a map of binding key to ScoreResult (or null when a scorer is sampled out or fails).
function runScorersBatch(
  scorers: ScorersMap,
  ctx: ScorerContext,
  adapter: SmithersDb | null,
  eventBus?: EventBus | null,
): Promise<Record<string, ScoreResult | null>>;
Returns Promise<Record<string, ScoreResult | null>>: One entry per binding key, in the order the scorers were declared.
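import { runScorersBatch } from "smithers-orchestrator";

// `tone` is the llmJudge scorer from the example above.
const results = await runScorersBatch(
  { quality: { scorer: tone } },
  {
    runId: "RUN_ID",
    nodeId: "NODE_ID",
    iteration: 0,
    attempt: 1,
    input: "Summarize the article.",
    output: "...",
  },
  null, // no adapter: results are returned but not persisted
);
// results.quality is null if the scorer was sampled out or failed, hence ?.score
// results.quality?.score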
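The batch semantics (concurrent execution, a null entry for a skipped or failed scorer, keys in declaration order) can be sketched with Promise.all. `runBatch` and the `sampled` flag are hypothetical simplifications; the real runScorersBatch takes a ScorersMap and a ScorerContext.

```typescript
type ScoreResult = { score: number; reason?: string };
type Binding = {
  run: () => Promise<ScoreResult>;
  sampled?: boolean; // hypothetical: false means "sampled out this time"
};

// Run every binding concurrently and wait for all of them. A skipped or
// failed binding maps to null instead of rejecting the whole batch.
async function runBatch(
  bindings: Record<string, Binding>,
): Promise<Record<string, ScoreResult | null>> {
  const settled = await Promise.all(
    Object.entries(bindings).map(
      async ([key, b]): Promise<[string, ScoreResult | null]> => {
        if (b.sampled === false) return [key, null];
        try {
          return [key, await b.run()];
        } catch {
          return [key, null];
        }
      },
    ),
  );
  // Promise.all preserves input order, so keys keep their declaration order.
  const out: Record<string, ScoreResult | null> = {};
  for (const [key, result] of settled) out[key] = result;
  return out;
}
```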

aggregateScores

Compute per-scorer statistics across persisted results: count, mean, min, max, p50, and stddev. Filter to a run, node, or scorer.
function aggregateScores(
  adapter: SmithersDb,
  opts?: AggregateOptions,
): Promise<AggregateScore[]>;
adapter (SmithersDb, required): Database adapter to read scorer rows from.
opts (AggregateOptions, optional): Filters that narrow the results to a run, node, or scorer.
Returns Promise<AggregateScore[]>: One row per scorer, ordered by scorer name.
const stats = await aggregateScores(adapter, { runId: "RUN_ID" });
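The statistics that aggregateScores reports can be computed in memory as sketched below. This is not the implementation in aggregate.js: the row shape `{ scorer, score }` is hypothetical, p50 is assumed to be the median (the mean of the two middle values for an even count), and stddev is assumed to be the population standard deviation.

```typescript
type Stats = {
  count: number;
  mean: number;
  min: number;
  max: number;
  p50: number;
  stddev: number;
};

// Summarize one scorer's values; returns null for an empty list.
function summarize(values: number[]): Stats | null {
  if (values.length === 0) return null;
  const sorted = [...values].sort((a, b) => a - b);
  const count = sorted.length;
  const mean = sorted.reduce((s, v) => s + v, 0) / count;
  const mid = Math.floor(count / 2);
  const p50 = count % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  const variance = sorted.reduce((s, v) => s + (v - mean) ** 2, 0) / count;
  return { count, mean, min: sorted[0], max: sorted[count - 1], p50, stddev: Math.sqrt(variance) };
}

type Row = { scorer: string; score: number }; // hypothetical row shape

// One result per scorer, ordered by scorer name.
function aggregate(rows: Row[]): Array<{ scorer: string } & Stats> {
  const byScorer = new Map<string, number[]>();
  for (const r of rows) {
    const list = byScorer.get(r.scorer) ?? [];
    list.push(r.score);
    byScorer.set(r.scorer, list);
  }
  return Array.from(byScorer.keys())
    .sort()
    .map((scorer) => ({ scorer, ...summarize(byScorer.get(scorer)!)! }));
}
```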
Scores for a run are also viewable from the CLI:
bunx smithers-orchestrator scores RUN_ID
Source run-scorers.js · aggregate.js · Tests run-scorers.test.js · aggregate.test.js · See also ScorerContext, AggregateScore
To wire scorers into a workflow and read them back, see the Evals quickstart. For the full type surface, see the Types reference. For the scorers prop on Task, see the Components reference.