Task via its scorers prop; they run after the task
completes and never block the workflow. Each result is persisted to the
_smithers_scorers table so you can aggregate scores across runs.
All scorer values and types are re-exported from the smithers-orchestrator
facade, which is canonical. The smithers-orchestrator/scorers subpath exports
the same surface.
The component that hosts scorers (Task) is returned by the factory, not
imported. See the Components reference for the scorers prop and ScorersMap for
its shape.
Concepts
A Scorer is a named, self-describing evaluator. Its score function is a
ScorerFn: given a ScorerInput, it returns a Promise<ScoreResult>.
Scorer: A named scorer.
ScorerInput: The argument passed to a ScorerFn. Built from the task’s input,
output, and metadata at scoring time.
ScoreResult: What a ScorerFn returns.
A scorer attached to a task is described by a ScorerBinding, and a ScorersMap is the
keyed set of bindings you pass to the scorers prop. Each binding may carry a
SamplingConfig controlling how often the scorer runs.
types.ts · See also ScorersMap, Components
createScorer
Build a custom Scorer from a plain config object. The returned scorer is just
its config; the work lives in your score function.
The named scorer, ready to bind to a task.
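The idea that a scorer is just its config can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's API: the `Sketch*` types and `createScorerSketch` are local stand-ins defined here, and the field names (`id`, `name`, `description`, `input`, `output`, `reason`) are guesses. Check the Types reference for the real signatures.

```typescript
// Hypothetical stand-ins for the library's types; real field names may differ.
interface SketchScorerInput {
  input: unknown;
  output: unknown;
}

interface SketchScoreResult {
  score: number; // expected to fall between 0 and 1
  reason?: string;
}

type SketchScorerFn = (input: SketchScorerInput) => Promise<SketchScoreResult>;

interface SketchScorer {
  id: string;
  name: string;
  description: string;
  score: SketchScorerFn;
}

// Stand-in for createScorer: the returned scorer is the config itself.
function createScorerSketch(config: SketchScorer): SketchScorer {
  return config;
}

// A deterministic scorer: 1 when the output is a non-blank string, else 0.
const nonEmpty = createScorerSketch({
  id: "non-empty",
  name: "Non-empty output",
  description: "Scores 1 when the task output is a non-blank string.",
  score: async ({ output }) => {
    const ok = typeof output === "string" && output.trim().length > 0;
    return { score: ok ? 1 : 0, reason: ok ? "output present" : "output blank" };
  },
});
```

All of the evaluation logic sits in `score`; the surrounding object only names and describes it.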
createScorer.js · Tests create-scorer.test.js · See also llmJudge
llmJudge
Build an LLM-as-judge scorer that delegates evaluation to an agent. The judge is prompted with your instructions plus the output of promptTemplate, and is
expected to reply with JSON { "score": <0-1>, "reason": "<text>" }. The reply
is parsed leniently (a bare number works, and braces inside reason do not
truncate the match), the score is clamped to 0–1, and an unparseable reply
scores 0.
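The lenient parsing rules above can be sketched as a standalone function. This is an illustration of the described behavior, not the library's parser: `parseJudgeReply` and `JudgeVerdict` are hypothetical names, and slicing from the first `{` to the last `}` is one assumed way to keep braces inside the reason intact.

```typescript
interface JudgeVerdict {
  score: number;
  reason?: string;
}

const clamp01 = (n: number): number => Math.min(1, Math.max(0, n));

function parseJudgeReply(reply: string): JudgeVerdict {
  const text = reply.trim();

  // Case 1: a bare number such as "0.8".
  if (/^-?(\d+\.?\d*|\.\d+)$/.test(text)) {
    return { score: clamp01(Number(text)) };
  }

  // Case 2: a JSON object somewhere in the reply. Slicing from the first "{"
  // to the last "}" means braces inside the reason string do not cut it short.
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start !== -1 && end > start) {
    try {
      const parsed: unknown = JSON.parse(text.slice(start, end + 1));
      if (typeof parsed === "object" && parsed !== null) {
        const { score, reason } = parsed as { score?: unknown; reason?: unknown };
        if (typeof score === "number" && Number.isFinite(score)) {
          const clamped = clamp01(score);
          return typeof reason === "string"
            ? { score: clamped, reason }
            : { score: clamped };
        }
      }
    } catch {
      // Not valid JSON; fall through to the unparseable case.
    }
  }

  // Case 3: nothing usable, so the reply scores 0.
  return { score: 0, reason: "unparseable judge reply" };
}
```

For example, `parseJudgeReply('{"score": 1.7, "reason": "uses {braces}"}')` yields a score clamped to 1 with the reason preserved.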
A scorer whose score calls judge.generate(...) and parses the reply.
llmJudge.js · Tests create-scorer.test.js · See also Built-in scorers
Built-in scorers
Each built-in is a factory that returns a Scorer. The three judge-based
scorers take an AgentLike judge; the two deterministic ones do not call an
agent.
| Scorer | What it measures | Factory |
|---|---|---|
| faithfulnessScorer | Output is grounded in context, no hallucinations | faithfulnessScorer(judge) |
| relevancyScorer | Output addresses the input | relevancyScorer(judge) |
| toxicityScorer | Toxic, harmful, or inappropriate content (higher = more toxic) | toxicityScorer(judge) |
| schemaAdherenceScorer | Output passes the task’s outputSchema (1 valid, 0 invalid) | schemaAdherenceScorer() |
| latencyScorer | Execution time vs. budget (1 at/below target, 0 at/above max) | latencyScorer({ targetMs, maxMs }) |
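The latencyScorer row fixes only the two endpoints of the scale. The sketch below shows one way such a budget could map to a score. The linear falloff between `targetMs` and `maxMs` is an assumption, not something the table states, and `latencyScoreSketch` is a hypothetical local function rather than the library's implementation.

```typescript
interface LatencyBudget {
  targetMs: number;
  maxMs: number;
}

function latencyScoreSketch(
  latencyMs: number | undefined,
  { targetMs, maxMs }: LatencyBudget,
): number {
  // With no latency to grade, the scorer is a no-op and scores 1.
  if (latencyMs === undefined) return 1;
  if (latencyMs <= targetMs) return 1; // at or below target
  if (latencyMs >= maxMs) return 0; // at or above max
  // Assumed: linear interpolation between the two endpoints.
  return 1 - (latencyMs - targetMs) / (maxMs - targetMs);
}

const budget: LatencyBudget = { targetMs: 1000, maxMs: 3000 };
const halfway = latencyScoreSketch(2000, budget); // 0.5 under the linear assumption
```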
schemaAdherenceScorer and latencyScorer are no-ops (score 1) when the input lacks
an outputSchema or latencyMs, respectively. toxicityScorer scores the level of
toxicity, so clean text scores near 0.
smithersScorers
smithersScorers is the Drizzle table backing scorer persistence (_smithers_scorers).
Every scorer result is inserted here as a ScoreRow;
aggregateScores reads from it. Use it for direct queries
against your store.
Source faithfulnessScorer.js · schema.js · Tests builtins.test.js · See also ScoreRow
Running scorers
Bound scorers run automatically when a task completes, so you rarely call these directly. They are exported for custom hosts, batch evaluation, and tooling.
runScorersAsync
Fire-and-forget execution for live scoring. Runs every binding concurrently via Effect.runFork and returns immediately, so scoring never blocks the workflow.
Failures are logged, not thrown.
The keyed bindings to run.
Run/node coordinates plus the data the scorers grade. See ScorerContext.
Database adapter to persist results, or null to skip persistence.
Optional bus that receives ScorerStarted / ScorerFinished / ScorerFailed events.
runScorersBatch
Blocking execution for batch and test evaluation. Runs every binding concurrently and resolves to a map of binding key to ScoreResult (or null
when a scorer is sampled out or fails).
One entry per binding key, in the order the scorers were declared.
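The batch contract (concurrent execution, null for a sampled-out or failing scorer, keys in declaration order) can be sketched with plain promises. This is an illustration only: `runBatchSketch`, `SketchBinding`, and the `sampleRate` field are hypothetical stand-ins, and the real runScorersBatch takes different arguments and handles persistence and events, which are omitted here.

```typescript
// Hypothetical stand-ins; the library's ScorerBinding and SamplingConfig may differ.
interface SketchResult {
  score: number;
  reason?: string;
}

interface SketchBinding<I> {
  score: (input: I) => Promise<SketchResult>;
  sampleRate?: number; // assumed sampling knob: fraction of runs to score
}

async function runBatchSketch<I>(
  bindings: Record<string, SketchBinding<I>>,
  input: I,
): Promise<Record<string, SketchResult | null>> {
  // Start every scorer at once; Promise.all keeps results in input order.
  const settled = await Promise.all(
    Object.entries(bindings).map(
      async ([key, binding]): Promise<[string, SketchResult | null]> => {
        // Sampled out: a rate of 0 always skips, a missing rate always runs.
        if (Math.random() >= (binding.sampleRate ?? 1)) return [key, null];
        try {
          return [key, await binding.score(input)];
        } catch {
          // A failing scorer yields null instead of rejecting the whole batch.
          return [key, null];
        }
      },
    ),
  );

  const results: Record<string, SketchResult | null> = {};
  for (const [key, result] of settled) results[key] = result;
  return results;
}
```

Catching inside each task, rather than around `Promise.all`, is what lets one broken scorer fail without discarding the others' results.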
aggregateScores
Compute per-scorer statistics across persisted results: count, mean, min,
max, p50, and stddev. Filter to a run, node, or scorer.
Database adapter to read scorer rows from.
One row per scorer, ordered by scorer name.
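The statistics themselves can be sketched over an in-memory array. This is a hedged illustration, not the library's query: `aggregateSketch` and the `scorerName` field are hypothetical, p50 is taken as the median with the two middle values averaged, and stddev is the population form. The source does not say which definitions aggregateScores uses.

```typescript
interface SketchScoreRow {
  scorerName: string; // hypothetical column name
  score: number;
}

interface SketchAggregate {
  scorerName: string;
  count: number;
  mean: number;
  min: number;
  max: number;
  p50: number;
  stddev: number;
}

function aggregateSketch(rows: SketchScoreRow[]): SketchAggregate[] {
  // Group scores by scorer name.
  const byScorer = new Map<string, number[]>();
  for (const row of rows) {
    const scores = byScorer.get(row.scorerName) ?? [];
    scores.push(row.score);
    byScorer.set(row.scorerName, scores);
  }

  // One row per scorer, ordered by scorer name.
  return Array.from(byScorer.entries())
    .sort(([a], [b]) => a.localeCompare(b))
    .map(([scorerName, scores]) => {
      const sorted = [...scores].sort((x, y) => x - y);
      const count = sorted.length;
      const mean = sorted.reduce((sum, s) => sum + s, 0) / count;
      const mid = Math.floor(count / 2);
      // Assumed: median, averaging the two middle values for even counts.
      const p50 = count % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
      // Assumed: population variance (divide by count, not count - 1).
      const variance = sorted.reduce((sum, s) => sum + (s - mean) ** 2, 0) / count;
      return {
        scorerName,
        count,
        mean,
        min: sorted[0],
        max: sorted[count - 1],
        p50,
        stddev: Math.sqrt(variance),
      };
    });
}
```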
run-scorers.js · aggregate.js · Tests run-scorers.test.js · aggregate.test.js · See also ScorerContext, AggregateScore
To wire scorers into a workflow and read them back, see the Evals quickstart. For the full type surface, see the Types reference. For the
scorers prop on Task, see the
Components reference.