Use eval suites when a workflow is important enough to protect with repeatable cases. Each case gets a persisted run record, stable case metadata, and a JSON report that can be checked in CI.
1. Create cases
Create evals/smoke.jsonl:
cases array:
expected checks:
- status: one of finished, failed, cancelled, waiting-approval, waiting-event, or waiting-timer
- output: exact JSON match against the workflow result output
- outputContains: recursive partial JSON match
- errorContains: substring match against thrown errors
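To make the difference between output and outputContains concrete, here is a minimal Python sketch of what a recursive partial JSON match could look like. It is not the tool's implementation: the function name `contains` is hypothetical, and the handling of arrays (same length, compared element by element) is an assumption that the real matcher may not share.

```python
from typing import Any


def contains(actual: Any, expected: Any) -> bool:
    """Return True when `expected` is a recursive subset of `actual`."""
    if isinstance(expected, dict):
        # Every expected key must exist in the actual object and match
        # recursively; extra keys in the actual object are ignored.
        if not isinstance(actual, dict):
            return False
        return all(
            key in actual and contains(actual[key], value)
            for key, value in expected.items()
        )
    if isinstance(expected, list):
        # Assumption: arrays must have the same length and match item by item.
        return (
            isinstance(actual, list)
            and len(actual) == len(expected)
            and all(contains(a, e) for a, e in zip(actual, expected))
        )
    if isinstance(expected, bool) or isinstance(actual, bool):
        # Keep JSON true/false distinct from the numbers 1/0.
        return actual is expected
    # Strings, numbers, and null must be equal.
    return actual == expected
```

Under this sketch, an expectation of `{"b": {"c": 2}}` matches a result of `{"a": 1, "b": {"c": 2, "d": 3}}`, whereas an exact output check would reject it because of the extra keys.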
2. Dry-run the plan
Use --run-label <label> when you want the dry-run and execution to use the same generated IDs.
3. Execute the suite
By default, the report is written to .smithers/evals/smoke.json. Use --report path/to/report.json to choose a different location.
The command exits 0 when all cases pass and 1 when any case fails. Invalid case files exit with 4.
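A CI script can branch on these exit codes to distinguish a failing case from a broken case file. The sketch below shows one possible way to do that in Python. The helper names `describe_exit` and `run_suite` are hypothetical, and the command to run is passed in by the caller rather than assumed here.

```python
import subprocess
import sys

# Exit codes as described above; anything else is treated as unexpected.
EXIT_MEANINGS = {
    0: "all cases passed",
    1: "at least one case failed",
    4: "invalid case file",
}


def describe_exit(code: int) -> str:
    """Translate an eval exit code into a short human-readable message."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_suite(command: list[str]) -> int:
    """Run the given eval command, log what its exit code means, and return it."""
    result = subprocess.run(command)
    print(describe_exit(result.returncode), file=sys.stderr)
    return result.returncode
```

Returning the original exit code lets the CI job fail in the same way the command itself would, while the log line makes the reason easier to spot.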
4. Use structured output in CI
Options that matter in production
- --concurrency N: run multiple cases at once; keep this low for stateful or expensive workflows.
- --run-label LABEL: append a stable label to run IDs, useful for CI build IDs or benchmark names.
- --max-concurrency N: pass a per-workflow task concurrency cap to each case.
- --max-cases N: shard or sample a large suite.
- --no-include-output: omit workflow outputs from the report when outputs are too large or sensitive.
- --allow-network: enable network access for bash tools in cases that need it.
- --root PATH: set the sandbox root for tool execution.
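When a CI pipeline assembles these options from configuration, it can help to keep them in one typed structure. The following is a rough Python sketch: the `EvalOptions` class and `to_flags` helper are hypothetical names, only the flags listed above are used, and where the flags go relative to the rest of the command line is left to the caller.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EvalOptions:
    concurrency: Optional[int] = None      # --concurrency N
    run_label: Optional[str] = None        # --run-label LABEL
    max_concurrency: Optional[int] = None  # --max-concurrency N
    max_cases: Optional[int] = None        # --max-cases N
    include_output: bool = True            # False adds --no-include-output
    allow_network: bool = False            # True adds --allow-network
    root: Optional[str] = None             # --root PATH


def to_flags(opts: EvalOptions) -> list[str]:
    """Render only the options that were set as a list of CLI arguments."""
    flags: list[str] = []
    if opts.concurrency is not None:
        flags += ["--concurrency", str(opts.concurrency)]
    if opts.run_label is not None:
        flags += ["--run-label", opts.run_label]
    if opts.max_concurrency is not None:
        flags += ["--max-concurrency", str(opts.max_concurrency)]
    if opts.max_cases is not None:
        flags += ["--max-cases", str(opts.max_cases)]
    if not opts.include_output:
        flags.append("--no-include-output")
    if opts.allow_network:
        flags.append("--allow-network")
    if opts.root is not None:
        flags += ["--root", opts.root]
    return flags
```

Passing the arguments as a list instead of a single shell string avoids quoting problems when a label or path contains spaces.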
Run a discovered workflow
eval also accepts workflow IDs from .smithers/workflows: