When bunx smithers-orchestrator up or bunx smithers-orchestrator workflow run reports a failed run, the CLI launches the post-failure system workflow in the background against that run. It gathers the run’s state, events, and workflow source, has an agent investigate the root cause, and produces a verdict: what broke, how sure it is, and the exact command(s) to run next.
The trigger prints one line so you can follow along:
What the autopsy does
The post-failure workflow runs four steps against the failed run:
- Gather (deterministic): inspect, events, and the workflow source for the failed run, plus the Smithers version.
- Investigate (agent, read-only tools): digs through the evidence, re-runs read-only CLI commands, and reads the workflow. It never mutates anything: no retries, no rewinds, no edits.
- Bug gate (only when it suspects Smithers itself): an Approval pauses the run and asks you before anything is reported.
- Verdict: a stable output row with the failure class, root cause, suggestion, and commands.
Failure classes
The investigator classifies every failure as one of:

| Class | Meaning |
|---|---|
| workflow-bug | The workflow script or prompts are at fault (bad schema, wrong deps/needs, a compute task throwing). |
| environment | A missing CLI, auth, network, or disk problem on this machine. |
| agent-flake | A transient provider fault (rate limit, 5xx, timeout) that a re-run would likely clear. |
| smithers-bug | Smithers itself misbehaved: an engine/CLI/component defect. Chosen conservatively, only when the evidence points into Smithers code. |
| unknown | Evidence too thin to say. |
Suggestions
The verdict carries one suggestion plus the exact commands implementing it:
- retry: transient; re-run the failed task (bunx smithers-orchestrator retry-task) or the whole run.
- resume: the run can continue from where it stopped.
- rewind: state is bad but an earlier frame is good (bunx smithers-orchestrator rewind).
- edit-workflow-and-reset: the workflow needs a fix first; the verdict names the exact edit. Never edit a script while its run is resumable (that causes RESUME_METADATA_MISMATCH); make the edit, then start a fresh run.
- fix-environment: the exact install/auth/config fix.
- escalate: a human must decide; the verdict says what to look at.
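To make the verdict's contents concrete, here is a minimal TypeScript sketch of how a verdict row could be modelled. The class and suggestion names come from the lists above; the field names (failureClass, rootCause, suggestion, commands) and the summarizeVerdict helper are assumptions for illustration, not the actual Smithers schema.

```typescript
// Hypothetical model of a verdict row. The string values mirror the
// documented failure classes and suggestions; the field names are assumed.
type FailureClass =
  | "workflow-bug"
  | "environment"
  | "agent-flake"
  | "smithers-bug"
  | "unknown";

type Suggestion =
  | "retry"
  | "resume"
  | "rewind"
  | "edit-workflow-and-reset"
  | "fix-environment"
  | "escalate";

interface Verdict {
  failureClass: FailureClass;
  rootCause: string;
  suggestion: Suggestion;
  commands: string[]; // the exact commands implementing the suggestion
}

// Render a verdict as a short, human-readable summary (hypothetical helper).
function summarizeVerdict(v: Verdict): string {
  const lines = [
    `class: ${v.failureClass}`,
    `root cause: ${v.rootCause}`,
    `suggestion: ${v.suggestion}`,
    ...v.commands.map((c) => `  $ ${c}`),
  ];
  return lines.join("\n");
}

// Illustrative value only; the root cause text is made up for the example.
const exampleVerdict: Verdict = {
  failureClass: "agent-flake",
  rootCause: "provider rate limit during an agent task",
  suggestion: "retry",
  commands: ["bunx smithers-orchestrator retry-task"],
};

console.log(summarizeVerdict(exampleVerdict));
```

Modelling both fields as closed unions means a script that consumes verdicts can switch on them exhaustively and have the compiler flag a missed case.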
Reporting Smithers bugs (approval-gated)
When the investigation concludes the failure is a bug in Smithers itself (not your workflow or environment), the autopsy pauses on an Approval gate explaining what it thinks the bug is. Nothing is sent anywhere without your explicit approval. If you approve, the workflow files the report with bunx smithers-orchestrator bug, which POSTs to https://bug.smithers.sh/api/bugs and records the returned bug id and URL in the verdict. If you deny, the verdict is kept and nothing is reported.
You can also file a report by hand at any time:
bunx smithers-orchestrator bug attaches the run’s workflow name, status, error, and recent events (secrets scrubbed) along with the Smithers version and platform. See the CLI catalog for its flags.
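The approval gate described above can be sketched as a small function. This illustrates the control flow only and is not Smithers code: runBugGate, askApproval, fileReport, and the result shapes are all hypothetical names. A real gate would wait asynchronously for a human, whereas this sketch uses synchronous callbacks to stay short.

```typescript
// Hypothetical sketch of the approval-gated report step.
type Decision = "approved" | "denied";

interface BugReport {
  bugId: string;
  url: string;
}

interface GateOutcome {
  gateShown: boolean; // was the human asked at all?
  reported: boolean;  // was a report actually filed?
  bug?: BugReport;    // id and URL recorded in the verdict when filed
}

function runBugGate(
  failureClass: string,
  askApproval: () => Decision, // stands in for the Approval pause
  fileReport: () => BugReport, // stands in for the bug-filing command
): GateOutcome {
  // The gate only exists when Smithers itself is the suspect.
  if (failureClass !== "smithers-bug") {
    return { gateShown: false, reported: false };
  }
  // Nothing is sent without explicit approval.
  if (askApproval() !== "approved") {
    return { gateShown: true, reported: false };
  }
  return { gateShown: true, reported: true, bug: fileReport() };
}
```

Passing the reporter in as a callback keeps the rule "no approval, no network call" easy to check: on every path except the approved one, fileReport is never invoked.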
Opting out
The trigger is on by default. Turn it off with either:
- the --no-post-failure flag on bunx smithers-orchestrator up / bunx smithers-orchestrator workflow run, or
- the SMITHERS_POST_FAILURE=0 environment variable.

It is also skipped automatically:
- when the failing workflow is post-failure itself or another ops workflow (triage-run, monitor, monitor-smithers, init), so autopsies never recurse; the launched autopsy runs with SMITHERS_POST_FAILURE=0 in its environment for the same reason;
- when the post-failure workflow is not installed. Then it prints the manual command instead:
Run bunx smithers-orchestrator init to install the workflow pack, which includes post-failure as a hidden system workflow (it does not appear in bunx smithers-orchestrator workflow list without --system, but is always runnable by id).
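The opt-out and skip rules in this section can be summarised as one decision function. The following is a sketch under stated assumptions: decideTrigger, TriggerInput, and the order in which the checks run are hypothetical. Only the flag name, the environment variable, and the list of ops workflows come from the text above.

```typescript
// Hypothetical sketch of the trigger decision; not the CLI's actual code.
const OPS_WORKFLOWS = new Set<string>([
  "post-failure",
  "triage-run",
  "monitor",
  "monitor-smithers",
  "init",
]);

interface TriggerInput {
  workflowId: string;                       // id of the workflow that failed
  noPostFailureFlag: boolean;               // true when --no-post-failure was passed
  env: Record<string, string | undefined>;  // process environment
  postFailureInstalled: boolean;            // is the post-failure workflow installed?
}

type TriggerDecision = "launch" | "skip" | "print-manual-command";

function decideTrigger(input: TriggerInput): TriggerDecision {
  // Explicit opt-outs first (assumed ordering).
  if (input.noPostFailureFlag) return "skip";
  if (input.env["SMITHERS_POST_FAILURE"] === "0") return "skip";
  // Ops workflows never trigger an autopsy, so autopsies cannot recurse.
  if (OPS_WORKFLOWS.has(input.workflowId)) return "skip";
  // Without the workflow installed, only the manual command is printed.
  if (!input.postFailureInstalled) return "print-manual-command";
  return "launch";
}
```

For example, a failing user workflow with no flag, no environment override, and the pack installed yields "launch". The same failure inside an autopsy launched with SMITHERS_POST_FAILURE=0 yields "skip".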