How It Works
Every task output is written to SQLite, keyed by `(runId, nodeId, iteration)`. When you resume a run, Smithers re-renders the JSX tree with the persisted outputs already available in `ctx`. Tasks that already have valid output rows are marked finished and skipped. Tasks that were in-progress or pending are picked up from where they left off.
The resume flow:
- Load existing state: Smithers reads `_smithers_runs`, `_smithers_nodes`, and `_smithers_attempts` for the given `runId`.
- Stale attempt cleanup: Any in-progress attempts older than 15 minutes are automatically cancelled. This prevents zombie tasks from blocking forward progress. The associated nodes are reset to `pending`.
- Re-render: The JSX tree is rendered with the current `ctx`, which includes all previously persisted outputs. Completed tasks are naturally skipped because their output exists.
- Resume execution: The engine schedules and executes any remaining runnable tasks.
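The skip-on-re-render behavior can be illustrated with a small in-memory model. This is a sketch only, not Smithers' implementation: `OutputStore`, `outputKey`, and `remainingNodes` are hypothetical names, and a `Map` stands in for the SQLite table. The only detail taken from the text is the composite key `(runId, nodeId, iteration)`.

```typescript
// Hypothetical in-memory stand-in for the SQLite output table.
type OutputKey = { runId: string; nodeId: string; iteration: number };

// Serialize the composite key (runId, nodeId, iteration) into a single Map key.
function outputKey(k: OutputKey): string {
  return JSON.stringify([k.runId, k.nodeId, k.iteration]);
}

class OutputStore {
  private rows = new Map<string, unknown>();

  put(key: OutputKey, output: unknown): void {
    this.rows.set(outputKey(key), output);
  }

  has(key: OutputKey): boolean {
    return this.rows.has(outputKey(key));
  }
}

// Given the node IDs produced by a re-render, return the ones that still need to run.
function remainingNodes(
  store: OutputStore,
  runId: string,
  renderedNodeIds: string[],
  iteration = 0,
): string[] {
  return renderedNodeIds.filter(
    (nodeId) => !store.has({ runId, nodeId, iteration }),
  );
}

const store = new OutputStore();
store.put({ runId: "run-1", nodeId: "analyze", iteration: 0 }, { summary: "ok" });

// "analyze" already has an output row for run-1, so only "report" is left to schedule.
const todo = remainingNodes(store, "run-1", ["analyze", "report"]);
```

Because the key includes `runId`, the same node IDs rendered under a different run find no matching rows and are all scheduled.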
Deterministic Node IDs
Resumability relies on stable, deterministic node identity. A task’s identity comes from its `id` prop.
For a task declared with `id="analyze"`, the `nodeId` in the database is `"analyze"`. If you rename the `id` prop between runs, Smithers treats it as a new task and the old output is orphaned.
Rules for stable IDs:
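- Use fixed, descriptive strings for static tasks: `id="analyze"`, `id="report"`.
- For dynamic tasks, derive the ID from a stable identifier, for example a template string such as `${stableId}:implement`, where `stableId` stands for whatever stable identifier the item has.
- Never use array indices or timestamps as IDs: they change between renders.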
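To see why index-based IDs break resume, the sketch below compares two ID schemes across two renders of a list that gained an item at the front. Everything here is illustrative: the `Ticket` type and the helpers `stableNodeId`, `indexNodeId`, and `ownersById` are hypothetical and not part of Smithers.

```typescript
// Hypothetical item shape; only the idea of a "stable identifier" comes from the text.
type Ticket = { key: string; title: string };

// Stable: derived from data that identifies the item itself.
function stableNodeId(ticket: Ticket, step: string): string {
  return `${ticket.key}:${step}`;
}

// Unstable (for contrast): derived from the item's position in the list.
function indexNodeId(index: number, step: string): string {
  return `${index}:${step}`;
}

// Map each generated node ID to the ticket it refers to in a given render.
function ownersById(
  tickets: Ticket[],
  makeId: (t: Ticket, i: number) => string,
): Map<string, string> {
  const owners = new Map<string, string>();
  tickets.forEach((t, i) => owners.set(makeId(t, i), t.key));
  return owners;
}

const firstRender: Ticket[] = [
  { key: "T-1", title: "Fix login" },
  { key: "T-2", title: "Add export" },
];
// On a later render, a new ticket has been prepended to the list.
const secondRender: Ticket[] = [{ key: "T-0", title: "Hotfix" }, ...firstRender];

const stableFirst = ownersById(firstRender, (t) => stableNodeId(t, "implement"));
const stableSecond = ownersById(secondRender, (t) => stableNodeId(t, "implement"));
// "T-1:implement" refers to T-1 in both renders, so its persisted output still matches.

const indexFirst = ownersById(firstRender, (_t, i) => indexNodeId(i, "implement"));
const indexSecond = ownersById(secondRender, (_t, i) => indexNodeId(i, "implement"));
// "0:implement" refers to T-1 in the first render but to T-0 in the second,
// so an output persisted under that ID would be matched to the wrong ticket.
```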
Resume via CLI
Start a run, then resume it later. The input is persisted when the run starts, so you do not need to pass `--input` again.
Resume Programmatically
When `resume: true` is set, Smithers loads the existing run state instead of creating a new run.
What Gets Skipped on Resume
| Node state before resume | Behavior on resume |
|---|---|
| `finished` | Skipped. Output row exists and is valid. |
| `skipped` | Remains skipped. |
| `failed` (retries exhausted) | Stays failed unless the workflow code changed to allow more retries. |
| `in-progress` (stale) | Cancelled after 15 minutes, then retried as pending. |
| `in-progress` (recent) | Left in-progress. If the process died, the attempt will time out and be cleaned up on the next resume. |
| `pending` | Scheduled for execution. |
| `waiting-approval` | Stays waiting. Approve or deny to unblock. |
| `cancelled` | Stays cancelled. |
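The table can also be read as a pure decision function. The sketch below is an illustration under stated assumptions, not Smithers' scheduler: `ResumeAction`, `resumeAction`, and `STALE_AFTER_MS` are hypothetical names, and the case where a changed retry limit re-opens a failed node is deliberately left out.

```typescript
// Node states as listed in the table above.
type NodeState =
  | "finished"
  | "skipped"
  | "failed"
  | "in-progress"
  | "pending"
  | "waiting-approval"
  | "cancelled";

// Hypothetical labels for what happens to a node on resume.
type ResumeAction = "skip" | "stay" | "cancel-and-retry" | "schedule";

// 15 minutes, the stale threshold stated in the text.
const STALE_AFTER_MS = 15 * 60 * 1000;

// attemptAgeMs only matters for in-progress nodes.
function resumeAction(state: NodeState, attemptAgeMs = 0): ResumeAction {
  switch (state) {
    case "finished":
    case "skipped":
      return "skip";
    case "in-progress":
      // Stale attempts are cancelled and the node goes back to pending.
      return attemptAgeMs > STALE_AFTER_MS ? "cancel-and-retry" : "stay";
    case "pending":
      return "schedule";
    case "failed": // simplification: ignores a raised retry limit
    case "waiting-approval":
    case "cancelled":
      return "stay";
  }
}
```

For example, `resumeAction("in-progress", 16 * 60 * 1000)` yields `"cancel-and-retry"`, while the same state with a 5-minute-old attempt yields `"stay"`.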
Stale Attempt Recovery
If a process crashes mid-execution, some tasks may be stuck in the `in-progress` state with no process to complete them. Smithers handles this automatically:
- On resume, any in-progress attempt with a `started_at_ms` older than 15 minutes is marked `cancelled`.
- The associated node is reset to `pending`.
- The task will be picked up on the next scheduling pass.
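The recovery steps above amount to a single pass over the attempt rows. The following sketch works on plain arrays rather than the database; `AttemptRow`, `NodeRow`, and `cleanupStaleAttempts` are hypothetical, and only `started_at_ms`, the state names, and the 15-minute threshold are taken from the text.

```typescript
// Hypothetical row shapes; the real tables may carry more columns.
type AttemptRow = {
  nodeId: string;
  state: "in-progress" | "finished" | "failed" | "cancelled";
  started_at_ms: number;
};
type NodeRow = { nodeId: string; state: string };

const STALE_ATTEMPT_MS = 15 * 60 * 1000;

// Cancel stale in-progress attempts and reset their nodes to pending.
// Returns the IDs of the nodes that were reset.
function cleanupStaleAttempts(
  attempts: AttemptRow[],
  nodes: NodeRow[],
  nowMs: number,
): string[] {
  const resetNodeIds: string[] = [];
  for (const attempt of attempts) {
    const isStale =
      attempt.state === "in-progress" &&
      nowMs - attempt.started_at_ms > STALE_ATTEMPT_MS;
    if (!isStale) continue;

    attempt.state = "cancelled";
    const node = nodes.find((n) => n.nodeId === attempt.nodeId);
    if (node) {
      node.state = "pending";
      resetNodeIds.push(node.nodeId);
    }
  }
  return resetNodeIds;
}

const nowMs = 60 * 60 * 1000; // pretend "now" is one hour after time zero
const attempts: AttemptRow[] = [
  { nodeId: "analyze", state: "in-progress", started_at_ms: 0 }, // 60 min old
  { nodeId: "report", state: "in-progress", started_at_ms: nowMs - 5 * 60 * 1000 }, // 5 min old
];
const nodes: NodeRow[] = [
  { nodeId: "analyze", state: "in-progress" },
  { nodeId: "report", state: "in-progress" },
];

// Only the "analyze" attempt exceeds the threshold, so only that node is reset.
const reset = cleanupStaleAttempts(attempts, nodes, nowMs);
```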
Common Resume Scenarios
Crash during execution
Waiting for approval
Fixing a bug and retrying
If a task failed because of a bug in your workflow code:
- Fix the code in your workflow file.
- Resume the run. The failed task’s node may be re-evaluated if the retry count changed, or you can start a fresh run.
Database Tables
Smithers keeps resume state in internal tables, including `_smithers_runs`, `_smithers_nodes`, and `_smithers_attempts`. You can query them for debugging.
Tips
- Always use stable task IDs. Changing IDs between runs breaks resume because the engine cannot match old output rows to new task nodes.
- Test resume in development. Run your workflow, cancel it partway through, and resume to verify it picks up correctly.
- Check for stale runs. Use `bunx smithers list workflow.tsx --status running` to find runs that may need to be resumed or cancelled.
- Input immutability. Once a run starts, the input is persisted. Passing different input on resume is an error.
Next Steps
- Debugging — Inspect run state and diagnose resume issues.
- Execution Model — Understand the render-schedule-execute loop that drives resume.
- VCS Integration — Revert filesystem changes to a specific attempt.