The Durability Contract
The rule fits in one sentence:
A completed task is never re-executed. When a workflow resumes, it picks up from the first incomplete task.
Think of it like a save game. Every time a task finishes, Smithers writes the result to disk. If the power goes out, you do not replay the entire game from the title screen. You reload your last save and keep going. So that forty-minute, twelve-step workflow that crashed at step seven? You resume from step seven. Steps one through six are done. Their outputs are already in SQLite. You do not pay for them again.
How State Is Preserved
Every task output is written to SQLite immediately on completion, keyed by (runId, nodeId, iteration). When you resume a run, Smithers does five things:
- Loads existing state — Reads run metadata, node states, and attempt history from SQLite
- Validates the environment — Checks that the workflow file hash and VCS revision match the original run
- Cleans up stale work — Cancels any in-progress attempts older than 15 minutes
- Re-renders — Builds the JSX tree with persisted outputs already in context
- Continues — Schedules and executes remaining incomplete tasks
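The keying scheme can be illustrated with a minimal sketch. It uses an in-memory map as a stand-in for the SQLite table; `OutputStore` and `runOnce` are hypothetical names chosen for illustration, not part of the Smithers API.

```typescript
// Hypothetical stand-in for the SQLite output table (not the Smithers API).
type OutputKey = { runId: string; nodeId: string; iteration: number };

class OutputStore {
  private rows = new Map<string, unknown>();

  // JSON-encode the tuple so IDs that contain separators cannot collide.
  private static encode(k: OutputKey): string {
    return JSON.stringify([k.runId, k.nodeId, k.iteration]);
  }

  has(k: OutputKey): boolean {
    return this.rows.has(OutputStore.encode(k));
  }

  get(k: OutputKey): unknown {
    return this.rows.get(OutputStore.encode(k));
  }

  put(k: OutputKey, output: unknown): void {
    this.rows.set(OutputStore.encode(k), output);
  }
}

// Execute a task only when no output is stored under its key.
function runOnce<T>(store: OutputStore, key: OutputKey, task: () => T): T {
  if (store.has(key)) {
    return store.get(key) as T; // completed earlier: return the saved output
  }
  const output = task();
  store.put(key, output); // persist immediately on completion
  return output;
}
```

Calling `runOnce` twice with the same key runs the task body once; the second call returns the stored output, which is the behavior the durability contract describes.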
Three Ways Workflows Pause
There are exactly three reasons a workflow stops before it finishes: something broke, someone needs to decide, or you told it to stop.
1. Crash Recovery
The process dies. Maybe the machine ran out of memory. Maybe you hit Ctrl-C at the wrong moment. Either way, some tasks are stuck in in-progress with no process behind them.
On resume, Smithers handles this automatically: in-progress attempts older than 15 minutes are cancelled and retried as pending.
2. Approval Gates
Some steps should not proceed without a human saying yes. When a workflow reaches an <Approval> node or a <Task needsApproval>, it pauses durably until someone decides.
The run stays in waiting-approval status. Nothing runs. Nothing times out. It waits as long as it needs to. Resolve it from the CLI:
3. Manual Cancellation
Sometimes you want to stop a workflow on purpose: maybe you realized the input was wrong, or you need the machine for something else. Cancel now, resume later:
What Gets Skipped on Resume
This table is worth memorizing, or at least bookmarking:
| Node state before resume | Behavior on resume |
|---|---|
| finished | Skipped. Output exists and is valid. |
| skipped | Remains skipped. |
| failed (retries exhausted) | Stays failed unless workflow code now allows more retries. |
| in-progress (stale, >15 min) | Cancelled, then retried as pending. |
| in-progress (recent) | Left in-progress. Will time out and be cleaned up on next resume. |
| pending | Scheduled for execution. |
| waiting-approval | Stays waiting. Approve or deny to unblock. |
| cancelled | Stays cancelled. |
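The table can be read as a pure function from node state to resume behavior. The sketch below is one way to express it; the `NodeSnapshot` shape, its field names, and the `ResumeAction` labels are assumptions made for illustration and do not reflect how Smithers stores or names these values internally.

```typescript
type NodeState =
  | "finished" | "skipped" | "failed" | "in-progress"
  | "pending" | "waiting-approval" | "cancelled";

// "skip": reuse stored output, "keep": leave the state untouched,
// "retry": reset to pending and run again, "schedule": run for the first time.
type ResumeAction = "skip" | "keep" | "retry" | "schedule";

const STALE_AFTER_MINUTES = 15; // threshold taken from the table above

// Hypothetical snapshot of a node as loaded on resume.
interface NodeSnapshot {
  state: NodeState;
  minutesSinceStart?: number; // only meaningful for in-progress nodes
  attemptsUsed?: number;      // only meaningful for failed nodes
  maxAttempts?: number;       // retry budget in the current workflow code
}

function resumeAction(node: NodeSnapshot): ResumeAction {
  switch (node.state) {
    case "finished":
      return "skip";
    case "skipped":
    case "cancelled":
    case "waiting-approval":
      return "keep";
    case "failed":
      // Stays failed unless the workflow code now allows more retries.
      return (node.attemptsUsed ?? 0) < (node.maxAttempts ?? 0) ? "retry" : "keep";
    case "in-progress":
      // Stale attempts are cancelled and retried; recent ones are left alone.
      return (node.minutesSinceStart ?? 0) > STALE_AFTER_MINUTES ? "retry" : "keep";
    case "pending":
      return "schedule";
  }
}
```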
Resuming Programmatically
The CLI is fine for manual recovery. For automation, use the API directly:
runId is the thread that ties the two calls together.
Stable Task IDs
Here is where most people trip up. Resumability depends on stable, deterministic task identity. The id prop on each <Task> becomes the durable key in SQLite. If the key changes between runs, Smithers cannot find the old output. It treats the task as new and runs it from scratch.
Why is task-${index} bad? Because if you insert a new item at the beginning of a list, every index shifts. Task 3 becomes task 4, and suddenly Smithers loads task 4’s old output into the wrong context. This is the same problem React has with list keys, and the fix is the same: derive keys from the data, not the position.
Rules for stable IDs:
- Use fixed strings for static tasks: id="analyze", id="report"
- Derive IDs from stable data for dynamic tasks: id={$:implement}
- Never use array indices, timestamps, or random values
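The index-shift problem is easy to reproduce outside of Smithers. In this sketch, `Ticket`, its `key` field, and both ID helpers are hypothetical; the only point is to compare an ID derived from position with one derived from the data.

```typescript
// Hypothetical work item; `key` stands for any stable identifier in your data.
interface Ticket {
  key: string;
  title: string;
}

type MakeId = (ticket: Ticket, index: number) => string;

// Position-based ID: changes whenever the list is reordered or prepended to.
const indexId: MakeId = (_ticket, index) => `task-${index}`;

// Data-based ID: depends only on the ticket itself.
const stableId: MakeId = (ticket) => `${ticket.key}:implement`;

// Look up the task ID a given ticket would receive in a given list.
function idOf(tickets: Ticket[], key: string, makeId: MakeId): string | undefined {
  const index = tickets.findIndex((t) => t.key === key);
  return index === -1 ? undefined : makeId(tickets[index], index);
}

const before: Ticket[] = [
  { key: "T-1", title: "Fix login" },
  { key: "T-2", title: "Add export" },
];
// A new ticket is inserted at the front between the original run and the resume.
const after: Ticket[] = [{ key: "T-0", title: "Urgent hotfix" }, ...before];

console.log(idOf(before, "T-1", indexId), idOf(after, "T-1", indexId));
console.log(idOf(before, "T-1", stableId), idOf(after, "T-1", stableId));
```

With `indexId`, ticket T-1 moves from `task-0` to `task-1`, so a resume would look up the wrong stored row. With `stableId`, it is `T-1:implement` in both lists.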
Loop State Persistence
Loops are where durability really earns its keep. If a workflow crashes mid-loop, you do not want to replay every completed iteration. And you do not have to:
- Completed iterations are preserved (each has its own output row)
- The loop resumes from the incomplete iteration
- ctx.latest() correctly returns the most recent completed output
For example, if a loop of implement and review tasks crashes in iteration 2 after implement but before review, resuming picks up at iteration 2’s review task. Iterations 0 and 1 are untouched. Their outputs sit in SQLite, ready for anything that needs them.
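A rough sketch of how a resume point could be derived from per-iteration output rows follows. The `Row` shape and the `resumePoint` and `latest` helpers are illustrative assumptions; in particular, `latest` only mimics the described behavior of ctx.latest() and is not its actual signature.

```typescript
// Hypothetical shape of one persisted output row inside a loop.
type Row = { nodeId: string; iteration: number; output: string };

// Find the first (iteration, task) pair that has no stored output.
function resumePoint(
  rows: Row[],
  tasksPerIteration: string[],
  maxIterations: number,
): { iteration: number; nodeId: string } | null {
  const done = new Set(rows.map((r) => `${r.nodeId}#${r.iteration}`));
  for (let iteration = 0; iteration < maxIterations; iteration++) {
    for (const nodeId of tasksPerIteration) {
      if (!done.has(`${nodeId}#${iteration}`)) {
        return { iteration, nodeId };
      }
    }
  }
  return null; // every iteration is complete
}

// Return the output of the highest completed iteration for a task.
function latest(rows: Row[], nodeId: string): string | undefined {
  let best: Row | undefined;
  for (const r of rows) {
    if (r.nodeId === nodeId && (best === undefined || r.iteration > best.iteration)) {
      best = r;
    }
  }
  return best?.output;
}

// State after a crash in iteration 2: implement finished, review did not.
const rows: Row[] = [
  { nodeId: "implement", iteration: 0, output: "impl-0" },
  { nodeId: "review", iteration: 0, output: "review-0" },
  { nodeId: "implement", iteration: 1, output: "impl-1" },
  { nodeId: "review", iteration: 1, output: "review-1" },
  { nodeId: "implement", iteration: 2, output: "impl-2" },
];
```

Here `resumePoint(rows, ["implement", "review"], 5)` yields iteration 2 and the review task, while `latest(rows, "review")` still returns the iteration 1 review output.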
Environment Validation
“What if I fix a bug in my workflow and then resume?” Smithers will not let you. On resume, Smithers checks that:
- The workflow file hash matches the original run
- The VCS revision matches (if tracked)
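The two checks amount to comparing values recorded at the start of the run with values computed at resume time. The sketch below assumes a hypothetical `RunEnvironment` record and treats a missing revision as "not tracked"; how Smithers computes the hash and what it does on a mismatch are not shown here.

```typescript
// Hypothetical record of the environment a run was started in.
interface RunEnvironment {
  workflowHash: string; // hash of the workflow file contents
  vcsRevision?: string; // undefined when the run did not track a revision
}

// Return a list of mismatches; an empty list means the resume may proceed.
function validateEnvironment(original: RunEnvironment, current: RunEnvironment): string[] {
  const problems: string[] = [];
  if (original.workflowHash !== current.workflowHash) {
    problems.push("workflow file hash changed");
  }
  // Only compare revisions when the original run tracked one.
  if (original.vcsRevision !== undefined && original.vcsRevision !== current.vcsRevision) {
    problems.push("VCS revision changed");
  }
  return problems;
}
```

Returning every mismatch, rather than stopping at the first, lets a caller report both problems in a single message.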
Next Steps
- Human-in-the-Loop — Approval gates, denial policies, and multi-step approval patterns.
- Resumability Guide — Practical tips for designing resumable workflows.
- Execution Model — The internal execution loop that drives suspend and resume.