Snapshots
A snapshot is a frozen picture of everything that matters at a specific frame in a run:
- Node states: which tasks are pending, running, finished, or failed
- Output rows: the actual data each completed task produced
- Loop (Ralph) state: loop iteration counters and completion flags (<Loop> renders as a <smithers:ralph> host element internally)
- Input data: the original input row that started the run
- VCS pointer: the jj change_id at the time of capture (if the repo uses jj)
- Workflow hash: hash of the workflow definition at capture time (null if unavailable)
- Content hash: SHA-256 of the serialized state, so you can detect identical snapshots cheaply
Snapshots are stored in _smithers_snapshots with a composite key of (run_id, frame_no). The serialized JSON blobs are self-contained: you can reconstruct the full workflow state from any single snapshot without reading the event log.
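To make the shape concrete, here is a minimal sketch of what one snapshot record could look like in TypeScript. The `SnapshotRecord` interface, its field names, and the `snapshotKey` and `sameContent` helpers are assumptions made for illustration; they are not the Smithers schema or API.

```typescript
// Hypothetical shape of one snapshot row. Field names are illustrative only.
type NodeState = "pending" | "running" | "finished" | "failed";

interface SnapshotRecord {
  runId: string;
  frameNo: number;
  nodes: Record<string, NodeState>;                              // node id -> state
  outputs: Record<string, unknown>;                              // node id -> output row
  loops: Record<string, { iteration: number; done: boolean }>;   // loop id -> loop state
  input: unknown;                                                // original input row
  vcsPointer: string | null;                                     // jj change_id, if the repo uses jj
  workflowHash: string | null;                                   // null if unavailable
  contentHash: string;                                           // SHA-256 of the serialized state
}

// Builds a string key that mirrors the (run_id, frame_no) composite key.
function snapshotKey(s: Pick<SnapshotRecord, "runId" | "frameNo">): string {
  return `${s.runId}:${s.frameNo}`;
}

// Two snapshots with the same content hash are treated as identical,
// so no field-by-field comparison is needed.
function sameContent(a: SnapshotRecord, b: SnapshotRecord): boolean {
  return a.contentHash === b.contentHash;
}
```

Comparing `contentHash` values is what makes duplicate detection cheap: two consecutive frames that changed nothing can be recognized without parsing either blob.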
Loading the Latest Snapshot
To load the most recent snapshot for a run (regardless of frame number), use loadLatestSnapshot.
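The signature of loadLatestSnapshot is not shown here, so the following is only a sketch of the selection logic over an in-memory array. It assumes that "most recent" means the highest frame number for the run; the `Snapshot` type and `latestSnapshot` function are hypothetical names.

```typescript
// Hypothetical minimal snapshot shape, not the Smithers type.
interface Snapshot {
  runId: string;
  frameNo: number;
  contentHash: string;
}

// Returns the snapshot with the highest frame number for the run,
// or undefined if the run has no snapshots.
function latestSnapshot(all: Snapshot[], runId: string): Snapshot | undefined {
  let latest: Snapshot | undefined;
  for (const s of all) {
    if (s.runId !== runId) continue;
    if (latest === undefined || s.frameNo > latest.frameNo) latest = s;
  }
  return latest;
}
```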
Why Not Just Use Events?
Events tell you what happened. Snapshots tell you what the world looked like. Replaying an event log to reconstruct state is expensive and error-prone. A snapshot gives you the answer in one read.
Diffing
Given two snapshots, diffSnapshots computes a structured diff. The same comparison is available from the CLI through smithers diff.
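The exact output shape of diffSnapshots is not given here. As a rough illustration of what a structured diff means, the sketch below compares only the node states of two snapshots and reports which nodes were added, removed, or changed. The `NodeDiff` type and `diffNodeStates` function are hypothetical.

```typescript
type NodeState = "pending" | "running" | "finished" | "failed";
type NodeStates = Record<string, NodeState>;

// Hypothetical diff shape covering node states only.
interface NodeDiff {
  added: string[];
  removed: string[];
  changed: { nodeId: string; from: NodeState; to: NodeState }[];
}

function diffNodeStates(before: NodeStates, after: NodeStates): NodeDiff {
  const diff: NodeDiff = { added: [], removed: [], changed: [] };
  for (const nodeId of Object.keys(after)) {
    const prev = before[nodeId];
    const next = after[nodeId];
    if (prev === undefined) {
      diff.added.push(nodeId);            // present only in the later snapshot
    } else if (next !== undefined && prev !== next) {
      diff.changed.push({ nodeId, from: prev, to: next });
    }
  }
  for (const nodeId of Object.keys(before)) {
    if (after[nodeId] === undefined) diff.removed.push(nodeId); // present only in the earlier snapshot
  }
  return diff;
}
```

A fuller diff would apply the same added/removed/changed pattern to output rows, loop state, and input data.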
Forking
Forking creates a new run that starts from the state of an existing run at a specific frame. Think git branch, but for workflow execution.
- A new run is created with a fresh runId
- The snapshot from the parent run at the specified frame is copied as the initial state
- Optionally, specific nodes are reset to “pending”; they and their downstream dependents will re-execute
- Optionally, the input is overridden with new values
- Optionally, a forkDescription is attached for traceability
- The parent-child relationship is recorded in _smithers_branches
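The steps above can be sketched as a pure function over in-memory state. Everything named here (`RunState`, `ForkOptions`, `BranchRecord`, `forkRun`) is hypothetical, and two behaviors are assumptions: the input override is merged over the parent input rather than replacing it, and only the listed nodes are reset (the transitive reset of dependents is left out for brevity).

```typescript
type NodeState = "pending" | "running" | "finished" | "failed";

interface RunState {
  nodes: Record<string, NodeState>;
  outputs: Record<string, unknown>;
  input: Record<string, unknown>;
}

interface ForkOptions {
  newRunId: string;                         // supplied by the caller so the sketch stays deterministic
  resetNodes?: string[];
  inputOverride?: Record<string, unknown>;
  forkDescription?: string;
}

interface BranchRecord {
  runId: string;
  parentRunId: string;
  parentFrameNo: number;
  forkDescription: string | null;
}

function forkRun(
  parentRunId: string,
  parentFrameNo: number,
  parentState: RunState,
  opts: ForkOptions,
): { runId: string; state: RunState; branch: BranchRecord } {
  // Copy the parent state so the parent run is never mutated.
  const nodes = { ...parentState.nodes };
  const outputs = { ...parentState.outputs };
  for (const nodeId of opts.resetNodes ?? []) {
    nodes[nodeId] = "pending";
    delete outputs[nodeId];
  }
  // Assumption: override values are merged over the parent input.
  const input = { ...parentState.input, ...(opts.inputOverride ?? {}) };
  return {
    runId: opts.newRunId,
    state: { nodes, outputs, input },
    branch: {
      runId: opts.newRunId,
      parentRunId,
      parentFrameNo,
      forkDescription: opts.forkDescription ?? null,
    },
  };
}
```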
Reset Nodes
The resetNodes parameter lets you selectively re-execute specific tasks. When a node is reset:
- Its state is set to pending
- Its output row is cleared
- Any downstream nodes that depend on it are also reset (transitively)
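The transitive part is a graph traversal. The sketch below assumes the dependency graph is available as a map from each node to the nodes that directly depend on it; that representation, and the `resetWithDependents` name, are assumptions for illustration rather than how Smithers stores dependencies.

```typescript
type NodeState = "pending" | "running" | "finished" | "failed";

interface WorkflowState {
  nodes: Record<string, NodeState>;
  outputs: Record<string, unknown>;
}

// dependents maps a node id to the ids of nodes that directly depend on it.
function resetWithDependents(
  state: WorkflowState,
  dependents: Record<string, string[]>,
  resetNodes: string[],
): WorkflowState {
  // Breadth-first walk collecting the requested nodes and everything downstream.
  const toReset = new Set<string>();
  const queue = [...resetNodes];
  while (queue.length > 0) {
    const id = queue.shift();
    if (id === undefined || toReset.has(id)) continue;
    toReset.add(id);
    for (const child of dependents[id] ?? []) queue.push(child);
  }

  // Apply the reset on copies so the input state is left untouched.
  const nodes = { ...state.nodes };
  const outputs = { ...state.outputs };
  toReset.forEach((id) => {
    nodes[id] = "pending";
    delete outputs[id];
  });
  return { nodes, outputs };
}
```

The visited set also protects against revisiting a node that is reachable through more than one path.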
Listing Branches
You can list all forks that branched from a given run. Each result is a BranchInfo containing the child runId, parentRunId, parentFrameNo, optional branchLabel and forkDescription, and a createdAtMs timestamp.
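As a sketch, the BranchInfo fields listed above map onto a TypeScript interface like the one below. The two helper functions are hypothetical and work on an in-memory array; they only illustrate the two directions of lookup (children of a run, and the parent record of a run), not the Smithers functions that perform them.

```typescript
// Fields follow the BranchInfo description; optional fields are modeled as optional properties.
interface BranchInfo {
  runId: string;
  parentRunId: string;
  parentFrameNo: number;
  branchLabel?: string;
  forkDescription?: string;
  createdAtMs: number;
}

// Hypothetical helper: all forks that branched from parentRunId, oldest first.
function childrenOf(branches: BranchInfo[], parentRunId: string): BranchInfo[] {
  return branches
    .filter((b) => b.parentRunId === parentRunId)
    .sort((x, y) => x.createdAtMs - y.createdAtMs);
}

// Hypothetical helper: the branch record for a run, or undefined if the run is not a fork.
function branchInfoFor(branches: BranchInfo[], runId: string): BranchInfo | undefined {
  return branches.find((b) => b.runId === runId);
}
```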
Looking Up Branch Info
To check whether a run is itself a fork and retrieve its parent relationship, look up its branch record by run ID.
Replay
Replay combines forking with execution. It creates a forked run and immediately starts running it.
With --restore-vcs, Smithers also restores the filesystem to the jj revision that was active at the source frame. This means the code that runs the workflow is the same code that was running when the snapshot was taken.
VCS Integration
Every snapshot records the jj change_id and operation ID at capture time. This creates a parallel timeline: workflow state in SQLite, filesystem state in jj.
The _smithers_vcs_tags table maps (run_id, frame_no) to VCS metadata, so the tag for a specific frame can be looked up by that pair.
With --restore-vcs, Smithers:
- Looks up the VCS pointer for the source frame
- Creates a jj workspace at that revision
- Executes the workflow from the workspace
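The three steps above can be sketched with the jj and execution details injected as callbacks, so nothing about the real commands is assumed. `VcsTag`, `RestoreDeps`, and `replayWithVcs` are hypothetical names, and returning a failure result when no tag exists for the frame is an assumption about behavior the text does not specify.

```typescript
// Hypothetical row shape for VCS metadata keyed by (runId, frameNo).
interface VcsTag {
  runId: string;
  frameNo: number;
  changeId: string;      // jj change_id
  operationId: string;   // jj operation ID
}

// The actual jj and workflow calls are injected; this sketch only fixes their order.
interface RestoreDeps {
  createWorkspace(changeId: string): string;   // returns the workspace path
  runWorkflow(workspacePath: string): void;
}

type RestoreResult = { ok: true; workspacePath: string } | { ok: false };

function replayWithVcs(
  tags: VcsTag[],
  runId: string,
  frameNo: number,
  deps: RestoreDeps,
): RestoreResult {
  // 1. Look up the VCS pointer for the source frame.
  const tag = tags.find((t) => t.runId === runId && t.frameNo === frameNo);
  if (tag === undefined) return { ok: false };  // assumption: no tag means nothing to restore
  // 2. Create a workspace at that revision.
  const workspacePath = deps.createWorkspace(tag.changeId);
  // 3. Execute the workflow from the workspace.
  deps.runWorkflow(workspacePath);
  return { ok: true, workspacePath };
}
```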
Timeline Visualization
The timeline shows the complete execution history of a run and all its forks. The --tree flag recursively includes all child runs (forks), building a tree of execution history. The TUI’s run detail view shows this automatically.
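Building that tree is a recursive walk over the parent-child records. The sketch below uses hypothetical `Branch` and `TimelineNode` types and an indented text rendering chosen for illustration; it is not the format the CLI or TUI prints. It assumes the records contain no cycles.

```typescript
interface Branch {
  runId: string;
  parentRunId: string;
  parentFrameNo: number;
}

interface TimelineNode {
  runId: string;
  forkedAtFrame: number | null;   // null for the root run
  children: TimelineNode[];
}

// Recursively attaches every fork under the run it branched from.
function buildTimelineTree(
  runId: string,
  branches: Branch[],
  forkedAtFrame: number | null = null,
): TimelineNode {
  const children = branches
    .filter((b) => b.parentRunId === runId)
    .map((b) => buildTimelineTree(b.runId, branches, b.parentFrameNo));
  return { runId, forkedAtFrame, children };
}

// Renders the tree as indented lines, one run per line.
function renderTree(node: TimelineNode, depth = 0): string[] {
  const label =
    node.forkedAtFrame === null
      ? node.runId
      : `${node.runId} (forked at frame ${node.forkedAtFrame})`;
  const lines = [`${"  ".repeat(depth)}${label}`];
  for (const child of node.children) lines.push(...renderTree(child, depth + 1));
  return lines;
}
```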
Database Tables
Time travel adds three tables:

| Table | Primary Key | Purpose |
|---|---|---|
| _smithers_snapshots | (run_id, frame_no) | Full state capture at each frame |
| _smithers_branches | run_id | Parent-child fork relationships |
| _smithers_vcs_tags | (run_id, frame_no) | jj revision metadata per snapshot |
The _smithers_runs table also gains three columns: parent_run_id, parent_frame_no, and branch_label, making fork relationships queryable directly from the runs table.
Selecting a Specific Attempt
When you time-travel to a node, by default Smithers picks the most recent attempt. Pass an explicit attempt number to target a different one.
In code, this corresponds to the attempt field on TimeTravelOptions. If the specified attempt does not exist, the operation fails with success: false and no changes are made.
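The selection rule can be sketched as follows. The `Attempt` and `SelectResult` types and the `selectAttempt` function are hypothetical, and "most recent" is assumed to mean the highest attempt number; only the success: false outcome for a missing attempt comes from the description above.

```typescript
interface Attempt {
  nodeId: string;
  attempt: number;
  startedAtMs: number;
}

type SelectResult = { success: true; attempt: Attempt } | { success: false };

function selectAttempt(attempts: Attempt[], nodeId: string, attempt?: number): SelectResult {
  const forNode = attempts.filter((a) => a.nodeId === nodeId);
  if (forNode.length === 0) return { success: false };

  if (attempt !== undefined) {
    // Explicit attempt: fail without changing anything if it does not exist.
    const match = forNode.find((a) => a.attempt === attempt);
    return match !== undefined ? { success: true, attempt: match } : { success: false };
  }

  // Default: the most recent attempt, assumed here to be the highest attempt number.
  const latest = forNode.reduce((best, a) => (a.attempt > best.attempt ? a : best));
  return { success: true, attempt: latest };
}
```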
Run Reset
The smithers reset command resets an entire run back to its starting state without creating a fork. Unlike smithers travel (which targets a specific node), a run reset re-queues every node and clears all outputs.
Use smithers replay instead when you want to preserve the original run history alongside the re-execution.
Metrics
Time travel operations export four metrics:

| Metric | Type | Description |
|---|---|---|
| smithers.snapshots.captured | counter | Total snapshots written to the database |
| smithers.snapshot.duration_ms | histogram | Time to serialize and write a single snapshot |
| smithers.forks.created | counter | Total fork operations completed |
| smithers.replays.started | counter | Total replay operations initiated |
Reset Dependents Toggle
By default, when you time-travel to a specific node, every downstream node that ran after the target attempt is also reset. You can disable cascade reset with --no-deps.
In code, this corresponds to resetDependents: false on TimeTravelOptions. This is useful when you want to re-run a single task without disturbing work that is downstream but was not actually affected by the target node’s output.
When resetDependents is true (the default), Smithers identifies all nodes whose attempts started at or after the target attempt’s start timestamp and resets them too.
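The timestamp rule can be sketched as a filter over attempt records. The `Attempt` type and `nodesToReset` function are hypothetical; only the "started at or after the target attempt's start timestamp" comparison comes from the description above.

```typescript
interface Attempt {
  nodeId: string;
  attempt: number;
  startedAtMs: number;
}

// Returns the ids of the nodes to reset, with the target node first.
function nodesToReset(attempts: Attempt[], target: Attempt, resetDependents = true): string[] {
  if (!resetDependents) return [target.nodeId];

  const ids = new Set<string>([target.nodeId]);
  for (const a of attempts) {
    // "At or after": an attempt that started at the same instant is included.
    if (a.startedAtMs >= target.startedAtMs) ids.add(a.nodeId);
  }
  return Array.from(ids);
}
```

Note that this rule is time-based rather than graph-based: a node that started after the target is reset even if it does not consume the target's output, which is the case the no-deps option is meant for.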
Frame History Truncation on Revert
When a time-travel operation completes, Smithers truncates the frame log to match. All frames with a created_at_ms after the target attempt’s start timestamp are deleted from _smithers_frames. This keeps the frame history consistent with the reset node states: if you render the workflow after time travel, the frame log reflects the point in time you reverted to.
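As a sketch, truncation is a filter on the frame timestamp. The `Frame` type and `truncateFrames` function are hypothetical, and scoping the deletion to a single run is an assumption; the strict "after" comparison follows the description above.

```typescript
interface Frame {
  runId: string;
  frameNo: number;
  createdAtMs: number;
}

// Keeps frames of other runs, and frames of this run created at or before the cutoff.
function truncateFrames(frames: Frame[], runId: string, targetStartedAtMs: number): Frame[] {
  return frames.filter((f) => f.runId !== runId || f.createdAtMs <= targetStartedAtMs);
}
```

A frame created at exactly the target timestamp is kept, because only frames created strictly after it are removed.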
Snapshot Restoration on Resume
When a suspended run resumes (for example, after a <WaitForEvent> unblocks), the engine calls restoreDurableStateFromSnapshot before re-entering the render loop. It loads the most recent snapshot for the run, re-inserts the input row, and rebuilds node state from the snapshot data. This means a resumed run does not need to replay the entire event log; it picks up from the last committed snapshot.
Events
Three new event types track time travel operations:
- SnapshotCaptured: emitted after each automatic snapshot. Carries runId, frameNo, and contentHash.
- RunForked: emitted when a fork is created. Carries the parent run ID, parent frame, and optional branch label.
- ReplayStarted: emitted when a replay begins. Carries the source run ID and frame.
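One way to consume these events in TypeScript is a discriminated union with an exhaustive switch. In the sketch below the event names and the SnapshotCaptured fields come from the list above; the `type` discriminant and the payload field names for RunForked and ReplayStarted are assumptions, since only their contents are described.

```typescript
// Assumed event shapes. Only runId, frameNo, and contentHash are named in the docs.
type TimeTravelEvent =
  | { type: "SnapshotCaptured"; runId: string; frameNo: number; contentHash: string }
  | { type: "RunForked"; parentRunId: string; parentFrameNo: number; branchLabel?: string }
  | { type: "ReplayStarted"; sourceRunId: string; sourceFrameNo: number };

// Produces a one-line summary; the switch is exhaustive over the union.
function describeEvent(e: TimeTravelEvent): string {
  switch (e.type) {
    case "SnapshotCaptured":
      return `snapshot ${e.runId}@${e.frameNo} (${e.contentHash})`;
    case "RunForked":
      return `fork of ${e.parentRunId}@${e.parentFrameNo}` + (e.branchLabel ? ` [${e.branchLabel}]` : "");
    case "ReplayStarted":
      return `replay from ${e.sourceRunId}@${e.sourceFrameNo}`;
  }
}
```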
Next Steps
- Time Travel Quickstart — Walk through diffing, replaying, and restoring runs from the CLI.
- Workflow State — The state model snapshots serialize and restore.
- Suspend and Resume — How resumed runs restore from the latest snapshot.
- Runtime Revert — Runtime APIs behind revert and replay operations.
- Debugging Guide — Use snapshot diffs and forks to investigate failures.