- the run input payload
- task output rows
- internal workflow metadata
Run Input
When you kick off a workflow, you hand it a payload. That payload is the entire context your workflow gets from the outside world: ctx.input. If the run crashes and resumes, the same input is still there unless you explicitly override it.
So what should go in input? Three things:
- user-supplied run context
- durable across resume
- available everywhere through ctx.input
Task Outputs
Here is where your domain data lives. Most Smithers workflows define output schemas up front with createSmithers(...). For each schema, Smithers:
- creates the SQLite table
- maps the schema key to a snake_case table name
- adds runId, nodeId, and iteration bookkeeping columns
- validates agent output before persisting it
runId, nodeId, iteration, attempt, approval metadata
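To make the key-to-table-name mapping concrete, here is a minimal sketch. The helper toSnakeCase is hypothetical and only handles simple camelCase keys; the exact rule Smithers applies may differ in edge cases such as acronyms.

```typescript
// Hypothetical helper, not part of the Smithers API.
// Inserts an underscore at each lower-to-upper boundary, then lowercases.
function toSnakeCase(schemaKey: string): string {
  return schemaKey
    .replace(/([a-z0-9])([A-Z])/g, "$1_$2")
    .toLowerCase();
}

// Example schema keys (made up for illustration).
const tableNames = ["analysis", "codeReview", "finalReport2Draft"].map(toSnakeCase);
console.log(tableNames);
```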
Identity of an Output Row
Here is a subtlety that trips people up. Two different tasks can write to the same output schema. The same task can write to it ten times inside a loop. So how does Smithers know which row is which? The answer: output identity is not “table name only.” Each row is keyed by:
- run id
- task id (nodeId)
- iteration when the task is inside a loop
ctx.output(...), ctx.outputMaybe(...), and ctx.latest(...) all require both an output target and a nodeId. The table tells Smithers where to look. The node ID and iteration tell it which row you mean.
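A small in-memory sketch of that lookup rule follows. The row shape and the findRow helper are hypothetical, and defaulting iteration to 0 outside loops is an assumption; this is not how ctx.output is implemented, only an illustration of why the table alone is not enough.

```typescript
// Hypothetical row shape: bookkeeping columns plus a domain payload.
interface OutputRow {
  runId: string;
  nodeId: string;
  iteration: number;
  payload: Record<string, unknown>;
}

// Illustrative lookup: all three identity parts must match.
// Assumption: tasks outside a loop use iteration 0.
function findRow(
  rows: OutputRow[],
  runId: string,
  nodeId: string,
  iteration = 0,
): OutputRow | undefined {
  return rows.find(
    (r) => r.runId === runId && r.nodeId === nodeId && r.iteration === iteration,
  );
}

// Two tasks share one table, and "review" ran twice inside a loop.
const reviewRows: OutputRow[] = [
  { runId: "run-1", nodeId: "review", iteration: 0, payload: { verdict: "revise" } },
  { runId: "run-1", nodeId: "review", iteration: 1, payload: { verdict: "approve" } },
  { runId: "run-1", nodeId: "audit", iteration: 0, payload: { verdict: "approve" } },
];
```

Asking for ("run-1", "review", 1) returns the second row; asking for the table alone would be ambiguous.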
Custom Drizzle Tables
Sometimes you already have a table, or you need a schema that Smithers cannot auto-generate. In that case, <Task output={...}> can point at a custom Drizzle table.
Fair warning: when you go this route, you take on responsibility that Smithers normally handles for you:
- creating and migrating the table
- including Smithers bookkeeping columns such as runId and nodeId
- including iteration in looped tasks
- optionally pairing the table with outputSchema for stricter validation
If createSmithers(...) can express your schema, use it.
Internal Smithers Metadata
Open your database and you will see tables prefixed with _smithers_. Do not be alarmed. These are Smithers’ own operational tables:
| Table | Purpose |
|---|---|
| _smithers_runs | One row per workflow run. Tracks status, heartbeat, VCS revision, and error. |
| _smithers_nodes | Current state of each task node within a run (pending, running, finished, failed). |
| _smithers_attempts | Every execution attempt for every node, including start/finish timestamps and error detail. |
| _smithers_frames | The rendered JSX tree at each commit boundary, stored as serialized XML. |
| _smithers_approvals | Approval requests and decisions for tasks gated by <Approval>. |
| _smithers_human_requests | Human-in-the-loop requests (form fills, confirmations) and their responses. |
| _smithers_cache | Cached task outputs keyed by workflow, node, schema signature, and agent signature. |
| _smithers_sandboxes | Sandbox session metadata for bubblewrap and container-based execution. |
| _smithers_tool_calls | Per-call log of every tool invocation: input, output, latency, and status. |
| _smithers_events | Sequential event journal for a run. Source of truth for all observable events. |
| _smithers_ralph | Loop (<Loop>) iteration counters and completion flags. |
| _smithers_cron | Cron schedule definitions, last-run and next-run timestamps. |
| _smithers_scorers | Scorer results for each task attempt: score, reason, and latency. |
| _smithers_vectors | RAG vector store: chunk text, embeddings (as BLOBs), and metadata. |
| _smithers_signals | Inbound signals received by waiting runs. |
Table Schema Ensurance and Auto-Migration
Smithers calls ensureSmithersTables() at startup, which runs CREATE TABLE IF NOT EXISTS for every internal table. You never need to run migrations by hand for _smithers_* tables.
For your own output tables defined via createSmithers(...), Smithers also auto-migrates columns. When the Drizzle schema defines a column that is missing from the SQLite table on disk, Smithers issues an ALTER TABLE ... ADD COLUMN statement to add it. Columns that exist in the database but are absent from the schema are left in place — Smithers does not remove data.
This forward-only migration means you can add fields to an output schema and existing runs will continue to work. Removing a field or changing a column type requires a manual migration or a fresh database.
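The forward-only rule can be sketched as a pure planning function. Everything here is illustrative: ColumnDef and planAddColumns are hypothetical names, and the real migration reads column metadata from Drizzle rather than from a hand-written list.

```typescript
// Hypothetical column descriptor; real Drizzle metadata is richer.
interface ColumnDef {
  name: string;
  sqlType: string; // e.g. "TEXT", "INTEGER", "REAL"
}

// Returns one ALTER TABLE statement per column that the schema defines
// but the on-disk table lacks. Columns that exist only on disk are
// ignored, so nothing is ever dropped.
function planAddColumns(
  table: string,
  schemaColumns: ColumnDef[],
  existingColumnNames: string[],
): string[] {
  const existing = new Set(existingColumnNames);
  return schemaColumns
    .filter((col) => !existing.has(col.name))
    .map((col) => `ALTER TABLE "${table}" ADD COLUMN "${col.name}" ${col.sqlType}`);
}

const plan = planAddColumns(
  "analysis",
  [
    { name: "summary", sqlType: "TEXT" },
    { name: "confidence", sqlType: "REAL" },
  ],
  ["run_id", "node_id", "summary", "legacy_notes"],
);
console.log(plan);
```

Here the plan adds confidence and leaves legacy_notes alone, which is why removing a field still needs a manual migration.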
Schema Signature Verification
Before persisting a cached task result, Smithers computes a schema signature for the output table. The signature is a SHA-256 hash of the table name and every column’s name, type, nullability, and primary key flag, all sorted alphabetically. It is stored as schema_sig in _smithers_cache. When a cached result is retrieved, Smithers checks that the current table’s signature still matches. If the schema changed since caching, the cached entry is ignored and the task runs fresh. You never get silently stale cache hits after a schema migration.
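Here is a rough sketch of such a signature using Node's crypto module. The serialization format (field separators and the 0/1 flags) is an assumption made for illustration; only the inputs to the hash come from the description above.

```typescript
import { createHash } from "node:crypto";

interface ColumnInfo {
  name: string;
  type: string;
  notNull: boolean;
  primaryKey: boolean;
}

// Assumed serialization: one "name:type:notNull:pk" string per column,
// sorted so that declaration order does not affect the hash.
function schemaSignature(table: string, columns: ColumnInfo[]): string {
  const parts = columns
    .map((c) => `${c.name}:${c.type}:${c.notNull ? 1 : 0}:${c.primaryKey ? 1 : 0}`)
    .sort();
  return createHash("sha256").update([table, ...parts].join("|")).digest("hex");
}

const summaryCol: ColumnInfo = { name: "summary", type: "TEXT", notNull: true, primaryKey: false };
const runIdCol: ColumnInfo = { name: "run_id", type: "TEXT", notNull: true, primaryKey: true };
const scoreCol: ColumnInfo = { name: "score", type: "REAL", notNull: false, primaryKey: false };

const sigA = schemaSignature("analysis", [summaryCol, runIdCol]);
const sigB = schemaSignature("analysis", [runIdCol, summaryCol]); // same columns, reordered
const sigC = schemaSignature("analysis", [runIdCol, summaryCol, scoreCol]); // column added
```

Reordering columns leaves the signature unchanged, while adding a column changes it, which is what invalidates the cached entry.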
Transaction Model
SmithersDb uses a single-writer transaction model with a serial promise queue. Every write operation (including those outside an explicit transaction) acquires a turn in a transactionTail promise chain before proceeding. This serializes all writes even when multiple Effect fibers run concurrently.
Explicit transactions use BEGIN IMMEDIATE so SQLite acquires a write lock immediately, preventing lock contention with concurrent readers.
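The serial promise queue idea can be sketched in a few lines. SerialQueue and the write helper are hypothetical names, and the sleep stands in for real database work; the actual SmithersDb code is built on Effect and is more involved.

```typescript
// Minimal serial queue: each task starts only after the previous one settles.
class SerialQueue {
  private tail: Promise<unknown> = Promise.resolve();

  run<T>(task: () => Promise<T>): Promise<T> {
    const result = this.tail.then(() => task());
    // Swallow rejections on the tail so one failed write does not block the rest.
    this.tail = result.catch(() => undefined);
    return result;
  }
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));
const log: string[] = [];
const queue = new SerialQueue();

// Stand-in for a transaction: BEGIN, some work, COMMIT.
async function write(label: string, ms: number): Promise<void> {
  log.push(`begin ${label}`);
  await sleep(ms);
  log.push(`commit ${label}`);
}

// Both writes are requested at once, but "b" waits for "a" to commit.
const bothWrites = Promise.all([
  queue.run(() => write("a", 20)),
  queue.run(() => write("b", 1)),
]);
```

Without the queue, the faster write b would commit in the middle of a.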
Write Retry and Exponential Backoff
All write paths wrap the underlying operation with withSqliteWriteRetryEffect. When a write fails with SQLITE_BUSY, SQLITE_IOERR, “database is locked”, or “disk i/o error”, Smithers retries up to six times with exponential backoff:
- Base delay: 50 ms
- Maximum delay: 2,000 ms
- Jitter: ±25% of the computed delay
- Each retry increments the smithers.db.retries counter
If every retry fails, the error surfaces as a DB_WRITE_FAILED SmithersError. This makes Smithers resilient to transient WAL-mode lock contention without requiring any configuration.
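To get a feel for the resulting delays, here is a sketch of the delay calculation. The constants come from the list above; the doubling factor and applying the cap before the jitter are assumptions, since the exact formula is not given here.

```typescript
const BASE_DELAY_MS = 50;
const MAX_DELAY_MS = 2000;
const JITTER_RATIO = 0.25;
const MAX_RETRIES = 6;

// `retry` is zero-based. `random` is injectable so the jitter can be pinned in tests.
// Assumption: the delay doubles on each retry and is capped before jitter is applied.
function backoffDelayMs(retry: number, random: () => number = Math.random): number {
  const capped = Math.min(BASE_DELAY_MS * 2 ** retry, MAX_DELAY_MS);
  const jitter = (random() * 2 - 1) * JITTER_RATIO; // in [-0.25, +0.25)
  return Math.round(capped * (1 + jitter));
}

// random() = 0.5 yields zero jitter, which exposes the nominal schedule.
const nominalDelays = Array.from({ length: MAX_RETRIES }, (_, i) =>
  backoffDelayMs(i, () => 0.5),
);
console.log(nominalDelays);
```

Under these assumptions six retries never reach the 2,000 ms cap, so the cap only matters if the retry count or base delay is raised.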
Frame Codec
Render frames in _smithers_frames are stored in one of three encodings:
| Encoding | When used | Description |
|---|---|---|
| full | Frame 0 and any keyframe | Complete serialized XML of the render tree |
| delta | Frames between keyframes | JSON patch (set, insert, remove ops) relative to the previous frame |
| keyframe | Every 50th frame | Same as full; resets the delta chain |
Keyframes are written every 50 frames (FRAME_KEYFRAME_INTERVAL = 50). Reading an arbitrary frame requires loading the nearest preceding keyframe and applying all deltas up to the target frame number. An in-memory LRU cache (up to 512 entries) stores reconstructed frame XML so repeated reads of hot frames skip the reconstruction.
Delta encoding uses a structural diff algorithm that walks the XML JSON tree, emitting set, insert, and remove operations. It is node-ID-aware: when comparing adjacent objects in the tree, it uses the id prop of element nodes as a stable identity anchor, so reordered elements produce insert/remove pairs rather than spurious updates.
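Frame reconstruction can be sketched with a deliberately simplified model. Here the render tree is a flat record and deltas only support set and remove; the real codec patches a nested tree and also has insert ops. All type and function names are hypothetical.

```typescript
type Tree = Record<string, string>; // simplified stand-in for the render tree

type DeltaOp =
  | { op: "set"; path: string; value: string }
  | { op: "remove"; path: string };

type StoredFrame =
  | { encoding: "full" | "keyframe"; tree: Tree }
  | { encoding: "delta"; ops: DeltaOp[] };

function applyDelta(tree: Tree, ops: DeltaOp[]): Tree {
  const next: Tree = { ...tree };
  for (const op of ops) {
    if (op.op === "set") next[op.path] = op.value;
    else delete next[op.path];
  }
  return next;
}

// Walk back to the nearest preceding full frame or keyframe,
// then replay every delta forward up to the target frame.
function reconstructFrame(frames: StoredFrame[], frameNo: number): Tree {
  let start = frameNo;
  while (start > 0 && frames[start].encoding === "delta") start--;
  const base = frames[start];
  if (base.encoding === "delta") throw new Error(`no keyframe before frame ${frameNo}`);
  let tree = base.tree;
  for (let i = start + 1; i <= frameNo; i++) {
    const frame = frames[i];
    if (frame.encoding === "delta") tree = applyDelta(tree, frame.ops);
  }
  return tree;
}

const stored: StoredFrame[] = [
  { encoding: "full", tree: { "task:a": "pending" } },
  { encoding: "delta", ops: [{ op: "set", path: "task:a", value: "running" }] },
  {
    encoding: "delta",
    ops: [
      { op: "set", path: "task:b", value: "pending" },
      { op: "remove", path: "task:a" },
    ],
  },
];
```

The cost of a read grows with the distance from the last keyframe, which is the reason for capping delta chains at a fixed interval.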
Signal Persistence
Signals are external messages sent to a running workflow. When a signal arrives, Smithers writes it to _smithers_signals with an automatically allocated sequence number. You never pick the seq yourself: Smithers computes MAX(seq) + 1 inside a BEGIN IMMEDIATE transaction so two concurrent signals never collide.
Before inserting, the adapter checks whether an identical signal already exists (same runId, signalName, correlationId, payloadJson, receivedAtMs, and receivedBy). If a match is found, the existing seq is returned and no duplicate row is created. This deduplication prevents replay or retry from doubling signals.
Signal Query Filters
Querying signals supports four filters, all optional:
| Filter | Column | Description |
|---|---|---|
| signalName | signal_name | Match a specific signal type |
| correlationId | correlation_id | Match a specific correlation key (supports null) |
| receivedAfterMs | received_at_ms | Only signals received at or after this timestamp |
| limit | n/a | Max rows to return (default 200) |
Results are ordered by seq ASC, so you always see signals in arrival order.
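The filter semantics can be sketched against an in-memory array. The querySignals function and its types are hypothetical stand-ins for the adapter's SQL query; only the filter names, the default limit, and the ordering come from the table above.

```typescript
interface SignalRow {
  seq: number;
  signalName: string;
  correlationId: string | null;
  receivedAtMs: number;
}

interface SignalFilter {
  signalName?: string;
  correlationId?: string | null; // null matches signals with no correlation key
  receivedAfterMs?: number;
  limit?: number;
}

// Every filter is optional; an omitted filter matches everything.
function querySignals(rows: SignalRow[], filter: SignalFilter = {}): SignalRow[] {
  const { signalName, correlationId, receivedAfterMs, limit = 200 } = filter;
  return rows
    .filter((r) => signalName === undefined || r.signalName === signalName)
    .filter((r) => correlationId === undefined || r.correlationId === correlationId)
    .filter((r) => receivedAfterMs === undefined || r.receivedAtMs >= receivedAfterMs)
    .sort((a, b) => a.seq - b.seq) // arrival order
    .slice(0, limit);
}

// Made-up rows, deliberately out of order.
const signalRows: SignalRow[] = [
  { seq: 2, signalName: "approve", correlationId: "pr-7", receivedAtMs: 300 },
  { seq: 0, signalName: "approve", correlationId: null, receivedAtMs: 100 },
  { seq: 1, signalName: "cancel", correlationId: "pr-7", receivedAtMs: 200 },
];
```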
Event Persistence
The _smithers_events table is the durable event journal for each run. Every SmithersEvent emitted during execution is persisted here with a sequential seq number that serves as the total ordering.
Auto-Sequence Allocation
Like signals, events get their seq via SELECT COALESCE(MAX(seq), -1) + 1 inside a BEGIN IMMEDIATE transaction. This guarantees gap-free, monotonically increasing sequence numbers per run.
Insert Deduplication
Before inserting, the adapter checks for an existing row matching the same runId, timestampMs, type, and payloadJson. If found, the existing seq is returned without creating a duplicate. This makes event insertion idempotent across retries.
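Sequence allocation and deduplication fit together as in the sketch below. It uses an array in place of the SQLite table, so there is no transaction; insertEvent, the row shape, and the event type strings are hypothetical.

```typescript
interface EventRow {
  runId: string;
  seq: number;
  timestampMs: number;
  type: string;
  payloadJson: string;
}

type NewEvent = Omit<EventRow, "seq">;

// In-memory stand-in for the events table. The real adapter does the same
// check-then-insert in SQL inside a BEGIN IMMEDIATE transaction.
function insertEvent(table: EventRow[], event: NewEvent): number {
  const duplicate = table.find(
    (r) =>
      r.runId === event.runId &&
      r.timestampMs === event.timestampMs &&
      r.type === event.type &&
      r.payloadJson === event.payloadJson,
  );
  if (duplicate) return duplicate.seq;

  // Equivalent of COALESCE(MAX(seq), -1) + 1, scoped to the run.
  const maxSeq = table
    .filter((r) => r.runId === event.runId)
    .reduce((max, r) => Math.max(max, r.seq), -1);
  const seq = maxSeq + 1;
  table.push({ ...event, seq });
  return seq;
}

const journal: EventRow[] = [];
const started: NewEvent = { runId: "run-1", timestampMs: 10, type: "NodeStarted", payloadJson: "{}" };
const firstSeq = insertEvent(journal, started);
const secondSeq = insertEvent(journal, { ...started, timestampMs: 20, type: "NodeFinished" });
const retriedSeq = insertEvent(journal, started); // retry of the first insert
```

The retried insert returns the original seq and leaves the journal at two rows.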
Event Queue and Flush
For performance, events can be enqueued asynchronously via emitEventQueued. The event is emitted to listeners and tracked immediately, but database and log-file persistence happens in a background promise chain (persistTail). Call flush() to await all queued persistence. The engine does this at task boundaries and run completion to ensure nothing is lost.
Sequence Start Override
The EventBus constructor accepts a startSeq option, which sets the initial sequence counter. This is used on resume to continue from where the previous run left off, preventing sequence number collisions with already-persisted events.
Event History Queries
The adapter supports filtered history queries with these parameters:
| Filter | Description |
|---|---|
| afterSeq | Return events with seq > afterSeq |
| limit | Max rows |
| nodeId | Filter by $.nodeId inside the payload JSON |
| types | Filter to specific event type strings |
| sinceTimestampMs | Events at or after this timestamp |
The countEventHistory method returns the count matching the same filters, useful for pagination.
Human Request Persistence
Human requests (form fills, confirmations, free-text prompts) are stored in _smithers_human_requests with lifecycle states: pending, answered, cancelled, expired.
Pending Inbox Query
listPendingHumanRequests returns all pending requests across all runs, joined with _smithers_runs and _smithers_nodes to include the workflowName, runStatus, and nodeLabel. Before returning, it automatically expires any requests whose timeoutAtMs has passed, transitioning them to expired status.
Answer Persistence
answerHumanRequest sets the response JSON, timestamp, and optional answeredBy field, transitioning the request from pending to answered. Only pending requests can be answered — the WHERE status = 'pending' clause prevents double-answering.
Cancellation
cancelHumanRequest transitions a pending request to cancelled. Like answering, it only operates on requests in pending status.
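The three transitions share one guard: they only act on pending requests. The sketch below models that with plain objects. The function names and request shape are hypothetical, and treating a request as timed out once the clock reaches timeoutAtMs is an assumption.

```typescript
type HumanRequestStatus = "pending" | "answered" | "cancelled" | "expired";

interface HumanRequest {
  id: string;
  status: HumanRequestStatus;
  timeoutAtMs: number | null;
  responseJson: string | null;
}

// Each function mirrors an UPDATE ... WHERE status = 'pending'
// and returns true only if the row actually changed.
function answerRequest(req: HumanRequest, responseJson: string): boolean {
  if (req.status !== "pending") return false;
  req.status = "answered";
  req.responseJson = responseJson;
  return true;
}

function cancelRequest(req: HumanRequest): boolean {
  if (req.status !== "pending") return false;
  req.status = "cancelled";
  return true;
}

// Assumption: a request expires once nowMs reaches its timeoutAtMs.
function expireIfTimedOut(req: HumanRequest, nowMs: number): boolean {
  if (req.status !== "pending" || req.timeoutAtMs === null || nowMs < req.timeoutAtMs) {
    return false;
  }
  req.status = "expired";
  return true;
}

const confirmDeploy: HumanRequest = { id: "req-1", status: "pending", timeoutAtMs: null, responseJson: null };
const staleForm: HumanRequest = { id: "req-2", status: "pending", timeoutAtMs: 1_000, responseJson: null };
```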
Cron Persistence
Cron schedules are stored in _smithers_cron and managed through the adapter:
| Operation | Method | Description |
|---|---|---|
| Create/Update | upsertCron | Inserts or updates a cron schedule by cronId |
| List | listCrons(enabledOnly?) | Returns all cron entries, optionally filtering to enabled = true |
| Track execution | updateCronRunTime | Updates lastRunAtMs, nextRunAtMs, and optional errorJson |
| Delete | deleteCron | Removes a cron entry by ID |
The enabled flag allows disabling a schedule without deleting it. The lastRunAtMs and nextRunAtMs columns let the scheduler know when to fire next without recomputing from the cron pattern on every poll. If a scheduled run fails, the error is stored in errorJson on the cron row for diagnostics.
Run Lifecycle Management
Stale Run Claims
The supervisor detects stale runs by querying _smithers_runs for rows with status = 'running' whose heartbeat_at_ms is older than the stale threshold (default 30 seconds). To safely resume a stale run without races, the supervisor uses a compare-and-swap pattern:
- Claim: claimRunForResume atomically sets runtime_owner_id and heartbeat_at_ms only if the current values match the expected stale state. The WHERE clause checks runtime_owner_id, heartbeat_at_ms, and the stale threshold in a single UPDATE, and returns whether the row was modified.
- Release: If the supervisor decides not to resume after claiming, releaseRunResumeClaim restores the original runtime_owner_id and heartbeat_at_ms, but only if the claim is still held (the current runtime_owner_id matches the claimer).
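The claim step can be sketched as a compare-and-swap on an in-memory row. The tryClaimRun function and its argument names are hypothetical and are not the real claimRunForResume signature; in Smithers the comparison and the write happen in one SQL UPDATE.

```typescript
interface RunRow {
  runId: string;
  status: string;
  runtimeOwnerId: string | null;
  heartbeatAtMs: number | null;
}

interface ClaimArgs {
  expectedOwnerId: string | null;
  expectedHeartbeatAtMs: number | null;
  staleBeforeMs: number; // heartbeats older than this count as stale
  claimOwnerId: string;
  nowMs: number;
}

// Stand-in for a single conditional UPDATE: the write happens only if the row
// still looks exactly like the stale state the supervisor observed.
function tryClaimRun(run: RunRow, args: ClaimArgs): boolean {
  const matches =
    run.status === "running" &&
    run.runtimeOwnerId === args.expectedOwnerId &&
    run.heartbeatAtMs === args.expectedHeartbeatAtMs &&
    (run.heartbeatAtMs === null || run.heartbeatAtMs < args.staleBeforeMs);
  if (!matches) return false;
  run.runtimeOwnerId = args.claimOwnerId;
  run.heartbeatAtMs = args.nowMs;
  return true;
}

// Two supervisors observe the same stale row and race to claim it.
const staleRun: RunRow = { runId: "run-1", status: "running", runtimeOwnerId: "worker-old", heartbeatAtMs: 1_000 };
const observed = { expectedOwnerId: "worker-old", expectedHeartbeatAtMs: 1_000, staleBeforeMs: 30_000, nowMs: 60_000 };

const firstClaim = tryClaimRun(staleRun, { ...observed, claimOwnerId: "supervisor-a" });
const secondClaim = tryClaimRun(staleRun, { ...observed, claimOwnerId: "supervisor-b" });
```

The second claim fails because the owner and heartbeat no longer match what supervisor-b observed, so only one supervisor resumes the run.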
Sandbox Tracking
Sandbox sessions (bubblewrap, Docker, or Codeplane) are tracked in _smithers_sandboxes. The adapter upserts sandbox rows keyed by (runId, sandboxId), recording runtime type, configuration, status, shipping and completion timestamps, and bundle paths.
Output Edge Cases
Payload-Only Tables
When an output table’s only non-bookkeeping column is payload, Smithers detects it and wraps the entire agent output into that single column instead of spreading fields across multiple columns. This is useful for unstructured or polymorphic outputs where a fixed column set does not make sense.
Boolean Column Coercion
Bun’s SQLite driver returns raw 0/1 integers for columns declared with { mode: "boolean" } in Drizzle. When loading output snapshots, Smithers detects these columns by inspecting the Drizzle table metadata and coerces the integer values to proper JavaScript booleans. Without this, strict equality checks like value === true would fail.
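A minimal sketch of the coercion step follows. The coerceBooleans helper is hypothetical, and it takes the list of boolean column names as an argument instead of reading it from Drizzle table metadata.

```typescript
type Row = Record<string, unknown>;

// Converts raw 0/1 integers to booleans, but only for the named columns.
// Other integer columns and null values are left untouched.
function coerceBooleans(row: Row, booleanColumns: string[]): Row {
  const out: Row = { ...row };
  for (const col of booleanColumns) {
    const value = out[col];
    if (value === 0 || value === 1) out[col] = value === 1;
  }
  return out;
}

// Made-up row as the driver might return it.
const rawRow: Row = { approved: 1, archived: 0, retries: 1, reviewed: null };
const coerced = coerceBooleans(rawRow, ["approved", "archived", "reviewed"]);
```

Note that retries stays the number 1: the column list, not the value, decides what gets coerced.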
Schema Key Aliasing
When loading outputs via loadOutputs, each result set is stored under both the schema key (e.g., "analysis") and the actual SQLite table name (e.g., "analysis" or a custom name). This dual indexing lets downstream code reference outputs by either name, which matters when schema keys and table names diverge (e.g., with custom Drizzle tables).
Snapshot Persistence
Loading a complete workflow snapshot (loadInput + loadOutputs) reconstructs the full ctx state from SQLite. The input is loaded by filtering the input table for the current runId. Outputs are loaded by iterating every schema key, querying each table for rows matching the runId, applying boolean coercion, and indexing under both schema key and table name.
This snapshot is what powers resume: when a crashed run restarts, the snapshot populates ctx so the JSX tree renders with all completed outputs already in place.
Transaction Internals
Read Gating
Reads, like writes, acquire a turn in the transactionTail promise queue. This prevents reads from seeing intermediate state during a multi-statement transaction. If the current fiber already owns the active transaction, reads proceed immediately without acquiring a new turn.
Commit Retry
The entire withTransaction call is wrapped in withSqliteWriteRetryEffect. If the COMMIT (or BEGIN IMMEDIATE) fails with SQLITE_BUSY or an I/O error, the retry mechanism rolls back and retries the full transaction from BEGIN, using the same exponential backoff as standalone writes.
Why the Separation Matters
Ask two questions about any completed task.
Your workflow output answers: what did this task produce?
Smithers metadata answers:
- when did it run?
- how many attempts did it take?
- was it cached?
- did it wait for approval?
- which loop iteration produced it?
Schema Changes
Changing a Zod output schema is not just a prompt tweak. It is a persistence change. The table on disk has to match the schema in code. Typical examples:
- adding a field
- removing a field
- changing a field type
- tightening validation rules
Direct Queries
Smithers does not hide SQLite from you. The database is right there. Open it, poke around, write queries. Use output tables when you care about business results. Use _smithers_* tables when you care about execution history.
This is one of the advantages of keeping the layers separate: you can hand your output tables to an analyst who has never heard of Smithers, and the data makes sense on its own.
Mental Model
When in doubt, apply this rule of thumb:
- ctx.input is run-scoped input
- output tables hold validated task results
- _smithers_* tables hold orchestration state
Domain data does not belong in _smithers_*. The line is clean. Keep it that way.
Next Steps
- Execution Model — See how these tables participate in render, scheduling, and resume.
- Structured Output — Validation and persistence details for task outputs.
- Debugging — Query the internal tables directly when a run behaves unexpectedly.