The engine emits SmithersEvent objects throughout a run. Subscribe via onProgress in runWorkflow, or read persisted events from NDJSON log files.
Events serve as the durable replay/audit log, correlate with structured logs through runId/nodeId/attempt, and drive built-in lifecycle counters. For OTLP export and Prometheus/Grafana setup, see Observability.
Subscribing
onProgress Callback
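As a rough illustration of what a subscriber might look like, the sketch below formats each event as one log line. ProgressEvent is a local stand-in type (an assumption) that models only the fields documented on this page; formatEvent and handleProgress are hypothetical names, and the exact way the handler is passed to runWorkflow is not shown here.

```typescript
// Local stand-in for SmithersEvent (assumption): the common fields from this
// page plus the optional node-level fields.
type ProgressEvent = {
  type: string;
  runId: string;
  timestampMs: number;
  nodeId?: string;
  iteration?: number;
  attempt?: number;
};

// Render one event as a single log line.
function formatEvent(event: ProgressEvent): string {
  const time = new Date(event.timestampMs).toISOString();
  const node = event.nodeId !== undefined ? ` node=${event.nodeId}` : "";
  const attempt = event.attempt !== undefined ? ` attempt=${event.attempt}` : "";
  return `${time} [${event.runId}] ${event.type}${node}${attempt}`;
}

// Hypothetical handler: a function like this would be supplied as onProgress.
const seen: string[] = [];
function handleProgress(event: ProgressEvent): void {
  seen.push(formatEvent(event));
}

// Simulated events, standing in for what a run would deliver.
handleProgress({ type: "RunStarted", runId: "run-1", timestampMs: 0 });
handleProgress({
  type: "NodeStarted",
  runId: "run-1",
  timestampMs: 1000,
  nodeId: "build",
  iteration: 0,
  attempt: 1,
});
```

Because onProgress fires synchronously before persistence, keeping the handler lightweight is a reasonable precaution.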
NDJSON Log Files
Events are appended as JSON lines to stream.ndjson in the run's log directory. Configure logging with logDir in runWorkflow, or with --log-dir / --no-log in the CLI.
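Reading the log back amounts to parsing one JSON object per line. The following is a minimal sketch that works on text already loaded from the log file; LoggedEvent, parseEventLog, and isLoggedEvent are hypothetical names, and the type models only the common fields.

```typescript
// Minimal shape assumed for a persisted event (common fields only).
type LoggedEvent = { type: string; runId: string; timestampMs: number };

// Runtime check that a parsed value carries the common fields.
function isLoggedEvent(value: unknown): value is LoggedEvent {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v["type"] === "string" &&
    typeof v["runId"] === "string" &&
    typeof v["timestampMs"] === "number"
  );
}

// Parse NDJSON text. Blank lines, unparseable lines, and objects without the
// common fields are skipped instead of aborting the whole read.
function parseEventLog(text: string): LoggedEvent[] {
  const events: LoggedEvent[] = [];
  for (const line of text.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    try {
      const value: unknown = JSON.parse(trimmed);
      if (isLoggedEvent(value)) events.push(value);
    } catch {
      // For example, a final line that was only partially written.
    }
  }
  return events;
}

const sampleLog = [
  '{"type":"RunStarted","runId":"run-1","timestampMs":1}',
  "",
  '{"type":"NodeStarted","runId":"run-1","timestampMs":2,"nodeId":"build"}',
  '{"type":"NodeFin',
].join("\n");
const parsedEvents = parseEventLog(sampleLog);
```

Skipping bad lines rather than throwing is a design choice for this sketch; a stricter reader could collect the rejected lines for inspection.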
Event-Driven Metrics
| Event | Metric |
|---|---|
| RunStarted | smithers.runs.total |
| NodeStarted | smithers.nodes.started |
| NodeFinished | smithers.nodes.finished |
| NodeFailed | smithers.nodes.failed |
| Approval events | Approval counters |
trackSmithersEvent from smithers-orchestrator/observability exposes this mapping for custom integrations.
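To make the mapping concrete, here is a small in-memory sketch of event-driven counting. It is not the implementation of trackSmithersEvent; countEvent and the counters registry are hypothetical, and approval counters are left out because their metric names are not listed in the table.

```typescript
// Event-to-counter mapping taken from the table above.
const EVENT_COUNTERS = new Map<string, string>([
  ["RunStarted", "smithers.runs.total"],
  ["NodeStarted", "smithers.nodes.started"],
  ["NodeFinished", "smithers.nodes.finished"],
  ["NodeFailed", "smithers.nodes.failed"],
]);

// Hypothetical in-memory registry; a real integration would forward each
// increment to its own metrics client instead.
const counters = new Map<string, number>();

// Increment the counter mapped to this event type, if there is one.
function countEvent(event: { type: string }): void {
  const metric = EVENT_COUNTERS.get(event.type);
  if (metric === undefined) return;
  counters.set(metric, (counters.get(metric) ?? 0) + 1);
}

// Simulated event stream; FrameCommitted has no mapped counter here.
for (const type of ["RunStarted", "NodeStarted", "NodeStarted", "NodeFailed", "FrameCommitted"]) {
  countEvent({ type });
}
```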
Common Fields
Every SmithersEvent includes:
| Field | Type | Description |
|---|---|---|
| type | string | Event type discriminator. |
| runId | string | Run this event belongs to. |
| timestampMs | number | Unix timestamp in milliseconds. |
Node-level events also include:
| Field | Type | Description |
|---|---|---|
| nodeId | string | Task node ID. |
| iteration | number | Loop iteration number. |
Attempt-level events also include:
| Field | Type | Description |
|---|---|---|
| attempt | number | Attempt number (starts at 1). |
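The field groups above can be modeled as layered types. The sketch below uses local type names (BaseEvent, NodeScoped, AttemptScoped are assumptions, not the library's exported types) and builds a runId/nodeId/attempt key of the kind that could be used to correlate events with structured logs.

```typescript
// Layered field groups (type names are assumptions for illustration).
interface BaseEvent {
  type: string;
  runId: string;
  timestampMs: number;
}
interface NodeScoped extends BaseEvent {
  nodeId: string;
  iteration: number;
}
interface AttemptScoped extends NodeScoped {
  attempt: number;
}

// Runtime guards: check for the extra fields each layer adds.
function isNodeScoped(e: BaseEvent): e is NodeScoped {
  const r = e as Partial<NodeScoped>;
  return typeof r.nodeId === "string" && typeof r.iteration === "number";
}
function isAttemptScoped(e: BaseEvent): e is AttemptScoped {
  return isNodeScoped(e) && typeof (e as Partial<AttemptScoped>).attempt === "number";
}

// Build the most specific correlation key the event supports.
function correlationKey(e: BaseEvent): string {
  if (isAttemptScoped(e)) return `${e.runId}/${e.nodeId}/${e.attempt}`;
  if (isNodeScoped(e)) return `${e.runId}/${e.nodeId}`;
  return e.runId;
}

const runStarted: BaseEvent = { type: "RunStarted", runId: "run-1", timestampMs: 1 };
const nodePending: NodeScoped = {
  type: "NodePending", runId: "run-1", timestampMs: 2, nodeId: "build", iteration: 0,
};
const nodeStarted: AttemptScoped = {
  type: "NodeStarted", runId: "run-1", timestampMs: 3, nodeId: "build", iteration: 0, attempt: 1,
};
```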
Event Types
Supervisor
SupervisorStarted
Emitted when the supervisor process starts polling for stale runs.
- pollIntervalMs: How often the supervisor checks for stale runs.
- staleThresholdMs: Age after which a run is considered stale.
SupervisorPollCompleted
Emitted after each supervisor poll cycle.
- staleCount: Runs found to be stale.
- resumedCount: Runs successfully auto-resumed.
- skippedCount: Stale runs skipped (e.g. process still alive).
- durationMs: Wall time for this poll cycle.
Run Lifecycle
RunStarted
Emitted once at the beginning of every run (including resumes).
RunStatusChanged
RunStatus: "running" | "waiting-approval" | "waiting-event" | "waiting-timer" | "finished" | "continued" | "failed" | "cancelled".
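A consumer that reacts to status changes can lean on the union for exhaustiveness checking. In this sketch the union is copied from the definition above; isWaiting and describeStatus are hypothetical helpers, and the descriptions are paraphrases inferred from the status names rather than official wording.

```typescript
// Status union copied from the RunStatus definition above.
type RunStatus =
  | "running"
  | "waiting-approval"
  | "waiting-event"
  | "waiting-timer"
  | "finished"
  | "continued"
  | "failed"
  | "cancelled";

// True for the three "waiting-*" statuses.
function isWaiting(status: RunStatus): boolean {
  return status.startsWith("waiting-");
}

// Exhaustive switch: assigning to `never` in the default branch makes the
// compiler report any status that is added to the union but not handled.
function describeStatus(status: RunStatus): string {
  switch (status) {
    case "running":
      return "in progress";
    case "waiting-approval":
      return "waiting for an approval";
    case "waiting-event":
      return "waiting for an event";
    case "waiting-timer":
      return "waiting for a timer";
    case "finished":
      return "finished";
    case "continued":
      return "continued as a new run";
    case "failed":
      return "failed";
    case "cancelled":
      return "cancelled";
    default: {
      const unreachable: never = status;
      return unreachable;
    }
  }
}
```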
RunFinished
RunFailed
RunCancelled
RunAutoResumed
Emitted by the supervisor when a stale run is automatically restarted.
- lastHeartbeatAtMs: Unix ms of the last recorded heartbeat, or null if no heartbeat was recorded.
- staleDurationMs: How long the run had been stale before resumption.
RunAutoResumeSkipped
Emitted when the supervisor decided not to resume a stale run.
- reason: "pid-alive" (the original process is still running), "missing-workflow" (the workflow file could not be located), or "rate-limited" (resumption was throttled).
RunContinuedAsNew
Emitted when a long-running workflow continues as a fresh run, carrying forward state.
- newRunId: The run ID of the continuation.
- carriedStateSize: Byte size of the state passed to the new run.
- ancestryDepth: How many continuation hops have occurred (omitted on first continuation).
RunForked
Emitted when a run is forked from a parent run’s snapshot for time-travel or branching.
- parentRunId: The run this fork originated from.
- parentFrameNo: Frame number in the parent run where the fork was taken.
- branchLabel: Optional human-readable label for the branch.
ReplayStarted
Emitted when a run begins replaying from a parent run’s snapshot.
- parentRunId: The run being replayed from.
- parentFrameNo: Snapshot frame to replay from.
- restoreVcs: Whether VCS state was restored as part of the replay.
Frame Events
FrameCommitted
Emitted each time the engine renders a new frame.
- xmlHash: SHA-256 hex digest of the canonicalized XML tree.
Snapshot
SnapshotCaptured
Emitted when the engine captures a point-in-time snapshot of the workflow frame, enabling time-travel and forking.
- frameNo: The frame this snapshot was taken at.
- contentHash: Hash of the snapshot content, used to detect duplicate snapshots.
Node Lifecycle
NodePending
Task identified, waiting to be scheduled.
NodeStarted
NodeFinished
NodeFailed
NodeCancelled
reason may be "unmounted" if the task disappeared from the tree after re-render.
NodeSkipped
NodeRetrying
Fires before the next attempt starts. attempt is the upcoming attempt number.
NodeWaitingApproval
NodeWaitingTimer
Emitted when a node is suspended waiting for a timer to fire.
- firesAtMs: Unix ms when the timer is scheduled to fire.
Approval Events
ApprovalRequested
ApprovalGranted
ApprovalAutoApproved
Emitted when an approval is granted automatically by a configured policy without human intervention.
ApprovalDenied
Tool Events
ToolCallStarted
seq: sequential counter for tool calls within the attempt.
ToolCallFinished
Output Events
NodeOutput
Streaming text from an agent.
Timer Events
TimerCreated
Emitted when a durable timer is registered with the engine.
- timerId: Stable identifier for this timer.
- firesAtMs: Unix ms when the timer will fire.
- timerType: "duration" (created from a relative delay) or "absolute" (created from a specific wall-clock time).
TimerFired
Emitted when a timer fires and resumes its waiting node.
- firesAtMs: Scheduled fire time.
- firedAtMs: Actual fire time.
- delayMs: Difference between actual and scheduled fire time; non-zero indicates scheduler lag.
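The relationship between the three TimerFired timestamps can be sketched as below, assuming the difference is taken as firedAtMs minus firesAtMs. TimerFiredFields, computeDelayMs, and laggingTimers are hypothetical names, and the lag threshold is an arbitrary example value.

```typescript
// Local model of the TimerFired extra fields (assumption).
type TimerFiredFields = {
  timerId: string;
  firesAtMs: number;
  firedAtMs: number;
  delayMs: number;
};

// Assumed definition of delayMs: actual fire time minus scheduled fire time.
function computeDelayMs(firesAtMs: number, firedAtMs: number): number {
  return firedAtMs - firesAtMs;
}

// Return the IDs of timers whose lag exceeded the given threshold.
function laggingTimers(events: TimerFiredFields[], thresholdMs: number): string[] {
  return events.filter((e) => e.delayMs > thresholdMs).map((e) => e.timerId);
}

const fired: TimerFiredFields[] = [
  { timerId: "t1", firesAtMs: 1000, firedAtMs: 1005, delayMs: computeDelayMs(1000, 1005) },
  { timerId: "t2", firesAtMs: 2000, firedAtMs: 2250, delayMs: computeDelayMs(2000, 2250) },
];
const slow = laggingTimers(fired, 100);
```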
TimerCancelled
Emitted when a timer is cancelled before it fires.
Task Heartbeat Events
TaskHeartbeat
Emitted periodically by long-running tasks to signal they are still alive.
- hasData: Whether the heartbeat carries a checkpoint payload.
- dataSizeBytes: Byte size of any checkpoint data.
- intervalMs: Configured heartbeat interval, if set.
TaskHeartbeatTimeout
Emitted when a task fails to send a heartbeat within its configured timeout window.
- lastHeartbeatAtMs: Unix ms of the last heartbeat received before timeout.
- timeoutMs: The configured timeout duration.
Sandbox Events
SandboxCreated
Emitted when a sandboxed execution environment is provisioned.
- sandboxId: Unique identifier for this sandbox instance.
- runtime: The isolation backend used.
- configJson: JSON-serialized sandbox configuration.
SandboxShipped
Emitted when the initial code bundle has been uploaded to the sandbox.
- bundleSizeBytes: Size of the uploaded bundle in bytes.
SandboxHeartbeat
Emitted periodically while a sandbox is executing to indicate liveness.
- remoteRunId: Run ID assigned by the remote sandbox environment, if available.
- progress: Optional 0–1 progress fraction reported by the sandbox.
SandboxBundleReceived
Emitted when the sandbox returns an output bundle to the orchestrator.
- bundleSizeBytes: Size of the received bundle.
- patchCount: Number of file patches included in the bundle.
- hasOutputs: Whether structured task outputs were included.
SandboxCompleted
Emitted when a sandbox execution finishes (regardless of outcome).
- status: Final execution status.
- durationMs: Total sandbox execution time.
SandboxFailed
Emitted when a sandbox encounters an unrecoverable error.
SandboxDiffReviewRequested
Emitted when a sandbox produces patches that require human review before being applied.
- patchCount: Number of patches awaiting review.
- totalDiffLines: Total lines across all diffs.
SandboxDiffAccepted
Emitted when a human reviewer accepts the sandbox’s proposed patches.
SandboxDiffRejected
Emitted when a human reviewer rejects the sandbox’s proposed patches.
- reason: Optional explanation for the rejection.
Revert Events
RevertStarted
RevertFinished
Retry / Time-Travel Events
RetryTaskStarted
Emitted when a manual or programmatic retry is initiated for a specific task node.
- resetDependents: Whether nodes that depend on this task are also being reset.
- resetNodes: Full list of node IDs being cleared as part of this retry.
RetryTaskFinished
Emitted when the retry operation completes.
- resetNodes: Node IDs that were actually reset.
- error: Set if the retry operation itself failed (not the retried task).
TimeTravelStarted
Emitted when a time-travel operation begins, rewinding the run to a prior state.
- jjPointer: VCS change identifier to restore to, if VCS state is being rewound.
TimeTravelFinished
Emitted when the time-travel operation completes.
- vcsRestored: Whether VCS state was successfully rewound.
- resetNodes: Node IDs that were cleared as part of the rewind.
- error: Set if time-travel failed.
Voice Events
VoiceStarted
Emitted when a voice operation begins.
- operation: "speak" for text-to-speech; "listen" for speech-to-text.
- provider: The voice provider in use (e.g. "openai", "elevenlabs").
VoiceFinished
Emitted when a voice operation completes successfully.
- durationMs: Wall time for the voice operation.
VoiceError
Emitted when a voice operation fails.
RAG Events
RagIngested
Emitted after documents are chunked and embedded into a vector store namespace.
- documentCount: Number of source documents ingested.
- chunkCount: Number of chunks stored after splitting.
- namespace: The vector store namespace written to.
RagRetrieved
Emitted after a semantic search query completes.
- query: The query string submitted.
- resultCount: Number of chunks returned.
- topScore: Similarity score of the highest-ranked result.
Memory Events
MemoryFactSet
Emitted when a key-value fact is written to the memory store.
- namespace: Memory namespace the fact belongs to.
- key: Key under which the fact was stored.
MemoryRecalled
Emitted when the memory store is queried for relevant facts.
- query: The recall query.
- resultCount: Number of facts returned.
MemoryMessageSaved
Emitted when a conversation message is persisted to memory.
- threadId: Identifier of the conversation thread.
- role: Message role (e.g. "user", "assistant").
OpenAPI Events
OpenApiToolCalled
Emitted when a generated OpenAPI tool executes an HTTP operation.
- operationId: The OpenAPI operationId of the called operation.
- method: HTTP method (e.g. "GET", "POST").
- path: URL path template.
- durationMs: Round-trip duration.
- status: Whether the HTTP call succeeded or errored.
Hot Reload
WorkflowReloadDetected
WorkflowReloaded
generation: monotonically increasing reload counter.
WorkflowReloadFailed
WorkflowReloadUnsafe
Scorer Events
ScorerStarted
Emitted when a scorer begins evaluating a task’s output.
- scorerId: Unique identifier of the scorer.
- scorerName: Human-readable scorer name.
ScorerFinished
Emitted when a scorer completes successfully.
- score: The 0–1 normalized score produced by the scorer.
ScorerFailed
Emitted when a scorer throws an error during evaluation.
- error: The error thrown by the scorer.
Scorer failures never fail the parent task; they are logged and the workflow continues.
See Evals & Scorers for the full scoring system documentation.
Quick Reference
| Event Type | Section | Extra Fields |
|---|---|---|
| SupervisorStarted | Supervisor | pollIntervalMs, staleThresholdMs |
| SupervisorPollCompleted | Supervisor | staleCount, resumedCount, skippedCount, durationMs |
| RunStarted | Run Lifecycle | (none) |
| RunStatusChanged | Run Lifecycle | status |
| RunFinished | Run Lifecycle | (none) |
| RunFailed | Run Lifecycle | error |
| RunCancelled | Run Lifecycle | (none) |
| RunAutoResumed | Run Lifecycle | lastHeartbeatAtMs, staleDurationMs |
| RunAutoResumeSkipped | Run Lifecycle | reason |
| RunContinuedAsNew | Run Lifecycle | newRunId, iteration, carriedStateSize, ancestryDepth? |
| RunForked | Run Lifecycle | parentRunId, parentFrameNo, branchLabel? |
| ReplayStarted | Run Lifecycle | parentRunId, parentFrameNo, restoreVcs |
| FrameCommitted | Frame Events | frameNo, xmlHash |
| SnapshotCaptured | Snapshot | frameNo, contentHash |
| NodePending | Node Lifecycle | nodeId, iteration |
| NodeStarted | Node Lifecycle | nodeId, iteration, attempt |
| NodeFinished | Node Lifecycle | nodeId, iteration, attempt |
| NodeFailed | Node Lifecycle | nodeId, iteration, attempt, error |
| NodeCancelled | Node Lifecycle | nodeId, iteration, attempt?, reason? |
| NodeSkipped | Node Lifecycle | nodeId, iteration |
| NodeRetrying | Node Lifecycle | nodeId, iteration, attempt |
| NodeWaitingApproval | Node Lifecycle | nodeId, iteration |
| NodeWaitingTimer | Node Lifecycle | nodeId, iteration, firesAtMs |
| ApprovalRequested | Approval | nodeId, iteration |
| ApprovalGranted | Approval | nodeId, iteration |
| ApprovalAutoApproved | Approval | nodeId, iteration |
| ApprovalDenied | Approval | nodeId, iteration |
| ToolCallStarted | Tool | nodeId, iteration, attempt, toolName, seq |
| ToolCallFinished | Tool | nodeId, iteration, attempt, toolName, seq, status |
| NodeOutput | Output | nodeId, iteration, attempt, text, stream |
| TimerCreated | Timer | timerId, firesAtMs, timerType |
| TimerFired | Timer | timerId, firesAtMs, firedAtMs, delayMs |
| TimerCancelled | Timer | timerId |
| TaskHeartbeat | Task Heartbeat | nodeId, iteration, attempt, hasData, dataSizeBytes, intervalMs? |
| TaskHeartbeatTimeout | Task Heartbeat | nodeId, iteration, attempt, lastHeartbeatAtMs, timeoutMs |
| SandboxCreated | Sandbox | sandboxId, runtime, configJson |
| SandboxShipped | Sandbox | sandboxId, runtime, bundleSizeBytes |
| SandboxHeartbeat | Sandbox | sandboxId, remoteRunId?, progress? |
| SandboxBundleReceived | Sandbox | sandboxId, bundleSizeBytes, patchCount, hasOutputs |
| SandboxCompleted | Sandbox | sandboxId, remoteRunId?, runtime, status, durationMs |
| SandboxFailed | Sandbox | sandboxId, runtime, error |
| SandboxDiffReviewRequested | Sandbox | sandboxId, patchCount, totalDiffLines |
| SandboxDiffAccepted | Sandbox | sandboxId, patchCount |
| SandboxDiffRejected | Sandbox | sandboxId, reason? |
| RevertStarted | Revert | nodeId, iteration, attempt, jjPointer |
| RevertFinished | Revert | nodeId, iteration, attempt, jjPointer, success, error? |
| RetryTaskStarted | Retry / Time-Travel | nodeId, iteration, resetDependents, resetNodes |
| RetryTaskFinished | Retry / Time-Travel | nodeId, iteration, resetNodes, success, error? |
| TimeTravelStarted | Retry / Time-Travel | nodeId, iteration, attempt, jjPointer? |
| TimeTravelFinished | Retry / Time-Travel | nodeId, iteration, attempt, jjPointer?, success, vcsRestored, resetNodes, error? |
| VoiceStarted | Voice | nodeId, iteration, operation, provider |
| VoiceFinished | Voice | nodeId, iteration, operation, provider, durationMs |
| VoiceError | Voice | nodeId, iteration, operation, provider, error |
| RagIngested | RAG | documentCount, chunkCount, namespace |
| RagRetrieved | RAG | query, resultCount, namespace, topScore |
| MemoryFactSet | Memory | namespace, key |
| MemoryRecalled | Memory | namespace, query, resultCount |
| MemoryMessageSaved | Memory | threadId, role |
| OpenApiToolCalled | OpenAPI | operationId, method, path, durationMs, status |
| AgentEvent | Output | nodeId, iteration, attempt, engine, event |
| WorkflowReloadDetected | Hot Reload | changedFiles |
| WorkflowReloaded | Hot Reload | generation, changedFiles |
| WorkflowReloadFailed | Hot Reload | error, changedFiles |
| WorkflowReloadUnsafe | Hot Reload | reason, changedFiles |
| RunHijackRequested | Run Lifecycle | target? |
| RunHijacked | Run Lifecycle | nodeId, iteration, attempt, engine, mode, resume?, cwd |
| ScorerStarted | Scorer | nodeId, scorerId, scorerName |
| ScorerFinished | Scorer | nodeId, scorerId, scorerName, score |
| ScorerFailed | Scorer | nodeId, scorerId, scorerName, error |
| TokenUsageReported | Output | nodeId, iteration, attempt, model, agent, inputTokens, outputTokens, cacheReadTokens?, cacheWriteTokens?, reasoningTokens? |
| AlertFired | Alert Lifecycle | alertId, policyName, severity, fingerprint, nodeId?, iteration?, message |
| AlertAcknowledged | Alert Lifecycle | alertId, acknowledgedBy? |
| AlertSilenced | Alert Lifecycle | alertId, silencedUntilMs? |
| AlertResolved | Alert Lifecycle | alertId, resolvedBy? |
| AlertReopened | Alert Lifecycle | alertId, fingerprint, occurrenceCount |
| AlertEscalated | Alert Lifecycle | alertId, fromSeverity, toSeverity |
| BudgetExceeded | Normalized Input | nodeId, iteration, budgetType, limit, actual |
| ProviderDisconnected | Normalized Input | nodeId, iteration, attempt, engine |
| ProviderReconnected | Normalized Input | nodeId, iteration, attempt, engine |
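As one example of using the extra fields in this table, the sketch below sums token usage per model from a mixed list of events. It relies only on the TokenUsageReported fields named in the table; MaybeUsage, Totals, and tokenTotalsByModel are hypothetical names, and the model identifiers are placeholders.

```typescript
// Loose event shape: any event, with the usage fields present only on
// TokenUsageReported (local type, assumption).
type MaybeUsage = {
  type: string;
  model?: string;
  inputTokens?: number;
  outputTokens?: number;
};
type Totals = { inputTokens: number; outputTokens: number };

// Sum input and output tokens per model, ignoring all other event types.
function tokenTotalsByModel(events: MaybeUsage[]): Map<string, Totals> {
  const totals = new Map<string, Totals>();
  for (const e of events) {
    if (e.type !== "TokenUsageReported" || e.model === undefined) continue;
    const current = totals.get(e.model) ?? { inputTokens: 0, outputTokens: 0 };
    totals.set(e.model, {
      inputTokens: current.inputTokens + (e.inputTokens ?? 0),
      outputTokens: current.outputTokens + (e.outputTokens ?? 0),
    });
  }
  return totals;
}

const usageTotals = tokenTotalsByModel([
  { type: "NodeStarted" },
  { type: "TokenUsageReported", model: "model-a", inputTokens: 10, outputTokens: 4 },
  { type: "TokenUsageReported", model: "model-b", inputTokens: 7, outputTokens: 1 },
  { type: "TokenUsageReported", model: "model-a", inputTokens: 20, outputTokens: 6 },
]);
```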
Persistence
Events are persisted in two places:
- SQLite: _smithers_events table with a sequential seq number. Source of truth.
- NDJSON: stream.ndjson in the run’s log directory. Best-effort.
onProgress fires synchronously before persistence.
Related
- runWorkflow: Where onProgress is configured.
- Monitoring and Logs: Practical monitoring guide.
- CLI: View run status and frames.