Database
- Tracked SQLite schema migrations. Internal schema setup now records applied migrations in `_smithers_schema_migrations` and runs the DB migration path idempotently on startup. The legacy startup pseudo-migration code was moved into the DB package, including the rebuild that restores missing run foreign keys on split schema tables.
- `SqlMessageStorage` was split into a lowercase module. The old `SqlMessageStorage.js` entry is now a compatibility shim over `sql-message-storage.js`, keeping imports stable while making the implementation easier to maintain.
- Node diff cache upserts are safer. Cache writes now use a stricter upsert path and have regression coverage for repeated writes and schema setup.
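The tracked, idempotent migration behavior described above can be sketched roughly as follows. This is a minimal in-memory illustration, not the Smithers DB package: `FakeDb`, `Migration`, `runMigrations`, and the migration ids are all hypothetical names, and the `applied` set merely stands in for rows in `_smithers_schema_migrations`.

```typescript
// Hypothetical shapes for illustration only; not the Smithers DB package API.
interface FakeDb {
  applied: Set<string>; // stands in for rows in _smithers_schema_migrations
  statements: string[]; // stands in for DDL executed against SQLite
}

interface Migration {
  id: string;
  up: (db: FakeDb) => void;
}

// Apply every migration that has not been recorded yet, in order.
// Running this again with the same ledger is a no-op.
function runMigrations(db: FakeDb, migrations: Migration[]): string[] {
  const ran: string[] = [];
  for (const migration of migrations) {
    if (db.applied.has(migration.id)) continue; // already recorded: skip
    migration.up(db);
    db.applied.add(migration.id); // record only after the step succeeded
    ran.push(migration.id);
  }
  return ran;
}

const db: FakeDb = { applied: new Set<string>(), statements: [] };

// Made-up migration ids and DDL, purely for the demonstration.
const migrations: Migration[] = [
  { id: "0001_create_runs", up: (d) => d.statements.push("CREATE TABLE runs (id TEXT PRIMARY KEY)") },
  { id: "0002_add_status", up: (d) => d.statements.push("ALTER TABLE runs ADD COLUMN status TEXT") },
];

const firstStart = runMigrations(db, migrations); // applies both migrations
const secondStart = runMigrations(db, migrations); // applies nothing
console.log(firstStart.length, secondStart.length);
```

Against a real SQLite database, each step and its ledger insert would normally share one transaction, so a crash cannot leave a migration applied but unrecorded.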
Runtime And Engine
- Durability and operator flows were hardened. Runtime recovery, workflow metadata generation, and operator-console paths received regression fixes before landing this release.
- Workflow metadata and skill generation are now first-class. The CLI can discover richer workflow metadata and generated workflow skill files from the seeded workflow pack.
- Cache policy logic was extracted and tested. Engine cache scoping, TTL behavior, and schema validation now live behind a focused cache policy module with unit and integration coverage.
- Hot watch is more robust. The hot reload watcher now handles rapid file changes and rebuild boundaries more defensively.
- `jumpToFrame` is hardened. Time-travel frame jumps now preserve the expected audit and state invariants more reliably.
CLI
- Antigravity CLI support landed for Google agent workflows. Smithers now includes `AntigravityAgent`, CLI detection, init templates, hijack support, account-provider environment wiring, trace normalization, and docs for the `agy` CLI. `GeminiAgent` and `GeminiAgentOptions` remain available for legacy and enterprise Gemini CLI setups, but are now marked deprecated in favor of Antigravity for new Google CLI integrations.
- JSON arguments are preflighted before workflow modules load. Malformed `--input` and `--annotations` values now fail with Smithers errors instead of surfacing raw runtime stack traces.
- `--input -` and `--annotations -` read JSON from stdin. Stdin JSON is capped at 1 MiB, parsed before detached child processes are spawned, and documented in the CLI reference.
- Raw JSON stdout is preserved. JSON-format command output avoids accidental human formatting so automation can parse it reliably.
- Argument parsing helpers were split out. Shared argv and JSON parsing utilities reduce duplicated command handling and make flag behavior easier to test.
- Architecture budgets are enforced. `scripts/check-architecture-budget.mjs` now guards major CLI, engine, and Gateway files from growing past agreed line-count budgets.
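The JSON preflight and the 1 MiB stdin cap from the CLI notes above might look something like this sketch. `CliInputError` and `preflightJson` are hypothetical names rather than the Smithers error types or helpers, and the error wording is made up; only the flag names and the 1 MiB limit come from the notes.

```typescript
// Hypothetical error type; the real Smithers error classes are not shown in these notes.
class CliInputError extends Error {
  flag: string;
  constructor(flag: string, message: string) {
    super(`${flag}: ${message}`);
    this.name = "CliInputError";
    this.flag = flag;
  }
}

const MAX_STDIN_BYTES = 1024 * 1024; // 1 MiB, the cap stated in the release notes

// Validate a JSON flag value before any workflow module is loaded.
// `source` says whether the text came from argv or from stdin (`--input -`).
function preflightJson(flag: string, raw: string, source: "argv" | "stdin"): unknown {
  if (source === "stdin") {
    const bytes = new TextEncoder().encode(raw).byteLength; // UTF-8 size
    if (bytes > MAX_STDIN_BYTES) {
      throw new CliInputError(flag, `stdin JSON is ${bytes} bytes; the limit is ${MAX_STDIN_BYTES}`);
    }
  }
  try {
    return JSON.parse(raw);
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    throw new CliInputError(flag, `invalid JSON (${reason})`);
  }
}

// Small helper for the demonstration: returns the failing flag, or "ok".
function failingFlag(run: () => unknown): string {
  try {
    run();
    return "ok";
  } catch (err) {
    return err instanceof CliInputError ? err.flag : "unexpected";
  }
}

const parsed = preflightJson("--input", '{"ticket": 42}', "argv");
const malformed = failingFlag(() => preflightJson("--annotations", "{not json", "argv"));
const oversized = failingFlag(() => preflightJson("--input", `"${"x".repeat(MAX_STDIN_BYTES)}"`, "stdin"));
console.log(parsed, malformed, oversized);
```

Running a check of this kind before spawning a detached child means the caller sees one flag-specific error in the foreground instead of a stack trace from a background process.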
Gateway And Control Plane
- New `@smithers-orchestrator/control-plane` package. Hosted deployments now have a tested SQLite store for organizations, teams, projects, billing records, identity providers, usage events and limits, secret manager references, and audit export.
- Facade export for hosted control-plane APIs. Consumers can import `ControlPlaneStore` through `smithers-orchestrator/control-plane` or the scoped package.
- Default Gateway console. Gateway can now mount a built-in operator UI for workflow inventory, active runs, approvals, and common run actions. The UI was extracted into focused auth, bundle, and default-console modules so custom Gateway apps have a cleaner integration point.
- Production hardening docs. Deployment docs now cover durable storage, Gateway tokens, sandbox boundaries, cache policy, audit trail retention, and release checks.
Sandbox
- `<Sandbox>` now supports injectable providers. Workflow authors can pass a provider object or registered provider id instead of hardcoding a runtime such as Docker. Provider-backed sandboxes run remotely, return a validated result bundle, and record the same sandbox lifecycle events as built-in transports.
- Sandbox result bundles can carry `diffBundle`s. Providers may return a structured result with output, remote ids, artifacts, logs, and a `diffBundle`; Smithers materializes the bundle, review-gates changes, and applies accepted diffs through the engine diff-bundle path.
- Runtime selection now fails closed. Unknown runtimes are rejected, and Docker no longer silently falls back to bubblewrap when Docker is unavailable. The legacy local transport path still defaults to bubblewrap only when no provider and no runtime are supplied.
- Nested sandboxes are explicit. Sandbox execution tracks parent sandbox context and rejects nested sandboxes unless the nested component opts in with `allowNested`, making diff-base, cleanup, quota, and secret-boundary risks visible at the API boundary.
- Freestyle is documented as a third-party sandbox provider. The new `examples/freestyle/` adapter shows how a provider can create a Freestyle VM, write request files with the VM file API, run `vm.exec()`, read a result JSON file, and return a Smithers sandbox bundle. The sandbox docs now use this as the provider-extension example.
- Process runner transport. Sandbox execution can now use a process-backed runner with request/result bundle boundaries and persisted sandbox metadata.
- Bundle safety was tightened. Bundle manifests, produced diffs, artifact paths, and cleanup behavior now have stronger path containment, size-boundary, and review-decision coverage.
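The fail-closed runtime selection and the `allowNested` opt-in described above can be illustrated with a small decision function. This is a sketch under stated assumptions, not the Smithers implementation: `SandboxRequest`, `HostState`, `resolveSandbox`, the runtime id strings `"docker"` and `"bubblewrap"`, and the provider id in the demonstration are all hypothetical.

```typescript
// Hypothetical request and host shapes; only `allowNested` is named in the notes.
interface SandboxRequest {
  provider?: string; // registered provider id, if any
  runtime?: string; // explicit local runtime, if any
  allowNested?: boolean;
}

interface HostState {
  dockerAvailable: boolean;
  insideSandbox: boolean; // true when a parent sandbox context exists
}

type Resolution =
  | { kind: "provider"; id: string }
  | { kind: "local"; runtime: "docker" | "bubblewrap" };

function resolveSandbox(req: SandboxRequest, host: HostState): Resolution {
  // Nested sandboxes are rejected unless the nested component opts in.
  if (host.insideSandbox && req.allowNested !== true) {
    throw new Error("nested sandbox rejected: set allowNested to opt in");
  }
  if (req.provider !== undefined) return { kind: "provider", id: req.provider };

  const runtime = req.runtime;
  // Legacy default applies only when neither provider nor runtime is supplied.
  if (runtime === undefined) return { kind: "local", runtime: "bubblewrap" };
  if (runtime === "bubblewrap") return { kind: "local", runtime: "bubblewrap" };
  if (runtime === "docker") {
    // Fail closed: no silent fallback to bubblewrap.
    if (!host.dockerAvailable) throw new Error("docker runtime requested but Docker is unavailable");
    return { kind: "local", runtime: "docker" };
  }
  throw new Error(`unknown sandbox runtime: ${runtime}`);
}

// Demonstration helper: describe the outcome as a short string.
function outcome(req: SandboxRequest, host: HostState): string {
  try {
    const r = resolveSandbox(req, host);
    return r.kind === "provider" ? `provider:${r.id}` : `local:${r.runtime}`;
  } catch {
    return "rejected";
  }
}

const plainHost: HostState = { dockerAvailable: false, insideSandbox: false };
const nestedHost: HostState = { dockerAvailable: false, insideSandbox: true };

console.log(outcome({}, plainHost)); // legacy default path
console.log(outcome({ runtime: "docker" }, plainHost)); // Docker missing
console.log(outcome({ provider: "example-provider", allowNested: true }, nestedHost)); // made-up provider id
```

Throwing on every unrecognized or unavailable option means a misconfigured workflow stops with an error instead of running under weaker isolation than its author asked for.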
Eval Suite And DevTools
- Workflow eval suites landed in the CLI. `bunx smithers-orchestrator eval` can run workflow cases, write reports, detect duplicate run IDs, dry-run plans, and evaluate exact or partial output assertions.
- Eval assertions are more flexible. `outputContains` now matches array entries outside prefix position, and docs cover the `continued` status.
- DevTools tree utilities gained nested ordering coverage. Tests now assert depth-first task collection through nested containers.
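To make the partial-assertion behavior above concrete, here is one way a containment check could treat arrays. The exact semantics of the eval suite are not spelled out in these notes, so this sketch assumes that every expected array entry must match some actual entry at any position, and that expected objects match as subsets. The function below is a hypothetical illustration that borrows the assertion's name; it is not the Smithers implementation.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Returns true when `expected` is contained in `actual`:
// - primitives must be strictly equal
// - objects match when every expected key matches (extra actual keys are fine)
// - arrays match when every expected entry matches SOME actual entry,
//   regardless of position (assumed semantics, see the lead-in)
function outputContains(actual: Json, expected: Json): boolean {
  if (Array.isArray(expected)) {
    if (!Array.isArray(actual)) return false;
    const actualItems: Json[] = actual;
    return expected.every((want) => actualItems.some((got) => outputContains(got, want)));
  }
  if (expected !== null && typeof expected === "object") {
    if (actual === null || typeof actual !== "object" || Array.isArray(actual)) return false;
    const actualObject: { [key: string]: Json } = actual;
    return Object.entries(expected).every(([key, want]) => {
      const got = actualObject[key];
      return got !== undefined && outputContains(got, want);
    });
  }
  return actual === expected;
}

// Made-up workflow output for the demonstration.
const output: Json = {
  status: "finished",
  steps: ["plan", "build", "test"],
  artifacts: [{ name: "report.md", bytes: 120 }],
};

const matchesLaterEntry = outputContains(output, { steps: ["test"] });
const matchesNestedSubset = outputContains(output, { artifacts: [{ name: "report.md" }] });
const missingEntry = outputContains(output, { steps: ["deploy"] });
console.log(matchesLaterEntry, matchesNestedSubset, missingEntry);
```

A prefix-only comparison would reject `{ steps: ["test"] }` here because `"test"` is the third entry; matching against any position is what lets a case assert on one step without listing the ones before it.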
Demo, Docs, And CI
- Keyboard-driven demo deck. `.smithers/scripts/run-demo.sh` now launches a 35-slide terminal deck with keyboard navigation, replay, mute, auto mode, and a live durability/time-travel sequence.
- Dynamic demo workflow. A lightweight dynamic workflow was added for smoke-testing task graph behavior without running the full deck.
- Demo output is cleaner for recording. The live deck no longer inherits `NO_COLOR` into forced-color child commands, avoiding Bun color warnings during the durability slide.
- Workflow catalog and docs were refreshed. The seeded workflow catalog, MCP/server docs, caching docs, eval quickstart, and quickstart copy were updated alongside the new behavior.
- CI now runs the test gate on pull requests. The GitHub workflow includes the repository test job, and agent timeout tests were hardened so idle timeout coverage is less timing-sensitive.