This page covers storage durability, access scoping, execution isolation, secrets handling, cache policy, and audit for production Gateway deployments. Treat the Gateway, database, workflow modules, and sandbox workers as one operational boundary.

Required Checks

Run these checks before promoting a workflow module:
pnpm check:effect
pnpm check:deps
pnpm check:architecture
pnpm -r typecheck
pnpm test
The default CI workflow (.github/workflows/ci.yml) runs check:effect, check:deps, typecheck, and test on pull requests and pushes to main. check:architecture is a local pre-promotion gate and is not enforced by CI.

Persistence

Use one SQLite database file per deployment and place it on durable storage, or run against PostgreSQL for managed, multi-connection storage (see PostgreSQL and PGlite below). The internal tables are created and migrated idempotently on startup on either backend. Recommended database practices:
  • Back up the database file and its WAL files together.
  • Keep PRAGMA foreign_keys = ON; Smithers enables it during schema setup and uses referential checks for core run artifacts.
  • Keep run IDs stable across resume attempts.
  • Use the Gateway event stream sequence numbers for reconnects; clients should resume from the last seen seq.
  • Avoid manually deleting internal rows. Delete whole runs through supported administrative paths so dependent frames, node diffs, and audit rows stay consistent.
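The reconnect rule above can be sketched as a small client-side cursor. This is a minimal illustration, not the Gateway client: the `GatewayEvent` shape, the `SeqCursor` class, and the event type strings are hypothetical, and it assumes `seq` values are positive and strictly increasing within a stream.

```typescript
// Hypothetical event shape: only the `seq` field comes from the text above.
interface GatewayEvent {
  seq: number;
  type: string;
}

// Tracks the highest sequence number the client has applied so far.
class SeqCursor {
  private lastSeq: number;

  constructor(initialSeq: number = 0) {
    this.lastSeq = initialSeq;
  }

  // Returns true when the event is new and should be applied,
  // false when it is a duplicate or a replay at or below the cursor.
  accept(event: GatewayEvent): boolean {
    if (event.seq <= this.lastSeq) return false;
    this.lastSeq = event.seq;
    return true;
  }

  // The value a client would send when asking the stream to resume.
  resumeFrom(): number {
    return this.lastSeq;
  }
}

// Simulated stream: after a reconnect, seq 2 and 3 are delivered again.
const cursor = new SeqCursor();
const applied: number[] = [];
const firstConnection: GatewayEvent[] = [
  { seq: 1, type: "run.started" },
  { seq: 2, type: "node.started" },
  { seq: 3, type: "node.finished" },
];
const afterReconnect: GatewayEvent[] = [
  { seq: 2, type: "node.started" },
  { seq: 3, type: "node.finished" },
  { seq: 4, type: "approval.requested" },
  { seq: 5, type: "run.finished" },
];

for (const event of firstConnection.concat(afterReconnect)) {
  if (cursor.accept(event)) applied.push(event.seq);
}
```

Persisting the cursor value alongside the client's own state lets a restarted client resume without reapplying events it has already handled.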

PostgreSQL and PGlite

createSmithersPostgres(schemas, opts) runs the same durable engine, with the same crash-and-resume guarantees, on PostgreSQL or an embedded PGlite through the SQL dialect seam in packages/db/src/dialect.js. Point it at managed Postgres with { provider: "postgres", connectionString }, pass a node-postgres connection config with { provider: "postgres", connection }, or run an in-process PGlite with { provider: "pglite", dataDir }. The factory is async and returns the same API as createSmithers plus a close() teardown. pg, @electric-sql/pglite, and @electric-sql/pglite-socket are optional dependencies, installed only when you take this path; the default synchronous bun:sqlite path needs none of them. On Postgres, rely on database-native backups and size the connection pool for your workload; the SQLite file-and-WAL backup guidance above does not apply.
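One way to keep backend selection in a single place is a small helper that builds the options object from deployment settings. The sketch below models only the three option shapes named above; the type name `SmithersPostgresOptions`, the helper `postgresOptionsFromEnv`, and the setting names `DATABASE_URL` and `PGLITE_DATA_DIR` are assumptions for illustration, not part of Smithers.

```typescript
// Option shapes copied from the three forms described above.
// The type name itself is hypothetical.
type SmithersPostgresOptions =
  | { provider: "postgres"; connectionString: string }
  | { provider: "postgres"; connection: Record<string, unknown> }
  | { provider: "pglite"; dataDir: string };

// Hypothetical helper: pick a backend from environment-style settings.
// DATABASE_URL and PGLITE_DATA_DIR are assumed names, not Smithers settings.
function postgresOptionsFromEnv(
  env: Record<string, string | undefined>,
): SmithersPostgresOptions {
  const url = env["DATABASE_URL"];
  if (url) return { provider: "postgres", connectionString: url };

  const dataDir = env["PGLITE_DATA_DIR"];
  if (dataDir) return { provider: "pglite", dataDir: dataDir };

  throw new Error("Set DATABASE_URL or PGLITE_DATA_DIR");
}

// Intended use (left as a comment because the import path is not given here):
//   const smithers = await createSmithersPostgres(schemas, postgresOptionsFromEnv(process.env));
//   try { /* run workflows */ } finally { await smithers.close(); }
```

Preferring the managed connection string over the embedded data directory is a design choice of this sketch; reverse the order if local PGlite should win when both are set.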

Access Control

Expose the Gateway only behind TLS. Use scoped bearer grants for automation and short TTLs for human-triggered actions. Recommended scopes by client type:
| Client | Typical scopes |
| --- | --- |
| Run dashboard | run:read, approval:read |
| Launch automation | run:read, run:write |
| Approval inbox | run:read, approval:read, approval:submit |
| Operator tools | run:read, run:write, approval:submit |
Rotate token grants regularly and revoke grants when a user, CI job, or integration no longer needs access. For multi-tenant deployments, see Control Plane for org, project, usage, and audit primitives.
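To make the scope and TTL guidance concrete, here is a minimal sketch of the check a service might run before honoring a request. The scope strings come from the table above; the `Grant` shape, the `isAllowed` function, and the millisecond expiry field are hypothetical and do not describe the Gateway's actual grant format.

```typescript
type Scope = "run:read" | "run:write" | "approval:read" | "approval:submit";

// Hypothetical grant shape: a scope list plus an absolute expiry time.
interface Grant {
  scopes: Scope[];
  expiresAtMs: number;
}

// A request is allowed only if the grant is unexpired and
// carries every scope the operation requires.
function isAllowed(grant: Grant, required: Scope[], nowMs: number): boolean {
  if (nowMs >= grant.expiresAtMs) return false;
  return required.every((need) => grant.scopes.some((have) => have === need));
}

// Example grants mirroring two rows of the table above.
const issuedAtMs = 1_000_000;
const fifteenMinutesMs = 15 * 60 * 1000;

const dashboardGrant: Grant = {
  scopes: ["run:read", "approval:read"],
  expiresAtMs: issuedAtMs + fifteenMinutesMs,
};
const approvalInboxGrant: Grant = {
  scopes: ["run:read", "approval:read", "approval:submit"],
  expiresAtMs: issuedAtMs + fifteenMinutesMs,
};

const dashboardCanLaunch = isAllowed(dashboardGrant, ["run:write"], issuedAtMs);
const inboxCanSubmit = isAllowed(approvalInboxGrant, ["approval:submit"], issuedAtMs);
const inboxAfterExpiry = isAllowed(
  approvalInboxGrant,
  ["approval:submit"],
  issuedAtMs + fifteenMinutesMs,
);
```

Passing the current time as an argument keeps the check deterministic in tests and avoids hidden reads of the system clock.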

Execution Boundary

Sandbox workers are meant to run in an isolated environment so that untrusted workflow code cannot reach the Gateway database or host filesystem; the strength of that isolation depends on the runtime you configure. The concrete controls are:
  • request and result bundles are written under the run sandbox directory
  • bundle manifests are size-bounded
  • patch and artifact paths are checked against path traversal
  • produced diffs require review unless autoAcceptDiffs is enabled
  • sandbox records and events are persisted for audit
Configure allowNetwork, container images, environment variables, ports, volumes, and CPU or memory limits per worker. Verify your chosen runtime (Docker, Kubernetes, etc.) actually enforces these limits before running untrusted code. For high-risk code generation, run sandbox workers in a separate account, namespace, or machine with no ambient production credentials.
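The path traversal control listed above can be illustrated with a purely lexical check. This is a sketch of the general technique, not the check Smithers runs: `safeRelativePath` is a hypothetical name, and the function inspects path text only, so it does not detect symlinks that point outside the sandbox directory.

```typescript
// Lexically normalizes a bundle-relative path and rejects anything that
// is absolute or would climb out of the sandbox root.
// Returns the normalized relative path, or null when the path is rejected.
function safeRelativePath(candidate: string): string | null {
  // Reject empty input and embedded NUL bytes.
  if (candidate.length === 0 || candidate.indexOf("\0") !== -1) return null;

  // Reject absolute paths: leading slash or backslash, or a drive letter.
  if (/^([a-zA-Z]:|[\\/])/.test(candidate)) return null;

  const resolved: string[] = [];
  for (const segment of candidate.split(/[\\/]/)) {
    if (segment === "" || segment === ".") continue; // no-op segments
    if (segment === "..") {
      // Climbing above the root means the path escapes the sandbox.
      if (resolved.length === 0) return null;
      resolved.pop();
      continue;
    }
    resolved.push(segment);
  }

  // A path that resolves to the root itself names no file.
  return resolved.length > 0 ? resolved.join("/") : null;
}

const accepted = safeRelativePath("./patches/../artifacts/report.json");
const escaped = safeRelativePath("artifacts/../../etc/passwd");
const absolute = safeRelativePath("/etc/passwd");
```

Here `accepted` normalizes to `artifacts/report.json`, while `escaped` and `absolute` are both rejected with `null`.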

Secrets

Never pass long-lived credentials through workflow input. Prefer short-lived tokens from the caller, scoped environment injection at the worker boundary, or a secret manager mounted only into the worker process that needs it. Operational rules:
  • Do not store provider keys in SQLite rows, run input, task output, or event payloads.
  • Redact logs before forwarding them to shared observability sinks.
  • Split launch permissions from approval permissions for workflows that can write files, create pull requests, or deploy.
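As an illustration of the log redaction rule, the sketch below masks values whose keys look sensitive before a record is forwarded. The key pattern, the `redact` function, and the `[REDACTED]` marker are assumptions chosen for this example. It matches on key names only, so a credential embedded inside a free-text string value would still pass through.

```typescript
// Assumed list of key-name fragments that mark a value as sensitive.
const SENSITIVE_KEY = /(token|secret|password|api[_-]?key|authorization)/i;

// Returns a deep copy of `value` with sensitive fields masked.
// Arrays and nested objects are walked recursively; other values are kept.
function redact(value: unknown): unknown {
  if (Array.isArray(value)) {
    return value.map((item) => redact(item));
  }
  if (value !== null && typeof value === "object") {
    const source = value as Record<string, unknown>;
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(source)) {
      out[key] = SENSITIVE_KEY.test(key) ? "[REDACTED]" : redact(source[key]);
    }
    return out;
  }
  return value;
}

const logRecord = {
  runId: "run-123",
  headers: { Authorization: "Bearer abc.def", accept: "application/json" },
  steps: [{ name: "deploy", apiKey: "sk-example" }],
};

const safeRecord = redact(logRecord);
```

Because `redact` builds a copy, the original record stays intact for local debugging while only `safeRecord` is sent to shared sinks.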

Cache Policy

Use cache policy keys deliberately:
// Example: cache a step result across runs of the same workflow for up to 1 hour
cachePolicy: {
  scope: "workflow",
  ttlMs: 60 * 60 * 1000,
  version: "v2", // bump when prompt, model, or output semantics change
}
  • scope: "run" keeps reuse inside one run.
  • scope: "workflow" shares reuse across runs of the same workflow.
  • scope: "global" shares reuse across workflow names.
  • ttlMs bounds staleness; expired cache rows are treated as misses and refreshed.
  • version should change whenever prompt, model, provider, tool behavior, or output semantics change.
Cached payloads are still validated against the current output schema on every hit. A schema mismatch is a cache miss.
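The interaction of scope, ttlMs, version, and schema validation can be modeled in a few lines. This is a simplified sketch of the rules stated above, not the Smithers cache: the key layout, the `CacheContext` and `CacheRow` shapes, and the `lookup` function are hypothetical, and a real key would likely also depend on the step's inputs.

```typescript
type CacheScope = "run" | "workflow" | "global";

interface CachePolicy { scope: CacheScope; ttlMs: number; version: string; }
interface CacheContext { runId: string; workflowName: string; stepId: string; }
interface CacheRow { payload: unknown; storedAtMs: number; }
type LookupResult = { hit: true; payload: unknown } | { hit: false };

// Hypothetical key layout: the scope decides which owner the entry belongs to.
function cacheKey(policy: CachePolicy, ctx: CacheContext): string {
  const owner =
    policy.scope === "run" ? "run:" + ctx.runId
    : policy.scope === "workflow" ? "workflow:" + ctx.workflowName
    : "global";
  return [owner, ctx.stepId, policy.version].join("|");
}

// A row is reused only if it exists, is within ttlMs, and still
// passes validation against the current output schema.
function lookup(
  rows: Map<string, CacheRow>,
  policy: CachePolicy,
  ctx: CacheContext,
  nowMs: number,
  isValid: (payload: unknown) => boolean,
): LookupResult {
  const row = rows.get(cacheKey(policy, ctx));
  if (!row) return { hit: false };
  if (nowMs - row.storedAtMs > policy.ttlMs) return { hit: false }; // expired
  if (!isValid(row.payload)) return { hit: false }; // schema mismatch
  return { hit: true, payload: row.payload };
}

// Example: a result stored by run A is visible to run B under "workflow" scope.
const policy: CachePolicy = { scope: "workflow", ttlMs: 60 * 60 * 1000, version: "v2" };
const runA: CacheContext = { runId: "A", workflowName: "review", stepId: "summarize" };
const runB: CacheContext = { runId: "B", workflowName: "review", stepId: "summarize" };
const hasSummary = (p: unknown): boolean =>
  typeof p === "object" && p !== null && typeof (p as { summary?: unknown }).summary === "string";

const rows = new Map<string, CacheRow>();
rows.set(cacheKey(policy, runA), { payload: { summary: "ok" }, storedAtMs: 0 });

const sharedHit = lookup(rows, policy, runB, 1000, hasSummary);
const versionMiss = lookup(rows, { ...policy, version: "v3" }, runB, 1000, hasSummary);
const expiredMiss = lookup(rows, policy, runB, policy.ttlMs + 1, hasSummary);
```

In this model, bumping `version` changes the key, so old rows are simply never found again rather than being deleted.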

Audit Trail

For incident review, preserve:
  • Gateway access logs
  • Smithers run events
  • rows in _smithers_time_travel_audit, which record workflow rewind/replay events
  • sandbox bundle metadata and review decisions
  • approval decisions, notes, and actor IDs
  • deployment version and workflow module revision
Keep the audit log append-only from the perspective of normal operators.

Release Checklist

Before a production release:
  • CI is green on check:effect, check:deps, typecheck, and test, and check:architecture has passed locally.
  • Database backups have been restored in a staging environment.
  • Gateway tokens are scoped and have bounded TTLs.
  • Sandbox runtime enforcement has been tested against the intended threat model.
  • Approval paths have a named owner and a fallback owner (see Approval).
  • Logs are retained long enough to investigate delayed workflow failures.