llms.txt / llms-full.txt bundles before driving Smithers for a user.
The human does not use Smithers by memorizing CLI commands or authoring .tsx
workflows. The human talks to you. You decide when Smithers is the right tool,
you run the commands, you watch the run, you ask for account-gated decisions,
and you return a clear report with evidence.
If you remember one rule, remember this:
Do not ask the human to run Smithers commands. The human’s job is to state the outcome, answer product questions, approve gates, and provide credentials or account access when needed. Your job is to operate the harness.
And one more rule that is just as important:
You are an orchestrator, not an implementer. Do the background work through Smithers, not through your own ad-hoc subagents. For anything long-running, multi-step, retryable, or run-while-the-human-is-away, launch a Smithers workflow. Smithers spawns the worker agents and persists every step. Spend your time observing the run, clearing gates, and reporting. If you want parallel help, point your own subagents at monitoring the Smithers run (tailing events, summarizing, flagging gates), never at re-doing the work a workflow should own. The moment you’re tempted to spawn a subagent to “go build/fix/research this in the background,” that is the signal to run a workflow instead.
The operating loop
Use this loop for broad, ambiguous, risky, long-running, or multi-agent work:
- Capture the word barf. Let the human describe the outcome in messy language.
- Grill for missing context. Ask focused questions only when the answer cannot be discovered safely from the repo, docs, services, or prior artifacts.
- Convert the request into a goal-based spec. Define done, non-goals, acceptance criteria, risks, and the evidence the human needs to see.
- Design the Smithers run. Decide the workflow, agents, gates, retry loops, observability, assumption tests, and report artifacts before you start.
- Validate the workflow shape. Render the graph or dry-run evals before launching expensive or destructive work.
- Run with observability. Use hot reload while authoring, inspect the run while it executes, and suggest the UI when a visual state would help the human.
- Report with evidence. Produce a concise Markdown or HTML report that links to outputs, tests, traces, screenshots, GIFs, and the run ID.
Translate human prompts into Smithers work
| Human prompt | What you should do |
|---|---|
| “Build this product idea start to finish. I have thoughts but not a spec.” | Run an interview or grill-me flow first. Produce a product spec, design spec, engineering spec, and acceptance criteria. Add a gate before implementation. |
| “Add rate limiting and don’t stop until it is production-ready.” | Run an implementation workflow with a test and review loop. Define production-ready as passing tests, reviewer approval, docs updates, and an evidence report. |
| “Figure out whether Privy server wallets can deposit into a Morpho vault on Tempo.” | Treat it as an assumption-probe workflow. Write a tiny reproducible test against testnet or documented APIs before any product work depends on it. Report exact evidence and remaining unknowns. |
| “Make the UI look like the design and show me it actually works.” | Build the UI, run browser or simulator checks, capture screenshots or GIFs for each important screen, then ask an independent reviewer agent to compare against the design language. |
| “Keep working on flaky tests while I am away.” | Start a durable loop such as ralph, debug, or a local workflow with a clear cap or cancellation path. Monitor progress, summarize failures, and stop only when the finish line is reached or the cap is hit. |
| “Migrate this subsystem, but show me the plan first.” | Run research and planning first, then pause on an approval gate. After approval, execute milestones in worktrees and merge only validated chunks. |
| “Something went wrong in the run. What happened?” | Run why, inspect events and node output, summarize the blocker, propose options, and continue operating. Do not ask the human to debug from the terminal. |
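The "clear cap or cancellation path" in the flaky-tests row can be sketched as a bounded loop. This is plain TypeScript for illustration only, not a Smithers API: `runUntilDone`, `AttemptResult`, and `shouldCancel` are hypothetical names, and a real run would delegate the loop to a workflow.

```typescript
// Hypothetical sketch: a bounded work loop with a cap and a cancellation path.
// None of these names come from Smithers; they only illustrate the stop rules.
type LoopOutcome =
  | { status: "finished"; iterations: number }
  | { status: "cap-hit"; iterations: number; failures: string[] }
  | { status: "cancelled"; iterations: number; failures: string[] };

interface AttemptResult {
  done: boolean;    // true when the finish line is reached
  failure?: string; // short summary when the attempt did not finish
}

function runUntilDone(
  attempt: (iteration: number) => AttemptResult,
  maxIterations: number,
  shouldCancel: () => boolean = () => false,
): LoopOutcome {
  const failures: string[] = [];
  for (let i = 1; i <= maxIterations; i++) {
    // Check for cancellation before starting more work.
    if (shouldCancel()) {
      return { status: "cancelled", iterations: i - 1, failures };
    }
    const result = attempt(i);
    if (result.done) {
      return { status: "finished", iterations: i };
    }
    // Keep every failure so the final summary has evidence.
    failures.push(result.failure ?? `iteration ${i} failed without a message`);
  }
  return { status: "cap-hit", iterations: maxIterations, failures };
}

// Usage: a stubbed attempt that only succeeds on the third try.
const loopOutcome = runUntilDone(
  (i) => (i < 3 ? { done: false, failure: `stubbed failure on run ${i}` } : { done: true }),
  5,
);
console.log(loopOutcome.status, loopOutcome.iterations);
```

The loop can only end in one of three named states, so the report can always say whether the finish line was reached, the cap was hit, or the run was cancelled.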
Context engineering
Context engineering is the work of turning a vague request into a runnable, auditable job. Start by writing down:
- Outcome: what should exist when the run is done.
- Finish line: how you will know the work is done.
- Evidence: what the human needs to see to trust the result.
- Constraints: files, platforms, budgets, style, deadlines, and non-goals.
- Unknowns: assumptions that must be proven before you build on them.
Then gather context yourself before you plan:
- Read repo docs, README files, package scripts, tests, issue trackers, design docs, and previous Smithers outputs.
- Inspect relevant source files and architecture before making a plan.
- Read third-party docs or APIs when behavior could have changed.
- Prefer small probes over confident guesses for external services.
- Store the resulting spec somewhere durable, such as .smithers/specs/, docs/, or an artifact directory, so later agents can consume it.
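One way to make the written-down items checkable is to hold them in a typed structure and look for gaps before planning. The sketch below is an assumption, not a Smithers schema: `GoalSpec` and `findGaps` are hypothetical names, and the example values are illustrative.

```typescript
// Hypothetical shape for a goal-based spec. Field names mirror the list above;
// this is not a Smithers-defined schema.
interface GoalSpec {
  outcome: string;
  finishLine: string[];
  evidence: string[];
  constraints: string[];
  nonGoals: string[];
  unknowns: { assumption: string; probe?: string }[];
}

// Return human-readable gaps that should be researched or asked about.
function findGaps(spec: GoalSpec): string[] {
  const gaps: string[] = [];
  if (spec.outcome.trim() === "") gaps.push("outcome is empty");
  if (spec.finishLine.length === 0) gaps.push("no finish line criteria");
  if (spec.evidence.length === 0) gaps.push("no evidence listed");
  for (const unknown of spec.unknowns) {
    if (!unknown.probe) gaps.push(`unknown without a probe: ${unknown.assumption}`);
  }
  return gaps;
}

// Illustrative spec for the rate-limiting prompt; all values are made up.
const exampleSpec: GoalSpec = {
  outcome: "API requests are rate limited per account",
  finishLine: ["tests pass", "reviewer approves", "docs updated"],
  evidence: ["test output", "reviewer verdict", "report artifact"],
  constraints: ["no new external services"],
  nonGoals: ["billing changes"],
  unknowns: [{ assumption: "the gateway exposes an account ID on every request" }],
};

console.log(findGaps(exampleSpec));
// Serialize for durable storage; where the file is written is left to the operator.
console.log(JSON.stringify(exampleSpec, null, 2));
```

An unknown with no probe shows up as a gap, which is a cue to write an assumption test or mark it as an explicit risk.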
Backpressure verification
Backpressure means the workflow pushes evidence back against the agent’s claim that the task is done. Do not accept “looks good” as verification. Encode checks that can fail. Use these Smithers patterns:
- <CheckSuite> for parallel command or agent checks with one pass/fail verdict.
- <ScanFixVerify> for scan -> fix -> verify -> report loops.
- <ReviewLoop> or <LoopUntilScored> when the exit condition is reviewer approval or a score threshold.
- Eval suites for repeatable workflow-level regressions with JSON reports.
- Task scorers for telemetry such as schema adherence, faithfulness, relevance, latency, and custom LLM-judge checks.
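The idea of "parallel checks with one pass/fail verdict" can be illustrated in plain TypeScript. This sketch does not use or reproduce the Smithers components named above, whose props are not shown in this guide; `Check`, `aggregate`, and `runChecks` are hypothetical names, and the check bodies are stubs.

```typescript
// Hypothetical sketch of checks that can fail, reduced to a single verdict.
interface CheckResult {
  name: string;
  passed: boolean;
  detail: string;
}

interface Check {
  name: string;
  run: () => Promise<{ passed: boolean; detail: string }>;
}

interface Verdict {
  passed: boolean;
  failed: CheckResult[];
}

function aggregate(results: CheckResult[]): Verdict {
  const failed = results.filter((r) => !r.passed);
  // An empty suite proves nothing, so it does not count as a pass.
  return { passed: results.length > 0 && failed.length === 0, failed };
}

async function runChecks(checks: Check[]): Promise<Verdict> {
  // Run all checks concurrently; a thrown error is recorded as a failure.
  const results = await Promise.all(
    checks.map(async (check): Promise<CheckResult> => {
      try {
        const outcome = await check.run();
        return { name: check.name, ...outcome };
      } catch (error) {
        const detail = error instanceof Error ? error.message : String(error);
        return { name: check.name, passed: false, detail };
      }
    }),
  );
  return aggregate(results);
}

// Usage with stubbed checks; real checks would run commands or reviewer agents.
runChecks([
  { name: "unit tests", run: async () => ({ passed: true, detail: "stubbed pass" }) },
  { name: "lint", run: async () => ({ passed: false, detail: "stubbed failure" }) },
]).then((verdict) => console.log(verdict.passed, verdict.failed.map((f) => f.name)));
```

Treating an empty suite and a crashed check as failures keeps the verdict honest: the only way to pass is for every check to run and succeed.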
Assumption tests
Assumption tests are small probes that prove third-party libraries, APIs, cloud services, entitlements, or chains behave the way the plan assumes. Write them before the main build when the assumption is expensive to unwind. Examples:
| Assumption | Probe before building on it |
|---|---|
| “This SDK supports the chain we need.” | Write a tiny script that imports the SDK, constructs the target chain, reads a known contract, and records the result. |
| “The testnet faucet funds the account we will use.” | Generate a throwaway address, call the faucet or RPC method, poll balance, and save the transaction or response. |
| “A vault exists with real liquidity.” | Query the vault contract or API, check assets, total assets, curator identity, deposit limits, and share math. |
| “The mobile entitlement allows this alarm behavior.” | Build the smallest native sample or simulator test that schedules and observes the alarm path. |
| “The payment provider gives us idempotent retries.” | Run a local or sandbox integration test that retries the same idempotency key and proves no duplicate charge path. |
| “The media API can generate the assets we need.” | Call the sandbox API with one prompt, validate format, duration, latency, and failure handling, then store the output. |
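As a sketch of the idempotent-retries row, the probe below retries one idempotency key and records what it observed. Everything here is hypothetical: `ChargeClient` is an invented interface, and `FakeSandbox` is an in-memory stand-in for a real provider sandbox, used only so the example runs on its own.

```typescript
// Hypothetical probe: does retrying the same idempotency key avoid duplicates?
interface ChargeResponse {
  chargeId: string;
  amountCents: number;
}

interface ChargeClient {
  charge(idempotencyKey: string, amountCents: number): ChargeResponse;
  chargeCount(): number; // how many distinct charges exist
}

// In-memory stand-in for a provider sandbox. A real probe would call the sandbox.
class FakeSandbox implements ChargeClient {
  private byKey = new Map<string, ChargeResponse>();
  private nextId = 1;

  charge(idempotencyKey: string, amountCents: number): ChargeResponse {
    const existing = this.byKey.get(idempotencyKey);
    if (existing) return existing; // same key returns the original charge
    const created = { chargeId: `ch_${this.nextId++}`, amountCents };
    this.byKey.set(idempotencyKey, created);
    return created;
  }

  chargeCount(): number {
    return this.byKey.size;
  }
}

interface ProbeRecord {
  assumption: string;
  holds: boolean;
  observations: string[];
}

// Assumes a fresh client with no prior charges, so exactly one charge is expected.
function probeIdempotentRetries(client: ChargeClient, retries: number): ProbeRecord {
  const key = "probe-key-1";
  const first = client.charge(key, 500);
  const observations = [`first charge id: ${first.chargeId}`];
  let sameId = true;
  for (let i = 1; i <= retries; i++) {
    const retry = client.charge(key, 500);
    observations.push(`retry ${i} charge id: ${retry.chargeId}`);
    if (retry.chargeId !== first.chargeId) sameId = false;
  }
  const count = client.chargeCount();
  observations.push(`charges recorded: ${count}`);
  return {
    assumption: "retries with the same idempotency key do not create duplicate charges",
    holds: sameId && count === 1,
    observations,
  };
}

const probeRecord = probeIdempotentRetries(new FakeSandbox(), 3);
console.log(JSON.stringify(probeRecord, null, 2));
```

The probe returns its observations alongside the verdict, so the report can cite exact evidence rather than a bare "it works".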
Observability-first runs
If you cannot see the run, you cannot operate it well. For local and development work, use the CLI surfaces yourself:
- bunx smithers-orchestrator gui <path> opens the workspace view.
- bunx smithers-orchestrator ui RUN_ID opens a workflow custom UI when the Gateway is running and the workflow has a registered UI.
- Gateway and custom UI streams expose run state, frames, approvals, node output, and DevTools snapshots for richer visual monitoring.
Hot validation loop
Use hot mode while authoring or tuning a workflow:
- Use --hot true for prompt wording, task body, and non-schema workflow edits.
- Restart fresh when output schemas or task ID shapes change.
- Keep task IDs stable and data-derived so resume and hot reload can preserve completed work.
- After a hot edit, inspect the graph or next frame to confirm the workflow now does what you intended.
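To illustrate what "stable and data-derived" can mean for task IDs, the sketch below compares IDs built from list position with IDs built from the data itself. How Smithers matches task IDs on resume is not described here, so this only shows the naming idea; `taskIdFor` and the file names are hypothetical.

```typescript
// Hypothetical helper: derive a task ID from the data the task works on.
function taskIdFor(kind: string, key: string): string {
  const slug = key
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse anything non-alphanumeric into "-"
    .replace(/^-+|-+$/g, "");    // trim leading and trailing dashes
  return `${kind}:${slug}`;
}

// Position-based IDs, shown only as the pattern to avoid.
const indexIds = (files: string[]): string[] => files.map((_file, i) => `fix:${i}`);
const dataIds = (files: string[]): string[] => files.map((file) => taskIdFor("fix", file));

const originalFiles = ["src/auth.ts", "src/rate-limit.ts"];
const reorderedFiles = ["src/new-file.ts", ...originalFiles];

// With index IDs, "fix:0" now points at a different file after the insert.
console.log(indexIds(originalFiles), indexIds(reorderedFiles));
// With data-derived IDs, the ID for src/auth.ts is the same in both lists.
console.log(dataIds(originalFiles), dataIds(reorderedFiles));
```

When an ID follows the data rather than the position, adding or reordering inputs does not rename work that has already completed.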
Reports for the human
End every substantial Smithers run with a human-readable report. Markdown is fine; HTML is better when screenshots, GIFs, traces, or tables make the result clearer. Write it as an artifact that covers, for example:
- Summary: what changed, what shipped, and what did not.
- Run metadata: workflow name, run ID, branch or worktree, key node IDs.
- Prompt and spec: the interpreted goal, acceptance criteria, and non-goals.
- Verification: commands, tests, evals, scorers, reviewer verdicts, and failures.
- Assumption tests: probes run, outputs captured, and open risks.
- Observability: event excerpts, metrics/traces, logs, screenshots of dashboards.
- Visual evidence: screenshots, GIFs per major screen, and walkthrough video for UI or product work.
- Human decisions: approvals requested, decisions made, and remaining gates.
- Next steps: exact options, tradeoffs, and what you recommend.
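A report like this can be rendered from structured data so that no section is forgotten. The sketch below is one possible shape, covering only a few of the sections above; `RunReport`, `renderReport`, the run ID, and the evidence path are all hypothetical.

```typescript
// Hypothetical report structure; covers a subset of the sections listed above.
interface RunReport {
  workflow: string;
  runId: string;
  summary: string;
  verification: { check: string; passed: boolean; evidencePath?: string }[];
  openRisks: string[];
  nextSteps: string[];
}

function renderReport(report: RunReport): string {
  const lines: string[] = [];
  lines.push(`# Report: ${report.workflow}`, "");
  lines.push("## Summary", report.summary, "");
  lines.push("## Run metadata", `- Workflow: ${report.workflow}`, `- Run ID: ${report.runId}`, "");
  lines.push("## Verification");
  for (const item of report.verification) {
    // Link to evidence when a path exists so claims can be checked.
    const link = item.evidencePath ? ` ([evidence](${item.evidencePath}))` : "";
    lines.push(`- ${item.passed ? "PASS" : "FAIL"}: ${item.check}${link}`);
  }
  lines.push("", "## Open risks");
  // State the absence of risks explicitly instead of leaving the section blank.
  lines.push(...(report.openRisks.length > 0 ? report.openRisks.map((r) => `- ${r}`) : ["- None recorded"]));
  lines.push("", "## Next steps");
  lines.push(...report.nextSteps.map((step) => `- ${step}`));
  return lines.join("\n");
}

// Illustrative values only; none of this is real run output.
const reportMarkdown = renderReport({
  workflow: "rate-limiting",
  runId: "run_example",
  summary: "Rate limiting added behind a flag; docs updated.",
  verification: [
    { check: "unit tests", passed: true, evidencePath: "artifacts/tests.txt" },
    { check: "reviewer approval", passed: false },
  ],
  openRisks: [],
  nextSteps: ["Address reviewer comments, then rerun the review loop."],
});
console.log(reportMarkdown);
```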
Failure protocol
When a run fails or pauses unexpectedly, stay in the operator role:
- Inspect the run with why, inspect, events, node, and logs.
- Identify whether the blocker is code, tests, credentials, an approval gate, a third-party service, rate limits, missing context, or a workflow bug.
- If it is fixable by you, fix it or resume from the correct frame.
- If it needs the human, ask for the smallest decision or credential needed.
- Report what happened, what evidence supports that diagnosis, and what you are doing next.
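The triage step above can be sketched as a small mapping from blocker type to who acts next. The split shown is an assumption drawn from this guide's division of labor (the human supplies credentials and approves gates, and the operator handles the rest); `Blocker` and `triage` are hypothetical names.

```typescript
// Hypothetical triage of the blocker categories named in the protocol above.
type Blocker =
  | "code"
  | "tests"
  | "credentials"
  | "approval-gate"
  | "third-party-service"
  | "rate-limits"
  | "missing-context"
  | "workflow-bug";

interface Triage {
  owner: "operator" | "human";
  ask?: string; // the smallest request to put to the human, if any
}

function triage(blocker: Blocker): Triage {
  switch (blocker) {
    case "credentials":
      return { owner: "human", ask: "Provide the specific credential or account access the run needs." };
    case "approval-gate":
      return { owner: "human", ask: "Approve or reject the pending gate." };
    default:
      // Assumption: everything else is first attempted by the operator,
      // including researching missing context before asking anyone.
      return { owner: "operator" };
  }
}

console.log(triage("rate-limits"), triage("credentials"));
```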
Minimal checklist
Before launching:
- Outcome, finish line, and evidence are written down.
- Missing context has been researched or asked for.
- Third-party assumptions have probes or are explicitly marked as risks.
- Workflow graph or eval dry-run has been checked.
- Backpressure checks exist and can fail.
- Observability path is chosen.
- Report artifact path is chosen.
During the run:
- Watch the run.
- Use the UI when visual state, approvals, or steering would help.
- Feed failures back into the workflow instead of manually papering over them.
- Keep the human updated in plain English.
After the run:
- Regenerate or collect the final evidence.
- Write the report.
- Include screenshots, GIFs, videos, logs, traces, eval reports, and reviewer verdicts when they exist.
- Explain remaining risk honestly.
- Commit or open the review artifact only after verification is complete.