Here is the uncomfortable truth about agent workflows: the orchestration code is easy. Getting agents to do what you actually want — consistently, correctly, every time — is the hard part. These practices address that hard part.

Give Agents Big, Coherent Tasks

Every task boundary is a context boundary. When one task ends and another begins, the agent forgets everything from the previous task. It starts fresh with only the prompt you give it and the structured output you pass in. So do not split one logical operation into four tiny tasks. You are not decomposing work — you are destroying context.
// assuming outputs from createSmithers

// Avoid: splitting one logical operation into many tiny tasks
<Sequence>
  <Task id="read-files" output={outputs.files} agent={codeAgent}>Read the config files</Task>
  <Task id="find-bugs" output={outputs.bugs} agent={codeAgent}>Find bugs in the files</Task>
  <Task id="fix-bugs" output={outputs.fixes} agent={codeAgent}>Fix the bugs you found</Task>
  <Task id="write-fixes" output={outputs.written} agent={codeAgent}>Write the fixes to disk</Task>
</Sequence>

// Better: one coherent task with tools
<Task id="fix-config-bugs" output={outputs.result} agent={codeAgentWithTools}>
  {`Analyze the config files in ${ctx.input.configDir}, find any bugs,
fix them, and write the corrected files. Return a summary of changes.`}
</Task>
The second version gives the agent the full picture and lets it use tools (read, edit, bash) to accomplish everything in one pass. Only split into multiple tasks when the context genuinely changes, you need an explicit checkpoint, or a later task depends on the structured output of an earlier one.

Use Measurable Stop Conditions for Loops

A <Loop> should stop based on a concrete, measurable signal — not a subjective judgment. Ask yourself: can a machine evaluate this condition without interpretation? Good stop conditions:
  • Tests passing (boolean)
  • Approval flag from a reviewer (boolean)
  • Score exceeding a threshold (number comparison)
  • All items in a list processed (array length check)
// Good: concrete stop condition
// assuming outputs from createSmithers
<Loop
  until={ctx.outputMaybe(outputs.review, { nodeId: "review" })?.approved === true}
  maxIterations={5}
  onMaxReached="return-last"
>
  <Sequence>
    <Task id="implement" output={outputs.implement} agent={implementer}>
      {`Implement the feature. Previous feedback: ${
        ctx.outputMaybe(outputs.review, { nodeId: "review" })?.feedback ?? "None yet"
      }`}
    </Task>
    <Task id="review" output={outputs.review} agent={reviewer}>
      {`Review the implementation. Approve if it meets requirements.
Return JSON with approved (boolean) and feedback (string).`}
    </Task>
  </Sequence>
</Loop>
Always set maxIterations. A loop without a cap is a bug waiting to burn your API budget at 3 AM.

Ask for Validation in Prompts

Do not assume the agent will run checks without being told. If you need tests, linting, or verification, say so explicitly in the prompt. Agents are literal-minded collaborators — they do what you ask, not what you hope.
// assuming outputs from createSmithers

// Vague -- agent might not verify
<Task id="implement" output={outputs.result} agent={codeAgent}>
  Fix the authentication bug.
</Task>

// Better -- explicit verification instructions
<Task id="implement" output={outputs.result} agent={codeAgentWithTools}>
  {`Fix the authentication bug in ${ctx.input.file}.

After making changes:
1. Run the test suite with \`bun test\`
2. Verify the specific failing test passes
3. Check that no other tests regressed

Return JSON with:
- summary (string): what you changed
- testsRun (number): how many tests ran
- testsPassed (number): how many passed
- filesChanged (string[]): list of modified files`}
</Task>
The second prompt leaves nothing to chance. The agent knows what to verify, how to verify it, and what to report back.

Request Structured Reports

Design your output schemas to capture the data you need for downstream tasks and human inspection. Every field should be something you can act on programmatically:
const { Workflow, smithers, outputs } = createSmithers({
  analysis: z.object({
    summary: z.string(),
    issuesFound: z.number(),
    criticalIssues: z.number(),
    filesAnalyzed: z.array(z.string()),
    recommendations: z.array(z.object({
      file: z.string(),
      line: z.number(),
      severity: z.enum(["low", "medium", "high", "critical"]),
      description: z.string(),
      suggestedFix: z.string(),
    })),
  }),
});
With this schema you can conditionally branch based on criticalIssues > 0, generate summary reports from structured data, track metrics across runs, and feed specific recommendations into a fix task. Free-text summaries give you none of that.
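As a sketch of that first point, a workflow could gate a follow-up task on the critical count. The `fixer` agent and `outputs.fix` schema here are hypothetical, assumed to be defined alongside the analysis schema above:

```typescript
// Hypothetical sketch: branch on structured analysis output.
// Assumes a `fixer` agent and an `outputs.fix` schema exist.
export default smithers((ctx) => {
  const analysis = ctx.outputMaybe(outputs.analysis, { nodeId: "analyze" });
  return (
    <Workflow name="triage">
      <Sequence>
        <Task id="analyze" output={outputs.analysis} agent={analyst}>
          Analyze the codebase for issues.
        </Task>
        {analysis && analysis.criticalIssues > 0 ? (
          <Task id="fix-critical" output={outputs.fix} agent={fixer} deps={{ analyze: outputs.analysis }}>
            {(deps) => `Fix these critical issues:\n${deps.analyze.recommendations
              .filter((r) => r.severity === "critical")
              .map((r) => `${r.file}:${r.line}: ${r.description}`)
              .join("\n")}`}
          </Task>
        ) : null}
      </Sequence>
    </Workflow>
  );
});
```

Because the branch reads `criticalIssues` from the typed output, the control flow is checkable at compile time rather than inferred from free text.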

Use outputSchema for Type Safety

The outputSchema prop validates the agent’s response against your Zod schema. It does three things:
  1. Validation — Responses are validated against the schema, with auto-retry on failure.
  2. Auto-injection — When children are JSX/MDX elements, props.schema is auto-injected with a JSON example.
  3. Cache key — The schema shape is part of the cache key, so schema changes invalidate stale caches.
const reviewSchema = z.object({
  approved: z.boolean(),
  feedback: z.string().min(10),
  score: z.number().int().min(1).max(10),
});

// assuming outputs from createSmithers
<Task id="review" output={outputs.review} outputSchema={reviewSchema} agent={reviewer} deps={{ implement: outputs.implement }}>
  {(deps) => <ReviewPrompt code={deps.implement.code} />}
</Task>
If the agent returns { approved: "yes" } instead of { approved: true }, schema validation catches it and retries — without burning a full task retry.

Design for Resumability

Every long-running workflow will eventually crash. A network blip, a rate limit, a deploy that kills the process. If your workflow cannot resume from where it stopped, you are starting over from scratch every time. Three rules:
  • Use deterministic task IDs. No timestamps, no random strings, no array indices. If the ID changes between renders, Smithers treats it as a different task.
  • Make tasks idempotent where possible. If a task writes files, design it so re-running produces the same result.
  • Use deps for direct task handoff and ctx.outputMaybe() for orchestration decisions. This keeps prompt wiring terse while preserving explicit control-flow logic.
// Good: deterministic, conditional, resumable
// assuming outputs from createSmithers
export default smithers((ctx) => (
  <Workflow name="robust">
    <Sequence>
      <Task id="analyze" output={outputs.analysis} agent={analyst}>
        Analyze the codebase.
      </Task>
      <Task id="fix" output={outputs.fix} agent={fixer} deps={{ analyze: outputs.analysis }}>
        {(deps) => `Fix: ${deps.analyze.summary}`}
      </Task>
      <Task id="report" output={outputs.report} deps={{ fix: outputs.fix }}>
        {(deps) => ({ summary: deps.fix.explanation, filesChanged: deps.fix.files })}
      </Task>
    </Sequence>
  </Workflow>
));

Keep Prompts and Schemas Separate from Logic

As your workflow grows, you will want to iterate on prompts without touching orchestration logic, and swap agents without changing schemas. Separate your concerns:
  • schemas.ts — All Zod schemas in one file.
  • agents.ts — Agent configuration (model, system prompt, tools).
  • prompts/ — MDX prompt templates.
  • workflow.tsx — Composition only (how tasks connect, branch, and hand typed deps into steps).
When a prompt change requires editing workflow.tsx, something is wrong with your factoring.
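Under that layout, a workflow.tsx might look like this sketch. The file names follow the list above; the `schemas` export, the `analyst`/`fixer` agents, and the `AnalyzePrompt` template are illustrative:

```typescript
// workflow.tsx: composition only. Schemas, agents, and prompts live elsewhere.
import { createSmithers, Task, Sequence } from "smithers-orchestrator";
import { schemas } from "./schemas";                // all Zod schemas
import { analyst, fixer } from "./agents";          // agent configuration
import AnalyzePrompt from "./prompts/analyze.mdx";  // MDX prompt template

const { Workflow, smithers, outputs } = createSmithers(schemas);

export default smithers((ctx) => (
  <Workflow name="separated">
    <Sequence>
      <Task id="analyze" output={outputs.analysis} agent={analyst}>
        <AnalyzePrompt dir={ctx.input.configDir} />
      </Task>
      <Task id="fix" output={outputs.fix} agent={fixer} deps={{ analyze: outputs.analysis }}>
        {(deps) => `Fix: ${deps.analyze.summary}`}
      </Task>
    </Sequence>
  </Workflow>
));
```

With this split, tuning a prompt means editing an MDX file, and swapping a model means editing agents.ts; the composition file never changes.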

Set Reasonable Timeouts and Retry Limits

Every agent task should have a timeout. Agent calls can hang due to rate limits, network issues, or unexpectedly long generation. A task without a timeout is a task that might run forever.
// assuming outputs from createSmithers
<Task
  id="analyze"
  output={outputs.analysis}
  agent={analyst}
  timeoutMs={120_000}   // 2 minutes
  retries={2}            // 3 total attempts
>
  Analyze the codebase.
</Task>
Rules of thumb:
  • Simple analysis tasks: 30-60 seconds timeout, 1-2 retries.
  • Tool-using tasks (read, edit, bash): 2-5 minutes timeout, 1-2 retries.
  • Large generation tasks: 5-10 minutes timeout, 0-1 retries.
  • Non-critical tasks: add continueOnFail so failures do not block the workflow.
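Putting the last rule into a sketch, a non-critical step might be configured like this. The `notifier` agent and `outputs.notify` schema are illustrative:

```typescript
// assuming outputs from createSmithers
<Task
  id="notify"
  output={outputs.notify}
  agent={notifier}        // hypothetical notification agent
  timeoutMs={30_000}      // simple task: short timeout
  retries={1}
  continueOnFail          // a failure here should not block the workflow
>
  Post a one-paragraph summary of this run to the team channel.
</Task>
```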

Use Caching for Iterative Development

You are going to iterate on prompts. A lot. Each iteration should not re-run every upstream task that already succeeded.
<Workflow name="my-workflow" cache>
The cache key includes the prompt, model, tools, schema, and JJ pointer. Changing any of these invalidates the cache for that specific task. This means you can safely tweak a downstream prompt without re-running the expensive analysis step that feeds it. Disable caching in production if you need fresh results on every run.
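One way to handle that last point, assuming the `cache` prop accepts a boolean, is to toggle it by environment:

```typescript
// Cache during development; force fresh runs in production.
<Workflow name="my-workflow" cache={process.env.NODE_ENV !== "production"}>
```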

Example: Complete Review Loop

Here is a full example combining these practices — a review loop with structured output, measurable stop conditions, explicit validation instructions, and reasonable error handling:
import { createSmithers, Task, Sequence, Loop, read, grep, bash, edit, write } from "smithers-orchestrator";
import { ToolLoopAgent as Agent } from "ai";
import { anthropic } from "@ai-sdk/anthropic";
import { z } from "zod";

const { Workflow, smithers, outputs } = createSmithers({
  implement: z.object({
    summary: z.string(),
    filesChanged: z.array(z.string()),
    testsRun: z.number(),
    testsPassed: z.number(),
  }),
  review: z.object({
    approved: z.boolean(),
    feedback: z.string(),
    score: z.number().int().min(1).max(10),
  }),
  report: z.object({
    title: z.string(),
    body: z.string(),
    iterations: z.number(),
    finalScore: z.number(),
  }),
});

const implementer = new Agent({
  model: anthropic("claude-sonnet-4-20250514"),
  instructions: "You are a senior engineer. Implement changes, run tests, and return structured JSON.",
  tools: { read, grep, bash, edit, write },
});

const reviewer = new Agent({
  model: anthropic("claude-sonnet-4-20250514"),
  instructions: "You are a strict code reviewer. Return structured JSON with your assessment.",
  tools: { read, grep },
});

export default smithers((ctx) => {
  const review = ctx.outputMaybe(outputs.review, { nodeId: "review" });

  return (
    <Workflow name="review-loop">
      <Sequence>
        <Loop
          until={review?.approved === true}
          maxIterations={3}
          onMaxReached="return-last"
        >
          <Sequence>
            <Task
              id="implement"
              output={outputs.implement}
              agent={implementer}
              timeoutMs={300_000}
              retries={1}
            >
              {`Implement: ${ctx.input.description}

${review?.feedback ? `Previous review feedback:\n${review.feedback}` : ""}

After making changes:
1. Run \`bun test\` and report results
2. Verify your changes address the requirements

Return JSON with summary, filesChanged, testsRun, testsPassed.`}
            </Task>

            <Task
              id="review"
              output={outputs.review}
              agent={reviewer}
              timeoutMs={120_000}
              retries={1}
              deps={{ implement: outputs.implement }}
            >
              {(deps) => `Review the implementation.
Summary: ${deps.implement.summary}
Files changed: ${deps.implement.filesChanged.join(", ")}
Tests: ${deps.implement.testsPassed}/${deps.implement.testsRun} passed

Approve only if tests pass and the code is clean.
Return JSON with approved (boolean), feedback (string), score (1-10).`}
            </Task>
          </Sequence>
        </Loop>

        {review ? (
          <Task id="report" output={outputs.report}>
            {{
              title: `Review: ${ctx.input.description}`,
              body: review.feedback,
              iterations: ctx.iterationCount("review", "review") ?? 1,
              finalScore: review.score,
            }}
          </Task>
        ) : null}
      </Sequence>
    </Workflow>
  );
});

Next Steps