monitor-smithers is a watchdog over your Smithers fleet. It reads the live run table, sorts every run into a health bucket, and, only when something is wrong, produces concrete escalation actions (which gate to clear, which run to triage, which to cancel). Reach for it on a schedule or whenever you want a quick “is anything stuck?” sweep of in-flight runs.
bunx smithers-orchestrator workflow run monitor-smithers --input '{"staleMinutes":15}'

Stages

  1. poll: deterministic (no agent), shells out to bunx smithers-orchestrator ps --format json, parses it, and normalises each run into { runId, status, ageMinutes, lastEvent }. Any failure degrades to an empty list rather than throwing.
  2. classify: a cheap/fast agent sorts the runs into buckets: healthy, stuck, blocked, failed, overBudget.
  3. triage: runs only when a non-healthy run exists. A smart agent surfaces the pending approval/question for blocked runs (bunx smithers-orchestrator why, bunx smithers-orchestrator approve), recommends the triage-run workflow for stuck/failed runs, and returns a digest.
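The poll stage's normalisation can be sketched roughly as follows. This is a minimal illustration, not the workflow's actual code: the raw field names (`id`, `status`, `updatedAtMs`, `lastEvent`) are hypothetical stand-ins for whatever `ps --format json` really emits, and the sketch assumes `ageMinutes` counts minutes since the run's last update. Only the output shape and the "degrade to an empty list" behaviour come from the description above.

```typescript
// Normalised shape named in the poll stage description.
interface Run {
  runId: string;
  status: string;
  ageMinutes: number;
  lastEvent: string | null;
}

// ASSUMPTION: these raw field names are hypothetical. Inspect the real
// `ps --format json` output before relying on any of them.
interface RawRun {
  id?: unknown;
  status?: unknown;
  updatedAtMs?: unknown;
  lastEvent?: unknown;
}

// Parse the JSON text and normalise each row. Any failure (bad JSON,
// unexpected top-level shape) yields an empty list instead of throwing.
function normaliseRuns(rawJson: string, nowMs: number): Run[] {
  try {
    const parsed: unknown = JSON.parse(rawJson);
    if (!Array.isArray(parsed)) return [];

    const runs: Run[] = [];
    for (const item of parsed as unknown[]) {
      if (typeof item !== "object" || item === null) continue;
      const row = item as RawRun;
      // Skip rows that lack the two fields every later stage needs.
      if (typeof row.id !== "string" || typeof row.status !== "string") continue;

      const updatedAt = typeof row.updatedAtMs === "number" ? row.updatedAtMs : nowMs;
      runs.push({
        runId: row.id,
        status: row.status,
        ageMinutes: Math.max(0, Math.floor((nowMs - updatedAt) / 60000)),
        lastEvent: typeof row.lastEvent === "string" ? row.lastEvent : null,
      });
    }
    return runs;
  } catch {
    return [];
  }
}
```

Keeping the parser a pure function of the JSON text and a timestamp makes the degrade-to-empty behaviour easy to check without shelling out.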

Inputs

| Input | Type | Default |
| --- | --- | --- |
| staleMinutes | number | 15 (a run idle past this many minutes is treated as stuck) |
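To make the threshold concrete, here is a small sketch of the staleness rule as a plain predicate. In the real workflow the classify stage is an agent, so this deterministic check is only an approximation of what it is asked to do. The `isStale` helper is hypothetical, and the sketch assumes `ageMinutes` measures idle time.

```typescript
// Normalised run shape from the poll stage.
interface Run {
  runId: string;
  status: string;
  ageMinutes: number; // ASSUMPTION: minutes since the run's last activity
  lastEvent: string | null;
}

// Default taken from the Inputs table.
const DEFAULT_STALE_MINUTES = 15;

// Hypothetical helper: "idle past this many minutes" is read as strictly greater.
function isStale(run: Run, staleMinutes: number = DEFAULT_STALE_MINUTES): boolean {
  return run.ageMinutes > staleMinutes;
}
```

Lowering `staleMinutes` makes the sweep flag idle runs sooner, at the cost of more triage passes over runs that are merely slow.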

Monitor it

bunx smithers-orchestrator workflow run monitor-smithers -d   # detach
bunx smithers-orchestrator ps                                 # active / paused / recent
bunx smithers-orchestrator logs RUN_ID -f                    # follow events
bunx smithers-orchestrator inspect RUN_ID                    # full run state + outputs
A healthy fleet short-circuits before the expensive triage step, so a clean sweep is cheap. When the triage step does run, its digest and actions tell you exactly which runs need a human and what to do first.
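The short-circuit described above amounts to a guard over the classify output. The sketch below assumes the classification is a map from bucket name to run IDs. That shape and the `needsTriage` helper are hypothetical; only the five bucket names come from this page.

```typescript
// The five buckets named in the classify stage.
type Bucket = "healthy" | "stuck" | "blocked" | "failed" | "overBudget";

// ASSUMPTION: the classify result is modelled here as run IDs per bucket.
type Classification = Record<Bucket, string[]>;

// Triage is only worth running when at least one run is non-healthy.
function needsTriage(c: Classification): boolean {
  const unhealthy: Bucket[] = ["stuck", "blocked", "failed", "overBudget"];
  return unhealthy.some((bucket) => c[bucket].length > 0);
}

// Usage: a clean sweep skips the expensive step entirely.
const cleanSweep: Classification = {
  healthy: ["run-1", "run-2"],
  stuck: [],
  blocked: [],
  failed: [],
  overBudget: [],
};

if (!needsTriage(cleanSweep)) {
  console.log("fleet healthy, skipping triage");
}
```

Putting the guard before the agent call is what keeps a clean sweep cheap: the only cost is the poll and the fast classify pass.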