Local stack
Start the local stack. The services are then available at:
- Grafana: http://localhost:3001
- Prometheus: http://localhost:9090
- Tempo: http://localhost:3200
- Loki: http://localhost:3100
- OTEL collector HTTP: http://localhost:4318
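Before running a workflow, a small probe can confirm that every service is up. This is a minimal sketch, assuming Node 18+ and the default health paths of the upstream images (Grafana /api/health, Prometheus /-/ready, Tempo /ready, Loki /ready). The collector's OTLP HTTP receiver has no readiness path of its own, so the sketch treats any HTTP response on port 4318 as reachable. probeStack and its types are hypothetical, not part of the repo.

```typescript
// Hypothetical readiness probe for the local observability stack.
// Paths are the upstream defaults; ports are the host mappings listed above.
type Fetcher = (url: string) => Promise<{ ok: boolean; status: number }>;

interface Target {
  name: string;
  url: string;
  // When false, any HTTP response counts as "reachable" (used for the
  // collector, whose OTLP receiver exposes no readiness endpoint).
  requireOk: boolean;
}

const targets: Target[] = [
  { name: "grafana", url: "http://localhost:3001/api/health", requireOk: true },
  { name: "prometheus", url: "http://localhost:9090/-/ready", requireOk: true },
  { name: "tempo", url: "http://localhost:3200/ready", requireOk: true },
  { name: "loki", url: "http://localhost:3100/ready", requireOk: true },
  { name: "otel-collector", url: "http://localhost:4318/", requireOk: false },
];

async function probeStack(fetcher: Fetcher): Promise<Record<string, boolean>> {
  const results: Record<string, boolean> = {};
  for (const t of targets) {
    try {
      const res = await fetcher(t.url);
      results[t.name] = t.requireOk ? res.ok : true;
    } catch {
      // Connection refused or similar network error: the service is not up.
      results[t.name] = false;
    }
  }
  return results;
}
```

Calling probeStack(fetch) against the running stack returns one boolean per service; injecting the fetcher keeps the probe testable without Docker.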
Enable OTEL export
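Any Smithers-specific switch for enabling export is not covered here. As a minimal sketch, assuming Smithers uses an OpenTelemetry SDK that honors the standard OTEL_* environment variables, the process can be pointed at the collector's HTTP receiver and given the service name that the Loki and Tempo queries below expect. otelExportEnv is a hypothetical helper.

```typescript
// Hypothetical helper: builds the standard OpenTelemetry SDK environment
// variables that point a process at the local collector. Whether Smithers
// also needs its own enable flag is an open assumption.
function otelExportEnv(
  base: Record<string, string | undefined> = {},
  endpoint = "http://localhost:4318",
  serviceName = "smithers-dev",
): Record<string, string | undefined> {
  return {
    ...base,
    // OTLP/HTTP base URL; SDKs append /v1/traces, /v1/logs and /v1/metrics.
    OTEL_EXPORTER_OTLP_ENDPOINT: endpoint,
    OTEL_EXPORTER_OTLP_PROTOCOL: "http/protobuf",
    // Becomes the service.name resource attribute (service_name in Loki).
    OTEL_SERVICE_NAME: serviceName,
  };
}
```

Passing otelExportEnv(process.env) as the env of whatever command starts the workflow keeps the existing environment and only adds the exporter settings.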
Demo workflow
Use the built-in reproducible workflow at workflows/agent-trace-otel-demo.tsx.
It emits:
- one Pi-like high-fidelity attempt with canonical trace events plus persisted session transcript rows
- one Claude-like structured attempt with canonical trace events plus provider session transcript rows
- one Codex-like structured attempt with canonical trace events plus provider session transcript rows
- one Gemini-like structured attempt
- one SDK-style final-only attempt
- stable run annotations for Loki queries
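When checking a demo run, it can help to write the list above down as data. This is a hypothetical expectations table: the family identifiers and capture-mode strings are assumed lowercase spellings, not confirmed agent_family or agent_capture_mode values, and Gemini and SDK attempts are marked without session rows only because the list does not mention any for them.

```typescript
// Hypothetical expectations table for the demo workflow, transcribed from the
// list above. Identifier spellings are assumptions; compare them with the
// agent_family / agent_capture_mode values actually exported.
type CaptureMode = "high-fidelity" | "structured" | "final-only";

interface DemoAttempt {
  family: string;
  captureMode: CaptureMode;
  // Whether the list says this attempt also produces session transcript rows.
  sessionRows: boolean;
}

const demoAttempts: DemoAttempt[] = [
  { family: "pi", captureMode: "high-fidelity", sessionRows: true },
  { family: "claude", captureMode: "structured", sessionRows: true },
  { family: "codex", captureMode: "structured", sessionRows: true },
  { family: "gemini", captureMode: "structured", sessionRows: false },
  { family: "sdk", captureMode: "final-only", sessionRows: false },
];

// Families whose transcript rows should be queryable in Loki after a demo run.
function familiesWithSessionRows(attempts: DemoAttempt[]): string[] {
  return attempts.filter((a) => a.sessionRows).map((a) => a.family);
}
```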
End-to-end verification script
scripts/verify-observability.sh runs the full flow (reset → start → demo workflow, success and failure runs → Loki/Tempo query verification) and writes a timestamped evidence bundle under tmp/verification/<timestamp>. Reviewers should use it as the canonical reproduction path.
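After several runs, reviewers usually want the most recent evidence bundle. This is a minimal sketch, assuming Node 18+, a working directory at the repo root, and <timestamp> directory names that sort lexicographically in chronological order (for example ISO-8601-style stamps); newestBundle and newestBundleName are hypothetical helpers.

```typescript
import { readdirSync } from "node:fs";
import { join } from "node:path";

// Picks the lexicographically largest name, which is the newest bundle only if
// the timestamp format sorts chronologically (an assumption, see above).
function newestBundleName(names: string[]): string | undefined {
  const sorted = [...names].sort();
  return sorted.length > 0 ? sorted[sorted.length - 1] : undefined;
}

// Returns the path of the newest bundle directory under root.
// Throws if root does not exist yet (no verification run has completed).
function newestBundle(root = "tmp/verification"): string | undefined {
  const dirs = readdirSync(root, { withFileTypes: true })
    .filter((entry) => entry.isDirectory())
    .map((entry) => entry.name);
  const name = newestBundleName(dirs);
  return name === undefined ? undefined : join(root, name);
}
```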
Loki queries
Smithers OTEL attributes are exposed to Loki as sanitized structured metadata fields, such as:
- smithers_event_category
- run_id
- workflow_path
- node_id
- node_attempt
- agent_family
- agent_capture_mode
- trace_completeness
- event_kind
- session_row_type
Exported events fall into two categories:
- agent-trace: normalized canonical execution events such as deltas, tool lifecycle, usage, and capture warnings/errors
- agent-session: provider transcript/session rows, observed live or backfilled from persisted session logs
artifact.created remains local-only and is not exported to Loki.
Use {service_name="smithers-dev"} as the stream selector, then filter on structured metadata in the LogQL pipeline. Use | json to inspect the structured log body.
All events for one run:
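A small builder keeps these queries consistent. This is a minimal sketch: buildLogQL and quote are hypothetical helpers, and only the stream selector and the metadata field names come from this setup. For example, buildLogQL({ run_id: "<RUN_ID>" }) produces {service_name="smithers-dev"} | run_id="<RUN_ID>" | json, which returns all events for one run.

```typescript
// Hypothetical LogQL builder over the structured metadata fields listed above.
type MetadataField =
  | "smithers_event_category" | "run_id" | "workflow_path" | "node_id"
  | "node_attempt" | "agent_family" | "agent_capture_mode"
  | "trace_completeness" | "event_kind" | "session_row_type";

// Escape a value for a double-quoted LogQL string literal.
function quote(value: string): string {
  return `"${value.replace(/\\/g, "\\\\").replace(/"/g, '\\"')}"`;
}

function buildLogQL(
  filters: Partial<Record<MetadataField, string>>,
  parseJson = true,
): string {
  let query = '{service_name="smithers-dev"}';
  // Each equality filter becomes a label filter expression in the pipeline,
  // which Loki evaluates against structured metadata.
  for (const [field, value] of Object.entries(filters)) {
    if (value !== undefined) query += ` | ${field}=${quote(value)}`;
  }
  // `| json` parses the structured log body so its fields can be inspected.
  return parseJson ? `${query} | json` : query;
}
```

Adding smithers_event_category: "agent-session" to the filters narrows the same query to transcript rows; filtering before | json keeps the parser off lines that would be discarded anyway.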
API query examples
Equivalent checks can be run directly against the Loki HTTP API.
Tempo trace checks
Tempo search should show Smithers spans once a workflow has run. Search on:
- service.name = smithers-dev
- runId = <RUN_ID>
- workflowPath = <workflow path>
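Both the Loki check and the Tempo search can be driven over HTTP. This sketch assumes Node 18+ (URL, URLSearchParams, fetch) and the standard Loki query_range and Tempo /api/search endpoints on the local ports. lokiQueryRangeUrl and tempoSearchUrl are hypothetical helpers, and because this document does not say whether runId is a span or a resource attribute, the TraceQL uses the unscoped .runId form, which matches either.

```typescript
// Hypothetical URL builders for direct API checks against the local stack.
const LOKI = "http://localhost:3100";
const TEMPO = "http://localhost:3200";

// Loki range query; start and end are sent as nanosecond Unix epochs.
function lokiQueryRangeUrl(logql: string, startMs: number, endMs: number, limit = 100): string {
  const params = new URLSearchParams({
    query: logql,
    start: `${Math.trunc(startMs)}000000`, // milliseconds -> nanoseconds
    end: `${Math.trunc(endMs)}000000`,
    limit: String(limit),
  });
  return `${LOKI}/loki/api/v1/query_range?${params}`;
}

// Tempo TraceQL search for one run. JSON.stringify quotes the run id, which is
// adequate for plain identifiers.
function tempoSearchUrl(runId: string): string {
  const traceql = `{ resource.service.name = "smithers-dev" && .runId = ${JSON.stringify(runId)} }`;
  return `${TEMPO}/api/search?${new URLSearchParams({ q: traceql })}`;
}
```

Passing either URL to fetch and reading the JSON body, a non-empty data.result (Loki) or traces array (Tempo) indicates that the run was exported.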
Verification checklist
- stack starts successfully in Docker
- Loki is present and queryable
- collector logs pipeline is active
- Pi traces show text deltas, thinking deltas, tool execution lifecycle, final message, usage, and run/node/attempt correlation
- Pi session transcript rows are queryable in Loki
- Claude session transcript rows are queryable in Loki
- Codex session transcript rows are queryable in Loki
- a second agent family is exported with a truthful final-only completeness classification
- Gemini stream-json attempts preserve structured deltas truthfully
- malformed or truncated structured streams emit capture.error and are classified as capture-failed
- artifact write failures emit capture.warning and degrade to partial-observed without losing durable DB state
- Tempo search shows Smithers spans and trace attributes, including runId
- Prometheus is still scraping the collector successfully
- secrets are redacted from canonical events, OTEL log bodies, and persisted trace artifacts
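The failure-mode items in the checklist pair a trace_completeness value with an emitted event, which makes them easy to check mechanically once the values have been read from Loki. This is a hypothetical checker: the scenario names are invented, the expected values come from the checklist, and it assumes capture.error and capture.warning surface in the event_kind field.

```typescript
// Hypothetical checker for the failure-mode checklist items.
type Scenario = "final-only-family" | "malformed-stream" | "artifact-write-failure";

interface Expectation {
  traceCompleteness: string;
  eventKind?: string; // event that must be present, if any
}

const expectations: Record<Scenario, Expectation> = {
  "final-only-family": { traceCompleteness: "final-only" },
  "malformed-stream": { traceCompleteness: "capture-failed", eventKind: "capture.error" },
  "artifact-write-failure": { traceCompleteness: "partial-observed", eventKind: "capture.warning" },
};

interface Observed {
  traceCompleteness: string; // trace_completeness value read from Loki
  eventKinds: string[];      // event_kind values seen for the same attempt
}

// Returns human-readable problems; an empty list means the item passes.
function checkScenario(scenario: Scenario, observed: Observed): string[] {
  const want = expectations[scenario];
  const problems: string[] = [];
  if (observed.traceCompleteness !== want.traceCompleteness) {
    problems.push(`expected trace_completeness=${want.traceCompleteness}, got ${observed.traceCompleteness}`);
  }
  if (want.eventKind && !observed.eventKinds.includes(want.eventKind)) {
    problems.push(`missing ${want.eventKind} event`);
  }
  return problems;
}
```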
Automated coverage
apps/observability/tests/agentTrace.test.js covers the canonical contract:
- capability profiles for every agent family
- family + capture-mode detection
- Pi / Claude / Codex / Gemini structured event normalization
- redaction rules for API keys, bearer tokens, and secret-ish key=value pairs
- canonical OTEL log record shaping with stable Loki query attributes
- session-event OTEL log record shaping
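The redaction rules listed above (API keys, bearer tokens, secret-ish key=value pairs) can be illustrated with a few regular expressions. This is a minimal sketch with assumed patterns and an assumed [REDACTED] placeholder; it is not the project's rule set, and real key formats vary by provider.

```typescript
// Illustrative redaction pass; patterns and placeholder are assumptions.
const PLACEHOLDER = "[REDACTED]";

const rules: Array<[RegExp, string]> = [
  // Authorization-style bearer tokens: keep the scheme, drop the credential.
  [/\b(Bearer)\s+[A-Za-z0-9._~+/-]+=*/gi, `$1 ${PLACEHOLDER}`],
  // Provider-style API keys such as "sk-..." (prefix assumed for illustration).
  [/\bsk-[A-Za-z0-9_-]{16,}/g, PLACEHOLDER],
  // key=value pairs whose key looks secret-ish; the key name is kept.
  [/\b([A-Za-z0-9_]*(?:token|secret|password|passwd|api[_-]?key)[A-Za-z0-9_]*)=([^\s&"']+)/gi, `$1=${PLACEHOLDER}`],
];

// Applies each rule once, in order, to the input text.
function redact(text: string): string {
  return rules.reduce((acc, [pattern, replacement]) => acc.replace(pattern, replacement), text);
}
```

Keeping the key name and the Bearer scheme in the output preserves enough context to debug a request without exposing the credential itself.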