Your agent needs to answer questions about a codebase, a set of design docs, or a knowledge base. The model’s training data does not cover your private documents. You could paste everything into the prompt, but that blows up the context window and costs a fortune. RAG solves this by fetching only the relevant pieces at query time.

The Pipeline

RAG in Smithers is a four-step pipeline:
  1. Chunk — split documents into small, overlapping pieces
  2. Embed — convert each chunk into a vector using an embedding model
  3. Store — persist vectors in a SQLite table alongside the original text
  4. Retrieve — embed the query, find the closest vectors, return the matching chunks
```
Document ──▶ Chunker ──▶ Embedder ──▶ Vector Store
                                           │
Query ──▶ Embedder ──▶ Similarity Search ──┘──▶ Ranked Results
```
Each step is a plain function. You can use them individually or wire them together with createRagPipeline.
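The shape of those four plain functions can be sketched with toy types and a toy embedder (illustrative only; the real Smithers signatures differ):

```typescript
// Conceptual sketch of the four steps as plain functions.
// The types, the bucketing embedder, and the Map store are all toys.
type Chunk = { id: string; text: string };

// 1. Chunk: naive fixed-size splitting (real strategies are smarter).
function chunkText(text: string, size: number): Chunk[] {
  const chunks: Chunk[] = [];
  for (let i = 0; i * size < text.length; i++) {
    chunks.push({ id: `c${i}`, text: text.slice(i * size, (i + 1) * size) });
  }
  return chunks;
}

// 2. Embed: toy embedder bucketing character codes into a small vector.
function embed(text: string): number[] {
  const v = new Array(8).fill(0);
  for (const ch of text) v[ch.charCodeAt(0) % 8] += 1;
  return v;
}

// 3. Store: an in-memory map standing in for the SQLite table.
const store = new Map<string, { text: string; vector: number[] }>();
function upsert(chunks: Chunk[]): void {
  for (const c of chunks) store.set(c.id, { text: c.text, vector: embed(c.text) });
}

// 4. Retrieve: score every stored vector against the query and rank.
function retrieve(query: string, topK: number) {
  const q = embed(query);
  return [...store.values()]
    .map((r) => ({ text: r.text, score: r.vector.reduce((s, x, i) => s + x * q[i], 0) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```

The point is the composition, not the internals: ingest is `chunkText` → `embed` → `upsert`, and retrieval is `embed` → score → rank.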

Chunking Strategies

A document rarely fits in a single embedding. Chunking breaks it into pieces that are small enough to embed and specific enough to be useful when retrieved. Smithers ships five strategies:
| Strategy | Splits on | Best for |
| --- | --- | --- |
| `recursive` | Paragraphs, then sentences, then words | General text (default) |
| `character` | Fixed character count | Uniform chunk sizes |
| `sentence` | Sentence boundaries | Prose, articles |
| `markdown` | Headings and sections | Documentation, READMEs |
| `token` | Approximate token count | Token-budget-aware splitting |
Every strategy accepts size (max characters per chunk) and overlap (characters shared between adjacent chunks). Overlap prevents information loss at chunk boundaries.
```typescript
import { chunk, createDocument } from "smithers-orchestrator/rag";

const doc = createDocument("Your long document text here...");
const chunks = chunk(doc, { strategy: "recursive", size: 1000, overlap: 200 });
```
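The overlap mechanics can be illustrated with a minimal character chunker (a sketch, not the Smithers implementation):

```typescript
// Illustrative character chunker: each chunk starts `size - overlap`
// characters after the previous one, so adjacent chunks share `overlap`
// characters and no boundary information is lost.
function chunkWithOverlap(text: string, size: number, overlap: number): string[] {
  const step = size - overlap;
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // final chunk reached the end
  }
  return chunks;
}
```

With `size: 4` and `overlap: 2`, the string `"abcdefghij"` becomes `["abcd", "cdef", "efgh", "ghij"]`; each pair of neighbors shares two characters.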

Embedding

Smithers wraps the Vercel AI SDK’s embed() and embedMany(). You bring any embedding model the AI SDK supports — OpenAI, Google, Mistral, Cohere.
```typescript
import { embedChunks, embedQuery } from "smithers-orchestrator/rag";
import { openai } from "@ai-sdk/openai";

const model = openai.embedding("text-embedding-3-small");
const embedded = await embedChunks(chunks, model);
const queryVector = await embedQuery("How does caching work?", model);
```
The embedder is intentionally thin. It bridges Smithers chunk types to the AI SDK and adds structured logging. No custom vector math.

Vector Store

Vectors live in SQLite. No external database required. The _smithers_vectors table stores each chunk’s text, embedding (as a BLOB), and metadata. Queries do a full-table scan with JavaScript cosine similarity using the AI SDK’s cosineSimilarity(). This is fast enough for typical RAG workloads (hundreds to low thousands of chunks). If you outgrow it, swap in a different store implementation.
```typescript
import { createSqliteVectorStore } from "smithers-orchestrator/rag";

const store = createSqliteVectorStore(workflow.db);
await store.upsert(embedded);
const results = await store.query(queryVector, { topK: 5 });
```
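The full-table scan described above amounts to scoring every row with cosine similarity, then sorting and taking the top K. A self-contained sketch (names are illustrative, not the actual store internals):

```typescript
// Sketch of what the SQLite store's query does in JavaScript.
type Row = { id: string; text: string; embedding: number[] };

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function fullScanQuery(rows: Row[], queryVector: number[], topK: number) {
  return rows
    .map((row) => ({
      id: row.id,
      text: row.text,
      score: cosineSimilarity(queryVector, row.embedding),
    }))
    .sort((a, b) => b.score - a.score) // highest similarity first
    .slice(0, topK);
}
```

This is O(rows × dimensions) per query, which is why the scan stays comfortable at hundreds to low thousands of chunks.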

The RAG Pipeline

createRagPipeline wires all four steps together:
```typescript
import { createRagPipeline } from "smithers-orchestrator/rag";

const pipeline = createRagPipeline({
  vectorStore: store,
  embeddingModel: model,
  chunkOptions: { strategy: "markdown", size: 1000, overlap: 200 },
});

// Ingest
await pipeline.ingest([doc1, doc2]);
await pipeline.ingestFile("./docs/architecture.md");

// Retrieve
const results = await pipeline.retrieve("How does the scheduler work?", { topK: 5 });
```

The RAG Tool

Agents can search the knowledge base themselves. createRagTool exposes the pipeline as a tool:
```typescript
import { createRagTool } from "smithers-orchestrator/rag";

const searchKnowledge = createRagTool(pipeline, {
  name: "search_knowledge",
  description: "Search the project knowledge base",
});
```
Hand this tool to any agent. When the agent calls it, Smithers embeds the query, searches the vector store, and returns the top results. The agent sees text and relevance scores — it never touches raw vectors.

Namespaces

A single vector store can hold multiple document collections. Pass a namespace to keep them separate:
```typescript
const pipeline = createRagPipeline({
  vectorStore: store,
  embeddingModel: model,
  chunkOptions: { strategy: "recursive" },
  namespace: "api-docs",
});
```
All namespaces share the same SQLite table, but each query searches only within its own namespace.

Document Format Detection

When you call createDocument(content), Smithers auto-detects the format so the chunker can split intelligently:
| Detection rule | Assigned format |
| --- | --- |
| Content starts with `{` or `[` and is valid JSON | `json` |
| Content starts with `<!` or `<html` | `html` |
| Content has a line starting with one to six `#` characters | `markdown` |
| Everything else | `text` |
You can override auto-detection by passing format explicitly:
```typescript
const doc = createDocument(content, { format: "markdown" });
```
loadDocument(path) uses the file extension (.md, .mdx, .html, .htm, .json) as a hint before inspecting the content, so Markdown files get heading-aware splitting even when their content contains few or no heading markers.
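The detection rules can be sketched as a small function (illustrative only, not the actual Smithers implementation):

```typescript
// Sketch of content-based format detection following the rules above.
function detectFormat(content: string): "json" | "html" | "markdown" | "text" {
  const trimmed = content.trimStart();
  // Starts with { or [ AND parses as JSON.
  if (trimmed.startsWith("{") || trimmed.startsWith("[")) {
    try {
      JSON.parse(trimmed);
      return "json";
    } catch {
      // not valid JSON — fall through to the other rules
    }
  }
  const lower = trimmed.toLowerCase();
  if (lower.startsWith("<!") || lower.startsWith("<html")) return "html";
  // Any line starting with one to six # characters followed by whitespace.
  if (/^#{1,6}\s/m.test(content)) return "markdown";
  return "text";
}
```

Note the JSON rule requires both conditions: `"[not valid json"` starts with `[` but fails to parse, so it falls through to `text`.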

Deleting Vectors

Remove specific chunks from the vector store by ID:
```typescript
await store.delete(["chunk-id-1", "chunk-id-2"]);
```
Passing an empty array is a no-op. Use this to keep the store current when source documents are updated or removed.

Counting Vectors

Check how many chunks are stored in a namespace:
```typescript
const total = await store.count();             // default namespace
const apiDocs = await store.count("api-docs"); // specific namespace
```
Useful for verifying that ingestion completed and for monitoring store growth over time.

Query Filters

VectorQueryOptions accepts an optional filter map that is passed through to the store implementation. The SQLite store does not apply metadata filters during the SQL query (it scores all rows and sorts), but custom store implementations can use filter to pre-select rows:
```typescript
const results = await store.query(queryVector, {
  topK: 5,
  namespace: "api-docs",
  filter: { source: "architecture.md" },
});
```
When using the default SQLite store, include relevant metadata fields in the document itself and filter the returned results in application code.
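Application-side filtering can look like this (a sketch that assumes each result exposes its chunk metadata; the field names are illustrative):

```typescript
// Post-filter retrieved chunks by a metadata field in application code.
type RetrievedChunk = {
  text: string;
  score: number;
  metadata?: Record<string, unknown>;
};

function filterBySource(results: RetrievedChunk[], source: string): RetrievedChunk[] {
  return results.filter((r) => r.metadata?.source === source);
}
```

Over-fetch (e.g. a larger topK) before filtering so enough matches survive the cut.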

Effect Service Layer

For Effect-native workflows, a RagService Effect layer wraps the pipeline:
```typescript
import { RagService, createRagServiceLayer, retrieve, ingest } from "smithers-orchestrator/rag";
import { Effect } from "effect";

const layer = createRagServiceLayer({
  vectorStore: store,
  embeddingModel: model,
  chunkOptions: { strategy: "markdown" },
});

const program = Effect.gen(function* () {
  yield* ingest([doc]);
  const results = yield* retrieve("How does auth work?", 5);
  return results;
}).pipe(Effect.provide(layer));
```
ingest and retrieve are convenience functions that pull RagService from Effect context automatically.

Observability Metrics

RAG operations export four metrics:
| Metric | Type | Description |
| --- | --- | --- |
| `smithers.rag.ingest_total` | counter | Total documents ingested (incremented per ingest call by document count) |
| `smithers.rag.retrieve_total` | counter | Total retrieval queries executed |
| `smithers.rag.retrieve_duration_ms` | histogram | End-to-end retrieval latency (embed + query) |
| `smithers.rag.embed_duration_ms` | histogram | Time to embed a batch of chunks |
These integrate with the standard Smithers observability pipeline and appear in Prometheus exports and OpenTelemetry traces.
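The counter and histogram semantics can be illustrated with a minimal in-memory recorder (a toy, not the Smithers observability pipeline):

```typescript
// Toy metric recorder showing how the four metrics above are shaped.
const counters = new Map<string, number>();
const histograms = new Map<string, number[]>();

const incCounter = (name: string, by = 1) =>
  counters.set(name, (counters.get(name) ?? 0) + by);
const recordHistogram = (name: string, value: number) =>
  histograms.set(name, [...(histograms.get(name) ?? []), value]);

// Ingesting 3 documents bumps the counter by the document count...
incCounter("smithers.rag.ingest_total", 3);
// ...while each retrieval bumps its counter by one and records its latency.
incCounter("smithers.rag.retrieve_total");
recordHistogram("smithers.rag.retrieve_duration_ms", 42);
```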

CLI

Ingest files and query from the command line:
```shell
# Ingest a file
smithers rag ingest ./docs/api.md --workflow my-workflow.tsx

# Query the knowledge base
smithers rag query "How does authentication work?" --workflow my-workflow.tsx --top-k 5
```