Your agent needs to answer questions about a codebase, a set of design docs, or a knowledge base. The model’s training data does not cover your private documents. You could paste everything into the prompt, but that blows up the context window and costs a fortune. RAG solves this by fetching only the relevant pieces at query time.

The Pipeline

RAG in Smithers is a four-step pipeline:
  1. Chunk — split documents into small, overlapping pieces
  2. Embed — convert each chunk into a vector using an embedding model
  3. Store — persist vectors in a SQLite table alongside the original text
  4. Retrieve — embed the query, find the closest vectors, return the matching chunks
```
Document ──▶ Chunker ──▶ Embedder ──▶ Vector Store
                                           │
Query ──▶ Embedder ──▶ Similarity Search ──┘──▶ Ranked Results
```
Each step is a plain function. You can use them individually or wire them together with createRagPipeline.
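The shape of those four plain functions can be sketched with toy types and a toy embedder (illustrative only; the real Smithers signatures differ):

```typescript
// Conceptual sketch of the four steps as plain functions.
// The types, the bucketing embedder, and the Map store are all toys.
type Chunk = { id: string; text: string };

// 1. Chunk: naive fixed-size splitting (real strategies are smarter).
function chunkText(text: string, size: number): Chunk[] {
  const chunks: Chunk[] = [];
  for (let i = 0; i * size < text.length; i++) {
    chunks.push({ id: `c${i}`, text: text.slice(i * size, (i + 1) * size) });
  }
  return chunks;
}

// 2. Embed: toy embedder bucketing character codes into a small vector.
function embed(text: string): number[] {
  const v = new Array(8).fill(0);
  for (const ch of text) v[ch.charCodeAt(0) % 8] += 1;
  return v;
}

// 3. Store: an in-memory map standing in for the SQLite table.
const store = new Map<string, { text: string; vector: number[] }>();
function upsert(chunks: Chunk[]): void {
  for (const c of chunks) store.set(c.id, { text: c.text, vector: embed(c.text) });
}

// 4. Retrieve: score every stored vector against the query and rank.
function retrieve(query: string, topK: number) {
  const q = embed(query);
  return [...store.values()]
    .map((r) => ({ text: r.text, score: r.vector.reduce((s, x, i) => s + x * q[i], 0) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```

The point is the composition, not the internals: ingest is `chunkText` → `embed` → `upsert`, and retrieval is `embed` → score → rank.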

Chunking Strategies

A document rarely fits in a single embedding. Chunking breaks it into pieces that are small enough to embed and specific enough to be useful when retrieved. Smithers ships five strategies:
| Strategy | Splits on | Best for |
| --- | --- | --- |
| `recursive` | Paragraphs, then sentences, then words | General text (default) |
| `character` | Fixed character count | Uniform chunk sizes |
| `sentence` | Sentence boundaries | Prose, articles |
| `markdown` | Headings and sections | Documentation, READMEs |
| `token` | Approximate token count | Token-budget-aware splitting |
Every strategy accepts size (max characters per chunk) and overlap (characters shared between adjacent chunks). Overlap prevents information loss at chunk boundaries.
```typescript
import { chunk, createDocument } from "smithers-orchestrator/rag";

const doc = createDocument("Your long document text here...");
const chunks = chunk(doc, { strategy: "recursive", size: 1000, overlap: 200 });
```
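The overlap mechanics can be illustrated with a minimal character chunker (a sketch, not the Smithers implementation):

```typescript
// Illustrative character chunker: each chunk starts `size - overlap`
// characters after the previous one, so adjacent chunks share `overlap`
// characters and no boundary information is lost.
function chunkWithOverlap(text: string, size: number, overlap: number): string[] {
  const step = size - overlap;
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // final chunk reached the end
  }
  return chunks;
}
```

With `size: 4` and `overlap: 2`, the string `"abcdefghij"` becomes `["abcd", "cdef", "efgh", "ghij"]`; each pair of neighbors shares two characters.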

Embedding

Smithers wraps the Vercel AI SDK’s embed() and embedMany(). You bring any embedding model the AI SDK supports — OpenAI, Google, Mistral, Cohere.
```typescript
import { embedChunks, embedQuery } from "smithers-orchestrator/rag";
import { openai } from "@ai-sdk/openai";

const model = openai.embedding("text-embedding-3-small");
const embedded = await embedChunks(chunks, model);
const queryVector = await embedQuery("How does caching work?", model);
```
The embedder is intentionally thin. It bridges Smithers chunk types to the AI SDK and adds structured logging. No custom vector math.

Vector Store

Vectors live in SQLite. No external database required. The _smithers_vectors table stores each chunk’s text, embedding (as a BLOB), and metadata. Queries do a full-table scan with JavaScript cosine similarity using the AI SDK’s cosineSimilarity(). This is fast enough for typical RAG workloads (hundreds to low thousands of chunks). If you outgrow it, swap in a different store implementation.
```typescript
import { createSqliteVectorStore } from "smithers-orchestrator/rag";

const store = createSqliteVectorStore(workflow.db);
await store.upsert(embedded);
const results = await store.query(queryVector, { topK: 5 });
```
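The full-table scan described above amounts to scoring every row with cosine similarity, then sorting and taking the top K. A self-contained sketch (names are illustrative, not the actual store internals):

```typescript
// Sketch of what the SQLite store's query does in JavaScript.
type Row = { id: string; text: string; embedding: number[] };

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function fullScanQuery(rows: Row[], queryVector: number[], topK: number) {
  return rows
    .map((row) => ({
      id: row.id,
      text: row.text,
      score: cosineSimilarity(queryVector, row.embedding),
    }))
    .sort((a, b) => b.score - a.score) // highest similarity first
    .slice(0, topK);
}
```

This is O(rows × dimensions) per query, which is why the scan stays comfortable at hundreds to low thousands of chunks.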

The RAG Pipeline

createRagPipeline wires all four steps together:
```typescript
import { createRagPipeline } from "smithers-orchestrator/rag";

const pipeline = createRagPipeline({
  vectorStore: store,
  embeddingModel: model,
  chunkOptions: { strategy: "markdown", size: 1000, overlap: 200 },
});

// Ingest
await pipeline.ingest([doc1, doc2]);
await pipeline.ingestFile("./docs/architecture.md");

// Retrieve
const results = await pipeline.retrieve("How does the scheduler work?", { topK: 5 });
```

The RAG Tool

Agents can search the knowledge base themselves. createRagTool exposes the pipeline as a tool:
```typescript
import { createRagTool } from "smithers-orchestrator/rag";

const searchKnowledge = createRagTool(pipeline, {
  name: "search_knowledge",
  description: "Search the project knowledge base",
});
```
Hand this tool to any agent. When the agent calls it, Smithers embeds the query, searches the vector store, and returns the top results. The agent sees text and relevance scores — it never touches raw vectors.

Namespaces

A single vector store can hold multiple document collections. Pass a namespace to keep them separate:
```typescript
const pipeline = createRagPipeline({
  vectorStore: store,
  embeddingModel: model,
  chunkOptions: { strategy: "recursive" },
  namespace: "api-docs",
});
```
All namespaces share the same SQLite table, but each query searches only within its own namespace.

Document Format Detection

When you call createDocument(content), Smithers auto-detects the format so the chunker can split intelligently:
| Detection rule | Assigned format |
| --- | --- |
| Content starts with `{` or `[` and is valid JSON | `json` |
| Content starts with `<!` or `<html` | `html` |
| Content has a line starting with one to six `#` characters | `markdown` |
| Everything else | `text` |
You can override auto-detection by passing format explicitly:
```typescript
const doc = createDocument(content, { format: "markdown" });
```
loadDocument(path) uses the file extension (.md, .mdx, .html, .htm, .json) as a hint before inspecting the content, so Markdown files get heading-aware splitting even when their content contains few or no heading markers.
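The detection rules can be sketched as a small function (illustrative only, not the actual Smithers implementation):

```typescript
// Sketch of content-based format detection following the rules above.
function detectFormat(content: string): "json" | "html" | "markdown" | "text" {
  const trimmed = content.trimStart();
  // Starts with { or [ AND parses as JSON.
  if (trimmed.startsWith("{") || trimmed.startsWith("[")) {
    try {
      JSON.parse(trimmed);
      return "json";
    } catch {
      // not valid JSON — fall through to the other rules
    }
  }
  const lower = trimmed.toLowerCase();
  if (lower.startsWith("<!") || lower.startsWith("<html")) return "html";
  // Any line starting with one to six # characters followed by whitespace.
  if (/^#{1,6}\s/m.test(content)) return "markdown";
  return "text";
}
```

Note the JSON rule requires both conditions: `"[not valid json"` starts with `[` but fails to parse, so it falls through to `text`.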

Deleting Vectors

Remove specific chunks from the vector store by ID:
```typescript
await store.delete(["chunk-id-1", "chunk-id-2"]);
```
Passing an empty array is a no-op. Use this to keep the store current when source documents are updated or removed.

Counting Vectors

Check how many chunks are stored in a namespace:
```typescript
const total = await store.count();             // default namespace
const apiDocs = await store.count("api-docs"); // specific namespace
```
Useful for verifying that ingestion completed and for monitoring store growth over time.

Query Filters

VectorQueryOptions accepts an optional filter map that is passed through to the store implementation. The SQLite store does not apply metadata filters during the SQL query (it scores all rows and sorts), but custom store implementations can use filter to pre-select rows:
```typescript
const results = await store.query(queryVector, {
  topK: 5,
  namespace: "api-docs",
  filter: { source: "architecture.md" },
});
```
When using the default SQLite store, include relevant metadata fields in the document itself and filter the returned results in application code.
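Application-side filtering can look like this (a sketch that assumes each result exposes its chunk metadata; the field names are illustrative):

```typescript
// Post-filter retrieved chunks by a metadata field in application code.
type RetrievedChunk = {
  text: string;
  score: number;
  metadata?: Record<string, unknown>;
};

function filterBySource(results: RetrievedChunk[], source: string): RetrievedChunk[] {
  return results.filter((r) => r.metadata?.source === source);
}
```

Over-fetch (e.g. a larger topK) before filtering so enough matches survive the cut.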

Effect Service Layer

For Effect-native workflows, a RagService Effect layer wraps the pipeline:
```typescript
import { RagService, createRagServiceLayer, retrieve, ingest } from "smithers-orchestrator/rag";
import { Effect } from "effect";

const layer = createRagServiceLayer({
  vectorStore: store,
  embeddingModel: model,
  chunkOptions: { strategy: "markdown" },
});

const program = Effect.gen(function* () {
  yield* ingest([doc]);
  const results = yield* retrieve("How does auth work?", 5);
  return results;
}).pipe(Effect.provide(layer));
```
ingest and retrieve are convenience functions that pull RagService from Effect context automatically.

Observability Metrics

RAG operations export four metrics:
| Metric | Type | Description |
| --- | --- | --- |
| `smithers.rag.ingest_total` | counter | Total documents ingested (incremented per ingest call by document count) |
| `smithers.rag.retrieve_total` | counter | Total retrieval queries executed |
| `smithers.rag.retrieve_duration_ms` | histogram | End-to-end retrieval latency (embed + query) |
| `smithers.rag.embed_duration_ms` | histogram | Time to embed a batch of chunks |
These integrate with the standard Smithers observability pipeline and appear in Prometheus exports and OpenTelemetry traces.
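The counter and histogram semantics can be illustrated with a minimal in-memory recorder (a toy, not the Smithers observability pipeline):

```typescript
// Toy metric recorder showing how the four metrics above are shaped.
const counters = new Map<string, number>();
const histograms = new Map<string, number[]>();

const incCounter = (name: string, by = 1) =>
  counters.set(name, (counters.get(name) ?? 0) + by);
const recordHistogram = (name: string, value: number) =>
  histograms.set(name, [...(histograms.get(name) ?? []), value]);

// Ingesting 3 documents bumps the counter by the document count...
incCounter("smithers.rag.ingest_total", 3);
// ...while each retrieval bumps its counter by one and records its latency.
incCounter("smithers.rag.retrieve_total");
recordHistogram("smithers.rag.retrieve_duration_ms", 42);
```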

CLI

Ingest files and query from the command line:
```shell
# Ingest a file
smithers rag ingest ./docs/api.md --workflow my-workflow.tsx

# Query the knowledge base
smithers rag query "How does authentication work?" --workflow my-workflow.tsx --top-k 5
```