The Pipeline
RAG in Smithers is a four-step pipeline:
- Chunk: split documents into small, overlapping pieces
- Embed: convert each chunk into a vector using an embedding model
- Store: persist vectors in a SQLite table alongside the original text
- Retrieve: embed the query, find the closest vectors, return the matching chunks
All four steps are combined by createRagPipeline.
Chunking Strategies
A document rarely fits in a single embedding. Chunking breaks it into pieces that are small enough to embed and specific enough to be useful when retrieved. Smithers ships five strategies:

| Strategy | Splits on | Best for |
|---|---|---|
| recursive | Paragraphs, then sentences, then words | General text (default) |
| character | Fixed character count | Uniform chunk sizes |
| sentence | Sentence boundaries | Prose, articles |
| markdown | Headings and sections | Documentation, READMEs |
| token | Approximate token count | Token-budget-aware splitting |

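Chunking is controlled by two parameters: size (max characters per chunk) and overlap (characters shared between adjacent chunks). Overlap prevents information loss at chunk boundaries, because a sentence cut by one chunk's end still appears whole at the start of the next.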
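
To make size and overlap concrete, here is a minimal sketch of fixed-size character chunking. The helper `chunkByCharacters` and the `Chunk` shape are hypothetical names used for illustration; they are not part of the Smithers API, and the built-in strategies may behave differently at boundaries.

```typescript
// Hypothetical helper for illustration; not the Smithers chunker.
interface Chunk {
  index: number;
  start: number; // offset of the chunk's first character in the source text
  text: string;
}

function chunkByCharacters(text: string, size: number, overlap: number): Chunk[] {
  if (size <= 0) throw new Error("size must be positive");
  if (overlap < 0 || overlap >= size) throw new Error("overlap must be in [0, size)");

  // Each chunk starts (size - overlap) characters after the previous one,
  // so adjacent chunks share exactly `overlap` characters.
  const step = size - overlap;
  const chunks: Chunk[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push({ index: chunks.length, start, text: text.slice(start, start + size) });
    if (start + size >= text.length) break; // this chunk already reached the end
  }
  return chunks;
}

// With size 4 and overlap 2, every chunk repeats the last two characters
// of the chunk before it.
const chunks = chunkByCharacters("abcdefghij", 4, 2);
console.log(chunks.map((c) => c.text));
```

A larger overlap gives more context continuity at the cost of more chunks to embed and store.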
Embedding
Smithers wraps the Vercel AI SDK’s embed() and embedMany(). You can bring any embedding model the AI SDK supports, such as OpenAI, Google, Mistral, or Cohere.
Vector Store
Vectors live in SQLite. No external database required. The _smithers_vectors table stores each chunk’s text, embedding (as a BLOB), and metadata. Queries do a full-table scan with JavaScript cosine similarity using the AI SDK’s cosineSimilarity().
This is fast enough for typical RAG workloads (hundreds to low thousands of chunks). If you outgrow it, swap in a different store implementation.
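
The full-scan approach can be sketched in a few lines. The example below is an assumption-laden illustration, not the Smithers implementation: the row shape, `toBlob`, `fromBlob`, and `topK` are hypothetical names, the rows live in an array rather than SQLite, and a hand-written `cosine` stands in for the AI SDK's cosineSimilarity() so the snippet has no dependencies.

```typescript
// Hypothetical row shape; the real _smithers_vectors schema may differ.
interface VectorRow {
  id: string;
  text: string;
  embedding: Uint8Array; // Float32 values as raw bytes, as a BLOB column would hold them
}

// Serialize a vector to bytes (4 bytes per float).
function toBlob(vector: number[]): Uint8Array {
  return new Uint8Array(new Float32Array(vector).buffer);
}

// Deserialize bytes back to floats. Copy first so the view starts at offset 0.
function fromBlob(blob: Uint8Array): Float32Array {
  const copy = blob.slice();
  return new Float32Array(copy.buffer, 0, copy.byteLength / 4);
}

function cosine(a: ArrayLike<number>, b: ArrayLike<number>): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Score every row, sort by similarity, keep the best k.
function topK(rows: VectorRow[], query: number[], k: number) {
  return rows
    .map((row) => ({ id: row.id, text: row.text, score: cosine(fromBlob(row.embedding), query) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

const rows: VectorRow[] = [
  { id: "a", text: "cats", embedding: toBlob([1, 0]) },
  { id: "b", text: "dogs", embedding: toBlob([0, 1]) },
  { id: "c", text: "kittens", embedding: toBlob([0.9, 0.1]) },
];
const hits = topK(rows, [1, 0], 2);
console.log(hits.map((h) => h.id));
```

The cost is linear in the number of stored chunks, which is why this design suits small to medium collections.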
The RAG Pipeline
createRagPipeline wires all four steps together:
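
As a rough mental model of how the four steps fit together, the sketch below builds a toy in-memory pipeline. Everything in it is hypothetical: `createToyPipeline` is not createRagPipeline and does not mirror its signature, the letter-count embedder is a stand-in for a real embedding model, and the calls are synchronous whereas real embedding calls are asynchronous.

```typescript
// Toy stand-ins for each step; none of these names come from Smithers.
type Embedder = (text: string) => number[];

// Toy embedder: a 26-dimensional vector of letter counts (a to z).
const letterCounts: Embedder = (text) => {
  const vector = new Array<number>(26).fill(0);
  const lower = text.toLowerCase();
  for (let i = 0; i < lower.length; i++) {
    const slot = lower.charCodeAt(i) - 97;
    if (slot >= 0 && slot < 26) vector[slot] += 1;
  }
  return vector;
};

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

function createToyPipeline(embed: Embedder, size: number, overlap: number) {
  if (size <= 0 || overlap < 0 || overlap >= size) throw new Error("invalid size/overlap");
  const store: { text: string; vector: number[] }[] = [];
  return {
    // Returns the number of chunks added.
    ingest(document: string): number {
      const step = size - overlap;
      let added = 0;
      for (let start = 0; start < document.length; start += step) {
        const text = document.slice(start, start + size); // 1. chunk
        store.push({ text, vector: embed(text) });        // 2. embed, 3. store
        added++;
        if (start + size >= document.length) break;
      }
      return added;
    },
    retrieve(query: string, topK: number): string[] {
      const queryVector = embed(query);                   // 4. retrieve
      return store
        .map((row) => ({ text: row.text, score: cosine(row.vector, queryVector) }))
        .sort((x, y) => y.score - x.score)
        .slice(0, topK)
        .map((row) => row.text);
    },
  };
}

const pipeline = createToyPipeline(letterCounts, 8, 0);
const added = pipeline.ingest("zzzzzzzzaaaaaaaa");
const best = pipeline.retrieve("aaa", 1);
console.log(added, best);
```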
The RAG Tool
Agents can search the knowledge base themselves. createRagTool exposes the pipeline as a tool:
Namespaces
A single vector store can hold multiple document collections. Pass a namespace to keep them separate:
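
The idea behind namespaces can be illustrated with a small in-memory store. This is a sketch under stated assumptions: `NamespacedStore` and its methods are hypothetical names, not the Smithers store interface, and Smithers keeps these rows in SQLite rather than in an array.

```typescript
// Hypothetical in-memory store for illustration only.
interface StoredChunk {
  id: string;
  namespace: string;
  text: string;
}

class NamespacedStore {
  private rows: StoredChunk[] = [];

  add(namespace: string, id: string, text: string): void {
    this.rows.push({ id, namespace, text });
  }

  // Only rows in the requested namespace are visible to a query.
  list(namespace: string): StoredChunk[] {
    return this.rows.filter((row) => row.namespace === namespace);
  }

  // Number of chunks stored in one namespace.
  count(namespace: string): number {
    return this.list(namespace).length;
  }

  // Remove chunks by ID; returns how many rows were removed.
  remove(ids: string[]): number {
    const doomed = new Set(ids);
    const before = this.rows.length;
    this.rows = this.rows.filter((row) => !doomed.has(row.id));
    return before - this.rows.length;
  }
}

const store = new NamespacedStore();
store.add("docs", "d1", "Install guide");
store.add("docs", "d2", "API reference");
store.add("tickets", "t1", "Login bug");

const docsBefore = store.count("docs");
const removed = store.remove(["d1"]);
const docsAfter = store.count("docs");
console.log(docsBefore, removed, docsAfter, store.count("tickets"));
```

Because every row carries its namespace, one table can serve several collections without their results mixing.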
Document Format Detection
When you call createDocument(content), Smithers auto-detects the format so the chunker can split intelligently:
| Detection rule | Assigned format |
|---|---|
| Content starts with { or [ and is valid JSON | json |
| Content starts with <! or <html | html |
| Content has a line starting with one to six # characters | markdown |
| Everything else | text |
To override detection, pass format explicitly:
loadDocument(path) uses the file extension (.md, .mdx, .html, .htm, .json) as a hint before inspecting the content, so the chunker uses heading-aware splitting for Markdown files even if the heading markers are uncommon.
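
The detection rules and the extension hint can be approximated as follows. This is a sketch, not the Smithers source: `detectFormat` and `formatForPath` are hypothetical names, and several details are assumptions (rules are checked in table order, leading whitespace is ignored, the HTML check is case-insensitive, a heading needs whitespace after the # characters, and .mdx maps to markdown).

```typescript
type DocFormat = "json" | "html" | "markdown" | "text";

// Hypothetical content-based detection following the rules in the table.
function detectFormat(content: string): DocFormat {
  const trimmed = content.trimStart();

  // Rule 1: looks like JSON and actually parses.
  if (trimmed.startsWith("{") || trimmed.startsWith("[")) {
    try {
      JSON.parse(trimmed);
      return "json";
    } catch {
      // Not valid JSON; fall through to the remaining rules.
    }
  }

  // Rule 2: HTML doctype/comment or an <html> root.
  const lower = trimmed.toLowerCase();
  if (lower.startsWith("<!") || lower.startsWith("<html")) return "html";

  // Rule 3: any line that starts with one to six # characters.
  if (/^#{1,6}\s/m.test(content)) return "markdown";

  return "text";
}

// Assumed extension mapping; checked before falling back to content inspection.
const EXTENSION_FORMATS: { [ext: string]: DocFormat | undefined } = {
  ".md": "markdown",
  ".mdx": "markdown",
  ".html": "html",
  ".htm": "html",
  ".json": "json",
};

function formatForPath(path: string, content: string): DocFormat {
  const dot = path.lastIndexOf(".");
  const ext = dot >= 0 ? path.slice(dot).toLowerCase() : "";
  return EXTENSION_FORMATS[ext] ?? detectFormat(content);
}

console.log(detectFormat('{"a": 1}'));
console.log(detectFormat("Intro\n## Setup\nbody"));
console.log(formatForPath("notes.md", "no headings here"));
```

The last call shows why the extension hint matters: the content alone would be classified as text, but the .md extension selects Markdown handling.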
Deleting Vectors
Remove specific chunks from the vector store by ID:
Counting Vectors
Check how many chunks are stored in a namespace:
Query Filters
VectorQueryOptions accepts an optional filter map that is passed through to the store implementation. The SQLite store does not apply metadata filters during the SQL query (it scores all rows and sorts), but custom store implementations can use filter to pre-select rows:
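
A custom store might apply the filter before scoring, roughly like this. The types and the `query` function below are hypothetical and only loosely modelled on the description above; the real VectorQueryOptions fields may differ. Exact-match filtering is an assumption, and a plain dot product stands in for cosine similarity to keep the sketch short.

```typescript
// Hypothetical types for illustration; not the Smithers interfaces.
type Metadata = Record<string, string | number | boolean>;

interface Row {
  id: string;
  vector: number[];
  metadata: Metadata;
}

interface QueryOptions {
  topK: number;
  filter?: Metadata; // every key must match exactly (an assumption)
}

function matches(metadata: Metadata, filter: Metadata | undefined): boolean {
  if (!filter) return true;
  return Object.keys(filter).every((key) => metadata[key] === filter[key]);
}

// Dot product as a stand-in similarity score.
function dot(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

function query(rows: Row[], queryVector: number[], options: QueryOptions): string[] {
  return rows
    .filter((row) => matches(row.metadata, options.filter)) // pre-select before scoring
    .map((row) => ({ id: row.id, score: dot(row.vector, queryVector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, options.topK)
    .map((row) => row.id);
}

const rows: Row[] = [
  { id: "r1", vector: [1, 0], metadata: { source: "docs" } },
  { id: "r2", vector: [0.8, 0.6], metadata: { source: "blog" } },
  { id: "r3", vector: [0.6, 0.8], metadata: { source: "docs" } },
];

const unfiltered = query(rows, [1, 0], { topK: 2 });
const docsOnly = query(rows, [1, 0], { topK: 2, filter: { source: "docs" } });
console.log(unfiltered, docsOnly);
```

Filtering first means fewer rows are scored, which is where a store backed by an indexed database can save work.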
Effect Service Layer
For Effect-native workflows, a RagService Effect layer wraps the pipeline:
ingest and retrieve are convenience functions that pull RagService from Effect context automatically.
Observability Metrics
RAG operations export four metrics:

| Metric | Type | Description |
|---|---|---|
| smithers.rag.ingest_total | counter | Total documents ingested (incremented per ingest call by document count) |
| smithers.rag.retrieve_total | counter | Total retrieval queries executed |
| smithers.rag.retrieve_duration_ms | histogram | End-to-end retrieval latency (embed + query) |
| smithers.rag.embed_duration_ms | histogram | Time to embed a batch of chunks |