Technical roadmap for implementing production-grade semantic search and RAG capabilities in doc-agent.
Support multiple vector store backends via a common interface. The vector store is decoupled from chunk storage—it only knows about IDs and embeddings.
```typescript
interface VectorStoreItem {
  id: string; // maps to Chunk.id
  embedding: number[];
  metadata?: Record<string, unknown>; // for filtering
}

interface VectorStoreResult {
  id: string;
  score: number;
}

interface VectorStore {
  name: string;
  insert(items: VectorStoreItem[]): Promise<void>;
  search(
    queryEmbedding: number[],
    topK: number,
    filters?: Record<string, unknown>
  ): Promise<VectorStoreResult[]>;
  delete(ids: string[]): Promise<void>;
}
```

Implementations:

- `CustomVectorStore` — brute-force → HNSW
- `LanceDBVectorStore` — baseline comparison
The search orchestrator hydrates results by joining VectorStoreResult.id against the chunks table.
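The hydration step can be sketched as a simple in-memory join; the names below are illustrative, not doc-agent's actual code, and the `Map` stands in for a `SELECT ... WHERE id IN (...)` against the chunks table:

```typescript
// Hypothetical sketch: join vector-store hits back to full chunks by ID.
interface VectorStoreResult { id: string; score: number }
interface Chunk { id: string; content: string }

function hydrate(
  results: VectorStoreResult[],
  chunksById: Map<string, Chunk> // stand-in for a lookup on the chunks table
): Array<{ chunk: Chunk; score: number }> {
  return results.flatMap((r) => {
    const chunk = chunksById.get(r.id);
    return chunk ? [{ chunk, score: r.score }] : []; // drop stale IDs
  });
}
```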
| Strategy | Flag | Implementation | Best For |
|---|---|---|---|
| Line | `--chunk line` | Split on `\n`, group empty lines | Receipts, invoices |
| Sentence | `--chunk sentence` | NLP tokenizer | Natural text |
| Semantic | `--chunk semantic` | LLM-assisted boundary detection | Contracts, reports |
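As a concrete reading of the line strategy, splitting on newlines and closing a group at blank lines might look like this (an illustrative interpretation, not doc-agent's actual implementation):

```typescript
// Sketch of the `line` strategy: one or more blank lines ends a group.
function chunkByLine(text: string): string[] {
  return text
    .split(/\n\s*\n/)               // blank line(s) act as group boundaries
    .map((group) => group.trim())
    .filter((group) => group.length > 0);
}
```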
Auto-routing by document type:

- Receipts/invoices → `line`
- Bank statements → `line` or `sentence`
- Contracts/reports → `semantic`
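The routing above reduces to a small lookup; the document-type names and the `sentence` fallback here are assumptions for illustration:

```typescript
type ChunkingStrategy = 'line' | 'sentence' | 'semantic';

// Illustrative auto-routing: map a detected document type to a default
// strategy, falling back to `sentence` for unknown types (assumed default).
function routeStrategy(docType: string): ChunkingStrategy {
  switch (docType) {
    case 'receipt':
    case 'invoice':
    case 'bank-statement':
      return 'line';
    case 'contract':
    case 'report':
      return 'semantic';
    default:
      return 'sentence';
  }
}
```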
```typescript
interface EmbeddingProvider {
  name: string;
  dims: number;
  embed(texts: string[]): Promise<number[][]>;
}
```

| Provider | Models | Notes |
|---|---|---|
| Ollama (default) | `nomic-embed-text`, `mxbai-embed-large` | Local, no API key |
| OpenAI | `text-embedding-3-small` | High quality |
| Gemini | `text-embedding-005`, `text-multilingual-embedding-002` | Multilingual support |
| Transformers.js | Local ONNX | Zero external deps |
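A minimal Ollama-backed implementation of the interface might look like the sketch below. It assumes Ollama's `/api/embed` batch endpoint (`model` + `input` → `embeddings`); the default model, dimension, and error handling are illustrative, not doc-agent's actual code:

```typescript
interface EmbeddingProvider {
  name: string;
  dims: number;
  embed(texts: string[]): Promise<number[][]>;
}

class OllamaEmbeddingProvider implements EmbeddingProvider {
  name = 'ollama';
  dims = 768; // nomic-embed-text output size (assumed default model)

  constructor(
    private model = 'nomic-embed-text',
    private baseUrl = 'http://localhost:11434'
  ) {}

  async embed(texts: string[]): Promise<number[][]> {
    const res = await fetch(`${this.baseUrl}/api/embed`, {
      method: 'POST',
      headers: { 'content-type': 'application/json' },
      body: JSON.stringify({ model: this.model, input: texts }),
    });
    if (!res.ok) throw new Error(`ollama embed failed: ${res.status}`);
    const json = (await res.json()) as { embeddings: number[][] };
    return json.embeddings;
  }
}
```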
```typescript
interface LLMProvider {
  name: string;
  generate(prompt: string, options?: { system?: string }): Promise<string>;
}
```

| Provider | Models | Notes |
|---|---|---|
| Ollama (default) | `llama3.2`, `mistral` | Local |
| OpenAI | `gpt-4o-mini` | High quality |
| Gemini | `gemini-1.5-flash` | Fast |
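An Ollama-backed `LLMProvider` can follow the same shape as the embedding provider. This sketch assumes Ollama's `/api/generate` endpoint with streaming disabled; the wiring and defaults are assumptions:

```typescript
interface LLMProvider {
  name: string;
  generate(prompt: string, options?: { system?: string }): Promise<string>;
}

class OllamaLLMProvider implements LLMProvider {
  name = 'ollama';

  constructor(
    private model = 'llama3.2',
    private baseUrl = 'http://localhost:11434'
  ) {}

  async generate(prompt: string, options?: { system?: string }): Promise<string> {
    const res = await fetch(`${this.baseUrl}/api/generate`, {
      method: 'POST',
      headers: { 'content-type': 'application/json' },
      body: JSON.stringify({
        model: this.model,
        prompt,
        system: options?.system,
        stream: false, // one JSON body instead of NDJSON chunks
      }),
    });
    if (!res.ok) throw new Error(`ollama generate failed: ${res.status}`);
    const json = (await res.json()) as { response: string };
    return json.response;
  }
}
```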
```sql
CREATE TABLE chunks (
  id TEXT PRIMARY KEY,
  document_id INTEGER REFERENCES documents(id),
  content TEXT NOT NULL,
  metadata JSON,
  chunk_index INTEGER NOT NULL
);
```

Embedding storage:
- Phase 1: SQLite BLOB (brute-force search)
- Phase 2+: Vector store's native format (HNSW memory-mapped files)
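The Phase 1 brute-force path is just cosine similarity against every stored embedding, sorted and truncated to top-K. A minimal sketch (function names are illustrative):

```typescript
// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Brute-force search: score everything, sort descending, keep top-K.
function bruteForceSearch(
  query: number[],
  items: Array<{ id: string; embedding: number[] }>,
  topK: number
): Array<{ id: string; score: number }> {
  return items
    .map((it) => ({ id: it.id, score: cosine(query, it.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```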
An FTS5 external-content index sits alongside vector search:

```sql
-- External-content index over chunks.content. FTS5 keys external content
-- by integer rowid (content_rowid defaults to 'rowid'), so the TEXT chunk
-- id is recovered by joining the implicit rowid back to chunks at query time.
CREATE VIRTUAL TABLE chunks_fts USING fts5(
  content,
  content='chunks'
);
```

Both FTS5 and vector search ultimately yield `chunk.id`, enabling fusion:
```typescript
interface HybridSearchResult {
  chunk: Chunk;
  vectorScore?: number;
  keywordScore?: number;
  combinedScore: number;
  ranks: {
    vectorRank?: number;
    keywordRank?: number;
  };
}
```

Search modes:

- `--mode vector` — cosine similarity only
- `--mode keyword` — BM25 only
- `--mode hybrid` — RRF fusion
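Reciprocal Rank Fusion merges the vector and keyword rankings without comparing their raw scores: each list contributes `1 / (k + rank)` per ID. A sketch, using the conventional `k = 60` (the exact function signature here is an assumption):

```typescript
// RRF: sum 1/(k + rank) across ranked lists, then sort by fused score.
function rrfFusion(
  rankings: string[][], // each inner array is chunk IDs in rank order
  k = 60
): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const list of rankings) {
    list.forEach((id, i) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + i + 1));
    });
  }
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}
```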
```typescript
interface Reranker {
  rerank(query: string, candidates: ScoredChunk[]): Promise<ScoredChunk[]>;
}
```

The reranker receives scored results to preserve retrieval context for debugging and score blending.
```typescript
interface RAGResponse {
  answer: string;
  chunks: RAGChunk[];
  debug?: {
    vectorResults: ScoredChunk[];
    keywordResults: ScoredChunk[];
    rerankedResults: ScoredChunk[];
    stats: {
      vectorLatencyMs: number;
      keywordLatencyMs?: number;
      rerankLatencyMs?: number;
      totalLatencyMs: number;
    };
  };
}
```

Exposed via:

- CLI: `doc search "query" --rag`
- MCP: `search_documents` tool
- HTTP: `POST /rag` (optional)
```typescript
interface EvalQuery {
  id: string;
  query: string;
  relevantChunkIds: string[];
  category?: string;
}

interface EvalDataset {
  name: string;
  chunks: Chunk[];
  queries: EvalQuery[];
}

interface EvalResult {
  recallAtK: Record<number, number>;
  precisionAtK: Record<number, number>;
  mrr: number;
  byCategory?: Record<string, EvalResult>;
}
```

- Chunking module (`line`, `sentence`)
- Embedding provider abstraction + Ollama implementation
- Custom vector store with brute-force cosine similarity
- `chunks` table in SQLite
- CLI: `doc ingest` and `doc search`
- Evaluation harness
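The harness's core metrics reduce to a few lines each. A sketch, assuming `relevantChunkIds` is the ground truth and each search returns a ranked list of chunk IDs:

```typescript
// Recall@k: fraction of relevant IDs found in the top-k retrieved IDs.
function recallAtK(retrieved: string[], relevant: string[], k: number): number {
  const top = new Set(retrieved.slice(0, k));
  const hits = relevant.filter((id) => top.has(id)).length;
  return relevant.length === 0 ? 0 : hits / relevant.length;
}

// MRR: mean over queries of 1 / (rank of the first relevant hit), 0 if none.
function mrr(retrievedPerQuery: string[][], relevantPerQuery: string[][]): number {
  const reciprocalRanks = retrievedPerQuery.map((retrieved, qi) => {
    const relevant = new Set(relevantPerQuery[qi]);
    const idx = retrieved.findIndex((id) => relevant.has(id));
    return idx === -1 ? 0 : 1 / (idx + 1);
  });
  return reciprocalRanks.reduce((a, b) => a + b, 0) / reciprocalRanks.length;
}
```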
```
packages/vector-store/src/
├── chunking/
│   ├── types.ts
│   ├── line.ts
│   └── sentence.ts
├── embeddings/
│   ├── types.ts
│   └── ollama.ts
├── stores/
│   ├── types.ts
│   └── custom.ts
├── eval/
│   ├── types.ts
│   ├── dataset.ts
│   └── metrics.ts
├── search.ts
└── index.ts
```
- `Chunk` and `ChunkingStrategy` types
- Line chunker
- Sentence chunker
- `EmbeddingProvider` interface
- Ollama embedding provider
- `VectorStore` interface
- Brute-force cosine similarity
- `chunks` schema migration
- Search orchestrator
- `doc ingest <file>` command
- `doc search <query>` command
- Evaluation dataset
- `doc eval` command
- Chunk size vs recall@k
- Embedding latency by provider
- FTS5 integration for keyword search
- BM25 scoring
- Reciprocal Rank Fusion (RRF)
- HNSW index
- Metadata filtering
```
packages/vector-store/src/
├── ranking/
│   ├── bm25.ts
│   ├── rrf.ts
│   └── hybrid.ts
├── stores/
│   └── hnsw.ts
```
- FTS5 virtual table + sync triggers
- `bm25Search()` function
- `rrfFusion()` function
- `HybridSearchResult` type
- `hybridSearch()` orchestrator
- `--mode vector|keyword|hybrid` flag
- HNSW vector store
- `--filter` metadata filtering
- Vector vs keyword vs hybrid recall
- HNSW accuracy vs brute-force
- HNSW latency vs `ef` parameter
- Custom vs LanceDB comparison
- LLM provider abstraction
- Reranking
- RAG engine with citations
- MCP tool integration
- Provider comparison
```
packages/vector-store/src/
├── llm/
│   ├── types.ts
│   └── ollama.ts
├── rerank/
│   ├── types.ts
│   └── ollama.ts
├── rag/
│   ├── types.ts
│   ├── engine.ts
│   └── prompts.ts
```
- `LLMProvider` interface
- Ollama LLM provider
- `Reranker` interface
- Ollama reranker
- `runRAG()` engine
- RAG prompt templates
- `doc search --rag` command
- MCP `search_documents` tool
- Provider comparison report
- Reranking impact on precision
- Context window size vs answer quality
- Embedding provider comparison (recall, latency)
- HTTP server (`POST /rag`)
- Search debugging UI
- OpenAI / Gemini providers
- Transformers.js embeddings
- Semantic chunking
- Index persistence
- Embeddings versioning
- Query caching
- Multi-modal search
```typescript
// ─────────────────────────────────────────────────────────────
// Chunking
// ─────────────────────────────────────────────────────────────
interface Chunk {
  id: string;
  documentId: string;
  content: string;
  index: number;
  metadata: {
    page?: number;
    section?: string;
    source: string;
    [key: string]: unknown;
  };
}

type ChunkingStrategy = 'line' | 'sentence' | 'semantic';

interface Chunker {
  strategy: ChunkingStrategy;
  chunk(text: string, documentId: string, metadata?: Record<string, unknown>): Chunk[];
}

// ─────────────────────────────────────────────────────────────
// Embeddings
// ─────────────────────────────────────────────────────────────
interface EmbeddingProvider {
  name: string;
  dims: number;
  embed(texts: string[]): Promise<number[][]>;
}

// ─────────────────────────────────────────────────────────────
// Vector Store
// ─────────────────────────────────────────────────────────────
interface VectorStoreItem {
  id: string;
  embedding: number[];
  metadata?: Record<string, unknown>;
}

interface VectorStoreResult {
  id: string;
  score: number;
}

interface VectorStore {
  name: string;
  insert(items: VectorStoreItem[]): Promise<void>;
  search(
    queryEmbedding: number[],
    topK: number,
    filters?: Record<string, unknown>
  ): Promise<VectorStoreResult[]>;
  delete(ids: string[]): Promise<void>;
}

// ─────────────────────────────────────────────────────────────
// LLM
// ─────────────────────────────────────────────────────────────
interface LLMProvider {
  name: string;
  generate(prompt: string, options?: { system?: string }): Promise<string>;
}

// ─────────────────────────────────────────────────────────────
// Ranking
// ─────────────────────────────────────────────────────────────
interface ScoredChunk {
  chunk: Chunk;
  vectorScore?: number;
  keywordScore?: number;
  combinedScore: number;
}

interface HybridSearchResult extends ScoredChunk {
  ranks: {
    vectorRank?: number;
    keywordRank?: number;
  };
}

interface Reranker {
  rerank(query: string, candidates: ScoredChunk[]): Promise<ScoredChunk[]>;
}

// ─────────────────────────────────────────────────────────────
// RAG
// ─────────────────────────────────────────────────────────────
interface RAGRequest {
  query: string;
  topK?: number;
  mode?: 'vector' | 'keyword' | 'hybrid';
  filters?: Record<string, unknown>;
  rerank?: boolean;
}

interface RAGChunk {
  id: string;
  content: string;
  score: number;
  source: {
    documentId: string;
    filename: string;
    page?: number;
  };
}

interface RAGResponse {
  answer: string;
  chunks: RAGChunk[];
  debug?: {
    vectorResults: ScoredChunk[];
    keywordResults: ScoredChunk[];
    rerankedResults: ScoredChunk[];
    stats: {
      vectorLatencyMs: number;
      keywordLatencyMs?: number;
      rerankLatencyMs?: number;
      totalLatencyMs: number;
    };
  };
}

// ─────────────────────────────────────────────────────────────
// Evaluation
// ─────────────────────────────────────────────────────────────
interface EvalQuery {
  id: string;
  query: string;
  relevantChunkIds: string[];
  category?: string;
}

interface EvalDataset {
  name: string;
  description?: string;
  chunks: Chunk[];
  queries: EvalQuery[];
}

interface EvalResult {
  recallAtK: Record<number, number>;
  precisionAtK: Record<number, number>;
  mrr: number;
  byCategory?: Record<string, EvalResult>;
}
```

```bash
# Ingestion
doc ingest <file>
doc ingest <file> --chunk line|sentence|semantic
doc ingest <file> --embed-provider ollama|openai|gemini|transformers
doc ingest <file> --embed-model <model-name>

# Search
doc search <query>
doc search <query> --mode vector|keyword|hybrid
doc search <query> --vector-store custom|lancedb
doc search <query> --filter "key:value"
doc search <query> --rag
doc search <query> --rerank
doc search <query> --top-k 10
doc search <query> --json

# Evaluation
doc eval --dataset <path>
doc eval --compare ollama,openai,gemini

# Servers
doc mcp
doc serve --port 3000
```

- HNSW Paper — Hierarchical Navigable Small World graphs
- BM25 Explained
- RRF Paper — Reciprocal Rank Fusion
- RAG Paper — Retrieval-Augmented Generation