Installation, quick start, and full API reference for the ragforge package.
git clone https://github.com/Vaibhav1196/hybrid-rag-lab.git
cd hybrid-rag-lab
uv venv .venv --python 3.11
uv pip install -e ".[dev]"
from ragforge.retrieval.pipeline import HybridPipeline
pipeline = HybridPipeline.from_directory("/path/to/data", chunk_size=300, overlap=50)
results = pipeline.search("what is hybrid retrieval", top_k=3)
for r in results:
    print(f"{r.score:.4f} {r.chunk.doc_id} {r.chunk.text[:80]}...")
0.0323 hybrid_retrieval Hybrid retrieval combines sparse retrieval, such as BM25...
0.0161 rag Retrieval-Augmented Generation, or RAG, is a pattern...
0.0161 langchain LangChain is a framework designed to simplify the...
uv run python scripts/retrieval/run_bm25_retrieval.py --data-dir /path/to/data --query 'hybrid retrieval' --top-k 3
uv run python scripts/retrieval/run_dense_retrieval.py --data-dir /path/to/data --query 'hybrid retrieval' --top-k 3
uv run python scripts/retrieval/run_hybrid_retrieval.py --data-dir /path/to/data --query 'hybrid retrieval' --top-k 3
uv run python scripts/retrieval/run_reranked_retrieval.py --data-dir /path/to/data --query 'hybrid retrieval' --top-k 3 --candidate-top-k 3
uv run python scripts/generation/run_rag_pipeline.py --data-dir /path/to/data --query 'What is RAG?' --llm-mode fallback --retrieval-top-k 3 --candidate-top-k 3
uv run python scripts/evaluation/run_retrieval_evaluation.py --pipeline reranked --data-dir /path/to/data --eval-path /path/to/retrieval_eval.jsonl --top-k 3 --candidate-top-k 3
uv run pytest
Classes, functions, and protocols organized by module.
Shared data objects — the common language of the whole system.
Document
dataclass
A fully loaded source document with doc_id, text, and metadata.
Chunk
dataclass
A smaller piece of a Document. Carries chunk_id, doc_id, text, and metadata including chunk_index.
RetrievalResult
dataclass
A ranked result from any retriever. Contains the Chunk, a float score, and the source method name.
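To make the relationship between these three objects concrete, here is a minimal sketch using plain dataclasses. The field names doc_id, text, metadata, chunk_id, chunk_index, and score come from the descriptions above. The `source` field name, the type annotations, and the defaults are assumptions for illustration, not the package's actual definitions.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Document:
    doc_id: str
    text: str
    metadata: dict[str, Any] = field(default_factory=dict)


@dataclass
class Chunk:
    chunk_id: str
    doc_id: str  # links the chunk back to its parent Document
    text: str
    metadata: dict[str, Any] = field(default_factory=dict)  # e.g. {"chunk_index": 0}


@dataclass
class RetrievalResult:
    chunk: Chunk
    score: float
    source: str  # assumed field name for the retrieval method, e.g. "bm25"


doc = Document(doc_id="rag", text="Retrieval-Augmented Generation, or RAG, is a pattern.")
chunk = Chunk(
    chunk_id="rag-0",  # hypothetical ID scheme
    doc_id=doc.doc_id,
    text=doc.text[:30],
    metadata={"chunk_index": 0},
)
result = RetrievalResult(chunk=chunk, score=0.0161, source="hybrid")
```

Because every result carries its Chunk, and every Chunk carries its doc_id, a caller can always trace a score back to the source document.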
Lightweight observability — per-stage latency tracking.
StageTiming
dataclass
A single (stage_name, duration_ms) timing entry.
PipelineTrace
dataclass
Collects StageTiming entries and arbitrary metadata. Use .add_stage() to record, .total_duration_ms to sum.
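The sketch below shows one way per-stage timing of this kind could work. Only the names StageTiming, PipelineTrace, add_stage, total_duration_ms, stage_name, and duration_ms come from the descriptions above. The `stages` attribute, the add_stage signature, and the use of a property are assumptions.

```python
import time
from dataclasses import dataclass, field
from typing import Any


@dataclass
class StageTiming:
    stage_name: str
    duration_ms: float


@dataclass
class PipelineTrace:
    stages: list[StageTiming] = field(default_factory=list)  # assumed attribute name
    metadata: dict[str, Any] = field(default_factory=dict)

    def add_stage(self, stage_name: str, duration_ms: float) -> None:
        # Assumed signature: record one finished stage.
        self.stages.append(StageTiming(stage_name, duration_ms))

    @property
    def total_duration_ms(self) -> float:
        return sum(stage.duration_ms for stage in self.stages)


trace = PipelineTrace()
start = time.perf_counter()
sorted(range(1000))  # stand-in for a real pipeline stage
trace.add_stage("retrieval", (time.perf_counter() - start) * 1000)
trace.add_stage("generation", 12.5)  # fixed value, for illustration only
```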
Document loading and chunking.
load_text_documents(data_dir)
function
Load all .txt files from a directory into Document objects. Skips empty files. Returns list[Document].
chunk_text(text, chunk_size, overlap)
function
Split a string into overlapping character-based chunks. Returns list[str].
chunk_documents(documents, chunk_size, overlap)
function
Convert list[Document] into list[Chunk], preserving doc_id lineage and chunk_index in metadata.
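As an illustration of the overlapping character-based chunking that chunk_text describes, here is a minimal sketch. The function name chunk_text_sketch, the argument validation, and the handling of the final partial chunk are assumptions. The package's own implementation may differ in these details.

```python
def chunk_text_sketch(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into windows of chunk_size characters that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


chunks = chunk_text_sketch("abcdefghij", chunk_size=4, overlap=1)
# chunks == ["abcd", "defg", "ghij"]
```

Each chunk starts with the last `overlap` characters of the previous one, so a sentence that straddles a boundary still appears intact in at least one chunk when it is shorter than the overlap.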
Sparse retrieval using BM25 probabilistic ranking.
tokenize(text)
function
Lowercase whitespace tokenizer for BM25 indexing.
BM25Retriever
class
Sparse retriever wrapping rank_bm25.BM25Okapi. Initialize with chunks, call .search(query, top_k) to retrieve.
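A lowercase whitespace tokenizer can be sketched in one line. The version below is an assumption about what such a tokenizer does, not the package's source, and the name tokenize_sketch is hypothetical.

```python
def tokenize_sketch(text: str) -> list[str]:
    """Lowercase the text and split on runs of whitespace."""
    return text.lower().split()


tokens = tokenize_sketch("Hybrid retrieval, explained")
# tokens == ["hybrid", "retrieval,", "explained"]
```

Note that a pure whitespace split leaves punctuation attached to tokens ("retrieval," above), so under this assumption a query term only matches an indexed term when both carry the same punctuation.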
Dense semantic retrieval using embeddings + FAISS.
DenseRetriever
class
Vector search over chunk embeddings. Initialize with chunks + a TextEmbedder. Call .search(query, top_k) to retrieve by cosine similarity.
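To show what retrieval by cosine similarity means, here is a brute-force sketch in pure Python. It stands in for the embedding model and the FAISS index and does not use either. The function names and the toy 2-dimensional vectors are hypothetical.

```python
import math


def normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def cosine_top_k(
    query_vec: list[float], chunk_vecs: list[list[float]], top_k: int
) -> list[tuple[int, float]]:
    """Return (chunk index, cosine similarity) pairs for the top_k most similar chunks."""
    q = normalize(query_vec)
    scored = []
    for idx, vec in enumerate(chunk_vecs):
        v = normalize(vec)
        # For unit vectors, the dot product equals the cosine similarity.
        scored.append((idx, sum(a * b for a, b in zip(q, v))))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


hits = cosine_top_k([1.0, 0.0], [[0.0, 1.0], [2.0, 0.0], [1.0, 1.0]], top_k=2)
# The chunk at index 1 points in the same direction as the query, so it ranks first.
```

Once vectors are normalized, cosine similarity reduces to an inner product, which is why normalized embeddings pair naturally with an inner-product index.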
Embedding interfaces and implementations.
TextEmbedder
protocol
Interface for any embedding model. Must implement .encode(texts) → np.ndarray. Swap in OpenAI, Cohere, or any provider.
SentenceTransformerEmbedder
class
Concrete embedder wrapping sentence-transformers. Default model: all-MiniLM-L6-v2 (384d). Outputs normalized float32 vectors.
Reciprocal Rank Fusion for combining retriever outputs.
reciprocal_rank_fusion(result_lists, top_k, k)
function
Merge multiple ranked lists using RRF. Combines ranks, not scores. Formula: score = Σ 1/(k + rank). Default k=60.
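The formula above can be sketched in a few lines. This version works on lists of chunk IDs rather than RetrievalResult objects and counts ranks from 1. Both choices, and the name rrf_sketch, are assumptions, so the scores it produces may not match the package's output exactly.

```python
from collections import defaultdict


def rrf_sketch(
    result_lists: list[list[str]], top_k: int, k: int = 60
) -> list[tuple[str, float]]:
    """Fuse ranked ID lists: each appearance contributes 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranked_ids in result_lists:
        for rank, chunk_id in enumerate(ranked_ids, start=1):  # assumed 1-based ranks
            scores[chunk_id] += 1.0 / (k + rank)
    fused = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return fused[:top_k]


bm25_ids = ["hybrid_retrieval", "rag", "langchain"]
dense_ids = ["hybrid_retrieval", "langchain", "rag"]
fused = rrf_sketch([bm25_ids, dense_ids], top_k=3)
# "hybrid_retrieval" is ranked first by both retrievers, so it has the highest fused score.
```

Because only ranks enter the sum, BM25 scores and cosine similarities never need to be put on a common scale. A larger k flattens the difference between adjacent ranks.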
Cross-encoder reranking for precision-optimized results.
QueryDocumentScorer
protocol
Interface for scoring (query, document) pairs. Must implement .score(query, texts) → list[float].
CrossEncoderScorer
class
Concrete scorer wrapping cross-encoder models. Default: ms-marco-MiniLM-L-6-v2.
RetrievalReranker
class
Reranks a list of RetrievalResults using a QueryDocumentScorer. Call .rerank(query, results, top_k).
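The scorer interface makes the reranker independent of any particular model. The sketch below pairs a protocol with the .score(query, texts) shape described above with a toy term-overlap scorer, so it runs without downloading a cross-encoder. TermOverlapScorer and rerank_sketch are hypothetical names, and the sketch reranks plain strings rather than RetrievalResult objects.

```python
from typing import Protocol


class QueryDocumentScorer(Protocol):
    def score(self, query: str, texts: list[str]) -> list[float]: ...


class TermOverlapScorer:
    """Toy scorer: fraction of query terms that appear in each text."""

    def score(self, query: str, texts: list[str]) -> list[float]:
        terms = set(query.lower().split())
        if not terms:
            return [0.0 for _ in texts]
        return [len(terms & set(t.lower().split())) / len(terms) for t in texts]


def rerank_sketch(
    query: str, texts: list[str], scorer: QueryDocumentScorer, top_k: int
) -> list[tuple[str, float]]:
    """Score every candidate against the query and keep the top_k best."""
    scores = scorer.score(query, texts)
    ranked = sorted(zip(texts, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]


candidates = [
    "LangChain is a framework",
    "Hybrid retrieval combines sparse and dense retrieval",
    "RAG is a retrieval pattern",
]
top = rerank_sketch("hybrid retrieval", candidates, TermOverlapScorer(), top_k=2)
```

Any object with a matching score method satisfies the protocol, so swapping the toy scorer for a real cross-encoder requires no change to the reranking function.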
Ready-to-use retrieval pipelines combining all components.
BM25Pipeline
class
End-to-end sparse retrieval. Build with .from_directory(path) or .from_documents(docs). Call .search(query, top_k).
DensePipeline
class
End-to-end dense retrieval. Same interface. Accepts optional custom TextEmbedder.
HybridPipeline
class
BM25 + Dense + RRF fusion. The core hybrid retriever. Accepts rrf_k parameter.
RerankedHybridPipeline
class
HybridPipeline + cross-encoder reranking. The most complete retrieval pipeline. Accepts candidate_top_k for the reranking pool size.
Context construction and LLM-based answer generation.
ContextBuilder
class
Assembles retrieval results into prompt-ready context with citation IDs. Respects max_chunks and max_chars budget.
ChatLLM
protocol
Interface for chat LLMs. Must implement .generate(system_prompt, user_prompt) → LLMResponse.
ExtractiveFallbackLLM
class
Offline-safe fallback that extracts answers from context without an external API.
OpenAICompatibleLLM
class
Minimal client for any OpenAI-compatible API. Uses stdlib only (no openai package needed).
HuggingFaceInferenceLLM
class
Hugging Face Inference Providers client built on the OpenAI-compatible router. Uses HF_TOKEN or an explicit api_key.
RAGPipeline
class
End-to-end retrieval → context → generation pipeline. Call .answer(query) to get a GenerationResponse with trace.
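As an illustration of the max_chunks and max_chars budget that ContextBuilder respects, here is a minimal sketch. The citation format "[n] (doc_id) text", the rule of stopping at the first chunk that would exceed the budget, and the name build_context_sketch are all assumptions.

```python
def build_context_sketch(
    chunks: list[tuple[str, str]], max_chunks: int, max_chars: int
) -> str:
    """Join (doc_id, text) pairs into numbered context blocks within a character budget."""
    blocks = []
    used = 0
    for number, (doc_id, text) in enumerate(chunks[:max_chunks], start=1):
        block = f"[{number}] ({doc_id}) {text}"  # assumed citation format
        if used + len(block) > max_chars:
            break  # adding this block would exceed the character budget
        blocks.append(block)
        used += len(block)
    return "\n".join(blocks)


context = build_context_sketch(
    [("rag", "RAG is a pattern."), ("langchain", "LangChain is a framework.")],
    max_chunks=2,
    max_chars=40,
)
# Only the first block fits within 40 characters:
# context == "[1] (rag) RAG is a pattern."
```

The numbered prefix gives the LLM a stable citation ID to refer to, and the budget keeps the prompt within a predictable size regardless of how many chunks retrieval returns.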
Retrieval and answer quality evaluation.
evaluate_retrieval(pipeline, samples, top_k)
function
Evaluate a retrieval pipeline on labeled queries. Returns RetrievalEvaluationReport with Hit Rate@k, MRR@k, Recall@k.
evaluate_answer_heuristics(sample, response)
function
Fast heuristic checks: answer_present, cites_context, grounded_to_relevant_context, reference_term_overlap.
build_llm_judge_prompts(sample, response)
function
Build rubric prompt for LLM-as-judge scoring (groundedness, correctness, completeness, overall).
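The three retrieval metrics reported by evaluate_retrieval can be sketched for a single query as follows. The function name, the use of document IDs as the unit of relevance, and the dictionary keys are assumptions. A full report would average these values over all labeled queries.

```python
def retrieval_metrics_sketch(
    retrieved: list[str], relevant: set[str], k: int
) -> dict[str, float]:
    """Compute Hit Rate@k, reciprocal rank@k, and Recall@k for one query."""
    top = retrieved[:k]
    hits = [doc_id for doc_id in top if doc_id in relevant]

    # Hit Rate@k: 1 if at least one relevant item appears in the top k.
    hit_rate = 1.0 if hits else 0.0

    # Reciprocal rank: 1 / position of the first relevant item (0 if none).
    reciprocal_rank = 0.0
    for rank, doc_id in enumerate(top, start=1):
        if doc_id in relevant:
            reciprocal_rank = 1.0 / rank
            break

    # Recall@k: share of all relevant items that were retrieved in the top k.
    recall = len(set(hits)) / len(relevant) if relevant else 0.0
    return {"hit_rate": hit_rate, "mrr": reciprocal_rank, "recall": recall}


metrics = retrieval_metrics_sketch(["langchain", "rag", "faiss"], {"rag", "bm25"}, k=3)
# metrics == {"hit_rate": 1.0, "mrr": 0.5, "recall": 0.5}
```

Here the first relevant document appears at rank 2, giving a reciprocal rank of 0.5, and one of the two relevant documents is retrieved, giving a recall of 0.5.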