raglab_ — building RAG pipelines in public

// ragforge library documentation

$ docs

Installation, quick start, and full API reference for the ragforge package.

// get running in 2 minutes

$ quick start

install

git clone https://github.com/Vaibhav1196/hybrid-rag-lab.git
cd hybrid-rag-lab
uv venv .venv --python 3.11
uv pip install -e ".[dev]"

run hybrid retrieval in 5 lines

from ragforge.retrieval.pipeline import HybridPipeline

pipeline = HybridPipeline.from_directory("/path/to/data", chunk_size=300, overlap=50)
results  = pipeline.search("what is hybrid retrieval", top_k=3)

for r in results:
    print(f"{r.score:.4f}  {r.chunk.doc_id}  {r.chunk.text[:80]}...")

output

0.0323  hybrid_retrieval  Hybrid retrieval combines sparse retrieval, such as BM25...
0.0161  rag               Retrieval-Augmented Generation, or RAG, is a pattern...
0.0161  langchain          LangChain is a framework designed to simplify the...

or use the CLI scripts

Run BM25 retrieval

uv run python scripts/retrieval/run_bm25_retrieval.py --data-dir /path/to/data --query 'hybrid retrieval' --top-k 3

Run dense retrieval

uv run python scripts/retrieval/run_dense_retrieval.py --data-dir /path/to/data --query 'hybrid retrieval' --top-k 3

Run hybrid retrieval

uv run python scripts/retrieval/run_hybrid_retrieval.py --data-dir /path/to/data --query 'hybrid retrieval' --top-k 3

Run reranked retrieval

uv run python scripts/retrieval/run_reranked_retrieval.py --data-dir /path/to/data --query 'hybrid retrieval' --top-k 3 --candidate-top-k 3

Run end-to-end RAG

uv run python scripts/generation/run_rag_pipeline.py --data-dir /path/to/data --query 'What is RAG?' --llm-mode fallback --retrieval-top-k 3 --candidate-top-k 3

Benchmark retrieval

uv run python scripts/evaluation/run_retrieval_evaluation.py --pipeline reranked --data-dir /path/to/data --eval-path /path/to/retrieval_eval.jsonl --top-k 3 --candidate-top-k 3

Run the test suite

uv run pytest

// ragforge module reference

$ api reference

Classes, functions, and protocols organized by module.

ragforge.core.schemas

Shared data objects — the common language of the whole system.

Document dataclass

A fully loaded source document with doc_id, text, and metadata.

Chunk dataclass

A smaller piece of a Document. Carries chunk_id, doc_id, text, and metadata including chunk_index.

RetrievalResult dataclass

A ranked result from any retriever. Contains the Chunk, a float score, and the source method name.
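
The three objects above can be pictured as plain dataclasses. This is a minimal sketch based only on the field names listed here, not the ragforge source: the `source` field name on RetrievalResult and the "rag-0" chunk ID format are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Document:
    doc_id: str
    text: str
    metadata: dict[str, Any] = field(default_factory=dict)


@dataclass
class Chunk:
    chunk_id: str
    doc_id: str  # lineage back to the parent Document
    text: str
    metadata: dict[str, Any] = field(default_factory=dict)


@dataclass
class RetrievalResult:
    chunk: Chunk
    score: float
    source: str  # assumed field name for the retrieval method, e.g. "bm25"


doc = Document(doc_id="rag", text="Retrieval-Augmented Generation, or RAG, is a pattern.")
chunk = Chunk(
    chunk_id="rag-0",  # hypothetical ID format
    doc_id=doc.doc_id,
    text=doc.text[:30],
    metadata={"chunk_index": 0},
)
result = RetrievalResult(chunk=chunk, score=0.0161, source="hybrid")
print(f"{result.score:.4f}  {result.chunk.doc_id}  {result.chunk.text}")
```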

ragforge.core.telemetry

Lightweight observability — per-stage latency tracking.

StageTiming dataclass

A single (stage_name, duration_ms) timing entry.

PipelineTrace dataclass

Collects StageTiming entries and arbitrary metadata. Use .add_stage() to record, .total_duration_ms to sum.
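
A rough sketch of how per-stage timing could be recorded. The parameters of `add_stage` and the choice to make `total_duration_ms` a property are assumptions; the docs name both but do not give signatures.

```python
import time
from dataclasses import dataclass, field
from typing import Any


@dataclass
class StageTiming:
    stage_name: str
    duration_ms: float


@dataclass
class PipelineTrace:
    stages: list[StageTiming] = field(default_factory=list)
    metadata: dict[str, Any] = field(default_factory=dict)

    def add_stage(self, stage_name: str, duration_ms: float) -> None:
        # Assumed signature: record one timing entry.
        self.stages.append(StageTiming(stage_name, duration_ms))

    @property
    def total_duration_ms(self) -> float:
        # Sum of all recorded stage durations.
        return sum(s.duration_ms for s in self.stages)


trace = PipelineTrace(metadata={"query": "what is hybrid retrieval"})

start = time.perf_counter()
sorted(range(10_000), reverse=True)  # stand-in for a real pipeline stage
trace.add_stage("retrieval", (time.perf_counter() - start) * 1000.0)
trace.add_stage("rerank", 12.5)  # fixed value, for illustration only

print(f"{len(trace.stages)} stages, {trace.total_duration_ms:.2f} ms total")
```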

ragforge.ingestion

Document loading and chunking.

load_text_documents(data_dir) function

Load all .txt files from a directory into Document objects. Skips empty files. Returns list[Document].

chunk_text(text, chunk_size, overlap) function

Split a string into overlapping character-based chunks. Returns list[str].

chunk_documents(documents, chunk_size, overlap) function

Convert list[Document] into list[Chunk], preserving doc_id lineage and chunk_index in metadata.
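
Overlapping character-based chunking is a sliding window that advances by chunk_size minus overlap. The function below is a sketch of that idea under the name `chunk_text_sketch` (hypothetical); how the real chunk_text handles edge cases such as invalid overlap is not documented here.

```python
def chunk_text_sketch(text: str, chunk_size: int, overlap: int) -> list[str]:
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


chunks = chunk_text_sketch("abcdefghij", chunk_size=4, overlap=1)
print(chunks)  # ['abcd', 'defg', 'ghij']
```

Each chunk shares its first character with the last character of the previous one, which is the overlap of 1.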

ragforge.retrieval.bm25

Sparse retrieval using BM25 probabilistic ranking.

tokenize(text) function

Lowercase whitespace tokenizer for BM25 indexing.

BM25Retriever class

Sparse retriever wrapping rank_bm25.BM25Okapi. Initialize with chunks, call .search(query, top_k) to retrieve.
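
To make the scoring concrete, here is a dependency-free sketch of BM25 over whitespace tokens. It is not the rank_bm25 implementation that BM25Retriever wraps: it uses a simplified IDF, log(1 + (N - n + 0.5) / (n + 0.5)), and the names `tokenize_sketch` and `bm25_scores` are hypothetical.

```python
import math
from collections import Counter


def tokenize_sketch(text: str) -> list[str]:
    # Lowercase, then split on any run of whitespace.
    return text.lower().split()


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    if not docs:
        return []
    tokenized = [tokenize_sketch(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize_sketch(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization penalizes longer-than-average documents.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "Hybrid retrieval combines BM25 and dense search",
    "LangChain is a framework",
    "BM25 is a sparse ranking function",
]
scores = bm25_scores("hybrid retrieval", docs)
best = max(range(len(docs)), key=lambda i: scores[i])
print(best, [round(s, 3) for s in scores])
```

Only the first document contains either query term, so it is the only one with a non-zero score.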

ragforge.retrieval.dense

Dense semantic retrieval using embeddings + FAISS.

DenseRetriever class

Vector search over chunk embeddings. Initialize with chunks + a TextEmbedder. Call .search(query, top_k) to retrieve by cosine similarity.
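
Cosine similarity between unit-length vectors is just their dot product, which is why normalized embeddings pair well with an inner-product index. The sketch below brute-forces that search in pure Python instead of using FAISS; `cosine_top_k` is a hypothetical helper, not part of ragforge.

```python
import math


def normalize(vec: list[float]) -> list[float]:
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def cosine_top_k(query_vec: list[float], chunk_vecs: list[list[float]], top_k: int):
    q = normalize(query_vec)
    scored = []
    for idx, vec in enumerate(chunk_vecs):
        v = normalize(vec)
        # Dot product of two unit vectors equals their cosine similarity.
        scored.append((idx, sum(a * b for a, b in zip(q, v))))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


chunk_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hits = cosine_top_k([2.0, 0.0], chunk_vecs, top_k=2)
for idx, score in hits:
    print(f"{score:.4f}  chunk {idx}")
# 1.0000  chunk 0
# 0.7071  chunk 2
```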

ragforge.retrieval.embeddings

Embedding interfaces and implementations.

TextEmbedder protocol

Interface for any embedding model. Must implement .encode(texts) → np.ndarray. Swap in OpenAI, Cohere, or any provider.

SentenceTransformerEmbedder class

Concrete embedder wrapping sentence-transformers. Default model: all-MiniLM-L6-v2 (384-dimensional embeddings). Outputs normalized float32 vectors.
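
Because TextEmbedder is a protocol, any object with a matching `.encode` method can be swapped in without inheritance. The sketch below shows that structural typing with a toy hashing embedder. To stay dependency-free it returns plain lists rather than the np.ndarray the real protocol requires, and `TextEmbedderSketch` and `HashingEmbedder` are hypothetical names.

```python
import hashlib
import math
from typing import Protocol, Sequence


class TextEmbedderSketch(Protocol):
    def encode(self, texts: Sequence[str]) -> list[list[float]]: ...


class HashingEmbedder:
    """Toy embedder: hashes tokens into a fixed number of buckets."""

    def __init__(self, dim: int = 8) -> None:
        self.dim = dim

    def encode(self, texts: Sequence[str]) -> list[list[float]]:
        vectors = []
        for text in texts:
            vec = [0.0] * self.dim
            for token in text.lower().split():
                digest = hashlib.sha256(token.encode("utf-8")).digest()
                vec[digest[0] % self.dim] += 1.0
            # L2-normalize so dot products behave like cosine similarity.
            norm = math.sqrt(sum(x * x for x in vec))
            vectors.append([x / norm for x in vec] if norm else vec)
        return vectors


def embed_corpus(embedder: TextEmbedderSketch, texts: Sequence[str]) -> list[list[float]]:
    # Depends only on the protocol, not on a concrete embedder class.
    return embedder.encode(texts)


vectors = embed_corpus(HashingEmbedder(dim=8), ["hybrid retrieval", "dense retrieval"])
print(len(vectors), len(vectors[0]))  # 2 8
```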

ragforge.retrieval.fusion

Reciprocal Rank Fusion for combining retriever outputs.

reciprocal_rank_fusion(result_lists, top_k, k) function

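Merge multiple ranked lists using RRF. Combines ranks, not scores, so retrievers with incompatible score scales can be fused directly. Formula: score(d) = Σ 1/(k + rank_i(d)), summed over every result list i that contains d. Default k=60.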
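
A minimal sketch of the RRF formula, operating on lists of IDs rather than RetrievalResult objects. It assumes 1-based ranks, as in the usual formulation; `rrf_sketch` is a hypothetical name and the real function's return type may differ.

```python
from collections import defaultdict


def rrf_sketch(result_lists: list[list[str]], top_k: int, k: int = 60) -> list[tuple[str, float]]:
    scores: dict[str, float] = defaultdict(float)
    for results in result_lists:
        for rank, item_id in enumerate(results, start=1):
            # Each list contributes 1 / (k + rank) for every item it contains.
            scores[item_id] += 1.0 / (k + rank)
    ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]


bm25_ids = ["a", "b", "c"]
dense_ids = ["b", "c", "d"]
fused = rrf_sketch([bm25_ids, dense_ids], top_k=3)
for item_id, score in fused:
    print(f"{score:.4f}  {item_id}")
# 0.0325  b
# 0.0320  c
# 0.0164  a
```

Item "b" wins with 1/62 + 1/61 because it ranks well in both lists, even though "a" is first in one of them.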

ragforge.retrieval.reranking

Cross-encoder reranking for precision-optimized results.

QueryDocumentScorer protocol

Interface for scoring (query, document) pairs. Must implement .score(query, texts) → list[float].

CrossEncoderScorer class

Concrete scorer wrapping cross-encoder models. Default: ms-marco-MiniLM-L-6-v2.

RetrievalReranker class

Reranks a list of RetrievalResults using a QueryDocumentScorer. Call .rerank(query, results, top_k).
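
The reranking flow is: score every (query, text) pair, then re-sort by the new scores. The sketch below replaces the cross-encoder with a toy term-overlap scorer so it runs offline, and works on raw strings instead of RetrievalResults; `ScorerSketch`, `TermOverlapScorer`, and `rerank_sketch` are hypothetical names.

```python
from typing import Protocol, Sequence


class ScorerSketch(Protocol):
    def score(self, query: str, texts: Sequence[str]) -> list[float]: ...


class TermOverlapScorer:
    """Toy scorer: fraction of query terms present in each text."""

    def score(self, query: str, texts: Sequence[str]) -> list[float]:
        terms = set(query.lower().split())
        if not terms:
            return [0.0 for _ in texts]
        return [len(terms & set(t.lower().split())) / len(terms) for t in texts]


def rerank_sketch(scorer: ScorerSketch, query: str, texts: Sequence[str], top_k: int):
    scores = scorer.score(query, texts)
    # Sort candidate indices by the scorer's output, best first.
    order = sorted(range(len(texts)), key=lambda i: scores[i], reverse=True)
    return [(texts[i], scores[i]) for i in order[:top_k]]


candidates = [
    "LangChain is a framework",
    "Hybrid retrieval combines sparse and dense retrieval",
    "Dense retrieval uses embeddings",
]
top = rerank_sketch(TermOverlapScorer(), "hybrid retrieval", candidates, top_k=2)
for text, score in top:
    print(f"{score:.2f}  {text}")
# 1.00  Hybrid retrieval combines sparse and dense retrieval
# 0.50  Dense retrieval uses embeddings
```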

ragforge.retrieval.pipeline

Ready-to-use retrieval pipelines combining all components.

BM25Pipeline class

End-to-end sparse retrieval. Build with .from_directory(path) or .from_documents(docs). Call .search(query, top_k).

DensePipeline class

End-to-end dense retrieval. Same interface. Accepts optional custom TextEmbedder.

HybridPipeline class

BM25 + Dense + RRF fusion. The core hybrid retriever. Accepts rrf_k parameter.

RerankedHybridPipeline class

HybridPipeline + cross-encoder reranking. The most complete retrieval pipeline. Accepts candidate_top_k for the reranking pool size.

ragforge.generation

Context construction and LLM-based answer generation.

ContextBuilder class

Assembles retrieval results into prompt-ready context with citation IDs. Respects max_chunks and max_chars budget.
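
One way to picture the two budgets: number the chunks, stop after max_chunks, and stop earlier if the next chunk would push the context past max_chars. This is a sketch only; the "[n]" citation format and the `build_context_sketch` name are assumptions, not ContextBuilder's actual output.

```python
def build_context_sketch(texts: list[str], max_chunks: int = 3, max_chars: int = 200) -> str:
    lines: list[str] = []
    used = 0
    for number, text in enumerate(texts[:max_chunks], start=1):
        line = f"[{number}] {text}"  # assumed citation ID format
        extra = len(line) + (1 if lines else 0)  # +1 for the joining newline
        if used + extra > max_chars:
            break  # adding this chunk would exceed the character budget
        lines.append(line)
        used += extra
    return "\n".join(lines)


texts = [
    "Hybrid retrieval combines BM25 and dense search.",
    "RAG is a pattern.",
    "LangChain is a framework.",
]
context = build_context_sketch(texts, max_chunks=2, max_chars=200)
print(context)
# [1] Hybrid retrieval combines BM25 and dense search.
# [2] RAG is a pattern.
```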

ChatLLM protocol

Interface for chat LLMs. Must implement .generate(system_prompt, user_prompt) → LLMResponse.
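
A sketch of how code can depend on the protocol rather than on a specific backend. The `text` field on the response, the prompt wording, and the `CannedLLM` stand-in are all assumptions for illustration; the real LLMResponse may carry different fields.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class LLMResponseSketch:
    text: str  # hypothetical field; the real LLMResponse may differ


class ChatLLMSketch(Protocol):
    def generate(self, system_prompt: str, user_prompt: str) -> LLMResponseSketch: ...


class CannedLLM:
    """Stand-in backend that returns a fixed answer, handy in unit tests."""

    def __init__(self, answer: str) -> None:
        self.answer = answer

    def generate(self, system_prompt: str, user_prompt: str) -> LLMResponseSketch:
        return LLMResponseSketch(text=self.answer)


def answer_with(llm: ChatLLMSketch, question: str, context: str) -> str:
    # Any object with a matching .generate method can be passed in.
    system_prompt = "Answer using only the provided context."
    user_prompt = f"Context:\n{context}\n\nQuestion: {question}"
    return llm.generate(system_prompt, user_prompt).text


reply = answer_with(
    CannedLLM("RAG combines retrieval with generation [1]."),
    question="What is RAG?",
    context="[1] RAG is a pattern.",
)
print(reply)
```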

ExtractiveFallbackLLM class

Offline-safe fallback that extracts answers from context without an external API.

OpenAICompatibleLLM class

Minimal client for any OpenAI-compatible API. Uses stdlib only (no openai package needed).

HuggingFaceInferenceLLM class

Hugging Face Inference Providers client built on the OpenAI-compatible router. Uses HF_TOKEN or an explicit api_key.

RAGPipeline class

End-to-end retrieval → context → generation pipeline. Call .answer(query) to get a GenerationResponse with trace.

ragforge.evaluation

Retrieval and answer quality evaluation.

evaluate_retrieval(pipeline, samples, top_k) function

Evaluate a retrieval pipeline on labeled queries. Returns RetrievalEvaluationReport with Hit Rate@k, MRR@k, Recall@k.
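
The three metrics can be computed per query and then averaged. The sketch below uses their standard definitions on lists of doc IDs; it does not reproduce RetrievalEvaluationReport, and the sample data is made up for illustration.

```python
def hit_rate_at_k(ranked_ids: list[str], relevant_ids: set[str], k: int) -> float:
    # 1.0 if any relevant item appears in the top k, else 0.0.
    return 1.0 if any(i in relevant_ids for i in ranked_ids[:k]) else 0.0


def reciprocal_rank_at_k(ranked_ids: list[str], relevant_ids: set[str], k: int) -> float:
    # 1 / rank of the first relevant item in the top k, else 0.0.
    for rank, item_id in enumerate(ranked_ids[:k], start=1):
        if item_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def recall_at_k(ranked_ids: list[str], relevant_ids: set[str], k: int) -> float:
    # Fraction of all relevant items that appear in the top k.
    if not relevant_ids:
        return 0.0
    return len(set(ranked_ids[:k]) & relevant_ids) / len(relevant_ids)


# (ranked results, labeled relevant IDs) per query; illustrative data only
samples = [
    (["rag", "langchain", "bm25"], {"rag"}),
    (["faiss", "hybrid_retrieval", "rag"], {"hybrid_retrieval", "bm25"}),
]
k = 3
n = len(samples)
hit_rate = sum(hit_rate_at_k(r, rel, k) for r, rel in samples) / n
mrr = sum(reciprocal_rank_at_k(r, rel, k) for r, rel in samples) / n
recall = sum(recall_at_k(r, rel, k) for r, rel in samples) / n
print(f"hit_rate@{k}={hit_rate:.2f}  mrr@{k}={mrr:.2f}  recall@{k}={recall:.2f}")
# hit_rate@3=1.00  mrr@3=0.75  recall@3=0.75
```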

evaluate_answer_heuristics(sample, response) function

Fast heuristic checks: answer_present, cites_context, grounded_to_relevant_context, reference_term_overlap.

build_llm_judge_prompts(sample, response) function

Build rubric prompt for LLM-as-judge scoring (groundedness, correctness, completeness, overall).
