raglab_ — building RAG pipelines in public

~/hybrid-rag-lab

$ pip install ragforge

# ragforge

hybrid RAG framework for Python.

BM25 + Dense + RRF Fusion + Cross-Encoder Reranking.

evaluation suite. observability. protocol-based. swappable.

# v0.1.0 — src-layout package, 13 test files, CLI scripts

# built in public at github.com/Vaibhav1196/hybrid-rag-lab

$ raglab --list-stages

// try ragforge in the browser — no install needed

$ live demos

live

Document RAG Playground

Upload TXT, PDF, or DOCX files, run BM25, Dense, Hybrid, or Reranked retrieval, and generate grounded answers with either the built-in fallback model or a Hugging Face model.

BM25/Dense/Hybrid, reranking, Hugging Face

→ try on HF Spaces

planned

Reranker Impact Test

See how cross-encoder reranking reorders hybrid results. Compare before/after rankings on the same query.

cross-encoder, ms-marco-MiniLM

coming soon

planned

Retrieval Evaluation Dashboard

Run the eval suite live. See Hit Rate, MRR, and Recall@k for every pipeline tier across different eval datasets.

Hit Rate, MRR, Recall@k

coming soon

planned

End-to-End RAG Playground

Full pipeline: upload docs → retrieve → generate an answer with citations. Inspect the PipelineTrace for a per-stage latency breakdown.

RAGPipeline, ContextBuilder, PipelineTrace

coming soon

// copy-paste recipes

$ examples

Want the full API reference and installation guide? Read the docs →

Run hybrid retrieval

Build the hybrid pipeline the same way the repo scripts do, then search a local document directory.

python

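from ragforge.retrieval.pipeline import HybridPipeline

# Build the hybrid pipeline from a local directory of documents.
pipeline = HybridPipeline.from_directory(
    data_dir="/path/to/data",
    chunk_size=300,
    overlap=50,
    model_name="all-MiniLM-L6-v2",  # dense embedding model
    rrf_k=60,  # constant used by RRF fusion
)
results = pipeline.search(query="hybrid retrieval", top_k=3)

# Print the score, document id, and source of each hit.
for result in results:
    print(result.score, result.chunk.doc_id, result.source)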
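
For readers wondering what `rrf_k` controls, here is a minimal sketch of Reciprocal Rank Fusion in plain Python. It is not ragforge's implementation: the helper `rrf_fuse` and the sample doc ids are hypothetical. Each ranked list contributes 1 / (k + rank) per document, with ranks starting at 1, and the contributions are summed.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of doc ids with Reciprocal Rank Fusion.

    Hypothetical helper for illustration; not part of ragforge.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by doc id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Made-up rankings standing in for BM25 and dense retriever output.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
dense_ranking = ["doc_b", "doc_d", "doc_a"]

fused = rrf_fuse([bm25_ranking, dense_ranking], k=60)
for doc_id, score in fused:
    print(f"{doc_id}  {score:.5f}")
```

Documents ranked by both retrievers (here `doc_b` and `doc_a`) rise above documents found by only one. A larger `k` narrows the gap between top and lower ranks.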

End-to-end RAG with the fallback LLM

Use the reranked retrieval pipeline with the built-in fallback model for an offline end-to-end RAG run.

python

from ragforge.retrieval.pipeline import RerankedHybridPipeline
from ragforge.generation.pipeline import RAGPipeline

# Reranked hybrid retrieval over a local directory of documents.
retriever = RerankedHybridPipeline.from_directory(
    data_dir="/path/to/data",
    candidate_top_k=3,
)
# The built-in fallback LLM keeps the whole run offline.
rag = RAGPipeline.with_fallback_llm(retrieval_pipeline=retriever)

response = rag.answer("What is hybrid retrieval?", retrieval_top_k=3)

print(response.answer)
# The trace reports the total pipeline latency in milliseconds.
print(f"Latency: {response.trace.total_duration_ms:.1f}ms")
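
The latency line above reads from the response trace. As a rough illustration of how per-stage timings can be collected, here is a minimal sketch built on a context manager. The `StageTimer` class and the stage names are hypothetical; this is not ragforge's `PipelineTrace`, only one way such a trace could be assembled.

```python
import time
from contextlib import contextmanager


class StageTimer:
    """Hypothetical per-stage timer, for illustration only."""

    def __init__(self):
        self.durations_ms = {}

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            # Accumulate so the same stage can be entered more than once.
            self.durations_ms[name] = self.durations_ms.get(name, 0.0) + elapsed_ms

    @property
    def total_duration_ms(self):
        return sum(self.durations_ms.values())


timer = StageTimer()
with timer.stage("retrieval"):
    time.sleep(0.01)  # stand-in for a retrieval call
with timer.stage("generation"):
    time.sleep(0.02)  # stand-in for an LLM call

for name, ms in timer.durations_ms.items():
    print(f"{name}: {ms:.1f}ms")
print(f"total: {timer.total_duration_ms:.1f}ms")
```

Recording the duration in `finally` means a stage that raises still shows up in the breakdown.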

End-to-end RAG with OpenAI

Keep the same retrieval stack and swap in an OpenAI-compatible backend for answer generation.

python

from ragforge.retrieval.pipeline import RerankedHybridPipeline
from ragforge.generation import ContextBuilder, OpenAICompatibleLLM, RAGPipeline

# Same retrieval stack as the fallback example.
retriever = RerankedHybridPipeline.from_directory(
    data_dir="/path/to/data",
    candidate_top_k=3,
)
rag = RAGPipeline(
    retrieval_pipeline=retriever,
    # Limit the prompt context to 4 chunks and 1800 characters.
    context_builder=ContextBuilder(max_chunks=4, max_chars=1800),
    # Only the generation backend changes.
    llm=OpenAICompatibleLLM(model_name="gpt-4.1-mini"),
)

response = rag.answer("What is hybrid retrieval?", retrieval_top_k=3)
print(response.answer)
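
The `max_chunks` and `max_chars` arguments suggest a budgeted prompt context. Below is a minimal sketch of that idea, assuming chunks arrive best-first and that a chunk which would exceed the character budget is skipped whole. ragforge's `ContextBuilder` may behave differently (for example, by truncating), and the helper `build_context` is hypothetical.

```python
def build_context(chunks, max_chunks=4, max_chars=1800):
    """Join best-first chunks into one context string under two budgets.

    Hypothetical helper for illustration; not ragforge's ContextBuilder.
    """
    parts = []
    used = 0
    for text in chunks:
        if len(parts) >= max_chunks:
            break
        # Number each entry so an answer can cite it.
        entry = f"[{len(parts) + 1}] {text}"
        # The extra 1 accounts for the newline that joins entries.
        cost = len(entry) + (1 if parts else 0)
        if used + cost > max_chars:
            continue  # skip a chunk that does not fit; a later one may
        parts.append(entry)
        used += cost
    return "\n".join(parts)


chunks = [
    "Hybrid retrieval combines sparse and dense search.",
    "x" * 200,  # too large for the small demo budget below
    "RRF fuses ranked lists by reciprocal rank.",
]
context = build_context(chunks, max_chunks=4, max_chars=120)
print(context)
```

Skipping oversized chunks keeps every included chunk intact, at the cost of sometimes dropping a highly ranked one. Truncation makes the opposite trade.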

Evaluate retrieval quality

Run the same retrieval evaluation flow used by the repo scripts against a labeled JSONL dataset.

python

from ragforge.retrieval.pipeline import RerankedHybridPipeline
from ragforge.evaluation import evaluate_retrieval, load_retrieval_samples

pipeline = RerankedHybridPipeline.from_directory(
    data_dir="/path/to/data",
    candidate_top_k=3,
)
# Load the labeled evaluation samples from a JSONL file.
samples = load_retrieval_samples("/path/to/retrieval_eval.jsonl")

# Evaluate the top 5 results per sample.
report = evaluate_retrieval(pipeline, samples, top_k=5)

print(f"Hit Rate@5: {report.metrics.hit_rate:.2f}")
print(f"MRR@5:      {report.metrics.mean_reciprocal_rank:.2f}")
print(f"Recall@5:   {report.metrics.recall_at_k:.2f}")
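
To make the three numbers concrete, here is a small sketch of how Hit Rate@k, MRR@k, and Recall@k are commonly computed from ranked doc ids and a set of relevant ids per query. It follows the standard definitions and is not ragforge's `evaluate_retrieval`; the function names and sample data are hypothetical.

```python
def hit_at_k(ranked, relevant, k):
    """1.0 if any relevant doc appears in the top k, else 0.0."""
    return 1.0 if any(doc in relevant for doc in ranked[:k]) else 0.0


def reciprocal_rank_at_k(ranked, relevant, k):
    """1 / rank of the first relevant doc in the top k, else 0.0."""
    for rank, doc in enumerate(ranked[:k], start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant docs that appear in the top k."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


# Made-up samples: (ranked doc ids, set of relevant doc ids).
samples = [
    (["d1", "d2", "d3"], {"d1"}),        # relevant doc at rank 1
    (["d4", "d5", "d6"], {"d5", "d9"}),  # one of two relevant docs, at rank 2
    (["d7", "d8", "d2"], {"d1"}),        # miss
]
k = 3
n = len(samples)

hit_rate = sum(hit_at_k(r, rel, k) for r, rel in samples) / n
mrr = sum(reciprocal_rank_at_k(r, rel, k) for r, rel in samples) / n
recall = sum(recall_at_k(r, rel, k) for r, rel in samples) / n

print(f"Hit Rate@{k}: {hit_rate:.2f}")
print(f"MRR@{k}:      {mrr:.2f}")
print(f"Recall@{k}:   {recall:.2f}")
```

Hit Rate only asks whether anything relevant was retrieved, MRR rewards placing the first relevant document early, and Recall measures how much of the relevant set was covered.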

// what's been built

$ git log --oneline

2026-03-23 v0.1.0 ragforge v0.1.0 — full hybrid RAG pipeline with evaluation suite and observability → repo

2026-03-23 release 30-page masterclass PDF published — 15 chapters covering the full architecture → PDF

2026-03-22 exper. dual answer evaluation — heuristic checks + LLM-as-judge with rubric prompts → source

2026-03-20 finding reranked hybrid hits MRR@5 = 1.00 on easy eval set vs BM25-only at 0.75 → eval script

2026-03-18 exper. PipelineTrace shows retrieval 12ms, context build 0.3ms, LLM generation dominates at 1.8s → telemetry

2026-03-15 release end-to-end RAG pipeline — ExtractiveFallbackLLM for offline + OpenAI-compatible client → source

2026-03-12 exper. RRF fusion (k=60) on sample docs — hybrid retrieval hits 100% recall vs BM25's 88% → source

// where ragforge is headed

$ roadmap

FastAPI HTTP server: Wrap the pipeline in an API. Query endpoint, health checks, structured JSON responses.

Embedding cache: Cache embeddings to avoid re-embedding unchanged documents. FAISS or Qdrant backed.

Advanced chunking: Sentence-aware, token-based, and semantic chunking strategies. Compare retrieval quality.

PDF / HTML / Markdown ingestion: Support real-world document formats beyond .txt files.

later

Query routing: Route keyword queries to BM25, semantic queries to dense retrieval automatically.

Multi-agent orchestration: LangGraph integration with planner, retriever, and generator agents.

OpenTelemetry export: Export PipelineTrace data to OpenTelemetry or LangSmith for production monitoring.

research

HyDE (Hypothetical Document Embeddings): Generate a hypothetical answer, embed it, use that for retrieval instead of the raw query.

SPLADE / ColBERT: Learned sparse retrieval and late-interaction models as alternatives to BM25 + bi-encoder.