building RAG pipelines in public
~/hybrid-rag-lab
$ pip install ragforge
 
# ragforge
hybrid RAG framework for Python.
BM25 + Dense + RRF Fusion + Cross-Encoder Reranking.
evaluation suite. observability. protocol-based. swappable.
 
# v0.1.0 — src-layout package, 13 test files, CLI scripts
# built in public at github.com/Vaibhav1196/hybrid-rag-lab
 
$ raglab --list-stages
// try ragforge in the browser — no install needed

$ live demos

live

Document RAG Playground

Upload TXT, PDF, or DOCX files, run BM25, Dense, Hybrid, or Reranked retrieval, and generate grounded answers using either the built-in fallback LLM or Hugging Face generation.

→ try on HF Spaces

planned

Reranker Impact Test

See how cross-encoder reranking reorders hybrid results. Compare before/after rankings on the same query.

coming soon

planned

Retrieval Evaluation Dashboard

Run the eval suite live. See Hit Rate, MRR, and Recall@k for every pipeline tier across different eval datasets.

coming soon

planned

End-to-End RAG Playground

Full pipeline: upload docs → retrieve → generate answer with citations. Inspect the PipelineTrace for latency breakdown.

coming soon

// copy-paste recipes

$ examples

Want the full API reference and installation guide? Read the docs →

Run hybrid retrieval

Build the hybrid pipeline the same way the repo scripts do, then search a local document directory.

python
from ragforge.retrieval.pipeline import HybridPipeline

pipeline = HybridPipeline.from_directory(
    data_dir="/path/to/data",
    chunk_size=300,
    overlap=50,
    model_name="all-MiniLM-L6-v2",
    rrf_k=60,
)
results = pipeline.search(query="hybrid retrieval", top_k=3)

for result in results:
    print(result.score, result.chunk.doc_id, result.source)
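
The `rrf_k=60` argument above is the constant used in Reciprocal Rank Fusion, the step that merges the BM25 and dense rankings. The sketch below shows the general technique in plain Python. It is not ragforge's internal code; the function name and the sample document IDs are made up for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc IDs into one list, best first.

    Each document scores 1 / (k + rank) for every list it appears in,
    with rank starting at 1. A larger k flattens the gap between ranks.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical rankings from a keyword retriever and a dense retriever.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
dense_ranking = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking], k=60)
print(fused)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Documents that both retrievers rank highly (`doc_b`, `doc_a`) rise to the top, while documents found by only one retriever stay in the list with a lower score.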

End-to-end RAG with the fallback LLM

Use the reranked retrieval pipeline with the built-in fallback model for an offline end-to-end RAG run.

python
from ragforge.retrieval.pipeline import RerankedHybridPipeline
from ragforge.generation.pipeline import RAGPipeline

retriever = RerankedHybridPipeline.from_directory(
    data_dir="/path/to/data",
    candidate_top_k=3,
)
rag = RAGPipeline.with_fallback_llm(retrieval_pipeline=retriever)

response = rag.answer("What is hybrid retrieval?", retrieval_top_k=3)

print(response.answer)
print(f"Latency: {response.trace.total_duration_ms:.1f}ms")
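
`response.trace.total_duration_ms` exposes timing collected while the pipeline runs. As a rough illustration of how per-stage timings like this can be gathered, here is a small standalone sketch. `StageTimer` is a hypothetical helper written for this example and is not ragforge's `PipelineTrace`; the `time.sleep` calls stand in for real retrieval and generation work.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class StageTimer:
    """Hypothetical helper: records wall-clock duration per named stage."""

    durations_ms: dict = field(default_factory=dict)

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the duration even if the stage raises.
            self.durations_ms[name] = (time.perf_counter() - start) * 1000.0

    @property
    def total_duration_ms(self):
        return sum(self.durations_ms.values())


timer = StageTimer()
with timer.stage("retrieval"):
    time.sleep(0.01)  # stand-in for retrieval work
with timer.stage("generation"):
    time.sleep(0.02)  # stand-in for LLM generation

for name, ms in timer.durations_ms.items():
    print(f"{name}: {ms:.1f}ms")
print(f"total: {timer.total_duration_ms:.1f}ms")
```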

End-to-end RAG with OpenAI

Keep the same retrieval stack and swap in an OpenAI-compatible backend for answer generation.

python
from ragforge.retrieval.pipeline import RerankedHybridPipeline
from ragforge.generation import ContextBuilder, OpenAICompatibleLLM, RAGPipeline

retriever = RerankedHybridPipeline.from_directory(
    data_dir="/path/to/data",
    candidate_top_k=3,
)
rag = RAGPipeline(
    retrieval_pipeline=retriever,
    context_builder=ContextBuilder(max_chunks=4, max_chars=1800),
    llm=OpenAICompatibleLLM(model_name="gpt-4.1-mini"),
)

response = rag.answer("What is hybrid retrieval?", retrieval_top_k=3)
print(response.answer)
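
Swapping the generation backend while keeping the retriever untouched is the kind of thing a protocol-based design makes easy. The sketch below shows the idea with `typing.Protocol`. All class and function names here are hypothetical and chosen for this example; ragforge's actual interfaces may look different.

```python
from typing import Protocol


class LLM(Protocol):
    """Hypothetical interface: any object with a matching generate() fits."""

    def generate(self, prompt: str) -> str: ...


class FirstLineLLM:
    """Offline stand-in: answers with the first line of the prompt."""

    def generate(self, prompt: str) -> str:
        return prompt.splitlines()[0]


class ShoutingLLM:
    """Second stand-in backend, to show that the caller does not change."""

    def generate(self, prompt: str) -> str:
        return prompt.upper()


def answer(question: str, context: str, llm: LLM) -> str:
    # The caller depends only on the protocol, not on a concrete class.
    prompt = f"{context}\nQuestion: {question}"
    return llm.generate(prompt)


context = "RRF fuses ranked lists."
print(answer("What is RRF?", context, FirstLineLLM()))
print(answer("What is RRF?", context, ShoutingLLM()))
```

Neither backend inherits from `LLM`; matching the method signature is enough for a type checker to accept it.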

Evaluate retrieval quality

Run the same retrieval evaluation flow used by the repo scripts against a labeled JSONL dataset.

python
from ragforge.retrieval.pipeline import RerankedHybridPipeline
from ragforge.evaluation import evaluate_retrieval, load_retrieval_samples

pipeline = RerankedHybridPipeline.from_directory(
    data_dir="/path/to/data",
    candidate_top_k=3,
)
samples = load_retrieval_samples("/path/to/retrieval_eval.jsonl")

report = evaluate_retrieval(pipeline, samples, top_k=5)

print(f"Hit Rate@5: {report.metrics.hit_rate:.2f}")
print(f"MRR@5:      {report.metrics.mean_reciprocal_rank:.2f}")
print(f"Recall@5:   {report.metrics.recall_at_k:.2f}")
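
For readers new to these metrics, the sketch below computes Hit Rate, MRR, and Recall@k by hand using their standard textbook definitions. It is independent of ragforge, whose implementation may differ in details such as how relevance is matched; the helper names and sample data are invented for this example.

```python
def hit_rate(retrieved, relevant):
    """1.0 if any retrieved doc is relevant, else 0.0."""
    return 1.0 if any(doc_id in relevant for doc_id in retrieved) else 0.0


def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant doc, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def recall(retrieved, relevant):
    """Fraction of the relevant docs that appear in the retrieved list."""
    if not relevant:
        return 0.0
    return len(set(retrieved) & relevant) / len(relevant)


# Hypothetical samples: (ranked retrieved doc IDs, set of relevant doc IDs).
samples = [
    (["d1", "d2", "d3"], {"d1"}),        # relevant doc at rank 1
    (["d4", "d5", "d6"], {"d5", "d9"}),  # one of two relevant docs, at rank 2
    (["d7", "d8"], {"d0"}),              # miss
]

k = 5
n = len(samples)
mean_hit = sum(hit_rate(r[:k], rel) for r, rel in samples) / n
mrr = sum(reciprocal_rank(r[:k], rel) for r, rel in samples) / n
mean_recall = sum(recall(r[:k], rel) for r, rel in samples) / n

print(f"Hit Rate@{k}: {mean_hit:.2f}")   # 0.67
print(f"MRR@{k}:      {mrr:.2f}")        # 0.50
print(f"Recall@{k}:   {mean_recall:.2f}")  # 0.50
```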

// what's been built

$ git log --oneline

2026-03-23  v0.1.0      ragforge v0.1.0: full hybrid RAG pipeline with evaluation suite and observability → repo
2026-03-23  release     30-page masterclass PDF published, 15 chapters covering the full architecture → PDF
2026-03-22  experiment  dual answer evaluation: heuristic checks + LLM-as-judge with rubric prompts → source
2026-03-20  finding     reranked hybrid hits MRR@5 = 1.00 on easy eval set vs BM25-only at 0.75 → eval script
2026-03-18  experiment  PipelineTrace shows retrieval 12ms, context build 0.3ms, LLM generation dominates at 1.8s → telemetry
2026-03-15  release     end-to-end RAG pipeline: ExtractiveFallbackLLM for offline use + OpenAI-compatible client → source
2026-03-12  experiment  RRF fusion (k=60) on sample docs: hybrid retrieval hits 100% recall vs BM25's 88% → source

// where ragforge is headed

$ roadmap

next
FastAPI HTTP server: Wrap the pipeline in an API. Query endpoint, health checks, structured JSON responses.
Embedding cache: Cache embeddings to avoid re-embedding unchanged documents. FAISS- or Qdrant-backed.
Advanced chunking: Sentence-aware, token-based, and semantic chunking strategies. Compare retrieval quality.
PDF / HTML / Markdown ingestion: Support real-world document formats beyond .txt files.

later
Query routing: Automatically route keyword queries to BM25 and semantic queries to dense retrieval.
Multi-agent orchestration: LangGraph integration with planner, retriever, and generator agents.
OpenTelemetry export: Export PipelineTrace data to OpenTelemetry or LangSmith for production monitoring.

research
HyDE (Hypothetical Document Embeddings): Generate a hypothetical answer, embed it, and use that embedding for retrieval instead of the raw query.
SPLADE / ColBERT: Learned sparse retrieval and late-interaction models as alternatives to BM25 + bi-encoder.