Upload TXT, PDF, or DOCX files, run BM25, Dense, Hybrid, or Reranked retrieval, and generate grounded answers with either fallback or Hugging Face generation.
→ try on HF Spaces
See how cross-encoder reranking reorders hybrid results. Compare before/after rankings on the same query. (coming soon)
Run the eval suite live. See Hit Rate, MRR, Recall@k across all pipeline tiers on different eval datasets. (coming soon)
Full pipeline: upload docs → retrieve → generate answer with citations. Inspect the PipelineTrace for latency breakdown. (coming soon)
Want the full API reference and installation guide? Read the docs →
Build the hybrid pipeline the same way the repo scripts do, then search a local document directory.
from ragforge.retrieval.pipeline import HybridPipeline

pipeline = HybridPipeline.from_directory(
    data_dir="/path/to/data",
    chunk_size=300,
    overlap=50,
    model_name="all-MiniLM-L6-v2",
    rrf_k=60,
)
results = pipeline.search(query="hybrid retrieval", top_k=3)
for result in results:
    print(result.score, result.chunk.doc_id, result.source)
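The `rrf_k=60` argument above presumably sets the constant used in reciprocal rank fusion (RRF), the usual way to merge a lexical ranking with a dense one. The sketch below is a standalone illustration of that technique, not ragforge's internal code; the function name `reciprocal_rank_fusion` and the sample document ids are made up for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of doc ids (each ordered best first) with RRF.

    Hypothetical helper for illustration; not part of ragforge.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every doc it contains.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by doc id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Made-up rankings standing in for BM25 and dense retrieval output.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
dense_ranking = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking], k=60)
for doc_id, score in fused:
    print(f"{doc_id}: {score:.4f}")
```

A larger `k` flattens the gap between top and lower ranks, so documents that appear in both lists tend to win over documents ranked first in only one.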
Use the reranked retrieval pipeline with the built-in fallback model for an offline end-to-end RAG run.
from ragforge.retrieval.pipeline import RerankedHybridPipeline
from ragforge.generation.pipeline import RAGPipeline

retriever = RerankedHybridPipeline.from_directory(
    data_dir="/path/to/data",
    candidate_top_k=3,
)
rag = RAGPipeline.with_fallback_llm(retrieval_pipeline=retriever)
response = rag.answer("What is hybrid retrieval?", retrieval_top_k=3)
print(response.answer)
print(f"Latency: {response.trace.total_duration_ms:.1f}ms")
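To make the reranking step concrete, here is a minimal sketch of the general pattern: take a short candidate list from first-stage retrieval, score each (query, passage) pair, and reorder. It assumes nothing about how `RerankedHybridPipeline` works internally. The `rerank` and `token_overlap` functions are hypothetical, and the toy token-overlap scorer only stands in for a real cross-encoder so the example runs offline.

```python
def token_overlap(query, text):
    """Toy relevance score: fraction of query tokens found in the text.

    A stand-in for a cross-encoder; chosen only so the example needs no model.
    """
    query_tokens = set(query.lower().split())
    text_tokens = set(text.lower().split())
    if not query_tokens:
        return 0.0
    return len(query_tokens & text_tokens) / len(query_tokens)


def rerank(query, candidates, score_fn, top_k=3):
    """Score every candidate against the query and keep the best top_k."""
    scored = [(score_fn(query, text), text) for text in candidates]
    # Sort by score only; Python's sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Made-up first-stage candidates, in their original retrieval order.
candidates = [
    "Dense retrieval embeds text into vectors.",
    "Hybrid retrieval combines BM25 and dense retrieval scores.",
    "BM25 is a lexical ranking function.",
]

reranked = rerank("hybrid retrieval", candidates, token_overlap, top_k=2)
for score, text in reranked:
    print(f"{score:.2f}  {text}")
```

Swapping `token_overlap` for a function that calls a cross-encoder model keeps the rest of the flow unchanged, which is the main reason to pass the scorer in as an argument.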
Keep the same retrieval stack and swap in an OpenAI-compatible backend for answer generation.
from ragforge.retrieval.pipeline import RerankedHybridPipeline
from ragforge.generation import ContextBuilder, OpenAICompatibleLLM, RAGPipeline

retriever = RerankedHybridPipeline.from_directory(
    data_dir="/path/to/data",
    candidate_top_k=3,
)
rag = RAGPipeline(
    retrieval_pipeline=retriever,
    context_builder=ContextBuilder(max_chunks=4, max_chars=1800),
    llm=OpenAICompatibleLLM(model_name="gpt-4.1-mini"),
)
response = rag.answer("What is hybrid retrieval?", retrieval_top_k=3)
print(response.answer)
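The `max_chunks` and `max_chars` arguments suggest that the context builder caps how much retrieved text reaches the model. The following sketch shows one plausible way such a budget could be applied. It is an assumption for illustration, not the behavior of ragforge's `ContextBuilder`; the `build_context` function, the numbered citation format, and the sample chunks are all hypothetical.

```python
def build_context(chunks, max_chunks=4, max_chars=1800):
    """Join (doc_id, text) pairs, ordered best first, under two budgets.

    Hypothetical helper: stops at max_chunks entries or when adding the next
    entry would push the joined string past max_chars.
    """
    parts = []
    used = 0
    for index, (doc_id, text) in enumerate(chunks[:max_chunks], start=1):
        entry = f"[{index}] ({doc_id}) {text}"
        # Account for the newline that will separate this entry from the last.
        cost = len(entry) + (1 if parts else 0)
        if used + cost > max_chars:
            break
        parts.append(entry)
        used += cost
    return "\n".join(parts)


# Made-up retrieved chunks, best first.
chunks = [
    ("a.txt", "Hybrid retrieval fuses lexical and dense rankings."),
    ("b.txt", "BM25 scores documents by term statistics."),
    ("c.txt", "Cross-encoders score a query and passage jointly."),
]

context = build_context(chunks, max_chunks=2, max_chars=1800)
print(context)
```

Dropping whole chunks instead of truncating mid-sentence keeps every numbered citation pointing at complete text, at the cost of leaving some of the character budget unused.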
Run the same retrieval evaluation flow used by the repo scripts against a labeled JSONL dataset.
from ragforge.retrieval.pipeline import RerankedHybridPipeline
from ragforge.evaluation import evaluate_retrieval, load_retrieval_samples

pipeline = RerankedHybridPipeline.from_directory(
    data_dir="/path/to/data",
    candidate_top_k=3,
)
samples = load_retrieval_samples("/path/to/retrieval_eval.jsonl")
report = evaluate_retrieval(pipeline, samples, top_k=5)
print(f"Hit Rate@5: {report.metrics.hit_rate:.2f}")
print(f"MRR@5: {report.metrics.mean_reciprocal_rank:.2f}")
print(f"Recall@5: {report.metrics.recall_at_k:.2f}")
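For readers who want to see what these three numbers measure, here is a small sketch using the standard textbook definitions of Hit Rate@k, MRR@k, and Recall@k. It is not the code behind `evaluate_retrieval`, whose exact conventions are not shown here; the `retrieval_metrics` function and the sample ids are hypothetical.

```python
def retrieval_metrics(retrieved, relevant, k=5):
    """Average Hit Rate@k, MRR@k, and Recall@k over a set of queries.

    retrieved: one ranked list of doc ids per query (best first).
    relevant:  one set of gold doc ids per query.
    """
    if not retrieved:
        return {"hit_rate": 0.0, "mrr": 0.0, "recall": 0.0}

    hits = 0
    reciprocal_ranks = 0.0
    recalls = 0.0
    for ranked, gold in zip(retrieved, relevant):
        top = ranked[:k]
        found = {doc_id for doc_id in top if doc_id in gold}
        # Hit Rate: did at least one relevant doc appear in the top k?
        hits += 1 if found else 0
        # MRR: 1 / rank of the first relevant doc, or 0 if none was found.
        for rank, doc_id in enumerate(top, start=1):
            if doc_id in gold:
                reciprocal_ranks += 1.0 / rank
                break
        # Recall: share of the gold docs that appear in the top k.
        recalls += len(found) / len(gold) if gold else 0.0

    n = len(retrieved)
    return {
        "hit_rate": hits / n,
        "mrr": reciprocal_ranks / n,
        "recall": recalls / n,
    }


# Three made-up queries: a hit at rank 2, a partial hit at rank 1, and a miss.
retrieved = [["d1", "d2", "d3"], ["d4", "d5", "d6"], ["d7", "d8"]]
relevant = [{"d2"}, {"d4", "d9"}, {"d1"}]

metrics = retrieval_metrics(retrieved, relevant, k=3)
for name, value in metrics.items():
    print(f"{name}@3: {value:.2f}")
```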