Hybrid RAG retrieval combines searches from an original question embedding and a reformulated question embedding to retrieve more comprehensive context for Retrieval-Augmented Generation. A single query embedding may miss relevant documents that a reformulated version of the question would surface. Fusing both result sets captures relevant context from multiple perspectives.
This code example defines a reusable retrieval function that generates embeddings for both the original and reformulated questions, searches with each embedding, and fuses the results using Reciprocal Rank Fusion (RRF). It then builds a context string from the retrieved documents for use in an LLM prompt.
In production, use an actual embedding model to generate question embeddings and an LLM to reformulate the user’s question. The example below uses random vectors as placeholders to demonstrate the retrieval and fusion pattern.
from actian_vectorai import VectorAIClient, reciprocal_rank_fusion
import random

COLLECTION = "documents"
DIMENSION = 384

def hybrid_rag_retrieval(client, user_question, top_k=5):
    """
    Perform hybrid retrieval for a RAG application.

    Combines:
    1. Semantic search on the original question
    2. Semantic search on a reformulated question
    3. RRF fusion, returning the top-k most relevant documents
    """
    # In production, use an actual embedding model:
    # question_embedding = embed_model.encode(user_question)
    question_embedding = [random.gauss(0, 1) for _ in range(DIMENSION)]

    # Reformulate the question (in production, use an LLM):
    # reformulated = llm.reformulate(user_question)
    # reformulated_embedding = embed_model.encode(reformulated)
    reformulated_embedding = [random.gauss(0.1, 0.95) for _ in range(DIMENSION)]

    # Search with both query embeddings
    original_results = client.points.search(
        COLLECTION,
        vector=question_embedding,
        limit=15,
        with_payload=True
    )
    reformulated_results = client.points.search(
        COLLECTION,
        vector=reformulated_embedding,
        limit=15,
        with_payload=True
    )

    # Fuse the two ranked result lists with Reciprocal Rank Fusion
    fused = reciprocal_rank_fusion(
        [original_results, reformulated_results],
        ranking_constant_k=60,
        limit=15
    )

    # Return the top-k fused results for context
    return fused[:top_k]

# Usage in a RAG pipeline
with VectorAIClient("localhost:50051") as client:
    user_question = "How do I reset my password?"

    # Retrieve relevant context
    context_docs = hybrid_rag_retrieval(client, user_question, top_k=3)

    # Build the context string for the LLM
    context = "\n\n".join([
        doc.payload.get('text', '')
        for doc in context_docs
    ])

    print(f"Retrieved {len(context_docs)} context documents for RAG")
    print(f"\nContext for LLM ({len(context)} chars):")
    print(context[:500] + "...")

    # In production: pass context + question to an LLM
    # response = llm.generate(question=user_question, context=context)
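Each client.points.search call is, conceptually, a nearest-neighbor lookup: the server compares the query vector against the stored vectors and returns the closest matches, using an index rather than a linear scan. As a minimal sketch of that idea, the following brute-force version scores a toy in-memory collection by cosine similarity (the metric is an assumption for illustration; your collection may be configured for dot product or Euclidean distance instead):

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def brute_force_search(query, points, limit=15):
    # points: list of (id, vector, payload) tuples
    scored = [
        (pid, cosine_similarity(query, vec), payload)
        for pid, vec, payload in points
    ]
    # Highest similarity first
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:limit]

# Toy collection with 3-dimensional vectors
points = [
    ("a", [1.0, 0.0, 0.0], {"text": "Reset your password from the login page."}),
    ("b", [0.0, 1.0, 0.0], {"text": "Billing FAQ."}),
    ("c", [0.9, 0.1, 0.0], {"text": "Password recovery via email."}),
]
results = brute_force_search([1.0, 0.0, 0.0], points, limit=2)
print([pid for pid, score, payload in results])  # ['a', 'c']
```

The real service avoids this O(n) scan by using an approximate nearest-neighbor index, but the ranking contract is the same: results ordered by similarity to the query vector.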
The hybrid RAG retrieval function (hybrid_rag_retrieval in Python, hybridRagRetrieval in JavaScript) returns a list of fused results, each containing:
- id: The unique identifier of the matching point
- score: Fused score from RRF across both query searches
- payload: Metadata object containing the document text and any additional fields
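The library's reciprocal_rank_fusion computes the fused score for you. The standard RRF formula it corresponds to gives each document a score of 1 / (k + rank) from every result list it appears in, where rank starts at 1 and k is the ranking constant (60 in the example above). A minimal sketch, assuming that formula (rrf_fuse here is a toy stand-in, not the library function):

```python
def rrf_fuse(result_lists, k=60, limit=15):
    # Each result list is a ranked sequence of document ids, best first.
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            # A document gains 1 / (k + rank) from each list it appears in
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]

# "d2" is only second in each individual search, but it appears in
# both lists, so its summed RRF score beats every single-list winner.
original = ["d1", "d2", "d3"]
reformulated = ["d4", "d2", "d5"]
print(rrf_fuse([original, reformulated], k=60, limit=3))
```

This is why fusion rewards documents that both the original and the reformulated question retrieve: their contributions accumulate across lists.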
Hybrid RAG retrieval improves context quality by:
- Capturing relevant documents that a single query might miss
- Combining signals from both the original and reformulated question
- Providing more diverse context to the LLM for answer generation
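The final step left commented out in the example, passing context plus question to the LLM, typically amounts to filling a prompt template with the fused documents. A minimal sketch (the template wording and the build_rag_prompt helper are illustrative assumptions, not part of the library; it also takes plain dicts rather than the point objects returned by the client):

```python
def build_rag_prompt(question, context_docs):
    # Join retrieved document texts into a single context block,
    # mirroring the "\n\n".join(...) step in the pipeline above
    context = "\n\n".join(doc.get("text", "") for doc in context_docs)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

docs = [
    {"text": "To reset your password, click 'Forgot password' on the login page."},
    {"text": "Password reset links expire after 24 hours."},
]
prompt = build_rag_prompt("How do I reset my password?", docs)
print(prompt)
```

Grounding the prompt in retrieved context this way is what lets the LLM answer from your documents instead of from its training data alone.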