Hybrid search runs multiple queries and a fusion step, so it takes longer than a single search. Benchmarking shows how large that latency cost is and helps you balance retrieval quality against speed for your use case.
This example assumes a documents collection already exists with indexed points. For collection setup, see Reciprocal Rank Fusion. The code below runs a single vector search and a hybrid search side by side, times each one, and prints both latencies along with the slowdown ratio so you can evaluate the tradeoff.
import random
import time

from actian_vectorai import VectorAIClient, reciprocal_rank_fusion

COLLECTION = "documents"
DIMENSION = 128  # Must match the vector size of the collection
NUM_QUERIES = 3

with VectorAIClient("localhost:50051") as client:
    query = [random.gauss(0, 1) for _ in range(DIMENSION)]
    # Build the query variations up front so the timer covers only search and fusion
    varied_queries = [
        [x + random.gauss(0, 0.05) for x in query] for _ in range(NUM_QUERIES)
    ]

    # Single search benchmark
    # perf_counter() is a monotonic, high-resolution clock intended for timing
    start = time.perf_counter()
    single_results = client.points.search(COLLECTION, vector=query, limit=10)
    single_time = time.perf_counter() - start

    # Hybrid search benchmark (3 sequential searches plus fusion)
    start = time.perf_counter()
    results_list = []
    for varied_query in varied_queries:
        results = client.points.search(COLLECTION, vector=varied_query, limit=10)
        results_list.append(results)
    hybrid_results = reciprocal_rank_fusion(results_list)
    hybrid_time = time.perf_counter() - start

    print(f"Single search: {single_time*1000:.2f}ms")
    print(f"Hybrid search: {hybrid_time*1000:.2f}ms ({hybrid_time/single_time:.1f}x slower)")
    print(f"\nSingle results: {len(single_results)}")
    print(f"Hybrid results: {len(hybrid_results)}")
The benchmark outputs these metrics:
  • Single search time: Baseline latency for one vector search
  • Hybrid search time: Total latency for multiple searches plus fusion
  • Slowdown ratio: How many times slower hybrid search is compared to single search
  • Result counts: Number of results from each approach
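A single timed run can be skewed by connection setup, cold caches, or scheduler noise, so the numbers above are best treated as a rough indication. The sketch below shows one way to repeat a measurement and report the median and 95th percentile using only the standard library. The `benchmark` helper and its `runs` and `warmup` parameters are hypothetical names introduced for illustration, and the lambda is a stand-in workload rather than a real search call.

```python
import math
import statistics
import time


def benchmark(fn, runs=20, warmup=3):
    """Time fn() over several runs and return latency statistics in milliseconds."""
    # Warm-up calls are discarded so one-time costs do not skew the measured runs.
    for _ in range(warmup):
        fn()

    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000)

    samples.sort()
    # Nearest-rank 95th percentile over the sorted samples.
    p95_index = max(0, math.ceil(0.95 * len(samples)) - 1)
    return {
        "median_ms": statistics.median(samples),
        "p95_ms": samples[p95_index],
        "min_ms": samples[0],
        "max_ms": samples[-1],
    }


# Stand-in workload (an assumption); replace it with a function that runs your search.
stats = benchmark(lambda: sum(range(10_000)))
print(f"median: {stats['median_ms']:.3f}ms, p95: {stats['p95_ms']:.3f}ms")
```

To compare the two approaches, you could wrap the single search and the hybrid search each in a zero-argument function and pass them to the helper in turn. Comparing medians is usually more stable than comparing two individual timings.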
Performance considerations for hybrid search:
  • Latency increases roughly linearly with the number of searches when they run one after another, as they do in this example
  • The fusion step adds minimal overhead compared to the search operations
  • Use smaller limit values on individual searches to reduce candidate processing
  • For latency-sensitive applications, balance the number of queries against acceptable response time
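To see why the fusion step is cheap relative to the searches, it helps to look at what reciprocal rank fusion computes. The sketch below is a plain-Python illustration of the standard RRF formula, in which each ID scores the sum of 1 / (k + rank) across the lists it appears in. It is not the implementation behind `reciprocal_rank_fusion`; the function name `rrf_fuse`, the constant `k=60`, and the synthetic ID lists are all assumptions made for this example.

```python
import time


def rrf_fuse(result_lists, k=60):
    """Fuse ranked lists of IDs with reciprocal rank fusion."""
    scores = {}
    for ranked_ids in result_lists:
        # Ranks start at 1, so the top hit in a list contributes 1 / (k + 1).
        for rank, point_id in enumerate(ranked_ids, start=1):
            scores[point_id] = scores.get(point_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)


# Three synthetic ranked lists of 10 IDs each, with partial overlap.
lists = [
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    [2, 1, 4, 3, 11, 12, 5, 13, 14, 15],
    [3, 2, 1, 16, 17, 4, 18, 19, 20, 5],
]

start = time.perf_counter()
fused = rrf_fuse(lists)
fusion_ms = (time.perf_counter() - start) * 1000

print(f"Fused {len(fused)} unique IDs in {fusion_ms:.4f}ms")
print(f"Top 3: {fused[:3]}")
```

The work is one dictionary update per candidate plus a sort, so its cost grows with the number of searches multiplied by `limit`, with no network round trip involved. In this sketch the fused list holds the union of all candidates, which is why it can be longer than any single input list; whether the library truncates its fused output is not covered here.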