- Embedding models are general-purpose. A 384-dim model captures broad semantics but may not distinguish subtle relevance differences in your domain.
- ANN search is approximate. HNSW may miss some true nearest neighbours, especially with a conservative hnsw_ef.
- Relevance is multi-dimensional. A product search cares about semantic match, recency, popularity, and price, not just vector distance.
- Server-side re-ranking with prefetch + query — Retrieve broadly, then re-score with a different vector or metric.
- Quantization rescore — Search quantized vectors fast, then rescore from originals.
- Cross-encoder re-ranking — Use a dedicated cross-encoder model client-side.
- Payload-based re-ranking — Boost scores using structured metadata.
- Fusion re-ranking — Combine multiple retrieval signals and fuse.
- Cascaded multi-stage pipelines — Chain multiple re-ranking stages.
- Score threshold pruning — Cut low-confidence results after re-ranking.
Architecture overview
The following diagram shows how a query flows through a multi-stage re-ranking pipeline.

Environment setup
Install the Actian VectorAI Python SDK and the sentence-transformers library for embedding and cross-encoder models.

Step 1: Create a test collection and ingest data
This step uses a corpus of 25 technical documents across several topics to demonstrate how re-ranking improves result ordering.

Step 2: Baseline single-pass search
Before adding any re-ranking, establish what a single-pass vector search returns.

Step 3: Server-side re-ranking with prefetch
The most powerful re-ranking pattern in Actian VectorAI DB uses the query endpoint with prefetch. The server retrieves a broad candidate pool first, then re-scores it.
How prefetch re-ranking works
The two-stage flow fetches a broad set of candidates and then re-scores them in the final query. The prefetch stage can use its own hnsw_ef, while the final stage re-scores all prefetched candidates. To guarantee exact (brute-force) cosine similarity instead of another HNSW pass, set params=SearchParams(exact=True) on the outer query() call, as demonstrated in Step 4.
The pattern becomes much more powerful when the prefetch and final query use different signals — explored in the next section.
Step 4: Re-rank with higher accuracy (exact rescore)
Retrieve candidates with fast approximate search, then rescore them with exact (brute-force) computation.
Why this matters
The prefetch retrieves 25 candidates using the HNSW index (fast, approximate). The final query with params=SearchParams(exact=True) computes exact cosine similarity over those 25 candidates. This corrects any scoring inaccuracies from the approximate index without scanning the entire collection.
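The two-stage logic can be sketched in plain Python. This mirrors what the server does; the helper names and the candidate-pool shape below are illustrative, not Actian VectorAI SDK calls.

```python
import math

def cosine(a, b):
    """Exact cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def exact_rescore(query_vec, candidates, limit):
    """Re-score a prefetched candidate pool with exact cosine similarity.

    candidates: (point_id, vector) pairs returned by the approximate
    (HNSW) prefetch stage. Only this small pool is scanned, never the
    whole collection.
    """
    scored = [(pid, cosine(query_vec, vec)) for pid, vec in candidates]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:limit]

# Toy pool of 3 candidates; the exact pass may reorder what HNSW returned.
pool = [("a", [1.0, 0.0]), ("b", [0.7, 0.7]), ("c", [0.0, 1.0])]
top = exact_rescore([1.0, 0.1], pool, limit=2)
```

Because the pool is small (25 candidates in this step), the brute-force pass adds negligible latency.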
Step 5: Quantization rescore
When using scalar quantization for memory savings, the compressed vectors introduce scoring noise. The rescore parameter re-scores candidates using the original full-precision vectors.
How quantization rescore works
The three parameters below control whether and how the server re-scores candidates after a quantized search.

| Parameter | Effect |
|---|---|
| rescore=False | Rank by quantized (int8) scores only: fast but noisy |
| rescore=True | Re-compute scores from original float32 vectors: accurate |
| oversampling=2.0 | Retrieve 2x more candidates from the quantized index before rescoring |
- Search the int8 quantized index for limit * oversampling = 10 candidates
- Load the original float32 vectors for those 10 candidates
- Recompute exact cosine similarity
- Return the top 5 by exact score
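The steps above can be sketched as follows, assuming symmetric int8 scalar quantization of unit-length vectors (so a dot product stands in for cosine). The helper names are illustrative, not SDK calls.

```python
import math

def quantize(vec, scale=127.0):
    """Map float components in [-1, 1] to int8-range integers."""
    return [round(x * scale) for x in vec]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def search_with_rescore(query, originals, limit=5, oversampling=2.0):
    """Two-stage search: rank by noisy int8 scores, then rescore the
    survivors against the original full-precision vectors."""
    q_int = quantize(query)
    quantized = {pid: quantize(v) for pid, v in originals.items()}
    # Stage 1: keep limit * oversampling candidates by quantized score.
    pool = sorted(quantized,
                  key=lambda pid: dot(q_int, quantized[pid]),
                  reverse=True)[: int(limit * oversampling)]
    # Stage 2: rescore only those candidates with the float originals.
    rescored = sorted(pool,
                      key=lambda pid: dot(query, originals[pid]),
                      reverse=True)
    return rescored[:limit]
```

Only the oversampled pool (10 vectors here) is ever loaded at full precision, which is why the accuracy recovery is cheap.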
Step 6: Cross-encoder re-ranking (client-side)
A cross-encoder processes the query and each candidate together as a pair, producing a more precise relevance score than independent embeddings. This is the gold standard for re-ranking quality, but is too slow for first-pass retrieval.

Bi-encoder vs cross-encoder
The table below contrasts the two model types used in this pipeline.

| Model type | How it scores | Speed | Quality |
|---|---|---|---|
| Bi-encoder | Encode query and document separately, compute cosine | Fast (index-based) | Good |
| Cross-encoder | Encode query+document as a pair, output relevance score | Slow (per-pair) | Excellent |
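The client-side step is easy to express as a generic wrapper. In a real pipeline, score_fn would wrap a cross-encoder (for example, sentence-transformers' CrossEncoder(...).predict, which scores a list of query-document pairs); here a toy word-overlap scorer is injected so the re-ranking pattern itself is visible without downloading a model.

```python
def cross_encoder_rerank(query, candidates, score_fn, top_k=3):
    """Re-rank (point_id, text) candidates by pairwise scores.

    score_fn takes a list of (query, text) pairs and returns one
    relevance score per pair, like CrossEncoder.predict does.
    """
    pairs = [(query, text) for _, text in candidates]
    scores = score_fn(pairs)
    ranked = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)
    return [(pid, score) for (pid, _), score in ranked[:top_k]]

# Toy stand-in scorer: count query words that appear in the document.
def overlap_scorer(pairs):
    return [len(set(q.lower().split()) & set(d.lower().split()))
            for q, d in pairs]

docs = [("d1", "HNSW index tuning"),
        ("d2", "vector search with HNSW index"),
        ("d3", "payload filtering")]
top = cross_encoder_rerank("HNSW vector index", docs, overlap_scorer, top_k=2)
```

Because the scorer runs once per candidate pair, keep the candidate pool small (typically the 20-50 results of the first-stage retrieval).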
Step 7: Payload-based re-ranking (score boosting)
Combine vector similarity with structured metadata to create a composite relevance score.

How composite scoring works
The composite score blends three normalised signals into a single value.

| Signal | Weight | What it captures |
|---|---|---|
| Vector similarity | 0.5 | Semantic relevance to the query |
| Popularity | 0.3 | Community consensus on importance |
| Recency | 0.2 | Freshness of the content |
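A minimal sketch of the blend, using the weights from the table. The normalisation caps (popularity saturating at 1,000 and recency decaying linearly over a year) are illustrative assumptions, not values mandated by the tutorial.

```python
def composite_score(sim, popularity, age_days,
                    max_popularity=1000, max_age_days=365):
    """0.5 * similarity + 0.3 * normalised popularity + 0.2 * recency."""
    pop_norm = min(popularity / max_popularity, 1.0)   # cap at 1.0
    recency = max(1.0 - age_days / max_age_days, 0.0)  # linear decay
    return 0.5 * sim + 0.3 * pop_norm + 0.2 * recency

# A fresh, maximally popular, perfect semantic match scores 1.0.
best = composite_score(sim=1.0, popularity=1000, age_days=0)
```

Each signal must be normalised to a comparable range before weighting; otherwise the largest raw value (usually popularity counts) dominates the blend.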
Step 8: Fusion re-ranking
Run two different searches (for example, with different filters or different query formulations) and fuse the results.

How multi-stream fusion works
The server runs three independent retrieval streams and then fuses their rankings with RRF.

Step 9: Client-side fusion with weighted re-ranking
For full control over how different signals are blended, use client-side fusion with weights.

RRF vs DBSF for re-ranking
The two fusion methods differ in how they normalise scores before combining them.

| Method | Scoring | Best when |
|---|---|---|
| RRF | 1 / (k + rank), summed across lists | Score scales differ across lists |
| DBSF | Normalize by mean/std, then average | Scores are on comparable scales |
The weights parameter in RRF controls how much each list influences the final ranking. A weight of 0.6 on the topic-filtered list biases the fusion toward ML-specific results.
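Weighted RRF is straightforward to reproduce client-side. The sketch below implements the 1 / (k + rank) formula from the table; the function shape loosely follows the client-side helper this tutorial names, but treat the exact signature as an assumption.

```python
def reciprocal_rank_fusion(result_lists, weights=None, k=60):
    """Fuse ranked lists of point ids (best first) with weighted RRF.

    Each list contributes weight / (k + rank) per point; points in
    several lists accumulate score. k=60 is the conventional default.
    """
    weights = weights or [1.0] * len(result_lists)
    fused = {}
    for ids, w in zip(result_lists, weights):
        for rank, pid in enumerate(ids, start=1):
            fused[pid] = fused.get(pid, 0.0) + w / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)

semantic = ["a", "b", "c"]            # unfiltered stream
topic_filtered = ["b", "d", "a"]      # topic-filtered stream, weight 0.6
fused = reciprocal_rank_fusion([semantic, topic_filtered],
                               weights=[0.4, 0.6])
```

Because RRF uses ranks rather than raw scores, the two lists need not share a score scale, which is exactly why it suits heterogeneous retrieval streams.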
Step 10: Cascaded multi-stage pipeline
Chain multiple re-ranking stages using nested prefetch. Each stage narrows and refines the candidate pool.

How the cascade works
Each nested prefetch narrows the candidate pool while increasing scoring accuracy. The final exact rescore operates on just 15 candidates, fast enough to be imperceptible.
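The narrowing logic can be sketched as a chain of (scorer, keep) stages, cheapest first. The scorers and pool sizes below are toys; in the real pipeline each stage is a nested PrefetchQuery evaluated server-side.

```python
def cascade(candidates, stages):
    """Run successive (score_fn, keep) stages over a shrinking pool."""
    pool = list(candidates)
    for score_fn, keep in stages:
        pool = sorted(pool, key=score_fn, reverse=True)[:keep]
    return pool

# 100 candidates -> 40 (cheap, noisy score) -> 15 -> 5 (exact score).
# The coarse first stage misranks points 97-99, so they are pruned
# before the exact stage ever sees them: cascades trade recall for speed.
final = cascade(range(100), [(lambda x: x % 97, 40),   # coarse stage
                             (lambda x: x, 15),        # finer stage
                             (lambda x: x, 5)])        # exact stage
```

The example also shows the main risk of cascading: anything the cheap early stage drops can never be recovered by the accurate late stages, so early keep sizes should be generous.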
Step 11: Re-rank with score threshold pruning
After re-ranking, apply a score threshold to remove low-confidence results.

Threshold placement: before vs after re-ranking
Where you place the threshold determines which scores it evaluates.

| Approach | Behavior |
|---|---|
| score_threshold on the outer query() | Applied after the final rescore (recommended) |
| score_threshold on a PrefetchQuery | Applied within that prefetch stage; use it to pre-filter noisy candidates |
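The equivalent client-side operation is a simple filter over the already re-ranked results. This sketch assumes (id, score) pairs; the 0.5 cutoff is the same illustrative value used elsewhere in this tutorial.

```python
def prune(results, threshold=0.5):
    """Drop re-ranked results scoring below the threshold.

    results: (point_id, score) pairs, already sorted by the final
    re-ranking stage, so pruning preserves order.
    """
    return [(pid, score) for pid, score in results if score >= threshold]

ranked = [("a", 0.91), ("b", 0.64), ("c", 0.38)]
kept = prune(ranked)
```

Choose the threshold empirically against the score distribution of your re-ranked results; a cutoff tuned for raw HNSW scores will not transfer to exact or cross-encoder scores.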
Step 12: Full re-ranking pipeline
Bring everything together: server-side prefetch for broad retrieval, cross-encoder for precision re-ranking, and payload boosting for business signals.

The full pipeline
The four stages below show how each layer refines the candidate set before the final ranking.

Step 13: Collection cleanup
Flush the collection to disk and optionally delete it when you no longer need the tutorial data.

Re-ranking strategies compared
The table below summarises each strategy’s trade-offs to help you decide which to apply.

| Strategy | Where | Latency | Quality gain | When to use |
|---|---|---|---|---|
| Prefetch + rescore | Server | Low | Moderate | Always—baseline improvement |
| Exact rescore | Server | Low | Moderate | When HNSW approximation matters |
| Quantization rescore | Server | Low | High | When using scalar/product quantization |
| Cross-encoder | Client | High | Highest | When precision is critical (RAG, QA) |
| Payload boosting | Client | Negligible | Domain-dependent | When business signals matter (popularity, recency) |
| Fusion (RRF/DBSF) | Server or Client | Moderate | High | When combining multiple retrieval strategies |
| Cascaded pipeline | Server | Moderate | High | When you need progressive refinement |
| Score threshold | Server | Negligible | Precision gain | Always—remove noise from output |
Choosing your re-ranking strategy
Start with the simplest option and layer on additional stages as your relevance requirements grow.

Start simple
Begin with a prefetch and exact rescore. This is the lowest-effort improvement over a raw HNSW search.

Add cross-encoder when quality matters
When precision is critical (RAG pipelines, question-answering), add a cross-encoder stage after the initial retrieval.

Add payload signals for business relevance
If your application weights non-semantic signals like popularity or recency, combine them into a composite score.

Use fusion when you have multiple retrieval paths
When you run multiple searches with different filters or query formulations, fuse the results server-side.

Actian VectorAI features used
The table below summarises every Actian VectorAI DB API surface covered in this tutorial and its role in a re-ranking pipeline.

| Feature | API | Purpose |
|---|---|---|
| Prefetch + re-rank | PrefetchQuery(query=..., limit=...) | Broad retrieval before re-scoring |
| Exact rescore | SearchParams(exact=True) on outer query() | Brute-force re-ranking of candidates |
| Quantization rescore | QuantizationSearchParams(rescore=True, oversampling=2.0) | Recover accuracy after quantized search |
| Cascaded prefetch | Nested PrefetchQuery(prefetch=[...]) | Multi-stage progressive refinement |
| Server-side RRF | query={"fusion": Fusion.RRF} | Rank-based fusion of prefetch streams |
| Server-side DBSF | query={"fusion": Fusion.DBSF} | Score-normalized fusion |
| Client-side RRF | reciprocal_rank_fusion(results, weights=...) | Weighted client-side fusion |
| Client-side DBSF | distribution_based_score_fusion(results) | Score-aware client-side fusion |
| Score threshold | score_threshold=0.5 on query() or PrefetchQuery | Prune low-confidence results |
| Per-stage params | params=SearchParams(hnsw_ef=256) in PrefetchQuery | Different accuracy per stage |
| Payload filtering | FilterBuilder().must(Field(...).eq(...)) | Topic-specific retrieval streams |
| Similarity search | points.search(limit=...) | First-stage candidate retrieval |
| Universal query | points.query(query=..., prefetch=...) | Multi-stage re-ranking pipelines |
| Scalar quantization | QuantizationConfig(scalar=ScalarQuantization(...)) | Memory-efficient vector storage |
| Collection admin | vde.flush(), vde.get_vector_count() | Persistence and monitoring |
Next steps
Explore the tutorials below to put re-ranking into practice alongside other Actian VectorAI DB capabilities.

Building multi-modal systems
Combine text, image, and metadata embeddings with named vectors
Optimizing retrieval quality
Tune HNSW, quantization, and distance metrics
Predicate filters
Combine vector search with structured payload constraints
Similarity search basics
Learn the core retrieval workflow