- Recall — the fraction of truly relevant results that the system returns. If there are 10 relevant documents and the system finds 8, recall is 80%.
- Precision — the fraction of returned results that are actually relevant. If the system returns 10 results and 7 are relevant, precision is 70%.
- Measure — Establish a ground truth baseline with exact search.
- Tune HNSW — Adjust `m`, `ef_construct`, and `hnsw_ef` for the recall–speed trade-off.
- Choose distance — Pick the right metric for your embeddings.
- Configure quantization — Compress vectors without destroying accuracy.
- Adjust search-time parameters — Use `hnsw_ef`, `rescore`, and `oversampling`.
- Use multi-stage prefetch — Widen the candidate pool, then re-rank.
- Apply payload indexes — Accelerate filtered search.
- Set score thresholds — Cut noise at the right level.
- Rebuild and compact — Keep index quality fresh after updates.
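The recall and precision definitions at the top of this checklist reduce to simple set arithmetic. A minimal sketch, using synthetic document IDs rather than real search output:

```python
def precision_recall(returned_ids, relevant_ids):
    """Precision: relevant fraction of what was returned.
    Recall: returned fraction of what was relevant."""
    returned, relevant = set(returned_ids), set(relevant_ids)
    true_positives = len(returned & relevant)
    precision = true_positives / len(returned) if returned else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall

# A system returns 10 results; 8 of the 10 truly relevant documents are among them.
returned = list(range(10))                      # ids 0..9 returned
relevant = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11}    # 10 relevant ids, 8 of them returned
p, r = precision_recall(returned, relevant)     # p = 0.8, r = 0.8
```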
Environment setup
Run the following command to install the Python packages required for all code samples in this tutorial.
Step 1: Create a test collection and ingest data
This step sets up the shared imports, constants, and embedding helpers used throughout the tutorial. Running this block loads the `all-MiniLM-L6-v2` model, defines two encoding functions, and establishes the server address and collection name that all subsequent steps reference.
Expected Output
This block embeds all 30 corpus documents using `all-MiniLM-L6-v2`, constructs `PointStruct` objects pairing each vector with its text and category payload, upserts them into the Retrieval-Quality collection using cosine distance and default HNSW settings (`m=16`, `ef_construct=128`), flushes the writes to disk, and then queries the server for the total stored vector count to confirm the ingestion completed successfully.
Step 2: Establish a baseline with exact search
Before tuning anything, measure ground truth. An exact (brute-force) search scans every vector in the collection and returns the mathematically correct nearest neighbours with 100% recall. Every tuning step in this tutorial should be measured against this baseline.
The following block defines three functions: `exact_search`, which runs a brute-force scan; `approx_search`, which uses the HNSW index with an optional `hnsw_ef` override; and `compute_recall`, which calculates what fraction of the exact top-K results the approximate search also returned. Running the block then calls both search functions with the same input and prints a side-by-side comparison.
Expected Output
This block queries the same sentence — “How do neural networks learn from data?” — using both `exact_search` (brute-force scan with `SearchParams(exact=True)`) and `approx_search` (HNSW traversal). It then calls `compute_recall` to measure what fraction of the exact top-10 IDs appear in the approximate results. When both result sets share identical IDs, recall reaches 100%, confirming the HNSW index is producing no approximation error on this query.
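The recall computation described here is a set intersection over ranked ID lists. A self-contained sketch (the ID lists below are synthetic stand-ins for real search responses):

```python
def compute_recall(exact_ids, approx_ids):
    """Fraction of the exact top-K IDs that the approximate search also returned."""
    if not exact_ids:
        return 0.0
    return len(set(exact_ids) & set(approx_ids)) / len(exact_ids)

exact_top10  = [3, 7, 1, 9, 4, 12, 8, 15, 2, 6]   # ground truth (exact scan)
approx_top10 = [3, 7, 1, 9, 4, 12, 8, 15, 2, 6]   # HNSW result, identical here
recall = compute_recall(exact_top10, approx_top10)  # 1.0 -> no approximation error
```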
Step 3: Tune HNSW index parameters
The HNSW index has two sets of parameters: build-time parameters that affect the quality of the graph structure stored on disk, and search-time parameters that affect how many nodes the query traverses at runtime.
Build-time parameters: m and ef_construct
`m` and `ef_construct` are set when creating the collection. Once set, changing them requires recreating the index. The following block defines a helper that creates a new collection for a given `m` and `ef_construct` combination and ingests the full corpus into it, then runs that helper four times to produce collections at low, default, high, and maximum index quality.
| Parameter | Low | Default | High | Max |
|---|---|---|---|---|
| `m` | 4 | 16 | 32 | 64 |
| `ef_construct` | 32 | 128 | 256 | 512 |
| Build speed | Fastest | Fast | Slower | Slowest |
| Memory usage | Lowest | Moderate | Higher | Highest |
| Recall potential | Lower | Good | Better | Best |
- `m` — The number of bi-directional links created for each node. Higher values produce a denser graph with more traversal paths, which improves recall at the cost of memory and build time.
- `ef_construct` — The search width used during index construction. Higher values produce a better-connected graph. Set this to at least `2 * m`.
Measure recall across configurations
The following block queries the same sentence against all four collections and computes recall against the exact baseline for each, printing a row per configuration so the effect of each parameter level is directly visible.
Expected Output
This block embeds the query “How do neural networks learn from data?”, fetches exact ground-truth results from the baseline collection, then runs the same approximate search against each of the four HNSW collections in turn. For each collection it computes recall@10 against the exact baseline and prints one row per configuration, making the impact of `m` and `ef_construct` on retrieval accuracy directly visible. The low configuration uses a sparse graph that misses some traversal paths; default and above close that gap entirely.
Search-time parameter: hnsw_ef
`hnsw_ef` controls how many candidate nodes the search explores at query time. It can be set per request without rebuilding the index, which makes it the primary knob for trading latency against recall at runtime. The following block sweeps six values of `hnsw_ef`, runs an approximate search at each value, and prints the resulting recall and wall-clock latency so you can identify the point where accuracy plateaus.
Expected Output
This block sweeps six values of `hnsw_ef` — 16, 32, 64, 128, 256, and 512 — against the query “How do neural networks learn from data?”. For each value it runs an approximate search, measures wall-clock latency in milliseconds using `time.perf_counter`, and computes recall against the exact ground-truth baseline. The output shows how recall improves from low to high `hnsw_ef` values while latency increases proportionally, helping you identify the point where accuracy plateaus and further latency buys nothing.
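The recall-versus-`ef` behaviour can be modelled without a server. The toy below is not real HNSW: its "approximate" search scores only the first `ef` points of a fixed random visiting order, so a wider `ef` examines more of the collection and recall against the exact scan rises monotonically, mirroring the sweep described above:

```python
import random
import time

random.seed(0)
DIM, N, TOP_K = 8, 2000, 10
data = [[random.random() for _ in range(DIM)] for _ in range(N)]
query = [random.random() for _ in range(DIM)]

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def exact_search(k=TOP_K):
    # Brute force: score every vector, keep the k closest.
    return sorted(range(N), key=lambda i: dist(query, data[i]))[:k]

# Stand-in for graph traversal: only the first `ef` points of a fixed
# random visiting order are ever scored.
visit_order = list(range(N))
random.shuffle(visit_order)

def approx_search(ef, k=TOP_K):
    candidates = visit_order[:ef]
    return sorted(candidates, key=lambda i: dist(query, data[i]))[:k]

truth = set(exact_search())
recalls = {}
for ef in (16, 64, 256, 1024, N):
    t0 = time.perf_counter()
    found = approx_search(ef)
    latency_ms = (time.perf_counter() - t0) * 1000
    recalls[ef] = len(truth & set(found)) / TOP_K
```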
The following table maps `hnsw_ef` ranges to their typical recall and latency characteristics. As a starting point, set `hnsw_ef` to at least the value of your `limit` (top-K), and ideally 2–4x larger.
| `hnsw_ef` | Recall | Latency | Use case |
|---|---|---|---|
| 16–32 | Lower | Fastest | Real-time autocomplete, high QPS |
| 64–128 | Good | Fast | General search, most applications |
| 256–512 | Excellent | Slower | High-accuracy retrieval, RAG |
| Exact mode | Perfect | Slowest | Evaluation, ground truth |
Step 4: Choose the right distance metric
The distance metric defines what the index considers “similar”. Choosing the wrong metric for your embedding model produces systematically lower recall regardless of any other tuning. The following block creates one collection per metric, ingests the full corpus into each, runs the same query, and prints the top results with their scores so you can see how each metric ranks the same documents differently.
| Embedding model | Recommended metric | Why |
|---|---|---|
| `all-MiniLM-L6-v2` | Cosine | Produces normalized vectors. |
| `text-embedding-3-small` | Cosine | Normalized by default. |
| CLIP (ViT-B-32) | Cosine | Normalized image/text embeddings. |
| Custom non-normalized | Dot or Euclid | Magnitude carries meaning. |
| Sparse/hybrid | Dot | Standard for TF-IDF/BM25. |
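The metrics themselves are a few lines of plain Python, and the key fact behind the table — that for unit-length vectors cosine similarity and dot product coincide — is easy to verify:

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    # Cosine divides out both magnitudes, so only direction matters.
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

a, b = [3.0, 4.0], [4.0, 3.0]
# After normalization, dot product equals cosine similarity.
same = abs(cosine(a, b) - dot(normalize(a), normalize(b))) < 1e-9
```

This is why models that emit normalized vectors (such as `all-MiniLM-L6-v2`) work equally well with Cosine or Dot, while non-normalized embeddings need Dot or Euclid to preserve magnitude information.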
Step 5: Configure quantization without losing accuracy
Scalar quantization compresses 32-bit float vectors to 8-bit integers, reducing memory by 4x. The compression introduces scoring noise that lowers recall. The `rescore` and `oversampling` parameters recover that accuracy by fetching a larger candidate pool using quantized scores and then re-ranking it with the original full-precision vectors. The following block creates a quantized collection and runs three search modes against it — no rescoring, rescoring with 2x oversampling, and quantization disabled — then prints the recall each mode achieves so the trade-off is directly visible.
Expected Output
This block queries “How do transformers process sequences?” against a collection configured with scalar quantization at quantile 0.99 and `always_ram=True`. It runs three search modes in sequence — quantized scores only (`rescore=False`), quantized candidate fetch with full-precision re-ranking (`rescore=True, oversampling=2.0`), and full-precision scoring with quantization bypassed (`ignore=True`) — then computes recall@10 for each mode against an exact baseline from the same collection. The results show how rescoring with oversampling recovers the accuracy lost to compression while retaining most of its speed and memory benefit.
| Mode | Speed | Accuracy | Memory |
|---|---|---|---|
| `ignore=False, rescore=False` | Fastest | Lower | Lowest (quantized only). |
| `ignore=False, rescore=True, oversampling=2.0` | Fast | High | Quantized + originals for rescore. |
| `ignore=True` | Slower | Perfect | Full precision. |
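The fetch-then-rescore mechanism can be illustrated end-to-end without a server. The sketch below uses a deliberately crude global int8-style quantizer (real scalar quantization stores per-vector parameters), ranks candidates by quantized dot product, and then re-ranks an oversampled pool with the original full-precision vectors:

```python
import random

random.seed(1)
DIM, N, TOP_K, OVERSAMPLING = 16, 500, 10, 2.0
data = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(N)]
query = [random.gauss(0, 1) for _ in range(DIM)]

def quantize(v, lo=-4.0, hi=4.0):
    """Map each float component onto a 0..255 bucket (int8-style)."""
    step = (hi - lo) / 255
    return [min(255, max(0, round((x - lo) / step))) for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

q_query = quantize(query)
q_data = [quantize(v) for v in data]

def search(k=TOP_K, rescore=False):
    pool = int(k * OVERSAMPLING) if rescore else k
    # Stage 1: cheap, noisy ranking on quantized codes.
    cand = sorted(range(N), key=lambda i: -dot(q_query, q_data[i]))[:pool]
    if rescore:
        # Stage 2: re-rank the oversampled pool at full precision.
        cand = sorted(cand, key=lambda i: -dot(query, data[i]))[:k]
    return cand

truth = set(sorted(range(N), key=lambda i: -dot(query, data[i]))[:TOP_K])
recall_quantized = len(truth & set(search(rescore=False))) / TOP_K
recall_rescored = len(truth & set(search(rescore=True))) / TOP_K
# Rescoring can only help: its candidate pool is a superset of the raw top-K.
```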
Step 6: Use multi-stage prefetch to widen the candidate pool
A single HNSW traversal only explores one path through the graph. If the most relevant documents sit in a different region of the vector space — for example, in a specific category — that path may never reach them. Multi-stage prefetch runs several candidate-gathering passes in parallel, then re-ranks the combined pool. The following block runs a standard single-pass search and a three-stage prefetch side-by-side and prints the ranked results of each so you can compare which approach surfaces more relevant documents.
| Approach | Candidate pool | Trade-off |
|---|---|---|
| Single search | Top-K from one pass. | May miss results outside the HNSW traversal path. |
| Prefetch (unfiltered) | Broader initial pool, then re-rank. | Catches near-misses from the same vector region. |
| Prefetch (multi-filter) | Candidates from different payload categories. | Ensures diversity across category boundaries. |
| Prefetch (multi-vector) | Candidates from different embedding spaces. | Enables cross-perspective matching. |
The final query with `limit=5` re-ranks from the union of all prefetched candidates. Even if one prefetch path misses a relevant result, another path may find it.
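The union-then-re-rank step is simple to sketch in isolation; the candidate pools and scores below are synthetic stand-ins for the three prefetch passes:

```python
# Each prefetch pass returns (id, score) pairs from a different region/filter.
pass_a = [(1, 0.91), (2, 0.84), (3, 0.77)]    # unfiltered dense pass
pass_b = [(7, 0.88), (2, 0.84), (8, 0.69)]    # category-filtered pass
pass_c = [(9, 0.86), (1, 0.91), (10, 0.61)]   # second-category pass

def fuse(pools, limit):
    """Union the prefetched pools, keep the best score per id, then re-rank."""
    best = {}
    for pool in pools:
        for doc_id, score in pool:
            if doc_id not in best or score > best[doc_id]:
                best[doc_id] = score
    ranked = sorted(best, key=lambda doc_id: -best[doc_id])
    return ranked[:limit]

top5 = fuse([pass_a, pass_b, pass_c], limit=5)
# Documents 7 and 9 never appear in the unfiltered pass,
# but survive the fused re-ranking.
```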
Step 7: Accelerate filtered search with payload indexes
Without a payload index, every filtered search scans all stored payloads to evaluate the filter condition, making filter latency grow linearly with collection size. A payload index allows the server to look up matching points directly, reducing filter time to a constant-cost lookup. The following block runs the same category-filtered search twice — once before creating a keyword index on `category` and once after — and prints the latency of each run so the speedup is measurable.
| Scenario | Index needed? |
|---|---|
| Filtering on a field in most queries. | Yes — significant speedup. |
| Filtering on a field rarely. | Maybe — adds memory overhead. |
| Ordering by a field (`OrderBy`). | Yes — mark as `is_principal=True`. |
| Field has very few distinct values (for example, boolean). | Smaller benefit but still useful. |
| Field has high cardinality (for example, `user_id`). | Yes — consider `is_tenant=True` for keyword. |
| Filter pattern | Index type | Parameters |
|---|---|---|
| `Field("status").eq("active")` | Keyword | `KeywordIndexParams()` |
| `Field("price").between(10, 100)` | Float | `FloatIndexParams(is_principal=True)` |
| `Field("created_at").datetime_gte(...)` | Datetime | `DatetimeIndexParams(is_principal=True)` |
| `Field("location").geo_radius(...)` | Geo | `GeoIndexParams()` |
| `Field("description").text("keyword")` | Text | `TextIndexParams(lowercase=True)` |
| `Field("count").gte(5)` | Integer | `IntegerIndexParams(range=True)` |
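The performance difference between the two strategies is easy to model: a linear scan evaluates the filter against every payload, while a keyword index is one dictionary lookup. The corpus and categories below are synthetic:

```python
from collections import defaultdict

# Toy corpus of (id, category) payloads.
points = [(i, ("ml", "db", "geo")[i % 3]) for i in range(30)]

def filtered_scan(category):
    """No payload index: O(N) — test the filter against every stored payload."""
    return [pid for pid, cat in points if cat == category]

# Keyword-style payload index: category -> list of matching point ids.
index = defaultdict(list)
for pid, cat in points:
    index[cat].append(pid)

def filtered_lookup(category):
    """With the index: a single dictionary lookup, independent of N."""
    return index[category]
```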
Step 8: Set the right score threshold
A score threshold rejects any result whose similarity score falls below a minimum value. Setting it too low returns noisy, irrelevant results. Setting it too high discards valid matches. The right threshold depends on the score distribution of your specific embedding model and corpus. The following block fetches 30 results using exact search and then applies seven different thresholds to that result set, printing precision, recall, and result count at each level so you can identify the threshold that balances the two for your workload.
Expected Output
This block runs `threshold_analysis` with the query “machine learning and neural network training” and the known-relevant set {5, 6, 7, 8, 9, 10, 11}. It fetches the top 30 results using exact search, then applies seven score thresholds from 0.2 to 0.8. For each threshold it counts how many returned results are truly relevant (true positives), then prints precision (fraction of returned results that are relevant) and recall (fraction of relevant documents that were returned). The table shows the precision–recall trade-off as the threshold tightens, helping you choose the cutoff that best fits your workload.
- At threshold 0.5 — 87.5% precision, 100% recall — a good general-purpose cutoff.
- At threshold 0.6 — 100% precision, 100% recall — optimal for this query.
- At threshold 0.7 — 100% precision but only 57% recall — too aggressive for full coverage.
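The sweep itself is plain arithmetic over a ranked result list. The scores and relevance labels below are synthetic, not the tutorial's actual output, but the mechanics are identical:

```python
# Ranked (doc_id, score) results from one query, plus ground-truth labels.
results = [(5, 0.82), (6, 0.79), (7, 0.74), (8, 0.68), (9, 0.64),
           (10, 0.61), (11, 0.55), (3, 0.48), (17, 0.41), (22, 0.33)]
relevant = {5, 6, 7, 8, 9, 10, 11}   # known-relevant document ids

def threshold_analysis(results, relevant, thresholds):
    rows = []
    for t in thresholds:
        kept = [doc_id for doc_id, score in results if score >= t]
        true_positives = len(set(kept) & relevant)
        precision = true_positives / len(kept) if kept else 0.0
        recall = true_positives / len(relevant)
        rows.append((t, precision, recall, len(kept)))
    return rows

rows = threshold_analysis(results, relevant, [0.3, 0.5, 0.7])
# 0.3 keeps everything (full recall, lower precision);
# 0.7 keeps only 3 results (perfect precision, recall collapses).
```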
Step 9: Rebuild and compact for sustained quality
After many updates and deletions, index quality degrades over time. Deleted vectors leave tombstones that waste memory and slow search. Segments accumulate and fragment, reducing scan locality. The following block checks the collection state, runs a full rebuild, optimization, and compaction sequence, then checks the state again and prints both snapshots so you can confirm the collection returned to a healthy state.
Expected Output
This block records the collection’s vector count, state, and segment count before any maintenance, then sequentially calls `rebuild_index` to regenerate the HNSW graph from live vectors, `optimize` to merge fragmented segments, and `compact_collection` with `wait=True` to purge deleted-vector tombstones and reclaim memory. After each operation it prints a confirmation message, then flushes to disk and reads the collection state again to confirm the post-maintenance snapshot matches the pre-maintenance vector count with a clean, compacted structure.
| Operation | When to run | Impact |
|---|---|---|
| `rebuild_index` | After bulk updates (more than 20% of data changed). | Rebuilds HNSW graph for better recall. |
| `optimize` | Periodically (daily or weekly). | Merges small segments for better locality. |
| `compact_collection` | After many deletions. | Purges tombstones and reclaims memory. |
| `flush` | After any write operation. | Persists data to disk. |
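Conceptually, compaction is tombstone removal plus segment merging. The toy model below is an illustration of the idea, not the server's internal layout: deleted vectors are represented as `None` tombstones inside fragmented segments:

```python
# Fragmented segments; deletions left None tombstones in place.
segments = [
    [(0, [0.1, 0.2]), None, (2, [0.5, 0.1])],
    [None, (4, [0.3, 0.3])],
    [(5, [0.9, 0.4])],
]

def compact(segments):
    """Purge tombstones and merge fragments into one dense segment."""
    live = [point for segment in segments
            for point in segment if point is not None]
    return [live]

compacted = compact(segments)
# 6 slots before, 4 live vectors after; one dense segment remains.
```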
Step 10: Update HNSW config without rebuilding data
`collections.update` lets you change HNSW parameters on an existing collection without re-ingesting any data. This is useful when you start a project with conservative settings for fast iteration and want to raise quality before going to production. The following block reads the current configuration, applies higher `m` and `ef_construct` values, and then triggers an explicit rebuild so the new settings take effect on the existing index immediately.
Starting with low `m` and `ef_construct` values keeps build times fast during development. Increasing them before deployment raises the recall ceiling without requiring any data migration.
Retrieval quality checklist
The following tables summarize every lever available in Actian VectorAI DB for optimizing retrieval quality, grouped by when the parameter takes effect.
Collection-level settings (set once, rebuild to change)
These parameters are fixed at collection creation time. Changing them requires recreating the index.
| Lever | Parameter | Effect on quality |
|---|---|---|
| Distance metric | `Distance.Cosine` / `Dot` / `Euclid` / `Manhattan` | Defines similarity semantics. |
| HNSW connectivity | `HnswConfigDiff(m=16)` | Higher `m` = denser graph = better recall. |
| HNSW build quality | `HnswConfigDiff(ef_construct=128)` | Higher = better-connected graph. |
| Quantization | `QuantizationConfig(scalar=...)` | Reduces memory; needs rescore for accuracy. |
| Optimizer config | `OptimizersConfigDiff(indexing_threshold=...)` | Controls when the HNSW index is built. |
Query-level settings (adjustable per search)
These parameters can be tuned on every search request without changing or rebuilding the index.
| Lever | Parameter | Effect on quality |
|---|---|---|
| Search width | `SearchParams(hnsw_ef=128)` | Higher = more accurate, slower. |
| Exact mode | `SearchParams(exact=True)` | Perfect recall, no approximation. |
| Rescore after quantization | `QuantizationSearchParams(rescore=True)` | Recovers accuracy lost to quantization. |
| Oversampling | `QuantizationSearchParams(oversampling=2.0)` | Retrieves more candidates before rescoring. |
| Score threshold | `score_threshold=0.5` | Removes low-confidence results. |
| Multi-stage prefetch | `PrefetchQuery(query=..., filter=..., limit=20)` | Widens candidate pool from multiple angles. |
Operational maintenance (run periodically)
Run these operations on a schedule to keep retrieval quality from degrading as data changes over time.
| Lever | Method | Effect on quality |
|---|---|---|
| Index rebuild | `vde.rebuild_index()` | Refreshes HNSW graph after bulk changes. |
| Optimization | `vde.optimize()` | Merges segments for better locality. |
| Compaction | `vde.compact_collection()` | Purges deleted vectors and reclaims memory. |
| Payload indexing | `points.create_field_index()` | Accelerates filtered search. |
Next steps
With retrieval quality optimized, explore these related tutorials to continue building your search pipeline.
Similarity search basics
Learn the core retrieval workflow.
Predicate filters
Combine vector search with structured payload constraints.
Hybrid search patterns
Mix dense and sparse retrieval with fusion.
Geospatial search
Make retrieval location-aware.