- Not all queries need retrieval. “What is 2 + 2?” should skip the vector database entirely. Sending it through retrieval wastes latency and may inject irrelevant context.
- Different queries need different retrieval strategies. A factual lookup (“What is the capital of France?”) needs high-precision single-pass search. An exploratory question (“How does authentication work in the system?”) needs broad multi-stage retrieval across multiple document types.
- Retrieval confidence varies. If the top result has a score of 0.92, the LLM probably has enough context. If the best score is 0.35, the system should either try a different search strategy or tell the user it does not know.
- User feedback should improve future retrieval. When a user marks a response as unhelpful, the system should learn which documents were not relevant.
- A knowledge base collection with payload indexes for routing, feedback, and analytics.
- A keyword-signal query classifier that maps queries to four retrieval strategies.
- Three retrieval strategies (precise, broad multi-stage, and nested troubleshooting prefetch) plus an automatic fallback.
- A confidence evaluator that decides whether results are good enough or a fallback is needed.
- A user feedback loop that updates per-document usefulness scores over time.
- A feedback-aware retrieval function that boosts historically helpful documents.
- An analytics function that shows which documents are most retrieved and most useful.
- A prompt-assembly step that packages context and confidence instructions for any LLM.
Architecture overview
The following diagram shows how queries flow through the adaptive RAG pipeline, from classification through strategy selection, confidence evaluation, and the feedback loop back into the knowledge base:

Environment setup
Run the following command to install the Actian VectorAI SDK and the sentence-transformers library used for embedding:

Step 1: Import dependencies and configure the environment
The following block imports all SDK symbols used throughout the tutorial, loads the all-MiniLM-L6-v2 embedding model, and defines three constants (SERVER, COLLECTION, EMBED_DIM) that every subsequent step shares. Running it prints the active configuration so you can confirm the setup before proceeding:
Expected output
The three print calls confirm the active server address, the target collection name, and the embedding model with its vector dimensionality, so you can verify the configuration is correct before proceeding.
Step 2: Create the knowledge base collection
The following block creates the Adaptive-RAG collection with a cosine-distance HNSW index and registers six payload field indexes. Running it prints a confirmation message when the collection and all indexes are ready:
| Field | Purpose in adaptive RAG |
|---|---|
| doc_type | Route different query types to different document categories. |
| source | Filter by origin (API docs vs. tutorials vs. changelogs). |
| section | Narrow retrieval to specific parts of the documentation. |
| retrieval_count | Track which documents are retrieved frequently. |
| usefulness_score | Boost or demote documents based on user feedback. |
| created_at | Enable time-based range queries and filtering on document age. |
Step 3: Ingest documents into the knowledge base
The following block defines 20 sample documents across five categories (API reference, tutorials, conceptual guides, troubleshooting, and changelog) and upserts them into the collection. Each point is assigned initial metadata values: retrieval_count: 0, usefulness_score: 0.5, and a UTC timestamp. Running it prints the total number of documents confirmed in the collection:
Expected output
This block embeds all 20 document texts in a single batch call and upserts them as PointStruct objects, each carrying its source metadata alongside the initial tracking fields (retrieval_count: 0, usefulness_score: 0.5, feedback_count: 0). After upserting, it calls flush to persist the writes to disk and then queries get_vector_count to confirm the exact number of vectors now stored in the collection.
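The initial metadata can be captured in a small helper. The helper name and the example arguments below are assumptions for illustration; the field names and default values match the tutorial:

```python
from datetime import datetime, timezone

def make_payload(text: str, doc_type: str, source: str, section: str) -> dict:
    """Build the payload stored alongside each document vector."""
    return {
        "text": text,
        "doc_type": doc_type,
        "source": source,
        "section": section,
        # Initial tracking fields: never retrieved, neutral usefulness,
        # no feedback events yet.
        "retrieval_count": 0,
        "usefulness_score": 0.5,
        "feedback_count": 0,
        "created_at": datetime.now(timezone.utc).isoformat(),
    }
```

Starting usefulness_score at the neutral midpoint 0.5 means the feedback loop in step 11 can move a document in either direction from day one.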
Step 4: Build the query classifier
The classifier inspects keyword signals in a query and returns a ClassifiedQuery that names the query type and the target document categories to search. The following block defines the QueryType enum, the ClassifiedQuery dataclass, and the classify_query function, then runs it against four test queries and prints the assigned type and confidence for each:
Expected output
The classifier inspects each query for keyword signals and maps it to one of four QueryType values. The four test queries are designed to exercise every branch: a “how to use” phrase triggers factual, an open-ended “how does” triggers exploratory, an error-related phrase triggers troubleshooting, and a greeting triggers no_retrieval. Each line of output shows the assigned type right-aligned, the classifier’s confidence score, and the original query text.
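The SDK-independent routing logic can be sketched in plain Python. The keyword lists and confidence values below are illustrative assumptions, not the tutorial's exact signals:

```python
from dataclasses import dataclass
from enum import Enum

class QueryType(Enum):
    FACTUAL = "factual"
    EXPLORATORY = "exploratory"
    TROUBLESHOOTING = "troubleshooting"
    NO_RETRIEVAL = "no_retrieval"

@dataclass
class ClassifiedQuery:
    query: str
    query_type: QueryType
    confidence: float

def classify_query(query: str) -> ClassifiedQuery:
    q = query.lower()
    # Greetings and trivial arithmetic skip the vector database entirely.
    if any(s in q for s in ("hello", "hi there", "thanks")) or q.startswith("what is 2"):
        return ClassifiedQuery(query, QueryType.NO_RETRIEVAL, 0.9)
    # Error-related wording routes to the troubleshooting strategy.
    if any(s in q for s in ("error", "fail", "broken", "not working", "exception")):
        return ClassifiedQuery(query, QueryType.TROUBLESHOOTING, 0.8)
    # Open-ended "how does ..." questions get broad multi-stage retrieval.
    if q.startswith("how does") or "overview" in q:
        return ClassifiedQuery(query, QueryType.EXPLORATORY, 0.7)
    # Everything else defaults to a precise factual lookup.
    return ClassifiedQuery(query, QueryType.FACTUAL, 0.6)
```

Keyword matching is deliberately cheap: classification runs on every query, so it must add negligible latency compared with the retrieval it routes.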
Step 5: Strategy 1—Precise retrieval for factual queries
Factual queries require high precision. The following block defines precise_retrieval, which searches only within the specified document type categories, applies a score_threshold of 0.5 to discard low-similarity results, and uses hnsw_ef=256 to maximize recall accuracy. Running the test query prints each result’s score, document type, and a truncated text preview:
| Parameter | Setting | Rationale |
|---|---|---|
| hnsw_ef=256 | High | Factual queries need the right answer, not just a plausible one. |
| score_threshold=0.5 | Strict | Drops results below cosine 0.5; better to return nothing than noise. |
| doc_types filter | Focused | Searches only API reference and concepts for factual questions. |
| top_k=3 | Small | Factual answers are usually found in one or two documents. |
Step 6: Strategy 2—Broad multi-stage retrieval for exploratory queries
Exploratory queries need breadth across multiple document types. The following block defines broad_retrieval, which creates one prefetch stream per document type plus an unfiltered catch-all stream, then merges all candidates with RRF fusion. Running the test query prints each result’s score, document type, and a text preview:
hnsw_ef=128 per stream is a deliberate trade-off: the four parallel streams compensate for any individual miss, so per-stream precision matters less than overall breadth.
Step 7: Strategy 3—Troubleshooting retrieval with nested prefetch
Troubleshooting queries benefit from a wide net across both FAQ-style documents and changelogs, which often contain relevant fixes. The following block defines troubleshooting_retrieval, which uses a nested prefetch pipeline: inner prefetch stages gather candidates from troubleshooting docs and changelogs, DBSF fusion merges them, and a final re-rank pass uses the query vector to surface the most relevant results. Running the test query prints each result’s score, document type, and a text preview:
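DBSF (distribution-based score fusion) normalizes each stream's scores against that stream's own score distribution before summing, so streams with different score ranges can be merged fairly. A simplified client-side illustration of the idea, assuming each stream is a dict of raw scores; the server-side implementation may differ in detail:

```python
import statistics

def dbsf_merge(streams: list, limit: int = 5) -> list:
    """Merge several {doc_id: raw_score} streams via distribution-based
    score fusion: rescale each stream by mean +/- 3 standard deviations,
    then sum the normalized scores per document."""
    fused: dict = {}
    for scores in streams:
        vals = list(scores.values())
        mean = statistics.mean(vals)
        # Guard against zero spread when a stream has uniform scores.
        spread = statistics.pstdev(vals) or 1.0
        lo, hi = mean - 3 * spread, mean + 3 * spread
        for doc_id, s in scores.items():
            norm = (s - lo) / (hi - lo)  # map into a comparable band
            fused[doc_id] = fused.get(doc_id, 0.0) + norm
    ranked = sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:limit]
```

Unlike RRF, which only looks at ranks, DBSF preserves how far a score sits above its stream's typical value, which is why it suits the troubleshooting pipeline where changelog and FAQ scores have different distributions.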
Step 8: Build the confidence evaluator
After retrieval, the pipeline needs to decide whether the results are strong enough to pass to the LLM or whether a fallback is needed. The following block defines the RetrievalResult dataclass and the evaluate_confidence function, then runs it against a test query and prints the confidence level, top score, average score, and document count:
| Confidence | Condition | Action |
|---|---|---|
| high | Top score >= 0.6 and avg >= 0.35. | Proceed to LLM with full confidence. |
| medium | Top score >= 0.35. | Proceed but add a caveat: “Based on available information…” |
| low | Top score < 0.35. | Try the fallback strategy or respond with “I don’t know.” |
| no_results | Empty result set. | Skip retrieval, answer directly or say “No relevant docs found.” |
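The thresholds in the table translate directly into a small evaluator. A sketch, assuming results carry a similarity score field (the dataclass shape is an assumption):

```python
from dataclasses import dataclass

@dataclass
class RetrievalResult:
    doc_id: str
    score: float
    text: str = ""

def evaluate_confidence(results: list) -> str:
    """Map a result set to a confidence label using the table's thresholds."""
    if not results:
        return "no_results"
    top = max(r.score for r in results)
    avg = sum(r.score for r in results) / len(results)
    if top >= 0.6 and avg >= 0.35:
        return "high"       # strong top hit backed by decent support
    if top >= 0.35:
        return "medium"     # plausible hit; answer with a caveat
    return "low"            # trigger fallback or admit uncertainty
```

Checking the average as well as the top score prevents a single lucky match surrounded by noise from being labeled high confidence.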
Step 9: Fallback strategy—Widen the search
When initial retrieval has low confidence, the fallback strategy removes all filters, raises the candidate pool size, and merges the original results with an unfiltered search using client-side RRF. The following block defines fallback_retrieval, simulates a low-confidence query, and prints the fallback confidence level and the top results returned:
Sample.Random returns random points from the collection. In the fallback function above, it acts as a last-resort “did you mean?” response: if neither the original filtered search nor the unfiltered widening returns any results, the function returns these random documents so the user can see what is in the knowledge base and reformulate the query. Both fallback queries run inside a single client connection to avoid an extra round-trip.
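The client-side merge follows the standard reciprocal-rank-fusion formula, which can be sketched independently of the SDK; k=60 is the conventional constant from the RRF literature, not necessarily the SDK's default:

```python
def reciprocal_rank_fusion(result_lists: list, limit: int = 5, k: int = 60) -> list:
    """Merge several ranked lists of document IDs.

    Each list contributes 1 / (k + rank) per document, so documents
    that appear high in several lists accumulate the most weight."""
    scores: dict = {}
    for ranked in result_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    fused = sorted(scores, key=scores.get, reverse=True)
    return fused[:limit]
```

Because RRF only uses ranks, it is safe to merge the filtered and unfiltered result sets even though their raw similarity scores are not directly comparable.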
Step 10: Build the adaptive router
The router is the central coordinator. It classifies the incoming query, dispatches it to the appropriate retrieval strategy, evaluates the result confidence, invokes the fallback when needed, and increments a retrieval counter on every returned document. The following block defines the AdaptiveRAGRouter class:
Expected output
The demo_router function passes five representative queries through the full adaptive pipeline. Each query is first classified, then dispatched to the appropriate strategy—precise for factual, broad for exploratory, troubleshooting for error queries, and no_retrieval for the greeting. The final query about a non-existent “quantum flux capacitor” does not match any document closely enough, so the primary broad search scores poorly and the router automatically invokes the fallback strategy, producing the broad+fallback label with a low confidence rating and a reduced top score.
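Stripped of SDK calls, the routing logic reduces to a few steps. The sketch below injects the classifier, strategies, evaluator, and fallback as plain functions so the control flow can be tested without a live vector database; the function name and return shape are illustrative assumptions:

```python
def route_query(query: str, classify, strategies: dict, evaluate, fallback) -> dict:
    """Classify, dispatch to a strategy, evaluate confidence,
    and fall back when the primary strategy scores poorly."""
    qtype, _confidence = classify(query)
    if qtype == "no_retrieval":
        # Greetings and trivia never touch the vector database.
        return {"strategy": "none", "results": [], "confidence": "n/a"}
    results = strategies[qtype](query)
    confidence = evaluate(results)
    label = qtype
    if confidence in ("low", "no_results"):
        # Widen the search when the primary strategy fails.
        results = fallback(query)
        confidence = evaluate(results)
        label = f"{qtype}+fallback"
    return {"strategy": label, "results": results, "confidence": confidence}
```

Injecting the pieces keeps the router testable and makes it trivial to add a new strategy: register one more entry in the strategies dict and one more classifier branch.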
Step 11: User feedback loop
When a user marks a response as helpful or unhelpful, the usefulness_score of every retrieved document is updated. The following block defines record_feedback, simulates a helpful feedback event on a real retrieval result, and prints a confirmation with the number of documents updated:
| Scenario | Formula | Effect |
|---|---|---|
| Helpful feedback. | score += (1.0 - score) * 0.1 | Score rises asymptotically toward 1.0. |
| Unhelpful feedback. | score -= score * 0.15 | Score drops faster, penalizing poor results. |
| No feedback. | Score unchanged. | Stays at the default of 0.5. |
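The two update formulas from the table are easy to verify in isolation:

```python
def update_usefulness(score: float, helpful: bool) -> float:
    """Apply one feedback event to a document's usefulness score."""
    if helpful:
        # Asymptotic rise toward 1.0: big gains while the score is low,
        # diminishing gains as it approaches the ceiling.
        return score + (1.0 - score) * 0.1
    # Multiplicative decay: unhelpful feedback penalizes faster
    # than helpful feedback rewards, so poor documents sink quickly.
    return score - score * 0.15
```

The asymmetry is deliberate: a document needs a run of helpful votes to climb, but only a few unhelpful ones to fall, which keeps stale or misleading documents from lingering near the top.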
Step 12: Feedback-aware retrieval
The usefulness_score accumulated in step 11 can be used to bias future retrieval toward documents that users have consistently found helpful. The following block defines feedback_aware_retrieval, runs a test query, and prints each result’s score, usefulness score, retrieval count, document type, and a text preview:
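One way to see why the two-stream design boosts helpful documents is a client-side mock of the same fusion: documents above the usefulness threshold appear in both ranked lists and therefore accumulate extra RRF weight. The 0.4 threshold comes from the tutorial's strategy summary; the function itself is an illustrative assumption, not the SDK call:

```python
def feedback_aware_rank(candidates: list, limit: int = 3,
                        useful_threshold: float = 0.4, k: int = 60) -> list:
    """Rank (doc_id, similarity, usefulness_score) tuples by fusing
    a plain similarity ranking with a usefulness-filtered ranking."""
    by_sim = [d for d, _, _ in sorted(candidates, key=lambda c: c[1], reverse=True)]
    useful = [d for d, _, u in sorted(candidates, key=lambda c: c[1], reverse=True)
              if u >= useful_threshold]
    # RRF over both rankings: documents passing the usefulness filter
    # are scored twice and get a gentle, rank-based boost.
    scores: dict = {}
    for ranked in (by_sim, useful):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:limit]
```

In the test below, document b overtakes a despite slightly lower similarity, because a's poor feedback history excludes it from the second stream.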
Step 13: Analytics—what is the system learning?
As the system accumulates retrieval events and feedback, payload fields like retrieval_count and usefulness_score reflect its usage patterns. The following block queries the collection for the five most-retrieved documents, the five most-useful documents, and any documents that are frequently retrieved but consistently rated unhelpful, then prints all three groups:
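In-memory, the three analytics views reduce to two sorts and a filter. The cutoffs for "frequently retrieved but unhelpful" (at least 5 retrievals, score below 0.3) are illustrative assumptions, not the tutorial's exact values:

```python
def collection_analytics(docs: list, top_n: int = 5):
    """Summarize usage patterns over a list of payload dicts."""
    most_retrieved = sorted(docs, key=lambda d: d["retrieval_count"],
                            reverse=True)[:top_n]
    most_useful = sorted(docs, key=lambda d: d["usefulness_score"],
                         reverse=True)[:top_n]
    # Retrieved often but rated poorly: prime candidates for
    # rewriting, splitting, or removal from the knowledge base.
    problematic = [d for d in docs
                   if d["retrieval_count"] >= 5 and d["usefulness_score"] < 0.3]
    return most_retrieved, most_useful, problematic
```

The problematic list is the most actionable of the three: a document that keeps matching queries yet keeps disappointing users is actively degrading answers, not just taking up space.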
Step 14: Prepare the prompt for LLM integration
The final pipeline step assembles the retrieved context chunks and a confidence-adjusted instruction into a prompt string. The following block defines adaptive_rag_answer, runs it against four test queries, and prints the strategy, confidence level, and source document count for each. The actual LLM call is left as a stub (# answer = await llm.generate(prompt)) so any provider can be plugged in:
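The confidence-adjusted assembly can be sketched without any SDK involvement. The exact instruction wording per level is an assumption, apart from the "Based on available information…" caveat defined in step 8:

```python
INSTRUCTIONS = {
    "high": "Answer the question using the context below.",
    "medium": ("Answer the question using the context below, and preface "
               "the answer with 'Based on available information...'."),
    "low": ("The context below may not be relevant. If it does not answer "
            "the question, say you don't know rather than guessing."),
}

def build_prompt(query: str, chunks: list, confidence: str) -> str:
    """Package numbered context chunks and a confidence-adjusted
    instruction into a single provider-agnostic prompt string."""
    context = "\n\n".join(f"[{i}] {c}" for i, c in enumerate(chunks, start=1))
    instruction = INSTRUCTIONS.get(confidence, INSTRUCTIONS["low"])
    # The LLM call itself stays a stub, e.g.:
    # answer = await llm.generate(prompt)
    return f"{instruction}\n\nContext:\n{context}\n\nQuestion: {query}"
```

Because the confidence level only changes the instruction text, the same prompt builder serves every strategy, and swapping LLM providers never touches the retrieval pipeline.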
Step 15: Collection cleanup
The following block retrieves the current document count, flushes all pending writes to disk, and prints a confirmation. Uncomment the delete lines to remove the collection entirely:

Adaptive strategies summary
The following table summarizes the retrieval strategy, search configuration, and fusion method used for each query type:

| Query type | Strategy | Search config | Prefetch | Fusion | Threshold |
|---|---|---|---|---|---|
| Factual. | Precise. | hnsw_ef=256 | None. | None. | 0.5 |
| Exploratory. | Broad multi-stage. | hnsw_ef=128 | 4 streams (per doc_type + unfiltered). | Fusion.RRF | None. |
| Troubleshooting. | Nested prefetch. | Default. | 2 inner (FAQ + changelog) → DBSF → re-rank. | Fusion.DBSF inner. | None. |
| Low confidence. | Fallback. | hnsw_ef=256 | None (unfiltered) + Sample.Random. | Client-side RRF. | None. |
| Feedback-aware. | Boosted fusion. | Default. | 2 streams (all + useful). | Fusion.RRF | usefulness >= 0.4 |
APIs and features used in this tutorial
The following table lists every VectorAI DB API and feature demonstrated across the fifteen steps:

| Feature | API | Purpose |
|---|---|---|
| Collection creation. | collections.get_or_create(hnsw_config=...) | Knowledge base setup. |
| Semantic search. | points.search(params=SearchParams(hnsw_ef=256)) | Precise factual retrieval. |
| Score threshold. | points.search(score_threshold=0.5) | Cut low-confidence results. |
| Multi-stage prefetch. | PrefetchQuery(query=..., filter=..., limit=...) | Per-doc-type retrieval streams. |
| Nested prefetch. | PrefetchQuery(prefetch=[...]) | Three-stage troubleshooting pipeline. |
| Server-side RRF. | query={"fusion": Fusion.RRF} | Broad exploratory fusion. |
| Server-side DBSF. | query={"fusion": Fusion.DBSF} | Troubleshooting score-normalized fusion. |
| Random sampling. | query={"sample": Sample.Random} | Fallback discovery. |
| Client-side RRF. | reciprocal_rank_fusion(results, limit=...) | Fallback merge. |
| Payload updates. | points.set_payload(payload=...) | Feedback tracking, retrieval counters. |
| Payload ordering. | query(query={"order_by": OrderBy(...)}) | Most-retrieved, most-useful analytics. |
| Selective payload. | WithPayloadSelector(include=[...]) | Return only needed fields. |
| Keyword index. | FieldType.FieldTypeKeyword | Doc type and source filtering. |
| Float index (principal). | FloatIndexParams(is_principal=True) | Usefulness score ordering. |
| Integer index (range). | IntegerIndexParams(range=True) | Retrieval count range queries. |
| Datetime index. | DatetimeIndexParams(is_principal=True) | Index created_at for range queries and time-based filtering. |
| any_of filter. | Field("doc_type").any_of([...]) | Multi-value doc type matching. |
| gte / lt filters. | Field("usefulness_score").gte(0.4) | Feedback-based boosting. |
| Filtered count. | points.count(filter=..., exact=True) | Analytics per doc type. |
| Vector count. | vde.get_vector_count() | Collection statistics. |
| Flush. | vde.flush() | Persist pending writes. |
Next steps
Re-ranking search results
Improve relevance with multi-stage re-ranking
Building multi-modal systems
Add image search to your RAG pipeline
Optimizing retrieval quality
Tune HNSW, quantization, and search parameters
Predicate filters
Master the full Filter DSL