all-MiniLM-L6-v2 for text (384-dim) and clip-ViT-B-32 for images (512-dim).
Architecture overview
The diagram below shows how a single collection stores two named vector spaces—one for text embeddings and one for image embeddings. At query time, the system embeds the user query into both spaces, prefetches candidates from each, and fuses the ranked lists server-side before returning a single result set.
Environment setup
Run the following command to install the three Python packages this tutorial depends on.
- actian-vectorai is the Actian VectorAI Python SDK, providing the async client, named vector support, server-side fusion, and gRPC transport.
- sentence-transformers generates text embeddings using all-MiniLM-L6-v2 and image embeddings using clip-ViT-B-32.
- pillow handles image loading and preprocessing.
Step 1: Import dependencies and configure the client
The block below imports the Actian VectorAI SDK alongside the embedding models, then sets the server address, collection name, and dimensionality constants for both vector spaces. Running it loads both models into memory and prints a confirmation of the active configuration.
Expected output
Running this block initializes the Actian VectorAI client, loads both the all-MiniLM-L6-v2 text model and the clip-ViT-B-32 image model into memory, and echoes the active server address, collection name, and the output dimensionality of each model. No collection is created at this stage—it simply confirms that all dependencies are loaded and the configuration constants are set.
Step 2: Define embedding helpers
Each modality has its own embedding function. CLIP maps both images and text into the same 512-dim space, while all-MiniLM-L6-v2 produces richer text representations in 384 dimensions. Running this block defines five helper functions but produces no output.
| Function | Model | Dim | Purpose |
|---|---|---|---|
| embed_text | all-MiniLM-L6-v2 | 384 | High-quality semantic text matching |
| embed_text_clip | clip-ViT-B-32 | 512 | Cross-modal matching (text ↔ image) |
Step 3: Create a collection with named vectors
Named vectors let you store multiple vector spaces in one collection. Running this block calls get_or_create with a vectors_config dictionary that defines a 384-dim text space and a 512-dim image space, each with its own HNSW parameters.
Instead of a single VectorParams, pass a dictionary where each key becomes a named vector space. The snippet below shows the minimal form of that dictionary.
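Based on the names this tutorial uses (VectorParams, a Cosine distance, and per-space HNSW settings), the dictionary might look like the sketch below; the field names are assumptions inferred from the surrounding text, not verified SDK signatures.

```python
# Sketch only; VectorParams and Distance field names are inferred from
# the names this tutorial mentions, not from verified SDK signatures.
vectors_config = {
    "text": VectorParams(
        size=384,                  # all-MiniLM-L6-v2 output dimension
        distance=Distance.COSINE,
    ),
    "image": VectorParams(
        size=512,                  # clip-ViT-B-32 output dimension
        distance=Distance.COSINE,
    ),
}
```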
"text" and one under "image". Each space can have its own:
- Dimensionality—384 for text, 512 for CLIP.
- Distance metric—Cosine, Dot, Euclid, or Manhattan.
- HNSW config—different
mandef_constructper space.
Expected output
Running create_collection() calls get_or_create with a vectors_config dictionary that registers a 384-dim cosine text space and a 512-dim cosine image space, each with its own HNSW parameters. If the collection already exists it is returned as-is; if it does not, it is created. The printed line confirms that both named vector spaces are active and ready to accept points.
Step 4: Prepare multi-model product data
Each product entry has a text description and a visual description. In production, the image vector would come from actual product photos through embed_image_from_bytes(). This example uses CLIP text embeddings of visual descriptions as stand-ins so you can run the tutorial without downloading image files. Running this block defines the products list and prints the count.
Step 5: Ingest with named vectors
The function below batch-embeds all product descriptions and visual descriptions, then upserts them as named vectors. Each PointStruct carries a dictionary whose keys ("text" and "image") match the named vector spaces defined during collection creation.
Each PointStruct carries both a "text" and an "image" vector. The keys must match the names declared in vectors_config when the collection was created—each vector is stored in its own HNSW index and searched independently.
Expected output
Running ingest_products() batch-embeds all ten product descriptions using all-MiniLM-L6-v2 (producing 384-dim text vectors) and all visual descriptions using the CLIP text encoder (producing 512-dim image vectors). Each PointStruct is assigned a sequential integer ID and carries both named vectors alongside the full product payload. After upserting, flush persists the collection to disk and get_vector_count confirms the total number of indexed vectors.
Step 6: Search a single vector space
Before fusing results across modalities, it helps to see what each vector space returns on its own. The two functions below search the "text" and "image" spaces independently via the using parameter, then print both ranked lists for the same query.
| Space | What it captures | Strength |
|---|---|---|
| text | Semantic meaning of descriptions | Matches “cold weather” to “Gore-Tex membrane” |
| image | Visual appearance and style | Matches “jacket” to brown leather visual |
Expected output
Both functions embed the query "warm jacket for cold weather" using their respective encoders—all-MiniLM-L6-v2 for the text space and the CLIP text encoder for the image space—then search each vector space independently, returning the top five scored matches. The text space ranks products by semantic overlap with the query terms, while the image space ranks them by visual similarity to the concept of a jacket in cold weather. Comparing the two lists side by side reveals where the two models agree and where they diverge.
Why do Waterproof Hiking Boots rank first in the text space? The product description mentions “Gore-Tex membrane” and “ankle support”—terms that semantically overlap with cold-weather protection. all-MiniLM-L6-v2 captures this association between weatherproof gear and cold-weather queries. The image space correctly ranks the leather jacket first, since CLIP responds to the visual cue “jacket” in the query. This is exactly why fusing both spaces in Step 7 produces better results than either alone.
Step 7: Multi-stage prefetch with server-side fusion
This is the core multi-model search pattern. The function below prefetches 20 candidates from each vector space, then passes both lists to the server for RRF fusion, returning a single merged ranking.
- Prefetch stage 1—search the "text" vector space with an all-MiniLM-L6-v2 embedding and return 20 candidates.
- Prefetch stage 2—search the "image" vector space with a CLIP embedding and return 20 candidates.
- Fusion—the server merges both candidate lists using Reciprocal Rank Fusion, producing a single ranked list.
query={"fusion": Fusion.RRF} tells the server to fuse the prefetch results rather than search directly.
Expected output
The function embeds the query"warm jacket for cold weather" into both the 384-dim text space and the 512-dim CLIP image space, then issues two prefetch requests—each retrieving 20 candidates from their respective vector space. The server applies Reciprocal Rank Fusion to merge both candidate lists and returns a single ranked result set of the top five products. RRF assigns each item a score based on its position across both ranked lists, so products that appear highly in both spaces receive the highest fused scores.
Step 8: Compare fusion methods—RRF vs DBSF
Actian VectorAI DB supports two server-side fusion algorithms. The function below runs the same prefetch stages through both algorithms so you can compare the ranking and score differences.
| Method | How it works | Best for |
|---|---|---|
| RRF | Merges by rank position, ignores raw scores | When text and image scores are on different scales |
| DBSF | Normalizes scores using mean/std, then averages | When you want score-aware blending |
Expected output
The function embeds the query"lightweight shoes for running" into both vector spaces and runs two separate fusion queries against the same prefetch stages. The RRF query fuses the candidate lists by rank position alone, producing small fractional scores bounded by the RRF formula. The DBSF query normalizes scores from each prefetch stage using their mean and standard deviation before averaging them, resulting in scores on a 0–1 scale. Both queries return the same top-ranked items, but the score magnitudes differ significantly between the two methods.
Step 9: Add payload filters to multi-model search
The function below combines multi-model RRF fusion with structured payload filters. It builds a filter from optional category and max_price arguments and passes it to the outer query() call so it applies after the two prefetch stages have been fused.
The filter on the outer query() call applies after fusion. The sequence is:
- Both prefetch stages retrieve 20 candidates each, unfiltered within their space.
- The server fuses the candidate lists.
- The filter removes products that do not match—for example, wrong category or too expensive.
- The top-K from the filtered fused list is returned.
The filter acts as a gate on the already-merged candidate pool, not on individual prefetch stages. To filter before fusion—for example, to restrict which documents each modality can retrieve—pass filter directly to PrefetchQuery instead.
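The post-fusion gating described above can be sketched in plain Python. The real filtering happens server-side; the item names and payloads here are hypothetical.

```python
def filter_after_fusion(fused, payloads, category=None, max_price=None, limit=5):
    """Apply payload constraints to an already-fused candidate list,
    mirroring a filter attached to the outer query() call."""
    kept = []
    for item, score in fused:
        meta = payloads[item]
        if category is not None and meta["category"] != category:
            continue
        if max_price is not None and meta["price"] > max_price:
            continue
        kept.append((item, score))
    return kept[:limit]

# Hypothetical fused candidates with their payloads.
fused = [("leather_jacket", 0.92), ("camera", 0.85), ("wool_coat", 0.71)]
payloads = {
    "leather_jacket": {"category": "apparel", "price": 150.0},
    "camera": {"category": "electronics", "price": 180.0},  # wrong category
    "wool_coat": {"category": "apparel", "price": 250.0},   # too expensive
}

survivors = filter_after_fusion(fused, payloads,
                                category="apparel", max_price=200.0)
```

Note the trade-off this implies: if the filter is very selective, fewer than limit items may survive, which is one reason to over-fetch candidates in the prefetch stages.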
Expected output
The function queries the collection with "stylish casual outerwear", applying category=apparel and max_price=200.0 as post-fusion constraints. Both prefetch stages retrieve 20 candidates each from the text and image spaces without filtering; the server then fuses those candidates with RRF and removes any product whose category is not apparel or whose price exceeds $200. Only the products that satisfy both constraints appear in the final ranked list, each showing its fused RRF score and price.
Step 10: Client-side fusion as an alternative
Server-side fusion treats both vector spaces equally. When you need to weight one modality higher than the other—for example, favoring text relevance over visual similarity—you can search each space independently and fuse the results client-side. The function below accepts an alpha parameter that controls the text-to-image weight balance and sweeps it from 1.0 (text only) to 0.0 (image only).
| Aspect | Server-side (query + Fusion) | Client-side (reciprocal_rank_fusion) |
|---|---|---|
| Network calls | 1 (single query) | 2+ (one per vector space) |
| Weight control | No (equal weights) | Yes (weights parameter) |
| Algorithms | RRF, DBSF | RRF, DBSF |
| Latency | Lower (server merges internally) | Higher (extra round-trips) |
| Flexibility | Limited to server-supported fusions | Arbitrary post-processing |
Expected output
The code sweeps the alpha parameter across five values (1.0, 0.7, 0.5, 0.3, 0.0) for the query "comfortable everyday shoes". At alpha=1.0 the fusion result is driven entirely by text-space scores; at alpha=0.0 it is driven entirely by the CLIP image space. Each iteration calls client_side_fusion_search, independently retrieves 15 candidates from each space, and passes both result lists to reciprocal_rank_fusion with the corresponding per-list weights. The printed top-3 names illustrate how the ranking shifts as the image space gains influence.
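The alpha sweep can be illustrated with a hand-rolled weighted RRF in place of the SDK's reciprocal_rank_fusion helper. The item names and the k=60 constant are assumptions for the sketch.

```python
def weighted_rrf(rankings, weights, k=60):
    """RRF where each list's 1/(k + rank) contribution is scaled by
    its per-list weight, so one modality can dominate the blend."""
    scores = {}
    for ranking, w in zip(rankings, weights):
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + w / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

text_hits = ["sneaker", "loafer", "boot"]      # hypothetical text ranking
image_hits = ["loafer", "sneaker", "slipper"]  # hypothetical image ranking

# Sweep alpha from text-only (1.0) to image-only (0.0).
for alpha in (1.0, 0.7, 0.5, 0.3, 0.0):
    fused = weighted_rrf([text_hits, image_hits], [alpha, 1.0 - alpha])
    print(f"alpha={alpha}: top = {fused[0][0]}")
```

At the endpoints the fused ranking collapses to a single space's ranking, which is a useful sanity check when tuning the weight.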
Step 11: Run batch searches across named vectors
When you need to run several queries at once—across different vector spaces or with different search terms—search_batch sends them all in a single gRPC call. The function below accepts a list of query dictionaries and dispatches them together, reducing total latency compared to issuing individual requests.
search_batch sends all queries in a single gRPC call, reducing total latency—especially important when searching multiple vector spaces for comparison or multi-query interfaces.
Expected output
The function accepts three queries—"leather outerwear" in the text space, "brown leather jacket" in the image space, and "electronic gadgets" in the text space—and dispatches them together in a single search_batch gRPC call. Each query is encoded with the appropriate model: all-MiniLM-L6-v2 for text-space queries and the CLIP text encoder for image-space queries. The batch returns a separate ranked list for each query, with scores reflecting the cosine similarity of each product vector to the query embedding within its respective named space.
Step 12: Retrieve specific vectors from named spaces
By default, search results include payloads but not the vectors themselves. The function below runs the same query twice: once requesting the "text" vector and a subset of payload fields, and once requesting the full payload with no vectors. This lets you compare both response shapes.
| Selector | Effect |
|---|---|
| with_vectors=True | Return all named vectors. |
| with_vectors=False | Return no vectors (default). |
| WithVectorsSelector(include=["text"]) | Return only the "text" vector. |
| WithPayloadSelector(include=["name"]) | Return only the "name" payload field. |
| WithPayloadSelector(exclude=["description"]) | Return all fields except "description". |
Step 13: Update a single named vector
In a multi-model system, different modalities change at different rates—product images may be re-shot while descriptions stay the same. The function below re-embeds and updates only the "image" vector for a given point without touching the "text" vector or any payload fields.
- Product descriptions rarely change, so skip re-embedding "text".
- Product images change when new photos are taken, so update only "image".
- Metadata changes with price updates, so use set_payload instead.
Step 14: Nested prefetch—three-stage pipeline
When simple fusion is not enough, you can nest prefetch stages to build a multi-stage retrieval pipeline. The function below retrieves candidates from each vector space, fuses them with RRF, and then re-ranks the merged list by text similarity.
Setting query=text_vec, using="text" in the final stage re-scores the fused candidates using the text vector, giving the text space the final say on ranking while the image space contributed to the candidate pool.
Expected output
The function queries for"outdoor hiking gear" using a three-stage nested pipeline. Stage 1 retrieves 20 candidates from the text space and 20 from the image space. Stage 2 fuses both lists with RRF, keeping the top 15. Stage 3 re-scores those 15 candidates using the all-MiniLM-L6-v2 text embedding as the final query vector (using="text"), so the product whose description is semantically closest to the query surfaces at the top. The final scores are cosine similarity values from the text re-ranking step, not RRF scores.
Step 15: Per-space search parameters
Different vector spaces may need different accuracy-latency trade-offs. The function below assigns a lower hnsw_ef to the text space for faster retrieval and a higher hnsw_ef to the image space for more accurate candidate selection, then fuses the results with DBSF.
Choose the hnsw_ef value for each vector space based on which modality matters more to your use case. A higher value gives more accurate results at the cost of higher latency.
| Scenario | Text hnsw_ef | Image hnsw_ef |
|---|---|---|
| Text is more important | 256 | 64 |
| Image is more important | 64 | 256 |
| Equal priority | 128 | 128 |
| Accuracy-critical | 512 | 512 |
Step 16: Inspect collection configuration
After ingestion and updates, you can verify that the collection is configured correctly. The function below retrieves the named vector configuration, total vector count, and VDE state and prints them together.
Step 17: Collection cleanup
The function below flushes any pending writes to disk and optionally deletes the collection when you are done experimenting. Uncomment the delete lines to remove the collection entirely.
Patterns summary
The following patterns recap the core multi-model techniques covered in this tutorial. Use them as a quick reference when building your own pipelines.
Pattern 1: Independent space search
Passusing="text" or using="image" to search one named vector space at a time.
Pattern 2: Server-side multi-model fusion
Provide two PrefetchQuery entries and set query={"fusion": Fusion.RRF} to have the server merge the candidate lists.
Pattern 3: Client-side weighted fusion
Search each space independently, then pass both result lists to reciprocal_rank_fusion with a weights list to control the text-to-image balance.
Pattern 4: Nested prefetch with re-ranking
Nest an RRF-fusion prefetch inside the outer query and set query=text_vec to fuse first, then re-rank by the text vector.
Pattern 5: Partial vector update
Pass a vector dictionary containing only the named vector to change. The server updates that vector in place without touching the others.
Actian VectorAI features used
The table below lists every Actian VectorAI feature this tutorial demonstrated, along with the corresponding API call and its purpose.
| Feature | API | Purpose |
|---|---|---|
| Named vectors | vectors_config={"text": VectorParams(...), "image": VectorParams(...)} | Store multiple embedding spaces per collection |
| Named vector search | points.search(using="text") | Search a specific vector space |
| Server-side RRF | query={"fusion": Fusion.RRF} | Rank-based fusion of prefetch results |
| Server-side DBSF | query={"fusion": Fusion.DBSF} | Score-normalized fusion |
| Prefetch | PrefetchQuery(query=..., using=..., limit=...) | Multi-stage candidate retrieval |
| Nested prefetch | PrefetchQuery(prefetch=[...]) | Three-stage fuse-then-re-rank pipeline |
| Client-side RRF | reciprocal_rank_fusion(results, weights=...) | Weighted client-side fusion |
| Client-side DBSF | distribution_based_score_fusion(results) | Score-aware client-side fusion |
| Search batch | points.search_batch(searches=[...]) | Multiple queries in one call |
| Partial vector update | points.update_vectors(points=[...]) | Update one modality only |
| Payload filtering | FilterBuilder().must(Field(...).eq(...)) | Structured constraints on fusion |
| Selective return | WithPayloadSelector(include=[...]) | Return specific payload fields |
| Vector return | WithVectorsSelector(include=["text"]) | Return specific named vectors |
| Per-space tuning | SearchParams(hnsw_ef=256) in PrefetchQuery | Different accuracy per modality |
| Collection info | collections.get_info() | Inspect named vector configuration |
| VDE operations | vde.flush(), vde.get_vector_count(), vde.get_state() | Administration and persistence |
Next steps
Use the links below to continue building on what you learned in this tutorial.
Optimizing retrieval quality
Tune HNSW, quantization, and search params for accuracy
Predicate filters
Combine vector search with structured payload constraints
Similarity search basics
Learn the core retrieval workflow
Hybrid search patterns
Mix dense and sparse retrieval with fusion