- The exact terms: “jeans”, “french connection”, “blue”.
- The semantic meaning of the query.
- Visual similarity with product images.
- CLIP ViT-B-32 embeddings (512-dim) for semantic understanding of images and text.
- BM25 sparse scoring for keyword relevance.
- Actian VectorAI DB for scalable vector storage and retrieval.
- Actian VectorAI SDK fusion algorithms (RRF and DBSF) for combining dense and sparse results.
Prerequisites
Before starting, make sure you have the following in place:
- A running Actian VectorAI instance reachable at localhost:50051.
- Python 3.10 or later.
- A set of product images and associated metadata (product name, category, color, gender, and so on). This tutorial uses a fashion product dataset as its example, but the same pipeline applies to any product catalog.
Architecture overview
The system is structured around two pipelines. The product registration pipeline takes a product image and its metadata, generates a 512-dimensional CLIP embedding from the image, concatenates the metadata into a searchable text string for BM25, and stores both in Actian VectorAI DB as a single point. The hybrid search pipeline takes a user query — text or image — runs a dense CLIP search server-side and a sparse BM25 search client-side, then fuses the results using RRF or DBSF to produce a single ranked output. The diagram below shows how these two pipelines connect, from product registration through to final ranked results:
Why hybrid search matters
Real-world queries usually contain two types of signals — keyword signals and semantic signals. The sections below explain each type and describe how hybrid search combines them.
Keyword signals
Sparse search targets exact tokens such as the brand name, color, and product type. For the query below, BM25 matches against three distinct tokens:
- French connection (brand).
- Blue (color).
- Jeans (product type).
Semantic signals
Dense embeddings from CLIP capture semantic meaning, allowing results to surface even when no exact tokens match. For the query below, the product description may not contain these exact words, but the system still returns similar items:
How hybrid search combines both
Instead of choosing one method, this approach combines both using fusion. The Actian VectorAI SDK provides two built-in fusion algorithms:
- Reciprocal Rank Fusion (RRF) — Rank-based merging that ignores raw scores. Use this when dense and sparse scores are on different scales.
- Distribution-Based Score Fusion (DBSF) — Normalizes and averages scores using mean and standard deviation. Use this when you want score-aware blending.
The alpha parameter controls the weight balance in RRF. Higher values favor dense results; lower values favor sparse results:
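To make the weighting concrete, the sketch below reimplements the alpha-weighted RRF idea in plain Python. It is illustrative only; the real pipeline uses reciprocal_rank_fusion from the Actian VectorAI SDK, and the function name rrf_fuse here is hypothetical.

```python
# Illustrative alpha-weighted Reciprocal Rank Fusion.
# This sketches the idea; it is NOT the Actian VectorAI SDK implementation.

def rrf_fuse(dense_ids, sparse_ids, alpha=0.5, k=60):
    """Merge two ranked ID lists; alpha weights the dense contribution."""
    scores = {}
    for rank, pid in enumerate(dense_ids):
        scores[pid] = scores.get(pid, 0.0) + alpha * (1.0 / (k + rank + 1))
    for rank, pid in enumerate(sparse_ids):
        scores[pid] = scores.get(pid, 0.0) + (1.0 - alpha) * (1.0 / (k + rank + 1))
    return sorted(scores, key=scores.get, reverse=True)

# With alpha=1.0 the dense order wins outright; with alpha=0.0 the sparse
# order wins; intermediate values blend the two rankings.
print(rrf_fuse(["a", "b", "c"], ["c", "b", "a"], alpha=1.0))  # ['a', 'b', 'c']
print(rrf_fuse(["a", "b", "c"], ["c", "b", "a"], alpha=0.0))  # ['c', 'b', 'a']
```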
Environment setup
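The install command might look like the following. The pillow and sentence-transformers package names are standard; the Actian VectorAI SDK package name shown here is a placeholder, so substitute the name from your Actian distribution.

```shell
# Image processing, CLIP embeddings, and the vector SDK.
# "actian-vectorai-sdk" is a placeholder package name; use the name
# provided with your Actian VectorAI distribution.
pip install pillow sentence-transformers actian-vectorai-sdk
```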
This step installs the three packages required for image processing, CLIP embeddings, and the Actian VectorAI SDK. Run the install before proceeding with the implementation.
Implementation
The following steps build the complete multimodal hybrid search system, from loading the CLIP model and initializing the collection through to running dense, sparse, and fused queries.
Step 1: Import dependencies and configure
The block below imports the required libraries, sets the server address and collection name, and loads the CLIP model. Every later call to embed_image or embed_text_clip reuses the same model instance without reloading weights. On the first run, SentenceTransformer("clip-ViT-B-32") downloads the model weights before returning.
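A minimal sketch of this configuration step is shown below. The Actian VectorAI client import is omitted because the SDK's module path depends on your installation; the constants and the lazy CLIP loader are the portable part, and the constant names are this sketch's own.

```python
# Configuration sketch for Step 1. Constant names are illustrative.
SERVER_ADDRESS = "localhost:50051"
COLLECTION_NAME = "NextGen-Purchase"
EMBED_DIM = 512                      # clip-ViT-B-32 output dimensionality
CLIP_MODEL_NAME = "clip-ViT-B-32"

_clip_model = None

def get_clip_model():
    """Load the CLIP model once; later calls reuse the cached instance."""
    global _clip_model
    if _clip_model is None:
        from sentence_transformers import SentenceTransformer
        _clip_model = SentenceTransformer(CLIP_MODEL_NAME)  # downloads on first run
    return _clip_model

print(f"Server: {SERVER_ADDRESS}")
print(f"Collection: {COLLECTION_NAME}")
print(f"CLIP dimensionality: {EMBED_DIM}")
```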
Expected output
Running this block prints the server address, collection name, and the CLIP model dimensionality, confirming the configuration is valid before any collections or vectors are created.
Step 2: Define embedding helpers
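The two helpers for this step could be sketched as follows. The model is loaded lazily on first use; sentence-transformers' CLIP model accepts both PIL images and strings, which is what makes the two functions symmetric. Function names follow the tutorial; everything else is a sketch.

```python
# Sketch of the two embedding helpers. Both map their input into the same
# 512-dimensional CLIP space, so the resulting vectors are directly comparable.

_model = None

def _clip():
    """Lazily load and cache the CLIP model."""
    global _model
    if _model is None:
        from sentence_transformers import SentenceTransformer
        _model = SentenceTransformer("clip-ViT-B-32")
    return _model

def embed_image(image_path: str) -> list[float]:
    """Encode a product image into a 512-dim CLIP vector."""
    from PIL import Image
    with Image.open(image_path) as img:
        return _clip().encode(img.convert("RGB")).tolist()

def embed_text_clip(text: str) -> list[float]:
    """Encode a text query into the same 512-dim CLIP space."""
    return _clip().encode(text).tolist()
```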
CLIP maps images and text into the same 512-dimensional vector space. This shared space is what enables cross-modal search: a text query can retrieve products whose embeddings were generated from images, because both live in the same space. The two helper functions handle each input type separately but produce vectors that are directly comparable.
Step 3: Build the BM25 text representation
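A sketch of the concatenation helper is shown below. The field list mirrors the metadata named in the prerequisites (product name, category, color, gender), with brand added because the example queries match on it; the exact field names are this sketch's assumption.

```python
# Sketch of the metadata-to-text helper scored by BM25.
# Field names are illustrative; adapt them to your catalog schema.

def build_product_text(metadata: dict) -> str:
    """Concatenate metadata fields into one lowercase, space-separated string."""
    fields = ["product_name", "brand", "category", "color", "gender"]
    parts = [str(metadata[f]) for f in fields if metadata.get(f)]
    return " ".join(parts).lower()

example = {
    "product_name": "Dark Blue French Connection Jeans",
    "brand": "French Connection",
    "category": "Jeans",
    "color": "Blue",
    "gender": "Men",
}
print(build_product_text(example))
# dark blue french connection jeans french connection jeans blue men
```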
BM25 scoring operates on plain text rather than vectors. The helper concatenates all product metadata fields into a single lowercase string, which is stored in the point payload at registration time and scored against the query at search time.
Expected output
For a denim product with complete metadata, the concatenated string contains every field value in lowercase. BM25 uses this string to match query tokens such as “french connection”, “jeans”, and “blue” at search time.
Step 4: Implement client-side BM25 scoring
The BM25 function below takes a list of query tokens, the text of a single document, and corpus-level statistics (average document length, per-token document frequency, and total document count). It returns a float relevance score for that document. Scores of zero indicate no token overlap between the query and the document. The defaults k1=1.5 and b=0.75 are standard BM25 values that work well across most text corpora.
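A self-contained sketch of such a scorer, using the standard Okapi BM25 formula with the k1 and b defaults named above (function and parameter names are this sketch's own):

```python
import math

# Sketch of client-side BM25 scoring. `doc_freq` maps each token to the
# number of documents in the corpus that contain it.

def bm25_score(query_tokens, doc_text, avg_doc_len, doc_freq, total_docs,
               k1=1.5, b=0.75):
    """Okapi BM25 score of one document against a tokenized query."""
    doc_tokens = doc_text.split()
    doc_len = len(doc_tokens)
    score = 0.0
    for token in query_tokens:
        tf = doc_tokens.count(token)
        if tf == 0:
            continue  # this query token does not appear in the document
        df = doc_freq.get(token, 0)
        idf = math.log((total_docs - df + 0.5) / (df + 0.5) + 1.0)
        score += idf * (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
    return score

# A document sharing tokens with the query scores above zero;
# one with no overlap scores exactly zero.
doc = "dark blue french connection jeans men"
print(bm25_score(["jeans", "blue"], doc, avg_doc_len=6.0,
                 doc_freq={"jeans": 3, "blue": 5}, total_docs=10))
print(bm25_score(["red", "shirt"], doc, avg_doc_len=6.0,
                 doc_freq={}, total_docs=10))  # 0.0
```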
Step 5: Initialize the VectorAI collection
The function below creates the NextGen-Purchase collection if it does not already exist, using 512-dimensional CLIP vectors with cosine distance. The HNSW parameters m=32 and ef_construct=256 balance recall quality against indexing speed. Because get_or_create is idempotent, the call is safe to repeat on every startup; it returns immediately when the collection already exists. Running the block prints a confirmation that the collection is ready to accept vectors.
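A sketch of this step is shown below. The client.collections.get_or_create call matches the SDK feature table, but the keyword argument names (vector_size, distance, m, ef_construct) are assumptions about its signature; check them against your SDK version.

```python
# Sketch of collection initialization. Keyword names are assumed.
COLLECTION_NAME = "NextGen-Purchase"

def ensure_collection(client):
    """Create the collection if missing; a no-op when it already exists."""
    collection = client.collections.get_or_create(
        name=COLLECTION_NAME,
        vector_size=512,       # CLIP ViT-B-32 dimensionality
        distance="cosine",
        m=32,                  # HNSW graph connectivity
        ef_construct=256,      # HNSW build-time search depth
    )
    print(f"Collection '{COLLECTION_NAME}' is ready.")
    return collection
```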
Expected output
Running this block prints a confirmation that the collection is ready. If the collection already exists, the message is identical, because get_or_create does not raise an error on repeat calls.
Step 6: Register a product
The function below registers a single product. It generates a 512-dim CLIP embedding from the product image, builds the BM25 text string from the metadata, and upserts both as a single point in the collection. After registration, it flushes the collection to disk and prints the product name alongside the updated total vector count. Each stored point carries three parts:
- Vector — A 512-dim CLIP image embedding used for dense similarity search.
- payload.product_text — The concatenated metadata string scored by BM25 at query time.
- Additional payload fields — All original metadata fields, returned alongside each search result.
The vde.flush() call persists any buffered writes to disk before the function returns, ensuring the point is available for search immediately after registration.
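The registration step could be sketched as follows. The upsert, flush, and get_vector_count calls come from the SDK feature table, but the point dictionary shape is an assumption; the embedding and text helpers are passed in as parameters so the sketch stays decoupled from the earlier steps.

```python
# Sketch of product registration. The upsert point shape is assumed.

def register_product(client, product_id, image_path, metadata,
                     embed_image, build_product_text,
                     collection="NextGen-Purchase"):
    """Embed the image, build the BM25 string, and store both as one point."""
    vector = embed_image(image_path)                        # 512-dim CLIP embedding
    payload = dict(metadata)
    payload["product_text"] = build_product_text(metadata)  # BM25 target string
    payload["image_filename"] = image_path                  # lets results render the image
    client.points.upsert(collection_name=collection,
                         points=[{"id": product_id,
                                  "vector": vector,
                                  "payload": payload}])
    client.vde.flush()                                      # persist before returning
    count = client.vde.get_vector_count()
    print(f"Registered '{metadata.get('product_name')}' (total vectors: {count})")
```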
Step 7: Dense search (server-side)
The two functions below perform dense similarity searches using CLIP embeddings. Both encode the query into a 512-dim vector and send it to the Actian VectorAI server, which runs HNSW approximate nearest-neighbor search and returns ScoredPoint objects sorted by descending cosine similarity. The only difference between the two is the query input type.
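A sketch of the two entry points is shown below. client.points.search comes from the SDK feature table, but its keyword names here are assumptions; the encoder arguments stand in for the CLIP helpers from Step 2.

```python
# Sketch of the two dense search entry points. Keyword names are assumed.

def search_by_text(client, embed_text_clip, query_text, top_k=10,
                   collection="NextGen-Purchase"):
    """Encode a text query with CLIP and run server-side similarity search."""
    vector = embed_text_clip(query_text)
    return client.points.search(collection_name=collection,
                                vector=vector, limit=top_k)

def search_by_image(client, embed_image, image_path, top_k=10,
                    collection="NextGen-Purchase"):
    """Encode a query image with CLIP and run the same server-side search."""
    vector = embed_image(image_path)
    return client.points.search(collection_name=collection,
                                vector=vector, limit=top_k)
```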
Step 8: Sparse BM25 search (client-side)
Unlike dense search, BM25 runs entirely on the client. The function below fetches all points from the collection in batches of 500, computes BM25 scores locally by comparing query tokens against the product_text payload field of each point, and returns the top-K results sorted by descending score.
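The batched scan could be sketched as follows. Here fetch_batch(offset, limit) stands in for the SDK's point-retrieval call (client.points.get in the feature table, whose exact pagination interface is not shown in this tutorial), and score(tokens, text) stands in for the BM25 scorer from Step 4.

```python
# Sketch of client-side sparse search over the whole collection.

def sparse_search(query, fetch_batch, score, top_k=10, batch_size=500):
    """Scan all points in batches, score each product_text, return top-K."""
    tokens = query.lower().split()
    scored = []
    offset = 0
    while True:
        batch = fetch_batch(offset, batch_size)   # list of point dicts
        if not batch:
            break                                 # collection exhausted
        for point in batch:
            s = score(tokens, point["payload"]["product_text"])
            if s > 0:                             # keep only documents with overlap
                scored.append((s, point))
        offset += batch_size
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [point for _, point in scored[:top_k]]
```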
Step 9: Hybrid search with fusion
The function below runs both dense and sparse searches in sequence and merges the results using either RRF or DBSF. It fetches top_k * 5 candidates (up to 50) from each search before fusing, giving the fusion algorithm a broad enough input to rerank effectively before returning the final top_k results. First, it runs the server-side dense CLIP search. Second, it runs client-side BM25 scoring against the product_text payload field. Third, it passes both result lists to either reciprocal_rank_fusion or distribution_based_score_fusion from the Actian VectorAI SDK.
The alpha parameter shifts the weight between the two sources in RRF. The table below shows how different values change the balance:
| Alpha | Behavior |
|---|---|
| 1.0 | 100% dense — Pure visual/semantic similarity. |
| 0.7 | 70% dense, 30% sparse — Mostly semantic with keyword boost. |
| 0.5 | Equal blend — Balanced hybrid. |
| 0.3 | 30% dense, 70% sparse — Mostly keyword with semantic boost. |
| 0.0 | 100% sparse — Pure BM25 keyword matching. |
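The control flow of the wrapper can be sketched as below. An inline rank-based fusion stands in for the SDK's fusion calls, and the dense_search/sparse_search parameters stand in for the earlier steps; all names here are this sketch's own.

```python
# Sketch of the hybrid wrapper: wide candidate pools, then rank fusion.

def hybrid_search(query, dense_search, sparse_search, top_k=10, alpha=0.5, k=60):
    """Fetch top_k * 5 candidates (capped at 50) from each search, then fuse."""
    pool = min(top_k * 5, 50)
    dense = dense_search(query, pool)    # ranked point ids, best first
    sparse = sparse_search(query, pool)  # ranked point ids, best first
    scores = {}
    for rank, pid in enumerate(dense):
        scores[pid] = scores.get(pid, 0.0) + alpha / (k + rank + 1)
    for rank, pid in enumerate(sparse):
        scores[pid] = scores.get(pid, 0.0) + (1 - alpha) / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

A point that appears high in both lists accumulates score from both loops, which is why hybrid results favor items that satisfy the semantic and the keyword signal at once.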
Step 10: Run the end-to-end hybrid search
The block below runs the query “dark blue french connection jeans for men” through all four search modes in sequence: dense-only, sparse-only, RRF-fused, and DBSF-fused, printing a ranked result list for each. Dense-only search encodes the query as a CLIP text vector and runs server-side cosine similarity against stored image embeddings. Sparse-only search scores every point’s product_text payload field using client-side BM25, rewarding exact token matches for terms like “french connection” and “jeans”. The RRF-fused mode combines both ranked lists with equal weight (alpha=0.5), merging by rank position regardless of raw score scale. The DBSF-fused mode normalizes scores by their distribution before averaging. Hybrid search ranks “Dark Blue French Connection Jeans” highest across both fusion methods because it satisfies both the CLIP semantic similarity and the BM25 exact-keyword match.
Expected output
Exact scores depend on the dataset and the products registered. The values below are illustrative. Notice that BM25 scores are on a different scale from CLIP cosine similarity scores; RRF handles this by merging on rank position rather than raw values.
Step 11: Collection administration
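The three administration calls could be sketched as follows. get_vector_count, flush, and collections.delete come from the SDK feature table; the wrapper function name and the delete keyword are this sketch's assumptions.

```python
# Sketch of the three administration calls from the SDK feature table.

def show_collection_stats(client, collection="NextGen-Purchase"):
    """Print the vector count, flush buffered writes, and return the count."""
    count = client.vde.get_vector_count()   # total indexed points
    print(f"Stored product vectors: {count}")
    client.vde.flush()                      # persist buffered writes to disk
    print("Flush completed.")
    # Destructive; uncomment only when tearing the collection down:
    # client.collections.delete(name=collection)
    return count
```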
This step demonstrates three VDE operations: retrieving the current vector count, flushing buffered writes to disk, and deleting the collection (shown as a comment).
Expected output
Running this block prints the current vector count followed by a confirmation that the flush completed. The count reflects how many product points have been registered in the collection.
How images are returned after retrieval
Vector databases store embeddings, not raw image files. A common question when building retrieval systems is how to get images back from search results. The answer is payload metadata. When registering a product, store the image_filename in the payload alongside the embedding. The payload dictionary below shows what a complete point entry looks like:
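A hypothetical payload for one registered point might look like this; every value, including the filename, is illustrative rather than taken from a real dataset.

```python
# Hypothetical payload for one registered point. Everything except the CLIP
# vector itself is plain metadata that comes back with each search result.
point_payload = {
    "product_name": "Dark Blue French Connection Jeans",
    "category": "Jeans",
    "color": "Blue",
    "gender": "Men",
    "product_text": "dark blue french connection jeans jeans blue men",
    "image_filename": "images/product_001.jpg",  # illustrative path
}
print(point_payload["image_filename"])
```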
Each search result returns its payload, including image_filename. The application uses that value to load the image from disk or object storage and render it to the user. The vector database stores representations, not the raw images.
Fusion methods compared
The hybrid search pipeline produces two separate ranked lists — one from dense CLIP search and one from sparse BM25 scoring. A fusion algorithm merges these lists into a single ranking. The table below compares the two built-in options provided by the Actian VectorAI SDK:
| Method | How it works | When to use |
|---|---|---|
| RRF (Reciprocal Rank Fusion) | Merges by rank position, ignores raw scores. | When dense and sparse scores are on different scales. |
| DBSF (Distribution-Based Score Fusion) | Normalizes scores using mean and standard deviation, then averages. | When you want score-aware blending. |
Both fusion functions accept the same inputs — two ScoredPoint lists — and return a single merged and ranked list.
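The SDK calls themselves are not reproduced here. As an illustration of what score-aware blending does, the sketch below reimplements the DBSF idea: normalize each list's scores by its own mean and standard deviation, then average. The function name dbsf_fuse and the id-to-score dict inputs are this sketch's own; the real entry point is distribution_based_score_fusion.

```python
# Illustrative sketch of distribution-based score fusion (not the SDK source).
# Each input maps point id -> raw score, each on its own scale.

def dbsf_fuse(dense_scores, sparse_scores):
    """Z-normalize each score set independently, then average per point."""
    def normalize(scores):
        vals = list(scores.values())
        mean = sum(vals) / len(vals)
        std = (sum((v - mean) ** 2 for v in vals) / len(vals)) ** 0.5 or 1.0
        return {pid: (v - mean) / std for pid, v in scores.items()}
    dn, sn = normalize(dense_scores), normalize(sparse_scores)
    fused = {pid: (dn.get(pid, 0.0) + sn.get(pid, 0.0)) / 2
             for pid in set(dn) | set(sn)}
    return sorted(fused, key=fused.get, reverse=True)

# Cosine-scale dense scores and BM25-scale sparse scores blend sensibly
# once each is normalized against its own distribution.
print(dbsf_fuse({"a": 0.91, "b": 0.83, "c": 0.80},
                {"b": 14.2, "c": 9.1, "d": 2.0}))
```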
Actian VectorAI features used
The system in this tutorial relies on the following Actian VectorAI SDK features. The table below lists each feature, the corresponding API call, and its role in the pipeline:
| Feature | API | Purpose |
|---|---|---|
| Collection creation | client.collections.get_or_create() | Create vector space with HNSW config. |
| Point upsert | client.points.upsert() | Store CLIP vectors with product payload. |
| Dense search | client.points.search() | Server-side CLIP similarity search. |
| Point retrieval | client.points.get() | Fetch points by ID for BM25 scoring. |
| Vector count | client.vde.get_vector_count() | Return total number of indexed points. |
| Flush | client.vde.flush() | Persist buffered writes to disk. |
| Delete collection | client.collections.delete() | Remove collection and all its vectors. |
| RRF fusion | reciprocal_rank_fusion() | Rank-based result merging. |
| DBSF fusion | distribution_based_score_fusion() | Score-normalized result merging. |
Benefits of hybrid search
Using dense and sparse retrieval together produces results that neither approach achieves alone. The three main advantages are outlined below.
Better ranking
Hybrid search improves result ranking by combining two complementary signals. Semantic meaning allows CLIP to match “casual dark denim” against “jeans”. Exact token matching allows BM25 to surface brand names like “French Connection” that CLIP embeddings may not distinguish from other text.
Multimodal query support
The pipeline accepts three types of query input through a single hybrid_search call, making it straightforward to support different client interfaces without changing the search logic:
- Text queries, processed through the CLIP text encoder.
- Image queries, processed through the CLIP image encoder.
- Metadata keywords, scored by BM25 against stored product text.
Tunable balance
The alpha parameter lets you shift the retrieval balance between visual similarity and keyword precision without changing any code — only the value passed to hybrid_search changes.
Next steps
This tutorial covered a complete multimodal hybrid search pipeline — from CLIP embeddings and BM25 scoring through to RRF and DBSF fusion. The tutorials below cover additional retrieval patterns that can be layered on top of what was built here:
Hybrid search patterns
Combine vector similarity with structured constraints.
Similarity search basics
Learn the core retrieval workflow.
Filtering with boolean logic
Add must, should, and must_not conditions.
Geospatial search
Make retrieval location-aware.