- “Warm and comforting” is a semantic concept — it maps to soups, stews, casseroles, and curries, but none of those words appear in the query.
- Dietary restrictions create hard constraints — a gluten-free user must never see recipes with wheat flour, regardless of semantic relevance.
- Available ingredients create soft preferences — “I have chicken, garlic, and tomatoes” should boost recipes using those ingredients without excluding others.
- User taste evolves — someone who keeps rating Thai dishes highly should see more Thai cuisine in future recommendations.
This article builds an agent that addresses each of these needs, combining semantic search with hard constraints via must, soft should/min_should logic, and preference learning through payload updates.
Prerequisites
Before starting this tutorial, make sure the following are in place:
- Python 3.10 or later is installed.
- An Actian VectorAI DB instance is running locally or accessible at a network address.
- Basic familiarity with Python async/await syntax is assumed.
Architecture overview
The system takes a natural-language craving, converts it into an embedding, and searches Actian VectorAI DB with structured filters. Results are ranked by a combination of semantic similarity and stored preferences, and feedback loops back into the database to refine future recommendations. The diagram below shows the end-to-end data flow from the user query through embedding, search, ranking, and preference learning.

Environment setup
Before running any code, install the Actian VectorAI Python client and the sentence-transformers library. The following command installs both packages into the active Python environment.
Implementation
The following steps walk through each layer of the recommendation agent — from initial setup and data ingestion to semantic search, constraint filtering, preference learning, and administration.

Step 1: Import dependencies and configure
The block below imports all required modules, sets the server address and collection name, loads the embedding model once so every call reuses it, and defines two helper functions for single and batch embedding. Running this block prints the active configuration and confirms the environment is ready before any further steps.

Expected output
Running the block above prints the server address, collection name, and embedding model name to confirm the configuration loaded correctly.

Step 2: Create the recipe collection with payload indexes
Each recipe has structured metadata covering cuisine, dietary tags, ingredients, cook time, difficulty, and rating. The function below creates the vector collection with cosine similarity and a tuned HNSW graph, then registers eight payload indexes — one for each filter pattern used later in this guide. Running this function creates the collection if it does not already exist, attaches all eight indexes, and prints a confirmation message.

| Field | Index type | Filter pattern |
|---|---|---|
| cuisine | Keyword | eq("thai"), any_of(["thai", "indian"]) |
| diet_tags | Keyword | any_of(["gluten-free", "dairy-free"]) |
| meal_type | Keyword | eq("dinner"), any_of(["lunch", "dinner"]) |
| ingredients_text | Text (word tokenizer) | text("chicken") — full-text word search |
| cook_time_min | Integer (range) | lte(30), between(15, 45) |
| rating | Float (principal) | gte(4.0) — ordering by rating |
| is_vegetarian | Bool | eq(True) — boolean constraint |
| difficulty | Keyword | eq("easy"), except_of(["hard"]) |
The TextIndexParams configuration with the Word tokenizer and lowercase=True enables ingredient keyword search. Calling Field("ingredients_text").text("garlic") finds any recipe whose ingredients list contains the word “garlic”, regardless of case.
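This matching behavior can be illustrated in plain Python. The sketch below is a conceptual model of a word tokenizer with lowercasing, not the database's actual implementation:

```python
import re

def word_tokens(text: str) -> set[str]:
    """Split on non-word characters and lowercase, mimicking a
    word tokenizer with lowercase=True (illustrative only)."""
    return {t for t in re.split(r"\W+", text.lower()) if t}

def text_match(ingredients_text: str, term: str) -> bool:
    """True if the term appears as a whole word, case-insensitively."""
    return term.lower() in word_tokens(ingredients_text)

print(text_match("Chicken GARLIC coconut-milk", "garlic"))  # True: case-insensitive
print(text_match("Chicken GARLIC coconut-milk", "gar"))     # False: whole words only
```

Because matching happens on whole tokens, a query for "garlic" matches "Garlic cloves" but a fragment like "gar" matches nothing.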
Step 3: Prepare the recipe dataset
The list below defines twelve recipes spanning multiple cuisines, meal types, and dietary profiles. Each recipe includes the structured metadata that the indexes from step 2 will filter against. Running this block loads the dataset into memory and prints a count confirming all twelve recipes are ready for ingestion.

Step 4: Embed and ingest recipes
The function below batch-embeds all recipe descriptions in a single model forward pass, constructs one point per recipe by pairing its vector with its full metadata payload, and upserts all twelve points in one call. The ingredients list is joined into a space-separated string and stored as ingredients_text so the text index can match individual ingredient words. Running this function inserts all twelve recipes into the collection and prints the total stored count to confirm the write succeeded.
Expected output
The function batch-embeds all twelve recipe descriptions in a single model forward pass, converts each description into a 384-dimensional vector, and upserts all twelve points into the collection in one call. The ingredients_text field is constructed by joining each recipe’s ingredient list into a space-separated string so the word-tokenized text index can match individual ingredient words. After writing, the collection is flushed to disk and the total point count is retrieved to confirm that all twelve records were stored successfully.
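The shape of this step can be sketched without the real model or database. The DummyEncoder class and the point dicts below are illustrative stand-ins and assume nothing about the actual client types:

```python
import hashlib

class DummyEncoder:
    """Stand-in for the embedding model: maps text to a
    deterministic 384-dimensional vector (illustrative only)."""
    dim = 384
    def encode(self, texts):
        out = []
        for t in texts:
            h = hashlib.sha256(t.encode()).digest()
            out.append([h[i % len(h)] / 255.0 for i in range(self.dim)])
        return out

recipes = [
    {"id": 0, "description": "Fragrant Thai green curry with coconut milk",
     "ingredients": ["chicken", "coconut milk", "green curry paste"]},
    {"id": 1, "description": "Rich Japanese ramen with pork broth",
     "ingredients": ["pork belly", "ramen noodles", "miso"]},
]

model = DummyEncoder()
vectors = model.encode([r["description"] for r in recipes])  # one batch pass

points = []
for recipe, vector in zip(recipes, vectors):
    payload = dict(recipe)
    # Join ingredients so a word-tokenized text index can match single words.
    payload["ingredients_text"] = " ".join(recipe["ingredients"])
    points.append({"id": recipe["id"], "vector": vector, "payload": payload})

print(len(points), len(points[0]["vector"]))  # 2 384
```

The key detail is the join into ingredients_text: storing the list as one space-separated string is what lets a word tokenizer match individual ingredients later.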
Step 5: Basic semantic search — “what am I craving?”
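At its core this step is cosine-similarity ranking. The sketch below uses toy 3-dimensional vectors in place of real 384-dimensional embeddings to show how the ordering falls out:

```python
import math

def cosine(a, b):
    """Cosine similarity: dot product over the product of magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Toy vectors standing in for embedded recipe descriptions.
recipes = {
    "Vietnamese Pho Bo":   [0.9, 0.1, 0.0],  # broth-heavy description
    "Japanese Miso Ramen": [0.8, 0.2, 0.1],
    "Caprese Salad":       [0.0, 0.9, 0.3],  # cold, fresh description
}
query = [1.0, 0.0, 0.0]  # stands in for the embedded craving

ranked = sorted(recipes, key=lambda name: cosine(query, recipes[name]), reverse=True)
print(ranked[0])  # Vietnamese Pho Bo: closest direction to the query
```

No filters are involved here: every recipe is a candidate, and the score order comes purely from vector direction.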
The simplest recommendation matches a craving to recipe descriptions by meaning alone, with no structural filters applied. The function below embeds the query string and searches the collection by cosine similarity, returning the top results ordered by score. Running this block with the query “I want something warm and comforting with a rich broth” returns broth-based recipes ranked by how closely their descriptions match the expressed feeling.

Expected output
The function embeds the natural-language craving “I want something warm and comforting with a rich broth” into a 384-dimensional vector and performs a cosine similarity search across all twelve recipes with no payload filters applied. Results are returned in descending score order, where each score reflects how closely a recipe’s embedded description matches the semantic meaning of the query. Broth-based dishes such as soups, ramen, and pho rank highest because their descriptions carry similar semantic content to the expressed feeling.

Step 6: Dietary restrictions — hard constraints with must
Dietary restrictions are non-negotiable — a gluten-free user must never see a recipe containing gluten regardless of how high it scores semantically. The function below adds a must condition for each supplied diet tag, so only recipes that carry every tag are returned. Running this block with diet_tags=["gluten-free", "dairy-free"] returns only the recipes whose diet_tags array contains both labels.
Expected output
The function embeds the query “spicy curry with coconut” and applies two must conditions — one for "gluten-free" and one for "dairy-free" — so only recipes whose diet_tags array contains both labels are eligible. The filter Field("diet_tags").any_of(["gluten-free"]) matches any recipe that carries the tag, and wrapping each tag in must enforces AND logic so every supplied restriction must be satisfied before a recipe can appear in the results. The scores reflect semantic closeness to the craving within the filtered candidate set.
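The AND semantics of stacked must conditions can be sketched in plain Python; the dicts below stand in for recipe payloads:

```python
def satisfies_all(recipe: dict, required_tags: list[str]) -> bool:
    """Every supplied tag must be present: AND logic, as with stacked must."""
    return all(tag in recipe["diet_tags"] for tag in required_tags)

recipes = [
    {"name": "Thai Green Curry", "diet_tags": ["gluten-free", "dairy-free"]},
    {"name": "Butter Chicken",   "diet_tags": ["gluten-free"]},
    {"name": "Margherita Pizza", "diet_tags": []},
]

eligible = [r["name"] for r in recipes
            if satisfies_all(r, ["gluten-free", "dairy-free"])]
print(eligible)  # ['Thai Green Curry']
```

Butter Chicken carries only one of the two required tags, so it is excluded before any similarity scoring happens: hard constraints prune the candidate set first.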
Step 7: Available ingredients — soft preferences with should and min_should
Unlike dietary restrictions, ingredient availability is a soft preference. Recipes that use available ingredients should rank higher, but recipes that do not use them should not be excluded entirely. The function below adds a should condition for each ingredient and requires at least min_match of them to appear in the result. Running this block with available=["chicken", "garlic", "tomatoes", "onion", "cream"] and min_match=2 returns recipes that contain at least two of those five ingredients, ranked by semantic similarity to the query.
Two mechanisms drive this behavior: should and min_should. Each should call adds one OR candidate; min_should sets the minimum number of those candidates that must match for a point to qualify.
The min_should value controls how strictly the result set matches the available pantry. The table below shows how each value changes the behavior.
| min_should | Behavior |
|---|---|
| 1 (default) | At least one ingredient matches — very lenient. |
| 2 | At least two ingredients match — moderate. |
| 3 | At least three ingredients match — strict. |
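This counting behavior can be sketched in plain Python as an illustration of the semantics, not of the client implementation:

```python
def matched_ingredients(ingredients_text: str, available: list[str]) -> list[str]:
    """Return the available ingredients that appear as words in the recipe."""
    words = set(ingredients_text.lower().split())
    return [ing for ing in available if ing.lower() in words]

def passes_min_should(ingredients_text: str, available: list[str],
                      min_match: int) -> bool:
    # Each available ingredient is one OR candidate; at least
    # min_match of them must appear for the recipe to qualify.
    return len(matched_ingredients(ingredients_text, available)) >= min_match

pantry = ["chicken", "garlic", "tomatoes"]
print(passes_min_should("chicken garlic onion rice", pantry, 2))  # True: 2 matches
print(passes_min_should("beef broth noodles", pantry, 2))         # False: 0 matches
```

Raising min_match tightens the pantry requirement without ever turning an individual ingredient into a hard constraint.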
Expected output
The function embeds the query “quick dinner tonight” and applies a should condition for each of the five available ingredients — chicken, garlic, tomatoes, onion, and cream — with min_should(2) requiring that at least two of them appear in a recipe’s ingredients_text field. Recipes are ranked by cosine similarity to the craving vector within the filtered candidate set, and each result shows which available ingredients it matched.
Step 8: Exclude allergens — must_not and except_of
Some ingredients must be strictly excluded because of allergies or strong dislikes. The function below adds a must_not condition for each ingredient to exclude, so no returned recipe contains any of them in its ingredients_text field. Running this block with exclude=["pork", "fish sauce"] returns only recipes whose ingredient lists contain neither ingredient, ranked by semantic similarity to the query.
Expected output
The function embeds the query “creamy pasta or rice dish” and applies a must_not condition for each excluded ingredient — pork and fish sauce — so any recipe whose ingredients_text field contains either word is removed from the candidate set before scoring. The remaining recipes are ranked by cosine similarity to the craving vector, and each result shows the cuisine it belongs to. Dishes like Japanese Miso Ramen (pork belly) and Vietnamese Pho Bo (fish sauce) are absent from the results because they were eliminated by the exclusion filters.
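The exclusion semantics can be sketched in plain Python; the payload dicts below are illustrative stand-ins:

```python
def excluded(ingredients_text: str, banned: list[str]) -> bool:
    """True if any banned term appears, mirroring one must_not
    condition per excluded ingredient (OR of exclusions)."""
    text = ingredients_text.lower()
    return any(term.lower() in text for term in banned)

recipes = [
    {"name": "Miso Ramen", "ingredients_text": "pork belly ramen noodles miso"},
    {"name": "Pho Bo",     "ingredients_text": "beef fish sauce rice noodles"},
    {"name": "Risotto",    "ingredients_text": "arborio rice parmesan butter"},
]

safe = [r["name"] for r in recipes
        if not excluded(r["ingredients_text"], ["pork", "fish sauce"])]
print(safe)  # ['Risotto']
```

A recipe is dropped if it contains any banned term, which is exactly the inverse of must's AND behavior: exclusions combine with OR, then the whole group is negated.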
Step 9: Combined constraints — the full recommendation query
The function below combines all constraint types into a single search call: dietary filters, a cook-time ceiling, difficulty exclusions, a rating floor, cuisine preferences, and ingredient boosts. Running this block with diet_tags=["gluten-free"], max_cook_time=60, exclude_difficulty=["hard"], min_rating=4.0, and preferred_ingredients=["chicken", "coconut milk"] returns gluten-free dinner recipes that take no more than 60 minutes, are not hard difficulty, have a rating of at least 4.0, and preferably contain chicken or coconut milk.
Expected output
The function runs a single search combining all constraint types. The craving “something spicy and satisfying for dinner” is embedded into a vector and used for cosine similarity scoring. Hard must conditions enforce that results are gluten-free, take no more than 60 minutes, exclude hard-difficulty recipes, and have a rating of at least 4.0. Soft should conditions boost recipes that contain chicken or coconut milk without excluding those that do not. The hnsw_ef=128 parameter is passed to increase recall at query time. Each result shows the cuisine, cook time, difficulty, rating, and dietary tags.
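How the hard and soft layers interact can be sketched as a composed predicate plus a boost counter. The payload dicts and field names mirror the schema from step 2; the code is an illustration, not the client API:

```python
def hard_filter(r: dict) -> bool:
    # All must conditions: diet tag, cook-time ceiling,
    # difficulty exclusion, rating floor.
    return ("gluten-free" in r["diet_tags"]
            and r["cook_time_min"] <= 60
            and r["difficulty"] != "hard"
            and r["rating"] >= 4.0)

def soft_boost(r: dict, preferred: list[str]) -> int:
    # should conditions: count how many preferred ingredients appear.
    words = set(r["ingredients_text"].split())
    return sum(1 for ing in preferred if ing in words)

recipes = [
    {"name": "Green Curry", "diet_tags": ["gluten-free"], "cook_time_min": 40,
     "difficulty": "medium", "rating": 4.7,
     "ingredients_text": "chicken coconut milk"},
    {"name": "Beef Wellington", "diet_tags": [], "cook_time_min": 150,
     "difficulty": "hard", "rating": 4.9,
     "ingredients_text": "beef pastry mushroom"},
]

candidates = [r for r in recipes if hard_filter(r)]  # prune first
ranked = sorted(candidates,
                key=lambda r: soft_boost(r, ["chicken", "coconut"]),
                reverse=True)
print([r["name"] for r in ranked])  # ['Green Curry']
```

Hard conditions prune candidates outright; soft conditions only reorder what survives, which is why a high rating cannot rescue Beef Wellington here.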
Step 10: Batch recommendations for meal planning
The search_batch method sends multiple queries to the server in a single network call instead of one call per meal. The function below builds one query per meal — each with its own craving vector, meal-type filter, cook-time limit, and optional vegetarian flag — then dispatches all queries at once. Running this block with the three meal requests defined below returns a ranked list for each meal and prints a summary of scores and cook times.
Expected output
The function builds three independent search queries — one for a light vegetarian lunch under 30 minutes, one for a hearty dinner under 60 minutes, and one for a quick Asian dinner under 45 minutes — and dispatches all three in a single search_batch call. Each query uses its own craving vector and filter combination. Without batching, three meals require three separate network round-trips. With search_batch, all three queries execute in a single gRPC call, reducing latency from three round-trips to one. Each meal’s results are printed with their similarity score, cook time, and rating.
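The batching idea can be sketched as building all query specs up front and dispatching them together. The spec dicts below are illustrative, not the client's request type:

```python
meal_requests = [
    {"craving": "light vegetarian lunch", "meal_type": "lunch",
     "max_cook_time": 30, "vegetarian": True},
    {"craving": "hearty dinner", "meal_type": "dinner",
     "max_cook_time": 60, "vegetarian": False},
    {"craving": "quick Asian dinner", "meal_type": "dinner",
     "max_cook_time": 45, "vegetarian": False},
]

def build_query(req: dict) -> dict:
    """One query spec per meal: its own craving text plus its own filters."""
    filters = {"meal_type": req["meal_type"],
               "cook_time_min_lte": req["max_cook_time"]}
    if req["vegetarian"]:
        filters["is_vegetarian"] = True
    return {"text": req["craving"], "filters": filters}

# All specs travel together: one round-trip instead of three.
batch = [build_query(req) for req in meal_requests]
print(len(batch))  # 3
```

The payoff is purely in transport: the per-query work is unchanged, but three network round-trips collapse into one.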
Step 11: User preference learning
When a user rates a recipe, its metadata can be updated to influence future recommendations without re-ingesting the entire dataset. The function below fetches the current payload for a recipe, merges the new user rating into the existing user_ratings map, recomputes the aggregate average across all stored ratings, and writes only the changed fields back using set_payload. Running this block records a rating of 5.0 from user-alice and 4.5 from user-bob for recipe 0 (Thai Green Curry), and a rating of 4.8 from user-alice for recipe 4 (Indian Butter Chicken).
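The read-merge-write cycle can be sketched in plain Python. The payload dict mirrors what set_payload would update, and the avg_user_rating field name is illustrative:

```python
def record_rating(payload: dict, user_id: str, rating: float) -> dict:
    """Merge one user's rating into the payload and recompute the
    aggregate average; returns only the changed fields."""
    ratings = dict(payload.get("user_ratings", {}))  # copy, then merge
    ratings[user_id] = rating
    return {
        "user_ratings": ratings,
        # avg_user_rating is a hypothetical aggregate field for this sketch.
        "avg_user_rating": round(sum(ratings.values()) / len(ratings), 2),
    }

payload = {"name": "Thai Green Curry", "user_ratings": {}}
payload.update(record_rating(payload, "user-alice", 5.0))
payload.update(record_rating(payload, "user-bob", 4.5))
print(payload["avg_user_rating"])  # 4.75
```

Because only the changed fields are returned, the update stays small: the recipe's vector and the rest of its metadata are never rewritten.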
Step 12: Personalized recommendations with preference boosting
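As background for this step, Reciprocal Rank Fusion can be sketched in isolation. It merges ranked lists using only ranks, with the conventional k=60 smoothing constant; this is an illustration of the algorithm, independent of the server-side implementation:

```python
def rrf_fuse(lists: list[list[str]], k: int = 60) -> list[str]:
    """Score each id by the sum of 1/(k + rank) over every list it appears in."""
    scores: dict[str, float] = {}
    for ranked in lists:
        for rank, item in enumerate(ranked, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

by_craving = ["pho", "green_curry", "ramen"]           # current query ranking
by_taste = ["green_curry", "butter_chicken", "pho"]    # taste-profile ranking

fused = rrf_fuse([by_craving, by_taste])
print(fused[0])  # green_curry: ranked well in both lists
```

Items that place well in both lists accumulate the most score, so green_curry (second for the craving, first for the taste profile) overtakes pho (first for the craving, third for the taste profile).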
After recording feedback, recommendations can be personalized by blending the current query vector with vectors from previously liked recipes. The function below collects the stored vectors for all recipes a user has liked, averages them into a taste-profile vector, then issues a two-stage prefetch query: one stage retrieves candidates by the current craving vector and the other by the taste-profile vector. Reciprocal Rank Fusion (RRF) then merges the two ranked lists into a single result. Running this block for “user-alice” — who liked Thai Green Curry and Butter Chicken — returns results that reflect both the current query and her recorded preferences.

Step 13: Delete user data — GDPR compliance
To honor a right-to-erasure request, all stored ratings and preference data for a specific user must be removed from every recipe in the collection. The function below iterates over all recipe points, removes the target user’s entry from each user_ratings map using set_payload, recomputes the aggregate statistics from the remaining ratings, and writes the updated payload back. Running this block for “user-bob” removes his ratings from every recipe that stored them and prints the count of updated records.
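The erasure pass can be sketched with the same payload shape as in step 11; the avg_user_rating field name is illustrative:

```python
def erase_user(payloads: list[dict], user_id: str) -> int:
    """Remove one user's rating from every payload that stores it and
    recompute the aggregate from the remaining ratings; returns the
    number of records updated."""
    updated = 0
    for payload in payloads:
        ratings = payload.get("user_ratings", {})
        if user_id in ratings:
            del ratings[user_id]
            payload["avg_user_rating"] = (
                round(sum(ratings.values()) / len(ratings), 2)
                if ratings else None
            )
            updated += 1
    return updated

payloads = [
    {"name": "Green Curry", "user_ratings": {"user-alice": 5.0, "user-bob": 4.5}},
    {"name": "Butter Chicken", "user_ratings": {"user-alice": 4.8}},
]
print(erase_user(payloads, "user-bob"))  # 1: only one recipe stored his rating
```

Recomputing the aggregate after deletion matters: leaving a stale average behind would let the erased user's preferences keep influencing rankings.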
Step 14: Collection administration
The function below queries four collection endpoints in sequence to gather health metrics — state, recipe count, segment count, storage bytes, and index memory usage — then flushes any pending writes to disk. Running this block after all previous steps prints a full snapshot of the collection state and confirms that buffered writes have been persisted.

Filter patterns used in this article
The table below summarizes every filter pattern used in the recommendation agent, the API call that implements it, and an example value.

| Pattern | API | Example |
|---|---|---|
| Exact match | Field("cuisine").eq("thai") | Match one cuisine. |
| Multi-value match | Field("cuisine").any_of(["thai", "indian"]) | Match any of several cuisines. |
| Exclusion | Field("difficulty").except_of(["hard"]) | Exclude hard recipes. |
| Full-text ingredient search | Field("ingredients_text").text("garlic") | Keyword search in ingredient list. |
| Numeric range | Field("cook_time_min").lte(30) | Maximum cook time. |
| Float threshold | Field("rating").gte(4.0) | Minimum rating. |
| Boolean | Field("is_vegetarian").eq(True) | Vegetarian only. |
| AND logic | FilterBuilder().must(...) | All constraints must match. |
| OR logic | FilterBuilder().should(...) | Preferred but not required. |
| Minimum match | FilterBuilder().min_should(2) | At least N preferences match. |
| Exclusion logic | FilterBuilder().must_not(...) | Allergen or ingredient exclusion. |
Actian VectorAI features used
The table below maps each Actian VectorAI feature to the API method and its role in the recommendation pipeline.

| Feature | API | Purpose |
|---|---|---|
| Collection creation | collections.get_or_create(hnsw_config=...) | Recipe vector space. |
| Point upsert | points.upsert() | Store recipe embeddings with metadata. |
| Semantic search | points.search(filter=..., params=...) | Craving-to-recipe matching. |
| Search batch | points.search_batch(searches=[...]) | Multi-meal planning in one call. |
| Server-side fusion | query(query={"fusion": Fusion.RRF}, prefetch=[...]) | Personalized preference fusion. |
| Prefetch | PrefetchQuery(query=..., limit=...) | Multi-signal candidate retrieval. |
| Point retrieval | points.get(with_vectors=True) | Load liked recipe vectors. |
| Payload merge | points.set_payload(payload=...) | Record user ratings. |
| Keyword index | FieldType.FieldTypeKeyword | Cuisine, diet tags, meal type, difficulty. |
| Text index | TextIndexParams(tokenizer=Word, lowercase=True) | Full-text ingredient search. |
| Bool index | BoolIndexParams() | Vegetarian flag. |
| Integer index (range) | IntegerIndexParams(range=True, is_principal=True) | Cook time range queries. |
| Float index (principal) | FloatIndexParams(is_principal=True) | Rating threshold and ordering. |
| any_of filter | Field("diet_tags").any_of([...]) | Multi-value dietary matching. |
| except_of filter | Field("difficulty").except_of([...]) | Difficulty exclusion. |
| text filter | Field("ingredients_text").text("garlic") | Ingredient keyword search. |
| should / min_should | FilterBuilder().should(...).min_should(2) | Soft ingredient preferences. |
| must_not | FilterBuilder().must_not(...) | Allergen exclusion. |
| Vector count | vde.get_vector_count() | Collection statistics. |
| Collection stats | vde.get_stats() | Storage monitoring. |
| Flush | vde.flush() | Persist to disk. |
Conclusion
Recipe recommendation is a microcosm of every real-world search problem: semantic understanding for vague queries, hard constraints for safety (allergies), soft preferences for personalization (pantry ingredients), and evolving taste. This system illustrates how each Actian VectorAI feature maps to a concrete product need:
- text() filters with the word tokenizer turn ingredient lists into searchable keyword fields without a separate text search engine.
- should() and min_should() express “at least 2 of these ingredients should match” — exactly the pantry-matching behavior expected from a recommendation system.
- any_of() and except_of() handle multi-value fields like dietary tags and difficulty levels naturally.
- search_batch() makes meal planning practical by eliminating per-query network overhead.
- Server-side RRF fusion with prefetch blends current cravings with historical preferences without client-side ranking logic.
- set_payload() enables incremental preference learning without re-ingesting the entire recipe dataset.
Next steps
- Predicate filters deep dive: Master the full Filter DSL with all field types.
- Re-ranking search results: Improve relevance with multi-stage pipelines.
- Building multi-modal systems: Add recipe image search with named vectors.
- Scalable agent memory: Build persistent memory for AI agents.