OpenAI provides pre-trained embedding models that convert text into dense vector representations. These embeddings capture semantic meaning, making them well-suited for similarity search, retrieval-augmented generation (RAG), and clustering tasks.
VectorAI DB works with any OpenAI embedding model. You generate embeddings using the OpenAI API, then store and search them in VectorAI DB using a supported client.
Before running the examples on this page, make sure you have a VectorAI DB collection created and your VectorAI DB instance running. See Collections for setup instructions.
## Supported models
| Model | Dimensions | Description |
|---|---|---|
| `text-embedding-3-small` | 1536 | Smaller, faster model with strong performance for most use cases. |
| `text-embedding-3-large` | 3072 | Higher-dimensional model for maximum accuracy. |
| `text-embedding-ada-002` | 1536 | Legacy model. Use `text-embedding-3-small` for new projects. |
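When you create a collection, its vector size must match the model's output dimension from the table above. A small illustrative helper (not part of either client library) keeps that mapping in one place:

```python
# Model-to-dimension lookup mirroring the table above.
MODEL_DIMS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "text-embedding-ada-002": 1536,
}

def collection_size(model: str) -> int:
    """Return the vector size a collection needs for the given model."""
    return MODEL_DIMS[model]
```

Looking the size up instead of hard-coding it avoids a silent mismatch if you later switch models.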
## Installation

Install the OpenAI Python client and the VectorAI DB client:

```shell
pip install openai actian-vectorai
```
## Generate and store embeddings
The following example generates embeddings for a set of texts using OpenAI’s text-embedding-3-small model, stores them in a VectorAI DB collection, and runs a similarity search:
```python
import openai
from actian_vectorai import VectorAIClient, VectorParams, Distance, PointStruct

OPENAI_API_KEY = "<YOUR_API_KEY>"
EMBEDDING_MODEL = "text-embedding-3-small"
COLLECTION = "openai_docs"

# Initialize the OpenAI client
openai_client = openai.Client(api_key=OPENAI_API_KEY)

# Texts to embed
texts = [
    "VectorAI DB enables fast and scalable semantic search.",
    "Embeddings capture the meaning of text as dense vectors.",
    "Cosine similarity measures the angle between two vectors.",
]

# Generate embeddings using OpenAI
response = openai_client.embeddings.create(input=texts, model=EMBEDDING_MODEL)

# Connect to VectorAI DB and create a collection
with VectorAIClient("localhost:50051") as client:
    client.collections.create(
        COLLECTION,  # Collection name
        vectors_config=VectorParams(
            size=1536,  # Matches text-embedding-3-small output
            distance=Distance.Cosine,  # Distance metric
        ),
    )

    # Build points from embeddings
    points = [
        PointStruct(
            id=idx,  # Point ID
            vector=data.embedding,  # OpenAI embedding vector
            payload={"text": text},  # Original text as metadata
        )
        for idx, (data, text) in enumerate(zip(response.data, texts))
    ]

    # Store the vectors in the collection
    client.points.upsert(COLLECTION, points)
    print(f"Stored {len(points)} vectors in '{COLLECTION}'")
```
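The example above embeds three short texts in a single request. For a larger corpus, you would typically split the inputs into batches before calling the embeddings API. A minimal, library-agnostic batching sketch follows; the batch size of 100 is an illustrative assumption, not a documented API limit:

```python
def batched(items, batch_size=100):
    """Yield successive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# Usage sketch: embed each batch and collect the vectors in order.
# all_vectors = []
# for batch in batched(texts, batch_size=100):
#     resp = openai_client.embeddings.create(input=batch, model=EMBEDDING_MODEL)
#     all_vectors.extend(d.embedding for d in resp.data)
```

Because the API returns embeddings in input order, collecting batch results sequentially preserves the alignment between texts and vectors.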
## Search with OpenAI embeddings
Before running this example, create the collection and upsert the sample points from the previous section.
To search, generate an embedding for the query text using the same model, then pass it as the query vector:
```python
# Generate an embedding for the search query
query = "How does vector similarity work?"
query_embedding = openai_client.embeddings.create(
    input=[query], model=EMBEDDING_MODEL
).data[0].embedding

# Search the collection
with VectorAIClient("localhost:50051") as client:
    results = client.points.search(
        COLLECTION,
        query_vector=query_embedding,  # Query embedding
        limit=3,  # Number of results
    )

for result in results:
    print(f"[{result.score:.4f}] {result.payload['text']}")
```
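Because the collection was created with `Distance.Cosine`, results are ranked by the cosine similarity between the query vector and each stored vector. To make the metric concrete, here is a plain-Python sketch of cosine similarity (for illustration only; VectorAI DB computes this server-side):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

Vectors pointing in the same direction score 1.0, orthogonal vectors score 0.0, so higher scores mean semantically closer texts.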
Always use the same embedding model for both indexing and querying. Mixing models produces incompatible vector spaces and returns meaningless results.
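A cheap way to catch a model mix-up early is to verify vector lengths before upserting. This guard is an illustrative addition, not part of the VectorAI DB client:

```python
def check_dimensions(vectors, expected_size):
    """Raise if any vector's length differs from the collection's size."""
    for i, vec in enumerate(vectors):
        if len(vec) != expected_size:
            raise ValueError(
                f"vector {i} has {len(vec)} dimensions, expected {expected_size}; "
                "was it produced by a different embedding model?"
            )
```

For example, calling `check_dimensions([point.vector for point in points], 1536)` before `upsert` would immediately flag vectors produced by `text-embedding-3-large` in a collection sized for `text-embedding-3-small`.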
## Using `text-embedding-3-large`
For higher accuracy, use text-embedding-3-large, which produces 3072-dimensional vectors. Update the model name and collection dimension accordingly:
```python
EMBEDDING_MODEL = "text-embedding-3-large"

# Create a collection sized for the larger model
with VectorAIClient("localhost:50051") as client:
    client.collections.create(
        "openai_large_docs",
        vectors_config=VectorParams(
            size=3072,  # Matches text-embedding-3-large output
            distance=Distance.Cosine,
        ),
    )
```
## Next steps
To continue building with embeddings, see the following resources:
- LangChain — Use OpenAI embeddings with VectorAI DB through the LangChain framework.
- Vectors — Learn how VectorAI DB stores and indexes vector data.
- Search — Explore the vector search operations available in VectorAI DB.
- Collections — Understand how collections organize your vectors.