Use this page to check collection state, flush pending writes to disk, optimize storage after deletions, rebuild the HNSW index, and manage snapshots for backup and recovery.
Before you begin, make sure you have a running VectorAI DB instance and the Python client library installed (pip install actian-vectorai).

Get collection state

A collection can be in one of the following states:
  • Ready. Collection is loaded into memory and available for operations.
  • Loading. Collection is currently loading into memory.
  • Closed. Collection exists on disk but is not loaded into memory.
  • Error. Collection encountered an issue.
Use get_state() to retrieve the current state.
import asyncio
from actian_vectorai import AsyncVectorAIClient

async def main():
    # Connect to VectorAI DB server
    async with AsyncVectorAIClient("localhost:50051") as client:
        # Get detailed collection state
        state = await client.vde.get_state("my_collection")
        print(f"Collection state: {state}")

asyncio.run(main())
The method returns one of the four states listed above.
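Because a collection can pass through Loading before it becomes Ready, a caller may want to wait for a target state before issuing operations. The following is a minimal sketch of such a polling helper and is not part of the client library. The names wait_for_state, get_state, is_target, timeout, and interval are hypothetical. How a state value should be compared depends on the type that get_state() returns, which this page does not specify, so the comparison is left to a caller-supplied predicate.

```python
import asyncio


async def wait_for_state(get_state, is_target, timeout=30.0, interval=0.5):
    """Poll get_state() until is_target(state) is true or the timeout expires.

    get_state: zero-argument callable returning an awaitable, for example
               lambda: client.vde.get_state("my_collection")
    is_target: caller-supplied predicate applied to each returned state
    """
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        state = await get_state()
        if is_target(state):
            return state
        if loop.time() >= deadline:
            raise TimeoutError(f"Gave up waiting; last state was {state!r}")
        # Pause between polls so the server is not queried in a tight loop
        await asyncio.sleep(interval)
```

Inside main(), a call might look like await wait_for_state(lambda: client.vde.get_state("my_collection"), lambda s: "ready" in str(s).lower()), assuming the string form of the state contains the state name.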

Flush a collection

VectorAI DB writes data changes to disk asynchronously for performance. Flushing forces all pending writes to be persisted immediately, ensuring data durability. Use this operation in these situations:
  • Before critical operations or backups.
  • After large batch inserts.
  • Before shutting down applications.
  • When data durability is critical.
import asyncio
from actian_vectorai import AsyncVectorAIClient

async def main():
    # Connect to VectorAI DB server
    async with AsyncVectorAIClient("localhost:50051") as client:
        # Insert data operations would go here
        # ... insert operations ...

        # Force flush to disk
        await client.vde.flush("my_collection")
        print("Collection flushed to disk")

asyncio.run(main())
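For the batch-insert and shutdown cases listed above, it can be convenient to tie the flush to the end of a block of work. The following sketch wraps the flush() call in an async context manager. The helper name flush_on_exit is hypothetical and not part of the client library. It relies only on client.vde.flush() as shown in the example above.

```python
import contextlib


@contextlib.asynccontextmanager
async def flush_on_exit(client, collection_name):
    """Flush the collection when the block exits, even if the block raised."""
    try:
        yield client
    finally:
        # Runs on normal exit and on error, so writes made before a failure
        # are still persisted
        await client.vde.flush(collection_name)
```

Usage inside main() would be async with flush_on_exit(client, "my_collection"): followed by the insert operations. Flushing on error is a design choice: it persists whatever part of the batch was written before the failure, which may or may not be what an application wants.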

Optimize a collection

When points are deleted from a collection, VectorAI DB marks them as deleted but does not immediately reclaim storage. Optimization compacts the collection by removing deleted points, reducing disk usage and improving index efficiency. Run optimization after large deletions or when storage usage is high.
import asyncio
from actian_vectorai import AsyncVectorAIClient

async def main():
    # Connect to VectorAI DB server
    async with AsyncVectorAIClient("localhost:50051") as client:
        # Optimize collection (reclaim space from deleted points)
        print("Optimizing collection...")
        await client.vde.optimize("my_collection")
        print("Optimization complete")

        # Check stats after optimization
        stats = await client.vde.get_stats("my_collection")
        print(f"Stats after optimization: {stats}")

asyncio.run(main())
get_stats() returns the following fields:
  • total_points: Total number of points in the collection.
  • deleted_points: Number of points still pending cleanup.
  • segment_count: Number of storage segments in the collection.
  • index_size_bytes: Total size of the HNSW index in bytes.
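The total_points and deleted_points fields can drive the decision to optimize. The sketch below is one possible rule, based on the share of deleted points. The function name should_optimize and the 20% default threshold are illustrative assumptions, not recommendations from VectorAI DB, and the sketch assumes that the ratio of deleted_points to total_points is a meaningful measure for the collection.

```python
def should_optimize(total_points, deleted_points, max_deleted_ratio=0.2):
    """Return True when deleted points exceed the given share of total points."""
    if deleted_points <= 0:
        return False  # nothing to reclaim
    if total_points <= 0:
        return True  # only deleted points remain
    return deleted_points / total_points > max_deleted_ratio
```

With a stats object from get_stats(), the check would read should_optimize(stats.total_points, stats.deleted_points) before calling optimize().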

Rebuild an index

Rebuilding creates a new HNSW index from scratch. This can restore search accuracy after large bulk updates and is required for HNSW parameter changes to take effect. Rebuild an index in these situations:
  • After changing HNSW parameters.
  • When search performance degrades significantly.
  • After massive bulk updates.
  • During scheduled maintenance windows.
import asyncio
from actian_vectorai import AsyncVectorAIClient


async def main():
    # Connect to VectorAI DB server
    async with AsyncVectorAIClient("localhost:50051") as client:
        COLLECTION = "large_dataset"

        # Start index rebuild
        print("Starting index rebuild...")
        task_id = await client.vde.rebuild_index(COLLECTION)
        print(f"Rebuild task started: {task_id}")

        # Monitor rebuild progress
        while True:
            # Get all rebuild tasks (returns tuple: tasks_list, total_count)
            tasks_list, total = await client.vde.list_rebuild_tasks(collection_name=COLLECTION)

            # Find current task by ID
            current_task = next(
                (t for t in tasks_list if t.task_id == task_id), None)

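            # Stop when the task completes, fails, or is no longer listed
            if not current_task or current_task.status in ("completed", "failed"):
                status = current_task.status if current_task else "no longer listed"
                print(f"\nIndex rebuild finished with status: {status}")
                break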

            # Display progress
            progress = current_task.progress_percent
            print(f"Progress: {progress:.1f}%", end='\r')

            # Wait before checking again
            await asyncio.sleep(1)

asyncio.run(main())

Manage snapshots

The following examples show how to create and load collection snapshots for backup and recovery.
import asyncio
from actian_vectorai import AsyncVectorAIClient

async def main():
    # Connect to VectorAI DB server
    async with AsyncVectorAIClient("localhost:50051") as client:
        COLLECTION = "important_data"
        
        # Save snapshot (the call returns the snapshot location)
        snapshot_path = await client.vde.save_snapshot(
            COLLECTION
        )
        print(f"Snapshot saved: {snapshot_path}")
        
        # Later: restore from snapshot
        await client.vde.load_snapshot(
            COLLECTION
        )
        print("Collection restored from snapshot")

asyncio.run(main())
Snapshots support these scenarios:
  • Regular backups.
  • Testing with production data.
  • Disaster recovery.
  • Environment promotion across stages.
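For regular backups, each snapshot usually needs a distinct file name so that a new backup does not overwrite an earlier one. The sketch below builds a timestamped path that could be passed through the snapshot_path keyword used in the maintenance workflow on this page. The helper name build_snapshot_path, the naming scheme, and the backup directory are hypothetical. Whether the server accepts a given path depends on its configuration.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def build_snapshot_path(backup_dir, collection_name, now=None):
    """Return a path such as <backup_dir>/<collection>_<UTC timestamp>.snap."""
    # Normalize to UTC so file names sort chronologically across hosts
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    return str(PurePosixPath(backup_dir) / f"{collection_name}_{stamp}.snap")
```

Calling build_snapshot_path("/backups", "important_data") produces a new name on each run, which leaves older snapshots in place for recovery.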

List rebuild tasks

The following example lists all ongoing index rebuild operations.
import asyncio
from actian_vectorai import AsyncVectorAIClient

async def main():
    # Connect to VectorAI DB server
    async with AsyncVectorAIClient("localhost:50051") as client:
        # List all rebuild tasks (returns tuple: tasks_list, total_count)
        tasks, total = await client.vde.list_rebuild_tasks()
        
        if not tasks:
            print("No rebuild tasks running")
        else:
            print(f"Active rebuild tasks: {len(tasks)}\n")
            for task in tasks:
                # Display task details
                print(f"Task ID: {task.task_id}")
                print(f"  Collection: {task.collection_name}")
                print(f"  Status: {task.status}")
                print(f"  Progress: {task.progress_percent:.1f}%")
                print(f"  Started: {task.start_time}")
                print()

asyncio.run(main())
Each task includes these fields:
  • task_id: Unique identifier for the rebuild task.
  • collection_name: Name of the collection being rebuilt.
  • status: Current status of the task (running, completed, failed).
  • progress_percent: Completion percentage (0-100).
  • start_time: Timestamp when the task started.
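When many tasks are listed, a per-status count is often easier to scan than the full listing. The sketch below groups tasks by their status field. The function name summarize_tasks is hypothetical, and the sketch assumes that status is a hashable value such as a string.

```python
from collections import Counter


def summarize_tasks(tasks):
    """Count tasks per status, for example {'running': 2, 'failed': 1}."""
    return dict(Counter(task.status for task in tasks))
```

The input is the task list returned by list_rebuild_tasks(). A nonzero count for the failed status is a signal to inspect those tasks individually.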

Complete maintenance workflow

The following example combines the individual maintenance operations from this page into a single workflow. Use it as a template for scheduled maintenance routines that flush, optimize, and back up a collection in sequence.
import asyncio
from actian_vectorai import AsyncVectorAIClient

async def maintenance_workflow(client, collection_name):
    """Complete maintenance workflow for a collection"""
    
    print(f"=== Maintenance for '{collection_name}' ===\n")
    
    # 1. Get initial stats
    print("1. Initial state:")
    stats = await client.vde.get_stats(collection_name)
    print(f"   Points: {stats.total_points:,}")
    print(f"   Deleted: {stats.deleted_points:,}")
    print(f"   Segments: {stats.segment_count}")
    
    # 2. Flush pending changes
    print("\n2. Flushing to disk...")
    await client.vde.flush(collection_name)
    
    # 3. Optimize if needed
    if stats.deleted_points > 1000:
        print(f"\n3. Optimizing ({stats.deleted_points:,} deleted points)...")
        await client.vde.optimize(collection_name)
    
    # 4. Save snapshot
    print("\n4. Creating snapshot...")
    snapshot = await client.vde.save_snapshot(
        collection_name,
        snapshot_path=f"/backups/{collection_name}_maintenance.snap"  # Backup path
    )
    print(f"   Saved: {snapshot}")
    
    # 5. Get final stats
    print("\n5. Final state:")
    final_stats = await client.vde.get_stats(collection_name)
    print(f"   Points: {final_stats.total_points:,}")
    print(f"   Deleted: {final_stats.deleted_points:,}")
    print(f"   Index size: {final_stats.index_size_bytes / 1024 / 1024:.2f} MB")
    
    print("\n=== Maintenance complete ===")

async def main():
    # Connect to VectorAI DB server
    async with AsyncVectorAIClient("localhost:50051") as client:
        # Run maintenance workflow
        await maintenance_workflow(client, "products")

asyncio.run(main())
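
A scheduled routine often covers more than one collection. The sketch below runs a workflow such as maintenance_workflow for each collection in turn and records failures instead of stopping at the first one. The helper name run_for_collections is hypothetical. Running the collections sequentially rather than concurrently is an assumption made here to avoid several optimizations competing for server resources at once.

```python
async def run_for_collections(workflow, client, collection_names):
    """Run workflow(client, name) for each collection and collect failures.

    Returns a dict mapping each failed collection name to its exception.
    """
    failures = {}
    for name in collection_names:
        try:
            await workflow(client, name)
        except Exception as exc:
            # Record the error and continue with the remaining collections
            failures[name] = exc
    return failures
```

Inside main(), a call might look like failures = await run_for_collections(maintenance_workflow, client, ["products", "large_dataset"]), followed by a report of any entries in failures.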