VectorAI DB exposes a /metrics endpoint on the REST API port (default 6333) that serves metrics in Prometheus/OpenMetrics format. Use these metrics to monitor REST API usage, process health, application status, and collection statistics.
| Property | Value |
|---|---|
| Endpoint | GET /metrics |
| Port | REST API port (default 6333) |
| Format | Prometheus / OpenMetrics |
Scrape configuration
Add VectorAI DB as a Prometheus scrape target. The following example shows a minimal prometheus.yml configuration:
scrape_configs:
  - job_name: "vectorai"
    scrape_interval: 15s
    static_configs:
      - targets: ["localhost:6333"]
For Docker Compose deployments, replace localhost with the service name:
scrape_configs:
  - job_name: "vectorai"
    scrape_interval: 15s
    static_configs:
      - targets: ["vectorai:6333"]
The /metrics endpoint does not require authentication. If you expose it on a public network, restrict access with a firewall rule or reverse proxy.
Available metrics
The following sections describe every metric exposed by the /metrics endpoint, grouped by category. All metrics use the prefix actian_vectorai_, so the full metric name is actian_vectorai_<name>: for example, actian_vectorai_collections_total or actian_vectorai_rest_responses_total. To save space, the tables omit the prefix from metric names.
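When inspecting the endpoint by hand, it can help to strip the shared prefix so that names line up with the tables on this page. The following is a minimal sketch, not an official client: it parses Prometheus text-format lines with the standard library only, and the sample scrape output in EXAMPLE is invented for illustration.

```python
import re

PREFIX = "actian_vectorai_"

# Matches: metric_name{label="value",...} 12.5   (the label block is optional)
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return (short_name, labels, value) tuples for prefixed samples.

    Escape sequences inside label values are kept as-is (not unescaped).
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and HELP/TYPE lines
            continue
        match = SAMPLE_RE.match(line)
        if not match:
            continue
        name, raw_labels, value = match.groups()
        if not name.startswith(PREFIX):
            continue
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name[len(PREFIX):], labels, float(value)))
    return samples


# Hypothetical scrape output, invented for illustration only.
EXAMPLE = '''
# TYPE actian_vectorai_collections_total gauge
actian_vectorai_collections_total 3
actian_vectorai_collection_points{collection="docs"} 1200
'''

for short_name, labels, value in parse_metrics(EXAMPLE):
    print(short_name, labels, value)
```

In practice the text would come from an HTTP GET against the /metrics endpoint rather than a literal string.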
Label Keys
| Concept | Label key |
|---|---|
| Collection name | collection |
| Named vector space | vector_name |
| HTTP or gRPC route | endpoint |
| HTTP verb | method |
| HTTP or gRPC status code (string) | status |
| App name | name |
| App version | version |
Prometheus Naming Rules Applied
- Counters end in _total.
- Duration histograms end in _duration_seconds (base unit: seconds).
- Memory gauges end in _bytes.
- Boolean state gauges have a descriptive suffix (_running, _mode).
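These rules run in one direction only: every counter ends in _total, but a name ending in _total is not necessarily a counter (collections_total, for example, is listed as a gauge). The sketch below checks the counter and histogram rules against a handful of name/type pairs copied from the tables on this page. The helper name naming_violations is hypothetical.

```python
# A few name/type pairs copied from the tables on this page (prefix omitted).
METRICS = {
    "rest_responses_total": "counter",
    "rebuild_failed_total": "counter",
    "rest_responses_duration_seconds": "histogram",
    "memory_resident_bytes": "gauge",
    "collections_total": "gauge",  # a gauge that also ends in _total
}


def naming_violations(metrics):
    """Return names that break the counter or histogram suffix rules."""
    bad = []
    for name, kind in metrics.items():
        if kind == "counter" and not name.endswith("_total"):
            bad.append(name)
        elif kind == "histogram" and not name.endswith("_duration_seconds"):
            bad.append(name)
    return bad


print(naming_violations(METRICS))  # prints []
```

Because of the one-directional rule, tooling should read the metric type from the TYPE line of the scrape output rather than infer it from the suffix.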
Application info
These metrics expose application identity and operational state.
| Metric | Type | Labels | Description |
|---|---|---|---|
| app_info | Gauge | name, version | Application identity as name and version. Set once when the process starts from built-in metadata. |
| app_status_recovery_mode | Gauge | (none) | 1 if the engine is in recovery mode, 0 otherwise. Changes whenever the engine enters or exits recovery mode. |
Collection metrics
These metrics provide visibility into collection sizes, vector counts, point counts, and optimization state.
| Metric | Type | Labels | Description |
|---|---|---|---|
| collections_total | Gauge | (none) | Total number of collections, both loaded in memory and present on disk. Increased on creation and decreased on removal. |
| collections_vector_total | Gauge | (none) | Aggregate vector count across all collections. Recomputed whenever any collection’s vector count changes. |
| collection_point_total | Gauge | (none) | Aggregate point count across all collections. |
| collection_points | Gauge | collection | Live point count in a named collection. Taken from the count of external identifiers the collection tracks. |
| collection_vectors | Gauge | collection, vector_name | Vector count per named vector space. Calculated by summing vector counts per space; updated on inserts, deletes, and rebuilds. |
| collection_indexed_only_excluded_vectors | Gauge | collection, vector_name | Number of vectors excluded from the indexed-only search (for example, deleted or hidden points). |
| collection_running_optimizations | Gauge | collection | 1 if the collection is undergoing a rebuild or optimization, 0 if idle. Set when a rebuild task begins and cleared when it ends. |
Rebuild metrics
These metrics track index rebuild operations across all collections.
| Metric | Type | Labels | Description |
|---|---|---|---|
| rebuild_running | Gauge | collection | 1 if at least one rebuild is in progress, 0 otherwise. Reset to 0 when the last active rebuild finishes. |
| rebuild_triggered_total | Counter | collection | Cumulative count of rebuild tasks submitted. Incremented each time a rebuild request is accepted. |
| rebuild_success_total | Counter | collection | Cumulative count of rebuilds that finished successfully. |
| rebuild_failed_total | Counter | collection | Cumulative count of rebuilds that failed or were cancelled. |
| rebuild_duration_seconds | Histogram | collection | Total rebuild durations, measured from start to finish and recorded in predefined time buckets. |
| rebuild_vectors_processed_total | Counter | collection | Total vectors processed across all rebuilds (read or written). |
| rebuild_vectors_skipped_total | Counter | collection | Total vectors skipped during rebuilds because they were already updated. |
| rebuild_vectors_deleted_total | Counter | collection | Total vectors deleted across all rebuilds. |
| rebuild_phase_duration_seconds | Histogram | collection, phase | Duration of individual rebuild phases (for example, initialize, populate, catchup, finalize). |
Snapshot
These metrics track snapshot creation and recovery operations.
| Metric | Type | Labels | Description |
|---|---|---|---|
| snapshot_creation_running | Gauge | collection | 1 while SaveSnapshot is executing for a collection, 0 if idle. |
| snapshot_recovery_running | Gauge | collection | 1 while LoadSnapshot is executing for a collection, 0 if idle. |
| snapshot_created_total | Counter | collection | Cumulative count of successful snapshot saves. |
REST API
These metrics track HTTP request volume and latency across all REST endpoints.
| Metric | Type | Labels | Description |
|---|---|---|---|
| rest_responses_total | Counter | endpoint, method, status | Total number of REST responses by route, method, and status code. Increased for every response the server sends. |
| rest_responses_fail_total | Counter | endpoint, method | REST responses that returned a 5xx status. |
| rest_responses_duration_seconds | Histogram | endpoint, method | REST request latency per route and method. |
Use actian_vectorai_rest_responses_total to track request rates and error ratios. Use actian_vectorai_rest_responses_duration_seconds to compute percentile latencies (p50, p95, p99) per endpoint.
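To make the percentile computation concrete, here is a rough sketch of the linear interpolation that Prometheus's histogram_quantile function applies to cumulative buckets. The bucket boundaries and counts below are hypothetical; the real boundaries are whatever the server exposes in the _bucket series.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate a quantile from cumulative (upper_bound, count) buckets.

    Assumes 0 < q <= 1 and that the last bucket has upper bound +Inf,
    as in a Prometheus classic histogram.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Quantile falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan


# Hypothetical cumulative bucket counts (upper bound in seconds, count).
HYPOTHETICAL_BUCKETS = [(0.1, 50), (0.5, 90), (1.0, 98), (math.inf, 100)]

print(histogram_quantile(0.95, HYPOTHETICAL_BUCKETS))  # roughly 0.81 seconds
```

Because the estimate interpolates within a bucket, its accuracy depends on how finely the bucket boundaries cover the latency range of interest.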
gRPC API
These metrics track gRPC call volume and latency.
| Metric | Type | Labels | Description |
|---|---|---|---|
| grpc_responses_total | Counter | endpoint, status | Total number of gRPC responses by fully qualified method and status. Increased for every completed RPC call. |
| grpc_responses_fail_total | Counter | endpoint | gRPC responses with an error status. |
| grpc_responses_duration_seconds | Histogram | endpoint | gRPC call latency per fully qualified method, measured from call start to final status. |
Combined API
These metrics track combined requests and latency for REST and gRPC.
| Metric | Type | Description |
|---|---|---|
| api_requests_total | Counter | Total number of API requests (REST + gRPC) received since the last server start |
| api_responses_duration_seconds | Histogram | Request latency across REST and gRPC, buckets shared with per-API histograms |
Process metrics
These metrics report on the health of the VectorAI DB process at the operating system level, including memory usage from the allocator.
| Metric | Type | Description |
|---|---|---|
| memory_resident_bytes | Gauge | Resident set size |
| process_threads | Gauge | Number of live threads |
| process_open_fds | Gauge | Open file descriptor / handle count |
| process_open_mmaps | Gauge | Open memory-mapped regions |
| process_cpu_cores | Gauge | Logical CPU core count observed by the process |
| process_cpu_frequency_hz | Gauge | Observed CPU frequency (hertz, from /proc/cpuinfo or Windows registry) |
| process_minor_page_faults_total | Counter | Minor page faults since start (Linux only) |
| process_major_page_faults_total | Counter | Major page faults since start (Linux only) |
| process_cpu_seconds_total | Counter | Total CPU time consumed (user + kernel) |
| process_uptime_seconds | Gauge | Process uptime in seconds (time since telemetry initialization) |
| process_memory_usage_bytes | Gauge | Total memory currently used by the process (working set/private bytes) |
| process_memory_total_bytes | Gauge | Total physical memory available to the machine |
| process_memory_free_bytes | Gauge | Currently available physical memory observed on the host |
| process_disk_usage_bytes | Gauge | Disk space consumed in the process data path |
| process_disk_size_bytes | Gauge | Total disk capacity reported by std::filesystem::space() for the configured VDE data path |
- The metric actian_vectorai_process_memory_free_bytes is sourced directly from the operating system (Windows GlobalMemoryStatusEx, Linux sysinfo). It reflects machine-wide available RAM, independent of the process’s own usage metrics.
A sustained increase in actian_vectorai_process_major_page_faults_total indicates the system is running low on physical memory and paging to disk, which severely degrades search performance. Consider increasing available memory or reducing the number of loaded collections.
Example PromQL queries
The following Prometheus Query Language examples demonstrate common monitoring patterns that you can use in Grafana or any Prometheus-compatible dashboard tool.
REST request rate by endpoint
sum by (endpoint) (rate(actian_vectorai_rest_responses_total[5m]))
REST error ratio
sum(rate(actian_vectorai_rest_responses_fail_total[5m]))
/
sum(rate(actian_vectorai_rest_responses_total[5m]))
REST p95 latency per endpoint
histogram_quantile(0.95, sum by (le, endpoint) (rate(actian_vectorai_rest_responses_duration_seconds_bucket[5m])))
gRPC request rate by method
sum by (endpoint) (rate(actian_vectorai_grpc_responses_total[5m]))
gRPC error ratio
sum(rate(actian_vectorai_grpc_responses_fail_total[5m]))
/
sum(rate(actian_vectorai_grpc_responses_total[5m]))
Memory usage
actian_vectorai_memory_resident_bytes
Total vectors across all collections
actian_vectorai_collections_vector_total
Points per collection
actian_vectorai_collection_points
Active rebuilds
actian_vectorai_rebuild_running
Rebuild success rate
sum(rate(actian_vectorai_rebuild_success_total[1h]))
/
sum(rate(actian_vectorai_rebuild_triggered_total[1h]))
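The ratio queries above each divide one per-second rate by another. The sketch below walks through that arithmetic with two hypothetical counter snapshots taken five minutes apart. It is a simplification: unlike Prometheus's rate(), it does not handle counter resets or extrapolate to the window edges.

```python
def per_second_rate(first, last, interval_seconds):
    """Average per-second increase of a counter between two snapshots."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return (last - first) / interval_seconds


def ratio(numerator_rate, denominator_rate):
    """Return numerator/denominator, or None when there was no traffic."""
    if denominator_rate == 0:
        return None
    return numerator_rate / denominator_rate


# Hypothetical counter values taken 300 seconds (5 minutes) apart.
# rest_responses_fail_total went from 40 to 55:
fail_rate = per_second_rate(first=40, last=55, interval_seconds=300)
# rest_responses_total went from 9000 to 9600:
total_rate = per_second_rate(first=9000, last=9600, interval_seconds=300)

print(ratio(fail_rate, total_rate))  # about 0.025, that is 2.5% of responses
```

Returning None for an idle window is a choice made for this sketch; in PromQL, dividing by a zero rate yields NaN, so an alert comparing the ratio against a threshold does not fire when there is no traffic.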
Recommended alerts
The following table lists suggested Prometheus alerting rules for production deployments.
| Alert | Condition | Severity | Description |
|---|---|---|---|
| High REST error rate | sum(rate(actian_vectorai_rest_responses_fail_total[5m])) / sum(rate(actian_vectorai_rest_responses_total[5m])) > 0.05 | Warning | More than 5% of REST requests failing |
| High REST p95 latency | histogram_quantile(0.95, sum by (le) (rate(actian_vectorai_rest_responses_duration_seconds_bucket[5m]))) > 2 | Warning | REST p95 latency exceeds 2 seconds |
| High gRPC error rate | sum(rate(actian_vectorai_grpc_responses_fail_total[5m])) / sum(rate(actian_vectorai_grpc_responses_total[5m])) > 0.05 | Warning | More than 5% of gRPC calls failing |
| Recovery mode active | actian_vectorai_app_status_recovery_mode == 1 | Critical | Engine is in recovery mode |
| High memory usage | actian_vectorai_memory_resident_bytes > 0.8 * <memory_limit> | Warning | RSS exceeds 80% of available memory |
| Major page faults rising | rate(actian_vectorai_process_major_page_faults_total[5m]) > 10 | Warning | Sustained major page faults indicate memory pressure |
| File descriptor exhaustion | actian_vectorai_process_open_fds > 0.8 * <fd_limit> | Warning | Open file descriptors approaching system limit |
| Rebuild failures | rate(actian_vectorai_rebuild_failed_total[1h]) > 0 | Warning | One or more index rebuilds have failed |
Replace <memory_limit> and <fd_limit> with the actual limits for your deployment environment.
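As a small worked example of filling in those placeholders, the sketch below computes the 80% thresholds and renders the two conditions as strings. The function names and the deployment figures (a 16 GiB memory limit and a 65536 file descriptor limit) are hypothetical; substitute the limits that apply to your environment.

```python
GIB = 1024 ** 3


def memory_alert_expr(memory_limit_bytes, fraction=0.8):
    """Return the 'High memory usage' condition with the limit filled in."""
    threshold = int(memory_limit_bytes * fraction)
    return f"actian_vectorai_memory_resident_bytes > {threshold}"


def fd_alert_expr(fd_limit, fraction=0.8):
    """Return the 'File descriptor exhaustion' condition with the limit filled in."""
    threshold = int(fd_limit * fraction)
    return f"actian_vectorai_process_open_fds > {threshold}"


# Hypothetical deployment: 16 GiB memory limit, 65536 file descriptors.
print(memory_alert_expr(16 * GIB))
print(fd_alert_expr(65536))
```

The resulting strings can be pasted into the expr field of an alerting rule.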
Example alerting rule
The following Prometheus alerting rule fires when the REST error ratio exceeds 5% for more than 5 minutes:
groups:
  - name: vectorai
    rules:
      - alert: VectorAIHighErrorRate
        expr: >
          sum(rate(actian_vectorai_rest_responses_fail_total[5m]))
          /
          sum(rate(actian_vectorai_rest_responses_total[5m]))
          > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "VectorAI DB error rate above 5%"
          description: "{{ $value | humanizePercentage }} of requests are returning errors."
Logging
VectorAI DB writes structured logs to stdout. Configure the log format and level to suit your log aggregation pipeline.
Set the log format to json for machine-readable output compatible with log aggregation tools such as Elasticsearch, Loki, or Datadog.
The default format is text, which is human-readable but harder to parse programmatically.
Log level
Control log verbosity with the level setting:
| Level | Use case |
|---|---|
| error | Production: only errors |
| warn | Production: errors and warnings |
| info | Production default: normal operational messages |
| debug | Troubleshooting: verbose output |
| trace | Development only: extremely verbose |
Running at debug or trace level in production generates significant log volume and may impact performance. Use these levels only for short-term troubleshooting.
Next steps
Explore these related guides to learn more.
- Troubleshooting: Diagnose connection, performance, and startup issues.
- Error handling: Handle specific gRPC error codes in your application code.
- Docker installation: Container setup, volume mounts, and Docker Compose configuration.
- License and upgrade: Manage license keys and upgrade your VectorAI DB deployment.