VectorAI DB exposes a /metrics endpoint on the REST API port (default 6333) that serves metrics in Prometheus/OpenMetrics format. Use these metrics to monitor REST and gRPC API usage, process health, application status, collection statistics, and index rebuild and snapshot activity.
  • Endpoint: GET /metrics
  • Port: REST API port (default 6333)
  • Format: Prometheus / OpenMetrics
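For ad hoc inspection or scripting, the endpoint can be read with the Python standard library. The following is a minimal sketch, assuming the plain Prometheus text exposition format. `fetch_metrics` and `parse_metrics` are hypothetical helper names, and the parser handles only simple sample lines (label values must not contain `}`; timestamps and exemplars are ignored). For anything beyond quick checks, the `prometheus_client` package ships a fuller parser (`prometheus_client.parser.text_string_to_metric_families`).

```python
import re
import urllib.request
from typing import Dict, List, Tuple

# One sample line: metric name, optional {labels}, then the value.
# Anything after the value (timestamp, exemplar) is ignored.
SAMPLE_RE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)"
    r"(?:\{(?P<labels>[^}]*)\})?"
    r"\s+(?P<value>\S+)"
)
# key="value" pairs; allows escaped characters such as \" inside values.
LABEL_RE = re.compile(r'(?P<key>[a-zA-Z_][a-zA-Z0-9_]*)="(?P<val>(?:[^"\\]|\\.)*)"')


def parse_metrics(text: str) -> List[Tuple[str, Dict[str, str], float]]:
    """Parse text-format samples into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and HELP/TYPE/EOF lines
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        labels = {m["key"]: m["val"] for m in LABEL_RE.finditer(match["labels"] or "")}
        # float() also accepts "+Inf" and "NaN" as used by Prometheus.
        samples.append((match["name"], labels, float(match["value"])))
    return samples


def fetch_metrics(base_url: str = "http://localhost:6333") -> str:
    """Download the raw /metrics payload from a VectorAI DB instance."""
    with urllib.request.urlopen(f"{base_url}/metrics", timeout=5) as resp:
        return resp.read().decode("utf-8")
```

For example, `parse_metrics(fetch_metrics("http://localhost:6333"))` returns one `(name, labels, value)` tuple per sample, with `# HELP` and `# TYPE` lines skipped.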

Scrape configuration

Add VectorAI DB as a Prometheus scrape target. The following example shows a minimal prometheus.yml configuration:
scrape_configs:
  - job_name: "vectorai"
    scrape_interval: 15s
    static_configs:
      - targets: ["localhost:6333"]
For Docker Compose deployments, replace localhost with the Compose service name of the VectorAI DB container (vectorai in this example):
scrape_configs:
  - job_name: "vectorai"
    scrape_interval: 15s
    static_configs:
      - targets: ["vectorai:6333"]
The /metrics endpoint does not require authentication. If you expose it on a public network, restrict access with a firewall rule or reverse proxy.

Available metrics

The following sections describe every metric exposed by the /metrics endpoint, grouped by category. All metrics share the prefix actian_vectorai_, so the full name of a metric is actian_vectorai_<name> (for example, actian_vectorai_collections_total or actian_vectorai_rest_responses_total). The tables below omit the prefix to save space. Histograms are exposed as the standard Prometheus _bucket, _sum, and _count series.

Label keys

| Concept | Label key |
|---|---|
| Collection name | collection |
| Named vector space | vector_name |
| HTTP or gRPC route | endpoint |
| HTTP verb | method |
| HTTP or gRPC status code (string) | status |
| Rebuild phase | phase |
| App name | name |
| App version | version |
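When building queries or dashboards programmatically, the prefix and label keys combine into ordinary PromQL selectors. The helper below is a hypothetical illustration, not part of a VectorAI DB SDK: it accepts either the short name used in the tables or the full name, and escapes label values per PromQL string rules. The collection name `docs` in the usage line is also hypothetical.

```python
PREFIX = "actian_vectorai_"


def selector(name: str, **labels: str) -> str:
    """Build a PromQL instant-vector selector for a VectorAI DB metric.

    `name` may be the short table name (prefix omitted) or the full name.
    Label keys should come from the label-key table (collection, endpoint, ...).
    """
    full = name if name.startswith(PREFIX) else PREFIX + name
    if not labels:
        return full
    parts = []
    for key, value in sorted(labels.items()):  # stable ordering for readability
        # Escape backslashes first, then double quotes, for a valid PromQL string.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        parts.append(f'{key}="{escaped}"')
    return full + "{" + ",".join(parts) + "}"
```

For example, `selector("collection_points", collection="docs")` yields `actian_vectorai_collection_points{collection="docs"}`.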

Prometheus naming rules applied

  • Counters end in _total.
  • Duration histograms end in _duration_seconds (base unit: seconds).
  • Memory gauges end in _bytes.
  • Boolean state gauges have a descriptive suffix (_running, _mode).

Application info

These metrics expose application identity and operational state.
| Metric | Type | Labels | Description |
|---|---|---|---|
| app_info | Gauge | name, version | Application identity (name and version). Set once at process start from built-in metadata. |
| app_status_recovery_mode | Gauge | (none) | 1 if the engine is in recovery mode, 0 otherwise. Updated whenever the engine enters or exits recovery mode. |
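A deployment script can read both values straight from the raw payload, for example to log the running version or to hold off while the engine is in recovery mode. This is a sketch assuming the standard text format, where each of these metrics appears on its own sample line; `app_identity` and `in_recovery_mode` are hypothetical names.

```python
import re
from typing import Dict, Optional

# Info metric sample, e.g. actian_vectorai_app_info{name="...",version="..."}
INFO_RE = re.compile(r"^actian_vectorai_app_info\{(?P<labels>[^}]*)\}", re.MULTILINE)
# Recovery-mode gauge sample (no labels), capturing its value.
RECOVERY_RE = re.compile(
    r"^actian_vectorai_app_status_recovery_mode\s+(?P<value>\S+)", re.MULTILINE
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def app_identity(metrics_text: str) -> Optional[Dict[str, str]]:
    """Return the app_info labels (name, version), or None if absent."""
    match = INFO_RE.search(metrics_text)
    if match is None:
        return None
    return dict(LABEL_RE.findall(match["labels"]))


def in_recovery_mode(metrics_text: str) -> bool:
    """True only if the recovery-mode gauge is present and equal to 1."""
    match = RECOVERY_RE.search(metrics_text)
    return match is not None and float(match["value"]) == 1.0
```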

Collection metrics

These metrics provide visibility into collection sizes, vector counts, point counts, and optimization state.
| Metric | Type | Labels | Description |
|---|---|---|---|
| collections_total | Gauge | (none) | Total number of collections, both loaded in memory and present on disk. Incremented when a collection is created and decremented when one is removed. |
| collections_vector_total | Gauge | (none) | Aggregate vector count across all collections. Recomputed whenever any collection's vector count changes. |
| collection_point_total | Gauge | (none) | Aggregate point count across all collections. |
| collection_points | Gauge | collection | Live point count in a named collection, taken from the number of external identifiers the collection tracks. |
| collection_vectors | Gauge | collection, vector_name | Vector count per named vector space, calculated by summing vector counts per space. Updated on inserts, deletes, and rebuilds. |
| collection_indexed_only_excluded_vectors | Gauge | collection, vector_name | Number of vectors excluded from indexed-only search (for example, deleted or hidden points). |
| collection_running_optimizations | Gauge | collection | 1 if the collection is undergoing a rebuild or optimization, 0 if idle. Set when a rebuild task begins and cleared when it ends. |
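Because collection_vectors carries one series per named vector space, a per-collection vector total is a sum over vector_name. In PromQL that is `sum by (collection) (actian_vectorai_collection_vectors)`. The Python sketch below performs the same aggregation over already-parsed samples, assuming they arrive as `(name, labels, value)` tuples, a hypothetical shape chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Sample = Tuple[str, Dict[str, str], float]  # (metric name, labels, value)


def vectors_per_collection(samples: Iterable[Sample]) -> Dict[str, float]:
    """Sum actian_vectorai_collection_vectors across vector_name, per collection."""
    totals: Dict[str, float] = defaultdict(float)
    for name, labels, value in samples:
        if name == "actian_vectorai_collection_vectors":
            totals[labels.get("collection", "")] += value
    return dict(totals)
```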

Rebuild metrics

These metrics track index rebuild operations across all collections.
| Metric | Type | Labels | Description |
|---|---|---|---|
| rebuild_running | Gauge | collection | 1 if at least one rebuild is in progress, 0 otherwise. Reset to 0 when the last active rebuild finishes. |
| rebuild_triggered_total | Counter | collection | Cumulative count of rebuild tasks submitted. Incremented each time a rebuild request is accepted. |
| rebuild_success_total | Counter | collection | Cumulative count of rebuilds that finished successfully. |
| rebuild_failed_total | Counter | collection | Cumulative count of rebuilds that failed or were cancelled. |
| rebuild_duration_seconds | Histogram | collection | End-to-end duration of each rebuild, measured from start to finish and recorded in predefined buckets. |
| rebuild_vectors_processed_total | Counter | collection | Total vectors processed (read or written) across all rebuilds. |
| rebuild_vectors_skipped_total | Counter | collection | Total vectors skipped during rebuilds because they were already updated. |
| rebuild_vectors_deleted_total | Counter | collection | Total vectors deleted across all rebuilds. |
| rebuild_phase_duration_seconds | Histogram | collection, phase | Duration of individual rebuild phases (for example, initialize, populate, catchup, finalize). |
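The three rebuild counters relate in a simple way. Assuming every accepted rebuild eventually ends as either a success or a failure (cancellations count as failures, per the table) and that all three values come from the same scrape, `triggered - (success + failed)` approximates the rebuilds still queued or running for that collection. The dataclass below is a hypothetical illustration of that arithmetic, not an API provided by VectorAI DB.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RebuildCounters:
    """Cumulative rebuild counters for one collection, read from one scrape."""

    triggered: float  # actian_vectorai_rebuild_triggered_total
    success: float    # actian_vectorai_rebuild_success_total
    failed: float     # actian_vectorai_rebuild_failed_total (includes cancelled)

    def finished(self) -> float:
        return self.success + self.failed

    def outstanding(self) -> float:
        # Assumes every accepted rebuild ends as success or failed and that
        # all three counters started together (same process lifetime).
        return self.triggered - self.finished()

    def success_ratio(self) -> Optional[float]:
        """Share of finished rebuilds that succeeded, or None if none finished."""
        done = self.finished()
        return self.success / done if done > 0 else None
```

Note that `success_ratio` divides by finished rebuilds, whereas the Rebuild success rate query in the PromQL examples divides by triggered rebuilds; the two differ while rebuilds are in flight.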

Snapshot metrics

These metrics track snapshot creation and recovery operations.
| Metric | Type | Labels | Description |
|---|---|---|---|
| snapshot_creation_running | Gauge | collection | 1 while SaveSnapshot is executing for the collection, 0 if idle. |
| snapshot_recovery_running | Gauge | collection | 1 while LoadSnapshot is executing for the collection, 0 if idle. |
| snapshot_created_total | Counter | collection | Cumulative count of successful snapshot saves. |

REST API

These metrics track HTTP request volume and latency across all REST endpoints.
| Metric | Type | Labels | Description |
|---|---|---|---|
| rest_responses_total | Counter | endpoint, method, status | Total number of REST responses by route, method, and status code. Incremented for every response the server sends. |
| rest_responses_fail_total | Counter | endpoint, method | Total REST responses that returned a 5xx status. |
| rest_responses_duration_seconds | Histogram | endpoint, method | REST request latency per route and method. |
Use actian_vectorai_rest_responses_total and actian_vectorai_rest_responses_fail_total to track request rates and error ratios. Use the actian_vectorai_rest_responses_duration_seconds_bucket series with histogram_quantile() to compute percentile latencies (p50, p95, p99) per endpoint.
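histogram_quantile() never sees individual requests: it assumes observations are spread evenly within each bucket and interpolates linearly. The Python sketch below reproduces that estimate for a single series of cumulative `_bucket` counts, so the behavior is easy to reason about. It is a simplified illustration of the Prometheus algorithm for classic histograms, not a replacement for it, and `histogram_quantile` here is a hypothetical helper.

```python
import math
from typing import Dict


def histogram_quantile(q: float, buckets: Dict[float, float]) -> float:
    """Estimate the q-quantile from one series of cumulative histogram buckets.

    `buckets` maps each `le` upper bound (math.inf for "+Inf") to its
    cumulative count. The value is linearly interpolated inside the bucket
    that contains the target rank. Simplified: q must be in (0, 1] and the
    bucket counts are assumed to be monotonic.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    bounds = sorted(buckets.items())
    if len(bounds) < 2 or bounds[-1][0] != math.inf:
        return math.nan  # need at least one finite bucket plus +Inf
    total = bounds[-1][1]
    if total == 0:
        return math.nan  # no observations
    rank = q * total
    # Index of the first finite bucket whose cumulative count reaches the rank.
    b = next((i for i, (_, c) in enumerate(bounds[:-1]) if c >= rank), len(bounds) - 1)
    if b == len(bounds) - 1:
        return bounds[-2][0]  # rank falls in +Inf: return the highest finite bound
    if b == 0 and bounds[0][0] <= 0:
        return bounds[0][0]
    start, prev = bounds[b - 1] if b > 0 else (0.0, 0.0)
    end, count = bounds[b]
    return start + (end - start) * (rank - prev) / (count - prev)
```

Two consequences follow: precision is limited by the bucket boundaries, and when the rank falls in the +Inf bucket the estimate is capped at the highest finite boundary, so a p99 equal to a bucket bound may hide a slower tail.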

gRPC API

These metrics track gRPC call volume and latency.
| Metric | Type | Labels | Description |
|---|---|---|---|
| grpc_responses_total | Counter | endpoint, status | Total number of gRPC responses by fully qualified method and status. Incremented for every completed RPC. |
| grpc_responses_fail_total | Counter | endpoint | Total gRPC responses with an error status. |
| grpc_responses_duration_seconds | Histogram | endpoint | gRPC call latency per fully qualified method, measured from call start to final status. |
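The endpoint label on the gRPC metrics holds the fully qualified method. If it uses the standard gRPC path form `/<package>.<Service>/<Method>` (an assumption; check the raw /metrics output), a script can split it to group traffic by service. `split_grpc_endpoint` is a hypothetical helper, and the path in the example is made up.

```python
from typing import NamedTuple


class GrpcMethod(NamedTuple):
    package: str
    service: str
    method: str


def split_grpc_endpoint(endpoint: str) -> GrpcMethod:
    """Split a gRPC path such as "/pkg.Service/Method" into its parts.

    Assumes the "/<package>.<Service>/<Method>" form; the leading slash and
    the package are optional.
    """
    path = endpoint.lstrip("/")
    full_service, sep, method = path.rpartition("/")
    if not sep or not full_service or not method:
        raise ValueError(f"not a gRPC method path: {endpoint!r}")
    package, _, service = full_service.rpartition(".")
    return GrpcMethod(package, service, method)
```

For example, `split_grpc_endpoint("/vectorai.v1.Points/Search")` returns `GrpcMethod(package='vectorai.v1', service='Points', method='Search')`.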

Combined API

These metrics track combined requests and latency for REST and gRPC.
| Metric | Type | Description |
|---|---|---|
| api_requests_total | Counter | Total number of API requests (REST and gRPC) received since the server last started. |
| api_responses_duration_seconds | Histogram | Request latency across REST and gRPC, using the same buckets as the per-API histograms. |

Process metrics

These metrics report on the health of the VectorAI DB process at the operating system level, including memory usage from the allocator.
| Metric | Type | Description |
|---|---|---|
| memory_resident_bytes | Gauge | Resident set size (RSS) of the process |
| process_threads | Gauge | Number of live threads |
| process_open_fds | Gauge | Open file descriptor (handle) count |
| process_open_mmaps | Gauge | Open memory-mapped regions |
| process_cpu_cores | Gauge | Logical CPU core count observed by the process |
| process_cpu_frequency_hz | Gauge | Observed CPU frequency in hertz (from /proc/cpuinfo on Linux or the Windows registry) |
| process_minor_page_faults_total | Counter | Minor page faults since start (Linux only) |
| process_major_page_faults_total | Counter | Major page faults since start (Linux only) |
| process_cpu_seconds_total | Counter | Total CPU time consumed (user + kernel) |
| process_uptime_seconds | Gauge | Process uptime in seconds (time since telemetry initialization) |
| process_memory_usage_bytes | Gauge | Total memory currently used by the process (working set or private bytes) |
| process_memory_total_bytes | Gauge | Total physical memory of the machine |
| process_memory_free_bytes | Gauge | Physical memory currently available on the host |
| process_disk_usage_bytes | Gauge | Disk space consumed in the process data path |
| process_disk_size_bytes | Gauge | Total disk capacity reported by std::filesystem::space() for the configured VDE data path |
The metric actian_vectorai_process_memory_free_bytes is sourced directly from the operating system (GlobalMemoryStatusEx on Windows, sysinfo on Linux). It reflects machine-wide available RAM, independent of the process's own usage metrics.
actian_vectorai_process_major_page_faults_total is a counter, so watch its rate rather than its value: a sustained high rate indicates that the system is running low on physical memory and paging to disk, which severely degrades search performance. Consider increasing available memory or reducing the number of loaded collections.
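Because a counter such as process_major_page_faults_total only ever grows, the signal is how fast it grows. The sketch below shows the arithmetic behind PromQL's rate() on raw `(timestamp, value)` samples, including counter-reset handling. It is a simplified illustration (no extrapolation to the window boundaries), and the helper names are hypothetical.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, counter value)


def counter_increase(samples: List[Sample]) -> float:
    """Total increase of a counter across consecutive samples.

    A drop in value is treated as a counter reset (for example, a process
    restart): the counter restarted at zero, so the new value is all increase.
    Unlike PromQL's increase(), no extrapolation to window edges is done.
    """
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        total += cur - prev if cur >= prev else cur
    return total


def counter_rate(samples: List[Sample]) -> Optional[float]:
    """Average per-second increase over the sampled span, or None if undefined."""
    if len(samples) < 2:
        return None
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        return None
    return counter_increase(samples) / elapsed
```

Feeding it successive scrapes of process_major_page_faults_total gives major page faults per second, the same unit as the rate(...[5m]) > 10 condition in the alerting table.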

Example PromQL queries

The following Prometheus Query Language examples demonstrate common monitoring patterns that you can use in Grafana or any Prometheus-compatible dashboard tool.

REST request rate by endpoint

sum by (endpoint) (rate(actian_vectorai_rest_responses_total[5m]))

REST error ratio

sum(rate(actian_vectorai_rest_responses_fail_total[5m]))
/
sum(rate(actian_vectorai_rest_responses_total[5m]))

REST p95 latency per endpoint

histogram_quantile(0.95, sum by (le, endpoint) (rate(actian_vectorai_rest_responses_duration_seconds_bucket[5m])))

gRPC request rate by method

sum by (endpoint) (rate(actian_vectorai_grpc_responses_total[5m]))

gRPC error ratio

sum(rate(actian_vectorai_grpc_responses_fail_total[5m]))
/
sum(rate(actian_vectorai_grpc_responses_total[5m]))

Memory usage

actian_vectorai_memory_resident_bytes

Total vectors across all collections

actian_vectorai_collections_vector_total

Points per collection

actian_vectorai_collection_points

Active rebuilds

actian_vectorai_rebuild_running

Rebuild success rate

sum(rate(actian_vectorai_rebuild_success_total[1h]))
/
sum(rate(actian_vectorai_rebuild_triggered_total[1h]))
Suggested alerts

The following table lists suggested Prometheus alerting rules for production deployments.
| Alert | Condition | Severity | Description |
|---|---|---|---|
| High REST error rate | sum(rate(actian_vectorai_rest_responses_fail_total[5m])) / sum(rate(actian_vectorai_rest_responses_total[5m])) > 0.05 | Warning | More than 5% of REST requests failing |
| High REST p95 latency | histogram_quantile(0.95, sum by (le) (rate(actian_vectorai_rest_responses_duration_seconds_bucket[5m]))) > 2 | Warning | REST p95 latency exceeds 2 seconds |
| High gRPC error rate | sum(rate(actian_vectorai_grpc_responses_fail_total[5m])) / sum(rate(actian_vectorai_grpc_responses_total[5m])) > 0.05 | Warning | More than 5% of gRPC calls failing |
| Recovery mode active | actian_vectorai_app_status_recovery_mode == 1 | Critical | Engine is in recovery mode |
| High memory usage | actian_vectorai_memory_resident_bytes > 0.8 * <memory_limit> | Warning | RSS exceeds 80% of available memory |
| Major page faults rising | rate(actian_vectorai_process_major_page_faults_total[5m]) > 10 | Warning | More than 10 major page faults per second, indicating memory pressure |
| File descriptor exhaustion | actian_vectorai_process_open_fds > 0.8 * <fd_limit> | Warning | Open file descriptors approaching the system limit |
| Rebuild failures | rate(actian_vectorai_rebuild_failed_total[1h]) > 0 | Warning | One or more index rebuilds failed in the last hour |
Replace <memory_limit> and <fd_limit> with the actual limits for your deployment environment.
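The placeholders can also be filled in from known limits. The sketch below renders the two conditions with an 80% threshold; the function names are hypothetical. `current_soft_fd_limit` uses Python's Unix-only `resource` module, which reports the limit of the Python process itself, so it is only meaningful when run under the same limits as VectorAI DB (for example, inside the same container); otherwise pass the limit explicitly.

```python
import resource  # Unix-only standard-library module
from typing import Optional


def fd_alert_expr(fd_limit: int, ratio: float = 0.8) -> str:
    """PromQL condition for the file-descriptor exhaustion alert."""
    return f"actian_vectorai_process_open_fds > {int(fd_limit * ratio)}"


def memory_alert_expr(memory_limit_bytes: int, ratio: float = 0.8) -> str:
    """PromQL condition for the high-memory alert (RSS against a byte limit)."""
    return f"actian_vectorai_memory_resident_bytes > {int(memory_limit_bytes * ratio)}"


def current_soft_fd_limit() -> Optional[int]:
    """Soft RLIMIT_NOFILE of this Python process, or None if unlimited."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return None if soft == resource.RLIM_INFINITY else soft
```

For a deployment limited to 8 GiB, `memory_alert_expr(8 * 1024**3)` produces `actian_vectorai_memory_resident_bytes > 6871947673`.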

Example alerting rule

The following Prometheus alerting rule fires when the REST error ratio exceeds 5% for more than 5 minutes:
groups:
  - name: vectorai
    rules:
      - alert: VectorAIHighErrorRate
        expr: >
          sum(rate(actian_vectorai_rest_responses_fail_total[5m]))
          /
          sum(rate(actian_vectorai_rest_responses_total[5m]))
          > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "VectorAI DB error rate above 5%"
          description: "{{ $value | humanizePercentage }} of requests are returning errors."

Logging

VectorAI DB writes structured logs to stdout. Configure the log format and level to suit your log aggregation pipeline.

Log format

Set the log format to json for machine-readable output compatible with log aggregation tools such as Elasticsearch, Loki, or Datadog:
logging:
  format: json
The default format is text, which is human-readable but harder to parse programmatically.
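With json format enabled, each log line can be processed by standard tooling. The sketch below filters records by severity. It assumes one JSON object per line and a severity field named `level`; that field name is a guess (the log schema is not documented on this page), so it is exposed as the `level_key` parameter. `filter_json_logs` is a hypothetical name.

```python
import json
from typing import Dict, Iterable, Iterator, Set


def filter_json_logs(
    lines: Iterable[str], levels: Set[str], level_key: str = "level"
) -> Iterator[Dict]:
    """Yield parsed log records whose severity (lowercased) is in `levels`.

    ASSUMPTION: one JSON object per line, severity stored under `level_key`.
    Lines that are not valid JSON objects are skipped.
    """
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(record, dict) and str(record.get(level_key, "")).lower() in levels:
            yield record
```

For example, a script that reads container output from standard input could call `filter_json_logs(sys.stdin, {"error", "warn"})` to keep only errors and warnings.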

Log level

Control log verbosity with the level setting:
logging:
  level: info
| Level | Use case |
|---|---|
| error | Production: errors only |
| warn | Production: errors and warnings |
| info | Production default: normal operational messages |
| debug | Troubleshooting: verbose output |
| trace | Development only: extremely verbose |
Running at debug or trace level in production generates significant log volume and may impact performance. Use these levels only for short-term troubleshooting.
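The level table implies the usual threshold semantics: a configured level emits messages at that level and at every more severe level (for example, warn emits errors and warnings). The sketch below encodes that ordering; it illustrates the assumed semantics and is not code from VectorAI DB.

```python
# Levels ordered from most severe (least verbose) to least severe (most verbose),
# following the level table.
LEVELS = ("error", "warn", "info", "debug", "trace")


def is_emitted(message_level: str, configured_level: str) -> bool:
    """True if a message at message_level is written when logging.level is configured_level."""
    return LEVELS.index(message_level) <= LEVELS.index(configured_level)
```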

Next steps

Explore these related guides to learn more.

Troubleshooting

Diagnose connection, performance, and startup issues.

Error handling

Handle specific gRPC error codes in your application code.

Docker installation

Container setup, volume mounts, and Docker Compose configuration.

License and upgrade

Manage license keys and upgrade your VectorAI DB deployment.