Engineering
December 19, 2024

Vector Usage in Production Environments

This post explores Vector's deployment topologies (distributed, centralized, and stream-based), its API for health checks and introspection, and its self-monitoring, alerting, and debugging features. Together, these make Vector a reliable foundation for production data pipelines.

What is Vector

In modern application architectures, data collection, processing, and transmission play a vital role in keeping systems efficient. As data volumes grow rapidly, scalable, efficient, and flexible pipelines have become increasingly important.

Vector is an open-source, high-performance tool for collecting, transforming, and routing observability data such as logs and metrics, built for reliability in production environments. This article explores how to use Vector in production to build efficient data pipelines.

Deployment Models

Distributed Topology

In a distributed topology, a Vector agent runs on each client node and sends data directly to downstream services. Each node acts as its own collection and forwarding point, processing local data sources (e.g., application logs or system metrics). With no intermediate layer, data travels straight from the client to the target system.

Figure 1: Vector as a Distributed Agent

Advantages

  • Simplicity: Easy to implement and understand.

  • Elasticity: Scales resources with application growth.

Disadvantages

  • Lower efficiency: Complex pipelines run on the application hosts themselves and can take CPU and memory away from other applications.

  • Limited durability: Cached data may be lost in the event of an unrecoverable crash.

  • Downstream pressure: Increases load on downstream services due to smaller, frequent requests.

  • Lack of cross-host context: Each agent sees only its own host's data, so it cannot perform operations that span hosts.
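A minimal agent configuration for this topology might look like the sketch below, written in plain Vector YAML (the same keys go under customConfig when using the Helm chart). The http sink, the endpoint https://logs.example.com/ingest, and the batch sizes are illustrative assumptions rather than recommendations. The batch settings show one way to ease the downstream pressure listed above: each agent sends fewer, larger requests.

```yaml
# Agent config, one instance per client node (plain Vector syntax).
data_dir: /vector-data-dir

sources:
  kubernetes_logs:
    type: kubernetes_logs

sinks:
  downstream:
    type: http
    inputs:
      - kubernetes_logs
    # Hypothetical downstream endpoint; replace with your own service.
    uri: https://logs.example.com/ingest
    encoding:
      codec: json
    # Flush when 1000 events accumulate or 5 seconds pass, whichever comes first.
    batch:
      max_events: 1000
      timeout_secs: 5
```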

Centralized Topology

This topology balances simplicity, stability, and control. Data is collected by Vector agents on client nodes and then sent to centralized Vector aggregators for processing. This architecture improves overall efficiency and reliability through centralized data management.

Figure 2: Vector as a Central Service

Advantages

  • Higher efficiency: Aggregators buffer data, optimize requests, and reduce load on client nodes and downstream services.

  • Improved reliability: Buffering and flush strategies on the aggregators absorb traffic spikes before they reach downstream services.

  • Cross-host context: Enables operations across hosts, suitable for large-scale deployments.

Disadvantages

  • Increased complexity: Requires managing both Vector agents and aggregators.

  • Durability limitations: If an aggregator fails, the data buffered on it can be lost.
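A sketch of the aggregator side, assuming agents reach it through a hypothetical vector-aggregator Service on port 6000 and forward data with Vector's vector sink (shown as a comment). The disk buffer only partly addresses the durability limitation above: it keeps queued events across aggregator restarts when data_dir sits on a persistent volume, but not if that volume is lost. The buffer size is set to what we understand to be Vector's minimum for disk buffers (about 256 MiB); treat every value here as a placeholder.

```yaml
# Aggregator config (plain Vector syntax). Each agent swaps its direct
# sink for a `vector` sink that points here, for example:
#
#   sinks:
#     to_aggregator:
#       type: vector
#       inputs:
#         - kubernetes_logs
#       address: vector-aggregator:6000   # hypothetical Service name
data_dir: /vector-data-dir

sources:
  from_agents:
    type: vector
    address: 0.0.0.0:6000

sinks:
  downstream:
    type: http
    inputs:
      - from_agents
    uri: https://logs.example.com/ingest   # hypothetical endpoint
    encoding:
      codec: json
    # Aggregators see traffic from many agents, so they can send larger batches.
    batch:
      max_events: 10000
      timeout_secs: 10
    # Queue events on disk instead of in memory; block upstream when full.
    buffer:
      type: disk
      max_size: 268435488   # bytes
      when_full: block
```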

Stream-Based Topology

Designed for environments demanding high durability and resilience, this topology is ideal for large-scale data streams. Vector agents collect data and forward it to streaming systems like Kafka. Configuration files specify data sources, stream parameters, and downstream connections. Kafka then distributes the data to consumers.

Figure 3: Vector as a Stream Consumer

Advantages

  • Durability and reliability: Streaming services (e.g., Kafka) are built for high durability, replicating data across nodes.

  • Efficiency: Agents focus on routing without handling durability.

  • Replay capabilities: Data can be replayed based on stream retention periods.

Disadvantages

  • Operational overhead: Requires experienced teams to manage streaming systems.

  • Higher complexity: Demands in-depth knowledge of production-grade stream management.

  • Increased costs: Resources and management of streaming clusters can raise operational expenses.
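A sketch of a Vector instance consuming from Kafka, assuming hypothetical brokers kafka-0.kafka:9092 and kafka-1.kafka:9092 and a topic named vector-logs; the producing agents' kafka sink appears as a comment. Consumers that share a group_id split the topic's partitions between them. With auto_offset_reset set to earliest, a consumer group that has no committed offsets starts from the oldest retained message, which is one way to use the replay capability described above.

```yaml
# Consumer config (plain Vector syntax). On each client node, the agent
# publishes to the same topic, for example:
#
#   sinks:
#     to_kafka:
#       type: kafka
#       inputs:
#         - kubernetes_logs
#       bootstrap_servers: kafka-0.kafka:9092,kafka-1.kafka:9092
#       topic: vector-logs
#       encoding:
#         codec: json
data_dir: /vector-data-dir

sources:
  from_kafka:
    type: kafka
    bootstrap_servers: kafka-0.kafka:9092,kafka-1.kafka:9092   # hypothetical brokers
    group_id: vector-consumers   # consumers in one group share the partitions
    topics:
      - vector-logs              # hypothetical topic
    # Only applies when the group has no committed offset yet.
    auto_offset_reset: earliest
    decoding:
      codec: json

sinks:
  downstream:
    type: http
    inputs:
      - from_kafka
    uri: https://logs.example.com/ingest   # hypothetical endpoint
    encoding:
      codec: json
```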

Vector API Modules

Vector's API lets external systems interact with a running instance. You can use it to check the instance's health, retrieve status information and metadata such as the version, and query the components that make up its configuration.

Health Check API

To confirm that an instance is available, enable Vector's API and point Kubernetes liveness and readiness probes at its /health endpoint. The following Helm values enable the API and both probes:

```yaml
role: "Agent"

tolerations:
  - operator: Exists

livenessProbe:
  httpGet:
    path: /health
    port: api

readinessProbe:
  httpGet:
    path: /health
    port: api

customConfig:
  data_dir: /vector-data-dir
  api:
    enabled: true
    address: 0.0.0.0:8686
    playground: true

  sources:
    kubernetes_logs:
      type: kubernetes_logs

  sinks:
    stdout:
      type: console
      inputs:
        - kubernetes_logs
      encoding:
        codec: json
```

To check Vector’s health status, use:

```bash
curl 127.0.0.1:8686/health
{"ok":true}
```

GraphQL API

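The GraphQL API lets you query a running instance's components, metadata, and metrics through the /graphql endpoint.

Note: The GraphQL endpoint requires the API to be enabled (api.enabled: true). The playground: true setting additionally serves the interactive Playground UI described below.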

  • List the configured sources and sinks:

```bash
curl -X POST http://127.0.0.1:8686/graphql \
  -H "Content-Type: application/json" \
  -d '{"query": "query { sources { edges { node { componentId componentType } } } sinks { edges { node { componentId componentType } } } }"}'
```

```json
{"data":{"sources":{"edges":[{"node":{"componentId":"kubernetes_logs","componentType":"kubernetes_logs"}}]},"sinks":{"edges":[{"node":{"componentId":"stdout","componentType":"console"}}]}}}
```
  • Check Vector's version and hostname:

```bash
curl -X POST http://127.0.0.1:8686/graphql \
  -H "Content-Type: application/json" \
  -d '{"query": "query { meta { versionString hostname } }"}'
```

```json
{"data":{"meta":{"versionString":"0.42.0 (aarch64-unknown-linux-gnu 3d16e34 2024-10-21 14:10:14.375255220)","hostname":"vector-2p6ts"}}}
```

Playground API

With playground: true set, Vector serves an interactive GraphQL Playground for writing and running queries in the browser at: http://localhost:8686/playground

Figure 4: Querying Vector Information in the Playground

Vector Self-Monitoring

Metrics Collection

The internal_metrics source collects Vector's own metrics, which provide insight into its performance and health. The following configuration exposes them in Prometheus format through a prometheus_exporter sink on port 9598:

```yaml
role: "Agent"

tolerations:
  - operator: Exists

service:
  ports:
    - name: prom-exporter
      port: 9598

containerPorts:
  - name: prom-exporter
    containerPort: 9598
    protocol: TCP

customConfig:
  data_dir: /vector-data-dir
  sources:
    vector_metrics:
      type: internal_metrics
      scrape_interval_secs: 10

  sinks:
    prom-exporter:
      type: prometheus_exporter
      inputs:
        - vector_metrics
      address: 0.0.0.0:9598
```

```bash
curl 127.0.0.1:9598/metrics
```

```text
# HELP vector_buffer_byte_size buffer_byte_size
# TYPE vector_buffer_byte_size gauge
vector_buffer_byte_size{buffer_type="memory",component_id="prom-exporter",component_kind="sink",component_type="prometheus_exporter",host="vector-xbw7f",stage="0"} 0 1731949489084
# HELP vector_buffer_events buffer_events
# TYPE vector_buffer_events gauge
vector_buffer_events{buffer_type="memory",component_id="prom-exporter",component_kind="sink",component_type="prometheus_exporter",host="vector-xbw7f",stage="0"} 0 1731949489084
# HELP vector_buffer_max_event_size buffer_max_event_size
# TYPE vector_buffer_max_event_size gauge
vector_buffer_max_event_size{buffer_type="memory",component_id="prom-exporter",component_kind="sink",component_type="prometheus_exporter",host="vector-xbw7f",stage="0"} 500 1731949489084
# HELP vector_buffer_received_bytes_total buffer_received_bytes_total
# TYPE vector_buffer_received_bytes_total counter
vector_buffer_received_bytes_total{buffer_type="memory",component_id="prom-exporter",component_kind="sink",component_type="prometheus_exporter",host="vector-xbw7f",stage="0"} 73519 1731949489084
...
```

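The following table lists a subset of Vector's internal metrics. Names are shown without the vector_ prefix that appears in the exporter output above.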

| Metric | Type | Description |
|---|---|---|
| adaptive_concurrency_averaged_rtt | Histogram | Average round-trip time (RTT) for the current window. |
| adaptive_concurrency_in_flight | Histogram | Number of outbound requests awaiting a response. |
| adaptive_concurrency_limit | Histogram | Concurrency limit chosen by the adaptive concurrency feature for the current window. |
| adaptive_concurrency_observed_rtt | Histogram | Observed round-trip time (RTT) of requests. |
| aggregate_events_recorded_total | Counter | Total number of events recorded by the aggregate transform. |
| aggregate_failed_updates | Counter | Number of failed metric updates and additions during aggregation. |
| aggregate_flushes_total | Counter | Total number of flushes completed by the aggregate transform. |
| api_started_total | Counter | Total number of times the Vector GraphQL API was started. |
| buffer_byte_size | Gauge | Current number of bytes in the buffer. |
| buffer_discarded_events_total | Counter | Total number of events discarded by a non-blocking buffer. |
| buffer_events | Gauge | Current number of events in the buffer. |
| buffer_received_event_bytes_total | Counter | Total number of event bytes received by the buffer. |
| buffer_received_events_total | Counter | Total number of events received by the buffer. |
| buffer_send_duration_seconds | Histogram | Time taken to send payloads to the buffer. |
| buffer_sent_event_bytes_total | Counter | Total number of event bytes sent from the buffer. |
| buffer_sent_events_total | Counter | Total number of events sent from the buffer. |
| build_info | Gauge | Build information, such as the Vector version. |
| checkpoints_total | Counter | Total number of files checkpointed. |
| checksum_errors_total | Counter | Total number of errors identifying files via checksum. |
| collect_completed_total | Counter | Total number of completed metric collections for the component. |
| collect_duration_seconds | Histogram | Time spent collecting metrics for the component. |
| command_executed_total | Counter | Total number of commands executed. |
| command_execution_duration_seconds | Histogram | Duration of command executions, in seconds. |
| component_discarded_events_total | Counter | Total number of events discarded by the component. |
| component_errors_total | Counter | Total number of errors encountered by the component. |
| component_received_bytes | Histogram | Byte size of each event received by the source. |
| component_received_bytes_total | Counter | Total raw bytes received by the component from its source origins. |
| component_received_event_bytes_total | Counter | Total event bytes received by the component, either from tagged origins (e.g., files, URIs) or cumulatively from other origins. |
| component_received_events_count | Histogram | Number of events passed in each internal batch within Vector's topology (separate from sink-level batching). |
| component_received_events_total | Counter | Total number of events received by the component, either from tagged origins or cumulatively from other origins. |
| component_sent_bytes_total | Counter | Total raw bytes sent by the component to its destination. |
| component_sent_event_bytes_total | Counter | Total event bytes sent by the component. |
| component_sent_events_total | Counter | Total number of events sent by the component. |
| connection_established_total | Counter | Total number of established connections. |
| connection_read_errors_total | Counter | Total number of errors encountered while reading data packets. |
| connection_send_errors_total | Counter | Total number of errors encountered while sending data. |
| connection_shutdown_total | Counter | Total number of connection shutdowns. |
| container_processed_events_total | Counter | Total number of container events processed. |
| containers_unwatched_total | Counter | Total number of times Vector stopped watching container logs. |
| containers_watched_total | Counter | Total number of times Vector started watching container logs. |
| events_discarded_total | Counter | Total number of events discarded by the component. |
| files_added_total | Counter | Total number of files Vector has found to watch. |
| files_deleted_total | Counter | Total number of files deleted. |
| files_resumed_total | Counter | Total number of times Vector resumed watching a file. |
| files_unwatched_total | Counter | Total number of times Vector stopped watching a file. |
| grpc_server_handler_duration_seconds | Histogram | Time taken to handle gRPC requests. |
| grpc_server_messages_received_total | Counter | Total number of gRPC messages received. |
| grpc_server_messages_sent_total | Counter | Total number of gRPC messages sent. |
| http_client_requests_sent_total | Counter | Total number of HTTP requests sent, labeled by request method. |
| http_client_response_rtt_seconds | Histogram | Round-trip time (RTT) of HTTP requests, labeled by response status. |
| http_client_responses_total | Counter | Total number of HTTP client responses, labeled by response status. |
| http_client_rtt_seconds | Histogram | Round-trip time (RTT) of HTTP requests. |
| http_requests_total | Counter | Total number of HTTP requests made by the component. |
| http_server_handler_duration_seconds | Histogram | Time taken to process HTTP requests. |
| http_server_requests_received_total | Counter | Total number of HTTP requests received by the server. |
| http_server_responses_sent_total | Counter | Total number of HTTP responses sent by the server. |
| internal_metrics_cardinality | Gauge | Total number of metrics emitted from the internal metrics registry. |
| invalid_record_total | Counter | Total number of invalid records discarded. |

(Chart 1: Metrics Reference)

Logs

The internal_logs source collects the logs that Vector itself emits, which helps you understand Vector's operational status and diagnose issues. The following configuration prints them to stdout as JSON:

```yaml
role: "Agent"
tolerations:
  - operator: Exists
service:
  ports:
    - name: prom-exporter
      port: 9598
containerPorts:
  - name: prom-exporter
    containerPort: 9598
    protocol: TCP
customConfig:
  data_dir: /vector-data-dir
  sources:
    vector_logs:
      type: internal_logs
  sinks:
    stdout:
      type: console
      inputs:
        - vector_logs
      encoding:
        codec: json
```
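If you only need Vector's warnings and errors, a filter transform can sit between the source and the sink. This Helm values fragment is a sketch: it assumes internal_logs events carry their level in .metadata.level with uppercase values such as WARN and ERROR, so inspect a few events from the configuration above to confirm the field before relying on it. The transform name vector_warnings is hypothetical.

```yaml
customConfig:
  data_dir: /vector-data-dir
  sources:
    vector_logs:
      type: internal_logs
  transforms:
    vector_warnings:
      type: filter
      inputs:
        - vector_logs
      # VRL condition; assumes the level lives in .metadata.level (e.g. "WARN").
      condition: '.metadata.level == "WARN" || .metadata.level == "ERROR"'
  sinks:
    stdout:
      type: console
      inputs:
        - vector_warnings
      encoding:
        codec: json
```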

Alerts

Once Prometheus scrapes the metrics exposed through internal_metrics, you can define alerting rules on them. The examples below use the Prometheus Operator's PrometheusRule resource.
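With the Prometheus Operator, a PodMonitor is one way to have Prometheus scrape the prom-exporter port configured in the metrics example. This sketch assumes the Vector pods carry the label app.kubernetes.io/name: vector (check the labels your Helm release actually applies) and that your Prometheus instance's podMonitorSelector picks up this resource.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: vector
spec:
  selector:
    matchLabels:
      # Assumed pod label; match whatever labels your Vector pods carry.
      app.kubernetes.io/name: vector
  podMetricsEndpoints:
    # Named container port declared under containerPorts in the Helm values.
    - port: prom-exporter
      path: /metrics
      # Scrape often enough that the rate() windows in your rules cover several samples.
      interval: 30s
```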

Interruptions

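If a sink stops sending data (no events sent during a two-minute window, sustained for one minute), an alert is triggered:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: vector-sink-down
spec:
  groups:
    - name: vector
      rules:
        - alert: "VectorSinkDown"
          annotations:
            summary: "Vector sink down"
            description: "Vector sink {{ $labels.component_id }} on {{ $labels.host }} has stopped sending events."
          # The rate() window must span at least two scrape intervals, otherwise it returns
          # no data and the alert can never fire. Add component_id="<sink id>" to watch one sink.
          expr: |
            rate(vector_buffer_sent_events_total{component_kind="sink"}[2m]) == 0
          for: 1m
          labels:
            severity: critical
```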

Latency

If a component's 95th-percentile HTTP client response latency exceeds 0.5 seconds for more than one minute, an alert is triggered:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: vector-high-latency
spec:
  groups:
    - name: vector
      rules:
        - alert: "VectorHighLatency"
          annotations:
            summary: "High latency in Vector"
            description: "The 95th percentile HTTP client response latency for {{ $labels.component_id }} is above 0.5 seconds."
          # Sum the bucket rates per component (keeping le) before computing the quantile.
          expr: |
            histogram_quantile(0.95, sum by (le, component_id) (rate(vector_http_client_response_rtt_seconds_bucket[5m]))) > 0.5
          for: 1m
          labels:
            severity: warning
```

Error Rate

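If more than 5% of a component's HTTP client responses over the last five minutes are 5xx errors, and this persists for more than two minutes, an alert is triggered:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: vector-high-error
spec:
  groups:
    - name: vector
      rules:
        - alert: "VectorHighErrorRate"
          annotations:
            summary: "High error rate in Vector"
            description: "More than 5% of HTTP client responses for {{ $labels.component_id }} were 5xx over the last 5 minutes."
          # Sum both sides per component. Without aggregation, each 5xx series would be
          # divided by itself (the status labels match one-to-one) and the ratio would always be 1.
          expr: |
            sum by (component_id) (rate(vector_http_client_responses_total{status=~"5.."}[5m]))
              /
            sum by (component_id) (rate(vector_http_client_responses_total[5m]))
              > 0.05
          for: 2m
          labels:
            severity: warning
```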
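Dropped data is another signal worth alerting on. The following rule is a sketch built on component_discarded_events_total from the metrics table. It assumes the metric carries an intentional label that separates deliberate drops (for example, by a filter transform) from drops caused by errors; if your Vector version lacks that label, the selector matches nothing and the rule stays silent, so verify it against your /metrics output first.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: vector-discarded-events
spec:
  groups:
    - name: vector
      rules:
        - alert: "VectorDiscardingEvents"
          annotations:
            summary: "Vector is discarding events"
            description: "Component {{ $labels.component_id }} on {{ $labels.host }} is discarding events unintentionally."
          # Assumes the intentional label exists; intentional="false" keeps only
          # drops caused by errors, not deliberate filtering.
          expr: |
            rate(vector_component_discarded_events_total{intentional="false"}[5m]) > 0
          for: 5m
          labels:
            severity: warning
```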

Vector Debugging

Log Level

The default log level in Vector is info, but it supports the following levels:

  • trace

  • debug

  • info

  • warn

  • error

To facilitate debugging or capture more detailed information, you can change the log level to debug and set the log format to json for easier processing and viewing. Here’s how to configure it:

```yaml
role: "Agent"

tolerations:
  - operator: Exists

logLevel: "debug"

env:
  - name: VECTOR_LOG_FORMAT
    value: "json"
```

VRL Syntax

Vector Remap Language (VRL) is the language used in Vector for data transformation and processing. Its goal is to simplify the handling of data flows, allowing for more flexible and intuitive manipulation of data pipelines.

VRL provides powerful data manipulation capabilities, including field transformations, conditional logic, iteration over objects and arrays (through functions such as for_each and map_values), and data mapping. It is widely used for log processing, metric transformation, and event filtering.

To check that your VRL code is valid, you can test it in the VRL Playground (https://playground.vrl.dev).

Figure 5: Verifying VRL Syntax
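As an illustration, a remap transform embeds a VRL program in its source field. This sketch assumes events whose message field may hold a JSON string (for example, from the kubernetes_logs source used earlier) and an optional level field; the transform name parse_app_logs is hypothetical. Pasting the program into the VRL Playground with a sample event is a quick way to check it before deploying.

```yaml
sources:
  kubernetes_logs:
    type: kubernetes_logs

transforms:
  parse_app_logs:   # hypothetical name
    type: remap
    inputs:
      - kubernetes_logs
    source: |
      # parse_json is fallible, so capture the error instead of aborting.
      parsed, err = parse_json(.message)
      if err == null && is_object(parsed) {
        # Lift the parsed JSON fields to the top level of the event.
        . = merge(., object!(parsed))
      }
      # Normalize the level, defaulting to "info" when it is missing or not a string.
      .level = downcase(string(.level) ?? "info")

sinks:
  stdout:
    type: console
    inputs:
      - parse_app_logs
    encoding:
      codec: json
```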

Vector: a robust tool for varied data processing needs

Vector provides versatile deployment models and powerful APIs for efficient and reliable data pipelines in modern production environments. Its flexibility across distributed, centralized, and streaming topologies makes it a robust tool for varied data processing needs.

About Greptime

Greptime offers industry-leading time series database products and solutions to empower IoT and Observability scenarios, enabling enterprises to uncover valuable insights from their data with less time, complexity, and cost.

GreptimeDB is an open-source, high-performance time-series database offering unified storage and analysis for metrics, logs, and events. Try it out instantly with GreptimeCloud, a fully-managed DBaaS solution—no deployment needed!

The Edge-Cloud Integrated Solution combines multimodal edge databases with cloud-based GreptimeDB to optimize IoT edge scenarios, cutting costs while boosting data performance.

Star us on GitHub or join GreptimeDB Community on Slack to get connected.
