Engineering
September 3, 2024

How to Build a TSDB Running Inside a Car

The Greptime Vehicle-Cloud Integration Solution disrupts the traditional vehicle-cloud collaboration model by adopting a more cost-effective and efficient solution to meet the current market demand. This article describes the details of the solution.

Why do you need a TSDB inside a car?

As automotive intelligence progresses, vehicles generate increasing amounts of time-series data from various sources. This leads to high costs in data collection, transmission, and storage. GreptimeDB's Integrated Vehicle-Cloud Solution addresses these issues by leveraging the advanced computational capabilities of modern on-vehicle devices. Unlike traditional vehicle-cloud coordination where vehicles are mere data collectors, this new approach treats them as full-fledged servers capable of running complex tasks locally. The evolution from 32-bit MCUs to powerful chip modules like Qualcomm’s 8155 or 8295 has enabled intelligent vehicles to perform edge computing efficiently, reducing transmission costs and improving overall efficiency.

The challenges and how GreptimeDB solved them

GreptimeDB is a cloud-native time-series database built on a highly scalable foundation. However, we did not initially anticipate it running on edge devices such as vehicles, which has presented significant challenges.

  • The first challenge is resource usage constraints. GreptimeDB runs in the vehicle's cockpit domain controller and must minimize CPU and memory usage to avoid interfering with infotainment systems.

  • The second concern is robustness; GreptimeDB collects critical diagnostic metrics from the CAN bus, so any crashes could result in data loss.

  • Lastly, unlike servers in datacenters, vehicle-based GreptimeDB operates under various conditions—frequent power cycles, fluctuating ADAS data rates due to changing road traffic, etc.—and needs to adapt while remaining stable and efficient.

In the following sections, we will detail the challenges we encountered and how we addressed them.

CPU usage

Like other databases running in data centers, GreptimeDB tends to use as much CPU as it needs; this is also a key metric for assessing the scalability of a database. However, things are a bit different in cars. The Human-Machine Interface (HMI) in a car not only performs essential data collection tasks but also provides infotainment functions to passengers. Therefore, we need to limit the CPU resources consumed by the on-vehicle database to avoid impacting other processes.

Since #1694, GreptimeDB offers a convenient tool for recording and analyzing CPU usage. During continuous data ingestion, it captures the CPU cycles spent on various tasks and generates a flamegraph illustrating the CPU usage of each component, as shown below.

Database side (left) and SDK side (right) CPU flamegraph

Proprietary SDK based on shared memory IPC

The above graphs show that protocol encoding/decoding consumes about 30% of CPU cycles on the database side and around 36% on the SDK side. Optimizing protocol handling will significantly reduce average CPU usage. Currently, the open-source GreptimeDB SDK uses gRPC as its protocol. While gRPC is fast and user-friendly, its codec overhead becomes significant with large volumes of data ingestion, especially when dealing with nested message structures.

Data flow of gRPC SDK

Since the SDK and the GreptimeDB instance run on the same host, we chose shared memory (shm) to bypass the kernel network stack and used the Arrow IPC format for inter-process communication. Data rows submitted to the SDK are first buffered in Arrow arrays, then encoded into a memory region shared between the SDK and the database. The SDK then notifies the database to read and decode this data region. The picture below shows the data flow in the shared memory solution.

Data flow in shared memory IPC solution

To use the shared memory region efficiently, we designed a circular buffer. Encoded data must be written to a contiguous memory area, similar to the buffers used in Direct Memory Access (DMA). Therefore, the circular buffer must ensure that an encoded message is never split when the write offset reaches the end of the memory region.
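The contiguity rule can be illustrated with a minimal, single-process sketch. It is not GreptimeDB's implementation: `ContiguousRing` and its methods are hypothetical names, the buffer is an ordinary `Vec<u8>` rather than a shared memory mapping, and a real version would keep its offsets inside the shared region and synchronize the two processes. When a frame does not fit in the space left at the end, the sketch skips that space and restarts at offset 0, provided the reader has already freed enough room there.

```rust
use std::collections::VecDeque;

/// Hypothetical ring buffer that only hands out contiguous regions.
pub struct ContiguousRing {
    buf: Vec<u8>,
    /// Offset at which the next frame would be appended.
    tail: usize,
    /// (offset, len) of every unread frame, oldest first.
    frames: VecDeque<(usize, usize)>,
}

impl ContiguousRing {
    pub fn new(capacity: usize) -> Self {
        Self { buf: vec![0; capacity], tail: 0, frames: VecDeque::new() }
    }

    /// Copies `data` into one contiguous region and returns its offset,
    /// or `None` if no contiguous region is currently large enough.
    pub fn write(&mut self, data: &[u8]) -> Option<usize> {
        let (len, cap) = (data.len(), self.buf.len());
        if len == 0 || len > cap {
            return None;
        }
        let offset = match self.frames.front() {
            // Nothing unread: start again from the beginning.
            None => 0,
            // Not wrapped: free space is [tail, cap) and [0, head).
            Some(&(head, _)) if self.tail > head => {
                if self.tail + len <= cap {
                    self.tail
                } else if len <= head {
                    0 // skip the remainder at the end instead of splitting
                } else {
                    return None;
                }
            }
            // Wrapped: free space is only [tail, head).
            Some(&(head, _)) => {
                if self.tail + len <= head {
                    self.tail
                } else {
                    return None;
                }
            }
        };
        self.buf[offset..offset + len].copy_from_slice(data);
        self.frames.push_back((offset, len));
        self.tail = offset + len;
        Some(offset)
    }

    /// Returns the oldest unread frame and marks its space as reusable.
    pub fn read(&mut self) -> Option<&[u8]> {
        let (offset, len) = self.frames.pop_front()?;
        Some(&self.buf[offset..offset + len])
    }
}
```

With a 16-byte ring, two 6-byte frames land at offsets 0 and 6. A third 6-byte frame is rejected until the first is read, after which it is placed at offset 0 rather than being split across the last 4 bytes and the start of the buffer. The price is that the skipped bytes at the end stay unused for that lap.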

Smooth flush and compaction

Like other LSM tree-based databases, GreptimeDB schedules periodic flush and compaction tasks in addition to handling data writes. These background tasks can cause CPU spikes due to their encoding and compression processes. Therefore, it is crucial to limit these activities.

GreptimeDB allows different compression and encoding algorithms for various columns. Based on our field experience and benchmarks with diverse data patterns, we have selected specific algorithms for each sensor's data to balance compression rates and CPU usage.

Controlling the timing of background tasks is also important. If all tasks run simultaneously, they can cause CPU spikes that lead to noticeable lags in infotainment operations for passengers. In GreptimeDB, flush and compaction tasks run in a dedicated runtime. By properly configuring the number of threads and the yielding strategy within this runtime, we can maintain low CPU consumption while meeting performance requirements for data writes.

Currently, these limitations are soft constraints; during periods of extremely high write traffic, CPU spikes may still occur. To address this comprehensively, we have initiated a research program under the OSPP (Open Source Promotion Plan) activity to explore mechanisms similar to cgroups for imposing hard constraints on GreptimeDB's resource consumption. This aims to fully resolve such issues.
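The soft limits described above (a fixed number of background threads plus cooperative yielding) can be sketched with the standard library alone. This is an illustration under assumptions, not GreptimeDB's runtime: `BackgroundPool` and `encode_in_chunks` are hypothetical names, plain OS threads stand in for the dedicated runtime, and a simple checksum stands in for encoding and compression work.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

/// A unit of background work, e.g. one flush or compaction step.
type Job = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical fixed-size pool: at most `threads` jobs run at once.
pub struct BackgroundPool {
    sender: Option<mpsc::Sender<Job>>,
    workers: Vec<thread::JoinHandle<()>>,
}

impl BackgroundPool {
    pub fn new(threads: usize) -> Self {
        let (sender, receiver) = mpsc::channel::<Job>();
        let receiver = Arc::new(Mutex::new(receiver));
        let workers = (0..threads)
            .map(|_| {
                let receiver = Arc::clone(&receiver);
                thread::spawn(move || loop {
                    // The lock guard is dropped at the end of this statement,
                    // so other workers can take jobs while this one runs.
                    let next = receiver.lock().unwrap().recv();
                    match next {
                        Ok(job) => job(),
                        Err(_) => break, // queue closed: no more work
                    }
                })
            })
            .collect();
        Self { sender: Some(sender), workers }
    }

    /// Queues a job; it waits until one of the workers is free.
    pub fn submit<F>(&self, job: F)
    where
        F: FnOnce() + Send + 'static,
    {
        let sender = self.sender.as_ref().expect("pool is running");
        sender.send(Box::new(job)).expect("workers are alive");
    }

    /// Closes the queue and waits for all queued jobs to finish.
    pub fn shutdown(mut self) {
        drop(self.sender.take());
        for worker in self.workers.drain(..) {
            worker.join().expect("worker panicked");
        }
    }
}

/// Processes `data` in chunks of `chunk_size` bytes (must be non-zero) and
/// yields the thread after each chunk so other processes get scheduled.
pub fn encode_in_chunks(data: &[u8], chunk_size: usize) -> u64 {
    let mut checksum = 0u64;
    for chunk in data.chunks(chunk_size) {
        for &byte in chunk {
            checksum = checksum.wrapping_mul(31).wrapping_add(u64::from(byte));
        }
        thread::yield_now();
    }
    checksum
}
```

A pool created with `BackgroundPool::new(2)` never runs more than two jobs concurrently, however many are submitted. Like the configuration it illustrates, this is only a soft constraint: yielding gives the scheduler a chance to run other work but does not cap CPU time.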

In benchmarks on production-grade Qualcomm 8295 modules, vehicle-side GreptimeDB can ingest 600K data points per second using less than 8% of CPU and 300MB of memory, without data loss or out-of-order data.

CPU usage on Qualcomm 8295

Memory usage

Similar to CPU profiling, GreptimeDB offers a memory profiling tool. For more details, check out our blog. If you're familiar with LSM trees, it's easy to guess that the memtable consumes most of the memory, and the profiling results confirm this. Although GreptimeDB provides a global_write_buffer_size parameter to control memory usage, the default BTree memtable's overhead often causes memory usage to exceed the threshold. The image below from Nicole's blog illustrates the overhead of different collections in Rust.

The overhead of different collections in Rust

GreptimeDB is under rapid development, and its primary memory buffer component, the memtable, has gone through multiple iterations. In later iterations, we resolved the memory bloat issue by introducing the time-series memtable, which also delivers extremely high write performance under workloads with low data cardinality. The diagram below shows the structure of the time-series memtable.

The structure of time-series memtable
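To make the idea concrete, here is a minimal sketch of a memtable organized by series. It is an assumption-laden illustration rather than GreptimeDB's actual data structure: `SeriesMemtable` and `Series` are hypothetical names, the series key is a plain string, and each row has a single `f64` field. The ordered map holds one entry per series instead of one per row, so per-entry map overhead is paid once per series and rows are appended to plain vectors.

```rust
use std::collections::BTreeMap;

/// Rows of one series, stored as two plain vectors (columnar layout).
#[derive(Default)]
pub struct Series {
    timestamps: Vec<i64>,
    values: Vec<f64>,
}

/// Hypothetical memtable keyed by series rather than by individual row.
#[derive(Default)]
pub struct SeriesMemtable {
    series: BTreeMap<String, Series>,
}

impl SeriesMemtable {
    /// Appends one row; only the first row of a series allocates a key.
    pub fn write(&mut self, series_key: &str, timestamp: i64, value: f64) {
        match self.series.get_mut(series_key) {
            Some(series) => {
                series.timestamps.push(timestamp);
                series.values.push(value);
            }
            None => {
                let series = Series { timestamps: vec![timestamp], values: vec![value] };
                self.series.insert(series_key.to_owned(), series);
            }
        }
    }

    pub fn series_count(&self) -> usize {
        self.series.len()
    }

    pub fn row_count(&self) -> usize {
        self.series.values().map(|s| s.timestamps.len()).sum()
    }

    /// Returns the rows of one series ordered by timestamp; sorting is
    /// deferred to read time so that writes stay append-only.
    pub fn scan(&self, series_key: &str) -> Vec<(i64, f64)> {
        let mut rows: Vec<(i64, f64)> = match self.series.get(series_key) {
            Some(s) => s.timestamps.iter().copied().zip(s.values.iter().copied()).collect(),
            None => Vec::new(),
        };
        rows.sort_by_key(|&(timestamp, _)| timestamp);
        rows
    }
}
```

With few distinct series (low cardinality), almost every write takes the `Some` branch: one map lookup and two vector pushes, with no new map node.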

In addition to the memtable, the memory usage of other components must also be carefully examined—such as the logging component. On the Android platform, our log output is bridged to logcat, so the buffer for the default log appender can be disabled. This change can save us tens of megabytes of memory.

Flash wear out

Vehicle HMI systems typically use NAND Flash storage rather than solid-state disks (SSDs). NAND Flash has a limited write-erase lifespan, so it's crucial to minimize write amplification when running a time-series database on such storage. This is a known issue with databases based on LSM tree architecture.

To address this problem, the GreptimeDB instance running on vehicles supports configurable Write-Ahead Logging (WAL) strategies for different tables. For tables with high write volumes, WAL can be disabled to reduce write amplification. Conversely, for tables with lower data volumes but higher reliability requirements, WAL can remain enabled. Given that modern electric cars are typically always on, ensuring data consistency through WAL in the event of unexpected power loss is often unnecessary. This allows for more flexible and efficient use of NAND Flash storage. In case of manual power off, GreptimeDB also provides a hook to persist all pending data in flash.

Compaction reads and rewrites data, which adds to flash wear. Because the data volume on the vehicle side is relatively small and the files are soon uploaded to the cloud and merged by the ingester component, GreptimeDB allows compaction to be disabled on the vehicle side. This also helps reduce flash wear.

File merging in ingester component on cloud

In addition to the above measures, we explored the shared memory method based on anonymous memory. On Linux, this mechanism typically creates a file under /dev/shm, which both the SDK and database map in their own processes. Unfortunately, Android does not support /dev/shm. Initially, we created physical files on flash storage but soon discovered that this caused severe wear and tear on flash. Eventually, we found that Android provides ashmem, which does not require writing physical files.

Shared memory based on physical file (left) vs. ashmem (right)

During performance testing on real-world cars, we observed that no disk I/O occurs on the SDK side; all I/O is related to database flush tasks. Thanks to its high compression rate, GreptimeDB writes only 2MB of files per second.

IO monitor for SDK (left) and database (right)

Build a binary that actually works on vehicles

Modern intelligent vehicle HMIs often run on the Android platform, so our primary challenge was how to compile GreptimeDB's source code into a binary that can run on Android and integrate with the manufacturer's code base. In traditional C/C++ system programming, cross-compilation is often a complex task. Fortunately, thanks to Rust's rich ecosystem, we found cargo-ndk, a Cargo plugin that helps set up cross-compilation for the Android platform. With cargo-ndk, we can compile binaries that run on Android from Linux, macOS, and Windows platforms.

```bash
cargo ndk --platform 23 \
    -t aarch64-linux-android \
    build --bin greptime \
    --profile release \
    --no-default-features
```

Besides cross-compilation, running a database on Android also requires adapting some components, such as logging, to the Android system. The android-logger crate addresses the logging part: it seamlessly bridges the normal Rust logging macros such as debug!/info!/warn!/error! to logcat. GreptimeDB has integrated this tedious work into its GitHub workflow, so you can download Android binaries from our GitHub release page.

It is worth noting that our SDK is also written in Rust, while the code bases of car manufacturers usually use C/C++. Thus, we provide a foreign function interface (FFI) that wraps the Rust APIs and exposes them to C, as shown in the picture below:

Foreign function interface (FFI)

This architecture not only benefits from the ergonomic Rust ecosystem but also provides a stable, easy-to-integrate user interface. However, you should still be cautious with pointers and panics that cross the FFI boundary as mentioned in this blog post.
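The two cautions above, pointers and panics, can be shown in a small sketch of a C-facing wrapper. It does not reproduce the real SDK: `Writer`, the `writer_*` functions, and the status codes are hypothetical. Every pointer coming from C is checked before use, and the Rust call is wrapped in `std::panic::catch_unwind` so that a panic is turned into an error code instead of unwinding into C code.

```rust
use std::ffi::CStr;
use std::os::raw::{c_char, c_int};
use std::panic::{self, AssertUnwindSafe};

pub const STATUS_OK: c_int = 0;
pub const STATUS_BAD_ARG: c_int = -1;
pub const STATUS_PANIC: c_int = -2;

/// Hypothetical Rust-side API that the C layer wraps.
pub struct Writer {
    rows: Vec<(String, f64)>,
}

impl Writer {
    fn append(&mut self, name: &str, value: f64) {
        // Stand-in for any internal bug that panics.
        assert!(value.is_finite(), "non-finite value");
        self.rows.push((name.to_owned(), value));
    }
}

// Real exports would also carry `#[no_mangle]` (`#[unsafe(no_mangle)]` on the
// 2024 edition) so that C sees stable symbol names; omitted here for brevity.

/// Creates a writer; ownership passes to the caller until `writer_free`.
pub extern "C" fn writer_new() -> *mut Writer {
    Box::into_raw(Box::new(Writer { rows: Vec::new() }))
}

/// # Safety
/// `writer` must come from `writer_new`; `name` must be NUL-terminated.
pub unsafe extern "C" fn writer_append(
    writer: *mut Writer,
    name: *const c_char,
    value: f64,
) -> c_int {
    if writer.is_null() || name.is_null() {
        return STATUS_BAD_ARG;
    }
    // Catch panics here so they never cross the FFI boundary.
    let outcome = panic::catch_unwind(AssertUnwindSafe(|| {
        let writer = unsafe { &mut *writer };
        match unsafe { CStr::from_ptr(name) }.to_str() {
            Ok(name) => {
                writer.append(name, value);
                STATUS_OK
            }
            Err(_) => STATUS_BAD_ARG, // name is not valid UTF-8
        }
    }));
    outcome.unwrap_or(STATUS_PANIC)
}

/// # Safety
/// `writer` must be null or a live pointer from `writer_new`.
pub unsafe extern "C" fn writer_len(writer: *const Writer) -> usize {
    if writer.is_null() { 0 } else { unsafe { (*writer).rows.len() } }
}

/// # Safety
/// `writer` must be null or a pointer from `writer_new` not yet freed.
pub unsafe extern "C" fn writer_free(writer: *mut Writer) {
    if !writer.is_null() {
        drop(unsafe { Box::from_raw(writer) });
    }
}
```

From the C side, such a wrapper is used as an opaque handle: create it with `writer_new`, pass it to `writer_append`, check the returned status, and release it exactly once with `writer_free`.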

Towards the future of vehicle-side intelligence

This article introduces a low-cost, real-time vehicle-cloud integrated data solution built on Greptime's time-series database. It also discusses the issues we faced in this solution and the corresponding resolutions. So far, our solution has achieved excellent results in the collection and analysis of time-series diagnostic data on the vehicle side, saving manufacturers millions of dollars per year. However, our journey in exploring edge computing for smart cars has never ceased. With the rise of large language model (LLM) applications, smart cars can now leverage LLMs to provide richer passenger entertainment experiences and intelligent diagnostic tools. Under stringent privacy protection regulations, edge-side intelligence is gaining increasing importance, requiring smart cars to handle various multimodal data types such as vectors and images. GreptimeDB is exploring how to integrate multimodal data into the existing vehicle-cloud integrated solution, thereby accelerating the evolution of cars into more intelligent and versatile terminals.


About Greptime

We help industries that generate large amounts of time-series data, such as Connected Vehicles (CV), IoT, and Observability, to efficiently uncover the hidden value of data in real-time.

Visit the latest version from any device to get started and get the most out of your data.

  • GreptimeDB, written in Rust, is a distributed, open-source, time-series database designed for scalability, efficiency, and powerful analytics.
  • Edge-Cloud Integrated TSDB is designed for the unique demands of edge storage and compute in IoT. It tackles the exponential growth of edge data by integrating a multimodal edge-side database with cloud-based GreptimeDB Enterprise. This combination reduces traffic, computing, and storage costs while enhancing data timeliness and business insights.
  • GreptimeCloud is a fully-managed cloud database-as-a-service (DBaaS) solution built on GreptimeDB. It efficiently supports applications in fields such as observability, IoT, and finance.

Star us on GitHub or join GreptimeDB Community on Slack to get connected. Also, you can go to our contribution page to find some interesting issues to start with.
