Announcement • July 17, 2024

GreptimeDB v0.9 Release — Unifying Metric and Log Analysis in a Single Time-Series Database


As we step into the second half of 2024, we are excited to announce the official release of a new version of GreptimeDB.

In this new version, we introduce a new storage engine, Log Engine, which is optimized for log storage and queries. This engine not only enhances the efficiency of log data storage but also provides users with more powerful log data processing and querying capabilities.

GreptimeDB Roadmap 2024

Log Engine

Log Engine is a storage engine specifically optimized for log storage and queries, featuring fulltext indexing.

Why is the Log Engine Important?

Log data is often unstructured or semi-structured, including text, timestamps, error messages, event descriptions, and more. This type of data is detailed and rich, recording various system events and operations. Previously, GreptimeDB supported metric-based data analysis. However, in scenarios such as IoT and observability, log analysis is crucial for fault diagnosis and resolution, performance and security monitoring, and understanding user behavior.

Logs, as events, contain unstructured messages that often require sophisticated search mechanisms to extract meaningful insights. From this version onwards, GreptimeDB will gradually become a unified database supporting both metric and log analysis. This will significantly enhance the ability to perform correlation analysis across different data sources. For example, root cause analysis becomes straightforward, as all relevant event data is in one place, eliminating the need to switch between multiple systems and interfaces.

Unified Database

The advantages of unifying logs and metrics in monitoring systems, and how GreptimeDB designs unified event management, are covered in this blog post: Unifying Logs and Metrics — Revolutionizing Event Management in Monitoring Systems.

Log Engine Components

  1. Log Pipeline

A Pipeline is a mechanism in GreptimeDB for transforming log data (in JSON format). It consists of a unique name and a set of configuration rules that define how to format, split, and transform log data.

These configurations are provided in YAML format, allowing the Pipeline to process data according to the set rules during log ingestion and store the processed data in the database for subsequent structured queries.

The Pipeline is composed of two parts:

  • Processor: Used for preprocessing log data, such as parsing time fields, replacing fields, etc.
  • Transform: Used for transforming log data, such as converting string types to numeric types.
  2. Fulltext Search

Fulltext search capabilities include:

  • FullText Index: accelerates fulltext searches. Columns can be designated for fulltext indexing in the table creation statement or in the Pipeline configuration to speed up search operations (see the example after this list).

  • MATCHES Search Function: Allows users to search using various term expressions, including simple terms, negation terms, required terms, and other search types.
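
For illustration, here is a minimal sketch of declaring a fulltext index when creating a table. The names (`app_logs`, `host`, `message`) are hypothetical, and the exact column-option keyword should be checked against the docs for your GreptimeDB version:

```sql
-- Hypothetical table: `message` is marked for fulltext indexing so that
-- MATCHES queries against it can use the index
CREATE TABLE IF NOT EXISTS app_logs (
    ts TIMESTAMP TIME INDEX,
    host STRING,
    message STRING FULLTEXT,
    PRIMARY KEY(host)
);

-- A search that the index can accelerate
SELECT * FROM app_logs WHERE MATCHES(message, 'error AND timeout');
```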

Log Engine Application Demo

  1. Creating a Pipeline
```yaml
## pipeline.yaml file
processors:
  - date:
      field: time
      formats:
        - "%Y-%m-%d %H:%M:%S%.3f"
      ignore_missing: true

transform:
  - fields:
      - id1
      - id2
    type: int32
  - fields:
      - type
      - logger
    type: string
    index: tag
  - fields:
      - log
    type: string
    index: fulltext
  - field: time
    type: time
    index: timestamp
```

```bash
## Upload the pipeline file; "test" is the name of the Pipeline
curl -X "POST" "http://localhost:4000/v1/events/pipelines/test" -F "file=@pipeline.yaml"
```
  2. Writing log data
```bash
curl -X "POST" "http://localhost:4000/v1/events/logs?db=public&table=logs&pipeline_name=test" \
     -H 'Content-Type: application/json' \
     -d $'{"time":"2024-05-25 20:16:37.217","id1":"2436","id2":"2528","type":"I","logger":"INTERACT.MANAGER","log":"this is a test log message"}
{"time":"2024-05-25 20:16:37.217","id1":"2436","id2":"2528","type":"I","logger":"INTERACT.MANAGER","log":"Started logging"}
{"time":"2024-05-25 20:16:37.217","id1":"2436","id2":"2528","type":"I","logger":"INTERACT.MANAGER","log":"Attended meeting discussion"}
{"time":"2024-05-25 20:16:37.217","id1":"2436","id2":"2528","type":"I","logger":"INTERACT.MANAGER","log":"Handled customer support requests"}'
```
  3. Table structure of the automatically created logs table
```sql
DESC TABLE logs;
 Column |        Type         | Key | Null | Default | Semantic Type
--------+---------------------+-----+------+---------+---------------
 id1    | Int32               |     | YES  |         | FIELD
 id2    | Int32               |     | YES  |         | FIELD
 type   | String              | PRI | YES  |         | TAG
 logger | String              | PRI | YES  |         | TAG
 log    | String              |     | YES  |         | FIELD
 time   | TimestampNanosecond | PRI | NO   |         | TIMESTAMP
(6 rows)
```
  4. Fulltext search using the MATCHES function
```sql
SELECT * FROM logs WHERE MATCHES(log, "Attended OR Handled");
+------+------+------+------------------+-----------------------------------+----------------------------+
| id1  | id2  | type | logger           | log                               | time                       |
+------+------+------+------------------+-----------------------------------+----------------------------+
| 2436 | 2528 | I    | INTERACT.MANAGER | Handled customer support requests | 2024-05-25 20:16:37.217000 |
| 2436 | 2528 | I    | INTERACT.MANAGER | Attended meeting discussion       | 2024-05-25 20:16:37.217000 |
+------+------+------+------------------+-----------------------------------+----------------------------+
```

Other Features

  1. Flow Engine Enhancements
  • Flow Engine now supports cluster deployment.
  • Added support for SHOW CREATE FLOW (see the example below).
  • Performance optimizations and bug fixes.
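
For illustration, the new statement prints the definition of an existing flow. A minimal sketch (`my_flow` is a hypothetical flow name):

```sql
-- Print the statement used to create the (hypothetical) flow `my_flow`
SHOW CREATE FLOW my_flow;
```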
  2. Optimized Remote WAL; enabling distributed Region Failover is now recommended
  • Refactored the write logic for Remote WAL, introducing zero-delay batch accumulation.
  • Accelerated the opening of regions in batches with Remote WAL.
  • Conducted extensive testing of the Remote WAL-based Region Failover feature to ensure stable execution and high data reliability.
Region Failover Workflow
  3. Support for InfluxDB Merge Read
  • Introduced a new table parameter, merge_mode, to control how GreptimeDB merges rows with the same tags and timestamps. The options are last_row and last_non_null, with last_row as the default. In last_row mode, GreptimeDB selects the most recent row as the merge result.

  • In last_non_null mode, GreptimeDB selects the most recent non-null value for each field as the merge result. This mode allows users to update specific columns of a row, enabling GreptimeDB to be compatible with InfluxDB semantics.

  • Here is an example:

```sql
create table if not exists last_non_null_table(
    host string,
    ts timestamp,
    cpu double,
    memory double,
    TIME INDEX (ts),
    PRIMARY KEY(host)
)
engine=mito
with('merge_mode'='last_non_null');

INSERT INTO last_non_null_table VALUES ('host1', 0, 0, NULL), ('host2', 1, NULL, 1);

INSERT INTO last_non_null_table VALUES ('host1', 0, NULL, 10), ('host2', 1, 11, NULL);
```

The result of the query is as follows:

```sql
SELECT * from last_non_null_table ORDER BY host, ts;

+-------+-------------------------+------+--------+
| host  | ts                      | cpu  | memory |
+-------+-------------------------+------+--------+
| host1 | 1970-01-01T00:00:00     | 0.0  | 10.0   |
| host2 | 1970-01-01T00:00:00.001 | 11.0 | 1.0    |
+-------+-------------------------+------+--------+
```
  • When writing data to GreptimeDB via the InfluxDB line protocol and triggering automatic table creation, GreptimeDB will set merge_mode to last_non_null by default.
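
A minimal way to verify this behavior, assuming a table auto-created from line-protocol writes (`my_influx_table` is a hypothetical name), is to inspect its definition; the table options in the output are expected to include the merge_mode setting:

```sql
-- Inspect an auto-created table's definition; the WITH(...) options in the
-- output should show merge_mode = 'last_non_null' for line-protocol tables
SHOW CREATE TABLE my_influx_table;
```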
  4. Support for Views
  • Users can now create and query views using SQL syntax. This feature encapsulates complex logic into reusable virtual tables, offering better data security, performance optimization, and abstraction of data complexity.

  • The newly supported SQL syntax is as follows:

    a. Creating a view

    ```sql
    CREATE [OR REPLACE] [IF NOT EXISTS] VIEW <view_name> AS <SELECT_statement>;
    ```

    b. Querying a view

    ```sql
    SELECT * FROM <view_name>;
    ```
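
As a concrete sketch built on the `logs` table from the demo earlier (the view name `interact_logs` is hypothetical):

```sql
-- Encapsulate a common filter over the demo `logs` table as a reusable view
CREATE OR REPLACE VIEW interact_logs AS
    SELECT * FROM logs WHERE logger = 'INTERACT.MANAGER';

-- Query the view like an ordinary table
SELECT * FROM interact_logs;
```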
  5. Support for Abbreviated Interval Expressions

In PRs 4220 and 4182, we added support for abbreviated interval expressions. For example, y represents year. Other abbreviations include:

  • mon - month
  • w - week
  • d - day
  • m - minute
  • s - second
  • ms / millis - millisecond
  • us - microsecond
  • ns - nanosecond

Abbreviations and full words can be mixed in interval expressions. For example:

```sql
SELECT INTERVAL '7 days' - INTERVAL '1d';

+-----------------------------------------------------------------------------------------------+
| IntervalMonthDayNano("129127208515966861312") - IntervalMonthDayNano("18446744073709551616")   |
+-----------------------------------------------------------------------------------------------+
| 0 years 0 mons 6 days 0 hours 0 mins 0.000000000 secs                                          |
+-----------------------------------------------------------------------------------------------+
```

Casting an abbreviated string to an interval is also supported:

```sql
SELECT '3y2mon'::INTERVAL;

+---------------------------------------------------------+
| IntervalMonthDayNano("3010670175542044828554670112768") |
+---------------------------------------------------------+
| 0 years 38 mons 0 days 0 hours 0 mins 0.000000000 secs  |
+---------------------------------------------------------+
```
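
As a small practical sketch, the shorthand works anywhere an interval expression is accepted, and it can be combined with functions such as now() (available in GreptimeDB's query engine):

```sql
-- A timestamp 7 days in the past, written with the abbreviated form
SELECT now() - INTERVAL '7d';

-- Full words and abbreviations can be mixed across operands
SELECT INTERVAL '1 hour' + INTERVAL '30m';
```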
  6. Parallel Scan Optimization
  • Introduced partition parallel scanning capabilities, enabling parallel scans at the Row Group level under certain conditions, with scan speeds improving by up to 55%.

  • With a scan concurrency of 4, the optimization effects on certain queries are:

Optimization Results
Optimization Results
  7. gRPC Service Adds TLS Support

TLS support has been added to the gRPC service, enhancing its security. gRPC server TLS configuration:

```toml
[grpc.tls]
## TLS mode
mode = "enable"
## certificate file path
cert_path = "/path/to/certfile"
## key file path
key_path = "/path/to/keyfile"
## Watch the certificate and key files for changes and reload them automatically
watch = false
```
  8. Support for Manual Compaction with Different Strategies
  • This PR introduces the ability to manually trigger different types of compaction using SQL commands. The newly supported SQL syntax is:
```sql
SELECT COMPACT_TABLE(<table_name>, [<compact_type>], [<options>])
```
  • Currently, the supported compact_type options include:
  1. regular: Triggers standard compaction similar to a flush operation.
  2. strict_window: Strictly divides SST files according to a specified time window.
  • The <options> parameter can be used to configure specific compaction strategies. For example, for strict_window, options specify the number of seconds in the compaction window:
```sql
SELECT COMPACT_TABLE("monitor", "strict_window", "3600");
```
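
For comparison, a regular compaction could be requested as below; since the syntax above marks the strategy argument as optional, the two forms are assumed to be equivalent:

```sql
SELECT COMPACT_TABLE("monitor");            -- strategy omitted
SELECT COMPACT_TABLE("monitor", "regular"); -- strategy spelled out
```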

Upgrade Guide

Given the significant changes in the new version, upgrading to v0.9 requires downtime. We recommend using our official upgrade tool for a smooth transition. Here's a general upgrade process:

  1. Create a fresh v0.9 cluster
  2. Stop traffic ingress to the old cluster (stop writing)
  3. Export table structure and data using the GreptimeDB CLI upgrade tool
  4. Import data into the new cluster using the GreptimeDB CLI upgrade tool
  5. Switch traffic ingress to the new cluster

For detailed upgrade instructions, please refer to: https://docs.greptime.com/user-guide/upgrade

Future Outlook

GreptimeDB's short-term goal is to become a unified time-series database that integrates both Metrics and Logs. In the next version, we will continue refining the Log Engine to reduce transform overhead, optimize query performance, and expand the ecosystem with additional log collectors. Additionally, we may integrate the Log Engine with the Flow Engine for tasks such as parsing and extracting log contents.


About Greptime

We help industries that generate large amounts of time-series data, such as Connected Vehicles (CV), IoT, and Observability, to efficiently uncover the hidden value of data in real-time.

Visit our website to get started and get the most out of your data.

  • GreptimeDB, written in Rust, is a distributed, open-source, time-series database designed for scalability, efficiency, and powerful analytics.
  • Edge-Cloud Integrated TSDB is designed for the unique demands of edge storage and compute in IoT. It tackles the exponential growth of edge data by integrating a multimodal edge-side database with cloud-based GreptimeDB Enterprise. This combination reduces traffic, computing, and storage costs while enhancing data timeliness and business insights.
  • GreptimeCloud is a fully-managed cloud database-as-a-service (DBaaS) solution built on GreptimeDB. It efficiently supports applications in fields such as observability, IoT, and finance.

Star us on GitHub or join GreptimeDB Community on Slack to get connected. Also, you can go to our contribution page to find some interesting issues to start with.

