Optimized Bulk Writes, Slow Query Analysis & Index Result Caching | Greptime Biweekly Report

Summary

Together with our global community of contributors, GreptimeDB continues to evolve and flourish as a growing open-source project. We are grateful to each and every one of you.

Below are the highlights among recent commits:

Continuously improve bulk insert
Record slow queries in the system table
Introduce index result caching

Contributors

Over the past two weeks, our community has been very active, with a total of 104 PRs merged. 2 PRs from 2 individual contributors were merged successfully, and many more are pending.

Congrats on becoming our most active contributors in the past 2 weeks:

@yinheli db#6128
@zqr10159 dbingesterjava#86

👏 Welcome @yinheli to the community as a new contributor with a successfully merged PR, and more PRs from other individual contributors are waiting to be merged.

（Figure 1: New Contributor of GreptimeDB）

🎉 A big THANK YOU to all our members and contributors! It is people like you who are making GreptimeDB a great product. Let's build an even greater community together.

Highlights of Recent PRs

db#6086 Write to Multiple Time Partitions

This PR adds support for writing to multiple time partitions in the bulk insert path. To benefit from vectorized operations, it implements the combined gt_eq && lt range check by hand rather than relying on the Arrow kernel. Benchmarks show a performance improvement of over 20%.
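To make the range check concrete, here is a minimal sketch of the idea in plain Python. It is not the PR's code (which is Rust and operates on Arrow arrays); the function names and the fixed partition width are assumptions chosen for illustration. It contrasts building two masks and combining them with evaluating `lower <= ts < upper` in a single pass, and then shows how rows might be grouped by the time partition they fall into.

```python
from typing import Dict, List


def range_mask_two_pass(timestamps: List[int], lower: int, upper: int) -> List[bool]:
    """Build a gt_eq mask and an lt mask separately, then AND them."""
    ge = [ts >= lower for ts in timestamps]
    lt = [ts < upper for ts in timestamps]
    return [a and b for a, b in zip(ge, lt)]


def range_mask_fused(timestamps: List[int], lower: int, upper: int) -> List[bool]:
    """Evaluate lower <= ts < upper in one pass, with no intermediate masks."""
    return [lower <= ts < upper for ts in timestamps]


def split_by_partition(timestamps: List[int], width: int) -> Dict[int, List[int]]:
    """Group row indices by the start of the time partition they belong to.

    `width` is a hypothetical fixed partition width in the same unit as the
    timestamps.
    """
    parts: Dict[int, List[int]] = {}
    for i, ts in enumerate(timestamps):
        start = ts - ts % width  # floor to the partition boundary
        parts.setdefault(start, []).append(i)
    return parts
```

Both mask functions return the same result; the fused version simply avoids materializing two intermediate masks, which is the general motivation for fusing the comparison.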

db#6008 Add `SlowQueryRecorder` to Record Slow Query in System Table and Refactor Slow Query Options

Previously, slow queries were only written to the log, so database maintainers had to extract and store them elsewhere for analysis. This PR introduces a SlowQueryRecorder that automatically records slow queries in a system table, enabling easier tracking and analysis with no additional overhead.

Here is a glimpse of the slow query table:

+------+-----------+---------------------------------------------+-----------+----------------------------+--------------+-------------+---------------------+---------------------+
| cost | threshold | query                                       | is_promql | timestamp                  | promql_range | promql_step | promql_start        | promql_end          |
+------+-----------+---------------------------------------------+-----------+----------------------------+--------------+-------------+---------------------+---------------------+
|    2 |         0 | irate(process_cpu_seconds_total[1h])        |         1 | 2025-05-14 13:59:36.368575 |     86400000 |     3600000 | 2024-11-24 00:00:00 | 2024-11-25 00:00:00 |
|   22 |         0 | SELECT * FROM greptime_private.slow_queries |         0 | 2025-05-14 13:59:44.844201 |            0 |           0 | 1970-01-01 00:00:00 | 1970-01-01 00:00:00 |
+------+-----------+---------------------------------------------+-----------+----------------------------+--------------+-------------+---------------------+---------------------+
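The recording logic can be pictured with a small sketch. This is not the SlowQueryRecorder implementation; the class name `SlowQueryRecorderSketch`, the in-memory list standing in for the system table, and the assumption that `cost` and `threshold` share one time unit are all illustrative. Only the column names are taken from the table above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class SlowQueryRow:
    cost: int            # observed execution time (unit is an assumption)
    threshold: int       # threshold in effect when the query ran
    query: str
    is_promql: bool
    timestamp: datetime


@dataclass
class SlowQueryRecorderSketch:
    """Hypothetical recorder: keeps rows in a list instead of a system table."""
    threshold: int
    rows: List[SlowQueryRow] = field(default_factory=list)

    def observe(self, query: str, cost: int, is_promql: bool = False) -> bool:
        """Record the query if its cost reaches the threshold."""
        if cost < self.threshold:
            return False
        self.rows.append(
            SlowQueryRow(cost, self.threshold, query, is_promql,
                         datetime.now(timezone.utc))
        )
        return True
```

With a threshold of 0, as in the sample rows, every query would be recorded under this sketch.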

db#5981 Prometheus Remote Write with Pipeline

The pipeline engine was originally designed for text pre-processing. However, as GreptimeDB embraces the observability ecosystem, it has become clear that its transformation ability is also useful for pre-processing traces and even metrics (e.g., processing and transforming labels). This PR introduces pipeline execution into the Prometheus remote write process. With more enhancements to the pipeline engine on the way, modifying metrics requests should become as straightforward as modifying text requests.
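As an illustration of the kind of label transformation mentioned above, the following sketch renames and drops labels on a single time series. It does not use the pipeline engine's configuration syntax or API; the function `apply_label_pipeline` and the dictionary layout of a series are hypothetical.

```python
from typing import Dict, Iterable, Optional


def apply_label_pipeline(
    series: dict,
    rename: Optional[Dict[str, str]] = None,
    drop: Iterable[str] = (),
) -> dict:
    """Return a copy of `series` with labels dropped and renamed.

    `series` is assumed to look like
    {"labels": {name: value, ...}, "samples": [(timestamp, value), ...]}.
    """
    rename = rename or {}
    dropped = set(drop)
    labels = {
        rename.get(name, name): value
        for name, value in series["labels"].items()
        if name not in dropped
    }
    # Samples are passed through untouched; only labels are transformed.
    return {"labels": labels, "samples": list(series["samples"])}
```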

db#6110 Introduce Index Result Cache

GreptimeDB already supports multiple index types. This PR adds index result caching, significantly improving performance for repeated pattern queries.
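The general idea of caching index results can be sketched as follows. This is a generic LRU cache, not the cache added in the PR: the key layout (file id plus predicate string), the capacity policy, and the class name `IndexResultCacheSketch` are assumptions for illustration.

```python
from collections import OrderedDict
from typing import Callable, Hashable, List, Tuple


class IndexResultCacheSketch:
    """Hypothetical LRU cache mapping (file, predicate) to matching row groups."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._entries: "OrderedDict[Tuple[Hashable, str], List[int]]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get_or_compute(
        self, file_id: Hashable, predicate: str, compute: Callable[[], List[int]]
    ) -> List[int]:
        key = (file_id, predicate)
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        result = compute()  # run the actual index search
        self._entries[key] = result
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used
        return result
```

A second query with the same predicate on the same file is answered from the cache without running the index search again, which is where repeated query patterns benefit.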

db#6121 Revise Compaction Picker

TWCS (Time Window Compaction Strategy) was originally introduced in db#1851. As the database architecture has continued to evolve, the original implementation no longer meets current requirements effectively. This PR adapts the compaction picker to the updated architecture and significantly reduces its time complexity.
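For readers unfamiliar with time-window compaction, here is a generic sketch of the underlying idea: files are bucketed into fixed time windows, and windows holding too many files become compaction candidates. It does not reproduce the revised picker from this PR; the function names, the use of a file's maximum timestamp, and the per-window file limit are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def assign_windows(
    files: List[Tuple[str, int]], window_secs: int
) -> Dict[int, List[str]]:
    """Bucket files by time window.

    `files` is a list of (file_id, max_timestamp_secs) pairs.
    """
    windows: Dict[int, List[str]] = defaultdict(list)
    for file_id, max_ts in files:
        windows[max_ts - max_ts % window_secs].append(file_id)
    return dict(windows)


def pick_windows(
    windows: Dict[int, List[str]], max_files_per_window: int
) -> Dict[int, List[str]]:
    """Return the windows whose file count exceeds the limit, oldest first."""
    return {
        start: ids
        for start, ids in sorted(windows.items())
        if len(ids) > max_files_per_window
    }
```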

Good First Issue

Issue#6095 Make events/logs to Accept `x-greptime-pipeline-name` Header

The x-greptime-pipeline-name header is supported by the OTEL protocol endpoints, but not by our /events/logs endpoint. Other HTTP endpoints that execute pipelines should accept this header too.

Difficulty: Simple

Keywords: HTTP, Pipeline
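For the x-greptime-pipeline-name issue above, a client request might look like the sketch below once the header is accepted. The request is only constructed, not sent. The base URL, the `/v1/events/logs` path, the `table` query parameter, and the JSON body shape are assumptions for illustration; only the header name comes from the issue.

```python
import json
import urllib.parse
import urllib.request
from typing import List


def build_log_request(
    base_url: str, table: str, pipeline_name: str, records: List[dict]
) -> urllib.request.Request:
    """Build (but do not send) a log ingestion request carrying the pipeline header."""
    query = urllib.parse.urlencode({"table": table})  # assumed parameter name
    req = urllib.request.Request(
        url=f"{base_url}/v1/events/logs?{query}",     # assumed endpoint path
        data=json.dumps(records).encode("utf-8"),
        method="POST",
    )
    req.add_header("Content-Type", "application/json")
    # The pipeline is selected via the header rather than a query parameter.
    req.add_header("x-greptime-pipeline-name", pipeline_name)
    return req
```

Passing the returned object to `urllib.request.urlopen` would send it to a running server.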

Issue#6105 Enable Common Datasource Support for Azblob, Oss, and Gcs storage

Currently, both COPY FROM/TO (for tables and databases) and external tables use build_backend to create the object storage backend, but only S3 is supported at this time. We need to extend support to other types of object storage. (Originally posted by @yihong0618 in #5585 (comment))

Difficulty: Simple

Keywords: Object storage

Issue#6188 Add a Metadata CLI Tool Similar to `etcdctl`

We need a CLI tool to interact with the metadata layer of GreptimeDB, similar to how etcdctl allows direct interaction with etcd. This tool would help operators inspect, query, and manage metadata entries.

Difficulty: Simple

Keywords: Metasrv
