Biweekly
May 8, 2024

Introducing the Cluster Info Table for Optimized Cluster Management, Support Alter Column Type | Greptime Biweekly Report

A recap of the past two weeks of progress and changes in GreptimeDB.

Summary

Together with our global community of contributors, GreptimeDB continues to evolve and flourish as a growing open-source project. We are grateful to each and every one of you.

Below are the highlights among recent commits:

  • Add the cluster_info table to information_schema: This allows users to query the status of each node in the cluster via SQL.

  • Support altering column types: GreptimeDB now supports modifying column data types in tables.

  • Introduce new optimizer rules to optimize count(*) query performance.

Contributors

Over the past two weeks, our community has been very active, with a total of 62 PRs merged. 13 PRs from 5 individual contributors were merged successfully, and many more are pending.

Congrats on becoming our most active contributors in the past 2 weeks:

👏 Welcome @Kelvinyu1117 to the community as a new individual contributor, and congratulations on successfully merging their first PR. More PRs are waiting to be merged.


A big THANK YOU to all our members and contributors! It is people like you who are making GreptimeDB a great product. Let's build an even greater community together.

Highlights of Recent PRs

#3796 #3757 #3745 Support column type changes in tables

These patches add support for altering column types. Users can change the data type of a column without rebuilding the table or manually migrating data, which makes the database more flexible and easier to maintain.

For example, the following statement changes the load_15 column of the monitor table to the STRING type:

sql
ALTER TABLE monitor MODIFY COLUMN load_15 STRING;

Note: The modified column cannot be a tag (primary key) or the time index. It must also be nullable so that the data can be converted safely: values that fail to cast become NULL.
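To make these restrictions concrete, here is a minimal sketch. The schema of monitor is an assumption made for illustration (the host and ts columns and the DOUBLE type of load_15 are not taken from the PRs); only the ALTER TABLE statement comes from the example above.

```sql
-- Hypothetical schema, for illustration only.
CREATE TABLE IF NOT EXISTS monitor (
  host STRING,                 -- tag (part of the primary key)
  ts TIMESTAMP TIME INDEX,     -- time index
  load_15 DOUBLE,              -- nullable field column
  PRIMARY KEY (host)
);

-- Allowed: load_15 is a nullable field column.
ALTER TABLE monitor MODIFY COLUMN load_15 STRING;

-- Expected to be rejected per the note above:
-- host is a tag and ts is the time index.
-- ALTER TABLE monitor MODIFY COLUMN host INT64;
-- ALTER TABLE monitor MODIFY COLUMN ts STRING;
```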

#3709 Write manifests in background tasks

This PR moves the writing of manifest files from the region worker thread to a background thread.

When the region worker thread writes the manifest file itself, the write can block the worker. Moving this operation to a background thread reduces the time the worker thread spends blocked.

#3832 Add cluster_info table to information_schema

This patch adds the cluster_info table to information_schema. The table exposes current information about the cluster, such as node topology. It helps administrators monitor the health of the database cluster and identify and resolve issues promptly.

For example:

shell
mysql> USE INFORMATION_SCHEMA;
mysql> SELECT * FROM CLUSTER_INFO;

The result in standalone mode is as follows:

shell
mysql> SELECT * FROM CLUSTER_INFO;
+---------+------------+-----------+---------+------------+-------------------------+--------+-------------+
| peer_id | peer_type  | peer_addr | version | git_commit | start_time              | uptime | active_time |
+---------+------------+-----------+---------+------------+-------------------------+--------+-------------+
| 0       | STANDALONE |           | 0.7.2   | 86ab3d9    | 2024-04-30T06:40:02.074 | 18ms   |             |
+---------+------------+-----------+---------+------------+-------------------------+--------+-------------+

The result in distributed mode is as follows:

shell
+---------+-----------+----------------+---------+------------+-------------------------+----------+-------------+
| peer_id | peer_type | peer_addr      | version | git_commit | start_time              | uptime   | active_time |
+---------+-----------+----------------+---------+------------+-------------------------+----------+-------------+
| 1       | DATANODE  | 127.0.0.1:4101 | 0.7.2   | 86ab3d9    | 2024-04-30T06:40:04.791 | 4s 478ms | 1s 467ms    |
| 2       | DATANODE  | 127.0.0.1:4102 | 0.7.2   | 86ab3d9    | 2024-04-30T06:40:06.098 | 3s 171ms | 162ms       |
| 3       | DATANODE  | 127.0.0.1:4103 | 0.7.2   | 86ab3d9    | 2024-04-30T06:40:07.425 | 1s 844ms | 1s 839ms    |
| -1      | FRONTEND  | 127.0.0.1:4001 | 0.7.2   | 86ab3d9    | 2024-04-30T06:40:08.815 | 454ms    | 47ms        |
| 0       | METASRV   | 127.0.0.1:3002 | unknown | unknown    |                         |          |             |
+---------+-----------+----------------+---------+------------+-------------------------+----------+-------------+
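Because cluster_info is queried through SQL, it can be filtered like any other table. The sketch below uses only the columns and values shown in the output above; the fully qualified table name is assumed to work in place of USE INFORMATION_SCHEMA.

```sql
-- List only the datanodes, with their address, version, and uptime.
SELECT peer_id, peer_addr, version, uptime
FROM information_schema.cluster_info
WHERE peer_type = 'DATANODE'
ORDER BY peer_id;
```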

#3845 Optimize count(*) query performance

count(*) is a commonly used aggregate function for counting the number of rows in a table. New optimizer rules have been introduced to improve the execution plan of count(*) queries. This reduces query time, which is particularly important for large datasets and frequent statistical queries.

The new optimizer rules transform count(*) into count(<TIME INDEX>), which makes count(*) about five times faster. The optimization relies on the fact that the underlying storage engine scans the time index column faster than the primary key column: unlike the primary key column, the time index column does not need to be decoded when it is read.
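Conceptually, the rewrite can be pictured as follows. The monitor table and its ts time index column are assumptions for illustration; the actual rule operates on the query plan rather than on SQL text.

```sql
-- Hypothetical table: ts is the time index, host is the primary key.
CREATE TABLE IF NOT EXISTS monitor (
  host STRING,
  ts TIMESTAMP TIME INDEX,
  load_15 DOUBLE,
  PRIMARY KEY (host)
);

-- What the user writes:
SELECT count(*) FROM monitor;

-- What the optimizer effectively evaluates instead:
SELECT count(ts) FROM monitor;
```

Both queries return the same result because the time index column never contains NULL, so counting its values is the same as counting rows.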

#3858 Mirror insert req to flow node

  • Add a path in do_request of Inserter that copies insert requests for source tables and sends them to the flow node.

  • Add a TableFlowManager field to Inserter so it can look up which flownode each insert request should go to. Since only a standalone flownode exists for now, these are mostly blanket implementations.

  • Some minor refactors, including renaming the flownode's handle_request to handle_requests to support batched requests, and adding XXXRef for Flow-related managers.

Good First Issue

#3336 TLS for gRPC service

Currently, GreptimeDB's gRPC service does not support encryption. Adding TLS protocol support will enhance the security when using the gRPC protocol.

Keywords: gRPC, TLS, security

Difficulty: Medium

#3265 Add more tests for Copy From

Add more tests for COPY FROM to ensure it behaves as expected.

Keywords: test

Difficulty: Simple

#3004 Checksum for manifests

The manifest doesn't have any checksum for data validation. We need a way to validate the checksum of region manifests. One possible approach is to store the checksum as part of the manifest file name, for example, 000000000001-{checksum}.json.

After reading the file content, we can compute its checksum with CRC32 or another algorithm and verify that the value matches the checksum in the file name.

Keywords: manifests, CRC32

Difficulty: Medium


About Greptime

We help industries that generate large amounts of time-series data, such as Connected Vehicles (CV), IoT, and Observability, to efficiently uncover the hidden value of data in real-time.

Visit the latest version to get started and get the most out of your data.

  • GreptimeDB, written in Rust, is a distributed, open-source time-series database designed for unlimited horizontal scaling, high performance, and integrated analytics. We provide GreptimeDB Enterprise for corporate users which supports more enterprise features and customized services. Contact us here for more information.

  • GreptimeCloud is a fully-managed cloud database-as-a-service (DBaaS) solution built on GreptimeDB. It efficiently supports applications in fields such as observability, IoT, and finance. The built-in observability solution, GreptimeAI, helps users comprehensively monitor the cost, performance, traffic, and security of LLM applications.

  • The Vehicle-Cloud Integrated TSDB is a finely tailored solution that aligns closely with the specific business scenarios of automotive companies, addressing the challenges posed by the exponential growth of vehicle data. The multimodal vehicle-side database, combined with the cloud-based GreptimeDB Enterprise, greatly reduces traffic, computing, and storage costs, and boosts data timeliness and business insight capabilities.

If anything above draws your attention, don't hesitate to star us on GitHub or join GreptimeDB Community on Slack. Also, you can go to our contribution page to find some interesting issues to start with.
