Engineering • March 5, 2025

HyperLogLog Function Now Live, PromQL Fully Upgraded | Greptime Biweekly Report

A recap of the progress and changes in GreptimeDB over the past two weeks.

Together with our global community of contributors, GreptimeDB continues to evolve and flourish as a growing open-source project. We are grateful to each and every one of you.

Below are the highlights among recent commits:

  • Added a large number of custom functions (UDF/UDAF)

    • Supports approximate counting using HyperLogLog
    • New IP address related functions, such as ipv4_string_to_num/ipv4_num_to_string, etc.
  • PromQL feature updates:

    • Support for nested subqueries in PromQL
    • Support for the ignoring keyword to avoid joining on null values
    • Added topk and bottomk functions for selecting the k largest/smallest values
    • Added prom_round function to support rounding to arbitrary values
  • Bug Fixes

    • Fixed the error caused by Metasrv nodes in Follower state not rejecting DDL requests
    • Fixed the issue where Flownode could not automatically restore heartbeat after Metasrv restarts
    • Fixed the out-of-bounds access error of Bloom Filter in certain scenarios

Contributors

Over the past two weeks, our community has been very active, with a total of 80 PRs merged. 5 PRs from 3 individual contributors were merged successfully, and many more are pending.

Congrats on becoming our most active contributors in the past 2 weeks:

Welcome @weyert and @xiaoniaoyouhuajiang to the community as new contributors with successfully merged PRs. More PRs from other individual contributors are waiting to be merged.

New Contributors of GreptimeDB

🎉 A big THANK YOU to all our members and contributors! It is people like you who are making GreptimeDB a great product. Let's build an even greater community together.

Highlights of Recent PRs

db#5579 New HyperLogLog functions (hll/hll_merge/hll_count) for approximate counting

The hll function aggregates the values of a column into an intermediate state using the HyperLogLog data structure. Intermediate states can then be combined with hll_merge, and the hll_count function returns an approximate count of the distinct values in a state. A simple example is as follows:

sql
mysql> CREATE TABLE access_log (
    `url` STRING,
    user_id BIGINT,
    ts TIMESTAMP TIME INDEX
);

mysql> CREATE TABLE access_log_10s (
    `url` STRING,
    time_window timestamp time INDEX,
    state BINARY
);

mysql> INSERT INTO access_log VALUES
         ("/dashboard", 1, "2025-03-04 00:00:00"),
         ("/dashboard", 1, "2025-03-04 00:00:01"),
         ("/dashboard", 2, "2025-03-04 00:00:05"),
         ("/not_found", 3, "2025-03-04 00:00:11");

-- Aggregate the raw data in 10-second windows, calculate 
-- the HyperLogLog state for user IDs within the same window, 
-- and write it to the access_log_10s aggregation table.
mysql> INSERT INTO
    access_log_10s
SELECT
    `url`,
    date_bin("10s" :: INTERVAL, ts) AS time_window,
    hll(`user_id`) AS state
FROM
    access_log
GROUP BY
    `url`,
    time_window;

-- Finally, use the hll_count function to perform approximate counting 
-- on the data in HyperLogLog.
mysql> SELECT
    `url`,
    time_window,
    hll_count(state) AS approx_count
FROM
    access_log_10s;

-- Results:
-- +------------+---------------------+--------------+
-- | url        | time_window         | approx_count |
-- +------------+---------------------+--------------+
-- | /dashboard | 2025-03-04 00:00:00 | 2            |
-- | /not_found | 2025-03-04 00:00:10 | 1            |
-- +------------+---------------------+--------------+
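The example above uses hll and hll_count but not hll_merge. A minimal sketch of a rollup follows. It assumes hll_merge can be used as an aggregate over the stored state column; the 1-minute window and the alias time_window_1m are illustrative choices, not taken from the PR.

```sql
-- Sketch: merge the 10-second HyperLogLog states into 1-minute windows,
-- then estimate the number of distinct users per URL and window.
-- Assumption: hll_merge aggregates the BINARY states produced by hll.
SELECT
    `url`,
    date_bin("1 minute" :: INTERVAL, time_window) AS time_window_1m,
    hll_count(hll_merge(state)) AS approx_count
FROM
    access_log_10s
GROUP BY
    `url`,
    time_window_1m;
```

Merging states instead of summing the per-window counts avoids counting a user twice when the same user_id appears in more than one 10-second window.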

db#5602 Support for topk and bottomk functions in PromQL

In metrics monitoring, it is often necessary to find the k largest or smallest values among a set of series at each evaluation step. This PR implements the topk and bottomk functions for this purpose.

sql
mysql> CREATE TABLE cpu_user (
    ts timestamp(3) time INDEX,
    host STRING,
    idc STRING,
    val FLOAT,
    PRIMARY KEY(host, idc)
);

mysql> INSERT INTO
    TABLE cpu_user
VALUES
    (0, 'host1', "idc1", 1.0),
    (0, 'host2', "idc1", 2.0),
    (0, 'host3', "idc2", 3.0),
    (5000, 'host1', "idc1", 1.0),
    (5000, 'host2', "idc1", 4.0),
    (5000, 'host3', "idc2", 1.0),
    (10000, 'host1', "idc1", 3.0),
    (10000, 'host2', "idc1", 5.0),
    (10000, 'host3', "idc2", 3.0),
    (15000, 'host1', "idc1", 1.0),
    (15000, 'host2', "idc1", 2.0),
    (15000, 'host3', "idc2", 3.0);

-- Use the topk function to return the series with the highest value at each 5-second step.
mysql> TQL EVAL (0, 15, '5s') topk(1, cpu_user);

-- Results:
-- +------+-------+------+---------------------+
-- | val  | host  | idc  | ts                  |
-- +------+-------+------+---------------------+
-- |    3 | host3 | idc2 | 1970-01-01 00:00:00 |
-- |    4 | host2 | idc1 | 1970-01-01 00:00:05 |
-- |    5 | host2 | idc1 | 1970-01-01 00:00:10 |
-- |    3 | host3 | idc2 | 1970-01-01 00:00:15 |
-- +------+-------+------+---------------------+
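The same table can be queried with bottomk. This is a minimal sketch that mirrors the topk call above, and no output is shown because it has not been run here.

```sql
-- Sketch: return the series with the lowest value at each 5-second step.
TQL EVAL (0, 15, '5s') bottomk(1, cpu_user);
```

In this data set, host1 and host3 share the lowest value at the 5s and 10s steps, so which of the two is returned at those steps may depend on the implementation.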

db#5606 Add support for subqueries in PromQL

The previous version of GreptimeDB could not execute subqueries through PromQL (e.g., sum_over_time(metrics[50s:10s])). This PR adds support for PromQL subqueries. A simple example is as follows:

sql
-- Create metrics table and insert data
CREATE TABLE metrics (ts timestamp time INDEX, val double);
INSERT INTO metrics VALUES (0, 1), (10000, 2);

-- Run PromQL subquery
TQL EVAL (10, 10, '1s') sum_over_time(metrics[50s:10s]);

-- Result:
-- +---------------------+----------------------------------+
-- | ts                  | prom_sum_over_time(ts_range,val) |
-- +---------------------+----------------------------------+
-- | 1970-01-01 00:00:10 |                                3 |
-- +---------------------+----------------------------------+

Good First Issue

db#5613 Adding support for function alias

  • Level: Medium

  • Keyword: UDF


About Greptime

Greptime offers industry-leading time series database products and solutions to empower IoT and Observability scenarios, enabling enterprises to uncover valuable insights from their data with less time, complexity, and cost.

GreptimeDB is an open-source, high-performance time-series database offering unified storage and analysis for metrics, logs, and events. Try it out instantly with GreptimeCloud, a fully-managed DBaaS solution, with no deployment needed!

The Edge-Cloud Integrated Solution combines multimodal edge databases with cloud-based GreptimeDB to optimize IoT edge scenarios, cutting costs while boosting data performance.

Star us on GitHub or join GreptimeDB Community on Slack to get connected.