Engineering • January 23, 2025

Elasticsearch Protocol Support, Inverted Index Optimization, and Performance Improvements – Individual Contributor Has 'Gone the Extra Mile' Again! | Greptime Biweekly Report

A recap of the past two weeks of progress and changes in GreptimeDB.

Together with our global community of contributors, GreptimeDB continues to evolve and flourish as a growing open-source project. We are grateful to each and every one of you.

Below are the highlights among recent commits:

  • Support log ingestion via Elasticsearch protocol
  • Support inverted index modification using ALTER command
  • Introduce sparse primary key encoding to optimize Metrics performance
  • Begin implementing BloomFilter as an alternative to full-text indexing

Contributors

Over the past two weeks, our community has been very active, with a total of 113 PRs merged. 23 PRs from 6 individual contributors were merged successfully, and many more are pending.

Congrats on becoming our most active contributors in the past 2 weeks:

๐Ÿ‘ Welcome @mtrbpr to the community as a new contributor with a successfully merged PR, and more PRs from other individual contributors are waiting to be merged.

New Contributor of GreptimeDB

🎉 A big THANK YOU to all our members and contributors! It is people like you who are making GreptimeDB a great product. Let's build an even greater community together.

Highlights of Recent PRs

db#5261 Support Elasticsearch _bulk API for Log Ingestion

Users can now ingest logs using either Elasticsearch _bulk API or Logstash, further enriching GreptimeDB's support for the logging ecosystem.
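To make the request shape concrete, here is a minimal sketch of how a client might build a `_bulk` request body. It follows the standard Elasticsearch bulk format (an action line followed by a document line, newline-delimited). The index name `app_logs`, the sample log fields, and the helper `build_bulk_payload` are illustrative assumptions, not taken from the PR.

```python
import json


def build_bulk_payload(index, docs):
    """Serialize docs into the NDJSON body that the Elasticsearch _bulk API expects."""
    lines = []
    for doc in docs:
        # Each document is preceded by an action line naming the operation and target index.
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    # The bulk body must be terminated by a newline.
    return "\n".join(lines) + "\n"


# Hypothetical log records, for illustration only.
logs = [
    {"message": "user login ok", "level": "INFO"},
    {"message": "disk usage at 91%", "level": "WARN"},
]
payload = build_bulk_payload("app_logs", logs)
print(payload, end="")
```

The resulting body would then be sent in a POST request with an NDJSON content type to GreptimeDB's Elasticsearch-compatible endpoint. This report does not state the exact path, so check the GreptimeDB documentation for it.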

db#5131 Support Inverted Index Modification via ALTER Command

Users can configure inverted indexes using the ALTER command, making index adjustments more flexible and straightforward.

db#5365 Introduce SparsePrimaryKeyCodec and SparsePrimaryKeyFilter

In Metrics scenarios, when the number of primary key columns in physical tables becomes excessive, the CPU overhead required for encoding all primary keys increases significantly. This has led to notable performance bottlenecks in both write and query operations.

This PR introduces sparse primary keys to encode only non-null keys, reducing CPU overhead and improving performance.

For complete details, refer to Tracking Issue db#5282.
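The idea of encoding only non-null keys can be illustrated with a toy comparison. This is a sketch under stated assumptions, not the actual `SparsePrimaryKeyCodec`: the byte layout (a 4-byte column id, a 2-byte length, then the UTF-8 value), the function names, and the 200-column table are all hypothetical.

```python
import struct


def encode_dense(row, columns):
    """Visit every primary key column in order, writing a marker even for nulls."""
    out = bytearray()
    for name in columns:
        value = row.get(name)
        if value is None:
            out += b"\x00"  # null marker
        else:
            data = value.encode("utf-8")
            out += b"\x01" + struct.pack(">H", len(data)) + data
    return bytes(out)


def encode_sparse(row, column_ids):
    """Visit only the non-null columns, writing (column id, length, value) triples."""
    pairs = sorted(
        (column_ids[name], value) for name, value in row.items() if value is not None
    )
    out = bytearray()
    for col_id, value in pairs:
        data = value.encode("utf-8")
        out += struct.pack(">IH", col_id, len(data)) + data
    return bytes(out)


def decode_sparse(buf):
    """Recover {column id: value} from a sparse encoding."""
    result, pos = {}, 0
    while pos < len(buf):
        col_id, length = struct.unpack_from(">IH", buf, pos)
        pos += 6  # 4-byte column id + 2-byte length
        result[col_id] = buf[pos:pos + length].decode("utf-8")
        pos += length
    return result


# A wide physical table: 200 label columns, of which this row sets only two.
column_ids = {f"label_{i}": i for i in range(200)}
row = {"label_3": "host-a", "label_42": "us-west"}

dense = encode_dense(row, list(column_ids))
sparse = encode_sparse(row, column_ids)
print(len(dense), len(sparse))
print(decode_sparse(sparse))
```

In this sketch the dense encoder does work proportional to the total number of key columns, while the sparse encoder does work proportional to the number of keys the row actually sets, which is the kind of saving the PR description points to.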

db#5406 Initial Implementation of BloomFilter as an Alternative to Full-Text Indexing

Full-text indexing in logging scenarios incurs substantial resource overhead. To address this, this PR begins implementing BloomFilter as an indexing method, serving as an alternative to full-text indexing. This indexing approach can significantly reduce resource consumption compared to full-text indexing.
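For readers unfamiliar with the data structure, below is a minimal generic Bloom filter sketch showing why it is cheap: it stores only a fixed-size bit array per data segment and answers "definitely absent" or "possibly present". It is not GreptimeDB's implementation. The filter size, hash scheme, whitespace tokenization, and segment names are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """A fixed-size bit array queried through several hash positions per item."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive several bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False means definitely absent; True means possibly present.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


# Hypothetical log segments, each summarized by its own small filter.
segments = {
    "segment_0": ["connection reset by peer", "user login ok"],
    "segment_1": ["disk usage high", "request timeout"],
}
filters = {}
for name, lines in segments.items():
    bf = BloomFilter()
    for line in lines:
        for token in line.split():
            bf.add(token)
    filters[name] = bf

# Only segments whose filter may contain the term need to be scanned.
candidates = [name for name, bf in filters.items() if bf.might_contain("timeout")]
print(candidates)
```

A Bloom filter never produces false negatives, so a segment that contains the term is always kept. It can produce false positives, so a query still has to verify matches in the candidate segments.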

Good First Issue

db#5084 Add the HTTP API for Querying Pipelines

Although we have decided not to expose many HTTP APIs for the database, it is natural to have an HTTP API for querying pipelines alongside the existing create and delete operations for pipeline management.

For a better developer experience, after creating a pipeline it would be convenient to query it through a similar API instead of using SQL against greptime_private, for example:

curl -XGET "http://localhost:4000/v1/events/pipelines/test?db=public"

  • Level: Simple

  • Keyword: Logs


About Greptime

Greptime offers industry-leading time series database products and solutions to empower IoT and Observability scenarios, enabling enterprises to uncover valuable insights from their data with less time, complexity, and cost.

GreptimeDB is an open-source, high-performance time-series database offering unified storage and analysis for metrics, logs, and events. Try it out instantly with GreptimeCloud, a fully-managed DBaaS solution with no deployment needed!

The Edge-Cloud Integrated Solution combines multimodal edge databases with cloud-based GreptimeDB to optimize IoT edge scenarios, cutting costs while boosting data performance.

Star us on GitHub or join GreptimeDB Community on Slack to get connected.
