Tutorial
2026-03-24

How to Choose the Right Ingestion Protocol for GreptimeDB

Benchmark results comparing 7 GreptimeDB ingestion protocols under identical conditions, with throughput ranging from 36K to 3M rows/sec. Practical guidance for choosing the right one.

Different ingestion protocols can differ by more than 50x in throughput. This post uses benchmark data to help you pick the right one.

GreptimeDB supports over a dozen ingestion protocols, and the most common question in our community is: which one should I use?

There's plenty of scattered data out there, but test conditions vary so much that direct comparison is impossible. So I built an open-source benchmark tool, greptimedb-ingestion-benchmark, to test the most common protocols under identical conditions. This post shares the results and recommendations.

Protocols tested

Three categories, picked from GreptimeDB's many ingestion options:

GreptimeDB gRPC protocol, used through official SDKs, with three write modes:

| Write mode | Description |
|---|---|
| gRPC SDK (Unary) | One RPC call per batch, simplest |
| gRPC Stream | Bidirectional streaming over a persistent connection, suited for high-frequency and sustained high-throughput writes |
| gRPC Bulk (Arrow) | Arrow Flight DoPut with columnar transfer, highest throughput |

Open standard protocols: InfluxDB Line Protocol (HTTP text) and OTLP Logs (HTTP + Protobuf).

SQL protocols: MySQL INSERT and PostgreSQL INSERT.

We tested OTLP Logs rather than OTLP Metrics. In GreptimeDB's OTLP data model, Metrics maps each metric name to a separate table. This benchmark has 5 metric fields, so the Metrics model would create 5 tables — not a fair comparison. The Logs model writes all fields into a single table, keeping conditions consistent.
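To make the model difference concrete, here's a tiny sketch of how the benchmark's 5 metric fields would map to tables under each OTLP model (field and table names are made up for illustration, not the benchmark's actual identifiers):

```python
# Hypothetical field names standing in for the benchmark's 5 float64 metrics.
metric_fields = ["cpu_usage", "mem_usage", "disk_read", "disk_write", "net_rx"]

# OTLP Metrics model: each metric name maps to its own table.
metrics_model_tables = {name: [name] for name in metric_fields}

# OTLP Logs model: all five fields land in a single table.
logs_model_tables = {"benchmark_logs": metric_fields}

print(len(metrics_model_tables))  # 5 tables under the Metrics model
print(len(logs_model_tables))     # 1 table under the Logs model
```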

On Schemaless writes: gRPC SDK, gRPC Stream, InfluxDB LP, and OTLP all support automatic table creation[1] — just write new fields and GreptimeDB adds columns on the fly. SQL INSERT and gRPC Bulk (Arrow) require pre-created tables. SQL depends on an existing table structure for INSERT INTO statements; Arrow Bulk needs the target table to exist for column mapping via the DoPut interface. If your data structure changes frequently (IoT device fields, LLM conversation data, etc.), go with a Schemaless-capable protocol.

Test setup

10 million rows, 1 million time series (1,000 hosts × 5 regions × 10 datacenters × 20 services), 5 float64 metric fields per row, fixed random seed (seed=42). Each protocol writes to its own isolated table. 5 concurrent workers, all SDKs at default settings.
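The 1-million-series figure follows directly from the tag combinations: every unique (host, region, datacenter, service) tuple is one time series. A quick sketch (the tag value names here are illustrative, not the generator's actual values):

```python
import itertools

# Tag dimensions from the test setup; value names are assumed for the sketch.
hosts = [f"host-{i}" for i in range(1000)]
regions = [f"region-{i}" for i in range(5)]
datacenters = [f"dc-{i}" for i in range(10)]
services = [f"svc-{i}" for i in range(20)]

# Count every unique tag combination.
n_series = sum(1 for _ in itertools.product(hosts, regions, datacenters, services))
print(n_series)  # 1000 * 5 * 10 * 20 = 1,000,000
```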

Test environment: MacBook Pro 14-inch (M4 Max, 48 GB), GreptimeDB standalone mode. This is a single-machine test to compare relative differences between protocols, not to measure absolute throughput limits. A production distributed cluster will yield higher absolute numbers, but the relative ordering stays the same. Full methodology in the repository README.

Results

batch=1000, 1M series

1 million time series is close to real production cardinality, and batch=1000 is a reasonable default for most workloads.

| Protocol | Throughput (rows/sec) | Duration | P50 latency | P99 latency |
|---|---|---|---|---|
| gRPC Bulk (Arrow) | 2,010,017 | 5.0s | 1.7 ms | 8.6 ms |
| gRPC Stream | 1,446,508 | 6.9s | 2.6 ms | 9.7 ms |
| gRPC SDK | 1,189,277 | 8.4s | 3.6 ms | 9.4 ms |
| OTLP Logs (HTTP) | 1,046,518 | 9.6s | 4.6 ms | 8.1 ms |
| InfluxDB LP | 985,409 | 10.2s | 4.2 ms | 10.5 ms |
| MySQL INSERT | 68,987 | 145.0s | 68.0 ms | 147.3 ms |
| PostgreSQL INSERT | 36,300 | 275.5s | 134.6 ms | 202.9 ms |

Protocol throughput comparison

The three gRPC modes land between 1.2M and 2.0M rows/sec. HTTP protocols (OTLP Logs and InfluxDB LP) sit around 1M rows/sec. SQL comes in at 36K–69K rows/sec. That's nearly a 55x gap between the fastest and slowest.
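The "55x" figure is simply the ratio of the two extremes in the table above:

```python
fastest = 2_010_017  # gRPC Bulk (Arrow), rows/sec
slowest = 36_300     # PostgreSQL INSERT, rows/sec
print(f"{fastest / slowest:.1f}x")  # 55.4x
```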

OTLP Logs slightly outperforms InfluxDB LP here, but not because the OTLP protocol is inherently faster. The OTLP Logs table defaults to append_mode = 'true' and doesn't use business dimensions like host or region as primary keys (those live in the log_attributes JSON column). At 1M series, other protocols' tables pay a heavy cost maintaining primary keys; the OTLP Logs table doesn't. More on this in the "Series cardinality impact" section.

A note on the SQL results: both the connection pool and concurrency were set to 5, same as every other protocol in this benchmark. In practice, you can improve SQL write throughput by increasing the connection pool size and concurrency, but that's outside the scope of this test. The numbers here reflect relative performance under identical concurrency.

Lower cardinality reference: batch=1000, 100K series

With fewer time series (a few hundred hosts, say), throughput climbs noticeably:

| Protocol | Throughput (rows/sec) | P50 latency | P99 latency |
|---|---|---|---|
| gRPC Bulk (Arrow) | 2,978,357 | 1.5 ms | 5.3 ms |
| gRPC Stream | 1,890,927 | 2.5 ms | 6.9 ms |
| gRPC SDK | 1,423,277 | 3.4 ms | 6.6 ms |
| InfluxDB LP | 1,177,155 | 4.1 ms | 7.0 ms |
| OTLP Logs (HTTP) | 985,826 | 4.5 ms | 13.5 ms |
| MySQL INSERT | 73,682 | 65.2 ms | 122.9 ms |
| PostgreSQL INSERT | 43,762 | 112.4 ms | 149.8 ms |

At lower cardinality, gRPC Bulk reaches nearly 3M rows/sec, and most protocols gain roughly 20–50% over the 1M-series run. Note that InfluxDB LP now ranks above OTLP Logs: at 100K series, indexing pressure is lighter, so the append-only advantage fades.

Batch size impact

Four batch sizes (50 / 200 / 1,000 / 2,000) at 1M series:

| Protocol | batch=50 | batch=200 | batch=1000 | batch=2000 |
|---|---|---|---|---|
| gRPC Bulk (Arrow) | 1,141,828 | 1,605,752 | 2,010,017 | 2,059,978 |
| gRPC Stream | 903,437 | 1,201,400 | 1,446,508 | 1,559,845 |
| gRPC SDK | 780,298 | 1,044,454 | 1,189,277 | 1,114,197 |
| InfluxDB LP | 723,982 | 924,198 | 985,409 | 1,078,624 |
| OTLP Logs (HTTP) | 685,865 | 896,434 | 1,046,518 | 1,065,914 |
| MySQL INSERT | 65,662 | 67,240 | 68,987 | 70,324 |
| PostgreSQL INSERT | 37,962 | 33,611 | 36,300 | 40,731 |

Batch size impact on throughput

gRPC is highly sensitive to batch size. Bulk goes from 1.14M at batch=50 to 2.06M at batch=2000 — nearly doubling. Larger batches mean fewer RPC calls and more efficient columnar encoding. However, gRPC SDK dips slightly at batch=2000 (1.11M vs 1.19M at batch=1000) — the Unary model pays extra serialization cost for oversized request bodies.
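The round-trip math explains most of that sensitivity: at 10 million rows, batch size directly sets the number of RPC calls.

```python
total_rows = 10_000_000

# Rows per batch -> number of RPC round trips for the full dataset.
for batch in (50, 200, 1000, 2000):
    rpc_calls = total_rows // batch
    print(f"batch={batch}: {rpc_calls:,} RPC calls")
# batch=50 means 200,000 round trips; batch=2000 only 5,000 (40x fewer).
```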

InfluxDB LP goes from 720K to 1.08M (+49%). The HTTP request-response model caps the upside.

SQL barely moves. MySQL goes from 66K to 70K. The bottleneck is SQL text parsing and the synchronous connection model, not batch size.

Series cardinality impact

100K vs 1M series, batch=1000:

| Protocol | 100K series | 1M series | Change |
|---|---|---|---|
| gRPC Bulk (Arrow) | 2,978,357 | 2,010,017 | -32% |
| gRPC Stream | 1,890,927 | 1,446,508 | -24% |
| gRPC SDK | 1,423,277 | 1,189,277 | -16% |
| InfluxDB LP | 1,177,155 | 985,409 | -16% |
| OTLP Logs (HTTP) | 985,826 | 1,046,518 | +6% |

Series cardinality impact

More time series means more Memtable writes and primary key maintenance. gRPC Bulk has the highest absolute throughput, so it drops the most when the Memtable becomes the bottleneck (-32%). For ultra-high cardinality workloads, GreptimeDB has a dedicated flat format designed to handle this — delivering 4x write throughput and up to 10x faster queries.

OTLP Logs actually goes up. Its table defaults to append_mode = 'true' with only scope_name as the primary key — host, region, and other dimensions live in the log_attributes JSON column and aren't part of the primary key. Series cardinality is irrelevant to it. This is the characteristic of append-only mode: faster writes, and you can add indexes on specific columns later as needed.

Why the gap is so large

gRPC's advantage comes from encoding efficiency. Protocol Buffers is a compact binary format — small payloads, fast parsing. The three modes differ in connection handling: SDK sends one independent RPC per batch; Stream reuses a bidirectional stream, skipping per-batch connection negotiation for roughly 20–30% higher throughput; Bulk uses the Arrow Flight protocol[2] for columnar transfer, and since GreptimeDB also uses Arrow internally as its in-memory format, writes are near zero-copy — that's where the 2M rows/sec comes from. The tradeoff: you need to pre-create the table.

InfluxDB LP and OTLP both run over HTTP, with a full request-response cycle per batch. That's their ceiling. InfluxDB LP uses a text format, so text parsing overhead is more visible at small batch sizes; at larger batches, the gap with OTLP's Protobuf narrows.

SQL is slow for two reasons. First, the processing path is long: the client assembles INSERT INTO ... VALUES (...) text, the server parses the SQL, converts types row by row, then writes. Every step adds overhead, and the text payload is much larger than binary. Second, the concurrency model: MySQL and PostgreSQL protocols use synchronous connections — one connection handles one statement at a time, and concurrency is limited by the connection pool. This is fundamentally different from gRPC's asynchronous streaming model. None of this is GreptimeDB-specific — any time-series database accepting SQL writes faces the same protocol overhead.
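To make the text-overhead point tangible, here's a rough per-row size comparison between a line-protocol row and an equivalent single-row INSERT. The table, tag, and field names are made up for the sketch; the actual savings depend on your schema:

```python
# Hypothetical row matching the benchmark's shape: 4 tags, 5 float fields.
tags = {"host": "host-42", "region": "region-1", "dc": "dc-3", "service": "svc-7"}
fields = {f"metric_{i}": 3.14159 for i in range(5)}
ts_ns = 1_700_000_000_000_000_000

# InfluxDB line protocol: measurement,tags fields timestamp -- one compact line.
lp = ("cpu," + ",".join(f"{k}={v}" for k, v in tags.items())
      + " " + ",".join(f"{k}={v}" for k, v in fields.items())
      + f" {ts_ns}")

# SQL: column list, VALUES keyword, and quoting repeat on every statement.
cols = ", ".join(list(tags) + list(fields) + ["ts"])
vals = ", ".join([f"'{v}'" for v in tags.values()]
                 + [str(v) for v in fields.values()] + [str(ts_ns)])
sql = f"INSERT INTO cpu ({cols}) VALUES ({vals})"

print(len(lp.encode()), len(sql.encode()))  # the SQL text is noticeably larger
```

And that's before the server-side cost of parsing the SQL text, which the binary protocols skip entirely.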

How to choose

Most workloads: gRPC SDK. Around 1.2M rows/sec, simple code, Schemaless support. Our official SDKs cover Go, Java, Rust, Erlang, and .NET. If you don't have special requirements, start here. For JS/TS stacks (no gRPC JS client yet), use InfluxDB LP or OTLP instead — both have mature JS libraries and perform at the million-rows-per-second level.

Bulk imports: gRPC Bulk. Data migrations, backfills, ETL. 2M rows/sec, 10 million rows in 5 seconds. Requires pre-created tables. The Erlang SDK doesn't support this mode yet.

High-frequency or sustained high-throughput: gRPC Stream. IoT gateways, monitoring collectors, or any scenario with continuous non-stop writes. Also a good fit when write frequency is very high with small payloads per request. Bidirectional streaming avoids per-batch connection setup, delivering 1.2–1.5M rows/sec with Schemaless support.

InfluxDB ecosystem: InfluxDB Line Protocol. Already running Telegraf or outputting Line Protocol? Plug straight into GreptimeDB's compatible endpoint. Around 990K rows/sec, near-zero migration cost.

OTel ecosystem: OTLP. Already using OpenTelemetry Collector or OTel SDKs? OTLP is the natural fit at around 1.05M rows/sec with Schemaless support. Note that Metrics and Logs use different data models[3]: Metrics creates one table per metric name (suited for Prometheus-style monitoring), while Logs writes to a unified log table (suited for flexible data structures). Pick based on your actual data model.

Development and debugging: MySQL / PostgreSQL. Write throughput is low, but mysql, psql, DBeaver, ORMs, and language drivers all connect directly. No Schemaless support — create tables first. Slow writes don't mean slow queries: MySQL/PG protocols are GreptimeDB's primary query interface.

Quick reference

| | gRPC SDK | gRPC Stream | gRPC Bulk | InfluxDB LP | OTLP | MySQL/PG |
|---|---|---|---|---|---|---|
| Throughput | 1.19M/s | 1.45M/s | 2.01M/s | 990K/s | 1.05M/s | 36–69K/s |
| Schemaless | ✅ | ✅ | ❌ Pre-create | ✅ | ✅ | ❌ Pre-create |
| Wire format | Protobuf | Protobuf | Arrow IPC | Text | Protobuf | SQL text |
| SDK coverage | Go/Java/Rust/Erlang/.NET | Same | Same (no Erlang) | All languages | All languages | All languages |
| Best for | General default | High-freq / sustained | Bulk import | InfluxDB migration | OTel ecosystem | Queries & debugging |

In short: pick gRPC for performance (start with SDK, move to Stream or Bulk when needed), pick the compatible protocol for your existing ecosystem (InfluxDB LP / OTLP), and use SQL for queries and debugging.

Reproduce it yourself

```bash
git clone https://github.com/killme2008/greptimedb-ingestion-benchmark.git
cd greptimedb-ingestion-benchmark
bin/run.sh
```

The script downloads GreptimeDB, starts it, runs every protocol, and prints results. Customize as needed:

```bash
bin/run.sh -protocols grpc,grpc_bulk,influxdb -batch-size 500,1000,2000
bin/run.sh -host 10.0.0.1  # connect to a remote instance
```

Got different results, or findings from a specific workload? We'd love to hear about it on GitHub Discussions or Slack.

References

1. GreptimeDB ingestion — automatic schema generation
2. Apache Arrow Flight protocol
3. GreptimeDB OpenTelemetry data model
