Different ingestion protocols can differ by more than 50x in throughput. This post uses benchmark data to help you pick the right one.
GreptimeDB supports over a dozen ingestion protocols, and the most common question in our community is: which one should I use?
There's plenty of scattered data out there, but test conditions vary so much that direct comparison is impossible. So I built an open-source benchmark tool, greptimedb-ingestion-benchmark, to test the most common protocols under identical conditions. This post shares the results and recommendations.
## Protocols tested
Three categories, picked from GreptimeDB's many ingestion options:
GreptimeDB gRPC protocol, used through official SDKs, with three write modes:
| Write mode | Description |
|---|---|
| gRPC SDK (Unary) | One RPC call per batch, simplest |
| gRPC Stream | Bidirectional streaming over a persistent connection, suited for high-frequency and sustained high-throughput writes |
| gRPC Bulk (Arrow) | Arrow Flight DoPut with columnar transfer, highest throughput |
Open standard protocols: InfluxDB Line Protocol (HTTP text) and OTLP Logs (HTTP + Protobuf).
SQL protocols: MySQL INSERT and PostgreSQL INSERT.
We tested OTLP Logs rather than OTLP Metrics. In GreptimeDB's OTLP data model, Metrics maps each metric name to a separate table. This benchmark has 5 metric fields, so the Metrics model would create 5 tables — not a fair comparison. The Logs model writes all fields into a single table, keeping conditions consistent.
On Schemaless writes: gRPC SDK, gRPC Stream, InfluxDB LP, and OTLP all support automatic table creation[1] — just write new fields and GreptimeDB adds columns on the fly. SQL INSERT and gRPC Bulk (Arrow) require pre-created tables. SQL depends on an existing table structure for INSERT INTO statements; Arrow Bulk needs the target table to exist for column mapping via the DoPut interface. If your data structure changes frequently (IoT device fields, LLM conversation data, etc.), go with a Schemaless-capable protocol.
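To make the Schemaless behavior concrete, here is a minimal sketch using InfluxDB Line Protocol (the `host_metrics` measurement and `cpu_temp` field are illustrative): the second point carries a field the first one did not, and a Schemaless-capable protocol adds the column automatically, with no `ALTER TABLE`.

```python
def to_line_protocol(measurement, tags, fields, ts_ns):
    """Render one point as an InfluxDB Line Protocol line:
    measurement,tag=v field=v,field2=v timestamp"""
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_str = ",".join(f"{k}={v}" for k, v in sorted(fields.items()))
    return f"{measurement},{tag_str} {field_str} {ts_ns}"

# The first point has one field; the second introduces cpu_temp.
# With a Schemaless protocol the new column appears on the fly.
lines = [
    to_line_protocol("host_metrics", {"host": "h1"},
                     {"cpu_util": 0.42}, 1700000000000000000),
    to_line_protocol("host_metrics", {"host": "h1"},
                     {"cpu_util": 0.43, "cpu_temp": 61.5}, 1700000001000000000),
]
payload = "\n".join(lines)  # POST this text to the write endpoint
```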
## Test setup
10 million rows, 1 million time series (1,000 hosts × 5 regions × 10 datacenters × 20 services), 5 float64 metric fields per row, fixed random seed (seed=42). Each protocol writes to its own isolated table. 5 concurrent workers, all SDKs at default settings.
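The tag dimensions multiply out to the stated cardinality. A rough sketch of how such a deterministic workload can be generated (names and structure are illustrative, not the benchmark tool's actual code):

```python
import random

HOSTS, REGIONS, DCS, SERVICES = 1000, 5, 10, 20
SERIES = HOSTS * REGIONS * DCS * SERVICES  # 1,000,000 distinct tag sets

rng = random.Random(42)  # fixed seed: every protocol sees identical data

def make_row():
    # One of the 1M possible series, plus 5 float64 metric fields.
    return {
        "host": f"host_{rng.randrange(HOSTS)}",
        "region": f"region_{rng.randrange(REGIONS)}",
        "dc": f"dc_{rng.randrange(DCS)}",
        "service": f"svc_{rng.randrange(SERVICES)}",
        "fields": [rng.random() for _ in range(5)],
    }
```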
Test environment: MacBook Pro 14-inch (M4 Max, 48 GB), GreptimeDB standalone mode. This is a single-machine test to compare relative differences between protocols, not to measure absolute throughput limits. A production distributed cluster will yield higher absolute numbers, but the relative ordering stays the same. Full methodology in the repository README.
## Results
### batch=1000, 1M series
1 million time series is close to real production cardinality, and batch=1000 is a reasonable default for most workloads.
| Protocol | Throughput (rows/sec) | Duration | P50 latency | P99 latency |
|---|---|---|---|---|
| gRPC Bulk (Arrow) | 2,010,017 | 5.0s | 1.7 ms | 8.6 ms |
| gRPC Stream | 1,446,508 | 6.9s | 2.6 ms | 9.7 ms |
| gRPC SDK | 1,189,277 | 8.4s | 3.6 ms | 9.4 ms |
| OTLP Logs (HTTP) | 1,046,518 | 9.6s | 4.6 ms | 8.1 ms |
| InfluxDB LP | 985,409 | 10.2s | 4.2 ms | 10.5 ms |
| MySQL INSERT | 68,987 | 145.0s | 68.0 ms | 147.3 ms |
| PostgreSQL INSERT | 36,300 | 275.5s | 134.6 ms | 202.9 ms |

The three gRPC modes land between 1.2M and 2.0M rows/sec. HTTP protocols (OTLP Logs and InfluxDB LP) sit around 1M rows/sec. SQL comes in at 36K–69K rows/sec. That's nearly a 55x gap between the fastest and slowest.
OTLP Logs slightly outperforms InfluxDB LP here, but not because the OTLP protocol is inherently faster. The OTLP Logs table defaults to append_mode = 'true' and doesn't use business dimensions like host or region as primary keys (those live in the log_attributes JSON column). At 1M series, other protocols' tables pay a heavy cost maintaining primary keys; the OTLP Logs table doesn't. More on this in the "Series cardinality impact" section.
A note on the SQL results: both the connection pool and concurrency were set to 5, same as every other protocol in this benchmark. In practice, you can improve SQL write throughput by increasing the connection pool size and concurrency, but that's outside the scope of this test. The numbers here reflect relative performance under identical concurrency.
### Lower cardinality reference: batch=1000, 100K series
With fewer time series (a few hundred hosts, say), throughput climbs noticeably:
| Protocol | Throughput (rows/sec) | P50 latency | P99 latency |
|---|---|---|---|
| gRPC Bulk (Arrow) | 2,978,357 | 1.5 ms | 5.3 ms |
| gRPC Stream | 1,890,927 | 2.5 ms | 6.9 ms |
| gRPC SDK | 1,423,277 | 3.4 ms | 6.6 ms |
| InfluxDB LP | 1,177,155 | 4.1 ms | 7.0 ms |
| OTLP Logs (HTTP) | 985,826 | 4.5 ms | 13.5 ms |
| MySQL INSERT | 73,682 | 65.2 ms | 122.9 ms |
| PostgreSQL INSERT | 43,762 | 112.4 ms | 149.8 ms |
At lower cardinality, gRPC Bulk reaches nearly 3M rows/sec, and most protocols gain roughly 20–50% (OTLP Logs is the exception, coming in slightly lower here). Note that InfluxDB LP now ranks above OTLP Logs — at 100K series, indexing pressure is lighter, so the append-only advantage fades.
## Batch size impact
Four batch sizes (50 / 200 / 1,000 / 2,000) at 1M series:
| Protocol | batch=50 | batch=200 | batch=1000 | batch=2000 |
|---|---|---|---|---|
| gRPC Bulk (Arrow) | 1,141,828 | 1,605,752 | 2,010,017 | 2,059,978 |
| gRPC Stream | 903,437 | 1,201,400 | 1,446,508 | 1,559,845 |
| gRPC SDK | 780,298 | 1,044,454 | 1,189,277 | 1,114,197 |
| InfluxDB LP | 723,982 | 924,198 | 985,409 | 1,078,624 |
| OTLP Logs (HTTP) | 685,865 | 896,434 | 1,046,518 | 1,065,914 |
| MySQL INSERT | 65,662 | 67,240 | 68,987 | 70,324 |
| PostgreSQL INSERT | 37,962 | 33,611 | 36,300 | 40,731 |

gRPC is highly sensitive to batch size. Bulk goes from 1.14M at batch=50 to 2.06M at batch=2000 — nearly doubling. Larger batches mean fewer RPC calls and more efficient columnar encoding. However, gRPC SDK dips slightly at batch=2000 (1.11M vs 1.19M at batch=1000) — the Unary model pays extra serialization cost for oversized request bodies.
InfluxDB LP goes from 720K to 1.08M (+49%). The HTTP request-response model caps the upside.
SQL barely moves. MySQL goes from 66K to 70K. The bottleneck is SQL text parsing and the synchronous connection model, not batch size.
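The batch-size effect is mostly about amortizing per-request overhead. A quick back-of-the-envelope for the 10M-row run:

```python
TOTAL_ROWS = 10_000_000

for batch in (50, 200, 1000, 2000):
    requests_needed = TOTAL_ROWS // batch
    print(f"batch={batch:>4}: {requests_needed:>7,} round trips")

# batch=50 needs 200,000 round trips; batch=2000 needs only 5,000.
# For gRPC, fewer calls also means larger, more efficiently encoded
# columnar batches; for SQL, per-statement parsing dominates regardless,
# which is why its curve stays flat.
```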
## Series cardinality impact
100K vs 1M series, batch=1000:
| Protocol | 100K series | 1M series | Change |
|---|---|---|---|
| gRPC Bulk (Arrow) | 2,978,357 | 2,010,017 | -32% |
| gRPC Stream | 1,890,927 | 1,446,508 | -24% |
| gRPC SDK | 1,423,277 | 1,189,277 | -16% |
| InfluxDB LP | 1,177,155 | 985,409 | -16% |
| OTLP Logs (HTTP) | 985,826 | 1,046,518 | +6% |

More time series means more Memtable writes and primary key maintenance. gRPC Bulk has the highest absolute throughput, so it drops the most when the Memtable becomes the bottleneck (-32%). For ultra-high cardinality workloads, GreptimeDB has a dedicated flat format designed to handle this — delivering 4x write throughput and up to 10x faster queries.
OTLP Logs actually goes up. Its table defaults to append_mode = 'true' with only scope_name as the primary key — host, region, and other dimensions live in the log_attributes JSON column and aren't part of the primary key. Series cardinality is irrelevant to it. This is the characteristic of append-only mode: faster writes, and you can add indexes on specific columns later as needed.
## Why the gap is so large
gRPC's advantage comes from encoding efficiency. Protocol Buffers is a compact binary format — small payloads, fast parsing. The three modes differ in connection handling: SDK sends one independent RPC per batch; Stream reuses a bidirectional stream, skipping per-batch connection negotiation for roughly 20–30% higher throughput; Bulk uses the Arrow Flight protocol[2] for columnar transfer, and since GreptimeDB also uses Arrow internally as its in-memory format, writes are near zero-copy — that's where the 2M rows/sec comes from. The tradeoff: you need to pre-create the table.
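The row-vs-columnar difference is easy to picture. A toy pure-Python illustration of what columnar batching buys (this shows the layout idea only, not the actual Arrow wire format):

```python
# Row-oriented transfer: one record per row, field names repeated
# for every row (how Protobuf-per-row or SQL text effectively works).
rows = [{"ts": i, "cpu": 0.5, "mem": 0.7} for i in range(4)]

# Column-oriented transfer (the Arrow idea): one contiguous array per
# column, schema sent once per batch. When the server's in-memory
# format is also Arrow, these arrays can be ingested near zero-copy.
columns = {
    "ts":  [r["ts"] for r in rows],
    "cpu": [r["cpu"] for r in rows],
    "mem": [r["mem"] for r in rows],
}
```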
InfluxDB LP and OTLP both run over HTTP, with a full request-response cycle per batch. That's their ceiling. InfluxDB LP uses a text format, so text parsing overhead is more visible at small batch sizes; at larger batches, the gap with OTLP's Protobuf narrows.
SQL is slow for two reasons. First, the processing path is long: the client assembles INSERT INTO ... VALUES (...) text, the server parses the SQL, converts types row by row, then writes. Every step adds overhead, and the text payload is much larger than binary. Second, the concurrency model: MySQL and PostgreSQL protocols use synchronous connections — one connection handles one statement at a time, and concurrency is limited by the connection pool. This is fundamentally different from gRPC's asynchronous streaming model. None of this is GreptimeDB-specific — any time-series database accepting SQL writes faces the same protocol overhead.
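The SQL path's text overhead is easy to see: a batched multi-row INSERT is just a long string the server must lex, parse, and type-convert. A sketch (the `host_metrics` table and its columns are illustrative; real clients should prefer parameterized queries):

```python
def build_insert(rows):
    """Assemble a multi-row INSERT INTO ... VALUES statement to show
    the text the server ultimately has to parse."""
    values = ",".join(
        f"('{r['host']}',{r['cpu']},{r['mem']},{r['ts']})" for r in rows
    )
    return f"INSERT INTO host_metrics (host,cpu,mem,ts) VALUES {values}"

sql = build_insert([
    {"host": "h1", "cpu": 0.42, "mem": 0.73, "ts": 1700000000000},
    {"host": "h2", "cpu": 0.55, "mem": 0.61, "ts": 1700000000000},
])
# Every row adds tens of bytes of quoted text plus server-side parsing
# and row-by-row type conversion, overhead that binary protocols avoid.
```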
## How to choose
Most workloads: gRPC SDK. Around 1.2M rows/sec, simple code, Schemaless support. Our official SDKs cover Go, Java, Rust, Erlang, and .NET. If you don't have special requirements, start here. For JS/TS stacks (no gRPC JS client yet), use InfluxDB LP or OTLP instead — both have mature JS libraries and perform at the million-rows-per-second level.
Bulk imports: gRPC Bulk. Data migrations, backfills, ETL. 2M rows/sec, 10 million rows in 5 seconds. Requires pre-created tables. The Erlang SDK doesn't support this mode yet.
High-frequency or sustained high-throughput: gRPC Stream. IoT gateways, monitoring collectors, or any scenario with continuous non-stop writes. Also a good fit when write frequency is very high with small payloads per request. Bidirectional streaming avoids per-batch connection setup, delivering 1.2–1.5M rows/sec with Schemaless support.
InfluxDB ecosystem: InfluxDB Line Protocol. Already running Telegraf or outputting Line Protocol? Plug straight into GreptimeDB's compatible endpoint. Around 990K rows/sec, near-zero migration cost.
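If you already emit Line Protocol, pointing a client at GreptimeDB is a small change. A minimal sketch, assuming the default HTTP port 4000 and the `/v1/influxdb/write` compatibility endpoint (check your deployment's actual host and port):

```python
import urllib.request

def write_lp(lines, db="public", host="localhost", port=4000):
    """POST Line Protocol text to GreptimeDB's InfluxDB-compatible
    endpoint. A 2xx response indicates the write was accepted."""
    body = "\n".join(lines).encode("utf-8")
    req = urllib.request.Request(
        f"http://{host}:{port}/v1/influxdb/write?db={db}",
        data=body,
        headers={"Content-Type": "text/plain; charset=utf-8"},
    )
    return urllib.request.urlopen(req)

# Example batch -- the same text a Telegraf influxdb output would send:
batch = [
    "cpu,host=h1,region=us-west usage_user=3.1,usage_system=1.2 1700000000000000000",
]
# write_lp(batch)  # uncomment with a running GreptimeDB instance
```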
OTel ecosystem: OTLP. Already using OpenTelemetry Collector or OTel SDKs? OTLP is the natural fit at around 1.05M rows/sec with Schemaless support. Note that Metrics and Logs use different data models[3]: Metrics creates one table per metric name (suited for Prometheus-style monitoring), while Logs writes to a unified log table (suited for flexible data structures). Pick based on your actual data model.
Development and debugging: MySQL / PostgreSQL. Write throughput is low, but mysql, psql, DBeaver, ORMs, and language drivers all connect directly. No Schemaless support — create tables first. Slow writes don't mean slow queries: MySQL/PG protocols are GreptimeDB's primary query interface.
## Quick reference
| | gRPC SDK | gRPC Stream | gRPC Bulk | InfluxDB LP | OTLP | MySQL/PG |
|---|---|---|---|---|---|---|
| Throughput | 1.19M/s | 1.45M/s | 2.01M/s | 990K/s | 1.05M/s | 36–69K/s |
| Schemaless | ✅ | ✅ | ❌ Pre-create | ✅ | ✅ | ❌ Pre-create |
| Wire format | Protobuf | Protobuf | Arrow IPC | Text | Protobuf | SQL text |
| SDK coverage | Go/Java/Rust/Erlang/.NET | Same | Same (no Erlang) | All languages | All languages | All languages |
| Best for | General default | High-freq / sustained | Bulk import | InfluxDB migration | OTel ecosystem | Queries & debugging |
In short: pick gRPC for performance (start with SDK, move to Stream or Bulk when needed), pick the compatible protocol for your existing ecosystem (InfluxDB LP / OTLP), and use SQL for queries and debugging.
## Reproduce it yourself
```bash
git clone https://github.com/killme2008/greptimedb-ingestion-benchmark.git
cd greptimedb-ingestion-benchmark
bin/run.sh
```

The script downloads GreptimeDB, starts it, runs every protocol, and prints results. Customize as needed:

```bash
bin/run.sh -protocols grpc,grpc_bulk,influxdb -batch-size 500,1000,2000
bin/run.sh -host 10.0.0.1   # connect to a remote instance
```

Got different results, or findings from a specific workload? We'd love to hear about it on GitHub Discussions or Slack.


