GreptimeDB v0.10's JSON Support for Unified Time-Series Data
GreptimeDB v0.10(FYI. v0.11.0
already released) introduces support for the JSON type, enhancing its ability to handle both semi-structured and structured data. GreptimeDB's schemaless model, similar to MongoDB, supports dynamic table creation and automatic column insersion when writing through protocols like gRPC and Prometheus Remote Write, allowing it to adapt flexibly to changing data structures.
However, in many use cases, much of the data is structured, such as the typical Traces Data in OpenTelemetry:
{
"name": "hello",
"context": {
"trace_id": "5b8aa5a2d2c872e8321cf37308d69df2",
"span_id": "051581bf3cb55c13"
},
"parent_id": null,
"start_time": "2022-04-29T18:52:58.114201Z",
"end_time": "2022-04-29T18:52:58.114687Z",
"attributes": {
"http.route": "some_route1"
},
"events": [
{
"name": "Guten Tag!",
"timestamp": "2022-04-29T18:52:58.114561Z",
"attributes": {
"event_attributes": 1
}
}
]
}
The context
, attributes
, and events
are all JSON objects or arrays. Since GreptimeDB is a time-series database capable of unifying metrics, logs, events, and traces, it is essential to support the JSON type.
This article uses the MySQL protocol supported in GreptimeDB as an example, assuming that readers have already installed GreptimeDB v0.10 or later versions. For installation instructions, please refer to the documentation.
Creating JSON-Type Tables and Managing JSON Data with GreptimeDB
Creating tables with JSON Column in GreptimeDB
First, when creating a table, you can specify a column as the JSON type:
CREATE TABLE traces(
`name` STRING,
context JSON,
parent_id STRING,
start_time TIMESTAMP,
end_time TIMESTAMP,
attributes JSON,
events JSON,
TIME INDEX (start_time),
PRIMARY KEY(`name`, parent_id, context)
);
In this example, we set context
, attributes
, and events
as JSON types. The time index is set to start_time
, and name, parent_id
, and context
are included in the Primary Key as tag columns.
Since
name
is a reserved keyword, it needs to be escaped using backticks.
Inserting Data
Now, let's try inserting data:
INSERT INTO traces
VALUES
(
'hello',
'{
"trace_id": "5b8aa5a2d2c872e8321cf37308d69df2",
"span_id": "051581bf3cb55c13"
}',
null,
'2022-04-29T18:52:58.114201Z'::TIMESTAMP,
'2022-04-29T18:52:58.114687Z'::TIMESTAMP,
'{
"http.route": "some_route1"
}',
'[
{
"name": "Guten Tag!",
"timestamp": "2022-04-29T18:52:58.114561Z",
"attributes": {
"event_attributes": 1
}
}
]');
You can insert a JSON string as a JSON type, and GreptimeDB will automatically convert it to the JSON type.
Querying and Handling JSON Data
To query the table:
mysql> SELECT * FROM traces\G;
*************************** 1. row ***************************
name: hello
context: {"span_id":"051581bf3cb55c13","trace_id":"5b8aa5a2d2c872e8321cf37308d69df2"}
parent_id: NULL
start_time: 2022-04-29 18:52:58.114000
end_time: 2022-04-29 18:52:58.114000
attributes: {"http.route":"some_route1"}
events: [{"attributes":{"event_attributes":1},"name":"Guten Tag!","timestamp":"2022-04-29T18:52:58.114561Z"}]
1 row in set (0.01 sec)
ERROR:
No query specified
When querying table data, the JSON type is automatically converted to a string for display, as the type information can be retrieved from the table's schema.
GreptimeDB offers several JSON functions to assist with processing JSON data. For instance, parse_json
can be used to convert a string into JSON:
mysql> select parse_json('{"span_id":"051581bf3cb55c13","trace_id":"5b8aa5a2d2c872e8321cf37308d69df2"}');
+--------------------------------------------------------------------------------------------------+
| parse_json(Utf8("{"span_id":"051581bf3cb55c13","trace_id":"5b8aa5a2d2c872e8321cf37308d69df2"}")) |
+--------------------------------------------------------------------------------------------------+
| @ span_idtrace_id051581bf3cb55c135b8aa5a2d2c872e8321cf37308d69df2 |
+--------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)
Since GreptimeDB stores JSON data in binary format at the underlying level, the parse_json function returns a binary result.
The json_to_string function converts the JSON type into a string:
mysql> select json_to_string(parse_json('{"span_id":"051581bf3cb55c13","trace_id":"5b8aa5a2d2c872e8321cf37308d69df2"}'));
+------------------------------------------------------------------------------------------------------------------+
| json_to_string(parse_json(Utf8("{"span_id":"051581bf3cb55c13","trace_id":"5b8aa5a2d2c872e8321cf37308d69df2"}"))) |
+------------------------------------------------------------------------------------------------------------------+
| {"span_id":"051581bf3cb55c13","trace_id":"5b8aa5a2d2c872e8321cf37308d69df2"} |
+------------------------------------------------------------------------------------------------------------------+
You can extract specific fields from a JSON object using the json_get_
functions. For example:
mysql> SELECT json_get_string(context, 'trace_id') FROM traces;
+--------------------------------------------------+
| json_get_string(traces.context,Utf8("trace_id")) |
+--------------------------------------------------+
| 5b8aa5a2d2c872e8321cf37308d69df2 |
+--------------------------------------------------+
1 row in set (0.02 sec)
mysql> select json_get_string(parse_json('{"span_id":"051581bf3cb55c13","trace_id":"5b8aa5a2d2c872e8321cf37308d69df2"}'), 'span_id');
+-----------------------------------------------------------------------------------------------------------------------------------+
| json_get_string(parse_json(Utf8("{"span_id":"051581bf3cb55c13","trace_id":"5b8aa5a2d2c872e8321cf37308d69df2"}")),Utf8("span_id")) |
+-----------------------------------------------------------------------------------------------------------------------------------+
| 051581bf3cb55c13 |
+-----------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)
At this point, those familiar with MongoDB might wonder if we can directly query with SELECT context.trace_id FROM traces
.
Unfortunately, this isn't possible for now. This is because we can't guarantee that the type of context.trace_id
will be consistent across all rows. GreptimeDB's SQL query engine and results are strongly typed, meaning that users must explicitly specify the type they want to return.
If the field type doesn't match while using json_get_
functions, it will either attempt a type conversion or return null
.
To test the compatibility of different JSON-type fields, let's insert data with an integer trace_id
:
INSERT INTO traces(name, context,start_time)
values ('world', '{"trace_id": 999}',now());
Next, let's query using json_get_string
and json_get_int
respectively:
mysql> SELECT json_get_string(context, 'trace_id') FROM traces;
+--------------------------------------------------+
| json_get_string(traces.context,Utf8("trace_id")) |
+--------------------------------------------------+
| 999 |
| 5b8aa5a2d2c872e8321cf37308d69df2 |
+--------------------------------------------------+
2 rows in set (0.02 sec)
mysql> SELECT json_get_int(context, 'trace_id') FROM traces;
+-----------------------------------------------+
| json_get_int(traces.context,Utf8("trace_id")) |
+-----------------------------------------------+
| NULL |
| 999 |
+-----------------------------------------------+
2 rows in set (0.02 sec)
The results show that json_get_string
successfully returns both rows, converting the integer 999
to a string and displaying it correctly. On the other hand, json_get_int
correctly extracts the integer field 999
, but since the other row's trace_id
is a string and cannot be converted, it returns NULL
.
Enhanced JSON Support in GreptimeDB Coming Versions
With the introduction of JSON type support in GreptimeDB v0.10, along with JSON functions, the system has enhanced its ability to handle both semi-structured and structured data. In the future, GreptimeDB will further improve its support for JSON types, such as enabling inverted indexing on specific fields within JSON objects to accelerate queries and introducing functions similar to Databend JQ function to simplify data processing.
About Greptime
Greptime offers industry-leading time series database products and solutions to empower IoT and Observability scenarios, enabling enterprises to uncover valuable insights from their data with less time, complexity, and cost.
GreptimeDB is an open-source, high-performance time-series database offering unified storage and analysis for metrics, logs, and events. Try it out instantly with GreptimeCloud, a fully-managed DBaaS solution—no deployment needed!
The Edge-Cloud Integrated Solution combines multimodal edge databases with cloud-based GreptimeDB to optimize IoT edge scenarios, cutting costs while boosting data performance.
Star us on GitHub or join GreptimeDB Community on Slack to get connected.