โœ•

Join us for a virtual meetup on Zoom at 8 PM, July 31 (PDT) about using One Time Series Database for Both Metrics and Logs ๐Ÿ‘‰๐Ÿป Register Now

โœ•
Skip to content
On this page
Contributor Stories
โ€ข
July 10, 2024

Enhancing the SQL Interval syntax: A story of open-source contribution

Eugene Tolbakov is the first individual committer of the GreptimeDB community. In this post, he shares the experience of why people spend time and energy to contribute to open-source projects. With an example to enhance the SQL Interval syntax for GreptimeDB, Eugene showcases how smooth it could be if you collaborate in the GreptimeDB community.

There are many reasons why developers dive into the world of Open Source contributions.

Contributing can be a way to give back to the community and use your skills for the greater good. It's a fantastic environment that allows you to network with talented developers, build relationships, and potentially find mentors or collaborators. For those seeking career advancement, contributions become a public portfolio showcasing your skills and experience. Sometimes, it's about a personal itch! You might encounter a bug or limitation in a project you rely on, and contributing a fix not only solves your own frustration but benefits everyone who uses that software.

The recipe for a successful Open Source contribution goes beyond just code. A strong desire to learn fuels the journey, as you navigate unfamiliar codebases and tackle new challenges. But learning flourishes best within a supportive environment.

A responsive community acts as a safety net, offering guidance and feedback to help you refine your contributions. Ideally, you can also "dogfood" the tool you contribute to, using it in your own work or personal projects. This firsthand experience provides valuable context for your contributions and ensures they address real-world needs. With these elements in place, you're well on your way to making a lasting impact on the Open Source community.

While I've been building my software development skills for a while now, Rust still feels like a whole new world to explore. This "newbie" feeling might make some shy away from contribution, fearing their code won't be good enough. However, I use silly mistakes as stepping stones to improve my skills, not a reason to feel discouraged.

Over a year of contributing to GreptimeDB has been a rewarding journey filled with learning experiences. Today, I'd like to walk you through one of those. Let's get our hands dirty (or should I say, claws? ๐Ÿฆ€)

Motivation โ€‹

This time we will be enhancing the Interval syntax by allowing a shortened version:

sql
select interval '1h';

SQL standard defines syntax like:

sql
select interval '1 hour';

Now, we would like both query syntax work, i.e., select interval '1 hour' and select interval '1h' should give you the same result.

Diving right into the code, I discovered that the core functionality for handling transformations already exists. The scope of a change boils down to adding a new rule specifically for the Interval data type: intervals with abbreviated time formats will be automatically expanded to their full versions. Let's take a closer look at a specific section of the code that does the logic:

rust
fn visit_expr(&self, expr: &mut Expr) -> ControlFlow<()> {
  match expr {
    Expr::Interval(interval) => match *interval.value.clone() {
        Expr::Value(Value::SingleQuotedString(value)) => {
            if let Some(data) = expand_interval_name(&value) {
                *expr = create_interval_with_expanded_name(
                    interval,
                    single_quoted_string_expr(data),
                );
            }
        }
        Expr::BinaryOp { left, op, right } => match *left {
            Expr::Value(Value::SingleQuotedString(value))=> {
                if let Some(data) = expand_interval_name(&value) {
                    let new_value = Box::new(Expr::BinaryOp {
                        left: single_quoted_string_expr(data),
                        op,
                        right,
                    });
                    *expr = create_interval_with_expanded_name(interval, new_value);
                }
// ...

Code Review โ€‹

An experienced Rust developer, Dennis, quickly identified an area for improvement. This is what exactly happened:

Dennis Comment
Dennis spotted an improvement opportunity

Code review shines as a learning tool.

Beyond simply accepting the suggested change (though the rationale for "efficiency" was clear!), I decided to take a deep dive. By analyzing the proposed improvement and attempting to explain it myself, I solidified my understanding of the code and best practices in Rust.

Avoiding unnecessary cloning and ownership transfers. โ€‹

Originally, I used .clone() on interval.value:

rust
match *interval.value.clone() { ... }

Cloning here creates a new instance of the data each time, which can be inefficient if the data structure is large or cloning is expensive. The suggested version avoids this by using references:

rust
match &*interval.value { ... }

Matching on a reference (&*interval.value) eliminates the cost of cloning. The same improvement is applied to the match on left in the binary operation case:

rust
match &**left { ... }

This one is slightly more involved: It uses a double dereference to get a reference to the value inside a Box, which is more efficient than cloning.

Clearer Pattern Matching โ€‹

Using references in pattern matching makes emphasise the intention of only reading data, not transferring ownership:

rust
match &*interval.value { ... }

This shows explicitly that the matched value is not being moved. It help with reasoning about the code, especially in a context with complex ownership and borrowing rules.

Reduced Cloning inside the Binary Operation โ€‹

In the original code, the op and right fields of the Expr::BinaryOp variant are cloned unconditionally:

rust
let new_value = Box::new(Expr::BinaryOp {
    left: single_quoted_string_expr(data),
    op,
    right,
});

However, they only need to be cloned if the left field is an Expr::Value variant with a string value. The suggested enhancement moves the cloning inside the if let block, so it only happens when necessary:

rust
let new_value = Box::new(Expr::BinaryOp {
    left: single_quoted_string_expr(data),
    op: op.clone(),
    right: right.clone(),
});

Using references instead of cloning โ€‹

In the original code, expand_interval_name(&value) is used, which borrows value.

However, value is of type String, which implements the Deref<Target=str> trait (more information could be found here). This means that it can be automatically dereferenced to a &str when necessary. The improved version uses expand_interval_name(value), which avoids the need for explicit reference.

Summary โ€‹

So in the context of this change, "efficiency" stands for:

  • Avoiding unnecessary cloning, reducing overhead.
  • Making the borrowing and ownership patterns clearer and safer.
  • Enhancing overall readability and maintainability.

This is how the visit_expr function looks like after all suggestions have been applied:

rust
fn visit_expr(&self, expr: &mut Expr) -> ControlFlow<()> {
  match expr {
      Expr::Interval(interval) => match &*interval.value {
        Expr::Value(Value::SingleQuotedString(value))
        | Expr::Value(Value::DoubleQuotedString(value)) => {
            if let Some(normalized_name) = normalize_interval_name(value) {
                *expr = update_existing_interval_with_value(
                    interval,
                    single_quoted_string_expr(normalized_name),
                );
            }
        }
        Expr::BinaryOp { left, op, right } => match &**left {
            Expr::Value(Value::SingleQuotedString(value))
            | Expr::Value(Value::DoubleQuotedString(value)) => {
                if let Some(normalized_name) = normalize_interval_name(value) {
                    let new_expr_value = Box::new(Expr::BinaryOp {
                        left: single_quoted_string_expr(normalized_name),
                        op: op.clone(),
                        right: right.clone(),
                    });
                    *expr = update_existing_interval_with_value(interval, new_expr_value);
                }
            }
            _ => {}
        },
        _ => {}
    },
// ...

Open Source contribution has been an incredible way to accelerate my Rust learning curve. This contribution to GreptimeDB is just one example that illustrates how I gain knowledge through open source contributions. And depending on your feedback, I'm excited to share more of these learning experiences in future posts!

A huge thanks to the entire Greptime team, especially Dennis, for their support and guidance throughout my contributions. Let's keep the contributing and learning going!

community
sql

Join our community

Get the latest updates and discuss with other users.