When we talk about GitOps, we have to begin with IaC (Infrastructure as Code). IaC is not a new idea; it has been around for quite some time. We already use IaC in our daily development, often without noticing, for example:
Using Vagrantfile to create virtual machines;
Using Vimscript to configure a personalized Vim editor;
Using Dockerfile to build containers;
And so on. All of these tools are essentially IaC. As the name suggests, the core of IaC is to build infrastructure from code, whether that is a single container image or an extremely large and complex cluster, so that it can be written once and deployed anywhere.
With the arrival of cloud computing, developers urgently need IaC to describe resources on public clouds. Public cloud vendors such as AWS and GCP provide resource-based APIs for their cloud services, making it convenient for users to define their infrastructure through APIs. At the same time, higher-level IaC tools like Terraform and Pulumi build more concise abstractions on top of public cloud APIs, giving developers better IaC options for defining cloud resources.
In today's cloud-native era, Kubernetes (K8s) has become the de facto standard for container orchestration. Many services run as containers on K8s, and the native declarative, resource-oriented API of K8s lends itself naturally to YAML as a configuration language. We can use YAML as an "assembly language" to describe applications in K8s, making YAML the IaC for defining K8s services.
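To make the idea of describing a K8s service as data concrete, here is a minimal sketch in Python. It builds a Deployment manifest as a plain dictionary and prints it as JSON, which K8s accepts alongside YAML. The function name `deployment_manifest`, the service name `demo-web`, and the image `nginx:1.25` are illustrative assumptions, not part of our actual configuration.

```python
import json


def deployment_manifest(name: str, image: str, replicas: int = 1) -> dict:
    """Build a minimal apps/v1 Deployment manifest as plain data.

    Hypothetical helper for illustration only.
    """
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            # The selector must match the labels on the pod template.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


if __name__ == "__main__":
    # Example values are made up; the output could be saved and applied to a cluster.
    manifest = deployment_manifest("demo-web", "nginx:1.25", replicas=2)
    print(json.dumps(manifest, indent=2))
```

Because the manifest is just data, it can be versioned, reviewed, and tested like any other code, which is exactly what GitOps relies on.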
GitOps is a DevOps workflow built on top of IaC. Without IaC, it is not possible to implement GitOps. The key elements of GitOps are:
All infrastructure configurations must use IaC;
Use Git to manage IaC code and adopt its workflows. We can use Pull Requests to submit and review changes, and Git history to audit them.
In short: GitOps lets us maintain our infrastructure as a purely code-based project.
The following are key reasons why we use GitOps:
Single Source of Truth: All configuration details of the infrastructure can be maintained through code, thus avoiding commonly encountered configuration problems such as configuration drift, snowflake instances, and so on.
Solving Various CI Problems in Code: We can use various programming methods to solve problems in continuous integration, such as configuring different CI plugins for change inspections and writing test code for configuration code. This approach provides great flexibility for continuous integration.
Scalable Operations Capabilities: We no longer need ClickOps because code is the best lever for developers, enabling us to manage larger clusters at a lower cost.
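As one illustration of "writing test code for configuration code", the sketch below shows the kind of check a CI job could run against a Deployment manifest before a change is merged. The two rules (pin an image tag, set resource limits) and the function name `check_deployment` are assumptions chosen for the example, not a description of our actual CI plugins.

```python
def check_deployment(manifest: dict) -> list[str]:
    """Return human-readable policy violations; an empty list means the manifest passes.

    Hypothetical rules for illustration: every container must pin an image
    tag other than 'latest' and must declare resource limits.
    """
    problems = []
    name = manifest.get("metadata", {}).get("name", "<unnamed>")
    pod_spec = manifest.get("spec", {}).get("template", {}).get("spec", {})
    containers = pod_spec.get("containers", [])
    if not containers:
        problems.append(f"{name}: no containers defined")
    for c in containers:
        image = c.get("image", "")
        # Only a ":" in the last path segment marks a tag; an earlier ":"
        # belongs to a registry port such as "registry:5000/app".
        last_segment = image.rsplit("/", 1)[-1]
        tag = image.rsplit(":", 1)[1] if ":" in last_segment else ""
        if tag in ("", "latest"):
            problems.append(f"{name}/{c.get('name')}: image must pin a tag other than 'latest'")
        if "limits" not in c.get("resources", {}):
            problems.append(f"{name}/{c.get('name')}: resource limits are missing")
    return problems


if __name__ == "__main__":
    # Made-up manifest that violates both rules.
    candidate = {
        "metadata": {"name": "web"},
        "spec": {"template": {"spec": {"containers": [{"name": "web", "image": "nginx"}]}}},
    }
    for problem in check_deployment(candidate):
        print(problem)
```

A CI job could fail the Pull Request whenever the returned list is non-empty, so that problems are caught during review rather than after deployment.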
Key Technical Decisions
Greptime is a start-up company dedicated to building next-generation cloud-native time series databases. We have no historical operational burden, so we have enthusiastically embraced GitOps from day one.
We follow a few simple principles in designing GitOps:
GitOps First: We implemented the most critical elements of GitOps first, then iterated and improved continuously. Treating IaC projects as code projects that require continuous iteration allows for initial imperfections while ensuring sufficient flexibility for later iterations.
Simple and pragmatic: We avoid using complex technical solutions. Instead, we choose relatively mature, less encapsulated and easy-to-maintain technologies.
Therefore, we have made the following key technical decisions:
Monorepo: We maintain our cloud infrastructure and K8s service code in one repository called greptime-config. We only maintain one timeline, represented by the main branch, which consistently reflects the current state of our infrastructure. The entire repository is open to all Greptime internal developers, and anyone can submit code to modify the infrastructure as long as it passes the owner's code review. The repository does not store any sensitive data in plain text: all confidential information is either encrypted by the sealed-secrets service or stored as GitHub Secrets in advance.
Use Terraform to describe cloud infrastructure: In our view, Terraform is the IaC tool with the best ecosystem for describing cloud resources. HCL, its DSL, is relatively simple and easy to use. We also use Terragrunt to simplify our Terraform code. With Terraform, we can easily manage cloud resources such as EKS, RDS, S3, load balancers, etc.
Use Helm to package K8s services: Helm is essentially a client-side rendering engine for YAML. It adds only a thin layer of abstraction over K8s native YAML, so its usability leaves something to be desired. However, Helm is currently the de facto standard for packaging services within the K8s ecosystem, and its overall logic is straightforward, so we continue to use Helm for service packaging.
Use Argo CD for K8s service deployment: Argo CD is an open-source project that reached graduated status in the CNCF this year. It can watch our code repository, deploy changes in real time, and keep the running configuration aligned with what is in the repository. Argo CD has a user-friendly web UI and an active community, and its overall architecture is easy to maintain. We employ a single Argo CD meta-cluster to manage services across multiple K8s clusters in different regions.
GitHub Actions: Since we already use GitHub, GitHub Actions is a natural choice. We can easily integrate a range of functions by configuring different actions. Once our code is merged into the main branch, GitHub Actions takes care of the deployment.
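The "configuration alignment" idea behind the deployment decisions above can be sketched as a comparison between the desired state in Git and the live state in a cluster. The code below is a toy illustration of that general reconcile idea, not Argo CD's actual algorithm; the function name `plan_sync` and the resource keys are hypothetical.

```python
def plan_sync(desired: dict, live: dict) -> dict:
    """Compare desired state (from Git) with live state (from the cluster).

    Both arguments map a resource key such as "Deployment/web" to its spec.
    Returns the keys that would need to be created, updated, or deleted to
    make the live state match the desired state.
    """
    return {
        "create": sorted(k for k in desired if k not in live),
        "update": sorted(k for k in desired if k in live and desired[k] != live[k]),
        "delete": sorted(k for k in live if k not in desired),
    }


if __name__ == "__main__":
    # Made-up states for illustration.
    desired = {"Deployment/web": {"replicas": 3}, "Service/web": {"port": 80}}
    live = {"Deployment/web": {"replicas": 2}, "Deployment/old": {"replicas": 1}}
    print(plan_sync(desired, live))
```

Running such a comparison repeatedly, and applying the resulting plan each time, is what keeps the main branch an accurate reflection of the infrastructure: any manual change to the cluster shows up as a difference and is reverted.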
The overall Greptime GitOps architecture can be illustrated as follows based on the above discussion:
We believe that GitOps can become the cornerstone of xOps:
We use tools like Infracost to monitor cloud expenses, thereby implementing efficient FinOps;
We can easily build safeguards around changes by developing various CI plugins that check, and if necessary block, changes when users submit code. Based on this pattern, it will also be easy for us to load security rules in CI and implement DevSecOps in the future;
Integrating with IM tools, for example through a Slack bot, allows us to practice ChatOps. We can take it a step further by introducing large language models (LLMs) like ChatGPT to assist with daily operations, for example by having ChatGPT help generate configurations and then automatically initiate code reviews.
GitOps is an imaginative DevOps practice, and we will continue to explore more possibilities for GitOps in the future.