AUTOMATING GITLAB RUNNERS WITH TERRAFORM
GitLab Runners execute the jobs in GitLab CI/CD pipelines. While you can use GitLab's shared runners, many organizations choose to use self-managed runners for various reasons.
About
This post will go over deploying GitLab Runners with Terraform, using several different Terraform providers: Helm, GitLab, and Kubernetes.
Provider setup
In this scenario, we will assume you already have an EKS cluster set up in your AWS account, and that we have created a GitLab Group Access Token at our top-level group with the 'Owner' role and the API scope.
The first thing we need to do is set up our providers. I have the AWS CLI installed to fetch a temporary token from my EKS cluster.
Let's declare our providers (you'll notice I have some data sources declared; we'll cover those further down):
terraform {
  required_providers {
    helm = {
      source  = "hashicorp/helm"
      version = "<= 2.5.1"
    }
    gitlab = {
      source  = "gitlabhq/gitlab"
      version = "< 4.0.0"
    }
  }
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)
  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    args        = ["eks", "get-token", "--cluster-name", module.eks.cluster_id]
  }
}

provider "helm" {
  kubernetes {
    host                   = data.aws_eks_cluster.cluster.endpoint
    cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)
    exec {
      api_version = "client.authentication.k8s.io/v1beta1"
      command     = "aws"
      args        = ["eks", "get-token", "--cluster-name", module.eks.cluster_id]
    }
  }
}

provider "gitlab" {
  token    = var.gitlab_token
  base_url = "https://gitlab.com/api/v4/"
}
Provider notes
You will need the AWS CLI v2 installed wherever you invoke this pipeline. The pipeline will also need credentials for your AWS account and access to your EKS cluster, including membership in the system:masters group in Kubernetes.
Variables
Notes: You can declare the gitlab_token variable in your variables file. It should be marked sensitive and inferred either from an environment variable exported locally or from a masked CI/CD variable in GitLab. The base_url will also differ depending on your setup. Additionally, I find it handy to define the chart_version variable value at the GitLab group level. That way, I can update the variable in one spot and re-run my Terraform pipelines for my runners to pick up the update, which is very nice if you have several deployments across one GitLab instance.
Let’s create a variables.tf file and add the following:
variable "gitlab_group_id" {
  type        = number
  description = "Gitlab group ID to register runners against"
  default     = null
}

variable "gitlab_token" {
  type      = string
  sensitive = true
  default   = ""
}

variable "chart_version" {
  type    = string
  default = ""
}
GitLab Runner Helm Chart
Here we are going to define our GitLab Runner Helm chart. We can infer the registration token by using the GitLab provider in a data source. I do this so my code is flexible: I can register to any GitLab group in my organization just by passing the numeric group ID in the gitlab_group_id variable. The below is an example for my lab; you'll want to adjust it for your specific environment. I add an empty-dir volume for Docker so my pipelines can support Docker-in-Docker (DinD) using TLS. You'll also notice I added the following annotation:
"cluster-autoscaler.kubernetes.io/safe-to-evict" = "false"
My EKS cluster uses the Kubernetes Cluster Autoscaler (deployed separately), and I don't want it to scale down nodes, and thereby kill pods, while I have CI/CD jobs running. In my use case, I am running Terraform pipelines, and there is nothing like having a job killed in the middle of a terraform apply. It can be a real mess to clean up.
Important note on autoscaling and Node Groups in AWS availability zones:
By default, AWS will re-balance your node groups if they span multiple AZs, which can also be disruptive to running pods. My setup ended up specifying a node group per AZ so this re-balancing doesn't occur, which still meets the availability/resiliency requirements in this scenario. This behavior can be changed on your launch template; however, having node groups assigned to a specific AZ was easier to manage in the long run.
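With the public terraform-aws-modules/eks module, pinning one managed node group per AZ looks roughly like this. The group names, subnet references, sizes, and instance type are illustrative, not from my actual setup:

```hcl
# One managed node group per AZ: each group gets exactly one subnet,
# so the AZ re-balancing described above never kicks in.
eks_managed_node_groups = {
  runners_az_a = {
    subnet_ids     = [module.vpc.private_subnets[0]] # single AZ
    min_size       = 0
    max_size       = 3
    desired_size   = 1
    instance_types = ["m5.large"]
  }
  runners_az_b = {
    subnet_ids     = [module.vpc.private_subnets[1]] # single AZ
    min_size       = 0
    max_size       = 3
    desired_size   = 1
    instance_types = ["m5.large"]
  }
}
```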
resource "kubernetes_namespace" "gitlab_runner" {
  metadata {
    name = "gitlab-runner"
  }
}

resource "helm_release" "gitlab-runner" {
  name       = "gitlab-runner"
  repository = "https://charts.gitlab.io"
  chart      = "gitlab-runner"
  version    = var.chart_version
  namespace  = kubernetes_namespace.gitlab_runner.metadata[0].name
  wait       = true
  lint       = true

  values = [
    yamlencode(
      {
        imagePullPolicy               = "IfNotPresent"
        probeTimeoutSeconds           = 5
        replicas                      = 2
        gitlabUrl                     = "https://gitlab.com"
        runnerRegistrationToken       = data.gitlab_group.gitlab.runners_token
        unregisterRunners             = true
        terminationGracePeriodSeconds = 3600
        concurrent                    = 50
        checkInterval                 = 30
        logLevel                      = "info"
        rbac = {
          create            = true
          clusterWideAccess = true
        }
        runners = {
          runUntagged = false
          name        = "gitlab runners"
          tags        = "sandbox"
          config      = <<-EOF
            [[runners]]
              [runners.kubernetes]
                privileged = true
                poll_timeout = 180
                service_account = "gitlab-runner"
                [[runners.kubernetes.volumes.empty_dir]]
                  name = "docker-certs"
                  mount_path = "/certs/client"
                  medium = "Memory"
                [runners.kubernetes.pod_annotations]
                  "cluster-autoscaler.kubernetes.io/safe-to-evict" = "false"
              [runners.cache]
                Type = "s3"
                Shared = true
                Path = "gitlab_cache/"
                [runners.cache.s3]
                  ServerAddress = "s3.amazonaws.com"
                  BucketName = "mygitlab-cache"
                  BucketLocation = "us-east-2"
                  Insecure = false
          EOF
        }
        podAnnotations = {
          "cluster-autoscaler.kubernetes.io/safe-to-evict" = "false"
        }
      }
    )
  ]
}
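Once the release is up, jobs target these runners through the tag configured above. A minimal .gitlab-ci.yml sketch is shown below; the job name and image are made up for illustration:

```yaml
# Hypothetical job that will be picked up by the runners tagged "sandbox"
terraform-plan:
  image: hashicorp/terraform:latest
  tags:
    - sandbox
  script:
    - terraform init
    - terraform plan
```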
Data sources
We will declare some data sources to connect our providers and grab our GitLab group ID. In our case, we are using the public Terraform EKS module; you can read its documentation for deploying it and wiring up the proper providers. In my example, I have named my module 'eks', although that code is not shown in this post.
data "gitlab_group" "gitlab" {
  group_id = var.gitlab_group_id
}

data "aws_eks_cluster" "cluster" {
  name = module.eks.cluster_id
}

data "aws_eks_cluster_auth" "cluster" {
  name = module.eks.cluster_id
}

data "tls_certificate" "eks" {
  url = data.aws_eks_cluster.cluster.identity[0].oidc[0].issuer
}
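For context, the eks module those data sources reference comes from the public terraform-aws-modules/eks module. A bare-bones sketch might look like the following; the cluster name, version, and VPC wiring are illustrative, not from the original deployment:

```hcl
# Minimal sketch of the module these data sources reference.
# Assumes a separate VPC module; all values here are placeholders.
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 18.0"

  cluster_name    = "gitlab-runners"
  cluster_version = "1.22"
  vpc_id          = module.vpc.vpc_id
  subnet_ids      = module.vpc.private_subnets
}
```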
Final Thoughts
I find it easier to set up purpose-built EKS clusters per business unit/team. AWS charges you for the control plane and compute; however, you can scale your node groups down to zero with a Lambda function during off hours to really save some money. That way, you're only paying the control-plane cost, plus compute for the times you are actually running GitLab jobs.
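A sketch of that scale-to-zero Lambda, assuming a managed node group and the standard EKS UpdateNodegroupConfig API via boto3. The cluster and node group names are placeholders, and the helper function is a convenience I introduced for this example:

```python
def scaling_config(desired_size):
    """Build an EKS node group scaling config.

    maxSize must stay >= 1 even when scaling desired/min down to zero.
    """
    return {
        "minSize": desired_size,
        "maxSize": max(desired_size, 1),
        "desiredSize": desired_size,
    }


def lambda_handler(event, context):
    # boto3 is imported inside the handler so the pure helper above
    # stays usable without the AWS SDK installed.
    import boto3

    eks = boto3.client("eks")
    # Placeholder names -- substitute your cluster and node group.
    eks.update_nodegroup_config(
        clusterName="gitlab-runners",
        nodegroupName="runners-az-a",
        scalingConfig=scaling_config(event.get("desired_size", 0)),
    )
```

Schedule it with two EventBridge rules, one invoking it with desired_size 0 in the evening and one with your normal size in the morning.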
This post did not go in depth on setting resource requests and limits. Perhaps a later post will cover that topic, as it will impact your node instance types and group sizes.
You can have your GitLab pipeline manage itself with this code, as long as you don't destroy the deployment or the EKS cluster! (Don't ask me how I know that…) The Helm chart is set up so that new pods are ready before the old ones are removed, and it won't kill any job pods until they have finished running their current jobs.