AUTOMATING GITLAB RUNNERS WITH TERRAFORM
GitLab Runners execute the jobs in GitLab CI/CD pipelines. While you can use GitLab's shared runners, many organizations choose to use self-managed runners for various reasons.
About
This post will go over deploying GitLab Runners with Terraform, using several different Terraform providers: Helm, GitLab, and Kubernetes.
Provider setup
In this scenario, we will assume you already have an EKS cluster set up in your AWS account, and that we have created a GitLab Group Access Token at our top-level group with the 'Owner' role and the API scope.
The first thing we need to do is set up our providers. I have the AWS CLI installed to fetch a temporary token from my EKS cluster.
Let's declare our providers (you'll notice I have some data sources declared; we'll cover those further down):
terraform {
  required_providers {
    helm = {
      source  = "hashicorp/helm"
      version = "<= 2.5.1"
    }
    gitlab = {
      source  = "gitlabhq/gitlab"
      version = "< 4.0.0"
    }
  }
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)
  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    args        = ["eks", "get-token", "--cluster-name", module.eks.cluster_id]
  }
}

provider "helm" {
  kubernetes {
    host                   = data.aws_eks_cluster.cluster.endpoint
    cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)
    exec {
      api_version = "client.authentication.k8s.io/v1beta1"
      command     = "aws"
      args        = ["eks", "get-token", "--cluster-name", module.eks.cluster_id]
    }
  }
}

provider "gitlab" {
  token    = var.gitlab_token
  base_url = "https://gitlab.com/api/v4/"
}
Provider notes
You will need the AWS CLI v2 installed wherever you invoke this pipeline. The pipeline will also need credentials for your AWS account and access to your EKS cluster, including membership in the system:masters group in Kubernetes.
Variables
Notes: You can declare the gitlab_token variable in your variables file. It should be marked sensitive and inferred either from an environment variable exported locally or from a masked CI/CD variable in GitLab. The base_url will also differ depending on your setup. Additionally, I find it handy to define the chart_version variable value at the GitLab group level. That way, I can update the variable in one spot and re-run my Terraform pipelines for my runners to pick up the update, which is very nice if you have several deployments across one GitLab instance.
Let’s create a variables.tf file and add the following:
variable "gitlab_group_id" {
  type        = number
  description = "Gitlab group ID to register runners against"
  default     = null
}

variable "gitlab_token" {
  type      = string
  sensitive = true
  default   = ""
}

variable "chart_version" {
  type    = string
  default = ""
}
GitLab Runner Helm Chart
Here we are going to define our GitLab Runner Helm chart. We can infer the registration token by using the GitLab provider in a data source. I do this so my code is flexible: I can register to any GitLab group in my organization just by passing the numeric group ID in the gitlab_group_id variable. The below is an example for my lab; you'll want to adjust it for your specific environment. I add an empty-dir volume for Docker so my pipelines can support Docker-in-Docker (DinD) using TLS. You'll also notice I added the following annotation:
"cluster-autoscaler.kubernetes.io/safe-to-evict" = "false"
My EKS cluster uses the Kubernetes Cluster Autoscaler (deployed separately), and I don't want it to scale down nodes, and thereby kill pods, while I have CI/CD jobs running. In my use case, I am running Terraform pipelines, and there is nothing like having a job killed in the middle of a terraform apply. It can be a real mess to clean up.
Important note on autoscaling and Node Groups in AWS availability zones:
By default, AWS will re-balance your node groups if they span multiple AZs, which can also be disruptive to running pods. My setup ended up specifying a node group per AZ so this re-balancing doesn't occur, which still meets the availability/resiliency requirements in this scenario. This behavior can be changed on your launch template; however, having node groups assigned to a specific AZ was easier to manage in the long run.
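With the public terraform-aws-modules/eks module, pinning one managed node group per AZ looks roughly like this. The group names, subnet references, sizes, and instance type are illustrative, not from my actual setup:

```hcl
# One managed node group per AZ: each group gets exactly one subnet,
# so the AZ re-balancing described above never kicks in.
eks_managed_node_groups = {
  runners_az_a = {
    subnet_ids     = [module.vpc.private_subnets[0]] # single AZ
    min_size       = 0
    max_size       = 3
    desired_size   = 1
    instance_types = ["m5.large"]
  }
  runners_az_b = {
    subnet_ids     = [module.vpc.private_subnets[1]] # single AZ
    min_size       = 0
    max_size       = 3
    desired_size   = 1
    instance_types = ["m5.large"]
  }
}
```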
resource "kubernetes_namespace" "gitlab_runner" {
  metadata {
    name = "gitlab-runner"
  }
}

resource "helm_release" "gitlab-runner" {
  name       = "gitlab-runner"
  repository = "https://charts.gitlab.io"
  chart      = "gitlab-runner"
  version    = var.chart_version
  namespace  = kubernetes_namespace.gitlab_runner.metadata[0].name
  wait       = true
  lint       = true

  values = [
    yamlencode(
      {
        imagePullPolicy               = "IfNotPresent"
        probeTimeoutSeconds           = 5
        replicas                      = 2
        gitlabUrl                     = "https://gitlab.com"
        runnerRegistrationToken       = data.gitlab_group.gitlab.runners_token
        unregisterRunners             = true
        terminationGracePeriodSeconds = 3600
        concurrent                    = 50
        checkInterval                 = 30
        logLevel                      = "info"
        rbac = {
          create            = true
          clusterWideAccess = true
        }
        runners = {
          runUntagged = false
          name        = "gitlab runners"
          tags        = "sandbox"
          config      = <<-EOF
            [[runners]]
              [runners.kubernetes]
                privileged = true
                poll_timeout = 180
                service_account = "gitlab-runner"
                [[runners.kubernetes.volumes.empty_dir]]
                  name = "docker-certs"
                  mount_path = "/certs/client"
                  medium = "Memory"
                [runners.kubernetes.pod_annotations]
                  "cluster-autoscaler.kubernetes.io/safe-to-evict" = "false"
              [runners.cache]
                Type = "s3"
                Shared = true
                Path = "gitlab_cache/"
                [runners.cache.s3]
                  ServerAddress = "s3.amazonaws.com"
                  BucketName = "mygitlab-cache"
                  BucketLocation = "us-east-2"
                  Insecure = false
          EOF
        }
        podAnnotations = {
          "cluster-autoscaler.kubernetes.io/safe-to-evict" = "false"
        }
      }
    )
  ]
}
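Once the release is up, jobs target these runners through the tag configured above. A minimal .gitlab-ci.yml sketch is shown below; the job name and image are made up for illustration:

```yaml
# Hypothetical job that will be picked up by the runners tagged "sandbox"
terraform-plan:
  image: hashicorp/terraform:latest
  tags:
    - sandbox
  script:
    - terraform init
    - terraform plan
```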
Data sources
We will declare some data sources to connect our providers and grab our GitLab group ID. In our case, we are using the public Terraform EKS module; you can read its documentation for deploying it and wiring up the proper providers. In my example, I have named my module 'eks', although that code is not shown in this post.
data "gitlab_group" "gitlab" {
  group_id = var.gitlab_group_id
}

data "aws_eks_cluster" "cluster" {
  name = module.eks.cluster_id
}

data "aws_eks_cluster_auth" "cluster" {
  name = module.eks.cluster_id
}

data "tls_certificate" "eks" {
  url = data.aws_eks_cluster.cluster.identity[0].oidc[0].issuer
}
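For context, the eks module those data sources reference comes from the public terraform-aws-modules/eks module. A bare-bones sketch might look like the following; the cluster name, version, and VPC wiring are illustrative, not from the original deployment:

```hcl
# Minimal sketch of the module these data sources reference.
# Assumes a separate VPC module; all values here are placeholders.
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 18.0"

  cluster_name    = "gitlab-runners"
  cluster_version = "1.22"
  vpc_id          = module.vpc.vpc_id
  subnet_ids      = module.vpc.private_subnets
}
```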
Final Thoughts
I find it easier to set up purpose-built EKS clusters per business unit/team. AWS charges you for the control plane and compute; however, you can scale your node groups down to zero with a Lambda function during off hours to really save some money. That way, you're only paying the control-plane cost, plus compute for the times you are actually running GitLab jobs.
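A sketch of that scale-to-zero Lambda, assuming a managed node group and the standard EKS UpdateNodegroupConfig API via boto3. The cluster and node group names are placeholders, and the helper function is a convenience I introduced for this example:

```python
def scaling_config(desired_size):
    """Build an EKS node group scaling config.

    maxSize must stay >= 1 even when scaling desired/min down to zero.
    """
    return {
        "minSize": desired_size,
        "maxSize": max(desired_size, 1),
        "desiredSize": desired_size,
    }


def lambda_handler(event, context):
    # boto3 is imported inside the handler so the pure helper above
    # stays usable without the AWS SDK installed.
    import boto3

    eks = boto3.client("eks")
    # Placeholder names -- substitute your cluster and node group.
    eks.update_nodegroup_config(
        clusterName="gitlab-runners",
        nodegroupName="runners-az-a",
        scalingConfig=scaling_config(event.get("desired_size", 0)),
    )
```

Schedule it with two EventBridge rules, one invoking it with desired_size 0 in the evening and one with your normal size in the morning.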
This post did not go in depth on setting resource requests and limits. Perhaps a later post will cover that topic, as it will impact your node instance types and group sizes.
You can have your GitLab pipeline manage itself with this code, as long as you don't destroy the deployment or the EKS cluster! (Don't ask me how I know that…) The Helm chart is set up so that new pods are ready before the old ones are removed, and it won't kill any job pods until they have finished running their current jobs.