This post is the next in the series on how I overhauled my personal infrastructure to make it easier to manage, make changes to, and integrate new applications into.

Previous posts in the series are: -

This post will cover using Drone to manage and deploy resources with Terraform.

Background

When I first started using Drone, I wasn’t using it with Terraform; in fact, very little of my infrastructure was managed by Terraform at all. Recently though, I decided to start moving most external services to it. In keeping with my theme of automating my infrastructure and avoiding manual changes, it made sense to finally start managing my personal usage of external services with Terraform.

Ansible and Salt still control what runs on my machines (including VPSs/instances on cloud providers), but managing external services makes a lot of sense with Terraform. While Ansible and Salt are good at making changes, they don’t manage “desired state” very well, in the sense that Ansible can create a VPS on Hetzner, but it does not know that one may already exist.

Terraform, by contrast, knows what resources it manages (using State management) and whether they should already exist. Terraform may still be incorrect about the state of a resource, but usually only if the resource changed since the last time Terraform ran. Something like Ansible will just try to make the same changes it did last time, which could lead to duplicate resources, or the Playbook failing because it can’t create a resource that already exists.
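
To illustrate, the below is a minimal sketch (the server name, type and image are hypothetical, not taken from my code) of a Hetzner Cloud server managed by Terraform: -

resource "hcloud_server" "example" {
  name        = "vps-example" # hypothetical name
  server_type = "cx11"
  image       = "debian-11"
}

Once this server exists in the state, running terraform plan again reports no changes, rather than attempting to create a duplicate.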

The following covers what I use: -

  • Cloudflare for external DNS
    • This includes externally accessible domains
    • It also includes “internal” domains I use that I want valid Let’s Encrypt certificates for
  • Digital Ocean and Hetzner for VPSs
    • My blog and other external services (RSS, Read-It-Later, Black Box Exporter for monitoring) are hosted on these
  • Google Cloud Platform and Oracle Cloud Infrastructure for the always-free virtual machines
  • Lab/Proxmox for testing technologies

For most of these, the steps are the same, with some minor differences in variables/secrets for each provider. In some cases, extra steps are required. I’ll first show the most basic pipeline (the one used for Cloudflare DNS) and then show some of the extra steps in others.

Full Drone Pipeline - Cloudflare

kind: pipeline
name: default
type: docker

trigger:
  branch:
    - main

steps:
  - name: Terraform FMT PR
    image: jmccann/drone-terraform:latest
    settings:
      actions:
        - fmt
      fmt_options:
        write: false
        diff: true
        check: true
    when:
      event:
      - pull_request

  - name: Terraform Plan
    image: jmccann/drone-terraform:latest
    settings:
      actions:
        - validate
        - plan
    environment:
      DIGITALOCEAN_TOKEN:
        from_secret: digitalocean_token 
      HCLOUD_TOKEN:
        from_secret: hcloud_token
      CLOUDFLARE_API_TOKEN:
        from_secret: cloudflare_api_token
    when:
      event:
      - pull_request

  - name: slack-pr
    image: plugins/slack
    settings:
      webhook:
        from_secret: drone_builds_slack_webhook 
      channel: builds
      template: >
        {{#success build.status}}
          {{repo.name}} PR build passed. 
          Merge in to apply.
          PR: https://git.noisepalace.co.uk/YetiOps/{{repo.name}}/pulls/{{build.pull}}
          Build: https://drone.noisepalace.co.uk/YetiOps/{{repo.name}}/{{build.number}}
        {{else}}
          {{repo.name}} PR build failed. 
          Please investigate. 
          PR: https://git.noisepalace.co.uk/YetiOps/{{repo.name}}/pulls/{{build.pull}}
          Build: https://drone.noisepalace.co.uk/YetiOps/{{repo.name}}/{{build.number}}
        {{/success}}            
    when:
      status:
      - failure
      - success
      event:
        - pull_request

  - name: slack-push-start
    image: plugins/slack
    settings:
      webhook:
        from_secret: drone_builds_slack_webhook 
      channel: builds
      template: >
        {{repo.name}} build is starting.
        Build: https://drone.noisepalace.co.uk/YetiOps/{{repo.name}}/{{build.number}}      
    when:
      branch:
      - main
      event:
      - push
      - tag
  
  - name: Terraform FMT
    image: jmccann/drone-terraform:latest
    settings:
      actions:
        - fmt
      fmt_options:
        write: false
        diff: true
        check: true
    when:
      branch:
      - main
      event:
      - push
      - tag


  - name: Terraform Apply
    image: jmccann/drone-terraform:latest
    settings:
      actions:
        - validate
        - plan
        - apply
    environment:
      DIGITALOCEAN_TOKEN:
        from_secret: digitalocean_token 
      HCLOUD_TOKEN:
        from_secret: hcloud_token
      CLOUDFLARE_API_TOKEN:
        from_secret: cloudflare_api_token
    when:
      branch:
      - main
      event:
      - push
      - tag

  - name: slack-push
    image: plugins/slack
    settings:
      webhook:
        from_secret: drone_builds_slack_webhook 
      channel: builds
      template: >
        {{#success build.status}}
          {{repo.name}} build passed.
          Build: https://drone.noisepalace.co.uk/YetiOps/{{repo.name}}/{{build.number}}
        {{else}}
          {{repo.name}} build {{build.number}} failed. Please investigate. 
          Build: https://drone.noisepalace.co.uk/YetiOps/{{repo.name}}/{{build.number}}
        {{/success}}            
    when:
      status:
      - failure
      - success
      branch:
      - main
      event:
      - push
      - tag

This pipeline is very similar to the one used for Ansible. This uses the same Slack notification steps, the same kind of triggers/conditionals for running steps, and runs as a single Docker-based pipeline (rather than the multiple pipelines required for Salt).

The bulk of the Terraform-specific steps in the pipeline use the Terraform Drone plugin. It is possible to use the official Hashicorp Terraform Docker image instead, but the Drone plugin is a little more convenient: with the official image you need to define each command you want to run, whereas the plugin reduces most commands to settings fields.
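
For comparison, a rough sketch of an equivalent plan step using the official image might look like the below (the image tag is an assumption, and the plugin also handles details like terraform init for you): -

- name: Terraform Plan (official image)
  image: hashicorp/terraform:1.0.0
  # plus the same environment secrets as in the plugin-based steps
  commands:
    - terraform init
    - terraform validate
    - terraform plan
  when:
    event:
    - pull_request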

Steps - Terraform FMT

- name: Terraform FMT PR
  image: jmccann/drone-terraform:latest
  settings:
    actions:
      - fmt
    fmt_options:
      write: false
      diff: true
      check: true
  when:
    event:
    - pull_request

This step is the same as running the terraform fmt command with the -diff and -check flags. While this isn’t a strictly necessary step, it does enforce consistent formatting rules across the Terraform files.

In diff and check mode, it doesn’t make any changes, but the step will fail if terraform fmt finds any required changes (as the command exits with a non-zero exit code).

Terraform Drone FMT step

Steps - Terraform Plan

  - name: Terraform Plan
    image: jmccann/drone-terraform:latest
    settings:
      actions:
        - validate
        - plan
    environment:
      DIGITALOCEAN_TOKEN:
        from_secret: digitalocean_token 
      HCLOUD_TOKEN:
        from_secret: hcloud_token
      CLOUDFLARE_API_TOKEN:
        from_secret: cloudflare_api_token
    when:
      event:
      - pull_request

This step runs terraform validate and terraform plan. Validate ensures that the Terraform configuration is syntactically correct (i.e. brackets in the correct place, variables defined correctly, etc.), and the plan stage shows the changes Terraform wants to make.

In this step, Drone exposes a set of environment variables, namely my Digital Ocean API token, my Hetzner Cloud API token and my Cloudflare API token. Terraform uses the Digital Ocean and Hetzner tokens to look up the public IPs of my VPSs within those providers, and the Cloudflare API token to read and write the DNS records via the Cloudflare API. An example of the relevant Terraform code is shown below: -

data.tf

data "hcloud_server" "hcloud-vps-a" {
  name = "vps-shme-hcloud-a"
}

data "hcloud_server" "hcloud-vps-b" {
  name = "vps-shme-hcloud-b"
}

data "digitalocean_droplet" "vps-shme" {
  name = "vps-shme"
}

main.tf

# VPS Records - IPv4

resource "cloudflare_record" "vps-a" {
  zone_id = local.this_zone_id
  name    = "vps-a"
  type    = "A"
  proxied = "false"
  value   = data.hcloud_server.hcloud-vps-a.ipv4_address
}

resource "cloudflare_record" "vps-b" {
  zone_id = local.this_zone_id
  name    = "vps-b"
  type    = "A"
  proxied = "false"
  value   = data.hcloud_server.hcloud-vps-b.ipv4_address
}

resource "cloudflare_record" "vps-shme" {
  zone_id = local.this_zone_id
  name    = "vps"
  type    = "A"
  proxied = "false"
  value   = data.digitalocean_droplet.vps-shme.ipv4_address
}

# VPS Records - IPv6

resource "cloudflare_record" "vps-a-v6" {
  zone_id = local.this_zone_id
  name    = "vps-a"
  type    = "AAAA"
  proxied = "false"
  value   = data.hcloud_server.hcloud-vps-a.ipv6_address
}

resource "cloudflare_record" "vps-b-v6" {
  zone_id = local.this_zone_id
  name    = "vps-b"
  type    = "AAAA"
  proxied = "false"
  value   = data.hcloud_server.hcloud-vps-b.ipv6_address
}

resource "cloudflare_record" "vps-shme-v6" {
  zone_id = local.this_zone_id
  name    = "vps"
  type    = "AAAA"
  proxied = "false"
  value   = data.digitalocean_droplet.vps-shme.ipv6_address
}

As you can see, this uses data sources to get information from Digital Ocean and Hetzner, and then applies the IPv4/IPv6 addresses as values in cloudflare_record resources.
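
For reference, a minimal sketch of the provider configuration that accompanies code like this (assuming the standard Terraform Registry sources) is below. Each provider picks up its token from the environment variables Drone exposes, so no credentials appear in the code itself: -

providers.tf

terraform {
  required_providers {
    cloudflare = {
      source = "cloudflare/cloudflare"
    }
    digitalocean = {
      source = "digitalocean/digitalocean"
    }
    hcloud = {
      source = "hetznercloud/hcloud"
    }
  }
}

# Credentials come from CLOUDFLARE_API_TOKEN, DIGITALOCEAN_TOKEN
# and HCLOUD_TOKEN respectively
provider "cloudflare" {}
provider "digitalocean" {}
provider "hcloud" {}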

I also create DNS records for instances within Google Cloud Platform and Oracle Cloud Infrastructure, but for these I use Terraform Remote State. The reason is that both providers require extra configuration and files (an API signing key for OCI, a JSON credentials file for GCP) that would add extra steps and complication to the pipeline. While this does mean that I am reliant on the Terraform Remote State being up to date (rather than a direct data source, as with Digital Ocean or Hetzner), I accept this trade-off to avoid the additional complexity.

The relevant Terraform for the GCP and OCI records are shown below: -

data.tf

data "terraform_remote_state" "oracle-ork" {
  backend = "consul"
  config = {
    address = "consul.noisepalace.co.uk"
    scheme  = "https"
    path    = "terraform/oraclecloud/oci"
  }
}

data "terraform_remote_state" "yetiops-goggle" {
  backend = "consul"
  config = {
    address = "consul.noisepalace.co.uk"
    scheme  = "https"
    path    = "terraform/gcp/yetiops-goggle"
  }
}

main.tf

resource "cloudflare_record" "vps-ork-01" {
  zone_id = local.this_zone_id
  name    = "ork-01"
  type    = "A"
  proxied = "false"
  value   = data.terraform_remote_state.oracle-ork.outputs.instance_public_ip
}

resource "cloudflare_record" "vps-gog-01" {
  zone_id = local.this_zone_id
  name    = "gog-01"
  type    = "A"
  proxied = "false"
  value   = data.terraform_remote_state.yetiops-goggle.outputs.gog-01-ipv4
}

These use Consul as the source for Remote State. More details on this are in the Remote State Backend with Consul section below.
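
For remote state data sources like these to work, the source repository must export the values as outputs. As a hedged sketch (the resource name is hypothetical), the gog-01-ipv4 output in the GCP repository would look something like the below: -

outputs.tf

output "gog-01-ipv4" {
  value = google_compute_instance.gog-01.network_interface[0].access_config[0].nat_ip
}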

The Drone step will run a plan against these resources (and all other defined resources), and show what changes need to be made (if any).

Terraform Drone Plan

Steps - Terraform Apply

  - name: Terraform Apply
    image: jmccann/drone-terraform:latest
    settings:
      actions:
        - validate
        - plan
        - apply
    environment:
      DIGITALOCEAN_TOKEN:
        from_secret: digitalocean_token 
      HCLOUD_TOKEN:
        from_secret: hcloud_token
      CLOUDFLARE_API_TOKEN:
        from_secret: cloudflare_api_token
    when:
      branch:
      - main
      event:
      - push
      - tag

This step is almost identical to the Plan stage. The only difference is that it also uses the apply action. This runs the validate and plan actions (to ensure that the code is still valid when merged into the main branch) and then applies the changes.

This is all controlled via Gitea pull requests, meaning that any PRs raised will go through all the validation and planning steps. Changes are then applied without any human interaction with the Terraform CLI itself.

Drone Terraform Apply

For most Terraform code, these are all the steps you will need.

Additional steps

As noted above, most of my Terraform code doesn’t need anything more than the steps covered already. The secrets to expose may differ (based upon the infrastructure used), but otherwise everything is the same.

However, for some Terraform code, I need to make files available (e.g. SSH keys, API signing keys, configuration objects). The following step covers how to do this: -

steps:
  - name: Place SSH keys - PR
    image: alpine
    environment: 
      SSH_PRIV:
        from_secret: drone_ssh_priv
      SSH_PUB:
        from_secret: drone_ssh_pub
      OCI_PRIVATE_KEY:
        from_secret: oci_private_key
    volumes:
      - name: cache
        path: /ssh
    commands:
      - echo -e "$SSH_PRIV" | tee /ssh/id_rsa
      - echo -e "$SSH_PUB" | tee /ssh/id_rsa.pub
      - echo -e "$OCI_PRIVATE_KEY" | tee /ssh/oci_private_key.pem
      - chmod 644 /ssh/*
    when:
      event:
      - pull_request

[...]

volumes:
  - name: cache
    temp: {}

This step takes the contents of some secrets and places them into files. In the above, these are my Drone SSH keys and the OCI API signing private key. They are placed in a volume: a shared directory/path that can be made available to other steps within a pipeline.

For Oracle Cloud Infrastructure, the private API signing key is required to authenticate against the API, whereas the SSH keys (specifically the public key) are used in the cloud-init/user data to bootstrap the instances. By default, the instances do not allow password-based SSH authentication, so they need an SSH key to allow you to log in.

An example of using this is below: -

- name: Terraform Plan
  image: jmccann/drone-terraform:latest
  settings:
    actions:
      - validate
      - plan
    vars:
      ssh_public_key: "/ssh/id_rsa.pub"
  environment:
      TF_VAR_tenancy_ocid: 
        from_secret: oci_tenancy_ocid
      TF_VAR_user_ocid: 
        from_secret: oci_user_ocid
      TF_VAR_compartment_ocid: 
        from_secret: oci_compartment_ocid
      TF_VAR_fingerprint: 
        from_secret: oci_fingerprint
      TF_VAR_private_key_path: "/ssh/oci_private_key.pem"
  volumes:
    - name: cache
      path: /ssh
  when:
    event:
    - pull_request

As you can see, the SSH public key path is supplied as a Terraform variable, and the OCI private key path as a TF_VAR environment variable. The most important part here is the volumes section, as without it the keys created in the previous step would not be available.
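
On the Terraform side, these values feed the OCI provider. The below is a minimal sketch (the region in particular is an assumption), not my exact code: -

variable "tenancy_ocid" {}
variable "user_ocid" {}
variable "fingerprint" {}
variable "private_key_path" {}
variable "ssh_public_key" {}

provider "oci" {
  tenancy_ocid     = var.tenancy_ocid
  user_ocid        = var.user_ocid
  fingerprint      = var.fingerprint
  private_key_path = var.private_key_path # /ssh/oci_private_key.pem from the volume
  region           = "uk-london-1" # assumption
}

The ssh_public_key variable can then be read with file() and passed into an instance’s metadata (ssh_authorized_keys) so that cloud-init installs the key on first boot.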

Volumes can be used in many different ways (e.g. caching dependencies, creating build artifacts and pushing them). In this case, they make files available between steps, allowing authentication to certain providers.

The same step is used in my Lab Terraform when creating Proxmox virtual machines. This is because cloud-init in Proxmox requires adding files directly to the Proxmox host’s file system (rather than supplying them via an API). In my case, I use SSH/SCP within Terraform to transfer the files to the Proxmox host(s), using the Drone SSH keys for authentication.
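
As a rough sketch of that pattern (the host name and file paths are hypothetical), a null_resource with a file provisioner can push a cloud-init snippet to a Proxmox host over SSH: -

resource "null_resource" "proxmox_cloud_init" {
  connection {
    type        = "ssh"
    host        = "proxmox-01.lab.example" # hypothetical host
    user        = "root"
    private_key = file("/ssh/id_rsa") # the key placed by the Drone step
  }

  provisioner "file" {
    source      = "cloud-init/user_data.yml" # hypothetical local path
    destination = "/var/lib/vz/snippets/user_data.yml"
  }
}

The virtual machine resource can then reference the uploaded snippet in its cloud-init configuration.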

Remote State Backend with Consul

While not specific to Drone, using some form of Remote State Backend within Terraform allows the following: -

  • A central place to source state between Drone jobs and local Terraform testing
  • Using Remote State as a data source in Terraform code
  • State Locking - avoids multiple jobs/people applying changes at the same time (and potentially breaking each other’s changes)
    • Not all backends support this, and some require a separate backend for locking (e.g. AWS S3 for state storage, and DynamoDB for state locking)
  • Most importantly, it means Drone does not need to write to a terraform.tfstate file in the repository and push it back after changes are made

The last point is especially pertinent, as without this the following steps would be required: -

  • Create a cache/state volume in the pipeline
  • Make the Terraform Drone plugin place state files in the volume
  • Run a step after each Terraform Apply stage to add, commit and push changes back to the code repository
    • This also means giving Drone push access (and not just pull/clone rights) to the repository

Instead, using a Remote State Backend means none of these extra steps are required. Many different options for Backends are available (e.g. AWS S3, Azure Blob Storage, etcd, Postgres), but I chose Consul as I already run Consul for Prometheus Service Discovery.

Setting up this backend looks like the below: -

terraform {
  backend "consul" {
    address = "consul.noisepalace.co.uk"
    scheme  = "https"
    path    = "terraform/oraclecloud/oci"
  }
}

The path structure is arbitrary. I decided to use a $APPLICATION/$PROVIDER/$PURPOSE structure, but you could name each one after characters in Star Wars or Transformers if you wished! This stores state in the Consul KV store, as shown below: -

Drone Terraform Consul KV store
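
As an example of the structure (this exact path is hypothetical), the Cloudflare DNS repository could use a backend like: -

terraform {
  backend "consul" {
    address = "consul.noisepalace.co.uk"
    scheme  = "https"
    path    = "terraform/cloudflare/dns" # hypothetical path, following the same structure
  }
}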

Demonstration

The below video is a demonstration of making a change to the Terraform repository and committing it. In this demonstration, I create a DNS record in Cloudflare: -

Summary

In this post, we have seen how we can make use of Drone to apply Terraform configuration.

This has also shown how we can make use of Consul to provide a consistent Backend for Terraform, reducing the need for additional steps to manage Terraform state files with Drone.

In the next post, I’m going to cover using Drone to build Go releases.