Provision the First Kubernetes Load Balancer with Terraform

This page provisions only the first shared Kubernetes infrastructure VM:

VM name:      cicd-ac-k8s-lb-01
VM ID:        3156201
MAC address:  aa:bb:cc:05:14:01
Reserved IP:  192.168.8.201
API VIP:      192.168.8.200 (assigned later by Ansible and Keepalived)
CPU:          2 vCPU
RAM:          4096 MB
Boot disk:    scsi0, 40G, local-lvm
Template:     tmplt-ub-26-min-base
Template ID:  90000
Proxmox node: pve
Bridge:       vmbr0

The VM is created automatically after the approved Terraform change reaches the prod branch. The GitHub Actions job runs on the repository-level self-hosted runner:

prod-ac-cicd-infra-deploy-rnr-01

1. Scope of this page

This page performs the first controlled infrastructure test in a clean rebuild:

  1. Add the reusable Proxmox VM Terraform module.
  2. Define only cicd-ac-k8s-lb-01 in the initial shared Kubernetes stack.
  3. Push the feature change to dev, where Terraform validates and produces a plan only.
  4. Promote the same reviewed change from dev to prod.
  5. Allow the prod workflow to apply only after the shared-k8s GitHub Environment approval.
  6. Verify that Proxmox created the VM and DHCP assigned 192.168.8.201.
  7. Configure the VM with Ansible from prod.
  8. Verify HAProxy, Keepalived, the runtime socket, and API VIP 192.168.8.200.

The Terraform portion first creates the VM. The Ansible section at the bottom of this same page then sets the Ubuntu hostname, updates /etc/hostname and /etc/hosts, installs HAProxy and Keepalived, assigns the API VIP, and verifies the completed load balancer.

This page does not bootstrap the Kubernetes control planes or workers.

Important hostname note

Because tmplt-ub-26-min-base is a non-Cloud-Init template, Terraform changes the Proxmox VM name, but it may not change the hostname inside Ubuntu. The Ansible section on this page sets the guest hostname to cicd-ac-k8s-lb-01 and updates both /etc/hostname and /etc/hosts.

2. Files changed by this implementation

terraform/modules/proxmox-vm/
terraform/stacks/shared-k8s/
.github/workflows/terraform-plan-shared-k8s.yml
.github/workflows/terraform-apply-shared-k8s.yml

Do not add the other 15 VMs yet. The first prod apply must propose and create only one VM.

3. Prerequisites

Confirm all of the following before editing Terraform:

Requirement               Expected value
GitHub repository         ASPIRECLAN-LLC-Org/ac-cicd-infra
Working branch            Feature branch created from dev
Runner VM                 prod-terraform-deploy-02 / 192.168.8.93
Runner                    prod-ac-cicd-infra-deploy-rnr-01
Runner status             Idle / online
Runner version            2.327.1 or newer; verified implementation used 2.334.0
Runner labels             self-hosted, Linux, X64, prod, terraform, deploy, ac-cicd-infra
Proxmox API               https://192.168.8.23:8006/api2/json
Proxmox node              pve
VM template               tmplt-ub-26-min-base / VM ID 90000
Storage                   local-lvm
Bridge                    vmbr0
DHCP reservation          AA:BB:CC:05:14:01 → 192.168.8.201
API VIP reservation       192.168.8.200 is unused and is not assigned to a DHCP client
Terraform backend         Persistent local backend on prod-terraform-deploy-02
GitHub apply environment  shared-k8s

Before provisioning, verify the load-balancer address and VIP are not responding:

ping -c 1 -W 1 192.168.8.200 || true
ping -c 1 -W 1 192.168.8.201 || true

ip neigh show | grep -E '192\.168\.8\.(200|201)' || true

Also inspect the router lease and reservation tables. A device that blocks ICMP may not answer ping.
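
The neighbor-table check can also be made stricter than a plain grep. The following is a minimal sketch, not part of the repository: the function name neighbor_has_address is hypothetical, and it assumes the usual `ip neigh show` layout in which the first field is the address and the last field is the entry state. It succeeds only when the exact address is present with a state other than FAILED or INCOMPLETE.

```shell
# Hypothetical helper (not in the repository): report whether an address
# has a live entry in the neighbor table.
# Reads `ip neigh show` output on stdin. Field 1 is the IP address and
# the last field is the entry state (REACHABLE, STALE, FAILED, ...).
neighbor_has_address() {
  awk -v ip="$1" '
    $1 == ip && $NF != "FAILED" && $NF != "INCOMPLETE" { found = 1 }
    END { exit found ? 0 : 1 }
  '
}

# Usage on the runner VM:
#   ip neigh show | neighbor_has_address 192.168.8.200 \
#     && echo "WARNING: 192.168.8.200 already has a neighbor entry"
```

An empty result is still not proof that the address is free, so the router lease and reservation tables remain the authoritative check.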

Do not continue with an invalid backend file

terraform/stacks/shared-k8s/backend.tf must contain a valid Terraform backend block. This page uses the approved persistent local backend on prod-terraform-deploy-02. The state file must remain outside the GitHub runner checkout.
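
As a quick local pre-check before pushing, the backend file can be inspected textually. This is a sketch under assumptions: has_local_backend is a hypothetical helper name, and the check only looks for a `backend "local"` line. It does not parse HCL, so `terraform validate` in the workflow remains the real test.

```shell
# Hypothetical pre-check (not in the repository): succeed only when the
# given file exists and contains a line that opens a backend "local" block.
# This is a textual check, not a full HCL parse.
has_local_backend() {
  [ -f "$1" ] && grep -Eq '^[[:space:]]*backend[[:space:]]+"local"[[:space:]]*\{' "$1"
}

# Usage from the repository root:
#   has_local_backend terraform/stacks/shared-k8s/backend.tf \
#     || echo "ERROR: backend.tf does not declare the local backend"
```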

4. Create a feature branch from dev

Run from Windows PowerShell:

cd D:\code\ASPIRECLAN-LLC-Org\ac-cicd-infra

git switch dev
git pull --ff-only origin dev

git switch -c feature/provision-k8s-load-balancer

The current infrastructure workflow model uses dev for validation and planning and prod for apply/configuration. The retained local, qa, and main branches are not part of this page's automated execution path.

5. Implement the reusable Proxmox VM module

5.1 Replace terraform/modules/proxmox-vm/variables.tf

variable "name" {
  description = "Proxmox VM name."
  type        = string
}

variable "description" {
  description = "Description shown in Proxmox."
  type        = string
  default     = ""
}

variable "vmid" {
  description = "Unique Proxmox VM ID."
  type        = number
}

variable "target_node" {
  description = "Proxmox node where the VM will be created."
  type        = string
}

variable "template_name" {
  description = "Proxmox template name used for cloning."
  type        = string
}

variable "cores" {
  description = "Number of CPU cores."
  type        = number
}

variable "memory_mb" {
  description = "RAM allocated to the VM in MB."
  type        = number
}

variable "disk_size" {
  description = "Boot disk size, such as 40G."
  type        = string
}

variable "storage" {
  description = "Proxmox storage used for the boot disk."
  type        = string
}

variable "bridge" {
  description = "Proxmox network bridge."
  type        = string
}

variable "mac_address" {
  description = "Permanent VM MAC address."
  type        = string

  validation {
    condition     = can(regex("^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$", var.mac_address))
    error_message = "mac_address must be a valid six-byte colon-separated MAC address."
  }
}

variable "tags" {
  description = "Proxmox tags."
  type        = list(string)
  default     = []
}
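
The mac_address validation rule can be tried outside Terraform before a value is committed. The sketch below applies the same regular expression with grep; the function name is_valid_mac is hypothetical and exists only for illustration.

```shell
# Hypothetical helper (not in the repository): apply the same pattern as
# the mac_address validation block, six colon-separated hex byte pairs.
is_valid_mac() {
  printf '%s\n' "$1" | grep -Eq '^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$'
}

# Usage:
#   is_valid_mac "aa:bb:cc:05:14:01" && echo "MAC format accepted"
```

Both upper-case and lower-case hex digits pass, which matches the way this page writes the address in lower case for Terraform and upper case for Proxmox output.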

5.2 Replace terraform/modules/proxmox-vm/main.tf

resource "proxmox_vm_qemu" "this" {
  name        = var.name
  desc        = var.description
  vmid        = var.vmid
  target_node = var.target_node

  clone      = var.template_name
  full_clone = true

  agent                  = 1
  define_connection_info = false
  skip_ipv6              = true

  vm_state = "running"
  onboot   = true

  boot     = "order=scsi0;net0"
  bootdisk = "scsi0"
  scsihw   = "virtio-scsi-pci"

  memory  = var.memory_mb
  balloon = 0
  tablet  = false

  cpu {
    cores   = var.cores
    sockets = 1
    type    = "host"
  }

  disk {
    slot    = "scsi0"
    type    = "disk"
    storage = var.storage
    size    = var.disk_size

    discard = true
    backup  = true
  }

  network {
    id       = 0
    model    = "virtio"
    bridge   = var.bridge
    macaddr  = var.mac_address
    firewall = false
  }

  tags = join(",", var.tags)
}

5.3 Replace terraform/modules/proxmox-vm/outputs.tf

output "name" {
  description = "Created VM name."
  value       = proxmox_vm_qemu.this.name
}

output "vmid" {
  description = "Created Proxmox VM ID."
  value       = proxmox_vm_qemu.this.vmid
}

output "target_node" {
  description = "Proxmox node hosting the VM."
  value       = proxmox_vm_qemu.this.target_node
}

5.4 Verify terraform/modules/proxmox-vm/versions.tf

terraform {
  required_version = ">= 1.15.5"

  required_providers {
    proxmox = {
      source  = "Telmate/proxmox"
      version = "3.0.1-rc9"
    }
  }
}

6. Implement the shared-k8s Terraform stack

6.1 Replace terraform/stacks/shared-k8s/variables.tf

variable "pm_api_url" {
  description = "Proxmox API URL."
  type        = string
}

variable "pm_api_token_id" {
  description = "Proxmox API token ID."
  type        = string
  sensitive   = true
}

variable "pm_api_token_secret" {
  description = "Proxmox API token secret."
  type        = string
  sensitive   = true
}

variable "pm_tls_insecure" {
  description = "Allow the Proxmox self-signed TLS certificate."
  type        = bool
  default     = true
}

6.2 Replace terraform/stacks/shared-k8s/providers.tf

terraform {
  required_version = ">= 1.15.5"

  required_providers {
    proxmox = {
      source  = "Telmate/proxmox"
      version = "3.0.1-rc9"
    }
  }
}

provider "proxmox" {
  pm_api_url          = var.pm_api_url
  pm_api_token_id     = var.pm_api_token_id
  pm_api_token_secret = var.pm_api_token_secret
  pm_tls_insecure     = var.pm_tls_insecure
}

6.3 Replace terraform/stacks/shared-k8s/main.tf

This first version defines only the load-balancer VM.

module "api_load_balancer" {
  source = "../../modules/proxmox-vm"

  name        = "cicd-ac-k8s-lb-01"
  description = "Aspireclan shared Kubernetes API load balancer"

  vmid        = 3156201
  target_node = "pve"

  template_name = "tmplt-ub-26-min-base"

  cores     = 2
  memory_mb = 4096

  disk_size = "40G"
  storage   = "local-lvm"

  bridge      = "vmbr0"
  mac_address = "aa:bb:cc:05:14:01"

  tags = [
    "ac-cicd",
    "shared-k8s",
    "load-balancer",
    "terraform",
    "ansible",
  ]
}

6.4 Replace terraform/stacks/shared-k8s/outputs.tf

output "api_load_balancer" {
  description = "Kubernetes API load-balancer VM."
  value = {
    name        = module.api_load_balancer.name
    vmid        = module.api_load_balancer.vmid
    target_node = module.api_load_balancer.target_node
    reserved_ip = "192.168.8.201"
    api_vip     = "192.168.8.200"
  }
}

6.5 Keep or update terraform.tfvars.example

This file is documentation only. The current stack uses explicit approved values for the first VM.

# Non-sensitive examples only.
proxmox_node     = "pve"
proxmox_storage  = "local-lvm"
proxmox_bridge   = "vmbr0"
proxmox_template = "tmplt-ub-26-min-base"

6.6 Configure the persistent local backend

Replace terraform/stacks/shared-k8s/backend.tf with:

terraform {
  backend "local" {}
}

Both Terraform workflows use the same persistent state path on prod-terraform-deploy-02:

/var/lib/ac-cicd-infra/terraform-state/shared-k8s/terraform.tfstate

Create the directory once on prod-terraform-deploy-02:

sudo install -d -m 700 -o acllc -g acllc /var/lib/ac-cicd-infra/terraform-state/shared-k8s

touch /var/lib/ac-cicd-infra/terraform-state/shared-k8s/.write-test
rm /var/lib/ac-cicd-infra/terraform-state/shared-k8s/.write-test

The state path is outside the GitHub Actions checkout. Keep the repository-level Terraform runner count at one while this local backend is in use.
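
The requirement that the state file stays outside the checkout could be enforced with a small guard. This is only a sketch: state_outside_checkout is a hypothetical name, the comparison is a plain path-prefix test that does not resolve symlinks, and GITHUB_WORKSPACE is the default GitHub Actions variable that holds the checkout directory.

```shell
# Hypothetical guard (not in the repository): succeed when the state path
# ($1) is NOT equal to or located under the checkout directory ($2).
# Plain prefix comparison; symlinks are not resolved.
state_outside_checkout() {
  case "$1" in
    "$2"|"$2"/*) return 1 ;;
    *) return 0 ;;
  esac
}

# Usage inside a workflow step:
#   state_outside_checkout "${TF_STATE_PATH}" "${GITHUB_WORKSPACE}" \
#     || { echo "ERROR: state path is inside the checkout"; exit 1; }
```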

7. Configure the GitHub variable and secrets

Run from Windows PowerShell. The secret commands prompt for their values and do not display them afterward.

cd D:\code\ASPIRECLAN-LLC-Org\ac-cicd-infra

gh variable set PM_API_URL `
  --repo ASPIRECLAN-LLC-Org/ac-cicd-infra `
  --body "https://192.168.8.23:8006/api2/json"

gh secret set PM_API_TOKEN_ID `
  --repo ASPIRECLAN-LLC-Org/ac-cicd-infra

gh secret set PM_API_TOKEN_SECRET `
  --repo ASPIRECLAN-LLC-Org/ac-cicd-infra

Verify only the names:

gh variable list `
  --repo ASPIRECLAN-LLC-Org/ac-cicd-infra

gh secret list `
  --repo ASPIRECLAN-LLC-Org/ac-cicd-infra

Expected names:

PM_API_URL
PM_API_TOKEN_ID
PM_API_TOKEN_SECRET

No cloud-backend credentials are required for the current local backend. The runner service account must have read/write access to /var/lib/ac-cicd-infra/terraform-state/shared-k8s.

8. Configure the shared-k8s GitHub Environment

In GitHub, open:

ac-cicd-infra
→ Settings
→ Environments
→ New environment
→ shared-k8s

Recommended protection:

  • Allow deployments only from prod.
  • Add a required reviewer before Terraform apply.
  • Keep Terraform validation and planning automatic on dev.
  • Do not expose the environment to local, qa, or main.

The apply workflow starts after the reviewed change reaches prod, waits for environment approval, and then creates the VM.

9. Replace the Terraform plan workflow

Replace .github/workflows/terraform-plan-shared-k8s.yml with:

name: Terraform Plan - Shared Kubernetes

on:
  push:
    branches:
      - dev
    paths:
      - "terraform/modules/**"
      - "terraform/stacks/shared-k8s/**"
      - ".github/workflows/terraform-plan-shared-k8s.yml"
      - ".github/workflows/terraform-apply-shared-k8s.yml"

  workflow_dispatch:

permissions:
  contents: read

concurrency:
  group: shared-k8s-terraform
  cancel-in-progress: false

env:
  TF_IN_AUTOMATION: "true"
  TF_INPUT: "false"
  TF_WORKING_DIR: "terraform/stacks/shared-k8s"
  TF_STATE_PATH: "/var/lib/ac-cicd-infra/terraform-state/shared-k8s/terraform.tfstate"

  TF_VAR_pm_api_url: ${{ vars.PM_API_URL }}
  TF_VAR_pm_api_token_id: ${{ secrets.PM_API_TOKEN_ID }}
  TF_VAR_pm_api_token_secret: ${{ secrets.PM_API_TOKEN_SECRET }}
  TF_VAR_pm_tls_insecure: "true"

jobs:
  plan:
    name: Validate and plan shared Kubernetes
    runs-on:
      - self-hosted
      - Linux
      - X64
      - prod
      - terraform
      - deploy
      - ac-cicd-infra

    timeout-minutes: 45

    steps:
      - name: Checkout repository
        uses: actions/checkout@v5

      - name: Set up Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: "1.15.5"
          terraform_wrapper: false

      - name: Display execution context
        shell: bash
        run: |
          set -euo pipefail
          echo "Repository: ${GITHUB_REPOSITORY}"
          echo "Event: ${GITHUB_EVENT_NAME}"
          echo "Branch: ${GITHUB_REF_NAME}"
          echo "Commit: ${GITHUB_SHA}"
          echo "Runner: ${RUNNER_NAME}"

      - name: Verify Terraform stack
        shell: bash
        run: |
          set -euo pipefail

          required_files=(
            "main.tf"
            "variables.tf"
            "outputs.tf"
            "providers.tf"
            "backend.tf"
          )

          for file in "${required_files[@]}"; do
            if [ ! -f "${TF_WORKING_DIR}/${file}" ]; then
              echo "ERROR: Missing ${TF_WORKING_DIR}/${file}"
              exit 1
            fi
          done

      - name: Terraform format check
        shell: bash
        run: |
          set -euo pipefail
          terraform fmt -check -recursive terraform

      - name: Terraform init
        working-directory: ${{ env.TF_WORKING_DIR }}
        shell: bash
        run: |
          set -euo pipefail
          terraform init \
            -input=false \
            -backend-config="path=${TF_STATE_PATH}"

      - name: Verify Terraform state location
        shell: bash
        run: |
          set -euo pipefail

          STATE_DIRECTORY="$(dirname "${TF_STATE_PATH}")"

          echo "Terraform state path:"
          echo "  ${TF_STATE_PATH}"

          if [ ! -d "${STATE_DIRECTORY}" ]; then
            echo "ERROR: State directory does not exist: ${STATE_DIRECTORY}"
            exit 1
          fi

          if [ ! -w "${STATE_DIRECTORY}" ]; then
            echo "ERROR: Runner cannot write to: ${STATE_DIRECTORY}"
            exit 1
          fi

          echo "Terraform state directory is writable."

      - name: Terraform validate
        working-directory: ${{ env.TF_WORKING_DIR }}
        shell: bash
        run: |
          set -euo pipefail
          terraform validate -no-color

      - name: Terraform plan
        working-directory: ${{ env.TF_WORKING_DIR }}
        shell: bash
        run: |
          set -euo pipefail
          terraform plan \
            -input=false \
            -lock-timeout=5m \
            -no-color

The plan workflow runs on:

  • Pushes to dev that change the Terraform module, shared stack, or either Terraform workflow.
  • Manual dispatch when the selected branch contains the intended code.

It never performs terraform apply. It does not use a pull_request trigger.

10. Replace the Terraform apply workflow

Replace .github/workflows/terraform-apply-shared-k8s.yml with:

name: Terraform Apply - Shared Kubernetes

on:
  push:
    branches:
      - prod
    paths:
      - "terraform/modules/**"
      - "terraform/stacks/shared-k8s/**"
      - ".github/workflows/terraform-apply-shared-k8s.yml"

  workflow_dispatch:

permissions:
  contents: read

concurrency:
  group: shared-k8s-terraform
  cancel-in-progress: false

env:
  TF_IN_AUTOMATION: "true"
  TF_INPUT: "false"
  TF_WORKING_DIR: "terraform/stacks/shared-k8s"
  TF_STATE_PATH: "/var/lib/ac-cicd-infra/terraform-state/shared-k8s/terraform.tfstate"

  TF_VAR_pm_api_url: ${{ vars.PM_API_URL }}
  TF_VAR_pm_api_token_id: ${{ secrets.PM_API_TOKEN_ID }}
  TF_VAR_pm_api_token_secret: ${{ secrets.PM_API_TOKEN_SECRET }}
  TF_VAR_pm_tls_insecure: "true"

jobs:
  apply:
    name: Apply shared Kubernetes infrastructure

    environment:
      name: shared-k8s

    runs-on:
      - self-hosted
      - Linux
      - X64
      - prod
      - terraform
      - deploy
      - ac-cicd-infra

    timeout-minutes: 90

    steps:
      - name: Checkout repository
        uses: actions/checkout@v5

      - name: Set up Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: "1.15.5"
          terraform_wrapper: false

      - name: Verify production branch
        shell: bash
        run: |
          set -euo pipefail

          if [ "${GITHUB_REF_NAME}" != "prod" ]; then
            echo "ERROR: Terraform apply is permitted only from prod."
            exit 1
          fi

      - name: Terraform format check
        shell: bash
        run: |
          set -euo pipefail
          terraform fmt -check -recursive terraform

      - name: Terraform init
        working-directory: ${{ env.TF_WORKING_DIR }}
        shell: bash
        run: |
          set -euo pipefail
          terraform init \
            -input=false \
            -backend-config="path=${TF_STATE_PATH}"

      - name: Verify Terraform state location
        shell: bash
        run: |
          set -euo pipefail

          STATE_DIRECTORY="$(dirname "${TF_STATE_PATH}")"

          echo "Terraform state path:"
          echo "  ${TF_STATE_PATH}"

          if [ ! -d "${STATE_DIRECTORY}" ]; then
            echo "ERROR: State directory does not exist: ${STATE_DIRECTORY}"
            exit 1
          fi

          if [ ! -w "${STATE_DIRECTORY}" ]; then
            echo "ERROR: Runner cannot write to: ${STATE_DIRECTORY}"
            exit 1
          fi

          echo "Terraform state directory is writable."

      - name: Terraform validate
        working-directory: ${{ env.TF_WORKING_DIR }}
        shell: bash
        run: |
          set -euo pipefail
          terraform validate -no-color

      - name: Create saved Terraform plan
        working-directory: ${{ env.TF_WORKING_DIR }}
        shell: bash
        run: |
          set -euo pipefail
          terraform plan \
            -input=false \
            -lock-timeout=5m \
            -out=tfplan

      - name: Apply saved Terraform plan
        working-directory: ${{ env.TF_WORKING_DIR }}
        shell: bash
        run: |
          set -euo pipefail
          terraform apply \
            -input=false \
            -lock-timeout=5m \
            tfplan

      - name: Display outputs
        working-directory: ${{ env.TF_WORKING_DIR }}
        shell: bash
        run: |
          set -euo pipefail
          terraform output

The apply workflow runs only for prod pushes or an explicitly dispatched run using the prod branch. The shared-k8s environment remains the approval gate. The workflow creates and applies a saved Terraform plan against the same persistent state used by the dev plan workflow.

11. Review the change before committing

Run from Windows PowerShell:

git status
git diff --check
git diff --stat

git diff -- `
  terraform/modules/proxmox-vm `
  terraform/stacks/shared-k8s `
  .github/workflows/terraform-plan-shared-k8s.yml `
  .github/workflows/terraform-apply-shared-k8s.yml

Confirm:

  • No .tfstate or terraform.tfvars file is staged.
  • No Proxmox token is present in the diff.
  • Only the first load-balancer VM is defined.
  • The MAC address is aa:bb:cc:05:14:01.
  • The VM ID is 3156201.
  • The disk is scsi0, 40G, and local-lvm.

12. Commit, push, and open the dev pull request

git add `
  terraform/modules/proxmox-vm `
  terraform/stacks/shared-k8s `
  .github/workflows/terraform-plan-shared-k8s.yml `
  .github/workflows/terraform-apply-shared-k8s.yml

git commit -m "Provision shared Kubernetes API load balancer"

git push -u origin feature/provision-k8s-load-balancer

Create the pull request into dev:

gh pr create `
  --base dev `
  --head feature/provision-k8s-load-balancer `
  --title "Provision shared Kubernetes API load balancer" `
  --body "Adds Terraform and GitHub Actions configuration for cicd-ac-k8s-lb-01. The dev workflow validates and plans only."

Merge the pull request only after reviewing the Terraform module, stack, workflow paths, state path, VM identity, and disk configuration.

13. Validate the dev Terraform plan

After the feature pull request is merged into dev, open:

GitHub repository
→ Actions
→ Terraform Plan - Shared Kubernetes
→ Latest dev run

The plan must end with:

Plan: 1 to add, 0 to change, 0 to destroy.

Stop conditions

Do not promote the change when the plan proposes:

  • More than one VM.
  • A change or deletion to an existing Proxmox VM.
  • A disk smaller than 40G.
  • A disk slot other than scsi0.
  • A VM ID other than 3156201.
  • A MAC address other than aa:bb:cc:05:14:01.
  • A state path other than /var/lib/ac-cicd-infra/terraform-state/shared-k8s/terraform.tfstate.
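
The first stop condition can be checked mechanically against a saved copy of the plan log. A minimal sketch, assuming the log was produced with -no-color so the summary line contains no escape sequences; the function name plan_adds_exactly_one and the file name plan.log are hypothetical.

```shell
# Hypothetical check (not in the repository): succeed only when the plan
# output on stdin contains the exact single-VM summary line.
plan_adds_exactly_one() {
  grep -Eq '^Plan: 1 to add, 0 to change, 0 to destroy\.$'
}

# Usage with a saved copy of the workflow log:
#   plan_adds_exactly_one < plan.log || echo "STOP: unexpected plan summary"
```

The remaining stop conditions (disk size and slot, VM ID, MAC address, state path) still need a manual read of the plan body.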

14. Review the dev to prod promotion

After the dev plan succeeds, refresh the remote branches and review exactly what will be promoted:

git fetch origin

git log --oneline origin/prod..origin/dev

git diff --check origin/prod...origin/dev
git diff --stat origin/prod...origin/dev

Confirm that the dev branch contains only the approved load-balancer Terraform implementation and its two workflows. Do not promote unrelated infrastructure changes in the same pull request.
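
One way to spot unrelated changes is to filter the list of changed paths against the four locations named in section 2. The sketch below is an illustration only: unexpected_changes is a hypothetical helper, and it is written for a Unix-like shell such as Git Bash or the runner VM rather than PowerShell.

```shell
# Hypothetical filter (not in the repository): read changed paths on stdin
# and print only those outside the approved Terraform module, stack, and
# the two Terraform workflows. Empty output means nothing unexpected.
unexpected_changes() {
  grep -Ev '^(terraform/modules/proxmox-vm/|terraform/stacks/shared-k8s/|\.github/workflows/terraform-(plan|apply)-shared-k8s\.yml$)' || true
}

# Usage:
#   git diff --name-only origin/prod...origin/dev | unexpected_changes
```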

15. Create the prod pull request

Create the production pull request directly from dev:

gh pr create `
  --base prod `
  --head dev `
  --title "Provision Kubernetes API load balancer" `
  --body "Promotes the validated cicd-ac-k8s-lb-01 Terraform configuration from dev to prod for approved apply."

Review the pull request files and confirm that its source branch is dev and its target branch is prod.

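Open the pull request in GitHub when needed. Because --repo is specified, gh pr view needs the pull request to be identified explicitly, here by its head branch dev:

gh pr view dev `
  --repo ASPIRECLAN-LLC-Org/ac-cicd-infra `
  --web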

16. Apply after the change reaches prod

After the approved pull request is merged into prod:

  1. GitHub starts Terraform Apply - Shared Kubernetes.
  2. The shared-k8s environment requests approval when protection is enabled. The job does not start until a reviewer approves it.
  3. The repository-level runner accepts the job.
  4. Terraform reads the persistent local state on prod-terraform-deploy-02.
  5. Terraform validates the stack and creates a saved plan.
  6. Terraform applies that exact saved plan.
  7. Proxmox full-clones tmplt-ub-26-min-base.
  8. The new VM starts automatically.
  9. The router gives it 192.168.8.201 based on MAC AA:BB:CC:05:14:01.

Do not run the apply workflow from dev, qa, local, or main.

17. Verify the VM in Proxmox

Run on the Proxmox host:

qm status 3156201
qm config 3156201

Expected important values:

status: running
name: cicd-ac-k8s-lb-01
memory: 4096
cores: 2
scsi0: local-lvm
net0: virtio=AA:BB:CC:05:14:01,bridge=vmbr0

The exact scsi0 line contains additional storage-volume information; confirm that it is on local-lvm and has a size of 40G.
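
The expected values can also be compared automatically. The following sketch reads `qm config` output on standard input; the function name qm_config_matches is hypothetical, and the scsi0 line shape it assumes (a local-lvm volume followed by comma-separated options that include size=40G) should be confirmed against the real output before relying on it.

```shell
# Hypothetical check (not in the repository): verify the key values from
# `qm config 3156201` read on stdin. Assumes "key: value" lines and a
# scsi0 value of the form local-lvm:<volume>,...,size=40G.
qm_config_matches() {
  awk '
    $1 == "name:"   && $2 == "cicd-ac-k8s-lb-01" { ok_name = 1 }
    $1 == "memory:" && $2 == "4096"              { ok_mem = 1 }
    $1 == "cores:"  && $2 == "2"                 { ok_cores = 1 }
    $1 == "scsi0:"  && $2 ~ /^local-lvm:/ && $2 ~ /(^|,)size=40G(,|$)/ { ok_disk = 1 }
    $1 == "net0:"   && toupper($2) ~ /^VIRTIO=AA:BB:CC:05:14:01,/ && $2 ~ /(^|,)bridge=vmbr0(,|$)/ { ok_net = 1 }
    END { exit (ok_name && ok_mem && ok_cores && ok_disk && ok_net) ? 0 : 1 }
  '
}

# Usage on the Proxmox host:
#   qm config 3156201 | qm_config_matches && echo "VM configuration matches"
```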

18. Verify the DHCP address and SSH access

Run from prod-terraform-deploy-02:

ping -c 4 192.168.8.201

ssh \
  -i ~/.ssh/id_ed25519_ansible \
  -o IdentitiesOnly=yes \
  -o BatchMode=yes \
  -o ConnectTimeout=10 \
  acllc@192.168.8.201 \
  'hostnamectl --static; ip -brief address; sudo -n whoami'

Expected results:

  • Ping reaches 192.168.8.201.
  • SSH works with the Ansible automation key.
  • ens18 has 192.168.8.201/22 through DHCP.
  • sudo -n whoami returns root.
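
The interface expectation can be tested against the `ip -brief address` output rather than read by eye. A minimal sketch, assuming the brief layout of interface name, state, then one or more addresses; interface_has_address is a hypothetical helper name.

```shell
# Hypothetical check (not in the repository): succeed when the named
# interface lists the given CIDR address in `ip -brief address` output.
# Fields: 1 = interface, 2 = state, 3..NF = addresses and flags.
interface_has_address() {
  awk -v ifname="$1" -v cidr="$2" '
    $1 == ifname { for (i = 3; i <= NF; i++) if ($i == cidr) found = 1 }
    END { exit found ? 0 : 1 }
  '
}

# Usage from prod-terraform-deploy-02:
#   ssh acllc@192.168.8.201 'ip -brief address' \
#     | interface_has_address ens18 192.168.8.201/22
```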

The guest hostname may still show the template hostname. That is expected until the Ansible common baseline runs.

You can also ask the Proxmox guest agent for network information:

qm guest cmd 3156201 network-get-interfaces

19. Successful completion state

At the end of the Terraform phase, before the Ansible section runs, the environment should match:

Proxmox VM name: cicd-ac-k8s-lb-01
Proxmox VM ID: 3156201
VM state: running
CPU: 2 cores
RAM: 4096 MB
Boot disk: scsi0 on local-lvm, 40G
MAC: AA:BB:CC:05:14:01
DHCP address: 192.168.8.201
API VIP 192.168.8.200: not assigned yet
HAProxy: not installed yet
Keepalived: not installed yet

Do not expect 192.168.8.200 to answer yet. The API VIP is assigned later by Keepalived.

20. Failure handling

Plan fails

Do not promote the branch. Correct the Terraform or backend configuration on the feature branch and push another commit.

Apply fails before creating the VM

Review the workflow log, correct the issue, and rerun the same prod workflow. Do not manually create the VM in Proxmox.

VM exists but the workflow reports failure

Do not delete it manually. First inspect:

qm status 3156201
qm config 3156201

Then review the persistent local Terraform state on prod-terraform-deploy-02 and the workflow log. Terraform must remain the authoritative owner of the VM.

Wrong IP address

Confirm that the VM network device has MAC AA:BB:CC:05:14:01, then verify the router reservation and renew the DHCP lease. Do not configure a static Netplan address inside Ubuntu.

21. Ansible files configured in the next section

After the VM creation and SSH checks pass, continue with the Ansible implementation below:

ansible/ansible.cfg
ansible/inventories/shared-k8s/hosts.ini
ansible/inventories/shared-k8s/group_vars/all.yml
ansible/inventories/shared-k8s/group_vars/load_balancers.yml
ansible/roles/common/tasks/main.yml
ansible/roles/haproxy/tasks/main.yml
ansible/roles/haproxy/handlers/main.yml
ansible/roles/haproxy/templates/haproxy.cfg.j2
ansible/roles/keepalived/tasks/main.yml
ansible/roles/keepalived/handlers/main.yml
ansible/roles/keepalived/files/check-haproxy.sh
ansible/roles/keepalived/templates/keepalived.conf.j2
ansible/playbooks/shared-k8s/01-common-baseline.yml
ansible/playbooks/shared-k8s/02-configure-load-balancer.yml
.github/workflows/ansible-configure-load-balancer.yml

That phase will:

  • Set the Ubuntu hostname to cicd-ac-k8s-lb-01.
  • Apply the common Ubuntu baseline.
  • Install and configure HAProxy.
  • Install and configure Keepalived.
  • Assign the Kubernetes API VIP 192.168.8.200.
  • Add future control-plane backends 192.168.8.202–204:6443.

The control-plane backends may initially be reported as unavailable until the three control-plane VMs are provisioned and Kubernetes is listening on port 6443.


Configure the Load Balancer with Ansible

The Terraform phase created the Ubuntu VM. This section configures that VM automatically from the ac-cicd-infra repository after the Ansible change reaches prod.

22. What the Ansible phase configures

The Ansible phase performs all of the following:

  1. Confirms that node_primary_ip and node_interface are defined in inventory.
  2. Confirms the detected address is 192.168.8.201 and the interface is ens18.
  3. Uses ansible_facts instead of deprecated injected top-level fact variables.
  4. Uses /var/tmp/ansible-acllc as the explicit Ansible remote temporary directory.
  5. Force-refreshes the APT package index before every package-installing role.
  6. Installs ca-certificates, curl, gpg, jq, qemu-guest-agent, and UFW.
  7. Sets the active hostname to cicd-ac-k8s-lb-01.
  8. Writes cicd-ac-k8s-lb-01 to /etc/hostname.
  9. Updates the 127.0.1.1 hostname line in /etc/hosts.
  10. Adds the API VIP and future control-plane names to /etc/hosts.
  11. Installs and validates HAProxy and socat.
  12. Configures the TCP frontend on port 443.
  13. Adds Kubernetes API backends on 192.168.8.202–204:6443.
  14. Creates and validates the HAProxy runtime socket at /run/haproxy/admin.sock.
  15. Creates the local-only statistics page at 127.0.0.1:8404/stats.
  16. Installs and configures Keepalived.
  17. Assigns 192.168.8.200/22 to ens18.
  18. Enables HAProxy and Keepalived at boot.
  19. Verifies the hostname, services, listener, and VIP.

The control-plane backend health checks will initially show DOWN. That is expected until the three control-plane VMs exist and Kubernetes is listening on port 6443.

23. Files changed by the Ansible implementation

ansible/ansible.cfg
ansible/inventories/shared-k8s/hosts.ini
ansible/inventories/shared-k8s/group_vars/all.yml
ansible/inventories/shared-k8s/group_vars/load_balancers.yml
ansible/roles/common/tasks/main.yml
ansible/roles/haproxy/tasks/main.yml
ansible/roles/haproxy/handlers/main.yml
ansible/roles/haproxy/templates/haproxy.cfg.j2
ansible/roles/keepalived/tasks/main.yml
ansible/roles/keepalived/handlers/main.yml
ansible/roles/keepalived/files/check-haproxy.sh
ansible/roles/keepalived/templates/keepalived.conf.j2
ansible/playbooks/shared-k8s/01-common-baseline.yml
ansible/playbooks/shared-k8s/02-configure-load-balancer.yml
.github/workflows/ansible-configure-load-balancer.yml

24. Verify Ansible on the repository runner

Run once on prod-terraform-deploy-02:

ansible --version
ansible-playbook --version
ssh-keyscan -h 2>&1 | head

When Ansible or the OpenSSH client is missing:

sudo apt update
sudo apt install -y ansible-core openssh-client

ansible --version
ansible-playbook --version

The GitHub Actions workflow intentionally verifies these tools instead of silently installing a different version during every run.
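
A verify-only tool check could look like the sketch below. It is an illustration of the approach, not the workflow's actual step; missing_tools is a hypothetical function name.

```shell
# Hypothetical check (not in the repository): print the name of every
# listed command that is not available on PATH. Nothing is installed.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Usage in a workflow step:
#   missing="$(missing_tools ansible ansible-playbook ssh ssh-keyscan)"
#   [ -z "$missing" ] || { echo "ERROR: missing tools: $missing"; exit 1; }
```

Failing fast on a missing tool keeps the installed versions under the operator's control instead of changing them during a run.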

25. Confirm the runner-side SSH key

The repository-level runner service on prod-terraform-deploy-02 must run as the Linux user that owns:

~/.ssh/id_ed25519_ansible

On prod-terraform-deploy-02, verify:

whoami
ls -l ~/.ssh/id_ed25519_ansible ~/.ssh/id_ed25519_ansible.pub
chmod 600 ~/.ssh/id_ed25519_ansible

ssh -i ~/.ssh/id_ed25519_ansible -o IdentitiesOnly=yes acllc@192.168.8.201 'hostnamectl --static; sudo -n whoami'

Expected:

  • The SSH connection succeeds.
  • sudo -n whoami returns root.

Do not commit either SSH key to Git.

26. Create the Ansible feature branch from dev

Run from Windows PowerShell:

cd D:\code\ASPIRECLAN-LLC-Org\ac-cicd-infra

git switch dev
git pull --ff-only origin dev

git switch -c feature/configure-k8s-load-balancer

This follows the same source flow as Terraform: validate after the feature reaches dev, then configure only after the reviewed change reaches prod.

27. Replace ansible/ansible.cfg

[defaults]
inventory = inventories/shared-k8s/hosts.ini
roles_path = roles
host_key_checking = True
retry_files_enabled = False
interpreter_python = auto_silent
stdout_callback = default
inject_facts_as_vars = False
remote_tmp = /var/tmp/ansible-acllc
timeout = 30

[ssh_connection]
pipelining = True
ssh_args = -o IdentitiesOnly=yes -o ServerAliveInterval=30 -o ServerAliveCountMax=4

This configuration permanently addresses two warnings:

  • inject_facts_as_vars = False requires roles to read facts through ansible_facts[...], so the fact-injection deprecation warning that references Ansible 2.24 no longer appears.
  • remote_tmp = /var/tmp/ansible-acllc prevents temporary module files from being created under /root/.ansible/tmp.

The production workflow creates this directory with the correct owner and mode before the first Ansible connection. You can verify it manually with:

ssh \
  -i ~/.ssh/id_ed25519_ansible \
  -o IdentitiesOnly=yes \
  acllc@192.168.8.201 \
  'sudo install -d -m 0700 -o acllc -g acllc /var/tmp/ansible-acllc; ls -ld /var/tmp/ansible-acllc'

28. Replace the shared Kubernetes inventory

Replace ansible/inventories/shared-k8s/hosts.ini with:

[load_balancers]
cicd-ac-k8s-lb-01 ansible_host=192.168.8.201 ansible_user=acllc node_primary_ip=192.168.8.201 node_interface=ens18

[first_control_plane]
# Added by the control-plane page:
# cicd-ac-k8s-cp-01 ansible_host=192.168.8.202 ansible_user=acllc node_primary_ip=192.168.8.202 node_interface=ens18

[additional_control_planes]
# Added by the control-plane page:
# cicd-ac-k8s-cp-02 ansible_host=192.168.8.203 ansible_user=acllc node_primary_ip=192.168.8.203 node_interface=ens18
# cicd-ac-k8s-cp-03 ansible_host=192.168.8.204 ansible_user=acllc node_primary_ip=192.168.8.204 node_interface=ens18

[control_planes:children]
first_control_plane
additional_control_planes

[dev_workers]

[qa_workers]

[prod_workers]

[workers:children]
dev_workers
qa_workers
prod_workers

[k8s_cluster:children]
control_planes
workers

[all:vars]
ansible_python_interpreter=/usr/bin/python3

Only the load balancer has an active host entry at this stage. The control-plane groups already exist, but their host lines remain comments until the control-plane page provisions those VMs. The load-balancer entry includes the node_primary_ip and node_interface values required by the current common role.
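As a quick local cross-check of the statement above, the following sketch lists the active (uncommented) host lines per group in an INI inventory. It is an illustration only: the list_active_hosts function name and the shortened sample file are assumptions, and ansible-inventory --graph remains the authoritative view of the inventory.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print "group host" for every active host line in an
# INI-style Ansible inventory. Blank lines, comment lines, and the
# :children / :vars sections are skipped.
list_active_hosts() {
  awk '
    /^[ \t]*$/    { next }                      # blank lines
    /^[ \t]*[#;]/ { next }                      # comment lines
    /^\[/ {
      group = $0
      gsub(/^\[|\][ \t]*$/, "", group)          # strip the brackets
      skip = (group ~ /:(children|vars)$/)      # not host sections
      next
    }
    !skip && group != "" { print group, $1 }
  ' "$1"
}

# Usage with a shortened copy of the inventory shown above.
sample="$(mktemp)"
cat > "${sample}" <<'EOF'
[load_balancers]
cicd-ac-k8s-lb-01 ansible_host=192.168.8.201 ansible_user=acllc

[first_control_plane]
# cicd-ac-k8s-cp-01 ansible_host=192.168.8.202 ansible_user=acllc

[control_planes:children]
first_control_plane

[all:vars]
ansible_python_interpreter=/usr/bin/python3
EOF

list_active_hosts "${sample}"
rm -f "${sample}"
```

At this stage the only line printed should be the load_balancers entry for cicd-ac-k8s-lb-01.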

29. Add the shared and load-balancer group variables

Create or replace:

ansible/inventories/shared-k8s/group_vars/all.yml

with:

---
cluster_admin_user: acllc

kubernetes_version: "v1.36.1"
kubernetes_package_version: "1.36.1-1.1"
kubernetes_minor_repository: "v1.36"
kubernetes_cri_socket: "unix:///run/containerd/containerd.sock"
kubernetes_api_endpoint: "cicd-ac-k8s-api.aspireclan.com:443"
kubernetes_api_vip: "192.168.8.200"
kubernetes_api_backend_port: 6443
kubernetes_pod_cidr: "10.244.0.0/16"
kubernetes_service_cidr: "10.96.0.0/12"
kubernetes_dns_domain: "cluster.local"

calico_version: "v3.32.0"
calico_crd_url: "https://raw.githubusercontent.com/projectcalico/calico/v3.32.0/manifests/v1_crd_projectcalico_org.yaml"
calico_operator_url: "https://raw.githubusercontent.com/projectcalico/calico/v3.32.0/manifests/tigera-operator.yaml"

managed_hosts_entries:
  - ip: 192.168.8.200
    names:
      - cicd-ac-k8s-api.aspireclan.com
      - cicd-ac-k8s-api
  - ip: 192.168.8.201
    names:
      - cicd-ac-k8s-lb-01
  - ip: 192.168.8.202
    names:
      - cicd-ac-k8s-cp-01
  - ip: 192.168.8.203
    names:
      - cicd-ac-k8s-cp-02
  - ip: 192.168.8.204
    names:
      - cicd-ac-k8s-cp-03

Then create or replace:

ansible/inventories/shared-k8s/group_vars/load_balancers.yml

with:

---
load_balancer_hostname: cicd-ac-k8s-lb-01
load_balancer_primary_ip: 192.168.8.201
load_balancer_interface: ens18

kubernetes_api_vip: 192.168.8.200
kubernetes_api_vip_prefix: 22
kubernetes_api_port: 443

keepalived_router_id: CICD_AC_K8S_LB_01
keepalived_state: MASTER
keepalived_virtual_router_id: 201
keepalived_priority: 100

control_plane_backends:
  - name: cicd-ac-k8s-cp-01
    address: 192.168.8.202
    port: 6443
  - name: cicd-ac-k8s-cp-02
    address: 192.168.8.203
    port: 6443
  - name: cicd-ac-k8s-cp-03
    address: 192.168.8.204
    port: 6443

managed_hosts_entries:
  - ip: 192.168.8.200
    names:
      - cicd-ac-k8s-api.aspireclan.com
      - cicd-ac-k8s-api
  - ip: 192.168.8.202
    names:
      - cicd-ac-k8s-cp-01
  - ip: 192.168.8.203
    names:
      - cicd-ac-k8s-cp-02
  - ip: 192.168.8.204
    names:
      - cicd-ac-k8s-cp-03

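The shared variables provide cluster_admin_user, Kubernetes version pins, network ranges, Calico version pins, and default managed-host entries. The load-balancer variables provide the load-balancer-specific API VIP and control-plane host mappings together with HAProxy, Keepalived, interface, and backend settings. Because Ansible replaces a list variable rather than merging it, the managed_hosts_entries list in load_balancers.yml overrides the default list from all.yml for every host in the load_balancers group.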

30. Implement the common role

Replace:

ansible/roles/common/tasks/main.yml

with:

---
- name: Confirm required host identity variables are defined
  ansible.builtin.assert:
    that:
      - node_primary_ip is defined
      - node_interface is defined
      - inventory_hostname | length > 0
    fail_msg: >-
      The inventory must define node_primary_ip and node_interface for every host.

- name: Confirm the target IP and interface match the approved inventory
  ansible.builtin.assert:
    that:
      - ansible_facts["default_ipv4"]["address"] == node_primary_ip
      - ansible_facts["default_ipv4"]["interface"] == node_interface
    fail_msg: >-
      The detected default IPv4 address or interface does not match the approved inventory.

- name: Force refresh the APT package cache
  ansible.builtin.apt:
    update_cache: true
  register: common_apt_cache_refresh
  retries: 5
  delay: 15
  until: common_apt_cache_refresh is succeeded

- name: Install common operating-system packages
  ansible.builtin.apt:
    name:
      - ca-certificates
      - curl
      - gpg
      - jq
      - qemu-guest-agent
      - ufw
    state: present
  register: common_package_install
  retries: 3
  delay: 10
  until: common_package_install is succeeded

- name: Ensure the Ansible remote temporary directory exists
  ansible.builtin.file:
    path: /var/tmp/ansible-acllc
    state: directory
    owner: "{{ cluster_admin_user }}"
    group: "{{ cluster_admin_user }}"
    mode: "0700"

- name: Write the permanent hostname file
  ansible.builtin.copy:
    dest: /etc/hostname
    content: "{{ inventory_hostname }}\n"
    owner: root
    group: root
    mode: "0644"

- name: Set the active system hostname
  ansible.builtin.hostname:
    name: "{{ inventory_hostname }}"

- name: Set the local hostname mapping
  ansible.builtin.lineinfile:
    path: /etc/hosts
    regexp: '^127\.0\.1\.1\s+'
    line: "127.0.1.1 {{ inventory_hostname }}"
    create: true
    owner: root
    group: root
    mode: "0644"

- name: Add shared Kubernetes host mappings
  ansible.builtin.blockinfile:
    path: /etc/hosts
    marker: "# {mark} ASPIRECLAN SHARED K8S"
    block: |
      {% for item in managed_hosts_entries %}
      {{ item.ip }} {{ item.names | join(' ') }}
      {% endfor %}
    owner: root
    group: root
    mode: "0644"

- name: Enable and start QEMU Guest Agent
  ansible.builtin.service:
    name: qemu-guest-agent
    enabled: true
    state: started

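- name: Allow SSH through UFW when UFW is enabled later
  ansible.builtin.command:
    cmd: ufw allow 22/tcp
  register: common_ufw_ssh_rule
  # UFW prints "Rules updated" while the firewall is inactive and
  # "Rule added" while it is active; an existing rule is skipped.
  changed_when: >-
    'Rule added' in common_ufw_ssh_rule.stdout or
    'Rules updated' in common_ufw_ssh_rule.stdout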

- name: Verify the resulting hostname
  ansible.builtin.command:
    cmd: hostnamectl --static
  register: common_configured_hostname
  changed_when: false
  failed_when: common_configured_hostname.stdout | trim != inventory_hostname

30.1 Permanent APT, remote-temp, and UFW behavior

The template cleanup intentionally removes /var/lib/apt/lists/*. A new clone therefore starts without an APT package index. Installing packages in the template would only hide the problem.

The permanent behavior is:

Template cleanup removes /var/lib/apt/lists/*
Every package-installing Ansible role force-refreshes APT first
No clone depends on package indexes inherited from the template
APT refresh retries: 5 attempts, 15 seconds apart
Package installation retries: 3 attempts, 10 seconds apart
UFW, HAProxy, Keepalived, and future packages resolve from current Ubuntu repositories
Template rebuild required: No

The common role also creates /var/tmp/ansible-acllc with owner acllc and mode 0700. UFW is installed, and the SSH rule is added, but the firewall remains disabled until a later hardening phase. HAProxy port 443 must be considered when UFW is eventually enabled.
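For readers who want to reproduce the retry cadence outside Ansible, for example while debugging APT by hand on the VM, the pattern can be sketched in plain Bash. The retry function below is a hypothetical helper, not part of the repository. It counts total attempts and does not claim to match exactly how Ansible counts its retries.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: run a command up to <attempts> times, sleeping
# <delay> seconds after each failed attempt except the last one.
retry() {
  local attempts="$1" delay="$2"
  shift 2
  local attempt
  for (( attempt = 1; attempt <= attempts; attempt++ )); do
    if "$@"; then
      return 0
    fi
    if (( attempt < attempts )); then
      echo "Attempt ${attempt}/${attempts} failed; retrying in ${delay}s..." >&2
      sleep "${delay}"
    fi
  done
  return 1
}

# The same cadence as the APT refresh task would be (not run here):
#   retry 5 15 sudo apt-get update
retry 3 0 true && echo "succeeded"
```

The function returns the first success immediately, so a healthy mirror costs no extra waiting.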

31. Implement the HAProxy role

31.1 Replace ansible/roles/haproxy/tasks/main.yml

---
- name: Force refresh the APT package cache before installing HAProxy
  ansible.builtin.apt:
    update_cache: true
  register: haproxy_apt_cache_refresh
  retries: 5
  delay: 15
  until: haproxy_apt_cache_refresh is succeeded

- name: Install HAProxy and the runtime statistics client
  ansible.builtin.apt:
    name:
      - haproxy
      - socat
    state: present
  register: haproxy_package_install
  retries: 3
  delay: 10
  until: haproxy_package_install is succeeded

- name: Render the Kubernetes API HAProxy configuration
  ansible.builtin.template:
    src: haproxy.cfg.j2
    dest: /etc/haproxy/haproxy.cfg
    owner: root
    group: root
    mode: "0644"
    validate: "haproxy -c -f %s"
  notify: Restart HAProxy

- name: Enable and start HAProxy
  ansible.builtin.service:
    name: haproxy
    enabled: true
    state: started

- name: Apply any pending HAProxy restart
  ansible.builtin.meta: flush_handlers

- name: Validate the active HAProxy configuration
  ansible.builtin.command:
    cmd: haproxy -c -f /etc/haproxy/haproxy.cfg
  changed_when: false

- name: Confirm the HAProxy runtime socket exists
  ansible.builtin.stat:
    path: /run/haproxy/admin.sock
  register: haproxy_runtime_socket

- name: Assert that the HAProxy runtime socket is available
  ansible.builtin.assert:
    that:
      - haproxy_runtime_socket.stat.exists
      - haproxy_runtime_socket.stat.issock
    fail_msg: >-
      The HAProxy runtime socket /run/haproxy/admin.sock is not available.

- name: Confirm that the HAProxy Runtime API responds
  ansible.builtin.shell:
    executable: /bin/bash
    cmd: |
      set -euo pipefail

      printf 'show info\n' |
        socat - UNIX-CONNECT:/run/haproxy/admin.sock
  register: haproxy_runtime_api_check
  changed_when: false

31.2 Replace ansible/roles/haproxy/handlers/main.yml

---
- name: Restart HAProxy
  ansible.builtin.service:
    name: haproxy
    state: restarted

31.3 Create ansible/roles/haproxy/templates/haproxy.cfg.j2

global
    log /dev/log local0
    log /dev/log local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
    stats timeout 30s
    user haproxy
    group haproxy
    daemon
    maxconn 4096

defaults
    log global
    mode tcp
    option tcplog
    option dontlognull
    timeout connect 10s
    timeout client 1m
    timeout server 1m

frontend kubernetes_api
    bind *:{{ kubernetes_api_port }}
    default_backend kubernetes_control_planes

backend kubernetes_control_planes
    balance roundrobin
    option tcp-check
    default-server inter 3s fall 3 rise 2
{% for backend in control_plane_backends %}
    server {{ backend.name }} {{ backend.address }}:{{ backend.port }} check
{% endfor %}

listen local_stats
    bind 127.0.0.1:8404
    mode http
    stats enable
    stats uri /stats
    stats refresh 10s

HAProxy listens on TCP 443 and distributes connections across the three future Kubernetes API servers.

The configuration also provides:

  • Runtime API socket: /run/haproxy/admin.sock
  • Local-only statistics page: http://127.0.0.1:8404/stats
  • socat for querying runtime information without exposing an administrative port to the LAN
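To make the effect of the Jinja loop in haproxy.cfg.j2 concrete, the sketch below prints the server lines that the template should produce for the three backends in control_plane_backends. It is a plain Bash illustration of the expected rendering, not the way Ansible renders the file, and render_backend_servers is a hypothetical name.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Backends copied from control_plane_backends as "name address port".
backends=(
  "cicd-ac-k8s-cp-01 192.168.8.202 6443"
  "cicd-ac-k8s-cp-02 192.168.8.203 6443"
  "cicd-ac-k8s-cp-03 192.168.8.204 6443"
)

# Hypothetical helper: emit one HAProxy "server" line per backend, in the
# same shape as the loop body in haproxy.cfg.j2.
render_backend_servers() {
  local entry name address port
  for entry in "$@"; do
    read -r name address port <<< "${entry}"
    printf '    server %s %s:%s check\n' "${name}" "${address}" "${port}"
  done
}

render_backend_servers "${backends[@]}"
```

Comparing this output with the backend section of /etc/haproxy/haproxy.cfg on the VM is a simple way to confirm that the rendered file matches the group variables.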

32. Implement the Keepalived role

32.1 Replace ansible/roles/keepalived/tasks/main.yml

---
- name: Force refresh the APT package cache before installing Keepalived
  ansible.builtin.apt:
    update_cache: true
  register: keepalived_apt_cache_refresh
  retries: 5
  delay: 15
  until: keepalived_apt_cache_refresh is succeeded

- name: Install Keepalived
  ansible.builtin.apt:
    name:
      - keepalived
    state: present
  register: keepalived_package_install
  retries: 3
  delay: 10
  until: keepalived_package_install is succeeded

- name: Install the HAProxy health-check script
  ansible.builtin.copy:
    src: check-haproxy.sh
    dest: /usr/local/sbin/check-haproxy.sh
    owner: root
    group: root
    mode: "0755"
  notify: Restart Keepalived

- name: Render the Keepalived configuration
  ansible.builtin.template:
    src: keepalived.conf.j2
    dest: /etc/keepalived/keepalived.conf
    owner: root
    group: root
    mode: "0644"
  notify: Restart Keepalived

- name: Enable and start Keepalived
  ansible.builtin.service:
    name: keepalived
    enabled: true
    state: started

- name: Apply any pending Keepalived restart
  ansible.builtin.meta: flush_handlers

- name: Wait for the Kubernetes API VIP to appear
  ansible.builtin.command:
    cmd: "ip -4 address show dev {{ load_balancer_interface }}"
  register: load_balancer_addresses
  changed_when: false
  retries: 15
  delay: 2
  until: >-
    (kubernetes_api_vip ~ '/' ~ (kubernetes_api_vip_prefix | string))
    in load_balancer_addresses.stdout

32.2 Replace ansible/roles/keepalived/handlers/main.yml

---
- name: Restart Keepalived
  ansible.builtin.service:
    name: keepalived
    state: restarted

32.3 Create ansible/roles/keepalived/files/check-haproxy.sh

#!/usr/bin/env bash
set -euo pipefail

systemctl is-active --quiet haproxy
pgrep -x haproxy >/dev/null

32.4 Create ansible/roles/keepalived/templates/keepalived.conf.j2

global_defs {
    router_id {{ keepalived_router_id }}
    enable_script_security
    script_user root
}

vrrp_script check_haproxy {
    script "/usr/local/sbin/check-haproxy.sh"
    interval 2
    timeout 2
    fall 2
    rise 2
}

vrrp_instance VI_K8S_API {
    state {{ keepalived_state }}
    interface {{ load_balancer_interface }}
    virtual_router_id {{ keepalived_virtual_router_id }}
    priority {{ keepalived_priority }}
    advert_int 1

    virtual_ipaddress {
        {{ kubernetes_api_vip }}/{{ kubernetes_api_vip_prefix }} dev {{ load_balancer_interface }}
    }

    track_script {
        check_haproxy
    }
}

This is currently a one-load-balancer Keepalived configuration. It gives 192.168.8.200/22 to cicd-ac-k8s-lb-01. VRRP virtual-router ID 201 is intentionally different from common default values used by older clusters on the same LAN. When a second load balancer is introduced, add a second host with lower priority and update Keepalived for the two-node design.
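The VIP only works as a secondary address if it sits in the same network as the primary address. With a /22 prefix, 192.168.8.0/22 spans 192.168.8.0 through 192.168.11.255, so both 192.168.8.200 and 192.168.8.201 qualify. The sketch below checks that arithmetic in Bash; ip_to_int and same_network are hypothetical helper names used only for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Hypothetical helper: succeed when two IPv4 addresses fall in the same
# network for the given prefix length.
same_network() {
  local prefix="$3"
  local mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  (( ($(ip_to_int "$1") & mask) == ($(ip_to_int "$2") & mask) ))
}

if same_network 192.168.8.200 192.168.8.201 22; then
  echo "VIP and primary address share the /22 network"
fi
```

The same check is useful later when a second load balancer is added, because both VRRP peers must see the VIP in their own network.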

33. Replace the common-baseline playbook

Replace:

ansible/playbooks/shared-k8s/01-common-baseline.yml

with:

---
- name: Apply the common Ubuntu baseline
  hosts: all
  become: true
  gather_facts: true

  roles:
    - role: common

34. Replace the load-balancer playbook

Replace:

ansible/playbooks/shared-k8s/02-configure-load-balancer.yml

with:

---
- name: Configure the Kubernetes API load balancer
  hosts: load_balancers
  become: true
  gather_facts: true

  roles:
    - role: haproxy
    - role: keepalived

35. Add the Ansible GitHub Actions workflow

Create:

.github/workflows/ansible-configure-load-balancer.yml

with:

name: Ansible Configure - Kubernetes Load Balancer

on:
  push:
    branches:
      - dev
      - prod
    paths:
      - "ansible/inventories/shared-k8s/group_vars/load_balancers.yml"
      - "ansible/roles/haproxy/**"
      - "ansible/roles/keepalived/**"
      - "ansible/playbooks/shared-k8s/02-configure-load-balancer.yml"
      - ".github/workflows/ansible-configure-load-balancer.yml"

  workflow_dispatch:

permissions:
  contents: read

concurrency:
  group: shared-k8s-ansible
  cancel-in-progress: false

env:
  ANSIBLE_CONFIG: ${{ github.workspace }}/ansible/ansible.cfg

jobs:
  validate:
    name: Validate load-balancer Ansible configuration
    runs-on:
      - self-hosted
      - Linux
      - X64
      - prod
      - terraform
      - deploy
      - ac-cicd-infra

    steps:
      - name: Checkout repository
        uses: actions/checkout@v5

      - name: Verify Ansible
        shell: bash
        run: |
          set -euo pipefail
          ansible --version
          ansible-playbook --version

      - name: Validate inventory
        working-directory: ansible
        shell: bash
        run: |
          set -euo pipefail
          ansible-inventory \
            -i inventories/shared-k8s/hosts.ini \
            --graph

      - name: Syntax-check the common baseline
        working-directory: ansible
        shell: bash
        run: |
          set -euo pipefail
          ansible-playbook \
            -i inventories/shared-k8s/hosts.ini \
            playbooks/shared-k8s/01-common-baseline.yml \
            --syntax-check

      - name: Syntax-check the load-balancer playbook
        working-directory: ansible
        shell: bash
        run: |
          set -euo pipefail
          ansible-playbook \
            -i inventories/shared-k8s/hosts.ini \
            playbooks/shared-k8s/02-configure-load-balancer.yml \
            --syntax-check

  configure:
    name: Configure cicd-ac-k8s-lb-01
    needs:
      - validate

    if: >-
      (github.event_name == 'push' && github.ref_name == 'prod') ||
      (github.event_name == 'workflow_dispatch' && github.ref_name == 'prod')

    environment:
      name: shared-k8s

    runs-on:
      - self-hosted
      - Linux
      - X64
      - prod
      - terraform
      - deploy
      - ac-cicd-infra

    timeout-minutes: 45

    steps:
      - name: Checkout repository
        uses: actions/checkout@v5

      - name: Verify the production branch
        shell: bash
        run: |
          set -euo pipefail

          if [ "${GITHUB_REF_NAME}" != "prod" ]; then
            echo "ERROR: Ansible configuration is permitted only from prod."
            exit 1
          fi

      - name: Prepare the existing Ansible SSH key
        shell: bash
        run: |
          set -euo pipefail

          KEY_PATH="${HOME}/.ssh/id_ed25519_ansible"

          if [ ! -f "${KEY_PATH}" ]; then
            echo "ERROR: Missing Ansible key: ${KEY_PATH}"
            exit 1
          fi

          chmod 600 "${KEY_PATH}"
          echo "ANSIBLE_PRIVATE_KEY_FILE=${KEY_PATH}" >> "${GITHUB_ENV}"

      - name: Refresh the load-balancer SSH host key
        shell: bash
        run: |
          set -euo pipefail

          mkdir -p "${HOME}/.ssh"
          chmod 700 "${HOME}/.ssh"
          touch "${HOME}/.ssh/known_hosts"
          chmod 600 "${HOME}/.ssh/known_hosts"

          ssh-keygen \
            -f "${HOME}/.ssh/known_hosts" \
            -R "192.168.8.201" || true

          for attempt in $(seq 1 30); do
            if ssh-keyscan \
              -T 5 \
              -H "192.168.8.201" \
              >> "${HOME}/.ssh/known_hosts" 2>/dev/null
            then
              echo "SSH host key captured."
              exit 0
            fi

            echo "Waiting for SSH on 192.168.8.201 (attempt ${attempt}/30)..."
            sleep 10
          done

          echo "ERROR: Unable to capture the SSH host key."
          exit 1

      - name: Verify Ansible connectivity
        working-directory: ansible
        shell: bash
        run: |
          set -euo pipefail

          ansible \
            -i inventories/shared-k8s/hosts.ini \
            load_balancers \
            --private-key "${ANSIBLE_PRIVATE_KEY_FILE}" \
            -m ping

      - name: Apply the common baseline
        working-directory: ansible
        shell: bash
        run: |
          set -euo pipefail

          ansible-playbook \
            -i inventories/shared-k8s/hosts.ini \
            --private-key "${ANSIBLE_PRIVATE_KEY_FILE}" \
            playbooks/shared-k8s/01-common-baseline.yml

      - name: Configure HAProxy, Keepalived, and the API VIP
        working-directory: ansible
        shell: bash
        run: |
          set -euo pipefail

          ansible-playbook \
            -i inventories/shared-k8s/hosts.ini \
            --private-key "${ANSIBLE_PRIVATE_KEY_FILE}" \
            playbooks/shared-k8s/02-configure-load-balancer.yml

      - name: Verify the completed load balancer
        working-directory: ansible
        shell: bash
        run: |
          set -euo pipefail

          ansible \
            -i inventories/shared-k8s/hosts.ini \
            load_balancers \
            --private-key "${ANSIBLE_PRIVATE_KEY_FILE}" \
            -b \
            -m shell \
            -a '
              set -e
              hostnamectl --static
              grep -F "127.0.1.1 cicd-ac-k8s-lb-01" /etc/hosts
              grep -F "192.168.8.200 cicd-ac-k8s-api.aspireclan.com cicd-ac-k8s-api" /etc/hosts
              systemctl is-active haproxy
              systemctl is-active keepalived
              haproxy -c -f /etc/haproxy/haproxy.cfg
              ip -4 address show dev ens18 | grep -F "192.168.8.200/22"
              ss -lntp | grep -E "[:.]443[[:space:]]"
            '

The workflow uses actions/checkout@v5, which uses the Node.js 24 action runtime. The self-hosted runner must be version 2.327.1 or newer. The verified runner version for this implementation was 2.334.0.
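The minimum-version requirement can be checked with a small comparison. The sketch below assumes GNU sort with the -V (version sort) option is available on the runner host; version_at_least is a hypothetical helper, and the version strings are the ones quoted above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: succeed when version $1 is greater than or equal to
# version $2. Relies on GNU sort -V for version-aware ordering.
version_at_least() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

if version_at_least "2.334.0" "2.327.1"; then
  echo "Runner version is new enough for actions/checkout@v5"
fi
```

A plain string comparison would be misleading here, because 2.99.0 sorts after 2.327.1 lexically even though it is the older version.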

The workflow behaves as follows:

Branch or event                                     Behavior
Push to dev with load-balancer-specific changes     Inventory and playbook syntax validation only
Push to prod with load-balancer-specific changes    Validation, SSH connectivity, common baseline, HAProxy, Keepalived, VIP, and verification
Manual run from dev                                 Validation only
Manual run from prod                                Validation and configuration
pull_request                                        Not used by this workflow
local, qa, or main push                             Does not trigger this workflow

The component-specific path filters intentionally prevent unrelated worker or ARC changes from invoking the load-balancer workflow.
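The configure job's if: expression reduces to a small truth table. The sketch below restates it as a Bash function so the dev and prod outcomes can be checked by hand. should_configure is a hypothetical name, and the function only mirrors the expression shown in the workflow; GitHub Actions evaluates the real condition.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper mirroring the configure job's `if:` expression.
# $1 = event name (github.event_name), $2 = branch name (github.ref_name).
should_configure() {
  local event="$1" ref="$2"
  [[ "${ref}" == "prod" && ( "${event}" == "push" || "${event}" == "workflow_dispatch" ) ]]
}

for scenario in "push dev" "push prod" "workflow_dispatch dev" "workflow_dispatch prod"; do
  read -r event ref <<< "${scenario}"
  if should_configure "${event}" "${ref}"; then
    echo "${event} on ${ref}: validate and configure"
  else
    echo "${event} on ${ref}: validate only"
  fi
done
```

Only the two prod scenarios reach the configure job, and even then the shared-k8s environment approval still gates the run.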

36. Review the Ansible change

Run from Windows PowerShell:

git status
git diff --check
git diff --stat

git diff -- `
  ansible `
  .github/workflows/ansible-configure-load-balancer.yml

Confirm:

  • No private SSH key is staged.
  • The target address is only 192.168.8.201.
  • The API VIP is 192.168.8.200.
  • The primary interface is ens18.
  • The control-plane backends are 192.168.8.202–204:6443.
  • The workflow configures the VM only from prod.

37. Commit, push, and open the Ansible dev pull request

git add `
  ansible `
  .github/workflows/ansible-configure-load-balancer.yml

git commit -m "Configure Kubernetes API load balancer with Ansible"

git push -u origin feature/configure-k8s-load-balancer

Create the pull request into dev:

gh pr create `
  --base dev `
  --head feature/configure-k8s-load-balancer `
  --title "Configure Kubernetes API load balancer" `
  --body "Adds the Ansible baseline, HAProxy, Keepalived, API VIP, and prod-only configuration for cicd-ac-k8s-lb-01."

Merge it only after reviewing the inventory identity, shared variables, HAProxy backends, Keepalived VIP, role content, workflow paths, and prod-only configure condition.

38. Promote the Ansible configuration from dev to prod

After the dev validation workflow succeeds, review the exact promotion:

git fetch origin

git diff --check origin/prod...origin/dev
git diff --stat origin/prod...origin/dev

Create the production pull request:

gh pr create `
  --base prod `
  --head dev `
  --title "Configure the shared Kubernetes API load balancer" `
  --body "Promotes the validated load-balancer Ansible configuration from dev to prod."

Open it in GitHub when needed:

gh pr view dev `
  --repo ASPIRECLAN-LLC-Org/ac-cicd-infra `
  --web

After the reviewed pull request is merged into prod, approve the shared-k8s environment when prompted. The workflow then configures cicd-ac-k8s-lb-01.

39. Verify the completed load balancer manually

Run the detailed verification from prod-terraform-deploy-02:

ssh \
  -i ~/.ssh/id_ed25519_ansible \
  -o IdentitiesOnly=yes \
  acllc@192.168.8.201 \
  'sudo bash -s' <<'EOF'
set -e

echo "=== HOSTNAME ==="
hostnamectl --static
cat /etc/hostname

echo "=== HOSTS ==="
grep -E "127.0.1.1|192.168.8.200|192.168.8.202|192.168.8.203|192.168.8.204" /etc/hosts

echo "=== SERVICES ==="
systemctl is-active haproxy
systemctl is-active keepalived
systemctl is-enabled haproxy
systemctl is-enabled keepalived

echo "=== HAPROXY ==="
haproxy -c -f /etc/haproxy/haproxy.cfg
ss -lntp | grep -E ":443[[:space:]]"
ss -lntp | grep -F "127.0.0.1:8404"
test -S /run/haproxy/admin.sock

echo "=== RUNTIME_API ==="
echo "show info" | socat - UNIX-CONNECT:/run/haproxy/admin.sock | head -20

echo "=== ADDRESSES ==="
ip -brief address show ens18
EOF

A shorter repeatable health check is:

ssh \
  -i ~/.ssh/id_ed25519_ansible \
  -o IdentitiesOnly=yes \
  acllc@192.168.8.201 '
    echo "=== HOSTNAME ==="
    hostnamectl --static

    echo
    echo "=== SERVICES ==="
    systemctl is-active haproxy
    systemctl is-active keepalived
    systemctl is-enabled haproxy
    systemctl is-enabled keepalived

    echo
    echo "=== VIP ==="
    ip -4 address show dev ens18 |
      grep -F "192.168.8.200/22"

    echo
    echo "=== LISTENERS ==="
    sudo ss -lntp |
      grep -E ":443[[:space:]]|127\.0\.0\.1:8404"

    echo
    echo "=== CONFIGURATION ==="
    sudo haproxy -c -f /etc/haproxy/haproxy.cfg

    echo
    echo "=== RUNTIME SOCKET ==="
    sudo test -S /run/haproxy/admin.sock
    echo "show info" |
      sudo socat - UNIX-CONNECT:/run/haproxy/admin.sock |
      head -20
  '

Also verify the VIP and TCP frontend from another machine on the same network:

ping -c 4 192.168.8.200

nc -vz -w 5 192.168.8.200 443

The VIP and TCP 443 listener should answer. A Kubernetes /readyz request will not succeed until at least one control-plane node is listening on port 6443.

40. View HAProxy runtime statistics

The runtime socket is available only inside the load-balancer VM and is not exposed to the LAN.

ssh \
  -i ~/.ssh/id_ed25519_ansible \
  -o IdentitiesOnly=yes \
  acllc@192.168.8.201 \
  'echo "show info" | sudo socat - UNIX-CONNECT:/run/haproxy/admin.sock'

ssh \
  -i ~/.ssh/id_ed25519_ansible \
  -o IdentitiesOnly=yes \
  acllc@192.168.8.201 \
  'echo "show stat" | sudo socat - UNIX-CONNECT:/run/haproxy/admin.sock'

The CSV output from show stat contains frontend, backend, and per-control-plane health information. Until the control planes exist, the three server rows correctly report DOWN.
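Because the CSV has many columns, a short filter makes the health information easier to read. The sketch below reads show stat output on standard input and prints only the proxy, server, and status fields, locating them by the header names pxname, svname, and status. summarize_show_stat is a hypothetical helper, and the sample input is shortened and illustrative; real output carries many more columns.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: read "show stat" CSV on stdin and print
# "proxy/server status" per row, finding columns by header name.
summarize_show_stat() {
  awk -F, '
    NR == 1 {
      sub(/^# /, "", $1)                      # header starts with "# "
      for (i = 1; i <= NF; i++) col[$i] = i   # map column name to index
      next
    }
    NF > 1 { print $(col["pxname"]) "/" $(col["svname"]), $(col["status"]) }
  '
}

# Shortened, illustrative sample of the CSV.
summarize_show_stat <<'EOF'
# pxname,svname,status
kubernetes_api,FRONTEND,OPEN
kubernetes_control_planes,cicd-ac-k8s-cp-01,DOWN
kubernetes_control_planes,cicd-ac-k8s-cp-02,DOWN
kubernetes_control_planes,cicd-ac-k8s-cp-03,DOWN
kubernetes_control_planes,BACKEND,DOWN
EOF
```

On the VM, the same function could be fed from the socat command shown above instead of the sample.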

41. Open the local HAProxy statistics page

From Windows PowerShell or another SSH client, create an SSH tunnel:

ssh -L 8404:127.0.0.1:8404 acllc@192.168.8.201

# Keep the SSH window open, then browse to:
http://127.0.0.1:8404/stats

The page shows frontend connections, backend status, health checks, queues, and error counters. It is intentionally bound to 127.0.0.1 on the VM and is not directly exposed to the LAN.

42. Successful Ansible completion state

Guest hostname: cicd-ac-k8s-lb-01
/etc/hostname: cicd-ac-k8s-lb-01
/etc/hosts: managed hostname, API VIP, and control-plane mappings present
APT behavior: package cache force-refreshed before common, HAProxy, and Keepalived package installation
Ansible remote temp: /var/tmp/ansible-acllc owned by acllc with mode 0700
UFW: installed; SSH rule present; firewall remains disabled until a later hardening phase
HAProxy: active and enabled
HAProxy frontend: TCP 443 on 0.0.0.0
HAProxy runtime socket: /run/haproxy/admin.sock
HAProxy Runtime API: responding through socat
HAProxy local statistics page: http://127.0.0.1:8404/stats
HAProxy backends:
  192.168.8.202:6443
  192.168.8.203:6443
  192.168.8.204:6443
Keepalived: active and enabled
Keepalived state: MASTER on the current single load balancer
Primary address: 192.168.8.201/22 on ens18
Kubernetes API VIP: 192.168.8.200/22 on ens18
Control-plane backend status: expected to be DOWN until Kubernetes is listening on port 6443

43. Expected first-deployment log messages

Normal messages during the first load-balancer deployment:

HAProxy backend servers DOWN:
  Expected until 192.168.8.202-204 are running Kubernetes on TCP 6443.

backend kubernetes_control_planes has no server available:
  Expected before the control-plane nodes are provisioned.

Keepalived VIP retry:
  Expected for a few seconds while Keepalived restarts and enters MASTER state.

HAProxy worker exit code 143:
  Expected during the Ansible-controlled HAProxy restart.

Keepalived ConditionFileNotEmpty skipped:
  Expected before Ansible writes /etc/keepalived/keepalived.conf.

Post job cleanup / Cleaning up orphan processes:
  Normal GitHub Actions runner housekeeping, not a workflow failure.
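The exit code 143 in the list above follows the common shell convention that a status above 128 means the process was terminated by signal number (status minus 128). Here 143 minus 128 is 15, which is SIGTERM, the signal a service restart normally sends. The sketch below decodes such a status with the Bash kill -l builtin; signal_for_status is a hypothetical helper name.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: translate an exit status into the name of the
# terminating signal, or "none" when the status is 128 or lower.
signal_for_status() {
  local status="$1"
  if (( status > 128 )); then
    kill -l "${status}"     # Bash builtin: maps the status to a signal name
  else
    echo "none"
  fi
}

echo "status 143 -> signal number $(( 143 - 128 )), name $(signal_for_status 143)"
```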

44. Ansible failure handling

SSH connectivity fails

Verify:

ssh -i ~/.ssh/id_ed25519_ansible -o IdentitiesOnly=yes acllc@192.168.8.201 'sudo -n whoami'

Confirm the public automation key is present in the VM's ~acllc/.ssh/authorized_keys.

The IP address or interface assertion fails

Confirm that DHCP assigned 192.168.8.201 to MAC AA:BB:CC:05:14:01 and that the network interface is ens18.

APT reports that a package is unavailable

Run:

sudo apt-get update
apt-cache policy ufw haproxy keepalived

Each package must show a nonempty Candidate. The committed roles already force-refresh APT and retry transient failures. Do not rebuild the template merely because /var/lib/apt/lists/* was cleaned before conversion.
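The Candidate check can be scripted when several packages need to be inspected at once. The sketch below reads apt-cache policy output on standard input and prints every requested package that has no stanza or whose Candidate is (none). missing_candidates is a hypothetical helper, and the sample text and version number are illustrative, not output captured from the VM.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper.
# Usage: apt-cache policy pkg... | missing_candidates pkg...
# Prints each named package that lacks a usable Candidate version.
missing_candidates() {
  local policy pkg
  policy="$(cat)"
  for pkg in "$@"; do
    if ! awk -v want="${pkg}:" '
          $0 == want { in_pkg = 1; next }     # start of this package stanza
          /^[^ \t]/  { in_pkg = 0 }           # any other top-level line
          in_pkg && $1 == "Candidate:" && $2 != "(none)" { found = 1 }
          END { exit(found ? 0 : 1) }
        ' <<< "${policy}"; then
      echo "${pkg}"
    fi
  done
}

# Illustrative sample: haproxy has no candidate and ufw has no stanza.
missing_candidates ufw haproxy keepalived <<'EOF'
haproxy:
  Installed: (none)
  Candidate: (none)
  Version table:
keepalived:
  Installed: (none)
  Candidate: 1.2.3-1
  Version table:
EOF
```

An empty result means every listed package resolves from the current repositories.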

Ansible reports a remote temporary-directory warning

Confirm ansible.cfg contains:

remote_tmp = /var/tmp/ansible-acllc

Then create or repair the directory:

ssh \
  -i ~/.ssh/id_ed25519_ansible \
  -o IdentitiesOnly=yes \
  acllc@192.168.8.201 \
  'sudo install -d -m 0700 -o acllc -g acllc /var/tmp/ansible-acllc
   ls -ld /var/tmp/ansible-acllc'

HAProxy validation fails

Run:

sudo haproxy -c -f /etc/haproxy/haproxy.cfg
sudo journalctl -u haproxy -n 100 --no-pager

Correct the Jinja template in Git and promote the fix. Do not edit the generated HAProxy file manually as the permanent solution.

Keepalived is not assigning the VIP

Run:

sudo systemctl status keepalived --no-pager
sudo journalctl -u keepalived -n 100 --no-pager
ip -4 address show dev ens18
sudo /usr/local/sbin/check-haproxy.sh

Keepalived intentionally removes or withholds the VIP when the HAProxy health check fails.

HAProxy backends are DOWN

This is expected until these machines exist and listen on TCP 6443:

cicd-ac-k8s-cp-01  192.168.8.202:6443
cicd-ac-k8s-cp-02  192.168.8.203:6443
cicd-ac-k8s-cp-03  192.168.8.204:6443

Do not remove the backend definitions.

45. Source consistency status and expected next phase

This page is written for a from-scratch rebuild. It does not assume that the load-balancer VM, HAProxy, Keepalived, or the VIP already exist.

Documentation mode: from-scratch rebuild
Source alignment: cleaned ac-cicd-infra repository
Terraform allocation: cicd-ac-k8s-lb-01 / 3156201 / AA:BB:CC:05:14:01 / 192.168.8.201
API VIP: 192.168.8.200/22
Terraform workflow model: dev validates and plans; prod applies
Ansible workflow model: dev validates; prod configures
Terraform backend: persistent local state on prod-terraform-deploy-02
State path: /var/lib/ac-cicd-infra/terraform-state/shared-k8s/terraform.tfstate
HAProxy backends: 192.168.8.202-204:6443
Page completion target: load-balancer VM, HAProxy, Keepalived, runtime socket, and API VIP verified
Live environment completion assumed by this page: none

The embedded Terraform module, Terraform workflows, Ansible configuration, shared variables, common role, HAProxy role, Keepalived role, playbooks, and load-balancer workflow were reconciled with the cleaned ac-cicd-infra source. The initial shared-k8s/main.tf and outputs.tf shown on this page intentionally define only the first load-balancer VM; later pages expand that same stack with control planes and workers.

After this page has been executed successfully, the next infrastructure phase is to provision and configure:

cicd-ac-k8s-cp-01  192.168.8.202
cicd-ac-k8s-cp-02  192.168.8.203
cicd-ac-k8s-cp-03  192.168.8.204