Provision the First Kubernetes Load Balancer with Terraform
This page provisions only the first shared Kubernetes infrastructure VM:
VM name: cicd-ac-k8s-lb-01
VM ID: 3156201
MAC address: aa:bb:cc:05:14:01
Reserved IP: 192.168.8.201
API VIP: 192.168.8.200 (assigned later by Ansible and Keepalived)
CPU: 2 vCPU
RAM: 4096 MB
Boot disk: scsi0, 40G, local-lvm
Template: tmplt-ub-26-min-base
Template ID: 90000
Proxmox node: pve
Bridge: vmbr0
The VM is created automatically after the approved Terraform change reaches the prod branch. The GitHub Actions job runs on the repository-level self-hosted runner:
prod-ac-cicd-infra-deploy-rnr-01
1. Scope of this page
This page performs the first controlled infrastructure test in a clean rebuild:
- Add the reusable Proxmox VM Terraform module.
- Define only cicd-ac-k8s-lb-01 in the initial shared Kubernetes stack.
- Push the feature change to dev, where Terraform validates and produces a plan only.
- Promote the same reviewed change from dev to prod.
- Allow the prod workflow to apply only after the shared-k8s GitHub Environment approval.
- Verify that Proxmox created the VM and DHCP assigned 192.168.8.201.
- Configure the VM with Ansible from prod.
- Verify HAProxy, Keepalived, the runtime socket, and API VIP 192.168.8.200.
The Terraform portion first creates the VM. The Ansible section at the bottom of this same page then sets the Ubuntu hostname, updates /etc/hostname and /etc/hosts, installs HAProxy and Keepalived, assigns the API VIP, and verifies the completed load balancer.
This page does not bootstrap the Kubernetes control planes or workers.
Because tmplt-ub-26-min-base is a non-Cloud-Init template, Terraform changes the Proxmox VM name, but it may not change the hostname inside Ubuntu. The Ansible section on this page sets the guest hostname to cicd-ac-k8s-lb-01 and updates both /etc/hostname and /etc/hosts.
2. Files changed by this implementation
terraform/modules/proxmox-vm/
terraform/stacks/shared-k8s/
.github/workflows/terraform-plan-shared-k8s.yml
.github/workflows/terraform-apply-shared-k8s.yml
Do not add the other 15 VMs yet. The first prod apply must propose and create only one VM.
3. Prerequisites
Confirm all of the following before editing Terraform:
| Requirement | Expected value |
|---|---|
| GitHub repository | ASPIRECLAN-LLC-Org/ac-cicd-infra |
| Working branch | Feature branch created from dev |
| Runner VM | prod-terraform-deploy-02 / 192.168.8.93 |
| Runner | prod-ac-cicd-infra-deploy-rnr-01 |
| Runner status | Idle / online |
| Runner version | 2.327.1 or newer; verified implementation used 2.334.0 |
| Runner labels | self-hosted, Linux, X64, prod, terraform, deploy, ac-cicd-infra |
| Proxmox API | https://192.168.8.23:8006/api2/json |
| Proxmox node | pve |
| VM template | tmplt-ub-26-min-base / VM ID 90000 |
| Storage | local-lvm |
| Bridge | vmbr0 |
| DHCP reservation | AA:BB:CC:05:14:01 → 192.168.8.201 |
| API VIP reservation | 192.168.8.200 is unused and is not assigned to a DHCP client |
| Terraform backend | Persistent local backend on prod-terraform-deploy-02 |
| GitHub apply environment | shared-k8s |
Before provisioning, verify the load-balancer address and VIP are not responding:
ping -c 1 -W 1 192.168.8.200 || true
ping -c 1 -W 1 192.168.8.201 || true
ip neigh show | grep -E '192\.168\.8\.(200|201)' || true
Also inspect the router lease and reservation tables, because a device that blocks ICMP may not answer ping.
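The neighbor-table check can also be turned into a yes/no test. The following is a minimal sketch, not part of the approved procedure: address_in_neighbor_table is a hypothetical helper name, and it assumes the usual `ip neigh show` layout in which the address is the first field and the entry state is the last field.

```shell
# Hypothetical helper: reads `ip neigh show` output on stdin and succeeds
# when the given IPv4 address has a resolved neighbor entry.
address_in_neighbor_table() {
  local address="$1"
  # Compare the whole first field and ignore unresolved entries.
  awk -v ip="${address}" '
    $1 == ip && $NF != "FAILED" && $NF != "INCOMPLETE" { found = 1 }
    END { exit(found ? 0 : 1) }
  '
}

# Example use on the runner VM:
#   for address in 192.168.8.200 192.168.8.201; do
#     if ip neigh show | address_in_neighbor_table "${address}"; then
#       echo "WARNING: ${address} already has a neighbor entry"
#     fi
#   done
```

An empty neighbor table does not prove that an address is free, so the router lease and reservation tables remain the authoritative check.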
terraform/stacks/shared-k8s/backend.tf must contain a valid Terraform backend block. This page uses the approved persistent local backend on prod-terraform-deploy-02. The state file must remain outside the GitHub runner checkout.
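The rule that the state file stays outside the runner checkout could be enforced with a small guard. This is a sketch under stated assumptions: state_path_outside_checkout is a hypothetical helper name, both arguments are absolute paths, and in a workflow step the checkout directory would come from the GITHUB_WORKSPACE variable that GitHub Actions sets.

```shell
# Hypothetical helper: succeeds only when the state path is outside the checkout.
# Both arguments are expected to be absolute paths.
state_path_outside_checkout() {
  local state_path="$1"
  local checkout_dir="${2%/}"   # drop one trailing slash, if present
  case "${state_path}" in
    "${checkout_dir}"|"${checkout_dir}"/*) return 1 ;;
    *) return 0 ;;
  esac
}

# Example use inside a workflow step:
#   state_path_outside_checkout "${TF_STATE_PATH}" "${GITHUB_WORKSPACE}" || {
#     echo "ERROR: Terraform state must not live inside the checkout."
#     exit 1
#   }
```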
4. Create a feature branch from dev
Run from Windows PowerShell:
cd D:\code\ASPIRECLAN-LLC-Org\ac-cicd-infra
git switch dev
git pull --ff-only origin dev
git switch -c feature/provision-k8s-load-balancer
The current infrastructure workflow model uses dev for validation and planning and prod for apply and configuration. The retained local, qa, and main branches are not part of this page's automated execution path.
5. Implement the reusable Proxmox VM module
5.1 Replace terraform/modules/proxmox-vm/variables.tf
variable "name" {
  description = "Proxmox VM name."
  type        = string
}
variable "description" {
  description = "Description shown in Proxmox."
  type        = string
  default     = ""
}
variable "vmid" {
  description = "Unique Proxmox VM ID."
  type        = number
}
variable "target_node" {
  description = "Proxmox node where the VM will be created."
  type        = string
}
variable "template_name" {
  description = "Proxmox template name used for cloning."
  type        = string
}
variable "cores" {
  description = "Number of CPU cores."
  type        = number
}
variable "memory_mb" {
  description = "RAM allocated to the VM in MB."
  type        = number
}
variable "disk_size" {
  description = "Boot disk size, such as 40G."
  type        = string
}
variable "storage" {
  description = "Proxmox storage used for the boot disk."
  type        = string
}
variable "bridge" {
  description = "Proxmox network bridge."
  type        = string
}
variable "mac_address" {
  description = "Permanent VM MAC address."
  type        = string
  validation {
    condition     = can(regex("^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$", var.mac_address))
    error_message = "mac_address must be a valid six-byte colon-separated MAC address."
  }
}
variable "tags" {
  description = "Proxmox tags."
  type        = list(string)
  default     = []
}
5.2 Replace terraform/modules/proxmox-vm/main.tf
resource "proxmox_vm_qemu" "this" {
  name                   = var.name
  desc                   = var.description
  vmid                   = var.vmid
  target_node            = var.target_node
  clone                  = var.template_name
  full_clone             = true
  agent                  = 1
  define_connection_info = false
  skip_ipv6              = true
  vm_state               = "running"
  onboot                 = true
  boot                   = "order=scsi0;net0"
  bootdisk               = "scsi0"
  scsihw                 = "virtio-scsi-pci"
  memory                 = var.memory_mb
  balloon                = 0
  tablet                 = false
  cpu {
    cores   = var.cores
    sockets = 1
    type    = "host"
  }
  disk {
    slot    = "scsi0"
    type    = "disk"
    storage = var.storage
    size    = var.disk_size
    discard = true
    backup  = true
  }
  network {
    id       = 0
    model    = "virtio"
    bridge   = var.bridge
    macaddr  = var.mac_address
    firewall = false
  }
  tags = join(",", var.tags)
}
5.3 Replace terraform/modules/proxmox-vm/outputs.tf
output "name" {
  description = "Created VM name."
  value       = proxmox_vm_qemu.this.name
}
output "vmid" {
  description = "Created Proxmox VM ID."
  value       = proxmox_vm_qemu.this.vmid
}
output "target_node" {
  description = "Proxmox node hosting the VM."
  value       = proxmox_vm_qemu.this.target_node
}
5.4 Verify terraform/modules/proxmox-vm/versions.tf
terraform {
  required_version = ">= 1.15.5"
  required_providers {
    proxmox = {
      source  = "Telmate/proxmox"
      version = "3.0.1-rc9"
    }
  }
}
6. Implement the shared-k8s Terraform stack
6.1 Replace terraform/stacks/shared-k8s/variables.tf
variable "pm_api_url" {
  description = "Proxmox API URL."
  type        = string
}
variable "pm_api_token_id" {
  description = "Proxmox API token ID."
  type        = string
  sensitive   = true
}
variable "pm_api_token_secret" {
  description = "Proxmox API token secret."
  type        = string
  sensitive   = true
}
variable "pm_tls_insecure" {
  description = "Allow the Proxmox self-signed TLS certificate."
  type        = bool
  default     = true
}
6.2 Replace terraform/stacks/shared-k8s/providers.tf
terraform {
  required_version = ">= 1.15.5"
  required_providers {
    proxmox = {
      source  = "Telmate/proxmox"
      version = "3.0.1-rc9"
    }
  }
}
provider "proxmox" {
  pm_api_url          = var.pm_api_url
  pm_api_token_id     = var.pm_api_token_id
  pm_api_token_secret = var.pm_api_token_secret
  pm_tls_insecure     = var.pm_tls_insecure
}
6.3 Replace terraform/stacks/shared-k8s/main.tf
This first version defines only the load-balancer VM.
module "api_load_balancer" {
  source        = "../../modules/proxmox-vm"
  name          = "cicd-ac-k8s-lb-01"
  description   = "Aspireclan shared Kubernetes API load balancer"
  vmid          = 3156201
  target_node   = "pve"
  template_name = "tmplt-ub-26-min-base"
  cores         = 2
  memory_mb     = 4096
  disk_size     = "40G"
  storage       = "local-lvm"
  bridge        = "vmbr0"
  mac_address   = "aa:bb:cc:05:14:01"
  tags = [
    "ac-cicd",
    "shared-k8s",
    "load-balancer",
    "terraform",
    "ansible",
  ]
}
6.4 Replace terraform/stacks/shared-k8s/outputs.tf
output "api_load_balancer" {
  description = "Kubernetes API load-balancer VM."
  value = {
    name        = module.api_load_balancer.name
    vmid        = module.api_load_balancer.vmid
    target_node = module.api_load_balancer.target_node
    reserved_ip = "192.168.8.201"
    api_vip     = "192.168.8.200"
  }
}
6.5 Keep or update terraform.tfvars.example
This file is documentation only. The current stack uses explicit approved values for the first VM.
# Non-sensitive examples only.
proxmox_node = "pve"
proxmox_storage = "local-lvm"
proxmox_bridge = "vmbr0"
proxmox_template = "tmplt-ub-26-min-base"
6.6 Configure the persistent local backend
Replace terraform/stacks/shared-k8s/backend.tf with:
terraform {
  backend "local" {}
}
Both Terraform workflows use the same persistent state path on prod-terraform-deploy-02:
/var/lib/ac-cicd-infra/terraform-state/shared-k8s/terraform.tfstate
Create the directory once on prod-terraform-deploy-02:
sudo install -d -m 700 -o acllc -g acllc /var/lib/ac-cicd-infra/terraform-state/shared-k8s
touch /var/lib/ac-cicd-infra/terraform-state/shared-k8s/.write-test
rm /var/lib/ac-cicd-infra/terraform-state/shared-k8s/.write-test
The state path is outside the GitHub Actions checkout. Keep the repository-level Terraform runner count at one while this local backend is in use.
7. Configure the GitHub variable and secrets
Run from Windows PowerShell. The secret commands prompt for their values and do not display them afterward.
cd D:\code\ASPIRECLAN-LLC-Org\ac-cicd-infra
gh variable set PM_API_URL `
--repo ASPIRECLAN-LLC-Org/ac-cicd-infra `
--body "https://192.168.8.23:8006/api2/json"
gh secret set PM_API_TOKEN_ID `
--repo ASPIRECLAN-LLC-Org/ac-cicd-infra
gh secret set PM_API_TOKEN_SECRET `
--repo ASPIRECLAN-LLC-Org/ac-cicd-infra
Verify only the names:
gh variable list `
--repo ASPIRECLAN-LLC-Org/ac-cicd-infra
gh secret list `
--repo ASPIRECLAN-LLC-Org/ac-cicd-infra
Expected names:
PM_API_URL
PM_API_TOKEN_ID
PM_API_TOKEN_SECRET
No cloud-backend credentials are required for the current local backend. The runner service account must have read/write access to /var/lib/ac-cicd-infra/terraform-state/shared-k8s.
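The name check for the variable and the two secrets could also be scripted from a Bash shell. This is a minimal sketch: missing_names is a hypothetical helper name, and it assumes that the listing printed by gh has the name in the first whitespace-separated column.

```shell
# Hypothetical helper: reads a `gh variable list` or `gh secret list` listing
# on stdin and prints every required name that does not appear in it.
missing_names() {
  local listing name
  listing="$(awk '{ print $1 }')"   # keep only the name column
  for name in "$@"; do
    if ! printf '%s\n' "${listing}" | grep -Fxq -- "${name}"; then
      printf '%s\n' "${name}"
    fi
  done
}

# Example use:
#   gh secret list --repo ASPIRECLAN-LLC-Org/ac-cicd-infra \
#     | missing_names PM_API_TOKEN_ID PM_API_TOKEN_SECRET
#   gh variable list --repo ASPIRECLAN-LLC-Org/ac-cicd-infra \
#     | missing_names PM_API_URL
```

Empty output means every required name is present. The helper never reads or prints secret values.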
8. Configure the shared-k8s GitHub Environment
In GitHub, open:
ac-cicd-infra
→ Settings
→ Environments
→ New environment
→ shared-k8s
Recommended protection:
- Allow deployments only from prod.
- Add a required reviewer before Terraform apply.
- Keep Terraform validation and planning automatic on dev.
- Do not expose the environment to local, qa, or main.
The apply workflow starts after the reviewed change reaches prod, waits for environment approval, and then creates the VM.
9. Replace the Terraform plan workflow
Replace .github/workflows/terraform-plan-shared-k8s.yml with:
name: Terraform Plan - Shared Kubernetes
on:
  push:
    branches:
      - dev
    paths:
      - "terraform/modules/**"
      - "terraform/stacks/shared-k8s/**"
      - ".github/workflows/terraform-plan-shared-k8s.yml"
      - ".github/workflows/terraform-apply-shared-k8s.yml"
  workflow_dispatch:
permissions:
  contents: read
concurrency:
  group: shared-k8s-terraform
  cancel-in-progress: false
env:
  TF_IN_AUTOMATION: "true"
  TF_INPUT: "false"
  TF_WORKING_DIR: "terraform/stacks/shared-k8s"
  TF_STATE_PATH: "/var/lib/ac-cicd-infra/terraform-state/shared-k8s/terraform.tfstate"
  TF_VAR_pm_api_url: ${{ vars.PM_API_URL }}
  TF_VAR_pm_api_token_id: ${{ secrets.PM_API_TOKEN_ID }}
  TF_VAR_pm_api_token_secret: ${{ secrets.PM_API_TOKEN_SECRET }}
  TF_VAR_pm_tls_insecure: "true"
jobs:
  plan:
    name: Validate and plan shared Kubernetes
    runs-on:
      - self-hosted
      - Linux
      - X64
      - prod
      - terraform
      - deploy
      - ac-cicd-infra
    timeout-minutes: 45
    steps:
      - name: Checkout repository
        uses: actions/checkout@v5
      - name: Set up Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: "1.15.5"
          terraform_wrapper: false
      - name: Display execution context
        shell: bash
        run: |
          set -euo pipefail
          echo "Repository: ${GITHUB_REPOSITORY}"
          echo "Event: ${GITHUB_EVENT_NAME}"
          echo "Branch: ${GITHUB_REF_NAME}"
          echo "Commit: ${GITHUB_SHA}"
          echo "Runner: ${RUNNER_NAME}"
      - name: Verify Terraform stack
        shell: bash
        run: |
          set -euo pipefail
          required_files=(
            "main.tf"
            "variables.tf"
            "outputs.tf"
            "providers.tf"
            "backend.tf"
          )
          for file in "${required_files[@]}"; do
            if [ ! -f "${TF_WORKING_DIR}/${file}" ]; then
              echo "ERROR: Missing ${TF_WORKING_DIR}/${file}"
              exit 1
            fi
          done
      - name: Terraform format check
        shell: bash
        run: |
          set -euo pipefail
          terraform fmt -check -recursive terraform
      - name: Terraform init
        working-directory: ${{ env.TF_WORKING_DIR }}
        shell: bash
        run: |
          set -euo pipefail
          terraform init \
            -input=false \
            -backend-config="path=${TF_STATE_PATH}"
      - name: Verify Terraform state location
        shell: bash
        run: |
          set -euo pipefail
          STATE_DIRECTORY="$(dirname "${TF_STATE_PATH}")"
          echo "Terraform state path:"
          echo " ${TF_STATE_PATH}"
          if [ ! -d "${STATE_DIRECTORY}" ]; then
            echo "ERROR: State directory does not exist: ${STATE_DIRECTORY}"
            exit 1
          fi
          if [ ! -w "${STATE_DIRECTORY}" ]; then
            echo "ERROR: Runner cannot write to: ${STATE_DIRECTORY}"
            exit 1
          fi
          echo "Terraform state directory is writable."
      - name: Terraform validate
        working-directory: ${{ env.TF_WORKING_DIR }}
        shell: bash
        run: |
          set -euo pipefail
          terraform validate -no-color
      - name: Terraform plan
        working-directory: ${{ env.TF_WORKING_DIR }}
        shell: bash
        run: |
          set -euo pipefail
          terraform plan \
            -input=false \
            -lock-timeout=5m \
            -no-color
The plan workflow runs on:
- Pushes to dev that change the Terraform module, shared stack, or either Terraform workflow.
- Manual dispatch when the selected branch contains the intended code.
It never performs terraform apply. It does not use a pull_request trigger.
10. Replace the Terraform apply workflow
Replace .github/workflows/terraform-apply-shared-k8s.yml with:
name: Terraform Apply - Shared Kubernetes
on:
  push:
    branches:
      - prod
    paths:
      - "terraform/modules/**"
      - "terraform/stacks/shared-k8s/**"
      - ".github/workflows/terraform-apply-shared-k8s.yml"
  workflow_dispatch:
permissions:
  contents: read
concurrency:
  group: shared-k8s-terraform
  cancel-in-progress: false
env:
  TF_IN_AUTOMATION: "true"
  TF_INPUT: "false"
  TF_WORKING_DIR: "terraform/stacks/shared-k8s"
  TF_STATE_PATH: "/var/lib/ac-cicd-infra/terraform-state/shared-k8s/terraform.tfstate"
  TF_VAR_pm_api_url: ${{ vars.PM_API_URL }}
  TF_VAR_pm_api_token_id: ${{ secrets.PM_API_TOKEN_ID }}
  TF_VAR_pm_api_token_secret: ${{ secrets.PM_API_TOKEN_SECRET }}
  TF_VAR_pm_tls_insecure: "true"
jobs:
  apply:
    name: Apply shared Kubernetes infrastructure
    environment:
      name: shared-k8s
    runs-on:
      - self-hosted
      - Linux
      - X64
      - prod
      - terraform
      - deploy
      - ac-cicd-infra
    timeout-minutes: 90
    steps:
      - name: Checkout repository
        uses: actions/checkout@v5
      - name: Set up Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: "1.15.5"
          terraform_wrapper: false
      - name: Verify production branch
        shell: bash
        run: |
          set -euo pipefail
          if [ "${GITHUB_REF_NAME}" != "prod" ]; then
            echo "ERROR: Terraform apply is permitted only from prod."
            exit 1
          fi
      - name: Terraform format check
        shell: bash
        run: |
          set -euo pipefail
          terraform fmt -check -recursive terraform
      - name: Terraform init
        working-directory: ${{ env.TF_WORKING_DIR }}
        shell: bash
        run: |
          set -euo pipefail
          terraform init \
            -input=false \
            -backend-config="path=${TF_STATE_PATH}"
      - name: Verify Terraform state location
        shell: bash
        run: |
          set -euo pipefail
          STATE_DIRECTORY="$(dirname "${TF_STATE_PATH}")"
          echo "Terraform state path:"
          echo " ${TF_STATE_PATH}"
          if [ ! -d "${STATE_DIRECTORY}" ]; then
            echo "ERROR: State directory does not exist: ${STATE_DIRECTORY}"
            exit 1
          fi
          if [ ! -w "${STATE_DIRECTORY}" ]; then
            echo "ERROR: Runner cannot write to: ${STATE_DIRECTORY}"
            exit 1
          fi
          echo "Terraform state directory is writable."
      - name: Terraform validate
        working-directory: ${{ env.TF_WORKING_DIR }}
        shell: bash
        run: |
          set -euo pipefail
          terraform validate -no-color
      - name: Create saved Terraform plan
        working-directory: ${{ env.TF_WORKING_DIR }}
        shell: bash
        run: |
          set -euo pipefail
          terraform plan \
            -input=false \
            -lock-timeout=5m \
            -out=tfplan
      - name: Apply saved Terraform plan
        working-directory: ${{ env.TF_WORKING_DIR }}
        shell: bash
        run: |
          set -euo pipefail
          terraform apply \
            -input=false \
            -lock-timeout=5m \
            tfplan
      - name: Display outputs
        working-directory: ${{ env.TF_WORKING_DIR }}
        shell: bash
        run: |
          set -euo pipefail
          terraform output
The apply workflow runs only for prod pushes or an explicitly dispatched run using the prod branch. The shared-k8s environment remains the approval gate. The workflow creates and applies a saved Terraform plan against the same persistent state used by the dev plan workflow.
11. Review the change before committing
Run from Windows PowerShell:
git status
git diff --check
git diff --stat
git diff -- `
terraform/modules/proxmox-vm `
terraform/stacks/shared-k8s `
.github/workflows/terraform-plan-shared-k8s.yml `
.github/workflows/terraform-apply-shared-k8s.yml
Confirm:
- No .tfstate or terraform.tfvars file is staged.
- No Proxmox token is present in the diff.
- Only the first load-balancer VM is defined.
- The MAC address is aa:bb:cc:05:14:01.
- The VM ID is 3156201.
- The disk is scsi0, 40G, and local-lvm.
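The Terraform configuration writes the MAC address in lowercase, while the DHCP reservation and Proxmox show it in uppercase, so any scripted comparison should ignore case. A minimal sketch follows. is_valid_mac reuses the pattern from the module's variable validation; same_mac is a hypothetical helper name.

```shell
# Same pattern as the mac_address validation in the Terraform module.
is_valid_mac() {
  printf '%s\n' "$1" | grep -Eq '^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$'
}

# Hypothetical helper: compares two MAC addresses without regard to case.
same_mac() {
  is_valid_mac "$1" && is_valid_mac "$2" || return 1
  [ "$(printf '%s' "$1" | tr 'A-F' 'a-f')" = "$(printf '%s' "$2" | tr 'A-F' 'a-f')" ]
}

# Example use:
#   same_mac "aa:bb:cc:05:14:01" "AA:BB:CC:05:14:01" && echo "MAC matches the reservation"
```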
12. Commit, push, and open the dev pull request
git add `
terraform/modules/proxmox-vm `
terraform/stacks/shared-k8s `
.github/workflows/terraform-plan-shared-k8s.yml `
.github/workflows/terraform-apply-shared-k8s.yml
git commit -m "Provision shared Kubernetes API load balancer"
git push -u origin feature/provision-k8s-load-balancer
Create the pull request into dev:
gh pr create `
--base dev `
--head feature/provision-k8s-load-balancer `
--title "Provision shared Kubernetes API load balancer" `
--body "Adds Terraform and GitHub Actions configuration for cicd-ac-k8s-lb-01. The dev workflow validates and plans only."
Merge the pull request only after reviewing the Terraform module, stack, workflow paths, state path, VM identity, and disk configuration.
13. Validate the dev Terraform plan
After the feature pull request is merged into dev, open:
GitHub repository
→ Actions
→ Terraform Plan - Shared Kubernetes
→ Latest dev run
The plan must end with:
Plan: 1 to add, 0 to change, 0 to destroy.
Do not promote the change when the plan proposes:
- More than one VM.
- A change or deletion to an existing Proxmox VM.
- A disk smaller than 40G.
- A disk slot other than scsi0.
- A VM ID other than 3156201.
- A MAC address other than aa:bb:cc:05:14:01.
- A state path other than /var/lib/ac-cicd-infra/terraform-state/shared-k8s/terraform.tfstate.
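The summary-line part of this review can be checked mechanically when the plan output has been saved to a file. The sketch below assumes the plan was produced with -no-color, as in the plan workflow; plan_adds_exactly_one is a hypothetical helper name.

```shell
# Hypothetical helper: reads `terraform plan -no-color` output on stdin and
# succeeds only when the summary is exactly one addition and nothing else.
plan_adds_exactly_one() {
  grep -Fxq 'Plan: 1 to add, 0 to change, 0 to destroy.'
}

# Example use with a saved copy of the plan log:
#   plan_adds_exactly_one < plan.txt || echo "Do not promote: unexpected plan summary"
```

This covers only the resource counts. The disk, VM ID, MAC address, and state path still need to be read in the plan itself.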
14. Review the dev to prod promotion
After the dev plan succeeds, refresh the remote branches and review exactly what will be promoted:
git fetch origin
git log --oneline origin/prod..origin/dev
git diff --check origin/prod...origin/dev
git diff --stat origin/prod...origin/dev
Confirm that the dev branch contains only the approved load-balancer Terraform implementation and its two workflows. Do not promote unrelated infrastructure changes in the same pull request.
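The scope check on the promotion can be sketched as a filter over the changed file names, run from a Bash shell. The allowed paths below are the four locations listed in section 2; unexpected_changes is a hypothetical helper name.

```shell
# Hypothetical helper: reads changed file paths on stdin, prints every path
# outside the approved locations, and fails when it finds at least one.
unexpected_changes() {
  awk '
    $0 ~ "^terraform/modules/proxmox-vm/" { next }
    $0 ~ "^terraform/stacks/shared-k8s/" { next }
    $0 == ".github/workflows/terraform-plan-shared-k8s.yml" { next }
    $0 == ".github/workflows/terraform-apply-shared-k8s.yml" { next }
    { print; bad = 1 }
    END { exit(bad ? 1 : 0) }
  '
}

# Example use:
#   git diff --name-only origin/prod...origin/dev | unexpected_changes \
#     || echo "Review the paths listed above before promoting."
```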
15. Create the prod pull request
Create the production pull request directly from dev:
gh pr create `
--base prod `
--head dev `
--title "Provision Kubernetes API load balancer" `
--body "Promotes the validated cicd-ac-k8s-lb-01 Terraform configuration from dev to prod for approved apply."
Review the pull request files and confirm that its source branch is dev and its target branch is prod.
Open the pull request in GitHub when needed:
gh pr view `
--repo ASPIRECLAN-LLC-Org/ac-cicd-infra `
--web
16. Apply after the change reaches prod
After the approved pull request is merged into prod:
- GitHub starts Terraform Apply - Shared Kubernetes.
- The shared-k8s environment requests approval when protection is enabled. Because the environment is declared on the job, the job does not start until the approval is granted.
- The repository-level runner accepts the job.
- Terraform reads the persistent local state on prod-terraform-deploy-02.
- Terraform validates the stack and creates a saved plan.
- Terraform applies that exact saved plan.
- Proxmox full-clones tmplt-ub-26-min-base.
- The new VM starts automatically.
- The router gives it 192.168.8.201 based on MAC AA:BB:CC:05:14:01.
Do not run the apply workflow from dev, qa, local, or main.
17. Verify the VM in Proxmox
Run on the Proxmox host:
qm status 3156201
qm config 3156201
Expected important values:
status: running
name: cicd-ac-k8s-lb-01
memory: 4096
cores: 2
scsi0: local-lvm
net0: virtio=AA:BB:CC:05:14:01,bridge=vmbr0
The exact scsi0 line contains additional storage-volume information; confirm that it is on local-lvm and has a size of 40G.
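The scsi0 confirmation could be scripted on the Proxmox host. This is a sketch under an assumption about the line layout: it expects `qm config` to print the disk as `scsi0: <storage>:<volume>,<option>,...,size=<size>`, and scsi0_matches is a hypothetical helper name.

```shell
# Hypothetical helper: reads `qm config <vmid>` output on stdin and succeeds
# when scsi0 is on the expected storage and has the expected size.
scsi0_matches() {
  local storage="$1" size="$2"
  awk -v storage="${storage}" -v size="${size}" '
    $1 == "scsi0:" {
      # The volume reference starts with "<storage>:".
      on_storage = (index($2, storage ":") == 1)
      n = split($2, options, ",")
      for (i = 1; i <= n; i++) if (options[i] == "size=" size) has_size = 1
    }
    END { exit((on_storage && has_size) ? 0 : 1) }
  '
}

# Example use:
#   qm config 3156201 | scsi0_matches local-lvm 40G && echo "scsi0 is correct"
```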
18. Verify the DHCP address and SSH access
Run from prod-terraform-deploy-02:
ping -c 4 192.168.8.201
ssh -i ~/.ssh/id_ed25519_ansible -o IdentitiesOnly=yes -o BatchMode=yes -o ConnectTimeout=10 acllc@192.168.8.201 'hostnamectl --static; ip -brief address; sudo -n whoami'
Expected results:
- Ping reaches 192.168.8.201.
- SSH works with the Ansible automation key.
- ens18 has 192.168.8.201/22 through DHCP.
- sudo -n whoami returns root.
The guest hostname may still show the template hostname. That is expected until the Ansible common baseline runs.
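The address check on ens18 can be automated by parsing the `ip -brief address` output returned over SSH. A minimal sketch, assuming the brief layout of interface name, state, then addresses; interface_has_address is a hypothetical helper name.

```shell
# Hypothetical helper: reads `ip -brief address` output on stdin and succeeds
# when the named interface carries the expected CIDR address.
interface_has_address() {
  local interface="$1" cidr="$2"
  awk -v ifname="${interface}" -v cidr="${cidr}" '
    $1 == ifname { for (i = 3; i <= NF; i++) if ($i == cidr) found = 1 }
    END { exit(found ? 0 : 1) }
  '
}

# Example use from prod-terraform-deploy-02:
#   ssh -i ~/.ssh/id_ed25519_ansible -o IdentitiesOnly=yes acllc@192.168.8.201 \
#     'ip -brief address' | interface_has_address ens18 192.168.8.201/22
```

The same helper could later confirm the API VIP by checking for 192.168.8.200/22 on ens18 once Keepalived is running.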
You can also ask the Proxmox guest agent for network information:
qm guest cmd 3156201 network-get-interfaces
19. Successful completion state
At the end of this page, the environment should match:
Proxmox VM name: cicd-ac-k8s-lb-01
Proxmox VM ID: 3156201
VM state: running
CPU: 2 cores
RAM: 4096 MB
Boot disk: scsi0 on local-lvm, 40G
MAC: AA:BB:CC:05:14:01
DHCP address: 192.168.8.201
API VIP 192.168.8.200: not assigned yet
HAProxy: not installed yet
Keepalived: not installed yet
Do not expect 192.168.8.200 to answer yet. The API VIP is assigned later by Keepalived.
20. Failure handling
Plan fails
Do not promote the branch. Correct the Terraform or backend configuration on the feature branch and push another commit.
Apply fails before creating the VM
Review the workflow log, correct the issue, and rerun the same prod workflow. Do not manually create the VM in Proxmox.
VM exists but the workflow reports failure
Do not delete it manually. First inspect:
qm status 3156201
qm config 3156201
Then review the persistent local Terraform state on prod-terraform-deploy-02 and the workflow log. Terraform must remain the authoritative owner of the VM.
Wrong IP address
Confirm that the VM network device has MAC AA:BB:CC:05:14:01, then verify the router reservation and renew the DHCP lease. Do not configure a static Netplan address inside Ubuntu.
21. Ansible files configured in the next section
After the VM creation and SSH checks pass, continue with the Ansible implementation below:
ansible/ansible.cfg
ansible/inventories/shared-k8s/hosts.ini
ansible/inventories/shared-k8s/group_vars/all.yml
ansible/inventories/shared-k8s/group_vars/load_balancers.yml
ansible/roles/common/tasks/main.yml
ansible/roles/haproxy/tasks/main.yml
ansible/roles/haproxy/handlers/main.yml
ansible/roles/haproxy/templates/haproxy.cfg.j2
ansible/roles/keepalived/tasks/main.yml
ansible/roles/keepalived/handlers/main.yml
ansible/roles/keepalived/files/check-haproxy.sh
ansible/roles/keepalived/templates/keepalived.conf.j2
ansible/playbooks/shared-k8s/01-common-baseline.yml
ansible/playbooks/shared-k8s/02-configure-load-balancer.yml
.github/workflows/ansible-configure-load-balancer.yml
That phase will:
- Set the Ubuntu hostname to cicd-ac-k8s-lb-01.
- Apply the common Ubuntu baseline.
- Install and configure HAProxy.
- Install and configure Keepalived.
- Assign the Kubernetes API VIP 192.168.8.200.
- Add future control-plane backends 192.168.8.202–204:6443.
The control-plane backends may initially be reported as unavailable until the three control-plane VMs are provisioned and Kubernetes is listening on port 6443.
Configure the Load Balancer with Ansible
The Terraform phase created the Ubuntu VM. This section configures that VM automatically from the ac-cicd-infra repository after the Ansible change reaches prod.
22. What the Ansible phase configures
The Ansible phase performs all of the following:
- Confirms that node_primary_ip and node_interface are defined in inventory.
- Confirms the detected address is 192.168.8.201 and the interface is ens18.
- Uses ansible_facts instead of deprecated injected top-level fact variables.
- Uses /var/tmp/ansible-acllc as the explicit Ansible remote temporary directory.
- Force-refreshes the APT package index before every package-installing role.
- Installs ca-certificates, curl, gpg, jq, qemu-guest-agent, and UFW.
- Sets the active hostname to cicd-ac-k8s-lb-01.
- Writes cicd-ac-k8s-lb-01 to /etc/hostname.
- Updates the 127.0.1.1 hostname line in /etc/hosts.
- Adds the API VIP and future control-plane names to /etc/hosts.
- Installs and validates HAProxy and socat.
- Configures the TCP frontend on port 443.
- Adds Kubernetes API backends on 192.168.8.202–204:6443.
- Creates and validates the HAProxy runtime socket at /run/haproxy/admin.sock.
- Creates the local-only statistics page at 127.0.0.1:8404/stats.
- Installs and configures Keepalived.
- Assigns 192.168.8.200/22 to ens18.
- Enables HAProxy and Keepalived at boot.
- Verifies the hostname, services, listener, and VIP.
The control-plane backend health checks will initially show DOWN. That is expected until the three control-plane VMs exist and Kubernetes is listening on port 6443.
23. Files changed by the Ansible implementation
ansible/ansible.cfg
ansible/inventories/shared-k8s/hosts.ini
ansible/inventories/shared-k8s/group_vars/all.yml
ansible/inventories/shared-k8s/group_vars/load_balancers.yml
ansible/roles/common/tasks/main.yml
ansible/roles/haproxy/tasks/main.yml
ansible/roles/haproxy/handlers/main.yml
ansible/roles/haproxy/templates/haproxy.cfg.j2
ansible/roles/keepalived/tasks/main.yml
ansible/roles/keepalived/handlers/main.yml
ansible/roles/keepalived/files/check-haproxy.sh
ansible/roles/keepalived/templates/keepalived.conf.j2
ansible/playbooks/shared-k8s/01-common-baseline.yml
ansible/playbooks/shared-k8s/02-configure-load-balancer.yml
.github/workflows/ansible-configure-load-balancer.yml
24. Verify Ansible on the repository runner
Run once on prod-terraform-deploy-02:
ansible --version
ansible-playbook --version
ssh-keyscan -h 2>&1 | head
When Ansible or the OpenSSH client is missing:
sudo apt update
sudo apt install -y ansible-core openssh-client
ansible --version
ansible-playbook --version
The GitHub Actions workflow intentionally verifies these tools instead of silently installing a different version during every run.
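A verify-only tool check of this kind might look like the following sketch. It is not the workflow's actual step: require_tools is a hypothetical helper name, and the tool list in the usage comment is an assumption based on the commands used on this page.

```shell
# Hypothetical helper: reports every missing command on stderr and fails
# when at least one is absent. It never installs anything.
require_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "${tool}" >/dev/null 2>&1; then
      echo "ERROR: required tool not found: ${tool}" >&2
      missing=1
    fi
  done
  return "${missing}"
}

# Example use in a workflow step:
#   require_tools ansible ansible-playbook ssh ssh-keyscan || exit 1
```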
25. Confirm the runner-side SSH key
The repository-level runner service on prod-terraform-deploy-02 must run as the Linux user that owns:
~/.ssh/id_ed25519_ansible
On prod-terraform-deploy-02, verify:
whoami
ls -l ~/.ssh/id_ed25519_ansible ~/.ssh/id_ed25519_ansible.pub
chmod 600 ~/.ssh/id_ed25519_ansible
ssh -i ~/.ssh/id_ed25519_ansible -o IdentitiesOnly=yes acllc@192.168.8.201 'hostnamectl --static; sudo -n whoami'
Expected:
The SSH connection succeeds.
sudo -n whoami returns root.
Do not commit either SSH key to Git.
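The mode 600 requirement on the private key can be checked without changing the file. A minimal sketch follows; key_mode_is_private is a hypothetical helper name, and it reads the permission string from `ls -ld` so that it does not depend on a particular `stat` implementation.

```shell
# Hypothetical helper: succeeds when the path is a regular file that only
# its owner can read and write (mode 600).
key_mode_is_private() {
  [ -f "$1" ] && [ "$(ls -ld "$1" | cut -c1-10)" = "-rw-------" ]
}

# Example use:
#   key_mode_is_private ~/.ssh/id_ed25519_ansible \
#     || echo "Run: chmod 600 ~/.ssh/id_ed25519_ansible"
```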
26. Create the Ansible feature branch from dev
Run from Windows PowerShell:
cd D:\code\ASPIRECLAN-LLC-Org\ac-cicd-infra
git switch dev
git pull --ff-only origin dev
git switch -c feature/configure-k8s-load-balancer
This follows the same source flow as Terraform: validate after the feature reaches dev, then configure only after the reviewed change reaches prod.
27. Replace ansible/ansible.cfg
[defaults]
inventory = inventories/shared-k8s/hosts.ini
roles_path = roles
host_key_checking = True
retry_files_enabled = False
interpreter_python = auto_silent
stdout_callback = default
inject_facts_as_vars = False
remote_tmp = /var/tmp/ansible-acllc
timeout = 30
[ssh_connection]
pipelining = True
ssh_args = -o IdentitiesOnly=yes -o ServerAliveInterval=30 -o ServerAliveCountMax=4
This configuration permanently addresses two warnings:
- inject_facts_as_vars = False requires roles to use ansible_facts[...], preventing the Ansible 2.24 fact-injection deprecation warning.
- remote_tmp = /var/tmp/ansible-acllc prevents temporary module files from being created under /root/.ansible/tmp.
The production workflow creates this directory with the correct owner and mode before the first Ansible connection. You can create or verify it manually with:
ssh -i ~/.ssh/id_ed25519_ansible -o IdentitiesOnly=yes acllc@192.168.8.201 'sudo install -d -m 0700 -o acllc -g acllc /var/tmp/ansible-acllc
ls -ld /var/tmp/ansible-acllc'
28. Replace the shared Kubernetes inventory
Replace ansible/inventories/shared-k8s/hosts.ini with:
[load_balancers]
cicd-ac-k8s-lb-01 ansible_host=192.168.8.201 ansible_user=acllc node_primary_ip=192.168.8.201 node_interface=ens18
[first_control_plane]
# Added by the control-plane page:
# cicd-ac-k8s-cp-01 ansible_host=192.168.8.202 ansible_user=acllc node_primary_ip=192.168.8.202 node_interface=ens18
[additional_control_planes]
# Added by the control-plane page:
# cicd-ac-k8s-cp-02 ansible_host=192.168.8.203 ansible_user=acllc node_primary_ip=192.168.8.203 node_interface=ens18
# cicd-ac-k8s-cp-03 ansible_host=192.168.8.204 ansible_user=acllc node_primary_ip=192.168.8.204 node_interface=ens18
[control_planes:children]
first_control_plane
additional_control_planes
[dev_workers]
[qa_workers]
[prod_workers]
[workers:children]
dev_workers
qa_workers
prod_workers
[k8s_cluster:children]
control_planes
workers
[all:vars]
ansible_python_interpreter=/usr/bin/python3
Only the load balancer has an active host entry at this stage. The control-plane groups already exist, but their host lines remain comments until the control-plane page provisions those VMs. The load-balancer entry includes the node_primary_ip and node_interface values required by the current common role.
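Because the common role fails when either identity variable is missing, the inventory can be checked before the playbook runs. This is a sketch, not part of the repository: hosts_missing_identity is a hypothetical helper name, and it treats any uncommented line containing ansible_host= as an active host entry.

```shell
# Hypothetical helper: reads an INI inventory on stdin and prints every active
# host that lacks node_primary_ip or node_interface. Fails when any is found.
hosts_missing_identity() {
  awk '
    NF == 0 { next }                                      # blank line
    { first = substr($1, 1, 1) }
    first == "#" || first == ";" || first == "[" { next } # comment or group header
    index($0, "ansible_host=") == 0 { next }              # child group or group variable
    index($0, "node_primary_ip=") == 0 || index($0, "node_interface=") == 0 {
      print $1
      bad = 1
    }
    END { exit(bad ? 1 : 0) }
  '
}

# Example use from the repository root:
#   hosts_missing_identity < ansible/inventories/shared-k8s/hosts.ini
```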
29. Add the shared and load-balancer group variables
Create or replace:
ansible/inventories/shared-k8s/group_vars/all.yml
with:
---
cluster_admin_user: acllc
kubernetes_version: "v1.36.1"
kubernetes_package_version: "1.36.1-1.1"
kubernetes_minor_repository: "v1.36"
kubernetes_cri_socket: "unix:///run/containerd/containerd.sock"
kubernetes_api_endpoint: "cicd-ac-k8s-api.aspireclan.com:443"
kubernetes_api_vip: "192.168.8.200"
kubernetes_api_backend_port: 6443
kubernetes_pod_cidr: "10.244.0.0/16"
kubernetes_service_cidr: "10.96.0.0/12"
kubernetes_dns_domain: "cluster.local"
calico_version: "v3.32.0"
calico_crd_url: "https://raw.githubusercontent.com/projectcalico/calico/v3.32.0/manifests/v1_crd_projectcalico_org.yaml"
calico_operator_url: "https://raw.githubusercontent.com/projectcalico/calico/v3.32.0/manifests/tigera-operator.yaml"
managed_hosts_entries:
  - ip: 192.168.8.200
    names:
      - cicd-ac-k8s-api.aspireclan.com
      - cicd-ac-k8s-api
  - ip: 192.168.8.201
    names:
      - cicd-ac-k8s-lb-01
  - ip: 192.168.8.202
    names:
      - cicd-ac-k8s-cp-01
  - ip: 192.168.8.203
    names:
      - cicd-ac-k8s-cp-02
  - ip: 192.168.8.204
    names:
      - cicd-ac-k8s-cp-03
Then create or replace:
ansible/inventories/shared-k8s/group_vars/load_balancers.yml
with:
---
load_balancer_hostname: cicd-ac-k8s-lb-01
load_balancer_primary_ip: 192.168.8.201
load_balancer_interface: ens18
kubernetes_api_vip: 192.168.8.200
kubernetes_api_vip_prefix: 22
kubernetes_api_port: 443
keepalived_router_id: CICD_AC_K8S_LB_01
keepalived_state: MASTER
keepalived_virtual_router_id: 201
keepalived_priority: 100
control_plane_backends:
  - name: cicd-ac-k8s-cp-01
    address: 192.168.8.202
    port: 6443
  - name: cicd-ac-k8s-cp-02
    address: 192.168.8.203
    port: 6443
  - name: cicd-ac-k8s-cp-03
    address: 192.168.8.204
    port: 6443
managed_hosts_entries:
  - ip: 192.168.8.200
    names:
      - cicd-ac-k8s-api.aspireclan.com
      - cicd-ac-k8s-api
  - ip: 192.168.8.202
    names:
      - cicd-ac-k8s-cp-01
  - ip: 192.168.8.203
    names:
      - cicd-ac-k8s-cp-02
  - ip: 192.168.8.204
    names:
      - cicd-ac-k8s-cp-03
The shared variables provide cluster_admin_user, Kubernetes version pins, network ranges, Calico version pins, and default managed-host entries. The load-balancer variables provide the load-balancer-specific API VIP and control-plane host mappings together with HAProxy, Keepalived, interface, and backend settings.
30. Implement the common role
Replace:
ansible/roles/common/tasks/main.yml
with:
---
- name: Confirm required host identity variables are defined
  ansible.builtin.assert:
    that:
      - node_primary_ip is defined
      - node_interface is defined
      - inventory_hostname | length > 0
    fail_msg: >-
      The inventory must define node_primary_ip and node_interface for every host.
- name: Confirm the target IP and interface match the approved inventory
  ansible.builtin.assert:
    that:
      - ansible_facts["default_ipv4"]["address"] == node_primary_ip
      - ansible_facts["default_ipv4"]["interface"] == node_interface
    fail_msg: >-
      The detected default IPv4 address or interface does not match the approved inventory.
- name: Force refresh the APT package cache
  ansible.builtin.apt:
    update_cache: true
  register: common_apt_cache_refresh
  retries: 5
  delay: 15
  until: common_apt_cache_refresh is succeeded
- name: Install common operating-system packages
  ansible.builtin.apt:
    name:
      - ca-certificates
      - curl
      - gpg
      - jq
      - qemu-guest-agent
      - ufw
    state: present
  register: common_package_install
  retries: 3
  delay: 10
  until: common_package_install is succeeded
- name: Ensure the Ansible remote temporary directory exists
  ansible.builtin.file:
    path: /var/tmp/ansible-acllc
    state: directory
    owner: "{{ cluster_admin_user }}"
    group: "{{ cluster_admin_user }}"
    mode: "0700"
- name: Write the permanent hostname file
  ansible.builtin.copy:
    dest: /etc/hostname
    content: "{{ inventory_hostname }}\n"
    owner: root
    group: root
    mode: "0644"
- name: Set the active system hostname
  ansible.builtin.hostname:
    name: "{{ inventory_hostname }}"
- name: Set the local hostname mapping
  ansible.builtin.lineinfile:
    path: /etc/hosts
    regexp: '^127\.0\.1\.1\s+'
    line: "127.0.1.1 {{ inventory_hostname }}"
    create: true
    owner: root
    group: root
    mode: "0644"
- name: Add shared Kubernetes host mappings
  ansible.builtin.blockinfile:
    path: /etc/hosts
    marker: "# {mark} ASPIRECLAN SHARED K8S"
    block: |
      {% for item in managed_hosts_entries %}
      {{ item.ip }} {{ item.names | join(' ') }}
      {% endfor %}
    owner: root
    group: root
    mode: "0644"
- name: Enable and start QEMU Guest Agent
  ansible.builtin.service:
    name: qemu-guest-agent
    enabled: true
    state: started
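- name: Allow SSH through UFW when UFW is enabled later
  ansible.builtin.command:
    cmd: ufw allow 22/tcp
  register: common_ufw_ssh_rule
  # An inactive UFW reports "Rules updated"; an active UFW reports "Rule added".
  changed_when: >-
    'Rule added' in common_ufw_ssh_rule.stdout or
    'Rules updated' in common_ufw_ssh_rule.stdout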
- name: Verify the resulting hostname
  ansible.builtin.command:
    cmd: hostnamectl --static
  register: common_configured_hostname
  changed_when: false
  failed_when: common_configured_hostname.stdout | trim != inventory_hostname
30.1 Permanent APT, remote-temp, and UFW behavior
The template cleanup intentionally removes /var/lib/apt/lists/*. A new clone therefore starts without an APT package index. Installing packages in the template would only hide the problem.
The permanent behavior is:
Template cleanup removes /var/lib/apt/lists/*
Every package-installing Ansible role force-refreshes APT first
No clone depends on package indexes inherited from the template
APT refresh: up to 5 retries, 15 seconds apart
Package installation: up to 3 retries, 10 seconds apart
UFW, HAProxy, Keepalived, and future packages resolve from current Ubuntu repositories
Template rebuild required: No
The common role also creates /var/tmp/ansible-acllc with owner acllc and mode 0700. UFW is installed, and the SSH rule is added, but the firewall remains disabled until a later hardening phase. HAProxy port 443 must be considered when UFW is eventually enabled.
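For manual troubleshooting on a clone, the same retry idea can be sketched in bash. This is a minimal illustration, not part of the committed roles: the retry helper is hypothetical, and it counts its first argument as the total number of attempts, which may not match exactly how Ansible counts retries.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: run a command up to <attempts> times, sleeping <delay>
# seconds after each failed attempt except the last.
# Returns 0 on the first success and 1 if every attempt fails.
retry() {
  local attempts="$1" delay="$2" n
  shift 2
  for ((n = 1; n <= attempts; n++)); do
    if "$@"; then
      return 0
    fi
    if ((n < attempts)); then
      echo "Attempt ${n}/${attempts} failed; retrying in ${delay}s..." >&2
      sleep "${delay}"
    fi
  done
  return 1
}

# Example usage on the VM (as root), loosely mirroring the role settings:
#   retry 5 15 apt-get update
#   retry 3 10 apt-get install -y haproxy socat
```

Refreshing the index first and installing second keeps the two failure modes separate, which is the same reason the roles use two tasks.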
31. Implement the HAProxy role
31.1 Replace ansible/roles/haproxy/tasks/main.yml
---
- name: Force refresh the APT package cache before installing HAProxy
  ansible.builtin.apt:
    update_cache: true
  register: haproxy_apt_cache_refresh
  retries: 5
  delay: 15
  until: haproxy_apt_cache_refresh is succeeded
- name: Install HAProxy and the runtime statistics client
  ansible.builtin.apt:
    name:
      - haproxy
      - socat
    state: present
  register: haproxy_package_install
  retries: 3
  delay: 10
  until: haproxy_package_install is succeeded
- name: Render the Kubernetes API HAProxy configuration
  ansible.builtin.template:
    src: haproxy.cfg.j2
    dest: /etc/haproxy/haproxy.cfg
    owner: root
    group: root
    mode: "0644"
    validate: "haproxy -c -f %s"
  notify: Restart HAProxy
- name: Enable and start HAProxy
  ansible.builtin.service:
    name: haproxy
    enabled: true
    state: started
- name: Apply any pending HAProxy restart
  ansible.builtin.meta: flush_handlers
- name: Validate the active HAProxy configuration
  ansible.builtin.command:
    cmd: haproxy -c -f /etc/haproxy/haproxy.cfg
  changed_when: false
- name: Confirm the HAProxy runtime socket exists
  ansible.builtin.stat:
    path: /run/haproxy/admin.sock
  register: haproxy_runtime_socket
- name: Assert that the HAProxy runtime socket is available
  ansible.builtin.assert:
    that:
      - haproxy_runtime_socket.stat.exists
      - haproxy_runtime_socket.stat.issock
    fail_msg: >-
      The HAProxy runtime socket /run/haproxy/admin.sock is not available.
- name: Confirm that the HAProxy Runtime API responds
  ansible.builtin.shell:
    executable: /bin/bash
    cmd: |
      set -euo pipefail
      printf 'show info\n' |
        socat - UNIX-CONNECT:/run/haproxy/admin.sock
  register: haproxy_runtime_api_check
  changed_when: false
31.2 Replace ansible/roles/haproxy/handlers/main.yml
---
- name: Restart HAProxy
  ansible.builtin.service:
    name: haproxy
    state: restarted
31.3 Create ansible/roles/haproxy/templates/haproxy.cfg.j2
global
    log /dev/log local0
    log /dev/log local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
    stats timeout 30s
    user haproxy
    group haproxy
    daemon
    maxconn 4096
defaults
    log global
    mode tcp
    option tcplog
    option dontlognull
    timeout connect 10s
    timeout client 1m
    timeout server 1m
frontend kubernetes_api
    bind *:{{ kubernetes_api_port }}
    default_backend kubernetes_control_planes
backend kubernetes_control_planes
    balance roundrobin
    option tcp-check
    default-server inter 3s fall 3 rise 2
{% for backend in control_plane_backends %}
    server {{ backend.name }} {{ backend.address }}:{{ backend.port }} check
{% endfor %}
listen local_stats
    bind 127.0.0.1:8404
    mode http
    stats enable
    stats uri /stats
    stats refresh 10s
HAProxy listens on TCP 443 and distributes connections across the three future Kubernetes API servers.
The configuration also provides:
- Runtime API socket: /run/haproxy/admin.sock
- Local-only statistics page: http://127.0.0.1:8404/stats
- socat for querying runtime information without exposing an administrative port to the LAN
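The runtime socket answers show info with one "Key: value" pair per line. The sketch below, which is an illustration rather than part of the role, extracts a single field from that output; the helper name haproxy_info_field is hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the value of one "Key: value" field from
# HAProxy "show info" output supplied on stdin.
# Exits non-zero when the requested key is not present.
haproxy_info_field() {
  local key="$1"
  awk -v key="${key}" '
    index($0, key ": ") == 1 {
      print substr($0, length(key) + 3)   # text after "Key: "
      found = 1
    }
    END { exit !found }
  '
}

# Example usage on the load balancer (requires root and socat):
#   printf 'show info\n' |
#     socat - UNIX-CONNECT:/run/haproxy/admin.sock |
#     haproxy_info_field Uptime_sec
```

Reading a single field this way can be handy in a health script that needs only the uptime or version rather than the full dump.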
32. Implement the Keepalived role
32.1 Replace ansible/roles/keepalived/tasks/main.yml
---
- name: Force refresh the APT package cache before installing Keepalived
  ansible.builtin.apt:
    update_cache: true
  register: keepalived_apt_cache_refresh
  retries: 5
  delay: 15
  until: keepalived_apt_cache_refresh is succeeded
- name: Install Keepalived
  ansible.builtin.apt:
    name:
      - keepalived
    state: present
  register: keepalived_package_install
  retries: 3
  delay: 10
  until: keepalived_package_install is succeeded
- name: Install the HAProxy health-check script
  ansible.builtin.copy:
    src: check-haproxy.sh
    dest: /usr/local/sbin/check-haproxy.sh
    owner: root
    group: root
    mode: "0755"
  notify: Restart Keepalived
- name: Render the Keepalived configuration
  ansible.builtin.template:
    src: keepalived.conf.j2
    dest: /etc/keepalived/keepalived.conf
    owner: root
    group: root
    mode: "0644"
  notify: Restart Keepalived
- name: Enable and start Keepalived
  ansible.builtin.service:
    name: keepalived
    enabled: true
    state: started
- name: Apply any pending Keepalived restart
  ansible.builtin.meta: flush_handlers
- name: Wait for the Kubernetes API VIP to appear
  ansible.builtin.command:
    cmd: "ip -4 address show dev {{ load_balancer_interface }}"
  register: load_balancer_addresses
  changed_when: false
  retries: 15
  delay: 2
  until: >-
    (kubernetes_api_vip ~ '/' ~ (kubernetes_api_vip_prefix | string))
    in load_balancer_addresses.stdout
32.2 Replace ansible/roles/keepalived/handlers/main.yml
---
- name: Restart Keepalived
  ansible.builtin.service:
    name: keepalived
    state: restarted
32.3 Create ansible/roles/keepalived/files/check-haproxy.sh
#!/usr/bin/env bash
set -euo pipefail
systemctl is-active --quiet haproxy
pgrep -x haproxy >/dev/null
32.4 Create ansible/roles/keepalived/templates/keepalived.conf.j2
global_defs {
    router_id {{ keepalived_router_id }}
    enable_script_security
    script_user root
}
vrrp_script check_haproxy {
    script "/usr/local/sbin/check-haproxy.sh"
    interval 2
    timeout 2
    fall 2
    rise 2
}
vrrp_instance VI_K8S_API {
    state {{ keepalived_state }}
    interface {{ load_balancer_interface }}
    virtual_router_id {{ keepalived_virtual_router_id }}
    priority {{ keepalived_priority }}
    advert_int 1
    virtual_ipaddress {
        {{ kubernetes_api_vip }}/{{ kubernetes_api_vip_prefix }} dev {{ load_balancer_interface }}
    }
    track_script {
        check_haproxy
    }
}
This is currently a one-load-balancer Keepalived configuration. It gives 192.168.8.200/22 to cicd-ac-k8s-lb-01. VRRP virtual-router ID 201 is intentionally different from common default values used by older clusters on the same LAN. When a second load balancer is introduced, add a second host with lower priority and update Keepalived for the two-node design.
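The VIP check that the Keepalived role performs can be approximated by hand. The following is a minimal sketch, assuming the usual "inet ADDRESS/PREFIX ..." line format of ip -4 address show; the helper name vip_present is hypothetical and the role itself performs the check in Ansible.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: succeed when "ip -4 address show" output on stdin
# contains the given address with the given prefix length.
vip_present() {
  local vip="$1" prefix="$2"
  # The trailing space avoids matching a longer prefix such as /224.
  grep -qF -- "inet ${vip}/${prefix} "
}

# Example usage on the load balancer:
#   if ip -4 address show dev ens18 | vip_present 192.168.8.200 22; then
#     echo "VIP is assigned"
#   else
#     echo "VIP is missing"
#   fi
```

If the VIP is missing while Keepalived is active, the HAProxy health-check script is the first thing to run by hand.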
33. Replace the common-baseline playbook
Replace:
ansible/playbooks/shared-k8s/01-common-baseline.yml
with:
---
- name: Apply the common Ubuntu baseline
  hosts: all
  become: true
  gather_facts: true
  roles:
    - role: common
34. Replace the load-balancer playbook
Replace:
ansible/playbooks/shared-k8s/02-configure-load-balancer.yml
with:
---
- name: Configure the Kubernetes API load balancer
  hosts: load_balancers
  become: true
  gather_facts: true
  roles:
    - role: haproxy
    - role: keepalived
35. Add the Ansible GitHub Actions workflow
Create:
.github/workflows/ansible-configure-load-balancer.yml
with:
name: Ansible Configure - Kubernetes Load Balancer
on:
  push:
    branches:
      - dev
      - prod
    paths:
      - "ansible/inventories/shared-k8s/group_vars/load_balancers.yml"
      - "ansible/roles/haproxy/**"
      - "ansible/roles/keepalived/**"
      - "ansible/playbooks/shared-k8s/02-configure-load-balancer.yml"
      - ".github/workflows/ansible-configure-load-balancer.yml"
  workflow_dispatch:
permissions:
  contents: read
concurrency:
  group: shared-k8s-ansible
  cancel-in-progress: false
env:
  ANSIBLE_CONFIG: ${{ github.workspace }}/ansible/ansible.cfg
jobs:
  validate:
    name: Validate load-balancer Ansible configuration
    runs-on:
      - self-hosted
      - Linux
      - X64
      - prod
      - terraform
      - deploy
      - ac-cicd-infra
    steps:
      - name: Checkout repository
        uses: actions/checkout@v5
      - name: Verify Ansible
        shell: bash
        run: |
          set -euo pipefail
          ansible --version
          ansible-playbook --version
      - name: Validate inventory
        working-directory: ansible
        shell: bash
        run: |
          set -euo pipefail
          ansible-inventory \
            -i inventories/shared-k8s/hosts.ini \
            --graph
      - name: Syntax-check the common baseline
        working-directory: ansible
        shell: bash
        run: |
          set -euo pipefail
          ansible-playbook \
            -i inventories/shared-k8s/hosts.ini \
            playbooks/shared-k8s/01-common-baseline.yml \
            --syntax-check
      - name: Syntax-check the load-balancer playbook
        working-directory: ansible
        shell: bash
        run: |
          set -euo pipefail
          ansible-playbook \
            -i inventories/shared-k8s/hosts.ini \
            playbooks/shared-k8s/02-configure-load-balancer.yml \
            --syntax-check
  configure:
    name: Configure cicd-ac-k8s-lb-01
    needs:
      - validate
    if: >-
      (github.event_name == 'push' && github.ref_name == 'prod') ||
      (github.event_name == 'workflow_dispatch' && github.ref_name == 'prod')
    environment:
      name: shared-k8s
    runs-on:
      - self-hosted
      - Linux
      - X64
      - prod
      - terraform
      - deploy
      - ac-cicd-infra
    timeout-minutes: 45
    steps:
      - name: Checkout repository
        uses: actions/checkout@v5
      - name: Verify the production branch
        shell: bash
        run: |
          set -euo pipefail
          if [ "${GITHUB_REF_NAME}" != "prod" ]; then
            echo "ERROR: Ansible configuration is permitted only from prod."
            exit 1
          fi
      - name: Prepare the existing Ansible SSH key
        shell: bash
        run: |
          set -euo pipefail
          KEY_PATH="${HOME}/.ssh/id_ed25519_ansible"
          if [ ! -f "${KEY_PATH}" ]; then
            echo "ERROR: Missing Ansible key: ${KEY_PATH}"
            exit 1
          fi
          chmod 600 "${KEY_PATH}"
          echo "ANSIBLE_PRIVATE_KEY_FILE=${KEY_PATH}" >> "${GITHUB_ENV}"
      - name: Refresh the load-balancer SSH host key
        shell: bash
        run: |
          set -euo pipefail
          mkdir -p "${HOME}/.ssh"
          chmod 700 "${HOME}/.ssh"
          touch "${HOME}/.ssh/known_hosts"
          chmod 600 "${HOME}/.ssh/known_hosts"
          ssh-keygen \
            -f "${HOME}/.ssh/known_hosts" \
            -R "192.168.8.201" || true
          for attempt in $(seq 1 30); do
            if ssh-keyscan \
              -T 5 \
              -H "192.168.8.201" \
              >> "${HOME}/.ssh/known_hosts" 2>/dev/null
            then
              echo "SSH host key captured."
              exit 0
            fi
            echo "Waiting for SSH on 192.168.8.201 (attempt ${attempt}/30)..."
            sleep 10
          done
          echo "ERROR: Unable to capture the SSH host key."
          exit 1
      - name: Verify Ansible connectivity
        working-directory: ansible
        shell: bash
        run: |
          set -euo pipefail
          ansible \
            -i inventories/shared-k8s/hosts.ini \
            load_balancers \
            --private-key "${ANSIBLE_PRIVATE_KEY_FILE}" \
            -m ping
      - name: Apply the common baseline
        working-directory: ansible
        shell: bash
        run: |
          set -euo pipefail
          ansible-playbook \
            -i inventories/shared-k8s/hosts.ini \
            --private-key "${ANSIBLE_PRIVATE_KEY_FILE}" \
            playbooks/shared-k8s/01-common-baseline.yml
      - name: Configure HAProxy, Keepalived, and the API VIP
        working-directory: ansible
        shell: bash
        run: |
          set -euo pipefail
          ansible-playbook \
            -i inventories/shared-k8s/hosts.ini \
            --private-key "${ANSIBLE_PRIVATE_KEY_FILE}" \
            playbooks/shared-k8s/02-configure-load-balancer.yml
      - name: Verify the completed load balancer
        working-directory: ansible
        shell: bash
        run: |
          set -euo pipefail
          ansible \
            -i inventories/shared-k8s/hosts.ini \
            load_balancers \
            --private-key "${ANSIBLE_PRIVATE_KEY_FILE}" \
            -b \
            -m shell \
            -a '
              set -e
              hostnamectl --static
              grep -F "127.0.1.1 cicd-ac-k8s-lb-01" /etc/hosts
              grep -F "192.168.8.200 cicd-ac-k8s-api.aspireclan.com cicd-ac-k8s-api" /etc/hosts
              systemctl is-active haproxy
              systemctl is-active keepalived
              haproxy -c -f /etc/haproxy/haproxy.cfg
              ip -4 address show dev ens18 | grep -F "192.168.8.200/22"
              ss -lntp | grep -E "[:.]443[[:space:]]"
            '
The workflow uses actions/checkout@v5, which uses the Node.js 24 action runtime. The self-hosted runner must be version 2.327.1 or newer. The verified runner version for this implementation was 2.334.0.
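The runner version requirement can be checked with a simple comparison. This is a minimal sketch that assumes GNU sort with the -V (version sort) option is available on the runner host; the helper name version_ge is hypothetical, and how the installed version is obtained is left to the operator.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: succeed when version $1 is greater than or equal to
# version $2, using GNU sort's version ordering.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required="2.327.1"    # minimum stated on this page for actions/checkout@v5
installed="2.334.0"   # runner version verified for this implementation

if version_ge "${installed}" "${required}"; then
  echo "Runner ${installed} satisfies the ${required} minimum."
else
  echo "Runner ${installed} is older than ${required}; upgrade it first."
fi
```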
The workflow behaves as follows:
| Branch or event | Behavior |
|---|---|
| Push to dev with load-balancer-specific changes | Inventory and playbook syntax validation only |
| Push to prod with load-balancer-specific changes | Validation, SSH connectivity, common baseline, HAProxy, Keepalived, VIP, and verification |
| Manual run from dev | Validation only |
| Manual run from prod | Validation and configuration |
| pull_request | Not used by this workflow |
| local, qa, or main push | Does not trigger this workflow |
The component-specific path filters intentionally prevent unrelated worker or ARC changes from invoking the load-balancer workflow.
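The gating rule for the configure job can be restated as a small bash sketch. It only mirrors the if: expression for illustration; the function name should_configure is hypothetical, and GitHub Actions evaluates the real condition itself.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: succeed only for the event and branch combinations
# in which the configure job is allowed to run.
should_configure() {
  local event="$1" ref="$2"
  [ "${ref}" = "prod" ] &&
    { [ "${event}" = "push" ] || [ "${event}" = "workflow_dispatch" ]; }
}

# Print the resulting behavior for each trigger that starts the workflow.
for pair in "push dev" "push prod" "workflow_dispatch dev" "workflow_dispatch prod"; do
  read -r event ref <<<"${pair}"
  if should_configure "${event}" "${ref}"; then
    action="validate and configure"
  else
    action="validate only"
  fi
  printf '%-18s %-5s %s\n' "${event}" "${ref}" "${action}"
done
```

The validate job has no such condition, which is why every triggering push or manual run still performs the syntax checks.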
36. Review the Ansible change
Run from Windows PowerShell:
git status
git diff --check
git diff --stat
git diff -- `
  ansible `
  .github/workflows/ansible-configure-load-balancer.yml
Confirm:
- No private SSH key is staged.
- The target address is only 192.168.8.201.
- The API VIP is 192.168.8.200.
- The primary interface is ens18.
- The control-plane backends are 192.168.8.202–204:6443.
- The workflow configures the VM only from prod.
37. Commit, push, and open the Ansible dev pull request
git add `
ansible `
.github/workflows/ansible-configure-load-balancer.yml
git commit -m "Configure Kubernetes API load balancer with Ansible"
git push -u origin feature/configure-k8s-load-balancer
Create the pull request into dev:
gh pr create `
--base dev `
--head feature/configure-k8s-load-balancer `
--title "Configure Kubernetes API load balancer" `
--body "Adds the Ansible baseline, HAProxy, Keepalived, API VIP, and prod-only configuration for cicd-ac-k8s-lb-01."
Merge it only after reviewing the inventory identity, shared variables, HAProxy backends, Keepalived VIP, role content, workflow paths, and prod-only configure condition.
38. Promote the Ansible configuration from dev to prod
After the dev validation workflow succeeds, review the exact promotion:
git fetch origin
git diff --check origin/prod...origin/dev
git diff --stat origin/prod...origin/dev
Create the production pull request:
gh pr create `
--base prod `
--head dev `
--title "Configure the shared Kubernetes API load balancer" `
--body "Promotes the validated load-balancer Ansible configuration from dev to prod."
Open it in GitHub when needed:
gh pr view `
--repo ASPIRECLAN-LLC-Org/ac-cicd-infra `
--web
After the reviewed pull request is merged into prod, approve the shared-k8s environment when prompted. The workflow then configures cicd-ac-k8s-lb-01.
39. Verify the completed load balancer manually
Run the detailed verification from prod-terraform-deploy-02:
ssh -i ~/.ssh/id_ed25519_ansible -o IdentitiesOnly=yes acllc@192.168.8.201 'sudo bash -s' <<'EOF'
set -e
echo "=== HOSTNAME ==="
hostnamectl --static
cat /etc/hostname
echo "=== HOSTS ==="
grep -E "127\.0\.1\.1|192\.168\.8\.200|192\.168\.8\.202|192\.168\.8\.203|192\.168\.8\.204" /etc/hosts
echo "=== SERVICES ==="
systemctl is-active haproxy
systemctl is-active keepalived
systemctl is-enabled haproxy
systemctl is-enabled keepalived
echo "=== HAPROXY ==="
haproxy -c -f /etc/haproxy/haproxy.cfg
ss -lntp | grep -E ":443[[:space:]]"
ss -lntp | grep -F "127.0.0.1:8404"
test -S /run/haproxy/admin.sock
echo "=== RUNTIME_API ==="
echo "show info" | socat - UNIX-CONNECT:/run/haproxy/admin.sock | head -20
echo "=== ADDRESSES ==="
ip -brief address show ens18
EOF
A shorter repeatable health check is:
ssh -i ~/.ssh/id_ed25519_ansible -o IdentitiesOnly=yes acllc@192.168.8.201 '
echo "=== HOSTNAME ==="
hostnamectl --static
echo
echo "=== SERVICES ==="
systemctl is-active haproxy
systemctl is-active keepalived
systemctl is-enabled haproxy
systemctl is-enabled keepalived
echo
echo "=== VIP ==="
ip -4 address show dev ens18 |
grep -F "192.168.8.200/22"
echo
echo "=== LISTENERS ==="
sudo ss -lntp |
grep -E ":443[[:space:]]|127\.0\.0\.1:8404"
echo
echo "=== CONFIGURATION ==="
sudo haproxy -c -f /etc/haproxy/haproxy.cfg
echo
echo "=== RUNTIME SOCKET ==="
sudo test -S /run/haproxy/admin.sock
echo "show info" |
sudo socat - UNIX-CONNECT:/run/haproxy/admin.sock |
head -20
'
Also verify the VIP and TCP frontend from another machine on the same network:
ping -c 4 192.168.8.200
nc -vz -w 5 192.168.8.200 443
The VIP and TCP 443 listener should answer. A Kubernetes /readyz request will not succeed until at least one control-plane node is listening on port 6443.
40. View HAProxy runtime statistics
The runtime socket is available only inside the load-balancer VM and is not exposed to the LAN.
ssh -i ~/.ssh/id_ed25519_ansible -o IdentitiesOnly=yes acllc@192.168.8.201 'echo "show info" |
sudo socat - UNIX-CONNECT:/run/haproxy/admin.sock'
ssh -i ~/.ssh/id_ed25519_ansible -o IdentitiesOnly=yes acllc@192.168.8.201 'echo "show stat" |
sudo socat - UNIX-CONNECT:/run/haproxy/admin.sock'
The CSV output from show stat contains frontend, backend, and per-control-plane health information. Until the control planes exist, the three server rows correctly report DOWN.
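The show stat CSV is wide, so a short filter helps when only server health matters. The sketch below is illustrative: it assumes the first line of the output is the "# pxname,svname,..." header and locates the columns by name; the helper name haproxy_server_status is hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: reads HAProxy "show stat" CSV on stdin and prints
# "proxy/server status" for each server row, skipping the FRONTEND and
# BACKEND summary rows.
haproxy_server_status() {
  awk -F',' '
    NR == 1 {
      sub(/^# /, "", $1)                      # header starts with "# pxname"
      for (i = 1; i <= NF; i++) col[$i] = i   # map column name to position
      next
    }
    NF > 1 && $(col["svname"]) != "FRONTEND" && $(col["svname"]) != "BACKEND" {
      print $(col["pxname"]) "/" $(col["svname"]) " " $(col["status"])
    }
  '
}

# Example usage on the load balancer (requires root and socat):
#   echo "show stat" |
#     socat - UNIX-CONNECT:/run/haproxy/admin.sock |
#     haproxy_server_status
```

Before the control planes exist, each of the three kubernetes_control_planes servers is expected to appear with a DOWN status.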
41. Open the local HAProxy statistics page
From Windows PowerShell or another SSH client, create an SSH tunnel:
ssh -L 8404:127.0.0.1:8404 acllc@192.168.8.201
# Keep the SSH window open, then browse to:
http://127.0.0.1:8404/stats
The page shows frontend connections, backend status, health checks, queues, and error counters. It is intentionally bound to 127.0.0.1 on the VM and is not directly exposed to the LAN.
42. Successful Ansible completion state
Guest hostname: cicd-ac-k8s-lb-01
/etc/hostname: cicd-ac-k8s-lb-01
/etc/hosts: managed hostname, API VIP, and control-plane mappings present
APT behavior: package cache force-refreshed before common, HAProxy, and Keepalived package installation
Ansible remote temp: /var/tmp/ansible-acllc owned by acllc with mode 0700
UFW: installed; SSH rule present; firewall remains disabled until a later hardening phase
HAProxy: active and enabled
HAProxy frontend: TCP 443 on 0.0.0.0
HAProxy runtime socket: /run/haproxy/admin.sock
HAProxy Runtime API: responding through socat
HAProxy local statistics page: http://127.0.0.1:8404/stats
HAProxy backends:
192.168.8.202:6443
192.168.8.203:6443
192.168.8.204:6443
Keepalived: active and enabled
Keepalived state: MASTER on the current single load balancer
Primary address: 192.168.8.201/22 on ens18
Kubernetes API VIP: 192.168.8.200/22 on ens18
Control-plane backend status: expected to be DOWN until Kubernetes is listening on port 6443
43. Expected first-deployment log messages
Normal messages during the first load-balancer deployment:
HAProxy backend servers DOWN:
  Expected until 192.168.8.202-204 are running Kubernetes on TCP 6443.
backend kubernetes_control_planes has no server available:
  Expected before the control-plane nodes are provisioned.
Keepalived VIP retry:
  Expected for a few seconds while Keepalived restarts and enters MASTER state.
HAProxy worker exit code 143:
  Expected during the Ansible-controlled HAProxy restart.
Keepalived ConditionFileNotEmpty skipped:
  Expected before Ansible writes /etc/keepalived/keepalived.conf.
Post job cleanup / Cleaning up orphan processes:
  Normal GitHub Actions runner housekeeping, not a workflow failure.
44. Ansible failure handling
SSH connectivity fails
Verify:
ssh -i ~/.ssh/id_ed25519_ansible -o IdentitiesOnly=yes acllc@192.168.8.201 'sudo -n whoami'
Confirm the public automation key is present in the VM's ~acllc/.ssh/authorized_keys.
The IP address or interface assertion fails
Confirm that DHCP assigned 192.168.8.201 to MAC AA:BB:CC:05:14:01 and that the network interface is ens18.
APT reports that a package is unavailable
Run:
sudo apt-get update
apt-cache policy ufw haproxy keepalived
Each package must show a nonempty Candidate. The committed roles already force-refresh APT and retry transient failures. Do not rebuild the template merely because /var/lib/apt/lists/* was cleaned before conversion.
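When several packages are checked at once, the Candidate lines can be filtered automatically. This is a minimal sketch, assuming the usual apt-cache policy layout in which an unavailable package shows "Candidate: (none)"; the helper name packages_without_candidate is hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: reads "apt-cache policy" output on stdin and prints
# the name of every package whose Candidate is "(none)".
packages_without_candidate() {
  awk '
    /^[^[:space:]].*:$/ { pkg = substr($0, 1, length($0) - 1) }   # "name:" line
    $1 == "Candidate:" && $2 == "(none)" { print pkg }
  '
}

# Example usage on the VM:
#   apt-cache policy ufw haproxy keepalived | packages_without_candidate
```

An empty result means every listed package has an installable candidate; any name printed points at a missing or stale package index.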
Ansible reports a remote temporary-directory warning
Confirm ansible.cfg contains:
remote_tmp = /var/tmp/ansible-acllc
Then create or repair the directory:
ssh -i ~/.ssh/id_ed25519_ansible -o IdentitiesOnly=yes acllc@192.168.8.201 'sudo install -d -m 0700 -o acllc -g acllc /var/tmp/ansible-acllc
ls -ld /var/tmp/ansible-acllc'
HAProxy validation fails
Run:
sudo haproxy -c -f /etc/haproxy/haproxy.cfg
sudo journalctl -u haproxy -n 100 --no-pager
Correct the Jinja template in Git and promote the fix. Do not edit the generated HAProxy file manually as the permanent solution.
Keepalived is not assigning the VIP
Run:
sudo systemctl status keepalived --no-pager
sudo journalctl -u keepalived -n 100 --no-pager
ip -4 address show dev ens18
sudo /usr/local/sbin/check-haproxy.sh
Keepalived intentionally removes or withholds the VIP when the HAProxy health check fails.
HAProxy backends are DOWN
This is expected until these machines exist and listen on TCP 6443:
cicd-ac-k8s-cp-01 192.168.8.202:6443
cicd-ac-k8s-cp-02 192.168.8.203:6443
cicd-ac-k8s-cp-03 192.168.8.204:6443
Do not remove the backend definitions.
45. Source consistency status and expected next phase
This page is written for a from-scratch rebuild. It does not assume that the load-balancer VM, HAProxy, Keepalived, or the VIP already exist.
Documentation mode: from-scratch rebuild
Source alignment: cleaned ac-cicd-infra repository
Terraform allocation: cicd-ac-k8s-lb-01 / 3156201 / AA:BB:CC:05:14:01 / 192.168.8.201
API VIP: 192.168.8.200/22
Terraform workflow model: dev validates and plans; prod applies
Ansible workflow model: dev validates; prod configures
Terraform backend: persistent local state on prod-terraform-deploy-02
State path: /var/lib/ac-cicd-infra/terraform-state/shared-k8s/terraform.tfstate
HAProxy backends: 192.168.8.202-204:6443
Page completion target: load-balancer VM, HAProxy, Keepalived, runtime socket, and API VIP verified
Live environment completion assumed by this page: none
The embedded Terraform module, Terraform workflows, Ansible configuration, shared variables, common role, HAProxy role, Keepalived role, playbooks, and load-balancer workflow were reconciled with the cleaned ac-cicd-infra source. The initial shared-k8s/main.tf and outputs.tf shown on this page intentionally define only the first load-balancer VM; later pages expand that same stack with control planes and workers.
After this page has been executed successfully, the next infrastructure phase is to provision and configure:
cicd-ac-k8s-cp-01 192.168.8.202
cicd-ac-k8s-cp-02 192.168.8.203
cicd-ac-k8s-cp-03 192.168.8.204