Provision and Join the Four Kubernetes QA Workers
This is the sixth infrastructure page in the from-scratch build sequence. Follow it after the control-plane and development-worker pages have produced a healthy cluster with four development workers; it provisions the four QA ARC worker VMs.
FROM-SCRATCH SEQUENCE CHECKPOINT
Required before this page
Load-balancer VM, HAProxy, Keepalived, and API VIP configured
Three control-plane nodes Ready
Four development workers at 192.168.8.213-.216 Ready
Development labels and taints verified
Implemented by this page
QA worker Terraform definitions
Four QA worker VM provisions
Ubuntu baseline and Kubernetes node preparation
QA worker joins
QA worker labels and taints
Verification that development workers remain intact
Implemented by later pages
Production workers
ARC controller
Tenant runner scale sets
1. Scope and execution order
This page is performed in two separate promotions:
- Terraform promotion: create only the four QA worker VMs.
- Ansible promotion: after all four VMs are reachable, configure Ubuntu, containerd, Kubernetes prerequisites, join the workers, and apply the approved QA labels and taints.
Do not combine these two changes into the same production promotion. The Ansible workflow must not start until Terraform has created all four VMs and DHCP has assigned the approved addresses.
The approved branch model is:
feature/*
↓ pull request
dev
↓ validation and Terraform plan only
dev → prod pull request
↓ review and shared-k8s approval
prod
↓ Terraform apply or Ansible configuration
Not used by this infrastructure execution flow:
local, qa, main
dev performs validation and Terraform plan only. prod performs Terraform apply or Ansible configuration. Do not use local, qa, or main in this infrastructure execution flow.
2. Approved QA worker allocation
cicd-ac-k8s-qa-wk-01
VM ID: 3156209
MAC: aa:bb:cc:07:0f:01
Reserved IP: 192.168.8.209
CPU: 4 vCPU
RAM: 16384 MB
Disk: scsi0, 250G, local-lvm
cicd-ac-k8s-qa-wk-02
VM ID: 3156210
MAC: aa:bb:cc:07:0f:02
Reserved IP: 192.168.8.210
CPU: 4 vCPU
RAM: 16384 MB
Disk: scsi0, 250G, local-lvm
cicd-ac-k8s-qa-wk-03
VM ID: 3156211
MAC: aa:bb:cc:07:0f:03
Reserved IP: 192.168.8.211
CPU: 4 vCPU
RAM: 16384 MB
Disk: scsi0, 250G, local-lvm
cicd-ac-k8s-qa-wk-04
VM ID: 3156212
MAC: aa:bb:cc:07:0f:04
Reserved IP: 192.168.8.212
CPU: 4 vCPU
RAM: 16384 MB
Disk: scsi0, 250G, local-lvm
Shared values
Template: tmplt-ub-26-min-base / VM ID 90000
Node: pve
Bridge: vmbr0
Environment: qa
Workload: github-runner
The environment-specific address ranges are intentionally not ordered dev-first:
Approved worker address order
Production workers: 192.168.8.205-.208
QA workers: 192.168.8.209-.212
Development workers: 192.168.8.213-.216
QA workers therefore use VM IDs 3156209-3156212.
Confirm the router has all four DHCP reservations before running Terraform. Ubuntu must continue using DHCP; do not configure static Netplan addresses.
Do not change the approved VM IDs, MAC addresses, IP reservations, scsi0 disk slot, local-lvm storage, or QA environment assignment while following this page.
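The approved rows follow a visible pattern: each VM ID ends in the last octet of its reserved IP (3156209 for .209), and the final MAC byte is the worker ordinal. The sketch below regenerates the four rows for cross-checking, assuming that pattern is intentional. The function name qa_allocation is hypothetical and is not part of the repository.

```shell
#!/usr/bin/env bash
# Hypothetical cross-check helper; not part of the repository.
# Assumes VM ID = 3156000 + last IP octet and MAC suffix = worker ordinal,
# which matches the four approved QA rows on this page.
qa_allocation() {
  local ordinal octet
  for ordinal in 1 2 3 4; do
    octet=$((208 + ordinal))
    printf 'cicd-ac-k8s-qa-wk-%02d %d aa:bb:cc:07:0f:%02x 192.168.8.%d\n' \
      "${ordinal}" "$((3156000 + octet))" "${ordinal}" "${octet}"
  done
}

# Prints one line per worker: name, VM ID, MAC, reserved IP.
qa_allocation
```

Compare the printed rows against the router reservations and the Terraform map before committing; any mismatch means one of the three sources has drifted from the approved allocation.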
Part A — Provision the QA Worker VMs with Terraform
3. Terraform files changed
terraform/stacks/shared-k8s/main.tf
terraform/stacks/shared-k8s/outputs.tf
The proxmox-vm and proxmox-vm-group modules created by the preceding pages remain unchanged. This page adds a new QA-worker map to the cumulative shared Kubernetes stack.
4. Create the Terraform feature branch from dev
Run from Windows PowerShell:
cd D:\code\ASPIRECLAN-LLC-Org\ac-cicd-infra
git switch dev
git pull --ff-only origin dev
git switch -c feature/provision-k8s-qa-workers
5. Extend terraform/stacks/shared-k8s/main.tf
Do not replace or edit the load-balancer, control-plane, or development-worker definitions created by the preceding pages. Append this QA block after the existing development-worker module:
locals {
  qa_workers = {
    qa_wk01 = {
      name          = "cicd-ac-k8s-qa-wk-01"
      description   = "Aspireclan shared Kubernetes QA ARC worker 01"
      vmid          = 3156209
      target_node   = "pve"
      template_name = "tmplt-ub-26-min-base"
      cores         = 4
      memory_mb     = 16384
      disk_size     = "250G"
      storage       = "local-lvm"
      bridge        = "vmbr0"
      mac_address   = "aa:bb:cc:07:0f:01"
      tags = [
        "ac-cicd",
        "shared-k8s",
        "worker",
        "qa",
        "arc-runner",
        "terraform",
        "ansible",
      ]
    }
    qa_wk02 = {
      name          = "cicd-ac-k8s-qa-wk-02"
      description   = "Aspireclan shared Kubernetes QA ARC worker 02"
      vmid          = 3156210
      target_node   = "pve"
      template_name = "tmplt-ub-26-min-base"
      cores         = 4
      memory_mb     = 16384
      disk_size     = "250G"
      storage       = "local-lvm"
      bridge        = "vmbr0"
      mac_address   = "aa:bb:cc:07:0f:02"
      tags = [
        "ac-cicd",
        "shared-k8s",
        "worker",
        "qa",
        "arc-runner",
        "terraform",
        "ansible",
      ]
    }
    qa_wk03 = {
      name          = "cicd-ac-k8s-qa-wk-03"
      description   = "Aspireclan shared Kubernetes QA ARC worker 03"
      vmid          = 3156211
      target_node   = "pve"
      template_name = "tmplt-ub-26-min-base"
      cores         = 4
      memory_mb     = 16384
      disk_size     = "250G"
      storage       = "local-lvm"
      bridge        = "vmbr0"
      mac_address   = "aa:bb:cc:07:0f:03"
      tags = [
        "ac-cicd",
        "shared-k8s",
        "worker",
        "qa",
        "arc-runner",
        "terraform",
        "ansible",
      ]
    }
    qa_wk04 = {
      name          = "cicd-ac-k8s-qa-wk-04"
      description   = "Aspireclan shared Kubernetes QA ARC worker 04"
      vmid          = 3156212
      target_node   = "pve"
      template_name = "tmplt-ub-26-min-base"
      cores         = 4
      memory_mb     = 16384
      disk_size     = "250G"
      storage       = "local-lvm"
      bridge        = "vmbr0"
      mac_address   = "aa:bb:cc:07:0f:04"
      tags = [
        "ac-cicd",
        "shared-k8s",
        "worker",
        "qa",
        "arc-runner",
        "terraform",
        "ansible",
      ]
    }
  }
}

module "qa_workers" {
  source = "../../modules/proxmox-vm-group"
  vms    = local.qa_workers
}
The four workers use the approved modified sizing of 4 vCPU, 16 GB RAM, and 250 GB disk per VM.
6. Extend terraform/stacks/shared-k8s/outputs.tf
Append:
output "qa_workers" {
  description = "Shared Kubernetes QA worker VMs."
  value = {
    qa_wk01 = merge(module.qa_workers.vms["qa_wk01"], {
      reserved_ip = "192.168.8.209"
      mac_address = "aa:bb:cc:07:0f:01"
      environment = "qa"
    })
    qa_wk02 = merge(module.qa_workers.vms["qa_wk02"], {
      reserved_ip = "192.168.8.210"
      mac_address = "aa:bb:cc:07:0f:02"
      environment = "qa"
    })
    qa_wk03 = merge(module.qa_workers.vms["qa_wk03"], {
      reserved_ip = "192.168.8.211"
      mac_address = "aa:bb:cc:07:0f:03"
      environment = "qa"
    })
    qa_wk04 = merge(module.qa_workers.vms["qa_wk04"], {
      reserved_ip = "192.168.8.212"
      mac_address = "aa:bb:cc:07:0f:04"
      environment = "qa"
    })
  }
}
7. Confirm the existing Terraform workflow contract
The Terraform workflows created by the preceding pages must continue to implement:
Terraform plan workflow
push branch: dev
manual dispatch: supported
pull_request trigger: not used
action: fmt, init, validate, plan
Terraform apply workflow
push branch: prod
manual dispatch: supported from prod
action: fmt, init, validate, saved plan, apply
Persistent state
/var/lib/ac-cicd-infra/terraform-state/shared-k8s/terraform.tfstate
No new workflow is required for this Terraform change. The existing path filters already cover the shared stack.
8. Review and commit the Terraform change
git status
git diff --check
git diff --stat
git diff -- terraform/stacks/shared-k8s/main.tf terraform/stacks/shared-k8s/outputs.tf
Confirm:
- The load balancer, three control planes, and four development-worker definitions created by the preceding pages remain unchanged.
- Exactly four VM additions are present.
- VM IDs are 3156209 through 3156212.
- MAC addresses are aa:bb:cc:07:0f:01 through aa:bb:cc:07:0f:04.
- Each worker uses 4 cores, 16384 MB RAM, and a 250G scsi0 disk on local-lvm.
- No Terraform state, Proxmox token, kubeconfig, join command, or private SSH key is staged.
Commit and push:
git add terraform/stacks/shared-k8s/main.tf terraform/stacks/shared-k8s/outputs.tf
git commit -m "Provision shared Kubernetes QA workers"
git push -u origin feature/provision-k8s-qa-workers
9. Create the Terraform pull request into dev
gh pr create --base dev --head feature/provision-k8s-qa-workers --title "Provision shared Kubernetes QA workers" --body "Adds the four approved QA worker VMs to the shared Kubernetes Terraform stack."
After merge, the dev plan must end with:
Plan: 4 to add, 0 to change, 0 to destroy.
Do not promote to prod when Terraform proposes any update, replacement, or deletion of the load balancer or control planes, any change to the completed development workers, or anything other than the four approved QA workers.
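The plan-summary requirement can be turned into a mechanical gate. The sketch below assumes the plan text is piped in without color codes and matches the summary line exactly; plan_is_exactly_four_adds is a hypothetical helper name. It checks only the counts, so reviewing which resource addresses are being added remains a manual step.

```shell
#!/usr/bin/env bash
# Hypothetical gate; not part of the repository.
# Reads Terraform plan text on stdin and succeeds only when one whole line
# equals the required summary. Color codes would break the exact match,
# so produce the text with -no-color.
plan_is_exactly_four_adds() {
  grep -Fxq 'Plan: 4 to add, 0 to change, 0 to destroy.'
}

# Example use on the deploy host:
#   terraform plan -no-color | plan_is_exactly_four_adds || echo "STOP: unexpected plan"
```

The whole-line match (-x) is deliberate: a summary such as "Plan: 4 to add, 1 to change, 0 to destroy." must fail the gate rather than pass on a partial match.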
10. Promote the Terraform change from dev to prod
gh pr create --base prod --head dev --title "Provision shared Kubernetes QA workers" --body "Promotes the validated four QA worker Terraform plan to prod."
After merge and shared-k8s environment approval, the production workflow applies the saved plan and writes the new worker resources to the existing persistent Terraform state.
11. Verify the worker VMs in Proxmox
Run on the Proxmox host:
qm status 3156209
qm config 3156209
qm status 3156210
qm config 3156210
qm status 3156211
qm config 3156211
qm status 3156212
qm config 3156212
For every VM, confirm:
- Status is running.
- CPU is 4 cores.
- RAM is 16384 MB.
- scsi0 is on local-lvm with size 250G.
- The expected MAC address is present.
- onboot remains enabled.
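The configuration items in the checklist above can be checked mechanically from qm config output. The sketch below is a hypothetical helper (check_qm_config is not part of the repository) and assumes qm config prints one "key: value" pair per line, with the MAC inside the net0 value. It does not cover the running status, which comes from qm status.

```shell
#!/usr/bin/env bash
# Hypothetical checker; not part of the repository.
# Usage: qm config <vmid> | check_qm_config <expected-mac>
# Assumes lines such as "cores: 4" and "net0: virtio=<MAC>,bridge=vmbr0".
check_qm_config() {
  local expected_mac config
  # Compare MACs case-insensitively by upper-casing both sides.
  expected_mac="$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')"
  config="$(cat)"   # qm config output arrives on stdin

  grep -qx 'cores: 4' <<<"${config}" || { echo "FAIL: cores"; return 1; }
  grep -qx 'memory: 16384' <<<"${config}" || { echo "FAIL: memory"; return 1; }
  grep -qE '^scsi0: local-lvm:.*size=250G(,|$)' <<<"${config}" || { echo "FAIL: scsi0"; return 1; }
  grep -E '^net0:' <<<"${config}" | tr '[:lower:]' '[:upper:]' \
    | grep -qF "${expected_mac}" || { echo "FAIL: MAC"; return 1; }
  grep -qx 'onboot: 1' <<<"${config}" || { echo "FAIL: onboot"; return 1; }
  echo "OK"
}
```

For example, qm config 3156209 | check_qm_config aa:bb:cc:07:0f:01 should print OK for the first worker. If the real output format differs from the assumption, adjust the patterns rather than the approved values.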
12. Verify DHCP, SSH, and sudo
Run from prod-terraform-deploy-02:
for ip in 209 210 211 212; do
  echo "=== 192.168.8.${ip} ==="
  ping -c 2 -W 2 "192.168.8.${ip}"
  ssh \
    -i ~/.ssh/id_ed25519_ansible \
    -o IdentitiesOnly=yes \
    -o BatchMode=yes \
    -o ConnectTimeout=10 \
    "acllc@192.168.8.${ip}" \
    'hostnamectl --static; ip -brief address; sudo -n whoami'
done
Expected before Ansible:
- .209 through .212 respond.
- The Ansible automation key authenticates as acllc.
- sudo -n whoami returns root.
- The Ubuntu hostname may still show the base-template hostname.
Stop here until all four workers pass SSH and passwordless sudo checks.
Part B — Configure and Join the QA Workers with Ansible
13. QA-only Ansible files changed
ansible/inventories/shared-k8s/hosts.ini
ansible/inventories/shared-k8s/group_vars/qa_workers.yml
ansible/playbooks/shared-k8s/07-join-qa-workers.yml
ansible/playbooks/shared-k8s/08-label-and-taint-qa-workers.yml
.github/workflows/ansible-configure-qa-workers.yml
The development-worker implementation created by the preceding page must remain intact. This QA phase does not replace the development join playbook, development labels-and-taints playbook, development group variables, or development workflow.
Preserve without replacement
ansible/inventories/shared-k8s/group_vars/dev_workers.yml
ansible/roles/common/**
ansible/roles/containerd/**
ansible/roles/kubernetes-common/**
ansible/roles/kubernetes-worker/**
ansible/playbooks/shared-k8s/01-common-baseline.yml
ansible/playbooks/shared-k8s/03-prepare-kubernetes-nodes.yml
ansible/playbooks/shared-k8s/07-join-dev-workers.yml
ansible/playbooks/shared-k8s/08-label-and-taint-workers.yml
.github/workflows/ansible-configure-dev-workers.yml
The QA phase makes only these cumulative or new-file changes:
Update cumulatively
ansible/inventories/shared-k8s/hosts.ini
Create new QA-only files
ansible/inventories/shared-k8s/group_vars/qa_workers.yml
ansible/playbooks/shared-k8s/07-join-qa-workers.yml
ansible/playbooks/shared-k8s/08-label-and-taint-qa-workers.yml
.github/workflows/ansible-configure-qa-workers.yml
The common, containerd, Kubernetes-common, and Kubernetes-worker roles created by the preceding pages contain the required permanent fixes and are reused without replacement:
- Forced APT cache refresh before package installation.
- Package-install retries.
- ansible_facts access instead of deprecated injected variables.
- /var/tmp/ansible-acllc as the remote temporary directory.
- Containerd 2.x-compatible CRI validation.
- Correct task-level indentation.
- Explicit crictl runtime and image endpoints.
14. Create the Ansible feature branch from dev
Create this branch only after the production Terraform apply has completed successfully:
cd D:\code\ASPIRECLAN-LLC-Org\ac-cicd-infra
git switch dev
git pull --ff-only origin dev
git switch -c feature/configure-k8s-qa-workers
15. Update the shared inventory cumulatively
At this rebuild stage, the inventory must contain the load balancer, control planes, and development workers created by preceding pages plus the four new QA workers. Replace ansible/inventories/shared-k8s/hosts.ini only with this complete cumulative version:
[load_balancers]
cicd-ac-k8s-lb-01 ansible_host=192.168.8.201 ansible_user=acllc node_primary_ip=192.168.8.201 node_interface=ens18
[first_control_plane]
cicd-ac-k8s-cp-01 ansible_host=192.168.8.202 ansible_user=acllc node_primary_ip=192.168.8.202 node_interface=ens18
[additional_control_planes]
cicd-ac-k8s-cp-02 ansible_host=192.168.8.203 ansible_user=acllc node_primary_ip=192.168.8.203 node_interface=ens18
cicd-ac-k8s-cp-03 ansible_host=192.168.8.204 ansible_user=acllc node_primary_ip=192.168.8.204 node_interface=ens18
[control_planes:children]
first_control_plane
additional_control_planes
[dev_workers]
cicd-ac-k8s-dev-wk-01 ansible_host=192.168.8.213 ansible_user=acllc node_primary_ip=192.168.8.213 node_interface=ens18
cicd-ac-k8s-dev-wk-02 ansible_host=192.168.8.214 ansible_user=acllc node_primary_ip=192.168.8.214 node_interface=ens18
cicd-ac-k8s-dev-wk-03 ansible_host=192.168.8.215 ansible_user=acllc node_primary_ip=192.168.8.215 node_interface=ens18
cicd-ac-k8s-dev-wk-04 ansible_host=192.168.8.216 ansible_user=acllc node_primary_ip=192.168.8.216 node_interface=ens18
[qa_workers]
cicd-ac-k8s-qa-wk-01 ansible_host=192.168.8.209 ansible_user=acllc node_primary_ip=192.168.8.209 node_interface=ens18
cicd-ac-k8s-qa-wk-02 ansible_host=192.168.8.210 ansible_user=acllc node_primary_ip=192.168.8.210 node_interface=ens18
cicd-ac-k8s-qa-wk-03 ansible_host=192.168.8.211 ansible_user=acllc node_primary_ip=192.168.8.211 node_interface=ens18
cicd-ac-k8s-qa-wk-04 ansible_host=192.168.8.212 ansible_user=acllc node_primary_ip=192.168.8.212 node_interface=ens18
[prod_workers]
[workers:children]
dev_workers
qa_workers
prod_workers
[k8s_cluster:children]
control_planes
workers
[all:vars]
ansible_python_interpreter=/usr/bin/python3
Before continuing, confirm both worker groups remain present:
grep -nE 'cicd-ac-k8s-dev-wk-0[1-4]' ansible/inventories/shared-k8s/hosts.ini
grep -nE 'cicd-ac-k8s-qa-wk-0[1-4]' ansible/inventories/shared-k8s/hosts.ini
16. Create QA worker variables without changing development variables
Do not edit or replace:
ansible/inventories/shared-k8s/group_vars/dev_workers.yml
Create the new file ansible/inventories/shared-k8s/group_vars/qa_workers.yml:
---
worker_environment: qa
worker_workload: github-runner
kubernetes_node_tcp_ports:
  - "10250"
calico_node_tcp_ports:
  - "5473"
calico_node_udp_ports:
  - "4789"
worker_labels:
  environment: qa
  workload: github-runner
worker_taints:
  - key: environment
    value: qa
    effect: NoSchedule
The existing control_planes.yml, generic Kubernetes firewall tasks, node-preparation playbook, and Kubernetes-worker role remain unchanged from the completed development-worker phase.
17. Preserve the existing development-worker playbooks
Do not edit, rename, or replace:
ansible/playbooks/shared-k8s/07-join-dev-workers.yml
ansible/playbooks/shared-k8s/08-label-and-taint-workers.yml
Those files continue to own development-worker reconciliation.
18. Add a QA-specific worker-join playbook
Create the new file:
ansible/playbooks/shared-k8s/07-join-qa-workers.yml
Use:
---
- name: Generate a fresh Kubernetes worker join command for QA workers
  hosts: first_control_plane
  become: true
  gather_facts: false
  tasks:
    - name: Confirm the Kubernetes API is ready
      ansible.builtin.command:
        cmd: kubectl --kubeconfig=/etc/kubernetes/admin.conf get --raw=/readyz
      register: qa_worker_join_api_ready
      changed_when: false
      retries: 12
      delay: 10
      until: qa_worker_join_api_ready.stdout | trim == "ok"
    - name: Generate a fresh QA worker bootstrap-token join command
      ansible.builtin.command:
        cmd: kubeadm token create --ttl 2h --print-join-command
      register: generated_qa_worker_join_command
      changed_when: true
      no_log: true
    - name: Store the temporary QA worker join command in memory
      ansible.builtin.set_fact:
        shared_qa_worker_join_command: "{{ generated_qa_worker_join_command.stdout }}"
      no_log: true

- name: Join the QA workers one at a time
  hosts: qa_workers
  serial: 1
  become: true
  gather_facts: true
  vars:
    worker_join_command: >-
      {{ hostvars[groups['first_control_plane'][0]].shared_qa_worker_join_command }}
  roles:
    - role: kubernetes-worker

- name: Verify all QA workers are Ready
  hosts: first_control_plane
  become: true
  gather_facts: false
  environment:
    KUBECONFIG: /etc/kubernetes/admin.conf
  tasks:
    - name: Wait for each QA worker to become Ready
      ansible.builtin.command:
        cmd: >-
          kubectl wait
          --for=condition=Ready
          node/{{ item }}
          --timeout=15m
      loop: "{{ groups['qa_workers'] }}"
      changed_when: false
    - name: Display the joined QA workers
      ansible.builtin.command:
        cmd: >-
          kubectl get nodes
          -l environment=qa,workload=github-runner
          -L environment,workload
          -o wide
      register: joined_qa_workers
      changed_when: false
      failed_when: false
    - name: Print the QA worker table
      ansible.builtin.debug:
        var: joined_qa_workers.stdout_lines
This playbook targets only qa_workers. It cannot join or modify development workers.
19. Add a QA-specific labels-and-taints playbook
Create the new file:
ansible/playbooks/shared-k8s/08-label-and-taint-qa-workers.yml
Use:
---
- name: Apply QA worker labels and taints
  hosts: first_control_plane
  become: true
  gather_facts: false
  environment:
    KUBECONFIG: /etc/kubernetes/admin.conf
  tasks:
    - name: Apply the approved QA worker labels
      ansible.builtin.command:
        cmd: >-
          kubectl label node {{ item }}
          environment={{ hostvars[item].worker_environment | default('qa') }}
          workload={{ hostvars[item].worker_workload | default('github-runner') }}
          --overwrite
      loop: "{{ groups['qa_workers'] }}"
      register: qa_worker_label_results
      changed_when: "'not labeled' not in qa_worker_label_results.stdout"
    - name: Apply the approved QA worker taint
      ansible.builtin.command:
        cmd: >-
          kubectl taint node {{ item }}
          environment=qa:NoSchedule
          --overwrite
      loop: "{{ groups['qa_workers'] }}"
      register: qa_worker_taint_results
      changed_when: "'not tainted' not in qa_worker_taint_results.stdout"
    - name: Display QA worker labels
      ansible.builtin.command:
        cmd: >-
          kubectl get nodes
          -l environment=qa,workload=github-runner
          -L environment,workload
          -o wide
      register: labeled_qa_workers
      changed_when: false
    - name: Print labeled QA workers
      ansible.builtin.debug:
        var: labeled_qa_workers.stdout_lines
    - name: Verify the QA taint on every worker
      ansible.builtin.shell:
        executable: /bin/bash
        cmd: |
          set -euo pipefail
          kubectl get node {{ item }} \
            -o jsonpath='{range .spec.taints[*]}{.key}={.value}:{.effect}{"\n"}{end}' |
            grep -Fx 'environment=qa:NoSchedule'
      loop: "{{ groups['qa_workers'] }}"
      changed_when: false
Every QA worker receives:
environment=qa
workload=github-runner
environment=qa:NoSchedule
The QA default is explicitly qa; it does not fall back to dev.
20. Add the QA-worker GitHub Actions workflow
Create .github/workflows/ansible-configure-qa-workers.yml:
name: Ansible Configure - Kubernetes QA Workers
on:
  push:
    branches:
      - dev
      - prod
    paths:
      - "ansible/inventories/shared-k8s/group_vars/qa_workers.yml"
      - "ansible/playbooks/shared-k8s/07-join-qa-workers.yml"
      - "ansible/playbooks/shared-k8s/08-label-and-taint-qa-workers.yml"
      - ".github/workflows/ansible-configure-qa-workers.yml"
  workflow_dispatch:
permissions:
  contents: read
concurrency:
  group: shared-k8s-ansible
  cancel-in-progress: false
env:
  ANSIBLE_CONFIG: ${{ github.workspace }}/ansible/ansible.cfg
jobs:
  validate:
    name: Validate QA-worker Ansible configuration
    runs-on:
      - self-hosted
      - Linux
      - X64
      - prod
      - terraform
      - deploy
      - ac-cicd-infra
    steps:
      - name: Checkout repository
        uses: actions/checkout@v5
      - name: Verify Ansible
        shell: bash
        run: |
          set -euo pipefail
          ansible --version
          ansible-playbook --version
      - name: Validate the cumulative shared inventory
        working-directory: ansible
        shell: bash
        run: |
          set -euo pipefail
          ansible-inventory -i inventories/shared-k8s/hosts.ini --graph
          for host in cicd-ac-k8s-dev-wk-01 cicd-ac-k8s-dev-wk-02 cicd-ac-k8s-dev-wk-03 cicd-ac-k8s-dev-wk-04
          do
            grep -Fq "${host} " inventories/shared-k8s/hosts.ini
          done
          for host in cicd-ac-k8s-qa-wk-01 cicd-ac-k8s-qa-wk-02 cicd-ac-k8s-qa-wk-03 cicd-ac-k8s-qa-wk-04
          do
            grep -Fq "${host} " inventories/shared-k8s/hosts.ini
          done
      - name: Confirm completed development-worker files remain present
        shell: bash
        run: |
          set -euo pipefail
          required_dev_files=(
            "ansible/inventories/shared-k8s/group_vars/dev_workers.yml"
            "ansible/playbooks/shared-k8s/07-join-dev-workers.yml"
            "ansible/playbooks/shared-k8s/08-label-and-taint-workers.yml"
            ".github/workflows/ansible-configure-dev-workers.yml"
          )
          for file in "${required_dev_files[@]}"; do
            if [ ! -f "${file}" ]; then
              echo "ERROR: Completed development-worker file is missing: ${file}"
              exit 1
            fi
          done
      - name: Syntax-check the QA-worker playbooks
        working-directory: ansible
        shell: bash
        run: |
          set -euo pipefail
          for playbook in playbooks/shared-k8s/01-common-baseline.yml playbooks/shared-k8s/03-prepare-kubernetes-nodes.yml playbooks/shared-k8s/07-join-qa-workers.yml playbooks/shared-k8s/08-label-and-taint-qa-workers.yml
          do
            ansible-playbook -i inventories/shared-k8s/hosts.ini "${playbook}" --syntax-check
          done
  configure:
    name: Configure and join QA workers
    needs:
      - validate
    if: >-
      (github.event_name == 'push' && github.ref_name == 'prod') ||
      (github.event_name == 'workflow_dispatch' && github.ref_name == 'prod')
    environment:
      name: shared-k8s
    runs-on:
      - self-hosted
      - Linux
      - X64
      - prod
      - terraform
      - deploy
      - ac-cicd-infra
    timeout-minutes: 180
    steps:
      - name: Checkout repository
        uses: actions/checkout@v5
      - name: Verify the production branch
        shell: bash
        run: |
          set -euo pipefail
          if [ "${GITHUB_REF_NAME}" != "prod" ]; then
            echo "ERROR: QA-worker configuration is permitted only from prod."
            exit 1
          fi
      - name: Prepare the existing Ansible SSH key
        shell: bash
        run: |
          set -euo pipefail
          KEY_PATH="${HOME}/.ssh/id_ed25519_ansible"
          if [ ! -f "${KEY_PATH}" ]; then
            echo "ERROR: Missing Ansible key: ${KEY_PATH}"
            exit 1
          fi
          chmod 600 "${KEY_PATH}"
          echo "ANSIBLE_PRIVATE_KEY_FILE=${KEY_PATH}" >> "${GITHUB_ENV}"
      - name: Refresh QA-worker SSH host keys
        shell: bash
        run: |
          set -euo pipefail
          mkdir -p "${HOME}/.ssh"
          chmod 700 "${HOME}/.ssh"
          touch "${HOME}/.ssh/known_hosts"
          chmod 600 "${HOME}/.ssh/known_hosts"
          for ip in 192.168.8.209 192.168.8.210 192.168.8.211 192.168.8.212; do
            ssh-keygen -f "${HOME}/.ssh/known_hosts" -R "${ip}" || true
            captured=false
            for attempt in $(seq 1 60); do
              if ssh-keyscan -T 5 -H "${ip}" >> "${HOME}/.ssh/known_hosts" 2>/dev/null; then
                echo "SSH host key captured for ${ip}."
                captured=true
                break
              fi
              echo "Waiting for SSH on ${ip} (attempt ${attempt}/60)..."
              sleep 10
            done
            if [ "${captured}" != "true" ]; then
              echo "ERROR: Unable to capture SSH host key for ${ip}."
              exit 1
            fi
          done
      - name: Prepare the Ansible remote temporary directory
        shell: bash
        run: |
          set -euo pipefail
          for ip in 192.168.8.209 192.168.8.210 192.168.8.211 192.168.8.212; do
            ssh -i "${ANSIBLE_PRIVATE_KEY_FILE}" -o IdentitiesOnly=yes -o BatchMode=yes "acllc@${ip}" 'sudo install -d -m 0700 -o acllc -g acllc /var/tmp/ansible-acllc'
          done
      - name: Verify Ansible connectivity
        working-directory: ansible
        shell: bash
        run: |
          set -euo pipefail
          ansible -i inventories/shared-k8s/hosts.ini qa_workers --private-key "${ANSIBLE_PRIVATE_KEY_FILE}" -m ping
      - name: Confirm the existing development workers remain Ready
        working-directory: ansible
        shell: bash
        run: |
          set -euo pipefail
          ansible -i inventories/shared-k8s/hosts.ini first_control_plane --private-key "${ANSIBLE_PRIVATE_KEY_FILE}" -b -m shell -a '
            set -e
            export KUBECONFIG=/etc/kubernetes/admin.conf
            for node in cicd-ac-k8s-dev-wk-01 cicd-ac-k8s-dev-wk-02 cicd-ac-k8s-dev-wk-03 cicd-ac-k8s-dev-wk-04
            do
              kubectl wait --for=condition=Ready "node/${node}" --timeout=2m
            done
          '
      - name: Confirm the existing Kubernetes API is healthy
        working-directory: ansible
        shell: bash
        run: |
          set -euo pipefail
          ansible -i inventories/shared-k8s/hosts.ini first_control_plane --private-key "${ANSIBLE_PRIVATE_KEY_FILE}" -b -m command -a 'kubectl --kubeconfig=/etc/kubernetes/admin.conf get --raw=/readyz'
      - name: Apply the common Ubuntu baseline
        working-directory: ansible
        shell: bash
        run: |
          set -euo pipefail
          ansible-playbook -i inventories/shared-k8s/hosts.ini --private-key "${ANSIBLE_PRIVATE_KEY_FILE}" --limit qa_workers playbooks/shared-k8s/01-common-baseline.yml
      - name: Prepare containerd and Kubernetes prerequisites
        working-directory: ansible
        shell: bash
        run: |
          set -euo pipefail
          ansible-playbook -i inventories/shared-k8s/hosts.ini --private-key "${ANSIBLE_PRIVATE_KEY_FILE}" --limit qa_workers playbooks/shared-k8s/03-prepare-kubernetes-nodes.yml
      - name: Join only the QA workers
        working-directory: ansible
        shell: bash
        run: |
          set -euo pipefail
          ansible-playbook -i inventories/shared-k8s/hosts.ini --private-key "${ANSIBLE_PRIVATE_KEY_FILE}" playbooks/shared-k8s/07-join-qa-workers.yml
      - name: Apply only QA labels and taints
        working-directory: ansible
        shell: bash
        run: |
          set -euo pipefail
          ansible-playbook -i inventories/shared-k8s/hosts.ini --private-key "${ANSIBLE_PRIVATE_KEY_FILE}" playbooks/shared-k8s/08-label-and-taint-qa-workers.yml
      - name: Verify QA workers and preserve development workers
        working-directory: ansible
        shell: bash
        run: |
          set -euo pipefail
          ansible -i inventories/shared-k8s/hosts.ini first_control_plane --private-key "${ANSIBLE_PRIVATE_KEY_FILE}" -b -m shell -a '
            set -e
            export KUBECONFIG=/etc/kubernetes/admin.conf
            echo "=== QA WORKERS ==="
            kubectl get nodes -l environment=qa,workload=github-runner -L environment,workload -o wide
            echo "=== DEVELOPMENT WORKERS - MUST REMAIN PRESENT ==="
            kubectl get nodes -l environment=dev,workload=github-runner -L environment,workload -o wide
            kubectl get --raw=/readyz
          '
The QA workflow:
- Uses QA-specific playbook filenames.
- Targets only qa_workers for baseline and node preparation.
- Verifies all completed development workers still exist in inventory.
- Verifies completed development workers remain Ready.
- Does not call the development join or labels playbooks.
- Uses QA addresses .209-.212.
Workflow behavior:
| Event | Result |
|---|---|
| Push to dev | Cumulative inventory and QA playbook validation only |
| Push to prod | QA baseline, preparation, join, labels, taints, and verification |
| Manual dispatch from prod | Idempotent QA-worker reconciliation |
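The table follows from the configure job's if: expression, while the validate job has no condition and runs on every trigger. The sketch below restates that expression as a bash function so the gating can be reasoned about offline. It is a hypothetical illustration (configure_job_runs is not a repository file), and the workflow_dispatch-from-dev row is inferred from the expression rather than listed in the table.

```shell
#!/usr/bin/env bash
# Hypothetical restatement of the configure job's `if:` expression.
# $1 = github.event_name, $2 = github.ref_name
configure_job_runs() {
  case "$1:$2" in
    push:prod | workflow_dispatch:prod) return 0 ;;
    *) return 1 ;;
  esac
}

# Walk the event/branch combinations and print which jobs would run.
for pair in push:dev push:prod workflow_dispatch:dev workflow_dispatch:prod; do
  if configure_job_runs "${pair%%:*}" "${pair##*:}"; then
    echo "${pair} -> validate, then configure"
  else
    echo "${pair} -> validate only"
  fi
done
```

The in-job branch check ("Verify the production branch") is a second guard behind this expression, so a mistake in either one alone does not let a non-prod ref configure the workers.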
21. Review the QA-only change and prove development files are unchanged
git status
git diff --check
git diff --stat
git diff -- ansible/inventories/shared-k8s/hosts.ini ansible/inventories/shared-k8s/group_vars/qa_workers.yml ansible/playbooks/shared-k8s/07-join-qa-workers.yml ansible/playbooks/shared-k8s/08-label-and-taint-qa-workers.yml .github/workflows/ansible-configure-qa-workers.yml
git diff --exit-code -- ansible/inventories/shared-k8s/group_vars/dev_workers.yml ansible/playbooks/shared-k8s/07-join-dev-workers.yml ansible/playbooks/shared-k8s/08-label-and-taint-workers.yml .github/workflows/ansible-configure-dev-workers.yml
The final git diff --exit-code command must return exit code 0. Any output means a completed development-worker file was modified; restore it before committing.
Confirm:
- All four development workers remain in hosts.ini.
- dev_workers.yml is unchanged.
- 07-join-dev-workers.yml is unchanged.
- 08-label-and-taint-workers.yml is unchanged.
- ansible-configure-dev-workers.yml is unchanged.
- QA files use .209-.212, environment=qa, and environment=qa:NoSchedule.
- No secret, kubeconfig, join token, Terraform state, or private key is staged.
Commit only the cumulative inventory and QA-specific files:
git add ansible/inventories/shared-k8s/hosts.ini ansible/inventories/shared-k8s/group_vars/qa_workers.yml ansible/playbooks/shared-k8s/07-join-qa-workers.yml ansible/playbooks/shared-k8s/08-label-and-taint-qa-workers.yml .github/workflows/ansible-configure-qa-workers.yml
git commit -m "Configure shared Kubernetes QA workers without changing development workers"
git push -u origin feature/configure-k8s-qa-workers
22. Create the Ansible pull request into dev
gh pr create --base dev --head feature/configure-k8s-qa-workers --title "Configure shared Kubernetes QA workers" --body "Adds the Ubuntu baseline, Kubernetes prerequisites, worker joins, and approved QA labels and taints."
Merge only after cumulative inventory validation and all QA playbook syntax checks succeed.
23. Promote the Ansible change from dev to prod
gh pr create --base prod --head dev --title "Configure shared Kubernetes QA workers" --body "Promotes the validated four QA worker configuration to prod."
After merge and environment approval, the workflow runs in this order:
- Confirm all four completed development workers remain Ready.
- Verify the existing Kubernetes API.
- Apply the Ubuntu baseline only to QA workers.
- Configure containerd and Kubernetes prerequisites only on QA workers.
- Generate a temporary QA worker join command.
- Join the four QA workers serially.
- Wait for every QA worker to become Ready.
- Apply only QA labels and the QA taint.
- Display both the QA and preserved development worker sets.
24. Manual verification
Run from prod-terraform-deploy-02:
ssh -i ~/.ssh/id_ed25519_ansible -o IdentitiesOnly=yes acllc@192.168.8.202 'sudo bash -s' <<'EOF'
# The quoted heredoc delimiter keeps ${node} and the jsonpath quotes from
# being expanded locally or by the remote login shell.
export KUBECONFIG=/etc/kubernetes/admin.conf
echo "=== QA WORKERS ==="
kubectl get nodes -l environment=qa,workload=github-runner -L environment,workload -o wide
echo "=== QA TAINTS ==="
for node in cicd-ac-k8s-qa-wk-01 cicd-ac-k8s-qa-wk-02 cicd-ac-k8s-qa-wk-03 cicd-ac-k8s-qa-wk-04
do
  echo "--- ${node} ---"
  kubectl get node "${node}" -o jsonpath='{range .spec.taints[*]}{.key}={.value}:{.effect}{"\n"}{end}'
done
echo "=== DEVELOPMENT WORKERS PRESERVED ==="
kubectl get nodes -l environment=dev,workload=github-runner -L environment,workload -o wide
echo "=== API READINESS ==="
kubectl get --raw=/readyz
EOF
Verify:
- Four QA workers are Ready.
- QA labels and taints are correct.
- Four development workers are still present and Ready.
- Development labels and taints remain unchanged.
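Counting Ready rows by eye is easy to get wrong in a wide table. The sketch below is a hypothetical helper (count_ready_nodes is not part of the repository) that reads kubectl get nodes table output and counts rows whose STATUS is exactly Ready. It assumes the default column order, with NAME first and STATUS second.

```shell
#!/usr/bin/env bash
# Hypothetical helper; not part of the repository.
# Reads `kubectl get nodes` table output on stdin and prints how many rows
# have STATUS exactly "Ready". The header row is skipped by name, so the
# function also works with --no-headers output.
count_ready_nodes() {
  awk '$1 != "NAME" && $2 == "Ready" { n++ } END { print n + 0 }'
}

# Example use on the first control plane:
#   kubectl get nodes -l environment=qa,workload=github-runner | count_ready_nodes
#   kubectl get nodes -l environment=dev,workload=github-runner | count_ready_nodes
```

Both commands should print 4 at this checkpoint. A cordoned node reports Ready,SchedulingDisabled and is deliberately not counted, so a lower number is a prompt to inspect that node rather than a false pass.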
25. Expected final state
EXPECTED REBUILD ACCEPTANCE CHECKPOINT
QA worker VMs:
cicd-ac-k8s-qa-wk-01 192.168.8.209 Ready
cicd-ac-k8s-qa-wk-02 192.168.8.210 Ready
cicd-ac-k8s-qa-wk-03 192.168.8.211 Ready
cicd-ac-k8s-qa-wk-04 192.168.8.212 Ready
Kubernetes labels on every QA worker:
environment=qa
workload=github-runner
Kubernetes taint on every QA worker:
environment=qa:NoSchedule
Cluster checkpoint after this page:
3 control planes Ready
4 development workers Ready and unchanged
4 QA workers Ready
Kubernetes API /readyz returns ok
Still implemented by later pages:
4 production workers at 192.168.8.205-.208
Shared ARC controller
Repository and environment runner scale sets
This is a rebuild acceptance checkpoint, not a statement about the current live environment.
26. Failure handling
Terraform proposes changes to existing infrastructure
Stop. Do not apply. The plan must be exactly four additions and no changes or deletions.
A worker receives the wrong DHCP address
Check its Proxmox MAC and router reservation. Do not configure a static Netplan address.
APT reports a package is unavailable
The roles already force an APT refresh. Check DNS, internet access, Ubuntu repository files, and pkgs.k8s.io. Do not rebuild the template merely to preload packages.
Containerd CRI validation fails
Run on the affected worker:
sudo systemctl status containerd --no-pager
sudo ctr plugins ls
sudo grep -nE 'disabled_plugins|SystemdCgroup' /etc/containerd/config.toml
sudo crictl --runtime-endpoint unix:///run/containerd/containerd.sock info
The containerd role must continue using the corrected type-and-ID column checks for both legacy and containerd 2.x plugin layouts.
A worker cannot join
Inspect:
sudo journalctl -u kubelet -n 200 --no-pager
sudo crictl ps -a
sudo test -f /etc/kubernetes/kubelet.conf && echo joined || echo not-joined
Rerun the approved worker-join workflow. It generates a fresh token automatically. Do not paste join commands into Git.
A worker remains NotReady
From cicd-ac-k8s-cp-01, inspect:
sudo kubectl --kubeconfig=/etc/kubernetes/admin.conf get nodes -o wide
sudo kubectl --kubeconfig=/etc/kubernetes/admin.conf get pods -A -o wide
sudo kubectl --kubeconfig=/etc/kubernetes/admin.conf describe node <WORKER_NAME>
Also verify Calico and kubelet logs on the affected worker.
ARC pods are pending after the controller is installed later
The QA runner values must include a matching selector and toleration:
nodeSelector:
  environment: qa
  workload: github-runner
tolerations:
  - key: environment
    operator: Equal
    value: qa
    effect: NoSchedule
27. Project status after successful completion
Use this as the expected acceptance checkpoint for a fresh rebuild. It does not assert that any previously existing Kubernetes VM or cluster is still present.
FROM-SCRATCH STATUS AFTER THIS PAGE
Expected prerequisites retained
Load balancer, API VIP, and three control planes healthy
Development workers at 192.168.8.213-.216 Ready
Expected QA result
QA Terraform definitions added
QA VMs at 192.168.8.209-.212 provisioned
QA Ubuntu baseline applied
QA Kubernetes prerequisites configured
QA workers joined and Ready
QA labels and NoSchedule taints verified
Development worker files and nodes preserved
Next page
Production workers at 192.168.8.205-.208
Later pages
Shared ARC controller
Tenant runner scale sets
After this rebuild checkpoint passes, continue to the production-worker page to provision .205-.208 and apply environment=prod, workload=github-runner, and environment=prod:NoSchedule.