Provision and Join the Four Kubernetes Development Workers
This is the fifth infrastructure page in the from-scratch build sequence. Follow it after the control-plane page has produced a healthy three-node Kubernetes control plane; it provisions the four development ARC worker VMs.
FROM-SCRATCH SEQUENCE CHECKPOINT
Required before this page
Load-balancer VM, HAProxy, Keepalived, and API VIP configured
Three control-plane VMs provisioned and reachable
Kubernetes v1.36.1 control plane bootstrapped
Calico v3.32.0 installed
All three control planes Ready
Implemented by this page
Development worker Terraform definitions
Four development worker VM provisions
Ubuntu baseline and Kubernetes node preparation
Development worker joins
Development worker labels and taints
Implemented by later pages
QA workers
Production workers
ARC controller
Tenant runner scale sets
1. Scope and execution order
This page is performed in two separate promotions:
- Terraform promotion: create only the four development worker VMs.
- Ansible promotion: after all four VMs are reachable, configure Ubuntu, containerd, Kubernetes prerequisites, join the workers, and apply the approved development labels and taints.
Do not combine these two changes into the same production promotion. The Ansible workflow must not start until Terraform has created all four VMs and DHCP has assigned the approved addresses.
The approved branch model is:
feature/*
↓ pull request
dev
↓ validation and Terraform plan only
dev → prod pull request
↓ review and shared-k8s approval
prod
↓ Terraform apply or Ansible configuration
Not used by this infrastructure execution flow:
local, qa, main
dev performs validation and Terraform plan only. prod performs Terraform apply or Ansible configuration. Do not use local, qa, or main in this infrastructure execution flow.
2. Approved development worker allocation
cicd-ac-k8s-dev-wk-01
VM ID: 3156213
MAC: aa:bb:cc:06:0f:01
Reserved IP: 192.168.8.213
CPU: 4 vCPU
RAM: 16384 MB
Disk: scsi0, 250G, local-lvm
cicd-ac-k8s-dev-wk-02
VM ID: 3156214
MAC: aa:bb:cc:06:0f:02
Reserved IP: 192.168.8.214
CPU: 4 vCPU
RAM: 16384 MB
Disk: scsi0, 250G, local-lvm
cicd-ac-k8s-dev-wk-03
VM ID: 3156215
MAC: aa:bb:cc:06:0f:03
Reserved IP: 192.168.8.215
CPU: 4 vCPU
RAM: 16384 MB
Disk: scsi0, 250G, local-lvm
cicd-ac-k8s-dev-wk-04
VM ID: 3156216
MAC: aa:bb:cc:06:0f:04
Reserved IP: 192.168.8.216
CPU: 4 vCPU
RAM: 16384 MB
Disk: scsi0, 250G, local-lvm
Shared values
Template: tmplt-ub-26-min-base / VM ID 90000
Node: pve
Bridge: vmbr0
Environment: dev
Workload: github-runner
The environment-specific address ranges are intentionally not ordered dev-first:
Approved worker address order
Production workers: 192.168.8.205-.208
QA workers: 192.168.8.209-.212
Development workers: 192.168.8.213-.216
Development workers therefore use VM IDs 3156213-3156216.
Confirm the router has all four DHCP reservations before running Terraform. Ubuntu must continue using DHCP; do not configure static Netplan addresses.
Do not change the approved VM IDs, MAC addresses, IP reservations, scsi0 disk slot, local-lvm storage, or development environment assignment while following this page.
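The allocation table follows a visible pattern: each VM ID equals 3156000 plus the last octet of the reserved IP, and the final MAC octet matches the worker index. The following is a minimal sketch of a consistency check built on that pattern. The pattern itself is inferred from the table rather than stated as a rule, and the helper names expected_vmid and check_allocation are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: derive the expected VM ID from a reserved IP,
# assuming the pattern visible in the table (3156000 + last octet).
expected_vmid() {
  local ip="$1"
  local octet="${ip##*.}"
  echo $((3156000 + octet))
}

# Hypothetical helper: check one allocation row (name, VM ID, MAC, IP).
check_allocation() {
  local name="$1" vmid="$2" mac="$3" ip="$4"
  local index="${name##*-}"   # "01" from cicd-ac-k8s-dev-wk-01
  [ "${vmid}" = "$(expected_vmid "${ip}")" ] || { echo "${name}: VM ID/IP mismatch"; return 1; }
  [ "${mac}" = "aa:bb:cc:06:0f:${index}" ] || { echo "${name}: MAC mismatch"; return 1; }
  echo "${name}: ok"
}

# The four approved rows from this page.
check_allocation cicd-ac-k8s-dev-wk-01 3156213 aa:bb:cc:06:0f:01 192.168.8.213
check_allocation cicd-ac-k8s-dev-wk-02 3156214 aa:bb:cc:06:0f:02 192.168.8.214
check_allocation cicd-ac-k8s-dev-wk-03 3156215 aa:bb:cc:06:0f:03 192.168.8.215
check_allocation cicd-ac-k8s-dev-wk-04 3156216 aa:bb:cc:06:0f:04 192.168.8.216
```

A check like this only catches transcription errors between the table and the Terraform map; it does not replace confirming the router reservations.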
Part A — Provision the Development Worker VMs with Terraform
3. Terraform files changed
terraform/stacks/shared-k8s/main.tf
terraform/stacks/shared-k8s/outputs.tf
The proxmox-vm and proxmox-vm-group modules created by the preceding pages remain unchanged. This page adds a new development-worker map to the existing shared Kubernetes stack.
4. Create the Terraform feature branch from dev
Run from Windows PowerShell:
cd D:\code\ASPIRECLAN-LLC-Org\ac-cicd-infra
git switch dev
git pull --ff-only origin dev
git switch -c feature/provision-k8s-dev-workers
5. Extend terraform/stacks/shared-k8s/main.tf
Do not replace or edit the completed load-balancer and control-plane definitions. Append this block after the existing control-plane module:
locals {
dev_workers = {
dev_wk01 = {
name = "cicd-ac-k8s-dev-wk-01"
description = "Aspireclan shared Kubernetes development ARC worker 01"
vmid = 3156213
target_node = "pve"
template_name = "tmplt-ub-26-min-base"
cores = 4
memory_mb = 16384
disk_size = "250G"
storage = "local-lvm"
bridge = "vmbr0"
mac_address = "aa:bb:cc:06:0f:01"
tags = [
"ac-cicd",
"shared-k8s",
"worker",
"dev",
"arc-runner",
"terraform",
"ansible",
]
}
dev_wk02 = {
name = "cicd-ac-k8s-dev-wk-02"
description = "Aspireclan shared Kubernetes development ARC worker 02"
vmid = 3156214
target_node = "pve"
template_name = "tmplt-ub-26-min-base"
cores = 4
memory_mb = 16384
disk_size = "250G"
storage = "local-lvm"
bridge = "vmbr0"
mac_address = "aa:bb:cc:06:0f:02"
tags = [
"ac-cicd",
"shared-k8s",
"worker",
"dev",
"arc-runner",
"terraform",
"ansible",
]
}
dev_wk03 = {
name = "cicd-ac-k8s-dev-wk-03"
description = "Aspireclan shared Kubernetes development ARC worker 03"
vmid = 3156215
target_node = "pve"
template_name = "tmplt-ub-26-min-base"
cores = 4
memory_mb = 16384
disk_size = "250G"
storage = "local-lvm"
bridge = "vmbr0"
mac_address = "aa:bb:cc:06:0f:03"
tags = [
"ac-cicd",
"shared-k8s",
"worker",
"dev",
"arc-runner",
"terraform",
"ansible",
]
}
dev_wk04 = {
name = "cicd-ac-k8s-dev-wk-04"
description = "Aspireclan shared Kubernetes development ARC worker 04"
vmid = 3156216
target_node = "pve"
template_name = "tmplt-ub-26-min-base"
cores = 4
memory_mb = 16384
disk_size = "250G"
storage = "local-lvm"
bridge = "vmbr0"
mac_address = "aa:bb:cc:06:0f:04"
tags = [
"ac-cicd",
"shared-k8s",
"worker",
"dev",
"arc-runner",
"terraform",
"ansible",
]
}
}
}
module "dev_workers" {
source = "../../modules/proxmox-vm-group"
vms = local.dev_workers
}
The four workers use the approved modified sizing of 4 vCPU, 16 GB RAM, and 250 GB disk per VM.
6. Extend terraform/stacks/shared-k8s/outputs.tf
Append:
output "dev_workers" {
description = "Shared Kubernetes development worker VMs."
value = {
dev_wk01 = merge(module.dev_workers.vms["dev_wk01"], {
reserved_ip = "192.168.8.213"
mac_address = "aa:bb:cc:06:0f:01"
environment = "dev"
})
dev_wk02 = merge(module.dev_workers.vms["dev_wk02"], {
reserved_ip = "192.168.8.214"
mac_address = "aa:bb:cc:06:0f:02"
environment = "dev"
})
dev_wk03 = merge(module.dev_workers.vms["dev_wk03"], {
reserved_ip = "192.168.8.215"
mac_address = "aa:bb:cc:06:0f:03"
environment = "dev"
})
dev_wk04 = merge(module.dev_workers.vms["dev_wk04"], {
reserved_ip = "192.168.8.216"
mac_address = "aa:bb:cc:06:0f:04"
environment = "dev"
})
}
}
7. Confirm the existing Terraform workflow contract
The Terraform workflows created by the preceding pages must implement:
Terraform plan workflow
push branch: dev
manual dispatch: supported
pull_request trigger: not used
action: fmt, init, validate, plan
Terraform apply workflow
push branch: prod
manual dispatch: supported from prod
action: fmt, init, validate, saved plan, apply
Persistent state
/var/lib/ac-cicd-infra/terraform-state/shared-k8s/terraform.tfstate
No new workflow is required for this Terraform change. The current path filters cover the shared stack.
8. Review and commit the Terraform change
git status
git diff --check
git diff --stat
git diff -- \
terraform/stacks/shared-k8s/main.tf \
terraform/stacks/shared-k8s/outputs.tf
Confirm:
- The load-balancer and three control-plane definitions created by the preceding pages remain unchanged.
- Exactly four VM additions are present.
- VM IDs are 3156213 through 3156216.
- MAC addresses are aa:bb:cc:06:0f:01 through aa:bb:cc:06:0f:04.
- Each worker uses 4 cores, 16384 MB RAM, and a 250G scsi0 disk on local-lvm.
- No Terraform state, Proxmox token, kubeconfig, join command, or private SSH key is staged.
Commit and push:
git add \
terraform/stacks/shared-k8s/main.tf \
terraform/stacks/shared-k8s/outputs.tf
git commit -m "Provision shared Kubernetes development workers"
git push -u origin feature/provision-k8s-dev-workers
9. Create the Terraform pull request into dev
gh pr create \
--base dev \
--head feature/provision-k8s-dev-workers \
--title "Provision shared Kubernetes development workers" \
--body "Adds the four approved development worker VMs to the shared Kubernetes Terraform stack."
After merge, the dev plan must end with:
Plan: 4 to add, 0 to change, 0 to destroy.
Do not promote to prod if Terraform proposes any update, replacement, or deletion of the load balancer or control planes, or anything other than the four approved development workers.
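The plan-summary rule above can be checked mechanically before opening the promotion pull request. The following is a minimal sketch, assuming the plan text is available without ANSI colour codes (for example from terraform plan -no-color). The function name plan_is_approved is hypothetical and is not part of the existing workflows.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical gate: succeed only when the plan text on stdin contains the
# exact approved summary line. The whole input is read (no early exit), so
# the producer on the left side of a pipe is never cut off.
plan_is_approved() {
  grep -Fx 'Plan: 4 to add, 0 to change, 0 to destroy.' >/dev/null
}

# Example with sample text; real use would pipe captured plan output instead.
if printf 'Plan: 4 to add, 0 to change, 0 to destroy.\n' | plan_is_approved; then
  echo "plan summary matches the approved change"
fi
```

A matching summary line is necessary but not sufficient: the four additions must still be the four approved development workers, so the resource list in the plan needs a manual read as well.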
10. Promote the Terraform change from dev to prod
gh pr create \
--base prod \
--head dev \
--title "Provision shared Kubernetes development workers" \
--body "Promotes the validated four-development-worker Terraform plan to prod."
After merge and shared-k8s environment approval, the production workflow applies the saved plan and writes the new worker resources to the existing persistent Terraform state.
11. Verify the worker VMs in Proxmox
Run on the Proxmox host:
qm status 3156213
qm config 3156213
qm status 3156214
qm config 3156214
qm status 3156215
qm config 3156215
qm status 3156216
qm config 3156216
For every VM, confirm:
- Status is running.
- CPU is 4 cores.
- RAM is 16384 MB.
- scsi0 is on local-lvm with size 250G.
- The expected MAC address is present.
- onboot remains enabled.
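The checklist above can be applied to captured qm config output instead of being read by eye. The following is a minimal sketch, assuming the output uses "key: value" lines such as "cores: 4" and "scsi0: local-lvm:...,size=250G". That layout and the function name check_qm_config are assumptions, so compare them with the real output on the Proxmox host before relying on the result.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical checker: reads `qm config <vmid>` text on stdin and compares
# it with the approved values. The first argument is the expected MAC.
check_qm_config() {
  local expected_mac="$1" config
  config="$(cat)"
  grep -Fxq 'cores: 4' <<<"${config}" || { echo "cores mismatch"; return 1; }
  grep -Fxq 'memory: 16384' <<<"${config}" || { echo "memory mismatch"; return 1; }
  grep -Eq '^scsi0: local-lvm:.*size=250G' <<<"${config}" || { echo "disk mismatch"; return 1; }
  # MAC comparison is case-insensitive because the case shown may differ.
  grep -Eiq "^net0: .*=${expected_mac}(,|$)" <<<"${config}" || { echo "MAC mismatch"; return 1; }
  grep -Fxq 'onboot: 1' <<<"${config}" || { echo "onboot mismatch"; return 1; }
  echo "ok"
}

# Intended use on the Proxmox host (not executed here):
#   qm config 3156213 | check_qm_config aa:bb:cc:06:0f:01
```

The running status still comes from qm status, which this sketch does not parse.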
12. Verify DHCP, SSH, and sudo
Run from prod-terraform-deploy-02:
for ip in 213 214 215 216; do
echo "=== 192.168.8.${ip} ==="
ping -c 2 -W 2 "192.168.8.${ip}"
ssh \
-i ~/.ssh/id_ed25519_ansible \
-o IdentitiesOnly=yes \
-o BatchMode=yes \
-o ConnectTimeout=10 \
"acllc@192.168.8.${ip}" \
'hostnamectl --static; ip -brief address; sudo -n whoami'
done
Expected before Ansible:
- .213 through .216 respond.
- The Ansible automation key authenticates as acllc.
- sudo -n whoami returns root.
- The Ubuntu hostname may still show the base-template hostname.
Stop here until all four workers pass SSH and passwordless sudo checks.
Part B — Configure and Join the Development Workers with Ansible
13. Ansible files changed
ansible/inventories/shared-k8s/hosts.ini
ansible/inventories/shared-k8s/group_vars/control_planes.yml
ansible/inventories/shared-k8s/group_vars/dev_workers.yml
ansible/roles/kubernetes-common/tasks/main.yml
ansible/roles/kubernetes-worker/defaults/main.yml
ansible/roles/kubernetes-worker/tasks/main.yml
ansible/playbooks/shared-k8s/03-prepare-kubernetes-nodes.yml
ansible/playbooks/shared-k8s/07-join-dev-workers.yml
ansible/playbooks/shared-k8s/08-label-and-taint-workers.yml
.github/workflows/ansible-configure-dev-workers.yml
The common, containerd, and Kubernetes roles carried forward from the control-plane page retain these required fixes:
- Forced APT cache refresh before package installation.
- Package-install retries.
- ansible_facts access instead of deprecated injected fact variables.
- /var/tmp/ansible-acllc as the remote temporary directory.
- Containerd 2.x-compatible CRI plugin validation.
- Correct task-level indentation for register, retries, delay, and until.
- Explicit crictl runtime and image endpoints.
14. Create the Ansible feature branch from dev
Create this branch only after the production Terraform apply has completed successfully:
cd D:\code\ASPIRECLAN-LLC-Org\ac-cicd-infra
git switch dev
git pull --ff-only origin dev
git switch -c feature/configure-k8s-dev-workers
15. Replace the shared inventory
Replace ansible/inventories/shared-k8s/hosts.ini with:
[load_balancers]
cicd-ac-k8s-lb-01 ansible_host=192.168.8.201 ansible_user=acllc node_primary_ip=192.168.8.201 node_interface=ens18
[first_control_plane]
cicd-ac-k8s-cp-01 ansible_host=192.168.8.202 ansible_user=acllc node_primary_ip=192.168.8.202 node_interface=ens18
[additional_control_planes]
cicd-ac-k8s-cp-02 ansible_host=192.168.8.203 ansible_user=acllc node_primary_ip=192.168.8.203 node_interface=ens18
cicd-ac-k8s-cp-03 ansible_host=192.168.8.204 ansible_user=acllc node_primary_ip=192.168.8.204 node_interface=ens18
[control_planes:children]
first_control_plane
additional_control_planes
[dev_workers]
cicd-ac-k8s-dev-wk-01 ansible_host=192.168.8.213 ansible_user=acllc node_primary_ip=192.168.8.213 node_interface=ens18
cicd-ac-k8s-dev-wk-02 ansible_host=192.168.8.214 ansible_user=acllc node_primary_ip=192.168.8.214 node_interface=ens18
cicd-ac-k8s-dev-wk-03 ansible_host=192.168.8.215 ansible_user=acllc node_primary_ip=192.168.8.215 node_interface=ens18
cicd-ac-k8s-dev-wk-04 ansible_host=192.168.8.216 ansible_user=acllc node_primary_ip=192.168.8.216 node_interface=ens18
[qa_workers]
[prod_workers]
[workers:children]
dev_workers
qa_workers
prod_workers
[k8s_cluster:children]
control_planes
workers
[all:vars]
ansible_python_interpreter=/usr/bin/python3
This allocation intentionally places development workers at .213-.216. QA and production groups remain empty until their own pages are completed.
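Before committing the inventory, it can help to confirm that the dev_workers group holds exactly four hosts and that each host's ansible_host matches its node_primary_ip. The following is a minimal sketch of such a check. The function name check_dev_workers is hypothetical, and the parser assumes the simple one-host-per-line INI layout shown above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical check: parse an INI inventory file and validate [dev_workers].
check_dev_workers() {
  awk '
    # Track whether the current section is exactly [dev_workers].
    /^\[/ { in_group = ($0 == "[dev_workers]"); next }
    in_group && NF {
      host = ""; primary = ""
      for (i = 2; i <= NF; i++) {
        split($i, kv, "=")
        if (kv[1] == "ansible_host") host = kv[2]
        if (kv[1] == "node_primary_ip") primary = kv[2]
      }
      if (host == "" || host != primary) { print $1 ": address mismatch"; bad = 1 }
      count++
    }
    END {
      if (count != 4) { print "expected 4 dev workers, found " (count + 0); bad = 1 }
      if (!bad) print "ok"
      exit (bad ? 1 : 0)
    }
  ' "$1"
}

# Intended use from the repository root (not executed here):
#   check_dev_workers ansible/inventories/shared-k8s/hosts.ini
```

The workflow's ansible-inventory --graph step remains the authoritative parse; this sketch only adds the count and address comparison.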
16. Generalize node firewall variables
Replace ansible/inventories/shared-k8s/group_vars/control_planes.yml with:
---
kubernetes_node_tcp_ports:
  - "6443"
  - "2379:2380"
  - "10250"
  - "10257"
  - "10259"
calico_node_tcp_ports:
  - "5473"
calico_node_udp_ports:
  - "4789"
Create ansible/inventories/shared-k8s/group_vars/dev_workers.yml:
---
worker_environment: dev
worker_workload: github-runner
kubernetes_node_tcp_ports:
  - "10250"
calico_node_tcp_ports:
  - "5473"
calico_node_udp_ports:
  - "4789"
worker_labels:
  environment: dev
  workload: github-runner
worker_taints:
  - key: environment
    value: dev
    effect: NoSchedule
UFW remains disabled unless a later hardening phase enables it. These rules are pre-created so later firewall activation does not break kubelet or Calico communication.
17. Update the generic Kubernetes firewall tasks
In ansible/roles/kubernetes-common/tasks/main.yml, replace the three existing control-plane-specific UFW tasks with:
- name: Allow approved Kubernetes node TCP ports through UFW
  ansible.builtin.command:
    cmd: >-
      ufw allow from 192.168.8.0/22 to any port {{ item }} proto tcp
  loop: "{{ kubernetes_node_tcp_ports | default([]) }}"
  register: kubernetes_ufw_tcp_rules
  changed_when: "'Rule added' in kubernetes_ufw_tcp_rules.stdout"
- name: Allow Calico node TCP ports through UFW
  ansible.builtin.command:
    cmd: >-
      ufw allow from 192.168.8.0/22 to any port {{ item }} proto tcp
  loop: "{{ calico_node_tcp_ports | default([]) }}"
  register: calico_ufw_tcp_rules
  changed_when: "'Rule added' in calico_ufw_tcp_rules.stdout"
- name: Allow Calico node UDP ports through UFW
  ansible.builtin.command:
    cmd: >-
      ufw allow from 192.168.8.0/22 to any port {{ item }} proto udp
  loop: "{{ calico_node_udp_ports | default([]) }}"
  register: calico_ufw_udp_rules
  changed_when: "'Rule added' in calico_ufw_udp_rules.stdout"
Keep every other task in the role unchanged, including the corrected containerd/CRI validation and explicit crictl endpoint verification.
18. Generalize the Kubernetes preparation playbook
Replace ansible/playbooks/shared-k8s/03-prepare-kubernetes-nodes.yml with:
---
- name: Prepare Kubernetes nodes
  hosts: k8s_cluster
  become: true
  gather_facts: true
  roles:
    - role: containerd
    - role: kubernetes-common
The worker workflow uses --limit dev_workers, so this playbook prepares only the four new workers during this phase.
19. Implement the Kubernetes worker role
19.1 Create or replace ansible/roles/kubernetes-worker/defaults/main.yml
---
worker_join_command: ""
19.2 Create or replace ansible/roles/kubernetes-worker/tasks/main.yml
---
- name: Confirm a generated worker join command is available
  ansible.builtin.assert:
    that:
      - worker_join_command | length > 0
    fail_msg: "The first control plane did not provide a worker join command."
  no_log: true
- name: Check whether this worker is already joined
  ansible.builtin.stat:
    path: /etc/kubernetes/kubelet.conf
  register: worker_kubelet_config
- name: Join this node as a Kubernetes worker
  ansible.builtin.command:
    cmd: >-
      {{ worker_join_command }}
      --node-name {{ inventory_hostname }}
      --cri-socket {{ kubernetes_cri_socket }}
    creates: /etc/kubernetes/kubelet.conf
  when: not worker_kubelet_config.stat.exists
  no_log: true
- name: Enable and start kubelet
  ansible.builtin.service:
    name: kubelet
    enabled: true
    state: started
- name: Wait for the kubelet configuration to exist
  ansible.builtin.wait_for:
    path: /etc/kubernetes/kubelet.conf
    timeout: 300
The creates: /etc/kubernetes/kubelet.conf guard makes normal reruns safe. A worker that has already joined is not joined again.
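The guard's behavior can be illustrated outside Ansible. The following is a minimal sketch of the same idea in plain shell: run a command only when a marker file is absent, and skip it on every later run. The function name join_if_needed is hypothetical, and touch stands in for the real join command so the example is safe to run anywhere.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for the guarded join task: the first argument is the
# marker path, the remaining arguments are the command to run when it is absent.
join_if_needed() {
  local marker="$1"
  shift
  if [ -e "${marker}" ]; then
    echo "skipped"
    return 0
  fi
  "$@"
  echo "joined"
}

# Demonstration in a temporary directory; touch plays the role of kubeadm join.
workdir="$(mktemp -d)"
join_if_needed "${workdir}/kubelet.conf" touch "${workdir}/kubelet.conf"   # first run: joined
join_if_needed "${workdir}/kubelet.conf" touch "${workdir}/kubelet.conf"   # rerun: skipped
rm -rf "${workdir}"
```

The guard keys on the file alone, so a worker whose join failed after kubelet.conf was written would also be skipped; the "A worker cannot join" checks in the failure-handling section cover that case.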
20. Add the worker-join playbook
Create or replace ansible/playbooks/shared-k8s/07-join-dev-workers.yml:
---
- name: Generate a fresh Kubernetes worker join command
  hosts: first_control_plane
  become: true
  gather_facts: false
  tasks:
    - name: Confirm the Kubernetes API is ready
      ansible.builtin.command:
        cmd: kubectl --kubeconfig=/etc/kubernetes/admin.conf get --raw=/readyz
      register: worker_join_api_ready
      changed_when: false
      retries: 12
      delay: 10
      until: worker_join_api_ready.stdout | trim == "ok"
    - name: Generate a fresh worker bootstrap-token join command
      ansible.builtin.command:
        cmd: kubeadm token create --ttl 2h --print-join-command
      register: generated_worker_join_command
      changed_when: true
      no_log: true
    - name: Store the temporary worker join command in memory
      ansible.builtin.set_fact:
        shared_worker_join_command: "{{ generated_worker_join_command.stdout }}"
      no_log: true
- name: Join the development workers one at a time
  hosts: dev_workers
  serial: 1
  become: true
  gather_facts: true
  vars:
    worker_join_command: >-
      {{ hostvars[groups['first_control_plane'][0]].shared_worker_join_command }}
  roles:
    - role: kubernetes-worker
- name: Verify all development workers are Ready
  hosts: first_control_plane
  become: true
  gather_facts: false
  environment:
    KUBECONFIG: /etc/kubernetes/admin.conf
  tasks:
    - name: Wait for each development worker to become Ready
      ansible.builtin.command:
        cmd: >-
          kubectl wait
          --for=condition=Ready
          node/{{ item }}
          --timeout=15m
      loop: "{{ groups['dev_workers'] }}"
      changed_when: false
    - name: Display the joined development workers
      ansible.builtin.command:
        cmd: kubectl get nodes -o wide
      register: joined_dev_workers
      changed_when: false
    - name: Print the development worker table
      ansible.builtin.debug:
        var: joined_dev_workers.stdout_lines
The join token exists only in Ansible memory, is hidden from logs, expires after two hours, and is never committed to Git.
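When a join has to be debugged by hand, the command's shape can be confirmed without printing the token. The following is a minimal sketch, assuming the usual single-line form "kubeadm join HOST:PORT --token ID.SECRET --discovery-token-ca-cert-hash sha256:HEX". That layout is an assumption about the kubeadm output, and the function name redact_join_command and the endpoint k8s-api.example.test are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: validate the shape of a join command and print it
# with the bootstrap token replaced by a placeholder.
redact_join_command() {
  local cmd="$1"
  # Assumed layout: 6-character token ID, a dot, 16-character token secret,
  # then a sha256 CA certificate hash of 64 hex characters.
  local pattern='^kubeadm join [^ ]+:[0-9]+ --token [a-z0-9]{6}\.[a-z0-9]{16} --discovery-token-ca-cert-hash sha256:[a-f0-9]{64} ?$'
  if [[ ! "${cmd}" =~ ${pattern} ]]; then
    echo "unexpected join command shape" >&2
    return 1
  fi
  sed -E 's/(--token )[^ ]+/\1<redacted>/' <<<"${cmd}"
}

# Example with an obviously fake token and an all-zero hash.
fake_hash="$(printf '%064d' 0)"
redact_join_command "kubeadm join k8s-api.example.test:6443 --token abcdef.0123456789abcdef --discovery-token-ca-cert-hash sha256:${fake_hash}"
```

The playbook itself never needs this because no_log already hides the command; the sketch is only for manual troubleshooting where the token must stay out of terminals and tickets.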
21. Add the development labels-and-taints playbook
Create or replace ansible/playbooks/shared-k8s/08-label-and-taint-workers.yml:
---
- name: Apply development worker labels and taints
  hosts: first_control_plane
  become: true
  gather_facts: false
  environment:
    KUBECONFIG: /etc/kubernetes/admin.conf
  tasks:
    - name: Apply the approved development worker labels
      ansible.builtin.command:
        cmd: >-
          kubectl label node {{ item }}
          environment={{ hostvars[item].worker_environment | default('dev') }}
          workload={{ hostvars[item].worker_workload | default('github-runner') }}
          --overwrite
      loop: "{{ groups['dev_workers'] }}"
      register: dev_worker_label_results
      changed_when: "'not labeled' not in dev_worker_label_results.stdout"
    - name: Apply the approved development worker taint
      ansible.builtin.command:
        cmd: >-
          kubectl taint node {{ item }}
          environment=dev:NoSchedule
          --overwrite
      loop: "{{ groups['dev_workers'] }}"
      register: dev_worker_taint_results
      changed_when: "'not tainted' not in dev_worker_taint_results.stdout"
    - name: Display development worker labels
      ansible.builtin.command:
        cmd: >-
          kubectl get nodes
          -l environment=dev,workload=github-runner
          -L environment,workload
          -o wide
      register: labeled_dev_workers
      changed_when: false
    - name: Print labeled development workers
      ansible.builtin.debug:
        var: labeled_dev_workers.stdout_lines
    - name: Verify the development taint on every worker
      ansible.builtin.shell:
        executable: /bin/bash
        cmd: |
          set -euo pipefail
          kubectl get node {{ item }} \
            -o jsonpath='{range .spec.taints[*]}{.key}={.value}:{.effect}{"\n"}{end}' |
            grep -Fx 'environment=dev:NoSchedule'
      loop: "{{ groups['dev_workers'] }}"
      changed_when: false
Every development worker receives:
environment=dev
workload=github-runner
environment=dev:NoSchedule
The future development ARC runner scale sets must use a matching node selector and toleration.
22. Add the development-worker GitHub Actions workflow
Create .github/workflows/ansible-configure-dev-workers.yml:
name: Ansible Configure - Kubernetes Development Workers
on:
  push:
    branches:
      - dev
      - prod
    paths:
      - "ansible/inventories/shared-k8s/group_vars/dev_workers.yml"
      - "ansible/playbooks/shared-k8s/07-join-dev-workers.yml"
      - "ansible/playbooks/shared-k8s/08-label-and-taint-workers.yml"
      - ".github/workflows/ansible-configure-dev-workers.yml"
  workflow_dispatch:
permissions:
  contents: read
concurrency:
  group: shared-k8s-ansible
  cancel-in-progress: false
env:
  ANSIBLE_CONFIG: ${{ github.workspace }}/ansible/ansible.cfg
jobs:
  validate:
    name: Validate development-worker Ansible configuration
    runs-on:
      - self-hosted
      - Linux
      - X64
      - prod
      - terraform
      - deploy
      - ac-cicd-infra
    steps:
      - name: Checkout repository
        uses: actions/checkout@v5
      - name: Verify Ansible
        shell: bash
        run: |
          set -euo pipefail
          ansible --version
          ansible-playbook --version
      - name: Validate the shared inventory
        working-directory: ansible
        shell: bash
        run: |
          set -euo pipefail
          ansible-inventory -i inventories/shared-k8s/hosts.ini --graph
      - name: Syntax-check the development-worker playbooks
        working-directory: ansible
        shell: bash
        run: |
          set -euo pipefail
          for playbook in \
            playbooks/shared-k8s/01-common-baseline.yml \
            playbooks/shared-k8s/03-prepare-kubernetes-nodes.yml \
            playbooks/shared-k8s/07-join-dev-workers.yml \
            playbooks/shared-k8s/08-label-and-taint-workers.yml
          do
            ansible-playbook \
              -i inventories/shared-k8s/hosts.ini \
              "${playbook}" \
              --syntax-check
          done
  configure:
    name: Configure and join development workers
    needs:
      - validate
    if: >-
      (github.event_name == 'push' && github.ref_name == 'prod') ||
      (github.event_name == 'workflow_dispatch' && github.ref_name == 'prod')
    environment:
      name: shared-k8s
    runs-on:
      - self-hosted
      - Linux
      - X64
      - prod
      - terraform
      - deploy
      - ac-cicd-infra
    timeout-minutes: 180
    steps:
      - name: Checkout repository
        uses: actions/checkout@v5
      - name: Verify the production branch
        shell: bash
        run: |
          set -euo pipefail
          if [ "${GITHUB_REF_NAME}" != "prod" ]; then
            echo "ERROR: Development-worker configuration is permitted only from prod."
            exit 1
          fi
      - name: Prepare the existing Ansible SSH key
        shell: bash
        run: |
          set -euo pipefail
          KEY_PATH="${HOME}/.ssh/id_ed25519_ansible"
          if [ ! -f "${KEY_PATH}" ]; then
            echo "ERROR: Missing Ansible key: ${KEY_PATH}"
            exit 1
          fi
          chmod 600 "${KEY_PATH}"
          echo "ANSIBLE_PRIVATE_KEY_FILE=${KEY_PATH}" >> "${GITHUB_ENV}"
      - name: Refresh development-worker SSH host keys
        shell: bash
        run: |
          set -euo pipefail
          mkdir -p "${HOME}/.ssh"
          chmod 700 "${HOME}/.ssh"
          touch "${HOME}/.ssh/known_hosts"
          chmod 600 "${HOME}/.ssh/known_hosts"
          for ip in 192.168.8.213 192.168.8.214 192.168.8.215 192.168.8.216; do
            ssh-keygen -f "${HOME}/.ssh/known_hosts" -R "${ip}" || true
            captured=false
            for attempt in $(seq 1 60); do
              if ssh-keyscan -T 5 -H "${ip}" >> "${HOME}/.ssh/known_hosts" 2>/dev/null; then
                echo "SSH host key captured for ${ip}."
                captured=true
                break
              fi
              echo "Waiting for SSH on ${ip} (attempt ${attempt}/60)..."
              sleep 10
            done
            if [ "${captured}" != "true" ]; then
              echo "ERROR: Unable to capture SSH host key for ${ip}."
              exit 1
            fi
          done
      - name: Prepare the Ansible remote temporary directory
        shell: bash
        run: |
          set -euo pipefail
          for ip in 192.168.8.213 192.168.8.214 192.168.8.215 192.168.8.216; do
            ssh \
              -i "${ANSIBLE_PRIVATE_KEY_FILE}" \
              -o IdentitiesOnly=yes \
              -o BatchMode=yes \
              "acllc@${ip}" \
              'sudo install -d -m 0700 -o acllc -g acllc /var/tmp/ansible-acllc'
          done
      - name: Verify Ansible connectivity
        working-directory: ansible
        shell: bash
        run: |
          set -euo pipefail
          ansible \
            -i inventories/shared-k8s/hosts.ini \
            dev_workers \
            --private-key "${ANSIBLE_PRIVATE_KEY_FILE}" \
            -m ping
      - name: Confirm the existing Kubernetes API is healthy
        working-directory: ansible
        shell: bash
        run: |
          set -euo pipefail
          ansible \
            -i inventories/shared-k8s/hosts.ini \
            first_control_plane \
            --private-key "${ANSIBLE_PRIVATE_KEY_FILE}" \
            -b \
            -m command \
            -a 'kubectl --kubeconfig=/etc/kubernetes/admin.conf get --raw=/readyz'
      - name: Apply the common Ubuntu baseline
        working-directory: ansible
        shell: bash
        run: |
          set -euo pipefail
          ansible-playbook \
            -i inventories/shared-k8s/hosts.ini \
            --private-key "${ANSIBLE_PRIVATE_KEY_FILE}" \
            --limit dev_workers \
            playbooks/shared-k8s/01-common-baseline.yml
      - name: Prepare containerd and Kubernetes prerequisites
        working-directory: ansible
        shell: bash
        run: |
          set -euo pipefail
          ansible-playbook \
            -i inventories/shared-k8s/hosts.ini \
            --private-key "${ANSIBLE_PRIVATE_KEY_FILE}" \
            --limit dev_workers \
            playbooks/shared-k8s/03-prepare-kubernetes-nodes.yml
      - name: Join the development workers
        working-directory: ansible
        shell: bash
        run: |
          set -euo pipefail
          ansible-playbook \
            -i inventories/shared-k8s/hosts.ini \
            --private-key "${ANSIBLE_PRIVATE_KEY_FILE}" \
            playbooks/shared-k8s/07-join-dev-workers.yml
      - name: Apply development labels and taints
        working-directory: ansible
        shell: bash
        run: |
          set -euo pipefail
          ansible-playbook \
            -i inventories/shared-k8s/hosts.ini \
            --private-key "${ANSIBLE_PRIVATE_KEY_FILE}" \
            playbooks/shared-k8s/08-label-and-taint-workers.yml
      - name: Verify the completed development workers
        working-directory: ansible
        shell: bash
        run: |
          set -euo pipefail
          ansible \
            -i inventories/shared-k8s/hosts.ini \
            first_control_plane \
            --private-key "${ANSIBLE_PRIVATE_KEY_FILE}" \
            -b \
            -m shell \
            -a '
              set -e
              export KUBECONFIG=/etc/kubernetes/admin.conf
              kubectl get nodes -l environment=dev,workload=github-runner -L environment,workload -o wide
              kubectl get pods -A
              kubectl get --raw=/readyz
            '
The path filter and syntax-check list intentionally reference 07-join-dev-workers.yml. Keep that filename consistent across the playbook, workflow paths, syntax checks, and execution step.
Workflow behavior:
| Event | Result |
|---|---|
| Push to dev | Inventory and playbook syntax validation only |
| Push to prod | Validation, baseline, Kubernetes preparation, join, labels, taints, and verification |
| Manual dispatch from prod | Idempotent development-worker reconciliation |
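The table can be restated as a small decision function that mirrors the configure job's if condition. The following is a minimal sketch for reasoning about which jobs run for a given event and branch. The function name jobs_for_event is hypothetical, and the sketch ignores the push path filters, so it assumes the push touched a watched file.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical model of the workflow gating: prints the jobs that would run.
jobs_for_event() {
  local event="$1" ref="$2"
  case "${event}:${ref}" in
    # configure runs only for prod, whether pushed or manually dispatched.
    push:prod | workflow_dispatch:prod) echo "validate configure" ;;
    # dev pushes and manual dispatches from other branches validate only.
    push:dev | workflow_dispatch:*) echo "validate" ;;
    # Pushes to any other branch do not match the branch filter.
    *) echo "none" ;;
  esac
}

jobs_for_event push dev                 # validate
jobs_for_event push prod                # validate configure
jobs_for_event workflow_dispatch prod   # validate configure
```

A manual dispatch from dev is not listed in the table; under this model it runs validation only, because the configure job's condition requires prod.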
23. Review and commit the Ansible change
git status
git diff --check
git diff --stat
git diff -- \
ansible \
.github/workflows/ansible-configure-dev-workers.yml
Confirm:
- No private key, bootstrap token, kubeconfig, join command, or Terraform state is staged.
- All four development worker IPs are .213-.216.
- Every worker uses environment=dev and workload=github-runner.
- The taint is exactly environment=dev:NoSchedule.
- The workflow performs configuration only from prod.
- The existing control-plane inventory and variables remain present.
Commit and push:
git add \
ansible \
.github/workflows/ansible-configure-dev-workers.yml
git commit -m "Configure shared Kubernetes development workers"
git push -u origin feature/configure-k8s-dev-workers
24. Create the Ansible pull request into dev
gh pr create \
--base dev \
--head feature/configure-k8s-dev-workers \
--title "Configure shared Kubernetes development workers" \
--body "Adds the Ubuntu baseline, Kubernetes prerequisites, worker joins, and approved development labels and taints."
Merge only after inventory validation and all four playbook syntax checks succeed.
25. Promote the Ansible change from dev to prod
gh pr create \
--base prod \
--head dev \
--title "Configure shared Kubernetes development workers" \
--body "Promotes the validated four-development-worker configuration to prod."
After merge and environment approval, the workflow runs in this order:
- Verify the existing Kubernetes API is healthy.
- Apply the common Ubuntu baseline to the four development workers.
- Configure containerd and Kubernetes prerequisites.
- Generate a temporary worker join command on cicd-ac-k8s-cp-01.
- Join the four workers serially.
- Wait for every development worker to become Ready.
- Apply the approved labels and taint.
- Verify the development worker set through the Kubernetes API.
26. Manual verification
Run from prod-terraform-deploy-02:
ssh \
  -i ~/.ssh/id_ed25519_ansible \
  -o IdentitiesOnly=yes \
  acllc@192.168.8.202 \
  'sudo bash -s' <<'EOF'
# The quoted heredoc is passed unchanged to the remote root shell, so
# ${node} and the jsonpath quotes are evaluated there, not locally.
export KUBECONFIG=/etc/kubernetes/admin.conf
echo "=== DEVELOPMENT WORKERS ==="
kubectl get nodes \
  -l environment=dev,workload=github-runner \
  -L environment,workload \
  -o wide
echo "=== TAINTS ==="
for node in \
  cicd-ac-k8s-dev-wk-01 \
  cicd-ac-k8s-dev-wk-02 \
  cicd-ac-k8s-dev-wk-03 \
  cicd-ac-k8s-dev-wk-04
do
  echo "--- ${node} ---"
  kubectl get node "${node}" \
    -o jsonpath='{range .spec.taints[*]}{.key}={.value}:{.effect}{"\n"}{end}'
done
echo "=== API READINESS ==="
kubectl get --raw=/readyz
EOF
Every worker row must be Ready, show environment=dev and workload=github-runner, and carry the environment=dev:NoSchedule taint.
27. Expected final state
Development worker VMs:
cicd-ac-k8s-dev-wk-01 192.168.8.213 Ready
cicd-ac-k8s-dev-wk-02 192.168.8.214 Ready
cicd-ac-k8s-dev-wk-03 192.168.8.215 Ready
cicd-ac-k8s-dev-wk-04 192.168.8.216 Ready
Kubernetes labels on every development worker:
environment=dev
workload=github-runner
Kubernetes taint on every development worker:
environment=dev:NoSchedule
Cluster state:
3 control planes Ready
4 development workers Ready
Kubernetes API readyz: ok
Still pending:
4 QA workers
4 production workers
shared cluster services
ARC controller
tenant runner scale sets
28. Failure handling
Terraform proposes changes to existing infrastructure
Stop. Do not apply. The plan must be exactly four additions and no changes or deletions.
A worker receives the wrong DHCP address
Check its Proxmox MAC and router reservation. Do not configure a static Netplan address.
APT reports a package is unavailable
The roles already force an APT refresh. Check DNS, internet access, Ubuntu repository files, and pkgs.k8s.io. Do not rebuild the template merely to preload packages.
Containerd CRI validation fails
Run on the affected worker:
sudo systemctl status containerd --no-pager
sudo ctr plugins ls
sudo grep -nE 'disabled_plugins|SystemdCgroup' /etc/containerd/config.toml
sudo crictl --runtime-endpoint unix:///run/containerd/containerd.sock info
The containerd role must continue using the corrected type-and-ID column checks for both legacy and containerd 2.x plugin layouts.
A worker cannot join
Inspect:
sudo journalctl -u kubelet -n 200 --no-pager
sudo crictl ps -a
sudo test -f /etc/kubernetes/kubelet.conf && echo joined || echo not-joined
Rerun the approved worker-join workflow. It generates a fresh token automatically. Do not paste join commands into Git.
A worker remains NotReady
From cicd-ac-k8s-cp-01, inspect:
sudo kubectl --kubeconfig=/etc/kubernetes/admin.conf get nodes -o wide
sudo kubectl --kubeconfig=/etc/kubernetes/admin.conf get pods -A -o wide
sudo kubectl --kubeconfig=/etc/kubernetes/admin.conf describe node <WORKER_NAME>
Also verify Calico and kubelet logs on the affected worker.
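To narrow the node table down to the workers that need attention, the output can be filtered on its STATUS column. The following is a minimal sketch, assuming the default kubectl get nodes layout with NAME in the first column and STATUS in the second. The function name list_not_ready is hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: reads a `kubectl get nodes` table on stdin, skips the
# header row, and prints the name and status of every node that is not
# exactly "Ready" (this also catches values such as "Ready,SchedulingDisabled").
list_not_ready() {
  awk 'NR > 1 && $2 != "Ready" { print $1 " " $2 }'
}

# Intended use on cicd-ac-k8s-cp-01 (not executed here):
#   sudo kubectl --kubeconfig=/etc/kubernetes/admin.conf get nodes | list_not_ready
```

An empty result means every listed node reports Ready; any printed name is a candidate for the describe node command above.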
ARC pods are pending after the controller is installed later
The development runner values must include a matching selector and toleration:
nodeSelector:
  environment: dev
  workload: github-runner
tolerations:
  - key: environment
    operator: Equal
    value: dev
    effect: NoSchedule
29. Project status after successful completion
Use this as the expected acceptance checkpoint for a fresh rebuild. It does not assert that any previously existing Kubernetes VM or cluster is still present.
EXPECTED CHECKPOINT AFTER COMPLETING THIS PAGE
Infrastructure expected to exist
API load balancer and VIP available
Three control-plane nodes Ready
Four development worker VMs running at 192.168.8.213-.216
Development worker acceptance criteria
All four workers report Ready
environment=dev label present
workload=github-runner label present
environment=dev:NoSchedule taint present
Kubernetes API /readyz returns ok
Still implemented by later pages
Four QA workers at 192.168.8.209-.212
Four production workers at 192.168.8.205-.208
Shared ARC controller
Repository and environment runner scale sets
This is a rebuild acceptance checkpoint, not a statement about the current live environment.
After this rebuild checkpoint passes, continue to the QA-worker page to provision .209-.212 and apply environment=qa, workload=github-runner, and environment=qa:NoSchedule.