Provision and Bootstrap the Three Kubernetes Control Planes
This is the fourth infrastructure page. During a from-scratch rebuild, start it only after the preceding load-balancer page has provisioned and verified the shared API load balancer, HAProxy, Keepalived, and the API VIP. This page then provisions the three control-plane VMs and bootstraps them into a highly available Kubernetes control plane.
Load-balancer VM provisioning: Required prerequisite
HAProxy configuration: Required prerequisite
Keepalived configuration: Required prerequisite
API VIP 192.168.8.200: Required prerequisite
Control-plane Terraform definitions: Part A of this page
Control-plane VM provisioning: Part A expected result
Control-plane Ubuntu baseline: Part B of this page
Kubernetes prerequisites: Part B of this page
First control-plane bootstrap: Part B of this page
CNI installation: Part B of this page
Additional control-plane joins: Part B of this page
Development workers: Next page
QA workers: Later page
Production workers: Later page
ARC controller: Later page
Tenant runner scale sets: Later page
1. Scope and execution order
This page is performed in two separate promotions:
- Terraform promotion: create only the three control-plane VMs.
- Ansible promotion: after all three VMs are reachable, configure Ubuntu, containerd, Kubernetes, the first control plane, Calico, and the remaining control planes.
Do not combine the Terraform and Ansible changes into the same production promotion. The Ansible workflow must not start until Terraform has created all three VMs.
The approved branch model is:
feature/*
↓ pull request
dev
↓ Terraform validation and plan only
dev → prod pull request
↓ review
prod
↓ Terraform apply or Ansible configuration
Not used by this infrastructure execution flow: local, qa, main
dev performs validation and Terraform plan only. prod performs Terraform apply or Ansible configuration.
2. Approved control-plane VM allocation
cicd-ac-k8s-cp-01
VM ID: 3156202
MAC: aa:bb:cc:05:0e:01
Reserved IP: 192.168.8.202
CPU: 4 vCPU
RAM: 8192 MB
Disk: scsi0, 100G, local-lvm
cicd-ac-k8s-cp-02
VM ID: 3156203
MAC: aa:bb:cc:05:0e:02
Reserved IP: 192.168.8.203
CPU: 4 vCPU
RAM: 8192 MB
Disk: scsi0, 100G, local-lvm
cicd-ac-k8s-cp-03
VM ID: 3156204
MAC: aa:bb:cc:05:0e:03
Reserved IP: 192.168.8.204
CPU: 4 vCPU
RAM: 8192 MB
Disk: scsi0, 100G, local-lvm
Shared values
Template: tmplt-ub-26-min-base / VM ID 90000
Node: pve
Bridge: vmbr0
API VIP: 192.168.8.200
API endpoint: cicd-ac-k8s-api.aspireclan.com:443
Confirm that the router already contains these DHCP reservations and that no other device is using .202, .203, or .204.
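The allocation can be sanity-checked before it is copied into Terraform. The following POSIX shell sketch is illustrative only: the function name check_allocation and its four-column input (name, VM ID, MAC, IP) are assumptions, not part of the repository. It applies the same six-byte MAC pattern as the variables.tf validation in section 6.1 and rejects duplicate VM IDs, MAC addresses, or IP addresses. It cannot prove that no other device is using an address; the router's reservation list remains the authority for that.

```shell
#!/bin/sh
# Illustrative pre-flight check for the control-plane allocation table.
# Input on stdin, one VM per line: <name> <vmid> <mac> <ip>
check_allocation() {
  input=$(cat)

  # Same six-byte, colon-separated pattern as the variables.tf validation.
  bad_macs=$(printf '%s\n' "$input" | awk 'NF { print $3 }' |
    grep -Evx '([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}' || true)
  if [ -n "$bad_macs" ]; then
    echo "invalid MAC address: $bad_macs" >&2
    return 1
  fi

  # VM IDs (column 2), MACs (column 3) and IPs (column 4) must all be unique.
  for col in 2 3 4; do
    dups=$(printf '%s\n' "$input" | awk -v c="$col" 'NF { print tolower($c) }' | sort | uniq -d)
    if [ -n "$dups" ]; then
      echo "duplicate value in column $col: $dups" >&2
      return 1
    fi
  done
}
```

Pipe the three allocation lines from section 2 into check_allocation. A non-zero exit status and the message on stderr identify the column to correct.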
3. Approved Kubernetes and CNI versions
Kubernetes minor repository: v1.36
Kubernetes release: v1.36.1
Kubernetes DEB version: 1.36.1-1.1
kubeadm API: kubeadm.k8s.io/v1beta4
Container runtime: containerd
CRI socket: unix:///run/containerd/containerd.sock
CNI: Calico v3.32.0
Pod CIDR: 10.244.0.0/16
Service CIDR: 10.96.0.0/12
Service DNS domain: cluster.local
Calico encapsulation: VXLAN
Calico BGP: Disabled
The Pod CIDR deliberately uses 10.244.0.0/16. Do not use Calico's example default 192.168.0.0/16, because that range overlaps the Aspireclan home-lab network.
Changing the Pod CIDR or Service CIDR after the cluster is created is a disruptive redesign. Confirm these values before the first kubeadm init.
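The overlap rule can be checked mechanically before the first kubeadm init. The following POSIX shell sketch is illustrative: it assumes the home-lab LAN is 192.168.8.0/22 (the source range that the UFW rules later on this page allow), and the function names are hypothetical.

```shell
#!/bin/sh
# Minimal IPv4 CIDR overlap check; assumes well-formed a.b.c.d/prefix input.

# Convert a dotted-quad address to an integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1
  IFS=$old_ifs
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Print "first last" (as integers) for the addresses covered by a CIDR.
cidr_bounds() {
  addr=$(ip_to_int "${1%/*}")
  prefix=${1#*/}
  mask=$(( prefix == 0 ? 0 : (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  first=$(( addr & mask ))
  echo "$first $(( first | (~mask & 0xFFFFFFFF) ))"
}

# Succeed (exit 0) when the two CIDRs share at least one address.
cidrs_overlap() {
  set -- $(cidr_bounds "$1") $(cidr_bounds "$2")
  [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
}
```

For example, `cidrs_overlap 192.168.0.0/16 192.168.8.0/22` succeeds, which is why Calico's example Pod CIDR is rejected. The approved Pod CIDR 10.244.0.0/16 overlaps neither the LAN nor the Service CIDR 10.96.0.0/12.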
Part A — Provision the Control-Plane VMs with Terraform
4. Terraform files changed
terraform/modules/proxmox-vm-group/main.tf
terraform/modules/proxmox-vm-group/variables.tf
terraform/modules/proxmox-vm-group/outputs.tf
terraform/modules/proxmox-vm-group/versions.tf
terraform/stacks/shared-k8s/main.tf
terraform/stacks/shared-k8s/outputs.tf
The reusable proxmox-vm module from the load-balancer page remains unchanged. This page implements the group wrapper and adds three control-plane definitions to the intermediate shared stack. Later worker pages extend this same stack; their definitions are intentionally absent here.
5. Create the Terraform feature branch from dev
Run from Windows PowerShell:
cd D:\code\ASPIRECLAN-LLC-Org\ac-cicd-infra
git switch dev
git pull --ff-only origin dev
git switch -c feature/provision-k8s-control-planes
6. Implement terraform/modules/proxmox-vm-group
6.1 Replace variables.tf
variable "vms" {
  description = "Map of Proxmox VM definitions keyed by a stable logical name."
  type = map(object({
    name          = string
    description   = string
    vmid          = number
    target_node   = string
    template_name = string
    cores         = number
    memory_mb     = number
    disk_size     = string
    storage       = string
    bridge        = string
    mac_address   = string
    tags          = list(string)
  }))

  validation {
    condition = alltrue([
      for vm in values(var.vms) :
      can(regex("^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$", vm.mac_address))
    ])
    error_message = "Every VM must have a valid six-byte colon-separated MAC address."
  }
}
6.2 Replace main.tf
module "vm" {
  for_each = var.vms
  source   = "../proxmox-vm"

  name          = each.value.name
  description   = each.value.description
  vmid          = each.value.vmid
  target_node   = each.value.target_node
  template_name = each.value.template_name
  cores         = each.value.cores
  memory_mb     = each.value.memory_mb
  disk_size     = each.value.disk_size
  storage       = each.value.storage
  bridge        = each.value.bridge
  mac_address   = each.value.mac_address
  tags          = each.value.tags
}
6.3 Replace outputs.tf
output "vms" {
  description = "Created Proxmox VMs keyed by their logical map key."
  value = {
    for key, vm in module.vm : key => {
      name        = vm.name
      vmid        = vm.vmid
      target_node = vm.target_node
    }
  }
}
6.4 Replace versions.tf
terraform {
  required_version = ">= 1.15.5"

  required_providers {
    proxmox = {
      source  = "Telmate/proxmox"
      version = "3.0.1-rc9"
    }
  }
}
7. Extend the shared Kubernetes Terraform stack
7.1 Replace terraform/stacks/shared-k8s/main.tf
This is the complete intermediate file for this checkpoint. It preserves the load-balancer definition from the preceding page and adds exactly three control-plane VMs. Later worker pages replace this file with extended versions.
module "api_load_balancer" {
  source        = "../../modules/proxmox-vm"
  name          = "cicd-ac-k8s-lb-01"
  description   = "Aspireclan shared Kubernetes API load balancer"
  vmid          = 3156201
  target_node   = "pve"
  template_name = "tmplt-ub-26-min-base"
  cores         = 2
  memory_mb     = 4096
  disk_size     = "40G"
  storage       = "local-lvm"
  bridge        = "vmbr0"
  mac_address   = "aa:bb:cc:05:14:01"
  tags = [
    "ac-cicd",
    "shared-k8s",
    "load-balancer",
    "terraform",
    "ansible",
  ]
}

locals {
  control_planes = {
    cp01 = {
      name          = "cicd-ac-k8s-cp-01"
      description   = "Aspireclan shared Kubernetes control plane 01"
      vmid          = 3156202
      target_node   = "pve"
      template_name = "tmplt-ub-26-min-base"
      cores         = 4
      memory_mb     = 8192
      disk_size     = "100G"
      storage       = "local-lvm"
      bridge        = "vmbr0"
      mac_address   = "aa:bb:cc:05:0e:01"
      tags = [
        "ac-cicd",
        "shared-k8s",
        "control-plane",
        "terraform",
        "ansible",
      ]
    }
    cp02 = {
      name          = "cicd-ac-k8s-cp-02"
      description   = "Aspireclan shared Kubernetes control plane 02"
      vmid          = 3156203
      target_node   = "pve"
      template_name = "tmplt-ub-26-min-base"
      cores         = 4
      memory_mb     = 8192
      disk_size     = "100G"
      storage       = "local-lvm"
      bridge        = "vmbr0"
      mac_address   = "aa:bb:cc:05:0e:02"
      tags = [
        "ac-cicd",
        "shared-k8s",
        "control-plane",
        "terraform",
        "ansible",
      ]
    }
    cp03 = {
      name          = "cicd-ac-k8s-cp-03"
      description   = "Aspireclan shared Kubernetes control plane 03"
      vmid          = 3156204
      target_node   = "pve"
      template_name = "tmplt-ub-26-min-base"
      cores         = 4
      memory_mb     = 8192
      disk_size     = "100G"
      storage       = "local-lvm"
      bridge        = "vmbr0"
      mac_address   = "aa:bb:cc:05:0e:03"
      tags = [
        "ac-cicd",
        "shared-k8s",
        "control-plane",
        "terraform",
        "ansible",
      ]
    }
  }
}

module "control_planes" {
  source = "../../modules/proxmox-vm-group"
  vms    = local.control_planes
}
7.2 Replace terraform/stacks/shared-k8s/outputs.tf
output "api_load_balancer" {
  description = "Kubernetes API load-balancer VM."
  value = {
    name        = module.api_load_balancer.name
    vmid        = module.api_load_balancer.vmid
    target_node = module.api_load_balancer.target_node
    reserved_ip = "192.168.8.201"
    api_vip     = "192.168.8.200"
  }
}

output "control_planes" {
  description = "Shared Kubernetes control-plane VMs."
  value = {
    cp01 = merge(module.control_planes.vms["cp01"], {
      reserved_ip = "192.168.8.202"
      mac_address = "aa:bb:cc:05:0e:01"
    })
    cp02 = merge(module.control_planes.vms["cp02"], {
      reserved_ip = "192.168.8.203"
      mac_address = "aa:bb:cc:05:0e:02"
    })
    cp03 = merge(module.control_planes.vms["cp03"], {
      reserved_ip = "192.168.8.204"
      mac_address = "aa:bb:cc:05:0e:03"
    })
  }
}
8. Confirm the existing Terraform workflow contract
No branch expansion is required. The cleaned repository workflows use push-based validation and apply behavior and must continue to implement:
Terraform plan workflow
push branch: dev
manual dispatch: supported
action: fmt, init, validate, plan
Terraform apply workflow
push branch: prod
manual dispatch: supported only from prod
action: fmt, init, validate, saved plan, apply
Persistent state
/var/lib/ac-cicd-infra/terraform-state/shared-k8s/terraform.tfstate
The plan workflow must use actions/checkout@v5 and run from dev. The apply workflow must run only from prod and must use the same persistent local state file introduced by the load-balancer page.
9. Review and commit the Terraform change
git status
git diff --check
git diff --stat
git diff -- `
terraform/modules/proxmox-vm-group `
terraform/stacks/shared-k8s/main.tf `
terraform/stacks/shared-k8s/outputs.tf
Confirm all of the following:
- The load-balancer module remains present and unchanged.
- Exactly three new control-plane VM definitions are added.
- VM IDs are 3156202, 3156203, and 3156204.
- MAC addresses end in 0e:01, 0e:02, and 0e:03.
- Every control-plane disk is scsi0, 100G, on local-lvm.
- No .tfstate file, secret, token, or private key is staged.
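The staging check in the last item can be scripted. The following POSIX shell sketch can be run from Git Bash or any POSIX shell, since the commands above use PowerShell. The function name and the filename patterns are assumptions, and the sketch will not catch secrets stored under ordinary filenames.

```shell
#!/bin/sh
# Illustrative guard: reads staged paths on stdin and fails if any look like
# Terraform state, a local .terraform directory, or private-key material.
# The patterns are assumptions; extend them for the repository's own conventions.
check_staged_paths() {
  offenders=$(grep -E '(\.tfstate(\.backup)?$|(^|/)\.terraform/|(^|/)id_(rsa|ecdsa|ed25519)$|\.pem$|\.key$)' || true)
  if [ -n "$offenders" ]; then
    printf 'refusing to commit:\n%s\n' "$offenders" >&2
    return 1
  fi
}
```

Run `git diff --cached --name-only | check_staged_paths` after `git add` and before `git commit`.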
Commit and push:
git add `
terraform/modules/proxmox-vm-group `
terraform/stacks/shared-k8s/main.tf `
terraform/stacks/shared-k8s/outputs.tf
git commit -m "Provision shared Kubernetes control planes"
git push -u origin feature/provision-k8s-control-planes
10. Create the Terraform pull request into dev
gh pr create `
--base dev `
--head feature/provision-k8s-control-planes `
--title "Provision shared Kubernetes control planes" `
--body "Adds the three approved control-plane VMs to the shared Kubernetes Terraform stack."
After merge, the dev plan must end with:
Plan: 3 to add, 0 to change, 0 to destroy.
Do not promote to prod if the plan proposes any update, replacement, or deletion of cicd-ac-k8s-lb-01, or if it proposes anything other than the three approved control-plane VMs.
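The promotion gate above can be applied mechanically to a saved plan log. This sketch rests on several assumptions. The log must be produced with -no-color, and the function name is hypothetical. The resource type and name inside the proxmox-vm module are not shown on this page, so the pattern anchors only on the module.control_planes.module.vm["cp0N"] address that the group wrapper from section 6 produces.

```shell
#!/bin/sh
# Sketch: gate the dev -> prod promotion on a saved, uncoloured plan log
# (for example the output of `terraform plan -no-color`).
verify_control_plane_plan() {
  plan_log=$1

  # 1. The summary line must match exactly.
  if ! grep -Eq '^Plan: 3 to add, 0 to change, 0 to destroy\.$' "$plan_log"; then
    echo "unexpected plan summary" >&2
    return 1
  fi

  # 2. Nothing may be updated, replaced, or destroyed (this protects cicd-ac-k8s-lb-01).
  if grep -Eq 'will be (updated in-place|destroyed)|must be replaced' "$plan_log"; then
    echo "plan modifies or removes existing resources" >&2
    return 1
  fi

  # 3. Every created resource must belong to one of the three approved control planes.
  created=$(grep -Ec 'will be created$' "$plan_log")
  approved=$(grep -Ec '# module\.control_planes\.module\.vm\["cp0[123]"\]\..* will be created$' "$plan_log")
  [ "$created" -eq 3 ] && [ "$approved" -eq 3 ]
}
```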
11. Promote the Terraform change from dev to prod
gh pr create `
--base prod `
--head dev `
--title "Provision shared Kubernetes control planes" `
--body "Promotes the validated three-control-plane Terraform plan to prod."
After merge and environment approval, the production apply should create the three VMs and update the persistent state file.
12. Verify the VMs in Proxmox
Run on the Proxmox host:
qm status 3156202
qm config 3156202
qm status 3156203
qm config 3156203
qm status 3156204
qm config 3156204
Confirm that each VM is running and has the approved VM ID, four CPU cores, 8192 MB RAM, a 100G scsi0 disk on local-lvm, and the correct MAC address.
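These visual checks can also be scripted. The following sketch relies only on the `key: value` lines that `qm config` prints for cores, memory, net0, and scsi0. The function name is hypothetical. The run state still comes from `qm status <vmid>`, which should print `status: running`.

```shell
#!/bin/sh
# Illustrative check of `qm config <vmid>` output against the approved sizing.
# Usage on the Proxmox host: qm config 3156202 | check_cp_vm_config aa:bb:cc:05:0e:01
check_cp_vm_config() {
  expected_mac=$1
  config=$(cat)
  fail=0
  printf '%s\n' "$config" | grep -qx 'cores: 4' ||
    { echo "cores is not 4" >&2; fail=1; }
  printf '%s\n' "$config" | grep -qx 'memory: 8192' ||
    { echo "memory is not 8192 MB" >&2; fail=1; }
  printf '%s\n' "$config" | grep -q '^scsi0: local-lvm:' ||
    { echo "scsi0 is not on local-lvm" >&2; fail=1; }
  printf '%s\n' "$config" | grep -Eq '^scsi0: .*,size=100G(,|$)' ||
    { echo "scsi0 is not 100G" >&2; fail=1; }
  # Proxmox normally prints MAC addresses in upper case, so match case-insensitively.
  printf '%s\n' "$config" | grep -Eiq "^net0: [a-z0-9]+=${expected_mac}(,|$)" ||
    { echo "net0 MAC is not ${expected_mac}" >&2; fail=1; }
  printf '%s\n' "$config" | grep -Eq '^net0: .*,bridge=vmbr0(,|$)' ||
    { echo "net0 is not on vmbr0" >&2; fail=1; }
  return "$fail"
}
```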
13. Verify DHCP and SSH
Run from prod-terraform-deploy-02:
for ip in 202 203 204; do
  echo "=== 192.168.8.${ip} ==="
  ping -c 2 -W 2 "192.168.8.${ip}"
  ssh -i ~/.ssh/id_ed25519_ansible -o IdentitiesOnly=yes -o BatchMode=yes -o ConnectTimeout=10 "acllc@192.168.8.${ip}" 'hostnamectl --static; ip -brief address; sudo -n whoami'
done
Expected before Ansible:
- All three IP addresses respond.
- The Ansible key authenticates as acllc.
- Passwordless sudo returns root.
- The Ubuntu hostname may still be the template hostname.
Stop here until all three nodes pass SSH and sudo checks.
Part B — Bootstrap the Highly Available Control Plane with Ansible
14. Ansible and Kubernetes files changed
ansible/ansible.cfg
ansible/inventories/shared-k8s/hosts.ini
ansible/inventories/shared-k8s/group_vars/all.yml
ansible/inventories/shared-k8s/group_vars/control_planes.yml
ansible/roles/common/tasks/main.yml
ansible/roles/haproxy/tasks/main.yml
ansible/roles/containerd/tasks/main.yml
ansible/roles/containerd/handlers/main.yml
ansible/roles/kubernetes-common/tasks/main.yml
ansible/roles/kubernetes-common/handlers/main.yml
ansible/roles/kubernetes-control-plane/defaults/main.yml
ansible/roles/kubernetes-control-plane/tasks/main.yml
ansible/roles/kubernetes-control-plane/tasks/bootstrap.yml
ansible/roles/kubernetes-control-plane/tasks/join.yml
ansible/roles/kubernetes-control-plane/templates/kubeadm-init-config.yaml.j2
ansible/playbooks/shared-k8s/01-common-baseline.yml
ansible/playbooks/shared-k8s/02-configure-load-balancer.yml
ansible/playbooks/shared-k8s/03-prepare-kubernetes-nodes.yml
ansible/playbooks/shared-k8s/04-bootstrap-first-control-plane.yml
ansible/playbooks/shared-k8s/05-join-control-planes.yml
ansible/playbooks/shared-k8s/06-install-cni.yml
kubernetes/common/cni/calico-custom-resources.yaml
.github/workflows/ansible-configure-control-planes.yml
This phase uses stacked etcd: each control-plane VM runs its own API server and etcd member. Join commands, bootstrap tokens, and the certificate key are generated only in memory during the workflow and are hidden from logs.
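The join playbook itself is not reproduced here, so the following helper is hypothetical. It shows one way a control-plane join command can be assembled in memory from two kubeadm outputs: the worker join command printed by `kubeadm token create --print-join-command` and the certificate key printed by `kubeadm init phase upload-certs --upload-certs`. The join tasks later on this page then append the node-specific flags.

```shell
#!/bin/sh
# Hypothetical helper: combine a worker join command and an uploaded certificate key
# into a control-plane join command without writing either to disk.
#   $1  output of: kubeadm token create --print-join-command
#   $2  key printed by: kubeadm init phase upload-certs --upload-certs
build_control_plane_join() {
  # Drop any trailing whitespace from the printed join command.
  worker_join=$(printf '%s' "$1" | sed 's/[[:space:]]*$//')
  cert_key=$2
  case $worker_join in
    "kubeadm join "*) ;;
    *) echo "not a kubeadm join command" >&2; return 1 ;;
  esac
  # kubeadm certificate keys are 32-byte keys, hex encoded (64 characters).
  if ! printf '%s' "$cert_key" | grep -Eqx '[0-9a-f]{64}'; then
    echo "certificate key is not 64 hexadecimal characters" >&2
    return 1
  fi
  printf '%s --control-plane --certificate-key %s\n' "$worker_join" "$cert_key"
}
```

The result contains a bootstrap token and the certificate key, so pass it only through a variable that is handled with no_log and never echo it. kubeadm can also print the control-plane form directly with `kubeadm token create --print-join-command --certificate-key <key>`.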
15. Create the Ansible feature branch from dev
Create this branch only after the production Terraform apply has completed successfully:
cd D:\code\ASPIRECLAN-LLC-Org\ac-cicd-infra
git switch dev
git pull --ff-only origin dev
git switch -c feature/bootstrap-k8s-control-planes
16. Replace ansible/ansible.cfg
[defaults]
inventory = inventories/shared-k8s/hosts.ini
roles_path = roles
host_key_checking = True
retry_files_enabled = False
interpreter_python = auto_silent
stdout_callback = default
inject_facts_as_vars = False
remote_tmp = /var/tmp/ansible-acllc
timeout = 30
[ssh_connection]
pipelining = True
ssh_args = -o IdentitiesOnly=yes -o ServerAliveInterval=30 -o ServerAliveCountMax=4
This configuration keeps the permanent APT-cache fix, disables deprecated top-level fact injection (inject_facts_as_vars = False, so roles read facts through ansible_facts), and sets remote_tmp to /var/tmp/ansible-acllc to avoid the remote temporary-directory warning.
17. Replace the shared inventory
Replace ansible/inventories/shared-k8s/hosts.ini with:
[load_balancers]
cicd-ac-k8s-lb-01 ansible_host=192.168.8.201 ansible_user=acllc node_primary_ip=192.168.8.201 node_interface=ens18
[first_control_plane]
cicd-ac-k8s-cp-01 ansible_host=192.168.8.202 ansible_user=acllc node_primary_ip=192.168.8.202 node_interface=ens18
[additional_control_planes]
cicd-ac-k8s-cp-02 ansible_host=192.168.8.203 ansible_user=acllc node_primary_ip=192.168.8.203 node_interface=ens18
cicd-ac-k8s-cp-03 ansible_host=192.168.8.204 ansible_user=acllc node_primary_ip=192.168.8.204 node_interface=ens18
[control_planes:children]
first_control_plane
additional_control_planes
[dev_workers]
[qa_workers]
[prod_workers]
[workers:children]
dev_workers
qa_workers
prod_workers
[k8s_cluster:children]
control_planes
workers
[all:vars]
ansible_python_interpreter=/usr/bin/python3
The first control plane is deliberately separated from the two additional control planes so bootstrap and join operations can target the correct machines.
18. Add cluster-wide variables
Replace ansible/inventories/shared-k8s/group_vars/all.yml with:
---
cluster_admin_user: acllc
kubernetes_version: "v1.36.1"
kubernetes_package_version: "1.36.1-1.1"
kubernetes_minor_repository: "v1.36"
kubernetes_cri_socket: "unix:///run/containerd/containerd.sock"
kubernetes_api_endpoint: "cicd-ac-k8s-api.aspireclan.com:443"
kubernetes_api_vip: "192.168.8.200"
kubernetes_api_backend_port: 6443
kubernetes_pod_cidr: "10.244.0.0/16"
kubernetes_service_cidr: "10.96.0.0/12"
kubernetes_dns_domain: "cluster.local"
calico_version: "v3.32.0"
calico_crd_url: "https://raw.githubusercontent.com/projectcalico/calico/v3.32.0/manifests/v1_crd_projectcalico_org.yaml"
calico_operator_url: "https://raw.githubusercontent.com/projectcalico/calico/v3.32.0/manifests/tigera-operator.yaml"
managed_hosts_entries:
  - ip: 192.168.8.200
    names:
      - cicd-ac-k8s-api.aspireclan.com
      - cicd-ac-k8s-api
  - ip: 192.168.8.201
    names:
      - cicd-ac-k8s-lb-01
  - ip: 192.168.8.202
    names:
      - cicd-ac-k8s-cp-01
  - ip: 192.168.8.203
    names:
      - cicd-ac-k8s-cp-02
  - ip: 192.168.8.204
    names:
      - cicd-ac-k8s-cp-03
19. Add control-plane variables
Replace ansible/inventories/shared-k8s/group_vars/control_planes.yml with:
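---
kubernetes_node_tcp_ports:
  - "6443"
  - "2379:2380"
  - "10250"
  - "10257"
  - "10259"
calico_node_tcp_ports:
  - "5473"
calico_node_udp_ports:
  - "4789"
UFW is not enabled by this page, but the required Kubernetes and Calico rules are pre-created so a later firewall-hardening phase does not break the cluster. The ports are 6443 (API server), 2379-2380 (etcd client and peer), 10250 (kubelet), 10257 (kube-controller-manager), 10259 (kube-scheduler), 5473/TCP (Calico Typha), and 4789/UDP (VXLAN).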
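The added rules can be compared with these lists before or after a later phase enables UFW. This sketch assumes that `ufw show added`, which lists added rules even while UFW is inactive, prints each rule in the same `ufw allow from ... to any port ... proto ...` form that the roles use. The function name is hypothetical, and the port list is copied from the variables above.

```shell
#!/bin/sh
# Sketch: report approved Kubernetes/Calico rules missing from UFW's added-rule list.
# Usage on a control plane: sudo ufw show added | missing_ufw_rules
missing_ufw_rules() {
  added=$(cat)
  missing=0
  for spec in 6443/tcp 2379:2380/tcp 10250/tcp 10257/tcp 10259/tcp 5473/tcp 4789/udp; do
    rule="ufw allow from 192.168.8.0/22 to any port ${spec%/*} proto ${spec#*/}"
    if ! printf '%s\n' "$added" | grep -Fxq "$rule"; then
      echo "missing: $rule"
      missing=1
    fi
  done
  return "$missing"
}
```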
20. Generalize the common Ubuntu role
Replace ansible/roles/common/tasks/main.yml with:
---
- name: Confirm required host identity variables are defined
  ansible.builtin.assert:
    that:
      - node_primary_ip is defined
      - node_interface is defined
      - inventory_hostname | length > 0
    fail_msg: >-
      The inventory must define node_primary_ip and node_interface for every host.
- name: Confirm the target IP and interface match the approved inventory
  ansible.builtin.assert:
    that:
      - ansible_facts["default_ipv4"]["address"] == node_primary_ip
      - ansible_facts["default_ipv4"]["interface"] == node_interface
    fail_msg: >-
      The detected default IPv4 address or interface does not match the approved inventory.
- name: Force refresh the APT package cache
  ansible.builtin.apt:
    update_cache: true
  register: common_apt_cache_refresh
  retries: 5
  delay: 15
  until: common_apt_cache_refresh is succeeded
- name: Install common operating-system packages
  ansible.builtin.apt:
    name:
      - ca-certificates
      - curl
      - gpg
      - jq
      - qemu-guest-agent
      - ufw
    state: present
  register: common_package_install
  retries: 3
  delay: 10
  until: common_package_install is succeeded
- name: Ensure the Ansible remote temporary directory exists
  ansible.builtin.file:
    path: /var/tmp/ansible-acllc
    state: directory
    owner: "{{ cluster_admin_user }}"
    group: "{{ cluster_admin_user }}"
    mode: "0700"
- name: Write the permanent hostname file
  ansible.builtin.copy:
    dest: /etc/hostname
    content: "{{ inventory_hostname }}\n"
    owner: root
    group: root
    mode: "0644"
- name: Set the active system hostname
  ansible.builtin.hostname:
    name: "{{ inventory_hostname }}"
- name: Set the local hostname mapping
  ansible.builtin.lineinfile:
    path: /etc/hosts
    regexp: '^127\.0\.1\.1\s+'
    line: "127.0.1.1 {{ inventory_hostname }}"
    create: true
    owner: root
    group: root
    mode: "0644"
- name: Add shared Kubernetes host mappings
  ansible.builtin.blockinfile:
    path: /etc/hosts
    marker: "# {mark} ASPIRECLAN SHARED K8S"
    block: |
      {% for item in managed_hosts_entries %}
      {{ item.ip }} {{ item.names | join(' ') }}
      {% endfor %}
    owner: root
    group: root
    mode: "0644"
- name: Enable and start QEMU Guest Agent
  ansible.builtin.service:
    name: qemu-guest-agent
    enabled: true
    state: started
- name: Allow SSH through UFW when UFW is enabled later
  ansible.builtin.command:
    cmd: ufw allow 22/tcp
  register: common_ufw_ssh_rule
  # UFW prints "Skipping adding existing rule" when the rule is already present.
  changed_when: "'Skipping' not in common_ufw_ssh_rule.stdout"
- name: Verify the resulting hostname
  ansible.builtin.command:
    cmd: hostnamectl --static
  register: common_configured_hostname
  changed_when: false
  failed_when: common_configured_hostname.stdout | trim != inventory_hostname
The common role is shared by all infrastructure nodes. It validates every host against node_primary_ip and node_interface from the inventory and forces an APT refresh before installing packages.
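If the identity assertion fails, the same two values can be inspected by hand. Ansible's Linux network facts derive default_ipv4 from the route that the kernel would use for an external address. The following sketch approximates that by parsing `ip -4 route get` output. It is a troubleshooting aid, not Ansible's exact algorithm, and the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: print "<interface> <source-ip>" from `ip -4 route get <address>` output.
# Usage: ip -4 route get 1.1.1.1 | route_iface_and_src
route_iface_and_src() {
  awk '{
    for (i = 1; i < NF; i++) {
      if ($i == "dev") dev = $(i + 1)
      if ($i == "src") src = $(i + 1)
    }
  } END { if (dev != "" && src != "") print dev, src; else exit 1 }'
}
```

On cicd-ac-k8s-cp-01 the output should be `ens18 192.168.8.202`, matching node_interface and node_primary_ip in the inventory.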
20.1 Reconcile the existing HAProxy role and install socat
The final control-plane verification reads the HAProxy Runtime API through /run/haproxy/admin.sock. The socat client must therefore be managed by the HAProxy role rather than installed manually.
Replace ansible/roles/haproxy/tasks/main.yml with:
---
- name: Force refresh the APT package cache before installing HAProxy
ansible.builtin.apt:
update_cache: true
register: haproxy_apt_cache_refresh
retries: 5
delay: 15
until: haproxy_apt_cache_refresh is succeeded
- name: Install HAProxy and the runtime statistics client
ansible.builtin.apt:
name:
- haproxy
- socat
state: present
register: haproxy_package_install
retries: 3
delay: 10
until: haproxy_package_install is succeeded
- name: Render the Kubernetes API HAProxy configuration
ansible.builtin.template:
src: haproxy.cfg.j2
dest: /etc/haproxy/haproxy.cfg
owner: root
group: root
mode: "0644"
validate: "haproxy -c -f %s"
notify: Restart HAProxy
- name: Enable and start HAProxy
ansible.builtin.service:
name: haproxy
enabled: true
state: started
- name: Apply any pending HAProxy restart
ansible.builtin.meta: flush_handlers
- name: Validate the active HAProxy configuration
ansible.builtin.command:
cmd: haproxy -c -f /etc/haproxy/haproxy.cfg
changed_when: false
- name: Confirm the HAProxy runtime socket exists
ansible.builtin.stat:
path: /run/haproxy/admin.sock
register: haproxy_runtime_socket
- name: Assert that the HAProxy runtime socket is available
ansible.builtin.assert:
that:
- haproxy_runtime_socket.stat.exists
- haproxy_runtime_socket.stat.issock
fail_msg: >-
The HAProxy runtime socket /run/haproxy/admin.sock is not available.
- name: Confirm that the HAProxy Runtime API responds
ansible.builtin.shell:
executable: /bin/bash
cmd: |
set -euo pipefail
printf 'show info\n' |
socat - UNIX-CONNECT:/run/haproxy/admin.sock
register: haproxy_runtime_api_check
changed_when: falseThe preceding load-balancer page must already have applied this HAProxy role. The control-plane workflow does not rerun the load-balancer playbook; it checks the HAProxy runtime socket at the end of the control-plane bootstrap. Keeping this role block source-aligned ensures socat and the runtime socket are available when the load-balancer workflow is run.
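Once the API servers are running, the same runtime socket can show whether HAProxy considers each control plane healthy. In the following sketch, the backend name is a placeholder because haproxy.cfg.j2 is not reproduced on this page. The script locates the status column by name from the `show stat` CSV header instead of assuming its position.

```shell
#!/bin/sh
# Sketch: count servers reported UP for one backend in HAProxy "show stat" CSV.
# Usage on the load balancer (backend name is a placeholder):
#   printf 'show stat\n' | sudo socat - UNIX-CONNECT:/run/haproxy/admin.sock | count_up_servers <backend>
count_up_servers() {
  awk -F, -v backend="$1" '
    NR == 1 {
      # The header starts with "# pxname"; find the "status" column by name.
      sub(/^# /, "")
      for (i = 1; i <= NF; i++) if ($i == "status") col = i
      next
    }
    $1 == backend && $2 != "BACKEND" && $2 != "FRONTEND" && $col ~ /^UP/ { up++ }
    END { print up + 0 }
  '
}
```

After all three control planes have joined, the count for the API backend should be 3.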
21. Implement the containerd role
21.1 Replace ansible/roles/containerd/tasks/main.yml
---
- name: Force refresh the APT package cache before configuring containerd
ansible.builtin.apt:
update_cache: true
register: containerd_apt_cache_refresh
retries: 5
delay: 15
until: containerd_apt_cache_refresh is succeeded
- name: Ensure containerd is installed
ansible.builtin.apt:
name:
- containerd.io
state: present
register: containerd_package_install
retries: 3
delay: 10
until: containerd_package_install is succeeded
- name: Ensure the containerd configuration directory exists
ansible.builtin.file:
path: /etc/containerd
state: directory
owner: root
group: root
mode: "0755"
- name: Check whether the containerd configuration already exists
ansible.builtin.stat:
path: /etc/containerd/config.toml
register: containerd_config_file
- name: Generate the default containerd configuration when missing
ansible.builtin.shell:
cmd: containerd config default > /etc/containerd/config.toml
when: not containerd_config_file.stat.exists
notify: Restart containerd
- name: Ensure the CRI plugin is not disabled
ansible.builtin.lineinfile:
path: /etc/containerd/config.toml
regexp: '^disabled_plugins\s*='
line: 'disabled_plugins = []'
insertbefore: BOF
owner: root
group: root
mode: "0644"
notify: Restart containerd
- name: Configure containerd to use the systemd cgroup driver
ansible.builtin.replace:
path: /etc/containerd/config.toml
regexp: 'SystemdCgroup = false'
replace: 'SystemdCgroup = true'
notify: Restart containerd
- name: Write the crictl runtime configuration
ansible.builtin.copy:
dest: /etc/crictl.yaml
owner: root
group: root
mode: "0644"
content: |
runtime-endpoint: {{ kubernetes_cri_socket }}
image-endpoint: {{ kubernetes_cri_socket }}
timeout: 10
debug: false
- name: Enable and start containerd
ansible.builtin.service:
name: containerd
enabled: true
state: started
- name: Apply any pending containerd restart
ansible.builtin.meta: flush_handlers
- name: Confirm the containerd CRI plugin is healthy
ansible.builtin.shell:
executable: /bin/bash
cmd: |
set -euo pipefail
plugin_output="$(ctr plugins ls)"
printf '%s\n' "${plugin_output}"
printf '%s\n' "${plugin_output}" |
awk '
$1 == "io.containerd.grpc.v1" &&
$2 == "cri" &&
$NF == "ok" {
legacy_cri_ok = 1
}
$1 == "io.containerd.cri.v1" &&
$2 == "runtime" &&
$NF == "ok" {
runtime_cri_ok = 1
}
$1 == "io.containerd.cri.v1" &&
$2 == "images" &&
$NF == "ok" {
images_cri_ok = 1
}
END {
if (legacy_cri_ok || (runtime_cri_ok && images_cri_ok)) {
exit 0
}
exit 1
}
'
register: containerd_cri_plugin_check
changed_when: false
retries: 15
delay: 2
until: containerd_cri_plugin_check.rc == 021.2 Replace ansible/roles/containerd/handlers/main.yml
---
- name: Restart containerd
ansible.builtin.service:
name: containerd
state: restartedThe role preserves the template's Docker installation but configures the shared containerd runtime for Kubernetes, enables the CRI plugin, and sets SystemdCgroup = true.
The CRI validation intentionally parses the TYPE and ID columns of ctr plugins ls separately. It accepts both the legacy io.containerd.grpc.v1 / cri layout and the containerd 2.x io.containerd.cri.v1 / runtime plus images layout. register, retries, delay, until, and changed_when are task-level keywords. Keep them at the same indentation as ansible.builtin.shell rather than nested under it, because placing them inside the module block causes an unsupported-parameter failure.
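When the Ansible task keeps retrying, the same rule can be run by hand on the node. The following sketch reuses the task's awk conditions unchanged apart from a compact END block. The function name is hypothetical.

```shell
#!/bin/sh
# Same CRI health rule as the Ansible task, packaged as a reusable filter.
# Usage on a node: sudo ctr plugins ls | cri_plugin_ok && echo "CRI healthy"
cri_plugin_ok() {
  awk '
    $1 == "io.containerd.grpc.v1" && $2 == "cri" && $NF == "ok" { legacy = 1 }
    $1 == "io.containerd.cri.v1" && $2 == "runtime" && $NF == "ok" { runtime = 1 }
    $1 == "io.containerd.cri.v1" && $2 == "images" && $NF == "ok" { images = 1 }
    END {
      # Healthy with either the legacy layout or both containerd 2.x CRI plugins.
      ok = legacy || (runtime && images)
      exit (ok ? 0 : 1)
    }
  '
}
```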
22. Implement the Kubernetes common role
22.1 Replace ansible/roles/kubernetes-common/tasks/main.yml
---
- name: Disable swap immediately
  ansible.builtin.command:
    cmd: swapoff -a
  changed_when: ansible_facts["swaptotal_mb"] | int > 0
- name: Disable swap entries permanently in fstab
  ansible.builtin.replace:
    path: /etc/fstab
    regexp: '^([^#].*\s+swap\s+.*)$'
    replace: '# \1'
- name: Configure Kubernetes kernel modules
  ansible.builtin.copy:
    dest: /etc/modules-load.d/k8s.conf
    owner: root
    group: root
    mode: "0644"
    content: |
      overlay
      br_netfilter
- name: Load Kubernetes kernel modules now
  ansible.builtin.command:
    cmd: "modprobe {{ item }}"
  loop:
    - overlay
    - br_netfilter
  changed_when: false
- name: Configure Kubernetes networking sysctls
  ansible.builtin.copy:
    dest: /etc/sysctl.d/99-kubernetes-cri.conf
    owner: root
    group: root
    mode: "0644"
    content: |
      net.bridge.bridge-nf-call-iptables = 1
      net.bridge.bridge-nf-call-ip6tables = 1
      net.ipv4.ip_forward = 1
  notify: Reload Kubernetes sysctls
- name: Apply pending sysctl changes
  ansible.builtin.meta: flush_handlers
- name: Force refresh the APT cache before Kubernetes repository setup
  ansible.builtin.apt:
    update_cache: true
  register: kubernetes_prerequisite_apt_refresh
  retries: 5
  delay: 15
  until: kubernetes_prerequisite_apt_refresh is succeeded
- name: Install Kubernetes repository prerequisites
  ansible.builtin.apt:
    name:
      - apt-transport-https
      - ca-certificates
      - curl
      - gpg
    state: present
  register: kubernetes_prerequisite_packages
  retries: 3
  delay: 10
  until: kubernetes_prerequisite_packages is succeeded
- name: Ensure the APT keyring directory exists
  ansible.builtin.file:
    path: /etc/apt/keyrings
    state: directory
    owner: root
    group: root
    mode: "0755"
- name: Install the Kubernetes repository signing key
  ansible.builtin.shell:
    executable: /bin/bash
    # pipefail stops a failed download from silently producing an empty keyring.
    cmd: >-
      set -o pipefail &&
      curl -fsSL
      https://pkgs.k8s.io/core:/stable:/{{ kubernetes_minor_repository }}/deb/Release.key |
      gpg --dearmor --yes --output /etc/apt/keyrings/kubernetes-apt-keyring.gpg
  changed_when: false
- name: Configure the Kubernetes minor-version repository
  ansible.builtin.copy:
    dest: /etc/apt/sources.list.d/kubernetes.list
    owner: root
    group: root
    mode: "0644"
    content: >-
      deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg]
      https://pkgs.k8s.io/core:/stable:/{{ kubernetes_minor_repository }}/deb/ /
- name: Force refresh the APT cache after adding the Kubernetes repository
  ansible.builtin.apt:
    update_cache: true
  register: kubernetes_repository_apt_refresh
  retries: 5
  delay: 15
  until: kubernetes_repository_apt_refresh is succeeded
- name: Install the approved Kubernetes packages
  ansible.builtin.apt:
    name:
      - "kubelet={{ kubernetes_package_version }}"
      - "kubeadm={{ kubernetes_package_version }}"
      - "kubectl={{ kubernetes_package_version }}"
      - cri-tools
    state: present
    allow_downgrade: true
  register: kubernetes_package_install
  retries: 3
  delay: 10
  until: kubernetes_package_install is succeeded
- name: Hold Kubernetes packages for controlled upgrades
  ansible.builtin.dpkg_selections:
    name: "{{ item }}"
    selection: hold
  loop:
    - kubelet
    - kubeadm
    - kubectl
- name: Configure the kubelet node IP
  ansible.builtin.copy:
    dest: /etc/default/kubelet
    owner: root
    group: root
    mode: "0644"
    content: |
      KUBELET_EXTRA_ARGS=--node-ip={{ node_primary_ip }}
  notify: Restart kubelet
- name: Enable kubelet at boot
  ansible.builtin.systemd:
    name: kubelet
    enabled: true
    daemon_reload: true
- name: Allow approved Kubernetes control-plane TCP ports through UFW
  ansible.builtin.command:
    cmd: >-
      ufw allow from 192.168.8.0/22 to any port {{ item }} proto tcp
  loop: "{{ kubernetes_node_tcp_ports | default([]) }}"
  register: kubernetes_ufw_tcp_rules
  changed_when: "'Skipping' not in kubernetes_ufw_tcp_rules.stdout"
- name: Allow Calico node TCP ports through UFW
  ansible.builtin.command:
    cmd: >-
      ufw allow from 192.168.8.0/22 to any port {{ item }} proto tcp
  loop: "{{ calico_node_tcp_ports | default([]) }}"
  register: calico_ufw_tcp_rules
  changed_when: "'Skipping' not in calico_ufw_tcp_rules.stdout"
- name: Allow Calico node UDP ports through UFW
  ansible.builtin.command:
    cmd: >-
      ufw allow from 192.168.8.0/22 to any port {{ item }} proto udp
  loop: "{{ calico_node_udp_ports | default([]) }}"
  register: calico_ufw_udp_rules
  changed_when: "'Skipping' not in calico_ufw_udp_rules.stdout"
- name: Apply any pending kubelet restart
  ansible.builtin.meta: flush_handlers
- name: Pull the approved Kubernetes control-plane images
  ansible.builtin.command:
    cmd: >-
      kubeadm config images pull
      --kubernetes-version {{ kubernetes_version }}
      --cri-socket {{ kubernetes_cri_socket }}
  register: kubernetes_image_pull
  changed_when: "'pulled' in kubernetes_image_pull.stdout | lower"
  retries: 3
  delay: 15
  until: kubernetes_image_pull is succeeded
- name: Verify that swap is disabled
  ansible.builtin.command:
    cmd: swapon --show
  register: kubernetes_swap_status
  changed_when: false
  failed_when: kubernetes_swap_status.stdout | trim | length > 0
- name: Verify the container runtime through CRI
  ansible.builtin.command:
    cmd: >-
      crictl
      --runtime-endpoint {{ kubernetes_cri_socket }}
      --image-endpoint {{ kubernetes_cri_socket }}
      info
  register: kubernetes_cri_runtime_check
  changed_when: false
  retries: 15
  delay: 2
  until: kubernetes_cri_runtime_check.rc == 0
- name: Verify installed Kubernetes versions
  ansible.builtin.shell:
    cmd: |
      set -e
      kubeadm version -o short
      kubelet --version
      kubectl version --client=true
  changed_when: false
22.2 Replace ansible/roles/kubernetes-common/handlers/main.yml
---
- name: Reload Kubernetes sysctls
  ansible.builtin.command:
    cmd: sysctl --system
- name: Restart kubelet
  ansible.builtin.service:
    name: kubelet
    state: restarted
Until kubeadm init or kubeadm join writes /var/lib/kubelet/config.yaml, the kubelet restarts every few seconds while it waits for kubeadm. This crash loop is expected at this stage and is not a failure.
23. Implement the Kubernetes control-plane role
23.1 Replace defaults/main.yml
---
kubernetes_control_plane_action: bootstrap
control_plane_join_command: ""
23.2 Replace tasks/main.yml
---
- name: Include first-control-plane bootstrap tasks
  ansible.builtin.include_tasks: bootstrap.yml
  when: kubernetes_control_plane_action == "bootstrap"
- name: Include additional-control-plane join tasks
  ansible.builtin.include_tasks: join.yml
  when: kubernetes_control_plane_action == "join"
23.3 Create tasks/bootstrap.yml
---
- name: Render the kubeadm initialization configuration
  ansible.builtin.template:
    src: kubeadm-init-config.yaml.j2
    dest: /etc/kubernetes/kubeadm-init-config.yaml
    owner: root
    group: root
    mode: "0600"
- name: Validate the kubeadm initialization configuration
  ansible.builtin.command:
    cmd: kubeadm config validate --config /etc/kubernetes/kubeadm-init-config.yaml
  changed_when: false
- name: Bootstrap the first control plane and upload shared certificates
  ansible.builtin.command:
    cmd: >-
      kubeadm init
      --config /etc/kubernetes/kubeadm-init-config.yaml
      --upload-certs
    creates: /etc/kubernetes/admin.conf
  no_log: true
- name: Ensure the administrator kubeconfig directory exists
  ansible.builtin.file:
    path: "/home/{{ cluster_admin_user }}/.kube"
    state: directory
    owner: "{{ cluster_admin_user }}"
    group: "{{ cluster_admin_user }}"
    mode: "0700"
- name: Install the administrator kubeconfig
  ansible.builtin.copy:
    src: /etc/kubernetes/admin.conf
    dest: "/home/{{ cluster_admin_user }}/.kube/config"
    remote_src: true
    owner: "{{ cluster_admin_user }}"
    group: "{{ cluster_admin_user }}"
    mode: "0600"
- name: Wait for the Kubernetes API through the load-balancer endpoint
  ansible.builtin.command:
    cmd: kubectl --kubeconfig=/etc/kubernetes/admin.conf get --raw=/readyz
  register: first_control_plane_readyz
  changed_when: false
  retries: 30
  delay: 10
  until: first_control_plane_readyz.stdout | trim == "ok"
- name: Verify the first control-plane node registration
  ansible.builtin.command:
    cmd: >-
      kubectl --kubeconfig=/etc/kubernetes/admin.conf
      get node {{ inventory_hostname }} -o wide
  changed_when: false
23.4 Create tasks/join.yml
---
- name: Confirm a generated control-plane join command is available
  ansible.builtin.assert:
    that:
      - control_plane_join_command | length > 0
    fail_msg: "The first control plane did not provide a join command."
  no_log: true
- name: Join this node as an additional control plane
  ansible.builtin.command:
    cmd: >-
      {{ control_plane_join_command }}
      --apiserver-advertise-address {{ node_primary_ip }}
      --node-name {{ inventory_hostname }}
      --cri-socket {{ kubernetes_cri_socket }}
    creates: /etc/kubernetes/admin.conf
  no_log: true
- name: Ensure the administrator kubeconfig directory exists
  ansible.builtin.file:
    path: "/home/{{ cluster_admin_user }}/.kube"
    state: directory
    owner: "{{ cluster_admin_user }}"
    group: "{{ cluster_admin_user }}"
    mode: "0700"
- name: Install the administrator kubeconfig
  ansible.builtin.copy:
    src: /etc/kubernetes/admin.conf
    dest: "/home/{{ cluster_admin_user }}/.kube/config"
    remote_src: true
    owner: "{{ cluster_admin_user }}"
    group: "{{ cluster_admin_user }}"
    mode: "0600"
- name: Wait for the local API server to listen
  ansible.builtin.wait_for:
    host: "{{ node_primary_ip }}"
    port: 6443
    timeout: 300
23.5 Create templates/kubeadm-init-config.yaml.j2
apiVersion: kubeadm.k8s.io/v1beta4
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: "{{ node_primary_ip }}"
  bindPort: 6443
nodeRegistration:
  name: "{{ inventory_hostname }}"
  criSocket: "{{ kubernetes_cri_socket }}"
  kubeletExtraArgs:
    - name: node-ip
      value: "{{ node_primary_ip }}"
---
apiVersion: kubeadm.k8s.io/v1beta4
kind: ClusterConfiguration
kubernetesVersion: "{{ kubernetes_version }}"
controlPlaneEndpoint: "{{ kubernetes_api_endpoint }}"
networking:
  podSubnet: "{{ kubernetes_pod_cidr }}"
  serviceSubnet: "{{ kubernetes_service_cidr }}"
  dnsDomain: "{{ kubernetes_dns_domain }}"
apiServer:
  certSANs:
    - "cicd-ac-k8s-api.aspireclan.com"
    - "cicd-ac-k8s-api"
    - "192.168.8.200"
    - "192.168.8.202"
    - "192.168.8.203"
    - "192.168.8.204"
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: iptables
The template renders controlPlaneEndpoint from the global kubernetes_api_endpoint variable. That variable must be set to exactly the endpoint served by the completed HAProxy and Keepalived configuration: cicd-ac-k8s-api.aspireclan.com:443.
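A wrong hostname or port in kubernetes_api_endpoint ends up in every kubeconfig that kubeadm writes, so a quick comparison before kubeadm init can be worthwhile. The bash sketch below is a hypothetical pre-flight check and is not part of the roles on this page. check_control_plane_endpoint and EXPECTED_ENDPOINT are illustrative names, and the sketch assumes the rendered value is available as a plain string.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check for the rendered controlPlaneEndpoint.
# EXPECTED_ENDPOINT comes from this page; the function name is illustrative.
set -euo pipefail

EXPECTED_ENDPOINT="cicd-ac-k8s-api.aspireclan.com:443"

check_control_plane_endpoint() {
  local endpoint="$1"
  local host="${endpoint%:*}"
  local port="${endpoint##*:}"
  # Without a colon there is no port, so the value cannot match the 443 frontend.
  if [ "${host}" = "${endpoint}" ]; then
    echo "ERROR: ${endpoint} has no port" >&2
    return 1
  fi
  # 6443 is the API server port on each control plane, not the HAProxy frontend.
  if [ "${port}" = "6443" ]; then
    echo "ERROR: ${endpoint} uses 6443; the shared endpoint listens on 443" >&2
    return 1
  fi
  if [ "${endpoint}" != "${EXPECTED_ENDPOINT}" ]; then
    echo "ERROR: ${endpoint} does not match ${EXPECTED_ENDPOINT}" >&2
    return 1
  fi
  echo "OK: ${host} port ${port}"
}
```

To use it, pass the value that group_vars assigns to kubernetes_api_endpoint, for example check_control_plane_endpoint "cicd-ac-k8s-api.aspireclan.com:443". The separate 6443 branch exists so that the most likely mistake, pointing kubeadm directly at an API server port, produces a specific message.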
24. Add the Calico custom resources
Create kubernetes/common/cni/calico-custom-resources.yaml:
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  calicoNetwork:
    bgp: Disabled
    ipPools:
      - name: default-ipv4-ippool
        blockSize: 26
        cidr: 10.244.0.0/16
        encapsulation: VXLAN
        natOutgoing: Enabled
        nodeSelector: all()
This file replaces the 192.168.0.0/16 example CIDR from Calico with the approved, non-overlapping 10.244.0.0/16 Pod network. It also uses VXLAN encapsulation with BGP disabled.
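You can check the non-overlap requirement mechanically before applying the Installation resource. The bash sketch below compares IPv4 CIDR blocks as 32-bit integer ranges. The function names are hypothetical. The LAN range 192.168.8.0/24 is an assumption inferred from the node and VIP addresses on this page. The 10.96.0.0/12 Service CIDR comes from the expected final state.

```bash
#!/usr/bin/env bash
# Sketch: confirm that Pod, Service, and LAN IPv4 ranges do not overlap.
# Assumption: the LAN is 192.168.8.0/24 (inferred, not stated on this page).
set -euo pipefail

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local IFS=.
  local -a o
  read -r -a o <<< "$1"
  # 10# forces base 10 so an octet such as "08" is not parsed as octal.
  echo $(( (10#${o[0]} << 24) | (10#${o[1]} << 16) | (10#${o[2]} << 8) | 10#${o[3]} ))
}

# Print the first and last integer addresses of a CIDR block.
cidr_bounds() {
  local addr="${1%/*}" prefix="${1#*/}" base mask
  base=$(ip_to_int "${addr}")
  mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  base=$(( base & mask ))
  echo "${base} $(( base | (~mask & 0xFFFFFFFF) ))"
}

# Succeed when two CIDR blocks share at least one address.
cidrs_overlap() {
  local a_first a_last b_first b_last
  read -r a_first a_last <<< "$(cidr_bounds "$1")"
  read -r b_first b_last <<< "$(cidr_bounds "$2")"
  (( a_first <= b_last && b_first <= a_last ))
}

for pair in "10.244.0.0/16 192.168.8.0/24" "10.244.0.0/16 10.96.0.0/12" "192.168.0.0/16 192.168.8.0/24"; do
  read -r a b <<< "${pair}"
  if cidrs_overlap "${a}" "${b}"; then
    echo "OVERLAP: ${a} and ${b}"
  else
    echo "ok: ${a} and ${b}"
  fi
done
```

The last pair shows why the Calico example CIDR was replaced: 192.168.0.0/16 contains the assumed 192.168.8.0/24 LAN.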
25. Replace the common-baseline playbook
Replace ansible/playbooks/shared-k8s/01-common-baseline.yml with:
---
- name: Apply the common Ubuntu baseline
  hosts: all
  become: true
  gather_facts: true
  roles:
    - role: common
The workflows use --limit when the baseline should target only one infrastructure group.
26. Replace the Kubernetes preparation playbook
Replace ansible/playbooks/shared-k8s/03-prepare-kubernetes-nodes.yml with:
---
- name: Prepare Kubernetes nodes
  hosts: k8s_cluster
  become: true
  gather_facts: true
  roles:
    - role: containerd
    - role: kubernetes-common
27. Replace the first-control-plane bootstrap playbook
Replace ansible/playbooks/shared-k8s/04-bootstrap-first-control-plane.yml with:
---
- name: Bootstrap the first Kubernetes control plane
  hosts: first_control_plane
  become: true
  gather_facts: true
  roles:
    - role: kubernetes-control-plane
      vars:
        kubernetes_control_plane_action: bootstrap
28. Replace the CNI installation playbook
Replace ansible/playbooks/shared-k8s/06-install-cni.yml with:
---
- name: Install Calico on the first control plane
  hosts: first_control_plane
  become: true
  gather_facts: false
  environment:
    KUBECONFIG: /etc/kubernetes/admin.conf
  tasks:
    - name: Install Calico custom resource definitions
      ansible.builtin.command:
        cmd: "kubectl apply --server-side -f {{ calico_crd_url }}"
      register: calico_crd_apply
      changed_when: "'configured' in calico_crd_apply.stdout or 'created' in calico_crd_apply.stdout"
    - name: Install the Tigera operator
      ansible.builtin.command:
        cmd: "kubectl apply -f {{ calico_operator_url }}"
      register: calico_operator_apply
      changed_when: "'configured' in calico_operator_apply.stdout or 'created' in calico_operator_apply.stdout"
    - name: Copy the approved Calico custom resources
      ansible.builtin.copy:
        src: "{{ playbook_dir }}/../../../kubernetes/common/cni/calico-custom-resources.yaml"
        dest: /etc/kubernetes/calico-custom-resources.yaml
        owner: root
        group: root
        mode: "0644"
    - name: Apply the approved Calico custom resources
      ansible.builtin.command:
        cmd: kubectl apply -f /etc/kubernetes/calico-custom-resources.yaml
      register: calico_custom_resources_apply
      changed_when: >-
        'configured' in calico_custom_resources_apply.stdout or
        'created' in calico_custom_resources_apply.stdout
    - name: Wait for the Tigera operator deployment
      ansible.builtin.command:
        cmd: >-
          kubectl -n tigera-operator
          rollout status deployment/tigera-operator
          --timeout=10m
      changed_when: false
    - name: Wait for Calico to report Available
      ansible.builtin.command:
        cmd: kubectl wait --for=condition=Available tigerastatus/calico --timeout=15m
      register: calico_available
      changed_when: false
      retries: 3
      delay: 20
      until: calico_available is succeeded
    - name: Wait for the first control-plane node to become Ready
      ansible.builtin.command:
        cmd: >-
          kubectl wait
          --for=condition=Ready
          node/{{ inventory_hostname }}
          --timeout=15m
      changed_when: false
The workflow intentionally runs playbook 06 before playbook 05, despite the file numbering. This order ensures that Calico networking is installed and the first control plane is Ready before the additional control planes join.
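The changed_when expressions in the CNI playbook treat a client-side kubectl apply as a change when its output contains created or configured. Objects that did not change are reported as unchanged. The hypothetical bash helper below is a slightly stricter variant: it matches only the status word at the end of each output line, so an object name that happens to contain one of those words cannot cause a false positive. It does not cover the server-side CRD apply, which kubectl reports with a different verb (serverside-applied).

```bash
#!/usr/bin/env bash
# Hypothetical helper: read client-side `kubectl apply` output on stdin and
# succeed when at least one object line ends in "created" or "configured".
set -euo pipefail

apply_reported_change() {
  # Anchor on the final word of each line; "unchanged" does not match.
  grep -Eq '(^| )(created|configured)$'
}
```

For example, kubectl apply -f /etc/kubernetes/calico-custom-resources.yaml | apply_reported_change succeeds only when the apply created or reconfigured at least one object.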
29. Replace the additional-control-plane join playbook
Replace ansible/playbooks/shared-k8s/05-join-control-planes.yml with:
---
- name: Generate a fresh control-plane join command
  hosts: first_control_plane
  become: true
  gather_facts: false
  tasks:
    - name: Generate a fresh bootstrap-token join command
      ansible.builtin.command:
        cmd: kubeadm token create --ttl 2h --print-join-command
      register: generated_base_join_command
      changed_when: true
      no_log: true
    - name: Re-upload control-plane certificates and generate a fresh key
      ansible.builtin.command:
        cmd: kubeadm init phase upload-certs --upload-certs
      register: generated_certificate_key
      changed_when: true
      no_log: true
    - name: Build the temporary control-plane join command
      ansible.builtin.set_fact:
        generated_control_plane_join_command: >-
          {{ generated_base_join_command.stdout }}
          --control-plane
          --certificate-key {{ generated_certificate_key.stdout_lines | last }}
      no_log: true
- name: Join the remaining control planes one at a time
  hosts: additional_control_planes
  serial: 1
  become: true
  gather_facts: true
  vars:
    control_plane_join_command: >-
      {{ hostvars[groups['first_control_plane'][0]].generated_control_plane_join_command }}
  roles:
    - role: kubernetes-control-plane
      vars:
        kubernetes_control_plane_action: join
- name: Verify the complete highly available control plane
  hosts: first_control_plane
  become: true
  gather_facts: false
  environment:
    KUBECONFIG: /etc/kubernetes/admin.conf
  tasks:
    - name: Wait for every control-plane node to become Ready
      ansible.builtin.command:
        cmd: kubectl wait --for=condition=Ready nodes --all --timeout=15m
      changed_when: false
    - name: Rebalance CoreDNS after additional control planes join
      ansible.builtin.command:
        cmd: kubectl -n kube-system rollout restart deployment/coredns
      changed_when: true
    - name: Wait for CoreDNS rollout completion
      ansible.builtin.command:
        cmd: kubectl -n kube-system rollout status deployment/coredns --timeout=10m
      changed_when: false
    - name: Display the final control-plane nodes
      ansible.builtin.command:
        cmd: kubectl get nodes -o wide
      register: final_control_plane_nodes
      changed_when: false
    - name: Print the final control-plane node table
      ansible.builtin.debug:
        var: final_control_plane_nodes.stdout_lines
The two additional control planes join one at a time (serial: 1). no_log keeps the temporary bootstrap token and certificate key out of the job log, and neither value is ever committed to Git.
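The join playbook builds the control-plane join command by appending --control-plane and the certificate key to the token-based join command. The key is the last line of the upload-certs output. The bash sketch below reproduces that string assembly offline for illustration only. build_control_plane_join_command is a hypothetical name, and the token, hash, and key in the test are fake placeholders. Treat real output as a secret and never echo it to a job log.

```bash
#!/usr/bin/env bash
# Sketch: assemble a control-plane join command the way the
# "Build the temporary control-plane join command" task does.
set -euo pipefail

# $1: stdout of `kubeadm token create --print-join-command`
# $2: stdout of `kubeadm init phase upload-certs --upload-certs`
build_control_plane_join_command() {
  local base_join="$1" upload_output="$2" certificate_key
  # Mirror `stdout_lines | last`: take the last non-empty line of the output.
  certificate_key="$(printf '%s\n' "${upload_output}" | awk 'NF { last = $0 } END { print last }')"
  if [ -z "${certificate_key}" ]; then
    echo "ERROR: no certificate key found in upload-certs output" >&2
    return 1
  fi
  printf '%s --control-plane --certificate-key %s\n' "${base_join}" "${certificate_key}"
}
```

Because the uploaded certificates and the bootstrap token both expire after two hours, the playbook regenerates both values on every run instead of storing them.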
30. Add the control-plane GitHub Actions workflow
Create .github/workflows/ansible-configure-control-planes.yml:
name: Ansible Configure - Kubernetes Control Planes
on:
  push:
    branches:
      - dev
      - prod
    paths:
      - "ansible/inventories/shared-k8s/group_vars/control_planes.yml"
      - "ansible/roles/kubernetes-control-plane/**"
      - "ansible/playbooks/shared-k8s/04-bootstrap-first-control-plane.yml"
      - "ansible/playbooks/shared-k8s/05-join-control-planes.yml"
      - "ansible/playbooks/shared-k8s/06-install-cni.yml"
      - "kubernetes/common/cni/**"
      - ".github/workflows/ansible-configure-control-planes.yml"
  workflow_dispatch:
permissions:
  contents: read
concurrency:
  group: shared-k8s-ansible
  cancel-in-progress: false
env:
  ANSIBLE_CONFIG: ${{ github.workspace }}/ansible/ansible.cfg
jobs:
  validate:
    name: Validate control-plane Ansible configuration
    runs-on:
      - self-hosted
      - Linux
      - X64
      - prod
      - terraform
      - deploy
      - ac-cicd-infra
    steps:
      - name: Checkout repository
        uses: actions/checkout@v5
      - name: Verify Ansible
        shell: bash
        run: |
          set -euo pipefail
          ansible --version
          ansible-playbook --version
      - name: Validate the shared inventory
        working-directory: ansible
        shell: bash
        run: |
          set -euo pipefail
          ansible-inventory -i inventories/shared-k8s/hosts.ini --graph
      - name: Syntax-check the control-plane playbooks
        working-directory: ansible
        shell: bash
        run: |
          set -euo pipefail
          for playbook in \
            playbooks/shared-k8s/01-common-baseline.yml \
            playbooks/shared-k8s/03-prepare-kubernetes-nodes.yml \
            playbooks/shared-k8s/04-bootstrap-first-control-plane.yml \
            playbooks/shared-k8s/06-install-cni.yml \
            playbooks/shared-k8s/05-join-control-planes.yml
          do
            ansible-playbook \
              -i inventories/shared-k8s/hosts.ini \
              "${playbook}" \
              --syntax-check
          done
  configure:
    name: Bootstrap the Kubernetes control planes
    needs:
      - validate
    if: >-
      (github.event_name == 'push' && github.ref_name == 'prod') ||
      (github.event_name == 'workflow_dispatch' && github.ref_name == 'prod')
    environment:
      name: shared-k8s
    runs-on:
      - self-hosted
      - Linux
      - X64
      - prod
      - terraform
      - deploy
      - ac-cicd-infra
    timeout-minutes: 150
    steps:
      - name: Checkout repository
        uses: actions/checkout@v5
      - name: Verify the production branch
        shell: bash
        run: |
          set -euo pipefail
          if [ "${GITHUB_REF_NAME}" != "prod" ]; then
            echo "ERROR: Control-plane configuration is permitted only from prod."
            exit 1
          fi
      - name: Prepare the existing Ansible SSH key
        shell: bash
        run: |
          set -euo pipefail
          KEY_PATH="${HOME}/.ssh/id_ed25519_ansible"
          if [ ! -f "${KEY_PATH}" ]; then
            echo "ERROR: Missing Ansible key: ${KEY_PATH}"
            exit 1
          fi
          chmod 600 "${KEY_PATH}"
          echo "ANSIBLE_PRIVATE_KEY_FILE=${KEY_PATH}" >> "${GITHUB_ENV}"
      - name: Refresh load-balancer and control-plane SSH host keys
        shell: bash
        run: |
          set -euo pipefail
          mkdir -p "${HOME}/.ssh"
          chmod 700 "${HOME}/.ssh"
          touch "${HOME}/.ssh/known_hosts"
          chmod 600 "${HOME}/.ssh/known_hosts"
          for ip in 192.168.8.201 192.168.8.202 192.168.8.203 192.168.8.204; do
            ssh-keygen -f "${HOME}/.ssh/known_hosts" -R "${ip}" || true
            captured=false
            for attempt in $(seq 1 30); do
              # Treat the key as captured only when ssh-keyscan actually printed keys.
              keys="$(ssh-keyscan -T 5 -H "${ip}" 2>/dev/null || true)"
              if [ -n "${keys}" ]; then
                printf '%s\n' "${keys}" >> "${HOME}/.ssh/known_hosts"
                echo "SSH host key captured for ${ip}."
                captured=true
                break
              fi
              echo "Waiting for SSH on ${ip} (attempt ${attempt}/30)..."
              sleep 10
            done
            if [ "${captured}" != "true" ]; then
              echo "ERROR: Unable to capture SSH host key for ${ip}."
              exit 1
            fi
          done
      - name: Prepare the Ansible remote temporary directory
        shell: bash
        run: |
          set -euo pipefail
          for ip in 192.168.8.201 192.168.8.202 192.168.8.203 192.168.8.204; do
            ssh \
              -i "${ANSIBLE_PRIVATE_KEY_FILE}" \
              -o IdentitiesOnly=yes \
              -o BatchMode=yes \
              "acllc@${ip}" \
              'sudo install -d -m 0700 -o acllc -g acllc /var/tmp/ansible-acllc'
          done
      - name: Verify Ansible connectivity
        working-directory: ansible
        shell: bash
        run: |
          set -euo pipefail
          ansible \
            -i inventories/shared-k8s/hosts.ini \
            control_planes \
            --private-key "${ANSIBLE_PRIVATE_KEY_FILE}" \
            -m ping
      - name: Apply the common Ubuntu baseline
        working-directory: ansible
        shell: bash
        run: |
          set -euo pipefail
          ansible-playbook \
            -i inventories/shared-k8s/hosts.ini \
            --private-key "${ANSIBLE_PRIVATE_KEY_FILE}" \
            --limit control_planes \
            playbooks/shared-k8s/01-common-baseline.yml
      - name: Prepare containerd and Kubernetes prerequisites
        working-directory: ansible
        shell: bash
        run: |
          set -euo pipefail
          ansible-playbook \
            -i inventories/shared-k8s/hosts.ini \
            --private-key "${ANSIBLE_PRIVATE_KEY_FILE}" \
            --limit control_planes \
            playbooks/shared-k8s/03-prepare-kubernetes-nodes.yml
      - name: Bootstrap the first control plane
        working-directory: ansible
        shell: bash
        run: |
          set -euo pipefail
          ansible-playbook \
            -i inventories/shared-k8s/hosts.ini \
            --private-key "${ANSIBLE_PRIVATE_KEY_FILE}" \
            playbooks/shared-k8s/04-bootstrap-first-control-plane.yml
      - name: Install Calico before joining additional control planes
        working-directory: ansible
        shell: bash
        run: |
          set -euo pipefail
          ansible-playbook \
            -i inventories/shared-k8s/hosts.ini \
            --private-key "${ANSIBLE_PRIVATE_KEY_FILE}" \
            playbooks/shared-k8s/06-install-cni.yml
      - name: Join the remaining control planes
        working-directory: ansible
        shell: bash
        run: |
          set -euo pipefail
          ansible-playbook \
            -i inventories/shared-k8s/hosts.ini \
            --private-key "${ANSIBLE_PRIVATE_KEY_FILE}" \
            playbooks/shared-k8s/05-join-control-planes.yml
      - name: Verify the completed highly available control plane
        working-directory: ansible
        shell: bash
        run: |
          set -euo pipefail
          ansible \
            -i inventories/shared-k8s/hosts.ini \
            first_control_plane \
            --private-key "${ANSIBLE_PRIVATE_KEY_FILE}" \
            -b \
            -m shell \
            -a '
              set -e
              export KUBECONFIG=/etc/kubernetes/admin.conf
              kubectl get nodes -o wide
              kubectl get pods -A
              kubectl get --raw=/readyz
            '
          ansible \
            -i inventories/shared-k8s/hosts.ini \
            load_balancers \
            --private-key "${ANSIBLE_PRIVATE_KEY_FILE}" \
            -b \
            -m shell \
            -a '
              set -eu
              command -v socat >/dev/null 2>&1 || {
                echo "ERROR: socat is not installed on the load balancer."
                exit 1
              }
              test -S /run/haproxy/admin.sock || {
                echo "ERROR: HAProxy runtime socket is missing."
                exit 1
              }
              echo "=== HAPROXY INFORMATION ==="
              printf "show info\n" |
                socat - UNIX-CONNECT:/run/haproxy/admin.sock
              echo
              echo "=== HAPROXY BACKEND STATISTICS ==="
              printf "show stat\n" |
                socat - UNIX-CONNECT:/run/haproxy/admin.sock
            '
Workflow behavior:
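| Event | Result |
|---|---|
| Push to dev that touches a filtered path | Inventory and syntax validation only |
| Push to prod that touches a filtered path | Validation and complete control-plane bootstrap |
| Manual dispatch from prod | Validation and idempotent control-plane reconciliation |
| Manual dispatch from any other branch | Validation only |
| Pull request | No workflow trigger |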
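The table follows directly from the if: expression on the configure job. The bash sketch below is a hypothetical mirror of that expression, useful for reasoning about which events reach the bootstrap job. It assumes that only push and workflow_dispatch events can start the workflow, as configured above. The validate job has no condition and runs on every trigger.

```bash
#!/usr/bin/env bash
# Hypothetical mirror of the configure job's `if:` expression.
set -euo pipefail

# $1: github.event_name, $2: github.ref_name
configure_job_runs() {
  local event_name="$1" ref_name="$2"
  case "${event_name}" in
    push | workflow_dispatch) [ "${ref_name}" = "prod" ] ;;
    *) return 1 ;;
  esac
}

for case_args in "push dev" "push prod" "workflow_dispatch dev" "workflow_dispatch prod"; do
  read -r event ref <<< "${case_args}"
  if configure_job_runs "${event}" "${ref}"; then
    echo "${event} on ${ref}: validate, then configure"
  else
    echo "${event} on ${ref}: validate only"
  fi
done
```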
31. Review and commit the Ansible change
git status
git diff --check
git diff --stat
git diff -- `
ansible `
kubernetes/common/cni `
.github/workflows/ansible-configure-control-planes.yml
Confirm:
- No private key, bootstrap token, certificate key, admin kubeconfig, or join command is committed.
- All package-installing roles force an APT cache refresh first.
- No ansible_default_ipv4 references remain.
- Calico uses 10.244.0.0/16, not 192.168.0.0/16.
- Kubernetes uses the VIP DNS endpoint on port 443.
- Only .202, .203, and .204 are control planes.
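Several items on this checklist can be enforced mechanically before git commit. The bash sketch below is a hypothetical heuristic and not a complete secret scanner. It inspects only added diff lines and flags private-key headers, literal certificate keys and bootstrap tokens, embedded kubeconfig key data, ansible_default_ipv4 references, and the Calico example CIDR. The patterns are assumptions chosen to match this checklist.

```bash
#!/usr/bin/env bash
# Hypothetical pre-commit heuristic: read a unified diff on stdin and fail
# if any added line matches a pattern this page forbids committing.
set -euo pipefail

scan_added_lines() {
  local findings
  # Keep added lines only ("+" prefix), skipping "+++" file headers.
  findings="$(grep -E '^\+[^+]' | grep -nE \
    -e 'BEGIN [A-Z ]*PRIVATE KEY' \
    -e '--certificate-key [0-9a-f]{16,}' \
    -e '--token [a-z0-9]{6}\.[a-z0-9]{16}' \
    -e 'client-key-data:' \
    -e 'ansible_default_ipv4' \
    -e '192\.168\.0\.0/16' || true)"
  if [ -n "${findings}" ]; then
    echo "Refusing to commit; review these added lines:" >&2
    printf '%s\n' "${findings}" >&2
    return 1
  fi
  echo "No forbidden content found in added lines."
}
```

From a bash shell such as Git Bash, git add the files and then run git diff --cached | scan_added_lines before committing. The Jinja placeholders in the join playbook, such as --certificate-key {{ ... }}, do not match the literal-key pattern.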
Commit and push:
git add `
ansible `
kubernetes/common/cni/calico-custom-resources.yaml `
.github/workflows/ansible-configure-control-planes.yml
git commit -m "Bootstrap shared Kubernetes control planes"
git push -u origin feature/bootstrap-k8s-control-planes
32. Create the Ansible pull request into dev
gh pr create `
--base dev `
--head feature/bootstrap-k8s-control-planes `
--title "Bootstrap shared Kubernetes control planes" `
--body "Adds the Ubuntu baseline, containerd, Kubernetes prerequisites, kubeadm HA bootstrap, Calico CNI, and additional control-plane joins."
Merge only after the inventory validation and all five playbook syntax checks succeed.
33. Promote the Ansible change from dev to prod
gh pr create `
--base prod `
--head dev `
--title "Bootstrap shared Kubernetes control planes" `
--body "Promotes the validated highly available Kubernetes control-plane configuration to prod."
After merge and environment approval, the workflow runs in this order:
- Refresh SSH host keys for the load balancer and all three control planes.
- Create the shared Ansible remote temporary directory.
- Verify Ansible connectivity to the three control planes.
- Apply the common Ubuntu baseline to the three control planes.
- Configure containerd and Kubernetes prerequisites.
- Bootstrap cicd-ac-k8s-cp-01.
- Install Calico.
- Join cicd-ac-k8s-cp-02.
- Join cicd-ac-k8s-cp-03.
- Wait for all three nodes to become Ready.
- Rebalance CoreDNS.
- Confirm /readyz returns ok.
- Display HAProxy runtime and backend statistics from the existing load balancer.
34. Manual cluster verification
Run from prod-terraform-deploy-02:
ssh \
-i ~/.ssh/id_ed25519_ansible \
-o IdentitiesOnly=yes \
acllc@192.168.8.202 \
'sudo bash -c "
export KUBECONFIG=/etc/kubernetes/admin.conf
echo === NODES ===
kubectl get nodes -o wide
echo === PODS ===
kubectl get pods -A
echo === API READINESS ===
kubectl get --raw=/readyz
echo === ETCD PODS ===
kubectl -n kube-system get pods -l component=etcd -o wide
"'
Verify HAProxy separately:
ssh \
-i ~/.ssh/id_ed25519_ansible \
-o IdentitiesOnly=yes \
acllc@192.168.8.201 \
'
command -v socat
sudo test -S /run/haproxy/admin.sock
printf "show stat\n" |
sudo socat - UNIX-CONNECT:/run/haproxy/admin.sock |
awk -F, "
NR == 1 ||
(\$1 == \"kubernetes_control_planes\" &&
\$2 ~ /^cicd-ac-k8s-cp-0[123]$/)
"
'
The three HAProxy server rows must report UP once their API servers are healthy.
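You can turn that visual check into a pass/fail result. The bash sketch below is a hypothetical helper that reads HAProxy show stat CSV on stdin. It assumes the backend and server names from the awk filter above, and it relies on status being the 18th CSV field in HAProxy's statistics output. The helper requires an exact UP, so transitional states such as "UP 1/3" count as failures.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only when cicd-ac-k8s-cp-01..03 in the
# kubernetes_control_planes backend all report status exactly "UP".
set -euo pipefail

all_control_planes_up() {
  awk -F, '
    $1 == "kubernetes_control_planes" && $2 ~ /^cicd-ac-k8s-cp-0[123]$/ && $18 == "UP" {
      up[$2] = 1
    }
    END {
      ok = 0
      for (i = 1; i <= 3; i++) {
        name = "cicd-ac-k8s-cp-0" i
        if (name in up) {
          ok++
        } else {
          print "NOT UP: " name > "/dev/stderr"
        }
      }
      exit (ok == 3 ? 0 : 1)
    }'
}
```

On the load balancer, pipe the same runtime query into it: printf "show stat\n" | sudo socat - UNIX-CONNECT:/run/haproxy/admin.sock | all_control_planes_up.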
35. Expected final state
Control-plane VMs:
cicd-ac-k8s-cp-01 192.168.8.202 Ready
cicd-ac-k8s-cp-02 192.168.8.203 Ready
cicd-ac-k8s-cp-03 192.168.8.204 Ready
Kubernetes API endpoint:
cicd-ac-k8s-api.aspireclan.com:443
192.168.8.200:443
HAProxy:
Service: active and enabled
Runtime socket: /run/haproxy/admin.sock
Stats client: socat installed
Browser stats: 127.0.0.1:8404/stats
Backends:
cicd-ac-k8s-cp-01 UP
cicd-ac-k8s-cp-02 UP
cicd-ac-k8s-cp-03 UP
Kubernetes:
Version: v1.36.1
Topology: stacked etcd
Control planes: 3
CNI: Calico v3.32.0
Pod CIDR: 10.244.0.0/16
Service CIDR: 10.96.0.0/12
API readyz: ok
Still pending:
12 worker VMs
worker joins, labels, and taints
shared services
ARC controller
tenant runner scale sets
36. Failure handling
Terraform proposes a load-balancer change
Stop. Do not apply. Correct the control-plane map or module references until the plan shows exactly three additions and nothing to change or destroy.
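A plan that touches anything beyond the three new VMs should block the promotion. The bash sketch below is a hypothetical guard on the plan summary line. It assumes the plan was run with -no-color and contains no import blocks, because Terraform adds an import count to the summary when imports are planned.

```bash
#!/usr/bin/env bash
# Hypothetical guard: read `terraform plan -no-color` output on stdin and
# succeed only when the summary is exactly three additions.
set -euo pipefail

plan_is_three_additions() {
  local summary
  summary="$(grep -E '^(Plan:|No changes\.)' || true)"
  if [ "${summary}" = "Plan: 3 to add, 0 to change, 0 to destroy." ]; then
    return 0
  fi
  echo "ERROR: unexpected plan summary: ${summary:-<none>}" >&2
  return 1
}
```

For example: terraform plan -no-color | tee plan.txt | plan_is_three_additions. This keeps the full plan in plan.txt for review.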
A control-plane VM has the wrong address
Check its Proxmox MAC and router reservation. Do not add a static Netplan address.
APT reports a package is unavailable
The roles already force apt update. Check DNS, internet access, Ubuntu sources, and the Kubernetes pkgs.k8s.io repository rather than modifying the template.
kubeadm init fails
Do not repeatedly run it manually. Inspect:
sudo journalctl -u kubelet -n 200 --no-pager
sudo crictl ps -a
sudo crictl logs <CONTAINER_ID>
sudo kubeadm config validate --config /etc/kubernetes/kubeadm-init-config.yaml
Correct the role or template in Git. Use kubeadm reset only as an explicitly reviewed recovery action.
The first node remains NotReady
Before Calico is installed, NotReady is expected. After the CNI workflow completes, inspect:
sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl get pods -A
sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl get tigerastatus
sudo journalctl -u kubelet -n 200 --no-pager
An additional control plane cannot join
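The uploaded control-plane certificates are deleted after two hours, and the bootstrap token created by the join playbook also expires after two hours. Rerun the approved join playbook. It creates a fresh token, re-uploads the certificates, and generates a new certificate key automatically.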
HAProxy still shows a backend as DOWN
On the affected control plane, verify:
sudo ss -lntp | grep ':6443'
sudo crictl ps | grep kube-apiserver
sudo journalctl -u kubelet -n 100 --no-pager
The containerd CRI health check fails even though containerd is active
Do not search for a combined dotted value such as io.containerd.cri.v1.runtime. The ctr plugins ls command prints the plugin type (for example io.containerd.cri.v1) and the plugin ID (for example runtime) in separate columns. Use the corrected AWK-based health check in the containerd role.
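The containerd role's actual check is not reproduced on this page. The bash sketch below is a minimal, hypothetical equivalent that shows the column-aware idea. It assumes the containerd 2.x ctr plugins ls layout of TYPE, ID, PLATFORMS, and STATUS, with STATUS in the last column.

```bash
#!/usr/bin/env bash
# Hypothetical column-aware CRI check: read `ctr plugins ls` output on stdin
# and succeed only when TYPE io.containerd.cri.v1 / ID runtime has STATUS ok.
set -euo pipefail

cri_runtime_plugin_ok() {
  awk '
    $1 == "io.containerd.cri.v1" && $2 == "runtime" { found = 1; status = $NF }
    END { exit !(found && status == "ok") }'
}
```

For example: sudo ctr plugins ls | cri_runtime_plugin_ok. A row whose first column is the dotted string io.containerd.cri.v1.runtime never matches, which is exactly the mistake described above.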
Inspect the real output:
sudo ctr plugins ls
sudo grep -nE '^disabled_plugins|SystemdCgroup' /etc/containerd/config.toml
sudo systemctl status containerd --no-pager
Ansible reports unsupported parameters for ansible.legacy.command
This means task keywords such as register, changed_when, retries, delay, or until were indented inside the ansible.builtin.shell block. Align them with the module name exactly as shown in this page.
The final HAProxy check reports socat: not found
Do not install it manually as the permanent fix. Rerun the updated 02-configure-load-balancer.yml playbook. The HAProxy role now installs both haproxy and socat, verifies the runtime socket, and calls the Runtime API. Run the playbook from the ansible directory:
ansible-playbook -i inventories/shared-k8s/hosts.ini --private-key ~/.ssh/id_ed25519_ansible playbooks/shared-k8s/02-configure-load-balancer.yml
37. Expected rebuild checkpoint after successful completion
Expected rebuild checkpoint after this page:

| Component | Status |
|---|---|
| Load-balancer VM provisioning | Verified prerequisite |
| HAProxy configuration | Verified prerequisite |
| Keepalived configuration | Verified prerequisite |
| API VIP 192.168.8.200 | Verified prerequisite |
| Control-plane Terraform definitions | Applied |
| Control-plane VM provisioning | Verified |
| Control-plane Ubuntu baseline | Applied |
| Kubernetes prerequisites | Applied |
| First control-plane bootstrap | Verified |
| CNI installation | Verified |
| Additional control-plane joins | Verified |
| Development workers | Next page |
| QA workers | Later page |
| Production workers | Later page |
| ARC controller | Later page |
| Tenant runner scale sets | Later page |

The next page should provision the four development worker VMs, join them to this cluster, and apply the approved environment=dev and workload=github-runner labels and taints.
38. Source consistency and rebuild validation criteria
Source consistency review for a from-scratch rebuild:
- Control-plane VM IDs, MAC addresses, reserved IPs, memory, disks, tags, and four-core sizing match the cleaned Terraform source.
- The reusable proxmox-vm-group module matches terraform/modules/proxmox-vm-group.
- The control-plane inventory entries, cluster variables, and control-plane port variables match the cleaned Ansible source for this phase.
- The common, containerd, kubernetes-common, and kubernetes-control-plane role blocks match the cleaned repository.
- The Calico custom resources use 10.244.0.0/16, VXLAN, and BGP disabled.
- The control-plane workflow uses component-specific path filters, validates on dev and prod pushes, and configures only from prod.
- The workflow does not use pull_request triggers.
- The load-balancer configuration is a prerequisite from the preceding page; this workflow does not recreate the load balancer.
- Worker inventory entries are intentionally omitted at this checkpoint and are added by the later worker pages.
- No statement in this page assumes that a previous live Kubernetes cluster still exists.
A successful rebuild must independently demonstrate:
- All three control-plane VMs are reachable through their approved DHCP reservations.
- containerd and CRI validation succeed on all three nodes.
- kubeadm initializes cicd-ac-k8s-cp-01.
- Calico becomes Available.
- cicd-ac-k8s-cp-02 and cicd-ac-k8s-cp-03 join successfully.
- All three control-plane nodes report Ready.
- /readyz returns ok through the shared API endpoint.
- HAProxy reports all three control-plane backend rows as UP through its runtime socket.
The source blocks on this page retain the corrected containerd plugin parsing, correct Ansible task-keyword indentation, and HAProxy management of socat. Treat the runtime checks as rebuild acceptance criteria rather than evidence that a previous cluster still exists.
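The criterion "All three control-plane nodes report Ready" can be checked directly from the node table. The bash sketch below is a hypothetical acceptance helper. It reads kubectl get nodes output on stdin and assumes STATUS is the second column, which holds for both the default and -o wide formats. A status such as Ready,SchedulingDisabled deliberately counts as not Ready.

```bash
#!/usr/bin/env bash
# Hypothetical acceptance check: succeed only when cicd-ac-k8s-cp-01, -02,
# and -03 all appear with STATUS exactly "Ready".
set -euo pipefail

control_planes_ready() {
  awk '
    $1 ~ /^cicd-ac-k8s-cp-0[123]$/ && $2 == "Ready" { ready[$1] = 1 }
    END {
      n = 0
      for (name in ready) n++
      if (n != 3) {
        printf "Ready control planes: %d of 3\n", n > "/dev/stderr"
        exit 1
      }
    }'
}
```

On cicd-ac-k8s-cp-01, for example: sudo KUBECONFIG=/etc/kubernetes/admin.conf kubectl get nodes | control_planes_ready.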