Provision and Join the Four Kubernetes Development Workers

This is the fifth infrastructure page in the from-scratch build sequence. Follow it after the control-plane page has produced a healthy three-node Kubernetes control plane; it provisions the four development ARC worker VMs.

FROM-SCRATCH SEQUENCE CHECKPOINT

Required before this page
  Load-balancer VM, HAProxy, Keepalived, and API VIP configured
  Three control-plane VMs provisioned and reachable
  Kubernetes v1.36.1 control plane bootstrapped
  Calico v3.32.0 installed
  All three control planes Ready

Implemented by this page
  Development worker Terraform definitions
  Four development worker VM provisions
  Ubuntu baseline and Kubernetes node preparation
  Development worker joins
  Development worker labels and taints

Implemented by later pages
  QA workers
  Production workers
  ARC controller
  Tenant runner scale sets

1. Scope and execution order

This page is performed in two separate promotions:

  1. Terraform promotion: create only the four development worker VMs.
  2. Ansible promotion: after all four VMs are reachable, configure Ubuntu, containerd, Kubernetes prerequisites, join the workers, and apply the approved development labels and taints.

Do not combine these two changes into the same production promotion. The Ansible workflow must not start until Terraform has created all four VMs and DHCP has assigned the approved addresses.

The approved branch model is:

feature/*
    ↓ pull request
   dev
    ↓ validation and Terraform plan only
 dev → prod pull request
    ↓ review and shared-k8s approval
   prod
    ↓ Terraform apply or Ansible configuration

Not used by this infrastructure execution flow:
local, qa, main

Branch rule

dev performs validation and Terraform plan only. prod performs Terraform apply or Ansible configuration. Do not use local or qa in this infrastructure execution flow.

2. Approved development worker allocation

cicd-ac-k8s-dev-wk-01
  VM ID:       3156213
  MAC:         aa:bb:cc:06:0f:01
  Reserved IP: 192.168.8.213
  CPU:         4 vCPU
  RAM:         16384 MB
  Disk:        scsi0, 250G, local-lvm

cicd-ac-k8s-dev-wk-02
  VM ID:       3156214
  MAC:         aa:bb:cc:06:0f:02
  Reserved IP: 192.168.8.214
  CPU:         4 vCPU
  RAM:         16384 MB
  Disk:        scsi0, 250G, local-lvm

cicd-ac-k8s-dev-wk-03
  VM ID:       3156215
  MAC:         aa:bb:cc:06:0f:03
  Reserved IP: 192.168.8.215
  CPU:         4 vCPU
  RAM:         16384 MB
  Disk:        scsi0, 250G, local-lvm

cicd-ac-k8s-dev-wk-04
  VM ID:       3156216
  MAC:         aa:bb:cc:06:0f:04
  Reserved IP: 192.168.8.216
  CPU:         4 vCPU
  RAM:         16384 MB
  Disk:        scsi0, 250G, local-lvm

Shared values
  Template:    tmplt-ub-26-min-base / VM ID 90000
  Node:        pve
  Bridge:      vmbr0
  Environment: dev
  Workload:    github-runner

The environment-specific address ranges are intentionally not ordered dev-first:

Approved worker address order
  Production workers:  192.168.8.205-.208
  QA workers:          192.168.8.209-.212
  Development workers: 192.168.8.213-.216

Development workers therefore use VM IDs 3156213 through 3156216, so each VM ID ends in the last octet of its reserved IP.

Confirm the router has all four DHCP reservations before running Terraform. Ubuntu must continue using DHCP; do not configure static Netplan addresses.

Resource and identity lock

Do not change the approved VM IDs, MAC addresses, IP reservations, scsi0 disk slot, local-lvm storage, or development environment assignment while following this page.
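
The allocation above follows a regular pattern, so a quick cross-check of the four records can be scripted. This is a minimal sketch, assuming the pattern is intentional: the page lists the values but does not state the rule. The function name `dev_worker_allocation` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the approved allocation for development worker N (1-4).
# Assumption: VM ID = 3156000 + last IP octet, and the MAC suffix is the worker index.
# Both match the table above but are not stated as a rule on this page.
dev_worker_allocation() {
  local index="$1"
  if ! [[ "${index}" =~ ^[1-4]$ ]]; then
    echo "ERROR: worker index must be 1-4" >&2
    return 1
  fi
  local octet=$((212 + index))
  printf 'cicd-ac-k8s-dev-wk-%02d vmid=%d mac=aa:bb:cc:06:0f:%02x ip=192.168.8.%d\n' \
    "${index}" "$((3156000 + octet))" "${index}" "${octet}"
}

# Print all four records for comparison with the router reservations.
for i in 1 2 3 4; do
  dev_worker_allocation "${i}"
done
```

Compare the printed lines with the DHCP reservations on the router before running Terraform. The approved table remains the source of truth if the two ever disagree.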


Part A — Provision the Development Worker VMs with Terraform

3. Terraform files changed

terraform/stacks/shared-k8s/main.tf
terraform/stacks/shared-k8s/outputs.tf

The proxmox-vm and proxmox-vm-group modules created by the preceding pages remain unchanged. This page adds a new development-worker map to the existing shared Kubernetes stack.

4. Create the Terraform feature branch from dev

Run from Windows PowerShell:

cd D:\code\ASPIRECLAN-LLC-Org\ac-cicd-infra

git switch dev
git pull --ff-only origin dev

git switch -c feature/provision-k8s-dev-workers

5. Extend terraform/stacks/shared-k8s/main.tf

Do not replace or edit the completed load-balancer and control-plane definitions. Append this block after the existing control-plane module:

locals {
  dev_workers = {
    dev_wk01 = {
      name          = "cicd-ac-k8s-dev-wk-01"
      description   = "Aspireclan shared Kubernetes development ARC worker 01"
      vmid          = 3156213
      target_node   = "pve"
      template_name = "tmplt-ub-26-min-base"
      cores         = 4
      memory_mb     = 16384
      disk_size     = "250G"
      storage       = "local-lvm"
      bridge        = "vmbr0"
      mac_address   = "aa:bb:cc:06:0f:01"
      tags = [
        "ac-cicd",
        "shared-k8s",
        "worker",
        "dev",
        "arc-runner",
        "terraform",
        "ansible",
      ]
    }

    dev_wk02 = {
      name          = "cicd-ac-k8s-dev-wk-02"
      description   = "Aspireclan shared Kubernetes development ARC worker 02"
      vmid          = 3156214
      target_node   = "pve"
      template_name = "tmplt-ub-26-min-base"
      cores         = 4
      memory_mb     = 16384
      disk_size     = "250G"
      storage       = "local-lvm"
      bridge        = "vmbr0"
      mac_address   = "aa:bb:cc:06:0f:02"
      tags = [
        "ac-cicd",
        "shared-k8s",
        "worker",
        "dev",
        "arc-runner",
        "terraform",
        "ansible",
      ]
    }

    dev_wk03 = {
      name          = "cicd-ac-k8s-dev-wk-03"
      description   = "Aspireclan shared Kubernetes development ARC worker 03"
      vmid          = 3156215
      target_node   = "pve"
      template_name = "tmplt-ub-26-min-base"
      cores         = 4
      memory_mb     = 16384
      disk_size     = "250G"
      storage       = "local-lvm"
      bridge        = "vmbr0"
      mac_address   = "aa:bb:cc:06:0f:03"
      tags = [
        "ac-cicd",
        "shared-k8s",
        "worker",
        "dev",
        "arc-runner",
        "terraform",
        "ansible",
      ]
    }

    dev_wk04 = {
      name          = "cicd-ac-k8s-dev-wk-04"
      description   = "Aspireclan shared Kubernetes development ARC worker 04"
      vmid          = 3156216
      target_node   = "pve"
      template_name = "tmplt-ub-26-min-base"
      cores         = 4
      memory_mb     = 16384
      disk_size     = "250G"
      storage       = "local-lvm"
      bridge        = "vmbr0"
      mac_address   = "aa:bb:cc:06:0f:04"
      tags = [
        "ac-cicd",
        "shared-k8s",
        "worker",
        "dev",
        "arc-runner",
        "terraform",
        "ansible",
      ]
    }
  }
}

module "dev_workers" {
  source = "../../modules/proxmox-vm-group"

  vms = local.dev_workers
}

The four workers use the approved modified sizing of 4 vCPU, 16 GB RAM, and 250 GB disk per VM.

6. Extend terraform/stacks/shared-k8s/outputs.tf

Append:

output "dev_workers" {
  description = "Shared Kubernetes development worker VMs."

  value = {
    dev_wk01 = merge(module.dev_workers.vms["dev_wk01"], {
      reserved_ip = "192.168.8.213"
      mac_address = "aa:bb:cc:06:0f:01"
      environment = "dev"
    })
    dev_wk02 = merge(module.dev_workers.vms["dev_wk02"], {
      reserved_ip = "192.168.8.214"
      mac_address = "aa:bb:cc:06:0f:02"
      environment = "dev"
    })
    dev_wk03 = merge(module.dev_workers.vms["dev_wk03"], {
      reserved_ip = "192.168.8.215"
      mac_address = "aa:bb:cc:06:0f:03"
      environment = "dev"
    })
    dev_wk04 = merge(module.dev_workers.vms["dev_wk04"], {
      reserved_ip = "192.168.8.216"
      mac_address = "aa:bb:cc:06:0f:04"
      environment = "dev"
    })
  }
}

7. Confirm the existing Terraform workflow contract

The Terraform workflows created by the preceding pages must implement:

Terraform plan workflow
  push branch:           dev
  manual dispatch:       supported
  pull_request trigger:  not used
  action:                fmt, init, validate, plan

Terraform apply workflow
  push branch:           prod
  manual dispatch:       supported from prod
  action:                fmt, init, validate, saved plan, apply

Persistent state
  /var/lib/ac-cicd-infra/terraform-state/shared-k8s/terraform.tfstate

No new workflow is required for this Terraform change. The current path filters cover the shared stack.

8. Review and commit the Terraform change

If you are still in the Windows PowerShell session from step 4, note that the multi-line commands on this page use Bash line continuations (\). Run them from a Bash shell, enter each command on one line, or replace each trailing \ with a PowerShell backtick (`).

git status
git diff --check
git diff --stat

git diff -- \
  terraform/stacks/shared-k8s/main.tf \
  terraform/stacks/shared-k8s/outputs.tf

Confirm:

  • The load-balancer and three control-plane definitions created by the preceding pages remain unchanged.
  • Exactly four VM additions are present.
  • VM IDs are 3156213 through 3156216.
  • MAC addresses are aa:bb:cc:06:0f:01 through aa:bb:cc:06:0f:04.
  • Each worker uses 4 cores, 16384 MB RAM, and a 250G scsi0 disk on local-lvm.
  • No Terraform state, Proxmox token, kubeconfig, join command, or private SSH key is staged.

Commit and push:

git add \
  terraform/stacks/shared-k8s/main.tf \
  terraform/stacks/shared-k8s/outputs.tf

git commit -m "Provision shared Kubernetes development workers"

git push -u origin feature/provision-k8s-dev-workers

9. Create the Terraform pull request into dev

gh pr create \
  --base dev \
  --head feature/provision-k8s-dev-workers \
  --title "Provision shared Kubernetes development workers" \
  --body "Adds the four approved development worker VMs to the shared Kubernetes Terraform stack."

After merge, the dev plan must end with:

Plan: 4 to add, 0 to change, 0 to destroy.

Terraform stop conditions

Do not promote to prod when Terraform proposes any update, replacement, or deletion of the load balancer or control planes, or anything other than the four approved development workers.
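
The stop condition can be checked mechanically against captured plan output. This is a minimal sketch, assuming the plan text was produced with `-no-color` so the summary line contains no ANSI escape codes. The function name `plan_is_four_additions` is hypothetical and is not part of the existing workflows.

```shell
#!/usr/bin/env bash
# Hypothetical gate: succeeds only when the plan output contains the exact
# approved summary line. Any other add/change/destroy count fails the check.
plan_is_four_additions() {
  local plan_output="$1"
  grep -Fxq 'Plan: 4 to add, 0 to change, 0 to destroy.' <<<"${plan_output}"
}

# Example with a captured summary line.
if plan_is_four_additions 'Plan: 4 to add, 0 to change, 0 to destroy.'; then
  echo "Plan matches the approved shape."
else
  echo "STOP: plan does not match the approved shape."
fi
```

The check covers only the summary line. A reviewer still has to confirm that the four additions are the approved development workers.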

10. Promote the Terraform change from dev to prod

gh pr create \
  --base prod \
  --head dev \
  --title "Provision shared Kubernetes development workers" \
  --body "Promotes the validated four-development-worker Terraform plan to prod."

After merge and shared-k8s environment approval, the production workflow applies the saved plan and writes the new worker resources to the existing persistent Terraform state.

11. Verify the worker VMs in Proxmox

Run on the Proxmox host:

qm status 3156213
qm config 3156213

qm status 3156214
qm config 3156214

qm status 3156215
qm config 3156215

qm status 3156216
qm config 3156216

For every VM, confirm:

  • Status is running.
  • CPU is 4 cores.
  • RAM is 16384 MB.
  • scsi0 is on local-lvm with size 250G.
  • The expected MAC address is present.
  • onboot remains enabled.
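
The per-VM checklist can also be applied to `qm config` output with a small filter. This is a sketch only, assuming the usual `key: value` layout of `qm config` and a `net0` entry of the form `<model>=<MAC>,bridge=...`. The function name `check_worker_config` is hypothetical. The running status still comes from `qm status`.

```shell
#!/usr/bin/env bash
# Hypothetical checker: reads `qm config <vmid>` output on stdin and compares it
# with the approved sizing. The MAC comparison is case-insensitive because
# Proxmox commonly prints MAC addresses in upper case.
check_worker_config() {
  local expected_mac="$1" config
  config="$(cat)"
  grep -Eq '^cores: 4$' <<<"${config}" || { echo "cores mismatch" >&2; return 1; }
  grep -Eq '^memory: 16384$' <<<"${config}" || { echo "memory mismatch" >&2; return 1; }
  grep -Eq '^scsi0: local-lvm:.*size=250G' <<<"${config}" || { echo "scsi0 mismatch" >&2; return 1; }
  grep -Eiq "^net0: .*=${expected_mac}(,|\$)" <<<"${config}" || { echo "MAC mismatch" >&2; return 1; }
  grep -Eq '^onboot: 1$' <<<"${config}" || { echo "onboot mismatch" >&2; return 1; }
  echo "ok"
}

# Usage on the Proxmox host (not executed here):
#   qm config 3156213 | check_worker_config aa:bb:cc:06:0f:01
```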

12. Verify DHCP, SSH, and sudo

Run from prod-terraform-deploy-02:

for ip in 213 214 215 216; do
  echo "=== 192.168.8.${ip} ==="

  ping -c 2 -W 2 "192.168.8.${ip}"

  ssh \
    -i ~/.ssh/id_ed25519_ansible \
    -o IdentitiesOnly=yes \
    -o BatchMode=yes \
    -o ConnectTimeout=10 \
    "acllc@192.168.8.${ip}" \
    'hostnamectl --static; ip -brief address; sudo -n whoami'
done

Expected before Ansible:

  • .213 through .216 respond.
  • The Ansible automation key authenticates as acllc.
  • sudo -n whoami returns root.
  • The Ubuntu hostname may still show the base-template hostname.

Stop here until all four workers pass SSH and passwordless sudo checks.


Part B — Configure and Join the Development Workers with Ansible

13. Ansible files changed

ansible/inventories/shared-k8s/hosts.ini
ansible/inventories/shared-k8s/group_vars/control_planes.yml
ansible/inventories/shared-k8s/group_vars/dev_workers.yml
ansible/roles/kubernetes-common/tasks/main.yml
ansible/roles/kubernetes-worker/defaults/main.yml
ansible/roles/kubernetes-worker/tasks/main.yml
ansible/playbooks/shared-k8s/03-prepare-kubernetes-nodes.yml
ansible/playbooks/shared-k8s/07-join-dev-workers.yml
ansible/playbooks/shared-k8s/08-label-and-taint-workers.yml
.github/workflows/ansible-configure-dev-workers.yml

The common, containerd, and Kubernetes roles carried forward from the control-plane page retain these required fixes:

  • Forced APT cache refresh before package installation.
  • Package-install retries.
  • ansible_facts access instead of deprecated injected fact variables.
  • /var/tmp/ansible-acllc as the remote temporary directory.
  • Containerd 2.x-compatible CRI plugin validation.
  • Correct task-level indentation for register, retries, delay, and until.
  • Explicit crictl runtime and image endpoints.

14. Create the Ansible feature branch from dev

Create this branch only after the production Terraform apply has completed successfully:

cd D:\code\ASPIRECLAN-LLC-Org\ac-cicd-infra

git switch dev
git pull --ff-only origin dev

git switch -c feature/configure-k8s-dev-workers

15. Replace the shared inventory

Replace ansible/inventories/shared-k8s/hosts.ini with:

[load_balancers]
cicd-ac-k8s-lb-01 ansible_host=192.168.8.201 ansible_user=acllc node_primary_ip=192.168.8.201 node_interface=ens18

[first_control_plane]
cicd-ac-k8s-cp-01 ansible_host=192.168.8.202 ansible_user=acllc node_primary_ip=192.168.8.202 node_interface=ens18

[additional_control_planes]
cicd-ac-k8s-cp-02 ansible_host=192.168.8.203 ansible_user=acllc node_primary_ip=192.168.8.203 node_interface=ens18
cicd-ac-k8s-cp-03 ansible_host=192.168.8.204 ansible_user=acllc node_primary_ip=192.168.8.204 node_interface=ens18

[control_planes:children]
first_control_plane
additional_control_planes

[dev_workers]
cicd-ac-k8s-dev-wk-01 ansible_host=192.168.8.213 ansible_user=acllc node_primary_ip=192.168.8.213 node_interface=ens18
cicd-ac-k8s-dev-wk-02 ansible_host=192.168.8.214 ansible_user=acllc node_primary_ip=192.168.8.214 node_interface=ens18
cicd-ac-k8s-dev-wk-03 ansible_host=192.168.8.215 ansible_user=acllc node_primary_ip=192.168.8.215 node_interface=ens18
cicd-ac-k8s-dev-wk-04 ansible_host=192.168.8.216 ansible_user=acllc node_primary_ip=192.168.8.216 node_interface=ens18

[qa_workers]

[prod_workers]

[workers:children]
dev_workers
qa_workers
prod_workers

[k8s_cluster:children]
control_planes
workers

[all:vars]
ansible_python_interpreter=/usr/bin/python3

This allocation intentionally places development workers at .213-.216. QA and production groups remain empty until their own pages are completed.
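
Before committing, the group membership can be read back from the file as a quick sanity check. This is a minimal sketch that parses the INI text directly; `ansible-inventory --graph` in the workflow remains the authoritative validation. The function name `hosts_in_group` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints "<host> <ansible_host>" for one INI group read from stdin.
hosts_in_group() {
  local group="$1"
  awk -v section="[${group}]" '
    /^\[/ { in_group = ($0 == section); next }   # a new section header starts or ends the group
    in_group && NF {
      ip = ""
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^ansible_host=/) { ip = substr($i, 14) }   # strip the "ansible_host=" prefix
      }
      print $1, ip
    }
  '
}

# Usage from the repository root (not executed here):
#   hosts_in_group dev_workers < ansible/inventories/shared-k8s/hosts.ini
```

For the inventory above, the dev_workers group should list four hosts at .213 through .216, and the qa_workers and prod_workers groups should print nothing.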

16. Generalize node firewall variables

Replace ansible/inventories/shared-k8s/group_vars/control_planes.yml with:

---
kubernetes_node_tcp_ports:
  - "6443"
  - "2379:2380"
  - "10250"
  - "10257"
  - "10259"

calico_node_tcp_ports:
  - "5473"

calico_node_udp_ports:
  - "4789"

Create ansible/inventories/shared-k8s/group_vars/dev_workers.yml:

---
worker_environment: dev
worker_workload: github-runner

kubernetes_node_tcp_ports:
  - "10250"

calico_node_tcp_ports:
  - "5473"

calico_node_udp_ports:
  - "4789"

worker_labels:
  environment: dev
  workload: github-runner

worker_taints:
  - key: environment
    value: dev
    effect: NoSchedule

UFW remains disabled unless a later hardening phase enables it. These rules are pre-created so later firewall activation does not break kubelet or Calico communication.

17. Update the generic Kubernetes firewall tasks

In ansible/roles/kubernetes-common/tasks/main.yml, replace the three existing control-plane-specific UFW tasks with:

- name: Allow approved Kubernetes node TCP ports through UFW
  ansible.builtin.command:
    cmd: >-
      ufw allow from 192.168.8.0/22 to any port {{ item }} proto tcp
  loop: "{{ kubernetes_node_tcp_ports | default([]) }}"
  register: kubernetes_ufw_tcp_rules
  changed_when: "'Rule added' in kubernetes_ufw_tcp_rules.stdout"

- name: Allow Calico node TCP ports through UFW
  ansible.builtin.command:
    cmd: >-
      ufw allow from 192.168.8.0/22 to any port {{ item }} proto tcp
  loop: "{{ calico_node_tcp_ports | default([]) }}"
  register: calico_ufw_tcp_rules
  changed_when: "'Rule added' in calico_ufw_tcp_rules.stdout"

- name: Allow Calico node UDP ports through UFW
  ansible.builtin.command:
    cmd: >-
      ufw allow from 192.168.8.0/22 to any port {{ item }} proto udp
  loop: "{{ calico_node_udp_ports | default([]) }}"
  register: calico_ufw_udp_rules
  changed_when: "'Rule added' in calico_ufw_udp_rules.stdout"

Keep every other task in the role unchanged, including the corrected containerd/CRI validation and explicit crictl endpoint verification.
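
To see what the three looped tasks expand to on a worker, the commands can be rendered without running them. This is an illustrative dry run using the port lists from dev_workers.yml. The function name `render_ufw_rules` is hypothetical, and nothing here touches UFW.

```shell
#!/usr/bin/env bash
# Hypothetical dry run: prints the ufw command each loop item would produce.
# Nothing is executed; the real rules are applied by the Ansible role.
render_ufw_rules() {
  local proto="$1"; shift
  local port
  for port in "$@"; do
    echo "ufw allow from 192.168.8.0/22 to any port ${port} proto ${proto}"
  done
}

# Worker values: kubelet and Calico over TCP, then Calico VXLAN over UDP.
render_ufw_rules tcp 10250 5473
render_ufw_rules udp 4789
```

A control plane would render the longer TCP list from control_planes.yml through the same tasks.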

18. Generalize the Kubernetes preparation playbook

Replace ansible/playbooks/shared-k8s/03-prepare-kubernetes-nodes.yml with:

---
- name: Prepare Kubernetes nodes
  hosts: k8s_cluster
  become: true
  gather_facts: true

  roles:
    - role: containerd
    - role: kubernetes-common

The worker workflow uses --limit dev_workers, so this playbook prepares only the four new workers during this phase.

19. Implement the Kubernetes worker role

19.1 Create or replace ansible/roles/kubernetes-worker/defaults/main.yml

---
worker_join_command: ""

19.2 Create or replace ansible/roles/kubernetes-worker/tasks/main.yml

---
- name: Confirm a generated worker join command is available
  ansible.builtin.assert:
    that:
      - worker_join_command | length > 0
    fail_msg: "The first control plane did not provide a worker join command."
  no_log: true

- name: Check whether this worker is already joined
  ansible.builtin.stat:
    path: /etc/kubernetes/kubelet.conf
  register: worker_kubelet_config

- name: Join this node as a Kubernetes worker
  ansible.builtin.command:
    cmd: >-
      {{ worker_join_command }}
      --node-name {{ inventory_hostname }}
      --cri-socket {{ kubernetes_cri_socket }}
    creates: /etc/kubernetes/kubelet.conf
  when: not worker_kubelet_config.stat.exists
  no_log: true

- name: Enable and start kubelet
  ansible.builtin.service:
    name: kubelet
    enabled: true
    state: started

- name: Wait for the kubelet configuration to exist
  ansible.builtin.wait_for:
    path: /etc/kubernetes/kubelet.conf
    timeout: 300

The creates: /etc/kubernetes/kubelet.conf guard makes normal reruns safe. A worker that has already joined is not joined again.
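
The guard behaves like a run-only-if-absent wrapper. The sketch below imitates that behavior in plain shell for illustration, with a temporary file standing in for /etc/kubernetes/kubelet.conf. The function name `run_unless_exists` is hypothetical; the real role relies on ansible.builtin.command for this.

```shell
#!/usr/bin/env bash
# Hypothetical illustration of the `creates` guard: run a command only while
# the guard file is absent, and report what happened.
run_unless_exists() {
  local guard_file="$1"; shift
  if [ -e "${guard_file}" ]; then
    echo "skipped"
    return 0
  fi
  "$@" && echo "ran"
}

# Demo: the first call creates the guard file, so the second call is skipped.
demo_dir="$(mktemp -d)"
run_unless_exists "${demo_dir}/kubelet.conf" touch "${demo_dir}/kubelet.conf"   # prints "ran"
run_unless_exists "${demo_dir}/kubelet.conf" touch "${demo_dir}/kubelet.conf"   # prints "skipped"
rm -rf "${demo_dir}"
```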

20. Add the worker-join playbook

Create or replace ansible/playbooks/shared-k8s/07-join-dev-workers.yml:

---
- name: Generate a fresh Kubernetes worker join command
  hosts: first_control_plane
  become: true
  gather_facts: false

  tasks:
    - name: Confirm the Kubernetes API is ready
      ansible.builtin.command:
        cmd: kubectl --kubeconfig=/etc/kubernetes/admin.conf get --raw=/readyz
      register: worker_join_api_ready
      changed_when: false
      retries: 12
      delay: 10
      until: worker_join_api_ready.stdout | trim == "ok"

    - name: Generate a fresh worker bootstrap-token join command
      ansible.builtin.command:
        cmd: kubeadm token create --ttl 2h --print-join-command
      register: generated_worker_join_command
      changed_when: true
      no_log: true

    - name: Store the temporary worker join command in memory
      ansible.builtin.set_fact:
        shared_worker_join_command: "{{ generated_worker_join_command.stdout }}"
      no_log: true

- name: Join the development workers one at a time
  hosts: dev_workers
  serial: 1
  become: true
  gather_facts: true

  vars:
    worker_join_command: >-
      {{ hostvars[groups['first_control_plane'][0]].shared_worker_join_command }}

  roles:
    - role: kubernetes-worker

- name: Verify all development workers are Ready
  hosts: first_control_plane
  become: true
  gather_facts: false

  environment:
    KUBECONFIG: /etc/kubernetes/admin.conf

  tasks:
    - name: Wait for each development worker to become Ready
      ansible.builtin.command:
        cmd: >-
          kubectl wait
          --for=condition=Ready
          node/{{ item }}
          --timeout=15m
      loop: "{{ groups['dev_workers'] }}"
      changed_when: false

    - name: Display the joined development workers
      ansible.builtin.command:
        cmd: kubectl get nodes -o wide
      register: joined_dev_workers
      changed_when: false

    - name: Print the development worker table
      ansible.builtin.debug:
        var: joined_dev_workers.stdout_lines

The join token exists only in Ansible memory, is hidden from logs, expires after two hours, and is never committed to Git.
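
The role only asserts that the join command is non-empty. A stricter shape check is sketched below, assuming the usual `kubeadm token create --print-join-command` output of endpoint, `--token`, and `--discovery-token-ca-cert-hash`. The function name `join_command_looks_valid` is hypothetical, and the endpoint, token, and hash in the demo are placeholders, not real values.

```shell
#!/usr/bin/env bash
# Hypothetical shape check for a kubeadm worker join command.
# It validates structure only and never prints the command itself.
join_command_looks_valid() {
  local cmd="$1"
  local pattern='^kubeadm join [^[:space:]]+:[0-9]+ --token [a-z0-9]{6}\.[a-z0-9]{16} --discovery-token-ca-cert-hash sha256:[0-9a-f]{64}[[:space:]]*$'
  [[ "${cmd}" =~ ${pattern} ]]
}

# Demo with placeholder values only (dummy endpoint, dummy token, all-zero hash).
sample="kubeadm join k8s-api.example.internal:6443 --token abcdef.0123456789abcdef --discovery-token-ca-cert-hash sha256:$(printf '%064d' 0)"
join_command_looks_valid "${sample}" && echo "shape ok"
```

If a check like this were ever added to the role, it would need the same no_log treatment as the existing tasks so the token does not reach the job log.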

21. Add the development labels-and-taints playbook

Create or replace ansible/playbooks/shared-k8s/08-label-and-taint-workers.yml:

---
- name: Apply development worker labels and taints
  hosts: first_control_plane
  become: true
  gather_facts: false

  environment:
    KUBECONFIG: /etc/kubernetes/admin.conf

  tasks:
    - name: Apply the approved development worker labels
      ansible.builtin.command:
        cmd: >-
          kubectl label node {{ item }}
          environment={{ hostvars[item].worker_environment | default('dev') }}
          workload={{ hostvars[item].worker_workload | default('github-runner') }}
          --overwrite
      loop: "{{ groups['dev_workers'] }}"
      register: dev_worker_label_results
      changed_when: "'not labeled' not in dev_worker_label_results.stdout"

    - name: Apply the approved development worker taint
      ansible.builtin.command:
        cmd: >-
          kubectl taint node {{ item }}
          environment=dev:NoSchedule
          --overwrite
      loop: "{{ groups['dev_workers'] }}"
      register: dev_worker_taint_results
      changed_when: "'not tainted' not in dev_worker_taint_results.stdout"

    - name: Display development worker labels
      ansible.builtin.command:
        cmd: >-
          kubectl get nodes
          -l environment=dev,workload=github-runner
          -L environment,workload
          -o wide
      register: labeled_dev_workers
      changed_when: false

    - name: Print labeled development workers
      ansible.builtin.debug:
        var: labeled_dev_workers.stdout_lines

    - name: Verify the development taint on every worker
      ansible.builtin.shell:
        executable: /bin/bash
        cmd: |
          set -euo pipefail

          kubectl get node {{ item }} \
            -o jsonpath='{range .spec.taints[*]}{.key}={.value}:{.effect}{"\n"}{end}' |
            grep -Fx 'environment=dev:NoSchedule'
      loop: "{{ groups['dev_workers'] }}"
      changed_when: false

Every development worker receives:

environment=dev
workload=github-runner

environment=dev:NoSchedule

The future development ARC runner scale sets must use a matching node selector and toleration.
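
For reference, the matching scheduling constraints look like the pod-spec fragment printed below. This is a sketch only: the field names are standard Kubernetes pod-spec fields, but where the fragment belongs in the ARC scale-set values is defined on a later page and is not assumed here. The function name `print_dev_scheduling_fragment` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints a pod-spec fragment that selects the development
# workers by label and tolerates the environment=dev:NoSchedule taint.
print_dev_scheduling_fragment() {
  cat <<'EOF'
nodeSelector:
  environment: dev
  workload: github-runner
tolerations:
  - key: environment
    operator: Equal
    value: dev
    effect: NoSchedule
EOF
}

print_dev_scheduling_fragment
```

The node selector keeps development runners on these four workers, and the toleration lets them schedule despite the taint. Pods without the toleration stay off the development workers.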

22. Add the development-worker GitHub Actions workflow

Create .github/workflows/ansible-configure-dev-workers.yml:

name: Ansible Configure - Kubernetes Development Workers

on:
  push:
    branches:
      - dev
      - prod
    paths:
      - "ansible/inventories/shared-k8s/group_vars/dev_workers.yml"
      - "ansible/playbooks/shared-k8s/07-join-dev-workers.yml"
      - "ansible/playbooks/shared-k8s/08-label-and-taint-workers.yml"
      - ".github/workflows/ansible-configure-dev-workers.yml"

  workflow_dispatch:

permissions:
  contents: read

concurrency:
  group: shared-k8s-ansible
  cancel-in-progress: false

env:
  ANSIBLE_CONFIG: ${{ github.workspace }}/ansible/ansible.cfg

jobs:
  validate:
    name: Validate development-worker Ansible configuration
    runs-on:
      - self-hosted
      - Linux
      - X64
      - prod
      - terraform
      - deploy
      - ac-cicd-infra

    steps:
      - name: Checkout repository
        uses: actions/checkout@v5

      - name: Verify Ansible
        shell: bash
        run: |
          set -euo pipefail
          ansible --version
          ansible-playbook --version

      - name: Validate the shared inventory
        working-directory: ansible
        shell: bash
        run: |
          set -euo pipefail
          ansible-inventory -i inventories/shared-k8s/hosts.ini --graph

      - name: Syntax-check the development-worker playbooks
        working-directory: ansible
        shell: bash
        run: |
          set -euo pipefail

          for playbook in \
            playbooks/shared-k8s/01-common-baseline.yml \
            playbooks/shared-k8s/03-prepare-kubernetes-nodes.yml \
            playbooks/shared-k8s/07-join-dev-workers.yml \
            playbooks/shared-k8s/08-label-and-taint-workers.yml
          do
            ansible-playbook \
              -i inventories/shared-k8s/hosts.ini \
              "${playbook}" \
              --syntax-check
          done

  configure:
    name: Configure and join development workers
    needs:
      - validate

    if: >-
      (github.event_name == 'push' && github.ref_name == 'prod') ||
      (github.event_name == 'workflow_dispatch' && github.ref_name == 'prod')

    environment:
      name: shared-k8s

    runs-on:
      - self-hosted
      - Linux
      - X64
      - prod
      - terraform
      - deploy
      - ac-cicd-infra

    timeout-minutes: 180

    steps:
      - name: Checkout repository
        uses: actions/checkout@v5

      - name: Verify the production branch
        shell: bash
        run: |
          set -euo pipefail

          if [ "${GITHUB_REF_NAME}" != "prod" ]; then
            echo "ERROR: Development-worker configuration is permitted only from prod."
            exit 1
          fi

      - name: Prepare the existing Ansible SSH key
        shell: bash
        run: |
          set -euo pipefail

          KEY_PATH="${HOME}/.ssh/id_ed25519_ansible"

          if [ ! -f "${KEY_PATH}" ]; then
            echo "ERROR: Missing Ansible key: ${KEY_PATH}"
            exit 1
          fi

          chmod 600 "${KEY_PATH}"
          echo "ANSIBLE_PRIVATE_KEY_FILE=${KEY_PATH}" >> "${GITHUB_ENV}"

      - name: Refresh development-worker SSH host keys
        shell: bash
        run: |
          set -euo pipefail

          mkdir -p "${HOME}/.ssh"
          chmod 700 "${HOME}/.ssh"
          touch "${HOME}/.ssh/known_hosts"
          chmod 600 "${HOME}/.ssh/known_hosts"

          for ip in 192.168.8.213 192.168.8.214 192.168.8.215 192.168.8.216; do
            ssh-keygen -f "${HOME}/.ssh/known_hosts" -R "${ip}" || true

            captured=false
            for attempt in $(seq 1 60); do
              if ssh-keyscan -T 5 -H "${ip}" >> "${HOME}/.ssh/known_hosts" 2>/dev/null; then
                echo "SSH host key captured for ${ip}."
                captured=true
                break
              fi

              echo "Waiting for SSH on ${ip} (attempt ${attempt}/60)..."
              sleep 10
            done

            if [ "${captured}" != "true" ]; then
              echo "ERROR: Unable to capture SSH host key for ${ip}."
              exit 1
            fi
          done

      - name: Prepare the Ansible remote temporary directory
        shell: bash
        run: |
          set -euo pipefail

          for ip in 192.168.8.213 192.168.8.214 192.168.8.215 192.168.8.216; do
            ssh \
              -i "${ANSIBLE_PRIVATE_KEY_FILE}" \
              -o IdentitiesOnly=yes \
              -o BatchMode=yes \
              "acllc@${ip}" \
              'sudo install -d -m 0700 -o acllc -g acllc /var/tmp/ansible-acllc'
          done

      - name: Verify Ansible connectivity
        working-directory: ansible
        shell: bash
        run: |
          set -euo pipefail

          ansible \
            -i inventories/shared-k8s/hosts.ini \
            dev_workers \
            --private-key "${ANSIBLE_PRIVATE_KEY_FILE}" \
            -m ping

      - name: Confirm the existing Kubernetes API is healthy
        working-directory: ansible
        shell: bash
        run: |
          set -euo pipefail

          ansible \
            -i inventories/shared-k8s/hosts.ini \
            first_control_plane \
            --private-key "${ANSIBLE_PRIVATE_KEY_FILE}" \
            -b \
            -m command \
            -a 'kubectl --kubeconfig=/etc/kubernetes/admin.conf get --raw=/readyz'

      - name: Apply the common Ubuntu baseline
        working-directory: ansible
        shell: bash
        run: |
          set -euo pipefail

          ansible-playbook \
            -i inventories/shared-k8s/hosts.ini \
            --private-key "${ANSIBLE_PRIVATE_KEY_FILE}" \
            --limit dev_workers \
            playbooks/shared-k8s/01-common-baseline.yml

      - name: Prepare containerd and Kubernetes prerequisites
        working-directory: ansible
        shell: bash
        run: |
          set -euo pipefail

          ansible-playbook \
            -i inventories/shared-k8s/hosts.ini \
            --private-key "${ANSIBLE_PRIVATE_KEY_FILE}" \
            --limit dev_workers \
            playbooks/shared-k8s/03-prepare-kubernetes-nodes.yml

      - name: Join the development workers
        working-directory: ansible
        shell: bash
        run: |
          set -euo pipefail

          ansible-playbook \
            -i inventories/shared-k8s/hosts.ini \
            --private-key "${ANSIBLE_PRIVATE_KEY_FILE}" \
            playbooks/shared-k8s/07-join-dev-workers.yml

      - name: Apply development labels and taints
        working-directory: ansible
        shell: bash
        run: |
          set -euo pipefail

          ansible-playbook \
            -i inventories/shared-k8s/hosts.ini \
            --private-key "${ANSIBLE_PRIVATE_KEY_FILE}" \
            playbooks/shared-k8s/08-label-and-taint-workers.yml

      - name: Verify the completed development workers
        working-directory: ansible
        shell: bash
        run: |
          set -euo pipefail

          ansible \
            -i inventories/shared-k8s/hosts.ini \
            first_control_plane \
            --private-key "${ANSIBLE_PRIVATE_KEY_FILE}" \
            -b \
            -m shell \
            -a '
              set -e
              export KUBECONFIG=/etc/kubernetes/admin.conf
              kubectl get nodes -l environment=dev,workload=github-runner -L environment,workload -o wide
              kubectl get pods -A
              kubectl get --raw=/readyz
            ' 

The path filter and syntax-check list intentionally reference 07-join-dev-workers.yml. Keep that filename consistent across the playbook, workflow paths, syntax checks, and execution step.

Workflow behavior:

Event                      Result
Push to dev                Inventory and playbook syntax validation only
Push to prod               Validation, baseline, Kubernetes preparation, join, labels, taints, and verification
Manual dispatch from prod  Idempotent development-worker reconciliation
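
The job conditions in the workflow can be restated as a small decision function. This sketch mirrors the `branches` filter and the configure job's `if` expression, and assumes a push also matches the path filter. The function name `jobs_for_event` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical mirror of the workflow's trigger and job conditions.
# Arguments: event name and branch (ref) name.
jobs_for_event() {
  local event="$1" ref="$2"
  case "${event}:${ref}" in
    push:prod|workflow_dispatch:prod) echo "validate configure" ;;  # full run
    push:dev|workflow_dispatch:*)     echo "validate" ;;            # validation only
    *)                                echo "none" ;;                # push to any other branch does not trigger
  esac
}

jobs_for_event push dev
jobs_for_event push prod
jobs_for_event workflow_dispatch prod
```

A manual dispatch from any branch other than prod runs only the validate job, because the configure job's condition requires prod.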

23. Review and commit the Ansible change

git status
git diff --check
git diff --stat

git diff -- \
  ansible \
  .github/workflows/ansible-configure-dev-workers.yml

Confirm:

  • No private key, bootstrap token, kubeconfig, join command, or Terraform state is staged.
  • All four development worker IPs are .213-.216.
  • Every worker uses environment=dev and workload=github-runner.
  • The taint is exactly environment=dev:NoSchedule.
  • The workflow performs configuration only from prod.
  • The existing control-plane inventory and variables remain present.

Commit and push:

git add \
  ansible \
  .github/workflows/ansible-configure-dev-workers.yml

git commit -m "Configure shared Kubernetes development workers"

git push -u origin feature/configure-k8s-dev-workers

24. Create the Ansible pull request into dev

gh pr create \
  --base dev \
  --head feature/configure-k8s-dev-workers \
  --title "Configure shared Kubernetes development workers" \
  --body "Adds the Ubuntu baseline, Kubernetes prerequisites, worker joins, and approved development labels and taints."

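The workflow has no pull_request trigger, so validation runs on the push to dev after the merge. Do not promote to prod until inventory validation and all four playbook syntax checks succeed.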

25. Promote the Ansible change from dev to prod

gh pr create \
  --base prod \
  --head dev \
  --title "Configure shared Kubernetes development workers" \
  --body "Promotes the validated four-development-worker configuration to prod."

After merge and environment approval, the workflow runs in this order:

  1. Verify the existing Kubernetes API is healthy.
  2. Apply the common Ubuntu baseline to the four development workers.
  3. Configure containerd and Kubernetes prerequisites.
  4. Generate a temporary worker join command on cicd-ac-k8s-cp-01.
  5. Join the four workers serially.
  6. Wait for every development worker to become Ready.
  7. Apply the approved labels and taint.
  8. Verify the development worker set through the Kubernetes API.
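Step 6 can also be reproduced by hand when investigating a stalled run. The sketch below is an assumption about how such a wait could look with kubectl wait; it is not the playbook's actual task. KUBECTL is a hypothetical override variable, and the 300s default timeout is illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical override so the kubectl binary (or a wrapper) can be swapped.
KUBECTL="${KUBECTL:-kubectl}"

# Waits for each of the four development workers to report Ready.
# $1: per-node timeout (assumed default: 300s).
# Returns 1 as soon as one node fails to become Ready in time.
wait_for_dev_workers() {
  local timeout="${1:-300s}"
  local n
  for n in 01 02 03 04; do
    "${KUBECTL}" wait --for=condition=Ready \
      "node/cicd-ac-k8s-dev-wk-${n}" --timeout="${timeout}" || return 1
  done
}

# Example usage on cicd-ac-k8s-cp-01, with KUBECONFIG already exported:
#   wait_for_dev_workers 300s
```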

26. Manual verification

Run from prod-terraform-deploy-02:

ssh \
  -i ~/.ssh/id_ed25519_ansible \
  -o IdentitiesOnly=yes \
  acllc@192.168.8.202 \
  'sudo bash -s' <<'EOF'
# The quoted heredoc is sent unchanged to the remote bash, so ${node}
# and the jsonpath quotes are not expanded by the local or login shell.
export KUBECONFIG=/etc/kubernetes/admin.conf

echo "=== DEVELOPMENT WORKERS ==="
kubectl get nodes \
  -l environment=dev,workload=github-runner \
  -L environment,workload \
  -o wide

echo "=== TAINTS ==="
for node in \
  cicd-ac-k8s-dev-wk-01 \
  cicd-ac-k8s-dev-wk-02 \
  cicd-ac-k8s-dev-wk-03 \
  cicd-ac-k8s-dev-wk-04
do
  echo "--- ${node} ---"
  kubectl get node "${node}" \
    -o jsonpath='{range .spec.taints[*]}{.key}={.value}:{.effect}{"\n"}{end}'
done

echo "=== API READINESS ==="
kubectl get --raw=/readyz
EOF

All four worker rows must be Ready and show environment=dev and workload=github-runner. Under TAINTS, every node must list environment=dev:NoSchedule. The API readiness check must return ok.
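The taint check can be made pass/fail instead of visual. This is a minimal sketch under the assumption that each taint is printed on its own line as key=value:effect; has_dev_taint is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads one node's taints from stdin, one per line in
# key=value:effect form, and succeeds only if the approved development
# taint appears as an exact whole-line match.
has_dev_taint() {
  grep -Fxq 'environment=dev:NoSchedule'
}

# Example usage on a control-plane node with KUBECONFIG exported:
#   kubectl get node cicd-ac-k8s-dev-wk-01 \
#     -o jsonpath='{range .spec.taints[*]}{.key}={.value}:{.effect}{"\n"}{end}' \
#     | has_dev_taint || echo "taint missing on cicd-ac-k8s-dev-wk-01"
```

The whole-line match rejects near misses such as environment=dev:NoExecute or environment=qa:NoSchedule.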

27. Expected final state

Development worker VMs:
  cicd-ac-k8s-dev-wk-01  192.168.8.213  Ready
  cicd-ac-k8s-dev-wk-02  192.168.8.214  Ready
  cicd-ac-k8s-dev-wk-03  192.168.8.215  Ready
  cicd-ac-k8s-dev-wk-04  192.168.8.216  Ready

Kubernetes labels on every development worker:
  environment=dev
  workload=github-runner

Kubernetes taint on every development worker:
  environment=dev:NoSchedule

Cluster state:
  3 control planes Ready
  4 development workers Ready
  Kubernetes API readyz: ok

Still pending:
  4 QA workers
  4 production workers
  shared cluster services
  ARC controller
  tenant runner scale sets

28. Failure handling

Terraform proposes changes to existing infrastructure

Stop. Do not apply. The plan must be exactly four additions and no changes or deletions.
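The "exactly four additions" rule can be checked mechanically against saved plan output. The sketch below assumes the plan was produced with -no-color and that Terraform printed its usual summary line; plan_is_four_additions and the plan.txt filename are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads saved "terraform plan -no-color" output on stdin
# and succeeds only when the summary line reports exactly four additions,
# zero changes, and zero destroys.
plan_is_four_additions() {
  grep -Eq '^Plan: 4 to add, 0 to change, 0 to destroy\.'
}

# Example usage (filenames are illustrative):
#   terraform plan -no-color > plan.txt
#   plan_is_four_additions < plan.txt || echo "STOP: unexpected plan shape"
```

Any other summary, including a plan with no changes at all, fails the check and should be treated as a reason to stop.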

A worker receives the wrong DHCP address

Check its Proxmox MAC and router reservation. Do not configure a static Netplan address.
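To confirm which address a worker should hold, the approved mapping can be expressed as a small lookup. This is a sketch: expected_dev_worker_ip is a hypothetical helper, and the usage example assumes the first address reported by hostname -I is the DHCP-assigned one.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the approved DHCP address for a development
# worker hostname, or fails for any other name.
expected_dev_worker_ip() {
  case "$1" in
    cicd-ac-k8s-dev-wk-01) echo 192.168.8.213 ;;
    cicd-ac-k8s-dev-wk-02) echo 192.168.8.214 ;;
    cicd-ac-k8s-dev-wk-03) echo 192.168.8.215 ;;
    cicd-ac-k8s-dev-wk-04) echo 192.168.8.216 ;;
    *) echo "unknown worker: $1" >&2; return 1 ;;
  esac
}

# Example usage on the worker itself:
#   actual="$(hostname -I | awk '{print $1}')"
#   [ "${actual}" = "$(expected_dev_worker_ip "$(hostname)")" ] \
#     || echo "address mismatch: ${actual}"
```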

APT reports a package is unavailable

The roles already force an APT refresh. Check DNS, internet access, Ubuntu repository files, and pkgs.k8s.io. Do not rebuild the template merely to preload packages.

Containerd CRI validation fails

Run on the affected worker:

sudo systemctl status containerd --no-pager
sudo ctr plugins ls
sudo grep -nE 'disabled_plugins|SystemdCgroup' /etc/containerd/config.toml
sudo crictl --runtime-endpoint unix:///run/containerd/containerd.sock info

The containerd role must continue using the corrected type-and-ID column checks for both legacy and containerd 2.x plugin layouts.

A worker cannot join

Inspect:

sudo journalctl -u kubelet -n 200 --no-pager
sudo crictl ps -a
sudo test -f /etc/kubernetes/kubelet.conf && echo joined || echo not-joined

Rerun the approved worker-join workflow. It generates a fresh token automatically. Do not paste join commands into Git.

A worker remains NotReady

From cicd-ac-k8s-cp-01, inspect:

sudo kubectl --kubeconfig=/etc/kubernetes/admin.conf get nodes -o wide
sudo kubectl --kubeconfig=/etc/kubernetes/admin.conf get pods -A -o wide
sudo kubectl --kubeconfig=/etc/kubernetes/admin.conf describe node <WORKER_NAME>

Also check the Calico pods scheduled on the affected worker and that worker's kubelet logs.
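To narrow the node listing to the workers that need attention, the output can be filtered. A minimal sketch, assuming the default column layout of kubectl get nodes where the second column is STATUS; list_not_ready_nodes is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "kubectl get nodes --no-headers" output on stdin
# and prints the name of every node whose STATUS is not exactly "Ready".
# A cordoned node ("Ready,SchedulingDisabled") is therefore also printed.
list_not_ready_nodes() {
  awk '$2 != "Ready" { print $1 }'
}

# Example usage from cicd-ac-k8s-cp-01:
#   sudo kubectl --kubeconfig=/etc/kubernetes/admin.conf get nodes --no-headers \
#     | list_not_ready_nodes
```

Empty output means every listed node reports Ready.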

ARC pods are pending after the controller is installed later

The development runner values must include a matching selector and toleration:

nodeSelector:
  environment: dev
  workload: github-runner

tolerations:
- key: environment
  operator: Equal
  value: dev
  effect: NoSchedule

29. Project status after successful completion

Use this as the expected acceptance checkpoint for a fresh rebuild. It does not assert that any previously existing Kubernetes VM or cluster is still present.

EXPECTED CHECKPOINT AFTER COMPLETING THIS PAGE

Infrastructure expected to exist
  API load balancer and VIP available
  Three control-plane nodes Ready
  Four development worker VMs running at 192.168.8.213-.216

Development worker acceptance criteria
  All four workers report Ready
  environment=dev label present
  workload=github-runner label present
  environment=dev:NoSchedule taint present
  Kubernetes API /readyz returns ok

Still implemented by later pages
  Four QA workers at 192.168.8.209-.212
  Four production workers at 192.168.8.205-.208
  Shared ARC controller
  Repository and environment runner scale sets

This is a rebuild acceptance checkpoint, not a statement about the current live environment.

After this rebuild checkpoint passes, continue to the QA-worker page to provision .209-.212 and apply environment=qa, workload=github-runner, and environment=qa:NoSchedule.