K8S CI/CD Infrastructure Overview
1. Purpose
This document is the single reference for the Aspireclan shared CI/CD infrastructure project.
The repository will provide reusable infrastructure for:
- FP
- Shelvera
- Future Aspireclan products
- Multiple GitHub repositories per product
- Development, QA, and production environments
The solution will use:
- Proxmox VE for virtual machines
- Terraform for VM provisioning
- Ansible for operating-system and Kubernetes configuration
- Kubernetes for shared CI/CD execution
- GitHub Actions Runner Controller (ARC) for ephemeral runner pods
- Harbor for private container image storage
- GitHub Environments for environment-specific approvals and secrets
The shared infrastructure repository will be named:
ac-cicd-infra
This is not a Visual Studio solution. It is a private GitHub repository containing Terraform, Ansible, Kubernetes manifests, Helm values, scripts, documentation, and GitHub Actions workflows.
2. Final Approved Architecture
The approved design uses one shared Kubernetes cluster and one shared ARC controller.
There will be no permanent application build or deployment VMs such as:
dev-build-01
qa-build-01
prod-build-01
dev-deploy-01
qa-deploy-01
prod-deploy-01
Instead, ephemeral ARC runner pods will perform both build and deployment operations.
GitHub push
↓
ARC build runner pod
├── checks out the repository
├── builds the Docker image
├── performs validation and security checks
└── pushes the image to Harbor
↓
ARC deploy runner pod
├── connects to the target VM through SSH
├── authenticates to Harbor (CI robot credentials today; pull-only runtime credentials planned)
├── pulls the immutable image
├── replaces the running container
├── performs a health check
└── rolls back when deployment fails
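The deploy steps above (replace the container, run a health check, roll back on failure) can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the project's workflow: `deploy_with_rollback` and its callable parameters are hypothetical names, and the real job would run the equivalent Docker commands on the target VM over SSH.

```python
from typing import Callable


def deploy_with_rollback(
    new_image: str,
    previous_image: str,
    run_container: Callable[[str], None],
    is_healthy: Callable[[], bool],
) -> str:
    """Start new_image and restore previous_image if the health check fails.

    Returns the image that is running when the function exits.
    """
    run_container(new_image)
    if is_healthy():
        return new_image
    # Health check failed: put the previously running image back.
    run_container(previous_image)
    return previous_image
```

Passing the container and health-check actions as callables keeps the rollback decision separate from how the commands reach the target VM.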
The deployed applications continue to run on normal Proxmox VMs.
Examples:
dev-web-01
qa-web-01
prod-web-01
dev-app-01
qa-app-01
prod-app-01
The Terraform/Ansible controller VM remains responsible for infrastructure provisioning and configuration only.
prod-terraform-deploy-02
After ARC is operational, this VM must not be used as the normal FP, Shelvera, or future-product application build/deployment runner.
3. First FP Implementation
The first application pipeline will be:
fp-web-ui-001 dev branch
↓
Build job on fp-web-ui-001-dev-arc
↓
harbor.aspireclan.com/fp-ci-cd/dev-fp-web-ui-001:<COMMIT_SHA>
↓
Deploy job on a second fp-web-ui-001-dev-arc runner pod
↓
dev-web-01
↓
http://192.168.8.120:8080/
The build and deployment jobs use the same repository-and-environment runner scale set name, but ARC creates a separate ephemeral runner pod for each job.
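The image reference in the dev pipeline above can be composed from its parts. The following Python sketch assumes the pattern `<registry>/<project>/<environment>-<repository>:<commit_sha>` seen in that one example; `image_reference` is a hypothetical helper, and applying the same pattern to QA and production is an assumption.

```python
REGISTRY = "harbor.aspireclan.com"  # registry host from the dev pipeline


def image_reference(project: str, environment: str, repository: str, commit_sha: str) -> str:
    """Compose <registry>/<project>/<environment>-<repository>:<commit_sha>."""
    if not commit_sha:
        # An empty tag would not identify one specific commit.
        raise ValueError("commit_sha must not be empty")
    return f"{REGISTRY}/{project}/{environment}-{repository}:{commit_sha}"
```

For example, `image_reference("fp-ci-cd", "dev", "fp-web-ui-001", sha)` yields the dev reference shown in the pipeline with `sha` in place of `<COMMIT_SHA>`.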
Later environment promotion will use:
fp-web-ui-001 qa branch
↓
Build and deploy jobs on fp-web-ui-001-qa-arc
↓
qa-web-01
fp-web-ui-001 prod branch
↓
Build and deploy jobs on fp-web-ui-001-prod-arc
↓
prod-web-01
4. Current Infrastructure
4.1 Proxmox
Proxmox node: pve
Proxmox API: https://192.168.8.23:8006
VM storage: local-lvm
Network bridge: vmbr0
4.2 Base VM Template
Template name: tmplt-ub-26-min-base
Operating system: Ubuntu Server 26.04 LTS minimized
Networking: DHCP inside Ubuntu
Address assignment: router-side MAC reservation
Cloud-Init: not used
Template boot disk: scsi0
Template disk size: 40 GiB
Template storage: local-lvm
The template includes:
OpenSSH Server
QEMU Guest Agent
Docker Engine
Docker CLI
containerd
Docker Buildx
Docker Compose plugin
Essential Linux utilities
Template-safe DHCP Netplan
SSH host-key regeneration on first clone boot
Machine-ID cleanup
Ansible controller public key for acllc
Passwordless sudo for acllc
The private Ansible SSH key must remain only on the trusted controller VM. It must never be copied into the VM template, committed to Git, or stored in an unencrypted file in the repository.
4.3 Terraform and Ansible Controller
VM: prod-terraform-deploy-02
IP: 192.168.8.93
Runner: prod-ac-cicd-infra-deploy-rnr-01
Responsibilities:
Run Terraform
Run Ansible
Create Proxmox infrastructure VMs
Configure the common Ubuntu baseline
Configure HAProxy and Keepalived
Bootstrap Kubernetes
Install common cluster services
Install the shared ARC controller
Onboard GitHub organizations
Install repository-specific ARC runner scale sets
Expected tooling:
GitHub self-hosted runner
Terraform 1.15.5
Ansible
Node.js
Docker
SSH automation key at ~/.ssh/id_ed25519_ansible
Access to Proxmox
Access to Terraform-created VMs
4.4 Existing FP Development Target
VM: dev-web-01
IP: 192.168.8.120
User: acllc
Docker: installed
SSH: working
Passwordless sudo: working
Container host port: 8080
Provisioned by Terraform
Configured by Ansible
5. Network Design
Terraform assigns every VM a permanent unique MAC address.
The router maps that MAC address to a reserved IP address.
Ubuntu remains configured for DHCP.
Terraform VM MAC
↓
Router DHCP reservation
↓
Reserved IP returned to Ubuntu through DHCP
Do not configure the same address statically inside Ubuntu Netplan.
Every VM requires a unique:
VM ID
MAC address
Reserved IP
Hostname
The VM template MAC is reserved only for template preparation:
AA:BB:CC:01:01:01
Do not run the template-build VM and a clone using the same MAC address at the same time.
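The uniqueness requirement above can be checked mechanically before a Terraform apply. This is a minimal Python sketch; `find_duplicates` and the dictionary keys (`vm_id`, `mac`, `ip`, `hostname`) are hypothetical names chosen for illustration and are not taken from the repository.

```python
from collections import Counter


def find_duplicates(vms: list[dict[str, str]]) -> dict[str, list[str]]:
    """Return, per identity field, the values used by more than one VM."""
    duplicates: dict[str, list[str]] = {}
    for field in ("vm_id", "mac", "ip", "hostname"):
        # Compare case-insensitively: AA:BB:... and aa:bb:... are the same MAC.
        counts = Counter(vm[field].lower() for vm in vms)
        repeated = sorted(value for value, n in counts.items() if n > 1)
        if repeated:
            duplicates[field] = repeated
    return duplicates
```

An empty result means every VM ID, MAC address, reserved IP, and hostname in the input is unique.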
6. Critical Terraform Disk Rule
The template boot disk is:
Bus: SCSI
Slot: scsi0
Storage: local-lvm
Size: 40 GiB
Terraform must either omit the disk block and inherit the template disk or explicitly define a matching or larger disk.
disk {
  slot    = "scsi0"
  storage = "local-lvm"
  size    = "40G"
}
Never configure a Terraform disk smaller than the template disk.
A smaller disk specification previously caused Proxmox to detach the actual Ubuntu disk as an unused disk and attach a new empty disk as scsi0.
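The rule can be enforced with a small pre-apply check. The Python sketch below assumes sizes are written in the `40G` style used above and that `G` and `T` are the only units in use; `parse_size_gib` and `disk_is_safe` are hypothetical helper names.

```python
import re

TEMPLATE_DISK_GIB = 40  # size of the template's scsi0 disk


def parse_size_gib(size: str) -> int:
    """Parse a size such as "40G" or "1T" into GiB."""
    match = re.fullmatch(r"(\d+)([GT])", size.strip().upper())
    if not match:
        raise ValueError(f"unsupported disk size: {size!r}")
    value, unit = int(match.group(1)), match.group(2)
    return value * 1024 if unit == "T" else value


def disk_is_safe(requested: str) -> bool:
    """True when the requested disk is at least as large as the template disk."""
    return parse_size_gib(requested) >= TEMPLATE_DISK_GIB
```

A check like this could run against every disk size in the stack variables before `terraform apply`.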
7. Shared Kubernetes Topology and Approved Allocation
The shared Kubernetes cluster consists of:
1 Kubernetes API load balancer
3 Kubernetes control-plane nodes
4 development worker nodes
4 QA worker nodes
4 production worker nodes
All 16 VMs will be cloned from:
tmplt-ub-26-min-base
A separate Kubernetes VM template is not required. Kubernetes-specific configuration will be performed through Ansible.
7.1 Approved VM naming convention
The new shared platform uses this collision-free convention:
cicd-ac-k8s-<environment>-<role>-<number>
Shared components that do not belong to an application environment omit the environment segment:
cicd-ac-k8s-<role>-<number>
Segments:
cicd = CI/CD platform
ac = Aspireclan
k8s = Kubernetes
lb = load balancer
cp = control plane
wk = worker
dev = development worker pool
qa = QA worker pool
prod = production worker pool
01 = sequential instance number
This naming is intentionally different from the existing Shelvera Kubernetes CI/CD setup. The existing setup can remain online during migration and can be discarded after Shelvera is moved to the shared architecture.
Approved names:
cicd-ac-k8s-lb-01
cicd-ac-k8s-cp-01
cicd-ac-k8s-cp-02
cicd-ac-k8s-cp-03
cicd-ac-k8s-dev-wk-01
cicd-ac-k8s-dev-wk-02
cicd-ac-k8s-dev-wk-03
cicd-ac-k8s-dev-wk-04
cicd-ac-k8s-qa-wk-01
cicd-ac-k8s-qa-wk-02
cicd-ac-k8s-qa-wk-03
cicd-ac-k8s-qa-wk-04
cicd-ac-k8s-prod-wk-01
cicd-ac-k8s-prod-wk-02
cicd-ac-k8s-prod-wk-03
cicd-ac-k8s-prod-wk-04
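The approved names follow directly from the convention. The Python sketch below regenerates them; `vm_name` and `approved_names` are hypothetical helpers, and the counts (one load balancer, three control planes, four workers per environment) come from the topology in section 7.

```python
from typing import Optional


def vm_name(role: str, number: int, environment: Optional[str] = None) -> str:
    """Build cicd-ac-k8s-[<environment>-]<role>-<number>."""
    parts = ["cicd", "ac", "k8s"]
    if environment:
        # Shared components (lb, cp) omit the environment segment.
        parts.append(environment)
    parts += [role, f"{number:02d}"]
    return "-".join(parts)


def approved_names() -> list[str]:
    """List the 16 names: 1 lb, 3 cp, and 4 workers per environment."""
    names = [vm_name("lb", 1)]
    names += [vm_name("cp", n) for n in range(1, 4)]
    for environment in ("dev", "qa", "prod"):
        names += [vm_name("wk", n, environment) for n in range(1, 5)]
    return names
```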
7.2 Approved network block
The current LAN is:
Network: 192.168.8.0/22
Subnet mask: 255.255.252.0
Gateway: 192.168.8.1
The new shared Kubernetes allocation starts at 192.168.8.200 and stays below the template address at 192.168.8.254.
Approved address plan:
192.168.8.200 Kubernetes API virtual IP
192.168.8.201 Current API load balancer
192.168.8.202-.204 Control-plane nodes
192.168.8.205-.208 Production workers
192.168.8.209-.212 QA workers
192.168.8.213-.216 Development workers
192.168.8.217 Reserved for a future second API load balancer
192.168.8.218-.219 Reserved for future shared Kubernetes infrastructure
192.168.8.254 Unavailable; assigned to tmplt-ub-26-min-base
The approved cluster consumes 17 addresses: the API VIP plus 16 VM addresses. Within the .200-.219 block, .217 is held for the future second load balancer, which leaves two additional shared-infrastructure addresses (.218 and .219).
Before provisioning, verify that no current DHCP lease, router reservation, VM, container, network appliance, or physical device uses 192.168.8.200-.219.
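The address arithmetic can be verified with Python's standard `ipaddress` module. This is a sketch under stated assumptions: the `PLAN` dictionary restates the address plan above by last octet, and its key names and the `plan_summary` helper are hypothetical.

```python
import ipaddress

LAN = ipaddress.ip_network("192.168.8.0/22")
BASE = ipaddress.ip_address("192.168.8.0")

# Inclusive last-octet ranges restated from the address plan.
PLAN = {
    "api-vip": (200, 200),
    "load-balancer": (201, 201),
    "control-plane": (202, 204),
    "prod-workers": (205, 208),
    "qa-workers": (209, 212),
    "dev-workers": (213, 216),
    "future-lb": (217, 217),
    "reserved": (218, 219),
}


def plan_summary() -> dict[str, int]:
    """Check the plan for overlaps and LAN membership, then count addresses."""
    all_addresses = [
        BASE + octet
        for first, last in PLAN.values()
        for octet in range(first, last + 1)
    ]
    if len(all_addresses) != len(set(all_addresses)):
        raise ValueError("address ranges overlap")
    if not all(address in LAN for address in all_addresses):
        raise ValueError("address outside the LAN")
    in_use = sum(
        last - first + 1
        for name, (first, last) in PLAN.items()
        if name not in ("future-lb", "reserved")
    )
    return {"total": len(all_addresses), "in_use": in_use, "spare": len(all_addresses) - in_use}
```

The summary reports 20 addresses in the block, 17 in use, and 3 held back (the future load balancer plus two reserved).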
7.3 Kubernetes API endpoint
Approved DNS name and virtual IP:
DNS: cicd-ac-k8s-api.aspireclan.com
VIP: 192.168.8.200
URL: https://cicd-ac-k8s-api.aspireclan.com:443
192.168.8.200 is not assigned to a normal DHCP client and is not a separate VM. It is the Kubernetes API virtual IP managed by Keepalived on the load-balancer tier.
Initial implementation:
VIP 192.168.8.200
↓
cicd-ac-k8s-lb-01 at 192.168.8.201
↓
cicd-ac-k8s-cp-01:6443
cicd-ac-k8s-cp-02:6443
cicd-ac-k8s-cp-03:6443
The first implementation has one load-balancer VM, so the API tier still has a temporary single point of failure. Reserve 192.168.8.217 for a future cicd-ac-k8s-lb-02. When that VM is introduced, Keepalived can move 192.168.8.200 between both load balancers without changing the Kubernetes control-plane endpoint.
7.4 Collision-free MAC convention
The established general MAC scheme uses these environment segments:
local = 01
dev = 02
qa = 03
prod = 04
The new shared Kubernetes platform uses dedicated segments so its MAC addresses do not collide with the existing Shelvera Kubernetes CI/CD setup:
cicd-shared = 05
cicd-dev = 06
cicd-qa = 07
cicd-prod = 08
The existing category segments are retained:
k8s-cp = 0e
k8s-wk = 0f
k8s-lb = 14
MAC format:
aa:bb:cc:<platform-environment>:<category>:<machine-number>
Examples:
cicd-ac-k8s-lb-01 aa:bb:cc:05:14:01
cicd-ac-k8s-cp-01 aa:bb:cc:05:0e:01
cicd-ac-k8s-dev-wk-01 aa:bb:cc:06:0f:01
cicd-ac-k8s-qa-wk-01 aa:bb:cc:07:0f:01
cicd-ac-k8s-prod-wk-01 aa:bb:cc:08:0f:01
These 05-08 platform/environment segments are reserved for the new shared CI/CD Kubernetes cluster.
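The MAC format can be expressed as a small builder. In this Python sketch, `build_mac` and the two lookup tables are hypothetical names; the segment values are the ones listed above. Encoding the machine number as two hexadecimal digits is an assumption, since the examples only show 01.

```python
PLATFORM_SEGMENT = {"cicd-shared": 0x05, "cicd-dev": 0x06, "cicd-qa": 0x07, "cicd-prod": 0x08}
CATEGORY_SEGMENT = {"k8s-cp": 0x0E, "k8s-wk": 0x0F, "k8s-lb": 0x14}


def build_mac(platform: str, category: str, machine_number: int) -> str:
    """Build aa:bb:cc:<platform-environment>:<category>:<machine-number>."""
    if not 1 <= machine_number <= 0xFF:
        raise ValueError("machine_number must fit in one octet")
    # Unknown platform or category names raise KeyError on lookup.
    return "aa:bb:cc:{:02x}:{:02x}:{:02x}".format(
        PLATFORM_SEGMENT[platform], CATEGORY_SEGMENT[category], machine_number
    )
```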
7.5 Proxmox VM ID convention
All Proxmox VM IDs continue to begin with:
3156
For a VM in 192.168.8.x, append the decimal last octet to 3156 as a string suffix. This is concatenation, not arithmetic addition:
VM ID = 3156 followed by the decimal last IP octet
Examples:
192.168.8.201 → 3156201
192.168.8.216 → 3156216
The VIP does not receive a VM ID:
192.168.8.200 → no VM ID
Reference implementation:
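// Build a Proxmox VM ID by appending the last IPv4 octet to the 3156 prefix.
// Returns an empty string for anything outside 192.168.8.1-254.
export const buildVmId = (ip) => {
  const match = String(ip).match(/^192\.168\.8\.(\d{1,3})$/);
  if (!match) {
    return '';
  }
  const lastOctet = Number(match[1]);
  if (lastOctet < 1 || lastOctet > 254) {
    return '';
  }
  // String concatenation, not addition: 201 becomes 3156201.
  return `3156${lastOctet}`;
};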
7.6 Approved 16-VM identity allocation
The following names, addresses, MACs, and VM IDs are approved for the new shared cluster.
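| MAC address | VM name | IP address | Kubernetes role | Environment | VM ID |
|---|---|---|---|---|---|
| aa:bb:cc:05:14:01 | cicd-ac-k8s-lb-01 | 192.168.8.201 | API load balancer | Shared | 3156201 |
| aa:bb:cc:05:0e:01 | cicd-ac-k8s-cp-01 | 192.168.8.202 | Control plane | Shared | 3156202 |
| aa:bb:cc:05:0e:02 | cicd-ac-k8s-cp-02 | 192.168.8.203 | Control plane | Shared | 3156203 |
| aa:bb:cc:05:0e:03 | cicd-ac-k8s-cp-03 | 192.168.8.204 | Control plane | Shared | 3156204 |
| aa:bb:cc:08:0f:01 | cicd-ac-k8s-prod-wk-01 | 192.168.8.205 | ARC worker | Production | 3156205 |
| aa:bb:cc:08:0f:02 | cicd-ac-k8s-prod-wk-02 | 192.168.8.206 | ARC worker | Production | 3156206 |
| aa:bb:cc:08:0f:03 | cicd-ac-k8s-prod-wk-03 | 192.168.8.207 | ARC worker | Production | 3156207 |
| aa:bb:cc:08:0f:04 | cicd-ac-k8s-prod-wk-04 | 192.168.8.208 | ARC worker | Production | 3156208 |
| aa:bb:cc:07:0f:01 | cicd-ac-k8s-qa-wk-01 | 192.168.8.209 | ARC worker | QA | 3156209 |
| aa:bb:cc:07:0f:02 | cicd-ac-k8s-qa-wk-02 | 192.168.8.210 | ARC worker | QA | 3156210 |
| aa:bb:cc:07:0f:03 | cicd-ac-k8s-qa-wk-03 | 192.168.8.211 | ARC worker | QA | 3156211 |
| aa:bb:cc:07:0f:04 | cicd-ac-k8s-qa-wk-04 | 192.168.8.212 | ARC worker | QA | 3156212 |
| aa:bb:cc:06:0f:01 | cicd-ac-k8s-dev-wk-01 | 192.168.8.213 | ARC worker | Development | 3156213 |
| aa:bb:cc:06:0f:02 | cicd-ac-k8s-dev-wk-02 | 192.168.8.214 | ARC worker | Development | 3156214 |
| aa:bb:cc:06:0f:03 | cicd-ac-k8s-dev-wk-03 | 192.168.8.215 | ARC worker | Development | 3156215 |
| aa:bb:cc:06:0f:04 | cicd-ac-k8s-dev-wk-04 | 192.168.8.216 | ARC worker | Development | 3156216 |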
7.7 Proxmox tags and Kubernetes labels
| VM group | Proxmox tags | Kubernetes labels |
|---|---|---|
| Load balancer | ac-cicd;shared-k8s;load-balancer;terraform;ansible | Not a Kubernetes node |
| Control planes | ac-cicd;shared-k8s;control-plane;terraform;ansible | Kubernetes control-plane labels and taints managed by kubeadm |
| Development workers | ac-cicd;shared-k8s;worker;dev;arc-runner;terraform;ansible | environment=dev, workload=github-runner |
| QA workers | ac-cicd;shared-k8s;worker;qa;arc-runner;terraform;ansible | environment=qa, workload=github-runner |
| Production workers | ac-cicd;shared-k8s;worker;prod;arc-runner;terraform;ansible | environment=prod, workload=github-runner |
7.8 Initial resource-sizing proposal
The identity allocation above is approved. The following resource sizes match the existing terraform/stacks/shared-k8s/main.tf implementation. The load-balancer disk remains at the mandatory template minimum of 40G.
Kubernetes VM Resource Allocation
| VM group | Count | vCPU per VM | RAM per VM | Disk per VM | Disk location |
|---|---|---|---|---|---|
| API load balancer | 1 | 2 | 4 GB | 40 GB | scsi0 on local-lvm |
| Control planes | 3 | 4 | 8 GB | 100 GB | scsi0 on local-lvm |
| Development workers | 4 | 4 | 16 GB | 250 GB | scsi0 on local-lvm |
| QA workers | 4 | 4 | 16 GB | 250 GB | scsi0 on local-lvm |
| Production workers | 4 | 4 | 16 GB | 250 GB | scsi0 on local-lvm |
Full-topology totals:
VM count: 16
Allocated vCPU: 62
Allocated RAM: 220 GB
Allocated virtual disk: 3,340 GB
CPU may be overcommitted according to the Proxmox host capacity and workload profile, but RAM and available local-lvm storage must be reviewed before Terraform apply. Docker-in-Docker builds can consume substantial temporary disk space on worker nodes.
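The full-topology totals can be reproduced from the per-group sizes in the table. This Python sketch restates the table as data; the `SIZING` tuple layout and the `totals` helper are hypothetical names.

```python
# (VM group, count, vCPU per VM, RAM GB per VM, disk GB per VM)
SIZING = [
    ("API load balancer", 1, 2, 4, 40),
    ("Control planes", 3, 4, 8, 100),
    ("Development workers", 4, 4, 16, 250),
    ("QA workers", 4, 4, 16, 250),
    ("Production workers", 4, 4, 16, 250),
]


def totals(sizing: list[tuple[str, int, int, int, int]]) -> dict[str, int]:
    """Sum VM count, vCPU, RAM, and disk across all VM groups."""
    return {
        "vms": sum(count for _, count, _, _, _ in sizing),
        "vcpu": sum(count * vcpu for _, count, vcpu, _, _ in sizing),
        "ram_gb": sum(count * ram for _, count, _, ram, _ in sizing),
        "disk_gb": sum(count * disk for _, count, _, _, disk in sizing),
    }
```

Comparing the result with the free RAM and local-lvm capacity of the Proxmox host gives a quick capacity check before apply.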
7.9 Router reservation rules
Create router-side DHCP reservations for these 16 VM addresses:
192.168.8.201-.216
Do not create a normal DHCP reservation for:
192.168.8.200
That address is the API VIP owned by Keepalived.
Keep these addresses unused for future shared infrastructure:
192.168.8.217 Future cicd-ac-k8s-lb-02
192.168.8.218 Reserved
192.168.8.219 Reserved
Do not use:
192.168.8.254 tmplt-ub-26-min-base
Do not configure any of these addresses statically inside Ubuntu Netplan. Ubuntu must continue to use DHCP, and the router must return the reserved address based on the Terraform-assigned MAC.
7.10 Pre-provisioning collision checks
Run these checks before adding the router reservations or creating the VMs:
for ip in $(seq 200 219); do
  ping -c 1 -W 1 "192.168.8.${ip}" >/dev/null 2>&1 && \
    echo "IN USE OR RESPONDING: 192.168.8.${ip}"
done
ip neigh show | grep -E '192\.168\.8\.(20[0-9]|21[0-9])' || true
Also inspect the router DHCP lease and reservation lists. A device that blocks ICMP may not respond to ping, so router verification is mandatory.
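The neighbor-table check can also be scripted so that only resolved entries are reported. This Python sketch parses text captured from `ip neigh show`; `neighbors_in_block` is a hypothetical helper. It treats an entry as occupied only when it carries an `lladdr` field, which is an illustrative filtering choice, and it does not replace the router verification.

```python
import re


def neighbors_in_block(ip_neigh_output: str, first: int = 200, last: int = 219) -> list[str]:
    """Return 192.168.8.<first>-<last> addresses that have a resolved MAC."""
    found = []
    for line in ip_neigh_output.splitlines():
        match = re.match(r"192\.168\.8\.(\d{1,3})\s", line)
        if not match or not first <= int(match.group(1)) <= last:
            continue
        # Entries without lladdr (for example FAILED) had no ARP answer.
        if " lladdr " in line:
            found.append(line.split()[0])
    return found
```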
8. Repository Strategy
Use one private GitHub repository:
ac-cicd-infra
Repository description:
Shared Proxmox, Terraform, Ansible, Kubernetes and ARC CI/CD infrastructure for Aspireclan applications.
The cleaned repository contains two active categories.
Shared infrastructure
Reusable Proxmox VM modules
Shared Kubernetes Terraform stack
HAProxy and Keepalived configuration
Kubernetes control-plane and worker configuration
Calico CNI configuration
Shared ARC controller
Shared infrastructure GitHub Actions workflows
FP tenant infrastructure
FP GitHub organization onboarding configuration
FP development, QA, and production namespaces
FP repository-specific ARC runner scale sets
FP Helm values
FP tenant Ansible playbooks
FP organization and runner-scale-set workflows
The active product-specific path convention is:
<product>/<environment>
Current examples:
fp/dev
fp/qa
fp/prod
When Shelvera or another Aspireclan product is onboarded, create its product-specific directories from the same pattern at that time. Do not keep empty product, application-VM, documentation, script, or add-on scaffolding in the repository before it is needed.
This product-first structure prevents resources from different products from becoming mixed together.
9. Approved Repository Folder Structure
The following structure matches the cleaned infrastructure source. It contains only the files and directories currently required to rebuild the shared Kubernetes CI/CD platform and the first FP runner scale sets.
ac-cicd-infra/
├── terraform/
│   ├── modules/
│   │   ├── proxmox-vm/
│   │   │   ├── main.tf
│   │   │   ├── variables.tf
│   │   │   ├── outputs.tf
│   │   │   └── versions.tf
│   │   └── proxmox-vm-group/
│   │       ├── main.tf
│   │       ├── variables.tf
│   │       ├── outputs.tf
│   │       └── versions.tf
│   └── stacks/
│       └── shared-k8s/
│           ├── backend.tf
│           ├── main.tf
│           ├── outputs.tf
│           ├── providers.tf
│           ├── terraform.tfvars.example
│           └── variables.tf
│
├── ansible/
│   ├── ansible.cfg
│   ├── inventories/
│   │   └── shared-k8s/
│   │       ├── hosts.ini
│   │       └── group_vars/
│   │           ├── all.yml
│   │           ├── control_planes.yml
│   │           ├── dev_workers.yml
│   │           ├── load_balancers.yml
│   │           ├── prod_workers.yml
│   │           └── qa_workers.yml
│   ├── playbooks/
│   │   ├── shared-k8s/
│   │   │   ├── 01-common-baseline.yml
│   │   │   ├── 02-configure-load-balancer.yml
│   │   │   ├── 03-prepare-kubernetes-nodes.yml
│   │   │   ├── 04-bootstrap-first-control-plane.yml
│   │   │   ├── 05-join-control-planes.yml
│   │   │   ├── 06-install-cni.yml
│   │   │   ├── 07-join-dev-workers.yml
│   │   │   ├── 07-join-prod-workers.yml
│   │   │   ├── 07-join-qa-workers.yml
│   │   │   ├── 08-label-and-taint-workers.yml
│   │   │   ├── 08-label-and-taint-prod-workers.yml
│   │   │   ├── 08-label-and-taint-qa-workers.yml
│   │   │   └── 09-install-arc-controller.yml
│   │   └── tenants/
│   │       └── fp/
│   │           ├── dev/install-fp-web-ui-001-runners.yml
│   │           ├── qa/install-fp-web-ui-001-runners.yml
│   │           └── prod/install-fp-web-ui-001-runners.yml
│   └── roles/
│       ├── arc-controller/
│       ├── arc-runner-scale-set/
│       ├── common/
│       ├── containerd/
│       ├── haproxy/
│       ├── keepalived/
│       ├── kubernetes-common/
│       ├── kubernetes-control-plane/
│       └── kubernetes-worker/
│
├── kubernetes/
│   ├── common/
│   │   └── cni/
│   │       └── calico-custom-resources.yaml
│   └── tenants/
│       └── fp/
│           ├── organization/
│           │   └── config.yaml
│           ├── dev/
│           │   ├── namespace/namespace.yaml
│           │   └── runner-scale-sets/fp-web-ui-001.yaml
│           ├── qa/
│           │   ├── namespace/namespace.yaml
│           │   └── runner-scale-sets/fp-web-ui-001.yaml
│           └── prod/
│               ├── namespace/namespace.yaml
│               └── runner-scale-sets/fp-web-ui-001.yaml
│
├── helm/
│   ├── common/
│   │   └── arc-controller/
│   │       └── values.yaml
│   └── tenants/
│       └── fp/
│           ├── dev/fp-web-ui-001-values.yaml
│           ├── qa/fp-web-ui-001-values.yaml
│           └── prod/fp-web-ui-001-values.yaml
│
├── .github/
│   ├── workflows/
│   │   ├── terraform-plan-shared-k8s.yml
│   │   ├── terraform-apply-shared-k8s.yml
│   │   ├── ansible-configure-shared-k8s.yml
│   │   ├── ansible-configure-load-balancer.yml
│   │   ├── ansible-configure-control-planes.yml
│   │   ├── ansible-configure-dev-workers.yml
│   │   ├── ansible-configure-qa-workers.yml
│   │   ├── ansible-configure-prod-workers.yml
│   │   ├── ansible-install-arc-controller.yml
│   │   ├── onboard-fp-github-org.yml
│   │   ├── arc-fp-web-ui-001-dev.yml
│   │   ├── arc-fp-web-ui-001-qa.yml
│   │   └── arc-fp-web-ui-001-prod.yml
│   └── CODEOWNERS
│
├── .editorconfig
├── .gitattributes
├── .gitignore
└── README.md
The cleaned source intentionally does not contain empty placeholders for Shelvera, future products, application-VM stacks, scripts, documentation, Metrics Server, ingress, certificate management, or unused build/deploy runner directories.
Create those directories only when a real implementation requires them.
10. Multi-Product Isolation Model
The cluster infrastructure remains shared.
terraform/stacks/shared-k8s/
ansible/inventories/shared-k8s/
ansible/playbooks/shared-k8s/
kubernetes/common/
helm/common/
The current FP tenant uses isolated resources under:
ansible/playbooks/tenants/fp/<environment>/
kubernetes/tenants/fp/<environment>/
helm/tenants/fp/<environment>/
When another product is onboarded, create the same product-first paths:
ansible/playbooks/tenants/<product>/<environment>/
kubernetes/tenants/<product>/<environment>/
helm/tenants/<product>/<environment>/
Application-VM Terraform stacks and Ansible inventories should be added only when application-VM provisioning is implemented in this repository.
Isolation boundaries include:
Kubernetes namespaces
GitHub App credentials when organizations differ
Harbor projects
Harbor robot accounts
Repository-specific ARC runner scale sets
Deployment SSH keys
GitHub Environments
GitHub secrets and variables
Helm values
Product-specific tenant playbooks
Shared infrastructure does not mean shared credentials.
A development workflow must not be able to access QA or production credentials.
An FP workflow must not be able to access Shelvera or future-product credentials.
11. ARC Namespace and Runner Naming
11.1 FP
Namespaces:
arc-runners-fp-dev
arc-runners-fp-qa
arc-runners-fp-prod
The implemented FP repository-specific scale sets are:
fp-web-ui-001-dev-arc
fp-web-ui-001-qa-arc
fp-web-ui-001-prod-arc
Each application workflow uses the same scale set for its build and deploy jobs. ARC creates a new ephemeral runner pod for every job.
Future FP repositories follow the same convention:
fp-ai-srvc-001-dev-arc
fp-ai-srvc-001-qa-arc
fp-ai-srvc-001-prod-arc
11.2 Shelvera
Shelvera uses the existing short form:
ts
Namespaces:
arc-runners-ts-dev
arc-runners-ts-qa
arc-runners-ts-prod
Repository-specific runner scale sets:
ts-gw-srvc-001-dev-arc
ts-data-srvc-001-dev-arc
ts-web-ui-001-dev-arc
The same repository-and-environment naming pattern applies to QA and production.
11.3 Future Product
Assume a future product short form of:
abc
Namespaces:
arc-runners-abc-dev
arc-runners-abc-qa
arc-runners-abc-prod
Example repository-specific runner scale sets:
abc-api-srvc-001-dev-arc
abc-web-ui-001-dev-arc
The product short form must be selected before creating namespaces, Harbor projects, secrets, and runner scale sets.
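The namespace and scale-set patterns used throughout this section can be captured in two helpers. This is a Python sketch; `namespace` and `scale_set` are hypothetical function names, and the patterns are the ones shown in the FP, Shelvera, and future-product examples above.

```python
ENVIRONMENTS = ("dev", "qa", "prod")


def _check_environment(environment: str) -> None:
    if environment not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment!r}")


def namespace(product: str, environment: str) -> str:
    """Build arc-runners-<product>-<environment>."""
    _check_environment(environment)
    return f"arc-runners-{product}-{environment}"


def scale_set(repository: str, environment: str) -> str:
    """Build <repository>-<environment>-arc."""
    _check_environment(environment)
    return f"{repository}-{environment}-arc"
```

Generating the names from the product short form and repository name keeps new tenants consistent with the existing ones.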
12. Terraform Responsibilities
Terraform manages Proxmox infrastructure resources only.
For every VM, Terraform manages:
VM name
VM ID
Proxmox node
Source template
Full clone mode
CPU cores
Memory
Disk
MAC address
Network bridge
QEMU Guest Agent option
Start-at-boot setting
Tags
Terraform must not:
Install Kubernetes packages
Run kubeadm
Configure HAProxy
Join nodes to Kubernetes
Install ARC
Use remote-exec for operating-system configuration
Store plaintext credentials in committed tfvars files
Terraform remote-exec provisioners should be avoided.
Ansible is responsible for post-provisioning configuration.
13. Terraform Provider Requirement
Every Terraform module using proxmox_vm_qemu must explicitly declare the provider.
terraform {
  required_providers {
    proxmox = {
      source  = "Telmate/proxmox"
      version = "3.0.1-rc9"
    }
  }
}
Without this declaration, Terraform assumes the default hashicorp namespace and attempts to resolve the nonexistent provider:
hashicorp/proxmox
Initial pinned versions:
Terraform: 1.15.5
Proxmox provider: Telmate/proxmox 3.0.1-rc9
14. Terraform State Design
The shared Kubernetes infrastructure has one authoritative Terraform state.
shared-k8s
The current working implementation uses the local backend declared in:
terraform/stacks/shared-k8s/backend.tf
The workflows explicitly use this state path on prod-terraform-deploy-02:
/var/lib/ac-cicd-infra/terraform-state/shared-k8s/terraform.tfstate
Do not create independent dev, qa, and prod Terraform states for the same shared Kubernetes control plane.
Application VM states remain separate from the shared Kubernetes state.
Examples:
/var/lib/ac-cicd-infra/terraform-state/shared-k8s/terraform.tfstate
ac-cicd-infra/application-vms/fp/dev/terraform.tfstate
ac-cicd-infra/application-vms/fp/qa/terraform.tfstate
ac-cicd-infra/application-vms/fp/prod/terraform.tfstate
The local state directory must exist, be writable by the self-hosted runner account, be backed up, and be protected from other users.
A secured remote backend with locking may replace the local backend later, but the Terraform code and workflows must be changed together before that migration.
Terraform state files must never be committed to Git.
15. Ansible Responsibilities
Ansible configures the operating system and Kubernetes after Terraform creates the VMs.
All playbooks and roles must be idempotent.
Common baseline responsibilities
Set permanent hostname
Maintain /etc/hosts where required
Refresh the APT cache before package installation
Install required utilities
Verify OpenSSH
Verify QEMU Guest Agent
Verify time synchronization
Verify passwordless sudo
Create /var/tmp/ansible-acllc with mode 0700
Handle required reboots
Verify networking and DNS
The inventory explicitly pins Python:
ansible_python_interpreter=/usr/bin/python3
Example host definition matching the current inventory:
cicd-ac-k8s-cp-01 ansible_host=192.168.8.202 ansible_user=acllc node_primary_ip=192.168.8.202 node_interface=ens18
The trusted controller uses:
~/.ssh/id_ed25519_ansible
The current ansible.cfg keeps host-key checking enabled, disables fact-variable injection, uses pipelining, and configures:
remote_tmp = /var/tmp/ansible-acllc
inject_facts_as_vars = False
host_key_checking = True
Populate known_hosts before execution rather than disabling SSH host verification globally.
16. Kubernetes Node Preparation
The Kubernetes common role must:
Disable swap immediately
Remove swap from /etc/fstab
Load the overlay kernel module
Load the br_netfilter kernel module
Configure bridge netfilter sysctl settings
Enable IPv4 forwarding
Install and configure containerd
Set containerd SystemdCgroup=true
Validate the containerd CRI plugins
Install kubelet
Install kubeadm
Install kubectl where required
Hold Kubernetes packages
Configure crictl
Enable kubelet
Reboot when required
Pinned cluster values in the existing source:
Kubernetes: v1.36.1
Kubernetes package: 1.36.1-1.1
Kubernetes repository: v1.36
kubeadm API: kubeadm.k8s.io/v1beta4
CRI socket: unix:///run/containerd/containerd.sock
Pod CIDR: 10.244.0.0/16
Service CIDR: 10.96.0.0/12
DNS domain: cluster.local
Calico: v3.32.0
Calico encapsulation: VXLAN
Calico BGP: disabled
No interactive nano or manual file-editing steps should remain.
All configuration files should be rendered through Ansible templates or copied from Git-managed source files.
17. Kubernetes API Load Balancer
The first load-balancer VM will run both HAProxy and Keepalived.
VM: cicd-ac-k8s-lb-01
Reserved DHCP address: 192.168.8.201
Keepalived VIP: 192.168.8.200
API DNS: cicd-ac-k8s-api.aspireclan.com
API URL: https://cicd-ac-k8s-api.aspireclan.com:443
Keepalived owns the API VIP. HAProxy listens on the VIP and forwards Kubernetes API traffic to:
cicd-ac-k8s-cp-01:6443
cicd-ac-k8s-cp-02:6443
cicd-ac-k8s-cp-03:6443
Create or update the internal DNS record so cicd-ac-k8s-api.aspireclan.com resolves to 192.168.8.200.
The initial API tier contains only one load-balancer VM and therefore remains a temporary single point of failure. A future cicd-ac-k8s-lb-02 will use reserved address 192.168.8.217 and participate in Keepalived failover for the same VIP.
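The VIP, the HAProxy listener, and the health checks for all three backends must be configured and working before running kubeadm init. A backend reports healthy only after its API server is listening on port 6443, so all three are expected to be down until cicd-ac-k8s-cp-01 is bootstrapped.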
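A TCP reachability probe is one way to check the listener and the backends from the controller. The Python sketch below uses only the standard `socket` module; `tcp_port_open` is a hypothetical helper. A successful connect shows only that something accepts connections on the port, not that the Kubernetes API is healthy.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True when a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, `tcp_port_open("192.168.8.200", 443)` probes the VIP listener, and `tcp_port_open("cicd-ac-k8s-cp-01", 6443)` probes one backend.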
18. Kubernetes Bootstrap Order
The approved build order is:
1. Verify that 192.168.8.200-.219 are unused.
2. Review VM CPU, RAM, and disk capacity against Proxmox capacity.
3. Add router DHCP reservations for 192.168.8.201-.216.
4. Reserve 192.168.8.200 for the API VIP and .217-.219 for shared infrastructure.
5. Terraform creates the load balancer, control planes, and workers.
6. Validate every VM name, VM ID, MAC, scsi0 disk, and reserved address.
7. Ansible applies the common Ubuntu baseline.
8. Ansible prepares all Kubernetes nodes.
9. Ansible configures HAProxy and Keepalived.
10. Validate VIP ownership, DNS resolution, the HAProxy listener, and the API backend health-check configuration.
11. Bootstrap cicd-ac-k8s-cp-01 with kubeadm init.
12. Install kubeconfig for acllc and administrative automation.
13. Install Calico v3.32.0.
14. Join cicd-ac-k8s-cp-02.
15. Join cicd-ac-k8s-cp-03.
16. Join production workers at 192.168.8.205-.208.
17. Join QA workers at 192.168.8.209-.212.
18. Join development workers at 192.168.8.213-.216.
19. Label and taint workers by environment and workload.
20. Verify cluster health.
21. Install the shared ARC 0.14.2 controller once with two replicas.
22. Onboard each GitHub organization and validate GitHub App access.
23. Create product and environment namespaces.
24. Create the GitHub App Kubernetes Secret at runtime.
25. Install one repository-specific ARC runner scale set per environment.
26. Connect application workflow build and deploy jobs to the same environment scale set.
Join commands, bootstrap tokens, certificate keys, GitHub App private keys, and kubeconfig files must not be committed to Git.
Generate or retrieve them at runtime and store them only in:
Temporary protected files
Ansible Vault when applicable
GitHub encrypted secrets
GitHub Environment secrets
Another approved secret manager
19. Worker Labels and Scheduling
The existing Ansible configuration applies these worker labels:
environment=dev
environment=qa
environment=prod
workload=github-runner
Development workers:
environment=dev
workload=github-runner
taint: environment=dev:NoSchedule
QA workers:
environment=qa
workload=github-runner
taint: environment=qa:NoSchedule
Production workers:
environment=prod
workload=github-runner
taint: environment=prod:NoSchedule
Runner scale sets use nodeSelector and matching tolerations so runner pods run only on the correct environment workers.
Example:
template:
  spec:
    nodeSelector:
      environment: dev
      workload: github-runner
    tolerations:
      - key: environment
        operator: Equal
        value: dev
        effect: NoSchedule
The enforced scheduling policy is:
Dev runners cannot schedule on QA or production workers.
QA runners cannot schedule on development or production workers.
Production runners cannot schedule on development or QA workers.
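The policy follows from combining nodeSelector with taints and tolerations. The Python sketch below is a simplified model of that rule, not the Kubernetes scheduler: it handles only the `Equal` operator and treats every taint as `NoSchedule`. `worker_node`, `runner_pod`, and `can_schedule` are hypothetical names.

```python
def worker_node(environment: str) -> dict:
    """Model a worker with the labels and taint described above."""
    return {
        "labels": {"environment": environment, "workload": "github-runner"},
        "taints": [{"key": "environment", "value": environment, "effect": "NoSchedule"}],
    }


def runner_pod(environment: str) -> dict:
    """Model a runner pod with the matching nodeSelector and toleration."""
    return {
        "nodeSelector": {"environment": environment, "workload": "github-runner"},
        "tolerations": [{"key": "environment", "operator": "Equal",
                         "value": environment, "effect": "NoSchedule"}],
    }


def can_schedule(pod: dict, node: dict) -> bool:
    """True when the node matches the selector and every taint is tolerated."""
    selector_ok = all(node["labels"].get(key) == value
                      for key, value in pod["nodeSelector"].items())

    def tolerated(taint: dict) -> bool:
        return any(
            t["operator"] == "Equal"
            and t["key"] == taint["key"]
            and t["value"] == taint["value"]
            and t["effect"] == taint["effect"]
            for t in pod["tolerations"]
        )

    return selector_ok and all(tolerated(taint) for taint in node["taints"])
```

The selector keeps a runner off other environments' workers, and the taint keeps workloads without the toleration off these workers.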
20. Shared Cluster Components
After the Kubernetes control plane and workers are healthy, install the Git-managed shared components:
Calico v3.32.0
Shared ARC controller 0.14.2
Helm v3.21.0 for ARC administration
The current ARC controller configuration uses:
Namespace: arc-systems
Helm release: arc
Controller replicas: 2
Control-plane node placement
Pod anti-affinity preference across control-plane nodes
Metrics Server, ingress, storage classes, and certificate management should be installed only when a later requirement needs them.
The ARC controller is shared and installed once.
Do not install a separate ARC controller per product, repository, or environment unless a specific technical or security requirement justifies it.
21. Build Runner Scale Sets
Build jobs run on repository-and-environment ARC scale sets.
Build runner pods require:
Repository source-code access
GitHub App access for cross-repository checkout
Harbor CI push/pull credentials
Docker Buildx
Docker-in-Docker
GitHub Actions cache access
SBOM generation
Build provenance support
The implemented FP values use:
containerMode:
  type: dind
minRunners: 0
maxRunners: 4
Implemented FP scale sets:
fp-web-ui-001-dev-arc
fp-web-ui-001-qa-arc
fp-web-ui-001-prod-arc
This provides:
Scale to zero when idle
Automatic runner creation
Up to four concurrent runner pods per scale set
Runner disposal after each job
Reduced long-lived runner risk
Each product, repository, and environment should have an isolated runner scale set when permissions, worker placement, credentials, or scaling limits differ.
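The effect of `minRunners: 0` and `maxRunners: 4` can be illustrated with a clamp. This Python sketch is a simplified model of the bounds only, not ARC's actual scaling logic; `desired_runners` is a hypothetical name.

```python
def desired_runners(pending_jobs: int, min_runners: int = 0, max_runners: int = 4) -> int:
    """Clamp the number of runner pods between min_runners and max_runners."""
    if pending_jobs < 0:
        raise ValueError("pending_jobs must not be negative")
    return max(min_runners, min(pending_jobs, max_runners))
```

With the FP values, an idle scale set has no runner pods, and a burst of more than four jobs is served four at a time.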
22. Deploy Runner Scale Sets
The current design does not create a second deploy-only scale set for the same repository and environment.
The deploy job uses the same runs-on value as the build job, but ARC creates a second ephemeral runner pod after the build job completes.
jobs:
  build:
    runs-on: fp-web-ui-001-dev-arc
  deploy:
    needs: build
    runs-on: fp-web-ui-001-dev-arc
Deploy runner pods require:
SSH client
curl
Environment-specific SSH private key
Harbor credentials
Network access to the target VM
Health-check tooling
The deploy job connects to the target VM and runs Docker commands remotely. The DinD sidecar exists because it is configured at the shared scale-set level, but the remote deploy logic does not use it.
Use separate environment-specific SSH keys.
Development key:
accepted only by development servers
QA key:
accepted only by QA servers
Production key:
accepted only by production servers
Private deployment keys must never be:
Committed to Git
Stored in the base VM template
Shared across all environments
Printed in workflow logs
Copied into application source files
23. Harbor Isolation
Harbor already exists.
Registry: harbor.aspireclan.com
Use one Harbor project per product.
FP: fp-ci-cd
Shelvera: ts-ci-cd
Future product example: abc-ci-cd
Each Harbor product project must have at least:
CI robot account
Used by build jobs and by the current first FP deployment workflow.
Permissions:
Pull
Push
For fp-web-ui-001, store the credential in the application repository's environment as:
HARBOR_USERNAME
HARBOR_PASSWORD
Runtime robot account
A pull-only runtime robot may be introduced later when the application workflow and target-host credential model are separated from the CI robot.
Recommended future permissions:
Pull only
Do not substitute a pull-only runtime robot for the current build credential because the build job must push images to Harbor.
For stronger future environment isolation, separate runtime robots may use names such as:
fp-dev-runtime
fp-qa-runtime
fp-prod-runtime
24. GitHub Branch Strategy
The infrastructure repository contains:
main
local
dev
qa
prod
The current working infrastructure workflows use dev and prod as their execution branches.
main
Protected baseline branch
Not used by the current shared-cluster apply workflows
No force push
No branch deletion
The local branch is retained for local-only work and is not used by the current automated infrastructure apply model.
dev
Active validation branch
Terraform format, validation, and plan
Ansible inventory and syntax validation
ARC and tenant configuration validation
No shared-cluster apply or configuration
qa
Reserved promotion branch
Not currently targeted by the newer shared-infrastructure workflows
No shared-cluster apply
May be used later for additional pre-production validation
prod
Approved infrastructure application branch
Terraform plan and apply
Ansible configuration
Shared ARC controller installation
GitHub organization onboarding
Repository runner scale-set installation
GitHub Environment approval where configured
No force push
No branch deletion
Current execution flow:
feature branch
↓ merge or push
dev
↓ validation only
prod
↓ validation and apply/configuration
The newer workflows intentionally do not use pull_request events. Validation occurs on the resulting push to dev, and apply/configuration occurs on the resulting push to prod.
Application repositories use direct branch-to-environment mapping:
dev branch → development
qa branch → QA
prod branch → production
Shared Kubernetes infrastructure must not be independently recreated by each application environment branch.
25. GitHub Environments
Create the following environments in the infrastructure repository:
shared-k8s
arc-org-fp
Future organizations follow the organization-credential convention:
arc-org-<product-short-form>
Application repositories create their own deployment environments:
dev
qa
prod
shared-k8s
Used for:
Terraform apply
Ansible cluster bootstrap
HAProxy and Keepalived configuration
Control-plane changes
Worker-node changes
Shared ARC controller installation
Require manual approval when the environment is used for changes to the shared cluster.
Development tenant environments
The fp-web-ui-001 repository's dev environment contains:
FP_CI_APP_ID
FP_CI_APP_PRIVATE_KEY
HARBOR_USERNAME
HARBOR_PASSWORD
DEPLOY_SSH_PRIVATE_KEY
QA tenant environments
The application repository's qa environment follows the same secret pattern but uses QA-specific deployment credentials and protection rules.
Production tenant environments
The application repository's prod environment follows the same secret pattern but uses production-specific credentials.
Require:
Manual reviewers
Protected prod branch
Branch restriction
No self-approval where practical
The infrastructure repository's arc-org-fp environment stores the FP GitHub App onboarding and runner-scale-set credentials. It is not the same environment as the application repository's dev, qa, or prod environments.
26. GitHub Secrets and Variables
Shared infrastructure secrets
Store these in the ac-cicd-infra repository environment used by the shared-cluster workflows:
PM_API_TOKEN_ID
PM_API_TOKEN_SECRET
The Ansible SSH key is an existing protected file on the trusted self-hosted runner:
~/.ssh/id_ed25519_ansible
Do not use the Proxmox root password in GitHub Actions.
Use a restricted Proxmox API token.
Shared non-sensitive variables
PM_API_URL=https://192.168.8.23:8006/api2/json
The current Terraform stack contains the fixed values for:
Proxmox node: pve
Storage: local-lvm
Bridge: vmbr0
Template: tmplt-ub-26-min-base
FP deployment secrets
In each application repository environment:
DEPLOY_SSH_PRIVATE_KEY
Use a different private key for development, QA, and production.
FP Harbor secrets
In each application repository environment:
HARBOR_USERNAME
HARBOR_PASSWORD
The Harbor robot used by the build job must have Pull and Push permissions for fp-ci-cd.
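A minimal sketch of how a build job might authenticate to Harbor with these secrets. It assumes HARBOR_USERNAME and HARBOR_PASSWORD are exposed to the job as environment variables. The function name harbor_login is hypothetical.

```shell
# Sketch: log in to Harbor without placing the password on the command line.
# Assumes HARBOR_USERNAME and HARBOR_PASSWORD are set in the job environment.
harbor_login() {
  registry="${1:-harbor.aspireclan.com}"
  if [ -z "${HARBOR_USERNAME:-}" ] || [ -z "${HARBOR_PASSWORD:-}" ]; then
    echo "HARBOR_USERNAME and HARBOR_PASSWORD must be set" >&2
    return 1
  fi
  # --password-stdin keeps the secret out of argv and process listings.
  printf '%s' "$HARBOR_PASSWORD" |
    docker login --username "$HARBOR_USERNAME" --password-stdin "$registry"
}
```

Reading the password from standard input also avoids the warning that docker login prints when a password is passed with --password.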
The infrastructure repository's arc-org-fp environment uses these variables:
ARC_GITHUB_ORGANIZATION
ARC_GITHUB_APP_CLIENT_ID
ARC_GITHUB_APP_ID
ARC_GITHUB_APP_INSTALLATION_ID
and this secret:
ARC_GITHUB_APP_PRIVATE_KEY
The fp-web-ui-001 application repository environments use:
FP_CI_APP_ID
FP_CI_APP_PRIVATE_KEY
Secrets in one repository or GitHub Environment are not automatically available to another repository or environment.
27. Infrastructure Workflows
Keep workflows focused instead of creating one oversized workflow.
Current cleaned-source workflows:
terraform-plan-shared-k8s.yml
terraform-apply-shared-k8s.yml
ansible-configure-shared-k8s.yml
ansible-configure-load-balancer.yml
ansible-configure-control-planes.yml
ansible-configure-dev-workers.yml
ansible-configure-qa-workers.yml
ansible-configure-prod-workers.yml
ansible-install-arc-controller.yml
onboard-fp-github-org.yml
arc-fp-web-ui-001-dev.yml
arc-fp-web-ui-001-qa.yml
arc-fp-web-ui-001-prod.yml
Current branch behavior:
| Branch | Validation | Terraform plan | Shared infrastructure apply/configuration |
|---|---|---|---|
| dev | Yes | Yes for Terraform changes | No |
| qa | Not targeted by the newer workflows | No | No |
| prod | Yes | Yes | Yes, through the applicable GitHub Environment |
| main | Not targeted by the newer workflows | No | No |
The Ansible and ARC workflows run validation jobs on both dev and prod, but their configure/install jobs are conditionally restricted to prod.
The Terraform plan workflow runs on dev. The Terraform apply workflow runs on prod.
There is no standalone validate.yml in the cleaned source. Validation remains embedded in the focused Terraform, Ansible, organization-onboarding, and ARC workflows.
Product-specific ARC scale-set definitions map application branches to their matching runner names:
| Application branch | Runner scale set |
|---|---|
| dev | fp-web-ui-001-dev-arc |
| qa | fp-web-ui-001-qa-arc |
| prod | fp-web-ui-001-prod-arc |
28. Application Repository Responsibilities
The infrastructure repository creates and maintains runners and target infrastructure.
Each application repository contains its own application build/deployment definition.
Example:
fp-web-ui-001/
├── Dockerfile
├── application source
└── .github/
    └── workflows/
        └── build-deploy-dev-web.yml
The application workflow selects the repository-and-environment ARC scale set.
dev branch:
build job → fp-web-ui-001-dev-arc
deploy job → fp-web-ui-001-dev-arc
qa branch:
build job → fp-web-ui-001-qa-arc
deploy job → fp-web-ui-001-qa-arc
prod branch:
build job → fp-web-ui-001-prod-arc
deploy job → fp-web-ui-001-prod-arc
Even though both jobs use the same runs-on value, ARC creates a separate ephemeral pod for each job.
The first FP workflow checks out:
fp-web-ui-001
fp-001
fp-ci-actions
The combined build context is:
buildctx/
├── Web/
│   └── FP/
│       └── fp-web-ui-001/
└── CommonModules/
29. FP Image Naming and Deployment
Use immutable commit SHA tags.
harbor.aspireclan.com/fp-ci-cd/dev-fp-web-ui-001:<COMMIT_SHA>
harbor.aspireclan.com/fp-ci-cd/qa-fp-web-ui-001:<COMMIT_SHA>
harbor.aspireclan.com/fp-ci-cd/prod-fp-web-ui-001:<COMMIT_SHA>
Do not deploy using only a mutable tag such as latest.
A convenience environment tag may also be published, but deployment and rollback records must use the immutable SHA.
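The naming scheme can be sketched as a helper that composes the image reference and refuses anything that is not a hex commit SHA. The registry, project, and repository names come from this document. The function name image_ref and the validation rule are illustrative assumptions.

```shell
# Sketch: compose the immutable image reference from environment and commit SHA.
# image_ref is a hypothetical helper, not part of the existing workflow.
image_ref() {
  env_name="$1"
  commit_sha="$2"
  case "$env_name" in
    dev|qa|prod) ;;
    *) echo "unknown environment: $env_name" >&2; return 1 ;;
  esac
  # Reject empty tags and anything containing non-hex characters (for example "latest").
  case "$commit_sha" in
    ''|*[!0-9a-f]*) echo "tag must be a lowercase hex commit SHA" >&2; return 1 ;;
  esac
  printf 'harbor.aspireclan.com/fp-ci-cd/%s-fp-web-ui-001:%s\n' "$env_name" "$commit_sha"
}
```

In a GitHub Actions job the second argument would normally be the commit SHA that triggered the run, so the same reference can be recorded for deployment and rollback.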
Example deployment flow:
Build image with commit SHA
Push image to Harbor
Record deployed SHA
SSH to target VM
Pull exact SHA image
Start replacement container
Run health check
Promote replacement when healthy
Restore previous SHA when unhealthy
30. fp-web-ui-001 Application Details
Application: fp-web-ui-001
Type: ASP.NET Core MVC
Target framework: .NET 10
Build-context project path: Web/FP/fp-web-ui-001
Dockerfile path: buildctx/Web/FP/fp-web-ui-001/Dockerfile
Container port: 8080
Initial development deployment:
Target VM: dev-web-01
Target IP: 192.168.8.120
Container name: fp-web-ui-001-dev
Host port: 8080
Container port: 8080
URL: http://192.168.8.120:8080/
Restart policy: unless-stopped
Optional env file: /etc/fp/fp-web-ui-001/dev.env
When the environment file is absent, the deployment supplies:
ASPNETCORE_ENVIRONMENT=Development
ASPNETCORE_URLS=http://+:8080
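The fallback can be sketched as a helper that emits the docker run arguments, one per line. It is intended to run on the target VM, where the optional environment file lives. The function name container_env_args is hypothetical, and the real workflow may build these arguments differently.

```shell
# Sketch: choose the container environment arguments for docker run.
# Prints one argument per line; container_env_args is a hypothetical name.
container_env_args() {
  env_file="${1:-/etc/fp/fp-web-ui-001/dev.env}"
  if [ -f "$env_file" ]; then
    # Prefer the operator-managed environment file when it exists.
    printf '%s\n' "--env-file" "$env_file"
  else
    # Otherwise fall back to the two defaults named in this section.
    printf '%s\n' "-e" "ASPNETCORE_ENVIRONMENT=Development" \
                  "-e" "ASPNETCORE_URLS=http://+:8080"
  fi
}
```

Emitting one argument per line keeps the values easy to inspect in logs, and neither default contains a secret.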
The application currently uses HTTPS redirection.
Because the development deployment serves plain HTTP, the application must not apply UseHttpsRedirection() when it runs in the Development environment without an HTTPS endpoint.
31. Target Application VM Preparation
Ansible prepares application target VMs such as:
dev-web-01
qa-web-01
prod-web-01
Required configuration:
Docker Engine
Docker Compose plugin
Deployment public SSH key
Harbor CA trust when required
Harbor credential used by the approved workflow
Port and firewall configuration
Container deployment directories
Log rotation
curl
Health-check utilities
Passwordless sudo for acllc
Before binding host port 8080:
sudo ss -lntp | grep ':8080 ' || true
docker ps --format 'table {{.Names}}\t{{.Ports}}'
The deployment process must:
Record the previous image
Pull the immutable SHA image
Remove and recreate the container
Run the health check
Restore the previous image when the new deployment fails
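The five steps above can be sketched as shell functions meant to run on the target VM. This is a simplified illustration under stated assumptions. The function names, the retry counts, and the HEALTH_ATTEMPTS and HEALTH_SLEEP variables are hypothetical. The 8080:8080 port mapping and the unless-stopped restart policy follow the development example in section 30. Harbor login and environment arguments are omitted for brevity.

```shell
# Sketch of the replace-and-rollback sequence (all function names are hypothetical).
deploy_with_rollback() {
  name="$1"        # container name, for example fp-web-ui-001-dev
  new_image="$2"   # full image reference with the immutable SHA tag
  health_url="$3"  # for example http://127.0.0.1:8080/

  # 1. Record the image of the current container (empty when none exists).
  previous=$(docker inspect --format '{{.Config.Image}}' "$name" 2>/dev/null || true)

  # 2. Pull the exact SHA image before touching the running container.
  docker pull "$new_image" || return 1

  # 3. Remove and recreate the container.
  start_container "$name" "$new_image"

  # 4. Run the health check.
  if check_health "$health_url"; then
    echo "deployed $new_image"
    return 0
  fi

  # 5. Restore the previous image when the new deployment fails.
  echo "health check failed for $new_image" >&2
  if [ -n "$previous" ]; then
    start_container "$name" "$previous"
    echo "restored $previous" >&2
  fi
  return 1
}

start_container() {
  docker rm -f "$1" >/dev/null 2>&1 || true
  docker run -d --name "$1" --restart unless-stopped -p 8080:8080 "$2" >/dev/null
}

check_health() {
  # Retry a fixed number of times; curl -f treats HTTP error statuses as failures.
  attempts="${HEALTH_ATTEMPTS:-10}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    curl -fsS --max-time 3 "$1" >/dev/null 2>&1 && return 0
    i=$((i + 1))
    sleep "${HEALTH_SLEEP:-3}"
  done
  return 1
}
```

Because the function returns non-zero after a rollback, the deploy job still fails visibly even though the previous image has been restored.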
Runtime credentials should be installed or supplied securely and must not be embedded in application images.
32. Repository Initialization
Create the repository locally on prod-terraform-deploy-02.
mkdir -p ~/ac-cicd-infra
cd ~/ac-cicd-infra
git init -b main
Create the top-level folders according to the approved structure.
Create the initial commit:
git add .
git commit -m "Initialize shared CI/CD infrastructure repository"
Add the private GitHub repository remote:
git remote add origin git@github.com:ASPIRECLAN-LLC-Org/ac-cicd-infra.git
Push the default branch:
git push -u origin main
Create the retained branches:
git switch -c local
git push -u origin local
git switch main
git switch -c dev
git push -u origin dev
git switch main
git switch -c qa
git push -u origin qa
git switch main
git switch -c prod
git push -u origin prod
git switch dev
The current automation targets dev for validation and prod for apply/configuration. The local, qa, and main branches are retained but are not current shared-cluster apply branches.
33. Branch Protection
dev
Require pull request before merge when the team workflow requires it
Require at least one approval
Require Terraform and Ansible validation checks
Disallow force pushes
Disallow branch deletion
The workflows themselves run on the resulting push to dev; they do not require a pull_request trigger.
qa
Retain as a protected promotion branch
Require pull request before merge when used
Disallow force pushes
Disallow branch deletion
No current shared-cluster apply workflow
prod
Require pull request before merge
Require at least one approval
Require applicable validation checks
Disallow direct pushes where practical
Disallow force pushes
Disallow branch deletion
Protect the shared-k8s and arc-org-* environments
The workflows run apply/configuration on the resulting push to prod.
main
Protected baseline branch
Require pull request before merge
Disallow direct pushes
Disallow force pushes
Disallow branch deletion
No current shared-cluster apply workflow
The current execution path is:
feature branch → dev validation → prod apply/configuration
34. Required .gitignore Rules
The repository must exclude Terraform state, private keys, tokens, kubeconfig files, generated join commands, local secrets, environment files, generated reports, and editor files.
# Terraform
**/.terraform/*
*.tfstate
*.tfstate.*
*.tfplan
crash.log
crash.*.log
.terraform.lock.hcl.backup
# Local Terraform variables and credentials
*.auto.tfvars
*.auto.tfvars.json
terraform.tfvars
!terraform.tfvars.example
# Ansible
*.retry
ansible/.vault-password
ansible/vault-password*
ansible/inventories/**/host_vars/*/secrets.yml
# Kubernetes
kubeconfig
kubeconfig.*
admin.conf
*.kubeconfig
# SSH, certificates, and private keys
*.pem
*.key
*.p8
id_rsa
id_rsa.pub
id_ed25519
id_ed25519.pub
# Temporary bootstrap secrets
join-command*
certificate-key*
bootstrap-token*
*.token
*.secret
# Environment files
.env
.env.*
!.env.example
# Local ARC credential working folders
.arc-secrets/
arc-secrets/
# Logs and generated output
*.log
artifacts/
reports/
# Editors and operating systems
.DS_Store
Thumbs.db
.vscode/
.idea/
# Visual Studio
.vs/
*.suo
*.user
*.userosscache
*.sln.docstates
*.VC.db
*.VC.VC.opendb
Commit the generated .terraform.lock.hcl file.
Ignore only lock-file backups, not the actual dependency lock file.
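One way to confirm these rules is to ask Git directly from the repository root. The sketch below uses git check-ignore, which exits with 0 when a path is ignored and 1 when it is not. The function name and the sample paths are illustrative, and the paths do not need to exist.

```shell
# Sketch: verify a few ignore rules from the repository root.
# check_ignore_rules is a hypothetical helper; extend the path lists as needed.
check_ignore_rules() {
  rc=0
  # Paths that must be ignored.
  for path in terraform.tfstate terraform.tfvars kubeconfig .env id_ed25519; do
    if ! git check-ignore -q "$path"; then
      echo "not ignored but should be: $path" >&2
      rc=1
    fi
  done
  # The dependency lock file must remain trackable.
  if git check-ignore -q .terraform.lock.hcl; then
    echo "ignored but should be committed: .terraform.lock.hcl" >&2
    rc=1
  fi
  return "$rc"
}
```

Running the check in a validation job would catch an accidental rule such as .terraform.lock.hcl being added to the ignore list.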
35. Documentation Standardization
Existing Kubernetes and CI/CD documents must be standardized on:
Ubuntu Server 26.04 LTS
Template tmplt-ub-26-min-base
DHCP inside Ubuntu
Router-side MAC-to-IP reservations
Kubernetes v1.36.1
Calico v3.32.0
3 control-plane nodes
4 development workers at 192.168.8.213-.216
4 QA workers at 192.168.8.209-.212
4 production workers at 192.168.8.205-.208
ARC controller 0.14.2 with two replicas
One repository-specific scale set per environment
minRunners: 0
maxRunners: 4
Harbor project fp-ci-cd for FP
Harbor project ts-ci-cd for Shelvera
Remove or update outdated references to:
Ubuntu 24.04
tmplt-ub-24-min
Static IP configuration inside Ubuntu
Development workers at 192.168.8.205-.208
Production workers at 192.168.8.213-.216
8-vCPU / 32-GB workers
Permanent build VMs
Permanent deploy VMs
Separate build and deploy scale sets for the same repository/environment
maxRunners: 5
Shelvera-only shared infrastructure naming
Environment-first folders without a product boundary
36. Security Rules
The following rules are mandatory:
No private SSH keys in Git
No kubeconfig files in Git
No Terraform state in Git
No join tokens in Git
No plaintext Harbor passwords in values files
No production secrets available to development workflows
No shared deployment SSH key across dev, QA, and prod
No Harbor push permission for runtime robots
No Proxmox root password in GitHub Actions
No static IP duplication between the router and Netplan
No Terraform remote-exec for cluster configuration
No mutable-only image deployment
Recommended improvements:
Use a restricted Proxmox API token
Use remote Terraform state with locking
Use GitHub Environment approvals
Use Ansible Vault or another secret manager
Use managed SSH known_hosts entries
Use Kubernetes taints and tolerations
Generate SBOM and provenance
Sign container images later
Scan images before deployment
Require production approval
Record deployed commit SHA
Keep rollback images available
37. Complete Implementation Order
Proceed in this exact sequence:
1. Verify that 192.168.8.200-.219 are unused in the router and network.
2. Review Proxmox RAM and local-lvm capacity for the Terraform-defined sizing.
3. Create or restore the private ac-cicd-infra GitHub repository from the cleaned source.
4. Confirm the cleaned folder structure shown in section 9.
5. Create or restore main, local, dev, qa, and prod branches.
6. Configure branch protection and GitHub Environments.
7. Prepare /var/lib/ac-cicd-infra/terraform-state/shared-k8s on prod-terraform-deploy-02.
8. Add router DHCP reservations for 192.168.8.201-.216.
9. Reserve .200 as the API VIP and .217-.219 for future shared infrastructure.
10. Review the reusable proxmox-vm and proxmox-vm-group Terraform modules.
11. Review the shared-k8s Terraform stack with the approved names, MACs, VM IDs, and sizes.
12. Run terraform fmt, validate, and plan from dev.
13. Review the complete Terraform plan.
14. Merge or push the approved code to prod and apply the 16 VMs.
15. Validate VM names, VM IDs, MAC addresses, scsi0 disks, and reserved IPs.
16. Verify the Ansible shared-k8s inventory and group variables.
17. Run the common Ubuntu baseline.
18. Configure HAProxy and Keepalived.
19. Prepare containerd and Kubernetes prerequisites.
20. Bootstrap the first control plane.
21. Install Calico v3.32.0.
22. Join the additional control planes.
23. Join production workers at .205-.208 with 07-join-prod-workers.yml.
24. Join QA workers at .209-.212 with 07-join-qa-workers.yml.
25. Join development workers at .213-.216 with 07-join-dev-workers.yml.
26. Apply worker labels and environment taints.
27. Verify the complete cluster.
28. Install the shared ARC 0.14.2 controller with two replicas.
29. Create and configure the FP GitHub App.
30. Add the arc-org-fp GitHub Environment credentials.
31. Run FP organization onboarding validation.
32. Create the FP development, QA, and production namespaces.
33. Install fp-web-ui-001-dev-arc.
34. Install fp-web-ui-001-qa-arc.
35. Install fp-web-ui-001-prod-arc.
36. Prepare dev-web-01 at 192.168.8.120 as the first deployment target.
37. Add or restore the fp-web-ui-001 Dockerfile and build-deploy-dev-web.yml workflow.
38. Add the five fp-web-ui-001 dev environment secrets.
39. Build and push the immutable development image to Harbor.
40. Deploy fp-web-ui-001 to 192.168.8.120:8080.
41. Validate health checks and rollback.
42. Promote the application pipeline pattern to QA.
43. Promote the application pipeline pattern to production.
44. Add Shelvera tenant directories only when Shelvera onboarding begins.
45. Retire the legacy Shelvera CI/CD cluster only after successful migration.
46. Add each future product using the same product-first structure when needed.
38. Source Consistency Status and Rebuild Starting Point
The following values are fixed in the cleaned Terraform and supporting infrastructure source:
Architecture: one shared Kubernetes cluster
Shared ARC controller: version 0.14.2, two replicas
VM naming convention: cicd-ac-k8s-*
Kubernetes API DNS: cicd-ac-k8s-api.aspireclan.com
Kubernetes API VIP: 192.168.8.200
Load balancer: 192.168.8.201
Control planes: 192.168.8.202-.204
Production workers: 192.168.8.205-.208
QA workers: 192.168.8.209-.212
Development workers: 192.168.8.213-.216
All 16 Proxmox VM IDs: 3156201-3156216
All 16 VM MAC addresses shown in section 7.6
DHCP inside Ubuntu with router-side MAC reservations
Terraform for VM creation only
Ansible for operating-system and Kubernetes configuration
One repository-specific ARC scale set per environment
The cleaned source consistency checks are:
Development join playbook: 07-join-dev-workers.yml
QA join playbook: 07-join-qa-workers.yml
Production join playbook: 07-join-prod-workers.yml
Legacy generic development-worker join filename: not used
Standalone validate.yml workflow: not present
Empty future-product and Shelvera scaffolding: not present
Empty application-VM, scripts, and docs scaffolding: not present
Unused docker-runtime-host and firewall roles: not present
Active FP scale-set files: present for dev, QA, and prod
The Terraform resource allocation is:
Load balancer: 2 vCPU / 4 GB / 40 GB
Each control plane: 4 vCPU / 8 GB / 100 GB
Each worker: 4 vCPU / 16 GB / 250 GB
Total: 62 allocated vCPU / 220 GB RAM / 3,340 GB virtual disk
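The totals can be re-derived with shell arithmetic from the per-VM sizes and the topology of 1 load balancer, 3 control planes, and 12 workers. The variable names are illustrative.

```shell
# Worked check of the allocation totals for the 16-VM topology.
load_balancers=1
control_planes=3
workers=12   # 4 production + 4 QA + 4 development

vcpu=$((load_balancers * 2 + control_planes * 4 + workers * 4))
ram_gb=$((load_balancers * 4 + control_planes * 8 + workers * 16))
disk_gb=$((load_balancers * 40 + control_planes * 100 + workers * 250))

echo "VMs: $((load_balancers + control_planes + workers))"
echo "Total: ${vcpu} vCPU / ${ram_gb} GB RAM / ${disk_gb} GB virtual disk"
# Prints:
# VMs: 16
# Total: 62 vCPU / 220 GB RAM / 3340 GB virtual disk
```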
When rebuilding the environment from scratch, begin with:
1. Verify that 192.168.8.200-.219 are unused in the router lease and reservation tables.
2. Verify available Proxmox RAM and local-lvm storage.
3. Add the 16 DHCP reservations from section 7.6.
4. Reserve 192.168.8.200 for the API VIP and .217-.219 for shared infrastructure.
5. Prepare prod-terraform-deploy-02 and its self-hosted infrastructure runner.
6. Prepare the local Terraform state directory.
7. Restore the cleaned ac-cicd-infra source, branches, and GitHub Environments.
8. Run the dev Terraform validation and plan workflow.
9. Review the plan.
10. Run the prod Terraform apply workflow.
Do not apply Terraform until the router collision check and Proxmox capacity review are complete.
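As a supplementary check, the planned range can be probed from a machine on the same network. This is a rough sketch that assumes the Linux iputils ping (where -W is a timeout in seconds). The function names are hypothetical. A reply proves an address is in use, but silence does not prove it is free, so the router lease and reservation tables remain the authoritative source.

```shell
# Sketch: list the planned addresses 192.168.8.200-.219.
planned_addresses() {
  i=200
  while [ "$i" -le 219 ]; do
    echo "192.168.8.$i"
    i=$((i + 1))
  done
}

# Probe each planned address once; returns non-zero when any address replies.
# Assumes Linux iputils ping options (-c count, -W timeout in seconds).
probe_addresses() {
  in_use=0
  for ip in $(planned_addresses); do
    if ping -c 1 -W 1 "$ip" >/dev/null 2>&1; then
      echo "in use: $ip" >&2
      in_use=1
    fi
  done
  return "$in_use"
}
```

Calling probe_addresses before the Terraform apply gives a quick signal. Hosts that drop ICMP will not be detected this way.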
39. Continuation Prompt
Use the following prompt when continuing implementation in a new chat:
We are rebuilding the ac-cicd-infra environment from scratch using the existing Terraform, Ansible, Kubernetes, Helm, and GitHub Actions source as the source of truth. Do not redesign or modify the working infrastructure code unless I explicitly approve a code change.
The design uses one shared Kubernetes cluster and one shared ARC 0.14.2 controller with two replicas. Ephemeral ARC runner pods perform both build and deployment jobs. There are no permanent development, QA, or production build/deploy VMs.
All Kubernetes VMs are cloned from tmplt-ub-26-min-base. Terraform creates Proxmox VMs only. Ansible configures Ubuntu, HAProxy, Keepalived, containerd, Kubernetes v1.36.1, Calico v3.32.0, node joins, labels, taints, the shared ARC controller, and runner scale sets.
The API endpoint is cicd-ac-k8s-api.aspireclan.com:443 with VIP 192.168.8.200. The load balancer is .201, control planes are .202-.204, production workers are .205-.208, QA workers are .209-.212, and development workers are .213-.216. The 16 VM MAC addresses and VM IDs are the values in section 7.6.
The Terraform sizing is: load balancer 2 vCPU/4 GB/40 GB; each control plane 4 vCPU/8 GB/100 GB; each worker 4 vCPU/16 GB/250 GB. The complete topology is 62 vCPU, 220 GB RAM, and 3,340 GB virtual disk.
Use DHCP inside Ubuntu and router-side MAC reservations. Do not configure static Netplan addresses. Keep the Terraform boot disk at scsi0, local-lvm, and at least 40G.
The infrastructure repository uses dev for validation and Terraform plan, and prod for apply/configuration. The trusted controller is prod-terraform-deploy-02 at 192.168.8.93, and the local Terraform state path is /var/lib/ac-cicd-infra/terraform-state/shared-k8s/terraform.tfstate.
FP uses GitHub organization fp-001-org, Harbor project fp-ci-cd, namespaces arc-runners-fp-dev, arc-runners-fp-qa, and arc-runners-fp-prod, and scale sets fp-web-ui-001-dev-arc, fp-web-ui-001-qa-arc, and fp-web-ui-001-prod-arc. Each application build and deploy job uses the same environment scale set, but ARC creates a separate ephemeral runner pod for each job.
The first deployment target is dev-web-01 at 192.168.8.120. The development container is fp-web-ui-001-dev, and the application is reached at http://192.168.8.120:8080/.
Begin by verifying that .200-.219 are unused, adding the 16 router DHCP reservations, checking Proxmox capacity, preparing prod-terraform-deploy-02, restoring the repository and GitHub Environments, running the dev Terraform plan, reviewing it, and then applying from prod.