
K8S CI/CD Infrastructure Overview

1. Purpose

This document is the single reference for the Aspireclan shared CI/CD infrastructure project.

The repository will provide reusable infrastructure for:

  • FP
  • Shelvera
  • Future Aspireclan products
  • Multiple GitHub repositories per product
  • Development, QA, and production environments

The solution will use:

  • Proxmox VE for virtual machines
  • Terraform for VM provisioning
  • Ansible for operating-system and Kubernetes configuration
  • Kubernetes for shared CI/CD execution
  • GitHub Actions Runner Controller (ARC) for ephemeral runner pods
  • Harbor for private container image storage
  • GitHub Environments for environment-specific approvals and secrets

The shared infrastructure repository will be named:


ac-cicd-infra

This is not a Visual Studio solution. It is a private GitHub repository containing Terraform, Ansible, Kubernetes manifests, Helm values, scripts, documentation, and GitHub Actions workflows.


2. Final Approved Architecture

The approved design uses one shared Kubernetes cluster and one shared ARC controller.

There will be no permanent application build or deployment VMs such as:


dev-build-01
qa-build-01
prod-build-01

dev-deploy-01
qa-deploy-01
prod-deploy-01

Instead, ephemeral ARC runner pods will perform both build and deployment operations.


GitHub push
  ↓
ARC build runner pod
  ├── checks out the repository
  ├── builds the Docker image
  ├── performs validation and security checks
  └── pushes the image to Harbor
  ↓
ARC deploy runner pod
  ├── connects to the target VM through SSH
  ├── authenticates to Harbor with pull-only credentials
  ├── pulls the immutable image
  ├── replaces the running container
  ├── performs a health check
  └── rolls back when deployment fails

The deployed applications continue to run on normal Proxmox VMs.

Examples:


dev-web-01
qa-web-01
prod-web-01

dev-app-01
qa-app-01
prod-app-01

The Terraform/Ansible controller VM remains responsible for infrastructure provisioning and configuration only.


prod-terraform-deploy-02

After ARC is operational, this VM must not be used as the normal FP, Shelvera, or future-product application build/deployment runner.


3. First FP Implementation

The first application pipeline will be:


fp-web-ui-001 dev branch
  ↓
Build job on fp-web-ui-001-dev-arc
  ↓
harbor.aspireclan.com/fp-ci-cd/dev-fp-web-ui-001:<COMMIT_SHA>
  ↓
Deploy job on a second fp-web-ui-001-dev-arc runner pod
  ↓
dev-web-01
  ↓
http://192.168.8.120:8080/

The build and deployment jobs use the same repository-and-environment runner scale set name, but ARC creates a separate ephemeral runner pod for each job.
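The image reference in the flow above appears to follow a fixed pattern: registry, Harbor project, environment-prefixed repository name, and a commit-SHA tag. The following is a minimal sketch of that pattern, not the actual workflow code. The helper name `buildImageRef`, the validation rules, and the assumption that the tag is a full 40-character SHA are illustrative; the document only shows the dev example.

```javascript
// Hypothetical helper; the name and validation rules are illustrative.
const REGISTRY = 'harbor.aspireclan.com';
const ENVIRONMENTS = ['dev', 'qa', 'prod'];

const buildImageRef = ({ project, environment, repository, commitSha }) => {
  if (!ENVIRONMENTS.includes(environment)) {
    throw new Error(`Unknown environment: ${environment}`);
  }
  // Assumption: the tag is a full 40-character lowercase Git commit SHA.
  if (!/^[0-9a-f]{40}$/.test(String(commitSha))) {
    throw new Error(`Invalid commit SHA: ${commitSha}`);
  }
  return `${REGISTRY}/${project}/${environment}-${repository}:${commitSha}`;
};

// Placeholder SHA for illustration only.
const exampleSha = '0123456789abcdef0123456789abcdef01234567';

console.log(
  buildImageRef({
    project: 'fp-ci-cd',
    environment: 'dev',
    repository: 'fp-web-ui-001',
    commitSha: exampleSha,
  }),
);
```

Rejecting anything that is not a commit SHA keeps mutable tags such as `latest` out of the deploy path, which is what makes the image reference immutable.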

Later environment promotion will use:


fp-web-ui-001 qa branch
  ↓
Build and deploy jobs on fp-web-ui-001-qa-arc
  ↓
qa-web-01

fp-web-ui-001 prod branch
  ↓
Build and deploy jobs on fp-web-ui-001-prod-arc
  ↓
prod-web-01

4. Current Infrastructure

4.1 Proxmox


Proxmox node: pve
Proxmox API: https://192.168.8.23:8006
VM storage: local-lvm
Network bridge: vmbr0

4.2 Base VM Template


Template name: tmplt-ub-26-min-base
Operating system: Ubuntu Server 26.04 LTS minimized
Networking: DHCP inside Ubuntu
Address assignment: router-side MAC reservation
Cloud-Init: not used
Template boot disk: scsi0
Template disk size: 40 GiB
Template storage: local-lvm

The template includes:


OpenSSH Server
QEMU Guest Agent
Docker Engine
Docker CLI
containerd
Docker Buildx
Docker Compose plugin
Essential Linux utilities
Template-safe DHCP Netplan
SSH host-key regeneration on first clone boot
Machine-ID cleanup
Ansible controller public key for acllc
Passwordless sudo for acllc

The private Ansible SSH key must remain only on the trusted controller VM. It must never be copied into the VM template, committed to Git, or stored in an unencrypted file in the repository.

4.3 Terraform and Ansible Controller


VM: prod-terraform-deploy-02
IP: 192.168.8.93
Runner: prod-ac-cicd-infra-deploy-rnr-01

Responsibilities:


Run Terraform
Run Ansible
Create Proxmox infrastructure VMs
Configure the common Ubuntu baseline
Configure HAProxy and Keepalived
Bootstrap Kubernetes
Install common cluster services
Install the shared ARC controller
Onboard GitHub organizations
Install repository-specific ARC runner scale sets

Expected tooling:


GitHub self-hosted runner
Terraform 1.15.5
Ansible
Node.js
Docker
SSH automation key at ~/.ssh/id_ed25519_ansible
Access to Proxmox
Access to Terraform-created VMs

4.4 Existing FP Development Target


VM: dev-web-01
IP: 192.168.8.120
User: acllc
Docker: installed
SSH: working
Passwordless sudo: working
Container host port: 8080
Provisioned by Terraform
Configured by Ansible

5. Network Design

Terraform assigns every VM a permanent unique MAC address.

The router maps that MAC address to a reserved IP address.

Ubuntu remains configured for DHCP.


Terraform VM MAC
  ↓
Router DHCP reservation
  ↓
Reserved IP returned to Ubuntu through DHCP

Do not configure the same address statically inside Ubuntu Netplan.

Every VM requires a unique:


VM ID
MAC address
Reserved IP
Hostname

The VM template MAC is reserved only for template preparation:


AA:BB:CC:01:01:01

Do not run the template-build VM and a clone using the same MAC address at the same time.


6. Critical Terraform Disk Rule

The template boot disk is:


Bus: SCSI
Slot: scsi0
Storage: local-lvm
Size: 40 GiB

Terraform must either omit the disk block and inherit the template disk or explicitly define a matching or larger disk.


disk {
  slot      = "scsi0"
  storage   = "local-lvm"
  disk_size = "40G"
}

Never configure a Terraform disk smaller than the template disk.

A smaller disk specification previously caused Proxmox to detach the actual Ubuntu disk as an unused disk and attach a new empty disk as scsi0.
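The disk rule can be checked before `terraform apply`. The sketch below is one possible pre-apply guard, not part of the repository. The function names are hypothetical, and it assumes sizes are always written as whole gibibytes with a `G` suffix, as in the disk block above.

```javascript
// Hypothetical pre-apply check for the disk rule in this section.
const TEMPLATE_DISK_GIB = 40; // template boot disk size stated above

// Parses sizes written like "40G".
// Assumption: only whole-gibibyte values with a "G" suffix are used.
const parseGib = (size) => {
  const match = String(size).match(/^(\d+)G$/);
  if (!match) {
    throw new Error(`Unsupported disk size format: ${size}`);
  }
  return Number(match[1]);
};

// True when the requested disk is at least as large as the template disk.
const isDiskSizeSafe = (size) => parseGib(size) >= TEMPLATE_DISK_GIB;

console.log(isDiskSizeSafe('40G'));  // true
console.log(isDiskSizeSafe('100G')); // true
console.log(isDiskSizeSafe('32G'));  // false
```

Failing on an unrecognized format, rather than guessing, avoids silently approving a size the check did not understand.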


7. Shared Kubernetes Topology and Approved Allocation

The shared Kubernetes cluster consists of:


1 Kubernetes API load balancer
3 Kubernetes control-plane nodes
4 development worker nodes
4 QA worker nodes
4 production worker nodes

All 16 VMs will be cloned from:


tmplt-ub-26-min-base

A separate Kubernetes VM template is not required. Kubernetes-specific configuration will be performed through Ansible.

7.1 Approved VM naming convention

The new shared platform uses this collision-free convention:


cicd-ac-k8s-<environment>-<role>-<number>

Shared components that do not belong to an application environment omit the environment segment:


cicd-ac-k8s-<role>-<number>

Segments:


cicd  = CI/CD platform
ac    = Aspireclan
k8s   = Kubernetes
lb    = load balancer
cp    = control plane
wk    = worker
dev   = development worker pool
qa    = QA worker pool
prod  = production worker pool
01    = sequential instance number

This naming is intentionally different from the existing Shelvera Kubernetes CI/CD setup. The existing setup can remain online during migration and can be discarded after Shelvera is moved to the shared architecture.

Approved names:


cicd-ac-k8s-lb-01

cicd-ac-k8s-cp-01
cicd-ac-k8s-cp-02
cicd-ac-k8s-cp-03

cicd-ac-k8s-dev-wk-01
cicd-ac-k8s-dev-wk-02
cicd-ac-k8s-dev-wk-03
cicd-ac-k8s-dev-wk-04

cicd-ac-k8s-qa-wk-01
cicd-ac-k8s-qa-wk-02
cicd-ac-k8s-qa-wk-03
cicd-ac-k8s-qa-wk-04

cicd-ac-k8s-prod-wk-01
cicd-ac-k8s-prod-wk-02
cicd-ac-k8s-prod-wk-03
cicd-ac-k8s-prod-wk-04

7.2 Approved network block

The current LAN is:


Network: 192.168.8.0/22
Subnet mask: 255.255.252.0
Gateway: 192.168.8.1

The new shared Kubernetes allocation starts at 192.168.8.200 and stays below the template address at 192.168.8.254.

Approved address plan:


192.168.8.200       Kubernetes API virtual IP
192.168.8.201       Current API load balancer
192.168.8.202-.204  Control-plane nodes
192.168.8.205-.208  Production workers
192.168.8.209-.212  QA workers
192.168.8.213-.216  Development workers
192.168.8.217       Reserved for a future second API load balancer
192.168.8.218-.219  Reserved for future shared Kubernetes infrastructure
192.168.8.254       Unavailable; assigned to tmplt-ub-26-min-base

The approved cluster consumes 17 addresses: the API VIP plus 16 VM addresses. Within the .200-.219 block, .217 is held for the future second load balancer, which leaves two further addresses (.218-.219) for shared infrastructure.
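The address arithmetic can be reproduced from the plan itself. This is an illustrative sketch only: the `ADDRESS_PLAN` structure and the `vm` flag are assumptions introduced here, while the ranges and purposes are taken from the address plan above.

```javascript
// Address plan from this section, keyed by the last octet of 192.168.8.x.
// The `vm` flag (hypothetical) marks ranges that are assigned to real VMs.
const ADDRESS_PLAN = [
  { from: 200, to: 200, use: 'Kubernetes API virtual IP', vm: false },
  { from: 201, to: 201, use: 'Current API load balancer', vm: true },
  { from: 202, to: 204, use: 'Control-plane nodes', vm: true },
  { from: 205, to: 208, use: 'Production workers', vm: true },
  { from: 209, to: 212, use: 'QA workers', vm: true },
  { from: 213, to: 216, use: 'Development workers', vm: true },
  { from: 217, to: 217, use: 'Reserved: future second API load balancer', vm: false },
  { from: 218, to: 219, use: 'Reserved: future shared infrastructure', vm: false },
];

// Counts the addresses covered by a list of inclusive ranges.
const countAddresses = (rows) =>
  rows.reduce((sum, row) => sum + (row.to - row.from + 1), 0);

const vmAddresses = countAddresses(ADDRESS_PLAN.filter((row) => row.vm));
const consumed = vmAddresses + 1; // VM addresses plus the API VIP
const blockSize = countAddresses(ADDRESS_PLAN);
const reserved = blockSize - consumed;

console.log({ vmAddresses, consumed, blockSize, reserved });
// { vmAddresses: 16, consumed: 17, blockSize: 20, reserved: 3 }
```

The three reserved addresses are .217 for the future load balancer and .218-.219 for later shared infrastructure.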

Before provisioning, verify that no current DHCP lease, router reservation, VM, container, network appliance, or physical device uses 192.168.8.200-.219.

7.3 Kubernetes API endpoint

Approved DNS name and virtual IP:


DNS: cicd-ac-k8s-api.aspireclan.com
VIP: 192.168.8.200
URL: https://cicd-ac-k8s-api.aspireclan.com:443

192.168.8.200 is not assigned to a normal DHCP client and is not a separate VM. It is the Kubernetes API virtual IP managed by Keepalived on the load-balancer tier.

Initial implementation:


VIP 192.168.8.200
  ↓
cicd-ac-k8s-lb-01 at 192.168.8.201
  ↓
cicd-ac-k8s-cp-01:6443
cicd-ac-k8s-cp-02:6443
cicd-ac-k8s-cp-03:6443

The first implementation has one load-balancer VM, so the API tier still has a temporary single point of failure. Reserve 192.168.8.217 for a future cicd-ac-k8s-lb-02. When that VM is introduced, Keepalived can move 192.168.8.200 between both load balancers without changing the Kubernetes control-plane endpoint.

7.4 Collision-free MAC convention

The established general MAC scheme uses these environment segments:


local = 01
dev   = 02
qa    = 03
prod  = 04

The new shared Kubernetes platform uses dedicated segments so its MAC addresses do not collide with the existing Shelvera Kubernetes CI/CD setup:


cicd-shared = 05
cicd-dev    = 06
cicd-qa     = 07
cicd-prod   = 08

The existing category segments are retained:


k8s-cp = 0e
k8s-wk = 0f
k8s-lb = 14

MAC format:


aa:bb:cc:<platform-environment>:<category>:<machine-number>

Examples:


cicd-ac-k8s-lb-01      aa:bb:cc:05:14:01
cicd-ac-k8s-cp-01      aa:bb:cc:05:0e:01
cicd-ac-k8s-dev-wk-01  aa:bb:cc:06:0f:01
cicd-ac-k8s-qa-wk-01   aa:bb:cc:07:0f:01
cicd-ac-k8s-prod-wk-01 aa:bb:cc:08:0f:01

These 05-08 platform/environment segments are reserved for the new shared CI/CD Kubernetes cluster.
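The MAC format can be expressed as a small builder. The following is a sketch under stated assumptions rather than the repository's implementation: the name `buildMac` is hypothetical, and the machine number is assumed to be rendered as two hexadecimal digits (the approved examples only use 01 to 04, where decimal and hexadecimal coincide).

```javascript
// Segment tables copied from this section.
const PLATFORM_SEGMENTS = new Map([
  ['cicd-shared', '05'],
  ['cicd-dev', '06'],
  ['cicd-qa', '07'],
  ['cicd-prod', '08'],
]);

const CATEGORY_SEGMENTS = new Map([
  ['k8s-cp', '0e'],
  ['k8s-wk', '0f'],
  ['k8s-lb', '14'],
]);

// Hypothetical builder for aa:bb:cc:<platform-environment>:<category>:<machine-number>.
const buildMac = (platform, category, machineNumber) => {
  const platformSegment = PLATFORM_SEGMENTS.get(platform);
  const categorySegment = CATEGORY_SEGMENTS.get(category);

  if (!platformSegment || !categorySegment) {
    throw new Error(`Unknown platform or category: ${platform}, ${category}`);
  }
  if (!Number.isInteger(machineNumber) || machineNumber < 1 || machineNumber > 255) {
    throw new Error(`Machine number out of range: ${machineNumber}`);
  }

  // Assumption: the machine number is written as two hex digits.
  const machineSegment = machineNumber.toString(16).padStart(2, '0');
  return `aa:bb:cc:${platformSegment}:${categorySegment}:${machineSegment}`;
};

console.log(buildMac('cicd-shared', 'k8s-lb', 1)); // aa:bb:cc:05:14:01
console.log(buildMac('cicd-dev', 'k8s-wk', 1));    // aa:bb:cc:06:0f:01
console.log(buildMac('cicd-prod', 'k8s-wk', 4));   // aa:bb:cc:08:0f:04
```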

7.5 Proxmox VM ID convention

All Proxmox VM IDs continue to begin with:


3156

For a VM in 192.168.8.x, append the decimal last octet to 3156:


VM ID = "3156" followed by the decimal last IP octet (concatenation, not addition)

Examples:


192.168.8.201 → 3156201
192.168.8.216 → 3156216

The VIP does not receive a VM ID:


192.168.8.200 → no VM ID

Reference implementation:


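// Builds the Proxmox VM ID for a VM in 192.168.8.x by appending the
// decimal last octet to the fixed prefix 3156. Returns '' when the
// address is not in 192.168.8.x or the last octet is not accepted.
// The API VIP (192.168.8.200) has no VM ID, so callers must not pass it.
export const buildVmId = (ip) => {
  const match = String(ip).match(/^192\.168\.8\.(\d{1,3})$/);

  if (!match) {
    return '';
  }

  const lastOctet = Number(match[1]);

  // Accept only last octets 1-254.
  if (lastOctet < 1 || lastOctet > 254) {
    return '';
  }

  // String concatenation, not addition: 201 becomes "3156201".
  return `3156${lastOctet}`;
};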

7.6 Approved 16-VM identity allocation

The following names, addresses, MACs, and VM IDs are approved for the new shared cluster.

MAC address        VM name                 IP address     Kubernetes role    Environment  VM ID
aa:bb:cc:05:14:01  cicd-ac-k8s-lb-01       192.168.8.201  API load balancer  Shared       3156201
aa:bb:cc:05:0e:01  cicd-ac-k8s-cp-01       192.168.8.202  Control plane      Shared       3156202
aa:bb:cc:05:0e:02  cicd-ac-k8s-cp-02       192.168.8.203  Control plane      Shared       3156203
aa:bb:cc:05:0e:03  cicd-ac-k8s-cp-03       192.168.8.204  Control plane      Shared       3156204
aa:bb:cc:08:0f:01  cicd-ac-k8s-prod-wk-01  192.168.8.205  ARC worker         Production   3156205
aa:bb:cc:08:0f:02  cicd-ac-k8s-prod-wk-02  192.168.8.206  ARC worker         Production   3156206
aa:bb:cc:08:0f:03  cicd-ac-k8s-prod-wk-03  192.168.8.207  ARC worker         Production   3156207
aa:bb:cc:08:0f:04  cicd-ac-k8s-prod-wk-04  192.168.8.208  ARC worker         Production   3156208
aa:bb:cc:07:0f:01  cicd-ac-k8s-qa-wk-01    192.168.8.209  ARC worker         QA           3156209
aa:bb:cc:07:0f:02  cicd-ac-k8s-qa-wk-02    192.168.8.210  ARC worker         QA           3156210
aa:bb:cc:07:0f:03  cicd-ac-k8s-qa-wk-03    192.168.8.211  ARC worker         QA           3156211
aa:bb:cc:07:0f:04  cicd-ac-k8s-qa-wk-04    192.168.8.212  ARC worker         QA           3156212
aa:bb:cc:06:0f:01  cicd-ac-k8s-dev-wk-01   192.168.8.213  ARC worker         Development  3156213
aa:bb:cc:06:0f:02  cicd-ac-k8s-dev-wk-02   192.168.8.214  ARC worker         Development  3156214
aa:bb:cc:06:0f:03  cicd-ac-k8s-dev-wk-03   192.168.8.215  ARC worker         Development  3156215
aa:bb:cc:06:0f:04  cicd-ac-k8s-dev-wk-04   192.168.8.216  ARC worker         Development  3156216
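Because names, addresses, MACs, and VM IDs all follow the conventions in sections 7.1 to 7.5, the 16-VM allocation can be regenerated and checked for duplicates. The sketch below is an illustration, not the Terraform source. The `GROUPS` structure and the helper names are hypothetical, and the values are taken from the approved allocation in this section.

```javascript
// One entry per VM group, in address order (values from the approved allocation).
const GROUPS = [
  { prefix: 'cicd-ac-k8s-lb', platform: '05', category: '14', firstOctet: 201, count: 1 },
  { prefix: 'cicd-ac-k8s-cp', platform: '05', category: '0e', firstOctet: 202, count: 3 },
  { prefix: 'cicd-ac-k8s-prod-wk', platform: '08', category: '0f', firstOctet: 205, count: 4 },
  { prefix: 'cicd-ac-k8s-qa-wk', platform: '07', category: '0f', firstOctet: 209, count: 4 },
  { prefix: 'cicd-ac-k8s-dev-wk', platform: '06', category: '0f', firstOctet: 213, count: 4 },
];

const pad2 = (n) => String(n).padStart(2, '0');

// Expands every group into one identity record per VM.
const buildAllocation = () =>
  GROUPS.flatMap((group) =>
    Array.from({ length: group.count }, (_, index) => {
      const octet = group.firstOctet + index;
      const number = pad2(index + 1);
      return {
        name: `${group.prefix}-${number}`,
        ip: `192.168.8.${octet}`,
        mac: `aa:bb:cc:${group.platform}:${group.category}:${number}`,
        vmId: `3156${octet}`,
      };
    }),
  );

// Returns the values of `field` that occur more than once.
const findDuplicates = (rows, field) => {
  const seen = new Set();
  const duplicates = new Set();
  for (const row of rows) {
    if (seen.has(row[field])) {
      duplicates.add(row[field]);
    }
    seen.add(row[field]);
  }
  return [...duplicates];
};

const allocation = buildAllocation();
console.log(allocation.length); // 16

for (const field of ['name', 'ip', 'mac', 'vmId']) {
  console.log(field, findDuplicates(allocation, field)); // each prints an empty array
}
```

A check like this could run before provisioning so that a copy-and-paste error in one identity field is caught before Proxmox or the router sees it.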

7.7 Proxmox tags and Kubernetes labels

VM group             Proxmox tags                                                 Kubernetes labels
Load balancer        ac-cicd;shared-k8s;load-balancer;terraform;ansible           Not a Kubernetes node
Control planes       ac-cicd;shared-k8s;control-plane;terraform;ansible           Kubernetes control-plane labels and taints managed by kubeadm
Development workers  ac-cicd;shared-k8s;worker;dev;arc-runner;terraform;ansible   environment=dev, workload=github-runner
QA workers           ac-cicd;shared-k8s;worker;qa;arc-runner;terraform;ansible    environment=qa, workload=github-runner
Production workers   ac-cicd;shared-k8s;worker;prod;arc-runner;terraform;ansible  environment=prod, workload=github-runner

7.8 Initial resource-sizing proposal

The identity allocation above is approved. The following resource sizes match the existing terraform/stacks/shared-k8s/main.tf implementation. The load-balancer disk remains at the mandatory template minimum of 40G.

Kubernetes VM Resource Allocation

VM group             Count  vCPU per VM  RAM per VM  Disk per VM  Disk location
API load balancer    1      2            4 GB        40 GB        scsi0 on local-lvm
Control planes       3      4            8 GB        100 GB       scsi0 on local-lvm
Development workers  4      4            16 GB       250 GB       scsi0 on local-lvm
QA workers           4      4            16 GB       250 GB       scsi0 on local-lvm
Production workers   4      4            16 GB       250 GB       scsi0 on local-lvm

Full-topology totals:


VM count: 16
Allocated vCPU: 62
Allocated RAM: 220 GB
Allocated virtual disk: 3,340 GB

CPU may be overcommitted according to the Proxmox host capacity and workload profile, but RAM and available local-lvm storage must be reviewed before Terraform apply. Docker-in-Docker builds can consume substantial temporary disk space on worker nodes.
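The full-topology totals follow directly from the per-group sizes. This short sketch reproduces the arithmetic so the totals can be rechecked whenever a group size changes; the `VM_GROUPS` structure is illustrative and the numbers are those in the resource allocation table.

```javascript
// Per-group sizes from the resource allocation table in this section.
const VM_GROUPS = [
  { group: 'API load balancer', count: 1, vcpu: 2, ramGb: 4, diskGb: 40 },
  { group: 'Control planes', count: 3, vcpu: 4, ramGb: 8, diskGb: 100 },
  { group: 'Development workers', count: 4, vcpu: 4, ramGb: 16, diskGb: 250 },
  { group: 'QA workers', count: 4, vcpu: 4, ramGb: 16, diskGb: 250 },
  { group: 'Production workers', count: 4, vcpu: 4, ramGb: 16, diskGb: 250 },
];

// Multiplies each per-VM size by the group count and sums across groups.
const totals = VM_GROUPS.reduce(
  (acc, g) => ({
    vmCount: acc.vmCount + g.count,
    vcpu: acc.vcpu + g.count * g.vcpu,
    ramGb: acc.ramGb + g.count * g.ramGb,
    diskGb: acc.diskGb + g.count * g.diskGb,
  }),
  { vmCount: 0, vcpu: 0, ramGb: 0, diskGb: 0 },
);

console.log(totals);
// { vmCount: 16, vcpu: 62, ramGb: 220, diskGb: 3340 }
```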

7.9 Router reservation rules

Create router-side DHCP reservations for these 16 VM addresses:


192.168.8.201-.216

Do not create a normal DHCP reservation for:


192.168.8.200

That address is the API VIP owned by Keepalived.

Keep these addresses unused for future shared infrastructure:


192.168.8.217  Future cicd-ac-k8s-lb-02
192.168.8.218  Reserved
192.168.8.219  Reserved

Do not use:


192.168.8.254  tmplt-ub-26-min-base

Do not configure any of these addresses statically inside Ubuntu Netplan. Ubuntu must continue to use DHCP, and the router must return the reserved address based on the Terraform-assigned MAC.

7.10 Pre-provisioning collision checks

Run these checks before adding the router reservations or creating the VMs:


for ip in $(seq 200 219); do
  ping -c 1 -W 1 "192.168.8.${ip}" >/dev/null 2>&1 && \
    echo "IN USE OR RESPONDING: 192.168.8.${ip}"
done

ip neigh show | grep -E '192\.168\.8\.(20[0-9]|21[0-9])' || true

Also inspect the router DHCP lease and reservation lists. A device that blocks ICMP may not respond to ping, so router verification is mandatory.

8. Repository Strategy

Use one private GitHub repository:


ac-cicd-infra

Repository description:


Shared Proxmox, Terraform, Ansible, Kubernetes and ARC CI/CD infrastructure for Aspireclan applications.

The cleaned repository contains two active categories.

Shared infrastructure


Reusable Proxmox VM modules
Shared Kubernetes Terraform stack
HAProxy and Keepalived configuration
Kubernetes control-plane and worker configuration
Calico CNI configuration
Shared ARC controller
Shared infrastructure GitHub Actions workflows

FP tenant infrastructure


FP GitHub organization onboarding configuration
FP development, QA, and production namespaces
FP repository-specific ARC runner scale sets
FP Helm values
FP tenant Ansible playbooks
FP organization and runner-scale-set workflows

The active product-specific path convention is:


<product>/<environment>

Current examples:


fp/dev
fp/qa
fp/prod

When Shelvera or another Aspireclan product is onboarded, create its product-specific directories from the same pattern at that time. Do not keep empty product, application-VM, documentation, script, or add-on scaffolding in the repository before it is needed.

This product-first structure prevents resources from different products from becoming mixed together.


9. Approved Repository Folder Structure

The following structure matches the cleaned infrastructure source. It contains only the files and directories currently required to rebuild the shared Kubernetes CI/CD platform and the first FP runner scale sets.


ac-cicd-infra/
├── terraform/
│   ├── modules/
│   │   ├── proxmox-vm/
│   │   │   ├── main.tf
│   │   │   ├── variables.tf
│   │   │   ├── outputs.tf
│   │   │   └── versions.tf
│   │   └── proxmox-vm-group/
│   │       ├── main.tf
│   │       ├── variables.tf
│   │       ├── outputs.tf
│   │       └── versions.tf
│   └── stacks/
│       └── shared-k8s/
│           ├── backend.tf
│           ├── main.tf
│           ├── outputs.tf
│           ├── providers.tf
│           ├── terraform.tfvars.example
│           └── variables.tf
│
├── ansible/
│   ├── ansible.cfg
│   ├── inventories/
│   │   └── shared-k8s/
│   │       ├── hosts.ini
│   │       └── group_vars/
│   │           ├── all.yml
│   │           ├── control_planes.yml
│   │           ├── dev_workers.yml
│   │           ├── load_balancers.yml
│   │           ├── prod_workers.yml
│   │           └── qa_workers.yml
│   ├── playbooks/
│   │   ├── shared-k8s/
│   │   │   ├── 01-common-baseline.yml
│   │   │   ├── 02-configure-load-balancer.yml
│   │   │   ├── 03-prepare-kubernetes-nodes.yml
│   │   │   ├── 04-bootstrap-first-control-plane.yml
│   │   │   ├── 05-join-control-planes.yml
│   │   │   ├── 06-install-cni.yml
│   │   │   ├── 07-join-dev-workers.yml
│   │   │   ├── 07-join-prod-workers.yml
│   │   │   ├── 07-join-qa-workers.yml
│   │   │   ├── 08-label-and-taint-workers.yml
│   │   │   ├── 08-label-and-taint-prod-workers.yml
│   │   │   ├── 08-label-and-taint-qa-workers.yml
│   │   │   └── 09-install-arc-controller.yml
│   │   └── tenants/
│   │       └── fp/
│   │           ├── dev/install-fp-web-ui-001-runners.yml
│   │           ├── qa/install-fp-web-ui-001-runners.yml
│   │           └── prod/install-fp-web-ui-001-runners.yml
│   └── roles/
│       ├── arc-controller/
│       ├── arc-runner-scale-set/
│       ├── common/
│       ├── containerd/
│       ├── haproxy/
│       ├── keepalived/
│       ├── kubernetes-common/
│       ├── kubernetes-control-plane/
│       └── kubernetes-worker/
│
├── kubernetes/
│   ├── common/
│   │   └── cni/
│   │       └── calico-custom-resources.yaml
│   └── tenants/
│       └── fp/
│           ├── organization/
│           │   └── config.yaml
│           ├── dev/
│           │   ├── namespace/namespace.yaml
│           │   └── runner-scale-sets/fp-web-ui-001.yaml
│           ├── qa/
│           │   ├── namespace/namespace.yaml
│           │   └── runner-scale-sets/fp-web-ui-001.yaml
│           └── prod/
│               ├── namespace/namespace.yaml
│               └── runner-scale-sets/fp-web-ui-001.yaml
│
├── helm/
│   ├── common/
│   │   └── arc-controller/
│   │       └── values.yaml
│   └── tenants/
│       └── fp/
│           ├── dev/fp-web-ui-001-values.yaml
│           ├── qa/fp-web-ui-001-values.yaml
│           └── prod/fp-web-ui-001-values.yaml
│
├── .github/
│   ├── workflows/
│   │   ├── terraform-plan-shared-k8s.yml
│   │   ├── terraform-apply-shared-k8s.yml
│   │   ├── ansible-configure-shared-k8s.yml
│   │   ├── ansible-configure-load-balancer.yml
│   │   ├── ansible-configure-control-planes.yml
│   │   ├── ansible-configure-dev-workers.yml
│   │   ├── ansible-configure-qa-workers.yml
│   │   ├── ansible-configure-prod-workers.yml
│   │   ├── ansible-install-arc-controller.yml
│   │   ├── onboard-fp-github-org.yml
│   │   ├── arc-fp-web-ui-001-dev.yml
│   │   ├── arc-fp-web-ui-001-qa.yml
│   │   └── arc-fp-web-ui-001-prod.yml
│   └── CODEOWNERS
│
├── .editorconfig
├── .gitattributes
├── .gitignore
└── README.md

The cleaned source intentionally does not contain empty placeholders for Shelvera, future products, application-VM stacks, scripts, documentation, Metrics Server, ingress, certificate management, or unused build/deploy runner directories.

Create those directories only when a real implementation requires them.


10. Multi-Product Isolation Model

The cluster infrastructure remains shared.


terraform/stacks/shared-k8s/
ansible/inventories/shared-k8s/
ansible/playbooks/shared-k8s/
kubernetes/common/
helm/common/

The current FP tenant uses isolated resources under:


ansible/playbooks/tenants/fp/<environment>/
kubernetes/tenants/fp/<environment>/
helm/tenants/fp/<environment>/

When another product is onboarded, create the same product-first paths:


ansible/playbooks/tenants/<product>/<environment>/
kubernetes/tenants/<product>/<environment>/
helm/tenants/<product>/<environment>/

Application-VM Terraform stacks and Ansible inventories should be added only when application-VM provisioning is implemented in this repository.

Isolation boundaries include:


Kubernetes namespaces
GitHub App credentials when organizations differ
Harbor projects
Harbor robot accounts
Repository-specific ARC runner scale sets
Deployment SSH keys
GitHub Environments
GitHub secrets and variables
Helm values
Product-specific tenant playbooks

Shared infrastructure does not mean shared credentials.

A development workflow must not be able to access QA or production credentials.

An FP workflow must not be able to access Shelvera or future-product credentials.

11. ARC Namespace and Runner Naming

11.1 FP

Namespaces:


arc-runners-fp-dev
arc-runners-fp-qa
arc-runners-fp-prod

The implemented FP repository-specific scale sets are:


fp-web-ui-001-dev-arc
fp-web-ui-001-qa-arc
fp-web-ui-001-prod-arc

Each application workflow uses the same scale set for its build and deploy jobs. ARC creates a new ephemeral runner pod for every job.

Future FP repositories follow the same convention:


fp-ai-srvc-001-dev-arc
fp-ai-srvc-001-qa-arc
fp-ai-srvc-001-prod-arc

11.2 Shelvera

Shelvera uses the existing short form:


ts

Namespaces:


arc-runners-ts-dev
arc-runners-ts-qa
arc-runners-ts-prod

Repository-specific runner scale sets:


ts-gw-srvc-001-dev-arc
ts-data-srvc-001-dev-arc
ts-web-ui-001-dev-arc

The same repository-and-environment naming pattern applies to QA and production.

11.3 Future Product

Assume a future product short form of:


abc

Namespaces:


arc-runners-abc-dev
arc-runners-abc-qa
arc-runners-abc-prod

Example repository-specific runner scale sets:


abc-api-srvc-001-dev-arc
abc-web-ui-001-dev-arc

The product short form must be selected before creating namespaces, Harbor projects, secrets, and runner scale sets.
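The namespace and scale-set names in this section follow two simple patterns, sketched below. The helper names are hypothetical. The DNS-label check reflects the general Kubernetes rule for namespace names (lowercase alphanumerics and hyphens, at most 63 characters) and is not specific to this repository.

```javascript
const ENVIRONMENTS = ['dev', 'qa', 'prod'];

// Kubernetes namespace names must be valid DNS labels.
const DNS_LABEL = /^[a-z0-9]([-a-z0-9]*[a-z0-9])?$/;

const assertEnvironment = (environment) => {
  if (!ENVIRONMENTS.includes(environment)) {
    throw new Error(`Unknown environment: ${environment}`);
  }
};

// arc-runners-<product>-<environment>
const buildNamespace = (product, environment) => {
  assertEnvironment(environment);
  const name = `arc-runners-${product}-${environment}`;
  if (name.length > 63 || !DNS_LABEL.test(name)) {
    throw new Error(`Invalid namespace name: ${name}`);
  }
  return name;
};

// <repository>-<environment>-arc
const buildScaleSetName = (repository, environment) => {
  assertEnvironment(environment);
  return `${repository}-${environment}-arc`;
};

console.log(buildNamespace('fp', 'dev'));                // arc-runners-fp-dev
console.log(buildNamespace('ts', 'prod'));               // arc-runners-ts-prod
console.log(buildScaleSetName('fp-web-ui-001', 'qa'));   // fp-web-ui-001-qa-arc
console.log(buildScaleSetName('ts-gw-srvc-001', 'dev')); // ts-gw-srvc-001-dev-arc
```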

12. Terraform Responsibilities

Terraform manages Proxmox infrastructure resources only.

For every VM, Terraform manages:


VM name
VM ID
Proxmox node
Source template
Full clone mode
CPU cores
Memory
Disk
MAC address
Network bridge
QEMU Guest Agent option
Start-at-boot setting
Tags

Terraform must not:


Install Kubernetes packages
Run kubeadm
Configure HAProxy
Join nodes to Kubernetes
Install ARC
Use remote-exec for operating-system configuration
Store plaintext credentials in committed tfvars files

Avoid Terraform remote-exec provisioners in general, and never use them for operating-system configuration.

Ansible is responsible for post-provisioning configuration.


13. Terraform Provider Requirement

Every Terraform module using proxmox_vm_qemu must explicitly declare the provider.


terraform {
  required_providers {
    proxmox = {
      source  = "Telmate/proxmox"
      version = "3.0.1-rc9"
    }
  }
}

Without this declaration, Terraform may attempt to resolve the nonexistent provider:


hashicorp/proxmox

Initial pinned versions:


Terraform: 1.15.5
Proxmox provider: Telmate/proxmox 3.0.1-rc9

14. Terraform State Design

The shared Kubernetes infrastructure has one authoritative Terraform state.


shared-k8s

The current working implementation uses the local backend declared in:


terraform/stacks/shared-k8s/backend.tf

The workflows explicitly use this state path on prod-terraform-deploy-02:


/var/lib/ac-cicd-infra/terraform-state/shared-k8s/terraform.tfstate

Do not create independent dev, qa, and prod Terraform states for the same shared Kubernetes control plane.

Application VM states remain separate from the shared Kubernetes state.

Examples:


/var/lib/ac-cicd-infra/terraform-state/shared-k8s/terraform.tfstate

ac-cicd-infra/application-vms/fp/dev/terraform.tfstate
ac-cicd-infra/application-vms/fp/qa/terraform.tfstate
ac-cicd-infra/application-vms/fp/prod/terraform.tfstate

The local state directory must exist, be writable by the self-hosted runner account, be backed up, and be protected from other users.

A secured remote backend with locking may replace the local backend later, but the Terraform code and workflows must be changed together before that migration.

Terraform state files must never be committed to Git.

15. Ansible Responsibilities

Ansible configures the operating system and Kubernetes after Terraform creates the VMs.

All playbooks and roles must be idempotent.

Common baseline responsibilities


Set permanent hostname
Maintain /etc/hosts where required
Refresh the APT cache before package installation
Install required utilities
Verify OpenSSH
Verify QEMU Guest Agent
Verify time synchronization
Verify passwordless sudo
Create /var/tmp/ansible-acllc with mode 0700
Handle required reboots
Verify networking and DNS

The inventory explicitly pins Python:


ansible_python_interpreter=/usr/bin/python3

Example host definition matching the current inventory:


cicd-ac-k8s-cp-01 ansible_host=192.168.8.202 ansible_user=acllc node_primary_ip=192.168.8.202 node_interface=ens18

The trusted controller uses:


~/.ssh/id_ed25519_ansible

The current ansible.cfg keeps host-key checking enabled, disables fact-variable injection, and uses pipelining and a dedicated remote temporary directory. Relevant settings include:


remote_tmp = /var/tmp/ansible-acllc
inject_facts_as_vars = False
host_key_checking = True

Populate known_hosts before execution rather than disabling SSH host verification globally.

16. Kubernetes Node Preparation

The Kubernetes common role must:


Disable swap immediately
Remove swap from /etc/fstab
Load the overlay kernel module
Load the br_netfilter kernel module
Configure bridge netfilter sysctl settings
Enable IPv4 forwarding
Install and configure containerd
Set containerd SystemdCgroup=true
Validate the containerd CRI plugins
Install kubelet
Install kubeadm
Install kubectl where required
Hold Kubernetes packages
Configure crictl
Enable kubelet
Reboot when required

Pinned cluster values in the existing source:


Kubernetes: v1.36.1
Kubernetes package: 1.36.1-1.1
Kubernetes repository: v1.36
kubeadm API: kubeadm.k8s.io/v1beta4
CRI socket: unix:///run/containerd/containerd.sock
Pod CIDR: 10.244.0.0/16
Service CIDR: 10.96.0.0/12
DNS domain: cluster.local
Calico: v3.32.0
Calico encapsulation: VXLAN
Calico BGP: disabled

No interactive nano or manual file-editing steps should remain.

All configuration files should be rendered through Ansible templates or copied from Git-managed source files.

17. Kubernetes API Load Balancer

The first load-balancer VM will run both HAProxy and Keepalived.


VM: cicd-ac-k8s-lb-01
Reserved DHCP address: 192.168.8.201
Keepalived VIP: 192.168.8.200
API DNS: cicd-ac-k8s-api.aspireclan.com
API URL: https://cicd-ac-k8s-api.aspireclan.com:443

Keepalived owns the API VIP. HAProxy listens on the VIP and forwards Kubernetes API traffic to:


cicd-ac-k8s-cp-01:6443
cicd-ac-k8s-cp-02:6443
cicd-ac-k8s-cp-03:6443

Create or update the internal DNS record so cicd-ac-k8s-api.aspireclan.com resolves to 192.168.8.200.

The initial API tier contains only one load-balancer VM and therefore remains a temporary single point of failure. A future cicd-ac-k8s-lb-02 will use reserved address 192.168.8.217 and participate in Keepalived failover for the same VIP.

The VIP and the HAProxy listener must work, and health checks for all three backends must be configured, before running kubeadm init. The backends report as down until their API servers are running.


18. Kubernetes Bootstrap Order

The approved build order is:


1. Verify that 192.168.8.200-.219 are unused.
2. Review VM CPU, RAM, and disk capacity against Proxmox capacity.
3. Add router DHCP reservations for 192.168.8.201-.216.
4. Reserve 192.168.8.200 for the API VIP and .217-.219 for shared infrastructure.
5. Terraform creates the load balancer, control planes, and workers.
6. Validate every VM name, VM ID, MAC, scsi0 disk, and reserved address.
7. Ansible applies the common Ubuntu baseline.
8. Ansible prepares all Kubernetes nodes.
9. Ansible configures HAProxy and Keepalived.
10. Validate VIP ownership, DNS resolution, the HAProxy listener, and the health-check configuration for all API backends.
11. Bootstrap cicd-ac-k8s-cp-01 with kubeadm init.
12. Install kubeconfig for acllc and administrative automation.
13. Install Calico v3.32.0.
14. Join cicd-ac-k8s-cp-02.
15. Join cicd-ac-k8s-cp-03.
16. Join production workers at 192.168.8.205-.208.
17. Join QA workers at 192.168.8.209-.212.
18. Join development workers at 192.168.8.213-.216.
19. Label and taint workers by environment and workload.
20. Verify cluster health.
21. Install the shared ARC 0.14.2 controller once with two replicas.
22. Onboard each GitHub organization and validate GitHub App access.
23. Create product and environment namespaces.
24. Create the GitHub App Kubernetes Secret at runtime.
25. Install one repository-specific ARC runner scale set per environment.
26. Connect application workflow build and deploy jobs to the same environment scale set.

Join commands, bootstrap tokens, certificate keys, GitHub App private keys, and kubeconfig files must not be committed to Git.

Generate or retrieve them at runtime and store them only in:


Temporary protected files
Ansible Vault when applicable
GitHub encrypted secrets
GitHub Environment secrets
Another approved secret manager

19. Worker Labels and Scheduling

The existing Ansible configuration applies these worker labels:


environment=dev
environment=qa
environment=prod

workload=github-runner

Development workers:


environment=dev
workload=github-runner
taint: environment=dev:NoSchedule

QA workers:


environment=qa
workload=github-runner
taint: environment=qa:NoSchedule

Production workers:


environment=prod
workload=github-runner
taint: environment=prod:NoSchedule

Runner scale sets use nodeSelector and matching tolerations so runner pods run only on the correct environment workers.

Example:


template:
  spec:
    nodeSelector:
      environment: dev
      workload: github-runner

    tolerations:
      - key: environment
        operator: Equal
        value: dev
        effect: NoSchedule

The enforced scheduling policy is:


Dev runners cannot schedule on QA or production workers.
QA runners cannot schedule on development or production workers.
Production runners cannot schedule on development or QA workers.
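The policy above results from two independent checks: the pod's nodeSelector must match the node labels, and every NoSchedule taint on the node must be tolerated by the pod. The sketch below is a deliberately simplified model of those two checks for illustration. It is not the Kubernetes scheduler: it ignores resources, affinity, other taint effects, and wildcard tolerations, and all function names are hypothetical.

```javascript
// Every key/value in the pod's nodeSelector must be present on the node.
const matchesNodeSelector = (node, pod) =>
  Object.entries(pod.nodeSelector || {}).every(
    ([key, value]) => node.labels[key] === value,
  );

// A toleration matches a taint when key and effect agree and either the
// operator is Exists or the values are equal (operator Equal, the default).
const tolerates = (toleration, taint) => {
  if (toleration.key !== taint.key) return false;
  if (toleration.effect && toleration.effect !== taint.effect) return false;
  if (toleration.operator === 'Exists') return true;
  return toleration.value === taint.value;
};

const canSchedule = (node, pod) =>
  matchesNodeSelector(node, pod) &&
  node.taints
    .filter((taint) => taint.effect === 'NoSchedule')
    .every((taint) => (pod.tolerations || []).some((t) => tolerates(t, taint)));

// Worker and runner shapes mirror the labels, taints, and example in this section.
const worker = (environment) => ({
  labels: { environment, workload: 'github-runner' },
  taints: [{ key: 'environment', value: environment, effect: 'NoSchedule' }],
});

const runner = (environment) => ({
  nodeSelector: { environment, workload: 'github-runner' },
  tolerations: [
    { key: 'environment', operator: 'Equal', value: environment, effect: 'NoSchedule' },
  ],
});

for (const podEnv of ['dev', 'qa', 'prod']) {
  for (const nodeEnv of ['dev', 'qa', 'prod']) {
    const allowed = canSchedule(worker(nodeEnv), runner(podEnv));
    console.log(`${podEnv} runner on ${nodeEnv} worker: ${allowed}`);
  }
}
```

Only the three matching pairs print `true`. The taint also keeps out pods that carry no toleration at all, which the nodeSelector alone would not do.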

20. Shared Cluster Components

After the Kubernetes control plane and workers are healthy, install the Git-managed shared components:


Calico v3.32.0
Shared ARC controller 0.14.2
Helm v3.21.0 for ARC administration

The current ARC controller configuration uses:


Namespace: arc-systems
Helm release: arc
Controller replicas: 2
Control-plane node placement
Pod anti-affinity preference across control-plane nodes

Metrics Server, ingress, storage classes, and certificate management should be installed only when a later requirement needs them.

The ARC controller is shared and installed once.

Do not install a separate ARC controller per product, repository, or environment unless a specific technical or security requirement justifies it.

21. Build Runner Scale Sets

Build jobs run on repository-and-environment ARC scale sets.

Build runner pods require:


Repository source-code access
GitHub App access for cross-repository checkout
Harbor CI push/pull credentials
Docker Buildx
Docker-in-Docker
GitHub Actions cache access
SBOM generation
Build provenance support

The implemented FP values use:


containerMode:
  type: dind

minRunners: 0
maxRunners: 4

Implemented FP scale sets:


fp-web-ui-001-dev-arc
fp-web-ui-001-qa-arc
fp-web-ui-001-prod-arc

This provides:


Scale to zero when idle
Automatic runner creation
Up to four concurrent runner pods per scale set
Runner disposal after each job
Reduced long-lived runner risk

Each product, repository, and environment should have an isolated runner scale set when permissions, worker placement, credentials, or scaling limits differ.

22. Deploy Runner Scale Sets

The current design does not create a second deploy-only scale set for the same repository and environment.

The deploy job uses the same runs-on value as the build job, but ARC creates a second ephemeral runner pod after the build job completes.


jobs:
  build:
    runs-on: fp-web-ui-001-dev-arc

  deploy:
    needs: build
    runs-on: fp-web-ui-001-dev-arc

Deploy runner pods require:


SSH client
curl
Environment-specific SSH private key
Harbor credentials
Network access to the target VM
Health-check tooling

The deploy job connects to the target VM and runs Docker commands remotely. The DinD sidecar exists because it is configured at the shared scale-set level, but the remote deploy logic does not use it.

Use separate environment-specific SSH keys.


Development key:
accepted only by development servers

QA key:
accepted only by QA servers

Production key:
accepted only by production servers

Private deployment keys must never be:


Committed to Git
Stored in the base VM template
Shared across all environments
Printed in workflow logs
Copied into application source files

23. Harbor Isolation

Harbor already exists.


Registry: harbor.aspireclan.com

Use one Harbor project per product.


FP: fp-ci-cd
Shelvera: ts-ci-cd
Future product example: abc-ci-cd

Each Harbor product project must have at least:

CI robot account

Used by build jobs and, for now, by the first FP deployment workflow.

Permissions:


Pull
Push

For fp-web-ui-001, store the credential in the application repository's environment as:


HARBOR_USERNAME
HARBOR_PASSWORD
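One way a workflow step might consume these two secrets without placing the password on the command line or in the job log is to pipe it to `docker login --password-stdin`. The sketch below assumes the secrets are exposed to the step as environment variables with the same names; `harbor_login` is a hypothetical helper name, not part of the existing workflow.

```shell
# Log in to Harbor using credentials taken from the environment.
# The password is passed on stdin so it never appears in the argument list.
harbor_login() {
  local registry="${1:-harbor.aspireclan.com}"
  if [ -z "${HARBOR_USERNAME:-}" ] || [ -z "${HARBOR_PASSWORD:-}" ]; then
    echo "HARBOR_USERNAME and HARBOR_PASSWORD must be set" >&2
    return 1
  fi
  printf '%s' "$HARBOR_PASSWORD" |
    docker login "$registry" --username "$HARBOR_USERNAME" --password-stdin
}

# Usage inside a job step (not executed here):
#   harbor_login
```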

Runtime robot account

A pull-only runtime robot may be introduced later when the application workflow and target-host credential model are separated from the CI robot.

Recommended future permissions:


Pull only

Do not substitute a pull-only runtime robot for the current build credential because the build job must push images to Harbor.

For stronger future environment isolation, separate runtime robots may use names such as:


fp-dev-runtime
fp-qa-runtime
fp-prod-runtime

24. GitHub Branch Strategy

The infrastructure repository contains:


main
local
dev
qa
prod

The current working infrastructure workflows use dev and prod as their execution branches.

main


Protected baseline branch
Not used by the current shared-cluster apply workflows
No force push
No branch deletion

The local branch is retained for local-only work and is not used by the current automated infrastructure apply model.

dev


Active validation branch
Terraform format, validation, and plan
Ansible inventory and syntax validation
ARC and tenant configuration validation
No shared-cluster apply or configuration

qa


Reserved promotion branch
Not currently targeted by the newer shared-infrastructure workflows
No shared-cluster apply
May be used later for additional pre-production validation

prod


Approved infrastructure application branch
Terraform plan and apply
Ansible configuration
Shared ARC controller installation
GitHub organization onboarding
Repository runner scale-set installation
GitHub Environment approval where configured
No force push
No branch deletion

Current execution flow:


feature branch
  ↓ merge or push
dev
  ↓ validation only
prod
  ↓ validation and apply/configuration

The newer workflows intentionally do not use pull_request events. Validation occurs on the resulting push to dev, and apply/configuration occurs on the resulting push to prod.
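The push-driven behavior can be summarized as a small branch lookup. This is only an illustrative sketch: `infra_branch_action` and the action labels it prints are hypothetical names, not inputs or outputs of the real workflows.

```shell
# Map an infrastructure-repository branch to what the workflows do on push.
# Hypothetical helper; the labels are illustrative only.
infra_branch_action() {
  case "$1" in
    dev)  echo "validate-and-plan" ;;
    prod) echo "validate-plan-and-apply" ;;
    *)    echo "none" ;;   # main, local, and qa are not targeted
  esac
}

# GITHUB_REF_NAME is set by GitHub Actions to the branch that was pushed.
infra_branch_action "${GITHUB_REF_NAME:-dev}"
```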

Application repositories use direct branch-to-environment mapping:


dev branch  → development
qa branch   → QA
prod branch → production

Shared Kubernetes infrastructure must not be independently recreated by each application environment branch.

25. GitHub Environments

Create infrastructure repository environments:


shared-k8s
arc-org-fp

Future organizations follow the organization-credential convention:


arc-org-<product-short-form>

Application repositories create their own deployment environments:


dev
qa
prod

shared-k8s

Used for:


Terraform apply
Ansible cluster bootstrap
HAProxy and Keepalived configuration
Control-plane changes
Worker-node changes
Shared ARC controller installation

Require manual approval when the environment is used for changes to the shared cluster.

Development tenant environments

The fp-web-ui-001 repository's dev environment contains:


FP_CI_APP_ID
FP_CI_APP_PRIVATE_KEY
HARBOR_USERNAME
HARBOR_PASSWORD
DEPLOY_SSH_PRIVATE_KEY

QA tenant environments

The application repository's qa environment follows the same secret pattern but uses QA-specific deployment credentials and protection rules.

Production tenant environments

The application repository's prod environment follows the same secret pattern but uses production-specific credentials.

Require:


Manual reviewers
Protected prod branch
Branch restriction
No self-approval where practical

The infrastructure repository's arc-org-fp environment stores the FP GitHub App onboarding and runner-scale-set credentials. It is not the same environment as the application repository's dev, qa, or prod environments.

26. GitHub Secrets and Variables

Shared infrastructure secrets

Store these in the ac-cicd-infra repository environment used by the shared-cluster workflows:


PM_API_TOKEN_ID
PM_API_TOKEN_SECRET

The Ansible SSH key is an existing protected file on the trusted self-hosted runner:


~/.ssh/id_ed25519_ansible

Do not use the Proxmox root password in GitHub Actions.

Use a restricted Proxmox API token.

Shared non-sensitive variables


PM_API_URL=https://192.168.8.23:8006/api2/json

The current Terraform stack contains the fixed values for:


Proxmox node: pve
Storage: local-lvm
Bridge: vmbr0
Template: tmplt-ub-26-min-base

FP deployment secrets

In each application repository environment:


DEPLOY_SSH_PRIVATE_KEY

Use a different private key for development, QA, and production.

FP Harbor secrets

In each application repository environment:


HARBOR_USERNAME
HARBOR_PASSWORD

The Harbor robot used by the build job must have Pull and Push permissions for fp-ci-cd.

The infrastructure repository's arc-org-fp environment uses these variables:


ARC_GITHUB_ORGANIZATION
ARC_GITHUB_APP_CLIENT_ID
ARC_GITHUB_APP_ID
ARC_GITHUB_APP_INSTALLATION_ID

and this secret:


ARC_GITHUB_APP_PRIVATE_KEY

The fp-web-ui-001 application repository environments use:


FP_CI_APP_ID
FP_CI_APP_PRIVATE_KEY

Secrets in one repository or GitHub Environment are not automatically available to another repository or environment.
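Because a missing secret otherwise surfaces only as a confusing failure later in the job, a workflow step can check for the expected names first. The following is a sketch under the assumption that the secrets are mapped into the step's environment under the names listed above; `require_env` is a hypothetical helper.

```shell
# Fail fast when a required secret or variable is missing from the job
# environment. Only the names are printed, never the values.
require_env() {
  local name missing=0
  for name in "$@"; do
    if [ -z "$(printenv "$name")" ]; then
      echo "missing required variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage in the fp-web-ui-001 dev workflow (not executed here):
#   require_env FP_CI_APP_ID FP_CI_APP_PRIVATE_KEY \
#     HARBOR_USERNAME HARBOR_PASSWORD DEPLOY_SSH_PRIVATE_KEY
```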

27. Infrastructure Workflows

Keep workflows focused instead of creating one oversized workflow.

Current cleaned-source workflows:


terraform-plan-shared-k8s.yml
terraform-apply-shared-k8s.yml
ansible-configure-shared-k8s.yml
ansible-configure-load-balancer.yml
ansible-configure-control-planes.yml
ansible-configure-dev-workers.yml
ansible-configure-qa-workers.yml
ansible-configure-prod-workers.yml
ansible-install-arc-controller.yml
onboard-fp-github-org.yml
arc-fp-web-ui-001-dev.yml
arc-fp-web-ui-001-qa.yml
arc-fp-web-ui-001-prod.yml

Current branch behavior:

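| Branch | Validation | Terraform plan | Shared infrastructure apply/configuration |
|--------|------------|----------------|-------------------------------------------|
| dev    | Yes | Yes for Terraform changes | No |
| qa     | Not targeted by the newer workflows | No | No |
| prod   | Yes | Yes | Yes, through the applicable GitHub Environment |
| main   | Not targeted by the newer workflows | No | No |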

The Ansible and ARC workflows run validation jobs on both dev and prod, but their configure/install jobs are conditionally restricted to prod.

The Terraform plan workflow runs on dev. The Terraform apply workflow runs on prod.

There is no standalone validate.yml in the cleaned source. Validation remains embedded in the focused Terraform, Ansible, organization-onboarding, and ARC workflows.

Product-specific ARC scale-set definitions map application branches to their matching runner names:

| Application branch | Runner scale set |
|--------------------|------------------|
| dev  | fp-web-ui-001-dev-arc  |
| qa   | fp-web-ui-001-qa-arc   |
| prod | fp-web-ui-001-prod-arc |

28. Application Repository Responsibilities

The infrastructure repository creates and maintains runners and target infrastructure.

Each application repository contains its own application build/deployment definition.

Example:


fp-web-ui-001/
├── Dockerfile
├── application source
└── .github/
    └── workflows/
        └── build-deploy-dev-web.yml

The application workflow selects the repository-and-environment ARC scale set.


dev branch:
build job  → fp-web-ui-001-dev-arc
deploy job → fp-web-ui-001-dev-arc

qa branch:
build job  → fp-web-ui-001-qa-arc
deploy job → fp-web-ui-001-qa-arc

prod branch:
build job  → fp-web-ui-001-prod-arc
deploy job → fp-web-ui-001-prod-arc

Even though both jobs use the same runs-on value, ARC creates a separate ephemeral pod for each job.
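The runner names follow a visible pattern of repository, branch, and an `-arc` suffix. A sketch of that convention as a lookup is shown below; `runner_scale_set` is a hypothetical helper, and the pattern is inferred from the three FP scale-set names rather than taken from the workflow source.

```shell
# Derive the runs-on value for an application branch.
# Only dev, qa, and prod map to a runner scale set.
runner_scale_set() {
  local repo="$1" branch="$2"
  case "$branch" in
    dev|qa|prod) echo "${repo}-${branch}-arc" ;;
    *) echo "no runner scale set for branch: $branch" >&2; return 1 ;;
  esac
}

runner_scale_set fp-web-ui-001 dev   # prints fp-web-ui-001-dev-arc
```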

The first FP workflow checks out:


fp-web-ui-001
fp-001
fp-ci-actions

The combined build context is:


buildctx/
├── Web/
│   └── FP/
│       └── fp-web-ui-001/
└── CommonModules/

29. FP Image Naming and Deployment

Use immutable commit SHA tags.


harbor.aspireclan.com/fp-ci-cd/dev-fp-web-ui-001:<COMMIT_SHA>
harbor.aspireclan.com/fp-ci-cd/qa-fp-web-ui-001:<COMMIT_SHA>
harbor.aspireclan.com/fp-ci-cd/prod-fp-web-ui-001:<COMMIT_SHA>

Do not deploy from a mutable tag such as latest alone.

A convenience environment tag may also be published, but deployment and rollback records must use the immutable SHA.
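The naming rule can be expressed as a small function that refuses anything other than a full commit SHA. This is a sketch: `image_ref` is a hypothetical helper, and the registry and project are hard-coded to the FP values shown above purely for illustration.

```shell
# Build the immutable image reference for one environment and commit.
image_ref() {
  local env="$1" app="$2" sha="$3"
  local registry="harbor.aspireclan.com" project="fp-ci-cd"
  case "$env" in
    dev|qa|prod) ;;
    *) echo "unknown environment: $env" >&2; return 1 ;;
  esac
  # Accept only a full 40-character lowercase hexadecimal commit SHA,
  # which rejects mutable tags such as "latest".
  case "$sha" in
    ''|*[!0123456789abcdef]*) echo "not a commit SHA: $sha" >&2; return 1 ;;
  esac
  if [ "${#sha}" -ne 40 ]; then
    echo "not a full commit SHA: $sha" >&2
    return 1
  fi
  echo "${registry}/${project}/${env}-${app}:${sha}"
}

# Usage in a workflow step, where GITHUB_SHA is the full commit SHA
# (not executed here):
#   image_ref dev fp-web-ui-001 "$GITHUB_SHA"
```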

Example deployment flow:


Build image with commit SHA
Push image to Harbor
Record deployed SHA
SSH to target VM
Pull exact SHA image
Start replacement container
Run health check
Promote replacement when healthy
Restore previous SHA when unhealthy

30. fp-web-ui-001 Application Details


Application: fp-web-ui-001
Type: ASP.NET Core MVC
Target framework: .NET 10
Build-context project path: Web/FP/fp-web-ui-001
Dockerfile path: buildctx/Web/FP/fp-web-ui-001/Dockerfile
Container port: 8080

Initial development deployment:


Target VM: dev-web-01
Target IP: 192.168.8.120
Container name: fp-web-ui-001-dev
Host port: 8080
Container port: 8080
URL: http://192.168.8.120:8080/
Restart policy: unless-stopped
Optional env file: /etc/fp/fp-web-ui-001/dev.env

When the environment file is absent, the deployment supplies:


ASPNETCORE_ENVIRONMENT=Development
ASPNETCORE_URLS=http://+:8080

The application currently uses HTTPS redirection.

For direct HTTP development deployment, UseHttpsRedirection() must not force HTTPS redirects when the application runs in the Development environment without an HTTPS endpoint.

31. Target Application VM Preparation

Ansible prepares application target VMs such as:


dev-web-01
qa-web-01
prod-web-01

Required configuration:


Docker Engine
Docker Compose plugin
Deployment public SSH key
Harbor CA trust when required
Harbor credential used by the approved workflow
Port and firewall configuration
Container deployment directories
Log rotation
curl
Health-check utilities
Passwordless sudo for acllc

Before binding host port 8080:


sudo ss -lntp | grep ':8080 ' || true
docker ps --format 'table {{.Names}}\t{{.Ports}}'

The deployment process must:


Record the previous image
Pull the immutable SHA image
Remove and recreate the container
Run the health check
Restore the previous image when the new deployment fails
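The five steps above can be sketched as a script that runs on the target VM. This is an illustration under stated assumptions, not the deployed workflow logic: the function names, the retry counts, and the use of a plain HTTP GET as the health check are all hypothetical, and the optional environment file is omitted for brevity.

```shell
# Remove any existing container with this name and start the given image.
start_container() {
  local name="$1" image="$2" host_port="$3"
  docker rm -f "$name" >/dev/null 2>&1 || true
  docker run -d --name "$name" --restart unless-stopped \
    -p "${host_port}:8080" "$image" >/dev/null
}

# Poll the URL until it answers without an HTTP error, or give up.
wait_healthy() {
  local url="$1" attempts="${2:-10}" delay="${3:-3}" i=0
  while [ "$i" -lt "$attempts" ]; do
    curl -fsS --max-time 5 -o /dev/null "$url" && return 0
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Replace the running container and restore the previous image on failure.
deploy_container() {
  local name="$1" image="$2" host_port="$3" health_url="$4" previous
  # Record the image behind the running container (empty on first deploy).
  previous="$(docker inspect --format '{{.Config.Image}}' "$name" 2>/dev/null || true)"

  # Pull first, so a failed pull leaves the running container untouched.
  docker pull "$image" || return 1
  if start_container "$name" "$image" "$host_port" && wait_healthy "$health_url"; then
    echo "deployed $image"
    return 0
  fi

  echo "deployment of $image failed" >&2
  if [ -n "$previous" ]; then
    start_container "$name" "$previous" "$host_port" && echo "restored $previous"
  fi
  return 1
}

# Usage on dev-web-01 (not executed here; <COMMIT_SHA> is a placeholder):
#   deploy_container fp-web-ui-001-dev \
#     "harbor.aspireclan.com/fp-ci-cd/dev-fp-web-ui-001:<COMMIT_SHA>" \
#     8080 http://127.0.0.1:8080/
```

The function returns non-zero even after a successful restore, so the workflow run is still marked as failed and the bad commit remains visible.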

Runtime credentials should be installed or supplied securely and must not be embedded in application images.

32. Repository Initialization

Create the repository locally on prod-terraform-deploy-02.


mkdir -p ~/ac-cicd-infra
cd ~/ac-cicd-infra

git init -b main

Create the top-level folders according to the approved structure.

Create the initial commit:


git add .
git commit -m "Initialize shared CI/CD infrastructure repository"

Add the private GitHub repository remote:


git remote add origin git@github.com:ASPIRECLAN-LLC-Org/ac-cicd-infra.git

Push the default branch:


git push -u origin main

Create the retained branches:


git switch -c local
git push -u origin local

git switch main
git switch -c dev
git push -u origin dev

git switch main
git switch -c qa
git push -u origin qa

git switch main
git switch -c prod
git push -u origin prod

git switch dev

The current automation targets dev for validation and prod for apply/configuration. The local, qa, and main branches are retained but are not current shared-cluster apply branches.

33. Branch Protection

dev


Require pull request before merge when the team workflow requires it
Require at least one approval
Require Terraform and Ansible validation checks
Disallow force pushes
Disallow branch deletion

The workflows themselves run on the resulting push to dev; they do not require a pull_request trigger.

qa


Retain as a protected promotion branch
Require pull request before merge when used
Disallow force pushes
Disallow branch deletion
No current shared-cluster apply workflow

prod


Require pull request before merge
Require at least one approval
Require applicable validation checks
Disallow direct pushes where practical
Disallow force pushes
Disallow branch deletion
Protect the shared-k8s and arc-org-* environments

The workflows run apply/configuration on the resulting push to prod.

main


Protected baseline branch
Require pull request before merge
Disallow direct pushes
Disallow force pushes
Disallow branch deletion
No current shared-cluster apply workflow

The current execution path is:


feature branch → dev validation → prod apply/configuration

34. Required .gitignore Rules

The repository must exclude Terraform state, private keys, tokens, kubeconfig files, generated join commands, local secrets, environment files, generated reports, and editor files.


# Terraform
**/.terraform/*
*.tfstate
*.tfstate.*
*.tfplan
crash.log
crash.*.log
.terraform.lock.hcl.backup

# Local Terraform variables and credentials
*.auto.tfvars
*.auto.tfvars.json
terraform.tfvars
!terraform.tfvars.example

# Ansible
*.retry
ansible/.vault-password
ansible/vault-password*
ansible/inventories/**/host_vars/*/secrets.yml

# Kubernetes
kubeconfig
kubeconfig.*
admin.conf
*.kubeconfig

# SSH, certificates, and private keys
*.pem
*.key
*.p8
id_rsa
id_rsa.pub
id_ed25519
id_ed25519.pub

# Temporary bootstrap secrets
join-command*
certificate-key*
bootstrap-token*
*.token
*.secret

# Environment files
.env
.env.*
!.env.example

# Local ARC credential working folders
.arc-secrets/
arc-secrets/

# Logs and generated output
*.log
artifacts/
reports/

# Editors and operating systems
.DS_Store
Thumbs.db
.vscode/
.idea/

# Visual Studio
.vs/
*.suo
*.user
*.userosscache
*.sln.docstates
*.VC.db
*.VC.VC.opendb

Commit the generated .terraform.lock.hcl file.

Ignore only lock-file backups, not the actual dependency lock file.
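The rules can be spot-checked with `git check-ignore`, which reports whether a path matches an ignore pattern even when the file does not exist. The helper below is a hypothetical sketch; the sample paths in the usage comment are illustrative and are not a complete list of sensitive files.

```shell
# Return non-zero if any of the given paths is NOT covered by .gitignore.
# Run from inside the repository working tree.
assert_ignored() {
  local path status=0
  for path in "$@"; do
    if ! git check-ignore -q "$path"; then
      echo "NOT ignored: $path" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage from the repository root (not executed here):
#   assert_ignored terraform.tfstate admin.conf join-command.sh .env
#   git check-ignore -q .terraform.lock.hcl && echo "lock file is ignored by mistake"
```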

35. Documentation Standardization

Existing Kubernetes and CI/CD documents must be standardized on:


Ubuntu Server 26.04 LTS
Template tmplt-ub-26-min-base
DHCP inside Ubuntu
Router-side MAC-to-IP reservations
Kubernetes v1.36.1
Calico v3.32.0
3 control-plane nodes
4 development workers at 192.168.8.213-.216
4 QA workers at 192.168.8.209-.212
4 production workers at 192.168.8.205-.208
ARC controller 0.14.2 with two replicas
One repository-specific scale set per environment
minRunners: 0
maxRunners: 4
Harbor project fp-ci-cd for FP
Harbor project ts-ci-cd for Shelvera

Remove or update outdated references to:


Ubuntu 24.04
tmplt-ub-24-min
Static IP configuration inside Ubuntu
Development workers at 192.168.8.205-.208
Production workers at 192.168.8.213-.216
8-vCPU / 32-GB workers
Permanent build VMs
Permanent deploy VMs
Separate build and deploy scale sets for the same repository/environment
maxRunners: 5
Shelvera-only shared infrastructure naming
Environment-first folders without a product boundary

36. Security Rules

The following rules are mandatory:


No private SSH keys in Git
No kubeconfig files in Git
No Terraform state in Git
No join tokens in Git
No plaintext Harbor passwords in values files
No production secrets available to development workflows
No shared deployment SSH key across dev, QA, and prod
No Harbor push permission for runtime robots
No Proxmox root password in GitHub Actions
No static IP duplication between the router and Netplan
No Terraform remote-exec for cluster configuration
No mutable-only image deployment

Recommended improvements:


Use a restricted Proxmox API token
Use remote Terraform state with locking
Use GitHub Environment approvals
Use Ansible Vault or another secret manager
Use managed SSH known_hosts entries
Use Kubernetes taints and tolerations
Generate SBOM and provenance
Sign container images later
Scan images before deployment
Require production approval
Record deployed commit SHA
Keep rollback images available

37. Complete Implementation Order

Proceed in this exact sequence:


1. Verify that 192.168.8.200-.219 are unused in the router and network.
2. Review Proxmox RAM and local-lvm capacity for the Terraform-defined sizing.
3. Create or restore the private ac-cicd-infra GitHub repository from the cleaned source.
4. Confirm the cleaned folder structure shown in section 9.
5. Create or restore main, local, dev, qa, and prod branches.
6. Configure branch protection and GitHub Environments.
7. Prepare /var/lib/ac-cicd-infra/terraform-state/shared-k8s on prod-terraform-deploy-02.
8. Add router DHCP reservations for 192.168.8.201-.216.
9. Reserve .200 as the API VIP and .217-.219 for future shared infrastructure.
10. Review the reusable proxmox-vm and proxmox-vm-group Terraform modules.
11. Review the shared-k8s Terraform stack with the approved names, MACs, VM IDs, and sizes.
12. Run terraform fmt, validate, and plan from dev.
13. Review the complete Terraform plan.
14. Merge or push the approved code to prod and apply the 16 VMs.
15. Validate VM names, VM IDs, MAC addresses, scsi0 disks, and reserved IPs.
16. Verify the Ansible shared-k8s inventory and group variables.
17. Run the common Ubuntu baseline.
18. Configure HAProxy and Keepalived.
19. Prepare containerd and Kubernetes prerequisites.
20. Bootstrap the first control plane.
21. Install Calico v3.32.0.
22. Join the additional control planes.
23. Join production workers at .205-.208 with 07-join-prod-workers.yml.
24. Join QA workers at .209-.212 with 07-join-qa-workers.yml.
25. Join development workers at .213-.216 with 07-join-dev-workers.yml.
26. Apply worker labels and environment taints.
27. Verify the complete cluster.
28. Install the shared ARC 0.14.2 controller with two replicas.
29. Create and configure the FP GitHub App.
30. Add the arc-org-fp GitHub Environment credentials.
31. Run FP organization onboarding validation.
32. Create the FP development, QA, and production namespaces.
33. Install fp-web-ui-001-dev-arc.
34. Install fp-web-ui-001-qa-arc.
35. Install fp-web-ui-001-prod-arc.
36. Prepare dev-web-01 at 192.168.8.120 as the first deployment target.
37. Add or restore the fp-web-ui-001 Dockerfile and build-deploy-dev-web.yml workflow.
38. Add the five fp-web-ui-001 dev environment secrets.
39. Build and push the immutable development image to Harbor.
40. Deploy fp-web-ui-001 to 192.168.8.120:8080.
41. Validate health checks and rollback.
42. Promote the application pipeline pattern to QA.
43. Promote the application pipeline pattern to production.
44. Add Shelvera tenant directories only when Shelvera onboarding begins.
45. Retire the legacy Shelvera CI/CD cluster only after successful migration.
46. Add each future product using the same product-first structure when needed.

38. Source Consistency Status and Rebuild Starting Point

The following values are fixed in the cleaned Terraform and supporting infrastructure source:


Architecture: one shared Kubernetes cluster
Shared ARC controller: version 0.14.2, two replicas
VM naming convention: cicd-ac-k8s-*
Kubernetes API DNS: cicd-ac-k8s-api.aspireclan.com
Kubernetes API VIP: 192.168.8.200
Load balancer: 192.168.8.201
Control planes: 192.168.8.202-.204
Production workers: 192.168.8.205-.208
QA workers: 192.168.8.209-.212
Development workers: 192.168.8.213-.216
All 16 Proxmox VM IDs: 3156201-3156216
All 16 VM MAC addresses shown in section 7.6
DHCP inside Ubuntu with router-side MAC reservations
Terraform for VM creation only
Ansible for operating-system and Kubernetes configuration
One repository-specific ARC scale set per environment
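The address plan can be written as a lookup on the last octet of a 192.168.8.x address, which is handy when checking inventory files or router reservations. This is a sketch only: `role_for_octet` and the role labels it prints are hypothetical names. The .217-.219 range is included because the document reserves it for future shared infrastructure.

```shell
# Map the last octet of a 192.168.8.x address to its planned role.
role_for_octet() {
  local n="$1"
  case "$n" in
    ''|*[!0123456789]*) echo "not a number: $n" >&2; return 1 ;;
  esac
  if   [ "$n" -eq 200 ]; then echo "api-vip"
  elif [ "$n" -eq 201 ]; then echo "load-balancer"
  elif [ "$n" -ge 202 ] && [ "$n" -le 204 ]; then echo "control-plane"
  elif [ "$n" -ge 205 ] && [ "$n" -le 208 ]; then echo "prod-worker"
  elif [ "$n" -ge 209 ] && [ "$n" -le 212 ]; then echo "qa-worker"
  elif [ "$n" -ge 213 ] && [ "$n" -le 216 ]; then echo "dev-worker"
  elif [ "$n" -ge 217 ] && [ "$n" -le 219 ]; then echo "reserved"
  else echo "outside-plan"
  fi
}

role_for_octet 206   # prints prod-worker
```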

The cleaned source consistency checks are:


Development join playbook: 07-join-dev-workers.yml
QA join playbook: 07-join-qa-workers.yml
Production join playbook: 07-join-prod-workers.yml
Legacy generic development-worker join filename: not used
Standalone validate.yml workflow: not present
Empty future-product and Shelvera scaffolding: not present
Empty application-VM, scripts, and docs scaffolding: not present
Unused docker-runtime-host and firewall roles: not present
Active FP scale-set files: present for dev, QA, and prod

The Terraform resource allocation is:


Load balancer: 2 vCPU / 4 GB / 40 GB
Each control plane: 4 vCPU / 8 GB / 100 GB
Each worker: 4 vCPU / 16 GB / 250 GB
Total: 62 allocated vCPU / 220 GB RAM / 3,340 GB virtual disk
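The totals follow from one load balancer, three control planes, and twelve workers (four per environment). The arithmetic below reproduces them as a quick check before the Proxmox capacity review; the variable names are illustrative.

```shell
control_planes=3
workers=12   # 4 development + 4 QA + 4 production

# Load balancer + control planes + workers, using the per-VM sizes above.
vcpu=$(( 2 + control_planes * 4 + workers * 4 ))
ram_gb=$(( 4 + control_planes * 8 + workers * 16 ))
disk_gb=$(( 40 + control_planes * 100 + workers * 250 ))

echo "${vcpu} vCPU / ${ram_gb} GB RAM / ${disk_gb} GB disk"
# prints: 62 vCPU / 220 GB RAM / 3340 GB disk
```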

When rebuilding the environment from scratch, begin with:


1. Verify that 192.168.8.200-.219 are unused in the router lease and reservation tables.
2. Verify available Proxmox RAM and local-lvm storage.
3. Add the 16 DHCP reservations from section 7.6.
4. Reserve 192.168.8.200 for the API VIP and .217-.219 for shared infrastructure.
5. Prepare prod-terraform-deploy-02 and its self-hosted infrastructure runner.
6. Prepare the local Terraform state directory.
7. Restore the cleaned ac-cicd-infra source, branches, and GitHub Environments.
8. Run the dev Terraform validation and plan workflow.
9. Review the plan.
10. Run the prod Terraform apply workflow.

Do not apply Terraform until the router collision check and Proxmox capacity review are complete.

39. Continuation Prompt

Use the following prompt when continuing implementation in a new chat:

We are rebuilding the ac-cicd-infra environment from scratch using the existing Terraform, Ansible, Kubernetes, Helm, and GitHub Actions source as the source of truth. Do not redesign or modify the working infrastructure code unless I explicitly approve a code change.

The design uses one shared Kubernetes cluster and one shared ARC 0.14.2 controller with two replicas. Ephemeral ARC runner pods perform both build and deployment jobs. There are no permanent development, QA, or production build/deploy VMs.

All Kubernetes VMs are cloned from tmplt-ub-26-min-base. Terraform creates Proxmox VMs only. Ansible configures Ubuntu, HAProxy, Keepalived, containerd, Kubernetes v1.36.1, Calico v3.32.0, node joins, labels, taints, the shared ARC controller, and runner scale sets.

The API endpoint is cicd-ac-k8s-api.aspireclan.com:443 with VIP 192.168.8.200. The load balancer is .201, control planes are .202-.204, production workers are .205-.208, QA workers are .209-.212, and development workers are .213-.216. The 16 VM MAC addresses and VM IDs are the values in section 7.6.

The Terraform sizing is: load balancer 2 vCPU/4 GB/40 GB; each control plane 4 vCPU/8 GB/100 GB; each worker 4 vCPU/16 GB/250 GB. The complete topology is 62 vCPU, 220 GB RAM, and 3,340 GB virtual disk.

Use DHCP inside Ubuntu and router-side MAC reservations. Do not configure static Netplan addresses. Keep the Terraform boot disk at scsi0, local-lvm, and at least 40G.

The infrastructure repository uses dev for validation and Terraform plan, and prod for apply/configuration. The trusted controller is prod-terraform-deploy-02 at 192.168.8.93, and the local Terraform state path is /var/lib/ac-cicd-infra/terraform-state/shared-k8s/terraform.tfstate.

FP uses GitHub organization fp-001-org, Harbor project fp-ci-cd, namespaces arc-runners-fp-dev, arc-runners-fp-qa, and arc-runners-fp-prod, and scale sets fp-web-ui-001-dev-arc, fp-web-ui-001-qa-arc, and fp-web-ui-001-prod-arc. Each application build and deploy job uses the same environment scale set, but ARC creates a separate ephemeral runner pod for each job.

The first deployment target is dev-web-01 at 192.168.8.120. The development container is fp-web-ui-001-dev, and the application is reached at http://192.168.8.120:8080/.

Begin by verifying that .200-.219 are unused, adding the 16 router DHCP reservations, checking Proxmox capacity, preparing prod-terraform-deploy-02, restoring the repository and GitHub Environments, running the dev Terraform plan, reviewing it, and then applying from prod.