Production Certificate Issuer Setup

1. Purpose

Current authoritative implementation

This page now records the deployed certificate issuer, not a proposed future VM:


Hostname: prod-cert-01
IP: 192.168.8.3
MAC: aa:bb:cc:04:05:02
Proxmox VM ID: 3156003
vCPU: 4
RAM: 8192 MiB
Disk: 40G
Terraform map: cert_vms
Terraform values file: envs/prod/cert.tfvars
Ansible group: cert
Issuer playbook: envs/prod/ansible/configure-cert.yml
Issuer account: ac-cert-issuer
Configuration root: /etc/aspireclan/cert-issuer
Issuer utility: /usr/local/sbin/ac-cert-issuer

The current operational checkpoint is Vault credential preparation. The VM and Ansible foundation are complete, the Vault TLS certificate has been installed locally, and the issuer timer is intentionally disabled because RoleID and SecretID files do not yet exist.

This page is the complete Terraform-first implementation and operating plan for the Aspireclan production certificate issuer VM:


Authoritative VM hostname: prod-cert-01
Functional role: production certificate issuer
Reserved IP: 192.168.8.3
Reserved MAC: aa:bb:cc:04:05:02
Proxmox VM ID: 3156003

The earlier planning name prod-cert-issuer-01 is superseded by the router-reserved hostname prod-cert-01. This page uses prod-cert-01 consistently for Terraform, Proxmox, DHCP, DNS, Ansible inventory, systemd, Vault AppRole binding, logs, and disaster recovery.

The VM is configured to obtain and renew TLS certificates through the Let's Encrypt ACME DNS-01 challenge, use a least-privilege Cloudflare API token retrieved from Vault, validate every result, and store each approved certificate as a new Vault KV v2 version. HAProxy and other consumers will read certificates from Vault through separate read-only identities.

Terraform creates and reconciles the VM only. Ansible and root-only issuer utilities install and configure the operating system, Certbot, Vault authentication, reconciliation logic, systemd units, logging, and firewall rules. Terraform must never receive a Cloudflare token, Vault SecretID, certificate private key, ACME account key, or issued certificate.


2. Implementation Status and Approved Decisions

Current implementation status — updated after direct-server testing

The current production certificate-issuer state is:


prod-dns-01:
  status: complete
  address: 192.168.8.4

prod-vault-01:
  status: initialized, normally unsealed, named administrator validated
  address: 192.168.8.2
  cert/ KV v2: present
  audit device: present
  initial root token: revoked
  issuer helper: installed manually at /usr/local/sbin/ac-vault-prepare-cert-issuer

prod-cert-01:
  Terraform: complete
  Ansible foundation: complete
  hostname: prod-cert-01
  address: 192.168.8.3
  MAC: aa:bb:cc:04:05:02
  VM ID: 3156003
  vCPU: 4
  RAM: 8192 MiB
  disk: 40G
  Vault TLS leaf: installed
  AppRole credentials: absent
  timer: intentionally inactive
  certificates enabled: none

Completed and directly observed:

  • Terraform created prod-cert-01 and subsequent production plans reported no infrastructure changes.
  • The workflow uses envs/prod/cert.tfvars and the cert_vms Terraform map.
  • The production inventory uses [cert].
  • The working issuer playbook is envs/prod/ansible/configure-cert.yml.
  • The issuer variables are in envs/prod/ansible/group_vars/cert.yml.
  • The AppRole path assertion failure was corrected by using cert_issuer_role_id_path and cert_issuer_secret_id_path.
  • The target-host unprivileged execution failure was corrected by installing acl and replacing Ansible become_user command execution with /usr/sbin/runuser.
  • Certbot, python3-certbot-dns-cloudflare, OpenSSL, curl, jq, Python, the restricted service account, issuer configuration, declarations, reconciler, systemd service, systemd timer, log rotation, and UFW rules are installed.
  • The exact Vault listener certificate was installed at /etc/aspireclan/cert-issuer/vault-ca.pem.
  • The protected directory prevents the interactive acllc account from opening the CA path directly. TLS checks must run through sudo -u ac-cert-issuer or root.
  • The missing Vault-side helper was installed at /usr/local/sbin/ac-vault-prepare-cert-issuer.
  • The Vault helper authenticated successfully but received HTTP 403 permission denied at PUT /v1/acme/config.

Current pending gate:


1. Reconcile the version-controlled platform-admin.hcl policy.
2. Apply the corrected policy to Vault.
3. Verify that the manoj-admin token has create/update on acme/config.
4. Rerun ac-vault-prepare-cert-issuer on prod-vault-01.
5. Verify acme/cloudflare/dns metadata without displaying the token.
6. Copy the returned RoleID and one-use wrapping token.
7. Bootstrap AppRole credentials directly on prod-cert-01.
8. Run issuer status and preflight.
9. Enable the twice-daily timer.
10. Run one no-op reconciliation while all declarations remain disabled.

The observed 403 occurred before the helper stored the Cloudflare token and before it generated a wrapped SecretID. This page therefore does not claim that either operation has completed.
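Step 3 of the pending gate can be evaluated mechanically once the capability list for acme/config has been fetched. The sketch below is illustrative only: it assumes the list was already retrieved (for example from Vault's sys/capabilities-self endpoint) and the function name can_write_path is hypothetical, not part of the helper.

```python
def can_write_path(capabilities):
    """Return True when a Vault capability list permits create and update.

    `capabilities` is the list of strings Vault reports for one path,
    for example ["create", "read", "update"].
    """
    caps = set(capabilities)
    if "deny" in caps:
        # An explicit deny overrides every other capability.
        return False
    if "root" in caps:
        # Root tokens are allowed everything; the named administrator
        # should normally not fall into this branch.
        return True
    return {"create", "update"} <= caps
```

A result of False for the manoj-admin token on acme/config would be consistent with the observed 403 and means the helper should not be rerun yet.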

The approved phase-one design remains:


Terraform root: envs/prod
Terraform state: prod/terraform.tfstate
Operating system: Ubuntu Server 26.04 LTS
ACME client: Certbot
DNS plugin: python3-certbot-dns-cloudflare
Challenge type: DNS-01
Staging CA: Let's Encrypt staging
Production CA: Let's Encrypt production
Certificate source of truth: Vault KV v2 at cert/
ACME credential mount: Vault KV v2 at acme/
Cloudflare credential path: acme/cloudflare/dns
Machine authentication: Vault AppRole cert-issuer
AppRole source binding: 192.168.8.3/32
Reconciliation schedule: 03:17 and 15:17 UTC with randomized delay
HAProxy integration: read from Vault; HAProxy never contacts Let's Encrypt
Public inbound access to issuer: none

3. Scope and Non-Goals

The implementation scope now includes both the completed server foundation and the direct Ubuntu credential-activation runbook. PowerShell and Windows helper scripts are no longer part of the approved operating procedure. All operator bootstrap work is performed directly on prod-vault-01 and prod-cert-01 through PuTTY/SSH with hidden prompts and root-controlled temporary files.

This page covers the Terraform-first server foundation, the completed Ansible foundation, and the design gates for the remaining credential and certificate-lifecycle implementation.

It includes:

  • The deployed VM name, IP, MAC, VM ID, and final sizing.
  • The decision to keep the first issuer VM in the existing envs/prod Terraform root and production state.
  • Required Terraform variable, module, output, and tfvars additions.
  • Required GitHub Actions updates that preserve DNS-first execution, always-run Terraform planning, destructive-action protection, and selective Ansible targeting.
  • The working Ansible inventory group and component boundary.
  • Vault readiness, AppRole, response-wrapping, and least-privilege policy gates.
  • Certbot, Cloudflare DNS-01, staging, renewal, Vault versioning, and HAProxy-consumption architecture.
  • Validation, disaster recovery, monitoring, and implementation order.

Terraform provisioning and the issuer Ansible foundation are complete.

The current implementation does not yet:

  • prove that the Cloudflare token has been stored in Vault; the current helper run stopped at acme/config with HTTP 403.
  • create or print a Vault AppRole SecretID outside the one-use wrapped delivery flow.
  • issue a Let's Encrypt staging or production certificate.
  • enable the issuer timer before AppRole bootstrap and preflight succeed.
  • deploy HAProxy.
  • replace the Vault listener certificate.

4. Final Architecture

The implemented flow now reaches this exact checkpoint:


Git push to prod
→ DNS-first production workflow
→ Terraform plan/apply using cert.tfvars
→ prod-cert-01 created at 192.168.8.3
→ [cert] Ansible targeting
→ common bootstrap and configure-cert.yml
→ issuer packages, account, files, service, timer, and UFW installed
→ Vault TLS leaf installed at /etc/aspireclan/cert-issuer/vault-ca.pem
→ timer held inactive because AppRole credentials are absent
→ Vault helper installed on prod-vault-01
→ helper blocked at acme/config by platform-admin policy

The first-build infrastructure flow preserves the current repository architecture:


Git push to prod
→ production DNS Terraform plan/apply
→ mandatory DNS health gate against 192.168.8.4
→ production environment Terraform refresh/plan
→ S3 state: prod/terraform.tfstate
→ create or reconcile prod-cert-01
→ wait for SSH
→ component-aware Ansible syntax-check
→ real Ansible run for a newly created prod-cert-01
→ Ansible --check for an existing prod-cert-01
→ real Ansible apply only when check mode predicts changes

The approved certificate lifecycle is:


certificates.yml in Git
→ prod-cert-01 reconciliation service
→ authenticate to Vault with dedicated AppRole
→ read Cloudflare token from acme/cloudflare/dns
→ read existing certificate and metadata from cert/<env>/<type>/<name>
→ skip issuance when the certificate is valid, SANs match, and renewal is not due
→ otherwise perform Let's Encrypt DNS-01 through Cloudflare
→ validate key, leaf, chain, SANs, issuer, and expiration
→ write a new Vault KV v2 version with check-and-set
→ proxy-side Vault Agent or sync service detects the new version
→ atomically install HAProxy PEM
→ run haproxy -c
→ gracefully reload HAProxy only after validation passes

HAProxy does not request a certificate directly from the issuer. A Git declaration drives two independent reconcilers: the issuer ensures the referenced certificate exists in Vault, and the proxy ensures the approved Vault certificate is installed and used by HAProxy.
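The skip rule in the lifecycle above (certificate valid, SANs match, renewal not due) can be sketched as a small pure function. This is a minimal illustration, not the deployed reconciler: the function name, the shape of the `existing` record, and its keys are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone


def needs_issuance(existing, declared_domains, renew_before_days, now=None):
    """Decide whether the issuer should request a new certificate.

    existing: None when Vault holds no certificate yet, otherwise a dict
              with "not_after" (timezone-aware datetime) and "sans" (list).
    declared_domains: domain list from the Git declaration.
    """
    now = now or datetime.now(timezone.utc)
    if existing is None:
        return True
    current = {name.lower() for name in existing["sans"]}
    declared = {name.lower() for name in declared_domains}
    if current != declared:
        # The declaration changed; the stored certificate no longer fits.
        return True
    # Expired certificates have a negative remaining lifetime and also renew.
    return existing["not_after"] - now <= timedelta(days=renew_before_days)
```

Keeping the decision free of network and Vault calls makes it easy to test the no-op reconciliation path before any declaration is enabled.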


5. Approved Build Order

The dependency order and current status are:


1. prod-dns-01                         complete
2. prod-vault-01                       complete
3. Vault initialization/unseal         complete
4. File audit device                    complete
5. Named Vault administrator            complete
6. Initial root-token retirement        complete
7. cert/ KV v2 mount                    complete
8. Terraform prod-cert-01               complete
9. Ansible issuer foundation            complete
10. Vault TLS trust on prod-cert-01      complete
11. Vault issuer helper installation     complete
12. platform-admin acme/config access    pending; current blocker
13. acme/ KV v2 configuration            pending verification
14. Cloudflare token insertion           pending verification
15. Wrapped AppRole delivery             pending
16. prod-cert-01 preflight               pending
17. Twice-daily timer activation         pending
18. Let's Encrypt staging issuance       pending
19. Idempotent second reconciliation     pending
20. Staging renewal test                 pending
21. Production issuance                  pending
22. prod-int-proxy-01                    blocked until issuer approval

Do not generate or expose an unwrapped SecretID outside the consuming host. Do not begin HAProxy merely because the issuer VM and packages exist.

Immediate continuation:


prod-vault-01:
  repair and apply platform-admin.hcl
  verify acme/config capability
  rerun ac-vault-prepare-cert-issuer

prod-cert-01:
  bootstrap the wrapped AppRole credential
  run preflight
  enable the timer
  run a no-op reconciliation

6. VM Profile and Required Identity Inputs

The values in the allocation table below are final and deployed. The earlier 2-vCPU/4096-MiB proposal is superseded by the working 4-vCPU/8192-MiB configuration.

The deployed router-reserved VM identity and sizing are:

VM name        VM ID     MAC                 Reserved IP   vCPU   RAM        Disk   Template
prod-cert-01   3156003   aa:bb:cc:04:05:02   192.168.8.3   4      8192 MiB   40G    tmplt-ub-26-min-base

Network and service identity:


Proxmox node: pve
Bridge: vmbr0
Storage: local-lvm
DHCP: enabled inside Ubuntu
Router reservation: aa:bb:cc:04:05:02 → 192.168.8.3
Internal DNS server: 192.168.8.4
Vault endpoint: https://vault.aspireclan.com:8200
Direct Vault endpoint during bootstrap: https://192.168.8.2:8200
Issuer hostname: prod-cert-01
Issuer public exposure: none

The hostname is not secret. The VM ID is not secret. The MAC and reserved IP are not secret. Never place the Vault RoleID and SecretID together in Terraform variables, and never place the SecretID, Cloudflare token, certificate private keys, or ACME account key in Terraform state.


7. DNS and Endpoint Design

prod-cert-01 must use prod-dns-01 at 192.168.8.4 as its only resolver. Public resolution must be forwarded by BIND; do not configure public resolvers beside the internal resolver on the issuer VM.

Recommended internal DNS record:


prod-cert-01.aspireclan.com  → 192.168.8.3

The short hostname remains:


prod-cert-01

Validate before Ansible attempts Vault or ACME access:


dig @192.168.8.4 prod-cert-01.aspireclan.com
dig @192.168.8.4 vault.aspireclan.com
getent ahostsv4 prod-cert-01.aspireclan.com
getent ahostsv4 vault.aspireclan.com
getent ahostsv4 acme-staging-v02.api.letsencrypt.org
getent ahostsv4 api.cloudflare.com

The issuer does not need an inbound HTTPS listener. DNS-01 allows issuance without exposing TCP 80 or 443 on prod-cert-01.


8. Certificate Issuer Trust and Bootstrap Dependency Rule

The working TLS bootstrap exposed an important Unix-permission rule. /etc/aspireclan/cert-issuer is intentionally restricted, so the interactive acllc account cannot directly open vault-ca.pem. Test Vault TLS as the service account:


sudo -u ac-cert-issuer /usr/bin/curl --silent --show-error --fail-with-body --cacert /etc/aspireclan/cert-issuer/vault-ca.pem https://192.168.8.2:8200/v1/sys/health |
jq '{initialized,sealed,standby,version}'

Do not weaken the directory to make an unprivileged curl command work.

The issuer depends on Vault for durable secrets and certificate state, but the issuer must also be able to authenticate to Vault after recreation without placing long-lived secrets in Terraform or Git.

Approved bootstrap sequence:


1. Terraform creates prod-cert-01.
2. Ansible installs the issuer software without any SecretID.
3. An administrator reads the non-secret RoleID from Vault.
4. An administrator generates a response-wrapped one-time SecretID.
5. The wrapping token is delivered out of band to prod-cert-01.
6. prod-cert-01 unwraps it once over TLS.
7. The SecretID is written to a root-only file or consumed by Vault Agent.
8. The wrapping token is immediately useless after unwrapping.
9. The issuer authenticates and receives a short-lived batch token.
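Steps 6 and 9 map onto two Vault HTTP calls: unwrapping the one-time token and logging in with AppRole. The sketch below only builds the requests and sends nothing. It assumes the AppRole auth method is mounted at the default path approle and uses the direct bootstrap endpoint from this page; the function names are hypothetical.

```python
import json
import urllib.request

VAULT_ADDR = "https://192.168.8.2:8200"  # direct bootstrap endpoint


def build_unwrap_request(wrapping_token, vault_addr=VAULT_ADDR):
    """Request that unwraps a response-wrapped SecretID exactly once."""
    return urllib.request.Request(
        f"{vault_addr}/v1/sys/wrapping/unwrap",
        method="POST",
        # The wrapping token itself authenticates the unwrap call.
        headers={"X-Vault-Token": wrapping_token},
    )


def build_login_request(role_id, secret_id, vault_addr=VAULT_ADDR):
    """Request that exchanges RoleID and SecretID for a Vault token."""
    body = json.dumps({"role_id": role_id, "secret_id": secret_id}).encode()
    return urllib.request.Request(
        f"{vault_addr}/v1/auth/approle/login",  # assumes default mount path
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
```

When sending these requests, pass an SSL context created with ssl.create_default_context(cafile="/etc/aspireclan/cert-issuer/vault-ca.pem") to urllib.request.urlopen so that verification stays enabled.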

The Vault listener currently uses a bootstrap self-signed certificate. Until it is replaced by an approved certificate, prod-cert-01 must receive the exact Vault CA/server certificate through a controlled, non-secret Ansible file or operator transfer and use it with VAULT_CACERT. Do not use permanent VAULT_SKIP_VERIFY=true.

The issuer must not depend on HAProxy to reach Vault. It should connect directly to the internal Vault endpoint so proxy failure cannot block certificate renewal.


9. Security Model

Mandatory rules:


prod-cert-01 runs no unrelated application workloads.
Terraform manages VM infrastructure only.
Ansible manages packages, files, users, systemd, DNS, and UFW.
Vault stores the Cloudflare token and durable certificate records.
The Cloudflare token is zone-scoped to aspireclan.com.
The token has DNS edit and only the additional zone-read permission required by the plugin.
No Global API Key is used.
No Cloudflare token is stored in Certbot renewal files permanently.
Temporary credential files are root-owned, mode 0600, and removed after use.
The AppRole is dedicated to prod-cert-01 and bound to 192.168.8.3/32 where practical.
Vault tokens are short-lived batch tokens.
The issuer policy cannot administer Vault, auth methods, policies, audit devices, or mounts.
The issuer cannot delete or destroy Vault certificate history.
Certificate writes use KV v2 check-and-set where practical.
UFW defaults: deny incoming, deny routed, allow outgoing initially.
Inbound SSH is restricted to approved management sources.
No inbound public internet access is permitted.
Logs contain metadata and errors, never private keys, tokens, SecretIDs, or PEM payloads.

The dedicated VM is a security boundary. Do not co-locate HAProxy, application workloads, GitHub runners, or general administration tools on it.
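The check-and-set rule for certificate writes corresponds to the cas option of the Vault KV v2 data endpoint. The sketch below only builds the path and JSON body; the function name and the field names inside the secret are illustrative assumptions, not the reconciler's actual schema.

```python
import json


def build_cas_write(mount, path, secret_data, current_version):
    """Build the KV v2 write for a new certificate version.

    current_version is the version the issuer last read. Vault rejects the
    write if another writer created a newer version in the meantime.
    A value of 0 means "write only if the key does not exist yet".
    """
    url_path = f"/v1/{mount}/data/{path}"
    payload = {"options": {"cas": current_version}, "data": secret_data}
    return url_path, json.dumps(payload)
```

For example, build_cas_write("cert", "prod/srvc/api-fp", {"fullchain": "..."}, 3) targets /v1/cert/data/prod/srvc/api-fp and succeeds only while version 3 is still the latest.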


10. Repository File Structure

The repository paths below are the working names. Planning-only names such as cert-issuer.tfvars, [cert_issuer], configure-cert-issuer.yml, and group_vars/cert_issuer.yml are obsolete.

The working repository structure is aligned with the supplied working production root:


terraform/
├── .github/workflows/terraform-proxmox-deploy.yml
├── modules/proxmox-vm/
│   ├── main.tf
│   ├── variables.tf
│   └── outputs.tf
└── envs/prod/
  ├── backend.tf
  ├── main.tf
  ├── variables.tf
  ├── outputs.tf
  ├── terraform.tfvars
  ├── web.tfvars
  ├── app.tfvars
  ├── db.tfvars
  ├── k8s.tfvars
  ├── runner.tfvars
  ├── vault.tfvars
  ├── cert.tfvars               # working
  ├── dns/                             # separate DNS root and state
  └── ansible/
      ├── configure-vms.yml
      ├── configure-vault.yml
      ├── configure-cert.yml    # working issuer overlay
      ├── inventory.ini
      ├── requirements.yml
      ├── group_vars/cert.yml   # working issuer overlay
      ├── templates/
      └── files/cert/

The current Terraform implementation does not create envs/prod/cert-issuer as a separate root. Keeping the issuer in the existing root avoided redesigning the working workflow during the first build. After the issuer is stable, a controlled state extraction can move only the issuer resources to:


Terraform root: envs/prod/cert-issuer
S3 state key: prod/cert-issuer/terraform.tfstate

That migration must use an explicit state-move procedure and must not recreate the VM.


11. Terraform Responsibilities and Required Additions

The final Terraform contract is cert_vms, not cert_issuer_vms, and the production values file is cert.tfvars. The working VM object is:


cert_vms = {
  prod-cert-01 = {
    vmid        = 3156003
    macaddr     = "aa:bb:cc:04:05:02"
    reserved_ip = "192.168.8.3"
    cores       = 4
    memory      = 8192
    disk_size   = "40G"
  }
}

The production workflow loads cert.tfvars whenever the file is present. If the file were missing, cert_vms would fall back to its empty default, Terraform would plan to destroy prod-cert-01, and the documentation would no longer match the deployed state.

Terraform manages only the VM lifecycle. It does not install Certbot, initialize Vault, create SecretIDs, retrieve secrets, request certificates, or execute remote bootstrap commands.

Add to envs/prod/variables.tf:


variable "cert_vms" {
  description = "Certificate issuer VMs for this environment."
  type = map(object({
    vmid        = number
    macaddr     = string
    reserved_ip = string
    cores       = number
    memory      = number
    disk_size   = string
  }))
  default = {}
}

Add to envs/prod/main.tf:


module "cert_vms" {
  source = "../../modules/proxmox-vm"

  for_each = var.cert_vms

  name          = each.key
  vmid          = each.value.vmid
  target_node   = var.target_node
  template_name = var.template_name
  storage       = var.storage
  bridge        = var.bridge
  macaddr       = each.value.macaddr
  reserved_ip   = each.value.reserved_ip
  cores         = each.value.cores
  memory        = each.value.memory
  disk_size     = each.value.disk_size
}

Add to envs/prod/outputs.tf:


output "cert_vms" {
  value = {
    for name, vm in module.cert_vms : name => {
      vmid        = vm.vmid
      macaddr     = vm.macaddr
      reserved_ip = vm.reserved_ip
    }
  }
}

The deployed envs/prod/cert.tfvars is:


cert_vms = {
  prod-cert-01 = {
    vmid        = 3156003
    macaddr     = "aa:bb:cc:04:05:02"
    reserved_ip = "192.168.8.3"
    cores       = 4
    memory      = 8192
    disk_size   = "40G"
  }
}

The shared VM module already uses a full clone, QEMU Guest Agent, DHCP, a fixed MAC, scsi0, and format = "raw". Preserve that working module. Do not add Cloud-Init, static Netplan, remote-exec, or secret-bearing provisioners.


12. Router Reservation and Collision Checks

The collision and reservation checks in this section are retained as rebuild safeguards. For the current production VM, VM ID 3156003, MAC aa:bb:cc:04:05:02, and IP 192.168.8.3 are already in use by prod-cert-01; do not interpret the original “expected when free” output as the current state.

The approved router reservation is:


aa:bb:cc:04:05:02 → 192.168.8.3
Hostname: prod-cert-01

For rebuild or collision review, run a live Proxmox check from an approved Proxmox shell or API client:


qm status 3156003

Expected only before first creation when free:


Configuration file 'nodes/pve/qemu-server/3156003.conf' does not exist

Also verify that no cluster resource uses the ID:


pvesh get /cluster/resources --type vm --output-format json | jq -e '.[] | select(.vmid == 3156003)' && echo 'ERROR: VM ID 3156003 is already in use.' || echo 'PASS: VM ID 3156003 is not present in cluster resources.'

Check the reserved IP before power-on:


ping -c 2 -W 1 192.168.8.3 || true
ip neigh show | grep -F '192.168.8.3' || true

An unanswered ping does not prove that an address is free. Confirm the router reservation and current DHCP lease table. Ubuntu remains DHCP-based; do not add a static Netplan address.
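The shell one-liner for the cluster resource check can print PASS when pvesh itself fails, because jq then has nothing to select. As a hedged alternative, the sketch below treats empty output as an error rather than as a free ID. It assumes the JSON list from pvesh has already been captured as text; the function name is hypothetical.

```python
import json


def vmid_in_use(resources_json, vmid):
    """Return True when any cluster resource already uses the VM ID.

    resources_json: raw JSON text from the cluster resources query.
    Raises ValueError on empty input so that a failed query is never
    mistaken for "ID is free".
    """
    if not resources_json.strip():
        raise ValueError("empty cluster resource output; cannot decide")
    resources = json.loads(resources_json)
    return any(item.get("vmid") == vmid for item in resources)
```

For the deployed issuer, vmid_in_use(output, 3156003) is expected to return True; False is expected only before first creation or during a reviewed rebuild.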


13. Terraform Validation, Plan, and Apply

Current Terraform state:


prod-cert-01 was created successfully.
Current production plans should report: No changes.
State key: prod/terraform.tfstate
Values file: envs/prod/cert.tfvars

A later plan proposing replacement or deletion of prod-cert-01 remains a stop condition unless explicitly reviewed.

Completed first execution — build prod-cert-01

The completed build followed these controls:

  1. Confirm the router reservation.
  2. Confirm that 3156003 is unused in live Proxmox.
  3. Add the variable, module, output, and cert.tfvars declarations.
  4. Update the workflow to include cert.tfvars without removing any current variable file.
  5. Add prod-cert-01 to the working Ansible inventory before Terraform apply so precise target resolution cannot fail.
  6. Push the reviewed change to prod.
  7. The workflow must run the production DNS plan and health gate first.
  8. The main production plan must refresh all existing resources and propose only one new VM.
  9. Automatic push apply is allowed only when the plan has no delete or replacement actions.
  10. The newly created host must receive the real Ansible playbook after SSH becomes reachable.

The workflow's working Terraform variable arguments are:


tf_var_args=(
  "-var-file=terraform.tfvars"
  "-var-file=web.tfvars"
  "-var-file=app.tfvars"
  "-var-file=db.tfvars"
  "-var-file=k8s.tfvars"
  "-var-file=runner.tfvars"
)

if [ -f "vault.tfvars" ]; then
  tf_var_args+=("-var-file=vault.tfvars")
fi

if [ -f "cert.tfvars" ]; then
  tf_var_args+=("-var-file=cert.tfvars")
fi

Local review command on the self-hosted runner:


cd envs/prod
terraform init -input=false -reconfigure
terraform fmt -check -recursive
terraform validate
terraform plan -input=false -lock-timeout=5m -var-file=terraform.tfvars -var-file=web.tfvars -var-file=app.tfvars -var-file=db.tfvars -var-file=k8s.tfvars -var-file=runner.tfvars -var-file=vault.tfvars -var-file=cert.tfvars -out=tfplan
terraform show -json tfplan > tfplan.json

Expected plan scope:


Create: prod-cert-01 only
Update: none unless separately reviewed
Delete: none
Replace: none

Stop if Terraform proposes replacement or deletion of Vault, DNS, web, Kubernetes, runner, database, or application VMs.
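The stop condition can be checked against the tfplan.json file produced by the review command. The sketch below is a minimal illustration that relies on the resource_changes list of Terraform's JSON plan representation; the function name is hypothetical, and the resource address used in the usage note is only an example of the format.

```python
import json


def destructive_changes(plan):
    """Return (address, actions) for every planned delete or replacement.

    In Terraform's JSON plan, a replacement appears as an action list that
    contains "delete" together with "create", in either order.
    """
    flagged = []
    for change in plan.get("resource_changes", []):
        actions = change.get("change", {}).get("actions", [])
        if "delete" in actions:
            flagged.append((change["address"], actions))
    return flagged


def load_plan(path="tfplan.json"):
    """Read the JSON plan written by `terraform show -json tfplan`."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)
```

A non-empty result from destructive_changes(load_plan()) should halt the run for review; an entry such as ('module.cert_vms["prod-cert-01"]...', ['delete', 'create']) would indicate a planned replacement of the issuer VM.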


14. Ansible Inventory and Targeting

Working inventory and playbook wiring:


[cert]
prod-cert-01 ansible_host=192.168.8.3 ansible_user=acllc ansible_ssh_private_key_file=~/.ssh/id_ed25519_ansible ansible_python_interpreter=/usr/bin/python3 ansible_ssh_common_args='-o IdentitiesOnly=yes -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null'

configure-vms.yml:
bootstrap-vms.yml
configure-web.yml
configure-vault.yml
configure-cert.yml
finalize-firewall.yml

Two implementation defects were found and corrected:


Incorrect:
cert_issuer_vault_role_id_path
cert_issuer_vault_secret_id_path

Correct:
cert_issuer_role_id_path
cert_issuer_secret_id_path

Incorrect:
Ansible become_user: ac-cert-issuer for command tasks

Correct:
install acl
execute through /usr/sbin/runuser with HOME=/var/lib/ac-cert-issuer


The working issuer playbook must use both safety boundaries:


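Inside configure-cert.yml, restrict the play to the issuer group:

hosts: cert

On the command line, limit the run to the issuer host:

ansible-playbook -i ansible/inventory.ini ansible/configure-vms.yml --limit prod-cert-01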

The working playbook is included in envs/prod/ansible/configure-vms.yml without changing the existing common order:


---
- import_playbook: bootstrap-vms.yml
- import_playbook: configure-web.yml
- import_playbook: configure-vault.yml
- import_playbook: configure-cert.yml
- import_playbook: finalize-firewall.yml

Update workflow component detection with a dedicated flag and group. Issuer files must not map to vault, web, or all merely because they reference Vault or certificates.

Recommended classification:


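ansible_cert="false"

# Inside the component case statement. Match the working issuer paths,
# not only the obsolete cert-issuer planning names:
*configure-cert.yml|*group_vars/cert.yml|*files/cert/*|*cert-issuer*|*cert_issuer*)
  ansible_cert="true"
  ;;

# When building the Ansible pattern list:
[ "${ansible_cert}" = "true" ] && patterns+=("cert")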

Shared files such as inventory.ini, configure-vms.yml, and group_vars/all.yml may still conservatively target all hosts.
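The classification rule can also be expressed as a small path matcher, which makes the intended behavior easy to test outside the workflow. This is a sketch under assumptions: the pattern lists are derived from the working file names on this page, and the function and constant names are hypothetical.

```python
from fnmatch import fnmatch

# Shared files conservatively target every host.
SHARED_PATTERNS = (
    "*/inventory.ini",
    "*/configure-vms.yml",
    "*/group_vars/all.yml",
)

# Issuer-only files target the [cert] group and nothing else.
CERT_PATTERNS = (
    "*/configure-cert.yml",
    "*/group_vars/cert.yml",
    "*/files/cert/*",
    "*cert-issuer*",
)


def ansible_targets(changed_paths):
    """Map changed repository paths to the Ansible groups they affect."""
    targets = set()
    for path in changed_paths:
        if any(fnmatch(path, pattern) for pattern in SHARED_PATTERNS):
            targets.add("all")
        elif any(fnmatch(path, pattern) for pattern in CERT_PATTERNS):
            targets.add("cert")
    return sorted(targets)
```

With these patterns, a change to envs/prod/ansible/configure-vault.yml yields no issuer target, so a Vault-only change does not trigger a run against prod-cert-01.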


15. Common Operating-System Baseline

The common bootstrap and working issuer playbook must enforce or verify:


Hostname: prod-cert-01
SSH and passwordless sudo for acllc
QEMU Guest Agent active
Internal DNS resolver: 192.168.8.4 only
UFW installed and enabled
Default incoming: deny
Default routed: deny
Default outgoing: allow initially
No public inbound service
Dedicated issuer service account
Root-only credential and working directories
Core dumps disabled for issuer services
Private temporary files created with umask 077
Certbot, OpenSSL, jq, curl, Python, and the issuer utility installed from approved sources
systemd service installed; timer kept disabled until AppRole credentials and preflight succeed

Useful checks after provisioning:


hostnamectl --static
ip -brief address
ip route
resolvectl status
systemctl is-active qemu-guest-agent ssh
sudo ufw status verbose
sudo ss -lntup

The common playbook must not run web, application, database, Kubernetes, runner, Vault-server, or proxy roles against this host.


16. Configure Internal DNS on prod-cert-01

The issuer must resolve all names through prod-dns-01 at 192.168.8.4.

Validate on prod-cert-01:


resolvectl status
getent ahostsv4 vault.aspireclan.com
getent ahostsv4 api.cloudflare.com
getent ahostsv4 acme-staging-v02.api.letsencrypt.org
getent ahostsv4 acme-v02.api.letsencrypt.org
dig @192.168.8.4 vault.aspireclan.com
dig @192.168.8.4 aspireclan.com NS

Expected internal service result:


vault.aspireclan.com → 192.168.8.2

The public authoritative nameservers for aspireclan.com must remain Cloudflare so Let's Encrypt can observe the temporary _acme-challenge TXT record. Internal BIND records do not replace public DNS-01 validation.
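For reference, the TXT record that Let's Encrypt looks up is derived as follows under the ACME DNS-01 rules (RFC 8555, section 8.4). Certbot and the Cloudflare plugin perform this internally; the sketch below only illustrates the derivation, and the function names are hypothetical.

```python
import base64
import hashlib


def challenge_record_name(domain):
    """Return the public DNS name that carries the DNS-01 TXT record."""
    if domain.startswith("*."):
        # Wildcard names are validated at the base domain.
        domain = domain[2:]
    return f"_acme-challenge.{domain}"


def challenge_txt_value(key_authorization):
    """Return the TXT value: base64url(SHA-256(key authorization)), unpadded."""
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
```

For the declared domain fp.aspireclan.com, the record name is _acme-challenge.fp.aspireclan.com, which must be resolvable through the public Cloudflare nameservers, not only through internal BIND.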


17. UFW Policy

Initial fail-closed host policy:


Default incoming: deny
Default routed: deny
Default outgoing: allow
TCP 22 inbound: approved management sources only
TCP 8200 outbound: prod-vault-01 at 192.168.8.2
TCP 443 outbound: Cloudflare API and Let's Encrypt ACME endpoints
UDP/TCP 53 outbound: prod-dns-01 at 192.168.8.4
Inbound TCP 80: not required
Inbound TCP 443: not required
Inbound public internet: prohibited

The initial UFW implementation may retain the common allow outgoing baseline. Egress should later be narrowed only after confirming all required package, OCSP, CRL, Cloudflare, Let's Encrypt, Vault, DNS, monitoring, and time-synchronization destinations.

Validate:


sudo ufw status verbose
sudo ss -lntup
nc -vz 192.168.8.2 8200
curl --silent --show-error --fail --head https://api.cloudflare.com/client/v4/
curl --silent --show-error --fail --head https://acme-staging-v02.api.letsencrypt.org/directory

A successful network connection is not authorization. Vault and Cloudflare must still reject requests without valid credentials.


18. Install the Approved ACME Client

The approved client is now installed through Ubuntu packages:


certbot
python3-certbot-dns-cloudflare
openssl
curl
jq
python3
python3-yaml
acl

The reconciler uses a temporary mode-0600 Cloudflare credential file under /run/ac-cert-issuer and removes it after the Certbot process exits.
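The temporary credential file handling described above can be sketched as a context manager. This is an illustration under assumptions, not the deployed reconciler code: the function name and file name are hypothetical, while the dns_cloudflare_api_token key is the one the Certbot Cloudflare plugin reads from its credentials file.

```python
import os
from contextlib import contextmanager


@contextmanager
def cloudflare_credentials(token, runtime_dir="/run/ac-cert-issuer"):
    """Write a mode-0600 credentials file and always remove it afterwards."""
    path = os.path.join(runtime_dir, f"cloudflare-{os.getpid()}.ini")
    # O_EXCL refuses to reuse an existing file; 0o600 applies at creation,
    # so the token is never readable by group or others.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(f"dns_cloudflare_api_token = {token}\n")
        yield path
    finally:
        # Runs on success and on failure of the wrapped Certbot call.
        if os.path.exists(path):
            os.unlink(path)
```

The yielded path would be passed to --dns-cloudflare-credentials for the duration of a single Certbot invocation.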

Certbot remains the approved client for this design because it has a maintained Cloudflare DNS plugin, preserves renewal configuration, supports the Let's Encrypt staging endpoint, performs renewal eligibility checks, and supports deploy hooks. The current stable Certbot documentation also describes automated periodic renewal and the --staging option.

The alternatives remain valid but are not selected:


lego:
  Strength: single Go binary and broad DNS provider support.
  Tradeoff: more custom work for lineages, hooks, Vault mapping, and package lifecycle.

acme.sh:
  Strength: lightweight shell client and broad DNS automation.
  Tradeoff: larger custom shell surface and a less controlled packaging model for this design.

Decision:
  Keep Certbot and certbot-dns-cloudflare.
  Do not silently switch clients after certificate lineages exist.

The Ubuntu package installation was implemented after confirming Ubuntu 26.04 package availability and the current official Certbot installation guidance. Do not mix Snap, APT, and pip installations on the same host.

Required capabilities:


certbot certonly
--dns-cloudflare
--dns-cloudflare-credentials <root-only-temporary-file>
--dns-cloudflare-propagation-seconds <approved-value>
--staging for test issuance
certbot renew --dry-run for renewal validation
certbot renew for scheduled renewal eligibility checks

Certbot's local state is operational cache, not the sole durable source of truth. Vault remains the durable certificate source, and the issuer reconciliation layer must recover safely after VM recreation.


19. Configure Vault Trust and Machine Identity

The issuer talks to Vault through its own Python reconciler and HTTPS API calls. A Vault CLI package is not required on prod-cert-01.

Working paths:


Configuration:
/etc/aspireclan/cert-issuer/config.yml

Declarations:
/etc/aspireclan/cert-issuer/certificates.yml

Vault listener certificate:
/etc/aspireclan/cert-issuer/vault-ca.pem

AppRole directory:
/etc/aspireclan/cert-issuer/approle

RoleID:
/etc/aspireclan/cert-issuer/approle/role_id

SecretID:
/etc/aspireclan/cert-issuer/approle/secret_id

Issuer executable:
/usr/local/sbin/ac-cert-issuer

The exact certificate presented by Vault was installed locally:


subject:
O=Aspireclan LLC, CN=vault.aspireclan.com

issuer:
O=Aspireclan LLC, CN=vault.aspireclan.com

notBefore:
Jun 14 00:22:04 2026 GMT

notAfter:
Jun 14 00:27:04 2027 GMT

SHA-256 fingerprint:
FB:3A:55:29:9F:EB:B4:BC:69:0D:FC:AF:FB:09:B8:D0:
96:D6:EE:F4:19:E4:25:30:0C:B0:D2:54:9D:EE:EF:36
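The installed file can be compared against the recorded fingerprint without extra tooling. The sketch below is a minimal example that assumes the file holds a single PEM certificate and is run as root or as ac-cert-issuer; the function names are hypothetical.

```python
import hashlib
import ssl


def sha256_fingerprint(der_bytes):
    """Format the SHA-256 digest of DER bytes as colon-separated hex pairs."""
    digest = hashlib.sha256(der_bytes).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def pem_file_fingerprint(path):
    """Return the SHA-256 fingerprint of a single-certificate PEM file."""
    with open(path, "r", encoding="ascii") as handle:
        der = ssl.PEM_cert_to_DER_cert(handle.read())
    return sha256_fingerprint(der)
```

The value returned by pem_file_fingerprint("/etc/aspireclan/cert-issuer/vault-ca.pem") should equal the recorded fingerprint once the two displayed lines are joined into one string.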

Installed ownership:


/etc/aspireclan/cert-issuer:
  owner: root
  group: ac-cert-issuer
  mode: 0750

/etc/aspireclan/cert-issuer/vault-ca.pem:
  owner: root
  group: ac-cert-issuer
  mode: 0644

Because acllc is not a member of ac-cert-issuer, an ordinary command may report that vault-ca.pem does not exist even though root installed it successfully. Test access as the service account:


sudo -u ac-cert-issuer test -r /etc/aspireclan/cert-issuer/vault-ca.pem &&
echo "PASS: Vault certificate is readable"

sudo -u ac-cert-issuer /usr/bin/curl --silent --show-error --fail-with-body --cacert /etc/aspireclan/cert-issuer/vault-ca.pem https://192.168.8.2:8200/v1/sys/health |
jq '{initialized,sealed,standby,version}'

Expected Vault state:


initialized: true
sealed: false
standby: false
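
The expected state can also be checked programmatically once the health JSON has been fetched. This is a sketch only: it assumes the response body has already been parsed into a dictionary, and `health_problems` is a hypothetical helper name, not a function of the installed utility.

```python
EXPECTED_HEALTH = {"initialized": True, "sealed": False, "standby": False}


def health_problems(health: dict) -> list:
    """Return human-readable mismatches; an empty list means the state is as expected."""
    problems = []
    for key, expected in EXPECTED_HEALTH.items():
        actual = health.get(key)
        # Identity comparison rejects truthy stand-ins such as the string "false".
        if actual is not expected:
            problems.append(f"{key}: expected {expected}, got {actual}")
    return problems
```

A missing key is reported as a mismatch, so a truncated or unexpected response fails closed rather than passing silently.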

Current machine-identity state:


Vault CA: installed
RoleID file: absent
SecretID file: absent
AppRole preflight: pending
Timer: inactive

Do not set VAULT_SKIP_VERIFY=true. Do not relax the protected directory merely to allow an acllc shell to read the CA file.


20. Deploy Issuer Configuration

The working Ansible overlay is deployed:


envs/prod/ansible/
├── configure-cert.yml
├── configure-vms.yml
├── inventory.ini
├── group_vars/
│   └── cert.yml
├── files/
│   └── cert/
│       ├── ac-cert-issuer
│       └── certificates.yml
└── templates/
    ├── cert-issuer-config.yml.j2
    ├── ac-cert-issuer.service.j2
    ├── ac-cert-issuer.timer.j2
    └── ac-cert-issuer-logrotate.j2

The deployed host paths are:


Configuration:
/etc/aspireclan/cert-issuer/config.yml

Declarations:
/etc/aspireclan/cert-issuer/certificates.yml

Vault trust:
/etc/aspireclan/cert-issuer/vault-ca.pem

AppRole files:
/etc/aspireclan/cert-issuer/approle/role_id
/etc/aspireclan/cert-issuer/approle/secret_id

Executable:
/usr/local/sbin/ac-cert-issuer

State:
/var/lib/ac-cert-issuer

Runtime:
/run/ac-cert-issuer

Logs:
/var/log/ac-cert-issuer

The two corrected playbook contracts are:


Correct AppRole variables:
cert_issuer_role_id_path
cert_issuer_secret_id_path

Correct unprivileged execution:
/usr/sbin/runuser
--user ac-cert-issuer
--
/usr/bin/env
HOME=/var/lib/ac-cert-issuer
/usr/local/sbin/ac-cert-issuer

The working declaration file is intentionally disabled:


---
schema_version: 1

certificates:
- name: fp-prod-public
  enabled: false
  vault_path: prod/srvc/api-fp
  domains:
    - fp.aspireclan.com
    - api.fp.aspireclan.com
  email: developer@aspireclan.com
  acme_directory: https://acme-v02.api.letsencrypt.org/directory
  renew_before_days: 30
  dns_propagation_seconds: 60
  key_type: ecdsa
  elliptic_curve: secp384r1

Keep enabled: false until Vault AppRole bootstrap and preflight pass. For the first issuance test, change the declaration to the staging directory before enabling it:


https://acme-staging-v02.api.letsencrypt.org/directory
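
The staging-first rule can be expressed as a small guard over the parsed declaration file. The sketch below assumes certificates.yml has already been loaded into a dictionary by a YAML parser (not shown); both function names are hypothetical.

```python
STAGING_DIRECTORY = "https://acme-staging-v02.api.letsencrypt.org/directory"


def enabled_groups(declarations: dict) -> list:
    """Names of certificate groups whose enabled flag is exactly true."""
    return [
        cert["name"]
        for cert in declarations.get("certificates", [])
        if cert.get("enabled") is True
    ]


def enabled_production_groups(declarations: dict) -> list:
    """Enabled groups that do not point at the Let's Encrypt staging directory."""
    return [
        cert["name"]
        for cert in declarations.get("certificates", [])
        if cert.get("enabled") is True
        and cert.get("acme_directory") != STAGING_DIRECTORY
    ]
```

During the first issuance test, `enabled_production_groups` should return an empty list even after the declaration is enabled.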

configure-cert.yml installs the timer unit but leaves it stopped unless both credential files exist. This is the expected safe current state.
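
The gating condition amounts to a check that both credential files are present. A minimal sketch follows; the helper name is hypothetical, and the additional requirement that each file be non-empty is an assumption that goes slightly beyond the playbook behavior described above.

```python
import os


def credentials_ready(approle_dir: str) -> bool:
    """True only when both AppRole files exist as regular, non-empty files."""
    for name in ("role_id", "secret_id"):
        path = os.path.join(approle_dir, name)
        # Assumption: an empty file is treated the same as a missing one.
        if not (os.path.isfile(path) and os.path.getsize(path) > 0):
            return False
    return True
```

The check uses only file metadata, so it never reads or prints either credential.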


21. Start the Issuer and Perform Pre-Issuance Validation

Current pre-credential validation commands:


sudo /usr/sbin/runuser --user ac-cert-issuer -- /usr/bin/env HOME=/var/lib/ac-cert-issuer /usr/local/sbin/ac-cert-issuer --config /etc/aspireclan/cert-issuer/config.yml validate-config

sudo /usr/sbin/runuser --user ac-cert-issuer -- /usr/bin/env HOME=/var/lib/ac-cert-issuer /usr/local/sbin/ac-cert-issuer --config /etc/aspireclan/cert-issuer/config.yml status

Expected current status:


RoleID file present: false
SecretID file present: false
Enabled certificate groups: none
ac-cert-issuer.timer: inactive

Before enabling the reconciliation timer, run:


hostnamectl --static
ip -brief address
resolvectl status
sudo ufw status verbose
systemctl is-active qemu-guest-agent ssh
certbot --version
openssl version
jq --version
curl --version
/usr/local/sbin/ac-cert-issuer --help

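Validate Vault TLS without a token. The Vault CLI is not required on prod-cert-01 and acllc cannot read the CA file, so use curl as the service account:


sudo -u ac-cert-issuer /usr/bin/curl --silent --show-error --fail-with-body --cacert /etc/aspireclan/cert-issuer/vault-ca.pem https://192.168.8.2:8200/v1/sys/health |
jq '{initialized,sealed,standby,version}'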

Validate that the zone for the declared staging name is delegated to Cloudflare, and inspect any existing challenge record, before requesting a certificate:


dig +short NS aspireclan.com
dig +short TXT _acme-challenge.acme-test.aspireclan.com

Do not enable the timer until manual staging issuance, Vault write, second-run idempotency, and renewal dry-run are complete.


22. Create and Deliver the AppRole SecretID

All bootstrap work is performed directly on Ubuntu servers. PowerShell is not part of the approved procedure.

Current Vault-side checkpoint

On prod-vault-01, this helper is installed:


/usr/local/sbin/ac-vault-prepare-cert-issuer
owner: root
group: root
mode: 0750

The helper:

  1. authenticates the named Vault administrator;
  2. writes the version-controlled policies;
  3. verifies or enables acme/ as KV v2;
  4. writes acme/config;
  5. stores the Cloudflare token at acme/cloudflare/dns;
  6. configures the CIDR-bound cert-issuer AppRole;
  7. returns the RoleID and a response-wrapped SecretID.

The last observed run stopped at step 4:


URL: PUT https://192.168.8.2:8200/v1/acme/config
HTTP status: 403
Error: permission denied

Because of that stop:


Cloudflare token write: not proven
RoleID output: not produced by that run
Wrapped SecretID: not produced by that run
prod-cert-01 credential files: absent

Repair the human administrator policy first

The version-controlled platform-admin.hcl and the policy stored in Vault must agree. After applying the corrected policy, authenticate as manoj-admin and verify:


vault token capabilities acme/config
vault token capabilities acme/data/cloudflare/dns
vault token capabilities auth/approle/role/cert-issuer
vault token capabilities auth/approle/role/cert-issuer/role-id
vault token capabilities auth/approle/role/cert-issuer/secret-id

Required minimum results:


acme/config:
create
read
update

acme/data/cloudflare/dns:
create
read
update

auth/approle/role/cert-issuer:
create
read
update

auth/approle/role/cert-issuer/role-id:
read

auth/approle/role/cert-issuer/secret-id:
create
update
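
The comparison against the required minimum can be automated after collecting the output of each vault token capabilities call. This is a sketch under stated assumptions: the CLI output for each path is a comma-separated capability list, and the helper names are illustrative.

```python
REQUIRED_CAPABILITIES = {
    "acme/config": {"create", "read", "update"},
    "acme/data/cloudflare/dns": {"create", "read", "update"},
    "auth/approle/role/cert-issuer": {"create", "read", "update"},
    "auth/approle/role/cert-issuer/role-id": {"read"},
    "auth/approle/role/cert-issuer/secret-id": {"create", "update"},
}


def parse_capabilities(cli_output: str) -> set:
    """Turn 'create, read, update' into {'create', 'read', 'update'}."""
    return {item.strip() for item in cli_output.split(",") if item.strip()}


def missing_capabilities(observed: dict) -> dict:
    """Map each path to the required capabilities the token does not hold."""
    missing = {}
    for path, required in REQUIRED_CAPABILITIES.items():
        gap = required - observed.get(path, set())
        if gap:
            missing[path] = gap
    return missing
```

An empty result means the helper can be rerun; a result of `deny` for acme/config would reproduce the observed 403.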

Rerun the Vault helper

A successful helper response contains only bootstrap metadata:


role_id
wrapping_token
wrap_ttl: 30m
creation_path: auth/approle/role/cert-issuer/secret-id

Copy the RoleID and one-use wrapping token immediately. Do not store them in Git, Terraform, inventory, documentation, email, or ordinary files.

Bootstrap directly on prod-cert-01

Run:


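bash <<'BASH'
set -Eeuo pipefail
umask 077

CONFIG="/etc/aspireclan/cert-issuer/config.yml"
ISSUER="/usr/local/sbin/ac-cert-issuer"

# Cache sudo credentials now so the later non-interactive call (sudo -n) succeeds.
sudo -v

# Refuse to continue unless the Vault trust anchor is installed and non-empty.
sudo test -s /etc/aspireclan/cert-issuer/vault-ca.pem || {
  echo "ERROR: Vault TLS certificate is missing." >&2
  exit 1
}

# Never overwrite existing credentials.
sudo test ! -e /etc/aspireclan/cert-issuer/approle/role_id || {
  echo "ERROR: RoleID file already exists." >&2
  exit 1
}

sudo test ! -e /etc/aspireclan/cert-issuer/approle/secret_id || {
  echo "ERROR: SecretID file already exists." >&2
  exit 1
}

# Read from the terminal because stdin is the here-document; -s hides the token.
read -r -p "Paste the Vault Role ID: " ROLE_ID </dev/tty
read -r -s -p "Paste the one-use Vault wrapping token: " WRAPPING_TOKEN </dev/tty
echo

[[ -n "${ROLE_ID}" ]] || { echo "ERROR: Role ID is empty." >&2; exit 1; }
[[ -n "${WRAPPING_TOKEN}" ]] || { echo "ERROR: Wrapping token is empty." >&2; exit 1; }

# Pass the wrapping token on stdin so it never appears in process arguments.
printf '%s' "${WRAPPING_TOKEN}" |
  sudo -n "${ISSUER}" \
    --config "${CONFIG}" \
    bootstrap-approle \
    --role-id "${ROLE_ID}"

unset ROLE_ID WRAPPING_TOKEN
BASH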

Expected:


PASS: Wrapped AppRole SecretID was unwrapped, login-tested, and stored atomically

A wrapping token is single-use. If it expires or is consumed by a failed attempt, generate a new one on prod-vault-01.


23. Secure and Verify Machine Credentials

The installed issuer utility writes both AppRole values atomically as the restricted issuer account.
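
The source of the installed utility is not reproduced on this page. As an illustration of what an atomic credential write usually involves, the following sketch writes to a temporary file in the same directory and renames it into place; the function name and the default mode are assumptions chosen to match the expected results in this section.

```python
import os
import tempfile


def write_secret_atomically(path: str, value: str, mode: int = 0o640) -> None:
    """Write value to path so readers see either the old file or the complete new one."""
    directory = os.path.dirname(path) or "."
    # mkstemp creates the file with mode 0600 in the target directory, which
    # keeps the final rename on a single filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(value)
            handle.flush()
            os.fsync(handle.fileno())  # force the data to disk before the rename
        os.chmod(tmp_path, mode)
        os.replace(tmp_path, path)  # atomic rename on POSIX filesystems
    except BaseException:
        # Remove the partial temporary file; the destination is untouched.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Because the rename is the last step, an interrupted bootstrap leaves no partially written role_id or secret_id behind.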

Verify without displaying contents:


sudo stat -c '%n owner=%U group=%G mode=%a size=%s' /etc/aspireclan/cert-issuer/approle/role_id /etc/aspireclan/cert-issuer/approle/secret_id /etc/aspireclan/cert-issuer/vault-ca.pem

Expected:


role_id:
owner: ac-cert-issuer
group: ac-cert-issuer
mode: 640

secret_id:
owner: ac-cert-issuer
group: ac-cert-issuer
mode: 640

vault-ca.pem:
owner: root
group: ac-cert-issuer
mode: 644

The parent configuration directory remains intentionally restricted. Because acllc cannot traverse it, an attempt to open the CA directly fails with “Permission denied”, and an existence test such as test -e reports the file as missing. Do not solve this by making the directory world-accessible.

Security rules:


Do not cat the RoleID or SecretID.
Do not grep secret-bearing files.
Do not enable shell tracing.
Do not place values in process arguments.
Do not paste values into chat, tickets, or documentation.
Clear clipboard history and PuTTY scrollback after bootstrap.
Generate a new wrapped SecretID after rebuild or suspected exposure.

24. Authenticate to Vault and Verify the Issuer Token

Do not manually read the RoleID and SecretID into an interactive shell. The installed utility performs AppRole login internally and limits token lifetime to the process.

Run sanitized status:


sudo /usr/sbin/runuser --user ac-cert-issuer -- /usr/bin/env HOME=/var/lib/ac-cert-issuer /usr/local/sbin/ac-cert-issuer --config /etc/aspireclan/cert-issuer/config.yml status

Run preflight:


sudo /usr/sbin/runuser --user ac-cert-issuer -- /usr/bin/env HOME=/var/lib/ac-cert-issuer /usr/local/sbin/ac-cert-issuer --config /etc/aspireclan/cert-issuer/config.yml preflight

Expected after successful bootstrap while declarations remain disabled:


RoleID file present: true
SecretID file present: true
Vault AppRole login succeeded
Cloudflare token path is readable
Enabled certificate groups: none

The issuer token must be denied access to:


sys/policies/*
sys/auth/*
sys/audit/*
sys/mounts/*
identity/*
secret deletion
certificate version destruction

Perform capability review through an administrator session without printing the issuer token.


25. Create the Cloudflare API Token

The Cloudflare token is entered interactively on prod-vault-01 when /usr/local/sbin/ac-vault-prepare-cert-issuer runs. It is not entered on prod-cert-01, stored in GitHub, or placed in a command-line argument.

Create a dedicated Cloudflare API token for ACME DNS-01.

Approved permissions and scope:


Resource: Zone → aspireclan.com only
Permission: Zone → DNS → Edit
Permission: Zone → Zone → Read only when required by the plugin
Account-wide administrator permissions: none
Global API Key: prohibited
Optional client IP restriction: evaluate against the issuer's actual public egress IP
Optional expiration: use only with a tested rotation procedure

The token secret is shown once. Do not paste it into GitHub Actions, Terraform variables, Ansible vars, documentation, chat, email, shell command arguments, or Certbot configuration committed to Git.

The issuer will create a temporary root-only Cloudflare credentials file at runtime:


dns_cloudflare_api_token = <TOKEN_RETRIEVED_FROM_VAULT_AT_RUNTIME>

The runtime file must use mode 0600, live under /run/ac-cert-issuer, and be deleted in a trap even when Certbot fails.
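
The create-then-always-delete requirement maps naturally onto a context manager. The sketch below is illustrative only: the file name cloudflare.ini and the function name are hypothetical, and the token value is expected to come from Vault at runtime.

```python
import contextlib
import os


@contextlib.contextmanager
def cloudflare_credentials(runtime_dir: str, api_token: str):
    """Yield the path of a 0600 credentials file that is removed on exit."""
    path = os.path.join(runtime_dir, "cloudflare.ini")
    # O_EXCL refuses to reuse a stale file; 0o600 limits access to the owner.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(f"dns_cloudflare_api_token = {api_token}\n")
        yield path
    finally:
        # Runs on success and on any exception raised inside the with block.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(path)
```

The Certbot invocation would sit inside the `with` block, so a non-zero exit or an exception still removes the file.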


26. Store the ACME Credential in Vault

The approved Vault separation is:


Mount: acme/
Engine: KV v2
max_versions: 10
cas_required: false
delete_version_after: 0s

Logical path:
acme/cloudflare/dns

Fields:
api_token
managed_for
updated_at

The Cloudflare token is entered through a hidden prompt on prod-vault-01 and passed to /usr/local/sbin/ac-vault-prepare-cert-issuer through a root-controlled JSON payload. It is not entered on prod-cert-01.

The helper sequence is deliberately fail-closed:


configure acme/ KV v2
→ write acme/config
→ write acme/cloudflare/_schema when absent
→ write acme/cloudflare/dns
→ configure cert-issuer AppRole
→ generate wrapped SecretID

The observed 403 occurred at acme/config, before the token write. Do not claim the token exists until this command succeeds after policy repair:


sudo -H env VAULT_ADDR="https://192.168.8.2:8200" VAULT_CACERT="/opt/vault/tls/vault-cert.pem" vault kv metadata get acme/cloudflare/dns

The metadata output may be recorded. The secret value must never be displayed in documentation or logs.

Certificate readers and HAProxy identities must have no access to acme/.


27. Enable and Verify Certificate KV v2 Paths

The approved certificate mount remains:


Mount: cert/
Plugin: kv
Version: 2
Path: cert/<environment>/<workload-type>/<workload-name>

Approved environments:


local
dev
qa
prod

Approved workload types:


web
srvc
job
infra
proxy
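
The path convention and the two approved lists can be enforced before any Vault write. This is a sketch: the function name is hypothetical, and the workload-name pattern (lowercase letters, digits, and hyphens) is an assumption inferred from the example paths on this page.

```python
import re

APPROVED_ENVIRONMENTS = {"local", "dev", "qa", "prod"}
APPROVED_WORKLOAD_TYPES = {"web", "srvc", "job", "infra", "proxy"}
# Assumption: names such as "acme-test" and "api-fp" define the allowed shape.
WORKLOAD_NAME = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_approved_cert_path(path: str) -> bool:
    """Check cert/<environment>/<workload-type>/<workload-name>."""
    parts = path.split("/")
    if len(parts) != 4 or parts[0] != "cert":
        return False
    _, environment, workload_type, name = parts
    return (
        environment in APPROVED_ENVIRONMENTS
        and workload_type in APPROVED_WORKLOAD_TYPES
        and WORKLOAD_NAME.fullmatch(name) is not None
    )
```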

Initial staging path:


cert/prod/infra/acme-test

Initial production candidates remain disabled until staging is approved:


cert/prod/web/fp
cert/prod/srvc/api-fp
cert/prod/infra/vault
cert/prod/infra/harbor

Do not create every possible certificate. The declaration file defines only approved groups, and the reconciliation service skips disabled groups.


28. Certificate Data Model and Automatic Version History

Each successful certificate write should include:


certificate_pem
chain_pem
fullchain_pem
private_key_pem
haproxy_pem
serial_number
fingerprint_sha256
not_before
not_after
issuer
dns_names
issued_at
renewed_at
renew_after
acme_environment
acme_directory
source
certificate_group
schema_version

The HAProxy PEM bundle order is:


private key
leaf certificate
intermediate chain
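
Assembling the bundle in that order is a plain concatenation. A minimal sketch with a hypothetical function name:

```python
def build_haproxy_pem(private_key_pem: str, certificate_pem: str, chain_pem: str) -> str:
    """Concatenate key, leaf, and chain in the order listed above."""
    parts = [private_key_pem, certificate_pem, chain_pem]
    if any(not part.strip() for part in parts):
        raise ValueError("every PEM component must be non-empty")
    # Normalize each component to end with exactly one newline.
    return "".join(part.strip() + "\n" for part in parts)
```

Rejecting an empty component here supports the later rule that no field may be empty or partial.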

Before writing, validate:


The private key matches the leaf certificate.
The certificate SAN set exactly matches the approved declaration.
The certificate chain parses and verifies.
The issuer is expected for staging or production.
The not-before and not-after values are valid.
The certificate has sufficient remaining lifetime.
The HAProxy PEM bundle parses.
No field is empty or contains a partial response.
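
Two of these checks need no cryptography and can be sketched directly. The helper names are hypothetical, and the case-insensitive comparison with trailing-dot removal is an assumption about how DNS names should be normalized; key matching and chain verification need a certificate library and are not shown.

```python
def san_set_matches(declared, issued) -> bool:
    """True when the issued SAN set equals the declared domain set exactly."""
    def normalize(names):
        # DNS names are case-insensitive; ignore a trailing root dot.
        return {name.strip().lower().rstrip(".") for name in names}
    return normalize(declared) == normalize(issued)


def empty_fields(record: dict) -> list:
    """Return the sorted names of fields that are None, blank, or empty lists."""
    empty = []
    for key, value in record.items():
        if value is None:
            empty.append(key)
        elif isinstance(value, str) and not value.strip():
            empty.append(key)
        elif isinstance(value, (list, tuple)) and len(value) == 0:
            empty.append(key)
    return sorted(empty)
```

An extra or missing SAN fails the comparison, which is the intended meaning of an exact match.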

Read current KV metadata, obtain the current version, and write with options.cas so two concurrent jobs cannot silently overwrite each other. A failed validation must not create a new Vault version.
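
In the KV v2 HTTP API the check-and-set value travels in the `options` object of the write body, and a value of 0 means the write succeeds only if the key does not exist yet. The sketch below builds that body from a metadata response; the function name is hypothetical and the HTTP calls themselves are omitted.

```python
from typing import Optional


def build_cas_write(current_metadata: Optional[dict], fields: dict) -> dict:
    """Build a KV v2 write body that is rejected if the version has moved."""
    # cas=0 means "write only if the key does not exist yet".
    current_version = 0
    if current_metadata is not None:
        current_version = int(current_metadata["data"]["current_version"])
    return {"options": {"cas": current_version}, "data": dict(fields)}
```

If another job writes a new version between the metadata read and this write, Vault rejects the request instead of silently overwriting, and the reconciler can re-read and decide again.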


29. Least-Privilege Machine Policies

The certificate issuer uses policy cert-issuer. It is separate from the human platform-admin policy.

Working cert-issuer.hcl:


# Certificate issuer: write certificate versions, but never destroy history.
path "cert/config" {
  capabilities = ["read"]
}

path "cert/data/*" {
  capabilities = ["create", "read", "update", "patch"]
}

path "cert/metadata" {
  capabilities = ["list"]
}

path "cert/metadata/*" {
  capabilities = ["read", "list"]
}

path "cert/subkeys/*" {
  capabilities = ["read"]
}

# Cloudflare token is isolated in the acme/ KV v2 mount and is read-only.
path "acme/data/cloudflare/dns" {
  capabilities = ["read"]
}

path "acme/metadata/cloudflare/dns" {
  capabilities = ["read"]
}

The issuer policy intentionally excludes:


delete
destroy
sudo
sys/auth
sys/policies
sys/audit
sys/mounts
auth/*
identity/*
sys/raw

The current 403 comes from the human administrator's policy and the capabilities of the resulting token, not from the issuer policy. platform-admin.hcl must permit the setup operations used by the Vault helper and must be updated in both places:


Repository:
envs/prod/ansible/files/vault/policies/platform-admin.hcl

Vault:
sys/policies/acl/platform-admin

After reconciliation, use vault token capabilities to prove access to acme/config before rerunning the helper. A manual server-only policy edit is not sufficient because a later Ansible run could restore the repository version.


30. Prepare the Dedicated AppRole Machine Identity

The working machine identity is:


AppRole: cert-issuer
Policy: cert-issuer
Token type: batch
Token TTL: 15m
Token maximum TTL: 30m
SecretID TTL: 0
SecretID number of uses: 0
SecretID bound CIDR: 192.168.8.3/32
Token bound CIDR: 192.168.8.3/32
Response-wrap TTL: 30m

The Vault helper configures it with:


vault write auth/approle/role/cert-issuer \
  token_type=batch \
  token_policies=cert-issuer \
  token_ttl=15m \
  token_max_ttl=30m \
  secret_id_ttl=0 \
  secret_id_num_uses=0 \
  secret_id_bound_cidrs=192.168.8.3/32 \
  token_bound_cidrs=192.168.8.3/32

Verify after the helper succeeds:


sudo -H env VAULT_ADDR="https://192.168.8.2:8200" VAULT_CACERT="/opt/vault/tls/vault-cert.pem" vault read auth/approle/role/cert-issuer

sudo -H env VAULT_ADDR="https://192.168.8.2:8200" VAULT_CACERT="/opt/vault/tls/vault-cert.pem" vault read auth/approle/role/cert-issuer/role-id

secret_id_ttl=0 and secret_id_num_uses=0 provide continuity but create a reusable SecretID. Rotate it after VM recreation, credential loss, suspected exposure, or security-policy change. The response-wrapping token remains short-lived and one-use.


31. Protect ACME Account State and Recovery Data

Certbot creates ACME account and lineage state under its configuration directory. That state must not be the only recovery source.

Durable design:


Vault:
issued certificates
private keys
certificate metadata
Cloudflare API token

Encrypted off-host backup:
ACME account registration state
non-secret declaration history already retained in Git

Issuer VM local disk:
working Certbot lineages
temporary Cloudflare credentials
runtime Vault tokens
logs without secret material

The implementation must determine whether to preserve the Certbot ACME account key in a dedicated Vault path or encrypted off-host backup. Do not store it beside ordinary certificate reader paths. Recreating the VM must be able to recover existing valid certificates from Vault without requesting replacements immediately.

A fresh issuer may create a new ACME account only when approved. It must still read Vault first and avoid mass reissuance.


32. Reboot and Service-Restart Procedure

Do not run the reboot acceptance test until AppRole bootstrap, preflight, and timer activation succeed. At the current checkpoint the timer is expected to remain disabled.

After the VM is configured but before production issuance, test a reboot:


sudo systemctl reboot

After reconnecting:


hostnamectl --static
resolvectl status
systemctl is-active qemu-guest-agent ssh
systemctl status ac-cert-issuer.timer --no-pager
systemctl list-timers --all | grep -F ac-cert-issuer
sudo journalctl -u ac-cert-issuer.service -n 100 --no-pager

Expected behavior:


The timer returns automatically.
The service authenticates to Vault without operator login.
A valid existing Vault certificate is reused.
No certificate is reissued merely because the VM rebooted.
No secret material appears in the journal.
A sealed or unavailable Vault causes a safe failure and alert, not issuance with fallback secrets.

Do not make the systemd timer depend on an interactive user session.


33. Certificate Issuer Operations

The issuer has no public UI. Operate it through the installed utility, systemd, journald, and Vault metadata.

Configuration validation


sudo /usr/sbin/runuser --user ac-cert-issuer -- /usr/bin/env HOME=/var/lib/ac-cert-issuer /usr/local/sbin/ac-cert-issuer --config /etc/aspireclan/cert-issuer/config.yml validate-config

Sanitized status


sudo /usr/sbin/runuser --user ac-cert-issuer -- /usr/bin/env HOME=/var/lib/ac-cert-issuer /usr/local/sbin/ac-cert-issuer --config /etc/aspireclan/cert-issuer/config.yml status

Vault and declaration preflight


sudo /usr/sbin/runuser --user ac-cert-issuer -- /usr/bin/env HOME=/var/lib/ac-cert-issuer /usr/local/sbin/ac-cert-issuer --config /etc/aspireclan/cert-issuer/config.yml preflight

Manual reconciliation


sudo systemctl start ac-cert-issuer.service

sudo systemctl status ac-cert-issuer.service --no-pager

A successful oneshot may show inactive (dead) after completion when the exit status is 0/SUCCESS.

Timer


sudo systemctl is-enabled ac-cert-issuer.timer
sudo systemctl is-active ac-cert-issuer.timer

sudo systemctl list-timers ac-cert-issuer.timer --all --no-pager

Logs


sudo journalctl -u ac-cert-issuer.service --since today --no-pager

sudo journalctl -u ac-cert-issuer.timer --since today --no-pager

The status and logs may show:


certificate declaration name
Vault logical path
current KV version
SAN names
serial number
fingerprint
not-after timestamp
days remaining
renewal eligibility
last reconciliation result

They must never show:


private keys
PEM bodies
Cloudflare tokens
Vault SecretIDs
Vault client tokens
raw Vault responses containing secrets
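
A simple scan of exported journal lines can help detect violations of this rule. The sketch below is illustrative and not exhaustive: the patterns cover PEM armor, the `hvs.`, `hvb.`, and `hvr.` token prefixes used by recent Vault versions, and the Certbot Cloudflare credentials key; the function name is hypothetical.

```python
import re

LEAK_PATTERNS = {
    "pem_block": re.compile(r"-----BEGIN [A-Z0-9 ]+-----"),
    "vault_token": re.compile(r"\bhv[sbr]\.[A-Za-z0-9_-]{20,}"),
    "cloudflare_credentials_line": re.compile(r"dns_cloudflare_api_token\s*="),
}


def find_secret_leaks(lines):
    """Return (line_number, pattern_label) pairs for every suspicious line."""
    findings = []
    for number, line in enumerate(lines, start=1):
        for label, pattern in LEAK_PATTERNS.items():
            if pattern.search(line):
                findings.append((number, label))
    return findings
```

The function reports only line numbers and labels, so its own output does not repeat the leaked value.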

At the current checkpoint, status should report missing AppRole files and the timer should remain inactive.


34. Complete Validation Checklist

Run the checks in order. Do not skip from installation directly to production issuance.

Current completed checks


[✓] Terraform tracks prod-cert-01
[✓] Current Terraform plan reports no unintended VM change
[✓] Hostname is prod-cert-01
[✓] Reserved address is 192.168.8.3
[✓] QEMU Guest Agent and SSH are active
[✓] Certbot is installed
[✓] dns-cloudflare plugin is installed
[✓] Issuer configuration validates
[✓] Vault TLS leaf is installed
[✓] Vault TLS leaf is readable by ac-cert-issuer
[✓] UFW fail-closed baseline is active
[✓] AppRole files are absent before bootstrap
[✓] Timer is inactive before credentials
[✓] Vault preparation helper exists
[ ] platform-admin acme/config capability
[ ] acme/cloudflare/dns metadata
[ ] RoleID and SecretID files
[ ] issuer preflight
[ ] timer activation
[ ] staging issuance
[ ] second-run idempotency
[ ] renewal validation
[ ] production issuance

Host and software


hostname
hostname -I
resolvectl status
systemctl is-active ssh qemu-guest-agent

certbot --version
certbot plugins
openssl version
jq --version
curl --version

Vault TLS


sudo -u ac-cert-issuer /usr/bin/curl --silent --show-error --fail-with-body --cacert /etc/aspireclan/cert-issuer/vault-ca.pem https://192.168.8.2:8200/v1/sys/health |
jq '{initialized,sealed,standby,version}'

Required:


initialized: true
sealed: false
standby: false

Credential files


sudo stat -c '%U:%G %a %n' /etc/aspireclan/cert-issuer/approle/role_id /etc/aspireclan/cert-issuer/approle/secret_id

Issuer


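sudo /usr/sbin/runuser --user ac-cert-issuer -- /usr/bin/env HOME=/var/lib/ac-cert-issuer /usr/local/sbin/ac-cert-issuer --config /etc/aspireclan/cert-issuer/config.yml validate-config

sudo /usr/sbin/runuser --user ac-cert-issuer -- /usr/bin/env HOME=/var/lib/ac-cert-issuer /usr/local/sbin/ac-cert-issuer --config /etc/aspireclan/cert-issuer/config.yml status

sudo /usr/sbin/runuser --user ac-cert-issuer -- /usr/bin/env HOME=/var/lib/ac-cert-issuer /usr/local/sbin/ac-cert-issuer --config /etc/aspireclan/cert-issuer/config.yml preflight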

Timer and reconciliation


sudo systemctl is-enabled ac-cert-issuer.timer
sudo systemctl is-active ac-cert-issuer.timer
sudo systemctl list-timers ac-cert-issuer.timer --all --no-pager

sudo systemctl start ac-cert-issuer.service
sudo systemctl status ac-cert-issuer.service --no-pager
sudo journalctl -u ac-cert-issuer.service -n 100 --no-pager

A successful oneshot service may show inactive (dead) after completion when its exit status is 0/SUCCESS.

Staging acceptance

Before changing enabled to true, change the declaration to the Let's Encrypt staging directory. Required acceptance:


[ ] staging certificate issued
[ ] private key matches leaf
[ ] SAN set matches declaration
[ ] chain parses and verifies
[ ] Vault KV v2 version written
[ ] second reconciliation creates no new certificate
[ ] second reconciliation creates no new Vault version
[ ] renewal behavior validated
[ ] no secret values appear in journald

35. Controlled Upgrade Procedure

Treat Certbot, plugin, Python, OpenSSL, and issuer-utility changes as certificate-platform changes.

Before upgrading:


1. Confirm current certificates have safe remaining lifetime.
2. Confirm Vault and off-host snapshots are healthy.
3. Record installed Certbot and plugin versions.
4. Run the existing reconciliation in check-only mode.
5. Disable the timer during the maintenance window.
6. Upgrade only from the approved installation source.
7. Run plugin discovery and a staging renewal dry-run.
8. Run a normal reconciliation and verify no unwanted issuance.
9. Re-enable the timer.

Commands will depend on the approved installation method. Do not combine package ecosystems or perform an unattended major upgrade without staging validation.


36. Disaster-Recovery Design

Recovery requires:


Terraform and Ansible source
The approved router reservation
A free or restored Proxmox VM ID
Vault availability and audit health
The cert-issuer AppRole delivery procedure
The Cloudflare token stored in Vault
Existing certificate records in Vault
ACME account recovery data or approval to create a new account
Certificate declarations in Git

Recovery flow:


1. Terraform recreates prod-cert-01 with the same MAC and reserved IP.
2. Ansible restores the issuer software and protected directories.
3. Deliver a new wrapped AppRole SecretID; do not reuse a possibly exposed one.
4. Authenticate to Vault.
5. Read every declared certificate path before contacting Let's Encrypt.
6. Reuse valid certificates whose SANs match and which are outside the renewal window.
7. Restore approved ACME account state when available.
8. Issue only missing, invalid, changed, or renewal-eligible certificates.
9. Prove that a second reconciliation is idempotent.
10. Re-enable the timer after validation.

Recreating the issuer must not cause every certificate to be reissued. This is a mandatory acceptance test.
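
The reuse-before-issue decision in steps 5 through 8 can be sketched as a pure function. The names below are hypothetical, and the sketch assumes the stored record exposes `dns_names` as a list and `not_after` as a timezone-aware datetime; the real record format may differ.

```python
from datetime import datetime, timedelta
from typing import Optional


def decide_action(declared_domains, stored: Optional[dict],
                  now: datetime, renew_before_days: int) -> str:
    """Return 'issue', 'renew', or 'reuse' for one declared certificate group."""
    if stored is None:
        return "issue"  # nothing in Vault yet
    if set(stored["dns_names"]) != set(declared_domains):
        return "issue"  # the declaration changed
    not_after = stored["not_after"]
    if not_after <= now:
        return "issue"  # the stored certificate has expired
    if not_after - now <= timedelta(days=renew_before_days):
        return "renew"  # inside the renewal window
    return "reuse"      # valid, matching, and outside the renewal window
```

After a VM rebuild every healthy group should evaluate to `reuse`, which is what the mandatory acceptance test verifies.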


37. Handoff from prod-vault-01

The Vault-to-issuer handoff is not complete until every unchecked item passes:


[✓] Vault initialized
[✓] Vault normally unsealed
[✓] file audit device enabled
[✓] named administrator validated
[✓] initial root token revoked
[✓] cert/ KV v2 enabled
[✓] off-host Raft backup path implemented
[✓] exact Vault TLS leaf installed on prod-cert-01
[✓] ac-vault-prepare-cert-issuer installed on prod-vault-01
[✓] cert-issuer policy includes read-only acme/cloudflare/dns access
[ ] platform-admin token has create/update on acme/config
[ ] acme/ configuration write succeeds
[ ] Cloudflare token metadata exists at acme/cloudflare/dns
[ ] cert-issuer AppRole configuration is verified
[ ] RoleID and one-use wrapping token are returned
[ ] prod-cert-01 credential files are installed
[ ] prod-cert-01 preflight succeeds
[ ] ac-cert-issuer.timer is enabled and active

The current blocker is the runtime 403 on acme/config.

Consistency rule:


Do not fix only the live Vault policy.
Update the version-controlled platform-admin.hcl.
Apply it to Vault.
Verify token capabilities.
Then rerun the helper.

The issuer phase remains blocked from staging issuance until this handoff completes.


38. Handoff to prod-int-proxy-01

The proxy phase begins only after:


A staging certificate has been issued and stored in Vault.
A second reconciliation proves no duplicate issuance.
A staging renewal dry-run succeeds.
At least one approved production certificate has been issued.
Certificate rollback to an earlier Vault KV version is understood.
Issuer VM recreation reuses valid Vault certificates.
Expiration monitoring and failure alerts are active.

The proxy will:


Use its own read-only Vault AppRole.
Read only explicitly approved certificate paths.
Never read acme/ credentials.
Write HAProxy PEM files atomically to root-owned paths.
Run haproxy -c before every reload.
Keep the old certificate and configuration when validation fails.
Reload gracefully after successful validation.
Never request a certificate directly from Let's Encrypt or prod-cert-01.

Adding a proxy route creates a desired-state reference to a certificate group. It does not create a direct HAProxy-to-issuer API dependency.


39. Mandatory Security Rules


No Cloudflare token in Terraform, tfvars, state, outputs, Git, GitHub Actions, Ansible vars, logs, or documentation.
No Vault SecretID in Terraform, state, GitHub secrets, inventory, logs, or documentation.
No certificate private key in Terraform state or workflow artifacts.
No ACME account key in Git or ordinary backups.
No permanent VAULT_SKIP_VERIFY=true.
No Global Cloudflare API Key.
No public inbound access to prod-cert-01.
No inbound TCP 80 or 443 for DNS-01.
No direct HAProxy request to the issuer.
No certificate issuance before checking Vault.
No Vault write before complete certificate validation.
No write without check-and-set when a current version exists.
No delete or destroy capability in the issuer policy.
No timer activation before staging and idempotency tests.
No production issuance before staging renewal succeeds.
No mass reissuance after VM recreation.
No secret values in journald, stdout, stderr, process arguments, or shell history.
No unapproved Terraform state restructuring.

40. Complete Implementation Order

The full implementation history and remaining order are:


1.  [✓] Confirm prod-dns-01 at 192.168.8.4.
2.  [✓] Confirm prod-vault-01 at 192.168.8.2.
3.  [✓] Initialize and unseal Vault.
4.  [✓] Enable the Vault audit device.
5.  [✓] Validate the named administrator.
6.  [✓] Revoke the initial root token.
7.  [✓] Enable cert/ KV v2.
8.  [✓] Configure off-host Raft backups.
9.  [✓] Confirm router reservation for prod-cert-01.
10. [✓] Confirm VM ID 3156003.
11. [✓] Add cert_vms to variables.tf.
12. [✓] Add module cert_vms to main.tf.
13. [✓] Add output cert_vms to outputs.tf.
14. [✓] Add envs/prod/cert.tfvars.
15. [✓] Add the [cert] inventory group.
16. [✓] Load cert.tfvars in the production workflow.
17. [✓] Add cert-specific Ansible targeting.
18. [✓] Apply Terraform.
19. [✓] Create prod-cert-01.
20. [✓] Verify DHCP, DNS, SSH, QEMU Agent, and UFW.
21. [✓] Deploy configure-cert.yml.
22. [✓] Install Certbot and dns-cloudflare.
23. [✓] Deploy the issuer utility and systemd units.
24. [✓] Correct AppRole path variable names.
25. [✓] Replace become_user command execution with runuser.
26. [✓] Install the Vault TLS leaf on prod-cert-01.
27. [✓] Install ac-vault-prepare-cert-issuer on prod-vault-01.
28. [ ] Reconcile platform-admin.hcl in Git and on Vault.
29. [ ] Verify acme/config capabilities.
30. [ ] Rerun ac-vault-prepare-cert-issuer.
31. [ ] Verify acme/cloudflare/dns metadata.
32. [ ] Deliver RoleID and wrapped SecretID.
33. [ ] Bootstrap AppRole on prod-cert-01.
34. [ ] Run issuer status and preflight.
35. [ ] Enable the twice-daily timer.
36. [ ] Run one immediate no-op reconciliation.
37. [ ] Switch the first declaration to Let's Encrypt staging.
38. [ ] Enable and issue the staging certificate.
39. [ ] Prove second-run idempotency and renewal behavior.
40. [ ] Approve production issuance and begin prod-int-proxy-01.

Immediate next action:


prod-vault-01:
update platform-admin.hcl
apply the policy
verify vault token capabilities acme/config
rerun ac-vault-prepare-cert-issuer

prod-cert-01:
bootstrap the wrapped AppRole credential
run preflight
enable ac-cert-issuer.timer

41. Source Consistency Status

This file is a modified full replacement of the attached prod-cert-01-setup MDX page. The attached page, not the earlier condensed artifact, was used as the baseline.

Exact baseline identity:


Previous line count: 1933
Previous SHA-256:
9c4229157c9888b4045967ca4b8adcda6a76ba56affc6d453446aa291b12e053

Structure preserved:


Frontmatter field order
CustomCodeBlock import
Page-local K8S-overview-style CSS
Full-width desktop content container
260px desktop table-of-contents/anchor panel
Responsive allocation table
Single H1
Forty-three numbered H2 sections
Existing section order
Horizontal separators
Official references section
Continuation prompt section

Material updates compared with the previous file:


Previous:
planning-state document
VM not yet built
proposed 2 vCPU / 4096 MiB
cert_issuer_vms
cert-issuer.tfvars
[cert_issuer]
configure-cert-issuer.yml
group_vars/cert_issuer.yml
/etc/ac-cert-issuer
acme/cloudflare/aspireclan
AppRole prod-cert-01
PowerShell-oriented delivery references

Updated:
deployed VM ID 3156003
final 4 vCPU / 8192 MiB / 40G
cert_vms
cert.tfvars
[cert]
configure-cert.yml
group_vars/cert.yml
/etc/aspireclan/cert-issuer
acme/cloudflare/dns
AppRole cert-issuer
direct Ubuntu/PuTTY procedures only
variable-name correction
runuser/acl correction
Vault TLS permission behavior
missing Vault helper installation
observed acme/config HTTP 403
honest pending-state checklist

Validation performed for this replacement:


PASS: Updated file has more lines than the 1933-line baseline.
PASS: Exact unified diff created against the attached baseline.
PASS: Previous and updated content are not byte-identical.
PASS: YAML frontmatter parsed.
PASS: Required frontmatter keys retained.
PASS: Exactly one H1 retained.
PASS: H2 sections remain sequential from 1 through 43.
PASS: Existing H2 titles and order retained.
PASS: K8S-style desktop layout CSS retained.
PASS: 260px anchor/table-of-contents panel retained.
PASS: CustomCodeBlock opening and closing tags balanced.
PASS: No Cloudflare token value included.
PASS: No Vault token, SecretID, private key, or PEM payload included.
LIMITATION: A full Docusaurus build was not run because the complete documentation repository and package manifest were not supplied.

The accompanying comparison report contains the final updated line count, SHA-256 hash, and exact added/removed-line totals.


42. Official References


43. Continuation Prompt

Continue the Aspireclan production certificate issuer from the current verified checkpoint.

prod-cert-01 is already deployed with hostname prod-cert-01, IP 192.168.8.3, MAC aa:bb:cc:04:05:02, Proxmox VM ID 3156003, 4 vCPU, 8192 MiB RAM, and 40G disk. Terraform uses cert_vms and envs/prod/cert.tfvars. The production inventory uses [cert]. Ansible uses configure-cert.yml and group_vars/cert.yml.

The issuer foundation is installed. The AppRole variable-name error was corrected, and the Ansible unprivileged-command error was corrected with acl plus explicit runuser. The working issuer paths are /etc/aspireclan/cert-issuer, /usr/local/sbin/ac-cert-issuer, /var/lib/ac-cert-issuer, and /run/ac-cert-issuer. The Vault TLS leaf is installed at /etc/aspireclan/cert-issuer/vault-ca.pem.

The issuer RoleID and SecretID files are absent, all certificate declarations remain disabled, and ac-cert-issuer.timer is intentionally inactive.

On prod-vault-01, /usr/local/sbin/ac-vault-prepare-cert-issuer is installed. Its latest run authenticated successfully but failed with HTTP 403 permission denied at PUT /v1/acme/config. The failure occurred before the Cloudflare token write and before wrapped SecretID generation.

Next: reconcile platform-admin.hcl in Git and Vault, verify acme/config capabilities, rerun the helper, verify metadata at acme/cloudflare/dns, deliver the RoleID and one-use wrapped SecretID directly to prod-cert-01, run issuer preflight, enable the twice-daily timer, and complete a no-op reconciliation. Use direct Ubuntu commands only; do not use PowerShell.

Do not enable production issuance or begin prod-int-proxy-01 until staging issuance, second-run idempotency, renewal behavior, Vault versioning, logs, and recovery are validated.