Production Vault Setup

1. Purpose

This page is the complete, current implementation and operating reference for the Aspireclan production Vault server:


Hostname: prod-vault-01
IP address: 192.168.8.2
MAC address: aa:bb:cc:04:05:01
Proxmox VM ID: 3156002

It records the exact Terraform, GitHub Actions, Ansible, TLS, initialization, unseal, logical-bootstrap, root-token-retirement, UI, dedicated SSE-S3 Raft backup, reboot-validation, recovery, and certificate-issuer handoff approach that reached the current tested checkpoint on 2026-06-14.

The server is the trust foundation for the next infrastructure phases:

  • prod-cert-01, the authoritative certificate-issuer VM at 192.168.8.3, which is now Terraform-provisioned and Ansible-configured and will obtain and renew certificates after its Vault AppRole activation is completed.
  • Internal and external HAProxy servers, which will read only the certificate paths permitted for their environment.
  • Infrastructure administrators, who will operate policies, authentication, audit, snapshots, upgrades, and disaster recovery.

Terraform creates and reconciles the Proxmox VM. Ansible installs and hardens Vault. Vault initialization is deliberately performed once, directly on prod-vault-01, so the five Shamir unseal-key shares and initial root token never enter Terraform state, Ansible variables, GitHub Actions logs, or committed files.
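
A minimal sketch of the run-once rule follows. It assumes the vault CLI on prod-vault-01, run from a root shell with VAULT_ADDR and VAULT_CACERT already exported. The function names and the VAULT_BIN override, which exists only so the CLI can be stubbed, are hypothetical. vault operator init -status exits 0 when Vault is already initialized and 2 when it is not, so the guard refuses to continue on any exit code other than 2.

```shell
# Hypothetical helpers; VAULT_BIN lets tests substitute a stub for the vault CLI.
init_guard() {
  "${VAULT_BIN:-vault}" operator init -status >/dev/null 2>&1
  case $? in
    0) echo "refusing: Vault is already initialized; never re-run init" >&2; return 1 ;;
    2) return 0 ;;  # not initialized yet
    *) echo "refusing: could not read initialization status" >&2; return 1 ;;
  esac
}

init_once() {
  init_guard || return 1
  # Output goes straight to the operator's terminal: do not redirect it to a
  # file, a pipe, or a log collector.
  "${VAULT_BIN:-vault}" operator init -key-shares=5 -key-threshold=3
}
```

The shares and the root token reach only the operator's terminal, consistent with the custody rule above.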


2. Implementation Status and Approved Decisions

Current implementation status — validated on 2026-06-14

The following foundation and direct-server bootstrap activities are working:

  • prod-dns-01 is deployed separately with state at prod/dns/terraform.tfstate and DNS service at 192.168.8.4.
  • prod-vault-01 is tracked in the production state at prod/terraform.tfstate.
  • Terraform refresh/plan runs on every dev, qa, and prod workflow execution, even when Git file detection finds no Terraform source change.
  • The production DNS job runs and verifies DNS first.
  • Ansible targets only the affected component and uses syntax-check, check mode, and apply-on-change behavior.
  • HashiCorp Vault 2.0.2 is installed from the official HashiCorp APT repository.
  • Vault uses Integrated Storage (Raft), TLS, the built-in UI, fail-closed UFW, swap disabling, and a hardened systemd override.
  • The current bootstrap certificate is generated idempotently on the Vault VM and stored under /opt/vault/tls.
  • Vault was initialized exactly once directly on prod-vault-01 with five Shamir shares and a threshold of three.
  • The five unseal shares were obtained directly from the Vault server. Their custody remains operator-controlled and must stay outside Terraform, Ansible, GitHub Actions, and the Vault VM.
  • The direct-server sudo procedure was tested successfully for TLS/status validation, interactive unseal, logical bootstrap, named-administrator validation, initial-root-token revocation, Raft snapshot creation/checksum verification, and reboot/manual-unseal validation.
  • The file audit device, cert/ KV v2 mount, policies, userpass, AppRole roles, and certificate schema records are configured.
  • prod-cert-01 is now deployed at 192.168.8.3, VM ID 3156003, MAC aa:bb:cc:04:05:02, with the certificate-issuer Ansible foundation installed under /etc/aspireclan/cert-issuer and /usr/local/sbin/ac-cert-issuer.
  • The Vault listener certificate was copied to prod-cert-01 as /etc/aspireclan/cert-issuer/vault-ca.pem. The issuer service account can read it; the ordinary acllc shell cannot traverse the restricted issuer directory, so direct TLS tests must run with sudo -u ac-cert-issuer or as root.
  • The Vault-side helper /usr/local/sbin/ac-vault-prepare-cert-issuer was initially missing because the earlier certificate-only Ansible run did not execute the Vault play. It has now been installed directly on prod-vault-01 and reached the Vault API successfully.
  • The first issuer-preparation attempt stopped at PUT /v1/acme/config with HTTP 403 permission denied. This proves authentication and connectivity are working, but the current platform-admin policy still lacks the required ACME KV v2 data/config capabilities.
  • Because the helper stops at acme/config before writing the Cloudflare token or generating the wrapped SecretID, the Cloudflare token, final AppRole credential pair, issuer preflight, systemd timer activation, staging issuance, and production issuance remain pending.
  • The initial root token was retired after the named platform-admin user was validated.
  • The dedicated SSE-S3 Vault backup bucket and first hourly upload are implemented and validated. Downloaded checksum verification, scheduled timer observation, CloudWatch stale-backup alarm testing, and the isolated snapshot-restore exercise remain the outstanding resilience gates.
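
The manual snapshot-and-checksum step, and the still-outstanding verification of a downloaded copy, can be sketched as follows. This assumes a Vault token that is allowed to take Raft snapshots and GNU coreutils sha256sum. The function names, the example path, and the VAULT_BIN override are hypothetical; they are not the installed backup scripts.

```shell
# Hypothetical helpers; VAULT_BIN lets tests substitute a stub for the vault CLI.
snapshot_with_checksum() {
  local snap=$1
  "${VAULT_BIN:-vault}" operator raft snapshot save "$snap" || return 1
  # Record only the base name so the checksum file travels with the snapshot.
  (cd "$(dirname "$snap")" && sha256sum "$(basename "$snap")" >"$(basename "$snap").sha256")
}

# Verify a snapshot against the <snapshot>.sha256 file stored beside it,
# for example after downloading both objects from the backup bucket.
verify_snapshot_checksum() {
  (cd "$(dirname "$1")" && sha256sum -c --quiet "$(basename "$1").sha256")
}

# Example on prod-vault-01 (hypothetical path):
#   snapshot_with_checksum /var/tmp/vault-manual.snap
#   verify_snapshot_checksum /var/tmp/vault-manual.snap
```
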

Current certificate-issuer integration checkpoint

Do not mark the certificate issuer as activated yet. The confirmed stopping point is:


prod-vault-01 helper: installed and executable
Vault administrator login: successful
AppRole auth endpoint: reachable
acme/ mount creation: may have completed; verify before changing it
PUT /v1/acme/config: failed with HTTP 403 permission denied
Cloudflare token stored: not confirmed; helper did not reach that step
Wrapped SecretID generated: no
prod-cert-01 AppRole files installed: no
Issuer timer active: no
Let's Encrypt staging issuance: no

The next action is to update the version-controlled platform-admin.hcl and cert-issuer.hcl, apply the policy to Vault, rerun the helper, bootstrap the wrapped credential on prod-cert-01, run preflight, and only then enable the timer.
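
Before the helper is rerun, the corrected live policy can be checked from the same named-administrator session. The sketch below assumes that the helper writes two paths. The first is acme/config, the KV v2 configuration endpoint behind the observed PUT /v1/acme/config, which needs update. The second is acme/data/cloudflare/dns, the KV v2 data path for acme/cloudflare/dns, which needs create for the first write and update afterwards. vault token capabilities prints the current token's capabilities for a path as a comma-separated list, or root for a root token. The function names are hypothetical.

```shell
# Succeeds when a comma-separated capability list (as printed by
# "vault token capabilities") contains the wanted capability or "root".
has_capability() {
  local cap
  for cap in $(printf '%s\n' "$1" | tr ',' ' '); do
    case $cap in
      "$2"|root) return 0 ;;
    esac
  done
  return 1
}

# Hypothetical preflight: report every path/capability pair the current token lacks.
preflight_acme_policy() {
  local check kv_path want caps missing=0
  for check in acme/config:update acme/data/cloudflare/dns:create acme/data/cloudflare/dns:update; do
    kv_path=${check%:*}
    want=${check##*:}
    caps=$("${VAULT_BIN:-vault}" token capabilities "$kv_path") || return 1
    if has_capability "$caps" "$want"; then
      echo "ok       $want on $kv_path"
    else
      echo "MISSING  $want on $kv_path (have: $caps)" >&2
      missing=1
    fi
  done
  return "$missing"
}
```
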

The approved phase-one design is:


Edition: Vault Community Edition
Installed/tested version: Vault 2.0.2
Operating system: Ubuntu Server 26.04 LTS
Storage: Integrated Storage (Raft)
Vault nodes: 1 initially
Seal type: Shamir manual unseal
Unseal shares: 5
Unseal threshold: 3
API and UI: HTTPS on TCP 8200, internal network only
Cluster port: TCP 8201, not opened to clients in single-node phase
Bootstrap TLS: self-signed certificate generated by Ansible on prod-vault-01
Authentication after bootstrap: userpass for a named administrator; AppRole for machines
Audit baseline: file audit device
Secrets engines: KV v2 mounted at cert/; acme/ is the dedicated issuer-credential mount and is at the policy-repair checkpoint
Certificate hierarchy: cert/<environment>/<workload-type>/<workload-name>
Cloudflare credential path: acme/cloudflare/dns
Certificate versions retained: 20
Initial root token: revoked after named-administrator validation
Auto-unseal: deferred
Snapshots: dedicated SSE-S3 off-host bucket implemented; first hourly snapshot uploaded, versioned, Object-Locked, metadata-verified, and downloaded; final checksum/timer/alarm/restore validation remains

Vault initialization created five unseal-key shares and one initial root token. The underlying encryption root key was not printed. Any three shares are sufficient to unseal this single node. Never run vault operator init again against the current Raft data.
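
The three-share unseal can be sketched as a loop that never places a share on the command line. vault operator unseal with no key argument prompts for the share with hidden input. vault status exits 0 when Vault is unsealed, 1 on error, and 2 when it is sealed. The function name and the VAULT_BIN override are hypothetical.

```shell
# Hypothetical helper: prompt for up to THRESHOLD shares, stopping early once
# Vault reports unsealed.
unseal_interactively() {
  local vault_cmd=${VAULT_BIN:-vault} threshold=${1:-3} i=0 rc
  while [ "$i" -lt "$threshold" ]; do
    "$vault_cmd" status >/dev/null 2>&1
    rc=$?
    [ "$rc" -eq 0 ] && return 0
    if [ "$rc" -ne 2 ]; then
      echo "vault status failed (exit $rc); check VAULT_ADDR and VAULT_CACERT" >&2
      return 1
    fi
    # No key argument: the CLI prompts with hidden input, so the share never
    # reaches shell history or the process list.
    "$vault_cmd" operator unseal || return 1
    i=$((i + 1))
  done
  "$vault_cmd" status >/dev/null 2>&1
}
```
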


3. Scope and Non-Goals

This page covers the current Vault foundation, the tested direct-server manual procedure, and the remaining resilience work.

It includes:

  • Proxmox provisioning with S3 remote state and locking.
  • DNS-first workflow sequencing.
  • Component-aware Terraform and Ansible execution.
  • Vault package installation from the official repository.
  • TLS private key, CSR, and self-signed bootstrap certificate generation.
  • Raft storage, service hardening, UFW, audit-log preparation, and policy deployment.
  • Direct initialization on prod-vault-01 with five shares and threshold three.
  • Direct, interactive unseal without putting shares in shell history.
  • Logical baseline configuration through the installed root-only bootstrap utility.
  • UI access and named administrator login.
  • Initial root-token retirement.
  • Manual Raft snapshot creation and checksum verification.
  • Dedicated SSE-S3 off-host bucket creation, Object Lock, lifecycle retention, IAM uploader, Vault AppRole, backup scripts, systemd units, and first hourly upload verification.
  • Reboot and three-share manual-unseal validation.
  • Vault-to-prod-cert-01 trust transfer using the current self-signed listener certificate.
  • Direct Ubuntu installation and operation of the Vault certificate-issuer preparation helper.
  • The exact ACME KV v2 policy correction required by the observed 403 response.
  • Response-wrapped AppRole delivery to the deployed issuer VM without PowerShell.

It does not yet claim that the certificate issuer is fully activated, that the Cloudflare token has been stored successfully, that a wrapped SecretID has been consumed, that the issuer timer is active, that a Let's Encrypt staging or production certificate has been issued, that HAProxy is deployed, that all scheduled backup timer/alarm validation is complete, or that an isolated snapshot restore has already been completed.


4. Final Architecture

The implemented phase-one flow is:


Git push to prod
→ dedicated production DNS Terraform/Ansible job
→ mandatory DNS health gate
→ production Terraform refresh/plan/apply
→ S3 state: prod/terraform.tfstate
→ component-aware Ansible syntax-check
→ Ansible --check on existing prod-vault-01
→ real Ansible apply only when required

prod-vault-01
├── Vault API and UI: HTTPS 8200
├── Raft data: /opt/vault/data
├── TLS: /opt/vault/tls
├── Configuration: /etc/vault.d/vault.hcl
├── Audit log: /var/log/vault/audit.log
├── Version-controlled policies: /usr/local/share/ac-vault/policies
└── Automated Raft backup uploader
      → aspireclan-prod-vault-raft-backups-425389089086-us-east-1
      → SSE-S3 / AES256
      → Versioning + Governance Object Lock + lifecycle retention

The current certificate-issuer handoff flow is:


prod-vault-01
├── /opt/vault/tls/vault-cert.pem
├── /usr/local/share/ac-vault/policies/platform-admin.hcl
├── /usr/local/share/ac-vault/policies/cert-issuer.hcl
└── /usr/local/sbin/ac-vault-prepare-cert-issuer
      → authenticate named administrator through userpass
      → verify/enable acme/ as KV v2
      → configure acme/config
      → store Cloudflare token at acme/cloudflare/dns
      → configure CIDR-bound cert-issuer AppRole for 192.168.8.3/32
      → return non-secret RoleID plus one-use wrapped SecretID

prod-cert-01
├── /etc/aspireclan/cert-issuer/vault-ca.pem
├── /etc/aspireclan/cert-issuer/approle/role_id
├── /etc/aspireclan/cert-issuer/approle/secret_id
├── /usr/local/sbin/ac-cert-issuer
└── ac-cert-issuer.timer

Current stop:
platform-admin lacks required acme/config capability
→ helper returned HTTP 403
→ policy source and live policy must be corrected before rerun
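
The response-wrapped handoff can be sketched in two halves. This is not the helper's actual code. It assumes an AppRole named cert-issuer on the default approle/ auth mount, a 15-minute wrap TTL, and a vault CLI on both hosts. Every function name is hypothetical. The SecretID is read from standard input so it never appears in a process argument list.

```shell
# On prod-vault-01: print only a single-use wrapping token for a new SecretID.
issue_wrapped_secret_id() {
  "${VAULT_BIN:-vault}" write -wrap-ttl="${1:-15m}" -field=wrapping_token -f \
    auth/approle/role/cert-issuer/secret-id
}

# On prod-cert-01: write role_id (argument) and secret_id (stdin) with mode 0600,
# replacing any previous files only after both new files are written.
write_approle_files() {
  local dir=$1 role_id=$2 secret_id
  IFS= read -r secret_id || [ -n "$secret_id" ] || return 1
  [ -n "$role_id" ] && [ -n "$secret_id" ] || return 1
  (
    umask 077
    mkdir -p "$dir" || exit 1
    printf '%s\n' "$role_id" > "$dir/role_id.tmp" &&
      printf '%s\n' "$secret_id" > "$dir/secret_id.tmp" &&
      mv "$dir/role_id.tmp" "$dir/role_id" &&
      mv "$dir/secret_id.tmp" "$dir/secret_id"
  )
}

# Example on prod-cert-01 as root; adjust ownership to the issuer service
# account afterwards:
#   export VAULT_ADDR=https://vault.aspireclan.com:8200
#   export VAULT_CACERT=/etc/aspireclan/cert-issuer/vault-ca.pem
#   VAULT_TOKEN="<wrapping token>" vault unwrap -field=secret_id \
#     | write_approle_files /etc/aspireclan/cert-issuer/approle "<role id>"
```
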

The future certificate hierarchy is:


cert/
├── local/
│   ├── web/<workload-name>
│   ├── srvc/<workload-name>
│   ├── job/<workload-name>
│   └── infra/<workload-name>
├── dev/
├── qa/
└── prod/

Examples:
cert/local/web/fp          → local.fp.aspireclan.com
cert/local/srvc/api-fp     → local.api.fp.aspireclan.com
cert/dev/web/fp            → dev.fp.aspireclan.com
cert/dev/srvc/api-fp       → dev.api.fp.aspireclan.com
cert/qa/web/fp             → qa.fp.aspireclan.com
cert/qa/srvc/api-fp        → qa.api.fp.aspireclan.com
cert/prod/web/fp           → fp.aspireclan.com
cert/prod/srvc/api-fp      → api.fp.aspireclan.com
cert/prod/infra/vault      → vault.aspireclan.com
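
The examples follow a regular pattern, so the mapping can be sketched as a function. The rule is inferred from the examples above rather than taken from a written specification. Hyphens in the workload name become dots, non-prod environments add an environment prefix, and the workload type does not change the hostname. The function name is hypothetical.

```shell
# Hypothetical mapping from a cert/<environment>/<workload-type>/<workload-name>
# path to the hostname implied by the examples.
cert_path_to_fqdn() {
  local path=$1 domain=${2:-aspireclan.com} rest env type name host
  case $path in
    cert/*/*/*) rest=${path#cert/} ;;
    *) return 1 ;;
  esac
  env=${rest%%/*}
  rest=${rest#*/}
  type=${rest%%/*}
  name=${rest#*/}
  case $env in local|dev|qa|prod) ;; *) return 1 ;; esac
  case $type in web|srvc|job|infra) ;; *) return 1 ;; esac
  case $name in ''|*/*) return 1 ;; esac
  host=$(printf '%s' "$name" | sed 's/-/./g')
  if [ "$env" = prod ]; then
    printf '%s.%s\n' "$host" "$domain"
  else
    printf '%s.%s.%s\n' "$env" "$host" "$domain"
  fi
}
```
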

Vault remains internal. TCP 8200 is allowed only from the approved LAN CIDR during bootstrap; TCP 8201 has no client-facing allow rule in the one-node design.


5. Approved Build Order

Use this dependency order:


1. prod-dns-01                         complete
2. prod-vault-01                       VM and server configuration complete
3. Initialize/unseal Vault             complete; five shares, threshold three
4. Configure logical baseline          complete; audit, cert/ KV v2, policies, userpass, AppRole
5. Revoke initial root token            complete after named-admin validation
6. Configure/test snapshots             SSE-S3 bucket and first hourly upload complete; final checksum/timer/alarm/restore validation pending
7. prod-cert-01 VM and Ansible base     complete; issuer staged with credentials absent and timer disabled
8. Install Vault issuer helper          complete directly on prod-vault-01
9. Repair ACME policy and rerun helper  current next step after observed HTTP 403 at acme/config
10. Bootstrap issuer AppRole            pending; direct Ubuntu response-wrapped delivery
11. Let's Encrypt staging lifecycle     pending
12. HAProxy servers                     only after a valid approved certificate exists

Terraform and Ansible may be coordinated in one workflow, but the dependency order remains DNS first, then Vault, then certificate issuer, then proxy.


6. VM Profile and Required Identity Inputs

The final approved VM identity is:


VM name: prod-vault-01
VM ID: 3156002
MAC: aa:bb:cc:04:05:01
Reserved IP: 192.168.8.2
vCPU: 4
RAM: 8192 MiB
Disk: 40G
Template: tmplt-ub-26-min-base

Network and service identity:


Proxmox node: pve
Bridge: vmbr0
Storage: local-lvm
DHCP: enabled inside Ubuntu
Router reservation: aa:bb:cc:04:05:01 → 192.168.8.2
Internal DNS server: 192.168.8.4
Vault service name: vault.aspireclan.com
Direct API/UI endpoint: https://192.168.8.2:8200
Named API/UI endpoint: https://vault.aspireclan.com:8200

These values are not secrets. Never place Vault tokens, unseal shares, private keys, or AppRole SecretIDs beside them in Terraform variables.


7. DNS and Endpoint Design

The working Vault TLS identity includes:


Common name: vault.aspireclan.com
DNS SANs:
- vault.aspireclan.com
- prod-vault-01
IP SAN:
- 192.168.8.2

The internal DNS record must resolve:


vault.aspireclan.com  → 192.168.8.2

Validate from an internal client:


dig @192.168.8.4 vault.aspireclan.com
getent ahostsv4 vault.aspireclan.com
nc -vz 192.168.8.2 8200
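
To turn the getent check into a pass/fail gate, a small awk filter can confirm that every returned address matches the reservation. This is a sketch, and the function name is hypothetical.

```shell
# Succeeds only when stdin (getent ahostsv4 output) is non-empty and every
# returned address equals the expected one.
resolves_only_to() {
  awk -v want="$1" '
    NF { seen = 1; if ($1 != want) { print "unexpected address: " $1; bad = 1 } }
    END { exit !(seen && !bad) }'
}

# Example from an internal client:
#   getent ahostsv4 vault.aspireclan.com | resolves_only_to 192.168.8.2
```
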

The current bootstrap certificate is self-signed. Browsers and CLI clients must explicitly trust the certificate or use it as VAULT_CACERT. Do not use VAULT_SKIP_VERIFY=true as a permanent workaround.


8. Vault Listener TLS and the Bootstrap Dependency Rule

Vault cannot fetch the certificate required for its own listener from a sealed or unavailable Vault. The working implementation avoids that dependency loop by generating local bootstrap TLS material through Ansible:


# Vault server identity and network addresses.
vault_service_name: vault
vault_user: vault
vault_group: vault
vault_node_id: prod-vault-01
vault_ip_address: "192.168.8.2"
vault_api_addr: "https://192.168.8.2:8200"
vault_cluster_addr: "https://192.168.8.2:8201"

# Integrated Storage (Raft) and TLS filesystem paths.
vault_data_dir: /opt/vault/data
vault_tls_dir: /opt/vault/tls
vault_tls_private_key_path: /opt/vault/tls/vault-key.pem
vault_tls_csr_path: /opt/vault/tls/vault.csr
vault_tls_certificate_path: /opt/vault/tls/vault-cert.pem
vault_config_path: /etc/vault.d/vault.hcl
vault_audit_log_dir: /var/log/vault
vault_audit_log_path: /var/log/vault/audit.log

# This bootstrap certificate is created only when no certificate already exists.
# The certificate issuer phase will later replace the files at the same paths and
# signal Vault with SIGHUP, avoiding a Vault restart and reseal.
vault_bootstrap_tls_common_name: vault.aspireclan.com
vault_bootstrap_tls_dns_sans:
- vault.aspireclan.com
- prod-vault-01
vault_bootstrap_tls_ip_sans:
- "192.168.8.2"
vault_bootstrap_tls_validity: "+365d"

# Vault API access. Keep this narrow and replace the LAN CIDR with internal
# proxy/automation CIDRs after network segmentation is introduced.
vault_api_allowed_cidrs:
- "192.168.8.0/24"

# Port 8201 is not opened for a single-node deployment. Add only future Vault
# peer CIDRs here when a multi-node Raft cluster is approved.
vault_cluster_allowed_cidrs: []

vault_obsolete_ufw_rules: []

# Logical configuration defaults applied only after manual initialization.
vault_cert_mount: cert
vault_cert_environments:
- local
- dev
- qa
- prod
vault_cert_workload_types:
- web
- srvc
- job
- infra

The private key is generated only when absent with regenerate: never; an existing key is preserved. The CSR and certificate are created only during the real Ansible run, not during --check. The playbook verifies that the certificate and private key have the same public key and that the certificate is not immediately expired.

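Important operational rule: until SIGHUP-based certificate reload has been tested on this server, treat listener TLS changes as requiring a controlled Vault restart, and do not assume that sending SIGHUP reloads the TCP listener certificate. The SIGHUP reload described in the variable comments is the intended future certificate-issuer path, not a validated procedure. A restart reseals this Shamir-sealed node, so after initialization, schedule the restart and have three unseal shares available.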


9. Security Model

Mandatory rules:


Vault runs as the dedicated vault operating-system account.
Vault is the only application service on prod-vault-01.
Raft data: /opt/vault/data, owner vault:vault, mode 0700.
TLS directory: /opt/vault/tls, owner vault:vault, mode 0700.
TLS private key: mode 0600.
Vault configuration: /etc/vault.d/vault.hcl, mode 0600.
Swap is disabled and persistent swap entries are commented.
Core dumps are disabled through systemd LimitCORE=0.
UFW defaults: deny incoming, deny routed, allow outgoing.
TCP 8200 is internal only.
TCP 8201 is reserved for future Vault peers.
Unseal shares never enter Git, Terraform, Ansible, GitHub Actions, or Vault.
The initial root token is temporary and revoked after the named administrator works.
AppRole SecretIDs are generated only when the consuming VM exists and is ready.
A single-node Vault is a single point of failure; the dedicated Object-Locked SSE-S3 backup path must remain healthy before production certificate keys are stored.

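The current configuration sets disable_mlock = true and compensates by disabling swap. HashiCorp recommends disabling mlock when using Integrated Storage, because mlock does not interact well with the memory-mapped files that Raft uses. This is the tested phase-one baseline; revisit mlock capabilities during the next production-hardening review.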
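
The ownership and mode rules above, together with the helper installation rule in section 10, can be spot-checked with GNU stat. This is a sketch with hypothetical function names. The private key and the configuration file are checked for mode only, because only their modes are specified above.

```shell
# Hypothetical check: compare a path's octal mode and, optionally, owner:group.
check_mode() {
  local path=$1 want_mode=$2 want_owner=${3:-} have
  have=$(stat -c '%a %U:%G' "$path" 2>/dev/null) || { echo "MISSING $path" >&2; return 1; }
  if [ "${have%% *}" != "$want_mode" ]; then
    echo "BAD     $path mode ${have%% *}, expected $want_mode" >&2
    return 1
  fi
  if [ -n "$want_owner" ] && [ "${have#* }" != "$want_owner" ]; then
    echo "BAD     $path owner ${have#* }, expected $want_owner" >&2
    return 1
  fi
}

# Expected values from the security model and the helper install rule; run with sudo.
check_vault_layout() {
  local rc=0
  check_mode /opt/vault/data 700 vault:vault || rc=1
  check_mode /opt/vault/tls 700 vault:vault || rc=1
  check_mode /opt/vault/tls/vault-key.pem 600 || rc=1
  check_mode /etc/vault.d/vault.hcl 600 || rc=1
  check_mode /usr/local/sbin/ac-vault-prepare-cert-issuer 750 root:root || rc=1
  return "$rc"
}
```
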


10. Repository File Structure

The working repository structure is:


terraform/
├── .github/workflows/terraform-proxmox-deploy.yml
├── modules/proxmox-vm/
│   ├── main.tf
│   ├── variables.tf
│   └── outputs.tf
└── envs/prod/
  ├── backend.tf
  ├── main.tf
  ├── variables.tf
  ├── outputs.tf
  ├── terraform.tfvars
  ├── web.tfvars
  ├── app.tfvars
  ├── db.tfvars
  ├── k8s.tfvars
  ├── runner.tfvars
  ├── vault.tfvars
  ├── dns/                         # separate DNS root and state
  └── ansible/
      ├── configure-vms.yml
      ├── configure-vault.yml
      ├── inventory.ini
      ├── requirements.yml
      ├── group_vars/vault.yml
      ├── templates/vault.hcl.j2
      ├── templates/vault-bootstrap-openssl.cnf.j2
      ├── templates/vault-audit-logrotate.j2
      ├── files/vault.service.d/override.conf
      └── files/vault/
          ├── ac-vault-bootstrap-logical
          ├── ac-vault-prepare-cert-issuer
          └── policies/
              ├── platform-admin.hcl
              ├── cert-issuer.hcl
              ├── cert-reader-local.hcl
              ├── cert-reader-dev.hcl
              ├── cert-reader-qa.hcl
              └── cert-reader-prod.hcl

No generated TLS private key, unseal share, root token, administrator password, Vault token, or AppRole SecretID belongs in the repository.

The helper source must remain in Git at:


envs/prod/ansible/files/vault/ac-vault-prepare-cert-issuer

The Vault play must install it as:


/usr/local/sbin/ac-vault-prepare-cert-issuer
owner: root
group: root
mode: 0750

A direct manual installation repaired the current server, but the Ansible source and playbook must also contain the same helper so a future Vault convergence or disaster-recovery build cannot omit it again.
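
Drift between the manually installed helper and the Git copy can be detected by comparing checksums. This is a sketch that assumes sha256sum on both hosts and SSH access as acllc. The function names are hypothetical.

```shell
# Print the SHA-256 of a file, or fail if it cannot be read.
file_sha256() {
  local sum
  sum=$(sha256sum "$1" 2>/dev/null) || return 1
  printf '%s\n' "${sum%% *}"
}

# Hypothetical drift check: compare the Git copy with the checksum of the
# installed helper (obtained separately on prod-vault-01).
helper_matches_repo() {
  local repo_sha
  repo_sha=$(file_sha256 "$1") || { echo "cannot read $1" >&2; return 1; }
  if [ "$repo_sha" = "$2" ]; then
    echo "helper matches repository source"
  else
    echo "DRIFT: repository $repo_sha, installed $2" >&2
    return 1
  fi
}

# Example, from envs/prod on a host with SSH access to prod-vault-01:
#   installed=$(ssh acllc@192.168.8.2 sudo sha256sum /usr/local/sbin/ac-vault-prepare-cert-issuer)
#   helper_matches_repo ansible/files/vault/ac-vault-prepare-cert-issuer "${installed%% *}"
```
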


11. Terraform Responsibilities and Required Additions

Terraform manages only VM lifecycle and remote state. It does not initialize Vault or handle Vault secrets.

envs/prod/variables.tf:


variable "vault_vms" {
  description = "Vault VMs for this environment."
  type = map(object({
    vmid        = number
    macaddr     = string
    reserved_ip = string
    cores       = number
    memory      = number
    disk_size   = string
  }))
  default = {}
}

envs/prod/main.tf:


module "vault_vms" {
  source = "../../modules/proxmox-vm"

  for_each = var.vault_vms

  name          = each.key
  vmid          = each.value.vmid
  target_node   = var.target_node
  template_name = var.template_name
  storage       = var.storage
  bridge        = var.bridge
  macaddr       = each.value.macaddr
  reserved_ip   = each.value.reserved_ip
  cores         = each.value.cores
  memory        = each.value.memory
  disk_size     = each.value.disk_size
}

envs/prod/outputs.tf:


output "vault_vms" {
  value = {
    for name, vm in module.vault_vms : name => {
      vmid        = vm.vmid
      macaddr     = vm.macaddr
      reserved_ip = vm.reserved_ip
    }
  }
}

envs/prod/vault.tfvars:


vault_vms = {
  prod-vault-01 = {
    vmid        = 3156002
    macaddr     = "aa:bb:cc:04:05:01"
    reserved_ip = "192.168.8.2"
    cores       = 4
    memory      = 8192
    disk_size   = "40G"
  }
}

envs/prod/backend.tf:


# Legacy environment-level state used during the controlled migration phase.
# This key will later be split into one S3 state per component.
terraform {
  backend "s3" {
    bucket       = "aspireclan-terraform-state-425389089086-us-east-1"
    key          = "prod/terraform.tfstate"
    region       = "us-east-1"
    encrypt      = true
    use_lockfile = true
  }
}

The shared VM module pins the Proxmox disk format to raw. Without the explicit value, the provider normalizes the format from raw to null and reports the same disk change on every plan:


resource "proxmox_vm_qemu" "vm" {
  name        = var.name
  vmid        = var.vmid
  target_node = var.target_node

  # Normal Proxmox template without a Cloud-Init drive.
  clone      = var.template_name
  full_clone = true

  # QEMU Guest Agent is independent of Cloud-Init.
  agent         = 1
  agent_timeout = 180

  # Prevent the provider from waiting for an IPv6 guest address.
  skip_ipv6 = true

  cpu {
    type    = "host"
    cores   = var.cores
    sockets = 1
  }

  memory = var.memory

  scsihw = "virtio-scsi-pci"

  # IMPORTANT:
  # This disk block must match the source template boot disk layout.
  # The tmplt-ub-26-min-base template currently uses scsi0 and 40G.
  # Do not set this smaller than the template disk. A mismatched size such as 32G
  # can create an extra empty disk and leave the real cloned Ubuntu disk unused.
  disk {
    slot    = "scsi0"
    size    = var.disk_size
    storage = var.storage
    format  = "raw"
  }

  network {
    id      = 0
    model   = "virtio"
    bridge  = var.bridge
    macaddr = var.macaddr
  }
}

12. Router Reservation and Collision Checks

The router reservation is:


aa:bb:cc:04:05:01 → 192.168.8.2

Before recreating the VM, verify the reservation and absence of a conflicting lease or neighbor entry:


ping -c 2 -W 1 192.168.8.2 || true
ip neigh show | grep -F '192.168.8.2' || true

Ubuntu remains DHCP-based. Do not add a static Netplan address. The reserved address is selected by the Terraform-assigned MAC.
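
The neighbor check can be made explicit. While the VM is absent, an ip neigh entry for 192.168.8.2 that carries a different MAC indicates a collision; an entry with the expected MAC may only be a stale record of the previous VM. This is a sketch, and the function name is hypothetical.

```shell
# Reads "ip neigh show" output on stdin. Succeeds (and prints the entry) when
# the IP in $1 is associated with a MAC other than the expected one in $2.
neighbor_conflict() {
  awk -v ip="$1" -v mac="$2" '
    $1 == ip {
      for (i = 2; i < NF; i++)
        if ($i == "lladdr" && tolower($(i + 1)) != tolower(mac)) { print; bad = 1 }
    }
    END { exit !bad }'
}

# Example:
#   ip neigh show | neighbor_conflict 192.168.8.2 aa:bb:cc:04:05:01 && echo "collision"
```
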


13. Terraform Validation, Plan, and Apply

First execution — build a new prod-vault-01

  1. Confirm the router reservation and vault.tfvars values.
  2. Push the reviewed change to the prod branch.
  3. The workflow authenticates to AWS through OIDC, initializes the S3 backend, plans the dedicated DNS root first, verifies DNS health, then plans the main production root.
  4. A non-destructive push applies automatically.
  5. Terraform creates or updates prod/terraform.tfstate in S3 with locking enabled.
  6. A new Vault VM is always sent through the real Ansible playbook after SSH becomes reachable.

Subsequent executions — update an existing prod-vault-01

  1. Push only the relevant Terraform or Ansible files.
  2. Terraform still refreshes and plans every time, so out-of-band VM deletion and missing state are detectable.
  3. Ansible file detection maps Vault-specific changes to the vault group.
  4. Existing hosts run --check; the real playbook runs only when check mode reports changes.
  5. Destructive Terraform actions remain blocked unless explicitly approved through a manual dispatch.
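
A minimal sketch of a destroy gate follows. It assumes plain terraform plan output produced with -no-color, including a summary line such as "Plan: 1 to add, 0 to change, 0 to destroy.". The function names are hypothetical, and this is not necessarily how the workflow implements the block. Where jq is available, terraform show -json tfplan is a more robust input.

```shell
# Print the "to destroy" count from plain terraform plan output on stdin;
# prints 0 when the plan reports no changes.
plan_destroy_count() {
  local n
  n=$(sed -n '/Plan:/s/.* \([0-9][0-9]*\) to destroy.*/\1/p' | tail -n 1)
  printf '%s\n' "${n:-0}"
}

# Hypothetical gate: fail when the plan would destroy anything.
require_non_destructive() {
  local n
  n=$(plan_destroy_count)
  if [ "$n" -gt 0 ]; then
    echo "plan destroys $n resource(s); use the approved manual dispatch" >&2
    return 1
  fi
}

# Example:
#   terraform plan -no-color -input=false ... -out=tfplan | tee plan.txt
#   require_non_destructive < plan.txt
```
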

The exact always-plan logic is:


        branch="${GITHUB_REF_NAME}"
        apply_requested="false"

        # Always evaluate the selected environment with Terraform whenever
        # this workflow runs. Git diffs cannot detect VMs deleted directly
        # in Proxmox, missing S3 state objects, or other infrastructure drift.
        env_terraform_changed="true"

        # Production DNS has a separate Terraform root and state. Evaluate it
        # on every prod run so a missing DNS VM or missing DNS state is
        # detected and repaired before the main production environment.
        if [ "${branch}" = "prod" ]; then
          dns_terraform_changed="true"
        else
          dns_terraform_changed="false"
        fi

The workflow verifies the complete Vault file set before a production run:


        if [ "${TF_ENV}" = "prod" ] && [ -f "${TF_WORKING_DIR}/vault.tfvars" ]; then
          required_vault_files=(
            "ansible/configure-vault.yml"
            "ansible/group_vars/vault.yml"
            "ansible/templates/vault.hcl.j2"
            "ansible/templates/vault-bootstrap-openssl.cnf.j2"
            "ansible/templates/vault-audit-logrotate.j2"
            "ansible/files/vault.service.d/override.conf"
            "ansible/files/vault/ac-vault-bootstrap-logical"
            "ansible/files/vault/ac-vault-prepare-cert-issuer"
            "ansible/files/vault/policies/platform-admin.hcl"
            "ansible/files/vault/policies/cert-issuer.hcl"
            "ansible/files/vault/policies/cert-reader-local.hcl"
            "ansible/files/vault/policies/cert-reader-dev.hcl"
            "ansible/files/vault/policies/cert-reader-qa.hcl"
            "ansible/files/vault/policies/cert-reader-prod.hcl"
          )

          for file in "${required_vault_files[@]}"; do
            if [ ! -f "${TF_WORKING_DIR}/${file}" ]; then
              echo "ERROR: Missing required Vault file ${TF_WORKING_DIR}/${file}"
              exit 1
            fi
          done
        fi

The Vault required-file check must include ansible/files/vault/ac-vault-prepare-cert-issuer. Without that check, a certificate-only workflow can succeed while leaving the Vault-side handoff utility absent.

The production plan retains every existing variable file and appends vault.tfvars when present:


        tf_var_args=(
          "-var-file=terraform.tfvars"
          "-var-file=web.tfvars"
          "-var-file=app.tfvars"
          "-var-file=db.tfvars"
          "-var-file=k8s.tfvars"
          "-var-file=runner.tfvars"
        )

        if [ -f "vault.tfvars" ]; then
          tf_var_args+=("-var-file=vault.tfvars")
        fi

For local operator review on the self-hosted runner, use:


cd envs/prod
terraform init -input=false -reconfigure
terraform fmt -check -recursive
terraform validate
terraform plan \
  -input=false \
  -lock-timeout=5m \
  -var-file=terraform.tfvars \
  -var-file=web.tfvars \
  -var-file=app.tfvars \
  -var-file=db.tfvars \
  -var-file=k8s.tfvars \
  -var-file=runner.tfvars \
  -var-file=vault.tfvars \
  -out=tfplan

The production DNS root remains separate under envs/prod/dns; envs/prod/dns.tfvars must remain deleted.


14. Ansible Inventory and Targeting

envs/prod/ansible/inventory.ini contains the dedicated Vault group:


[web]
prod-web-01 ansible_host=192.168.8.122 ansible_user=acllc ansible_ssh_private_key_file=~/.ssh/id_ed25519_ansible ansible_python_interpreter=/usr/bin/python3 ansible_ssh_common_args='-o IdentitiesOnly=yes -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null'

[app]

[db]

[k8s]

[runner]

[vault]
prod-vault-01 ansible_host=192.168.8.2 ansible_user=acllc ansible_ssh_private_key_file=~/.ssh/id_ed25519_ansible ansible_python_interpreter=/usr/bin/python3 ansible_ssh_common_args='-o IdentitiesOnly=yes -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null'

envs/prod/ansible/configure-vms.yml preserves the working playbook order:


---
- import_playbook: bootstrap-vms.yml
- import_playbook: configure-web.yml
- import_playbook: configure-vault.yml
- import_playbook: finalize-firewall.yml

The workflow runs syntax validation and then targets only the selected host or component:


ansible-playbook \
  -i ansible/inventory.ini \
  ansible/configure-vms.yml \
  --limit prod-vault-01 \
  --syntax-check

ansible-playbook \
  -i ansible/inventory.ini \
  ansible/configure-vms.yml \
  --limit prod-vault-01 \
  --check

A new VM bypasses check-only convergence and receives the real playbook. An existing VM receives the real playbook only when --check predicts changes.
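
The check-then-apply decision can be sketched from the PLAY RECAP of the --check run. This assumes uncoloured Ansible output and is not necessarily how the workflow makes the decision. The function name is hypothetical. Any failed or unreachable count, or a missing recap, is treated as an error rather than as a reason to skip.

```shell
# Hypothetical decision helper for an existing host: read the output of an
# "ansible-playbook --check" run and print apply, skip, or error.
recap_decision() {
  awk '
    /^PLAY RECAP/ { in_recap = 1; next }
    in_recap {
      for (i = 1; i <= NF; i++) {
        split($i, kv, "=")
        if (kv[1] == "changed") changed += kv[2]
        if (kv[1] == "failed" || kv[1] == "unreachable") bad += kv[2]
      }
    }
    END {
      if (!in_recap || bad) { print "error" }
      else if (changed) { print "apply" }
      else { print "skip" }
    }'
}
```
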


15. Common Operating-System Baseline

The common bootstrap and Vault playbook enforce or verify:


Hostname: prod-vault-01
SSH and passwordless sudo for acllc
QEMU Guest Agent from the VM template
Internal DNS resolution through prod-dns-01
UFW installed and low-volume logging enabled
Default incoming: deny
Default routed: deny
Default outgoing: allow
Swap disabled
Vault system account and group present
Vault directories with restrictive ownership and modes
Core dumps disabled
Vault service enabled and active
TLS listener reachable on 127.0.0.1:8200

Useful checks:


sudo hostnamectl --static
sudo ip -brief address
sudo ip route
sudo resolvectl status
sudo swapon --show
sudo systemctl is-active qemu-guest-agent ssh vault
sudo ufw status verbose
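
The swap rule has two parts: no swap is active now, and no uncommented swap entry remains that would reactivate swap after a reboot. This is a sketch with hypothetical function names; swapon --show --noheadings comes from util-linux.

```shell
# Succeeds (and prints the lines) when the fstab file still has an
# uncommented swap entry.
fstab_has_active_swap() {
  awk '$1 !~ /^#/ && $3 == "swap" { print; found = 1 } END { exit !found }' "${1:-/etc/fstab}"
}

swap_fully_disabled() {
  # swapon prints nothing when no swap device or file is active.
  [ -z "$(swapon --show --noheadings)" ] || return 1
  ! fstab_has_active_swap "${1:-/etc/fstab}" >/dev/null
}
```
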

The playbook does not run unrelated application roles against the Vault host.


16. Configure Internal DNS on prod-vault-01

The common environment configuration must resolve internal names through prod-dns-01 at 192.168.8.4. Public resolvers belong behind BIND forwarding, not beside the internal resolver on the Vault VM.

Validate on prod-vault-01:


sudo resolvectl status
sudo getent ahostsv4 vault.aspireclan.com
sudo getent ahostsv4 github.com
sudo dig @192.168.8.4 vault.aspireclan.com

Expected Vault service address:


vault.aspireclan.com → 192.168.8.2

The GitHub Actions workflow performs the dedicated DNS job and mandatory health gate before the main production environment job.


17. UFW Policy

The working Vault variables permit API/UI access from the internal LAN during bootstrap:


vault_api_allowed_cidrs:
- "192.168.8.0/24"

vault_cluster_allowed_cidrs: []

The playbook adds only the required rules; it does not reset UFW:


TCP 22: allowed by the common management baseline
TCP 8200: allowed from 192.168.8.0/24
TCP 8201: no allow rule in the single-node phase
Default incoming: deny
Default routed: deny
Default outgoing: allow

Validate:


sudo ufw status verbose
sudo ss -lntp | grep -E ':(8200|8201)\b'

After network segmentation and proxy deployment, narrow vault_api_allowed_cidrs to the administrator, certificate issuer, and approved proxy source addresses.
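
The expected UFW state can be asserted from ufw status verbose output. This is a sketch with a hypothetical function name. It assumes the column layout that ufw currently prints, and it treats any 8200/tcp rule from another source, including an IPv6 Anywhere rule, as a finding.

```shell
# Reads "ufw status verbose" output on stdin; prints findings and fails on any.
ufw_policy_ok() {
  awk -v cidr="${1:-192.168.8.0/24}" '
    /^Status: active/ { active = 1 }
    /^Default:/ && /deny \(incoming\)/ && /allow \(outgoing\)/ && /deny \(routed\)/ { defaults = 1 }
    $1 == "8200/tcp" && $2 == "ALLOW" && $NF == cidr { api = 1 }
    $1 == "8200/tcp" && $NF != cidr { wide = 1 }
    $1 ~ /^8201/ && $2 == "ALLOW" { cluster = 1 }
    END {
      if (!active) print "UFW is not active"
      if (!defaults) print "default policy is not deny incoming, allow outgoing, deny routed"
      if (!api) print "no 8200/tcp allow rule from " cidr
      if (wide) print "8200/tcp is allowed from a source other than " cidr
      if (cluster) print "unexpected 8201 allow rule"
      exit !(active && defaults && api && !wide && !cluster)
    }'
}

# Example on prod-vault-01:
#   sudo ufw status verbose | ufw_policy_ok 192.168.8.0/24
```
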


18. Install Vault Community Edition

Vault is installed by Ansible from HashiCorp's official APT repository. The playbook uses the modern deb822 repository module and verifies the HashiCorp signing-key fingerprint before package installation.

Pinned collection requirements:


---
collections:
- name: community.general
  version: "13.0.1"
- name: community.crypto
  version: "3.2.1"

The working installation sequence is part of the complete playbook in section 20 and includes:


ca-certificates
curl
gpg
jq
logrotate
openssl
python3-cryptography
python3-debian

HashiCorp key fingerprint:
798AEC654E5C15428C8E42EEAA16FCBCA621E701

Repository:
https://apt.releases.hashicorp.com

Package:
vault

Verify on the VM:


sudo /usr/bin/vault version
sudo apt-cache policy vault
sudo systemctl cat vault

The tested installation reported Vault v2.0.2. Do not hard-code a future version in Terraform state or expose package operations to unattended major-version changes without snapshot and recovery preparation.
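
The signing-key fingerprint can also be checked by hand, independently of the playbook. This is a sketch with hypothetical function names. gpg --show-keys reads the key without importing it, and in --with-colons output the tenth field of the first fpr record is the primary-key fingerprint.

```shell
# HashiCorp signing-key fingerprint recorded on this page.
EXPECTED_HASHICORP_FPR=798AEC654E5C15428C8E42EEAA16FCBCA621E701

# Strip spaces and colons and upper-case the hex so differently formatted
# fingerprints compare equal.
normalize_fpr() {
  printf '%s' "$1" | tr -d ' :' | tr 'abcdef' 'ABCDEF'
}

fingerprint_matches() {
  [ -n "$1" ] && [ "$(normalize_fpr "$1")" = "$(normalize_fpr "$2")" ]
}

# Print the primary-key fingerprint of a key read on stdin, without importing it.
primary_key_fpr() {
  gpg --show-keys --with-colons 2>/dev/null | awk -F: '$1 == "fpr" { print $10; exit }'
}

# Example:
#   fpr=$(curl -fsSL https://apt.releases.hashicorp.com/gpg | primary_key_fpr)
#   fingerprint_matches "$fpr" "$EXPECTED_HASHICORP_FPR" && echo "key OK"
```
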


19. Generate the Bootstrap CA and Vault Server Certificate

The earlier controller-generated private CA design was superseded by the working Ansible implementation. The current phase-one certificate is generated directly on prod-vault-01 only when no certificate exists.

Authoritative variables:


# Vault server identity and network addresses.
vault_service_name: vault
vault_user: vault
vault_group: vault
vault_node_id: prod-vault-01
vault_ip_address: "192.168.8.2"
vault_api_addr: "https://192.168.8.2:8200"
vault_cluster_addr: "https://192.168.8.2:8201"

# Integrated Storage (Raft) and TLS filesystem paths.
vault_data_dir: /opt/vault/data
vault_tls_dir: /opt/vault/tls
vault_tls_private_key_path: /opt/vault/tls/vault-key.pem
vault_tls_csr_path: /opt/vault/tls/vault.csr
vault_tls_certificate_path: /opt/vault/tls/vault-cert.pem
vault_config_path: /etc/vault.d/vault.hcl
vault_audit_log_dir: /var/log/vault
vault_audit_log_path: /var/log/vault/audit.log

# This bootstrap certificate is created only when no certificate already exists.
# The certificate issuer phase will later replace the files at the same paths and
# signal Vault with SIGHUP, avoiding a Vault restart and reseal.
vault_bootstrap_tls_common_name: vault.aspireclan.com
vault_bootstrap_tls_dns_sans:
- vault.aspireclan.com
- prod-vault-01
vault_bootstrap_tls_ip_sans:
- "192.168.8.2"
vault_bootstrap_tls_validity: "+365d"

# Vault API access. Keep this narrow and replace the LAN CIDR with internal
# proxy/automation CIDRs after network segmentation is introduced.
vault_api_allowed_cidrs:
- "192.168.8.0/24"

# Port 8201 is not opened for a single-node deployment. Add only future Vault
# peer CIDRs here when a multi-node Raft cluster is approved.
vault_cluster_allowed_cidrs: []

vault_obsolete_ufw_rules: []

# Logical configuration defaults applied only after manual initialization.
vault_cert_mount: cert
vault_cert_environments:
- local
- dev
- qa
- prod
vault_cert_workload_types:
- web
- srvc
- job
- infra
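
The CSR task converts the two SAN variables into prefixed entries: "DNS:" for names and "IP:" for addresses. The sketch below shows the resulting subjectAltName value for the variables above. build_san_value and its "--" separator convention are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: DNS names first, then a literal "--", then IP addresses.
# Prints a comma-joined subjectAltName value.
build_san_value() {
  local -a entries=()
  local kind="DNS" arg
  for arg in "$@"; do
    if [[ "${arg}" == "--" ]]; then
      kind="IP"
      continue
    fi
    entries+=("${kind}:${arg}")
  done
  local IFS=,
  printf '%s\n' "${entries[*]}"
}

build_san_value vault.aspireclan.com prod-vault-01 -- 192.168.8.2
```

For the current variables, this prints DNS:vault.aspireclan.com,DNS:prod-vault-01,IP:192.168.8.2. The same string can be compared against the SAN list that openssl x509 -ext subjectAltName reports for the deployed certificate.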

Behavior:

  • The 4096-bit RSA private key is preserved with regenerate: never.
  • Check mode predicts missing TLS material but does not attempt to consume a CSR that has not been written.
  • The real run creates or repairs the CSR, then creates the self-signed certificate.
  • The certificate contains the approved DNS and IP SANs.
  • The key and certificate are checked for a matching public key.
  • An existing certificate without its matching private key stops the playbook rather than silently generating a replacement.
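
The file-existence decisions listed above can be modeled as a small function. This is a sketch only: bootstrap_tls_action is a hypothetical name. It covers just the branch selection, not the key, CSR or certificate generation that the community.crypto modules perform.

```bash
#!/usr/bin/env bash
# Hypothetical model of the playbook's bootstrap TLS branch selection.
# Prints: stop (cert without key), keep (cert present), or generate.
bootstrap_tls_action() {
  local key_path="$1" cert_path="$2"
  if [[ -e "${cert_path}" && ! -e "${key_path}" ]]; then
    echo stop      # Never replace the key behind an existing certificate.
  elif [[ -e "${cert_path}" ]]; then
    echo keep      # Existing material is reused; only validation runs.
  else
    echo generate  # Reuse any existing key, create/repair the CSR, self-sign.
  fi
}

# Example on the VM (with sudo, because /opt/vault/tls is 0700):
# bootstrap_tls_action /opt/vault/tls/vault-key.pem /opt/vault/tls/vault-cert.pem
```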

File layout:


/opt/vault/tls/
├── vault-key.pem    vault:vault 0600
├── vault.csr        vault:vault 0640
└── vault-cert.pem   vault:vault 0644

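This certificate is temporary bootstrap trust. The certificate issuer phase will replace it through a controlled local deployment. New key and certificate files written to the same paths can be loaded with SIGHUP, as the playbook's Reload Vault TLS handler does. Any change to vault.hcl or the systemd unit still requires a controlled Vault restart and unseal procedure.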


20. Deploy TLS Files and Vault Configuration

envs/prod/ansible/templates/vault.hcl.j2:


ui = true

api_addr      = "{{ vault_api_addr }}"
cluster_addr  = "{{ vault_cluster_addr }}"
disable_mlock = true

# Short defaults reduce the lifetime of accidentally exposed dynamic tokens.
default_lease_ttl = "1h"
max_lease_ttl     = "24h"

log_level  = "info"
log_format = "json"

storage "raft" {
  path    = "{{ vault_data_dir }}"
  node_id = "{{ vault_node_id }}"
}

listener "tcp" {
  address         = "0.0.0.0:8200"
  cluster_address = "0.0.0.0:8201"

  tls_cert_file = "{{ vault_tls_certificate_path }}"
  tls_key_file  = "{{ vault_tls_private_key_path }}"

  tls_min_version = "tls12"
}

envs/prod/ansible/files/vault.service.d/override.conf:


[Service]
LimitCORE=0
LimitNOFILE=1048576
Environment=VAULT_ENABLE_FILE_PERMISSIONS_CHECK=true

envs/prod/ansible/templates/vault-audit-logrotate.j2:


{{ vault_audit_log_path }} {
  daily
  rotate 30
  # maxsize keeps the daily interval; a plain "size" placed after "daily" overrides it.
  maxsize 100M
  missingok
  notifempty
  compress
  delaycompress
  dateext
  create 0600 {{ vault_user }} {{ vault_group }}
  sharedscripts
  postrotate
      /bin/systemctl kill --kill-who=main --signal=HUP {{ vault_service_name }}.service >/dev/null 2>&1 || true
  endscript
}
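
According to the logrotate manual, size and the time-interval directives are mutually exclusive, and the last one specified takes precedence. "daily" followed by "size 100M" therefore rotates only when the file exceeds 100M. maxsize keeps the interval and adds a size ceiling. The sketch below flags this interaction in a rendered stanza such as /etc/logrotate.d/vault-audit. size_overrides_interval is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical lint: succeed (return 0) when a "size" directive appears after a
# time-interval directive, meaning the interval is silently overridden.
size_overrides_interval() {
  awk '
    $1 ~ /^(hourly|daily|weekly|monthly|yearly)$/ { interval = 1 }
    $1 == "size" && interval { hit = 1 }
    END { exit !hit }
  ' "$1"
}

# On the VM:
# size_overrides_interval /etc/logrotate.d/vault-audit \
#   && echo "WARN: size overrides the daily interval; consider maxsize."
```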

The complete working envs/prod/ansible/configure-vault.yml, including all check-mode fixes, the strict Boolean condition fix, TLS safety checks, and correctly registered handlers, is:


---
- name: Configure and harden HashiCorp Vault on prod-vault-01
  hosts: vault
  become: true
  gather_facts: true

  pre_tasks:
  - name: Validate required Vault variables
    ansible.builtin.assert:
      that:
        - vault_service_name is defined
        - vault_user is defined
        - vault_group is defined
        - vault_node_id is defined
        - vault_api_addr is defined
        - vault_cluster_addr is defined
        - vault_data_dir is defined
        - vault_tls_dir is defined
        - vault_tls_private_key_path is defined
        - vault_tls_certificate_path is defined
        - vault_config_path is defined
        - vault_audit_log_dir is defined
        - vault_audit_log_path is defined
        - vault_bootstrap_tls_common_name is defined
        - vault_bootstrap_tls_dns_sans is defined
        - vault_bootstrap_tls_ip_sans is defined
        - vault_api_allowed_cidrs is defined
        - vault_cluster_allowed_cidrs is defined
        - vault_obsolete_ufw_rules is defined
      fail_msg: Verify ansible/group_vars/vault.yml before continuing.

  - name: Resolve derived Vault TLS paths
    ansible.builtin.set_fact:
      vault_tls_csr_path_resolved: >-
        {{
          vault_tls_csr_path
          | default(vault_tls_dir ~ '/vault.csr', true)
        }}

  - name: Check whether the Vault binary exists before configuration
    ansible.builtin.stat:
      path: /usr/bin/vault
    register: vault_binary_before

  - name: Record first-install check-mode state
    ansible.builtin.set_fact:
      vault_first_install_check_mode: >-
        {{ ansible_check_mode and not vault_binary_before.stat.exists }}

  - name: Predict first-time Vault installation during check mode
    ansible.builtin.debug:
      msg: >-
        HashiCorp Vault is not installed. The real Ansible run will add the
        official HashiCorp APT repository, install Vault, create the TLS
        bootstrap material, configure Integrated Storage, and start Vault.
    changed_when: true
    when: vault_first_install_check_mode

  tasks:
  - name: Install HashiCorp repository and Vault prerequisites
    ansible.builtin.apt:
      name:
        - ca-certificates
        - curl
        - gpg
        - jq
        - logrotate
        - openssl
        - python3-cryptography
        - python3-debian
      state: present
      update_cache: true
      cache_valid_time: 3600
    when: not ansible_check_mode

  - name: Download the official HashiCorp package-signing key
    ansible.builtin.get_url:
      url: https://apt.releases.hashicorp.com/gpg
      dest: /usr/share/keyrings/hashicorp-archive-keyring.asc
      owner: root
      group: root
      mode: "0644"
    register: hashicorp_signing_key
    when: not ansible_check_mode

  - name: Check whether the HashiCorp binary keyring exists
    ansible.builtin.stat:
      path: /usr/share/keyrings/hashicorp-archive-keyring.gpg
    register: hashicorp_binary_keyring
    when: not ansible_check_mode

  - name: Convert the HashiCorp signing key to a binary keyring
    ansible.builtin.command:
      argv:
        - gpg
        - --batch
        - --yes
        - --dearmor
        - --output
        - /usr/share/keyrings/hashicorp-archive-keyring.gpg
        - /usr/share/keyrings/hashicorp-archive-keyring.asc
    changed_when: true
    when:
      - not ansible_check_mode
      - >-
        hashicorp_signing_key.changed or
        not hashicorp_binary_keyring.stat.exists

  - name: Read the HashiCorp package-signing key fingerprint
    ansible.builtin.command:
      argv:
        - gpg
        - --batch
        - --no-default-keyring
        - --keyring
        - /usr/share/keyrings/hashicorp-archive-keyring.gpg
        - --with-colons
        - --fingerprint
    check_mode: false
    changed_when: false
    register: hashicorp_key_fingerprint
    when: not ansible_check_mode

  - name: Verify the official HashiCorp package-signing key fingerprint
    ansible.builtin.assert:
      that:
        - >-
          '798AEC654E5C15428C8E42EEAA16FCBCA621E701' in
          (hashicorp_key_fingerprint.stdout | replace(' ', ''))
      fail_msg: >-
        The downloaded HashiCorp signing key fingerprint is not the expected
        official fingerprint. Vault installation has been stopped.
    when: not ansible_check_mode

  - name: Remove the legacy HashiCorp one-line APT repository file
    ansible.builtin.file:
      path: /etc/apt/sources.list.d/hashicorp.list
      state: absent
    when: not ansible_check_mode

  - name: Configure the official HashiCorp deb822 APT repository
    ansible.builtin.deb822_repository:
      name: hashicorp
      types:
        - deb
      uris:
        - https://apt.releases.hashicorp.com
      suites:
        - "{{ ansible_facts['distribution_release'] }}"
      components:
        - main
      signed_by: /usr/share/keyrings/hashicorp-archive-keyring.gpg
      enabled: true
      state: present
    when: not ansible_check_mode

  - name: Install HashiCorp Vault
    ansible.builtin.apt:
      name: vault
      state: present
      update_cache: true
    when: not ansible_check_mode

  - name: Verify the Vault binary exists after installation
    ansible.builtin.stat:
      path: /usr/bin/vault
    register: vault_binary_after

  - name: Assert that the Vault binary is installed
    ansible.builtin.assert:
      that:
        - vault_binary_after.stat.exists
        - vault_binary_after.stat.isreg
        - vault_binary_after.stat.executable
      fail_msg: HashiCorp Vault was not installed at /usr/bin/vault.
    when: not vault_first_install_check_mode

  - name: Configure the Vault runtime and server
    when: not vault_first_install_check_mode
    block:
      - name: Read the installed Vault version
        ansible.builtin.command: /usr/bin/vault version
        check_mode: false
        changed_when: false
        register: vault_version_result

      - name: Default the existing initialization state to false
        ansible.builtin.set_fact:
          vault_was_initialized: false

      - name: Read current Vault status before configuration
        ansible.builtin.command: /usr/bin/vault status -format=json
        environment:
          VAULT_ADDR: "{{ vault_api_addr }}"
          VAULT_CACERT: "{{ vault_tls_certificate_path }}"
        check_mode: false
        changed_when: false
        failed_when: false
        register: vault_status_before
        when: not ansible_check_mode

      - name: Record whether Vault was already initialized
        ansible.builtin.set_fact:
          vault_was_initialized: >-
            {{
              (
                vault_status_before.stdout
                | from_json
              ).initialized
              | default(false)
            }}
        when:
          - not ansible_check_mode
          - >-
            (
              vault_status_before.stdout
              | default('')
              | trim
              | regex_search('^\{')
            ) is not none

      - name: Verify the Vault service account exists
        ansible.builtin.getent:
          database: passwd
          key: "{{ vault_user }}"

      - name: Verify the Vault service group exists
        ansible.builtin.getent:
          database: group
          key: "{{ vault_group }}"

      - name: Create Vault directories with restrictive permissions
        ansible.builtin.file:
          path: "{{ item.path }}"
          state: directory
          owner: "{{ item.owner }}"
          group: "{{ item.group }}"
          mode: "{{ item.mode }}"
        loop:
          - path: /etc/vault.d
            owner: "{{ vault_user }}"
            group: "{{ vault_group }}"
            mode: "0700"
          - path: "{{ vault_data_dir }}"
            owner: "{{ vault_user }}"
            group: "{{ vault_group }}"
            mode: "0700"
          - path: "{{ vault_tls_dir }}"
            owner: "{{ vault_user }}"
            group: "{{ vault_group }}"
            mode: "0700"
          - path: "{{ vault_audit_log_dir }}"
            owner: "{{ vault_user }}"
            group: "{{ vault_group }}"
            mode: "0750"
          - path: /etc/systemd/system/vault.service.d
            owner: root
            group: root
            mode: "0755"
          - path: /usr/local/share/ac-vault/policies
            owner: root
            group: root
            mode: "0750"

      - name: Read active swap devices
        ansible.builtin.command: swapon --show=NAME --noheadings
        check_mode: false
        changed_when: false
        register: vault_swap_devices

      - name: Disable active swap for Vault hardening
        ansible.builtin.command: swapoff -a
        when:
          - not ansible_check_mode
          - vault_swap_devices.stdout | trim | length > 0
        changed_when: true

      - name: Disable persistent swap entries
        ansible.builtin.replace:
          path: /etc/fstab
          regexp: '^(?!#)(\s*\S+\s+\S+\s+swap\s+.*)$'
          replace: '# Disabled for Vault: \1'
          backup: true

      - name: Check whether the Vault TLS private key exists
        ansible.builtin.stat:
          path: "{{ vault_tls_private_key_path }}"
        register: vault_tls_private_key_before

      - name: Check whether the Vault TLS CSR exists
        ansible.builtin.stat:
          path: "{{ vault_tls_csr_path_resolved }}"
        register: vault_tls_csr_before

      - name: Check whether the Vault TLS certificate exists
        ansible.builtin.stat:
          path: "{{ vault_tls_certificate_path }}"
        register: vault_tls_certificate_before

      - name: Stop when a certificate exists without its matching private key
        ansible.builtin.assert:
          that:
            - >-
              not (
                vault_tls_certificate_before.stat.exists and
                not vault_tls_private_key_before.stat.exists
              )
          fail_msg: >-
            The Vault TLS certificate exists but the private key is missing.
            Restore the matching private key before continuing. Generating a
            replacement key would make the existing certificate unusable.

      - name: Predict bootstrap Vault TLS material creation during check mode
        ansible.builtin.debug:
          msg: >-
            The bootstrap Vault TLS certificate is absent. The real Ansible
            run will reuse any existing private key, create or repair the CSR,
            and generate the self-signed bootstrap certificate.
        changed_when: true
        when:
          - ansible_check_mode
          - not vault_tls_certificate_before.stat.exists

      - name: Generate or repair bootstrap Vault TLS material
        when:
          - not ansible_check_mode
          - not vault_tls_certificate_before.stat.exists
        block:
          - name: Generate the bootstrap Vault TLS private key when absent
            community.crypto.openssl_privatekey:
              path: "{{ vault_tls_private_key_path }}"
              type: RSA
              size: 4096
              owner: "{{ vault_user }}"
              group: "{{ vault_group }}"
              mode: "0600"
              regenerate: never
            notify: Reload Vault TLS

          - name: Generate or repair the bootstrap Vault TLS CSR
            community.crypto.openssl_csr:
              path: "{{ vault_tls_csr_path_resolved }}"
              privatekey_path: "{{ vault_tls_private_key_path }}"
              common_name: "{{ vault_bootstrap_tls_common_name }}"
              organization_name: Aspireclan LLC
              subject_alt_name: >-
                {{
                  (
                    vault_bootstrap_tls_dns_sans
                    | map('regex_replace', '^(.*)$', 'DNS:\1')
                    | list
                  )
                  +
                  (
                    vault_bootstrap_tls_ip_sans
                    | map('regex_replace', '^(.*)$', 'IP:\1')
                    | list
                  )
                }}
              key_usage:
                - digitalSignature
                - keyEncipherment
              extended_key_usage:
                - serverAuth
              owner: "{{ vault_user }}"
              group: "{{ vault_group }}"
              mode: "0640"
              backup: true
            notify: Reload Vault TLS

          - name: Generate the bootstrap self-signed Vault TLS certificate
            community.crypto.x509_certificate:
              path: "{{ vault_tls_certificate_path }}"
              csr_path: "{{ vault_tls_csr_path_resolved }}"
              privatekey_path: "{{ vault_tls_private_key_path }}"
              provider: selfsigned
              selfsigned_not_before: "-5m"
              selfsigned_not_after: "{{ vault_bootstrap_tls_validity }}"
              ignore_timestamps: true
              owner: "{{ vault_user }}"
              group: "{{ vault_group }}"
              mode: "0644"
              backup: true
            notify: Reload Vault TLS

      - name: Read resulting Vault TLS private-key state
        ansible.builtin.stat:
          path: "{{ vault_tls_private_key_path }}"
        register: vault_tls_private_key_after

      - name: Read resulting Vault TLS certificate state
        ansible.builtin.stat:
          path: "{{ vault_tls_certificate_path }}"
        register: vault_tls_certificate_after

      - name: Assert required Vault TLS files exist
        ansible.builtin.assert:
          that:
            - vault_tls_private_key_after.stat.exists
            - vault_tls_private_key_after.stat.isreg
            - vault_tls_certificate_after.stat.exists
            - vault_tls_certificate_after.stat.isreg
          fail_msg: >-
            Vault TLS material is incomplete. The private key and certificate
            must both exist before Vault can start.
        when:
          - not ansible_check_mode or vault_tls_certificate_before.stat.exists

      - name: Verify the Vault certificate and private key match
        ansible.builtin.shell: |
          set -euo pipefail
          cert_public_key_hash="$(
            openssl x509 \
              -in {{ vault_tls_certificate_path | quote }} \
              -pubkey \
              -noout | \
            openssl pkey -pubin -outform DER | \
            sha256sum | awk '{print $1}'
          )"
          private_public_key_hash="$(
            openssl pkey \
              -in {{ vault_tls_private_key_path | quote }} \
              -pubout \
              -outform DER | \
            sha256sum | awk '{print $1}'
          )"
          test "${cert_public_key_hash}" = "${private_public_key_hash}"
          openssl x509 \
            -in {{ vault_tls_certificate_path | quote }} \
            -noout \
            -checkend 3600
        args:
          executable: /bin/bash
        check_mode: false
        changed_when: false
        when:
          - vault_tls_certificate_after.stat.exists
          - vault_tls_private_key_after.stat.exists

      - name: Deploy the Vault server configuration
        ansible.builtin.template:
          src: vault.hcl.j2
          dest: "{{ vault_config_path }}"
          owner: "{{ vault_user }}"
          group: "{{ vault_group }}"
          mode: "0600"
          backup: true
        notify: Restart Vault configuration

      - name: Deploy the Vault systemd hardening override
        ansible.builtin.copy:
          src: vault.service.d/override.conf
          dest: /etc/systemd/system/vault.service.d/override.conf
          owner: root
          group: root
          mode: "0644"
        notify: Restart Vault configuration

      - name: Prepare the Vault audit log file
        ansible.builtin.file:
          path: "{{ vault_audit_log_path }}"
          state: touch
          owner: "{{ vault_user }}"
          group: "{{ vault_group }}"
          mode: "0600"
          modification_time: preserve
          access_time: preserve

      - name: Install Vault audit log rotation
        ansible.builtin.template:
          src: vault-audit-logrotate.j2
          dest: /etc/logrotate.d/vault-audit
          owner: root
          group: root
          mode: "0644"

      - name: Install Vault logical bootstrap utility
        ansible.builtin.copy:
          src: vault/ac-vault-bootstrap-logical
          dest: /usr/local/sbin/ac-vault-bootstrap-logical
          owner: root
          group: root
          mode: "0750"

      - name: Install Vault certificate issuer preparation utility
        ansible.builtin.copy:
          src: vault/ac-vault-prepare-cert-issuer
          dest: /usr/local/sbin/ac-vault-prepare-cert-issuer
          owner: root
          group: root
          mode: "0750"

      - name: Install version-controlled Vault policies
        ansible.builtin.copy:
          src: "vault/policies/{{ item }}"
          dest: "/usr/local/share/ac-vault/policies/{{ item }}"
          owner: root
          group: root
          mode: "0640"
        loop:
          - platform-admin.hcl
          - cert-issuer.hcl
          - cert-reader-local.hcl
          - cert-reader-dev.hcl
          - cert-reader-qa.hcl
          - cert-reader-prod.hcl

      - name: Remove obsolete Vault UFW rules
        community.general.ufw:
          rule: "{{ item.rule | default('allow') }}"
          src: "{{ item.src }}"
          to_port: "{{ item.port }}"
          proto: "{{ item.proto | default('tcp') }}"
          delete: true
        loop: "{{ vault_obsolete_ufw_rules }}"
        when: vault_obsolete_ufw_rules | length > 0

      - name: Allow approved clients to reach the Vault API
        community.general.ufw:
          rule: allow
          src: "{{ item }}"
          to_port: "8200"
          proto: tcp
          comment: Allow approved clients to reach Vault API
        loop: "{{ vault_api_allowed_cidrs }}"

      - name: Allow approved Vault peers to reach the Raft cluster port
        community.general.ufw:
          rule: allow
          src: "{{ item }}"
          to_port: "8201"
          proto: tcp
          comment: Allow approved Vault Raft peers
        loop: "{{ vault_cluster_allowed_cidrs }}"
        when: vault_cluster_allowed_cidrs | length > 0

      - name: Flush Vault configuration handlers
        ansible.builtin.meta: flush_handlers

      - name: Ensure Vault is enabled and running
        ansible.builtin.systemd_service:
          name: "{{ vault_service_name }}"
          enabled: true
          state: started
          daemon_reload: true
        when: not ansible_check_mode

      - name: Wait for the Vault TLS listener
        ansible.builtin.wait_for:
          host: 127.0.0.1
          port: 8200
          timeout: 60
        when: not ansible_check_mode

      - name: Read Vault status after configuration
        ansible.builtin.command: /usr/bin/vault status -format=json
        environment:
          VAULT_ADDR: "{{ vault_api_addr }}"
          VAULT_CACERT: "{{ vault_tls_certificate_path }}"
        check_mode: false
        changed_when: false
        failed_when: vault_status_after.rc not in [0, 2]
        register: vault_status_after
        when: not ansible_check_mode

      - name: Verify the Vault service is active
        ansible.builtin.command:
          argv:
            - systemctl
            - is-active
            - --quiet
            - "{{ vault_service_name }}.service"
        check_mode: false
        changed_when: false
        when: not ansible_check_mode

      - name: Show the non-secret Vault readiness summary
        ansible.builtin.debug:
          msg: >-
            Vault {{ vault_version_result.stdout | trim }} is listening with TLS,
            Integrated Storage, swap disabled, core dumps disabled, audit-log
            storage prepared, and fail-closed firewall rules. Initialization and
            unseal remain a direct, interactive Ubuntu server operation.

  handlers:
  - name: Reload Vault TLS
    ansible.builtin.command:
      argv:
        - systemctl
        - kill
        - --kill-who=main
        - --signal=HUP
        - "{{ vault_service_name }}.service"
    changed_when: true
    when:
      - not ansible_check_mode
      - vault_was_initialized | default(false) | bool

  - name: Block automatic restart of an initialized Shamir-sealed Vault
    ansible.builtin.fail:
      msg: >-
        Vault was already initialized and vault.hcl or its systemd unit
        changed. Automatic restart is blocked because it would reseal the
        server. Schedule a controlled restart and unseal operation.
    when:
      - not ansible_check_mode
      - vault_was_initialized | default(false) | bool
    listen: Restart Vault configuration

  - name: Restart uninitialized Vault safely
    ansible.builtin.systemd_service:
      name: "{{ vault_service_name }}"
      state: restarted
      enabled: true
      daemon_reload: true
    when:
      - not ansible_check_mode
      - not (vault_was_initialized | default(false) | bool)
    listen: Restart Vault configuration
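
The three handlers reduce to a small decision table that depends on two inputs: whether Vault was already initialized, and whether TLS files or the configuration changed. The sketch below restates that table. vault_change_action and its tls/config labels are hypothetical; in the playbook, these decisions are made by the handler when conditions.

```bash
#!/usr/bin/env bash
# Hypothetical restatement of the playbook handlers.
# $1: was Vault already initialized (true/false); $2: tls, config, or other.
vault_change_action() {
  local initialized="$1" changed="$2"
  case "${changed}" in
    tls)
      if [[ "${initialized}" == true ]]; then
        echo sighup   # "Reload Vault TLS": reload certificate files in place.
      else
        echo none     # Handler skipped; the start task starts a stopped Vault.
      fi
      ;;
    config)
      if [[ "${initialized}" == true ]]; then
        echo blocked  # A restart would reseal a Shamir-sealed Vault.
      else
        echo restart  # "Restart uninitialized Vault safely".
      fi
      ;;
    *)
      echo none
      ;;
  esac
}
```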

21. Start Vault and Perform Pre-Initialization Validation

The commands in this section were tested directly on prod-vault-01. Because /opt/vault/tls is intentionally mode 0700, privileged service, TLS, listener, and Vault CLI checks use sudo.

Current-server state

The current prod-vault-01 instance is already initialized. Retain the pre-initialization checks below as the rebuild/recovery runbook, but do not run vault operator init again against the existing Raft data.

Run:


sudo systemctl status vault --no-pager
sudo journalctl -u vault -n 100 --no-pager
sudo systemctl is-active vault

sudo test -s /opt/vault/tls/vault-key.pem &&
echo "PASS: TLS private key exists."

sudo test -s /opt/vault/tls/vault-cert.pem &&
echo "PASS: TLS certificate exists."

sudo openssl x509 \
-in /opt/vault/tls/vault-cert.pem \
-noout \
-checkend 3600 &&
echo "PASS: TLS certificate is currently valid."

sudo openssl x509 \
-in /opt/vault/tls/vault-cert.pem \
-noout \
-subject \
-issuer \
-dates \
-ext subjectAltName

sudo ss -lntH '( sport = :8200 )' | grep LISTEN
sudo ss -lntp | grep -E ':(8200|8201)\b'
sudo ufw status verbose

Verify the protected TLS-path permissions without weakening them:


sudo namei -l /opt/vault/tls/vault-cert.pem

sudo stat -c '%A %a %U:%G %n' \
/opt/vault/tls \
/opt/vault/tls/vault-cert.pem \
/opt/vault/tls/vault-key.pem

Expected security posture:


/opt/vault/tls                 0700 vault:vault
vault-cert.pem                 0644 vault:vault
vault-key.pem                  0600 vault:vault
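
The expected posture can also be checked as PASS/FAIL lines. This is a minimal sketch assuming GNU stat. check_posture is a hypothetical name. Note that %a prints the octal mode without a leading zero, so 0700 is passed as 700.

```bash
#!/usr/bin/env bash
# Hypothetical check: compare octal mode and owner:group with expected values.
# Prints PASS on stdout, or FAIL on stderr and returns 1.
check_posture() {
  local expected_mode="$1" expected_owner="$2" path="$3" actual
  actual="$(stat -c '%a %U:%G' "${path}")" || return 1
  if [[ "${actual}" == "${expected_mode} ${expected_owner}" ]]; then
    echo "PASS: ${path} ${actual}"
  else
    echo "FAIL: ${path} is ${actual}, expected ${expected_mode} ${expected_owner}" >&2
    return 1
  fi
}

# On prod-vault-01, run as root because /opt/vault/tls is 0700:
# check_posture 700 vault:vault /opt/vault/tls
# check_posture 644 vault:vault /opt/vault/tls/vault-cert.pem
# check_posture 600 vault:vault /opt/vault/tls/vault-key.pem
```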

Check Vault through the protected CA file:


sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault status

Expected before first initialization:


Initialized    false
Sealed         true
Storage Type   raft

A sealed or uninitialized vault status can return exit code 2; the displayed state is the important result. Stop immediately if a rebuild candidate unexpectedly reports Initialized true.
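
Because the exit code alone cannot separate "sealed" from "uninitialized", a rebuild guard should read the Initialized field itself. The sketch below parses the table output of vault status and fails closed unless the field reads false. Both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the value of one `vault status` table field
# (for example "Initialized" or "Storage Type") read from stdin.
vault_status_field() {
  awk -v key="$1" '
    index($0, key) == 1 && substr($0, length(key) + 1, 1) ~ /[ \t]/ {
      print $NF; exit
    }'
}

# Succeeds only when the captured status explicitly reports Initialized false.
require_uninitialized() {
  local initialized
  initialized="$(vault_status_field Initialized <<<"$1")"
  if [[ "${initialized}" != false ]]; then
    echo "STOP: Initialized is '${initialized:-unknown}'; do not run vault operator init." >&2
    return 1
  fi
}

# On the VM (vault status exits 2 while sealed, hence "|| true"):
# status_text="$(sudo -H env VAULT_ADDR="https://192.168.8.2:8200" \
#   VAULT_CACERT="/opt/vault/tls/vault-cert.pem" vault status || true)"
# require_uninitialized "${status_text}" && echo "OK: safe to initialize."
```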


22. Initialize Vault with Five Shares and a Threshold of Three

Completed once on the current server

The current prod-vault-01 instance has already been initialized with five shares and threshold three. The procedure below is retained as the authoritative rebuild runbook. Never run the initialization command again against the existing /opt/vault/data Raft data.

Perform this section once, directly on a new or restored-but-uninitialized prod-vault-01. Disable screen sharing and terminal recording. Do not use tee, output redirection, a transcript, or a screenshot.

Command 1 — verify state


sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault status

Proceed only when Initialized is false.

Command 2 — initialize and display the six recovery values


sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault operator init -key-shares=5 -key-threshold=3

The terminal displays exactly once:


Unseal Key 1: <COPY_TO_SECURE_LOCATION_1>
Unseal Key 2: <COPY_TO_SECURE_LOCATION_2>
Unseal Key 3: <COPY_TO_SECURE_LOCATION_3>
Unseal Key 4: <COPY_TO_SECURE_LOCATION_4>
Unseal Key 5: <COPY_TO_SECURE_LOCATION_5>

Initial Root Token: <COPY_TO_TEMPORARY_SECURE_ROOT_TOKEN_LOCATION>

Copy each value carefully before continuing. Never run vault operator init again against the same Raft data.

Commands 3, 4, and 5 — submit any three different shares interactively


sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault operator unseal
# Paste Unseal Key 1 at the hidden prompt.

sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault operator unseal
# Paste Unseal Key 2 at the hidden prompt.

sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault operator unseal
# Paste Unseal Key 3 at the hidden prompt.

Do not append a share to the command line; interactive entry prevents it from entering shell history or process arguments.
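
While Vault is still sealed, vault status reports Unseal Progress as submitted/threshold, for example 1/3. The sketch below turns that field into the number of distinct shares still needed, which can be checked between prompts. remaining_shares is a hypothetical name, and the function reads no share material.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `vault status` output on stdin and print how many
# more shares are needed; prints "unknown" and returns 1 if the field is absent.
remaining_shares() {
  local progress
  progress="$(awk '/^Unseal Progress[ \t]/ { print $NF; exit }')"
  if [[ ! "${progress}" =~ ^([0-9]+)/([0-9]+)$ ]]; then
    echo "unknown"
    return 1
  fi
  echo $(( BASH_REMATCH[2] - BASH_REMATCH[1] ))
}

# On the VM:
# sudo -H env VAULT_ADDR="https://192.168.8.2:8200" \
#   VAULT_CACERT="/opt/vault/tls/vault-cert.pem" vault status | remaining_shares
```

After Vault unseals, the field disappears and the helper prints unknown; at that point, the verification command that follows is the authoritative check.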

Command 6 — verify


sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault status

Expected:


Initialized    true
Sealed         false
Storage Type   raft

23. Secure and Verify Initialization Material

Create six separate records immediately:


prod-vault-01 — Unseal Share 1
prod-vault-01 — Unseal Share 2
prod-vault-01 — Unseal Share 3
prod-vault-01 — Unseal Share 4
prod-vault-01 — Unseal Share 5
prod-vault-01 — Initial Root Token

Preferred custody keeps every share separate. A practical single-operator minimum is:


Encrypted location A: shares 1 and 2
Encrypted location B: shares 3 and 4, stored separately from A
Encrypted location C: share 5
Temporary root-token record: separate from all share locations

No single storage location should contain three shares. Keep the five shares permanently. Keep the initial root token only until the named administrator and logical baseline are validated, then revoke it and delete any plaintext copy.
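
A custody plan can be checked against these rules before any secret is moved. In the sketch below, the plan is written as "location:share,share" lines, using labels only. check_custody is a hypothetical name, and the labels A, B and C are placeholders. The function confirms that shares 1 through 5 are each stored somewhere and that no single location holds three or more shares.

```bash
#!/usr/bin/env bash
# Hypothetical custody-plan check; reads "location:share,share" lines on stdin.
check_custody() {
  local total=5 threshold=3
  local line location s status=0
  local -a list
  local -A seen=()

  while IFS= read -r line || [[ -n "${line}" ]]; do
    if [[ -z "${line}" ]]; then
      continue
    fi
    location="${line%%:*}"
    IFS=, read -r -a list <<<"${line#*:}"
    if (( ${#list[@]} >= threshold )); then
      echo "FAIL: location ${location} holds ${#list[@]} shares" >&2
      status=1
    fi
    for s in "${list[@]}"; do
      seen["${s}"]=1
    done
  done

  # Every share number must appear in at least one location.
  for (( s = 1; s <= total; s++ )); do
    if [[ -z "${seen[${s}]:-}" ]]; then
      echo "FAIL: share ${s} has no custody location" >&2
      status=1
    fi
  done
  return "${status}"
}

# The documented single-operator minimum:
# printf 'A:1,2\nB:3,4\nC:5\n' | check_custody && echo "PASS: custody plan"
```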

Suitable storage includes password-manager secure notes with strong MFA and separate BitLocker-encrypted offline media. Do not use Git, GitHub secrets, Terraform, Ansible variables, email, chat, screenshots, unencrypted USB drives, ordinary text files, or the Vault VM itself.

After copying from the terminal:

  • Re-read every value from its secure destination and compare the beginning and ending characters with the terminal output.
  • Clear the client clipboard and clipboard history.
  • Close the terminal tab to discard scrollback.
  • Do not leave a PowerShell, PuTTY, SSH, or Proxmox-console transcript containing the values.
  • Record only non-secret metadata: initialization date, share count 5, threshold 3, and custody location labels.

24. Unseal Vault and Perform the First Login

After initialization or any Vault service/VM restart, submit any three distinct shares interactively. Every command reads the protected CA file through sudo; no share appears in a command argument.


sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault operator unseal

sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault operator unseal

sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault operator unseal

Verify:


sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault status

For a new rebuild's first root login, avoid putting the token on the command line:


sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault login
# Paste the Initial Root Token at the hidden prompt.

sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault token lookup

sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault status

The current production instance no longer uses the initial root token; it has been revoked. Do not add VAULT_TOKEN to /etc/environment, .bashrc, shell profiles, systemd units, or scripts. Remove any temporary root Vault token file when the operation is complete:


sudo rm -f /root/.vault-token
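A quick audit can confirm that no persistent VAULT_TOKEN assignment remains. The sketch below assumes the operator supplies the file list. find_persistent_vault_tokens is a hypothetical helper. Its pattern catches only literal VAULT_TOKEN= assignments, not tokens injected some other way:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each readable file that contains a literal
# VAULT_TOKEN= assignment. Returns 0 if any file matched (like grep), 1 if none.
find_persistent_vault_tokens() {
  local file found=1
  for file in "$@"; do
    [[ -r "${file}" ]] || continue
    if grep -Eq -e '(^|[^A-Za-z0-9_])VAULT_TOKEN=' -- "${file}"; then
      printf '%s\n' "${file}"
      found=0
    fi
  done
  return "${found}"
}

# Example, as root so every file is readable:
#   sudo bash -c "$(declare -f find_persistent_vault_tokens)
#   find_persistent_vault_tokens /etc/environment /root/.bashrc /root/.profile" ||
#   echo "PASS: no persistent VAULT_TOKEN assignment found."
```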

25. Enable Audit Devices

The earlier design proposed file and syslog audit devices. The working baseline currently enables one file audit device through the installed logical-bootstrap utility. A second audit device can be added later after its destination and failure behavior are approved.

The audit file and rotation are prepared by Ansible:


/var/log/vault/audit.log
owner: vault
group: vault
mode: 0600
rotation: daily, 30 files, 100M size threshold, compressed
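You can check the owner, group, and mode with stat. The sketch below assumes GNU coreutils stat, whose %a format prints the octal mode without a leading zero. check_file_perms is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare a file's owner:group and octal mode (GNU stat
# %a, printed without a leading zero) with the expected values.
check_file_perms() {
  local path="$1" expected_owner="$2" expected_mode="$3" actual
  actual="$(stat -c '%U:%G %a' -- "${path}")" || return 1
  if [[ "${actual}" == "${expected_owner} ${expected_mode}" ]]; then
    echo "PASS: ${path} is ${actual}"
  else
    echo "FAIL: ${path} is ${actual}, expected ${expected_owner} ${expected_mode}" >&2
    return 1
  fi
}

# Example on prod-vault-01:
#   sudo bash -c "$(declare -f check_file_perms)
#   check_file_perms /var/log/vault/audit.log vault:vault 600"
```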

The complete installed utility, /usr/local/sbin/ac-vault-bootstrap-logical, is:


#!/usr/bin/env bash
set -euo pipefail
umask 077

VAULT_ADDR="https://192.168.8.2:8200"
VAULT_CACERT="/opt/vault/tls/vault-cert.pem"
export VAULT_ADDR VAULT_CACERT

POLICY_DIR="/usr/local/share/ac-vault/policies"
AUDIT_LOG="/var/log/vault/audit.log"
CERT_MOUNT="cert"

payload_file="$(mktemp /run/ac-vault-bootstrap.XXXXXX.json)"
login_file=""
cleanup() {
unset VAULT_TOKEN ROOT_TOKEN ADMIN_PASSWORD ADMIN_TOKEN
rm -f "${payload_file}"
if [[ -n "${login_file}" ]]; then
  rm -f "${login_file}"
fi
}
trap cleanup EXIT

cat > "${payload_file}"
chmod 0600 "${payload_file}"

ACTION="$(jq -r '.action // "configure"' "${payload_file}")"
ROOT_TOKEN="$(jq -r '.root_token // empty' "${payload_file}")"
ADMIN_USERNAME="$(jq -r '.admin_username // empty' "${payload_file}")"
ADMIN_PASSWORD="$(jq -r '.admin_password // empty' "${payload_file}")"

case "${ACTION}" in
configure|revoke-initial-root)
  if [[ -z "${ROOT_TOKEN}" ]]; then
    echo "ERROR: root_token is required for action '${ACTION}'." >&2
    exit 1
  fi
  ;;
*)
  echo "ERROR: Unsupported action '${ACTION}'." >&2
  exit 1
  ;;
esac

set +e
vault status -format=json >/dev/null 2>&1
status_rc=$?
set -e
if [[ "${status_rc}" -ne 0 ]]; then
if [[ "${status_rc}" -eq 2 ]]; then
  echo "ERROR: Vault is sealed. Unseal it before logical configuration." >&2
else
  echo "ERROR: Vault is not reachable at ${VAULT_ADDR}." >&2
fi
exit 1
fi

export VAULT_TOKEN="${ROOT_TOKEN}"
vault token lookup -format=json >/dev/null

write_policy() {
local policy_name="$1"
local policy_file="${POLICY_DIR}/${policy_name}.hcl"
[[ -f "${policy_file}" ]] || {
  echo "ERROR: Missing policy file ${policy_file}." >&2
  exit 1
}
vault policy write "${policy_name}" "${policy_file}" >/dev/null
}

enable_auth_if_missing() {
local auth_path="$1"
local auth_type="$2"
local auth_json actual_type
auth_json="$(vault auth list -format=json)"

if ! jq -e --arg path "${auth_path}/" 'has($path)' <<< "${auth_json}" >/dev/null; then
  vault auth enable -path="${auth_path}" "${auth_type}" >/dev/null
  return
fi

actual_type="$(jq -r --arg path "${auth_path}/" '.[$path].type // empty' <<< "${auth_json}")"
if [[ "${actual_type}" != "${auth_type}" ]]; then
  echo "ERROR: auth/${auth_path} exists with type '${actual_type}', expected '${auth_type}'." >&2
  exit 1
fi
}

configure_approle() {
local role_name="$1"
local policy_name="$2"
vault write "auth/approle/role/${role_name}" \
  token_type=batch \
  token_policies="${policy_name}" \
  token_ttl=15m \
  token_max_ttl=30m \
  secret_id_ttl=720h \
  secret_id_num_uses=0 >/dev/null
}

validate_logical_baseline() {
local audit_json secrets_json auth_json policy_name role_name
audit_json="$(vault audit list -format=json)"
secrets_json="$(vault secrets list -format=json)"
auth_json="$(vault auth list -format=json)"

jq -e 'has("file/")' <<< "${audit_json}" >/dev/null
jq -e --arg path "${CERT_MOUNT}/" \
  '.[$path].type == "kv" and .[$path].options.version == "2"' \
  <<< "${secrets_json}" >/dev/null
jq -e 'has("userpass/") and has("approle/")' <<< "${auth_json}" >/dev/null

for policy_name in \
  platform-admin \
  cert-issuer \
  cert-reader-local \
  cert-reader-dev \
  cert-reader-qa \
  cert-reader-prod
do
  vault policy read "${policy_name}" >/dev/null
done

for role_name in \
  cert-issuer \
  cert-reader-local \
  cert-reader-dev \
  cert-reader-qa \
  cert-reader-prod
do
  vault read "auth/approle/role/${role_name}" >/dev/null
done

vault kv get "${CERT_MOUNT}/prod/infra/_schema" >/dev/null
}

if [[ "${ACTION}" == "configure" ]]; then
if [[ -z "${ADMIN_USERNAME}" || -z "${ADMIN_PASSWORD}" ]]; then
  echo "ERROR: admin_username and admin_password are required for configure." >&2
  exit 1
fi

touch "${AUDIT_LOG}"
chown vault:vault "${AUDIT_LOG}"
chmod 0600 "${AUDIT_LOG}"

if ! vault audit list -format=json | jq -e 'has("file/")' >/dev/null; then
  vault audit enable -path=file file \
    file_path="${AUDIT_LOG}" \
    hmac_accessor=false \
    elide_list_responses=true >/dev/null
fi

secrets_json="$(vault secrets list -format=json)"
if ! jq -e --arg path "${CERT_MOUNT}/" 'has($path)' <<< "${secrets_json}" >/dev/null; then
  vault secrets enable -path="${CERT_MOUNT}" kv-v2 >/dev/null
else
  existing_type="$(jq -r --arg path "${CERT_MOUNT}/" '.[$path].type // empty' <<< "${secrets_json}")"
  existing_version="$(jq -r --arg path "${CERT_MOUNT}/" '.[$path].options.version // empty' <<< "${secrets_json}")"
  if [[ "${existing_type}" != "kv" || "${existing_version}" != "2" ]]; then
    echo "ERROR: ${CERT_MOUNT}/ exists but is not a KV v2 secrets engine." >&2
    exit 1
  fi
fi

vault write "${CERT_MOUNT}/config" \
  max_versions=20 \
  cas_required=false \
  delete_version_after=0s >/dev/null

write_policy platform-admin
write_policy cert-issuer
write_policy cert-reader-local
write_policy cert-reader-dev
write_policy cert-reader-qa
write_policy cert-reader-prod

enable_auth_if_missing userpass userpass
enable_auth_if_missing approle approle

admin_payload="$(jq -n \
  --arg password "${ADMIN_PASSWORD}" \
  '{password:$password,token_policies:"platform-admin",token_ttl:"1h",token_max_ttl:"8h"}')"
printf '%s' "${admin_payload}" | vault write "auth/userpass/users/${ADMIN_USERNAME}" - >/dev/null
unset admin_payload ADMIN_PASSWORD

configure_approle cert-issuer cert-issuer
configure_approle cert-reader-local cert-reader-local
configure_approle cert-reader-dev cert-reader-dev
configure_approle cert-reader-qa cert-reader-qa
configure_approle cert-reader-prod cert-reader-prod

for environment in local dev qa prod; do
  for workload_type in web srvc job infra; do
    domain_example="${environment}.${workload_type}.example.invalid"
    case "${environment}/${workload_type}" in
      local/web)  domain_example="local.fp.aspireclan.com" ;;
      local/srvc) domain_example="local.api.fp.aspireclan.com" ;;
      dev/web)    domain_example="dev.fp.aspireclan.com" ;;
      dev/srvc)   domain_example="dev.api.fp.aspireclan.com" ;;
      qa/web)     domain_example="qa.fp.aspireclan.com" ;;
      qa/srvc)    domain_example="qa.api.fp.aspireclan.com" ;;
      prod/web)   domain_example="fp.aspireclan.com" ;;
      prod/srvc)  domain_example="api.fp.aspireclan.com" ;;
      prod/infra) domain_example="vault.aspireclan.com" ;;
    esac

    schema_path="${CERT_MOUNT}/${environment}/${workload_type}/_schema"
    if ! vault kv get -format=json "${schema_path}" >/dev/null 2>&1; then
      vault kv put "${schema_path}" \
        environment="${environment}" \
        workload_type="${workload_type}" \
        path_format="cert/${environment}/${workload_type}/<workload-name>" \
        domain_example="${domain_example}" \
        certificate_fields="domain,certificate_pem,private_key_pem,chain_pem,fullchain_pem,sans_json,issuer,serial_number,not_before,not_after,renew_after,acme_directory,updated_at" \
        schema_version="1" >/dev/null
    fi
  done
done

validate_logical_baseline

echo "PASS: Vault logical baseline configured and validated."
echo "Configured: file audit device, cert/ KV v2, userpass, AppRole, policies, roles, and certificate path schema."
echo "No AppRole SecretIDs were generated or printed."

elif [[ "${ACTION}" == "revoke-initial-root" ]]; then
if [[ -z "${ADMIN_USERNAME}" || -z "${ADMIN_PASSWORD}" ]]; then
  echo "ERROR: admin_username and admin_password are required for root-token retirement." >&2
  exit 1
fi

login_payload="$(jq -n --arg password "${ADMIN_PASSWORD}" '{password:$password}')"
login_file="$(mktemp /run/ac-vault-login.XXXXXX.json)"
printf '%s' "${login_payload}" > "${login_file}"
chmod 0600 "${login_file}"

ADMIN_TOKEN="$(curl --silent --show-error --fail \
  --cacert "${VAULT_CACERT}" \
  --request POST \
  --data @"${login_file}" \
  "${VAULT_ADDR}/v1/auth/userpass/login/${ADMIN_USERNAME}" | jq -r '.auth.client_token')"

[[ -n "${ADMIN_TOKEN}" && "${ADMIN_TOKEN}" != "null" ]] || {
  echo "ERROR: Unable to authenticate the platform administrator." >&2
  exit 1
}

VAULT_TOKEN="${ADMIN_TOKEN}" vault secrets list -format=json >/dev/null
VAULT_TOKEN="${ADMIN_TOKEN}" vault policy read platform-admin >/dev/null
VAULT_TOKEN="${ADMIN_TOKEN}" vault audit list -format=json >/dev/null

VAULT_TOKEN="${ROOT_TOKEN}" vault token revoke -self >/dev/null
echo "PASS: The initial root token was revoked after platform-admin validation."
fi

The utility enables the file/ audit device only when it is absent, validates the complete logical baseline, and never generates or prints AppRole SecretIDs. The following direct-server checks ran successfully:


sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault audit list -detailed

sudo tail -n 20 /var/log/vault/audit.log
sudo logrotate -d /etc/logrotate.d/vault-audit

At least one healthy audit device must remain enabled before production certificate keys or ACME credentials are written.


26. Create Human Administrator Authentication and Revoke Root

The direct-server sudo procedure in this section was tested successfully on prod-vault-01. The installed logical-bootstrap utility creates the file audit device, cert/ KV v2 mount, policies, userpass, AppRole roles, and schema placeholders without generating any AppRole SecretID.

Confirm the utility and policies exist


sudo test -x /usr/local/sbin/ac-vault-bootstrap-logical &&
echo "PASS: Logical-bootstrap utility exists."

sudo find /usr/local/share/ac-vault/policies \
-maxdepth 1 \
-type f \
-printf '%f\n' | sort

sudo bash -c '
command -v jq
command -v python3
command -v vault
command -v curl
'

Expected policy files:


cert-issuer.hcl
cert-reader-dev.hcl
cert-reader-local.hcl
cert-reader-prod.hcl
cert-reader-qa.hcl
platform-admin.hcl

Stop if the utility or any policy file is missing.
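A loop can perform the same completeness check and report every missing file at once, so you do not rely on a visual comparison. missing_policy_files is a hypothetical helper. The policy names are the six listed above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: report each expected policy file that is absent from
# the given directory. Returns 0 when all six files exist, 1 otherwise.
missing_policy_files() {
  local dir="$1" name status=0
  for name in platform-admin cert-issuer cert-reader-local \
              cert-reader-dev cert-reader-qa cert-reader-prod; do
    if [[ ! -f "${dir}/${name}.hcl" ]]; then
      echo "MISSING: ${dir}/${name}.hcl"
      status=1
    fi
  done
  return "${status}"
}

# Example on prod-vault-01:
#   sudo bash -c "$(declare -f missing_policy_files)
#   missing_policy_files /usr/local/share/ac-vault/policies" &&
#   echo "PASS: all policy files present."
```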

Configure the logical baseline

Use a new administrator password containing at least twenty characters. The root token and administrator password are read from /dev/tty and are not placed in shell history.


sudo bash <<'BASH'
set -euo pipefail
umask 077

read -rsp 'Initial root token: ' ROOT_TOKEN </dev/tty
echo >/dev/tty

read -rp 'Administrator username [manoj-admin]: ' ADMIN_USERNAME </dev/tty
ADMIN_USERNAME="${ADMIN_USERNAME:-manoj-admin}"

if [[ ! "${ADMIN_USERNAME}" =~ ^[a-zA-Z0-9._-]+$ ]]; then
echo "ERROR: Administrator username contains unsupported characters." >&2
unset ROOT_TOKEN ADMIN_USERNAME
exit 1
fi

read -rsp 'New administrator password: ' ADMIN_PASSWORD </dev/tty
echo >/dev/tty

read -rsp 'Confirm administrator password: ' ADMIN_PASSWORD_CONFIRM </dev/tty
echo >/dev/tty

if [[ "${ADMIN_PASSWORD}" != "${ADMIN_PASSWORD_CONFIRM}" ]]; then
echo "ERROR: Administrator passwords do not match." >&2
unset ROOT_TOKEN ADMIN_USERNAME ADMIN_PASSWORD ADMIN_PASSWORD_CONFIRM
exit 1
fi

if (( ${#ADMIN_PASSWORD} < 20 )); then
echo "ERROR: Administrator password must contain at least 20 characters." >&2
unset ROOT_TOKEN ADMIN_USERNAME ADMIN_PASSWORD ADMIN_PASSWORD_CONFIRM
exit 1
fi

set +e
printf '%s\n%s\n%s\n' \
"${ROOT_TOKEN}" \
"${ADMIN_USERNAME}" \
"${ADMIN_PASSWORD}" |
python3 -c '
import json
import sys

root_token = sys.stdin.readline().rstrip("\n")
username = sys.stdin.readline().rstrip("\n")
password = sys.stdin.readline().rstrip("\n")

print(json.dumps({
  "action": "configure",
  "root_token": root_token,
  "admin_username": username,
  "admin_password": password
}))
' |
/usr/local/sbin/ac-vault-bootstrap-logical

BOOTSTRAP_RC=$?
set -e

unset ROOT_TOKEN
unset ADMIN_USERNAME
unset ADMIN_PASSWORD
unset ADMIN_PASSWORD_CONFIRM

if (( BOOTSTRAP_RC != 0 )); then
echo "ERROR: Vault logical bootstrap failed." >&2
exit "${BOOTSTRAP_RC}"
fi
BASH

Expected final messages:


PASS: Vault logical baseline configured and validated.
Configured: file audit device, cert/ KV v2, userpass, AppRole, policies, roles, and certificate path schema.
No AppRole SecretIDs were generated or printed.
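You can isolate the wrapper's input checks for review or reuse. The sketch below mirrors those checks: the username character set, the password confirmation, and the 20-character minimum. validate_admin_credentials is a hypothetical name. The values are passed as shell-function arguments, which stay inside the running bash process and do not appear in the process list:

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the wrapper's checks. Shell-function arguments
# stay inside the current bash process, so they are not exposed via ps.
validate_admin_credentials() {
  local username="$1" password="$2" confirm="$3"
  if [[ ! "${username}" =~ ^[a-zA-Z0-9._-]+$ ]]; then
    echo "ERROR: Administrator username contains unsupported characters." >&2
    return 1
  fi
  if [[ "${password}" != "${confirm}" ]]; then
    echo "ERROR: Administrator passwords do not match." >&2
    return 1
  fi
  if (( ${#password} < 20 )); then
    echo "ERROR: Administrator password must contain at least 20 characters." >&2
    return 1
  fi
}
```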

Validate the named administrator before revoking root

Log in with userpass. Because sudo -H sets HOME to /root, vault login stores the temporary administrator token in /root/.vault-token:


sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault login -method=userpass username=manoj-admin

Validate the token and logical baseline:


sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault token lookup

sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault audit list -detailed

sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault auth list -detailed

sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault secrets list -detailed

sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault read cert/config

sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault policy list

sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault policy read platform-admin

sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault operator raft list-peers

sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault kv get cert/prod/infra/_schema

sudo tail -n 20 /var/log/vault/audit.log
sudo logrotate -d /etc/logrotate.d/vault-audit

Validate every policy:


for policy in \
platform-admin \
cert-issuer \
cert-reader-local \
cert-reader-dev \
cert-reader-qa \
cert-reader-prod
do
echo
echo "===== ${policy} ====="

sudo -H env \
  VAULT_ADDR="https://192.168.8.2:8200" \
  VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
  vault policy read "${policy}"
done

Validate all AppRoles without generating a SecretID:


for role in \
cert-issuer \
cert-reader-local \
cert-reader-dev \
cert-reader-qa \
cert-reader-prod
do
echo
echo "===== ${role} ====="

sudo -H env \
  VAULT_ADDR="https://192.168.8.2:8200" \
  VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
  vault read "auth/approle/role/${role}"
done

Validate representative schema records:


sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault kv get cert/prod/infra/_schema

sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault kv get cert/dev/web/_schema

sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault kv get cert/qa/srvc/_schema

Remove the temporary administrator session before root-token retirement:


sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault token revoke -self

sudo rm -f /root/.vault-token

sudo test ! -e /root/.vault-token &&
echo "PASS: Temporary root-user Vault token file removed."

Revoke the initial root token only after validation


sudo bash <<'BASH'
set -euo pipefail
umask 077

read -rsp 'Initial root token to revoke: ' ROOT_TOKEN </dev/tty
echo >/dev/tty

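read -rp 'Administrator username [manoj-admin]: ' ADMIN_USERNAME </dev/tty
ADMIN_USERNAME="${ADMIN_USERNAME:-manoj-admin}"

if [[ ! "${ADMIN_USERNAME}" =~ ^[a-zA-Z0-9._-]+$ ]]; then
echo "ERROR: Administrator username contains unsupported characters." >&2
unset ROOT_TOKEN ADMIN_USERNAME
exit 1
fi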

read -rsp 'Administrator password: ' ADMIN_PASSWORD </dev/tty
echo >/dev/tty

set +e
printf '%s\n%s\n%s\n' \
"${ROOT_TOKEN}" \
"${ADMIN_USERNAME}" \
"${ADMIN_PASSWORD}" |
python3 -c '
import json
import sys

root_token = sys.stdin.readline().rstrip("\n")
username = sys.stdin.readline().rstrip("\n")
password = sys.stdin.readline().rstrip("\n")

print(json.dumps({
  "action": "revoke-initial-root",
  "root_token": root_token,
  "admin_username": username,
  "admin_password": password
}))
' |
/usr/local/sbin/ac-vault-bootstrap-logical

REVOKE_RC=$?
set -e

unset ROOT_TOKEN
unset ADMIN_USERNAME
unset ADMIN_PASSWORD

if (( REVOKE_RC != 0 )); then
echo "ERROR: Initial root-token retirement failed." >&2
exit "${REVOKE_RC}"
fi
BASH

Expected:


PASS: The initial root token was revoked after platform-admin validation.

Permanently delete the separately stored initial-root-token value after revocation. Keep all five unseal shares permanently. The current production instance has completed this root-token-retirement step.


27. Enable KV v2 Mounts

The original working baseline used only cert/. The certificate-issuer integration now requires a second, separately governed KV v2 mount at acme/:


Mount: cert/
Plugin: kv
Version: 2
max_versions: 20
cas_required: false
delete_version_after: 0s

Mount: acme/
Plugin: kv
Version: 2
Target max_versions: 10
Target cas_required: false
Target delete_version_after: 0s
Cloudflare path: acme/cloudflare/dns

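The logical-bootstrap utility enables cert/ idempotently: it skips the enable step when a KV v2 mount already exists at that path, then applies the mount configuration. The issuer preparation helper in section 30.2 enables and configures acme/. The equivalent cert/ commands are: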


vault secrets enable -path=cert kv-v2
vault write cert/config max_versions=20 cas_required=false delete_version_after=0s

Verify directly on prod-vault-01 after named-administrator login:


sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault secrets list -detailed

sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault read cert/config

The first direct helper run may already have enabled acme/ before it stopped at acme/config. Verify the mount type before changing anything; do not disable or recreate it blindly. The confirmed error is an authorization failure at the mount configuration path, not a network or TLS failure.

Verify the current state directly on prod-vault-01 after logging in as manoj-admin:


sudo -H env VAULT_ADDR="https://192.168.8.2:8200" VAULT_CACERT="/opt/vault/tls/vault-cert.pem" vault secrets list -detailed

sudo -H env VAULT_ADDR="https://192.168.8.2:8200" VAULT_CACERT="/opt/vault/tls/vault-cert.pem" vault token capabilities acme/config

Before the policy repair, expect either deny for acme/config or the same 403 permission denied error when the helper attempts the write. After the repository and live policies are corrected, vault read acme/config must succeed and report the intended KV v2 configuration (max_versions 10, cas_required false, delete_version_after 0s).
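The 403 follows from how ACL paths match. An exact path such as acme/config matches only itself, and a trailing * matches by prefix. So cert/* covers cert/config but never acme/config. sys/mounts/* lets the administrator enable the acme/ mount, but writing acme/config is a request against the mount itself. The sketch below is a simplified model of that rule. policy_path_matches and any_policy_path_matches are hypothetical helpers. The model ignores Vault's + segment wildcard and its most-specific-rule selection:

```shell
#!/usr/bin/env bash
# Hypothetical, simplified model of Vault ACL path matching: exact paths match
# only themselves; a single trailing '*' matches any path with that prefix.
# Not modelled: '+' segment wildcards and Vault's choice of the most specific
# matching rule when several patterns apply.
policy_path_matches() {
  local pattern="$1" path="$2"
  if [[ "${pattern}" == *'*' ]]; then
    [[ "${path}" == "${pattern%\*}"* ]]
  else
    [[ "${path}" == "${pattern}" ]]
  fi
}

# Return 0 if any of the given patterns matches the path.
any_policy_path_matches() {
  local path="$1" pattern
  shift
  for pattern in "$@"; do
    policy_path_matches "${pattern}" "${path}" && return 0
  done
  return 1
}
```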


28. Certificate Data Model and Automatic Version History

The approved path format is:


cert/<environment>/<workload-type>/<workload-name>
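These are logical KV v2 paths. The vault kv commands translate them into API paths by inserting a data/, metadata/, or subkeys/ segment after the mount. That is why the policies in section 29 name cert/data/, cert/metadata/, and cert/subkeys/ rather than the logical path. The sketch below shows that translation and assumes the mount name is a single path segment such as cert or acme. kv2_api_path is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: translate a logical KV v2 path (mount/key/path) into the
# API path used by policies. Assumes a single-segment mount name.
kv2_api_path() {
  local logical="$1" kind="$2"
  local mount="${logical%%/*}" rest="${logical#*/}"
  if [[ "${mount}" == "${logical}" || -z "${rest}" ]]; then
    echo "ERROR: expected <mount>/<path>, got '${logical}'" >&2
    return 1
  fi
  case "${kind}" in
    data|metadata|subkeys) printf '%s/%s/%s\n' "${mount}" "${kind}" "${rest}" ;;
    *) echo "ERROR: unknown KV v2 path kind '${kind}'" >&2; return 1 ;;
  esac
}
```

For example, cert/prod/infra/_schema is read through cert/data/prod/infra/_schema, which cert-reader-prod covers with cert/data/prod/*.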

Environments:


local
dev
qa
prod

Workload types:


web
srvc
job
infra
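Four environments and four workload types give sixteen _schema records. The sketch below enumerates them in the same nested order as the bootstrap loop, for example to feed an existence check. list_schema_paths is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every cert/<environment>/<workload-type>/_schema
# path in the same nested order the logical-bootstrap utility uses.
list_schema_paths() {
  local environment workload_type
  for environment in local dev qa prod; do
    for workload_type in web srvc job infra; do
      printf 'cert/%s/%s/_schema\n' "${environment}" "${workload_type}"
    done
  done
}

# Example: confirm every schema record exists (after administrator login).
#   list_schema_paths | while read -r path; do
#     sudo -H env VAULT_ADDR="https://192.168.8.2:8200" \
#       VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
#       vault kv get -format=json "${path}" >/dev/null || echo "MISSING: ${path}"
#   done
```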

The bootstrap utility creates a _schema record for every environment/workload-type combination. Recommended certificate fields are:


domain
certificate_pem
private_key_pem
chain_pem
fullchain_pem
sans_json
issuer
serial_number
not_before
not_after
renew_after
acme_directory
updated_at

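Every successful KV v2 write to the same logical path creates a new version. The cert/ mount retains up to twenty versions; once that limit is exceeded, Vault permanently removes the oldest version. Do not encode secret material in paths or field names: path segments appear unhashed in audit-log request paths, and anyone with subkeys or metadata access can see field names and paths.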
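The retention window is simple arithmetic. With max_versions set to N, only the N most recent versions are kept, and delete_version_after=0s disables time-based deletion. The sketch below shows the calculation. oldest_retained_version is a hypothetical helper and does not query Vault:

```shell
#!/usr/bin/env bash
# Hypothetical helper: given the current (newest) version number and the
# mount's max_versions, print the oldest version KV v2 still retains.
oldest_retained_version() {
  local current="$1" max_versions="$2" oldest
  oldest=$(( current - max_versions + 1 ))
  if (( oldest < 1 )); then
    oldest=1
  fi
  echo "${oldest}"
}
```

For cert/ (max_versions 20) at version 35, versions 16 through 35 remain. For acme/ (max_versions 10) at version 12, versions 3 through 12 remain.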


29. Least-Privilege Machine Policies

The exact version-controlled policies deployed by Ansible are:

cert-issuer.hcl


# Certificate issuer: write certificate versions, but never destroy history.
path "cert/config" {
capabilities = ["read"]
}

path "cert/data/*" {
capabilities = ["create", "read", "update", "patch"]
}

path "cert/metadata" {
capabilities = ["list"]
}

path "cert/metadata/*" {
capabilities = ["read", "list"]
}

path "cert/subkeys/*" {
capabilities = ["read"]
}


# Read only the Cloudflare credential required for DNS-01.
path "acme/data/cloudflare/dns" {
capabilities = ["read"]
}

path "acme/metadata/cloudflare/dns" {
capabilities = ["read"]
}

cert-reader-dev.hcl


# Read-only certificate consumer policy for the dev environment.
path "cert/data/dev/*" {
capabilities = ["read"]
}

path "cert/metadata/dev" {
capabilities = ["read", "list"]
}

path "cert/metadata/dev/*" {
capabilities = ["read", "list"]
}

path "cert/subkeys/dev/*" {
capabilities = ["read"]
}

cert-reader-local.hcl


# Read-only certificate consumer policy for the local environment.
path "cert/data/local/*" {
capabilities = ["read"]
}

path "cert/metadata/local" {
capabilities = ["read", "list"]
}

path "cert/metadata/local/*" {
capabilities = ["read", "list"]
}

path "cert/subkeys/local/*" {
capabilities = ["read"]
}

cert-reader-prod.hcl


# Read-only certificate consumer policy for the prod environment.
path "cert/data/prod/*" {
capabilities = ["read"]
}

path "cert/metadata/prod" {
capabilities = ["read", "list"]
}

path "cert/metadata/prod/*" {
capabilities = ["read", "list"]
}

path "cert/subkeys/prod/*" {
capabilities = ["read"]
}

cert-reader-qa.hcl


# Read-only certificate consumer policy for the qa environment.
path "cert/data/qa/*" {
capabilities = ["read"]
}

path "cert/metadata/qa" {
capabilities = ["read", "list"]
}

path "cert/metadata/qa/*" {
capabilities = ["read", "list"]
}

path "cert/subkeys/qa/*" {
capabilities = ["read"]
}

platform-admin.hcl


# Non-root human administrator policy for day-to-day Vault operations.
# This intentionally excludes the root policy and sys/raw.
path "sys/health" {
capabilities = ["read", "sudo"]
}

path "sys/auth" {
capabilities = ["read"]
}

path "sys/auth/*" {
capabilities = ["create", "read", "update", "delete", "list", "sudo"]
}

path "sys/mounts" {
capabilities = ["read"]
}

path "sys/mounts/*" {
capabilities = ["create", "read", "update", "delete", "list", "sudo"]
}

path "sys/policies/acl" {
capabilities = ["list"]
}

path "sys/policies/acl/*" {
capabilities = ["create", "read", "update", "delete", "list", "sudo"]
}

path "sys/audit" {
capabilities = ["read", "list", "sudo"]
}

path "sys/audit/*" {
capabilities = ["create", "read", "update", "delete", "list", "sudo"]
}

path "sys/storage/raft/*" {
capabilities = ["read", "update", "sudo"]
}

path "auth/*" {
capabilities = ["create", "read", "update", "delete", "list", "sudo"]
}

path "identity/*" {
capabilities = ["create", "read", "update", "delete", "list"]
}

path "cert/*" {
capabilities = ["create", "read", "update", "patch", "delete", "list"]
}


# Required by the direct prod-cert-01 preparation helper.
path "acme/config" {
capabilities = ["create", "read", "update"]
}

path "acme/data/cloudflare/_schema" {
capabilities = ["create", "read", "update", "patch"]
}

path "acme/metadata/cloudflare/_schema" {
capabilities = ["read"]
}

path "acme/data/cloudflare/dns" {
capabilities = ["create", "read", "update", "patch"]
}

path "acme/metadata/cloudflare/dns" {
capabilities = ["read"]
}

Source-of-truth requirement

The direct policy repair must not exist only in Vault's live policy store or only under /usr/local/share/ac-vault/policies. Update the repository copy first (or immediately afterward), deploy the same file to the server, and load it with vault policy write platform-admin ...; repeat the same steps for cert-issuer.hcl. Otherwise a later Ansible convergence can restore the old policy and reproduce the 403 failure.


30. Prepare AppRole Machine Identities

The logical bootstrap enables AppRole and creates these roles without generating any SecretID:


cert-issuer          → policy cert-issuer
cert-reader-local     → policy cert-reader-local
cert-reader-dev       → policy cert-reader-dev
cert-reader-qa        → policy cert-reader-qa
cert-reader-prod      → policy cert-reader-prod

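Role configuration applied by the logical bootstrap. The issuer preparation helper in section 30.2 later rewrites cert-issuer with secret_id_ttl=0 and binds its SecretIDs and tokens to 192.168.8.3/32: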


token_type: batch
token_ttl: 15m
token_max_ttl: 30m
secret_id_ttl: 720h
secret_id_num_uses: 0
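The TTL fields use Vault duration strings. The sketch below converts the s, m, and h forms used here to seconds, so you can check relationships such as token_ttl not exceeding token_max_ttl before writing a role. duration_to_seconds and ttl_within_max are hypothetical helpers. They accept only a single integer with an s, m, or h suffix, while Vault accepts more forms. For secret_id_ttl and secret_id_num_uses, 0 means no limit:

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert "<integer><s|m|h>" to seconds. Other Vault
# duration forms (bare integers, combined units such as 1h30m) are rejected.
duration_to_seconds() {
  local value="$1"
  if [[ ! "${value}" =~ ^([0-9]+)([smh])$ ]]; then
    echo "ERROR: unsupported duration '${value}'" >&2
    return 1
  fi
  local number="${BASH_REMATCH[1]}" unit="${BASH_REMATCH[2]}"
  case "${unit}" in
    s) echo "$(( 10#${number} ))" ;;
    m) echo "$(( 10#${number} * 60 ))" ;;
    h) echo "$(( 10#${number} * 3600 ))" ;;
  esac
}

# Hypothetical check: succeed when the first duration does not exceed the second.
ttl_within_max() {
  local ttl max
  ttl="$(duration_to_seconds "$1")" || return 1
  max="$(duration_to_seconds "$2")" || return 1
  (( ttl <= max ))
}
```

For the current roles, 15m (900 seconds) is within 30m (1800 seconds), and 720h is 30 days.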

AppRole is for machine authentication, not human administration. Generate a SecretID only after the target VM exists and can receive it securely. The RoleID and SecretID should meet only on the consuming machine.

Example future retrieval, performed directly on prod-vault-01 under an approved administrator token:


sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault read -field=role_id auth/approle/role/cert-issuer/role-id

sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault write -wrap-ttl=5m -field=wrapping_token -f auth/approle/role/cert-issuer/secret-id

Do not generate a SecretID until the target issuer VM is installed and ready. The authoritative consumer is now prod-cert-01, not the earlier planning name prod-cert-issuer-01. Do not put the SecretID or wrapping token in Terraform state, GitHub Actions output, Ansible inventory, documentation, or chat.

30.1 Current direct-server checkpoint

The helper is installed on prod-vault-01, and the following attempt authenticated successfully but stopped before secret storage:


Error writing data to acme/config: Error making API request.
URL: PUT https://192.168.8.2:8200/v1/acme/config
Code: 403
permission denied

Interpretation:


Vault TLS: working
userpass authentication: working
platform-admin attachment: working
Vault API: reachable
acme/config authorization: missing
Cloudflare token storage: not reached
wrapped SecretID generation: not reached
prod-cert-01 bootstrap: not reached
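The interpretation rests on the CLI printing an HTTP status code. That means TLS and the network path worked and Vault itself rejected the request. The sketch below extracts the Code line from captured CLI error text; real Vault output often reads "Code: 403. Errors:". classify_vault_error is a hypothetical helper. The category labels, and reading 503 as sealed or unavailable, are illustrative assumptions:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read Vault CLI error text on stdin, find the first
# "Code: NNN" line, and print a coarse category. No code usually means the
# request failed before Vault answered (DNS, routing, TLS, refused connection).
classify_vault_error() {
  local code
  code="$(sed -n 's/^Code: \([0-9][0-9]*\).*$/\1/p' | head -n 1)"
  case "${code}" in
    "")  echo "no-http-response" ;;
    403) echo "authorization" ;;
    404) echo "unknown-path" ;;
    503) echo "sealed-or-unavailable" ;;
    *)   echo "http-${code}" ;;
  esac
}
```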

30.2 Authoritative Vault-side helper

The exact helper source must exist in Git and be installed by Ansible. The current direct repair installed it at /usr/local/sbin/ac-vault-prepare-cert-issuer:


#!/usr/bin/env bash
set -euo pipefail
umask 077

VAULT_ADDR="https://192.168.8.2:8200"
VAULT_CACERT="/opt/vault/tls/vault-cert.pem"
POLICY_DIR="/usr/local/share/ac-vault/policies"
ACME_MOUNT="acme"
CERT_ISSUER_CIDR="192.168.8.3/32"
export VAULT_ADDR VAULT_CACERT

payload_file="$(mktemp /run/ac-vault-cert-issuer.XXXXXX.json)"
login_file="$(mktemp /run/ac-vault-login.XXXXXX.json)"

cleanup() {
unset VAULT_TOKEN ADMIN_PASSWORD CLOUDFLARE_API_TOKEN
rm -f "${payload_file}" "${login_file}"
}
trap cleanup EXIT

cat > "${payload_file}"
chmod 0600 "${payload_file}"

ADMIN_USERNAME="$(jq -r '.admin_username // empty' "${payload_file}")"
ADMIN_PASSWORD="$(jq -r '.admin_password // empty' "${payload_file}")"
CLOUDFLARE_API_TOKEN="$(jq -r '.cloudflare_api_token // empty' "${payload_file}")"
WRAP_TTL="$(jq -r '.wrap_ttl // "30m"' "${payload_file}")"

[[ "${ADMIN_USERNAME}" =~ ^[a-zA-Z0-9._-]+$ ]] || {
echo "ERROR: admin_username is missing or invalid." >&2
exit 1
}
[[ -n "${ADMIN_PASSWORD}" ]] || {
echo "ERROR: admin_password is required." >&2
exit 1
}
[[ -n "${CLOUDFLARE_API_TOKEN}" ]] || {
echo "ERROR: cloudflare_api_token is required." >&2
exit 1
}
[[ "${WRAP_TTL}" =~ ^[1-9][0-9]*(s|m)$ ]] || {
echo "ERROR: wrap_ttl must be a positive duration in seconds or minutes." >&2
exit 1
}

set +e
vault status -format=json >/dev/null 2>&1
status_rc=$?
set -e
if [[ "${status_rc}" -ne 0 ]]; then
if [[ "${status_rc}" -eq 2 ]]; then
  echo "ERROR: Vault is sealed. Unseal it before preparing the issuer." >&2
else
  echo "ERROR: Vault is not reachable at ${VAULT_ADDR}." >&2
fi
exit 1
fi

jq -n --arg password "${ADMIN_PASSWORD}" '{password:$password}' > "${login_file}"
VAULT_TOKEN="$(
curl --silent --show-error --fail \
  --cacert "${VAULT_CACERT}" \
  --request POST \
  --data @"${login_file}" \
  "${VAULT_ADDR}/v1/auth/userpass/login/${ADMIN_USERNAME}" |
jq -r '.auth.client_token // empty'
)"
[[ -n "${VAULT_TOKEN}" ]] || {
echo "ERROR: Vault administrator authentication failed." >&2
exit 1
}
export VAULT_TOKEN

vault token lookup -format=json >/dev/null
vault policy write platform-admin "${POLICY_DIR}/platform-admin.hcl" >/dev/null
vault policy write cert-issuer "${POLICY_DIR}/cert-issuer.hcl" >/dev/null

secrets_json="$(vault secrets list -format=json)"
if ! jq -e --arg path "${ACME_MOUNT}/" 'has($path)' <<<"${secrets_json}" >/dev/null; then
vault secrets enable -path="${ACME_MOUNT}" kv-v2 >/dev/null
else
existing_type="$(jq -r --arg path "${ACME_MOUNT}/" '.[$path].type // empty' <<<"${secrets_json}")"
existing_version="$(jq -r --arg path "${ACME_MOUNT}/" '.[$path].options.version // empty' <<<"${secrets_json}")"
if [[ "${existing_type}" != "kv" || "${existing_version}" != "2" ]]; then
  echo "ERROR: ${ACME_MOUNT}/ exists but is not a KV v2 secrets engine." >&2
  exit 1
fi
fi

vault write "${ACME_MOUNT}/config" max_versions=10 cas_required=false delete_version_after=0s >/dev/null

if ! vault kv get -format=json "${ACME_MOUNT}/cloudflare/_schema" >/dev/null 2>&1; then
vault kv put "${ACME_MOUNT}/cloudflare/_schema" \
  path_format="acme/cloudflare/dns" \
  required_fields="api_token" \
  recommended_scope="Zone:DNS:Edit and Zone:Zone:Read for approved zones only" \
  schema_version="1" >/dev/null
fi

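# Pass the Cloudflare token on stdin (api_token=-) so it never appears in the
# vault process argument list.
printf '%s' "${CLOUDFLARE_API_TOKEN}" |
vault kv put "${ACME_MOUNT}/cloudflare/dns" \
  api_token=- \
  managed_for="prod-cert-01" \
  updated_at="$(date --utc +%Y-%m-%dT%H:%M:%SZ)" >/dev/null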

vault write auth/approle/role/cert-issuer \
  token_type=batch \
  token_policies=cert-issuer \
  token_ttl=15m \
  token_max_ttl=30m \
  secret_id_ttl=0 \
  secret_id_num_uses=0 \
  secret_id_bound_cidrs="${CERT_ISSUER_CIDR}" \
  token_bound_cidrs="${CERT_ISSUER_CIDR}" >/dev/null

ROLE_ID="$(vault read -field=role_id auth/approle/role/cert-issuer/role-id)"
WRAPPED_RESPONSE="$(vault write -format=json -wrap-ttl="${WRAP_TTL}" -force auth/approle/role/cert-issuer/secret-id)"
WRAPPING_TOKEN="$(jq -r '.wrap_info.token // empty' <<<"${WRAPPED_RESPONSE}")"
CREATION_PATH="$(jq -r '.wrap_info.creation_path // empty' <<<"${WRAPPED_RESPONSE}")"

[[ -n "${ROLE_ID}" && -n "${WRAPPING_TOKEN}" ]] || {
echo "ERROR: Vault did not return the issuer bootstrap credentials." >&2
exit 1
}
[[ "${CREATION_PATH}" == "auth/approle/role/cert-issuer/secret-id" ]] || {
echo "ERROR: Unexpected wrapping creation path: ${CREATION_PATH}" >&2
exit 1
}

jq -n --arg role_id "${ROLE_ID}" --arg wrapping_token "${WRAPPING_TOKEN}" --arg wrap_ttl "${WRAP_TTL}" --arg creation_path "${CREATION_PATH}" '{role_id:$role_id,wrapping_token:$wrapping_token,wrap_ttl:$wrap_ttl,creation_path:$creation_path}'
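The helper binds both the SecretID and the issued tokens to 192.168.8.3/32, so Vault accepts them only from prod-cert-01's address. The sketch below shows the containment test that a CIDR binding implies. It is useful when reasoning about whether a wider range such as 192.168.8.0/24 would also admit other hosts. ipv4_to_int and ipv4_in_cidr are hypothetical helpers. They handle IPv4 only and do not reproduce Vault's implementation:

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a dotted IPv4 address to a 32-bit integer.
ipv4_to_int() {
  local IFS=.
  local -a octets
  local octet
  read -r -a octets <<< "$1"
  (( ${#octets[@]} == 4 )) || return 1
  for octet in "${octets[@]}"; do
    [[ "${octet}" =~ ^[0-9]{1,3}$ ]] || return 1
    (( 10#${octet} <= 255 )) || return 1
  done
  echo "$(( (10#${octets[0]} << 24) | (10#${octets[1]} << 16) | (10#${octets[2]} << 8) | 10#${octets[3]} ))"
}

# Hypothetical helper: return 0 if the address is inside the CIDR block,
# 1 if it is outside, and 2 if either argument is invalid.
ipv4_in_cidr() {
  local ip="$1" cidr="$2"
  local network="${cidr%/*}" prefix="${cidr#*/}"
  local ip_int net_int mask
  [[ "${cidr}" == */* ]] || return 2
  [[ "${prefix}" =~ ^([0-9]|[12][0-9]|3[0-2])$ ]] || return 2
  ip_int="$(ipv4_to_int "${ip}")" || return 2
  net_int="$(ipv4_to_int "${network}")" || return 2
  # Build the network mask in the low 32 bits of bash's 64-bit arithmetic.
  mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  (( (ip_int & mask) == (net_int & mask) ))
}
```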

Validate the installed helper without displaying secrets:


sudo bash -n /usr/local/sbin/ac-vault-prepare-cert-issuer
sudo stat -c '%n owner=%U group=%G mode=%a size=%s' /usr/local/sbin/ac-vault-prepare-cert-issuer
sudo test -x /usr/local/sbin/ac-vault-prepare-cert-issuer && echo "PASS: preparation utility is executable"

30.3 Apply the corrected policies

Update the repository copies of platform-admin.hcl and cert-issuer.hcl with the ACME stanzas shown in section 29, deploy them to /usr/local/share/ac-vault/policies, then authenticate directly on prod-vault-01:


sudo -H env VAULT_ADDR="https://192.168.8.2:8200" VAULT_CACERT="/opt/vault/tls/vault-cert.pem" vault login -method=userpass username=manoj-admin

sudo -H env VAULT_ADDR="https://192.168.8.2:8200" VAULT_CACERT="/opt/vault/tls/vault-cert.pem" vault policy write platform-admin /usr/local/share/ac-vault/policies/platform-admin.hcl

sudo -H env VAULT_ADDR="https://192.168.8.2:8200" VAULT_CACERT="/opt/vault/tls/vault-cert.pem" vault policy write cert-issuer /usr/local/share/ac-vault/policies/cert-issuer.hcl

sudo -H env VAULT_ADDR="https://192.168.8.2:8200" VAULT_CACERT="/opt/vault/tls/vault-cert.pem" vault token capabilities acme/config

Required acme/config capabilities:


create
read
update
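vault token capabilities prints a comma-separated list such as create, read, update, or deny when nothing is granted. The sketch below turns that output into a pass/fail check. has_capabilities is a hypothetical helper and treats root as granting every capability:

```shell
#!/usr/bin/env bash
# Hypothetical helper: $1 is the output of "vault token capabilities <path>";
# the remaining arguments are required capabilities. Returns 1 and names the
# first missing capability if any is absent.
has_capabilities() {
  local actual="$1"
  shift
  local -a granted
  local required cap found
  IFS=', ' read -r -a granted <<< "${actual}"
  for required in "$@"; do
    found=0
    for cap in "${granted[@]}"; do
      if [[ "${cap}" == "${required}" || "${cap}" == "root" ]]; then
        found=1
        break
      fi
    done
    if (( found == 0 )); then
      echo "MISSING: ${required}" >&2
      return 1
    fi
  done
}

# Example after the policy repair:
#   caps="$(sudo -H env VAULT_ADDR="https://192.168.8.2:8200" \
#     VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
#     vault token capabilities acme/config)"
#   has_capabilities "${caps}" create read update && echo "PASS: acme/config writable"
```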

30.4 Rerun the helper directly on prod-vault-01

Use a Cloudflare API token restricted to aspireclan.com with Zone:DNS:Edit and Zone:Zone:Read. The password and token prompts are hidden and do not enter shell history:


bash <<'BASH'
set -Eeuo pipefail
umask 077
sudo -v

read -r -p "Vault administrator username [manoj-admin]: " ADMIN_USERNAME </dev/tty
ADMIN_USERNAME="${ADMIN_USERNAME:-manoj-admin}"
read -r -s -p "Vault administrator password: " ADMIN_PASSWORD </dev/tty
echo
read -r -s -p "Restricted Cloudflare DNS API token: " CLOUDFLARE_API_TOKEN </dev/tty
echo

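[[ "${ADMIN_USERNAME}" =~ ^[A-Za-z0-9._-]+$ ]] || { echo "ERROR: invalid administrator username." >&2; exit 1; }
[[ -n "${ADMIN_PASSWORD}" ]] || { echo "ERROR: administrator password is required." >&2; exit 1; }
[[ ${#CLOUDFLARE_API_TOKEN} -ge 20 ]] || { echo "ERROR: Cloudflare API token is shorter than expected." >&2; exit 1; }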

PAYLOAD="$(
jq -n \
  --arg admin_username "${ADMIN_USERNAME}" \
  --arg admin_password "${ADMIN_PASSWORD}" \
  --arg cloudflare_api_token "${CLOUDFLARE_API_TOKEN}" \
  --arg wrap_ttl "30m" \
  '{admin_username:$admin_username,admin_password:$admin_password,cloudflare_api_token:$cloudflare_api_token,wrap_ttl:$wrap_ttl}'
)"

unset ADMIN_PASSWORD CLOUDFLARE_API_TOKEN
RESULT="$(printf '%s' "${PAYLOAD}" | sudo -n /usr/local/sbin/ac-vault-prepare-cert-issuer)"
unset PAYLOAD

jq -e '.role_id != null and .wrapping_token != null and .creation_path == "auth/approle/role/cert-issuer/secret-id"' <<<"${RESULT}" >/dev/null

echo "===== ROLE ID ====="
jq -r '.role_id' <<<"${RESULT}"
echo "===== ONE-USE WRAPPING TOKEN ====="
jq -r '.wrapping_token' <<<"${RESULT}"
echo "IMPORTANT: use the wrapping token within 30 minutes."
BASH

The successful helper run must occur only once per credential-delivery attempt. Copy the Role ID and one-use wrapping token directly into the prod-cert-01 PuTTY session. Do not save them in Git, email, chat, or an ordinary text file.
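The 30-minute wrap TTL starts when Vault issues the wrapped response. Printing an approximate absolute deadline next to the token can therefore help during the PuTTY hand-off. Below is a minimal sketch using GNU date. The name wrap_deadline_utc is hypothetical, and the deadline is only as accurate as the local clock:

```shell
# Hypothetical helper: print the UTC time at which a wrap TTL expires.
# $1 = issue time as Unix epoch seconds, $2 = TTL in minutes (the helper uses 30).
wrap_deadline_utc() {
  local issued_epoch="$1"
  local ttl_minutes="$2"
  # GNU date accepts "@<epoch>"; -u keeps the result in UTC.
  date -u -d "@$(( issued_epoch + ttl_minutes * 60 ))" '+%Y-%m-%dT%H:%M:%SZ'
}

# Usage immediately after the helper prints the token:
#   echo "Wrapping token expires at about: $(wrap_deadline_utc "$(date +%s)" 30)"
```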

30.5 Bootstrap the wrapped credential on prod-cert-01

The Vault certificate is already installed at /etc/aspireclan/cert-issuer/vault-ca.pem. Its parent directory is restricted, so run TLS and issuer commands as ac-cert-issuer or root.


bash <<'BASH'
set -Eeuo pipefail
umask 077
CONFIG="/etc/aspireclan/cert-issuer/config.yml"
ISSUER="/usr/local/sbin/ac-cert-issuer"
sudo -v

sudo test -s /etc/aspireclan/cert-issuer/vault-ca.pem
sudo test ! -e /etc/aspireclan/cert-issuer/approle/role_id
sudo test ! -e /etc/aspireclan/cert-issuer/approle/secret_id

read -r -p "Paste the Vault Role ID: " ROLE_ID </dev/tty
read -r -s -p "Paste the one-use Vault wrapping token: " WRAPPING_TOKEN </dev/tty
echo

[[ -n "${ROLE_ID}" && -n "${WRAPPING_TOKEN}" ]]
printf '%s' "${WRAPPING_TOKEN}" | sudo -n "${ISSUER}"   --config "${CONFIG}"   bootstrap-approle   --role-id "${ROLE_ID}"

unset ROLE_ID WRAPPING_TOKEN
echo "PASS: AppRole bootstrap completed."
BASH

Verify without printing credential contents:


sudo stat -c '%n owner=%U group=%G mode=%a size=%s' /etc/aspireclan/cert-issuer/approle/role_id /etc/aspireclan/cert-issuer/approle/secret_id

sudo /usr/sbin/runuser --user ac-cert-issuer -- /usr/bin/env HOME=/var/lib/ac-cert-issuer /usr/local/sbin/ac-cert-issuer --config /etc/aspireclan/cert-issuer/config.yml preflight

Expected while all certificate declarations remain disabled:


Vault AppRole login succeeded
No certificate groups are enabled

Only after preflight passes may the timer be enabled:


sudo systemctl daemon-reload
sudo systemctl enable --now ac-cert-issuer.timer
sudo systemctl start ac-cert-issuer.service
sudo systemctl is-enabled ac-cert-issuer.timer
sudo systemctl is-active ac-cert-issuer.timer
sudo systemctl list-timers ac-cert-issuer.timer --all --no-pager
sudo journalctl -u ac-cert-issuer.service -n 100 --no-pager

31. Configure Raft Snapshot Backups

Current backup implementation status — validated on 2026-06-14

The dedicated off-host Vault backup path is now implemented with the following approved design:

  • Dedicated bucket: aspireclan-prod-vault-raft-backups-425389089086-us-east-1.
  • Encryption: SSE-S3 using AES256; no customer-managed KMS key and no KMS monthly key charge.
  • S3 Versioning: enabled.
  • S3 Object Lock: enabled in Governance mode.
  • Public access: fully blocked.
  • Object ownership: BucketOwnerEnforced; ACLs disabled.
  • Lifecycle classes: hourly, daily, and monthly.
  • Upload identity: dedicated IAM user prod-vault-01-raft-backup with no S3 delete permission.
  • Vault identity: dedicated raft-snapshot policy and AppRole.
  • Execution: root-only script plus systemd services and timers.
  • Failure handling: failed local staging directories are retained for about seven days (pruned by later backup runs), and OnFailure= invokes the SNS alert service.
  • First hourly snapshot: uploaded successfully to S3.
  • Verified S3 metadata: AES256, a non-empty version ID, Governance Object Lock, and retention through 2026-06-17T13:30:19+00:00 for the first confirmed object.
  • Snapshot download from S3: successful.
  • Remaining confirmations: downloaded checksum validation, scheduled timer enablement and observation, CloudWatch stale-backup alarm test, and isolated snapshot restore.

31.1 Approved backup schedule and retention


Backup class   Schedule                               Object retention   Lifecycle expiration
hourly         every hour at minute 17                3 days             after 4 days
daily          every day at 00:37 UTC                 35 days            after 36 days
monthly        first day of each month at 01:07 UTC   365 days           after 366 days

Each timer adds a randomized start delay of up to two minutes (RandomizedDelaySec=2min).
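The snapshot script converts the retention column into an Object Lock retain-until date using date -u -d "+N days". Below is a minimal sketch that reproduces the table and that calculation. It uses epoch arithmetic, which gives the same result in UTC. The name retention_for_class is hypothetical. For the first confirmed hourly object, created at 20260614T133019Z, it yields the recorded retention of 2026-06-17T13:30:19:

```shell
# Hypothetical helper mirroring the retention table and the script's RETAIN_UNTIL.
# Prints "<retain-until> <lifecycle-expiration-days>".
# $1 = backup class, $2 = snapshot time as "YYYY-MM-DD HH:MM:SS" in UTC.
retention_for_class() {
  local class="$1"
  local created_utc="$2"
  local retention_days expiration_days created_epoch
  case "${class}" in
    hourly)  retention_days=3;   expiration_days=4 ;;
    daily)   retention_days=35;  expiration_days=36 ;;
    monthly) retention_days=365; expiration_days=366 ;;
    *) echo "ERROR: unknown backup class ${class}" >&2; return 64 ;;
  esac
  # Lifecycle expiration must fall after the Object Lock retention ends.
  (( expiration_days > retention_days )) || return 1
  created_epoch="$(date -u -d "${created_utc}" +%s)" || return 1
  # In UTC, "+N days" is exactly N * 86400 seconds (no DST shifts).
  printf '%s %s\n' \
    "$(date -u -d "@$(( created_epoch + retention_days * 86400 ))" '+%Y-%m-%dT%H:%M:%SZ')" \
    "${expiration_days}"
}
```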

S3 object prefix:
prod-vault-01/<backup-class>/<YYYY>/<MM>/<DD>/

Example objects:
prod-vault-01/hourly/2026/06/14/prod-vault-01-20260614T133019Z.snap
prod-vault-01/hourly/2026/06/14/prod-vault-01-20260614T133019Z.snap.sha256
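Here is a minimal sketch of the key layout. It mirrors the substring logic the snapshot script uses to split the UTC stamp into year, month, and day. The name snapshot_key is hypothetical. The checksum key is the same string with .sha256 appended:

```shell
# Hypothetical helper reproducing the key layout:
# <server>/<class>/<YYYY>/<MM>/<DD>/<server>-<stamp>.snap
# $1 = server, $2 = backup class, $3 = UTC stamp as YYYYMMDDTHHMMSSZ.
snapshot_key() {
  local server="$1" class="$2" stamp="$3"
  # Reject anything that is not the script's date format.
  [[ "${stamp}" =~ ^[0-9]{8}T[0-9]{6}Z$ ]] || return 1
  printf '%s/%s/%s/%s/%s/%s-%s.snap\n' \
    "${server}" "${class}" \
    "${stamp:0:4}" "${stamp:4:2}" "${stamp:6:2}" \
    "${server}" "${stamp}"
}
```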

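The hourly recovery-point objective is approximately one hour while Vault is running and unsealed. A Shamir-sealed Vault cannot create a snapshot. After every VM reboot or Vault service restart, submit three unseal-key shares before scheduled backups can resume. The timers use Persistent=true, so a run missed while the VM was down starts shortly after boot. If Vault is still sealed at that point, the run fails and sends an SNS failure alert.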

31.2 Create and harden the dedicated S3 bucket

Run this subsection from AWS CloudShell or another AWS-administrative terminal. Never store or use broad AWS administrative credentials on prod-vault-01.

Set the values, replacing the alert email address:


export AWS_PAGER=""
export AWS_REGION="us-east-1"
export AWS_ACCOUNT_ID="425389089086"
export VAULT_BACKUP_BUCKET="aspireclan-prod-vault-raft-backups-425389089086-us-east-1"
export IAM_USER="prod-vault-01-raft-backup"
export IAM_POLICY_NAME="prod-vault-01-raft-backup"
export SNS_TOPIC_NAME="aspireclan-prod-vault-backup-alerts"
export ALERT_EMAIL="<YOUR_ALERT_EMAIL_ADDRESS>"

aws sts get-caller-identity

CURRENT_ACCOUNT_ID="$(
aws sts get-caller-identity     --query Account     --output text
)"

test "${CURRENT_ACCOUNT_ID}" = "${AWS_ACCOUNT_ID}" || {
echo "ERROR: Connected to ${CURRENT_ACCOUNT_ID}; expected ${AWS_ACCOUNT_ID}." >&2
exit 1
}

Create the bucket once with Object Lock enabled. If head-bucket succeeds, do not rerun create-bucket:


if aws s3api head-bucket   --bucket "${VAULT_BACKUP_BUCKET}"   2>/dev/null
then
echo "NOTICE: Bucket already exists."
else
aws s3api create-bucket     --bucket "${VAULT_BACKUP_BUCKET}"     --region "${AWS_REGION}"     --object-lock-enabled-for-bucket
fi

aws s3api put-bucket-ownership-controls   --bucket "${VAULT_BACKUP_BUCKET}"   --ownership-controls     'Rules=[{ObjectOwnership=BucketOwnerEnforced}]'

aws s3api put-bucket-versioning   --bucket "${VAULT_BACKUP_BUCKET}"   --versioning-configuration Status=Enabled

aws s3api put-public-access-block   --bucket "${VAULT_BACKUP_BUCKET}"   --public-access-block-configuration     '{
    "BlockPublicAcls": true,
    "IgnorePublicAcls": true,
    "BlockPublicPolicy": true,
    "RestrictPublicBuckets": true
  }' 

Configure SSE-S3 explicitly. This is the approved low-cost design; do not create a customer-managed KMS key for this bucket:


cat > /tmp/prod-vault-backup-encryption.json <<'EOF'
{
"Rules": [
  {
    "ApplyServerSideEncryptionByDefault": {
      "SSEAlgorithm": "AES256"
    }
  }
]
}
EOF

aws s3api put-bucket-encryption   --bucket "${VAULT_BACKUP_BUCKET}"   --server-side-encryption-configuration     file:///tmp/prod-vault-backup-encryption.json

Configure default Object Lock retention:


cat > /tmp/prod-vault-backup-object-lock.json <<'EOF'
{
"ObjectLockEnabled": "Enabled",
"Rule": {
  "DefaultRetention": {
    "Mode": "GOVERNANCE",
    "Days": 3
  }
}
}
EOF

aws s3api put-object-lock-configuration   --bucket "${VAULT_BACKUP_BUCKET}"   --object-lock-configuration     file:///tmp/prod-vault-backup-object-lock.json

Configure lifecycle retention:


cat > /tmp/prod-vault-backup-lifecycle.json <<'EOF'
{
"Rules": [
  {
    "ID": "ExpireHourlyVaultSnapshots",
    "Status": "Enabled",
    "Filter": {"Prefix": "prod-vault-01/hourly/"},
    "Expiration": {"Days": 4},
    "NoncurrentVersionExpiration": {"NoncurrentDays": 1},
    "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 1}
  },
  {
    "ID": "ExpireDailyVaultSnapshots",
    "Status": "Enabled",
    "Filter": {"Prefix": "prod-vault-01/daily/"},
    "Expiration": {"Days": 36},
    "NoncurrentVersionExpiration": {"NoncurrentDays": 1},
    "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 1}
  },
  {
    "ID": "ExpireMonthlyVaultSnapshots",
    "Status": "Enabled",
    "Filter": {"Prefix": "prod-vault-01/monthly/"},
    "Expiration": {"Days": 366},
    "NoncurrentVersionExpiration": {"NoncurrentDays": 1},
    "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 1}
  }
]
}
EOF

aws s3api put-bucket-lifecycle-configuration   --bucket "${VAULT_BACKUP_BUCKET}"   --lifecycle-configuration     file:///tmp/prod-vault-backup-lifecycle.json

Require HTTPS and explicit SSE-S3 on every upload:


cat > /tmp/prod-vault-backup-bucket-policy.json <<EOF
{
"Version": "2012-10-17",
"Statement": [
  {
    "Sid": "DenyInsecureTransport",
    "Effect": "Deny",
    "Principal": "*",
    "Action": "s3:*",
    "Resource": [
      "arn:aws:s3:::${VAULT_BACKUP_BUCKET}",
      "arn:aws:s3:::${VAULT_BACKUP_BUCKET}/*"
    ],
    "Condition": {
      "Bool": {"aws:SecureTransport": "false"}
    }
  },
  {
    "Sid": "DenyUploadsWithoutSseS3",
    "Effect": "Deny",
    "Principal": "*",
    "Action": "s3:PutObject",
    "Resource": "arn:aws:s3:::${VAULT_BACKUP_BUCKET}/*",
    "Condition": {
      "StringNotEquals": {
        "s3:x-amz-server-side-encryption": "AES256"
      }
    }
  }
]
}
EOF

aws s3api put-bucket-policy   --bucket "${VAULT_BACKUP_BUCKET}"   --policy file:///tmp/prod-vault-backup-bucket-policy.json

aws s3api put-bucket-tagging   --bucket "${VAULT_BACKUP_BUCKET}"   --tagging '{
  "TagSet": [
    {"Key": "Environment", "Value": "prod"},
    {"Key": "Component", "Value": "vault"},
    {"Key": "Purpose", "Value": "raft-backup"},
    {"Key": "ManagedBy", "Value": "manual-bootstrap"}
  ]
}' 

31.3 Create SNS and the dedicated upload-only IAM identity

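Create the notification topic and subscribe the alert address. AWS then sends a confirmation email to ALERT_EMAIL; open it and confirm the subscription. Until you confirm it, list-subscriptions-by-topic shows PendingConfirmation instead of a subscription ARN, and no alerts are delivered: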


SNS_TOPIC_ARN="$(
aws sns create-topic     --region "${AWS_REGION}"     --name "${SNS_TOPIC_NAME}"     --query TopicArn     --output text
)"

aws sns subscribe   --region "${AWS_REGION}"   --topic-arn "${SNS_TOPIC_ARN}"   --protocol email   --notification-endpoint "${ALERT_EMAIL}"

aws sns list-subscriptions-by-topic   --region "${AWS_REGION}"   --topic-arn "${SNS_TOPIC_ARN}"   --output table

Create the uploader and its least-privilege policy. It intentionally has no S3 delete, governance-bypass, bucket-policy, lifecycle, versioning, or KMS permissions:


aws iam create-user   --user-name "${IAM_USER}"   --tags     Key=Environment,Value=prod     Key=Component,Value=vault     Key=Purpose,Value=raft-backup

cat > /tmp/prod-vault-backup-uploader-policy.json <<EOF
{
"Version": "2012-10-17",
"Statement": [
  {
    "Sid": "ReadBucketLocation",
    "Effect": "Allow",
    "Action": ["s3:GetBucketLocation"],
    "Resource": "arn:aws:s3:::${VAULT_BACKUP_BUCKET}"
  },
  {
    "Sid": "ListVaultBackupPrefix",
    "Effect": "Allow",
    "Action": ["s3:ListBucket"],
    "Resource": "arn:aws:s3:::${VAULT_BACKUP_BUCKET}",
    "Condition": {
      "StringLike": {
        "s3:prefix": ["prod-vault-01", "prod-vault-01/*"]
      }
    }
  },
  {
    "Sid": "UploadAndVerifyVaultSnapshots",
    "Effect": "Allow",
    "Action": [
      "s3:PutObject",
      "s3:PutObjectRetention",
      "s3:GetObject",
      "s3:GetObjectVersion",
      "s3:GetObjectRetention"
    ],
    "Resource": "arn:aws:s3:::${VAULT_BACKUP_BUCKET}/prod-vault-01/*"
  },
  {
    "Sid": "PublishVaultBackupAlerts",
    "Effect": "Allow",
    "Action": ["sns:Publish"],
    "Resource": "${SNS_TOPIC_ARN}"
  },
  {
    "Sid": "PublishVaultBackupSuccessMetric",
    "Effect": "Allow",
    "Action": ["cloudwatch:PutMetricData"],
    "Resource": "*"
  }
]
}
EOF

aws iam put-user-policy   --user-name "${IAM_USER}"   --policy-name "${IAM_POLICY_NAME}"   --policy-document     file:///tmp/prod-vault-backup-uploader-policy.json

umask 077
aws iam create-access-key   --user-name "${IAM_USER}"   > "${HOME}/prod-vault-01-raft-backup-access-key.json"

chmod 0600   "${HOME}/prod-vault-01-raft-backup-access-key.json"

Validate the AWS resources:


aws s3api get-bucket-versioning   --bucket "${VAULT_BACKUP_BUCKET}"

aws s3api get-public-access-block   --bucket "${VAULT_BACKUP_BUCKET}"

aws s3api get-bucket-ownership-controls   --bucket "${VAULT_BACKUP_BUCKET}"

aws s3api get-bucket-encryption   --bucket "${VAULT_BACKUP_BUCKET}"

aws s3api get-object-lock-configuration   --bucket "${VAULT_BACKUP_BUCKET}"

aws s3api get-bucket-lifecycle-configuration   --bucket "${VAULT_BACKUP_BUCKET}"

aws s3api get-bucket-policy-status   --bucket "${VAULT_BACKUP_BUCKET}"

Required state:


Versioning: Enabled
Public access: all four controls true
Ownership: BucketOwnerEnforced
Encryption: AES256
Object Lock: Enabled
Default retention: GOVERNANCE, 3 days
Bucket policy status: IsPublic false

31.4 Install AWS CLI and protected credentials on prod-vault-01

Connect to prod-vault-01, then install the AWS CLI and required tools:


sudo apt-get update
sudo apt-get install -y   ca-certificates   curl   unzip   jq   util-linux

sudo bash <<'BASH'
set -euo pipefail
work_dir="$(mktemp -d)"
trap 'rm -rf "${work_dir}"' EXIT

case "$(uname -m)" in
x86_64)
  installer_url="https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip"
  ;;
aarch64|arm64)
  installer_url="https://awscli.amazonaws.com/awscli-exe-linux-aarch64.zip"
  ;;
*)
  echo "ERROR: Unsupported architecture: $(uname -m)" >&2
  exit 1
  ;;
esac

curl   --fail   --silent   --show-error   --location   "${installer_url}"   --output "${work_dir}/awscliv2.zip"

unzip -q "${work_dir}/awscliv2.zip" -d "${work_dir}"

if command -v aws >/dev/null 2>&1; then
"${work_dir}/aws/install"     --bin-dir /usr/local/bin     --install-dir /usr/local/aws-cli     --update
else
"${work_dir}/aws/install"     --bin-dir /usr/local/bin     --install-dir /usr/local/aws-cli
fi
BASH

sudo /usr/local/bin/aws --version

Store the upload-only access key interactively. Do not place the secret in command arguments, Git, Terraform, Ansible, documentation, or chat:


sudo bash <<'BASH'
set -euo pipefail
umask 077

install -d -o root -g root -m 0700 /root/.aws

read -rp 'AWS access key ID: ' AWS_ACCESS_KEY_ID </dev/tty
read -rsp 'AWS secret access key: ' AWS_SECRET_ACCESS_KEY </dev/tty
echo >/dev/tty

cat > /root/.aws/credentials <<EOF
[vault-backup]
aws_access_key_id = ${AWS_ACCESS_KEY_ID}
aws_secret_access_key = ${AWS_SECRET_ACCESS_KEY}
EOF

cat > /root/.aws/config <<'EOF'
[profile vault-backup]
region = us-east-1
output = json
cli_pager =
EOF

chmod 0600 /root/.aws/credentials /root/.aws/config
unset AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY
BASH

sudo -H aws sts get-caller-identity   --profile vault-backup   --region us-east-1

sudo -H aws s3api get-bucket-location   --bucket aspireclan-prod-vault-raft-backups-425389089086-us-east-1   --profile vault-backup   --region us-east-1

After the identity test succeeds, securely delete the temporary access-key JSON from CloudShell.
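Below is one way to do that. It assumes the file is still at the path used by the create-access-key command above. The name remove_access_key_file is hypothetical. shred overwrites the file before unlinking it, but on journaled or copy-on-write storage the overwrite is best-effort, so only the deletion itself is reliable:

```shell
# Hypothetical helper: remove the CloudShell copy of the uploader access key
# once `aws sts get-caller-identity --profile vault-backup` succeeds on prod-vault-01.
remove_access_key_file() {
  local key_file="${1:-${HOME}/prod-vault-01-raft-backup-access-key.json}"
  if [[ ! -e "${key_file}" ]]; then
    echo "NOTICE: ${key_file} is already absent."
    return 0
  fi
  # Overwrite, then unlink (-u); the overwrite is defence in depth only.
  shred -u -- "${key_file}"
  [[ ! -e "${key_file}" ]]
}
```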

31.5 Configure the local backup environment and Vault AppRole

Create the protected configuration directory and file. The read command prompts for the SNS topic ARN; paste the real TopicArn returned by aws sns create-topic in section 31.3:


sudo install -d   -o root   -g root   -m 0700   /etc/ac-vault-snapshot

read -rp 'Vault backup SNS topic ARN: ' SNS_TOPIC_ARN

sudo tee /etc/ac-vault-snapshot/config.env >/dev/null <<EOF
AWS_ACCOUNT_ID=425389089086
AWS_REGION=us-east-1
AWS_PROFILE=vault-backup
S3_BUCKET=aspireclan-prod-vault-raft-backups-425389089086-us-east-1
SNS_TOPIC_ARN=${SNS_TOPIC_ARN}
VAULT_ADDR=https://192.168.8.2:8200
VAULT_CACERT=/opt/vault/tls/vault-cert.pem
BACKUP_SERVER=prod-vault-01
CLOUDWATCH_NAMESPACE=Aspireclan/VaultBackup
EOF

sudo chown root:root /etc/ac-vault-snapshot/config.env
sudo chmod 0600 /etc/ac-vault-snapshot/config.env
unset SNS_TOPIC_ARN
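config.env is written from whatever is pasted. A quick format check can catch a truncated or wrong-region ARN, either before writing the file or afterwards by reading the value back from it. Below is a minimal sketch. The name valid_backup_topic_arn is hypothetical, and the name pattern covers only standard (non-FIFO) topic names:

```shell
# Hypothetical check for the SNS topic ARN stored in config.env.
valid_backup_topic_arn() {
  local arn="$1"
  local expected_prefix="arn:aws:sns:us-east-1:425389089086:"
  # Region and account must match the backup design.
  [[ "${arn}" == "${expected_prefix}"* ]] || return 1
  # Standard topic names: letters, digits, hyphens, and underscores.
  [[ "${arn#"${expected_prefix}"}" =~ ^[A-Za-z0-9_-]{1,256}$ ]]
}

# Usage after writing the file:
#   valid_backup_topic_arn "$(sudo sed -n 's/^SNS_TOPIC_ARN=//p' /etc/ac-vault-snapshot/config.env)" \
#     || echo "ERROR: SNS_TOPIC_ARN in config.env looks wrong." >&2
```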

Log in as manoj-admin, create the snapshot policy, and create the dedicated AppRole:


sudo -H env   VAULT_ADDR="https://192.168.8.2:8200"   VAULT_CACERT="/opt/vault/tls/vault-cert.pem"   vault login -method=userpass username=manoj-admin

sudo tee /etc/ac-vault-snapshot/raft-snapshot.hcl >/dev/null <<'EOF'
path "sys/storage/raft/snapshot" {
capabilities = ["read", "sudo"]
}

path "sys/health" {
capabilities = ["read"]
}

path "auth/token/lookup-self" {
capabilities = ["read"]
}

path "auth/token/revoke-self" {
capabilities = ["update"]
}
EOF

sudo -H env   VAULT_ADDR="https://192.168.8.2:8200"   VAULT_CACERT="/opt/vault/tls/vault-cert.pem"   vault policy write   raft-snapshot   /etc/ac-vault-snapshot/raft-snapshot.hcl

sudo -H env   VAULT_ADDR="https://192.168.8.2:8200"   VAULT_CACERT="/opt/vault/tls/vault-cert.pem"   vault write auth/approle/role/raft-snapshot     bind_secret_id=true     secret_id_ttl=0     secret_id_num_uses=0     secret_id_bound_cidrs="127.0.0.1/32,192.168.8.2/32"     token_type=service     token_policies="raft-snapshot"     token_no_default_policy=true     token_ttl=15m     token_max_ttl=15m     token_num_uses=0     token_bound_cidrs="127.0.0.1/32,192.168.8.2/32"

sudo -H bash <<'BASH'
set -euo pipefail
umask 077
export VAULT_ADDR="https://192.168.8.2:8200"
export VAULT_CACERT="/opt/vault/tls/vault-cert.pem"

ROLE_ID="$(
vault read     -field=role_id     auth/approle/role/raft-snapshot/role-id
)"

SECRET_ID="$(
vault write     -field=secret_id     -f     auth/approle/role/raft-snapshot/secret-id
)"

cat > /etc/ac-vault-snapshot/approle.env <<EOF
VAULT_ROLE_ID=${ROLE_ID}
VAULT_SECRET_ID=${SECRET_ID}
EOF

chown root:root /etc/ac-vault-snapshot/approle.env
chmod 0600 /etc/ac-vault-snapshot/approle.env
unset ROLE_ID SECRET_ID
BASH

sudo -H env   VAULT_ADDR="https://192.168.8.2:8200"   VAULT_CACERT="/opt/vault/tls/vault-cert.pem"   vault token revoke -self

sudo rm -f /root/.vault-token

Create the protected working directories:


sudo install -d -o root -g root -m 0700 /var/lib/ac-vault-snapshot
sudo install -d -o root -g root -m 0700 /var/lib/ac-vault-snapshot/tmp
sudo install -d -o root -g root -m 0700 /var/lib/ac-vault-snapshot/failed

31.6 Install the corrected SSE-S3 snapshot script

The following is the authoritative installed script. The explicit .sealed == false test is important; do not use .sealed // true. jq's alternative operator // treats Boolean false the same as null. As a result, .sealed // true evaluates to true for an unsealed Vault, and the check would incorrectly report it as sealed.


sudo tee /usr/local/sbin/ac-vault-raft-backup >/dev/null <<'BASH'
#!/usr/bin/env bash
set -Eeuo pipefail
umask 077

BACKUP_CLASS="${1:-}"

case "${BACKUP_CLASS}" in
hourly)
  RETENTION_DAYS=3
  ;;
daily)
  RETENTION_DAYS=35
  ;;
monthly)
  RETENTION_DAYS=365
  ;;
*)
  echo "ERROR: Backup class must be hourly, daily, or monthly." >&2
  exit 64
  ;;
esac

source /etc/ac-vault-snapshot/config.env
source /etc/ac-vault-snapshot/approle.env

export AWS_PROFILE
export AWS_REGION
export VAULT_ADDR
export VAULT_CACERT
export VAULT_CLIENT_TIMEOUT="120s"

exec 9>/run/lock/ac-vault-raft-backup.lock

if ! flock -n 9; then
echo "ERROR: Another Vault snapshot process is running." >&2
exit 75
fi

BASE_DIR="/var/lib/ac-vault-snapshot"
FAILED_DIR="${BASE_DIR}/failed"
RUN_ID="$(date -u '+%Y%m%dT%H%M%SZ')-$$"
RUN_DIR="$(mktemp -d "${BASE_DIR}/tmp/run-${RUN_ID}.XXXXXX")"

SUCCESS=0
VAULT_TOKEN=""
LOGIN_PAYLOAD=""

cleanup() {
rc=$?
trap - EXIT

if [[ -n "${VAULT_TOKEN:-}" ]]; then
  VAULT_TOKEN="${VAULT_TOKEN}" \
    vault token revoke -self \
    >/dev/null 2>&1 || true
fi

unset VAULT_TOKEN VAULT_ROLE_ID VAULT_SECRET_ID

if [[ -n "${LOGIN_PAYLOAD:-}" ]]; then
  rm -f "${LOGIN_PAYLOAD}" || true
fi

if (( SUCCESS == 1 )); then
  rm -rf "${RUN_DIR}" || true
else
  failed_destination="${FAILED_DIR}/$(basename "${RUN_DIR}")"
  if [[ -d "${RUN_DIR}" ]]; then
    mv "${RUN_DIR}" "${failed_destination}" || true
    echo "FAILED SNAPSHOT STAGING RETAINED: ${failed_destination}" >&2
  fi
fi

exit "${rc}"
}

trap cleanup EXIT

find "${FAILED_DIR}" \
-mindepth 1 \
-maxdepth 1 \
-type d \
-mtime +7 \
-exec rm -rf {} + \
2>/dev/null || true

set +e
STATUS_JSON="$(vault status -format=json 2>/dev/null)"
STATUS_RC=$?
set -e

if (( STATUS_RC != 0 )); then
echo "ERROR: Vault is unreachable, sealed, or not ready; status exit code ${STATUS_RC}." >&2
exit 1
fi

if ! jq -e '.initialized == true' <<<"${STATUS_JSON}" >/dev/null; then
echo "ERROR: Vault is not initialized or its status response is invalid." >&2
exit 1
fi

# Do not use '.sealed // true': jq treats false as a fallback value.
if ! jq -e '.sealed == false' <<<"${STATUS_JSON}" >/dev/null; then
echo "ERROR: Vault is sealed or its status response is invalid." >&2
exit 1
fi

LOGIN_PAYLOAD="$(mktemp "${RUN_DIR}/approle-login.XXXXXX.json")"

jq -n \
--arg role_id "${VAULT_ROLE_ID}" \
--arg secret_id "${VAULT_SECRET_ID}" \
'{role_id: $role_id, secret_id: $secret_id}' \
> "${LOGIN_PAYLOAD}"

chmod 0600 "${LOGIN_PAYLOAD}"

LOGIN_JSON="$(
vault write \
  -format=json \
  auth/approle/login \
  @"${LOGIN_PAYLOAD}"
)"

VAULT_TOKEN="$(jq -er '.auth.client_token' <<<"${LOGIN_JSON}")"
export VAULT_TOKEN
unset LOGIN_JSON
rm -f "${LOGIN_PAYLOAD}"
LOGIN_PAYLOAD=""

STAMP="$(date -u '+%Y%m%dT%H%M%SZ')"
YEAR="${STAMP:0:4}"
MONTH="${STAMP:4:2}"
DAY="${STAMP:6:2}"

SNAPSHOT_NAME="${BACKUP_SERVER}-${STAMP}.snap"
CHECKSUM_NAME="${SNAPSHOT_NAME}.sha256"
SNAPSHOT_PATH="${RUN_DIR}/${SNAPSHOT_NAME}"
CHECKSUM_PATH="${RUN_DIR}/${CHECKSUM_NAME}"

S3_PREFIX="${BACKUP_SERVER}/${BACKUP_CLASS}/${YEAR}/${MONTH}/${DAY}"
SNAPSHOT_KEY="${S3_PREFIX}/${SNAPSHOT_NAME}"
CHECKSUM_KEY="${S3_PREFIX}/${CHECKSUM_NAME}"

RETAIN_UNTIL="$(
date \
  -u \
  -d "+${RETENTION_DAYS} days" \
  '+%Y-%m-%dT%H:%M:%SZ'
)"

vault operator raft snapshot save "${SNAPSHOT_PATH}"

test -s "${SNAPSHOT_PATH}" || {
echo "ERROR: Vault produced an empty snapshot file." >&2
exit 1
}

vault operator raft snapshot inspect "${SNAPSHOT_PATH}" >/dev/null

(
cd "${RUN_DIR}"
sha256sum "${SNAPSHOT_NAME}" > "${CHECKSUM_NAME}"
)

chmod 0600 "${SNAPSHOT_PATH}" "${CHECKSUM_PATH}"

upload_and_verify() {
local local_path="$1"
local object_key="$2"
local content_type="$3"
local local_size remote_size put_result head_result version_id

local_size="$(stat -c '%s' "${local_path}")"

put_result="$(
  aws s3api put-object \
    --bucket "${S3_BUCKET}" \
    --key "${object_key}" \
    --body "${local_path}" \
    --content-type "${content_type}" \
    --server-side-encryption AES256 \
    --checksum-algorithm SHA256 \
    --object-lock-mode GOVERNANCE \
    --object-lock-retain-until-date "${RETAIN_UNTIL}" \
    --metadata \
      "vault-node=${BACKUP_SERVER},backup-class=${BACKUP_CLASS},created-utc=${STAMP}" \
    --expected-bucket-owner "${AWS_ACCOUNT_ID}"
)"

head_result="$(
  aws s3api head-object \
    --bucket "${S3_BUCKET}" \
    --key "${object_key}" \
    --expected-bucket-owner "${AWS_ACCOUNT_ID}"
)"

remote_size="$(jq -er '.ContentLength' <<<"${head_result}")"

if [[ "${local_size}" != "${remote_size}" ]]; then
  echo "ERROR: S3 object size mismatch for ${object_key}." >&2
  exit 1
fi

jq -e '
  .ServerSideEncryption == "AES256"
  and .ObjectLockMode == "GOVERNANCE"
  and .ObjectLockRetainUntilDate != null
  and .VersionId != null
' <<<"${head_result}" >/dev/null

version_id="$(jq -r '.VersionId // "unknown"' <<<"${put_result}")"

echo "VERIFIED S3 OBJECT: s3://${S3_BUCKET}/${object_key}"
echo "VERSION ID: ${version_id}"
}

upload_and_verify \
"${SNAPSHOT_PATH}" \
"${SNAPSHOT_KEY}" \
"application/octet-stream"

upload_and_verify \
"${CHECKSUM_PATH}" \
"${CHECKSUM_KEY}" \
"text/plain"

METRIC_DATA="$(
jq -nc \
  --arg server "${BACKUP_SERVER}" \
  --arg class "${BACKUP_CLASS}" \
  '[{
    MetricName: "BackupSuccess",
    Dimensions: [
      {Name: "Server", Value: $server},
      {Name: "Class", Value: $class}
    ],
    Value: 1,
    Unit: "Count"
  }]'
)"

aws cloudwatch put-metric-data \
--namespace "${CLOUDWATCH_NAMESPACE}" \
--metric-data "${METRIC_DATA}"

logger \
-t ac-vault-raft-backup \
-- \
"Vault ${BACKUP_CLASS} snapshot uploaded and verified: s3://${S3_BUCKET}/${SNAPSHOT_KEY}"

echo
echo "PASS: Vault ${BACKUP_CLASS} snapshot uploaded and verified."
echo "SNAPSHOT: s3://${S3_BUCKET}/${SNAPSHOT_KEY}"
echo "CHECKSUM: s3://${S3_BUCKET}/${CHECKSUM_KEY}"
echo "RETAIN UNTIL: ${RETAIN_UNTIL}"

SUCCESS=1
BASH

sudo chown root:root /usr/local/sbin/ac-vault-raft-backup
sudo chmod 0700 /usr/local/sbin/ac-vault-raft-backup
sudo bash -n /usr/local/sbin/ac-vault-raft-backup
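The script's status gate relies on the documented vault status exit codes: 0 when Vault is unsealed, 2 when it is sealed, and 1 on error. The hypothetical helper below, describe_vault_status_rc, makes that mapping explicit. It can be useful when diagnosing a failed run by hand:

```shell
# Hypothetical helper: interpret a `vault status` exit code the way the
# backup script's gate does. Returns 0 only when a snapshot may proceed.
describe_vault_status_rc() {
  case "$1" in
    0) echo "unsealed: snapshot may proceed" ;;
    2) echo "sealed: submit three unseal-key shares first"; return 1 ;;
    *) echo "error: Vault unreachable or not ready (rc=$1)"; return 1 ;;
  esac
}

# Usage:
#   sudo -H env VAULT_ADDR="https://192.168.8.2:8200" \
#     VAULT_CACERT="/opt/vault/tls/vault-cert.pem" vault status >/dev/null
#   describe_vault_status_rc $?
```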

Install the immediate SNS failure-alert script:


sudo tee /usr/local/sbin/ac-vault-backup-alert >/dev/null <<'BASH'
#!/usr/bin/env bash
set -Eeuo pipefail
umask 077

BACKUP_CLASS="${1:-unknown}"
source /etc/ac-vault-snapshot/config.env

export AWS_PROFILE
export AWS_REGION

UNIT_NAME="ac-vault-raft-backup@${BACKUP_CLASS}.service"
HOST_NAME="$(hostname --fqdn 2>/dev/null || hostname)"
FAILED_AT="$(date -u '+%Y-%m-%dT%H:%M:%SZ')"

JOURNAL_EXCERPT="$(
journalctl \
  -u "${UNIT_NAME}" \
  -n 30 \
  --no-pager \
  2>/dev/null || true
)"

MESSAGE="$(cat <<EOF
Aspireclan production Vault Raft backup failed.

Server: ${HOST_NAME}
Backup class: ${BACKUP_CLASS}
UTC time: ${FAILED_AT}
Systemd unit: ${UNIT_NAME}
S3 bucket: ${S3_BUCKET}

Recent journal output:
${JOURNAL_EXCERPT}
EOF
)"

aws sns publish \
--topic-arn "${SNS_TOPIC_ARN}" \
--subject "FAILED: prod-vault-01 ${BACKUP_CLASS} Raft backup" \
--message "${MESSAGE}" \
>/dev/null

logger \
-t ac-vault-backup-alert \
-- \
"Published Vault ${BACKUP_CLASS} backup failure notification."
BASH

sudo chown root:root /usr/local/sbin/ac-vault-backup-alert
sudo chmod 0700 /usr/local/sbin/ac-vault-backup-alert
sudo bash -n /usr/local/sbin/ac-vault-backup-alert

31.7 Install systemd services and timers

Install the backup service with explicit root AWS profile paths:


sudo tee /etc/systemd/system/ac-vault-raft-backup@.service >/dev/null <<'EOF'
[Unit]
Description=Create and upload the prod-vault-01 %i Raft snapshot
Wants=network-online.target
After=network-online.target vault.service
OnFailure=ac-vault-raft-backup-failure@%i.service

[Service]
Type=oneshot
User=root
Group=root
Environment=HOME=/root
Environment=AWS_SHARED_CREDENTIALS_FILE=/root/.aws/credentials
Environment=AWS_CONFIG_FILE=/root/.aws/config
ExecStart=/usr/local/sbin/ac-vault-raft-backup %i
UMask=0077
Nice=10
IOSchedulingClass=best-effort
IOSchedulingPriority=7
PrivateTmp=true
PrivateDevices=true
ProtectSystem=strict
ProtectHome=read-only
ProtectKernelTunables=true
ProtectKernelModules=true
ProtectControlGroups=true
RestrictSUIDSGID=true
LockPersonality=true
NoNewPrivileges=true
ReadOnlyPaths=/etc/ac-vault-snapshot
ReadOnlyPaths=/opt/vault/tls
ReadOnlyPaths=/root/.aws
ReadWritePaths=/var/lib/ac-vault-snapshot
ReadWritePaths=/run/lock
TimeoutStartSec=15min
EOF

sudo tee /etc/systemd/system/ac-vault-raft-backup-failure@.service >/dev/null <<'EOF'
[Unit]
Description=Notify operators that the prod-vault-01 %i Raft snapshot failed
Wants=network-online.target
After=network-online.target

[Service]
Type=oneshot
User=root
Group=root
Environment=HOME=/root
Environment=AWS_SHARED_CREDENTIALS_FILE=/root/.aws/credentials
Environment=AWS_CONFIG_FILE=/root/.aws/config
ExecStart=/usr/local/sbin/ac-vault-backup-alert %i
UMask=0077
PrivateTmp=true
PrivateDevices=true
ProtectSystem=strict
ProtectHome=read-only
ProtectKernelTunables=true
ProtectKernelModules=true
ProtectControlGroups=true
RestrictSUIDSGID=true
LockPersonality=true
NoNewPrivileges=true
ReadOnlyPaths=/etc/ac-vault-snapshot
ReadOnlyPaths=/root/.aws
TimeoutStartSec=2min
EOF

Install the timers:


sudo tee /etc/systemd/system/ac-vault-raft-backup-hourly.timer >/dev/null <<'EOF'
[Unit]
Description=Run the prod-vault-01 hourly Raft snapshot

[Timer]
OnCalendar=*-*-* *:17:00
Persistent=true
AccuracySec=1min
RandomizedDelaySec=2min
Unit=ac-vault-raft-backup@hourly.service

[Install]
WantedBy=timers.target
EOF

sudo tee /etc/systemd/system/ac-vault-raft-backup-daily.timer >/dev/null <<'EOF'
[Unit]
Description=Run the prod-vault-01 daily Raft snapshot

[Timer]
OnCalendar=*-*-* 00:37:00 UTC
Persistent=true
AccuracySec=1min
RandomizedDelaySec=2min
Unit=ac-vault-raft-backup@daily.service

[Install]
WantedBy=timers.target
EOF

sudo tee /etc/systemd/system/ac-vault-raft-backup-monthly.timer >/dev/null <<'EOF'
[Unit]
Description=Run the prod-vault-01 monthly Raft snapshot

[Timer]
OnCalendar=*-*-01 01:07:00 UTC
Persistent=true
AccuracySec=1min
RandomizedDelaySec=2min
Unit=ac-vault-raft-backup@monthly.service

[Install]
WantedBy=timers.target
EOF

sudo systemctl daemon-reload

sudo systemd-analyze verify   /etc/systemd/system/ac-vault-raft-backup@.service   /etc/systemd/system/ac-vault-raft-backup-failure@.service   /etc/systemd/system/ac-vault-raft-backup-hourly.timer   /etc/systemd/system/ac-vault-raft-backup-daily.timer   /etc/systemd/system/ac-vault-raft-backup-monthly.timer

31.8 Run the first backup and verify S3

Confirm Vault is unsealed, then run the script directly so the complete result is visible:


sudo -H env   VAULT_ADDR="https://192.168.8.2:8200"   VAULT_CACERT="/opt/vault/tls/vault-cert.pem"   vault status

sudo /usr/local/sbin/ac-vault-raft-backup hourly

The confirmed first upload produced a snapshot of 46594 bytes with this S3 state:


Encryption: AES256
VersionId: 6BLmKNuyha0hDytMyMhisiSZiLwIfWVJ
ObjectLockMode: GOVERNANCE
RetainUntil: 2026-06-17T13:30:19+00:00
Metadata:
backup-class: hourly
created-utc: 20260614T133019Z
vault-node: prod-vault-01

List the newest objects and populate both shell variables. The checksum variable assignment must not be omitted:


LATEST_SNAPSHOT_KEY="$(
sudo -H aws s3api list-objects-v2     --bucket aspireclan-prod-vault-raft-backups-425389089086-us-east-1     --prefix prod-vault-01/hourly/     --profile vault-backup     --region us-east-1     --query 'sort_by(Contents[?ends_with(Key, `.snap`)], &LastModified)[-1].Key'     --output text
)"

if [[ -z "${LATEST_SNAPSHOT_KEY}" || "${LATEST_SNAPSHOT_KEY}" == "None" ]]; then
echo "ERROR: No hourly snapshot was found in S3." >&2
exit 1
fi

LATEST_CHECKSUM_KEY="${LATEST_SNAPSHOT_KEY}.sha256"

printf 'Snapshot key: %s\n' "${LATEST_SNAPSHOT_KEY}"
printf 'Checksum key: %s\n' "${LATEST_CHECKSUM_KEY}"

sudo -H aws s3api head-object   --bucket aspireclan-prod-vault-raft-backups-425389089086-us-east-1   --key "${LATEST_SNAPSHOT_KEY}"   --profile vault-backup   --region us-east-1   --query '{
  Size:ContentLength,
  Encryption:ServerSideEncryption,
  VersionId:VersionId,
  ObjectLockMode:ObjectLockMode,
  RetainUntil:ObjectLockRetainUntilDate,
  Metadata:Metadata
}' 
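Every key embeds a zero-padded date path and a UTC stamp. The newest snapshot under one server and class prefix is therefore also the lexically greatest .snap key. The sketch below uses that property as an alternative to sorting by LastModified. The name latest_snapshot_key is hypothetical, and the function reads one key per line on stdin:

```shell
# Hypothetical alternative selector: newest .snap key by byte-wise sort order.
# Only valid when all keys share the same <server>/<class>/ prefix.
latest_snapshot_key() {
  grep -E '\.snap$' | LC_ALL=C sort | tail -n 1
}

# Usage (text output separates keys with tabs):
#   sudo -H aws s3api list-objects-v2 --bucket aspireclan-prod-vault-raft-backups-425389089086-us-east-1 \
#     --prefix prod-vault-01/hourly/ --profile vault-backup --region us-east-1 \
#     --query 'Contents[].Key' --output text | tr '\t' '\n' | latest_snapshot_key
```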

Download both objects, validate the checksum, and inspect the snapshot:


VERIFY_DIR="$(
sudo mktemp     -d     /var/lib/ac-vault-snapshot/verify.XXXXXX
)"

sudo -H aws s3api get-object   --bucket aspireclan-prod-vault-raft-backups-425389089086-us-east-1   --key "${LATEST_SNAPSHOT_KEY}"   --profile vault-backup   --region us-east-1   "${VERIFY_DIR}/$(basename "${LATEST_SNAPSHOT_KEY}")"

sudo -H aws s3api get-object   --bucket aspireclan-prod-vault-raft-backups-425389089086-us-east-1   --key "${LATEST_CHECKSUM_KEY}"   --profile vault-backup   --region us-east-1   "${VERIFY_DIR}/$(basename "${LATEST_CHECKSUM_KEY}")"

sudo bash -c "
cd '${VERIFY_DIR}'
sha256sum -c '$(basename "${LATEST_CHECKSUM_KEY}")'
"

sudo vault operator raft snapshot inspect   "${VERIFY_DIR}/$(basename "${LATEST_SNAPSHOT_KEY}")"

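sudo rm -rf "${VERIFY_DIR}"
unset VERIFY_DIR

Verify that the uploader cannot delete a backup. Run this in the same shell so LATEST_SNAPSHOT_KEY is still set. The delete must fail with AccessDenied. Any other error, such as a parameter-validation error from an empty key, does not count as a pass:


if [[ -z "${LATEST_SNAPSHOT_KEY:-}" ]]; then
echo "ERROR: LATEST_SNAPSHOT_KEY is empty; repopulate it before this test." >&2
elif DELETE_OUTPUT="$(sudo -H aws s3api delete-object \
  --bucket aspireclan-prod-vault-raft-backups-425389089086-us-east-1 \
  --key "${LATEST_SNAPSHOT_KEY}" \
  --profile vault-backup \
  --region us-east-1 2>&1)"
then
echo "ERROR: Uploader unexpectedly has delete permission." >&2
exit 1
elif grep -q 'AccessDenied' <<<"${DELETE_OUTPUT}"; then
echo "PASS: Uploader cannot delete Vault backup objects."
else
echo "ERROR: Delete test failed for a reason other than AccessDenied:" >&2
printf '%s\n' "${DELETE_OUTPUT}" >&2
fi

unset DELETE_OUTPUT LATEST_SNAPSHOT_KEY LATEST_CHECKSUM_KEY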
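The downloaded-checksum step works because the script hashes from inside its staging directory. The .sha256 object therefore records only the snapshot basename, and sha256sum -c must run in the directory where the snapshot is saved under that same name. The following self-contained demonstration uses temporary files only. The function name demo_checksum_roundtrip is hypothetical:

```shell
# Hypothetical demonstration of the checksum round trip used by the backup.
# $1 = "clean" (default) or "tamper" to corrupt the downloaded copy.
demo_checksum_roundtrip() {
  local mode="${1:-clean}"
  local name="prod-vault-01-20260614T133019Z.snap"
  local src dst rc
  src="$(mktemp -d)"
  dst="$(mktemp -d)"

  # Stand-in snapshot content; a real run uses the Raft snapshot file.
  head -c 4096 /dev/urandom > "${src}/${name}"

  # Same pattern as the script: hash from inside the staging directory,
  # so the .sha256 file records the bare basename, not an absolute path.
  ( cd "${src}" && sha256sum "${name}" > "${name}.sha256" )

  # Simulate the S3 download: both files keep their original basenames.
  cp "${src}/${name}" "${src}/${name}.sha256" "${dst}/"

  # Optionally corrupt the "downloaded" snapshot to show that -c detects it.
  if [[ "${mode}" == "tamper" ]]; then
    printf 'x' >> "${dst}/${name}"
  fi

  ( cd "${dst}" && sha256sum -c --quiet "${name}.sha256" ) >/dev/null 2>&1
  rc=$?
  rm -rf "${src}" "${dst}"
  return "${rc}"
}
```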

31.9 Enable and observe scheduled backups

Only enable the timers after the manual upload and downloaded checksum verification succeed:


sudo systemctl enable --now \
  ac-vault-raft-backup-hourly.timer \
  ac-vault-raft-backup-daily.timer \
  ac-vault-raft-backup-monthly.timer

sudo systemctl is-enabled \
  ac-vault-raft-backup-hourly.timer \
  ac-vault-raft-backup-daily.timer \
  ac-vault-raft-backup-monthly.timer

sudo systemctl is-active \
  ac-vault-raft-backup-hourly.timer \
  ac-vault-raft-backup-daily.timer \
  ac-vault-raft-backup-monthly.timer

sudo systemctl list-timers 'ac-vault-raft-backup-*' --all
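The is-enabled and is-active calls above print states but give no overall verdict. As a sketch, the same checks can be folded into one function that returns non-zero if any timer is disabled or inactive; `verify_backup_timers` is a hypothetical name, and `--quiet` only suppresses systemctl's printed state so the exit status decides.

```shell
#!/usr/bin/env bash
# Sketch: fail loudly unless all three backup timers are enabled and active.
# The function name is hypothetical; the unit names match the commands above.
verify_backup_timers() {
  local class unit failed=0
  for class in hourly daily monthly; do
    unit="ac-vault-raft-backup-${class}.timer"
    if sudo systemctl is-enabled --quiet "${unit}" &&
      sudo systemctl is-active --quiet "${unit}"; then
      echo "PASS: ${unit} is enabled and active."
    else
      echo "FAIL: ${unit} is not both enabled and active." >&2
      failed=1
    fi
  done
  return "${failed}"
}

# Usage: verify_backup_timers && sudo systemctl list-timers 'ac-vault-raft-backup-*'
```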

Test the immediate failure path without creating a snapshot:


sudo systemctl start ac-vault-raft-backup@invalid-test.service || true

sudo journalctl -u ac-vault-raft-backup@invalid-test.service -n 50 --no-pager

sudo journalctl -u ac-vault-raft-backup-failure@invalid-test.service -n 50 --no-pager

sudo systemctl reset-failed ac-vault-raft-backup@invalid-test.service || true

If a completed failure-alert instance is no longer loaded, systemd can report Unit ... not loaded during reset-failed; that is harmless because the failure-alert oneshot finished successfully and has no failed state to clear.

31.10 Create the independent stale-backup alarm

Run from AWS CloudShell, with SNS_TOPIC_ARN set to the ARN of the backup-notification SNS topic, after a successful hourly metric exists:


aws cloudwatch put-metric-alarm \
  --region us-east-1 \
  --alarm-name "prod-vault-01-hourly-raft-backup-missing" \
  --alarm-description "Alert when prod-vault-01 does not publish a successful hourly Raft backup metric for two consecutive hours." \
  --namespace "Aspireclan/VaultBackup" \
  --metric-name "BackupSuccess" \
  --dimensions Name=Server,Value=prod-vault-01 Name=Class,Value=hourly \
  --statistic Sum \
  --period 3600 \
  --evaluation-periods 2 \
  --datapoints-to-alarm 2 \
  --threshold 1 \
  --comparison-operator LessThanThreshold \
  --treat-missing-data breaching \
  --alarm-actions "${SNS_TOPIC_ARN}" \
  --ok-actions "${SNS_TOPIC_ARN}"

aws cloudwatch describe-alarms \
  --region us-east-1 \
  --alarm-names "prod-vault-01-hourly-raft-backup-missing"

Test the notification path, then return the alarm to OK:


aws cloudwatch set-alarm-state \
  --region us-east-1 \
  --alarm-name "prod-vault-01-hourly-raft-backup-missing" \
  --state-value ALARM \
  --state-reason "Manual notification test after configuring Vault S3 backups"

aws cloudwatch set-alarm-state \
  --region us-east-1 \
  --alarm-name "prod-vault-01-hourly-raft-backup-missing" \
  --state-value OK \
  --state-reason "Manual notification test completed"
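CloudWatch re-evaluates the alarm on its own schedule, so a manually set state may not persist. A sketch that polls the alarm until it reports an expected state is below; the function name, default attempt count, and 5-second interval are assumptions, not part of the configured alarm.

```shell
#!/usr/bin/env bash
# Sketch for AWS CloudShell: poll the stale-backup alarm until it reports the
# expected state. Function name, attempts, and interval are assumptions.
wait_for_alarm_state() {
  local expected="$1" attempts="${2:-12}" state="" i
  for (( i = 1; i <= attempts; i++ )); do
    state="$(
      aws cloudwatch describe-alarms \
        --region us-east-1 \
        --alarm-names "prod-vault-01-hourly-raft-backup-missing" \
        --query 'MetricAlarms[0].StateValue' \
        --output text
    )" || return 1
    if [[ "${state}" == "${expected}" ]]; then
      echo "PASS: alarm state is ${state}."
      return 0
    fi
    sleep 5
  done
  echo "FAIL: alarm state is ${state}, expected ${expected}." >&2
  return 1
}

# Usage:
#   wait_for_alarm_state ALARM   # after forcing ALARM
#   wait_for_alarm_state OK      # after returning it to OK
```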

31.11 Operational validation and remaining resilience gate


Implemented and directly observed:
[✓] Dedicated S3 bucket exists
[✓] SSE-S3 AES256 configured
[✓] S3 Versioning enabled
[✓] S3 Object Lock enabled
[✓] Governance retention applied
[✓] Public access blocked
[✓] BucketOwnerEnforced configured
[✓] Lifecycle rules configured
[✓] Dedicated IAM uploader configured without delete permission
[✓] Snapshot-only Vault policy and AppRole configured
[✓] Corrected Vault sealed-state Boolean test installed
[✓] First hourly snapshot uploaded to S3
[✓] Snapshot object version ID returned
[✓] AES256 and Governance metadata verified
[✓] Snapshot downloaded successfully
[✓] OnFailure service invoked when Vault was sealed

Still requiring operator confirmation:
[ ] Download checksum object and run sha256sum -c successfully
[ ] Inspect the downloaded snapshot successfully
[ ] Confirm uploader delete request is denied
[ ] Enable and observe hourly, daily, and monthly timers
[ ] Confirm SNS email delivery for an intentional failure
[ ] Create and test the CloudWatch stale-backup alarm
[ ] Complete the isolated snapshot-restore exercise

Do not perform the restore exercise on prod-vault-01. Restore only to an isolated recovery VM with production DNS, clients, and external credential systems blocked.


32. Reboot and Manual Unseal Procedure

The reboot and three-share manual-unseal procedure was tested successfully on prod-vault-01.

Before reboot, confirm that at least three shares are immediately available and that a fresh snapshot has been copied to an approved off-host destination.

Reboot:


sudo reboot

Reconnect and validate the service and sealed state:


ssh acllc@192.168.8.2

sudo systemctl is-active vault

sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault status

Expected immediately after reboot:


Initialized    true
Sealed         true
Storage Type   raft

Submit any three distinct shares interactively:


sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault operator unseal

sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault operator unseal

sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault operator unseal

Verify:


sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault status

Expected normal operating state:


Initialized    true
Sealed         false
Storage Type   raft

Test the named administrator after reboot:


sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault login -method=userpass username=manoj-admin

sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault token lookup

sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault audit list

sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault secrets list

sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault operator raft list-peers

sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault kv get cert/prod/infra/_schema

sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault token revoke -self

sudo rm -f /root/.vault-token
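The login, checks, revoke, and token-file removal above belong together: if a check fails partway, the cleanup still has to run. A sketch that enforces that ordering is below; `vault_local` and `run_admin_checks` are hypothetical names, and the checks are the ones listed above.

```shell
#!/usr/bin/env bash
# Sketch: run the post-reboot administrator checks and always revoke the
# token afterwards, even when a check fails. Helper names are hypothetical.
vault_local() {
  sudo -H env \
    VAULT_ADDR="https://192.168.8.2:8200" \
    VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
    vault "$@"
}

run_admin_checks() {
  local rc=0

  # -no-print suppresses the login summary; the CLI still stores the token
  # in /root/.vault-token because sudo -H sets HOME=/root.
  vault_local login -no-print -method=userpass username=manoj-admin || return 1

  # Only the exit status matters here; lookup output includes the token ID.
  vault_local token lookup >/dev/null || rc=1
  vault_local audit list || rc=1
  vault_local secrets list || rc=1
  vault_local operator raft list-peers || rc=1
  vault_local kv get cert/prod/infra/_schema || rc=1

  # Cleanup runs regardless of the check results above.
  vault_local token revoke -self || rc=1
  sudo rm -f /root/.vault-token
  return "${rc}"
}
```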

Never script the shares into systemd, cron, cloud-init, Terraform, Ansible, or local plaintext files.


33. Vault UI Access

The UI is enabled by ui = true and runs on the same HTTPS listener as the API.

Open from an internal browser:


https://192.168.8.2:8200/ui

https://vault.aspireclan.com:8200/ui

The bootstrap certificate is self-signed, so the browser will show a trust warning until the certificate is replaced or explicitly trusted.

State-dependent behavior:

  • Uninitialized rebuild: initialization screen.
  • Initialized but sealed: unseal screen.
  • Initialized and unsealed: login screen.

The current production instance is initialized, unsealed during normal operation, and has its initial root token revoked. Use:


Authentication method: Username
Mount path: userpass
Username: manoj-admin
Password: <YOUR_STRONG_ADMIN_PASSWORD>

Do not expose TCP 8200 to the public internet.


34. Complete Validation Checklist

Run the Terraform portion from the Terraform control host or repository checkout, not from prod-vault-01:


cd envs/prod
terraform init -input=false -reconfigure
terraform validate
terraform plan \
-input=false \
-lock-timeout=5m \
-var-file=terraform.tfvars \
-var-file=web.tfvars \
-var-file=app.tfvars \
-var-file=db.tfvars \
-var-file=k8s.tfvars \
-var-file=runner.tfvars \
-var-file=vault.tfvars
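Implementation step 6 requires confirming that Terraform reports no destructive action. As a sketch, `terraform plan -detailed-exitcode` (0 = no changes, 1 = error, 2 = changes present) can be combined with a parse of the plan summary so that any destroy, including a replacement, fails the check. The function names are hypothetical, and the parsing assumes the `Plan: N to add, N to change, N to destroy.` wording of plain-text plan output.

```shell
#!/usr/bin/env bash
# Sketch: run the production plan and fail if anything would be destroyed.
# Run from envs/prod after terraform init. Function names are hypothetical.

# Print the destroy count from a plain-text plan summary read on stdin.
# Prints nothing when there is no "Plan:" line (for example "No changes.").
plan_destroy_count() {
  sed -n '/^Plan:/ s/.* \([0-9][0-9]*\) to destroy\..*/\1/p' | tail -n 1
}

prod_plan_no_destroy() {
  local output rc count
  if output="$(
    terraform plan \
      -input=false \
      -lock-timeout=5m \
      -no-color \
      -detailed-exitcode \
      -var-file=terraform.tfvars \
      -var-file=web.tfvars \
      -var-file=app.tfvars \
      -var-file=db.tfvars \
      -var-file=k8s.tfvars \
      -var-file=runner.tfvars \
      -var-file=vault.tfvars 2>&1
  )"; then
    rc=0
  else
    rc=$?
  fi
  printf '%s\n' "${output}"

  case "${rc}" in
    0) echo "PASS: no changes." ;;
    2)
      count="$(printf '%s\n' "${output}" | plan_destroy_count)"
      if [[ "${count:-0}" -ne 0 ]]; then
        echo "FAIL: plan would destroy ${count} resource(s)." >&2
        return 1
      fi
      echo "PASS: changes present, none destructive."
      ;;
    *)
      echo "FAIL: terraform plan exited ${rc}." >&2
      return 1
      ;;
  esac
}
```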

Run the following direct-server checklist on prod-vault-01; privileged checks consistently use sudo:


# VM identity and networking
sudo hostnamectl --static
sudo ip -brief address
sudo ip route
sudo resolvectl status
sudo getent ahostsv4 vault.aspireclan.com

# Host hardening and services
sudo swapon --show
sudo ufw status verbose
sudo systemctl is-active ssh qemu-guest-agent vault

# TLS files, certificate, and ports
sudo test -s /opt/vault/tls/vault-key.pem &&
echo "PASS: TLS private key exists."

sudo test -s /opt/vault/tls/vault-cert.pem &&
echo "PASS: TLS certificate exists."

sudo openssl x509 \
-in /opt/vault/tls/vault-cert.pem \
-noout \
-subject \
-issuer \
-dates \
-ext subjectAltName

sudo ss -lntp | grep -E ':(8200|8201)\b'

# Vault state
sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault status

Log in as the named administrator:


sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault login -method=userpass username=manoj-admin

Validate the logical baseline and backup files:


sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault token lookup

sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault audit list -detailed

sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault auth list -detailed

sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault secrets list -detailed

sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault policy list

sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault read cert/config

sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault operator raft list-peers

sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault kv get cert/prod/infra/_schema

sudo tail -n 20 /var/log/vault/audit.log
sudo logrotate -d /etc/logrotate.d/vault-audit
sudo ls -lh /var/backups/vault

sudo bash -c '
cd /var/backups/vault
for checksum in *.sha256; do
  [ -e "$checksum" ] || continue
  sha256sum -c "$checksum"
done
'

Validate the certificate-issuer handoff state without displaying secrets:


sudo test -x /usr/local/sbin/ac-vault-prepare-cert-issuer &&
echo "PASS: issuer preparation utility installed"

sudo -H env VAULT_ADDR="https://192.168.8.2:8200" VAULT_CACERT="/opt/vault/tls/vault-cert.pem" vault secrets list -detailed

sudo -H env VAULT_ADDR="https://192.168.8.2:8200" VAULT_CACERT="/opt/vault/tls/vault-cert.pem" vault token capabilities acme/config

sudo -H env VAULT_ADDR="https://192.168.8.2:8200" VAULT_CACERT="/opt/vault/tls/vault-cert.pem" vault read auth/approle/role/cert-issuer

sudo -H env VAULT_ADDR="https://192.168.8.2:8200" VAULT_CACERT="/opt/vault/tls/vault-cert.pem" vault kv metadata get acme/cloudflare/dns

Run the final metadata command only after the helper has completed successfully. At the current 403 checkpoint the acme/cloudflare/dns path has not been written, so a not-found result is expected.
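`vault token capabilities acme/config` prints a comma-separated capability list. A sketch that turns it into a pass/fail gate before the helper is rerun follows. The function names are hypothetical, and the capability names passed in are an assumption: take them from the documented platform-admin ACME policy, not from this example.

```shell
#!/usr/bin/env bash
# Pure check: succeed only if every required capability appears in the
# comma-separated list printed by `vault token capabilities`.
# A "root" entry grants everything.
has_capabilities() {
  local granted=",${1// /}," cap
  shift
  [[ "${granted}" == *",root,"* ]] && return 0
  for cap in "$@"; do
    [[ "${granted}" == *",${cap},"* ]] || return 1
  done
  return 0
}

# Query the capabilities of the currently logged-in token on one path.
check_path_capabilities() {
  local path="$1" caps
  shift
  caps="$(
    sudo -H env \
      VAULT_ADDR="https://192.168.8.2:8200" \
      VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
      vault token capabilities "${path}"
  )" || return 1

  if has_capabilities "${caps}" "$@"; then
    echo "PASS: ${path} grants $*."
  else
    echo "FAIL: ${path} grants only '${caps}'." >&2
    return 1
  fi
}

# Usage (capability name is illustrative):
#   check_path_capabilities acme/config update
```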

Remove the temporary administrator token:


sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault token revoke -self

sudo rm -f /root/.vault-token

Required state before certificate issuer work:


Vault initialized: true
Vault sealed: false during normal operation
Storage type: raft
UI: reachable internally
Audit device: file/ and producing records
Human administrator: userpass login tested
Initial root token: revoked
KV v2 mount: cert/ with max_versions 20
Policies: platform-admin, cert-issuer, environment readers
AppRoles: baseline roles configured; cert-issuer SecretID delivery pending until the ACME policy repair is applied
ACME KV v2: mount may exist; acme/config write and Cloudflare secret storage are not yet confirmed
Vault issuer helper: installed; first run stopped at HTTP 403 on acme/config
Manual snapshot/checksum: tested
Reboot/manual unseal: tested
Dedicated SSE-S3 bucket and first hourly upload: complete/tested
Downloaded checksum and snapshot inspection: operator confirmation pending
Hourly/daily/monthly timer observation: pending
CloudWatch stale-backup alarm test: pending
Isolated snapshot restore exercise: pending
UFW: active and default-deny
Public exposure: none

35. Controlled Vault Upgrade Procedure

Treat Vault upgrades as maintenance operations.

Before upgrading:


1. Review the official release notes and upgrade guidance.
2. Save and verify a new Raft snapshot.
3. Copy the snapshot off-host.
4. Confirm at least three unseal shares are recoverable.
5. Record the currently installed package version.
6. Schedule downtime for a single-node service restart and unseal.
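Steps 2 and 3 reuse the manual snapshot procedure. A sketch of the save-and-verify part is below, written so the checksum file records a bare file name and `sha256sum -c` works from inside the directory. It assumes an administrator token is already stored for root (as after the `vault login` used elsewhere on this page) and that `/var/backups/vault` is the destination; the file-name pattern and function names are hypothetical. Copying the result off-host remains a separate step.

```shell
#!/usr/bin/env bash
# Sketch: save a pre-upgrade Raft snapshot, checksum it, verify the checksum,
# and inspect the snapshot. Requires a stored admin token for root.
vault_local() {
  sudo -H env \
    VAULT_ADDR="https://192.168.8.2:8200" \
    VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
    vault "$@"
}

save_verified_snapshot() {
  local dir="${1:-/var/backups/vault}" name
  # Hypothetical naming pattern; the UTC timestamp keeps names sortable.
  name="prod-vault-01-pre-upgrade-$(date -u +%Y%m%dT%H%M%SZ).snap"

  # Progress output goes to stderr so stdout carries only the final path.
  vault_local operator raft snapshot save "${dir}/${name}" >&2 || return 1

  # Write the checksum from inside the directory so it records a bare name,
  # then verify it immediately.
  sudo bash -c 'cd "$1" && sha256sum "$2" > "$2.sha256" && sha256sum -c "$2.sha256" >&2' \
    _ "${dir}" "${name}" || return 1

  vault_local operator raft snapshot inspect "${dir}/${name}" >&2 || return 1
  printf '%s\n' "${dir}/${name}"
}

# Usage: SNAPSHOT="$(save_verified_snapshot)" && echo "Copy ${SNAPSHOT} off-host."
```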

Upgrade only to an explicitly approved package version:


sudo apt update
sudo apt-cache policy vault
sudo apt install "vault=<APPROVED_PACKAGE_VERSION>"
sudo /usr/bin/vault version
sudo systemctl restart vault

# Submit three distinct shares interactively.
sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault operator unseal

sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault operator unseal

sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault operator unseal

Run the complete validation checklist and save another snapshot after the upgrade.


36. Disaster-Recovery Design

Recovery requires all of the following:


A valid off-host Raft snapshot and checksum
At least three original unseal shares associated with that snapshot
Terraform and Ansible source
The approved DHCP reservation and DNS records
A supported Vault package version
Approved TLS replacement or bootstrap procedure

Recovery outline:


1. Provision a clean replacement VM with Terraform.
2. Apply the Vault Ansible baseline without initializing production data.
3. Install approved TLS material.
4. Follow the official Raft snapshot restore procedure on the isolated replacement.
5. Use the original cluster shares required for the restored data.
6. Validate policies, auth methods, audit devices, cert/ paths, and administrator login.
7. Move the internal service DNS record only after validation.
8. Rotate AppRole SecretIDs and other machine credentials if compromise is possible.
9. Save a new post-recovery snapshot.

Perform the first restore exercise before storing production certificate private keys. Never test a destructive restore on the active node.
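Step 4 defers to the official procedure. Purely to illustrate the guard rails this section requires (never the active node, checksum and inspection first, explicit confirmation), a sketch follows. It assumes the operator has already authenticated to the isolated target as the official procedure requires, that the target's listener certificate sits at the same Ansible-managed path, and that the target address is passed explicitly; `is_recovery_target` and `restore_on_recovery_vm` are hypothetical names. The `-force` flag bypasses Vault's check that the snapshot's keyring matches the running cluster, a check expected to fail on a freshly built recovery node. Afterwards, check `vault status` and use the original shares as step 5 describes.

```shell
#!/usr/bin/env bash
# Sketch only: restore a verified snapshot onto an ISOLATED recovery VM.
# Never point this at prod-vault-01.

# Pure guard: reject empty targets and any production address or name.
is_recovery_target() {
  case "$1" in
    "" | *//192.168.8.2 | *//192.168.8.2:* | *//192.168.8.2/* | *//vault.aspireclan.com*)
      return 1 ;;
  esac
  return 0
}

restore_on_recovery_vm() {
  local addr="$1" snapshot="$2" checksum="$3" answer
  if ! is_recovery_target "${addr}"; then
    echo "REFUSED: '${addr}' is not an isolated recovery target." >&2
    return 1
  fi

  # Integrity first: checksum, then an offline inspection of the snapshot.
  sudo bash -c 'cd "$(dirname "$1")" && sha256sum -c "$(basename "$1")"' \
    _ "${checksum}" || return 1
  sudo vault operator raft snapshot inspect "${snapshot}" || return 1

  read -r -p "Type RESTORE to replace all Vault data on ${addr}: " answer
  if [[ "${answer}" != "RESTORE" ]]; then
    echo "Aborted." >&2
    return 1
  fi

  # Assumes the recovery node uses the same Ansible-managed certificate path.
  sudo -H env \
    VAULT_ADDR="${addr}" \
    VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
    vault operator raft snapshot restore -force "${snapshot}"
}
```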


37. Handoff to prod-cert-01

The authoritative issuer VM is prod-cert-01. Its Terraform and Ansible foundation is complete. Complete the remaining Vault credential handoff only after:


Vault is initialized and normally unsealed.
The file audit device is healthy.
The named administrator works.
The initial root token is revoked.
The cert/ KV v2 mount and schema paths exist.
The cert-issuer policy and AppRole exist.
No issuer SecretID has been generated prematurely.
A snapshot has been copied off-host.
A restore exercise is scheduled.

After the platform-admin and cert-issuer policy updates are applied, the helper succeeds, the wrapped SecretID is consumed on prod-cert-01, and preflight passes, the issuer will use paths such as:


cert/local/web/fp
cert/local/srvc/api-fp
cert/dev/web/fp
cert/dev/srvc/api-fp
cert/qa/web/fp
cert/qa/srvc/api-fp
cert/prod/web/fp
cert/prod/srvc/api-fp
cert/prod/infra/vault

Start with Let's Encrypt staging, verify renewal and Vault versioning, then approve production issuance.


38. Handoff to prod-int-proxy-01

The proxy phase begins only after at least one valid certificate bundle exists in cert/.

The proxy design will:


Use an environment-specific read-only AppRole.
Read only the exact certificate environment required.
Render HAProxy PEM bundles to root-owned local files.
Run haproxy -c before every reload.
Reload gracefully only after successful validation.
Never write certificate versions or read issuer credentials.
Avoid direct public exposure of Vault.

Vault Agent or an equivalent controlled fetch mechanism can be introduced after AppRole delivery and renewal behavior are tested.
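The render, validate, and reload rules above can be sketched as one root-run function. Everything here is an assumption ahead of the proxy phase: the leaf certificate, chain, and private key are taken as already fetched from Vault by some controlled mechanism, `/etc/haproxy/haproxy.cfg` is assumed to be the active configuration, and the function name is hypothetical. The bundle is written with mode 0600 beside its destination, renamed into place, and rolled back if `haproxy -c` rejects the result.

```shell
#!/usr/bin/env bash
# Sketch: install an HAProxy PEM bundle, validate, then reload gracefully.
# Run as root. Paths and the function name are hypothetical.
install_haproxy_bundle() {
  local cert="$1" chain="$2" key="$3" dest="$4"
  local cfg="${HAPROXY_CFG:-/etc/haproxy/haproxy.cfg}" tmp

  # mktemp creates the file with mode 0600 in the destination directory, so
  # the final mv is an atomic rename on the same filesystem.
  tmp="$(mktemp "${dest}.XXXXXX")" || return 1
  if ! cat "${cert}" "${chain}" "${key}" > "${tmp}"; then
    rm -f "${tmp}"
    return 1
  fi

  # Keep the current bundle so a rejected configuration can be rolled back.
  if [[ -e "${dest}" ]]; then
    cp -p "${dest}" "${dest}.previous" || { rm -f "${tmp}"; return 1; }
  fi
  mv -f "${tmp}" "${dest}"

  if haproxy -c -f "${cfg}"; then
    rm -f "${dest}.previous"
    systemctl reload haproxy
  else
    echo "FAIL: haproxy -c rejected the new bundle; rolling back." >&2
    if [[ -e "${dest}.previous" ]]; then
      mv -f "${dest}.previous" "${dest}"
    else
      rm -f "${dest}"
    fi
    return 1
  fi
}
```

Validating after the rename, rather than before, matters because `haproxy -c` checks the configuration together with the certificate files it references on disk.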


39. Mandatory Security Rules


No unseal shares in Git, Vault, Terraform, Ansible, GitHub, email, chat, screenshots, or plaintext files.
No initial root token in shell profiles, command arguments, GitHub secrets, or automation logs.
No second initialization of an already initialized Raft backend.
No chmod weakening of /opt/vault/tls to make the private key broadly readable.
Use sudo env for local CLI access to the protected bootstrap certificate.
No permanent VAULT_SKIP_VERIFY=true.
No Vault API or UI exposed directly to the internet.
No automatic restart of an initialized Shamir-sealed Vault without three shares available.
No assumption that SIGHUP reloads listener TLS.
No AppRole SecretID before the target VM exists.
No reader policy with write access.
No certificate issuer with broad Vault administration permissions.
No production private keys before audit and off-host snapshot validation.
No backup considered valid until restore testing succeeds.

40. Complete Implementation Order


1.  Confirm prod-dns-01 is healthy at 192.168.8.4.                         complete
2.  Confirm router reservation aa:bb:cc:04:05:01 → 192.168.8.2.           complete
3.  Confirm vault.aspireclan.com resolves to 192.168.8.2.                  complete
4.  Confirm vault.tfvars, Terraform module, variables, outputs, backend.   complete
5.  Push to prod and let the DNS-first smart workflow run.                 complete
6.  Confirm Terraform reports no destructive action.                       complete
7.  Confirm Ansible installs Vault and converges prod-vault-01.             complete
8.  Confirm Vault 2.0.2, TLS, Raft, UFW, and service health.                complete
9.  Run protected-path validation with the documented sudo commands.        complete/tested
10. Initialize once with five shares and threshold three.                   complete; never rerun
11. Secure all five shares outside the Vault server and automation systems.  operator custody; verify/maintain
12. Submit three shares interactively and verify unsealed state.             complete/tested
13. Run ac-vault-bootstrap-logical using the sudo terminal-input procedure.  complete/tested
14. Test the named userpass administrator.                                   complete/tested
15. Revoke the initial root token through ac-vault-bootstrap-logical.        complete
16. Delete the temporary stored initial-root-token value.                    operator confirmation required
17. Validate file audit, cert/ KV v2, policies, AppRoles, schema records.     complete/tested
18. Save and checksum a manual Raft snapshot.                                complete/tested
19. Create and harden dedicated SSE-S3 Vault backup bucket.                  complete/tested
20. Test UI login with the named administrator.                              complete/tested
21. Reboot and test manual three-share unseal.                               complete/tested
22. Install snapshot scripts, AppRole, systemd services, and first upload.    complete/tested
23. Provision prod-cert-01 and apply issuer Ansible foundation.               complete/tested
24. Copy Vault listener certificate to prod-cert-01 trust path.               complete/tested
25. Install ac-vault-prepare-cert-issuer on prod-vault-01.                     complete; direct repair performed
26. Run issuer preparation helper.                                             stopped at acme/config with HTTP 403
27. Update version-controlled platform-admin and cert-issuer ACME policies.    current next step
28. Apply policies and rerun helper to store token/create wrapped SecretID.    pending
29. Bootstrap AppRole on prod-cert-01 and run preflight.                       pending
30. Enable issuer timer with declarations still disabled.                     pending
31. Complete checksum/timer/alarm validation and isolated restore exercise.   pending
32. Enable first Let's Encrypt staging certificate only after approval.       pending

41. Source Consistency Status

Exact-source comparison performed

This revision was generated from the exact file attached in the current request and mounted as /mnt/data/Pasted text.txt.


Baseline line count: 4965
Baseline SHA-256: f14a72e13e0fbbacfb11eb3d936515f16297aed9aa633c4138b826d0b4778e01
Baseline page ID: prod-vault-01-setup
Baseline H2 sections: 43

The updated file is compared byte-for-byte against that exact baseline. It is not the earlier condensed Vault page and is not an old copy.

The following page structure is preserved:


Frontmatter field order
CustomCodeBlock import
Page-local K8S-overview-style layout CSS
Full-width desktop document container
260px desktop table-of-contents / anchor panel
Vault allocation table styling
Single H1
Forty-three numbered H2 sections
Original H2 section order
Horizontal separators
CustomCodeBlock presentation pattern
Official references section
Continuation prompt section

Material changes added to the exact baseline include:


Authoritative issuer hostname: prod-cert-01
Issuer identity: 192.168.8.3 / VM ID 3156003 / MAC aa:bb:cc:04:05:02
Issuer Terraform and Ansible foundation: complete
Vault trust file on issuer: /etc/aspireclan/cert-issuer/vault-ca.pem
Service-account TLS validation requirement: sudo -u ac-cert-issuer or root
Vault helper: /usr/local/sbin/ac-vault-prepare-cert-issuer
Repository helper source: ansible/files/vault/ac-vault-prepare-cert-issuer
Observed helper result: HTTP 403 at PUT /v1/acme/config
platform-admin ACME capability correction: documented
cert-issuer read-only Cloudflare path: acme/cloudflare/dns
Direct Ubuntu helper execution: documented
Response-wrapped AppRole delivery to prod-cert-01: documented
Current pending state: token storage, wrapped SecretID, preflight, timer, staging issuance
PowerShell operational path: removed from the current issuer handoff

Validation performed on this updated file:


PASS: Exact attached 4965-line baseline read successfully.
PASS: Updated output differs from the exact baseline.
PASS: Updated line count is greater than the baseline line count.
PASS: Frontmatter values and field order preserved.
PASS: Original K8S-style CSS block preserved byte-for-byte.
PASS: All 43 numbered H2 sections remain sequential and in the original order.
PASS: Exactly one rendered H1 remains.
PASS: CustomCodeBlock opening and closing counts match.
PASS: JSX template-literal delimiters are structurally balanced.
PASS: Inserted ac-vault-prepare-cert-issuer script passed bash -n before MDX escaping.
PASS: Required helper, policy, ACME, AppRole, TLS, and status strings are present.
PASS: No real Cloudflare token, Vault token, SecretID, unseal share, AWS secret key, certificate private key, or PEM payload is embedded.
PASS: Unified diff and old/new SHA-256 files generated.
LIMITATION: Full Docusaurus build was not run because the complete documentation repository and package manifest were not attached.
LIMITATION: The policy correction and second helper run are documented as pending because no successful post-403 output was provided.

42. Official References


43. Continuation Prompt

We have completed the Terraform, Ansible, and direct-server manual bootstrap foundation for prod-vault-01 at 192.168.8.2, MAC aa:bb:cc:04:05:01, Proxmox VM ID 3156002. Vault 2.0.2 is installed from HashiCorp's official APT repository and runs with Integrated Storage, TLS, UI, UFW, swap disabled, protected files under /opt/vault/tls, and S3 Terraform state at prod/terraform.tfstate.

Vault was initialized exactly once with five Shamir shares and threshold three. The direct Ubuntu sudo runbook was tested for TLS/status checks, interactive unseal, logical bootstrap, named manoj-admin validation, initial-root-token revocation, file audit, cert/ KV v2, policies, AppRole roles, schema records, Raft snapshot/checksum creation, and reboot/manual-unseal validation. Never rerun vault operator init against the existing Raft data, never put shares or tokens in command arguments, and do not weaken /opt/vault/tls permissions.

The authoritative certificate issuer is now prod-cert-01 at 192.168.8.3, MAC aa:bb:cc:04:05:02, Proxmox VM ID 3156003. Terraform provisioning and the Ansible issuer foundation are complete. The Vault listener certificate is installed on the issuer as /etc/aspireclan/cert-issuer/vault-ca.pem; because the parent directory is restricted, TLS checks must run as ac-cert-issuer or root.

The Vault-side helper /usr/local/sbin/ac-vault-prepare-cert-issuer is now installed. Its first direct run authenticated successfully but stopped at PUT /v1/acme/config with HTTP 403 permission denied. Therefore, update the version-controlled platform-admin.hcl and cert-issuer.hcl with the documented ACME capabilities, deploy and apply both policies, verify vault token capabilities acme/config, and rerun the helper. The failed run did not reach Cloudflare-token storage or wrapped SecretID generation.

After the helper succeeds, copy only the RoleID and one-use wrapping token directly to the prod-cert-01 PuTTY session, run ac-cert-issuer bootstrap-approle, verify the credential-file modes without printing their contents, run preflight, and enable ac-cert-issuer.timer only while certificate declarations remain disabled. Then enable one Let's Encrypt staging certificate, prove the second run is idempotent, validate renewal and Vault KV versioning, and approve production issuance before beginning prod-int-proxy-01.

The dedicated Vault Raft backup bucket aspireclan-prod-vault-raft-backups-425389089086-us-east-1 uses SSE-S3/AES256, Versioning, Governance Object Lock, lifecycle retention, Block Public Access, a no-delete uploader, a snapshot AppRole, root-only scripts, and systemd units. Complete the remaining checksum/timer/SNS/CloudWatch/isolated-restore gates. Preserve the DNS-first smart Terraform workflow, component-aware Ansible behavior, this page's complete 43-section structure, and the K8S-overview-style 260px desktop anchor panel.