Production Vault Setup
1. Purpose
This page is the complete, current implementation and operating reference for the Aspireclan production Vault server:
prod-vault-01
192.168.8.2
aa:bb:cc:04:05:01
Proxmox VM ID 3156002
It records the exact Terraform, GitHub Actions, Ansible, TLS, initialization, unseal, logical-bootstrap, root-token-retirement, UI, dedicated SSE-S3 Raft backup, reboot-validation, recovery, and certificate-issuer handoff approach that reached the current tested checkpoint on 2026-06-14.
The server is the trust foundation for the next infrastructure phases:
- prod-cert-01, the authoritative certificate-issuer VM at 192.168.8.3, which is now Terraform-provisioned and Ansible-configured and will obtain and renew certificates after its Vault AppRole activation is completed.
- Internal and external HAProxy servers, which will read only the certificate paths permitted for their environment.
- Infrastructure administrators, who will operate policies, authentication, audit, snapshots, upgrades, and disaster recovery.
Terraform creates and reconciles the Proxmox VM. Ansible installs and hardens Vault. Vault initialization is deliberately performed once, directly on prod-vault-01, so the five Shamir unseal-key shares and initial root token never enter Terraform state, Ansible variables, GitHub Actions logs, or committed files.
2. Implementation Status and Approved Decisions
The following foundation and direct-server bootstrap activities are working:
- prod-dns-01 is deployed separately with state at prod/dns/terraform.tfstate and DNS service at 192.168.8.4.
- prod-vault-01 is tracked in the production state at prod/terraform.tfstate.
- Terraform refresh/plan runs on every dev, qa, and prod workflow execution, even when Git file detection finds no Terraform source change.
- The production DNS job runs and verifies DNS first.
- Ansible targets only the affected component and uses syntax-check, check mode, and apply-on-change behavior.
- HashiCorp Vault 2.0.2 is installed from the official HashiCorp APT repository.
- Vault uses Integrated Storage (Raft), TLS, the built-in UI, fail-closed UFW, swap disabling, and a hardened systemd override.
- The current bootstrap certificate is generated idempotently on the Vault VM and stored under /opt/vault/tls.
- Vault was initialized exactly once directly on prod-vault-01 with five Shamir shares and a threshold of three.
- The five unseal shares were obtained directly from the Vault server. Their custody remains operator-controlled and must stay outside Terraform, Ansible, GitHub Actions, and the Vault VM.
- The direct-server sudo procedure was tested successfully for TLS/status validation, interactive unseal, logical bootstrap, named-administrator validation, initial-root-token revocation, Raft snapshot creation/checksum verification, and reboot/manual-unseal validation.
- The file audit device, cert/ KV v2 mount, policies, userpass, AppRole roles, and certificate schema records are configured.
- prod-cert-01 is now deployed at 192.168.8.3, VM ID 3156003, MAC aa:bb:cc:04:05:02, with the certificate-issuer Ansible foundation installed under /etc/aspireclan/cert-issuer and /usr/local/sbin/ac-cert-issuer.
- The Vault listener certificate was copied to prod-cert-01 as /etc/aspireclan/cert-issuer/vault-ca.pem. The issuer service account can read it; the ordinary acllc shell cannot traverse the restricted issuer directory, so direct TLS tests must run with sudo -u ac-cert-issuer or as root.
- The Vault-side helper /usr/local/sbin/ac-vault-prepare-cert-issuer was initially missing because the earlier certificate-only Ansible run did not execute the Vault play. It has now been installed directly on prod-vault-01 and reached the Vault API successfully.
- The first issuer-preparation attempt stopped at PUT /v1/acme/config with HTTP 403 permission denied. This proves authentication and connectivity are working, but the current platform-admin policy still lacks the required ACME KV v2 data/config capabilities.
- Because the helper stops at acme/config before writing the Cloudflare token or generating the wrapped SecretID, the Cloudflare token, final AppRole credential pair, issuer preflight, systemd timer activation, staging issuance, and production issuance remain pending.
- The initial root token was retired after the named platform-admin user was validated.
- The dedicated SSE-S3 Vault backup bucket and first hourly upload are implemented and validated. Downloaded checksum verification, scheduled timer observation, CloudWatch stale-backup alarm testing, and the isolated snapshot-restore exercise remain the outstanding resilience gates.
Do not mark the certificate issuer as activated yet. The confirmed stopping point is:
prod-vault-01 helper: installed and executable
Vault administrator login: successful
AppRole auth endpoint: reachable
acme/ mount creation: may have completed; verify before changing it
PUT /v1/acme/config: failed with HTTP 403 permission denied
Cloudflare token stored: not confirmed; helper did not reach that step
Wrapped SecretID generated: no
prod-cert-01 AppRole files installed: no
Issuer timer active: no
Let's Encrypt staging issuance: no
The next action is to update the version-controlled platform-admin.hcl and cert-issuer.hcl, apply the policy to Vault, rerun the helper, bootstrap the wrapped credential on prod-cert-01, run preflight, and only then enable the timer.
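Before rerunning the helper, it may be worth confirming that the repaired policy actually grants what the failed request needs. The following is a minimal sketch and not part of the installed tooling: has_capability is a hypothetical helper name, it assumes the comma-separated list that vault token capabilities prints for the current token, and it assumes the write to acme/config needs the update capability (Vault may require create instead when the target does not yet exist).

```shell
# has_capability CAPS WANTED
# CAPS is the text printed by `vault token capabilities <path>`,
# for example "create, read, update", "deny", or "root".
# Returns 0 when WANTED is granted, 1 otherwise.
has_capability() {
    local caps wanted item
    caps=$(printf '%s' "$1" | tr ',' ' ')
    wanted=$2
    # An explicit deny always wins.
    case " $caps " in
        *" deny "*) return 1 ;;
    esac
    for item in $caps; do
        case "$item" in
            root|"$wanted") return 0 ;;
        esac
    done
    return 1
}

# Hypothetical usage on prod-vault-01 after logging in as the named
# administrator (VAULT_ADDR and VAULT_CACERT already exported):
#   caps=$(vault token capabilities acme/config)
#   has_capability "$caps" update || echo "policy still blocks acme/config"
```

If the check still reports a block after the policy has been applied, log in again before retrying, because this check assumes the token in use was issued after the policy change.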
The approved phase-one design is:
Edition: Vault Community Edition
Installed/tested version: Vault 2.0.2
Operating system: Ubuntu Server 26.04 LTS
Storage: Integrated Storage (Raft)
Vault nodes: 1 initially
Seal type: Shamir manual unseal
Unseal shares: 5
Unseal threshold: 3
API and UI: HTTPS on TCP 8200, internal network only
Cluster port: TCP 8201, not opened to clients in single-node phase
Bootstrap TLS: self-signed certificate generated by Ansible on prod-vault-01
Authentication after bootstrap: userpass for a named administrator; AppRole for machines
Audit baseline: file audit device
Secrets engines: KV v2 mounted at cert/; acme/ is the dedicated issuer-credential mount and is at the policy-repair checkpoint
Certificate hierarchy: cert/<environment>/<workload-type>/<workload-name>
Cloudflare credential path: acme/cloudflare/dns
Certificate versions retained: 20
Initial root token: revoked after named-administrator validation
Auto-unseal: deferred
Snapshots: dedicated SSE-S3 off-host bucket implemented; first hourly snapshot uploaded, versioned, Object-Locked, metadata-verified, and downloaded; final checksum/timer/alarm/restore validation remains
Vault initialization created five unseal-key shares and one initial root token. The underlying encryption root key was not printed. Any three shares are sufficient to unseal this single node. Never run vault operator init again against the current Raft data.
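Because the node stays sealed after every restart until three shares are supplied, operator scripts usually need to tell sealed apart from unreachable. The sketch below assumes the documented vault status exit codes (0 unsealed, 2 sealed, 1 error); the function names are illustrative and the threshold value comes from this page.

```shell
# describe_seal_state EXIT_CODE
# Maps the exit code of `vault status` to a short state name.
describe_seal_state() {
    case "$1" in
        0) echo "unsealed" ;;
        2) echo "sealed" ;;
        *) echo "error" ;;
    esac
}

# shares_remaining THRESHOLD PROGRESS
# Prints how many more shares are needed, never less than zero.
shares_remaining() {
    local threshold progress
    threshold=$1
    progress=$2
    if [ "$progress" -ge "$threshold" ]; then
        echo 0
    else
        echo $((threshold - progress))
    fi
}

# Hypothetical usage on prod-vault-01:
#   vault status >/dev/null 2>&1
#   describe_seal_state "$?"
#   shares_remaining 3 1
```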
3. Scope and Non-Goals
This page covers the current Vault foundation, the tested direct-server manual procedure, and the remaining resilience work.
It includes:
- Proxmox provisioning with S3 remote state and locking.
- DNS-first workflow sequencing.
- Component-aware Terraform and Ansible execution.
- Vault package installation from the official repository.
- TLS private key, CSR, and self-signed bootstrap certificate generation.
- Raft storage, service hardening, UFW, audit-log preparation, and policy deployment.
- Direct initialization on prod-vault-01 with five shares and threshold three.
- Direct, interactive unseal without putting shares in shell history.
- Logical baseline configuration through the installed root-only bootstrap utility.
- UI access and named administrator login.
- Initial root-token retirement.
- Manual Raft snapshot creation and checksum verification.
- Dedicated SSE-S3 off-host bucket creation, Object Lock, lifecycle retention, IAM uploader, Vault AppRole, backup scripts, systemd units, and first hourly upload verification.
- Reboot and three-share manual-unseal validation.
- Vault-to-prod-cert-01 trust transfer using the current self-signed listener certificate.
- Direct Ubuntu installation and operation of the Vault certificate-issuer preparation helper.
- The exact ACME KV v2 policy correction required by the observed 403 response.
- Response-wrapped AppRole delivery to the deployed issuer VM without PowerShell.
It does not yet claim that the certificate issuer is fully activated, that the Cloudflare token has been stored successfully, that a wrapped SecretID has been consumed, that the issuer timer is active, that a Let's Encrypt staging or production certificate has been issued, that HAProxy is deployed, that all scheduled backup timer/alarm validation is complete, or that an isolated snapshot restore has already been completed.
4. Final Architecture
The implemented phase-one flow is:
Git push to prod
→ dedicated production DNS Terraform/Ansible job
→ mandatory DNS health gate
→ production Terraform refresh/plan/apply
→ S3 state: prod/terraform.tfstate
→ component-aware Ansible syntax-check
→ Ansible --check on existing prod-vault-01
→ real Ansible apply only when required
prod-vault-01
├── Vault API and UI: HTTPS 8200
├── Raft data: /opt/vault/data
├── TLS: /opt/vault/tls
├── Configuration: /etc/vault.d/vault.hcl
├── Audit log: /var/log/vault/audit.log
├── Version-controlled policies: /usr/local/share/ac-vault/policies
└── Automated Raft backup uploader
→ aspireclan-prod-vault-raft-backups-425389089086-us-east-1
→ SSE-S3 / AES256
→ Versioning + Governance Object Lock + lifecycle retention
The current certificate-issuer handoff flow is:
prod-vault-01
├── /opt/vault/tls/vault-cert.pem
├── /usr/local/share/ac-vault/policies/platform-admin.hcl
├── /usr/local/share/ac-vault/policies/cert-issuer.hcl
└── /usr/local/sbin/ac-vault-prepare-cert-issuer
→ authenticate named administrator through userpass
→ verify/enable acme/ as KV v2
→ configure acme/config
→ store Cloudflare token at acme/cloudflare/dns
→ configure CIDR-bound cert-issuer AppRole for 192.168.8.3/32
→ return non-secret RoleID plus one-use wrapped SecretID
prod-cert-01
├── /etc/aspireclan/cert-issuer/vault-ca.pem
├── /etc/aspireclan/cert-issuer/approle/role_id
├── /etc/aspireclan/cert-issuer/approle/secret_id
├── /usr/local/sbin/ac-cert-issuer
└── ac-cert-issuer.timer
Current stop:
platform-admin lacks required acme/config capability
→ helper returned HTTP 403
→ policy source and live policy must be corrected before rerun
The future certificate hierarchy is:
cert/
├── local/
│ ├── web/<workload-name>
│ ├── srvc/<workload-name>
│ ├── job/<workload-name>
│ └── infra/<workload-name>
├── dev/
├── qa/
└── prod/
Examples:
cert/local/web/fp → local.fp.aspireclan.com
cert/local/srvc/api-fp → local.api.fp.aspireclan.com
cert/dev/web/fp → dev.fp.aspireclan.com
cert/dev/srvc/api-fp → dev.api.fp.aspireclan.com
cert/qa/web/fp → qa.fp.aspireclan.com
cert/qa/srvc/api-fp → qa.api.fp.aspireclan.com
cert/prod/web/fp → fp.aspireclan.com
cert/prod/srvc/api-fp → api.fp.aspireclan.com
cert/prod/infra/vault → vault.aspireclan.com
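The mapping in these examples can be sketched as a small function. The rule used here is inferred from the examples only and is an assumption: hyphens in the workload name become dots, the environment is prefixed for every environment except prod, and the workload type does not appear in the hostname. The function name is hypothetical.

```shell
# cert_path_to_fqdn PATH
# Converts cert/<environment>/<workload-type>/<workload-name> to a hostname.
# Returns 1 for paths that do not match the approved hierarchy.
cert_path_to_fqdn() {
    local path environment wtype name labels
    path=$1
    case "$path" in
        cert/*/*/*/*) return 1 ;;   # too many segments
        cert/*/*/*) ;;
        *) return 1 ;;
    esac
    environment=$(printf '%s' "$path" | cut -d/ -f2)
    wtype=$(printf '%s' "$path" | cut -d/ -f3)
    name=$(printf '%s' "$path" | cut -d/ -f4)
    case "$environment" in local|dev|qa|prod) ;; *) return 1 ;; esac
    case "$wtype" in web|srvc|job|infra) ;; *) return 1 ;; esac
    [ -n "$name" ] || return 1
    # Assumed rule: "api-fp" becomes "api.fp".
    labels=$(printf '%s' "$name" | tr '-' '.')
    if [ "$environment" = "prod" ]; then
        printf '%s.aspireclan.com\n' "$labels"
    else
        printf '%s.%s.aspireclan.com\n' "$environment" "$labels"
    fi
}

# Example:
#   cert_path_to_fqdn cert/qa/srvc/api-fp
```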
Vault remains internal. TCP 8200 is allowed only from the approved LAN CIDR during bootstrap; TCP 8201 has no client-facing allow rule in the one-node design.
5. Approved Build Order
Use this dependency order:
1. prod-dns-01: complete
2. prod-vault-01 VM and server configuration: complete
3. Initialize/unseal Vault: complete; five shares, threshold three
4. Configure logical baseline: complete; audit, cert/ KV v2, policies, userpass, AppRole
5. Revoke initial root token: complete after named-admin validation
6. Configure/test snapshots: SSE-S3 bucket and first hourly upload complete; final checksum/timer/alarm/restore validation pending
7. prod-cert-01 VM and Ansible base: complete; issuer staged with credentials absent and timer disabled
8. Install Vault issuer helper: complete directly on prod-vault-01
9. Repair ACME policy and rerun helper: current next step after observed HTTP 403 at acme/config
10. Bootstrap issuer AppRole: pending; direct Ubuntu response-wrapped delivery
11. Let's Encrypt staging lifecycle: pending
12. HAProxy servers: only after a valid approved certificate exists
Terraform and Ansible may be coordinated in one workflow, but the dependency order remains DNS first, then Vault, then certificate issuer, then proxy.
6. VM Profile and Required Identity Inputs
The final approved VM identity is:
| VM name | VM ID | MAC | Reserved IP | vCPU | RAM | Disk | Template |
|---|---|---|---|---|---|---|---|
| prod-vault-01 | 3156002 | aa:bb:cc:04:05:01 | 192.168.8.2 | 4 | 8192 MiB | 40G | tmplt-ub-26-min-base |
Network and service identity:
Proxmox node: pve
Bridge: vmbr0
Storage: local-lvm
DHCP: enabled inside Ubuntu
Router reservation: aa:bb:cc:04:05:01 → 192.168.8.2
Internal DNS server: 192.168.8.4
Vault service name: vault.aspireclan.com
Direct API/UI endpoint: https://192.168.8.2:8200
Named API/UI endpoint: https://vault.aspireclan.com:8200
These values are not secrets. Never place Vault tokens, unseal shares, private keys, or AppRole SecretIDs beside them in Terraform variables.
7. DNS and Endpoint Design
The working Vault TLS identity includes:
Common name: vault.aspireclan.com
DNS SANs:
- vault.aspireclan.com
- prod-vault-01
IP SAN:
- 192.168.8.2
The internal DNS record must resolve:
vault.aspireclan.com → 192.168.8.2
Validate from an internal client:
dig @192.168.8.4 vault.aspireclan.com
getent ahostsv4 vault.aspireclan.com
nc -vz 192.168.8.2 8200
The current bootstrap certificate is self-signed. Browsers and CLI clients must explicitly trust the certificate or use it as VAULT_CACERT. Do not use VAULT_SKIP_VERIFY=true as a permanent workaround.
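One way to make explicit trust the default on an administrator workstation is a small helper that prints the client environment and refuses to proceed when verification has been switched off. This is a sketch under assumptions: the function name is hypothetical, and the local path of the copied certificate is whatever the operator chose.

```shell
# print_vault_tls_env CA_FILE [ADDR]
# Prints export lines for the Vault CLI, or fails when TLS verification
# is disabled or the trusted certificate cannot be read.
# Note: paths containing a single quote are not supported by this sketch.
print_vault_tls_env() {
    local ca_file addr
    ca_file=$1
    addr=${2:-https://vault.aspireclan.com:8200}
    if [ "${VAULT_SKIP_VERIFY:-}" = "true" ]; then
        echo "VAULT_SKIP_VERIFY=true is set; unset it and trust the certificate instead" >&2
        return 1
    fi
    if [ ! -r "$ca_file" ]; then
        echo "cannot read trusted certificate: $ca_file" >&2
        return 1
    fi
    printf "export VAULT_ADDR='%s'\n" "$addr"
    printf "export VAULT_CACERT='%s'\n" "$ca_file"
}

# Hypothetical usage, with the certificate copied to a local path:
#   eval "$(print_vault_tls_env "$HOME/vault-cert.pem")" && vault status
```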
8. Vault Listener TLS and the Bootstrap Dependency Rule
Vault cannot fetch the certificate required for its own listener from a sealed or unavailable Vault. The working implementation avoids that dependency loop by generating local bootstrap TLS material through Ansible:
# Vault server identity and network addresses.
vault_service_name: vault
vault_user: vault
vault_group: vault
vault_node_id: prod-vault-01
vault_ip_address: "192.168.8.2"
vault_api_addr: "https://192.168.8.2:8200"
vault_cluster_addr: "https://192.168.8.2:8201"
# Integrated Storage (Raft) and TLS filesystem paths.
vault_data_dir: /opt/vault/data
vault_tls_dir: /opt/vault/tls
vault_tls_private_key_path: /opt/vault/tls/vault-key.pem
vault_tls_csr_path: /opt/vault/tls/vault.csr
vault_tls_certificate_path: /opt/vault/tls/vault-cert.pem
vault_config_path: /etc/vault.d/vault.hcl
vault_audit_log_dir: /var/log/vault
vault_audit_log_path: /var/log/vault/audit.log
# This bootstrap certificate is created only when no certificate already exists.
# The certificate issuer phase will later replace the files at the same paths and
# signal Vault with SIGHUP, avoiding a Vault restart and reseal.
vault_bootstrap_tls_common_name: vault.aspireclan.com
vault_bootstrap_tls_dns_sans:
- vault.aspireclan.com
- prod-vault-01
vault_bootstrap_tls_ip_sans:
- "192.168.8.2"
vault_bootstrap_tls_validity: "+365d"
# Vault API access. Keep this narrow and replace the LAN CIDR with internal
# proxy/automation CIDRs after network segmentation is introduced.
vault_api_allowed_cidrs:
- "192.168.8.0/24"
# Port 8201 is not opened for a single-node deployment. Add only future Vault
# peer CIDRs here when a multi-node Raft cluster is approved.
vault_cluster_allowed_cidrs: []
vault_obsolete_ufw_rules: []
# Logical configuration defaults applied only after manual initialization.
vault_cert_mount: cert
vault_cert_environments:
- local
- dev
- qa
- prod
vault_cert_workload_types:
- web
- srvc
- job
- infra
The private key is generated only when absent with regenerate: never; an existing key is preserved. The CSR and certificate are created only during the real Ansible run, not during --check. The playbook verifies that the certificate and private key have the same public key and that the certificate is not immediately expired.
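The key-match and expiry checks can also be reproduced by hand with openssl. The sketch below is not the playbook's implementation; the function name is hypothetical, and it compares the public key extracted from the certificate with the one derived from the private key, then confirms the certificate has not already expired.

```shell
# tls_pair_is_valid CERT KEY
# Returns 0 when CERT and KEY share a public key and CERT is not expired.
tls_pair_is_valid() {
    local cert key cert_pub key_pub
    cert=$1
    key=$2
    cert_pub=$(openssl x509 -in "$cert" -noout -pubkey 2>/dev/null) || return 1
    key_pub=$(openssl pkey -in "$key" -pubout 2>/dev/null) || return 1
    [ -n "$cert_pub" ] || return 1
    [ "$cert_pub" = "$key_pub" ] || return 1
    # -checkend 0 fails when the certificate has already expired.
    openssl x509 -in "$cert" -noout -checkend 0 >/dev/null 2>&1
}

# Hypothetical usage as root on prod-vault-01 (the TLS directory is 0700):
#   tls_pair_is_valid /opt/vault/tls/vault-cert.pem /opt/vault/tls/vault-key.pem \
#       && echo "key and certificate match"
```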
Important operational rule: listener TLS changes require a controlled Vault restart. Do not assume that sending SIGHUP reloads the TCP listener certificate. After Vault is initialized, schedule the restart and have three unseal shares available.
9. Security Model
Mandatory rules:
Vault runs as the dedicated vault operating-system account.
Vault is the only application service on prod-vault-01.
Raft data: /opt/vault/data, owner vault:vault, mode 0700.
TLS directory: /opt/vault/tls, owner vault:vault, mode 0700.
TLS private key: mode 0600.
Vault configuration: /etc/vault.d/vault.hcl, mode 0600.
Swap is disabled and persistent swap entries are commented.
Core dumps are disabled through systemd LimitCORE=0.
UFW defaults: deny incoming, deny routed, allow outgoing.
TCP 8200 is internal only.
TCP 8201 is reserved for future Vault peers.
Unseal shares are never stored in Git, Terraform, Ansible, GitHub Actions, or on the Vault VM.
The initial root token is temporary and revoked after the named administrator works.
AppRole SecretIDs are generated only when the consuming VM exists and is ready.
A single-node Vault is a single point of failure; the dedicated Object-Locked SSE-S3 backup path must remain healthy before production certificate keys are stored.
The current configuration sets disable_mlock = true and compensates by disabling swap. This is the tested phase-one baseline; revisit mlock capabilities during the next production-hardening review.
10. Repository File Structure
The working repository structure is:
terraform/
├── .github/workflows/terraform-proxmox-deploy.yml
├── modules/proxmox-vm/
│   ├── main.tf
│   ├── variables.tf
│   └── outputs.tf
└── envs/prod/
    ├── backend.tf
    ├── main.tf
    ├── variables.tf
    ├── outputs.tf
    ├── terraform.tfvars
    ├── web.tfvars
    ├── app.tfvars
    ├── db.tfvars
    ├── k8s.tfvars
    ├── runner.tfvars
    ├── vault.tfvars
    ├── dns/                          # separate DNS root and state
    └── ansible/
        ├── configure-vms.yml
        ├── configure-vault.yml
        ├── inventory.ini
        ├── requirements.yml
        ├── group_vars/vault.yml
        ├── templates/vault.hcl.j2
        ├── templates/vault-audit-logrotate.j2
        ├── files/vault.service.d/override.conf
        └── files/vault/
            ├── ac-vault-bootstrap-logical
            ├── ac-vault-prepare-cert-issuer
            └── policies/
                ├── platform-admin.hcl
                ├── cert-issuer.hcl
                ├── cert-reader-local.hcl
                ├── cert-reader-dev.hcl
                ├── cert-reader-qa.hcl
                └── cert-reader-prod.hcl
No generated TLS private key, unseal share, root token, administrator password, Vault token, or AppRole SecretID belongs in the repository.
The helper source must remain in Git at:
envs/prod/ansible/files/vault/ac-vault-prepare-cert-issuer
The Vault play must install it as:
/usr/local/sbin/ac-vault-prepare-cert-issuer
owner: root
group: root
mode: 0750
A direct manual installation repaired the current server, but the Ansible source and playbook must also contain the same helper so a future Vault convergence or disaster-recovery build cannot omit it again.
11. Terraform Responsibilities and Required Additions
Terraform manages only VM lifecycle and remote state. It does not initialize Vault or handle Vault secrets.
envs/prod/variables.tf:
variable "vault_vms" {
  description = "Vault VMs for this environment."
  type = map(object({
    vmid        = number
    macaddr     = string
    reserved_ip = string
    cores       = number
    memory      = number
    disk_size   = string
  }))
  default = {}
}
envs/prod/main.tf:
module "vault_vms" {
  source   = "../../modules/proxmox-vm"
  for_each = var.vault_vms

  name          = each.key
  vmid          = each.value.vmid
  target_node   = var.target_node
  template_name = var.template_name
  storage       = var.storage
  bridge        = var.bridge
  macaddr       = each.value.macaddr
  reserved_ip   = each.value.reserved_ip
  cores         = each.value.cores
  memory        = each.value.memory
  disk_size     = each.value.disk_size
}
envs/prod/outputs.tf:
output "vault_vms" {
  value = {
    for name, vm in module.vault_vms : name => {
      vmid        = vm.vmid
      macaddr     = vm.macaddr
      reserved_ip = vm.reserved_ip
    }
  }
}
envs/prod/vault.tfvars:
vault_vms = {
  prod-vault-01 = {
    vmid        = 3156002
    macaddr     = "aa:bb:cc:04:05:01"
    reserved_ip = "192.168.8.2"
    cores       = 4
    memory      = 8192
    disk_size   = "40G"
  }
}
envs/prod/backend.tf:
# Legacy environment-level state used during the controlled migration phase.
# This key will later be split into one S3 state per component.
terraform {
  backend "s3" {
    bucket       = "aspireclan-terraform-state-425389089086-us-east-1"
    key          = "prod/terraform.tfstate"
    region       = "us-east-1"
    encrypt      = true
    use_lockfile = true
  }
}
The shared VM module pins the Proxmox disk format to raw, preventing repeated provider normalization from raw to null:
resource "proxmox_vm_qemu" "vm" {
  name        = var.name
  vmid        = var.vmid
  target_node = var.target_node

  # Normal Proxmox template without a Cloud-Init drive.
  clone      = var.template_name
  full_clone = true

  # QEMU Guest Agent is independent of Cloud-Init.
  agent         = 1
  agent_timeout = 180

  # Prevent the provider from waiting for an IPv6 guest address.
  skip_ipv6 = true

  cpu {
    type    = "host"
    cores   = var.cores
    sockets = 1
  }

  memory = var.memory
  scsihw = "virtio-scsi-pci"

  # IMPORTANT:
  # This disk block must match the source template boot disk layout.
  # The tmplt-ub-26-min-base template currently uses scsi0 and 40G.
  # Do not set this smaller than the template disk. A mismatched size such as 32G
  # can create an extra empty disk and leave the real cloned Ubuntu disk unused.
  disk {
    slot    = "scsi0"
    size    = var.disk_size
    storage = var.storage
    format  = "raw"
  }

  network {
    id      = 0
    model   = "virtio"
    bridge  = var.bridge
    macaddr = var.macaddr
  }
}
12. Router Reservation and Collision Checks
The router reservation is:
aa:bb:cc:04:05:01 → 192.168.8.2
Before recreating the VM, verify the reservation and absence of a conflicting lease or neighbor entry:
ping -c 2 -W 1 192.168.8.2 || true
ip neigh show | grep -F '192.168.8.2' || true
Ubuntu remains DHCP-based. Do not add a static Netplan address. The reserved address is selected by the Terraform-assigned MAC.
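The neighbor-table check can be made stricter by comparing the MAC that currently answers for the reserved address with the Terraform-assigned MAC. This is a sketch only: the function name is hypothetical, and it assumes the usual ip neigh show line layout in which the hardware address follows the lladdr keyword.

```shell
# has_neighbor_conflict IP EXPECTED_MAC
# Reads `ip neigh show` output on stdin.
# Returns 0 when IP is held by a MAC other than EXPECTED_MAC, 1 otherwise.
has_neighbor_conflict() {
    local ip expected
    ip=$1
    expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    awk -v ip="$ip" -v mac="$expected" '
        $1 == ip {
            for (i = 2; i < NF; i++)
                if ($i == "lladdr" && tolower($(i + 1)) != mac) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# Hypothetical usage before recreating the VM:
#   ip neigh show | has_neighbor_conflict 192.168.8.2 aa:bb:cc:04:05:01 \
#       && echo "another device holds 192.168.8.2"
```

Entries without a hardware address, such as FAILED or INCOMPLETE neighbors, are treated as no conflict.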
13. Terraform Validation, Plan, and Apply
First execution — build a new prod-vault-01
- Confirm the router reservation and vault.tfvars values.
- Push the reviewed change to the prod branch.
- The workflow authenticates to AWS through OIDC, initializes the S3 backend, plans the dedicated DNS root first, verifies DNS health, then plans the main production root.
- A non-destructive push applies automatically.
- Terraform creates or updates prod/terraform.tfstate in S3 with locking enabled.
- A new Vault VM is always sent through the real Ansible playbook after SSH becomes reachable.
Subsequent executions: update an existing prod-vault-01
- Push only the relevant Terraform or Ansible files.
- Terraform still refreshes and plans every time, so out-of-band VM deletion and missing state are detectable.
- Ansible file detection maps Vault-specific changes to the vault group.
- Existing hosts run --check; the real playbook runs only when check mode reports changes.
- Destructive Terraform actions remain blocked unless explicitly approved through a manual dispatch.
The exact always-plan logic is:
branch="${GITHUB_REF_NAME}"
apply_requested="false"
# Always evaluate the selected environment with Terraform whenever
# this workflow runs. Git diffs cannot detect VMs deleted directly
# in Proxmox, missing S3 state objects, or other infrastructure drift.
env_terraform_changed="true"
# Production DNS has a separate Terraform root and state. Evaluate it
# on every prod run so a missing DNS VM or missing DNS state is
# detected and repaired before the main production environment.
if [ "${branch}" = "prod" ]; then
  dns_terraform_changed="true"
else
  dns_terraform_changed="false"
fi
The workflow verifies the complete Vault file set before a production run:
if [ "${TF_ENV}" = "prod" ] && [ -f "${TF_WORKING_DIR}/vault.tfvars" ]; then
  required_vault_files=(
    "ansible/configure-vault.yml"
    "ansible/group_vars/vault.yml"
    "ansible/templates/vault.hcl.j2"
    "ansible/templates/vault-bootstrap-openssl.cnf.j2"
    "ansible/templates/vault-audit-logrotate.j2"
    "ansible/files/vault.service.d/override.conf"
    "ansible/files/vault/ac-vault-bootstrap-logical"
    "ansible/files/vault/ac-vault-prepare-cert-issuer"
    "ansible/files/vault/policies/platform-admin.hcl"
    "ansible/files/vault/policies/cert-issuer.hcl"
    "ansible/files/vault/policies/cert-reader-local.hcl"
    "ansible/files/vault/policies/cert-reader-dev.hcl"
    "ansible/files/vault/policies/cert-reader-qa.hcl"
    "ansible/files/vault/policies/cert-reader-prod.hcl"
  )
  for file in "${required_vault_files[@]}"; do
    if [ ! -f "${TF_WORKING_DIR}/${file}" ]; then
      echo "ERROR: Missing required Vault file ${TF_WORKING_DIR}/${file}"
      exit 1
    fi
  done
fi
The Vault required-file check must include ansible/files/vault/ac-vault-prepare-cert-issuer. Without that check, a certificate-only workflow can succeed while leaving the Vault-side handoff utility absent.
The production plan retains every existing variable file and appends vault.tfvars when present:
tf_var_args=(
  "-var-file=terraform.tfvars"
  "-var-file=web.tfvars"
  "-var-file=app.tfvars"
  "-var-file=db.tfvars"
  "-var-file=k8s.tfvars"
  "-var-file=runner.tfvars"
)
if [ -f "vault.tfvars" ]; then
  tf_var_args+=("-var-file=vault.tfvars")
fi
For local operator review on the self-hosted runner, use:
cd envs/prod
terraform init -input=false -reconfigure
terraform fmt -check -recursive
terraform validate
terraform plan \
  -input=false \
  -lock-timeout=5m \
  -var-file=terraform.tfvars \
  -var-file=web.tfvars \
  -var-file=app.tfvars \
  -var-file=db.tfvars \
  -var-file=k8s.tfvars \
  -var-file=runner.tfvars \
  -var-file=vault.tfvars \
  -out=tfplan
The production DNS root remains separate under envs/prod/dns; envs/prod/dns.tfvars must remain deleted.
14. Ansible Inventory and Targeting
envs/prod/ansible/inventory.ini contains the dedicated Vault group:
[web]
prod-web-01 ansible_host=192.168.8.122 ansible_user=acllc ansible_ssh_private_key_file=~/.ssh/id_ed25519_ansible ansible_python_interpreter=/usr/bin/python3 ansible_ssh_common_args='-o IdentitiesOnly=yes -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null'
[app]
[db]
[k8s]
[runner]
[vault]
prod-vault-01 ansible_host=192.168.8.2 ansible_user=acllc ansible_ssh_private_key_file=~/.ssh/id_ed25519_ansible ansible_python_interpreter=/usr/bin/python3 ansible_ssh_common_args='-o IdentitiesOnly=yes -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null'
envs/prod/ansible/configure-vms.yml preserves the working playbook order:
---
- import_playbook: bootstrap-vms.yml
- import_playbook: configure-web.yml
- import_playbook: configure-vault.yml
- import_playbook: finalize-firewall.yml
The workflow runs syntax validation and then targets only the selected host or component:
ansible-playbook \
  -i ansible/inventory.ini \
  ansible/configure-vms.yml \
  --limit prod-vault-01 \
  --syntax-check

ansible-playbook \
  -i ansible/inventory.ini \
  ansible/configure-vms.yml \
  --limit prod-vault-01 \
  --check
A new VM bypasses check-only convergence and receives the real playbook. An existing VM receives the real playbook only when --check predicts changes.
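The workflow's own change-detection step is not reproduced on this page. One possible way to turn check-mode output into an apply-or-skip decision is to read the PLAY RECAP counters, as sketched below; the function name is hypothetical and the approach is an assumption, not the workflow's confirmed logic.

```shell
# check_mode_decision
# Reads `ansible-playbook --check` output on stdin.
# Prints "apply" when any host reports changed > 0 and "skip" when none do.
# Returns 1 when no recap is found or a host is unreachable or failed.
check_mode_decision() {
    awk '
        /^PLAY RECAP/ { in_recap = 1; next }
        in_recap && /changed=/ {
            seen = 1
            for (i = 1; i <= NF; i++) {
                split($i, kv, "=")
                if (kv[1] == "changed" && kv[2] > 0) changed = 1
                if ((kv[1] == "unreachable" || kv[1] == "failed") && kv[2] > 0) bad = 1
            }
        }
        END {
            if (!seen || bad) exit 1
            print (changed ? "apply" : "skip")
        }
    '
}

# Hypothetical usage from envs/prod:
#   decision=$(ansible-playbook -i ansible/inventory.ini ansible/configure-vms.yml \
#       --limit prod-vault-01 --check | check_mode_decision) || exit 1
#   [ "$decision" = "apply" ] && echo "run the real playbook"
```

Treating a failed or unreachable host as an error rather than as "no change" keeps a broken check run from silently skipping the apply.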
15. Common Operating-System Baseline
The common bootstrap and Vault playbook enforce or verify:
Hostname: prod-vault-01
SSH and passwordless sudo for acllc
QEMU Guest Agent from the VM template
Internal DNS resolution through prod-dns-01
UFW installed and low-volume logging enabled
Default incoming: deny
Default routed: deny
Default outgoing: allow
Swap disabled
Vault system account and group present
Vault directories with restrictive ownership and modes
Core dumps disabled
Vault service enabled and active
TLS listener reachable on 127.0.0.1:8200
Useful checks:
sudo hostnamectl --static
sudo ip -brief address
sudo ip route
sudo resolvectl status
sudo swapon --show
sudo systemctl is-active qemu-guest-agent ssh vault
sudo ufw status verbose
The playbook does not run unrelated application roles against the Vault host.
16. Configure Internal DNS on prod-vault-01
The common environment configuration must resolve internal names through prod-dns-01 at 192.168.8.4. Public resolvers belong behind BIND forwarding, not beside the internal resolver on the Vault VM.
Validate on prod-vault-01:
sudo resolvectl status
sudo getent ahostsv4 vault.aspireclan.com
sudo getent ahostsv4 github.com
sudo dig @192.168.8.4 vault.aspireclan.com
Expected Vault service address:
vault.aspireclan.com → 192.168.8.2
The GitHub Actions workflow performs the dedicated DNS job and mandatory health gate before the main production environment job.
17. UFW Policy
The working Vault variables permit API/UI access from the internal LAN during bootstrap:
vault_api_allowed_cidrs:
- "192.168.8.0/24"
vault_cluster_allowed_cidrs: []
The playbook adds only the required rules; it does not reset UFW:
TCP 22: allowed by the common management baseline
TCP 8200: allowed from 192.168.8.0/24
TCP 8201: no allow rule in the single-node phase
Default incoming: deny
Default routed: deny
Default outgoing: allow
Validate:
sudo ufw status verbose
sudo ss -lntp | grep -E ':(8200|8201)\b'
After network segmentation and proxy deployment, narrow vault_api_allowed_cidrs to the administrator, certificate issuer, and approved proxy source addresses.
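The expected rule set can be asserted mechanically from the ufw status output. The sketch below assumes the standard table layout, with the port in the first column and the action in the second, and rules written without an explicit destination address; the function name is hypothetical.

```shell
# ufw_port_allowed PORT
# Reads `ufw status` or `ufw status verbose` output on stdin.
# Returns 0 when an ALLOW rule exists for PORT, 1 otherwise.
ufw_port_allowed() {
    awk -v port="$1" '
        { split($1, p, "/") }
        p[1] == port && $2 == "ALLOW" { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Hypothetical usage on prod-vault-01:
#   sudo ufw status | ufw_port_allowed 8200 || echo "API port is not open"
#   sudo ufw status | ufw_port_allowed 8201 && echo "cluster port must stay closed"
```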
18. Install Vault Community Edition
Vault is installed by Ansible from HashiCorp's official APT repository. The playbook uses the modern deb822 repository module and verifies the HashiCorp signing-key fingerprint before package installation.
Pinned collection requirements:
---
collections:
  - name: community.general
    version: "13.0.1"
  - name: community.crypto
    version: "3.2.1"
The working installation sequence is part of the complete playbook in section 20 and includes:
ca-certificates
curl
gpg
jq
logrotate
openssl
python3-cryptography
python3-debian
HashiCorp key fingerprint:
798AEC654E5C15428C8E42EEAA16FCBCA621E701
Repository:
https://apt.releases.hashicorp.com
Package:
vault
Verify on the VM:
sudo /usr/bin/vault version
sudo apt-cache policy vault
sudo systemctl cat vault
The tested installation reported Vault v2.0.2. Do not hard-code a future version in Terraform state or expose package operations to unattended major-version changes without snapshot and recovery preparation.
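To notice an unexpected package upgrade early, the reported version can be compared with the tested one. This sketch assumes that vault version prints a first line beginning with "Vault v" followed by the version number; the function name is hypothetical.

```shell
# parse_vault_version
# Reads `vault version` output on stdin and prints the bare version number.
parse_vault_version() {
    sed -n 's/^Vault v\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Hypothetical usage on prod-vault-01:
#   installed=$(vault version | parse_vault_version)
#   [ "$installed" = "2.0.2" ] || echo "installed version differs from the tested baseline"
```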
19. Generate the Bootstrap CA and Vault Server Certificate
The earlier controller-generated private CA design was superseded by the working Ansible implementation. The current phase-one certificate is generated directly on prod-vault-01 only when no certificate exists.
Authoritative variables:
# Vault server identity and network addresses.
vault_service_name: vault
vault_user: vault
vault_group: vault
vault_node_id: prod-vault-01
vault_ip_address: "192.168.8.2"
vault_api_addr: "https://192.168.8.2:8200"
vault_cluster_addr: "https://192.168.8.2:8201"
# Integrated Storage (Raft) and TLS filesystem paths.
vault_data_dir: /opt/vault/data
vault_tls_dir: /opt/vault/tls
vault_tls_private_key_path: /opt/vault/tls/vault-key.pem
vault_tls_csr_path: /opt/vault/tls/vault.csr
vault_tls_certificate_path: /opt/vault/tls/vault-cert.pem
vault_config_path: /etc/vault.d/vault.hcl
vault_audit_log_dir: /var/log/vault
vault_audit_log_path: /var/log/vault/audit.log
# This bootstrap certificate is created only when no certificate already exists.
# The certificate issuer phase will later replace the files at the same paths and
# signal Vault with SIGHUP, avoiding a Vault restart and reseal.
vault_bootstrap_tls_common_name: vault.aspireclan.com
vault_bootstrap_tls_dns_sans:
- vault.aspireclan.com
- prod-vault-01
vault_bootstrap_tls_ip_sans:
- "192.168.8.2"
vault_bootstrap_tls_validity: "+365d"
# Vault API access. Keep this narrow and replace the LAN CIDR with internal
# proxy/automation CIDRs after network segmentation is introduced.
vault_api_allowed_cidrs:
- "192.168.8.0/24"
# Port 8201 is not opened for a single-node deployment. Add only future Vault
# peer CIDRs here when a multi-node Raft cluster is approved.
vault_cluster_allowed_cidrs: []
vault_obsolete_ufw_rules: []
# Logical configuration defaults applied only after manual initialization.
vault_cert_mount: cert
vault_cert_environments:
- local
- dev
- qa
- prod
vault_cert_workload_types:
- web
- srvc
- job
- infra
Behavior:
- The 4096-bit RSA private key is preserved with regenerate: never.
- Check mode predicts missing TLS material but does not attempt to consume a CSR that has not been written.
- The real run creates or repairs the CSR, then creates the self-signed certificate.
- The certificate contains the approved DNS and IP SANs.
- The key and certificate are checked for a matching public key.
- An existing certificate without its matching private key stops the playbook rather than silently generating a replacement.
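The behavior listed above can be read as a small decision table. The following shell sketch is illustrative only and is not part of the playbook; the function name tls_bootstrap_action and its yes/no arguments are hypothetical.

```bash
# Sketch of the bootstrap TLS decision described above (not the playbook itself).
# $1 = certificate exists, $2 = private key exists, $3 = check mode (yes/no)
tls_bootstrap_action() {
  if [ "$1" = yes ] && [ "$2" = no ]; then
    echo "stop"       # certificate without its key: never generate a replacement
  elif [ "$1" = yes ]; then
    echo "keep"       # existing pair is left alone, then checked for a matching public key
  elif [ "$3" = yes ]; then
    echo "predict"    # check mode only reports what the real run would create
  else
    echo "generate"   # reuse any existing key, create or repair the CSR, then self-sign
  fi
}

tls_bootstrap_action no yes no   # prints: generate
```

The first branch is evaluated before the check-mode branch because the playbook's stop condition applies in both check mode and a real run.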
File layout:
/opt/vault/tls/
├── vault-key.pem vault:vault 0600
├── vault.csr vault:vault 0640
└── vault-cert.pem vault:vault 0644
This certificate is temporary bootstrap trust. The certificate issuer phase will replace the files at the same paths through a controlled local deployment and then signal Vault with SIGHUP, which reloads the listener certificate without a restart or reseal.
20. Deploy TLS Files and Vault Configuration
envs/prod/ansible/templates/vault.hcl.j2:
ui = true
api_addr = "{{ vault_api_addr }}"
cluster_addr = "{{ vault_cluster_addr }}"
disable_mlock = true
# Short defaults reduce the lifetime of accidentally exposed dynamic tokens.
default_lease_ttl = "1h"
max_lease_ttl = "24h"
log_level = "info"
log_format = "json"
storage "raft" {
path = "{{ vault_data_dir }}"
node_id = "{{ vault_node_id }}"
}
listener "tcp" {
address = "0.0.0.0:8200"
cluster_address = "0.0.0.0:8201"
tls_cert_file = "{{ vault_tls_certificate_path }}"
tls_key_file = "{{ vault_tls_private_key_path }}"
tls_min_version = "tls12"
}
envs/prod/ansible/files/vault.service.d/override.conf:
[Service]
LimitCORE=0
LimitNOFILE=1048576
Environment=VAULT_ENABLE_FILE_PERMISSIONS_CHECK=true
envs/prod/ansible/templates/vault-audit-logrotate.j2:
{{ vault_audit_log_path }} {
daily
rotate 30
# maxsize rotates early at 100M without cancelling the daily schedule.
maxsize 100M
missingok
notifempty
compress
delaycompress
dateext
create 0600 {{ vault_user }} {{ vault_group }}
sharedscripts
postrotate
/bin/systemctl kill --kill-who=main --signal=HUP {{ vault_service_name }}.service >/dev/null 2>&1 || true
endscript
}
The complete working envs/prod/ansible/configure-vault.yml, including all check-mode fixes, the strict Boolean condition fix, TLS safety checks, and correctly registered handlers, is:
---
- name: Configure and harden HashiCorp Vault on prod-vault-01
hosts: vault
become: true
gather_facts: true
pre_tasks:
- name: Validate required Vault variables
ansible.builtin.assert:
that:
- vault_service_name is defined
- vault_user is defined
- vault_group is defined
- vault_node_id is defined
- vault_api_addr is defined
- vault_cluster_addr is defined
- vault_data_dir is defined
- vault_tls_dir is defined
- vault_tls_private_key_path is defined
- vault_tls_certificate_path is defined
- vault_config_path is defined
- vault_audit_log_dir is defined
- vault_audit_log_path is defined
- vault_bootstrap_tls_common_name is defined
- vault_bootstrap_tls_dns_sans is defined
- vault_bootstrap_tls_ip_sans is defined
- vault_api_allowed_cidrs is defined
- vault_cluster_allowed_cidrs is defined
- vault_obsolete_ufw_rules is defined
fail_msg: Verify ansible/group_vars/vault.yml before continuing.
- name: Resolve derived Vault TLS paths
ansible.builtin.set_fact:
vault_tls_csr_path_resolved: >-
{{
vault_tls_csr_path
| default(vault_tls_dir ~ '/vault.csr', true)
}}
- name: Check whether the Vault binary exists before configuration
ansible.builtin.stat:
path: /usr/bin/vault
register: vault_binary_before
- name: Record first-install check-mode state
ansible.builtin.set_fact:
vault_first_install_check_mode: >-
{{ ansible_check_mode and not vault_binary_before.stat.exists }}
- name: Predict first-time Vault installation during check mode
ansible.builtin.debug:
msg: >-
HashiCorp Vault is not installed. The real Ansible run will add the
official HashiCorp APT repository, install Vault, create the TLS
bootstrap material, configure Integrated Storage, and start Vault.
changed_when: true
when: vault_first_install_check_mode
tasks:
- name: Install HashiCorp repository and Vault prerequisites
ansible.builtin.apt:
name:
- ca-certificates
- curl
- gpg
- jq
- logrotate
- openssl
- python3-cryptography
- python3-debian
state: present
update_cache: true
cache_valid_time: 3600
when: not ansible_check_mode
- name: Download the official HashiCorp package-signing key
ansible.builtin.get_url:
url: https://apt.releases.hashicorp.com/gpg
dest: /usr/share/keyrings/hashicorp-archive-keyring.asc
owner: root
group: root
mode: "0644"
register: hashicorp_signing_key
when: not ansible_check_mode
- name: Check whether the HashiCorp binary keyring exists
ansible.builtin.stat:
path: /usr/share/keyrings/hashicorp-archive-keyring.gpg
register: hashicorp_binary_keyring
when: not ansible_check_mode
- name: Convert the HashiCorp signing key to a binary keyring
ansible.builtin.command:
argv:
- gpg
- --batch
- --yes
- --dearmor
- --output
- /usr/share/keyrings/hashicorp-archive-keyring.gpg
- /usr/share/keyrings/hashicorp-archive-keyring.asc
changed_when: true
when:
- not ansible_check_mode
- >-
hashicorp_signing_key.changed or
not hashicorp_binary_keyring.stat.exists
- name: Read the HashiCorp package-signing key fingerprint
ansible.builtin.command:
argv:
- gpg
- --batch
- --no-default-keyring
- --keyring
- /usr/share/keyrings/hashicorp-archive-keyring.gpg
- --with-colons
- --fingerprint
check_mode: false
changed_when: false
register: hashicorp_key_fingerprint
when: not ansible_check_mode
- name: Verify the official HashiCorp package-signing key fingerprint
ansible.builtin.assert:
that:
- >-
'798AEC654E5C15428C8E42EEAA16FCBCA621E701' in
(hashicorp_key_fingerprint.stdout | replace(' ', ''))
fail_msg: >-
The downloaded HashiCorp signing key fingerprint is not the expected
official fingerprint. Vault installation has been stopped.
when: not ansible_check_mode
- name: Remove the legacy HashiCorp one-line APT repository file
ansible.builtin.file:
path: /etc/apt/sources.list.d/hashicorp.list
state: absent
when: not ansible_check_mode
- name: Configure the official HashiCorp deb822 APT repository
ansible.builtin.deb822_repository:
name: hashicorp
types:
- deb
uris:
- https://apt.releases.hashicorp.com
suites:
- "{{ ansible_facts['distribution_release'] }}"
components:
- main
signed_by: /usr/share/keyrings/hashicorp-archive-keyring.gpg
enabled: true
state: present
when: not ansible_check_mode
- name: Install HashiCorp Vault
ansible.builtin.apt:
name: vault
state: present
update_cache: true
when: not ansible_check_mode
- name: Verify the Vault binary exists after installation
ansible.builtin.stat:
path: /usr/bin/vault
register: vault_binary_after
- name: Assert that the Vault binary is installed
ansible.builtin.assert:
that:
- vault_binary_after.stat.exists
- vault_binary_after.stat.isreg
- vault_binary_after.stat.executable
fail_msg: HashiCorp Vault was not installed at /usr/bin/vault.
when: not vault_first_install_check_mode
- name: Configure the Vault runtime and server
when: not vault_first_install_check_mode
block:
- name: Read the installed Vault version
ansible.builtin.command: /usr/bin/vault version
check_mode: false
changed_when: false
register: vault_version_result
- name: Default the existing initialization state to false
ansible.builtin.set_fact:
vault_was_initialized: false
- name: Read current Vault status before configuration
ansible.builtin.command: /usr/bin/vault status -format=json
environment:
VAULT_ADDR: "{{ vault_api_addr }}"
VAULT_CACERT: "{{ vault_tls_certificate_path }}"
check_mode: false
changed_when: false
failed_when: false
register: vault_status_before
when: not ansible_check_mode
- name: Record whether Vault was already initialized
ansible.builtin.set_fact:
vault_was_initialized: >-
{{
(
vault_status_before.stdout
| from_json
).initialized
| default(false)
}}
when:
- not ansible_check_mode
- >-
(
vault_status_before.stdout
| default('')
| trim
| regex_search('^\{')
) is not none
- name: Verify the Vault service account exists
ansible.builtin.getent:
database: passwd
key: "{{ vault_user }}"
- name: Verify the Vault service group exists
ansible.builtin.getent:
database: group
key: "{{ vault_group }}"
- name: Create Vault directories with restrictive permissions
ansible.builtin.file:
path: "{{ item.path }}"
state: directory
owner: "{{ item.owner }}"
group: "{{ item.group }}"
mode: "{{ item.mode }}"
loop:
- path: /etc/vault.d
owner: "{{ vault_user }}"
group: "{{ vault_group }}"
mode: "0700"
- path: "{{ vault_data_dir }}"
owner: "{{ vault_user }}"
group: "{{ vault_group }}"
mode: "0700"
- path: "{{ vault_tls_dir }}"
owner: "{{ vault_user }}"
group: "{{ vault_group }}"
mode: "0700"
- path: "{{ vault_audit_log_dir }}"
owner: "{{ vault_user }}"
group: "{{ vault_group }}"
mode: "0750"
- path: /etc/systemd/system/vault.service.d
owner: root
group: root
mode: "0755"
- path: /usr/local/share/ac-vault/policies
owner: root
group: root
mode: "0750"
- name: Read active swap devices
ansible.builtin.command: swapon --show=NAME --noheadings
check_mode: false
changed_when: false
register: vault_swap_devices
- name: Disable active swap for Vault hardening
ansible.builtin.command: swapoff -a
when:
- not ansible_check_mode
- vault_swap_devices.stdout | trim | length > 0
changed_when: true
- name: Disable persistent swap entries
ansible.builtin.replace:
path: /etc/fstab
regexp: '^(?!#)(\s*\S+\s+\S+\s+swap\s+.*)$'
replace: '# Disabled for Vault: \1'
backup: true
- name: Check whether the Vault TLS private key exists
ansible.builtin.stat:
path: "{{ vault_tls_private_key_path }}"
register: vault_tls_private_key_before
- name: Check whether the Vault TLS CSR exists
ansible.builtin.stat:
path: "{{ vault_tls_csr_path_resolved }}"
register: vault_tls_csr_before
- name: Check whether the Vault TLS certificate exists
ansible.builtin.stat:
path: "{{ vault_tls_certificate_path }}"
register: vault_tls_certificate_before
- name: Stop when a certificate exists without its matching private key
ansible.builtin.assert:
that:
- >-
not (
vault_tls_certificate_before.stat.exists and
not vault_tls_private_key_before.stat.exists
)
fail_msg: >-
The Vault TLS certificate exists but the private key is missing.
Restore the matching private key before continuing. Generating a
replacement key would make the existing certificate unusable.
- name: Predict bootstrap Vault TLS material creation during check mode
ansible.builtin.debug:
msg: >-
The bootstrap Vault TLS certificate is absent. The real Ansible
run will reuse any existing private key, create or repair the CSR,
and generate the self-signed bootstrap certificate.
changed_when: true
when:
- ansible_check_mode
- not vault_tls_certificate_before.stat.exists
- name: Generate or repair bootstrap Vault TLS material
when:
- not ansible_check_mode
- not vault_tls_certificate_before.stat.exists
block:
- name: Generate the bootstrap Vault TLS private key when absent
community.crypto.openssl_privatekey:
path: "{{ vault_tls_private_key_path }}"
type: RSA
size: 4096
owner: "{{ vault_user }}"
group: "{{ vault_group }}"
mode: "0600"
regenerate: never
notify: Reload Vault TLS
- name: Generate or repair the bootstrap Vault TLS CSR
community.crypto.openssl_csr:
path: "{{ vault_tls_csr_path_resolved }}"
privatekey_path: "{{ vault_tls_private_key_path }}"
common_name: "{{ vault_bootstrap_tls_common_name }}"
organization_name: Aspireclan LLC
subject_alt_name: >-
{{
(
vault_bootstrap_tls_dns_sans
| map('regex_replace', '^(.*)$', 'DNS:\1')
| list
)
+
(
vault_bootstrap_tls_ip_sans
| map('regex_replace', '^(.*)$', 'IP:\1')
| list
)
}}
key_usage:
- digitalSignature
- keyEncipherment
extended_key_usage:
- serverAuth
owner: "{{ vault_user }}"
group: "{{ vault_group }}"
mode: "0640"
backup: true
notify: Reload Vault TLS
- name: Generate the bootstrap self-signed Vault TLS certificate
community.crypto.x509_certificate:
path: "{{ vault_tls_certificate_path }}"
csr_path: "{{ vault_tls_csr_path_resolved }}"
privatekey_path: "{{ vault_tls_private_key_path }}"
provider: selfsigned
selfsigned_not_before: "-5m"
selfsigned_not_after: "{{ vault_bootstrap_tls_validity }}"
ignore_timestamps: true
owner: "{{ vault_user }}"
group: "{{ vault_group }}"
mode: "0644"
backup: true
notify: Reload Vault TLS
- name: Read resulting Vault TLS private-key state
ansible.builtin.stat:
path: "{{ vault_tls_private_key_path }}"
register: vault_tls_private_key_after
- name: Read resulting Vault TLS certificate state
ansible.builtin.stat:
path: "{{ vault_tls_certificate_path }}"
register: vault_tls_certificate_after
- name: Assert required Vault TLS files exist
ansible.builtin.assert:
that:
- vault_tls_private_key_after.stat.exists
- vault_tls_private_key_after.stat.isreg
- vault_tls_certificate_after.stat.exists
- vault_tls_certificate_after.stat.isreg
fail_msg: >-
Vault TLS material is incomplete. The private key and certificate
must both exist before Vault can start.
when:
- not ansible_check_mode or vault_tls_certificate_before.stat.exists
- name: Verify the Vault certificate and private key match
ansible.builtin.shell: |
set -euo pipefail
cert_public_key_hash="$(
openssl x509 \
-in {{ vault_tls_certificate_path | quote }} \
-pubkey \
-noout | \
openssl pkey -pubin -outform DER | \
sha256sum | awk '{print $1}'
)"
private_public_key_hash="$(
openssl pkey \
-in {{ vault_tls_private_key_path | quote }} \
-pubout \
-outform DER | \
sha256sum | awk '{print $1}'
)"
test "${cert_public_key_hash}" = "${private_public_key_hash}"
openssl x509 \
-in {{ vault_tls_certificate_path | quote }} \
-noout \
-checkend 3600
args:
executable: /bin/bash
check_mode: false
changed_when: false
when:
- vault_tls_certificate_after.stat.exists
- vault_tls_private_key_after.stat.exists
- name: Deploy the Vault server configuration
ansible.builtin.template:
src: vault.hcl.j2
dest: "{{ vault_config_path }}"
owner: "{{ vault_user }}"
group: "{{ vault_group }}"
mode: "0600"
backup: true
notify: Restart Vault configuration
- name: Deploy the Vault systemd hardening override
ansible.builtin.copy:
src: vault.service.d/override.conf
dest: /etc/systemd/system/vault.service.d/override.conf
owner: root
group: root
mode: "0644"
notify: Restart Vault configuration
- name: Prepare the Vault audit log file
ansible.builtin.file:
path: "{{ vault_audit_log_path }}"
state: touch
owner: "{{ vault_user }}"
group: "{{ vault_group }}"
mode: "0600"
modification_time: preserve
access_time: preserve
- name: Install Vault audit log rotation
ansible.builtin.template:
src: vault-audit-logrotate.j2
dest: /etc/logrotate.d/vault-audit
owner: root
group: root
mode: "0644"
- name: Install Vault logical bootstrap utility
ansible.builtin.copy:
src: vault/ac-vault-bootstrap-logical
dest: /usr/local/sbin/ac-vault-bootstrap-logical
owner: root
group: root
mode: "0750"
- name: Install Vault certificate issuer preparation utility
ansible.builtin.copy:
src: vault/ac-vault-prepare-cert-issuer
dest: /usr/local/sbin/ac-vault-prepare-cert-issuer
owner: root
group: root
mode: "0750"
- name: Install version-controlled Vault policies
ansible.builtin.copy:
src: "vault/policies/{{ item }}"
dest: "/usr/local/share/ac-vault/policies/{{ item }}"
owner: root
group: root
mode: "0640"
loop:
- platform-admin.hcl
- cert-issuer.hcl
- cert-reader-local.hcl
- cert-reader-dev.hcl
- cert-reader-qa.hcl
- cert-reader-prod.hcl
- name: Remove obsolete Vault UFW rules
community.general.ufw:
rule: "{{ item.rule | default('allow') }}"
src: "{{ item.src }}"
to_port: "{{ item.port }}"
proto: "{{ item.proto | default('tcp') }}"
delete: true
loop: "{{ vault_obsolete_ufw_rules }}"
when: vault_obsolete_ufw_rules | length > 0
- name: Allow approved clients to reach the Vault API
community.general.ufw:
rule: allow
src: "{{ item }}"
to_port: "8200"
proto: tcp
comment: Allow approved clients to reach Vault API
loop: "{{ vault_api_allowed_cidrs }}"
- name: Allow approved Vault peers to reach the Raft cluster port
community.general.ufw:
rule: allow
src: "{{ item }}"
to_port: "8201"
proto: tcp
comment: Allow approved Vault Raft peers
loop: "{{ vault_cluster_allowed_cidrs }}"
when: vault_cluster_allowed_cidrs | length > 0
- name: Flush Vault configuration handlers
ansible.builtin.meta: flush_handlers
- name: Ensure Vault is enabled and running
ansible.builtin.systemd_service:
name: "{{ vault_service_name }}"
enabled: true
state: started
daemon_reload: true
when: not ansible_check_mode
- name: Wait for the Vault TLS listener
ansible.builtin.wait_for:
host: 127.0.0.1
port: 8200
timeout: 60
when: not ansible_check_mode
- name: Read Vault status after configuration
ansible.builtin.command: /usr/bin/vault status -format=json
environment:
VAULT_ADDR: "{{ vault_api_addr }}"
VAULT_CACERT: "{{ vault_tls_certificate_path }}"
check_mode: false
changed_when: false
failed_when: vault_status_after.rc not in [0, 2]
register: vault_status_after
when: not ansible_check_mode
- name: Verify the Vault service is active
ansible.builtin.command:
argv:
- systemctl
- is-active
- --quiet
- "{{ vault_service_name }}.service"
check_mode: false
changed_when: false
when: not ansible_check_mode
- name: Show the non-secret Vault readiness summary
ansible.builtin.debug:
msg: >-
Vault {{ vault_version_result.stdout | trim }} is listening with TLS,
Integrated Storage, swap disabled, core dumps disabled, audit-log
storage prepared, and fail-closed firewall rules. Initialization and
unseal remain a direct, interactive Ubuntu server operation.
handlers:
- name: Reload Vault TLS
ansible.builtin.command:
argv:
- systemctl
- kill
- --kill-who=main
- --signal=HUP
- "{{ vault_service_name }}.service"
changed_when: true
when:
- not ansible_check_mode
- vault_was_initialized | default(false) | bool
- name: Block automatic restart of an initialized Shamir-sealed Vault
ansible.builtin.fail:
msg: >-
Vault was already initialized and vault.hcl or its systemd unit
changed. Automatic restart is blocked because it would reseal the
server. Schedule a controlled restart and unseal operation.
when:
- not ansible_check_mode
- vault_was_initialized | default(false) | bool
listen: Restart Vault configuration
- name: Restart uninitialized Vault safely
ansible.builtin.systemd_service:
name: "{{ vault_service_name }}"
state: restarted
enabled: true
daemon_reload: true
when:
- not ansible_check_mode
- not (vault_was_initialized | default(false) | bool)
listen: Restart Vault configuration
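The three handlers at the end of the playbook resolve differently depending on what changed and whether Vault was already initialized. The shell sketch below summarizes that logic as a reading aid under stated assumptions; handler_action and its arguments are hypothetical names, not part of the playbook.

```bash
# Sketch of how the playbook handlers resolve (illustrative only).
# $1 = what changed: "tls" (key, CSR, or certificate) or "config" (vault.hcl or systemd override)
# $2 = Vault already initialized (yes/no), $3 = check mode (yes/no)
handler_action() {
  if [ "$3" = yes ]; then
    echo "skip"        # every handler is guarded by "not ansible_check_mode"
  elif [ "$1" = tls ] && [ "$2" = yes ]; then
    echo "sighup"      # reload TLS material without restarting or resealing
  elif [ "$1" = tls ]; then
    echo "none"        # uninitialized Vault picks up the files when the service starts
  elif [ "$2" = yes ]; then
    echo "fail"        # a restart would reseal Vault, so the operator must schedule it
  else
    echo "restart"     # nothing to reseal before initialization
  fi
}

handler_action config yes no   # prints: fail
```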
21. Start Vault and Perform Pre-Initialization Validation
The commands in this section were tested directly on prod-vault-01. Because /opt/vault/tls is intentionally mode 0700, privileged service, TLS, listener, and Vault CLI checks use sudo.
The current prod-vault-01 instance is already initialized. Retain the pre-initialization checks below as the rebuild/recovery runbook, but do not run vault operator init again against the existing Raft data.
Run:
sudo systemctl status vault --no-pager
sudo journalctl -u vault -n 100 --no-pager
sudo systemctl is-active vault
sudo test -s /opt/vault/tls/vault-key.pem &&
echo "PASS: TLS private key exists."
sudo test -s /opt/vault/tls/vault-cert.pem &&
echo "PASS: TLS certificate exists."
sudo openssl x509 \
-in /opt/vault/tls/vault-cert.pem \
-noout \
-checkend 3600 &&
echo "PASS: TLS certificate is currently valid."
sudo openssl x509 \
-in /opt/vault/tls/vault-cert.pem \
-noout \
-subject \
-issuer \
-dates \
-ext subjectAltName
sudo ss -lntH '( sport = :8200 )' | grep LISTEN
sudo ss -lntp | grep -E ':(8200|8201)\b'
sudo ufw status verbose
Verify the protected TLS-path permissions without weakening them:
sudo namei -l /opt/vault/tls/vault-cert.pem
sudo stat -c '%A %a %U:%G %n' \
/opt/vault/tls \
/opt/vault/tls/vault-cert.pem \
/opt/vault/tls/vault-key.pem
Expected security posture:
/opt/vault/tls 0700 vault:vault
vault-cert.pem 0644 vault:vault
vault-key.pem 0600 vault:vault
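To turn the expected posture into a pass/fail check, a small helper can compare the stat output with the expected values. This is a sketch that assumes GNU stat, as used by the commands above; the function name check_posture is hypothetical. Note that stat prints the mode as 700, 644, or 600, without the leading zero.

```bash
# Compare a path's mode and owner with the expected posture (GNU stat assumed).
# $1 = path, $2 = expected octal mode without leading zero, $3 = expected owner:group
check_posture() {
  actual="$(stat -c '%a %U:%G' "$1")" || return 2
  if [ "$actual" = "$2 $3" ]; then
    echo "PASS: $1 is $actual"
  else
    echo "FAIL: $1 is $actual, expected $2 $3" >&2
    return 1
  fi
}

# Example, run from a root shell because /opt/vault/tls is mode 0700:
#   check_posture /opt/vault/tls 700 vault:vault
#   check_posture /opt/vault/tls/vault-cert.pem 644 vault:vault
#   check_posture /opt/vault/tls/vault-key.pem 600 vault:vault
```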
Check Vault through the protected CA file:
sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault status
Expected before first initialization:
Initialized false
Sealed true
Storage Type raft
The vault status command returns exit code 2 when the server is sealed, which includes the uninitialized state; the displayed state is the important result. Stop immediately if a rebuild candidate unexpectedly reports Initialized true.
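When the status check is scripted, the exit code has to be captured rather than treated as a failure. A minimal sketch, assuming the documented vault status exit codes (0 unsealed, 2 sealed, anything else an error); the function name describe_status_rc is hypothetical.

```bash
# Map the exit code of `vault status` to a state label.
describe_status_rc() {
  case "$1" in
    0) echo "unsealed" ;;
    2) echo "sealed-or-uninitialized" ;;
    *) echo "error" ;;
  esac
}

# Usage on the server (not executed here):
#   rc=0
#   sudo -H env VAULT_ADDR="https://192.168.8.2:8200" \
#     VAULT_CACERT="/opt/vault/tls/vault-cert.pem" vault status || rc=$?
#   describe_status_rc "$rc"
```

Capturing the code with `|| rc=$?` keeps a script that runs under `set -e` from aborting on the expected exit code 2.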
22. Initialize Vault with Five Shares and a Threshold of Three
The current prod-vault-01 instance has already been initialized with five shares and threshold three. The procedure below is retained as the authoritative rebuild runbook. Never run the initialization command again against the existing /opt/vault/data Raft data.
Perform this section once, directly on a new or restored-but-uninitialized prod-vault-01. Disable screen sharing and terminal recording. Do not use tee, output redirection, a transcript, or a screenshot.
Command 1 — verify state
sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault status
Proceed only when Initialized is false.
Command 2 — initialize and display the six recovery values
sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault operator init -key-shares=5 -key-threshold=3
The terminal displays the following values exactly once; they cannot be retrieved again:
Unseal Key 1: <COPY_TO_SECURE_LOCATION_1>
Unseal Key 2: <COPY_TO_SECURE_LOCATION_2>
Unseal Key 3: <COPY_TO_SECURE_LOCATION_3>
Unseal Key 4: <COPY_TO_SECURE_LOCATION_4>
Unseal Key 5: <COPY_TO_SECURE_LOCATION_5>
Initial Root Token: <COPY_TO_TEMPORARY_SECURE_ROOT_TOKEN_LOCATION>
Copy each value carefully before continuing. Never run vault operator init again against the same Raft data.
Commands 3, 4, and 5 — submit any three different shares interactively
sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault operator unseal
# Paste Unseal Key 1 at the hidden prompt.
sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault operator unseal
# Paste Unseal Key 2 at the hidden prompt.
sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault operator unseal
# Paste Unseal Key 3 at the hidden prompt.
Do not append a share to the command line; interactive entry prevents it from entering shell history or process arguments.
Command 6 — verify
sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault status
Expected:
Initialized true
Sealed false
Storage Type raft
23. Secure and Verify Initialization Material
Create six separate records immediately:
prod-vault-01 — Unseal Share 1
prod-vault-01 — Unseal Share 2
prod-vault-01 — Unseal Share 3
prod-vault-01 — Unseal Share 4
prod-vault-01 — Unseal Share 5
prod-vault-01 — Initial Root Token
Preferred custody keeps every share separate. A practical single-operator minimum is:
Encrypted location A: shares 1 and 2
Encrypted location B: shares 3 and 4, stored separately from A
Encrypted location C: share 5
Temporary root-token record: separate from all share locations
No single storage location should contain three shares. Keep the five shares permanently. Keep the initial root token only until the named administrator and logical baseline are validated, then revoke it and delete any plaintext copy.
Suitable storage includes password-manager secure notes with strong MFA and separate BitLocker-encrypted offline media. Do not use Git, GitHub secrets, Terraform, Ansible variables, email, chat, screenshots, unencrypted USB drives, ordinary text files, or the Vault VM itself.
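The custody rule can be checked mechanically before the shares are distributed. The sketch below works only with share counts per location, never with the shares themselves; the function name custody_plan_ok and its argument order are hypothetical.

```bash
# Check a custody plan against the rule that no location may reach the threshold.
# $1 = total shares, $2 = threshold, remaining arguments = shares held per location
custody_plan_ok() {
  shares="$1"
  threshold="$2"
  shift 2
  total=0
  for count in "$@"; do
    if [ "$count" -ge "$threshold" ]; then
      echo "FAIL: one location holds $count shares (threshold $threshold)" >&2
      return 1
    fi
    total=$((total + count))
  done
  if [ "$total" -ne "$shares" ]; then
    echo "FAIL: plan covers $total shares, expected $shares" >&2
    return 1
  fi
  echo "PASS: $total shares, no location reaches the threshold"
}

# Locations A, B, and C from the single-operator minimum above.
custody_plan_ok 5 3 2 2 1
```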
After copying from the terminal:
- Re-read every value from its secure destination and compare the beginning and ending characters with the terminal output.
- Clear the client clipboard and clipboard history.
- Close the terminal tab to discard scrollback.
- Do not leave a PowerShell, PuTTY, SSH, or Proxmox-console transcript containing the values.
- Record only non-secret metadata: initialization date, share count 5, threshold 3, and custody location labels.
24. Unseal Vault and Perform the First Login
After initialization or any Vault service/VM restart, submit any three distinct shares interactively. Every command reads the protected CA file through sudo; no share appears in a command argument.
sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault operator unseal
sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault operator unseal
sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault operator unseal
Verify:
sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault status
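The repeated unseal and verify commands can be wrapped so that the operator is prompted only until Vault reports unsealed. This is a sketch, not a tested part of the runbook: vault_cmd and unseal_interactively are hypothetical helper names, and how `vault operator unseal` signals a rejected share through its exit code is treated as an assumption, so the loop relies on `vault status` instead.

```bash
# Wrapper for the environment used throughout this page.
vault_cmd() {
  sudo -H env \
    VAULT_ADDR="https://192.168.8.2:8200" \
    VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
    vault "$@"
}

# Prompt for shares until `vault status` exits 0 (unsealed).
# Each `operator unseal` call reads one share at Vault's hidden prompt,
# so no share is passed as an argument or stored in a variable.
# $1 = maximum number of shares to request (default 3)
unseal_interactively() {
  max_shares="${1:-3}"
  submitted=0
  while :; do
    rc=0
    vault_cmd status >/dev/null 2>&1 || rc=$?
    case "$rc" in
      0) echo "PASS: Vault is unsealed after $submitted share(s)."; return 0 ;;
      2) ;;  # sealed: request another share
      *) echo "ERROR: vault status failed with exit code $rc." >&2; return 1 ;;
    esac
    if [ "$submitted" -ge "$max_shares" ]; then
      echo "ERROR: still sealed after $submitted share(s)." >&2
      return 2
    fi
    vault_cmd operator unseal || echo "WARN: share submission returned non-zero." >&2
    submitted=$((submitted + 1))
  done
}
```

Calling `unseal_interactively 3` after a restart would request up to three shares and stop as soon as the threshold is met.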
For a new rebuild's first root login, avoid putting the token on the command line:
sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault login
# Paste the Initial Root Token at the hidden prompt.
sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault token lookup
sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault status
The current production instance no longer uses the initial root token; it has been revoked. Do not add VAULT_TOKEN to /etc/environment, .bashrc, shell profiles, systemd units, or scripts. Remove any temporary root Vault token file when the operation is complete:
sudo rm -f /root/.vault-token
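A quick scan can confirm that no persistent file mentions VAULT_TOKEN. The sketch below is one possible check; the function name find_vault_token_refs is hypothetical, and the file list is an assumption that should be adjusted to the accounts actually in use.

```bash
# Print each given file that mentions VAULT_TOKEN.
# Missing or unreadable files are skipped silently.
find_vault_token_refs() {
  for candidate in "$@"; do
    if [ -r "$candidate" ] && grep -q 'VAULT_TOKEN' "$candidate"; then
      echo "$candidate"
    fi
  done
}

# Files named in the rule above; no output means no reference was found.
find_vault_token_refs /etc/environment "${HOME:-/root}/.bashrc" "${HOME:-/root}/.profile"
```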
25. Enable Two Audit Devices
The earlier design proposed file and syslog audit devices. The working baseline currently enables one file audit device through the installed logical-bootstrap utility. A second audit device can be added later after its destination and failure behavior are approved.
The audit file and rotation are prepared by Ansible:
/var/log/vault/audit.log
owner: vault
group: vault
mode: 0600
rotation: daily, 30 files, 100M size threshold, compressed
The complete installed utility is:
#!/usr/bin/env bash
set -euo pipefail
umask 077
VAULT_ADDR="https://192.168.8.2:8200"
VAULT_CACERT="/opt/vault/tls/vault-cert.pem"
export VAULT_ADDR VAULT_CACERT
POLICY_DIR="/usr/local/share/ac-vault/policies"
AUDIT_LOG="/var/log/vault/audit.log"
CERT_MOUNT="cert"
payload_file="$(mktemp /run/ac-vault-bootstrap.XXXXXX.json)"
login_file=""
cleanup() {
unset VAULT_TOKEN ROOT_TOKEN ADMIN_PASSWORD ADMIN_TOKEN
rm -f "${payload_file}"
if [[ -n "${login_file}" ]]; then
rm -f "${login_file}"
fi
}
trap cleanup EXIT
cat > "${payload_file}"
chmod 0600 "${payload_file}"
ACTION="$(jq -r '.action // "configure"' "${payload_file}")"
ROOT_TOKEN="$(jq -r '.root_token // empty' "${payload_file}")"
ADMIN_USERNAME="$(jq -r '.admin_username // empty' "${payload_file}")"
ADMIN_PASSWORD="$(jq -r '.admin_password // empty' "${payload_file}")"
case "${ACTION}" in
configure|revoke-initial-root)
if [[ -z "${ROOT_TOKEN}" ]]; then
echo "ERROR: root_token is required for action '${ACTION}'." >&2
exit 1
fi
;;
*)
echo "ERROR: Unsupported action '${ACTION}'." >&2
exit 1
;;
esac
set +e
vault status -format=json >/dev/null 2>&1
status_rc=$?
set -e
if [[ "${status_rc}" -ne 0 ]]; then
if [[ "${status_rc}" -eq 2 ]]; then
echo "ERROR: Vault is sealed. Unseal it before logical configuration." >&2
else
echo "ERROR: Vault is not reachable at ${VAULT_ADDR}." >&2
fi
exit 1
fi
export VAULT_TOKEN="${ROOT_TOKEN}"
vault token lookup -format=json >/dev/null
write_policy() {
local policy_name="$1"
local policy_file="${POLICY_DIR}/${policy_name}.hcl"
[[ -f "${policy_file}" ]] || {
echo "ERROR: Missing policy file ${policy_file}." >&2
exit 1
}
vault policy write "${policy_name}" "${policy_file}" >/dev/null
}
enable_auth_if_missing() {
local auth_path="$1"
local auth_type="$2"
local auth_json actual_type
auth_json="$(vault auth list -format=json)"
if ! jq -e --arg path "${auth_path}/" 'has($path)' <<< "${auth_json}" >/dev/null; then
vault auth enable -path="${auth_path}" "${auth_type}" >/dev/null
return
fi
actual_type="$(jq -r --arg path "${auth_path}/" '.[$path].type // empty' <<< "${auth_json}")"
if [[ "${actual_type}" != "${auth_type}" ]]; then
echo "ERROR: auth/${auth_path} exists with type '${actual_type}', expected '${auth_type}'." >&2
exit 1
fi
}
configure_approle() {
local role_name="$1"
local policy_name="$2"
vault write "auth/approle/role/${role_name}" \
token_type=batch \
token_policies="${policy_name}" \
token_ttl=15m \
token_max_ttl=30m \
secret_id_ttl=720h \
secret_id_num_uses=0 >/dev/null
}
validate_logical_baseline() {
local audit_json secrets_json auth_json policy_name role_name
audit_json="$(vault audit list -format=json)"
secrets_json="$(vault secrets list -format=json)"
auth_json="$(vault auth list -format=json)"
jq -e 'has("file/")' <<< "${audit_json}" >/dev/null
jq -e --arg path "${CERT_MOUNT}/" \
'.[$path].type == "kv" and .[$path].options.version == "2"' \
<<< "${secrets_json}" >/dev/null
jq -e 'has("userpass/") and has("approle/")' <<< "${auth_json}" >/dev/null
for policy_name in \
platform-admin \
cert-issuer \
cert-reader-local \
cert-reader-dev \
cert-reader-qa \
cert-reader-prod
do
vault policy read "${policy_name}" >/dev/null
done
for role_name in \
cert-issuer \
cert-reader-local \
cert-reader-dev \
cert-reader-qa \
cert-reader-prod
do
vault read "auth/approle/role/${role_name}" >/dev/null
done
vault kv get "${CERT_MOUNT}/prod/infra/_schema" >/dev/null
}
if [[ "${ACTION}" == "configure" ]]; then
if [[ -z "${ADMIN_USERNAME}" || -z "${ADMIN_PASSWORD}" ]]; then
echo "ERROR: admin_username and admin_password are required for configure." >&2
exit 1
fi
touch "${AUDIT_LOG}"
chown vault:vault "${AUDIT_LOG}"
chmod 0600 "${AUDIT_LOG}"
if ! vault audit list -format=json | jq -e 'has("file/")' >/dev/null; then
vault audit enable -path=file file \
file_path="${AUDIT_LOG}" \
hmac_accessor=false \
elide_list_responses=true >/dev/null
fi
secrets_json="$(vault secrets list -format=json)"
if ! jq -e --arg path "${CERT_MOUNT}/" 'has($path)' <<< "${secrets_json}" >/dev/null; then
vault secrets enable -path="${CERT_MOUNT}" kv-v2 >/dev/null
else
existing_type="$(jq -r --arg path "${CERT_MOUNT}/" '.[$path].type // empty' <<< "${secrets_json}")"
existing_version="$(jq -r --arg path "${CERT_MOUNT}/" '.[$path].options.version // empty' <<< "${secrets_json}")"
if [[ "${existing_type}" != "kv" || "${existing_version}" != "2" ]]; then
echo "ERROR: ${CERT_MOUNT}/ exists but is not a KV v2 secrets engine." >&2
exit 1
fi
fi
vault write "${CERT_MOUNT}/config" \
max_versions=20 \
cas_required=false \
delete_version_after=0s >/dev/null
write_policy platform-admin
write_policy cert-issuer
write_policy cert-reader-local
write_policy cert-reader-dev
write_policy cert-reader-qa
write_policy cert-reader-prod
enable_auth_if_missing userpass userpass
enable_auth_if_missing approle approle
admin_payload="$(jq -n \
--arg password "${ADMIN_PASSWORD}" \
'{password:$password,token_policies:"platform-admin",token_ttl:"1h",token_max_ttl:"8h"}')"
printf '%s' "${admin_payload}" | vault write "auth/userpass/users/${ADMIN_USERNAME}" - >/dev/null
unset admin_payload ADMIN_PASSWORD
configure_approle cert-issuer cert-issuer
configure_approle cert-reader-local cert-reader-local
configure_approle cert-reader-dev cert-reader-dev
configure_approle cert-reader-qa cert-reader-qa
configure_approle cert-reader-prod cert-reader-prod
for environment in local dev qa prod; do
for workload_type in web srvc job infra; do
domain_example="${environment}.${workload_type}.example.invalid"
case "${environment}/${workload_type}" in
local/web) domain_example="local.fp.aspireclan.com" ;;
local/srvc) domain_example="local.api.fp.aspireclan.com" ;;
dev/web) domain_example="dev.fp.aspireclan.com" ;;
dev/srvc) domain_example="dev.api.fp.aspireclan.com" ;;
qa/web) domain_example="qa.fp.aspireclan.com" ;;
qa/srvc) domain_example="qa.api.fp.aspireclan.com" ;;
prod/web) domain_example="fp.aspireclan.com" ;;
prod/srvc) domain_example="api.fp.aspireclan.com" ;;
prod/infra) domain_example="vault.aspireclan.com" ;;
esac
schema_path="${CERT_MOUNT}/${environment}/${workload_type}/_schema"
if ! vault kv get -format=json "${schema_path}" >/dev/null 2>&1; then
vault kv put "${schema_path}" \
environment="${environment}" \
workload_type="${workload_type}" \
path_format="${CERT_MOUNT}/${environment}/${workload_type}/<workload-name>" \
domain_example="${domain_example}" \
certificate_fields="domain,certificate_pem,private_key_pem,chain_pem,fullchain_pem,sans_json,issuer,serial_number,not_before,not_after,renew_after,acme_directory,updated_at" \
schema_version="1" >/dev/null
fi
done
done
validate_logical_baseline
echo "PASS: Vault logical baseline configured and validated."
echo "Configured: file audit device, cert/ KV v2, userpass, AppRole, policies, roles, and certificate path schema."
echo "No AppRole SecretIDs were generated or printed."
elif [[ "${ACTION}" == "revoke-initial-root" ]]; then
if [[ -z "${ADMIN_USERNAME}" || -z "${ADMIN_PASSWORD}" ]]; then
echo "ERROR: admin_username and admin_password are required for root-token retirement." >&2
exit 1
fi
login_payload="$(jq -n --arg password "${ADMIN_PASSWORD}" '{password:$password}')"
login_file="$(mktemp /run/ac-vault-login.XXXXXX.json)"
printf '%s' "${login_payload}" > "${login_file}"
chmod 0600 "${login_file}"
ADMIN_TOKEN="$(curl --silent --show-error --fail \
--cacert "${VAULT_CACERT}" \
--request POST \
--data @"${login_file}" \
"${VAULT_ADDR}/v1/auth/userpass/login/${ADMIN_USERNAME}" | jq -r '.auth.client_token')"
[[ -n "${ADMIN_TOKEN}" && "${ADMIN_TOKEN}" != "null" ]] || {
echo "ERROR: Unable to authenticate the platform administrator." >&2
exit 1
}
VAULT_TOKEN="${ADMIN_TOKEN}" vault secrets list -format=json >/dev/null
VAULT_TOKEN="${ADMIN_TOKEN}" vault policy read platform-admin >/dev/null
VAULT_TOKEN="${ADMIN_TOKEN}" vault audit list -format=json >/dev/null
VAULT_TOKEN="${ROOT_TOKEN}" vault token revoke -self >/dev/null
echo "PASS: The initial root token was revoked after platform-admin validation."
fi
The utility enables the file/ audit device only when it is absent, validates it, and never prints AppRole SecretIDs. The following direct-server validation was tested successfully:
sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault audit list -detailed
sudo tail -n 20 /var/log/vault/audit.log
sudo logrotate -d /etc/logrotate.d/vault-audit
At least one healthy audit device must remain enabled before production certificate keys or ACME credentials are written.
26. Create Human Administrator Authentication and Revoke Root
The direct-server sudo procedure in this section was tested successfully on prod-vault-01. The installed logical-bootstrap utility creates the file audit device, cert/ KV v2 mount, policies, userpass, AppRole roles, and schema placeholders without generating any AppRole SecretID.
Confirm the utility and policies exist
sudo test -x /usr/local/sbin/ac-vault-bootstrap-logical &&
echo "PASS: Logical-bootstrap utility exists."
sudo find /usr/local/share/ac-vault/policies \
-maxdepth 1 \
-type f \
-printf '%f\n' | sort
sudo bash -c '
command -v jq
command -v python3
command -v vault
command -v curl
'
Expected policy files:
cert-issuer.hcl
cert-reader-dev.hcl
cert-reader-local.hcl
cert-reader-prod.hcl
cert-reader-qa.hcl
platform-admin.hcl
Stop if the utility or any policy file is missing.
Configure the logical baseline
Use a new administrator password containing at least twenty characters. The root token and administrator password are read from /dev/tty and are not placed in shell history.
sudo bash <<'BASH'
set -euo pipefail
umask 077
read -rsp 'Initial root token: ' ROOT_TOKEN </dev/tty
echo >/dev/tty
read -rp 'Administrator username [manoj-admin]: ' ADMIN_USERNAME </dev/tty
ADMIN_USERNAME="${ADMIN_USERNAME:-manoj-admin}"
if [[ ! "${ADMIN_USERNAME}" =~ ^[a-zA-Z0-9._-]+$ ]]; then
echo "ERROR: Administrator username contains unsupported characters." >&2
unset ROOT_TOKEN ADMIN_USERNAME
exit 1
fi
read -rsp 'New administrator password: ' ADMIN_PASSWORD </dev/tty
echo >/dev/tty
read -rsp 'Confirm administrator password: ' ADMIN_PASSWORD_CONFIRM </dev/tty
echo >/dev/tty
if [[ "${ADMIN_PASSWORD}" != "${ADMIN_PASSWORD_CONFIRM}" ]]; then
echo "ERROR: Administrator passwords do not match." >&2
unset ROOT_TOKEN ADMIN_USERNAME ADMIN_PASSWORD ADMIN_PASSWORD_CONFIRM
exit 1
fi
if (( ${#ADMIN_PASSWORD} < 20 )); then
echo "ERROR: Administrator password must contain at least 20 characters." >&2
unset ROOT_TOKEN ADMIN_USERNAME ADMIN_PASSWORD ADMIN_PASSWORD_CONFIRM
exit 1
fi
set +e
printf '%s\n%s\n%s\n' \
"${ROOT_TOKEN}" \
"${ADMIN_USERNAME}" \
"${ADMIN_PASSWORD}" |
python3 -c '
import json
import sys
root_token = sys.stdin.readline().rstrip("\n")
username = sys.stdin.readline().rstrip("\n")
password = sys.stdin.readline().rstrip("\n")
print(json.dumps({
"action": "configure",
"root_token": root_token,
"admin_username": username,
"admin_password": password
}))
' |
/usr/local/sbin/ac-vault-bootstrap-logical
BOOTSTRAP_RC=$?
set -e
unset ROOT_TOKEN
unset ADMIN_USERNAME
unset ADMIN_PASSWORD
unset ADMIN_PASSWORD_CONFIRM
if (( BOOTSTRAP_RC != 0 )); then
echo "ERROR: Vault logical bootstrap failed." >&2
exit "${BOOTSTRAP_RC}"
fi
BASH
Expected final messages:
PASS: Vault logical baseline configured and validated.
Configured: file audit device, cert/ KV v2, userpass, AppRole, policies, roles, and certificate path schema.
No AppRole SecretIDs were generated or printed.
Validate the named administrator before revoking root
Log in using userpass; the temporary administrator token is stored under root's home because sudo -H is used:
sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault login -method=userpass username=manoj-admin
Validate the token and logical baseline:
sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault token lookup
sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault audit list -detailed
sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault auth list -detailed
sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault secrets list -detailed
sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault read cert/config
sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault policy list
sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault policy read platform-admin
sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault operator raft list-peers
sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault kv get cert/prod/infra/_schema
sudo tail -n 20 /var/log/vault/audit.log
sudo logrotate -d /etc/logrotate.d/vault-audit
Validate every policy:
for policy in \
platform-admin \
cert-issuer \
cert-reader-local \
cert-reader-dev \
cert-reader-qa \
cert-reader-prod
do
echo
echo "===== ${policy} ====="
sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault policy read "${policy}"
done
Validate all AppRoles without generating a SecretID:
for role in \
cert-issuer \
cert-reader-local \
cert-reader-dev \
cert-reader-qa \
cert-reader-prod
do
echo
echo "===== ${role} ====="
sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault read "auth/approle/role/${role}"
done
Validate representative schema records:
sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault kv get cert/prod/infra/_schema
sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault kv get cert/dev/web/_schema
sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault kv get cert/qa/srvc/_schema
Remove the temporary administrator session before root-token retirement:
sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault token revoke -self
sudo rm -f /root/.vault-token
sudo test ! -e /root/.vault-token &&
echo "PASS: Temporary root-user Vault token file removed."
Revoke the initial root token only after validation
sudo bash <<'BASH'
set -euo pipefail
umask 077
read -rsp 'Initial root token to revoke: ' ROOT_TOKEN </dev/tty
echo >/dev/tty
read -rp 'Administrator username [manoj-admin]: ' ADMIN_USERNAME </dev/tty
ADMIN_USERNAME="${ADMIN_USERNAME:-manoj-admin}"
read -rsp 'Administrator password: ' ADMIN_PASSWORD </dev/tty
echo >/dev/tty
set +e
printf '%s\n%s\n%s\n' \
"${ROOT_TOKEN}" \
"${ADMIN_USERNAME}" \
"${ADMIN_PASSWORD}" |
python3 -c '
import json
import sys
root_token = sys.stdin.readline().rstrip("\n")
username = sys.stdin.readline().rstrip("\n")
password = sys.stdin.readline().rstrip("\n")
print(json.dumps({
"action": "revoke-initial-root",
"root_token": root_token,
"admin_username": username,
"admin_password": password
}))
' |
/usr/local/sbin/ac-vault-bootstrap-logical
REVOKE_RC=$?
set -e
unset ROOT_TOKEN
unset ADMIN_USERNAME
unset ADMIN_PASSWORD
if (( REVOKE_RC != 0 )); then
echo "ERROR: Initial root-token retirement failed." >&2
exit "${REVOKE_RC}"
fi
BASH
Expected:
PASS: The initial root token was revoked after platform-admin validation.
Permanently delete the separately stored initial-root-token value after revocation. Keep all five unseal shares permanently. The current production instance has completed this root-token-retirement step.
27. Enable KV v2 Mounts
The original working baseline used only cert/. The certificate-issuer integration now requires a second, separately governed KV v2 mount at acme/:
Mount: cert/
Plugin: kv
Version: 2
max_versions: 20
cas_required: false
delete_version_after: 0s
Mount: acme/
Plugin: kv
Version: 2
Target max_versions: 10
Target cas_required: false
Target delete_version_after: 0s
Cloudflare path: acme/cloudflare/dns
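The logical-bootstrap utility enables and configures cert/ idempotently; the acme/ mount is created separately by the preparation helper in section 30.2. The equivalent cert/ commands are: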
vault secrets enable -path=cert kv-v2
vault write cert/config max_versions=20 cas_required=false delete_version_after=0s
Verify directly on prod-vault-01 after named-administrator login:
sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault secrets list -detailed
sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault read cert/config
The first direct helper run may already have enabled acme/ before it stopped at acme/config. Verify the mount type before changing anything; do not disable or recreate it blindly. The confirmed error is an authorization failure at the mount configuration path, not a network or TLS failure.
Verify the current state directly on prod-vault-01 after logging in as manoj-admin:
sudo -H env VAULT_ADDR="https://192.168.8.2:8200" VAULT_CACERT="/opt/vault/tls/vault-cert.pem" vault secrets list -detailed
sudo -H env VAULT_ADDR="https://192.168.8.2:8200" VAULT_CACERT="/opt/vault/tls/vault-cert.pem" vault token capabilities acme/config
Before the policy repair, expect either deny for acme/config or the same 403 permission denied error when the helper attempts the write. After the repository and live policies are corrected, vault read acme/config must succeed and report the intended KV v2 configuration.
28. Certificate Data Model and Automatic Version History
The approved path format is:
cert/<environment>/<workload-type>/<workload-name>
Environments:
local
dev
qa
prod
Workload types:
web
srvc
job
infra
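The path format and the two fixed vocabularies above can be checked mechanically before anything is written to Vault. The following is a minimal sketch, not part of the installed utilities: cert_path is a hypothetical function name, and the rule that workload names contain only lowercase letters, digits, and hyphens is an assumption made for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: build and validate a logical certificate path.
# cert_path is a hypothetical helper; it does not talk to Vault.
cert_path() {
  local environment="$1" workload_type="$2" workload_name="$3"

  # Environments approved in this section.
  case "${environment}" in
    local|dev|qa|prod) ;;
    *) echo "ERROR: unknown environment: ${environment}" >&2; return 1 ;;
  esac

  # Workload types approved in this section.
  case "${workload_type}" in
    web|srvc|job|infra) ;;
    *) echo "ERROR: unknown workload type: ${workload_type}" >&2; return 1 ;;
  esac

  # Assumption: workload names use lowercase letters, digits, and hyphens.
  if [[ ! "${workload_name}" =~ ^[a-z0-9][a-z0-9-]*$ ]]; then
    echo "ERROR: unsupported workload name: ${workload_name}" >&2
    return 1
  fi

  printf 'cert/%s/%s/%s\n' "${environment}" "${workload_type}" "${workload_name}"
}

cert_path prod web fp-site   # prints cert/prod/web/fp-site
```

Because the assumed name rule rejects a leading underscore, a workload can never collide with a reserved record such as _schema.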
The bootstrap utility creates a _schema record for every environment/workload-type combination. Recommended certificate fields are:
domain
certificate_pem
private_key_pem
chain_pem
fullchain_pem
sans_json
issuer
serial_number
not_before
not_after
renew_after
acme_directory
updated_at
Every successful KV v2 write to the same logical path creates a new version. The current mount retains up to twenty versions. Do not encode secret material in key names.
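To make the retention window concrete, the sketch below computes the oldest version number that can still exist at a path after a given number of writes. It is illustrative arithmetic only: oldest_retained_version is a hypothetical name, the default of 20 mirrors the cert/ mount setting, and the calculation ignores versions that were explicitly deleted or destroyed and any per-path metadata override.

```shell
#!/usr/bin/env bash
# Sketch only: oldest version still retained under a max_versions limit.
# Usage: oldest_retained_version CURRENT_VERSION [MAX_VERSIONS]
oldest_retained_version() {
  local current="$1" max_versions="${2:-20}"
  local oldest=$(( current - max_versions + 1 ))
  # Version numbers start at 1, so the window cannot begin earlier.
  if (( oldest < 1 )); then
    oldest=1
  fi
  echo "${oldest}"
}

oldest_retained_version 25      # prints 6: versions 6 through 25 remain
oldest_retained_version 5       # prints 1: nothing has been pruned yet
```

For the acme/ mount, pass 10 as the second argument to match its target max_versions.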
29. Least-Privilege Machine Policies
The exact version-controlled policies deployed by Ansible are:
cert-issuer.hcl
# Certificate issuer: write certificate versions, but never destroy history.
path "cert/config" {
capabilities = ["read"]
}
path "cert/data/*" {
capabilities = ["create", "read", "update", "patch"]
}
path "cert/metadata" {
capabilities = ["list"]
}
path "cert/metadata/*" {
capabilities = ["read", "list"]
}
path "cert/subkeys/*" {
capabilities = ["read"]
}
# Read only the Cloudflare credential required for DNS-01.
path "acme/data/cloudflare/dns" {
capabilities = ["read"]
}
path "acme/metadata/cloudflare/dns" {
capabilities = ["read"]
}
cert-reader-dev.hcl
# Read-only certificate consumer policy for the dev environment.
path "cert/data/dev/*" {
capabilities = ["read"]
}
path "cert/metadata/dev" {
capabilities = ["read", "list"]
}
path "cert/metadata/dev/*" {
capabilities = ["read", "list"]
}
path "cert/subkeys/dev/*" {
capabilities = ["read"]
}
cert-reader-local.hcl
# Read-only certificate consumer policy for the local environment.
path "cert/data/local/*" {
capabilities = ["read"]
}
path "cert/metadata/local" {
capabilities = ["read", "list"]
}
path "cert/metadata/local/*" {
capabilities = ["read", "list"]
}
path "cert/subkeys/local/*" {
capabilities = ["read"]
}
cert-reader-prod.hcl
# Read-only certificate consumer policy for the prod environment.
path "cert/data/prod/*" {
capabilities = ["read"]
}
path "cert/metadata/prod" {
capabilities = ["read", "list"]
}
path "cert/metadata/prod/*" {
capabilities = ["read", "list"]
}
path "cert/subkeys/prod/*" {
capabilities = ["read"]
}
cert-reader-qa.hcl
# Read-only certificate consumer policy for the qa environment.
path "cert/data/qa/*" {
capabilities = ["read"]
}
path "cert/metadata/qa" {
capabilities = ["read", "list"]
}
path "cert/metadata/qa/*" {
capabilities = ["read", "list"]
}
path "cert/subkeys/qa/*" {
capabilities = ["read"]
}
platform-admin.hcl
# Non-root human administrator policy for day-to-day Vault operations.
# This intentionally excludes the root policy and sys/raw.
path "sys/health" {
capabilities = ["read", "sudo"]
}
path "sys/auth" {
capabilities = ["read"]
}
path "sys/auth/*" {
capabilities = ["create", "read", "update", "delete", "list", "sudo"]
}
path "sys/mounts" {
capabilities = ["read"]
}
path "sys/mounts/*" {
capabilities = ["create", "read", "update", "delete", "list", "sudo"]
}
path "sys/policies/acl" {
capabilities = ["list"]
}
path "sys/policies/acl/*" {
capabilities = ["create", "read", "update", "delete", "list", "sudo"]
}
path "sys/audit" {
capabilities = ["read", "list", "sudo"]
}
path "sys/audit/*" {
capabilities = ["create", "read", "update", "delete", "list", "sudo"]
}
path "sys/storage/raft/*" {
capabilities = ["read", "update", "sudo"]
}
path "auth/*" {
capabilities = ["create", "read", "update", "delete", "list", "sudo"]
}
path "identity/*" {
capabilities = ["create", "read", "update", "delete", "list"]
}
path "cert/*" {
capabilities = ["create", "read", "update", "patch", "delete", "list"]
}
# Required by the direct prod-cert-01 preparation helper.
path "acme/config" {
capabilities = ["create", "read", "update"]
}
path "acme/data/cloudflare/_schema" {
capabilities = ["create", "read", "update", "patch"]
}
path "acme/metadata/cloudflare/_schema" {
capabilities = ["read"]
}
path "acme/data/cloudflare/dns" {
capabilities = ["create", "read", "update", "patch"]
}
path "acme/metadata/cloudflare/dns" {
capabilities = ["read"]
}
The direct policy repair must not exist only in Vault's live policy store or only under /usr/local/share/ac-vault/policies. Update the repository copy first or immediately afterward, apply the same file to the server, and run vault policy write platform-admin .... Otherwise a later Ansible convergence can restore the old policy and reproduce the 403 failure.
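One way to catch this drift early is to compare the repository copy of a policy with the deployed copy before writing it to Vault. The sketch below assumes a hypothetical function named policy_in_sync and a hypothetical repository checkout path; it compares only the two files on disk and does not inspect the live policy store.

```shell
#!/usr/bin/env bash
# Sketch only: detect drift between a repository policy file and the
# deployed copy. policy_in_sync is a hypothetical helper name.
# Usage: policy_in_sync REPO_FILE DEPLOYED_FILE
policy_in_sync() {
  local repo_file="$1" deployed_file="$2"

  if [[ ! -r "${repo_file}" || ! -r "${deployed_file}" ]]; then
    echo "ERROR: cannot read ${repo_file} or ${deployed_file}" >&2
    return 2
  fi

  # cmp -s is silent and returns 0 only when the files are byte-identical.
  if cmp -s -- "${repo_file}" "${deployed_file}"; then
    echo "PASS: $(basename -- "${deployed_file}") matches the repository copy."
  else
    echo "DRIFT: $(basename -- "${deployed_file}") differs from the repository copy." >&2
    return 1
  fi
}

# Example call; the first path is a placeholder for the local checkout:
#   policy_in_sync ./policies/platform-admin.hcl \
#     /usr/local/share/ac-vault/policies/platform-admin.hcl
```

A return code of 1 signals that a later Ansible convergence would change the deployed file, which is the condition that reproduces the 403 failure.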
30. Prepare AppRole Machine Identities
The logical bootstrap enables AppRole and creates these roles without generating any SecretID:
cert-issuer → policy cert-issuer
cert-reader-local → policy cert-reader-local
cert-reader-dev → policy cert-reader-dev
cert-reader-qa → policy cert-reader-qa
cert-reader-prod → policy cert-reader-prod
Current role configuration:
token_type: batch
token_ttl: 15m
token_max_ttl: 30m
secret_id_ttl: 720h
secret_id_num_uses: 0
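AppRole is for machine authentication, not human administration. Generate a SecretID only after the target VM exists and can receive it securely. The RoleID and SecretID should meet only on the consuming machine. Note that the preparation helper in section 30.2 rewrites the cert-issuer role with secret_id_ttl=0 (no expiry) and binds both SecretIDs and tokens to 192.168.8.3/32, so the secret_id_ttl value above no longer applies to that role once the helper has run.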
Example future retrieval, performed directly on prod-vault-01 under an approved administrator token:
sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault read -field=role_id auth/approle/role/cert-issuer/role-id
sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault write -wrap-ttl=5m -field=wrapping_token -f auth/approle/role/cert-issuer/secret-id
Do not generate a SecretID until the target issuer VM is installed and ready. The authoritative consumer is now prod-cert-01, not the earlier planning name prod-cert-issuer-01. Do not put the SecretID or wrapping token in Terraform state, GitHub Actions output, Ansible inventory, documentation, or chat.
30.1 Current direct-server checkpoint
The helper is installed on prod-vault-01. Its first run authenticated successfully but stopped before secret storage with the following error:
Error writing data to acme/config: Error making API request.
URL: PUT https://192.168.8.2:8200/v1/acme/config
Code: 403
permission denied
Interpretation:
Vault TLS: working
userpass authentication: working
platform-admin attachment: working
Vault API: reachable
acme/config authorization: missing
Cloudflare token storage: not reached
wrapped SecretID generation: not reached
prod-cert-01 bootstrap: not reached
30.2 Authoritative Vault-side helper
The exact helper source must exist in Git and be installed by Ansible. The current direct repair installed it at /usr/local/sbin/ac-vault-prepare-cert-issuer:
#!/usr/bin/env bash
set -euo pipefail
umask 077
VAULT_ADDR="https://192.168.8.2:8200"
VAULT_CACERT="/opt/vault/tls/vault-cert.pem"
POLICY_DIR="/usr/local/share/ac-vault/policies"
ACME_MOUNT="acme"
CERT_ISSUER_CIDR="192.168.8.3/32"
export VAULT_ADDR VAULT_CACERT
payload_file="$(mktemp /run/ac-vault-cert-issuer.XXXXXX.json)"
login_file="$(mktemp /run/ac-vault-login.XXXXXX.json)"
cleanup() {
unset VAULT_TOKEN ADMIN_PASSWORD CLOUDFLARE_API_TOKEN
rm -f "${payload_file}" "${login_file}"
}
trap cleanup EXIT
cat > "${payload_file}"
chmod 0600 "${payload_file}"
ADMIN_USERNAME="$(jq -r '.admin_username // empty' "${payload_file}")"
ADMIN_PASSWORD="$(jq -r '.admin_password // empty' "${payload_file}")"
CLOUDFLARE_API_TOKEN="$(jq -r '.cloudflare_api_token // empty' "${payload_file}")"
WRAP_TTL="$(jq -r '.wrap_ttl // "30m"' "${payload_file}")"
[[ "${ADMIN_USERNAME}" =~ ^[a-zA-Z0-9._-]+$ ]] || {
echo "ERROR: admin_username is missing or invalid." >&2
exit 1
}
[[ -n "${ADMIN_PASSWORD}" ]] || {
echo "ERROR: admin_password is required." >&2
exit 1
}
[[ -n "${CLOUDFLARE_API_TOKEN}" ]] || {
echo "ERROR: cloudflare_api_token is required." >&2
exit 1
}
[[ "${WRAP_TTL}" =~ ^[1-9][0-9]*(s|m)$ ]] || {
echo "ERROR: wrap_ttl must be a positive duration in seconds or minutes." >&2
exit 1
}
set +e
vault status -format=json >/dev/null 2>&1
status_rc=$?
set -e
if [[ "${status_rc}" -ne 0 ]]; then
if [[ "${status_rc}" -eq 2 ]]; then
echo "ERROR: Vault is sealed. Unseal it before preparing the issuer." >&2
else
echo "ERROR: Vault is not reachable at ${VAULT_ADDR}." >&2
fi
exit 1
fi
jq -n --arg password "${ADMIN_PASSWORD}" '{password:$password}' > "${login_file}"
VAULT_TOKEN="$(
curl --silent --show-error --fail \
--cacert "${VAULT_CACERT}" \
--request POST \
--data @"${login_file}" \
"${VAULT_ADDR}/v1/auth/userpass/login/${ADMIN_USERNAME}" |
jq -r '.auth.client_token // empty'
)"
[[ -n "${VAULT_TOKEN}" ]] || {
echo "ERROR: Vault administrator authentication failed." >&2
exit 1
}
export VAULT_TOKEN
vault token lookup -format=json >/dev/null
vault policy write platform-admin "${POLICY_DIR}/platform-admin.hcl" >/dev/null
vault policy write cert-issuer "${POLICY_DIR}/cert-issuer.hcl" >/dev/null
secrets_json="$(vault secrets list -format=json)"
if ! jq -e --arg path "${ACME_MOUNT}/" 'has($path)' <<<"${secrets_json}" >/dev/null; then
vault secrets enable -path="${ACME_MOUNT}" kv-v2 >/dev/null
else
existing_type="$(jq -r --arg path "${ACME_MOUNT}/" '.[$path].type // empty' <<<"${secrets_json}")"
existing_version="$(jq -r --arg path "${ACME_MOUNT}/" '.[$path].options.version // empty' <<<"${secrets_json}")"
if [[ "${existing_type}" != "kv" || "${existing_version}" != "2" ]]; then
echo "ERROR: ${ACME_MOUNT}/ exists but is not a KV v2 secrets engine." >&2
exit 1
fi
fi
vault write "${ACME_MOUNT}/config" \
max_versions=10 \
cas_required=false \
delete_version_after=0s >/dev/null
if ! vault kv get -format=json "${ACME_MOUNT}/cloudflare/_schema" >/dev/null 2>&1; then
vault kv put "${ACME_MOUNT}/cloudflare/_schema" \
path_format="acme/cloudflare/dns" \
required_fields="api_token" \
recommended_scope="Zone:DNS:Edit and Zone:Zone:Read for approved zones only" \
schema_version="1" >/dev/null
fi
# Pass the token on stdin (api_token=-) so it never appears in the process argument list.
printf '%s' "${CLOUDFLARE_API_TOKEN}" | vault kv put "${ACME_MOUNT}/cloudflare/dns" \
api_token=- \
managed_for="prod-cert-01" \
updated_at="$(date --utc +%Y-%m-%dT%H:%M:%SZ)" >/dev/null
vault write auth/approle/role/cert-issuer \
token_type=batch \
token_policies=cert-issuer \
token_ttl=15m \
token_max_ttl=30m \
secret_id_ttl=0 \
secret_id_num_uses=0 \
secret_id_bound_cidrs="${CERT_ISSUER_CIDR}" \
token_bound_cidrs="${CERT_ISSUER_CIDR}" >/dev/null
ROLE_ID="$(vault read -field=role_id auth/approle/role/cert-issuer/role-id)"
WRAPPED_RESPONSE="$(vault write -format=json -wrap-ttl="${WRAP_TTL}" -force auth/approle/role/cert-issuer/secret-id)"
WRAPPING_TOKEN="$(jq -r '.wrap_info.token // empty' <<<"${WRAPPED_RESPONSE}")"
CREATION_PATH="$(jq -r '.wrap_info.creation_path // empty' <<<"${WRAPPED_RESPONSE}")"
[[ -n "${ROLE_ID}" && -n "${WRAPPING_TOKEN}" ]] || {
echo "ERROR: Vault did not return the issuer bootstrap credentials." >&2
exit 1
}
[[ "${CREATION_PATH}" == "auth/approle/role/cert-issuer/secret-id" ]] || {
echo "ERROR: Unexpected wrapping creation path: ${CREATION_PATH}" >&2
exit 1
}
jq -n \
--arg role_id "${ROLE_ID}" \
--arg wrapping_token "${WRAPPING_TOKEN}" \
--arg wrap_ttl "${WRAP_TTL}" \
--arg creation_path "${CREATION_PATH}" \
'{role_id:$role_id,wrapping_token:$wrapping_token,wrap_ttl:$wrap_ttl,creation_path:$creation_path}'
Validate the installed helper without displaying secrets:
sudo bash -n /usr/local/sbin/ac-vault-prepare-cert-issuer
sudo stat -c '%n owner=%U group=%G mode=%a size=%s' /usr/local/sbin/ac-vault-prepare-cert-issuer
sudo test -x /usr/local/sbin/ac-vault-prepare-cert-issuer && echo "PASS: preparation utility is executable"
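The helper accepts wrap_ttl only as a positive whole number of seconds or minutes. As a rough illustration of that rule, the sketch below applies the same pattern and converts the value to seconds, which shows how long a wrapping token stays usable. wrap_ttl_seconds is a hypothetical function name and is not part of the installed helper.

```shell
#!/usr/bin/env bash
# Sketch only: validate a wrap_ttl value with the helper's pattern and
# convert it to seconds. wrap_ttl_seconds is a hypothetical name.
wrap_ttl_seconds() {
  local wrap_ttl="$1"

  # Same pattern the helper applies: digits without a leading zero,
  # followed by "s" or "m".
  if [[ ! "${wrap_ttl}" =~ ^[1-9][0-9]*(s|m)$ ]]; then
    echo "ERROR: wrap_ttl must be a positive duration in seconds or minutes." >&2
    return 1
  fi

  local value="${wrap_ttl%[sm]}"   # numeric part
  local unit="${wrap_ttl: -1}"     # trailing unit character

  if [[ "${unit}" == "m" ]]; then
    echo $(( value * 60 ))
  else
    echo "${value}"
  fi
}

wrap_ttl_seconds 30m   # prints 1800, the helper's default window
```

Values such as 1h or 0m are rejected, which matches the helper's behavior of refusing hour-based or zero durations.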
30.3 Apply the corrected policies
Update the repository copies of platform-admin.hcl and cert-issuer.hcl with the ACME stanzas shown in section 29, deploy them to /usr/local/share/ac-vault/policies, then authenticate directly on prod-vault-01:
sudo -H env VAULT_ADDR="https://192.168.8.2:8200" VAULT_CACERT="/opt/vault/tls/vault-cert.pem" vault login -method=userpass username=manoj-admin
sudo -H env VAULT_ADDR="https://192.168.8.2:8200" VAULT_CACERT="/opt/vault/tls/vault-cert.pem" vault policy write platform-admin /usr/local/share/ac-vault/policies/platform-admin.hcl
sudo -H env VAULT_ADDR="https://192.168.8.2:8200" VAULT_CACERT="/opt/vault/tls/vault-cert.pem" vault policy write cert-issuer /usr/local/share/ac-vault/policies/cert-issuer.hcl
sudo -H env VAULT_ADDR="https://192.168.8.2:8200" VAULT_CACERT="/opt/vault/tls/vault-cert.pem" vault token capabilities acme/config
Required acme/config capabilities:
create
read
update
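The capability check can be scripted so that the helper is rerun only when all three capabilities are present. The sketch below assumes that vault token capabilities prints the granted capabilities on one line separated by commas and spaces (for example "create, read, update", or "deny"); has_capabilities is a hypothetical function name.

```shell
#!/usr/bin/env bash
# Sketch only: confirm that a capabilities line contains every required
# capability. has_capabilities is a hypothetical helper name.
# Usage: has_capabilities "CAPABILITIES_LINE" REQUIRED...
has_capabilities() {
  local line="$1"
  shift
  local -a granted
  # Split the line on commas and spaces into an array.
  IFS=', ' read -r -a granted <<<"${line}"

  local required item found
  for required in "$@"; do
    found=0
    for item in "${granted[@]}"; do
      if [[ "${item}" == "${required}" ]]; then
        found=1
      fi
    done
    if (( found == 0 )); then
      echo "MISSING: ${required}" >&2
      return 1
    fi
  done
  return 0
}

# Example call on prod-vault-01 after named-administrator login:
#   has_capabilities "$(sudo -H env \
#     VAULT_ADDR="https://192.168.8.2:8200" \
#     VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
#     vault token capabilities acme/config)" create read update
```

A "deny" line fails the check for every required capability, which corresponds to the state before the policy repair.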
30.4 Rerun the helper directly on prod-vault-01
Use a Cloudflare API token restricted to aspireclan.com with Zone:DNS:Edit and Zone:Zone:Read. The password and token prompts are hidden and do not enter shell history:
bash <<'BASH'
set -Eeuo pipefail
umask 077
sudo -v
read -r -p "Vault administrator username [manoj-admin]: " ADMIN_USERNAME </dev/tty
ADMIN_USERNAME="${ADMIN_USERNAME:-manoj-admin}"
read -r -s -p "Vault administrator password: " ADMIN_PASSWORD </dev/tty
echo
read -r -s -p "Restricted Cloudflare DNS API token: " CLOUDFLARE_API_TOKEN </dev/tty
echo
[[ "${ADMIN_USERNAME}" =~ ^[A-Za-z0-9._-]+$ ]] || { echo "ERROR: Administrator username is missing or invalid." >&2; exit 1; }
[[ -n "${ADMIN_PASSWORD}" ]] || { echo "ERROR: Administrator password is required." >&2; exit 1; }
[[ ${#CLOUDFLARE_API_TOKEN} -ge 20 ]] || { echo "ERROR: Cloudflare API token is shorter than 20 characters." >&2; exit 1; }
PAYLOAD="$(
jq -n \
--arg admin_username "${ADMIN_USERNAME}" \
--arg admin_password "${ADMIN_PASSWORD}" \
--arg cloudflare_api_token "${CLOUDFLARE_API_TOKEN}" \
--arg wrap_ttl "30m" \
'{admin_username:$admin_username,admin_password:$admin_password,cloudflare_api_token:$cloudflare_api_token,wrap_ttl:$wrap_ttl}'
)"
unset ADMIN_PASSWORD CLOUDFLARE_API_TOKEN
RESULT="$(printf '%s' "${PAYLOAD}" | sudo -n /usr/local/sbin/ac-vault-prepare-cert-issuer)"
unset PAYLOAD
jq -e '.role_id != null and .wrapping_token != null and .creation_path == "auth/approle/role/cert-issuer/secret-id"' <<<"${RESULT}" >/dev/null || {
echo "ERROR: The helper output is incomplete or has an unexpected creation path." >&2
exit 1
}
echo "===== ROLE ID ====="
jq -r '.role_id' <<<"${RESULT}"
echo "===== ONE-USE WRAPPING TOKEN ====="
jq -r '.wrapping_token' <<<"${RESULT}"
unset RESULT
echo "IMPORTANT: use the wrapping token within 30 minutes."
BASH
The successful helper run must occur only once per credential-delivery attempt. Copy the RoleID and one-use wrapping token directly to the prod-cert-01 PuTTY session. Do not save them in Git, email, chat, or an ordinary text file.
30.5 Bootstrap the wrapped credential on prod-cert-01
The Vault certificate is already installed at /etc/aspireclan/cert-issuer/vault-ca.pem. Its parent directory is restricted, so run TLS and issuer commands as ac-cert-issuer or root.
bash <<'BASH'
set -Eeuo pipefail
umask 077
CONFIG="/etc/aspireclan/cert-issuer/config.yml"
ISSUER="/usr/local/sbin/ac-cert-issuer"
sudo -v
sudo test -s /etc/aspireclan/cert-issuer/vault-ca.pem
sudo test ! -e /etc/aspireclan/cert-issuer/approle/role_id
sudo test ! -e /etc/aspireclan/cert-issuer/approle/secret_id
read -r -p "Paste the Vault Role ID: " ROLE_ID </dev/tty
read -r -s -p "Paste the one-use Vault wrapping token: " WRAPPING_TOKEN </dev/tty
echo
[[ -n "${ROLE_ID}" && -n "${WRAPPING_TOKEN}" ]] || {
echo "ERROR: Role ID and wrapping token are both required." >&2
exit 1
}
printf '%s' "${WRAPPING_TOKEN}" | sudo -n "${ISSUER}" --config "${CONFIG}" bootstrap-approle --role-id "${ROLE_ID}"
unset ROLE_ID WRAPPING_TOKEN
echo "PASS: AppRole bootstrap completed."
BASH
Verify without printing credential contents:
sudo stat -c '%n owner=%U group=%G mode=%a size=%s' /etc/aspireclan/cert-issuer/approle/role_id /etc/aspireclan/cert-issuer/approle/secret_id
sudo /usr/sbin/runuser --user ac-cert-issuer -- /usr/bin/env HOME=/var/lib/ac-cert-issuer /usr/local/sbin/ac-cert-issuer --config /etc/aspireclan/cert-issuer/config.yml preflight
Expected while all certificate declarations remain disabled:
Vault AppRole login succeeded
No certificate groups are enabled
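If the preflight result needs to gate a scripted step, the two expected lines can be checked explicitly. This is a minimal sketch under the assumption that preflight prints both messages verbatim somewhere in its output; preflight_ok is a hypothetical function name and is not part of ac-cert-issuer.

```shell
#!/usr/bin/env bash
# Sketch only: confirm that captured preflight output contains both
# expected messages. preflight_ok is a hypothetical helper name.
preflight_ok() {
  local output="$1"
  # -F matches fixed strings, -q suppresses output.
  grep -qF "Vault AppRole login succeeded" <<<"${output}" || return 1
  grep -qF "No certificate groups are enabled" <<<"${output}" || return 1
  return 0
}

# Example use with output captured from the preflight command above:
#   preflight_ok "${PREFLIGHT_OUTPUT}" && echo "PASS: preflight messages present."
```

Once certificate declarations are enabled, the second message is expected to change, so this check applies only to the current all-disabled state.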
Only after preflight passes may the timer be enabled:
sudo systemctl daemon-reload
sudo systemctl enable --now ac-cert-issuer.timer
sudo systemctl start ac-cert-issuer.service
sudo systemctl is-enabled ac-cert-issuer.timer
sudo systemctl is-active ac-cert-issuer.timer
sudo systemctl list-timers ac-cert-issuer.timer --all --no-pager
sudo journalctl -u ac-cert-issuer.service -n 100 --no-pager
31. Configure Raft Snapshot Backups
The dedicated off-host Vault backup path is now implemented with the following approved design:
- Dedicated bucket: aspireclan-prod-vault-raft-backups-425389089086-us-east-1.
- Encryption: SSE-S3 using AES256; no customer-managed KMS key and no KMS monthly key charge.
- S3 Versioning: enabled.
- S3 Object Lock: enabled in Governance mode.
- Public access: fully blocked.
- Object ownership: BucketOwnerEnforced; ACLs disabled.
- Lifecycle classes: hourly, daily, and monthly.
- Upload identity: dedicated IAM user prod-vault-01-raft-backup with no S3 delete permission.
- Vault identity: dedicated raft-snapshot policy and AppRole.
- Execution: root-only script plus systemd services and timers.
- Failure handling: failed local staging retained temporarily and OnFailure= invokes the SNS alert service.
- First hourly snapshot: uploaded successfully to S3.
- Verified S3 metadata: AES256, a non-empty version ID, Governance Object Lock, and retention through 2026-06-17T13:30:19+00:00 for the first confirmed object.
- Snapshot download from S3: successful.
- Remaining confirmation: downloaded checksum validation, scheduled timer enablement/observation, CloudWatch stale-backup alarm test, and isolated snapshot restore.
31.1 Approved backup schedule and retention
Backup class | Schedule | Object retention | Lifecycle expiration
hourly | every hour near minute 17 | 3 days | after 4 days
daily | every day at 00:37 UTC | 35 days | after 36 days
monthly | first day of each month at 01:07 UTC | 365 days | after 366 days
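The retention values in this schedule translate directly into a retain-until timestamp for each uploaded object. The sketch below is illustrative only: retention_days and retain_until are hypothetical function names, and the code assumes GNU date, as used elsewhere on this page.

```shell
#!/usr/bin/env bash
# Sketch only: derive the Object Lock retain-until time for a backup class.
# retention_days and retain_until are hypothetical helper names.

# Object retention in days, taken from the schedule in this subsection.
retention_days() {
  case "$1" in
    hourly)  echo 3 ;;
    daily)   echo 35 ;;
    monthly) echo 365 ;;
    *) echo "ERROR: unknown backup class: $1" >&2; return 1 ;;
  esac
}

# Usage: retain_until CLASS UPLOAD_TIME_UTC
# UPLOAD_TIME_UTC is ISO 8601, for example 2026-06-14T13:30:19Z.
retain_until() {
  local days upload_epoch
  days="$(retention_days "$1")" || return 1
  upload_epoch="$(date --utc --date="$2" +%s)" || return 1
  # Add whole days in seconds, then format the result in UTC.
  date --utc --date="@$(( upload_epoch + days * 86400 ))" +%Y-%m-%dT%H:%M:%SZ
}

retain_until hourly 2026-06-14T13:30:19Z   # prints 2026-06-17T13:30:19Z
```

The hourly result matches the retention recorded for the first confirmed object. Lifecycle expiration is set one day later than retention in every class, so an object is never scheduled for expiration while its lock is still active.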
S3 object prefix:
prod-vault-01/<backup-class>/<YYYY>/<MM>/<DD>/
Example objects:
prod-vault-01/hourly/2026/06/14/prod-vault-01-20260614T133019Z.snap
prod-vault-01/hourly/2026/06/14/prod-vault-01-20260614T133019Z.snap.sha256
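The prefix and file-name convention can be expressed as a small function, shown here as a sketch rather than as the installed backup script. snapshot_object_key is a hypothetical name, and the compact timestamp format is inferred from the example objects above.

```shell
#!/usr/bin/env bash
# Sketch only: build the S3 object key for a snapshot.
# snapshot_object_key is a hypothetical helper name.
# Usage: snapshot_object_key CLASS TIMESTAMP   (TIMESTAMP like 20260614T133019Z)
snapshot_object_key() {
  local backup_class="$1" stamp="$2" host="prod-vault-01"

  case "${backup_class}" in
    hourly|daily|monthly) ;;
    *) echo "ERROR: unknown backup class: ${backup_class}" >&2; return 1 ;;
  esac

  if [[ ! "${stamp}" =~ ^[0-9]{8}T[0-9]{6}Z$ ]]; then
    echo "ERROR: timestamp must look like 20260614T133019Z." >&2
    return 1
  fi

  # Year, month, and day are sliced out of the compact timestamp.
  printf '%s/%s/%s/%s/%s/%s-%s.snap\n' \
    "${host}" "${backup_class}" \
    "${stamp:0:4}" "${stamp:4:2}" "${stamp:6:2}" \
    "${host}" "${stamp}"
}

snapshot_object_key hourly 20260614T133019Z
# prints prod-vault-01/hourly/2026/06/14/prod-vault-01-20260614T133019Z.snap
```

The checksum object uses the same key with .sha256 appended.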
The hourly recovery-point objective is approximately one hour while Vault is running and unsealed. A Shamir-sealed Vault cannot create a snapshot; after every VM reboot or Vault service restart, submit three shares before scheduled backups can resume.
31.2 Create and harden the dedicated S3 bucket
Run this subsection from AWS CloudShell or another AWS-administrative terminal. Do not use broad AWS administrative credentials on prod-vault-01.
Set the values, replacing the alert email address:
export AWS_PAGER=""
export AWS_REGION="us-east-1"
export AWS_ACCOUNT_ID="425389089086"
export VAULT_BACKUP_BUCKET="aspireclan-prod-vault-raft-backups-425389089086-us-east-1"
export IAM_USER="prod-vault-01-raft-backup"
export IAM_POLICY_NAME="prod-vault-01-raft-backup"
export SNS_TOPIC_NAME="aspireclan-prod-vault-backup-alerts"
export ALERT_EMAIL="<YOUR_ALERT_EMAIL_ADDRESS>"
aws sts get-caller-identity
CURRENT_ACCOUNT_ID="$(
aws sts get-caller-identity --query Account --output text
)"
test "${CURRENT_ACCOUNT_ID}" = "${AWS_ACCOUNT_ID}" || {
echo "ERROR: Connected to ${CURRENT_ACCOUNT_ID}; expected ${AWS_ACCOUNT_ID}." >&2
exit 1
}
Create the bucket once with Object Lock enabled. If head-bucket succeeds, do not rerun create-bucket:
if aws s3api head-bucket --bucket "${VAULT_BACKUP_BUCKET}" 2>/dev/null
then
echo "NOTICE: Bucket already exists."
else
aws s3api create-bucket --bucket "${VAULT_BACKUP_BUCKET}" --region "${AWS_REGION}" --object-lock-enabled-for-bucket
fi
aws s3api put-bucket-ownership-controls --bucket "${VAULT_BACKUP_BUCKET}" --ownership-controls 'Rules=[{ObjectOwnership=BucketOwnerEnforced}]'
aws s3api put-bucket-versioning --bucket "${VAULT_BACKUP_BUCKET}" --versioning-configuration Status=Enabled
aws s3api put-public-access-block --bucket "${VAULT_BACKUP_BUCKET}" --public-access-block-configuration '{
"BlockPublicAcls": true,
"IgnorePublicAcls": true,
"BlockPublicPolicy": true,
"RestrictPublicBuckets": true
}'
Configure SSE-S3 explicitly. This is the approved low-cost design; do not create a customer-managed KMS key for this bucket:
cat > /tmp/prod-vault-backup-encryption.json <<'EOF'
{
"Rules": [
{
"ApplyServerSideEncryptionByDefault": {
"SSEAlgorithm": "AES256"
}
}
]
}
EOF
aws s3api put-bucket-encryption --bucket "${VAULT_BACKUP_BUCKET}" --server-side-encryption-configuration file:///tmp/prod-vault-backup-encryption.json
Configure default Object Lock retention:
cat > /tmp/prod-vault-backup-object-lock.json <<'EOF'
{
"ObjectLockEnabled": "Enabled",
"Rule": {
"DefaultRetention": {
"Mode": "GOVERNANCE",
"Days": 3
}
}
}
EOF
aws s3api put-object-lock-configuration --bucket "${VAULT_BACKUP_BUCKET}" --object-lock-configuration file:///tmp/prod-vault-backup-object-lock.json
Configure lifecycle retention:
cat > /tmp/prod-vault-backup-lifecycle.json <<'EOF'
{
"Rules": [
{
"ID": "ExpireHourlyVaultSnapshots",
"Status": "Enabled",
"Filter": {"Prefix": "prod-vault-01/hourly/"},
"Expiration": {"Days": 4},
"NoncurrentVersionExpiration": {"NoncurrentDays": 1},
"AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 1}
},
{
"ID": "ExpireDailyVaultSnapshots",
"Status": "Enabled",
"Filter": {"Prefix": "prod-vault-01/daily/"},
"Expiration": {"Days": 36},
"NoncurrentVersionExpiration": {"NoncurrentDays": 1},
"AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 1}
},
{
"ID": "ExpireMonthlyVaultSnapshots",
"Status": "Enabled",
"Filter": {"Prefix": "prod-vault-01/monthly/"},
"Expiration": {"Days": 366},
"NoncurrentVersionExpiration": {"NoncurrentDays": 1},
"AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 1}
}
]
}
EOF
aws s3api put-bucket-lifecycle-configuration --bucket "${VAULT_BACKUP_BUCKET}" --lifecycle-configuration file:///tmp/prod-vault-backup-lifecycle.json
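Each lifecycle expiration above (4, 36, and 366 days) is one day longer than the Object Lock retention that the backup script in 31.6 applies to the same class (3, 35, and 365 days). The following is a minimal sketch of that comparison, not part of the installed tooling. The helper name check_lifecycle_margin is hypothetical, and the values are copied from this page rather than read from AWS.

```shell
#!/usr/bin/env bash
# Sketch only: compare the documented lifecycle expirations with the
# documented per-class Object Lock retention. Values are copied from this
# page; nothing is read from AWS.
set -euo pipefail

declare -A LOCK_DAYS=( [hourly]=3 [daily]=35 [monthly]=365 )
declare -A LIFECYCLE_DAYS=( [hourly]=4 [daily]=36 [monthly]=366 )

# check_lifecycle_margin is a hypothetical helper name.
check_lifecycle_margin() {
  local class="$1"
  local margin=$(( ${LIFECYCLE_DAYS[$class]} - ${LOCK_DAYS[$class]} ))
  if (( margin < 1 )); then
    echo "FAIL: ${class} lifecycle expiration does not exceed lock retention"
    return 1
  fi
  echo "PASS: ${class} lock=${LOCK_DAYS[$class]}d lifecycle=${LIFECYCLE_DAYS[$class]}d margin=${margin}d"
}

for class in hourly daily monthly; do
  check_lifecycle_margin "${class}"
done
```

If either value is changed later, rerunning the comparison shows whether the lifecycle rule still outlasts the lock period for every class.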
Require HTTPS and explicit SSE-S3 on every upload:
cat > /tmp/prod-vault-backup-bucket-policy.json <<EOF
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "DenyInsecureTransport",
"Effect": "Deny",
"Principal": "*",
"Action": "s3:*",
"Resource": [
"arn:aws:s3:::${VAULT_BACKUP_BUCKET}",
"arn:aws:s3:::${VAULT_BACKUP_BUCKET}/*"
],
"Condition": {
"Bool": {"aws:SecureTransport": "false"}
}
},
{
"Sid": "DenyUploadsWithoutSseS3",
"Effect": "Deny",
"Principal": "*",
"Action": "s3:PutObject",
"Resource": "arn:aws:s3:::${VAULT_BACKUP_BUCKET}/*",
"Condition": {
"StringNotEquals": {
"s3:x-amz-server-side-encryption": "AES256"
}
}
}
]
}
EOF
aws s3api put-bucket-policy --bucket "${VAULT_BACKUP_BUCKET}" --policy file:///tmp/prod-vault-backup-bucket-policy.json
aws s3api put-bucket-tagging --bucket "${VAULT_BACKUP_BUCKET}" --tagging '{
"TagSet": [
{"Key": "Environment", "Value": "prod"},
{"Key": "Component", "Value": "vault"},
{"Key": "Purpose", "Value": "raft-backup"},
{"Key": "ManagedBy", "Value": "manual-bootstrap"}
]
}'
31.3 Create SNS and the dedicated upload-only IAM identity
Create the notification topic and confirm the email subscription:
SNS_TOPIC_ARN="$(
aws sns create-topic --region "${AWS_REGION}" --name "${SNS_TOPIC_NAME}" --query TopicArn --output text
)"
aws sns subscribe --region "${AWS_REGION}" --topic-arn "${SNS_TOPIC_ARN}" --protocol email --notification-endpoint "${ALERT_EMAIL}"
aws sns list-subscriptions-by-topic --region "${AWS_REGION}" --topic-arn "${SNS_TOPIC_ARN}" --output table
Create the uploader and its least-privilege policy. It intentionally has no S3 delete, governance-bypass, bucket-policy, lifecycle, versioning, or KMS permissions:
aws iam create-user --user-name "${IAM_USER}" --tags Key=Environment,Value=prod Key=Component,Value=vault Key=Purpose,Value=raft-backup
cat > /tmp/prod-vault-backup-uploader-policy.json <<EOF
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "ReadBucketLocation",
"Effect": "Allow",
"Action": ["s3:GetBucketLocation"],
"Resource": "arn:aws:s3:::${VAULT_BACKUP_BUCKET}"
},
{
"Sid": "ListVaultBackupPrefix",
"Effect": "Allow",
"Action": ["s3:ListBucket"],
"Resource": "arn:aws:s3:::${VAULT_BACKUP_BUCKET}",
"Condition": {
"StringLike": {
"s3:prefix": ["prod-vault-01", "prod-vault-01/*"]
}
}
},
{
"Sid": "UploadAndVerifyVaultSnapshots",
"Effect": "Allow",
"Action": [
"s3:PutObject",
"s3:PutObjectRetention",
"s3:GetObject",
"s3:GetObjectVersion",
"s3:GetObjectRetention"
],
"Resource": "arn:aws:s3:::${VAULT_BACKUP_BUCKET}/prod-vault-01/*"
},
{
"Sid": "PublishVaultBackupAlerts",
"Effect": "Allow",
"Action": ["sns:Publish"],
"Resource": "${SNS_TOPIC_ARN}"
},
{
"Sid": "PublishVaultBackupSuccessMetric",
"Effect": "Allow",
"Action": ["cloudwatch:PutMetricData"],
"Resource": "*"
}
]
}
EOF
aws iam put-user-policy --user-name "${IAM_USER}" --policy-name "${IAM_POLICY_NAME}" --policy-document file:///tmp/prod-vault-backup-uploader-policy.json
Create the access key and write it to a file that only the current user can read:
umask 077
aws iam create-access-key --user-name "${IAM_USER}" > "${HOME}/prod-vault-01-raft-backup-access-key.json"
chmod 0600 "${HOME}/prod-vault-01-raft-backup-access-key.json"
Validate the AWS resources:
aws s3api get-bucket-versioning --bucket "${VAULT_BACKUP_BUCKET}"
aws s3api get-public-access-block --bucket "${VAULT_BACKUP_BUCKET}"
aws s3api get-bucket-ownership-controls --bucket "${VAULT_BACKUP_BUCKET}"
aws s3api get-bucket-encryption --bucket "${VAULT_BACKUP_BUCKET}"
aws s3api get-object-lock-configuration --bucket "${VAULT_BACKUP_BUCKET}"
aws s3api get-bucket-lifecycle-configuration --bucket "${VAULT_BACKUP_BUCKET}"
aws s3api get-bucket-policy-status --bucket "${VAULT_BACKUP_BUCKET}"
Required state:
Versioning: Enabled
Public access: all four controls true
Ownership: BucketOwnerEnforced
Encryption: AES256
Object Lock: Enabled
Default retention: GOVERNANCE, 3 days
Bucket policy public status: false
31.4 Install AWS CLI and protected credentials on prod-vault-01
Connect to prod-vault-01, then install the AWS CLI and required tools:
sudo apt-get update
sudo apt-get install -y ca-certificates curl unzip jq util-linux
sudo bash <<'BASH'
set -euo pipefail
work_dir="$(mktemp -d)"
trap 'rm -rf "${work_dir}"' EXIT
case "$(uname -m)" in
x86_64)
installer_url="https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip"
;;
aarch64|arm64)
installer_url="https://awscli.amazonaws.com/awscli-exe-linux-aarch64.zip"
;;
*)
echo "ERROR: Unsupported architecture: $(uname -m)" >&2
exit 1
;;
esac
curl --fail --silent --show-error --location "${installer_url}" --output "${work_dir}/awscliv2.zip"
unzip -q "${work_dir}/awscliv2.zip" -d "${work_dir}"
if command -v aws >/dev/null 2>&1; then
"${work_dir}/aws/install" --bin-dir /usr/local/bin --install-dir /usr/local/aws-cli --update
else
"${work_dir}/aws/install" --bin-dir /usr/local/bin --install-dir /usr/local/aws-cli
fi
BASH
sudo /usr/local/bin/aws --version
Store the upload-only access key interactively. Do not place the secret in command arguments, Git, Terraform, Ansible, documentation, or chat:
sudo bash <<'BASH'
set -euo pipefail
umask 077
install -d -o root -g root -m 0700 /root/.aws
read -rp 'AWS access key ID: ' AWS_ACCESS_KEY_ID </dev/tty
read -rsp 'AWS secret access key: ' AWS_SECRET_ACCESS_KEY </dev/tty
echo >/dev/tty
cat > /root/.aws/credentials <<EOF
[vault-backup]
aws_access_key_id = ${AWS_ACCESS_KEY_ID}
aws_secret_access_key = ${AWS_SECRET_ACCESS_KEY}
EOF
cat > /root/.aws/config <<'EOF'
[profile vault-backup]
region = us-east-1
output = json
cli_pager =
EOF
chmod 0600 /root/.aws/credentials /root/.aws/config
unset AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY
BASH
sudo -H aws sts get-caller-identity --profile vault-backup --region us-east-1
sudo -H aws s3api get-bucket-location --bucket aspireclan-prod-vault-raft-backups-425389089086-us-east-1 --profile vault-backup --region us-east-1
After the identity test succeeds, securely delete the temporary access-key JSON from CloudShell.
31.5 Configure the local backup environment and Vault AppRole
Create the protected configuration directory and file. Enter the real SNS topic ARN at the prompt:
sudo install -d -o root -g root -m 0700 /etc/ac-vault-snapshot
read -rp 'Vault backup SNS topic ARN: ' SNS_TOPIC_ARN
sudo tee /etc/ac-vault-snapshot/config.env >/dev/null <<EOF
AWS_ACCOUNT_ID=425389089086
AWS_REGION=us-east-1
AWS_PROFILE=vault-backup
S3_BUCKET=aspireclan-prod-vault-raft-backups-425389089086-us-east-1
SNS_TOPIC_ARN=${SNS_TOPIC_ARN}
VAULT_ADDR=https://192.168.8.2:8200
VAULT_CACERT=/opt/vault/tls/vault-cert.pem
BACKUP_SERVER=prod-vault-01
CLOUDWATCH_NAMESPACE=Aspireclan/VaultBackup
EOF
sudo chown root:root /etc/ac-vault-snapshot/config.env
sudo chmod 0600 /etc/ac-vault-snapshot/config.env
unset SNS_TOPIC_ARN
Log in as manoj-admin, create the snapshot policy, and create the dedicated AppRole:
sudo -H env VAULT_ADDR="https://192.168.8.2:8200" VAULT_CACERT="/opt/vault/tls/vault-cert.pem" vault login -method=userpass username=manoj-admin
sudo tee /etc/ac-vault-snapshot/raft-snapshot.hcl >/dev/null <<'EOF'
path "sys/storage/raft/snapshot" {
capabilities = ["read", "sudo"]
}
path "sys/health" {
capabilities = ["read"]
}
path "auth/token/lookup-self" {
capabilities = ["read"]
}
path "auth/token/revoke-self" {
capabilities = ["update"]
}
EOF
sudo -H env VAULT_ADDR="https://192.168.8.2:8200" VAULT_CACERT="/opt/vault/tls/vault-cert.pem" vault policy write raft-snapshot /etc/ac-vault-snapshot/raft-snapshot.hcl
sudo -H env VAULT_ADDR="https://192.168.8.2:8200" VAULT_CACERT="/opt/vault/tls/vault-cert.pem" vault write auth/approle/role/raft-snapshot bind_secret_id=true secret_id_ttl=0 secret_id_num_uses=0 secret_id_bound_cidrs="127.0.0.1/32,192.168.8.2/32" token_type=service token_policies="raft-snapshot" token_no_default_policy=true token_ttl=15m token_max_ttl=15m token_num_uses=0 token_bound_cidrs="127.0.0.1/32,192.168.8.2/32"
sudo -H bash <<'BASH'
set -euo pipefail
umask 077
export VAULT_ADDR="https://192.168.8.2:8200"
export VAULT_CACERT="/opt/vault/tls/vault-cert.pem"
ROLE_ID="$(
vault read -field=role_id auth/approle/role/raft-snapshot/role-id
)"
SECRET_ID="$(
vault write -field=secret_id -f auth/approle/role/raft-snapshot/secret-id
)"
cat > /etc/ac-vault-snapshot/approle.env <<EOF
VAULT_ROLE_ID=${ROLE_ID}
VAULT_SECRET_ID=${SECRET_ID}
EOF
chown root:root /etc/ac-vault-snapshot/approle.env
chmod 0600 /etc/ac-vault-snapshot/approle.env
unset ROLE_ID SECRET_ID
BASH
sudo -H env VAULT_ADDR="https://192.168.8.2:8200" VAULT_CACERT="/opt/vault/tls/vault-cert.pem" vault token revoke -self
sudo rm -f /root/.vault-token
Create the protected working directories:
sudo install -d -o root -g root -m 0700 /var/lib/ac-vault-snapshot
sudo install -d -o root -g root -m 0700 /var/lib/ac-vault-snapshot/tmp
sudo install -d -o root -g root -m 0700 /var/lib/ac-vault-snapshot/failed
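The directories and files created so far are meant to be owned by root with mode 0700 (directories) or 0600 (files). The following is a minimal sketch for spot-checking that, assuming GNU stat is available. The helper name check_mode is hypothetical and is not part of the installed tooling.

```shell
#!/usr/bin/env bash
# Sketch only: report whether a path has the expected octal mode and
# numeric owner. Requires GNU stat.
set -euo pipefail

# check_mode is a hypothetical helper name.
# Arguments: path, expected octal mode (for example 600), expected uid:gid.
# The default owner 0:0 corresponds to root:root.
check_mode() {
  local path="$1" expected_mode="$2" expected_owner="${3:-0:0}"
  local actual_mode actual_owner
  actual_mode="$(stat -c '%a' "${path}")"
  actual_owner="$(stat -c '%u:%g' "${path}")"
  if [[ "${actual_mode}" == "${expected_mode}" && "${actual_owner}" == "${expected_owner}" ]]; then
    echo "PASS: ${path} ${actual_mode} ${actual_owner}"
  else
    echo "FAIL: ${path} is ${actual_mode} ${actual_owner}; expected ${expected_mode} ${expected_owner}"
    return 1
  fi
}

# Example calls for paths created on this page (run as root):
#   check_mode /root/.aws 700
#   check_mode /root/.aws/credentials 600
#   check_mode /etc/ac-vault-snapshot/approle.env 600
#   check_mode /var/lib/ac-vault-snapshot/tmp 700
```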
31.6 Install the corrected SSE-S3 snapshot script
The following is the authoritative installed script. The explicit .sealed == false test is important; do not use .sealed // true, because jq's alternative operator (//) returns its right-hand side whenever the left-hand side is false or null, so an unsealed Vault (.sealed is false) would incorrectly be reported as sealed.
sudo tee /usr/local/sbin/ac-vault-raft-backup >/dev/null <<'BASH'
#!/usr/bin/env bash
set -Eeuo pipefail
umask 077
BACKUP_CLASS="${1:-}"
case "${BACKUP_CLASS}" in
hourly)
RETENTION_DAYS=3
;;
daily)
RETENTION_DAYS=35
;;
monthly)
RETENTION_DAYS=365
;;
*)
echo "ERROR: Backup class must be hourly, daily, or monthly." >&2
exit 64
;;
esac
source /etc/ac-vault-snapshot/config.env
source /etc/ac-vault-snapshot/approle.env
export AWS_PROFILE
export AWS_REGION
export VAULT_ADDR
export VAULT_CACERT
export VAULT_CLIENT_TIMEOUT="120s"
exec 9>/run/lock/ac-vault-raft-backup.lock
if ! flock -n 9; then
echo "ERROR: Another Vault snapshot process is running." >&2
exit 75
fi
BASE_DIR="/var/lib/ac-vault-snapshot"
FAILED_DIR="${BASE_DIR}/failed"
RUN_ID="$(date -u '+%Y%m%dT%H%M%SZ')-$$"
RUN_DIR="$(mktemp -d "${BASE_DIR}/tmp/run-${RUN_ID}.XXXXXX")"
SUCCESS=0
VAULT_TOKEN=""
LOGIN_PAYLOAD=""
cleanup() {
rc=$?
trap - EXIT
if [[ -n "${VAULT_TOKEN:-}" ]]; then
VAULT_TOKEN="${VAULT_TOKEN}" \
vault token revoke -self \
>/dev/null 2>&1 || true
fi
unset VAULT_TOKEN VAULT_ROLE_ID VAULT_SECRET_ID
if [[ -n "${LOGIN_PAYLOAD:-}" ]]; then
rm -f "${LOGIN_PAYLOAD}" || true
fi
if (( SUCCESS == 1 )); then
rm -rf "${RUN_DIR}" || true
else
failed_destination="${FAILED_DIR}/$(basename "${RUN_DIR}")"
if [[ -d "${RUN_DIR}" ]]; then
mv "${RUN_DIR}" "${failed_destination}" || true
echo "FAILED SNAPSHOT STAGING RETAINED: ${failed_destination}" >&2
fi
fi
exit "${rc}"
}
trap cleanup EXIT
find "${FAILED_DIR}" \
-mindepth 1 \
-maxdepth 1 \
-type d \
-mtime +7 \
-exec rm -rf {} + \
2>/dev/null || true
set +e
STATUS_JSON="$(vault status -format=json 2>/dev/null)"
STATUS_RC=$?
set -e
if (( STATUS_RC != 0 )); then
echo "ERROR: Vault is unreachable, sealed, or not ready; status exit code ${STATUS_RC}." >&2
exit 1
fi
if ! jq -e '.initialized == true' <<<"${STATUS_JSON}" >/dev/null; then
echo "ERROR: Vault is not initialized or its status response is invalid." >&2
exit 1
fi
# Do not use '.sealed // true': jq's // operator falls back when the left side is false or null.
if ! jq -e '.sealed == false' <<<"${STATUS_JSON}" >/dev/null; then
echo "ERROR: Vault is sealed or its status response is invalid." >&2
exit 1
fi
LOGIN_PAYLOAD="$(mktemp "${RUN_DIR}/approle-login.XXXXXX.json")"
jq -n \
--arg role_id "${VAULT_ROLE_ID}" \
--arg secret_id "${VAULT_SECRET_ID}" \
'{role_id: $role_id, secret_id: $secret_id}' \
> "${LOGIN_PAYLOAD}"
chmod 0600 "${LOGIN_PAYLOAD}"
LOGIN_JSON="$(
vault write \
-format=json \
auth/approle/login \
@"${LOGIN_PAYLOAD}"
)"
VAULT_TOKEN="$(jq -er '.auth.client_token' <<<"${LOGIN_JSON}")"
export VAULT_TOKEN
unset LOGIN_JSON
rm -f "${LOGIN_PAYLOAD}"
LOGIN_PAYLOAD=""
STAMP="$(date -u '+%Y%m%dT%H%M%SZ')"
YEAR="${STAMP:0:4}"
MONTH="${STAMP:4:2}"
DAY="${STAMP:6:2}"
SNAPSHOT_NAME="${BACKUP_SERVER}-${STAMP}.snap"
CHECKSUM_NAME="${SNAPSHOT_NAME}.sha256"
SNAPSHOT_PATH="${RUN_DIR}/${SNAPSHOT_NAME}"
CHECKSUM_PATH="${RUN_DIR}/${CHECKSUM_NAME}"
S3_PREFIX="${BACKUP_SERVER}/${BACKUP_CLASS}/${YEAR}/${MONTH}/${DAY}"
SNAPSHOT_KEY="${S3_PREFIX}/${SNAPSHOT_NAME}"
CHECKSUM_KEY="${S3_PREFIX}/${CHECKSUM_NAME}"
RETAIN_UNTIL="$(
date \
-u \
-d "+${RETENTION_DAYS} days" \
'+%Y-%m-%dT%H:%M:%SZ'
)"
vault operator raft snapshot save "${SNAPSHOT_PATH}"
test -s "${SNAPSHOT_PATH}" || {
echo "ERROR: Vault produced an empty snapshot file." >&2
exit 1
}
vault operator raft snapshot inspect "${SNAPSHOT_PATH}" >/dev/null
(
cd "${RUN_DIR}"
sha256sum "${SNAPSHOT_NAME}" > "${CHECKSUM_NAME}"
)
chmod 0600 "${SNAPSHOT_PATH}" "${CHECKSUM_PATH}"
upload_and_verify() {
local local_path="$1"
local object_key="$2"
local content_type="$3"
local local_size remote_size put_result head_result version_id
local_size="$(stat -c '%s' "${local_path}")"
put_result="$(
aws s3api put-object \
--bucket "${S3_BUCKET}" \
--key "${object_key}" \
--body "${local_path}" \
--content-type "${content_type}" \
--server-side-encryption AES256 \
--checksum-algorithm SHA256 \
--object-lock-mode GOVERNANCE \
--object-lock-retain-until-date "${RETAIN_UNTIL}" \
--metadata \
"vault-node=${BACKUP_SERVER},backup-class=${BACKUP_CLASS},created-utc=${STAMP}" \
--expected-bucket-owner "${AWS_ACCOUNT_ID}"
)"
head_result="$(
aws s3api head-object \
--bucket "${S3_BUCKET}" \
--key "${object_key}" \
--expected-bucket-owner "${AWS_ACCOUNT_ID}"
)"
remote_size="$(jq -er '.ContentLength' <<<"${head_result}")"
if [[ "${local_size}" != "${remote_size}" ]]; then
echo "ERROR: S3 object size mismatch for ${object_key}." >&2
exit 1
fi
jq -e '
.ServerSideEncryption == "AES256"
and .ObjectLockMode == "GOVERNANCE"
and .ObjectLockRetainUntilDate != null
and .VersionId != null
' <<<"${head_result}" >/dev/null
version_id="$(jq -r '.VersionId // "unknown"' <<<"${put_result}")"
echo "VERIFIED S3 OBJECT: s3://${S3_BUCKET}/${object_key}"
echo "VERSION ID: ${version_id}"
}
upload_and_verify \
"${SNAPSHOT_PATH}" \
"${SNAPSHOT_KEY}" \
"application/octet-stream"
upload_and_verify \
"${CHECKSUM_PATH}" \
"${CHECKSUM_KEY}" \
"text/plain"
METRIC_DATA="$(
jq -nc \
--arg server "${BACKUP_SERVER}" \
--arg class "${BACKUP_CLASS}" \
'[{
MetricName: "BackupSuccess",
Dimensions: [
{Name: "Server", Value: $server},
{Name: "Class", Value: $class}
],
Value: 1,
Unit: "Count"
}]'
)"
aws cloudwatch put-metric-data \
--namespace "${CLOUDWATCH_NAMESPACE}" \
--metric-data "${METRIC_DATA}"
logger \
-t ac-vault-raft-backup \
-- \
"Vault ${BACKUP_CLASS} snapshot uploaded and verified: s3://${S3_BUCKET}/${SNAPSHOT_KEY}"
echo
echo "PASS: Vault ${BACKUP_CLASS} snapshot uploaded and verified."
echo "SNAPSHOT: s3://${S3_BUCKET}/${SNAPSHOT_KEY}"
echo "CHECKSUM: s3://${S3_BUCKET}/${CHECKSUM_KEY}"
echo "RETAIN UNTIL: ${RETAIN_UNTIL}"
SUCCESS=1
BASH
sudo chown root:root /usr/local/sbin/ac-vault-raft-backup
sudo chmod 0700 /usr/local/sbin/ac-vault-raft-backup
sudo bash -n /usr/local/sbin/ac-vault-raft-backup
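The sealed-state test in the script can be checked in isolation. The following is a minimal sketch, assuming jq is installed as in 31.4. The JSON strings are reduced stand-ins for vault status -format=json output, and the helper names sealed_with_alternative and is_unsealed are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: show why '.sealed // true' cannot distinguish a sealed Vault
# from an unsealed one, while '.sealed == false' can.
set -euo pipefail

# Hypothetical helper names. Each takes a JSON document as its only argument.
sealed_with_alternative() { jq -r '.sealed // true' <<<"$1"; }
is_unsealed() { jq -e '.sealed == false' <<<"$1" >/dev/null; }

# Reduced stand-ins for 'vault status -format=json' output.
unsealed='{"initialized": true, "sealed": false}'
sealed='{"initialized": true, "sealed": true}'

# The alternative operator yields true for both inputs.
echo "alternative operator, unsealed input: $(sealed_with_alternative "${unsealed}")"
echo "alternative operator, sealed input:   $(sealed_with_alternative "${sealed}")"

# The explicit comparison separates the cases and fails closed on a missing field.
if is_unsealed "${unsealed}"; then echo "explicit test, unsealed input: unsealed"; fi
if ! is_unsealed "${sealed}"; then echo "explicit test, sealed input: sealed"; fi
if ! is_unsealed '{}'; then echo "explicit test, missing field: treated as sealed"; fi
```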
Install the immediate SNS failure-alert script:
sudo tee /usr/local/sbin/ac-vault-backup-alert >/dev/null <<'BASH'
#!/usr/bin/env bash
set -Eeuo pipefail
umask 077
BACKUP_CLASS="${1:-unknown}"
source /etc/ac-vault-snapshot/config.env
export AWS_PROFILE
export AWS_REGION
UNIT_NAME="ac-vault-raft-backup@${BACKUP_CLASS}.service"
HOST_NAME="$(hostname --fqdn 2>/dev/null || hostname)"
FAILED_AT="$(date -u '+%Y-%m-%dT%H:%M:%SZ')"
JOURNAL_EXCERPT="$(
journalctl \
-u "${UNIT_NAME}" \
-n 30 \
--no-pager \
2>/dev/null || true
)"
MESSAGE="$(cat <<EOF
Aspireclan production Vault Raft backup failed.
Server: ${HOST_NAME}
Backup class: ${BACKUP_CLASS}
UTC time: ${FAILED_AT}
Systemd unit: ${UNIT_NAME}
S3 bucket: ${S3_BUCKET}
Recent journal output:
${JOURNAL_EXCERPT}
EOF
)"
aws sns publish \
--topic-arn "${SNS_TOPIC_ARN}" \
--subject "FAILED: prod-vault-01 ${BACKUP_CLASS} Raft backup" \
--message "${MESSAGE}" \
>/dev/null
logger \
-t ac-vault-backup-alert \
-- \
"Published Vault ${BACKUP_CLASS} backup failure notification."
BASH
sudo chown root:root /usr/local/sbin/ac-vault-backup-alert
sudo chmod 0700 /usr/local/sbin/ac-vault-backup-alert
sudo bash -n /usr/local/sbin/ac-vault-backup-alert
31.7 Install systemd services and timers
Install the backup service with explicit root AWS profile paths:
sudo tee /etc/systemd/system/ac-vault-raft-backup@.service >/dev/null <<'EOF'
[Unit]
Description=Create and upload the prod-vault-01 %i Raft snapshot
Wants=network-online.target
After=network-online.target vault.service
OnFailure=ac-vault-raft-backup-failure@%i.service
[Service]
Type=oneshot
User=root
Group=root
Environment=HOME=/root
Environment=AWS_SHARED_CREDENTIALS_FILE=/root/.aws/credentials
Environment=AWS_CONFIG_FILE=/root/.aws/config
ExecStart=/usr/local/sbin/ac-vault-raft-backup %i
UMask=0077
Nice=10
IOSchedulingClass=best-effort
IOSchedulingPriority=7
PrivateTmp=true
PrivateDevices=true
ProtectSystem=strict
ProtectHome=read-only
ProtectKernelTunables=true
ProtectKernelModules=true
ProtectControlGroups=true
RestrictSUIDSGID=true
LockPersonality=true
NoNewPrivileges=true
ReadOnlyPaths=/etc/ac-vault-snapshot
ReadOnlyPaths=/opt/vault/tls
ReadOnlyPaths=/root/.aws
ReadWritePaths=/var/lib/ac-vault-snapshot
ReadWritePaths=/run/lock
TimeoutStartSec=15min
EOF
Install the failure-notification service that the OnFailure setting starts:
sudo tee /etc/systemd/system/ac-vault-raft-backup-failure@.service >/dev/null <<'EOF'
[Unit]
Description=Notify operators that the prod-vault-01 %i Raft snapshot failed
Wants=network-online.target
After=network-online.target
[Service]
Type=oneshot
User=root
Group=root
Environment=HOME=/root
Environment=AWS_SHARED_CREDENTIALS_FILE=/root/.aws/credentials
Environment=AWS_CONFIG_FILE=/root/.aws/config
ExecStart=/usr/local/sbin/ac-vault-backup-alert %i
UMask=0077
PrivateTmp=true
PrivateDevices=true
ProtectSystem=strict
ProtectHome=read-only
ProtectKernelTunables=true
ProtectKernelModules=true
ProtectControlGroups=true
RestrictSUIDSGID=true
LockPersonality=true
NoNewPrivileges=true
ReadOnlyPaths=/etc/ac-vault-snapshot
ReadOnlyPaths=/root/.aws
TimeoutStartSec=2min
EOF
Install the timers:
sudo tee /etc/systemd/system/ac-vault-raft-backup-hourly.timer >/dev/null <<'EOF'
[Unit]
Description=Run the prod-vault-01 hourly Raft snapshot
[Timer]
OnCalendar=*-*-* *:17:00
Persistent=true
AccuracySec=1min
RandomizedDelaySec=2min
Unit=ac-vault-raft-backup@hourly.service
[Install]
WantedBy=timers.target
EOF
sudo tee /etc/systemd/system/ac-vault-raft-backup-daily.timer >/dev/null <<'EOF'
[Unit]
Description=Run the prod-vault-01 daily Raft snapshot
[Timer]
OnCalendar=*-*-* 00:37:00 UTC
Persistent=true
AccuracySec=1min
RandomizedDelaySec=2min
Unit=ac-vault-raft-backup@daily.service
[Install]
WantedBy=timers.target
EOF
sudo tee /etc/systemd/system/ac-vault-raft-backup-monthly.timer >/dev/null <<'EOF'
[Unit]
Description=Run the prod-vault-01 monthly Raft snapshot
[Timer]
OnCalendar=*-*-01 01:07:00 UTC
Persistent=true
AccuracySec=1min
RandomizedDelaySec=2min
Unit=ac-vault-raft-backup@monthly.service
[Install]
WantedBy=timers.target
EOF
sudo systemctl daemon-reload
sudo systemd-analyze verify /etc/systemd/system/ac-vault-raft-backup@.service /etc/systemd/system/ac-vault-raft-backup-failure@.service /etc/systemd/system/ac-vault-raft-backup-hourly.timer /etc/systemd/system/ac-vault-raft-backup-daily.timer /etc/systemd/system/ac-vault-raft-backup-monthly.timer
31.8 Run the first backup and verify S3
Confirm Vault is unsealed, then run the script directly so the complete result is visible:
sudo -H env VAULT_ADDR="https://192.168.8.2:8200" VAULT_CACERT="/opt/vault/tls/vault-cert.pem" vault status
sudo /usr/local/sbin/ac-vault-raft-backup hourly
The confirmed first upload produced a snapshot of 46594 bytes with this S3 state:
Encryption: AES256
VersionId: 6BLmKNuyha0hDytMyMhisiSZiLwIfWVJ
ObjectLockMode: GOVERNANCE
RetainUntil: 2026-06-17T13:30:19+00:00
Metadata:
backup-class: hourly
created-utc: 20260614T133019Z
vault-node: prod-vault-01
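The RetainUntil value above is the created-utc stamp plus the 3-day hourly retention. The following is a minimal sketch for recomputing the expected value from any stamp, assuming GNU date. The helper name retain_until is hypothetical and is not part of the installed script.

```shell
#!/usr/bin/env bash
# Sketch only: recompute an expected retain-until timestamp from a
# created-utc stamp and a retention period in days. Requires GNU date.
set -euo pipefail

# retain_until is a hypothetical helper name.
# Arguments: stamp in the form 20260614T133019Z, retention in whole days.
retain_until() {
  local stamp="$1" days="$2"
  # Rewrite 20260614T133019Z as 2026-06-14T13:30:19Z so GNU date can parse it.
  local iso="${stamp:0:4}-${stamp:4:2}-${stamp:6:2}T${stamp:9:2}:${stamp:11:2}:${stamp:13:2}Z"
  local epoch
  epoch="$(date -u -d "${iso}" '+%s')"
  # Add whole days as seconds and print in the format the script passes to S3.
  date -u -d "@$(( epoch + days * 86400 ))" '+%Y-%m-%dT%H:%M:%SZ'
}

# First confirmed hourly upload: prints 2026-06-17T13:30:19Z
retain_until 20260614T133019Z 3
```

The same call with 35 or 365 days gives the expected value for a daily or monthly object created at the same instant.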
List the newest objects and populate both shell variables. The checksum variable assignment must not be omitted:
LATEST_SNAPSHOT_KEY="$(
sudo -H aws s3api list-objects-v2 --bucket aspireclan-prod-vault-raft-backups-425389089086-us-east-1 --prefix prod-vault-01/hourly/ --profile vault-backup --region us-east-1 --query 'sort_by(Contents[?ends_with(Key, `.snap`)], &LastModified)[-1].Key' --output text
)"
if [[ -z "${LATEST_SNAPSHOT_KEY}" || "${LATEST_SNAPSHOT_KEY}" == "None" ]]; then
echo "ERROR: No hourly snapshot was found in S3." >&2
exit 1
fi
LATEST_CHECKSUM_KEY="${LATEST_SNAPSHOT_KEY}.sha256"
printf 'Snapshot key: %s\n' "${LATEST_SNAPSHOT_KEY}"
printf 'Checksum key: %s\n' "${LATEST_CHECKSUM_KEY}"
sudo -H aws s3api head-object --bucket aspireclan-prod-vault-raft-backups-425389089086-us-east-1 --key "${LATEST_SNAPSHOT_KEY}" --profile vault-backup --region us-east-1 --query '{
Size:ContentLength,
Encryption:ServerSideEncryption,
VersionId:VersionId,
ObjectLockMode:ObjectLockMode,
RetainUntil:ObjectLockRetainUntilDate,
Metadata:Metadata
}'
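The keys returned by the listing follow the layout the installed script builds: server, class, year, month, day, then the file name. The following is a minimal sketch that rebuilds a key from a class and a UTC stamp, which can help when searching for one specific snapshot. The helper name snapshot_key is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: rebuild the S3 keys used by the backup script from a server
# name, a backup class, and a UTC stamp such as 20260614T133019Z.
set -euo pipefail

# snapshot_key is a hypothetical helper name; the layout mirrors the
# S3_PREFIX and SNAPSHOT_NAME assignments in the installed script.
snapshot_key() {
  local server="$1" class="$2" stamp="$3"
  printf '%s/%s/%s/%s/%s/%s-%s.snap\n' \
    "${server}" "${class}" "${stamp:0:4}" "${stamp:4:2}" "${stamp:6:2}" \
    "${server}" "${stamp}"
}

key="$(snapshot_key prod-vault-01 hourly 20260614T133019Z)"
echo "Snapshot key: ${key}"
echo "Checksum key: ${key}.sha256"
```

A day-level prefix such as prod-vault-01/hourly/2026/06/14/ can be passed to list-objects-v2 with --prefix to narrow the listing to a single day.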
Download both objects, validate the checksum, and inspect the snapshot:
VERIFY_DIR="$(
sudo mktemp -d /var/lib/ac-vault-snapshot/verify.XXXXXX
)"
sudo -H aws s3api get-object --bucket aspireclan-prod-vault-raft-backups-425389089086-us-east-1 --key "${LATEST_SNAPSHOT_KEY}" --profile vault-backup --region us-east-1 "${VERIFY_DIR}/$(basename "${LATEST_SNAPSHOT_KEY}")"
sudo -H aws s3api get-object --bucket aspireclan-prod-vault-raft-backups-425389089086-us-east-1 --key "${LATEST_CHECKSUM_KEY}" --profile vault-backup --region us-east-1 "${VERIFY_DIR}/$(basename "${LATEST_CHECKSUM_KEY}")"
sudo bash -c "
cd '${VERIFY_DIR}'
sha256sum -c '$(basename "${LATEST_CHECKSUM_KEY}")'
"
sudo vault operator raft snapshot inspect "${VERIFY_DIR}/$(basename "${LATEST_SNAPSHOT_KEY}")"
sudo rm -rf "${VERIFY_DIR}"
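unset VERIFY_DIR
Verify the uploader cannot delete a backup. Run this while LATEST_SNAPSHOT_KEY is still set; an empty key makes the command fail for a reason unrelated to permissions, which would produce a false pass. The command must fail with AccessDenied:
if [[ -z "${LATEST_SNAPSHOT_KEY:-}" ]]; then
echo "ERROR: LATEST_SNAPSHOT_KEY is empty; repeat the listing step first." >&2
exit 1
fi
if DELETE_OUTPUT="$(sudo -H aws s3api delete-object --bucket aspireclan-prod-vault-raft-backups-425389089086-us-east-1 --key "${LATEST_SNAPSHOT_KEY}" --profile vault-backup --region us-east-1 2>&1)"
then
echo "ERROR: Uploader unexpectedly has delete permission." >&2
exit 1
elif grep -q 'AccessDenied' <<<"${DELETE_OUTPUT}"
then
echo "PASS: Uploader cannot delete Vault backup objects."
else
echo "ERROR: Delete failed for a reason other than AccessDenied: ${DELETE_OUTPUT}" >&2
exit 1
fi
unset DELETE_OUTPUT LATEST_SNAPSHOT_KEY LATEST_CHECKSUM_KEY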
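The checksum verification earlier in this section changes into the download directory before running sha256sum -c. That is needed because the installed script hashes the snapshot by its bare file name, so the checksum file contains a relative name. The following is a minimal sketch of that round trip using a stand-in file. The helper name verify_in_dir and the file name example.snap are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: show why sha256sum -c must run from the directory that holds
# the snapshot when the checksum file stores a relative file name.
set -euo pipefail

work_dir="$(mktemp -d)"
trap 'rm -rf "${work_dir}"' EXIT

# Stand-in file; a real check would use the downloaded .snap object.
printf 'example snapshot bytes\n' > "${work_dir}/example.snap"

# Same pattern as the installed script: hash a bare file name from inside
# the directory, so the checksum file records no directory component.
( cd "${work_dir}" && sha256sum example.snap > example.snap.sha256 )

# verify_in_dir is a hypothetical helper name.
# Arguments: directory, checksum file name inside that directory.
verify_in_dir() {
  ( cd "$1" && sha256sum -c "$2" )
}

verify_in_dir "${work_dir}" example.snap.sha256
```

Running sha256sum -c from another directory would fail to find the file even when the download is intact, which is why the verification step wraps the command in a cd.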
31.9 Enable and observe scheduled backups
Only enable the timers after the manual upload and downloaded checksum verification succeed:
sudo systemctl enable --now ac-vault-raft-backup-hourly.timer ac-vault-raft-backup-daily.timer ac-vault-raft-backup-monthly.timer
sudo systemctl is-enabled ac-vault-raft-backup-hourly.timer ac-vault-raft-backup-daily.timer ac-vault-raft-backup-monthly.timer
sudo systemctl is-active ac-vault-raft-backup-hourly.timer ac-vault-raft-backup-daily.timer ac-vault-raft-backup-monthly.timer
sudo systemctl list-timers 'ac-vault-raft-backup-*' --all
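Test the immediate failure path without creating a snapshot. The invalid class name makes the script exit with code 64 before it contacts Vault or S3, which starts the OnFailure unit: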
sudo systemctl start ac-vault-raft-backup@invalid-test.service || true
sudo journalctl -u ac-vault-raft-backup@invalid-test.service -n 50 --no-pager
sudo journalctl -u ac-vault-raft-backup-failure@invalid-test.service -n 50 --no-pager
sudo systemctl reset-failed ac-vault-raft-backup@invalid-test.service || true
If a completed failure-alert instance is no longer loaded, systemd can report Unit ... not loaded during reset-failed; that is harmless because the failure-alert oneshot finished successfully and has no failed state to clear.
31.10 Create the independent stale-backup alarm
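Run from AWS CloudShell after a successful hourly metric exists. SNS_TOPIC_ARN must be set in the current shell; if the session from 31.3 has ended, assign the topic ARN again before running the command: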
aws cloudwatch put-metric-alarm --region us-east-1 --alarm-name "prod-vault-01-hourly-raft-backup-missing" --alarm-description "Alert when prod-vault-01 does not publish a successful hourly Raft backup metric for two consecutive hours." --namespace "Aspireclan/VaultBackup" --metric-name "BackupSuccess" --dimensions Name=Server,Value=prod-vault-01 Name=Class,Value=hourly --statistic Sum --period 3600 --evaluation-periods 2 --datapoints-to-alarm 2 --threshold 1 --comparison-operator LessThanThreshold --treat-missing-data breaching --alarm-actions "${SNS_TOPIC_ARN}" --ok-actions "${SNS_TOPIC_ARN}"
aws cloudwatch describe-alarms --region us-east-1 --alarm-names "prod-vault-01-hourly-raft-backup-missing"
Test the notification path, then return the alarm to OK:
aws cloudwatch set-alarm-state --region us-east-1 --alarm-name "prod-vault-01-hourly-raft-backup-missing" --state-value ALARM --state-reason "Manual notification test after configuring Vault S3 backups"
aws cloudwatch set-alarm-state --region us-east-1 --alarm-name "prod-vault-01-hourly-raft-backup-missing" --state-value OK --state-reason "Manual notification test completed"
31.11 Operational validation and remaining resilience gate
Implemented and directly observed:
[✓] Dedicated S3 bucket exists
[✓] SSE-S3 AES256 configured
[✓] S3 Versioning enabled
[✓] S3 Object Lock enabled
[✓] Governance retention applied
[✓] Public access blocked
[✓] BucketOwnerEnforced configured
[✓] Lifecycle rules configured
[✓] Dedicated IAM uploader configured without delete permission
[✓] Snapshot-only Vault policy and AppRole configured
[✓] Corrected Vault sealed-state Boolean test installed
[✓] First hourly snapshot uploaded to S3
[✓] Snapshot object version ID returned
[✓] AES256 and Governance metadata verified
[✓] Snapshot downloaded successfully
[✓] OnFailure service invoked when Vault was sealed
Still requiring operator confirmation:
[ ] Download checksum object and run sha256sum -c successfully
[ ] Inspect the downloaded snapshot successfully
[ ] Confirm uploader delete request is denied
[ ] Enable and observe hourly, daily, and monthly timers
[ ] Confirm SNS email delivery for an intentional failure
[ ] Create and test the CloudWatch stale-backup alarm
[ ] Complete the isolated snapshot-restore exercise
Do not perform the restore exercise on prod-vault-01. Restore only to an isolated recovery VM with production DNS, clients, and external credential systems blocked.
32. Reboot and Manual Unseal Procedure
The reboot and three-share manual-unseal procedure was tested successfully on prod-vault-01.
Before reboot, confirm that at least three shares are immediately available and that a fresh snapshot has been copied to an approved off-host destination.
Reboot:
sudo reboot
Reconnect and validate the service and sealed state:
ssh acllc@192.168.8.2
sudo systemctl is-active vault
sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault status
Expected immediately after reboot:
Initialized true
Sealed true
Storage Type raft
Submit any three distinct shares interactively:
sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault operator unseal
sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault operator unseal
sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault operator unseal
Verify:
sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault status
Expected normal operating state:
Initialized true
Sealed false
Storage Type raft
Test the named administrator after reboot:
sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault login -method=userpass username=manoj-admin
sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault token lookup
sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault audit list
sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault secrets list
sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault operator raft list-peers
sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault kv get cert/prod/infra/_schema
sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault token revoke -self
sudo rm -f /root/.vault-token
Never script the shares into systemd, cron, cloud-init, Terraform, Ansible, or local plaintext files.
33. Vault UI Access
The UI is enabled by ui = true and runs on the same HTTPS listener as the API.
Open from an internal browser:
https://192.168.8.2:8200/ui
https://vault.aspireclan.com:8200/ui
The bootstrap certificate is self-signed, so the browser will show a trust warning until the certificate is replaced or explicitly trusted.
State-dependent behavior:
- Uninitialized rebuild: initialization screen.
- Initialized but sealed: unseal screen.
- Initialized and unsealed: login screen.
The current production instance is initialized, unsealed during normal operation, and has its initial root token revoked. Use:
Authentication method: Username
Mount path: userpass
Username: manoj-admin
Password: <YOUR_STRONG_ADMIN_PASSWORD>
Do not expose TCP 8200 to the public internet.
34. Complete Validation Checklist
Run the Terraform portion from the Terraform control host or repository checkout, not from prod-vault-01:
cd envs/prod
terraform init -input=false -reconfigure
terraform validate
terraform plan \
-input=false \
-lock-timeout=5m \
-var-file=terraform.tfvars \
-var-file=web.tfvars \
-var-file=app.tfvars \
-var-file=db.tfvars \
-var-file=k8s.tfvars \
-var-file=runner.tfvars \
-var-file=vault.tfvars
Run the following direct-server checklist on prod-vault-01; privileged checks consistently use sudo:
# VM identity and networking
sudo hostnamectl --static
sudo ip -brief address
sudo ip route
sudo resolvectl status
sudo getent ahostsv4 vault.aspireclan.com
# Host hardening and services
sudo swapon --show
sudo ufw status verbose
sudo systemctl is-active ssh qemu-guest-agent vault
# TLS files, certificate, and ports
sudo test -s /opt/vault/tls/vault-key.pem &&
echo "PASS: TLS private key exists."
sudo test -s /opt/vault/tls/vault-cert.pem &&
echo "PASS: TLS certificate exists."
sudo openssl x509 \
-in /opt/vault/tls/vault-cert.pem \
-noout \
-subject \
-issuer \
-dates \
-ext subjectAltName
sudo ss -lntp | grep -E ':(8200|8201)\b'
# Vault state
sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault status
Log in as the named administrator:
sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault login -method=userpass username=manoj-admin
Validate the logical baseline and backup files:
sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault token lookup
sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault audit list -detailed
sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault auth list -detailed
sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault secrets list -detailed
sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault policy list
sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault read cert/config
sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault operator raft list-peers
sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault kv get cert/prod/infra/_schema
sudo tail -n 20 /var/log/vault/audit.log
sudo logrotate -d /etc/logrotate.d/vault-audit
sudo ls -lh /var/backups/vault
sudo bash -c '
cd /var/backups/vault
for checksum in *.sha256; do
[ -e "$checksum" ] || continue
sha256sum -c "$checksum"
done
'
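The loop above verifies only the checksum files that exist, so a snapshot with no checksum passes silently. A minimal sketch for listing such snapshots follows. It assumes snapshots end in `.snap` and their checksums in `.snap.sha256`; that naming is an assumption for illustration, as is the function name, so adjust both to the real backup layout.

```shell
#!/usr/bin/env bash
# Sketch: list snapshot files that have no non-empty sibling checksum file.
# ASSUMPTION: snapshots are named *.snap and checksums *.snap.sha256.
list_unchecksummed_snapshots() {
  local dir snap
  dir="$1"
  for snap in "$dir"/*.snap; do
    [ -e "$snap" ] || continue            # the glob matched nothing
    [ -s "$snap.sha256" ] || echo "$snap" # checksum missing or empty
  done
}

# Usage from a root shell (not executed by this sketch):
#   list_unchecksummed_snapshots /var/backups/vault
```

An empty result means every snapshot found has a checksum file; it does not prove the checksums match, which the `sha256sum -c` loop covers.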
Validate the certificate-issuer handoff state without displaying secrets:
sudo test -x /usr/local/sbin/ac-vault-prepare-cert-issuer &&
echo "PASS: issuer preparation utility installed"
sudo -H env VAULT_ADDR="https://192.168.8.2:8200" VAULT_CACERT="/opt/vault/tls/vault-cert.pem" vault secrets list -detailed
sudo -H env VAULT_ADDR="https://192.168.8.2:8200" VAULT_CACERT="/opt/vault/tls/vault-cert.pem" vault token capabilities acme/config
sudo -H env VAULT_ADDR="https://192.168.8.2:8200" VAULT_CACERT="/opt/vault/tls/vault-cert.pem" vault read auth/approle/role/cert-issuer
sudo -H env VAULT_ADDR="https://192.168.8.2:8200" VAULT_CACERT="/opt/vault/tls/vault-cert.pem" vault kv metadata get acme/cloudflare/dns
Run the final metadata command only after the helper has completed successfully. At the current HTTP 403 checkpoint the acme/cloudflare/dns path has not been written, so a missing-path response is expected.
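Because the helper stopped on a write to acme/config, the output of `vault token capabilities acme/config` is the quickest way to confirm the policy repair. The sketch below interprets that comma-separated output. It assumes a write needs `create` or `update` (or `root`) and that `deny` always wins; `can_write` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: decide whether a capability list permits a write.
# Input is the comma-separated text printed by `vault token capabilities`,
# for example "create, read, update" or "deny".
# ASSUMPTION: a write to the path needs create or update; deny overrides all.
can_write() {
  local caps
  caps=",$(printf '%s' "$1" | tr -d ' '),"
  case "$caps" in
    *,deny,*) return 1 ;;
    *,root,*|*,create,*|*,update,*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage on prod-vault-01 after logging in (not executed by this sketch):
#   caps="$(sudo -H env VAULT_ADDR=... VAULT_CACERT=... \
#            vault token capabilities acme/config)"
#   can_write "$caps" && echo "PASS: token can write acme/config"
```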
Remove the temporary administrator token:
sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault token revoke -self
sudo rm -f /root/.vault-token
Required state before certificate issuer work:
Vault initialized: true
Vault sealed: false during normal operation
Storage type: raft
UI: reachable internally
Audit device: file/ and producing records
Human administrator: userpass login tested
Initial root token: revoked
KV v2 mount: cert/ with max_versions 20
Policies: platform-admin, cert-issuer, environment readers
AppRoles: baseline roles configured; cert-issuer delivery still pending after ACME policy repair
ACME KV v2: mount may exist; acme/config write and Cloudflare secret storage are not yet confirmed
Vault issuer helper: installed; first run stopped at HTTP 403 on acme/config
Manual snapshot/checksum: tested
Reboot/manual unseal: tested
Dedicated SSE-S3 bucket and first hourly upload: complete/tested
Downloaded checksum and snapshot inspection: operator confirmation pending
Hourly/daily/monthly timer observation: pending
CloudWatch stale-backup alarm test: pending
Isolated snapshot restore exercise: pending
UFW: active and default-deny
Public exposure: none
35. Controlled Vault Upgrade Procedure
Treat Vault upgrades as maintenance operations.
Before upgrading:
1. Review the official release notes and upgrade guidance.
2. Save and verify a new Raft snapshot.
3. Copy the snapshot off-host.
4. Confirm at least three unseal shares are recoverable.
5. Record the currently installed package version.
6. Schedule downtime for a single-node service restart and unseal.
Upgrade only to an explicitly approved package version:
sudo apt update
sudo apt-cache policy vault
sudo apt install "vault=<APPROVED_PACKAGE_VERSION>"
sudo /usr/bin/vault version
sudo systemctl restart vault
# Submit three distinct shares interactively.
sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault operator unseal
sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault operator unseal
sudo -H env \
VAULT_ADDR="https://192.168.8.2:8200" \
VAULT_CACERT="/opt/vault/tls/vault-cert.pem" \
vault operator unseal
Run the complete validation checklist and save another snapshot after the upgrade.
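One way to guard the step between package installation and restart is to compare the version reported by the binary with the approved release. The sketch below parses text shaped like the `vault version` output; the function names are hypothetical, and the approved version string is supplied by the operator. Note that the apt package version may carry a revision suffix, so this compares the binary-reported version only.

```shell
#!/usr/bin/env bash
# Sketch: extract the bare version from `vault version` output and compare
# it with the approved release before restarting the service.
# Expects text shaped like "Vault v1.2.3 (<commit>), built <date>".
extract_vault_version() {
  printf '%s\n' "$1" | sed -n 's/^Vault v\([0-9][0-9.]*\).*/\1/p'
}

# Succeeds only when the reported version equals the approved one exactly.
version_is_approved() {
  [ "$(extract_vault_version "$1")" = "$2" ]
}

# Usage (not executed by this sketch):
#   installed="$(/usr/bin/vault version)"
#   version_is_approved "$installed" "<APPROVED_VERSION>" ||
#     echo "STOP: installed binary does not match the approved version"
```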
36. Disaster-Recovery Design
Recovery requires all of the following:
A valid off-host Raft snapshot and checksum
At least three original unseal shares associated with that snapshot
Terraform and Ansible source
The approved DHCP reservation and DNS records
A supported Vault package version
Approved TLS replacement or bootstrap procedure
Recovery outline:
1. Provision a clean replacement VM with Terraform.
2. Apply the Vault Ansible baseline without initializing production data.
3. Install approved TLS material.
4. Follow the official Raft snapshot restore procedure on the isolated replacement.
5. Use the original cluster shares required for the restored data.
6. Validate policies, auth methods, audit devices, cert/ paths, and administrator login.
7. Move the internal service DNS record only after validation.
8. Rotate AppRole SecretIDs and other machine credentials if compromise is possible.
9. Save a new post-recovery snapshot.
Perform the first restore exercise before storing production certificate private keys. Never test a destructive restore on the active node.
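The two hard rules in this section, never restore on the active node and never restore an unverified snapshot, can be sketched as a preflight gate. The checksum naming (`<snapshot>.sha256` beside the snapshot, containing the bare file name) and the function name are assumptions for illustration; the restore itself still follows the official procedure and is not part of this sketch.

```shell
#!/usr/bin/env bash
# Sketch: refuse a restore on the active node and require a matching checksum.
ACTIVE_NODE="prod-vault-01"   # active node name taken from this page

# ASSUMPTION: the checksum lives at "<snapshot>.sha256" and records the bare
# file name, so verification must run from the snapshot's own directory.
restore_preflight() {
  local host snapshot
  host="$1"
  snapshot="$2"
  if [ "$host" = "$ACTIVE_NODE" ]; then
    echo "REFUSE: $host is the active node." >&2
    return 1
  fi
  ( cd "$(dirname "$snapshot")" &&
    sha256sum -c --quiet "$(basename "$snapshot").sha256" ) || {
    echo "REFUSE: checksum mismatch or missing checksum file." >&2
    return 1
  }
  echo "OK: preflight passed for $snapshot on $host."
}

# Usage on the isolated replacement VM (not executed by this sketch):
#   restore_preflight "$(hostname -s)" /path/to/snapshot.snap
```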
37. Handoff to prod-cert-issuer-01
The authoritative issuer VM is prod-cert-01. Its Terraform and Ansible foundation is complete. Complete the remaining Vault credential handoff only after:
Vault is initialized and normally unsealed.
The file audit device is healthy.
The named administrator works.
The initial root token is revoked.
The cert/ KV v2 mount and schema paths exist.
The cert-issuer policy and AppRole exist.
No issuer SecretID has been generated prematurely.
A snapshot has been copied off-host.
A restore exercise is scheduled.
Once the platform-admin and cert-issuer policy updates are applied, the helper has succeeded, the wrapped SecretID has been consumed on prod-cert-01, and preflight has passed, the issuer will use paths such as:
cert/local/web/fp
cert/local/srvc/api-fp
cert/dev/web/fp
cert/dev/srvc/api-fp
cert/qa/web/fp
cert/qa/srvc/api-fp
cert/prod/web/fp
cert/prod/srvc/api-fp
cert/prod/infra/vault
Start with Let's Encrypt staging, verify renewal and Vault versioning, then approve production issuance.
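The example paths in this section follow an environment / category / name pattern. The sketch below regenerates exactly the nine paths listed here, which can be handy for looping validation commands over them. It does not claim the list is exhaustive, and `list_issuer_paths` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: print the issuer KV paths listed in this section from their parts.
list_issuer_paths() {
  local env
  for env in local dev qa prod; do
    echo "cert/$env/web/fp"
    echo "cert/$env/srvc/api-fp"
  done
  echo "cert/prod/infra/vault"   # the only infra path listed on this page
}

# Usage after issuance begins (not executed by this sketch):
#   list_issuer_paths | while read -r path; do
#     sudo -H env VAULT_ADDR=... VAULT_CACERT=... vault kv metadata get "$path"
#   done
```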
38. Handoff to prod-int-proxy-01
The proxy phase begins only after at least one valid certificate bundle exists in cert/.
The proxy design will:
Use an environment-specific read-only AppRole.
Read only the exact certificate environment required.
Render HAProxy PEM bundles to root-owned local files.
Run haproxy -c before every reload.
Reload gracefully only after successful validation.
Never write certificate versions or read issuer credentials.
Avoid direct public exposure of Vault.
Vault Agent or an equivalent controlled fetch mechanism can be introduced after AppRole delivery and renewal behavior are tested.
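The validate-before-reload rule in the proxy design can be sketched as a small gate. The default configuration path `/etc/haproxy/haproxy.cfg` and the `systemctl reload haproxy` command are assumptions based on the stock Ubuntu package, and the function and variable names are hypothetical; both commands are overridable so the flow can be dry-run.

```shell
#!/usr/bin/env bash
# Sketch: reload HAProxy only when configuration validation succeeds.
# ASSUMPTION: stock Ubuntu paths and unit name; override via environment.
VALIDATE_CMD="${VALIDATE_CMD:-haproxy -c -f /etc/haproxy/haproxy.cfg}"
RELOAD_CMD="${RELOAD_CMD:-systemctl reload haproxy}"

validate_then_reload() {
  # Both variables are expanded unquoted on purpose so they split into
  # a command and its arguments.
  if $VALIDATE_CMD; then
    $RELOAD_CMD && echo "RELOADED"
  else
    echo "SKIPPED: validation failed; running configuration left untouched." >&2
    return 1
  fi
}

# Usage on the proxy after rendering a new PEM bundle (not executed here):
#   sudo bash -c '. ./reload-gate.sh && validate_then_reload'
```

Keeping the reload inside the success branch means a bad bundle never replaces the configuration HAProxy is already serving.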
39. Mandatory Security Rules
No unseal shares in Git, Vault, Terraform, Ansible, GitHub, email, chat, screenshots, or plaintext files.
No initial root token in shell profiles, command arguments, GitHub secrets, or automation logs.
No second initialization of an already initialized Raft backend.
No chmod weakening of /opt/vault/tls to make the private key broadly readable.
Use sudo env for local CLI access to the protected bootstrap certificate.
No permanent VAULT_SKIP_VERIFY=true.
No Vault API or UI exposed directly to the internet.
No automatic restart of an initialized Shamir-sealed Vault without three shares available.
No assumption that SIGHUP reloads listener TLS.
No AppRole SecretID before the target VM exists.
No reader policy with write access.
No certificate issuer with broad Vault administration permissions.
No production private keys before audit and off-host snapshot validation.
No backup considered valid until restore testing succeeds.
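The rule against weakening /opt/vault/tls permissions can be spot-checked mechanically. The sketch below fails when a file grants any permission to "other"; it assumes GNU `stat` as shipped on Ubuntu, and `not_world_accessible` is a hypothetical helper name. It does not replace a review of ownership and group membership.

```shell
#!/usr/bin/env bash
# Sketch: fail when a file is readable, writable, or executable by "other".
# ASSUMPTION: GNU stat (coreutils), as installed on Ubuntu.
not_world_accessible() {
  local mode
  mode="$(stat -c '%a' "$1")" || return 2   # file missing or unreadable
  # The last octal digit holds the permissions for "other".
  case "$mode" in
    *0) return 0 ;;
    *)  echo "FAIL: $1 has mode $mode" >&2; return 1 ;;
  esac
}

# Usage on prod-vault-01 (not executed by this sketch):
#   sudo bash -c '. ./perm-check.sh &&
#     not_world_accessible /opt/vault/tls/vault-key.pem &&
#     echo "PASS: private key is not world-accessible"'
```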
40. Complete Implementation Order
1. Confirm prod-dns-01 is healthy at 192.168.8.4. Status: complete.
2. Confirm router reservation aa:bb:cc:04:05:01 → 192.168.8.2. Status: complete.
3. Confirm vault.aspireclan.com resolves to 192.168.8.2. Status: complete.
4. Confirm vault.tfvars, Terraform module, variables, outputs, backend. Status: complete.
5. Push to prod and let the DNS-first smart workflow run. Status: complete.
6. Confirm Terraform reports no destructive action. Status: complete.
7. Confirm Ansible installs Vault and converges prod-vault-01. Status: complete.
8. Confirm Vault 2.0.2, TLS, Raft, UFW, and service health. Status: complete.
9. Run protected-path validation with the documented sudo commands. Status: complete/tested.
10. Initialize once with five shares and threshold three. Status: complete; never rerun.
11. Secure all five shares outside the Vault server and automation systems. Status: operator custody; verify/maintain.
12. Submit three shares interactively and verify unsealed state. Status: complete/tested.
13. Run ac-vault-bootstrap-logical using the sudo terminal-input procedure. Status: complete/tested.
14. Test the named userpass administrator. Status: complete/tested.
15. Revoke the initial root token through ac-vault-bootstrap-logical. Status: complete.
16. Delete the temporary stored initial-root-token value. Status: operator confirmation required.
17. Validate file audit, cert/ KV v2, policies, AppRoles, schema records. Status: complete/tested.
18. Save and checksum a manual Raft snapshot. Status: complete/tested.
19. Create and harden dedicated SSE-S3 Vault backup bucket. Status: complete/tested.
20. Test UI login with the named administrator. Status: complete/tested.
21. Reboot and test manual three-share unseal. Status: complete/tested.
22. Install snapshot scripts, AppRole, systemd services, and first upload. Status: complete/tested.
23. Provision prod-cert-01 and apply issuer Ansible foundation. Status: complete/tested.
24. Copy Vault listener certificate to prod-cert-01 trust path. Status: complete/tested.
25. Install ac-vault-prepare-cert-issuer on prod-vault-01. Status: complete; direct repair performed.
26. Run issuer preparation helper. Status: stopped at acme/config with HTTP 403.
27. Update version-controlled platform-admin and cert-issuer ACME policies. Status: current next step.
28. Apply policies and rerun helper to store token/create wrapped SecretID. Status: pending.
29. Bootstrap AppRole on prod-cert-01 and run preflight. Status: pending.
30. Enable issuer timer with declarations still disabled. Status: pending.
31. Complete checksum/timer/alarm validation and isolated restore exercise. Status: pending.
32. Enable first Let's Encrypt staging certificate only after approval. Status: pending.
41. Source Consistency Status
This revision was generated from the exact file attached in the current request and mounted as /mnt/data/Pasted text.txt.
Baseline line count: 4965
Baseline SHA-256: f14a72e13e0fbbacfb11eb3d936515f16297aed9aa633c4138b826d0b4778e01
Baseline page ID: prod-vault-01-setup
Baseline H2 sections: 43
The updated file is compared byte-for-byte against that exact baseline. It is not the earlier condensed Vault page and is not an old copy.
The following page structure is preserved:
Frontmatter field order
CustomCodeBlock import
Page-local K8S-overview-style layout CSS
Full-width desktop document container
260px desktop table-of-contents / anchor panel
Vault allocation table styling
Single H1
Forty-three numbered H2 sections
Original H2 section order
Horizontal separators
CustomCodeBlock presentation pattern
Official references section
Continuation prompt section
Material changes added to the exact baseline include:
Authoritative issuer hostname: prod-cert-01
Issuer identity: 192.168.8.3 / VM ID 3156003 / MAC aa:bb:cc:04:05:02
Issuer Terraform and Ansible foundation: complete
Vault trust file on issuer: /etc/aspireclan/cert-issuer/vault-ca.pem
Service-account TLS validation requirement: sudo -u ac-cert-issuer or root
Vault helper: /usr/local/sbin/ac-vault-prepare-cert-issuer
Repository helper source: ansible/files/vault/ac-vault-prepare-cert-issuer
Observed helper result: HTTP 403 at PUT /v1/acme/config
platform-admin ACME capability correction: documented
cert-issuer read-only Cloudflare path: acme/cloudflare/dns
Direct Ubuntu helper execution: documented
Response-wrapped AppRole delivery to prod-cert-01: documented
Current pending state: token storage, wrapped SecretID, preflight, timer, staging issuance
PowerShell operational path: removed from the current issuer handoff
Validation performed on this updated file:
PASS: Exact attached 4965-line baseline read successfully.
PASS: Updated output differs from the exact baseline.
PASS: Updated line count is greater than the baseline line count.
PASS: Frontmatter values and field order preserved.
PASS: Original K8S-style CSS block preserved byte-for-byte.
PASS: All 43 numbered H2 sections remain sequential and in the original order.
PASS: Exactly one rendered H1 remains.
PASS: CustomCodeBlock opening and closing counts match.
PASS: JSX template-literal delimiters are structurally balanced.
PASS: Inserted ac-vault-prepare-cert-issuer script passed bash -n before MDX escaping.
PASS: Required helper, policy, ACME, AppRole, TLS, and status strings are present.
PASS: No real Cloudflare token, Vault token, SecretID, unseal share, AWS secret key, certificate private key, or PEM payload is embedded.
PASS: Unified diff and old/new SHA-256 files generated.
LIMITATION: Full Docusaurus build was not run because the complete documentation repository and package manifest were not attached.
LIMITATION: The policy correction and second helper run are documented as pending because no successful post-403 output was provided.
42. Official References
- Install Vault
- Vault configuration parameters
- TCP listener configuration
- Configure TLS for the TCP listener
- Vault UI
- Vault initialization
- Vault seal and unseal
- Vault operator unseal
- Audit devices
- KV v2
- Enable KV v2
- Userpass authentication
- AppRole authentication
- AppRole best practices
- Vault response wrapping
- Vault policies
- Vault KV v2 API
- Integrated Storage deployment guide
- Raft operator commands
- Manage snapshots
- Save a snapshot
- Restore a snapshot
- Upgrade Vault
- S3 default encryption
- S3 Versioning
- S3 Object Lock
- S3 Block Public Access
- S3 lifecycle configuration
- AWS CLI v2 installation
- CloudWatch metric alarms
- Ansible deb822 repository module
43. Continuation Prompt
We have completed the Terraform, Ansible, and direct-server manual bootstrap foundation for prod-vault-01 at 192.168.8.2, MAC aa:bb:cc:04:05:01, Proxmox VM ID 3156002. Vault 2.0.2 is installed from HashiCorp's official APT repository and runs with Integrated Storage, TLS, UI, UFW, swap disabled, protected files under /opt/vault/tls, and S3 Terraform state at prod/terraform.tfstate.
Vault was initialized exactly once with five Shamir shares and threshold three. The direct Ubuntu sudo runbook was tested for TLS/status checks, interactive unseal, logical bootstrap, named manoj-admin validation, initial-root-token revocation, file audit, cert/ KV v2, policies, AppRole roles, schema records, Raft snapshot/checksum creation, and reboot/manual-unseal validation. Never rerun vault operator init against the existing Raft data, never put shares or tokens in command arguments, and do not weaken /opt/vault/tls permissions.
The authoritative certificate issuer is now prod-cert-01 at 192.168.8.3, MAC aa:bb:cc:04:05:02, Proxmox VM ID 3156003. Terraform provisioning and the Ansible issuer foundation are complete. The Vault listener certificate is installed on the issuer as /etc/aspireclan/cert-issuer/vault-ca.pem; because the parent directory is restricted, TLS checks must run as ac-cert-issuer or root.
The Vault-side helper /usr/local/sbin/ac-vault-prepare-cert-issuer is now installed. Its first direct run authenticated successfully but stopped at PUT /v1/acme/config with HTTP 403 permission denied. Therefore, update the version-controlled platform-admin.hcl and cert-issuer.hcl with the documented ACME capabilities, deploy and apply both policies, verify vault token capabilities acme/config, and rerun the helper. The failed run did not reach Cloudflare-token storage or wrapped SecretID generation.
After the helper succeeds, copy only the RoleID and one-use wrapping token directly to the prod-cert-01 PuTTY session, run ac-cert-issuer bootstrap-approle, verify the credential-file modes without printing their contents, run preflight, and enable ac-cert-issuer.timer only while certificate declarations remain disabled. Then enable one Let's Encrypt staging certificate, prove the second run is idempotent, validate renewal and Vault KV versioning, and approve production issuance before beginning prod-int-proxy-01.
The dedicated Vault Raft backup bucket aspireclan-prod-vault-raft-backups-425389089086-us-east-1 uses SSE-S3/AES256, Versioning, Governance Object Lock, lifecycle retention, Block Public Access, a no-delete uploader, a snapshot AppRole, root-only scripts, and systemd units. Complete the remaining checksum/timer/SNS/CloudWatch/isolated-restore gates. Preserve the DNS-first smart Terraform workflow, component-aware Ansible behavior, this page's complete 43-section structure, and the K8S-overview-style 260px desktop anchor panel.