Production DNS Setup
1. Purpose
This document is the complete implementation and operations reference for the Aspireclan production DNS server:
Hostname: prod-dns-01
Reserved IP: 192.168.8.4
Service: BIND9
Primary purpose: internal split DNS and controlled forwarding of public DNS queries
The server is the required DNS dependency for production infrastructure such as Vault, the certificate issuer, HAProxy, Harbor, Kubernetes, application servers, and administration clients.
This page covers:
- Proxmox VM provisioning through Terraform.
- Router-side MAC-to-IP reservation.
- Common Ubuntu configuration through Ansible.
- BIND9 installation and configuration.
- Internal authoritative zones.
- Recursive forwarding to approved public resolvers.
- DNSSEC validation.
- Fail-closed UFW rules for UDP and TCP port 53.
- GitHub Actions deployment behavior.
- New-server creation and existing-server update procedures.
- Zone serial management.
- Validation, logging, backup, restore, patching, and disaster recovery.
2. Implementation Status and Start Here
prod-dns-01 is already provisioned and operating at 192.168.8.4.
The attached Terraform repository contains the production DNS VM definition, BIND9 Ansible role, BIND configuration, three managed forward zones, UFW rules, and workflow validation.
Use the correct starting point:
| Operation | First execution section | Purpose |
|---|---|---|
| Build or rebuild a new DNS VM | Section 12 | Begins with identity, reservation, collision, Terraform, and bootstrap checks. |
| Update the existing DNS VM | Section 25 | Validates Git changes and deliberately targets prod-dns-01 without recreating the VM. |
Approved current design:
Operating system: Ubuntu Server 26.04 LTS
Template: tmplt-ub-26-min-base
DNS software: Ubuntu-supported BIND9 package
Deployment: Terraform for VM; Ansible for OS, BIND, zones, and UFW
Addressing: DHCP inside Ubuntu with router-side MAC reservation
DNS IP: 192.168.8.4
Management CIDR: 192.168.8.0/24
Current DNS client ACL: 192.168.8.0/22
Forwarders:
- 1.1.1.1
- 8.8.8.8
Managed zones:
- aspireclan.com
- tidyshelves.com
- shelvera.com
Inbound firewall:
- SSH/TCP 22 from the management CIDR
- DNS/UDP 53 from approved DNS client CIDRs
- DNS/TCP 53 from approved DNS client CIDRs
- deny all other inbound and routed traffic
The installed BIND package version must be recorded from the host with named -v; this page does not hard-code a package version that may change with Ubuntu security updates.
3. Scope and Non-Goals
This page configures one internal BIND server that is:
- Authoritative for the internal versions of the managed zones.
- Recursive for approved internal clients.
- A forwarding resolver for public names not owned by the local zones.
This page does not:
- Publish an authoritative DNS server directly to the internet.
- Replace the public DNS provider for Aspireclan domains.
- Configure public registrar name-server delegation to prod-dns-01.
- Expose recursion to untrusted networks.
- Configure DHCP service on prod-dns-01.
- Configure encrypted DNS protocols such as DNS-over-TLS or DNS-over-HTTPS.
- Create a second DNS server in phase one.
- Configure dynamic DNS updates.
- Store secrets on the DNS VM.
prod-dns-01 must remain internal-only. Public authoritative DNS remains with the approved external DNS provider.
4. Final Architecture
The current request path is:
Infrastructure VM or approved internal client
↓ DNS query to 192.168.8.4 over UDP/TCP 53
prod-dns-01
├── authoritative answer for managed internal zones
├── cached answer when available
└── forwards other recursive queries to approved public resolvers
├── 1.1.1.1
└── 8.8.8.8
The security boundary is:
Internet
✕ no inbound access to port 53
✕ no public recursive resolver
✕ no public SSH access
Approved management network
→ TCP 22
Approved DNS client networks
→ UDP 53
→ TCP 53
prod-dns-01 outbound
→ UDP/TCP 53 to approved forwarders
→ HTTPS for Ubuntu package updates
→ normal time synchronization and required infrastructure services
TCP port 53 is required in addition to UDP. DNS can use TCP for large responses, truncation fallback, DNSSEC responses, and other standards-compliant operations.
5. Split-DNS Behavior and the Internal Resolver Rule
All Aspireclan infrastructure VMs must use:
192.168.8.4
as their internal DNS resolver.
Do not configure public resolvers such as 1.1.1.1 or 8.8.8.8 directly beside 192.168.8.4 on infrastructure clients. Multiple client resolvers are not a reliable primary/fallback sequence. A client can query any configured resolver, causing internal-only records such as harbor.aspireclan.com or vault.aspireclan.com to return public data or NXDOMAIN intermittently.
Approved rule:
Infrastructure clients:
DNS server: 192.168.8.4 only
prod-dns-01 BIND configuration:
Public forwarding: 1.1.1.1 and 8.8.8.8
Because BIND is authoritative for each configured zone, it does not forward missing names inside that zone. For example, after loading an internal aspireclan.com zone, a query for an absent name.aspireclan.com returns an authoritative negative answer rather than being forwarded to public DNS.
Therefore, every public record that internal clients still need for a managed split-DNS zone must also exist in the internal zone file, normally with the same public value.
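As a hedged illustration of the rule above, the following sketch compares the answer returned by prod-dns-01 with the answer from a public resolver for one name. The function names answers_match and check_split_record are hypothetical and not part of the repository, and using 1.1.1.1 as the public reference is an assumption. A name with an intentional internal override is expected to differ.

```shell
# Hypothetical helper: succeed when two newline-separated answer sets are
# identical after sorting, so record order does not matter.
answers_match() {
    internal_sorted="$(printf '%s\n' "$1" | sort)"
    public_sorted="$(printf '%s\n' "$2" | sort)"
    [ "$internal_sorted" = "$public_sorted" ]
}

# Hypothetical wrapper: query the internal server and a public resolver
# (assumed here to be 1.1.1.1) and report whether the answers agree.
check_split_record() {
    name=$1
    type=${2:-A}
    internal="$(dig @192.168.8.4 "$name" "$type" +short)"
    public="$(dig @1.1.1.1 "$name" "$type" +short)"
    if answers_match "$internal" "$public"; then
        echo "MATCH $name $type"
    else
        echo "DIFFERS $name $type"
        return 1
    fi
}
```

Running check_split_record for each public name that internal clients still need gives a quick list of records that are missing from, or differ in, the internal zone. A DIFFERS result is a prompt for review, not automatically an error.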
6. Approved Build Order and Dependency Role
The production infrastructure dependency order is:
1. prod-dns-01 (complete; required first)
2. prod-vault-01
3. prod-cert-issuer-01
4. prod-int-proxy-01
DNS must be working before later hosts are configured because those hosts rely on internal service names and public package/repository resolution.
A DNS outage does not necessarily stop already-established connections, but it prevents new name resolution and can block package installation, certificate issuance, Vault access, container pulls, and application routing.
7. VM Profile and Approved Identity
The attached production configuration defines:
| VM name | VM ID | MAC address | Reserved IP | vCPU | RAM | Disk | Template |
|---|---|---|---|---|---|---|---|
| prod-dns-01 | 3156004 | aa:bb:cc:04:03:01 | 192.168.8.4 | 2 | 4096 MiB | 40G | tmplt-ub-26-min-base |
Proxmox defaults from envs/prod/terraform.tfvars:
environment = "prod"
pm_api_url = "https://<PROXMOX_HOST>:8006/api2/json"
target_node = "pve"
template_name = "tmplt-ub-26-min-base"
storage = "local-lvm"
bridge = "vmbr0"
Do not configure a static IP inside Ubuntu. The VM uses DHCP and receives 192.168.8.4 from the router reservation for aa:bb:cc:04:03:01.
8. Repository File Structure
The attached implementation uses:
.github/workflows/
└── terraform-proxmox-deploy.yml
envs/prod/
├── terraform.tfvars
├── dns.tfvars
├── main.tf
├── variables.tf
├── outputs.tf
└── ansible/
├── inventory.ini
├── requirements.yml
├── configure-vms.yml
├── bootstrap-vms.yml
├── configure-dns.yml
├── finalize-firewall.yml
├── group_vars/
│ ├── all.yml
│ └── dns.yml
├── templates/
│ └── named.conf.options.j2
└── files/
└── bind/
├── named.conf.local
└── zones/
├── db.aspireclan.com
├── db.tidyshelves.com
└── db.shelvera.com
The attached repository also contains:
envs/prod/ansible/files/bind/named.conf.options
That static file is not deployed by configure-dns.yml; the active source is templates/named.conf.options.j2. Remove the unused static file after confirming no external process consumes it. Keeping two independent files with the same purpose creates configuration drift.
9. Terraform Responsibilities
Terraform manages only the VM resource. It does not install BIND, create zone records, modify the router, or configure Ubuntu networking.
The production variable definition is:
variable "dns_vms" {
description = "DNS VMs for this environment."
type = map(object({
vmid = number
macaddr = string
reserved_ip = string
cores = number
memory = number
disk_size = string
}))
default = {}
}
The module declaration is:
module "dns_vms" {
source = "../../modules/proxmox-vm"
for_each = var.dns_vms
name = each.key
vmid = each.value.vmid
target_node = var.target_node
template_name = var.template_name
storage = var.storage
bridge = var.bridge
macaddr = each.value.macaddr
reserved_ip = each.value.reserved_ip
cores = each.value.cores
memory = each.value.memory
disk_size = each.value.disk_size
}
The current dns.tfvars is:
dns_vms = {
prod-dns-01 = {
vmid = 3156004
macaddr = "aa:bb:cc:04:03:01"
reserved_ip = "192.168.8.4"
cores = 2
memory = 4096
disk_size = "40G"
}
}
The reserved_ip value is documentation and output metadata for the router reservation. The Proxmox provider module does not configure that address inside Ubuntu.
10. DNS Variables and Current Network Scope
The attached group_vars/dns.yml defines:
dns_management_cidrs:
- "192.168.8.0/24"
dns_client_cidrs:
- "192.168.8.0/22"
dns_obsolete_ufw_rules: []
dns_forwarders:
- "1.1.1.1"
- "8.8.8.8"
dns_zones:
- aspireclan.com
- tidyshelves.com
- shelvera.com
The current /22 DNS client ACL covers:
192.168.8.0 through 192.168.11.255
This is broader than the current /24 management network. Preserve it only when clients actually exist in the additional subnets. Otherwise, narrow both the BIND ACL and UFW rules to 192.168.8.0/24 in the same reviewed change.
Never narrow the BIND ACL without narrowing UFW, or narrow UFW without narrowing BIND. Both controls must remain consistent.
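The address range quoted above can be reproduced with shell arithmetic, which may help when reviewing a proposed ACL or UFW change. This is a minimal sketch: the function names ip_to_int, int_to_ip, and cidr_range are hypothetical and not part of the repository, and the code assumes a shell with 64-bit integer arithmetic such as bash or dash on Ubuntu.

```shell
# Convert a dotted-quad IPv4 address to a single integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1            # split the address into four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Convert an integer back to dotted-quad notation.
int_to_ip() {
    echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

# Print the first and last address covered by a CIDR such as 192.168.8.0/22.
cidr_range() {
    addr=${1%/*}
    prefix=${1#*/}
    ip=$(ip_to_int "$addr")
    mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
    first=$(( ip & mask ))
    last=$(( first | (~mask & 0xFFFFFFFF) ))
    echo "$(int_to_ip "$first") $(int_to_ip "$last")"
}
```

Calling cidr_range 192.168.8.0/22 prints the first and last address of the block, so a reviewer can compare the computed range for dns_client_cidrs against the subnets that actually contain DNS clients.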
11. Approved Hardened BIND Configuration
Use envs/prod/ansible/templates/named.conf.options.j2 as the single source of truth:
acl "internal" {
127.0.0.1;
{% for cidr in dns_client_cidrs %}
{{ cidr }};
{% endfor %}
};
options {
directory "/var/cache/bind";
recursion yes;
allow-query { internal; };
allow-query-cache { internal; };
allow-recursion { internal; };
forward only;
forwarders {
{% for forwarder in dns_forwarders %}
{{ forwarder }};
{% endfor %}
};
dnssec-validation auto;
auth-nxdomain no;
listen-on {
127.0.0.1;
192.168.8.4;
};
listen-on-v6 { none; };
allow-transfer { none; };
minimal-responses yes;
version none;
};
Important behavior:
- allow-query, allow-query-cache, and allow-recursion restrict service to approved clients.
- forward only prevents fallback to direct root-server recursion when both configured forwarders fail.
- Explicit listen-on values reduce exposure even before UFW is evaluated.
- allow-transfer { none; }; prevents unauthorized full-zone transfers.
- dnssec-validation auto enables validating-resolver behavior.
- version none avoids returning the exact BIND version through the normal version query.
When a secondary DNS server is added later, replace the global transfer denial with an ACL that permits AXFR/IXFR only to the approved secondary address and use TSIG authentication.
12. New-Build Start: Router Reservation and Collision Checks
This is the first execution section for creating or rebuilding prod-dns-01.
Before Terraform apply, verify the approved identity:
VM name: prod-dns-01
VM ID: 3156004
MAC: aa:bb:cc:04:03:01
Router reservation: aa:bb:cc:04:03:01 → 192.168.8.4
Check Proxmox, the router reservation table, current DHCP leases, ARP/neighbor tables, and existing documentation. From a trusted Linux machine:
ping -c 2 -W 1 192.168.8.4 || true
ip neigh show | grep -F '192.168.8.4' || true
A failed ping does not prove the address is unused. A host can ignore ICMP or be powered off while retaining a reservation.
For a disaster-recovery rebuild that reuses the same identity, power off or isolate the failed/original VM before starting the replacement. Two hosts must never use 192.168.8.4 simultaneously.
13. Terraform Validation, Plan, and Apply
From the production environment:
cd envs/prod
terraform init
terraform fmt -check -recursive
terraform validate
terraform plan \
-var-file=terraform.tfvars \
-var-file=web.tfvars \
-var-file=app.tfvars \
-var-file=db.tfvars \
-var-file=k8s.tfvars \
-var-file=runner.tfvars \
-var-file=dns.tfvars \
-out=tfplan
For a new build, the reviewed plan must show:
Create: prod-dns-01 only
VM ID: 3156004
MAC: aa:bb:cc:04:03:01
Disk: scsi0 on local-lvm, 40G
Bridge: vmbr0
Clone source: tmplt-ub-26-min-base
No replacement of unrelated web, app, database, Kubernetes, or runner VMs
Apply only the reviewed saved plan:
terraform apply tfplan
After creation, verify the VM in Proxmox and confirm that the router assigned 192.168.8.4 to the approved MAC.
14. Ansible Inventory and SSH Host-Key Trust
The attached inventory currently disables strict host-key checking. That behavior is acceptable only during a controlled first bootstrap and must not remain the permanent trust model.
Initial bootstrap entry from the attached source:
[dns]
prod-dns-01 ansible_host=192.168.8.4 ansible_user=acllc ansible_ssh_private_key_file=~/.ssh/id_ed25519_ansible ansible_python_interpreter=/usr/bin/python3 ansible_ssh_common_args='-o IdentitiesOnly=yes -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null'
After verifying the SSH host-key fingerprints through the Proxmox console, enroll the key on the deployment runner:
install -d -m 0700 ~/.ssh
ssh-keygen -R 192.168.8.4
ssh-keyscan -H 192.168.8.4 >> ~/.ssh/known_hosts
chmod 0600 ~/.ssh/known_hosts
ssh-keygen -F 192.168.8.4
Then use the hardened inventory entry:
[dns]
prod-dns-01 ansible_host=192.168.8.4 ansible_user=acllc ansible_ssh_private_key_file=~/.ssh/id_ed25519_ansible ansible_python_interpreter=/usr/bin/python3 ansible_ssh_common_args='-o IdentitiesOnly=yes -o StrictHostKeyChecking=yes'
When the VM is legitimately rebuilt and its host key changes, verify the new fingerprint through the Proxmox console before replacing the known-host entry.
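The console comparison described above can be partly scripted. The sketch below assumes the fingerprint was first read on the Proxmox console, for example with ssh-keygen -lf /etc/ssh/ssh_host_ed25519_key.pub, and that the key type in use is ed25519. The function name fingerprint_matches is hypothetical.

```shell
# Hypothetical helper: read "ssh-keygen -l" style lines on stdin
# (format: <bits> <fingerprint> <comment> (<type>)) and succeed only when
# one of them carries the expected, console-verified fingerprint.
fingerprint_matches() {
    expected=$1
    while read -r bits fingerprint rest; do
        [ "$fingerprint" = "$expected" ] && return 0
    done
    return 1
}

# Example pipeline (not executed here; placeholder must be replaced):
#   ssh-keyscan -t ed25519 192.168.8.4 2>/dev/null \
#     | ssh-keygen -lf - \
#     | fingerprint_matches 'SHA256:<CONSOLE_VERIFIED_FINGERPRINT>' \
#     && echo "host key matches console value"
```

Appending the scanned key to known_hosts only after this check succeeds keeps the enrollment step from trusting whatever host happens to answer on 192.168.8.4.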
15. Common Ubuntu Baseline
The shared bootstrap playbook must establish:
Permanent hostname: prod-dns-01
/etc/hosts: 127.0.1.1 prod-dns-01
User: acllc
Passwordless sudo: verified
OpenSSH: active
QEMU Guest Agent: active
UFW: installed
UFW logging: low
Default incoming policy: deny
Default routed policy: deny
Default outgoing policy: allow
SSH: allowed only from approved management CIDRs
Time synchronization: active
Addressing: DHCP with router reservation
Run the bootstrap only against the DNS host:
ansible-playbook \
-i envs/prod/ansible/inventory.ini \
envs/prod/ansible/bootstrap-vms.yml \
--limit prod-dns-01
Validate identity and time before installing BIND:
hostnamectl --static
ip -brief address
ip route
timedatectl status
systemctl is-active qemu-guest-agent
sudo -n true
16. Install BIND9 and DNS Utilities
The Ansible role installs the Ubuntu-supported packages:
- bind9
- bind9utils
- dnsutils
Equivalent manual verification:
sudo apt update
sudo apt install -y bind9 bind9utils dnsutils
named -v
dig -v
apt-cache policy bind9
Do not add an unrelated third-party BIND package repository. Use Ubuntu security updates unless an explicitly reviewed requirement demands another source.
The service is managed as:
sudo systemctl enable --now bind9
systemctl is-enabled bind9
systemctl is-active bind9
17. Zone Declarations
Use envs/prod/ansible/files/bind/named.conf.local:
zone "aspireclan.com" {
type primary;
file "/etc/bind/zones/db.aspireclan.com";
allow-update { none; };
allow-transfer { none; };
notify no;
};
zone "tidyshelves.com" {
type primary;
file "/etc/bind/zones/db.tidyshelves.com";
allow-update { none; };
allow-transfer { none; };
notify no;
};
zone "shelvera.com" {
type primary;
file "/etc/bind/zones/db.shelvera.com";
allow-update { none; };
allow-transfer { none; };
notify no;
};
type master remains accepted by BIND, but type primary is used here as the current terminology. Do not enable unauthenticated dynamic updates.
When a secondary DNS server is introduced, update allow-transfer and notify deliberately rather than opening transfers broadly.
18. Zone File Rules and Corrected Baseline
Every zone must contain:
- One SOA record.
- One internal NS record.
- An A record for the NS host.
- A monotonically increasing SOA serial.
- Every internal override required by clients.
- Every public record that internal clients still need from that managed zone.
Use a serial in YYYYMMDDNN form, for example:
2026061301
The attached source contains one confirmed inconsistency:
Current attached value:
ns1.shelvera.com → 192.168.8.60
Approved production DNS value:
ns1.shelvera.com → 192.168.8.4
Correct the shelvera.com zone before the next deployment.
Recommended SOA and NS pattern for each managed zone:
$TTL 300
@ IN SOA ns1.<ZONE>. admin.<ZONE>. (
<YYYYMMDDNN> ; Serial
3600 ; Refresh
900 ; Retry
1209600 ; Expire
300 ) ; Negative Cache TTL
@ IN NS ns1.<ZONE>.
ns1 IN A 192.168.8.4
A 300-second TTL is suitable while infrastructure records are actively changing. Increase it later only when the operational trade-off is understood. Long TTLs make emergency record changes and rollback slower.
19. Aspireclan Internal Zone Example
A corrected structure for db.aspireclan.com is:
$TTL 300
@ IN SOA ns1.aspireclan.com. admin.aspireclan.com. (
<YYYYMMDDNN> ; Serial
3600 ; Refresh
900 ; Retry
1209600 ; Expire
300 ) ; Negative Cache TTL
@ IN NS ns1.aspireclan.com.
ns1 IN A 192.168.8.4
; Preserve the public apex value required by internal clients.
@ IN A <PUBLIC_ASPIRECLAN_APEX_IP>
; Infrastructure records.
prod-dns-01 IN A 192.168.8.4
vault IN A <PROD_VAULT_IP>
harbor IN A <INTERNAL_HARBOR_IP>
; Kubernetes control endpoint and nodes.
ac-cicd-api IN A 192.168.8.61
ac-cicd-lb-01 IN A 192.168.8.61
ac-cicd-cp-01 IN A 192.168.8.62
ac-cicd-cp-02 IN A 192.168.8.63
ac-cicd-cp-03 IN A 192.168.8.64
ac-cicd-prod-wk-01 IN A 192.168.8.71
ac-cicd-prod-wk-02 IN A 192.168.8.72
ac-cicd-qa-wk-01 IN A 192.168.8.81
ac-cicd-dev-wk-01 IN A 192.168.8.91
; Add all other required internal overrides and public records deliberately.
The vault and harbor lines above use placeholders. Do not add vault.aspireclan.com, or any other record, until its approved address is known. Do not invent infrastructure addresses in a zone file.
20. TidyShelves and Shelvera Zone Review
The attached tidyshelves.com and shelvera.com zones contain a mixture of internal A records and public CNAME/A records.
Before each deployment:
1. Compare required public records with the public DNS provider.
2. Preserve all public names that internal clients must resolve.
3. Keep internal overrides only where split DNS is intentional.
4. Confirm that a name does not have both CNAME and other record types.
5. Confirm duplicate A records are intentional round-robin values.
6. Correct every ns1 record to 192.168.8.4.
7. Increment the SOA serial exactly once for the reviewed change set.
The duplicate local.api A records in the attached shelvera.com zone resolve to two addresses. Retain both only when active round-robin behavior is intended and both backends are healthy.
The duplicate local-ts-pgsql-db-01 A records in the attached tidyshelves.com zone must likewise be treated as an intentional multi-address result or corrected to one approved address.
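Review item 4 above (no name may hold both a CNAME and another record type) can be approximated with a small awk scan. This is a simplified sketch, not a zone parser: it assumes the layout used on this page, where every record outside the SOA block starts with an explicit owner name, and it skips indented continuation lines. The function name find_cname_conflicts is hypothetical. named-checkzone remains the authoritative check.

```shell
# Hypothetical helper: print owner names that have a CNAME and at least one
# other record type in the given zone file. Exit status is non-zero when any
# conflict is found.
find_cname_conflicts() {
    awk '
        /^$/ || /^[ \t]/ || /^[;$]/ { next }   # blank, indented, comment, directive
        {
            owner = $1
            type = ""
            # Skip optional TTL and class fields to find the record type.
            for (i = 2; i <= NF; i++) {
                if ($i == "IN" || $i ~ /^[0-9]+$/) continue
                type = toupper($i)
                break
            }
            if (type == "") next
            if (type == "CNAME") cname[owner] = 1
            else other[owner] = 1
        }
        END {
            bad = 0
            for (o in cname) if (o in other) { print o; bad = 1 }
            exit bad
        }
    ' "$1"
}
```

For example, find_cname_conflicts envs/prod/ansible/files/bind/zones/db.shelvera.com prints nothing when the file is clean under the stated assumptions.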
21. Ansible DNS Deployment
The DNS playbook must target only:
hosts: dns
It performs:
1. Validate DNS variables.
2. Remove explicitly declared obsolete UFW rules.
3. Preserve SSH access from approved management CIDRs.
4. Allow UDP 53 from approved DNS client CIDRs.
5. Allow TCP 53 from approved DNS client CIDRs.
6. Install BIND9 packages.
7. Create /etc/bind/zones.
8. Render named.conf.options from Jinja variables.
9. Copy named.conf.local.
10. Copy all declared zone files.
11. Run named-checkconf.
12. Run named-checkzone for every zone.
13. Restart BIND only after configuration deployment.
14. Verify service and port listeners.
15. Test every local zone over UDP and TCP.
Run directly when needed:
ansible-galaxy collection install \
-r envs/prod/ansible/requirements.yml
ansible-playbook \
-i envs/prod/ansible/inventory.ini \
envs/prod/ansible/configure-dns.yml \
--limit prod-dns-01 \
--syntax-check
ansible-playbook \
-i envs/prod/ansible/inventory.ini \
envs/prod/ansible/configure-dns.yml \
--limit prod-dns-01
The attached collection pin is:
community.general: 13.0.1
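Step 12 in the list above (named-checkzone for every zone) can be sketched as a loop that reports every failing zone instead of stopping at the first one. The function name check_all_zones and the ZONE_CHECKER variable are hypothetical; the db.<zone> file naming follows the repository layout shown on this page.

```shell
# Command used to validate one zone; override for testing.
ZONE_CHECKER=${ZONE_CHECKER:-named-checkzone}

# Hypothetical helper: validate db.<zone> in the given directory for each
# zone name and return non-zero when any zone fails.
check_all_zones() {
    zone_dir=$1
    shift
    failed=0
    for zone in "$@"; do
        if ! "$ZONE_CHECKER" "$zone" "$zone_dir/db.$zone" >/dev/null; then
            echo "FAILED: $zone" >&2
            failed=1
        fi
    done
    return "$failed"
}
```

On the server this might be invoked as check_all_zones /etc/bind/zones aspireclan.com tidyshelves.com shelvera.com, after named-checkconf has succeeded.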
22. UFW Policy
The final firewall baseline is:
Default incoming: deny
Default routed: deny
Default outgoing: allow
Logging: low
Allow TCP 22 from:
192.168.8.0/24
Allow UDP 53 from:
192.168.8.0/22 currently
Allow TCP 53 from:
192.168.8.0/22 currently
The playbook must never run ufw reset on a remotely managed server.
Validate locally:
sudo ufw status numbered
sudo ufw status verbose
sudo ss -lntup | grep -E ':(22|53)\b'
Validate remotely from an approved DNS client:
nc -vz 192.168.8.4 53
dig @192.168.8.4 aspireclan.com SOA +time=3 +tries=1
dig @192.168.8.4 aspireclan.com SOA +tcp +time=3 +tries=1
Do not create 53/tcp or 53/udp rules for Anywhere.
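For review purposes, the allow rules in the baseline above correspond to ufw commands of the following shape. The sketch only prints the commands and does not run them, because the Ansible playbook remains the mechanism that applies firewall changes. The function name print_dns_ufw_rules is hypothetical.

```shell
# Hypothetical helper: print (not execute) the ufw allow rules for one
# management CIDR and one or more DNS client CIDRs.
print_dns_ufw_rules() {
    mgmt_cidr=$1
    shift
    echo "ufw allow from $mgmt_cidr to any port 22 proto tcp"
    for cidr in "$@"; do
        echo "ufw allow from $cidr to any port 53 proto udp"
        echo "ufw allow from $cidr to any port 53 proto tcp"
    done
}
```

Comparing the printed rules for the values in group_vars/dns.yml with sudo ufw status numbered is one way to spot a rule that was added by hand or left over from an earlier CIDR.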
23. New-Build Workflow Execution
The production workflow runs on the prod branch and includes dns.tfvars when present.
For an initial VM creation, Terraform detects prod-dns-01 as created, derives the Ansible target from the plan, applies Terraform, and runs the shared configuration chain:
bootstrap-vms.yml
→ configure-web.yml
→ configure-dns.yml
→ finalize-firewall.yml
The --limit prod-dns-01 boundary means only matching plays affect the DNS host.
Before the first production run, confirm:
Git branch: prod
Runner labels: self-hosted, prod, terraform, deploy
PM_API_TOKEN_ID: configured as a GitHub secret
PM_API_TOKEN_SECRET: configured as a GitHub secret
Router reservation: present
Inventory host: prod-dns-01 → 192.168.8.4
SSH key: available to the deployment user
Ansible collection requirements: resolvable
Never print Proxmox token values in workflow logs.
24. First Deployment Validation
Run from the deployment runner or another approved client:
# Authoritative zones over UDP.
dig @192.168.8.4 aspireclan.com SOA +noall +answer
dig @192.168.8.4 tidyshelves.com SOA +noall +answer
dig @192.168.8.4 shelvera.com SOA +noall +answer
# Authoritative zones over TCP.
dig @192.168.8.4 aspireclan.com SOA +tcp +noall +answer
dig @192.168.8.4 tidyshelves.com SOA +tcp +noall +answer
dig @192.168.8.4 shelvera.com SOA +tcp +noall +answer
# Internal record.
dig @192.168.8.4 prod-dns-01.aspireclan.com A +noall +answer
# Public forwarding.
dig @192.168.8.4 github.com A +noall +answer
# DNSSEC validation behavior.
dig @192.168.8.4 cloudflare.com A +dnssec +comments
# Service and firewall on the server.
ssh acllc@192.168.8.4 \
'systemctl is-active bind9 && sudo named-checkconf && sudo ufw status verbose'
Expected state:
BIND service: active
UDP 53: listening on 127.0.0.1 and 192.168.8.4
TCP 53: listening on 127.0.0.1 and 192.168.8.4
Managed zones: authoritative answers returned
Public names: forwarded and cached
UFW: active with fail-closed defaults
External inbound DNS: not exposed
25. Existing-Server Update Start
This is the first execution section for updating the already-created prod-dns-01.
Do not recreate the VM for normal BIND, zone, UFW, or Ansible changes.
Before deployment:
cd <REPOSITORY_ROOT>
git status --short
git diff -- envs/prod/ansible .github/workflows/terraform-proxmox-deploy.yml
cd envs/prod
terraform fmt -check -recursive
terraform validate
ansible-playbook \
-i ansible/inventory.ini \
ansible/configure-vms.yml \
--limit prod-dns-01 \
--syntax-check
Validate BIND files locally on an Ubuntu/BIND validation host when available:
named-checkconf <RENDERED_NAMED_CONF_ROOT>
named-checkzone aspireclan.com <RENDERED_DB_ASPIRECLAN_FILE>
named-checkzone tidyshelves.com <RENDERED_DB_TIDYSHELVES_FILE>
named-checkzone shelvera.com <RENDERED_DB_SHELVERA_FILE>
Then merge the reviewed change to prod and deliberately run the deployment with:
workflow_dispatch input:
ansible_limit: prod-dns-01
The explicit limit is currently required for DNS-only changes because a Terraform plan with no VM create/update action does not automatically select an Ansible target.
26. Safe Zone-Record Update Procedure
For each zone change:
1. Edit only the required zone file in Git.
2. Confirm the record owner name is relative or fully qualified as intended.
3. Confirm A, AAAA, CNAME, MX, TXT, and SRV syntax.
4. Confirm no CNAME shares an owner with another record type.
5. Increment the SOA serial.
6. Run named-checkzone.
7. Review the exact Git diff.
8. Deploy only to prod-dns-01.
9. Query the record directly from 192.168.8.4.
10. Query an unaffected record and a public forwarded name.
11. Review BIND logs.
Example direct reload after a validated emergency change on the server:
sudo named-checkzone aspireclan.com /etc/bind/zones/db.aspireclan.com
sudo rndc reload aspireclan.com
sudo rndc zonestatus aspireclan.com
dig @127.0.0.1 <CHANGED_NAME> <RECORD_TYPE> +noall +answer
The emergency change must be committed back to Git immediately. Git remains the source of truth.
27. Safe BIND Configuration Update Procedure
For changes to ACLs, forwarders, listening addresses, or zone declarations:
1. Update group_vars/dns.yml or the appropriate template/file.
2. Keep BIND ACLs and UFW client CIDRs synchronized.
3. Render or deploy to a validation host when possible.
4. Run named-checkconf.
5. Validate every zone.
6. Review the Git diff.
7. Run Ansible with --limit prod-dns-01.
8. Confirm the handler restart succeeds.
9. Test local zones over UDP and TCP.
10. Test a public forwarded name.
11. Confirm UFW rules and listeners.
Use rndc reconfig or a controlled service reload only after named-checkconf succeeds. The Ansible role currently uses a restart handler; that is acceptable for this single-node homelab but creates a brief DNS interruption.
A future improvement may replace the restart with reload/reconfig behavior when the exact change permits it.
28. GitHub Actions Targeting Consistency
The attached workflow correctly includes dns.tfvars, validates DNS files, installs Ansible requirements, and tests each DNS zone externally over UDP and TCP.
Current limitation:
DNS file changed
↓
Terraform plan contains no VM create/update
↓
No plan-derived Ansible target
↓
Ansible is skipped unless workflow_dispatch ansible_limit is supplied
Until the workflow is enhanced, every DNS-only deployment must use:
ansible_limit: prod-dns-01
Recommended workflow enhancement:
When changed files match any of these paths:
envs/prod/ansible/configure-dns.yml
envs/prod/ansible/group_vars/dns.yml
envs/prod/ansible/templates/named.conf.options.j2
envs/prod/ansible/files/bind/**
And no manual ansible_limit was supplied:
set ansible_limit=dns
set target_source=DNS configuration change
Keep manual input highest priority, plan-derived changed VM names second, and component-derived targets third. Never default to all for a DNS-only change.
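The priority order above can be sketched as a small shell function for a workflow step. This is an illustration under assumptions, not the existing workflow logic: the function name select_ansible_limit is hypothetical, and the changed-file list is assumed to arrive as newline-separated paths (for example from git diff --name-only).

```shell
# Hypothetical helper. Arguments:
#   $1 manual ansible_limit input (may be empty)
#   $2 plan-derived VM names (may be empty)
#   $3 newline-separated list of changed file paths
# Prints the selected limit; returns non-zero when nothing is selected,
# so the caller never falls back to "all".
select_ansible_limit() {
    manual=$1
    plan_targets=$2
    changed_files=$3
    if [ -n "$manual" ]; then
        echo "$manual"
        return 0
    fi
    if [ -n "$plan_targets" ]; then
        echo "$plan_targets"
        return 0
    fi
    while IFS= read -r path; do
        case "$path" in
            envs/prod/ansible/configure-dns.yml|\
            envs/prod/ansible/group_vars/dns.yml|\
            envs/prod/ansible/templates/named.conf.options.j2|\
            envs/prod/ansible/files/bind/*)
                echo "dns"
                return 0
                ;;
        esac
    done <<EOF
$changed_files
EOF
    return 1
}
```

A non-zero return means no target was derived, which the workflow should treat as "skip Ansible" and not as permission to run against every host.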
29. SOA Serial Management
Use one monotonically increasing 32-bit unsigned serial per zone.
Approved format:
YYYYMMDDNN
YYYY = four-digit year
MM = two-digit month
DD = two-digit day
NN = revision number for that day, starting at 01
Example:
First change on June 13, 2026: 2026061301
Second change on June 13, 2026: 2026061302
Do not reset a serial to a lower value. A future secondary server relies on serial ordering to detect zone updates.
Check the active serial:
dig @192.168.8.4 aspireclan.com SOA +short
dig @192.168.8.4 tidyshelves.com SOA +short
dig @192.168.8.4 shelvera.com SOA +short
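The YYYYMMDDNN rule above can be sketched as a helper that proposes the next serial. The function name next_serial is hypothetical. It assumes the current serial already follows the approved format and refuses to go past revision 99 on one day, since that would break the format.

```shell
# Hypothetical helper: print the next serial for a zone.
#   $1 current serial in YYYYMMDDNN form
#   $2 today's date as YYYYMMDD (defaults to the current UTC date)
next_serial() {
    current=$1
    today=${2:-$(date -u +%Y%m%d)}
    base="${today}01"
    if [ "$current" -lt "$base" ]; then
        # First change today: start at revision 01.
        echo "$base"
        return 0
    fi
    # Same day (or a serial already ahead of today): increment the revision.
    rev=${current#????????}
    if [ "$rev" = "99" ]; then
        echo "revision 99 reached for serial $current" >&2
        return 1
    fi
    echo $(( current + 1 ))
}
```

The result never falls below the current serial, which preserves the ordering that a future secondary server would rely on.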
30. Add a New Managed Zone
To add a zone such as <NEW_ZONE>:
1. Confirm that internal split DNS is actually required.
2. Inventory the public records internal clients still need.
3. Create ansible/files/bind/zones/db.<NEW_ZONE>.
4. Add a primary zone block to named.conf.local.
5. Add <NEW_ZONE> to dns_zones.
6. Run named-checkzone.
7. Run named-checkconf.
8. Deploy with --limit prod-dns-01.
9. Test SOA, NS, internal records, and preserved public records.
10. Test an intentionally missing name and confirm the expected NXDOMAIN behavior.
Do not add an entire public domain as an internal authoritative zone merely to override one name without understanding the requirement to maintain all needed records in that zone.
When only a small override is required, evaluate whether a dedicated internal subdomain is safer and easier to maintain.
31. Remove a Managed Zone
Removing a zone changes resolution behavior from internal authoritative answers to public forwarding.
Procedure:
1. Confirm no internal-only records still depend on the zone.
2. Export and back up the current zone file.
3. Remove the zone block from named.conf.local.
4. Remove the zone from dns_zones.
5. Remove the managed file only after the first successful deployment.
6. Run named-checkconf.
7. Deploy with --limit prod-dns-01.
8. Flush BIND cache for the affected name or zone when necessary.
9. Confirm queries now return the intended public DNS answers.
10. Retain the Git history and backup for rollback.
Do not delete a zone file from the server manually while the active configuration still references it.
32. Configure Infrastructure Clients
Infrastructure VMs should receive 192.168.8.4 through DHCP or an Ansible-managed Netplan override.
Example Netplan override for a normal infrastructure client:
network:
version: 2
ethernets:
ens18:
dhcp4: true
dhcp4-overrides:
use-dns: false
nameservers:
addresses:
- 192.168.8.4
Apply and verify:
sudo netplan generate
sudo netplan apply
sudo resolvectl flush-caches
resolvectl dns ens18
getent ahostsv4 harbor.aspireclan.com
getent ahostsv4 github.com
Expected client resolver list:
192.168.8.4 only
The client must not list 1.1.1.1, 8.8.8.8, or the router as an additional resolver when split DNS consistency is required.
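The expected resolver list can be checked mechanically. The sketch below reads one line in the form printed by resolvectl dns <interface>, assumed here to look like "Link 2 (ens18): 192.168.8.4", and succeeds only when the approved address is the sole server listed. The function name only_approved_resolver is hypothetical.

```shell
# Hypothetical helper: read one "resolvectl dns <interface>" line on stdin
# and succeed only when the server list is exactly the approved address.
only_approved_resolver() {
    approved=${1:-192.168.8.4}
    read -r line || return 1
    servers=${line#*: }      # drop the "Link N (iface): " prefix
    [ "$servers" = "$approved" ]
}

# Example (not executed here):
#   resolvectl dns ens18 | only_approved_resolver 192.168.8.4 \
#     || echo "unexpected resolver configured on ens18"
```

A failure here usually points to a DHCP-supplied resolver that the Netplan override did not suppress.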
33. Configure prod-dns-01's Own Resolver
The DNS server is a special case. Once BIND is healthy, its own host resolver should not depend on an external resolver that may be unavailable.
Approved sequence:
1. During first package installation, use the temporary DHCP-provided resolver.
2. Start BIND and validate local authoritative and forwarded queries.
3. Configure the host resolver to send queries to local BIND.
4. Keep public resolvers only in BIND's forwarders block.
Example systemd-resolved drop-in after BIND validation:
sudo install -d -m 0755 /etc/systemd/resolved.conf.d
sudo tee /etc/systemd/resolved.conf.d/prod-dns-01.conf >/dev/null <<'RESOLVED_EOF'
[Resolve]
DNS=127.0.0.1
FallbackDNS=
Domains=~.
RESOLVED_EOF
sudo systemctl restart systemd-resolved
sudo resolvectl flush-caches
resolvectl status
getent ahostsv4 github.com
Do not configure BIND to forward to 192.168.8.4; that would forward back to itself and create a loop.
Before rebooting after this change, verify both BIND and systemd-resolved are enabled.
34. Forwarding and DNSSEC Validation
Test forwarding:
dig @192.168.8.4 github.com A +noall +answer +stats
dig @192.168.8.4 ubuntu.com AAAA +noall +answer +stats
Test DNSSEC validation with a known signed domain:
dig @192.168.8.4 cloudflare.com A +dnssec +comments
A validated response commonly includes the ad flag when the requesting client permits it and validation succeeds.
Test a known DNSSEC failure domain supplied by an authoritative DNSSEC test service only when that service is currently documented and available. Do not permanently add insecure validation exceptions merely to make a broken external domain resolve.
Check BIND's validation and forwarding logs:
sudo journalctl -u bind9 --since '15 minutes ago' --no-pager
35. Cache Operations
Display server status:
sudo rndc status
Flush one name after an emergency record correction:
sudo rndc flushname <FQDN>
Flush an entire domain subtree only when necessary:
sudo rndc flushtree <DOMAIN>
Flush the complete cache only as a last resort:
sudo rndc flush
Do not routinely flush the cache after every authoritative zone reload. Authoritative data is reloaded separately, and unnecessary full-cache flushes increase external query volume and latency.
36. Logging and Troubleshooting
Primary checks:
systemctl status bind9 --no-pager
sudo journalctl -u bind9 -n 200 --no-pager
sudo journalctl -u bind9 --since today --no-pager
sudo named-checkconf
sudo rndc status
sudo ss -lntup | grep -E ':53\b'
sudo ufw status verbose
Query diagnostics:
dig @127.0.0.1 aspireclan.com SOA +comments +answer
dig @192.168.8.4 github.com A +comments +answer +stats
dig @192.168.8.4 <FQDN> <TYPE> +tcp +comments +answer
Use +trace carefully: it traces delegation directly from the querying machine and does not test the same forwarding path as an ordinary recursive query. For normal resolver validation, query @192.168.8.4 without +trace.
Common failure categories:
| Symptom | Likely cause | First checks |
|---|---|---|
| Internal name intermittently returns NXDOMAIN | Client has public resolvers configured beside 192.168.8.4 | resolvectl status, Netplan, DHCP options |
| All public names fail | Forwarders unreachable, outbound firewall issue, or BIND stopped | rndc status, journal, direct reachability to forwarders |
| One managed-zone public name fails internally | Record missing from the internal authoritative zone | Zone file, public DNS comparison, SOA serial |
| UDP works but TCP fails | Missing TCP 53 UFW rule or listener problem | ufw status, ss -lntup, dig +tcp |
| Changes do not appear after Git push | Workflow skipped Ansible because Terraform had no VM changes | Run workflow dispatch with ansible_limit=prod-dns-01 |
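When working through the table above, the response code in dig's header is usually the first thing to compare between a failing client and the server. The following is a minimal sketch: dig_rcode is a hypothetical helper, and it assumes dig was run with its header comments visible (the default, or +comments).

```shell
# dig_rcode (hypothetical helper): reads dig output on stdin and prints the
# response code from the header line, e.g. NOERROR, NXDOMAIN, or SERVFAIL.
# Prints nothing when no header line is present (for example with +short).
dig_rcode() {
  sed -n -E 's/^;; ->>HEADER<<- .*status: ([A-Z]+),.*/\1/p' | head -n 1
}
```

For example, `dig @192.168.8.4 <FQDN> <TYPE> +comments | dig_rcode` prints the code returned by prod-dns-01, which can then be compared with the same query sent to whichever resolver the affected client actually uses.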
37. Backup Procedure
Back up configuration after every approved change and before package upgrades.
Required data:
/etc/bind/named.conf
/etc/bind/named.conf.options
/etc/bind/named.conf.local
/etc/bind/named.conf.default-zones
/etc/bind/zones/
/etc/default/named when present
Repository commit containing Terraform and Ansible sources
Router DHCP reservation export or documented identity mapping
Installed package versions
Example local archive before an upgrade:
stamp="$(date -u +%Y%m%dT%H%M%SZ)"
sudo tar \
--create \
--gzip \
--file "/var/backups/prod-dns-01-bind-${stamp}.tar.gz" \
/etc/bind
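This command creates a UTC-timestamped gzip archive of /etc/bind under /var/backups. It does not include /etc/default/named or the installed package versions from the required-data list; capture those separately.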
Copy backups off the VM. Git protects declarative configuration, but it does not replace router reservation data, package inventory, or tested recovery procedures.
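Before an archive is copied off the VM, it may be worth confirming that it is readable and recording a checksum that can be rechecked at the destination. The following is a minimal sketch: verify_archive is a hypothetical helper, and it assumes GNU tar and sha256sum as shipped with Ubuntu.

```shell
# verify_archive (hypothetical helper): confirms that a gzip-compressed tar
# archive can be listed end to end, then writes a SHA-256 checksum file
# next to it as "<archive>.sha256".
verify_archive() {
  archive="$1"
  # Listing forces tar to decompress and walk every member header.
  tar --list --gzip --file "$archive" >/dev/null || return 1
  # Store the digest beside the archive so it travels with the backup.
  sha256sum "$archive" > "${archive}.sha256"
}
```

After the copy, running `sha256sum --check` against the .sha256 file on the destination confirms the transfer, provided the archive sits at the same path recorded in that file.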
38. Disaster Recovery and Restore
Required recovery inputs:
Terraform and Ansible repository
Router reservation for aa:bb:cc:04:03:01 → 192.168.8.4
Proxmox VM identity and template
SSH administrative access
Validated BIND configuration and zone files
Latest known package-version record
Off-host configuration backup
Recovery order:
1. Isolate or power off the failed VM.
2. Confirm 192.168.8.4 is safe to reuse.
3. Recreate prod-dns-01 through Terraform.
4. Verify DHCP assigned 192.168.8.4.
5. Verify and enroll the new SSH host key.
6. Run the common bootstrap with --limit prod-dns-01.
7. Deploy BIND and zones through Ansible.
8. Apply the final fail-closed firewall policy.
9. Validate every managed zone over UDP and TCP.
10. Validate public forwarding and DNSSEC.
11. Reconfigure the host to resolve through local BIND.
12. Validate critical infrastructure names.
13. Monitor logs and client behavior.
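Step 4 of the recovery order can be checked mechanically from the rebuilt VM's console. The following is a minimal sketch: has_address is a hypothetical helper, and it assumes the column layout printed by `ip -brief address` (interface, state, then addresses in CIDR form).

```shell
# has_address (hypothetical helper): reads `ip -brief address` output on
# stdin and succeeds only if some interface holds exactly the given address.
# The prefix length is ignored, so 192.168.8.40/24 does not match 192.168.8.4.
has_address() {
  awk -v want="$1" '
    {
      for (i = 3; i <= NF; i++) {
        split($i, parts, "/")
        if (parts[1] == want) found = 1
      }
    }
    END { exit(found ? 0 : 1) }
  '
}
```

A possible use is `ip -brief address | has_address 192.168.8.4 || echo "reservation not applied"`. A failure here points at the router reservation or the VM's MAC address rather than at BIND.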
A single DNS server is a single point of failure. A future prod-dns-02 should be added before DNS availability becomes a strict production requirement.
39. Reboot Validation
Before reboot:
sudo named-checkconf
sudo systemctl is-enabled bind9
sudo systemctl is-active bind9
sudo ufw status verbose
resolvectl status
Reboot:
sudo reboot
After reboot, validate from an independent approved client:
for zone in aspireclan.com tidyshelves.com shelvera.com; do
dig @192.168.8.4 "${zone}" SOA +short +time=3 +tries=1
dig @192.168.8.4 "${zone}" SOA +tcp +short +time=3 +tries=1
done
dig @192.168.8.4 github.com A +short +time=3 +tries=1
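BIND may need a short time after boot before it answers, so a bounded retry can keep the post-reboot check from failing on the first attempt without looping forever. The following is a minimal sketch: retry is a hypothetical helper, and the attempt count and delay are placeholders to adjust.

```shell
# retry (hypothetical helper): runs a command until it succeeds or the
# attempt limit is reached.
# Usage: retry <attempts> <delay-seconds> <command> [args...]
retry() {
  attempts="$1"
  delay="$2"
  shift 2
  n=1
  while ! "$@"; do
    # Give up once the configured number of attempts has been used.
    [ "$n" -ge "$attempts" ] && return 1
    n=$((n + 1))
    sleep "$delay"
  done
}
```

For example, `retry 10 3 dig @192.168.8.4 aspireclan.com SOA +short +time=3 +tries=1` waits up to roughly a minute for a reply. dig generally exits non-zero only when no reply arrives, so this confirms reachability rather than answer content; the SOA and forwarding checks above still need to be read.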
On the server:
systemctl is-active bind9
sudo rndc status
sudo journalctl -u bind9 -b --no-pager
40. Package Update Procedure
BIND receives security updates from Ubuntu. Do not freeze it indefinitely.
Before updating:
1. Review Ubuntu security notices and package changelog.
2. Validate current BIND configuration and zones.
3. Save an off-host configuration backup.
4. Record the current named -v output.
5. Confirm an approved maintenance window.
Update:
sudo apt update
apt list --upgradable 2>/dev/null | grep -E '^(bind9|bind9-utils|bind9utils|dnsutils)/' || true
sudo apt upgrade
After update:
named -v
sudo named-checkconf
sudo systemctl restart bind9
systemctl is-active bind9
sudo journalctl -u bind9 -n 100 --no-pager
Then run the complete UDP, TCP, internal-zone, forwarding, DNSSEC, and UFW validation checklist.
41. Monitoring and Alerting
At minimum, monitor:
Host reachability: 192.168.8.4
UDP 53 response
TCP 53 response
SOA query for every managed zone
Recursive query for a stable public domain
BIND systemd state
Disk space
Memory pressure
Time synchronization
UFW state
Recent BIND errors
Zone serial drift
Recommended probe interval:
Every 1 to 5 minutes for service health
After every deployment for full functional validation
Daily for backup presence and package/security review metadata
Do not use a probe that exposes secrets or attempts zone transfers. A normal SOA query is sufficient for authoritative-zone health.
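Zone serial drift can be detected from the same SOA query by comparing the served serial with the serial committed in Git. The following is a minimal sketch: soa_serial and serial_matches are hypothetical helpers, and they assume the seven-field line that `dig ... SOA +short` prints (primary name server, contact, serial, then four timers).

```shell
# soa_serial (hypothetical helper): reads `dig <zone> SOA +short` output on
# stdin and prints the serial, which is the third field of the SOA record.
soa_serial() {
  awk 'NF >= 7 && $3 ~ /^[0-9]+$/ { print $3; exit }'
}

# serial_matches (hypothetical helper): succeeds only when the served serial
# equals the expected value passed as the first argument.
serial_matches() {
  [ "$(soa_serial)" = "$1" ]
}
```

A probe could run `dig @192.168.8.4 aspireclan.com SOA +short +time=3 +tries=1 | serial_matches "$expected"`, where the expected value comes from the zone file in the repository. An empty answer also fails the comparison, so the same probe covers an unreachable or unloaded zone.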
42. Future Secondary DNS Design
The next availability improvement is:
prod-dns-01: primary
prod-dns-02: secondary
Future requirements:
- A unique reserved IP, MAC, and VM ID for prod-dns-02.
- A second failure domain where practical.
- TSIG-authenticated zone transfers.
- allow-transfer restricted to the secondary.
- also-notify or normal NOTIFY behavior restricted to the secondary.
- Client DHCP/Netplan configuration listing both internal DNS servers.
- Independent monitoring of both servers.
- Tested primary loss and recovery.
Do not list a public resolver as the second client DNS server. The correct redundancy model is two internal split-DNS servers containing the same zones.
43. Complete Validation Checklist
Run before declaring the DNS service ready:
# Identity and networking.
hostnamectl --static
ip -brief address
ip route
# Time and firewall.
timedatectl status
sudo ufw status verbose
# Package, configuration, and service.
named -v
sudo named-checkconf
sudo named-checkzone aspireclan.com /etc/bind/zones/db.aspireclan.com
sudo named-checkzone tidyshelves.com /etc/bind/zones/db.tidyshelves.com
sudo named-checkzone shelvera.com /etc/bind/zones/db.shelvera.com
systemctl is-enabled bind9
systemctl is-active bind9
sudo rndc status
sudo ss -lntup | grep -E ':53\b'
# Authoritative UDP and TCP.
for zone in aspireclan.com tidyshelves.com shelvera.com; do
dig @192.168.8.4 "${zone}" SOA +short +time=3 +tries=1
dig @192.168.8.4 "${zone}" SOA +tcp +short +time=3 +tries=1
done
# Forwarding and DNSSEC.
dig @192.168.8.4 github.com A +short +time=3 +tries=1
dig @192.168.8.4 cloudflare.com A +dnssec +comments
# Critical internal names.
dig @192.168.8.4 harbor.aspireclan.com A +short
dig @192.168.8.4 ac-cicd-api.aspireclan.com A +short
# Logs.
sudo journalctl -u bind9 -n 100 --no-pager
Required final state:
Hostname: prod-dns-01
Reserved IP: 192.168.8.4
BIND: active and enabled
Authoritative zones: aspireclan.com, tidyshelves.com, shelvera.com
Public forwarding: working through BIND
DNSSEC validation: enabled
UDP 53: approved client CIDRs only
TCP 53: approved client CIDRs only
SSH: approved management CIDRs only
UFW: active, deny incoming, deny routed, allow outgoing
Infrastructure clients: 192.168.8.4 only
Public internet exposure: none
Off-host backup: available
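The two port 53 lines of the required final state can be checked against the firewall's own report. The following is a minimal sketch: no_open_dns_rule is a hypothetical helper, and it assumes the rule table printed by `ufw status verbose`, with the port in the first column and the source in the From column.

```shell
# no_open_dns_rule (hypothetical helper): reads `ufw status verbose` output
# on stdin and fails if any rule for port 53 has "Anywhere" as its source.
# Assumes rule lines start with the port, e.g. "53/udp  ALLOW IN  Anywhere".
no_open_dns_rule() {
  awk '
    $1 ~ /^53(\/(tcp|udp))?$/ && /Anywhere/ { bad = 1 }
    END { exit(bad ? 1 : 0) }
  '
}
```

A possible use is `sudo ufw status verbose | no_open_dns_rule || echo "port 53 is open to Anywhere"`. Rules bound to a specific destination address place that address in the first column and would not be matched by this sketch, so a manual read of the rule table is still advisable.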
44. Mandatory Security Rules
No public inbound DNS access.
No open recursive resolver.
No TCP or UDP 53 rule for Anywhere.
No public resolver configured beside 192.168.8.4 on infrastructure clients.
No BIND forwarder pointing back to 192.168.8.4.
No unauthenticated dynamic DNS updates.
No unrestricted zone transfers.
No permanent StrictHostKeyChecking=no after bootstrap.
No manual production change left uncommitted in Git.
No zone deployment without named-checkconf and named-checkzone.
No zone change without an incremented SOA serial.
No DNS-only Git deployment assumed successful when Ansible was skipped.
No unrelated service hosted on prod-dns-01.
No backup retained only on prod-dns-01.
No second client resolver that bypasses internal split DNS.
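The serial rule above is easier to follow when the next serial is computed rather than typed by hand. The following is a minimal sketch that assumes the common YYYYMMDDNN serial convention, which this page does not confirm for these zones; next_serial is a hypothetical helper.

```shell
# next_serial (hypothetical helper): prints the next SOA serial.
# ASSUMPTION: serials follow the YYYYMMDDNN convention (date plus a
# two-digit revision counter). Adjust if the zones use another scheme.
# Usage: next_serial <current-serial> <today-as-YYYYMMDD>
next_serial() {
  current="$1"
  today="$2"
  base="${today}00"
  if [ "$current" -lt "$base" ]; then
    # First change of the day: start at revision 00.
    echo "$base"
  else
    # Same day, or a serial already ahead of today: increment by one so the
    # serial never moves backwards.
    echo "$((current + 1))"
  fi
}
```

For example, `next_serial 2025010100 "$(date -u +%Y%m%d)"` prints the value to write into the zone file before running named-checkzone. After revision 99 the increment rolls into the next date value, which is still a valid, larger serial.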
45. Complete Implementation Order
For a new build or disaster-recovery rebuild, proceed in this order:
1. Confirm the approved DNS VM identity.
2. Verify the IP and VM ID are not in conflicting use.
3. Create or confirm the router reservation.
4. Confirm dns.tfvars, variables.tf, main.tf, and outputs.tf.
5. Confirm the [dns] inventory entry.
6. Run Terraform init, format check, validate, and plan.
7. Confirm only prod-dns-01 is created or intentionally replaced.
8. Apply the reviewed plan.
9. Verify DHCP address and SSH access.
10. Verify the SSH host key through the Proxmox console.
11. Run the common Ubuntu bootstrap.
12. Install BIND9 packages.
13. Correct the shelvera.com ns1 address to 192.168.8.4.
14. Review all zone records and increment each changed serial.
15. Render the hardened named.conf.options.
16. Deploy named.conf.local and zone files.
17. Run named-checkconf and named-checkzone.
18. Add role-specific UFW rules.
19. Enable and start BIND.
20. Validate listeners, UDP, TCP, and all zones locally.
21. Apply the final fail-closed firewall baseline.
22. Validate all zones remotely over UDP and TCP.
23. Validate public forwarding and DNSSEC.
24. Configure prod-dns-01 to resolve through local BIND.
25. Configure infrastructure clients to use 192.168.8.4 only.
26. Reboot and repeat functional validation.
27. Create and copy an off-host backup.
28. Enable monitoring.
29. Record the installed BIND version and validation date.
For an existing server update, begin at Section 25 and do not run Terraform apply unless the reviewed plan contains an intentional infrastructure change.
46. Source Consistency Status
This page was created from the attached Terraform repository and the supplied Production Vault MDX layout reference.
Page type: new file
Output file: prod-dns-01-setup.mdx
Prior DNS MDX supplied: no
Diff baseline: /dev/null
Attached Terraform Git HEAD: b2ec4fa
Layout reference: Production Vault Setup / K8S Infrastructure Overview style
Docusaurus left documentation sidebar: preserved
Desktop table of contents: right side, 260px
Custom floating anchor panel: not created
CustomCodeBlock usage: consistent
Section numbering: sequential 1 through 48
Source values preserved from the attached repository:
VM: prod-dns-01
VM ID: 3156004
MAC: aa:bb:cc:04:03:01
Reserved IP: 192.168.8.4
CPU: 2
RAM: 4096 MiB
Disk: 40G
Template: tmplt-ub-26-min-base
Management CIDR: 192.168.8.0/24
Current DNS client CIDR: 192.168.8.0/22
Forwarders: 1.1.1.1 and 8.8.8.8
Zones: aspireclan.com, tidyshelves.com, shelvera.com
Ansible collection: community.general 13.0.1
Meaningful corrections and enhancements in this new page:
1. Corrects the documented shelvera.com ns1 address from 192.168.8.60 to 192.168.8.4.
2. Identifies templates/named.conf.options.j2 as the active source and the duplicate static named.conf.options as obsolete.
3. Adds forward only to prevent unintended direct-recursion fallback.
4. Adds explicit listen addresses, transfer denial, minimal responses, and version hiding.
5. Explains authoritative split-DNS NXDOMAIN behavior and the need to preserve required public records internally.
6. Requires TCP and UDP DNS validation.
7. Adds secure SSH host-key enrollment instead of permanent host-key checking bypass.
8. Documents the DNS server's own local-resolver bootstrap sequence without creating a forwarding loop.
9. Identifies the current workflow behavior that skips Ansible for DNS-only changes.
10. Establishes workflow_dispatch with ansible_limit=prod-dns-01 as the approved existing-server update path.
11. Adds SOA serial, backup, restore, reboot, monitoring, package-update, and secondary-DNS procedures.
12. Clearly identifies separate first execution sections for a new build and an existing-server update.
The attached archive does not include a Docusaurus documentation repository, package.json, site configuration, or the CustomCodeBlock implementation. Therefore, a complete Docusaurus production build cannot be run from the supplied files alone. The accompanying validation report records the static MDX/JSX, structure, source, and embedded-configuration checks that were possible.
47. Official References
- Ubuntu Server: Install and configure DNS
- Ubuntu Server: DNSSEC
- Ubuntu Server: BIND9 DNSSEC cryptography selection
- ISC BIND 9 Administrator Reference Manual
- ISC BIND 9 configuration reference
- ISC BIND 9 configuration and zone files
- ISC guidance for allow-recursion and allow-query-cache
- Ansible community.general.ufw module
- Terraform CLI plan command
- GitHub Actions workflow_dispatch
48. Continuation Prompt
Use this prompt when continuing implementation or reviewing a DNS change:
We are maintaining prod-dns-01, the internal Aspireclan BIND9 server at 192.168.8.4. It is provisioned through Terraform from tmplt-ub-26-min-base and configured through Ansible. Ubuntu uses DHCP with the router reservation aa:bb:cc:04:03:01 → 192.168.8.4.
Preserve the normal Docusaurus page structure and use the Production DNS Setup page as the operational source. Infrastructure clients must use 192.168.8.4 only. Public resolvers belong only in BIND's forwarders. BIND is authoritative for the internal aspireclan.com, tidyshelves.com, and shelvera.com zones and forwards other public queries to the approved forwarders.
Before applying a DNS change, validate named-checkconf, validate every changed zone with named-checkzone, increment the SOA serial, review the exact diff, and deploy only to prod-dns-01. Test every zone over UDP and TCP, test public forwarding, test DNSSEC, and verify the fail-closed UFW policy.
For DNS-only changes to the existing VM, use the production workflow's manual dispatch with ansible_limit=prod-dns-01 until component-based Ansible targeting is implemented. Do not recreate the VM for a normal BIND or zone update.