
Production DNS Setup

1. Purpose

This document is the complete implementation and operations reference for the Aspireclan production DNS server:


Hostname: prod-dns-01
Reserved IP: 192.168.8.4
Service: BIND9
Primary purpose: internal split DNS and controlled forwarding of public DNS queries

The server is the required DNS dependency for production infrastructure such as Vault, the certificate issuer, HAProxy, Harbor, Kubernetes, application servers, and administration clients.

This page covers:

  • Proxmox VM provisioning through Terraform.
  • Router-side MAC-to-IP reservation.
  • Common Ubuntu configuration through Ansible.
  • BIND9 installation and configuration.
  • Internal authoritative zones.
  • Recursive forwarding to approved public resolvers.
  • DNSSEC validation.
  • Fail-closed UFW rules for UDP and TCP port 53.
  • GitHub Actions deployment behavior.
  • New-server creation and existing-server update procedures.
  • Zone serial management.
  • Validation, logging, backup, restore, patching, and disaster recovery.

2. Implementation Status and Start Here

Current implementation status

prod-dns-01 is already provisioned and operating at 192.168.8.4.

The attached Terraform repository contains the production DNS VM definition, BIND9 Ansible role, BIND configuration, three managed forward zones, UFW rules, and workflow validation.

Use the correct starting point:

| Operation | First execution section | Purpose |
|---|---|---|
| Build or rebuild a new DNS VM | Section 12 | Begins with identity, reservation, collision, Terraform, and bootstrap checks. |
| Update the existing DNS VM | Section 25 | Validates Git changes and deliberately targets prod-dns-01 without recreating the VM. |

Approved current design:


Operating system: Ubuntu Server 26.04 LTS
Template: tmplt-ub-26-min-base
DNS software: Ubuntu-supported BIND9 package
Deployment: Terraform for VM; Ansible for OS, BIND, zones, and UFW
Addressing: DHCP inside Ubuntu with router-side MAC reservation
DNS IP: 192.168.8.4
Management CIDR: 192.168.8.0/24
Current DNS client ACL: 192.168.8.0/22
Forwarders:
- 1.1.1.1
- 8.8.8.8
Managed zones:
- aspireclan.com
- tidyshelves.com
- shelvera.com
Inbound firewall:
- SSH/TCP 22 from the management CIDR
- DNS/UDP 53 from approved DNS client CIDRs
- DNS/TCP 53 from approved DNS client CIDRs
- deny all other inbound and routed traffic

The installed BIND package version must be recorded from the host with named -v; this page does not hard-code a package version that may change with Ubuntu security updates.


3. Scope and Non-Goals

This page configures one internal BIND server that fills three roles:

  • Authoritative for the internal versions of the managed zones.
  • Recursive for approved internal clients.
  • A forwarding resolver for public names not owned by the local zones.

This page does not:

  • Publish an authoritative DNS server directly to the internet.
  • Replace the public DNS provider for Aspireclan domains.
  • Configure public registrar name-server delegation to prod-dns-01.
  • Expose recursion to untrusted networks.
  • Configure DHCP service on prod-dns-01.
  • Configure encrypted DNS protocols such as DNS-over-TLS or DNS-over-HTTPS.
  • Create a second DNS server in phase one.
  • Configure dynamic DNS updates.
  • Store secrets on the DNS VM.

prod-dns-01 must remain internal-only. Public authoritative DNS remains with the approved external DNS provider.


4. Final Architecture

The current request path is:


Infrastructure VM or approved internal client
  ↓ DNS query to 192.168.8.4 over UDP/TCP 53
prod-dns-01
  ├── authoritative answer for managed internal zones
  ├── cached answer when available
  └── forwards other recursive queries to approved public resolvers
          ├── 1.1.1.1
          └── 8.8.8.8

The security boundary is:


Internet
  ✕ no inbound access to port 53
  ✕ no public recursive resolver
  ✕ no public SSH access

Approved management network
  → TCP 22

Approved DNS client networks
  → UDP 53
  → TCP 53

prod-dns-01 outbound
  → UDP/TCP 53 to approved forwarders
  → HTTPS for Ubuntu package updates
  → normal time synchronization and required infrastructure services

TCP port 53 is required in addition to UDP. DNS can use TCP for large responses, truncation fallback, DNSSEC responses, and other standards-compliant operations.


5. Split-DNS Behavior and the Internal Resolver Rule

All Aspireclan infrastructure VMs must use:


192.168.8.4

as their internal DNS resolver.

Do not configure public resolvers such as 1.1.1.1 or 8.8.8.8 directly beside 192.168.8.4 on infrastructure clients. Clients do not treat multiple resolvers as a strict primary/fallback sequence: a client can query any configured resolver, so internal-only records such as harbor.aspireclan.com or vault.aspireclan.com intermittently return public data or NXDOMAIN.

Approved rule:


Infrastructure clients:
DNS server: 192.168.8.4 only

prod-dns-01 BIND configuration:
Public forwarding: 1.1.1.1 and 8.8.8.8

Because BIND is authoritative for each configured zone, it does not forward missing names inside that zone. For example, after loading an internal aspireclan.com zone, a query for an absent name.aspireclan.com returns an authoritative negative answer rather than being forwarded to public DNS.

Therefore, every public record that internal clients still need for a managed split-DNS zone must also exist in the internal zone file, normally with the same public value.


6. Approved Build Order and Dependency Role

The production infrastructure dependency order is:


1. prod-dns-01               complete and required first
2. prod-vault-01
3. prod-cert-issuer-01
4. prod-int-proxy-01

DNS must be working before later hosts are configured because those hosts rely on internal service names and public package/repository resolution.

A DNS outage does not necessarily stop already-established connections, but it prevents new name resolution and can block package installation, certificate issuance, Vault access, container pulls, and application routing.


7. VM Profile and Approved Identity

The attached production configuration defines:

| VM name | VM ID | MAC address | Reserved IP | vCPU | RAM | Disk | Template |
|---|---|---|---|---|---|---|---|
| prod-dns-01 | 3156004 | aa:bb:cc:04:03:01 | 192.168.8.4 | 2 | 4096 MiB | 40G | tmplt-ub-26-min-base |

Proxmox defaults from envs/prod/terraform.tfvars:


environment   = "prod"
pm_api_url    = "https://<PROXMOX_HOST>:8006/api2/json"
target_node   = "pve"
template_name = "tmplt-ub-26-min-base"
storage       = "local-lvm"
bridge        = "vmbr0"

Do not configure a static IP inside Ubuntu. The VM uses DHCP and receives 192.168.8.4 from the router reservation for aa:bb:cc:04:03:01.


8. Repository File Structure

The attached implementation uses:


.github/workflows/
└── terraform-proxmox-deploy.yml

envs/prod/
├── terraform.tfvars
├── dns.tfvars
├── main.tf
├── variables.tf
├── outputs.tf
└── ansible/
  ├── inventory.ini
  ├── requirements.yml
  ├── configure-vms.yml
  ├── bootstrap-vms.yml
  ├── configure-dns.yml
  ├── finalize-firewall.yml
  ├── group_vars/
  │   ├── all.yml
  │   └── dns.yml
  ├── templates/
  │   └── named.conf.options.j2
  └── files/
      └── bind/
          ├── named.conf.local
          └── zones/
              ├── db.aspireclan.com
              ├── db.tidyshelves.com
              └── db.shelvera.com

The attached repository also contains:


envs/prod/ansible/files/bind/named.conf.options

That static file is not deployed by configure-dns.yml; the active source is templates/named.conf.options.j2. Remove the unused static file after confirming no external process consumes it. Keeping two independent files with the same purpose creates configuration drift.


9. Terraform Responsibilities

Terraform manages only the VM resource. It does not install BIND, create zone records, modify the router, or configure Ubuntu networking.

The production variable definition is:


variable "dns_vms" {
  description = "DNS VMs for this environment."
  type = map(object({
    vmid        = number
    macaddr     = string
    reserved_ip = string
    cores       = number
    memory      = number
    disk_size   = string
  }))
  default = {}
}

The module declaration is:


module "dns_vms" {
  source = "../../modules/proxmox-vm"

  for_each = var.dns_vms

  name          = each.key
  vmid          = each.value.vmid
  target_node   = var.target_node
  template_name = var.template_name
  storage       = var.storage
  bridge        = var.bridge
  macaddr       = each.value.macaddr
  reserved_ip   = each.value.reserved_ip
  cores         = each.value.cores
  memory        = each.value.memory
  disk_size     = each.value.disk_size
}

The current dns.tfvars is:


dns_vms = {
  prod-dns-01 = {
    vmid        = 3156004
    macaddr     = "aa:bb:cc:04:03:01"
    reserved_ip = "192.168.8.4"
    cores       = 2
    memory      = 4096
    disk_size   = "40G"
  }
}

The reserved_ip value is documentation and output metadata for the router reservation. The Proxmox provider module does not configure that address inside Ubuntu.


10. DNS Variables and Current Network Scope

The attached group_vars/dns.yml defines:


dns_management_cidrs:
- "192.168.8.0/24"

dns_client_cidrs:
- "192.168.8.0/22"

dns_obsolete_ufw_rules: []

dns_forwarders:
- "1.1.1.1"
- "8.8.8.8"

dns_zones:
- aspireclan.com
- tidyshelves.com
- shelvera.com

The current /22 DNS client ACL covers:


192.168.8.0 through 192.168.11.255

This is broader than the current /24 management network. Preserve it only when clients actually exist in the additional subnets. Otherwise, narrow both the BIND ACL and UFW rules to 192.168.8.0/24 in the same reviewed change.

Never narrow the BIND ACL without narrowing UFW, or narrow UFW without narrowing BIND. Both controls must remain consistent.
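
The /22 range and the consistency rule above can be checked with Python's standard ipaddress module. This is a minimal sketch, not part of the attached repository: acl_mismatch and its two arguments are hypothetical names standing in for the CIDR lists that end up in the BIND ACL and in the UFW port 53 rules.

```python
import ipaddress

client_acl = ipaddress.ip_network("192.168.8.0/22")
management = ipaddress.ip_network("192.168.8.0/24")

# First and last address covered by the current DNS client ACL.
first_address = str(client_acl[0])
last_address = str(client_acl[-1])
print(first_address, "through", last_address)

# The management /24 is one of the four /24 networks inside the /22.
print(management.subnet_of(client_acl))
print([str(net) for net in client_acl.subnets(new_prefix=24)])


def acl_mismatch(bind_acl, ufw_dns_sources):
    """Return CIDRs that appear in only one of the two controls (hypothetical helper)."""
    bind = {ipaddress.ip_network(cidr) for cidr in bind_acl}
    ufw = {ipaddress.ip_network(cidr) for cidr in ufw_dns_sources}
    return sorted(str(net) for net in bind ^ ufw)
```

An empty list from acl_mismatch means both controls name the same networks; any entry is a CIDR that one control allows and the other does not.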


11. Approved Hardened BIND Configuration

Use envs/prod/ansible/templates/named.conf.options.j2 as the single source of truth:


acl "internal" {
  127.0.0.1;
{% for cidr in dns_client_cidrs %}
  {{ cidr }};
{% endfor %}
};

options {
  directory "/var/cache/bind";

  recursion yes;
  allow-query { internal; };
  allow-query-cache { internal; };
  allow-recursion { internal; };

  forward only;
  forwarders {
{% for forwarder in dns_forwarders %}
      {{ forwarder }};
{% endfor %}
  };

  dnssec-validation auto;
  auth-nxdomain no;

  listen-on {
      127.0.0.1;
      192.168.8.4;
  };
  listen-on-v6 { none; };

  allow-transfer { none; };
  minimal-responses yes;
  version none;
};

Important behavior:

  • allow-query, allow-query-cache, and allow-recursion restrict service to approved clients.
  • forward only prevents fallback to direct root-server recursion when both configured forwarders fail.
  • Explicit listen-on values reduce exposure even before UFW is evaluated.
  • allow-transfer { none; }; prevents unauthorized full-zone transfers.
  • dnssec-validation auto enables DNSSEC validation using BIND's built-in, automatically maintained root trust anchor.
  • version none avoids returning the exact BIND version through the normal version query.

When a secondary DNS server is added later, replace the global transfer denial with an ACL that permits AXFR/IXFR only to the approved secondary address and use TSIG authentication.


12. New-Build Start: Router Reservation and Collision Checks

This is the first execution section for creating or rebuilding prod-dns-01.

Before Terraform apply, verify the approved identity:


VM name: prod-dns-01
VM ID: 3156004
MAC: aa:bb:cc:04:03:01
Router reservation: aa:bb:cc:04:03:01 → 192.168.8.4

Check Proxmox, the router reservation table, current DHCP leases, ARP/neighbor tables, and existing documentation. From a trusted Linux machine:


ping -c 2 -W 1 192.168.8.4 || true
ip neigh show | grep -F '192.168.8.4' || true

A failed ping does not prove the address is unused. A host can ignore ICMP or be powered off while retaining a reservation.

For a disaster-recovery rebuild that reuses the same identity, power off or isolate the failed/original VM before starting the replacement. Two hosts must never use 192.168.8.4 simultaneously.


13. Terraform Validation, Plan, and Apply

From the production environment:


cd envs/prod

terraform init
terraform fmt -check -recursive
terraform validate

terraform plan \
-var-file=terraform.tfvars \
-var-file=web.tfvars \
-var-file=app.tfvars \
-var-file=db.tfvars \
-var-file=k8s.tfvars \
-var-file=runner.tfvars \
-var-file=dns.tfvars \
-out=tfplan

For a new build, the reviewed plan must show:


Create: prod-dns-01 only
VM ID: 3156004
MAC: aa:bb:cc:04:03:01
Disk: scsi0 on local-lvm, 40G
Bridge: vmbr0
Clone source: tmplt-ub-26-min-base
No replacement of unrelated web, app, database, Kubernetes, or runner VMs

Apply only the reviewed saved plan:


terraform apply tfplan

After creation, verify the VM in Proxmox and confirm that the router assigned 192.168.8.4 to the approved MAC.


14. Ansible Inventory and SSH Host-Key Trust

The attached inventory currently disables strict host-key checking. That behavior is acceptable only during a controlled first bootstrap and must not remain the permanent trust model.

Initial bootstrap entry from the attached source:


[dns]
prod-dns-01 ansible_host=192.168.8.4 ansible_user=acllc ansible_ssh_private_key_file=~/.ssh/id_ed25519_ansible ansible_python_interpreter=/usr/bin/python3 ansible_ssh_common_args='-o IdentitiesOnly=yes -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null'

After verifying the SSH host-key fingerprints through the Proxmox console, enroll the key on the deployment runner:


install -d -m 0700 ~/.ssh
ssh-keygen -R 192.168.8.4
ssh-keyscan -H 192.168.8.4 >> ~/.ssh/known_hosts
chmod 0600 ~/.ssh/known_hosts
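# Print the enrolled fingerprints and compare them with the values shown on the Proxmox console.
ssh-keygen -l -F 192.168.8.4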

Then use the hardened inventory entry:


[dns]
prod-dns-01 ansible_host=192.168.8.4 ansible_user=acllc ansible_ssh_private_key_file=~/.ssh/id_ed25519_ansible ansible_python_interpreter=/usr/bin/python3 ansible_ssh_common_args='-o IdentitiesOnly=yes -o StrictHostKeyChecking=yes'

When the VM is legitimately rebuilt and its host key changes, verify the new fingerprint through the Proxmox console before replacing the known-host entry.


15. Common Ubuntu Baseline

The shared bootstrap playbook must establish:


Permanent hostname: prod-dns-01
/etc/hosts: 127.0.1.1 prod-dns-01
User: acllc
Passwordless sudo: verified
OpenSSH: active
QEMU Guest Agent: active
UFW: installed
UFW logging: low
Default incoming policy: deny
Default routed policy: deny
Default outgoing policy: allow
SSH: allowed only from approved management CIDRs
Time synchronization: active
Addressing: DHCP with router reservation

Run the bootstrap only against the DNS host:


ansible-playbook \
-i envs/prod/ansible/inventory.ini \
envs/prod/ansible/bootstrap-vms.yml \
--limit prod-dns-01

Validate identity and time before installing BIND:


hostnamectl --static
ip -brief address
ip route
timedatectl status
systemctl is-active qemu-guest-agent
sudo -n true

16. Install BIND9 and DNS Utilities

The Ansible role installs the Ubuntu-supported packages:


- bind9
- bind9utils
- dnsutils

Equivalent manual verification:


sudo apt update
sudo apt install -y bind9 bind9utils dnsutils

named -v
dig -v
apt-cache policy bind9

Do not add an unrelated third-party BIND package repository. Use Ubuntu security updates unless an explicitly reviewed requirement demands another source.

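The service is managed as shown below. On recent Ubuntu releases the packaged unit is named.service and bind9.service is an alias of it, so enable the service through the primary unit name:


sudo systemctl enable --now named
systemctl is-enabled named
systemctl is-active named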

17. Zone Declarations

Use envs/prod/ansible/files/bind/named.conf.local:


zone "aspireclan.com" {
  type primary;
  file "/etc/bind/zones/db.aspireclan.com";
  allow-update { none; };
  allow-transfer { none; };
  notify no;
};

zone "tidyshelves.com" {
  type primary;
  file "/etc/bind/zones/db.tidyshelves.com";
  allow-update { none; };
  allow-transfer { none; };
  notify no;
};

zone "shelvera.com" {
  type primary;
  file "/etc/bind/zones/db.shelvera.com";
  allow-update { none; };
  allow-transfer { none; };
  notify no;
};

type master remains accepted by BIND, but type primary is used here as the current terminology. Do not enable unauthenticated dynamic updates.

When a secondary DNS server is introduced, update allow-transfer and notify deliberately rather than opening transfers broadly.


18. Zone File Rules and Corrected Baseline

Every zone must contain:

  • One SOA record.
  • One internal NS record.
  • An A record for the NS host.
  • A monotonically increasing SOA serial.
  • Every internal override required by clients.
  • Every public record that internal clients still need from that managed zone.

Use a serial in YYYYMMDDNN form, for example:


2026061301

The attached source contains one confirmed inconsistency:


Current attached value:
ns1.shelvera.com → 192.168.8.60

Approved production DNS value:
ns1.shelvera.com → 192.168.8.4

Correct the shelvera.com zone before the next deployment.

Recommended SOA and NS pattern for each managed zone:


$TTL 300
@   IN  SOA ns1.<ZONE>. admin.<ZONE>. (
      <YYYYMMDDNN> ; Serial
      3600         ; Refresh
      900          ; Retry
      1209600      ; Expire
      300 )        ; Negative Cache TTL

@       IN  NS  ns1.<ZONE>.
ns1     IN  A   192.168.8.4

A 300-second TTL is suitable while infrastructure records are actively changing. Increase it later only when the operational trade-off is understood. Long TTLs make emergency record changes and rollback slower.


19. Aspireclan Internal Zone Example

A corrected structure for db.aspireclan.com is:


$TTL 300
@   IN  SOA ns1.aspireclan.com. admin.aspireclan.com. (
      <YYYYMMDDNN> ; Serial
      3600         ; Refresh
      900          ; Retry
      1209600      ; Expire
      300 )        ; Negative Cache TTL

@       IN  NS  ns1.aspireclan.com.
ns1     IN  A   192.168.8.4

; Preserve the public apex value required by internal clients.
@       IN  A   <PUBLIC_ASPIRECLAN_APEX_IP>

; Infrastructure records.
prod-dns-01        IN  A  192.168.8.4
vault              IN  A  <PROD_VAULT_IP>
harbor             IN  A  <INTERNAL_HARBOR_IP>

; Kubernetes control endpoint and nodes.
ac-cicd-api        IN  A  192.168.8.61
ac-cicd-lb-01      IN  A  192.168.8.61
ac-cicd-cp-01      IN  A  192.168.8.62
ac-cicd-cp-02      IN  A  192.168.8.63
ac-cicd-cp-03      IN  A  192.168.8.64
ac-cicd-prod-wk-01 IN  A  192.168.8.71
ac-cicd-prod-wk-02 IN  A  192.168.8.72
ac-cicd-qa-wk-01   IN  A  192.168.8.81
ac-cicd-dev-wk-01  IN  A  192.168.8.91

; Add all other required internal overrides and public records deliberately.

Do not add vault.aspireclan.com until the approved Vault address is known; until then, omit the placeholder line rather than deploying it. Do not invent infrastructure addresses in a zone file.


20. TidyShelves and Shelvera Zone Review

The attached tidyshelves.com and shelvera.com zones contain a mixture of internal A records and public CNAME/A records.

Before each deployment:


1. Compare required public records with the public DNS provider.
2. Preserve all public names that internal clients must resolve.
3. Keep internal overrides only where split DNS is intentional.
4. Confirm that a name does not have both CNAME and other record types.
5. Confirm duplicate A records are intentional round-robin values.
6. Correct every ns1 record to 192.168.8.4.
7. Increment the SOA serial exactly once for the reviewed change set.

The duplicate local.api A records in the attached shelvera.com zone resolve to two addresses. Retain both only when active round-robin behavior is intended and both backends are healthy.

The duplicate local-ts-pgsql-db-01 A records in the attached tidyshelves.com zone must likewise be treated as an intentional multi-address result or corrected to one approved address.
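
Review items 4 and 5 can be pre-screened before named-checkzone runs. The sketch below is a deliberately simplified zone reader, not a full parser: it only understands one-line records written as `<owner> IN <TYPE> <value>`, and the sample records use documentation addresses that are assumptions, not values from the attached zones. named-checkzone remains the authoritative syntax check. Duplicate A records are valid DNS, so they need the human decision described above.

```python
from collections import defaultdict


def lint_zone(text):
    """Report CNAME conflicts and multi-address A records in simple zone text."""
    types_by_owner = defaultdict(set)
    a_values = defaultdict(list)
    for raw in text.splitlines():
        # Drop trailing comments, then split into whitespace-separated fields.
        fields = raw.split(";", 1)[0].split()
        # Skip directives, SOA lines, SOA continuation lines, and blank lines.
        if len(fields) < 4 or fields[1] != "IN" or fields[2] == "SOA":
            continue
        owner, rtype, value = fields[0], fields[2], fields[3]
        types_by_owner[owner].add(rtype)
        if rtype == "A":
            a_values[owner].append(value)
    cname_conflicts = sorted(
        owner for owner, types in types_by_owner.items()
        if "CNAME" in types and len(types) > 1
    )
    multi_a = {owner: values for owner, values in a_values.items() if len(values) > 1}
    return cname_conflicts, multi_a


# Hypothetical sample records for illustration only.
sample = """\
@          IN  NS     ns1.shelvera.com.
ns1        IN  A      192.168.8.4
local.api  IN  A      192.0.2.10
local.api  IN  A      192.0.2.11
www        IN  CNAME  example.net.
www        IN  A      192.0.2.20
"""

conflicts, multi = lint_zone(sample)
print("CNAME conflicts:", conflicts)
print("Multi-address A records:", multi)
```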


21. Ansible DNS Deployment

The DNS playbook must target only:


hosts: dns

It performs:


1. Validate DNS variables.
2. Remove explicitly declared obsolete UFW rules.
3. Preserve SSH access from approved management CIDRs.
4. Allow UDP 53 from approved DNS client CIDRs.
5. Allow TCP 53 from approved DNS client CIDRs.
6. Install BIND9 packages.
7. Create /etc/bind/zones.
8. Render named.conf.options from Jinja variables.
9. Copy named.conf.local.
10. Copy all declared zone files.
11. Run named-checkconf.
12. Run named-checkzone for every zone.
13. Restart BIND only after configuration deployment.
14. Verify service and port listeners.
15. Test every local zone over UDP and TCP.

Run directly when needed:


ansible-galaxy collection install \
-r envs/prod/ansible/requirements.yml

ansible-playbook \
-i envs/prod/ansible/inventory.ini \
envs/prod/ansible/configure-dns.yml \
--limit prod-dns-01 \
--syntax-check

ansible-playbook \
-i envs/prod/ansible/inventory.ini \
envs/prod/ansible/configure-dns.yml \
--limit prod-dns-01

The attached collection pin is:


community.general: 13.0.1

22. UFW Policy

The final firewall baseline is:


Default incoming: deny
Default routed: deny
Default outgoing: allow
Logging: low

Allow TCP 22 from:
192.168.8.0/24

Allow UDP 53 from:
192.168.8.0/22 currently

Allow TCP 53 from:
192.168.8.0/22 currently

The playbook must never run ufw reset on a remotely managed server.

Validate locally:


sudo ufw status numbered
sudo ufw status verbose
sudo ss -lntup | grep -E ':(22|53)\b'

Validate remotely from an approved DNS client:


nc -vz 192.168.8.4 53
dig @192.168.8.4 aspireclan.com SOA +time=3 +tries=1
dig @192.168.8.4 aspireclan.com SOA +tcp +time=3 +tries=1

Do not create 53/tcp or 53/udp rules for Anywhere.
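
The allow rules in this baseline follow mechanically from the two CIDR lists. The sketch below derives the equivalent ufw command lines and refuses any source that would mean Anywhere. It is illustrative only: ufw_allow_commands is a hypothetical helper, and the attached playbook may express the same rules through an Ansible module rather than these command strings.

```python
import ipaddress


def ufw_allow_commands(management_cidrs, client_cidrs):
    """Build 'ufw allow' command strings for SSH and DNS (hypothetical helper)."""
    for cidr in list(management_cidrs) + list(client_cidrs):
        # A /0 prefix is the same as allowing Anywhere, which this page forbids.
        if ipaddress.ip_network(cidr).prefixlen == 0:
            raise ValueError(f"refusing unrestricted source {cidr}")
    commands = []
    for cidr in management_cidrs:
        commands.append(f"ufw allow from {cidr} to any port 22 proto tcp")
    for cidr in client_cidrs:
        # DNS needs both transports from every approved client network.
        for proto in ("udp", "tcp"):
            commands.append(f"ufw allow from {cidr} to any port 53 proto {proto}")
    return commands


for command in ufw_allow_commands(["192.168.8.0/24"], ["192.168.8.0/22"]):
    print(command)
```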


23. New-Build Workflow Execution

The production workflow runs on the prod branch and includes dns.tfvars when present.

For an initial VM creation, Terraform detects prod-dns-01 as created, derives the Ansible target from the plan, applies Terraform, and runs the shared configuration chain:


bootstrap-vms.yml
→ configure-web.yml
→ configure-dns.yml
→ finalize-firewall.yml

Because every playbook in the chain runs with --limit prod-dns-01, plays that target other host groups match no hosts, and only the plays that include the DNS host take effect.

Before the first production run, confirm:


Git branch: prod
Runner labels: self-hosted, prod, terraform, deploy
PM_API_TOKEN_ID: configured as a GitHub secret
PM_API_TOKEN_SECRET: configured as a GitHub secret
Router reservation: present
Inventory host: prod-dns-01 → 192.168.8.4
SSH key: available to the deployment user
Ansible collection requirements: resolvable

Never print Proxmox token values in workflow logs.


24. First Deployment Validation

Run from the deployment runner or another approved client:


# Authoritative zones over UDP.
dig @192.168.8.4 aspireclan.com SOA +noall +answer
dig @192.168.8.4 tidyshelves.com SOA +noall +answer
dig @192.168.8.4 shelvera.com SOA +noall +answer

# Authoritative zones over TCP.
dig @192.168.8.4 aspireclan.com SOA +tcp +noall +answer
dig @192.168.8.4 tidyshelves.com SOA +tcp +noall +answer
dig @192.168.8.4 shelvera.com SOA +tcp +noall +answer

# Internal record.
dig @192.168.8.4 prod-dns-01.aspireclan.com A +noall +answer

# Public forwarding.
dig @192.168.8.4 github.com A +noall +answer

# DNSSEC validation behavior.
dig @192.168.8.4 cloudflare.com A +dnssec +comments

# Service and firewall on the server.
ssh acllc@192.168.8.4 \
'systemctl is-active bind9 && sudo named-checkconf && sudo ufw status verbose'

Expected state:


BIND service: active
UDP 53: listening on 127.0.0.1 and 192.168.8.4
TCP 53: listening on 127.0.0.1 and 192.168.8.4
Managed zones: authoritative answers returned
Public names: forwarded and cached
UFW: active with fail-closed defaults
External inbound DNS: not exposed
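
The per-zone SOA checks above are repetitive, so a small wrapper can generate them from the zone list. This is a sketch under assumptions: soa_checks and run_checks are hypothetical helpers, and run_checks treats an empty answer section as a failure, which is a choice made for this example and not behavior taken from the attached workflow.

```python
import subprocess

ZONES = ["aspireclan.com", "tidyshelves.com", "shelvera.com"]


def soa_checks(server, zones):
    """Return dig argument lists: every zone over UDP first, then over TCP."""
    checks = []
    for transport_flags in ([], ["+tcp"]):
        for zone in zones:
            checks.append(
                ["dig", f"@{server}", zone, "SOA", *transport_flags, "+noall", "+answer"]
            )
    return checks


def run_checks(checks, timeout=10):
    """Run each check and return the command lines that produced no answer."""
    failures = []
    for command in checks:
        result = subprocess.run(command, capture_output=True, text=True, timeout=timeout)
        if result.returncode != 0 or not result.stdout.strip():
            failures.append(" ".join(command))
    return failures


# Print the commands only; call run_checks(...) from an approved client.
for command in soa_checks("192.168.8.4", ZONES):
    print(" ".join(command))
```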

25. Existing-Server Update Start

This is the first execution section for updating the already-created prod-dns-01.

Do not recreate the VM for normal BIND, zone, UFW, or Ansible changes.

Before deployment:


cd <REPOSITORY_ROOT>

git status --short
git diff -- envs/prod/ansible .github/workflows/terraform-proxmox-deploy.yml

cd envs/prod
terraform fmt -check -recursive
terraform validate

ansible-playbook \
-i ansible/inventory.ini \
ansible/configure-vms.yml \
--limit prod-dns-01 \
--syntax-check

Validate BIND files locally on an Ubuntu/BIND validation host when available:


named-checkconf <RENDERED_NAMED_CONF_ROOT>
named-checkzone aspireclan.com <RENDERED_DB_ASPIRECLAN_FILE>
named-checkzone tidyshelves.com <RENDERED_DB_TIDYSHELVES_FILE>
named-checkzone shelvera.com <RENDERED_DB_SHELVERA_FILE>

Then merge the reviewed change to prod and deliberately run the deployment with:


workflow_dispatch input:
ansible_limit: prod-dns-01

The explicit limit is currently required for DNS-only changes because a Terraform plan with no VM create/update action does not automatically select an Ansible target.


26. Safe Zone-Record Update Procedure

For each zone change:


1. Edit only the required zone file in Git.
2. Confirm the record owner name is relative or fully qualified as intended.
3. Confirm A, AAAA, CNAME, MX, TXT, and SRV syntax.
4. Confirm no CNAME shares an owner with another record type.
5. Increment the SOA serial.
6. Run named-checkzone.
7. Review the exact Git diff.
8. Deploy only to prod-dns-01.
9. Query the record directly from 192.168.8.4.
10. Query an unaffected record and a public forwarded name.
11. Review BIND logs.
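
Step 2 is the most common source of silent errors: an owner name without a trailing dot is relative and has the zone origin appended. The sketch below reproduces that expansion rule for illustration. absolute_owner is a hypothetical helper, not a BIND tool.

```python
def absolute_owner(owner, origin):
    """Expand a zone-file owner name the way the zone origin rule does."""
    if not origin.endswith("."):
        origin += "."
    if owner == "@":
        return origin            # "@" stands for the zone origin itself
    if owner.endswith("."):
        return owner             # already fully qualified
    return f"{owner}.{origin}"   # relative name: the origin is appended


for name in ("@", "vault", "vault.aspireclan.com.", "vault.aspireclan.com"):
    print(f"{name!r:26} -> {absolute_owner(name, 'aspireclan.com.')}")
```

The last case is the classic mistake: vault.aspireclan.com without the trailing dot becomes vault.aspireclan.com.aspireclan.com.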

Example direct reload after a validated emergency change on the server:


sudo named-checkzone aspireclan.com /etc/bind/zones/db.aspireclan.com
sudo rndc reload aspireclan.com
sudo rndc zonestatus aspireclan.com

dig @127.0.0.1 <CHANGED_NAME> <RECORD_TYPE> +noall +answer

The emergency change must be committed back to Git immediately. Git remains the source of truth.


27. Safe BIND Configuration Update Procedure

For changes to ACLs, forwarders, listening addresses, or zone declarations:


1. Update group_vars/dns.yml or the appropriate template/file.
2. Keep BIND ACLs and UFW client CIDRs synchronized.
3. Render or deploy to a validation host when possible.
4. Run named-checkconf.
5. Validate every zone.
6. Review the Git diff.
7. Run Ansible with --limit prod-dns-01.
8. Confirm the handler restart succeeds.
9. Test local zones over UDP and TCP.
10. Test a public forwarded name.
11. Confirm UFW rules and listeners.

Use rndc reconfig or a controlled service reload only after named-checkconf succeeds. The Ansible role currently uses a restart handler; that is acceptable for this single-node homelab but creates a brief DNS interruption.

A future improvement may replace the restart with reload/reconfig behavior when the exact change permits it.


28. GitHub Actions Targeting Consistency

The attached workflow correctly includes dns.tfvars, validates DNS files, installs Ansible requirements, and tests each DNS zone externally over UDP and TCP.

Current limitation:


DNS file changed
  ↓
Terraform plan contains no VM create/update
  ↓
No plan-derived Ansible target
  ↓
Ansible is skipped unless workflow_dispatch ansible_limit is supplied

Until the workflow is enhanced, every DNS-only deployment must use:


ansible_limit: prod-dns-01

Recommended workflow enhancement:


When changed files match any of these paths:
envs/prod/ansible/configure-dns.yml
envs/prod/ansible/group_vars/dns.yml
envs/prod/ansible/templates/named.conf.options.j2
envs/prod/ansible/files/bind/**

And no manual ansible_limit was supplied:
set ansible_limit=dns
set target_source=DNS configuration change

Keep manual input highest priority, plan-derived changed VM names second, and component-derived targets third. Never default to all for a DNS-only change.
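
The priority order can be expressed as one small selection function. The sketch below is one possible shape for the enhancement, not the workflow's current logic: select_ansible_target, its arguments, and the "no target" result are hypothetical names. It uses fnmatchcase, whose * wildcard also matches path separators, so the ** pattern covers nested zone files.

```python
from fnmatch import fnmatchcase

DNS_PATHS = [
    "envs/prod/ansible/configure-dns.yml",
    "envs/prod/ansible/group_vars/dns.yml",
    "envs/prod/ansible/templates/named.conf.options.j2",
    "envs/prod/ansible/files/bind/**",
]


def select_ansible_target(manual_limit, plan_changed_vms, changed_files):
    """Return (ansible_limit, target_source) following the documented priority."""
    if manual_limit:
        return manual_limit, "manual input"
    if plan_changed_vms:
        return ",".join(sorted(plan_changed_vms)), "Terraform plan"
    dns_changed = any(
        fnmatchcase(path, pattern) for path in changed_files for pattern in DNS_PATHS
    )
    if dns_changed:
        return "dns", "DNS configuration change"
    # No target at all: skip Ansible instead of falling back to "all".
    return None, "no target"


print(select_ansible_target("", [], ["envs/prod/ansible/files/bind/zones/db.shelvera.com"]))
```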


29. SOA Serial Management

Use one monotonically increasing 32-bit unsigned serial per zone.

Approved format:


YYYYMMDDNN

YYYY = four-digit year
MM   = two-digit month
DD   = two-digit day
NN   = revision number for that day, starting at 01

Example:


First change on June 13, 2026: 2026061301
Second change on June 13, 2026: 2026061302

Do not reset a serial to a lower value. A future secondary server relies on serial ordering to detect zone updates.
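
The serial rule can be automated so that a reviewer never computes it by hand. This is a minimal sketch, assuming the current serial is read from the zone file or from dig beforehand; next_serial is a hypothetical helper. It never returns a value lower than or equal to the current serial, and it stops when revision 99 has already been used.

```python
from datetime import date

MAX_SERIAL = 2**32 - 1  # SOA serials are 32-bit unsigned integers


def next_serial(current, today):
    """Return the next YYYYMMDDNN serial that is strictly greater than current."""
    first_today = int(today.strftime("%Y%m%d")) * 100 + 1
    candidate = max(current + 1, first_today)
    if candidate % 100 == 0:
        # current ended in 99, so the two-digit revision counter is exhausted.
        raise ValueError("revision 99 already used for this date")
    if candidate > MAX_SERIAL:
        raise ValueError("serial exceeds the 32-bit unsigned range")
    return candidate


print(next_serial(2026061205, date(2026, 6, 13)))  # first change of the day
print(next_serial(2026061301, date(2026, 6, 13)))  # second change of the day
```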

Check the active serial:


dig @192.168.8.4 aspireclan.com SOA +short
dig @192.168.8.4 tidyshelves.com SOA +short
dig @192.168.8.4 shelvera.com SOA +short

30. Add a New Managed Zone

To add a zone such as <NEW_ZONE>:


1. Confirm that internal split DNS is actually required.
2. Inventory the public records internal clients still need.
3. Create ansible/files/bind/zones/db.<NEW_ZONE>.
4. Add a primary zone block to named.conf.local.
5. Add <NEW_ZONE> to dns_zones.
6. Run named-checkzone.
7. Run named-checkconf.
8. Deploy with --limit prod-dns-01.
9. Test SOA, NS, internal records, and preserved public records.
10. Test an intentionally missing name and confirm the expected NXDOMAIN behavior.

Do not add an entire public domain as an internal authoritative zone merely to override one name without understanding the requirement to maintain all needed records in that zone.

When only a small override is required, evaluate whether a dedicated internal subdomain is safer and easier to maintain.


31. Remove a Managed Zone

Removing a zone changes resolution behavior from internal authoritative answers to public forwarding.

Procedure:


1. Confirm no internal-only records still depend on the zone.
2. Export and back up the current zone file.
3. Remove the zone block from named.conf.local.
4. Remove the zone from dns_zones.
5. Remove the managed file only after the first successful deployment.
6. Run named-checkconf.
7. Deploy with --limit prod-dns-01.
8. Flush BIND cache for the affected name or zone when necessary.
9. Confirm queries now return the intended public DNS answers.
10. Retain the Git history and backup for rollback.

Do not delete a zone file from the server manually while the active configuration still references it.


32. Configure Infrastructure Clients

Infrastructure VMs should receive 192.168.8.4 through DHCP or an Ansible-managed Netplan override.

Example Netplan override for a normal infrastructure client:


network:
  version: 2
  ethernets:
    ens18:
      dhcp4: true
      dhcp4-overrides:
        use-dns: false
      nameservers:
        addresses:
          - 192.168.8.4

Apply and verify:


sudo netplan generate
sudo netplan apply
sudo resolvectl flush-caches

resolvectl dns ens18
getent ahostsv4 harbor.aspireclan.com
getent ahostsv4 github.com

Expected client resolver list:


192.168.8.4 only

The client must not list 1.1.1.1, 8.8.8.8, or the router as an additional resolver when split DNS consistency is required.
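
The single-resolver rule is easy to assert in a compliance check once the client's resolver addresses have been collected, for example from the resolvectl output above. The sketch below only evaluates a list of addresses; resolver_problems is a hypothetical helper, and how the list is gathered is left to the caller.

```python
import ipaddress

APPROVED_RESOLVER = ipaddress.ip_address("192.168.8.4")


def resolver_problems(configured):
    """Return a list of findings; an empty list means the client is compliant."""
    addresses = [ipaddress.ip_address(item) for item in configured]
    problems = []
    if APPROVED_RESOLVER not in addresses:
        problems.append(f"{APPROVED_RESOLVER} is not configured")
    for address in addresses:
        if address != APPROVED_RESOLVER:
            problems.append(f"unexpected resolver {address}")
    return problems


print(resolver_problems(["192.168.8.4"]))
print(resolver_problems(["192.168.8.4", "1.1.1.1"]))
```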


33. Configure prod-dns-01's Own Resolver

The DNS server is a special case. Once BIND is healthy, the host should resolve through its own local BIND instance rather than depend on an external resolver that may be unavailable.

Approved sequence:


1. During first package installation, use the temporary DHCP-provided resolver.
2. Start BIND and validate local authoritative and forwarded queries.
3. Configure the host resolver to send queries to local BIND.
4. Keep public resolvers only in BIND's forwarders block.

Example systemd-resolved drop-in after BIND validation:


sudo install -d -m 0755 /etc/systemd/resolved.conf.d

sudo tee /etc/systemd/resolved.conf.d/prod-dns-01.conf >/dev/null <<'RESOLVED_EOF'
[Resolve]
DNS=127.0.0.1
FallbackDNS=
Domains=~.
RESOLVED_EOF

sudo systemctl restart systemd-resolved
sudo resolvectl flush-caches

resolvectl status
getent ahostsv4 github.com

Do not configure BIND to forward to 192.168.8.4; that would forward back to itself and create a loop.

Before rebooting after this change, verify both BIND and systemd-resolved are enabled.


34. Forwarding and DNSSEC Validation

Test forwarding:


dig @192.168.8.4 github.com A +noall +answer +stats
dig @192.168.8.4 ubuntu.com AAAA +noall +answer +stats

Test DNSSEC validation with a known signed domain:


dig @192.168.8.4 cloudflare.com A +dnssec +comments

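When validation succeeds, the response header carries the ad flag, which dig requests by default. Answers from the locally authoritative managed zones carry aa instead of ad, so test validation with an external signed name.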

Test a known DNSSEC failure domain supplied by an authoritative DNSSEC test service only when that service is currently documented and available. Do not permanently add insecure validation exceptions merely to make a broken external domain resolve.

Check BIND's validation and forwarding logs:


sudo journalctl -u bind9 --since '15 minutes ago' --no-pager

35. Cache Operations

Display server status:


sudo rndc status

Flush one name after an emergency record correction:


sudo rndc flushname <FQDN>

Flush an entire domain subtree only when necessary:


sudo rndc flushtree <DOMAIN>

Flush the complete cache only as a last resort:


sudo rndc flush

Do not routinely flush the cache after every authoritative zone reload. Authoritative data is reloaded separately, and unnecessary full-cache flushes increase external query volume and latency.


36. Logging and Troubleshooting

Primary checks:


systemctl status bind9 --no-pager
sudo journalctl -u bind9 -n 200 --no-pager
sudo journalctl -u bind9 --since today --no-pager
sudo named-checkconf
sudo rndc status
sudo ss -lntup | grep -E ':53\b'
sudo ufw status verbose

Query diagnostics:


dig @127.0.0.1 aspireclan.com SOA +comments +answer
dig @192.168.8.4 github.com A +comments +answer +stats
dig @192.168.8.4 <FQDN> <TYPE> +tcp +comments +answer

Use +trace carefully: it traces delegation directly from the querying machine and does not test the same forwarding path as an ordinary recursive query. For normal resolver validation, query @192.168.8.4 without +trace.

Common failure categories:

| Symptom | Likely cause | First checks |
| --- | --- | --- |
| Internal name intermittently returns NXDOMAIN | Client has public resolvers configured beside 192.168.8.4 | resolvectl status, Netplan, DHCP options |
| All public names fail | Forwarders unreachable, outbound firewall issue, or BIND stopped | rndc status, journal, direct reachability to forwarders |
| One managed-zone public name fails internally | Record missing from the internal authoritative zone | Zone file, public DNS comparison, SOA serial |
| UDP works but TCP fails | Missing TCP 53 UFW rule or listener problem | ufw status, ss -lntup, dig +tcp |
| Changes do not appear after Git push | Workflow skipped Ansible because Terraform had no VM changes | Run workflow dispatch with ansible_limit=prod-dns-01 |
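
For the UDP-versus-TCP symptom, both transports can be compared in one pass. A minimal sketch, assuming dig is installed on the querying client; the function name transport_check is illustrative. Lines beginning with a semicolon are discarded because dig can print timeout notices on standard output.

```shell
#!/usr/bin/env bash
# Query one name over UDP and then TCP and report which transport answered.
# Arguments: server, name, record type. Prints one line per transport.
transport_check() {
  local server="$1" name="$2" rtype="$3" transport flag answer
  for transport in udp tcp; do
    flag="+notcp"
    [ "${transport}" = "tcp" ] && flag="+tcp"
    # Drop dig's ";; ..." diagnostic lines so only record data counts as an answer.
    answer="$(dig "@${server}" "${name}" "${rtype}" "${flag}" +short +time=3 +tries=1 2>/dev/null | grep -v '^;')"
    if [ -n "${answer}" ]; then
      echo "${transport}: ok"
    else
      echo "${transport}: no answer"
    fi
  done
}

# Usage (not run here):
#   transport_check 192.168.8.4 aspireclan.com SOA
```

A result of "udp: ok" with "tcp: no answer" points at the TCP 53 UFW rule or the TCP listener, as listed in the table.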

37. Backup Procedure

Back up configuration after every approved change and before package upgrades.

Required data:


/etc/bind/named.conf
/etc/bind/named.conf.options
/etc/bind/named.conf.local
/etc/bind/named.conf.default-zones
/etc/bind/zones/
/etc/default/named when present
Repository commit containing Terraform and Ansible sources
Router DHCP reservation export or documented identity mapping
Installed package versions

Example local archive before an upgrade:


stamp="$(date -u +%Y%m%dT%H%M%SZ)"
sudo tar \
  --create \
  --gzip \
  --file "/var/backups/prod-dns-01-bind-${stamp}.tar.gz" \
  /etc/bind

This command creates a timestamped archive of /etc/bind under /var/backups.

Copy backups off the VM. Git protects declarative configuration, but it does not replace router reservation data, package inventory, or tested recovery procedures.
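
Before copying an archive off the VM, it is worth confirming that it is readable and complete. The following sketch assumes GNU tar and sha256sum are available; the function name verify_bind_archive is illustrative, and the only content check is the presence of named.conf.

```shell
#!/usr/bin/env bash
# Verify a backup archive before copying it off the VM:
# it must be a readable gzip tar and must contain named.conf.
# On success, print its SHA-256 line so the off-host copy can be compared.
verify_bind_archive() {
  local archive="$1"
  tar --list --gzip --file "${archive}" >/dev/null 2>&1 || {
    echo "unreadable archive: ${archive}" >&2
    return 1
  }
  tar --list --gzip --file "${archive}" | grep -q 'etc/bind/named\.conf$' || {
    echo "named.conf missing from: ${archive}" >&2
    return 1
  }
  sha256sum "${archive}"
}

# Usage (not run here):
#   sudo bash -c "$(declare -f verify_bind_archive); verify_bind_archive /var/backups/prod-dns-01-bind-<STAMP>.tar.gz"
```

Recomputing the checksum on the off-host copy and comparing it with the printed value confirms that the transfer did not corrupt the archive.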


38. Disaster Recovery and Restore

Required recovery inputs:


Terraform and Ansible repository
Router reservation for aa:bb:cc:04:03:01 → 192.168.8.4
Proxmox VM identity and template
SSH administrative access
Validated BIND configuration and zone files
Latest known package-version record
Off-host configuration backup

Recovery order:


1. Isolate or power off the failed VM.
2. Confirm 192.168.8.4 is safe to reuse.
3. Recreate prod-dns-01 through Terraform.
4. Verify DHCP assigned 192.168.8.4.
5. Verify and enroll the new SSH host key.
6. Run the common bootstrap with --limit prod-dns-01.
7. Deploy BIND and zones through Ansible.
8. Apply the final fail-closed firewall policy.
9. Validate every managed zone over UDP and TCP.
10. Validate public forwarding and DNSSEC.
11. Reconfigure the host to resolve through local BIND.
12. Validate critical infrastructure names.
13. Monitor logs and client behavior.

A single DNS server is a single point of failure. A future prod-dns-02 should be added before DNS availability becomes a strict production requirement.


39. Reboot Validation

Before reboot:


sudo named-checkconf
sudo systemctl is-enabled bind9
sudo systemctl is-active bind9
sudo ufw status verbose
resolvectl status

Reboot:


sudo reboot

After reboot, validate from an independent approved client:


for zone in aspireclan.com tidyshelves.com shelvera.com; do
  dig @192.168.8.4 "${zone}" SOA +short +time=3 +tries=1
  dig @192.168.8.4 "${zone}" SOA +tcp +short +time=3 +tries=1
done

dig @192.168.8.4 github.com A +short +time=3 +tries=1

On the server:


systemctl is-active bind9
sudo rndc status
sudo journalctl -u bind9 -b --no-pager

40. Package Update Procedure

BIND receives security updates from Ubuntu. Do not freeze it indefinitely.

Before updating:


1. Review Ubuntu security notices and package changelog.
2. Validate current BIND configuration and zones.
3. Save an off-host configuration backup.
4. Record the current named -v output.
5. Confirm an approved maintenance window.

Update:


sudo apt update
apt list --upgradable 2>/dev/null | grep -E '^(bind9|bind9-utils|bind9utils|dnsutils)/' || true
sudo apt upgrade

After update:


named -v
sudo named-checkconf
sudo systemctl restart bind9
systemctl is-active bind9
sudo journalctl -u bind9 -n 100 --no-pager

Then run the complete UDP, TCP, internal-zone, forwarding, DNSSEC, and UFW validation checklist.


41. Monitoring and Alerting

At minimum, monitor:


Host reachability: 192.168.8.4
UDP 53 response
TCP 53 response
SOA query for every managed zone
Recursive query for a stable public domain
BIND systemd state
Disk space
Memory pressure
Time synchronization
UFW state
Recent BIND errors
Zone serial drift

Recommended probe interval:


Every 1 to 5 minutes for service health
After every deployment for full functional validation
Daily for backup presence and package/security review metadata

Do not use a probe that exposes secrets or attempts zone transfers. A normal SOA query is sufficient for authoritative-zone health.
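
A health probe built on a normal SOA query can also detect zone serial drift. This is a sketch under assumptions: the expected serial is passed in by the caller (for example, read from the zone file in Git), the function names are illustrative, and the exit code 2 follows the common monitoring-plugin convention for a critical state.

```shell
#!/usr/bin/env bash
# Extract the serial (third field) from `dig +short SOA` output on stdin.
# Only a purely numeric third field is accepted, so dig error text is ignored.
soa_serial() {
  awk '$3 ~ /^[0-9]+$/ { print $3; exit }'
}

# Probe one zone: return 0 and print "OK" when the served serial matches the
# expected one; return 2 and print "CRITICAL" on no answer or serial drift.
probe_zone() {
  local server="$1" zone="$2" expected="$3" served
  served="$(dig "@${server}" "${zone}" SOA +short +time=3 +tries=1 2>/dev/null | soa_serial)"
  if [ -z "${served}" ]; then
    echo "CRITICAL: no SOA answer for ${zone}"
    return 2
  elif [ "${served}" != "${expected}" ]; then
    echo "CRITICAL: ${zone} serial ${served}, expected ${expected}"
    return 2
  fi
  echo "OK: ${zone} serial ${served}"
}

# Usage (not run here; <EXPECTED_SERIAL> is a placeholder):
#   probe_zone 192.168.8.4 aspireclan.com <EXPECTED_SERIAL>
```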


42. Future Secondary DNS Design

The next availability improvement is:


prod-dns-01: primary
prod-dns-02: secondary

Future requirements:

  • A unique reserved IP, MAC, and VM ID for prod-dns-02.
  • A second failure domain where practical.
  • TSIG-authenticated zone transfers.
  • allow-transfer restricted to the secondary.
  • also-notify or normal NOTIFY behavior restricted to the secondary.
  • Client DHCP/Netplan configuration listing both internal DNS servers.
  • Independent monitoring of both servers.
  • Tested primary loss and recovery.

Do not list a public resolver as the second client DNS server. The correct redundancy model is two internal split-DNS servers containing the same zones.


43. Complete Validation Checklist

Run before declaring the DNS service ready:


# Identity and networking.
hostnamectl --static
ip -brief address
ip route

# Time and firewall.
timedatectl status
sudo ufw status verbose

# Package, configuration, and service.
named -v
sudo named-checkconf
sudo named-checkzone aspireclan.com /etc/bind/zones/db.aspireclan.com
sudo named-checkzone tidyshelves.com /etc/bind/zones/db.tidyshelves.com
sudo named-checkzone shelvera.com /etc/bind/zones/db.shelvera.com
systemctl is-enabled bind9
systemctl is-active bind9
sudo rndc status
sudo ss -lntup | grep -E ':53\b'

# Authoritative UDP and TCP.
for zone in aspireclan.com tidyshelves.com shelvera.com; do
  dig @192.168.8.4 "${zone}" SOA +short +time=3 +tries=1
  dig @192.168.8.4 "${zone}" SOA +tcp +short +time=3 +tries=1
done

# Forwarding and DNSSEC.
dig @192.168.8.4 github.com A +short +time=3 +tries=1
dig @192.168.8.4 cloudflare.com A +dnssec +comments

# Critical internal names.
dig @192.168.8.4 harbor.aspireclan.com A +short
dig @192.168.8.4 ac-cicd-api.aspireclan.com A +short

# Logs.
sudo journalctl -u bind9 -n 100 --no-pager

Required final state:


Hostname: prod-dns-01
Reserved IP: 192.168.8.4
BIND: active and enabled
Authoritative zones: aspireclan.com, tidyshelves.com, shelvera.com
Public forwarding: working through BIND
DNSSEC validation: enabled
UDP 53: approved client CIDRs only
TCP 53: approved client CIDRs only
SSH: approved management CIDRs only
UFW: active, deny incoming, deny routed, allow outgoing
Infrastructure clients: 192.168.8.4 only
Public internet exposure: none
Off-host backup: available

44. Mandatory Security Rules


No public inbound DNS access.
No open recursive resolver.
No TCP or UDP 53 rule for Anywhere.
No public resolver configured beside 192.168.8.4 on infrastructure clients.
No BIND forwarder pointing back to 192.168.8.4.
No unauthenticated dynamic DNS updates.
No unrestricted zone transfers.
No permanent StrictHostKeyChecking=no after bootstrap.
No manual production change left uncommitted in Git.
No zone deployment without named-checkconf and named-checkzone.
No zone change without an incremented SOA serial.
No DNS-only Git deployment assumed successful when Ansible was skipped.
No unrelated service hosted on prod-dns-01.
No backup retained only on prod-dns-01.
No second client resolver that bypasses internal split DNS.
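
The rule against an Anywhere source on port 53 can be audited from the UFW rule listing. A minimal sketch, assuming rules appear in port-first form (for example "53/udp ALLOW IN 192.168.8.0/22"); the function name open_dns_rules is illustrative, and rules written as application profiles or bound to a specific destination address are not detected.

```shell
#!/usr/bin/env bash
# Print any `ufw status` rule that allows port 53 from Anywhere.
# Reads ufw output on stdin; prints nothing when no such rule exists.
# Assumes the first column is the port, e.g. "53", "53/udp", or "53/tcp".
open_dns_rules() {
  awk '$1 ~ /^53(\/(tcp|udp))?$/ && /ALLOW/ && /Anywhere/ { print }'
}

# Usage (not run here):
#   sudo ufw status verbose | open_dns_rules
```

Empty output is the expected result on prod-dns-01; any printed line identifies a rule to remove.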

45. Complete Implementation Order

For a new build or disaster-recovery rebuild, proceed in this order:


1. Confirm the approved DNS VM identity.
2. Verify the IP and VM ID are not in conflicting use.
3. Create or confirm the router reservation.
4. Confirm dns.tfvars, variables.tf, main.tf, and outputs.tf.
5. Confirm the [dns] inventory entry.
6. Run Terraform init, format check, validate, and plan.
7. Confirm only prod-dns-01 is created or intentionally replaced.
8. Apply the reviewed plan.
9. Verify DHCP address and SSH access.
10. Verify the SSH host key through the Proxmox console.
11. Run the common Ubuntu bootstrap.
12. Install BIND9 packages.
13. Correct the shelvera.com ns1 address to 192.168.8.4.
14. Review all zone records and increment each changed serial.
15. Render the hardened named.conf.options.
16. Deploy named.conf.local and zone files.
17. Run named-checkconf and named-checkzone.
18. Add role-specific UFW rules.
19. Enable and start BIND.
20. Validate listeners, UDP, TCP, and all zones locally.
21. Apply the final fail-closed firewall baseline.
22. Validate all zones remotely over UDP and TCP.
23. Validate public forwarding and DNSSEC.
24. Configure prod-dns-01 to resolve through local BIND.
25. Configure infrastructure clients to use 192.168.8.4 only.
26. Reboot and repeat functional validation.
27. Create and copy an off-host backup.
28. Enable monitoring.
29. Record the installed BIND version and validation date.

For an existing server update, begin at Section 25 and do not run Terraform apply unless the reviewed plan contains an intentional infrastructure change.


46. Source Consistency Status

This page was created from the attached Terraform repository and the supplied Production Vault MDX layout reference.


Page type: new file
Output file: prod-dns-01-setup.mdx
Prior DNS MDX supplied: no
Diff baseline: /dev/null
Attached Terraform Git HEAD: b2ec4fa
Layout reference: Production Vault Setup / K8S Infrastructure Overview style
Docusaurus left documentation sidebar: preserved
Desktop table of contents: right side, 260px
Custom floating anchor panel: not created
CustomCodeBlock usage: consistent
Section numbering: sequential 1 through 48

Source values preserved from the attached repository:


VM: prod-dns-01
VM ID: 3156004
MAC: aa:bb:cc:04:03:01
Reserved IP: 192.168.8.4
CPU: 2
RAM: 4096 MiB
Disk: 40G
Template: tmplt-ub-26-min-base
Management CIDR: 192.168.8.0/24
Current DNS client CIDR: 192.168.8.0/22
Forwarders: 1.1.1.1 and 8.8.8.8
Zones: aspireclan.com, tidyshelves.com, shelvera.com
Ansible collection: community.general 13.0.1

Meaningful corrections and enhancements in this new page:


1. Corrects the documented shelvera.com ns1 address from 192.168.8.60 to 192.168.8.4.
2. Identifies templates/named.conf.options.j2 as the active source and the duplicate static named.conf.options as obsolete.
3. Adds forward only to prevent unintended direct-recursion fallback.
4. Adds explicit listen addresses, transfer denial, minimal responses, and version hiding.
5. Explains authoritative split-DNS NXDOMAIN behavior and the need to preserve required public records internally.
6. Requires TCP and UDP DNS validation.
7. Adds secure SSH host-key enrollment instead of permanent host-key checking bypass.
8. Documents the DNS server's own local-resolver bootstrap sequence without creating a forwarding loop.
9. Identifies the current workflow behavior that skips Ansible for DNS-only changes.
10. Establishes workflow_dispatch with ansible_limit=prod-dns-01 as the approved existing-server update path.
11. Adds SOA serial, backup, restore, reboot, monitoring, package-update, and secondary-DNS procedures.
12. Clearly identifies separate first execution sections for a new build and an existing-server update.

The attached archive does not include a Docusaurus documentation repository, package.json, site configuration, or the CustomCodeBlock implementation. Therefore, a complete Docusaurus production build cannot be run from the supplied files alone. The accompanying validation report records the static MDX/JSX, structure, source, and embedded-configuration checks that were possible.


47. Official References


48. Continuation Prompt

Use this prompt when continuing implementation or reviewing a DNS change:

We are maintaining prod-dns-01, the internal Aspireclan BIND9 server at 192.168.8.4. It is provisioned through Terraform from tmplt-ub-26-min-base and configured through Ansible. Ubuntu uses DHCP with the router reservation aa:bb:cc:04:03:01 → 192.168.8.4.

Preserve the normal Docusaurus page structure and use the Production DNS Setup page as the operational source. Infrastructure clients must use 192.168.8.4 only. Public resolvers belong only in BIND's forwarders. BIND is authoritative for the internal aspireclan.com, tidyshelves.com, and shelvera.com zones and forwards other public queries to the approved forwarders.

Before applying a DNS change, validate named-checkconf, validate every changed zone with named-checkzone, increment the SOA serial, review the exact diff, and deploy only to prod-dns-01. Test every zone over UDP and TCP, test public forwarding, test DNSSEC, and verify the fail-closed UFW policy.

For DNS-only changes to the existing VM, use the production workflow's manual dispatch with ansible_limit=prod-dns-01 until component-based Ansible targeting is implemented. Do not recreate the VM for a normal BIND or zone update.