Configure Internal DNS Resolver on Cloned VMs

1. 🧭 Scope Legend

Use these markers to distinguish fixed instructions from values that identify this environment and workflow.

1. COMMON · NO CHANGE NEEDED: Run exactly as documented for the reusable non-DNS clone resolver role, rollout, validation, persistence test, and rollback workflow.
2. CHANGE FOR DNS CLIENT ROLLOUT: Review values that identify prod-dns-01, validation records, inventory groups, rollout size, and Ansible paths.

2. 🎯 Purpose and Scope

COMMON · NO CHANGE NEEDED: Make every non-DNS Ubuntu clone use prod-dns-01 exclusively while retaining DHCP addressing and router-side MAC-to-IP reservations.
DOCUMENT REVISION
  Version: 2026-06-16.2
  Purpose: Make every non-DNS Ubuntu clone use prod-dns-01 exclusively while retaining DHCP addressing.

SCOPE
  Applies to Vault, certificate issuer, proxies, Terraform deployment servers,
  web/app servers, Kubernetes nodes, and future non-DNS clones.

EXCLUSION
  prod-dns-01 is configured by bootstrap-prod-dns-01.mdx and must not receive this client role.

Netplan continues to obtain the IP address, route, and gateway from DHCP. Only DHCP-provided DNS and search domains are rejected. Public DNS resolution remains available through BIND forwarding on 192.168.8.4.
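
The split described above (address from DHCP, resolver pinned) can be spot-checked from the address side. The sketch below assumes that iproute2 marks lease-managed addresses installed by systemd-networkd with the dynamic flag; the helper name address_origin and the interface name ens18 are hypothetical.

```shell
# Hypothetical helper: reads `ip -o -4 addr show dev IFACE` output on stdin
# and reports whether any address carries the "dynamic" (lease-managed) flag.
# Assumption: DHCP addresses installed by systemd-networkd show this flag.
address_origin() {
  if grep -Eq '(^|[[:space:]])dynamic([[:space:]]|$)'; then
    echo dynamic
  else
    echo not-dynamic
  fi
}

# Typical use on a clone (not executed here):
#   ip -o -4 addr show dev ens18 | address_origin
```

A result of dynamic alongside a resolver list of exactly 192.168.8.4 would be consistent with DHCP still supplying the address while its DNS offer is ignored.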

Status and consistency update — June 16, 2026 · revision 2026-06-16.2: this page now uses the same component placement, scope legend, scoped section headings, full-width content grid, 260px right anchor panel, mobile behavior, input styling, and browser-persisted values as the working tmplt-ub-26-min-base page.

3. 🧱 Resolver Inputs

CHANGE FOR DNS CLIENT ROLLOUT: Review the DNS server, test records, target inventory group, rolling batch size, and Ansible paths. These inputs are stored locally in this browser.

4. 🧭 Inventory Boundary

COMMON · NO CHANGE NEEDED / CHANGE FOR DNS CLIENT ROLLOUT: Target only non-DNS clones. prod-dns-01 must remain outside the internal DNS client group.
[prod_dns]
prod-dns-01 ansible_host=192.168.8.4

[internal_dns_clients:children]
vault
certificate_issuers
proxies
terraform_deploy
web
app
k8s

# Add the real hosts under the child groups above.
# Do not add prod-dns-01 to internal_dns_clients.

The client group can contain Vault, certificate issuer, proxy, Terraform deployment, web, app, and Kubernetes hosts. The DNS server itself must remain outside this group.
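
The boundary can also be checked before a rollout rather than relying only on the role's own assertion. This is a minimal sketch: host_absent is a hypothetical helper, and it assumes the host list comes from the ansible --list-hosts output, which prints one indented host per line under a "hosts (N):" header.

```shell
# Hypothetical guard: reads a host list on stdin (one host per line, leading
# and trailing whitespace ignored) and fails if the named host is present.
host_absent() {
  if sed -E 's/^[[:space:]]+//; s/[[:space:]]+$//' | grep -Fxq -- "$1"; then
    return 1
  fi
  return 0
}

# Typical use from the repository root (not executed here):
#   ansible internal_dns_clients -i envs/common/ansible/inventory.ini --list-hosts \
#     | host_absent prod-dns-01 \
#     || echo "ERROR: prod-dns-01 is in internal_dns_clients" >&2
```

The match is on the whole line, so a host such as prod-dns-010 would not trigger a false positive.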

5. 🧩 Create the Reusable Ansible Role

COMMON · NO CHANGE NEEDED / CHANGE FOR DNS CLIENT ROLLOUT: Create the defaults, Netplan template, and validation tasks that enforce prod-dns-01 as the only resolver on non-DNS clones.

5.1 envs/common/ansible/roles/internal-dns-resolver/defaults/main.yml

---
internal_dns_server: "192.168.8.4"
internal_dns_interface_override: ""
internal_dns_netplan_path: "/etc/netplan/99-internal-dns.yaml"
internal_dns_internal_test_name: "harbor.aspireclan.com"
internal_dns_internal_test_expected_ipv4: "192.168.8.5"
internal_dns_public_test_name: "ubuntu.com"
internal_dns_excluded_hosts:
  - "prod-dns-01"

5.2 envs/common/ansible/roles/internal-dns-resolver/templates/99-internal-dns.yaml.j2

network:
  version: 2
  renderer: networkd
  ethernets:
    {{ internal_dns_interface }}:
      dhcp4: true
      dhcp4-overrides:
        use-dns: false
        use-domains: false
      nameservers:
        addresses:
          - {{ internal_dns_server }}

5.3 envs/common/ansible/roles/internal-dns-resolver/tasks/main.yml

---
- name: Refuse to apply the client resolver role to the DNS server
  ansible.builtin.assert:
    that:
      - inventory_hostname not in internal_dns_excluded_hosts
    fail_msg: >-
      This role is for non-DNS clones. Configure prod-dns-01 through the
      bootstrap-prod-dns-01 procedure instead.

- name: Select the primary network interface
  ansible.builtin.set_fact:
    internal_dns_interface: >-
      {{
        internal_dns_interface_override
        if internal_dns_interface_override | length > 0
        else ansible_default_ipv4.interface
      }}

- name: Validate the selected interface
  ansible.builtin.assert:
    that:
      - internal_dns_interface | length > 0
      - internal_dns_interface in ansible_interfaces
    fail_msg: "Unable to identify a valid primary interface for internal DNS."

- name: Prove prod-dns-01 resolves the required internal record before changing Netplan
  ansible.builtin.command:
    argv:
      - dig
      - "@{{ internal_dns_server }}"
      - "{{ internal_dns_internal_test_name }}"
      - A
      - +short
  register: internal_dns_direct_internal_query
  changed_when: false
  failed_when: >-
    internal_dns_internal_test_expected_ipv4 not in
    internal_dns_direct_internal_query.stdout_lines

- name: Prove prod-dns-01 forwards a public query before changing Netplan
  ansible.builtin.command:
    argv:
      - dig
      - "@{{ internal_dns_server }}"
      - "{{ internal_dns_public_test_name }}"
      - A
      - +short
  register: internal_dns_direct_public_query
  changed_when: false
  failed_when: >-
    internal_dns_direct_public_query.stdout_lines |
    select('match', '^[0-9]+(\\.[0-9]+){3}$') |
    list |
    length == 0

- name: Install the internal DNS Netplan override
  ansible.builtin.template:
    src: 99-internal-dns.yaml.j2
    dest: "{{ internal_dns_netplan_path }}"
    owner: root
    group: root
    mode: "0600"
  register: internal_dns_netplan

- name: Validate the complete Netplan configuration
  ansible.builtin.command: netplan generate
  changed_when: false

- name: Apply the internal DNS Netplan override
  ansible.builtin.command: netplan apply
  when: internal_dns_netplan.changed

- name: Wait for SSH after applying Netplan
  ansible.builtin.wait_for_connection:
    delay: 2
    sleep: 2
    timeout: 90

- name: Flush systemd-resolved caches
  ansible.builtin.command: resolvectl flush-caches
  changed_when: false

- name: Verify that the selected link uses only prod-dns-01
  ansible.builtin.shell: |
    set -euo pipefail
    actual="$(resolvectl dns '{{ internal_dns_interface }}' | sed -E 's/^Link [0-9]+ \([^)]*\):[[:space:]]*//' | xargs)"

    if [ "${actual}" != "{{ internal_dns_server }}" ]; then
      echo "ERROR: Expected only {{ internal_dns_server }}, found: ${actual}" >&2
      exit 1
    fi

    if printf '%s\n' "${actual}" | grep -Eq '(^|[[:space:]])(1\.1\.1\.1|8\.8\.8\.8)([[:space:]]|$)'; then
      echo "ERROR: A public resolver remains on the client link." >&2
      exit 1
    fi
  args:
    executable: /bin/bash
  changed_when: false

- name: Verify internal resolution through the system resolver
  ansible.builtin.command:
    argv:
      - getent
      - ahostsv4
      - "{{ internal_dns_internal_test_name }}"
  register: internal_dns_system_internal_query
  changed_when: false
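  # Compare whole addresses (first column) so 192.168.8.5 cannot match 192.168.8.50.
  failed_when: >-
    internal_dns_internal_test_expected_ipv4 not in
    (internal_dns_system_internal_query.stdout_lines | select |
    map('split') | map('first') | list)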

- name: Verify public resolution through BIND forwarding
  ansible.builtin.command:
    argv:
      - getent
      - ahostsv4
      - "{{ internal_dns_public_test_name }}"
  changed_when: false

6. 📄 Create the Main Playbook

COMMON · NO CHANGE NEEDED / CHANGE FOR DNS CLIENT ROLLOUT: Apply the reusable resolver role in controlled rolling batches instead of changing all DNS clients at once.

6.1 envs/common/ansible/configure-internal-dns-resolver.yml

---
- name: Configure the internal resolver on non-DNS Ubuntu clones
  hosts: internal_dns_clients
  become: true
  gather_facts: true
  serial: "25%"

  roles:
    - role: internal-dns-resolver

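The serial value of 25% reconfigures the DNS clients in rolling batches instead of all at once. By default, Ansible aborts the play only when every host in a batch fails; a host that fails validation is dropped from the run while the remaining batches continue. Set max_fail_percentage or any_errors_fatal on the play if a single failed host should stop the rollout.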
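
To estimate how many hosts each batch touches, the percentage can be converted by hand. The sketch below assumes that Ansible rounds a percentage serial down to a whole number of hosts with a minimum of one per batch; batch_size and batch_count are hypothetical helpers, not Ansible commands.

```shell
# Hosts per batch for a percentage serial value.
# Assumption: floor rounding, with at least one host per batch.
batch_size() (
  hosts="$1"
  percent="$2"
  size=$(( hosts * percent / 100 ))
  [ "$size" -ge 1 ] || size=1
  printf '%s\n' "$size"
)

# Number of batches needed to cover every host (the last one may be smaller).
batch_count() (
  hosts="$1"
  size="$(batch_size "$1" "$2")"
  printf '%s\n' $(( (hosts + size - 1) / size ))
)

# Example: batch_size 10 25 prints 2, and batch_count 10 25 prints 5.
```

With a small inventory the minimum applies: three hosts at 25% run one host at a time, in three batches.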

7. ▶️ Apply to Non-DNS Clones

COMMON · NO CHANGE NEEDED / CHANGE FOR DNS CLIENT ROLLOUT: Run the rollout only after bootstrap-prod-dns-01 reports a passing DNS health gate.

set -euo pipefail

ansible-playbook \
  -i "envs/common/ansible/inventory.ini" \
  "envs/common/ansible/configure-internal-dns-resolver.yml" \
  --diff

8. 🔄 Prove DHCP Renewal and Reboot Persistence

COMMON · NO CHANGE NEEDED / CHANGE FOR DNS CLIENT ROLLOUT: Use exactly one canary clone to prove DHCP renewal cannot restore router-advertised DNS and the setting survives reboot.

8.1 envs/common/ansible/verify-internal-dns-persistence.yml

---
- name: Prove internal DNS survives DHCP renewal and reboot on one canary clone
  hosts: internal_dns_clients
  become: true
  gather_facts: true
  serial: 1
  any_errors_fatal: true

  vars:
    internal_dns_canary_required: true

  pre_tasks:
    - name: Require an explicit one-host limit
      ansible.builtin.assert:
        that:
          - ansible_play_hosts_all | length == 1
        fail_msg: >-
          Run this persistence test with --limit and exactly one non-DNS canary host.

  tasks:
    - name: Include the internal DNS resolver role
      ansible.builtin.include_role:
        name: internal-dns-resolver

    - name: Renew the DHCP lease through systemd-networkd
      ansible.builtin.command:
        argv:
          - networkctl
          - renew
          - "{{ internal_dns_interface }}"
      changed_when: true

    - name: Wait for SSH after DHCP renewal
      ansible.builtin.wait_for_connection:
        delay: 3
        sleep: 3
        timeout: 120

    - name: Re-verify DNS after DHCP renewal
      ansible.builtin.include_role:
        name: internal-dns-resolver

    - name: Reboot the canary clone
      ansible.builtin.reboot:
        reboot_timeout: 600
        post_reboot_delay: 10

    - name: Re-verify DNS after reboot
      ansible.builtin.include_role:
        name: internal-dns-resolver

8.2 Run against exactly one canary

set -euo pipefail

CANARY_HOST="dev-app-01"

ansible-playbook \
  -i "envs/common/ansible/inventory.ini" \
  "envs/common/ansible/verify-internal-dns-persistence.yml" \
  --limit "${CANARY_HOST}" \
  --diff

After the canary passes, the same Netplan file can be expected to behave the same way on the remaining non-DNS clones built from the same template. The test shows, on that one host, that a DHCP renewal does not reintroduce router-advertised public resolvers and that the setting survives reboot.

9. 🧪 Manual Validation

COMMON · NO CHANGE NEEDED / CHANGE FOR DNS CLIENT ROLLOUT: Confirm the selected interface uses exactly prod-dns-01 and both internal and public resolution succeed.

Run on any configured clone. The expected final DNS list is exactly 192.168.8.4.

set -euo pipefail

DNS_SERVER="192.168.8.4"
INTERNAL_NAME="harbor.aspireclan.com"
INTERNAL_IP="192.168.8.5"
PUBLIC_NAME="ubuntu.com"
PRIMARY_IFACE="$(ip -o route show default | awk '{print $5; exit}')"

actual="$(resolvectl dns "${PRIMARY_IFACE}" | sed -E 's/^Link [0-9]+ \([^)]*\):[[:space:]]*//' | xargs)"
printf 'Interface: %s\nDNS:       %s\n' "${PRIMARY_IFACE}" "${actual}"
test "${actual}" = "${DNS_SERVER}"
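# A command prefixed with ! is exempt from set -e, so test explicitly.
if printf '%s\n' "${actual}" | grep -Eq '(^|[[:space:]])(1\.1\.1\.1|8\.8\.8\.8)([[:space:]]|$)'; then
  echo 'ERROR: A public resolver remains on the client link.' >&2
  exit 1
fi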

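# Match the address column exactly so 192.168.8.5 cannot match 192.168.8.50.
getent ahostsv4 "${INTERNAL_NAME}" | awk -v ip="${INTERNAL_IP}" '$1 == ip { found = 1; print } END { exit !found }'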
getent ahostsv4 "${PUBLIC_NAME}" >/dev/null

dig "@${DNS_SERVER}" "${INTERNAL_NAME}" A +short | grep -Fx "${INTERNAL_IP}"
dig "@${DNS_SERVER}" "${PUBLIC_NAME}" A +short | grep -Eq '^[0-9]+(\.[0-9]+){3}$'

echo 'INTERNAL DNS CLIENT VALIDATION: PASS'
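
The exact-match test in the script depends on how the resolvectl output is normalised. The sketch below isolates that step so it can be tried on sample text. link_dns_list is a hypothetical wrapper around the same sed expression, and the sample lines assume the "Link N (iface): servers" format with ens18 as an example interface name.

```shell
# Hypothetical helper: strips the "Link N (iface):" prefix from
# `resolvectl dns IFACE` output and collapses whitespace, leaving only the
# space-separated resolver list.
link_dns_list() {
  sed -E 's/^Link [0-9]+ \([^)]*\):[[:space:]]*//' | xargs
}

# Example: a correctly pinned link reduces to the single expected address.
#   printf 'Link 2 (ens18): 192.168.8.4\n' | link_dns_list
# A link that kept a DHCP-provided resolver yields two addresses, so the
# string comparison against 192.168.8.4 fails.
#   printf 'Link 2 (ens18): 192.168.8.4 1.1.1.1\n' | link_dns_list
```

Because the comparison is against the whole list, any extra resolver fails the check, not only the two public addresses named in the script.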

10. ↩️ Rollback

COMMON · NO CHANGE NEEDED: Remove only the clone-side Netplan override and return the VM to DHCP-provided DNS.

Use the Proxmox console if SSH or DNS is unavailable.

set -euo pipefail

sudo rm -f /etc/netplan/99-internal-dns.yaml
sudo netplan generate
sudo netplan apply
sudo resolvectl flush-caches

PRIMARY_IFACE="$(ip -o route show default | awk '{print $5; exit}')"
resolvectl dns "${PRIMARY_IFACE}"

11. 🏁 Finished State

COMMON · NO CHANGE NEEDED / CHANGE FOR DNS CLIENT ROLLOUT: Stop here only after rollout, canary persistence, and manual validation all pass.
ACCEPTANCE CHECKPOINT
  DHCP addressing:                  retained
  Router MAC-to-IP reservation:     retained
  Client DNS server:                192.168.8.4 only
  Public resolvers on clients:      absent
  Internal record:                  harbor.aspireclan.com -> 192.168.8.5
  Public forwarding test:           ubuntu.com resolves
  DHCP renewal persistence:         passed on a canary
  Reboot persistence:               passed on a canary
  DNS server excluded from role:    prod-dns-01
