Configure Repository ARC Runner Scale Set
1. 🧭 Scope Legend
The scale set is repository-specific and environment-specific, while the cluster, controller, and generic Ansible role remain shared.
2. 🎯 Page Purpose and Isolation Boundary
THIS PAGE CREATES
One environment-specific Kubernetes namespace
One GitHub App Kubernetes Secret in that namespace
One repository and environment ARC runner scale set
One ARC listener resource and listener pod in the shared controller namespace
One repository branch-to-scale-set mapping
Ephemeral runner pods only when jobs are queued
THIS PAGE REUSES
The shared ARC controller
The GitHub App and protected credential environment from page 9
The completed dev, QA, and production worker pools
The generic exact-409 / Pod-UID-replacement recovery and one-retry logic
THIS PAGE DOES NOT
Install another ARC controller
Commit any private key or Kubernetes Secret manifest
Create permanent runner VMs
Share runner pods across environments
Automatically grant unrelated repositories access
COMMON · NO CHANGE NEEDED
Shared Kubernetes cluster and API VIP
Shared ARC controller in arc-systems
ARC custom resource definitions
First-control-plane administration path
Generic modular ARC runner-scale-set Ansible role
Server-side rendered-manifest validation
Exact 409 and rapid same-name listener replacement detection
Scoped cleanup, 120-second lease wait, and one automatic retry
Same-Pod-UID listener stability and diagnostic-log verification
CHANGE PER GITHUB ORG / PRODUCT
Tenant short form
GitHub organization
Protected organization credential environment
GitHub App identifiers and private-key secret
Kubernetes GitHub App secret name
Harbor project naming
CHANGE PER REPOSITORY
GitHub repository name and URL
Runner scale-set name
Helm release name
Repository runner group
Repository-specific values and workflow
CHANGE PER DEPLOYMENT BRANCH
dev, qa, or prod source branch
Kubernetes runner namespace
Worker node selector and taint toleration
Minimum and maximum runner capacity
GitHub repository Environment
runs-on value used by application workflows (the runs-on value generated on this page)
3. 🧾 Required Inputs
Common cluster inputs
GitHub organization or product inputs
Repository inputs
Deployment-branch inputs
4. 🧮 Derived Names and Routing Values
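The values in this section all follow from four inputs: the GitHub organization, the repository name, the tenant short form, and the environment. The sketch below shows one way to compute them. The helper name `derive_names`, the field names, and the sample inputs are hypothetical, and treating the environment as a parameter is an assumption, since this page only shows the dev values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DerivedNames:
    """Routing values derived from the page inputs (field names are illustrative)."""
    repository_url: str
    credential_environment: str
    runner_namespace: str
    github_app_secret: str
    scale_set_name: str
    node_selector: dict
    toleration: str


def derive_names(organization: str, repository: str,
                 short_form: str, environment: str) -> DerivedNames:
    """Hypothetical helper: build every derived value from the four inputs."""
    return DerivedNames(
        repository_url=f"https://github.com/{organization}/{repository}",
        credential_environment=f"arc-org-{short_form}",
        runner_namespace=f"arc-runners-{short_form}-{environment}",
        github_app_secret=f"{short_form}-arc-ghapp-secret",
        # The same value is used as the scale-set name, Helm release, and runs-on label.
        scale_set_name=f"{repository}-{environment}-arc",
        node_selector={"environment": environment, "workload": "github-runner"},
        toleration=f"environment={environment}:NoSchedule",
    )


# Sample inputs are placeholders, not values from this page.
names = derive_names("example-org", "example-repo", "ex", "dev")
print(names.runner_namespace)
print(names.scale_set_name)
```

Keeping the derivation in one place makes it harder for the namespace manifest, definition file, Helm values, and playbook to drift apart.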
https://github.com/<<GITHUB_ORGANIZATION>>/<<REPOSITORY_NAME>>
arc-org-<<APP_SHORT_FORM>>
arc-runners-<<APP_SHORT_FORM>>-dev
<<APP_SHORT_FORM>>-arc-ghapp-secret
<<REPOSITORY_NAME>>-dev-arc
<<REPOSITORY_NAME>>-dev-arc
dev
dev
environment=dev, workload=github-runner
environment=dev:NoSchedule
5. 📊 From-Scratch Sequence Checkpoint
FROM-SCRATCH SEQUENCE CHECKPOINT
Required before this page
Load balancer, API VIP, and three control planes operational
Development, QA, and production worker pools Ready
15 Kubernetes nodes Ready
Shared ARC controller deployed as arc in arc-systems
GitHub organization App and protected credential environment verified
Implemented by this page
One environment-specific runner namespace
One runtime-created GitHub App Kubernetes Secret
One repository and environment runner scale set
One repository branch-to-runs-on mapping
ARC listener verification using an unchanged Kubernetes Pod UID
Ephemeral Docker-in-Docker runner validation
Source consistency
Generic ARC role files are created once and reused unchanged
Repository/environment files must match the cleaned infrastructure repository
dev validates only; prod reconciles Kubernetes and Helm resources
Not retained from an earlier installation
Namespace, Secret, Helm release, AutoscalingRunnerSet, listener, and runner pods
must all be recreated and verified during the clean rebuild.
6. 🔬 Verify the Shared Controller and Environment Worker Pool
ssh \
  -i ~/.ssh/id_ed25519_ansible \
  -o IdentitiesOnly=yes \
  acllc@192.168.8.202 \
  'sudo bash -s' <<'REMOTE'
set -euo pipefail
export KUBECONFIG=/etc/kubernetes/admin.conf
HELM=/usr/local/bin/helm
echo "=== SHARED ARC RELEASE ==="
"${HELM}" status arc -n arc-systems
echo
echo "=== CONTROLLER DEPLOYMENT ==="
kubectl get deployment,pods -n arc-systems -o wide
echo
echo "=== ARC CRDS ==="
kubectl get crd | grep actions.github.com
echo
echo "=== CONTROLLER SERVICE ACCOUNT ==="
kubectl get serviceaccount arc-gha-rs-controller -n arc-systems
echo
echo "=== ENVIRONMENT WORKER POOL ==="
kubectl get nodes -l environment=dev,workload=github-runner -o wide
REMOTE
7. 🌿 Create the Feature Branch
feature/configure-<<REPOSITORY_NAME>>-dev-arc
↓ merge or push
dev
↓ validate YAML, Ansible targeting, and secret hygiene only
dev → prod promotion
↓ the merge creates a prod push
prod
↓ create namespace, reconcile GitHub App secret, install scale set, verify listener
There is no pull_request workflow trigger.
Pull requests may still be used for review.
The target application repository uses its own branch mapping:
dev → runs-on: <<REPOSITORY_NAME>>-dev-arc → dev workers
cd D:\code\ASPIRECLAN-LLC-Org\ac-cicd-infra
git switch dev
git pull --ff-only origin dev
git switch -c feature/configure-<<REPOSITORY_NAME>>-dev-arc
8. 📁 Create the Environment Runner Namespace Manifest
Create kubernetes/tenants/<<APP_SHORT_FORM>>/dev/namespace/namespace.yaml:
apiVersion: v1
kind: Namespace
metadata:
  name: arc-runners-<<APP_SHORT_FORM>>-dev
  labels:
    app.kubernetes.io/part-of: arc-runners
    aspireclan.com/tenant: <<APP_SHORT_FORM>>
    aspireclan.com/environment: dev
9. 🗂️ Create the Repository and Branch Definition
Create kubernetes/tenants/<<APP_SHORT_FORM>>/dev/runner-scale-sets/<<REPOSITORY_NAME>>.yaml:
schemaVersion: 1
tenant:
  displayName: "<<TENANT_OR_PRODUCT_NAME>>"
  shortForm: "<<APP_SHORT_FORM>>"
github:
  organization: "<<GITHUB_ORGANIZATION>>"
  repository: "<<REPOSITORY_NAME>>"
  repositoryUrl: "https://github.com/<<GITHUB_ORGANIZATION>>/<<REPOSITORY_NAME>>"
  credentialEnvironment: "arc-org-<<APP_SHORT_FORM>>"
  githubAppSecretName: "<<APP_SHORT_FORM>>-arc-ghapp-secret"
runnerScaleSet:
  name: "<<REPOSITORY_NAME>>-dev-arc"
  helmRelease: "<<REPOSITORY_NAME>>-dev-arc"
  namespace: "arc-runners-<<APP_SHORT_FORM>>-dev"
  chartVersion: "0.14.2"
  runnerGroup: "Default"
  containerMode: "dind"
  minRunners: 0
  maxRunners: 4
branchMapping:
  sourceBranch: "dev"
  githubEnvironment: "dev"
  runsOn: "<<REPOSITORY_NAME>>-dev-arc"
  kubernetesEnvironment: "dev"
placement:
  nodeSelector:
    environment: "dev"
    workload: github-runner
  toleration:
    key: environment
    operator: Equal
    value: "dev"
    effect: NoSchedule
security:
  privateKeyCommittedToGit: false
  kubernetesSecretManifestCommitted: false
  githubAppSecretCreatedAtRuntime: true
  runnerPodsAreEphemeral: true
10. 🎛️ Create the Helm Values File
Create helm/tenants/<<APP_SHORT_FORM>>/dev/<<REPOSITORY_NAME>>-values.yaml:
githubConfigUrl: "https://github.com/<<GITHUB_ORGANIZATION>>/<<REPOSITORY_NAME>>"
githubConfigSecret: "<<APP_SHORT_FORM>>-arc-ghapp-secret"
runnerGroup: "Default"
runnerScaleSetName: "<<REPOSITORY_NAME>>-dev-arc"
minRunners: 0
maxRunners: 4
containerMode:
  type: "dind"
controllerServiceAccount:
  namespace: "arc-systems"
  name: "arc-gha-rs-controller"
listenerTemplate:
  spec:
    containers:
      - name: listener
    nodeSelector:
      node-role.kubernetes.io/control-plane: ""
    tolerations:
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
        effect: NoSchedule
template:
  spec:
    nodeSelector:
      environment: "dev"
      workload: "github-runner"
    tolerations:
      - key: environment
        operator: Equal
        value: "dev"
        effect: NoSchedule
    topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            actions.github.com/scale-set-name: "<<REPOSITORY_NAME>>-dev-arc"
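The definition file and the Helm values file repeat several settings: the repository URL, secret name, scale-set name, runner group, capacity, container mode, and worker placement. A minimal sketch of a cross-check follows, assuming both files have already been parsed into dictionaries. The function `find_mismatches` and the sample data are hypothetical and not part of the repository.

```python
def find_mismatches(definition: dict, values: dict) -> list[str]:
    """Hypothetical helper: list settings that differ between the two files."""
    scale_set = definition["runnerScaleSet"]
    placement = definition["placement"]
    runner_spec = values["template"]["spec"]
    pairs = {
        "githubConfigUrl": (definition["github"]["repositoryUrl"], values["githubConfigUrl"]),
        "githubConfigSecret": (definition["github"]["githubAppSecretName"], values["githubConfigSecret"]),
        "runnerScaleSetName": (scale_set["name"], values["runnerScaleSetName"]),
        "runnerGroup": (scale_set["runnerGroup"], values["runnerGroup"]),
        "minRunners": (scale_set["minRunners"], values["minRunners"]),
        "maxRunners": (scale_set["maxRunners"], values["maxRunners"]),
        "containerMode": (scale_set["containerMode"], values["containerMode"]["type"]),
        "nodeSelector": (placement["nodeSelector"], runner_spec["nodeSelector"]),
    }
    problems = [
        f"{field}: definition={expected!r}, values={actual!r}"
        for field, (expected, actual) in pairs.items()
        if expected != actual
    ]
    # The definition holds one toleration; the values file holds a list of them.
    if placement["toleration"] not in runner_spec["tolerations"]:
        problems.append("toleration: definition entry is missing from template.spec.tolerations")
    return problems


# Placeholder data shaped like the two files on this page.
toleration = {"key": "environment", "operator": "Equal", "value": "dev", "effect": "NoSchedule"}
selector = {"environment": "dev", "workload": "github-runner"}
definition = {
    "github": {
        "repositoryUrl": "https://github.com/example-org/example-repo",
        "githubAppSecretName": "ex-arc-ghapp-secret",
    },
    "runnerScaleSet": {
        "name": "example-repo-dev-arc", "runnerGroup": "Default",
        "containerMode": "dind", "minRunners": 0, "maxRunners": 4,
    },
    "placement": {"nodeSelector": selector, "toleration": toleration},
}
values = {
    "githubConfigUrl": "https://github.com/example-org/example-repo",
    "githubConfigSecret": "ex-arc-ghapp-secret",
    "runnerGroup": "Default",
    "runnerScaleSetName": "example-repo-dev-arc",
    "minRunners": 0,
    "maxRunners": 4,
    "containerMode": {"type": "dind"},
    "template": {"spec": {"nodeSelector": dict(selector), "tolerations": [dict(toleration)]}},
}
print(find_mismatches(definition, values))
```

A check of this kind could run in the dev validation job, where a mismatch fails the build before anything reaches the cluster.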
When listenerTemplate is customized, listenerTemplate.spec.containers must contain a container named listener. Without it, Kubernetes rejects the AutoscalingRunnerSet with spec.listenerTemplate.spec.containers: Required value.
11. 🧰 Create the Generic Runner Scale-Set Ansible Role
Create ansible/roles/arc-runner-scale-set/defaults/main.yml:
---
arc_scale_set_admin_hostname: "cicd-ac-k8s-cp-01"
arc_scale_set_admin_ip: "192.168.8.202"
arc_scale_set_kubeconfig: "/etc/kubernetes/admin.conf"
arc_scale_set_helm_binary: "/usr/local/bin/helm"
arc_scale_set_chart: >-
  oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set
arc_scale_set_controller_namespace: "arc-systems"
arc_scale_set_namespace: ""
arc_scale_set_release_name: ""
arc_scale_set_name: ""
arc_scale_set_chart_version: ""
arc_scale_set_values_source: ""
arc_scale_set_namespace_manifest_source: ""
arc_scale_set_github_secret_name: ""
arc_scale_set_github_app_id: "{{ lookup('env', 'ARC_GITHUB_APP_ID') }}"
arc_scale_set_github_app_installation_id: >-
  {{ lookup('env', 'ARC_GITHUB_APP_INSTALLATION_ID') }}
arc_scale_set_github_app_private_key_source: >-
  {{ lookup('env', 'ARC_GITHUB_APP_PRIVATE_KEY_FILE') }}
arc_scale_set_work_root: "/etc/ac-cicd-infra/arc-runner-scale-sets"
# Recovery is narrowly scoped to the current runner scale set.
arc_scale_set_enable_session_conflict_recovery: true
arc_scale_set_session_release_wait_seconds: 120
# Discover a Ready listener without relying on a long kubectl wait against a
# name that ARC may delete and recreate with a different Kubernetes Pod UID.
arc_scale_set_listener_discovery_attempts: 60
arc_scale_set_listener_poll_seconds: 2
# The same listener Pod UID must remain Running and Ready for this interval.
arc_scale_set_listener_stability_seconds: 30
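The listener defaults above imply a wall-clock budget for each probe. The arithmetic below is a rough sketch that counts only the sleep intervals in the probe and the lease wait. It ignores kubectl latency, the Helm timeout, and the resource-deletion waits during recovery, so treat the numbers as a conservative estimate rather than a guarantee. The function name is hypothetical.

```python
import math

# Values copied from defaults/main.yml.
discovery_attempts = 60
poll_seconds = 2
stability_seconds = 30
session_release_wait_seconds = 120


def probe_sleep_budget(attempts: int, poll: int, stability: int) -> int:
    """Upper bound on seconds slept by one listener probe (hypothetical helper)."""
    discovery = attempts * poll
    # The stability loop advances in whole poll intervals until it reaches the target.
    stability_window = math.ceil(stability / poll) * poll
    return discovery + stability_window


one_probe = probe_sleep_budget(discovery_attempts, poll_seconds, stability_seconds)
# A recovered run performs two probes separated by the lease wait.
with_recovery = 2 * one_probe + session_release_wait_seconds

print(one_probe)
print(with_recovery)
```

With the defaults, one probe sleeps for at most 150 seconds, and a run that needs recovery sleeps for at most 420 seconds across both probes and the lease wait.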
Create ansible/roles/arc-runner-scale-set/tasks/main.yml:
---
- name: Confirm the runner scale set is managed from the first control plane
  ansible.builtin.assert:
    that:
      - inventory_hostname == groups['first_control_plane'][0]
      - inventory_hostname == arc_scale_set_admin_hostname
      - ansible_facts["default_ipv4"]["address"] == arc_scale_set_admin_ip
      - arc_scale_set_controller_namespace | length > 0
      - arc_scale_set_namespace | length > 0
      - arc_scale_set_release_name | length > 0
      - arc_scale_set_name | length > 0
      - arc_scale_set_chart_version | length > 0
      - arc_scale_set_values_source | length > 0
      - arc_scale_set_namespace_manifest_source | length > 0
      - arc_scale_set_github_secret_name | length > 0
      - arc_scale_set_github_app_id | length > 0
      - arc_scale_set_github_app_installation_id | length > 0
      - arc_scale_set_github_app_private_key_source | length > 0
      - (arc_scale_set_listener_discovery_attempts | int) > 0
      - (arc_scale_set_listener_poll_seconds | int) > 0
      - (arc_scale_set_listener_stability_seconds | int) > 0
    fail_msg: >-
      Runner scale-set variables or GitHub App credentials are missing.
- name: Confirm local runner scale-set source files exist
  ansible.builtin.stat:
    path: "{{ item }}"
  delegate_to: localhost
  become: false
  loop:
    - "{{ arc_scale_set_values_source }}"
    - "{{ arc_scale_set_namespace_manifest_source }}"
    - "{{ arc_scale_set_github_app_private_key_source }}"
  register: arc_scale_set_source_files
- name: Assert every local runner scale-set source file exists
  ansible.builtin.assert:
    that:
      - item.stat.exists
      - item.stat.isreg
  loop: "{{ arc_scale_set_source_files.results }}"
  loop_control:
    label: "{{ item.stat.path | default('unknown') }}"
- name: Create the remote runner scale-set working directory
  ansible.builtin.file:
    path: "{{ arc_scale_set_work_root }}/{{ arc_scale_set_release_name }}"
    state: directory
    owner: root
    group: root
    mode: "0700"
- name: Copy the namespace manifest
  ansible.builtin.copy:
    src: "{{ arc_scale_set_namespace_manifest_source }}"
    dest: >-
      {{ arc_scale_set_work_root }}/{{ arc_scale_set_release_name }}/namespace.yaml
    owner: root
    group: root
    mode: "0644"
- name: Copy the Git-managed runner scale-set values
  ansible.builtin.copy:
    src: "{{ arc_scale_set_values_source }}"
    dest: >-
      {{ arc_scale_set_work_root }}/{{ arc_scale_set_release_name }}/values.yaml
    owner: root
    group: root
    mode: "0644"
- name: Copy the temporary GitHub App private key
  ansible.builtin.copy:
    src: "{{ arc_scale_set_github_app_private_key_source }}"
    dest: >-
      {{ arc_scale_set_work_root }}/{{ arc_scale_set_release_name }}/github-app-private-key.pem
    owner: root
    group: root
    mode: "0600"
  no_log: true
- name: Reconcile the repository runner scale set
  block:
    - name: Apply the environment runner namespace
      ansible.builtin.command:
        argv:
          - kubectl
          - "--kubeconfig={{ arc_scale_set_kubeconfig }}"
          - apply
          - "--filename={{ arc_scale_set_work_root }}/{{ arc_scale_set_release_name }}/namespace.yaml"
      register: arc_scale_set_namespace_apply
      changed_when: >-
        'created' in arc_scale_set_namespace_apply.stdout or
        'configured' in arc_scale_set_namespace_apply.stdout
    - name: Reconcile the GitHub App Kubernetes Secret
      ansible.builtin.shell:
        executable: /bin/bash
        cmd: |
          set -euo pipefail
          kubectl \
            --kubeconfig={{ arc_scale_set_kubeconfig }} \
            create secret generic {{ arc_scale_set_github_secret_name }} \
            --namespace={{ arc_scale_set_namespace }} \
            --from-literal=github_app_id='{{ arc_scale_set_github_app_id }}' \
            --from-literal=github_app_installation_id='{{ arc_scale_set_github_app_installation_id }}' \
            --from-file=github_app_private_key={{ arc_scale_set_work_root }}/{{ arc_scale_set_release_name }}/github-app-private-key.pem \
            --dry-run=client \
            --output=yaml |
          kubectl \
            --kubeconfig={{ arc_scale_set_kubeconfig }} \
            apply \
            --filename=-
      register: arc_scale_set_secret_apply
      changed_when: >-
        'created' in arc_scale_set_secret_apply.stdout or
        'configured' in arc_scale_set_secret_apply.stdout
      no_log: true
    - name: Render the pinned runner scale-set chart
      ansible.builtin.shell:
        executable: /bin/bash
        cmd: |
          set -euo pipefail
          {{ arc_scale_set_helm_binary }} template \
            {{ arc_scale_set_release_name }} \
            {{ arc_scale_set_chart }} \
            --namespace {{ arc_scale_set_namespace }} \
            --version {{ arc_scale_set_chart_version }} \
            --values {{ arc_scale_set_work_root }}/{{ arc_scale_set_release_name }}/values.yaml \
            --kubeconfig {{ arc_scale_set_kubeconfig }} \
            > {{ arc_scale_set_work_root }}/{{ arc_scale_set_release_name }}/rendered.yaml
      changed_when: false
    - name: Validate rendered runner scale-set resources with the Kubernetes API
      ansible.builtin.command:
        argv:
          - kubectl
          - "--kubeconfig={{ arc_scale_set_kubeconfig }}"
          - apply
          - --dry-run=server
          - "--filename={{ arc_scale_set_work_root }}/{{ arc_scale_set_release_name }}/rendered.yaml"
      changed_when: false
    - name: Perform the first installation and listener probe
      ansible.builtin.include_tasks: install-and-probe.yml
    - name: Reject a non-recoverable first-attempt failure
      ansible.builtin.assert:
        that:
          - >-
            arc_scale_set_attempt_requires_recovery or
            (
              (arc_scale_set_attempt_helm_rc | int) == 0 and
              (arc_scale_set_attempt_listener_rc | int) == 0
            )
        fail_msg: >-
          The ARC installation or listener failed without an exact 409 conflict
          or the rapid same-name listener replacement pattern that permits one
          scoped recovery attempt. Helm rc={{ arc_scale_set_attempt_helm_rc }},
          listener rc={{ arc_scale_set_attempt_listener_rc }}.
          Helm stderr={{ arc_scale_set_helm_attempt.stderr | default('') }}
          Listener output={{ arc_scale_set_listener_probe.stdout | default('') }}
    - name: Recover an active or suspected stale listener session
      ansible.builtin.include_tasks: recover-session-conflict.yml
      when:
        - arc_scale_set_enable_session_conflict_recovery | bool
        - arc_scale_set_attempt_requires_recovery | bool
    - name: Perform one clean installation retry after listener-session recovery
      ansible.builtin.include_tasks: install-and-probe.yml
      when:
        - arc_scale_set_enable_session_conflict_recovery | bool
        - arc_scale_set_attempt_requires_recovery | bool
    - name: Confirm the final Helm installation and listener are healthy
      ansible.builtin.assert:
        that:
          - (arc_scale_set_attempt_helm_rc | int) == 0
          - (arc_scale_set_attempt_listener_rc | int) == 0
          - not (arc_scale_set_attempt_session_conflict | bool)
          - not (arc_scale_set_attempt_requires_recovery | bool)
        fail_msg: >-
          The final ARC installation did not become stable.
          Helm rc={{ arc_scale_set_attempt_helm_rc }},
          listener rc={{ arc_scale_set_attempt_listener_rc }},
          exact session conflict={{ arc_scale_set_attempt_session_conflict }},
          additional recovery required={{ arc_scale_set_attempt_requires_recovery }}.
          Helm stderr={{ arc_scale_set_helm_attempt.stderr | default('') }}
          Listener output={{ arc_scale_set_listener_probe.stdout | default('') }}
    - name: Verify the AutoscalingRunnerSet exists in the runner namespace
      ansible.builtin.command:
        argv:
          - kubectl
          - "--kubeconfig={{ arc_scale_set_kubeconfig }}"
          - get
          - autoscalingrunnerset.actions.github.com
          - "{{ arc_scale_set_name }}"
          - "--namespace={{ arc_scale_set_namespace }}"
      changed_when: false
    - name: Verify the AutoscalingListener exists in the controller namespace
      ansible.builtin.shell:
        executable: /bin/bash
        cmd: |
          set -euo pipefail
          kubectl \
            --kubeconfig={{ arc_scale_set_kubeconfig }} \
            get autoscalinglisteners.actions.github.com \
            --namespace={{ arc_scale_set_controller_namespace }} \
            --output=name |
          grep -E \
            '^autoscalinglistener\.actions\.github\.com/{{ arc_scale_set_name }}-[a-z0-9]+-listener$'
      changed_when: false
    - name: Display the installed runner scale-set release
      ansible.builtin.command:
        argv:
          - "{{ arc_scale_set_helm_binary }}"
          - status
          - "{{ arc_scale_set_release_name }}"
          - --namespace
          - "{{ arc_scale_set_namespace }}"
          - --kubeconfig
          - "{{ arc_scale_set_kubeconfig }}"
      register: arc_scale_set_status
      changed_when: false
    - name: Print the runner scale-set release status
      ansible.builtin.debug:
        var: arc_scale_set_status.stdout_lines
  always:
    - name: Remove the remote temporary GitHub App private key
      ansible.builtin.file:
        path: >-
          {{ arc_scale_set_work_root }}/{{ arc_scale_set_release_name }}/github-app-private-key.pem
        state: absent
      no_log: true
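The control flow of tasks/main.yml (first attempt, rejection of non-recoverable failures, one scoped recovery, one retry, final health check) can be sketched as a small function. This is an illustrative model only. The `Attempt` type, the `reconcile` function, and the fake callables are hypothetical, and the model folds the session-conflict flag into `requires_recovery` for brevity.

```python
from typing import Callable, NamedTuple


class Attempt(NamedTuple):
    """Result of one install-and-probe pass (illustrative model)."""
    helm_rc: int
    listener_rc: int
    requires_recovery: bool


def reconcile(install_and_probe: Callable[[], Attempt],
              recover: Callable[[], None],
              recovery_enabled: bool = True) -> Attempt:
    """Run at most two attempts with one scoped recovery in between."""
    attempt = install_and_probe()
    healthy = attempt.helm_rc == 0 and attempt.listener_rc == 0

    # A failure that is not one of the recoverable patterns stops the run immediately.
    if not attempt.requires_recovery and not healthy:
        raise RuntimeError("non-recoverable first-attempt failure")

    # Recovery and the retry happen once, never in a loop.
    if attempt.requires_recovery and recovery_enabled:
        recover()
        attempt = install_and_probe()

    if attempt.helm_rc != 0 or attempt.listener_rc != 0 or attempt.requires_recovery:
        raise RuntimeError("final installation did not become stable")
    return attempt


# Simulated run: the first probe reports a session conflict, the retry succeeds.
calls = []
outcomes = iter([Attempt(0, 42, True), Attempt(0, 0, False)])


def fake_install_and_probe() -> Attempt:
    calls.append("install-and-probe")
    return next(outcomes)


def fake_recover() -> None:
    calls.append("recover")


final = reconcile(fake_install_and_probe, fake_recover)
print(calls)
print(final)
```

In the role itself, the private-key cleanup sits in the `always` section of the block, so it runs whether or not this flow raises.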
Create ansible/roles/arc-runner-scale-set/tasks/install-and-probe.yml:
---
- name: Install or reconcile the repository runner scale set
  ansible.builtin.command:
    argv:
      - "{{ arc_scale_set_helm_binary }}"
      - upgrade
      - --install
      - "{{ arc_scale_set_release_name }}"
      - "{{ arc_scale_set_chart }}"
      - --namespace
      - "{{ arc_scale_set_namespace }}"
      - --version
      - "{{ arc_scale_set_chart_version }}"
      - --values
      - "{{ arc_scale_set_work_root }}/{{ arc_scale_set_release_name }}/values.yaml"
      - --kubeconfig
      - "{{ arc_scale_set_kubeconfig }}"
      - --atomic
      - --wait
      - --timeout
      - 10m
      - --history-max
      - "10"
      - --debug
  register: arc_scale_set_helm_attempt
  changed_when: >-
    'has been upgraded' in arc_scale_set_helm_attempt.stdout or
    'has been installed' in arc_scale_set_helm_attempt.stdout
  failed_when: false
- name: Probe the listener and capture exact or suspected stale sessions
  ansible.builtin.shell:
    executable: /bin/bash
    cmd: |
      set -euo pipefail
      listener_pod=""
      listener_uid=""
      last_seen_uid=""
      uid_replacements=0
      for attempt in $(seq 1 {{ arc_scale_set_listener_discovery_attempts }}); do
        listener_record="$(
          kubectl \
            --kubeconfig={{ arc_scale_set_kubeconfig }} \
            get pods \
            --namespace={{ arc_scale_set_controller_namespace }} \
            --output=jsonpath='{range .items[*]}{.metadata.name}{"|"}{.metadata.uid}{"|"}{.status.phase}{"|"}{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}' \
            2>/dev/null |
          grep -E '^{{ arc_scale_set_name }}-[a-z0-9]+-listener\|' |
          head -n 1 || true
        )"
        if [ -n "${listener_record}" ]; then
          IFS='|' read -r candidate_pod candidate_uid candidate_phase candidate_ready \
            <<< "${listener_record}"
          if [ -n "${last_seen_uid}" ] &&
             [ -n "${candidate_uid}" ] &&
             [ "${candidate_uid}" != "${last_seen_uid}" ]
          then
            uid_replacements=$((uid_replacements + 1))
            echo "Observed listener Pod UID replacement: ${last_seen_uid} -> ${candidate_uid}"
          fi
          if [ -n "${candidate_uid}" ]; then
            last_seen_uid="${candidate_uid}"
          fi
          if [ "${candidate_phase}" = "Running" ] &&
             [ "${candidate_ready}" = "True" ]
          then
            listener_pod="${candidate_pod}"
            listener_uid="${candidate_uid}"
            break
          fi
        fi
        echo "Waiting for a Ready ARC listener (attempt ${attempt}/{{ arc_scale_set_listener_discovery_attempts }})..."
        sleep {{ arc_scale_set_listener_poll_seconds }}
      done
      if [ -z "${listener_pod}" ] || [ -z "${listener_uid}" ]; then
        echo "ERROR: A Ready ARC listener was not found."
        echo "Observed Pod UID replacements: ${uid_replacements}"
        kubectl \
          --kubeconfig={{ arc_scale_set_kubeconfig }} \
          get pods \
          --namespace={{ arc_scale_set_controller_namespace }} \
          -o wide |
        grep -F '{{ arc_scale_set_name }}' || true
        if [ "${uid_replacements}" -gt 0 ]; then
          echo "RECOVERABLE: The same logical listener was repeatedly recreated before becoming stable."
          exit 45
        fi
        exit 43
      fi
      echo "Listener became Ready: ${listener_pod}"
      echo "Listener Pod UID: ${listener_uid}"
      elapsed=0
      while [ "${elapsed}" -lt {{ arc_scale_set_listener_stability_seconds }} ]; do
        listener_record="$(
          kubectl \
            --kubeconfig={{ arc_scale_set_kubeconfig }} \
            get pod \
            "${listener_pod}" \
            --namespace={{ arc_scale_set_controller_namespace }} \
            --output=jsonpath='{.metadata.uid}{"|"}{.status.phase}{"|"}{.status.conditions[?(@.type=="Ready")].status}' \
            2>/dev/null || true
        )"
        listener_logs="$(
          kubectl \
            --kubeconfig={{ arc_scale_set_kubeconfig }} \
            logs \
            --namespace={{ arc_scale_set_controller_namespace }} \
            "${listener_pod}" \
            --container=listener \
            --tail=200 \
            2>&1 || true
        )"
        if printf '%s\n' "${listener_logs}" |
           grep -Eq \
             'RunnerScaleSetSessionConflictException|already has an active session'
        then
          printf '%s\n' "${listener_logs}"
          exit 42
        fi
        if [ -z "${listener_record}" ]; then
          echo "RECOVERABLE: Listener disappeared during the stability interval."
          echo "Original Pod UID: ${listener_uid}"
          printf '%s\n' "${listener_logs}"
          exit 45
        fi
        IFS='|' read -r current_uid current_phase current_ready \
          <<< "${listener_record}"
        if [ "${current_uid}" != "${listener_uid}" ]; then
          echo "RECOVERABLE: Listener name was recreated with a different Pod UID."
          echo "Listener pod: ${listener_pod}"
          echo "Original Pod UID: ${listener_uid}"
          echo "Current Pod UID: ${current_uid:-absent}"
          printf '%s\n' "${listener_logs}"
          exit 45
        fi
        if [ "${current_phase}" != "Running" ] ||
           [ "${current_ready}" != "True" ]
        then
          echo "RECOVERABLE: Listener stopped being Running and Ready during the stability interval."
          echo "Listener pod: ${listener_pod}"
          echo "Pod UID: ${listener_uid}"
          echo "Phase: ${current_phase:-absent}"
          echo "Ready: ${current_ready:-absent}"
          printf '%s\n' "${listener_logs}"
          exit 45
        fi
        sleep {{ arc_scale_set_listener_poll_seconds }}
        elapsed=$((elapsed + {{ arc_scale_set_listener_poll_seconds }}))
      done
      echo "Listener is stable with unchanged Pod UID: ${listener_pod} (${listener_uid})"
  register: arc_scale_set_listener_probe
  changed_when: false
  failed_when: false
- name: Record the installation-attempt result
  ansible.builtin.set_fact:
    arc_scale_set_attempt_helm_rc: "{{ arc_scale_set_helm_attempt.rc | int }}"
    arc_scale_set_attempt_listener_rc: "{{ arc_scale_set_listener_probe.rc | int }}"
    arc_scale_set_attempt_session_conflict: >-
      {{
        (arc_scale_set_listener_probe.rc | int) == 42 or
        'RunnerScaleSetSessionConflictException' in
          (arc_scale_set_listener_probe.stdout | default('')) or
        'already has an active session' in
          (arc_scale_set_listener_probe.stdout | default('')) or
        'RunnerScaleSetSessionConflictException' in
          (arc_scale_set_helm_attempt.stdout | default('')) or
        'already has an active session' in
          (arc_scale_set_helm_attempt.stdout | default('')) or
        'RunnerScaleSetSessionConflictException' in
          (arc_scale_set_helm_attempt.stderr | default('')) or
        'already has an active session' in
          (arc_scale_set_helm_attempt.stderr | default(''))
      }}
    arc_scale_set_attempt_requires_recovery: >-
      {{
        (arc_scale_set_listener_probe.rc | int) in [42, 45] or
        'RunnerScaleSetSessionConflictException' in
          (arc_scale_set_helm_attempt.stdout | default('')) or
        'already has an active session' in
          (arc_scale_set_helm_attempt.stdout | default('')) or
        'RunnerScaleSetSessionConflictException' in
          (arc_scale_set_helm_attempt.stderr | default('')) or
        'already has an active session' in
          (arc_scale_set_helm_attempt.stderr | default(''))
      }}
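The probe script communicates through exit codes (0, 42, 43, 45), and the final set_fact task turns those codes plus the Helm output into two flags. The sketch below restates that classification in Python to make the rules easier to read. The function name and the description strings are hypothetical; the codes and marker strings are taken from the tasks above.

```python
# Exit codes used by the listener probe in install-and-probe.yml.
PROBE_EXIT_CODES = {
    0: "listener stayed Running and Ready with the same Pod UID",
    42: "listener logs contain an exact session-conflict marker",
    43: "no Ready listener found and no Pod UID replacement observed",
    45: "listener was replaced, disappeared, or lost readiness (recoverable)",
}

SESSION_CONFLICT_MARKERS = (
    "RunnerScaleSetSessionConflictException",
    "already has an active session",
)


def has_marker(text: str) -> bool:
    """True when the text contains either session-conflict marker."""
    return any(marker in text for marker in SESSION_CONFLICT_MARKERS)


def classify_attempt(helm_output: str, listener_rc: int, listener_output: str) -> dict:
    """Hypothetical helper mirroring the two flags recorded by set_fact.

    helm_output stands for Helm stdout and stderr concatenated.
    """
    helm_conflict = has_marker(helm_output)
    session_conflict = listener_rc == 42 or has_marker(listener_output) or helm_conflict
    requires_recovery = listener_rc in (42, 45) or helm_conflict
    return {
        "session_conflict": session_conflict,
        "requires_recovery": requires_recovery,
    }


print(classify_attempt("", 45, "RECOVERABLE: Listener disappeared"))
print(classify_attempt("", 43, "ERROR: A Ready ARC listener was not found."))
```

Exit code 43 sets neither flag, which is why the role rejects it as a non-recoverable first-attempt failure instead of retrying.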
Create ansible/roles/arc-runner-scale-set/tasks/recover-session-conflict.yml:
---
- name: Explain the scoped ARC listener-session recovery
  ansible.builtin.debug:
    msg:
      - "Recovering only runner scale set: {{ arc_scale_set_name }}"
      - "Runner namespace: {{ arc_scale_set_namespace }}"
      - "Listener namespace: {{ arc_scale_set_controller_namespace }}"
      - "Shared ARC controller and unrelated scale sets are preserved."
      - "Recovery applies to an exact 409 or rapid same-name listener Pod UID replacement."
- name: Uninstall the affected Helm release when present
  ansible.builtin.shell:
    executable: /bin/bash
    cmd: |
      set -euo pipefail
      if {{ arc_scale_set_helm_binary }} status \
           {{ arc_scale_set_release_name }} \
           --namespace {{ arc_scale_set_namespace }} \
           --kubeconfig {{ arc_scale_set_kubeconfig }} \
           >/dev/null 2>&1
      then
        {{ arc_scale_set_helm_binary }} uninstall \
          {{ arc_scale_set_release_name }} \
          --namespace {{ arc_scale_set_namespace }} \
          --kubeconfig {{ arc_scale_set_kubeconfig }} \
          --wait \
          --timeout 3m
      else
        echo "Helm release is not present."
      fi
  register: arc_scale_set_recovery_uninstall
  changed_when: "'uninstalled' in arc_scale_set_recovery_uninstall.stdout"
- name: Delete a lingering AutoscalingRunnerSet
  ansible.builtin.command:
    argv:
      - kubectl
      - "--kubeconfig={{ arc_scale_set_kubeconfig }}"
      - delete
      - autoscalingrunnerset.actions.github.com
      - "{{ arc_scale_set_name }}"
      - "--namespace={{ arc_scale_set_namespace }}"
      - --ignore-not-found=true
      - --wait=true
      - --timeout=3m
  register: arc_scale_set_recovery_runner_set_delete
  changed_when: "'deleted' in arc_scale_set_recovery_runner_set_delete.stdout"
- name: Delete lingering listener resources and listener pods
  ansible.builtin.shell:
    executable: /bin/bash
    cmd: |
      set -euo pipefail
      changed=false
      listener_resources="$(
        kubectl \
          --kubeconfig={{ arc_scale_set_kubeconfig }} \
          get autoscalinglisteners.actions.github.com \
          --namespace={{ arc_scale_set_controller_namespace }} \
          --output=name \
          2>/dev/null |
        grep -E \
          '^autoscalinglistener\.actions\.github\.com/{{ arc_scale_set_name }}-[a-z0-9]+-listener$' \
          || true
      )"
      if [ -n "${listener_resources}" ]; then
        while IFS= read -r listener_resource; do
          [ -n "${listener_resource}" ] || continue
          kubectl \
            --kubeconfig={{ arc_scale_set_kubeconfig }} \
            delete \
            --namespace={{ arc_scale_set_controller_namespace }} \
            "${listener_resource}" \
            --wait=true \
            --timeout=3m || true
          changed=true
        done <<< "${listener_resources}"
      fi
      listener_pods="$(
        kubectl \
          --kubeconfig={{ arc_scale_set_kubeconfig }} \
          get pods \
          --namespace={{ arc_scale_set_controller_namespace }} \
          --output=name \
          2>/dev/null |
        grep -E '^pod/{{ arc_scale_set_name }}-[a-z0-9]+-listener$' \
          || true
      )"
      if [ -n "${listener_pods}" ]; then
        while IFS= read -r listener_pod; do
          [ -n "${listener_pod}" ] || continue
          kubectl \
            --kubeconfig={{ arc_scale_set_kubeconfig }} \
            delete \
            --namespace={{ arc_scale_set_controller_namespace }} \
            "${listener_pod}" \
            --wait=true \
            --timeout=2m || true
          changed=true
        done <<< "${listener_pods}"
      fi
      printf 'changed=%s\n' "${changed}"
  register: arc_scale_set_recovery_listener_delete
  changed_when: "'changed=true' in arc_scale_set_recovery_listener_delete.stdout"
- name: Wait for the affected ARC resources to disappear
  ansible.builtin.shell:
    executable: /bin/bash
    cmd: |
      set -euo pipefail
      for attempt in $(seq 1 90); do
        runner_set="$(
          kubectl \
            --kubeconfig={{ arc_scale_set_kubeconfig }} \
            get autoscalingrunnerset.actions.github.com \
            {{ arc_scale_set_name }} \
            --namespace={{ arc_scale_set_namespace }} \
            --ignore-not-found \
            --output=name \
            2>/dev/null || true
        )"
        listeners="$(
          kubectl \
            --kubeconfig={{ arc_scale_set_kubeconfig }} \
            get autoscalinglisteners.actions.github.com \
            --namespace={{ arc_scale_set_controller_namespace }} \
            --output=name \
            2>/dev/null |
          grep -E \
            '^autoscalinglistener\.actions\.github\.com/{{ arc_scale_set_name }}-[a-z0-9]+-listener$' \
            || true
        )"
        listener_pods="$(
          kubectl \
            --kubeconfig={{ arc_scale_set_kubeconfig }} \
            get pods \
            --namespace={{ arc_scale_set_controller_namespace }} \
            --output=name \
            2>/dev/null |
          grep -E '^pod/{{ arc_scale_set_name }}-[a-z0-9]+-listener$' \
            || true
        )"
        if [ -z "${runner_set}" ] &&
           [ -z "${listeners}" ] &&
           [ -z "${listener_pods}" ]
        then
          echo "Affected ARC resources are absent."
          exit 0
        fi
        echo "Waiting for affected ARC resources to disappear (attempt ${attempt}/90)..."
        sleep 2
      done
      echo "ERROR: Affected ARC resources did not disappear."
      exit 1
  changed_when: false
- name: Wait for the GitHub Actions backend session lease to expire
  ansible.builtin.pause:
    seconds: "{{ arc_scale_set_session_release_wait_seconds }}"
After a future change to this common role, manually dispatch only the affected scale-set workflows. Do not add the common role path to every repository-specific workflow.
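The recovery tasks keep their scope narrow by matching listener names against the pattern `<scale-set-name>-<suffix>-listener`. The sketch below reproduces that match in Python to show which names are selected. The function and the pod names are hypothetical examples. One difference is deliberate: this sketch escapes the scale-set name with `re.escape`, whereas the role interpolates it directly into the grep pattern. That is equivalent for names made of lowercase letters, digits, and hyphens, but a name containing a dot would otherwise be treated as a wildcard.

```python
import re


def select_listeners(scale_set_name: str, resource_names: list[str]) -> list[str]:
    """Return only the names that belong to the given scale set's listener."""
    pattern = re.compile(rf"{re.escape(scale_set_name)}-[a-z0-9]+-listener")
    # fullmatch anchors both ends, like ^...$ in the grep expressions.
    return [name for name in resource_names if pattern.fullmatch(name)]


# Hypothetical pod names in the shared controller namespace.
pods = [
    "app-dev-arc-754b578d-listener",        # listener for app-dev-arc
    "app-dev-arc-extra-754b578d-listener",  # different scale set, longer name
    "app-qa-arc-6c9f1a2b-listener",         # another environment
    "arc-gha-rs-controller-6f9d7c-abcde",   # shared controller pod
]

print(select_listeners("app-dev-arc", pods))
```

Because the suffix class excludes hyphens, a scale set whose name merely starts with the same prefix is not selected, which is what keeps unrelated scale sets and the shared controller untouched.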
12. 📘 Create the Repository and Environment Playbook
Create ansible/playbooks/tenants/<<APP_SHORT_FORM>>/dev/install-<<REPOSITORY_NAME>>-runners.yml:
---
- name: Install the <<REPOSITORY_NAME>> dev ARC runner scale set
  hosts: first_control_plane
  become: true
  gather_facts: true
  vars:
    arc_scale_set_admin_hostname: "cicd-ac-k8s-cp-01"
    arc_scale_set_admin_ip: "192.168.8.202"
    arc_scale_set_kubeconfig: "/etc/kubernetes/admin.conf"
    arc_scale_set_helm_binary: "/usr/local/bin/helm"
    arc_scale_set_controller_namespace: "arc-systems"
    arc_scale_set_namespace: "arc-runners-<<APP_SHORT_FORM>>-dev"
    arc_scale_set_release_name: "<<REPOSITORY_NAME>>-dev-arc"
    arc_scale_set_name: "<<REPOSITORY_NAME>>-dev-arc"
    arc_scale_set_chart_version: "0.14.2"
    arc_scale_set_values_source: >-
      {{ lookup('env', 'GITHUB_WORKSPACE') }}/helm/tenants/<<APP_SHORT_FORM>>/dev/<<REPOSITORY_NAME>>-values.yaml
    arc_scale_set_namespace_manifest_source: >-
      {{ lookup('env', 'GITHUB_WORKSPACE') }}/kubernetes/tenants/<<APP_SHORT_FORM>>/dev/namespace/namespace.yaml
    arc_scale_set_github_secret_name: "<<APP_SHORT_FORM>>-arc-ghapp-secret"
    arc_scale_set_enable_session_conflict_recovery: true
    arc_scale_set_session_release_wait_seconds: 120
    arc_scale_set_listener_discovery_attempts: 60
    arc_scale_set_listener_poll_seconds: 2
    arc_scale_set_listener_stability_seconds: 30
  roles:
    - role: arc-runner-scale-set
13. 🔄 Create the Infrastructure Reconciliation Workflow
Create .github/workflows/arc-<<REPOSITORY_NAME>>-dev.yml:
name: ARC Runner Scale Set - <<REPOSITORY_NAME>> - dev
on:
push:
branches:
- dev
- prod
paths:
- "kubernetes/tenants/<<APP_SHORT_FORM>>/dev/runner-scale-sets/<<REPOSITORY_NAME>>.yaml"
- "helm/tenants/<<APP_SHORT_FORM>>/dev/<<REPOSITORY_NAME>>-values.yaml"
- "ansible/playbooks/tenants/<<APP_SHORT_FORM>>/dev/install-<<REPOSITORY_NAME>>-runners.yml"
- ".github/workflows/arc-<<REPOSITORY_NAME>>-dev.yml"
workflow_dispatch:
permissions:
contents: read
concurrency:
group: shared-k8s-ansible
cancel-in-progress: false
env:
ANSIBLE_CONFIG: ${{ github.workspace }}/ansible/ansible.cfg
jobs:
validate:
name: Validate <<REPOSITORY_NAME>> dev runner scale set
runs-on:
- self-hosted
- Linux
- X64
- prod
- terraform
- deploy
- ac-cicd-infra
timeout-minutes: 30
steps:
- name: Checkout repository
uses: actions/checkout@v5
- name: Verify required files
shell: bash
run: |
set -euo pipefail
required_files=(
"kubernetes/tenants/<<APP_SHORT_FORM>>/dev/namespace/namespace.yaml"
"kubernetes/tenants/<<APP_SHORT_FORM>>/dev/runner-scale-sets/<<REPOSITORY_NAME>>.yaml"
"helm/tenants/<<APP_SHORT_FORM>>/dev/<<REPOSITORY_NAME>>-values.yaml"
"ansible/roles/arc-runner-scale-set/defaults/main.yml"
"ansible/roles/arc-runner-scale-set/tasks/main.yml"
"ansible/roles/arc-runner-scale-set/tasks/install-and-probe.yml"
"ansible/roles/arc-runner-scale-set/tasks/recover-session-conflict.yml"
"ansible/playbooks/tenants/<<APP_SHORT_FORM>>/dev/install-<<REPOSITORY_NAME>>-runners.yml"
)
for file in "${required_files[@]}"; do
if [ ! -f "${file}" ]; then
echo "ERROR: Missing ${file}"
exit 1
fi
done
      - name: Validate namespace, definition, and Helm values
        shell: bash
        run: |
          set -euo pipefail
          python3 - <<'PY'
          from pathlib import Path
          import yaml
          namespace = yaml.safe_load(
              Path("kubernetes/tenants/<<APP_SHORT_FORM>>/dev/namespace/namespace.yaml")
              .read_text(encoding="utf-8")
          )
          definition = yaml.safe_load(
              Path("kubernetes/tenants/<<APP_SHORT_FORM>>/dev/runner-scale-sets/<<REPOSITORY_NAME>>.yaml")
              .read_text(encoding="utf-8")
          )
          values = yaml.safe_load(
              Path("helm/tenants/<<APP_SHORT_FORM>>/dev/<<REPOSITORY_NAME>>-values.yaml")
              .read_text(encoding="utf-8")
          )
          assert namespace["kind"] == "Namespace"
          assert namespace["metadata"]["name"] == "arc-runners-<<APP_SHORT_FORM>>-dev"
          assert definition["schemaVersion"] == 1
          assert definition["github"]["organization"] == "<<GITHUB_ORGANIZATION>>"
          assert definition["github"]["repository"] == "<<REPOSITORY_NAME>>"
          assert definition["runnerScaleSet"]["name"] == "<<REPOSITORY_NAME>>-dev-arc"
          assert definition["runnerScaleSet"]["namespace"] == "arc-runners-<<APP_SHORT_FORM>>-dev"
          assert definition["branchMapping"]["sourceBranch"] == "dev"
          assert definition["branchMapping"]["runsOn"] == "<<REPOSITORY_NAME>>-dev-arc"
          assert definition["security"]["privateKeyCommittedToGit"] is False
          assert definition["security"]["kubernetesSecretManifestCommitted"] is False
          assert values["githubConfigUrl"] == "https://github.com/<<GITHUB_ORGANIZATION>>/<<REPOSITORY_NAME>>"
          assert values["githubConfigSecret"] == "<<APP_SHORT_FORM>>-arc-ghapp-secret"
          assert values["runnerScaleSetName"] == "<<REPOSITORY_NAME>>-dev-arc"
          assert values["minRunners"] == 0
          assert values["maxRunners"] == 4
          assert values["minRunners"] <= values["maxRunners"]
          assert values["containerMode"]["type"] == "dind"
          listener_spec = values["listenerTemplate"]["spec"]
          listener_containers = listener_spec["containers"]
          assert isinstance(listener_containers, list)
          assert len(listener_containers) >= 1
          assert listener_containers[0]["name"] == "listener"
          assert (
              listener_spec["nodeSelector"][
                  "node-role.kubernetes.io/control-plane"
              ]
              == ""
          )
          assert any(
              toleration["key"] == "node-role.kubernetes.io/control-plane"
              and toleration["operator"] == "Exists"
              and toleration["effect"] == "NoSchedule"
              for toleration in listener_spec["tolerations"]
          )
          assert values["template"]["spec"]["nodeSelector"]["environment"] == "dev"
          assert (
              values["template"]["spec"]["nodeSelector"]["workload"]
              == "github-runner"
          )
          print("Runner scale-set configuration is valid.")
          PY
      - name: Reject committed private keys and Kubernetes Secret manifests
        shell: bash
        run: |
          set -euo pipefail
          tracked_key_files="$(git ls-files | grep -E '\.(pem|key|p8)$' || true)"
          if [ -n "${tracked_key_files}" ]; then
            echo "ERROR: Private-key files are tracked by Git:"
            printf '%s\n' "${tracked_key_files}"
            exit 1
          fi
          private_key_matches="$(
            git grep \
              -n \
              -E \
              -- '-----BEGIN ([A-Z0-9]+ )?PRIVATE KEY-----' \
              -- . \
              ':!.github/workflows/arc-<<REPOSITORY_NAME>>-dev.yml' \
              ':!docs/**' || true
          )"
          if [ -n "${private_key_matches}" ]; then
            echo "ERROR: Private-key contents were found in tracked files:"
            printf '%s\n' "${private_key_matches}"
            exit 1
          fi
          secret_manifests="$(
            git grep -l -E '^kind:[[:space:]]*Secret$' -- \
              'kubernetes/tenants/**' || true
          )"
          if [ -n "${secret_manifests}" ]; then
            echo "ERROR: Kubernetes Secret manifests are committed:"
            printf '%s\n' "${secret_manifests}"
            exit 1
          fi
          echo "No private key or Kubernetes Secret manifest is tracked."
      - name: Validate the Ansible playbook target
        working-directory: ansible
        shell: bash
        run: |
          set -euo pipefail
          output="$(
            ansible-playbook \
              -i inventories/shared-k8s/hosts.ini \
              playbooks/tenants/<<APP_SHORT_FORM>>/dev/install-<<REPOSITORY_NAME>>-runners.yml \
              --list-hosts
          )"
          printf '%s\n' "${output}"
          grep -Fq "cicd-ac-k8s-cp-01" <<< "${output}"
          if grep -Eq 'cicd-ac-k8s-(dev|qa|prod)-wk-' <<< "${output}"; then
            echo "ERROR: Runner scale-set installation playbook targets a worker node."
            exit 1
          fi
      - name: Syntax-check the runner scale-set playbook
        working-directory: ansible
        shell: bash
        run: |
          set -euo pipefail
          ansible-playbook \
            -i inventories/shared-k8s/hosts.ini \
            playbooks/tenants/<<APP_SHORT_FORM>>/dev/install-<<REPOSITORY_NAME>>-runners.yml \
            --syntax-check
  configure:
    name: Install or reconcile <<REPOSITORY_NAME>> dev runner scale set
    needs:
      - validate
    if: >-
      (github.event_name == 'push' && github.ref_name == 'prod') ||
      (github.event_name == 'workflow_dispatch' && github.ref_name == 'prod')
    environment:
      name: arc-org-<<APP_SHORT_FORM>>
    runs-on:
      - self-hosted
      - Linux
      - X64
      - prod
      - terraform
      - deploy
      - ac-cicd-infra
    timeout-minutes: 90
    steps:
      - name: Checkout repository
        uses: actions/checkout@v5
      - name: Verify production branch and organization credentials
        shell: bash
        env:
          ARC_GITHUB_ORGANIZATION: ${{ vars.ARC_GITHUB_ORGANIZATION }}
          ARC_GITHUB_APP_CLIENT_ID: ${{ vars.ARC_GITHUB_APP_CLIENT_ID }}
          ARC_GITHUB_APP_ID: ${{ vars.ARC_GITHUB_APP_ID }}
          ARC_GITHUB_APP_INSTALLATION_ID: ${{ vars.ARC_GITHUB_APP_INSTALLATION_ID }}
          ARC_GITHUB_APP_PRIVATE_KEY: ${{ secrets.ARC_GITHUB_APP_PRIVATE_KEY }}
        run: |
          set -euo pipefail
          test "${GITHUB_REF_NAME}" = "prod"
          test "${ARC_GITHUB_ORGANIZATION}" = "<<GITHUB_ORGANIZATION>>"
          test -n "${ARC_GITHUB_APP_CLIENT_ID}"
          test -n "${ARC_GITHUB_APP_ID}"
          test -n "${ARC_GITHUB_APP_INSTALLATION_ID}"
          test -n "${ARC_GITHUB_APP_PRIVATE_KEY}"
      - name: Create a repository-scoped GitHub App token
        id: app-token
        uses: actions/create-github-app-token@v3.2.0
        with:
          client-id: ${{ vars.ARC_GITHUB_APP_CLIENT_ID }}
          private-key: ${{ secrets.ARC_GITHUB_APP_PRIVATE_KEY }}
          owner: "<<GITHUB_ORGANIZATION>>"
          repositories: "<<REPOSITORY_NAME>>"
      - name: Verify the GitHub App can access the repository
        shell: bash
        env:
          GH_TOKEN: ${{ steps.app-token.outputs.token }}
        run: |
          set -euo pipefail
          expected_full_name="<<GITHUB_ORGANIZATION>>/<<REPOSITORY_NAME>>"
          full_name="$(gh api "/repos/${expected_full_name}" --jq .full_name)"
          # GitHub names are case-insensitive, so compare both sides in lowercase.
          test "${full_name,,}" = "${expected_full_name,,}"
          echo "Verified GitHub App repository access: ${full_name}"
      - name: Prepare the existing Ansible SSH key
        shell: bash
        run: |
          set -euo pipefail
          key_path="${HOME}/.ssh/id_ed25519_ansible"
          test -f "${key_path}"
          chmod 600 "${key_path}"
          echo "ANSIBLE_PRIVATE_KEY_FILE=${key_path}" >> "${GITHUB_ENV}"
      - name: Refresh the first-control-plane SSH host key
        shell: bash
        run: |
          set -euo pipefail
          mkdir -p "${HOME}/.ssh"
          chmod 700 "${HOME}/.ssh"
          touch "${HOME}/.ssh/known_hosts"
          chmod 600 "${HOME}/.ssh/known_hosts"
          ssh-keygen -f "${HOME}/.ssh/known_hosts" -R "192.168.8.202" || true
          captured=false
          for attempt in $(seq 1 30); do
            # ssh-keyscan may exit 0 without returning a key, so require non-empty output.
            scanned_keys="$(ssh-keyscan -T 5 -H "192.168.8.202" 2>/dev/null || true)"
            if [ -n "${scanned_keys}" ]; then
              printf '%s\n' "${scanned_keys}" >> "${HOME}/.ssh/known_hosts"
              captured=true
              break
            fi
            echo "Waiting for SSH on 192.168.8.202 (attempt ${attempt}/30)..."
            sleep 10
          done
          test "${captured}" = "true"
      - name: Prepare the Ansible remote temporary directory
        shell: bash
        run: |
          set -euo pipefail
          ssh \
            -i "${ANSIBLE_PRIVATE_KEY_FILE}" \
            -o IdentitiesOnly=yes \
            -o BatchMode=yes \
            "acllc@192.168.8.202" \
            'sudo install -d -m 0700 -o acllc -g acllc /var/tmp/ansible-acllc'
      - name: Create the temporary GitHub App private-key file
        shell: bash
        env:
          ARC_GITHUB_APP_PRIVATE_KEY: ${{ secrets.ARC_GITHUB_APP_PRIVATE_KEY }}
        run: |
          set -euo pipefail
          key_file="${RUNNER_TEMP}/<<APP_SHORT_FORM>>-github-app-private-key.pem"
          umask 077
          printf '%s' "${ARC_GITHUB_APP_PRIVATE_KEY}" > "${key_file}"
          chmod 600 "${key_file}"
          echo "ARC_GITHUB_APP_PRIVATE_KEY_FILE=${key_file}" >> "${GITHUB_ENV}"
      - name: Install or reconcile the repository runner scale set
        working-directory: ansible
        shell: bash
        env:
          ARC_GITHUB_APP_ID: ${{ vars.ARC_GITHUB_APP_ID }}
          ARC_GITHUB_APP_INSTALLATION_ID: ${{ vars.ARC_GITHUB_APP_INSTALLATION_ID }}
        run: |
          set -euo pipefail
          ansible-playbook \
            -i inventories/shared-k8s/hosts.ini \
            --private-key "${ANSIBLE_PRIVATE_KEY_FILE}" \
            playbooks/tenants/<<APP_SHORT_FORM>>/dev/install-<<REPOSITORY_NAME>>-runners.yml
      - name: Capture ARC diagnostics after a failed reconciliation
        if: failure()
        working-directory: ansible
        shell: bash
        run: |
          set -euo pipefail
          ansible \
            -i inventories/shared-k8s/hosts.ini \
            first_control_plane \
            --private-key "${ANSIBLE_PRIVATE_KEY_FILE}" \
            --become \
            --extra-vars ansible_shell_executable=/bin/bash \
            -m ansible.builtin.shell \
            -a '
              set +e
              export KUBECONFIG=/etc/kubernetes/admin.conf
              echo "=== HELM RELEASES ==="
              /usr/local/bin/helm list -a -n arc-runners-<<APP_SHORT_FORM>>-dev
              /usr/local/bin/helm status <<REPOSITORY_NAME>>-dev-arc -n arc-runners-<<APP_SHORT_FORM>>-dev
              echo
              echo "=== AUTOSCALING RUNNER SET ==="
              kubectl get autoscalingrunnerset.actions.github.com \
                <<REPOSITORY_NAME>>-dev-arc \
                -n arc-runners-<<APP_SHORT_FORM>>-dev \
                -o yaml
              echo
              echo "=== AUTOSCALING LISTENERS ==="
              kubectl get autoscalinglisteners.actions.github.com \
                -n arc-systems \
                -o wide
              echo
              echo "=== LISTENER PODS ==="
              kubectl get pods -n arc-systems -o wide |
                grep "<<REPOSITORY_NAME>>-dev-arc" || true
              echo
              echo "=== LISTENER LOGS ==="
              for listener_pod in $(
                kubectl get pods \
                  -n arc-systems \
                  -o jsonpath="{range .items[*]}{.metadata.name}{\"\\n\"}{end}" |
                  grep -E "^<<REPOSITORY_NAME>>-dev-arc-[a-z0-9]+-listener$" || true
              ); do
                echo "--- ${listener_pod} ---"
                kubectl logs \
                  "${listener_pod}" \
                  -n arc-systems \
                  -c listener \
                  --tail=200 || true
              done
              echo
              echo "=== RUNNER-NAMESPACE EVENTS ==="
              kubectl get events \
                -n arc-runners-<<APP_SHORT_FORM>>-dev \
                --sort-by=.lastTimestamp |
                tail -50
              echo
              echo "=== CONTROLLER-NAMESPACE EVENTS ==="
              kubectl get events \
                -n arc-systems \
                --sort-by=.lastTimestamp |
                tail -50
            ' || true
      - name: Verify the runner scale set and stable listener Pod UID
        working-directory: ansible
        shell: bash
        run: |
          set -euo pipefail
          ansible \
            -i inventories/shared-k8s/hosts.ini \
            first_control_plane \
            --private-key "${ANSIBLE_PRIVATE_KEY_FILE}" \
            --become \
            --extra-vars ansible_shell_executable=/bin/bash \
            -m ansible.builtin.shell \
            -a '
              set -euo pipefail
              export KUBECONFIG=/etc/kubernetes/admin.conf
              /usr/local/bin/helm status \
                <<REPOSITORY_NAME>>-dev-arc \
                -n arc-runners-<<APP_SHORT_FORM>>-dev
              kubectl get autoscalingrunnerset.actions.github.com \
                <<REPOSITORY_NAME>>-dev-arc \
                -n arc-runners-<<APP_SHORT_FORM>>-dev \
                -o wide
              kubectl get autoscalinglisteners.actions.github.com \
                -n arc-systems \
                -o wide
              listener_pod=""
              listener_uid=""
              for attempt in $(seq 1 60); do
                listener_record="$(
                  kubectl get pods \
                    -n arc-systems \
                    -o jsonpath="{range .items[*]}{.metadata.name}{\"|\"}{.metadata.uid}{\"|\"}{.status.phase}{\"|\"}{.status.conditions[?(@.type==\"Ready\")].status}{\"\\n\"}{end}" \
                    2>/dev/null |
                    grep -E "^<<REPOSITORY_NAME>>-dev-arc-[a-z0-9]+-listener\\|" |
                    head -n 1 || true
                )"
                if [ -n "${listener_record}" ]; then
                  IFS="|" read -r candidate_pod candidate_uid candidate_phase candidate_ready \
                    <<< "${listener_record}"
                  if [ "${candidate_phase}" = "Running" ] &&
                    [ "${candidate_ready}" = "True" ]
                  then
                    listener_pod="${candidate_pod}"
                    listener_uid="${candidate_uid}"
                    break
                  fi
                fi
                sleep 2
              done
              test -n "${listener_pod}"
              test -n "${listener_uid}"
              for attempt in $(seq 1 15); do
                listener_record="$(
                  kubectl get pod \
                    "${listener_pod}" \
                    -n arc-systems \
                    -o jsonpath="{.metadata.uid}{\"|\"}{.status.phase}{\"|\"}{.status.conditions[?(@.type==\"Ready\")].status}" \
                    2>/dev/null || true
                )"
                test -n "${listener_record}"
                IFS="|" read -r current_uid current_phase current_ready \
                  <<< "${listener_record}"
                test "${current_uid}" = "${listener_uid}"
                test "${current_phase}" = "Running"
                test "${current_ready}" = "True"
                listener_logs="$(
                  kubectl logs \
                    "${listener_pod}" \
                    -n arc-systems \
                    -c listener \
                    --tail=200 \
                    2>&1 || true
                )"
                if printf "%s\n" "${listener_logs}" |
                  grep -Eq \
                    "RunnerScaleSetSessionConflictException|already has an active session"
                then
                  printf "%s\n" "${listener_logs}"
                  exit 1
                fi
                sleep 2
              done
              echo "Stable listener pod and UID: ${listener_pod} (${listener_uid})"
              kubectl get pods -n arc-runners-<<APP_SHORT_FORM>>-dev -o wide
            '
      - name: Remove the temporary local private-key file
        if: always()
        shell: bash
        run: |
          set -euo pipefail
          if [ -n "${ARC_GITHUB_APP_PRIVATE_KEY_FILE:-}" ]; then
            rm -f "${ARC_GITHUB_APP_PRIVATE_KEY_FILE}"
          fi
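The listener-stability rule enforced by the final verification step can be restated compactly. The Python sketch below is illustrative only: `ListenerObservation` and `listener_is_stable` are hypothetical names, and the observations are assumed to come from repeated `kubectl get pod` and `kubectl logs` probes like the ones in the workflow.

```python
from dataclasses import dataclass
from typing import Iterable

# Log fragments the workflow greps for when detecting a session conflict.
CONFLICT_MARKERS = (
    "RunnerScaleSetSessionConflictException",
    "already has an active session",
)


@dataclass(frozen=True)
class ListenerObservation:
    """One probe of the listener pod (hypothetical record type)."""

    uid: str
    phase: str
    ready: str
    logs: str = ""


def listener_is_stable(
    expected_uid: str, observations: Iterable[ListenerObservation]
) -> bool:
    """Return True only if every probe shows the same healthy pod."""
    seen_any = False
    for observation in observations:
        seen_any = True
        if observation.uid != expected_uid:
            return False  # the pod was replaced under the same name
        if observation.phase != "Running" or observation.ready != "True":
            return False
        if any(marker in observation.logs for marker in CONFLICT_MARKERS):
            return False
    return seen_any  # an empty probe series proves nothing


if __name__ == "__main__":
    probes = [ListenerObservation("uid-1", "Running", "True")] * 15
    print(listener_is_stable("uid-1", probes))
```

Fifteen probes spaced two seconds apart correspond to the 30-second stability window used by the workflow.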
14. 🛡️ Create the Target Repository Environment Mapping
$TargetRepository = "<<GITHUB_ORGANIZATION>>/<<REPOSITORY_NAME>>"
$RepositoryEnvironment = "dev"
$RunsOn = "<<REPOSITORY_NAME>>-dev-arc"
$RunnerNamespace = "arc-runners-<<APP_SHORT_FORM>>-dev"
Write-Host "Creating or reconciling the repository deployment Environment..."
gh api --method PUT "repos/$TargetRepository/environments/$RepositoryEnvironment"
Write-Host "Setting non-secret branch mapping variables..."
gh variable set ARC_RUNS_ON --body $RunsOn --env $RepositoryEnvironment --repo $TargetRepository
gh variable set ARC_RUNNER_NAMESPACE --body $RunnerNamespace --env $RepositoryEnvironment --repo $TargetRepository
gh variable set ARC_DEPLOYMENT_ENVIRONMENT --body $RepositoryEnvironment --env $RepositoryEnvironment --repo $TargetRepository
Write-Host "Repository Environment variables:"
gh variable list --env $RepositoryEnvironment --repo $TargetRepository
Write-Host "Repository Environment details:"
gh api "repos/$TargetRepository/environments/$RepositoryEnvironment"
Restrict the dev environment to the dev branch and add required reviewers for QA or production where appropriate.
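The branch restriction and reviewer requirement can be expressed as two GitHub REST requests. The sketch below only builds the request bodies and does not call GitHub. The function names, the example repository, and the reviewer IDs are hypothetical, and the field names should be checked against the current REST API reference for deployment environments and deployment branch policies.

```python
import json
from typing import Dict, List, Sequence, Tuple

Request = Tuple[str, str, Dict[str, object]]


def environment_settings(reviewer_ids: Sequence[int] = ()) -> Dict[str, object]:
    """Body for PUT repos/{owner}/{repo}/environments/{environment}."""
    body: Dict[str, object] = {
        # Allow only explicitly listed branches to deploy to this environment.
        "deployment_branch_policy": {
            "protected_branches": False,
            "custom_branch_policies": True,
        }
    }
    if reviewer_ids:
        body["reviewers"] = [
            {"type": "User", "id": reviewer_id} for reviewer_id in reviewer_ids
        ]
    return body


def planned_requests(
    repository: str,
    environment: str,
    branch: str,
    reviewer_ids: Sequence[int] = (),
) -> List[Request]:
    """Return (method, path, body) tuples in the order they would be sent."""
    base = f"repos/{repository}/environments/{environment}"
    return [
        ("PUT", base, environment_settings(reviewer_ids)),
        (
            "POST",
            f"{base}/deployment-branch-policies",
            {"name": branch, "type": "branch"},
        ),
    ]


if __name__ == "__main__":
    # "example-org/example-repo" is a placeholder, not a real repository.
    for method, path, body in planned_requests(
        "example-org/example-repo", "dev", "dev"
    ):
        print(method, path, json.dumps(body))
```

Each body could be piped to `gh api --method <METHOD> <path> --input -`. For QA or production, pass the numeric user IDs of the required reviewers.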
15. 🧪 Review and Commit the Infrastructure Files
COMMON — create once when the first scale set is onboarded
ansible/roles/arc-runner-scale-set/defaults/main.yml
ansible/roles/arc-runner-scale-set/tasks/main.yml
ansible/roles/arc-runner-scale-set/tasks/install-and-probe.yml
ansible/roles/arc-runner-scale-set/tasks/recover-session-conflict.yml
REPOSITORY + DEPLOYMENT BRANCH — repeat for each scale set
kubernetes/tenants/<<APP_SHORT_FORM>>/dev/namespace/namespace.yaml
kubernetes/tenants/<<APP_SHORT_FORM>>/dev/runner-scale-sets/<<REPOSITORY_NAME>>.yaml
helm/tenants/<<APP_SHORT_FORM>>/dev/<<REPOSITORY_NAME>>-values.yaml
ansible/playbooks/tenants/<<APP_SHORT_FORM>>/dev/install-<<REPOSITORY_NAME>>-runners.yml
.github/workflows/arc-<<REPOSITORY_NAME>>-dev.yml
TARGET APPLICATION REPOSITORY
.github/workflows/arc-dev-smoke-test.yml
Preserve without replacement
terraform/**
ansible/inventories/shared-k8s/**
ansible/roles/common/**
ansible/roles/containerd/**
ansible/roles/kubernetes-common/**
ansible/roles/kubernetes-control-plane/**
ansible/roles/kubernetes-worker/**
ansible/roles/arc-controller/**
ansible/playbooks/shared-k8s/01-*.yml through 09-install-arc-controller.yml
helm/common/arc-controller/**
kubernetes/common/**
existing tenant organization configurations
existing repository and environment scale sets
Never commit
GitHub App private keys
*.pem, *.key, or *.p8 files
rendered Kubernetes Secret manifests
GitHub App installation tokens
Harbor passwords
deployment SSH private keys
git status
git diff --check
git diff --stat
git diff -- \
  kubernetes/tenants/<<APP_SHORT_FORM>>/dev/namespace/namespace.yaml \
  kubernetes/tenants/<<APP_SHORT_FORM>>/dev/runner-scale-sets/<<REPOSITORY_NAME>>.yaml \
  helm/tenants/<<APP_SHORT_FORM>>/dev/<<REPOSITORY_NAME>>-values.yaml \
  ansible/roles/arc-runner-scale-set \
  ansible/playbooks/tenants/<<APP_SHORT_FORM>>/dev/install-<<REPOSITORY_NAME>>-runners.yml \
  .github/workflows/arc-<<REPOSITORY_NAME>>-dev.yml
git add \
  kubernetes/tenants/<<APP_SHORT_FORM>>/dev/namespace/namespace.yaml \
  kubernetes/tenants/<<APP_SHORT_FORM>>/dev/runner-scale-sets/<<REPOSITORY_NAME>>.yaml \
  helm/tenants/<<APP_SHORT_FORM>>/dev/<<REPOSITORY_NAME>>-values.yaml \
  ansible/roles/arc-runner-scale-set/defaults/main.yml \
  ansible/roles/arc-runner-scale-set/tasks/main.yml \
  ansible/roles/arc-runner-scale-set/tasks/install-and-probe.yml \
  ansible/roles/arc-runner-scale-set/tasks/recover-session-conflict.yml \
  ansible/playbooks/tenants/<<APP_SHORT_FORM>>/dev/install-<<REPOSITORY_NAME>>-runners.yml \
  .github/workflows/arc-<<REPOSITORY_NAME>>-dev.yml
git commit -m "Configure <<REPOSITORY_NAME>> dev ARC runner scale set"
git push -u origin feature/configure-<<REPOSITORY_NAME>>-dev-arc
16. ✅ Validate Through dev
gh pr create --base dev --head feature/configure-<<REPOSITORY_NAME>>-dev-arc --title "Configure <<REPOSITORY_NAME>> dev ARC runner scale set" --body "Adds one isolated repository and environment runner scale set. The pull request is for review only; validation starts after the merge creates a dev push."
Expected dev result:
Validate <<REPOSITORY_NAME>> dev runner scale set: success
Install or reconcile <<REPOSITORY_NAME>> dev runner scale set: skipped
17. 🚀 Promote and Install Through prod
gh pr create --base prod --head dev --title "Install <<REPOSITORY_NAME>> dev ARC runner scale set" --body "Promotes the validated repository and environment runner scale set to prod. The prod push creates the runtime Kubernetes Secret and installs the Helm release."
Expected production sequence:
Validate configuration
Read GitHub App credentials from arc-org-<<APP_SHORT_FORM>>
Create or reconcile arc-runners-<<APP_SHORT_FORM>>-dev
Create or reconcile <<APP_SHORT_FORM>>-arc-ghapp-secret
Render runner chart 0.14.2
Validate rendered resources with kubectl --dry-run=server
Perform the first internal Helm install and immediate listener probe
If an exact 409 or same-name listener Pod UID replacement is detected, remove only <<REPOSITORY_NAME>>-dev-arc resources
Wait 120 seconds for the GitHub message-session lease to clear
Perform exactly one clean Helm retry in the same workflow run
Require the same replacement listener Pod UID to remain Running and Ready for 30 seconds
Verify AutoscalingRunnerSet in arc-runners-<<APP_SHORT_FORM>>-dev
Verify AutoscalingListener and the unchanged listener Pod UID in arc-systems
Capture Helm, ARC resource, listener-log, pod, and event diagnostics on failure
Confirm no 409 active-session conflict remains
18. 🏷️ Add the Branch-to-Scale-Set Workflow in the Target Repository
Create .github/workflows/arc-dev-smoke-test.yml in <<GITHUB_ORGANIZATION>>/<<REPOSITORY_NAME>>:
name: ARC dev Runner Smoke Test
on:
  push:
    branches:
      - dev
    paths:
      - ".github/workflows/arc-dev-smoke-test.yml"
  workflow_dispatch:
permissions:
  contents: read
jobs:
  smoke-test:
    name: Verify dev ARC runner
    runs-on: <<REPOSITORY_NAME>>-dev-arc
    environment:
      name: dev
    timeout-minutes: 20
    steps:
      - name: Checkout repository
        uses: actions/checkout@v5
      - name: Display runner context
        shell: bash
        run: |
          set -euo pipefail
          echo "Repository: ${GITHUB_REPOSITORY}"
          echo "Branch: ${GITHUB_REF_NAME}"
          echo "Runner name: ${RUNNER_NAME}"
          echo "Runner architecture: ${RUNNER_ARCH}"
          echo "Expected runs-on: <<REPOSITORY_NAME>>-dev-arc"
          echo "Expected Kubernetes environment: dev"
      - name: Verify Docker-in-Docker
        shell: bash
        run: |
          set -euo pipefail
          docker version
          docker info
          docker run --rm hello-world
Commit this workflow to dev. The push creates an ephemeral runner pod on the dev worker pool. With minRunners=0, no idle runner pod is expected before a job is queued.
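The effect of minRunners=0 and maxRunners=4 can be illustrated with a simplified model. The function below is not ARC's implementation. It assumes the target runner count is the idle minimum plus the number of jobs assigned to the scale set, capped at the maximum. The function name is hypothetical.

```python
def desired_runner_count(
    assigned_jobs: int, min_runners: int = 0, max_runners: int = 4
) -> int:
    """Simplified scale-from-zero model (an assumption, not ARC source code)."""
    if assigned_jobs < 0:
        raise ValueError("assigned_jobs must not be negative")
    if min_runners < 0 or max_runners < min_runners:
        raise ValueError("require 0 <= min_runners <= max_runners")
    return min(max_runners, min_runners + assigned_jobs)


if __name__ == "__main__":
    for jobs in (0, 1, 3, 10):
        print(f"assigned jobs={jobs} -> runner pods={desired_runner_count(jobs)}")
```

Under this model an empty queue yields zero pods, which matches the expectation that `kubectl get pods` in the runner namespace shows nothing until a job is queued, and a burst of ten jobs is capped at four pods.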
19. 🔎 Verify the Listener and Ephemeral Runner
ssh \
  -i ~/.ssh/id_ed25519_ansible \
  -o IdentitiesOnly=yes \
  acllc@192.168.8.202 \
  'sudo bash -s' <<'REMOTE'
set -euo pipefail
export KUBECONFIG=/etc/kubernetes/admin.conf
HELM=/usr/local/bin/helm
echo "=== HELM RELEASE ==="
"${HELM}" status <<REPOSITORY_NAME>>-dev-arc -n arc-runners-<<APP_SHORT_FORM>>-dev
echo
echo "=== AUTOSCALING RUNNER SET ==="
kubectl get autoscalingrunnerset.actions.github.com \
  <<REPOSITORY_NAME>>-dev-arc \
  -n arc-runners-<<APP_SHORT_FORM>>-dev \
  -o wide
echo
echo "=== LISTENER RESOURCE IN THE CONTROLLER NAMESPACE ==="
kubectl get autoscalinglisteners.actions.github.com \
  -n arc-systems \
  -o wide
echo
echo "=== STABLE LISTENER POD AND UID ==="
listener_pod=""
listener_uid=""
for attempt in $(seq 1 60); do
  listener_record="$(
    kubectl get pods \
      -n arc-systems \
      -o jsonpath='{range .items[*]}{.metadata.name}{"|"}{.metadata.uid}{"|"}{.status.phase}{"|"}{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}' \
      2>/dev/null |
      grep -E '^<<REPOSITORY_NAME>>-dev-arc-[a-z0-9]+-listener\|' |
      head -n 1 || true
  )"
  if [ -n "${listener_record}" ]; then
    IFS='|' read -r candidate_pod candidate_uid candidate_phase candidate_ready \
      <<< "${listener_record}"
    if [ "${candidate_phase}" = "Running" ] &&
      [ "${candidate_ready}" = "True" ]
    then
      listener_pod="${candidate_pod}"
      listener_uid="${candidate_uid}"
      break
    fi
  fi
  sleep 2
done
test -n "${listener_pod}"
test -n "${listener_uid}"
for attempt in $(seq 1 15); do
  listener_record="$(
    kubectl get pod \
      "${listener_pod}" \
      -n arc-systems \
      -o jsonpath='{.metadata.uid}{"|"}{.status.phase}{"|"}{.status.conditions[?(@.type=="Ready")].status}' \
      2>/dev/null || true
  )"
  test -n "${listener_record}"
  IFS='|' read -r current_uid current_phase current_ready \
    <<< "${listener_record}"
  test "${current_uid}" = "${listener_uid}"
  test "${current_phase}" = "Running"
  test "${current_ready}" = "True"
  listener_logs="$(
    kubectl logs \
      "${listener_pod}" \
      -n arc-systems \
      -c listener \
      --tail=200 \
      2>&1 || true
  )"
  if printf '%s\n' "${listener_logs}" |
    grep -Eq \
      'RunnerScaleSetSessionConflictException|already has an active session'
  then
    printf '%s\n' "${listener_logs}"
    echo "ERROR: Listener has an active GitHub session conflict."
    exit 1
  fi
  sleep 2
done
echo "Stable listener pod and UID: ${listener_pod} (${listener_uid})"
echo
echo "=== EPHEMERAL RUNNER RESOURCES ==="
kubectl get \
  ephemeralrunnersets.actions.github.com,ephemeralrunners.actions.github.com \
  -n arc-runners-<<APP_SHORT_FORM>>-dev \
  -o wide || true
echo
echo "=== RUNNER PODS ==="
kubectl get pods -n arc-runners-<<APP_SHORT_FORM>>-dev -o wide
echo
echo "=== APPROVED ENVIRONMENT WORKERS ==="
kubectl get nodes \
  -l environment=dev,workload=github-runner \
  -o wide
REMOTE
Watch the stable listener and ephemeral runner pods in separate terminals:
sudo kubectl --kubeconfig=/etc/kubernetes/admin.conf get pods -n arc-systems -o wide --watch
sudo kubectl --kubeconfig=/etc/kubernetes/admin.conf get pods -n arc-runners-<<APP_SHORT_FORM>>-dev -o wide --watch
FROM-SCRATCH RUNNER SCALE-SET ACCEPTANCE CHECKPOINT
Repository runner scale set
Tenant: <<TENANT_OR_PRODUCT_NAME>>
GitHub organization: <<GITHUB_ORGANIZATION>>
Repository: <<REPOSITORY_NAME>>
Deployment branch: dev
GitHub repository Environment: dev
Runner namespace: arc-runners-<<APP_SHORT_FORM>>-dev
Kubernetes GitHub App secret: <<APP_SHORT_FORM>>-arc-ghapp-secret
Helm release: <<REPOSITORY_NAME>>-dev-arc
Scale-set name / runs-on: <<REPOSITORY_NAME>>-dev-arc
Runner group: Default
Container mode: dind
Minimum idle runners: 0
Maximum runners: 4
AutoscalingRunnerSet: present in arc-runners-<<APP_SHORT_FORM>>-dev
AutoscalingListener: present in arc-systems
Listener pod: same Pod UID remains Running and Ready in arc-systems
Listener session conflict: absent
Idle runner pods at min=0: none — expected
Runner worker pool: dev
Branch isolation
dev workflows explicitly use runs-on: <<REPOSITORY_NAME>>-dev-arc
The scale set itself does not inspect Git branches
qa and prod require separate runner scale sets and different runs-on values
Recovery and validation
Rendered Helm resources are checked with kubectl --dry-run=server
The first internal install attempt is followed by an immediate listener probe
An exact 409 or rapid same-name listener Pod UID replacement triggers scoped cleanup
The role waits 120 seconds for the GitHub session lease to clear
The role performs exactly one clean Helm retry in the same workflow run
The same listener Pod UID must remain Running and Ready for at least 30 seconds
The source workflow verifies the resulting scale set and listener after reconciliation
Security
GitHub App private key committed: no
Kubernetes Secret manifest committed: no
GitHub App secret created at runtime in the runner namespace: yes
Temporary private-key copies removed after reconciliation: yes
Runner pods: ephemeral
Success rule
Do not continue until the prod reconciliation succeeds, the same listener Pod UID remains
Running and Ready without a 409 conflict, and the target repository smoke test completes on an ephemeral
runner pod scheduled to the selected environment worker pool.
20. 🩺 Failure Handling
No listener pod is created
sudo kubectl --kubeconfig=/etc/kubernetes/admin.conf get secret <<APP_SHORT_FORM>>-arc-ghapp-secret -n arc-runners-<<APP_SHORT_FORM>>-dev
sudo kubectl --kubeconfig=/etc/kubernetes/admin.conf get autoscalingrunnerset.actions.github.com <<REPOSITORY_NAME>>-dev-arc -n arc-runners-<<APP_SHORT_FORM>>-dev -o wide
sudo kubectl --kubeconfig=/etc/kubernetes/admin.conf get autoscalinglisteners.actions.github.com -n arc-systems -o wide
sudo kubectl --kubeconfig=/etc/kubernetes/admin.conf get pods -n arc-systems -o wide
sudo kubectl --kubeconfig=/etc/kubernetes/admin.conf get events -n arc-runners-<<APP_SHORT_FORM>>-dev --sort-by=.lastTimestamp
sudo kubectl --kubeconfig=/etc/kubernetes/admin.conf get events -n arc-systems --sort-by=.lastTimestamp
sudo /usr/local/bin/helm status <<REPOSITORY_NAME>>-dev-arc -n arc-runners-<<APP_SHORT_FORM>>-dev
Confirm the GitHub App secret exists in exactly the same namespace as the Helm release. The AutoscalingRunnerSet belongs to the runner namespace, while the AutoscalingListener and listener pod belong to the shared controller namespace.
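The namespace and key check can also be run against the JSON form of the Secret without decoding any value. This is a minimal sketch under two assumptions: the input is the parsed output of `kubectl get secret ... -o json`, and the shared role stores the credentials under the key names that ARC documents for GitHub App secrets (`github_app_id`, `github_app_installation_id`, `github_app_private_key`). The function name and sample values are hypothetical.

```python
from typing import Any, Dict, List

# Key names documented by ARC for a GitHub App secret; whether the shared role
# uses exactly these names is an assumption of this sketch.
REQUIRED_KEYS = (
    "github_app_id",
    "github_app_installation_id",
    "github_app_private_key",
)


def secret_problems(secret: Dict[str, Any], expected_namespace: str) -> List[str]:
    """List problems found in a parsed Secret object; never decodes values."""
    problems: List[str] = []
    namespace = secret.get("metadata", {}).get("namespace")
    if namespace != expected_namespace:
        problems.append(
            f"secret is in namespace {namespace!r}, expected {expected_namespace!r}"
        )
    data = secret.get("data") or {}
    for key in REQUIRED_KEYS:
        if not data.get(key):
            problems.append(f"missing or empty key: {key}")
    return problems


if __name__ == "__main__":
    # Placeholder object; real input would come from kubectl ... -o json.
    sample = {
        "metadata": {"namespace": "arc-runners-example-dev"},
        "data": {"github_app_id": "MTIz", "github_app_installation_id": "NDU2"},
    }
    for problem in secret_problems(sample, "arc-runners-example-dev"):
        print("PROBLEM:", problem)
```

An empty result means the Secret sits in the expected namespace and carries all three keys. It does not prove that the values are valid.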
The listener repeatedly becomes Running and then Error
sudo kubectl --kubeconfig=/etc/kubernetes/admin.conf logs -n arc-systems -l actions.github.com/scale-set-name=<<REPOSITORY_NAME>>-dev-arc -c listener --tail=200 --prefix || true
sudo kubectl --kubeconfig=/etc/kubernetes/admin.conf get autoscalinglisteners.actions.github.com -n arc-systems -o wide
Look for RunnerScaleSetSessionConflictException or already has an active session. Even when those lines disappear before collection, repeated recreation of the same listener name with changing Pod UIDs is treated as the same recoverable stale-session pattern. The shared role removes only the affected scale set and listener resources, waits 120 seconds, performs exactly one clean retry, and requires the same replacement Pod UID to remain Running and Ready for 30 seconds. A second instability is a real failure and is not hidden by an unlimited retry loop.
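The bounded recovery described above reduces to a short control flow. The sketch below is not the Ansible role itself. It models the same sequence with injected callables so the "exactly one retry" rule is explicit. All names are hypothetical.

```python
from typing import Callable


def reconcile_with_one_retry(
    install: Callable[[], None],
    listener_is_stable: Callable[[], bool],
    scoped_cleanup: Callable[[], None],
    wait: Callable[[int], None],
    lease_wait_seconds: int = 120,
) -> str:
    """Install, probe, and recover at most once from a stale session."""
    install()
    if listener_is_stable():
        return "stable-first-attempt"
    # Exact 409 or same-name Pod UID replacement: clean only this scale set,
    # then give the GitHub message-session lease time to expire.
    scoped_cleanup()
    wait(lease_wait_seconds)
    install()
    if listener_is_stable():
        return "stable-after-retry"
    # A second instability is reported, never retried again.
    raise RuntimeError("listener still unstable after the single retry")


if __name__ == "__main__":
    probe_results = iter([False, True])  # first probe fails, retry succeeds
    outcome = reconcile_with_one_retry(
        install=lambda: print("helm install"),
        listener_is_stable=lambda: next(probe_results),
        scoped_cleanup=lambda: print("scoped cleanup"),
        wait=lambda seconds: print(f"wait {seconds}s"),
    )
    print(outcome)
```

Passing `time.sleep` as `wait` would give the real delay. The injected no-op keeps the example fast.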
The workflow remains queued with “Waiting for a runner”
Confirm the target workflow contains:
runs-on: <<REPOSITORY_NAME>>-dev-arc
Then check:
GitHub repository: <<GITHUB_ORGANIZATION>>/<<REPOSITORY_NAME>>
Runner scale set: <<REPOSITORY_NAME>>-dev-arc
Source branch: dev
Namespace: arc-runners-<<APP_SHORT_FORM>>-dev
Runner pods remain Pending
sudo kubectl --kubeconfig=/etc/kubernetes/admin.conf get pods -n arc-runners-<<APP_SHORT_FORM>>-dev -o wide
sudo kubectl --kubeconfig=/etc/kubernetes/admin.conf describe pods -n arc-runners-<<APP_SHORT_FORM>>-dev
sudo kubectl --kubeconfig=/etc/kubernetes/admin.conf get nodes -l environment=dev,workload=github-runner --show-labels
sudo kubectl --kubeconfig=/etc/kubernetes/admin.conf describe nodes -l environment=dev,workload=github-runner | grep -A5 -E 'Taints:|Allocated resources:'
Docker commands fail inside the runner
Confirm containerMode.type is dind, then inspect the runner pod and the Docker sidecar. Docker-in-Docker uses a privileged container and should be limited to these isolated CI workers.
The GitHub App secret is rejected
Re-run the prod workflow after confirming the organization environment contains ARC_GITHUB_APP_ID, ARC_GITHUB_APP_INSTALLATION_ID, and ARC_GITHUB_APP_PRIVATE_KEY. The role reconciles the Kubernetes Secret without committing it.
More than one scale-set workflow starts from one commit
Keep every scale-set workflow path filter limited to its namespace manifest, definition, values, playbook, and workflow file. Do not add broad helm/tenants/**, kubernetes/tenants/**, or ansible/** filters to repository-specific workflows.
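A lint-style check can catch broad filters before they are merged. The sketch below applies a rule stricter than the text requires: it flags any path filter that contains a glob character, because every filter in this page's workflow is an exact file path. The function name is hypothetical.

```python
from typing import Iterable, List

GLOB_CHARACTERS = frozenset("*?[")


def broad_path_filters(paths: Iterable[str]) -> List[str]:
    """Return the filters that use glob syntax instead of an exact file path."""
    return [
        path
        for path in paths
        if any(character in GLOB_CHARACTERS for character in path)
    ]


if __name__ == "__main__":
    filters = [
        "helm/tenants/<<APP_SHORT_FORM>>/dev/<<REPOSITORY_NAME>>-values.yaml",
        ".github/workflows/arc-<<REPOSITORY_NAME>>-dev.yml",
        "ansible/**",
    ]
    for offending in broad_path_filters(filters):
        print(f"Too broad for a repository-specific workflow: {offending}")
```

Feeding it the `on.push.paths` list parsed from each scale-set workflow would flag entries such as `ansible/**` while leaving exact file paths alone.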
21. 🧹 Uninstall One Repository Runner Scale Set and Recreate It
Do not delete arc-runners-<<APP_SHORT_FORM>>-dev or <<APP_SHORT_FORM>>-arc-ghapp-secret. The environment namespace and GitHub App Secret may be shared by other repository scale sets. The command below removes only <<REPOSITORY_NAME>>-dev-arc, its listener, its ephemeral runner resources, and its remote working directory.
ssh \
  -i ~/.ssh/id_ed25519_ansible \
  -o IdentitiesOnly=yes \
  acllc@192.168.8.202 \
  'sudo bash -s' <<'REMOTE'
set -euo pipefail
export KUBECONFIG=/etc/kubernetes/admin.conf
HELM=/usr/local/bin/helm
RELEASE=<<REPOSITORY_NAME>>-dev-arc
SCALE_SET=<<REPOSITORY_NAME>>-dev-arc
RUNNER_NAMESPACE=arc-runners-<<APP_SHORT_FORM>>-dev
CONTROLLER_NAMESPACE=arc-systems
WORK_ROOT=/etc/ac-cicd-infra/arc-runner-scale-sets
echo "=== BEFORE: affected ${SCALE_SET} resources ==="
"${HELM}" status "${RELEASE}" -n "${RUNNER_NAMESPACE}" || true
kubectl get autoscalingrunnerset.actions.github.com \
  "${SCALE_SET}" \
  -n "${RUNNER_NAMESPACE}" \
  -o wide || true
kubectl get autoscalinglisteners.actions.github.com \
  -n "${CONTROLLER_NAMESPACE}" \
  -o name | grep -F "${SCALE_SET}" || true
kubectl get pods \
  -n "${CONTROLLER_NAMESPACE}" \
  -o name | grep -F "${SCALE_SET}" || true
kubectl get pods \
  -n "${RUNNER_NAMESPACE}" \
  -l "actions.github.com/scale-set-name=${SCALE_SET}" \
  -o name || true
echo
echo "=== Uninstall only the affected Helm release ==="
if "${HELM}" status "${RELEASE}" -n "${RUNNER_NAMESPACE}" >/dev/null 2>&1; then
  "${HELM}" uninstall \
    "${RELEASE}" \
    -n "${RUNNER_NAMESPACE}" \
    --wait \
    --timeout 5m
else
  echo "Helm release is already absent."
fi
echo
echo "=== Delete only lingering affected ARC resources ==="
kubectl delete autoscalingrunnerset.actions.github.com \
  "${SCALE_SET}" \
  -n "${RUNNER_NAMESPACE}" \
  --ignore-not-found=true \
  --wait=true \
  --timeout=3m
kubectl get autoscalinglisteners.actions.github.com \
  -n "${CONTROLLER_NAMESPACE}" \
  -o name 2>/dev/null | \
  grep -E "^autoscalinglistener\.actions\.github\.com/${SCALE_SET}-[a-z0-9]+-listener$" | \
  xargs -r kubectl delete \
    -n "${CONTROLLER_NAMESPACE}" \
    --wait=true \
    --timeout=3m || true
kubectl get pods \
  -n "${CONTROLLER_NAMESPACE}" \
  -o name 2>/dev/null | \
  grep -E "^pod/${SCALE_SET}-[a-z0-9]+-listener$" | \
  xargs -r kubectl delete \
    -n "${CONTROLLER_NAMESPACE}" \
    --wait=true \
    --timeout=2m || true
kubectl delete ephemeralrunnersets.actions.github.com \
  -n "${RUNNER_NAMESPACE}" \
  -l "actions.github.com/scale-set-name=${SCALE_SET}" \
  --ignore-not-found=true \
  --wait=true \
  --timeout=3m || true
kubectl delete ephemeralrunners.actions.github.com \
  -n "${RUNNER_NAMESPACE}" \
  -l "actions.github.com/scale-set-name=${SCALE_SET}" \
  --ignore-not-found=true \
  --wait=true \
  --timeout=3m || true
kubectl delete pods \
  -n "${RUNNER_NAMESPACE}" \
  -l "actions.github.com/scale-set-name=${SCALE_SET}" \
  --ignore-not-found=true \
  --wait=true \
  --timeout=2m || true
rm -rf "${WORK_ROOT}/${RELEASE}"
echo
echo "=== Verify the affected resources are absent ==="
for attempt in $(seq 1 90); do
  runner_set="$(
    kubectl get autoscalingrunnerset.actions.github.com \
      "${SCALE_SET}" \
      -n "${RUNNER_NAMESPACE}" \
      --ignore-not-found \
      -o name 2>/dev/null || true
  )"
  listeners="$(
    kubectl get autoscalinglisteners.actions.github.com \
      -n "${CONTROLLER_NAMESPACE}" \
      -o name 2>/dev/null | \
      grep -F "${SCALE_SET}" || true
  )"
  listener_pods="$(
    kubectl get pods \
      -n "${CONTROLLER_NAMESPACE}" \
      -o name 2>/dev/null | \
      grep -F "${SCALE_SET}" || true
  )"
  runner_pods="$(
    kubectl get pods \
      -n "${RUNNER_NAMESPACE}" \
      -l "actions.github.com/scale-set-name=${SCALE_SET}" \
      -o name 2>/dev/null || true
  )"
  if [ -z "${runner_set}" ] && \
    [ -z "${listeners}" ] && \
    [ -z "${listener_pods}" ] && \
    [ -z "${runner_pods}" ]
  then
    echo "All ${SCALE_SET} Kubernetes resources are absent."
    break
  fi
  if [ "${attempt}" -eq 90 ]; then
    echo "ERROR: Affected resources still exist after the cleanup timeout."
    exit 1
  fi
  sleep 2
done
echo
echo "=== Preserve shared ${RUNNER_NAMESPACE} resources ==="
kubectl get namespace "${RUNNER_NAMESPACE}"
kubectl get secret <<APP_SHORT_FORM>>-arc-ghapp-secret -n "${RUNNER_NAMESPACE}"
kubectl get autoscalingrunnerset.actions.github.com \
  -n "${RUNNER_NAMESPACE}" \
  -o wide || true
echo
echo "Do not delete ${RUNNER_NAMESPACE} or <<APP_SHORT_FORM>>-arc-ghapp-secret."
echo "They may be shared with other repository scale sets in the same product and environment."
echo
echo "Waiting 120 seconds for the GitHub message-session lease to clear..."
sleep 120
echo "Scoped uninstall and session-release wait completed."
REMOTE
After this cleanup, keep the Git-managed namespace manifest, scale-set definition, Helm values, playbook, workflow, target repository Environment, and target smoke-test workflow. They are the desired-state source used to recreate the scale set.
To adopt the corrected common role now, update the files in Section 11 and commit them using Section 15. Because common role paths intentionally do not trigger every repository workflow, manually dispatch this repository workflow on dev for Section 16, promote the change to prod, and manually dispatch it again for Section 17. Once the corrected role and existing repository files are already present on prod, a future cluster-only uninstall requires restarting at Section 17. Sections 8 through 14 do not need to be repeated unless their source files or GitHub mappings were deleted.
22. 🏁 From-Scratch Rebuild Checkpoint After This Page
FROM-SCRATCH REBUILD CHECKPOINT AFTER THIS PAGE
Expected shared platform
Load balancer, API VIP, control planes, and all worker pools operational
Shared ARC controller deployed and healthy
GitHub organization App and protected credential environment verified
Expected repository/environment resources
Environment runner namespace created
Runtime GitHub App Kubernetes Secret created in that namespace
Repository/environment Helm release deployed
AutoscalingRunnerSet present in the runner namespace
AutoscalingListener and same-Pod-UID stable listener present in arc-systems
minRunners=0 and maxRunners=4 applied
Docker-in-Docker container mode applied
Repository Environment and runs-on mapping created
Target repository smoke test completed successfully
Clean-rebuild consistency
No namespace, Secret, Helm release, listener, or runner pod is assumed to survive
from a previous cluster. Every resource above must be recreated and verified by
following this page after the Kubernetes VMs are rebuilt.
Next documentation step
Repeat the repository and deployment-environment sections for additional scale sets,
then add the application build-and-deploy workflow.