This is the working implementation of the lab: one AWS m5.metal RHEL host, nested virtualization, support services, a full disconnected OpenShift build, and the day-2 automation around it.

If you need the input checklist before a first build, start with PREREQUISITES.

Before pushing orchestration changes, run:

make validate

That gate catches YAML/task-file parse errors, shell syntax issues, top-level playbook syntax errors, and the cross-play credential/variable contracts that the bastion runner now depends on.
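As an illustration only, a gate like that could be wired up along these lines. This is a hypothetical Makefile fragment: the tool choices (yamllint, bash -n, ansible-playbook --syntax-check) and paths are assumptions, not the repo's actual target.

```make
# Hypothetical sketch of a validate lane; the repo's real target
# contents and paths may differ.
validate:
	yamllint roles/ vars/ playbooks/
	find . -name '*.sh' -print0 | xargs -0 -n1 bash -n
	ansible-playbook --syntax-check playbooks/site-lab.yml
	ansible-playbook --syntax-check playbooks/bootstrap/site.yml
```

Note that a lane like this catches parse and syntax errors cheaply before any play touches a live host; contract checks between plays would need an additional script on top.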

What This Repo Is

  • outer AWS IaaS scaffolding for virt-01
  • host bootstrap for KVM, libvirt, OVS, firewalld, Cockpit, and PCP
  • support guests:
    • idm-01
    • bastion-01
    • mirror-registry
  • deterministic authoritative IdM DNS publication and validation for static-IP support guests and cluster records
  • disconnected OpenShift install flow
  • fresh-install control-plane recovery during the agent-based bootstrap wait
  • day-2 operator and platform configuration
  • default cluster auth baseline:
    • HTPasswd breakglass
    • Keycloak OIDC backed by IdM
    • direct OpenShift LDAP auth disabled by default
  • formal auth architecture documented in AUTH MODEL
  • teardown and media-cleanup workflows
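To make the auth baseline above concrete, here is a hedged sketch of an OpenShift OAuth resource combining an HTPasswd breakglass IDP with a Keycloak OIDC provider. Provider names, secret names, and the Keycloak issuer URL are all illustrative assumptions, not values from this repo:

```yaml
# Hypothetical sketch of the auth baseline; provider names, secret
# names, and the issuer URL are illustrative, not taken from this repo.
apiVersion: config.openshift.io/v1
kind: OAuth
metadata:
  name: cluster
spec:
  identityProviders:
    - name: breakglass                 # HTPasswd breakglass accounts
      type: HTPasswd
      mappingMethod: claim
      htpasswd:
        fileData:
          name: htpasswd-secret        # Secret in openshift-config
    - name: keycloak                   # Keycloak OIDC backed by IdM
      type: OpenID
      mappingMethod: claim
      openID:
        clientID: openshift
        clientSecret:
          name: keycloak-client-secret
        issuer: https://keycloak.example.lab/realms/lab
        claims:
          preferredUsername: [preferred_username]
          email: [email]
          name: [name]
```

With direct LDAP auth disabled, IdM group membership reaches the cluster only through Keycloak's OIDC claims; see AUTH MODEL for the formal design.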

The build starts outside on the operator workstation, lands on virt-01, and then shifts to the bastion for the inside-facing lab and cluster work. The full run order lives in AUTOMATION FLOW. The runner split and the workstation-to-bastion handoff live in ORCHESTRATION PLUMBING.


Current validation status:

  • cluster build, mirrored-content consumption, and the default auth baseline (HTPasswd breakglass plus Keycloak OIDC) are working on the current lab
  • the latest bastion-side playbooks/site-lab.yml run completed successfully with rc=0 on the live environment
  • the repo validation lane (make validate) is clean
  • the final zero-intervention certification run of playbooks/site-lab.yml from a fresh teardown boundary is still pending

Validated Baseline

Latest fully validated cluster baseline, confirmed before host performance domains were introduced:

  • OpenShift 4.20.15
  • 3 masters at 8 vCPU / 24 GiB
  • 3 infra nodes at 16 vCPU / 48 GiB
  • 3 workers at 4 vCPU / 16 GiB
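As a back-of-envelope check, the baseline above comfortably fits on a single m5.metal host (96 vCPU / 384 GiB). A quick sketch of the arithmetic:

```python
# Capacity check for the validated baseline against one m5.metal
# host (96 vCPU / 384 GiB). Node sizes are from the baseline list.
nodes = {
    "master": (3, 8, 24),   # (count, vCPU, GiB)
    "infra":  (3, 16, 48),
    "worker": (3, 4, 16),
}
total_vcpu = sum(c * v for c, v, _ in nodes.values())
total_gib = sum(c * g for c, _, g in nodes.values())
print(total_vcpu, total_gib)  # 84 264
```

That leaves headroom for the support guests (idm-01, bastion-01, mirror-registry) and the host itself, before any CPU overcommit.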

The repo now also contains the newer host performance-domain design for virt-01. That work does not replace the baseline above; it builds on it by adding more intentional CPU management under contention and by making a worker uplift to 8 vCPU / 16 GiB a reasonable next default. See RESOURCE MANAGEMENT for the current CPU-management design, worker-uplift rationale, and rollout guidance.
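The worker uplift mentioned above might land as a vars change along these lines. The key names here are hypothetical; the actual structure lives in vars/guests/openshift_cluster_vm.yml:

```yaml
# Hypothetical vars sketch; real key names in
# vars/guests/openshift_cluster_vm.yml may differ.
openshift_cluster_vm:
  workers:
    count: 3
    vcpus: 8        # uplifted from the 4 vCPU baseline
    memory_gib: 16  # unchanged from the baseline
```

See RESOURCE MANAGEMENT before applying an uplift, since it interacts with the host CPU-management design under contention.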

Where To Change Things

If you need to change... start here:

  • AWS substrate and EBS intent → cloudformation/ and IAAS MODEL
  • Hypervisor bootstrap → playbooks/bootstrap/site.yml and roles/lab_*
  • Support guests → playbooks/bootstrap/*.yml, roles/idm*, roles/bastion*, roles/mirror_registry*
  • Cluster VM shells → playbooks/cluster/openshift-cluster.yml, roles/openshift_cluster/, vars/guests/openshift_cluster_vm.yml
  • Host CPU and VM tiering → vars/global/host_resource_management.yml, roles/lab_host_resource_management/, RESOURCE MANAGEMENT
  • Day-2 behavior → playbooks/day2/, roles/openshift_post_install_*, vars/day2/
  • Troubleshooting context → INVESTIGATING and ISSUES LEDGER
