This is the working implementation of the lab: one AWS m5.metal RHEL host,
nested virtualization, support services, a full disconnected OpenShift build,
and the day-2 automation around it.
If you need the input checklist before a first build, start with PREREQUISITES.
Before pushing orchestration changes, run:
```shell
make validate
```
That gate catches YAML/task-file parse errors, shell syntax issues, top-level playbook syntax errors, and violations of the cross-play credential/variable contracts that the bastion runner now depends on.
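As a rough illustration of the kinds of checks such a gate chains together, the sketch below mimics two of them: a `bash -n` parse of a shell script and a cross-play variable-contract check. The file path, target shape, and variable names here are hypothetical, not this repo's actual Makefile.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; the real lane lives in this repo's Makefile
# and also runs YAML and playbook syntax checks.
set -euo pipefail

# Shell syntax check: `bash -n` parses a script without executing it.
printf 'echo "hello"\n' > /tmp/validate-demo.sh
bash -n /tmp/validate-demo.sh && echo "shell syntax: ok"

# Cross-play contract check: fail fast if a variable one play produces
# and another consumes is missing (variable names are hypothetical).
produced="bastion_runner_password registry_pull_secret"
for var in bastion_runner_password registry_pull_secret; do
  case " $produced " in
    *" $var "*) ;;
    *) echo "missing contract var: $var" >&2; exit 1 ;;
  esac
done
echo "variable contracts: ok"
```

The value of running this as a pre-push gate is that contract breakage surfaces on the workstation, not halfway through a bastion-side run.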
## What This Repo Is
- outer AWS IaaS scaffolding for `virt-01`
- host bootstrap for KVM, libvirt, OVS, firewalld, Cockpit, and PCP
- support guests: `idm-01`, `bastion-01`, `mirror-registry`
- deterministic authoritative IdM DNS publication and validation for static-IP support guests and cluster records
- disconnected OpenShift install flow
- fresh-install control-plane recovery during the agent-based bootstrap wait
- day-2 operator and platform configuration
- default cluster auth baseline: `HTPasswd` break-glass plus Keycloak OIDC backed by IdM
- direct OpenShift LDAP auth disabled by default
- formal auth architecture documented in AUTH MODEL
- teardown and media-cleanup workflows
Build starts outside on the operator workstation, lands on virt-01, and then
shifts to bastion for the inside-facing lab and cluster work. The fuller run
order lives in AUTOMATION FLOW.
The runner split and workstation-to-bastion handoff live in
ORCHESTRATION PLUMBING.
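The runner split described above can be pictured as two phases: an outer phase driven from the operator workstation and an inner phase driven from the bastion. The playbook paths below appear elsewhere in this README; the exact invocations and any flags they take are assumptions.

```shell
#!/usr/bin/env bash
# Sketch of the two runner phases; invocation details are assumed.
set -euo pipefail

phases=(
  "workstation|ansible-playbook playbooks/bootstrap/site.yml"
  "bastion-01|ansible-playbook playbooks/site-lab.yml"
)
for phase in "${phases[@]}"; do
  printf 'run on %-12s -> %s\n' "${phase%%|*}" "${phase#*|}"
done
```

The handoff point between the two phases is where the workstation stops talking to `virt-01` directly and the bastion runner takes over the inside-facing work.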
Current validation status:
- cluster build, mirrored-content consumption, and the default auth baseline (`HTPasswd` break-glass plus Keycloak OIDC) are working on the current lab
- the latest bastion-side `playbooks/site-lab.yml` run completed successfully with `rc=0` on the live environment
- the repo validation lane (`make validate`) is clean
- the final zero-intervention certification run of `playbooks/site-lab.yml` from a fresh teardown boundary is still pending
## Validated Baseline
Latest fully validated cluster baseline, confirmed before host performance domains were introduced:
- OpenShift `4.20.15`
- 3 masters at `8 vCPU / 24 GiB`
- 3 infra nodes at `16 vCPU / 48 GiB`
- 3 workers at `4 vCPU / 16 GiB`
The repo now also contains the newer host performance-domain design for
`virt-01`. That work does not replace the baseline above; it builds on it by
adding more intentional CPU management under contention and by making a worker
uplift to 8 vCPU / 16 GiB a reasonable next default. See
RESOURCE MANAGEMENT for the
current CPU-management design, worker-uplift rationale, and rollout guidance.
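A back-of-the-envelope vCPU budget shows why contention management matters here: an m5.metal host exposes 96 vCPUs (48 physical cores with SMT), and the cluster sizes above nearly fill it. The arithmetic below uses only the node counts and sizes from this README; support guests and the host itself also need CPU on top of these totals.

```shell
#!/usr/bin/env bash
# Guest vCPU totals for the validated baseline vs. the proposed worker
# uplift, against the 96 vCPUs of an m5.metal host. Support guests
# (IdM, bastion, mirror registry) are not counted and add further
# pressure, which is what the CPU-management design addresses.
set -euo pipefail
host_vcpus=96
baseline=$(( 3*8 + 3*16 + 3*4 ))   # masters + infra + workers at 4 vCPU
uplift=$((   3*8 + 3*16 + 3*8 ))   # workers raised to 8 vCPU
echo "baseline cluster vCPUs: ${baseline}/${host_vcpus}"
echo "uplifted cluster vCPUs: ${uplift}/${host_vcpus}"
```

Since the uplifted total alone matches the host's vCPU count, any worker uplift implies deliberate oversubscription, hence the tiering and CPU-management work.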
## Where To Change Things
| If you need to change... | Start here |
|---|---|
| AWS substrate and EBS intent | `cloudformation/` and IAAS MODEL |
| Hypervisor bootstrap | `playbooks/bootstrap/site.yml` and `roles/lab_*` |
| Support guests | `playbooks/bootstrap/*.yml`, `roles/idm*`, `roles/bastion*`, `roles/mirror_registry*` |
| Cluster VM shells | `playbooks/cluster/openshift-cluster.yml`, `roles/openshift_cluster/`, `vars/guests/openshift_cluster_vm.yml` |
| Host CPU and VM tiering | `vars/global/host_resource_management.yml`, `roles/lab_host_resource_management/`, RESOURCE MANAGEMENT |
| Day-2 behavior | `playbooks/day2/`, `roles/openshift_post_install_*`, `vars/day2/` |
| Troubleshooting context | INVESTIGATING and ISSUES LEDGER |