Automation Flow

Operator Model

Use this page for the build order before you drop into playbooks and roles.

If you are starting from zero, read PREREQUISITES first.

The lab moves through three execution contexts:

  • AWS tenant and host provisioning for virt-01
  • operator workstation/bootstrap to virt-01
  • bastion-native execution from bastion-01

The host CPU-management design used by the bootstrap and guest-build phases is documented separately in:

The two entrypoints are playbooks/site-bootstrap.yml and playbooks/site-lab.yml.

If you need the internal execution model behind that split, including workstation validation, bastion staging, runner-state files, and dashboard handoff behavior, read:

If you need the formal current-state auth and authorization model, read:

If you need the planned future auth-policy model where AD becomes the user and group source of truth while IdM local groups remain the authorization boundary, read:

Flow Diagram

The SVG below is the easiest way to understand the happy path at a glance. It groups the work into the same execution phases the operator experiences in practice.

Automation flow

Phase Summary

  • Phase 1, outer infrastructure:
    • create the tenant and host stacks so virt-01 exists with the expected network and guest-volume substrate
  • Phase 2, workstation to virt-01:
    • bootstrap the hypervisor
    • build bastion-01
    • stage the project onto bastion
  • Phase 3, bastion-side support services:
    • optionally build ad-01 with AD DS and AD CS
    • build idm-01
    • join bastion-01 to IdM after identity services are ready
    • reassert and validate authoritative A/PTR records for static-IP support guests instead of relying on client-side dynamic DNS updates
    • build mirror-registry
    • publish OpenShift DNS
    • prepare installer binaries, artifacts, and agent media
  • Phase 4, cluster and day-2:
    • create the nine nested OpenShift guests
    • wait for install completion
    • normalize the domain boot state and apply the baseline day-2 config
    • converge on:
    • HTPasswd breakglass plus Keycloak OIDC for OpenShift
    • Keycloak OIDC for AAP
    • AD-backed user login through the IdM/Keycloak path when trust is enabled

Where each step runs

Steps Where What happens
1-6 Operator workstation AWS stacks, hypervisor bootstrap, bastion build, bastion staging
7-20 bastion-01 optional AD, IdM, bastion join, mirror registry, DNS, cluster build, day-2 configuration

Important

Pick a side and stay on it. Steps 1-6 run from the operator workstation. Steps 7-20 run from the bastion. The project does not account for switching execution context mid-stream. If you start a bastion-side step from the workstation and then run the next step directly on bastion (or vice versa), generated state will diverge and later steps will fail in ways that are hard to diagnose.

Command shorthand

  • RUN LOCALLY — from the operator workstation at the project root
  • RUN ON BASTION — from bastion-01 at /opt/openshift/aws-metal-openshift-demo
  • ./scripts/run_local_playbook.sh — runs a workstation-side play with tracked PID/log/RC state under ~/.local/state/calabi-playbooks/
  • ./scripts/run_remote_bastion_playbook.sh — runs a bastion play from the workstation (restages first)
  • ./scripts/lab-dashboard.sh — can now run from the workstation before the bastion handoff, then switch to bastion-native runner state after handoff

Note

site-lab.yml does not start directly on bastion. It first runs local validation and bastion-stage.yml, then SSH-handoffs into the bastion-side tracked runner. That execution plumbing is documented separately in ORCHESTRATION PLUMBING.

  1. cloudformation/deploy-stack.sh tenant
    • Renders and deploys the AWS tenant stack for the VPC, subnet, route table, and persistent Elastic IP reserved for virt-01.
    • RUN LOCALLY
      ./cloudformation/deploy-stack.sh tenant
      
  2. cloudformation/deploy-stack.sh host
    • Renders and deploys the host-only CloudFormation stack for virt-01, its first-boot cloud-init configuration, security group, imported key pair, and the attached guest EBS volume set.
    • This remains the rebuild entrypoint when the AWS tenant already exists.
    • RUN LOCALLY
      ./cloudformation/deploy-stack.sh host
      
  3. playbooks/site-bootstrap.yml
    • Runs the outside-facing bootstrap phase.
    • RUN LOCALLY
      ansible-playbook -i inventory/hosts.yml playbooks/site-bootstrap.yml
      
      • Or with tracked local runner state for workstation-side dashboarding:
      ./scripts/run_local_playbook.sh playbooks/site-bootstrap.yml
      
  4. playbooks/bootstrap/site.yml
    • Waits for the full expected guest-disk inventory, derives the active AWS EBS mapping by GuestDisk tag, installs the inventory-driven /dev/ebs/* mapping, enforces virt-01.workshop.lan, configures RHSM/CDN access, registers the hypervisor with Red Hat Insights, ensures ec2-user is unlocked for Cockpit login, updates and reboots the hypervisor when required, installs the Cockpit and PCP host-management stack, configures manager-level host CPUAffinity plus the Gold/Silver/Bronze slice units, and configures lab-switch, libvirt networking, and host NAT.
    • RUN LOCALLY
      ansible-playbook -i inventory/hosts.yml playbooks/bootstrap/site.yml
      
  5. playbooks/bootstrap/bastion.yml
    • Builds bastion-01 on VLAN 100.
    • Enables RHSM/Insights and the bastion management package set, but does not join IdM yet. Bastion enrollment now happens later through playbooks/bootstrap/bastion-join.yml after identity services are ready.
    • RUN LOCALLY
      ansible-playbook -i inventory/hosts.yml playbooks/bootstrap/bastion.yml
      
  6. playbooks/bootstrap/bastion-stage.yml
    • Synchronizes the repo onto the bastion with rsync, preserving bastion-side generated/ content and restaging the pull secret and SSH key.
    • Renders the bastion-local inventory.
    • Installs bastion execution prerequisites, including python3-pip and the Python requirements needed for WinRM-backed Windows orchestration.
    • Installs the bastion profile snippet and user helper links so cloud-user and IdM admins land with working oc, kubectl, openshift-install, helper scripts, and a conditional KUBECONFIG.
    • RUN LOCALLY
      ansible-playbook -i inventory/hosts.yml playbooks/bootstrap/bastion-stage.yml
      

Bastion boundary — all remaining work runs from bastion-01

Warning

Everything below this line runs on the bastion. Do not switch back to the operator workstation for steps 7-20 unless you are deliberately debugging the automation itself. The golden path is bastion-native execution from this point forward. Once you cross this boundary, stay on bastion.

For resilient long-running execution, the bastion helper scripts/run_bastion_playbook.sh writes PID, log, and exit-code state under /var/tmp/bastion-playbooks/.

Bastion staging restores cloud-user ownership on the staged generated/ workspace so repeated cluster renders can recreate generated/ocp cleanly.

  1. playbooks/site-lab.yml
    • Runs the full inside-facing lab phase from the bastion.
    • Imports the validated support-service order: optional ad-server, idm, optional idm-ad-trust, bastion-join, then mirror-registry, followed by cluster preparation, cluster build, validation, and day-2.
    • Support VMs (ad-01, idm-01, bastion-01, and mirror-registry) now default to preserving their existing disks and libvirt domains on rerun instead of being rebuilt automatically. A deliberate fresh support-stack replay also needs the support guest block devices wiped; VM removal alone is no longer treated as a true clean boundary.
    • The mirror-registry phase now records successful mirror completion for the rendered content set and skips the expensive oc-mirror execution on rerun unless a force flag is set.
    • After a successful support-services bring-up, the preferred recovery path is to replay site-lab.yml and let healthy support phases short-circuit.
    • For a deliberate fresh-cluster replay that preserves support services, use the cluster-only cleanup path first, then rerun site-lab.yml.
    • On reruns, the day-2 portion now probes the major post-install phases and skips ones that are already configured and healthy.
    • The current supported day-2 auth baseline is:
      • OpenShift: HTPasswd breakglass plus Keycloak OIDC
      • AAP: Keycloak OIDC with the same Keycloak realm
    • Direct AAP LDAP is no longer the preferred clean-build path.
    • Destructive ODF recovery is not part of a normal rerun. It must be explicitly forced with -e openshift_post_install_force_odf_rebuild=true (or the legacy openshift_post_install_odf_force_osd_device_reset=true).
    • RUN ON BASTION
      ansible-playbook -i inventory/hosts.yml playbooks/site-lab.yml
      
      • Alternatively, from the workstation:
      ./scripts/run_remote_bastion_playbook.sh playbooks/site-lab.yml
      
  2. playbooks/bootstrap/ad-server.yml
    • Builds ad-01.corp.lan from the bastion-native path when lab_build_ad_server=true.
    • The validated path provisions Windows Server 2025 on /dev/ebs/ad-01, enables WinRM, installs guest tools and remaining virtio drivers, then configures AD DS, AD CS, Web Enrollment, demo users and groups, and exports the root CA.
    • This phase is optional and default-disabled.
    • RUN ON BASTION
      ansible-playbook -i inventory/hosts.yml playbooks/bootstrap/ad-server.yml \
       -e lab_build_ad_server=true
      
      • Alternatively, from the workstation:
      ./scripts/run_remote_bastion_playbook.sh playbooks/bootstrap/ad-server.yml \
       -e lab_build_ad_server=true
      
  3. playbooks/bootstrap/idm.yml
    • Builds idm-01, configures DNS/CA/KRA, Cockpit, session recording, RHSM/Insights, and IPA data.
    • In the current validated flow this runs from the bastion, after bastion staging and after the optional AD build when enabled.
    • The IdM install path uses the FreeIPA server role for server/KRA and FreeIPA modules for users, groups, password policies, and sudo rules.
    • RUN ON BASTION
      ansible-playbook -i inventory/hosts.yml playbooks/bootstrap/idm.yml
      
      • Alternatively, from the workstation:
      ./scripts/run_remote_bastion_playbook.sh playbooks/bootstrap/idm.yml
      
  4. playbooks/bootstrap/idm-ad-trust.yml
    • Configures the optional IdM-to-AD trust after both support directories are available.
    • Ensures the AD conditional forwarder for workshop.lan, enables IdM AD trust support, creates the IPA forward zone for corp.lan, establishes the trust, and nests the mapped IdM external groups into the target local policy groups.
    • RUN ON BASTION
      ansible-playbook -i inventory/hosts.yml playbooks/bootstrap/idm-ad-trust.yml \
       -e lab_build_ad_server=true
      
      • Alternatively, from the workstation:
      ./scripts/run_remote_bastion_playbook.sh playbooks/bootstrap/idm-ad-trust.yml \
       -e lab_build_ad_server=true
      
  5. playbooks/bootstrap/bastion-join.yml
    • Joins the already-built bastion to IdM after identity services are ready.
    • Refreshes the IdM CA, enrolls the host, and enables with-mkhomedir plus with-sudo so domain users receive home directories and SSSD sudo rules on first login. The join path no longer performs a general bastion update or reboot; that remains part of the earlier site-bootstrap.yml flow.
    • RUN ON BASTION
      ansible-playbook -i inventory/hosts.yml playbooks/bootstrap/bastion-join.yml
      
      • Alternatively, from the workstation:
      ./scripts/run_remote_bastion_playbook.sh playbooks/bootstrap/bastion-join.yml
      
  6. playbooks/lab/mirror-registry.yml
    • Builds mirror-registry, joins it to IdM, installs Quay, and prepares disconnected content tooling.
    • Static-IP support guests no longer rely on SSSD dynamic DNS updates. The play reasserts the mirror-registry A/PTR records in authoritative IdM DNS and validates them before returning.
    • Default disconnected path is now portable, which runs both m2d and d2m in the same playbook invocation.
    • The import-only override remains available when an existing archive should be pushed without rerunning the pull phase: -e mirror_registry_content_mode_override=import -e mirror_registry_content_workflow_override=d2m.
    • The bastion installs:
      • /usr/local/bin/track-mirror-progress
      • /usr/local/bin/track-mirror-progress-tmux
    • Subsequent m2d and d2m runs also write guest-side logs such as:
      • /var/log/oc-mirror-m2d.log
      • /var/log/oc-mirror-d2m.log
    • RUN ON BASTION
      ansible-playbook -i inventory/hosts.yml playbooks/lab/mirror-registry.yml
      
      • Alternatively, from the workstation:
      ./scripts/run_remote_bastion_playbook.sh playbooks/lab/mirror-registry.yml
      
  7. playbooks/lab/openshift-dns.yml
    • Creates the cluster DNS zones and records in IdM.
    • RUN ON BASTION
      ansible-playbook -i inventory/hosts.yml playbooks/lab/openshift-dns.yml
      
      • Alternatively, from the workstation:
      ./scripts/run_remote_bastion_playbook.sh playbooks/lab/openshift-dns.yml
      
  8. playbooks/cluster/openshift-installer-binaries.yml
    • Downloads the exact OpenShift installer/client toolchain for the pinned mirrored release on the bastion.
    • RUN ON BASTION
      ansible-playbook -i inventory/hosts.yml playbooks/cluster/openshift-installer-binaries.yml
      
      • Alternatively, from the workstation:
      ./scripts/run_remote_bastion_playbook.sh playbooks/cluster/openshift-installer-binaries.yml
      
  9. playbooks/cluster/openshift-install-artifacts.yml
    • Renders install-config.yaml, agent-config.yaml, and the IdM CA bundle on the bastion.
    • agent-config.yaml now renders per-node rootDeviceHints.serialNumber values from the libvirt root-disk serials instead of a hardcoded HCTL hint.
    • RUN ON BASTION
      ansible-playbook -i inventory/hosts.yml playbooks/cluster/openshift-install-artifacts.yml
      
      • Alternatively, from the workstation:
      ./scripts/run_remote_bastion_playbook.sh playbooks/cluster/openshift-install-artifacts.yml
      
  10. playbooks/cluster/openshift-agent-media.yml
    • Generates agent.x86_64.iso on the bastion and publishes it to virt-01.
    • RUN ON BASTION
      ansible-playbook -i inventory/hosts.yml playbooks/cluster/openshift-agent-media.yml
      
      • Alternatively, from the workstation:
      ./scripts/run_remote_bastion_playbook.sh playbooks/cluster/openshift-agent-media.yml
      
  11. playbooks/cluster/openshift-cluster.yml
    • Builds the 9 nested OpenShift VMs, attaches the agent ISO, and boots them.
    • RUN ON BASTION
      ansible-playbook -i inventory/hosts.yml playbooks/cluster/openshift-cluster.yml
      
      • Alternatively, from the workstation:
      ./scripts/run_remote_bastion_playbook.sh playbooks/cluster/openshift-cluster.yml
      
  12. playbooks/cluster/openshift-install-wait.yml
    • Runs openshift-install wait-for bootstrap-complete and openshift-install wait-for install-complete from the bastion.
    • On fresh agent-based installs, it also recovers control-plane nodes that remain on the agent ISO by ejecting install media, restoring disk-first boot, and power-cycling the affected domains before or just after bootstrap as needed.
    • RUN ON BASTION
      ansible-playbook -i inventory/hosts.yml playbooks/cluster/openshift-install-wait.yml
      
      • Alternatively, from the workstation:
      ./scripts/run_remote_bastion_playbook.sh playbooks/cluster/openshift-install-wait.yml
      
  13. playbooks/day2/openshift-post-install-validate.yml
    • Verifies the cluster is ready for day-2 configuration before the aggregated post-install play runs.
    • RUN ON BASTION
      ansible-playbook -i inventory/hosts.yml playbooks/day2/openshift-post-install-validate.yml
      
      • Alternatively, from the workstation:
      ./scripts/run_remote_bastion_playbook.sh playbooks/day2/openshift-post-install-validate.yml
      
  14. playbooks/day2/openshift-post-install.yml
    • Applies day-2 configuration.
    • The current default baseline order is: disconnected OperatorHub, infra conversion, IdM ingress certs, HTPasswd breakglass auth, NMState, ODF, Keycloak, OIDC auth, optional legacy LDAP auth, Virtualization, Pipelines, Web Terminal, AAP, NetObserv, then validation.
    • The supported default auth model is: breakglass HTPasswd plus Keycloak OIDC. Direct OpenShift LDAP auth is disabled by default and retained only as an optional compatibility path.
    • Healthy major phases are skipped on rerun unless their force flag is set.
    • RUN ON BASTION
      ansible-playbook -i inventory/hosts.yml playbooks/day2/openshift-post-install.yml
      
      • Alternatively, from the workstation:
      ./scripts/run_remote_bastion_playbook.sh playbooks/day2/openshift-post-install.yml
      
  15. playbooks/maintenance/detach-install-media.yml
    • Ejects cidata and agent.x86_64.iso and restores disk-only boot intent.
    • For support guests, the persistent CD-ROM device is also removed from the libvirt XML.
    • For OpenShift cluster guests, the important success condition is that the agent ISO is no longer attached and boot order is back to disk. A live empty CD-ROM shell may remain until a later reboot.
    • Support guests also do this earlier in their own lifecycle, before the first update reboot, so the reboot clears any remaining live empty CD-ROM shell that libvirt could not hot-unplug.
    • RUN ON BASTION
      ansible-playbook -i inventory/hosts.yml playbooks/maintenance/detach-install-media.yml
      
      • Alternatively, from the workstation:
      ansible-playbook -i inventory/hosts.yml playbooks/maintenance/detach-install-media.yml
      

Certificate Design

  • Mirror registry:
    • Fresh builds default to IdM-issued certificates.
  • OpenShift ingress:
    • The intended supported custom-certificate path is *.apps.ocp.workshop.lan.
  • The ingress workflow also applies the IdM CA into cluster trust so route health checks keep working after the custom certificate rollout.
  • OpenShift API:
    • The project no longer tries to replace the in-cluster API serving certificate.
    • Admin access relies on the cluster CA embedded in the generated kubeconfig.

Current State

  • the workflow has four operator entrypoints:
  • once the bastion is staged, guest-side management on VLAN 100 is performed directly from the bastion rather than proxied back through virt-01
  • playbooks/site-lab.yml now begins with support services in this order:
    • optional ad-server
    • idm
    • optional idm-ad-trust
    • bastion-join
    • mirror-registry
  • the bastion now also presents a ready-to-use shell environment for cloud-user and IdM admins, including helper links under $HOME/bin, cluster artifacts under $HOME/etc, and conditional login-time KUBECONFIG
  • playbooks/maintenance/cleanup.yml remains the aggregated teardown entrypoint
  • the latest validated rebuild path reaches:
    • tenant stack
    • host stack
    • hypervisor bootstrap
    • bastion-01
    • bastion staging
    • ad-01 when explicitly enabled
    • idm-01
    • bastion join
    • mirror-registry
    • OpenShift cluster install through openshift-install wait-for install-complete
  • the currently proven post-install/auth state on the resulting cluster is:
    • HTPasswd breakglass login works
    • kubeadmin is retired
    • Keycloak is deployed from mirrored content
    • OpenShift OAuth uses Keycloak OIDC and groups claim sync
    • openshift-admin is bound to cluster-admin
  • the remaining confidence step is one uninterrupted site-lab.yml run on the current codebase from a deliberate teardown boundary, without live fixes
  • support-service DNS publication is now explicit and authoritative:
    • static-IP bastion and mirror-registry enrollment does not depend on client DNS updates
    • IdM and OpenShift DNS publishing phases validate the new records before returning

Continue