Teaching Reference

Step-by-step companion reading for understanding what the automation does under the hood.

Teaching reference. This page explains what the automation does step by step. Do not follow it as the primary build path; use Automation Flow for the supported deployment sequence.

Manual Process

Use this page to understand what the automation does step by step, teach the flow to someone else, or inspect the underlying sequence without treating the automation as a black box.

If you are building Calabi in the supported way, do not start here. Start with PREREQUISITES and then follow AUTOMATION FLOW.

Keep these pages nearby while you use this teaching reference:

Do not read this as a byte-for-byte dump of every Ansible task. Read it as the teaching companion to the supported build and day-2 flow.

When bastion-native playbooks need to be rerun after local repository changes, the staged repo on bastion is refreshed in place so generated/ output is not thrown away between runs.

Important

The validated support-services order changed. The current golden path, in order, is:

  • build bastion-01
  • stage the project to bastion
  • optionally build ad-01 with AD DS and AD CS
  • build idm-01
  • optionally configure the IdM to AD trust
  • join the bastion to IdM
  • continue with mirror-registry, OpenShift DNS, and cluster work

The legacy section numbering is retained below so older deep links do not break.

The command examples use these neutral placeholders:

  • <operator-ssh-key>: the SSH private key used from the operator workstation
  • <hypervisor-public-ip>: the reachable public IP of virt-01, preferably a persistent Elastic IP
  • <project-root>: the local checkout of this project on the current execution host
  • <rhel10-image-path>: the local RHEL 10.1 qcow2 image path on virt-01
  • <pull-secret-file>: the local Red Hat pull-secret file
  • <operator-public-key>: the SSH public key injected into guest cloud-init
  • <ec2-user-password-hash>: a SHA-512 password hash for ec2-user
  • <lab-default-password>: the default demonstration password used for guest cloud-init, IdM bootstrap, and related manual examples

Where each step runs

Steps     Where                            What happens
1-13      Operator workstation / virt-01   AWS stacks, hypervisor, bastion build, bastion staging
13A-36    bastion-01                       optional AD, IdM, bastion join, mirror registry, DNS, cluster build, day-2, debugging

Important

Pick a side and stay on it. Steps 1-13 run from the operator workstation against virt-01. Steps 13A-36 run from the bastion. The project does not account for switching execution context mid-stream. Once you cross the bastion boundary at step 13A, stay on bastion.

Table Of Contents

Use this like a runbook, not a novel. Jump to the phase you actually need.

Outer Cloud And Host Bring-up

Support Services

Validated support-services order:

Legacy section order retained below:

Cluster Bring-up

Day-2 And Follow-on Work

1. Provision The AWS IaaS Layer

Build the AWS substrate by hand: first the shared tenant layer, then the virt-01 host layer inside it.

For a full fresh environment, create the tenant substrate first, then create the host substrate inside it. For a later virt-01 rebuild inside an existing tenant, only repeat the host stack.

Shell
# Create the tenant and host CloudFormation stacks and inspect their outputs.
cd <project-root>
cat <<'EOF' >cloudformation/parameters.tenant.json
[
  { "ParameterKey": "LabPrefix", "ParameterValue": "workshop" },
  { "ParameterKey": "AvailabilityZone", "ParameterValue": "us-east-2a" },
  { "ParameterKey": "VpcCidr", "ParameterValue": "10.0.0.0/16" },
  { "ParameterKey": "PublicSubnetCidr", "ParameterValue": "10.0.0.0/20" }
]
EOF
./cloudformation/deploy-stack.sh tenant virt-tenant cloudformation/parameters.tenant.json
aws cloudformation describe-stacks \
  --stack-name virt-tenant \
  --query 'Stacks[0].Outputs'
cat <<'EOF' >cloudformation/parameters.host.json
[
  { "ParameterKey": "LabPrefix", "ParameterValue": "workshop" },
  { "ParameterKey": "AvailabilityZone", "ParameterValue": "us-east-2a" },
  { "ParameterKey": "ExistingVpcId", "ParameterValue": "vpc-REPLACE_FROM_VIRT_TENANT_OUTPUTS" },
  { "ParameterKey": "ExistingSubnetId", "ParameterValue": "subnet-REPLACE_FROM_VIRT_TENANT_OUTPUTS" },
  { "ParameterKey": "PersistentPublicIpAllocationId", "ParameterValue": "eipalloc-REPLACE_FROM_VIRT_TENANT_OUTPUTS" },
  { "ParameterKey": "VirtHostPrivateIp", "ParameterValue": "10.0.8.207" },
  { "ParameterKey": "AdminIngressCidr", "ParameterValue": "0.0.0.0/0" },
  { "ParameterKey": "VirtHostInstanceType", "ParameterValue": "m5.metal" },
  { "ParameterKey": "RedHatRhelPrivateAmiId", "ParameterValue": "ami-REPLACE_WITH_RHEL_10_1_AMI" },
  { "ParameterKey": "ImportedKeyPairName", "ParameterValue": "virt-lab-key" },
  { "ParameterKey": "ImportedPublicKeyMaterial", "ParameterValue": "ssh-ed25519 AAAA_REPLACE_WITH_REAL_PUBLIC_KEY" },
  { "ParameterKey": "Ec2UserPasswordHash", "ParameterValue": "<ec2-user-password-hash>" },
  { "ParameterKey": "RootVolumeSizeGiB", "ParameterValue": "100" },
  { "ParameterKey": "RootVolumeIops", "ParameterValue": "3000" },
  { "ParameterKey": "RootVolumeThroughput", "ParameterValue": "125" }
]
EOF
./cloudformation/deploy-stack.sh host virt-host cloudformation/parameters.host.json
aws cloudformation describe-stacks \
  --stack-name virt-host \
  --query 'Stacks[0].Outputs'

2. Verify First-Boot Access To virt-01

Verify the new virt-01 host is reachable, initialized correctly, and ready for the remaining hypervisor work.

Note

Automation reference: first-boot cloud-init from the host CloudFormation templates, followed by playbooks/bootstrap/site.yml.

Verify that cloud-init completed, the operator SSH key was installed for ec2-user, and Cockpit was enabled for SOCKS-proxied browser access.

Shell
ssh -i <operator-ssh-key> ec2-user@<hypervisor-public-ip> 'hostnamectl; systemctl is-active cockpit.socket'
# Example SOCKS proxy for Cockpit access without opening TCP/9090 in the security group.
ssh -i <operator-ssh-key> -D 5555 ec2-user@<hypervisor-public-ip>
# Browser configuration:
#   SOCKS5 proxy: 127.0.0.1:5555
#   Proxy DNS through SOCKS: enabled
#   Cockpit URL: https://<hypervisor-public-ip>:9090/
#
# Authenticate to Cockpit as:
#   user: ec2-user
#   password: the plaintext corresponding to <ec2-user-password-hash>

Fresh AWS RHEL images can still leave ec2-user locked even when a password hash is present. The orchestration now explicitly unlocks the account. The manual equivalent is:

Shell
# Unlock ec2-user if the first-boot image left the account locked.
ssh -i <operator-ssh-key> ec2-user@<hypervisor-public-ip> 'sudo usermod -U ec2-user'

Ensure the hypervisor identity is correct.

Shell
# Set the hypervisor hostname and ensure the local hosts entry is present.
ssh -i <operator-ssh-key> ec2-user@<hypervisor-public-ip> <<'EOF'
sudo hostnamectl set-hostname virt-01.workshop.lan
grep -q '^127.0.1.1 virt-01.workshop.lan virt-01$' /etc/hosts || \
  echo '127.0.1.1 virt-01.workshop.lan virt-01' | sudo tee -a /etc/hosts
EOF

3. Install Deterministic /dev/ebs Host Naming

Create deterministic /dev/ebs/* names on the hypervisor from the live AWS volume attachments.

Note

Automation reference: playbooks/bootstrap/site.yml, role lab_host_base.

Derive the active guest-disk map from the current AWS attachments by GuestDisk tag, then render the host naming layer from that live map. This avoids stale EBS volume IDs after a rebuild.

Shell
# Create the /dev/ebs naming layer from the current AWS volume attachments.
sudo install -d -m 0755 /dev/ebs
cat <<'EOF' | sudo tee /etc/tmpfiles.d/ebs-friendly.conf
d /dev/ebs 0755 root root -
EOF
sudo systemd-tmpfiles --create /etc/tmpfiles.d/ebs-friendly.conf
INSTANCE_ID="$(curl -fsS http://169.254.169.254/latest/meta-data/instance-id)"
aws ec2 describe-volumes \
  --filters Name=attachment.instance-id,Values=${INSTANCE_ID} \
           Name=tag-key,Values=GuestDisk \
  --query "Volumes[].{volume_id:VolumeId,guest_disk:Tags[?Key=='GuestDisk']|[0].Value}" \
  --output json >/tmp/guest-volumes.json
python3 - <<'PY' | sudo tee /etc/udev/rules.d/99-ebs-friendly.rules
import json
from pathlib import Path
vols = json.loads(Path('/tmp/guest-volumes.json').read_text())
print('# Managed from live AWS GuestDisk-tagged attachments.')
for vol in sorted(vols, key=lambda v: v['guest_disk']):
    serial = vol['volume_id'].replace('-', '')
    guest = vol['guest_disk']
    print(
        f'ACTION=="add|change", SUBSYSTEM=="block", ENV{{DEVTYPE}}=="disk", '
        f'KERNEL=="nvme*n1", ENV{{ID_MODEL}}=="Amazon Elastic Block Store", '
        f'ENV{{ID_SERIAL_SHORT}}=="{serial}", SYMLINK+="ebs/{guest}"'
    )
PY
sudo udevadm control --reload-rules
sudo udevadm trigger --subsystem-match=block --action=change
sudo udevadm settle
ls -1 /dev/ebs
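
One way to confirm the naming layer after the udev trigger is to check that every expected guest-disk name resolves under /dev/ebs. The sketch below is illustrative only: missing_ebs_links is a hypothetical helper that is not part of the project, and the names passed to it are assumed to match your GuestDisk tags (idm-01 and bastion-01 are the device names used by the later VM build steps).

```shell
# Print every expected guest-disk name that does not resolve under the given directory.
# Usage: missing_ebs_links <dir> <name>...
# Hypothetical helper for illustration; not part of the project.
missing_ebs_links() {
  local dir="$1" name
  shift
  for name in "$@"; do
    # -e follows symlinks, so a dangling link is reported as missing too.
    [ -e "${dir}/${name}" ] || printf '%s\n' "$name"
  done
}

# Example call on the hypervisor (prints nothing when all names resolve):
#   missing_ebs_links /dev/ebs idm-01 bastion-01
```

An empty result means every listed name is present; any printed name points at a volume that is unattached, untagged, or not yet picked up by udev.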

4. Prepare The Hypervisor

Prepare the hypervisor base OS, repositories, and core services for the lab.

Note

Automation reference: playbooks/bootstrap/site.yml, role lab_host_base.

Update redhat-release before the main system update. This ensures the current Red Hat Post-Quantum Cryptography public keys are present before DNF validates newer packages. See: https://access.redhat.com/solutions/3449341

Install the required host packages, enable the Red Hat fast-datapath repo for OVS, and turn on the core host services.

Shell
# Log into the hypervisor, install the base packages, enable the required services, and reboot.
ssh -i <operator-ssh-key> ec2-user@<hypervisor-public-ip>
sudo -i
timeout 30s subscription-manager repos \
  --enable fast-datapath-for-rhel-10-x86_64-rpms
dnf -y install insights-client
insights-client --register
dnf -y install \
  firewalld \
  qemu-kvm \
  qemu-img \
  libvirt \
  virt-install \
  virt-viewer \
  virt-top \
  guestfs-tools \
  genisoimage \
  openvswitch3.6 \
  cockpit \
  cockpit-files \
  cockpit-machines \
  cockpit-podman \
  cockpit-session-recording \
  cockpit-image-builder \
  pcp \
  pcp-system-tools \
  tmux \
  jq
dnf -y update redhat-release
dnf -y update
systemctl enable firewalld
systemctl enable cockpit.socket
systemctl enable openvswitch
systemctl enable osbuild-composer.socket
systemctl enable pmcd.service pmlogger.service pmproxy.service
systemctl enable virtqemud.socket
systemctl enable virtnetworkd.socket
systemctl enable virtstoraged.socket
systemctl enable virtlogd.socket
reboot

Apply The Host Resource-Management Policy

Apply the host CPU-placement and systemd slice policy used by the lab.

Note

Automation reference: playbooks/bootstrap/site.yml, role lab_host_resource_management.

The current settled design keeps manager-level systemd CPUAffinity and the Gold/Silver/Bronze slice units, but it does not set kernel affinity boot args or an irqbalance guest-domain ban by default.

Shell
# Install the host CPU-placement and systemd slice policy.
# The drop-in directory may not exist on a fresh host, so create it first.
mkdir -p /etc/systemd/system.conf.d
cat <<'EOF' >/etc/systemd/system.conf.d/90-aws-metal-openshift-demo-host-resource-management.conf
[Manager]
DefaultCPUAccounting=yes
CPUAffinity=0-5,24-29,48-53,72-77
EOF
cat <<'EOF' >/etc/systemd/system/machine-gold.slice
[Unit]
Description=Gold performance domain for prioritized VMs
[Slice]
CPUAccounting=yes
CPUWeight=512
EOF
cat <<'EOF' >/etc/systemd/system/machine-silver.slice
[Unit]
Description=Silver performance domain for medium-priority VMs
[Slice]
CPUAccounting=yes
CPUWeight=333
EOF
cat <<'EOF' >/etc/systemd/system/machine-bronze.slice
[Unit]
Description=Bronze performance domain for best-effort VMs
[Slice]
CPUAccounting=yes
CPUWeight=167
EOF
systemctl daemon-reload
systemctl daemon-reexec
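
systemd splits contended CPU time among sibling units in proportion to their CPUWeight, so the three weights translate into rough shares only when all three slices are busy at once. The following sketch works out those shares; cpu_weight_shares is a hypothetical helper, and the result assumes the Gold, Silver, and Bronze slices are the only busy siblings under their parent slice.

```shell
# Print each weight's share of the total, in percent with one decimal place.
# Usage: cpu_weight_shares <weight>...
# Hypothetical helper for illustration; not part of the project.
cpu_weight_shares() {
  local w total=0 tenths
  for w in "$@"; do
    total=$((total + w))
  done
  for w in "$@"; do
    # Round to the nearest tenth of a percent using integer arithmetic.
    tenths=$(( (w * 1000 + total / 2) / total ))
    printf '%s %d.%d%%\n' "$w" $((tenths / 10)) $((tenths % 10))
  done
}

# Gold, Silver, Bronze weights from the slice units above.
cpu_weight_shares 512 333 167
# prints:
#   512 50.6%
#   333 32.9%
#   167 16.5%
```

When a tier is idle, its share is redistributed to the busy tiers, so these figures describe the contended case rather than a fixed cap.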

Validate the current host-policy shape.

Shell
# Validate the current host CPU-placement policy.
grep -E '^(DefaultCPUAccounting|CPUAffinity)=' \
  /etc/systemd/system.conf.d/90-aws-metal-openshift-demo-host-resource-management.conf
systemctl show machine-gold.slice machine-silver.slice machine-bronze.slice \
  -p CPUAccounting -p CPUWeight
grep Cpus_allowed_list /proc/1/status
cat /proc/cmdline

Expected current state:

  • PID 1 allowed on 0-5,24-29,48-53,72-77
  • machine-gold.slice, machine-silver.slice, and machine-bronze.slice installed with weights 512, 333, and 167
  • no systemd.cpu_affinity= or irqaffinity= kernel arguments
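
The CPU lists are easier to sanity-check once they are counted. As a rough sketch, the helper below counts the CPUs named by a kernel-style list. cpulist_count is a hypothetical name, not something the project ships. The first list is the CPUAffinity value above; the second is the vCPU cpuset used in the virt-install examples later on this page.

```shell
# Count the CPUs named by a kernel-style CPU list such as "0-5,24-29".
# Hypothetical helper for illustration; not part of the project.
cpulist_count() {
  local list="$1" part lo hi count=0
  local IFS=,
  for part in $list; do
    # A single CPU ("7") has no dash, so lo and hi both become "7".
    lo=${part%-*}
    hi=${part#*-}
    count=$((count + hi - lo + 1))
  done
  printf '%d\n' "$count"
}

cpulist_count 0-5,24-29,48-53,72-77      # host-reserved CPUs: prints 24
cpulist_count 6-23,30-47,54-71,78-95     # guest-domain CPUs: prints 72
```

The two sets add up to 96, which would account for every logical CPU on a 96-CPU host with no overlap between the host and guest domains.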

Apply The Host Memory-Oversubscription Policy

Apply the host memory-oversubscription policy used by the lab. This policy is independent from CPU placement and can be revisited later without redoing the rest of the host bootstrap.

Note

Automation reference: playbooks/bootstrap/site.yml, role lab_host_memory_oversubscription.

The memory-overcommit policy is kept separate from CPU placement. It improves host RAM efficiency through three independent kernel mechanisms:

  • zram compressed swap — an in-memory block device that stores anonymous pages in compressed form, giving the kernel a cheap place to park cold pages before direct reclaim gets expensive
  • THP in madvise mode — Transparent Huge Pages only when applications explicitly request them, avoiding background compaction stalls
  • KSM with conservative scan settings — Kernel Same-page Merging deduplicates identical memory pages across guests running the same OS image

Important

zram-size = 16G is not a 16 GiB reservation taken away from the host up front. The device only consumes physical RAM as compressed pages are stored in it. With zstd compression the typical effective ratio is 2:1 to 4:1, so 16G of logical swap capacity costs roughly 4-8G of physical RAM when fully utilized.
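
The arithmetic behind that estimate, plus a way to read the ratio a live device is actually achieving, can be sketched as follows. Both function names are hypothetical. The second function assumes the first two fields of /sys/block/zram0/mm_stat are the original and compressed data sizes in bytes, as described in the kernel zram documentation.

```shell
# Estimate the physical RAM (GiB, rounded down) used by a full zram device.
# Usage: zram_physical_gib <logical_gib> <compression_ratio>
zram_physical_gib() {
  local logical="$1" ratio="$2"
  echo $(( logical / ratio ))
}

# Report the observed compression ratio (one decimal) from an mm_stat file.
# Usage: zram_observed_ratio [mm_stat_file]
zram_observed_ratio() {
  local stat_file="${1:-/sys/block/zram0/mm_stat}" orig compr _rest x10
  read -r orig compr _rest < "$stat_file"
  if [ "$compr" -eq 0 ]; then
    echo "n/a"   # nothing stored yet
    return 0
  fi
  x10=$(( orig * 10 / compr ))
  printf '%d.%d\n' $((x10 / 10)) $((x10 % 10))
}

zram_physical_gib 16 2   # prints 8
zram_physical_gib 16 4   # prints 4
```

On the hypervisor, running zram_observed_ratio with no argument after the lab has been under load shows whether the 2:1 to 4:1 planning range holds for your workload.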

Note

This is most useful when the host is calm or moderately busy. It helps the kernel avoid harsher reclaim behavior and can smooth out bursty pressure, but it is not a substitute for enough real RAM at high contention.

Warning

Compression, deduplication, and reclaim are CPU work that runs in the host kernel, not inside the Gold/Silver/Bronze tier model. If you lean harder on memory overcommit, expect some host cycles to move from idle capacity into memory management before you see any change in guest throughput.

zram

The role creates a systemd oneshot service that manages the zram device explicitly using zramctl. This avoids relying on systemd-zram-generator and keeps all three memory subsystems in a single service unit.

Shell
# Load the zram kernel module
modprobe zram num_devices=1
# Configure the device with zstd compression and 16G capacity
zramctl /dev/zram0 --algorithm zstd --size 16G
# Format and activate as swap with high priority
mkswap -f /dev/zram0
swapon --priority 100 --discard /dev/zram0

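The swap priority of 100 makes the kernel prefer zram over any swap device with a lower priority, which includes disk-backed swap left at its default priority. The --discard flag lets the kernel tell zram when swap slots are freed, so the memory behind them is released back to the host promptly.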

An optional advanced override can also attach a dedicated writeback backing device to zram before initialization. Keep that disabled by default and only enable it when you have intentionally repurposed a block device for cold-page spill behavior. The override can either point at an existing block device or create a dedicated LV in calabi_lab_vg. Do not count that tier as planned capacity.

This capability is not inherently on-prem-only. The role can target any dedicated block device through backing_device. What is on-prem-specific in this repo is the managed-LVM convenience path that assumes the local calabi_lab_vg layout used by the on-prem deployment flow.

If you also enable the optional writeback policy timer, start with mode: huge and a small per-run budget such as 256 MiB every 30m. The huge mode writes back pages that did not compress well. For broader cold-page relief, use huge_idle when the running kernel supports age-based zram idle tracking.

Do not point writeback at:

  • a mounted filesystem device
  • an active swap device
  • a shared guest-storage LV that serves another purpose

THP

Shell
# Set THP to madvise mode.
echo madvise > /sys/kernel/mm/transparent_hugepage/enabled
echo madvise > /sys/kernel/mm/transparent_hugepage/defrag

Setting both to madvise means the kernel only allocates and compacts huge pages when the application explicitly requests them via madvise(MADV_HUGEPAGE). This avoids the pathological case where always mode triggers aggressive khugepaged compaction against memory that no process benefits from.

KSM

Shell
# Set conservative KSM scan parameters and enable deduplication.
echo 1000 > /sys/kernel/mm/ksm/pages_to_scan
echo 20   > /sys/kernel/mm/ksm/sleep_millisecs
echo 1    > /sys/kernel/mm/ksm/run

The scan settings are deliberately conservative: 1000 pages per cycle with a 20 ms pause. The first full scan pass across all guest memory is slow (minutes to hours on a fully deployed lab), but once the internal dedup tree is built the steady-state CPU cost is near zero.
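
To get a feel for how long a pass takes, a lower bound can be worked out from the two settings alone. The sketch below assumes ksmd scans pages_to_scan pages and then sleeps for sleep_millisecs, counts only the sleep time, and assumes 4 KiB pages. The function name and the 256 GiB figure are illustrative assumptions, not measurements from the lab.

```shell
# Lower bound, in seconds, for one KSM pass over a given amount of mergeable memory.
# Usage: ksm_min_pass_seconds <mem_gib> <pages_to_scan> <sleep_ms> [page_kib]
# Only the sleep between scan batches is counted, so real passes take longer.
ksm_min_pass_seconds() {
  local mem_gib="$1" pages_to_scan="$2" sleep_ms="$3" page_kib="${4:-4}"
  local total_pages cycles
  total_pages=$(( mem_gib * 1024 * 1024 / page_kib ))
  # Number of scan batches, rounded up.
  cycles=$(( (total_pages + pages_to_scan - 1) / pages_to_scan ))
  echo $(( cycles * sleep_ms / 1000 ))
}

# 256 GiB of guest memory (hypothetical) with the settings above.
ksm_min_pass_seconds 256 1000 20   # prints 1342 (about 22 minutes)
```

Because a page generally has to be seen unchanged across passes before it is merged, meaningful savings tend to appear after more than one pass.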

Wrap It In A Persistent Service

Rather than running these commands ad-hoc, the role installs a systemd oneshot service so the policy survives reboot.

Shell
# Install and start the persistent host memory-oversubscription service.
cat <<'EOF' >/etc/systemd/system/calabi-host-memory-oversubscription.service
[Unit]
Description=Apply Calabi host memory oversubscription policy
After=local-fs.target
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStartPre=-/usr/sbin/swapoff /dev/zram0
ExecStartPre=-/usr/sbin/zramctl --reset /dev/zram0
ExecStartPre=-/usr/sbin/modprobe -r zram
ExecStartPre=/usr/sbin/modprobe zram num_devices=1
ExecStart=/usr/sbin/zramctl /dev/zram0 --algorithm zstd --size 16G
ExecStart=/usr/sbin/mkswap -f /dev/zram0
ExecStart=/usr/sbin/swapon --priority 100 --discard /dev/zram0
ExecStop=-/usr/sbin/swapoff /dev/zram0
ExecStop=-/usr/sbin/zramctl --reset /dev/zram0
ExecStop=-/usr/sbin/modprobe -r zram
ExecStart=/usr/bin/bash -lc '\
if [ -e /sys/kernel/mm/transparent_hugepage/enabled ]; then echo madvise > /sys/kernel/mm/transparent_hugepage/enabled; fi; \
if [ -e /sys/kernel/mm/transparent_hugepage/defrag ]; then echo madvise > /sys/kernel/mm/transparent_hugepage/defrag; fi; \
if [ -e /sys/kernel/mm/ksm/pages_to_scan ]; then echo 1000 > /sys/kernel/mm/ksm/pages_to_scan; fi; \
if [ -e /sys/kernel/mm/ksm/sleep_millisecs ]; then echo 20 > /sys/kernel/mm/ksm/sleep_millisecs; fi; \
if [ -e /sys/kernel/mm/ksm/run ]; then echo 1 > /sys/kernel/mm/ksm/run; fi; \
true'
[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable --now calabi-host-memory-oversubscription.service

Validate The Memory-Oversubscription State

Shell
# Service state
systemctl is-enabled calabi-host-memory-oversubscription.service
systemctl is-active calabi-host-memory-oversubscription.service
# zram device and swap
zramctl
swapon --show
# THP mode
cat /sys/kernel/mm/transparent_hugepage/enabled
cat /sys/kernel/mm/transparent_hugepage/defrag
# KSM state
cat /sys/kernel/mm/ksm/run
cat /sys/kernel/mm/ksm/pages_to_scan
cat /sys/kernel/mm/ksm/sleep_millisecs

Expected current state:

  • service is enabled and active
  • zramctl shows /dev/zram0 with zstd algorithm and 16G disk size
  • swapon shows /dev/zram0 at priority 100
  • THP enabled shows [madvise] (bracketed = active selection)
  • THP defrag shows [madvise]
  • KSM run is 1, pages and sleep match the configured values
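
The THP sysfs files list every available mode on one line and mark the active one with brackets, for example "always [madvise] never". If you want to script the check, a small sketch such as the following extracts the bracketed value. thp_active_mode is a hypothetical helper name.

```shell
# Print the bracketed (active) choice from a sysfs multi-choice line.
# Usage: thp_active_mode "<file contents>"
# Hypothetical helper for illustration; not part of the project.
thp_active_mode() {
  printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Example calls on the hypervisor:
#   thp_active_mode "$(cat /sys/kernel/mm/transparent_hugepage/enabled)"
#   thp_active_mode "$(cat /sys/kernel/mm/transparent_hugepage/defrag)"
```

Both calls should print madvise when the policy is in effect.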

Monitor KSM Effectiveness

KSM deduplication savings grow over time as the scanner finds identical pages across guests. Check convergence after the cluster is fully deployed and idle:

Shell
# Pages shared (unique pages backing merged regions)
cat /sys/kernel/mm/ksm/pages_shared
# Pages sharing (additional mappings that reuse those shared pages; roughly the pages saved)
cat /sys/kernel/mm/ksm/pages_sharing
# Pages not yet merged
cat /sys/kernel/mm/ksm/pages_unshared

If pages_sharing significantly exceeds pages_shared, KSM is saving meaningful memory. If pages_unshared remains high relative to pages_sharing for extended periods, the scan rate may be too conservative.
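
Those counters can be turned into a rough savings figure. The sketch below follows the kernel KSM documentation's reading that pages_sharing approximates the number of pages saved, so saved memory is roughly pages_sharing multiplied by the page size. ksm_summary is a hypothetical helper; it takes the counter directory as an argument so it can be pointed at a copy of the files.

```shell
# Summarize KSM savings from the counters in a directory.
# Usage: ksm_summary [counter_dir] [page_bytes]
# Hypothetical helper for illustration; not part of the project.
ksm_summary() {
  local dir="${1:-/sys/kernel/mm/ksm}" page_bytes="${2:-$(getconf PAGESIZE)}"
  local shared sharing saved_mib
  shared=$(cat "${dir}/pages_shared")
  sharing=$(cat "${dir}/pages_sharing")
  # Approximate memory saved, in MiB.
  saved_mib=$(( sharing * page_bytes / 1048576 ))
  if [ "$shared" -gt 0 ]; then
    printf 'saved_mib=%d sharing_per_shared=%d\n' "$saved_mib" $(( sharing / shared ))
  else
    printf 'saved_mib=%d sharing_per_shared=n/a\n' "$saved_mib"
  fi
}

# Example call on the hypervisor:
#   ksm_summary
```

A sharing_per_shared value well above 1 corresponds to the "pages_sharing significantly exceeds pages_shared" case described above.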

The project includes a monitoring script for continuous observation:

Shell
# Run the host memory-overcommit dashboard.
<project-root>/scripts/host-memory-overcommit-status.py \
  --host <hypervisor-public-ip> --user ec2-user

Use --watch 30 for a live dashboard or --delta 60 for a before-and-after comparison across an interval.

The rationale is not to squeeze masters or infra. It is to improve host RAM efficiency while keeping Bronze workers as the primary elasticity lever.

5. Remove The Default Libvirt Network

Remove the default libvirt network so the lab only uses the explicit OVS design.

Note

Automation reference: playbooks/bootstrap/site.yml, role lab_libvirt.

Remove virbr0 so the lab only uses the explicit OVS/libvirt design.

Shell
# Remove the default libvirt network.
virsh net-destroy default || true
virsh net-autostart default --disable || true
virsh net-undefine default || true

6. Create The Lab Switch And VLAN Interfaces

Create the OVS bridge, routed VLAN interfaces, and the host-side networking needed by the nested lab.

Note

Automation reference: playbooks/bootstrap/site.yml, role lab_switch.

Create the OVS bridge, create the routed VLAN interfaces, and bring them up.

Shell
cat <<'EOF' >/usr/local/sbin/aws-metal-openshift-demo-net.sh
#!/usr/bin/env bash
set -euo pipefail
ovs-vsctl --may-exist add-br lab-switch
for vlan in 100 200 201 202 300 301 302; do
  ovs-vsctl --may-exist add-port lab-switch vlan${vlan} \
    -- set interface vlan${vlan} type=internal
done
ip link set lab-switch up
for vlan in 100 200 201 202 300 301 302; do
  ip link set vlan${vlan} up
done
ip address replace 172.16.0.1/24 dev vlan100
ip address replace 172.16.10.1/24 dev vlan200
ip address replace 172.16.11.1/24 dev vlan201
ip address replace 172.16.12.1/24 dev vlan202
ip address replace 172.16.20.1/24 dev vlan300
ip address replace 172.16.21.1/24 dev vlan301
EOF
chmod 0755 /usr/local/sbin/aws-metal-openshift-demo-net.sh
/usr/local/sbin/aws-metal-openshift-demo-net.sh
cat <<'EOF' >/etc/systemd/system/aws-metal-openshift-demo-net.service
[Unit]
Description=AWS metal OpenShift demo network
After=network-online.target openvswitch.service
Wants=network-online.target
[Service]
Type=oneshot
ExecStart=/usr/local/sbin/aws-metal-openshift-demo-net.sh
RemainAfterExit=yes
[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable --now aws-metal-openshift-demo-net.service

7. Configure Firewalld And Host Routing

Configure firewalld and host routing so the lab networks can reach each other and NAT out through the hypervisor uplink.

Note

Automation reference: playbooks/bootstrap/site.yml, role lab_firewall.

Create the lab firewall zone, enable forwarding, and NAT the lab out of the host uplink.

Shell
# Create the lab firewalld zone and enable forwarding and NAT.
firewall-cmd --permanent --new-zone=lab || true
firewall-cmd --permanent --zone=external --add-interface=enp125s0
for iface in vlan100 vlan200 vlan201 vlan202 vlan300 vlan301; do
  firewall-cmd --permanent --zone=lab --add-interface=${iface}
done
firewall-cmd --permanent --zone=external --add-masquerade
firewall-cmd --reload
cat <<'EOF' >/etc/sysctl.d/99-aws-metal-openshift-demo.conf
net.ipv4.ip_forward = 1
EOF
sysctl --system

8. Define The Libvirt Network Over OVS

Define the libvirt network and portgroups that place guests onto the OVS bridge and the intended VLANs.

Note

Automation reference: playbooks/bootstrap/site.yml, role lab_libvirt.

Define the lab-switch libvirt network and the portgroups used by the VMs.

Shell
# Define and start the OVS-backed libvirt network.
cat <<'EOF' >/etc/libvirt/lab-switch.xml
<network>
  <name>lab-switch</name>
  <forward mode='bridge'/>
  <bridge name='lab-switch'/>
  <virtualport type='openvswitch'/>
  <portgroup name='mgmt-access' default='no'>
    <vlan>
      <tag id='100'/>
    </vlan>
  </portgroup>
  <portgroup name='ocp-trunk' default='no'>
    <vlan trunk='yes'>
      <tag id='200'/>
      <tag id='201'/>
      <tag id='202'/>
    </vlan>
  </portgroup>
  <portgroup name='data300-access' default='no'>
    <vlan>
      <tag id='300'/>
    </vlan>
  </portgroup>
  <portgroup name='data301-access' default='no'>
    <vlan>
      <tag id='301'/>
    </vlan>
  </portgroup>
  <portgroup name='data302-access' default='no'>
    <vlan>
      <tag id='302'/>
    </vlan>
  </portgroup>
</network>
EOF
virsh net-define /etc/libvirt/lab-switch.xml
virsh net-start lab-switch
virsh net-autostart lab-switch
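
After the network is defined, it can be useful to confirm that libvirt registered every portgroup. The following is a minimal sketch, assuming the XML attributes are single-quoted as in the definition above; portgroup_names is a hypothetical helper that reads network XML on standard input.

```shell
# Print the portgroup names found in libvirt network XML read from stdin.
# Assumes single-quoted attributes, as in the network definition above.
# Hypothetical helper for illustration; not part of the project.
portgroup_names() {
  sed -n "s/.*<portgroup name='\([^']*\)'.*/\1/p"
}

# Example calls on the hypervisor:
#   virsh net-dumpxml lab-switch | portgroup_names
#   portgroup_names < /etc/libvirt/lab-switch.xml
```

The output should list mgmt-access, ocp-trunk, data300-access, data301-access, and data302-access.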

9. Stage The Guest Base Image

Stage the base RHEL guest image on the hypervisor. Every support VM is seeded from this image, so get this step right before building guests.

Note

Automation reference: the guest-image staging portion of playbooks/bootstrap/site.yml.

Note

This image is the seed for every support VM. If the wrong image lands here, every guest built from this point forward inherits the problem.

Place the RHEL KVM guest image on the hypervisor so the support VMs can be seeded onto their raw EBS devices.

Shell
# Copy the base RHEL guest image to the hypervisor.
mkdir -p /root/images
cp <rhel10-image-path> /root/images/rhel-10.1-x86_64-kvm.qcow2

When a Red Hat direct-download URL is available, the same image can be pulled straight to the hypervisor instead of being copied from the operator workstation.

Shell
# Download the base RHEL guest image directly to the hypervisor.
mkdir -p /root/images
curl -L '<rhel10-kvm-direct-download-url>' \
  -o /root/images/rhel-10.1-x86_64-kvm.qcow2
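
Because every support VM inherits this image, it may be worth comparing its checksum against a known-good value before building guests. The sketch below is one way to do that. verify_sha256 is a hypothetical helper, and the expected checksum is an assumption you would supply yourself from wherever you obtained the image.

```shell
# Compare a file's SHA-256 checksum against an expected value.
# Usage: verify_sha256 <file> <expected_sha256>
# Hypothetical helper for illustration; not part of the project.
verify_sha256() {
  local file="$1" expected="$2" actual
  actual=$(sha256sum "$file" | cut -d' ' -f1)
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file (got $actual)" >&2
    return 1
  fi
}

# Example call on the hypervisor (substitute the real checksum):
#   verify_sha256 /root/images/rhel-10.1-x86_64-kvm.qcow2 <expected-sha256>
```

A non-zero exit status makes the check easy to use as a gate in front of the qemu-img convert step in the VM builds.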

10. Build The IdM VM

Build the idm-01 VM shell on the hypervisor, seed its disk from the RHEL image, and attach the cloud-init data needed for first boot.

Note

Automation reference: playbooks/bootstrap/idm.yml, role idm.

Seed the idm-01 disk from the RHEL image, create cloud-init, and build the VM on the management VLAN.

Shell
# Build idm-01 on the hypervisor: seed the disk, render cloud-init, and define the VM.
ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1
mkdir -p /var/lib/aws-metal-openshift-demo/idm-01
qemu-img convert -f qcow2 -O raw \
  <rhel10-image-path> \
  /dev/ebs/idm-01
SSH_PUBKEY="$(cat <operator-public-key>)"
cat <<'EOF' >/var/lib/aws-metal-openshift-demo/idm-01/meta-data
instance-id: idm-01
local-hostname: idm-01.workshop.lan
EOF
cat <<'EOF' >/var/lib/aws-metal-openshift-demo/idm-01/network-config
version: 2
ethernets:
  eth0:
    dhcp4: false
    addresses:
      - 172.16.0.10/24
    routes:
      - to: 0.0.0.0/0
        via: 172.16.0.1
    nameservers:
      search: [workshop.lan]
      addresses: [172.16.0.10, 8.8.8.8, 4.4.4.4]
EOF
# This heredoc is unquoted so ${SSH_PUBKEY} expands. The $ signs in the
# password hash are escaped so the shell leaves them literal.
cat <<EOF >/var/lib/aws-metal-openshift-demo/idm-01/user-data
#cloud-config
fqdn: idm-01.workshop.lan
manage_etc_hosts: true
users:
  - default
  - name: cloud-user
    groups: [wheel]
    sudo: ALL=(ALL) NOPASSWD:ALL
    lock_passwd: false
    passwd: \$6\$rounds=4096\$temporary\$BfY4OskkM6jv8v6eK9aT8W7F7Y9Q8nN2m5vQzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz
    ssh_authorized_keys:
      - ${SSH_PUBKEY}
runcmd:
  - [ sh, -c, 'echo nameserver 127.0.0.1 >/etc/resolv.conf' ]
EOF
cloud-localds \
  --network-config=/var/lib/aws-metal-openshift-demo/idm-01/network-config \
  /var/lib/aws-metal-openshift-demo/idm-01/seed.iso \
  /var/lib/aws-metal-openshift-demo/idm-01/user-data \
  /var/lib/aws-metal-openshift-demo/idm-01/meta-data
virt-install \
  --name idm-01.workshop.lan \
  --memory 8192 \
  --vcpus 2 \
  --cpu host-passthrough \
  --machine q35 \
  --import \
  --os-variant rhel10.0 \
  --graphics none \
  --console pty,target_type=serial \
  --network network=lab-switch,portgroup=mgmt-access,model=virtio \
  --controller type=scsi,model=virtio-scsi \
  --disk path=/dev/ebs/idm-01,device=disk,bus=scsi,rotation_rate=1 \
  --disk path=/var/lib/aws-metal-openshift-demo/idm-01/seed.iso,device=cdrom,bus=sata \
  --resource partition=/machine/silver \
  --cputune shares=333,emulatorpin.cpuset=2-5,26-29,50-53,74-77,\
vcpupin0.vcpu=0,vcpupin0.cpuset=6-23,30-47,54-71,78-95,\
vcpupin1.vcpu=1,vcpupin1.cpuset=6-23,30-47,54-71,78-95 \
  --noautoconsole

That places idm-01 into the Silver performance domain:

  • partition: /machine/silver
  • shares: 333
  • vCPU threads: guest_domain
  • emulator thread: host_emulator

For the first post-update cycle, the current automation also prefers to power the guest off and start it again from the host with virsh start, rather than rebooting inside the guest. A cold start applies the persistent XML, so the cloud-init media cleanup takes effect immediately. An in-guest reboot would keep the old live device model and leave an empty CD-ROM attached.

11. Configure IdM In The Guest

Configure the IdM guest after first boot: update it, install IPA, enable the supporting services, and create the initial identity data the lab depends on.

Note

Automation reference: playbooks/bootstrap/idm.yml, role idm_guest.

Update redhat-release before the main system update. This ensures the current Red Hat Post-Quantum Cryptography public keys are present before DNF validates newer packages. See: https://access.redhat.com/solutions/3449341

Update the guest, install IdM, enable Cockpit and session recording, and create the core users and groups used by OpenShift.

Shell
# Install and configure IdM in the guest.
ssh -i /opt/openshift/secrets/hypervisor-admin.key cloud-user@172.16.0.10
sudo -i
dnf -y update redhat-release
dnf -y update
reboot
# After the reboot, reconnect with the same ssh command and run sudo -i again.
dnf -y install \
  ipa-server \
  ipa-server-dns \
  idm-pki-kra \
  ipa-server-trust-ad \
  cockpit \
  cockpit-files \
  cockpit-networkmanager \
  cockpit-podman \
  tlog \
  sssd \
  oddjob \
  oddjob-mkhomedir \
  insights-client \
  authselect-compat
insights-client --register
firewall-cmd --permanent --add-service=cockpit
firewall-cmd --permanent --add-service=dns
firewall-cmd --permanent --add-service=freeipa-4
firewall-cmd --permanent --add-service=freeipa-trust
firewall-cmd --reload
ipa-server-install -U \
  --realm=WORKSHOP.LAN \
  --domain=workshop.lan \
  --hostname=idm-01.workshop.lan \
  --ds-password='<lab-default-password>' \
  --admin-password='<lab-default-password>' \
  --setup-dns \
  --auto-forwarders \
  --no-host-dns
kinit admin <<< '<lab-default-password>'
ipa-kra-install -U -p '<lab-default-password>'
ipa dnsconfig-mod --forwarder=8.8.8.8 --forwarder=1.1.1.1
ipa group-add access-openshift-admin || true
ipa group-add virt-admin || true
ipa group-add developer || true
ipa group-add admins || true
ipa pwpolicy-add admins \
  --maxlife=3650 --minlife=0 --history=0 --minclasses=0 --minlength=8 \
  --priority=40 2>/dev/null || \
ipa pwpolicy-mod admins \
  --maxlife=3650 --minlife=0 --history=0 --minclasses=0 --minlength=8 \
  --priority=40
ipa pwpolicy-add access-openshift-admin \
  --maxlife=3650 --minlife=0 --history=0 --minclasses=0 --minlength=8 \
  --priority=50 2>/dev/null || \
ipa pwpolicy-mod access-openshift-admin \
  --maxlife=3650 --minlife=0 --history=0 --minclasses=0 --minlength=8 \
  --priority=50
ipa pwpolicy-add virt-admin \
  --maxlife=3650 --minlife=0 --history=0 --minclasses=0 --minlength=8 \
  --priority=60 2>/dev/null || \
ipa pwpolicy-mod virt-admin \
  --maxlife=3650 --minlife=0 --history=0 --minclasses=0 --minlength=8 \
  --priority=60
ipa pwpolicy-add developer \
  --maxlife=3650 --minlife=0 --history=0 --minclasses=0 --minlength=8 \
  --priority=70 2>/dev/null || \
ipa pwpolicy-mod developer \
  --maxlife=3650 --minlife=0 --history=0 --minclasses=0 --minlength=8 \
  --priority=70
ipa user-add sysop --first=Sys --last=Op --shell=/bin/bash --password <<< '<lab-default-password>'
ipa user-add virtadm --first=Virt --last=Admin --shell=/bin/bash --password <<< '<lab-default-password>'
ipa user-add dev --first=Dev --last=User --shell=/bin/bash --password <<< '<lab-default-password>'
ipa user-mod sysop --setattr=krbPasswordExpiration=20360313235039Z
ipa user-mod virtadm --setattr=krbPasswordExpiration=20360313235039Z
ipa user-mod dev --setattr=krbPasswordExpiration=20360313235039Z
ipa dnsrecord-add workshop.lan virt-01 --a-rec=172.16.0.1 2>/dev/null || \
ipa dnsrecord-mod workshop.lan virt-01 --a-rec=172.16.0.1
ipa dnszone-add 0.16.172.in-addr.arpa \
  --name-server=idm-01.workshop.lan. \
  --admin-email=hostmaster.workshop.lan \
  --dynamic-update=FALSE 2>/dev/null || true
ipa dnsrecord-add 0.16.172.in-addr.arpa 1 \
  --ptr-rec=virt-01.workshop.lan. 2>/dev/null || \
ipa dnsrecord-mod 0.16.172.in-addr.arpa 1 \
  --ptr-rec=virt-01.workshop.lan.
cat <<'EOF' >/etc/named/ipa-ext.conf
/* User customization for BIND named */
acl "trusted_network" {
  localhost;
  localnets;
  172.16.0.0/24;
  172.16.10.0/24;
  172.16.11.0/24;
  172.16.12.0/24;
  172.16.20.0/24;
  172.16.21.0/24;
  172.16.22.0/24;
};
EOF
cat <<'EOF' >/etc/named/ipa-options-ext.conf
/* User customization for BIND named */
listen-on-v6 { any; };
dnssec-validation yes;
allow-query { trusted_network; };
allow-recursion { trusted_network; };
allow-query-cache { trusted_network; };
EOF
named-checkconf /etc/named.conf
systemctl restart named
systemctl is-active named
ipa group-add-member admins --users=sysop
ipa group-add-member access-openshift-admin --users=sysop
ipa group-add-member virt-admin --users=virtadm
ipa group-add-member developer --users=dev
ipa sudorule-add admins-nopasswd-all \
  --desc='Permit admins group members to run any command on any host without authentication'
ipa sudorule-mod admins-nopasswd-all --hostcat=all
ipa sudorule-mod admins-nopasswd-all --cmdcat=all
ipa sudorule-mod admins-nopasswd-all --runasusercat=all
ipa sudorule-mod admins-nopasswd-all --runasgroupcat=all
ipa sudorule-add-user admins-nopasswd-all --groups=admins
ipa sudorule-add-option admins-nopasswd-all --sudooption='!authenticate'
systemctl enable --now cockpit.socket
systemctl enable --now oddjobd.service
authselect select sssd with-tlog with-mkhomedir with-sudo --force
systemctl restart sssd
sss_cache -E
sssctl domain-status workshop.lan
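
The password-policy commands above repeat one pattern four times: try the add, and fall back to a modify when the object already exists. Purely as an illustration, that pattern can be folded into a small wrapper and a loop. Both function names are hypothetical, and this is a sketch of the same commands rather than what the idm_guest role actually runs.

```shell
# Create an IdM object, or modify it when the add fails (for example because it exists).
# Usage: ipa_add_or_mod <object-type> <name> [options...]
# Hypothetical helper for illustration; not part of the project.
ipa_add_or_mod() {
  local object="$1"
  shift
  ipa "${object}-add" "$@" 2>/dev/null || ipa "${object}-mod" "$@"
}

# Apply the lab password policy to each group with its own priority.
apply_lab_pwpolicies() {
  local entry group priority
  for entry in admins:40 access-openshift-admin:50 virt-admin:60 developer:70; do
    group=${entry%%:*}
    priority=${entry##*:}
    ipa_add_or_mod pwpolicy "$group" \
      --maxlife=3650 --minlife=0 --history=0 --minclasses=0 --minlength=8 \
      --priority="$priority"
  done
}

# Example call in the IdM guest, after kinit admin:
#   apply_lab_pwpolicies
```

As with the long form, the fallback modify can still report an error when the existing policy already matches, which is harmless on a rerun.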

12. Build The Bastion VM

Build the bastion VM shell on VLAN 100. This becomes the execution host for all remaining in-lab work.

Note

Automation reference: playbooks/bootstrap/bastion.yml, role bastion.

Create the bastion on VLAN 100. This becomes the execution host for the rest of the lab.

Note

The validated flow builds the bastion before IdM. The initial bastion build does not enroll the guest into IdM. That enrollment now happens later, in section 13B, Join The Bastion To IdM.

Shell
mkdir -p /var/lib/aws-metal-openshift-demo/bastion-01
qemu-img convert -f qcow2 -O raw \
  <rhel10-image-path> \
  /dev/ebs/bastion-01
SSH_PUBKEY="$(cat <operator-public-key>)"
cat <<'EOF' >/var/lib/aws-metal-openshift-demo/bastion-01/meta-data
instance-id: bastion-01
local-hostname: bastion-01.workshop.lan
EOF
cat <<'EOF' >/var/lib/aws-metal-openshift-demo/bastion-01/network-config
version: 2
ethernets:
  eth0:
    dhcp4: false
    addresses:
      - 172.16.0.30/24
    routes:
      - to: 0.0.0.0/0
        via: 172.16.0.1
    nameservers:
      search: [workshop.lan]
      addresses: [172.16.0.10, 8.8.8.8, 4.4.4.4]
EOF
# This heredoc is unquoted so that ${SSH_PUBKEY} expands. The $ signs in the
# password hash are escaped so the shell writes them literally.
cat <<EOF >/var/lib/aws-metal-openshift-demo/bastion-01/user-data
#cloud-config
fqdn: bastion-01.workshop.lan
manage_etc_hosts: true
users:
  - default
  - name: cloud-user
    groups: [wheel]
    sudo: ALL=(ALL) NOPASSWD:ALL
    lock_passwd: false
    passwd: \$6\$rounds=4096\$temporary\$BfY4OskkM6jv8v6eK9aT8W7F7Y9Q8nN2m5vQzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz
    ssh_authorized_keys:
      - ${SSH_PUBKEY}
EOF
cloud-localds \
  --network-config=/var/lib/aws-metal-openshift-demo/bastion-01/network-config \
  /var/lib/aws-metal-openshift-demo/bastion-01/seed.iso \
  /var/lib/aws-metal-openshift-demo/bastion-01/user-data \
  /var/lib/aws-metal-openshift-demo/bastion-01/meta-data
virt-install \
  --name bastion-01.workshop.lan \
  --memory 8192 \
  --vcpus 4 \
  --cpu host-passthrough \
  --machine q35 \
  --import \
  --os-variant rhel10.0 \
  --graphics none \
  --console pty,target_type=serial \
  --network network=lab-switch,portgroup=mgmt-access,model=virtio \
  --controller type=scsi,model=virtio-scsi \
  --disk path=/dev/ebs/bastion-01,device=disk,bus=scsi,rotation_rate=1 \
  --disk path=/var/lib/aws-metal-openshift-demo/bastion-01/seed.iso,device=cdrom,bus=sata \
  --resource partition=/machine/bronze \
  --cputune shares=167,emulatorpin.cpuset=2-5,26-29,50-53,74-77,\
vcpupin0.vcpu=0,vcpupin0.cpuset=6-23,30-47,54-71,78-95,\
vcpupin1.vcpu=1,vcpupin1.cpuset=6-23,30-47,54-71,78-95,\
vcpupin2.vcpu=2,vcpupin2.cpuset=6-23,30-47,54-71,78-95,\
vcpupin3.vcpu=3,vcpupin3.cpuset=6-23,30-47,54-71,78-95 \
  --noautoconsole
ssh -i <operator-ssh-key> cloud-user@172.16.0.30 \
  "sudo dnf -y install \
     git ansible-core ansible-lint jq podman wget tar make insights-client \
     cockpit-files cockpit-packagekit cockpit-podman \
     cockpit-session-recording cockpit-image-builder \
     pcp pcp-system-tools oddjob oddjob-mkhomedir; \
   sudo insights-client --register; \
   sudo systemctl enable --now cockpit.socket; \
   sudo systemctl enable --now osbuild-composer.socket; \
   sudo systemctl enable --now pmcd pmlogger pmproxy; \
   sudo systemctl enable --now oddjobd"

That places bastion-01 into the Bronze performance domain.

The current automation uses the same support-guest lifecycle as IdM: when the first package update requires a restart, the guest powers off, the seed media is cleaned from persistent XML while the domain is down, and the hypervisor starts the domain again.
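The restart cycle described above can be sketched from virt-01 roughly as follows. This is a minimal illustration under stated assumptions, not the role's actual task list: the function name recycle_guest, the retry count, and the RECYCLE_POLL_SECONDS variable are hypothetical, and the cdrom lookup reuses the virsh domblklist pattern this page applies to ad-01.

```shell
# Hypothetical helper: wait for a guest to power off, eject its cdrom media
# from the persistent definition, then start the domain again.
recycle_guest() {
  local domain="$1" tries="${2:-60}" target
  # Poll until libvirt reports the domain as shut off.
  while [ "$(virsh domstate "$domain" | head -n 1)" != "shut off" ]; do
    tries=$((tries - 1))
    [ "$tries" -gt 0 ] || { echo "timeout waiting for $domain" >&2; return 1; }
    sleep "${RECYCLE_POLL_SECONDS:-5}"
  done
  # The domain is down, so only the persistent XML (--config) is changed.
  for target in $(virsh domblklist "$domain" --details | awk '$2 == "cdrom" { print $3 }'); do
    virsh change-media "$domain" "$target" --eject --config || true
  done
  virsh start "$domain"
}

# Example (on virt-01): recycle_guest bastion-01.workshop.lan
```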

13. Stage The Project To The Bastion

Stage the project, secrets, and operator tools onto the bastion so the rest of the build can run from inside the lab.

Note

Automation reference: playbooks/bootstrap/bastion-stage.yml, role bastion_stage.

Important

This is the last step that runs from the operator workstation. After staging completes, all remaining work runs from bastion-01. Do not criss-cross between workstation and bastion for subsequent steps.

Copy the repo, the pull secret, and the SSH keys to the bastion so the rest of the work happens from inside the lab. The current orchestration also creates a ready-to-use shell environment for cloud-user and current IdM admins members, including $HOME/bin, $HOME/etc, tool symlinks, and a login-time KUBECONFIG export when the cluster artifacts exist.

Note

Automation reference: playbooks/bootstrap/bastion-stage.yml, role bastion_stage plus the managed name-resolution role that seeds the bootstrap /etc/hosts fallback for bastion, IdM, mirror-registry, and the cluster API endpoints.

The bastion staging phase also installs the execution-time Python requirements needed for Windows orchestration, including pywinrm.

Shell
# Copy the project tree to bastion without overwriting generated output or secrets.
rsync -a --delete \
  --exclude generated \
  --exclude secrets \
  -e "ssh -i <operator-ssh-key>" \
  <project-root>/ \
  cloud-user@172.16.0.30:/tmp/aws-metal-openshift-demo/
scp -i <operator-ssh-key> <pull-secret-file> cloud-user@172.16.0.30:/tmp/pull-secret.txt
scp -i <operator-ssh-key> <operator-ssh-key> cloud-user@172.16.0.30:/tmp/hypervisor-admin.key
scp -i <operator-ssh-key> <operator-public-key> cloud-user@172.16.0.30:/tmp/hypervisor-admin.pub
ssh -i <operator-ssh-key> cloud-user@172.16.0.30 <<'EOF'
sudo mkdir -p /opt/openshift /opt/openshift/secrets
sudo rsync -a --delete \
  --exclude generated \
  --exclude secrets \
  /tmp/aws-metal-openshift-demo/ /opt/openshift/aws-metal-openshift-demo/
sudo mv /tmp/pull-secret.txt /opt/openshift/secrets/pull-secret.txt
sudo mv /tmp/hypervisor-admin.key /opt/openshift/secrets/hypervisor-admin.key
sudo mv /tmp/hypervisor-admin.pub /opt/openshift/secrets/hypervisor-admin.pub
sudo chmod 0600 /opt/openshift/secrets/hypervisor-admin.key
sudo chown -R cloud-user:cloud-user /opt/openshift
sudo dnf -y install python3-pip
sudo python3 -m pip install -r /opt/openshift/aws-metal-openshift-demo/requirements-pip.txt
EOF

After the cluster artifacts exist, the manual equivalent of the helper layout is:

Shell
# Create the bastion helper layout and local tool symlinks.
ssh -i <operator-ssh-key> cloud-user@172.16.0.30 <<'EOF'
set -euo pipefail
mkdir -p "$HOME/bin" "$HOME/etc"
ln -sfn /opt/openshift/aws-metal-openshift-demo/generated/tools/4.20.15/bin/oc "$HOME/bin/oc"
ln -sfn /opt/openshift/aws-metal-openshift-demo/generated/tools/4.20.15/bin/kubectl "$HOME/bin/kubectl"
ln -sfn /opt/openshift/aws-metal-openshift-demo/generated/tools/4.20.15/bin/openshift-install "$HOME/bin/openshift-install"
ln -sfn /usr/local/bin/track-mirror-progress "$HOME/bin/track-mirror-progress"
ln -sfn /usr/local/bin/track-mirror-progress-tmux "$HOME/bin/track-mirror-progress-tmux"
ln -sfn /opt/openshift/aws-metal-openshift-demo/scripts/run_bastion_playbook.sh "$HOME/bin/run-bastion-playbook"
cp /opt/openshift/aws-metal-openshift-demo/generated/ocp/auth/kubeconfig "$HOME/etc/kubeconfig"
cp /opt/openshift/aws-metal-openshift-demo/generated/ocp/auth/kubeconfig "$HOME/etc/kubeconfig.local"
chmod 0600 "$HOME/etc/kubeconfig" "$HOME/etc/kubeconfig.local"
ln -sfn /opt/openshift/aws-metal-openshift-demo/generated/ocp/idm-ca.crt "$HOME/etc/idm-ca.crt"
cat <<'PROFILE' | sudo tee /etc/profile.d/openshift-bastion.sh >/dev/null
case ":$PATH:" in
  *":$HOME/bin:"*) ;;
  *) PATH="$HOME/bin:$PATH" ;;
esac
case ":$PATH:" in
  *":/opt/openshift/aws-metal-openshift-demo/generated/tools/4.20.15/bin:"*) ;;
  *) PATH="/opt/openshift/aws-metal-openshift-demo/generated/tools/4.20.15/bin:$PATH" ;;
esac
export KUBECONFIG_ADMIN="$HOME/etc/kubeconfig.local"
if [ -z "${KUBECONFIG:-}" ]; then
  if [ -r "$HOME/etc/kubeconfig" ]; then
    export KUBECONFIG="$HOME/etc/kubeconfig"
  elif [ -r "$KUBECONFIG_ADMIN" ]; then
    export KUBECONFIG="$KUBECONFIG_ADMIN"
  fi
fi
PROFILE
EOF
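The profile fragment above matches on ":$PATH:" so that a directory is never added twice, however many times the file is sourced. The same idea can be written as a small reusable function. This is only a sketch; the name path_prepend is hypothetical and does not appear in the project.

```shell
# Hypothetical helper: prepend a directory to PATH only if it is not there yet.
path_prepend() {
  case ":${PATH}:" in
    *":$1:"*) ;;                # already present, leave PATH unchanged
    *) PATH="$1:${PATH}" ;;     # otherwise put it at the front
  esac
}

# Example: path_prepend "$HOME/bin"
```

Wrapping PATH in colons on both sides lets one pattern match the first, middle, and last entries alike.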

Iterative Development With push_and_run.sh

This helper is not part of the manual standup path. It exists only to shorten developer edit-sync-rerun cycles after the bastion is already staged.

After the initial staging, use the lightweight scripts/push_and_run.sh helper for iterative code changes. It rsyncs only the role/playbook/vars tree (excluding inventory/, secrets/, and generated/ — all of which have bastion-specific content that must not be overwritten) and runs the specified playbook in a single blocking call.

For normal operator reruns of bastion-native playbooks, prefer scripts/run_remote_bastion_playbook.sh. It refreshes the full staged tree first and matches the documented golden path more closely than the lightweight developer helper.

Shell
# From the operator workstation
cd <project-root>
./scripts/push_and_run.sh playbooks/day2/openshift-post-install-infra.yml
./scripts/push_and_run.sh playbooks/day2/openshift-post-install-ldap-auth.yml -e some_override=true

The script:

  • syncs only code changes (not generated artifacts, secrets, or inventory)
  • runs the playbook as cloud-user on the bastion in a blocking foreground SSH session
  • shows only PLAY RECAP on success
  • dumps the full output on failure

This reduces the edit → sync → run → check cycle to a single command.
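The "PLAY RECAP on success, full output on failure" behavior can be pictured with a small wrapper like the one below. It is a sketch of the pattern only: run_quiet is a hypothetical name, and the real push_and_run.sh may implement this differently.

```shell
# Hypothetical sketch of the "quiet on success, verbose on failure" pattern.
run_quiet() {
  local log rc=0
  log="$(mktemp)" || return 1
  # Capture stdout and stderr of the wrapped command and remember its status.
  "$@" >"$log" 2>&1 || rc=$?
  if [ "$rc" -eq 0 ]; then
    # Success: print from the PLAY RECAP banner to the end of the log.
    sed -n '/^PLAY RECAP/,$p' "$log"
  else
    # Failure: dump everything so the failing task is visible.
    cat "$log"
  fi
  rm -f "$log"
  return "$rc"
}

# Example: run_quiet ansible-playbook playbooks/day2/openshift-post-install-infra.yml
```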

Token optimization for AI-assisted development:

When using an AI assistant to develop against this codebase:

  • Batch all code edits locally before syncing. Get the code right by reading it and reasoning about correctness, then sync and run once. Do not iterate through the bastion.
  • Check PLAY RECAP first. Only read the full playbook log on failure. push_and_run.sh does this automatically.
  • Do not debug runtime infrastructure issues through the AI. SSH key loading failures, SELinux context mismatches, and service connectivity problems are faster and cheaper to debug in a terminal. Report the findings back and let the AI adjust the code.
  • Use the right model tier for the task. Use a reasoning-heavy model for planning, doc rewrites, and multi-source investigations. Switch to a faster model for mechanical execution: running playbooks, committing, syncing.

13A. Optionally Build AD DS And AD CS From Bastion

This is the manual AD build path: prepare media on virt-01, create the VM, complete the first boot, then configure AD DS and AD CS directly inside Windows.

Note

Automation reference: playbooks/bootstrap/ad-server.yml.

Important

This phase is optional and default-disabled in automation: lab_build_ad_server: false. Enable it only when you want the lab AD DS / AD CS server.

Note

Before enabling this path, download Windows Server 2025 evaluation media from the Microsoft Evaluation Center (https://www.microsoft.com/en-us/evalcenter/download-windows-server-2025). The currently validated selection is English (United States) -> ISO download -> 64-bit edition.

The manual path below follows the same validated design as the automated AD build:

  • install Windows Server 2025 onto /dev/ebs/ad-01
  • use virtio-win.iso for the required storage and network drivers
  • complete the remaining guest-tools and virtio-driver work after the OS is up
  • promote the server to corp.lan
  • install AD CS and Web Enrollment
  • seed the demo users and groups
  • export the root CA

Validated guest identity:

  • VM/domain: ad-01.corp.lan
  • Windows hostname: AD-01
  • IPv4: 172.16.0.40/24
  • gateway: 172.16.0.1
  • DNS: 172.16.0.10, 8.8.8.8

Confirm The Required Media On virt-01

Shell
# Confirm the Windows media and target disk are present on virt-01.
ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1
ls -l /root/images/26100.32230.260111-0550.lt_release_svc_refresh_SERVER_EVAL_x64FRE_en-us.iso
ls -l /root/images/virtio-win.iso
ls -l /dev/ebs/ad-01
exit

If virtio-win.iso is not already staged, the current documented source is:

Prepare The Unattended Install Media On virt-01

Create a small OEMDRV ISO that provides the answer file to Windows Setup. This keeps the manual path aligned with the validated unattended install.

Shell
# Create the AD answer-file media.
ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 <<'EOF'
set -euo pipefail
mkdir -p /var/lib/aws-metal-openshift-demo/ad-01/autounattend
install -o qemu -g qemu -m 0644 \
  /root/images/26100.32230.260111-0550.lt_release_svc_refresh_SERVER_EVAL_x64FRE_en-us.iso \
  /var/lib/aws-metal-openshift-demo/ad-01/26100.32230.260111-0550.lt_release_svc_refresh_SERVER_EVAL_x64FRE_en-us.iso
install -o qemu -g qemu -m 0644 \
  /root/images/virtio-win.iso \
  /var/lib/aws-metal-openshift-demo/ad-01/virtio-win.iso
cat >/var/lib/aws-metal-openshift-demo/ad-01/autounattend/autounattend.xml <<'XML'
<?xml version="1.0" encoding="utf-8"?>
<unattend xmlns="urn:schemas-microsoft-com:unattend">
  <settings pass="windowsPE">
    <component name="Microsoft-Windows-International-Core-WinPE"
               processorArchitecture="amd64"
               publicKeyToken="31bf3856ad364e35"
               language="neutral"
               versionScope="nonSxS">
      <SetupUILanguage><UILanguage>en-US</UILanguage></SetupUILanguage>
      <InputLocale>en-US</InputLocale>
      <SystemLocale>en-US</SystemLocale>
      <UILanguage>en-US</UILanguage>
      <UserLocale>en-US</UserLocale>
    </component>
    <component name="Microsoft-Windows-PnpCustomizationsWinPE"
               processorArchitecture="amd64"
               publicKeyToken="31bf3856ad364e35"
               language="neutral"
               versionScope="nonSxS"
               xmlns:wcm="http://schemas.microsoft.com/WMIConfig/2002/State">
      <DriverPaths>
        <PathAndCredentials wcm:action="add" wcm:keyValue="1">
          <Path>E:\vioscsi\2k25\amd64</Path>
        </PathAndCredentials>
        <PathAndCredentials wcm:action="add" wcm:keyValue="2">
          <Path>E:\NetKVM\2k25\amd64</Path>
        </PathAndCredentials>
      </DriverPaths>
    </component>
    <component name="Microsoft-Windows-Setup"
               processorArchitecture="amd64"
               publicKeyToken="31bf3856ad364e35"
               language="neutral"
               versionScope="nonSxS"
               xmlns:wcm="http://schemas.microsoft.com/WMIConfig/2002/State">
      <DiskConfiguration>
        <Disk wcm:action="add">
          <DiskID>0</DiskID>
          <WillWipeDisk>true</WillWipeDisk>
          <CreatePartitions>
            <CreatePartition wcm:action="add"><Order>1</Order><Size>260</Size><Type>EFI</Type></CreatePartition>
            <CreatePartition wcm:action="add"><Order>2</Order><Size>128</Size><Type>MSR</Type></CreatePartition>
            <CreatePartition wcm:action="add"><Order>3</Order><Extend>true</Extend><Type>Primary</Type></CreatePartition>
          </CreatePartitions>
          <ModifyPartitions>
            <ModifyPartition wcm:action="add"><Order>1</Order><PartitionID>1</PartitionID><Format>FAT32</Format><Label>EFI</Label></ModifyPartition>
            <ModifyPartition wcm:action="add"><Order>2</Order><PartitionID>2</PartitionID></ModifyPartition>
            <ModifyPartition wcm:action="add"><Order>3</Order><PartitionID>3</PartitionID><Format>NTFS</Format><Label>Windows</Label><Letter>C</Letter></ModifyPartition>
          </ModifyPartitions>
        </Disk>
      </DiskConfiguration>
      <ImageInstall>
        <OSImage>
          <InstallTo><DiskID>0</DiskID><PartitionID>3</PartitionID></InstallTo>
          <InstallFrom>
            <MetaData wcm:action="add"><Key>/IMAGE/INDEX</Key><Value>2</Value></MetaData>
          </InstallFrom>
        </OSImage>
      </ImageInstall>
      <UserData>
        <AcceptEula>true</AcceptEula>
        <ProductKey><WillShowUI>Never</WillShowUI></ProductKey>
      </UserData>
    </component>
  </settings>
  <settings pass="specialize">
    <component name="Microsoft-Windows-Shell-Setup"
               processorArchitecture="amd64"
               publicKeyToken="31bf3856ad364e35"
               language="neutral"
               versionScope="nonSxS">
      <ComputerName>AD-01</ComputerName>
      <TimeZone>UTC</TimeZone>
    </component>
    <component name="Microsoft-Windows-TerminalServices-LocalSessionManager"
               processorArchitecture="amd64"
               publicKeyToken="31bf3856ad364e35"
               language="neutral"
               versionScope="nonSxS">
      <fDenyTSConnections>false</fDenyTSConnections>
    </component>
  </settings>
  <settings pass="oobeSystem">
    <component name="Microsoft-Windows-International-Core"
               processorArchitecture="amd64"
               publicKeyToken="31bf3856ad364e35"
               language="neutral"
               versionScope="nonSxS">
      <InputLocale>en-US</InputLocale>
      <SystemLocale>en-US</SystemLocale>
      <UILanguage>en-US</UILanguage>
      <UserLocale>en-US</UserLocale>
    </component>
    <component name="Microsoft-Windows-Shell-Setup"
               processorArchitecture="amd64"
               publicKeyToken="31bf3856ad364e35"
               language="neutral"
               versionScope="nonSxS"
               xmlns:wcm="http://schemas.microsoft.com/WMIConfig/2002/State">
      <OOBE>
        <HideEULAPage>true</HideEULAPage>
        <HideLocalAccountScreen>true</HideLocalAccountScreen>
        <HideOEMRegistrationScreen>true</HideOEMRegistrationScreen>
        <HideOnlineAccountScreens>true</HideOnlineAccountScreens>
        <HideWirelessSetupInOOBE>true</HideWirelessSetupInOOBE>
        <ProtectYourPC>3</ProtectYourPC>
        <SkipMachineOOBE>true</SkipMachineOOBE>
        <SkipUserOOBE>true</SkipUserOOBE>
      </OOBE>
      <UserAccounts>
        <AdministratorPassword>
          <Value>REPLACE_WITH_LAB_DEFAULT_PASSWORD</Value>
          <PlainText>true</PlainText>
        </AdministratorPassword>
      </UserAccounts>
      <AutoLogon>
        <Enabled>true</Enabled>
        <Username>Administrator</Username>
        <Password>
          <Value>REPLACE_WITH_LAB_DEFAULT_PASSWORD</Value>
          <PlainText>true</PlainText>
        </Password>
        <LogonCount>3</LogonCount>
      </AutoLogon>
      <FirstLogonCommands>
        <SynchronousCommand wcm:action="add">
          <Order>1</Order>
          <CommandLine>powershell -NoProfile -Command "Set-ExecutionPolicy RemoteSigned -Force"</CommandLine>
          <Description>Set PowerShell execution policy</Description>
        </SynchronousCommand>
        <SynchronousCommand wcm:action="add">
          <Order>2</Order>
          <CommandLine>powershell -NoProfile -Command "$adapter = Get-NetAdapter | Where-Object { $_.Status -ne 'Disabled' } | Sort-Object ifIndex | Select-Object -First 1; Get-NetIPAddress -InterfaceAlias $adapter.Name -AddressFamily IPv4 -ErrorAction SilentlyContinue | Remove-NetIPAddress -Confirm:$false -ErrorAction SilentlyContinue; New-NetIPAddress -InterfaceAlias $adapter.Name -IPAddress '172.16.0.40' -PrefixLength 24 -DefaultGateway '172.16.0.1'; Set-DnsClientServerAddress -InterfaceAlias $adapter.Name -ServerAddresses @('172.16.0.10','8.8.8.8')"</CommandLine>
          <Description>Configure static IPv4 networking</Description>
        </SynchronousCommand>
        <SynchronousCommand wcm:action="add">
          <Order>3</Order>
          <CommandLine>powershell -NoProfile -Command "Set-DnsClientGlobalSetting -SuffixSearchList @('corp.lan')"</CommandLine>
          <Description>Set the DNS suffix search list</Description>
        </SynchronousCommand>
        <SynchronousCommand wcm:action="add">
          <Order>4</Order>
          <CommandLine>powershell -NoProfile -Command "Enable-PSRemoting -Force -SkipNetworkProfileCheck"</CommandLine>
          <Description>Enable PowerShell remoting</Description>
        </SynchronousCommand>
        <SynchronousCommand wcm:action="add">
          <Order>5</Order>
          <CommandLine>powershell -NoProfile -Command "Set-Item WSMan:\localhost\Service\Auth\Basic -Value $true"</CommandLine>
          <Description>Enable WinRM basic auth</Description>
        </SynchronousCommand>
        <SynchronousCommand wcm:action="add">
          <Order>6</Order>
          <CommandLine>powershell -NoProfile -Command "Set-Item WSMan:\localhost\Service\AllowUnencrypted -Value $true"</CommandLine>
          <Description>Allow unencrypted WinRM for lab</Description>
        </SynchronousCommand>
        <SynchronousCommand wcm:action="add">
          <Order>7</Order>
          <CommandLine>powershell -NoProfile -Command "New-NetFirewallRule -DisplayName 'WinRM HTTP' -Direction Inbound -Protocol TCP -LocalPort 5985 -Action Allow"</CommandLine>
          <Description>Open WinRM firewall port</Description>
        </SynchronousCommand>
        <SynchronousCommand wcm:action="add">
          <Order>8</Order>
          <CommandLine>powershell -NoProfile -Command "Restart-Service WinRM"</CommandLine>
          <Description>Restart WinRM service</Description>
        </SynchronousCommand>
        <SynchronousCommand wcm:action="add">
          <Order>9</Order>
          <CommandLine>reg add "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Winlogon" /v AutoAdminLogon /t REG_SZ /d 0 /f</CommandLine>
          <Description>Disable auto-logon</Description>
        </SynchronousCommand>
      </FirstLogonCommands>
    </component>
  </settings>
</unattend>
XML
sed -i "s/REPLACE_WITH_LAB_DEFAULT_PASSWORD/<lab-default-password>/g" \
  /var/lib/aws-metal-openshift-demo/ad-01/autounattend/autounattend.xml
xorriso -as mkisofs \
  -o /var/lib/aws-metal-openshift-demo/ad-01/ad-01-autounattend.iso \
  -V OEMDRV -J -R -graft-points \
  autounattend.xml=/var/lib/aws-metal-openshift-demo/ad-01/autounattend/autounattend.xml
chown qemu:qemu /var/lib/aws-metal-openshift-demo/ad-01/ad-01-autounattend.iso
EOF

Create The VM On virt-01

Shell
# Create the AD guest on virt-01 and attach the Windows installer media.
ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 <<'EOF'
set -euo pipefail
dd if=/dev/zero of=/dev/ebs/ad-01 bs=1M count=1 conv=notrunc
virt-install \
  --name ad-01.corp.lan \
  --osinfo win2k25 \
  --boot uefi,loader_secure=no \
  --machine q35 \
  --memory 8192 \
  --vcpus 4 \
  --cpu host-passthrough \
  --controller type=scsi,model=virtio-scsi \
  --controller type=virtio-serial,index=0 \
  --disk path=/dev/ebs/ad-01,format=raw,bus=scsi,cache=none,io=native,discard=unmap,rotation_rate=1,boot_order=2 \
  --disk path=/var/lib/aws-metal-openshift-demo/ad-01/26100.32230.260111-0550.lt_release_svc_refresh_SERVER_EVAL_x64FRE_en-us.iso,device=cdrom,readonly=on,boot_order=1 \
  --disk path=/var/lib/aws-metal-openshift-demo/ad-01/virtio-win.iso,device=cdrom,readonly=on \
  --disk path=/var/lib/aws-metal-openshift-demo/ad-01/ad-01-autounattend.iso,device=cdrom,readonly=on \
  --network network=lab-switch,portgroup=mgmt-access,model=virtio,mac=52:54:00:50:01:05 \
  --channel unix,target_type=virtio,name=org.qemu.guest_agent.0 \
  --rng builtin \
  --graphics vnc,listen=0.0.0.0 \
  --console pty,target_type=serial \
  --autostart \
  --resource partition=/machine/bronze \
  --cputune shares=167,emulatorpin.cpuset=2-5,26-29,50-53,74-77,\
vcpupin0.vcpu=0,vcpupin0.cpuset=6-23,30-47,54-71,78-95,\
vcpupin1.vcpu=1,vcpupin1.cpuset=6-23,30-47,54-71,78-95,\
vcpupin2.vcpu=2,vcpupin2.cpuset=6-23,30-47,54-71,78-95,\
vcpupin3.vcpu=3,vcpupin3.cpuset=6-23,30-47,54-71,78-95 \
  --noautoconsole
EOF

Use the Cockpit console or virt-viewer to watch first boot. On the validated media path, UEFI may present a DVD boot menu first. If it does:

  • choose the first DVD entry
  • at Press any key to boot from CD or DVD, press Enter

Windows should then:

  • load the boot-critical vioscsi and NetKVM drivers from virtio-win.iso
  • partition /dev/ebs/ad-01
  • install Server 2025
  • come up as AD-01
  • apply the static IP and WinRM settings from autounattend.xml

Verify First WinRM Reachability

From the bastion:

Shell
# Verify that WinRM is reachable on the AD guest.
curl -sI http://172.16.0.40:5985/wsman | head -n 1

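You should see an HTTP status line. Any HTTP response confirms that the WinRM listener is up. Because head -n 1 keeps only the status line, drop it if you also want to see the Server: Microsoft-HTTPAPI/2.0 header.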
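If the guest is still finishing first boot, the same probe can be retried until the listener answers. The loop below is a minimal sketch under stated assumptions: wait_for_winrm is a hypothetical name, and the attempt count and delay are arbitrary defaults rather than values taken from the automation.

```shell
# Hypothetical helper: poll the WinRM HTTP listener until it answers.
wait_for_winrm() {
  local host="$1" attempts="${2:-30}" delay="${3:-10}" i=0
  while [ "$i" -lt "$attempts" ]; do
    # Any HTTP status line means the listener is accepting connections.
    if curl -sI --max-time 5 "http://${host}:5985/wsman" | head -n 1 | grep -q '^HTTP/'; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "WinRM on ${host} did not answer after ${attempts} attempts" >&2
  return 1
}

# Example (from the bastion): wait_for_winrm 172.16.0.40
```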

Install Remaining Virtio Components And Guest Agent

Log in to the Windows console as Administrator, open an elevated PowerShell, locate the virtio media drive letter, then install the guest-tools bundle, the remaining drivers, and the QEMU guest agent:

PowerShell
$virtio = Get-PSDrive -PSProvider FileSystem |
  ForEach-Object { $_.Root.TrimEnd('\') } |
  Where-Object { Test-Path "$_\guest-agent\qemu-ga-x86_64.msi" } |
  Select-Object -First 1
msiexec /i "$virtio\virtio-win-gt-x64.msi" /qn /norestart
pnputil /add-driver "$virtio\Balloon\2k25\amd64\*.inf" /install
pnputil /add-driver "$virtio\qemufwcfg\2k25\amd64\*.inf" /install
pnputil /add-driver "$virtio\vioserial\2k25\amd64\*.inf" /install
pnputil /add-driver "$virtio\viorng\2k25\amd64\*.inf" /install
msiexec /i "$virtio\guest-agent\qemu-ga-x86_64.msi" /qn /norestart
$svc = Get-Service | Where-Object {
  $_.Name -in @('QEMU-GA', 'qemu-ga') -or
  $_.DisplayName -like '*QEMU*Guest*Agent*'
} | Select-Object -First 1
Set-Service -Name $svc.Name -StartupType Automatic
Start-Service -Name $svc.Name
Get-CimInstance Win32_Service -Filter "Name='$($svc.Name)'"

Promote The Server To corp.lan

Still in elevated PowerShell on AD-01:

PowerShell
Install-WindowsFeature AD-Domain-Services,DNS -IncludeManagementTools
Install-ADDSForest `
  -DomainName 'corp.lan' `
  -DomainNetbiosName 'CORP' `
  -SafeModeAdministratorPassword (ConvertTo-SecureString '<lab-default-password>' -AsPlainText -Force) `
  -InstallDns `
  -Force

Let the server reboot. After it comes back, verify domain-controller state:

PowerShell
Get-ADDomainController
Get-ADDomain

Configure DNS Forwarding And AD CS

PowerShell
Add-DnsServerConditionalForwarderZone `
  -Name 'workshop.lan' `
  -MasterServers @('172.16.0.10') `
  -ReplicationScope Forest
Install-WindowsFeature AD-Certificate,ADCS-Cert-Authority,ADCS-Web-Enrollment -IncludeManagementTools
Install-AdcsCertificationAuthority `
  -CAType EnterpriseRootCA `
  -CryptoProviderName 'RSA#Microsoft Software Key Storage Provider' `
  -KeyLength 4096 `
  -HashAlgorithmName SHA256 `
  -CACommonName 'CORP Enterprise Root CA' `
  -ValidityPeriod Years `
  -ValidityPeriodUnits 10 `
  -Force
Install-AdcsWebEnrollment -Force
certutil -ping

Seed The Demo Groups And Users

PowerShell
$base = 'CN=Users,DC=corp,DC=lan'
'OpenShift-Admins','OpenShift-Virt-Admins','Ansible-Automation-Admins','Developers' | ForEach-Object {
  if (-not (Get-ADGroup -Filter "Name -eq '$_'" -ErrorAction SilentlyContinue)) {
    New-ADGroup -Name $_ -GroupScope Global -GroupCategory Security -Path $base
  }
}
$users = @(
  @{ name='ad-directoryadmin'; first='Directory';      last='Admin'; groups=@('Domain Admins') },
  @{ name='ad-ocpadmin';       first='OpenShift';      last='Admin'; groups=@('OpenShift-Admins') },
  @{ name='ad-virtadmin';      first='Virtualization'; last='Admin'; groups=@('OpenShift-Virt-Admins') },
  @{ name='ad-aapadmin';       first='Automation';     last='Admin'; groups=@('Ansible-Automation-Admins') },
  @{ name='ad-dev01';          first='Developer';      last='One';   groups=@('Developers') }
)
$pw = ConvertTo-SecureString '<lab-default-password>' -AsPlainText -Force
foreach ($u in $users) {
  if (-not (Get-ADUser -Filter "SamAccountName -eq '$($u.name)'" -ErrorAction SilentlyContinue)) {
    New-ADUser `
      -Name "$($u.first) $($u.last)" `
      -GivenName $u.first `
      -Surname $u.last `
      -SamAccountName $u.name `
      -UserPrincipalName "$($u.name)@corp.lan" `
      -AccountPassword $pw `
      -Enabled $true `
      -PasswordNeverExpires $true `
      -Path $base
  }
  foreach ($group in $u.groups) {
    $members = Get-ADGroupMember -Identity $group -ErrorAction SilentlyContinue |
      Select-Object -ExpandProperty SamAccountName
    if ($members -notcontains $u.name) {
      Add-ADGroupMember -Identity $group -Members $u.name
    }
  }
}

Open The Required Windows Firewall Groups

PowerShell
$displayGroups = @(
  'DNS Service',
  'Kerberos Key Distribution Center',
  'Active Directory Domain Services',
  'Certification Authority',
  'Windows Remote Management'
)
foreach ($group in $displayGroups) {
  $rules = Get-NetFirewallRule | Where-Object { $_.DisplayGroup -eq $group }
  if ($rules) {
    $rules | Enable-NetFirewallRule
  }
}

Export The AD Root CA And Validate Final State

PowerShell
certutil -ca.cert C:\Windows\Temp\corp-root-ca.cer
Get-ADDomain
Get-ADDomainController
Get-Service CertSvc

From virt-01, detach the installation media once the guest configuration is complete:

Shell
# Eject the Windows installer media from the AD guest.
ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 <<'EOF'
for target in $(virsh domblklist ad-01.corp.lan --details | awk '$2 == "cdrom" { print $3 }'); do
  virsh change-media ad-01.corp.lan "$target" --eject --config --live --force || true
done
EOF

Validated AD outputs:

  • AD domain: corp.lan
  • Enterprise Root CA: CORP Enterprise Root CA
  • groups:
    • OpenShift-Admins
    • OpenShift-Virt-Admins
    • Ansible-Automation-Admins
    • Developers
  • users:
    • ad-directoryadmin
    • ad-ocpadmin
    • ad-virtadmin
    • ad-aapadmin
    • ad-dev01

Quick verification from the bastion and hypervisor:

Shell
# Validate WinRM and the AD guest power state.
curl -sI http://172.16.0.40:5985/wsman | head -n 1
ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 'virsh domstate ad-01.corp.lan'

13AA. Optionally Configure IdM To AD Trust

If the AD support VM is enabled and the lab should bridge selected AD groups into local IdM policy groups, complete the trust setup here before bastion enrollment.

Note

Automation reference: playbooks/bootstrap/idm-ad-trust.yml. The current automated path configures the AD conditional forwarder for workshop.lan, enables IdM AD trust support, creates the AD DNS forward zone in IPA, establishes the trust, and nests the mapped IdM external groups into the target local policy groups described in AD / IDM POLICY MODEL.

Manual checkpoints for this phase:

  • on AD-01, workshop.lan must resolve through the conditional forwarder to idm-01
  • on idm-01, corp.lan forward-zone lookups and AD LDAP SRV lookups must resolve through ad-01
  • ipa trust-show corp.lan --all must succeed on idm-01
  • the mapped IdM external groups and nested local policy groups must match the intended bridge policy

Useful spot checks:

PowerShell
Resolve-DnsName -Name 'idm-01.workshop.lan' -Server 127.0.0.1 -Type A

Shell
# Confirm the AD trust records are visible before proceeding.
host ad-01.corp.lan 127.0.0.1
host -t SRV _ldap._tcp.dc._msdcs.corp.lan 127.0.0.1
ipa trust-show corp.lan --all
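The idm-01 spot checks can also be run as a group, with a pass or fail line per checkpoint. The sketch below assumes it runs on idm-01 with a valid admin Kerberos ticket; the function names check and trust_checkpoints are hypothetical and are not part of the automation.

```shell
# Hypothetical helper: run one checkpoint quietly and report PASS or FAIL.
check() {
  local label="$1"; shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS ${label}"
  else
    echo "FAIL ${label}"
    return 1
  fi
}

# Run the idm-01 checkpoints from this section; succeed only if all pass.
trust_checkpoints() {
  local failed=0
  check "corp.lan forward lookup" host ad-01.corp.lan 127.0.0.1 || failed=$((failed + 1))
  check "AD LDAP SRV lookup" host -t SRV _ldap._tcp.dc._msdcs.corp.lan 127.0.0.1 || failed=$((failed + 1))
  check "IdM trust object" ipa trust-show corp.lan --all || failed=$((failed + 1))
  [ "$failed" -eq 0 ]
}

# Example (on idm-01): trust_checkpoints
```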

13B. Join The Bastion To IdM

At this point bastion-01 already exists and idm-01 is already configured. The remaining work is to trust the active IdM CA, enroll the bastion as an IPA client, and enable the authselect features used by the rest of the lab. The current join path no longer performs a general guest update or reboot; those cycles stay in the earlier site-bootstrap.yml provisioning flow.

Note

Automation reference: playbooks/bootstrap/bastion-join.yml.

From bastion-01, make sure the active IdM CA is trusted locally before the client install:

Shell
# Install the IdM CA on bastion before the client join.
curl -o /tmp/idm-ca.crt http://idm-01.workshop.lan/ipa/config/ca.crt
sudo install -o root -g root -m 0644 \
  /tmp/idm-ca.crt /etc/ipa/ca.crt
sudo install -o root -g root -m 0644 \
  /tmp/idm-ca.crt /etc/pki/ca-trust/source/anchors/idm-rootCA.pem
sudo update-ca-trust extract

Enroll the bastion into IdM:

Shell
# Join bastion to IdM.
sudo dnf -y install \
  ipa-client \
  oddjob \
  oddjob-mkhomedir \
  sssd \
  authselect-compat
sudo ipa-client-install -U \
  --hostname=bastion-01.workshop.lan \
  --domain=workshop.lan \
  --realm=WORKSHOP.LAN \
  --server=idm-01.workshop.lan \
  --principal=admin \
  --password='<lab-default-password>' \
  --force-join \
  --mkhomedir \
  --no-ntp

Because bastion-01 uses a static address, do not rely on client-side dynamic DNS updates for its authoritative IdM records. Reassert and validate the A/PTR records explicitly:

Shell
# Create the bastion DNS records in IdM.
kinit admin <<< '<lab-default-password>'
ipa dnsrecord-add workshop.lan bastion-01 --a-rec=172.16.0.30 \
  || ipa dnsrecord-mod workshop.lan bastion-01 --a-rec=172.16.0.30
ipa dnsrecord-add 0.16.172.in-addr.arpa 30 \
  --ptr-rec=bastion-01.workshop.lan. \
  || ipa dnsrecord-mod 0.16.172.in-addr.arpa 30 \
    --ptr-rec=bastion-01.workshop.lan.
dig +short @172.16.0.10 bastion-01.workshop.lan A
dig +short @172.16.0.10 -x 172.16.0.30
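The reverse zone and record name in the commands above follow directly from the guest address: for a /24 network, the first three octets are reversed to form the zone and the last octet is the PTR record name. A small sketch of that derivation, assuming a /24 prefix; the function name ptr_parts is hypothetical.

```shell
# Hypothetical helper: split an IPv4 address in a /24 network into the
# reverse zone name and the PTR record name used by ipa dnsrecord-add.
ptr_parts() {
  local ip="$1" a b c d
  IFS=. read -r a b c d <<< "$ip"
  # Require four octets; anything shorter is not a full IPv4 address.
  [ -n "$d" ] || return 1
  printf '%s.%s.%s.in-addr.arpa %s\n' "$c" "$b" "$a" "$d"
}

# Example: ptr_parts 172.16.0.30 prints "0.16.172.in-addr.arpa 30"
```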

Enable the same client-side login behavior the automation expects:

Shell
# Enable the expected SSSD login behavior on bastion.
sudo systemctl enable --now oddjobd.service
sudo authselect select sssd with-mkhomedir with-sudo --force
sudo systemctl restart sssd
sudo sss_cache -E

Validate the bastion is now using IdM:

Shell
# Validate that bastion is using IdM for identity resolution.
id admin@workshop.lan
getent passwd admin@workshop.lan
sudo sssctl domain-status workshop.lan

At this point the bastion is ready for IdM-backed operator access. The next support-service phase is the mirror registry build.


Bastion boundary — all remaining work runs from bastion-01

Warning

Everything below this line runs from the bastion. Do not switch back to the operator workstation for steps 13A-36 unless you are deliberately debugging the automation itself. Once you cross this boundary, stay on bastion.

14. Build The Mirror Registry VM

Build and configure the mirror-registry VM from the bastion, join it to IdM, and install the Quay-based disconnected registry stack.

Note

Automation reference: playbooks/lab/mirror-registry.yml, roles mirror_registry and mirror_registry_guest.

From the bastion, after the validated support-services order (bastion -> bastion-stage -> optional ad-server -> idm -> bastion-join), create the mirror-registry VM on virt-01, then configure the guest, join it to IdM, and install the Quay-based mirror registry stack.

Shell
ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1
mkdir -p /var/lib/aws-metal-openshift-demo/mirror-registry/cloudinit
qemu-img convert -f qcow2 -O raw \
  <rhel10-image-path> \
  /dev/ebs/mirror-registry
SSH_PUBKEY="$(cat /opt/openshift/secrets/hypervisor-admin.pub)"
cat <<'EOF' >/var/lib/aws-metal-openshift-demo/mirror-registry/cloudinit/meta-data
instance-id: mirror-registry
local-hostname: mirror-registry.workshop.lan
EOF
cat <<'EOF' >/var/lib/aws-metal-openshift-demo/mirror-registry/cloudinit/network-config
version: 2
ethernets:
  eth0:
    dhcp4: false
    addresses:
      - 172.16.0.20/24
    routes:
      - to: 0.0.0.0/0
        via: 172.16.0.1
    nameservers:
      search: [workshop.lan]
      addresses: [172.16.0.10, 8.8.8.8, 4.4.4.4]
EOF
cat <<EOF >/var/lib/aws-metal-openshift-demo/mirror-registry/cloudinit/user-data
#cloud-config
fqdn: mirror-registry.workshop.lan
manage_etc_hosts: true
users:
  - default
  - name: cloud-user
    groups: [wheel]
    sudo: ALL=(ALL) NOPASSWD:ALL
    lock_passwd: false
    ssh_authorized_keys:
      - ${SSH_PUBKEY}
EOF
xorriso -as mkisofs \
  -o /var/lib/aws-metal-openshift-demo/mirror-registry/mirror-registry-cidata.iso \
  -V CIDATA -J -R -graft-points \
  user-data=/var/lib/aws-metal-openshift-demo/mirror-registry/cloudinit/user-data \
  meta-data=/var/lib/aws-metal-openshift-demo/mirror-registry/cloudinit/meta-data \
  network-config=/var/lib/aws-metal-openshift-demo/mirror-registry/cloudinit/network-config
chown qemu:qemu /var/lib/aws-metal-openshift-demo/mirror-registry/mirror-registry-cidata.iso
virt-install \
  --name mirror-registry.workshop.lan \
  --osinfo rhel10.0 \
  --boot uefi \
  --machine q35 \
  --memory 8192 \
  --vcpus 4 \
  --cpu host-passthrough \
  --controller type=scsi,model=virtio-scsi \
  --disk path=/dev/ebs/mirror-registry,format=raw,bus=scsi,cache=none,io=native,discard=unmap,rotation_rate=1 \
  --disk path=/var/lib/aws-metal-openshift-demo/mirror-registry/mirror-registry-cidata.iso,device=cdrom \
  --network network=lab-switch,portgroup=mgmt-access,model=virtio,mac=52:54:00:00:00:20 \
  --rng builtin \
  --import \
  --graphics none \
  --resource partition=/machine/bronze \
  --cputune shares=167,emulatorpin.cpuset=2-5,26-29,50-53,74-77,\
vcpupin0.vcpu=0,vcpupin0.cpuset=6-23,30-47,54-71,78-95,\
vcpupin1.vcpu=1,vcpupin1.cpuset=6-23,30-47,54-71,78-95,\
vcpupin2.vcpu=2,vcpupin2.cpuset=6-23,30-47,54-71,78-95,\
vcpupin3.vcpu=3,vcpupin3.cpuset=6-23,30-47,54-71,78-95,\
iothreadpin0.iothread=1,iothreadpin0.cpuset=2-5,26-29,50-53,74-77 \
  --iothreads iothreads=1 \
  --console pty,target_type=serial \
  --autostart \
  --noautoconsole

Configure the guest itself, install packages, join IdM, and install the mirror registry appliance.

Note

Update redhat-release before the main system update here as well. This ensures the current Red Hat Post-Quantum Cryptography public keys are present before DNF validates newer packages. See: https://access.redhat.com/solutions/3449341

Shell
# Configure the mirror-registry guest and install the appliance prerequisites.
ssh -i /opt/openshift/secrets/hypervisor-admin.key cloud-user@172.16.0.20
sudo -i
dnf -y update redhat-release
dnf -y update
reboot
# Reconnect over SSH once the guest is back, run sudo -i again, then continue.
dnf -y install \
  cockpit \
  firewalld \
  ipa-client \
  certmonger \
  git \
  podman \
  jq \
  skopeo \
  openssl \
  tar \
  gzip
mkdir -p /etc/containers/containers.conf.d
cat <<'EOF' >/etc/containers/containers.conf.d/99-mirror-registry-cgroupfs.conf
[engine]
cgroup_manager = "cgroupfs"
EOF
systemctl enable --now firewalld
systemctl enable --now cockpit.socket
firewall-cmd --permanent --add-service=cockpit
firewall-cmd --permanent --add-service=ssh
firewall-cmd --permanent --add-port=8443/tcp
firewall-cmd --reload
ipa-client-install -U \
  --hostname=mirror-registry.workshop.lan \
  --domain=workshop.lan \
  --realm=WORKSHOP.LAN \
  --server=idm-01.workshop.lan \
  --principal=admin \
  --password='<lab-default-password>' \
  --force-join \
  --mkhomedir
mkdir -p /usr/local/libexec/mirror-registry /opt/quay-install /root/bin /opt/openshift
curl -L -o /tmp/mirror-registry-amd64.tar.gz \
  https://mirror.openshift.com/pub/cgw/mirror-registry/latest/mirror-registry-amd64.tar.gz
tar -C /usr/local/libexec/mirror-registry -xzf /tmp/mirror-registry-amd64.tar.gz
install -m 0755 /usr/local/libexec/mirror-registry/mirror-registry /usr/local/bin/mirror-registry

Install calabi-shell only after the guest is registered and can install packages from the Red Hat CDN. The system-wide install requires git, so it does not belong in pre-registration cloud-init.

Shell
git clone https://github.com/gprocunier/calabi-shell /opt/calabi-shell
/opt/calabi-shell/install.sh --system

As with the bastion, the mirror-registry guest has a static address. Reassert and validate its authoritative IdM records explicitly instead of relying on client-driven dynamic DNS updates:

Shell
# Create the mirror-registry DNS records in IdM.
kinit admin <<< '<lab-default-password>'
ipa dnsrecord-add workshop.lan mirror-registry --a-rec=172.16.0.20 \
  || ipa dnsrecord-mod workshop.lan mirror-registry --a-rec=172.16.0.20
ipa dnsrecord-add 0.16.172.in-addr.arpa 20 \
  --ptr-rec=mirror-registry.workshop.lan. \
  || ipa dnsrecord-mod 0.16.172.in-addr.arpa 20 \
    --ptr-rec=mirror-registry.workshop.lan.
dig +short @172.16.0.10 mirror-registry.workshop.lan A
dig +short @172.16.0.10 -x 172.16.0.20

Request an IdM-issued certificate for the registry and install the registry with that certificate.

Shell
# Request and install the mirror-registry certificate.
kinit admin <<< '<lab-default-password>'
ipa service-add HTTP/mirror-registry.workshop.lan || true
mkdir -p /var/lib/mirror-registry/install-certs
ipa-getcert request -w \
  -I mirror-registry-quay \
  -f /etc/pki/tls/certs/mirror-registry.workshop.lan.crt \
  -k /etc/pki/tls/private/mirror-registry.workshop.lan.key \
  -K HTTP/mirror-registry.workshop.lan \
  -D mirror-registry.workshop.lan \
  -g 2048
cat /etc/pki/tls/certs/mirror-registry.workshop.lan.crt \
    /etc/ipa/ca.crt \
  >/var/lib/mirror-registry/install-certs/ssl.cert
cp /etc/pki/tls/private/mirror-registry.workshop.lan.key \
  /var/lib/mirror-registry/install-certs/ssl.key
chmod 0644 /var/lib/mirror-registry/install-certs/ssl.*
# Quote the password placeholder so the shell does not treat < and > as redirections.
mirror-registry install \
  --quayHostname mirror-registry.workshop.lan \
  --quayRoot /opt/quay-install \
  --initUser init \
  --initPassword '<lab-default-password>' \
  --sslCert /var/lib/mirror-registry/install-certs/ssl.cert \
  --sslKey /var/lib/mirror-registry/install-certs/ssl.key
update-ca-trust
mkdir -p /etc/containers/certs.d/mirror-registry.workshop.lan:8443
cp /etc/ipa/ca.crt /etc/pki/ca-trust/source/anchors/workshop-idm-ca.crt
cp /etc/ipa/ca.crt /etc/containers/certs.d/mirror-registry.workshop.lan:8443/ca.crt
update-ca-trust extract
podman login mirror-registry.workshop.lan:8443 \
  --username init \
  --password '<lab-default-password>'

After certificate issuance or renewal, the current automation also synchronizes the staged cert and key into the live Quay config under /opt/quay-install/quay-config/ and restarts the Quay services. That step is what ensures the served certificate actually matches the freshly issued IdM certificate chain.

Like the other support guests, the current automation uses a poweroff/offline cleanup/start cycle for the first update-triggered restart so the cloud-init CD-ROM cleanup is reflected in the next live QEMU process.

15. Mirror OpenShift And Operator Content

Mirror the OpenShift release payloads, operator catalogs, and extra images into the local registry using the portable m2d plus d2m workflow.

Note

Automation reference: the mirroring portion of playbooks/lab/mirror-registry.yml, primarily role mirror_registry_guest.

The disconnected standard for this lab is now:

  • portable — runs both m2d (pull to disk archive) and d2m (push into local Quay) in a single playbook invocation

The import mode (d2m only) remains available for re-importing an existing archive without re-pulling from upstream.

Direct mirror-to-registry remains available for partial-disconnect validation, but it is no longer the primary student workflow.

Install the matching client tools on the mirror registry, copy the Red Hat pull secret into place, render the ImageSetConfiguration, and run oc-mirror.

Shell
# Download the required tools and mirror the OpenShift content.
dnf -y install jq
mkdir -p /opt/openshift/oc-mirror /opt/openshift/oc-mirror-archive /root/.config/containers
curl -L -o /tmp/openshift-client-linux.tar.gz \
  https://mirror.openshift.com/pub/openshift-v4/x86_64/clients/ocp/4.20.15/openshift-client-linux.tar.gz
tar -C /usr/local/bin -xzf /tmp/openshift-client-linux.tar.gz oc kubectl
curl -L -o /tmp/oc-mirror.rhel9.tar.gz \
  https://mirror.openshift.com/pub/openshift-v4/x86_64/clients/ocp/4.20.15/oc-mirror.rhel9.tar.gz
tar -C /usr/local/bin -xzf /tmp/oc-mirror.rhel9.tar.gz oc-mirror
cp /opt/openshift/secrets/pull-secret.txt /opt/openshift/pull-secret.json
# Deep-merge both auth files; entries from the second file win on conflict.
jq -s '.[0] * .[1]' \
  /opt/openshift/pull-secret.json \
  /root/.config/containers/auth.json \
  >/opt/openshift/pull-secret-merged.json
cat <<'EOF' >/opt/openshift/imageset-config.yaml
apiVersion: mirror.openshift.io/v2alpha1
kind: ImageSetConfiguration
mirror:
  platform:
    channels:
      - name: stable-4.20
        minVersion: 4.20.15
        maxVersion: 4.20.15
    architectures:
      - amd64
  operators:
    - catalog: registry.redhat.io/redhat/redhat-operator-index:v4.20
      packages:
        - name: kubevirt-hyperconverged
          channels: [{name: stable}]
        - name: local-storage-operator
          channels: [{name: stable}]
        - name: kubernetes-nmstate-operator
          channels: [{name: stable}]
        - name: loki-operator
          channels: [{name: stable-6.2}]
        - name: netobserv-operator
          channels: [{name: stable}]
        - name: openshift-pipelines-operator-rh
          channels: [{name: pipelines-1.20}]
        - name: ansible-automation-platform-operator
          channels: [{name: stable-2.6}]
        - name: web-terminal
          channels: [{name: fast}]
        - name: devworkspace-operator
          channels: [{name: fast}]
        - name: node-healthcheck-operator
          channels: [{name: alpha}]
        - name: fence-agents-remediation
          channels: [{name: alpha}]
        - name: odf-operator
          channels: [{name: stable-4.20}]
        - name: ocs-operator
          channels: [{name: stable-4.20}]
        - name: mcg-operator
          channels: [{name: stable-4.20}]
        - name: odf-csi-addons-operator
          channels: [{name: stable-4.20}]
        - name: rook-ceph-operator
          channels: [{name: stable-4.20}]
        - name: cephcsi-operator
          channels: [{name: stable-4.20}]
        - name: metallb-operator
          channels: [{name: stable}]
EOF
oc-mirror --v2 \
  --config /opt/openshift/imageset-config.yaml \
  --authfile /opt/openshift/pull-secret-merged.json \
  --parallel-images 16 \
  --parallel-layers 12 \
  file:///opt/openshift/oc-mirror-archive

Import the resulting archive into the local registry. In the current automation this runs as a single portable workflow (m2d then d2m in one invocation). The manual equivalent is two separate commands — pull to disk, then push into Quay:

Shell
# Import the mirrored archive into the local registry.
oc-mirror --v2 \
  --config /opt/openshift/imageset-config.yaml \
  --authfile /root/.config/containers/auth.json \
  --parallel-images 16 \
  --parallel-layers 12 \
  --from file:///opt/openshift/oc-mirror-archive \
  docker://mirror-registry.workshop.lan:8443/openshift

The automation exposes the --parallel-images and --parallel-layers values as mirror_registry_oc_mirror_parallel_images and mirror_registry_oc_mirror_parallel_layers.

Track the workflow with the bastion helper that the orchestration now installs.

Shell
# Open the mirror progress helpers.
/usr/local/bin/track-mirror-progress
/usr/local/bin/track-mirror-progress-tmux

The tmux variant opens dedicated panes for:

  • summary
  • runner state
  • storage/import state
  • registry container state
  • live bastion log tail

The same information can also be gathered manually without the helper.

From bastion-01, inspect the runner state and latest Ansible task.

Shell
# Inspect the bastion-side mirror job state.
tail -f /var/tmp/bastion-playbooks/mirror-registry.log
cat /var/tmp/bastion-playbooks/mirror-registry.pid
cat /var/tmp/bastion-playbooks/mirror-registry.rc

From mirror-registry.workshop.lan, inspect live oc-mirror activity, archive growth, and guest disk usage.

Shell
# Inspect live mirror activity and archive growth on the registry host.
pgrep -af oc-mirror
df -h /
du -sh /opt/openshift/oc-mirror-archive
du -sh /opt/openshift/oc-mirror
sudo du -sh /var/lib/containers/storage/volumes/quay-storage/_data
sudo du -sh /var/lib/containers/storage/volumes/sqlite-storage/_data
sudo podman ps
tail -f /var/log/oc-mirror-m2d.log
tail -f /var/log/oc-mirror-d2m.log

Tip

The mirroring phase is the longest single step in the build (hours, not minutes). Use track-mirror-progress-tmux on the bastion to monitor it. If the guest runs out of disk mid-mirror, the archive is corrupted and you start over.

If an oc-mirror dry run fails because port 55000 is already bound, look for a stale oc-mirror process on the mirror-registry guest before rerunning. That is usually a rerun artifact, not a content-model problem.

Approximate sizing guidance:

  • m2d safe target: archive_size * 1.5 + 20 GiB
  • same-host d2m safe target: archive_size * 2.5 + 20 GiB
  • recommended same-host disk for both phases with margin: archive_size * 3 + 20 GiB
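
As a rough sketch, the three formulas above can be evaluated with awk for any archive size. The function name mirror_disk_targets and the 100 GiB input are illustrative assumptions and are not part of the automation.

```shell
# Hypothetical helper: apply the three sizing formulas to an archive size in GiB.
mirror_disk_targets() {
  awk -v a="$1" 'BEGIN {
    # m2d: archive * 1.5 + 20, same-host d2m: archive * 2.5 + 20,
    # both phases with margin: archive * 3 + 20
    printf "m2d=%.1f d2m=%.1f both=%.1f\n", a * 1.5 + 20, a * 2.5 + 20, a * 3 + 20
  }'
}

mirror_disk_targets 100   # prints: m2d=170.0 d2m=270.0 both=320.0
```

All results are in GiB. Treat them as planning targets rather than hard limits, since the formulas themselves are approximate.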

Observed on the live 4.20.15 run with the current operator set:

  • m2d archive size: about 95 GiB
  • m2d safe target: about 162 GiB
  • same-host d2m safe target: about 256 GiB
  • recommended same-host disk for both phases with margin: about 303 GiB
  • practical lab decision: provision 400 GiB for mirror-registry
  • observed imported Quay content footprint after d2m: about 82 GiB

16. Populate OpenShift DNS In IdM

Populate the OpenShift forward and reverse DNS zones in IdM so the cluster and its routes resolve correctly before install.

Note

Automation reference: playbooks/lab/openshift-dns.yml, role idm_openshift_dns.

Create the forward and reverse DNS zones and records in IdM for the cluster, nodes, and VIPs.

Shell
# Create the OpenShift forward and reverse DNS zones and VIP records.
ssh -i /opt/openshift/secrets/hypervisor-admin.key cloud-user@172.16.0.10
sudo -i
kinit admin <<< '<lab-default-password>'
ipa dnszone-add ocp.workshop.lan \
  --name-server=idm-01.workshop.lan. \
  --admin-email=hostmaster.ocp.workshop.lan \
  --dynamic-update=FALSE || true
ipa dnszone-add 10.16.172.in-addr.arpa \
  --name-server=idm-01.workshop.lan. \
  --admin-email=hostmaster.ocp.workshop.lan \
  --dynamic-update=FALSE || true
ipa dnszone-add 11.16.172.in-addr.arpa \
  --name-server=idm-01.workshop.lan. \
  --admin-email=hostmaster.ocp.workshop.lan \
  --dynamic-update=FALSE || true
ipa dnsrecord-add ocp.workshop.lan api --a-rec=172.16.10.5 || true
ipa dnsrecord-add ocp.workshop.lan api-int --a-rec=172.16.10.5 || true
ipa dnsrecord-add ocp.workshop.lan '*.apps' --a-rec=172.16.10.7 || true
ipa dnsrecord-add ocp.workshop.lan ingress --a-rec=172.16.10.7 || true

Create the node A and PTR records.

Shell
# Create the OpenShift node A and PTR records.
for entry in \
  "ocp-master-01 11 11" \
  "ocp-master-02 12 12" \
  "ocp-master-03 13 13" \
  "ocp-infra-01 21 21" \
  "ocp-infra-02 22 22" \
  "ocp-infra-03 23 23" \
  "ocp-worker-01 31 31" \
  "ocp-worker-02 32 32" \
  "ocp-worker-03 33 33"; do
  set -- $entry
  name="$1"
  machine_octet="$2"
  storage_octet="$3"
  ipa dnsrecord-add ocp.workshop.lan "${name}" \
    --a-rec="172.16.10.${machine_octet}" || true
  ipa dnsrecord-add 10.16.172.in-addr.arpa "${machine_octet}" \
    --ptr-rec="${name}.ocp.workshop.lan." || true
  ipa dnsrecord-add ocp.workshop.lan "${name}-storage" \
    --a-rec="172.16.11.${storage_octet}" || true
  ipa dnsrecord-add 11.16.172.in-addr.arpa "${storage_octet}" \
    --ptr-rec="${name}-storage.ocp.workshop.lan." || true
done
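
Because every dnsrecord-add above ends in || true, a failed or conflicting record does not stop the loop. One way to catch that is to compare the live answers against the expected table. The sketch below is a minimal version of that check. The function names expected_node_records and check_node_records are hypothetical, and the check assumes dig is installed and that 172.16.0.10 is the IdM DNS server, as in the earlier validation commands.

```shell
# Hypothetical helper: print the expected "FQDN IP" pairs for all nine nodes,
# using the same node names and octets as the record-creation loop.
expected_node_records() {
  for entry in \
    "ocp-master-01 11" "ocp-master-02 12" "ocp-master-03 13" \
    "ocp-infra-01 21" "ocp-infra-02 22" "ocp-infra-03 23" \
    "ocp-worker-01 31" "ocp-worker-02 32" "ocp-worker-03 33"; do
    set -- $entry
    printf '%s.ocp.workshop.lan 172.16.10.%s\n' "$1" "$2"
    printf '%s-storage.ocp.workshop.lan 172.16.11.%s\n' "$1" "$2"
  done
}

# Hypothetical helper: query IdM for each expected A record and report mismatches.
# Defined here but not invoked, because it needs the lab DNS server.
check_node_records() {
  expected_node_records | while read -r fqdn ip; do
    got=$(dig +short @172.16.0.10 "$fqdn" A)
    [ "$got" = "$ip" ] || echo "MISMATCH: $fqdn expected $ip got ${got:-nothing}"
  done
}

expected_node_records | head -n 2
```

Running check_node_records from a host that can reach IdM prints nothing when every forward record matches.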

17. Download Installer Binaries

Download the exact matching OpenShift installer and client binaries onto the bastion.

Note

Automation reference: playbooks/cluster/openshift-installer-binaries.yml, role openshift_installer_binaries.

Create the versioned download and bin directories, install nmstate, then fetch and unpack the installer and client archives for the same release.

Shell
# Download the exact matching OpenShift installer and client tools onto the bastion.
mkdir -p /opt/openshift/generated/tools/4.20.15/downloads
mkdir -p /opt/openshift/generated/tools/4.20.15/bin
dnf -y install nmstate
curl -L -o /opt/openshift/generated/tools/4.20.15/downloads/openshift-install-linux.tar.gz \
  https://mirror.openshift.com/pub/openshift-v4/x86_64/clients/ocp/4.20.15/openshift-install-linux.tar.gz
curl -L -o /opt/openshift/generated/tools/4.20.15/downloads/openshift-client-linux.tar.gz \
  https://mirror.openshift.com/pub/openshift-v4/x86_64/clients/ocp/4.20.15/openshift-client-linux.tar.gz
tar -C /opt/openshift/generated/tools/4.20.15/bin -xzf \
  /opt/openshift/generated/tools/4.20.15/downloads/openshift-install-linux.tar.gz
tar -C /opt/openshift/generated/tools/4.20.15/bin -xzf \
  /opt/openshift/generated/tools/4.20.15/downloads/openshift-client-linux.tar.gz
chmod 0755 /opt/openshift/generated/tools/4.20.15/bin/openshift-install
chmod 0755 /opt/openshift/generated/tools/4.20.15/bin/oc
chmod 0755 /opt/openshift/generated/tools/4.20.15/bin/kubectl

18. Render Install Artifacts

Render the OpenShift install-config, manifests, and cluster artifacts on the bastion.

Note

Automation reference: playbooks/cluster/openshift-install-artifacts.yml, role openshift_install_artifacts.

Write the install-config.yaml, agent-config.yaml, and the IdM CA file that are used by the agent installer.

Shell
# Write the OpenShift install config, agent config, and IdM CA bundle.
mkdir -p /opt/openshift/generated/ocp
curl -fsSL http://172.16.0.10/ipa/config/ca.crt >/opt/openshift/generated/ocp/idm-ca.crt
PULL_SECRET_JSON="$(jq -c . /opt/openshift/secrets/pull-secret.txt)"
SSH_PUBKEY="$(cat /opt/openshift/secrets/hypervisor-admin.pub)"
cat <<EOF >/opt/openshift/generated/ocp/install-config.yaml
apiVersion: v1
baseDomain: workshop.lan
metadata:
  name: ocp
platform:
  none: {}
controlPlane:
  name: master
  replicas: 3
  architecture: amd64
compute:
  - name: worker
    replicas: 6
    architecture: amd64
networking:
  networkType: OVNKubernetes
  machineNetwork:
    - cidr: 172.16.10.0/24
  clusterNetwork:
    - cidr: 10.128.0.0/14
      hostPrefix: 23
  serviceNetwork:
    - 172.30.0.0/16
pullSecret: 'REPLACE_FROM_PULL_SECRET_FILE'
sshKey: '${SSH_PUBKEY}'
additionalTrustBundle: |
EOF
# Indent the PEM lines so they are valid content for the YAML block scalar.
sed 's/^/  /' /opt/openshift/generated/ocp/idm-ca.crt >>/opt/openshift/generated/ocp/install-config.yaml
# Substitute the real pull secret, doubling single quotes for the YAML single-quoted string.
python3 - <<'PY'
from pathlib import Path
path = Path("/opt/openshift/generated/ocp/install-config.yaml")
text = path.read_text()
pull = Path("/opt/openshift/secrets/pull-secret.txt").read_text().strip().replace("'", "''")
path.write_text(text.replace("REPLACE_FROM_PULL_SECRET_FILE", pull))
PY

Write the agent config with MAC-based NIC identification and explicit root disk selection. The current automation uses the libvirt root-disk serial for each node and renders that into rootDeviceHints.serialNumber.

Shell
cat <<'EOF' >/opt/openshift/generated/ocp/agent-config.yaml
apiVersion: v1alpha1
kind: AgentConfig
rendezvousIP: 172.16.10.11
hosts:
  - hostname: ocp-master-01.ocp.workshop.lan
    role: master
    interfaces:
      - name: nic0
        macAddress: "52:54:00:20:00:10"
    rootDeviceHints:
      serialNumber: "ocpmaster01root"
    networkConfig:
      interfaces:
        - name: nic0
          type: ethernet
          state: up
          identifier: mac-address
          mac-address: "52:54:00:20:00:10"
        - name: nic0.200
          type: vlan
          state: up
          vlan:
            base-iface: nic0
            id: 200
          ipv4:
            enabled: true
            address:
              - ip: 172.16.10.11
                prefix-length: 24
        - name: nic0.201
          type: vlan
          state: up
          vlan:
            base-iface: nic0
            id: 201
          ipv4:
            enabled: true
            address:
              - ip: 172.16.11.11
                prefix-length: 24
      dns-resolver:
        config:
          server:
            - 172.16.0.10
      routes:
        config:
          - destination: 0.0.0.0/0
            next-hop-address: 172.16.10.1
            next-hop-interface: nic0.200
  # Repeat the same pattern for the remaining 8 nodes.
EOF

19. Generate The Agent ISO

Generate the agent-based installer ISO used to boot the cluster VMs.

Note

Automation reference: playbooks/cluster/openshift-agent-media.yml, role openshift_agent_media.

Generate the agent media on the bastion, then copy the ISO to virt-01 and verify its checksum before using it.

Shell
# Generate the agent ISO and copy it to virt-01.
/opt/openshift/generated/tools/4.20.15/bin/openshift-install agent create image \
  --dir /opt/openshift/generated/ocp
sha256sum /opt/openshift/generated/ocp/agent.x86_64.iso
ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 \
  "mkdir -p /var/lib/libvirt/images"
scp -i /opt/openshift/secrets/hypervisor-admin.key \
  /opt/openshift/generated/ocp/agent.x86_64.iso \
  root@172.16.0.1:/var/lib/libvirt/images/agent.x86_64.iso.tmp
ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 \
  "install -m 0644 /var/lib/libvirt/images/agent.x86_64.iso.tmp \
   /var/lib/libvirt/images/agent.x86_64.iso && \
   sha256sum /var/lib/libvirt/images/agent.x86_64.iso"

Create the generated attachment plan that says every node should boot from the agent ISO.

Shell
# Render the generated ISO attachment plan.
cat <<'EOF' >/opt/openshift/generated/ocp/openshift_cluster_attachment_plan.yml
openshift_cluster_node_attachment_plan:
  ocp-master-01:
    access:
      attach_agent_boot_media: true
      agent_boot_media_path: /var/lib/libvirt/images/agent.x86_64.iso
  ocp-master-02:
    access:
      attach_agent_boot_media: true
      agent_boot_media_path: /var/lib/libvirt/images/agent.x86_64.iso
  ocp-master-03:
    access:
      attach_agent_boot_media: true
      agent_boot_media_path: /var/lib/libvirt/images/agent.x86_64.iso
  ocp-infra-01:
    access:
      attach_agent_boot_media: true
      agent_boot_media_path: /var/lib/libvirt/images/agent.x86_64.iso
  ocp-infra-02:
    access:
      attach_agent_boot_media: true
      agent_boot_media_path: /var/lib/libvirt/images/agent.x86_64.iso
  ocp-infra-03:
    access:
      attach_agent_boot_media: true
      agent_boot_media_path: /var/lib/libvirt/images/agent.x86_64.iso
  ocp-worker-01:
    access:
      attach_agent_boot_media: true
      agent_boot_media_path: /var/lib/libvirt/images/agent.x86_64.iso
  ocp-worker-02:
    access:
      attach_agent_boot_media: true
      agent_boot_media_path: /var/lib/libvirt/images/agent.x86_64.iso
  ocp-worker-03:
    access:
      attach_agent_boot_media: true
      agent_boot_media_path: /var/lib/libvirt/images/agent.x86_64.iso
EOF
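
The attachment plan repeats one identical stanza for each of the nine nodes, so it can also be rendered with a loop. The sketch below is one way to do that by hand. The function name render_attachment_plan is hypothetical and this is not how the automation's role generates the file.

```shell
# Hypothetical helper: print the attachment plan YAML for all nine nodes.
render_attachment_plan() {
  iso=/var/lib/libvirt/images/agent.x86_64.iso
  echo "openshift_cluster_node_attachment_plan:"
  for role in master infra worker; do
    for n in 01 02 03; do
      printf '  ocp-%s-%s:\n' "$role" "$n"
      printf '    access:\n'
      printf '      attach_agent_boot_media: true\n'
      printf '      agent_boot_media_path: %s\n' "$iso"
    done
  done
}

render_attachment_plan | head -n 5
```

Redirecting the full output to /opt/openshift/generated/ocp/openshift_cluster_attachment_plan.yml should produce the same content as the heredoc version.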

20. Create The OpenShift VM Shells

Create the nine OpenShift VM shells, attach the agent ISO, and boot them into the agent installer.

Note

Automation reference: playbooks/cluster/openshift-cluster.yml, role openshift_cluster.

Create the 9 OpenShift VM shells on virt-01, attach the ISO, and set them to boot CD-ROM first.

Current tier intent:

  • Gold:
    • masters
  • Silver:
    • infra
  • Bronze:
    • workers

Current sizing:

  • masters: 3 x 8 vCPU
  • infra: 3 x 16 vCPU
  • workers: 3 x 8 vCPU

Current CPU pools:

  • guest_domain: 6-23,30-47,54-71,78-95
  • host_emulator: 2-5,26-29,50-53,74-77

Shell
ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1
virt-install \
  --name ocp-master-01.ocp.workshop.lan \
  --osinfo rhel9.4 \
  --boot uefi \
  --machine q35 \
  --memory 24576 \
  --vcpus 8 \
  --cpu host-passthrough \
  --controller type=scsi,model=virtio-scsi \
  --disk path=/dev/ebs/ocp-master-01,format=raw,bus=scsi,cache=none,io=native,discard=unmap,rotation_rate=1 \
  --disk path=/var/lib/libvirt/images/agent.x86_64.iso,device=cdrom,bus=scsi \
  --network network=lab-switch,portgroup=ocp-trunk,model=virtio,mac=52:54:00:20:00:10 \
  --graphics vnc,listen=127.0.0.1 \
  --import \
  --resource partition=/machine/gold \
  --cputune shares=512,emulatorpin.cpuset=2-5,26-29,50-53,74-77,\
vcpupin0.vcpu=0,vcpupin0.cpuset=6-23,30-47,54-71,78-95,\
vcpupin1.vcpu=1,vcpupin1.cpuset=6-23,30-47,54-71,78-95,\
vcpupin2.vcpu=2,vcpupin2.cpuset=6-23,30-47,54-71,78-95,\
vcpupin3.vcpu=3,vcpupin3.cpuset=6-23,30-47,54-71,78-95,\
vcpupin4.vcpu=4,vcpupin4.cpuset=6-23,30-47,54-71,78-95,\
vcpupin5.vcpu=5,vcpupin5.cpuset=6-23,30-47,54-71,78-95,\
vcpupin6.vcpu=6,vcpupin6.cpuset=6-23,30-47,54-71,78-95,\
vcpupin7.vcpu=7,vcpupin7.cpuset=6-23,30-47,54-71,78-95 \
  --autostart \
  --noautoconsole
virt-xml ocp-master-01.ocp.workshop.lan --edit --boot cdrom,hd
# Repeat the same pattern for:
# - ocp-master-02, ocp-master-03
#   - partition=/machine/gold
#   - shares=512
# - ocp-infra-01..03
#   - partition=/machine/silver
#   - shares=333
#   - attach /dev/ebs/ocp-infra-0X-data as a second disk
#   - add --iothreads iothreads=1 and iothreadpin0.cpuset=2-5,26-29,50-53,74-77
# - ocp-worker-01..03
#   - partition=/machine/bronze
#   - shares=167
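
The --cputune value grows by one vcpupin pair per vCPU, which makes it tedious to type for the 16-vCPU infra nodes. The sketch below builds the string from a share count and a vCPU count, using the guest_domain and host_emulator pools listed above. The function name cputune_arg is hypothetical, and it deliberately leaves out the iothreadpin0 entry that the infra nodes also need.

```shell
# Hypothetical helper: build the --cputune value.
# $1 = CPU shares, $2 = number of vCPUs to pin.
cputune_arg() {
  out="shares=$1,emulatorpin.cpuset=2-5,26-29,50-53,74-77"
  i=0
  while [ "$i" -lt "$2" ]; do
    # Every vCPU floats across the whole guest_domain pool.
    out="${out},vcpupin${i}.vcpu=${i},vcpupin${i}.cpuset=6-23,30-47,54-71,78-95"
    i=$((i + 1))
  done
  printf '%s\n' "$out"
}

cputune_arg 512 2
```

For a master the equivalent virt-install argument would be --cputune "$(cputune_arg 512 8)". For an infra node, append the iothreadpin0 entry to the result by hand.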

21. Wait For Installer Convergence

Wait for bootstrap and install completion from the bastion.

After the VM shells are created and booted from agent.x86_64.iso, run the installer wait phase from the bastion. This is the step that turns “VMs are running” into “the cluster finished bootstrap and install.”

Shell
# Wait for the OpenShift installer to converge.
/opt/openshift/generated/tools/4.20.15/bin/openshift-install \
  --dir /opt/openshift/generated/ocp \
  wait-for bootstrap-complete --log-level=debug
/opt/openshift/generated/tools/4.20.15/bin/openshift-install \
  --dir /opt/openshift/generated/ocp \
  wait-for install-complete --log-level=debug

22. Validate Post-install State

Validate that the newly installed cluster is healthy enough for day-2 work.

Note

Automation reference: playbooks/day2/openshift-post-install-validate.yml, role openshift_post_install_validate.

Once installer convergence is complete and auth/kubeconfig exists, use the generated kubeconfig from inside the lab and validate the cluster from virt-01.

Shell
# Validate the installed cluster from virt-01.
scp -i /opt/openshift/secrets/hypervisor-admin.key \
  /opt/openshift/generated/ocp/auth/kubeconfig \
  root@172.16.0.1:/var/tmp/ocp-kubeconfig
scp -i /opt/openshift/secrets/hypervisor-admin.key \
  /opt/openshift/generated/tools/4.20.15/bin/oc \
  root@172.16.0.1:/var/tmp/oc
ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 <<'EOF'
chmod 0755 /var/tmp/oc
export KUBECONFIG=/var/tmp/ocp-kubeconfig
/var/tmp/oc get clusterversion
/var/tmp/oc get co
/var/tmp/oc get nodes
/var/tmp/oc get csr
EOF

After those checks pass, refresh the bastion helper kubeconfigs from the current cluster state and import the live cluster CA bundle into bastion system trust so normal oc login works without --insecure-skip-tls-verify.

Shell
# Refresh the bastion helper kubeconfigs and trust the cluster CA.
ssh cloud-user@172.16.0.30 <<'EOF'
set -euo pipefail
cp /opt/openshift/aws-metal-openshift-demo/generated/ocp/auth/kubeconfig "$HOME/etc/kubeconfig"
cp /opt/openshift/aws-metal-openshift-demo/generated/ocp/auth/kubeconfig "$HOME/etc/kubeconfig.local"
chmod 0600 "$HOME/etc/kubeconfig" "$HOME/etc/kubeconfig.local"
oc --kubeconfig "$HOME/etc/kubeconfig.local" get configmap/kube-root-ca.crt \
  -o jsonpath='{.data.ca\.crt}' >/tmp/kube-root-ca.crt
sudo cp /tmp/kube-root-ca.crt /etc/pki/ca-trust/source/anchors/ocp-cluster-ca-bundle.pem
sudo update-ca-trust extract
EOF

23. Detach Install Media And Normalize Boot

Detach the install media and restore disk-first boot intent before any normal cluster reboots occur.

Caution

Do not skip this step. If the agent ISO is still attached when a cluster node reboots (vCPU resize, operator-triggered restart, or accidental power cycle), the node will re-enter the day-1 agent installer instead of booting from disk. This happened in production — see issue 007c920 in the issues ledger.

Once guests have completed day-1 provisioning, eject the attached installation media and restore disk-only boot intent. This prevents support guests from retaining sensitive cloud-init data and prevents OpenShift guests from booting back into the agent installer after a restart.

For support guests, the preferred timing is earlier than the end of the build: after the initial package update is staged but before the reboot required by that update. That reboot clears the live empty CD-ROM shell that libvirt may leave behind even after the media is ejected and the persistent XML is cleaned up.

For OpenShift cluster guests, the important success condition is different: eject agent.x86_64.iso and restore disk-first boot. The live or persistent empty CD-ROM shell does not need to be removed immediately, and trying to do so on a running node is not a reliable success criterion.

Shell
# Eject the install media and restore disk-first boot on support and OpenShift guests.
ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 <<'EOF'
# Support guests: eject the media, remove the CD-ROM device, and boot from disk.
for domain in \
  idm-01.workshop.lan \
  bastion-01.workshop.lan \
  mirror-registry.workshop.lan
do
  target=$(virsh domblklist "$domain" --details | awk '$2 == "cdrom" {print $3}')
  if [ -n "$target" ]; then
    virsh change-media "$domain" "$target" --eject --config --live || true
    virt-xml "$domain" --remove-device --disk "target=$target"
    virt-xml "$domain" --edit --boot hd
  fi
done
# OpenShift guests: eject only the agent ISO and restore disk-first boot.
for domain in \
  ocp-master-01.ocp.workshop.lan \
  ocp-master-02.ocp.workshop.lan \
  ocp-master-03.ocp.workshop.lan \
  ocp-infra-01.ocp.workshop.lan \
  ocp-infra-02.ocp.workshop.lan \
  ocp-infra-03.ocp.workshop.lan \
  ocp-worker-01.ocp.workshop.lan \
  ocp-worker-02.ocp.workshop.lan \
  ocp-worker-03.ocp.workshop.lan
do
  target=$(virsh domblklist "$domain" --details \
    | awk '$2 == "cdrom" && $4 == "/var/lib/libvirt/images/agent.x86_64.iso" {print $3}')
  if [ -n "$target" ]; then
    virsh change-media "$domain" "$target" --eject --config --live || true
    virt-xml "$domain" --edit --boot hd
  fi
done
EOF
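
The OpenShift loop above only acts when the awk filter finds a CD-ROM whose source is the agent ISO. The sketch below runs that same filter against canned input to show what it selects. The function names and the sample tables are illustrative assumptions: the target names sda and sdb are made up, and real input comes from virsh domblklist DOMAIN --details.

```shell
# Same awk filter as the OpenShift loop: print the target of any CD-ROM
# whose source is the agent ISO.
agent_iso_targets() {
  awk '$2 == "cdrom" && $4 == "/var/lib/libvirt/images/agent.x86_64.iso" {print $3}'
}

# Illustrative sample: agent ISO still attached.
sample_attached() {
  cat <<'SAMPLE'
 Type    Device   Target   Source
--------------------------------------------------------------
 block   disk     sda      /dev/ebs/ocp-master-01
 file    cdrom    sdb      /var/lib/libvirt/images/agent.x86_64.iso
SAMPLE
}

# Illustrative sample: media ejected, empty CD-ROM shell left behind.
sample_ejected() {
  cat <<'SAMPLE'
 Type    Device   Target   Source
--------------------------------------------------------------
 block   disk     sda      /dev/ebs/ocp-master-01
 file    cdrom    sdb      -
SAMPLE
}

sample_attached | agent_iso_targets   # prints: sdb
sample_ejected | agent_iso_targets    # prints nothing
```

An empty result is the success condition for cluster guests: the empty CD-ROM shell can remain as long as the agent ISO is no longer its source.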

Verify support guests no longer carry persistent CD-ROM devices:

Shell
# Verify support guests no longer carry persistent CD-ROM devices.
ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 <<'EOF'
for domain in \
  idm-01.workshop.lan \
  bastion-01.workshop.lan \
  mirror-registry.workshop.lan
do
  echo "=== $domain ==="
  virsh dumpxml --inactive "$domain" | grep "device='cdrom'" || echo "no persistent cdrom"
done
EOF

Verify OpenShift guests have no attached agent ISO media and boot from disk:

Shell
# Verify that the OpenShift guests boot from disk with no agent ISO attached.
ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 <<'EOF'
for domain in \
  ocp-master-01.ocp.workshop.lan \
  ocp-master-02.ocp.workshop.lan \
  ocp-master-03.ocp.workshop.lan \
  ocp-infra-01.ocp.workshop.lan \
  ocp-infra-02.ocp.workshop.lan \
  ocp-infra-03.ocp.workshop.lan \
  ocp-worker-01.ocp.workshop.lan \
  ocp-worker-02.ocp.workshop.lan \
  ocp-worker-03.ocp.workshop.lan
do
  echo "=== $domain ==="
  virsh domblklist "$domain" --details | awk '$2 == "cdrom" {print}'
  virsh dumpxml --inactive "$domain" | grep "<boot dev='hd'/>" || echo "boot order needs review"
done
EOF
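
The two success conditions differ only in what counts as a leftover: any CD-ROM device for support guests, but only attached media for cluster guests. The sketch below illustrates that distinction. It assumes the usual four-column `virsh domblklist --details` layout (Type, Device, Target, Source) and that an empty drive reports `-` as its source. The function names are hypothetical and are not part of the automation.

```shell
# Hypothetical helpers; both read `virsh domblklist --details` output on stdin.

# Print the target of every CD-ROM device, with or without media.
cdrom_targets() {
  awk '$2 == "cdrom" {print $3}'
}

# Print only the CD-ROM targets that still have media attached.
# Assumption: an empty drive shows "-" in the Source column.
cdrom_targets_with_media() {
  awk '$2 == "cdrom" && $4 != "-" && $4 != "" {print $3}'
}
```

On the hypervisor this could be used as `virsh domblklist "$domain" --inactive --details | cdrom_targets` for a support guest (expect no output) and `virsh domblklist "$domain" --details | cdrom_targets_with_media` for a cluster guest (expect no output even if an empty drive remains).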

24. Configure Breakglass Auth, Keycloak OIDC, And Infra Roles

This section is the manual runbook for the supported infra and authentication cutover: move platform workloads onto infra nodes, establish a local breakglass login, deploy Keycloak, federate it to IdM, and configure OpenShift to use OIDC.

Note

Automation reference: the identity and infra phases inside playbooks/day2/openshift-post-install.yml, primarily roles openshift_post_install_infra, openshift_post_install_breakglass_auth, openshift_post_install_keycloak, and openshift_post_install_oidc_auth.

Architecture reference: AUTH MODEL for the current supported auth boundary, and AD / IDM POLICY MODEL for the planned future AD-source-of-truth model.

The supported execution order is:

  1. disconnected OperatorHub pivot
  2. infra conversion
  3. IdM ingress certificate rollout
  4. breakglass HTPasswd auth
  5. NMState
  6. ODF
  7. Keycloak
  8. OIDC auth
  9. optional legacy LDAP auth and group sync
  10. OpenShift Virtualization
  11. OpenShift Pipelines
  12. Web Terminal
  13. AAP
  14. Network Observability
  15. validation

The aggregated day-2 play now uses convergence probes for the major phases. On rerun, a healthy phase is skipped unless its matching force_* variable is true. Use enable_* variables to select the desired profile; use force_* variables only for deliberate repair or rebuild work.
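
The interaction between enable_*, force_*, and a convergence probe can be sketched as a small decision function. This is an illustration only: the function name, the output strings, and the rule that a phase that is not enabled is skipped even when forced are assumptions, not the playbook's actual logic.

```shell
# Hypothetical sketch of the enable/force/probe decision for one phase.
# Usage: phase_decision ENABLE FORCE PROBE_COMMAND [ARGS...]
phase_decision() {
  enable="$1"
  force="$2"
  shift 2
  if [ "$enable" != "true" ]; then
    # Assumption: a phase that is not selected is never run.
    echo "skip: phase not selected"
  elif [ "$force" = "true" ]; then
    echo "run: forced"
  elif "$@" >/dev/null 2>&1; then
    # The probe succeeded, so the phase is treated as healthy.
    echo "skip: probe reports healthy"
  else
    echo "run: probe reports not converged"
  fi
}
```

For example, `phase_decision true false true` models a healthy, enabled phase on rerun, and `phase_decision true true true` models a deliberate rebuild of the same phase.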

The supported default auth model is:

  1. create a local HTPasswd breakglass login
  2. remove kubeadmin after the breakglass login is proven
  3. deploy Keycloak after ODF storage is available
  4. federate Keycloak to IdM
  5. configure OpenShift OAuth for OIDC against Keycloak
  6. map the OIDC groups claim into OpenShift groups
  7. bind IdM group access-openshift-admin to OpenShift cluster-admin

Direct OpenShift LDAP auth is no longer the default baseline. Keep it out of the cluster OAuth configuration unless you are deliberately validating that compatibility path.

The same principle now applies to AAP: the supported clean-build path is Keycloak OIDC, not direct AAP LDAP.

Label the infra nodes and move platform workloads onto them early in day-2 so the later auth and storage work settles on the intended node tier.

Note: do not taint infra nodes for general workload placement here. Workloads are steered via nodeSelector / nodePlacement. Taints are applied later only for the ODF storage set (node.ocs.openshift.io/storage).

Shell
# Label the infra nodes and move the core platform workloads onto them.
ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 <<'EOF'
export KUBECONFIG=/var/tmp/ocp-kubeconfig
/var/tmp/oc label node ocp-infra-01 node-role.kubernetes.io/infra='' --overwrite
/var/tmp/oc label node ocp-infra-02 node-role.kubernetes.io/infra='' --overwrite
/var/tmp/oc label node ocp-infra-03 node-role.kubernetes.io/infra='' --overwrite
cat <<'YAML' | /var/tmp/oc apply -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: access-openshift-admin-cluster-admin
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: Group
    apiGroup: rbac.authorization.k8s.io
    name: access-openshift-admin
YAML
# --- Move platform workloads to infra nodes ---
/var/tmp/oc patch ingresscontroller/default -n openshift-ingress-operator \
  --type=merge -p \
  '{"spec":{"nodePlacement":{"nodeSelector":{"matchLabels":{"node-role.kubernetes.io/infra":""}},"tolerations":[{"key":"node.ocs.openshift.io/storage","value":"true","effect":"NoSchedule"}]}}}'
cat <<'YAML' | /var/tmp/oc apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusOperator:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      tolerations:
        - key: node.ocs.openshift.io/storage
          value: "true"
          effect: NoSchedule
    prometheusK8s:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      tolerations:
        - key: node.ocs.openshift.io/storage
          value: "true"
          effect: NoSchedule
    alertmanagerMain:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      tolerations:
        - key: node.ocs.openshift.io/storage
          value: "true"
          effect: NoSchedule
    kubeStateMetrics:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      tolerations:
        - key: node.ocs.openshift.io/storage
          value: "true"
          effect: NoSchedule
    openshiftStateMetrics:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      tolerations:
        - key: node.ocs.openshift.io/storage
          value: "true"
          effect: NoSchedule
    thanosQuerier:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      tolerations:
        - key: node.ocs.openshift.io/storage
          value: "true"
          effect: NoSchedule
    metricsServer:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      tolerations:
        - key: node.ocs.openshift.io/storage
          value: "true"
          effect: NoSchedule
YAML
/var/tmp/oc patch configs.imageregistry/cluster --type=merge \
  -p '{"spec":{"nodeSelector":{"node-role.kubernetes.io/infra":""},"tolerations":[{"key":"node.ocs.openshift.io/storage","value":"true","effect":"NoSchedule"}]}}'
EOF

Important

Preserve a local recovery path before changing network auth. Create and validate a breakglass HTPasswd user before patching OAuth to use Keycloak. Only after the breakglass login works should you retire kubeadmin.

Start by establishing and validating the breakglass OAuth identity provider.

Shell
# Create the breakglass identity provider, test it, and retire kubeadmin.
ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 <<'EOF'
export KUBECONFIG=/var/tmp/ocp-kubeconfig
htpasswd -BbnC 12 breakglass-admin '<lab-default-password>' >/tmp/htpasswd
/var/tmp/oc create secret generic breakglass-htpasswd \
  -n openshift-config \
  --from-file=htpasswd=/tmp/htpasswd \
  --dry-run=client -o yaml | /var/tmp/oc apply -f -
cat <<'YAML' | /var/tmp/oc apply -f -
apiVersion: config.openshift.io/v1
kind: OAuth
metadata:
  name: cluster
spec:
  identityProviders:
    - name: Breakglass HTPasswd
      mappingMethod: claim
      type: HTPasswd
      htpasswd:
        fileData:
          name: breakglass-htpasswd
YAML
until [ "$(/var/tmp/oc get clusteroperator authentication -o jsonpath='{.status.conditions[?(@.type=="Available")].status},{.status.conditions[?(@.type=="Progressing")].status},{.status.conditions[?(@.type=="Degraded")].status}')" = "True,False,False" ]; do
  sleep 10
done
until curl -skf "https://$(/var/tmp/oc get route oauth-openshift -n openshift-authentication -o jsonpath='{.spec.host}')/healthz" >/dev/null; do
  sleep 10
done
/var/tmp/oc login https://api.ocp.workshop.lan:6443 \
  --username=breakglass-admin \
  --password='<lab-default-password>' \
  --insecure-skip-tls-verify \
  --kubeconfig=/tmp/kubeconfig-breakglass-test
/var/tmp/oc whoami --kubeconfig=/tmp/kubeconfig-breakglass-test
cat <<'YAML' | /var/tmp/oc apply -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: breakglass-admin-cluster-admin
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - apiGroup: rbac.authorization.k8s.io
    kind: User
    name: breakglass-admin
YAML
/var/tmp/oc delete secret kubeadmin -n kube-system --ignore-not-found=true
! /var/tmp/oc get secret kubeadmin -n kube-system
/var/tmp/oc logout --kubeconfig=/tmp/kubeconfig-breakglass-test || true
rm -f /tmp/kubeconfig-breakglass-test /tmp/htpasswd
EOF
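
The `until ... sleep 10` loops above wait indefinitely. When running the steps by hand, a bounded variant can be easier to reason about. The sketch below is one way to do that. Both `wait_until` and `auth_operator_settled` are hypothetical names introduced for illustration; the probe simply repeats the Available/Progressing/Degraded check used above.

```shell
# Hypothetical helper: retry a command until it succeeds or a deadline passes.
# Usage: wait_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
wait_until() {
  timeout="$1"
  interval="$2"
  shift 2
  elapsed=0
  until "$@"; do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "timed out after ${elapsed}s: $*" >&2
      return 1
    fi
    sleep "$interval"
    elapsed=$(( elapsed + interval ))
  done
}

# Hypothetical probe that mirrors the authentication operator check above.
auth_operator_settled() {
  [ "$(/var/tmp/oc get clusteroperator authentication -o jsonpath='{.status.conditions[?(@.type=="Available")].status},{.status.conditions[?(@.type=="Progressing")].status},{.status.conditions[?(@.type=="Degraded")].status}')" = "True,False,False" ]
}
```

A call such as `wait_until 1800 10 auth_operator_settled` would then give up after roughly 30 minutes instead of looping forever. The timeout value is an arbitrary example.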

Deploy Keycloak after ODF so its PostgreSQL PVC can bind to the Ceph RBD storage class. The intended state is:

  • namespace keycloak
  • rhbk-operator
  • PostgreSQL backed by ocs-storagecluster-ceph-rbd
  • Keycloak route sso.apps.ocp.workshop.lan
  • realm openshift
  • client openshift
  • LDAP federation against the IdM compat tree
  • a groups protocol mapper so OpenShift receives group membership claims

Manual equivalent for the Keycloak install itself:

Shell
# Install the Keycloak operator and base deployment.
ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 <<'EOF'
export KUBECONFIG=/var/tmp/ocp-kubeconfig
oc create namespace keycloak || true
cat <<'YAML' | oc apply -f -
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: keycloak
  namespace: keycloak
spec:
  targetNamespaces:
    - keycloak
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: rhbk-operator
  namespace: keycloak
spec:
  channel: stable-v26.2
  installPlanApproval: Manual
  name: rhbk-operator
  source: cs-redhat-operator-index-v4-20
  sourceNamespace: openshift-marketplace
YAML
oc create secret generic workshop-keycloak-bootstrap-admin \
  -n keycloak \
  --from-literal=username=admin \
  --from-literal=password='<lab-default-password>' \
  --dry-run=client -o yaml | oc apply -f -
oc create secret generic workshop-keycloak-db \
  -n keycloak \
  --from-literal=username=keycloak \
  --from-literal=password='<lab-default-password>' \
  --dry-run=client -o yaml | oc apply -f -
oc extract -n openshift-ingress secret/ingress-default-idm-tls --to=/tmp/keycloak-tls --confirm
oc create secret tls workshop-keycloak-tls \
  -n keycloak \
  --cert=/tmp/keycloak-tls/tls.crt \
  --key=/tmp/keycloak-tls/tls.key \
  --dry-run=client -o yaml | oc apply -f -
until [ -n "$(oc get subscription rhbk-operator -n keycloak -o jsonpath='{.status.installplan.name}' 2>/dev/null)" ]; do
  sleep 10
done
INSTALLPLAN="$(oc get subscription rhbk-operator -n keycloak -o jsonpath='{.status.installplan.name}')"
oc patch installplan "$INSTALLPLAN" -n keycloak --type=merge -p '{"spec":{"approved":true}}'
until [ "$(oc get subscription rhbk-operator -n keycloak -o jsonpath='{.status.currentCSV}' 2>/dev/null | xargs -r -I{} oc get csv {} -n keycloak -o jsonpath='{.status.phase}')" = "Succeeded" ]; do
  sleep 10
done
cat <<'YAML' | oc apply -f -
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres-db
  namespace: keycloak
spec:
  serviceName: postgres-db
  selector:
    matchLabels:
      app: postgres-db
  replicas: 1
  template:
    metadata:
      labels:
        app: postgres-db
    spec:
      containers:
        - name: postgres-db
          image: registry.redhat.io/rhel9/postgresql-15:latest
          env:
            - name: POSTGRESQL_USER
              valueFrom:
                secretKeyRef:
                  name: workshop-keycloak-db
                  key: username
            - name: POSTGRESQL_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: workshop-keycloak-db
                  key: password
            - name: POSTGRESQL_DATABASE
              value: keycloak
          ports:
            - containerPort: 5432
              name: postgres
          volumeMounts:
            - mountPath: /var/lib/pgsql/data
              name: pgdata
  volumeClaimTemplates:
    - metadata:
        name: pgdata
      spec:
        storageClassName: ocs-storagecluster-ceph-rbd
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 5Gi
---
apiVersion: v1
kind: Service
metadata:
  name: postgres-db
  namespace: keycloak
spec:
  selector:
    app: postgres-db
  ports:
    - port: 5432
      targetPort: 5432
      name: postgres
---
apiVersion: k8s.keycloak.org/v2alpha1
kind: Keycloak
metadata:
  name: workshop-keycloak
  namespace: keycloak
spec:
  instances: 1
  bootstrapAdmin:
    user:
      secret: workshop-keycloak-bootstrap-admin
  db:
    vendor: postgres
    host: postgres-db
    port: 5432
    database: keycloak
    usernameSecret:
      name: workshop-keycloak-db
      key: username
    passwordSecret:
      name: workshop-keycloak-db
      key: password
  http:
    tlsSecret: workshop-keycloak-tls
  hostname:
    hostname: https://sso.apps.ocp.workshop.lan
  ingress:
    enabled: false
  proxy:
    headers: xforwarded
YAML
until [ "$(oc get keycloak workshop-keycloak -n keycloak -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}' 2>/dev/null)" = "True" ]; do
  sleep 10
done
oc create route reencrypt workshop-keycloak \
  --service=workshop-keycloak-service \
  --cert=/tmp/keycloak-tls/tls.crt \
  --key=/tmp/keycloak-tls/tls.key \
  --dest-ca-cert=/etc/ipa/ca.crt \
  --ca-cert=/etc/ipa/ca.crt \
  --hostname=sso.apps.ocp.workshop.lan \
  -n keycloak \
  --dry-run=client -o yaml | oc apply -f -
until curl -skf https://sso.apps.ocp.workshop.lan/realms/master/.well-known/openid-configuration >/dev/null; do
  sleep 10
done
EOF

OpenShift OAuth is then patched to trust Keycloak OIDC and map the groups claim into OpenShift groups. The resulting effective authorization model is:

  • IdM group membership is the source of truth
  • Keycloak emits groups
  • OpenShift maps claims.groups
  • access-openshift-admin is bound to cluster-admin

Manual equivalent for the OIDC federation and OAuth patch:

Shell
# Configure the Keycloak realm, client, LDAP federation, and OpenShift OAuth integration.
ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 <<'EOF'
export KUBECONFIG=/var/tmp/ocp-kubeconfig
OAUTH_HOST="$(oc get route oauth-openshift -n openshift-authentication -o jsonpath='{.spec.host}')"
# Obtain an admin token from the master realm.
KEYCLOAK_ADMIN_TOKEN="$(curl --cacert /etc/ipa/ca.crt -sS \
  -X POST https://sso.apps.ocp.workshop.lan/realms/master/protocol/openid-connect/token \
  -H 'Content-Type: application/x-www-form-urlencoded' \
  --data-urlencode 'grant_type=password' \
  --data-urlencode 'client_id=admin-cli' \
  --data-urlencode 'username=admin' \
  --data-urlencode 'password=<lab-default-password>' | jq -r .access_token)"
# Create the openshift realm (ignored if it already exists).
curl --cacert /etc/ipa/ca.crt -sS \
  -H "Authorization: Bearer ${KEYCLOAK_ADMIN_TOKEN}" \
  -H 'Content-Type: application/json' \
  -X POST https://sso.apps.ocp.workshop.lan/admin/realms \
  -d '{"realm":"openshift","enabled":true,"displayName":"OpenShift","sslRequired":"external","registrationAllowed":false,"resetPasswordAllowed":false,"rememberMe":false,"loginWithEmailAllowed":false}' || true
# Create the openshift client if it does not exist yet.
CLIENT_ID="$(curl --cacert /etc/ipa/ca.crt -sS \
  -H "Authorization: Bearer ${KEYCLOAK_ADMIN_TOKEN}" \
  "https://sso.apps.ocp.workshop.lan/admin/realms/openshift/clients?clientId=openshift" | jq -r '.[0].id')"
if [ -z "${CLIENT_ID}" ] || [ "${CLIENT_ID}" = "null" ]; then
  cat >/tmp/keycloak-openshift-client.json <<JSON
{
  "clientId": "openshift",
  "name": "OpenShift",
  "enabled": true,
  "protocol": "openid-connect",
  "publicClient": false,
  "secret": "<lab-default-password>",
  "standardFlowEnabled": true,
  "directAccessGrantsEnabled": true,
  "serviceAccountsEnabled": false,
  "frontchannelLogout": true,
  "redirectUris": ["https://${OAUTH_HOST}/oauth2callback/Keycloak"],
  "webOrigins": ["+"],
  "defaultClientScopes": ["profile", "email", "roles", "web-origins"]
}
JSON
  curl --cacert /etc/ipa/ca.crt -sS \
    -H "Authorization: Bearer ${KEYCLOAK_ADMIN_TOKEN}" \
    -H 'Content-Type: application/json' \
    -X POST https://sso.apps.ocp.workshop.lan/admin/realms/openshift/clients \
    --data-binary @/tmp/keycloak-openshift-client.json
  CLIENT_ID="$(curl --cacert /etc/ipa/ca.crt -sS -H "Authorization: Bearer ${KEYCLOAK_ADMIN_TOKEN}" "https://sso.apps.ocp.workshop.lan/admin/realms/openshift/clients?clientId=openshift" | jq -r '.[0].id')"
fi
# Add the groups protocol mapper to the client.
curl --cacert /etc/ipa/ca.crt -sS \
  -H "Authorization: Bearer ${KEYCLOAK_ADMIN_TOKEN}" \
  -H 'Content-Type: application/json' \
  -X POST "https://sso.apps.ocp.workshop.lan/admin/realms/openshift/clients/${CLIENT_ID}/protocol-mappers/models" \
  -d '{"name":"groups","protocol":"openid-connect","protocolMapper":"oidc-group-membership-mapper","config":{"full.path":"false","id.token.claim":"true","access.token.claim":"true","userinfo.token.claim":"true","claim.name":"groups","multivalued":"true"}}' || true
# Federate the realm to the IdM compat tree.
REALM_ID="$(curl --cacert /etc/ipa/ca.crt -sS -H "Authorization: Bearer ${KEYCLOAK_ADMIN_TOKEN}" https://sso.apps.ocp.workshop.lan/admin/realms/openshift | jq -r .id)"
curl --cacert /etc/ipa/ca.crt -sS \
  -H "Authorization: Bearer ${KEYCLOAK_ADMIN_TOKEN}" \
  -H 'Content-Type: application/json' \
  -X POST https://sso.apps.ocp.workshop.lan/admin/realms/openshift/components \
  -d '{"name":"idm-compat","parentId":"'"${REALM_ID}"'","providerId":"ldap","providerType":"org.keycloak.storage.UserStorageProvider","config":{"enabled":["true"],"priority":["0"],"fullSyncPeriod":["-1"],"changedSyncPeriod":["-1"],"cachePolicy":["DEFAULT"],"batchSizeForSync":["1000"],"importEnabled":["true"],"syncRegistrations":["false"],"editMode":["READ_ONLY"],"vendor":["other"],"usernameLDAPAttribute":["uid"],"rdnLDAPAttribute":["uid"],"uuidLDAPAttribute":["uid"],"userObjectClasses":["posixAccount"],"connectionUrl":["ldap://idm-01.workshop.lan"],"usersDn":["cn=users,cn=compat,dc=workshop,dc=lan"],"authType":["simple"],"bindDn":["cn=Directory Manager"],"bindCredential":["<lab-default-password>"],"searchScope":["2"],"validatePasswordPolicy":["false"],"trustEmail":["false"],"connectionPooling":["true"],"pagination":["true"],"startTls":["false"]}}' || true
# Attach the LDAP group mapper to the federation provider.
LDAP_ID="$(curl --cacert /etc/ipa/ca.crt -sS -H "Authorization: Bearer ${KEYCLOAK_ADMIN_TOKEN}" "https://sso.apps.ocp.workshop.lan/admin/realms/openshift/components?type=org.keycloak.storage.UserStorageProvider" | jq -r '.[] | select(.name=="idm-compat") | .id')"
curl --cacert /etc/ipa/ca.crt -sS \
  -H "Authorization: Bearer ${KEYCLOAK_ADMIN_TOKEN}" \
  -H 'Content-Type: application/json' \
  -X POST "https://sso.apps.ocp.workshop.lan/admin/realms/openshift/components" \
  -d '{"name":"idm-compat-group-mapper","parentId":"'"${LDAP_ID}"'","providerId":"group-ldap-mapper","providerType":"org.keycloak.storage.ldap.mappers.LDAPStorageMapper","config":{"enabled":["true"],"priority":["0"],"fullSyncPeriod":["-1"],"changedSyncPeriod":["-1"],"cachePolicy":["DEFAULT"],"batchSizeForSync":["1000"],"mode":["LDAP_ONLY"],"groups.dn":["cn=groups,cn=compat,dc=workshop,dc=lan"],"group.name.ldap.attribute":["cn"],"group.object.classes":["posixGroup"],"preserve.group.inheritance":["false"],"membership.ldap.attribute":["memberUid"],"membership.attribute.type":["UID"],"groups.ldap.filter":[""],"user.roles.retrieve.strategy":["LOAD_GROUPS_BY_MEMBER_ATTRIBUTE"],"drop.non.existing.groups.during.sync":["false"]}}' || true
# Give OpenShift the client secret and the IdM CA, then patch OAuth.
oc create secret generic oidc-client-secret \
  -n openshift-config \
  --from-literal=clientSecret='<lab-default-password>' \
  --dry-run=client -o yaml | oc apply -f -
oc create configmap oidc-ca \
  -n openshift-config \
  --from-file=ca.crt=/etc/ipa/ca.crt \
  --dry-run=client -o yaml | oc apply -f -
cat <<'YAML' | oc apply -f -
apiVersion: config.openshift.io/v1
kind: OAuth
metadata:
  name: cluster
spec:
  identityProviders:
    - name: Breakglass HTPasswd
      mappingMethod: claim
      type: HTPasswd
      htpasswd:
        fileData:
          name: breakglass-htpasswd
    - name: Keycloak
      mappingMethod: claim
      type: OpenID
      openID:
        clientID: openshift
        clientSecret:
          name: oidc-client-secret
        issuer: https://sso.apps.ocp.workshop.lan/realms/openshift
        ca:
          name: oidc-ca
        claims:
          preferredUsername:
            - preferred_username
          name:
            - name
            - preferred_username
          email:
            - email
            - preferred_username
          groups:
            - groups
        extraScopes:
          - email
          - profile
YAML
cat <<'YAML' | oc apply -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: access-openshift-admin-cluster-admin
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - apiGroup: rbac.authorization.k8s.io
    kind: Group
    name: access-openshift-admin
YAML
EOF

That means adding a native IdM user, or a trusted AD user that lands in the same IdM role group, to access-openshift-admin makes that user a cluster admin once they authenticate through Keycloak.

Validate the end state with both a native IdM user and an AD-backed user.

Shell
# Validate the Keycloak OIDC login path.
ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 <<'EOF'
export KUBECONFIG=/var/tmp/ocp-kubeconfig
until [ "$(/var/tmp/oc get co authentication -o jsonpath='{.status.conditions[?(@.type=="Progressing")].status}')" = "False" ]; do
  sleep 10
done
/var/tmp/oc get oauth cluster -o jsonpath='{range .spec.identityProviders[*]}{.name} => groups={.openID.claims.groups}{"\n"}{end}'
/var/tmp/oc get groups
/var/tmp/oc get clusterrolebinding access-openshift-admin-cluster-admin
EOF
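
The jsonpath query above prints one `NAME => groups=VALUE` line per identity provider. The sketch below turns that output into a pass/fail check. It assumes that a provider without a groups claim mapping renders an empty value after `groups=`; the exact rendering of a populated list may vary, so the check only tests for a non-empty value. The function name is hypothetical.

```shell
# Hypothetical check; reads "NAME => groups=VALUE" lines on stdin and
# succeeds only if the named provider has a non-empty groups mapping.
idp_maps_groups() {
  awk -v idp="$1" -F' => groups=' \
    '$1 == idp && $2 != "" {found = 1} END {exit found ? 0 : 1}'
}
```

Piping the `oc get oauth cluster -o jsonpath=...` output into `idp_maps_groups Keycloak` should succeed once the OAuth patch has been applied.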

If you deliberately want to validate the old direct-LDAP path, treat it as an optional side test after OIDC is working. Do not treat it as the default cluster auth model, and do not replace the breakglass plus OIDC baseline with it.

25. Install Kubernetes NMState

Install Kubernetes NMState and create the VLAN policies needed by later VM and live-migration networking.

Note

Automation reference: playbooks/day2/openshift-post-install-nmstate.yml, role openshift_post_install_nmstate.

Install the NMState operator and create the singleton NMState instance.

Shell
# Install the Kubernetes NMState operator.
ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 <<'EOF'
export KUBECONFIG=/var/tmp/ocp-kubeconfig
oc create namespace openshift-nmstate || true
cat <<'YAML' | oc apply -f -
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: openshift-nmstate
  namespace: openshift-nmstate
spec:
  targetNamespaces:
    - openshift-nmstate
YAML
cat <<'YAML' | oc apply -f -
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: kubernetes-nmstate-operator
  namespace: openshift-nmstate
spec:
  channel: stable
  installPlanApproval: Automatic
  name: kubernetes-nmstate-operator
  source: cs-redhat-operator-index-v4-20
  sourceNamespace: openshift-marketplace
YAML
oc wait --for=condition=Established crd/nmstates.nmstate.io --timeout=20m
cat <<'YAML' | oc apply -f -
apiVersion: nmstate.io/v1
kind: NMState
metadata:
  name: nmstate
YAML
oc -n openshift-nmstate wait --for=condition=Available deployment/nmstate-operator --timeout=20m
EOF

Create the VLAN policies used later by OpenShift Virtualization and VM workloads.

Shell
# Apply the nmstate desired state policy.
ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 <<'EOF'
export KUBECONFIG=/var/tmp/ocp-kubeconfig
cat <<'YAML' | oc apply -f -
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: kubevirt-live-migration-vlan
spec:
  nodeSelector:
    node-role.kubernetes.io/worker: ""
  desiredState:
    interfaces:
      - name: vlan202
        description: OpenShift Virtualization live migration VLAN
        type: vlan
        state: up
        vlan:
          base-iface: enp1s0
          id: 202
        ipv4:
          enabled: false
        ipv6:
          enabled: false
YAML
cat <<'YAML' | oc apply -f -
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: vm-data-vlan-300
spec:
  nodeSelector:
    node-role.kubernetes.io/worker: ""
  desiredState:
    interfaces:
      - name: vlan300
        description: Routed VM data network A
        type: vlan
        state: up
        vlan:
          base-iface: enp1s0
          id: 300
        ipv4:
          enabled: false
        ipv6:
          enabled: false
YAML
cat <<'YAML' | oc apply -f -
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: vm-data-vlan-301
spec:
  nodeSelector:
    node-role.kubernetes.io/worker: ""
  desiredState:
    interfaces:
      - name: vlan301
        description: Routed VM data network B
        type: vlan
        state: up
        vlan:
          base-iface: enp1s0
          id: 301
        ipv4:
          enabled: false
        ipv6:
          enabled: false
YAML
cat <<'YAML' | oc apply -f -
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: vm-data-vlan-302
spec:
  nodeSelector:
    node-role.kubernetes.io/worker: ""
  desiredState:
    interfaces:
      - name: vlan302
        description: Isolated VM data network
        type: vlan
        state: up
        vlan:
          base-iface: enp1s0
          id: 302
        ipv4:
          enabled: false
        ipv6:
          enabled: false
YAML
oc wait nncp/kubevirt-live-migration-vlan --for=jsonpath='{.status.conditions[?(@.type=="Available")].status}'=True --timeout=20m
oc wait nncp/vm-data-vlan-300 --for=jsonpath='{.status.conditions[?(@.type=="Available")].status}'=True --timeout=20m
oc wait nncp/vm-data-vlan-301 --for=jsonpath='{.status.conditions[?(@.type=="Available")].status}'=True --timeout=20m
oc wait nncp/vm-data-vlan-302 --for=jsonpath='{.status.conditions[?(@.type=="Available")].status}'=True --timeout=20m
EOF

Design note:

  • this lab currently uses interface-name matching with enp1s0 because it is easy to read and explain
  • nmstate also supports matching the parent uplink by MAC address
  • a MAC-matched model is more robust across different hardware and interface naming schemes, but it requires generating a separate policy per node

26. Deploy ODF Declaratively

Deploy ODF declaratively, including the host-side cleanup needed to avoid stale Ceph and OLM state.

Note

Automation reference: the ODF phase inside playbooks/day2/openshift-post-install.yml, primarily role openshift_post_install_odf for internal mode and openshift_post_install_odf_external for external mode.

Warning

ODF must run before Virtualization (27) and NetObserv (29). CNV expects ocs-storagecluster-ceph-rbd to be available when it sets the default virt storage class. NetObserv needs NooBaa S3 for Loki. Running them out of order causes silent failures that are hard to diagnose.

Wipe stale Ceph bluestore labels from OSD backing devices, clean up any duplicate OperatorGroups in the Local Storage namespace, label and taint the infra nodes for storage, configure Local Storage discovery, create the LocalVolumeSet, and apply the StorageCluster.

That is the internal ODF path. External mode is selected with openshift_post_install_odf_mode: external and deliberately skips Local Storage Operator resources, local OSD disks, storage-node labels, and storage taints. It imports external Ceph cluster details from either openshift_post_install_odf_external_cluster_details_file or openshift_post_install_odf_external_cluster_details_b64, applies the external StorageCluster, and keeps the same ordering slot so Keycloak, AAP, NetObserv, and any later storage consumers still wait for ODF first.
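
The two input variables describe the same payload delivered two ways: as a file path or as an inline base64 string. A minimal sketch of resolving them follows. The function name is hypothetical, and the precedence shown (file first, then base64) is an assumption; the role may resolve them differently.

```shell
# Hypothetical sketch: print the external Ceph cluster details from either a
# file path or a base64 payload.
# Assumption: the file wins when both are provided.
# Usage: resolve_external_details DETAILS_FILE DETAILS_B64
resolve_external_details() {
  details_file="$1"
  details_b64="$2"
  if [ -n "$details_file" ]; then
    cat "$details_file"
  elif [ -n "$details_b64" ]; then
    printf '%s' "$details_b64" | base64 -d
  else
    echo "no external cluster details provided" >&2
    return 1
  fi
}
```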

External mode can also converge OVN host routing before the external StorageCluster is created:

YAML
openshift_post_install_odf_external_gateway_routing_via_host: true
openshift_post_install_odf_external_gateway_ip_forwarding: Global

This is required when ODF pods need to reach off-cluster Ceph endpoints over a node-reachable routed storage network.

Caution

OSD device preparation on reused EBS volumes. A conventional small head/tail wipe is not sufficient for reused ODF disks. The current recovery path wipes the first 2 GiB, fixed BlueStore label positions at 0, 1, 10, 100, and 1000 GiB, and the device tail. It also purges /var/lib/rook/* and /var/lib/ceph/* on the infra nodes before reinstall. Destructive recovery is not part of a normal rerun. It must be explicitly forced.

The manual equivalent on the hypervisor:

Shell
# Wipe the ODF data disks on the infra nodes.
ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 <<'EOF'
# oc debug below needs the cluster kubeconfig, as in the other blocks.
export KUBECONFIG=/var/tmp/ocp-kubeconfig
for dev in /dev/ebs/ocp-infra-01-data /dev/ebs/ocp-infra-02-data /dev/ebs/ocp-infra-03-data; do
  size_mb=$(( $(blockdev --getsize64 "$dev") / 1024 / 1024 ))
  blkdiscard "$dev" || true
  wipefs --all --force "$dev" || true
  # Zero the first 2 GiB.
  dd if=/dev/zero of="$dev" bs=4M count=512 oflag=direct conv=fsync,notrunc status=none
  # Zero 64 MiB at each fixed BlueStore label position that fits on the device.
  for offset_mb in 0 1024 10240 102400 1024000; do
    if [ "$offset_mb" -lt "$size_mb" ]; then
      dd if=/dev/zero of="$dev" bs=1M seek=$offset_mb count=64 oflag=direct conv=fsync,notrunc status=none
    fi
  done
  # Zero the last 256 MiB.
  if [ "$size_mb" -gt 256 ]; then
    dd if=/dev/zero of="$dev" bs=1M seek=$(( size_mb - 256 )) count=256 oflag=direct conv=fsync,notrunc status=none
  fi
done
# Purge stale Rook and Ceph state on each infra node.
for node in ocp-infra-01 ocp-infra-02 ocp-infra-03; do
  oc debug "node/${node}" -- chroot /host bash -lc 'rm -rf /var/lib/rook/* /var/lib/ceph/*'
done
EOF
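
Because the wipe is destructive, it can help to print the regions it would touch before running it. The dry-run sketch below reuses the offsets and lengths from the block above and performs no writes. The function name and output format are hypothetical.

```shell
# Hypothetical dry-run planner: print "KIND OFFSET_MIB LENGTH_MIB" for each
# region the wipe above would zero on a device of SIZE_MIB. No I/O happens.
plan_wipe_regions() {
  size_mb="$1"
  # First 2 GiB (bs=4M count=512).
  echo "head 0 2048"
  # 64 MiB at each fixed BlueStore label position that fits on the device.
  for offset_mb in 0 1024 10240 102400 1024000; do
    if [ "$offset_mb" -lt "$size_mb" ]; then
      echo "label $offset_mb 64"
    fi
  done
  # Last 256 MiB of the device.
  if [ "$size_mb" -gt 256 ]; then
    echo "tail $(( size_mb - 256 )) 256"
  fi
}
```

For a real device the size could be taken from `blockdev --getsize64` divided down to MiB, as in the wipe loop itself.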

Warning

OperatorGroup cleanup. OLM can leave behind auto-generated OperatorGroups when namespaces are recreated. If more than one OperatorGroup exists in openshift-local-storage, OLM refuses to process subscriptions (MultipleOperatorGroupsFound). The automation deletes any stale OperatorGroups before applying the subscription. If you are running this manually, check first.
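
A manual pre-check can be as small as counting the OperatorGroups in the namespace. The sketch below reads one name per line on stdin and fails when more than one is present. The function name is hypothetical.

```shell
# Hypothetical pre-check; reads OperatorGroup names (one per line) on stdin
# and fails when more than one is present.
operatorgroup_precheck() {
  # grep -c prints 0 and exits non-zero on empty input, hence the "|| true".
  count=$(grep -c . || true)
  if [ "$count" -gt 1 ]; then
    echo "found $count OperatorGroups; expect MultipleOperatorGroupsFound" >&2
    return 1
  fi
}
```

One possible invocation is `oc get operatorgroup -n openshift-local-storage -o name | operatorgroup_precheck`.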

Before you apply the LocalVolume and StorageCluster CRs, install the Local Storage Operator and ODF operator so the APIs exist.

Shell
# Install the local storage and ODF operators.
ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 <<'EOF'
export KUBECONFIG=/var/tmp/ocp-kubeconfig
oc create namespace openshift-local-storage || true
oc create namespace openshift-storage || true
for og in $(oc get operatorgroup -n openshift-local-storage -o jsonpath='{.items[*].metadata.name}' 2>/dev/null); do
  oc delete operatorgroup "$og" -n openshift-local-storage --ignore-not-found=true
done
cat <<'YAML' | oc apply -f -
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: openshift-local-storage
  namespace: openshift-local-storage
spec:
  targetNamespaces:
    - openshift-local-storage
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: local-storage-operator
  namespace: openshift-local-storage
spec:
  channel: stable
  installPlanApproval: Automatic
  name: local-storage-operator
  source: cs-redhat-operator-index-v4-20
  sourceNamespace: openshift-marketplace
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: openshift-storage
  namespace: openshift-storage
spec:
  targetNamespaces:
    - openshift-storage
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: odf-operator
  namespace: openshift-storage
spec:
  channel: stable-4.20
  installPlanApproval: Automatic
  name: odf-operator
  source: cs-redhat-operator-index-v4-20
  sourceNamespace: openshift-marketplace
YAML
until [ "$(oc get subscription local-storage-operator -n openshift-local-storage -o jsonpath='{.status.currentCSV}' 2>/dev/null | xargs -r -I{} oc get csv {} -n openshift-local-storage -o jsonpath='{.status.phase}')" = "Succeeded" ]; do
  sleep 10
done
until [ "$(oc get subscription odf-operator -n openshift-storage -o jsonpath='{.status.currentCSV}' 2>/dev/null | xargs -r -I{} oc get csv {} -n openshift-storage -o jsonpath='{.status.phase}')" = "Succeeded" ]; do
  sleep 10
done
EOF

Current default:

  • openshift_post_install_odf_multus_enabled: false

Reason:

  • this project runs ODF in a nested KVM + OVS + libvirt environment
  • ODF public-network Multus/macvlan on VLAN 201 is not a safe default on that hypervisor path
  • the stable default is therefore the pod network unless the hypervisor is intentionally engineered for the extra secondary-MAC/promiscuous-mode requirements

Shell
# Label the infra nodes and deploy the ODF storage resources.
ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 <<'EOF'
export KUBECONFIG=/var/tmp/ocp-kubeconfig
for node in ocp-infra-01 ocp-infra-02 ocp-infra-03; do
  oc label node "${node}" cluster.ocs.openshift.io/openshift-storage='' --overwrite
  oc adm taint node "${node}" node.ocs.openshift.io/storage=true:NoSchedule --overwrite
done
oc create namespace openshift-local-storage || true
oc create namespace openshift-storage || true
cat <<'YAML' | oc apply -f -
apiVersion: local.storage.openshift.io/v1alpha1
kind: LocalVolumeDiscovery
metadata:
  name: auto-discover-devices
  namespace: openshift-local-storage
spec:
  nodeSelector:
    nodeSelectorTerms:
      - matchExpressions:
          - key: node-role.kubernetes.io/infra
            operator: Exists
  tolerations:
    - key: node-role.kubernetes.io/infra
      operator: Exists
      effect: NoSchedule
    - key: node.ocs.openshift.io/storage
      operator: Equal
      value: "true"
      effect: NoSchedule
YAML
cat <<'YAML' | oc apply -f -
apiVersion: local.storage.openshift.io/v1alpha1
kind: LocalVolumeSet
metadata:
  name: ceph-osd
  namespace: openshift-local-storage
spec:
  storageClassName: ceph-osd
  volumeMode: Block
  fsType: ext4
  maxDeviceCount: 1
  deviceInclusionSpec:
    deviceTypes: [disk]
    minSize: 900Gi
    maxSize: 1000Gi
  nodeSelector:
    nodeSelectorTerms:
      - matchExpressions:
          - key: cluster.ocs.openshift.io/openshift-storage
            operator: Exists
  tolerations:
    - key: node-role.kubernetes.io/infra
      operator: Exists
      effect: NoSchedule
    - key: node.ocs.openshift.io/storage
      operator: Equal
      value: "true"
      effect: NoSchedule
YAML
cat <<'YAML' | oc apply -f -
apiVersion: ocs.openshift.io/v1
kind: StorageCluster
metadata:
  name: ocs-storagecluster
  namespace: openshift-storage
spec:
  manageNodes: false
  monDataDirHostPath: /var/lib/rook
  multiCloudGateway:
    reconcileStrategy: manage
  storageDeviceSets:
    - name: ocs-deviceset
      count: 1
      replica: 3
      portable: false
      dataPVCTemplate:
        spec:
          accessModes: [ReadWriteOnce]
          resources:
            requests:
              storage: 980Gi
          storageClassName: ceph-osd
          volumeMode: Block
      placement:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: cluster.ocs.openshift.io/openshift-storage
                    operator: Exists
        tolerations:
          - key: node-role.kubernetes.io/infra
            operator: Exists
            effect: NoSchedule
          - key: node.ocs.openshift.io/storage
            operator: Equal
            value: "true"
            effect: NoSchedule
YAML
EOF

27. Install OpenShift Virtualization

Install OpenShift Virtualization and the workload-availability operators that support it.

Note

Automation reference: playbooks/day2/openshift-post-install-virtualization.yml, role openshift_post_install_virtualization.

Install CNV, set its default storage class, and install the workload availability operators.

Shell
# Install OpenShift Virtualization and the recovery operators.
ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 <<'EOF'
export KUBECONFIG=/var/tmp/ocp-kubeconfig
oc create namespace openshift-cnv || true
cat <<'YAML' | oc apply -f -
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: kubevirt-hyperconverged
  namespace: openshift-cnv
spec:
  targetNamespaces:
    - openshift-cnv
YAML
cat <<'YAML' | oc apply -f -
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: kubevirt-hyperconverged
  namespace: openshift-cnv
spec:
  channel: stable
  installPlanApproval: Automatic
  name: kubevirt-hyperconverged
  source: cs-redhat-operator-index-v4-20
  sourceNamespace: openshift-marketplace
YAML
oc wait --for=condition=Established crd/hyperconvergeds.hco.kubevirt.io --timeout=20m
oc annotate storageclass ocs-storagecluster-ceph-rbd \
  storageclass.kubevirt.io/is-default-virt-class=true --overwrite
cat <<'YAML' | oc apply -f -
apiVersion: hco.kubevirt.io/v1beta1
kind: HyperConverged
metadata:
  name: kubevirt-hyperconverged
  namespace: openshift-cnv
spec:
  vmStateStorageClass: ocs-storagecluster-ceph-rbd
YAML
oc create namespace openshift-workload-availability || true
cat <<'YAML' | oc apply -f -
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: openshift-workload-availability
  namespace: openshift-workload-availability
spec:
  targetNamespaces: []
YAML
cat <<'YAML' | oc apply -f -
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: node-healthcheck-operator
  namespace: openshift-workload-availability
spec:
  channel: stable
  installPlanApproval: Automatic
  name: node-healthcheck-operator
  source: cs-redhat-operator-index-v4-20
  sourceNamespace: openshift-marketplace
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: fence-agents-remediation
  namespace: openshift-workload-availability
spec:
  channel: stable
  installPlanApproval: Automatic
  name: fence-agents-remediation
  source: cs-redhat-operator-index-v4-20
  sourceNamespace: openshift-marketplace
YAML
EOF

28. Install The Web Terminal

Install the Web Terminal operator, build the custom tooling image, and point the devworkspace template at that image.

Note

Automation reference: playbooks/day2/openshift-post-install-web-terminal.yml, role openshift_post_install_web_terminal.

Install the operator, build the custom tooling image in the mirror registry, and patch the Web Terminal tooling template to use it.

Shell
# Install the Web Terminal operator.
ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 <<'EOF'
export KUBECONFIG=/var/tmp/ocp-kubeconfig
cat <<'YAML' | oc apply -f -
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: web-terminal
  namespace: openshift-operators
spec:
  channel: fast
  installPlanApproval: Automatic
  name: web-terminal
  source: cs-redhat-operator-index-v4-20
  sourceNamespace: openshift-marketplace
YAML
EOF

Build and push the tooling image from the mirror registry host.

Shell
# Build and push the web terminal tooling image.
ssh -i /opt/openshift/secrets/hypervisor-admin.key cloud-user@172.16.0.20 <<'EOF'
sudo -i
mkdir -p /var/tmp/web-terminal-tooling
cat <<'CONTAINERFILE' >/var/tmp/web-terminal-tooling/Containerfile
FROM registry.redhat.io/web-terminal/web-terminal-tooling-rhel9:latest
RUN microdnf install -y \
    bind-utils \
    iperf3 \
    iproute \
    iputils \
    jq \
    nmap-ncat \
    openldap-clients \
    procps-ng \
    traceroute && \
    microdnf clean all
CONTAINERFILE
podman build -t mirror-registry.workshop.lan:8443/init/web-terminal-tooling-custom:latest \
  /var/tmp/web-terminal-tooling
podman push mirror-registry.workshop.lan:8443/init/web-terminal-tooling-custom:latest
EOF

Patch the pull secret and the terminal tooling template.

Shell
# Patch the cluster pull secret and the web terminal tooling template.
ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 <<'EOF'
export KUBECONFIG=/var/tmp/ocp-kubeconfig
REGISTRY_AUTH="$(printf '%s' 'init:<lab-default-password>' | base64 -w0)"
oc extract secret/pull-secret -n openshift-config --to=/tmp/pull-secret --confirm
cat /tmp/pull-secret/.dockerconfigjson | jq --arg auth "${REGISTRY_AUTH}" \
  '.auths["mirror-registry.workshop.lan:8443"] = {"auth":$auth,"email":"init@workshop.lan"}' \
  >/tmp/dockerconfigjson
oc set data secret/pull-secret -n openshift-config \
  .dockerconfigjson="$(cat /tmp/dockerconfigjson)"
cat <<'YAML' | oc apply -f -
apiVersion: workspace.devfile.io/v1alpha2
kind: DevWorkspaceTemplate
metadata:
  name: web-terminal-tooling
  namespace: openshift-operators
spec:
  components:
    - name: web-terminal-tooling
      container:
        image: mirror-registry.workshop.lan:8443/init/web-terminal-tooling-custom:latest
YAML
oc -n openshift-terminal delete devworkspace --all || true
EOF
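
The auth value merged into the pull secret above is the base64 encoding of user:password. The following is a minimal sketch of that derivation as a small function. The name registry_auth and the sample credentials are illustrative assumptions, not part of the automation.

```shell
# Hypothetical helper (name is illustrative): build the "auth" value for one
# registry entry in a dockerconfigjson pull secret.
registry_auth() {
  # $1 = registry user, $2 = registry password
  # printf '%s:%s' avoids a trailing newline in the encoded input; tr strips
  # any line wrapping that base64 adds, matching `base64 -w0` on GNU systems.
  printf '%s:%s' "$1" "$2" | base64 | tr -d '\n'
}

# Example with made-up credentials:
#   REGISTRY_AUTH="$(registry_auth init secret)"
```

Decoding the result with base64 -d should return the original user:password pair, which is a quick way to confirm that no stray newline was encoded.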

29. Install Network Observability And Loki

Install Network Observability and Loki, then create the ODF-backed FlowCollector and LokiStack resources.

Note

Automation reference: playbooks/day2/openshift-post-install-netobserv.yml, role openshift_post_install_netobserv.

Install the operators, create an ODF-backed LokiStack, and create a tuned FlowCollector.

Shell
# Install the Network Observability operators.
ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 <<'EOF'
export KUBECONFIG=/var/tmp/ocp-kubeconfig
oc create namespace netobserv || true
oc create namespace openshift-netobserv-operator || true
oc create namespace openshift-operators-redhat || true
cat <<'YAML' | oc apply -f -
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: openshift-netobserv-operator
  namespace: openshift-netobserv-operator
spec:
  targetNamespaces: []
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: openshift-operators-redhat
  namespace: openshift-operators-redhat
spec:
  targetNamespaces: []
YAML
cat <<'YAML' | oc apply -f -
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: netobserv-operator
  namespace: openshift-netobserv-operator
spec:
  channel: stable
  installPlanApproval: Automatic
  name: netobserv-operator
  source: cs-redhat-operator-index-v4-20
  sourceNamespace: openshift-marketplace
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: loki-operator
  namespace: openshift-operators-redhat
spec:
  channel: stable-6.2
  installPlanApproval: Automatic
  name: loki-operator
  source: cs-redhat-operator-index-v4-20
  sourceNamespace: openshift-marketplace
YAML
EOF

Create the ODF-backed object bucket, derive the Loki object storage secret from the generated NooBaa credentials, and then apply LokiStack and FlowCollector.

For external ODF profiles, make sure the NetObserv storage and scheduling values come from the effective ODF variables. External mode does not create internal storage nodes, so inherited internal-ODF node selectors or tolerations should be cleared unless the profile deliberately adds equivalent labels and taints.

Shell
# Create the object bucket, derive the Loki credentials, and deploy Network Observability.
ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 <<'EOF'
export KUBECONFIG=/var/tmp/ocp-kubeconfig
cat <<'YAML' | oc apply -f -
apiVersion: objectbucket.io/v1alpha1
kind: ObjectBucketClaim
metadata:
  name: netobserv-loki
  namespace: netobserv
spec:
  generateBucketName: netobserv-loki-
  storageClassName: openshift-storage.noobaa.io
YAML
until [ "$(oc get obc netobserv-loki -n netobserv -o jsonpath='{.status.phase}' 2>/dev/null)" = "Bound" ]; do
  sleep 5
done
access_key="$(oc get secret netobserv-loki -n netobserv -o jsonpath='{.data.AWS_ACCESS_KEY_ID}' | base64 -d)"
secret_key="$(oc get secret netobserv-loki -n netobserv -o jsonpath='{.data.AWS_SECRET_ACCESS_KEY}' | base64 -d)"
bucket_name="$(oc get configmap netobserv-loki -n netobserv -o jsonpath='{.data.BUCKET_NAME}')"
bucket_host="$(oc get configmap netobserv-loki -n netobserv -o jsonpath='{.data.BUCKET_HOST}')"
bucket_port="$(oc get configmap netobserv-loki -n netobserv -o jsonpath='{.data.BUCKET_PORT}')"
cat <<YAML | oc apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: loki-object-storage
  namespace: netobserv
stringData:
  bucketnames: ${bucket_name}
  endpoint: https://${bucket_host}:${bucket_port}
  access_key_id: ${access_key}
  access_key_secret: ${secret_key}
  region: us-east-1
type: Opaque
YAML
cat <<'YAML' | oc apply -f -
apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: netobserv-loki
  namespace: netobserv
spec:
  size: 1x.extra-small
  storage:
    schemas:
      - effectiveDate: "2024-01-01"
        version: v13
    secret:
      name: loki-object-storage
      type: s3
  storageClassName: ocs-storagecluster-ceph-rbd
  tenants:
    mode: openshift-network
YAML
cat <<'YAML' | oc apply -f -
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  namespace: netobserv
  deploymentModel: Service
  agent:
    type: eBPF
    ebpf:
      sampling: 100
      privileged: true
      features:
        - PacketDrop
        - DNSTracking
        - FlowRTT
        - NetworkEvents
        - PacketTranslation
      excludeInterfaces:
        - lo
  processor:
    consumerReplicas: 1
    subnetLabels:
      openShiftAutoDetect: true
      customLabels:
        - name: EXT:management
          cidrs: [172.16.0.0/24]
        - name: EXT:data300
          cidrs: [172.16.20.0/24]
        - name: EXT:data301
          cidrs: [172.16.21.0/24]
        - name: EXT:isolated302
          cidrs: [172.16.22.0/24]
    metrics:
      disableAlerts: false
  consolePlugin:
    enable: true
  networkPolicy:
    enable: true
  prometheus:
    querier:
      enable: true
  loki:
    enable: true
    mode: LokiStack
    lokiStack:
      name: netobserv-loki
      namespace: netobserv
YAML
EOF

30. Install Ansible Automation Platform

Install Ansible Automation Platform on OpenShift and configure it to authenticate through Keycloak OIDC backed by IdM.

Note

Automation reference: playbooks/day2/openshift-post-install-aap.yml, role openshift_post_install_aap.

Architecture reference: AUTH MODEL.

Install AAP on OpenShift and wire it to the same Keycloak realm already used for the cluster OAuth path.

Use the effective ODF RBD storage class for the AAP database. Internal ODF usually resolves to ocs-storagecluster-ceph-rbd; external ODF may resolve to an imported class such as ocs-external-storagecluster-ceph-rbd.
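
One way to resolve the effective class by hand is to check which of the two names exists on the cluster. The sketch below assumes only the two class names mentioned above and prefers the internal one. The function name pick_rbd_storage_class is hypothetical, and the automation may resolve the value differently.

```shell
# Hypothetical helper (name is illustrative): read storage class names from
# stdin, one per line, and print the first known ODF RBD class that is present.
pick_rbd_storage_class() {
  names="$(cat)"
  for candidate in ocs-storagecluster-ceph-rbd ocs-external-storagecluster-ceph-rbd; do
    # -x matches the whole line, -F treats the name as a fixed string.
    if printf '%s\n' "$names" | grep -qxF "$candidate"; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  # Neither class exists; let the caller decide what to do.
  return 1
}

# Example against a live cluster (strips the "storageclass.../" prefix first):
#   oc get storageclass -o name | sed 's|.*/||' | pick_rbd_storage_class
```

The printed name can then be substituted for postgres_storage_class when creating the AAP instance.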

Shell
# Install Ansible Automation Platform and its operator dependencies.
ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 <<'EOF'
export KUBECONFIG=/var/tmp/ocp-kubeconfig
oc create namespace aap || true
cat <<'YAML' | oc apply -f -
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: aap
  namespace: aap
spec:
  targetNamespaces:
    - aap
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: ansible-automation-platform-operator
  namespace: aap
spec:
  channel: stable-2.6
  installPlanApproval: Automatic
  name: ansible-automation-platform-operator
  source: cs-redhat-operator-index-v4-20
  sourceNamespace: openshift-marketplace
YAML
cat <<'YAML' | oc apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: workshop-aap-admin-password
  namespace: aap
stringData:
  password: <lab-default-password>
---
apiVersion: v1
kind: Secret
metadata:
  name: workshop-aap-idm-ca
  namespace: aap
stringData:
  bundle-ca.crt: |
YAML
EOF

Load the IdM CA into the workshop-aap-idm-ca secret and create the AAP instance.

Shell
# Load the IdM CA into the cluster for AAP trust, then create the AAP instance.
ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 <<'EOF'
export KUBECONFIG=/var/tmp/ocp-kubeconfig
# Fetch the CA into its own file. Appending raw PEM to a YAML manifest would
# not yield a valid Secret, so the secret is regenerated from the file instead.
curl -fsSL http://172.16.0.10/ipa/config/ca.crt >/tmp/aap-idm-ca.crt
oc create secret generic workshop-aap-idm-ca -n aap \
  --from-file=bundle-ca.crt=/tmp/aap-idm-ca.crt \
  --dry-run=client -o yaml | oc apply -f -
cat <<'YAML' | oc apply -f -
apiVersion: aap.ansible.com/v1alpha1
kind: AnsibleAutomationPlatform
metadata:
  name: workshop-aap
  namespace: aap
spec:
  admin_user: admin
  admin_password_secret: workshop-aap-admin-password
  postgres_storage_class: ocs-storagecluster-ceph-rbd
  postgres_storage_requirements:
    requests:
      storage: 20Gi
YAML
EOF

Configure the Keycloak aap client, add the groups and aap audience protocol mappers, then create the AAP gateway authenticator and superuser map.

The validated clean-build path uses:

  • AAP route: https://aap.apps.ocp.workshop.lan
  • Keycloak route: https://sso.apps.ocp.workshop.lan
  • Keycloak realm: openshift
  • AAP client ID: aap
  • AAP authenticator name: Red Hat build of Keycloak
  • required AAP admin group: access-openshift-admin
Shell
# Configure Keycloak SSO for AAP.
ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 <<'EOF'
export KUBECONFIG=/var/tmp/ocp-kubeconfig
AAP_ROUTE="$(oc -n aap get route workshop-aap -o jsonpath='{.spec.host}')"
KEYCLOAK_ROUTE="$(oc -n keycloak get route workshop-keycloak -o jsonpath='{.spec.host}')"
# Obtain a Keycloak admin token from the master realm.
KEYCLOAK_ADMIN_TOKEN="$(curl --cacert /etc/ipa/ca.crt -sS \
  -X POST https://${KEYCLOAK_ROUTE}/realms/master/protocol/openid-connect/token \
  -H 'Content-Type: application/x-www-form-urlencoded' \
  --data-urlencode 'grant_type=password' \
  --data-urlencode 'client_id=admin-cli' \
  --data-urlencode 'username=admin' \
  --data-urlencode 'password=<lab-default-password>' | jq -r .access_token)"
# Look up the aap client; create it when the lookup returns nothing.
CLIENT_ID="$(curl --cacert /etc/ipa/ca.crt -sS \
  -H "Authorization: Bearer ${KEYCLOAK_ADMIN_TOKEN}" \
  "https://${KEYCLOAK_ROUTE}/admin/realms/openshift/clients?clientId=aap" \
  | jq -r '.[0].id')"
if [ -z "${CLIENT_ID}" ] || [ "${CLIENT_ID}" = "null" ]; then
  curl --cacert /etc/ipa/ca.crt -sS \
    -H "Authorization: Bearer ${KEYCLOAK_ADMIN_TOKEN}" \
    -H 'Content-Type: application/json' \
    -X POST "https://${KEYCLOAK_ROUTE}/admin/realms/openshift/clients" \
    -d '{
      "clientId":"aap",
      "enabled":true,
      "protocol":"openid-connect",
      "publicClient":false,
      "standardFlowEnabled":true,
      "directAccessGrantsEnabled":true,
      "serviceAccountsEnabled":false,
      "secret":"<lab-default-password>",
      "redirectUris":[
        "https://aap.apps.ocp.workshop.lan/*"
      ]
    }'
  CLIENT_ID="$(curl --cacert /etc/ipa/ca.crt -sS \
    -H "Authorization: Bearer ${KEYCLOAK_ADMIN_TOKEN}" \
    "https://${KEYCLOAK_ROUTE}/admin/realms/openshift/clients?clientId=aap" \
    | jq -r '.[0].id')"
fi
# Add the groups and aap audience protocol mappers.
curl --cacert /etc/ipa/ca.crt -sS \
  -H "Authorization: Bearer ${KEYCLOAK_ADMIN_TOKEN}" \
  -H 'Content-Type: application/json' \
  -X POST "https://${KEYCLOAK_ROUTE}/admin/realms/openshift/clients/${CLIENT_ID}/protocol-mappers/models" \
  -d '{
    "name":"groups",
    "protocol":"openid-connect",
    "protocolMapper":"oidc-group-membership-mapper",
    "consentRequired":false,
    "config":{
      "full.path":"false",
      "id.token.claim":"true",
      "access.token.claim":"true",
      "userinfo.token.claim":"true",
      "claim.name":"groups"
    }
  }' || true
curl --cacert /etc/ipa/ca.crt -sS \
  -H "Authorization: Bearer ${KEYCLOAK_ADMIN_TOKEN}" \
  -H 'Content-Type: application/json' \
  -X POST "https://${KEYCLOAK_ROUTE}/admin/realms/openshift/clients/${CLIENT_ID}/protocol-mappers/models" \
  -d '{
    "name":"aap-audience",
    "protocol":"openid-connect",
    "protocolMapper":"oidc-audience-mapper",
    "consentRequired":false,
    "config":{
      "included.client.audience":"aap",
      "id.token.claim":"true",
      "access.token.claim":"true"
    }
  }' || true
# Create the AAP gateway authenticator.
TOKEN="$(curl -sk -X POST https://${AAP_ROUTE}/api/gateway/v1/token/ \
  -H 'Content-Type: application/json' \
  -d '{"username":"admin","password":"<lab-default-password>"}' | jq -r .access)"
REALM_PUBLIC_KEY="$(curl --cacert /etc/ipa/ca.crt -sS \
  "https://${KEYCLOAK_ROUTE}/realms/openshift" | jq -r .public_key)"
AAP_AUTH_PAYLOAD="$(jq -n \
  --arg public_key "${REALM_PUBLIC_KEY}" '
  {
    name: "Red Hat build of Keycloak",
    enabled: true,
    order: 2,
    type: "ansible_base.authentication.authenticator_plugins.keycloak",
    configuration: {
      AUTHORIZATION_URL: "https://sso.apps.ocp.workshop.lan/realms/openshift/protocol/openid-connect/auth",
      ACCESS_TOKEN_URL: "https://sso.apps.ocp.workshop.lan/realms/openshift/protocol/openid-connect/token",
      KEY: "aap",
      SECRET: "<lab-default-password>",
      PUBLIC_KEY: $public_key,
      GROUPS_CLAIM: "groups"
    }
  }')"
curl -sk -X POST https://${AAP_ROUTE}/api/gateway/v1/authenticators/ \
  -H "Authorization: Bearer ${TOKEN}" \
  -H 'Content-Type: application/json' \
  -d "${AAP_AUTH_PAYLOAD}"
# Map the access-openshift-admin group to AAP superuser.
AUTH_ID="$(curl -sk -H "Authorization: Bearer ${TOKEN}" \
  "https://${AAP_ROUTE}/api/gateway/v1/authenticators/" \
  | jq -r '.results[] | select(.name=="Red Hat build of Keycloak") | .id')"
curl -sk -X POST https://${AAP_ROUTE}/api/gateway/v1/authenticator_maps/ \
  -H "Authorization: Bearer ${TOKEN}" \
  -H 'Content-Type: application/json' \
  -d @- <<JSON
{
  "name": "access-openshift-admin AAP superuser",
  "map_type": "is_superuser",
  "triggers": {
    "groups": {
      "has_or": [
        "access-openshift-admin"
      ]
    }
  },
  "authenticator": ${AUTH_ID}
}
JSON
EOF

The automation writes the full JSON payload and drives this through the API directly; the manual runbook keeps the moving parts visible instead of hiding them in a helper script.

Validate the end state with the same two checkpoints the automation now uses before the final browser-style login proof:

  1. the AAP UI advertises only the Keycloak SSO entry
  2. an AD-backed user can obtain an OIDC token for the aap client with the expected group claims

If the lab trust path is enabled, the validated user is ad-ocpadmin@corp.lan. Without AD trust, use the native IdM admin-path user instead.

Shell
# Validate the AAP SSO entry and token flow.
ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 <<'EOF'
export KUBECONFIG=/var/tmp/ocp-kubeconfig
AAP_ROUTE="$(oc -n aap get route workshop-aap -o jsonpath='{.spec.host}')"
KEYCLOAK_ROUTE="$(oc -n keycloak get route workshop-keycloak -o jsonpath='{.spec.host}')"
curl -sk "https://${AAP_ROUTE}/api/gateway/v1/ui_auth/" | jq .
curl --cacert /etc/ipa/ca.crt -sS \
  -X POST "https://${KEYCLOAK_ROUTE}/realms/openshift/protocol/openid-connect/token" \
  -H 'Content-Type: application/x-www-form-urlencoded' \
  --data-urlencode 'client_id=aap' \
  --data-urlencode 'client_secret=<lab-default-password>' \
  --data-urlencode 'grant_type=password' \
  --data-urlencode 'username=ad-ocpadmin@corp.lan' \
  --data-urlencode 'password=<lab-default-password>' \
  | jq .
EOF
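
The token response prints the access token as an opaque string, so the group claims in the second checkpoint are not directly visible. A minimal sketch for inspecting them follows, assuming the access token is a standard three-part JWT and that GNU base64 is available. The function name jwt_payload is hypothetical, and the sketch only decodes the payload; it does not verify the signature.

```shell
# Hypothetical helper (name is illustrative): print the decoded JSON payload
# of a JWT. Decoding only; the signature is NOT verified here.
jwt_payload() {
  # Take the second dot-separated segment and map base64url to plain base64.
  seg="$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')"
  # Restore the '=' padding that JWT encoding strips.
  case $(( ${#seg} % 4 )) in
    2) seg="${seg}==" ;;
    3) seg="${seg}=" ;;
  esac
  printf '%s' "$seg" | base64 -d
}

# Example, assuming ACCESS_TOKEN holds the .access_token field of the response:
#   jwt_payload "${ACCESS_TOKEN}" | jq '.groups, .aud'
```

For the validated admin path, the groups claim would be expected to include access-openshift-admin.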

31. Install OpenShift Pipelines

Install OpenShift Pipelines and prepare the Windows EFI image-build lane.

Note

Automation reference: playbooks/day2/openshift-post-install-pipelines.yml, role openshift_post_install_pipelines.

Install Tekton, make sure there is a default storage class, and install the Windows EFI installer pipeline.

Shell
# Install OpenShift Pipelines and the Windows builder pipeline.
ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 <<'EOF'
export KUBECONFIG=/var/tmp/ocp-kubeconfig
oc annotate storageclass ocs-storagecluster-ceph-rbd \
  storageclass.kubernetes.io/is-default-class=true --overwrite
cat <<'YAML' | oc apply -f -
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: openshift-pipelines-operator-rh
  namespace: openshift-operators
spec:
  channel: pipelines-1.20
  installPlanApproval: Automatic
  name: openshift-pipelines-operator-rh
  source: cs-redhat-operator-index-v4-20
  sourceNamespace: openshift-marketplace
YAML
oc create namespace windows-image-builder || true
oc adm policy add-role-to-user edit system:serviceaccount:windows-image-builder:pipeline -n windows-image-builder
curl -L \
  https://raw.githubusercontent.com/openshift-pipelines/tektoncd-catalog/p/pipelines/windows-efi-installer/4.20.7/windows-efi-installer.yaml \
  | oc apply -n windows-image-builder -f -
EOF

32. Launch A Windows EFI Build

Launch the Windows Server image-build PipelineRun manually after the Pipelines lane is in place.

Note

Automation reference: playbooks/day2/openshift-windows-server-build.yml, role openshift_windows_server_build.

Set a real Windows ISO URL, then apply the PipelineRun directly.

Shell
# Launch the Windows EFI image build.
ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 <<'EOF'
export KUBECONFIG=/var/tmp/ocp-kubeconfig
cat <<'YAML' | oc apply -f -
apiVersion: tekton.dev/v1
kind: PipelineRun
metadata:
  name: windows2k22-efi-installer-run
  namespace: windows-image-builder
spec:
  pipelineRef:
    name: windows-efi-installer
  params:
    - name: winImageDownloadURL
      value: REPLACE_WITH_WINDOWS_SERVER_ISO_URL
    - name: acceptEula
      value: "true"
    - name: autounattendXMLConfigMapsURL
      value: https://raw.githubusercontent.com/rh-ecosystem-edge/windows-machine-config-bootstrapper/main/configmaps/
    - name: instanceTypeName
      value: u1.large
    - name: instanceTypeKind
      value: VirtualMachineClusterInstancetype
    - name: preferenceName
      value: windows.2k22.virtio
    - name: virtualMachinePreferenceKind
      value: VirtualMachineClusterPreference
    - name: autounattendConfigMapName
      value: windows2k22-autounattend
    - name: virtioContainerDiskName
      value: quay.io/kubevirt/virtio-container-disk:centos-stream9
    - name: baseDvName
      value: win2k22
    - name: isoDVName
      value: win2k22-install
    - name: useBiosMode
      value: "false"
  taskRunTemplate:
    serviceAccountName: pipeline
YAML
oc get pipelinerun windows2k22-efi-installer-run -n windows-image-builder
EOF
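
Applying the PipelineRun with the placeholder still in place produces a run that can only fail at download time. A small guard, sketched here under the assumption that the ISO is served over http or https, can catch that before anything is applied. The function name check_iso_url is hypothetical.

```shell
# Hypothetical helper (name is illustrative): reject an empty or placeholder
# ISO URL before the PipelineRun is applied.
check_iso_url() {
  case "$1" in
    ''|REPLACE_WITH_*)
      echo "set a real Windows ISO URL first" >&2
      return 1
      ;;
    http://*|https://*)
      return 0
      ;;
    *)
      echo "not an http(s) URL: $1" >&2
      return 1
      ;;
  esac
}

# Example:
#   check_iso_url "${WIN_ISO_URL}" || exit 1
```

The check is purely syntactic; it does not confirm that the URL is reachable from the cluster.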

33. Pivot OperatorHub To The Disconnected Catalog

Pivot OperatorHub to the disconnected catalogs produced by the mirror phase.

Note

Automation reference: playbooks/day2/openshift-disconnected-operatorhub.yml, role openshift_disconnected_operatorhub.

Important

In the automated path this runs before any operator subscriptions (sections 25-32). If you are walking the manual process in order, you have already been using the disconnected catalog source names (cs-redhat-operator-index-v4-20). This section exists for reference and for rebuilds where the pivot needs to be reapplied. Every operator install in sections 25-32 uses the disconnected catalog source names (cs-redhat-operator-index-v4-20, cc-redhat-operator-index-v4-20) instead of the upstream redhat-operators / community-operators defaults.

Disable the default remote catalogs, merge mirror-registry auth into the cluster pull secret, attach a dedicated pull secret to the mirrored catalog pods, and wait for the mirrored sources to become READY.

Shell
# Pivot OperatorHub to the disconnected catalog sources.
ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 <<'EOF'
export KUBECONFIG=/var/tmp/ocp-kubeconfig
REGISTRY_HOST="mirror-registry.workshop.lan:8443"
REGISTRY_AUTH="$(printf '%s' 'init:<lab-default-password>' | base64 -w0)"
# Disable the default remote catalogs.
cat <<'YAML' | oc apply -f -
apiVersion: config.openshift.io/v1
kind: OperatorHub
metadata:
  name: cluster
spec:
  disableAllDefaultSources: true
YAML
# Check that every node resolves the mirror registry to the expected address.
for node in $(oc get nodes -o jsonpath='{.items[*].metadata.name}'); do
  oc debug "node/${node}" --quiet -- chroot /host getent ahostsv4 mirror-registry.workshop.lan | grep -q '^172.16.0.20\b'
done
# Merge mirror-registry auth into the cluster pull secret.
oc extract secret/pull-secret -n openshift-config --to=/tmp/pull-secret --confirm
jq --arg auth "${REGISTRY_AUTH}" '.auths["'"${REGISTRY_HOST}"'"] = {"auth":$auth,"email":"init@workshop.lan"}' \
  /tmp/pull-secret/.dockerconfigjson >/tmp/pull-secret-merged.json
oc set data secret/pull-secret -n openshift-config \
  --from-file=.dockerconfigjson=/tmp/pull-secret-merged.json
# Create the dedicated pull secret for the mirrored catalog pods.
cat >/tmp/mirror-registry-auth.json <<JSON
{
  "auths": {
    "${REGISTRY_HOST}": {
      "auth": "${REGISTRY_AUTH}",
      "email": "init@workshop.lan"
    }
  }
}
JSON
oc create secret generic mirror-registry-catalog-pull-secret \
  -n openshift-marketplace \
  --type=kubernetes.io/dockerconfigjson \
  --from-file=.dockerconfigjson=/tmp/mirror-registry-auth.json \
  --dry-run=client -o yaml | oc apply -f -
for manifest in /opt/openshift/oc-mirror/working-dir/cluster-resources/cs-*.yaml; do
  oc apply -f "$manifest"
done
for catalog in redhat-operators certified-operators community-operators redhat-marketplace; do
  oc delete catalogsource "$catalog" -n openshift-marketplace --ignore-not-found=true
done
for clustercatalog in openshift-redhat-operators openshift-certified-operators openshift-community-operators openshift-redhat-marketplace; do
  oc patch clustercatalog "$clustercatalog" --type=merge -p '{"spec":{"availabilityMode":"Unavailable"}}' || true
done
# Attach the pull secret and wait for the mirrored source to report READY.
for catalog in cs-redhat-operator-index-v4-20; do
  oc patch catalogsource "$catalog" -n openshift-marketplace --type=merge \
    -p '{"spec":{"secrets":["mirror-registry-catalog-pull-secret"]}}'
  oc patch serviceaccount "$catalog" -n openshift-marketplace --type=merge \
    -p '{"imagePullSecrets":[{"name":"mirror-registry-catalog-pull-secret"}]}'
  oc delete pod -n openshift-marketplace -l "olm.catalogSource=${catalog}" --ignore-not-found=true
  until [ "$(oc get catalogsource "$catalog" -n openshift-marketplace -o jsonpath='{.status.connectionState.lastObservedState}' 2>/dev/null)" = "READY" ]; do
    sleep 10
  done
done
for pkg in kubernetes-nmstate-operator local-storage-operator; do
  until [ "$(oc get packagemanifest "$pkg" -n openshift-marketplace -o jsonpath='{.status.catalogSource}' 2>/dev/null)" = "cs-redhat-operator-index-v4-20" ]; do
    sleep 10
  done
done
EOF
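
The READY and packagemanifest waits above loop until the condition holds, with no upper bound, so a catalog that never comes up hangs the session. A minimal sketch of a bounded variant follows. The helper name wait_until and the timeout values in the example are assumptions for illustration, not what the automation uses.

```shell
# Hypothetical helper (name is illustrative): retry a command until it
# succeeds or a deadline passes.
# Usage: wait_until <timeout-seconds> <interval-seconds> <command> [args...]
wait_until() {
  timeout_s="$1"
  interval_s="$2"
  shift 2
  elapsed=0
  until "$@"; do
    # Give up once the accumulated sleep time reaches the timeout.
    if [ "$elapsed" -ge "$timeout_s" ]; then
      echo "timed out after ${elapsed}s waiting for: $*" >&2
      return 1
    fi
    sleep "$interval_s"
    elapsed=$(( elapsed + interval_s ))
  done
}

# Example, wrapping the READY check from the block above in a function:
#   catalog_ready() {
#     [ "$(oc get catalogsource "$1" -n openshift-marketplace \
#       -o jsonpath='{.status.connectionState.lastObservedState}' 2>/dev/null)" = "READY" ]
#   }
#   wait_until 600 10 catalog_ready cs-redhat-operator-index-v4-20
```

The elapsed counter tracks sleep time only, so the real wall-clock bound is slightly longer than the stated timeout when the checked command itself is slow.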

34. Roll Out An IdM Ingress Certificate

Roll out the IdM-issued wildcard ingress certificate early in day-2 so later work lands on the final TLS state.

Note

Automation reference: playbooks/day2/openshift-post-install-idm-certs.yml, role openshift_post_install_idm_certs.

Warning

Ordering matters. In the automated path, the IdM ingress cert pivot runs early (phase 3, after infra conversion but before LDAP). Applying it late causes extended ClusterOperator (CO) degradation: console CO health checks failed for 28 minutes in the first live run. See issue 44a51e8 in the issues ledger.

The supported certificate customization path is the ingress wildcard, not the cluster API serving certificate.

Shell
# Roll out the IdM wildcard ingress certificate.ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 <<'EOF'export KUBECONFIG=/var/tmp/ocp-kubeconfigssh -i /opt/openshift/secrets/hypervisor-admin.key cloud-user@172.16.0.10 <<'INNER'sudo -ikinit admin <<< '<lab-default-password>'ipa dnsrecord-add ocp.workshop.lan apps --a-rec=172.16.10.7 || trueipa service-add HTTP/apps.ocp.workshop.lan || truecat <<'PROFILE' >/root/ocp-wildcard-ingress-profile.cfgauth.instance_id=raCertAuthclassId=caEnrollImpldesc=OpenShift wildcard ingress certificate profileenable=trueenableBy=iparainput.i1.class_id=certReqInputImplinput.i2.class_id=submitterInfoInputImplinput.list=i1,i2name=OpenShift Wildcard Ingress Certificate Enrollmentoutput.list=o1output.o1.class_id=certOutputImplpolicyset.list=serverCertSetpolicyset.serverCertSet.1.constraint.class_id=subjectNameConstraintImplpolicyset.serverCertSet.1.constraint.name=Subject Name Constraintpolicyset.serverCertSet.1.constraint.params.accept=truepolicyset.serverCertSet.1.constraint.params.pattern=CN=[^,]+,.+policyset.serverCertSet.1.default.class_id=subjectNameDefaultImplpolicyset.serverCertSet.1.default.name=Subject Name Defaultpolicyset.serverCertSet.1.default.params.name=CN=$request.req_subject_name.cn$, O=WORKSHOP.LANpolicyset.serverCertSet.10.constraint.class_id=noConstraintImplpolicyset.serverCertSet.10.constraint.name=No Constraintpolicyset.serverCertSet.10.default.class_id=subjectKeyIdentifierExtDefaultImplpolicyset.serverCertSet.10.default.name=Subject Key Identifier Extension Defaultpolicyset.serverCertSet.10.default.params.critical=falsepolicyset.serverCertSet.11.constraint.class_id=noConstraintImplpolicyset.serverCertSet.11.constraint.name=No Constraintpolicyset.serverCertSet.11.default.class_id=userExtensionDefaultImplpolicyset.serverCertSet.11.default.name=User Supplied Extension 
Default
policyset.serverCertSet.11.default.params.userExtOID=2.5.29.17
policyset.serverCertSet.12.constraint.class_id=noConstraintImpl
policyset.serverCertSet.12.constraint.name=No Constraint
policyset.serverCertSet.12.default.class_id=commonNameToSANDefaultImpl
policyset.serverCertSet.12.default.name=Copy Common Name to Subject Alternative Name
policyset.serverCertSet.2.constraint.class_id=validityConstraintImpl
policyset.serverCertSet.2.constraint.name=Validity Constraint
policyset.serverCertSet.2.constraint.params.notAfterCheck=false
policyset.serverCertSet.2.constraint.params.notBeforeCheck=false
policyset.serverCertSet.2.constraint.params.range=740
policyset.serverCertSet.2.default.class_id=validityDefaultImpl
policyset.serverCertSet.2.default.name=Validity Default
policyset.serverCertSet.2.default.params.range=731
policyset.serverCertSet.2.default.params.startTime=0
policyset.serverCertSet.3.constraint.class_id=keyConstraintImpl
policyset.serverCertSet.3.constraint.name=Key Constraint
policyset.serverCertSet.3.constraint.params.keyParameters=1024,2048,3072,4096,8192
policyset.serverCertSet.3.constraint.params.keyType=RSA
policyset.serverCertSet.3.default.class_id=userKeyDefaultImpl
policyset.serverCertSet.3.default.name=Key Default
policyset.serverCertSet.4.constraint.class_id=noConstraintImpl
policyset.serverCertSet.4.constraint.name=No Constraint
policyset.serverCertSet.4.default.class_id=authorityKeyIdentifierExtDefaultImpl
policyset.serverCertSet.4.default.name=Authority Key Identifier Default
policyset.serverCertSet.5.constraint.class_id=noConstraintImpl
policyset.serverCertSet.5.constraint.name=No Constraint
policyset.serverCertSet.5.default.class_id=authInfoAccessExtDefaultImpl
policyset.serverCertSet.5.default.name=AIA Extension Default
policyset.serverCertSet.5.default.params.authInfoAccessADEnable_0=true
policyset.serverCertSet.5.default.params.authInfoAccessADLocationType_0=URIName
policyset.serverCertSet.5.default.params.authInfoAccessADLocation_0=http://ipa-ca.workshop.lan/ca/ocsp
policyset.serverCertSet.5.default.params.authInfoAccessADMethod_0=1.3.6.1.5.5.7.48.1
policyset.serverCertSet.5.default.params.authInfoAccessCritical=false
policyset.serverCertSet.5.default.params.authInfoAccessNumADs=1
policyset.serverCertSet.6.constraint.class_id=keyUsageExtConstraintImpl
policyset.serverCertSet.6.constraint.name=Key Usage Extension Constraint
policyset.serverCertSet.6.constraint.params.keyUsageCritical=true
policyset.serverCertSet.6.constraint.params.keyUsageCrlSign=false
policyset.serverCertSet.6.constraint.params.keyUsageDataEncipherment=true
policyset.serverCertSet.6.constraint.params.keyUsageDecipherOnly=false
policyset.serverCertSet.6.constraint.params.keyUsageDigitalSignature=true
policyset.serverCertSet.6.constraint.params.keyUsageEncipherOnly=false
policyset.serverCertSet.6.constraint.params.keyUsageKeyAgreement=false
policyset.serverCertSet.6.constraint.params.keyUsageKeyCertSign=false
policyset.serverCertSet.6.constraint.params.keyUsageKeyEncipherment=true
policyset.serverCertSet.6.constraint.params.keyUsageNonRepudiation=true
policyset.serverCertSet.6.default.class_id=keyUsageExtDefaultImpl
policyset.serverCertSet.6.default.name=Key Usage Default
policyset.serverCertSet.6.default.params.keyUsageCritical=true
policyset.serverCertSet.6.default.params.keyUsageCrlSign=false
policyset.serverCertSet.6.default.params.keyUsageDataEncipherment=true
policyset.serverCertSet.6.default.params.keyUsageDecipherOnly=false
policyset.serverCertSet.6.default.params.keyUsageDigitalSignature=true
policyset.serverCertSet.6.default.params.keyUsageEncipherOnly=false
policyset.serverCertSet.6.default.params.keyUsageKeyAgreement=false
policyset.serverCertSet.6.default.params.keyUsageKeyCertSign=false
policyset.serverCertSet.6.default.params.keyUsageKeyEncipherment=true
policyset.serverCertSet.6.default.params.keyUsageNonRepudiation=true
policyset.serverCertSet.7.constraint.class_id=noConstraintImpl
policyset.serverCertSet.7.constraint.name=No Constraint
policyset.serverCertSet.7.default.class_id=extendedKeyUsageExtDefaultImpl
policyset.serverCertSet.7.default.name=Extended Key Usage Extension Default
policyset.serverCertSet.7.default.params.exKeyUsageCritical=false
policyset.serverCertSet.7.default.params.exKeyUsageOIDs=1.3.6.1.5.5.7.3.1,1.3.6.1.5.5.7.3.2
policyset.serverCertSet.8.constraint.class_id=signingAlgConstraintImpl
policyset.serverCertSet.8.constraint.name=No Constraint
policyset.serverCertSet.8.constraint.params.signingAlgsAllowed=SHA1withRSA,SHA256withRSA,SHA384withRSA,SHA512withRSA,MD5withRSA,MD2withRSA,SHA1withDSA,SHA1withEC,SHA256withEC,SHA384withEC,SHA512withEC
policyset.serverCertSet.8.default.class_id=signingAlgDefaultImpl
policyset.serverCertSet.8.default.name=Signing Alg
policyset.serverCertSet.8.default.params.signingAlg=-
policyset.serverCertSet.9.constraint.class_id=noConstraintImpl
policyset.serverCertSet.9.constraint.name=No Constraint
policyset.serverCertSet.9.default.class_id=crlDistributionPointsExtDefaultImpl
policyset.serverCertSet.9.default.name=CRL Distribution Points Extension Default
policyset.serverCertSet.9.default.params.crlDistPointsCritical=false
policyset.serverCertSet.9.default.params.crlDistPointsEnable_0=true
policyset.serverCertSet.9.default.params.crlDistPointsIssuerName_0=CN=Certificate Authority,o=ipaca
policyset.serverCertSet.9.default.params.crlDistPointsIssuerType_0=DirectoryName
policyset.serverCertSet.9.default.params.crlDistPointsNum=1
policyset.serverCertSet.9.default.params.crlDistPointsPointName_0=http://ipa-ca.workshop.lan/ipa/crl/MasterCRL.bin
policyset.serverCertSet.9.default.params.crlDistPointsPointType_0=URIName
policyset.serverCertSet.9.default.params.crlDistPointsReasons_0=
policyset.serverCertSet.list=1,2,3,4,5,6,7,8,9,10,11,12
profileId=ocpWildcardIngress
visible=false
PROFILE
ipa certprofile-show ocpWildcardIngress >/dev/null 2>&1 || \
  ipa certprofile-import ocpWildcardIngress \
    --file /root/ocp-wildcard-ingress-profile.cfg \
    --desc "OpenShift wildcard ingress certificate profile" \
    --store=true
INNER
EOF
openssl req -new -newkey rsa:2048 -nodes \
  -keyout /tmp/apps.key \
  -out /tmp/apps.csr \
  -subj '/CN=apps.ocp.workshop.lan' \
  -addext 'subjectAltName=DNS:apps.ocp.workshop.lan,DNS:*.apps.ocp.workshop.lan'
scp -i /opt/openshift/secrets/hypervisor-admin.key \
  /tmp/apps.csr cloud-user@172.16.0.10:/tmp/apps.csr
ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 <<'EOF'
export KUBECONFIG=/var/tmp/ocp-kubeconfig
ssh -i /opt/openshift/secrets/hypervisor-admin.key cloud-user@172.16.0.10 <<'INNER'
sudo -i
kinit admin <<< '<lab-default-password>'
ipa cert-request /tmp/apps.csr \
  --principal=HTTP/apps.ocp.workshop.lan \
  --profile-id=ocpWildcardIngress \
  --certificate-out=/tmp/apps.crt
INNER
EOF
scp -i /opt/openshift/secrets/hypervisor-admin.key \
  cloud-user@172.16.0.10:/tmp/apps.crt /tmp/apps.crt
ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 <<'EOF'
export KUBECONFIG=/var/tmp/ocp-kubeconfig
oc -n openshift-config create configmap idm-ca-trust \
  --from-file=ca-bundle.crt=/var/tmp/idm-ca.crt \
  --dry-run=client -o yaml | oc apply -f -
oc patch proxy cluster --type=merge \
  -p '{"spec":{"trustedCA":{"name":"idm-ca-trust"}}}'
oc -n openshift-ingress create secret tls ingress-default-idm-tls \
  --cert=/var/tmp/apps.crt \
  --key=/var/tmp/apps.key \
  --dry-run=client -o yaml | oc apply -f -
cat <<'YAML' | oc apply -f -
apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: default
  namespace: openshift-ingress-operator
spec:
  defaultCertificate:
    name: ingress-default-idm-tls
YAML
EOF

35. Cleanup

Use cleanup intentionally: either destroy the full lab or, more commonly, destroy only the OpenShift cluster and preserve the healthy support services.

Note

Automation reference: playbooks/maintenance/cleanup.yml and the cleanup roles it aggregates.

Caution

This is destructive and not reversible. It destroys VMs and wipes disks. The mirror-registry archive and all OpenShift cluster state will be gone. If you only want to rebuild the cluster, preserve support services and use the cluster-only cleanup path instead of the full cleanup.

Important

For a true fresh support-services redeploy, removing the support VMs is not enough. Also wipe the support guest block devices (/dev/ebs/bastion-01, /dev/ebs/ad-01, /dev/ebs/idm-01, and /dev/ebs/mirror-registry) before replaying ./scripts/run_local_playbook.sh playbooks/site-bootstrap.yml, otherwise the next run can inherit stale guest state.

Automation shortcut for the preferred fresh-cluster rebuild:

Shell
# Destroy only the OpenShift cluster and preserve healthy support services.
ansible-playbook -i inventory/hosts.yml playbooks/maintenance/cleanup.yml \
  -e cleanup_destroy_openshift_cluster=true \
  -e cleanup_wipe_openshift_cluster_block_devices=true
./scripts/run_remote_bastion_playbook.sh playbooks/site-lab.yml \
  -e lab_default_password='<lab-default-password>'

Destroy the OpenShift cluster shells, optionally wipe the disks, and clean up the support VM and lab-switch state.

Shell
# Destroy the OpenShift cluster domains and wipe their disks on virt-01.
ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 <<'EOF'
for domain in \
  ocp-master-01.ocp.workshop.lan \
  ocp-master-02.ocp.workshop.lan \
  ocp-master-03.ocp.workshop.lan \
  ocp-infra-01.ocp.workshop.lan \
  ocp-infra-02.ocp.workshop.lan \
  ocp-infra-03.ocp.workshop.lan \
  ocp-worker-01.ocp.workshop.lan \
  ocp-worker-02.ocp.workshop.lan \
  ocp-worker-03.ocp.workshop.lan; do
  virsh destroy "$domain" || true
  virsh undefine "$domain" --nvram || true
done
for disk in \
  /dev/ebs/ocp-master-01 /dev/ebs/ocp-master-02 /dev/ebs/ocp-master-03 \
  /dev/ebs/ocp-infra-01 /dev/ebs/ocp-infra-02 /dev/ebs/ocp-infra-03 \
  /dev/ebs/ocp-worker-01 /dev/ebs/ocp-worker-02 /dev/ebs/ocp-worker-03 \
  /dev/ebs/ocp-infra-01-data /dev/ebs/ocp-infra-02-data /dev/ebs/ocp-infra-03-data; do
  wipefs -a "$disk" || true
  dd if=/dev/zero of="$disk" bs=1M count=100 oflag=direct,dsync status=progress || true
done
EOF
rm -rf /opt/openshift/generated/ocp
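The nine domain names and twelve block devices in the manual loops follow one naming pattern, so the lists can be generated for review instead of typed by hand. The following is a minimal sketch, assuming the three roles and three nodes per role shown in those loops; the function names cluster_domains and cluster_disks are hypothetical and are not part of the project.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: print the cluster domain names and block devices
# that the manual cleanup loops enumerate by hand.

cluster_domains() {
  local role n
  for role in master infra worker; do
    for n in 01 02 03; do
      printf 'ocp-%s-%s.ocp.workshop.lan\n' "$role" "$n"
    done
  done
}

cluster_disks() {
  local role n
  for role in master infra worker; do
    for n in 01 02 03; do
      printf '/dev/ebs/ocp-%s-%s\n' "$role" "$n"
    done
  done
  # The manual loop also lists one extra data disk per infra node.
  for n in 01 02 03; do
    printf '/dev/ebs/ocp-infra-%s-data\n' "$n"
  done
}

# Print both lists for review; nothing is destroyed or wiped here.
cluster_domains
cluster_disks
```

Printing the lists before a destructive run makes it easy to compare them against virsh list --all on virt-01 and confirm that no support-service guest is included.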

When tearing all the way back to the post-OVS support-services boundary, wipe the support guest block devices too:

Shell
# Wipe the support-service disks on virt-01.
ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 <<'EOF'
for disk in /dev/ebs/bastion-01 /dev/ebs/ad-01 /dev/ebs/idm-01 /dev/ebs/mirror-registry; do
  wipefs -a "$disk" || true
  dd if=/dev/zero of="$disk" bs=1M count=100 oflag=direct,dsync status=none || true
done
EOF
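Because the wipe loop ends each command with || true, a mistyped or missing device path fails silently. One way to make that visible is to check the paths first. This is a minimal sketch, assuming it runs on virt-01 where the /dev/ebs/ paths live; the function name existing_block_devices is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical pre-check: print only the arguments that are real block
# devices and report everything else on stderr. It never writes to a disk.

existing_block_devices() {
  local disk
  for disk in "$@"; do
    if [ -b "$disk" ]; then
      printf '%s\n' "$disk"
    else
      printf 'skip (not a block device): %s\n' "$disk" >&2
    fi
  done
}

# Review which support-service disks are actually present before wiping.
existing_block_devices \
  /dev/ebs/bastion-01 /dev/ebs/ad-01 /dev/ebs/idm-01 /dev/ebs/mirror-registry
```

On a host without those devices the function prints nothing on stdout and lists every path as skipped on stderr, which is the signal to stop and check the device names before running the wipe.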

36. Manual Debugging Examples

These commands are useful when teaching or troubleshooting the manual process.

Check cluster status from the correct side of the network boundary:

Shell
# Check the basic cluster state.
export KUBECONFIG=/opt/openshift/aws-metal-openshift-demo/generated/ocp/auth/kubeconfig
/opt/openshift/aws-metal-openshift-demo/generated/tools/4.20.15/bin/oc get clusterversion
/opt/openshift/aws-metal-openshift-demo/generated/tools/4.20.15/bin/oc get nodes
/opt/openshift/aws-metal-openshift-demo/generated/tools/4.20.15/bin/oc get co
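When teaching, it helps to reduce the oc get co table to the operators that still need attention. The sketch below is a small filter that reads oc get co --no-headers output on stdin. It assumes the usual column order (NAME, VERSION, AVAILABLE, PROGRESSING, DEGRADED, SINCE) and that every row has a VERSION value. The function name unhealthy_operators and the sample rows are illustrative only.

```shell
#!/usr/bin/env bash
# Hypothetical filter: print the name of every cluster operator that is not
# Available=True, Progressing=False, Degraded=False.
# Assumes columns: NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE

unhealthy_operators() {
  awk '$3 != "True" || $4 != "False" || $5 != "False" { print $1 }'
}

# Made-up sample rows for illustration; on the bastion, pipe real output:
#   oc get co --no-headers | unhealthy_operators
unhealthy_operators <<'SAMPLE'
authentication   4.20.15   True    False   False   3h
ingress          4.20.15   True    True    False   5m
monitoring       4.20.15   False   False   True    2m
SAMPLE
# prints:
#   ingress
#   monitoring
```

An empty result means every operator in the input reported the healthy combination.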

Check libvirt state on virt-01:

Shell
# Inspect the libvirt domain state on virt-01.
ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 \
  "virsh list --all"
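The virsh list --all table can be narrowed to the guests that are not running, which is usually the first thing to look for after a host reboot or a partial cleanup. This is a minimal sketch that parses the table from stdin, assuming the standard layout of a header row, a separator row, and Id, Name, State columns. The function name not_running_domains and the sample table are illustrative only.

```shell
#!/usr/bin/env bash
# Hypothetical filter: print the names of libvirt domains whose state is
# anything other than "running". Skips the header and separator rows.

not_running_domains() {
  awk 'NR > 2 && NF >= 3 && $3 != "running" { print $2 }'
}

# Made-up sample table for illustration; against the hypervisor, pipe the
# output of the ssh command that runs "virsh list --all" into the function.
not_running_domains <<'SAMPLE'
 Id   Name                             State
--------------------------------------------------
 1    ocp-master-01.ocp.workshop.lan   running
 -    ocp-worker-02.ocp.workshop.lan   shut off

SAMPLE
# prints:
#   ocp-worker-02.ocp.workshop.lan
```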

Check ODF storage state:

Shell
# Inspect the ODF status.
oc -n openshift-storage get storagecluster
oc -n openshift-storage get cephcluster
oc -n openshift-local-storage get localvolumediscovery
oc -n openshift-local-storage get localvolumeset

Check NetObserv and Loki:

Shell
# Inspect the Network Observability status.
oc -n netobserv get flowcollector
oc -n netobserv get lokistack
oc -n netobserv get pods

Check AAP:

Shell
# Inspect the AAP status and login entry.
oc -n aap get pods
oc -n aap get route
curl -sk https://aap.apps.ocp.workshop.lan/api/gateway/v1/ui_auth/ | jq .

Check Tekton and the Windows build lane:

Shell
# Inspect the Pipelines and Windows builder status.
oc -n openshift-pipelines get tektonconfig
oc -n windows-image-builder get pipeline
oc -n windows-image-builder get pipelinerun