Manual Process
Use this page to understand what the automation does step by step, teach the flow to someone else, or inspect the underlying sequence without treating the automation as a black box.
If you are building Calabi in the supported way, do not start here. Start with PREREQUISITES and then follow AUTOMATION FLOW.
Keep these pages nearby while you use this teaching reference:
- AUTOMATION FLOW for phase order and execution context
- AUTH MODEL for the current supported OpenShift and AAP auth boundary
- ORCHESTRATION PLUMBING for workstation-to-bastion handoff and tracked runner state
- RESOURCE MANAGEMENT for CPU pools, performance domains, and host-resize guidance
Do not read this as a byte-for-byte dump of every Ansible task. Read it as the teaching companion to the supported build and day-2 flow.
When bastion-native playbooks need to be rerun after local repository changes,
the staged repo on bastion is refreshed in place so generated/ output is not
thrown away between runs.
Important
The validated support-services order changed. The current golden path is:
build bastion-01, stage the project to bastion, optionally build
ad-01 with AD DS and AD CS, build idm-01, optionally configure IdM to AD
trust, join the bastion to IdM, then continue with mirror-registry,
OpenShift DNS, and cluster work. The legacy section numbering is retained
below so older deep links do not break.
The command examples use these neutral placeholders:
<operator-ssh-key>: the SSH private key used from the operator workstation<hypervisor-public-ip>: the reachable public IP ofvirt-01, preferably a persistent Elastic IP<project-root>: the local checkout of this project on the current execution host<rhel10-image-path>: the local RHEL 10.1 qcow2 image path onvirt-01<pull-secret-file>: the local Red Hat pull-secret file<operator-public-key>: the SSH public key injected into guest cloud-init<ec2-user-password-hash>: a SHA-512 password hash forec2-user<lab-default-password>: the default demonstration password used for guest cloud-init, IdM bootstrap, and related manual examples
Where each step runs
| Steps | Where | What happens |
|---|---|---|
| 1-13 | Operator workstation / virt-01 |
AWS stacks, hypervisor, bastion build, bastion staging |
| 13A-36 | bastion-01 |
optional AD, IdM, bastion join, mirror registry, DNS, cluster build, day-2, debugging |
Important
Pick a side and stay on it. Steps 1-13 run from the operator workstation
against virt-01. Steps 13A-36 run from the bastion. The project does not
account for switching execution context mid-stream. Once you cross the
bastion boundary at step 13A, stay on bastion.
Table Of Contents
Use this like a runbook, not a novel. Jump to the phase you actually need.
Outer Cloud And Host Bring-up
- 1. Provision The AWS IaaS Layer
- 2. Verify First-Boot Access To
virt-01 - 3. Install Deterministic
/dev/ebsHost Naming - 4. Prepare The Hypervisor
- 5. Remove The Default Libvirt Network
- 6. Create The Lab Switch And VLAN Interfaces
- 7. Configure Firewalld And Host Routing
- 8. Define The Libvirt Network Over OVS
- 9. Stage The Guest Base Image
Support Services
Validated support-services order:
- 12. Build The Bastion VM
- 13. Stage The Project To The Bastion
- 13A. Optionally Build AD DS And AD CS From Bastion
- 10. Build The IdM VM
- 11. Configure IdM In The Guest
- 13AA. Optionally Configure IdM To AD Trust
- 13B. Join The Bastion To IdM
- 14. Build The Mirror Registry VM
- 15. Mirror OpenShift And Operator Content
- 16. Populate OpenShift DNS In IdM
Legacy section order retained below:
- 10. Build The IdM VM
- 11. Configure IdM In The Guest
- 12. Build The Bastion VM
- 13. Stage The Project To The Bastion
- 13A. Optionally Build AD DS And AD CS From Bastion
- 13AA. Optionally Configure IdM To AD Trust
- 13B. Join The Bastion To IdM
- 14. Build The Mirror Registry VM
- 15. Mirror OpenShift And Operator Content
- 16. Populate OpenShift DNS In IdM
Cluster Bring-up
- 17. Download Installer Binaries
- 18. Render Install Artifacts
- 19. Generate The Agent ISO
- 20. Create The OpenShift VM Shells
- 21. Wait For Installer Convergence
- 22. Validate Post-install State
- 23. Detach Install Media And Normalize Boot
Day-2 And Follow-on Work
- 24. Configure Breakglass Auth, Keycloak OIDC, And Infra Roles
- 25. Install Kubernetes NMState
- 26. Deploy ODF Declaratively
- 27. Install OpenShift Virtualization
- 28. Install The Web Terminal
- 29. Install Network Observability And Loki
- 30. Install Ansible Automation Platform
- 31. Install OpenShift Pipelines
- 32. Launch A Windows EFI Build
- 33. Pivot OperatorHub To The Disconnected Catalog
- 34. Roll Out An IdM Ingress Certificate
- 35. Cleanup
- 36. Manual Debugging Examples
1. Provision The AWS IaaS Layer
Build the AWS substrate by hand: first the shared tenant layer, then the
virt-01 host layer inside it.
Note
Automation reference: cloudformation/deploy-stack.sh with
cloudformation/templates/tenant.yaml.j2 and
cloudformation/templates/virt-host.yaml.j2.
For a full fresh environment, create the tenant substrate first, then create
the host substrate inside it. For a later virt-01 rebuild inside an existing
tenant, only repeat the host stack.
# Create the tenant and host CloudFormation stacks and inspect their outputs.cd <project-root>cat <<'EOF' >cloudformation/parameters.tenant.json[ { "ParameterKey": "LabPrefix", "ParameterValue": "workshop" }, { "ParameterKey": "AvailabilityZone", "ParameterValue": "us-east-2a" }, { "ParameterKey": "VpcCidr", "ParameterValue": "10.0.0.0/16" }, { "ParameterKey": "PublicSubnetCidr", "ParameterValue": "10.0.0.0/20" }]EOF./cloudformation/deploy-stack.sh tenant virt-tenant cloudformation/parameters.tenant.jsonaws cloudformation describe-stacks \ --stack-name virt-tenant \ --query 'Stacks[0].Outputs'cat <<'EOF' >cloudformation/parameters.host.json[ { "ParameterKey": "LabPrefix", "ParameterValue": "workshop" }, { "ParameterKey": "AvailabilityZone", "ParameterValue": "us-east-2a" }, { "ParameterKey": "ExistingVpcId", "ParameterValue": "vpc-REPLACE_FROM_VIRT_TENANT_OUTPUTS" }, { "ParameterKey": "ExistingSubnetId", "ParameterValue": "subnet-REPLACE_FROM_VIRT_TENANT_OUTPUTS" }, { "ParameterKey": "PersistentPublicIpAllocationId", "ParameterValue": "eipalloc-REPLACE_FROM_VIRT_TENANT_OUTPUTS" }, { "ParameterKey": "VirtHostPrivateIp", "ParameterValue": "10.0.8.207" }, { "ParameterKey": "AdminIngressCidr", "ParameterValue": "0.0.0.0/0" }, { "ParameterKey": "VirtHostInstanceType", "ParameterValue": "m5.metal" }, { "ParameterKey": "RedHatRhelPrivateAmiId", "ParameterValue": "ami-REPLACE_WITH_RHEL_10_1_AMI" }, { "ParameterKey": "ImportedKeyPairName", "ParameterValue": "virt-lab-key" }, { "ParameterKey": "ImportedPublicKeyMaterial", "ParameterValue": "ssh-ed25519 AAAA_REPLACE_WITH_REAL_PUBLIC_KEY" }, { "ParameterKey": "Ec2UserPasswordHash", "ParameterValue": "<ec2-user-password-hash>" }, { "ParameterKey": "RootVolumeSizeGiB", "ParameterValue": "100" }, { "ParameterKey": "RootVolumeIops", "ParameterValue": "3000" }, { "ParameterKey": "RootVolumeThroughput", "ParameterValue": "125" }]EOF./cloudformation/deploy-stack.sh host virt-host cloudformation/parameters.host.jsonaws cloudformation describe-stacks \ --stack-name virt-host \ --query 'Stacks[0].Outputs'2. Verify First-Boot Access To virt-01
Verify the new virt-01 host is reachable, initialized correctly, and ready
for the remaining hypervisor work.
Note
Automation reference: first-boot cloud-init from the host CloudFormation
templates, followed by playbooks/bootstrap/site.yml.
Verify that cloud-init completed, the operator SSH key was installed for
ec2-user, and Cockpit was enabled for SOCKS-proxied browser access.
ssh -i <operator-ssh-key> ec2-user@<hypervisor-public-ip> 'hostnamectl; systemctl is-active cockpit.socket'# Example SOCKS proxy for Cockpit access without opening TCP/9090 in the security group.ssh -i <operator-ssh-key> -D 5555 ec2-user@<hypervisor-public-ip># Browser configuration:# SOCKS5 proxy: 127.0.0.1:5555# Proxy DNS through SOCKS: enabled# Cockpit URL: https://<hypervisor-public-ip>:9090/## Authenticate to Cockpit as:# user: ec2-user# password: the plaintext corresponding to <ec2-user-password-hash>Fresh AWS RHEL images can still leave ec2-user locked even when a password
hash is present. The orchestration now explicitly unlocks the account. The
manual equivalent is:
# Unlock ec2-user if the first-boot image left the account locked.ssh -i <operator-ssh-key> ec2-user@<hypervisor-public-ip> 'sudo usermod -U ec2-user'Ensure the hypervisor identity is correct.
# Set the hypervisor hostname and ensure the local hosts entry is present.ssh -i <operator-ssh-key> ec2-user@<hypervisor-public-ip> <<'EOF'sudo hostnamectl set-hostname virt-01.workshop.langrep -q '^127.0.1.1 virt-01.workshop.lan virt-01$' /etc/hosts || \ echo '127.0.1.1 virt-01.workshop.lan virt-01' | sudo tee -a /etc/hostsEOF3. Install Deterministic /dev/ebs Host Naming
Create deterministic /dev/ebs/* names on the hypervisor from the live AWS
volume attachments.
Note
Automation reference: playbooks/bootstrap/site.yml, role lab_host_base.
Derive the active guest-disk map from the current AWS attachments by
GuestDisk tag, then render the host naming layer from that live map. This
avoids stale EBS volume IDs after a rebuild.
# Create the /dev/ebs naming layer from the current AWS volume attachments.sudo install -d -m 0755 /dev/ebscat <<'EOF' | sudo tee /etc/tmpfiles.d/ebs-friendly.confd /dev/ebs 0755 root root -EOFsudo systemd-tmpfiles --create /etc/tmpfiles.d/ebs-friendly.confINSTANCE_ID="$(curl -fsS http://169.254.169.254/latest/meta-data/instance-id)"aws ec2 describe-volumes \ --filters Name=attachment.instance-id,Values=${INSTANCE_ID} \ Name=tag-key,Values=GuestDisk \ --query "Volumes[].{volume_id:VolumeId,guest_disk:Tags[?Key=='GuestDisk']|[0].Value}" \ --output json >/tmp/guest-volumes.jsonpython3 - <<'PY' | sudo tee /etc/udev/rules.d/99-ebs-friendly.rulesimport jsonfrom pathlib import Pathvols = json.loads(Path('/tmp/guest-volumes.json').read_text())print('# Managed from live AWS GuestDisk-tagged attachments.')for vol in sorted(vols, key=lambda v: v['guest_disk']): serial = vol['volume_id'].replace('-', '') guest = vol['guest_disk'] print( f'ACTION=="add|change", SUBSYSTEM=="block", ENV{{DEVTYPE}}=="disk", ' f'KERNEL=="nvme*n1", ENV{{ID_MODEL}}=="Amazon Elastic Block Store", ' f'ENV{{ID_SERIAL_SHORT}}=="{serial}", SYMLINK+="ebs/{guest}"' )PYsudo udevadm control --reload-rulessudo udevadm trigger --subsystem-match=block --action=changesudo udevadm settlels -1 /dev/ebs4. Prepare The Hypervisor
Prepare the hypervisor base OS, repositories, and core services for the lab.
Note
Automation reference: playbooks/bootstrap/site.yml, role lab_host_base.
Update redhat-release before the main system update. This ensures the
current Red Hat Post-Quantum Cryptography public keys are present before
DNF validates newer packages. See:
https://access.redhat.com/solutions/3449341
Install the required host packages, enable the Red Hat fast-datapath repo for OVS, and turn on the core host services.
# Log into the hypervisor, install the base packages, enable the required services, and reboot.ssh -i <operator-ssh-key> ec2-user@<hypervisor-public-ip>sudo -itimeout 30s subscription-manager repos \ --enable fast-datapath-for-rhel-10-x86_64-rpmsdnf -y install insights-clientinsights-client --registerdnf -y install \ firewalld \ qemu-kvm \ qemu-img \ libvirt \ virt-install \ virt-viewer \ virt-top \ guestfs-tools \ genisoimage \ openvswitch3.6 \ cockpit \ cockpit-files \ cockpit-machines \ cockpit-podman \ cockpit-session-recording \ cockpit-image-builder \ pcp \ pcp-system-tools \ tmux \ jqdnf -y update redhat-releasednf -y updatesystemctl enable firewalldsystemctl enable cockpit.socketsystemctl enable openvswitchsystemctl enable osbuild-composer.socketsystemctl enable pmcd.service pmlogger.service pmproxy.servicesystemctl enable virtqemud.socketsystemctl enable virtnetworkd.socketsystemctl enable virtstoraged.socketsystemctl enable virtlogd.socketrebootApply The Host Resource-Management Policy
Apply the host CPU-placement and systemd slice policy used by the lab.
Note
Automation reference: playbooks/bootstrap/site.yml, role
lab_host_resource_management.
The current settled design keeps manager-level systemd CPUAffinity and the
Gold/Silver/Bronze slice units, but it does not set kernel affinity boot args
or an irqbalance guest-domain ban by default.
# Install the host CPU-placement and systemd slice policy.cat <<'EOF' >/etc/systemd/system.conf.d/90-aws-metal-openshift-demo-host-resource-management.conf[Manager]DefaultCPUAccounting=yesCPUAffinity=0-5,24-29,48-53,72-77EOFcat <<'EOF' >/etc/systemd/system/machine-gold.slice[Unit]Description=Gold performance domain for prioritized VMs[Slice]CPUAccounting=yesCPUWeight=512EOFcat <<'EOF' >/etc/systemd/system/machine-silver.slice[Unit]Description=Silver performance domain for medium-priority VMs[Slice]CPUAccounting=yesCPUWeight=333EOFcat <<'EOF' >/etc/systemd/system/machine-bronze.slice[Unit]Description=Bronze performance domain for best-effort VMs[Slice]CPUAccounting=yesCPUWeight=167EOFsystemctl daemon-reloadsystemctl daemon-reexecValidate the current host-policy shape.
# Validate the current host CPU-placement policy.grep -E '^(DefaultCPUAccounting|CPUAffinity)=' \ /etc/systemd/system.conf.d/90-aws-metal-openshift-demo-host-resource-management.confsystemctl show machine-gold.slice machine-silver.slice machine-bronze.slice \ -p CPUAccounting -p CPUWeightgrep Cpus_allowed_list /proc/1/statuscat /proc/cmdlineExpected current state:
- PID 1 allowed on
0-5,24-29,48-53,72-77 machine-gold.slice,machine-silver.slice, andmachine-bronze.sliceinstalled with weights512,333, and167- no
systemd.cpu_affinity=orirqaffinity=kernel arguments
Apply The Host Memory-Oversubscription Policy
Apply the host memory-oversubscription policy used by the lab. This policy is independent from CPU placement and can be revisited later without redoing the rest of the host bootstrap.
Note
Automation reference: playbooks/bootstrap/site.yml, role
lab_host_memory_oversubscription.
The memory-overcommit policy is kept separate from CPU placement. It improves host RAM efficiency through three independent kernel mechanisms:
- zram compressed swap — an in-memory block device that stores anonymous pages in compressed form, giving the kernel a cheap place to park cold pages before direct reclaim gets expensive
- THP in
madvisemode — Transparent Huge Pages only when applications explicitly request them, avoiding background compaction stalls - KSM with conservative scan settings — Kernel Same-page Merging deduplicates identical memory pages across guests running the same OS image
Important
zram-size = 16G is not a 16 GiB reservation taken away from the host up
front. The device only consumes physical RAM as compressed pages are stored
in it. With zstd compression the typical effective ratio is 2:1 to 4:1,
so 16G of logical swap capacity costs roughly 4-8G of physical RAM when
fully utilized.
Note
This is most useful when the host is calm or moderately busy. It helps the kernel avoid harsher reclaim behavior and can smooth out bursty pressure, but it is not a substitute for enough real RAM at high contention.
Warning
Compression, deduplication, and reclaim are CPU work that runs in the host kernel, not inside the Gold/Silver/Bronze tier model. If you lean harder on memory overcommit, expect some host cycles to move from idle capacity into memory management before you see any change in guest throughput.
zram
The role creates a systemd oneshot service that manages the zram device
explicitly using zramctl. This avoids relying on systemd-zram-generator
and keeps all three memory subsystems in a single service unit.
# Load the zram kernel modulemodprobe zram num_devices=1# Configure the device with zstd compression and 16G capacityzramctl /dev/zram0 --algorithm zstd --size 16G# Format and activate as swap with high prioritymkswap -f /dev/zram0swapon --priority 100 --discard /dev/zram0The swap priority of 100 ensures zram is always preferred over any physical
swap device. The --discard flag enables TRIM so that freed pages are
immediately released back to the host.
An optional advanced override can also attach a dedicated writeback backing
device to zram before initialization. Keep that disabled by default and only
enable it when you have intentionally repurposed a block device for cold-page
spill behavior. The override can either point at an existing block device or
create a dedicated LV in calabi_lab_vg. Do not count that tier as planned
capacity.
This capability is not inherently on-prem-only. The role can target any
dedicated block device through backing_device. What is on-prem-specific in
this repo is the managed-LVM convenience path that assumes the local
calabi_lab_vg layout used by the on-prem deployment flow.
If you also enable the optional writeback policy timer, start with
mode: huge and a small per-run budget such as 256 MiB every 30m.
The huge mode writes back pages that did not compress well. For
broader cold-page relief, use huge_idle when the running kernel
supports age-based zram idle tracking.
Do not point writeback at:
- a mounted filesystem device
- an active swap device
- a shared guest-storage LV that serves another purpose
THP
# Set THP to madvise mode.echo madvise > /sys/kernel/mm/transparent_hugepage/enabledecho madvise > /sys/kernel/mm/transparent_hugepage/defragSetting both to madvise means the kernel only allocates and compacts huge
pages when the application explicitly requests them via madvise(MADV_HUGEPAGE).
This avoids the pathological case where always mode triggers aggressive
khugepaged compaction against memory that no process benefits from.
KSM
# Set conservative KSM scan parameters and enable deduplication.echo 1000 > /sys/kernel/mm/ksm/pages_to_scanecho 20 > /sys/kernel/mm/ksm/sleep_millisecsecho 1 > /sys/kernel/mm/ksm/runThe scan settings are deliberately conservative: 1000 pages per cycle with a 20 ms pause. The first full scan pass across all guest memory is slow (minutes to hours on a fully deployed lab), but once the internal dedup tree is built the steady-state CPU cost is near zero.
Wrap It In A Persistent Service
Rather than running these commands ad-hoc, the role installs a systemd oneshot service so the policy survives reboot.
# Install and start the persistent host memory-oversubscription service.cat <<'EOF' >/etc/systemd/system/calabi-host-memory-oversubscription.service[Unit]Description=Apply Calabi host memory oversubscription policyAfter=local-fs.target[Service]Type=oneshotRemainAfterExit=yesExecStartPre=-/usr/sbin/swapoff /dev/zram0ExecStartPre=-/usr/sbin/zramctl --reset /dev/zram0ExecStartPre=-/usr/sbin/modprobe -r zramExecStartPre=/usr/sbin/modprobe zram num_devices=1ExecStart=/usr/sbin/zramctl /dev/zram0 --algorithm zstd --size 16GExecStart=/usr/sbin/mkswap -f /dev/zram0ExecStart=/usr/sbin/swapon --priority 100 --discard /dev/zram0ExecStop=-/usr/sbin/swapoff /dev/zram0ExecStop=-/usr/sbin/zramctl --reset /dev/zram0ExecStop=-/usr/sbin/modprobe -r zramExecStart=/usr/bin/bash -lc '\if [ -e /sys/kernel/mm/transparent_hugepage/enabled ]; then echo madvise > /sys/kernel/mm/transparent_hugepage/enabled; fi; \if [ -e /sys/kernel/mm/transparent_hugepage/defrag ]; then echo madvise > /sys/kernel/mm/transparent_hugepage/defrag; fi; \if [ -e /sys/kernel/mm/ksm/pages_to_scan ]; then echo 1000 > /sys/kernel/mm/ksm/pages_to_scan; fi; \if [ -e /sys/kernel/mm/ksm/sleep_millisecs ]; then echo 20 > /sys/kernel/mm/ksm/sleep_millisecs; fi; \if [ -e /sys/kernel/mm/ksm/run ]; then echo 1 > /sys/kernel/mm/ksm/run; fi; \true'[Install]WantedBy=multi-user.targetEOFsystemctl daemon-reloadsystemctl enable --now calabi-host-memory-oversubscription.serviceValidate The Memory-Oversubscription State
# Service statesystemctl is-enabled calabi-host-memory-oversubscription.servicesystemctl is-active calabi-host-memory-oversubscription.service# zram device and swapzramctlswapon --show# THP modecat /sys/kernel/mm/transparent_hugepage/enabledcat /sys/kernel/mm/transparent_hugepage/defrag# KSM statecat /sys/kernel/mm/ksm/runcat /sys/kernel/mm/ksm/pages_to_scancat /sys/kernel/mm/ksm/sleep_millisecsExpected current state:
- service is enabled and active
zramctlshows/dev/zram0withzstdalgorithm and16Gdisk sizeswaponshows/dev/zram0at priority100- THP enabled shows
[madvise](bracketed = active selection) - THP defrag shows
[madvise] - KSM
runis1, pages and sleep match the configured values
Monitor KSM Effectiveness
KSM deduplication savings grow over time as the scanner finds identical pages across guests. Check convergence after the cluster is fully deployed and idle:
# Pages shared (unique pages backing merged regions)cat /sys/kernel/mm/ksm/pages_shared# Pages sharing (total pages being deduplicated, including copies)cat /sys/kernel/mm/ksm/pages_sharing# Pages not yet mergedcat /sys/kernel/mm/ksm/pages_unsharedIf pages_sharing significantly exceeds pages_shared, KSM is saving
meaningful memory. If pages_unshared remains high relative to pages_sharing
for extended periods, the scan rate may be too conservative.
The project includes a monitoring script for continuous observation:
# Run the host memory-overcommit dashboard.<project-root>/scripts/host-memory-overcommit-status.py \ --host <hypervisor-public-ip> --user ec2-userUse --watch 30 for a live dashboard or --delta 60 for a before-and-after
comparison across an interval.
The rationale is not to squeeze masters or infra. It is to improve host RAM efficiency while keeping Bronze workers as the primary elasticity lever.
5. Remove The Default Libvirt Network
Remove the default libvirt network so the lab only uses the explicit OVS design.
Note
Automation reference: playbooks/bootstrap/site.yml, role lab_libvirt.
Remove virbr0 so the lab only uses the explicit OVS/libvirt design.
# Remove the default libvirt network.virsh net-destroy default || truevirsh net-autostart default --disable || truevirsh net-undefine default || true6. Create The Lab Switch And VLAN Interfaces
Create the OVS bridge, routed VLAN interfaces, and the host-side networking needed by the nested lab.
Note
Automation reference: playbooks/bootstrap/site.yml, role lab_switch.
Create the OVS bridge, create the routed VLAN interfaces, and bring them up.
cat <<'EOF' >/usr/local/sbin/aws-metal-openshift-demo-net.sh#!/usr/bin/env bashset -euo pipefailovs-vsctl --may-exist add-br lab-switchfor vlan in 100 200 201 202 300 301 302; do ovs-vsctl --may-exist add-port lab-switch vlan${vlan} \ -- set interface vlan${vlan} type=internaldoneip link set lab-switch upfor vlan in 100 200 201 202 300 301 302; do ip link set vlan${vlan} updoneip address replace 172.16.0.1/24 dev vlan100ip address replace 172.16.10.1/24 dev vlan200ip address replace 172.16.11.1/24 dev vlan201ip address replace 172.16.12.1/24 dev vlan202ip address replace 172.16.20.1/24 dev vlan300ip address replace 172.16.21.1/24 dev vlan301EOFchmod 0755 /usr/local/sbin/aws-metal-openshift-demo-net.sh/usr/local/sbin/aws-metal-openshift-demo-net.shcat <<'EOF' >/etc/systemd/system/aws-metal-openshift-demo-net.service[Unit]Description=AWS metal OpenShift demo networkAfter=network-online.target openvswitch.serviceWants=network-online.target[Service]Type=oneshotExecStart=/usr/local/sbin/aws-metal-openshift-demo-net.shRemainAfterExit=yes[Install]WantedBy=multi-user.targetEOFsystemctl daemon-reloadsystemctl enable --now aws-metal-openshift-demo-net.service7. Configure Firewalld And Host Routing
Configure firewalld and host routing so the lab networks can reach each other and NAT out through the hypervisor uplink.
Note
Automation reference: playbooks/bootstrap/site.yml, role lab_firewall.
Create the lab firewall zone, enable forwarding, and NAT the lab out of the host uplink.
# Create the lab firewalld zone and enable forwarding and NAT.firewall-cmd --permanent --new-zone=lab || truefirewall-cmd --permanent --zone=external --add-interface=enp125s0for iface in vlan100 vlan200 vlan201 vlan202 vlan300 vlan301; do firewall-cmd --permanent --zone=lab --add-interface=${iface}donefirewall-cmd --permanent --zone=external --add-masqueradefirewall-cmd --reloadcat <<'EOF' >/etc/sysctl.d/99-aws-metal-openshift-demo.confnet.ipv4.ip_forward = 1EOFsysctl --system8. Define The Libvirt Network Over OVS
Define the libvirt network and portgroups that place guests onto the OVS bridge and the intended VLANs.
Note
Automation reference: playbooks/bootstrap/site.yml, role lab_libvirt.
Define the lab-switch libvirt network and the portgroups used by the VMs.
# Define and start the OVS-backed libvirt network.cat <<'EOF' >/etc/libvirt/lab-switch.xml<network> <name>lab-switch</name> <forward mode='bridge'/> <bridge name='lab-switch'/> <virtualport type='openvswitch'/> <portgroup name='mgmt-access' default='no'> <vlan> <tag id='100'/> </vlan> </portgroup> <portgroup name='ocp-trunk' default='no'> <vlan trunk='yes'> <tag id='200'/> <tag id='201'/> <tag id='202'/> </vlan> </portgroup> <portgroup name='data300-access' default='no'> <vlan> <tag id='300'/> </vlan> </portgroup> <portgroup name='data301-access' default='no'> <vlan> <tag id='301'/> </vlan> </portgroup> <portgroup name='data302-access' default='no'> <vlan> <tag id='302'/> </vlan> </portgroup></network>EOFvirsh net-define /etc/libvirt/lab-switch.xmlvirsh net-start lab-switchvirsh net-autostart lab-switch9. Stage The Guest Base Image
Stage the base RHEL guest image on the hypervisor. Every support VM is seeded from this image, so get this step right before building guests.
Note
Automation reference: the guest-image staging portion of
playbooks/bootstrap/site.yml.
Note
This image is the seed for every support VM. If the wrong image lands here, every guest built from this point forward inherits the problem.
Place the RHEL KVM guest image on the hypervisor so the support VMs can be seeded onto their raw EBS devices.
# Copy the base RHEL guest image to the hypervisor.mkdir -p /root/imagescp <rhel10-image-path> /root/images/rhel-10.1-x86_64-kvm.qcow2When a Red Hat direct-download URL is available, the same image can be pulled straight to the hypervisor instead of being copied from the operator workstation.
# Download the base RHEL guest image directly to the hypervisor.mkdir -p /root/imagescurl -L '<rhel10-kvm-direct-download-url>' \ -o /root/images/rhel-10.1-x86_64-kvm.qcow210. Build The IdM VM
Build the idm-01 VM shell on the hypervisor, seed its disk from the RHEL
image, and attach the cloud-init data needed for first boot.
Note
Automation reference: playbooks/bootstrap/idm.yml, role idm.
Seed the idm-01 disk from the RHEL image, create cloud-init, and build the
VM on the management VLAN.
ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1mkdir -p /var/lib/aws-metal-openshift-demo/idm-01qemu-img convert -f qcow2 -O raw \ <rhel10-image-path> \ /dev/ebs/idm-01SSH_PUBKEY="$(cat <operator-public-key>)"cat <<'EOF' >/var/lib/aws-metal-openshift-demo/idm-01/meta-datainstance-id: idm-01local-hostname: idm-01.workshop.lanEOFcat <<'EOF' >/var/lib/aws-metal-openshift-demo/idm-01/network-configversion: 2ethernets: eth0: dhcp4: false addresses: - 172.16.0.10/24 routes: - to: 0.0.0.0/0 via: 172.16.0.1 nameservers: search: [workshop.lan] addresses: [172.16.0.10, 8.8.8.8, 4.4.4.4]EOFcat <<EOF >/var/lib/aws-metal-openshift-demo/idm-01/user-data#cloud-configfqdn: idm-01.workshop.lanmanage_etc_hosts: trueusers: - default - name: cloud-user groups: [wheel] sudo: ALL=(ALL) NOPASSWD:ALL lock_passwd: false passwd: $6$rounds=4096$temporary$BfY4OskkM6jv8v6eK9aT8W7F7Y9Q8nN2m5vQzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz ssh_authorized_keys: - ${SSH_PUBKEY}runcmd: - [ sh, -c, 'echo nameserver 127.0.0.1 >/etc/resolv.conf' ]EOFcloud-localds \ --network-config=/var/lib/aws-metal-openshift-demo/idm-01/network-config \ /var/lib/aws-metal-openshift-demo/idm-01/seed.iso \ /var/lib/aws-metal-openshift-demo/idm-01/user-data \ /var/lib/aws-metal-openshift-demo/idm-01/meta-datavirt-install \ --name idm-01.workshop.lan \ --memory 8192 \ --vcpus 2 \ --cpu host-passthrough \ --machine q35 \ --import \ --os-variant rhel10.0 \ --graphics none \ --console pty,target_type=serial \ --network network=lab-switch,portgroup=mgmt-access,model=virtio \ --controller type=scsi,model=virtio-scsi \ --disk path=/dev/ebs/idm-01,device=disk,bus=scsi,rotation_rate=1 \ --disk path=/var/lib/aws-metal-openshift-demo/idm-01/seed.iso,device=cdrom,bus=sata \ --resource partition=/machine/silver \ --cputune shares=333,emulatorpin.cpuset=2-5,26-29,50-53,74-77,\vcpupin0.vcpu=0,vcpupin0.cpuset=6-23,30-47,54-71,78-95,\vcpupin1.vcpu=1,vcpupin1.cpuset=6-23,30-47,54-71,78-95 \ --noautoconsoleThat places idm-01 into the Silver performance domain:
- partition:
/machine/silver - shares:
333 - vCPU threads:
guest_domain - emulator thread:
host_emulator
The current automation also prefers a guest poweroff plus host-side
virsh start for the first post-update cycle so cloud-init media cleanup in
persistent XML becomes the next live device model immediately, instead of
surviving as an empty CD-ROM through an in-guest reboot.
11. Configure IdM In The Guest
Configure the IdM guest after first boot: update it, install IPA, enable the supporting services, and create the initial identity data the lab depends on.
Note
Automation reference: playbooks/bootstrap/idm.yml, role idm_guest.
Update redhat-release before the main system update. This ensures the
current Red Hat Post-Quantum Cryptography public keys are present before
DNF validates newer packages. See:
https://access.redhat.com/solutions/3449341
Update the guest, install IdM, enable Cockpit and session recording, and create the core users and groups used by OpenShift.
# Install and configure IdM in the guest.ssh -i /opt/openshift/secrets/hypervisor-admin.key cloud-user@172.16.0.10sudo -idnf -y update redhat-releasednf -y updaterebootdnf -y install \ ipa-server \ ipa-server-dns \ idm-pki-kra \ ipa-server-trust-ad \ cockpit \ cockpit-files \ cockpit-networkmanager \ cockpit-podman \ tlog \ sssd \ oddjob \ oddjob-mkhomedir \ insights-client \ authselect-compatinsights-client --registerfirewall-cmd --permanent --add-service=cockpitfirewall-cmd --permanent --add-service=dnsfirewall-cmd --permanent --add-service=freeipa-4firewall-cmd --permanent --add-service=freeipa-trustfirewall-cmd --reloadipa-server-install -U \ --realm=WORKSHOP.LAN \ --domain=workshop.lan \ --hostname=idm-01.workshop.lan \ --ds-password='<lab-default-password>' \ --admin-password='<lab-default-password>' \ --setup-dns \ --auto-forwarders \ --no-host-dnskinit admin <<< '<lab-default-password>'ipa-kra-install -U -p '<lab-default-password>'ipa dnsconfig-mod --forwarder=8.8.8.8 --forwarder=1.1.1.1ipa group-add access-openshift-admin || trueipa group-add virt-admin || trueipa group-add developer || trueipa group-add admins || trueipa pwpolicy-add admins \ --maxlife=3650 --minlife=0 --history=0 --minclasses=0 --minlength=8 \ --priority=40 2>/dev/null || \ipa pwpolicy-mod admins \ --maxlife=3650 --minlife=0 --history=0 --minclasses=0 --minlength=8 \ --priority=40ipa pwpolicy-add access-openshift-admin \ --maxlife=3650 --minlife=0 --history=0 --minclasses=0 --minlength=8 \ --priority=50 2>/dev/null || \ipa pwpolicy-mod access-openshift-admin \ --maxlife=3650 --minlife=0 --history=0 --minclasses=0 --minlength=8 \ --priority=50ipa pwpolicy-add virt-admin \ --maxlife=3650 --minlife=0 --history=0 --minclasses=0 --minlength=8 \ --priority=60 2>/dev/null || \ipa pwpolicy-mod virt-admin \ --maxlife=3650 --minlife=0 --history=0 --minclasses=0 --minlength=8 \ --priority=60ipa pwpolicy-add developer \ --maxlife=3650 --minlife=0 --history=0 --minclasses=0 --minlength=8 \ --priority=70 2>/dev/null || \ipa pwpolicy-mod developer \ --maxlife=3650 --minlife=0 --history=0 --minclasses=0 --minlength=8 \ --priority=70ipa user-add sysop --first=Sys --last=Op --shell=/bin/bash --password <<< '<lab-default-password>'ipa user-add virtadm --first=Virt --last=Admin --shell=/bin/bash --password <<< '<lab-default-password>'ipa user-add dev --first=Dev --last=User --shell=/bin/bash --password <<< '<lab-default-password>'ipa user-mod sysop --setattr=krbPasswordExpiration=20360313235039Zipa user-mod virtadm --setattr=krbPasswordExpiration=20360313235039Zipa user-mod dev --setattr=krbPasswordExpiration=20360313235039Zipa dnsrecord-add workshop.lan virt-01 --a-rec=172.16.0.1 2>/dev/null || \ipa dnsrecord-mod workshop.lan virt-01 --a-rec=172.16.0.1ipa dnszone-add 0.16.172.in-addr.arpa \ --name-server=idm-01.workshop.lan. \ --admin-email=hostmaster.workshop.lan \ --dynamic-update=FALSE 2>/dev/null || trueipa dnsrecord-add 0.16.172.in-addr.arpa 1 \ --ptr-rec=virt-01.workshop.lan. 2>/dev/null || \ipa dnsrecord-mod 0.16.172.in-addr.arpa 1 \ --ptr-rec=virt-01.workshop.lan.cat <<'EOF' >/etc/named/ipa-ext.conf/* User customization for BIND named */acl "trusted_network" { localhost; localnets; 172.16.0.0/24; 172.16.10.0/24; 172.16.11.0/24; 172.16.12.0/24; 172.16.20.0/24; 172.16.21.0/24; 172.16.22.0/24;};EOFcat <<'EOF' >/etc/named/ipa-options-ext.conf/* User customization for BIND named */listen-on-v6 { any; };dnssec-validation yes;allow-query { trusted_network; };allow-recursion { trusted_network; };allow-query-cache { trusted_network; };EOFnamed-checkconf /etc/named.confsystemctl restart namedsystemctl is-active namedipa group-add-member admins --users=sysopipa group-add-member access-openshift-admin --users=sysopipa group-add-member virt-admin --users=virtadmipa group-add-member developer --users=devipa sudorule-add admins-nopasswd-all \ --desc='Permit admins group members to run any command on any host without authentication'ipa sudorule-mod admins-nopasswd-all --hostcat=allipa sudorule-mod admins-nopasswd-all --cmdcat=allipa sudorule-mod admins-nopasswd-all --runasusercat=allipa sudorule-mod admins-nopasswd-all --runasgroupcat=allipa sudorule-add-user admins-nopasswd-all --groups=adminsipa sudorule-add-option admins-nopasswd-all --sudooption='!authenticate'systemctl enable --now cockpit.socketsystemctl enable --now oddjobd.serviceauthselect select sssd with-tlog with-mkhomedir with-sudo --forcesystemctl restart sssdsss_cache -Esssctl domain-status workshop.lan12. Build The Bastion VM
Build the bastion VM shell on VLAN 100. This becomes the execution host for all remaining in-lab work.
Note
Automation reference: playbooks/bootstrap/bastion.yml, role bastion.
Create the bastion on VLAN 100. This becomes the execution host for the rest of the lab.
Note
The validated flow builds the bastion before IdM. The initial bastion build does not enroll the guest into IdM. That enrollment now happens later in 13B. Join The Bastion To IdM.
mkdir -p /var/lib/aws-metal-openshift-demo/bastion-01qemu-img convert -f qcow2 -O raw \ <rhel10-image-path> \ /dev/ebs/bastion-01SSH_PUBKEY="$(cat <operator-public-key>)"cat <<'EOF' >/var/lib/aws-metal-openshift-demo/bastion-01/meta-datainstance-id: bastion-01local-hostname: bastion-01.workshop.lanEOFcat <<'EOF' >/var/lib/aws-metal-openshift-demo/bastion-01/network-configversion: 2ethernets: eth0: dhcp4: false addresses: - 172.16.0.30/24 routes: - to: 0.0.0.0/0 via: 172.16.0.1 nameservers: search: [workshop.lan] addresses: [172.16.0.10, 8.8.8.8, 4.4.4.4]EOFcat <<EOF >/var/lib/aws-metal-openshift-demo/bastion-01/user-data#cloud-configfqdn: bastion-01.workshop.lanmanage_etc_hosts: trueusers: - default - name: cloud-user groups: [wheel] sudo: ALL=(ALL) NOPASSWD:ALL lock_passwd: false passwd: $6$rounds=4096$temporary$BfY4OskkM6jv8v6eK9aT8W7F7Y9Q8nN2m5vQzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz ssh_authorized_keys: - ${SSH_PUBKEY}EOFcloud-localds \ --network-config=/var/lib/aws-metal-openshift-demo/bastion-01/network-config \ /var/lib/aws-metal-openshift-demo/bastion-01/seed.iso \ /var/lib/aws-metal-openshift-demo/bastion-01/user-data \ /var/lib/aws-metal-openshift-demo/bastion-01/meta-datavirt-install \ --name bastion-01.workshop.lan \ --memory 8192 \ --vcpus 4 \ --cpu host-passthrough \ --machine q35 \ --import \ --os-variant rhel10.0 \ --graphics none \ --console pty,target_type=serial \ --network network=lab-switch,portgroup=mgmt-access,model=virtio \ --controller type=scsi,model=virtio-scsi \ --disk path=/dev/ebs/bastion-01,device=disk,bus=scsi,rotation_rate=1 \ --disk path=/var/lib/aws-metal-openshift-demo/bastion-01/seed.iso,device=cdrom,bus=sata \ --resource partition=/machine/bronze \ --cputune shares=167,emulatorpin.cpuset=2-5,26-29,50-53,74-77,\vcpupin0.vcpu=0,vcpupin0.cpuset=6-23,30-47,54-71,78-95,\vcpupin1.vcpu=1,vcpupin1.cpuset=6-23,30-47,54-71,78-95,\vcpupin2.vcpu=2,vcpupin2.cpuset=6-23,30-47,54-71,78-95,\vcpupin3.vcpu=3,vcpupin3.cpuset=6-23,30-47,54-71,78-95 \ --noautoconsolessh -i <operator-ssh-key> cloud-user@172.16.0.30 \ "sudo dnf -y install \ git ansible-core ansible-lint jq podman wget tar make insights-client \ cockpit-files cockpit-packagekit cockpit-podman \ cockpit-session-recording cockpit-image-builder \ pcp pcp-system-tools oddjob oddjob-mkhomedir; \ sudo insights-client --register; \ sudo systemctl enable --now cockpit.socket; \ sudo systemctl enable --now osbuild-composer.socket; \ sudo systemctl enable --now pmcd pmlogger pmproxy; \ sudo systemctl enable --now oddjobd"That places bastion-01 into the Bronze performance domain.
The current automation uses the same support-guest lifecycle as IdM: when the first package update requires a restart, the guest powers off, the seed media is cleaned from persistent XML while the domain is down, and the hypervisor starts the domain again.
13. Stage The Project To The Bastion
Stage the project, secrets, and operator tools onto the bastion so the rest of the build can run from inside the lab.
Note
Automation reference: playbooks/bootstrap/bastion-stage.yml, role
bastion_stage.
Important
This is the last step that runs from the operator workstation. After staging
completes, all remaining work runs from bastion-01. Do not criss-cross
between workstation and bastion for subsequent steps.
Copy the repo, the pull secret, and the SSH keys to the bastion so the rest of
the work happens from inside the lab. The current orchestration also creates a
ready-to-use shell environment for cloud-user and current IdM admins
members, including $HOME/bin, $HOME/etc, tool symlinks, and a login-time
KUBECONFIG export when the cluster artifacts exist.
Note
Automation reference: playbooks/bootstrap/bastion-stage.yml, role
bastion_stage plus the managed name-resolution role that seeds the
bootstrap /etc/hosts fallback for bastion, IdM, mirror-registry, and the
cluster API endpoints.
The bastion staging phase also installs the execution-time Python requirements
needed for Windows orchestration, including pywinrm.
# Copy the project tree to bastion without overwriting generated output or secrets.rsync -a --delete \ --exclude generated \ --exclude secrets \ -e "ssh -i <operator-ssh-key>" \ <project-root>/ \ cloud-user@172.16.0.30:/tmp/aws-metal-openshift-demo/scp -i <operator-ssh-key> <pull-secret-file> cloud-user@172.16.0.30:/tmp/pull-secret.txtscp -i <operator-ssh-key> <operator-ssh-key> cloud-user@172.16.0.30:/tmp/hypervisor-admin.keyscp -i <operator-ssh-key> <operator-public-key> cloud-user@172.16.0.30:/tmp/hypervisor-admin.pubssh -i <operator-ssh-key> cloud-user@172.16.0.30 <<'EOF'sudo mkdir -p /opt/openshift /opt/openshift/secretssudo rsync -a --delete \ --exclude generated \ --exclude secrets \ /tmp/aws-metal-openshift-demo/ /opt/openshift/aws-metal-openshift-demo/sudo mv /tmp/pull-secret.txt /opt/openshift/secrets/pull-secret.txtsudo mv /tmp/hypervisor-admin.key /opt/openshift/secrets/hypervisor-admin.keysudo mv /tmp/hypervisor-admin.pub /opt/openshift/secrets/hypervisor-admin.pubsudo chmod 0600 /opt/openshift/secrets/hypervisor-admin.keysudo chown -R cloud-user:cloud-user /opt/openshiftsudo dnf -y install python3-pipsudo python3 -m pip install -r /opt/openshift/aws-metal-openshift-demo/requirements-pip.txtEOFAfter the cluster artifacts exist, the manual equivalent of the helper layout is:
# Create the bastion helper layout and local tool symlinks.ssh -i <operator-ssh-key> cloud-user@172.16.0.30 <<'EOF'set -euo pipefailmkdir -p "$HOME/bin" "$HOME/etc"ln -sfn /opt/openshift/aws-metal-openshift-demo/generated/tools/4.20.15/bin/oc "$HOME/bin/oc"ln -sfn /opt/openshift/aws-metal-openshift-demo/generated/tools/4.20.15/bin/kubectl "$HOME/bin/kubectl"ln -sfn /opt/openshift/aws-metal-openshift-demo/generated/tools/4.20.15/bin/openshift-install "$HOME/bin/openshift-install"ln -sfn /usr/local/bin/track-mirror-progress "$HOME/bin/track-mirror-progress"ln -sfn /usr/local/bin/track-mirror-progress-tmux "$HOME/bin/track-mirror-progress-tmux"ln -sfn /opt/openshift/aws-metal-openshift-demo/scripts/run_bastion_playbook.sh "$HOME/bin/run-bastion-playbook"cp /opt/openshift/aws-metal-openshift-demo/generated/ocp/auth/kubeconfig "$HOME/etc/kubeconfig"cp /opt/openshift/aws-metal-openshift-demo/generated/ocp/auth/kubeconfig "$HOME/etc/kubeconfig.local"chmod 0600 "$HOME/etc/kubeconfig" "$HOME/etc/kubeconfig.local"ln -sfn /opt/openshift/aws-metal-openshift-demo/generated/ocp/idm-ca.crt "$HOME/etc/idm-ca.crt"cat <<'PROFILE' | sudo tee /etc/profile.d/openshift-bastion.sh >/dev/nullcase ":$PATH:" in *":$HOME/bin:"*) ;; *) PATH="$HOME/bin:$PATH" ;;esaccase ":$PATH:" in *":/opt/openshift/aws-metal-openshift-demo/generated/tools/4.20.15/bin:"*) ;; *) PATH="/opt/openshift/aws-metal-openshift-demo/generated/tools/4.20.15/bin:$PATH" ;;esacexport KUBECONFIG_ADMIN="$HOME/etc/kubeconfig.local"if [ -z "${KUBECONFIG:-}" ]; then if [ -r "$HOME/etc/kubeconfig" ]; then export KUBECONFIG="$HOME/etc/kubeconfig" elif [ -r "$KUBECONFIG_ADMIN" ]; then export KUBECONFIG="$KUBECONFIG_ADMIN" fifiPROFILEEOFIterative Development With push_and_run.sh
This helper is not part of the manual standup path. It exists only to shorten developer edit-sync-rerun cycles after the bastion is already staged.
After the initial staging, use the lightweight scripts/push_and_run.sh helper
for iterative code changes. It rsyncs only the role/playbook/vars tree
(excluding inventory/, secrets/, and generated/ — all of which have
bastion-specific content that must not be overwritten) and runs the specified
playbook in a single blocking call.
For normal operator reruns of bastion-native playbooks, prefer
scripts/run_remote_bastion_playbook.sh. It refreshes the full staged tree
first and matches the documented golden path more closely than the lightweight
developer helper.
# From the operator workstationcd <project-root>./scripts/push_and_run.sh playbooks/day2/openshift-post-install-infra.yml./scripts/push_and_run.sh playbooks/day2/openshift-post-install-ldap-auth.yml -e some_override=trueThe script:
- syncs only code changes (not generated artifacts, secrets, or inventory)
- runs the playbook as
cloud-useron the bastion in a blocking foreground SSH session - shows only
PLAY RECAPon success - dumps the full output on failure
This reduces the edit → sync → run → check cycle to a single command.
Token optimization for AI-assisted development:
When using an AI assistant to develop against this codebase:
- Batch all code edits locally before syncing. Get the code right by reading it and reasoning about correctness, then sync and run once. Do not iterate through the bastion.
- Check PLAY RECAP first. Only read the full playbook log on failure.
push_and_run.shdoes this automatically. - Do not debug runtime infrastructure issues through the AI. SSH key loading failures, SELinux context mismatches, and service connectivity problems are faster and cheaper to debug in a terminal. Report the findings back and let the AI adjust the code.
- Use the right model tier for the task. Use a reasoning-heavy model for planning, doc rewrites, and multi-source investigations. Switch to a faster model for mechanical execution: running playbooks, committing, syncing.
13A. Optionally Build AD DS And AD CS From Bastion
This is the manual AD build path: prepare media on virt-01, create the VM,
complete the first boot, then configure AD DS and AD CS directly inside
Windows.
Note
Automation reference: playbooks/bootstrap/ad-server.yml.
Important
This phase is optional and default-disabled in automation:
lab_build_ad_server: false. Enable it only when you want the lab AD DS /
AD CS server.
Note
Before enabling this path, download Windows Server 2025 evaluation media from
the Microsoft Evaluation Center:
https://www.microsoft.com/en-us/evalcenter/download-windows-server-2025
The currently validated selection is English (United States) ->
ISO download -> 64-bit edition.
The manual path below follows the same validated design as the automated AD build:
- install Windows Server 2025 onto
/dev/ebs/ad-01 - use
virtio-win.isofor the required storage and network drivers - complete the remaining guest-tools and virtio-driver work after the OS is up
- promote the server to
corp.lan - install AD CS and Web Enrollment
- seed the demo users and groups
- export the root CA
Validated guest identity:
- VM/domain:
ad-01.corp.lan - Windows hostname:
AD-01 - IPv4:
172.16.0.40/24 - gateway:
172.16.0.1 - DNS:
172.16.0.10,8.8.8.8
Confirm The Required Media On virt-01
# Confirm the Windows media and target disk are present on virt-01.ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1ls -l /root/images/26100.32230.260111-0550.lt_release_svc_refresh_SERVER_EVAL_x64FRE_en-us.isols -l /root/images/virtio-win.isols -l /dev/ebs/ad-01exitIf virtio-win.iso is not already staged, the current documented source is:
virtio-windriver installation guidance:- direct ISO download referenced there:
Prepare The Unattended Install Media On virt-01
Create a small OEMDRV ISO that provides the answer file to Windows Setup.
This keeps the manual path aligned with the validated unattended install.
# Create the AD answer-file media.ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 <<'EOF'set -euo pipefailmkdir -p /var/lib/aws-metal-openshift-demo/ad-01/autounattendinstall -o qemu -g qemu -m 0644 \ /root/images/26100.32230.260111-0550.lt_release_svc_refresh_SERVER_EVAL_x64FRE_en-us.iso \ /var/lib/aws-metal-openshift-demo/ad-01/26100.32230.260111-0550.lt_release_svc_refresh_SERVER_EVAL_x64FRE_en-us.isoinstall -o qemu -g qemu -m 0644 \ /root/images/virtio-win.iso \ /var/lib/aws-metal-openshift-demo/ad-01/virtio-win.isocat >/var/lib/aws-metal-openshift-demo/ad-01/autounattend/autounattend.xml <<'XML'<?xml version="1.0" encoding="utf-8"?><unattend xmlns="urn:schemas-microsoft-com:unattend"> <settings pass="windowsPE"> <component name="Microsoft-Windows-International-Core-WinPE" processorArchitecture="amd64" publicKeyToken="31bf3856ad364e35" language="neutral" versionScope="nonSxS"> <SetupUILanguage><UILanguage>en-US</UILanguage></SetupUILanguage> <InputLocale>en-US</InputLocale> <SystemLocale>en-US</SystemLocale> <UILanguage>en-US</UILanguage> <UserLocale>en-US</UserLocale> </component> <component name="Microsoft-Windows-PnpCustomizationsWinPE" processorArchitecture="amd64" publicKeyToken="31bf3856ad364e35" language="neutral" versionScope="nonSxS" xmlns:wcm="http://schemas.microsoft.com/WMIConfig/2002/State"> <DriverPaths> <PathAndCredentials wcm:action="add" wcm:keyValue="1"> <Path>E:\vioscsi\2k25\amd64</Path> </PathAndCredentials> <PathAndCredentials wcm:action="add" wcm:keyValue="2"> <Path>E:\NetKVM\2k25\amd64</Path> </PathAndCredentials> </DriverPaths> </component> <component name="Microsoft-Windows-Setup" processorArchitecture="amd64" publicKeyToken="31bf3856ad364e35" language="neutral" versionScope="nonSxS" xmlns:wcm="http://schemas.microsoft.com/WMIConfig/2002/State"> <DiskConfiguration> <Disk wcm:action="add"> <DiskID>0</DiskID> <WillWipeDisk>true</WillWipeDisk> <CreatePartitions> <CreatePartition wcm:action="add"><Order>1</Order><Size>260</Size><Type>EFI</Type></CreatePartition> <CreatePartition wcm:action="add"><Order>2</Order><Size>128</Size><Type>MSR</Type></CreatePartition> <CreatePartition wcm:action="add"><Order>3</Order><Extend>true</Extend><Type>Primary</Type></CreatePartition> </CreatePartitions> <ModifyPartitions> <ModifyPartition wcm:action="add"><Order>1</Order><PartitionID>1</PartitionID><Format>FAT32</Format><Label>EFI</Label></ModifyPartition> <ModifyPartition wcm:action="add"><Order>2</Order><PartitionID>2</PartitionID></ModifyPartition> <ModifyPartition wcm:action="add"><Order>3</Order><PartitionID>3</PartitionID><Format>NTFS</Format><Label>Windows</Label><Letter>C</Letter></ModifyPartition> </ModifyPartitions> </Disk> </DiskConfiguration> <ImageInstall> <OSImage> <InstallTo><DiskID>0</DiskID><PartitionID>3</PartitionID></InstallTo> <InstallFrom> <MetaData wcm:action="add"><Key>/IMAGE/INDEX</Key><Value>2</Value></MetaData> </InstallFrom> </OSImage> </ImageInstall> <UserData> <AcceptEula>true</AcceptEula> <ProductKey><WillShowUI>Never</WillShowUI></ProductKey> </UserData> </component> </settings> <settings pass="specialize"> <component name="Microsoft-Windows-Shell-Setup" processorArchitecture="amd64" publicKeyToken="31bf3856ad364e35" language="neutral" versionScope="nonSxS"> <ComputerName>AD-01</ComputerName> <TimeZone>UTC</TimeZone> </component> <component name="Microsoft-Windows-TerminalServices-LocalSessionManager" processorArchitecture="amd64" publicKeyToken="31bf3856ad364e35" language="neutral" versionScope="nonSxS"> <fDenyTSConnections>false</fDenyTSConnections> </component> </settings> <settings pass="oobeSystem"> <component name="Microsoft-Windows-International-Core" processorArchitecture="amd64" publicKeyToken="31bf3856ad364e35" language="neutral" versionScope="nonSxS"> <InputLocale>en-US</InputLocale> <SystemLocale>en-US</SystemLocale> <UILanguage>en-US</UILanguage> <UserLocale>en-US</UserLocale> </component> <component name="Microsoft-Windows-Shell-Setup" processorArchitecture="amd64" publicKeyToken="31bf3856ad364e35" language="neutral" versionScope="nonSxS" xmlns:wcm="http://schemas.microsoft.com/WMIConfig/2002/State"> <OOBE> <HideEULAPage>true</HideEULAPage> <HideLocalAccountScreen>true</HideLocalAccountScreen> <HideOEMRegistrationScreen>true</HideOEMRegistrationScreen> <HideOnlineAccountScreens>true</HideOnlineAccountScreens> <HideWirelessSetupInOOBE>true</HideWirelessSetupInOOBE> <ProtectYourPC>3</ProtectYourPC> <SkipMachineOOBE>true</SkipMachineOOBE> <SkipUserOOBE>true</SkipUserOOBE> </OOBE> <UserAccounts> <AdministratorPassword> <Value>REPLACE_WITH_LAB_DEFAULT_PASSWORD</Value> <PlainText>true</PlainText> </AdministratorPassword> </UserAccounts> <AutoLogon> <Enabled>true</Enabled> <Username>Administrator</Username> <Password> <Value>REPLACE_WITH_LAB_DEFAULT_PASSWORD</Value> <PlainText>true</PlainText> </Password> <LogonCount>3</LogonCount> </AutoLogon> <FirstLogonCommands> <SynchronousCommand wcm:action="add"> <Order>1</Order> <CommandLine>powershell -NoProfile -Command "Set-ExecutionPolicy RemoteSigned -Force"</CommandLine> <Description>Set PowerShell execution policy</Description> </SynchronousCommand> <SynchronousCommand wcm:action="add"> <Order>2</Order> <CommandLine>powershell -NoProfile -Command "$adapter = Get-NetAdapter | Where-Object { $_.Status -ne 'Disabled' } | Sort-Object ifIndex | Select-Object -First 1; Get-NetIPAddress -InterfaceAlias $adapter.Name -AddressFamily IPv4 -ErrorAction SilentlyContinue | Remove-NetIPAddress -Confirm:$false -ErrorAction SilentlyContinue; New-NetIPAddress -InterfaceAlias $adapter.Name -IPAddress '172.16.0.40' -PrefixLength 24 -DefaultGateway '172.16.0.1'; Set-DnsClientServerAddress -InterfaceAlias $adapter.Name -ServerAddresses @('172.16.0.10','8.8.8.8')"</CommandLine> <Description>Configure static IPv4 networking</Description> </SynchronousCommand> <SynchronousCommand wcm:action="add"> <Order>3</Order> <CommandLine>powershell -NoProfile -Command "Set-DnsClientGlobalSetting -SuffixSearchList @('corp.lan')"</CommandLine> <Description>Set the DNS suffix search list</Description> </SynchronousCommand> <SynchronousCommand wcm:action="add"> <Order>4</Order> <CommandLine>powershell -NoProfile -Command "Enable-PSRemoting -Force -SkipNetworkProfileCheck"</CommandLine> <Description>Enable PowerShell remoting</Description> </SynchronousCommand> <SynchronousCommand wcm:action="add"> <Order>5</Order> <CommandLine>powershell -NoProfile -Command "Set-Item WSMan:\localhost\Service\Auth\Basic -Value $true"</CommandLine> <Description>Enable WinRM basic auth</Description> </SynchronousCommand> <SynchronousCommand wcm:action="add"> <Order>6</Order> <CommandLine>powershell -NoProfile -Command "Set-Item WSMan:\localhost\Service\AllowUnencrypted -Value $true"</CommandLine> <Description>Allow unencrypted WinRM for lab</Description> </SynchronousCommand> <SynchronousCommand wcm:action="add"> <Order>7</Order> <CommandLine>powershell -NoProfile -Command "New-NetFirewallRule -DisplayName 'WinRM HTTP' -Direction Inbound -Protocol TCP -LocalPort 5985 -Action Allow"</CommandLine> <Description>Open WinRM firewall port</Description> </SynchronousCommand> <SynchronousCommand wcm:action="add"> <Order>8</Order> <CommandLine>powershell -NoProfile -Command "Restart-Service WinRM"</CommandLine> <Description>Restart WinRM service</Description> </SynchronousCommand> <SynchronousCommand wcm:action="add"> <Order>9</Order> <CommandLine>reg add "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Winlogon" /v AutoAdminLogon /t REG_SZ /d 0 /f</CommandLine> <Description>Disable auto-logon</Description> </SynchronousCommand> </FirstLogonCommands> </component> </settings></unattend>XMLsed -i "s/REPLACE_WITH_LAB_DEFAULT_PASSWORD/<lab-default-password>/g" \ /var/lib/aws-metal-openshift-demo/ad-01/autounattend/autounattend.xmlxorriso -as mkisofs \ -o /var/lib/aws-metal-openshift-demo/ad-01/ad-01-autounattend.iso \ -V OEMDRV -J -R -graft-points \ autounattend.xml=/var/lib/aws-metal-openshift-demo/ad-01/autounattend/autounattend.xmlchown qemu:qemu /var/lib/aws-metal-openshift-demo/ad-01/ad-01-autounattend.isoEOFCreate The VM On virt-01
# Create the AD guest on virt-01 and attach the Windows installer media.ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 <<'EOF'set -euo pipefaildd if=/dev/zero of=/dev/ebs/ad-01 bs=1M count=1 conv=notruncvirt-install \ --name ad-01.corp.lan \ --osinfo win2k25 \ --boot uefi,loader_secure=no \ --machine q35 \ --memory 8192 \ --vcpus 4 \ --cpu host-passthrough \ --controller type=scsi,model=virtio-scsi \ --controller type=virtio-serial,index=0 \ --disk path=/dev/ebs/ad-01,format=raw,bus=scsi,cache=none,io=native,discard=unmap,rotation_rate=1,boot_order=2 \ --disk path=/var/lib/aws-metal-openshift-demo/ad-01/26100.32230.260111-0550.lt_release_svc_refresh_SERVER_EVAL_x64FRE_en-us.iso,device=cdrom,readonly=on,boot_order=1 \ --disk path=/var/lib/aws-metal-openshift-demo/ad-01/virtio-win.iso,device=cdrom,readonly=on \ --disk path=/var/lib/aws-metal-openshift-demo/ad-01/ad-01-autounattend.iso,device=cdrom,readonly=on \ --network network=lab-switch,portgroup=mgmt-access,model=virtio,mac=52:54:00:50:01:05 \ --channel unix,target_type=virtio,name=org.qemu.guest_agent.0 \ --rng builtin \ --graphics vnc,listen=0.0.0.0 \ --console pty,target_type=serial \ --autostart \ --resource partition=/machine/bronze \ --cputune shares=167,emulatorpin.cpuset=2-5,26-29,50-53,74-77,\vcpupin0.vcpu=0,vcpupin0.cpuset=6-23,30-47,54-71,78-95,\vcpupin1.vcpu=1,vcpupin1.cpuset=6-23,30-47,54-71,78-95,\vcpupin2.vcpu=2,vcpupin2.cpuset=6-23,30-47,54-71,78-95,\vcpupin3.vcpu=3,vcpupin3.cpuset=6-23,30-47,54-71,78-95 \ --noautoconsoleEOFUse the Cockpit console or virt-viewer to watch first boot. On the validated
media path, UEFI may present a DVD boot menu first. If it does:
- choose the first DVD entry
- at
Press any key to boot from CD or DVD, pressEnter
Windows should then:
- load the boot-critical
vioscsiandNetKVMdrivers fromvirtio-win.iso - partition
/dev/ebs/ad-01 - install Server 2025
- come up as
AD-01 - apply the static IP and WinRM settings from
autounattend.xml
Verify First WinRM Reachability
From the bastion:
# Verify that WinRM is reachable on the AD guest.curl -sI http://172.16.0.40:5985/wsman | head -n 1You should see an HTTP response from Microsoft-HTTPAPI/2.0, which confirms
that the WinRM listener is up.
Install Remaining Virtio Components And Guest Agent
Log in to the Windows console as Administrator, open an elevated PowerShell,
locate the virtio media drive letter, then install the guest-tools bundle, the
remaining drivers, and the QEMU guest agent:
$virtio = Get-PSDrive -PSProvider FileSystem | ForEach-Object { $_.Root.TrimEnd('\') } | Where-Object { Test-Path "$_\guest-agent\qemu-ga-x86_64.msi" } | Select-Object -First 1msiexec /i "$virtio\virtio-win-gt-x64.msi" /qn /norestartpnputil /add-driver "$virtio\Balloon\2k25\amd64\*.inf" /installpnputil /add-driver "$virtio\qemufwcfg\2k25\amd64\*.inf" /installpnputil /add-driver "$virtio\vioserial\2k25\amd64\*.inf" /installpnputil /add-driver "$virtio\viorng\2k25\amd64\*.inf" /installmsiexec /i "$virtio\guest-agent\qemu-ga-x86_64.msi" /qn /norestart$svc = Get-Service | Where-Object { $_.Name -in @('QEMU-GA', 'qemu-ga') -or $_.DisplayName -like '*QEMU*Guest*Agent*'} | Select-Object -First 1Set-Service -Name $svc.Name -StartupType AutomaticStart-Service -Name $svc.NameGet-CimInstance Win32_Service -Filter "Name='$($svc.Name)'"Promote The Server To corp.lan
Still in elevated PowerShell on AD-01:
Install-WindowsFeature AD-Domain-Services,DNS -IncludeManagementToolsInstall-ADDSForest ` -DomainName 'corp.lan' ` -DomainNetbiosName 'CORP' ` -SafeModeAdministratorPassword (ConvertTo-SecureString '<lab-default-password>' -AsPlainText -Force) ` -InstallDns ` -ForceLet the server reboot. After it comes back, verify domain-controller state:
Get-ADDomainControllerGet-ADDomainConfigure DNS Forwarding And AD CS
Add-DnsServerConditionalForwarderZone ` -Name 'workshop.lan' ` -MasterServers @('172.16.0.10') ` -ReplicationScope ForestInstall-WindowsFeature AD-Certificate,ADCS-Cert-Authority,ADCS-Web-Enrollment -IncludeManagementToolsInstall-AdcsCertificationAuthority ` -CAType EnterpriseRootCA ` -CryptoProviderName 'RSA#Microsoft Software Key Storage Provider' ` -KeyLength 4096 ` -HashAlgorithmName SHA256 ` -CACommonName 'CORP Enterprise Root CA' ` -ValidityPeriod Years ` -ValidityPeriodUnits 10 ` -ForceInstall-AdcsWebEnrollment -Forcecertutil -pingSeed The Demo Groups And Users
$base = 'CN=Users,DC=corp,DC=lan''OpenShift-Admins','OpenShift-Virt-Admins','Ansible-Automation-Admins','Developers' | ForEach-Object { if (-not (Get-ADGroup -Filter "Name -eq '$_'" -ErrorAction SilentlyContinue)) { New-ADGroup -Name $_ -GroupScope Global -GroupCategory Security -Path $base }}$users = @( @{ name='ad-directoryadmin'; first='Directory'; last='Admin'; groups=@('Domain Admins') }, @{ name='ad-ocpadmin'; first='OpenShift'; last='Admin'; groups=@('OpenShift-Admins') }, @{ name='ad-virtadmin'; first='Virtualization';last='Admin'; groups=@('OpenShift-Virt-Admins') }, @{ name='ad-aapadmin'; first='Automation'; last='Admin'; groups=@('Ansible-Automation-Admins') }, @{ name='ad-dev01'; first='Developer'; last='One'; groups=@('Developers') })$pw = ConvertTo-SecureString '<lab-default-password>' -AsPlainText -Forceforeach ($u in $users) { if (-not (Get-ADUser -Filter "SamAccountName -eq '$($u.name)'" -ErrorAction SilentlyContinue)) { New-ADUser ` -Name "$($u.first) $($u.last)" ` -GivenName $u.first ` -Surname $u.last ` -SamAccountName $u.name ` -UserPrincipalName "$($u.name)@corp.lan" ` -AccountPassword $pw ` -Enabled $true ` -PasswordNeverExpires $true ` -Path $base } foreach ($group in $u.groups) { $members = Get-ADGroupMember -Identity $group -ErrorAction SilentlyContinue | Select-Object -ExpandProperty SamAccountName if ($members -notcontains $u.name) { Add-ADGroupMember -Identity $group -Members $u.name } }}Open The Required Windows Firewall Groups
$displayGroups = @( 'DNS Service', 'Kerberos Key Distribution Center', 'Active Directory Domain Services', 'Certification Authority', 'Windows Remote Management')foreach ($group in $displayGroups) { $rules = Get-NetFirewallRule | Where-Object { $_.DisplayGroup -eq $group } if ($rules) { $rules | Enable-NetFirewallRule }}Export The AD Root CA And Validate Final State
certutil -ca.cert C:\Windows\Temp\corp-root-ca.cerGet-ADDomainGet-ADDomainControllerGet-Service CertSvcFrom virt-01, detach the installation media once the guest configuration is
complete:
# Eject the Windows installer media from the AD guest.ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 <<'EOF'for target in $(virsh domblklist ad-01.corp.lan --details | awk '$2 == "cdrom" { print $3 }'); do virsh change-media ad-01.corp.lan "$target" --eject --config --live --force || truedoneEOFValidated AD outputs:
- AD domain:
corp.lan - Enterprise Root CA:
CORP Enterprise Root CA - groups:
OpenShift-AdminsOpenShift-Virt-AdminsAnsible-Automation-AdminsDevelopers
- users:
ad-directoryadminad-ocpadminad-virtadminad-aapadminad-dev01
Quick verification from the bastion and hypervisor:
# Validate WinRM and the AD guest power state.curl -sI http://172.16.0.40:5985/wsman | head -n 1ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 'virsh domstate ad-01.corp.lan'13AA. Optionally Configure IdM To AD Trust
If the AD support VM is enabled and the lab should bridge selected AD groups into local IdM policy groups, complete the trust setup here before bastion enrollment.
Note
Automation reference: playbooks/bootstrap/idm-ad-trust.yml. The current
automated path configures the AD conditional forwarder for workshop.lan,
enables IdM AD trust support, creates the AD DNS forward zone in IPA,
establishes the trust, and nests the mapped IdM external groups into the
target local policy groups described in
AD / IDM POLICY MODEL.
Manual checkpoints for this phase:
- on
AD-01,workshop.lanmust resolve through the conditional forwarder toidm-01 - on
idm-01,corp.lanforward-zone lookups and AD LDAP SRV lookups must resolve throughad-01 ipa trust-show corp.lan --allmust succeed onidm-01- the mapped IdM external groups and nested local policy groups must match the intended bridge policy
Useful spot checks:
Resolve-DnsName -Name 'idm-01.workshop.lan' -Server 127.0.0.1 -Type A# Confirm the AD trust records are visible before proceeding.host ad-01.corp.lan 127.0.0.1host -t SRV _ldap._tcp.dc._msdcs.corp.lan 127.0.0.1ipa trust-show corp.lan --all13B. Join The Bastion To IdM
At this point bastion-01 already exists and idm-01 is already configured.
The remaining work is to trust the active IdM CA, enroll the bastion as an IPA
client, and enable the authselect features used by the rest of the lab. The
current join path no longer performs a general guest update or reboot; those
cycles stay in the earlier site-bootstrap.yml provisioning flow.
Note
Automation reference: playbooks/bootstrap/bastion-join.yml.
From bastion-01, make sure the active IdM CA is trusted locally before the
client install:
# Install the IdM CA on bastion before the client join.curl -o /tmp/idm-ca.crt http://idm-01.workshop.lan/ipa/config/ca.crtsudo install -o root -g root -m 0644 \ /tmp/idm-ca.crt /etc/ipa/ca.crtsudo install -o root -g root -m 0644 \ /tmp/idm-ca.crt /etc/pki/ca-trust/source/anchors/idm-rootCA.pemsudo update-ca-trust extractEnroll the bastion into IdM:
# Join bastion to IdM.sudo dnf -y install \ ipa-client \ oddjob \ oddjob-mkhomedir \ sssd \ authselect-compatsudo ipa-client-install -U \ --hostname=bastion-01.workshop.lan \ --domain=workshop.lan \ --realm=WORKSHOP.LAN \ --server=idm-01.workshop.lan \ --principal=admin \ --password='<lab-default-password>' \ --force-join \ --mkhomedir \ --no-ntpBecause bastion-01 uses a static address, do not rely on client-side dynamic
DNS updates for its authoritative IdM records. Reassert and validate the A/PTR
records explicitly:
# Create the bastion DNS records in IdM.kinit admin <<< '<lab-default-password>'ipa dnsrecord-add workshop.lan bastion-01 --a-rec=172.16.0.30 \ || ipa dnsrecord-mod workshop.lan bastion-01 --a-rec=172.16.0.30ipa dnsrecord-add 0.16.172.in-addr.arpa 30 \ --ptr-rec=bastion-01.workshop.lan. \ || ipa dnsrecord-mod 0.16.172.in-addr.arpa 30 \ --ptr-rec=bastion-01.workshop.lan.dig +short @172.16.0.10 bastion-01.workshop.lan Adig +short @172.16.0.10 -x 172.16.0.30Enable the same client-side login behavior the automation expects:
# Enable the expected SSSD login behavior on bastion.sudo systemctl enable --now oddjobd.servicesudo authselect select sssd with-mkhomedir with-sudo --forcesudo systemctl restart sssdsudo sss_cache -EValidate the bastion is now using IdM:
# Validate that bastion is using IdM for identity resolution.id admin@workshop.langetent passwd admin@workshop.lansudo sssctl domain-status workshop.lanAt this point the bastion is ready for IdM-backed operator access. The next support-service phase is the mirror registry build.
Bastion boundary — all remaining work runs from bastion-01
Warning
Everything below this line runs from the bastion. Do not switch back to the operator workstation for steps 13A-36 unless you are deliberately debugging the automation itself. Once you cross this boundary, stay on bastion.
14. Build The Mirror Registry VM
Build and configure the mirror-registry VM from the bastion, join it to IdM, and install the Quay-based disconnected registry stack.
Note
Automation reference: playbooks/lab/mirror-registry.yml, roles
mirror_registry and mirror_registry_guest.
From the bastion, after the validated support-services order
(bastion -> bastion-stage -> optional ad-server -> idm -> bastion-join),
create the mirror-registry VM on virt-01, then configure the guest, join it
to IdM, and install the Quay-based mirror registry stack.
ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1mkdir -p /var/lib/aws-metal-openshift-demo/mirror-registry/cloudinitqemu-img convert -f qcow2 -O raw \ <rhel10-image-path> \ /dev/ebs/mirror-registrySSH_PUBKEY="$(cat /opt/openshift/secrets/hypervisor-admin.pub)"cat <<'EOF' >/var/lib/aws-metal-openshift-demo/mirror-registry/cloudinit/meta-datainstance-id: mirror-registrylocal-hostname: mirror-registry.workshop.lanEOFcat <<'EOF' >/var/lib/aws-metal-openshift-demo/mirror-registry/cloudinit/network-configversion: 2ethernets: eth0: dhcp4: false addresses: - 172.16.0.20/24 routes: - to: 0.0.0.0/0 via: 172.16.0.1 nameservers: search: [workshop.lan] addresses: [172.16.0.10, 8.8.8.8, 4.4.4.4]EOFcat <<EOF >/var/lib/aws-metal-openshift-demo/mirror-registry/cloudinit/user-data#cloud-configfqdn: mirror-registry.workshop.lanmanage_etc_hosts: trueusers: - default - name: cloud-user groups: [wheel] sudo: ALL=(ALL) NOPASSWD:ALL lock_passwd: false ssh_authorized_keys: - ${SSH_PUBKEY}EOFxorriso -as mkisofs \ -o /var/lib/aws-metal-openshift-demo/mirror-registry/mirror-registry-cidata.iso \ -V CIDATA -J -R -graft-points \ user-data=/var/lib/aws-metal-openshift-demo/mirror-registry/cloudinit/user-data \ meta-data=/var/lib/aws-metal-openshift-demo/mirror-registry/cloudinit/meta-data \ network-config=/var/lib/aws-metal-openshift-demo/mirror-registry/cloudinit/network-configchown qemu:qemu /var/lib/aws-metal-openshift-demo/mirror-registry/mirror-registry-cidata.isovirt-install \ --name mirror-registry.workshop.lan \ --osinfo rhel10.0 \ --boot uefi \ --machine q35 \ --memory 8192 \ --vcpus 4 \ --cpu host-passthrough \ --controller type=scsi,model=virtio-scsi \ --disk path=/dev/ebs/mirror-registry,format=raw,bus=scsi,cache=none,io=native,discard=unmap,rotation_rate=1 \ --disk path=/var/lib/aws-metal-openshift-demo/mirror-registry/mirror-registry-cidata.iso,device=cdrom \ --network network=lab-switch,portgroup=mgmt-access,model=virtio,mac=52:54:00:00:00:20 \ --rng builtin \ --import \ --graphics none \ --resource partition=/machine/bronze \ --cputune shares=167,emulatorpin.cpuset=2-5,26-29,50-53,74-77,\vcpupin0.vcpu=0,vcpupin0.cpuset=6-23,30-47,54-71,78-95,\vcpupin1.vcpu=1,vcpupin1.cpuset=6-23,30-47,54-71,78-95,\vcpupin2.vcpu=2,vcpupin2.cpuset=6-23,30-47,54-71,78-95,\vcpupin3.vcpu=3,vcpupin3.cpuset=6-23,30-47,54-71,78-95,\iothreadpin0.iothread=1,iothreadpin0.cpuset=2-5,26-29,50-53,74-77 \ --iothreads iothreads=1 \ --console pty,target_type=serial \ --autostart \ --noautoconsoleConfigure the guest itself, install packages, join IdM, and install the mirror registry appliance.
Note
Update redhat-release before the main system update here as well. This
ensures the current Red Hat Post-Quantum Cryptography public keys are present
before DNF validates newer packages. See:
https://access.redhat.com/solutions/3449341
# Configure the mirror-registry guest and install the appliance prerequisites.ssh -i /opt/openshift/secrets/hypervisor-admin.key cloud-user@172.16.0.20sudo -idnf -y update redhat-releasednf -y updaterebootdnf -y install \ cockpit \ firewalld \ ipa-client \ certmonger \ git \ podman \ jq \ skopeo \ openssl \ tar \ gzipmkdir -p /etc/containers/containers.conf.dcat <<'EOF' >/etc/containers/containers.conf.d/99-mirror-registry-cgroupfs.conf[engine]cgroup_manager = "cgroupfs"EOFsystemctl enable --now firewalldsystemctl enable --now cockpit.socketfirewall-cmd --permanent --add-service=cockpitfirewall-cmd --permanent --add-service=sshfirewall-cmd --permanent --add-port=8443/tcpfirewall-cmd --reloadipa-client-install -U \ --hostname=mirror-registry.workshop.lan \ --domain=workshop.lan \ --realm=WORKSHOP.LAN \ --server=idm-01.workshop.lan \ --principal=admin \ --password='<lab-default-password>' \ --force-join \ --mkhomedirmkdir -p /usr/local/libexec/mirror-registry /opt/quay-install /root/bin /opt/openshiftcurl -L -o /tmp/mirror-registry-amd64.tar.gz \ https://mirror.openshift.com/pub/cgw/mirror-registry/latest/mirror-registry-amd64.tar.gztar -C /usr/local/libexec/mirror-registry -xzf /tmp/mirror-registry-amd64.tar.gzinstall -m 0755 /usr/local/libexec/mirror-registry/mirror-registry /usr/local/bin/mirror-registryInstall calabi-shell only after the guest is registered and can install
packages from the Red Hat CDN. The system-wide install requires git, so it
does not belong in pre-registration cloud-init.
git clone https://github.com/gprocunier/calabi-shell /opt/calabi-shell/opt/calabi-shell/install.sh --systemAs with the bastion, the mirror-registry guest has a static address. Reassert and validate its authoritative IdM records explicitly instead of relying on client-driven dynamic DNS updates:
# Create the mirror-registry DNS records in IdM.kinit admin <<< '<lab-default-password>'ipa dnsrecord-add workshop.lan mirror-registry --a-rec=172.16.0.20 \ || ipa dnsrecord-mod workshop.lan mirror-registry --a-rec=172.16.0.20ipa dnsrecord-add 0.16.172.in-addr.arpa 20 \ --ptr-rec=mirror-registry.workshop.lan. \ || ipa dnsrecord-mod 0.16.172.in-addr.arpa 20 \ --ptr-rec=mirror-registry.workshop.lan.dig +short @172.16.0.10 mirror-registry.workshop.lan Adig +short @172.16.0.10 -x 172.16.0.20Request an IdM-issued certificate for the registry and install the registry with that certificate.
# Request and install the mirror-registry certificate.kinit admin <<< '<lab-default-password>'ipa service-add HTTP/mirror-registry.workshop.lan || truemkdir -p /var/lib/mirror-registry/install-certsipa-getcert request -w \ -I mirror-registry-quay \ -f /etc/pki/tls/certs/mirror-registry.workshop.lan.crt \ -k /etc/pki/tls/private/mirror-registry.workshop.lan.key \ -K HTTP/mirror-registry.workshop.lan \ -D mirror-registry.workshop.lan \ -g 2048cat /etc/pki/tls/certs/mirror-registry.workshop.lan.crt \ /etc/ipa/ca.crt \ >/var/lib/mirror-registry/install-certs/ssl.certcp /etc/pki/tls/private/mirror-registry.workshop.lan.key \ /var/lib/mirror-registry/install-certs/ssl.keychmod 0644 /var/lib/mirror-registry/install-certs/ssl.*mirror-registry install \ --quayHostname mirror-registry.workshop.lan \ --quayRoot /opt/quay-install \ --initUser init \ --initPassword <lab-default-password> \ --sslCert /var/lib/mirror-registry/install-certs/ssl.cert \ --sslKey /var/lib/mirror-registry/install-certs/ssl.keyupdate-ca-trustmkdir -p /etc/containers/certs.d/mirror-registry.workshop.lan:8443cp /etc/ipa/ca.crt /etc/pki/ca-trust/source/anchors/workshop-idm-ca.crtcp /etc/ipa/ca.crt /etc/containers/certs.d/mirror-registry.workshop.lan:8443/ca.crtupdate-ca-trust extractpodman login mirror-registry.workshop.lan:8443 \ --username init \ --password <lab-default-password>After certificate issuance or renewal, the current automation also synchronizes
the staged cert and key into the live Quay config under
/opt/quay-install/quay-config/ and restarts the Quay services. That step is
what ensures the served certificate actually matches the freshly issued IdM
certificate chain.
Like the other support guests, the current automation uses a poweroff/offline cleanup/start cycle for the first update-triggered restart so the cloud-init CD-ROM cleanup is reflected in the next live QEMU process.
15. Mirror OpenShift And Operator Content
Mirror the OpenShift release payloads, operator catalogs, and extra images
into the local registry using the portable m2d plus d2m workflow.
Note
Automation reference: the mirroring portion of
playbooks/lab/mirror-registry.yml, primarily role
mirror_registry_guest.
The disconnected standard for this lab is now:
portable— runs bothm2d(pull to disk archive) andd2m(push into local Quay) in a single playbook invocation
The import mode (d2m only) remains available for re-importing an existing
archive without re-pulling from upstream.
Direct mirror-to-registry remains available for partial-disconnect validation, but it is no longer the primary student workflow.
Install the matching client tools on the mirror registry, copy the Red Hat pull
secret into place, render the ImageSetConfiguration, and run oc-mirror.
# Download the required tools and mirror the OpenShift content.dnf -y install jqmkdir -p /opt/openshift/oc-mirror /opt/openshift/oc-mirror-archive /root/.config/containerscurl -L -o /tmp/openshift-client-linux.tar.gz \ https://mirror.openshift.com/pub/openshift-v4/x86_64/clients/ocp/4.20.15/openshift-client-linux.tar.gztar -C /usr/local/bin -xzf /tmp/openshift-client-linux.tar.gz oc kubectlcurl -L -o /tmp/oc-mirror.rhel9.tar.gz \ https://mirror.openshift.com/pub/openshift-v4/x86_64/clients/ocp/4.20.15/oc-mirror.rhel9.tar.gztar -C /usr/local/bin -xzf /tmp/oc-mirror.rhel9.tar.gz oc-mirrorcp /opt/openshift/secrets/pull-secret.txt /opt/openshift/pull-secret.jsonjq -s '.[0] * .[1] | .auths = (.[0].auths + .[1].auths)' \ /opt/openshift/pull-secret.json \ /root/.config/containers/auth.json \ >/opt/openshift/pull-secret-merged.jsoncat <<'EOF' >/opt/openshift/imageset-config.yamlapiVersion: mirror.openshift.io/v2alpha1kind: ImageSetConfigurationmirror: platform: channels: - name: stable-4.20 minVersion: 4.20.15 maxVersion: 4.20.15 architectures: - amd64 operators: - catalog: registry.redhat.io/redhat/redhat-operator-index:v4.20 packages: - name: kubevirt-hyperconverged channels: [{name: stable}] - name: local-storage-operator channels: [{name: stable}] - name: kubernetes-nmstate-operator channels: [{name: stable}] - name: loki-operator channels: [{name: stable-6.2}] - name: netobserv-operator channels: [{name: stable}] - name: openshift-pipelines-operator-rh channels: [{name: pipelines-1.20}] - name: ansible-automation-platform-operator channels: [{name: stable-2.6}] - name: web-terminal channels: [{name: fast}] - name: devworkspace-operator channels: [{name: fast}] - name: node-healthcheck-operator channels: [{name: alpha}] - name: fence-agents-remediation channels: [{name: alpha}] - name: odf-operator channels: [{name: stable-4.20}] - name: ocs-operator channels: [{name: stable-4.20}] - name: mcg-operator channels: [{name: stable-4.20}] - name: odf-csi-addons-operator channels: [{name: stable-4.20}] - name: rook-ceph-operator channels: [{name: stable-4.20}] - name: cephcsi-operator channels: [{name: stable-4.20}] - name: metallb-operator channels: [{name: stable}]EOFoc-mirror --v2 \ --config /opt/openshift/imageset-config.yaml \ --authfile /opt/openshift/pull-secret-merged.json \ --parallel-images 16 \ --parallel-layers 12 \ file:///opt/openshift/oc-mirror-archiveImport the resulting archive into the local registry. In the current
automation this runs as a single portable workflow (m2d then d2m in one
invocation). The manual equivalent is two separate commands — pull to disk,
then push into Quay:
# Import the mirrored archive into the local registry.oc-mirror --v2 \ --config /opt/openshift/imageset-config.yaml \ --authfile /root/.config/containers/auth.json \ --parallel-images 16 \ --parallel-layers 12 \ --from file:///opt/openshift/oc-mirror-archive \ docker://mirror-registry.workshop.lan:8443/openshiftThe automation exposes those values as
mirror_registry_oc_mirror_parallel_images and
mirror_registry_oc_mirror_parallel_layers.
Track the workflow with the bastion helper that the orchestration now installs.
# Open the mirror progress helpers./usr/local/bin/track-mirror-progress/usr/local/bin/track-mirror-progress-tmuxThe tmux variant opens dedicated panes for:
- summary
- runner state
- storage/import state
- registry container state
- live bastion log tail
The same information can also be gathered manually without the helper.
From bastion-01, inspect the runner state and latest Ansible task.
# Inspect the bastion-side mirror job state.tail -f /var/tmp/bastion-playbooks/mirror-registry.logcat /var/tmp/bastion-playbooks/mirror-registry.pidcat /var/tmp/bastion-playbooks/mirror-registry.rcFrom mirror-registry.workshop.lan, inspect live oc-mirror activity, archive
growth, and guest disk usage.
# Inspect live mirror activity and archive growth on the registry host.pgrep -af oc-mirrordf -h /du -sh /opt/openshift/oc-mirror-archivedu -sh /opt/openshift/oc-mirrorsudo du -sh /var/lib/containers/storage/volumes/quay-storage/_datasudo du -sh /var/lib/containers/storage/volumes/sqlite-storage/_datasudo podman pstail -f /var/log/oc-mirror-m2d.logtail -f /var/log/oc-mirror-d2m.logTip
The mirroring phase is the longest single step in the build (hours, not
minutes). Use track-mirror-progress-tmux on the bastion to monitor it. If
the guest runs out of disk mid-mirror, the archive is corrupted and you start
over.
If an oc-mirror dry run fails because port 55000 is already bound, look
for a stale oc-mirror process on the mirror-registry guest before rerunning.
That is usually a rerun artifact, not a content-model problem.
Approximate sizing guidance:
m2dsafe target:archive_size * 1.5 + 20 GiB- same-host
d2msafe target:archive_size * 2.5 + 20 GiB - recommended same-host disk for both phases with margin:
archive_size * 3 + 20 GiB
Observed on the live 4.20.15 run with the current operator set:
m2darchive size: about95 GiBm2dsafe target: about162 GiB- same-host
d2msafe target: about256 GiB - recommended same-host disk for both phases with margin: about
303 GiB - practical lab decision: provision
400 GiBformirror-registry - observed imported Quay content footprint after
d2m: about82 GiB
16. Populate OpenShift DNS In IdM
Populate the OpenShift forward and reverse DNS zones in IdM so the cluster and its routes resolve correctly before install.
Note
Automation reference: playbooks/lab/openshift-dns.yml, role
idm_openshift_dns.
Create the forward and reverse DNS zones and records in IdM for the cluster, nodes, and VIPs.
# Create the OpenShift forward and reverse DNS zones and VIP records.ssh -i /opt/openshift/secrets/hypervisor-admin.key cloud-user@172.16.0.10sudo -ikinit admin <<< '<lab-default-password>'ipa dnszone-add ocp.workshop.lan \ --name-server=idm-01.workshop.lan. \ --admin-email=hostmaster.ocp.workshop.lan \ --dynamic-update=FALSE || trueipa dnszone-add 10.16.172.in-addr.arpa \ --name-server=idm-01.workshop.lan. \ --admin-email=hostmaster.ocp.workshop.lan \ --dynamic-update=FALSE || trueipa dnszone-add 11.16.172.in-addr.arpa \ --name-server=idm-01.workshop.lan. \ --admin-email=hostmaster.ocp.workshop.lan \ --dynamic-update=FALSE || trueipa dnsrecord-add ocp.workshop.lan api --a-rec=172.16.10.5 || trueipa dnsrecord-add ocp.workshop.lan api-int --a-rec=172.16.10.5 || trueipa dnsrecord-add ocp.workshop.lan '*.apps' --a-rec=172.16.10.7 || trueipa dnsrecord-add ocp.workshop.lan ingress --a-rec=172.16.10.7 || trueCreate the node A and PTR records.
# Create the OpenShift node A and PTR records.for entry in \ "ocp-master-01 11 11" \ "ocp-master-02 12 12" \ "ocp-master-03 13 13" \ "ocp-infra-01 21 21" \ "ocp-infra-02 22 22" \ "ocp-infra-03 23 23" \ "ocp-worker-01 31 31" \ "ocp-worker-02 32 32" \ "ocp-worker-03 33 33"; do set -- $entry name="$1" machine_octet="$2" storage_octet="$3" ipa dnsrecord-add ocp.workshop.lan "${name}" \ --a-rec="172.16.10.${machine_octet}" || true ipa dnsrecord-add 10.16.172.in-addr.arpa "${machine_octet}" \ --ptr-rec="${name}.ocp.workshop.lan." || true ipa dnsrecord-add ocp.workshop.lan "${name}-storage" \ --a-rec="172.16.11.${storage_octet}" || true ipa dnsrecord-add 11.16.172.in-addr.arpa "${storage_octet}" \ --ptr-rec="${name}-storage.ocp.workshop.lan." || truedone17. Download Installer Binaries
Download the exact matching OpenShift installer and client binaries onto the bastion.
Note
Automation reference: playbooks/cluster/openshift-installer-binaries.yml,
role openshift_installer_binaries.
Download the exact matching OpenShift installer and client tools onto the bastion.
# Download the exact matching OpenShift installer and client tools onto the bastion.mkdir -p /opt/openshift/generated/tools/4.20.15/downloadsmkdir -p /opt/openshift/generated/tools/4.20.15/bindnf -y install nmstatecurl -L -o /opt/openshift/generated/tools/4.20.15/downloads/openshift-install-linux.tar.gz \ https://mirror.openshift.com/pub/openshift-v4/x86_64/clients/ocp/4.20.15/openshift-install-linux.tar.gzcurl -L -o /opt/openshift/generated/tools/4.20.15/downloads/openshift-client-linux.tar.gz \ https://mirror.openshift.com/pub/openshift-v4/x86_64/clients/ocp/4.20.15/openshift-client-linux.tar.gztar -C /opt/openshift/generated/tools/4.20.15/bin -xzf \ /opt/openshift/generated/tools/4.20.15/downloads/openshift-install-linux.tar.gztar -C /opt/openshift/generated/tools/4.20.15/bin -xzf \ /opt/openshift/generated/tools/4.20.15/downloads/openshift-client-linux.tar.gzchmod 0755 /opt/openshift/generated/tools/4.20.15/bin/openshift-installchmod 0755 /opt/openshift/generated/tools/4.20.15/bin/occhmod 0755 /opt/openshift/generated/tools/4.20.15/bin/kubectl18. Render Install Artifacts
Render the OpenShift install-config, manifests, and cluster artifacts on the bastion.
Note
Automation reference: playbooks/cluster/openshift-install-artifacts.yml,
role openshift_install_artifacts.
Write the install-config.yaml, agent-config.yaml, and the IdM CA file that
are used by the agent installer.
# Write the OpenShift install config, agent config, and IdM CA bundle.mkdir -p /opt/openshift/generated/ocpcurl -fsSL http://172.16.0.10/ipa/config/ca.crt >/opt/openshift/generated/ocp/idm-ca.crtPULL_SECRET_JSON="$(jq -c . /opt/openshift/secrets/pull-secret.txt)"SSH_PUBKEY="$(cat /opt/openshift/secrets/hypervisor-admin.pub)"cat <<EOF >/opt/openshift/generated/ocp/install-config.yamlapiVersion: v1baseDomain: workshop.lanmetadata: name: ocpplatform: none: {}controlPlane: name: master replicas: 3 architecture: amd64compute: - name: worker replicas: 6 architecture: amd64networking: networkType: OVNKubernetes machineNetwork: - cidr: 172.16.10.0/24 clusterNetwork: - cidr: 10.128.0.0/14 hostPrefix: 23 serviceNetwork: - 172.30.0.0/16pullSecret: 'REPLACE_FROM_PULL_SECRET_FILE'sshKey: '${SSH_PUBKEY}'additionalTrustBundle: |EOFcat /opt/openshift/generated/ocp/idm-ca.crt >>/opt/openshift/generated/ocp/install-config.yamlpython3 - <<'PY'from pathlib import Pathpath = Path("/opt/openshift/generated/ocp/install-config.yaml")text = path.read_text()pull = Path("/opt/openshift/secrets/pull-secret.txt").read_text().strip().replace("'", "''")path.write_text(text.replace("REPLACE_FROM_PULL_SECRET_FILE", pull))PYWrite the agent config with MAC-based NIC identification and explicit root disk
selection. The current automation uses the libvirt root-disk serial for each
node and renders that into rootDeviceHints.serialNumber.
cat <<'EOF' >/opt/openshift/generated/ocp/agent-config.yamlapiVersion: v1alpha1kind: AgentConfigrendezvousIP: 172.16.10.11hosts: - hostname: ocp-master-01.ocp.workshop.lan role: master interfaces: - name: nic0 macAddress: "52:54:00:20:00:10" rootDeviceHints: serialNumber: "ocpmaster01root" networkConfig: interfaces: - name: nic0 type: ethernet state: up identifier: mac-address mac-address: "52:54:00:20:00:10" - name: nic0.200 type: vlan state: up vlan: base-iface: nic0 id: 200 ipv4: enabled: true address: - ip: 172.16.10.11 prefix-length: 24 - name: nic0.201 type: vlan state: up vlan: base-iface: nic0 id: 201 ipv4: enabled: true address: - ip: 172.16.11.11 prefix-length: 24 dns-resolver: config: server: - 172.16.0.10 routes: config: - destination: 0.0.0.0/0 next-hop-address: 172.16.10.1 next-hop-interface: nic0.200 # Repeat the same pattern for the remaining 8 nodes.EOF19. Generate The Agent ISO
Generate the agent-based installer ISO used to boot the cluster VMs.
Note
Automation reference: playbooks/cluster/openshift-agent-media.yml, role
openshift_agent_media.
Generate the agent media on the bastion, then copy the ISO to virt-01 and
verify its checksum before using it.
# Generate the agent ISO and copy it to virt-01./opt/openshift/generated/tools/4.20.15/bin/openshift-install agent create image \ --dir /opt/openshift/generated/ocpsha256sum /opt/openshift/generated/ocp/agent.x86_64.isossh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 \ "mkdir -p /var/lib/libvirt/images"scp -i /opt/openshift/secrets/hypervisor-admin.key \ /opt/openshift/generated/ocp/agent.x86_64.iso \ root@172.16.0.1:/var/lib/libvirt/images/agent.x86_64.iso.tmpssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 \ "install -m 0644 /var/lib/libvirt/images/agent.x86_64.iso.tmp \ /var/lib/libvirt/images/agent.x86_64.iso && \ sha256sum /var/lib/libvirt/images/agent.x86_64.iso"Create the generated attachment plan that says every node should boot from the agent ISO.
# Render the generated ISO attachment plan.cat <<'EOF' >/opt/openshift/generated/ocp/openshift_cluster_attachment_plan.ymlopenshift_cluster_node_attachment_plan: ocp-master-01: access: attach_agent_boot_media: true agent_boot_media_path: /var/lib/libvirt/images/agent.x86_64.iso ocp-master-02: access: attach_agent_boot_media: true agent_boot_media_path: /var/lib/libvirt/images/agent.x86_64.iso ocp-master-03: access: attach_agent_boot_media: true agent_boot_media_path: /var/lib/libvirt/images/agent.x86_64.iso ocp-infra-01: access: attach_agent_boot_media: true agent_boot_media_path: /var/lib/libvirt/images/agent.x86_64.iso ocp-infra-02: access: attach_agent_boot_media: true agent_boot_media_path: /var/lib/libvirt/images/agent.x86_64.iso ocp-infra-03: access: attach_agent_boot_media: true agent_boot_media_path: /var/lib/libvirt/images/agent.x86_64.iso ocp-worker-01: access: attach_agent_boot_media: true agent_boot_media_path: /var/lib/libvirt/images/agent.x86_64.iso ocp-worker-02: access: attach_agent_boot_media: true agent_boot_media_path: /var/lib/libvirt/images/agent.x86_64.iso ocp-worker-03: access: attach_agent_boot_media: true agent_boot_media_path: /var/lib/libvirt/images/agent.x86_64.isoEOF20. Create The OpenShift VM Shells
Create the nine OpenShift VM shells, attach the agent ISO, and boot them into the agent installer.
Note
Automation reference: playbooks/cluster/openshift-cluster.yml, role
openshift_cluster.
Create the 9 OpenShift VM shells on virt-01, attach the ISO, and set them to
boot CD-ROM first.
Current tier intent:
- Gold:
- masters
- Silver:
- infra
- Bronze:
- workers
Current sizing:
- masters:
3 x 8 vCPU - infra:
3 x 16 vCPU - workers:
3 x 8 vCPU
Current CPU pools:
guest_domain:6-23,30-47,54-71,78-95host_emulator:2-5,26-29,50-53,74-77
ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1virt-install \ --name ocp-master-01.ocp.workshop.lan \ --osinfo rhel9.4 \ --boot uefi \ --machine q35 \ --memory 24576 \ --vcpus 8 \ --cpu host-passthrough \ --controller type=scsi,model=virtio-scsi \ --disk path=/dev/ebs/ocp-master-01,format=raw,bus=scsi,cache=none,io=native,discard=unmap,rotation_rate=1 \ --disk path=/var/lib/libvirt/images/agent.x86_64.iso,device=cdrom,bus=scsi \ --network network=lab-switch,portgroup=ocp-trunk,model=virtio,mac=52:54:00:20:00:10 \ --graphics vnc,listen=127.0.0.1 \ --import \ --resource partition=/machine/gold \ --cputune shares=512,emulatorpin.cpuset=2-5,26-29,50-53,74-77,\vcpupin0.vcpu=0,vcpupin0.cpuset=6-23,30-47,54-71,78-95,\vcpupin1.vcpu=1,vcpupin1.cpuset=6-23,30-47,54-71,78-95,\vcpupin2.vcpu=2,vcpupin2.cpuset=6-23,30-47,54-71,78-95,\vcpupin3.vcpu=3,vcpupin3.cpuset=6-23,30-47,54-71,78-95,\vcpupin4.vcpu=4,vcpupin4.cpuset=6-23,30-47,54-71,78-95,\vcpupin5.vcpu=5,vcpupin5.cpuset=6-23,30-47,54-71,78-95,\vcpupin6.vcpu=6,vcpupin6.cpuset=6-23,30-47,54-71,78-95,\vcpupin7.vcpu=7,vcpupin7.cpuset=6-23,30-47,54-71,78-95 \ --autostart \ --noautoconsolevirt-xml ocp-master-01.ocp.workshop.lan --edit --boot cdrom,hd# Repeat the same pattern for:# - ocp-master-02, ocp-master-03# - partition=/machine/gold# - shares=512# - ocp-infra-01..03# - partition=/machine/silver# - shares=333# - attach /dev/ebs/ocp-infra-0X-data as a second disk# - add --iothreads iothreads=1 and iothreadpin0.cpuset=2-5,26-29,50-53,74-77# - ocp-worker-01..03# - partition=/machine/bronze# - shares=16721. Wait For Installer Convergence
Wait for bootstrap and install completion from the bastion.
Note
Automation reference: playbooks/cluster/openshift-install-wait.yml.
After the VM shells are created and booted from agent.x86_64.iso, run the
installer wait phase from the bastion. This is the step that turns “VMs are
running” into “the cluster finished bootstrap and install.”
# Wait for the OpenShift installer to converge./opt/openshift/generated/tools/4.20.15/bin/openshift-install \ --dir /opt/openshift/generated/ocp \ wait-for bootstrap-complete --log-level=debug/opt/openshift/generated/tools/4.20.15/bin/openshift-install \ --dir /opt/openshift/generated/ocp \ wait-for install-complete --log-level=debug22. Validate Post-install State
Validate that the newly installed cluster is healthy enough for day-2 work.
Note
Automation reference: playbooks/day2/openshift-post-install-validate.yml,
role openshift_post_install_validate.
Once installer convergence is complete and auth/kubeconfig exists, use the
generated kubeconfig from inside the lab and validate the cluster from
virt-01.
# Validate the installed cluster from virt-01.scp -i /opt/openshift/secrets/hypervisor-admin.key \ /opt/openshift/generated/ocp/auth/kubeconfig \ root@172.16.0.1:/var/tmp/ocp-kubeconfigscp -i /opt/openshift/secrets/hypervisor-admin.key \ /opt/openshift/generated/tools/4.20.15/bin/oc \ root@172.16.0.1:/var/tmp/ocssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 <<'EOF'chmod 0755 /var/tmp/ocexport KUBECONFIG=/var/tmp/ocp-kubeconfig/var/tmp/oc get clusterversion/var/tmp/oc get co/var/tmp/oc get nodes/var/tmp/oc get csrEOFAfter those checks pass, refresh the bastion helper kubeconfigs from the
current cluster state and import the live cluster CA bundle into bastion system
trust so normal oc login works without --insecure-skip-tls-verify.
# Refresh the bastion helper kubeconfigs and trust the cluster CA.ssh cloud-user@172.16.0.30 <<'EOF'set -euo pipefailcp /opt/openshift/aws-metal-openshift-demo/generated/ocp/auth/kubeconfig "$HOME/etc/kubeconfig"cp /opt/openshift/aws-metal-openshift-demo/generated/ocp/auth/kubeconfig "$HOME/etc/kubeconfig.local"chmod 0600 "$HOME/etc/kubeconfig" "$HOME/etc/kubeconfig.local"oc --kubeconfig "$HOME/etc/kubeconfig.local" get configmap/kube-root-ca.crt \ -o jsonpath='{.data.ca\.crt}' >/tmp/kube-root-ca.crtsudo cp /tmp/kube-root-ca.crt /etc/pki/ca-trust/source/anchors/ocp-cluster-ca-bundle.pemsudo update-ca-trust extractEOF23. Detach Install Media And Normalize Boot
Detach the install media and restore disk-first boot intent before any normal cluster reboots occur.
Note
Automation reference: playbooks/maintenance/detach-install-media.yml.
Caution
Do not skip this step. If the agent ISO is still attached when a cluster
node reboots (vCPU resize, operator-triggered restart, or accidental power
cycle), the node will re-enter the day-1 agent installer instead of booting
from disk. This happened in production — see issue 007c920 in the issues
ledger.
Once guests have completed day-1 provisioning, eject the attached installation media and restore disk-only boot intent. This prevents support guests from retaining sensitive cloud-init data and prevents OpenShift guests from booting back into the agent installer after a restart.
For support guests, the preferred timing is earlier than the end of the build: after the initial package update is staged but before the reboot required by that update. That reboot clears the live empty CD-ROM shell that libvirt may leave behind even after the media is ejected and the persistent XML is cleaned up.
For OpenShift cluster guests, the important success condition is different:
eject agent.x86_64.iso and restore disk-first boot. The live or persistent
empty CD-ROM shell does not need to be removed immediately, and trying to do so
on a running node is not a reliable success criterion.
# Verify that the support guests no longer have persistent CD-ROM devices.ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 <<'EOF'for domain in \ idm-01.workshop.lan \ bastion-01.workshop.lan \ mirror-registry.workshop.lando target=$(virsh domblklist "$domain" --details | awk '$2 == "cdrom" {print $3}') if [ -n "$target" ]; then virsh change-media "$domain" "$target" --eject --config --live || true virt-xml "$domain" --remove-device --disk "target=$target" virt-xml "$domain" --edit --boot hd fidonefor domain in \ ocp-master-01.ocp.workshop.lan \ ocp-master-02.ocp.workshop.lan \ ocp-master-03.ocp.workshop.lan \ ocp-infra-01.ocp.workshop.lan \ ocp-infra-02.ocp.workshop.lan \ ocp-infra-03.ocp.workshop.lan \ ocp-worker-01.ocp.workshop.lan \ ocp-worker-02.ocp.workshop.lan \ ocp-worker-03.ocp.workshop.lando target=$(virsh domblklist "$domain" --details \ | awk '$2 == "cdrom" && $4 == "/var/lib/libvirt/images/agent.x86_64.iso" {print $3}') if [ -n "$target" ]; then virsh change-media "$domain" "$target" --eject --config --live || true virt-xml "$domain" --edit --boot hd fidoneEOFVerify support guests no longer carry persistent CD-ROM devices:
# Verify support guests no longer carry persistent CD-ROM devices.ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 <<'EOF'for domain in \ idm-01.workshop.lan \ bastion-01.workshop.lan \ mirror-registry.workshop.lando echo "=== $domain ===" virsh dumpxml --inactive "$domain" | grep "device='cdrom'" || echo "no persistent cdrom"doneEOFVerify OpenShift guests have no attached agent ISO media and boot from disk:
# Verify that the OpenShift guests boot from disk with no agent ISO attached.ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 <<'EOF'for domain in \ ocp-master-01.ocp.workshop.lan \ ocp-master-02.ocp.workshop.lan \ ocp-master-03.ocp.workshop.lan \ ocp-infra-01.ocp.workshop.lan \ ocp-infra-02.ocp.workshop.lan \ ocp-infra-03.ocp.workshop.lan \ ocp-worker-01.ocp.workshop.lan \ ocp-worker-02.ocp.workshop.lan \ ocp-worker-03.ocp.workshop.lando echo "=== $domain ===" virsh domblklist "$domain" --details | awk '$2 == "cdrom" {print}' virsh dumpxml --inactive "$domain" | grep "<boot dev='hd'/>" || echo "boot order needs review"doneEOF24. Configure Breakglass Auth, Keycloak OIDC, And Infra Roles
This section is the manual runbook for the supported infra and authentication cutover: move platform workloads onto infra nodes, establish a local breakglass login, deploy Keycloak, federate it to IdM, and configure OpenShift to use OIDC.
Note
Automation reference: the identity and infra phases inside
playbooks/day2/openshift-post-install.yml, primarily roles
openshift_post_install_infra,
openshift_post_install_breakglass_auth,
openshift_post_install_keycloak, and
openshift_post_install_oidc_auth.
Architecture reference: AUTH MODEL for the current supported auth boundary, and AD / IDM POLICY MODEL for the planned future AD-source-of-truth model.
The supported execution order is:
- disconnected OperatorHub pivot
- infra conversion
- IdM ingress certificate rollout
- breakglass
HTPasswdauth - NMState
- ODF
- Keycloak
- OIDC auth
- optional legacy LDAP auth and group sync
- OpenShift Virtualization
- OpenShift Pipelines
- Web Terminal
- AAP
- Network Observability
- validation
The aggregated day-2 play now uses convergence probes for the major phases.
On rerun, a healthy phase is skipped unless its matching force_* variable is
true. Use enable_* variables to select the desired profile; use force_*
variables only for deliberate repair or rebuild work.
The supported default auth model is:
- create a local
HTPasswdbreakglass login - remove
kubeadminafter the breakglass login is proven - deploy Keycloak after ODF storage is available
- federate Keycloak to IdM
- configure OpenShift OAuth for OIDC against Keycloak
- map the OIDC
groupsclaim into OpenShift groups - bind IdM group
access-openshift-adminto OpenShiftcluster-admin
Direct OpenShift LDAP auth is no longer the default baseline. Keep it out of the cluster OAuth configuration unless you are deliberately validating that compatibility path.
The same principle now applies to AAP: the supported clean-build path is Keycloak OIDC, not direct AAP LDAP.
Label the infra nodes and move platform workloads onto them early in day-2 so the later auth and storage work settles on the intended node tier.
Note: do not taint infra nodes for general workload placement here.
Workloads are steered via nodeSelector / nodePlacement. Taints are applied
later only for the ODF storage set
(node.ocs.openshift.io/storage).
# Label the infra nodes and move the core platform workloads onto them.ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 <<'EOF'export KUBECONFIG=/var/tmp/ocp-kubeconfig/var/tmp/oc label node ocp-infra-01 node-role.kubernetes.io/infra='' --overwrite/var/tmp/oc label node ocp-infra-02 node-role.kubernetes.io/infra='' --overwrite/var/tmp/oc label node ocp-infra-03 node-role.kubernetes.io/infra='' --overwritecat <<'YAML' | /var/tmp/oc apply -f -apiVersion: rbac.authorization.k8s.io/v1kind: ClusterRoleBindingmetadata: name: access-openshift-admin-cluster-adminroleRef: apiGroup: rbac.authorization.k8s.io kind: ClusterRole name: cluster-adminsubjects: - kind: Group apiGroup: rbac.authorization.k8s.io name: access-openshift-adminYAML# --- Move platform workloads to infra nodes ---/var/tmp/oc patch ingresscontroller/default -n openshift-ingress-operator \ --type=merge -p \ '{"spec":{"nodePlacement":{"nodeSelector":{"matchLabels":{"node-role.kubernetes.io/infra":""}},"tolerations":[{"key":"node.ocs.openshift.io/storage","value":"true","effect":"NoSchedule"}]}}}'cat <<'YAML' | /var/tmp/oc apply -f -apiVersion: v1kind: ConfigMapmetadata: name: cluster-monitoring-config namespace: openshift-monitoringdata: config.yaml: | prometheusOperator: nodeSelector: node-role.kubernetes.io/infra: "" tolerations: - key: node.ocs.openshift.io/storage value: "true" effect: NoSchedule prometheusK8s: nodeSelector: node-role.kubernetes.io/infra: "" tolerations: - key: node.ocs.openshift.io/storage value: "true" effect: NoSchedule alertmanagerMain: nodeSelector: node-role.kubernetes.io/infra: "" tolerations: - key: node.ocs.openshift.io/storage value: "true" effect: NoSchedule kubeStateMetrics: nodeSelector: node-role.kubernetes.io/infra: "" tolerations: - key: node.ocs.openshift.io/storage value: "true" effect: NoSchedule openshiftStateMetrics: nodeSelector: node-role.kubernetes.io/infra: "" tolerations: - key: node.ocs.openshift.io/storage value: "true" effect: NoSchedule thanosQuerier: nodeSelector: node-role.kubernetes.io/infra: "" tolerations: - key: node.ocs.openshift.io/storage value: "true" effect: NoSchedule metricsServer: nodeSelector: node-role.kubernetes.io/infra: "" tolerations: - key: node.ocs.openshift.io/storage value: "true" effect: NoScheduleYAML/var/tmp/oc patch configs.imageregistry/cluster --type=merge \ -p '{"spec":{"nodeSelector":{"node-role.kubernetes.io/infra":""},"tolerations":[{"key":"node.ocs.openshift.io/storage","value":"true","effect":"NoSchedule"}]}}'EOFImportant
Preserve a local recovery path before changing network auth. Create and
validate a breakglass HTPasswd user before patching OAuth to use Keycloak.
Only after the breakglass login works should you retire kubeadmin.
Start by establishing and validating the breakglass OAuth identity provider.
# Create the breakglass identity provider, test it, and retire kubeadmin.ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 <<'EOF'export KUBECONFIG=/var/tmp/ocp-kubeconfightpasswd -BbnC 12 breakglass-admin '<lab-default-password>' >/tmp/htpasswd/var/tmp/oc create secret generic breakglass-htpasswd \ -n openshift-config \ --from-file=htpasswd=/tmp/htpasswd \ --dry-run=client -o yaml | /var/tmp/oc apply -f -cat <<'YAML' | /var/tmp/oc apply -f -apiVersion: config.openshift.io/v1kind: OAuthmetadata: name: clusterspec: identityProviders: - name: Breakglass HTPasswd mappingMethod: claim type: HTPasswd htpasswd: fileData: name: breakglass-htpasswdYAMLuntil [ "$(/var/tmp/oc get clusteroperator authentication -o jsonpath='{.status.conditions[?(@.type=="Available")].status},{.status.conditions[?(@.type=="Progressing")].status},{.status.conditions[?(@.type=="Degraded")].status}')" = "True,False,False" ]; do sleep 10doneuntil curl -skf "https://$(/var/tmp/oc get route oauth-openshift -n openshift-authentication -o jsonpath='{.spec.host}')/healthz" >/dev/null; do sleep 10done/var/tmp/oc login https://api.ocp.workshop.lan:6443 \ --username=breakglass-admin \ --password='<lab-default-password>' \ --insecure-skip-tls-verify \ --kubeconfig=/tmp/kubeconfig-breakglass-test/var/tmp/oc whoami --kubeconfig=/tmp/kubeconfig-breakglass-testcat <<'YAML' | /var/tmp/oc apply -f -apiVersion: rbac.authorization.k8s.io/v1kind: ClusterRoleBindingmetadata: name: breakglass-admin-cluster-adminroleRef: apiGroup: rbac.authorization.k8s.io kind: ClusterRole name: cluster-adminsubjects: - apiGroup: rbac.authorization.k8s.io kind: User name: breakglass-adminYAML/var/tmp/oc delete secret kubeadmin -n kube-system --ignore-not-found=true! /var/tmp/oc get secret kubeadmin -n kube-system/var/tmp/oc logout --kubeconfig=/tmp/kubeconfig-breakglass-test || truerm -f /tmp/kubeconfig-breakglass-test /tmp/htpasswdEOFDeploy Keycloak after ODF so its PostgreSQL PVC can bind to the Ceph RBD storage class. The intended state is:
- namespace
keycloak rhbk-operator- PostgreSQL backed by
ocs-storagecluster-ceph-rbd - Keycloak route
sso.apps.ocp.workshop.lan - realm
openshift - client
openshift - LDAP federation against the IdM compat tree
- a
groupsprotocol mapper so OpenShift receives group membership claims
Manual equivalent for the Keycloak install itself:
# Install the Keycloak operator and base deployment.ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 <<'EOF'export KUBECONFIG=/var/tmp/ocp-kubeconfigoc create namespace keycloak || truecat <<'YAML' | oc apply -f -apiVersion: operators.coreos.com/v1kind: OperatorGroupmetadata: name: keycloak namespace: keycloakspec: targetNamespaces: - keycloak---apiVersion: operators.coreos.com/v1alpha1kind: Subscriptionmetadata: name: rhbk-operator namespace: keycloakspec: channel: stable-v26.2 installPlanApproval: Manual name: rhbk-operator source: cs-redhat-operator-index-v4-20 sourceNamespace: openshift-marketplaceYAMLoc create secret generic workshop-keycloak-bootstrap-admin \ -n keycloak \ --from-literal=username=admin \ --from-literal=password='<lab-default-password>' \ --dry-run=client -o yaml | oc apply -f -oc create secret generic workshop-keycloak-db \ -n keycloak \ --from-literal=username=keycloak \ --from-literal=password='<lab-default-password>' \ --dry-run=client -o yaml | oc apply -f -oc extract -n openshift-ingress secret/ingress-default-idm-tls --to=/tmp/keycloak-tls --confirmoc create secret tls workshop-keycloak-tls \ -n keycloak \ --cert=/tmp/keycloak-tls/tls.crt \ --key=/tmp/keycloak-tls/tls.key \ --dry-run=client -o yaml | oc apply -f -until [ -n "$(oc get subscription rhbk-operator -n keycloak -o jsonpath='{.status.installplan.name}' 2>/dev/null)" ]; do sleep 10doneINSTALLPLAN="$(oc get subscription rhbk-operator -n keycloak -o jsonpath='{.status.installplan.name}')"oc patch installplan "$INSTALLPLAN" -n keycloak --type=merge -p '{"spec":{"approved":true}}'until [ "$(oc get subscription rhbk-operator -n keycloak -o jsonpath='{.status.currentCSV}' 2>/dev/null | xargs -r -I{} oc get csv {} -n keycloak -o jsonpath='{.status.phase}')" = "Succeeded" ]; do sleep 10donecat <<'YAML' | oc apply -f -apiVersion: apps/v1kind: StatefulSetmetadata: name: postgres-db namespace: keycloakspec: serviceName: postgres-db selector: matchLabels: app: postgres-db replicas: 1 template: metadata: labels: app: postgres-db spec: containers: - name: postgres-db image: registry.redhat.io/rhel9/postgresql-15:latest env: - name: POSTGRESQL_USER valueFrom: secretKeyRef: name: workshop-keycloak-db key: username - name: POSTGRESQL_PASSWORD valueFrom: secretKeyRef: name: workshop-keycloak-db key: password - name: POSTGRESQL_DATABASE value: keycloak ports: - containerPort: 5432 name: postgres volumeMounts: - mountPath: /var/lib/pgsql/data name: pgdata volumeClaimTemplates: - metadata: name: pgdata spec: storageClassName: ocs-storagecluster-ceph-rbd accessModes: - ReadWriteOnce resources: requests: storage: 5Gi---apiVersion: v1kind: Servicemetadata: name: postgres-db namespace: keycloakspec: selector: app: postgres-db ports: - port: 5432 targetPort: 5432 name: postgres---apiVersion: k8s.keycloak.org/v2alpha1kind: Keycloakmetadata: name: workshop-keycloak namespace: keycloakspec: instances: 1 bootstrapAdmin: user: secret: workshop-keycloak-bootstrap-admin db: vendor: postgres host: postgres-db port: 5432 database: keycloak usernameSecret: name: workshop-keycloak-db key: username passwordSecret: name: workshop-keycloak-db key: password http: tlsSecret: workshop-keycloak-tls hostname: hostname: https://sso.apps.ocp.workshop.lan ingress: enabled: false proxy: headers: xforwardedYAMLuntil [ "$(oc get keycloak workshop-keycloak -n keycloak -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}' 2>/dev/null)" = "True" ]; do sleep 10doneoc create route reencrypt workshop-keycloak \ --service=workshop-keycloak-service \ --cert=/tmp/keycloak-tls/tls.crt \ --key=/tmp/keycloak-tls/tls.key \ --dest-ca-cert=/etc/ipa/ca.crt \ --ca-cert=/etc/ipa/ca.crt \ --hostname=sso.apps.ocp.workshop.lan \ -n keycloak \ --dry-run=client -o yaml | oc apply -f -until curl -skf https://sso.apps.ocp.workshop.lan/realms/master/.well-known/openid-configuration >/dev/null; do sleep 10doneEOFOpenShift OAuth is then patched to trust Keycloak OIDC and map the groups
claim into OpenShift groups. The resulting effective authorization model is:
- IdM group membership is the source of truth
- Keycloak emits
groups - OpenShift maps
claims.groups access-openshift-adminis bound tocluster-admin
Manual equivalent for the OIDC federation and OAuth patch:
# Configure the Keycloak realm, client, LDAP federation, and OpenShift OAuth integration.ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 <<'EOF'export KUBECONFIG=/var/tmp/ocp-kubeconfigOAUTH_HOST="$(oc get route oauth-openshift -n openshift-authentication -o jsonpath='{.spec.host}')"KEYCLOAK_ADMIN_TOKEN="$(curl --cacert /etc/ipa/ca.crt -sS -X POST https://sso.apps.ocp.workshop.lan/realms/master/protocol/openid-connect/token -H 'Content-Type: application/x-www-form-urlencoded' --data-urlencode 'grant_type=password' --data-urlencode 'client_id=admin-cli' --data-urlencode 'username=admin' --data-urlencode 'password=<lab-default-password>' | jq -r .access_token)"curl --cacert /etc/ipa/ca.crt -sS -H "Authorization: Bearer ${KEYCLOAK_ADMIN_TOKEN}" -H 'Content-Type: application/json' -X POST https://sso.apps.ocp.workshop.lan/admin/realms -d '{"realm":"openshift","enabled":true,"displayName":"OpenShift","sslRequired":"external","registrationAllowed":false,"resetPasswordAllowed":false,"rememberMe":false,"loginWithEmailAllowed":false}' || trueCLIENT_ID="$(curl --cacert /etc/ipa/ca.crt -sS -H "Authorization: Bearer ${KEYCLOAK_ADMIN_TOKEN}" "https://sso.apps.ocp.workshop.lan/admin/realms/openshift/clients?clientId=openshift" | jq -r '.[0].id')"if [ -z "${CLIENT_ID}" ] || [ "${CLIENT_ID}" = "null" ]; then cat >/tmp/keycloak-openshift-client.json <<JSON{ "clientId": "openshift", "name": "OpenShift", "enabled": true, "protocol": "openid-connect", "publicClient": false, "secret": "<lab-default-password>", "standardFlowEnabled": true, "directAccessGrantsEnabled": true, "serviceAccountsEnabled": false, "frontchannelLogout": true, "redirectUris": ["https://${OAUTH_HOST}/oauth2callback/Keycloak"], "webOrigins": ["+"], "defaultClientScopes": ["profile", "email", "roles", "web-origins"]}JSON curl --cacert /etc/ipa/ca.crt -sS -H "Authorization: Bearer ${KEYCLOAK_ADMIN_TOKEN}" -H 'Content-Type: application/json' -X POST https://sso.apps.ocp.workshop.lan/admin/realms/openshift/clients --data-binary @/tmp/keycloak-openshift-client.json CLIENT_ID="$(curl --cacert /etc/ipa/ca.crt -sS -H "Authorization: Bearer ${KEYCLOAK_ADMIN_TOKEN}" "https://sso.apps.ocp.workshop.lan/admin/realms/openshift/clients?clientId=openshift" | jq -r '.[0].id')"ficurl --cacert /etc/ipa/ca.crt -sS -H "Authorization: Bearer ${KEYCLOAK_ADMIN_TOKEN}" -H 'Content-Type: application/json' -X POST "https://sso.apps.ocp.workshop.lan/admin/realms/openshift/clients/${CLIENT_ID}/protocol-mappers/models" -d '{"name":"groups","protocol":"openid-connect","protocolMapper":"oidc-group-membership-mapper","config":{"full.path":"false","id.token.claim":"true","access.token.claim":"true","userinfo.token.claim":"true","claim.name":"groups","multivalued":"true"}}' || trueREALM_ID="$(curl --cacert /etc/ipa/ca.crt -sS -H "Authorization: Bearer ${KEYCLOAK_ADMIN_TOKEN}" https://sso.apps.ocp.workshop.lan/admin/realms/openshift | jq -r .id)"curl --cacert /etc/ipa/ca.crt -sS -H "Authorization: Bearer ${KEYCLOAK_ADMIN_TOKEN}" -H 'Content-Type: application/json' -X POST https://sso.apps.ocp.workshop.lan/admin/realms/openshift/components -d '{"name":"idm-compat","parentId":"'"${REALM_ID}"'","providerId":"ldap","providerType":"org.keycloak.storage.UserStorageProvider","config":{"enabled":["true"],"priority":["0"],"fullSyncPeriod":["-1"],"changedSyncPeriod":["-1"],"cachePolicy":["DEFAULT"],"batchSizeForSync":["1000"],"importEnabled":["true"],"syncRegistrations":["false"],"editMode":["READ_ONLY"],"vendor":["other"],"usernameLDAPAttribute":["uid"],"rdnLDAPAttribute":["uid"],"uuidLDAPAttribute":["uid"],"userObjectClasses":["posixAccount"],"connectionUrl":["ldap://idm-01.workshop.lan"],"usersDn":["cn=users,cn=compat,dc=workshop,dc=lan"],"authType":["simple"],"bindDn":["cn=Directory Manager"],"bindCredential":["<lab-default-password>"],"searchScope":["2"],"validatePasswordPolicy":["false"],"trustEmail":["false"],"connectionPooling":["true"],"pagination":["true"],"startTls":["false"]}}' || trueLDAP_ID="$(curl --cacert /etc/ipa/ca.crt -sS -H "Authorization: Bearer ${KEYCLOAK_ADMIN_TOKEN}" "https://sso.apps.ocp.workshop.lan/admin/realms/openshift/components?type=org.keycloak.storage.UserStorageProvider" | jq -r '.[] | select(.name=="idm-compat") | .id')"curl --cacert /etc/ipa/ca.crt -sS -H "Authorization: Bearer ${KEYCLOAK_ADMIN_TOKEN}" -H 'Content-Type: application/json' -X POST "https://sso.apps.ocp.workshop.lan/admin/realms/openshift/components" -d '{"name":"idm-compat-group-mapper","parentId":"'"${LDAP_ID}"'","providerId":"group-ldap-mapper","providerType":"org.keycloak.storage.ldap.mappers.LDAPStorageMapper","config":{"enabled":["true"],"priority":["0"],"fullSyncPeriod":["-1"],"changedSyncPeriod":["-1"],"cachePolicy":["DEFAULT"],"batchSizeForSync":["1000"],"mode":["LDAP_ONLY"],"groups.dn":["cn=groups,cn=compat,dc=workshop,dc=lan"],"group.name.ldap.attribute":["cn"],"group.object.classes":["posixGroup"],"preserve.group.inheritance":["false"],"membership.ldap.attribute":["memberUid"],"membership.attribute.type":["UID"],"groups.ldap.filter":[""],"user.roles.retrieve.strategy":["LOAD_GROUPS_BY_MEMBER_ATTRIBUTE"],"drop.non.existing.groups.during.sync":["false"]}}' || trueoc create secret generic oidc-client-secret \ -n openshift-config \ --from-literal=clientSecret='<lab-default-password>' \ --dry-run=client -o yaml | oc apply -f -oc create configmap oidc-ca \ -n openshift-config \ --from-file=ca.crt=/etc/ipa/ca.crt \ --dry-run=client -o yaml | oc apply -f -cat <<'YAML' | oc apply -f -apiVersion: config.openshift.io/v1kind: OAuthmetadata: name: clusterspec: identityProviders: - name: Breakglass HTPasswd mappingMethod: claim type: HTPasswd htpasswd: fileData: name: breakglass-htpasswd - name: Keycloak mappingMethod: claim type: OpenID openID: clientID: openshift clientSecret: name: oidc-client-secret issuer: https://sso.apps.ocp.workshop.lan/realms/openshift ca: name: oidc-ca claims: preferredUsername: - preferred_username name: - name - preferred_username email: - email - preferred_username groups: - groups extraScopes: - email - profileYAMLcat <<'YAML' | oc apply -f -apiVersion: rbac.authorization.k8s.io/v1kind: ClusterRoleBindingmetadata: name: access-openshift-admin-cluster-adminroleRef: apiGroup: rbac.authorization.k8s.io kind: ClusterRole name: cluster-adminsubjects: - apiGroup: rbac.authorization.k8s.io kind: Group name: access-openshift-adminYAMLEOFThat means adding a native IdM user, or a trusted AD user that lands in the
same IdM role group, to access-openshift-admin makes that user a cluster admin once
they authenticate through Keycloak.
Validate the end state with both a native IdM user and an AD-backed user.
# Validate the Keycloak OIDC login path.ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 <<'EOF'export KUBECONFIG=/var/tmp/ocp-kubeconfiguntil [ "$(/var/tmp/oc get co authentication -o jsonpath='{.status.conditions[?(@.type=="Progressing")].status}')" = "False" ]; do sleep 10done/var/tmp/oc get oauth cluster -o jsonpath='{range .spec.identityProviders[*]}{.name} => groups={.openID.claims.groups}{"\n"}{end}'/var/tmp/oc get groups/var/tmp/oc get clusterrolebinding access-openshift-admin-cluster-adminEOFIf you deliberately want to validate the old direct-LDAP path, treat it as an optional side test after OIDC is working. Do not treat it as the default cluster auth model, and do not replace the breakglass plus OIDC baseline with it.
25. Install Kubernetes NMState
Install Kubernetes NMState and create the VLAN policies needed by later VM and live-migration networking.
Note
Automation reference: playbooks/day2/openshift-post-install-nmstate.yml,
role openshift_post_install_nmstate.
Install the NMState operator and create the singleton NMState instance.
# Install the Kubernetes NMState operator.ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 <<'EOF'export KUBECONFIG=/var/tmp/ocp-kubeconfigoc create namespace openshift-nmstate || truecat <<'YAML' | oc apply -f -apiVersion: operators.coreos.com/v1kind: OperatorGroupmetadata: name: openshift-nmstate namespace: openshift-nmstatespec: targetNamespaces: - openshift-nmstateYAMLcat <<'YAML' | oc apply -f -apiVersion: operators.coreos.com/v1alpha1kind: Subscriptionmetadata: name: kubernetes-nmstate-operator namespace: openshift-nmstatespec: channel: stable installPlanApproval: Automatic name: kubernetes-nmstate-operator source: cs-redhat-operator-index-v4-20 sourceNamespace: openshift-marketplaceYAMLoc wait --for=condition=Established crd/nmstates.nmstate.io --timeout=20mcat <<'YAML' | oc apply -f -apiVersion: nmstate.io/v1kind: NMStatemetadata: name: nmstateYAMLoc -n openshift-nmstate wait --for=condition=Available deployment/nmstate-operator --timeout=20mEOFCreate the VLAN policies used later by OpenShift Virtualization and VM workloads.
# Apply the nmstate desired state policy.ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 <<'EOF'export KUBECONFIG=/var/tmp/ocp-kubeconfigcat <<'YAML' | oc apply -f -apiVersion: nmstate.io/v1kind: NodeNetworkConfigurationPolicymetadata: name: kubevirt-live-migration-vlanspec: nodeSelector: node-role.kubernetes.io/worker: "" desiredState: interfaces: - name: vlan202 description: OpenShift Virtualization live migration VLAN type: vlan state: up vlan: base-iface: enp1s0 id: 202 ipv4: enabled: false ipv6: enabled: falseYAMLcat <<'YAML' | oc apply -f -apiVersion: nmstate.io/v1kind: NodeNetworkConfigurationPolicymetadata: name: vm-data-vlan-300spec: nodeSelector: node-role.kubernetes.io/worker: "" desiredState: interfaces: - name: vlan300 description: Routed VM data network A type: vlan state: up vlan: base-iface: enp1s0 id: 300 ipv4: enabled: false ipv6: enabled: falseYAMLcat <<'YAML' | oc apply -f -apiVersion: nmstate.io/v1kind: NodeNetworkConfigurationPolicymetadata: name: vm-data-vlan-301spec: nodeSelector: node-role.kubernetes.io/worker: "" desiredState: interfaces: - name: vlan301 description: Routed VM data network B type: vlan state: up vlan: base-iface: enp1s0 id: 301 ipv4: enabled: false ipv6: enabled: falseYAMLcat <<'YAML' | oc apply -f -apiVersion: nmstate.io/v1kind: NodeNetworkConfigurationPolicymetadata: name: vm-data-vlan-302spec: nodeSelector: node-role.kubernetes.io/worker: "" desiredState: interfaces: - name: vlan302 description: Isolated VM data network type: vlan state: up vlan: base-iface: enp1s0 id: 302 ipv4: enabled: false ipv6: enabled: falseYAMLoc wait nncp/kubevirt-live-migration-vlan --for=jsonpath='{.status.conditions[?(@.type=="Available")].status}'=True --timeout=20moc wait nncp/vm-data-vlan-300 --for=jsonpath='{.status.conditions[?(@.type=="Available")].status}'=True --timeout=20moc wait nncp/vm-data-vlan-301 --for=jsonpath='{.status.conditions[?(@.type=="Available")].status}'=True --timeout=20moc wait nncp/vm-data-vlan-302 --for=jsonpath='{.status.conditions[?(@.type=="Available")].status}'=True --timeout=20mEOFDesign note:
- this lab currently uses interface-name matching with
enp1s0because it is easy to read and explain - nmstate also supports matching the parent uplink by MAC address
- a MAC-matched model is more robust across different hardware and interface naming schemes, but it requires generating a separate policy per node
26. Deploy ODF Declaratively
Deploy ODF declaratively, including the host-side cleanup needed to avoid stale Ceph and OLM state.
Note
Automation reference: the ODF phase inside
playbooks/day2/openshift-post-install.yml, primarily role
openshift_post_install_odf for internal mode and
openshift_post_install_odf_external for external mode.
Warning
ODF must run before Virtualization (27) and NetObserv (29). CNV expects
ocs-storagecluster-ceph-rbd to be available when it sets the default virt
storage class. NetObserv needs NooBaa S3 for Loki. Running them out of order
causes silent failures that are hard to diagnose.
Wipe stale Ceph bluestore labels from OSD backing devices, clean up any
duplicate OperatorGroups in the Local Storage namespace, label and taint the
infra nodes for storage, configure Local Storage discovery, create the
LocalVolumeSet, and apply the StorageCluster.
That is the internal ODF path. External mode is selected with
openshift_post_install_odf_mode: external and deliberately skips Local
Storage Operator resources, local OSD disks, storage-node labels, and storage
taints. It imports external Ceph cluster details from either
openshift_post_install_odf_external_cluster_details_file or
openshift_post_install_odf_external_cluster_details_b64, applies the external
StorageCluster, and keeps the same ordering slot so Keycloak, AAP,
NetObserv, and any later storage consumers still wait for ODF first.
External mode can also converge OVN host routing before the external
StorageCluster is created:
openshift_post_install_odf_external_gateway_routing_via_host: trueopenshift_post_install_odf_external_gateway_ip_forwarding: GlobalThis is required when ODF pods need to reach off-cluster Ceph endpoints over a node-reachable routed storage network.
Caution
OSD device preparation on reused EBS volumes. A conventional small
head/tail wipe is not sufficient for reused ODF disks. The current recovery
path wipes the first 2 GiB, fixed BlueStore label positions at 0, 1,
10, 100, and 1000 GiB, and the device tail. It also purges
/var/lib/rook/* and /var/lib/ceph/* on the infra nodes before reinstall.
Destructive recovery is not part of a normal rerun. It must be explicitly
forced.
The manual equivalent on the hypervisor:
# Wipe the ODF data disks on the infra nodes.ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 <<'EOF'for dev in /dev/ebs/ocp-infra-01-data /dev/ebs/ocp-infra-02-data /dev/ebs/ocp-infra-03-data; do size_mb=$(( $(blockdev --getsize64 "$dev") / 1024 / 1024 )) blkdiscard "$dev" || true wipefs --all --force "$dev" || true dd if=/dev/zero of="$dev" bs=4M count=512 oflag=direct conv=fsync,notrunc status=none for offset_mb in 0 1024 10240 102400 1024000; do if [ "$offset_mb" -lt "$size_mb" ]; then dd if=/dev/zero of="$dev" bs=1M seek=$offset_mb count=64 oflag=direct conv=fsync,notrunc status=none fi done if [ "$size_mb" -gt 256 ]; then dd if=/dev/zero of="$dev" bs=1M seek=$(( size_mb - 256 )) count=256 oflag=direct conv=fsync,notrunc status=none fidonefor node in ocp-infra-01 ocp-infra-02 ocp-infra-03; do oc debug "node/${node}" -- chroot /host bash -lc 'rm -rf /var/lib/rook/* /var/lib/ceph/*'doneEOFWarning
OperatorGroup cleanup. OLM can leave behind auto-generated
OperatorGroups when namespaces are recreated. If more than one OperatorGroup
exists in openshift-local-storage, OLM refuses to process subscriptions
(MultipleOperatorGroupsFound). The automation deletes any stale
OperatorGroups before applying the subscription. If you are running this
manually, check first.
Before you apply the LocalVolume and StorageCluster CRs, install the Local
Storage Operator and ODF operator so the APIs exist.
# Install the local storage and ODF operators.ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 <<'EOF'export KUBECONFIG=/var/tmp/ocp-kubeconfigoc create namespace openshift-local-storage || trueoc create namespace openshift-storage || truefor og in $(oc get operatorgroup -n openshift-local-storage -o jsonpath='{.items[*].metadata.name}' 2>/dev/null); do oc delete operatorgroup "$og" -n openshift-local-storage --ignore-not-found=truedonecat <<'YAML' | oc apply -f -apiVersion: operators.coreos.com/v1kind: OperatorGroupmetadata: name: openshift-local-storage namespace: openshift-local-storagespec: targetNamespaces: - openshift-local-storage---apiVersion: operators.coreos.com/v1alpha1kind: Subscriptionmetadata: name: local-storage-operator namespace: openshift-local-storagespec: channel: stable installPlanApproval: Automatic name: local-storage-operator source: cs-redhat-operator-index-v4-20 sourceNamespace: openshift-marketplace---apiVersion: operators.coreos.com/v1kind: OperatorGroupmetadata: name: openshift-storage namespace: openshift-storagespec: targetNamespaces: - openshift-storage---apiVersion: operators.coreos.com/v1alpha1kind: Subscriptionmetadata: name: odf-operator namespace: openshift-storagespec: channel: stable-4.20 installPlanApproval: Automatic name: odf-operator source: cs-redhat-operator-index-v4-20 sourceNamespace: openshift-marketplaceYAMLuntil [ "$(oc get subscription local-storage-operator -n openshift-local-storage -o jsonpath='{.status.currentCSV}' 2>/dev/null | xargs -r -I{} oc get csv {} -n openshift-local-storage -o jsonpath='{.status.phase}')" = "Succeeded" ]; do sleep 10doneuntil [ "$(oc get subscription odf-operator -n openshift-storage -o jsonpath='{.status.currentCSV}' 2>/dev/null | xargs -r -I{} oc get csv {} -n openshift-storage -o jsonpath='{.status.phase}')" = "Succeeded" ]; do sleep 10doneEOFCurrent default:
openshift_post_install_odf_multus_enabled: false
Reason:
- this project runs ODF in a nested
KVM + OVS + libvirtenvironment - ODF public-network Multus/macvlan on VLAN 201 is not a safe default on that hypervisor path
- the stable default is therefore the pod network unless the hypervisor is intentionally engineered for the extra secondary-MAC/promiscuous-mode requirements
# Label the infra nodes and deploy the ODF storage resources.ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 <<'EOF'export KUBECONFIG=/var/tmp/ocp-kubeconfigfor node in ocp-infra-01 ocp-infra-02 ocp-infra-03; do oc label node "${node}" cluster.ocs.openshift.io/openshift-storage='' --overwrite oc adm taint node "${node}" node.ocs.openshift.io/storage=true:NoSchedule --overwritedoneoc create namespace openshift-local-storage || trueoc create namespace openshift-storage || truecat <<'YAML' | oc apply -f -apiVersion: local.storage.openshift.io/v1alpha1kind: LocalVolumeDiscoverymetadata: name: auto-discover-devices namespace: openshift-local-storagespec: nodeSelector: nodeSelectorTerms: - matchExpressions: - key: node-role.kubernetes.io/infra operator: Exists tolerations: - key: node-role.kubernetes.io/infra operator: Exists effect: NoSchedule - key: node.ocs.openshift.io/storage operator: Equal value: "true" effect: NoScheduleYAMLcat <<'YAML' | oc apply -f -apiVersion: local.storage.openshift.io/v1alpha1kind: LocalVolumeSetmetadata: name: ceph-osd namespace: openshift-local-storagespec: storageClassName: ceph-osd volumeMode: Block fsType: ext4 maxDeviceCount: 1 deviceInclusionSpec: deviceTypes: [disk] minSize: 900Gi maxSize: 1000Gi nodeSelector: nodeSelectorTerms: - matchExpressions: - key: cluster.ocs.openshift.io/openshift-storage operator: Exists tolerations: - key: node-role.kubernetes.io/infra operator: Exists effect: NoSchedule - key: node.ocs.openshift.io/storage operator: Equal value: "true" effect: NoScheduleYAMLcat <<'YAML' | oc apply -f -apiVersion: ocs.openshift.io/v1kind: StorageClustermetadata: name: ocs-storagecluster namespace: openshift-storagespec: manageNodes: false monDataDirHostPath: /var/lib/rook multiCloudGateway: reconcileStrategy: manage storageDeviceSets: - name: ocs-deviceset count: 1 replica: 3 portable: false dataPVCTemplate: spec: accessModes: [ReadWriteOnce] resources: requests: storage: 980Gi storageClassName: ceph-osd volumeMode: Block placement: nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: nodeSelectorTerms: - matchExpressions: - key: cluster.ocs.openshift.io/openshift-storage operator: Exists tolerations: - key: node-role.kubernetes.io/infra operator: Exists effect: NoSchedule - key: node.ocs.openshift.io/storage operator: Equal value: "true" effect: NoScheduleYAMLEOF27. Install OpenShift Virtualization
Install OpenShift Virtualization and the workload-availability operators that support it.
Note
Automation reference:
playbooks/day2/openshift-post-install-virtualization.yml, role
openshift_post_install_virtualization.
Install CNV, set its default storage class, and install the workload availability operators.
# Install OpenShift Virtualization and the recovery operators.ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 <<'EOF'export KUBECONFIG=/var/tmp/ocp-kubeconfigoc create namespace openshift-cnv || truecat <<'YAML' | oc apply -f -apiVersion: operators.coreos.com/v1kind: OperatorGroupmetadata: name: kubevirt-hyperconverged namespace: openshift-cnvspec: targetNamespaces: - openshift-cnvYAMLcat <<'YAML' | oc apply -f -apiVersion: operators.coreos.com/v1alpha1kind: Subscriptionmetadata: name: kubevirt-hyperconverged namespace: openshift-cnvspec: channel: stable installPlanApproval: Automatic name: kubevirt-hyperconverged source: cs-redhat-operator-index-v4-20 sourceNamespace: openshift-marketplaceYAMLoc wait --for=condition=Established crd/hyperconvergeds.hco.kubevirt.io --timeout=20moc annotate storageclass ocs-storagecluster-ceph-rbd \ storageclass.kubevirt.io/is-default-virt-class=true --overwritecat <<'YAML' | oc apply -f -apiVersion: hco.kubevirt.io/v1beta1kind: HyperConvergedmetadata: name: kubevirt-hyperconverged namespace: openshift-cnvspec: vmStateStorageClass: ocs-storagecluster-ceph-rbdYAMLoc create namespace openshift-workload-availability || truecat <<'YAML' | oc apply -f -apiVersion: operators.coreos.com/v1kind: OperatorGroupmetadata: name: openshift-workload-availability namespace: openshift-workload-availabilityspec: targetNamespaces: []YAMLcat <<'YAML' | oc apply -f -apiVersion: operators.coreos.com/v1alpha1kind: Subscriptionmetadata: name: node-healthcheck-operator namespace: openshift-workload-availabilityspec: channel: stable installPlanApproval: Automatic name: node-healthcheck-operator source: cs-redhat-operator-index-v4-20 sourceNamespace: openshift-marketplace---apiVersion: operators.coreos.com/v1alpha1kind: Subscriptionmetadata: name: fence-agents-remediation namespace: openshift-workload-availabilityspec: channel: stable installPlanApproval: Automatic name: fence-agents-remediation source: cs-redhat-operator-index-v4-20 sourceNamespace: openshift-marketplaceYAMLEOF28. Install The Web Terminal
Install the Web Terminal operator, build the custom tooling image, and point the devworkspace template at that image.
Note
Automation reference: playbooks/day2/openshift-post-install-web-terminal.yml,
role openshift_post_install_web_terminal.
Install the operator, build the custom tooling image in the mirror registry, and patch the Web Terminal tooling template to use it.
# Install the Web Terminal operator.ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 <<'EOF'export KUBECONFIG=/var/tmp/ocp-kubeconfigcat <<'YAML' | oc apply -f -apiVersion: operators.coreos.com/v1alpha1kind: Subscriptionmetadata: name: web-terminal namespace: openshift-operatorsspec: channel: fast installPlanApproval: Automatic name: web-terminal source: cs-redhat-operator-index-v4-20 sourceNamespace: openshift-marketplaceYAMLEOFBuild and push the tooling image from the mirror registry host.
# Build and push the web terminal tooling image.ssh -i /opt/openshift/secrets/hypervisor-admin.key cloud-user@172.16.0.20 <<'EOF'sudo -imkdir -p /var/tmp/web-terminal-toolingcat <<'CONTAINERFILE' >/var/tmp/web-terminal-tooling/ContainerfileFROM registry.redhat.io/web-terminal/web-terminal-tooling-rhel9:latestRUN microdnf install -y \ bind-utils \ iperf3 \ iproute \ iputils \ jq \ nmap-ncat \ openldap-clients \ procps-ng \ traceroute && \ microdnf clean allCONTAINERFILEpodman build -t mirror-registry.workshop.lan:8443/init/web-terminal-tooling-custom:latest \ /var/tmp/web-terminal-toolingpodman push mirror-registry.workshop.lan:8443/init/web-terminal-tooling-custom:latestEOFPatch the pull secret and the terminal tooling template.
# Patch the cluster pull secret and the web terminal tooling template.ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 <<'EOF'export KUBECONFIG=/var/tmp/ocp-kubeconfigREGISTRY_AUTH="$(printf '%s' 'init:<lab-default-password>' | base64 -w0)"oc extract secret/pull-secret -n openshift-config --to=/tmp/pull-secret --confirmcat /tmp/pull-secret/.dockerconfigjson | jq --arg auth "${REGISTRY_AUTH}" \ '.auths["mirror-registry.workshop.lan:8443"] = {"auth":$auth,"email":"init@workshop.lan"}' \ >/tmp/dockerconfigjsonoc set data secret/pull-secret -n openshift-config \ .dockerconfigjson="$(cat /tmp/dockerconfigjson)"cat <<'YAML' | oc apply -f -apiVersion: workspace.devfile.io/v1alpha2kind: DevWorkspaceTemplatemetadata: name: web-terminal-tooling namespace: openshift-operatorsspec: components: - name: web-terminal-tooling container: image: mirror-registry.workshop.lan:8443/init/web-terminal-tooling-custom:latestYAMLoc -n openshift-terminal delete devworkspace --all || trueEOF29. Install Network Observability And Loki
Install Network Observability and Loki, then create the ODF-backed
FlowCollector and LokiStack resources.
Note
Automation reference: playbooks/day2/openshift-post-install-netobserv.yml,
role openshift_post_install_netobserv.
Install the operators, create an ODF-backed LokiStack, and create a tuned
FlowCollector.
# Install the Network Observability operators.ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 <<'EOF'export KUBECONFIG=/var/tmp/ocp-kubeconfigoc create namespace netobserv || trueoc create namespace openshift-netobserv-operator || trueoc create namespace openshift-operators-redhat || truecat <<'YAML' | oc apply -f -apiVersion: operators.coreos.com/v1kind: OperatorGroupmetadata: name: openshift-netobserv-operator namespace: openshift-netobserv-operatorspec: targetNamespaces: []---apiVersion: operators.coreos.com/v1kind: OperatorGroupmetadata: name: openshift-operators-redhat namespace: openshift-operators-redhatspec: targetNamespaces: []YAMLcat <<'YAML' | oc apply -f -apiVersion: operators.coreos.com/v1alpha1kind: Subscriptionmetadata: name: netobserv-operator namespace: openshift-netobserv-operatorspec: channel: stable installPlanApproval: Automatic name: netobserv-operator source: cs-redhat-operator-index-v4-20 sourceNamespace: openshift-marketplace---apiVersion: operators.coreos.com/v1alpha1kind: Subscriptionmetadata: name: loki-operator namespace: openshift-operators-redhatspec: channel: stable-6.2 installPlanApproval: Automatic name: loki-operator source: cs-redhat-operator-index-v4-20 sourceNamespace: openshift-marketplaceYAMLEOFCreate the ODF-backed object bucket, derive the Loki object storage
secret from the generated NooBaa credentials, and then apply LokiStack and
FlowCollector.
For external ODF profiles, make sure the NetObserv storage and scheduling values come from the effective ODF variables. External mode does not create internal storage nodes, so inherited internal-ODF node selectors or tolerations should be cleared unless the profile deliberately adds equivalent labels and taints.
# Create the object bucket, derive the Loki credentials, and deploy Network Observability.ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 <<'EOF'export KUBECONFIG=/var/tmp/ocp-kubeconfigcat <<'YAML' | oc apply -f -apiVersion: objectbucket.io/v1alpha1kind: ObjectBucketClaimmetadata: name: netobserv-loki namespace: netobservspec: generateBucketName: netobserv-loki- storageClassName: openshift-storage.noobaa.ioYAMLuntil [ "$(oc get obc netobserv-loki -n netobserv -o jsonpath='{.status.phase}' 2>/dev/null)" = "Bound" ]; do sleep 5doneaccess_key="$(oc get secret netobserv-loki -n netobserv -o jsonpath='{.data.AWS_ACCESS_KEY_ID}' | base64 -d)"secret_key="$(oc get secret netobserv-loki -n netobserv -o jsonpath='{.data.AWS_SECRET_ACCESS_KEY}' | base64 -d)"bucket_name="$(oc get configmap netobserv-loki -n netobserv -o jsonpath='{.data.BUCKET_NAME}')"bucket_host="$(oc get configmap netobserv-loki -n netobserv -o jsonpath='{.data.BUCKET_HOST}')"bucket_port="$(oc get configmap netobserv-loki -n netobserv -o jsonpath='{.data.BUCKET_PORT}')"cat <<YAML | oc apply -f -apiVersion: v1kind: Secretmetadata: name: loki-object-storage namespace: netobservstringData: bucketnames: ${bucket_name} endpoint: https://${bucket_host}:${bucket_port} access_key_id: ${access_key} access_key_secret: ${secret_key} region: us-east-1type: OpaqueYAMLcat <<'YAML' | oc apply -f -apiVersion: loki.grafana.com/v1kind: LokiStackmetadata: name: netobserv-loki namespace: netobservspec: size: 1x.extra-small storage: schemas: - effectiveDate: "2024-01-01" version: v13 secret: name: loki-object-storage type: s3 storageClassName: ocs-storagecluster-ceph-rbd tenants: mode: openshift-networkYAMLcat <<'YAML' | oc apply -f -apiVersion: flows.netobserv.io/v1beta2kind: FlowCollectormetadata: name: clusterspec: namespace: netobserv deploymentModel: Service agent: type: eBPF ebpf: sampling: 100 privileged: true features: - PacketDrop - DNSTracking - FlowRTT - NetworkEvents - PacketTranslation excludeInterfaces: - lo processor: consumerReplicas: 1 subnetLabels: openShiftAutoDetect: true customLabels: - name: EXT:management cidrs: [172.16.0.0/24] - name: EXT:data300 cidrs: [172.16.20.0/24] - name: EXT:data301 cidrs: [172.16.21.0/24] - name: EXT:isolated302 cidrs: [172.16.22.0/24] metrics: disableAlerts: false consolePlugin: enable: true networkPolicy: enable: true prometheus: querier: enable: true loki: enable: true mode: LokiStack lokiStack: name: netobserv-loki namespace: netobservYAMLEOF30. Install Ansible Automation Platform
Install Ansible Automation Platform on OpenShift and configure it to authenticate through Keycloak OIDC backed by IdM.
Note
Automation reference: playbooks/day2/openshift-post-install-aap.yml, role
openshift_post_install_aap.
Architecture reference: AUTH MODEL.
Install AAP on OpenShift and wire it to the same Keycloak realm already used for the cluster OAuth path.
Use the effective ODF RBD storage class for the AAP database. Internal ODF
usually resolves to ocs-storagecluster-ceph-rbd; external ODF may resolve to
an imported class such as ocs-external-storagecluster-ceph-rbd.
# Install Ansible Automation Platform and its operator dependencies.ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 <<'EOF'export KUBECONFIG=/var/tmp/ocp-kubeconfigoc create namespace aap || truecat <<'YAML' | oc apply -f -apiVersion: operators.coreos.com/v1kind: OperatorGroupmetadata: name: aap namespace: aapspec: targetNamespaces: - aap---apiVersion: operators.coreos.com/v1alpha1kind: Subscriptionmetadata: name: ansible-automation-platform-operator namespace: aapspec: channel: stable-2.6 installPlanApproval: Automatic name: ansible-automation-platform-operator source: cs-redhat-operator-index-v4-20 sourceNamespace: openshift-marketplaceYAMLcat <<'YAML' | oc apply -f -apiVersion: v1kind: Secretmetadata: name: workshop-aap-admin-password namespace: aapstringData: password: <lab-default-password>---apiVersion: v1kind: Secretmetadata: name: workshop-aap-idm-ca namespace: aapstringData: bundle-ca.crt: |YAMLEOFAppend the IdM CA and create the AAP instance.
# Install the IdM CA into the cluster for AAP trust.ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 \ "curl -fsSL http://172.16.0.10/ipa/config/ca.crt >>/tmp/aap-idm-ca.yaml"ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 <<'EOF'export KUBECONFIG=/var/tmp/ocp-kubeconfigoc apply -f /tmp/aap-idm-ca.yamlcat <<'YAML' | oc apply -f -apiVersion: aap.ansible.com/v1alpha1kind: AnsibleAutomationPlatformmetadata: name: workshop-aap namespace: aapspec: admin_user: admin admin_password_secret: workshop-aap-admin-password postgres_storage_class: ocs-storagecluster-ceph-rbd postgres_storage_requirements: requests: storage: 20GiYAMLEOFConfigure the Keycloak aap client, add the groups and aap audience
protocol mappers, then create the AAP gateway authenticator and superuser map.
The validated clean-build path uses:
- AAP route:
https://aap.apps.ocp.workshop.lan - Keycloak route:
https://sso.apps.ocp.workshop.lan - Keycloak realm:
openshift - AAP client ID:
aap - AAP authenticator name:
Red Hat build of Keycloak - required AAP admin group:
access-openshift-admin
# Configure Keycloak SSO for AAP.ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 <<'EOF'export KUBECONFIG=/var/tmp/ocp-kubeconfigAAP_ROUTE="$(oc -n aap get route workshop-aap -o jsonpath='{.spec.host}')"KEYCLOAK_ROUTE="$(oc -n keycloak get route workshop-keycloak -o jsonpath='{.spec.host}')"KEYCLOAK_ADMIN_TOKEN="$(curl --cacert /etc/ipa/ca.crt -sS \ -X POST https://${KEYCLOAK_ROUTE}/realms/master/protocol/openid-connect/token \ -H 'Content-Type: application/x-www-form-urlencoded' \ --data-urlencode 'grant_type=password' \ --data-urlencode 'client_id=admin-cli' \ --data-urlencode 'username=admin' \ --data-urlencode 'password=<lab-default-password>' | jq -r .access_token)"CLIENT_ID="$(curl --cacert /etc/ipa/ca.crt -sS \ -H \"Authorization: Bearer ${KEYCLOAK_ADMIN_TOKEN}\" \ \"https://${KEYCLOAK_ROUTE}/admin/realms/openshift/clients?clientId=aap\" \ | jq -r '.[0].id')"if [ -z "${CLIENT_ID}" ] || [ "${CLIENT_ID}" = "null" ]; then curl --cacert /etc/ipa/ca.crt -sS \ -H "Authorization: Bearer ${KEYCLOAK_ADMIN_TOKEN}" \ -H 'Content-Type: application/json' \ -X POST "https://${KEYCLOAK_ROUTE}/admin/realms/openshift/clients" \ -d '{ "clientId":"aap", "enabled":true, "protocol":"openid-connect", "publicClient":false, "standardFlowEnabled":true, "directAccessGrantsEnabled":true, "serviceAccountsEnabled":false, "secret":"<lab-default-password>", "redirectUris":[ "https://aap.apps.ocp.workshop.lan/*" ] }' CLIENT_ID="$(curl --cacert /etc/ipa/ca.crt -sS \ -H \"Authorization: Bearer ${KEYCLOAK_ADMIN_TOKEN}\" \ \"https://${KEYCLOAK_ROUTE}/admin/realms/openshift/clients?clientId=aap\" \ | jq -r '.[0].id')"ficurl --cacert /etc/ipa/ca.crt -sS \ -H "Authorization: Bearer ${KEYCLOAK_ADMIN_TOKEN}" \ -H 'Content-Type: application/json' \ -X POST "https://${KEYCLOAK_ROUTE}/admin/realms/openshift/clients/${CLIENT_ID}/protocol-mappers/models" \ -d '{ "name":"groups", "protocol":"openid-connect", "protocolMapper":"oidc-group-membership-mapper", "consentRequired":false, "config":{ "full.path":"false", "id.token.claim":"true", "access.token.claim":"true", "userinfo.token.claim":"true", "claim.name":"groups" } }' || truecurl --cacert /etc/ipa/ca.crt -sS \ -H "Authorization: Bearer ${KEYCLOAK_ADMIN_TOKEN}" \ -H 'Content-Type: application/json' \ -X POST "https://${KEYCLOAK_ROUTE}/admin/realms/openshift/clients/${CLIENT_ID}/protocol-mappers/models" \ -d '{ "name":"aap-audience", "protocol":"openid-connect", "protocolMapper":"oidc-audience-mapper", "consentRequired":false, "config":{ "included.client.audience":"aap", "id.token.claim":"true", "access.token.claim":"true" } }' || trueTOKEN="$(curl -sk -X POST https://${AAP_ROUTE}/api/gateway/v1/token/ \ -H 'Content-Type: application/json' \ -d '{"username":"admin","password":"<lab-default-password>"}' | jq -r .access)"REALM_PUBLIC_KEY="$(curl --cacert /etc/ipa/ca.crt -sS \ "https://${KEYCLOAK_ROUTE}/realms/openshift" | jq -r .public_key)"AAP_AUTH_PAYLOAD="$(jq -n \ --arg public_key "${REALM_PUBLIC_KEY}" ' { name: "Red Hat build of Keycloak", enabled: true, order: 2, type: "ansible_base.authentication.authenticator_plugins.keycloak", configuration: { AUTHORIZATION_URL: "https://sso.apps.ocp.workshop.lan/realms/openshift/protocol/openid-connect/auth", ACCESS_TOKEN_URL: "https://sso.apps.ocp.workshop.lan/realms/openshift/protocol/openid-connect/token", KEY: "aap", SECRET: "<lab-default-password>", PUBLIC_KEY: $public_key, GROUPS_CLAIM: "groups" } }')"curl -sk -X POST https://${AAP_ROUTE}/api/gateway/v1/authenticators/ \ -H "Authorization: Bearer ${TOKEN}" \ -H 'Content-Type: application/json' \ -d "${AAP_AUTH_PAYLOAD}"AUTH_ID="$(curl -sk -H "Authorization: Bearer ${TOKEN}" \ "https://${AAP_ROUTE}/api/gateway/v1/authenticators/" \ | jq -r '.results[] | select(.name=="Red Hat build of Keycloak") | .id')"curl -sk -X POST https://${AAP_ROUTE}/api/gateway/v1/authenticator_maps/ \ -H "Authorization: Bearer ${TOKEN}" \ -H 'Content-Type: application/json' \ -d @- <<JSON{ "name": "access-openshift-admin AAP superuser", "map_type": "is_superuser", "triggers": { "groups": { "has_or": [ "access-openshift-admin" ] } }, "authenticator": ${AUTH_ID}}JSONEOFThe automation writes the full JSON payload and drives this through the API directly; the manual runbook keeps the moving parts visible instead of hiding them in a helper script.
Validate the end state with the same two checkpoints the automation now uses before the final browser-style login proof:
- the AAP UI advertises only the Keycloak SSO entry
- an AD-backed user can obtain an OIDC token for the
aapclient with the expected group claims
If the lab trust path is enabled, the validated user is
ad-ocpadmin@corp.lan. Without AD trust, use the native IdM admin-path user
instead.
# Validate the AAP SSO entry and token flow.ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 <<'EOF'export KUBECONFIG=/var/tmp/ocp-kubeconfigAAP_ROUTE="$(oc -n aap get route workshop-aap -o jsonpath='{.spec.host}')"KEYCLOAK_ROUTE="$(oc -n keycloak get route workshop-keycloak -o jsonpath='{.spec.host}')"curl -sk "https://${AAP_ROUTE}/api/gateway/v1/ui_auth/" | jq .curl --cacert /etc/ipa/ca.crt -sS \ -X POST "https://${KEYCLOAK_ROUTE}/realms/openshift/protocol/openid-connect/token" \ -H 'Content-Type: application/x-www-form-urlencoded' \ --data-urlencode 'client_id=aap' \ --data-urlencode 'client_secret=<lab-default-password>' \ --data-urlencode 'grant_type=password' \ --data-urlencode 'username=ad-ocpadmin@corp.lan' \ --data-urlencode 'password=<lab-default-password>' \ | jq .EOF31. Install OpenShift Pipelines
Install OpenShift Pipelines and prepare the Windows EFI image-build lane.
Note
Automation reference: playbooks/day2/openshift-post-install-pipelines.yml,
role openshift_post_install_pipelines.
Install Tekton, make sure there is a default storage class, and install the Windows EFI installer pipeline.
# Install OpenShift Pipelines and the Windows builder pipeline.ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 <<'EOF'export KUBECONFIG=/var/tmp/ocp-kubeconfigoc annotate storageclass ocs-storagecluster-ceph-rbd \ storageclass.kubernetes.io/is-default-class=true --overwritecat <<'YAML' | oc apply -f -apiVersion: operators.coreos.com/v1alpha1kind: Subscriptionmetadata: name: openshift-pipelines-operator-rh namespace: openshift-operatorsspec: channel: pipelines-1.20 installPlanApproval: Automatic name: openshift-pipelines-operator-rh source: cs-redhat-operator-index-v4-20 sourceNamespace: openshift-marketplaceYAMLoc create namespace windows-image-builder || trueoc adm policy add-role-to-user edit system:serviceaccount:windows-image-builder:pipeline -n windows-image-buildercurl -L \ https://raw.githubusercontent.com/openshift-pipelines/tektoncd-catalog/p/pipelines/windows-efi-installer/4.20.7/windows-efi-installer.yaml \ | oc apply -n windows-image-builder -f -EOF32. Launch A Windows EFI Build
Launch the Windows Server image-build PipelineRun manually after the Pipelines lane is in place.
Note
Automation reference: playbooks/day2/openshift-windows-server-build.yml,
role openshift_windows_server_build.
Set a real Windows ISO URL, then apply the PipelineRun directly.
# Launch the Windows EFI image build.ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 <<'EOF'export KUBECONFIG=/var/tmp/ocp-kubeconfigcat <<'YAML' | oc apply -f -apiVersion: tekton.dev/v1kind: PipelineRunmetadata: name: windows2k22-efi-installer-run namespace: windows-image-builderspec: pipelineRef: name: windows-efi-installer params: - name: winImageDownloadURL value: REPLACE_WITH_WINDOWS_SERVER_ISO_URL - name: acceptEula value: "true" - name: autounattendXMLConfigMapsURL value: https://raw.githubusercontent.com/rh-ecosystem-edge/windows-machine-config-bootstrapper/main/configmaps/ - name: instanceTypeName value: u1.large - name: instanceTypeKind value: VirtualMachineClusterInstancetype - name: preferenceName value: windows.2k22.virtio - name: virtualMachinePreferenceKind value: VirtualMachineClusterPreference - name: autounattendConfigMapName value: windows2k22-autounattend - name: virtioContainerDiskName value: quay.io/kubevirt/virtio-container-disk:centos-stream9 - name: baseDvName value: win2k22 - name: isoDVName value: win2k22-install - name: useBiosMode value: "false" taskRunTemplate: serviceAccountName: pipelineYAMLoc get pipelinerun windows2k22-efi-installer-run -n windows-image-builderEOF33. Pivot OperatorHub To The Disconnected Catalog
Pivot OperatorHub to the disconnected catalogs produced by the mirror phase.
Note
Automation reference:
playbooks/day2/openshift-disconnected-operatorhub.yml, role
openshift_disconnected_operatorhub.
Important
In the automated path this runs before any operator subscriptions
(sections 25-32). If you are walking the manual process in order, you have
already been using the disconnected catalog source names
(cs-redhat-operator-index-v4-20). This section exists for reference and for
rebuilds where the pivot needs to be reapplied. All subsequent
operator installs (sections 25-32) use the disconnected catalog source names
(cs-redhat-operator-index-v4-20, cc-redhat-operator-index-v4-20) instead
of the upstream redhat-operators / community-operators defaults.
Disable the default remote catalogs, merge mirror-registry auth into the
cluster pull secret, attach a dedicated pull secret to the mirrored catalog
pods, and wait for the mirrored sources to become READY.
# Pivot OperatorHub to the disconnected catalog sources.ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 <<'EOF'export KUBECONFIG=/var/tmp/ocp-kubeconfigREGISTRY_HOST="mirror-registry.workshop.lan:8443"REGISTRY_AUTH="$(printf '%s' 'init:<lab-default-password>' | base64 -w0)"cat <<'YAML' | oc apply -f -apiVersion: config.openshift.io/v1kind: OperatorHubmetadata: name: clusterspec: disableAllDefaultSources: trueYAMLfor node in $(oc get nodes -o jsonpath='{.items[*].metadata.name}'); do oc debug "node/${node}" --quiet -- chroot /host getent ahostsv4 mirror-registry.workshop.lan | grep -q '^172.16.0.20\b'doneoc extract secret/pull-secret -n openshift-config --to=/tmp/pull-secret --confirmjq --arg auth "${REGISTRY_AUTH}" '.auths["'"${REGISTRY_HOST}"'"] = {"auth":$auth,"email":"init@workshop.lan"}' /tmp/pull-secret/.dockerconfigjson >/tmp/pull-secret-merged.jsonoc set data secret/pull-secret -n openshift-config --from-file=.dockerconfigjson=/tmp/pull-secret-merged.jsoncat >/tmp/mirror-registry-auth.json <<JSON{ "auths": { "${REGISTRY_HOST}": { "auth": "${REGISTRY_AUTH}", "email": "init@workshop.lan" } }}JSONoc create secret generic mirror-registry-catalog-pull-secret -n openshift-marketplace --type=kubernetes.io/dockerconfigjson --from-file=.dockerconfigjson=/tmp/mirror-registry-auth.json --dry-run=client -o yaml | oc apply -f -for manifest in /opt/openshift/oc-mirror/working-dir/cluster-resources/cs-*.yaml; do oc apply -f "$manifest"donefor catalog in redhat-operators certified-operators community-operators redhat-marketplace; do oc delete catalogsource "$catalog" -n openshift-marketplace --ignore-not-found=truedonefor clustercatalog in openshift-redhat-operators openshift-certified-operators openshift-community-operators openshift-redhat-marketplace; do oc patch clustercatalog "$clustercatalog" --type=merge -p '{"spec":{"availabilityMode":"Unavailable"}}' || truedonefor catalog in cs-redhat-operator-index-v4-20; do oc patch catalogsource "$catalog" -n openshift-marketplace --type=merge -p '{"spec":{"secrets":["mirror-registry-catalog-pull-secret"]}}' oc patch serviceaccount "$catalog" -n openshift-marketplace --type=merge -p '{"imagePullSecrets":[{"name":"mirror-registry-catalog-pull-secret"}]}' oc delete pod -n openshift-marketplace -l "olm.catalogSource=${catalog}" --ignore-not-found=true until [ "$(oc get catalogsource "$catalog" -n openshift-marketplace -o jsonpath='{.status.connectionState.lastObservedState}' 2>/dev/null)" = "READY" ]; do sleep 10 donedonefor pkg in kubernetes-nmstate-operator local-storage-operator; do until [ "$(oc get packagemanifest "$pkg" -n openshift-marketplace -o jsonpath='{.status.catalogSource}' 2>/dev/null)" = "cs-redhat-operator-index-v4-20" ]; do sleep 10 donedoneEOF34. Roll Out An IdM Ingress Certificate
Roll out the IdM-issued wildcard ingress certificate early in day-2 so later work lands on the final TLS state.
Note
Automation reference: playbooks/day2/openshift-post-install-idm-certs.yml,
role openshift_post_install_idm_certs.
Warning
Ordering matters. In the automated path, the IdM ingress cert pivot runs
early (phase 3, after infra conversion but before LDAP). Applying it late
causes extended CO degradation — console CO health checks failed for 28
minutes in the first live run. See issue 44a51e8 in the issues ledger.
The supported certificate customization path is the ingress wildcard, not the cluster API serving certificate.
# Roll out the IdM wildcard ingress certificate.ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 <<'EOF'export KUBECONFIG=/var/tmp/ocp-kubeconfigssh -i /opt/openshift/secrets/hypervisor-admin.key cloud-user@172.16.0.10 <<'INNER'sudo -ikinit admin <<< '<lab-default-password>'ipa dnsrecord-add ocp.workshop.lan apps --a-rec=172.16.10.7 || trueipa service-add HTTP/apps.ocp.workshop.lan || truecat <<'PROFILE' >/root/ocp-wildcard-ingress-profile.cfgauth.instance_id=raCertAuthclassId=caEnrollImpldesc=OpenShift wildcard ingress certificate profileenable=trueenableBy=iparainput.i1.class_id=certReqInputImplinput.i2.class_id=submitterInfoInputImplinput.list=i1,i2name=OpenShift Wildcard Ingress Certificate Enrollmentoutput.list=o1output.o1.class_id=certOutputImplpolicyset.list=serverCertSetpolicyset.serverCertSet.1.constraint.class_id=subjectNameConstraintImplpolicyset.serverCertSet.1.constraint.name=Subject Name Constraintpolicyset.serverCertSet.1.constraint.params.accept=truepolicyset.serverCertSet.1.constraint.params.pattern=CN=[^,]+,.+policyset.serverCertSet.1.default.class_id=subjectNameDefaultImplpolicyset.serverCertSet.1.default.name=Subject Name Defaultpolicyset.serverCertSet.1.default.params.name=CN=$request.req_subject_name.cn$, O=WORKSHOP.LANpolicyset.serverCertSet.10.constraint.class_id=noConstraintImplpolicyset.serverCertSet.10.constraint.name=No Constraintpolicyset.serverCertSet.10.default.class_id=subjectKeyIdentifierExtDefaultImplpolicyset.serverCertSet.10.default.name=Subject Key Identifier Extension Defaultpolicyset.serverCertSet.10.default.params.critical=falsepolicyset.serverCertSet.11.constraint.class_id=noConstraintImplpolicyset.serverCertSet.11.constraint.name=No Constraintpolicyset.serverCertSet.11.default.class_id=userExtensionDefaultImplpolicyset.serverCertSet.11.default.name=User Supplied Extension Defaultpolicyset.serverCertSet.11.default.params.userExtOID=2.5.29.17policyset.serverCertSet.12.constraint.class_id=noConstraintImplpolicyset.serverCertSet.12.constraint.name=No Constraintpolicyset.serverCertSet.12.default.class_id=commonNameToSANDefaultImplpolicyset.serverCertSet.12.default.name=Copy Common Name to Subject Alternative Namepolicyset.serverCertSet.2.constraint.class_id=validityConstraintImplpolicyset.serverCertSet.2.constraint.name=Validity Constraintpolicyset.serverCertSet.2.constraint.params.notAfterCheck=falsepolicyset.serverCertSet.2.constraint.params.notBeforeCheck=falsepolicyset.serverCertSet.2.constraint.params.range=740policyset.serverCertSet.2.default.class_id=validityDefaultImplpolicyset.serverCertSet.2.default.name=Validity Defaultpolicyset.serverCertSet.2.default.params.range=731policyset.serverCertSet.2.default.params.startTime=0policyset.serverCertSet.3.constraint.class_id=keyConstraintImplpolicyset.serverCertSet.3.constraint.name=Key Constraintpolicyset.serverCertSet.3.constraint.params.keyParameters=1024,2048,3072,4096,8192policyset.serverCertSet.3.constraint.params.keyType=RSApolicyset.serverCertSet.3.default.class_id=userKeyDefaultImplpolicyset.serverCertSet.3.default.name=Key Defaultpolicyset.serverCertSet.4.constraint.class_id=noConstraintImplpolicyset.serverCertSet.4.constraint.name=No Constraintpolicyset.serverCertSet.4.default.class_id=authorityKeyIdentifierExtDefaultImplpolicyset.serverCertSet.4.default.name=Authority Key Identifier Defaultpolicyset.serverCertSet.5.constraint.class_id=noConstraintImplpolicyset.serverCertSet.5.constraint.name=No Constraintpolicyset.serverCertSet.5.default.class_id=authInfoAccessExtDefaultImplpolicyset.serverCertSet.5.default.name=AIA Extension Defaultpolicyset.serverCertSet.5.default.params.authInfoAccessADEnable_0=truepolicyset.serverCertSet.5.default.params.authInfoAccessADLocationType_0=URINamepolicyset.serverCertSet.5.default.params.authInfoAccessADLocation_0=http://ipa-ca.workshop.lan/ca/ocsppolicyset.serverCertSet.5.default.params.authInfoAccessADMethod_0=1.3.6.1.5.5.7.48.1policyset.serverCertSet.5.default.params.authInfoAccessCritical=falsepolicyset.serverCertSet.5.default.params.authInfoAccessNumADs=1policyset.serverCertSet.6.constraint.class_id=keyUsageExtConstraintImplpolicyset.serverCertSet.6.constraint.name=Key Usage Extension Constraintpolicyset.serverCertSet.6.constraint.params.keyUsageCritical=truepolicyset.serverCertSet.6.constraint.params.keyUsageCrlSign=falsepolicyset.serverCertSet.6.constraint.params.keyUsageDataEncipherment=truepolicyset.serverCertSet.6.constraint.params.keyUsageDecipherOnly=falsepolicyset.serverCertSet.6.constraint.params.keyUsageDigitalSignature=truepolicyset.serverCertSet.6.constraint.params.keyUsageEncipherOnly=falsepolicyset.serverCertSet.6.constraint.params.keyUsageKeyAgreement=falsepolicyset.serverCertSet.6.constraint.params.keyUsageKeyCertSign=falsepolicyset.serverCertSet.6.constraint.params.keyUsageKeyEncipherment=truepolicyset.serverCertSet.6.constraint.params.keyUsageNonRepudiation=truepolicyset.serverCertSet.6.default.class_id=keyUsageExtDefaultImplpolicyset.serverCertSet.6.default.name=Key Usage Defaultpolicyset.serverCertSet.6.default.params.keyUsageCritical=truepolicyset.serverCertSet.6.default.params.keyUsageCrlSign=falsepolicyset.serverCertSet.6.default.params.keyUsageDataEncipherment=truepolicyset.serverCertSet.6.default.params.keyUsageDecipherOnly=falsepolicyset.serverCertSet.6.default.params.keyUsageDigitalSignature=truepolicyset.serverCertSet.6.default.params.keyUsageEncipherOnly=falsepolicyset.serverCertSet.6.default.params.keyUsageKeyAgreement=falsepolicyset.serverCertSet.6.default.params.keyUsageKeyCertSign=falsepolicyset.serverCertSet.6.default.params.keyUsageKeyEncipherment=truepolicyset.serverCertSet.6.default.params.keyUsageNonRepudiation=truepolicyset.serverCertSet.7.constraint.class_id=noConstraintImplpolicyset.serverCertSet.7.constraint.name=No Constraintpolicyset.serverCertSet.7.default.class_id=extendedKeyUsageExtDefaultImplpolicyset.serverCertSet.7.default.name=Extended Key Usage Extension Defaultpolicyset.serverCertSet.7.default.params.exKeyUsageCritical=falsepolicyset.serverCertSet.7.default.params.exKeyUsageOIDs=1.3.6.1.5.5.7.3.1,1.3.6.1.5.5.7.3.2policyset.serverCertSet.8.constraint.class_id=signingAlgConstraintImplpolicyset.serverCertSet.8.constraint.name=No Constraintpolicyset.serverCertSet.8.constraint.params.signingAlgsAllowed=SHA1withRSA,SHA256withRSA,SHA384withRSA,SHA512withRSA,MD5withRSA,MD2withRSA,SHA1withDSA,SHA1withEC,SHA256withEC,SHA384withEC,SHA512withECpolicyset.serverCertSet.8.default.class_id=signingAlgDefaultImplpolicyset.serverCertSet.8.default.name=Signing Algpolicyset.serverCertSet.8.default.params.signingAlg=-policyset.serverCertSet.9.constraint.class_id=noConstraintImplpolicyset.serverCertSet.9.constraint.name=No Constraintpolicyset.serverCertSet.9.default.class_id=crlDistributionPointsExtDefaultImplpolicyset.serverCertSet.9.default.name=CRL Distribution Points Extension Defaultpolicyset.serverCertSet.9.default.params.crlDistPointsCritical=falsepolicyset.serverCertSet.9.default.params.crlDistPointsEnable_0=truepolicyset.serverCertSet.9.default.params.crlDistPointsIssuerName_0=CN=Certificate Authority,o=ipacapolicyset.serverCertSet.9.default.params.crlDistPointsIssuerType_0=DirectoryNamepolicyset.serverCertSet.9.default.params.crlDistPointsNum=1policyset.serverCertSet.9.default.params.crlDistPointsPointName_0=http://ipa-ca.workshop.lan/ipa/crl/MasterCRL.binpolicyset.serverCertSet.9.default.params.crlDistPointsPointType_0=URINamepolicyset.serverCertSet.9.default.params.crlDistPointsReasons_0=policyset.serverCertSet.list=1,2,3,4,5,6,7,8,9,10,11,12profileId=ocpWildcardIngressvisible=falsePROFILEipa certprofile-show ocpWildcardIngress >/dev/null 2>&1 || \ ipa certprofile-import ocpWildcardIngress \ --file /root/ocp-wildcard-ingress-profile.cfg \ --desc "OpenShift wildcard ingress certificate profile" \ --store=trueINNEREOFopenssl req -new -newkey rsa:2048 -nodes \ -keyout /tmp/apps.key \ -out /tmp/apps.csr \ -subj '/CN=apps.ocp.workshop.lan' \ -addext 'subjectAltName=DNS:apps.ocp.workshop.lan,DNS:*.apps.ocp.workshop.lan'scp -i /opt/openshift/secrets/hypervisor-admin.key \ /tmp/apps.csr cloud-user@172.16.0.10:/tmp/apps.csrssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 <<'EOF'export KUBECONFIG=/var/tmp/ocp-kubeconfigssh -i /opt/openshift/secrets/hypervisor-admin.key cloud-user@172.16.0.10 <<'INNER'sudo -ikinit admin <<< '<lab-default-password>'ipa cert-request /tmp/apps.csr \ --principal=HTTP/apps.ocp.workshop.lan \ --profile-id=ocpWildcardIngress \ --certificate-out=/tmp/apps.crtINNEREOFscp -i /opt/openshift/secrets/hypervisor-admin.key \ cloud-user@172.16.0.10:/tmp/apps.crt /tmp/apps.crtssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 <<'EOF'export KUBECONFIG=/var/tmp/ocp-kubeconfigoc -n openshift-config create configmap idm-ca-trust \ --from-file=ca-bundle.crt=/var/tmp/idm-ca.crt \ --dry-run=client -o yaml | oc apply -f -oc patch proxy cluster --type=merge \ -p '{"spec":{"trustedCA":{"name":"idm-ca-trust"}}}'oc -n openshift-ingress create secret tls ingress-default-idm-tls \ --cert=/var/tmp/apps.crt \ --key=/var/tmp/apps.key \ --dry-run=client -o yaml | oc apply -f -cat <<'YAML' | oc apply -f -apiVersion: operator.openshift.io/v1kind: IngressControllermetadata: name: default namespace: openshift-ingress-operatorspec: defaultCertificate: name: ingress-default-idm-tlsYAMLEOF35. Cleanup
Use cleanup intentionally: either destroy the full lab or, more commonly, destroy only the OpenShift cluster and preserve the healthy support services.
Note
Automation reference: playbooks/maintenance/cleanup.yml and the cleanup
roles it aggregates.
Caution
This is destructive and not reversible. It destroys VMs and wipes disks. The mirror-registry archive and all OpenShift cluster state will be gone. If you only want to rebuild the cluster, preserve support services and use the cluster-only cleanup path instead of the full cleanup.
Important
For a true fresh support-services redeploy, removing the support VMs is not
enough. Also wipe the support guest block devices (/dev/ebs/bastion-01,
/dev/ebs/ad-01, /dev/ebs/idm-01, and /dev/ebs/mirror-registry) before
replaying ./scripts/run_local_playbook.sh
playbooks/site-bootstrap.yml,
otherwise the next run can inherit stale guest state.
Automation shortcut for the preferred fresh-cluster rebuild:
# Destroy only the OpenShift cluster and preserve healthy support services.ansible-playbook -i inventory/hosts.yml playbooks/maintenance/cleanup.yml \ -e cleanup_destroy_openshift_cluster=true \ -e cleanup_wipe_openshift_cluster_block_devices=true./scripts/run_remote_bastion_playbook.sh playbooks/site-lab.yml \ -e lab_default_password='<lab-default-password>'Destroy the OpenShift cluster shells, optionally wipe the disks, and clean up the support VM and lab-switch state.
# Wipe the support-service disks on virt-01.ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 <<'EOF'for domain in \ ocp-master-01.ocp.workshop.lan \ ocp-master-02.ocp.workshop.lan \ ocp-master-03.ocp.workshop.lan \ ocp-infra-01.ocp.workshop.lan \ ocp-infra-02.ocp.workshop.lan \ ocp-infra-03.ocp.workshop.lan \ ocp-worker-01.ocp.workshop.lan \ ocp-worker-02.ocp.workshop.lan \ ocp-worker-03.ocp.workshop.lan; do virsh destroy "$domain" || true virsh undefine "$domain" --nvram || truedonefor disk in \ /dev/ebs/ocp-master-01 /dev/ebs/ocp-master-02 /dev/ebs/ocp-master-03 \ /dev/ebs/ocp-infra-01 /dev/ebs/ocp-infra-02 /dev/ebs/ocp-infra-03 \ /dev/ebs/ocp-worker-01 /dev/ebs/ocp-worker-02 /dev/ebs/ocp-worker-03 \ /dev/ebs/ocp-infra-01-data /dev/ebs/ocp-infra-02-data /dev/ebs/ocp-infra-03-data; do wipefs -a "$disk" || true dd if=/dev/zero of="$disk" bs=1M count=100 oflag=direct,dsync status=progress || truedoneEOFrm -rf /opt/openshift/generated/ocpWhen tearing all the way back to the post-OVS support-services boundary, wipe the support guest block devices too:
# Wipe the support-service disks on virt-01.ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 <<'EOF'for disk in /dev/ebs/bastion-01 /dev/ebs/ad-01 /dev/ebs/idm-01 /dev/ebs/mirror-registry; do wipefs -a "$disk" || true dd if=/dev/zero of="$disk" bs=1M count=100 oflag=direct,dsync status=none || truedoneEOF36. Manual Debugging Examples
These commands are useful when teaching or troubleshooting the manual process.
Check cluster status from the correct side of the network boundary:
# Check the basic cluster state.export KUBECONFIG=/opt/openshift/aws-metal-openshift-demo/generated/ocp/auth/kubeconfig/opt/openshift/aws-metal-openshift-demo/generated/tools/4.20.15/bin/oc get clusterversion/opt/openshift/aws-metal-openshift-demo/generated/tools/4.20.15/bin/oc get nodes/opt/openshift/aws-metal-openshift-demo/generated/tools/4.20.15/bin/oc get coCheck libvirt state on virt-01:
# Inspect the libvirt domain state on virt-01.ssh -i /opt/openshift/secrets/hypervisor-admin.key root@172.16.0.1 \ "virsh list --all"Check ODF storage state:
# Inspect the ODF status.oc -n openshift-storage get storageclusteroc -n openshift-storage get cephclusteroc -n openshift-local-storage get localvolumediscoveryoc -n openshift-local-storage get localvolumesetCheck NetObserv and Loki:
# Inspect the Network Observability status.oc -n netobserv get flowcollectoroc -n netobserv get lokistackoc -n netobserv get podsCheck AAP:
# Inspect the AAP status and login entry.oc -n aap get podsoc -n aap get routecurl -sk https://aap.apps.ocp.workshop.lan/api/gateway/v1/ui_auth/ | jq .Check Tekton and Windows build lane:
# Inspect the Pipelines and Windows builder status.oc -n openshift-pipelines get tektonconfigoc -n windows-image-builder get pipelineoc -n windows-image-builder get pipelinerun