On-Prem Manual Process
Warning
This on-prem target is experimental. Treat the docs and playbooks in this subtree as an emerging alternate installation path that is not yet at the same confidence level as the validated AWS-target flow.
This page covers only the on-prem-specific portion of the operator workflow. Once the host is prepared, guest storage exists, and bastion staging is done, return to the stock AWS MANUAL PROCESS.
Table Of Contents
- 1. Prepare The Operator Workstation
- 2. Verify The On-Prem virt-01 Host Contract
- 3. Prepare The Guest Storage Volume Group
- 4. Configure The On-Prem Inventory And Group Vars
- 5. Bootstrap The Host And Provision Guest LVs
- 6. Build The Bastion And Stage The Project
- 7. Hand Back To The Stock Runbook
- 8. Current External-Ceph Day-2 Continuation
- 9. Clean OpenShift-Only Teardown
1. Prepare The Operator Workstation
The on-prem path uses the same controller-side secret and content inputs as the AWS path, except there is no public-cloud CLI or stack deployment step.
Required local inputs:
- repo checkout
- SSH keypair
- pull secret
- RHSM credentials
- optional local lab credentials
Keep these stock pages nearby:
Install the collection dependencies and syntax-check the on-prem entrypoints:

    cd <project-root>/aws-metal-openshift-demo
    ansible-galaxy collection install -r requirements.yml
    cd <project-root>/on-prem-openshift-demo
    ansible-playbook --syntax-check playbooks/site-bootstrap.yml
    ansible-playbook --syntax-check playbooks/site-precluster.yml
    ansible-playbook --syntax-check playbooks/site-lab.yml

2. Verify The On-Prem virt-01 Host Contract
The on-prem target starts after the host already exists. Before bootstrap,
confirm that host can stand in for virt-01:
- SSH reachable from the operator workstation
- RHEL installed and updated to the desired baseline
- nested KVM available
- an uplink interface is present for OVS integration
- local storage is visible and the guest VG exists or can be created
- you know which bastion-side host and user you want to publish through:
  - on_prem_bastion_hypervisor_host
  - on_prem_bastion_hypervisor_user
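One item in the list above, nested KVM, can also be checked directly from the kernel module parameter. The following is a minimal sketch, not part of the repo: the function name `nested_kvm_enabled` is hypothetical, and it assumes that "nested KVM available" means the `nested` parameter of the loaded KVM module reads `Y` or `1`.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): report whether nested virtualization is
# enabled, given the path to a KVM module's "nested" parameter file.
# Return codes: 0 = enabled, 1 = disabled, 2 = parameter file not readable.
nested_kvm_enabled() {
  local param_file="$1"
  local value
  # A missing file usually means that KVM module is not loaded.
  [ -r "$param_file" ] || return 2
  value="$(tr -d '[:space:]' < "$param_file")"
  case "$value" in
    Y|y|1) return 0 ;;  # kernel reports nested as enabled
    *)     return 1 ;;  # N, 0, or anything unexpected
  esac
}

# Example use on the hypervisor (Intel or AMD, whichever module is loaded):
#   nested_kvm_enabled /sys/module/kvm_intel/parameters/nested
#   nested_kvm_enabled /sys/module/kvm_amd/parameters/nested
```

Passing the path as an argument keeps the check usable on both Intel and AMD hosts and makes it easy to exercise against a scratch file.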
Minimal verification:

    ssh <hypervisor-admin-user>@<hypervisor-management-ip> <<'EOF'
    hostnamectl --static
    sudo virt-host-validate
    sudo lsblk
    sudo vgs
    EOF

If CPU, RAM, or NUMA shape differs from the validated m5.metal baseline,
read the host-sizing guidance before bootstrap:
3. Prepare The Guest Storage Volume Group
On-prem guest disks are created as logical volumes inside an operator-provided volume group.
The current lab footprint expects roughly:
- 5950 GiB of guest LV capacity for the full current design
That is only the raw guest-disk sum. Leave additional headroom for:
- host root growth
- image cache
- mirror-content staging
- rebuild hygiene
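The budget above can be sketched as simple arithmetic. This is an illustration only: the function name `storage_budget_ok` and the 500 GiB headroom figure are hypothetical, and it assumes the headroom is drawn from the same pool of storage as the guest LVs, which may not match your site layout.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): check that available capacity covers the
# guest LV sum plus a site-chosen headroom. All values are whole GiB.
storage_budget_ok() {
  local available_gib="$1" guest_gib="$2" headroom_gib="$3"
  local needed_gib=$(( guest_gib + headroom_gib ))
  if [ "$available_gib" -ge "$needed_gib" ]; then
    echo "ok: ${available_gib} GiB available, ${needed_gib} GiB needed"
  else
    echo "short by $(( needed_gib - available_gib )) GiB"
    return 1
  fi
}

# Example use on the hypervisor. vgs prints a decimal value, so the
# fractional part is dropped before the integer comparison:
#   free="$(sudo vgs --noheadings --units g --nosuffix -o vg_free <vg-name> | tr -d ' ')"
#   storage_budget_ok "${free%.*}" 5950 500
```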
Example check:

    ssh <hypervisor-admin-user>@<hypervisor-management-ip> <<'EOF'
    sudo vgs
    sudo lvs
    EOF

If the volume group does not exist yet, create it before bootstrap using your site-local storage procedure. This repo does not own PV creation.
What the repo does own:
- validating the volume group exists
- validating free space before provisioning
- creating the missing guest LVs
- publishing the /dev/ebs/* compatibility symlinks the stock guest roles use
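After the repo has published the compatibility symlinks, a quick way to confirm that each one resolves is to look for dangling links. The sketch below is not the repo's own validation; the function name `dangling_links` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): print every symlink in a directory whose
# target does not resolve. Prints nothing when all links are healthy.
dangling_links() {
  local dir="$1" link
  for link in "$dir"/*; do
    # -L: the entry is a symlink; ! -e: its target does not exist
    if [ -L "$link" ] && [ ! -e "$link" ]; then
      echo "$link"
    fi
  done
}

# Example use on the hypervisor:
#   dangling_links /dev/ebs
```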
If you want the on-prem subtree to seed a dedicated guest VG from one explicit lab disk, use the optional override inputs:

    on_prem_lvm_seed_enabled: true
    on_prem_lvm_seed_device: /dev/nvme0n1
    on_prem_lvm_seed_force: false

That path is opt-in and additive. It does not change the stock on-prem defaults, and it fails closed unless you explicitly enable it. When forced, it uses the same destructive whole-device wipe profile the project uses for ODF backing-disk recovery before creating the guest VG.
4. Configure The On-Prem Inventory And Group Vars
The current on-prem target keeps the stock aws_metal inventory-group name on
purpose so the existing support/day-2 playbooks do not need to fork.
Edit these files before the first run:
For hosts that should stop before cluster build, start from one of:
- inventory/overrides/core-services.yml.example
- inventory/overrides/core-services-ad.yml.example
- inventory/overrides/core-services-ad-128g.yml.example
- inventory/overrides/precluster-64g.yml.example
For the current cluster-capable external Ceph profile, start from:
Read OVERRIDE MECHANISM before copying or publishing that file. It explains the phase toggles, external ODF payload, storage class indirection, and resource sizing assumptions.
Use core-services-ad-128g.yml.example for the current ~128 GiB host class
when you want the core-services+AD footprint plus a managed 32G zram
writeback LV in calabi_lab_vg.
That override also enables a conservative periodic zram writeback policy:
- backing LV: calabi_lab_vg/zram-writeback
- backing LV size: 32G
- policy mode: huge
- timer interval: 30m
- per-run budget: 256 MiB
Use that profile as the current reference for the on-prem writeback-capable host-memory policy.
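To see why this policy counts as conservative, it helps to work the numbers. The sketch below assumes that "32G" means 32 GiB and that every timer run uses its full budget, which is a worst case; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helpers): worst-case writeback arithmetic.

# Whole hours needed to fill the backing LV if every run spends its budget.
writeback_fill_hours() {
  local lv_gib="$1" budget_mib="$2" interval_min="$3"
  local runs=$(( lv_gib * 1024 / budget_mib ))  # runs needed to fill the LV
  echo $(( runs * interval_min / 60 ))
}

# Maximum MiB written back per day at the given budget and interval.
writeback_daily_mib() {
  local budget_mib="$1" interval_min="$2"
  echo $(( 24 * 60 / interval_min * budget_mib ))
}

writeback_fill_hours 32 256 30   # prints 64
writeback_daily_mib 256 30       # prints 12288
```

Under those assumptions the policy writes back at most 12 GiB per day and needs at least 64 hours to fill the 32 GiB backing LV.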
What must be correct:
- ansible_host
- ansible_user
- ansible_ssh_private_key_file
- on_prem_lvm_volume_group
- on_prem_bastion_hypervisor_host
- on_prem_bastion_hypervisor_user
- any optional on_prem_lvm_lv_name_prefix
- any project-local credential overrides
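A rough pre-run check for the required names above can be done with grep. This is a sketch under stated assumptions: the function name `missing_required_vars` is hypothetical, the file is assumed to be YAML with one `name:` key per line, and the check only confirms a key is present, not that its value is correct.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): print each required variable that has no
# "name:" line in the given YAML vars file. Prints nothing when all exist.
missing_required_vars() {
  local vars_file="$1" name
  for name in ansible_host ansible_user ansible_ssh_private_key_file \
              on_prem_lvm_volume_group on_prem_bastion_hypervisor_host \
              on_prem_bastion_hypervisor_user; do
    # Allow leading indentation so nested host vars are also found.
    grep -Eq "^[[:space:]]*${name}:" "$vars_file" || echo "$name"
  done
}

# Example use against whichever inventory or group-vars file holds the values:
#   missing_required_vars <path-to-vars-file>
```

If the values are spread across several files, run the check on each file and treat a name as missing only when no file defines it.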
If you are using the reduced precluster-64g profile, copy the example
override and edit the actual device path before the first run:

    cd <project-root>/on-prem-openshift-demo
    cp inventory/overrides/precluster-64g.yml.example \
      inventory/overrides/precluster-64g.yml

At this stage, the on-prem subtree reuses the stock guest and day-2 vars and
playbooks from aws-metal-openshift-demo through local wrappers. It does not
modify the AWS-target codepath.
5. Bootstrap The Host And Provision Guest LVs
This is the main on-prem divergence from the AWS path.
Run:

    cd <project-root>/on-prem-openshift-demo
    ./scripts/run_local_playbook.sh playbooks/bootstrap/site.yml

For the reduced pre-cluster profile:

    cd <project-root>/on-prem-openshift-demo
    ./scripts/run_local_playbook.sh playbooks/bootstrap/site.yml \
      -e @inventory/overrides/precluster-64g.yml

For the support-services-only AD profile:

    cd <project-root>/on-prem-openshift-demo
    ./scripts/run_local_playbook.sh playbooks/bootstrap/site.yml \
      -e @inventory/overrides/core-services-ad.yml.example

For the current ~128 GiB host class with managed zram writeback:

    cd <project-root>/on-prem-openshift-demo
    ./scripts/run_local_playbook.sh playbooks/bootstrap/site.yml \
      -e @inventory/overrides/core-services-ad-128g.yml.example

This is the on-prem equivalent of the early AWS host steps:
- host base configuration
- host CPU and memory policy
- OVS / libvirt host setup
- guest base-image staging
- LVM guest LV validation and creation
- /dev/ebs/* compatibility symlink publication
Note
The shared host bootstrap now updates redhat-release before the full system
update. This ensures the current Red Hat Post-Quantum Cryptography public
keys are present before DNF validates newer packages. See:
https://access.redhat.com/solutions/3449341
When it succeeds, the host should satisfy the same effective guest-disk contract the stock guest roles already expect.
Useful verification:

    ssh <hypervisor-admin-user>@<hypervisor-management-ip> <<'EOF'
    sudo lvs
    sudo ls -l /dev/ebs
    EOF

6. Build The Bastion And Stage The Project
The current on-prem site-bootstrap.yml:
- runs the on-prem bootstrap host prep
- applies the baseline host memory oversubscription policy (zram, THP madvise, KSM) during the host bootstrap path
- can optionally provision a dedicated zram writeback LV and policy timer when an override such as core-services-ad-128g.yml.example enables it
- reuses the stock bastion build
- stages both the on-prem subtree and the stock AWS-target subtree onto bastion through the local on-prem bastion-stage wrapper
- rewrites the bastion-side runtime inventory so the bastion can SSH back to the hypervisor without requiring ec2-user
Run:

    cd <project-root>/on-prem-openshift-demo
    ./scripts/run_local_playbook.sh playbooks/site-bootstrap.yml

For the reduced pre-cluster profile:

    cd <project-root>/on-prem-openshift-demo
    ./scripts/run_local_playbook.sh playbooks/site-bootstrap.yml \
      -e @inventory/overrides/precluster-64g.yml

For the support-services-only AD profile:

    cd <project-root>/on-prem-openshift-demo
    ./scripts/run_local_playbook.sh playbooks/site-bootstrap.yml \
      -e @inventory/overrides/core-services-ad.yml.example

For the current ~128 GiB host class with managed zram writeback:

    cd <project-root>/on-prem-openshift-demo
    ./scripts/run_local_playbook.sh playbooks/site-bootstrap.yml \
      -e @inventory/overrides/core-services-ad-128g.yml.example

After this, the bastion should exist and the project should be staged.
Writeback caveats on the on-prem path:
- the managed-LVM writeback path assumes the local calabi_lab_vg volume group
- the writeback LV must be dedicated to zram and is not counted as planned RAM
- the role fails fast if the configured writeback LV already exists at a different size
- the shipped policy uses huge, which writes back pages that did not compress well. For broader cold-page relief, huge_idle also sweeps idle compressible pages but requires kernel age-tracking support
7. Hand Back To The Stock Runbook
At this point, the on-prem-specific portion is over.
Choose the next stock runbook entry based on what you already completed:
- if you want the manual bastion-forward process:
- if you are still walking the support-services flow by hand from the bastion build:
For automation rather than the hand-run sequence on a cluster-capable host, use:

    cd <project-root>/on-prem-openshift-demo
    ./scripts/run_remote_bastion_playbook.sh playbooks/site-lab.yml \
      -e @inventory/overrides/core-services-ad-plus-openshift-3node-external-ceph.yml.example

For support-services-only profiles such as core-services or
core-services-ad, stop after the support-service path instead of continuing
into cluster build:

    cd <project-root>/on-prem-openshift-demo
    ./scripts/run_remote_bastion_playbook.sh playbooks/site-precluster.yml \
      -e @inventory/overrides/core-services-ad.yml.example

Or, from the staged on-prem tree on bastion:

    cd <staged-on-prem-project-root>
    ./scripts/run_bastion_playbook.sh playbooks/site-precluster.yml \
      -e @inventory/overrides/core-services-ad.yml.example

That path stops after:
- optional ad-server
- idm
- optional idm-ad-trust
- bastion-join
- mirror-registry
For the reduced precluster-64g profile, also stop at mirror-registry instead
of continuing into cluster build:

    cd <staged-on-prem-project-root>
    ./scripts/run_bastion_playbook.sh playbooks/site-precluster.yml \
      -e @inventory/overrides/precluster-64g.yml

8. Current External-Ceph Day-2 Continuation
If the OpenShift cluster exists and the support services are already healthy,
continue at the shared day-2 orchestration instead of replaying the full
site-lab.yml chain:

    cd <project-root>/on-prem-openshift-demo
    ./scripts/run_remote_bastion_playbook.sh \
      ../aws-metal-openshift-demo/playbooks/day2/openshift-post-install.yml \
      -e @inventory/overrides/core-services-ad-plus-openshift-3node-external-ceph.yml.example

The current external-Ceph profile intentionally enables disconnected OperatorHub, IdM ingress certs, breakglass auth, NMState, external ODF, Keycloak, OIDC auth, Web Terminal, AAP, NetObserv, and validation. It disables infra conversion, internal ODF, LDAP auth, OpenShift Virtualization, and Pipelines.
Leave the force_* variables false for normal continuation. Set a force flag
only when you are deliberately repairing or replacing that phase.
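Before a normal continuation run, it can be worth confirming that no force flag was left enabled in the override file. The sketch below is an assumption-laden illustration: the function name `enabled_force_flags` is hypothetical, it only inspects one YAML file, and it matches only the literal lowercase value `true`, so flags passed with `-e` on the command line or written as `yes` are not seen.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): list force_* variables that an override
# file sets to true. Prints nothing when every force flag is false or absent.
enabled_force_flags() {
  local override_file="$1"
  # Match "force_<name>: true" with optional indentation; print the name only.
  sed -nE 's/^[[:space:]]*(force_[A-Za-z0-9_]+):[[:space:]]*true[[:space:]]*$/\1/p' \
    "$override_file"
}

# Example use:
#   enabled_force_flags inventory/overrides/core-services-ad-plus-openshift-3node-external-ceph.yml.example
```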
9. Clean OpenShift-Only Teardown
To tear down a broken OpenShift cluster while preserving healthy support services:

    cd <project-root>/on-prem-openshift-demo
    ./scripts/run_local_playbook.sh \
      ../aws-metal-openshift-demo/playbooks/maintenance/cleanup.yml \
      -e cleanup_destroy_openshift_cluster=true \
      -e cleanup_wipe_openshift_cluster_block_devices=true \
      -e @inventory/overrides/core-services-ad-plus-openshift-3node-external-ceph.yml.example

That path removes the OpenShift cluster domains, wipes the cluster block
devices when requested, and preserves AD, IdM, bastion, and mirror-registry.
After cleanup, rerun site-lab.yml with the same override to rebuild the
cluster and continue through the mirror, install, and day-2 flow.