Experimental Path

Manual analog for the on-prem branch before the normal Calabi flow resumes.

Experimental. The on-prem path is an unvalidated developer sandbox. For the supported deployment, use the AWS docs map and follow the golden path from there.

On-Prem Manual Process

Warning

This on-prem target is experimental. Treat the docs and playbooks in this subtree as an emerging alternate installation path, not yet at the same confidence level as the validated AWS-target flow.

This page covers only the on-prem-specific portion of the operator workflow. Once the host is prepared, guest storage exists, and bastion staging is done, return to the stock AWS MANUAL PROCESS.

1. Prepare The Operator Workstation

The on-prem path uses the same controller-side secret and content inputs as the AWS path, except that there is no public-cloud CLI or stack-deployment step.

Required local inputs:

  • repo checkout
  • SSH keypair
  • pull secret
  • RHSM credentials
  • optional local lab credentials

Keep these stock pages nearby:

Install the collection dependencies and syntax-check the on-prem entrypoints:

Shell
cd <project-root>/aws-metal-openshift-demo
ansible-galaxy collection install -r requirements.yml
cd <project-root>/on-prem-openshift-demo
ansible-playbook --syntax-check playbooks/site-bootstrap.yml
ansible-playbook --syntax-check playbooks/site-precluster.yml
ansible-playbook --syntax-check playbooks/site-lab.yml

2. Verify The On-Prem virt-01 Host Contract

The on-prem target starts after the host already exists. Before bootstrap, confirm that host can stand in for virt-01:

  • SSH reachable from the operator workstation
  • RHEL installed and updated to the desired baseline
  • nested KVM available
  • an uplink interface is present for OVS integration
  • local storage is visible and the guest VG exists or can be created
  • you have decided which hypervisor host and user to publish into the bastion-side inventory, so the bastion can SSH back to the hypervisor:
    • on_prem_bastion_hypervisor_host
    • on_prem_bastion_hypervisor_user
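For the nested-KVM item, one quick host-side check is the nested parameter that the kvm_intel or kvm_amd module exposes under /sys/module. A minimal bash sketch, assuming an x86 host with one of those modules loaded; the helper name nested_kvm_enabled is hypothetical:

```shell
# Hypothetical helper: succeed when a KVM module "nested" parameter file
# reports nested virtualization as enabled (Y, y, or 1), fail otherwise.
nested_kvm_enabled() {
  local param_file="$1"
  [ -r "$param_file" ] || return 1
  case "$(cat "$param_file")" in
    Y | y | 1) return 0 ;;
    *) return 1 ;;
  esac
}

# Run on the hypervisor: at most one of these files exists, depending on CPU vendor.
for param in /sys/module/kvm_intel/parameters/nested /sys/module/kvm_amd/parameters/nested; do
  if nested_kvm_enabled "$param"; then
    echo "nested KVM enabled via $param"
  fi
done
```

No output on the hypervisor means neither module reports nested virtualization as enabled.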

Minimal verification:

Shell
ssh <hypervisor-admin-user>@<hypervisor-management-ip> <<'EOF'
hostnamectl --static
sudo virt-host-validate
sudo lsblk
sudo vgs
EOF
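The minimal verification above does not report the host's CPU, memory, or NUMA shape, which is what gets compared against the validated m5.metal baseline (an m5.metal instance exposes 96 vCPUs and 384 GiB of memory). A minimal bash sketch that prints those three numbers on one line, assuming a Linux host with /proc and /sys mounted; host_shape is a hypothetical helper name:

```shell
# Hypothetical helper: print this host's CPU, memory, and NUMA shape on one line.
host_shape() {
  local cpus mem_kib mem_gib numa_nodes
  cpus=$(nproc)
  # MemTotal is reported in KiB and sits a little below installed RAM.
  mem_kib=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
  mem_gib=$(( mem_kib / 1024 / 1024 ))
  # One nodeN directory per NUMA node; 0 if the kernel exposes none.
  numa_nodes=$(( $(find /sys/devices/system/node -maxdepth 1 -name 'node[0-9]*' 2>/dev/null | wc -l) ))
  echo "cpus=${cpus} mem_gib=${mem_gib} numa_nodes=${numa_nodes}"
}

host_shape
```

Because MemTotal excludes memory the kernel reserves, expect a 384 GiB host to report somewhat less than 384. To run it remotely without copying a script, one option is: ssh <hypervisor-admin-user>@<hypervisor-management-ip> "$(declare -f host_shape); host_shape"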

If CPU, RAM, or NUMA shape differs from the validated m5.metal baseline, read the host-sizing guidance before bootstrap:

3. Prepare The Guest Storage Volume Group

On-prem guest disks are created as logical volumes inside an operator-provided volume group.

The current lab footprint expects roughly:

  • 5950 GiB of guest LV capacity for the full current design

That is only the raw guest-disk sum. Leave additional headroom for:

  • host root growth
  • image cache
  • mirror-content staging
  • rebuild hygiene

Example check:

Shell
ssh <hypervisor-admin-user>@<hypervisor-management-ip> <<'EOF'
sudo vgs
sudo lvs
EOF
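The vgs output can be turned into a pass/fail check against the roughly 5950 GiB guest-disk sum plus headroom. A minimal bash sketch, assuming a fresh guest VG with no guest LVs created yet (once LVs exist, free space no longer reflects the full requirement); the helper name and the 15 percent headroom in the usage comment are hypothetical placeholders, not project values:

```shell
# Hypothetical helper: succeed when a VG's free space (GiB, as printed by
# "vgs --units g --nosuffix") covers the guest-disk sum plus a headroom percentage.
guest_vg_fits() {
  local free_gib="$1" required_gib="$2" headroom_pct="$3" needed_gib
  free_gib="${free_gib//[[:space:]]/}"  # vgs pads its output with spaces
  free_gib="${free_gib%%.*}"            # drop the fractional part (rounds down)
  needed_gib=$(( required_gib + required_gib * headroom_pct / 100 ))
  echo "free=${free_gib}GiB needed=${needed_gib}GiB"
  [ "$free_gib" -ge "$needed_gib" ]
}

# Usage (LVM lowercase "g" units are GiB; <guest-vg> is your VG name):
#   free=$(ssh <hypervisor-admin-user>@<hypervisor-management-ip> \
#     sudo vgs --noheadings --units g --nosuffix -o vg_free <guest-vg>)
#   guest_vg_fits "$free" 5950 15
```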

If the volume group does not exist yet, create it before bootstrap using your site-local storage procedure. This repo does not own PV creation.

What the repo does own:

  • validating the volume group exists
  • validating free space before provisioning
  • creating the missing guest LVs
  • publishing the /dev/ebs/* compatibility symlinks the stock guest roles use
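The compatibility layer amounts to symlinks under /dev/ebs that point at LVs in the guest VG, so the stock guest roles can keep resolving disks through /dev/ebs/*. A minimal bash sketch of that idea, with a hypothetical helper and placeholder LV names; the real names come from the repo's roles and any on_prem_lvm_lv_name_prefix, and the destination directory is a parameter so the sketch can be tried outside /dev:

```shell
# Hypothetical helper: point <dest_dir>/<name> at /dev/<vg>/<name> for each LV name.
publish_ebs_links() {
  local vg="$1" dest_dir="$2" name
  shift 2
  mkdir -p "$dest_dir"
  for name in "$@"; do
    # -s symbolic, -f replace an existing link, -n do not descend into an existing link
    ln -sfn "/dev/${vg}/${name}" "${dest_dir}/${name}"
  done
}

# Example (on the hypervisor, as root; names are placeholders):
#   publish_ebs_links <guest-vg> /dev/ebs <lv-name-1> <lv-name-2>
```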

If you want the on-prem subtree to seed a dedicated guest VG from one explicit lab disk, use the optional override inputs:

  • on_prem_lvm_seed_enabled: true
  • on_prem_lvm_seed_device: /dev/nvme0n1
  • on_prem_lvm_seed_force: false

That path is opt-in and additive. It does not change the stock on-prem defaults, and it fails closed unless you explicitly enable it. When forced, it uses the same destructive whole-device wipe profile the project uses for ODF backing-disk recovery before creating the guest VG.
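One way to apply those inputs without editing a shipped example is a small extra override file passed with -e @. A minimal bash sketch, assuming a hypothetical file name inventory/overrides/lvm-seed.yml and the placeholder device from the list above; replace the device with your dedicated lab disk and keep on_prem_lvm_seed_force false unless you intend the destructive wipe:

```shell
# Write a hypothetical extra override that opts in to seeding the guest VG from one disk.
override_file="${OVERRIDE_FILE:-inventory/overrides/lvm-seed.yml}"
mkdir -p "$(dirname "$override_file")"
cat > "$override_file" <<'EOF'
on_prem_lvm_seed_enabled: true
on_prem_lvm_seed_device: /dev/nvme0n1
on_prem_lvm_seed_force: false
EOF
```

It would then be passed to the bootstrap run, for example: ./scripts/run_local_playbook.sh playbooks/bootstrap/site.yml -e @inventory/overrides/lvm-seed.yml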

4. Configure The On-Prem Inventory And Group Vars

The current on-prem target keeps the stock aws_metal inventory-group name on purpose so the existing support/day-2 playbooks do not need to fork.

Edit these files before the first run:

For hosts that should stop before cluster build, start from one of:

For the current cluster-capable external Ceph profile, start from:

Read OVERRIDE MECHANISM before copying or publishing that file. It explains the phase toggles, external ODF payload, storage class indirection, and resource sizing assumptions.

Use core-services-ad-128g.yml.example for the current ~128 GiB host class when you want the core-services+AD footprint plus a managed 32G zram writeback LV in calabi_lab_vg.

That override also enables a conservative periodic zram writeback policy:

  • backing LV: calabi_lab_vg/zram-writeback
  • backing LV size: 32G
  • policy mode: huge
  • timer interval: 30m
  • per-run budget: 256 MiB

Use that profile as the current reference for the on-prem writeback-capable host-memory policy.
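The kernel's zram writeback interface counts its writeback_limit budget in 4 KiB pages, so a 256 MiB per-run budget corresponds to 65536 pages. A minimal bash sketch of that conversion plus a read-only inspection of the writeback setup; it assumes the zram device is zram0 and that the kernel was built with zram writeback support, it does not claim to reproduce how the project's timer applies the budget, and the helper names (and the guess that the timer unit name contains "zram") are hypothetical:

```shell
# Hypothetical helper: convert a MiB budget into zram writeback_limit units (4 KiB pages).
mib_to_zram_pages() {
  echo $(( $1 * 1024 / 4 ))
}

# Hypothetical read-only inspection; run on the hypervisor. Assumes the device is zram0.
show_zram_writeback() {
  local sys=/sys/block/zram0
  sudo lvs calabi_lab_vg/zram-writeback
  echo "backing_dev:            $(cat "$sys/backing_dev")"
  echo "writeback_limit_enable: $(cat "$sys/writeback_limit_enable")"
  echo "writeback_limit:        $(cat "$sys/writeback_limit")"
  # bd_stat columns: bd_count bd_reads bd_writes, all in 4 KiB units
  echo "bd_stat:                $(cat "$sys/bd_stat")"
  systemctl list-timers --all | grep -i zram
}

mib_to_zram_pages 256  # prints 65536
```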

What must be correct:

  • ansible_host
  • ansible_user
  • ansible_ssh_private_key_file
  • on_prem_lvm_volume_group
  • on_prem_bastion_hypervisor_host
  • on_prem_bastion_hypervisor_user
  • any optional on_prem_lvm_lv_name_prefix
  • any project-local credential overrides
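A quick way to catch a missing entry from this list is to render the host's variables with ansible-inventory --host and look for each name. A minimal bash sketch, assuming the project's ansible.cfg points ansible-inventory at its inventory and that the hypervisor's inventory host name is virt-01 (both assumptions); it checks presence only, not correctness, and only sees variables that ansible-inventory resolves from the inventory itself. missing_host_vars is a hypothetical helper:

```shell
# Variables this page says must be correct before the first run (optional ones omitted).
required_vars=(
  ansible_host
  ansible_user
  ansible_ssh_private_key_file
  on_prem_lvm_volume_group
  on_prem_bastion_hypervisor_host
  on_prem_bastion_hypervisor_user
)

# Hypothetical helper: print each required variable name missing from a host-vars JSON document.
missing_host_vars() {
  local json="$1" var
  for var in "${required_vars[@]}"; do
    grep -q "\"${var}\":" <<<"$json" || echo "$var"
  done
}

# Usage, from <project-root>/on-prem-openshift-demo (inventory host name is an assumption):
#   missing_host_vars "$(ansible-inventory --host virt-01)"
```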

If you are using the reduced precluster-64g profile, copy the example override and edit the actual device path before the first run:

Shell
cd <project-root>/on-prem-openshift-demo
cp inventory/overrides/precluster-64g.yml.example \
  inventory/overrides/precluster-64g.yml

At this stage, the on-prem subtree reuses the stock guest and day-2 vars and playbooks from aws-metal-openshift-demo through local wrappers. It does not modify the AWS-target codepath.

5. Bootstrap The Host And Provision Guest LVs

This is the main on-prem divergence from the AWS path.

Run:

Shell
cd <project-root>/on-prem-openshift-demo
./scripts/run_local_playbook.sh playbooks/bootstrap/site.yml

For the reduced pre-cluster profile:

Shell
cd <project-root>/on-prem-openshift-demo
./scripts/run_local_playbook.sh playbooks/bootstrap/site.yml \
  -e @inventory/overrides/precluster-64g.yml

For the support-services-only AD profile:

Shell
cd <project-root>/on-prem-openshift-demo
./scripts/run_local_playbook.sh playbooks/bootstrap/site.yml \
  -e @inventory/overrides/core-services-ad.yml.example

For the current ~128 GiB host class with managed zram writeback:

Shell
cd <project-root>/on-prem-openshift-demo
./scripts/run_local_playbook.sh playbooks/bootstrap/site.yml \
  -e @inventory/overrides/core-services-ad-128g.yml.example
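These four invocations differ only in the override file. A hypothetical bash helper that maps a profile label to the override path used on this page keeps that choice in one place; the profile labels are invented for this sketch, and the same mapping would apply to playbooks/site-bootstrap.yml:

```shell
# Hypothetical helper: map a profile label to the override file this page uses for it.
override_for_profile() {
  case "$1" in
    default) echo "" ;;  # full design, no override
    precluster-64g) echo "inventory/overrides/precluster-64g.yml" ;;
    core-services-ad) echo "inventory/overrides/core-services-ad.yml.example" ;;
    core-services-ad-128g) echo "inventory/overrides/core-services-ad-128g.yml.example" ;;
    *) echo "unknown profile: $1" >&2; return 1 ;;
  esac
}

# Hypothetical wrapper: run a playbook through the local runner with the profile's override.
run_for_profile() {
  local playbook="$1" profile="$2" override
  override=$(override_for_profile "$profile") || return 1
  local args=("$playbook")
  if [ -n "$override" ]; then
    args+=(-e "@${override}")
  fi
  ./scripts/run_local_playbook.sh "${args[@]}"
}

# Usage, from <project-root>/on-prem-openshift-demo:
#   run_for_profile playbooks/bootstrap/site.yml core-services-ad-128g
```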

This is the on-prem equivalent of the early AWS host steps:

  • host base configuration
  • host CPU and memory policy
  • OVS / libvirt host setup
  • guest base-image staging
  • LVM guest LV validation and creation
  • /dev/ebs/* compatibility symlink publication

Note

The shared host bootstrap now updates redhat-release before the full system update. This ensures the current Red Hat Post-Quantum Cryptography public keys are present before DNF validates newer packages. See: https://access.redhat.com/solutions/3449341

When it succeeds, the host should satisfy the same effective guest-disk contract the stock guest roles already expect.

Useful verification:

Shell
ssh <hypervisor-admin-user>@<hypervisor-management-ip> <<'EOF'
sudo lvs
sudo ls -l /dev/ebs
EOF
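ls -l shows where each /dev/ebs entry points but not whether the target LV exists. A minimal bash sketch that lists only broken links, assuming GNU find (as shipped on RHEL) for -xtype; dangling_links is a hypothetical helper:

```shell
# Hypothetical helper: list symlinks in a directory whose targets do not exist.
# Empty output means every compatibility link resolves.
dangling_links() {
  find "$1" -maxdepth 1 -xtype l
}
```

Over SSH, the equivalent one-liner is: ssh <hypervisor-admin-user>@<hypervisor-management-ip> 'sudo find /dev/ebs -maxdepth 1 -xtype l'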

6. Build The Bastion And Stage The Project

The current on-prem site-bootstrap.yml:

  • runs the on-prem bootstrap host prep
  • applies the baseline host memory oversubscription policy (zram, THP madvise, KSM) during the host bootstrap path
  • can optionally provision a dedicated zram writeback LV and policy timer when an override such as core-services-ad-128g.yml.example enables it
  • reuses the stock bastion build
  • stages both the on-prem subtree and the stock AWS-target subtree onto bastion through the local on-prem bastion-stage wrapper
  • rewrites the bastion-side runtime inventory so the bastion can SSH back to the hypervisor without requiring ec2-user

Run:

Shell
cd <project-root>/on-prem-openshift-demo
./scripts/run_local_playbook.sh playbooks/site-bootstrap.yml

For the reduced pre-cluster profile:

Shell
cd <project-root>/on-prem-openshift-demo
./scripts/run_local_playbook.sh playbooks/site-bootstrap.yml \
  -e @inventory/overrides/precluster-64g.yml

For the support-services-only AD profile:

Shell
cd <project-root>/on-prem-openshift-demo
./scripts/run_local_playbook.sh playbooks/site-bootstrap.yml \
  -e @inventory/overrides/core-services-ad.yml.example

For the current ~128 GiB host class with managed zram writeback:

Shell
cd <project-root>/on-prem-openshift-demo
./scripts/run_local_playbook.sh playbooks/site-bootstrap.yml \
  -e @inventory/overrides/core-services-ad-128g.yml.example

After this, the bastion should exist and the project should be staged.

Writeback caveats on the on-prem path:

  • the managed-LVM writeback path assumes the local calabi_lab_vg volume group
  • the writeback LV must be dedicated to zram and is not counted as planned RAM
  • the role fails fast if the configured writeback LV already exists at a different size
  • the shipped policy uses huge, which writes back pages that did not compress well. For broader cold-page relief, huge_idle also sweeps idle compressible pages but requires kernel age-tracking support
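The fail-fast rule for an existing writeback LV of a different size can be checked before a run. A minimal bash sketch, assuming the 32G size and the calabi_lab_vg/zram-writeback name from the core-services-ad-128g.yml.example profile; the helper name is hypothetical, and the role's own check remains the one that gates the run:

```shell
# Hypothetical helper: succeed if the writeback LV is absent (empty size string)
# or already matches the expected size in GiB; fail on any other size.
writeback_lv_precheck() {
  local current_g="$1" expected_g="$2"
  if [ -z "${current_g//[[:space:]]/}" ]; then
    return 0  # LV absent: the role can create it
  fi
  # awk compares numerically, so "  32.00" equals 32
  awk -v a="$current_g" -v b="$expected_g" 'BEGIN { exit !(a + 0 == b + 0) }'
}

# Usage:
#   current=$(ssh <hypervisor-admin-user>@<hypervisor-management-ip> \
#     sudo lvs --noheadings --units g --nosuffix -o lv_size calabi_lab_vg/zram-writeback 2>/dev/null)
#   writeback_lv_precheck "$current" 32 || echo "existing writeback LV is not 32G"
```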

7. Hand Back To The Stock Runbook

At this point, the on-prem-specific portion is over.

Choose the next stock runbook entry based on what you already completed:

To automate the remaining sequence on a cluster-capable host instead of running it by hand, use:

Shell
cd <project-root>/on-prem-openshift-demo
./scripts/run_remote_bastion_playbook.sh playbooks/site-lab.yml \
  -e @inventory/overrides/core-services-ad-plus-openshift-3node-external-ceph.yml.example

For support-services-only profiles such as core-services or core-services-ad, stop after the support-service path instead of continuing into cluster build:

Shell
cd <project-root>/on-prem-openshift-demo
./scripts/run_remote_bastion_playbook.sh playbooks/site-precluster.yml \
  -e @inventory/overrides/core-services-ad.yml.example

Or, from the staged on-prem tree on bastion:

Shell
cd <staged-on-prem-project-root>
./scripts/run_bastion_playbook.sh playbooks/site-precluster.yml \
  -e @inventory/overrides/core-services-ad.yml.example

That path stops after:

  • optional ad-server
  • idm
  • optional idm-ad-trust
  • bastion-join
  • mirror-registry

For the reduced precluster-64g profile, also stop at mirror-registry instead of continuing into cluster build:

Shell
cd <staged-on-prem-project-root>
./scripts/run_bastion_playbook.sh playbooks/site-precluster.yml \
  -e @inventory/overrides/precluster-64g.yml

8. Current External-Ceph Day-2 Continuation

If the OpenShift cluster exists and the support services are already healthy, continue at the shared day-2 orchestration instead of replaying the full site-lab.yml chain:

Shell
cd <project-root>/on-prem-openshift-demo
./scripts/run_remote_bastion_playbook.sh \
  ../aws-metal-openshift-demo/playbooks/day2/openshift-post-install.yml \
  -e @inventory/overrides/core-services-ad-plus-openshift-3node-external-ceph.yml.example

The current external-Ceph profile intentionally enables disconnected OperatorHub, IdM ingress certs, breakglass auth, NMState, external ODF, Keycloak, OIDC auth, Web Terminal, AAP, NetObserv, and validation. It disables infra conversion, internal ODF, LDAP auth, OpenShift Virtualization, and Pipelines.

Leave the force_* variables false for normal continuation. Set a force flag only when you are deliberately repairing or replacing that phase.
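Before a normal continuation run, it can help to confirm that no override file sets a force flag. A minimal bash sketch, assuming the flags appear as top-level force_<phase>: true lines in a YAML override; it does not see flags passed with -e on the command line, and enabled_force_flags is a hypothetical helper:

```shell
# Hypothetical helper: print (with line numbers) any force_* variables set to a true value.
# No output means no phase will be forced by this file.
enabled_force_flags() {
  grep -nE '^[[:space:]]*force_[A-Za-z0-9_]+:[[:space:]]*(true|True|yes)[[:space:]]*$' "$1"
}

# Usage:
#   enabled_force_flags inventory/overrides/core-services-ad-plus-openshift-3node-external-ceph.yml.example
```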

9. Clean OpenShift-Only Teardown

To tear down a broken OpenShift cluster while preserving healthy support services:

Shell
cd <project-root>/on-prem-openshift-demo
./scripts/run_local_playbook.sh \
  ../aws-metal-openshift-demo/playbooks/maintenance/cleanup.yml \
  -e cleanup_destroy_openshift_cluster=true \
  -e cleanup_wipe_openshift_cluster_block_devices=true \
  -e @inventory/overrides/core-services-ad-plus-openshift-3node-external-ceph.yml.example

That path removes the OpenShift cluster domains, wipes the cluster block devices when requested, and preserves AD, IdM, bastion, and mirror-registry. After cleanup, rerun site-lab.yml with the same override to rebuild the cluster and continue through the mirror, install, and day-2 flow.