Portability And Gap Analysis

Warning

This on-prem target is experimental. Treat the docs and playbooks in this subtree as an emerging alternate installation path that is not yet at the same confidence level as the validated AWS-target flow.

Executive Summary

This page separates what is already portable from what still depends on the current AWS-shaped host contract.

The repo now ships an initial on-prem target under:

The AWS-specific part is mostly the substrate that creates virt-01 and its attached guest block devices. Once your host exists and presents the expected network, storage, and execution contract, the rest of the lab behaves like a host-local KVM/libvirt/Open vSwitch environment.

That means:

  • the support guest build flow is mostly portable
  • the disconnected OpenShift install flow is mostly portable
  • the bastion split and day-2 flow are mostly portable
  • the main work is replacing the AWS host-acquisition contract with an on-prem host-preparation contract

Current Portability Boundary

AWS-bound today

These pieces are directly tied to AWS or to assumptions created by the AWS layer:

Portable in practice once the host exists

These parts are already fundamentally host-local:

  • KVM/libvirt guest provisioning
  • Open vSwitch and lab VLAN topology
  • support guest configuration:
    • IdM
    • bastion
    • mirror registry
    • AD
  • bastion staging and workstation-to-bastion handoff
  • mirrored-content and disconnected install flows
  • OpenShift cluster stand-up
  • day-2 operator and auth automation
  • Keycloak / OpenShift / AAP auth model

Host assumptions that must still be satisfied

Your on-prem server can reuse most of the current orchestration only if it provides the same effective host contract as virt-01:

  • RHEL host with the required virtualization and networking stack
  • nested KVM available and stable
  • enough CPU, RAM, and storage headroom for the selected cluster shape
  • deterministic guest block-device naming
  • a working uplink for management traffic
  • a VLAN-capable or equivalent Open vSwitch model
  • an operator path to the host from the workstation
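Part of that host contract can be checked mechanically before any playbook runs. The following is a minimal sketch, not part of the shipped on-prem target: the function names and the `root` parameter (which only exists so the checks can be pointed at a test directory) are illustrative assumptions. It checks for `/dev/kvm`, reads the `nested` parameter of the loaded KVM module, and reports total memory. It cannot tell you whether nested KVM is stable, only whether it is enabled.

```python
from pathlib import Path

# Kernel module parameter files that report nested virtualization support.
NESTED_PARAMS = (
    "sys/module/kvm_intel/parameters/nested",
    "sys/module/kvm_amd/parameters/nested",
)


def nested_kvm_enabled(root: Path = Path("/")) -> bool:
    """Return True if /dev/kvm exists and a KVM module reports nested support."""
    if not (root / "dev/kvm").exists():
        return False
    for rel in NESTED_PARAMS:
        param = root / rel
        # The kernel reports "Y" or "1" when nested virtualization is on.
        if param.is_file() and param.read_text().strip() in ("Y", "1"):
            return True
    return False


def mem_total_gib(root: Path = Path("/")) -> float:
    """Read MemTotal from /proc/meminfo (reported in kB) and convert to GiB."""
    for line in (root / "proc/meminfo").read_text().splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1]) / (1024 * 1024)
    raise ValueError("MemTotal not found in meminfo")
```

On the candidate host, calling `nested_kvm_enabled()` and `mem_total_gib()` with their defaults gives a quick go/no-go signal before you compare the result against the selected cluster shape.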

The Three Real Gaps

1. Outer host acquisition

Today AWS creates:

  • the host itself
  • public access path
  • guest block devices
  • stable volume inventory input

On-prem, those responsibilities move outside the current CloudFormation layer.

You need your own answer for:

  • how virt-01 is installed
  • how the operator reaches it
  • how guest disks are attached
  • how those disks are named consistently

2. Hypervisor identity

The code still uses the inventory group name aws_metal almost everywhere.

That does not mean the lab is deeply AWS-only. In practice it means:

  • playbooks target one hypervisor host through the aws_metal group
  • some tasks derive the hypervisor host via groups['aws_metal']

The current on-prem path does this:

  • keep the inventory group name aws_metal even for an on-prem host

Cleaner long-term path:

  • rename or abstract that inventory group to something neutral like lab_hypervisor
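One way to picture that abstraction is a lookup that prefers a neutral group and falls back to the current one. This is a hedged sketch in Python rather than the Ansible expression the playbooks use: `resolve_hypervisor` is a hypothetical helper, `groups` stands in for Ansible's `groups` mapping, and `lab_hypervisor` is only the example name suggested above.

```python
from typing import Mapping, Sequence

# Neutral name first (hypothetical), then the group name the code uses today.
CANDIDATE_GROUPS = ("lab_hypervisor", "aws_metal")


def resolve_hypervisor(groups: Mapping[str, Sequence[str]]) -> str:
    """Return the single hypervisor host, preferring the neutral group name."""
    for name in CANDIDATE_GROUPS:
        hosts = groups.get(name) or []
        if hosts:
            if len(hosts) != 1:
                raise ValueError(
                    f"group {name!r} must contain exactly one host, got {len(hosts)}"
                )
            return hosts[0]
    raise LookupError(f"no hypervisor host found in any of {CANDIDATE_GROUPS}")
```

A fallback like this would let an inventory migrate to the neutral name without breaking hosts that still sit in `aws_metal`.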

3. Deterministic guest block-device naming

This is the biggest practical portability dependency.

The guest definitions and several roles assume stable block-device names such as:

  • /dev/ebs/idm-01
  • /dev/ebs/bastion-01
  • /dev/ebs/mirror-registry
  • /dev/ebs/ocp-master-01
  • /dev/ebs/ocp-infra-01-data

Those names are fed by:

The current on-prem target now satisfies this by:

  • creating guest logical volumes from an operator-provided LVM volume group
  • publishing /dev/ebs/* compatibility symlinks

That keeps the existing guest and cluster roles reusable even though the backing devices are now on-prem LVs rather than AWS EBS volumes.
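The compatibility mapping can be sketched as follows. This is an illustration of the idea, not the shipped bootstrap code: it assumes each logical volume is named after its guest disk and appears at the standard LVM path `/dev/<vg>/<lv>`, and both function names and the `vg_lab` volume group name are hypothetical.

```python
import os
from pathlib import Path
from typing import Dict, Iterable


def plan_compat_symlinks(vg: str, guest_disks: Iterable[str],
                         link_dir: Path = Path("/dev/ebs"),
                         dev_root: Path = Path("/dev")) -> Dict[Path, Path]:
    """Map each compatibility link (/dev/ebs/<name>) to its LV path (/dev/<vg>/<name>)."""
    return {link_dir / name: dev_root / vg / name for name in guest_disks}


def apply_symlinks(plan: Dict[Path, Path]) -> None:
    """Create or repoint each link so that reruns converge on the same layout."""
    for link, target in plan.items():
        link.parent.mkdir(parents=True, exist_ok=True)
        if link.is_symlink():
            if Path(os.readlink(link)) == target:
                continue  # already correct, nothing to do
            link.unlink()  # stale link, repoint it below
        elif link.exists():
            raise FileExistsError(f"{link} exists and is not a symlink")
        link.symlink_to(target)
```

For example, `plan_compat_symlinks("vg_lab", ["idm-01", "bastion-01"])` maps `/dev/ebs/idm-01` to `/dev/vg_lab/idm-01`. Links created directly under `/dev` do not survive a reboot, so a persistent implementation would more likely express the same mapping as udev rules or recreate the links at boot.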

Those backing devices may be:

  • local NVMe
  • RAID LUNs
  • SAN-backed block devices
  • local SSDs presented by HBA order

What the current on-prem target does not carry over yet is the AWS gp3 performance contract. It preserves the disk layout and naming contract, but it does not currently convert per-volume AWS IOPS and throughput settings into libvirt iotune or any other host-level storage QoS policy.
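To show what such a conversion could look like if it were added, here is a minimal sketch. It is not something the current target does. It assumes the per-volume settings are an IOPS value and a throughput value in MiB/s, as gp3 expresses them, and renders them as libvirt's `total_iops_sec` and `total_bytes_sec` disk throttles. The function name `gp3_to_iotune` is hypothetical.

```python
import xml.etree.ElementTree as ET

MIB = 1024 * 1024


def gp3_to_iotune(iops: int, throughput_mib_s: int) -> str:
    """Render a libvirt <iotune> element from gp3-style IOPS and MiB/s values."""
    if iops <= 0 or throughput_mib_s <= 0:
        raise ValueError("iops and throughput must be positive")
    iotune = ET.Element("iotune")
    ET.SubElement(iotune, "total_iops_sec").text = str(iops)
    # libvirt expects bytes per second, gp3 throughput is given in MiB/s.
    ET.SubElement(iotune, "total_bytes_sec").text = str(throughput_mib_s * MIB)
    return ET.tostring(iotune, encoding="unicode")
```

The resulting element would sit inside the guest's `<disk>` definition. A throttle only caps a guest disk; it does not guarantee that the on-prem backend can deliver that rate to every guest at once.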

Current On-Prem Bring-Up Model

The current shipped on-prem target takes the lowest-risk path:

  1. Install a RHEL host manually so it becomes the on-prem equivalent of virt-01.
  2. Provide an LVM volume group with enough free space for the guest footprint.
  3. Let on-prem bootstrap create the guest LVs and deterministic /dev/ebs/* compatibility symlinks.
  4. Keep the inventory host in the aws_metal group initially.
  5. Set the operator-side and bastion-side hypervisor SSH users explicitly.
  6. Skip the AWS host-acquisition layer and run only the host/bootstrap and lab orchestration from the point where the host contract is satisfied.

In the current branch, the split between operator-side and bastion-side access to the hypervisor is explicit:

  • inventory/hosts.yml is the operator-workstation path to the hypervisor
  • on_prem_bastion_hypervisor_host and on_prem_bastion_hypervisor_user define the bastion-side return path to that same host

In that model, most of the current playbooks work with little or no orchestration change.

The tradeoff is that the current target is capacity-first, not QoS-first:

  • it validates space
  • it provisions the expected guest disks
  • it preserves stable guest disk identity
  • it expects you to provide a backend with enough aggregate performance
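The space validation amounts to comparing the guest disk footprint against the free space in the volume group. A minimal sketch of that arithmetic follows, assuming sizes in GiB. The function name, the optional reserve, and the disk sizes in the usage example are illustrative and not taken from the repo.

```python
from typing import Dict


def capacity_shortfall(vg_free_gib: float, disks_gib: Dict[str, float],
                       reserve_gib: float = 0.0) -> float:
    """Return how many GiB are missing, or 0.0 if the guest footprint fits."""
    needed = sum(disks_gib.values()) + reserve_gib
    return max(0.0, needed - vg_free_gib)
```

For example, `capacity_shortfall(100, {"idm-01": 60, "bastion-01": 100})` returns `60.0`, meaning the volume group needs another 60 GiB before bootstrap can create both disks. The free-space figure would come from the host, for example from the output of `vgs`.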

The current branch also keeps the AWS-target tree pristine. The on-prem target reuses the stock support-service and day-2 code through local wrappers in on-prem-openshift-demo/ rather than by modifying the validated AWS path.

What Would Need To Change For A First-Class On-Prem Target

If you want a more neutral long-term target rather than “prepare the host to look like current virt-01,” the code should eventually change in these places:

Replace AWS bootstrap discovery in playbooks/bootstrap/site.yml

Current AWS-bound pre-tasks:

  • discover current instance ID
  • query live attached EBS volumes from AWS
  • derive active guest volume mapping from AWS

First-class on-prem target should instead consume:

  • a neutral host inventory file
  • a neutral guest volume inventory file
  • a host-local or inventory-driven mapping source

Split the current volume inventory away from CloudFormation

Today the effective source of truth is under:

For on-prem, that should become a neutral lab volume inventory with fields like:

  • guest name
  • device path or stable symlink
  • capacity
  • purpose
  • optional performance hints
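A record shape for that inventory could look like the sketch below. The class name, field names, and units are assumptions chosen to mirror the list above, not an existing schema in the repo.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class GuestVolume:
    guest: str                                    # guest name, e.g. "idm-01"
    device: str                                   # device path or stable symlink
    capacity_gib: int                             # capacity in GiB
    purpose: str                                  # e.g. "root" or "data"
    iops_hint: Optional[int] = None               # optional performance hint
    throughput_mib_s_hint: Optional[int] = None   # optional performance hint


def load_volumes(records: Iterable[Dict]) -> List[GuestVolume]:
    """Build volume entries and reject duplicate device paths."""
    volumes = [GuestVolume(**record) for record in records]
    seen = set()
    for volume in volumes:
        if volume.device in seen:
            raise ValueError(f"duplicate device path: {volume.device}")
        seen.add(volume.device)
    return volumes
```

Keeping the performance fields optional lets an on-prem inventory omit them while an AWS-derived inventory can still carry its per-volume settings.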

Abstract the hypervisor inventory group

Current code references aws_metal directly in many playbooks and roles.

That should become a neutral group for:

  • hypervisor playbooks
  • host-local guest preparation
  • maintenance flows

Abstract the login/user assumption

The current on-prem branch now removes the runtime requirement for ec2-user on the hypervisor by rendering a bastion-side inventory with explicit on-prem host and user values.

Longer term, the remaining cleanup is mostly about making that host-user contract neutral and removing the aws_metal naming from the stock codebase.

What Does Not Need To Be Redesigned

Assuming your on-prem host satisfies the same effective lab contract, these do not need a conceptual redesign:

  • support guest architecture
  • bastion execution boundary
  • OpenShift disconnected installation approach
  • mirrored-content strategy
  • auth architecture:
    • IdM
    • AD trust
    • Keycloak OIDC
    • AAP OIDC
  • day-2 operator layout
  • CPU tiering concept
  • memory oversubscription concept

Phase 1: Host mimicry

Do not start by generalizing the codebase.

Start by proving that an on-prem host can mimic the current virt-01 contract:

  • same guest naming
  • same /dev/ebs/* symlink layout
  • same inventory group
  • same bastion boundary

That gives the fastest proof of portability.

Phase 2: Neutralize the substrate

Once you have proven the host-mimicry path, make the substrate neutral:

Phase 3: Publish a more neutral on-prem target

The repo now has an initial alternate target. The remaining work is to make it less compatibility-driven and more neutral, with:

  • on-prem prerequisites
  • on-prem host-preparation workflow
  • neutralized inventory and storage model

Bottom Line

The bulk of Calabi was already portable once a virt-01-like host existed. The current on-prem target proves that by replacing the outer AWS host and storage acquisition steps while reusing the stock support-service, cluster, and day-2 orchestration.

If you already have a freshly installed on-prem server with m5.metal-like capacity and you prepare it to satisfy the current virt-01 contract, most of the Calabi playbooks should run without significant changes.

The hard part is not OpenShift or the support services. The hard part is preserving the current host substrate contract without AWS.