The two networks bare-metal guides skip, and booting the head node
Almost every “my node won’t PXE boot” problem is a network problem. Here’s the two-network design, including the DHCP-less network that makes stateless booting possible, and how to stand up the head node on it.
In Part 1 we settled the mental model: a cluster is five roles on a private network, and compute nodes are cattle. This part is about that private network, the piece every bare-metal guide assumes you already have, and the piece that, when it’s wrong, produces the single most common cluster failure: a node that won’t PXE boot.
If you get one thing right in this whole build, get the networks right.
Why two networks, and why one of them must not run DHCP
OpenHPC’s official guide assumes two physical networks already exist. On a single box we fake both as libvirt virtual networks:
```text
default (NAT, 192.168.122.0/24)      hpc-prov (isolated, 10.0.0.0/24)
───────────────────────────────      ────────────────────────────────
head node → internet for packages    provisioning, DHCP, PXE, NFS,
compute nodes never touch it         MPI, Slurm all ride here
```
The first network is boring: NAT, so the head node can reach the internet to download packages, with nothing exposed inbound. The second is where all the interesting failures live, and it has one non-negotiable rule:
Why the cluster network must NOT run DHCP
Warewulf is going to be the DHCP server on 10.0.0.0/24. It hands each compute
node a specific boot file based on its MAC address. If libvirt also runs DHCP on
that network, you have two DHCP servers racing to answer the same broadcast, and
nodes randomly fail to boot. libvirt’s dnsmasq only serves DHCP if you give it a
<dhcp> range, so we deliberately omit one and leave UDP/67 free for Warewulf.
That single omission is the difference between “PXE works first try” and an afternoon of confused debugging.
Network 1: the NAT path to the internet
This one usually already exists. Just make sure it’s running and survives reboot:
```shell
virsh net-list --all
virsh net-start default 2>/dev/null || true
virsh net-autostart default 2>/dev/null || true
```
If it’s missing, define it from XML. It’s a NAT forward, a bridge, and a DHCP range, and here libvirt’s DHCP is fine because nothing PXE-boots on this network:
```xml
<network>
  <name>default</name>
  <forward mode='nat'/>
  <bridge name='virbr0' stp='on' delay='0'/>
  <ip address='192.168.122.1' netmask='255.255.255.0'>
    <dhcp><range start='192.168.122.2' end='192.168.122.254'/></dhcp>
  </ip>
</network>
```
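If you’d rather script that “is it there?” decision than eyeball virsh net-list, here’s a minimal sketch. It assumes you saved the XML above to a file; the /tmp/default-net.xml default and the ensure_default_net name are my own placeholders, not anything libvirt defines. It only calls net-define when virsh net-info says the network doesn’t exist, so re-running it is harmless.

```shell
# Sketch: ensure libvirt's "default" NAT network is defined, running, and
# set to autostart. HYPOTHETICAL: the XML path default is a placeholder.
ensure_default_net() {
    local xml=${1:-/tmp/default-net.xml}
    # net-info exits non-zero when the network is not defined at all
    if ! virsh net-info default >/dev/null 2>&1; then
        virsh net-define "$xml" || return 1
    fi
    # net-start errors on an already-active network, so tolerate that case
    virsh net-start default >/dev/null 2>&1 || true
    virsh net-autostart default >/dev/null || return 1
}
```

Run it the same way you run the virsh commands above, e.g. ensure_default_net /tmp/default-net.xml.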
Network 2: the isolated provisioning network (no DHCP)
Here’s the important one. Note what’s missing: there is no <dhcp> block, and there is
no <forward> element at all. Leaving <forward> out is what makes the network fully
isolated, with no route off it. (There’s no such thing as forward mode='none' in libvirt,
which is its own small trap. You make a network isolated by omitting the element, not by
setting it to none.)
```xml
<network>
  <name>hpc-prov</name>
  <bridge name='virbr-hpc' stp='on' delay='0'/>
  <!-- no <forward> element at all = isolated: no NAT, no route off this network -->
  <ip address='10.0.0.254' netmask='255.255.255.0'>
    <!-- deliberately NO <dhcp> block: Warewulf owns DHCP here -->
  </ip>
</network>
```
```shell
# Save the XML above as /tmp/hpc-prov.xml first
virsh net-define /tmp/hpc-prov.xml
virsh net-start hpc-prov && virsh net-autostart hpc-prov
```
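Before moving on, it’s worth proving that the network libvirt actually stored matches the intent. This sketch (check_isolated_net is a hypothetical helper) reads a network definition on stdin and fails on any <forward> or <dhcp> element. Feed it virsh net-dumpxml hpc-prov rather than /tmp/hpc-prov.xml: libvirt doesn’t keep XML comments when it stores a definition, while the hand-written file mentions both elements inside its comments and would trip this grep.

```shell
# Sketch: fail if a libvirt network definition (on stdin) would route
# off-box or hand out leases. Meant for: virsh net-dumpxml hpc-prov | check_isolated_net
check_isolated_net() {
    local xml status=0
    xml=$(cat)
    # Match <forward followed by space, / or > so <forwarder .../> (DNS) is ignored
    if printf '%s\n' "$xml" | grep -Eq '<forward[[:space:]/>]'; then
        echo "FAIL: <forward> present, network is not isolated" >&2
        status=1
    fi
    if printf '%s\n' "$xml" | grep -Eq '<dhcp[[:space:]/>]'; then
        echo "FAIL: <dhcp> present, libvirt will compete with Warewulf" >&2
        status=1
    fi
    [ "$status" -eq 0 ] && echo "OK: isolated, no libvirt DHCP"
    return "$status"
}
```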
Three deliberate choices, each with a reason:
- Isolated keeps PXE/DHCP broadcast traffic off your real LAN. You do not want a rogue DHCP server answering your office network.
- No libvirt DHCP keeps port 67 free for Warewulf (the whole point above).
- The host keeps 10.0.0.254 on the bridge purely so you can reach the nodes from the host for debugging. The head node itself will own 10.0.0.1.
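The XML tells you what you asked for; libvirt’s generated dnsmasq config tells you what it actually runs. A sketch, assuming libvirt writes that config to /var/lib/libvirt/dnsmasq/hpc-prov.conf (the usual spot on RHEL-family hosts, but check yours, which is why the path is an argument; libvirt_dhcp_free is a hypothetical name). dhcp-range is the dnsmasq option that switches its DHCP server on, so no dhcp-range line means port 67 is still Warewulf’s.

```shell
# Sketch: confirm libvirt's dnsmasq for hpc-prov has no DHCP configured.
# ASSUMPTION: the generated config lives at /var/lib/libvirt/dnsmasq/<net>.conf.
libvirt_dhcp_free() {
    local conf=${1:-/var/lib/libvirt/dnsmasq/hpc-prov.conf}
    if [ ! -r "$conf" ]; then
        echo "cannot read $conf" >&2
        return 2
    fi
    # dhcp-range enables dnsmasq's DHCP server; without it, no leases
    if grep -q '^dhcp-range=' "$conf"; then
        echo "libvirt DHCP is ON: $(grep '^dhcp-range=' "$conf")"
        return 1
    fi
    echo "OK: no dhcp-range, UDP/67 is left for Warewulf"
}
```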
A DOWN bridge is not a broken bridge
Right after you create it, ip -br addr shows virbr-hpc as DOWN even though it
already has the right 10.0.0.254 address. That’s normal and worth internalizing now so
you don’t chase it later. A Linux bridge reports NO-CARRIER/DOWN until something is
actually plugged into it, and nothing is yet. The moment the head node’s NIC attaches, it
flips to UP. A DOWN bridge with a correct IP is fine.
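If you want a check that encodes that rule, judge the bridge by its address rather than its link state. A sketch (bridge_ready is a hypothetical helper) that reads one line of ip -br addr output and passes whether the state column says UP or DOWN:

```shell
# Sketch: a bridge is "ready" if it carries the expected address,
# regardless of UP/DOWN. Reads one line of `ip -br addr show dev <br>`.
bridge_ready() {
    local want=${1:-10.0.0.254/24} name state addrs
    read -r name state addrs
    case " $addrs " in
        *" $want "*) echo "$name is $state with $want: OK" ;;
        *) echo "$name: missing $want" >&2; return 1 ;;
    esac
}
```

Usage: ip -br addr show dev virbr-hpc | bridge_ready 10.0.0.254/24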
Track A shortcut
If you’re doing the manual “Slurm-first” track before Warewulf, you can keep this network exactly as-is and just assign static node IPs by hand. The no-DHCP rule only starts to matter once Warewulf takes over provisioning.
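On Track A you are the DHCP server, so the address bookkeeping is yours. One way to keep it consistent is to generate /etc/hosts lines from a node count. The c1, c2 names and sms at 10.0.0.1 come from this series; hosts_entries and the 10.0.0.(10+N) numbering for compute nodes are a hypothetical scheme, so pass a different base if you prefer.

```shell
# Sketch: print /etc/hosts lines for hand-assigned Track A addresses.
# HYPOTHETICAL numbering: node cN gets 10.0.0.(BASE+N), BASE defaults to 10.
hosts_entries() {
    local count=$1 base=${2:-10} i=1
    # 10.0.0.254 belongs to the host's virbr-hpc, so stay below it
    if [ $((base + count)) -ge 254 ]; then
        echo "too many nodes: 10.0.0.254 is the host bridge address" >&2
        return 1
    fi
    echo "10.0.0.1 sms"
    while [ "$i" -le "$count" ]; do
        echo "10.0.0.$((base + i)) c$i"
        i=$((i + 1))
    done
}
```

Usage: hosts_entries 2 | sudo tee -a /etc/hosts, then give each node the same address when you configure it.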
Building the head node, the headless way
The head node (sms) has two NICs, one on each network, and I built it from the Rocky 10
minimal ISO.
Why the minimal ISO
A head node should run only what it needs. Starting from minimal forces you to learn each service as you add it, instead of inheriting a pile of distro defaults you don’t understand and can’t audit.
My first instinct was the obvious virt-install --cdrom ... --graphics vnc and clicking
through the graphical installer. That works fine on a laptop. It does not work on a
headless server you only reach over SSH: there’s no display to attach to, and the installer
just sits there forever waiting for input it will never get.
That dead end turned out to be a gift, because the fix is the more professional approach anyway: an unattended kickstart install. A kickstart file answers every installer question up front, so the machine installs itself with zero interaction. It’s the same technique real sites use to provision servers reproducibly, and it’s the stateful-install counterpart to the stateless Warewulf nodes coming later in the series.
Here’s the kickstart that defines the entire head node. The two network lines are the
interesting part:
```text
text
lang en_US.UTF-8
timezone America/Los_Angeles --utc
rootpw --plaintext rockylab
user --name=bobby --groups=wheel --plaintext --password=rockylab
clearpart --all --initlabel
autopart --type=plain
bootloader --location=mbr --append="console=ttyS0,115200n8 console=tty0"

# Pin each NIC by MAC so the assignment is deterministic
network --device=52:54:00:9c:d0:9c --bootproto=dhcp --onboot=on --activate
network --device=52:54:00:53:c0:f0 --bootproto=static --ip=10.0.0.1 --netmask=255.255.255.0 --onboot=on --activate --nodefroute --nodns
network --hostname=sms

firewall --enabled --service=ssh
selinux --permissive
firstboot --disable
reboot

%packages
@^minimal-environment
chrony
openssh-server
%end
```
Rocky 10’s installer rejected my first kickstart
My first attempt carried an ignoredisk --only-use=vda line (and a matching
--boot-drive=vda) out of habit. Rocky 10 ships Anaconda 40, which evaluates that disk
name at parse time and aborts with “Disk ‘vda’ given in ignoredisk command does not
exist” before the disk is even visible. The VM has exactly one disk, so the fix was to
drop both lines and let clearpart --all and autopart do their thing. A small thing,
but the kind of version-specific surprise that’s invisible until you hit it.
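Since a bad kickstart only fails once the installer is already running, a pre-flight lint is cheap insurance. This sketch (ks_lint is hypothetical) greps for ignoredisk and --boot-drive= lines, then runs ksvalidator from the pykickstart package if it happens to be installed. Treat ksvalidator as a syntax check only; it won’t necessarily reproduce the disk-lookup error Anaconda 40 raised above, which is why the grep is there.

```shell
# Sketch: flag disk-name lines that Rocky 10's installer checks early,
# then run pykickstart's ksvalidator when it is available.
ks_lint() {
    local ks=$1 rc=0
    if grep -nE '^[[:space:]]*ignoredisk|--boot-drive=' "$ks"; then
        echo "WARN: disk names above are checked at parse time on Rocky 10" >&2
        rc=1
    fi
    if command -v ksvalidator >/dev/null 2>&1; then
        ksvalidator "$ks" || rc=1
    fi
    return "$rc"
}
```

Usage: ks_lint ~/sms-ks.cfg before every virt-install run.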
Why pin NICs by MAC, and why --nodefroute
Interface names can reorder between boots, and on a cluster a swapped NIC means every
later config points at the wrong network. Pinning each NIC by its MAC makes the
assignment deterministic. And --nodefroute on the cluster NIC is how you enforce “no
gateway on the isolated network”: the default route stays on the internet NIC, and the
cluster NIC carries only local 10.0.0.0/24 traffic.
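To see which interface name the kernel actually gave a pinned MAC, read it straight out of sysfs. A sketch (iface_for_mac is a hypothetical helper; the sysfs root is an argument only so it can be pointed at a scratch directory for testing):

```shell
# Sketch: print the interface that currently owns a MAC address by
# scanning /sys/class/net/*/address (sysfs stores MACs in lowercase).
iface_for_mac() {
    local mac root dev
    mac=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    root=${2:-/sys/class/net}
    for dev in "$root"/*; do
        [ -r "$dev/address" ] || continue
        if [ "$(cat "$dev/address")" = "$mac" ]; then
            basename "$dev"
            return 0
        fi
    done
    return 1
}
```

On this VM, iface_for_mac 52:54:00:53:c0:f0 should name the cluster NIC.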
Why static on the cluster NIC
The entire cluster refers to the head node as 10.0.0.1: Slurm config, NFS exports,
Warewulf, /etc/hosts, all of it. If that address floated via DHCP, every service would
break the first time the head node rebooted and got a different lease.
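Because the kickstart pins NICs by MAC, the MACs you give virt-install through --network ...,mac= have to match them exactly, or the pinning silently misses. A pre-flight sketch (check_ks_macs is hypothetical): it pulls every --device=MAC out of the kickstart and confirms each one is among the MACs you list. It compares strings exactly, so keep the case consistent.

```shell
# Sketch: cross-check MACs pinned in a kickstart against the MACs you
# are about to pass to virt-install. Usage: check_ks_macs KS MAC...
check_ks_macs() {
    local ks=$1 missing=0 found=0 mac
    shift
    for mac in $(grep -oE -- '--device=([0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2}' "$ks" | cut -d= -f2); do
        found=$((found + 1))
        case " $* " in
            *" $mac "*) echo "ok      $mac" ;;
            *) echo "MISSING $mac (pinned in kickstart, not passed to virt-install)"; missing=1 ;;
        esac
    done
    [ "$found" -gt 0 ] || { echo "no MAC-pinned network lines in $ks" >&2; return 1; }
    return "$missing"
}
```

Usage: check_ks_macs ~/sms-ks.cfg 52:54:00:9c:d0:9c 52:54:00:53:c0:f0 should print two ok lines.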
You hand the kickstart to virt-install by injecting it into the installer’s initrd and
pointing the installer kernel at it. --graphics none plus a serial console means the whole
install runs over SSH, with no VNC anywhere:
```shell
virt-install \
  --name sms \
  --vcpus 2 --memory 4096 \
  --disk path=/var/lib/libvirt/images/sms.qcow2,size=40,format=qcow2 \
  --location ~/isos/Rocky-10-latest-x86_64-minimal.iso \
  --initrd-inject ~/sms-ks.cfg \
  --extra-args "inst.ks=file:/sms-ks.cfg console=ttyS0,115200n8" \
  --osinfo detect=on,require=off \
  --network network=default,mac=52:54:00:9c:d0:9c \
  --network network=hpc-prov,mac=52:54:00:53:c0:f0 \
  --graphics none --noautoconsole
```
Then watch it install over the serial console (detach with Ctrl + ]):
```shell
virsh console sms
```
qcow2 is thin-provisioned
size=40 is a ceiling, not an upfront allocation. The head node will store the compute
image it serves later, so give it room.
A couple of small surprises showed up, the kind a clean tutorial usually hides. After the
install rebooted, libvirt left the domain powered off, so it needed a virsh start sms
before I could reconnect. And the DHCP NIC came up named ksdev0 instead of a normal
ens/enp name, a quirk of pinning it by MAC in kickstart. Both are cosmetic, and the NIC
that actually matters, the cluster one, came up as a clean ens3.
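The powered-off-after-install surprise is easy to script around. A sketch (ensure_running is hypothetical) that starts the domain only if virsh domstate doesn’t report running. It also runs virsh autostart so sms comes back with the host, a step the walkthrough above doesn’t take; drop that line if you’d rather start it by hand.

```shell
# Sketch: start a libvirt domain if it is not running, then mark it to
# autostart with the host (the autostart step is an optional extra).
ensure_running() {
    local dom=${1:-sms} state
    state=$(virsh domstate "$dom" 2>/dev/null) || { echo "no such domain: $dom" >&2; return 1; }
    if [ "$state" != "running" ]; then
        virsh start "$dom" >/dev/null || return 1
    fi
    virsh autostart "$dom" >/dev/null
}
```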
The payoff is logging in and seeing the networking land exactly as designed:
```text
[bobby@sms ~]$ hostnamectl
 Static hostname: sms
  Virtualization: kvm
Operating System: Rocky Linux 10.2 (Red Quartz)
    Architecture: x86-64

[bobby@sms ~]$ ip -br addr
lo               UNKNOWN        127.0.0.1/8 ::1/128
ksdev0           UP             192.168.122.108/24 fe80::5054:ff:fe9c:d09c/64
ens3             UP             10.0.0.1/24 fe80::5054:ff:fe53:c0f0/64

[bobby@sms ~]$ ip route
default via 192.168.122.1 dev ksdev0 proto dhcp src 192.168.122.108 metric 101
10.0.0.0/24 dev ens3 proto kernel scope link src 10.0.0.1 metric 100
192.168.122.0/24 dev ksdev0 proto kernel scope link src 192.168.122.108 metric 101
```
Internet NIC on a DHCP lease, cluster NIC nailed to 10.0.0.1, and the default route
leaving through the internet side. That last detail is the one to check: anything bound for
the outside world goes out ksdev0, while ens3 carries cluster traffic only. Exactly
what --nodefroute bought us.
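That check is worth writing down so you can rerun it after every change. A sketch (check_routes is hypothetical) that reads ip route on stdin and fails if the default route isn’t on the internet NIC, if one ever appears on the cluster NIC, or if 10.0.0.0/24 isn’t on the cluster NIC. The interface names are arguments since yours may differ.

```shell
# Sketch: assert the routing design. Usage: ip route | check_routes ksdev0 ens3
check_routes() {
    local inet_if=${1:-ksdev0} clus_if=${2:-ens3} routes defaults
    routes=$(cat)
    defaults=$(printf '%s\n' "$routes" | grep '^default ' || true)
    if ! printf '%s\n' "$defaults" | grep -q " dev $inet_if "; then
        echo "FAIL: no default route via $inet_if" >&2; return 1
    fi
    if printf '%s\n' "$defaults" | grep -q " dev $clus_if "; then
        echo "FAIL: default route via $clus_if (missing --nodefroute?)" >&2; return 1
    fi
    if ! printf '%s\n' "$routes" | grep -q "^10\.0\.0\.0/24 dev $clus_if "; then
        echo "FAIL: 10.0.0.0/24 not on $clus_if" >&2; return 1
    fi
    echo "OK: default via $inet_if, cluster /24 on $clus_if only"
}
```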
Reaching the cluster like a real one: NetBird
On a real cluster you don’t sit at the console. You reach the login node over a VPN.
The same workflow fits here with NetBird (a WireGuard mesh): enroll
sms as a mesh peer and SSH to it over its overlay IP from anywhere, with no port-forwarding
off the host.
The same tool, at work
Using NetBird here as a single peer is what got me comfortable enough to run it for real. I’m now migrating a company’s entire remote access onto a self-hosted NetBird control plane with SSO. That’s a separate series, Rebuilding Remote Access.
```text
your laptop ──NetBird (100.x overlay)──► sms (login node)
                                          │  libvirt L2 bridge 10.0.0.0/24
                                          ├── c1   (PXE/DHCP/Slurm, must be real L2)
                                          └── c2
```
```shell
curl -fsSL https://pkgs.netbird.io/install.sh | sh
sudo netbird up --setup-key <YOUR_SETUP_KEY>   # unattended enrollment with a setup key
netbird status                                 # Management Connected + an overlay IP
```
The rule that keeps this from breaking the cluster:
NetBird is the front door, not the fabric
NetBird goes on the head node only. It does not replace the hpc-prov network.
WireGuard is Layer-3, point-to-point, NOARP, and carries no DHCP/PXE broadcast, and
a diskless node has no WireGuard agent at power-on anyway. So compute nodes keep
talking over the libvirt L2 bridge, while NetBird is purely the access layer.
Different layers, different jobs. Mixing them up is how people accidentally break PXE.
This mirrors production exactly: users reach the cluster through a controlled access layer
to the login node, and the compute nodes stay unreachable from the outside, which is
correct. (If you later want your laptop to reach c1/c2 directly, you make sms a
NetBird routing peer advertising 10.0.0.0/24 and enable net.ipv4.ip_forward. Forget
the forwarding and you get the classic “the route exists but nothing reaches the subnet”
gotcha.)
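For that routing-peer case, the forwarding half looks roughly like this. A sketch, assuming you run it as root on sms: it writes a sysctl.d drop-in so the setting survives reboots, then applies it immediately with sysctl -w. The enable_ip_forward name and the drop-in file name are hypothetical; any *.conf under /etc/sysctl.d is read at boot. Advertising 10.0.0.0/24 is configured in NetBird’s management side and isn’t shown here.

```shell
# Sketch: enable IPv4 forwarding now and persist it across reboots.
# HYPOTHETICAL drop-in name; any /etc/sysctl.d/*.conf file is loaded at boot.
enable_ip_forward() {
    local dropin=${1:-/etc/sysctl.d/90-netbird-routing.conf}
    printf 'net.ipv4.ip_forward = 1\n' > "$dropin" || return 1
    # Apply immediately; the drop-in alone only takes effect on next boot
    sysctl -w net.ipv4.ip_forward=1 >/dev/null
}
```

Afterwards, sysctl net.ipv4.ip_forward should report 1.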
What’s next
We have two networks and a head node sitting on both, reachable like a real login node. Part 3 wires up the cluster’s brain: OpenHPC’s repos, Munge for authentication, and the Slurm controller, the daemons that turn a couple of VMs into something that can actually schedule work.