Series: HPC Lab, Part 3 of 5

OpenHPC, Munge, and Slurm: wiring the cluster's brain

A cluster runs on trust: every daemon has to prove who it is, and they all have to agree on the time. This part installs the curated software stack and the shared-secret auth that lets a pile of VMs become one cluster.

We have a head node sitting on two networks (Part 2). Now we make it the brain of a cluster. That means three things, and only one of them is “install some packages.” The other two, a shared secret and agreement on what time it is, are the load-bearing details that decide whether the cluster ever forms at all.

The mental model: a cluster runs on trust

When slurmctld on the head node tells slurmd on a compute node to launch a job, the compute node has to believe the message is genuine. When you run squeue, the controller has to believe it’s really you. There’s no human in the loop, so the trust has to be mechanical. Two ingredients make it work:

  1. A shared secret every node holds (Munge’s key).
  2. Synchronized clocks, because the secret is time-stamped to defeat replay attacks.

Get both right and a heap of VMs cohere into one cluster. Get either wrong and you get the single most baffling symptom in HPC ops: nodes that are up, pingable, correctly configured, and stuck DOWN. So we set up trust first, before Slurm even starts.

Base config: names and time, the two silent killers

SSH into sms, become root, and pin down identity:

hostnamectl set-hostname sms
cat >> /etc/hosts <<'EOF'
10.0.0.1   sms  sms.cluster.local
10.0.0.2   c1   c1.cluster.local
10.0.0.3   c2   c2.cluster.local
EOF

Why /etc/hosts is load-bearing here

There’s no DNS on the private network, so /etc/hosts is the name service. Slurm and Warewulf identify nodes by name, and every node must resolve every other node identically. A mismatch here is a classic cause of nodes stuck in DOWN/UNKNOWN, where the controller and node literally disagree about who they’re talking to. (This bites again later in a sneakier way. On EL10 the node’s own name can resolve to an IPv6 link-local address instead of its real IPv4 one, and Slurm chokes on it. Part 4 has that war story.)
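A small helper can catch a mismatch before Slurm does by checking that each cluster name maps to exactly one address in a hosts file. This is a minimal sketch that assumes the three names used in this series. The function name check_hosts_file is my own and is not part of Slurm or Warewulf.

```shell
# Sketch: verify an /etc/hosts-style file maps each cluster name to exactly
# one address. Names are passed as arguments (this series uses sms, c1, c2).
# check_hosts_file is a hypothetical helper, not part of Slurm or Warewulf.
check_hosts_file() {
  local file="$1"; shift
  local name addrs count rc=0
  for name in "$@"; do
    # Print the address column of every non-comment line listing this name,
    # then de-duplicate so repeated identical entries count once.
    addrs=$(awk -v n="$name" '
      /^[[:space:]]*#/ { next }
      { for (i = 2; i <= NF; i++) if ($i == n) { print $1; break } }
    ' "$file" | sort -u)
    count=$(printf '%s' "$addrs" | grep -c .)
    if [ "$count" -eq 1 ]; then
      printf '%-6s %s\n' "$name" "$addrs"
    else
      # Missing (0) or ambiguous (>1) mappings are both reported as problems.
      printf '%-6s PROBLEM: %s distinct addresses\n' "$name" "$count" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

Run check_hosts_file /etc/hosts sms c1 c2 on every machine. Then compare the result with what the live resolver returns from getent ahosts sms (and the same for c1 and c2). A mismatch between the two is how the EL10 IPv6 surprise shows up.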

Then time, the step it’s tempting to skip and the one that bites hardest:

dnf install -y chrony
systemctl enable --now chronyd
chronyc tracking
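You can read chronyc tracking by eye, but the same check is easy to script. This is a minimal sketch that assumes chrony's usual report layout, where the System time line reads like "System time : 0.000012345 seconds fast of NTP time". The helper name clock_offset_ok and its 1-second default are my own choices, set far inside Munge's tolerance.

```shell
# Sketch: turn `chronyc tracking` into a pass/fail check. Reads the tracking
# report on stdin and fails if the "System time" offset exceeds a limit in
# seconds (default 1). clock_offset_ok is a hypothetical helper.
# Exit status: 0 = within limit, 1 = too far off, 2 = no "System time" line.
clock_offset_ok() {
  local limit="${1:-1}"
  awk -v limit="$limit" '
    /^System time/ {
      off = $4 + 0          # field 4 is the offset; direction is in "fast"/"slow"
      if (off < 0) off = -off
      seen = 1
      exit (off > limit)
    }
    END { if (!seen) exit 2 }
  '
}
```

Usage: chronyc tracking | clock_offset_ok && echo "clock OK".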

Why time sync is non-negotiable

Slurm authenticates daemon-to-daemon traffic with Munge, and Munge rejects any credential whose timestamp falls outside its time-to-live (five minutes by default). That’s its replay defense. Clock skew between head and compute nodes is the #1 reason a freshly built node refuses to join. The head node will also serve time to the compute nodes, so once they sync against it, fixing the clock here fixes it everywhere.
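Before the head node can serve time, chronyd has to be told which clients may query it. This is a minimal sketch. It assumes the 10.0.0.0/24 subnet from /etc/hosts and the EL default config path /etc/chrony.conf. The allow and local stratum lines are standard chrony directives, while the helper name enable_chrony_server is hypothetical.

```shell
# Sketch: let chronyd on sms answer NTP queries from the private network.
# Assumptions: subnet 10.0.0.0/24 and config at /etc/chrony.conf (EL default).
# enable_chrony_server is a hypothetical helper; it is safe to run twice.
enable_chrony_server() {
  local conf="$1" subnet="$2"
  # Permit NTP clients from the cluster subnet (append only if missing).
  grep -qxF "allow $subnet" "$conf" || echo "allow $subnet" >> "$conf"
  # Keep serving time, at a deliberately poor stratum, even if the head
  # node loses its own upstream sources.
  grep -qxF 'local stratum 10' "$conf" || echo 'local stratum 10' >> "$conf"
}
```

To apply it, run enable_chrony_server /etc/chrony.conf 10.0.0.0/24 and then systemctl restart chronyd. Each compute node also needs a server sms iburst line in its own chrony.conf, which belongs in the compute image. Once the nodes sync, chronyc clients on sms should list them.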

For the lab we relax SELinux and open the firewall on the isolated cluster network. That deserves to be said plainly:

Lab posture, stated plainly

setenforce 0 and a wide-open firewall on the cluster interface are correct for a learning lab on an isolated network. They stop SELinux denials from drowning out the real lesson. In production you run enforcing and a locked-down firewall. Knowing when it’s safe to relax security is itself the skill. Pretending the relaxed posture is production-ready is not.
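For reference, here is roughly what that relaxation looks like on EL. This is a sketch that assumes firewalld and a private NIC whose name you pass in. The name is hypothetical here, so use whatever Part 2 gave the 10.0.0.0/24 interface. Both helper names are my own. Lab only.

```shell
# Sketch of the lab-only relaxations. Do NOT run this on a production system.
# Helper names are hypothetical; the interface name is supplied by you.

# Persist permissive mode in an SELinux config file (default /etc/selinux/config).
# The pattern matches SELINUX= but not SELINUXTYPE=.
selinux_permissive_config() {
  local conf="${1:-/etc/selinux/config}"
  sed -i 's/^SELINUX=.*/SELINUX=permissive/' "$conf"
}

lab_relax_security() {
  local iface="$1"
  setenforce 0                 # permissive now, until the next reboot
  selinux_permissive_config    # ...and after reboots
  # Put the isolated cluster NIC in firewalld's "trusted" zone (accept all).
  firewall-cmd --permanent --zone=trusted --add-interface="$iface"
  firewall-cmd --reload
}
```

Call lab_relax_security with the private interface's name as its only argument.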

OpenHPC: a curated stack instead of software drift

With the trust groundwork laid, we add the software. The interesting choice is not to install Slurm directly but to install OpenHPC first.

Why OpenHPC instead of dnf install slurm

OpenHPC is a community repo of pre-integrated, pre-tested HPC packages (Slurm, Warewulf, Lmod, compilers, MPI), all built to work together and to drop into a documented recipe. Roll your own and you hand-compile Slurm and fight version mismatches forever. Sherlock-class sites use exactly this pattern, a curated stack rather than random dnf installs, precisely to avoid the software drift that makes nodes diverge from each other over time.

OpenHPC’s current 4.x release targets Enterprise Linux 10, so the repo URL and the distro have to line up. EPEL and CodeReady Builder (CRB) supply dependencies OpenHPC leans on:

dnf install -y epel-release
crb enable                     # Rocky 10 CodeReady Builder (build-time deps)
dnf install -y http://repos.openhpc.community/OpenHPC/4/EL_10/x86_64/ohpc-release-4-1.el10.x86_64.rpm

$ dnf repolist | grep -iE 'epel|crb|openhpc'
OpenHPC                 OpenHPC-4 - Base
OpenHPC-updates         OpenHPC-4 - Updates
crb                     Rocky Linux 10 - CRB
epel                    Extra Packages for Enterprise Linux 10 - x86_64
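Instead of eyeballing that listing, a preflight script can fail fast when a repo is missing. This is a minimal sketch. The helper require_repos is hypothetical, and the repo ids are the ones in the output above, which could differ on other EL10 rebuilds.

```shell
# Sketch: check that each named repo id appears in `dnf repolist` output.
# require_repos is a hypothetical helper; it reads the listing on stdin
# and returns non-zero if any requested repo id is absent.
require_repos() {
  local listing missing=0 id
  listing=$(cat)
  for id in "$@"; do
    # Match the repo id exactly against the first column of each line.
    if ! printf '%s\n' "$listing" |
        awk -v id="$id" '$1 == id { found = 1 } END { exit !found }'; then
      echo "missing repo: $id" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: dnf repolist | require_repos OpenHPC OpenHPC-updates crb epel.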

All four repos are live. Now the foundation plus the provisioning stack:

dnf install -y ohpc-base        # foundation: repos, conventions, /opt layout
dnf install -y warewulf-ohpc    # Warewulf 4: wwctl, warewulfd, OCI-container node images
systemctl enable --now warewulfd

Get the Warewulf package name right (this trips people up)

Plenty of older guides say dnf install ohpc-warewulf, which pulls Warewulf 3 (the wwsh + VNFS-chroot + MariaDB stack). On OpenHPC 4 / EL10 you want warewulf-ohpc, which is Warewulf 4: a single warewulfd daemon, a wwctl CLI, and OCI containers as node images instead of chroot tarballs. One word of difference, completely different architecture. Part 4 is all about driving it.
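The two generations ship different CLIs, so you can confirm which one you actually got by looking for them on PATH. This is a minimal sketch that relies only on the command names above (wwctl for Warewulf 4, wwsh for Warewulf 3). The helper name warewulf_generation is hypothetical.

```shell
# Sketch: report which Warewulf generation is installed, judged by its CLI.
# warewulf_generation is a hypothetical helper. Prints 4, 3, or "none".
warewulf_generation() {
  if command -v wwctl >/dev/null 2>&1; then
    echo 4          # Warewulf 4: wwctl + warewulfd + OCI images
  elif command -v wwsh >/dev/null 2>&1; then
    echo 3          # Warewulf 3: wwsh + VNFS chroots + MariaDB
  else
    echo none
    return 1
  fi
}
```

Usage: [ "$(warewulf_generation)" = 4 ] && systemctl is-active warewulfd.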

A quick version check confirms the cutting-edge stack landed:

$ wwctl version
wwctl version: 4.6.5-1

Munge: the shared secret that makes it one cluster

Now the keystone. Munge gives every daemon and command a way to prove its identity using a single shared key:

dnf install -y munge
/usr/sbin/create-munge-key   # generates /etc/munge/munge.key if absent
systemctl enable --now munge
munge -n | unmunge | grep STATUS   # sanity check

$ munge -n | unmunge | grep STATUS
STATUS:           Success (0)

STATUS: Success (0) means the local Munge service can mint and verify a credential. The real test comes later, across nodes, and it hinges on one fact:

Why the same key must land on every node

Every Slurm daemon (slurmctld here, slurmd on each node) and every user command (srun, squeue) authenticates with the same /etc/munge/munge.key plus synchronized clocks. When we build the compute image in Part 4, we copy this exact key into it. Different keys mean nodes can’t authenticate and the cluster never forms. This is one of the two or three things that simply has to be right. (Getting the key’s file ownership right on the node turned out to be its own saga. More in Part 4.)
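Two parts of this are worth scripting. This is a minimal sketch that assumes the conventional setup: the key is owned by the munge user with mode 0400, because munged is strict about who else can read it. The helper munge_key_ok is hypothetical and uses GNU stat, as shipped on EL.

```shell
# Sketch: check the Munge key's owner and mode. munge_key_ok is a
# hypothetical helper; defaults match the conventional EL layout.
munge_key_ok() {
  local key="${1:-/etc/munge/munge.key}" owner="${2:-munge}"
  local actual
  # GNU stat: %U = owner name, %a = octal permission bits.
  actual=$(stat -c '%U %a' "$key") || return 1
  if [ "$actual" != "$owner 400" ]; then
    echo "unexpected owner/mode for $key: $actual (want: $owner 400)" >&2
    return 1
  fi
}
```

Run munge_key_ok on sms now. Later, compare sha256sum /etc/munge/munge.key against the copy that lands in the compute image. Once c1 is up, the standard cross-node test is munge -n | ssh c1 unmunge, which should also report STATUS: Success (0).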

The Slurm controller: install now, configure later

Finally, the scheduler itself, plus the database its accounting daemon needs:

dnf install -y ohpc-slurm-server mariadb-server   # slurmctld (brain) + slurmdbd (accounting)
systemctl enable slurmctld                          # enable, but DON'T start yet

$ slurmctld --version
slurm 25.05.7

Notice we enable but deliberately don’t start slurmctld:

Why split install from config

slurmctld won’t stay up without a valid slurm.conf that lists real nodes, and writing that config requires each node’s actual hardware (cores, memory). Since the compute nodes don’t exist yet, starting now would just produce a confusing crash loop. We install the brain now and give it something to think about once the nodes are built.
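Once a compute node exists, running slurmd -C on it prints the node's detected hardware in slurm.conf syntax, followed by an UpTime line. This is a minimal sketch that keeps only the config line. The helper node_line_from_slurmd is hypothetical.

```shell
# Sketch: extract the NodeName=... line from `slurmd -C` output on stdin,
# dropping the trailing UpTime=... line, which is not configuration.
# node_line_from_slurmd is a hypothetical helper.
node_line_from_slurmd() {
  grep '^NodeName=' | head -n 1
}
```

Usage, once the nodes are up: ssh c1 slurmd -C | node_line_from_slurmd. Repeat for c2, then paste the resulting lines into slurm.conf.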

What’s next

The brain is installed and trust is established: shared key, synced clocks, curated stack. But there are still no compute nodes to command. Part 4 is the showcase: Warewulf 4, stateless provisioning with OCI container images, and the payoff of the cattle-not-pets principle (build one image, then watch diskless VMs PXE-boot into identical, ready-to-work compute nodes). It’s also where I hit the most surprises in the whole build.