#hpc
5 posts tagged hpc.
The condo model and the money shot: watching Slurm preempt a job
Sherlock's condo model: research groups buy nodes and get guaranteed priority; everyone else uses idle cycles and gets bumped when an owner shows up. Here we configure it on two VMs and watch a high-priority job preempt a running one, live.
Warewulf 4 and the art of treating servers as cattle
A stateless node has no OS on its disk. It PXE-boots one golden image into RAM. This is how big clusters stay sane. Getting there on a brand-new EL10 stack took a pile of debugging I didn't expect. Here's all of it.
OpenHPC, Munge, and Slurm: wiring the cluster's brain
A cluster runs on trust: every daemon has to prove who it is, and they all have to agree on the time. This part installs the curated software stack and the shared-secret auth that lets a pile of VMs become one cluster.
The two networks bare-metal guides skip, and booting the head node
Almost every "my node won't PXE boot" problem is a network problem. Here's the two-network design, including the DHCP-less network that makes stateless booting possible, and how to stand up the head node on it.
What an HPC cluster actually is (and how to fake one on a single box)
Strip away the marketing and a supercomputer is five roles wired to a private network. Here's the mental model, and the groundwork for rebuilding a Sherlock-style HPC cluster on one Linux box.