SLURM - Simple Linux Utility for Resource Management

SLURM is an open-source workload manager designed for Linux clusters of all sizes. It provides job scheduling and resource management to optimize cluster utilization.

What is SLURM?

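SLURM (Simple Linux Utility for Resource Management) is a highly scalable cluster management and job scheduling system for large and small Linux clusters, and it runs on some of the world's most powerful supercomputers. It does three things: it allocates nodes to users for a set period of time, it provides a framework for starting, running, and monitoring work on those nodes, and it manages a queue of pending jobs when demand exceeds the available resources.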

Key Features

Open-source and actively developed
Scalable to tens of thousands of nodes
Flexible job scheduling options
Supports job arrays, reservations, and dependencies
Plugins available for authentication, accounting, and more
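
As a rough sketch of how job arrays and dependencies combine, the snippet below submits a ten-task array and then a follow-up job that waits for the whole array to succeed. The script names `array_job.sh` and `merge.sh` and the function name `submit_pipeline` are hypothetical placeholders. The guard at the end is only there so the snippet does nothing on a machine without SLURM.

```shell
#!/bin/bash
# Sketch: job array followed by a dependent job.
# array_job.sh and merge.sh are hypothetical script names.

submit_pipeline() {
    local array_id
    # --parsable makes sbatch print only the job ID (optionally ";cluster")
    array_id=$(sbatch --parsable --array=0-9 array_job.sh) || return 1
    array_id=${array_id%%;*}   # drop the optional ";cluster" suffix
    # afterok: start only once every array task has exited successfully
    sbatch --dependency=afterok:"${array_id}" merge.sh
}

if command -v sbatch >/dev/null 2>&1; then
    submit_pipeline
else
    echo "sbatch not found; run this on a cluster login node" >&2
fi
```

Inside `array_job.sh`, each task can read its own index from the `SLURM_ARRAY_TASK_ID` environment variable.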

Basic Terminology

Node – A single computer in the cluster.
Partition – A logical group of nodes that works like a job queue, usually with its own limits (for example, maximum run time).
Job – A user-submitted task to be run on the cluster.
Job Step – A set of (possibly parallel) tasks within a job, typically launched with `srun`; for example, one run of an MPI program.
Scheduler – The component that determines which jobs run when.
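
To make the difference between a job and its job steps concrete, here is a minimal sketch that writes a batch script containing two `srun` calls. The file name `steps.sh`, the job name, and the task counts are illustrative assumptions, not values from this page.

```shell
#!/bin/bash
# Write a two-step job script; steps.sh is a hypothetical file name.
cat > steps.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=two_steps
#SBATCH --ntasks=4
#SBATCH --time=00:10:00

# Job step 0: a single task prints the name of the node it runs on
srun --ntasks=1 hostname

# Job step 1: all four tasks of the allocation run in parallel
srun --ntasks=4 hostname
EOF

# Submit from a login node with: sbatch steps.sh
```

The whole script is one job; each `srun` line inside it starts one job step. If job accounting is enabled on the cluster, `sacct -j <jobid>` should list the steps separately.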

SLURM Commands

Here are some commonly used SLURM commands:

| Command | Description |
|---------|-------------|
| `srun` | Run a job or job step |
| `sbatch` | Submit a job script for batch scheduling |
| `scancel` | Cancel a running or pending job |
| `squeue` | View the job queue |
| `sinfo` | View information about nodes and partitions |
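
The monitoring commands are often used together. The following is a minimal sketch of a status check and a clean-up helper; the function names `show_status` and `cancel_pending` are made up for illustration, and the guard only keeps the snippet harmless where SLURM is not installed.

```shell
#!/bin/bash
# Sketch of a day-to-day status check; function names are illustrative.

me=${USER:-$(id -un)}

show_status() {
    sinfo              # partitions and the state of their nodes
    squeue -u "$me"    # only this user's jobs
}

cancel_pending() {
    # Cancel this user's jobs that have not started yet
    scancel -u "$me" --state=PENDING
}

if command -v sinfo >/dev/null 2>&1; then
    show_status
else
    echo "SLURM commands not found on this machine" >&2
fi
```

`cancel_pending` is defined but never called automatically, since cancelling jobs should be a deliberate action.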

Example: Submitting a Job

Create a job script (e.g., `job.sh`):

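```bash
#!/bin/bash
# Name shown in squeue, and files for stdout and stderr
#SBATCH --job-name=test_job
#SBATCH --output=result.out
#SBATCH --error=result.err
# Wall-clock limit (HH:MM:SS), target partition, and number of tasks
#SBATCH --time=01:00:00
#SBATCH --partition=standard
#SBATCH --ntasks=1

echo "Hello from SLURM job"
```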
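
Once the script exists, it can be submitted with `sbatch`. The sketch below wraps that call in a hypothetical helper, `submit_job`, which assumes the usual `Submitted batch job <id>` reply and extracts the job ID from it so the job can be looked up in the queue.

```shell
#!/bin/bash
# Sketch: submit job.sh and look up the resulting job in the queue.
# submit_job is an illustrative helper, not a SLURM command.

submit_job() {
    local script=$1 reply
    reply=$(sbatch "$script") || return 1
    # sbatch normally replies "Submitted batch job <id>"; keep the last word
    echo "${reply##* }"
}

if command -v sbatch >/dev/null 2>&1; then
    job_id=$(submit_job job.sh) && squeue -j "$job_id"
else
    echo "sbatch not found; run this on a cluster login node" >&2
fi
```

When the job finishes, its standard output should appear in `result.out` and any error messages in `result.err`, as set by the `--output` and `--error` directives.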
