SLURM - Simple Linux Utility for Resource Management

SLURM is an open-source workload manager for Linux clusters of all sizes, from small clusters to some of the world’s most powerful supercomputers. It provides job scheduling and resource management to optimize cluster utilization.

Key Features

  • Open-source and actively developed
  • Scalable to tens of thousands of nodes
  • Flexible job scheduling options
  • Supports job arrays, reservations, and dependencies
  • Plugins available for authentication, accounting, and more
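
As a minimal sketch of the job array feature listed above, the script below requests four array tasks. The `--array` option, the `%A`/`%a` file name patterns, and the `SLURM_ARRAY_TASK_ID` variable are standard SLURM; the job name and the `input_N.txt` file names are hypothetical and only illustrate how one task index can select one input. The fallback to `0` is an assumption added so the script can be dry-run outside the scheduler.

```bash
#!/bin/bash
# Four array tasks with indices 0..3
#SBATCH --job-name=array_demo
#SBATCH --array=0-3
# %A expands to the array job ID, %a to the array task index
#SBATCH --output=array_%A_%a.out
#SBATCH --time=00:05:00
#SBATCH --ntasks=1

# SLURM sets SLURM_ARRAY_TASK_ID for each array task; default to 0 so the
# script can also be run directly with bash for a dry run.
task_id="${SLURM_ARRAY_TASK_ID:-0}"

# Hypothetical per-task input file name, for illustration only.
input_file="input_${task_id}.txt"

echo "Array task ${task_id} would process ${input_file}"
```

Dependencies are expressed at submission time, for example `sbatch --dependency=afterok:<jobid> next.sh`, where `next.sh` is a placeholder for a second job script that should start only after the first job finishes successfully.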

Basic Terminology

  • Node – A single computer in the cluster.
  • Partition – A group of nodes (like a queue).
  • Job – A user-submitted task to be run on the cluster.
  • Job Step – A set of (possibly parallel) tasks within a job, typically launched with `srun`.
  • Scheduler – The component that determines which jobs run when.
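
The relationship between a job and its job steps can be sketched as follows: one batch job that launches two steps in sequence. The `run_step` helper is hypothetical, not part of SLURM; it calls `srun` when the script runs inside an allocation and otherwise runs the command directly, so the sketch can be tried on a machine without SLURM.

```bash
#!/bin/bash
#SBATCH --job-name=steps_demo
#SBATCH --ntasks=2
#SBATCH --time=00:05:00

# Hypothetical helper: launch a command as a job step with the given number
# of tasks. Outside a SLURM allocation (no SLURM_JOB_ID) it just runs the
# command once, which allows a dry run with plain bash.
run_step() {
    local ntasks="$1"
    shift
    if [ -n "${SLURM_JOB_ID:-}" ] && command -v srun >/dev/null 2>&1; then
        srun --ntasks="$ntasks" "$@"
    else
        "$@"
    fi
}

# First job step: a single task.
run_step 1 echo "step 0: preprocessing"

# Second job step: two tasks, each printing the node it runs on.
run_step 2 hostname
```

Under SLURM, each `srun` call inside the batch script becomes a separate job step of the same job.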

SLURM Commands

Here are some commonly used SLURM commands:

| Command | Description |
|---|---|
| `srun` | Run a job or job step |
| `sbatch` | Submit a job script for batch scheduling |
| `scancel` | Cancel a running or pending job |
| `squeue` | View job queue |
| `sinfo` | View information about nodes and partitions |
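
A short sketch of how the two inspection commands are typically invoked. The check for installed client tools and the `slurm_available` variable are additions for illustration, so the snippet exits cleanly on a machine without SLURM.

```bash
#!/bin/bash
# Inspect the cluster only when the SLURM client tools are installed.
if command -v sinfo >/dev/null 2>&1 && command -v squeue >/dev/null 2>&1; then
    # Partitions and the state of their nodes.
    sinfo
    # Only the jobs that belong to the current user.
    squeue -u "${USER:-$(id -un)}"
    slurm_available=yes
else
    echo "SLURM client tools not found on this machine"
    slurm_available=no
fi
```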

Example: Submitting a Job

Create a job script (e.g., `job.sh`):

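```bash
#!/bin/bash
# Job name shown in the queue, and files for standard output and error
#SBATCH --job-name=test_job
#SBATCH --output=result.out
#SBATCH --error=result.err
# Wall-clock limit (HH:MM:SS), target partition, and number of tasks
#SBATCH --time=01:00:00
#SBATCH --partition=standard
#SBATCH --ntasks=1

echo "Hello from SLURM job"
```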
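
Submitting the script could then look like the sketch below. It assumes `job.sh` is in the current directory; the `submitted` variable and the guard around `sbatch` are illustrative additions so that nothing fails on a machine without SLURM. `--parsable` makes `sbatch` print only the job ID, optionally followed by `;` and a cluster name.

```bash
#!/bin/bash
# Job script from the example above.
job_script="job.sh"

if ! command -v sbatch >/dev/null 2>&1 || [ ! -f "$job_script" ]; then
    echo "sbatch or ${job_script} not found; nothing submitted"
    submitted=no
elif job_id="$(sbatch --parsable "$job_script")"; then
    # Drop the optional ";cluster" suffix to keep only the numeric job ID.
    job_id="${job_id%%;*}"
    echo "Submitted job ${job_id}"
    # Show the job's current state in the queue.
    squeue -j "$job_id"
    # To cancel the job instead of waiting for it: scancel "$job_id"
    submitted=yes
else
    echo "sbatch rejected ${job_script}" >&2
    submitted=no
fi
```

Once the job finishes, its output appears in `result.out` and any errors in `result.err`, as set by the `--output` and `--error` directives.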

slurm_tutorial.1744042872.txt.gz · Last modified: 2025/04/07 16:21 by nshegunov
