slurm_tutorial
Differences
This shows you the differences between two versions of the page.
slurm_tutorial [2025/04/07 16:36] – nshegunov | slurm_tutorial [2025/04/07 17:03] (current) – [SLURM - Simple Linux Utility for Resource Management] nshegunov | ||
---|---|---|---|
Line 1: | Line 1: | ||
====== SLURM - Simple Linux Utility for Resource Management ====== | ====== SLURM - Simple Linux Utility for Resource Management ====== | ||
- | SLURM is an open-source workload manager designed for Linux clusters of all sizes. It provides job scheduling and resource management to optimize cluster utilization. It is a highly scalable cluster management and job scheduling system for large and small Linux clusters. It is used by some of the world’s most powerful supercomputers. | + | SLURM is an open-source workload manager designed for Linux clusters of all sizes. It provides job scheduling and resource management to optimize cluster utilization. It is a highly scalable cluster management and job scheduling system for large and small Linux clusters. It is used by some of the world’s most powerful supercomputers. |
Please refer to [[https:// | Please refer to [[https:// | ||
Line 21: | Line 21: | ||
===== Basic Architecture ===== | ===== Basic Architecture ===== | ||
+ | | {{ : | ||
+ | | SLURM architecture overview ([[https:// | ||
- | | + | Slurm consists of several components that together manage the cluster resources. Below is a short summary:
- | * **slurmd** - Node daemon that runs on each compute node to execute assigned tasks. | + | |
- | * **slurmdbd** (optional) - Handles | + | |
+ | | ||
+ | - Handles | ||
+ | - Usually consists | ||
+ | |||
+ | * **slurmd | ||
+ | | ||
+ | - Responsible for launching, monitoring, and cleaning up jobs on the node. | ||
+ | - Communicates with the slurmctld | ||
+ | |||
+ | * **slurmdbd | ||
+ | | ||
+ | - Works with an external | ||
+ | - Enables commands like **sacct** and **sreport** for usage reporting. | ||
+ | |||
+ | * **Client Commands** | ||
+ | - Tools used by users and admins to interact with Slurm: | ||
+ | - **sbatch** – submit batch jobs | ||
+ | - **srun** – run parallel jobs interactively | ||
+ | - **scancel** – cancel jobs | ||
+ | - **squeue** – view job queues | ||
+ | |||
+ | * **Central Database** '' | ||
+ | - Stores job and usage records. | ||
+ | - Used in conjunction with **slurmdbd** for accounting and reporting. | ||
+ | - Supports multiple clusters if needed. | ||
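The accounting path described above (**slurmdbd** plus the central database, queried through **sacct**) can be sketched from the user side as follows. This is a minimal sketch, not part of the original tutorial: the helper names ''build_sacct_cmd'' and ''parse_sacct_output'' and the sample output are hypothetical, and the chosen fields are only one possible selection. It assumes accounting is enabled on the cluster.

```python
# Fields requested from sacct; these are standard sacct format fields.
SACCT_FIELDS = ["JobID", "JobName", "State", "Elapsed"]


def build_sacct_cmd(job_id):
    """Return the argument list for querying one job's accounting record.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    return [
        "sacct",
        f"--jobs={job_id}",
        f"--format={','.join(SACCT_FIELDS)}",
        "--parsable2",  # pipe-delimited output without a trailing delimiter
        "--noheader",   # skip the header row; names come from SACCT_FIELDS
    ]


def parse_sacct_output(text):
    """Turn pipe-delimited sacct output into a list of dicts, one per line."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        values = line.split("|")
        records.append(dict(zip(SACCT_FIELDS, values)))
    return records


if __name__ == "__main__":
    # Made-up sample output; real values depend on your cluster.
    sample = "1234|demo|COMPLETED|00:01:30\n1234.batch|batch|COMPLETED|00:01:30\n"
    for record in parse_sacct_output(sample):
        print(record["JobID"], record["State"])
```

On a real cluster the command list would be passed to ''subprocess.run'' and its standard output handed to the parser. Keeping command construction and parsing separate lets both be checked without a Slurm installation.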
Each component communicates over a secure protocol to coordinate resource usage and job execution efficiently. | Each component communicates over a secure protocol to coordinate resource usage and job execution efficiently. |
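As an illustration of how the client commands listed above fit together, here is a minimal Python sketch that wraps **sbatch**, **squeue** and **scancel**. The function names, the script path ''job.sh'' and the job name are hypothetical and not taken from this page. The sketch assumes the Slurm client tools are on the ''PATH'' and that **sbatch** prints its default ''Submitted batch job <id>'' message.

```python
import re
import subprocess


def build_sbatch_cmd(script_path, job_name, ntasks=1):
    """Build the sbatch argument list for a batch script."""
    return ["sbatch", f"--job-name={job_name}", f"--ntasks={ntasks}", script_path]


def build_squeue_cmd(user):
    """Build the squeue argument list showing one user's jobs."""
    return ["squeue", f"--user={user}"]


def build_scancel_cmd(job_id):
    """Build the scancel argument list for a single job."""
    return ["scancel", str(job_id)]


def parse_job_id(sbatch_stdout):
    """Extract the numeric job ID from sbatch's default output."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_stdout)
    if match is None:
        raise ValueError(f"unexpected sbatch output: {sbatch_stdout!r}")
    return int(match.group(1))


def submit(script_path, job_name, ntasks=1):
    """Submit a job and return its ID. Needs a working Slurm installation."""
    result = subprocess.run(
        build_sbatch_cmd(script_path, job_name, ntasks),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if sbatch fails
    )
    return parse_job_id(result.stdout)


if __name__ == "__main__":
    # Print the commands instead of running them, so this works anywhere.
    print(" ".join(build_sbatch_cmd("job.sh", "demo", ntasks=4)))
    print(" ".join(build_squeue_cmd("alice")))
    print(" ".join(build_scancel_cmd(4242)))
```

Only ''submit'' actually talks to the cluster; the other helpers just assemble argument lists, which makes them easy to inspect before anything is sent to **slurmctld**.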
slurm_tutorial.1744043780.txt.gz · Last modified: 2025/04/07 16:36 by nshegunov