Download Slurm
The Slurm source can be downloaded from
https://www.schedmd.com/downloads.php.
Slurm has also been packaged for Debian and Ubuntu (named slurm-wlm or slurm-llnl depending upon the version), Fedora, NetBSD (in pkgsrc), and FreeBSD.
Download Related Software
- Authentication plugins identify the user originating a message.
- MUNGE (recommended)
To compile the "auth/munge" authentication plugin for Slurm, you will need to build and install MUNGE, available from https://dun.github.io/munge/ and as packages for Debian, Fedora, and Ubuntu.
- Authentication tools for users that work with Slurm.
- AUKS
AUKS is a utility designed to ease adding Kerberos V credential support to non-interactive applications, like batch systems (Slurm, LSF, Torque, etc.). It includes a plugin for the Slurm workload manager. AUKS is not used as an authentication plugin by the Slurm code itself, but provides a mechanism for the application to manage Kerberos V credentials.
- Databases can be used to store accounting information.
See our Accounting web page for more information.
- Debuggers and debugging tools
- TotalView is a GUI-based source code debugger well suited for parallel applications.
- Padb is a job inspection tool for examining and debugging parallel programs. Primarily, it simplifies the process of gathering stack traces, but it also supports a wide range of other functions. It is an open source, non-interactive, command line, scriptable tool intended for use by programmers and system administrators alike.
- DRMAA (Distributed Resource Management Application API)
PSNC DRMAA for Slurm is an implementation of the Open Grid Forum DRMAA 1.0 (Distributed Resource Management Application API) specification for submission and control of jobs to Slurm. Using DRMAA, grid application builders, portal developers, and ISVs can use the same high-level API to link their software with different cluster/resource management systems.
There is a variant of PSNC DRMAA providing support for Slurm's --cluster option available from https://github.com/natefoo/slurm-drmaa.
Perl 6 DRMAA bindings are available from https://github.com/scovit/Scheduler-DRMAA.
- Hostlist
A Python program used for manipulation of Slurm hostlists including functions such as intersection and difference. Download the code from:
http://www.nsc.liu.se/~kent/python-hostlist
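To illustrate what a hostlist expression is, the following minimal Python sketch expands bracketed numeric ranges such as "tux[1-3,7]" into individual host names, then computes an intersection and a difference with ordinary sets. It is a standalone illustration and not the python-hostlist API: the function name expand_hostlist and its helpers are hypothetical, and only comma-separated numeric ranges inside brackets are handled.

```python
import itertools
import re

# Matches one bracketed group such as "[0-3,7]".
_BRACKET = re.compile(r"\[([^\[\]]+)\]")


def _expand_ranges(spec):
    """Expand "1-3,7" into ["1", "2", "3", "7"], keeping zero padding."""
    values = []
    for part in spec.split(","):
        if "-" in part:
            low, high = part.split("-", 1)
            width = len(low)  # "08-10" pads every value to two digits
            for n in range(int(low), int(high) + 1):
                values.append(str(n).zfill(width))
        else:
            values.append(part)
    return values


def _split_top_level(expr):
    """Split on commas that are not inside brackets."""
    items, depth, current = [], 0, []
    for ch in expr:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            items.append("".join(current))
            current = []
        else:
            current.append(ch)
    items.append("".join(current))
    return [item for item in items if item]


def expand_hostlist(expr):
    """Return the host names described by a hostlist expression (hypothetical helper)."""
    hosts = []
    for item in _split_top_level(expr):
        # After re.split, literal text alternates with bracket contents.
        pieces = _BRACKET.split(item)
        literals = pieces[0::2]
        groups = [_expand_ranges(spec) for spec in pieces[1::2]]
        for combo in itertools.product(*groups):
            name = literals[0]
            for value, literal in zip(combo, literals[1:]):
                name += value + literal
            hosts.append(name)
    return hosts


a = set(expand_hostlist("tux[1-4]"))
b = set(expand_hostlist("tux[3-6]"))
print(sorted(a & b))  # intersection: ['tux3', 'tux4']
print(sorted(a - b))  # difference:   ['tux1', 'tux2']
```

Expanding to plain sets keeps the sketch short, but it does not scale to very large clusters and cannot collapse a set of names back into a compact expression; the tools listed in this section cover those cases.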
Lua bindings for hostlist functions are also available here:
https://github.com/grondo/lua-hostlist
NOTE: The Lua hostlist functions do not support the bracketed numeric ranges anywhere except at the end of the name (i.e. "tux[0001-0100]" and "rack[0-3]_blade[0-63]" are not supported).
- Interactive Script
A wrapper script that makes it very simple to get an interactive shell on a cluster. Download the code from:
https://github.com/ilri/hpc-infrastructure-scripts/blob/master/slurm/interactive
- Interconnect plugins (Switch plugin)
- InfiniBand
The topology.conf file for an InfiniBand switch can be automatically generated using the slurmibtopology tool found here:
https://ftp.fysik.dtu.dk/Slurm/slurmibtopology.sh
- I/O Watchdog
A facility for monitoring user applications, most notably parallel jobs, for hangs, which typically have the side effect of ceasing all write activity. This facility attempts to monitor all write activity of an application and trigger a set of user-defined actions when write activity has ceased for a configurable period of time. A SPANK plugin is provided for use with Slurm. See the README and man page in the package for more details.
https://github.com/grondo/io-watchdog
- MPI versions supported
- Workload Simulator
A Slurm simulator is available to assess various scheduling policies using historic workload data. Under simulation, jobs are not actually executed. Instead, a job execution trace from a real system, or a synthetic trace, is used.
NOTE: This software is currently not maintained.
- PAM Module (pam_slurm)
Pluggable Authentication Module (PAM) for restricting access to compute nodes where Slurm performs workload management. Access to the node is restricted to user root and users who have been allocated resources on that node. NOTE: pam_slurm is included within the Slurm distribution. For earlier Slurm versions, pam_slurm is available for download here.
Slurm's PAM module has also been packaged for Debian and Ubuntu (both named libpam-slurm).
- Command wrappers
There is a wrapper for Maui/Moab's showq command here.
- Job Script Generator
Brigham Young University has developed a JavaScript tool to generate batch job scripts for Slurm, which is available here.
- Scripting interfaces
- A Perl interface is included in the Slurm distribution in the contribs/perlapi directory and packaged in the perlapi RPM.
- PySlurm is a Python/Pyrex module to interface with Slurm. There is also a Python module to expand and collect hostlist expressions available here.
- SPANK Plugins
SPANK provides a very generic interface for stackable plug-ins which may be used to dynamically modify the job launch code in Slurm. SPANK plugins may be built without access to Slurm source code. They need only be compiled against Slurm's spank.h header file and added to the SPANK config file plugstack.conf, and they will be loaded at runtime during the next job launch. Thus, the SPANK infrastructure gives administrators and other developers a low-cost, low-effort way to dynamically modify the runtime behavior of Slurm job launch. An assortment of SPANK plugins is available from
https://github.com/grondo/slurm-spank-plugins.
A SPANK plugin called "spunnel" to support ssh port forwarding is available from Harvard University. It can be downloaded from the spunnel repository.
- Sqlog
A set of scripts that leverages Slurm's job completion logging facility to provide information about what jobs were running at any point in the past as well as what resources they used. Download the code from:
https://github.com/grondo/sqlog
- Task Affinity plugins
- Node Health Check
LBNL Node Health Check is probably the most comprehensive and lightweight health check tool available. It integrates with Slurm as well as the Torque resource manager.
- Accounting Tools
- UBMoD is a web-based tool for displaying accounting data from various resource managers. It aggregates the accounting data from sacct into a MySQL data warehouse and provides a front-end web interface for browsing the data. For more information, see the UBMoD home page and source code.
- XDMoD (XD Metrics on Demand) is an NSF-funded open source tool designed to audit and facilitate the utilization of the XSEDE cyberinfrastructure by providing a wide range of metrics on XSEDE resources, including resource utilization, resource performance, and impact on scholarship and research.
- STUBL (Slurm Tools and UBiLities)
STUBL is a collection of supplemental tools and utility scripts for Slurm.
STUBL home page.
- pbs2sbatch
- Converts PBS directives to equivalent Slurm sbatch directives. Accommodates old UB CCR-specific PBS tags like IB1, IB2, etc.
- pbs2slurm
- A script that attempts to convert PBS scripts into corresponding sbatch scripts. It will convert PBS directives as well as PBS environment variables and will insert bash code to create a SLURM_NODEFILE that is consistent with the PBS_NODEFILE.
- slurmbf
- Analogous to the PBS "showbf -S" command.
- snodes
- A customized version of sinfo. Displays node information in an easy-to-interpret format. Filters can be applied to view (1) specific nodes, (2) nodes in a specific partition, or (3) nodes in a specific state.
- sqstat
- A customized version of squeue that produces output analogous to the PBS qstat and xqstat commands (requires clush).
- fisbatch
- Friendly Interactive sbatch. A customized version of sbatch that provides a user-friendly interface to an interactive job with X11 forwarding enabled. It is analogous to the PBS "qsub -I -X" command. This code was adopted from srun.x11 (requires clush).
- sranks
- A command that lists the overall priorities and associated priority components of queued jobs in ascending order. Top-ranked jobs will be given priority by the scheduler but lower ranked jobs may get slotted in first if they fit into the scheduler's backfill window.
- sqelp
- A customized version of squeue that only prints a double-quote if the information in a column is the same from row to row. Some users find this type of formatting easier to visually digest.
- sjeff
- Determines the efficiency of one or more running jobs. Inefficient jobs are highlighted in red text (requires clush).
- sueff
- Determines the overall efficiency of the running jobs of one or more users. Users that are inefficient are highlighted in red text (requires clush).
- yasqr
- Yet Another Squeue Replacement. Fixes squeue bugs in earlier versions of Slurm.
- sgetscr
- Retrieves the Slurm/sbatch script and environment files for a job that is queued or running.
- snacct
- Retrieves Slurm accounting information for a given node and for a given period of time.
- suacct
- Retrieves Slurm accounting information for a given user's jobs for a given period of time.
- slist
- Retrieves Slurm accounting and node information for a running or completed job (requires clush).
- slogs
- Retrieves resource usage and accounting information for a user or list of users. For each job that was run after the given start date, the following information is gathered from the Slurm accounting logs: Number of CPUs, Start Time, Elapsed Time, Amount of RAM Requested, Average RAM Used, and Max RAM Used.
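As a rough illustration of the kind of translation that pbs2sbatch and pbs2slurm perform, the Python sketch below maps a few common PBS directives to their sbatch equivalents. It is not the STUBL code: the function name convert_pbs_line and the small option table are assumptions chosen for this example, and only a handful of directives are handled.

```python
import re

# A small, hand-picked subset of PBS options and their sbatch equivalents.
# This table is an assumption for the example, not the STUBL mapping.
SIMPLE_OPTIONS = {
    "-N": "--job-name",
    "-q": "--partition",
    "-o": "--output",
    "-e": "--error",
    "-A": "--account",
}


def convert_pbs_line(line):
    """Translate one "#PBS ..." line; return any other line unchanged."""
    match = re.match(r"^#PBS\s+(-\w)\s+(\S+)\s*$", line)
    if not match:
        return line
    flag, value = match.groups()
    if flag in SIMPLE_OPTIONS:
        return f"#SBATCH {SIMPLE_OPTIONS[flag]}={value}"
    if flag == "-l" and value.startswith("walltime="):
        return "#SBATCH --time=" + value[len("walltime="):]
    if flag == "-l":
        nodes = re.fullmatch(r"nodes=(\d+):ppn=(\d+)", value)
        if nodes:
            return (f"#SBATCH --nodes={nodes.group(1)} "
                    f"--ntasks-per-node={nodes.group(2)}")
    # Unknown directive: keep it as a comment so nothing is silently dropped.
    return "# UNCONVERTED " + line


script = """#!/bin/bash
#PBS -N demo
#PBS -l walltime=01:00:00
#PBS -l nodes=2:ppn=8
echo hello
"""
converted = "\n".join(convert_pbs_line(l) for l in script.splitlines())
print(converted)
```

Unrecognized directives are kept as comments rather than discarded, so a user can see what still needs manual attention. A real converter would also need to handle PBS environment variables, as the pbs2slurm description notes.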
- pestat
Prints a consolidated compute node status line, with one line per node including a list of jobs.
Home page
- Slurmmon
Slurmmon is a system for gathering and plotting data about Slurm scheduling and job characteristics. It currently simply sends the data to ganglia, but it includes some custom reports and a web page for an organized summary. It collects all the data from sdiag as well as total counts of running and pending jobs in the system and the maximum such values for any single user. It can also submit probe jobs to various partitions in order to trend the times spent pending in them, which is often a good bellwether of scheduling problems.
Slurmmon code
- Graphical Sdiag
The sdiag utility is a diagnostic tool that maintains statistics on Slurm's scheduling performance. You can run sdiag periodically or as you modify Slurm's configuration. However, if you want a historical view of these statistics, you could save them in a time-series database and graph them over time, as performed with this tool.
- MSlurm
MSlurm provides a superstructure for managing multiple Slurm environments. With it, several Slurm clusters, even across multiple Slurm databases, can run in parallel on a Slurm master and be administered in an easy and elegant manner.
- JSON
Some Slurm plugins (burst_buffer/datawarp and power/cray_aries) parse JSON format data. These plugins are designed to make use of the JSON-C library for this purpose. Instructions for the build are as follows:
- Download json-c version 0.12 (or higher) from
https://github.com/json-c/json-c/wiki
- Unpackage json-c
gunzip json-c-0.12.tar.gz
tar -xf json-c-0.12.tar
- Build and install json-c
- If you have current build tools
cd json-c-0.12
export CFLAGS=-Wno-error=unused-but-set-variable
./configure --prefix=DESIRED_JSON_PATH
make
make install
- If you have old build tools
cd json-c-0.12
mv aclocal.m4 aclocal.m4.orig
mv ltmain.sh ltmain.sh.orig
./autogen.sh
export CFLAGS=-Wno-error=unused-but-set-variable
./configure --prefix=DESIRED_JSON_PATH
make
make install
- Build and install Slurm
./configure --with-json=DESIRED_JSON_PATH ...
make -j
- Slurm-web
Slurm-web is free software, distributed under the GPL version 2 license, that provides both an HTTP REST API (based on JSON format) and a web GUI with dashboards and graphical views of the current state of your Slurm-based HPC supercomputers. The website of Slurm-web, with screenshots:
http://edf-hpc.github.io/slurm-web
Last modified 17 June 2020