Launch Plugin API
Overview
This document describes the Slurm launch plugins, which are responsible for launching parallel tasks, and the API that defines them. It is intended as a resource for programmers who wish to write their own launch plugin.
const char plugin_name[]="launch Slurm plugin"
const char plugin_type[]="launch/slurm"
- slurm — Use Slurm's default launching infrastructure
const uint32_t plugin_version
If specified, identifies the version of Slurm used to build this plugin and
any attempt to load the plugin from a different version of Slurm will result
in an error.
If not specified, the plugin may be loaded by Slurm commands and
daemons from any version; however, this may result in difficult-to-diagnose
failures due to changes in the arguments to plugin functions or changes
in other Slurm functions used by the plugin.
The programmer is urged to study src/plugins/launch/slurm/launch_slurm.c for a sample implementation of a Slurm launch plugin.
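As an illustration of the identification symbols described above, the following is a minimal sketch of how a plugin source file might declare them. The plugin name and the "example" minor type are hypothetical. The fallback definition of SLURM_VERSION_NUMBER is a placeholder so that the fragment compiles on its own; in a real build the value is expected to come from the Slurm headers.

```c
#include <stdint.h>

/* Placeholder so this fragment compiles outside the Slurm source tree.
 * Assumption: a real build gets SLURM_VERSION_NUMBER from Slurm's headers. */
#ifndef SLURM_VERSION_NUMBER
#define SLURM_VERSION_NUMBER 0
#endif

/* Free-form description of the plugin (hypothetical name). */
const char plugin_name[] = "launch example plugin";

/* "launch" major type followed by a hypothetical minor type. */
const char plugin_type[] = "launch/example";

/* Ties the plugin to the Slurm version it was built against. */
const uint32_t plugin_version = SLURM_VERSION_NUMBER;
```

Omitting plugin_version would allow the plugin to load under any Slurm version, with the risks described above.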
API Functions
int init (void)
Description:
Called when the plugin is loaded, before any other functions are
called. Put global initialization here.
Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.
void fini (void)
Description:
Called when the plugin is removed. Clear any allocated storage here.
Returns: None.
Note: These init and fini functions are not the same as those described in the dlopen(3) system library. The C run-time system co-opts those symbols for its own initialization. The system _init() is called before the Slurm init(), and the Slurm fini() is called before the system's _fini().
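A minimal sketch of an init/fini pair follows. It assumes a single piece of plugin-global storage (the scratch buffer is hypothetical) and defines SLURM_SUCCESS and SLURM_ERROR locally so that the fragment compiles without the Slurm headers.

```c
#include <stdlib.h>

/* Local stand-ins so the sketch compiles outside the Slurm tree. */
#ifndef SLURM_SUCCESS
#define SLURM_SUCCESS 0
#define SLURM_ERROR  (-1)
#endif

/* Hypothetical plugin-global storage, allocated once at load time. */
static char *scratch = NULL;

/* Called when the plugin is loaded, before any other function. */
int init(void)
{
    scratch = malloc(64);
    return scratch ? SLURM_SUCCESS : SLURM_ERROR;
}

/* Called when the plugin is removed; releases what init() allocated. */
void fini(void)
{
    free(scratch);
    scratch = NULL;   /* makes a repeated fini() call harmless */
}
```

Resetting the pointer after free() is a defensive choice in this sketch, not a requirement stated by the API.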
int launch_p_setup_srun_opt(char **rest, opt_t *opt_local)
Description:
Sets up the srun operation.
Arguments:
rest: extra parameters on the
command line not processed by srun
opt_local: task launch options from srun command
Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.
int launch_p_handle_multi_prog_verify(int command_pos, opt_t *opt_local)
Description:
Called to verify a multi-prog file, if verification is needed.
Arguments:
command_pos: position of the command
in opt.argv, to be used with the global opt variable.
opt_local: task launch options from srun command
Returns:
1 if handled, or
0 if not.
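The return convention above (1 if handled, 0 if not) can be sketched as follows. The opt_t type here is a small stand-in with three assumed fields, not srun's real option structure, and the sketch indexes opt_local->argv rather than a global opt variable so that it stays self-contained. The verification step itself is left as a comment.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for srun's option structure. The real opt_t is much larger;
 * these three fields are assumptions made for this sketch only. */
typedef struct {
    int    argc;        /* number of entries in argv */
    char **argv;        /* command and its arguments */
    bool   multi_prog;  /* true if a multi-prog file was requested */
} opt_t;

/* Returns 1 if the plugin handled verification, 0 if it did not. */
int launch_p_handle_multi_prog_verify(int command_pos, opt_t *opt_local)
{
    if (opt_local == NULL || !opt_local->multi_prog)
        return 0;   /* nothing to verify */
    if (command_pos < 0 || command_pos >= opt_local->argc)
        return 0;   /* command position is out of range */
    if (opt_local->argv[command_pos] == NULL)
        return 0;   /* no file name at that position */

    /* A real plugin would parse and validate the file named by
     * opt_local->argv[command_pos] here. */
    return 1;
}
```

A plugin that needs no verification of its own could simply return 0.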
int launch_p_create_job_step(srun_job_t *job, bool use_all_cpus, void (*signal_function)(int), sig_atomic_t *destroy_job, opt_t *opt_local, int pack_offset)
Description:
Creates the job step.
Arguments:
job: the job to run.
use_all_cpus: whether to use
all CPUs.
signal_function: function that
handles the signals coming in.
destroy_job: pointer to a global
flag signifying if the job was canceled while allocating.
opt_local: task launch options from srun command
pack_offset: zero-origin index into a
heterogeneous job allocation, -1 if not heterogeneous job
Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.
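The interplay between signal_function and destroy_job can be sketched as below: a handler sets the flag, and the step-creation loop checks it before each attempt. This is an illustration under stated assumptions, not the launch/slurm implementation. The names create_job_step_sketch, on_signal and try_create_step are hypothetical, the stub simply succeeds on its third attempt, and the flag is declared volatile, which the documented signature does not show.

```c
#include <signal.h>
#include <stdbool.h>

/* Local stand-ins so the sketch compiles outside the Slurm tree. */
#ifndef SLURM_SUCCESS
#define SLURM_SUCCESS 0
#define SLURM_ERROR  (-1)
#endif

/* Global flag set when the job is canceled while allocating. */
static volatile sig_atomic_t destroy_job = 0;

/* Hypothetical signal handler: only records that a cancel arrived. */
static void on_signal(int signo)
{
    (void)signo;
    destroy_job = 1;
}

/* Hypothetical stub for one creation attempt; succeeds on the third try. */
static bool try_create_step(int attempt)
{
    return attempt >= 2;
}

/* Retries step creation, giving up as soon as the cancel flag is set. */
int create_job_step_sketch(void (*signal_function)(int),
                           volatile sig_atomic_t *destroy, int max_tries)
{
    signal(SIGINT, signal_function);   /* handler stays installed */

    for (int attempt = 0; attempt < max_tries; attempt++) {
        if (*destroy)
            return SLURM_ERROR;        /* canceled while allocating */
        if (try_create_step(attempt))
            return SLURM_SUCCESS;
    }
    return SLURM_ERROR;                /* ran out of attempts */
}
```

Keeping the handler down to a single assignment to a sig_atomic_t flag avoids calling functions that are not async-signal-safe.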
int launch_p_step_launch(srun_job_t *job, slurm_step_io_fds_t *cio_fds, uint32_t *global_rc, opt_t *opt_local)
Description:
Launches the job step.
Arguments:
job: the job to launch.
cio_fds: filled-in I/O descriptors.
global_rc: srun global return code.
opt_local: task launch options from srun command
Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.
int launch_p_step_wait(srun_job_t *job, bool got_alloc, opt_t *opt_local, int pack_offset)
Description:
Waits for the job to be finished.
Arguments:
job: the job to wait for.
got_alloc: true if the resource
allocation was created inside srun.
opt_local: task launch options from srun command
pack_offset: zero-origin index into a
heterogeneous job allocation, -1 if not heterogeneous job
Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.
int launch_p_step_terminate(void)
Description:
Terminates the job step.
Returns:
SLURM_SUCCESS on success, or
SLURM_ERROR on failure.
void launch_p_print_status(void)
Description:
Prints the status of the job.
void launch_p_fwd_signal(int signal)
Description:
Forwards a signal to any underlying tasks.
Arguments:
signal: the signal to forward.
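One way to picture signal forwarding is a plugin that keeps a small table of the steps it has launched and passes each incoming signal to every entry. The sketch below assumes that design: step_ctx_t, register_step and the fixed-size table are hypothetical, and delivery is simulated by recording the signal number rather than contacting real tasks. The parameter is named signo here to avoid shadowing signal() from <signal.h>.

```c
#include <signal.h>
#include <stddef.h>

#define MAX_STEPS 4

/* Hypothetical per-step handle kept by the plugin. */
typedef struct {
    int last_signal;   /* most recent signal forwarded to this step */
} step_ctx_t;

static step_ctx_t steps[MAX_STEPS];
static size_t step_count = 0;

/* Hypothetical helper: records a newly launched step.
 * Returns NULL when the table is full. */
static step_ctx_t *register_step(void)
{
    if (step_count >= MAX_STEPS)
        return NULL;
    steps[step_count].last_signal = 0;
    return &steps[step_count++];
}

/* Forwards the signal to every step the plugin knows about.
 * A real plugin would deliver it to the running tasks instead of
 * just recording it. */
void launch_p_fwd_signal(int signo)
{
    for (size_t i = 0; i < step_count; i++)
        steps[i].last_signal = signo;
}
```

With no steps registered, the function does nothing, which matches its void return type.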
Last modified 14 June 2018