Release Notes

The following are the contents of the RELEASE_NOTES file as distributed with the Slurm source code for this release. Please refer to the NEWS include alongside the source as well for more detailed descriptions of the associated changes, and for bugs fixed within each maintenance release.

RELEASE NOTES FOR SLURM VERSION 20.02

IMPORTANT NOTES:
If using the slurmdbd (Slurm DataBase Daemon) you must update this first.

NOTE: If using a backup DBD you must start the primary first to do any
database conversion, the backup will not start until this has happened.

The 20.02 slurmdbd will work with Slurm daemons of version 18.08 and above.
You will not need to update all clusters at the same time, but it is very
important to update slurmdbd first and having it running before updating
any other clusters making use of it.

Slurm can be upgraded from version 18.08 or 19.05 to version 20.02 without loss
of jobs or other state information. Upgrading directly from an earlier version
of Slurm will result in loss of state information.

If using SPANK plugins that use the Slurm APIs, they should be recompiled when
upgrading Slurm to a new major release.

NOTE: Slurmctld is now set to fatal in case of computing node configured with
      CPUs == #Sockets. CPUs has to be either total number of cores or threads.

NOTE: The FastSchedule option has been removed. The FastSchedule=2 functionality
      (used for testing and development) is available as the new
      SlurmdParameters=config_overrides option.

HIGHLIGHTS
==========
 -- Drop slurm.spec-legacy from distribution.
 -- Removed the smap command.
 -- Exclusive behavior of a node includes all GRES on a node as well
    as the cpus.
 -- Use python3 instead of python for internal build/test scripts.
    The slurm.spec file has been updated to depend on python3 as well.
 -- Added new NodeSet configuration option to help simplify partition
    configuration sections for heterogeneous / condo-style clusters.
 -- Added slurm.conf option MaxDBDMsgs to control how many messages will be
    stored in the slurmctld before throwing them away when the slurmdbd is down.
 -- The checkpoint plugin interface and all associated API calls have been
    removed.
 -- slurm_init_job_desc_msg() initializes mail_type as uint16_t. This allows
    mail_type to be set to NONE with scontrol.
 -- Add new slurm_spank_log() function to print messages back to the user from
    within a SPANK plugin without prepending "error: " from slurm_error().
 -- Enforce having partition name and nodelist=ALL when creating reservations
    with flags=PART_NODES.
 -- SPANK - removed never-implemented slurm_spank_slurmd_init() interface. This
    hook has always been accessible through slurm_spank_init() in the
    S_CTX_SLURMD context instead.
 -- sbcast - add new BcastAddr option to NodeName lines to allow sbcast traffic
    to flow over an alternate network path.
 -- Added auth/jwt plugin, and 'scontrol token' subcommand.
 -- PMIx - improve performance of proc map generation.
 -- Move kill_invalid_depend and max_depend_depth options to the new
    DependencyParameters option, and deprecate use in SchedulerParameters.
 -- Enable job dependencies for any job on any cluster in the same federation.
 -- Allow clusters to be added automatically to db at startup of ctld.
 -- Add AccountingStorageExternalHost slurm.conf parameter.
 -- The "ConditionPathExists" condition in slurmd.service has been disabled by
    default to permit simpler installation of a "configless" Slurm cluster.
 -- In SchedulerParameters remove deprecated max_job_bf and replace with
    bf_max_job_test.
 -- Disable sbatch, salloc, srun --reboot for non-admins.
 -- SPANK - added support for S_JOB_GID in the job script context with
    spank_get_item().
 -- Prolog/Epilog - add SLURM_JOB_GID environment variable.
 -- Add new slurmrestd command/daemon which implements the Slurm REST API.
 -- The inconsistent terminology and environment variable naming for
    Heterogeneous Job ("HetJob") support has been tidied up.
    -- The correct term for these jobs are "HetJobs", references to "PackJob"
       have been corrected.
    -- The correct term for the separate constituent jobs are "components",
       references to "packs" have been corrected.
    -- Output from 'scontrol show job' and others has been made consistent
       with this .
    -- Relevant environment variables are all of the form SLURM_HET_JOB_*.
       (Old forms are still supported for this release, but may be removed
       in the future.)
 -- slurm.spec - override "hardening" linker flags to ensure RHEL8 builds
    in a usable manner.
 -- Now when requesting exclusive access on a node the job will be given all the
    GRES on a node whether or not they asked for it.
    NOTE: Because of this change exclusive jobs submitted on previous versions
    of Slurm may produce cosmetic errors such as
    'error: _job_alloc_whole_node_internal: This should never happen, we
     couldn't find the gres 7696487:3160171' and when the job finishes
    'error: gres/gpu: job 17262 dealloc node snowflake0 GRES count underflow
     (0 < 1)'
    These should only exist while these older jobs are loading/finishing.
    The error does not present an actual problem or impact future jobs and
    should be ignored on pre-20.02 jobs.
 -- Both jobcomp and cli_filter now have lua interfaces.

SPECIAL NOTES FOR CRAY SYSTEMS
==============================
 -- burst_buffer/datawarp - the OtherTimeout configuration option will now
    apply to the stage-in "setup" and stage-out "dws_post_run" calls in to
    dw_wlm_cli. (The StageInTimeout and StageOutTimeout were previously
    used here by mistake.)
 -- burst_buffer/datawarp - add a set of % symbols that will be replaced by
    job details. E.g., %d will be filled in with the WorkDir for the job.

CONFIGURATION FILE CHANGES (see man appropriate man page for details)
=====================================================================
 -- The mpi/openmpi plugin has been removed as it does nothing.
    MpiDefault=openmpi will be translated to the functionally-equivalent
    MpiDefault=none.

COMMAND CHANGES (see man pages for details)
===========================================
 -- Display StepId=.batch instead of StepId=.4294967294 in output
    of "scontrol show step". (slurm_sprint_job_step_info())
 -- MPMD in srun will now defer PATH resolution for the commands to launch to
    slurmstepd. Previously it would handle resolution client-side, but with
    a non-standard approach that walked PATH in reverse.
 -- squeue - added "--me" option, equivalent to --user=$USER.
 -- The LicensesUsed line has been removed from 'scontrol show config'.
    Please see the 'scontrol show licenses' command as an alternative.
 -- sbatch - adjusted backoff times for "--wait" option to reduce load on
    slurmctld. This results in a steady-state delay of 32s between queries,
    instead of the prior 10s delay.
 -- Change 'scancel -t' to filter based only on job base state.