pam_slurm_adopt
The purpose of this module is to prevent users from using ssh to reach nodes on which they do not have a running job, and to track the ssh connection and any processes it spawns, both for accounting and to ensure complete cleanup when the job ends. To do this, the module determines which job originated the ssh connection. The user's connection is then "adopted" into the "extern" step of that job.
Installation
Source:
In your Slurm build directory, navigate to slurm/contribs/pam_slurm_adopt/ and run
make && make install
as root. This will place pam_slurm_adopt.a, pam_slurm_adopt.la, and pam_slurm_adopt.so in /lib/security/ (on Debian systems) or /lib64/security/ (on RedHat/SuSE systems).
RPM:
The included slurm.spec will build a slurm-pam_slurm RPM which will install pam_slurm_adopt. Refer to the Quick Start Administrator Guide for instructions on managing an RPM-based install.
PAM Configuration
Add the following line to the appropriate file in /etc/pam.d, such as system-auth or sshd (you may use either the "required" or "sufficient" PAM control flag):
account required pam_slurm_adopt.so
The order of plugins is very important. pam_slurm_adopt.so should be the last PAM module in the account stack. Included files such as common-account should normally be included before pam_slurm_adopt. You might have the following account stack in sshd:
account required pam_nologin.so
account include password-auth
...
-account required pam_slurm_adopt.so
Note the "-" before the account entry for pam_slurm_adopt. It allows PAM to fail gracefully if the pam_slurm_adopt.so file is not found. If Slurm is on a shared filesystem, such as NFS, then this is suggested to avoid being locked out of a node while the shared filesystem is mounting or down.
pam_slurm_adopt must be used with the task/cgroup task plugin and either the proctrack/cgroup or the proctrack/cray_aries proctrack plugin. The pam_systemd module will conflict with pam_slurm_adopt, so you need to disable it in all files that are included in sshd or system-auth (e.g. password-auth, common-session, etc.). You should also stop and mask systemd-logind. You must also make sure a different PAM module isn't short-circuiting the account stack before it gets to pam_slurm_adopt.so. From the example above, the following two lines have been commented out in the included password-auth file:
#account sufficient pam_localuser.so
#-session optional pam_systemd.so
Note: This may involve editing a file that is auto-generated. Do not run the config script that generates the file or your changes will be erased.
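The ordering rules above can be checked mechanically. The following is a minimal sketch, not part of Slurm: the function name check_account_stack is hypothetical, the parser handles only the simple "type control module" form (not the bracketed control syntax), and it does not follow include directives, so each included file would need to be checked separately.

```python
def check_account_stack(pam_text):
    """Return a list of warnings for the text of one PAM service file.

    Simplified sketch: only 'type control module' lines are understood,
    and 'include' directives are not followed.
    """
    warnings = []
    account_modules = []
    for raw in pam_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and commented-out entries
        fields = line.split()
        if len(fields) < 3:
            continue
        # A leading "-" on the type marks a module that is allowed to be missing.
        pam_type = fields[0].lstrip("-")
        control, module = fields[1], fields[2]
        if pam_type == "session" and module == "pam_systemd.so":
            warnings.append("pam_systemd.so is enabled in the session stack")
        if pam_type == "account":
            account_modules.append((control, module))

    names = [module for _, module in account_modules]
    if "pam_slurm_adopt.so" not in names:
        warnings.append("pam_slurm_adopt.so is not in the account stack")
        return warnings
    idx = names.index("pam_slurm_adopt.so")
    if idx != len(names) - 1:
        warnings.append("pam_slurm_adopt.so is not the last account module")
    for control, module in account_modules[:idx]:
        if control == "sufficient":
            warnings.append(module + " is 'sufficient' and may short-circuit the stack")
    return warnings
```

The warnings are hints to review rather than errors: a deliberately placed "sufficient" module ahead of pam_slurm_adopt.so, or a module deliberately stacked after it, will also be reported.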
If you always want to allow access for an administrative group (e.g., wheel), stack the pam_access module after pam_slurm_adopt. A success with pam_slurm_adopt is sufficient to allow access, but the pam_access module can allow others, such as administrative staff, access even without jobs on that node:
account sufficient pam_slurm_adopt.so
account required pam_access.so
Then edit the pam_access configuration file (/etc/security/access.conf):
+:wheel:ALL
-:ALL:ALL
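To see why this arrangement admits both job owners and administrators, it helps to trace how the "sufficient" and "required" control flags combine. The sketch below is a simplified model of those two flags only, written for illustration; evaluate_stack is a hypothetical name, the per-module results are supplied by hand, and this is not how Linux-PAM is implemented.

```python
def evaluate_stack(entries):
    """Model the 'sufficient' and 'required' control flags.

    entries: list of (control_flag, module_result) tuples in stack order,
    where module_result is "success", "failure" or "ignore".
    Returns True if the stack as a whole grants access.
    """
    required_failed = False
    any_success = False
    for control, result in entries:
        if result == "ignore":
            continue  # the module takes no part in the decision
        if control == "sufficient":
            if result == "success" and not required_failed:
                return True  # stop here; later modules are not consulted
            # A failing "sufficient" module is not held against the user.
        elif control == "required":
            if result == "success":
                any_success = True
            else:
                required_failed = True
    return any_success and not required_failed


# User with a job on the node: pam_slurm_adopt succeeds, pam_access is never reached.
print(evaluate_stack([("sufficient", "success"), ("required", "failure")]))
# Administrator in wheel without a job: pam_slurm_adopt fails, pam_access succeeds.
print(evaluate_stack([("sufficient", "failure"), ("required", "success")]))
# Ordinary user without a job: both modules fail.
print(evaluate_stack([("sufficient", "failure"), ("required", "failure")]))
```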
An alternative to pam_access is to place pam_listfile.so before pam_slurm_adopt.so. For example:
account sufficient pam_listfile.so item=user sense=allow onerr=fail file=/path/to/allowed_users_file
account required pam_slurm_adopt.so
List the usernames of the allowed users in allowed_users_file.
When access is denied, the user will receive a relevant error message.
pam_slurm_adopt Module Options
This module is configurable. Add these options to the end of the pam_slurm_adopt line in the appropriate file in /etc/pam.d/ (e.g., sshd or system-auth):
account sufficient pam_slurm_adopt.so optionname=optionvalue
This module has the following options:
- action_no_jobs
  The action to perform if the user has no jobs on the node. Configurable values are:
  - ignore: Do nothing. Fall through to the next PAM module.
  - deny (default): Deny the connection.
- action_unknown
  The action to perform when the user has multiple jobs on the node and the RPC does not locate the source job. If the RPC mechanism works properly in your environment, this option will likely be relevant only when connecting from a login node. Configurable values are:
  - newest (default): Pick the newest job on the node. The "newest" job is chosen based on the mtime of the job's step_extern cgroup; asking Slurm would require an RPC to the controller. Thus, the memory cgroup must be in use so that the code can check mtimes of cgroup directories. The user can ssh in but may be adopted into a job that exits earlier than the job they intended to check on. The ssh connection will at least be subject to appropriate limits and the user can be informed of better ways to accomplish their objectives if this becomes a problem. NOTE: If the module fails to retrieve the cgroup mtime, then the picked job may not be the newest one.
  - allow: Let the connection through without adoption.
  - deny: Deny the connection.
- action_adopt_failure
  The action to perform if the process is unable to be adopted into any job for whatever reason. If the process cannot be adopted into the job identified by the callerid RPC, it will fall through to the action_unknown code and try to adopt there. A failure at that point or if there is only one job will result in this action being taken. Configurable values are:
  - allow (default): Let the connection through without adoption. WARNING: This value is insecure and is recommended for testing purposes only. We recommend using "deny."
  - deny: Deny the connection.
- action_generic_failure
  The action to perform if there are certain failures such as the inability to talk to the local slurmd or if the kernel doesn't offer the correct facilities. Configurable values are:
  - ignore (default): Do nothing. Fall through to the next PAM module. WARNING: This value is insecure and is recommended for testing purposes only. We recommend using "deny."
  - allow: Let the connection through without adoption.
  - deny: Deny the connection.
- disable_x11
  Turn off Slurm built-in X11 forwarding support. Configurable values are:
  - 0 (default): If the job the connection is adopted into has Slurm's X11 forwarding enabled, the DISPLAY variable will be overwritten with the X11 tunnel endpoint details.
  - 1: Do not check for Slurm's X11 forwarding support, and do not alter the DISPLAY variable.
- log_level
  See SlurmdDebug in slurm.conf for available options. The default log_level is info.
- nodename
  If the NodeName defined in slurm.conf is different from this node's hostname (as reported by hostname -s), then this must be set to the NodeName in slurm.conf that this host operates as.
- service
  The PAM service name for which this module should run. By default it runs only for sshd, the service for which it was designed. A different service name, such as "login", can be specified, or "*" to allow the module to run in any service context. For local PAM logins this module could cause unexpected behaviour or even security issues. Therefore, if the service name does not match, this module does not perform the adoption logic and returns PAM_IGNORE immediately.
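The way the action_* options and the service option interact can be summarised as a small decision function. This is an illustrative model built only from the option descriptions above, not the module's C implementation: the function name decide, its arguments, and the outcome labels "allow", "deny" and "ignore" are assumptions, and the adoptable set stands in for whether the real adoption step would succeed for a given job.

```python
DEFAULTS = {
    "service": "sshd",
    "action_no_jobs": "deny",
    "action_unknown": "newest",
    "action_adopt_failure": "allow",
    "action_generic_failure": "ignore",
}


def decide(service, jobs, source_job, adoptable, options=None, generic_failure=False):
    """Return (outcome, adopted_job) for one connection attempt.

    service:     PAM service name of the caller, e.g. "sshd"
    jobs:        the user's job IDs on this node, ordered oldest to newest
    source_job:  job ID located by the RPC, or None if it was not located
    adoptable:   set of job IDs for which adoption would succeed (stand-in)
    """
    opts = {**DEFAULTS, **(options or {})}
    if opts["service"] != "*" and service != opts["service"]:
        return ("ignore", None)  # wrong service: adoption logic is skipped
    if generic_failure:
        return (opts["action_generic_failure"], None)
    if not jobs:
        return (opts["action_no_jobs"], None)
    if len(jobs) == 1:
        target = jobs[0]
    elif source_job is not None and source_job in adoptable:
        return ("allow", source_job)
    elif opts["action_unknown"] == "newest":
        # Source job not located, or adoption into it failed.
        target = jobs[-1]
    else:
        return (opts["action_unknown"], None)
    if target in adoptable:
        return ("allow", target)
    return (opts["action_adopt_failure"], None)


# Two jobs, RPC found nothing, defaults in effect: adopted into the newest job.
print(decide("sshd", [10, 11], None, {10, 11}))
# One job whose adoption fails, with the recommended action_adopt_failure=deny.
print(decide("sshd", [10], None, set(), {"action_adopt_failure": "deny"}))
```

An outcome of "allow" with no adopted job corresponds to letting the connection through without adoption, which is why the defaults for action_adopt_failure and action_generic_failure carry the warnings noted above.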
Slurm Configuration
PrologFlags=contain must be set in slurm.conf. This sets up the "extern" step into which ssh-launched processes will be adopted. You must also enable the task/cgroup plugin in slurm.conf. See the Slurm cgroups guide.
Important
PrologFlags=contain must be in place before using this module. The module bases its checks on local steps that have already been launched. If the user has no steps on the node, such as the extern step, the module will assume that the user has no jobs allocated to the node. Depending on your configuration of the PAM module, you might accidentally deny all user ssh attempts without PrologFlags=contain.
The UsePAM option in slurm.conf is not related to pam_slurm_adopt.
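The Slurm-side requirements (PrologFlags=contain here, plus the task and proctrack plugins named under PAM Configuration) can be verified with a short script. The sketch below is a rough check under stated assumptions: check_slurm_conf is a hypothetical helper, it reads only the text it is given, it does not follow Include directives or account for built-in defaults, and it recognises only a literal contain flag.

```python
def check_slurm_conf(conf_text):
    """Return a list of problems found in the given slurm.conf text."""
    settings = {}
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Parameter names are compared case-insensitively in this sketch.
        settings[key.strip().lower()] = value.strip()

    def as_list(name):
        return [item.strip().lower() for item in settings.get(name, "").split(",")]

    problems = []
    if "contain" not in as_list("prologflags"):
        problems.append("PrologFlags does not include contain")
    if "task/cgroup" not in as_list("taskplugin"):
        problems.append("TaskPlugin does not include task/cgroup")
    proctrack = settings.get("proctracktype", "").lower()
    if proctrack not in ("proctrack/cgroup", "proctrack/cray_aries"):
        problems.append("ProctrackType is not proctrack/cgroup or proctrack/cray_aries")
    return problems


example = """
PrologFlags=Alloc,Contain
TaskPlugin=task/affinity,task/cgroup
ProctrackType=proctrack/cgroup  # required by pam_slurm_adopt
"""
print(check_slurm_conf(example))
```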
Other Configuration
Verify that UsePAM is set to yes in /etc/ssh/sshd_config (it should already be enabled by default).
Firewalls, IP Addresses, etc.
slurmd should be accessible on any IP address from which a user might launch ssh. The RPC to determine the source job must be able to reach the slurmd port on that particular IP address. If there is no slurmd on the source node, such as on a login node, it is better for the RPC to be rejected than silently dropped. A rejection lets the RPC initiator give up immediately instead of waiting for a timeout.
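The difference between a rejected and a silently dropped connection can be observed with a plain TCP probe. The following sketch assumes the default SlurmdPort of 6818 and uses a hypothetical helper name, probe_slurmd; it tests TCP reachability only and does not issue the actual Slurm RPC.

```python
import socket


def probe_slurmd(host, port=6818, timeout=3.0):
    """Classify a TCP connection attempt to the slurmd port.

    Returns "open" if something accepted the connection, "rejected" if the
    host actively refused it, "dropped" if nothing answered before the
    timeout, and "error" for any other failure (e.g. name resolution).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "rejected"  # fast failure: what a login node should ideally give
    except socket.timeout:
        return "dropped"   # packets silently discarded, e.g. by a firewall DROP rule
    except OSError:
        return "error"
```

Running the probe from a compute node against each address a user might ssh from shows whether the source-job lookup would succeed, fail quickly, or stall: "rejected" returns almost immediately, whereas "dropped" costs the full timeout on every ssh attempt.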
Limitations
Alternate authentication methods such as multi-factor authentication may break process adoption with pam_slurm_adopt.
SELinux may conflict with pam_slurm_adopt, so it might need to be disabled.
Last modified 29 April 2019