Name: au-slurm-package
Owner: Fred Hutchinson Cancer Research Center
Description: The various configs and tools we use on the GenomeDK cluster
Forked from: runefriborg/au-slurm-package
Created: 2015-12-10 17:09:25.0
Updated: 2015-12-10 17:09:26.0
Pushed: 2015-12-08 14:41:56.0
Size: 474
Language: Python
The various configs and tools we use on the GenomeDK cluster
folder         install location       note
-------------  ---------------------  ----
config/        /opt/slurm/etc/        Our configuration files (see further notes before installing)
scripts/       /opt/slurm/scripts/    This folder holds the various prolog/epilog scripts
replacements/  /opt/slurm/bin/        Replacements for most of our old tools
tools/         /opt/slurm/bin/        New tools to make things nicer for the user
support-bin/   various                A few supporting programs
init.d/        /etc/init.d/           Simple script for starting/stopping slurm
ganglia/       ...                    Ganglia gmetric script and web scripts
You need to pay attention to what you are installing on what machines here.
slurmdbd.conf, for instance, contains the user/pass for the database and should only be installed on the controller machine(s).
slurm.conf is the main config file and is needed on all machines. In theory you could get away with a smaller version on compute nodes and frontends, but the files are compared by hash by default, so just install identical configs everywhere.
Must be readable by all users.
This script is only needed on the controller, but it is not sensitive, so you can install it everywhere if that is easier. It does two things for our setup:
Must be readable by the user that slurmctld runs under.
This is the config for the accounting module. Since it has the password and user for the database it is important that it is not accessible to regular users.
Must be readable by the user that slurmdbd runs under.
We configure cgroups to constrain cores.
These are the standard prolog and epilog scripts that run before and after a job, with root permissions. The default for slurm is to run the epilog on all nodes involved in a job, at the end of the job, as expected. I found the behaviour for the prolog surprising though: it only runs on a node once the job starts something on that node. That means that with a script like this:
#!/bin/bash
#SBATCH -n 32
echo nothing
sleep 1000
srun hostname
The prolog will run immediately on one node; the other nodes will only run it when the srun starts, leaving 1000 seconds where the user can't ssh in, or can ssh in without the node having been set up.
In order to change this we have set PrologFlags=Alloc in slurm.conf. This ensures that the prolog is run on all machines as soon as they are allocated to a job.
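The relevant slurm.conf lines end up looking something like the fragment below. Only PrologFlags=Alloc is the setting discussed above; the Prolog/Epilog script names are illustrative (the scripts/ folder installs to /opt/slurm/scripts/):

```
PrologFlags=Alloc
Prolog=/opt/slurm/scripts/slurm-prolog
Epilog=/opt/slurm/scripts/slurm-epilog
```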
The scripts themselves are pretty simple. We create job-specific folders, make sure our audit service is running, and call bash-login-update to open for ssh connections from the user.
The epilog then closes for ssh connections from the user (disconnecting them and deleting all their /tmp data). Then it deletes the job-specific folders and runs a sanity check to make sure the node is still healthy.
Must be present on all compute-nodes.
The task prolog is run as the user, before the user's script; it sets a few environment variables for compatibility with the old Torque system.
Must be present on all compute-nodes.
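As a sketch of how that works: slurmd runs the task prolog as the job's user, and any line it prints of the form "export NAME=value" is injected into the job's environment before the user's script starts. The Torque variable mapping below is illustrative, not the repo's actual file:

```shell
#!/bin/bash
# Illustrative task prolog: print "export NAME=value" lines on stdout;
# slurmd adds them to the job environment before the user's script runs.
emit_torque_compat() {
    echo "export PBS_JOBID=${SLURM_JOB_ID:-unknown}"
    echo "export PBS_O_WORKDIR=${SLURM_SUBMIT_DIR:-$PWD}"
    echo "export PBS_NUM_PPN=${SLURM_CPUS_ON_NODE:-1}"
}
emit_torque_compat
```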
We don't want a node to take a job and then immediately fail. This could probably be avoided by putting a sanity check in the regular prolog script, but we couldn't get that to work, so we went for another solution.
When the controller has found a suitable set of nodes to run a job, it calls the controller-prolog. The controller-prolog script then connects to all the proposed nodes and has them run a sanity check (the slurm-remote-prolog). If any of the nodes fail, the proposed set of nodes is discarded and the job goes back in the queue.
The remote prolog must be present on all compute-nodes; the controller prolog only needs to be on the controller.
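The idea can be sketched as below. The node names and the way a node is reached are placeholders, not the repo's actual script; the key point is that a non-zero exit from the controller prolog makes slurmctld discard the proposed node set and requeue the job:

```shell
#!/bin/bash
# Sketch of the controller-prolog idea. $1 is the command used to reach a
# node ("ssh" in production, anything runnable for testing); the remaining
# arguments are the proposed nodes. Returns non-zero if any node fails.
check_nodes() {
    local remote_cmd=$1; shift
    local node
    for node in "$@"; do
        if ! "$remote_cmd" "$node"; then
            echo "sanity check failed on $node" >&2
            return 1
        fi
    done
    return 0
}
```

In production the loop body would be something along the lines of `ssh "$node" /opt/slurm/scripts/slurm-remote-prolog`.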
We have only needed one completely new tool so far. jobinfo collects the most useful fields from sacct (and sstat for running jobs) and presents them in a format that is easier to read and grep.
It takes a very wide format with multiple entries, like this:
JobID JobName Partition MaxVMSize MaxVMSizeNode MaxVMSizeTask ...
------------ ---------- ---------- ---------- -------------- -------------- ...
219304 94 express,n+ ...
219304.batch batch 314132K s01n36 0 ...
And converts it into something like this:
Name : 94
User : qianyuxx
Partition : express,normal
Nodes : s01n36
...
Max Mem used : 3.54M (s01n36)
Max Disk Write : 348.00M (s01n36)
Max Disk Read : 348.00M (s01n36)
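A minimal sketch of that transformation, assuming pipe-separated output such as `sacct -j <id> -P --format=...` (the field list and labels are illustrative, not jobinfo's actual code):

```shell
#!/bin/bash
# Turn "sacct -P" style output (header line, then pipe-separated values)
# into one "Label : value" line per field for the first data row.
format_jobinfo() {
    awk -F'|' '
        NR == 1 { for (i = 1; i <= NF; i++) name[i] = $i; next }
        NR == 2 { for (i = 1; i <= NF; i++) printf "%-14s: %s\n", name[i], $i }'
}
```

Usage would be along the lines of `sacct -j 219304 -P --format=JobName,User,Partition,NodeList | format_jobinfo`.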
The slurm_ld.conf file is put into /etc/ld.so.conf.d/ to make sure the binaries can find the libraries they need.
Enabling cgroups means that whenever a job is started it is allocated a set of cores. Every subprocess of the job is also bound to these constraints. This means that we can have a bad job pushing the load average of a machine to 100 with no discernible impact on the other jobs.
The install procedure is to install the cgroup.conf file next to the slurm.conf file on all compute nodes, and to install the slurm release_common script where your CgroupReleaseAgentDir variable points.
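An illustrative cgroup.conf for this kind of setup might look like the fragment below; the values are assumptions to check against your Slurm version, not a copy of the repo's file. Constraining cores also requires TaskPlugin=task/cgroup in slurm.conf:

```
CgroupAutomount=yes
CgroupReleaseAgentDir="/opt/slurm/scripts/cgroup"
ConstrainCores=yes
```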
Finally you should create some aliases of the script, like this:
cd /opt/slurm/scripts/cgroup/
for subsystem in blkio cpuacct cpuset freezer memory; do
ln -s release_common release_$subsystem
done
This is a slightly simplified/cleaned version of our install script. It probably has a few missing or broken things in it.
Very primitive script for starting and stopping slurm: no proper header, no status function, and it can probably wait forever when shutting down. It automatically finds out which services need to run on the machine (might be none if the machine is just used for submitting jobs).
There is no difference between starting and restarting; the slurm daemons figure out on their own whether they need to replace an old process.
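The service-detection idea can be sketched as follows. The parsing is deliberately crude and is not the repo's actual script; in particular, NodeName host ranges like s01n[01-36] would need real hostlist expansion:

```shell
#!/bin/bash
# Decide which slurm daemons a host should run by grepping slurm.conf.
# $1: path to slurm.conf, $2: short hostname. Prints zero or more of
# "slurmctld" / "slurmd"; prints nothing for pure submit hosts.
daemons_for_host() {
    local conf=$1 host=$2
    if grep -qiE "^ControlMachine=${host}\$" "$conf"; then
        echo slurmctld
    fi
    # crude substring match; a real script must expand NodeName ranges
    if grep -iE '^NodeName=' "$conf" | grep -q "$host"; then
        echo slurmd
    fi
    return 0
}
```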
It is very simple to configure if you already have a running Ganglia monitoring system.
Edit the constants in ganglia/gmetric/slurm-gmetric and start it from a host with access to the slurm executables.
Copy the files from ganglia/www to your ganglia web installation (3.6.0) and point your browser to http://ganglia-installation/slurm.php