The default job scheduler on Gemini is SLURM.
SLURM has replaced Sun Grid Engine as the job scheduling system, and as a result any workflows developed previously will need to be modified to work with SLURM. Equivalent commands and instructions for using the most common features are described below.
Jobs are submitted from the command line or via a batch script. For example:
sbatch -n 4 my_batch_script.sh
sbatch -n 8 -N 1 --wrap="my_application <application arguments>"
sbatch -n 8 -N 1 --wrap="mpirun <mpirun arguments> my_application <application arguments>"
In the above examples, -n 4 requests four tasks (one CPU core each by default) for the batch script, while -n 8 -N 1 requests eight tasks restricted to a single compute node. The --wrap option wraps a single command in a minimal shell script so that it can be submitted without writing a batch script by hand. In the latter cases the application must already be parallelized (e.g. via OpenMP or MPI).
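The short options also have long-form equivalents: -n is --ntasks and -N is --nodes. Written out in full, the second example above becomes:
sbatch --ntasks=8 --nodes=1 --wrap="my_application <application arguments>"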
Alternatively, all of the necessary information can be placed inside the submission script itself. For example:
#!/bin/bash
#
#SBATCH --job-name=my_job
#SBATCH -n 4
# Make the modules system available and load the MPI environment
source /etc/profile.d/modules.sh
module load shared openmpi/gcc
# Launch one 'sleep 10' process per allocated task over InfiniBand
mpirun --mca btl openib,self --report-bindings sleep 10
exit 0
In the above example, the options for the job scheduler are given as #SBATCH directives at the top of the script. The modules needed for the job are then loaded, and finally the command for the specific application is given. In this case we are using OpenMPI over InfiniBand, and each process will execute the command 'sleep' for 10 seconds.
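By default the job's standard output and error are written to a file named slurm-<jobid>.out in the directory the job was submitted from. If you prefer your own file names, --output and --error directives can be added to the script; the names below (%j is replaced by the job ID) are only an illustration:
#SBATCH --output=my_job.%j.out
#SBATCH --error=my_job.%j.err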
The job is then submitted to the scheduler using the command:
sbatch my_script.sh
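Options given on the sbatch command line take precedence over the #SBATCH directives inside the script, so the same script can be reused with different resource requests. For example, to run the script above with eight tasks instead of four:
sbatch -n 8 my_script.sh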
Notes
- By default, jobs run in the directory from which they were submitted.
- For MPI jobs you do not need to specify a host file; SLURM takes care of this for you.
- Environment variables can be explicitly exported (see below), and SLURM sets a number of variables that your scripts can read; a short example follows these notes.
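As an illustration of the variables SLURM sets, a job script can read values such as SLURM_JOB_ID, SLURM_NTASKS and SLURM_SUBMIT_DIR directly. The script below is only a minimal sketch:
#!/bin/bash
#SBATCH -n 4
# Print a few of the variables SLURM sets for every job
echo "Job ID:           $SLURM_JOB_ID"
echo "Number of tasks:  $SLURM_NTASKS"
echo "Submit directory: $SLURM_SUBMIT_DIR"
echo "Node list:        $SLURM_JOB_NODELIST"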