Job Submission

The default job scheduler on Gemini is SLURM.

SLURM has replaced Sun Grid Engine (SGE) as the job scheduling system, and as a result any previously developed workflows need to be modified to work with SLURM. Equivalent commands and instructions for the most common features are described below.

Jobs are submitted with sbatch, with the options given either on the command line or inside a batch script. For example:

Command Line

sbatch -n 4 my_batch_script.sh
sbatch -n 8 -N 1 --wrap="my_application <application arguments>"
sbatch -n 8 -N 1 --wrap="mpirun <mpirun arguments> my_application <application arguments>"

In the above examples, -n 4 requests four tasks for the script (one CPU per task by default), while -n 8 -N 1 requests eight tasks and restricts them to a single compute node. sbatch accepts only batch scripts, so a bare command is passed with --wrap, which wraps it in a simple shell script. In the latter two cases the application must already be parallelized (e.g. via OpenMP or MPI).
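For a threaded (OpenMP) application, the CPUs are commonly requested per task rather than as separate tasks. The following is a minimal sketch of that variant, not a site-specific recipe: the job name and the executable my_openmp_app are hypothetical, and it assumes the cluster permits --cpus-per-task.

```shell
#!/bin/bash
# Keep everything on one compute node
#SBATCH -N 1
# A single task that owns 8 CPUs for its threads
#SBATCH -n 1
#SBATCH --cpus-per-task=8
#SBATCH --job-name=omp_job

# Slurm sets SLURM_CPUS_PER_TASK when --cpus-per-task is given;
# fall back to 1 thread if the script is run outside the scheduler.
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
echo "Running with ${OMP_NUM_THREADS} OpenMP thread(s)"

# Hypothetical application name; replace with your own executable.
# ./my_openmp_app <application arguments>
```

Setting OMP_NUM_THREADS from the scheduler's variable keeps the thread count in step with the allocation if the directive is changed later.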

Alternatively, all of the necessary information can be placed inside the submission script. For example:

Submission Script called my_script.sh

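#!/bin/bash
#
# Scheduler directives: job name and number of tasks
#SBATCH --job-name=my_job
#SBATCH -n 4

# Make the module command available, then load the MPI environment
source /etc/profile.d/modules.sh
module load shared openmpi/gcc

# Launch one 'sleep 10' per task over the InfiniBand (openib) transport
mpirun --mca btl openib --report-bindings sleep 10

exit 0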

In the above example, the instructions for the job scheduler are given as #SBATCH directives at the top of the script. The modules needed for the job are then loaded, followed by the command for the specific application. In this case Open MPI runs over InfiniBand and each process executes the command 'sleep' for 10 seconds.

The job would then be dispatched to the scheduler using the command:

sbatch my_script.sh
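On success, sbatch prints a confirmation line of the form "Submitted batch job <id>". When a workflow needs that ID afterwards (for example to query or cancel the job), it can be captured in a variable. The sketch below is one way to do this; the helper name job_id_from_sbatch_output is hypothetical and the job ID 12345 is a made-up sample value.

```shell
#!/bin/bash
# Hypothetical helper: pull the job ID out of sbatch's confirmation line,
# which has the form "Submitted batch job <id>".
job_id_from_sbatch_output() {
    printf '%s\n' "$1" | awk '/^Submitted batch job/ { print $4 }'
}

# Example with a literal message, so the helper can be tried without a scheduler.
job_id=$(job_id_from_sbatch_output "Submitted batch job 12345")
echo "Job ID: ${job_id}"

# On the cluster the same helper would wrap the real submission:
#   job_id=$(job_id_from_sbatch_output "$(sbatch my_script.sh)")
#   squeue -j "${job_id}"
```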

Notes

  • By default, a job runs in the directory it was submitted from.
  • For MPI jobs you do not need to specify a hosts file. SLURM takes care of this for you.
  • Environment variables can be explicitly exported (see below), and SLURM sets a number of variables you can access from your scripts.
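The notes on the working directory and exported variables can be illustrated with a short script. This is a sketch under stated assumptions: INPUT_FILE is a hypothetical user variable, env_demo.sh is a hypothetical script name, and the fallback values exist only so the script can also be run outside the scheduler.

```shell
#!/bin/bash
#SBATCH --job-name=env_demo
#SBATCH -n 1
# Copy the submitting shell's environment into the job
#SBATCH --export=ALL

# SLURM_SUBMIT_DIR is set by Slurm; fall back to the current directory elsewhere.
submit_dir="${SLURM_SUBMIT_DIR:-$PWD}"
cd "${submit_dir}" || exit 1

# INPUT_FILE is a hypothetical variable, passed for example with
#   sbatch --export=ALL,INPUT_FILE=data.txt env_demo.sh
input_file="${INPUT_FILE:-input.dat}"
echo "Working in ${submit_dir}, reading ${input_file}"
```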

 

Common Commands

Command   SGE     Description
sbatch    qsub    Used to submit a job script for later execution. The script will typically contain one or more srun commands to launch parallel tasks.
squeue    qstat   Reports the state of jobs or job steps. It has a wide variety of filtering, sorting, and formatting options. By default, it reports the running jobs in priority order and then the pending jobs in priority order.
scancel   qdel    Used to cancel a pending or running job or job step. It can also be used to send an arbitrary signal to all processes associated with a running job or job step.
sinfo     qhost   Reports the state of partitions and nodes managed by Slurm. It has a wide variety of filtering, sorting, and formatting options.
sbcast    (none)  Used to transfer a file from local disk to local disk on the nodes allocated to a job. This can be used to effectively use diskless compute nodes or provide improved performance relative to a shared file system.
sview     qmon    Graphical user interface to get and update state information for jobs, partitions, and nodes managed by Slurm.
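When porting SGE scripts, the command mapping in the table can be kept at hand as a small lookup. The function below is only an illustrative sketch; the name slurm_equivalent is hypothetical and it covers just the commands listed above.

```shell
#!/bin/bash
# Hypothetical lookup: print the Slurm command that replaces an SGE command.
slurm_equivalent() {
    case "$1" in
        qsub)  echo "sbatch" ;;
        qstat) echo "squeue" ;;
        qdel)  echo "scancel" ;;
        qhost) echo "sinfo" ;;
        qmon)  echo "sview" ;;
        *)     echo "no Slurm mapping known for '$1'" >&2; return 1 ;;
    esac
}

# Print the full mapping from the table.
for sge_cmd in qsub qstat qdel qhost qmon; do
    echo "${sge_cmd} -> $(slurm_equivalent "${sge_cmd}")"
done
```

In day-to-day use the Slurm commands are typically called as squeue -u "$USER" to list your own jobs, scancel followed by a job ID to cancel one, and sinfo to see partition and node states.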

Job Specification

Job Specification     SLURM                                         SGE
Script Directive      #SBATCH                                       #$
Queue                 -p [queue]                                    -q [queue]
Node Count            -N [min[-max]]                                N/A
CPU Count             -n [count]                                    -pe [PE] [count]
Generic Resources     --gres=[resource_spec]                        -l [resource]=[value]
StdOut/StdErr         -o [file_name]/-e [file_name]                 -o [file_name]/-e [file_name]
Copy Environment      --export=[ALL | NONE | variables]             -V
Email Address         --mail-user=[address]                         -M [address]
Job Name              --job-name=[name]                             -N [name]
Working Directory     --chdir=[dir_name]                            -wd [directory]
Tasks Per Node        --ntasks-per-node=[count]                     Fixed Allocation
CPUs Per Task         --cpus-per-task=[count]                       N/A
Job Dependency        --dependency=[state:job_id]                   -hold_jid [job_id | job_name]
Job Arrays            --array=[array_spec]                          -t [array_spec]
Job Host Preference   --nodelist=[nodes] AND/OR --exclude=[nodes]   -q [queue]@[node] OR -q [queue]@@[hostgroup]

Note: --chdir was named --workdir in older Slurm releases.
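Several of the options above can be combined in one script header. The following sketch assumes a hypothetical partition named defq and a placeholder email address; --ntasks-per-node is the full spelling of the tasks-per-node option, and --mail-type (not listed in the table) selects which events trigger an email.

```shell
#!/bin/bash
#SBATCH --job-name=mpi_demo
# Hypothetical partition name; use one reported by sinfo
#SBATCH -p defq
# Two nodes with four tasks on each
#SBATCH -N 2
#SBATCH --ntasks-per-node=4
# %j in a file name expands to the job ID
#SBATCH -o mpi_demo.%j.out
#SBATCH -e mpi_demo.%j.err
# Placeholder address; send mail when the job ends or fails
#SBATCH --mail-user=user@example.com
#SBATCH --mail-type=END,FAIL

# Inside a job Slurm provides these values; the defaults mirror the directives
# above so the arithmetic can also be checked outside the scheduler.
nodes="${SLURM_JOB_NUM_NODES:-2}"
tasks_per_node="${SLURM_NTASKS_PER_NODE:-4}"
total_tasks=$(( nodes * tasks_per_node ))
echo "Expecting ${total_tasks} MPI ranks on ${nodes} node(s)"
```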

Environment Variables

Environment        SLURM                  SGE
Job ID             $SLURM_JOBID           $JOB_ID
Submit Directory   $SLURM_SUBMIT_DIR      $SGE_O_WORKDIR
Submit Host        $SLURM_SUBMIT_HOST     $SGE_O_HOST
Node List          $SLURM_JOB_NODELIST    $PE_HOSTFILE
Job Array Index    $SLURM_ARRAY_TASK_ID   $SGE_TASK_ID
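As an illustration of how these variables might be used, the sketch below selects a per-task input file in a job array. The file naming scheme (input_1.dat, input_2.dat, ...) and the job name are hypothetical, and the fallback values only take effect when the script is run outside the scheduler.

```shell
#!/bin/bash
#SBATCH --job-name=array_demo
#SBATCH -n 1
#SBATCH --array=1-3

# Outside an array job SLURM_ARRAY_TASK_ID is unset, so default to index 1.
task_id="${SLURM_ARRAY_TASK_ID:-1}"

# Hypothetical naming scheme: input_1.dat, input_2.dat, ...
input_file="input_${task_id}.dat"

echo "Job ${SLURM_JOBID:-unknown} on ${SLURM_JOB_NODELIST:-localhost}: task ${task_id} reads ${input_file}"
```

Each array task runs the same script with a different SLURM_ARRAY_TASK_ID, which plays the role of $SGE_TASK_ID in an SGE array job.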