We will walk through the use of the Summit Supercomputer in three steps:
To log into Summit, we will use the command line and the secure shell command, ssh.
ssh [-l <yourloginname>] <addressOfRemoteServer> #that's a lower case “L”
Password –> for the Summit system, you must type your password as: eID-password + “,” + DUOkey.
For example, if your eID password is “ramsarecool” and the DUO key is 123456, you would type: ramsarecool,123456
Exercise: Log into Summit:
ssh -l csu-eID@colostate.edu login.rc.colorado.edu # At the password prompt, enter eID-password,DUOkey
Exercise: Explore Summit:
$ whoami
$ hostname
$ pwd
$ ls
$ ls -alh
$ more README.mdwn
As a Summit user, you have certain directories already set aside for your use:
Exercise: Let's explore your Summit directories. Replace <yourusernamehere> with your username; if you have a colostate.edu address, this will be “eID@colostate.edu”.
$ cd /projects/<yourusernamehere>
$ ls
$ more README.mdwn
Exercise: We will be working today in the scratch space. Navigate to your scratch space:
$ cd /scratch/summit/<yourusernamehere>
$ ls
$ more README.mdwn
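Both locations can be checked in one pass. The sketch below is only an illustration, not part of the original exercise: it assumes that $USER holds your Summit username, and report_dir is a hypothetical helper name.

```shell
#!/bin/bash
# report_dir is a hypothetical helper: it reports whether a directory exists.
report_dir() {
    local dir="$1"
    if [ -d "$dir" ]; then
        echo "found:   $dir"
    else
        echo "missing: $dir"
    fi
}

# Assumption: your login name matches the <yourusernamehere> part of the paths.
me="${USER:-$(whoami)}"
for dir in "/projects/$me" "/scratch/summit/$me"; do
    report_dir "$dir"
done
```

On a machine other than Summit, both lines will normally read "missing".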
To use software on the summit system, we have two options:
Software that is available for shared use is already installed on the system. To use it, we need to simply load it as a 'module'. Let's list whether you have any modules loaded already:
Exercise: Explore the module function.
$ module list
$ module avail #List modules that are available for you to load given your currently loaded software.
$ module spider #List all the modules on the system.
I prefer to load modules on the compile node. Let's move there first:
$ ssh scompile
$ hostname
$ pwd
$ module list
Exercise: Let's load the R module:
$ module list
$ module spider R
$ module load R
$ module list
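After loading a module, it can be reassuring to confirm that the program is actually on your PATH. This is a minimal sketch, assuming the R module provides an executable named R; have_tool is a hypothetical helper name, not a module command.

```shell
#!/bin/bash
# have_tool succeeds if the named program can be found on the PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

if have_tool R; then
    echo "R is available at: $(command -v R)"
else
    echo "R is not on the PATH; try 'module load R' first"
fi
```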
We will submit jobs from the compile node. This seems to work best for me. To submit jobs on our local linux machines, we used to type out the command at the prompt, push return, and the job would start executing immediately. This is not how things work on the cluster.
On the cluster, we will put our jobs in a script and submit that script to a job scheduling program. The scheduling program will use our requests for the number of nodes and processors we want to use and assign us to compute node(s) and processors (aka cores) where the job will run. Depending on the type of hardware we want, we may need to be patient and wait until the hardware is available for use. While we are waiting for our job to start, we will be put into a 'queue'. Summit uses a fair use queue system in which your place in the queue is a function of (1) when you submitted the job, (2) what resources you have requested, and (3) your use of the system.
The job scheduling program we will be using is called Slurm. Slurm is already loaded on the compile nodes. (If you are on the login node, you need to load it as a module first.)
The Slurm software has a number of commands you can use:
$ sbatch <shell script> #submit a job
$ squeue #check all jobs in the queue
$ squeue -u $USER #check just my jobs
$ squeue -j <enterJobNumber> #check just this job
$ scancel <enterJobNumber> #cancel just this job
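Checking or cancelling a single job requires its job number. When a submission succeeds, sbatch prints a line of the form "Submitted batch job <number>", so the number can be picked out of that line. The sketch below assumes that message format; job_id_from is a hypothetical helper, and 123456 is a made-up job number used only for illustration.

```shell
#!/bin/bash
# job_id_from prints the last word of sbatch's confirmation message,
# which is the job number.
job_id_from() {
    echo "$1" | awk '{print $NF}'
}

# Made-up sample message. On the cluster you would capture the real one:
#   msg=$(sbatch printHelloWorld.sh)
msg="Submitted batch job 123456"
job_id=$(job_id_from "$msg")
echo "job number: $job_id"
```

With the number stored in a variable, squeue -j "$job_id" checks on just that job.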
1) Example: submitting a job using slurm. Let's make a job to run.
#!/bin/bash
# print out a friendly message
echo "Hello World!"
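If you would rather not open a text editor, one way to create this file from the prompt is a here-document. This is a sketch only; it assumes you are in a directory where you can write files, and it uses the file name printHelloWorld.sh from this example.

```shell
#!/bin/bash
# Write the three-line script to printHelloWorld.sh.
# The quoted 'EOF' keeps the shell from expanding anything inside.
cat > printHelloWorld.sh <<'EOF'
#!/bin/bash
# print out a friendly message
echo "Hello World!"
EOF

# Quick check outside the scheduler: run the script directly.
bash printHelloWorld.sh
```

Running a script directly like this is only sensible for something this small; real work belongs in the queue.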
$ sbatch printHelloWorld.sh
2) Example: submitting a slurm job with more options
We can add more options to our sbatch command:
$ sbatch --nodes=1 --ntasks=1 --partition=shas --qos=normal --time=4:00:00 --output=helloWorld_output.txt printHelloWorld.sh
That's getting pretty crazy pretty fast. Instead of doing this, we can put the options inside the printHelloWorld.sh script as #SBATCH lines. In this case, the printHelloWorld.sh script would look like this:
#!/bin/bash
#SBATCH --job-name=helloWorld
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --partition=shas
#SBATCH --qos=debug
#SBATCH --time=0:01:00
#SBATCH --output=hello_world_output_%j.txt

# print out a friendly message
echo "Hello World!"
What does all this stuff mean?
#SBATCH --job-name=helloWorld #We want to name the job "helloWorld"
#SBATCH --nodes=1 #We request to use 1 node (computer)
#SBATCH --ntasks=1 #We request to use 1 core
#SBATCH --partition=shas #We want to use a Haswell compute node
#SBATCH --qos=debug #We want to be in a debug queue
#SBATCH --time=0:01:00 #We expect this job to take at most a minute
#SBATCH --output=hello_world_output_%j.txt #We want to name the output file "hello_world_output_%j.txt", where %j is replaced by the job's number
The new code could be submitted as a job like this:
$ sbatch printHelloWorld.sh
Find the output now.
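By default, Slurm resolves the --output file name relative to the directory the job was submitted from, so that is the first place to look. Below is a small sketch for locating and printing the file, assuming the hello_world_output_%j.txt naming pattern used in this example; show_hello_outputs is a hypothetical helper name.

```shell
#!/bin/bash
# show_hello_outputs prints each file matching hello_world_output_*.txt
# in the current directory, or a short message if there are none yet.
show_hello_outputs() {
    local f found=0
    for f in hello_world_output_*.txt; do
        [ -e "$f" ] || continue
        found=1
        echo "== $f =="
        cat "$f"
    done
    if [ "$found" -eq 0 ]; then
        echo "no hello_world_output_*.txt files here yet"
    fi
}

show_hello_outputs
```

If the function reports no files, the job may still be waiting in the queue; squeue -u $USER will show whether it has started.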
startProject.sh that does the following:
startProject.sh shell script that does the following:
regular quality of service (qos)