# NSCI 580A4 fall 2017

Instructors
Tai Montgomery
Erin Nishimura

# SUPERCOMPUTING

## What is High Performance Computing?

Supercomputing, High Performance Computing (HPC), and computing clusters all refer to computation done on a set of interconnected processors that work together so closely that they can be viewed as a single computer. These clusters deliver far higher performance than a stand-alone computer and are typically used for research. They almost always run a Linux operating system. The 500 most powerful systems in the world are listed at www.top500.org. Today we'll be working on the Summit cluster, which is shared among universities and public institutes on the Front Range.

Image: a picture of a typical small- to mid-sized computing cluster.

## The Summit HPC Cluster

What is Summit? Summit is a High Performance Computing system that is a joint venture between Colorado State University (CSU) and the University of Colorado Boulder (CU). Summit is housed at CU and operated by CU IT staff.

Who pays for Summit? The project is a \$3.55 million venture. A \$2.7 million award from the National Science Foundation covers most of the cost, and the remainder is supported by CSU, CU Boulder, and other regional universities and institutions.

How much can I use Summit? In the first year, you are given an Initial Allocation of up to 50,000 Service Units (SUs). One SU equals one hour of use on one node. After either 50,000 SUs have been used or one year has passed, you can apply for a Project Allocation. If your lab uses Summit heavily, it can buy into Summit for higher allocations, longer job runs, and priority in the queue. For more information, see the Summit CSU Documentation.
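The allocation arithmetic can be sketched in a few lines of Python. This is only an illustration, assuming the definition given above (one SU is one hour of use on one node). The function names and the example jobs are hypothetical, and this is not an official Summit accounting tool.

```python
INITIAL_ALLOCATION_SU = 50_000  # Initial Allocation stated above


def service_units(nodes, hours):
    """SUs charged for one job, assuming 1 SU = 1 hour of use on 1 node."""
    if nodes < 0 or hours < 0:
        raise ValueError("nodes and hours must be non-negative")
    return nodes * hours


def remaining_allocation(jobs, allocation=INITIAL_ALLOCATION_SU):
    """Subtract the cost of each (nodes, hours) job from the allocation."""
    used = sum(service_units(nodes, hours) for nodes, hours in jobs)
    return allocation - used


# Hypothetical example: three jobs described as (nodes, hours)
jobs = [(1, 2.0), (4, 12.0), (2, 0.5)]
print(remaining_allocation(jobs))  # 50000 - (2 + 48 + 1) = 49949.0
```

Under this assumption, a job's cost grows with both the number of nodes and the run time, so a short test run on one node is a cheap way to check a script before scaling it up.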

What are the alternatives to Summit? Many departments and labs run smaller servers that they have paid for and maintain themselves. Alternatively, commercial cloud providers such as Amazon's cloud services sell computing power on demand.

## Parts of a Supercomputer

What did you look for when you bought your last computer? CPU performance? Memory? Hard drive space? Supercomputers have all these same features, but at a much larger scale. A supercomputer's power comes from the fact that it consists of many computers linked together in such a way that they can share jobs.

Here is a comparison of Erin's computer's specs versus the Summit High Performance Computer's specs:

| Feature | Erin's Computer | Summit |
|---|---|---|
| Number of “computers” | 1 | 488 nodes |
| Number of CPU cores | 4 cores | 12,632 cores |
| Memory | 16 GB | 70.8 TB (70,800 GB) |
| Storage | 512 GB | 1.2 petabytes (1,200,000 GB) |

A computing cluster is one large computer made up of many smaller computers. Each smaller computer within a cluster is called a node. The Summit system contains different types of nodes.

Login nodes.

• This is where you log in.
• Don't run big jobs here!
• Use them for minor editing, scripting, and testing.
• Submit jobs to the compute nodes from here.

Compile nodes.

• This is where you can compile code (install new software).

Compute nodes.

• The workhorse of the system.
• Where all big jobs run.
• To learn more about Summit's compute nodes, see Summit Specs.
• On Summit, choose from CPU, GPU, and BigMem nodes.
• Each compute node contains multiple processing units, called cores. The general CPU compute nodes (a.k.a. Haswell nodes) on Summit have 24 cores and can use up to 128 GB of memory.
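To see what these numbers mean in practice, the sketch below estimates how many independent tasks fit on one general compute node. The 24-core and 128 GB figures come from the list above. The function name and the one-core-per-task simplification are assumptions made for illustration, and the estimate ignores memory used by the operating system.

```python
import math

CORES_PER_NODE = 24       # general CPU (Haswell) node, from the text above
MEMORY_PER_NODE_GB = 128  # upper bound stated in the text above


def tasks_per_node(memory_per_task_gb):
    """Estimate how many one-core tasks fit on a node.

    The count is limited by whichever runs out first: cores or memory.
    """
    if memory_per_task_gb <= 0:
        raise ValueError("memory_per_task_gb must be positive")
    by_memory = math.floor(MEMORY_PER_NODE_GB / memory_per_task_gb)
    return min(CORES_PER_NODE, by_memory)


# Hypothetical per-task memory needs, in GB
for mem in (2, 8, 20):
    print(f"{mem} GB per task -> {tasks_per_node(mem)} tasks per node")
```

With small tasks the 24 cores are the limit. Once each task needs more than about 5.3 GB (128 GB divided by 24 cores), memory becomes the limit instead.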

## How it works

How do I send a job to the compute nodes? Sending a job to the compute nodes requires a workload manager (sometimes called a job scheduler). We will be learning to use the Slurm workload manager.
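A Slurm job is usually described in a small batch script whose `#SBATCH` lines tell the scheduler what resources the job needs. The sketch below builds such a script as text in Python. It is a minimal illustration rather than Summit's recommended template: the function name, job name, command, and default values are hypothetical, and the partition is left unset because the right value depends on the cluster's documentation.

```python
def make_sbatch_script(job_name, command, nodes=1, ntasks=1,
                       time_limit="00:10:00", partition=None):
    """Return the text of a Slurm batch script as a string."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --nodes={nodes}",               # number of nodes requested
        f"#SBATCH --ntasks={ntasks}",             # number of tasks (processes)
        f"#SBATCH --time={time_limit}",           # wall-clock limit, HH:MM:SS
        f"#SBATCH --output={job_name}.%j.out",    # %j expands to the job ID
    ]
    if partition:
        # Partition names are site-specific; check the cluster documentation.
        lines.append(f"#SBATCH --partition={partition}")
    lines += ["", command, ""]
    return "\n".join(lines)


# Hypothetical example: a ten-minute test job on one node
script = make_sbatch_script("hello", "echo 'Hello from a compute node'")
print(script)
```

The printed text would be saved to a file on a login node and handed to the scheduler with Slurm's `sbatch` command. Slurm then places the job in the queue and runs it on a compute node when resources are free.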

How do all these nodes (computers) communicate with one another? Because jobs can be shared across different processors (cores) or across different computers (nodes), the connection between these pieces of hardware matters much more than in a stand-alone computer. The Summit system uses Intel's Omni-Path interconnect. It's very fancy!

What are the benefits of using a compute cluster versus using my local computer?

• power, efficiency, throughput
• shared software
• collaboration
• reduces the need for specific features & software on your local computer

wiki/supercomputing.txt · Last modified: 2017/08/30 14:50 by erin