CS475 Lab 1: Mandelbrot: Measuring, Analyzing, Reporting Performance
This lab is intended to set you up for PA1 and this class in general. We
will show you how to:
- compile (make) an OMP code,
- run it,
- gather performance data (execution time),
- process the data to plot speedup,
- analyze and improve the performance,
- write a report,
- and submit it.
It will enable you to run your HPC experiments without interference from
external factors such as a busy machine or varying input sizes. It will
also introduce you to the hybrid multi-core processors in our lab
machines, explain CPUs, cores, and threads, and show how to control their
allocation to your execution.
We will use a simple, easily parallelizable program to compute the
Mandelbrot set.
1. Retrieve the source code and verify that you can compile and run it.
- Log (or ssh) into one of the fish machines. They are physically located in
CSB 325, but you should be able to ssh or vpn to them if you are not in the
building. The most current information on using our machines is available
here.
These machines have a 24-core Intel processor with eight performance cores
(P-cores) and 16 efficient cores (E-cores). The two kinds of core differ in
properties and capabilities: vectorization, hyperthreading, frequency, etc.
Other machines are similar but not identical, which is why we want you to
use specifically these machines.
- Download the provided.tar file and untar it.
- Study:
- the mandelbrot.c code: its single "#pragma omp parallel for" pragma, its
command line arguments, and what it prints.
- the makefile
- the script to gather performance data
- Compile (using the makefile) and run. The first argument tells the program
whether to run in "debug/verbose" mode or to produce performance data in csv
format:
mandSEQ 0 1000
mandOMP 0 1000
- Vary the number of threads used, for p = 1, 2, 3, 4, and so on through 16
(replace p below with the actual number):
csh: setenv OMP_NUM_THREADS p
bash: export OMP_NUM_THREADS=p
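The thread sweep above can also be driven from a short script. The following is a minimal sketch, not the provided data-gathering script: it assumes Python 3 is available, that the executable is at ./mandOMP, and that the arguments are the "0 1000" shown above. The helper names sweep_runs and run_sweep are hypothetical.

```python
import os
import subprocess


def sweep_runs(binary="./mandOMP", args=("0", "1000"), max_threads=16):
    """Return one (thread_count, argv) pair for each p = 1..max_threads."""
    return [(p, [binary, *args]) for p in range(1, max_threads + 1)]


def run_sweep(runs):
    """Run each command with OMP_NUM_THREADS set to p; return (p, stdout) pairs."""
    results = []
    for p, argv in runs:
        # Same effect as `export OMP_NUM_THREADS=p`, but only for this child process.
        env = dict(os.environ, OMP_NUM_THREADS=str(p))
        done = subprocess.run(argv, env=env, capture_output=True, text=True, check=True)
        results.append((p, done.stdout))
    return results
```

On a fish machine, calling run_sweep(sweep_runs()) from the directory that holds the compiled program would run it once per thread count and collect what it prints. Setting the variable per child process keeps your login shell's environment unchanged between runs.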
2. Collect Parallel Performance Data
- Run each case 7 times. We provided a simple script that you are
encouraged to use and modify as you see fit. It produces output in .csv
form so that you can then plot and visualize speedup in Excel or a similar
tool. Down the road, our testing scripts will also expect such
output.
- Record execution times in a file (e.g., "data/mandelLab1.csv").
- Analyze your data and prepare for discussion:
- Contrast the execution times for mandSEQ and mandOMP for
OMP_NUM_THREADS=1. What is the difference between mandSEQ and mandOMP for
OMP_NUM_THREADS=1? Write down an explanation of what you see and why you
think it is happening.
- Make observations about the collected data, including: How many decimal
digits are significant (see this blog)? How variable are the execution times?
- To process the data, we remove the extreme values (min and max) and average
the remaining five runs. You may want to check the variance of each
observation (make sure the deviation from the mean is below the 2% threshold).
You may also want to adjust the number of reps, and to edit the script so that
it helps compile the performance data and produce the information that you
will need to submit in your reports.
- In your data analysis software (Excel, Python, ...), build a table with a
row for each p containing
- the (average) time in seconds, and
- the speedup.
- Plot the speedup against the number of threads. You can use any plotting
tool, or plot by hand. What do you observe? Can you explain your
observations?
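The processing steps above (drop the min and max, average the rest, check the deviation, compute speedup) can be sketched in Python as follows. This is an illustration under stated assumptions, not the required method: the function names are hypothetical, the deviation check is applied to the runs that remain after trimming, the speedup baseline is taken to be the sequential time (you could instead use the one-thread OpenMP time), and the timings are made-up numbers rather than measurements.

```python
from statistics import mean


def trimmed_mean(times):
    """Drop one minimum and one maximum, then average the remaining runs."""
    if len(times) < 3:
        raise ValueError("need at least 3 runs to drop the min and the max")
    return mean(sorted(times)[1:-1])


def max_relative_deviation(times):
    """Largest |t - mean| / mean over the runs kept by trimmed_mean."""
    kept = sorted(times)[1:-1]
    avg = mean(kept)
    return max(abs(t - avg) for t in kept) / avg


def speedup_table(baseline_time, times_by_threads):
    """Return rows (p, average time, speedup), sorted by thread count p."""
    rows = []
    for p in sorted(times_by_threads):
        avg = trimmed_mean(times_by_threads[p])
        rows.append((p, avg, baseline_time / avg))
    return rows


# Made-up timings in seconds, for illustration only (not measured data).
example = {
    1: [4.02, 4.00, 3.98, 4.01, 3.99, 4.30, 3.90],
    2: [2.05, 2.00, 1.95, 2.02, 1.98, 2.40, 1.90],
}
for p, avg, s in speedup_table(4.0, example):
    # Flag any thread count whose kept runs stray 2% or more from their mean.
    flag = "ok" if max_relative_deviation(example[p]) < 0.02 else "check"
    print(f"{p},{avg:.3f},{s:.2f},{flag}")
```

Each printed line has the form p, average time, speedup, flag, which can be pasted into a spreadsheet or fed to a plotting tool. A "check" flag suggests rerunning that case or increasing the number of reps.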
3. Analyze and improve the parallelization (discuss with us)
You most likely did not get perfect, or even linear, speedup. The outer two
loops of Mandelbrot are very nice and regular (each pixel can be computed
independently of the others), but the innermost loop IS NOT! Study the
Mandelbrot image and consider the following points:
- What does the color (intensity) of a pixel value tell you about the number
of iterations it took to find that value?
- Look at the fractal image; where is most of the work done?
- What does that say about the work the threads have to do? Does each
thread do the same amount of work (hint: think about 2 vs 3 threads)?
- Find a better parallelization strategy (check out the schedule
clause), confirm with us, and include these results in your report.
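To explore the questions above with numbers, you can count how many inner-loop iterations each image row needs and add them up per thread. The sketch below is only an illustration and is not taken from mandelbrot.c: it assumes the region [-2, 1] x [-1.5, 1.5], a 60 x 60 grid, a cap of 200 iterations, and that rows are handed out in equal contiguous blocks, one block per thread. All function names are hypothetical.

```python
def escape_count(cx, cy, max_iter=200):
    """Iterations of z = z*z + c performed before |z| exceeds 2, capped at max_iter."""
    zx = zy = 0.0
    for n in range(max_iter):
        if zx * zx + zy * zy > 4.0:
            return n
        zx, zy = zx * zx - zy * zy + cx, 2.0 * zx * zy + cy
    return max_iter


def work_per_row(width=60, height=60, max_iter=200):
    """Total inner-loop iterations needed for each image row."""
    rows = []
    for j in range(height):
        cy = -1.5 + 3.0 * (j + 0.5) / height  # imaginary part at the row center
        rows.append(sum(
            escape_count(-2.0 + 3.0 * (i + 0.5) / width, cy, max_iter)
            for i in range(width)
        ))
    return rows


def work_per_thread(rows, num_threads):
    """Sum the row work over equal contiguous blocks of rows, one block per thread."""
    n = len(rows)
    bounds = [n * t // num_threads for t in range(num_threads + 1)]
    return [sum(rows[bounds[t]:bounds[t + 1]]) for t in range(num_threads)]


rows = work_per_row()
for p in (2, 3):
    print(p, work_per_thread(rows, p))
```

Compare the per-thread totals printed for 2 and 3 threads, then try other thread counts and relate what you see to your speedup plot.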
If you are stuck, please see the CSx75 team (TA+instructor). We want you to
successfully complete the lab.
4. Write and submit your report using the Checkin tab
Submit a PDF file using the Checkin system on the CS fish machines (we are not
doing submission in Canvas, since we need to run our grading scripts on the
CS department machines). We strongly recommend writing your report using
Overleaf or some similar package. Please make sure that you:
- do not plot the execution time as a function of the number of
threads (why?)
- do not include a table of your raw data in the main body
of the report, but you may place it in a clearly marked appendix (why?)
Notes
A quick note for Mac users: OpenMP
If you want to work on a Mac at home, you need a version of gcc compiled with
OpenMP support. You can go to hpc.sourceforge.net and download gcc 4.7, 4.8,
or 4.9, depending on your OS version. You will have to extract the archive
and update the PATH variable to include the new gcc.
You still have to report the results for a fish machine in the CS
department's CSB 325 lab!