CS 475/GRAD 510 Assignment 01 - This is not a test.

CS 475: Parallel Programming
GRAD 510: Fundamentals of HPC
Computer Science Department
Fall 2019

Objectives

The objective of this homework is to write three OpenMP programs, to debug and test them on a ski-resort machine (in Lab 225) such as wolf-creek, and to experimentally determine the gains you get when running them in parallel with up to 10 threads. The parallelizations are relatively simple, and the results should be interesting in terms of speedup. Measure and plot the performance of each parallelization as a function of the number of threads, and analyze your observations.

1. Stencil 1D
Parallelize the stencil_1D computation from the provided sequential code.
2. Stencil 2D
This is a 2D extension of the previous program; the data is updated using the values of eight neighboring row and column elements.
3. Matrix-vector Product
Parallelize the provided sequential program for the Matrix-vector product.

Provided Code

Download and untar this tarball. The input usage for each program is:

stencil_1D 20000 300000
stencil_2D 3500 3500
mat_vec 25000 10000

Submission Instructions:

Submit your source code and report in a single tarball named PA1.tar. The tarball should contain the following list of files named exactly as listed here: makefile, mat_vec.c, stencil_1D.c, stencil_2D.c and report.pdf.

You are responsible for editing the makefile to have the following commands:

make clean
make

Running make should produce both sequential and multi-threaded executables, e.g. mat_vec_SEQ and mat_vec.

During testing we will run make clean and then make, followed by a series of automated tests on your executables. Your executables must be named: stencil_1D_SEQ, stencil_1D, stencil_2D_SEQ, stencil_2D, mat_vec_SEQ and mat_vec.

We will be performing automated testing on your output. Do not change the output format from the existing format.

Here is an example:

                ~...PA1 36>stencil_1D_SEQ 4000 200000
                data[0]: 5000.000000 
                data[1]: 5000.000000 
                data[400]: 2639.187105 
                data[800]: 1031.572483 
                data[2000]: 15.758161 
                Data size : 4000  , #iterations : 200000 , time : 0.998575 sec

                ~...PA1 37>stencil_2D_SEQ 800 2000
                Data : 800 by 800 , Iterations : 2000 , Time : 1.686642 sec
                Final data
                10000.000000 10000.000000 10000.000000 10000.000000 10000.000000 
                10000.000000 10000.000000 10000.000000 10000.000000 10000.000000 
                10000.000000 10000.000000 9994.530989 9991.153834 9986.982415 
                10000.000000 10000.000000 9991.153834 9985.691261 9978.943961 
                10000.000000 10000.000000 9986.982415 9978.943961 9969.014964 



                ~...PA1 38>mat_vec_SEQ 25000 10000
                N=25000, M=10000
                c[0] = 49995000.000000
                c[3125] = 81245000.000000
                c[6250] = 112495000.000000
                c[9375] = 143745000.000000
                c[12500] = 174995000.000000
                c[15625] = 206245000.000000
                c[18750] = 237495000.000000
                c[21875] = 268745000.000000
                elapsed time = 0.223439

The checkin website will perform preliminary testing of your makefile and code. These tests do not indicate your final grade; they can, however, catch small mistakes in your submission. You can re-submit your file until you get 100% on the preliminary testing. Failing the preliminary testing at the due date (and time) will lead to a zero in your final grade. So, please make sure that your submission passes the preliminary tests every time you edit it to fix a bug or change a parallelization strategy.

In your report you will present your performance results. Here is a general outline of such a report:

Algorithm Description (showing you understand the algorithm).
Description of parallelization approach.
Experimental setup:
1. Describe the machine (name, number of cores, cache sizes).
2. List experiments planned (core count).
Experimental results.
1. Compare sequential and threaded times.
2. Include tables and graphs of execution times, speedup, and efficiency.
3. Make observations about speedup.
Conclusion: Can you see a trend in the speedup of the program?

Grading Logistics

Report: 10 points
For each of the three programs:
1. Passing preliminary make and make clean: 5 points
2. Functional correctness; parallel version producing the same result as sequential: 10 points
3. Legal parallelization. Correct placement of correct OMP pragmas. In this case, you are not expected to change the given sequential code: 8 points
4. Parallelization is legal and efficient, and gives a speedup in the expected range: 7 points