Objectives (Deadine extended)
The objective of this assignment is to write a sequence of sequential
and parallel programs to compute the prefix-op or
the scan of a sequence of inputs. You will write the
prorams measure their performance on
the ski-resort
machines in the CSB-225 lab, and determine the gains you get with up
to 16 threads, and validate analytical predictions.
Download and untar PA2.tar.
(updated Thursday Sept 19 at 8:00 pm, in calse you want the earlier
version, it's here). You will need to
update your makefile for each parallel program you write. Do not set
the number of threads in the program, set it at the Linux OS level
(using the export or setenv command). Otherwise we cannot run your
program for different numbers of threads. Write your report in pdf
format, preferably using LaTeX. The provided code can generate many
different executables, such as MMScanSEQ, and MMScanSEQ.verb from the
sigle source file MMScan-wrapper.c using conditional complation--the
Makefile has rules that compile it with different flags, and the file
has code wrapped in #ifdef-#endif macros. We expect you to adopt a
similar strategy for this assignment.
You will write the following programs (the name in bold is the name of
the executable that should be produced).
- [MMScanDNC] Your first task is to write a (sequential)
divide and conquer function (its signature and a dummy
implementation has been pfovided for you) that exposes all available
parallelism in the scan by exploiting associativity. This program
is sequential, and should exhibit
complexity.
- [MMScanDNCP1] This is the first parallelization of
MMScanDNC. It should use only task parallelism, and should include
a strategy to throttle parallelism at a
predetermined number of tasks, rather than for a
predetermined size of task. Your code should take the
value of p as an additional command-line parameter,
and you should experiment and report for the optimal values of this
parameter
- [MMScanDNCP2] This is the second parallelization of
MMScanDNC, and should only parallelize the for loop that updates the
second half of the array after the recursive children have returned.
It should also include a strategy to throttle parallelism
appropriate for for parallelization, and you should tune
the schedule and any other clauses you deem necessary to get the
best performance. Your submitted code should have these parameters
set to their optimal values.
- [MMScanDNCP3] Now experiment with the hybrid of both
parallelization, experiment with the one that gives the best
performance and submit that (with the tunable parameters set to
their optimal values in the code). As before, the total work should
remain
.
- [MMScanEFF] This is the efficient (sequential) program
that has
complexity, where the DnC recursion stops at the
level with p nodes. Rather, the algorithm executes a
simple, sequential
loop to compute the (partial) answer.
- [MMScanEFFPAR] This is your best parallelization of the
above MMScanEFF. Make sure that any tunable parameters are set to
their optimal values when you submit the code, and the work remains
Again, the value of p should be an
additional commandline parameter.
- [MMScanOPT] This is the optimal algorithm wthat only does
a constant factor extra work (explained in class on Thursday, Sept
19). Make sure that work now becomes
. Please tune the program to provide the best
performance.
- [MMScanOPTEC] The extra credit version,
with p+1 chunks on p threads. This
should provide linear speeedup, beating the sequential version on
any number of threads.
Grading Rubric
- Report: 10 points
- Functional correctness of DNC, (and its three parallel versions).
The testing of the parallel versions will first make sure that the
results produced are correct, and second, verufy that the
parallelization is as per the specifications. Points: 25+5+5+5=40.
- EFF (and its parallelization). Correctness points: 15.
- OPT (only parallel version). Correctness points: 21.
- Performance. We will test (even for the sequential ones) that
the performance is close to our solutions, and the parallelization
yields the expected speedups. Performance points: 2*7=14.
Submission
Submit a PA2.tar file through Checkin. More details
coming soon.
Good luck and have fun!