CS475 Fall 2009 - Homework 2
Parallel Program for Sieve of Eratosthenes
Objectives
The objective of this homework is to start with a straightforward
parallel implementation and progressively tune it and measure the
performance gains as you go along. You will consider three specific
optimizations, viz., (i) avoiding communication (almost) completely by
duplicating the work of determining of the primes up to sqrt(n) on each
processor; (ii) then improving the sequential part through tiling; and
(iii) allocating memory for only the odd numbers. These programs are
cumulative, each version building on the other. This is a long homework,
so please get started early.
Problem Statement
For this homework, you will need to write analyze and report on three
programs (all derived from a single, common starting point that is
provided here). This program uses the memory
allocation/data distribution strategy discussed in class (and also
allocates two extra chars of memory to make the indexing logic simpler
(the index into the array IS the integer that the location
represents: marked[i] represents the status of i.
- Your first program should be (nearly) communication-free. Each
processor should independently first compute all the primes up to
sqrt(n), and then use these primes to mark off their share of the
array of n integers. It will need to allocate a local auxiliary
array to store these primes.
- Your second program should include the previous optimization and also
reorder the loops on each processor for improved locality. You should
experiment with the block size, b, and report how/why you chose the
one you did.
- Your third program should further optimize this by allocating memory
for only odd numbers. Hypothesize about the improvement you can get from
the modification, and your experimental data should study if/how the
hypothesis is validated.
You may use MPI on the linux machines for the initial design,
development and debugging of the programs. Your performance measurements
for the comparisons should be done on bassi at NERSC.
Your code should conform to the following:
- You should implement each of the three improvements as stand-alone
MPI programs.
- Your programs should take n as a command line argument.
(Recall: your program is going to find and count all primes less than or
equal to n.)
Your programs should run with only one input given.
Feel free to add optional arguments if you would like,
but no other input should be required.
- Your programs should have two modes of compilation:
- DEBUG mode -- in which it should print some outputs (as explained
below)
- PERF mode -- in which it will not print any outputs.
The motivation for these two compilation modes is that when you measure
the performance of your code, you should not include the time for
input/output. Note that you should use a compilation flag (a.k.a. compile
time macro) and not a C-style if statement, because such an
if statement will affect the performance measurements. Here is
an example code fragment that can be used to mark-off sections that needs
to included only in a DEBUG mode:
#ifdef DEBUG printf("%d primes are less than or equal to %d\n", count, n); #endif
The above printf statement will be included in your compilation only when
you define DEBUG in your compilation command. That is, only
when you give mpicc -DDEBUG -o myprog myprog.c
In the absence
of this flag ( -DDEBUG ) your compiler will not include the
printf statement in the final executable.
Here is an introduction to C preprocessor directives. Check Section
7.3.7 for conditional compilation.
- You should provide a Makefile that can compile the three different
programs (three improvements from the provided code) that you are
going to turn in. Your Makefile should create executables for the DEBUG
versions of the programs, with the executables named sieve1, sieve2, sieve3
(first, second, and third implementation respectively).
- All implementations should report the running time in the same format
as the provided program.
- A recommended value of n for getting timing data on Bassi is n=2000000000
- Please provide README and job scripts you used as announced earlier.
Extra Credit
- Implement the suggestion brought up in class to begin with a few primes
already known. That is, initialize an array with the first few primes
(e.g. 2,3,5) already known and allow all processors but the first to begin
marking off their arrays while processor 0 finds the first sqrt(n) primes.
Experiment with when to broadcast the 'found' primes to the other
processors and see if you can get any gains from this method.
What/When/How to Submit:
- 11:59 PM Thu, Sept 24. MPI Programs to be submitted via checkin as
HW2. A detailed description of MPI program output will be posted here. Please read carefully and make sure
that your programs match the description.
- 2:00 PM Tue, Sept 29. Lab report (hard copy) to be handed in in
class (Distance students to submit via checkin as HW2_REPORT). If your
program has been modified between submission of program and lab report,
please turn in the new version also by checkin as HW2_PART2, and describe
clearly in your lab report, what has changed. Please note that Bassi
will be down on Tue, Sept 29. So, make sure to finish with your data
collection on Monday.
Good luck and have fun!