The objective of this homework is to step wise parallelize a Prime Sieve Program,
and record the performance of your programs on a Capital machine, and to
experimentally determine the gains you get with 1, 2, 3, 4, 5, 6, 7, and 8 threads.
You will submit sieve 1, an optimized sequential code and sieve 2 a parallelization of sieve 1 with their performance
analysis, sieve 3, a sequential, blocked code and sieve 4 a parallelization of sieve 3
with their analysis.
Download and untar PA2.tar. You will need to
update your makefile for each parallel program you are writing. Do not
set the number of threads in the program, set it at the Linux OS
level. Otherwise we cannot run your program for different numbers of threads.
See the Mandelbrot lab. Write your main report in pdf or in txt format.
- [Sieve 1] Your first task is to improve the sequential program provided, creating a program sieve1.c
where you apply the sequential optimizations discussed in the lecture:
Analyze the execution time of sieve1.c and compare it to the performancve of sieve.c.
Report performance for n = 100,000, n = 200,000, and 300,000. Analyse the order
of magnitude improvement of your code.
- marking using a stride rather than testing every element for divisibility,
- starting the marking off at the square of the current prime,
- marking off multiples of only those primes that are no more than sqrt(n), and
- marking and storing only odds.
- [Sieve 2] Your next program sieve2.c should be a parallelization of
sieve1. You should expfreiment with parallelizing one or more of the
loops, and report how the speedup varies
as a function of the number of threads (1, 2, 3, 4, 5, 6, 7, 8) on a Capital machine, for n=500,000,000,
n=1,000,000,000, and n=1,500,000,000. If you do not get speedup, explain the reasons, as best as you
can. Think about PA1. For which codes did you not get good speedup?
Why was that there? Is there a relationship?
- [Sieve 3] Now return to your sequential program Sieve 1, and
block the loop to improve locality as discussed in class. For this program,
you should again analyze the performance gains achieved by blocking for the
values of n given above, and a blockSize of 1,000,000.
- [Sieve 4] is a parallelization of Sieve3. Report the speedup
as a function of the number of threads. Ask yourself: which loop can be parallelized?
How to Submit:
- Submit sieve1.c, sieve2.c, sieve3.c and sieve4.c, your makefile, and your
analysis in a README file in a tar file: PA2.tar
How to tar files:
Good luck and have fun!