Objectives

The objective of this homework is to step wise parallelize a Prime Sieve Program, and record the performance of your programs on a Veggie machine, and to experimentally determine the gains you get with 1, 2, 3, 4, 5, 6, 7, and 8 threads.

You will submit sieve 1, an optimized sequential code and sieve 2 a parallelization of sieve 1 with their performance analysis first, and a week later you will submit sieve 3, a sequential, blocked code and sieve 4 a parallelization of sieve 3 with their analysis.

To start, Download and untar PA2.tar

  • [Sieve 1] Your first task is to improve the sequential program provided, creating a program sieve1.c where you apply the sequential optimizations discussed in the lecture:
    • marking using a stride rather than testing every element for divisibility,
    • starting the marking off at the square of the current prime,
    • marking off multiples of only those primes that are no more than sqrt(n), and
    • marking / storing only odds.
    Analyze the execution time of sieve1.c and compare it to the performancve of sieve.c. Report performance for n = 100,000, n = 200,000, and 300,000. Analyse the order of magnitude improvement of your code.
  • [Sieve 2] Your next program sieve2.c should be a parallelization of sieve1. You should simply parallelize the forall candidates, and report how the speedup varies as a function of the number of threads (1, 2, 3, 4, 5, 6, 7, 8) on a veggie machine, for n=500,000,000, n=1,000,000,000, and n=1,500,000,000. If you do not get speedup, explain the reasons, as best as you can. Think about PA1. For which codes did you not get good speedup? Why was that there? Is there a relationship?
  • Submit sieve1.c and sieve2.c and your analysis in a README file in PA2A.tar
  • [Sieve 3] Now return to your sequential program Sieve 1, and block the loop to improve locality as discussed in class. For this program, you should again analyze the performance gains achieved by blocking for the values of n given above, and a blockSize of 1,000,000.
  • [Sieve 4] is a parallelization of Sieve3. Report the speedup as a function of the number of threads. Ask yourself: which loop can be parallelized?
  • Submit sieve3.c and sieve4.c and your analysis in a README file in PA2B.tar
How to Submit:
  • Submit a tarball PA2A.tar of your sieve1.c sieve2.c, your makefile, and README file through ramCT.
  • One week later, submit a tarball PA2B.tar of your sieve3.c, sieve4.c, your makefile, and README file through ramCT.
How to tar files:
  • Enter the directory above the directory that contains the files you want to submit.
  • Use the command:
     tar cvf PA2x.tar [dir] 
Good luck and have fun!