Objectives

The objective of this assignment is to explore implementations of Polynomial Multiplication. Record the performance of your programs on a Fish machine (Anchovy to Wahoo in the machines list), and experimentally determine the gains you get with 1-8 threads.

Download and untar PA3.tar. You should not need to update the Makefile. Do not set the number of threads in the program, set it at the Linux OS level (using the export or setenv command). Otherwise we cannot run your program for different numbers of threads. Write your report in pdf format.

The following two sequential programs have been implemented for you.

  • [PolyMultGSQ] (Grade School Quadratic) This is the naive implementation of Polynomial Multiplication.
  • [PolyMultGold] This (sequential) implementation includes all of the optimizations up to PolyMultBLQ. It is only provided as a binary. Use it as a benchmark against which to compare your programs.
You will need to implement the following programs.
  • [PolyMultINQ] (Inverted Quadratic)
    • Invert the inner and outer loops of both computational loops from the PolyMultGSQ implementation.
    • Hint: once the loops have been inverted you can condense the computation into a single double for loop.
    • Explore loop parallelism. Report the optimal strategy and number of threads.
  • [PolyMultOPQ] (Outer Product Quadratic)
    • Rewrite PolyMultGSQ to use a square iteration space (as shown in lecture).
    • Explore loop parallelism. Report the optimal strategy and number of threads.
  • [PolyMultBLQ] (Blocked Quadratic)
    • Block both the inner and outer loops from PolyMultOPQ.
    • Use the same parameter (tune1) to set the block size for each loop.
    • Explore loop parallelism. Report the optimal strategy, number of threads, and block size.
  • [PolyMultDCQ] (Divide and Conquer Quadratic)
    • Rewrite polyMultOPQ to use the divide and conquer strategy presented in lecture.
    • Use tune2 to set the minimum leaf size.
    • Explore task parallelism and test the use of different non recursive algorithms in the leafs (e.g. PolyMult{INQ,OPQ,BLQ}). Determine and report which configuration gives you the best performance.

Report

  • Include the details from the explore steps in each of the programs above.
  • When measuring speedup and finding optimal tuning parameters use a degree of 65535 (2^16-1)
  • Justify the optimal design and configuration of each program with either experimental evidence or design principles from this class (for some programs you may need to use both).
  • For each parallelization, report speedup on 1-8 threads, using the speedup equation from the lecture/book.
  • As always, report the details of the machine where you are running your tests.

Submission

Submit a PA3.tar file through Checkin. Create this file by using the rule for PA3.tar in the provided Makefile. (This will ensure that your project has the correct directory structure.)

Testing

  • Your executables must be named PolyMult{INQ,OPQ,BLQ,DCQ}.{time,time-par,check,check-rand,check-par} (you should not need to edit the Makefile).
  • Do not change the format and the order of printing statements. Example:
    $ ./bin/PolyMultGSQ.time 65536
    Execution time for Current:   3.266627 sec.
    Execution time for Gold:  1.038377 sec.
    $ ./bin/PolyMultGSQ.time-par 65536
    Execution time for Current:   3.273981 sec.
    Execution time for Gold:  1.040758 sec.
    $ ./bin/PolyMultGSQ.check 65536
    The total number of errors is 0
    $ ./bin/PolyMultGSQ.check-rand 65536
    The total number of errors is 0
    $ ./bin/PolyMultGSQ.check-par 65536
    The total number of errors is 0
          

Good luck and have fun!