Assignment 3 - OpenMP Knapsack

CS 475: Parallel Programming
Computer Science Department
Fall 2016
Assignment 3 - OpenMP Knapsack

Report Questions

The report for this assignment is more detailed than previous reports and will account for 20% of your assignment grade. In addition to the standard report, we are asking for answers to two additional questions. One is catalytic (think of this as a homework question) and the other is empirical: run specific kinds experiments, collect, analyze interpret the data and report. Data collection will take time, so DO NOT PROCRASTINATE!

The state capital machines will be reserved for exclusive use by students in this class from midnight to 6:00 am each night from Oct 14 through Oct 18 inclusive). You will find it useful to write/develop shell scripts to run your programs during these times so that you have data that can be analyzed and reported later.

In addition to the usual performance comparison of the different versions of the code (see below), your report (maximum of 4 pages) must also answer the following questions.

Question I: Analytical Derivation of Sequential Fraction, σ

This question concerns the analysis of the coarse grain parallelization. We want you to derive a lower bound on the sequential fraction of the coarse grain parallelization. Of the three options for coarse grain parallelization, we want you you to derive the formula for Option 1, where you only parallelize the calls to solve_kp. Extension to the other two cases will earn you extra credit. Here are the suggested steps to follow.

The program follows a simple binary tree divide and conquer execution flow, just like the merge sort.
Since nothing but the two recursive calls are being parallelized, the worst case execution path is from the root to one of the leaves of this binary tree.
However, a certain amount of (sequential) work must be performed before the recursive calls can be made. Figure out the lower (and also upper?) bound on this work.
Use that to develop a formula for the bounds on the the work along the worst case execution path [Hint: think geometric series].
Write up your solution. Extra credit for various extensions.

Question II: Empirical comparison of the sequential programs

This question concerns empirical comparison of the two sequential algorithms (knap1.c and knap.c). The main question we want you to answer is, "How does the execution time of the memory efficient program, that does twice the amount of work, compare in practice with the full-table code?" You will need to plan and run experiments to gather execution times, and carefully analyze the data that you get. After this, you should write a report to "tell a story" of what you have discovered/observed. The following is a list of suggestions and questions to think about. You are free to develop your own, since this question is open ended.

You may use this Python script to generate input files for a range of problem sizes. Usage:
```
 python knapsack_gen N C output_file_name 
```
You can then run your programs and gather data for the execution times for a range of values of N and C.
What is/are the largest problem sizes (for knap1.c and knap.c) that each program is able to handle, based on the configuration of capital machines?
Can you confirm the asymptotic complexity of the two algorithms? Can you demonstrate this visually in a clean and simple [Hint: a straight line conveys the most information to your reader, so a plot of actual execution time vs. predicted execution time would be most informative] plot?
By keeping the value of N*C fixed to a "reasonably large" value, say 4G(4,000,000,000), you can study how sensitive the execution time (of both programs) is to the aspect ratio of the "table." For example, you could choose 6-8 independent values of N and C.
Example: N=40 and C=100,000,000 or N=40,000 and C=100,000. Can you demonstrate your conclusions easily and visually by s simple, clean plot [e.g., execution time as a function of (?)]
Extra credit for other interesting experiments/observations/analyses.

Report

The rest of your report should follow the standard rules. Aa a minimum, you should report the performance on the k100x100M input data file for each of the other parallel programs you develop and submit. Record times for the part of your code that computes the main recursive call, and report speedups for 1 through 8 threads. Specifically,

knap2: Measure the execution time as a function of the number of processors, and observe the speedup with respect to the knap1 sequential version. Do this for a range of values of depth for all three variations of coarse grain parallelism. Show your speedup plots (3 plots, one for each parallel implementation), and report the value that provides the best performance for each type of variation.
knap3: Measure the execution time as a function of the number of processors. Show your speedup plot with respect to the sequential version knap1.
[Extra-credit] knap4: Measure the execution time as a function of the number of processors, and observe the speedup. Do this for a range of values of depth. Show your speedup plots, and report the value that provides the best performance.

Based on the above questions, was the code memory-bound or compute bound? Why? Submit the report (a file named "PA3.R.pdf") via Checkin.