OpenMP parallelization of Memory Efficient 0/1 Knapsack problem using Dynamic Programming

Version 1.0

The objective of this assignment is for you to implement the Divide and Conquer Dynamic Programming algorithm for the 0/1 knapsack problem and parallelize it in OpenMP. The assignment has three parts. One of them is the sequential algorithm and the other two are different parallel implementations.

Work in an incremental fashion for debugging purposes. We will again be using the state capital machines. You must use these machines to measure and report performance. When studying performance, run your codes on a quiet capital machine.

  • HW3A: knapAlgoSeq [35 points]
    Start from your version of Memory-efficient DP knapsack algorithm that you submitted for HW0B. Extend your code to implement the C* algorithm explained in class.

    We suggest the following steps:

    • Create a separate function (fill_table) to compute the last row of the given inputs from a start to an end point, similar to the memory-efficient DPKP algorithm of HW0B.
    • Write a recursive function (solve_kp). This function will take as inputs, the starting and ending points and a capacity, C. It will divide the set of objects into two nearly equal parts, and call fill_table for each part. This produces two rows of profits, A1 and A2. It will then calculate C* based on these two rows. Do not forget to free the memory (if any) allocated for the calculation of rows.
    • Make two recursive calls to your solve_kp function (one for each half of the objects) with respective capacities based on the calculated value of C*.

    For testing (and debugging for yourselves) your code should provide 3 verbose options. Use the code provided in verbose to print out the required details based on verbose option (the code for option 0 goes in main, and that for the other two goes in the recursive solve_kp function). Wherever you insert the code, do not make any changes to the print statements. We will use the exactly same output format for testing your submission. Here is a sample output file output.

    You will have a recursive algorithm that will compute a series of values of C*. Print these values of C* and maximum profit possible at C*. Test your program with the same input files that were used for HW0.

    NOTE: Since the solution to the knapsack problem may not be unique (multiple combinations of the object may have the same profit) you need to be careful about how to resolve ties so that the answer you produce is the same as what we generate. Here are the rules to follow.

    • When partitioning the table into two nearly equal halves, when N is odd, divide the objects such that the first half is smaller, e.g., if N = 5, the first half should have 2 objects and second half should have 3.
    • Also, when you compute the value of C* as argmin(MAX(A1[i]+A2[C-i])), make sure that all ties are resolved towards the smaller index, e.g., if A1[2]+A2[C-2] = 10 (and it happens to be the max) and A1[5]+A2[C-5] = 10, then C* should not be 5, but rather, 2 (provided no value of i smaller than 2 also gives 10).

  • HW3B: knapAlgoCoarse (Coarse-grain implementation) [35 points]
  • Parallelize the implementation of HW3A using OpenMP's coarse grain parallelization (using the #pragma omp parallel section directive) where large chunks of work will be done in parallel, but the chunks themselves are sequential, e.g., parallelize the complete calls to solve_kp. Do not parallelize the for loops. An additional option to think about is that in each call itself, there are two sub-tables to fill up, so they too can be executed as two separate parallel chunks. Therefore, we have three variations possible as follows:

    1. Parallelize only the recursive calls to solve_kp.
    2. Parallelize only the calls to fill_table.
    3. Parallelize both recursive calls and subtables filling.

    Do not expect to see much improvement in performance as yet. Observe that these recursive calls build a tree. Introduce a new command line argument called depth. Modify your code such that at a recursion level that exceeds depth the code executes sequentially. In other words, allow parallelism only until certain depth in the tree. For the performance results, we will ask you to experiment with this parameter.

    Test your code and observe the performance on the standard test cases k120x4M.txt and k240x2M.txt(same as HW0). Observe how performance changes with depth. For your final report, you will be asked to study the performance on specific input data.

  • HW3C: knapAlgoFine (Fine-grain implementation) [35 points]
  • Parallelize the implementation of HW3A using OpenMP's parallelization directives (used in HW0A) where for loop is executed in parallel. For this part, parallelize the fill_table function only. Do not use any coarse grain parallelization as you did in HW3B.

    Test your code and observe the performance on the standard test cases k120x4M.txt and k240x2M.txt (same as HW0). For your final report, you will be asked to study the performance on specific input data.

  • HW3D: knapAlgoHybrid (Combination of Coarse-grain and Fine-grain implementation) [Extra-credit]
  • Combine the parallelizations of HW3B and HW3C using OpenMP's parallelization directives to reflect both coarse-grain and fine-grain parallelism. For this part, parallelize both the fill_table and solve_kp functions.

    As before, test your code and observe the performance on the standard test cases k120x4M.txt and k240x2M.txt (same as HW0). Observe how performance changes with depth with the hybrid version. For your final report, you will be asked to study the performance on specific input data (to be posted later).

  • HW3E: Report [30 points]
  • Your report should present the performance, i.e., speedup as a function of the number of processors, of the different versions of the code. In each case, you must report speedup with respect to the fastest sequential code, (the version you wrote for HW3A). In your report, we are going to ask you for specific performance experiments. All experiments are to be done on the capital machines in the CS department.

    More details in report questions.

    Submit, using the Checkin tab, your sequential and parallel codes, your time and speedup statistics, and observations as well as your complexity analysis in a report (maximum 4 pages). Your submission must contain a Makefile in addition to other files. The Makefile should generate executables named knapAlgoSeq, knapAlgoCoarse, knapAlgoFine and knapAlgoHybrid for HW3A, HW3B, HW3C and HW3D respectively. Submit HW3 files in HW3.tar (NOTE that there is separate submission for CODE and REPORT)..