### Objective

The objective of this assignment is for you to implement the Divide and Conquer Dynamic Programming algorithm for the 0/1 knapsack problem and parallelize it in OpenMP. The assignment has three parts. One of them is the sequential algorithm and the following two are different parallel implementations.

Work in an incremental fashion for debugging purposes. We will again be using the state capital machines. You must use these machines to measure and report performance. When studying performance, run your codes on a quiet capital machine.

• knap1.c, knap1(name of the executable)
Start from the Full-table knapsack version (knap.c) provided in PA3.tar.

The names of input files are of the form knnxcc.txt, where nn represents N, the number of objects and cc represents the approximate total capacity. e.g. k120x4M.txt corresponds to input file for 120 objects and total capacity of ~4Million.

The file format is:

```   // first line: number of objects, knapsack capacity,
// next lines: weight and profit of each object (1 object per line)
```

For a simpler case use k10x25.txt (provided in the PA3.tar) to debug your code. It has a verbose option as the 3rd argument as seen below. knapSeq k10x25.txt should produce:

```cs475@poudre>./knapSeq sample/k10x25.txt
The number of objects is 10, and the capacity is 25.
The optimal profit is 874
Time taken : 0.000003.

cs475@poudre>./knapSeq sample/k10x25.txt 1
prints the solution vector

cs475@poudre>./knapSeq k10x25.txt 2
prints the table.
```

Modify the given code to implement the Memory Efficient Divide and Conquer algorithm explained in class.

We suggest the following steps:

• Create a separate function (get_last_row) to compute the last row of the given inputs from a start to an end point.
• Write a recursive function (solve_kp). This function will take as inputs, the starting and ending points and a capacity, C. It will divide the set of objects into two nearly equal parts, and call get_last_row for each part. This produces two rows of profits, A1 and A2. It will then calculate C* based on these two rows. Do not forget to free the memory (if any) allocated for the calculation of rows.
• Make two recursive calls to your solve_kp function (one for each half of the objects) with respective capacities based on the calculated value of C*.
• Add the base case to the recursive function to update the solution vector.
• Make the weights, profits and solution array as function parameter instead of a global array.

You will have a recursive algorithm that will compute a series of values of C*. Since the solution to the knapsack problem may not be unique (multiple combinations of the objects may have the same profit) you need to be careful about how to resolve ties so that the answer you produce is the same as what we generate. Here are the rules to follow.

• When partitioning the table into two nearly equal halves, when N is odd, divide the objects such that the first half is smaller, e.g., if N = 5, the first half should have 2 objects and second half should have 3.
• Also, when you compute the value of C* as argmin(MAX(A1[i]+A2[C-i])), make sure that all ties are resolved towards the smaller index, e.g., if A1[2]+A2[C-2] = 10 (and it happens to be the max) and A1[5]+A2[C-5] = 10, then C* should not be 5, but rather, 2 (provided no value of i smaller than 2 also gives 10).
Keep the verbose options(default and 1) and the corresponding print statements the same as that of the given code.

• knap2.c (Coarse-grain implementation): knap2(name of the executable)
• Parallelize the implementation of knap1.c using OpenMP's coarse grain parallelization where large chunks of work will be done in parallel, but the chunks themselves are sequential, e.g., parallelize the complete calls to solve_kp. Do not parallelize the for loops. An additional option to think about is that in each call itself, there are two sub-tables to fill up, so they too can be executed as two separate parallel chunks. Therefore, we have three variations possible as follows:

1. Parallelize only the recursive calls to solve_kp.
2. Parallelize only the calls to get_last_row.
3. Parallelize both recursive calls and get_last_row calls.

Do not expect to see much improvement in performance as yet. Observe that these recursive calls build a tree. Introduce a new command line argument called depth. Modify your code such that at a recursion level that exceeds depth the code executes sequentially. A value of 0 for depth means that the entire tree is executed sequentially. In other words, allow parallelism only until certain depth in the tree.

• knap3.c (Fine-grain implementation): knap3(name of the executable)
• Parallelize the implementation of knap1.c using OpenMP's parallelization directives where "for loop" is executed in parallel. For this part, parallelize the get_last_row function only. Do not use any coarse grain parallelization as you did in knap2.c.

Make sure to use -O3 compiler flag to compile all your codes.

Produce a report.pdf. See the Grading Rubric here. For NUMTHREADS 8, report the optimum depth for your knap2 code. Show performance for a range of 3 depth values