Work in an incremental fashion for debugging purposes. We will again be using the state capital machines. You must use these machines to measure and report performance. When studying performance, run your codes on a quiet capital machine.
The names of input files are of the form knnxcc.txt, where nn represents N, the number of objects and cc represents the approximate total capacity. e.g. k120x4M.txt corresponds to input file for 120 objects and total capacity of ~4Million.
The file format is:
// first line: number of objects, knapsack capacity, // next lines: weight and profit of each object (1 object per line)
For a simpler case use k10x25.txt (provided in the PA3.tar) to debug your code. It has a verbose option as the 3rd argument as seen below. knapSeq k10x25.txt should produce:
cs475@poudre>./knapSeq sample/k10x25.txt The number of objects is 10, and the capacity is 25. The optimal profit is 874 Time taken : 0.000003. cs475@poudre>./knapSeq sample/k10x25.txt 1 prints the solution vector cs475@poudre>./knapSeq k10x25.txt 2 prints the table.
Modify the given code to implement the Memory Efficient Divide and Conquer algorithm explained in class.
We suggest the following steps:
You will have a recursive algorithm that will compute a series of values of C*. Since the solution to the knapsack problem may not be unique (multiple combinations of the objects may have the same profit) you need to be careful about how to resolve ties so that the answer you produce is the same as what we generate. Here are the rules to follow.
Parallelize the implementation of knap1.c using OpenMP's coarse grain parallelization where large chunks of work will be done in parallel, but the chunks themselves are sequential, e.g., parallelize the complete calls to solve_kp. Do not parallelize the for loops. An additional option to think about is that in each call itself, there are two sub-tables to fill up, so they too can be executed as two separate parallel chunks. Therefore, we have three variations possible as follows:
Do not expect to see much improvement in performance as yet. Observe that these recursive calls build a tree. Introduce a new command line argument called depth. Modify your code such that at a recursion level that exceeds depth the code executes sequentially. A value of 0 for depth means that the entire tree is executed sequentially. In other words, allow parallelism only until certain depth in the tree.
Parallelize the implementation of knap1.c using OpenMP's parallelization directives where "for loop" is executed in parallel. For this part, parallelize the get_last_row function only. Do not use any coarse grain parallelization as you did in knap2.c.
Make sure to use -O3 compiler flag to compile all your codes.
Produce a report.pdf. See the Grading Rubric here. For NUMTHREADS 8, report the optimum depth for your knap2 code. Show performance for a range of 3 depth values
Your submission (PA3.tar) must contain:
The Makefile should generate executables named knap1, knap2, knap3 respectively. Use the clean option to remove unecessary files, and the tar option to create your submission PA3.tar file.