CS475 Fall 2011 - Final Project

Exploring knapsack DP parallelizations


Objectives

The objective of your final project is to explore parallelizations of the memory efficient dynamic programming knapsack solver. The programs you have written so far, as well as the initial studies that you have done have now exposed you to a variety of parallelization options, and the goal of your project is to systematically explore some of these choices. In HW3, you developed the sequential divide-and-conquer program to solve the knapsack, and this should be your starting point. We want to get good parallelization on up to 16 provcessors, i.e., multiple nodes on carver. So just an OpenMP parallelization will not be sufficient, but a purely MPI approach may be possibe.

You also explored a coarse grain task parallelization of the calls to a main workhorse function, (let's call it find_last_row(i,j,c)), and also a direct OpenMP parallelization of the function itself. As you know, the main workhorse find_last_row(i,j,c), takes as input a range of item-indices (i, j) and a capacity (c) and computes an array (of length c+1) giving the optimal profit that can be achieved for all capacities between 0 and c, for the knapsack problem with only objects i through j. The main goal of the project is to parallelize the complete knapsack solver, but to focus on parallelization of find_last_row(i,j,c). You may set aside the exploration of the coarse-grain parallelization, and focus primarily on find_last_row(i,j,c) itself.

Teams: The project will be done in teams of two students. We strongly encourage mixed teams of online and on-campus students. As you form your teams, send us email (cs475@cs.colostate.edu) to register your team. If you like we can set up a discussion group for you in RamCT.

Parallelization Options

The core driver of your project should be the divide-and-conquer program of HW3. There are two main directions to pursue, and OpenMP version and an MPI version. Here are some details of the options:
  1. OpenMP based and tiled parallelization of the find_last_row function. This should use the wavefront parallelization described in the last week's lectures. Since the computation of each row uses only elements from the previous row, there is an obvious simple parallelization (that you explored in HW3). However, this does one synchronization per row, and a tiled implementation would reduce the overhead. However, the knapsack dependences are not from just the northwest as yo uhad in the Needlemand Wunsch (HW4), but from j-w[i], which is different in each row (you may want to allocate buffers of width WMAX, the maximum weight) may need/want to change the order in which the table is filled -- instead of row-by-row (i.e., with the item loop outer, capacity loop inner) you may want to explore making the item loop the inner loop. Regardless of whether or not you do this reordering, a number of values in each row need to be saved to be used by the next tile.
  2. MPI based and tiled parallelization of find_last_row. Each processor is (conceptually) responsible for a section of the table (it is responsible for computing this section of the table, not necessarily storing all of it) of width c/np, and height n. The required communication is from one processor to the next. In order to ease your programming task, you may assume that no weight is greater than c/np, so that the data a processor needs must come from just the previous processor (send-recv to nearest neighbors will suffice). Note that in recursive calls, c may be small and this assumption may not hold. So your code should have a graceful recovery. Here is one way to do a graceful recovery: assign the tile width to be max(c/np, MAX_WT). Now, there may be some processors who have no work to do, but that is merely inefficient, not catastrophic.

    You could also explore tiling to reduce the communication frequency, and increasing the granularity. As in the OpenMP case, this will necessitate will require saving a "fringe" of values.

Your project must explore a minimal set of required options, and the remaining choices may be pursued for extra credit. The following is a minimal set of requirements (so you must submit two separate programs). Many additional options may be pursued and will earn you extra credit. Here are some suggestions: We post additional details about the format of the submission, preferred/required naming conventions, etc.

Good luck and have fun!