CS560 Suggested Class Projects (Spring 2008)

There are essentially two kinds of projects: application/algorithm studies, and tools. A tool oriented project requires prior discussion with Sanjay to ensure that the efforts complement efforts in the Mélange group.

We expect most students to do application/algorithm studies, where an application/algorithm is parallelized and implemented on one of two platforms:

It is strongly recommended that the Playstation projects involve a complete application, and the FPGA ones tackle an algorithmic kernel (since there is a considerable amount of interface VHDL code that must be written by hand). It is acceptable to do the project with a partner, with the expectation of correspondingly larger scope and size. FPGA based projects should probably be done with a partner.

Application Studies

Here is a list of suggested applications that you may explore. I've tried to make this list reasonably large, but it also gives you an idea of what I'm looking for. It is also somewhat restrictive because I know that their computationally intensive kernels are amenable to the techniques we develop in this class. If there's something else you'd like to pursue, please discuss with Sanjay.
  1. Linear Space Sequence Alignment for bioinformatics: Driga, Lu, Schaeffer, Szafron, Charter and Parsons propose a memory efficient biological sequence alignment algorithm: FastLSA: A Fast, Linear-Space, Parallel and Sequential Algorithm for Sequence Alignment, in Algorithmica, Volume 45 , Issue 3, Pages: 337-375 (July 2006). A simple parallelization of this algorithm is available from the Biobench Suite at University of Maryland, under the name of PLSA: Parallel Linear Space Alignment.
  2. Watershed computation models: TREX: Two-Dimensional Runoff Erosion and Export is a software package developed over the years by CSU's Civil Engineering faculty. There is a wealth of information on this project there, but if you plan to do it, we have access to a more recent version of the software (written in C++).
  3. Weather modeling I: The Weather Research and Forecasting Model is a next generation mesoscale numerical weather prediction system developed at NCAR, Boulder. The source is freely available and can be installed to run sequentially, and we are in touch with the authors.
  4. Weather modeling II: CSU's own Atmospheric Science Department also has a number of software packages for weather prediction and modeling. We have been collaborating with Dave Randall's group to investigate data mapping and parallelization strategies for the so called "geodesic grid". We have access to a shallow water model (written in Fortran) on this grid, and a number of students have worked on parts of this code. We have proposed a "smashing technique" to avoid "wrap-around dependences."
  5. Biomimetic Vision and Focus of Attention: Bruce Draper and his research group are exploring biomimetic vision, and have developed software to determine "Focus of Attention." The initial and computationally intensive part of this application involves constructing what is known as the "image pyramid." Thomson Comer has very generously provided a stand-alone implementation of this part of the software called AttendAsYou. Furthermore he has agreed to be available to help, and also has prior experience with implementing image convolutions on FPGAs (he tool CS560 last year). Another student is already working with Wim Bohm to implement this algorithm on FPGAs (Digilent and/or AlphaData) so this target is not recommended for CS560, although the Cell would be complementary.

Algorithmic Kernels

  1. Algorithmic Tiling for the APP: The algebraic path problem (APP) is a generalized problem that unifies many problems (such as the all-pairs shortest paths in a graph, the transitive closure of a boolean relation, matrix inversion, among others) into a single seting. It (i.e., all instances of the problem) can be resolved by a generic algorithm schema, simply by changing the nature of some of the operations. Many parallel algorithms/architectures have been proposed fopr the APP, the most recent ones may be found in this and in this recent papers that propose a technique called "algorithmic tiling" that transforms a standard "scalar" algorithm to a "blocked" version. Such blocked programs have all the advantages of tiling even though, the two programs are not semantically equivalent ( all intermediate results are not necesarily identical, although the final values are).
  2. Protein folding and the optimal parenthesization problem: The optimal parenthesization problem is a classic algorithm for which a neat systolic aray has been proposed more than twenty years ago by Guibas, Kung and Thompson. The problem has recently become important due to its immediate application in the determination of the secondary structure of proteins. The project would require you to implement the array on an FPGA co-processor, measure the performance, determine the bottlenecks and develop solutions to resolve them. There are many issues to resolve: the proposed architecture is a triangular array with about 0.5 n^2 processors, and the total computation time is about 1.5n, leading to a work of 0.75 n^3, but the algorithm has only a ninth as many operations. Because the "core" of each processor implements just two add-compare operations, it may be possible to get a fairly large degree of parallelism (as high as 100 procesors?), and so offset the work-inefficiency. However, an even more interesting challenge is to obtain an efficient architecture which can also exploit the parallelism. Finally, a potetial way to improve efficiency is by judicious partitioning of the computation between the host and the FPGA.
  3. Stencil kernels: Successive-over-relaxation (SOR) is an algorithmic technique commonly used in the numerical simulation of many physical phenomena (eg. the so called "heat equation", electromagnetic wave propagation using Maxwell's equations, etc.) It is an iterative technique, that lends itself to elegant parallelization on a systolic array. However, many interesting issues would arise when actually implementing such an architecture on an FPGA co-processor: the tradeoff between precision and the number of iterations (on an FPGA there is flexibility regarding the arithmetic oprerations, from full IEEE floating point implementations to fixed precision); the choice among different systolic arrays possible and the resulting architectural tradeoffs, and finally issues related to tiling which could be pursued if time permits.
  4. LU Decomposition:
  5. Classic example, and if you choose this, we should expect some strong results.