Detailed Catalog Description: Computer architecture is in constant flux, with ever-increasing parallelism that now comes in many forms: instruction-level parallelism, SIMD instruction-set extensions, and recently, multiple cores. Emerging architectures such as Intel's Larrabee, Sun's Niagara, and other "many-core" processors are taking us toward tens to hundreds of processors on the same die. GPUs (Graphics Processing Units) are specialized hardware devices, previously used exclusively to accelerate display engines, that are finding their way into general-purpose computing. Embedded systems offer yet other forms of parallelism, such as FPGAs (Field Programmable Gate Arrays, chips that can be re-programmed "at the circuit level," allowing huge performance gains for certain applications) and ASICs (Application Specific Integrated Circuits, which provide highly tuned but very specialized functions). Future processors, whether general-purpose or specialized, will be massively parallel, with a huge number of typically fine-grained cores, possibly with dedicated, often distributed memories, and lower power consumption. Collectively, however, they will be much more powerful than today's machines.

In the past, programmers were shielded from processor evolution. A single architectural abstraction (the von Neumann machine) and a single programming/algorithmic abstraction (the random access machine, RAM) allowed a clean separation of concerns. These abstractions are crumbling today. Programmers, library writers, and application developers must be aware of parallelism and memory locality at multiple levels, or risk losing the gains of modern architectures.

This raises a number of issues. First, the immediate challenge is to develop highly tuned applications that best exploit the emerging architectures of today. Second, the medium-term challenge is to do this in a way that makes the skills we develop portable to tomorrow's architectures, even though we do not know their details. Finally, the long-term goal is to enable the research that renders the first two challenges moot through automatic compilation and code-generation tools. This course will tackle all three issues. In the short term, we will study how to optimize compute-intensive kernels on a family of relevant target architectures. To address the longer-term challenges, the course will teach students the foundations of the polyhedral model, a mathematical formalism to specify, develop, analyze, transform, and parallelize high-level programs for many different target architectures, both sequential and parallel. The course will also train graduate students for research in architectural and programming abstractions for massive parallelism.
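To give a flavor of the polyhedral model mentioned above, the following minimal sketch (illustrative only; the loop bound N = 4 and the statement are hypothetical) shows its core idea: the iterations of a loop nest are viewed as the integer points of a polyhedron, and a loop transformation is a different order of scanning the same point set.

```python
# The triangular C loop nest
#     for (i = 0; i < N; i++)
#         for (j = 0; j <= i; j++)
#             S(i, j);
# executes statement S once per integer point of the polyhedron
#     D = { (i, j) : 0 <= j <= i < N }.
N = 4  # hypothetical bound, chosen for illustration

# Original schedule: enumerate D in (i, j) lexicographic order.
original = [(i, j) for i in range(N) for j in range(i + 1)]

# Loop interchange is one polyhedral transformation: scan the same
# set D in (j, i) order, with bounds rewritten as 0 <= j <= i < N.
interchanged = [(i, j) for j in range(N) for i in range(j, N)]

# Both schedules cover exactly the same iteration domain; the
# interchange is legal here because this sketch assumes S carries
# no loop-carried dependences.
assert set(original) == set(interchanged)
print(len(original))  # |D| = N(N+1)/2 = 10 points for N = 4
```

In a real polyhedral compiler, the domain, the schedule, and the dependences are all represented symbolically (as systems of affine inequalities) rather than enumerated, which is what makes the transformations analyzable and portable across targets.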

Last modified: November 1, 2011