Christopher D. Krieger

Graduate Student
Colorado State University
Computer Science Department
1873 Campus Delivery
Fort Collins, CO 80523-1873
krieger atsign cs dot colostate dot edu
CS253 Recitation Mondays 1pm CSB225
Office Hours Wed 2-4pm CSB120

Education

Ph.D. Candidate, Computer Science, Colorado State University, Expected Graduation June 2013.
M.S., Electrical and Computer Engineering, University of Utah, 2002.
B.S., Electrical and Computer Engineering, Brigham Young University, 1995.

Current Research Interests

I am currently doing research in Computer Science at Colorado State University, advised by Dr. Michelle Strout. My work focuses on two areas. First, I am researching a new programming language abstraction called a loop chain. A loop chain is a sequence of loops that share data, often via irregular accesses such as A[col[i]]. Loop chains of this kind are common in molecular dynamics, irregular mesh PDE solvers, and finite element codes. By identifying a group of loops as a loop chain, compilers and runtime optimizers can perform a variety of code and data transformations. For example, several techniques can be applied that merge the iteration spaces of the loops to improve locality (e.g., full sparse tiling), concurrency (shared-memory autoparallelization), or arithmetic intensity (improved vectorization). I am interested in the optimizations unlocked by loop chaining and in how to apply them to loop chains automatically. Consequently, some of my work focuses on developing techniques to select optimization parameters well; my approach combines performance models, machine learning, and autotuning.
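As an illustrative sketch (the array names and the arithmetic are hypothetical, not taken from any particular application), a two-loop chain that shares data through an irregular index array might look like:

```c
#include <stddef.h>

#define N 4  /* illustrative problem size */

/* Loop 1 gathers through the irregular access x[col[i]];
   loop 2 consumes the y[] values that loop 1 produced. Because the
   two loops share data, they form a loop chain that a compiler or
   runtime optimizer could fuse or tile across. */
void loop_chain(const int col[N], const double x[N],
                double y[N], double z[N])
{
    for (size_t i = 0; i < N; i++)      /* irregular gather */
        y[i] = 2.0 * x[col[i]];

    for (size_t i = 0; i < N; i++)      /* dependent second loop */
        z[i] = y[i] + 1.0;
}
```

Executed naively, each loop streams over all of y[]; treating the pair as a single chain lets an optimizer interleave their iterations so shared data stays in cache.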

My other research area concerns the separation of algorithmic and implementation details in parallel programming models. Today, most code tangles performance and implementation details with the code that does the original, algorithmic work. Often, after optimization, the purpose of the code is difficult to discern, obscured by details such as scheduling or data distribution. This line of research seeks to understand how different parallel programming models can specify implementation details orthogonally to the algorithm code.

Previous Research

Previously, I did research on hardware data prefetchers. Many HPC scheduling transformations, such as tiling, try to keep a computation's data accesses within a limited working set. These schedules, however, often work against the data-prefetching hardware on modern microprocessors. I investigated the positive and negative interactions between hardware prefetching and tiling, and developed an algorithm that analyzes candidate schedules to help select the one that gains the most performance from cooperation with the hardware prefetcher.
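A minimal sketch of the kind of tiled schedule involved (the sizes, names, and row-sum computation are illustrative only): the inner loop is blocked so each tile's working set stays small, but the blocking also shortens the sequential streams that stride-based hardware prefetchers rely on.

```c
#define M    8   /* illustrative matrix dimension */
#define TILE 4   /* illustrative tile width; must divide M here */

/* Row sums computed over column tiles: each jj-tile touches only
   TILE columns at a time, shrinking the working set, but it also
   restarts the memory streams at every tile boundary -- exactly
   the schedule/prefetcher interaction that needs to be analyzed. */
void tiled_rowsum(const double a[M][M], double rowsum[M])
{
    for (int i = 0; i < M; i++)
        rowsum[i] = 0.0;

    for (int jj = 0; jj < M; jj += TILE)    /* tile loop */
        for (int i = 0; i < M; i++)         /* rows within a tile */
            for (int j = jj; j < jj + TILE; j++)
                rowsum[i] += a[i][j];
}
```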

I have also explored using virtual machines to dynamically detect parallelizable loops. I modified the Jikes RVM and evaluated the method's potential on a range of benchmarks, including NAS/JavaGrande and DaCapo.

My master's thesis research focused on asynchronous hardware systems. Specifically, I worked on efficient state coding of partially coded asynchronous systems, with performance of the final circuit as the driving cost function. This work could be extended to include concurrency reduction, particularly methods of altering system timing to remove state coding violations without completely removing concurrency.


Previous Industry Work

I started my career at Hewlett-Packard in their microprocessor design lab, where I developed large-scale automated physical design tools, taking projects from requirements gathering through software architecture to deployment and support. This gave me my first exposure to performance-sensitive irregular applications, as many of these algorithms involved traversing circuit topologies. After HP, I worked for Intel. Initially, I wrote timing-aware power reduction tools for server processors. I then had the opportunity to exercise my architectural background and transitioned from software development to work as a microprocessor performance architect, primarily on the Itanium® Processor Family (IPF) and on larger Xeon® servers. I also worked on defining and using the Performance Monitoring Units found in Itanium® chips. Over my career, I worked on two HP PA-RISC® processors (the 8700 and 8800), five Itanium® processors, one Xeon® processor, and one Atom® processor.

Publications


Class Reports


Service


Coursework