Homework 0: Pick your Poison and your Medicine
The main goal of this class is to understand the mathematical foundations of
the polyhedral model, a formalism that is used for program optimization in the
broadest sense. The techniques are agnostic about the target, which may range
from distributed memory parallel processors, shared-memory, multicore
processors, GPUs, or even application specific VLSI. However, in order to
drive the points home, you need to choose one. This semester, your choies
are:
Remember, our goal is not to teach you the target. Rather, it is a
prerequisite. What we will do with HW0 is to re-introduce and review this
background for you. Moreover, we will also drive home the main point of the
exercise: performance optimization. You need to pick the target (your
medicine).
Systolic Sorter
This is a combination of a paper-and-pencil, analytical assignment, and a
programming problem. For the first, you should write up your solution using
LaTeX (see the Sprin 2012 Assignemnts page and the files
cs560-template.tex, cs560.bib, and plot.pdf for HW0.
Recall the architecture of a P-processor sorter, used to sort N
inputs. First consider the case when P=N, and then extend your
analysis to the case for N=kP, where k is some integer (this
case will require k passes through the array). Some questions to think
of:
- How to load and unload the array. Sure, pushing in MAXINT after the
data is good, but after that how do you reuse the architcture? With a global
reset? When can this occur? So, analyze the total execution time for the
architcture to completely finish the k passes (note that there will be
idle time between the passes).
- How many cycles (clock ticks) does it take to get the last data element
out of the array (for one-pass as well as for the multi-pass case)?
- For the one-pass case:
- How many cycles dos it take till the FIRST output appears?
- How many cycles dos it take till the first PROCESSOR is done?
- How many cycles dos it take till the LAST output appears?
- When does the i-th processor START?
- When is the i-th processor DONE?
- For the k-pass case, the essential question that you
MUST figure out is how to "tightly" interleave multiple passes, without a
global reset (hint: use the control sgnal). Once that is done, need to think
of similar questions as above.
- For the k-pass case, derive a formula for the total execution time
to sort the entire, N-element array (express this as a function of only
N and k, and eliminate P). Then, deduce the number of
passes that minimizes this function. Coment on the answer you get. Does it
make sense? Did your intuition tell you that this would be the case? Why?
- Starting from this Alphabets program, derive the
systolic architecture that we studied in class. Your derivation should choose
a schedule function, an allocation function, the resulting CoB, and then the
transformed program which can be interpreted as a description of the
architecture. Can you systematically derive the I/O (i.e., the loading and
unloading of the array) and the control signals needed for it?
Implement your design as BlueSpec design. Your circuit should be able to take
p and n as parameters, and should implement both the interface
module (in the Bluespec testbench) and also a verifier that actually checks
for the correctness of ht inouts. Please test the circuit thoroughly.
Submit a tar file of your complete implementation including all the Makefiles
needed for testing your system via email to Sanjay
Sanjay Rajopadhye 2012-07-09