Fall 2008 HW 0: Timing Programs and Writing Reports
Objectives
This homework assignment is intended to give an
introduction to the empirical analysis of programs, and to writing
good reports. By the end of this assignment, you should be familiar with
timing a program in C, plotting the results with Gnuplot, and
communicating those results clearly and concisely in a report.
The objectives are (i) to measure the running time of a number of programs
for a range of input data sizes; (ii) to observe and plot this data; (iii)
to make hypotheses about the nature of the functions that model the data;
(iv) to test your hypotheses by fitting analytic functions to the data;
and (v) to report your findings.
Tools, concepts, and provided code
This homework is to be
performed on the Linux machines in the CS department lab. You may use the
gnuplot utility for plotting your data and writing your
report. Here is a gnuplot manual, and here is another useful introductory
article (an 18-page PDF with lots of figures). Here is a list of commands
that will be useful for your assignment.
If you are more familiar with other tools for fitting curves to data and
for plotting and visualizing the results (e.g., Matlab, R, Mathematica, or
Excel), you are free to use them, but please remember that the tool should,
at a minimum, be capable of doing a linear least squares fit to a set of
data (i.e., finding the best straight line passing through a set of
data).
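For readers unsure what a linear least-squares fit involves, here is a minimal Python sketch of the closed-form straight-line fit. It is an illustration only and not part of the provided code: the function name fit_line and the sample points are hypothetical, and any tool with an equivalent built-in fit will do just as well.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points.

    Hypothetical helper for illustration; not part of the course materials.
    """
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    # Made-up points lying on y = 2x + 1, used only to exercise the function.
    print(fit_line([1, 2, 3, 4], [3, 5, 7, 9]))
```

The slope and intercept minimize the sum of squared vertical distances between the line and the data points, which is what "best straight line" means here.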
You do not have to write the programs; we have already collected a large
database of running times for a range of programs.
We have also provided an access function, tfun, that lets you
query the database of running times, for different programs, input sizes
and numbers of runs. The executable is in
~cs475/provided/Homeworks/HW0/tfun. It queries the database and
reports running times. It takes three arguments: the first one specifies
which program (data-set) to report; the second is the value of the
independent variable (the input size, n for the first seven
data-sets and p, the number of processors, for the last two); and
the third argument is the number of independent runs to report. The
default executable is compiled for 64-bit Linux machines. An alternative,
32-bit executable is in ~cs475/provided/Homeworks/HW0/tfun32.
Description
You need to call tfun with the
appropriate arguments to generate your data. Then import the data
into Gnuplot (or your favorite data analysis tool), determine the average
and standard deviation of multiple runs, and generate plots of the running
time as a function of the input argument. For the first report,
due on Sunday, August 31 at midnight, it will suffice to take the average
of three (3) runs.
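As an illustration of the averaging step, the following Python sketch computes the mean and sample standard deviation of repeated runs for a single input size. The function name summarize_runs and the timing values are hypothetical; the sketch assumes you have already extracted the running times as numbers, and it makes no assumption about the output format of tfun.

```python
import statistics


def summarize_runs(times):
    """Return (mean, sample standard deviation) of a list of running times.

    Hypothetical helper for illustration; not part of the course materials.
    """
    mean = statistics.mean(times)
    # The sample standard deviation needs at least two values.
    stdev = statistics.stdev(times) if len(times) > 1 else 0.0
    return mean, stdev


if __name__ == "__main__":
    # Three made-up running times (in seconds) for one input size.
    mean, stdev = summarize_runs([1.0, 2.0, 3.0])
    print(f"mean = {mean:.3f} s, stdev = {stdev:.3f} s")
```

With only three runs the standard deviation is a rough estimate, so it is best read as an indication of how noisy the measurements are.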
After making a few test plots, you should estimate (hypothesize) an
analytic function for each of the data-sets and validate your hypothesis
by doing a least-squares fit of the data.
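One common way to test a hypothesis of the form T(n) = c * n^k is to take logarithms of both sides, which turns the model into the straight line log T = log c + k log n, and then fit that line by least squares. The Python sketch below shows the idea under that assumption; the function name fit_power_law and the data are hypothetical, and a power law is only one of several candidate models (a logarithmic or exponential hypothesis needs a different transformation).

```python
import math


def fit_power_law(ns, ts):
    """Fit t = c * n**k by a least-squares line through (log n, log t).

    Returns (k, c). Hypothetical helper for illustration only; all
    values of n and t must be strictly positive.
    """
    lx = [math.log(n) for n in ns]
    ly = [math.log(t) for t in ts]
    m = len(lx)
    mean_x = sum(lx) / m
    mean_y = sum(ly) / m
    sxx = sum((x - mean_x) ** 2 for x in lx)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lx, ly))
    k = sxy / sxx                        # slope of the log-log line: the exponent
    c = math.exp(mean_y - k * mean_x)    # intercept, mapped back from log space
    return k, c


if __name__ == "__main__":
    # Made-up data generated from t = 3 * n**2, used only to exercise the fit.
    ns = [10, 100, 1000]
    ts = [3 * n ** 2 for n in ns]
    k, c = fit_power_law(ns, ts)
    print(f"estimated exponent k = {k:.3f}, constant c = {c:.3f}")
```

If the points on a log-log plot do not fall close to a straight line, the power-law hypothesis is probably wrong and another function should be tried.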
What you must turn in
By midnight on Sunday, August 31, you
must turn in a tar file containing the following:
- Your data. Be sure to save the output as ASCII text
file(s). You may find the Linux utility script useful for this
purpose.
- The gnuplot script/commands that you used for analyzing and
plotting the data, as well as for determining least-squares fits (if you
use some other tool, please submit the script/commands that you used,
and make sure they are appropriately commented).
- A soft copy of your report. Remember that this is the most
important part of this homework, since 90% of your
grade will be based on it. The other elements are
supplementary and are intended to (1) back up the claims you make in
the report, and (2) allow other people to duplicate your experiments.
The tar file should be turned in using Checkin.
Good luck and have fun!
Last Modified: 09/19/2007