Fall 2009 HW 0: Timing Programs and Writing Reports


Objectives

This homework assignment introduces the empirical analysis of programs and the writing of good reports. By the end of this assignment, you should be familiar with timing a program in C, plotting the results with gnuplot, and communicating those results clearly and concisely in a report.

The objectives are (i) to measure the running time of a number of programs over a range of input sizes; (ii) to observe and plot these data; (iii) to form hypotheses about the functions that model the data; (iv) to test your hypotheses by fitting analytic functions to the data; and (v) to report your findings.

Tools, concepts, and provided code

This homework is to be performed on the Linux machines in the CS department lab. You may use the gnuplot utility to plot your data and to produce the figures for your report. Here is a gnuplot manual, and here is another useful introductory article (an 18-page PDF with many figures). Here is a list of commands that will be useful for this assignment.

If you are more familiar with other tools for fitting curves to data and for plotting and visualizing the results (e.g., Matlab, R, Mathematica, or Excel), you are free to use them, but remember that the tool should, at a minimum, be able to perform a linear least-squares fit to a set of data (i.e., find the straight line that best fits the data).

You do not have to write the programs; we have already collected a large database of running times for a range of programs.

We have also provided an access program, tfun, that queries the database and reports running times for different programs, input sizes, and numbers of runs. The executable is in ~cs475/provided/Homeworks/HW0/tfun. It takes three arguments: the first specifies which program (data-set) to report; the second is the value of the independent variable (the input size n for the first seven data-sets, and the number of processors p for the last two); and the third is the number of independent runs to report. The default executable is compiled for 64-bit Linux machines.

Description

Call tfun with the appropriate arguments to generate your data. Then import the data into gnuplot (or your favorite data analysis tool), compute the average and standard deviation of the multiple runs for each input value, and plot the average running time as a function of the input argument. For the first report, due on Wednesday, September 2 at midnight, it will suffice to average three (3) runs.

After making a few test plots, you should estimate (hypothesize) an analytic function for each of the data-sets and validate your hypothesis by doing a least-squares fit of the data.

What you must turn in

By Wednesday, September 2, at midnight, you must turn in a tar file containing the following:

The tar file should be turned in using Checkin.

Good luck and have fun!


Last Modified: 08/27/2009