Fall 2011 HW 0: Timing Programs and Writing Reports


Objectives

This homework assignment is intended to give an introduction to the empirical analysis of programs, and to writing good reports. By the end of this assignment, you should be familiar with timing a program in C, plotting the results with Gnuplot, and communicating those results clearly and concisely in a report.

The objectives are to (i) analyze several measured data-sets and deduce the analytic function that matches each one, and (ii) report the results in a simple, concise form. Rather than using sophisticated packages for data analysis, we want you to go back to basics and use the simple yet effective techniques described in the lectures. Essentially, you should massage the data and plot it so that it appears as a straight line. We also want you to use these plots in your reports.
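As an illustration of the straight-line idea, here is how two common function forms become linear after taking logarithms. The forms and the symbols c, k, and b are assumptions chosen for this example only; they are not the answers for any particular data-set.

```latex
% Power law: a straight line on log-log axes, with slope k
T(n) = c\,n^{k} \;\Longrightarrow\; \log T(n) = \log c + k \log n

% Exponential: a straight line on semi-log axes (log T versus n), with slope \log b
T(n) = c\,b^{n} \;\Longrightarrow\; \log T(n) = \log c + n \log b
```

If the plotted points fall on a line under one of these transformations, the slope and intercept of that line give the parameters of the hypothesized function.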

This homework assignment will be done in two passes: the first is due on Tuesday, Aug 30, and the second, a revision based on feedback on the first pass, will be due in mid-September. For this reason, the first pass does not require you to analyze all the functions.

Tools, concepts, and provided code

This homework is to be performed on the Linux machines in the CS department lab. You must use the gnuplot utility for plotting your data and writing your report. Here is a gnuplot manual and here's another useful introductory article (18 page pdf with lots of figures). Here is a list of commands that will be useful for your assignment.

Even if you are more familiar with other tools for fitting curves to data, plotting, and visualizing the results (e.g., Matlab, R, Mathematica, Excel, etc.), we want you to use Gnuplot for this assignment. A tutorial on the use of this tool is available in Lab1. We DO NOT want you to use any other, more sophisticated data-fitting tool for this assignment.

You do not have to write the programs; we have already collected a large database of running times for a range of programs.

We have also provided an access function, tfun, that lets you query the database of running times for different programs, input sizes, and numbers of runs. The executable is in ~cs475/provided/Homeworks/HW0/tfun. It takes three arguments: the first specifies which program (data-set) to report; the second is the value of the independent variable (the input size, n, for the first seven data-sets and the number of processors, p, for the last two); and the third is the number of independent runs to report.
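The three-argument call described above could be scripted roughly as follows. This is only a sketch: the helper names tfun_command and run_tfun are hypothetical, and because the handout does not specify the output format of tfun, the output is returned as raw text rather than parsed.

```python
import os
import subprocess

# Path as given in the assignment text; "~cs475" expands to that user's home directory.
TFUN_PATH = "~cs475/provided/Homeworks/HW0/tfun"


def tfun_command(program, size, runs):
    """Build the argument list: data-set number, independent variable, number of runs."""
    return [os.path.expanduser(TFUN_PATH), str(program), str(size), str(runs)]


def run_tfun(program, size, runs):
    """Run tfun and return its raw standard output (no output format is assumed)."""
    result = subprocess.run(
        tfun_command(program, size, runs),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For example, run_tfun(1, 1000, 3) would request three runs of data-set 1 at input size 1000; the size 1000 is an arbitrary value picked for illustration.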

Description

You need to call tfun with the appropriate arguments to generate your data. Then import the data into Gnuplot and plot the running time as a function of the input argument, after first determining the average and standard deviation of multiple runs. For the first report, due midnight, Tuesday, Aug 30, please take the average of three (3) runs. Also, you need to report on only the first five programs (data-sets; i.e., the first argument to tfun is 1 through 5).
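The averaging step can be sketched as below. The function name summarize_runs is hypothetical, the timings are made up, and the use of the sample standard deviation (n - 1 in the denominator) is an assumption; check the lectures for the definition expected in this course.

```python
import statistics


def summarize_runs(times):
    """Return (mean, sample standard deviation) of the running times for one input size."""
    mean = statistics.mean(times)
    # The sample standard deviation needs at least two runs; report 0.0 for a single run.
    stdev = statistics.stdev(times) if len(times) > 1 else 0.0
    return mean, stdev


# Made-up timings for three runs at one input size, purely for illustration.
mean, stdev = summarize_runs([1.0, 2.0, 3.0])
print(f"{mean} {stdev}")
```

Writing one line per input size with three columns (size, mean, deviation) produces a data file that Gnuplot can plot with error bars using its yerrorbars style.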

After making a few test plots, you should estimate (hypothesize) an analytic function for each of the data-sets and validate your hypothesis by doing a least-squares fit of the data.
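To make clear what a least-squares fit computes, here is a small sketch of fitting a straight line to log-transformed data. It only illustrates the arithmetic; the fit you report must still be done in Gnuplot as required above. The function name fit_line is hypothetical, and the data are synthetic, generated from T(n) = 2n^3 rather than taken from tfun.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations in x, and sum of cross-deviations in x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Synthetic data from T(n) = 2 * n^3 (an assumed example, not a tfun data-set).
sizes = [10, 20, 40, 80]
times = [2.0 * n ** 3 for n in sizes]

# Fit log T = log c + k log n; the slope is the exponent k.
log_c, k = fit_line([math.log(n) for n in sizes], [math.log(t) for t in times])
print(f"exponent ~ {k:.3f}, constant ~ {math.exp(log_c):.3f}")
```

On real measurements the points will not lie exactly on a line, so the quality of the fit is what supports or refutes the hypothesized function.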

What you must turn in

By midnight, Tuesday, Aug 30, you must turn in a soft copy of your report (as a pdf file). Remember that this is the most important part of this homework, since 90% of your grade will be based on the report. The other elements are supplementary and are intended to (1) back up the claims you make in the report, and (2) allow other people to duplicate your experiments. When you redo HW0, we will ask for these additional materials. You must submit the report using the submission features of RamCT.

Good luck and have fun!


Last Modified: 09/19/2007