CS475 Lab 7: Plotting and analyzing data

Objectives

This lab introduces the empirical analysis of programs and the writing of good reports. The objectives are (i) to measure the running time of a number of programs for a range of input data sizes; (ii) to observe and plot these data; (iii) to make hypotheses about the nature of the functions that model the data; (iv) to test your hypotheses by fitting analytic functions to the data; and (v) to report your findings.

Tools, concepts, and provided code

This lab is to be performed on the Linux machines in the CS department lab. You may use the gnuplot utility for plotting your data and writing your report. Here is a gnuplot manual, and here is another useful introductory article (an 18-page PDF with lots of figures). Here is a list of commands that will be useful for your assignment. If you are more familiar with other tools for fitting curves to data and for plotting and visualizing the results (e.g., Matlab, R, Mathematica, Excel), you are free to use them. The tool must, at a minimum, be capable of doing a linear least-squares fit to a set of data (i.e., finding the best straight line passing through a set of data). We DO NOT want you to use any more sophisticated data-fitting tool for this lab.
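
To make the phrase "linear least-squares fit" concrete, here is a minimal sketch in Python of finding the best straight line through a set of points using the standard closed-form formulas. The function name fit_line is ours and is purely illustrative; gnuplot and the other tools listed above provide their own equivalents.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope*x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and sum of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    # Points that lie exactly on y = 2x + 1, used only as a sanity check.
    print(fit_line([1, 2, 3, 4], [3, 5, 7, 9]))
```

Points that lie exactly on a line should give back that line's slope and intercept, which is a quick way to check whichever tool you choose.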

You do not have to write the programs; we have already collected a large database of running times for a range of programs. We have also provided an access function, tfun, that lets you query this database for different programs, input sizes, and numbers of runs.

	Usage: tfun fun_type N num_trials
fun_type: specifies which program (data-set) to report. Valid values are integers 1..9.
N: specifies the input size for the first seven data-sets and the number of processors for the last two data-sets. Valid values are integers; floating-point values are truncated. For fun_type = 1, 2, 3, and 5, N should be a multiple of 100.
num_trials: specifies the number of independent runs to report.
The default executable is compiled for 64-bit Linux machines.
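
If you prefer to script your data collection, the following Python sketch shows one way to call tfun repeatedly. It makes several assumptions that are NOT stated in this handout: that the executable is at ./tfun, and that it prints its running times to standard output as whitespace-separated numbers. Run tfun once by hand and adjust parse_times to match what you actually see. The helper names (build_command, parse_times, run_tfun) are ours.

```python
import subprocess


def build_command(fun_type, n, num_trials, exe="./tfun"):
    """Build the argument list for: tfun fun_type N num_trials.

    The path ./tfun is an assumption; point exe at your copy of the executable.
    """
    if not 1 <= fun_type <= 9:
        raise ValueError("fun_type must be an integer in 1..9")
    if fun_type in (1, 2, 3, 5) and n % 100 != 0:
        raise ValueError("for fun_type 1, 2, 3 and 5, N should be a multiple of 100")
    return [exe, str(fun_type), str(n), str(num_trials)]


def parse_times(text):
    """Collect every whitespace-separated token that parses as a number.

    ASSUMPTION: the output format of tfun is not documented here, so this
    simply keeps numeric tokens and skips everything else.
    """
    times = []
    for token in text.split():
        try:
            times.append(float(token))
        except ValueError:
            pass  # not a number (e.g. a label); ignore it
    return times


def run_tfun(fun_type, n, num_trials, exe="./tfun"):
    """Run tfun once and return the list of numbers it printed."""
    cmd = build_command(fun_type, n, num_trials, exe)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_times(result.stdout)
```

For example, run_tfun(1, 200, 5) would execute ./tfun 1 200 5 and return whatever numbers appear in its output.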

Description

  1. You need to call tfun with the appropriate arguments to generate your data.
  2. Determine the average and standard deviation of multiple runs (limit to 5) for a given input.
  3. Import this data into Gnuplot (or your favorite data analysis tool) and generate plots of the running time as a function of the input argument.
  4. After making a few test plots, you should estimate (hypothesize) an analytic function for each of the data-sets.
  5. Validate your hypothesis by doing a least-squares fit of the data.
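
Steps 2 and 5 can be sketched in Python as follows. The example assumes, purely for illustration, that your hypothesis is a power law T(N) = c * N^k; taking logarithms gives log T = log c + k log N, which is a straight line in log-log space and can therefore be fitted with a linear least-squares fit. A power law is only one candidate; your data-sets may call for other hypotheses and other transformations. The function names are ours.

```python
import math
import statistics


def summarize(times):
    """Return (mean, sample standard deviation) of the runs for one input.

    statistics.stdev needs at least two runs.
    """
    return statistics.mean(times), statistics.stdev(times)


def fit_power_law(ns, mean_times):
    """Fit T(N) = c * N**k by a straight-line fit of log T against log N.

    Returns (k, c). All N and T values must be positive.
    """
    xs = [math.log(n) for n in ns]
    ys = [math.log(t) for t in mean_times]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    # Closed-form least-squares slope and intercept in log-log space.
    k = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    log_c = my - k * mx
    return k, math.exp(log_c)


if __name__ == "__main__":
    # Synthetic values generated from T = 3 * N**2, used only to check the
    # code; they are not measurements from tfun.
    ns = [100, 200, 400, 800]
    ts = [3.0 * n ** 2 for n in ns]
    print(fit_power_law(ns, ts))
```

If the fitted line matches the transformed data closely, the hypothesis is plausible; systematic deviations suggest trying a different function.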

Good luck and have fun!

Last Modified: 11/07/2016