CS 540, Spring 2015: Assignment 4
Empirical Evaluation of Search and Evolutionary Algorithms on Timetabling

Part A: Best Objective Function Values Due Monday 3/23/2015 at noon
Part B and C: Experiment Report and Short Feature Description Due Thursday 4/2/2015 at noon
Part D: Votes Due Tuesday 4/7/15 at noon

Best known objective function values
Best known solutions
Best known values in csv file
Description of Features

For this assignment, you are required to step back and analyze what you have done so far in this course. Computer scientists are superb at informal experimentation; it is at the root of debugging: hypothesize what is wrong, fix it, and see what happens. Now you are going to practice formal experimentation. This should also help prepare you for your project.

Not surprisingly, you will be evaluating the timetabling solvers that have been developed. The intent is to learn more about their performance relative to each other.

Part A: Best Objective Function Values [12 points, plus up to 5 points extra credit]

It can be difficult to compare search algorithm performance across problem sets because of large differences in the solution quality metric. To normalize these differences, the metric value is often compared to some baseline value and reported as a percentage difference from that baseline. The optimal value is the ideal baseline, but it is not always known; alternatively, a lower bound (or upper bound, for maximization) or the best known value is used.

To support your second experiment, each student will be asked to determine the best known objective function values for 2 problem instances. The problem instance set will comprise the 24 test cases posted for assignment 3 plus enough additional test cases so that each student gets 2. We will post the additional test cases HERE. You will be sent email on 3/13 telling you which 2 instances from the instance set you are assigned.

It would be tempting to post poor best known solutions and compare against them. Certainly, such values could make a solver look good! To motivate everyone to find really good values, simply submitting plausible values for the objective functions earns 4 points per instance (2 points per objective function). To earn 6 points, you need to submit really good values that cannot be bested.

We do not know the optimal values for the problems in the instance set. So I am allowing other students to submit new best known values for extra credit. If you submit a better objective function value (with a verifiable solution!) than what is posted, you earn 1 point per problem instance/objective function up to 5 points total, with no more than one try per problem. (So you can't submit slots=200, then 199, then...) An assignment will be set up on Canvas for this extra credit; it will be due March 28 at noon with no late period. Note: only one attempt per student will be allowed, so you will need to include all of the instances and objective functions (up to 5) in a single submission.

When you submit a baseline, you also need to say how you determined it (cite a website or paper, or provide a solution in verifiable form).

Part B: Experiment Report [78 points]

You must design and run experiments on programs from assignments 2 and 3, analyze the results, and articulate observations/conclusions from the data. The purpose is twofold: first, to give you some practice in designing and running formal comparative experiments, and second, to help you learn more from what you did in assignments 2 and 3. You will be required to design and run 2 experiments.

Experiment 1: Comparative Performance Experiment

For this you need to follow the "Canonical AI Comparison Experiment Protocol" (for non-Machine Learning) from the lecture notes.
  1. For each algorithm A being compared,
    1. For each parameter setting P of A,
      1. For each testing data set D, run A on D, collecting observations O
      2. Compute performance metrics M from observations O
    2. Compare performance on settings P for algorithm A
  2. Compare performance across set A on best P for each A using statistical tests for significance
In this case, you have two algorithms (set A): one from assignment 2 and one from assignment 3. We have asked a few students to allow us to post their code from assignment 2. Object code will be posted on the CS department machines at ~cs540/pub/TimeTablingCode. You can choose to use one of them or your code from assignment 2.
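
To make the protocol concrete, here is a minimal sketch in Python. Everything in it (run_solver, the parameter settings, the instance names) is a placeholder standing in for your actual solvers, settings, and test cases; it is not provided code.

  # Minimal sketch of the canonical comparison protocol; all names are placeholders.
  def run_solver(algorithm, params, instance):
      """Run one solver on one problem instance and return the observations,
      e.g. (slots_used, total_student_cost)."""
      return (0, 0)  # placeholder: replace with a call to your assignment 2 or 3 program

  algorithms = ["assign2_solver", "assign3_solver"]          # the set A
  param_settings = {"assign2_solver": [{"restarts": 5}],     # settings P (hypothetical)
                    "assign3_solver": [{"pop_size": 100}]}
  instances = ["instance01.txt", "instance02.txt"]           # data sets D (placeholders)

  observations = {}                                          # O
  for algo in algorithms:
      for params in param_settings[algo]:
          for inst in instances:
              observations[(algo, str(params), inst)] = run_solver(algo, params, inst)

  # Compute your metric M from O, compare the settings P within each algorithm,
  # then compare the two algorithms on their best settings with a statistical test.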

You need to decide on the settings of everything else (P, D, O, and M) and describe and justify them in your report. From assignment 2 and Part A, you now have access to a large number of problem instances, although you are welcome to go beyond that set to instances found on the web or ones you have generated. You then need to run the experiment as you have designed it, collecting the data (O). Your report must include table(s) presenting either raw or summarized data.

The last part is to compare the performance as measured by M (which may be the same as O or computed from O) using some statistical tests. This has two parts: first, decide on a hypothesis that compares the two algorithms; second, run a statistical analysis to test the hypothesis. Your report should present summaries of the results/data, state the hypothesis, and report the results of the statistical test (including why you picked the test you did). I'm not expecting statistical sophistication; for simple statistics, you can run them in Microsoft Excel or use basic R functionality (tutorial available here).
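
For example, if your metric is the percent from best known on a common set of instances, a paired test is one reasonable choice, since both solvers are run on the same instances. Below is a minimal sketch using Python's scipy; the numbers are made up purely for illustration.

  # Sketch of a paired test on matched per-instance metrics (illustrative numbers only).
  from scipy import stats

  algo2_pct = [5.0, 12.5, 3.0, 8.3, 4.1]   # e.g. % from best known, assignment 2 solver
  algo3_pct = [2.0,  9.0, 1.5, 7.5, 1.0]   # same instances, assignment 3 solver

  # Paired t-test: null hypothesis is no difference in mean metric between the solvers.
  t_stat, p_value = stats.ttest_rel(algo2_pct, algo3_pct)
  print(f"paired t-test: t={t_stat:.3f}, p={p_value:.3f}")

  # If the normality assumption is doubtful, the Wilcoxon signed-rank test
  # is a common nonparametric alternative.
  w_stat, w_p = stats.wilcoxon(algo2_pct, algo3_pct)
  print(f"Wilcoxon signed-rank: W={w_stat:.1f}, p={w_p:.3f}")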

Experiment 2: Problem Impact Experiment

This experiment requires a bit more creativity. Here you need to think about a characteristic of the problems that allows you to predict/explain the performance of one of the two algorithms you compared in Experiment 1. For example, it could be that your algorithm does well until the number of courses exceeds some threshold, or that it does better on randomly generated problems from students than on instances taken from a repository. Note: you may not use these example characteristics, i.e., number of courses, number of students, or student-generated versus from a repository.

The characteristic must be computable from the problem input. Additionally, its value should vary across the test cases in the instance test set (see Part A).

Given the characteristic, you need to run an experiment to determine how your algorithm's performance varies as the characteristic value varies. For this experiment, the independent variable is the problem set. The dependent variable will be the objective functions (as a stand-in for solution quality). You will run your algorithm for five minutes (on the capital machines) on each of the test cases and record the objective function values of the solution found. Note: you should run your algorithm with one of the objective function settings but report both objective function values from that run.

The objective functions need to be converted into a performance metric. In this case, the metric will be the percentage difference from the baseline value for each problem in the instance set from Part A. The best known solution values for each of the problems will be posted in the same directory as the instance set, in a file to be made available after Part A is turned in. We will update the best known values as soon as we get them. In your results, you should list the baseline values that you used along with the date that you gathered them.

CHANGE AS OF MARCH 16 You need to produce a csv (comma-separated value) file that has five columns: algorithm, problem name, characteristic value, percentage from optimal for number of slots, and percentage from optimal for student costs. The algorithm name will be the same in every line. The file should have 53 lines: a header line and one line for each problem. The header line should be

algorithm, instance, characteristic-name, percent-from-optimal-slots, percent-from-optimal-costs
where characteristic-name is the name of your characteristic. You will have to compute the characteristic value for each of the problems. The file name should be mylastname-characteristicname.csv
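
If it is helpful, one way to assemble this file is sketched below in Python. The best known values, characteristic values, and observed results are placeholder numbers you would replace with your own, and "my-characteristic" stands in for your characteristic's name.

  # Sketch: compute percent-from-best-known and write the result csv.
  # All names and numbers are placeholders for illustration.
  import csv

  def percent_from_best(observed, best):
      # 0.0 means the run matched the best known value
      return 100.0 * (observed - best) / best

  # (instance, characteristic value, observed slots, observed cost,
  #  best known slots, best known cost) -- illustrative placeholder entries
  results = [
      ("instance01", 0.42, 21, 1450, 20, 1400),
      ("instance02", 0.77, 18,  980, 18,  950),
  ]

  with open("mylastname-my-characteristic.csv", "w", newline="") as f:
      writer = csv.writer(f)
      writer.writerow(["algorithm", "instance", "my-characteristic",
                       "percent-from-optimal-slots", "percent-from-optimal-costs"])
      for inst, char_value, slots, cost, best_slots, best_cost in results:
          writer.writerow(["mylastname-assign3", inst, char_value,
                           percent_from_best(slots, best_slots),
                           percent_from_best(cost, best_cost)])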

You will be following a set experimental procedure, but you do have three degrees of freedom. First, you can do multiple runs of your algorithm and, if you wish, report the average in the file. If you choose to do so, the data you analyze can also include the multiple runs.

Second, you are only required to explore a single characteristic. If you wish to explore more than one, that is acceptable as long as you add columns to the result file, with the last columns still being the required performance metrics.

Third, you can include both algorithms if you like.

Your report for this experiment must describe your characteristic, explaining why you thought it would be informative. The most important part of this report will be analyzing and discussing the data to show how your characteristic predicts or explains the performance. You can try to model the percentage from optimal based on the characteristic value or you can bin the results into sets describing gradations from good to poor performance and show the relationship between the characteristic and the bins.
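
As one illustration of the binning approach, the sketch below splits instances into low and high characteristic bins at the median and compares typical performance in each bin; the data points are invented for illustration. A simple linear fit (e.g. numpy.polyfit of metric versus characteristic) would be one way to do the modeling alternative instead.

  # Sketch: bin percent-from-best by characteristic value (illustrative numbers only).
  characteristic = [0.10, 0.25, 0.40, 0.55, 0.70, 0.85]
  pct_from_best  = [ 1.0,  2.5,  4.0,  9.0, 15.0, 22.0]

  # Split at the median characteristic value into "low" and "high" bins.
  cut = sorted(characteristic)[len(characteristic) // 2]
  low  = [p for c, p in zip(characteristic, pct_from_best) if c <  cut]
  high = [p for c, p in zip(characteristic, pct_from_best) if c >= cut]

  print("low-characteristic bin, mean % from best: ", sum(low) / len(low))
  print("high-characteristic bin, mean % from best:", sum(high) / len(high))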

Rubric for Part B:

Part C: Short Characteristic/Feature Descriptions [5 points for just submitting]

In assignments 2 and 3, the last 10 points were awarded based on relative performance. This time no ranking or relative points will be awarded. Instead, you have a chance for up to 5 points of extra credit.

You need to submit a short description of the characteristic you developed for Experiment 2 (you can simply submit what you wrote up for Experiment 2). You should indicate why you thought it would be valuable and summarize how well it characterized performance based on your data.

Part D: Voting for the Characteristics [5 points for voting, up to 5 points extra credit to be awarded based on votes]

Each person will be asked to read the other students' descriptions and nominate 1st, 2nd and 3rd places. You can judge by how compelling or well justified their argument is, or by how creative it is, i.e., which one appears to be the best feature. You will not be allowed to vote for yourself. All of the statements will be placed on the web site, anonymized as colors, so your votes will be cast by the color assigned to each.

You will need to submit three votes. So you will turn in something that looks like:

  1. yellow
  2. red
  3. green
Extra credit points will be assigned in rank order based on sums of votes received, weighted by category (1st, 2nd, 3rd). The top vote-getter gets 5 points, the next gets 4, then 3, then 2, and then 1, so 5 students will earn extra credit.

What to hand in

You should submit four parts via Canvas by the due date/time for the assignment. In the parts below, the filenames should have your last name (as in "mylastname") as part of them because we will be downloading them to a central repository.
  1. For Part A, you should submit a single mylastname-best.csv file that includes, for each of your assigned problems: the problem instance name, the number of slots, the total student cost, and the source. It should be a comma-separated value file (meaning each field has a comma between it and the next field).
  2. For Part B, you should submit two files. Your first file should be your experiment report (both experiments) named mylastname-exp.pdf. For the first experiment, it should have the following sections in the report:
    1. Your experiment design (i.e., the parameter settings of the experiment) with justifications for why you chose the values you did to test (e.g., what problems did you choose, what metrics...)
    2. Data table(s): what values of performance metrics did you observe under what conditions
    3. A statistical analysis of the data: what were you comparing and how did you determine if there was a statistically significant difference (state the hypothesis, the test and the results)
    For the second experiment, the report should have the following sections:
    1. Feature: A description of your characteristic(s)
    2. Results: summary of the data
    3. Analysis: what relationship did you find between the characteristic and the observed performance? How did you substantiate the relationship (e.g., statistics, visualization)? Provide evidence in support of your conclusions.
    The second file should be the output from the second experiment named mylastname-featurename.csv which follows the requirements described in the "Experiment 2" section.
  3. For Part C, your submitted file should contain the plain text description in a .text file named mylastname-desc.text.
  4. Part D should be your votes, formatted as described in the Part D section.