Empirical Evaluation of Search and Evolutionary Algorithms on Timetabling

Part B and C: Experiment Report and Short Feature Description Due Thursday

Part D: Votes Due Tuesday

For this assignment, you are required to step back and analyze what you have done so far for this course. Computer scientists are superb at informal experimentation; it really is at the root of debugging. Hypothesize what is wrong, fix it and see what happens. Now you are going to practice formal experimentation. This should help prepare you for your project as well.

Not surprisingly, you will be evaluating the timetabling solvers that have been developed. The intent is to learn more about their performance relative to each other.

To support your second experiment, each student will be asked to determine the best known objective function values for 2 problem instances. The problem instance set will comprise the 24 test cases posted for assignment 3 plus enough additional test cases so that each student gets 2. We will post the additional test cases HERE. You will be sent email on 3/13 indicating which 2 instances from the instance set you need to determine.

It would be tempting to post poor best known solutions and compare against them. Certainly, such values could make a solver look good! To motivate everyone to find really good values, simply submitting plausible values for the objective functions earns 4 points per instance (2 points per objective function). To earn 6 points, you need to submit really good values that cannot be bested.

We do not know the optimal values for the problems in the instance set. So I am allowing other students to submit new best known values for extra credit. If you submit a better objective function value (with a verifiable solution!) than what is posted, you earn 1 point per problem instance/objective function up to 5 points total, with no more than one try per problem. (So you can't submit slots=200, then 199, then...) An assignment will be set up on Canvas for this extra credit; it will be due March 28 at noon with no late period. Note: only one attempt per student will be allowed, so you will need to include all of the instances and objective functions (up to 5) in a single submission.

When you submit a baseline, you also need to say how you determined it (cite a website or paper or provide a solution in verifiable form.)

- For each algorithm A being compared,
  - For each parameter setting P of A,
    - For each testing data set D, run A on D, collecting observations O
    - Compute performance metrics M from observations O
  - Compare performance across settings P for algorithm A
- Compare performance across algorithms on the best P for each A, using statistical tests for significance
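The procedure above can be sketched as a small driver loop. This is only an illustration: `run_solver`, the algorithm names, and the settings below are placeholders for your own solver and its actual parameters, and the "metric" here is just the mean observation.

```python
import statistics

def run_solver(algorithm, setting, instance):
    # Placeholder: invoke your timetabling solver here and return the
    # observed objective value for this run. The dummy value below only
    # makes the sketch executable.
    return len(instance) * setting

algorithms = {"hill-climb": [1, 2], "genetic": [3, 4]}  # settings P per algorithm A
instances = ["inst-01", "inst-02"]                      # testing data sets D

metrics = {}  # (algorithm, setting) -> performance metric M
for algo, settings in algorithms.items():
    for setting in settings:
        # Collect observations O: one run of A with setting P on each D.
        observations = [run_solver(algo, setting, d) for d in instances]
        # Compute a performance metric M from O (here: the mean).
        metrics[(algo, setting)] = statistics.mean(observations)

# Within each algorithm, keep the best-performing setting (lower is
# better for minimization); the cross-algorithm comparison on these
# best settings is where the statistical test comes in.
best = {a: min(m for (x, s), m in metrics.items() if x == a) for a in algorithms}
print(best)
```

In a real experiment you would do multiple runs per (A, P, D) cell rather than one, so the statistical test has variance to work with.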

You need to decide on the settings of everything else (P, D, O, and
M) and describe and justify them in your report. From assignment 2 and
Part A,
you now have access to a large number of
problem instances, although you are welcome to go outside that set
to others on the web or that you have generated.
You then need to run the experiment as you have designed it,
collecting the data (O). Your report **must** include table(s)
presenting either raw or summarized data.

The last part is to compare the performance as measured by M (which may be the same as O or computed from O) using some statistical tests. This has two parts: First, decide on a hypothesis that compares the two. Second, run a statistical analysis to test the hypothesis. Your report should present summaries of the results/data, should state the hypothesis, and should report the results of the statistical test (including why you picked the test you did). I'm not expecting statistical sophistication; for simple statistics, you can run them in Microsoft Excel or use basic R functionality (tutorial available here).
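As one illustrative sketch (the solver values below are invented, not real results), a paired t-test on per-instance objective values can be computed with nothing beyond the standard library. A paired test is appropriate here because the two solvers run on the same instances:

```python
import math
import statistics

# Hypothetical per-instance slot counts for two solvers on the same
# ten instances -- replace with your own measurements.
solver_a = [12, 15, 9, 20, 14, 11, 18, 16, 13, 10]
solver_b = [14, 16, 11, 22, 15, 12, 19, 18, 14, 12]

# Paired t-test. H0: the mean per-instance difference is zero.
diffs = [a - b for a, b in zip(solver_a, solver_b)]
n = len(diffs)
mean_d = statistics.mean(diffs)
sd_d = statistics.stdev(diffs)       # sample standard deviation of the differences
t = mean_d / (sd_d / math.sqrt(n))   # t statistic with n - 1 = 9 degrees of freedom
print(f"mean difference = {mean_d:.2f}, t = {t:.2f}")
# Critical value for 9 df at alpha = 0.05 (two-sided) is about 2.262.
print("significant" if abs(t) > 2.262 else "not significant")
```

Whichever test you choose, report the hypothesis, the test, and the result, and say why the test fits your data (paired vs. unpaired, normality assumptions, and so on).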

The characteristic must be computable from the problem input. Additionally, its value should vary across the test cases in the instance test set (see Part A).

Given the characteristic, you need to run an experiment to determine
how your algorithm's performance varies as the characteristic
value varies. For this experiment, the independent variable is
the problem set. The dependent variable will be
the objective functions (as a standin for solution quality). You will run your algorithm for five minutes
(on the capital machines)
on each of the test cases and record the objective function values
of the solution found. **Note:** You should run your algorithm
with one of the settings for objective function but report both
objective function values from that run.

The objective functions need to be converted into a performance metric. In this case, the metric will be percentage difference from the baseline value for each problem in the instance set from Part A. The best known solution values for each of the problems will be posted in the same directory as the instance set in a file to be made available after Part A is turned in. We will update the best known values as soon as we get them. In your results, you should list the baseline values that you used with the date that you gathered them.
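The percentage-difference metric described above is a one-line calculation; the sketch below uses made-up baseline and observed values purely to show the arithmetic:

```python
# Hypothetical values -- substitute the posted best known values and
# your own observed objective values.
best_known = {"inst-01": {"slots": 10, "cost": 120},
              "inst-02": {"slots": 14, "cost": 200}}
observed   = {"inst-01": {"slots": 11, "cost": 132},
              "inst-02": {"slots": 14, "cost": 210}}

def pct_from_baseline(observed_value, baseline):
    """Percentage difference of an observed value from the best known value."""
    return 100.0 * (observed_value - baseline) / baseline

for inst, base in best_known.items():
    for objective in ("slots", "cost"):
        pct = pct_from_baseline(observed[inst][objective], base[objective])
        print(f"{inst} {objective}: {pct:.1f}% from best known")
```

A value of 0.0 means you matched the best known solution; since baselines may be updated, record the date on which you gathered them.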

**CHANGE AS OF MARCH 16**
You need to produce a csv (comma separated value) file which has five
columns: algorithm, problem name, characteristic value, percentage from
optimal for number of slots, and percentage from optimal for student
costs. The algorithm name will be the same in every line. The file
should have 53 lines: a header line and one for each problem. The
header line should be

algorithm, instance, characteristic-name, percent-from-optimal-slots, percent-from-optimal-costs
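A minimal sketch of producing this file with Python's `csv` module follows; the two data rows are hypothetical, and your real file needs one row per problem (53 lines including the header):

```python
import csv

# Hypothetical rows: (algorithm, instance, characteristic value,
# percent from optimal slots, percent from optimal costs).
rows = [
    ("hill-climb", "inst-01", 0.35, 10.0, 10.0),
    ("hill-climb", "inst-02", 0.72, 0.0, 5.0),
]

# newline="" keeps the csv module from inserting extra blank lines on Windows.
with open("mylastname-featurename.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["algorithm", "instance", "characteristic-name",
                     "percent-from-optimal-slots", "percent-from-optimal-costs"])
    writer.writerows(rows)
```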

You will be following a set experimental procedure, but you do have three degrees of freedom. First, you can do multiple runs of your algorithm and report the average, if you wish, in the file. If you choose to do so, the data you analyze can also include the multiple runs.

Second, you are required to explore a single characteristic. If you wish to explore more than one, that is acceptable as long as you add columns to the result file with the last column still being the required performance metric.

Third, you can include both algorithms if you like.

Your report for this experiment must describe your characteristic, explaining why you thought it would be informative. The most important part of this report will be analyzing and discussing the data to show how your characteristic predicts or explains the performance. You can try to model the percentage from optimal based on the characteristic value or you can bin the results into sets describing gradations from good to poor performance and show the relationship between the characteristic and the bins.
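The binning approach mentioned above can be sketched briefly. All of the (characteristic value, percent from optimal) pairs and the bin boundaries below are invented for illustration; choose boundaries that make sense for your characteristic:

```python
import statistics

# Hypothetical (characteristic value, percent from optimal) pairs,
# one per instance -- replace with the contents of your csv file.
data = [(0.1, 2.0), (0.2, 3.5), (0.4, 8.0), (0.5, 9.5),
        (0.7, 15.0), (0.8, 18.0), (0.9, 22.0)]

# Bin instances by characteristic value, then summarize performance per bin.
bins = {"low (<0.33)": [], "mid (0.33-0.66)": [], "high (>0.66)": []}
for value, pct in data:
    if value < 0.33:
        bins["low (<0.33)"].append(pct)
    elif value <= 0.66:
        bins["mid (0.33-0.66)"].append(pct)
    else:
        bins["high (>0.66)"].append(pct)

for label, pcts in bins.items():
    mean_pct = statistics.mean(pcts)
    print(f"{label}: mean {mean_pct:.1f}% from optimal over {len(pcts)} instances")
```

If the per-bin means trend consistently (as in this toy data), that is the kind of relationship between characteristic and performance your report should demonstrate and discuss.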

- Experiment 1: [34 pts]
  - Experiment Design [9 pts]
  - Data table(s) [5 pts]
  - Hypothesis [5 pts]
  - Statistical analysis [5 pts]
  - Explanation/conclusions [5 pts]
  - Quality of writing [5 pts]

- Experiment 2: [34 pts]
  - Description of problem characteristic [4 pts]
  - Experiment Design [5 pts]
  - Data table(s) [5 pts]
  - Hypothesis [5 pts]
  - Statistical analysis [5 pts]
  - Explanation/conclusions [5 pts]
  - Quality of writing [5 pts]

- Output file: [10 pts]

You need to submit a short description of the characteristic you developed for experiment 2 (you can simply submit what you wrote up for Experiment 2). You should indicate why you thought it would be valuable and summarize how well it characterized performance based on your data.

You will need to submit three votes. So you will turn in something that looks like:

- yellow
- red
- green

- For *Part A*, you should submit a single `mylastname-best.csv` file that includes, for each of your assigned problems: the problem instance name, the number of slots, the total student cost, and the source. It should be a comma separated value file (meaning each field has a comma between it and the next field).
- For *Part B*, you should submit two files. Your first file should be your experiment report (both experiments) named `mylastname-exp.pdf`. For the first experiment, it should have the following sections in the report:
  - Your experiment design (i.e., the parameter settings of the experiment) with justifications for why you chose the values you did to test (e.g., what problems did you choose, what metrics...)
  - Data table(s): what values of performance metrics did you observe under what conditions
  - A statistical analysis of the data: what were you comparing and how did you determine if there was a statistically significant difference (state the hypothesis, the test and the results)

  For the second experiment, the report should have:
  - Feature: A description of your characteristic(s)
  - Results: summary of the data
  - Analysis: what relationship did you find between the characteristic and the observed performance? How did you substantiate the relationship (e.g., statistics, visualization)? Provide evidence in support of your conclusions.

  Your second file should be `mylastname-featurename.csv`, which follows the requirements described in the "Experiment 2" section.
- For *Part C*, your submitted file should contain the plain text description in a .text file named `mylastname-desc.text`. *Part D* should be your votes as described in the Part C section.