CS540 Feature Descriptions

bottom

Feature Description
almond My basic hypothesis is that the size of the search space is the prime factor in a problem's difficulty. The characteristic chosen to represent the problem difficulty can be calculated through the following steps. First, take the "Room Capacity"(R) and "Number of Timeslots"(T) and multiply them together. This is the best indication of how large the search space is. A large T indicates that there are many possible timeslots for each course. A large R means that for each timeslot you could fit more courses in each timeslot, again resulting in a larger search space. There are several things that can restrict the search space. To account for these, R*T will be divided by restrictions to the search space.
The "Average Course Enrollment" (E) restricts the search space. A large E means that fewer courses can fit in each timeslot on average. E is calculated by summing the number of total students in every course and dividing by the number of courses. This also takes into account several other factors. First, if the number of students is high and number of courses is low then the search space is reduced since the likelihood of course conflicts is higher. Alternatively, if the enrollment number is low relative to the number of courses then there will likely be fewer course conflicts. Finally, if the number of courses is proportional to the number of enrollments then the denominator of the overall characteristic will approach one, indicating that it does not affect the characteristic when they are proportional. So the characteristic can be calculated as ((R*T) / E) and let's call it the search-space characteristic.
When comparing the difference to the best known value with the characteristic value it seems that the characteristic serves as a good predictor of the difficulty of the problem. An exponential curve matches well with the exponential curve seen in the "difference to the best known" data. The characteristic matches quite well on the high and the low side but has some significant spikes in the middle. There are two main problems that are not well represented by the search-space characteristic. I believe this misrepresentation is due to these problems having large search spaces but many of the schedules in the search space have values relatively close to the best known. This factor is very difficult to account for without doing a random sampling of the search space. Based on this, I think it is fair to say that my original hypothesis that the size of the search space is the prime factor in a problem's difficulty is correct.
aqua

The conflict density represent the percentage of examinations with common students among the various examinations. A Conflict Matrix C was defined where each element Cij = 1 if exam i conflict with exam j (have common students), or Cij = 0 otherwise. The conflict density represents the ratio between the number of elements of value "1" to the total number of elements in the conflict matrix. The goal is to predict, if the value of conflict density affect the performance of algorithm.

aquamarine The characteristic, slots_needed is an estimation of how many time slots are needed to minimize the student costs. The rationale is based on the formula for calculating total student costs, wehre Overnight Assignment Penalty is preferred if student costs are to be minimized. Therefore, the assumption is that, the greater slots_needed is, the better student costs are.

Given a course list, crs_list, I first calculate the average enrollment per course:

avg_c = mean(crs_list),

Based on avg_c, I then calculate how many courses can fit into a time slot, max_c:

crs_per_slot = max_c / avg_c

Assuming 50% of the courses assigned to the same slot are conflicted, thus crs_per_slot needs to be revised in order to take into account of the conflict:
crs_per_slot = 0.5 * max_c/avg_c

Finally, the characteristic, or slots_needed is obtained:

slots_needed = ceil(max_t/crs_per_slot)

characteristic = slots_needed

beige The Characteristic that I chose to investigate, was the number of courses that conflicted with one other. I defined a conflict as two classes that have the same student in it. This is a conflict, as these two classes can not be scheduled to take place during the same time period, as this would violate one of the hard constraints of the scheduling problem. I made sure in computing the number of conflicts that there were no duplicates, meaning that any two given courses can only conflict with each other once. The number of conflicts is a good indicator of the difficulty of the instance to solve, but it doesn't strictly correlate to the difficulty. This is because if a istance has many conflicts, but also many timeslots, the conflicts don't effect the problem as much. The reverse is true also.
black In A2 and A3, our objective is to optimize either the number of slots, or the cost. In both cases, the maximum number of exams that can be put into a slot will be dictated by two factors, one exam in that slot, and the amount of conflict that exam has with other unassigned exams. So characteristics describing a measurement of conflicting exams can be informative about a given instance. As a way of quantifying this measurement, I selected a characteristics called Exam Conflict Density.

Exam Conflict density is defined as the following:

Exam Conflict density = sum from exam=1 to n(total number of conflicting exams for exam i)/ (total number of given exams)^2

This was selected as a characteristics to compare, because when we want to create the most packed slot schedule, the slots with the exams which have high amount of conflict with other exams tend to have low number of exams possible. How the algorithm handles the conflicting exams can effect the overall structure of the packed structure. So intuitively, there could be a relationship between a measurement of conflicting exams in that schedule, and the algorithm performance.

We had a hypothesis that there is a relationship between our chosen characteristics (Exam Conflict density) and performance of the algorithm.
We had two performance metrics, percent from best known slots and percent of best-known costs. Our Statistical analysis revealed that exam conflict density has a
low linear correlation with percent from best-known slots, which means that with the increase in performance degradation has a very loose linear relationship with Exam conflict density.

The t-test results showed that when performance metric is assumed to be percent of best known slots, we can not reject the null hypothesis that there is no relationship between exam conflict density and performance of the algorithm with a 95% confidence level.

Performance variation against percent of best-known costs were also statistically analyzed. The Scatter plot showed that the relationship between percent of best-known costs and exam conflict density has less of a linear characteristics and more of a inverted exponential relationship.
The statistical analysis showed that with the increase in exam conflict density, percent of best-known costs decreases, and the relationship is very loosely linear (correlation coefficient is near 0 ).

The t-test results also showed that when performane is percent of best known costs, we can not reject the null hypothesis that there is no relationship between exam conflict density and performance of the algorithm with a 95% confidence level.
blue Description:
While gathering the necessary input files for analysis for this experiment I noticed that the test cases with higher number of students registered imply higher density schedules which are harder to schedule.So, I thought that there is a relationship between the maximum number of students registered,room capacity and the percentage deviation from the baseline.In order to prove or reject my hunch I set up my statistical analysis to find out if I was right or wrong.
When I was running my experiments both for assignment parts A and B, I noticed that there might have been an inverse relationship between the number of time-slots and the student cost when optimizing.It almost seemed like they were in some cases proportionate to one another.So, I looked at the problem files that we got such as .crs and .stu and thought that maybe there is a some sort of a ratio that can define how good my best time-slot and student cost solver with the best parameter setting will perform based on the room capacity and the maximum students registered.To explore this idea further I looked at some of the defining aspects of the problem such as maximum number of students registered and the room capacity as both of these variables have an impact on how sparse or how tight the resulting schedule might be.
Results:

To my amusement the results obtained definitely supported the observations that I made whiling running my test cases.I did not expect to get such as high positive correlation.In order for my algorithm to do good(same as the baseline) the maximum number of students to the room capacity ratio should not exceed 0.47 for the time-slot optimization and 0.13 for the student cost optimization.However,as it can be seen from the table ~\ref{table:table_3} the findings do make sense.As the ratio between the max number of students registered and the room capacity approaches 1 meaning that the room is filled it gets harder and harder to squeeze a new class schedule in requiring more time for the algorithm to rearrange, minimizing the time-slot or student costs.Since, I ran each test case for only 5 minutes my best solver($SLS$) was requiring more and more time minimize the time-slots and the student cost functions as the problems were getting more and more compact. As the student-per-course ratio increases like in the case of the denseRoster.crs where the student-per-course ratio is 25 my calculated ratio gets really close to 1,to the value of 0.921739 being more specific.Also, a good example is rand-gen6.crs for which my ration was 1, meaning that the number of students registered and the room capacity are equal.However, the trick with the rand-gen6.crs case is that each student is enrolled in at least 2 to 10 courses which makes it harder to schedule.What was really interesting was that fact that after plotting the ratio values and percentage deviation for time-slots and the student cost function I was able to get almost identical plots.
brown Description of the Characteristics : Each problem instance consists of * .crs and *.stu file. The value of the characteristic should be computable from these .crs and .stu files. For this experiment Ratio of Maximum number of time-slots and Total number of courses is considered as the characteristic of the test cases.

Reasons behind choosing this characteristics : There are certain reasons behind choosing this characteristic for the test cases (which are provided during this assignment). The SLS and GA solvers used for Experiment 1 are developed by me during assignment 2 and 3 respectively. There are certain heuristics used for designing the solvers. Rather than constructing random initial solutions for both SLS and GA, a graph coloring or adaptive ordering heuristics is considered for construction of the initial solutions which is based on a Conflicting Matrix. The conflicting matrix is based on number of common students (i.e. conflicts) for each pair of courses of a problem instance. The heuristic orders courses in descending order of number of conflicts and constructs initial solutions consist of minimum number of slots and maximum number of slots for case 0 (i.e. optimizing number of time-slots) and case 1 (i.e. optimizing student cost) respectively. The schedule which provides minimum student cost function among the neighborhood of the initial solution for both case 0 and 1 is considered as an optimal solution. So for case 0, the solver is biased to devise a schedule of minimum number of time-slots which may vary from the best found solutions but for case 1, it is intended that my solver for both SLS and GA constructs a solution consists of maximum number of time-slots which is provided as an input (i.e. from *.crs file). The test cases among 52 test cases having ratio of maximum number of time-slots and total number of courses equals to 1 are supposed to have a schedule with all courses in different time-slots for case 1 (i.e. optimizing student costs) which in turn is intended to reduce student cost value as there is an inverse-proportional relation between student cost value and number of time-slots for a certain schedule.

So it is expected to obtain less % difference from best solutions for the group of test cases having characteristic ratio equals to 1 than for the group having ratio less than 1. Though the effect of duration of the experiment can't be denied as the solvers (both GA and SLS) are executed for only 5 minutes. It may not be possible to detect a global optimum solution within 5 minutes.


Experiments and Summarization of Results: For the experiment purpose, the test cases are divided into two groups. One group corresponding to the test cases satisfying the characteristic ratio equals to 1 and other group corresponding to characteristic ratio 0. The experiment result with supporting plots are provided in the .pdf file of Part B. Here I wish to depict only the tabular representation of some statistical results obtained by incorporating few trivial statistical measurements on each of the two groups of the test cases and for each of the objective functions.

For Genetic Algorithm (GA) :

Pearson’s Correlation Coefficient for the two groups on best parameter settings for GA :

Case 0 or 1 For group having ratio < 1 For group having ratio = 1
Case 0 : optimizing 0.9835467519060.999952990241
# time-slots

Case 1 : optimizing 0.8184220444290.792700533368
student-cost







Mean of % difference from optimal solutions for two groups on best parameter settings of GA :

Case 0 or 1 For group having ratio < 1 For group having ratio < 1
Case 0 : optimizing 13.58823529417.22857142857
# time-slots

Case 1 : optimizing 215.01703629916908.0690426
student-cost


Pearson’s correlation coefficient is obtained by incorporating Pearson’s correlation method on the best objective function values (for both case 0 and 1) received from the best solutions and found by my GA solver for each of the two groups of the test cases. For each of the two groups and for each of the objective function, % difference from best solutions are calculated and means of these series are listed before.

From the aforementioned two tables it can be stated that for the group of test cases having characteristic ratio 1 , there is higher correlation between objective function values received from best solutions and found by my GA solver for case 0 and just opposite for case 1. The very same argument can be derived from observing table of mean of % difference from optimal.

From the aforementioned two statistical measurements, it can be concluded that the test-cases having characteristic ratio 1 are near optimal for case 0 (i.e. optimizing number of time-slots) and the test-cases having characteristic ratio less than 1 are near optimal for case 1 (i.e. optimizing student costs).

The same comparison also performed for Stochastic Local Search (SLS). Detailed description of experiments with supporting evidences are illustrated in report of Part B.
copper-->conflicts
navy-->conflict index
Description of problem characteristic:-

Examination timetabling is a complex problem which has a set of exams to be taken by a given number of students. If every student is taking only one exam each then it becomes a straightforward case but this will make student cost meaningless. I decided to analyze the data-sets for conflicts for all courses as a distinct measure to differentiate between different data-sets and observe the performance of my algorithm.

Conflicts = Cumulative number of conflicts for all students

It can be calculated by taking the sum of courses taken by each student individually and subtracting one in each case as the base case is each student having only one course. I want to mitigate this case as this makes student cost objective void If we examine the base case where each student is enrolled in just one course that would lead to all students giving the exams together. Along with this all students will have student cost as zero as they don\92t have multiple exams. As 0-1=-1 which would result in an anomaly here and it is not contributing to the conflict measurement.

Another characteristic I used was conflict_index which is calculated by dividing conflicts by number of slots, this would normalize conflicts for each problem instance and give us a final ratio. The importance of this number can be seen with the increase in density of conflicts results makes the timetabling problem difficult. Also it explains an indirect relationship between the student cost and conflict_index as more free slots give a flexibility in scheduling exams and reduces the student cost.

Conflict_index = Total number of conflicts/ total number of slots available

In our timetabling problem multiple course exams can be scheduled together this makes it interesting for both objective functions. On one hand it gives the opportunity to squeeze the size of the schedule, but on other side conflict resolution should be done for each student taking the exam in that slot. For instance a student is enrolled in course 001, 006 and 010. Now these three courses cannot be scheduled in one slot even if there are no other common students amongst these courses. If only one exam was conducted in one slot then, conflict resolution would have been done only for that course rather than across all the students.
cyan For this experiment, I have considered course conflict density as a characteristics which is derived from number of courses having same students of a problem instance.
Course conflict density is calculated as summation of conflicting courses for each course divided by total number of courses.

Course conflict density gives information that it is possible or not to assign same time slot to the courses. If number of conflicting courses is less then it is possible to assign same slot to courses as long as room capacity is not violated. Such stacking courses on same time slots decreases number of time slots.

It is observed that if course conflict density increases number of time
slots increases as well as student cost increases to some extend. It is observed that student cost does not depend course conflict density solely, it also depends on number on students. As course conflict density increases it is difficult to assign courses with same time slot which increases number of time slots, Where as if we don’t have enough available time slots then it is difficult to scatter the courses along the slots to minimize student cost.
denim Characteristic:
For the ACO algorithm, given an instance D that is comprised of X courses and Y students, the process time Tss of single iteration, single ant scenario is informative because it provides a characteristic that can be used to address the following inquiries:
•Time for 1 ant to generate a local solution = Tss
•Time for an ant to generate a global solution over 1 iteration = Tss
•Time for K ants to generate a local best solution = Tss * K
•Time for K ants to generate a global best solution over N iterations = Tss * K * N
•Number of iterations in 300 seconds = ceil(300 / (Tss * K * N))
The characteristic enabled me to determine the number of iterations the algorithm could execute within a given time limit, which was to be used for validating the assumption that more iterations equate to better solutions (lower costs). See the Hypothesis section for detail.
gold In generating problems for Assignment 3, I found that the degree of difficulty
in solving problems with optimization for student cost could be adjusted in
several ways: decreasing timeslots available, increasing the number students,
increasing the number of courses each student is enrolled in, or decreasing
the limit on classroom size. Since each of these individually have a profound
effect, I decided to create a feature that incorporates all of these elements.
This feature, "enrollment density" can be computed by dividing the total
number of student enrollments by the product of the number of timeslots and
the classroom size where a student enrollment is counted for each course each
student is enrolled in. If classroom size or number of timeslots increases,
potentially making the problem easier to solve, the enrollment density
decreases. If the number of student enrollments increases, potentially making
the problem more difficult to solve, the enrollment density increases.
gray My problem characteristic is called a “trouble count”. It attempts to find the students and courses most likely to give the algorithm trouble. Troublesome students are those who are enrolled in the most courses. The first step here is to find how many courses the most troublesome student is enrolled in. Then, add up all the students taking at least 75% of that student’s course load. For example, if the most troublesome student is taking 10 courses, we want to know how many students are taking at least 8 courses. Similarly, troublesome courses to deal with are the ones with the highest enrollment. A similar method as students is employed – find the largest course, then count all courses with enrollments of at least 75% of that course. The final “trouble count” score is given by simply adding the counts of troublesome students and courses together. This characteristic is aimed primarily at predicting performance for the student cost objective function. By finding the most conflicted students and courses, it should provide an estimate of how difficult the problem is, inferring that more time may be required to obtain optimal values.
green The feature selected is a variation on a Conflict Density[1]. Conflict Density is calculated
by creating a conflict matrix for all classes in the schedule. The conflict density is then the
number of classes pairs that have a conflict divided by the total number of class pairs. The
variation I came up with is to take into account the number of students that are involved in
that conflict. The student conflict density then becomes the number of students involved in
class conflicts divided by the number of classes.
References
[1] R. Qu, E.K. Burke, B. McCollum, L.T.G. Merlot, and S.Y. Lee. A survey of search
methodologies and automated system development for examination timetabling. Journal
of Scheduling, 12(1):55–89, 2009.
lime The problem characteristic, Enrollment Load, is defined as the number of courses taken by the student with the most courses divided by the number of students who are taking more courses than the average. As the number of students of the average increase and the number of courses for the most heavily enrolled student decreases, this characteristic becomes a smaller value and vice-versa. The characteristic is indicative of the number of students who have abnormally high enrollments compared to the average. When the value of the characteristic is smaller, the students are more uniformly enrolled in courses. When the value is higher, a more students are enrolled in an abnormally large number of courses.
magenta The characteristic that was tested is called compatibility search space size. This metric uses a binary operator called compatibility.
This operator works on two courses and produces a boolean value. Two courses are considered to be compatible if the courses have no students in common.
By this definition the following properties are known. The operator is not reflexive. This means that a course is not compatible with itself.
The operator is symmetric. This means that if course A is compatible with course B then course B is compatible with course A.
The operator is not transitive. This means that if course A is compatible with course B and course B is compatible with course C then it does not have to be the case that course A is compatible with course C.
Now with the definition of compatibility, a compatibility set can be defined. Each course in the problem has a set of courses called the compatibility set.
Every member in this set is compatible with the course that owns the set. Because the compatibility operator is not reflexive the maximum size of this set is N \96 1 where N is the number of courses.
The metric compatibility search space size is defined as the sum of the sizes of the compatibility sets for all N courses divided by N * (N \96 1).
This metric will produce a value between 0 and 1. 0 is defined to be the smallest size possible and 1 is defined to represent the largest size possible.
The metric measures size relative to the number of courses.
How does this metric work? Let us examine an example a problem with 5 courses. Now assume the characteristic value is 0.
This means that every course is incompatible with every other course. This means that the only valid solutions exist when all five course happen at different time slots.
This is because incompatible courses on the same time slot violate the hard constraint that two courses cannot have common students on the same exam time.
Therefore, for the number of solutions that have to be explored for student cost is 5!. This is every permutation of the five courses.
Now assume that all courses are incompatible except for course A and course B being compatible with each other. This means the characteristic value is 2/20 which is 0.1.
This means the number of valid solutions is the same as the previous example plus the solutions when grouping course A and B on the same time slot. This produces 4! additional valid solutions.
On the other side of the spectrum, assume that all courses are compatible with each other. This means the characteristic value is 1.
This means the search spaces contains all solutions of size 5,4,3,2, and 1. There are 5! solutions of size 5. There is (5 choose 2) * 4! solutions of size 4.
For solutions of size 3 there exists different combinations to produce this size. A solution can have one time slot that has 3 courses and 2 time slots with one course.
The solution can also have 2 time slots that have 2 course and one time slot with one course. Then for each of these categories there are different permutations that need to be accounted for.
The same logic is needed to produce the number of solutions of size 2. There is only one solution that has all the courses on one time slot.
This search space of this example is significantly larger than the previous two.
maroon There was a lot of information present in the two input files (.crs and .stu) given to the algorithms. So I thought of finding a characteristic which could allow me to predict the performance of my algorithm which gave me the best results. I noticed few features present in the input files like

- maximum number of time slots
- total number of courses
- room capacity
- average density( which can be calculated by diving the sum of number of students present in each class with the total number of courses),
- total number of conflicts (two courses are said to have a conflict if there is a student who has taken both the courses)

I propose my characteristic of the problem as the ratio of product of total number of conflicts and average density to the product of total number of courses and room capacity. I thought this would help me in predicting the performance of my algorithm with the best parameter settings.

Characteristic = (total number of conflicts*average density)/(total number of courses*room capacity)

I call this characteristic the complex ratio.

I thought the above characteristic would be informative because I think if the total number of conflicts and average density are both relatively high for a particular problem then the ratio with room capacity and total number of courses might give us information about the difficulty of the test data set or problem. If the ratio is higher it might imply that the problem is tough and vice versa which could be due to multiple reasons like
- If average density of classes is relatively high then not many exams(students) could fit in a room
- If number of conflicts are relatively high then again it would difficult to conduct multiple exams at the same time

The above characteristic was able to predict the performance of the algorithm accurately for most of the test cases. However there were few cases where this charactersitic failed to predict the performance of the algorithm accurately.
melon Potential Conflict Density - Consider the number of potential conflicst to be total number of courses that each course shares with at least one student. Example: if 3 students enrolled in a course c1 are each taking one other course disjoint with each other and c1, then c1 has 3 potential conflicts. Potential Conflict Density is the total number of potential conlicts divided by a measure of the number possible student placements in the available timeslots: #Potential Conflicts/(Room Capacity x Max #Timeslots)
orange My characteristic is called "percent-possible-conflict." It is a measurement of how many students are enrolled in two classes that have the possibility of being assigned to the same timeslot. Please note the subtle language - my characteristic measures POSSIBILITY of conflict, not PROBABILITY of conflict.

This is calculated by finding the number of students whose course enrollment for two of their classes could fit in a room of the given capacity specified in the problem. So we iterate over all the students and sum every 2-combination of classes that they are enrolled in. If we find that the sum is less than the room capacity, this means that during state perturbation, it is possible that a perturbation could occur such that there is a student conflict. After iterating over all of the students, we can calculate the percentage of students who are enrolled in 2 or more classes that could possibly be assigned to a timeslot such that a conflict occurs.
purple The characteristic I created for the exam timetabling problem is called Concurrent Enrollment Chance or CEC. CEC is a numerical value greater than or equal to 0 that represents the chance that in any random solution a student is concurrently enrolled during a timeslot and therefore the solution is invalid. At the root this value aims to approximate the difficulty of timeslot assignment for a problem. It is calculated with the formula below.

CEC(n) = StudentCount(n) * ((AvgStudentEnroll(n) - 1) * (1 / MaxTimeslots(n)))

In this formula StudentCount is the total number of students, AvgStudentEnroll is the the average number of courses each student is enrolled in, and MaxTimeslots is the maximum number of timeslots allowed by the problem definition. The premise of the formula is to pick a timeslot for a student with that students first enrollment. Then for each additional enrollment accumulate the probability that a timeslot is assigned that's identical to the first. The resulting value is a rough probability that one student has a concurrent enrollment in a random assignment. Therefore we must multiply the probability by the number of students giving us what I'm referring to as the CEC value.

I think this is valuable because the ease of finding a solution is paramount to finding a good solution. My genetic algorithm struggled initializing the population and therefore didn't complete as many iterations of evolution as it could have otherwise. It occured to me that solution difficulty was the most important characteristic of a problem. While there are many attributes contributing to solution difficulty I think that disallowing concurrent student enrollment is the most difficult and modeled my characteristic after it.

In my testing a found a distinct corrolation between strong solutions and low CEC values. Unfortunately the adverse corrolation isn't always perfect. As the solutions got weaker CEC values became less predictable instead of stricly growing. The analogy I've found for this is like pouring water down a funnel. The weak solutions can be anywhere but as the water goes down the funnel all strong solutions have low CEC values.
red Charatersitic: Average number of students per course / capacity

Calculated by: (number of students / number of courses) / room capacity

Average number of students per course / capacity tells us how full the classrooms are compared to the rooms were they will be taking the exam.
The closer the ratio is to 1 the least we can pack different courses into one classroom at exam time, because the closer we are to one the closer
the average number of students per classroom is to the size of exam rooms.
silver-->conflict density
coral-->maximal degree of conflict
In context of the work I decided to investigate the two following characteristics:
(a) conflict density (also called average density) of the courses and (b) maximal degree of the conflict (in respect to courses).

The idea behind the characteristics based on the fact the courses and their dependencies can be represented as a graph.
The courses are represented as vertexes with edges between every pair of conflicting courses;
the conflicting courses are courses that have common students.
Based on the graph representation of the courses, we can calculate the characteristics:

* The conflict density.

We represent the graph as a matrix of courses (NxN). If there is a conflict between
to courses, the value of the corresponding element of the matrix set to one; otherwise the
element set to zero. The conflict density is calculated as: (number of 1)/(total number of elements in the matrix)

* Maximal degree of the conflict.

This characteristic is based on the same matrix (graph) representation of the courses. We
pick the column (representing the course) with maximal number of ones in it and calculate:
(number of 1 in the column)/(total number of elements in the matrix)

The choice of the characteristics is not random and based on the literature research.
Burke et al.[1] and Smith-Miles et al.[2] discussed different problem characteristics
and identified the above two characteristics as the primary candidates for problem difficulty characterization.
The characteristics based on the fact the course conflict is one of the primary constrains
that limits the course “mobility”. In other words, if a course has very large number of
conflicts it is very likely that it will be hard to place or move to an
alternative time slot. Particularly this is true for the case when we try to
reduce the number of allocated timeslots; such course conflict can lead to an
additional allocation of time slot. As a result the authors suggest to look at
these two characteristics. The conflict density represents the average of the
conflicts in the problem and the maximal degree represents the maximal rate of
the conflicts.

I calculated regression line and correlation
coefficient r-value. The r-value that is close to one indicates a high level
of the correlation, while zero indicated a low level of correlation. The
r-value for the conflict density is 0.112 and the r-value for the maximal
degree of density is 0.24. As a result, based on the statistical analysis we
may conclude that the maximal degree of density characteristic predicts (and
explains) better the algorithm performance for this particular dataset.

References:
[1] Edmund K. Burke, A. J. Eckersley, B. McCollum, S. Petrovic, and R. Qu.
Analysing similarity in examination timetabling. In Proceedings of the 5th
international conference on the Practice and Theory of Automated Timetabling
(PATAT), Pittsburgh, USA, August 18th-20th 2004, ISBN 0-88748-413-1, pages
89–106, 2004.
[2] Kate Smith-Miles and Leo Lopes. Measuring instance difficulty for
combinatorial optimization problems. Computers & Operations Research, July
2011.
violet The problem characteristic computed from the problem input for this experiment is a ratio of the number of course conflicts to the total number of time slots available for scheduling the courses. As each student is processed, each course the student is taking is added to their other courses as a conflict course (removing duplicates). At the end each course maintains a set of courses which, if scheduled together, would create a scheduling conflict. The sum of these sets across all courses gives the total number of course conflicts for the problem input. While this characteristic alone is appealing, it does not tell us enough about the problem since we could have a large number of time slots and do quite well in terms of student cost. To account for this, I divide the number of conflicts by the number of available slots to obtain a characteristic which attempts to describe the overall difficulty of getting a certain number of conflicts into the given time slots. This characteristic did poorly in characterizing performance based on the data with a correlation value of -0.141.
yellow The problem characteristic that allows me to predict the performance of my genetic algorithm is the average number of exams per time slot. This value is calculated by adding up the total number of exams that will be given and dividing by the number of time slots. When the average number of exams per time slots is higher my genetic algorithm performs better than average.
top