
CS 540, Spring 2013: Assignment 2
Evolutionary Algorithms and ATSP

Part 1: Test case due Monday 2/25/2013 at noon
Questions (Part 2) and Code (Part 3) due Tuesday 3/5/2013 at noon
For this assignment, you are required to implement an evolutionary
algorithm to solve the same problem as in assignment 1: asymmetric TSPs.
As before, you have some latitude in your
selection of language and algorithm. For language, if you wish to
choose a language other than C, C++, Java, Python, Lisp or whatever
you were allowed to use in assignment 1, you must obtain the
instructor's permission.
For algorithm, the primary restriction is that you pick an algorithm
covered in class or described in the readings in the evolutionary
computation section of class (2/14-2/28). Otherwise,
you have a fair amount of latitude in the design of your algorithm, e.g.,
initialization, recombination operators, selection... You must write
your own code for this; you are not allowed to download a GA
implementation (e.g., Genitor). If you are in doubt about whether you
are straying too far, just ask. You can't hybridize this solution
with the search solution from assignment 1.
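To make the design choices named above (initialization, recombination, selection) concrete, here is a minimal sketch of one possible permutation-based evolutionary algorithm for ATSP. It assumes random-permutation initialization, binary tournament selection, a simplified order-based crossover (a variant of OX that fills the open positions left to right), swap mutation, and steady-state replacement of the worst individual. All function names and parameter values are illustrative assumptions; this is not a reference design, and your own submission must be your own code.

```python
import random

def tour_cost(tour, dist):
    """Cost of the closed tour; dist[i][j] may differ from dist[j][i] (ATSP)."""
    n = len(tour)
    return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))

def order_crossover(p1, p2, rng):
    """Copy a random slice of p1, then fill the gaps in the order cities appear in p2."""
    n = len(p1)
    a, b = sorted(rng.sample(range(n), 2))
    child = [None] * n
    child[a:b + 1] = p1[a:b + 1]
    kept = set(child[a:b + 1])
    fill = iter(c for c in p2 if c not in kept)
    return [c if c is not None else next(fill) for c in child]

def swap_mutation(tour, rng, rate):
    """With probability `rate`, swap two randomly chosen positions in place."""
    if rng.random() < rate:
        i, j = rng.sample(range(len(tour)), 2)
        tour[i], tour[j] = tour[j], tour[i]

def tournament(pop, costs, rng, k=2):
    """Return the cheapest of k randomly chosen individuals."""
    picks = rng.sample(range(len(pop)), k)
    return pop[min(picks, key=lambda i: costs[i])]

def evolve(dist, pop_size=50, offspring=2000, mutation_rate=0.2, seed=0):
    """Steady-state loop: each new child replaces the worst tour if it is cheaper."""
    rng = random.Random(seed)
    n = len(dist)
    pop = [rng.sample(range(n), n) for _ in range(pop_size)]  # random permutations
    costs = [tour_cost(t, dist) for t in pop]
    for _ in range(offspring):
        child = order_crossover(tournament(pop, costs, rng),
                                tournament(pop, costs, rng), rng)
        swap_mutation(child, rng, mutation_rate)
        c = tour_cost(child, dist)
        worst = max(range(pop_size), key=lambda i: costs[i])
        if c < costs[worst]:
            pop[worst], costs[worst] = child, c
    best = min(range(pop_size), key=lambda i: costs[i])
    return pop[best], costs[best]
```

Calling `evolve(dist)` on a square distance matrix returns a tour (a list of city indices) and its cost. The population size, offspring budget, and mutation rate are the obvious knobs to tune in pilot experiments.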
The input and output formats are identical to those in assignment 1, as
are the requirements for what to turn in.
As before, we will restrict the problem size, in this case to a
maximum number of locations of 600. We also will be running your code
on the veggie machines; code will be restricted to running on a single
core.
More Problems
As in assignment 1, more test problems will be added, with you
contributing to this effort. However, you will not be required to fill
out our Colorado problem further! For grading of the programs, we will
pick test problems from among those submitted. We ask you
to find or create a new test problem (in the required format) and
submit it. Specifics:
- You cannot submit a problem from TSPLIB.
- The problem can be either TSP or ATSP.
- If your source includes information on the optimal or best known
solution, please include it!
- All viable problems (i.e., correct format, not duplicated)
will be shared with the rest of the class. In fact, as problems are
received and checked, they will be posted. Duplicate problems will not be accepted. So if someone else has found it
first and it has been posted, they get the credit and you
don't. (Incentive for getting problems in early!)
- The problem must be in the assignment's input format. We should
be able to feed it directly to anyone's code.
- The problem must be non-trivial, i.e., more than 25
locations/cities.
- No more problems from
- http://people.sc.fsu.edu/~jburkardt/datasets/cities/
- http://www.tsp.gatech.edu/
If you have obtained the problem from elsewhere, indicate
the source (URL) in your submission.
Here are the problems submitted so far:
- Steve Ridges: HA30 is a table of 30 cities and the
distances in hundreds of miles between them. I found this at:
City Distance Datasets
- Robert Redburn: USCA312 describes 312 cities in
the US and Canada. Distances between the cities are computed from
latitude and longitude, not from road mileage. (Same URL source as
Steve Ridges.) This is a symmetric TSP dataset. PNG plot of city locations,
name of each city.
This should be a challenging problem given the enormous solution space
possible with 312 cities. As the PNG image illustrates, there are
regions of varying density across the dataset with some tightly
packed data points where only small improvements to tour cost are
likely and others with a single data point far away from other
locations. It will be interesting to see if GAs are able to
optimally place the large edges and get through enough generations
to minimize the dense regions.
- Joel Maple: sgb128
This problem was found on the same site as the other two.
- Mike Crawford: ca4663
4663 Canadian locations, obtained at,
Optimal value=1,290,319, optimal tour
- Pablo Bidwell: PabloBidwell,
problem generated from his own program.
- Kevin Archer: atex5, obtained from,
The atex5 test set is asymmetric with 72 nodes. I used it for testing my assignment 1 and was not able to get close to the known best solution.
I searched Google and the known best solutions are in the 5100 range.
- Jim Anderson: it101,
Randomly selected subset of the it16862 dataset found here. Distances
were randomly tweaked to render an ATSP dataset.
- Mike Martin: martin.atsp,
Data set is for ATSP using 100 locations. It was generated by me and
there is no known optimal solution.
- Abhilash Hazarika: 192Cities.atsp,
This is an ATSP data set of 192 cities. The data set was
generated using code written by me, so it has no known
optimal solution.
- Avinash Pallapu: 100cities.atsp,
New ATSP data set of dimension 100, self-generated using Python.
- Fereydoon Vafaei: nfl32.tsp,
Source: Challenge Travelling Tournament Instances,
This TSP dataset contains the distances between the 32 cities that the
teams in the National Football League come from. The map of these 32
cities can be seen here. The dataset has been used to solve the
Travelling Tournament Problem (not TSP) several times, and there are
solutions and lower bounds found for TTP.
- Brian Merrill: xql662_converted.tsp, This is a VLSI set from http://www.tsp.gatech.edu/vlsi/index.html#XQL662
This set was originally in EUC_2D format. I wrote a small Java program to convert it to EXPLICIT (i.e. FULL_MATRIX) edge weight format.
The best known tour length is 2513.
- Michael Jones: xpr2308.tsp, XPR2308 is a symmetric data set taken from http://www.tsp.gatech.edu/vlsi/index.html. It contains 2308 locations based on a VLSI design, and has an optimal solution of 7219.
- Jeremy Freed: Freed_Assignment2.atsp,
I wrote a program to randomly generate a set of 100 locations around a
1000x1000 unit grid. I then just used the Pythagorean Theorem to
calculate distances. To make it asymmetric, I made the east-to-west
distances 1.3 times longer than west-to-east.
- Shwetha Gowdanakatte: galaxy40,
It is a converted tournament problem.
Referred --
--Url
- Michael McCann: MJM,
I created this ATSP problem using a Python program I have written. I
don't have an optimum solution for the problem.
- Malgorzata Urbanska: city200, obtained from
- Mike Childs: chemreact.atsp, A good (not optimum) route is 813.96.
I generated this from a real random number generator from random.org,
which uses atmospheric noise. It is highly asymmetric, which I
thought was something you might find in chemical reactions (e.g.,
energy required going from CO2 to C + O2 might be much higher than the
reverse), but it is just randomness.
- Tim Cline: nodes600,
The attached data set contains weights for 600 nodes
(locations). The weight from a source node to the sink node is a
random value from 0 to 1000. The distance back from the sink to the
source is the same distance +/- 25% (the +/- is random). I generated
this with a program that I created. There is no known optimal
solution.
- Brock Wilcox: lotr123,
This is a set of locations for events occurring in Lord of the
Rings, extracted from the LOTRProject map. The raw data I used can be
seen here. I used the x,y coordinates given, not converting to a
surface distance, so these are long/lat degree based weights I
think. You know... in case you find yourself in Middle-earth and
want to optimally tour the interesting historical sights (traveling
by air).
- Scott Goodwyn: nCube,
I created this data set. nCube.atsp is a dimension 512 problem. It's
based on finding the shortest circuit through the 512 vertices of a
9-dimensional hypercube. It is well-known that such Hamiltonian
paths exist, and if the edge weight between adjacent vertices is
1, there exists a minimal circuit of length 512 (or 2 to the power
of the dimension - the number of vertices). That is, one can visit
all vertices exactly once, going only from neighbor to neighbor. At
least one path can easily be constructed, but all such minimal paths
are not known for arbitrary size. All vertices have a 0 or 1 for
each of their indices, and neighbors only differ in one index. In
the classic hypercube graph, there are no paths from one vertex to
another except through immediate neighbors. But to make this problem
interesting, I created random longer paths as alternatives from
every vertex to every other vertex. All these paths are much longer
than 1, from 15-45 in fact. They are much more costly to use, and
each is greater than the cost to traverse the cube from one corner
to its opposite. I picture the added paths as long tunnels that go
outside and around the hypercube to get to the other vertices. I'm
running it right now on my atsp solver (from assignment 1), and so
far the shortest path I've got is 557. It's taking a looong time!
- Ryan Friese: dc563,
dc563 is modeled from a table compression application, and was
supplied by AT&T Labs. I found this data set (and the resulting
paper) at http://www2.research.att.com/~dsj/chtsp/atsp.html. The
best known solution for this problem is 25951.
- Matt Klein: santa600,
Here is a subset of the "Traveling Santa Problem" from a recent
Kaggle competition
(http://www.kaggle.com/c/traveling-santa-problem). The original data
set included 150,000 (!) points, but I took just the first 600,
since the full data set stored as doubles would take about 90 GB of
memory. This is a standard Euclidean TSP data set.
- Anthony Navarro: small_santa_dist,
Traveling Santa
URL: http://www.kaggle.com/c/traveling-santa-problem
This is a very large problem (originally 150,000 entries), so I had to scale it
down a little. The data set is now 1000 entries. It is basically a
list of locations Santa would visit.
- Chao Tian: coral,
This is a matrix of pairwise geographic distances between 35 sampled locations used in the paper "Range-wide population genetic structure of the Caribbean sea fan coral, Gorgonia ventalina" by Andras JP, Rypien KL, & Harvell CD, published in Molecular Ecology in 2012. The dataset was downloaded from
http://datadryad.org/handle/10255/dryad.42919
- Charlie Wahlquist: WahlquistFull, Randomly generated 600-member ATSP problem. To make it asymmetric, all distances from b->a are modified by a multiplier of 1.2.
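The nCube entry above relies on the fact that a hypercube has a Hamiltonian circuit using only neighbor-to-neighbor moves. One standard construction is the reflected binary Gray code; the sketch below builds it for dimension 9 and checks that consecutive vertices differ in exactly one bit. The function names are illustrative and are not taken from the submitter's program.

```python
def gray_code_circuit(dim):
    """Vertices of a dim-dimensional hypercube in reflected Gray code order."""
    return [i ^ (i >> 1) for i in range(1 << dim)]

def is_hypercube_circuit(order):
    """True if cyclically consecutive vertices differ in exactly one bit."""
    n = len(order)
    return all(bin(order[i] ^ order[(i + 1) % n]).count("1") == 1
               for i in range(n))

circuit = gray_code_circuit(9)
print(len(circuit), is_hypercube_circuit(circuit))  # prints: 512 True
```

With unit weights on neighbor edges, this circuit has length 512, which matches the minimal circuit length stated in the entry.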
Technically, Mike Crawford got in the first problem. But his was so
big that the download failed at my home. So I did not wish to make it
the problem everyone had to solve.
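Several of the self-generated submissions above start from symmetric distances and then skew one direction to obtain an ATSP. As a rough illustration, the sketch below follows the recipe in Jeremy Freed's description: random points on a 1000x1000 grid, Euclidean distances, and east-to-west trips 1.3 times longer. It assumes that x grows eastward and that coordinates are real-valued; the function name and parameters are hypothetical, and writing the matrix in the assignment's input format is left out.

```python
import math
import random

def make_asymmetric_instance(n, size=1000, factor=1.3, seed=0):
    """Random points in a size x size square; westward trips cost `factor` times more."""
    rng = random.Random(seed)
    pts = [(rng.uniform(0, size), rng.uniform(0, size)) for _ in range(n)]
    dist = [[0.0] * n for _ in range(n)]
    for i, (xi, yi) in enumerate(pts):
        for j, (xj, yj) in enumerate(pts):
            if i == j:
                continue
            d = math.hypot(xi - xj, yi - yj)
            # Going from i to j heads west when x decreases (assumption: x grows eastward).
            dist[i][j] = d * factor if xj < xi else d
    return pts, dist
```

For example, `make_asymmetric_instance(100)` yields 100 points and a 100x100 matrix in which every pair of distinct cities has one direction 1.3 times the other.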
Questions
Point value for each question is listed with it.
- 7 points For the problem that you submitted, state what
type of problem it is (e.g., TSP/ATSP, cities, circuits...) and explain
where you got it or how you constructed it. Why do you think it is a
good test problem? Do you think it is/will be challenging? Why or
why not?
- 8 points Describe your algorithm and its implementation, explaining why
you designed it as you did, citing any relevant literature or pilot
experiments you did to tune your implementation.
- 10 points Find and read a peer-reviewed, published research paper on
applying an evolutionary algorithm to TSP
(either ATSP or TSP). Provide the citation, summarize the paper and
describe what you learned from it. Did it influence your design
(either positively or negatively)? Did they make a compelling case
for their approach?
Each answer is required to be at least 1/2 page in length.
What to hand in
You can submit everything electronically. You should
submit three files via RamCT by the due date/time for the
assignment.
- Your first file should be your new test problem. It should be submitted via RamCT as
Part 1.
- Your second file should contain the written answers to your
questions in ASCII, PDF or PS. It should be submitted via RamCT as
Part 2. Please name your file mylastname.{pdf,ps,txt}
- Your third file should be submitted as Part 3 in
RamCT. It is your code and accompanying files. You must name your submission file
mylastname.{zip,tar,gzip}. When the files are extracted, the
file to be run should be no more than one directory down
from the top of the archive. The archive (tar or zip) should include:
- Output from two runs, one for the full Colorado problem and
one from the first problem submitted by a student, in ASCII.
- The source code.
- A README file describing exactly how to compile and run your code and how
to set the parameters, if you have any. The README should include a
line that can be cut-and-pasted into the instructor's script for
running as batch.
- And if appropriate, a makefile for compiling your code and/or a
script file for running it.
Note: Your code MUST accept input in exactly the format
specified. Your code MUST produce an output file whose name can
be specified (no hardcoding!) as an argument and be in exactly the
format specified in this document. An automated test script will be
used to run your code and validate your answers. If your format does
not match, you will lose all points for program correctness and
quality of results!
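One way to avoid hardcoded file names is to take both paths from the command line. The sketch below is a minimal example under the assumption of two positional arguments, here called `input` and `output`; those names and the argument order are hypothetical, so use whatever your README's cut-and-paste line documents. The input and output formats themselves come from assignment 1 and are not reproduced here.

```python
import argparse

def parse_args(argv=None):
    """Parse the problem file path and the output file path from the command line."""
    parser = argparse.ArgumentParser(description="ATSP solver (illustrative CLI)")
    parser.add_argument("input", help="path to the problem file")
    parser.add_argument("output", help="path where the result is written")
    return parser.parse_args(argv)

# Example with an explicit argument list; pass no list to read sys.argv instead.
args = parse_args(["problem.atsp", "result.txt"])
print(args.input, args.output)  # prints: problem.atsp result.txt
```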