CS 540, Spring 2013: Assignment 2
Evolutionary Algorithms and ATSP

Part 1: Test case Due Monday 2/25/2013 at noon
Questions (Part 2) and Code (Part 3) Due Tuesday 3/5/2013 at noon

For this assignment, you are required to implement an evolutionary algorithm to solve the same problem as in assignment 1: asymmetric TSPs.

As before, you have some latitude in your selection of language and algorithm. For language, if you wish to choose a language other than C, C++, Java, Python, Lisp or whatever you were allowed to use in assignment 1, you must obtain the instructor's permission.

For algorithm, the primary restriction is that you pick an algorithm covered in class or described in the readings in the evolutionary computation section of class (2/14-2/28). Otherwise, you have a fair amount of latitude in the design of your algorithm, e.g., initialization, recombination operators, selection... You must write your own code for this; you are not allowed to download a GA implementation (e.g., Genitor). If you are in doubt about whether you are straying too far, just ask. You can't hybridize this solution with the search solution from assignment 1.
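To make the design choices above concrete, here is a minimal sketch of one possible evolutionary algorithm for ATSP. It is not the required design: the permutation encoding, tournament selection, order crossover, swap mutation, elitism, and all parameter values (pop_size, generations, mutation_rate) are illustrative assumptions, and dist is assumed to be a full cost matrix already read from the input file.

```python
import random

def tour_cost(tour, dist):
    """Cost of the closed tour; dist[i][j] may differ from dist[j][i]."""
    n = len(tour)
    return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))

def order_crossover(p1, p2, rng):
    """Order crossover (OX): keep a slice of p1, fill the rest in p2's order."""
    n = len(p1)
    a, b = sorted(rng.sample(range(n), 2))
    child = [None] * n
    child[a:b + 1] = p1[a:b + 1]
    used = set(p1[a:b + 1])
    pos = (b + 1) % n
    # Walk p2 starting after the slice, wrapping around, skipping used cities.
    for k in range(n):
        city = p2[(b + 1 + k) % n]
        if city not in used:
            child[pos] = city
            pos = (pos + 1) % n
    return child

def swap_mutation(tour, rng, rate):
    """With probability `rate`, swap two randomly chosen positions."""
    tour = tour[:]
    if rng.random() < rate:
        i, j = rng.sample(range(len(tour)), 2)
        tour[i], tour[j] = tour[j], tour[i]
    return tour

def tournament(pop, costs, rng, k=3):
    """Return the cheapest of k randomly chosen individuals."""
    picks = rng.sample(range(len(pop)), k)
    return pop[min(picks, key=lambda i: costs[i])]

def evolve(dist, pop_size=50, generations=200, mutation_rate=0.2, seed=0):
    """Generational EA with one elite; parameter values are placeholders."""
    rng = random.Random(seed)
    n = len(dist)
    pop = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        costs = [tour_cost(t, dist) for t in pop]
        best = pop[min(range(pop_size), key=lambda i: costs[i])]
        nxt = [best[:]]  # elitism: carry the best tour over unchanged
        while len(nxt) < pop_size:
            child = order_crossover(tournament(pop, costs, rng),
                                    tournament(pop, costs, rng), rng)
            nxt.append(swap_mutation(child, rng, mutation_rate))
        pop = nxt
    return min(pop, key=lambda t: tour_cost(t, dist))
```

Any of these pieces can be replaced by another operator from class or the readings; the sketch only shows how initialization, selection, recombination and mutation fit together.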

The input and output formats are identical to those in assignment 1, as are the requirements for what to turn in.

As before, we will restrict the problem size, in this case to a maximum number of locations of 600. We also will be running your code on the veggie machines; code will be restricted to running on a single core.

More Problems

As in assignment 1, more test problems will be added with you contributing to this effort. However, you will not be required to fill out our Colorado problem further! For grading of the programs, we will pick test problems from among those submitted. We ask you to find or create a new test problem (in the required format) and submit it. Specifics: If you have obtained the problem from elsewhere, indicate the source (URL) in your submission.

Here they are:

  1. Steve Ridges: HA30 is a table of 30 cities and the distances in hundreds of miles between them. I found this at: City Distance Datasets
  2. Robert Redburn: USCA312 describes 312 cities in the US and Canada. Distances between the cities are computed from latitude and longitude, not from road mileage. (Same URL source as Steve Ridges.) This is a symmetric TSP dataset. PNG plot of city locations, Name of each city. This should be a challenging problem given the enormous solution space possible with 312 cities. As the PNG image illustrates, there are regions of varying density across the dataset, with some tightly packed data points where only small improvements to tour cost are likely and others with a single data point far away from other locations. It will be interesting to see if GAs are able to optimally place the large edges and get through enough generations to minimize the dense regions.
  3. Joel Maple: sgb128 This problem was found on the same site as the other two.
  4. Mike Crawford: ca4663 4663 Canadian locations, obtained at, Optimal value=1,290,319, optimal tour
  5. Pablo Bidwell: PabloBidwell, problem generated from his own program.
  6. Kevin Archer: atex5, obtained from, The atex5 test set is asymmetric with 72 nodes. I used it for testing my assignment 1 and was not able to get close to the known best solution. I searched Google and the known best solutions are in the 5100 range.
  7. Jim Anderson: it101, Randomly selected subset of the it16862 dataset found here. Distances were randomly tweaked to render an ATSP dataset.
  8. Mike Martin: martin.atsp, Data set is for ATSP using 100 locations. It was generated by me and there is no known optimal solution.
  9. Abhilash Hazarika: 192Cities.atsp, This is an ATSP data set of 192 cities. The data set was generated using code written by me, so it has no known optimal solution.
  10. Avinash Pallapu: 100cities.atsp, new ATSP data of dimension 100, self-generated using Python
  11. Fereydoon Vafaei: nfl32.tsp, Source: Challenge Travelling Tournament Instances, This TSP dataset contains the distances between the 32 cities that the teams in the National Football League come from. The map of these 32 cities can be seen here. The dataset has been used to solve the Travelling Tournament Problem (not TSP) several times, and there are solutions and lower bounds found for TTP.
  12. Brian Merrill: xql662_converted.tsp, This is a VLSI set from http://www.tsp.gatech.edu/vlsi/index.html#XQL662. This set was originally in EUC_2D format. I wrote a small Java program to convert it to EXPLICIT (i.e. FULL_MATRIX) edge weight format. The best known tour length is 2513.
  13. Michael Jones: xpr2308.tsp, XPR2308 is a symmetric data set taken from http://www.tsp.gatech.edu/vlsi/index.html. It contains 2308 locations based on a VLSI design, and has an optimal solution of 7219.
  14. Jeremy Freed: Freed_Assignment2.atsp, I wrote a program to randomly generate a set of 100 locations around a 1000x1000 unit grid. I then just used the Pythagorean Theorem to calculate distances. To make it asymmetric, I made the east-to-west distances 1.3 times longer than west-to-east.
  15. Shwetha Gowdanakatte: galaxy40, It is a converted tournament problem. Referred -- --Url
  16. Michael McCann: MJM, I created this ATSP problem using a Python program I have written. I don't have an optimum solution for the problem.
  17. Malgorzata Urbanska: city200, obtained from
  18. Mike Childs: chemreact.atsp, A good (not optimum) route is 813.96. I generated this from a real random number generator from random.org, which uses atmospheric noise. It is highly asymmetric, which I thought was something you might find in chemical reactions (e.g., energy required going from co2 to c o2 might be much higher than the reverse), but it is just randomness.
  19. Tim Cline: nodes600, The attached data set contains weights for 600 nodes (locations). The weight from a source node to the sink node is a random value from 0 to 1000. The distance back from the sink to the source is the same distance +/- 25% (the +/- is random). I generated this with a program that I created. There is no known optimal solution.
  20. Brock Wilcox: lotr123, This is a set of locations for events occurring in Lord of the Rings, extracted from the LOTRProject map. The raw data I used can be seen here. I used the x,y coordinates given, not converting to a surface distance, so these are long/lat degree based weights I think. You know... in case you find yourself in middle earth and want to optimally tour the interesting historical sights (traveling by air).
  21. Scott Goodwyn: nCube, I created this data set. nCube.atsp is a dimension 512 problem. It's based on finding the shortest circuit through the 512 vertices of a 9-dimensional hypercube. It is well known that such Hamiltonian paths exist, and if the edge weight between adjacent vertices is 1, there exists a minimal circuit of length 512 (or 2 to the power of the dimension - the number of vertices). That is, one can visit all vertices exactly once, going only from neighbor to neighbor. At least one path can easily be constructed, but all such minimal paths are not known for arbitrary size. All vertices have a 0 or 1 for each of their indices, and neighbors only differ in one index. In the classic hypercube graph, there are no paths from one vertex to another except through immediate neighbors. But to make this problem interesting, I created random longer paths as alternatives from every vertex to every other vertex. All these paths are much longer than 1, from 15-45 in fact. They are much more costly to use, and each is greater than the cost to traverse the cube from one corner to its opposite. I picture the added paths as long tunnels that go outside and around the hypercube to get to the other vertices. I'm running it right now on my atsp solver (from assignment 1), and so far the shortest path I've got is 557. It's taking a looong time!
  22. Ryan Friese: dc563, dc563 is modeled from a table compression application, and was supplied by AT&T labs. I found this data set (and the resulting paper) at http://www2.research.att.com/~dsj/chtsp/atsp.html. The best known optimal for this problem is 25951.
  23. Matt Klein: santa600, Here is a subset of the "Traveling Santa Problem" from a recent Kaggle competition (http://www.kaggle.com/c/traveling-santa-problem). The original data set included 150,000 (!) points, but I took just the first 600, since the full data set stored as doubles would take about 90 GBs of memory. This is a standard euclidean tsp data set.
  24. Anthony Navarro: small_santa_dist, Traveling Santa URL: http://www.kaggle.com/c/traveling-santa-problem This is a very large problem (originally 150,000) so I had to scale it down a little. The data set is now 1000 entries. It is basically a list of locations Santa would visit.
  25. Chao Tian: coral, This is a matrix of pairwise geographic distances between 35 sampled locations used in the paper "Range-wide population genetic structure of the Caribbean sea fan coral, Gorgonia ventalina" by Andras JP, Rypien KL, & Harvell CD published in Molecular Ecology in 2012. The dataset was downloaded from http://datadryad.org/handle/10255/dryad.42919
  26. Charlie Wahlquist: WahlquistFull, Randomly generated 600 member ATSP problem. To make it asymmetric all distances from b->a are modified by a multiplier of 1.2.
Technically, Mike Crawford got in the first problem. But his was so big that the download failed at my home. So I did not wish to make it the problem everyone had to solve.
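Several of the submissions above were built by generating random locations and then skewing one direction of travel. As a rough illustration of that idea, the sketch below follows the description given for Freed_Assignment2.atsp (random points on a 1000x1000 grid, Euclidean distances, east-to-west trips 1.3 times longer). It is not the submitter's actual program: the function name, the seed, and the assumption that x grows eastward are all illustrative, and writing the matrix out in the required file format (see assignment 1) is left out.

```python
import math
import random

def make_asymmetric_instance(n=100, size=1000.0, penalty=1.3, seed=0):
    """Return (points, dist) for a random asymmetric instance.

    Assumption: x increases toward the east, so moving to a smaller x
    is an east-to-west trip and is multiplied by `penalty`.
    """
    rng = random.Random(seed)
    pts = [(rng.uniform(0, size), rng.uniform(0, size)) for _ in range(n)]
    dist = [[0.0] * n for _ in range(n)]
    for i, (xi, yi) in enumerate(pts):
        for j, (xj, yj) in enumerate(pts):
            if i == j:
                continue  # diagonal stays 0
            d = math.hypot(xj - xi, yj - yi)  # Pythagorean distance
            dist[i][j] = d * penalty if xj < xi else d
    return pts, dist
```

For any pair of locations, the costlier direction in the resulting matrix is 1.3 times the cheaper one, which gives a mild, structured asymmetry rather than fully random weights.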

Questions

Point value for each question is listed with it.
  1. 7 points For the problem that you submitted, state what type of problem it is (e.g., TSP/ATSP, cities, circuits...) and explain where you got it or how you constructed it. Why do you think it is a good test problem? Do you think it is/will be challenging? Why or why not?
  2. 8 points Describe your algorithm and its implementation, explaining why you designed it as you did, citing any relevant literature or pilot experiments you did to tune your implementation.
  3. 10 points Find and read a peer-reviewed, published research paper on applying an evolutionary algorithm to TSP (either ATSP or TSP). Provide the citation, summarize the paper and describe what you learned from it. Did it influence your design (either positively or negatively)? Did they make a compelling case for their approach?
Each answer is required to be at least 1/2 page in length.

What to hand in

You can submit everything electronically. You should submit three files via RamCT by the due date/time for the assignment.
  1. Your first file should be your new test problem. It should be submitted via RamCT as Part 1.
  2. Your second file should contain the written answers to your questions in ASCII, PDF or PS. It should be submitted via RamCT as Part 2. Please name your file mylastname.{pdf,ps,txt}
  3. Your third file should be submitted as Part 3 in RamCT. It is your code and accompanying files. You must name your submission file mylastname.{zip,tar,gzip}. When the files are extracted, the file to be executed should be no more than one directory down from the top level of the archive. The archive (tar or zip) should include:
Note: Your code MUST accept input in exactly the format specified. Your code MUST produce an output file whose name can be specified (no hardcoding!) as an argument and be in exactly the format specified in this document. An automated test script will be used to run your code and validate your answers. If your format does not match, you will lose all points for program correctness and quality of results!
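As a small illustration of the "no hardcoding" requirement, the sketch below takes both file names from the command line. The argument names and their order (input file first, output file second) are assumptions made for the example; the authoritative invocation and file formats are the ones specified in assignment 1.

```python
import argparse

def parse_args(argv=None):
    """Read the input and output paths from the command line.

    Assumption: two positional arguments, input first and output second.
    """
    parser = argparse.ArgumentParser(description="EA solver for ATSP (sketch)")
    parser.add_argument("input_path", help="problem file in the required format")
    parser.add_argument("output_path", help="file to write the tour to")
    return parser.parse_args(argv)

if __name__ == "__main__":
    args = parse_args()
    # The solver would read args.input_path and write its result to
    # args.output_path; neither name appears as a literal in the code.
    print(args.input_path, args.output_path)
```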