Colorado State University CS540: Artificial Intelligence (Spring 1998)
Computer Science Department
Colorado State University

Assignment 1: Neural Networks
Assigned: January 30
Due: February 16
NTU Due: March 2

In this assignment you will develop and use software for training neural networks and hand in a written report about your work. You will work with two data sets: the first is the two-bit exclusive-or (XOR) problem, and the second contains data about housing values in Boston.

XOR Data

The XOR problem consists of two binary inputs and one output. The desired mapping (input -> output) is (0 0) -> 0, (1 0) -> 1, (0 1) -> 1, (1 1) -> 0.

You must develop a program that generates this data and trains a two-layer neural network to approximate the desired output. You may start with the code linked to at the end of this assignment.

Use your software to estimate how the likelihood that the network learns a good approximation varies with the number of hidden units in the single hidden layer. To do this, train networks with 0, 1, 2, 3, 5, and 10 hidden units. Try three different learning rates and two momentum rates (0 and 0.9) for each architecture, and perform 10 runs for each combination, resulting in a total of 3 learning rates x 2 momentum rates x 10 runs x 6 architectures = 360 runs. Then pick the best combination of learning and momentum rates for each architecture, and plot the number of converged runs versus the number of hidden units.

Write at least three pages of double-spaced text about what you did and the results you found. Paste your graphs into your document. Attach a copy of your software at the end of the report for this assignment.

Boston Housing Data

Now for a real problem. From http://www.ics.uci.edu/~mlearn/MLSummary.html, I have downloaded the Boston housing data, which comes from the StatLib library at Carnegie Mellon University. The data consists of 506 samples, each containing 13 features of a housing area plus the housing value in that area. The 13 features are
  1. per capita crime rate by town,
  2. proportion of residential land zoned for lots over 25,000 sq. ft.,
  3. proportion of non-retail business acres per town,
  4. Charles River dummy variable (= 1 if tract bounds river; 0 otherwise),
  5. nitric oxides concentration (parts per 10 million),
  6. average number of rooms per dwelling,
  7. proportion of owner-occupied units built prior to 1940,
  8. weighted distances to five Boston employment centres,
  9. index of accessibility to radial highways,
  10. full-value property-tax rate per $10,000,
  11. pupil-teacher ratio by town,
  12. 1000(Bk - 0.63)^2, where Bk is the proportion of blacks by town,
  13. % lower status of the population.
The housing value, the 14th number in each sample, is the median value of owner-occupied homes in $1000's.

I have randomly divided the data into training, validation, and test sets. The training set has 336 samples and the validation and test sets contain 85 samples each. I have also normalized the data so that each of the first 13 numbers has a mean of zero and a standard deviation of 1/3. The 14th number, which is the desired output, is in the range from 0 to 1. You can retrieve the data from these links:

Your assignment is to fit this data as well as you can with a neural network. To do this you must modify your neural network code to implement the early-stopping method discussed in class. Use your program to find the best network size, meaning the smallest number of hidden units beyond which the test error does not improve much.

Plot the test error versus the number of hidden units. For each point, plot the average test error for that network size over 30 runs. Also plot, or draw by hand, the 95% confidence interval as a vertical bar on each point. The formula for a 95% confidence interval is 1.96 * std / sqrt(n), where std is the standard deviation of the errors and n is the number of runs (30 here). This assumes the samples come from a normal (Gaussian) distribution. Are any of the differences statistically significant? Which ones?

Are the error distributions really normal? Draw (by hand if you like) a frequency histogram of the testing set RMS errors for the 30 runs using the network you have decided is best. Does the histogram look normal?

Don't knock yourself and your computer out running experiments with every possible network size. Just do enough that you get a nice picture of the testing set RMS error vs. number of hidden units.

Write at least five pages of double-spaced text describing what you did for this assignment, the results you found, and your interpretation of them. Attach a copy of the code you wrote and used for this assignment. The code must include detailed comments in the forward-pass and back-propagation sections; include enough comments to show me that you fully understand the code. Your results must include the plots you have produced. Your interpretation must include a plot of the actual and predicted housing values for the test data using the best network, which will help you visualize how well the network performed. In your discussion, also consider the magnitudes of the weight values in the hidden units. Does one input tend to have higher-magnitude weights than the others? If so, that input carries greater predictive value for housing prices. Try to interpret what this means by referring back to what each input represents. You could verify your finding by generating a scatter plot of the true housing value versus that input and looking for a correlation.

Code

You may use some of this code as a starting place. The C code for the XOR problem is the best place to start:
Copyright © 1998 Chuck Anderson
anderson@cs.colostate.edu