CS540: Artificial Intelligence (Spring, 98)
You must develop a program that generates this data and trains a two-layer neural network to approximate the desired output. You may start with the code linked to at the end of this assignment.
Use your software to estimate how the likelihood that the neural network learns a good approximation varies with the number of hidden units in the single hidden layer. To do this, train networks with 0, 1, 2, 3, 5, and 10 hidden units. Try 3 or 4 different learning rates and 2 momentum rates (0 and 0.9) for each. Perform 10 runs for each combination; with 3 learning rates this gives a total of 3 learning rates x 2 momentum rates x 10 runs x 6 architectures = 360 runs. Then pick the best combination of learning and momentum rates for each network architecture, and plot the number of converged runs versus the number of hidden units.
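The run count can be checked by enumerating the experiment grid. This is only a minimal sketch: the learning-rate values and the names `experiment_grid` and `best_per_architecture` are illustrative assumptions, not part of the assignment, and 10 runs per configuration is assumed because that is the figure consistent with the 360 total.

```python
from itertools import product

HIDDEN_UNITS = [0, 1, 2, 3, 5, 10]   # architectures listed in the assignment
MOMENTUM_RATES = [0.0, 0.9]          # momentum rates listed in the assignment
LEARNING_RATES = [0.01, 0.1, 0.5]    # hypothetical values; choose your own
RUNS_PER_CONFIG = 10                 # assumed; matches the 360-run total


def experiment_grid(hidden_units, learning_rates, momentum_rates, runs):
    """Return an iterator with one (hidden, lr, momentum, run) tuple per run."""
    return product(hidden_units, learning_rates, momentum_rates, range(runs))


def best_per_architecture(converged_counts):
    """Map each hidden-unit count to its highest converged-run count.

    converged_counts maps (hidden, lr, momentum) -> number of converged runs.
    """
    best = {}
    for (hidden, _lr, _momentum), count in converged_counts.items():
        best[hidden] = max(best.get(hidden, 0), count)
    return best


grid = list(experiment_grid(HIDDEN_UNITS, LEARNING_RATES,
                            MOMENTUM_RATES, RUNS_PER_CONFIG))
print(len(grid))  # 360
```

The values returned by `best_per_architecture` are what would go on the vertical axis of the converged-runs plot, one point per number of hidden units.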
Write at least three double-spaced pages of text about what you did and the results you found. Paste your graphs into your document. Attach a copy of your software at the end of the report for this assignment.
I have randomly divided the data into training, validation, and test sets. The training set has 336 samples and the validation and test sets contain 85 samples each. I have also normalized the data so that each of the first 13 numbers has a mean of zero and a standard deviation of 1/3. The 14th number, which is the desired output, is in the range from 0 to 1. You can retrieve the data from these links:
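The data are already normalized, so the following is not something you need to run. It is only a sketch of what a normalization to zero mean and a standard deviation of 1/3 could look like for one input column. The function name `normalize_column` and the use of the population standard deviation (dividing by n) are assumptions; the assignment does not say which convention was used.

```python
import math


def normalize_column(values, target_std=1.0 / 3.0):
    """Shift values to zero mean and scale them to the target standard deviation."""
    n = len(values)
    mean = sum(values) / n
    # Population standard deviation (divide by n); an assumption of this sketch.
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0.0:
        return [0.0] * n  # a constant column carries no information
    return [(v - mean) / std * target_std for v in values]
```

With a standard deviation of 1/3, nearly all normalized inputs fall between -1 and 1, since that range spans three standard deviations on either side of the mean.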
Your assignment is to try to fit this data as well as you can with a neural network. To do this you must modify your neural network code to implement the early-stopping method discussed in class. Use your program to find the best-sized network, meaning the smallest number of hidden units beyond which the test error does not improve much.
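The early-stopping method discussed in class is not reproduced here, so the following is only a sketch of one common variant: keep the weights with the lowest validation error, and stop once the validation error has not improved for a fixed number of epochs. The function name and the `train_epoch`, `validation_error`, `max_epochs`, and `patience` parameters are all hypothetical; adapt them to your own code.

```python
import copy


def train_with_early_stopping(weights, train_epoch, validation_error,
                              max_epochs=1000, patience=20):
    """Return (best_weights, best_error) judged on the validation set.

    train_epoch(weights) -> weights after one pass over the training set
    validation_error(weights) -> error measured on the validation set
    """
    best_weights = copy.deepcopy(weights)
    best_error = validation_error(weights)
    epochs_since_best = 0
    for _ in range(max_epochs):
        weights = train_epoch(weights)
        error = validation_error(weights)
        if error < best_error:
            # New best: remember a copy so later updates cannot overwrite it.
            best_error = error
            best_weights = copy.deepcopy(weights)
            epochs_since_best = 0
        else:
            epochs_since_best += 1
            if epochs_since_best >= patience:
                break  # validation error stopped improving
    return best_weights, best_error
```

The test set plays no part in deciding when to stop. It is evaluated once, on the returned best weights, which is what keeps the reported test error an honest estimate.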
Plot the test error versus the number of hidden units. For each point, plot the average test error for that sized network over 30 runs. Also plot, or draw by hand, the 95% confidence interval as a vertical bar on each point. The 95% confidence interval is the mean plus or minus 1.96 * std / sqrt(n), where std is the standard deviation and n is the number of samples (here, the number of runs). This assumes the samples are from a normal (Gaussian) distribution. Are any of the differences statistically significant? Which ones?
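As a sketch of the confidence-interval calculation, the function below returns the mean and the half-width of the vertical bar, using the standard normal value 1.96. The function names are illustrative, and the use of the sample standard deviation (dividing by n - 1) is an assumption.

```python
import math
import statistics


def confidence_interval_95(errors):
    """Return (mean, half_width) of a normal-theory 95% confidence interval."""
    mean = statistics.mean(errors)
    std = statistics.stdev(errors)  # sample standard deviation (n - 1)
    half_width = 1.96 * std / math.sqrt(len(errors))
    return mean, half_width


def intervals_overlap(a, b):
    """True if two (mean, half_width) intervals overlap."""
    (mean_a, half_a), (mean_b, half_b) = a, b
    return abs(mean_a - mean_b) <= half_a + half_b
```

Two network sizes whose intervals do not overlap differ significantly at roughly this level. Overlapping intervals are less conclusive, so treat `intervals_overlap` as a rough screen rather than a formal test.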
Are the error distributions really normal? Draw (by hand if you like) a frequency histogram of the testing set RMS errors for the 30 runs using the network you have decided is best. Does the histogram look normal?
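If you would rather not draw the histogram by hand, a text version is easy to produce. This is a minimal sketch; the function names and the default of 8 equal-width bins are arbitrary assumptions.

```python
def frequency_histogram(values, num_bins=8):
    """Return (bin_lower_edges, counts) for equal-width bins over the data range."""
    low, high = min(values), max(values)
    width = (high - low) / num_bins or 1.0  # avoid zero width if all values are equal
    counts = [0] * num_bins
    for v in values:
        # The maximum value would land one past the last bin, so clamp it.
        index = min(int((v - low) / width), num_bins - 1)
        counts[index] += 1
    edges = [low + i * width for i in range(num_bins)]
    return edges, counts


def print_histogram(values, num_bins=8):
    """Print one row per bin: lower edge, then one '#' per sample in the bin."""
    edges, counts = frequency_histogram(values, num_bins)
    for edge, count in zip(edges, counts):
        print(f"{edge:8.4f} | {'#' * count}")
```

Calling `print_histogram` on the list of 30 test-set RMS errors gives a quick view of whether the distribution is roughly symmetric and bell-shaped or skewed by a few poorly converged runs.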
Don't knock yourself and your computer out running experiments with every possible network size. Just do enough that you get a nice picture of the testing set RMS error vs. number of hidden units.
Write at least five pages of double-spaced text describing what you did for this assignment, the results you found, and your interpretation of them. Attach a copy of the code you wrote and used for this assignment. The code must include detailed comments in the forward-pass and back-propagation sections. Include enough comments to show me that you fully understand the code. Your results must include the plots you have produced. Your interpretation must include a plot of the actual and predicted housing values for the test data using the best network. This will help you visualize how well the network performed. In your discussion, you should also consider the magnitudes of the weight values in the hidden units. Does one input tend to have higher magnitude weights than the other inputs? If so, this suggests that the input is more useful than the others for predicting housing values. Try to interpret the meaning of this by referring back to what each input represents. You could verify your finding by generating a scatter plot of the true housing value versus the value of that input and looking for a correlation.
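The weight-magnitude comparison and the correlation check can be sketched as follows. The layout `hidden_weights[j][i]` (the weight from input i to hidden unit j, with bias weights excluded) and both function names are assumptions; your own code may store the weights differently.

```python
import math


def mean_abs_weight_per_input(hidden_weights):
    """Average |weight| over all hidden units, for each input.

    hidden_weights[j][i] is assumed to be the weight from input i to hidden unit j.
    """
    num_hidden = len(hidden_weights)
    num_inputs = len(hidden_weights[0])
    return [sum(abs(row[i]) for row in hidden_weights) / num_hidden
            for i in range(num_inputs)]


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

Comparing magnitudes across inputs is reasonable here because all 13 inputs were normalized to the same scale. A correlation coefficient only captures a linear relationship, so the scatter plot is still worth examining alongside it.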