Machine Learning, Fall 2009, Department of Computer Science
$$g_{n,j} = \frac{e^{y_{n,j}}}{\sum_{k=1}^{K} e^{y_{n,k}}}$$
This change must be made everywhere the output is calculated.
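A minimal R sketch of this softmax calculation, applied to a matrix with one row per sample and one column per output unit. The function name softmax is hypothetical (it is not part of base R). Subtracting each row's maximum before exponentiating leaves every ratio unchanged but keeps exp() from overflowing.

```r
# Softmax over the rows of a matrix.
# Y: N x K matrix of output-unit values y[n, j].
# Returns an N x K matrix G with G[n, j] = exp(Y[n, j]) / sum_k exp(Y[n, k]).
softmax <- function(Y) {
  Y <- as.matrix(Y)
  # Subtract each row's maximum (the length-N vector recycles down the
  # columns, one value per row) so exp() cannot overflow.
  Y <- Y - apply(Y, 1, max)
  expY <- exp(Y)
  expY / rowSums(expY)
}

G <- softmax(rbind(c(1, 2, 3),
                   c(1000, 1000, 1000)))
rowSums(G)  # each row sums to 1
```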
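Target values must be indicator variables: for each sample, one value per class, with a 1 in the column of the correct class and 0 everywhere else. Be very careful with class labels: the digits run from 0 to 9, but the class indices run from 1 to 10.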
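A minimal sketch of building the indicator targets from digit labels, with the +1 shift from digits to class indices made explicit. The function name make_indicators is hypothetical.

```r
# Convert digit labels 0..9 into an N x 10 indicator matrix.
# Column k holds class index k, which corresponds to digit k - 1.
make_indicators <- function(digits, n_classes = 10) {
  class_index <- digits + 1                 # digits 0..9 -> class indices 1..10
  ind <- matrix(0, nrow = length(digits), ncol = n_classes)
  ind[cbind(seq_along(digits), class_index)] <- 1   # one 1 per row
  ind
}

targets <- make_indicators(c(0, 3, 9))
```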
The function to be minimized by the Scaled Conjugate Gradient algorithm must return the negative log likelihood of the data plus the weight penalty, lambda times the mean of the squared weight values.
The gradient function is unchanged from the previous assignment, except for the calculation of the outputs. Also, both the V and the W gradient calculations must now include the weight penalty for all weights. Be sure the sign of your gradient is correct: it should be the negative of the gradient of the log likelihood (plus the gradient of the penalty), so that scg(), which minimizes, ends up ascending the log likelihood.
Implement a weight magnitude penalty on all weights.
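A minimal sketch of the penalized objective and its gradient for a network with one hidden layer and softmax outputs. Several details are assumptions, not taken from the assignment: tanh hidden units; all weights packed into one vector w = c(V, W), with V of size (D+1) x nh and W of size (nh+1) x K and the bias weights in the first row of each; the negative log likelihood averaged over the N samples; and the penalty applied to every entry of V and W. All function names are hypothetical, and your previous assignment's layout and the course's scg() calling convention may differ.

```r
# Hypothetical layout: w = c(V, W), V is (D+1) x nh, W is (nh+1) x K,
# row 1 of each matrix holds the bias weights.
unpack_weights <- function(w, D, nh, K) {
  nV <- (D + 1) * nh
  list(V = matrix(w[1:nV], D + 1, nh),
       W = matrix(w[nV + seq_len((nh + 1) * K)], nh + 1, K))
}

# Forward pass: tanh hidden units, softmax outputs computed stably.
forward <- function(w, X, nh, K) {
  p <- unpack_weights(w, ncol(X), nh, K)
  X1 <- cbind(1, X)
  Z <- tanh(X1 %*% p$V)
  Z1 <- cbind(1, Z)
  Y <- Z1 %*% p$W
  Y <- Y - apply(Y, 1, max)
  expY <- exp(Y)
  list(p = p, X1 = X1, Z = Z, Z1 = Z1,
       G = expY / rowSums(expY),
       logG = Y - log(rowSums(expY)))     # log of softmax, no log(0)
}

# Function to minimize: mean negative log likelihood
# plus lambda times the mean squared weight.
nn_error <- function(w, X, Tmat, nh, lambda) {
  fw <- forward(w, X, nh, ncol(Tmat))
  -sum(Tmat * fw$logG) / nrow(X) + lambda * mean(w^2)
}

# Gradient of nn_error with respect to w, packed the same way as w.
nn_gradient <- function(w, X, Tmat, nh, lambda) {
  fw <- forward(w, X, nh, ncol(Tmat))
  dY <- (fw$G - Tmat) / nrow(X)                      # d error / d output sums
  gW <- t(fw$Z1) %*% dY
  dZ <- (dY %*% t(fw$p$W[-1, , drop = FALSE])) * (1 - fw$Z^2)
  gV <- t(fw$X1) %*% dZ
  c(gV, gW) + 2 * lambda * w / length(w)             # penalty on all weights
}
```

Comparing nn_gradient against central finite differences of nn_error on a tiny random problem is a quick way to catch sign errors before handing both functions to scg().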
Test your code by applying it to a very simple data set of your own construction, such as inputs being the integers 1 to 10 and the correct class being 1 for integers less than 6 and 2 for integers greater than 6. Report the results of your simple test.
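One way to build the suggested toy data set in R is sketched below. The text does not say which class the integer 6 belongs to, so this sketch simply leaves 6 out; assigning it to either class would be an equally reasonable choice. The variable names are illustrative.

```r
# Toy data: inputs 1..10 except 6, class 1 below 6 and class 2 above 6.
x_toy <- setdiff(1:10, 6)
X_toy <- matrix(x_toy, ncol = 1)          # N x 1 input matrix
class_toy <- ifelse(x_toy < 6, 1, 2)      # class indices 1 and 2
T_toy <- diag(2)[class_toy, ]             # N x 2 indicator targets
cbind(x = x_toy, class = class_toy)
```

After training on X_toy and T_toy, the predicted class for each row should match class_toy.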
Download the hand-written zip code training and testing data from the website for the book The Elements of Statistical Learning, by Hastie, Tibshirani and Friedman, by clicking the "Data" entry in the left-side menu.
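A minimal sketch for reading either file into R, assuming each line holds the digit label followed by the 256 grayscale pixel values of a 16 x 16 image. R's file connections read gzip-compressed files transparently, so read.table() should also accept the compressed downloads without unpacking them. The function name read_zip is hypothetical.

```r
# Read a zip-code data file (zip.train or zip.test, optionally gzipped).
# Assumes each line holds the digit label followed by 256 pixel values.
read_zip <- function(path) {
  raw <- as.matrix(read.table(path))
  list(labels = as.integer(raw[, 1]),         # digits 0..9
       X = unname(raw[, -1, drop = FALSE]))   # N x 256 pixel matrix
}
```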
Write R code to partition the training data into a training set and a validation set. Extract some fraction, say 20%, of the training samples from each class (digit) for the validation set. You then have three sets of data: the training and validation sets from zip.train, and the testing set from zip.test.
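A minimal sketch of the per-class split. It returns row indices, so the pixel matrix and the labels can be subset consistently; sampling within each class keeps the digit proportions of the validation set close to those of zip.train. The function name and the optional seed argument are illustrative.

```r
# Put a fraction of each digit's rows into the validation set.
# Returns row indices for the training and validation sets.
split_by_class <- function(labels, frac = 0.2, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  per_class <- split(seq_along(labels), labels)   # row indices grouped by digit
  val_idx <- unlist(lapply(per_class, function(idx) {
    # index into idx rather than sample(idx, ...), which misbehaves for length 1
    idx[sample.int(length(idx), round(frac * length(idx)))]
  }))
  val_idx <- sort(unname(val_idx))
  list(train = setdiff(seq_along(labels), val_idx), validation = val_idx)
}
```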
Train an LDA classifier on the training set and report the classification accuracy on the training, validation, and testing sets. To make LDA work, you will probably have to preprocess the images with PCA (principal components analysis) or add a small amount of noise to all of them, because some pixels vary little or not at all across the images, which can make the pooled covariance matrix singular. You should get at least 40% of the test digits correctly classified by the LDA classifier.
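A minimal sketch of the PCA route, using prcomp() from base R and lda() from the MASS package (a recommended package that ships with standard R installations). The default of 40 principal components is an arbitrary illustrative value, not one from the assignment; choosing it with the validation set is a natural extension. The helper names are hypothetical.

```r
library(MASS)  # provides lda() and a predict() method for lda fits

# Project images onto their first n_pc principal components, then fit LDA.
# n_pc = 40 is an illustrative default, not a value given in the assignment.
fit_pca_lda <- function(X, digits, n_pc = 40) {
  pca <- prcomp(X, center = TRUE)
  scores <- pca$x[, 1:n_pc, drop = FALSE]
  list(pca = pca, n_pc = n_pc, lda = lda(scores, grouping = factor(digits)))
}

# Predict digits for new images using the same projection.
predict_pca_lda <- function(fit, X) {
  scores <- predict(fit$pca, newdata = X)[, 1:fit$n_pc, drop = FALSE]
  as.integer(as.character(predict(fit$lda, scores)$class))
}
```

Accuracy on any of the three sets is then mean(predict_pca_lda(fit, X) == digits) for that set's images X and labels digits.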
Now train your logistic regression neural network on the training set, using 10 hidden units and a lambda of 0.001. Report the classification accuracy on training, validation, and testing sets.
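Accuracy can be computed from the softmax outputs by taking the column with the largest output in each row and converting that class index back to a digit by subtracting 1. A minimal sketch with hypothetical helper names; the confusion table is an optional extra that shows which digits are mistaken for which.

```r
# G: N x 10 matrix of network outputs; column k is class index k = digit k - 1.
predicted_digits <- function(G) max.col(G, ties.method = "first") - 1

accuracy <- function(pred, truth) mean(pred == truth)

# Rows: predicted digit, columns: true digit.
confusion <- function(pred, truth)
  table(predicted = factor(pred, levels = 0:9), true = factor(truth, levels = 0:9))
```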
Find values for the number of hidden units and lambda that maximize the classification accuracy on the validation set. Show graphs of the results that make clear where the better values lie.
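A minimal sketch of the search loop. It assumes a function you write yourself, here called validation_accuracy(nh, lambda) (hypothetical), that trains the network with the given settings and returns its accuracy on the validation set. The plot draws one curve per lambda value against the number of hidden units.

```r
# Evaluate every (hidden units, lambda) pair with a user-supplied function.
grid_search <- function(n_hidden, lambdas, validation_accuracy) {
  acc <- matrix(NA_real_, length(n_hidden), length(lambdas))
  for (i in seq_along(n_hidden))
    for (j in seq_along(lambdas))
      acc[i, j] <- validation_accuracy(n_hidden[i], lambdas[j])
  best <- which(acc == max(acc), arr.ind = TRUE)[1, ]   # first best cell
  list(accuracy = acc, n_hidden = n_hidden, lambdas = lambdas,
       best_hidden = n_hidden[best[1]], best_lambda = lambdas[best[2]])
}

# One curve per lambda: validation accuracy versus number of hidden units.
plot_grid <- function(result) {
  cols <- seq_along(result$lambdas)
  matplot(result$n_hidden, result$accuracy, type = "b", pch = 1, lty = 1,
          col = cols, xlab = "Number of hidden units",
          ylab = "Validation accuracy")
  legend("bottomright", legend = paste("lambda =", result$lambdas),
         col = cols, pch = 1, lty = 1)
}
```

Because training starts from random initial weights, results vary from run to run; averaging several runs per setting gives smoother curves.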
Try to interpret what the hidden units are learning by drawing images of the hidden unit weights after training with 10 to 20 hidden units. Do you see any patterns in these weights that make sense?
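A minimal sketch for drawing the hidden-unit weights with base R graphics. It assumes the hidden-layer weight matrix V is 257 x nh with the bias weights in the first row, and that the 256 pixel values of each zip-code image are stored row by row starting at the top-left corner. Since image() places z[x, y] at (x, y) with y increasing upward, the matrix is transposed and its columns reversed to draw each unit upright. Function names are hypothetical.

```r
# Reshape 256 weights (stored row by row, top row first) for upright display.
weights_to_image <- function(v) {
  m <- matrix(v, 16, 16, byrow = TRUE)  # m[r, c]: row r from the top, column c
  t(m)[, 16:1]                          # z[x, y] = m[17 - y, x]
}

# V: 257 x nh hidden-layer weights, bias assumed to be in row 1.
draw_hidden_units <- function(V, ncol_panels = 5) {
  nh <- ncol(V)
  op <- par(mfrow = c(ceiling(nh / ncol_panels), ncol_panels),
            mar = c(0.5, 0.5, 1.5, 0.5))
  on.exit(par(op))                      # restore the previous layout
  for (j in seq_len(nh)) {
    image(weights_to_image(V[-1, j]), col = gray.colors(64), axes = FALSE,
          main = paste("Hidden unit", j))
  }
}
```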
Code to perform the drawing and image preparation is in digitsInteractiveStart.R. This code updates the display of the 16 x 16 image after every 50 recorded mouse coordinates. You might need to install the cairoDevice package first.
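The supplied digitsInteractiveStart.R already prepares the image, so the following is only a stand-alone sketch of the idea, not a description of that file. It assumes the recorded mouse coordinates have been scaled to [0, 1] with y increasing upward, and that pixels should run from -1 for background to 1 for ink, stored row by row from the top, as assumed for the zip-code files. The function name rasterize_strokes is hypothetical.

```r
# Turn recorded mouse positions into a 256-vector laid out like a zip-code row.
# Assumes x, y in [0, 1] with y pointing up; background -1, ink 1.
rasterize_strokes <- function(x, y) {
  col <- pmin(16, pmax(1, floor(x * 16) + 1))        # 1..16, left to right
  row <- pmin(16, pmax(1, floor((1 - y) * 16) + 1))  # 1..16, top to bottom
  img <- matrix(-1, 16, 16)
  img[cbind(row, col)] <- 1
  as.vector(t(img))                                  # row by row, top first
}
```

Isolated mouse points give much thinner strokes than the scanned training digits, so thickening or smoothing the result may improve classification.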
Describe how your code works, and include figures of examples. Show examples that are successfully classified and ones that are not successfully classified.
CS545: Assignment 6 Name: ________________________
Grade: ___ out of 100 points
======================================================================
Neural Network Code (30 points total).
(20 points): Correct R code for logistic regression neural network
with weight magnitude penalty.
( 5 points): Discussion of simple test.
======================================================================
Classifying Zip Code Data with LDA (20 points total).
( 5 points): Correctly download, read, and prepare zip code data.
( 5 points): Correct code for partitioning data into training,
validation, and testing sets.
( 5 points): Correct application of LDA to zip code data.
( 5 points): Observations and discussion.
======================================================================
Classifying Zip Code Data with Logistic Regression Neural Network (30 points total).
( 5 points): Results of training with 10 hidden units and lambda of 0.001.
(10 points): Search for good values for the number of hidden units
and lambda. Describe the procedure and the results. Use figures.
(10 points): Display and interpretation of hidden unit weights.
( 5 points): Observations and discussion.
======================================================================
Interactive Digit Drawing and Classification (15 points total).
(10 points): Correct modification of supplied R code.
( 5 points): Observations and discussion.
======================================================================
Technical Writing Details. (5 points total)
( 5 points): Table of contents, heading and subheading structure
easy to follow and clearly divides report into logical
sections;
code, math, figure captions, and all other aspects of
report are well-written and formatted;
conclusion section included, describing what you learned
and which aspects were most difficult;
list of references and citations in report;
correct spelling and grammar.
======================================================================
Extra Credit. (1 point each)