In this segment of the assignment we will explore classification of handwritten digits with neural networks. For that task, we will use part of the [[http://yann.lecun.com/exdb/mnist/ |MNIST]] dataset, which is very commonly used in the machine learning community.
Your task is to explore various aspects of multi-layer neural networks using this dataset.
We have prepared a {{ :assignments:mnist.tar.gz |small subset}} of the data with a given split into training and test data.
Normalize the data by dividing the features by the maximum value, which maps them to the range [0,1] (since the minimum is 0).
As a basis for your implementation, use the neural network code I showed in class.
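A minimal loading-and-normalization sketch in Python/NumPy follows. The file names are assumptions, since they depend on what the archive actually unpacks to; adjust them accordingly.

<code python>
import numpy as np

# Hypothetical file names -- replace with the actual contents of the
# unpacked mnist.tar.gz archive.
X_train = np.loadtxt("mnist_train_features.txt")
y_train = np.loadtxt("mnist_train_labels.txt")
X_test = np.loadtxt("mnist_test_features.txt")
y_test = np.loadtxt("mnist_test_labels.txt")

# Raw pixel intensities start at 0, so dividing by the maximum maps the
# features onto [0, 1]. Use the training-set maximum for both splits so
# the test data is scaled consistently.
max_val = X_train.max()
X_train = X_train / max_val
X_test = X_test / max_val
</code>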
Here's what you need to do:
* The code that was provided only uses a bias in the first layer. Modify the code so that it correctly uses a bias for all layers (one possible approach is sketched after this list). This part is only worth 5 points, and you can do the rest of the assignment with the original version of the code.
* Plot network accuracy as a function of the number of hidden units for a single-layer network with a logistic activation function. Use a range of values over which the network exhibits both under-fitting and over-fitting (a sweep-and-plot skeleton is sketched after this list).
* Plot network accuracy as a function of the number of hidden units for a two-layer network with a logistic activation function. Here, too, demonstrate performance over a range of values where the network exhibits both under-fitting and over-fitting. Does this dataset benefit from the use of more than one layer?
* Add weight decay regularization to the neural network class you used, and explain in your report how you did it (the usual L2 formulation is sketched after this list). Does the network exhibit less over-fitting on this dataset once weight decay is added?
* The provided implementation uses the same activation function in every layer. For solving regression problems we need to use a linear activation function to produce the output of the network. Explain why, and describe what changes need to be made in the code (a per-layer-activation sketch appears after this list).
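For the bias modification, one possible approach (among several) is to append a constant input of 1 before every layer, not just the first, so that each weight matrix carries an extra bias row. The sketch below is illustrative and does not reproduce the in-class code; the function names and the list-of-matrices representation are assumptions:

<code python>
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_with_bias(X, weights):
    # Each W has shape (n_inputs + 1, n_outputs); the extra row holds
    # the bias weights and multiplies the appended column of ones.
    a = X
    for W in weights:
        ones = np.ones((a.shape[0], 1))
        a = logistic(np.hstack([a, ones]) @ W)
    return a
</code>

If you take this route, remember that in the backward pass the delta associated with the appended bias column must be dropped before propagating the error to the previous layer, since the constant bias input carries no error further back.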
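For the two plotting items, a minimal sweep-and-plot skeleton could look like the following. Here train_and_score is a placeholder: it stands for whatever routine you build around the in-class network class that trains a fresh model with h hidden units and returns its test-set accuracy:

<code python>
import matplotlib.pyplot as plt

def sweep_hidden_units(train_and_score, hidden_sizes):
    # Train one network per hidden-layer size and record test accuracy.
    accuracies = [train_and_score(h) for h in hidden_sizes]
    plt.plot(hidden_sizes, accuracies, marker="o")
    plt.xscale("log")  # hidden sizes usually span orders of magnitude
    plt.xlabel("number of hidden units")
    plt.ylabel("test accuracy")
    plt.show()
</code>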
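For weight decay, the usual L2 formulation adds half of lambda times the sum of squared weights to the error function, so the gradient of every weight gains an extra lambda * W term that shrinks it toward zero at each step. A minimal sketch, assuming the weights and their gradients are kept as parallel lists of NumPy arrays:

<code python>
import numpy as np

def decayed_update(weights, grads, eta, lam):
    # Plain gradient descent plus the decay term lam * W; the update
    # modifies the weight arrays in place.
    for W, g in zip(weights, grads):
        W -= eta * (g + lam * W)
</code>

In practice the bias entries are usually excluded from the decay term, since penalizing them does little to combat over-fitting.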
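As a hint for the last item: the logistic function is bounded to (0, 1), which is fine for class probabilities but cannot produce arbitrary real-valued regression targets. One way to support this in code is to store one activation function per layer and use the identity at the output. The sketch below is illustrative, not the in-class code (biases omitted for brevity):

<code python>
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def identity(z):
    # Linear output: unbounded, and its derivative is 1, which also
    # simplifies the output-layer delta in backpropagation.
    return z

def forward_per_layer(X, weights, activations):
    # One activation function per layer; for regression, pass identity
    # as the last entry of `activations`.
    a = X
    for W, act in zip(weights, activations):
        a = act(a @ W)
    return a
</code>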
- | |||
- | The code that was provided does not really have a bias for all but the first layer. For 5 extra points, modify the code so that it correctly uses a bias for all layers. | ||