Assignment 4: Neural networks

Due: October 30th at 11pm

Part 1: Activation functions

In this question we will explore the relationship between tanh and the logistic sigmoid as activation functions. Recall that $\sigma(x) = \frac{1}{1 + \exp(-x)}$ and $\tanh (x) = \frac{\exp(x) -\exp(-x)}{\exp(x) + \exp(-x)}$. Show that:

$\sigma(-x) = 1 - \sigma(x)$
$\tanh(x) = 2\sigma(2x) - 1$
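
Before writing the proofs, it can help to confirm both identities numerically; a check like this catches algebra slips but does not replace the derivation. The following is a minimal sketch, assuming NumPy is available; the helper name `sigmoid` and the test grid are illustrative choices, not part of the class code.

```python
import numpy as np


def sigmoid(x):
    """Logistic sigmoid, as defined above."""
    return 1.0 / (1.0 + np.exp(-x))


# Illustrative grid of test points (an arbitrary choice).
x = np.linspace(-5.0, 5.0, 101)

# Identity 1: sigma(-x) = 1 - sigma(x)
identity_1_holds = np.allclose(sigmoid(-x), 1.0 - sigmoid(x))

# Identity 2: tanh(x) = 2 * sigma(2x) - 1
identity_2_holds = np.allclose(np.tanh(x), 2.0 * sigmoid(2.0 * x) - 1.0)

print(identity_1_holds, identity_2_holds)
```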

Now for the interesting part: Given a network that uses tanh as the activation, show how to obtain a network that computes the same function with logistic sigmoid as the activation function.

Part 2: Multi-layer perceptrons

In the first few slides about neural networks (also section 7.1 in chapter e-7) we discussed the expressive power of multi-layer perceptrons (with a “sign” activation function). Describe in detail a multi-layer perceptron that implements the following decision boundary:

Part 3: Exploring neural networks for digit classification

In this part of the assignment you will classify 8x8 images of handwritten digits. Loading the data is easy, since scikit-learn provides an interface that loads this dataset into a usable format (there is an example of usage here). Your task for this part is to explore various aspects of multi-layer neural networks using this dataset. For simplicity, use 25 percent of the data for evaluating network performance, and reserve the rest for training. Normalize the data by dividing the features by the maximum value, which scales them to the range [0,1] (since the minimum is 0). As a basis for your implementation, use the neural network code I showed in class.
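
The data preparation described above can be sketched as follows. This is one possible setup, assuming scikit-learn's `load_digits` and `train_test_split`; the fixed `random_state` and the stratified split are assumptions added here for reproducibility and are not required by the assignment.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

# Load the 8x8 digit images as flat 64-dimensional feature vectors.
digits = load_digits()
X_raw, y = digits.data, digits.target

# Divide by the maximum feature value so that all features lie in [0, 1]
# (the minimum is already 0).
X = X_raw / X_raw.max()

# Hold out 25 percent of the examples for evaluation.
# random_state and stratify are assumptions, not assignment requirements.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

print(X_train.shape, X_test.shape)
```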

Here's what you need to do:

Plot network accuracy as a function of the number of hidden units for a single-layer network with a logistic activation function. Use a range of values where the network displays both under-fitting and over-fitting.
Plot network accuracy as a function of the number of hidden units for a two-layer network with a logistic activation function. Here, also demonstrate performance in a range of values where the network exhibits both under-fitting and over-fitting. Does this dataset benefit from the use of more than one layer?
Add weight decay regularization to the neural network class you used (explain in your report how you did it). Does the network demonstrate less over-fitting on this dataset with the addition of weight decay?
The provided implementation uses the same activation function in each layer. For solving regression problems we need to use a linear activation function to produce the output of the network. Explain why, and what changes need to be made in the code.
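
The experiments in the first two items share the same structure: train one network per hidden-layer size and record training and test accuracy. The sketch below shows that loop only. It uses scikit-learn's `MLPClassifier` as a stand-in, which is an assumption made so the example runs on its own; your results must come from the class code. The function name `sweep_hidden_units`, the sizes tried, and `max_iter=200` are all illustrative.

```python
import warnings

from sklearn.datasets import load_digits
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier


def sweep_hidden_units(hidden_sizes, X_train, y_train, X_test, y_test):
    """Return a list of (size, train accuracy, test accuracy) tuples."""
    results = []
    for h in hidden_sizes:
        # Stand-in model: one hidden layer of h logistic units.
        net = MLPClassifier(
            hidden_layer_sizes=(h,),
            activation="logistic",
            max_iter=200,
            random_state=0,
        )
        with warnings.catch_warnings():
            # Small networks may not converge within max_iter; silence that.
            warnings.simplefilter("ignore", ConvergenceWarning)
            net.fit(X_train, y_train)
        results.append(
            (h, net.score(X_train, y_train), net.score(X_test, y_test))
        )
    return results


digits = load_digits()
X = digits.data / digits.data.max()
X_train, X_test, y_train, y_test = train_test_split(
    X, digits.target, test_size=0.25, random_state=0
)

results = sweep_hidden_units([1, 4, 16, 64], X_train, y_train, X_test, y_test)
for h, train_acc, test_acc in results:
    print(f"hidden units: {h:3d}  train: {train_acc:.3f}  test: {test_acc:.3f}")
```

Plotting both the training and test columns against the number of hidden units makes under-fitting and over-fitting easier to tell apart than a plot of test accuracy alone.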

The provided code uses a bias only in the first layer. For 5 extra points, modify the code so that it correctly uses a bias in every layer.

Submission

Submit the PDF of your report and the associated code via Canvas. Python code can be displayed in your report if it is succinct (no more than a page or two). The LaTeX sample document shows how to display Python code in a LaTeX document. Code needs to be included so we can make sure that you implemented the algorithms and data analysis methodology correctly. Canvas allows you to submit multiple files for an assignment, so DO NOT submit an archive file (tar, zip, etc). Canvas will only allow you to submit PDFs (.pdf extension) or Python code (.py extension). For this assignment there is a strict 8 page limit (not including references and code that is provided as an appendix). We will take off points for reports that go over the page limit. In addition to the code snippets that you include in your report, make sure you provide complete code from which we can see exactly how your results were generated.

Grading

A few general guidelines for this and future assignments in the course:

Always describe the method you used to produce a given result in enough detail that the reader can reproduce your results from the description alone (UNLESS the method has been provided in class or is in the book). Your code needs to be provided in sufficient detail so we can make sure that your implementation is correct. The saying that “the devil is in the details” holds true for machine learning, and the details are sometimes what makes the difference between correct and incorrect results. If your code is more than a few lines, you can include it as an appendix to your report, or submit it as a separate file. Make sure your code is readable!
You can provide results in the form of tables, figures or text - whatever form is most appropriate for a given problem.
In any machine learning paper there is a discussion of the results. There is a similar expectation from your assignments that you reason about your results. For example, for the learning curve problem, what can you say on the basis of the observed learning curve?
Write succinct answers. We will take off points for rambling answers that are not to the point, and similarly if we have to wade through a lot of data or results that are not to the point.

Grading sheet for assignment 4

Part 1:  15 points.
( 5 points):  Properties of the logistic function and tanh
(10 points):  Relationship between weights of equivalent networks

Part 2:  10 points.

Part 3:  65 points.
(20 points):  Exploration of a network with a single hidden layer
(20 points):  Exploration of a network with two hidden layers
(15 points):  How to add weight decay
(10 points):  Linear activation function

Report structure, grammar and spelling:  10 points
(10 points):  Heading and subheading structure easy to follow and clearly divides report into logical sections.  
              Code, math, figure captions, and all other aspects of the report are well-written and formatted.
              Grammar, spelling, and punctuation.  Answers are clear and to the point.