Assignment 5: Neural networks
Due: October 31st at 11:59pm
Part 1: Multi-layer perceptrons
In the first few slides about neural networks (also section 7.1 in chapter e-7) we discussed the expressive power of multi-layer perceptrons with a “sign” activation function. Describe in detail a multi-layer perceptron that implements the following decision boundary:
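As a reminder of the construction (the target boundary from the figure is not reproduced here), a network with sign activations can carve out a region as an intersection of half-planes: each hidden unit tests one half-plane, and the output unit implements an AND. The triangle used below (x1 > 0, x2 > 0, x1 + x2 < 1) is purely a hypothetical example, not the boundary you are asked to implement:

```python
import numpy as np

def sign(z):
    # sign activation: +1 if z >= 0, else -1
    return np.where(z >= 0, 1.0, -1.0)

def mlp_region(x):
    """Classify points inside the triangle x1 > 0, x2 > 0, x1 + x2 < 1
    as +1, using one hidden layer of sign units (hypothetical example)."""
    W1 = np.array([[ 1.0,  0.0],   # fires when x1 > 0
                   [ 0.0,  1.0],   # fires when x2 > 0
                   [-1.0, -1.0]])  # fires when x1 + x2 < 1
    b1 = np.array([0.0, 0.0, 1.0])
    h = sign(W1 @ x + b1)          # three half-plane indicators in {-1, +1}
    # Output unit: sum of hidden units is 3 only when all three fire,
    # so a threshold of 2.5 implements the AND of the constraints.
    w2 = np.array([1.0, 1.0, 1.0])
    b2 = -2.5
    return sign(w2 @ h + b2)
```

For your answer, specify the weights and biases of each layer in the same explicit way for the boundary shown in the figure.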
Part 2: Exploring neural networks for digit classification
In this segment of the assignment we will explore classification of handwritten digits with neural networks. For that task, we will use part of the MNIST dataset, which is very commonly used in the machine learning community.
Your task is to explore various aspects of multi-layer neural networks using this dataset.
We have prepared a small subset of the data with a given split into training and test sets.
Here's what you need to do:
The provided code uses a bias only in the first layer. Modify the code so that it correctly uses a bias in all layers. This part is worth only 5 points, and you can do the rest of the assignment with the original version of the code. In an appendix of your report, indicate how you changed the code to handle the bias correctly.
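For reference, the forward pass of a network with a per-layer bias looks like the following sketch (the function and variable names here are illustrative and do not match the provided class):

```python
import numpy as np

def logistic(z):
    # logistic (sigmoid) activation
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    """Forward pass in which EVERY layer has its own bias vector,
    not just the first one (sketch, not the provided implementation)."""
    a = x
    for W, b in zip(weights, biases):
        a = logistic(W @ a + b)   # bias added at every layer
    return a
```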
Plot network accuracy as a function of the number of hidden units for a network with a single hidden layer and a logistic activation function. Use a range of values where the network displays both under-fitting and over-fitting.
Plot network accuracy as a function of the number of hidden units for a network with two hidden layers and a logistic activation function. Here, too, demonstrate performance over a range of values where the network exhibits both under-fitting and over-fitting. Does this dataset benefit from the use of more than one hidden layer?
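The two sweeps above have the same shape; a skeleton like the following may help organize them (here `train_and_score` is a hypothetical stand-in for training the provided network with a given hidden-layer size and returning test accuracy):

```python
import numpy as np

def sweep_hidden_units(train_and_score, sizes):
    """Run train_and_score for each hidden-layer size and collect
    test accuracies for plotting accuracy vs. number of hidden units."""
    accs = [train_and_score(h) for h in sizes]
    return np.array(sizes), np.array(accs)

# Typical usage (matplotlib assumed available):
# sizes, accs = sweep_hidden_units(my_train_fn, [2, 5, 10, 20, 50, 100, 200])
# plt.plot(sizes, accs); plt.xscale("log")
```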
Add weight decay regularization to the neural network class (explain in your report how you did it). Does the network demonstrate less over-fitting on this dataset with the addition of weight decay?
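One standard form of weight decay is an L2 penalty (lam/2)*||W||^2 added to the loss, which contributes lam*W to the gradient of each weight matrix. A single update step then looks like this sketch (assuming plain gradient descent; biases are conventionally left unregularized):

```python
import numpy as np

def sgd_step_with_weight_decay(W, grad, lr=0.1, lam=1e-3):
    """One gradient step with L2 weight decay: the penalty (lam/2)*||W||^2
    adds lam*W to the gradient, shrinking the weights toward zero."""
    return W - lr * (grad + lam * W)
```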
The provided implementation uses the same activation function in every layer. For solving regression problems, the output layer of the network needs a linear activation function. Explain why, and describe what changes need to be made in the code.
Submit your report via Canvas. Python code can be displayed in your report if it is short and helps the reader understand what you have done. The sample LaTeX document provided in assignment 1 shows how to display Python code. Submit the Python code that was used to generate the results as a file called
assignment5.py (you can split the code into several .py files; Canvas allows you to submit multiple files). Typing
$ python assignment5.py
should generate all the tables/plots used in your report.
A few general guidelines for this and future assignments in the course:
Your answers should be concise and to the point.
You need to use LaTeX to write the report.
The report should be well structured, and the writing should be clear, with good grammar and correct spelling; math, code, figures, and captions should be well formatted (every figure and table needs a caption that explains what is being shown).
Whenever you use information from the web or published papers, a reference should be provided. Failure to do so is considered plagiarism.
Always describe the method you used to produce a given result in sufficient detail that the reader can reproduce your results from the description alone. You can use a few lines of Python code or pseudo-code.
You can provide results in the form of tables, figures, or text, whichever is most appropriate for a given problem. There are no rules about how much space each answer should take, BUT we will take off points if we have to wade through a lot of redundant data.
Any machine learning paper includes a discussion of the results; similarly, you are expected to reason about your results in your assignments.
We will take off points if these guidelines are not followed.
Grading sheet for assignment 5
Part 1: 15 points.
Part 2: 85 points.
( 5 points): Fixing the code so it handles the bias term correctly
(25 points): Exploration of a network with a single hidden layer
(25 points): Exploration of a network with two hidden layers
(15 points): Regularization
(15 points): Linear activation function