{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Assignment 5: Neural Networks\n",
    "\n",
    "Due: November 6th at 11:59pm"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exploring neural networks for digit classification \n",
    "\n",
    "In this assignment you will explore classification of handwritten digits with neural networks. For that task, we will use part of the MNIST dataset, which is very commonly used in the machine learning community. Your task is to explore various aspects of multi-layer neural networks using this dataset. We have prepared a [small subset](http://www.cs.colostate.edu/~cs545/fall16/lib/exe/fetch.php?media=assignments:mnist.tar.gz) of the data with a given split into training and test data.\n",
    "\n",
    "1. Plot network accuracy as a function of the number of hidden units for a single-layer network with a logistic activation function. Try to find a range of values where the network displays both under-fitting and over-fitting.  For a fixed architecture, explore accuracy as a function of the number of epochs used for training as well as the learning rate.\n",
    "2. Plot network accuracy as a function of the number of hidden units for a two-layer network with a logistic activation function, similarly to part 1 using a specific value of the learning rate and number of epochs. Does this dataset benefit from the use of more than one layer?\n",
    "3.  Add weight decay regularization to the neural network class (explain in your report how you did it). Does the network demonstrate less over-fitting on this dataset with the addition of weight decay?\n",
    "4. Modify the code to include the option of using the cross-entropy loss function instead of the quadratic loss function.  Run experiments and determine which activation function works best with the cross-entropy (consider logistic and ReLU activations for the hidden layers).  Explain in your writeup the required changes in the code.\n",
    "5. The code provided performs batch gradient descent.  Modify the code to perform stochastic gradient descent.  Explain in your report the change you made, and compare the performance of the resulting network in terms of accuracy and training time.\n",
    "6. The provided implementation uses the same activation function in each layer. For solving regression problems we need to use a linear activation function to produce the output of the network. Explain why, and what changes need to be made in the code.\n"
   ]
  },
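  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For reference, the loss functions named in parts 3 and 4 can be written out as follows. This is a sketch under one common convention, not necessarily the one used in the provided code. All symbols are assumptions introduced here: $N$ training examples, $K$ output units, one-hot target vectors $\\mathbf{y}_i$ with components $y_{ik}$, network outputs $\\hat{\\mathbf{y}}_i$ with components $\\hat{y}_{ik}$, and $\\mathbf{w}$ the vector of all network weights (biases are often left out of the penalty). Scaling factors such as $1/2$ and $1/N$ vary between implementations.\n",
    "\n",
    "The quadratic loss is\n",
    "\n",
    "$$\n",
    "E_{\\mathrm{quad}} = \\frac{1}{2N} \\sum_{i=1}^{N} ||\\hat{\\mathbf{y}}_i - \\mathbf{y}_i||^2,\n",
    "$$\n",
    "\n",
    "and the cross-entropy loss for outputs in $(0, 1)$, such as logistic output units, is\n",
    "\n",
    "$$\n",
    "E_{\\mathrm{ce}} = -\\frac{1}{N} \\sum_{i=1}^{N} \\sum_{k=1}^{K} \\left[ y_{ik} \\ln \\hat{y}_{ik} + (1 - y_{ik}) \\ln (1 - \\hat{y}_{ik}) \\right].\n",
    "$$\n",
    "\n",
    "Weight decay adds a penalty on the size of the weights to whichever loss $E$ is used, with a regularization strength $\\lambda \\geq 0$ (a hypothetical parameter name):\n",
    "\n",
    "$$\n",
    "E_{\\mathrm{reg}} = E + \\frac{\\lambda}{2} ||\\mathbf{w}||^2 = E + \\frac{\\lambda}{2} \\mathbf{w}^T \\mathbf{w}.\n",
    "$$"
   ]
  },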
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Your answer here."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Your Report\n",
    "\n",
    "Answer the questions in the cells reserved for that purpose.\n",
    "\n",
    "Mathematical equations should be written as LaTex equations; the assignment contains multiple examples of both inline formulas (such as the one exemplifying the notation for the norm of a vector $||\\mathbf{x}||$ and those that appear on separate lines, e.g.:\n",
    "\n",
    "$$\n",
    "||\\mathbf{x}|| = \\sqrt{\\mathbf{x}^T \\mathbf{x}}.\n",
    "$$\n",
    "\n",
    "\n",
    "\n",
    "### Submission\n",
    "\n",
    "Submit your report as a Jupyter notebook via Canvas.  Running the notebook should generate all the plots and results in your notebook.\n",
    "\n",
    "\n",
    "### Grading \n",
    "\n",
    "Here is what the grade sheet will look like for this assignment.  A few general guidelines for this and future assignments in the course:\n",
    "\n",
    "  * Your answers should be concise and to the point.  We will take off points if that is not the case.\n",
    "  * Always provide a description of the method you used to produce a given result in sufficient detail such that the reader can reproduce your results on the basis of the description.  You can use a few lines of python code or pseudo-code.\n",
    "\n",
    "\n",
    "Grading sheet for the assignment:\n",
    "\n",
    "```\n",
    "Neural networks.\n",
    "(15 points):  Exploration of a network with a single hidden layer\n",
    "(15 points):  Exploration of a network with two hidden layers\n",
    "(15 points):  Regularization\n",
    "(20 points):  Cross-entropy\n",
    "(20 points):  Stochastic gradient descent\n",
    "(15 points):  Linear activation function for regression\n",
    "```\n",
    "\n",
    "Grading will be based on the following criteria:\n",
    "\n",
    "  * Correctness of answers to math problems\n",
    "  * Math is formatted as LaTex equations\n",
    "  * Correct behavior of the required code\n",
    "  * Easy to understand plots \n",
    "  * Overall readability and organization of the notebook\n",
    "  * Effort in making interesting observations where requested.\n",
    "  * Conciseness.  Points may be taken off if the notebook is overly \n",
    "  "
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}