~~NOTOC~~

======== Assignment 5: Neural networks ===========

Due:  October 31st at 11:59pm

===== Part 1:  Multi-layer perceptrons =====
  
In the first few slides about neural networks (also section 7 in chapter e-7) we discussed the expressive power of multi-layer perceptrons with a "sign" activation function.  Describe in detail a multi-layer perceptron that implements the following decision boundary:

{{ :assignments:boundary.png?200 |}}
  
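As a (hypothetical) illustration of the kind of construction this asks for, the sketch below implements a different, simpler region: the intersection of the half-planes $x_1 > 0$ and $x_2 > 0$.  The two-layer structure and all weights are made up for this example; the boundary above will require your own choice of hidden units and weights.

<code python>
import numpy as np

def sign(z):
    # sign activation: +1 on one side of a hyperplane, -1 on the other
    return np.where(z >= 0, 1.0, -1.0)

def mlp_sign(x, W1, b1, w2, b2):
    """Two-layer perceptron with sign activations."""
    h = sign(W1 @ x + b1)      # each hidden unit tests one half-plane
    return sign(w2 @ h + b2)   # the output unit combines the tests

# Made-up weights implementing x1 > 0 AND x2 > 0 (the first quadrant):
W1 = np.array([[1.0, 0.0],    # half-plane x1 > 0
               [0.0, 1.0]])   # half-plane x2 > 0
b1 = np.array([0.0, 0.0])
w2 = np.array([1.0, 1.0])
b2 = -1.5                     # h1 + h2 - 1.5 >= 0 only when both units output +1

print(mlp_sign(np.array([0.5, 0.5]), W1, b1, w2, b2))    # 1.0  (inside)
print(mlp_sign(np.array([-0.5, 0.5]), W1, b1, w2, b2))   # -1.0 (outside)
</code>
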
===== Part 2:  Exploring neural networks for digit classification =====
  
In this segment of the assignment we will explore classification of handwritten digits with neural networks.  For that task, we will use part of the [[http://yann.lecun.com/exdb/mnist/ |MNIST]] dataset, which is very commonly used in the machine learning community.
Your task is to explore various aspects of multi-layer neural networks using this dataset.
We have prepared a {{ :assignments:mnist.tar.gz |small subset}} of the data with a given split into training and test data.
  
Here's what you need to do:
  
  * The code that was provided does not have a bias for all but the first layer.  Modify the code so that it correctly uses a bias for all layers (see the first sketch after this list).  This part is only worth 5 points, and you can do the rest of the assignment with the original version of the code.
  * Plot network accuracy as a function of the number of hidden units for a single-layer network with a logistic activation function.  Use a range of values where the network displays both under-fitting and over-fitting (a sketch of such a sweep is shown below).
  * Plot network accuracy as a function of the number of hidden units for a two-layer network with a logistic activation function.  Here, too, demonstrate performance over a range of values where the network exhibits both under-fitting and over-fitting.  Does this dataset benefit from the use of more than one layer?
  * Add weight decay regularization to the neural network class you used, and explain in your report how you did it (a sketch of the modified update follows the list).  Does the network demonstrate less over-fitting on this dataset with the addition of weight decay?
  * The provided implementation uses the same activation function in each layer.  For solving regression problems we need to use a linear activation function to produce the output of the network (see the last sketch below).  Explain why, and what changes need to be made in the code.
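
For the bias item, here is a minimal sketch of a forward pass that applies a bias at every layer.  The list-of-matrices representation (''W'', ''b'') and the function names are hypothetical stand-ins for however the provided class stores its parameters:

<code python>
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W, b):
    """Forward pass with a bias at *every* layer.

    W is a list of weight matrices and b a list of bias vectors, one
    pair per layer (a hypothetical stand-in for the provided class).
    """
    a = x
    for W_l, b_l in zip(W, b):
        a = logistic(W_l @ a + b_l)  # bias enters every layer, not just the first
    return a
</code>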
  
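For the two plotting items, a sweep over hidden-layer sizes could look like the sketch below.  ''NeuralNetwork'', its ''layer_sizes'' and ''activation'' arguments, and the ''fit''/''score'' methods are hypothetical placeholders for the provided class, and the list of sizes is only a starting point to widen or shrink until both regimes are visible:

<code python>
import matplotlib.pyplot as plt

hidden_sizes = [2, 5, 10, 20, 50, 100, 200]  # adjust until both regimes appear
train_acc, test_acc = [], []

for h in hidden_sizes:
    # hypothetical interface: 784 inputs (28x28 pixels), h hidden units, 10 classes
    net = NeuralNetwork(layer_sizes=[784, h, 10], activation='logistic')
    net.fit(X_train, y_train)
    train_acc.append(net.score(X_train, y_train))
    test_acc.append(net.score(X_test, y_test))

plt.plot(hidden_sizes, train_acc, 'o-', label='training accuracy')
plt.plot(hidden_sizes, test_acc, 's-', label='test accuracy')
plt.xscale('log')
plt.xlabel('number of hidden units')
plt.ylabel('accuracy')
plt.legend()
plt.savefig('hidden_units.png')
</code>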
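
For weight decay, a standard approach is to add the penalty $\frac{\lambda}{2}\sum_l \|W_l\|^2$ to the training objective, which contributes $\lambda W_l$ to the gradient of each weight matrix.  A sketch of the resulting update, under the same hypothetical list-of-matrices representation:

<code python>
def update_with_decay(W, b, grad_W, grad_b, eta, lam):
    """One gradient step with L2 weight decay (hypothetical update routine)."""
    for l in range(len(W)):
        # the penalty (lam/2) * ||W[l]||^2 adds lam * W[l] to the gradient,
        # shrinking the weights toward zero on every step
        W[l] -= eta * (grad_W[l] + lam * W[l])
        b[l] -= eta * grad_b[l]  # bias terms are conventionally not penalized
</code>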
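
For the regression item, note that a squashing activation such as the logistic confines the output to $(0, 1)$, which cannot match arbitrary real-valued targets; making the output layer linear removes that restriction.  A sketch, again under the hypothetical list-of-matrices representation:

<code python>
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_regression(x, W, b):
    """Forward pass for regression: nonlinear hidden layers, linear output."""
    a = x
    for W_l, b_l in zip(W[:-1], b[:-1]):
        a = logistic(W_l @ a + b_l)  # hidden layers keep the squashing function
    return W[-1] @ a + b[-1]         # output layer: identity (linear) activation
</code>
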
===== Submission =====
  
Submit your report via Canvas.  Python code can be displayed in your report if it is short and helps understand what you have done.  The sample LaTeX document provided in assignment 1 shows how to display Python code.  Submit the Python code that was used to generate the results as a file called ''assignment5.py'' (you can split the code into several .py files; Canvas allows you to submit multiple files).  Typing
  
<code>
$ python assignment5.py
</code>

should generate all the tables/plots used in your report.
  
===== Grading =====
A few general guidelines for this and future assignments in the course:
  
  * Your answers should be concise and to the point.
  * You need to use LaTeX to write the report.
  * The report should be well structured, the writing clear, with good grammar and correct spelling, and with good formatting of math, code, figures and captions (every figure and table needs a caption that explains what is being shown).
  * Whenever you use information from the web or published papers, a reference should be provided.  Failure to do so is considered plagiarism.
  * Always provide a description of the method you used to produce a given result in sufficient detail that the reader can reproduce your results on the basis of the description.  You can use a few lines of Python code or pseudo-code.
  * You can provide results in the form of tables, figures or text - whatever form is most appropriate for a given problem.  There are no rules about how much space each answer should take, BUT we will take off points if we have to wade through a lot of redundant data.
  * In any machine learning paper there is a discussion of the results.  There is a similar expectation for your assignments: reason about your results.

We will take off points if these guidelines are not followed.
  
<code>
Grading sheet for assignment 5
  
Part 1:  15 points.

Part 2:  85 points.
(25 points):  Exploration of a network with a single hidden layer
(25 points):  Exploration of a network with two hidden layers
(15 points):  How to add weight decay
(15 points):  Linear activation function
( 5 points):  Fixing the code so it handles the bias term correctly.
</code>