====== Assignment 2 ======
  
Due date:  Friday 9/17 at 11:59pm
  
==== Datasets ====
  * The **adatron** version of the perceptron described next.
  
In each case make sure that your implementation of the classifier includes a bias term (in slide set 2 and page 7 in the book you will find guidance on how to add a bias term to an algorithm that is expressed without one).
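
For concreteness, here is one common way to do this in NumPy (a sketch of the augmentation trick; check slide set 2 for the exact convention the course uses): append a constant feature equal to 1 to every example, so that the corresponding component of the weight vector plays the role of the bias.

<code python>
import numpy as np

# Sketch: add a bias term by augmenting each example with a constant
# feature equal to 1; the last component of the learned weight vector
# then acts as the bias.  (The exact convention may differ in the slides.)
X = np.array([[1.0, 2.0],
              [3.0, 4.0]])                        # toy data, shape (N, d)
X_aug = np.hstack([X, np.ones((X.shape[0], 1))])  # shape (N, d + 1)
</code>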
  
=== The adatron ===
  
Before we get to the adatron, we will derive an alternative form of the perceptron algorithm --- the dual perceptron algorithm.  All we need to look at is the weight update rule:
  
$$\mathbf{w} \rightarrow \mathbf{w} + \eta y_i \mathbf{x}_i.$$
Applying this update repeatedly starting from $\mathbf{w} = 0$ shows that the weight vector can always be written as a weighted sum of the training examples:

$$\mathbf{w} = \sum_{i=1}^{N} \alpha_i y_i \mathbf{x}_i,$$
where $\alpha_i$ are positive numbers that describe the magnitude of the contribution $\mathbf{x}_i$ is making to the weight vector, and $N$ is the number of training examples.
  
Therefore to initialize $\mathbf{w}$ to 0, we simply initialize $\alpha_i = 0$ for $i = 1,\ldots,N$.  In terms of the variables $\alpha_i$, the perceptron update rule becomes:
  
$$\alpha_i \rightarrow \alpha_i + \eta$$

whenever example $i$ is misclassified.  The adatron modifies this rule: rather than adding a fixed amount, it proposes a change $\delta \alpha$ to $\alpha_i$ whose size depends on the margin $\gamma$ with which example $i$ is currently classified.
The variable $\eta$ plays the role of the learning rate employed in the perceptron algorithm, and $\delta \alpha$ is the proposed magnitude of change in $\alpha_i$.
We note that the adatron tries to maintain a //sparse// representation in terms of the training examples by keeping many $\alpha_i$ equal to zero.  The adatron converges to a special case of the SVM algorithm that we will learn later in the semester; this algorithm tries to maximize the margin with which each example is classified, which is captured by the variable $\gamma$ in the algorithm (notice that the magnitude of change proposed for each $\alpha_i$ becomes smaller as the margin increases towards 1).
**Note:** if you observe overflow issues when running the adatron, add an upper bound on the value of $\alpha_i$.
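
To make the dual-variable bookkeeping concrete, here is a minimal NumPy sketch of an adatron-style trainer.  It assumes the standard adatron update $\delta \alpha = \eta (1 - \gamma_i)$, where $\gamma_i$ is the margin of example $i$, with $\alpha_i$ clipped at zero (and optionally capped, per the note above); the function name, interface, and stopping rule are illustrative assumptions, and the exact update should follow the pseudocode given above.

<code python>
import numpy as np

def adatron(X, y, eta=0.1, epochs=100, alpha_max=None):
    """Sketch of an adatron-style trainer in the dual representation.

    X: (N, d) array of examples; y: (N,) array of +/-1 labels.
    alpha_max, if given, caps each alpha_i to avoid overflow.
    """
    N = X.shape[0]
    K = X @ X.T                  # Gram matrix of inner products
    alpha = np.zeros(N)          # alpha_i = 0  <=>  w = 0
    for _ in range(epochs):
        for i in range(N):
            # margin of example i under w = sum_j alpha_j y_j x_j
            gamma = y[i] * np.sum(alpha * y * K[:, i])
            delta_alpha = eta * (1.0 - gamma)      # proposed change
            # clipping at zero keeps the representation sparse
            alpha[i] = max(0.0, alpha[i] + delta_alpha)
            if alpha_max is not None:
                alpha[i] = min(alpha[i], alpha_max)
    w = (alpha * y) @ X          # recover the primal weight vector
    return w, alpha
</code>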
  
Here's what you need to do:
  
  - Implement the pocket algorithm and the adatron; each classifier should be implemented in a separate Python class (they can all be in the same module) and should use the same interface as the code provided for the perceptron algorithm, i.e., provide the same methods.  Make sure each classifier you use (including the original version of the perceptron) implements a bias term.
  - Compare the performance of these variants of the perceptron on the Gisette and QSAR datasets by computing an estimate of the out-of-sample error on a sample of the data that you reserve for testing (the test set).  In each case reserve about 60% of the data for training and 40% for testing.  To gain more confidence in our error estimates, repeat this experiment using 10 random splits of the data into training/test sets (a sketch of one way to organize this appears below).  Report the average error and its standard deviation in a [[https://en.wikibooks.org/wiki/LaTeX/Tables|LaTeX table]].  Is there a version of the perceptron that appears to perform better?  (In answering this, consider the differences in performance you observe in comparison to the standard deviation.)
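
Here is a minimal sketch of one way to organize the repeated splits, assuming classifiers expose ''fit(X, y)'' and ''predict(X)'' methods (hypothetical names; adapt them to the interface of the provided perceptron code):

<code python>
import numpy as np

def estimate_test_error(make_classifier, X, y,
                        n_splits=10, train_frac=0.6, seed=0):
    """Mean and standard deviation of test error over random splits.

    make_classifier is a zero-argument factory returning a fresh
    classifier; fit/predict are assumed method names.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    n_train = int(train_frac * n)
    errors = []
    for _ in range(n_splits):
        perm = rng.permutation(n)          # fresh random split each time
        train, test = perm[:n_train], perm[n_train:]
        clf = make_classifier()
        clf.fit(X[train], y[train])
        errors.append(np.mean(clf.predict(X[test]) != y[test]))
    return np.mean(errors), np.std(errors)
</code>

For example, ''estimate_test_error(lambda: Adatron(), X, y)'' would return the average error and its standard deviation over ten 60/40 splits for a hypothetical ''Adatron'' class.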
  
  