  * The **adatron** version of the perceptron described next.
In each case make sure that your implementation of the classifier includes a bias term (in slide set 2 and page 7 in the book you will find guidance on how to add a bias term to an algorithm that is expressed without one).
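A common way to do this is to append a constant feature of 1 to every example, so that the last component of the weight vector plays the role of the bias. Here is a minimal sketch of that trick; it assumes NumPy arrays, and ''add_bias'' is an illustrative helper, not part of the provided code:

<code python>
import numpy as np

def add_bias(X):
    # Append a constant feature of 1 to each row of X, so that for the
    # augmented weight vector w' = (w, b) we have <w', x'> = <w, x> + b.
    return np.hstack([X, np.ones((X.shape[0], 1))])
</code>

With this augmentation, an algorithm stated without a bias term can be run unchanged on the augmented data.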
=== The adatron ===
Before we get to the adatron, we will derive an alternative form of the perceptron algorithm -- the dual perceptron algorithm. All we need to look at is the weight update rule:
$$\mathbf{w} \rightarrow \mathbf{w} + \eta y_i \mathbf{x}_i.$$
Starting from $\mathbf{w} = 0$, each update adds $\eta y_i \mathbf{x}_i$ to the weight vector, so after any number of updates $\mathbf{w}$ can be written as
$$\mathbf{w} = \sum_{i=1}^{N} \alpha_i y_i \mathbf{x}_i,$$
where $\alpha_i$ are positive numbers that describe the magnitude of the contribution $\mathbf{x}_i$ is making to the weight vector, and $N$ is the number of training examples.
Therefore to initialize $\mathbf{w}$ to 0, we simply initialize $\alpha_i = 0$ for $i = 1,\ldots,N$. In terms of the variables $\alpha_i$, the perceptron
update rule becomes:
$$\alpha_i \rightarrow \alpha_i + \eta.$$
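As a concrete illustration, here is a minimal sketch of the dual perceptron built from these two rules. It assumes NumPy, $\pm 1$ labels, and examples already augmented with the constant bias feature; the function name and signature are illustrative, not part of the provided code. Note that the data enters only through the inner products $\langle \mathbf{x}_j, \mathbf{x}_i \rangle$ collected in the Gram matrix:

<code python>
import numpy as np

def dual_perceptron(X, y, eta=1.0, max_epochs=100):
    # Dual perceptron: maintain one coefficient alpha_i per training example
    # instead of the weight vector itself; w = sum_i alpha_i * y_i * x_i.
    N = X.shape[0]
    alpha = np.zeros(N)            # alpha_i = 0 for all i, i.e. w = 0
    G = X @ X.T                    # Gram matrix: G[j, i] = <x_j, x_i>
    for _ in range(max_epochs):
        mistakes = 0
        for i in range(N):
            # <w, x_i> expressed through the alphas:
            # sum_j alpha_j * y_j * <x_j, x_i>
            if y[i] * np.dot(alpha * y, G[:, i]) <= 0:
                alpha[i] += eta    # dual form of w -> w + eta * y_i * x_i
                mistakes += 1
        if mistakes == 0:          # no errors left on the training set
            break
    w = (alpha * y) @ X            # recover the primal weight vector if needed
    return alpha, w
</code>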
Here's what you need to do:
  - Implement the pocket algorithm and the adatron; each classifier should be implemented by a separate class, and use the same interface used in the code provided for the perceptron algorithm. Make sure each classifier you use (including the original version of the perceptron) implements a bias term (a sketch of a possible pocket class appears after this list).
  - Compare the performance of these variants of the perceptron on the Gisette and QSAR datasets by computing an estimate of the out-of-sample error on a sample of the data that you reserve for testing (the test set). In each case reserve about 60% of the data for training, and 40% for testing. To gain more confidence in our error estimates, repeat this experiment using 10 random splits of the data into training/test sets. Report the average error and its standard deviation in a [[https://en.wikibooks.org/wiki/LaTeX/Tables|LaTeX table]]. Is there a version of the perceptron that appears to perform better? (In answering this, consider the differences you observe in comparison to the standard deviation; a sketch of this evaluation loop also appears after the list.)
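For the first item, here is a minimal sketch of what a pocket classifier might look like. It assumes NumPy, $\pm 1$ labels, and a ''fit''/''predict'' interface -- the actual interface to match is the one in the provided perceptron code. In this common formulation, the pocket algorithm runs ordinary perceptron updates but keeps the best weight vector seen so far, judged by training error:

<code python>
import numpy as np

class Pocket:
    # Perceptron that keeps the best weight vector seen so far "in its
    # pocket": after each update, if the new weights achieve a lower
    # training error, they replace the pocketed ones.
    def __init__(self, eta=1.0, max_epochs=100):
        self.eta = eta
        self.max_epochs = max_epochs

    def _augment(self, X):
        # constant feature of 1 implements the bias term
        return np.hstack([X, np.ones((X.shape[0], 1))])

    def fit(self, X, y):
        Xb = self._augment(X)
        w = np.zeros(Xb.shape[1])
        self.w, best_err = w.copy(), np.inf
        for _ in range(self.max_epochs):
            for i in range(len(y)):
                if y[i] * np.dot(w, Xb[i]) <= 0:
                    w = w + self.eta * y[i] * Xb[i]
                    err = np.mean(np.sign(Xb @ w) != y)
                    if err < best_err:
                        self.w, best_err = w.copy(), err
        return self

    def predict(self, X):
        return np.sign(self._augment(X) @ self.w)
</code>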
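For the second item, here is a sketch of the evaluation loop, again assuming NumPy arrays and the ''fit''/''predict'' interface above; ''make_clf'' is a hypothetical factory argument so that each split trains a fresh classifier:

<code python>
import numpy as np

def evaluate(make_clf, X, y, n_splits=10, train_frac=0.6, seed=0):
    # Repeat: shuffle, split 60/40 into train/test, train a fresh
    # classifier, record its test error; return the mean error and
    # its standard deviation over the splits.
    rng = np.random.default_rng(seed)
    n_train = int(train_frac * len(y))
    errors = []
    for _ in range(n_splits):
        perm = rng.permutation(len(y))
        train, test = perm[:n_train], perm[n_train:]
        clf = make_clf().fit(X[train], y[train])
        errors.append(np.mean(clf.predict(X[test]) != y[test]))
    return np.mean(errors), np.std(errors)

# Usage, e.g.: mean_err, std_err = evaluate(lambda: Pocket(), X, y)
</code>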