====== Assignment 2 ======
  
Due date:  Friday 9/17 at 11:59pm
  
==== Datasets ====
  * The **adatron** version of the perceptron described next.
  
In each case make sure that your implementation of the classifier includes a bias term (in slide set 2 and page 7 in the book you will find guidance on how to add a bias term to an algorithm that is expressed without one).
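
For concreteness, here is one common way to do this in NumPy (a sketch of the augmentation trick; check slide set 2 for the exact convention the course uses): append a constant feature equal to 1 to every example, so that the corresponding component of the weight vector plays the role of the bias.

<code python>
import numpy as np

# Sketch: add a bias term by augmenting each example with a constant
# feature equal to 1; the last component of the learned weight vector
# then acts as the bias.  (The exact convention may differ in the slides.)
X = np.array([[1.0, 2.0],
              [3.0, 4.0]])                        # toy data, shape (N, d)
X_aug = np.hstack([X, np.ones((X.shape[0], 1))])  # shape (N, d + 1)
</code>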
  
=== The adatron ===
  
Before we get to the adatron, we will derive an alternative form of the perceptron algorithm --- the dual perceptron algorithm.  All we need to look at is the weight update rule:
  
$$\mathbf{w} \rightarrow \mathbf{w} + \eta y_i \mathbf{x}_i.$$
Applying this update repeatedly starting from $\mathbf{w} = 0$ shows that the weight vector can always be written as a weighted sum of the training examples:

$$\mathbf{w} = \sum_{i=1}^{N} \alpha_i y_i \mathbf{x}_i,$$
where $\alpha_i$ are positive numbers that describe the magnitude of the contribution $\mathbf{x}_i$ is making to the weight vector, and $N$ is the number of training examples.
  
Therefore to initialize $\mathbf{w}$ to 0, we simply initialize $\alpha_i = 0$ for $i = 1,\ldots,N$.  In terms of the variables $\alpha_i$, the perceptron update rule becomes:
  
$$\alpha_i \rightarrow \alpha_i + \eta$$

whenever example $i$ is misclassified.  The adatron modifies this rule: rather than adding a fixed amount, it proposes a change $\delta \alpha$ to $\alpha_i$ whose size depends on the margin $\gamma$ with which example $i$ is currently classified.
The variable $\eta$ plays the role of the learning rate employed in the perceptron algorithm, and $\delta \alpha$ is the proposed magnitude of change in $\alpha_i$.
We note that the adatron tries to maintain a //sparse// representation in terms of the training examples by keeping many $\alpha_i$ equal to zero.  The adatron converges to a special case of the SVM algorithm that we will learn later in the semester; this algorithm tries to maximize the margin with which each example is classified, which is captured by the variable $\gamma$ in the algorithm (notice that the magnitude of change proposed for each $\alpha_i$ becomes smaller as the margin increases towards 1).
**Note:** if you observe overflow issues when running the adatron, add an upper bound on the value of $\alpha_i$.
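
To make the dual-variable bookkeeping concrete, here is a minimal NumPy sketch of an adatron-style trainer.  It assumes the standard adatron update $\delta \alpha = \eta (1 - \gamma_i)$, where $\gamma_i$ is the margin of example $i$, with $\alpha_i$ clipped at zero (and optionally capped, per the note above); the function name, interface, and stopping rule are illustrative assumptions, and the exact update should follow the pseudocode given above.

<code python>
import numpy as np

def adatron(X, y, eta=0.1, epochs=100, alpha_max=None):
    """Sketch of an adatron-style trainer in the dual representation.

    X: (N, d) array of examples; y: (N,) array of +/-1 labels.
    alpha_max, if given, caps each alpha_i to avoid overflow.
    """
    N = X.shape[0]
    K = X @ X.T                  # Gram matrix of inner products
    alpha = np.zeros(N)          # alpha_i = 0  <=>  w = 0
    for _ in range(epochs):
        for i in range(N):
            # margin of example i under w = sum_j alpha_j y_j x_j
            gamma = y[i] * np.sum(alpha * y * K[:, i])
            delta_alpha = eta * (1.0 - gamma)      # proposed change
            # clipping at zero keeps the representation sparse
            alpha[i] = max(0.0, alpha[i] + delta_alpha)
            if alpha_max is not None:
                alpha[i] = min(alpha[i], alpha_max)
    w = (alpha * y) @ X          # recover the primal weight vector
    return w, alpha
</code>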
  
Here's what you need to do:
  
  - Implement the pocket algorithm and the adatron; each classifier should be implemented in a separate Python class (they can all be in the same module) and should use the same interface as the code provided for the perceptron algorithm, i.e., provide the same methods.  Make sure each classifier you use (including the original version of the perceptron) implements a bias term.
  - Compare the performance of these variants of the perceptron on the Gisette and QSAR datasets by computing an estimate of the out-of-sample error on a sample of the data that you reserve for testing (the test set).  In each case reserve about 60% of the data for training and 40% for testing.  To gain more confidence in our error estimates, repeat this experiment using 10 random splits of the data into training/test sets (a sketch of one way to organize this appears below).  Report the average error and its standard deviation in a [[https://en.wikibooks.org/wiki/LaTeX/Tables|LaTeX table]].  Is there a version of the perceptron that appears to perform better?  (In answering this, consider the differences in performance you observe in comparison to the standard deviation.)
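
Here is a minimal sketch of one way to organize the repeated splits, assuming classifiers expose ''fit(X, y)'' and ''predict(X)'' methods (hypothetical names; adapt them to the interface of the provided perceptron code):

<code python>
import numpy as np

def estimate_test_error(make_classifier, X, y,
                        n_splits=10, train_frac=0.6, seed=0):
    """Mean and standard deviation of test error over random splits.

    make_classifier is a zero-argument factory returning a fresh
    classifier; fit/predict are assumed method names.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    n_train = int(train_frac * n)
    errors = []
    for _ in range(n_splits):
        perm = rng.permutation(n)          # fresh random split each time
        train, test = perm[:n_train], perm[n_train:]
        clf = make_classifier()
        clf.fit(X[train], y[train])
        errors.append(np.mean(clf.predict(X[test]) != y[test]))
    return np.mean(errors), np.std(errors)
</code>

For example, ''estimate_test_error(lambda: Adatron(), X, y)'' would return the average error and its standard deviation over ten 60/40 splits for a hypothetical ''Adatron'' class.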
  
  