assignments:assignment1 [CS545 fall 2016]

  * Suppose you have data that is very imbalanced, and let's say for concreteness that we're working with a binary classification problem where the number of negative examples is much larger than the number of positive examples.  What can you say about the estimated error of the majority classifier?  What issue does that raise, in your opinion, about evaluating classifiers?
  
To solve the issue with the standard error rate, it has been suggested to assign different costs to different types of errors using a cost matrix $c(y_i, h(\mathbf{x}_i))$, where $y_i$ is the actual class of example $i$ and $h(\mathbf{x}_i)$ is the predicted class.  For a binary classification problem this is a $2 \times 2$ matrix, and we'll assume there is no cost associated with a correct classification, which leaves two components to be determined:
  
  * $c_r = c(+1, -1)$, which is the reject cost (the cost of a false negative)
With these definitions, answer the following:
  
  * How should we choose $c_r$ and $c_a$ such that the majority classifier and the minority classifier both have an error of 0.5?  (The minority classifier is analogous to the majority classifier, except that it classifies everything as positive, since we assumed the positive class has fewer representatives.)  Section 1.4.1 in the book has a brief discussion of error measures.
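To make the cost-sensitive error concrete, here is a minimal NumPy sketch. The definition of $c_a$ is not shown in this excerpt, so the code assumes it is the cost of a false positive, analogous to $c_r$ for false negatives; the helper function and toy data are illustrative, not part of the assignment.

```python
import numpy as np

def cost_sensitive_error(y_true, y_pred, c_r, c_a):
    """Average misclassification cost over the dataset.

    c_r: cost of a false negative (actual +1, predicted -1)
    c_a: cost of a false positive (actual -1, predicted +1) -- assumed definition
    Correct classifications cost 0, as stated in the text.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    cost = np.where((y_true == 1) & (y_pred == -1), c_r, 0.0)
    cost += np.where((y_true == -1) & (y_pred == 1), c_a, 0.0)
    return cost.mean()

# Imbalanced toy data: 90 negatives, 10 positives
y = np.array([-1] * 90 + [1] * 10)
majority = np.full_like(y, -1)  # predicts the majority (negative) class everywhere
minority = np.full_like(y, 1)   # predicts the minority (positive) class everywhere

print(cost_sensitive_error(y, majority, c_r=1.0, c_a=1.0))  # 0.1
print(cost_sensitive_error(y, minority, c_r=1.0, c_a=1.0))  # 0.9
```

With unit costs this reduces to the plain error rate, which is exactly why the majority classifier looks deceptively good on imbalanced data; the exercise asks how to pick $c_r$ and $c_a$ so that both trivial classifiers come out at 0.5.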
  
  
assignments/assignment1.txt · Last modified: 2016/08/31 19:07 by asa