This shows you the differences between two versions of the page.
assignments:assignment1 [2016/08/27 17:19] asa |
assignments:assignment1 [2016/08/28 15:00] asa |
||
---|---|---|---|
Line 23: | Line 23: | ||
* Suppose you have data that is very imbalanced, and let's say for concreteness that we're working with a binary classification problem where the number of negative examples is much larger than the number of positive examples. What can you say about the estimated error of the majority classifier? What issue does that raise in your opinion about evaluating classifiers? | * Suppose you have data that is very imbalanced, and let's say for concreteness that we're working with a binary classification problem where the number of negative examples is much larger than the number of positive examples. What can you say about the estimated error of the majority classifier? What issue does that raise in your opinion about evaluating classifiers? | ||
- | To solve the issue with the standard error rate, it has been suggested to assign different costs to different types of errors using a cost matrix $c(h(\mathbf{x}_i),y_i)$, where $y_i$ is the actual class of example $i$, and $h(\mathbf{x}_i)$ is the predicted class. For a binary classification problem this is a $2 \times 2$ matrix, and we'll assume there is no cost associated with a correct classification, which leaves two components to be determined: | + | To solve the issue with the standard error rate, it has been suggested to assign different costs to different types of errors using a cost matrix $c(y_i, h(\mathbf{x}_i))$, where $y_i$ is the actual class of example $i$, and $h(\mathbf{x}_i)$ is the predicted class. For a binary classification problem this is a $2 \times 2$ matrix, and we'll assume there is no cost associated with a correct classification, which leaves two components to be determined: |
* $c_r = c(+1, -1)$, which is the reject cost (the cost of a false negative) | * $c_r = c(+1, -1)$, which is the reject cost (the cost of a false negative) |