feature_selection_bias [CS545 fall 2016]


Bias when using feature selection

When using feature selection, you need to be very careful about how you evaluate your classifier.

Here's the wrong way of doing it:

from PyML import *
 
# the wrong way of using feature selection
 
data = SparseDataSet('colon.data')
# distinguish between normal tissue and tissue affected by colon cancer
# data is available from:
# http://mldata.org/repository/data/viewslug/colon-cancer/
 
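# create an instance of the RFE feature selection method
rfe = featsel.RFE()
# a feature selector's train method selects a subset of features
# note: here it uses the labels of all examples, not just a training split
rfe.train(data)
 
# cross-validate an SVM on the same data that was used to select the features
results1 = SVM().stratifiedCV(data)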

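If you run this you will get a classifier with perfect accuracy. That estimate is biased: the features were selected using the labels of all the examples, including the ones that each cross-validation fold later holds out for testing, so the test examples are no longer independent of the model. Now let's do it the right way: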

# the right way to perform feature selection:
# feature selection is performed as part of training the classifier
data = SparseDataSet('colon.data')
results2 = composite.FeatureSelect(SVM(), featsel.RFE()).stratifiedCV(data)
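
The difference between the two protocols can be reproduced without PyML or the colon data. The following is a minimal sketch, not the course's code: it assumes Python 3 and the standard library only, uses purely random data in which no feature carries any signal, and replaces RFE and the SVM with much simpler stand-ins (ranking features by the gap between class means, and a nearest-centroid classifier). All function names, the data sizes, and the seed are hypothetical choices made for illustration.

```python
import random

def make_noise_data(n_samples=40, n_features=2000, seed=0):
    """Random features and balanced random labels: there is nothing to learn."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]
    rng.shuffle(y)
    return X, y

def select_features(X, y, k):
    """Indices of the k features whose class means differ the most (stand-in for RFE)."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    def gap(j):
        return abs(sum(r[j] for r in pos) / len(pos) - sum(r[j] for r in neg) / len(neg))
    return sorted(range(len(X[0])), key=gap, reverse=True)[:k]

def centroid_predict(X_train, y_train, x, feats):
    """Nearest-centroid prediction on the chosen features (stand-in for the SVM)."""
    best_label, best_dist = None, float("inf")
    for label in (0, 1):
        rows = [r for r, l in zip(X_train, y_train) if l == label]
        dist = sum((x[j] - sum(r[j] for r in rows) / len(rows)) ** 2 for j in feats)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def cv_accuracy(X, y, k, n_folds=5, select_inside=True):
    """Plain k-fold CV. select_inside=False reproduces the biased protocol."""
    n = len(X)
    # the leak: features chosen once, using the labels of every example
    global_feats = None if select_inside else select_features(X, y, k)
    correct = 0
    for fold in range(n_folds):
        test_idx = set(range(fold, n, n_folds))
        X_tr = [X[i] for i in range(n) if i not in test_idx]
        y_tr = [y[i] for i in range(n) if i not in test_idx]
        # the right way: select features from the training part of this fold only
        feats = select_features(X_tr, y_tr, k) if select_inside else global_feats
        for i in test_idx:
            correct += centroid_predict(X_tr, y_tr, X[i], feats) == y[i]
    return float(correct) / n

X, y = make_noise_data()
wrong_acc = cv_accuracy(X, y, k=10, select_inside=False)
right_acc = cv_accuracy(X, y, k=10, select_inside=True)
print("selection outside CV (biased):", wrong_acc)
print("selection inside CV:          ", right_acc)
```

Because the labels are random, an honest estimate should sit near chance (0.5). With selection done outside cross-validation, the reported accuracy is typically far higher, even though no classifier can actually predict these labels.
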
feature_selection_bias.txt · Last modified: 2016/08/09 10:25 (external edit)