feature_selection_bias [CS545 fall 2016]

CS545


Instructor
Asa Ben-Hur





Bias when using feature selection

When using feature selection, you need to be careful about how you evaluate your classifier: the selection step has to be treated as part of training.

Here's the wrong way of doing it:

from PyML import *
 
# the wrong way of using feature selection
 
data = SparseDataSet('colon.data')
# distinguish between normal tissue and tissue affected by colon cancer
# data is available from:
# http://mldata.org/repository/data/viewslug/colon-cancer/
 
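# create an instance of the RFE feature selection method
rfe = featsel.RFE()
# a feature selector's train method selects a subset of features;
# here it sees every example, including those later held out for testing
rfe.train(data)
 
# cross-validation now runs on data whose features were already chosen
results1 = SVM().stratifiedCV(data)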

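If you run this, cross-validation will report perfect accuracy. That estimate is biased: RFE chose the features using all of the examples, including the ones that each fold later holds out for testing, so information about the test labels has leaked into training. Now let's do it the right way: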

# the right way to perform feature selection:
# feature selection is performed as part of training the classifier
data = SparseDataSet('colon.data')
results2 = composite.FeatureSelect(SVM(), featsel.RFE()).stratifiedCV(data)
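
The gap between the two protocols can be reproduced without PyML or the colon cancer data. The following is a minimal sketch under stated assumptions, not the course's method: it uses synthetic pure-noise data, a simple difference-of-class-means filter in place of RFE, and a nearest-centroid classifier in place of the SVM. All function names (`make_noise_data`, `select_features`, `cv_accuracy`, and so on) are hypothetical and exist only for this illustration. Because the features carry no information about the labels, an honest estimate should sit near chance, while selecting features before cross-validation should look much better than it deserves to.

```python
import random


def make_noise_data(n_per_class=20, n_features=2000, seed=0):
    """Pure-noise data: the features carry no information about the labels."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)]
         for _ in range(2 * n_per_class)]
    y = [0] * n_per_class + [1] * n_per_class
    return X, y


def class_means(X, y, rows, features):
    """Per-class mean of each requested feature, computed over `rows` only."""
    means = {}
    for label in (0, 1):
        members = [i for i in rows if y[i] == label]
        means[label] = [sum(X[i][j] for i in members) / len(members)
                        for j in features]
    return means


def select_features(X, y, rows, k):
    """Keep the k features whose class means differ most on `rows`.

    A simple filter used here as a stand-in for RFE (an assumption).
    """
    all_features = range(len(X[0]))
    m = class_means(X, y, rows, all_features)
    gap = [abs(a - b) for a, b in zip(m[0], m[1])]
    return sorted(all_features, key=lambda j: gap[j], reverse=True)[:k]


def stratified_folds(y, n_folds, seed=0):
    """Split example indices into folds with equal class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for label in (0, 1):
        members = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(members)
        for pos, i in enumerate(members):
            folds[pos % n_folds].append(i)
    return folds


def cv_accuracy(X, y, k, select_inside_folds, n_folds=5):
    """Cross-validated accuracy of a nearest-centroid classifier."""
    n = len(y)
    # Wrong way: choose the features once, using every example.
    preselected = None if select_inside_folds else select_features(X, y, range(n), k)
    correct = 0
    for test in stratified_folds(y, n_folds):
        held_out = set(test)
        train = [i for i in range(n) if i not in held_out]
        # Right way: choose the features from the training fold only.
        features = (select_features(X, y, train, k)
                    if select_inside_folds else preselected)
        centroids = class_means(X, y, train, features)
        for i in test:
            dist = {label: sum((X[i][j] - c) ** 2
                               for j, c in zip(features, centroids[label]))
                    for label in (0, 1)}
            correct += (min(dist, key=dist.get) == y[i])
    return correct / n


X, y = make_noise_data()
biased = cv_accuracy(X, y, k=10, select_inside_folds=False)
unbiased = cv_accuracy(X, y, k=10, select_inside_folds=True)
print("selection before CV (wrong):", biased)
print("selection inside CV (right):", unbiased)
```

Both runs use the same folds and the same classifier; the only difference is whether the held-out examples were visible when the features were chosen. This mirrors the role that `composite.FeatureSelect` plays in the PyML version above.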
feature_selection_bias.txt · Last modified: 2016/08/09 10:25 (external edit)