====== Bias when using feature selection ======

When using feature selection, you need to be very careful about how you evaluate your classifier. Here's the wrong way of doing it:

<code python>
from PyML import *

# the wrong way of using feature selection
data = SparseDataSet('colon.data')
# distinguish between normal tissue and tissue affected by colon cancer
# data is available from:
# http://mldata.org/repository/data/viewslug/colon-cancer/

# create an instance of the RFE feature selection method
rfe = featsel.RFE()
# a feature selector's train method selects a subset of features
rfe.train(data)
results1 = SVM().stratifiedCV(data)
</code>

If you run this you will get a classifier with perfect accuracy. That result is too good to be true: RFE chose the features using the labels of every example, including the ones that later serve as test folds, so the cross-validation estimate is optimistically biased. Now let's do it the right way:

<code python>
# the right way to perform feature selection:
# feature selection is performed as part of training the classifier
data = SparseDataSet('colon.data')
results2 = composite.FeatureSelect(SVM(), featsel.RFE()).stratifiedCV(data)
</code>
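
The same effect can be reproduced without PyML or the colon data. What follows is a minimal sketch in plain Python 3, not PyML code: it assumes purely random features (so no classifier can honestly beat chance), a simple difference-of-class-means filter standing in for RFE, and a nearest-centroid classifier standing in for the SVM. The function names (''make_noise_data'', ''select_features'', ''cv_accuracy'') are made up for this illustration.

```python
import random

def make_noise_data(n_samples, n_features, seed=0):
    """Gaussian noise features; the labels carry no relation to them."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)]
         for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]  # alternating labels, unrelated to X
    return X, y

def class_means(X, y, rows, features):
    """Per-class mean of each listed feature, computed over the given rows."""
    means = {}
    for label in (0, 1):
        members = [i for i in rows if y[i] == label]
        means[label] = [sum(X[i][j] for i in members) / len(members)
                        for j in features]
    return means

def select_features(X, y, rows, k):
    """Keep the k features whose class means differ most on the given rows."""
    features = list(range(len(X[0])))
    means = class_means(X, y, rows, features)
    gap = [abs(a - b) for a, b in zip(means[0], means[1])]
    return sorted(features, key=lambda j: gap[j], reverse=True)[:k]

def cv_accuracy(X, y, k, n_folds=5, select_inside=True):
    """Cross-validated accuracy of a nearest-centroid classifier."""
    rows = list(range(len(X)))
    # the biased variant selects once, using every example and its label
    fixed = None if select_inside else select_features(X, y, rows, k)
    correct = 0
    for fold in range(n_folds):
        test = [i for i in rows if i % n_folds == fold]
        train = [i for i in rows if i % n_folds != fold]
        # the unbiased variant repeats the selection on each training fold
        features = select_features(X, y, train, k) if select_inside else fixed
        means = class_means(X, y, train, features)
        for i in test:
            dist = {label: sum((X[i][j] - m) ** 2
                               for j, m in zip(features, means[label]))
                    for label in (0, 1)}
            correct += min(dist, key=dist.get) == y[i]
    return correct / len(X)

X, y = make_noise_data(n_samples=40, n_features=2000)
biased = cv_accuracy(X, y, k=20, select_inside=False)
honest = cv_accuracy(X, y, k=20, select_inside=True)
print("selection outside CV:", biased)
print("selection inside CV: ", honest)
```

Because the features are noise, the true accuracy of any classifier here is about 50%. Selecting features once on the full dataset should nonetheless report an accuracy far above chance, while repeating the selection inside each fold should stay close to chance. The gap between the two numbers is the selection bias.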