feature_selection_bias [2016/09/01 09:29] (current)
Line 1: Line 1:
 +====== Bias when using feature selection ======
  
 +When using feature selection, you need to be very careful about how you evaluate your classifier.
 +
 +Here's the wrong way of doing it:
 +
 +<code python>
 +from PyML import *
 +
 +# the wrong way of using feature selection
 +
 +data = SparseDataSet('colon.data')
 +# distinguish between normal tissue and tissue affected by colon cancer
 +# data is available from:
 +# http://mldata.org/repository/data/viewslug/colon-cancer/
 +
 +# create an instance of the RFE feature selection method
 +rfe = featsel.RFE()
 +# a feature selector's train method selects a subset of features;
 +# here it sees every example, including those later used for testing
 +rfe.train(data)
 +
 +# cross-validation now runs on data whose features were already chosen
 +results1 = SVM().stratifiedCV(data)
 +</code>
 +
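 +If you run this, cross-validation reports perfect accuracy. That estimate is optimistically biased, because RFE chose the features using the labels of all examples, including the ones later held out as test folds. Now let's do it the right way: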
 +
 +<code python>
 +from PyML import *
 +
 +# the right way to perform feature selection:
 +# feature selection is performed as part of training the classifier,
 +# so each fold selects features from its own training examples only
 +data = SparseDataSet('colon.data')
 +results2 = composite.FeatureSelect(SVM(), featsel.RFE()).stratifiedCV(data)
 +</code>
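The bias does not depend on PyML or on the colon data. As a rough sketch (not the PyML implementation), the standard-library Python below reproduces it on pure noise, where the features carry no information about the labels and an honest accuracy estimate should sit near chance. A simple class-mean filter and a nearest-centroid classifier stand in for RFE and the SVM; every function name, the data sizes, and the random seed are assumptions made for illustration.

```python
import random


def top_features(X, y, rows, k):
    """Return the k feature indices with the largest class-mean gap on `rows`."""
    pos = [i for i in rows if y[i] == 1]
    neg = [i for i in rows if y[i] == 0]

    def gap(j):
        mean_pos = sum(X[i][j] for i in pos) / len(pos)
        mean_neg = sum(X[i][j] for i in neg) / len(neg)
        return abs(mean_pos - mean_neg)

    return sorted(range(len(X[0])), key=gap, reverse=True)[:k]


def count_correct(X, y, train, test, feats):
    """Nearest-centroid classifier: fit on `train`, count hits on `test`."""
    centroids = {}
    for label in (0, 1):
        members = [i for i in train if y[i] == label]
        centroids[label] = [sum(X[i][j] for i in members) / len(members)
                            for j in feats]

    def sq_dist(i, label):
        return sum((X[i][j] - c) ** 2 for j, c in zip(feats, centroids[label]))

    return sum((1 if sq_dist(i, 1) < sq_dist(i, 0) else 0) == y[i]
               for i in test)


def cv_accuracy(X, y, k, n_folds, select_inside):
    """Cross-validated accuracy with selection inside or outside the folds."""
    n = len(X)
    # Labels alternate 0/1, so striding by an odd n_folds keeps folds balanced.
    folds = [list(range(f, n, n_folds)) for f in range(n_folds)]
    feats_all = top_features(X, y, range(n), k)  # sees every label (the leak)
    correct = 0
    for test in folds:
        held_out = set(test)
        train = [i for i in range(n) if i not in held_out]
        feats = top_features(X, y, train, k) if select_inside else feats_all
        correct += count_correct(X, y, train, test, feats)
    return correct / n


rng = random.Random(0)
n, p, k = 40, 1000, 10  # few examples, many features, as in microarray data
X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]  # pure noise
y = [i % 2 for i in range(n)]  # balanced labels, unrelated to X

biased = cv_accuracy(X, y, k, n_folds=5, select_inside=False)
unbiased = cv_accuracy(X, y, k, n_folds=5, select_inside=True)
print("selection before CV:", biased)
print("selection inside CV:", unbiased)
```

With selection done once on all examples, the chosen features are the ones that happen to correlate with the labels by chance, and the held-out examples helped choose them, so the first estimate is inflated. Repeating the selection inside each fold removes that leak.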
feature_selection_bias.txt · Last modified: 2016/09/01 09:29 (external edit)