Bias when using feature selection

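When using feature selection, you need to be very careful about how you evaluate your classifier. If the features are selected using the same examples that are later used for testing, the accuracy estimate will be biased upward.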

Here's the wrong way of doing it:

from PyML import *
# the wrong way of using feature selection
data = SparseDataSet('')
# distinguish between normal tissue and tissue affected by colon cancer
# data is available from:
# create an instance of the RFE feature selection method
rfe = featsel.RFE()
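# a feature selector's train method selects a subset of features;
# here the selection uses ALL of the data, test examples included
rfe.train(data)
# cross-validation is then run on data whose features were already selected
results1 = SVM().stratifiedCV(data)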

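If you run this, cross-validation will report perfect accuracy. That estimate is biased: the features were chosen using the labels of every example, including the ones later held out for testing. Now let's do it the right way: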

# the right way to perform feature selection:
# feature selection is performed as part of training the classifier
data = SparseDataSet('')
results2 = composite.FeatureSelect(SVM(), featsel.RFE()).stratifiedCV(data)
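
To see why the first approach is biased, here is a minimal sketch that does not use PyML at all. It assumes pure-noise data (the labels are unrelated to the features), so an honest accuracy estimate should be near chance. A simple mean-difference ranking stands in for RFE and a nearest-centroid rule stands in for the SVM. All function names (select_features, centroid_predict, cv_accuracy) are hypothetical helpers written for this illustration, and the code targets Python 3 with only the standard library.

```python
import random

def select_features(X, y, rows, k):
    """Keep the k features with the largest |difference of class means| over `rows`."""
    pos = [i for i in rows if y[i] == 1]
    neg = [i for i in rows if y[i] == -1]
    def score(j):
        mean_pos = sum(X[i][j] for i in pos) / float(len(pos))
        mean_neg = sum(X[i][j] for i in neg) / float(len(neg))
        return abs(mean_pos - mean_neg)
    return sorted(range(len(X[0])), key=score, reverse=True)[:k]

def centroid_predict(X, y, train, test, feats):
    """Nearest-centroid classifier restricted to the chosen features."""
    def centroid(label):
        members = [i for i in train if y[i] == label]
        return [sum(X[i][j] for i in members) / float(len(members)) for j in feats]
    c_pos, c_neg = centroid(1), centroid(-1)
    preds = []
    for i in test:
        d_pos = sum((X[i][j] - c) ** 2 for j, c in zip(feats, c_pos))
        d_neg = sum((X[i][j] - c) ** 2 for j, c in zip(feats, c_neg))
        preds.append(1 if d_pos < d_neg else -1)
    return preds

def cv_accuracy(X, y, k, n_folds, select_inside):
    """Cross-validated accuracy; select_inside=True redoes selection in every fold."""
    rows = list(range(len(X)))
    # the wrong way: pick features once, using every example
    global_feats = None if select_inside else select_features(X, y, rows, k)
    correct = 0
    for fold in range(n_folds):
        test = [i for i in rows if i % n_folds == fold]
        train = [i for i in rows if i % n_folds != fold]
        # the right way: pick features from the training fold only
        feats = select_features(X, y, train, k) if select_inside else global_feats
        preds = centroid_predict(X, y, train, test, feats)
        correct += sum(1 for p, i in zip(preds, test) if p == y[i])
    return correct / float(len(X))

# pure-noise data: 40 examples, 2000 features, labels independent of the features
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(2000)] for _ in range(40)]
y = [1] * 20 + [-1] * 20  # i % 5 folds keep 4 examples of each class per fold

wrong_acc = cv_accuracy(X, y, k=20, n_folds=5, select_inside=False)
right_acc = cv_accuracy(X, y, k=20, n_folds=5, select_inside=True)
print("selection outside CV:", wrong_acc)
print("selection inside CV: ", right_acc)
```

Since the data carry no signal, any accuracy well above 0.5 in the first figure comes from the selection step having seen the test examples. The second figure repeats the selection inside each training fold, which is the role composite.FeatureSelect plays in the PyML code above.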
feature_selection_bias.txt · Last modified: 2016/08/09 10:25 (external edit)