Course projects
Two project topics are available: feature selection and prediction of protein function. Your first step is to choose a paper whose method you will use or implement.
Feature selection
Filter methods:
- Chris Ding and Hanchuan Peng. Minimum redundancy feature selection from microarray gene expression data. Journal of Bioinformatics and Computational Biology, Vol. 3, No. 2, pp.185-205, 2005. (Joel)
- Yeung K, Bumgarner R. Multiclass classification of microarray data with repeated measurements: application to cancer. Genome Biology. 4:R83, 2003. (Simon)
- Y. Sun, S. Todorovic, and S. Goodison. Local Learning Based Feature Selection for High Dimensional Data Analysis. IEEE Trans. on Pattern Analysis and Machine Intelligence (TPAMI), vol. 32, no. 9, pp. 1610-1626, 2010. (Nand)
Embedded feature selection methods:
- Hui Zou and Trevor Hastie. Regularization and Variable Selection via the Elastic Net. Journal of the Royal Statistical Society, Series B, 67(2), pp. 301-320, 2005. The R package elasticnet is available from CRAN. (Majdi)
- Ji Zhu, Saharon Rosset, Trevor Hastie and Rob Tibshirani. 1-norm support vector machines. In: Neural Information Processing Systems (NIPS) 16, 2004. 1-norm classifiers create very sparse representations, so they are useful for feature selection. (Prathamesh)
- Feiping Nie, Heng Huang, Xiao Cai, Chris Ding. Efficient and Robust Feature Selection via Joint L2,1-Norms Minimization. NIPS 2010. (Rehab)
- T. Joachims. A Support Vector Method for Multivariate Performance Measures. Proceedings of the International Conference on Machine Learning (ICML), 2005. Code is available. This method can be used for feature selection in the same way that SVMs are used in recursive feature elimination (RFE).
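To see why a 1-norm penalty is useful for feature selection, it may help to watch it set coefficients exactly to zero on a tiny problem. The sketch below is only an illustration under stated assumptions: it fits an elastic-net-penalized least-squares model with a generic coordinate descent loop, which is not the algorithm from any of the papers listed above, and it uses a made-up toy design with mutually orthogonal features. The function names, penalty values, and data are all hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero; return exactly 0.0 when |value| <= threshold."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net_cd(X, y, l1, l2, n_sweeps=100):
    """Coordinate descent for (1/2n)||y - Xw||^2 + l1*||w||_1 + (l2/2)*||w||_2^2.

    X is a list of rows, y a list of targets. Illustrative only: no intercept,
    no convergence check, no feature standardisation.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw, and w starts at zero
    for _ in range(n_sweeps):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            z = sum(v * v for v in col) / n
            # Correlation of feature j with the residual, with feature j's
            # own current contribution added back in.
            rho = sum(col[i] * (residual[i] + col[i] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, l1) / (z + l2)
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= col[i] * delta
                w[j] = new_wj
    return w


# Hypothetical toy data: 8 samples, 5 mutually orthogonal +/-1 features.
X = [[1.0 if bin(i & j).count("1") % 2 == 0 else -1.0 for j in range(1, 6)]
     for i in range(8)]
# Only features 0 and 2 carry signal; the other three are irrelevant.
y = [3.0 * row[0] - 2.0 * row[2] for row in X]

w = elastic_net_cd(X, y, l1=0.5, l2=0.1)
selected = [j for j, wj in enumerate(w) if wj != 0.0]
print(selected)
```

On this toy design the only nonzero coefficients belong to features 0 and 2, so reading off the nonzero entries of w acts as the feature selection step. Real microarray data will have correlated features, so a project implementation should rely on the method and software described in the chosen paper rather than on this loop.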
Prediction of protein function
- Wei Bi and James Kwok. Multi-Label Classification on Tree- and DAG-Structured Hierarchies. International Conference on Machine Learning (ICML-11), 2011. (Indika)
Here are the links to the datasets you will use in your feature selection experiments:
