Research in my lab focuses on developing machine learning methods for problems in bioinformatics. Our specialty is the creative design of kernel methods for problems ranging from prediction of protein function and interactions to prediction of alternative splicing.
Alternative splicing in plants
Splicing is the process whereby parts of a gene called introns are removed, and the RNA is spliced back to form the mature mRNA. A given gene can be spliced in multiple ways, a phenomenon called alternative splicing. Whereas it is well studied in animals, alternative splicing in plants is not as well understood, and the differences in genome architecture between plants and animals lead to differences in alternative splicing. We are working on this in collaboration with A.S.N. Reddy of the Biology Department. Our approach is to computationally search for genomic features that are predictive of alternative splicing, that is, elements that serve as splicing enhancers and suppressors, and to test their biological relevance to the process.
A second avenue we are pursuing is to leverage next-generation sequencing data to predict alternative splicing events and improve genome annotation. The noisy nature of these data makes this a challenging task.
The above figure shows a model created by our SpliceGrapher tool.
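To make the idea of a splice graph concrete, here is a minimal sketch in which exons are nodes and an edge joins two exons that are spliced together in at least one transcript. It is an illustration only and does not use or reproduce the SpliceGrapher API; the function names `build_splice_graph` and `skipped_exons` and the exon coordinates are hypothetical.

```python
from collections import defaultdict


def build_splice_graph(transcripts):
    """Merge transcripts (lists of (start, end) exons) into one graph.

    Nodes are exons; a directed edge joins two exons that are adjacent
    in at least one transcript, i.e. the intron between them is removed.
    """
    edges = defaultdict(set)
    for exons in transcripts:
        ordered = sorted(exons)
        for upstream, downstream in zip(ordered, ordered[1:]):
            edges[upstream].add(downstream)
    return edges


def skipped_exons(edges):
    """Return exons that one path includes and another path skips."""
    skipped = set()
    for upstream, downstream_set in edges.items():
        for middle in downstream_set:
            # .get avoids inserting new keys into the defaultdict mid-iteration
            for downstream in edges.get(middle, ()):
                # upstream -> middle -> downstream exists; a direct
                # upstream -> downstream edge means `middle` can be skipped.
                if downstream in downstream_set:
                    skipped.add(middle)
    return skipped


# Hypothetical gene with two isoforms; coordinates are made up.
isoform_a = [(100, 200), (300, 400), (500, 600)]
isoform_b = [(100, 200), (500, 600)]
graph = build_splice_graph([isoform_a, isoform_b])
print(sorted(skipped_exons(graph)))  # [(300, 400)]
```

Exon skipping is only one type of alternative splicing event; others, such as intron retention or alternative splice sites, would need additional checks on exon boundaries.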
- Salah E. Abdel-Ghany*, Michael Hamilton*, Jennifer L. Jacobi, Peter Ngam, Nicholas Devitt, Faye Schilkey, Asa Ben-Hur, and Anireddy S.N. Reddy. A survey of the sorghum transcriptome using single-molecule long reads. Accepted, Nature Communications. (*Joint first authors). Here’s a link to the TAPIS software.
- Mark F. Rogers, Christina Boucher, and Asa Ben-Hur. SpliceGrapherXT: From Splice Graphs to Transcripts Using RNA-Seq. ACM Conference on Bioinformatics, Computational Biology and Biomedicine, 2013.
- M.F. Rogers, J. Thomas, A.S.N. Reddy, and A. Ben-Hur. SpliceGrapher: Detecting patterns of alternative splicing from RNA-seq data in the context of gene models and EST data. Genome Biology 13:R4, 2012.
- A.S.N. Reddy, Mark F. Rogers, Dale N. Richardson, Michael Hamilton, and Asa Ben-Hur. Deciphering the plant splicing code: Experimental and computational approaches for predicting alternative splicing and splicing regulatory elements. Frontiers in Plant Genetics and Genomics 3, 2012.
Protein-protein interactions and their binding sites
Proteins perform their function by interacting with other proteins. Therefore understanding the complex network of interactions between an organism’s proteins is important for understanding their role. My work in this area includes genome-wide prediction of interaction networks in yeast and human, and more recently prediction at a finer scale in order to determine the interfaces at which proteins interact.
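One way to apply kernel methods to interaction prediction is to build a kernel on pairs of proteins from a kernel on individual proteins. The sketch below shows a symmetrized pairwise construction of this kind; the choice of a k-mer spectrum kernel as the base kernel, the function names, and the toy sequences are assumptions made for illustration, not a description of the exact features used in the publications listed here.

```python
from collections import Counter


def spectrum_kernel(seq_a, seq_b, k=3):
    """Inner product of k-mer count vectors (a simple sequence kernel)."""
    kmers_a = Counter(seq_a[i:i + k] for i in range(len(seq_a) - k + 1))
    kmers_b = Counter(seq_b[i:i + k] for i in range(len(seq_b) - k + 1))
    return sum(count * kmers_b[kmer] for kmer, count in kmers_a.items())


def pairwise_kernel(pair_1, pair_2, base=spectrum_kernel):
    """Similarity between two protein pairs.

    Summing both ways of matching the proteins makes the value
    independent of the order in which a pair is written.
    """
    a, b = pair_1
    c, d = pair_2
    return base(a, c) * base(b, d) + base(a, d) * base(b, c)


# Made-up sequences standing in for four proteins.
p, q, r, s = "MKTAYIAKQR", "GSHMKTAYLL", "AKQRMKTAY", "TAYLLGSHM"
print(pairwise_kernel((p, q), (r, s)) == pairwise_kernel((q, p), (r, s)))  # True
```

A kernel of this form can be plugged into any kernel-based classifier, with each training example being a pair of proteins labeled as interacting or not.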
- Fayyaz Afsar Minhas, Brian Geiss, and Asa Ben-Hur. PAIRpred: Partner-specific prediction of interacting residues from sequence and structure, PROTEINS: Structure, Function, and Bioinformatics, 2013. PAIRpred software and data.
- F.A. Minhas and A. Ben-Hur. Multiple instance learning of Calmodulin binding sites. Bioinformatics (2012) 28(18): i416-i422 (special ECCB 2012 issue). An online version of the classifier is available.
- M. Hamilton, A.S.N. Reddy, and A. Ben-Hur. Kernel methods for Calmodulin binding and binding site prediction. In: ACM Conference on Bioinformatics, Computational Biology and Biomedicine, 2011. [ preprint ]
- A.S.N. Reddy, A. Ben-Hur, and I.S. Day. Experimental and computational approaches for the study of calmodulin interactions. Phytochemistry 72(11): 1007-1019, 2011.
Some of my older work in the area:
- H. Wang, E. Segal, A. Ben-Hur, Q. Li, M. Vidal and D. Koller. InSite: a computational method for identifying protein-protein interaction binding sites on a proteome-wide scale. Genome Biology, 8(9): R192, 2007.
- A. Ben-Hur and W.S. Noble. Kernel methods for predicting protein-protein interactions. In: Proceedings, thirteenth international conference on intelligent systems for molecular biology. Bioinformatics 21(Suppl. 1): i38-i46, 2005.
Protein function prediction
Although protein function prediction has been studied for over twenty years, the standard method remains annotation transfer. The difficulty in applying state-of-the-art machine learning methods is that proteins can have multiple functions, and that the system of keywords used to describe protein function, the Gene Ontology (GO), has a complex hierarchical structure. This provides genome annotators with a rich vocabulary with which to describe protein function, but makes standard classification approaches sub-optimal. Therefore, there are significant opportunities to develop new classification methods that treat function prediction as a hierarchical classification problem.
Our approach uses the so-called structured SVM, which is able to fully model the complexity of this learning problem. The method we developed, GOstruct, has shown state-of-the-art performance in several benchmarks.
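The following sketch illustrates why the hierarchy matters when comparing a predicted set of GO terms to the true one. It closes each label set under the ancestor relation and computes an F1 score on the result, so a prediction that is too general still earns partial credit. This is a minimal illustration under stated assumptions, not the GOstruct implementation: the five-term ontology is a simplified toy, and `with_ancestors` and `hierarchical_f1` are hypothetical names.

```python
# Toy ontology: each term maps to its parents. Simplified for illustration;
# in the real GO a term may have several parents.
PARENTS = {
    "binding": set(),
    "catalytic activity": set(),
    "protein binding": {"binding"},
    "kinase activity": {"catalytic activity"},
    "protein kinase binding": {"protein binding"},
}


def with_ancestors(terms, parents=PARENTS):
    """Close a set of terms under the ancestor relation."""
    closed = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in closed:
            closed.add(term)
            stack.extend(parents[term])
    return closed


def hierarchical_f1(true_terms, predicted_terms):
    """F1 score computed on ancestor-closed label sets."""
    true_set = with_ancestors(true_terms)
    pred_set = with_ancestors(predicted_terms)
    overlap = len(true_set & pred_set)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_set)
    recall = overlap / len(true_set)
    return 2 * precision * recall / (precision + recall)


truth = {"protein kinase binding", "kinase activity"}
# A too-general prediction scores above zero but below a perfect match.
print(hierarchical_f1(truth, {"protein binding"}))
print(hierarchical_f1(truth, truth))
```

An exact-match comparison would score the too-general prediction as a complete miss, whereas a hierarchy-aware score of this kind distinguishes near misses from unrelated predictions.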
This project was funded by NSF grant ABI 0965768/0965616.
- Indika Kahanda, Chris Funk, Fahad Ullah, Karin Verspoor and Asa Ben-Hur. A close look at automated protein function prediction evaluation protocols. GigaScience, 4(41), 2015.
- Christopher Funk, Indika Kahanda, Asa Ben-Hur and Karin Verspoor. Evaluating a variety of text-mined features for automatic protein function prediction with GOstruct. Journal of Biomedical Semantics, 6(9), 2015.
- Automated function prediction competition participants. A large-scale evaluation of computational protein function prediction. Nature Methods, 10:221–227, 2013.
- Artem Sokolov, Chris Funk, Kiley Graim, Karin Verspoor, and Asa Ben-Hur. Combining heterogeneous data sources for accurate functional annotation of proteins. Automated Function Prediction Meeting Proceedings (ISMB 2011). BMC Bioinformatics, 14(Suppl 3), 2013.
- A. Sokolov and A. Ben-Hur. Multi-view prediction of protein function. In: ACM Conference on Bioinformatics, Computational Biology and Biomedicine, 2011. [ preprint ]
- A. Sokolov and A. Ben-Hur. Hierarchical classification of Gene Ontology terms using the GOstruct method. Journal of Bioinformatics and Computational Biology 8(2): 357-376, 2010. [ preprint ]
- A. Sokolov and A. Ben-Hur. GOstruct: utilizing the structure of the Gene Ontology for accurate prediction of protein function. 8th Annual International Conference on Computational System Bioinformatics (CSB2009).
- M. Rogers and A. Ben-Hur. The use of Gene Ontology evidence codes in preventing classifier assessment bias. Bioinformatics 25(9):1173-1177, 2009. [ preprint ]