Research in my lab focuses on developing machine learning methods for problems in bioinformatics. We specialize in the creative design of kernel methods for problems ranging from prediction of protein function and protein interactions to prediction of alternative splicing.

Prediction of protein interactions and interfaces


Proteins perform their functions by interacting with other proteins. Understanding the complex network of interactions between proteins, and, at a finer level, identifying the interfaces through which those interactions occur, is therefore highly important. We are currently developing deep learning methods for predicting protein interfaces from protein 3-D structure.
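The network architecture is not described here. As a rough illustration only, one common way to apply deep learning to 3-D structure is to treat a protein as a graph whose nodes are residues and whose edges connect spatially close residues, then update each residue's features from its neighbors. The sketch below assumes C-alpha coordinates as input, an 8 Å neighbor cutoff, mean aggregation over neighbors, and hand-set weights; all function names are hypothetical, and this is not our model.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def neighbor_lists(coords: Sequence[Sequence[float]], cutoff: float = 8.0) -> List[List[int]]:
    """Hypothetical helper: residues i != j are neighbors if their
    C-alpha atoms lie within `cutoff` angstroms (assumed threshold)."""
    n = len(coords)
    nbrs: List[List[int]] = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                nbrs[i].append(j)
                nbrs[j].append(i)
    return nbrs


def matvec(w: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def graph_conv(features: List[Vector], nbrs: List[List[int]],
               w_self: Matrix, w_nbr: Matrix) -> List[Vector]:
    """One graph-convolution layer (illustrative, not the lab's model):
    h_i' = ReLU(W_self h_i + mean over neighbors j of W_nbr h_j)."""
    out = []
    for i, h in enumerate(features):
        z = matvec(w_self, h)
        if nbrs[i]:
            msgs = [matvec(w_nbr, features[j]) for j in nbrs[i]]
            mean = [sum(col) / len(msgs) for col in zip(*msgs)]
            z = [a + b for a, b in zip(z, mean)]
        out.append([max(0.0, v) for v in z])  # ReLU nonlinearity
    return out
```

In a trained model the weight matrices would be learned and several layers stacked; per-residue outputs from two proteins could then be combined to score residue pairs as interface or non-interface.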


This project is funded by a grant from the NSF’s ABI program (award # 1564840).

Our first approach to this problem was the PAIRpred method, which demonstrated the feasibility of partner-specific interaction prediction from protein 3-D structure.
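Partner-specific prediction means scoring pairs of residues, one from each protein, rather than single residues. A standard way to let a kernel method work on such pairs is a pairwise kernel built from a base kernel on individual residues. The sketch below uses the symmetric tensor-product form with an RBF base kernel on hypothetical residue feature vectors; the feature representation and the `gamma` value are assumptions, and this is not a description of PAIRpred's exact kernel.

```python
import math
from typing import Sequence, Tuple

Vec = Sequence[float]


def rbf(x: Vec, y: Vec, gamma: float = 0.5) -> float:
    """Gaussian (RBF) base kernel between two residue feature vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def pairwise_kernel(p: Tuple[Vec, Vec], q: Tuple[Vec, Vec], gamma: float = 0.5) -> float:
    """Symmetric tensor-product pairwise kernel:
    K((a, b), (c, d)) = k(a, c) k(b, d) + k(a, d) k(b, c).
    The second term makes K invariant to the order of residues within a pair."""
    a, b = p
    c, d = q
    return rbf(a, c, gamma) * rbf(b, d, gamma) + rbf(a, d, gamma) * rbf(b, c, gamma)
```

The symmetrized form gives the same value for (a, b) and (b, a), which suits settings where the order of the two partners carries no meaning; dropping the second term gives a non-symmetric variant.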

My earlier work in this area includes prediction of calmodulin binding sites and genome-wide prediction of protein interaction networks in yeast and human.

Alternative splicing in plants

Splicing is the process by which segments of a gene's RNA transcript, called introns, are removed and the remaining segments (exons) are joined to form the mature mRNA. A given gene can be spliced in multiple ways, a phenomenon called alternative splicing. Although alternative splicing is well studied in animals, it is less well understood in plants, and differences in genome architecture between plants and animals lead to differences in how it occurs. We are working on this problem in collaboration with A.S.N. Reddy of the Biology Department. Our approach is to computationally search for genomic features that are predictive of alternative splicing, such as elements that act as splicing enhancers and suppressors, and then to test their biological relevance to the process.
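One simple way to search for candidate regulatory elements is to look for short sequence words (k-mers) that are over-represented near alternative splice sites relative to constitutive ones. The sketch below computes a pseudocount-smoothed log2 enrichment score for every k-mer. The choice of foreground and background sets, `k`, and the pseudocount are assumptions for illustration, and the toy sequences in the usage lines are invented, not data.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def kmer_counts(seqs: Iterable[str], k: int) -> Counter:
    """Count every overlapping length-k substring across all sequences."""
    counts: Counter = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def kmer_log_odds(foreground: Iterable[str], background: Iterable[str],
                  k: int = 4, pseudo: float = 1.0) -> Dict[str, float]:
    """Log2 ratio of each k-mer's smoothed relative frequency in the foreground
    (e.g. sequence flanking alternative splice sites) versus the background
    (e.g. sequence flanking constitutive sites). Positive = enriched."""
    fg, bg = kmer_counts(foreground, k), kmer_counts(background, k)
    vocab = set(fg) | set(bg)
    fg_total = sum(fg.values()) + pseudo * len(vocab)
    bg_total = sum(bg.values()) + pseudo * len(vocab)
    return {
        kmer: math.log2(((fg[kmer] + pseudo) / fg_total) / ((bg[kmer] + pseudo) / bg_total))
        for kmer in vocab
    }


if __name__ == "__main__":
    # Toy sequences invented for illustration only.
    scores = kmer_log_odds(["GAAGAAGAA"], ["CTCTCTCTC"], k=3)
    print(max(scores, key=scores.get))  # GAA
```

Top-scoring k-mers are only candidates; as the paragraph above notes, their biological relevance still has to be tested.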

A second avenue we are pursuing is to use next-generation sequencing data to predict alternative splicing events and to improve genome annotation. The noisy nature of these data makes this a challenging task.
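Spurious splice junctions from misaligned or low-coverage reads are one source of this noise. A minimal rule-based filter, assuming reads have already been aligned to the genome and candidate junctions extracted, keeps a junction only if enough reads support it, at least one read overhangs the exon boundary by a minimum length, and the intron carries the canonical GT...AG dinucleotides. The thresholds and the `Junction` record are hypothetical, this handles the plus strand only, and it is not necessarily how SpliceGrapher deals with noisy data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Junction:
    """Hypothetical record for a candidate intron inferred from spliced reads."""
    donor: int       # 0-based position of the first intron base
    acceptor: int    # 0-based position one past the last intron base
    reads: int       # number of spliced reads supporting the junction
    max_anchor: int  # longest exon overhang among supporting reads


def is_canonical(genome_seq: str, j: Junction) -> bool:
    """True if the intron starts with GT and ends with AG (plus strand only)."""
    intron = genome_seq[j.donor:j.acceptor].upper()
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")


def filter_junctions(genome_seq: str, junctions: List[Junction],
                     min_reads: int = 2, min_anchor: int = 10) -> List[Junction]:
    """Keep junctions passing read-support, anchor-length, and GT...AG checks.
    Thresholds are illustrative assumptions, not tuned values."""
    return [j for j in junctions
            if j.reads >= min_reads
            and j.max_anchor >= min_anchor
            and is_canonical(genome_seq, j)]
```

Hard thresholds like these trade sensitivity for precision: they also discard the minority of genuine non-canonical introns and real junctions with low coverage, which is why learned filters are an attractive alternative.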


The above figure shows a model created by our SpliceGrapher tool.


This project is funded by NSF and DOE.

Protein function prediction


Although protein function prediction has been studied for over twenty years, the standard method remains annotation transfer from similar, already-annotated proteins. Applying state-of-the-art machine learning methods is difficult because proteins can have multiple functions, and because the controlled vocabulary used to describe protein function, the Gene Ontology (GO), has a complex hierarchical structure. This gives genome annotators a rich set of terms for describing protein function, but makes standard classification approaches a poor fit. There are therefore significant opportunities to develop new methods that treat function prediction as a hierarchical, multi-label classification problem.
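The hierarchy also constrains what a valid prediction looks like. Under the GO true-path rule, a protein annotated with a term is implicitly annotated with all of that term's ancestors, so a protein's functions form a set of terms closed under the parent relation. The sketch below propagates annotations up a toy DAG and encodes the result as a 0/1 label vector; the term names are placeholders, not real GO identifiers, and the function names are hypothetical.

```python
from typing import Dict, List, Set


def ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """All terms reachable from `term` by following parent edges upward.
    GO is a DAG, so a term may have several parents."""
    seen: Set[str] = set()
    stack = list(parents.get(term, []))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen


def propagate(annotations: Set[str], parents: Dict[str, List[str]]) -> Set[str]:
    """True-path rule: add every ancestor of every annotated term."""
    full = set(annotations)
    for term in annotations:
        full |= ancestors(term, parents)
    return full


def to_label_vector(labels: Set[str], terms: List[str]) -> List[int]:
    """Encode a label set as a 0/1 vector over a fixed term ordering."""
    return [1 if t in labels else 0 for t in terms]
```

A flat multi-label classifier that predicts each position of this vector independently can produce vectors that violate the rule (a child without its parent), which is one reason hierarchy-aware methods are needed.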

Our approach uses the structured SVM, which predicts the full set of GO terms for a protein jointly rather than one term at a time, and can therefore model the structure of this learning problem directly. The method we developed, GOstruct, has shown state-of-the-art performance in several benchmarks.
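A structured SVM learns a compatibility function f(x, y) between a protein x and an entire label set y, and predicts the y that maximizes it. The sketch below shows only the prediction step, assuming a joint kernel that is the product of an input kernel and an output kernel on label sets, coefficients `c_i` already obtained by training (not shown), and a candidate list of hierarchy-consistent label sets, such as those observed in training data. The specific kernels and names are illustrative assumptions, not GOstruct's implementation.

```python
import math
from typing import FrozenSet, List, Sequence, Tuple

Labels = FrozenSet[str]
# (training input x_i, training label set y_i, learned coefficient c_i)
Support = List[Tuple[Sequence[float], Labels, float]]


def k_x(a: Sequence[float], b: Sequence[float]) -> float:
    """Input kernel on protein feature vectors (linear, for simplicity)."""
    return sum(x * y for x, y in zip(a, b))


def k_y(y1: Labels, y2: Labels) -> float:
    """Output kernel on label sets: cosine between their 0/1 label vectors."""
    if not y1 or not y2:
        return 0.0
    return len(y1 & y2) / math.sqrt(len(y1) * len(y2))


def score(x: Sequence[float], y: Labels, support: Support) -> float:
    """f(x, y) = sum_i c_i * K_X(x_i, x) * K_Y(y_i, y), i.e. a joint kernel K_X * K_Y."""
    return sum(c * k_x(xi, x) * k_y(yi, y) for xi, yi, c in support)


def predict(x: Sequence[float], candidates: List[Labels], support: Support) -> Labels:
    """Structured prediction: the candidate label set with the highest score."""
    return max(candidates, key=lambda y: score(x, y, support))
```

Restricting the argmax to a fixed candidate list keeps inference cheap; searching over all hierarchy-consistent subsets of GO terms would be far more expensive.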


This project was funded by NSF grant ABI 0965768/0965616.