This page describes software created in my lab.

Deep Learning


SATORI is a deep learning tool for predicting interactions between regulatory features. It interprets self-attention weights as putative interactions and analyzes them for their significance.
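
As a rough illustration of what "self-attention weights" means here, the sketch below computes generic scaled dot-product attention weights between positions. It is not SATORI's architecture or its significance analysis; the function name `attention_weights` and the toy vectors are assumptions made for this example only.

```python
import math

def attention_weights(queries, keys):
    """Generic scaled dot-product attention weights (illustrative only).

    queries, keys: lists of equal-length feature vectors, one per position.
    Returns a matrix w where w[i][j] is how strongly position i attends to j.
    """
    d = len(keys[0])
    weights = []
    for q in queries:
        # Scaled dot product between this query and every key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        # Numerically stable softmax over the scores.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# Toy example: two positions with orthogonal feature vectors.
w = attention_weights([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
```

A large weight `w[i][j]` is the kind of quantity that could be read as a putative interaction between positions i and j, subject to the significance analysis described above.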


RODAN is a basecaller for Oxford Nanopore direct RNA sequencing data. It provides state-of-the-art performance, surpassing the accuracy of the RNA basecallers from ONT.


Q_epsilon is a graph convolutional model for protein quality assessment. Unlike existing approaches, it is a pure deep learning method that doesn’t use any engineered features, and it relies on a loss function inspired by the epsilon-insensitive loss of SVM regression.
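
For reference, the standard epsilon-insensitive loss from SVM regression ignores errors smaller than epsilon and grows linearly beyond that. The sketch below shows only that standard loss; the function name and the default `epsilon` are illustrative assumptions, and the loss actually used in Q_epsilon is described above only as inspired by it.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Mean of max(0, |y - f(x)| - epsilon) over all examples.

    This is the textbook SVR loss, not necessarily the exact loss in Q_epsilon.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one example")
    # Errors inside the epsilon tube cost nothing; larger ones cost linearly.
    losses = [max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred)]
    return sum(losses) / len(losses)

# The first prediction is within epsilon of its target, the second is not.
loss = epsilon_insensitive_loss([1.0, 2.0], [1.05, 2.5], epsilon=0.1)
```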


DeepRAM provides a collection of deep learning architectures for genomics data, as part of a paper that evaluated approaches for transcription factor binding.

Graph convolution for protein structures

Code repository for our NIPS 2017 paper, which was the first to introduce graph neural networks to the analysis of protein 3D structures.

Next generation sequencing analysis


TAPIS (Transcriptome Analysis Pipeline for Isoform Sequencing) is a package for analyzing Pacific Biosciences Iso-Seq data. It performs error correction, splice isoform prediction, and prediction of alternative polyadenylation sites. Results of running it in sorghum are available here.


iDiffIR predicts differential intron retention from RNA-seq data.


SpliceGrapher is a tool for predicting splice graphs and alternative splicing patterns using RNA-seq data, guided by gene models and EST data.

Kernel methods


PyML is an interactive, object-oriented scripting framework for machine learning written in Python. PyML focuses on SVMs and other kernel methods.


PAIRpred: prediction of protein-protein interfaces from sequence and structure.


MI1 is a method for predicting calmodulin binding sites. We have created an online tool that biologists can use to obtain predictions for their proteins.


Strut is a C++ package for structured large-margin classification that is designed for hierarchical classification problems such as protein function prediction.



DAWGS is a library implementing directed acyclic word graphs, which are data structures that allow efficient search for all substrings that occur within a given string. It also provides a method for discovering k-mers-with-mismatches that discriminate between two sets of sequences.
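
To make the idea of discriminative k-mers-with-mismatches concrete, here is a brute-force sketch. It does not use a directed acyclic word graph and is not the DAWGS API; every function name, the scoring rule (difference in the fraction of sequences containing a match), and the toy sequences are assumptions for illustration.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def count_sequences_with_kmer(kmer, sequences, max_mismatches=1):
    """Count sequences containing the k-mer with at most max_mismatches."""
    k = len(kmer)
    count = 0
    for seq in sequences:
        windows = (seq[i:i + k] for i in range(len(seq) - k + 1))
        if any(hamming(kmer, w) <= max_mismatches for w in windows):
            count += 1
    return count

def discriminative_kmers(positives, negatives, k=4, max_mismatches=1):
    """Rank k-mers seen in the positive set by (positive - negative) frequency."""
    candidates = {s[i:i + k] for s in positives for i in range(len(s) - k + 1)}
    scored = []
    for kmer in candidates:
        pos = count_sequences_with_kmer(kmer, positives, max_mismatches) / len(positives)
        neg = count_sequences_with_kmer(kmer, negatives, max_mismatches) / len(negatives)
        scored.append((pos - neg, kmer))
    # Highest score first; ties broken alphabetically for a stable order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored

# Toy data: "ACGT" occurs in both positive sequences and in no negative one.
ranking = discriminative_kmers(["ACGTAA", "TACGTT"], ["GGGGGG", "CCCCCC"],
                               k=4, max_mismatches=0)
```

This scan costs time proportional to the number of candidates times the total sequence length, which is the kind of cost an indexed structure such as a word graph is meant to avoid.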