Software

This page describes software created in my lab.

Deep learning

SATORI

SATORI is a deep learning tool for predicting interactions between regulatory features. It interprets self-attention weights as putative interactions and analyzes them for their significance.
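The idea of reading attention weights as candidate interactions can be illustrated with a small sketch. It assumes a hypothetical input format (a square attention matrix for one sequence, plus a dict mapping sequence positions to the motifs detected there) and a hypothetical function name, `candidate_interactions`. It is not SATORI's code or API, and it stops before the significance analysis described above.

```python
from collections import Counter


def candidate_interactions(attn, motif_at, threshold):
    """Count motif pairs linked by a strong attention weight in one sequence.

    attn: square matrix (list of lists); attn[i][j] is the weight that
          position i assigns to position j (hypothetical input format).
    motif_at: dict mapping a sequence position to the motif found there
              (hypothetical, e.g. from scanning for known motifs).
    """
    pairs = Counter()
    positions = sorted(motif_at)
    for a, i in enumerate(positions):
        for j in positions[a + 1:]:
            # Symmetrize: use the stronger of the two attention directions.
            weight = max(attn[i][j], attn[j][i])
            if weight >= threshold:
                # Sort names so (A, B) and (B, A) count as one interaction type.
                pairs[tuple(sorted((motif_at[i], motif_at[j])))] += 1
    return pairs
```

Summing these counts over many sequences gives the raw material that a significance test, for example against counts from background sequences, would then evaluate.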

RODAN

RODAN is a basecaller for Oxford Nanopore Technologies (ONT) direct RNA sequencing data. It achieves state-of-the-art accuracy, surpassing ONT's own RNA basecallers.

Q_epsilon

Q_epsilon is a graph convolutional model for protein quality assessment. Unlike existing methods, it is a pure deep learning approach: it uses no engineered features and relies on a loss function inspired by the epsilon-insensitive loss of support vector regression.
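For reference, the standard epsilon-insensitive loss from support vector regression ignores errors smaller than epsilon and penalizes larger ones linearly, L(y, f) = max(0, |y - f| - epsilon). The sketch below computes its mean over a batch. It shows only the SVR loss that inspired Q_epsilon, not Q_epsilon's actual loss function, and the function name is hypothetical.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Mean epsilon-insensitive loss: errors within +/- epsilon cost nothing."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    total = 0.0
    for y, f in zip(y_true, y_pred):
        # Only the part of the absolute error that exceeds epsilon is penalized.
        total += max(0.0, abs(y - f) - epsilon)
    return total / len(y_true)
```

The flat region around zero is what distinguishes this loss from squared or absolute error: a prediction within epsilon of its target contributes neither loss nor gradient.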

DeepRAM

DeepRAM provides a collection of deep learning architectures for genomics data, developed as part of a paper that evaluated approaches for predicting transcription factor binding.

Graph convolution for protein structures

Code repository for the NIPS 2017 paper, which was the first to introduce graph neural networks to the analysis of protein 3D structures.
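A generic graph convolution over a protein structure can be sketched as follows, assuming each residue is a node with a feature vector and its neighbors are its k nearest residues in 3D. The layer combines a residue's own features with the mean of its neighbors' features through separate weight matrices, h_i' = ReLU(W_c h_i + (1/|N_i|) sum_j W_n h_j + b). This is an illustrative formulation in plain Python with hypothetical function names, not the code from the repository.

```python
import math


def nearest_neighbors(coords, k):
    """For each residue, return indices of its k closest residues (itself excluded)."""
    neighbors = []
    for i, ci in enumerate(coords):
        by_distance = sorted((math.dist(ci, cj), j) for j, cj in enumerate(coords) if j != i)
        neighbors.append([j for _, j in by_distance[:k]])
    return neighbors


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def graph_conv(features, neighbors, W_center, W_neighbor, bias):
    """One layer: ReLU(W_center h_i + mean over neighbors j of W_neighbor h_j + bias)."""
    out = []
    for i, h in enumerate(features):
        center = matvec(W_center, h)
        agg = [0.0] * len(bias)
        for j in neighbors[i]:
            # Average the transformed neighbor features.
            for d, v in enumerate(matvec(W_neighbor, features[j])):
                agg[d] += v / len(neighbors[i])
        out.append([max(0.0, c + a + b) for c, a, b in zip(center, agg, bias)])
    return out
```

In a trained model the weight matrices are learned and several layers are stacked; the sketch shows only a single forward pass with fixed weights.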

Next-generation sequencing analysis

TAPIS

TAPIS (Transcriptome Analysis Pipeline for Isoform Sequencing) is a package for analyzing Pacific Biosciences Iso-Seq data. It performs error correction, splice isoform prediction, and prediction of alternative polyadenylation sites. Results of running it on sorghum data are available here.

iDiffIR

iDiffIR predicts differential intron retention from RNA-seq data.

SpliceGrapher

SpliceGrapher is a tool for predicting splice graphs and alternative splicing patterns using RNA-seq data, guided by gene models and EST data.
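As a rough illustration of the data structure, a splice graph can be built from gene models by treating exons as nodes and each intron (two consecutive exons in a transcript) as an edge; an exon with more than one downstream exon then marks an alternative splicing event such as exon skipping. The sketch assumes forward-strand transcripts given as lists of (start, end) exons, and its function names are hypothetical. It is not SpliceGrapher's implementation, and it does not attempt SpliceGrapher's actual task of predicting splice graphs from RNA-seq evidence.

```python
from collections import defaultdict


def build_splice_graph(transcripts):
    """Build exon -> set of downstream exons from forward-strand transcripts.

    transcripts: list of transcripts, each a list of (start, end) exons.
    """
    graph = defaultdict(set)
    for exons in transcripts:
        ordered = sorted(exons)
        for upstream, downstream in zip(ordered, ordered[1:]):
            # Each edge corresponds to an intron joining two exons.
            graph[upstream].add(downstream)
        for exon in ordered:
            # Make sure terminal exons appear as nodes too.
            graph.setdefault(exon, set())
    return dict(graph)


def branch_points(graph):
    """Exons with more than one downstream exon indicate alternative splicing."""
    return sorted(exon for exon, successors in graph.items() if len(successors) > 1)
```

For example, a three-exon transcript and a second transcript that skips the middle exon produce a branch at the first exon.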

Kernel methods

PyML

PyML is an interactive, object-oriented scripting framework for machine learning written in Python, with a focus on SVMs and other kernel methods.

PAIRpred

PAIRpred predicts protein-protein interfaces from sequence and structure.

MI1

MI1 is a method for predicting calmodulin binding sites. We have created an online tool that biologists can use to obtain predictions for their proteins.

Strut

Strut is a C++ package for structured large-margin classification that is designed for hierarchical classification problems such as protein function prediction.

Algorithms

DAWGS

DAWGS is a library implementing directed acyclic word graphs (DAWGs), data structures that compactly represent all substrings of a given string and support efficient substring search. It also provides a method for discovering k-mers with mismatches that discriminate between two sets of sequences.
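A DAWG for the substrings of a single string is also known as a suffix automaton, and it can be built online in time linear in the string length (for a fixed alphabet). The sketch below is the textbook construction with suffix links, plus a substring query and a count of distinct substrings. It is a hypothetical stand-alone class, not the DAWGS library API, and it does not cover the discriminative k-mer search.

```python
class SuffixAutomaton:
    """Minimal DAWG (suffix automaton) recognizing every substring of `text`."""

    def __init__(self, text):
        self.next = [{}]    # outgoing transitions of each state
        self.link = [-1]    # suffix link of each state
        self.length = [0]   # length of the longest string ending in each state
        last = 0
        for ch in text:
            cur = len(self.next)
            self.next.append({})
            self.length.append(self.length[last] + 1)
            self.link.append(0)
            # Add transitions on `ch` from suffixes that lack one.
            p = last
            while p != -1 and ch not in self.next[p]:
                self.next[p][ch] = cur
                p = self.link[p]
            if p != -1:
                q = self.next[p][ch]
                if self.length[p] + 1 == self.length[q]:
                    self.link[cur] = q
                else:
                    # Split state q by cloning it with a shorter length.
                    clone = len(self.next)
                    self.next.append(dict(self.next[q]))
                    self.length.append(self.length[p] + 1)
                    self.link.append(self.link[q])
                    while p != -1 and self.next[p].get(ch) == q:
                        self.next[p][ch] = clone
                        p = self.link[p]
                    self.link[q] = clone
                    self.link[cur] = clone
            last = cur

    def contains(self, pattern):
        """Return True if `pattern` occurs as a substring of the text."""
        state = 0
        for ch in pattern:
            if ch not in self.next[state]:
                return False
            state = self.next[state][ch]
        return True

    def count_distinct_substrings(self):
        """Each non-initial state v contributes length[v] - length[link[v]] substrings."""
        return sum(self.length[v] - self.length[self.link[v]]
                   for v in range(1, len(self.next)))
```

For example, `SuffixAutomaton("abab")` recognizes "bab" but not "bb", and reports 7 distinct non-empty substrings (a, b, ab, ba, aba, bab, abab). A substring query costs time proportional to the pattern length, independent of the text length.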