Main.Projects History
548 projects this semester will focus on tools for spliced alignment of short read data.
Projects this semester will focus on tools for spliced alignment of short read data.
580 projects this semester will focus on tools for spliced alignment of short read data.
548 projects this semester will focus on tools for spliced alignment of short read data.
- We have more simulated data - 14 million paired end reads. There are two files: set1 and set2. Start with aligning them individually, and see if using them as paired end data improves accuracy. Here's a file that provides the cigar strings for the reads.
- For testing purposes, here are two files that provide a list of splice junctions generated from EST alignments and from curated gene models. [ EST junctions ], [ annotated junctions ].
- For comparison, here are the alignments produced by TopHat for the first set of reads and the second set of reads, and by MapSplice for the first set of reads and the second set of reads.
- For testing purposes, here are two files that provide a list of splice junctions generated from EST alignments and from curated gene models. [ EST junctions ], [ annotated junctions coming soon ].
- For testing purposes, here are two files that provide a list of splice junctions generated from EST alignments and from curated gene models. [ EST junctions ], [ annotated junctions ].
- For testing purposes, here are two files that provide a list of splice junctions generated from EST alignments and from curated gene models. [ EST junctions ], [ coming soon]
- For testing purposes, here are two files that provide a list of splice junctions generated from EST alignments and from curated gene models. [ EST junctions ], [ annotated junctions coming soon ].
- For testing purposes, here are two files that provide a list of splice junctions generated from EST alignments and from curated gene models. [ EST junctions ], [ gene model junctions ]
- For testing purposes, here are two files that provide a list of splice junctions generated from EST alignments and from curated gene models. [ EST junctions ], [ coming soon]
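Because the simulated reads come with a file of cigar strings, one way to turn an alignment into splice junctions is to walk its CIGAR string and record each skipped region. The following is a minimal Python sketch, assuming SAM-style CIGAR conventions (a 1-based leftmost alignment position, with the N operation marking an intron). The layout of the provided cigar file may differ, and the function name is only illustrative.

```python
import re

# One CIGAR operation: a length followed by a SAM operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

# Operations that advance the position on the reference genome.
REF_CONSUMING = set("MDN=X")


def junctions_from_cigar(start, cigar):
    """Return the introns implied by a spliced alignment.

    start -- 1-based leftmost reference position of the alignment (assumed)
    cigar -- SAM-style CIGAR string, e.g. "20M500N30M"

    Each intron is reported as (first_intron_base, last_intron_base),
    both 1-based and inclusive.
    """
    pos = start
    junctions = []
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            # A skipped region on the reference is treated as an intron.
            junctions.append((pos, pos + length - 1))
        if op in REF_CONSUMING:
            pos += length
    return junctions


if __name__ == "__main__":
    print(junctions_from_cigar(100, "20M500N30M"))
```

Insertions and soft clips do not move the reference position, so they are skipped when computing intron coordinates.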
For aligning both datasets you will need the sequence of the Arabidopsis genome, which you can download from the TAIR website. You will need the sequences for chromosomes 1-5.
New data
- We have more simulated data - 14 million paired end reads. There are two files: set1 and set2. Start with aligning them individually, and see if using them as paired end data improves accuracy. Here's a file that provides the cigar strings for the reads.
- For testing purposes, here are two files that provide a list of splice junctions generated from EST alignments and from curated gene models. [ EST junctions ], [ gene model junctions ]
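The junction lists above can serve as a reference when judging how well a program recovers splice junctions. The sketch below is one possible way to score a set of predicted junctions, assuming each junction has already been parsed into a (chromosome, intron_start, intron_end) tuple. The format of the provided junction files is not described here, so the parsing step is left out and the tuple layout is an assumption.

```python
def junction_accuracy(predicted, reference):
    """Compare predicted junctions against a reference list.

    Both arguments are iterables of (chromosome, intron_start, intron_end)
    tuples (an assumed representation). Returns (precision, recall),
    counting a junction as correct only if it matches exactly.
    """
    predicted = set(predicted)
    reference = set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    predicted = [("Chr1", 120, 619), ("Chr1", 900, 1000), ("Chr2", 5, 50)]
    reference = [("Chr1", 120, 619), ("Chr2", 5, 50),
                 ("Chr3", 1, 2), ("Chr4", 7, 9)]
    print(junction_accuracy(predicted, reference))
```

Note that a junction missing from the EST or gene model lists is not necessarily wrong, so precision measured this way is a lower bound rather than an exact figure.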
For aligning the datasets you will need the sequence of the Arabidopsis genome, which you can download from the TAIR website. You will need the sequences for chromosomes 1-5.
Short read simulated data from Arabidopsis thaliana. Here's a readme.
You can download the genome sequences from the TAIR website. You will need the sequences for chromosomes 1-5.
We ask you to apply your chosen program to the following two datasets:
- Short read simulated data from Arabidopsis thaliana. Here's a readme.
- Short read data generated by our collaborator from the biology department. The data is available from the NCBI short-read archive as GEO accession GSE32318. Note that this data is composed of two replicates that you need to align separately. The link for downloading the data is at the bottom of the page, labeled as supplementary file download.
For aligning both datasets you will need the sequence of the Arabidopsis genome, which you can download from the TAIR website. You will need the sequences for chromosomes 1-5.
Simulated data from Arabidopsis thaliana. Here's a readme.
Short read simulated data from Arabidopsis thaliana. Here's a readme.
You can download the genome sequences from the TAIR website. You will need the sequences for chromosomes 1-5.
Presentation schedule:
- Tuesday 10/11 Jeremy
- Thursday 10/13 Fayyaz and Mo
- Tuesday 10/18 Nathan and Arpita
- Thursday 10/20 Zhisheng and Indika
- Gsnap
- Gsnap (Zhisheng)
- BWA
- BWA (Nathan)
- palmapper
- palmapper (Fayyaz)
- SpliceMap
- SpliceMap (Mo)
- MapSplice
- MapSplice (Jeremy)
- TopHat
- TopHat (Jeremy)
Gregory R. Grant, Michael H. Farkas, Angel D. Pizarro, Nicholas F. Lahens, Jonathan Schug, Brian P. Brunk, Christian J. Stoeckert, John B. Hogenesch, and Eric A. Pierce. Comparative analysis of RNA-Seq alignment algorithms and the RNA-Seq unified mapper (RUM). Bioinformatics (2011) 27(18): 2518-2528.
- RUM
Gregory R. Grant, Michael H. Farkas, Angel D. Pizarro, Nicholas F. Lahens, Jonathan Schug, Brian P. Brunk, Christian J. Stoeckert, John B. Hogenesch, and Eric A. Pierce. Comparative analysis of RNA-Seq alignment algorithms and the RNA-Seq unified mapper (RUM). Bioinformatics (2011) 27(18): 2518-2528.
Paper: K. Wang et al. MapSplice: Accurate mapping of RNA-seq reads for splice junction discovery. Nucl. Acids Res. (2010) 38(18): e178.
K. Wang et al. MapSplice: Accurate mapping of RNA-seq reads for splice junction discovery. Nucl. Acids Res. (2010) 38(18): e178.
Paper: T.D. Wu and S. Nacu. Fast and SNP-tolerant detection of complex variants and splicing in short reads. Bioinformatics (2010) 26: 873-881.
T.D. Wu and S. Nacu. Fast and SNP-tolerant detection of complex variants and splicing in short reads. Bioinformatics (2010) 26: 873-881.
Kin Fai Au, Hui Jiang, Lan Lin, Yi Xing, and Wing Hung Wong. Detection of splice junctions from paired-end RNA-seq data by SpliceMap. Nucleic Acids Research (2010) doi: 10.1093/nar/gkq211.
Kin Fai Au, Hui Jiang, Lan Lin, Yi Xing, and Wing Hung Wong. Detection of splice junctions from paired-end RNA-seq data by SpliceMap. Nucleic Acids Research (2010).
- BWA
Li H, Durbin R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 2010;26(5):589-95.
Huang S, Zhang J, Li R, Zhang W, He Z, Lam T-W, Peng Z and Yiu S-M. SOAPsplice: genome-wide ab initio detection of splice junctions from RNA-Seq data. Frontiers in Genomic Assay Technology (2010) 2:46.
Huang S, Zhang J, Li R, Zhang W, He Z, Lam T-W, Peng Z and Yiu S-M. SOAPsplice: genome-wide ab initio detection of splice junctions from RNA-Seq data. Frontiers in Genomic Assay Technology (2010) 2:46.
- SOAPsplice
Huang S, Zhang J, Li R, Zhang W, He Z, Lam T-W, Peng Z and Yiu S-M. SOAPsplice: genome-wide ab initio detection of splice junctions from RNA-Seq data. Frontiers in Genomic Assay Technology (2010) 2:46.
Kin Fai Au, Hui Jiang, Lan Lin, Yi Xing, and Wing Hung Wong. Detection of splice junctions from paired-end RNA-seq data by SpliceMap. Nucleic Acids Research (2010) doi: 10.1093/nar/gkq211.
Kin Fai Au, Hui Jiang, Lan Lin, Yi Xing, and Wing Hung Wong. Detection of splice junctions from paired-end RNA-seq data by SpliceMap. Nucleic Acids Research (2010) doi: 10.1093/nar/gkq211.
- tophat
- TopHat
- SpliceMap
Kin Fai Au, Hui Jiang, Lan Lin, Yi Xing, and Wing Hung Wong. Detection of splice junctions from paired-end RNA-seq data by SpliceMap. Nucleic Acids Research (2010) doi: 10.1093/nar/gkq211.
- tophat
Trapnell C, Pachter L, and Salzberg SL. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics (2009) 25 (9): 1105-1111.
- palmapper.
- palmapper
De Bona, F. et al., Optimal spliced alignments of short sequence reads. ECCB08/Bioinformatics, 24 (16):i174, 2008.
Paper: Kai Wang, Darshan Singh, Zheng Zeng, Stephen J. Coleman, Yan Huang, Gleb L. Savich, Xiaping He, Piotr Mieczkowski, Sara A. Grimm, Charles M. Perou, James N. MacLeod, Derek Y. Chiang, Jan F. Prins, and Jinze Liu. MapSplice: Accurate mapping of RNA-seq reads for splice junction discovery. Nucl. Acids Res. (2010) 38(18): e178.
Paper: K. Wang et al. MapSplice: Accurate mapping of RNA-seq reads for splice junction discovery. Nucl. Acids Res. (2010) 38(18): e178.
Paper: Thomas D. Wu and Serban Nacu. Fast and SNP-tolerant detection of complex variants and splicing in short reads. Bioinformatics (2010) 26: 873-881.
Paper: T.D. Wu and S. Nacu. Fast and SNP-tolerant detection of complex variants and splicing in short reads. Bioinformatics (2010) 26: 873-881.
Paper: Kai Wang, Darshan Singh, Zheng Zeng, Stephen J. Coleman, Yan Huang, Gleb L. Savich, Xiaping He, Piotr Mieczkowski, Sara A. Grimm, Charles M. Perou, James N. MacLeod, Derek Y. Chiang, Jan F. Prins, and Jinze Liu. MapSplice: Accurate mapping of RNA-seq reads for splice junction discovery. Nucl. Acids Res. (2010) 38(18): e178.
- Gsnap
Paper: Kai Wang, Darshan Singh, Zheng Zeng, Stephen J. Coleman, Yan Huang, Gleb L. Savich, Xiaping He, Piotr Mieczkowski, Sara A. Grimm, Charles M. Perou, James N. MacLeod, Derek Y. Chiang, Jan F. Prins, and Jinze Liu. MapSplice: Accurate mapping of RNA-seq reads for splice junction discovery. Nucl. Acids Res. (2010) 38(18): e178.
- Gsnap
Paper: Thomas D. Wu and Serban Nacu. Fast and SNP-tolerant detection of complex variants and splicing in short reads. Bioinformatics (2010) 26: 873-881.
Paper: Kai Wang, Darshan Singh, Zheng Zeng, Stephen J. Coleman, Yan Huang, Gleb L. Savich, Xiaping He, Piotr Mieczkowski, Sara A. Grimm, Charles M. Perou, James N. MacLeod, Derek Y. Chiang, Jan F. Prins, and Jinze Liu. MapSplice: Accurate mapping of RNA-seq reads for splice junction discovery.
Paper: Kai Wang, Darshan Singh, Zheng Zeng, Stephen J. Coleman, Yan Huang, Gleb L. Savich, Xiaping He, Piotr Mieczkowski, Sara A. Grimm, Charles M. Perou, James N. MacLeod, Derek Y. Chiang, Jan F. Prins, and Jinze Liu. MapSplice: Accurate mapping of RNA-seq reads for splice junction discovery.
- MapSplice
Paper: Kai Wang, Darshan Singh, Zheng Zeng, Stephen J. Coleman, Yan Huang, Gleb L. Savich, Xiaping He, Piotr Mieczkowski, Sara A. Grimm, Charles M. Perou, James N. MacLeod, Derek Y. Chiang, Jan F. Prins, and Jinze Liu. MapSplice: Accurate mapping of RNA-seq reads for splice junction discovery. Nucl. Acids Res. (2010) 38(18): e178.
580 projects this semester will focus on tools for spliced alignment / alignment of short read data. Each student should choose one of the following programs to look at:
- Palma and Qpalma (Alex)
- The tophat / bowtie suite of programs (Kate)
- SSAHA2 (Nissa)
- BLAT (Michael)
- Exonerate (Alan)
- Pass (Zach)
- Gmap
Talk to the instructor to choose a program. You will apply the program to simulated short-read data that we will provide. Your final report will describe your experience with the program in comparison to some of the other programs. You will give a presentation during the last week of classes and submit a final report describing your analysis.
580 projects this semester will focus on tools for spliced alignment of short read data. Each student will choose one of the following programs to look at:
- MapSplice
- Gsnap
- palmapper.
- tophat
During the course of the project you will:
- Present the method in class.
- Apply the program to simulated short-read data that we will provide.
- Write a report that describes your experience with the program and present your findings to the class. These will be due during the last week of classes.
- BLAT
- BLAT (Michael)
- Pass (Zach)
- The tophat / bowtie suite of programs
- The tophat / bowtie suite of programs (Kate)
- Palma and Qpalma
- Palma and Qpalma (Alex)
- SSAHA2
- SSAHA2 (Nissa)
- Palma and Qpalma
- The tophat / bowtie suite of programs
- Palma and Qpalma
- The tophat / bowtie suite of programs
Each student will choose a project topic after talking with the instructor. Projects will involve analyzing sequence data using methods studied in class. As part of the project each student will give a presentation during the last week of classes and submit a final report describing the analysis.
580 projects this semester will focus on tools for spliced alignment / alignment of short read data. Each student should choose one of the following programs to look at:
- Palma and Qpalma
- The tophat / bowtie suite of programs
- SSAHA2
- BLAT
- Exonerate
Talk to the instructor to choose a program. You will apply the program to simulated short-read data that we will provide. Your final report will describe your experience with the program in comparison to some of the other programs. You will give a presentation during the last week of classes and submit a final report describing your analysis.
Projects will be done in groups of two. Each group will choose a project topic after talking with the instructor. Projects will involve analyzing sequence data using methods studied in class. As part of the project each group will give a presentation during the last week of classes and submit a final report describing what they did.
Each student will choose a project topic after talking with the instructor. Projects will involve analyzing sequence data using methods studied in class. As part of the project each student will give a presentation during the last week of classes and submit a final report describing the analysis.
Projects
Projects will be done in groups of two. Each group will choose a project topic after talking with the instructor. Projects will involve analyzing sequence data using methods studied in class. As part of the project each group will give a presentation during the last week of classes and submit a final report describing what they did.
