Due: 10am, Sept 6, 2016
Assignments account for 60% of your final grade. All assignments are due at the start of the following class (Tuesday, 10 am). You will need to have completed the assignment to follow along effectively in the next class. One point will be awarded for each question (10 points total). Possible answers will be posted the evening after the assignments are due.
We are going to use the yeast genome for these exercises. Make sure you have downloaded S288C_reference_genome_R64-1-1_20110203.tgz from http://www.yeastgenome.org.
S288C_reference_sequence_R64-1-1_20110203.fsa is a FASTA file. It contains annotation lines that begin with the > character and sequence lines that contain the characters A, T, G, or C. Can you figure out how many annotation lines are in the file? Write two commands piped together that will display the number of annotation lines in the
~/03_annotations/saccharomyces_cerevisiae_R64-1-1_20110208.gff). One cool aspect of grep is that you can tell it to match only strings that appear at the very beginning of a line by using the ^ symbol. It is used like so:
$ grep '^apple' file.txt
This would find only the lines that begin with the string apple. Work out how you could use ^ to pull out just the tab-delimited portion of the .gff file and save it as sacCer3_tdt_annotation.gff. What command did you use?
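To see the anchor in action before touching the real files, here is a minimal sketch on throwaway files. The file names demo.txt and toy.fa and their contents are made up for illustration; any shell with grep and wc should behave the same way. The last pipeline shows the same anchor applied to FASTA-style header lines, combined with wc -l to count them.

```shell
#!/usr/bin/env bash
# Hypothetical throwaway file with "apple" at the start, middle, and end of lines.
printf 'apple pie\ngreen apple\napple\nbanana\n' > demo.txt

# Unanchored pattern: matches "apple" anywhere in a line.
grep 'apple' demo.txt      # apple pie / green apple / apple

# Anchored pattern: ^ ties the match to the start of the line.
grep '^apple' demo.txt     # apple pie / apple

# Hypothetical toy FASTA file: header lines start with ">".
printf '>seq1\nACGT\n>seq2\nGGCC\n' > toy.fa

# Keep only lines starting with ">" and count them (two header lines here).
grep '^>' toy.fa | wc -l
```

Note that '^apple' would also match a line starting with "applesauce"; the anchor only fixes where the match begins.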
sacCer3_tdt_annotation.gff and extract the annotation lines for ONLY the nuclear-encoded tRNAs. What piped series of commands would you use to (1) extract just the lines that contain 'tRNA' entries (in the third column), (2) remove any lines that contain 'chrMito', and (3) save the resulting file with the name sacCer3_tRNA_minusMito.gff? (hint 1: I used a grep command, another grep command, and a redirection. hint 2: You may need to get tricky to extract the 'tRNA' entries. Think '\t'.)
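Here is a minimal sketch of the tab trick on a made-up three-line GFF-style file (toy.gff, its IDs, and its coordinates are invented for illustration). It assumes bash, whose $'...' quoting turns \t into a real tab character; a plain '\t' inside single quotes is not reliably read as a tab by grep.

```shell
#!/usr/bin/env bash
# Requires bash: $'...' converts \t into an actual tab character.
# toy.gff is a hypothetical GFF-style file; IDs and coordinates are invented.
printf 'chrI\tSGD\ttRNA\t100\t172\t.\t+\t.\tID=toy_tRNA_1\n' > toy.gff
printf 'chrI\tSGD\tgene\t300\t900\t.\t-\t.\tID=toy_gene;Note=near a tRNA\n' >> toy.gff
printf 'chrMito\tSGD\ttRNA\t50\t122\t.\t+\t.\tID=toy_tRNA_2\n' >> toy.gff

# A bare 'tRNA' pattern would also hit the gene line, whose Note mentions tRNA.
# Requiring a tab on both sides only matches "tRNA" as a whole tab-delimited field.
# grep -v then drops the mitochondrial line, and > saves the result.
grep $'\ttRNA\t' toy.gff | grep -v 'chrMito' > toy_tRNA_minusMito.gff

cat toy_tRNA_minusMito.gff   # only the chrI tRNA line remains
```

The tab-bounded pattern still accepts "tRNA" as a whole field in any column, so if you want to test column 3 specifically, awk -F'\t' '$3 == "tRNA"' is a stricter alternative.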
sacCer3_tRNA_minusMito.gff file. What are they?
sacCer3_tRNA_minusMito.gff file into a BED file? Save it as sacCer3_tRNA.bed. (hint: see the cut man page and look at the examples.)
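A minimal sketch of the cut approach on a made-up two-line GFF-style file (toy_tRNA.gff, its IDs, and its coordinates are invented). cut splits on tabs by default, so -f1,4,5 keeps the sequence name, start, and end, which are the three required BED columns. One caveat worth knowing: BED start coordinates are 0-based while GFF starts are 1-based, so the second command uses awk (not part of the original hint) to subtract 1 from each start.

```shell
#!/usr/bin/env bash
# toy_tRNA.gff is a hypothetical GFF-style file; IDs and coordinates are invented.
printf 'chrI\tSGD\ttRNA\t100\t172\t.\t+\t.\tID=toy_tRNA_1\n' > toy_tRNA.gff
printf 'chrII\tSGD\ttRNA\t500\t571\t.\t-\t.\tID=toy_tRNA_3\n' >> toy_tRNA.gff

# cut uses tab as its default delimiter; keep fields 1 (seqid), 4 (start), 5 (end).
cut -f1,4,5 toy_tRNA.gff > toy_tRNA.bed

# Stricter variant: convert the 1-based GFF start to a 0-based BED start,
# keeping the output tab-delimited.
awk 'BEGIN { FS = OFS = "\t" } { print $1, $4 - 1, $5 }' toy_tRNA.gff > toy_tRNA_0based.bed
```

For the toy input, the first command writes chrI 100 172 and chrII 500 571 (tab-separated); the awk version writes 99 and 499 as the starts.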