Due: 10am, Sept 5, 2017
A point will be awarded for each question (10 points total).
S288C_reference_sequence_R64-1-1_20110203.fsa. What command would you execute to show the number of lines, words, and characters in this file?

S288C_reference_sequence_R64-1-1_20110203.fsa is a fasta file. It contains annotation lines that begin with > and sequence information that contains the characters A, T, G, or C. Can you figure out how many annotation lines are in the file? Write two commands piped together that will allow you to display the number of annotation lines in the S288C_reference_sequence_R64-1-1_20110203.fsa file.

test.fsa that contains just the top 1000 lines of S288C_reference_sequence_R64-1-1_20110203.fsa?

~/03_annotations/saccharomyces_cerevisiae_R64-1-1_20110208.gff). You'll remember that .gff files list all the annotated features in a genome (genes, start codons, tRNAs, snoRNAs, etc.). A gff file contains (1) commented annotation information (lines that start with #); (2) tab-delimited feature information lines corresponding to all the genome features (lines that start with chr…); and (3) sometimes a fasta file at the end. One cool aspect of grep is that you can match strings that appear at the very beginning of a line using the ^ symbol. This is used like so:
$ grep '^apple' file.txt
This would find only instances of the word apple that appear at the beginning of a line. See how you could use ^ within a grep command to pull out just the tab-delimited feature information from the .gff file (in other words, leave behind the #-commented information and the fasta information) and save the feature information as sacCer3_tab.gff. What command did you use?
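
To see what the ^ anchor changes, here is a minimal sketch on a made-up toy file. The temporary file and its four lines are hypothetical and are not part of the assignment data; only grep, its -c option, and the ^ anchor come from standard grep behavior.

```shell
# Hypothetical toy file; its contents are made up for illustration only.
tmpfile=$(mktemp)
printf 'apple pie\ngreen apple\n#apple comment\napple\n' > "$tmpfile"

# Without the anchor: count every line that contains "apple" anywhere.
unanchored=$(grep -c 'apple' "$tmpfile")

# With the anchor: keep only lines that begin with "apple".
anchored=$(grep '^apple' "$tmpfile")

echo "lines containing apple: $unanchored"
echo "lines starting with apple:"
echo "$anchored"

# Clean up the temporary file.
rm -f "$tmpfile"
```

Lines such as "green apple" and "#apple comment" contain the word but do not start with it, so the anchored pattern skips them.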

sacCer3_tab.gff and extract out tab-delimited lines for ONLY the tRNAs. What command would you use to (1) extract out just the lines that contain 'tRNA' entries listed in the third column and (2) save them to a file called sacCer3_tRNA.gff? (Hint: to make sure you don't capture the word 'tRNA' listed in some other entry, restrict the match to tRNA entries that have a tab before and after them. Tabs can be specified as '\t'.) (windows_hint; hint1; hint2)

sacCer3_tRNA.gff file. What are they? (hint3)

sacCer3_tRNA.gff file into a bed file and save it as sacCer3_tRNA.bed. What command did you use? (hint4)

sgpu node, using the (2) maximum number of cores you can use on that node, using the (3) maximum time, and you would like your job to be in a (4) normal queue. List the #SBATCH options at the top of your script to specify these requests.

#######################################################################################
#NAME:
#
#DATE:
#
#ASSIGNMENT: 2
#######################################################################################
1)
2)
3)
4)
5)
6)
7)
8)
9A)
9B)
9C)
10)