
Due Date & Grading

Assignments account for 70% of your final grade. Each assignment is due at the start of the following class (Tuesday, 10 am), and you will need to have completed it to follow along effectively in that class. Each question is worth one point (10 points total). Possible answers will be posted the evening after the assignment is due.

Performing the assignment:

  • Using your text editor (TextWrangler, Notepad++, etc.), create a file named <yourlastname>_Assignment1.txt, replacing <yourlastname> with your actual last name.
  • Answer each question below. You can use this template by copying and pasting it into your text editor.
  • Do NOT use Microsoft Word or any other Office software to edit your .txt file. These programs can add extra characters to the file, and you'll lose points.
  • When a question asks for a command, supply the entire command line: everything you would type after the prompt and before pressing Return.
  • Turn in the assignment via Canvas.
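Because word processors can slip in characters that are not plain text, it may help to check your answer file before turning it in. The sketch below assumes a bash-compatible shell; `check_plain_text` is a hypothetical helper name, while `grep -n` and the POSIX character classes it uses are standard. It prints each offending line with its line number.

```shell
# check_plain_text (hypothetical helper name): print any line of a file that
# contains bytes outside printable ASCII, such as the curly quotes or
# non-breaking spaces a word processor may insert. Tabs and spaces are allowed.
check_plain_text() {
  local file="$1"
  if [ ! -r "$file" ]; then
    echo "cannot read $file" >&2
    return 2
  fi
  # LC_ALL=C makes grep compare raw bytes, so every byte >= 0x80
  # (part of any UTF-8 multi-byte character) falls outside [:print:].
  if LC_ALL=C grep -n '[^[:print:][:space:]]' "$file"; then
    echo "WARNING: $file contains non-ASCII or control characters" >&2
    return 1
  fi
  echo "$file looks like plain ASCII text"
}
```

For example, run `check_plain_text <yourlastname>_Assignment1.txt` in the directory that holds your answer file.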

Instructions:

We are going to download the yeast genome. Each model organism hosts a database website that contains information on the model system's genome sequence and annotation. In addition to these resources, NCBI, UCSC Genome Browser, Ensembl, and RefSeq all host repositories of genome sequences and annotations.

Let's download the yeast genome:

  • Navigate to the Saccharomyces Genome Database (http://www.yeastgenome.org/).
  • Go to the Sequences menu
  • Go to Reference Genome
  • Go to Download Genome
  • Download the README file by right-clicking genome_releases.README. Save it to a directory called Assignment1, where you will do your assignment.
  • Download the compressed file S288C_reference_genome_R64-1-1_20110203.tgz by right-clicking it, and save it to the Assignment1 directory. You'll notice this is not the most recent release. However, it is a widely used, stable version, also known as sacCer3.
  • Using Finder/Explorer, open the Assignment1 directory. Double click the .tgz file to decompress and expand it.
  • Switch to the terminal.
  • Start in your home directory.
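If you prefer the terminal to double-clicking, the .tgz archive can also be unpacked with `tar`. This is a minimal sketch assuming a bash-compatible shell; the helper name `extract_tgz` is hypothetical, but the tar options it uses are standard on macOS and Linux.

```shell
# extract_tgz (hypothetical helper name) unpacks a .tar.gz/.tgz archive.
# Standard tar options: -x extract, -z gunzip, -v list files as they are
# extracted, -f read from the named archive file, -C unpack into a directory.
extract_tgz() {
  local archive="$1"
  local dest="${2:-.}"   # default destination: the current directory
  tar -xzvf "$archive" -C "$dest"
}
```

Run from inside the Assignment1 directory, `extract_tgz S288C_reference_genome_R64-1-1_20110203.tgz` does the same as typing `tar -xzvf S288C_reference_genome_R64-1-1_20110203.tgz` directly.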

Assignment questions:

  1. Starting from your home directory, use a single command to navigate to the Assignment1 directory. What command did you use?
  2. Upon arriving in the Assignment1 directory, what two commands do you execute to A) double check where you are and B) see the directory contents?
  3. What command would you execute to read the genome_releases.README file contents? Execute this command.
  4. What command do you execute to navigate into the S288C_reference_genome_R64-1-1_20110203 directory from its parent directory? Execute this command to navigate into this directory.
  5. You'll notice there are several .fasta files, an .fsa file, a .gff file, and an .sgd file. How would you execute an ls command with a wildcard so it lists ONLY the .fasta files?
  6. How would you execute an ls command with wildcards AND options to produce the following results: a list of just the .fasta files, sorted by file size (largest first) and displayed in long format? The output should look like this:
    -rw-r--r--@ 1 erinonish  staff  11005431 Feb  3  2011 orf_coding_all_R64-1-1_20110203.fasta
    -rw-r--r--@ 1 erinonish  staff   4841092 Feb  3  2011 orf_trans_all_R64-1-1_20110203.fasta
    -rw-r--r--@ 1 erinonish  staff   3432843 Feb  3  2011 NotFeature_R64-1-1_20110203.fasta
    -rw-r--r--@ 1 erinonish  staff    973127 Feb  3  2011 other_features_genomic_R64-1-1_20110203.fasta
    -rw-r--r--@ 1 erinonish  staff    145907 Feb  3  2011 rna_coding_R64-1-1_20110203.fasta
  7. This directory could use some organization. What command would you execute to make four sub-directories named 01_genome, 02_fastafiles, 03_annotations, and 04_other?
  8. What commands would you execute to move (A) the .fsa file into 01_genome; (B) all the .fasta files into 02_fastafiles; (C) the .gff file into 03_annotations; and (D) the .sgd file into 04_other? If you do this correctly, you should end up with a file structure that looks like this:
    .
    ├── 01_genome
    │   └── S288C_reference_sequence_R64-1-1_20110203.fsa
    ├── 02_fastafiles
    │   ├── NotFeature_R64-1-1_20110203.fasta
    │   ├── orf_coding_all_R64-1-1_20110203.fasta
    │   ├── orf_trans_all_R64-1-1_20110203.fasta
    │   ├── other_features_genomic_R64-1-1_20110203.fasta
    │   └── rna_coding_R64-1-1_20110203.fasta
    ├── 03_annotations
    │   └── saccharomyces_cerevisiae_R64-1-1_20110208.gff
    └── 04_other
        └── gene_association_R64-1-1_20110205.sgd
  9. Make a new directory called 00_README inside S288C_reference_genome_R64-1-1_20110203. Starting from within the S288C_reference_genome_R64-1-1_20110203 directory, how would you move genome_releases.README into 00_README? Hint: it's in the parent directory.
  10. It is good practice to write some documentation every time you download something from the internet, such as a genome. This information is often kept in a personal README file. Navigate to the 00_README directory and create your own README file, giving it a name of your choice. Your README file should contain the information listed below. Copy the contents of your README file into your answer for this question.
    README file contents:
     - today's date
     - your name and that you are the author of the readme file
     - what you downloaded
     - the URL address where the downloaded information came from
     - why you wanted to download the files
     - how you organized your files
     - whether you encountered any problems
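One way to create the README for question 10 without leaving the terminal is a here-document. The sketch below assumes a bash-compatible shell; the function name `write_readme`, the `_download_README.txt` suffix, and the field labels are illustrative choices, and the `<...>` placeholders are meant to be replaced with your own answers.

```shell
# write_readme (hypothetical helper name) creates a dated README in the
# current directory. The "_download_README.txt" suffix and the <...>
# placeholders are assumptions; edit the file afterwards with your own details.
write_readme() {
  local author="$1"
  local file
  # yymmdd prefix, in the same style as 170824_download_README.txt in the Hint
  file="$(date +%y%m%d)_download_README.txt"
  # Unquoted EOF: $(date ...) and ${author} are expanded when the file is written.
  cat > "$file" <<EOF
Date: $(date +%Y-%m-%d)
Author of this README: ${author}
What I downloaded: <files and genome release>
Source URL: <where the files came from>
Why I downloaded it: <your reason>
How I organized the files: <directory layout>
Problems encountered: <none, or describe them>
EOF
  # Print the file name so you know what was created.
  echo "$file"
}
```

From inside 00_README, run `write_readme "Your Name"`, then open the new file in your text editor to fill in the placeholders.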

Hint

At the end of this exercise, your Assignment1 directory should have the following structure. You may or may not have a .tgz file in there; you can remove it once you have completed the exercise.

.
├── S288C_reference_genome_R64-1-1_20110203
│   ├── 00_README
│   │   ├── 170824_download_README.txt (or whatever you named your readme file)
│   │   └── genome_releases.README
│   ├── 01_genome
│   │   └── S288C_reference_sequence_R64-1-1_20110203.fsa
│   ├── 02_fastafiles
│   │   ├── NotFeature_R64-1-1_20110203.fasta
│   │   ├── orf_coding_all_R64-1-1_20110203.fasta
│   │   ├── orf_trans_all_R64-1-1_20110203.fasta
│   │   ├── other_features_genomic_R64-1-1_20110203.fasta
│   │   └── rna_coding_R64-1-1_20110203.fasta
│   ├── 03_annotations
│   │   └── saccharomyces_cerevisiae_R64-1-1_20110208.gff
│   └── 04_other
│       └── gene_association_R64-1-1_20110205.sgd
└── S288C_reference_genome_R64-1-1_20110203.tgz
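To compare your own layout with the tree above, a small check can test for each expected file. This is a sketch assuming a POSIX-style shell run from inside the Assignment1 directory; `check_layout` is a hypothetical name, and your personal README is skipped because its name is up to you.

```shell
# check_layout (hypothetical helper name): report whether each file from the
# Hint tree is where it should be. Returns non-zero if anything is missing.
check_layout() {
  local base="S288C_reference_genome_R64-1-1_20110203"
  local missing=0 path
  for path in \
    00_README/genome_releases.README \
    01_genome/S288C_reference_sequence_R64-1-1_20110203.fsa \
    02_fastafiles/NotFeature_R64-1-1_20110203.fasta \
    02_fastafiles/orf_coding_all_R64-1-1_20110203.fasta \
    02_fastafiles/orf_trans_all_R64-1-1_20110203.fasta \
    02_fastafiles/other_features_genomic_R64-1-1_20110203.fasta \
    02_fastafiles/rna_coding_R64-1-1_20110203.fasta \
    03_annotations/saccharomyces_cerevisiae_R64-1-1_20110208.gff \
    04_other/gene_association_R64-1-1_20110205.sgd
  do
    if [ -f "$base/$path" ]; then
      echo "ok       $base/$path"
    else
      echo "MISSING  $base/$path"
      missing=$((missing + 1))
    fi
  done
  # Exit status 0 only when every expected file was found.
  [ "$missing" -eq 0 ]
}
```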

More Enrichment Exercises

assignments/2017assignment1.txt · Last modified: 2017/08/23 21:55 by erin