Further Enrichment 2

Exercise 1: Structural biology

Do you love protein structures? There are many interesting standardized file formats that are used in structural biology. Explore some of these datasets:

  • Go to the Protein Data Bank (PDB): http://www.rcsb.org/pdb/home/home.do
  • Search for your favorite protein structure
  • From the search results that pop up, click on an interesting-looking entry
  • When you arrive at the page for an interesting crystal structure, look for the DOWNLOAD FILES pull-down menu. Download a PDB file (compressed or uncompressed).

  • Use more or less to browse through the PDB file. Can you figure out how the information is organized? For more information, check the PDB file format Wikipedia page.
  • Can you count how many oxygen atoms are in your protein?
  • Can you count the number of lysine (LYS) residues in your protein?
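
One possible way to approach the two counting questions is sketched below. It assumes the fixed-column PDB layout (atom name in columns 13-16, residue name in columns 18-20, element symbol in columns 77-78) and a single-model file without alternate conformations. The helper names count_element and count_residue and the file name 1abc.pdb are made up for illustration.

```shell
# Sketch only: count_element and count_residue are hypothetical helper names.
# Assumes the fixed-column PDB layout: atom name in columns 13-16,
# residue name in columns 18-20, element symbol in columns 77-78.

# Count ATOM records whose element symbol matches $2 (e.g. O).
# HETATM records (waters, ligands) are deliberately left out.
count_element() {
    awk -v el="$2" '
        /^ATOM  / {
            sym = substr($0, 77, 2)
            gsub(/ /, "", sym)      # the symbol is right-justified, so trim spaces
            if (sym == el) n++
        }
        END { print n + 0 }
    ' "$1"
}

# Count residues named $2 (e.g. LYS) by counting their alpha carbons,
# so each residue is counted once instead of once per atom.
count_residue() {
    awk -v res="$2" '
        /^ATOM  / && substr($0, 13, 4) == " CA " && substr($0, 18, 3) == res { n++ }
        END { print n + 0 }
    ' "$1"
}

# Example usage (1abc.pdb is a placeholder file name):
#   count_element 1abc.pdb O
#   count_residue 1abc.pdb LYS
```

Splitting lines on whitespace instead of fixed columns is tempting, but neighboring PDB fields can run together without a space, so column positions are the safer choice here.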

Exercise 2: PubMed timeline

When you're starting a collection of bibliographic material, it can be useful to download bibliographic information in bulk from PubMed. Try the following exercise, which creates a timeline of publication dates for a given researcher:

  • Enter the name of one of your favorite scientists in the PubMed search box.
  • Download the results of the search by selecting the SEND TO pull-down menu
  • Select FILE
  • Select MEDLINE

  • Download the dataset and browse it in the terminal.
  • Display all of this person's publication dates, sorted by year (I couldn't get month sorting to work).
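
As a rough sketch of the last step, the functions below assume the download is in MEDLINE format, where each record's publication date sits on a line tagged "DP  - " (for example "DP  - 2015 Mar 12"). The helper names pub_dates_by_year and pubs_per_year and the file name pubmed_result.txt are made up for illustration.

```shell
# Sketch only: pub_dates_by_year and pubs_per_year are hypothetical helper names.
# Assumes MEDLINE-format records with the publication date on a "DP  - " line.

# Print every publication date, sorted numerically by year.
pub_dates_by_year() {
    grep '^DP  - ' "$1" |
        cut -c7- |          # drop the 6-character "DP  - " tag
        sort -k1,1n         # numeric sort on the first field (the year)
}

# Tally publications per year to get a rough timeline.
pubs_per_year() {
    pub_dates_by_year "$1" | cut -d' ' -f1 | sort -n | uniq -c
}

# Example usage (pubmed_result.txt is a placeholder file name):
#   pub_dates_by_year pubmed_result.txt
#   pubs_per_year pubmed_result.txt
```

If your version of sort has the non-POSIX -M (month) option, as GNU sort does, adding a second key with `sort -k1,1n -k2,2M` may be worth trying for month ordering. Dates without a plain month abbreviation would still sort unpredictably.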

Exercise 3: What is a bed file for?

BED files are a very common standardized file format and a fairly compact way of representing locations on a genome. For this exercise, see if you can upload your .bed file to a genome browser. A genome browser is an interface that lets you cruise around the genome: you can zoom out and see everything on a chromosome, or zoom in on a specific gene or location. There are two main genome browsers: IGV, a local program you can install, and the UCSC Genome Browser, an online resource.

The sacCer3_tRNA.bed file you made contains the locations of all the tRNA genes in the yeast genome. Let's try to upload this file to the UCSC Genome Browser. See if you can follow along with these steps:

  • Make sure the genome April 2011 (SacCer_April2011/sacCer3) is selected
  • Press GO to take you to the browser
  • Check out the browser. To learn more about how to navigate on the browser, watch this video. It's a little out of date but it gets the main point across.
  • Let's add your .bed file to the browser. This will display all the tDNA locations specified in your .bed file as their own custom track.
  • To do this, click ADD CUSTOM TRACKS.
  • Upload the sacCer3_tRNA.bed file you made for Assignment 2: select CHOOSE FILE, navigate to your .bed file, and then push SUBMIT.
  • On the MANAGE CUSTOM TRACKS page that results, push GO. (If this didn't work, there may be something wrong with your .bed file. Read the error message and try to correct the problem. Common things that go wrong: the genome versions don't match, so make sure it is sacCer3 in both cases; or the file contains extra or unusual characters.)
  • Your custom track should appear in a row marked USER TRACK.
  • Zoom out to get a sense of what you are looking at (push ZOOM OUT, 10x)
  • Explore the genome and see where the tRNAs are located. Try entering chrI in the "enter position or search terms" bar. Try zooming in on tRNAs by dragging over regions of interest.
  • Do your tRNA regions match up with SGD annotated tRNA regions?
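
If the upload fails, a quick check of the .bed file before retrying can help narrow things down. The sketch below is not a full BED validator. It only looks for a few of the problems mentioned above (stray characters such as Windows line endings, missing tab-separated columns, odd coordinates) and lists the chromosome names so you can compare them with the ones the browser uses. The helper names check_bed and bed_chroms are made up for illustration.

```shell
# Sketch only: check_bed and bed_chroms are hypothetical helper names.

# Report lines that look suspicious; the exit status is non-zero if any are found.
check_bed() {
    awk -F'\t' '
        /^(track|browser|#)/ { next }   # header and comment lines are allowed
        /\r/   { print "line " NR ": carriage return (Windows line ending?)"; bad++; next }
        NF < 3 { print "line " NR ": fewer than 3 tab-separated fields"; bad++; next }
        $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ { print "line " NR ": start/end are not integers"; bad++; next }
        $2 + 0 > $3 + 0 { print "line " NR ": start is greater than end"; bad++ }
        END { exit (bad > 0) }
    ' "$1"
}

# List the distinct chromosome names used in the file.
bed_chroms() {
    grep -v -E '^(track|browser|#)' "$1" | cut -f1 | sort -u
}

# Example usage:
#   check_bed sacCer3_tRNA.bed && echo "no obvious problems"
#   bed_chroms sacCer3_tRNA.bed
```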

Exercise 4: First shell script

Do you remember all the steps you took in Assignment 2 to extract a .bed file containing the tRNA loci from a large, annotated .gff file with FASTA sequence at the end? Can you make a very simple .sh script that makes a .bed file containing the loci for all the introns in the yeast genome? How many spliced genes are there? It's not many!
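
A minimal sketch of what such a script could look like is below. It is not necessarily the same sequence of steps as in Assignment 2. It assumes a tab-separated GFF whose third column holds the feature type, that introns are labeled "intron" there, and that the sequence section starts with a ##FASTA line. The function name gff_to_bed and the file names are made up for illustration.

```shell
#!/bin/sh
# Sketch only: gff_to_bed is a hypothetical helper, and the feature name
# "intron" in column 3 is an assumption about how the GFF labels introns.
# Usage: gff_to_bed FEATURE_TYPE annotations.gff > output.bed
gff_to_bed() {
    awk -F'\t' -v OFS='\t' -v type="$1" '
        /^##FASTA/ { exit }     # stop before the sequence section
        /^#/       { next }     # skip comment lines
        # GFF coordinates are 1-based and inclusive; BED starts are 0-based
        # and ends are exclusive, so only the start shifts by one.
        $3 == type { print $1, $4 - 1, $5 }
    ' "$2"
}

# Example usage (file names are placeholders):
#   gff_to_bed intron annotations.gff > sacCer3_introns.bed
#   wc -l < sacCer3_introns.bed
```

The line count gives the number of introns, which is not quite the number of spliced genes if some genes contain more than one intron.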


Exercise 5: More databases

There is a wealth of biological data out there. Find a biological database relevant to your field of study and download some datasets. What are the common file types? Which can be read as text files?


Exercise 6: Make your own oligo inventory

Do you order a lot of oligos? If so, the companies typically send their e-mails in a standardized format. Try downloading all the e-mails and pulling out the relevant information about each oligo based on that format. You can then save it to an .xls sheet or a database program. Your database might be quite basic after Week 2 of the course, but after Week 4 it may become more sophisticated. If you are interested in making more apps like this, consider taking the Python course.

assignments/enrichment2.txt · Last modified: 2016/09/02 20:02 by erin