Main.Biopython-parse History

April 22, 2010, at 12:44 PM MST by 10.84.44.65 -
Added lines 1-60:

(:source lang=python:)

"""
Database file parsing in Biopython:
  - FASTA
  - GenBank
  - Swiss-Prot

See section 5.1 of the Biopython tutorial:
http://biopython.org/DIST/docs/tutorial/Tutorial.html
The SeqRecord class is described in section 4.1.
"""

from Bio import SeqIO

parser = SeqIO.parse("cam.fasta", "fasta")
print(parser)
record = next(parser)  # fetch the first record from the iterator

# The parser is an iterator, so we can loop through the FastaIterator
# with a for loop:

for seq_record in SeqIO.parse("cam.fasta", "fasta"):
    print(seq_record.id)
    print(seq_record.seq)
    print(len(seq_record.seq))

# Essentially the same code can be used to go through files in UniProt
# (Swiss-Prot) format:

from Bio import SeqIO

parser = SeqIO.parse("cam.txt", "swiss")
print(parser)
record = next(parser)  # fetch the first record from the iterator

for seq_record in SeqIO.parse("cam.txt", "swiss"):
    print(seq_record.id)
    print(seq_record.seq)
    print(len(seq_record.seq))

# If we had a file in GenBank format, we would give the string "genbank"
# as the format argument to the parse function.

from Bio import SeqIO

for seq_record in SeqIO.parse("cam.gbk", "genbank"):
    print(seq_record.id)
    print(seq_record.seq)
    print(len(seq_record.seq))

# Note: use the 'read' function in SeqIO to read data from a sequence
# file that contains a single record.

# Getting a list of records:

from Bio import SeqIO

# using a list comprehension
records = [record for record in SeqIO.parse("cam.fasta", "fasta")]

# or using the list constructor:
records = list(SeqIO.parse("cam.fasta", "fasta"))

(:sourceend:)
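
The note about the 'read' function can be illustrated with a minimal sketch. It assumes a Python 3 installation of Biopython, and it uses hypothetical in-memory FASTA text (the `demo1` and `demo2` records are made up for illustration) in place of a real file such as cam.fasta. It shows that `SeqIO.read` returns a single SeqRecord directly, and that it raises `ValueError` when the input holds more than one record.

```python
from io import StringIO

from Bio import SeqIO

# Hypothetical in-memory FASTA data, used so the sketch does not need cam.fasta.
single = StringIO(">demo1 example protein\nMADQLTEEQIAE\n")

# read() returns one SeqRecord rather than an iterator.
record = SeqIO.read(single, "fasta")
print(record.id, len(record.seq))

# read() refuses input that holds more than one record.
double = StringIO(">demo1\nMADQ\n>demo2\nLTEE\n")
try:
    SeqIO.read(double, "fasta")
    read_failed = False
except ValueError:
    read_failed = True
print(read_failed)
```

For a file that may contain several records, `SeqIO.parse` remains the safer choice, since it works for any number of records.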