Due date: 11/2 at 11:59pm.
In this assignment you will write Python code that loads and analyzes protein-protein interaction data.
Your first task is to load interaction data stored in a file. The data is stored in comma-delimited format, i.e., as a CSV file. Here are the first few lines of our example file:
YPL094C,YPR086W
YPL043W,YPR072W
YPL070W,YPR193C
Each line has the format:
protein_a,protein_b
This indicates that protein_a interacts with protein_b.
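Because each line holds exactly two comma-separated fields, Python's standard csv module can parse it directly. The following is a minimal sketch, assuming the three example lines are held in a string rather than read from the real data file:

```python
import csv
import io

# The three example lines, held in a string so the sketch runs without the data file.
sample = "YPL094C,YPR086W\nYPL043W,YPR072W\nYPL070W,YPR193C\n"

# csv.reader yields one list of fields per line: [protein_a, protein_b].
# Blank rows come back as empty lists and are skipped; stray spaces are stripped.
pairs = [tuple(field.strip() for field in row)
         for row in csv.reader(io.StringIO(sample)) if row]
print(pairs)
```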
In writing your code, use the following interaction dataset.
This file contains 10,517 yeast protein interactions extracted from the BIND database.
Write a module called ppi1.py with the following functions:
load_interactions(file_name): This function reads the given file and returns a list of tuples, where each element of the list is an interaction and each element of a tuple is the identifier of a protein. For the example above, the return value should be a list of length three: [('YPL094C', 'YPR086W'), ('YPL043W', 'YPR072W'), ('YPL070W', 'YPR193C')]
interact(interactions, id1, id2): This function receives the IDs of two proteins and returns True if the two proteins interact in the given interaction dataset, i.e., if the pair appears in it, and False otherwise. Make sure that your function returns the same value regardless of the order in which the proteins are provided.
get_interactions(interactions, id): This function returns the IDs of all the proteins that interact with the protein with the given ID in the given interaction dataset.
The following is a barebones use case of the code:
interactions = load_interactions('yeast_interactions.data')
protein1 = 'YPL094C'
protein2 = 'YPR086W'
# interact() returns a bool, so convert it with str() before concatenating
print("do proteins " + protein1 + " and " + protein2 + " interact? " + str(interact(interactions, protein1, protein2)))
print("the number of interactions of " + protein1 + ": " + str(len(get_interactions(interactions, protein1))))
print("the average number of interactions per protein: " + str(average_interactions(interactions)))
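A minimal list-based sketch of ppi1.py follows; it is one possible approach, not a reference solution. Some details are assumptions: get_interactions drops duplicate partners, and average_interactions (which the use case calls but the function list does not define) is taken to be the mean number of distinct partners over all proteins that appear in the dataset. The parameter id is renamed protein_id to avoid shadowing Python's built-in id().

```python
import csv


def load_interactions(file_name):
    """Return a list of (protein_a, protein_b) tuples, one per line of the CSV file."""
    interactions = []
    with open(file_name, newline="") as f:
        for row in csv.reader(f):
            if len(row) == 2:  # skip blank or malformed lines
                interactions.append((row[0].strip(), row[1].strip()))
    return interactions


def interact(interactions, id1, id2):
    """Return True if id1 and id2 form an interaction pair, in either order."""
    for a, b in interactions:
        if (a, b) == (id1, id2) or (a, b) == (id2, id1):
            return True
    return False


def get_interactions(interactions, protein_id):
    """Return the distinct IDs of all proteins that interact with protein_id.

    protein_id is called `id` in the assignment; renamed to avoid shadowing id().
    """
    partners = []
    for a, b in interactions:
        if a == protein_id and b not in partners:
            partners.append(b)
        elif b == protein_id and a not in partners:
            partners.append(a)
    return partners


def average_interactions(interactions):
    """Mean number of distinct partners per protein (assumed definition)."""
    proteins = set()
    for a, b in interactions:
        proteins.add(a)
        proteins.add(b)
    if not proteins:
        return 0.0
    # One full scan of the list per protein: slow, which part two addresses.
    total = sum(len(get_interactions(interactions, p)) for p in proteins)
    return total / len(proteins)
```

Every call scans the full list, and average_interactions calls get_interactions once per protein, so its running time grows with the number of proteins times the number of interactions.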
In the second part of the assignment, rewrite your code using dictionaries: represent the interactions associated with each protein by a dictionary entry that maps the protein's ID to the set of IDs of the proteins it interacts with. This will make your code much faster, because you will no longer need to search the whole list of interactions with a for loop. Put the functions in a module called ppi2.py.
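A minimal sketch of the dictionary-based functions for ppi2.py, assuming the dictionary maps each protein ID to a Python set of partner IDs. Each interaction is stored under both proteins, so interact is order-independent and a single membership test replaces the loop. average_interactions again uses the assumed definition of mean distinct partners per protein.

```python
import csv
from collections import defaultdict


def load_interactions(file_name):
    """Return a dict mapping each protein ID to the set of IDs it interacts with."""
    partners = defaultdict(set)
    with open(file_name, newline="") as f:
        for row in csv.reader(f):
            if len(row) != 2:
                continue  # skip blank or malformed lines
            a, b = row[0].strip(), row[1].strip()
            # Store the interaction under both proteins so lookups are symmetric.
            partners[a].add(b)
            partners[b].add(a)
    return dict(partners)  # plain dict: looking up a missing ID won't add an entry


def interact(interactions, id1, id2):
    """Return True if id1 and id2 interact; the argument order does not matter."""
    return id2 in interactions.get(id1, set())


def get_interactions(interactions, protein_id):
    """Return a copy of the set of proteins that interact with protein_id."""
    return set(interactions.get(protein_id, set()))


def average_interactions(interactions):
    """Mean number of distinct partners per protein (assumed definition)."""
    if not interactions:
        return 0.0
    return sum(len(p) for p in interactions.values()) / len(interactions)
```

Dictionary lookup and set membership take constant time on average, so interact and get_interactions no longer depend on the size of the dataset.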
In writing your code, use the template shown in class. The “main” segment of the module should be used to test each of the functions. Submit the files ppi1.py and ppi2.py via Canvas.
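The template from class is not reproduced here. The sketch below assumes the common Python idiom of a main() function guarded by if __name__ == "__main__":, which runs only when the module is executed directly, not when it is imported. To keep the tests independent of the real dataset, it writes the three example lines to a temporary file; load_interactions is a list-based version repeated so the block runs on its own.

```python
import csv
import os
import tempfile


def load_interactions(file_name):
    """List-based loader: a list of (protein_a, protein_b) tuples."""
    interactions = []
    with open(file_name, newline="") as f:
        for row in csv.reader(f):
            if len(row) == 2:  # skip blank or malformed lines
                interactions.append((row[0].strip(), row[1].strip()))
    return interactions


def main():
    # Write the example lines to a temporary file so the test is self-contained.
    sample = "YPL094C,YPR086W\nYPL043W,YPR072W\nYPL070W,YPR193C\n"
    fd, path = tempfile.mkstemp(suffix=".csv")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(sample)
        interactions = load_interactions(path)
    finally:
        os.remove(path)  # clean up even if loading fails

    # A failed assert stops the run with an AssertionError.
    assert len(interactions) == 3
    assert interactions[0] == ("YPL094C", "YPR086W")
    print("load_interactions: OK")
    # Tests for interact, get_interactions, etc. would follow the same pattern.


if __name__ == "__main__":
    main()
```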