Assignment 6

Due date: 11/2 at 11:59pm.

Analyzing Protein-protein interactions

In this assignment you will write Python code that loads and analyzes protein-protein interaction data.

Your first task is to load interaction data stored in a file. The data is stored in comma-delimited format, i.e. as a CSV file. Here are the first few lines of our example file:

YPL094C,YPR086W
YPL043W,YPR072W
YPL070W,YPR193C


Each line has the format:

protein_a,protein_b


This indicates that protein_a interacts with protein_b. In writing your code, use the following interaction dataset. This file contains 10,517 interactions in yeast extracted from the BIND database.
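
As a starting point, here is a minimal sketch of reading comma-delimited pairs with Python's standard csv module. It assumes two columns per row and no header line, which may not match the real dataset exactly. The helper name parse_interactions and the SAMPLE_CSV text are made up for illustration; the three pairs are the example pairs used in this assignment.

```python
import csv
import io

# The three example pairs from this assignment, written as CSV text.
# Assumption: two columns per row, no header line.
SAMPLE_CSV = "YPL094C,YPR086W\nYPL043W,YPR072W\nYPL070W,YPR193C\n"


def parse_interactions(handle):
    """Return a list of (protein_a, protein_b) tuples read from an open CSV handle."""
    pairs = []
    for row in csv.reader(handle):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        pairs.append((row[0].strip(), row[1].strip()))
    return pairs


# io.StringIO stands in for an open file so the sketch runs without any file on disk.
sample_pairs = parse_interactions(io.StringIO(SAMPLE_CSV))
print(sample_pairs)
```

With a real file, the same helper could be called on a handle obtained from open(file_name, newline=''), which is how the csv documentation recommends opening files for csv.reader.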

Part 1: the naive implementation (60 pts)

Write a module called with the following functions:

  • load_interactions(file_name): This function returns a list of tuples, where each element in the list is an interaction, and each element in the tuple is the identifier of a protein. For the above example, the return value should be a list of length three:
[('YPL094C','YPR086W'), ('YPL043W','YPR072W'), ('YPL070W','YPR193C')]
  • interact(interactions, id1, id2): This function receives the IDs of two proteins and returns True if they are listed as interacting in the given interaction dataset, and False otherwise. Make sure that your function returns the same value regardless of the order in which the proteins are provided.
  • get_interactions(interactions, id): This function returns the IDs of all the proteins with which the protein with the given ID interacts in the given interaction dataset.
  • average_interactions(interactions): This function returns the average number of interactions per protein in the given set of interactions.
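
One possible way to approach the three query functions with plain loops over the list of tuples is sketched below. It is not the reference solution. It assumes that "interactions per protein" means the number of distinct partners of each protein that appears anywhere in the list, and that duplicate pairs should be counted once. The parameter name protein_id replaces id only to avoid shadowing Python's built-in id, and the variable example is made up for the demonstration.

```python
def interact(interactions, id1, id2):
    """Return True if id1 and id2 are listed as a pair, in either order."""
    for a, b in interactions:
        if (a == id1 and b == id2) or (a == id2 and b == id1):
            return True
    return False


def get_interactions(interactions, protein_id):
    """Return the distinct partners of protein_id, in order of first appearance."""
    partners = []
    for a, b in interactions:
        if a == protein_id and b not in partners:
            partners.append(b)
        elif b == protein_id and a not in partners:
            partners.append(a)
    return partners


def average_interactions(interactions):
    """Return the mean number of distinct partners over all proteins in the list."""
    proteins = set()
    for a, b in interactions:
        proteins.add(a)
        proteins.add(b)
    if not proteins:
        return 0.0  # assumption: an empty dataset averages to zero
    # Each call below rescans the whole list, which is what makes this version slow.
    total = sum(len(get_interactions(interactions, p)) for p in proteins)
    return total / len(proteins)


# The three example pairs from this assignment.
example = [('YPL094C', 'YPR086W'), ('YPL043W', 'YPR072W'), ('YPL070W', 'YPR193C')]
print(interact(example, 'YPR086W', 'YPL094C'))
print(get_interactions(example, 'YPL094C'))
print(average_interactions(example))
```

Every query scans the full list, so the cost of each call grows with the number of interactions.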

The following is a barebones use case of the code:

interactions = load_interactions('')
protein1 = 'YPL094C'
protein2 = 'YPR086W'
print("do " + protein1 + " and " + protein2 + " interact? " + str(interact(interactions, protein1, protein2)))
print("the number of interactions of " + protein1 + ": " + str(len(get_interactions(interactions, protein1))))
print("the average number of interactions per protein: " + str(average_interactions(interactions)))

Part 2: representing the network using dictionaries (40 pts)

In the second part of the assignment, rewrite your code so that the set of interactions associated with each protein is stored in a dictionary. This will make your code much faster, as you will no longer need to search the list of interactions using a for loop. Put the functions in a module called


In writing your code, use the template shown in class. The “main” segment of the module should be used to test each of the functions. Submit the files and via Canvas.
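
To illustrate the dictionary idea from Part 2, here is a rough sketch in which each protein ID maps to the set of its partners. Several things here are assumptions rather than requirements: the helper build_network is a made-up name, the query functions take the dictionary as their first argument, and the main segment uses the common Python idiom, which may differ from the template shown in class.

```python
def build_network(pairs):
    """Map each protein ID to the set of proteins it interacts with."""
    network = {}
    for a, b in pairs:
        # Store the interaction in both directions so lookups are order-independent.
        network.setdefault(a, set()).add(b)
        network.setdefault(b, set()).add(a)
    return network


def interact(network, id1, id2):
    """Return True if id1 and id2 interact; a single dictionary lookup, no loop."""
    return id2 in network.get(id1, set())


def get_interactions(network, protein_id):
    """Return the partners of protein_id as a sorted list (empty if unknown)."""
    return sorted(network.get(protein_id, set()))


def average_interactions(network):
    """Return the mean number of partners per protein in the network."""
    if not network:
        return 0.0  # assumption: an empty network averages to zero
    return sum(len(partners) for partners in network.values()) / len(network)


if __name__ == "__main__":
    # Small test of each function, using the example pairs from this assignment.
    pairs = [('YPL094C', 'YPR086W'), ('YPL043W', 'YPR072W'), ('YPL070W', 'YPR193C')]
    network = build_network(pairs)
    print(interact(network, 'YPR086W', 'YPL094C'))
    print(get_interactions(network, 'YPL094C'))
    print(average_interactions(network))
```

Building the dictionary takes one pass over the list. After that, interact and get_interactions no longer scan all interactions, which is where the speedup comes from.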

assignments/assignment6.txt · Last modified: 2016/10/24 20:06 by asa