Assignment 6

Due date: 11/2 at 11:59pm.

Analyzing Protein-protein interactions

In this assignment you will write Python code that loads and analyzes protein-protein interaction data.

Your first task is to load interaction data stored in a file. The data is stored in a comma-delimited format, i.e. a CSV file. Here are the first few lines of our example file:


Each line has the format:


This indicates that protein_a interacts with protein_b. In writing your code, use the following interaction dataset. This file contains 10,517 interactions in yeast extracted from the BIND database.
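As a rough illustration of the loading step, here is a minimal sketch using Python's standard csv module. It assumes each line holds exactly two comma-separated protein names and that the file has no header row. The helper parse_interactions is a hypothetical name introduced here so that the parsing can be tried without a file on disk; it is not part of the assignment.

```python
import csv


def parse_interactions(lines):
    """Turn an iterable of 'protein_a,protein_b' lines into a list of tuples.

    Assumption: two comma-separated names per line, no header row.
    """
    pairs = []
    for row in csv.reader(lines):
        # Skip blank or malformed rows rather than failing on them.
        if len(row) < 2:
            continue
        pairs.append((row[0].strip(), row[1].strip()))
    return pairs


def load_interactions(filename):
    """Read the CSV file at `filename` and return a list of (protein_a, protein_b) tuples."""
    with open(filename, newline="") as handle:
        return parse_interactions(handle)


# Try the parser on a few in-memory lines instead of a real file.
sample = ["YPL094C,YPR086W", "YPL043W,YPR072W", "", "YPL070W,YPR193C"]
print(parse_interactions(sample))
```

Separating the parsing from the file handling is one possible design choice; it makes the parsing logic easy to test on a handful of lines.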

Part 1: the naive implementation (60 pts)

Write a module called with the following functions:

[('YPL094C','YPR086W'), ('YPL043W','YPR072W'), ('YPL070W','YPR193C')]

The following is a barebones use case of the code:

interactions = load_interactions('')
protein1 = 'YPL094C'
protein2 = 'YPR086W'
print("do " + protein1 + " and " + protein2 + " interact? " + str(interact(interactions, protein1, protein2)))
print("the number of interactions of " + protein1 + " : " + str(len(get_interactions(interactions, protein1))))
print("the average number of interactions per protein: " + str(average_interactions(interactions)))
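One way the naive, list-based functions in the use case above might look is sketched below. The signatures are inferred from the calls in the use case, and the sketch assumes interactions are undirected, so a pair matches in either order. Treat it as an illustration of the linear-scan approach rather than a reference solution.

```python
def interact(interactions, protein1, protein2):
    """Return True if the two proteins appear together in any pair (either order)."""
    for a, b in interactions:
        if (a == protein1 and b == protein2) or (a == protein2 and b == protein1):
            return True
    return False


def get_interactions(interactions, protein):
    """Return the partners of `protein`, found by scanning every pair in the list."""
    partners = []
    for a, b in interactions:
        if a == protein:
            partners.append(b)
        elif b == protein:
            partners.append(a)
    return partners


interactions = [('YPL094C', 'YPR086W'), ('YPL043W', 'YPR072W'), ('YPL070W', 'YPR193C')]
print(interact(interactions, 'YPR086W', 'YPL094C'))
print(get_interactions(interactions, 'YPL094C'))
```

Both functions visit every pair in the list on each call, so the cost of a single query grows with the total number of interactions.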

Part 2: representing the network using dictionaries (40 pts)

In the second part of the assignment, rewrite your code using dictionaries, where you now represent the set of interactions associated with a protein using a dictionary. This will make your code much faster, as you will no longer need to search the list of interactions using a for loop. Put the functions in a module called


In writing your code use the template shown in class. The “main” segment of the module should be used to test each of the functions. Submit the files and via Canvas.
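For Part 2, the sketch below shows one possible dictionary-based layout together with a small "main" segment that exercises the functions. It maps each protein to a Python set of its partners, which is an assumption made here; the layout expected in class may differ (for example, a dictionary of dictionaries). The function build_network is a hypothetical name, and average_interactions is assumed to mean the mean number of partners per protein.

```python
from collections import defaultdict


def build_network(interactions):
    """Map each protein to the set of proteins it interacts with (undirected)."""
    network = defaultdict(set)
    for a, b in interactions:
        network[a].add(b)
        network[b].add(a)
    return dict(network)


def interact(network, protein1, protein2):
    """Dictionary lookup plus set membership; no scan over all pairs is needed."""
    return protein2 in network.get(protein1, set())


def average_interactions(network):
    """Mean number of partners per protein (assumed meaning of the average)."""
    if not network:
        return 0.0
    return sum(len(partners) for partners in network.values()) / len(network)


if __name__ == "__main__":
    # Small in-memory test data taken from the example list in Part 1.
    pairs = [('YPL094C', 'YPR086W'), ('YPL043W', 'YPR072W'), ('YPL070W', 'YPR193C')]
    network = build_network(pairs)
    print(interact(network, 'YPL094C', 'YPR086W'))
    print(average_interactions(network))
```

With this layout, checking whether two proteins interact takes roughly constant time on average, instead of time proportional to the length of the interaction list.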