Assignment 4
Due date: February 22nd.
String matching with mismatches
The find function we wrote in class determines whether a given string is a substring of another string, and returns the first position where they match. Your task is to write a more general function that allows mismatches to occur. The signature of your function should be:
find_with_mismatches(s, substr, num_mismatches)
You are looking for matches of substr in the string s that have up to num_mismatches mismatches. Return the first index in s where such a match begins, or -1 if there is none.
For example: the string CGCT occurs in AGGTCACTAG at index 4 when you allow a single mismatch (the characters at that position are CACT, which differ from CGCT only in the second character).
In the context of motif finding this is very useful, since patterns (motifs) in DNA or protein sequences do not always occur in exactly the same way. Searching with mismatches allows us to capture this variability.
Further assume that your function receives a DNA sequence as input, and that any position in the string you are searching in (s) that is not A, C, G, or T does not constitute a match.
Put your function in a file called find_with_mismatches.py.
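One way to think about the problem is to compare substr against each window of s of the same length and count the positions that fail to match. The helper below is only a sketch of that counting step, not the required function: the name count_mismatches and the constant VALID_BASES are made up for illustration, and it assumes the sequences are given in upper case.

```python
# Hypothetical helper, not part of the required interface.
# Assumes upper-case input; only A, C, G, and T can take part in a match.
VALID_BASES = set("ACGT")

def count_mismatches(window, substr):
    """Count the positions where window and substr (equal length) do not match.

    A character in window that is not A, C, G, or T always counts as a
    mismatch, even if substr has the same character at that position.
    """
    mismatches = 0
    for w, p in zip(window, substr):
        if w not in VALID_BASES or w != p:
            mismatches += 1
    return mismatches

print(count_mismatches("CACT", "CGCT"))  # 1
print(count_mismatches("CNCT", "CNCT"))  # 1, because N never matches
```

With a helper like this, the search reduces to sliding a window of len(substr) characters along s and returning the first starting index whose count is at most num_mismatches.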
Reverse complement
Write a function that receives as input a DNA sequence and computes its reverse complement. For example, the reverse complement of AGTCATG is CATGACT. In computing the reverse complement, assume that any character that is not A, C, G, or T is its own complement.
Call your function reverse_complement, and put it in a file called reverse_complement.py.
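The rule that unknown characters are their own complement can be expressed with a dictionary lookup that falls back to the character itself. The snippet below sketches only that per-character step; the names COMPLEMENT and complement_base are illustrative, and upper-case input is assumed.

```python
# Illustrative names; assumes upper-case input.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement_base(base):
    """Return the complement of a single base.

    Any character other than A, C, G, or T is returned unchanged.
    """
    return COMPLEMENT.get(base, base)

print(complement_base("A"))  # T
print(complement_base("N"))  # N
```

A string can be reversed with slicing: "AGT"[::-1] evaluates to "TGA".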
Submit the programs by email to your instructor. At the top of each file put a comment that identifies you and the program (use a multi-line comment using triple quotes). In addition, each function should include a comment in triple quotes that explains what it does, what kind of input it expects, and what it returns (such comments are used as help messages).
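As an illustration of the comment layout, here is a small unrelated example. The file name, the function gc_count, and the placeholder text in the header are invented for this sketch; your own files should describe your own functions.

```python
"""
Name: <your name>
Program: gc_count.py (a placeholder example showing the comment layout)
"""

def gc_count(seq):
    """Count the G and C characters in a DNA sequence.

    Input: seq, a string of upper-case DNA characters.
    Returns: an integer, the number of positions in seq that are G or C.
    """
    return seq.count("G") + seq.count("C")

# The triple-quoted comment inside the function is stored as its docstring.
print(gc_count.__doc__)
```

In the interactive interpreter, help(gc_count) displays the same text as a help message.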
