Main.Assignment3 History
If you find it hard to deal with an arbitrary number of mismatches, then solve the problem for num_mismatches=1.
"""
"""
Nucleotide composition
Write a program that asks the user for a nucleotide sequence and then prints out the fraction of each nucleotide out of the total number of nucleotides in the sequence. Assume the user provides the input in capital letters.
Suppose the input sequence provided by the user is
In this assignment you will write a python module called assignment3.py, which performs the following tasks.
GC content
Write a function called gcContent(sequence) that computes the GC content of a DNA sequence, i.e. the fraction of G or C nucleotides in the sequence.
Your function should work regardless of the case in which the sequence is provided (upper/lower case).
Note that a DNA sequence can contain positions that are ambiguous. These are represented by ambiguity codes. For example, 'N' denotes that any nucleotide is possible in that position. In computing GC content ignore positions that contain ambiguous nucleotides.
For example, for the sequence
Then the output should look like:
The nucleotide composition is:
A - 0.2
C - 0.1
G - 0.3
T - 0.4
Note that non-nucleotide symbols are not counted (the N usually denotes a position where the sequence is unknown).
Position i in a string can be accessed as a[i], so you can use a
while or for loop to iterate through the letters of a string.
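As a small illustration of the hint above, the following sketch walks through a string by index with both kinds of loop. The variable names and the choice to count the letter T are assumptions made for this example only.

```python
a = "TTACTNGGAGNT"

# for loop over the valid indices 0 .. len(a) - 1
count_for = 0
for i in range(len(a)):
    if a[i] == "T":
        count_for += 1

# the same traversal written as a while loop
count_while = 0
i = 0
while i < len(a):
    if a[i] == "T":
        count_while += 1
    i += 1

print(count_for, count_while)
```

Both loops visit every position exactly once, so the two counts agree.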
Call your program nucleotide_composition.py, and have a function in it that receives the sequence the user has provided as a parameter.
Submit the program via ramct. At the top of each file put a comment that identifies you and the program (use a multi-line comment using triple quotes):
Your function should return 0.4
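One possible way to approach gcContent is sketched below. It assumes Python 3 division, treats every letter other than A, C, G, or T as ambiguous, and returns 0.0 when no unambiguous positions remain; that last choice is an assumption, since the assignment does not say what to do in that case.

```python
def gcContent(sequence):
    """Return the fraction of G or C among the unambiguous (A, C, G, T)
    positions of sequence, regardless of upper/lower case."""
    sequence = sequence.upper()
    gc_count = sequence.count("G") + sequence.count("C")
    at_count = sequence.count("A") + sequence.count("T")
    total = gc_count + at_count  # ambiguous positions such as N are left out
    if total == 0:
        return 0.0  # assumption: no unambiguous positions means 0.0
    return gc_count / total
```

For a made-up input such as gcContent("acgtn"), the N is skipped and two of the remaining four letters are G or C.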
String matching with mismatches
Write a function called find_with_mismatches(s, substr, num_mismatches)
that determines whether the string s contains substr when up to num_mismatches mismatches are allowed. Your function should return the first position where a match occurs, and -1 if there is no match.
For example, the string CGCT occurs in AGGTCACTAG at index 4 when a single mismatch is allowed.
In the context of motif finding this is very useful, since patterns (motifs) in DNA or protein sequences do not always occur in exactly the same way. Searching with mismatches allows us to capture this variability.
Further assume that your function receives a DNA sequence as input, and that positions in the string you are searching in (s) that are not A, C, G, or T do not constitute matches.
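The description above can be sketched as a simple sliding comparison. This is one straightforward approach rather than the required one; it assumes uppercase input and counts any position in s that is not A, C, G, or T as a mismatch.

```python
def find_with_mismatches(s, substr, num_mismatches):
    """Return the first index at which substr occurs in s with at most
    num_mismatches mismatched positions, or -1 if there is no such index."""
    for start in range(len(s) - len(substr) + 1):
        mismatches = 0
        for offset in range(len(substr)):
            s_char = s[start + offset]
            # a non-ACGT character in s never counts as a match
            if s_char not in "ACGT" or s_char != substr[offset]:
                mismatches += 1
                if mismatches > num_mismatches:
                    break  # too many mismatches at this start position
        if mismatches <= num_mismatches:
            return start
    return -1


print(find_with_mismatches("AGGTCACTAG", "CGCT", 1))
```

With the example from the text, the call returns 4; with num_mismatches set to 0 it returns -1, because CGCT does not occur exactly.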
Reverse complement
Write a function reverse_complement(sequence) that receives a DNA sequence as input and computes its reverse complement. For example, the reverse complement of AGTCATG
is CATGACT. In computing the reverse complement, assume that any character that is not A, C, G, or T is its own complement.
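A minimal sketch of reverse_complement, assuming uppercase input, uses a lookup table for the four nucleotides and falls back to the character itself for anything else:

```python
def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence. Characters other
    than A, C, G, and T are treated as their own complement."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    # walk the sequence backwards, complementing each character
    return "".join(complement.get(base, base) for base in reversed(sequence))


print(reverse_complement("AGTCATG"))
```

For the example in the text this prints CATGACT.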
Put the three functions in a module called assignment3.py, and include a "main" that allows a user to test them similarly to assignment 2.
At the top of the file put a comment that identifies you and the program (use a multi-line comment using triple quotes):
(:sourceend:)
(:sourceend:)
Also, just below each function definition, include a short description of the method and its parameters in triple quotes.
Submit the program via ramct by the due date.
Prime number detection
A prime number is an integer greater than 1 that is divisible only by 1 and itself.
Write a method called check_prime that receives an integer as input and returns True if it's prime, and False otherwise. The name of the python module should be prime.py.
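One simple way to write check_prime is trial division, sketched below. Stopping once the divisor squared exceeds n is an optimization chosen for this example, not something the assignment asks for, and numbers below 2 are treated as not prime.

```python
def check_prime(n):
    """Return True if the integer n is prime, and False otherwise."""
    if n < 2:
        return False  # 0, 1, and negative numbers are not prime
    divisor = 2
    # a composite n must have a divisor no larger than its square root
    while divisor * divisor <= n:
        if n % divisor == 0:
            return False
        divisor += 1
    return True
```

For instance, check_prime(97) returns True, while check_prime(9) returns False because 9 is divisible by 3.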
Submit the programs by email to your instructor. At the top of each file put a comment that identifies you and the program (use a multi-line comment using triple quotes):
Assignment 2
Submitted by Your_Name
A short description of your program
Submit the program via ramct. At the top of each file put a comment that identifies you and the program (use a multi-line comment using triple quotes):
(:source lang=python:)
Assignment 3
Submitted by Your_Name
A short description of your program
(:sourceend:)
Due date: 2/15/10
Due date: 3/1/13
Due date: 2/15/10
Call your program nucleotide_composition.py, and have a function in it that receives the sequence the user has provided as a parameter.
Write a method called check_prime that receives an integer as input and returns True if it's prime, and False otherwise.
Write a method called check_prime that receives an integer as input and returns True if it's prime, and False otherwise. The name of the python module should be prime.py.
Submit the programs by email to your instructor. At the top of each file put a comment that identifies you and the program (use a multi-line comment using triple quotes):
Assignment 2
Submitted by Your_Name
A short description of your program
Prime number detection
A prime number is an integer greater than 1 that is divisible only by 1 and itself.
Write a method called check_prime that receives an integer as input and returns True if it's prime, and False otherwise.
Position i in a string can be accessed as a[i], so you can use a
while or for loop to iterate through the letters of a string.
Note that non-nucleotide symbols are not counted (the @N@ usually denotes a position where the sequence is unknown).
Note that non-nucleotide symbols are not counted (the N usually denotes a position where the sequence is unknown).
The nucleotide composition is:
A - 0.2
C - 0.1
G - 0.3
T - 0.4
The nucleotide composition is:
A - 0.2
C - 0.1
G - 0.3
T - 0.4
@@The nucleotide composition is: A - 0.2 C - 0.1 G - 0.3 T - 0.4@@
The nucleotide composition is:
A - 0.2
C - 0.1
G - 0.3
T - 0.4
Assignment 3
Nucleotide composition
Write a program that asks the user for a nucleotide sequence and then prints out the fraction of each nucleotide out of the total number of nucleotides in the sequence. Assume the user provides the input in capital letters.
Suppose the input sequence provided by the user is
TTACTNGGAGNT
Then the output should look like:
@@The nucleotide composition is: A - 0.2 C - 0.1 G - 0.3 T - 0.4@@
Note that non-nucleotide symbols are not counted (the @N@ usually denotes a position where the sequence is unknown).
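The computation for the example sequence TTACTNGGAGNT can be sketched as follows. The function names are hypothetical, Python 3.7 or later is assumed (for f-strings and ordered dictionaries), and the sequence is hard-coded here where the actual program would read it from the user.

```python
def nucleotide_composition(sequence):
    """Return a dict mapping A, C, G, and T to their fraction of the
    A/C/G/T letters in sequence. Other symbols, such as N, are skipped."""
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    for letter in sequence:
        if letter in counts:
            counts[letter] += 1
    total = sum(counts.values())
    if total == 0:
        # assumption: a sequence with no nucleotides gives all-zero fractions
        return {base: 0.0 for base in counts}
    return {base: counts[base] / total for base in counts}


def print_composition(sequence):
    """Print the composition, one nucleotide per line."""
    print("The nucleotide composition is:")
    for base, fraction in nucleotide_composition(sequence).items():
        print(f"{base} - {fraction}")


print_composition("TTACTNGGAGNT")
```

The two N symbols are excluded, so the fractions are taken over the 10 remaining letters: 2 A, 1 C, 3 G, and 4 T.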
