In the code below there is a dictionary (named codon_table) in which the keys represent codons and the values represent the corresponding amino acids.
Write a program that translates a DNA sequence into an amino acid sequence using the codons dictionary. Print out the result. Note that '*' marks a stop codon.
If you want to know more about how the codons dictionary was created, read the documentation for list comprehension and the built-in zip-function.
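As a small illustration of the two tools mentioned above, the sketch below builds a miniature table from a two-letter alphabet. The names toy_bases, toy_pairs, toy_labels and toy_table are made up for this example and are not part of the exercise.

```python
# Toy example (hypothetical names): pair every two-letter combination with a label.
toy_bases = ['t', 'c']

# Nested list comprehension: the leftmost "for" is the outer loop,
# so the first letter changes slowest.
toy_pairs = [a + b for a in toy_bases for b in toy_bases]
print(toy_pairs)   # ['tt', 'tc', 'ct', 'cc']

# zip pairs the i-th combination with the i-th character of the label string.
toy_labels = 'WXYZ'
toy_table = dict(zip(toy_pairs, toy_labels))
print(toy_table)   # {'tt': 'W', 'tc': 'X', 'ct': 'Y', 'cc': 'Z'}
```

The codons dictionary is built the same way, only with four bases, three nested loops and a 64-character string of amino acids.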
# Create codons dictionary
bases = ['t', 'c', 'a', 'g']
codons = [a+b+c for a in bases for b in bases for c in bases]
amino_acids = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
codon_table = dict(zip(codons, amino_acids))
print(codon_table)
DNA = 'atgattccaacgcgaaggtcaagtacgtacagctctcagtgtgtgctactcaccgactccgtcatagcaaccggcgtcgtggtcgttaccattgcataa'
# translate the sequence
print(_________________)
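One possible way to approach the translation is to walk through the sequence three bases at a time and look each codon up in the table. The helper name translate is hypothetical, and the sketch assumes lowercase input and keeps translating past stop codons (they simply appear as '*' in the output); it is an illustration on a short toy sequence, not the required solution.

```python
# Rebuild the codon table exactly as in the exercise.
bases = ['t', 'c', 'a', 'g']
codons = [a + b + c for a in bases for b in bases for c in bases]
amino_acids = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
codon_table = dict(zip(codons, amino_acids))

def translate(dna, table):
    """Hypothetical helper: translate lowercase DNA codon by codon."""
    protein = ''
    # Step through the sequence three bases at a time; an incomplete
    # trailing codon (fewer than 3 bases) is ignored.
    for i in range(0, len(dna) - len(dna) % 3, 3):
        protein += table[dna[i:i + 3]]
    return protein

print(translate('atgattccaacgtaa', codon_table))   # MIPT*
```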
a) Write a function that receives an amino acid sequence as a string and returns a dictionary where the keys are the amino acid residues and the values are the number of times each residue appears in the protein. For example, the expected result for the peptide LLTDSGT is: {'L': 2, 'T': 2, 'D': 1, 'S': 1, 'G': 1}.
Test your function on the provided sequences, and print the results in the following format:
L - 2
T - 2
D - 1
S - 1
G - 1
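Remember: a dict is not sorted. Since Python 3.7 it preserves insertion order, so the order of the printed lines depends on the order in which the keys were added.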
def count_residues(protein_seq):
    # your code goes here. remove the pass statement.
    pass
protein_sequence = 'DQHTWMYAEGYLNHVYRCDKQRAEDKECNGLYAWALALESHGKGSYYCQGFKTFPNPWPMHMMTFVMADLYQYMEI'
aa_counts_dict = count_residues(protein_sequence)
# print results
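A minimal sketch of the counting pattern, shown on the short peptide from the example. The function name count_residues_sketch is hypothetical, and a plain dictionary with dict.get is only one of several ways to do this.

```python
def count_residues_sketch(protein_seq):
    """Hypothetical helper: count how often each residue occurs."""
    counts = {}
    for residue in protein_seq:
        # dict.get returns 0 the first time a residue is seen.
        counts[residue] = counts.get(residue, 0) + 1
    return counts

example_counts = count_residues_sketch('LLTDSGT')

# Print one "residue - count" line per key.
for residue, count in example_counts.items():
    print(residue, '-', count)
```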
b) Write a function that receives an amino acid sequence as a string and returns a dictionary with the frequencies of hydrophobic, positively charged, negatively charged, polar, and other amino acids. Use the strings provided in the code below.
For example, residues_type_frequencies('LLTDSGT') returns:
{'hydrophobic': 0.286, 'positive': 0, 'negative': 0.143, 'polar': 0.429, 'other': 0.143}
Test your function on the provided amino acid sequence, and print the results in the following format:
hydrophobic - 0.286
positive - 0
negative - 0.143
polar - 0.429
other - 0.143
hydrophobic = 'AVILMFYW'
pos_charged = 'RHK'
neg_charged = 'DE'
polar = 'STNQ'
other = 'CUGP'
def residues_type_frequencies(protein_seq):
    # your code goes here. remove the pass statement.
    pass
aa_sequence = 'DQHTWMYAEGYLNHVYRCDKQRAEDKECNGLYAWALALESHGKGSYYCQGFKTFPNPWPMHMMTFVMADLYQYMEI'
aa_types_freq_dict = residues_type_frequencies(aa_sequence)
# print results
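The frequency calculation could be sketched as follows, using the short peptide from the example. The function name residues_type_frequencies_sketch and the groups dictionary are hypothetical; the sketch assumes a non-empty sequence and rounds each frequency to three decimal places with round(), so 3/7 comes out as 0.429.

```python
hydrophobic = 'AVILMFYW'
pos_charged = 'RHK'
neg_charged = 'DE'
polar = 'STNQ'
other = 'CUGP'

def residues_type_frequencies_sketch(protein_seq):
    """Hypothetical helper: fraction of residues in each group (assumes non-empty input)."""
    groups = {'hydrophobic': hydrophobic, 'positive': pos_charged,
              'negative': neg_charged, 'polar': polar, 'other': other}
    freqs = {}
    for name, members in groups.items():
        # Count the residues that belong to this group.
        n_in_group = sum(1 for residue in protein_seq if residue in members)
        # Rounding to three decimals is an assumption based on the example.
        freqs[name] = round(n_in_group / len(protein_seq), 3)
    return freqs

example_freqs = residues_type_frequencies_sketch('LLTDSGT')
for name, freq in example_freqs.items():
    print(name, '-', freq)
```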
A palindromic sequence is a DNA sequence that reads the same 5' to 3' on one strand and 5' to 3' on the complementary strand. For example, the sequence 5' GAATTC 3' is palindromic, since its complementary strand is 3' CTTAAG 5', which read 5' to 3' is again 5' GAATTC 3'.
Palindromic sequences are biologically interesting because they can form special structural motifs, such as hairpins, and are often cutting sites for restriction enzymes.
a) Write a function is_palindrome that receives a DNA sequence as a string and returns True (boolean) if it is palindromic and False (boolean) otherwise. You may use the function defined in the lecture to find the complement strand.
The assertions test your function on the provided sequences. If you don't get any error messages, that means your function works fine.
def is_palindrome(seq):
    # your code goes here. remove the pass statement.
    pass
assert(is_palindrome('GAATTC'))
assert(is_palindrome('GATATC'))
assert(is_palindrome('AGCTTCTAGTCGACTAGAAGCT'))
assert(not is_palindrome('GAACTC'))
assert(not is_palindrome('GATATG'))
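The lecture's complement function is not shown here, so the sketch below uses a hypothetical stand-in, reverse_complement, built on str.maketrans and str.translate. It assumes uppercase sequences containing only A, C, G and T; is_palindrome_sketch is likewise an illustrative name, not the required solution.

```python
def reverse_complement(seq):
    """Hypothetical stand-in for the lecture's function (uppercase ACGT only)."""
    complement = str.maketrans('ACGT', 'TGCA')
    # Complement every base, then reverse so the result reads 5' to 3'.
    return seq.translate(complement)[::-1]

def is_palindrome_sketch(seq):
    """A sequence is palindromic if it equals its own reverse complement."""
    return seq == reverse_complement(seq)

print(reverse_complement('GAATTC'))     # GAATTC
print(is_palindrome_sketch('GAACTC'))   # False
```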
b) Now use the function is_palindrome to look for palindromic subsequences within a given DNA sequence.
Write a function find_palindromes that receives two parameters: a sequence seq (string) and an integer n. The function searches seq for palindromic subsequences that are n bases long. It returns a list of all the palindromic subsequences found. If none are found, it returns an empty list. Implement the function using a for loop.
def find_palindromes(seq, n):
    palindromes = []
    for _________________:
        if _________________:
            palindromes.append(___________)
    return palindromes
DNA_seq = 'GGAGCTCCCAAAGCCATCAATATTCATCAAAACGAATTCAACGGAGCTCGATATCGCATCGCAAAAGACACC'
palindromic_sequences = find_palindromes(DNA_seq,6)
assert palindromic_sequences == ['GAGCTC', 'AATATT', 'GAATTC', 'GAGCTC', 'GATATC']
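The search described above relies on visiting every window of length n in the sequence. The sketch below shows only that sliding-window step, with a hypothetical helper named windows; testing each window for palindromicity is left to the exercise.

```python
def windows(seq, n):
    """Hypothetical helper: all substrings of length n, from left to right."""
    result = []
    # The last valid start index is len(seq) - n, hence the + 1 in range.
    for start in range(len(seq) - n + 1):
        result.append(seq[start:start + n])
    return result

print(windows('GAATTCG', 6))   # ['GAATTC', 'AATTCG']
print(windows('GAT', 6))       # []
```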
c) Implement the same function using a while loop.
def find_palindromes(seq, n):
    palindromes = []
    while _________________:
        if _________________:
            palindromes.append(___________)
    return palindromes
DNA_seq = 'GGAGCTCCCAAAGCCATCAATATTCATCAAAACGAATTCAACGGAGCTCGATATCGCATCGCAAAAGACACC'
palindromic_sequences = find_palindromes(DNA_seq,6)
assert palindromic_sequences == ['GAGCTC', 'AATATT', 'GAATTC', 'GAGCTC', 'GATATC']
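When a for loop over indices is rewritten as a while loop, the index has to be initialised before the loop and advanced by hand inside it. A minimal sketch of that pattern, using a hypothetical helper windows_while that only collects the windows:

```python
def windows_while(seq, n):
    """Hypothetical helper: all substrings of length n, using a while loop."""
    result = []
    start = 0                        # initialise the index explicitly
    while start <= len(seq) - n:     # stop once a full window no longer fits
        result.append(seq[start:start + n])
        start += 1                   # advance by hand; forgetting this loops forever
    return result

print(windows_while('GAATTCG', 6))   # ['GAATTC', 'AATTCG']
```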
A major caveat of the functions created so far is that they return all palindromic sequences, even overlapping ones, which makes no biological sense. For example, if we search the sequence GAATTCGAACAT for palindromes that are 6 bases long, we get both GAATTC and TTCGAA, although they overlap.
d) Choose one of the implementations from parts b and c, and change it so that no overlapping palindromes are found. Where palindromes overlap, the function should return only the upstream one.
def find_palindromes_no_overlap(S, n):
    palindromes = []
    # your code here
    return palindromes
# test
DNA_seq = 'GGAGCTCCCAAAGCCATCAATATTCATCAAAACGAATTCAACGGAGCTCGATATCGCATCGCAAAAGACACC'
palindromic_sequences = find_palindromes_no_overlap(DNA_seq,6)
assert palindromic_sequences == ['GAGCTC', 'AATATT', 'GAATTC', 'GAGCTC', 'GATATC']
DNA_seq = 'GGAGCTCCCAAAGCCATCAGAATTCGAACATATCGCAAAAGACACC'
palindromic_sequences = find_palindromes(DNA_seq,6)
assert palindromic_sequences == ['GAGCTC', 'GAATTC', 'TTCGAA']
DNA_seq = 'GGAGCTCCCAAAGCCATCAGAATTCGAACATATCGCAAAAGACACC'
palindromic_sequences = find_palindromes_no_overlap(DNA_seq,6)
assert palindromic_sequences == ['GAGCTC', 'GAATTC']
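One way to avoid overlaps is to jump past a palindrome as soon as it is found, so that no later window can start inside it. The sketch below takes that approach with a while loop. The names reverse_complement and find_palindromes_no_overlap_sketch are hypothetical, and the code assumes uppercase ACGT input and n >= 1.

```python
def reverse_complement(seq):
    """Hypothetical stand-in for the lecture's function (uppercase ACGT only)."""
    return seq.translate(str.maketrans('ACGT', 'TGCA'))[::-1]

def find_palindromes_no_overlap_sketch(seq, n):
    """Keep only the upstream palindrome when two would overlap (assumes n >= 1)."""
    palindromes = []
    start = 0
    while start <= len(seq) - n:
        candidate = seq[start:start + n]
        if candidate == reverse_complement(candidate):
            palindromes.append(candidate)
            start += n   # skip the whole palindrome, so nothing can overlap it
        else:
            start += 1   # otherwise slide the window by one base
    return palindromes

print(find_palindromes_no_overlap_sketch('GAATTCGAACAT', 6))   # ['GAATTC']
```

Because the index moves by n after a hit, TTCGAA (which starts inside GAATTC) is never examined, while two palindromes that sit directly next to each other are both still reported.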