Notebook

Python Programming for Biologists, Tel-Aviv University / 0411-3122 / Spring 2015 ¶

Homework 2¶

1) Bacteria growth¶

The bacteria P. pythonicus replicates every one hour, in a 100 ml tube. Being a very unfriendly bacteria, they reach stationary phase when there are 1,000,000 or more bacteria in the tube.

a) Write a program that will calculate the number of bacteria after one hour, two hours, etc, until stationarity is reached. The program will receive the starter size (number of bacteria to begin with), and start calculating from there. At each time point, the following message should be printed:
< time > hours: < no. of bacteria > bacteria

In [1]:

starter = 5 # Replace ??? with a value of your choice.
bacteria = starter
time = 0

while bacteria < 1000000:
    print(time,'hours:',bacteria,'bacteria')
    time = time + 1
    bacteria = bacteria * 2   

0 hours: 5 bacteria
1 hours: 10 bacteria
2 hours: 20 bacteria
3 hours: 40 bacteria
4 hours: 80 bacteria
5 hours: 160 bacteria
6 hours: 320 bacteria
7 hours: 640 bacteria
8 hours: 1280 bacteria
9 hours: 2560 bacteria
10 hours: 5120 bacteria
11 hours: 10240 bacteria
12 hours: 20480 bacteria
13 hours: 40960 bacteria
14 hours: 81920 bacteria
15 hours: 163840 bacteria
16 hours: 327680 bacteria
17 hours: 655360 bacteria

b) It turns out that the growth rate of P. pythonicus is affected by temperature. It's replication time r, is a function of the temperature T, so that:
$r = \frac{19 T (T - 70)}{2450} + 10$.
However, when the temperature is below 5, or over 50, the bacteria don't grow at all.
Write a program that will receive the starter size and the growth temperature, and will calculate the time to reach stationarity, printing the number of bacteria at each time point (like in part a). If bacteria can't grow, print an appropriate message (and don't do any calculation).

In [2]:

starter = 5 # Replace ??? with a value of your choice.
temp = 23 # Replace ??? with a value of your choice.

# check temperature and calculate replication time
if temp < 5 or temp > 50:
    print("Bacteria can't grow in this temperature")
else:
    r = (19 * temp * (temp - 70))/2450 + 10
# calculate and growth
    bacteria = starter
    time = 0
    while bacteria < 1000000:
        print(time,'hours:',bacteria,'bacteria')
        time = time + r
        bacteria = bacteria * 2           

0 hours: 5 bacteria
1.6167346938775502 hours: 10 bacteria
3.2334693877551004 hours: 20 bacteria
4.850204081632651 hours: 40 bacteria
6.466938775510201 hours: 80 bacteria
8.083673469387751 hours: 160 bacteria
9.700408163265301 hours: 320 bacteria
11.317142857142851 hours: 640 bacteria
12.933877551020402 hours: 1280 bacteria
14.550612244897952 hours: 2560 bacteria
16.167346938775502 hours: 5120 bacteria
17.784081632653052 hours: 10240 bacteria
19.400816326530602 hours: 20480 bacteria
21.017551020408153 hours: 40960 bacteria
22.634285714285703 hours: 81920 bacteria
24.251020408163253 hours: 163840 bacteria
25.867755102040803 hours: 327680 bacteria
27.484489795918353 hours: 655360 bacteria

2) Splicing out introns¶

a) Here’s a short section of genomic DNA:

ATCGATCGATCGATCGACTGACTAGTCATAGCTATGCATGTAGCTACTCGATCGATCGATCGATCGATCGATCGATCGATCGATCATGCTATCATCGATCGATATCGATGCATCGACTACTAT

It comprises two exons and an intron. The first exon runs from the start of the sequence to the sixty-third character, and the second exon runs from the ninety-first character to the end of the sequence.

Write a program that will print just the coding regions of the DNA sequence.

In [3]:

seq = 'ATCGATCGATCGATCGACTGACTAGTCATAGCTATGCATGTAGCTACTCGATCGATCGATCGATCGATCGATCGATCGATCGATCATGCTATCATCGATCGATATCGATGCATCGACTACTAT'

## your code goes here
exon1 = seq[:63]
exon2 = seq[90:]
coding = exon1 + exon2
print(coding)

ATCGATCGATCGATCGACTGACTAGTCATAGCTATGCATGTAGCTACTCGATCGATCGATCGAATCATCGATCGATATCGATGCATCGACTACTAT

b) Using the sequence from a, write a program that will calculate what percentage of the DNA sequence is coding.

In [5]:

## your code goes here
percent_coding = len(coding)/len(seq)*100
print(percent_coding,"%","of the sequence is coding")

77.23577235772358 % of the sequence is coding

c) Using the data from a, write a program that will print out the original genomic DNA sequence with coding bases in uppercase and non-coding bases in lowercase.

In [7]:

## your code goes here
intron = seq[63:90]
print(exon1 + intron.lower() + exon2)

ATCGATCGATCGATCGACTGACTAGTCATAGCTATGCATGTAGCTACTCGATCGATCGATCGAtcgatcgatcgatcgatcgatcatgctATCATCGATCGATATCGATGCATCGACTACTAT

3) Processing DNA in a file¶

The list sequences contains a number of DNA sequences as strings. Each sequence starts with the same 14 base pair fragment – a sequencing adapter that should have been removed.

Write a program that will trim this adapter and print the cleaned sequences to the screen. The program will then print the length of each cleaned sequence to the screen.

In [22]:

sequences = ['ATTCGATTATAAGCTCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATC', \
'ATTCGATTATAAGCACTGATCGATCGATCGATCGATCGATGCTATCGTCGT', \
'ATTCGATTATAAGCATCGATCACGATCTATCGTACGTATGCATATCGATATCGATCGTAGTC', \
'ATTCGATTATAAGCACTATCGATGATCTAGCTACGATCGTAGCTGTA', \
'ATTCGATTATAAGCACTAGCTAGTCTCGATGCATGATCAGCTTAGCTGATGATGCTATGCA']

## your code goes here
for seq in sequences:
    cleaned_seq = seq[14:]
    print(cleaned_seq)
    print(len(cleaned_seq))

TCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATC
42
ACTGATCGATCGATCGATCGATCGATGCTATCGTCGT
37
ATCGATCACGATCTATCGTACGTATGCATATCGATATCGATCGTAGTC
48
ACTATCGATGATCTAGCTACGATCGTAGCTGTA
33
ACTAGCTAGTCTCGATGCATGATCAGCTTAGCTGATGATGCTATGCA
47

4) Multiple exons from genomic DNA¶

The string genomic_dna contains a section of genomic DNA.

The list exons contains start/stop positions of exons. Each exon is a separate list (within the list of exons) with two elements: the start and stop positions.

Write a program that will extract the exon segments from genomic_dna using the positions in exons, concatenate them, and print them to the screen.

In [9]:

genomic_dna = 'TCGATCGTACCGTCGACGATGCTACGATCGTCGATCGTAGTCGATCATCGATCGATCGACTGATCGATCGATCGATCGATCGATATCGATCGATATCATCGATGCATCGATCATCGATCGATCGATCGATCGATCGATCATATGTCAGTCGATGCATCGTAGCATCGTATAGTAGCTACGTAGCTACGATCGATCGATCGATCGTAGCTAGCTAGCTAGATCGATCATCATCGTAGCTAGCTCGACTAGCTACGTACGATCGATGCATCGATCGTAGCTAGTACGATCGCGTAGCTAGCATGCTACGTAGATCGATCGATGCATGCTAGCTAGCTAGCTACGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGTAGCTAGCTACGATCGATGCTACGTAGATCGATCGCTAGTAGATCGATCGCTAGCTAGCTGACTAGTACGCTGCTAGTAGTCAGCTAGATCGATGCTAGTCA'
exons = [[5, 58], [72, 133], [190, 276], [340, 398]] # [[start, stop], [start, stop], ...]

## your code goes here
coding = ""
for exon in exons:
    start = exon[0]
    stop = exon[1]
    exon_seq = genomic_dna[start:stop]
    coding = coding + exon_seq
print(coding)

CGTACCGTCGACGATGCTACGATCGTCGATCGTAGTCGATCATCGATCGATCGCGATCGATCGATATCGATCGATATCATCGATGCATCGATCATCGATCGATCGATCGATCGACGATCGATCGATCGTAGCTAGCTAGCTAGATCGATCATCATCGTAGCTAGCTCGACTAGCTACGTACGATCGATGCATCGATCGTACGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGTAGCTAGCTACGATCG
258

References¶

Questions modified from Python for Biologists, a great book by Martin Jones.

Python Programming for Biologists, Tel-Aviv University / 0411-3122 / Spring 2015¶