from IPython.display import YouTubeVideo, HTML, Image
(int, float, bool)
(+, -, *, ..., ==, <, ..., and, or, ...)
(if, elif, else)
(for)
YouTubeVideo('kQC82okzTXI')
Characters are textual symbols, like letters (ABCDE...), numerals (12345), punctuation marks (,.?:&), and even things like newline (\n) and whitespace ( ).
x = "Py4Life"
y = 'I love python'
print(x)
print(y)
Strings are objects of type str:
type(x)
We can concatenate (join end to end) strings:
print(x + "2015")
We can convert strings to numbers and vice versa (as long as the string actually represents a number):
x = "4"
y = int(x)
print("y+1 =", y + 1)
Without the conversion, adding a number to a string raises an error (a TypeError):
print("x+1 =", x + 1)
x = str(y)
print("x =", x)
x = "3.14"
y = float(x)
print("y*2 =", y * 2)
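The conversions above succeed only when the text really looks like a number of the requested kind: int rejects "3.14", while float accepts it. The sketch below illustrates this with a hypothetical helper, to_number, which is not part of the lecture. It uses def and try/except, which have not been covered yet, so treat it as a preview rather than the recommended way to convert.

```python
def to_number(text):
    """Convert text to an int if possible, otherwise to a float.

    Hypothetical helper for illustration only. A ValueError is raised
    if text does not represent a number at all.
    """
    try:
        return int(text)
    except ValueError:
        # int() rejects strings such as "3.14"; float() accepts them
        return float(text)

whole = to_number("4")
decimal = to_number("3.14")
print(whole + 1)
print(decimal * 2)
```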
Because we are biologists, strings are not just text, they are also sequences!
dna = "ATGCGTA"
print(dna)
Again, we can concatenate strings:
upstream = "AAA"
downstream = "GGG"
dna = upstream + "ATG" + downstream
print(dna)
We can find the length of a string using the built-in function len:
n = len(dna)
print("The length of the DNA variable is", n)
dna = dna + "AGCTGA"
print("Now it is", len(dna))
Just a moment, what was that...?
print(dna)
dna = dna + "AGCTGA"
print(dna)
Reassigning a variable using its own current value also works with numbers:
x = 10
x = x + 7
print(x)
We can extract parts of a string by slicing it with the corresponding indexes.
Remember: string indexes start from 0!
We can access a single character of the string by its index (starting from 0):
bacteria = 'Escherichia coli'
# get the 1st and 6th letters
print(bacteria[0])
print(bacteria[5])
Indexes work from the tail as well, using negative numbers:
# get the last letter
print(bacteria[-1])
# get 5th letter from the end
print(bacteria[-5])
We can get a range of indexes using [start:end]
# get the 3rd to 8th letters
print(bacteria[2:8])
Notice that the start position is included, but the end position is not. We actually take the characters with indexes 2,3,4,5,6,7. And what do we get?
type(bacteria[2:8])
There are shortcuts for taking characters from the start or from the end of a string:
# get the first 5 letters
print(bacteria[0:5])
# or simply:
print(bacteria[:5])
# get everything from the 4th letter to the end:
print(bacteria[3:])
# get the last 3 letters
print(bacteria[-3:])
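The indexing and slicing rules above can be checked against each other. The following is a small sketch, with variable names chosen only for this example, showing that a slice [start:end] has end - start characters, that a negative index counts back from the length, and that cutting a string at any position loses nothing.

```python
bacteria = 'Escherichia coli'
start, end = 2, 8

piece = bacteria[start:end]
# the slice has end - start characters, because the end index is excluded
print(piece, len(piece) == end - start)

# a negative index -k refers to the same character as len(...) - k
last = bacteria[-1]
print(last == bacteria[len(bacteria) - 1])

# cutting at any position k and gluing the parts back gives the original
k = 5
rebuilt = bacteria[:k] + bacteria[k:]
print(rebuilt == bacteria)
```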
The sequence below (named seq) consists of 20 nucleotides.
seq = "CAAGTAATGGCAGCCATTAA"
# print 2nd nucleotide
print(seq[1])
# print 7th nucleotide
print(seq[6])
# print 2nd nucleotide from the tail
print(seq[-2])
first_half = seq[:10]
print(first_half)
second_half = seq[10:]
print(second_half)
middle = seq[5:15]
print(middle)
Strings have methods (actions, commands) that we can apply to them. Methods are invoked using the '.' character.
We can change a string to lowercase:
dna = dna.lower()
print(dna)
And back to uppercase:
dna = dna.upper()
print(dna)
We can replace characters:
rna = dna.replace("T", "U")
print(rna)
We can count characters.
For example, let's count the number of histidine (H) and proline (P) residues in the AA (amino-acid) sequence of Human Insulin:
insulin = 'MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN'
print("# of histidine:", insulin.count('H'))
print("# of proline:", insulin.count('P'))
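Counting works for nucleotides as well. As a rough sketch (the variable names are made up for this example), the GC content of a DNA string can be computed from two calls to count. The last lines show that count only counts non-overlapping occurrences of a longer substring.

```python
gene = "ATGCGTA"   # the short example sequence used earlier

g_count = gene.count('G')
c_count = gene.count('C')
# fraction of the sequence made of G or C
gc_content = (g_count + c_count) / len(gene)
print("GC content:", gc_content)

# count works on longer substrings too, but occurrences may not overlap:
# "ATATAT" contains "ATAT" at positions 0 and 2, yet only one is counted
repeats = "ATATAT".count("ATAT")
print(repeats)
```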
We can find a substring within a string.
For example, we can look for the character D in the insulin sequence.
pos = insulin.index('D')
print(pos)
type(pos)
print(insulin[pos])
The result is the index (position) of the first D found in the sequence.
We can also look for longer substrings, representing motifs. For example, let's find the position of the Insulin B-chain in the entire peptide:
b_chain = "FVNQHLCGSHLVEALYLVCGERGFFYTPKT"
position = insulin.index(b_chain)
print("Position:", position)
print(len(b_chain))
found = insulin[position:position + len(b_chain)] # slicing (notice the ':')
print(b_chain == found)
print("Original:", b_chain)
print("Found: ", found)
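One thing worth knowing about index is that it raises a ValueError when the substring is absent. Strings also have a find method, which returns -1 instead. The sketch below uses a shortened peptide (an assumption made to keep the example small) to compare the two, and shows how the in operator can guard a call to index.

```python
peptide = "MALWMRLLPLL"   # first residues of the insulin sequence

w_index = peptide.index("W")   # position of the first W
w_find = peptide.find("W")     # same result when the substring exists
print(w_index, w_find)

# find returns -1 for a missing substring, while index would raise ValueError
d_find = peptide.find("D")
print(d_find)

if "D" in peptide:
    print(peptide.index("D"))
else:
    print("D not found")
```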
We can split a string on every occurrence of a separator character:
names = "melanogaster,simulans,yakuba,ananassae"
species = names.split(",")
print(species)
What do we get?
type(species)
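The opposite of split is the string method join, which glues the items of a list back together with a separator. This is a minimal sketch. The variable names rejoined and pretty are introduced only for this example.

```python
names = "melanogaster,simulans,yakuba,ananassae"
species = names.split(",")

# join is called on the separator and receives the list of strings
rejoined = ",".join(species)
print(rejoined == names)

# any separator can be used when joining
pretty = " / ".join(species)
print(pretty)
```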
Lists are similar to strings in being sequential, only they can contain any type of data, not just characters.
This includes int, float, bool, str, and even list.
Lists can even include mixed variable types.
We define a list just like any other variable, but enclose the elements in '[ ]' and separate them with ','.
# a list of strings
apes = ["Homo sapiens", "Pan troglodytes", "Pongo pygmaeus"]
print(apes)
# a list of numbers
nums = [7,13,2,400]
print(nums)
# a mixed list
mixed = [12,'Mus musculus',True]
print(mixed)
You can access list elements just like strings, using indexes (starting from 0):
print("Human:", apes[0])
print("Orangutan:", apes[-1])
Lists are dynamic: you can change, append, remove and insert elements. Most of this is done using list methods, again using the '.'.
First, we can access and change list elements by index:
new_apes = apes[:] # make a copy of the apes list
new_apes[2] = 'Hylobates lar'
print(new_apes)
This does NOT work with strings though, because strings are immutable. The assignment below raises a TypeError:
print(dna)
dna[5] = 'G'
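Since a character cannot be assigned in place, the usual workaround is to build a new string from slices. The sketch below uses a short example sequence and the names template and mutated, all of which are assumptions made for illustration.

```python
template = "AAAATGGGG"   # example sequence for this sketch

# everything before index 5, then the new character, then everything after it
mutated = template[:5] + 'C' + template[6:]

print(template)   # the original string is unchanged
print(mutated)
```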
# add element to the end of the list
apes.append("Gorilla gorilla")
print(apes)
# insert element at a given index
apes.insert(2, "Pan paniscus")
print(apes)
# remove element from list
apes.remove("Pongo pygmaeus")
print(apes)
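To remove a list item by index, use one of the two options below. They are alternatives: running both would remove two items. They are shown for illustration only, and the rest of the notebook assumes apes was left unchanged.
# option 1: look up the value at the index, then remove its first occurrence
apes.remove(apes[1])
# option 2: delete the item at the index directly
del apes[1]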
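Two details of removal are easy to miss. Lists also have a pop method, which removes the item at an index and hands it back. And remove works by value, so on a list with duplicates it deletes the first match, which is not necessarily the index you had in mind. The lists in this sketch are made up for the example so that apes is not modified.

```python
primates = ["Homo sapiens", "Pan troglodytes", "Pan paniscus", "Gorilla gorilla"]

# pop removes the item at the given index and returns it
removed = primates.pop(1)
print(removed)
print(primates)

# remove deletes only the first matching value
codons = ["ATG", "AAA", "ATG"]
codons.remove(codons[2])   # removes the "ATG" at index 0, not at index 2
print(codons)
```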
We can concat lists, just like strings:
print(apes + ["Pongo pygmaeus", "Pongo abelii"])
Searching in lists is done using index (not find):
i = apes.index('Pan troglodytes')
print(i)
print(apes[i])
You can also check if something is in a list (works as well for strings):
if 'Saguinus nigricollis' in apes:
    print('Saguinus nigricollis is an ape')
else:
    print('Saguinus nigricollis is not an ape')
Suppose we have a list of experimental measurements and we want to do basic statistics: count the number of results, calculate the average, and find the maximum and minimum.
measurements = [33, 55,45,87,88,95,34,76,87,56,45,98,87,89,45,67,45,67,76,73,33,87,12,100,77,89,92]
count = len(measurements)
avg = sum(measurements) / len(measurements)
maximum = max(measurements)
minimum = min(measurements)
print(count, "measurements with average", avg, "maximum", maximum, "minimum", minimum)
We can sort lists using the built-in sorted function, which returns a new sorted list and leaves the original unchanged.
If the list is made entirely of numbers, then sorting is straightforward:
sorted_measurements = sorted(measurements)
print(sorted_measurements)
A list of strings will be sorted lexicographically (think about the way '<' and '>' work on strings):
sorted_apes = sorted(apes)
print(sorted_apes)
But beware of mixed lists! In Python 3, strings and numbers cannot be compared, so sorting the list below raises a TypeError:
mixed = apes + measurements
print(mixed)
print(sorted(mixed))
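If a mixed list really has to be sorted, one possible workaround is to tell sorted how to compare the items through its key argument. The sketch below compares every item by its string form. This is just one choice of ordering, and the list values is made up for the example. It also contrasts sorted with the list method sort, which reorders the list in place.

```python
values = ["Homo sapiens", 33, "Pan paniscus", 12]

# key=str compares the items as text, so no TypeError is raised;
# the items themselves keep their original types
as_text = sorted(values, key=str)
print(as_text)
print(values)   # sorted() did not change the original list

# the sort method, in contrast, changes the list itself and returns None
nums = [7, 13, 2, 400]
nums.sort()
print(nums)
```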
List elements can be of any type, including lists!
For example:
birds = ['Gallus gallus', 'Corvus corone', 'Passer domesticus']
snakes = ['Ophiophagus hannah', 'Vipera palaestinae', 'Python bivittatus']
animals = [apes,birds,snakes]
print(animals)
We access lists of lists using double-indexes. For example, to get the 3rd snake:
print(animals[2][2])
Note that the elements of the outer list are lists themselves, not strings. For example:
type(animals[1])
We can slice lists just like we did with strings, to get partial lists.
For example:
# get the first 10 measurements
print(measurements[:10])
# get the last 3 measurements
print(measurements[-3:])
Use the lists birds and snakes defined above to create a single list of strings with the animal names. Then add the string Mus musculus to the list. Finally, remove the Corvus corone from the list. Print the 2nd to 5th elements of the resulting list, sorted alphabetically.
# create list
animals = birds + snakes
# add Mus musculus
animals.append('Mus musculus')
# remove Corvus corone element
animals.remove('Corvus corone')
# print
print(sorted(animals[1:5]))
Say we want to print each element of our list:
print(apes[0], "is an ape")
print(apes[1], "is an ape")
print(apes[2], "is an ape")
print(apes[3], "is an ape")
But this is very repetitive and relies on us knowing the number of elements in the list. What we need is a way to say something along the lines of “for each element in the list of apes, print out the element, followed by the words ‘ is an ape’”. Python’s loop syntax allows us to express those instructions like this:
for ape in apes:
    print(ape, "is an ape")
A more complex loop will go over each ape name and print some stats:
for ape in apes:
    name_length = len(ape)
    first_letter = ape[0]
    print(ape, "is an ape. Its name starts with", first_letter)
    print("Its name has", name_length, "letters")
We can also loop over a string.
Let's go over the Insulin AA sequence and count the number of prolines manually:
count = 0
for aa in insulin:
    if aa == "P":
        count = count + 1
print("# of prolines:", count)
Can you remember another way of doing this?
Let's count how many measurements (see above) are above the average:
print(measurements)
print(avg)
over = 0
for x in measurements:
    if x > avg:
        over = over + 1
print(over, "measurements are over the average.")
charged = ['R','H','K','D','E']
charged_count = 0
for aa in insulin:
    if aa in charged:
        charged_count += 1
insulin_length = len(insulin)
charged_ratio = charged_count / insulin_length
print("Ratio of charged amino acids is:", charged_ratio)
range
Sometimes we want to loop over consecutive numbers. This is accomplished using the range command.
range accepts one, two, or three arguments: the bottom and upper limits and the step size.
The bottom limit can be omitted (the default is zero), and so can the step (the default is 1).
The upper limit is not included.
for i in range(10): # aka range(0,10,1)
    print(i)
for i in range(10,20):
    print(i, end=' ') # prints ending with space instead of newline
for i in range(100,1000,10):
    print(i, end=' ')
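To see at a glance which numbers a given range produces, it can be converted to a list. The short sketch below covers the one, two and three argument forms, plus a negative step, which counts down. The variable names are only for this example.

```python
one_arg = list(range(5))             # upper limit only
two_args = list(range(2, 6))         # bottom and upper limits
with_step = list(range(0, 20, 5))    # the step is 5
countdown = list(range(10, 0, -3))   # a negative step counts down

print(one_arg)
print(two_args)
print(with_step)
print(countdown)
```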
Let's check if the number n is a prime number, that is, whether it is divisible only by 1 and itself:
n = 97 # try other numbers
divider = 1
for k in range(2,n): # why start at 2? can we choose a different limit to range? a different step perhaps?
    if n % k == 0:
        divider = k
if divider != 1:
    print(n, "is divided by", divider)
else:
    print(n, "is a prime number")
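One possible answer to the questions in the comment above: any divisor larger than the square root of n must be paired with a smaller one, so the loop can stop at the square root, and it can stop as soon as a divisor turns up. The sketch below wraps this in a hypothetical function, is_prime, which is not part of the lecture. Functions (def, return) have not been introduced yet, so read it as a preview. It also handles numbers below 2, which the loop above would report as prime.

```python
def is_prime(n):
    """Return True if n is a prime number (illustrative sketch)."""
    if n < 2:
        return False                 # 0, 1 and negative numbers are not prime
    # divisors above the square root of n need not be checked
    for k in range(2, int(n ** 0.5) + 1):
        if n % k == 0:
            return False             # found a divisor, stop immediately
    return True

for candidate in [1, 2, 91, 97]:
    print(candidate, is_prime(candidate))
```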
We can also use range() to loop on a list. This is useful in some cases.
for i in range(len(apes)):
    print(apes[i])
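When both the position and the element are needed, Python's built-in enumerate offers an alternative to range(len(...)). A minimal sketch follows, using a made-up list so that apes is left alone.

```python
primates = ["Homo sapiens", "Pan troglodytes", "Pan paniscus"]

# enumerate yields (index, element) pairs, one per list item
for i, name in enumerate(primates):
    print(i, name)

pairs = list(enumerate(primates))
```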
Here’s a short DNA sequence:
ACTGATCGATTACGTATAGTAGAATTCTATCATACATATATATCGATGCGTTCAT
The sequence contains a recognition site for the EcoRI restriction enzyme, which cuts at the motif G*AATTC (the position of the cut is indicated by an asterisk). Write a program which will calculate the size of the two fragments that will be produced when the DNA sequence is digested with EcoRI.
(from Python for Biologists)
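# the sequence from the exercise text (seq previously held a different sequence)
seq = "ACTGATCGATTACGTATAGTAGAATTCTATCATACATATATATCGATGCGTTCAT"
fragments = seq.split('GAATTC')
f1_length = len(fragments[0]) + 1 # add 1 for the 'G'
f2_length = len(fragments[1]) + 5 # add 5 for the 'AATTC'
print('Fragment lengths of', f1_length, 'and', f2_length, 'will be produced.')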
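The same exercise can also be approached with index and slicing, which yields the fragments themselves rather than just their lengths. This is a sketch of one alternative, not the book's solution. The names site, cut, fragment1 and fragment2 are chosen for this example, and it assumes the site occurs exactly once.

```python
seq = "ACTGATCGATTACGTATAGTAGAATTCTATCATACATATATATCGATGCGTTCAT"
site = "GAATTC"

# EcoRI cuts between the G and the first A, one base into the site
cut = seq.index(site) + 1
fragment1 = seq[:cut]
fragment2 = seq[cut:]

print(len(fragment1), len(fragment2))
```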
Write a program that will print the complement of the sequence above.
(from Python for Biologists)
complement = ''
for base in seq:
    if base == 'A':
        complement = complement + 'T'
    elif base == 'T':
        complement = complement + 'A'
    elif base == 'G':
        complement = complement + 'C'
    elif base == 'C':
        complement = complement + 'G'
    else:
        print("Bad base:", base)
print("Complement:", complement)
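The chain of if/elif tests can also be replaced by a lookup table. The sketch below uses a dictionary, a data type that has not been covered in this lecture, so take it as a preview. The names pairs, short_seq and short_complement are made up for the example. Unlike the loop above, a base that is missing from the table would raise a KeyError here instead of printing a message.

```python
# each base maps to its complementary base
pairs = {'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G'}

short_seq = "ACTGATCG"   # short example sequence
short_complement = ''
for base in short_seq:
    short_complement = short_complement + pairs[base]

print("Complement:", short_complement)
```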
Go over the ape_pics list and display the pics using the command display(Image(url=<url string>)).
Before each pic print the name of that ape from the apes list.
ape_pics = ['http://upload.wikimedia.org/wikipedia/commons/thumb/6/68/Akha_cropped_hires.JPG/330px-Akha_cropped_hires.JPG', 'http://upload.wikimedia.org/wikipedia/commons/thumb/6/62/Schimpanse_Zoo_Leipzig.jpg/330px-Schimpanse_Zoo_Leipzig.jpg', 'http://upload.wikimedia.org/wikipedia/commons/thumb/6/6e/Bonobo_0155.jpg/330px-Bonobo_0155.jpg', 'http://upload.wikimedia.org/wikipedia/commons/thumb/c/c0/Western_Lowland_Gorilla_at_Bronx_Zoo_2_cropped.jpg/338px-Western_Lowland_Gorilla_at_Bronx_Zoo_2_cropped.jpg']
from IPython.display import YouTubeVideo, HTML, Image, display
for i in range(len(apes)):
    print(apes[i])
    display(Image(url=ape_pics[i]))
This notebook is part of the Python Programming for Life Sciences Graduate Students course given at Tel-Aviv University, Spring 2015.
Part of this notebook was adapted from the Lists and Loops chapter in Martin Jones's Python for Biologists book.
The notebook was written using Python 3.4.1 and IPython 2.1.0 (download from PyZo).
The code is available at https://github.com/Py4Life/TAU2015/blob/master/lecture2.ipynb.
The notebook can be viewed online at http://nbviewer.ipython.org/github/Py4Life/TAU2015/blob/master/lecture2.ipynb.
The notebook is also available as a PDF at https://github.com/Py4Life/TAU2015/blob/master/lecture2.pdf?raw=true.
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.