This is an extremely concise introduction to Python for programmers already proficient in at least one other programming language.
A slower and more thorough tutorial is the Python Tutorial by Guido van Rossum. Read it at least upto Chapter 10.
These code fragments are for Python version 3.x
Python code, especially for file system interaction, can be written so that it runs on many different operating systems. This makes your code more portable and easier to maintain as well.
import os
import sys
if sys.platform == 'win32':
ROOT = os.path.splitdrive(os.path.abspath('.'))[0]
elif sys.platform == 'linux2' or sys.platform == 'darwin':
ROOT = os.sep
else:
raise ValueError("unknown operating system")
dictfile = os.path.join(ROOT, 'usr','share','dict','words')
print(dictfile)
/usr/share/dict/words
Use built in functions to create ranges.
for i in range(1,10,2):
print(i)
1 3 5 7 9
Always open a file using the with
statement because it closes the file at the end of the statement (even if there is an exception during interaction with the file system). A for loop can be used to iterate through lines using the file handle.
with open(dictfile, 'r') as fhandle:
for line in fhandle:
line = line.strip()
if len(line) > 23:
print(line)
antidisestablishmentarianism formaldehydesulphoxylate pathologicopsychological scientificophilosophical tetraiodophenolphthalein thyroparathyroidectomize
List comprehensions are very useful to replace a for-loop. Example below finds unique elements as a one line python program using the built-in set data structure.
x = ['a', 'b', 'c', 'd', 'a', 'b', 'c']
print([ i for i in set(x) ])
['a', 'd', 'b', 'c']
Also, you can use an 'if' statement in a list comprehension.
print([ i for i in set(x) if i != 'a'])
['d', 'b', 'c']
Using list comprehensions, the following Python code prints out the lowercased tokens of length greater than 15 from Sense and Sensibility (note that one of them occurs twice).
import nltk
longwords = [ word.lower() for word in nltk.corpus.gutenberg.words('austen-sense.txt') if len(word) > 15]
print(longwords)
['incomprehensible', 'incomprehensible', 'disinterestedness', 'companionableness', 'disqualifications']
enumerate is very useful when you want a counter variable for each element in a list.
x = ['a', 'c', 'b', 'd']
for (index,element) in enumerate(x): print(index, element)
0 a 1 c 2 b 3 d
Dictionary comprehensions are just like list comprehensions except they let you build a dictionary instead of a list. Say we want to build a dictionary where the dictionary keys are lowercase ASCII characters and the values are the probabilities for each character. In the following we just assign a random probability to each lowercase character.
import string
import numpy
# set up a random probability distribution over lowercase ASCII characters
counts = [ numpy.random.random() for c in string.ascii_lowercase ]
total = sum(counts)
# the following is a dictionary comprehension
prob = { c: (counts[i] / total) for (i,c) in enumerate(string.ascii_lowercase) }
print(prob['e'])
print(prob['z'])
0.040804314032715325 0.04517300968758016
Often we wish to compute the argmax using a probability distribution. The argmax function returns the element that has the highest probability. $$\hat{x} = \arg\max_x P(x)$$
def P(c):
return prob[c]
# the character with the highest probability is given by argmax_c P(c)
argmax_char = max(string.ascii_lowercase, key=P)
print(argmax_char, P(argmax_char))
w 0.07334002043038697
Formatted strings, where you want to insert a value into a string, where %s is a string value, %d is a decimal integer, %f is a floating point number.
print("%s = %d and %s = %f" % ("x", 10, "y", 0.0003))
x = 10 and y = 0.000300
print("The %(foo)s is %(bar)i." % {'foo': 'answer', 'bar':42})
The answer is 42.
print("The {foo} is {bar}".format(foo='answer', bar=42))
The answer is 42
The builtin function 'tuple' can be used to create n-grams from a list of words.
words = ['a', 'good', 'book', 'is', 'all', 'you', 'need', '.']
print("print unigrams aka 1-grams: ", end='')
print(words)
print("print bigrams aka 2-grams: ", end='')
print([ tuple(words[i:i+2]) for i in range(len(words)-1) ])
print("print trigrams aka 3-grams: ", end=''),
print([ tuple(words[i:i+3]) for i in range(len(words)-2) ])
print unigrams aka 1-grams: ['a', 'good', 'book', 'is', 'all', 'you', 'need', '.'] print bigrams aka 2-grams: [('a', 'good'), ('good', 'book'), ('book', 'is'), ('is', 'all'), ('all', 'you'), ('you', 'need'), ('need', '.')] print trigrams aka 3-grams: [('a', 'good', 'book'), ('good', 'book', 'is'), ('book', 'is', 'all'), ('is', 'all', 'you'), ('all', 'you', 'need'), ('you', 'need', '.')]
The function itemgetter from the operator module in Python provides a concise way to sort on different tuple elements in a list of tuples. Note that itemgetter(1) is set to the 2nd component of the tuple, and used as a key to sort the tuples.
word_freq = [ ('the', 1223), ('a', 2413), ('Mr.', 450), ('Elton', 10) ]
print(word_freq)
from operator import itemgetter
word_freq.sort(key=itemgetter(1), reverse=True)
print(word_freq)
[('the', 1223), ('a', 2413), ('Mr.', 450), ('Elton', 10)] [('a', 2413), ('the', 1223), ('Mr.', 450), ('Elton', 10)]
You can also use the built-in 'map' function to get the sorted values.
print(list(map(itemgetter(1), word_freq)))
[2413, 1223, 450, 10]
print(list(map(itemgetter(0), word_freq)))
['a', 'the', 'Mr.', 'Elton']
A class works pretty much like what you would expect from other languages such as C++ or Java. Methods of a class are determined by indentation. Each method that is part of the class must take at least one argument. The first argument of each method in a class is a pointer to the object derived from the class definition. By convention this first argument is typically called self
and it is analogous but not exactly the same as the C++ this
pointer.
class C:
def foo(self):
return self.a
def bar(self, a):
self.a = a
x = C()
x.bar('a')
print(x.foo())
a
The magic method __init__
is the constructor method for the class and __del__
is the destructor method which is called by the garbage collector (Python is similar to Java -- it does not require explicit memory management).
from itertools import islice
class FileObject:
'''Wrapper for file objects to make sure the file gets closed on deletion.'''
def __init__(self, filename):
self.file = open(filename, 'r')
def __del__(self):
self.file.close()
del self.file
f = FileObject(dictfile) # dictfile is defined in an earlier cell
for line in islice(f.file, 5):
print(line,end='')
del f # get rid of f -- this is typically not explicitly done in Python. trust the garbage collector to do it for you.
A a aa aal aalii
A class is an iterator if it has a __iter__
and next
method
defined as shown in this example.
# circular queue
class cq:
q = [] # needs to be initialized with a list
def __init__(self,q): # the argument q is a list
self.q = q
def __iter__(self):
return self
def __next__(self):
r = self.q[0]
self.q = self.q[1:] + [r] # rotate the list
return r
x = cq([1,2,3])
print(x.__next__())
print(x.__next__())
print(x.__next__())
print(x.__next__())
print(x.__next__())
1 2 3 1 2
Methods like __iter__
in the above code for cq
is called a magic method. Here is a guide to all Python magic methods
The function islice allows you to take a slice of an iterator.
from itertools import islice
x = cq([1,2,3])
for i in islice(x, 5):
print(i)
y = cq([1,2,3,4,5])
for i in islice(y,3): print(i)
z = [i for i in islice(y,10)]
print(z)
1 2 3 1 2 1 2 3 [4, 5, 1, 2, 3, 4, 5, 1, 2, 3]
The class defaultdict allows convenient insertion into a dictionary. You do not need to check if a key exists first before updating the value when using defaultdict.
from collections import defaultdict
foo = defaultdict(int)
bar = defaultdict(list)
foo['a'] += 1
foo['a'] += 1
bar['b'].append(1)
bar['b'].append(2)
print(foo, bar)
defaultdict(<class 'int'>, {'a': 2}) defaultdict(<class 'list'>, {'b': [1, 2]})
Use generators instead of lists. Generators behave like streams which you can iterate over while lists are statically allocated.
def sum_of_squares(n):
v = 0
for i in range(1,n+1):
v += i*i
yield v
for i in sum_of_squares(10): print(i)
1 5 14 30 55 91 140 204 285 385
a = [1,2,3,4] # this is a list
b = [2*x for x in a] # this is a list comprehension
c = (2*x for x in a) # this is a generator, not a list. it creates an iterator object
print(b)
print(c)
[2, 4, 6, 8] <generator object <genexpr> at 0x17b552730>
n = ((a,b) for a in range(0,2) for b in range(4,6))
for i in n:
print(i)
(0, 4) (0, 5) (1, 4) (1, 5)
Read Generator Tricks for Systems Programmers by David Beazley.
# from Part 2 of Peter Norvig's excellent essay on xkcd 1313
# http://nbviewer.ipython.org/url/norvig.com/ipython/xkcd1313-part2.ipynb
import re
searcher = re.compile('^a.o').search
data = frozenset('''all particularly just less indeed over soon course still yet before
certainly how actually better to finally pretty then around very early nearly now
always either where right often hard back home best out even away enough probably
ever recently never however here quite alone both about ok ahead of usually already
suddenly down simply long directly little fast there only least quickly much forward
today more on exactly else up sometimes eventually almost thus tonight as in close
clearly again no perhaps that when also instead really most why ago off
especially maybe later well together rather so far once'''.split())
%timeit { s for s in data if searcher(s) }
%timeit set(filter(searcher, data))
9.43 µs ± 32 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each) 8.34 µs ± 9.61 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
So about 18% faster to use the built-in command filter
instead of the set comprehension with an if
statement.
def concat(x, y):
return x + y
foo = ('A', 'B')
bar = {'y': 'B', 'x': 'A'}
print(concat(*foo))
print(concat(**bar))
AB AB
def doit(x,y):
if x < 0:
raise ValueError("x should be >= 0")
return y
print(doit(0,10))
10
# This will raise an exception, if you uncomment the following line:
# print doit(-1,10)
import this, codecs
print(this.s)
print(codecs.encode(this.s, "rot-13")) # -> uryyb
The Zen of Python, by Tim Peters Beautiful is better than ugly. Explicit is better than implicit. Simple is better than complex. Complex is better than complicated. Flat is better than nested. Sparse is better than dense. Readability counts. Special cases aren't special enough to break the rules. Although practicality beats purity. Errors should never pass silently. Unless explicitly silenced. In the face of ambiguity, refuse the temptation to guess. There should be one-- and preferably only one --obvious way to do it. Although that way may not be obvious at first unless you're Dutch. Now is better than never. Although never is often better than *right* now. If the implementation is hard to explain, it's a bad idea. If the implementation is easy to explain, it may be a good idea. Namespaces are one honking great idea -- let's do more of those! Gur Mra bs Clguba, ol Gvz Crgref Ornhgvshy vf orggre guna htyl. Rkcyvpvg vf orggre guna vzcyvpvg. Fvzcyr vf orggre guna pbzcyrk. Pbzcyrk vf orggre guna pbzcyvpngrq. Syng vf orggre guna arfgrq. Fcnefr vf orggre guna qrafr. Ernqnovyvgl pbhagf. Fcrpvny pnfrf nera'g fcrpvny rabhtu gb oernx gur ehyrf. Nygubhtu cenpgvpnyvgl orngf chevgl. Reebef fubhyq arire cnff fvyragyl. Hayrff rkcyvpvgyl fvyraprq. Va gur snpr bs nzovthvgl, ershfr gur grzcgngvba gb thrff. Gurer fubhyq or bar-- naq cersrenoyl bayl bar --boivbhf jnl gb qb vg. Nygubhtu gung jnl znl abg or boivbhf ng svefg hayrff lbh'er Qhgpu. Abj vf orggre guna arire. Nygubhtu arire vf bsgra orggre guna *evtug* abj. Vs gur vzcyrzragngvba vf uneq gb rkcynva, vg'f n onq vqrn. Vs gur vzcyrzragngvba vf rnfl gb rkcynva, vg znl or n tbbq vqrn. Anzrfcnprf ner bar ubaxvat terng vqrn -- yrg'f qb zber bs gubfr! The Zen of Python, by Tim Peters Beautiful is better than ugly. Explicit is better than implicit. Simple is better than complex. Complex is better than complicated. Flat is better than nested. Sparse is better than dense. Readability counts. Special cases aren't special enough to break the rules. Although practicality beats purity. Errors should never pass silently. Unless explicitly silenced. In the face of ambiguity, refuse the temptation to guess. There should be one-- and preferably only one --obvious way to do it. Although that way may not be obvious at first unless you're Dutch. Now is better than never. Although never is often better than *right* now. If the implementation is hard to explain, it's a bad idea. If the implementation is easy to explain, it may be a good idea. Namespaces are one honking great idea -- let's do more of those!
Uncomment the following easter eggs to see what happens.
# from __future__ import braces
# import __phello__
import antigravity
This section is strictly for programming language wonks. Scoping in Python can sometimes be tricky.
x = 'a'
class wat:
x = 'b'
def __init__(self):
print("1:", x)
print("2:", self.x)
f = wat()
print("3:", f.x)
1: a 2: b 3: b
x = 'a'
print("1:", list(x for x in (1,2)), x)
print("2:", [x for x in (1,2)], x)
1: [1, 2] a 2: [1, 2] a
For more visit http://programmingwats.tumblr.com/
Python has syntactic support for function composition.
## function composition of foo with bar: foo(bar(args)) using a decorator
def foo(f):
def decorator_func(*args, **keyword_args):
f(*args, **keyword_args)
print("Desecration is the smile on my face\n")
return decorator_func
@foo
def bar(n):
print(n)
bar("Never in the wrong time or wrong place")
## function composition directly by calling foo(bar_bar(args))
def bar_bar(n):
print(n)
# notice how I give a function as an argument to foo which returns a function
# and I then provide an argument to that function returned by foo
foo(bar_bar)("My face my face, hey")
Never in the wrong time or wrong place Desecration is the smile on my face My face my face, hey Desecration is the smile on my face
from IPython.core.display import HTML
def css_styling():
styles = open("../css/notebook.css", "r").read()
return HTML(styles)
css_styling()