There are three kinds of modules/packages:
Standard library modules come included in Python and they contain many useful tools.
They are maintained by the core Python development team so you can count on them being reliable.
The Python Standard Library is very extensive, so I will just show you some highlights.
Refer to the official Python documentation for a more complete view of the Standard Library.
Note: Standard Library packages and modules are NOT the same thing as built-in objects (e.g. print, open, zip, enumerate). You still have to import standard library modules/packages; you just don't have to install them from elsewhere.
import sys
#get the interpreter path
print(f"Interpreter is located at: {sys.executable}\n")
#get module search path
print(f"Look for modules in: {sys.path}\n")
Interpreter is located at: /Users/carlosgonzalezoliver/anaconda/envs/py36/bin/python
Look for modules in: ['', '/Users/carlosgonzalezoliver/anaconda/envs/py36/lib/python36.zip', '/Users/carlosgonzalezoliver/anaconda/envs/py36/lib/python3.6', '/Users/carlosgonzalezoliver/anaconda/envs/py36/lib/python3.6/lib-dynload', '/Users/carlosgonzalezoliver/anaconda/envs/py36/lib/python3.6/site-packages', '/Users/carlosgonzalezoliver/anaconda/envs/py36/lib/python3.6/site-packages/IPython/extensions', '/Users/carlosgonzalezoliver/.ipython']
#kill the interpreter, stops your program's execution (works better outside of notebooks)
sys.exit()
An exception has occurred, use %tb to see the full traceback.
SystemExit
/Users/carlosgonzalezoliver/anaconda/envs/py36/lib/python3.6/site-packages/IPython/core/interactiveshell.py:2870: UserWarning: To exit: use 'exit', 'quit', or Ctrl-D. warn("To exit: use 'exit', 'quit', or Ctrl-D.", stacklevel=1)
sys: command-line arguments
Until now we have been getting input from the user in an "interactive" way.
That is, the program pauses execution and waits for the user to respond to the input() query.
You can also let users give input to your program at the beginning of execution, so that execution is never halted.
This is done through command-line arguments.
Imagine you have a file divide.py that divides two numbers given by the user.
Using input(), we had:
a = int(input("Give me the first number: "))
b = int(input("Give me the second number: "))
print(a / b)
With command-line arguments, the information is taken before execution.
import sys
a = int(sys.argv[1])
b = int(sys.argv[2])
print(a/b)
From the command line, you would call the program as such:
$ python divide.py 3 2
sys.argv stores the command-line arguments as a list of strings, with the name of the script first.
In this case:
print(sys.argv)
Would produce:
["divide.py", "3", "2"]
Command-line arguments are often preferred when it is desirable to automate the execution of a program.
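The divide.py script above assumes that exactly two integer arguments are supplied. Here is a minimal sketch of how the arguments could be checked before they are used. The helper name parse_operands, the main function and the usage message are hypothetical and not part of the original script.

```python
def parse_operands(argv):
    """Return the two integer operands found in an argv-style list.

    parse_operands is a hypothetical helper introduced for illustration.
    argv[0] is the script name, so the operands are argv[1] and argv[2].
    """
    if len(argv) != 3:
        raise ValueError("usage: python divide.py A B")
    # int() raises ValueError itself if an argument is not an integer
    return int(argv[1]), int(argv[2])


def main(argv):
    """Divide the two operands and return an exit status (0 means success)."""
    try:
        a, b = parse_operands(argv)
    except ValueError as err:
        print(f"Bad arguments: {err}")
        return 1
    if b == 0:
        print("Cannot divide by zero")
        return 1
    print(a / b)
    return 0


# Simulated command line; a real script would import sys and pass sys.argv
status = main(["divide.py", "3", "2"])
```

Passing the argument list into main, rather than reading sys.argv inside it, makes the logic easy to try out from a notebook.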
os
This module lets you perform actions related to the operating system.
import os
print(f"My operating system type is: {os.name}")
print(f"I am currently in directory: {os.getcwd()}")
My operating system type is: posix
I am currently in directory: /Users/carlosgonzalezoliver/Projects/Notebooks/COMP_364/L24
You can also change your current working directory.
os.chdir("/Users/carlosgonzalezoliver/Projects")
os.getcwd()
'/Users/carlosgonzalezoliver/Projects'
You can see what files are in a directory. With no arguments, os.listdir() looks in the current directory.
os.listdir()
['.DS_Store', '.ipynb_checkpoints', 'AbstractClassification', 'ArXiVDT', 'BGSA_Workshop', 'briancaffey.github.io', 'cgoliver.github.io', 'cminerva', 'Crick', 'dapps', 'Dorys_Whaleish_Dictionary.py', 'ETHDogs', 'Ethereum', 'Euler', 'Git_Talk', 'Git_Tutorial', 'Google', 'haikus', 'Kattis', 'Keras-RCNN', 'Kernels', 'kernelx', 'machine_learning', 'mateRNAl', 'myproject', 'Notebooks', 'Nussinov', 'Pear', 'Pickypedia', 'Plumbing', 'pocketcluster', 'Popgen_sols', 'pyCourses', 'pyMeet', 'RNA', 'RNA-Popgen-Notebook', 'SeizuresBot', 'Test', 'testblog', 'tPPI', 'Voting', 'zminerva']
Or you can give a path.
os.listdir("/Users/carlosgonzalezoliver/Projects/Notebooks/COMP_364/L24")
['.ipynb_checkpoints', 'L24.ipynb', 'rand_dict.json', 'rand_dict.pickle', 'test.csv', 'test.txt']
Let's go back to where we were.
os.chdir("/Users/carlosgonzalezoliver/Projects/Notebooks/COMP_364/L24")
You can also create new directories.
os.mkdir("Temp")
os.listdir()
['.ipynb_checkpoints', 'L24.ipynb', 'rand_dict.json', 'rand_dict.pickle', 'Temp', 'test.csv', 'test.txt']
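One detail worth knowing is that os.mkdir raises FileExistsError when the directory is already there, so running the cell above a second time would fail. The sketch below shows one way around this, os.makedirs with exist_ok=True. It works inside a temporary directory (an assumption made here so that no real project folder is touched), and the sub-directory names "a" and "b" are arbitrary.

```python
import os
import tempfile

# Work inside a throwaway directory so nothing in the real project is touched
with tempfile.TemporaryDirectory() as base:
    target = os.path.join(base, "Temp")

    os.mkdir(target)
    try:
        os.mkdir(target)  # second call fails: the directory already exists
    except FileExistsError:
        already_existed = True
    else:
        already_existed = False

    # makedirs creates intermediate directories and, with exist_ok=True,
    # does not complain when the directory is already there
    os.makedirs(target, exist_ok=True)
    os.makedirs(os.path.join(target, "a", "b"), exist_ok=True)

    contents = sorted(os.listdir(target))

print(already_existed)
print(contents)
```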
shutil
shutil is used for manipulating files themselves (copying, moving, deleting), not their contents.
with open("test.txt", "w") as t:
    t.write("Hello")
os.listdir()
['.ipynb_checkpoints', 'L24.ipynb', 'rand_dict.json', 'rand_dict.pickle', 'Temp', 'test.csv', 'test.txt']
import shutil
#copy the file
shutil.copyfile("test.txt", "test_copy.txt")
'test_copy.txt'
os.listdir()
#delete a directory
shutil.rmtree("Temp")
#deleting files is done with os
os.remove("test_copy.txt")
os.listdir()
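The same operations can be tried without touching any real files. The following sketch repeats the copy and delete steps inside a temporary directory and also shows shutil.move, which was not used above. The file and directory names are placeholders chosen for this illustration.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as base:
    original = os.path.join(base, "test.txt")
    with open(original, "w") as handle:
        handle.write("Hello")

    # copyfile copies the contents only; it returns the destination path
    copy_path = shutil.copyfile(original, os.path.join(base, "test_copy.txt"))

    # move renames (or relocates) a file or directory
    archive = os.path.join(base, "archive")
    os.mkdir(archive)
    moved_path = shutil.move(copy_path, os.path.join(archive, "test_copy.txt"))

    with open(moved_path) as handle:
        moved_text = handle.read()

    # rmtree removes a directory together with everything inside it
    shutil.rmtree(archive)
    remaining = sorted(os.listdir(base))

print(moved_text)
print(remaining)
```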
There are a couple of convenient "math" modules.
import math
print(f"e^2: {math.exp(2)}")
print(f"log(1): {math.log(1)}")
print(f"3^4: {math.pow(3, 4)}")
print(f"sin(4): {math.sin(4)}")
e^2: 7.38905609893065
log(1): 0.0
3^4: 81.0
sin(4): -0.7568024953079282
The random module gives you pseudo-random functionality: the numbers are produced by a deterministic algorithm, so they only appear random.
import random
#random number drawn uniformly from [0, 1)
print(f"uniform random number: {random.random()}")
#random integer from 4 to 15 (the upper bound of randrange is excluded)
print(f"uniform random integer between 4 and 15: {random.randrange(4, 16)}")
mu = 0
sigma = 1
#the second argument of random.gauss is the standard deviation, not the variance
print(f"gaussian random number with mean {mu} and standard deviation {sigma}: {random.gauss(mu, sigma)}")
uniform random number: 0.4199826783981393
uniform random integer between 4 and 15: 11
gaussian random number with mean 0 and standard deviation 1: 0.5279296760327262
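Because the generator is deterministic, fixing its starting state (the seed) makes a run repeatable, which is handy when debugging. A minimal sketch follows; the seed value 364 is an arbitrary choice made for this illustration.

```python
import random

# Two generators seeded with the same value produce the same sequence.
# The seed 364 is an arbitrary choice for this illustration.
first = random.Random(364)
second = random.Random(364)

draws_first = [first.random() for _ in range(3)]
draws_second = [second.random() for _ in range(3)]
print(draws_first == draws_second)

# random.seed() does the same for the module-level functions
random.seed(364)
a = random.gauss(0, 1)
random.seed(364)
b = random.gauss(0, 1)
print(a == b)
```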
Let's check that we're actually getting uniform and Gaussian distributions.
%matplotlib inline
import matplotlib.pyplot as plt
def rand_plot(samples):
    #histogram with 10 bins showing raw counts; density replaces the removed normed argument
    n, bins, patches = plt.hist(samples, 10, density=False, facecolor='green', alpha=0.75)
    plt.xlabel("Value")
    plt.ylabel("Count")
    plt.show()
#uniform random number
unif = [random.uniform(10, 15) for _ in range(1000)]
rand_plot(unif)
#gaussian random number
gaussian = [random.gauss(mu, sigma) for _ in range(1000)]
rand_plot(gaussian)
We can also do random things with lists.
#randomly pick one item
birds = ["duck", "goose", "eagle", "swan"]
print(random.choice(birds))
#coin toss
coin = ["heads", "tails"]
print(random.choice(coin))
#shuffle the items of a list in place
random.shuffle(birds)
print(birds)
duck
tails
['eagle', 'swan', 'goose', 'duck']
The collections module lets us enhance some of the container types we've seen to make them more user friendly.
import collections
#count the number of occurrences of each item in a list
c = collections.Counter(["red", "red", "red", "black", "red", "blue", "blue"])
print(c)
print(c['red'])
#get the 2 most common elements
print(c.most_common(2))
Counter({'red': 4, 'blue': 2, 'black': 1})
4
[('red', 4), ('blue', 2)]
namedtuple lets us give names to the indices of a tuple.
Student = collections.namedtuple('Student', ['name', 'grade', 'major'])
s = Student('Carlos', 2.1, 'cs')
print(s.grade)
print(s.name)
print(s.major)
2.1
Carlos
cs
This is useful for giving CSV entries meaningful names.
Contents of test.csv:
carlos,2.4,cs
jim,3.1,math
joan,2.5,phys
jack,3.6,cs
with open("test.csv", "r") as students:
    for s in students:
        #the _make() method builds a namedtuple from an iterable
        line = s.strip().split(",")
        tup = Student._make(line)
        print(tup)
        print(tup.name)
Student(name='carlos', grade='2.4', major='cs')
carlos
Student(name='jim', grade='3.1', major='math')
jim
Student(name='joan', grade='2.5', major='phys')
joan
Student(name='jack', grade='3.6', major='cs')
jack
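Notice in the output that every grade is still a string such as '2.4', because the fields come straight from the text file. The sketch below converts the grade to a number while building each tuple, and uses the csv module for the splitting. The file contents are embedded as a string (an assumption made so the example runs without test.csv on disk).

```python
import collections
import csv
import io

Student = collections.namedtuple('Student', ['name', 'grade', 'major'])

# Stand-in for the test.csv file shown above, so the sketch is self-contained
csv_text = "carlos,2.4,cs\njim,3.1,math\njoan,2.5,phys\njack,3.6,cs\n"

students = []
for name, grade, major in csv.reader(io.StringIO(csv_text)):
    # Convert the grade from text to a number while building the tuple
    students.append(Student(name, float(grade), major))

# With numeric grades, comparisons behave as expected
best = max(students, key=lambda s: s.grade)
print(best.name, best.grade)
```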
The datetime module is useful for handling date formats.
import datetime as dt
date = dt.date(2017, 11, 9)
print(date)
print(date.year)
#today's date
print(dt.date.today())
#subtract one date from another
christmas = dt.date(2017, 12, 25)
till_christmas = christmas - dt.date.today()
#produces a timedelta object
print(type(till_christmas))
print(f"Days till Christmas: {till_christmas}")
#day of the week as an integer
print(dt.date.today().weekday())
print(christmas.weekday())
2017-11-09
2017
2017-11-06
<class 'datetime.timedelta'>
Days till Christmas: 49 days, 0:00:00
0
0
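The timedelta printed above includes a time component ("49 days, 0:00:00"). If only the number of days is wanted, the .days attribute gives it as an integer. The following sketch uses fixed dates taken from the output above instead of dt.date.today(), so that the result does not change from day to day.

```python
import datetime as dt

# Fixed dates taken from the output above, so the result does not depend on today
today = dt.date(2017, 11, 6)
christmas = dt.date(2017, 12, 25)

till_christmas = christmas - today
# .days gives the whole number of days as a plain integer
print(f"Days till Christmas: {till_christmas.days}")

# weekday() counts from Monday = 0
print(christmas.weekday())

# Adding a timedelta to a date produces a new date
next_week = today + dt.timedelta(weeks=1)
print(next_week)
```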
The timeit module helps you time the execution of code snippets.
import timeit
timeit.timeit("[x*x for x in range(100)]")
7.79176791899954
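The figure above is the total time for many repetitions, not for a single run: by default timeit.timeit executes the statement one million times. Here is a small sketch of controlling this with the number argument; the repetition count of 1000 is an arbitrary choice.

```python
import timeit

runs = 1000  # arbitrary repetition count chosen for this sketch
total = timeit.timeit("[x*x for x in range(100)]", number=runs)

# timeit returns the total elapsed time in seconds for all runs
print(f"total for {runs} runs: {total:.6f} s")
print(f"average per run: {total / runs:.9f} s")


# A callable can be timed as well as a string of code
def squares():
    return [x * x for x in range(100)]


total_callable = timeit.timeit(squares, number=runs)
print(f"callable version: {total_callable:.6f} s")
```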
The doctest module lets you put executable Python in docstrings as test calls to make sure everything works as expected. The module looks for lines starting with >>> (interactive Python calls), runs them, and compares the actual result of each call to the expected output written on the next line of the docstring.
import doctest
def mysquare(x):
    """
    This function computes the square of a number.
    >>> mysquare(5)
    25
    """
    return x*x
def mymean(nums):
    """
    This function computes the mean of a list of numbers.
    >>> mymean([2, 2, 3, 4])
    2.75
    """
    tot = 0
    for i in nums:
        tot += i
    return tot / len(nums)
doctest.testmod()
TestResults(failed=0, attempted=2)
pickle is a very useful module for storing Python objects in files so that you can keep working on them later.
import pickle
rand_dict = {}
animals = ["dog", "cat", "giraffe", "lion", "zebra"]
for a in animals:
    rand_dict[a] = random.random()
print(rand_dict)
{'dog': 0.3856088112400382, 'cat': 0.7451045373130774, 'giraffe': 0.4958727957365945, 'lion': 0.7046721287417177, 'zebra': 0.07175260118616189}
I can now store, or dump, the dictionary to a file.
Pickle stores objects in a binary representation that is not human-readable and only works in Python, but is very fast.
with open("rand_dict.pickle", "wb") as f:
    pickle.dump(rand_dict, f)
with open("rand_dict.pickle", "rb") as f:
    loaded = pickle.load(f)
print(loaded)
{'dog': 0.3856088112400382, 'cat': 0.7451045373130774, 'giraffe': 0.4958727957365945, 'lion': 0.7046721287417177, 'zebra': 0.07175260118616189}
json does a similar job, but the contents are human-readable and can be read by any language. The downside is that it's not as fast.
JSON cannot store custom classes directly, and not every built-in Python type can be converted to JSON (sets, for example).
import json
with open("rand_dict.json", "w") as f:
    json.dump(rand_dict, f)
with open("rand_dict.json", "r") as f:
    jsoned = json.load(f)
jsoned
{'cat': 0.7451045373130774, 'dog': 0.3856088112400382, 'giraffe': 0.4958727957365945, 'lion': 0.7046721287417177, 'zebra': 0.07175260118616189}
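A dictionary of strings and floats survives the round trip unchanged, but that is not true of every object. The sketch below illustrates two common surprises (tuples come back as lists and non-string keys come back as strings) and what happens with a type JSON cannot represent. The record dictionary is made-up data for this illustration.

```python
import json

# Made-up data: a tuple value and an integer key, neither of which exists in JSON
record = {"name": "carlos", "grades": (2.4, 3.1), 1: "integer key"}

# dumps/loads work on strings, so no file is needed for this sketch
restored = json.loads(json.dumps(record))
print(restored)

# Types without a JSON equivalent are rejected with a TypeError
try:
    json.dumps({"majors": {"cs", "math"}})
    set_failed = False
except TypeError as err:
    set_failed = True
    print(f"Cannot serialize: {err}")
```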
Sometimes you have tasks that can be easily parallelized.
Since most computers have more than one processor core, we can let multiple processes work on our Python code at the same time.
For example:
For a given number $n$ I want to compute the sum of the cubes of every number below $n$, and I want to do this for many values of $n$.
The computation for one value of $n$ is independent of the computation for any other value, so the values can be handled in parallel.
from multiprocessing import Pool
import time
def cube_sum(x):
    return sum([i**3 for i in range(x)])
#we use the context manager to take care of all the setup
#we create a Pool object which contains the processors we can send tasks to
#here we have chosen to use 4 processes
start = time.time()
nums = [i for i in range(10000)]
with Pool(4) as p:
    result = p.map(cube_sum, nums)
print(f"Parallel job took: {time.time() - start}")
### normally:
start_serial = time.time()
serial_result = [cube_sum(x) for x in nums]
print(f"Serial job took {time.time() - start_serial}")
Parallel job took: 14.64725112915039
Serial job took 25.040592908859253
The reason I came up with such a weird function is that parallelizing is not always faster.
There is quite a bit of setup and communication that needs to happen to coordinate the processes (aka overhead).
If the actual computation takes less time than the overhead, then the normal serial method is faster.
There are many other modules that I did not cover, and many other functionalities of the ones I did cover that I didn't have time to show you.
Some notable Standard Library modules worth looking into:
re: searching for patterns inside strings
statistics: basic statistics functions (mean, standard deviation, etc.)
os.path, glob: handling file paths
csv: automatic CSV file parsing
logging: code and error logging
argparse: command-line argument parser
tkinter: building graphical user interfaces
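As one illustration of the modules listed above, the divide.py example from the sys section could be written with argparse, which converts the arguments to integers and prints a usage message when they are missing or malformed. This is only a sketch: the helper name build_parser, the description and the help strings are assumptions made for the example.

```python
import argparse


def build_parser():
    # prog and the help strings are illustrative choices, not from the original script
    parser = argparse.ArgumentParser(prog="divide.py", description="Divide two integers.")
    parser.add_argument("a", type=int, help="numerator")
    parser.add_argument("b", type=int, help="denominator")
    return parser


# parse_args accepts an explicit list; with no argument it reads sys.argv[1:]
args = build_parser().parse_args(["3", "2"])
print(args.a / args.b)
```

Passing an explicit list to parse_args, as done here, is convenient inside a notebook where there is no real command line.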