# COMP 364: A brief tour of the Standard Library¶

There are three kinds of modules/packages:

• Modules you make yourself
• Third-party modules (e.g. matplotlib)
• Standard library modules

Standard library modules come included in Python and they contain many useful tools.

They are maintained by the core Python development team so you can count on them being reliable.

The Python Standard Library is very extensive, so I will just show you some highlights.

Refer to this and this for a more complete view on the Standard Library.

Note: Standard Library packages and modules are NOT the same thing as built-in objects (e.g. print, open, zip, enumerate). You still have to import standard library modules/packages you just don't have to install them from elsewhere.

• sys: functions and variables working on the Python interpreter
• os: operating system functionality
• shutil: file manipulation

### sys¶

In [1]:
import sys

#get the interpreter path
print(f"Interpreter is located at: {sys.executable}\n")
#get module search path
print(f"Look for modules in: {sys.path}\n")

Interpreter is located at: /Users/carlosgonzalezoliver/anaconda/envs/py36/bin/python

Look for modules in: ['', '/Users/carlosgonzalezoliver/anaconda/envs/py36/lib/python36.zip', '/Users/carlosgonzalezoliver/anaconda/envs/py36/lib/python3.6', '/Users/carlosgonzalezoliver/anaconda/envs/py36/lib/python3.6/lib-dynload', '/Users/carlosgonzalezoliver/anaconda/envs/py36/lib/python3.6/site-packages', '/Users/carlosgonzalezoliver/anaconda/envs/py36/lib/python3.6/site-packages/IPython/extensions', '/Users/carlosgonzalezoliver/.ipython']


In [2]:
#kill the interpreter, stops your program's execution (works better outside of notebooks)
sys.exit()

An exception has occurred, use %tb to see the full traceback.

SystemExit

/Users/carlosgonzalezoliver/anaconda/envs/py36/lib/python3.6/site-packages/IPython/core/interactiveshell.py:2870: UserWarning: To exit: use 'exit', 'quit', or Ctrl-D.
warn("To exit: use 'exit', 'quit', or Ctrl-D.", stacklevel=1)


### sys: command-line arguments¶

Until now we have been getting input from the user in an "interactive" way.

That is, the program pauses execution and waits for the user to respond to the input query.

You can also let users give input to your program at the beginning of execution and then execution is never halted.

This is done through command-line arguments

Imagine you have a file divide.py that divides two numbers given by the user.

Using input() we had

a = int(input("Give me the first number: "))
b = int(input("Give me the second number: "))

print(a / b)


With command-line arguments, the information is taken before execution.

import sys

a = int(sys.argv[1])
b = int(sys.argv[2])

print(a/b)


From the command line, you would call the program as such:

$python divide.py 3 2 sys.argv stores a list of strings given by the command line. In this case: print(sys.argv)  Would produce: ["divide.py", "3", "2"]  Command line arguments are often preferred when it is desireable to automate the execution of a program. ## os¶ This module lets you perform actions related to the operating system. In [3]: import os print(f"My operating system type is: {os.name}") print(f"I am currently in directory: {os.getcwd()}")  My operating system type is: posix I am currently in directory: /Users/carlosgonzalezoliver/Projects/Notebooks/COMP_364/L24  You can also change your current working directory In [4]: os.chdir("/Users/carlosgonzalezoliver/Projects") os.getcwd()  Out[4]: '/Users/carlosgonzalezoliver/Projects' You can see what files are in a directory. No arguments means, look in the current directory. In [5]: os.listdir()  Out[5]: ['.DS_Store', '.ipynb_checkpoints', 'AbstractClassification', 'ArXiVDT', 'BGSA_Workshop', 'briancaffey.github.io', 'cgoliver.github.io', 'cminerva', 'Crick', 'dapps', 'Dorys_Whaleish_Dictionary.py', 'ETHDogs', 'Ethereum', 'Euler', 'Git_Talk', 'Git_Tutorial', 'Google', 'haikus', 'Kattis', 'Keras-RCNN', 'Kernels', 'kernelx', 'machine_learning', 'mateRNAl', 'myproject', 'Notebooks', 'Nussinov', 'Pear', 'Pickypedia', 'Plumbing', 'pocketcluster', 'Popgen_sols', 'pyCourses', 'pyMeet', 'RNA', 'RNA-Popgen-Notebook', 'SeizuresBot', 'Test', 'testblog', 'tPPI', 'Voting', 'zminerva'] Or you can give a path. In [6]: os.listdir("/Users/carlosgonzalezoliver/Projects/Notebooks/COMP_364/L24")  Out[6]: ['.ipynb_checkpoints', 'L24.ipynb', 'rand_dict.json', 'rand_dict.pickle', 'test.csv', 'test.txt'] Let's go back to where we were. In [7]: os.chdir("/Users/carlosgonzalezoliver/Projects/Notebooks/COMP_364/L24")  You can also create new directories. In [8]: os.mkdir("Temp")  In [9]: os.listdir()  Out[9]: ['.ipynb_checkpoints', 'L24.ipynb', 'rand_dict.json', 'rand_dict.pickle', 'Temp', 'test.csv', 'test.txt'] ### shutil¶ shutil is used for file manipulation (not file content manipulation) In [10]: with open("test.txt", "w") as t: t.write("Hello")  In [11]: os.listdir()  Out[11]: ['.ipynb_checkpoints', 'L24.ipynb', 'rand_dict.json', 'rand_dict.pickle', 'Temp', 'test.csv', 'test.txt'] In [12]: import shutil #copy the file shutil.copyfile("test.txt", "test_copy.txt")  Out[12]: 'test_copy.txt' In [ ]: os.listdir()  In [13]: #delete a directory shutil.rmtree("Temp")  In [14]: #deleting files is done with os os.remove("test_copy.txt")  In [ ]: os.listdir()  # Math¶ There are a couple convenient "math" modules • math: basic math operations and quantities • random: pseudo-random numbers • statistics: basic statistics functions In [15]: import math print(f"e^2: {math.exp(2)}") print(f"log(1): {math.log(1)}") print(f"3^4: {math.pow(3, 4)}") print(f"sin(4): {math.sin(4)}")  e^2: 7.38905609893065 log(1): 0.0 3^4: 81.0 sin(4): -0.7568024953079282  ### Random¶ The random module gives you pseudo-random (no perfectly random generator exists) functionality. In [18]: import random #random number uniformly from 0 and 1 print(f"uniform random number: {random.random()}") print(f"uniform random number between 4 and 15 {random.randrange(4, 16)}") mu = 0 sigma = 1 print(f"gaussian random number with mean {mu} and variance {sigma}: {random.gauss(mu, sigma)} ")  uniform random number: 0.4199826783981393 uniform random number between 4 and 15 11 gaussian random number with mean 0 and variance 1: 0.5279296760327262  Let's check that we're actually getting uniform and Gaussian distributions. In [19]: %matplotlib inline import matplotlib.pyplot as plt def rand_plot(samples): n, bins, patches = plt.hist(samples, 10, normed=0, facecolor='green', alpha=0.75) plt.xlabel("Value") plt.ylabel("Count") plt.show() #uniform random number unif = [random.uniform(10, 15) for _ in range(1000)] rand_plot(unif) #gaussian random number gaussian = [random.gauss(mu, sigma) for _ in range(1000)] rand_plot(gaussian)  We can also do random things with lists. In [20]: #randomly pick one item birds = ["duck", "goose", "eagle", "swan"] print(random.choice(birds)) #coin toss coin = ["heads", "tails"] print(random.choice(coin)) #shuffle the items of a list in place random.shuffle(birds) print(birds)  duck tails ['eagle', 'swan', 'goose', 'duck']  # Data structures¶ The collections module lets us enhance some of the container types we've seen for more user friendliness. In [21]: import collections #count number of occurences from a list c = collections.Counter(["red", "red", "red", "black", "red", "blue", "blue"]) print(c) print(c['red']) #get the 2 most common elements print(c.most_common(2))  Counter({'red': 4, 'blue': 2, 'black': 1}) 4 [('red', 4), ('blue', 2)]  namedtuple lets us give names to the indices of a tuple. In [25]: Student = collections.namedtuple('Student', ['name', 'grade', 'major']) s = Student('Carlos', 2.1, 'cs') print(s.grade) print(s.name) print(s.major)  2.1 Carlos cs  Useful for giving CSV entries meaningful names. test.csv: carlos,2.4,cs jim,3.1,math joan,2.5,phys jack,3.6,cs In [28]: with open("test.csv", "r") as students: for s in students: #the _make() function lets you make a NamedTuple from an iterable line = s.strip().split(",") tup = Student._make(line) print(tup) print(tup.name)  Student(name='carlos', grade='2.4', major='cs') carlos Student(name='jim', grade='3.1', major='math') jim Student(name='joan', grade='2.5', major='phys') joan Student(name='jack', grade='3.6', major='cs') jack  The datetime module is useful for handling date formats. In [30]: import datetime as dt date = dt.date(2017, 11, 9) print(date) print(date.year) #today's date print(dt.date.today()) #compare dates christmas = dt.date(2017, 12, 25) till_christmas = christmas - dt.date.today() #produces a timedelta object print(type(till_christmas)) print(f"Days till Christmas: {till_christmas}") #day of the week as an integer print(dt.date.today().weekday()) print(christmas.weekday())  2017-11-09 2017 2017-11-06 <class 'datetime.timedelta'> Days till Christmas: 49 days, 0:00:00 0 0  # Quality Control¶ The timeit module helps you time the execution of some code snippets. In [31]: import timeit timeit.timeit("[x*x for x in range(100)]")  Out[31]: 7.79176791899954 The doctest module lets you put executable python in docstrings as test calls to make sure everything works as expected. The module looks for >>> interactive python calls and compares the actual call to what is in the string as the output. In [33]: import doctest def mysquare(x): """ This function computes the square of a number. >>> mysquare(5) 25 """ return x*x def mymean(nums): """ This function computes the mean of a list of numbers. >>> mymean([2, 2, 3, 4]) 2.75 """ tot = 0 for i in nums: tot += i return tot / len(nums) doctest.testmod()  Out[33]: TestResults(failed=0, attempted=2) # Data Storage¶ pickle is a very useful module for storing python objects in files so that you can keep working on them later. In [35]: import pickle rand_dict = {} animals = ["dog", "cat", "giraffe", "lion", "zebra"] for a in animals: rand_dict[a] = random.random() print(rand_dict)  {'dog': 0.3856088112400382, 'cat': 0.7451045373130774, 'giraffe': 0.4958727957365945, 'lion': 0.7046721287417177, 'zebra': 0.07175260118616189}  I can now store, or dump the dictionary to a file. Pickle stores objects as a binary representation which is not human readable and only works in Python but is very fast. In [36]: pickle.dump(rand_dict, open("rand_dict.pickle", "wb"))  In [37]: loaded = pickle.load(open("rand_dict.pickle", "rb"))  In [38]: print(loaded)  {'dog': 0.3856088112400382, 'cat': 0.7451045373130774, 'giraffe': 0.4958727957365945, 'lion': 0.7046721287417177, 'zebra': 0.07175260118616189}  json does a similar job but the contents are human-readable and can be read by any language. The downside is it's not as fast. JSON cannot store any custom classes and not all python classes can be JSONed. In [39]: import json json.dump(rand_dict, open("rand_dict.json", "w"))  In [40]: jsoned = json.load(open("rand_dict.json", "r"))  In [41]: jsoned  Out[41]: {'cat': 0.7451045373130774, 'dog': 0.3856088112400382, 'giraffe': 0.4958727957365945, 'lion': 0.7046721287417177, 'zebra': 0.07175260118616189} # Multiprocessing¶ Sometimes you can have tasks that can be easily parallelized. Since most computers have more than one processor, we can let multiple processors work on our Python at the same time. For example: For a given number$n$I want to compute the sum of every number up to$n\$ cubed.

Obviously the process of squaring a particular number in the list is independent of squaring any other number.

In [42]:
from multiprocessing import Pool
import time

def cube_sum(x):
return sum([i**3 for i in range(x)])

#we use the context manager to take care of all the setup
#we create a Pool object which contains the processors we can send tasks to
#here we have chosen to use 4 processes

start = time.time()
nums = [i for i in range(10000)]

with Pool(4) as p:
result = p.map(cube_sum, nums)
print(f"Parallel job took: {time.time() - start}")

### normally:
start_serial = time.time()
serial_result = [cube_sum(x) for x in nums]
print(f"Serial job took {time.time() - start_serial}")

Parallel job took: 14.64725112915039
Serial job took 25.040592908859253


The reason I came up with such a weird function is that parallelizing is not always faster.

There is quite a bit of setup and communication that needs to happen to coordinate the processors (aka overhead).

If the actual comptuation is faster than the overhead then the normal serial method is faster.

# Others¶

There are many other modules that I did not cover, and many other functionalities of the ones I did cover that I didn't have time to show you.

Some notable Standard Library modules worth looking into:

• re: searching for patterns inside strings
• statistics: basics statistics function (mean, std, etc)
• os.path, glob: handling file paths
• csv: automatic CSV file parsing
• logging: code and error logging
• argpars: command line argument parser
• tkinter: building graphical user interfaces