Python for Developers

First Edition

Chapter 11: Standard Library


It's often said that Python comes with "batteries included", in reference to the vast library of modules and packages that are distributed with the interpreter.

Some important modules of the standard library:

  • Math: math, cmath, decimal and random.
  • System: os, glob, subprocess and shutil.
  • Threads: threading.
  • Persistence: pickle and cPickle.
  • XML: xml.dom, xml.sax and ElementTree (since version 2.5).
  • Configuration: ConfigParser and optparse.
  • Time: time and datetime.
  • Other: sys, logging, traceback, types and timeit.

Maths

In addition to the builtin numeric types, the Python standard library has several modules devoted to implementing other types and mathematical operations.

The math module defines logarithmic, exponentiation, trigonometric, and hyperbolic functions, as well as angular conversions and more. The cmath module implements similar functions, but can handle complex numbers.
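As a small illustration of the functions mentioned above, the sketch below evaluates a logarithm, a power, a trigonometric function and an angular conversion with math, and then takes the square root of a negative number with cmath, which math.sqrt() rejects with a ValueError. The input values are arbitrary.

```python
import math
import cmath

# Logarithm and exponentiation
log_value = math.log(100, 10)   # logarithm of 100 in base 10
power = math.pow(2, 10)         # 2 raised to the 10th power

# Trigonometric functions work in radians;
# radians() and degrees() convert between the two units
angle = math.radians(90)
sine = math.sin(angle)

# cmath accepts arguments that math rejects
root = cmath.sqrt(-4)

print('log10(100) = %s' % log_value)
print('2 ** 10 = %s' % power)
print('sin(90 degrees) = %s' % sine)
print('sqrt(-4) = %s' % root)
```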

Example:

In [2]:
import math

import cmath

# Complex
for cpx in [3j, 1.5 + 1j, -2 - 2j]:

    # Polar coordinate conversion
    plr = cmath.polar(cpx)
    print 'Complex:', cpx
    print 'Polar:', plr, '(in radians)'
    print 'Amplitude:', abs(cpx)
    print 'Angle:', math.degrees(plr[1]), '(degrees)'
Complex: 3j
Polar: (3.0, 1.5707963267948966) (in radians)
Amplitude: 3.0
Angle: 90.0 (degrees)
Complex: (1.5+1j)
Polar: (1.8027756377319946, 0.5880026035475675) (in radians)
Amplitude: 1.80277563773
Angle: 33.690067526 (degrees)
Complex: (-2-2j)
Polar: (2.8284271247461903, -2.356194490192345) (in radians)
Amplitude: 2.82842712475
Angle: -135.0 (degrees)

The random module provides functions for random number generation.

Examples:

In [8]:
import random
import string

# Choose a letter
print random.choice(string.ascii_uppercase)

# Choose a number from 1 to 10
print random.randrange(1, 11)

# Choose a float from 0 to 1
print random.random()
B
2
0.117017323204
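
Because the generator is pseudo-random, the values above change on every run. The following is a minimal sketch of how a seed makes a sequence reproducible, together with two other functions from the same module, shuffle() and sample(). The seed value 42 is arbitrary.

```python
import random

# Seeding the generator makes the sequence reproducible
random.seed(42)
first = [random.randrange(1, 11) for i in range(5)]

random.seed(42)
second = [random.randrange(1, 11) for i in range(5)]

print('Same sequence: %s' % (first == second))

# shuffle() reorders a list in place
deck = list(range(10))
random.shuffle(deck)
print('Shuffled: %s' % deck)

# sample() picks distinct items without changing the list
picked = random.sample(deck, 3)
print('Sample: %s' % picked)
```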

The standard library also includes the decimal module, which defines operations on real numbers with fixed precision.

Example:

In [9]:
from decimal import Decimal

t = 5.
for i in range(50):
    t = t - 0.1

print 'Float:', t

t = Decimal('5.')
for i in range(50):
    t = t - Decimal('0.1')

print 'Decimal:', t
Float: 1.02695629778e-15
Decimal: 0.0

With this module, it is possible to reduce the introduction of rounding errors arising from floating point arithmetic.
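
The precision used in Decimal operations is held in the module's context. A minimal sketch, assuming a precision of 6 significant digits and a rounding to two decimal places chosen only for illustration:

```python
from decimal import Decimal, getcontext, ROUND_HALF_UP

# The context holds the precision (significant digits) used in operations
getcontext().prec = 6
third = Decimal(1) / Decimal(3)
print('1/3 with 6 digits: %s' % third)

# quantize() rounds to a fixed number of decimal places
price = Decimal('2.675').quantize(Decimal('0.01'), rounding=ROUND_HALF_UP)
print('Rounded: %s' % price)

# Building a Decimal from a string keeps the value exact
total = Decimal('0.1') + Decimal('0.2')
print('0.1 + 0.2 == 0.3: %s' % (total == Decimal('0.3')))
```

Creating the values from strings matters here: a float such as 0.1 already carries a binary rounding error before Decimal ever sees it.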

Since version 2.6, the fractions module, which deals with rational numbers, is also available.

Example:

In [10]:
from fractions import Fraction

# Three fractions
f1 = Fraction('-2/3')
f2 = Fraction(3, 4)
f3 = Fraction('.25')
print "Fraction('-2/3') =", f1
print "Fraction(3, 4) =", f2
print "Fraction('.25') =", f3

# Sum
print f1, '+', f2, '=', f1 + f2
print f2, '+', f3, '=', f2 + f3
Fraction('-2/3') = -2/3
Fraction(3, 4) = 3/4
Fraction('.25') = 1/4
-2/3 + 3/4 = 1/12
3/4 + 1/4 = 1

Fractions can be initialized in several ways: as a string, as a pair of integers, or as a real number. The module also has a function called gcd() which calculates the greatest common divisor (gcd) of two integers.
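
The initialization forms and gcd() can be sketched as follows. In Python 3.5 and later gcd() is provided by the math module, and recent releases no longer have it in fractions, so the import below tries both locations; that fallback is an addition made for this sketch.

```python
from fractions import Fraction

try:
    # Python 3.5 and later provide gcd() in the math module
    from math import gcd
except ImportError:
    # Python 2.6 and 2.7, as described in the text
    from fractions import gcd

from_string = Fraction('3/4')
from_pair = Fraction(6, 8)              # reduced automatically to 3/4
from_float = Fraction.from_float(0.75)  # exact, since 0.75 is a binary fraction

print('From string: %s' % from_string)
print('From pair: %s' % from_pair)
print('From float: %s' % from_float)

# Greatest common divisor of two integers
print('gcd(48, 18) = %s' % gcd(48, 18))
```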

Files and I/O

Files in Python are represented by objects of type *file*, which offer various methods for file operations. Files can be opened for reading ('r', which is the default), writing ('w'), or appending ('a'), in text or binary ('b') mode.
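
A minimal sketch of the modes described above. The file name modes.txt and the temporary directory are hypothetical, created only for the demonstration. The with statement, available since version 2.6, closes each file at the end of its block.

```python
import os
import tempfile

# Hypothetical file inside a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), 'modes.txt')

# 'w' creates the file, or truncates it if it already exists
with open(path, 'w') as f:
    f.write('first line\n')

# 'a' appends to the end, keeping the current content
with open(path, 'a') as f:
    f.write('second line\n')

# 'r' is the default mode
with open(path) as f:
    lines = f.readlines()

# 'rb' reads the raw bytes
with open(path, 'rb') as f:
    raw = f.read()

print(lines)
print('%d bytes' % len(raw))
```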

In Python:

  • sys.stdin is the standard input.
  • sys.stdout is the standard output.
  • sys.stderr is the standard error output.

Python handles the standard input, output and error as open files: the input in read mode and the other two in write mode.

Writing example:

In [3]:
import sys

# Create an object of type file
temp = open('temp.txt', 'w')

# Write output
for i in range(20):
    temp.write('%03d\n' % i)

temp.close()

temp = open('temp.txt')

# Write in terminal
for x in temp:
    # writing in sys.stdout sends
    # text to standard output
    sys.stdout.write(x)

temp.close()
000
001
002
003
004
005
006
007
008
009
010
011
012
013
014
015
016
017
018
019

At each iteration of the second loop, the file object returns one line from the file.

Reading example:

In [3]:
import sys
import os.path

fn = 'test.txt'

if not os.path.exists(fn):
    print 'Try again...'
    sys.exit()

# Numbering lines
for i, s in enumerate(open(fn)):
    print i + 1, s,
An exception has occurred, use %tb to see the full traceback.

SystemExit
Try again...
To exit: use 'exit', 'quit', or Ctrl-D.

It is possible to read all the lines with the method readlines():

In [4]:
# Prints a list with all the lines from a file
print open('temp.txt').readlines()
['000\n', '001\n', '002\n', '003\n', '004\n', '005\n', '006\n', '007\n', '008\n', '009\n', '010\n', '011\n', '012\n', '013\n', '014\n', '015\n', '016\n', '017\n', '018\n', '019\n']

Objects of type file also have the method seek(), which allows moving to any position in the file.
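
A minimal sketch of seek() and its companion tell(), which reports the current position. The file seek.txt is hypothetical and is opened in binary mode so that positions are plain byte offsets.

```python
import os
import tempfile

# Hypothetical file with ten known bytes
path = os.path.join(tempfile.mkdtemp(), 'seek.txt')
with open(path, 'wb') as f:
    f.write(b'0123456789')

with open(path, 'rb') as f:
    # Absolute position, counted from the start of the file
    f.seek(5)
    tail = f.read()

    # tell() returns the current position
    position = f.tell()

    # Position relative to the end of the file
    f.seek(-3, os.SEEK_END)
    last = f.read()

    # Back to the beginning
    f.seek(0)
    head = f.read(3)

print('tail=%r position=%d last=%r head=%r' % (tail, position, last, head))
```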

Since version 2.6, the io module is available, which implements file operations and text manipulation routines separately.
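
One way to see that separation is through the in-memory streams of the io module, sketched below: io.StringIO handles text and io.BytesIO handles raw bytes, both with the usual file methods. The u'' prefix keeps the text literals valid for io.StringIO on Python 2.

```python
import io

# In-memory text stream with the file interface
buf = io.StringIO()
buf.write(u'first\n')
buf.write(u'second\n')

# Rewind and read it back line by line
buf.seek(0)
lines = buf.readlines()

# In-memory binary stream
raw = io.BytesIO(b'\x00\x01\x02')
first_two = raw.read(2)

print(lines)
print('%r' % first_two)
```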

File Systems

Modern operating systems store files in hierarchical structures called file systems.

Several features related to file systems are implemented in the module os.path, such as:

  • os.path.basename(): returns the final component of a path.
  • os.path.dirname(): returns a path without the final component.
  • os.path.exists(): returns True if the path exists or False otherwise.
  • os.path.getsize(): returns the size of the file in bytes.
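
The four functions listed above can be sketched as follows. The path /home/user/docs/report.txt is hypothetical and is only split, never opened; a real temporary file is created to demonstrate exists() and getsize().

```python
import os
import os.path
import tempfile

# Hypothetical path, used only to show how it is split
path = '/home/user/docs/report.txt'
name = os.path.basename(path)
folder = os.path.dirname(path)
print('basename: %s' % name)
print('dirname: %s' % folder)

# A real file with five bytes, created for the demonstration
fd, real = tempfile.mkstemp()
os.write(fd, b'12345')
os.close(fd)

size = os.path.getsize(real)
before = os.path.exists(real)
os.remove(real)
after = os.path.exists(real)

print('size: %d' % size)
print('exists before/after removal: %s/%s' % (before, after))
```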

glob is another module related to the file system:

In [2]:
import os.path
import glob

# Shows a list of file names
# and their respective sizes 
for arq in sorted(glob.glob('*.py')):
    print arq, os.path.getsize(arq)

The glob.glob() function returns a list of filenames that match the pattern passed as a parameter, in a similar way to the ls command available on UNIX systems.
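
A minimal sketch of the wildcards that glob.glob() accepts: * for any sequence of characters, ? for a single character and [] for a set of characters. The file names are hypothetical and are created empty in a temporary directory.

```python
import glob
import os
import tempfile

# Hypothetical empty files in a fresh temporary directory
base = tempfile.mkdtemp()
for name in ('a.py', 'b.py', 'long_name.py',
             'notes.txt', 'data1.csv', 'data22.csv'):
    open(os.path.join(base, name), 'w').close()

def names(pattern):
    """Returns the sorted base names matching the pattern inside base"""
    found = glob.glob(os.path.join(base, pattern))
    return sorted(os.path.basename(p) for p in found)

all_py = names('*.py')           # any name ending in .py
one_char = names('?.py')         # exactly one character before .py
one_digit = names('data[0-9].csv')  # data followed by a single digit

print(all_py)
print(one_char)
print(one_digit)
```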

Temporary files

The module os implements some functions to facilitate the creation of temporary files, freeing the developer from some concerns, such as:

  • Avoiding collisions with names of files that are in use.
  • Identifying the appropriate area of the file system for temporary files (which varies by operating system).
  • Not exposing the implementation to risks (the temporary area is used by other processes).

Example:

In [4]:
import os

text = 'Test'
# creates a temporary file
temp = os.tmpfile()

# writes in the temp file
temp.write(text)

# Go back to the beginning of the file
temp.seek(0)

# Shows file content
print temp.read()

# Closes file
temp.close()
Test

There is also the tempnam() function, which returns a valid name for a temporary file, including a path that respects the conventions of the operating system. The function only returns the name and does not create the file, so it is up to the developer to ensure that the routine is used so as not to compromise the security of the application.
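
The standard library also ships the tempfile module, which addresses the same concerns. The sketch below is offered as an alternative, not as what the example above uses: TemporaryFile() returns a file that is removed when closed, and mkstemp() creates and opens a named file in a single step, so the name cannot be claimed by another process in between. The .txt suffix is arbitrary.

```python
import os
import tempfile

# Temporary file, removed automatically when closed
temp = tempfile.TemporaryFile()
temp.write(b'Test')
temp.seek(0)
content = temp.read()
temp.close()
print('%r' % content)

# Named temporary file: mkstemp() returns an open
# file descriptor and the path of the new file
fd, path = tempfile.mkstemp(suffix='.txt')
os.write(fd, b'Test')
os.close(fd)
created = os.path.exists(path)
print('Created: %s' % created)

# Files made with mkstemp() must be removed by the caller
os.remove(path)
```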

Compressed files

Python has modules to work with multiple formats of compressed files.

Example of writing a ".zip" file:

In [6]:
"""
Writing text in a compressed file
"""

import zipfile

text = """
**************************************
This text will be compressed and ...
... stored inside a zip file.
***************************************
"""

# Creates a new zip (named zf to avoid shadowing the builtin zip())
zf = zipfile.ZipFile('arq.zip', 'w',
    zipfile.ZIP_DEFLATED)

# Writes a string in zip as if it were a file
zf.writestr('text.txt', text)

# Closes the zip
zf.close()

Reading example:

In [7]:
"""
Reading a compressed file
"""

import zipfile

# Open the zip file for reading
# (named zf to avoid shadowing the builtin zip())
zf = zipfile.ZipFile('arq.zip')

# Gets a list of compressed files
arqs = zf.namelist()

for arq in arqs:
    # Shows the file name
    print 'File:', arq
    # Gets file info
    zipinfo = zf.getinfo(arq)
    print 'Original size:', zipinfo.file_size
    print 'Compressed size:', zipinfo.compress_size

    # Shows file content
    print zf.read(arq)
File: text.txt
Original size: 147
Compressed size: 75

**************************************
This text will be compressed and ...
... stored inside a zip file.
***************************************

Python also provides modules for gzip, bzip2 and tar formats that are widely used in UNIX environments.
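
As a minimal sketch of one of those modules, gzip.open() returns an object with the usual file methods that compresses on write and decompresses on read. The file name text.txt.gz and the sample data are hypothetical.

```python
import gzip
import os
import tempfile

# Hypothetical file inside a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), 'text.txt.gz')
data = b'This text will be compressed. ' * 20

# Writes the compressed data
out = gzip.open(path, 'wb')
out.write(data)
out.close()

# Reads it back, decompressing on the fly
inp = gzip.open(path, 'rb')
restored = inp.read()
inp.close()

print('Original size: %d' % len(data))
print('Compressed size: %d' % os.path.getsize(path))
print('Restored correctly: %s' % (restored == data))
```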

Data files

In the standard library, Python also provides a module to simplify the processing of files in CSV (Comma Separated Values) format.

In CSV format, the data is stored in text form, separated by commas, one record per line.

Writing example:

In [ ]:
import csv

# Data
dt = (('temperature', 15.0, 'C', '10:40', '2006-12-31'),
    ('weight', 42.5, 'kg', '10:45', '2006-12-31'))

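# A writing routine which receives one object of type file
# (opened in binary mode, as the csv module expects)
f = open('dt.csv', 'wb')
out = csv.writer(f)

# Writing the tuples in file
out.writerows(dt)

# Closing the file ensures the data is written to disk
f.close()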

Reading example:

In [9]:
import csv

# The reading routine receives a file object
dt = csv.reader(file('dt.csv'))

# For each record in file, prints
for reg in dt:
    print reg
['temperature', '15.0', 'C', '10:40', '2006-12-31']
['weight', '42.5', 'kg', '10:45', '2006-12-31']

The CSV format is supported by most spreadsheets and databases for data import and export.
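
Fields read from a CSV file always arrive as strings, as the output above shows, and not every file uses the comma as separator. A minimal sketch of both points, assuming hypothetical data separated by semicolons; csv.reader() accepts any iterable of lines, so a plain list stands in for the file here.

```python
import csv

# Hypothetical data: a header line followed by two records
lines = ['name;value;unit',
         'temperature;15.0;C',
         'weight;42.5;kg']

# The delimiter parameter changes the field separator
rows = list(csv.reader(lines, delimiter=';'))
print(rows[1])

# DictReader uses the first line as field names
records = list(csv.DictReader(lines, delimiter=';'))

# Values are strings and must be converted explicitly
values = [float(rec['value']) for rec in records]
print(values)
```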

Operating System

Apart from the file system, the modules of the standard library also provide access to other services offered by the operating system.

Example:

In [10]:
import os
import sys
import platform

def uid():
    """
    uid() -> returns the current user identification
    or None if not possible to identify
    """

    # Environment variables for each operating system
    us = {'Windows': 'USERNAME',
        'Linux': 'USER'}

    u = us.get(platform.system())
    return os.environ.get(u)

print 'User:', uid()
print 'Platform:', platform.platform()
print 'Current dir:', os.path.abspath(os.curdir)
exep, exef = os.path.split(sys.executable)
print 'Executable:', exef
print 'Executable dir:', exep
User: csig
Platform: Linux-3.11.0-15-generic-x86_64-with-Ubuntu-13.10-saucy
Current dir: /home/csig/teste/python-for-developers/Chapter11
Executable: python
Executable dir: /home/csig/env/teste/bin

Process execution example:

In [ ]:
import sys
from subprocess import Popen, PIPE

# ping
cmd = 'ping -c 1 '
# On Windows
if sys.platform == 'win32':
    cmd = 'ping -n 1 '

# Local just for testing
host = '127.0.0.1'

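# Communicates with another process through
# a pipe connected to the command's stdout.
# The command is split into a list of arguments because,
# on UNIX, a single string is taken as the program name
py = Popen((cmd + host).split(), stdout=PIPE)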

# Shows command output
print py.stdout.read()

The subprocess module provides a generic way of running processes through the Popen() function, which allows communication with the process through operating system pipes.
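
A minimal sketch of the same mechanism using communicate(), which reads the pipes until the process ends and returns the collected output. To avoid depending on an external program, the child process here is the Python interpreter itself, running a one-line script written for this example.

```python
import sys
from subprocess import Popen, PIPE

# The command as a list: the program first, then one item per argument
cmd = [sys.executable, '-c', 'import sys; sys.stdout.write("hello")']

proc = Popen(cmd, stdout=PIPE, stderr=PIPE)

# communicate() waits for the process to finish and
# returns what it wrote to stdout and stderr
out, err = proc.communicate()

print('Exit code: %d' % proc.returncode)
print('Output: %r' % out)
```

Passing the command as a list avoids quoting problems, since each item reaches the program as a separate argument.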

Time

Python has two modules to handle time:

  • time: implements functions that allow using the time generated by the system.
  • datetime: implements high-level types to perform date and time operations.

Example with time:

In [12]:
import time

# localtime() returns the local date and time in the form
# of a structure called struct_time, which is a
# collection with the items: year, month, day, hour, minute,
# second, day of the week, day of the year and daylight saving time flag
print time.localtime()

# asctime() returns the date and time as a string
# in a fixed format
print time.asctime()

# time() returns system time in seconds since the epoch
ts1 = time.time()

# gmtime() converts seconds to struct_time
tt1 = time.gmtime(ts1)
print ts1, '->', tt1

# Adding an hour
tt2 = time.gmtime(ts1 + 3600.)

# mktime() converts struct_time to seconds, reading it as local time;
# tt2 is in UTC, so ts2 is off by the local UTC offset
ts2 = time.mktime(tt2)
print ts2, '->', tt2

# clock() returns time since the program started, in seconds
print 'The program took', time.clock(), \
    'seconds up to now...'

# Counting seconds...
for i in xrange(5):

    # sleep() waits the number of seconds specified as parameter
    time.sleep(1)
    print i + 1, 'second(s)'
time.struct_time(tm_year=2014, tm_mon=2, tm_mday=4, tm_hour=13, tm_min=51, tm_sec=28, tm_wday=1, tm_yday=35, tm_isdst=1)
Tue Feb  4 13:51:28 2014
1391529088.79 -> time.struct_time(tm_year=2014, tm_mon=2, tm_mday=4, tm_hour=15, tm_min=51, tm_sec=28, tm_wday=1, tm_yday=35, tm_isdst=0)
1391543488.0 -> time.struct_time(tm_year=2014, tm_mon=2, tm_mday=4, tm_hour=16, tm_min=51, tm_sec=28, tm_wday=1, tm_yday=35, tm_isdst=0)
The program took 1.59 seconds up to now...
1 second(s)
2 second(s)
3 second(s)
4 second(s)
5 second(s)
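
mktime() treats its argument as local time, so applying it to the result of gmtime() shifts the value by the local UTC offset, which is why the two timestamps printed above differ by more than one hour. The sketch below shows round trips that return the original value, using calendar.timegm() (from the calendar module, which the text does not cover) as the inverse of gmtime(), and formats the result with strftime(). The timestamp is arbitrary.

```python
import calendar
import time

# Arbitrary timestamp, in seconds since the epoch
ts = 1391529088

# UTC round trip: gmtime() and calendar.timegm() are inverses
utc = time.gmtime(ts)
back_utc = calendar.timegm(utc)

# Local round trip: localtime() and mktime() are inverses
local = time.localtime(ts)
back_local = int(time.mktime(local))

# strftime() formats a struct_time according to a pattern
text = time.strftime('%Y-%m-%d %H:%M:%S', utc)

print('UTC: %s' % text)
print('Round trips: %s %s' % (back_utc == ts, back_local == ts))
```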

In datetime, four types are defined for representing time:

  • datetime: date and time.
  • date: just date.
  • time: just time.
  • timedelta: time difference.

Example:

In [14]:
import datetime

# datetime() receives as parameter:
# year, month, day, hour, minute, second and 
# returns an object of type datetime
dt = datetime.datetime(2020, 12, 31, 23, 59, 59)

# Objects date and time can be created from
# a datetime object
date = dt.date()
hour = dt.time()

# How much time until 12/31/2020
dd = dt - dt.today()

print 'Date:', date
print 'Hour:', hour
print 'Time until 12/31/2020:', dd
Date: 2020-12-31
Hour: 23:59:59
Time until 12/31/2020: 2522 days, 10:02:22.939467

Objects of types date and datetime return dates in ISO format.
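
A minimal sketch of arithmetic with timedelta and of converting between datetime objects and strings. The dates and the format patterns are chosen only for illustration.

```python
import datetime

start = datetime.datetime(2020, 12, 31, 23, 59, 59)

# Adding a timedelta produces a new datetime
new_year = start + datetime.timedelta(seconds=1)

# isoformat() returns the ISO representation
iso = new_year.isoformat()

# strftime() formats according to a pattern
text = new_year.strftime('%d/%m/%Y %H:%M')

# strptime() parses a string according to a pattern
parsed = datetime.datetime.strptime('2020-12-31', '%Y-%m-%d')

# Subtracting two datetime objects produces a timedelta
diff = new_year - parsed

print('ISO: %s' % iso)
print('Formatted: %s' % text)
print('Difference: %s' % diff)
```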

Regular expressions

A regular expression is a way of identifying patterns in character strings. In Python, the re module provides a syntactic parser that allows the use of such expressions. The patterns are defined by characters that have special meaning to the parser.

Main characters:

  • Dot (.): In standard mode, means any character except the newline.
  • Circumflex (^): In standard mode, means beginning of the string.
  • Dollar ($): In standard mode, means end of the string.
  • Backslash (\): Escape character, allows using special chars as normal chars.
  • Brackets ([]): Any one of the characters listed inside the brackets.
  • Asterisk (*): Zero or more occurrences of previous expression.
  • Plus sign (+): One or more occurrences of previous expression.
  • Question mark (?): Zero or one occurrence of previous expression.
  • Braces ({n}): n occurrences of previous expression.
  • Vertical bar (|): logical “or”.
  • Parentheses (()): Delimit a group of expressions.
  • \d: Digit. Same as [0-9].
  • \D: Non digit. Same as [^0-9].
  • \s: Any whitespace character ([ \t\n\r\f\v]).
  • \S: Any non-whitespace character ([^ \t\n\r\f\v]).
  • \w: Alphanumeric character or underscore ([a-zA-Z0-9_]).
  • \W: Not an alphanumeric character or underscore ([^a-zA-Z0-9_]).
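
Several of the characters listed above can be sketched in a few small patterns. The sample strings are hypothetical, and the patterns are written as raw strings (r'...') so that the backslashes reach the re module unchanged.

```python
import re

# ^, $, \d and braces: a date in the form YYYY-MM-DD
date = re.compile(r'^\d{4}-\d{2}-\d{2}$')
is_date = bool(date.match('2006-12-31'))
not_date = bool(date.match('31/12/2006'))

# Brackets, plus sign and parentheses: groups capture parts of the match
m = re.search(r'([a-z]+)@([a-z.]+)', 'contact: user@example.org')
user, domain = m.groups()

# Vertical bar and question mark: alternatives and optional characters
colors = re.findall(r'colou?r|gr[ae]y', 'color colour gray grey')

# \s with plus sign: split on any run of whitespace
words = re.split(r'\s+', 'one  two\tthree')

print('%s %s' % (is_date, not_date))
print('%s at %s' % (user, domain))
print(colors)
print(words)
```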

Examples:

In [22]:
import re

# Compile the regular expression using compile()
# the compiled regular expression is stored and 
# can be reused
rex = re.compile('\w+')

# Finds the occurrences according to the expression
bands = 'Yes, Genesis & Camel'
print bands, '->', rex.findall(bands)

# Identify occurrences of Björk (and its variations)
bjork = re.compile('[Bb]j[öo]rk')
for m in ('Björk', 'björk', 'Biork', 'Bjork', 'bjork'):

    # match() finds occurrences at the beginning of the string
    # to find at any part of the string, use search()
    print m, '->', bool(bjork.match(m))

# Replacing text
text = 'The next track is Stairway to Heaven'
print text, '->', re.sub('[Ss]tairway [Tt]o [Hh]eaven',
    'The Rover', text)

# Splitting text
bands = 'Tool, Porcupine Tree and NIN'
print bands, '->', re.split(',?\s+and?\s+', bands)
Yes, Genesis & Camel -> ['Yes', 'Genesis', 'Camel']
Björk -> False
björk -> False
Biork -> False
Bjork -> True
bjork -> True
The next track is Stairway to Heaven -> The next track is The Rover
Tool, Porcupine Tree and NIN -> ['Tool, Porcupine Tree', 'NIN']

The behaviour of the functions of this module can be changed by options, to treat strings as unicode, for instance.
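
In the output above, 'Björk' and 'björk' do not match, most likely because a Python 2 byte string in UTF-8 stores 'ö' as more than one byte, so the class [öo] never sees it as a single character. A minimal sketch of the same test with unicode strings and the re.UNICODE option, plus re.IGNORECASE as a second example of an option; the u'' prefix is redundant on Python 3, where strings are already unicode.

```python
# -*- coding: utf-8 -*-
import re

# With unicode strings, 'ö' is a single character
bjork = re.compile(u'[Bb]j[öo]rk', re.UNICODE)
names = (u'Björk', u'björk', u'Biork', u'Bjork', u'bjork')
results = [bool(bjork.match(name)) for name in names]
print(results)

# Options are combined with the | operator;
# IGNORECASE makes the match case insensitive
any_case = re.compile(u'björk', re.IGNORECASE | re.UNICODE)
upper = bool(any_case.match(u'BJÖRK'))
print(upper)
```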
