Python Getting Started

In [1]:
import addutils.toc ; addutils.toc.js(ipy_notebook=True)
Out[1]:

After learning the basic Python concepts, there are still some skills to learn to start working effectively.

In this Notebook we will see how to manage functions and how to work with imports, namespaces and packages. Then we will see how to read and write external data and how to manage the external environment. Since most of our customers work on Windows-based systems, this notebook is mainly oriented to this specific OS. Nevertheless, many of the concepts you will find here also apply to Mac or Linux systems.

In [2]:
from addutils import css_notebook
css_notebook()
Out[2]:
In [3]:
pwd
Out[3]:
'/home/matteo/Projects/tutorials/python-ipython'

1 Comment your code!

If there is a single thing you have to keep in mind, this is it: comment your code! This is particularly important when you start to write structured code involving classes and functions, and when you start to collaborate with someone else. As you see in the following code, there are two types of comments in Python:

# Single-line comments start with the # sign
    '''
    Multi-line comments can be written as a string literal
    enclosed in three consecutive single (or double) quotes.
    '''

But remember:

  • Comments must be used just to explain what can't be understood by reading the code
  • Bad, obvious and out-of-date comments are much worse than no comments
  • First write clear code, then add comments to explain what isn't obvious
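
A minimal sketch of the points above, using a hypothetical function `rectangle_area` that is not part of this notebook's modules: the docstring documents what the function does, while the inline comment explains only a choice that cannot be read from the code itself.

```python
def rectangle_area(width, height):
    """Return the area of a rectangle with the given sides."""
    # Negative sides are treated as a caller error instead of being
    # silently fixed with abs(): this intent is not visible in the code.
    if width < 0 or height < 0:
        raise ValueError('sides must be non-negative')
    return width * height

print(rectangle_area(3, 4))
print(rectangle_area.__doc__)   # the docstring is available at runtime
```

A triple-quoted string placed as the first statement of a function becomes its docstring, which is what `help()` displays.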

2 Functions

2.1 Local Functions

Local functions are used to avoid code repetition and to keep your code tidy. Have a look at the code in the next cell and notice the following things:

  1. Always use intelligible names for variables. In this case we used spacing_string
  2. Function arguments can have default values. This means that you don't have to pass all the values every time you call a function: you specify just the values you need. Notice: mandatory (non-default) arguments always come first in the function definition
  3. Function arguments are named: you can specify the arguments in any order when calling the function
  4. Functions can be defined anywhere in the code (better at the beginning of the code), and always before being called
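  5. Arguments are passed by object reference: a function can modify a mutable argument in place, but rebinding the parameter name inside the function does not affect the caller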
In [4]:
def local(spacing_string, n=5):
    '''Print "spacing_string" repeated n times
       "spacing_string" must be provided
       "n" can be omitted and gets the default value'''
    # Variables defined inside functions are local
    print(spacing_string*n)

local('-')     # n = 5 (default value)
local('*',n=9) # n = 9 (named argument)
-----
*********
In [5]:
# Since you wrote a nice description for your function
# you can invoke help with help(local) or alternatively with local?
local?
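
Point 5 of the list above (how arguments are passed) deserves a closer look. The following sketch uses two hypothetical helper functions, `append_item` and `rebind`, to show that a function receives a reference to the caller's object: changing a mutable object in place is visible outside, while assigning a new object to the parameter name is not.

```python
def append_item(items):
    # Mutating the received object is visible to the caller
    items.append(99)

def rebind(items):
    # Rebinding the local name does not touch the caller's variable
    items = [0, 0, 0]

values = [1, 2, 3]
append_item(values)
print(values)   # [1, 2, 3, 99]
rebind(values)
print(values)   # still [1, 2, 3, 99]
```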

2.2 External Functions

External functions are saved in external files. As an example you will find in this folder a file named my_module.py. This is the code:

In [6]:
import addutils.my_module as my
%pfile my.my_function
# Check the code below ↓

my_function(name) accepts a tuple made of two strings and calls _my_private_function. Functions whose names begin with '_' are meant to be private and should not be called from outside the module. Let's try a call:

In [7]:
import addutils.my_module as my
print(my.my_function(('rick', 'bayes')))
rick [BAYES]

Try the following commands yourself:

my.MODULE_CONSTANT
my.module_variable
my.my_function?
my?
In [8]:
my?

Testing your code with if __name__ == '__main__': To write reliable code, continuous testing is one of the most important habits. Python offers an easy way to test your code every time you modify your functions. When the check __name__ == '__main__' is True, the module is being run as a script (for example from the command line) rather than imported. You can use this check to host your unit-testing code:

if __name__ == '__main__':
    # This is a Unit Test: use "%run -m addutils.my_module" from IPython
    print('This is the testing code:')
    print(my_function(('John', 'Doe')))

Try to call your module from the command line:

In [9]:
%run -m addutils.my_module
This is the testing code:
Johnn [DOE]
/home/matteo/anaconda3/envs/addfor_tutorials/lib/python3.6/runpy.py:125: RuntimeWarning: 'addutils.my_module' found in sys.modules after import of package 'addutils', but prior to execution of 'addutils.my_module'; this may result in unpredictable behaviour
  warn(RuntimeWarning(msg))

2.3 Private Methods

Methods whose names start with '_' are meant to be private: this means you aren't supposed to access them from outside. This is an example:

def _my_private_function(first_name, second_name):
    ...
    return full_name

If you "import my_module as my" and try to type my.[TAB] you'll see just the public methods and variables.

Actually, Python allows you to call private methods anyway, but we advise you to do so only once you are much more proficient with the language. Try it if you want:

my._my_private_function('John', 'McEnroe')
In [10]:
#my._my_private_function('John', 'McEnroe')
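
The underscore rule is only a naming convention. The sketch below builds a throwaway module object named `demo` (hypothetical, created on the fly so the example does not depend on my_module.py) and shows both how to list the public names and that the private function remains callable:

```python
import types

# Build a throwaway module object (hypothetical, for illustration only)
demo = types.ModuleType('demo')
demo.public_function = lambda: 'public'
demo._private_function = lambda: 'private'

# Names without a leading underscore form the public interface
public_names = [n for n in dir(demo) if not n.startswith('_')]
print(public_names)

# The underscore is only a convention: the call still works
print(demo._private_function())
```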

Try some more examples yourself:

# Explore other private methods with: my._ + TAB
import numpy as np
print(my.__doc__)
print(my.my_function.__doc__)
name = ('Graham', 'Chapman')
print(my.my_function(name))
my?               # Module documentation: OBJECT INTROSPECTION
my??              # will also show the function's source code if possible
np.*load*?        # ? can be also used to search the namespace
my.my_function?   # Module function documentation
help(my)          # Module Help: notice private functions not listed
In [11]:
#help(my)

2.4 Anonymous Functions (lambda functions)

Python supports the creation of anonymous functions (i.e. functions that are not bound to a name) at runtime, using a construct called "lambda".

This piece of code shows the difference between a normal function definition f and a lambda function g:

In [12]:
def f(x):
    return x**2

g = lambda x: x**2
print(f(4), g(4))
16 16

Note that the lambda definition does not include a 'return' statement (it always contains an expression which is returned).

Also note that you can put a lambda definition anywhere a function is expected, and you don't have to assign it to a variable at all.
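
For instance, a lambda can be passed directly as the `key` argument of the built-in `sorted`, without ever being given a name. This is a small sketch with made-up data:

```python
words = ['banana', 'Cherry', 'apple']

# Case-insensitive alphabetical order
by_alphabet = sorted(words, key=lambda w: w.lower())
print(by_alphabet)      # ['apple', 'banana', 'Cherry']

# Order by the last character of each word
by_last_char = sorted(words, key=lambda w: w[-1])
print(by_last_char)     # ['banana', 'apple', 'Cherry']
```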

Check the following code to get an idea of the typical usage of lambda functions: here we sanitize a list of strings by 'mapping' a function over it:

In [13]:
import re
states = [' Alabama ', 'Georgia!', '  ## Georgia', ' ? georgia', 'FlOrIda']
# Remove the punctuation first, then strip the leftover whitespace
clean = lambda s: re.sub('[!#?]', '', s).strip().title()
for c in map(clean, states):
    print(c)
Alabama
Georgia
Georgia
Georgia
Florida

3 File I/O

3.1 Simple I/O

In Python it is very easy to work with files. Try this self-explanatory code yourself:

In [14]:
import os.path
path = os.path.join(os.path.curdir, "example_data", "my_input.txt")
ifile = open(path, 'r')
for line in ifile:       # ifile is an iterator over the lines
    print(line, end='')  # end='' avoids doubling the newline already in each line
ifile.close()
First Second
10     0.32432
20  1.324
21 7.237923
36 .83298932
56        237.327823
In [15]:
# Read all the lines in a list
ifile = open(path, 'r')
lines = ifile.readlines()
print(lines)
ifile.close()
['First Second\n', '10     0.32432\n', '20  1.324\n', '21 7.237923\n', '36 .83298932\n', '56        237.327823\n']

Read a file, format and write back

In [16]:
ifile = open(path)          # 'read mode' is the default
path_2 = os.path.join(os.path.curdir, "tmp", "my_input2.txt")
ofile = open(path_2, 'w')   # Open the output file in 'write mode'
for line in ifile:          # Read ONE line at a time
    s = line.split()
    try:
        ofile.write('{:g} {:14.3e}\n'.format(float(s[0]), float(s[1])))
        print('{:g} {:14.3e}\n'.format(float(s[0]), float(s[1])), end='')
    except ValueError:      # The header line cannot be converted to float
        ofile.write('{0} {1}\n'.format(s[0], s[1]))
        print('{} {}\n'.format(s[0], s[1]))
# Notice: 'print' automatically adds a newline at the end of the string,
# so the header line is followed by an empty line in the output

ifile.close()
ofile.close()
First Second

10      3.243e-01
20      1.324e+00
21      7.238e+00
36      8.330e-01
56      2.373e+02

Whenever possible, use the "with" syntax: it closes the file automatically, even when an exception prevents the program flow from reaching the 'close' statements. This is also considered a more "pythonic" style.

In [17]:
with open(path) as fid:
    for line in fid:
        s = line.split()
        try:
            print('{:g} {:14.3e}\n'.format(float(s[0]), float(s[1])), end='')
        except ValueError:   # The header line is not numeric
            print('{} {}\n'.format(s[0], s[1]))
First Second

10      3.243e-01
20      1.324e+00
21      7.238e+00
36      8.330e-01
56      2.373e+02
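
To check that "with" really closes the file when something goes wrong, the sketch below writes a small temporary file (its content is made up for this example) and deliberately triggers a ValueError while reading it:

```python
import os
import tempfile

# Create a small temporary file to read from (hypothetical content)
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('10 0.5\nnot-a-number\n')
    tmp_path = tmp.name

try:
    with open(tmp_path) as fid:
        for line in fid:
            float(line.split()[0])   # raises ValueError on the second line
except ValueError as err:
    print('conversion failed:', err)

print(fid.closed)   # True: the file was closed despite the exception
os.remove(tmp_path)
```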

3.2 Pickle / cPickle

This is the most common way to serialize most types of Python objects and save them to disk. Keep in mind that if you need to save and share complex, structured data, pickle is not the preferred method: consider using a dedicated file format such as HDF5 instead.

A Python pickle file is (and always has been) a byte stream. Which means that you should always open a pickle file in binary mode: “wb” to write it, and “rb” to read it. The Python docs contain correct example code.

See also Programming Python for absolute beginners: Chapter 7 Storing Complex Data on stackoverflow.

In [18]:
import pickle                                # in Python 3 cPickle doesn't exist anymore
ls = ['one', 'two', 'three']
with open('tmp/out_ascii.pkl', 'wb') as f:   # Can choose an arbitrary extension
    pickle.dump(ls, f, 0)                    # dump with protocol '0': readable ASCII
with open('tmp/out_compb.pkl', 'wb') as f:   # Can choose an arbitrary extension
    pickle.dump(ls, f, 2)                    # dump with protocol '2': compact binary format

with open('tmp/out_compb.pkl', 'rb') as f:
    d2 = pickle.load(f)                      # Protocol is automatically detected
print(d2)
['one', 'two', 'three']
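
Pickle also works in memory, without touching the disk. The sketch below round-trips a made-up dictionary through `pickle.dumps` and `pickle.loads`; note that data should be unpickled only when it comes from a trusted source, because loading a pickle can execute arbitrary code.

```python
import pickle

# Hypothetical record mixing several built-in types
record = {'name': 'example', 'values': [1, 2.5, 'three'], 'nested': (True, None)}

# dumps/loads work on bytes in memory instead of files
blob = pickle.dumps(record, protocol=pickle.HIGHEST_PROTOCOL)
restored = pickle.loads(blob)

print(type(blob))            # <class 'bytes'>
print(restored == record)    # True
```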

4 Operating System

4.1 General Info

Python gives you extensive access to your PC's operating system. There are three modules in the Python Standard Library that you must be aware of: sys, os and glob.

4.2 sys — System-specific parameters and functions

Try some example commands by running the following cells

In [19]:
import sys 
# Platform identifier
sys.platform
Out[19]:
'linux'
In [20]:
# Version number of the Python interpreter
sys.version
Out[20]:
'3.6.4 |Anaconda, Inc.| (default, Jan 16 2018, 18:10:19) \n[GCC 7.2.0]'
In [21]:
# sys.path: folders searched when importing modules (it includes the PYTHONPATH entries)
for p in sys.path:
    print(p.strip())
/home/matteo/anaconda3/envs/addfor_tutorials/lib/python36.zip
/home/matteo/anaconda3/envs/addfor_tutorials/lib/python3.6
/home/matteo/anaconda3/envs/addfor_tutorials/lib/python3.6/lib-dynload
/home/matteo/anaconda3/envs/addfor_tutorials/lib/python3.6/site-packages
/home/matteo/anaconda3/envs/addfor_tutorials/lib/python3.6/site-packages/IPython/extensions
/home/matteo/.ipython
In [22]:
# Shows where the Python files are installed
print(sys.exec_prefix)
/home/matteo/anaconda3/envs/addfor_tutorials
In [23]:
# Information about the float DataType
sys.float_info
Out[23]:
sys.float_info(max=1.7976931348623157e+308, max_exp=1024, max_10_exp=308, min=2.2250738585072014e-308, min_exp=-1021, min_10_exp=-307, dig=15, mant_dig=53, epsilon=2.220446049250313e-16, radix=2, rounds=1)
In [24]:
# In Python 2.x the largest plain integer was sys.maxint.
# In Python 3 integers have arbitrary precision and are limited
# only by the available memory. Example:

2**4000
Out[24]:
13182040934309431001038897942365913631840191610932727690928034502417569281128344551079752123172122033140940756480716823038446817694240581281731062452512184038544674444386888956328970642771993930036586552924249514488832183389415832375620009284922608946111038578754077913265440918583125586050431647284603636490823850007826811672468900210689104488089485347192152708820119765006125944858397761874669301278745233504796586994514054435217053803732703240283400815926169348364799472716094576894007243168662568886603065832486830606125017643356469732407252874567217733694824236675323341755681839221954693820456072020253884371226826844858636194212875139566587445390068014747975813971748114770439248826688667129237954128555841874460665729630492658600179338272579110020881228767361200603478973120168893997574353727653998969223092798255701666067972698906236921628764772837915526086464389161570534616956703744840502975279094087587298968423516531626090898389351449020056851221079048966718878943309232071978575639877208621237040940126912767610658141079378758043403611425454744180577150855204937163460902512732551260539639221457005977247266676344018155647509515396711351487546062479444592779055555421362722504575706910949376
In [25]:
# Maximum length that containers such as lists, strings and dicts can have
sys.maxsize
Out[25]:
9223372036854775807
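
The relation between Python 3 integers and sys.maxsize can be checked directly. This is a short sketch; the last line assumes a typical 64-bit build, so no fixed result is claimed for it:

```python
import sys

big = 2**4000
print(big > sys.maxsize)    # True: ints are not capped by sys.maxsize
print(big.bit_length())     # 4001 bits are needed to represent 2**4000

# On a typical 64-bit build sys.maxsize equals 2**63 - 1
print(sys.maxsize == 2**63 - 1)
```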

4.3 os — Miscellaneous operating system interfaces

Try some example commands by running the following cells

In [26]:
import os
for counter, osvariable in enumerate(os.environ):
    if counter >= 10:
        print('AND MORE ...')
        break
    print('{:>25s}: {:s}'.format(osvariable,os.environ[osvariable][:64]))
else:
    print('============ No more OS Variables ============')
                 XDG_VTNR: 7
                 LC_PAPER: it_IT.UTF-8
               LC_ADDRESS: it_IT.UTF-8
           XDG_SESSION_ID: c2
     XDG_GREETER_DATA_DIR: /var/lib/lightdm-data/matteo
              LC_MONETARY: it_IT.UTF-8
        CLUTTER_IM_MODULE: xim
                  SESSION: ubuntu
           GPG_AGENT_INFO: /home/matteo/.gnupg/S.gpg-agent:0:1
                     TERM: xterm-color
AND MORE ...
In [27]:
# How to check a system variable:
if 'NUMBER_OF_PROCESSORS' in os.environ:
    print('Number of processors in this machine:', os.environ['NUMBER_OF_PROCESSORS'])
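
NUMBER_OF_PROCESSORS is normally defined only on Windows, which is why the cell above prints nothing on the Linux machine used here. A portable sketch is to read the variable with a default value and to ask Python itself for the CPU count:

```python
import os

# os.environ.get returns a default instead of raising KeyError
n_env = os.environ.get('NUMBER_OF_PROCESSORS', 'not defined')
print('NUMBER_OF_PROCESSORS:', n_env)

# os.cpu_count() is available on every platform (it may return None)
n_cpu = os.cpu_count()
print('os.cpu_count():', n_cpu)
```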
In [28]:
# Working directory
print(os.getcwd())
/home/matteo/Projects/tutorials/python-ipython
In [29]:
# List the files in the current directory
for filename in sorted(os.listdir(os.getcwd())):
    print(filename)
.ipynb_checkpoints
example_data
index.ipynb
py01v04_ipython_notebook_introduction.ipynb
py02v04_python_basics.ipynb
py03v04_python_getting_started.ipynb
py04v04_python_style_guide.ipynb
py05v04_python_more_examples.ipynb
py06v04_python_object_oriented.ipynb
py07v04_python_speed-up_with_C.ipynb
py08v04_Unicode.ipynb
py09v04_python_regular_expressions.ipynb
py10v04_ipython_notebook_widgets.ipynb
tmp
utilities

Difference between os.name and sys.platform:

  • sys.platform will distinguish between linux, other unixes, and OS X
  • os.name is "posix" for all of them
In [30]:
os.name
Out[30]:
'posix'
In [31]:
# Correctly handle paths, and filenames
# In Python 3, os.name is 'posix', 'nt' or 'java'
if os.name == 'posix':
    full_path = "/Users/dani/myfile.py"
elif os.name == 'nt':
    full_path = 'C:\\myfile.py'

print(os.path.splitdrive(full_path))
print(os.path.split(full_path))
('', '/Users/dani/myfile.py')
('/Users/dani', 'myfile.py')
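
os.path applies the rules of the OS you are running on. To see how the same functions behave under both conventions from a single machine, this sketch calls the standard library modules `posixpath` and `ntpath` directly (they are the implementations behind os.path on POSIX and Windows respectively):

```python
import ntpath      # Windows path rules, importable on any OS
import posixpath   # POSIX path rules, importable on any OS

posix_parts = posixpath.split('/Users/dani/myfile.py')
print(posix_parts)                  # ('/Users/dani', 'myfile.py')

nt_drive = ntpath.splitdrive('C:\\myfile.py')
print(nt_drive)                     # ('C:', '\\myfile.py')

nt_parts = ntpath.split('C:\\myfile.py')
print(nt_parts)                     # ('C:\\', 'myfile.py')

name_ext = posixpath.splitext('myfile.py')
print(name_ext)                     # ('myfile', '.py')
```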
In [32]:
if os.name == 'posix':
    os.system('ls')
else:
    os.system('dir')
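
os.system only returns the exit status of the command. When the output is needed inside Python, the standard `subprocess` module is a common alternative. The sketch below launches a child Python interpreter (chosen only so that the example behaves the same on every OS) and captures what it prints:

```python
import subprocess
import sys

# Run a child Python process and capture its output as text
result = subprocess.run(
    [sys.executable, '-c', 'print("hello from a subprocess")'],
    stdout=subprocess.PIPE,
    universal_newlines=True,   # decode bytes to str (also works on Python 3.6)
)
print(result.returncode)
print(result.stdout.strip())
```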

4.4 glob — Unix style pathname pattern expansion

Try some example commands by running the following cells

In [33]:
import glob
print(glob.glob('*.txt'))
[]
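
The empty list above simply means that the working directory contains no .txt files. To see a non-empty result, this sketch creates a few empty files with made-up names in a temporary folder and searches it:

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Create three empty files (hypothetical names)
    for name in ('a.txt', 'b.txt', 'c.csv'):
        open(os.path.join(folder, name), 'w').close()
    # glob returns paths in arbitrary order, so sort them
    matches = sorted(glob.glob(os.path.join(folder, '*.txt')))
    names = [os.path.basename(m) for m in matches]

print(names)   # ['a.txt', 'b.txt']
```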

Visit www.add-for.com for more tutorials and updates.

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.