Tutorial on Python for scientific computing

Marcos Duarte

This tutorial is a short introduction to programming and a demonstration of the basic features of Python for scientific computing. To use Python for scientific computing we need the Python program itself with its main modules and specific packages for scientific computing. See this notebook on how to install Python for scientific computing.
Once you get Python and the necessary packages for scientific computing ready to work, there are different ways to run Python, the main ones are:

  • open a terminal window in your computer and type python or ipython that the Python interpreter will start
  • run the IPython notebook and start working with Python in a browser
  • run Spyder, an interactive development environment (IDE)
  • run the IPython qtconsole, a more featured terminal
  • run IPython completely in the cloud with for example, https://cloud.sagemath.com or https://www.wakari.io
  • run Python online in a website such as https://www.pythonanywhere.com/
  • run Python using any other Python editor or IDE

We will use the IPython Notebook for this tutorial but you can run almost all the things we will see here using the other forms listed above.

Python as a calculator

Once in the IPython notebook, if you type a simple mathematical expression and press Shift+Enter it will give the result of the expression:

In [1]:
1 + 2 - 30
Out[1]:
-27
In [2]:
4/5
Out[2]:
0.8

If you are using Python version 2.x instead of Python 3.x, you should have got 0 as the result of 4 divided by 5, which is wrong! The problem is that for Python versions up to 2.x, the operator '/' performs division with integers and the result will also be an integer (this behavior was changed in version 3.x).

If you want the normal behavior for division, in Python 2.x you have two options: tell Python that at least one of the numbers is not an integer or import the new division operator (which is inoffensive if you are already using Python 3), let's see these two options:

In [3]:
4/5.
Out[3]:
0.8
In [4]:
from __future__ import division
In [5]:
4/5
Out[5]:
0.8

I prefer to use the import division option (from future!); if we put this statement in the beginning of a file or IPython notebook, it will work for all subsequent commands.

Another command that changed its behavior from Python 2.x to 3.x is the print command.
In Python 2.x, the print command could be used as a statement:

In [6]:
print 4/5
  File "<ipython-input-6-cdea43a24122>", line 1
    print 4/5
          ^
SyntaxError: invalid syntax

With Python 3.x, the print command bahaves as a true function and has to be called with parentheses. Let's also import this future command to Python 2.x and use it from now on:

In [7]:
from __future__ import print_function
In [8]:
print(4/5)
0.8

With the print function, let's explore the mathematical operations available in Python:

In [9]:
print('1+2 = ', 1+2, '\n', '4*5 = ', 4*5, '\n', '6/7 = ', 6/7, '\n', '8**2 = ', 8**2, sep='')
1+2 = 3
4*5 = 20
6/7 = 0.8571428571428571
8**2 = 64

And if we want the square-root of a number:

In [3]:
sqrt(9)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-3-c836dfef5db4> in <module>()
----> 1 sqrt(9)

NameError: name 'sqrt' is not defined

We get an error message saying that the sqrt function if not defined. This is because sqrt and other mathematical functions are available with the math module:

In [11]:
import math
In [12]:
math.sqrt(9)
Out[12]:
3.0
In [5]:
from math import sqrt
In [6]:
sqrt(9)
Out[6]:
3.0

The import function

We used the command 'import' to be able to call certain functions. In Python functions are organized in modules and packages and they have to be imported in order to be used.

A module is a file containing Python definitions (e.g., functions) and statements. Packages are a way of structuring Python’s module namespace by using “dotted module names”. For example, the module name A.B designates a submodule named B in a package named A. To be used, modules and packages have to be imported in Python with the import function.

Namespace is a container for a set of identifiers (names), and allows the disambiguation of homonym identifiers residing in different namespaces. For example, with the command import math, we will have all the functions and statements defined in this module in the namespace 'math.', for example, 'math.pi' is the π constant and 'math.cos()', the cosine function.

By the way, to know which Python version you are running, we can use one of the following modules:

In [13]:
import sys
sys.version
Out[13]:
'3.4.1 |Anaconda 2.0.1 (64-bit)| (default, May 19 2014, 13:02:30) [MSC v.1600 64 bit (AMD64)]'

And if you are in an IPython session:

In [14]:
from IPython import sys_info
print(sys_info())
{'commit_hash': '681fd77',
 'commit_source': 'installation',
 'default_encoding': 'cp1252',
 'ipython_path': 'C:\\Anaconda3\\lib\\site-packages\\IPython',
 'ipython_version': '2.1.0',
 'os_name': 'nt',
 'platform': 'Windows-7-6.1.7601-SP1',
 'sys_executable': 'C:\\Anaconda3\\python.exe',
 'sys_platform': 'win32',
 'sys_version': '3.4.1 |Anaconda 2.0.1 (64-bit)| (default, May 19 2014, '
                '13:02:30) [MSC v.1600 64 bit (AMD64)]'}

The first option gives information about the Python version; the latter also includes the IPython version, operating system, etc.

Object-oriented programming

Python is designed as an object-oriented programming (OOP) language. OOP is a paradigm that represents concepts as "objects" that have data fields (attributes that describe the object) and associated procedures known as methods.

This means that all elements in Python are objects and they have attributes which can be acessed with the dot (.) operator after the name of the object. We already experimented with that when we imported the module sys, it became an object, and we acessed one of its attribute: sys.version.

OOP as a paradigm is much more than defining objects, attributes, and methods, but for now this is enough to get going with Python.

Python and IPython help

To get help about any Python command, use help():

In [15]:
help(math.degrees)
Help on built-in function degrees in module math:

degrees(...)
    degrees(x)
    
    Convert angle x from radians to degrees.

Or if you are in the IPython environment, simply add '?' to the function that a window will open at the bottom of your browser with the same help content:

In [16]:
math.degrees?

And if you add a second '?' to the statement you get access to the original script file of the function (an advantage of an open source language), unless that function is a built-in function that does not have a script file, which is the case of the standard modules in Python (but you can access the Python source code if you want; it just does not come with the standard program for installation).

So, let's see this feature with another function:

In [17]:
import scipy.fftpack
scipy.fftpack.fft??

To know all the attributes of an object, for example all the functions available in math, we can use the function dir:

In [18]:
print(dir(math))
['__doc__', '__loader__', '__name__', '__package__', '__spec__', 'acos', 'acosh', 'asin', 'asinh', 'atan', 'atan2', 'atanh', 'ceil', 'copysign', 'cos', 'cosh', 'degrees', 'e', 'erf', 'erfc', 'exp', 'expm1', 'fabs', 'factorial', 'floor', 'fmod', 'frexp', 'fsum', 'gamma', 'hypot', 'isfinite', 'isinf', 'isnan', 'ldexp', 'lgamma', 'log', 'log10', 'log1p', 'log2', 'modf', 'pi', 'pow', 'radians', 'sin', 'sinh', 'sqrt', 'tan', 'tanh', 'trunc']

Tab completion in IPython

IPython has tab completion: start typing the name of the command (object) and press tab to see the names of objects available with these initials letters. When the name of the object is typed followed by a dot (math.), pressing tab will show all available attribites, scroll down to the desired attribute and press Enter to select it.

The four most helpful commands in IPython

These are the most helpful commands in IPython (from IPython tutorial):

  • ? : Introduction and overview of IPython’s features.
  • %quickref : Quick reference.
  • help : Python’s own help system.
  • object? : Details about ‘object’, use ‘object??’ for extra details.

See these IPython Notebooks for more on IPython and the Notebook capabilities.

Comments

Comments in Python start with the hash character, #, and extend to the end of the physical line:

In [19]:
# Import the math library to access more math stuff
import math
math.pi  # this is the pi constant; a useless comment since this is obvious
Out[19]:
3.141592653589793

To insert comments spanning more than one line, use a multi-line string with a pair of matching triple-quotes: """ or ''' (we will see the string data type later). A typical use of a multi-line comment is as documentation strings and are meant for anyone reading the code:

In [20]:
"""Documentation strings are typically written like that.

A docstring is a string literal that occurs as the first statement
in a module, function, class, or method definition.

"""
Out[20]:
'Documentation strings are typically written like that.\n\nA docstring is a string literal that occurs as the first statement\nin a module, function, class, or method definition.\n\n'

A docstring like above is useless and its output as a standalone statement looks uggly in IPython Notebook, but you will see its real importance when reading and writting codes.

Commenting a programming code is an important step to make the code more readable, which Python cares a lot.
There is a style guide for writting Python code (PEP 8) with a session about how to write comments.

Magic functions

IPython has a set of predefined ‘magic functions’ that you can call with a command line style syntax.
There are two kinds of magics, line-oriented and cell-oriented.
Line magics are prefixed with the % character and work much like OS command-line calls: they get as an argument the rest of the line, where arguments are passed without parentheses or quotes.
Cell magics are prefixed with a double %%, and they are functions that get as an argument not only the rest of the line, but also the lines below it in a separate argument.

Assignment and expressions

The equal sign ('=') is used to assign a value to a variable. Afterwards, no result is displayed before the next interactive prompt:

In [3]:
x = 1

Spaces between the statements are optional but it helps for readability.

To see the value of the variable, call it again or use the print function:

In [4]:
x
Out[4]:
1
In [5]:
print(x)
1

Of course, the last assignment is that holds:

In [6]:
x = 2
x = 3
x
Out[6]:
3

In mathematics '=' is the symbol for identity, but in computer programming '=' is used for assignment, it means that the right part of the expresssion is assigned to its left part.
For example, 'x=x+1' does not make sense in mathematics but it does in computer programming:

In [7]:
x = 1
print(x)
x = x + 1
print(x)
1
2

A value can be assigned to several variables simultaneously:

In [8]:
x = y = 4
print(x)
print(y)
4
4

Several values can be assigned to several variables at once:

In [9]:
x, y = 5, 6
print(x)
print(y)
5
6

And with that, you can do (!):

In [10]:
x, y = y, x
print(x)
print(y)
6
5

Variables must be “defined” (assigned a value) before they can be used, or an error will occur:

In [11]:
x = z
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-11-cfba4031bce1> in <module>()
----> 1 x = z

NameError: name 'z' is not defined

Variables and types

There are different types of built-in objects in Python (and remember that everything in Python is an object):

In [30]:
import types
print(dir(types))
['BuiltinFunctionType', 'BuiltinMethodType', 'CodeType', 'DynamicClassAttribute', 'FrameType', 'FunctionType', 'GeneratorType', 'GetSetDescriptorType', 'LambdaType', 'MappingProxyType', 'MemberDescriptorType', 'MethodType', 'ModuleType', 'SimpleNamespace', 'TracebackType', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', '_calculate_meta', 'new_class', 'prepare_class']

Let's see some of them now.

Numbers: int, float, complex

Numbers can an integer (int), float, and complex (with imaginary part).
Let's use the function type to show the type of number (and later for any other object):

In [31]:
type(6)
Out[31]:
int

A float is a non-integer number:

In [32]:
math.pi
Out[32]:
3.141592653589793
In [33]:
type(math.pi)
Out[33]:
float

Python (IPython) is showing math.pi with only 15 decimal cases, but internally a float is represented with higher precision.
Floating point numbers in Python are implemented using a double (eight bytes) word; the precison and internal representation of floating point numbers are machine specific and are available in:

In [34]:
sys.float_info
Out[34]:
sys.float_info(max=1.7976931348623157e+308, max_exp=1024, max_10_exp=308, min=2.2250738585072014e-308, min_exp=-1021, min_10_exp=-307, dig=15, mant_dig=53, epsilon=2.220446049250313e-16, radix=2, rounds=1)

Be aware that floating-point numbers can be trick in computers:

In [35]:
0.1 + 0.2
Out[35]:
0.30000000000000004
In [36]:
0.1 + 0.2 - 0.3
Out[36]:
5.551115123125783e-17

These results are not correct (and the problem is not due to Python). The error arises from the fact that floating-point numbers are represented in computer hardware as base 2 (binary) fractions and most decimal fractions cannot be represented exactly as binary fractions. As consequence, decimal floating-point numbers are only approximated by the binary floating-point numbers actually stored in the machine. See here for more on this issue.

A complex number has real and imaginary parts:

In [37]:
1+2j
Out[37]:
(1+2j)
In [38]:
print(type(1+2j))
<class 'complex'>

Each part of a complex number is represented as a floating-point number. We can see them using the attributes .real and .imag:

In [39]:
print((1+2j).real)
print((1+2j).imag)
1.0
2.0

Strings

Strings can be enclosed in single quotes or double quotes:

In [40]:
s = 'string (str) is a built-in type in Python'
s
Out[40]:
'string (str) is a built-in type in Python'
In [41]:
type(s)
Out[41]:
str

String enclosed with single and double quotes are equal, but it may be easier to use one instead of the other:

In [42]:
'string (str) is a Python's built-in type'
  File "<ipython-input-42-ca70e9285fe4>", line 1
    'string (str) is a Python's built-in type'
                              ^
SyntaxError: invalid syntax
In [43]:
"string (str) is a Python's built-in type"
Out[43]:
"string (str) is a Python's built-in type"

But you could have done that using the Python escape character '\':

In [44]:
'string (str) is a Python\'s built-in type'
Out[44]:
"string (str) is a Python's built-in type"

Strings can be concatenated (glued together) with the + operator, and repeated with *:

In [45]:
s = 'P' + 'y' + 't' + 'h' + 'o' + 'n'
print(s)
print(s*5)
Python
PythonPythonPythonPythonPython

Strings can be subscripted (indexed); like in C, the first character of a string has subscript (index) 0:

In [46]:
print('s[0] = ', s[0], '  (s[index], start at 0)')
print('s[5] = ', s[5])
print('s[-1] = ', s[-1], '  (last element)')
print('s[:] = ', s[:], '  (all elements)')
print('s[1:] = ', s[1:], '  (from this index (inclusive) till the last (inclusive))')
print('s[2:4] = ', s[2:4], '  (from first index (inclusive) till second index (exclusive))')
print('s[:2] = ', s[:2], '  (till this index, exclusive)')
print('s[:10] = ', s[:10], '  (Python handles the index if it is larger than the string length)')
print('s[-10:] = ', s[-10:])
print('s[0:5:2] = ', s[0:5:2], '  (s[ini:end:step])')
print('s[::2] = ', s[::2], '  (s[::step], initial and final indexes can be omitted)')
print('s[0:5:-1] = ', s[::-1], '  (s[::-step] reverses the string)')
print('s[:2] + s[2:] = ', s[:2] + s[2:], '  (because of Python indexing, this sounds natural)')
s[0] =  P   (s[index], start at 0)
s[5] =  n
s[-1] =  n   (last element)
s[:] =  Python   (all elements)
s[1:] =  ython   (from this index (inclusive) till the last (inclusive))
s[2:4] =  th   (from first index (inclusive) till second index (exclusive))
s[:2] =  Py   (till this index, exclusive)
s[:10] =  Python   (Python handles the index if it is larger than the string length)
s[-10:] =  Python
s[0:5:2] =  Pto   (s[ini:end:step])
s[::2] =  Pto   (s[::step], initial and final indexes can be omitted)
s[0:5:-1] =  nohtyP   (s[::-step] reverses the string)
s[:2] + s[2:] =  Python   (because of Python indexing, this sounds natural)

len()

Python has a built-in functon to get the number of itens of a sequence:

In [47]:
help(len)
Help on built-in function len in module builtins:

len(...)
    len(object)
    
    Return the number of items of a sequence or mapping.

In [48]:
s = 'Python'
len(s)
Out[48]:
6

The function len() helps to understand how the backward indexing works in Python.
The index s[-i] should be understood as s[len(s) - i] rather than accessing directly the i-th element from back to front. This is why the last element of a string is s[-1]:

In [49]:
print('s = ', s)
print('len(s) = ', len(s))
print('len(s)-1 = ',len(s) - 1)
print('s[-1] = ', s[-1])
print('s[len(s) - 1] = ', s[len(s) - 1])
s =  Python
len(s) =  6
len(s)-1 =  5
s[-1] =  n
s[len(s) - 1] =  n

Or, strings can be surrounded in a pair of matching triple-quotes: """ or '''. End of lines do not need to be escaped when using triple-quotes, but they will be included in the string. This is how we created a multi-line comment earlier:

In [50]:
"""Strings can be surrounded in a pair of matching triple-quotes: \""" or '''.

End of lines do not need to be escaped when using triple-quotes,
but they will be included in the string.

"""
Out[50]:
'Strings can be surrounded in a pair of matching triple-quotes: """ or \'\'\'.\n\nEnd of lines do not need to be escaped when using triple-quotes,\nbut they will be included in the string.\n\n'

Lists

Values can be grouped together using different types, one of them is list, which can be written as a list of comma-separated values between square brackets. List items need not all have the same type:

In [51]:
x = ['spam', 'eggs', 100, 1234]
x
Out[51]:
['spam', 'eggs', 100, 1234]

Lists can be indexed and the same indexing rules we saw for strings are applied:

In [52]:
x[0]
Out[52]:
'spam'

The function len() works for lists:

In [53]:
len(x)
Out[53]:
4

Tuples

A tuple consists of a number of values separated by commas, for instance:

In [54]:
t = ('spam', 'eggs', 100, 1234)
t
Out[54]:
('spam', 'eggs', 100, 1234)

The type tuple is why multiple assignments in a single line works; elements separated by commas (with or without surrounding parentheses) are a tuple and in an expression with an '=', the right-side tuple is attributed to the left-side tuple:

In [55]:
a, b = 1, 2
print('a = ', a, '\nb = ', b)
a =  1 
b =  2

Is the same as:

In [56]:
(a, b) = (1, 2)
print('a = ', a, '\nb = ', b)
a =  1 
b =  2

Sets

Python also includes a data type for sets. A set is an unordered collection with no duplicate elements.

In [57]:
basket = ['apple', 'orange', 'apple', 'pear', 'orange', 'banana']
fruit = set(basket)  # create a set without duplicates
fruit
Out[57]:
{'apple', 'banana', 'orange', 'pear'}

As set is an unordered collection, it can not be indexed as lists and tuples.

In [58]:
set(['orange', 'pear', 'apple', 'banana'])
'orange' in fruit  # fast membership testing
Out[58]:
True

Dictionaries

Dictionary is a collection of elements organized keys and values. Unlike lists and tuples, which are indexed by a range of numbers, dictionaries are indexed by their keys:

In [59]:
tel = {'jack': 4098, 'sape': 4139}
tel
Out[59]:
{'sape': 4139, 'jack': 4098}
In [60]:
tel['guido'] = 4127
tel
Out[60]:
{'guido': 4127, 'sape': 4139, 'jack': 4098}
In [61]:
tel['jack']
Out[61]:
4098
In [62]:
del tel['sape']
tel['irv'] = 4127
tel
Out[62]:
{'guido': 4127, 'irv': 4127, 'jack': 4098}
In [63]:
tel.keys()
Out[63]:
dict_keys(['guido', 'irv', 'jack'])
In [64]:
'guido' in tel
Out[64]:
True

The dict() constructor builds dictionaries directly from sequences of key-value pairs:

In [65]:
tel = dict([('sape', 4139), ('guido', 4127), ('jack', 4098)])
tel
Out[65]:
{'guido': 4127, 'sape': 4139, 'jack': 4098}

Built-in Constants

  • False : false value of the bool type
  • True : true value of the bool type
  • None : sole value of types.NoneType. None is frequently used to represent the absence of a value.

In computer science, the Boolean or logical data type is composed by two values, true and false, intended to represent the values of logic and Boolean algebra. In Python, 1 and 0 can also be used in most situations as equivalent to the Boolean values.

Logical (Boolean) operators

and, or, not

  • and : logical AND operator. If both the operands are true then condition becomes true. (a and b) is true.
  • or : logical OR Operator. If any of the two operands are non zero then condition becomes true. (a or b) is true.
  • not : logical NOT Operator. Reverses the logical state of its operand. If a condition is true then logical NOT operator will make false.

Comparisons

The following comparison operations are supported by objects in Python:

  • == : equal
  • != : not equal
  • < : strictly less than
  • <= : less than or equal
  • > : strictly greater than
  • >= : greater than or equal
  • is : object identity
  • is not : negated object identity
In [66]:
True == False
Out[66]:
False
In [67]:
not True == False
Out[67]:
True
In [68]:
1 < 2 > 1
Out[68]:
True
In [69]:
True != (False or True)
Out[69]:
False
In [70]:
True != False or True
Out[70]:
True

Indentation and whitespace

In Python, statement grouping is done by indentation (this is mandatory), which are done by inserting whitespaces, not tabs. Indentation is also recommended for alignment of function calling that span more than one line for better clarity.
We will see examples of indentation in the next session.

Control of flow

if...elif...else

Conditional statements (to peform something if another thing is True or False) can be implemmented using the if statement:

if expression:
   statement
elif:
   statement     
else:
   statement

elif (one or more) and else are optionals.
The indentation is obligatory.
For example:

In [71]:
if True:
    pass

Which does nothing useful.

Let's use the if...elif...else statements to categorize the body mass index of a person:

In [72]:
# body mass index
weight = 100  # kg
height = 1.70  # m
bmi = weight / height**2
In [73]:
if bmi < 15:
    c = 'very severely underweight'
elif 15 <= bmi < 16:
    c = 'severely underweight'
elif 16 <= bmi < 18.5:
    c = 'underweight'
elif 18.5 <= bmi < 25:
    c = 'normal'
elif 25 <= bmi < 30:
    c = 'overweight'
elif 30 <= bmi < 35:
    c = 'moderately obese'
elif 35 <= bmi < 40:
    c = 'severely obese'
else:
    c = 'very severely obese'
    
print('For a weight of {0:.1f} kg and a height of {1:.2f} m,\n\
the body mass index (bmi) is {2:.1f} kg/m2,\nwhich is considered {3:s}.'\
      .format(weight, height, bmi, c))
For a weight of 100.0 kg and a height of 1.70 m,
the body mass index (bmi) is 34.6 kg/m2,
which is considered moderately obese.

for

The for statement iterates over a sequence to perform operations (a loop event).

for iterating_var in sequence:
    statements
In [12]:
for i in [3, 2, 1, 'go!']:
    print(i),
3
2
1
go!
In [13]:
for letter in 'Python':
    print(letter),
P
y
t
h
o
n

The range() function

The built-in function range() is useful if we need to create a sequence of numbers, for example, to iterate over this list. It generates lists containing arithmetic progressions:

In [14]:
help(range)
Help on class range in module builtins:

class range(object)
 |  range(stop) -> range object
 |  range(start, stop[, step]) -> range object
 |  
 |  Return a sequence of numbers from start to stop by step.
 |  
 |  Methods defined here:
 |  
 |  __contains__(self, key, /)
 |      Return key in self.
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __getitem__(self, key, /)
 |      Return self[key].
 |  
 |  __gt__(self, value, /)
 |      Return self>value.
 |  
 |  __hash__(self, /)
 |      Return hash(self).
 |  
 |  __iter__(self, /)
 |      Implement iter(self).
 |  
 |  __le__(self, value, /)
 |      Return self<=value.
 |  
 |  __len__(self, /)
 |      Return len(self).
 |  
 |  __lt__(self, value, /)
 |      Return self<value.
 |  
 |  __ne__(self, value, /)
 |      Return self!=value.
 |  
 |  __new__(*args, **kwargs) from builtins.type
 |      Create and return a new object.  See help(type) for accurate signature.
 |  
 |  __reduce__(...)
 |  
 |  __repr__(self, /)
 |      Return repr(self).
 |  
 |  __reversed__(...)
 |      Return a reverse iterator.
 |  
 |  count(...)
 |      rangeobject.count(value) -> integer -- return number of occurrences of value
 |  
 |  index(...)
 |      rangeobject.index(value, [start, [stop]]) -> integer -- return index of value.
 |      Raise ValueError if the value is not present.
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |  
 |  start
 |  
 |  step
 |  
 |  stop

In [15]:
range(10)
Out[15]:
range(0, 10)
In [16]:
range(1, 10, 2)
Out[16]:
range(1, 10, 2)
In [17]:
for i in range(10):
    n2 = i**2
    print(n2),
0
1
4
9
16
25
36
49
64
81

while

The while statement is used for repeating sections of code in a loop until a condition is met (this different than the for statement which executes n times):

while expression:
    statement

Let's generate the Fibonacci series using a while loop:

In [18]:
# Fibonacci series: the sum of two elements defines the next
a, b = 0, 1
while b < 1000:
    print(b, end='   ')
    a, b = b, a+b
1   1   2   3   5   8   13   21   34   55   89   144   233   377   610   987   

Function definition

A function in a programming language is a piece of code that performs a specific task. Functions are used to reduce duplication of code making easier to reuse it and to decompose complex problems into simpler parts. The use of functions contribute to the clarity of the code.

A function is created with the def keyword and the statements in the block of the function must be indented:

In [19]:
def function():
    pass

As per construction, this function does nothing when called:

In [21]:
function()

The general syntax of a function definition is:

def function_name( parameters ):
   """Function docstring.

   The help for the function

   """

   function body

   return variables

A more useful function:

In [22]:
def fibo(N):
    """Fibonacci series: the sum of two elements defines the next.
    
    The series is calculated till the input parameter N and
    returned as an ouput variable.
    
    """
    
    a, b, c = 0, 1, []
    while b < N:
        c.append(b)
        a, b = b, a + b
        
    return c
In [23]:
fibo(100)
Out[23]:
[1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
In [24]:
if 3 > 2:
       print('teste')
teste

Let's implemment the body mass index calculus and categorization as a function:

In [12]:
def bmi(weight, height):
    """Body mass index calculus and categorization.
    
    Enter the weight in kg and the height in m.
    See http://en.wikipedia.org/wiki/Body_mass_index
    
    """
    
    bmi = weight / height**2
    
    if bmi < 15:
     c = 'very severely underweight'
    elif 15 <= bmi < 16:
        c = 'severely underweight'
    elif 16 <= bmi < 18.5:
        c = 'underweight'
    elif 18.5 <= bmi < 25:
        c = 'normal'
    elif 25 <= bmi < 30:
      c = 'overweight'
    elif 30 <= bmi < 35:
        c = 'moderately obese'
    elif 35 <= bmi < 40:
        c = 'severely obese'
    else:
        c = 'very severely obese'
     
    s = 'For a weight of {0:.1f} kg and a height of {1:.2f} m,\
    the body mass index (bmi) is {2:.1f} kg/m2,\
    which is considered {3:s}.'\
    .format(weight, height, bmi, c)
    
    print(s)
In [13]:
bmi(73, 1.70)
For a weight of 73.0 kg and a height of 1.70 m,    the body mass index (bmi) is 25.3 kg/m2,    which is considered overweight.

Numeric data manipulation with Numpy

Numpy is the fundamental package for scientific computing in Python and has a N-dimensional array package convenient to work with numerical data. With Numpy it's much easier and faster to work with numbers grouped as 1-D arrays (a vector), 2-D arrays (like a table or matrix), or higher dimensions. Let's create 1-D and 2-D arrays in Numpy:

In [87]:
import numpy as np
In [88]:
x1d = np.array([1, 2, 3, 4, 5, 6])
print(type(x1d))
x1d
<class 'numpy.ndarray'>
Out[88]:
array([1, 2, 3, 4, 5, 6])
In [89]:
x2d = np.array([[1, 2, 3], [4, 5, 6]])
x2d
Out[89]:
array([[1, 2, 3],
       [4, 5, 6]])

len() and the Numpy functions size() and shape() give information aboout the number of elements and the structure of the Numpy array:

In [90]:
print('1-d array:')
print(x1d)
print('len(x1d) = ', len(x1d))
print('np.size(x1d) = ', np.size(x1d))
print('np.shape(x1d) = ', np.shape(x1d))
print('np.ndim(x1d) = ', np.ndim(x1d))
print('\n2-d array:')
print(x2d)
print('len(x2d) = ', len(x2d))
print('np.size(x2d) = ', np.size(x2d))
print('np.shape(x2d) = ', np.shape(x2d))
print('np.ndim(x2d) = ', np.ndim(x2d))
1-d array:
[1 2 3 4 5 6]
len(x1d) =  6
np.size(x1d) =  6
np.shape(x1d) =  (6,)
np.ndim(x1d) =  1

2-d array:
[[1 2 3]
 [4 5 6]]
len(x2d) =  2
np.size(x2d) =  6
np.shape(x2d) =  (2, 3)
np.ndim(x2d) =  2

Create random data

In [91]:
x = np.random.randn(4,3)
x
Out[91]:
array([[-0.84313748,  0.7128172 ,  0.26805518],
       [ 1.66552011,  0.81119094,  0.72512201],
       [-2.6924966 ,  1.75984096,  0.58488695],
       [ 0.64630267, -0.8719821 , -0.8510621 ]])

Joining (stacking together) arrays

In [92]:
x = np.random.randint(0, 5, size=(2, 3))
print(x)
y = np.random.randint(5, 10, size=(2, 3))
print(y)
[[0 4 0]
 [3 1 2]]
[[6 9 6]
 [6 5 5]]
In [93]:
np.vstack((x,y))
Out[93]:
array([[0, 4, 0],
       [3, 1, 2],
       [6, 9, 6],
       [6, 5, 5]])
In [94]:
np.hstack((x,y))
Out[94]:
array([[0, 4, 0, 6, 9, 6],
       [3, 1, 2, 6, 5, 5]])

Create equally spaced data

In [95]:
np.arange(start = 1, stop = 10, step = 2)
Out[95]:
array([1, 3, 5, 7, 9])
In [96]:
np.linspace(start = 0, stop = 1, num = 11)
Out[96]:
array([ 0. ,  0.1,  0.2,  0.3,  0.4,  0.5,  0.6,  0.7,  0.8,  0.9,  1. ])

Interpolation

Consider the following data:

In [97]:
y = [5,  4, 10,  8,  1, 10,  2,  7,  1,  3]

Suppose we want to create data in between the given data points (interpolation); for instance, let's try to double the resolution of the data by generating twice as many data:

In [98]:
t = np.linspace(0, len(y), len(y))       # time vector for the original data
tn = np.linspace(0, len(y), 2 * len(y))  # new time vector for the new time-normalized data
yn = np.interp(tn, t, y)                 # new time-normalized data
yn
Out[98]:
array([ 5.        ,  4.52631579,  4.05263158,  6.52631579,  9.36842105,
        9.26315789,  8.31578947,  5.78947368,  2.47368421,  3.36842105,
        7.63157895,  8.31578947,  4.52631579,  2.78947368,  5.15789474,
        6.36842105,  3.52631579,  1.10526316,  2.05263158,  3.        ])

The key is the Numpy interp function, from its help:

interp(x, xp, fp, left=None, right=None)       
One-dimensional linear interpolation.   
Returns the one-dimensional piecewise linear interpolant to a function with given values at discrete data-points.

A plot of the data will show what we have done:

In [99]:
%matplotlib inline
import matplotlib.pyplot as plt
plt.figure(figsize=(10,5))
plt.plot(t, y, 'bo-', lw=2, label='original data')
plt.plot(tn, yn, '.-', color=[1, 0, 0, .5], lw=2, label='interpolated')
plt.legend(loc='best', framealpha=.5)
plt.show()

Read and save files

There are two kinds of computer files: text files and binary files:

Text file: computer file where the content is structured as a sequence of lines of electronic text. Text files can contain plain text (letters, numbers, and symbols) but they are not limited to such. The type of content in the text file is defined by the Unicode encoding (a computing industry standard for the consistent encoding, representation and handling of text expressed in most of the world's writing systems).

Binary file: computer file where the content is encoded in binary form, a sequence of integers representing byte values.

Let's see how to save and read numeric data stored in a text file:

Using plain Python

In [100]:
f = open("newfile.txt", "w")           # open file for writing
f.write("This is a test\n")            # save to file
f.write("And here is another line\n")  # save to file
f.close()
f = open('newfile.txt', 'r')           # open file for reading
f = f.read()                           # read from file
print(f)
This is a test
And here is another line

In [101]:
help(open)
Help on built-in function open in module io:

open(...)
    open(file, mode='r', buffering=-1, encoding=None,
         errors=None, newline=None, closefd=True, opener=None) -> file object
    
    Open file and return a stream.  Raise IOError upon failure.
    
    file is either a text or byte string giving the name (and the path
    if the file isn't in the current working directory) of the file to
    be opened or an integer file descriptor of the file to be
    wrapped. (If a file descriptor is given, it is closed when the
    returned I/O object is closed, unless closefd is set to False.)
    
    mode is an optional string that specifies the mode in which the file
    is opened. It defaults to 'r' which means open for reading in text
    mode.  Other common values are 'w' for writing (truncating the file if
    it already exists), 'x' for creating and writing to a new file, and
    'a' for appending (which on some Unix systems, means that all writes
    append to the end of the file regardless of the current seek position).
    In text mode, if encoding is not specified the encoding used is platform
    dependent: locale.getpreferredencoding(False) is called to get the
    current locale encoding. (For reading and writing raw bytes use binary
    mode and leave encoding unspecified.) The available modes are:
    
    ========= ===============================================================
    Character Meaning
    --------- ---------------------------------------------------------------
    'r'       open for reading (default)
    'w'       open for writing, truncating the file first
    'x'       create a new file and open it for writing
    'a'       open for writing, appending to the end of the file if it exists
    'b'       binary mode
    't'       text mode (default)
    '+'       open a disk file for updating (reading and writing)
    'U'       universal newline mode (deprecated)
    ========= ===============================================================
    
    The default mode is 'rt' (open for reading text). For binary random
    access, the mode 'w+b' opens and truncates the file to 0 bytes, while
    'r+b' opens the file without truncation. The 'x' mode implies 'w' and
    raises an `FileExistsError` if the file already exists.
    
    Python distinguishes between files opened in binary and text modes,
    even when the underlying operating system doesn't. Files opened in
    binary mode (appending 'b' to the mode argument) return contents as
    bytes objects without any decoding. In text mode (the default, or when
    't' is appended to the mode argument), the contents of the file are
    returned as strings, the bytes having been first decoded using a
    platform-dependent encoding or using the specified encoding if given.
    
    'U' mode is deprecated and will raise an exception in future versions
    of Python.  It has no effect in Python 3.  Use newline to control
    universal newlines mode.
    
    buffering is an optional integer used to set the buffering policy.
    Pass 0 to switch buffering off (only allowed in binary mode), 1 to select
    line buffering (only usable in text mode), and an integer > 1 to indicate
    the size of a fixed-size chunk buffer.  When no buffering argument is
    given, the default buffering policy works as follows:
    
    * Binary files are buffered in fixed-size chunks; the size of the buffer
      is chosen using a heuristic trying to determine the underlying device's
      "block size" and falling back on `io.DEFAULT_BUFFER_SIZE`.
      On many systems, the buffer will typically be 4096 or 8192 bytes long.
    
    * "Interactive" text files (files for which isatty() returns True)
      use line buffering.  Other text files use the policy described above
      for binary files.
    
    encoding is the name of the encoding used to decode or encode the
    file. This should only be used in text mode. The default encoding is
    platform dependent, but any encoding supported by Python can be
    passed.  See the codecs module for the list of supported encodings.
    
    errors is an optional string that specifies how encoding errors are to
    be handled---this argument should not be used in binary mode. Pass
    'strict' to raise a ValueError exception if there is an encoding error
    (the default of None has the same effect), or pass 'ignore' to ignore
    errors. (Note that ignoring encoding errors can lead to data loss.)
    See the documentation for codecs.register or run 'help(codecs.Codec)'
    for a list of the permitted encoding error strings.
    
    newline controls how universal newlines works (it only applies to text
    mode). It can be None, '', '\n', '\r', and '\r\n'.  It works as
    follows:
    
    * On input, if newline is None, universal newlines mode is
      enabled. Lines in the input can end in '\n', '\r', or '\r\n', and
      these are translated into '\n' before being returned to the
      caller. If it is '', universal newline mode is enabled, but line
      endings are returned to the caller untranslated. If it has any of
      the other legal values, input lines are only terminated by the given
      string, and the line ending is returned to the caller untranslated.
    
    * On output, if newline is None, any '\n' characters written are
      translated to the system default line separator, os.linesep. If
      newline is '' or '\n', no translation takes place. If newline is any
      of the other legal values, any '\n' characters written are translated
      to the given string.
    
    If closefd is False, the underlying file descriptor will be kept open
    when the file is closed. This does not work when a file name is given
    and must be True in that case.
    
    A custom opener can be used by passing a callable as *opener*. The
    underlying file descriptor for the file object is then obtained by
    calling *opener* with (*file*, *flags*). *opener* must return an open
    file descriptor (passing os.open as *opener* results in functionality
    similar to passing None).
    
    open() returns a file object whose type depends on the mode, and
    through which the standard file operations such as reading and writing
    are performed. When open() is used to open a file in a text mode ('w',
    'r', 'wt', 'rt', etc.), it returns a TextIOWrapper. When used to open
    a file in a binary mode, the returned class varies: in read binary
    mode, it returns a BufferedReader; in write binary and append binary
    modes, it returns a BufferedWriter, and in read/write mode, it returns
    a BufferedRandom.
    
    It is also possible to use a string or bytearray as a file for both
    reading and writing. For strings StringIO can be used like a file
    opened in a text mode, and for bytes a BytesIO can be used like a file
    opened in a binary mode.

Using Numpy

In [102]:
import numpy as np
data = np.random.randn(3,3)
np.savetxt('myfile.txt', data, fmt="%12.6G")    # save to file
data = np.genfromtxt('myfile.txt', unpack=True) # read from file
data
Out[102]:
array([[-0.113108,  0.510521, -0.240193],
       [-0.448538, -0.348518, -1.97899 ],
       [ 0.21628 , -0.572211, -1.49621 ]])

Ploting with matplotlib

Matplotlib is the most-widely used packge for plotting data in Python. Let's see some examples of it.

In [25]:
import matplotlib.pyplot as plt

Use the IPython magic %matplotlib inline to plot a figure inline in the notebook with the rest of the text:

In [27]:
%matplotlib notebook
C:\Miniconda3\lib\site-packages\IPython\kernel\__init__.py:13: ShimWarning: The `IPython.kernel` package has been deprecated. You should import from ipykernel or jupyter_client instead.
  "You should import from ipykernel or jupyter_client instead.", ShimWarning)
In [28]:
import numpy as np
In [29]:
t = np.linspace(0, 0.99, 100)
x = np.sin(2 * np.pi * 2 * t) 
n = np.random.randn(100) / 5
plt.Figure(figsize=(12,8))
plt.plot(t, x, label='sine', linewidth=2)
plt.plot(t, x + n, label='noisy sine', linewidth=2)
plt.annotate(s='$sin(4 \pi t)$', xy=(.2, 1), fontsize=20, color=[0, 0, 1])
plt.legend(loc='best', framealpha=.5)
plt.xlabel('Time [s]')
plt.ylabel('Amplitude')
plt.title('Data plotting using matplotlib')
plt.show()

Use the IPython magic %matplotlib qt to plot a figure in a separate window (from where you will be able to change some of the figure proprerties):

In [23]:
%matplotlib qt
In [24]:
mu, sigma = 10, 2
x = mu + sigma * np.random.randn(1000)
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
ax1.plot(x, 'ro')
ax1.set_title('Data')
ax1.grid()

n, bins, patches = ax2.hist(x, 25, normed=True, facecolor='r') # histogram
ax2.set_xlabel('Bins')
ax2.set_ylabel('Probability')
ax2.set_title('Histogram')
fig.suptitle('Another example using matplotlib', fontsize=18, y=1)
ax2.grid()

plt.tight_layout()
plt.show()

And a window with the following figure should appear:

In [110]:
from IPython.display import Image
Image(url="./../images/plot.png")
Out[110]:

You can switch back and forth between inline and separate figure using the %matplotlib magic commands used above. There are plenty more examples with the source code in the matplotlib gallery.

In [25]:
# get back the inline plot
%matplotlib inline

Signal processing with Scipy

The Scipy package has a lot of functions for signal processing, among them: Integration (scipy.integrate), Optimization (scipy.optimize), Interpolation (scipy.interpolate), Fourier Transforms (scipy.fftpack), Signal Processing (scipy.signal), Linear Algebra (scipy.linalg), and Statistics (scipy.stats). As an example, let's see how to use a low-pass Butterworth filter to attenuate high-frequency noise and how the differentiation process of a signal affects the signal-to-noise content. We will also calculate the Fourier transform of these data to look at their frequencies content.

In [112]:
from scipy.signal import butter, filtfilt
import scipy.fftpack
freq = 100.
t = np.arange(0,1,.01);
w = 2*np.pi*1 # 1 Hz
y = np.sin(w*t)+0.1*np.sin(10*w*t)
# Butterworth filter
b, a = butter(4, (5/(freq/2)), btype = 'low')
y2 = filtfilt(b, a, y)
# 2nd derivative of the data
ydd = np.diff(y,2)*freq*freq   # raw data
y2dd = np.diff(y2,2)*freq*freq # filtered data
# frequency content 
yfft = np.abs(scipy.fftpack.fft(y))/(y.size/2);   # raw data
y2fft = np.abs(scipy.fftpack.fft(y2))/(y.size/2); # filtered data
freqs = scipy.fftpack.fftfreq(y.size, 1./freq)
yddfft = np.abs(scipy.fftpack.fft(ydd))/(ydd.size/2);
y2ddfft = np.abs(scipy.fftpack.fft(y2dd))/(ydd.size/2);
freqs2 = scipy.fftpack.fftfreq(ydd.size, 1./freq)

And the plots:

In [115]:
fig, ((ax1,ax2),(ax3,ax4)) = plt.subplots(2, 2, figsize=(10, 4))

ax1.set_title('Temporal domain', fontsize=14)
ax1.plot(t, y, 'r', linewidth=2, label = 'raw data')
ax1.plot(t, y2, 'b', linewidth=2, label = 'filtered @ 5 Hz')
ax1.set_ylabel('f')
ax1.legend(frameon=False, fontsize=12)

ax2.set_title('Frequency domain', fontsize=14)
ax2.plot(freqs[:yfft.size/4], yfft[:yfft.size/4],'r',  lw=2,label='raw data')
ax2.plot(freqs[:yfft.size/4],y2fft[:yfft.size/4],'b--',lw=2,label='filtered @ 5 Hz')
ax2.set_ylabel('FFT(f)')
ax2.legend(frameon=False, fontsize=12)

ax3.plot(t[:-2], ydd, 'r', linewidth=2, label = 'raw')
ax3.plot(t[:-2], y2dd, 'b', linewidth=2, label = 'filtered @ 5 Hz')
ax3.set_xlabel('Time [s]'); ax3.set_ylabel("f ''")

ax4.plot(freqs[:yddfft.size/4], yddfft[:yddfft.size/4], 'r', lw=2, label = 'raw')
ax4.plot(freqs[:yddfft.size/4],y2ddfft[:yddfft.size/4],'b--',lw=2, label='filtered @ 5 Hz')
ax4.set_xlabel('Frequency [Hz]'); ax4.set_ylabel("FFT(f '')")
plt.show()

Symbolic mathematics with Sympy

Sympy is a package to perform symbolic mathematics in Python. Let's see some of its features:

In [19]:
from IPython.display import display
import sympy as sym
from sympy.interactive import printing
printing.init_printing()

Define some symbols and the create a second-order polynomial function (a.k.a., parabola):

In [20]:
x, y = sym.symbols('x y')
y = x**2 - 2*x - 3
y
Out[20]:
$$x^{2} - 2 x - 3$$

Plot the parabola at some given range:

In [21]:
from sympy.plotting import plot
%matplotlib inline
plot(y, (x, -3, 5));

And the roots of the parabola are given by:

In [120]:
sym.solve(y, x)
Out[120]:
$$\begin{bmatrix}-1, & 3\end{bmatrix}$$

We can also do symbolic differentiation and integration:

In [121]:
dy = sym.diff(y, x)
dy
Out[121]:
$$2 x - 2$$
In [122]:
sym.integrate(dy, x)
Out[122]:
$$x^{2} - 2 x$$

For example, let's use Sympy to represent three-dimensional rotations. Consider the problem of a coordinate system xyz rotated in relation to other coordinate system XYZ. The single rotations around each axis are illustrated by:

In [123]:
from IPython.display import Image
Image(url="./../images/rotations.png")
Out[123]:

The single 3D rotation matrices around Z, Y, and X axes can be expressed in Sympy:

In [124]:
from IPython.core.display import Math
from sympy import symbols, cos, sin, Matrix, latex
a, b, g = symbols('alpha beta gamma')

RX = Matrix([[1, 0, 0], [0, cos(a), -sin(a)], [0, sin(a), cos(a)]])
display(Math(latex('\\mathbf{R_{X}}=') + latex(RX, mat_str = 'matrix')))

RY = Matrix([[cos(b), 0, sin(b)], [0, 1, 0], [-sin(b), 0, cos(b)]])
display(Math(latex('\\mathbf{R_{Y}}=') + latex(RY, mat_str = 'matrix')))

RZ = Matrix([[cos(g), -sin(g), 0], [sin(g), cos(g), 0], [0, 0, 1]])
display(Math(latex('\\mathbf{R_{Z}}=') + latex(RZ, mat_str = 'matrix')))
$$\mathbf{R_{X}}=\left[\begin{matrix}1 & 0 & 0\\0 & \cos{\left (\alpha \right )} & - \sin{\left (\alpha \right )}\\0 & \sin{\left (\alpha \right )} & \cos{\left (\alpha \right )}\end{matrix}\right]$$
$$\mathbf{R_{Y}}=\left[\begin{matrix}\cos{\left (\beta \right )} & 0 & \sin{\left (\beta \right )}\\0 & 1 & 0\\- \sin{\left (\beta \right )} & 0 & \cos{\left (\beta \right )}\end{matrix}\right]$$
$$\mathbf{R_{Z}}=\left[\begin{matrix}\cos{\left (\gamma \right )} & - \sin{\left (\gamma \right )} & 0\\\sin{\left (\gamma \right )} & \cos{\left (\gamma \right )} & 0\\0 & 0 & 1\end{matrix}\right]$$

And using Sympy, a sequence of elementary rotations around X, Y, Z axes is given by:

In [125]:
RXYZ = RZ*RY*RX
display(Math(latex('\\mathbf{R_{XYZ}}=') + latex(RXYZ, mat_str = 'matrix')))
$$\mathbf{R_{XYZ}}=\left[\begin{matrix}\cos{\left (\beta \right )} \cos{\left (\gamma \right )} & \sin{\left (\alpha \right )} \sin{\left (\beta \right )} \cos{\left (\gamma \right )} - \sin{\left (\gamma \right )} \cos{\left (\alpha \right )} & \sin{\left (\alpha \right )} \sin{\left (\gamma \right )} + \sin{\left (\beta \right )} \cos{\left (\alpha \right )} \cos{\left (\gamma \right )}\\\sin{\left (\gamma \right )} \cos{\left (\beta \right )} & \sin{\left (\alpha \right )} \sin{\left (\beta \right )} \sin{\left (\gamma \right )} + \cos{\left (\alpha \right )} \cos{\left (\gamma \right )} & - \sin{\left (\alpha \right )} \cos{\left (\gamma \right )} + \sin{\left (\beta \right )} \sin{\left (\gamma \right )} \cos{\left (\alpha \right )}\\- \sin{\left (\beta \right )} & \sin{\left (\alpha \right )} \cos{\left (\beta \right )} & \cos{\left (\alpha \right )} \cos{\left (\beta \right )}\end{matrix}\right]$$

Suppose there is a rotation only around X ($\alpha$) by $\pi/2$; we can get the numerical value of the rotation matrix by substituing the angle values:

In [127]:
r = RXYZ.subs({a: np.pi/2, b: 0, g: 0})
r
Out[127]:
$$\left[\begin{matrix}1 & 0 & 0\\0 & 6.12323399573677 \cdot 10^{-17} & -1.0\\0 & 1.0 & 6.12323399573677 \cdot 10^{-17}\end{matrix}\right]$$

And we can prettify this result:

In [128]:
display(Math(latex(r'\mathbf{R_{(\alpha=\pi/2)}}=') +
             latex(r.n(chop=True, prec=3), mat_str = 'matrix')))
$$\mathbf{R_{(\alpha=\pi/2)}}=\left[\begin{matrix}1.0 & 0 & 0\\0 & 0 & -1.0\\0 & 1.0 & 0\end{matrix}\right]$$

For more about Sympy, see http://docs.sympy.org/latest/tutorial/.

Data analysis with pandas

"pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python."

To work with labellled data, pandas has a type called DataFrame (basically, a matrix where columns and rows have may names and may be of different types) and it is also the main type of the software R. Fo ezample:

In [129]:
import pandas as pd
In [130]:
x = 5*['A'] + 5*['B']
x
Out[130]:
['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B']
In [131]:
df = pd.DataFrame(np.random.rand(10,2), columns=['Level 1', 'Level 2'] )
df['Group'] = pd.Series(['A']*5 + ['B']*5)
plot = df.boxplot(by='Group')
In [132]:
from pandas.tools.plotting import scatter_matrix
df = pd.DataFrame(np.random.randn(100, 3), columns=['A', 'B', 'C'])
plot = scatter_matrix(df, alpha=0.5, figsize=(8, 6), diagonal='kde')

pandas is aware the data is structured and give you basic statistics considerint that and nicely formatted:

In [133]:
df.describe()
Out[133]:
A B C
count 100.000000 100.000000 100.000000
mean 0.176822 -0.028189 0.079938
std 0.962097 1.056000 0.955360
min -1.909003 -2.515748 -2.564882
25% -0.434171 -0.851826 -0.531646
50% 0.113800 0.053892 0.088606
75% 0.661160 0.782833 0.600897
max 2.673605 1.989213 2.621580

For more on pandas, see this tutorial: http://pandas.pydata.org/pandas-docs/stable/10min.html.

To learn more about Python

There is a lot of good material in the internet about Python for scientific computing, here is a small list of interesting stuff: