What is Python?

Python is a dynamic, strongly typed, object-oriented programming language.

In [2]:
from IPython.display import Image

Courtesy of Flickr user sfroehlich1121.

Before we go into what the above means, note that there is a strong, and growing, Python community in Astronomy - for instance

  • X-ray astronomy (CIAO)

  • optical/ground-based astronomy (PyRAF)

  • radio (CASA)

  • Solar astronomy (SunPy)

It's also a useful "transferable" skill, for those of you who end up moving out of Astronomy.

Once you've got the hang of Python - and found out that many questions you have have already been answered on Stack Overflow - the trick is in finding the package you need, rather than writing your own code. cough cough astropy cough cough

You should also be aware of R - The R project for Statistical Computing - which is not Python, but is important as Astronomy moves into the era of

In [4]:

courtesy of the internetz.

Dynamic


  • always active or changing

  • having or showing a lot of energy

  • of or relating to energy, motion, or physical force

At least that's what Merriam-Webster says.

In [1]:
a = 23
print("The answer is " + str(a))
b = a + len("confused kitty is confused")
print("The answer is now " + str(b))
a = "Goooooooooooooollllll"
print("The answer is now " + a)
The answer is 23
The answer is now 49
The answer is now Goooooooooooooollllll

Things to note:

  • you can define variables when you need them

  • you do not need to define the "type" of a variable

  • you can change what you store in a variable (one moment it's an integer, then it's a string)
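As a minimal sketch of the points above (the name value is made up for illustration), the built-in type function reports what kind of object a name currently refers to; the type belongs to the object, not to the name:

```python
# The same name can refer to objects of different types over time.
value = 23
first_type = type(value).__name__    # name of the type of the current object

value = "twenty-three"               # re-use the name for a string
second_type = type(value).__name__

print("value was an {} and is now a {}".format(first_type, second_type))
```

Nothing had to be declared up front: each assignment simply makes the name point at a new object.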

Strongly typed

The operations you can do depend on the "type" of the variable (or constant).

In [2]:
len("This is a string")
16
In [3]:
len(23)
TypeError                                 Traceback (most recent call last)
<ipython-input-3-c9328ce85f6c> in <module>()
----> 1 len(23)

TypeError: object of type 'int' has no len()


These error messages look confusing at first, but you will quickly learn what to look for: ignore everything until the last line!

On the last line you'll see "BlahError: hopefully something more informative than blah", which tells you the type of error (so in this case, it is a "TypeError"), and then an optional string giving more information, which in this case says that you have tried to use len on something that doesn't have a length.

In [4]:
x = { "foo": [1,2,3], "bar": "star" }
print(x["foo"])
[1, 2, 3]
In [5]:
len(x)
2

Note that you can use functions on variables of different types; here we have used len on a string and a dictionary, which is a fancy-schmancy array that lets you access its elements with a key value that is not an integer. Dictionaries are also called associative arrays. For those that care, Python falls into the 0-index camp; that is, list indexes start at 0, rather than 1 (for those of you fortunate enough to have to work with FORTRAN).
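Here is a small sketch of both ideas, using made-up names (word, items, lookup): len applied to three different container types, and element access by position (starting at 0) or by key.

```python
# len works on several built-in container types.
word = "star"
items = [10, 20, 30]
lookup = {"foo": [1, 2, 3], "bar": "star"}

lengths = [len(word), len(items), len(lookup)]   # characters, elements, keys

first_item = items[0]        # list indexes start at 0
last_item = items[-1]        # negative indexes count back from the end
foo_values = lookup["foo"]   # dictionaries are indexed by key

print(lengths, first_item, last_item, foo_values)
```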

However, you can't do math on a dictionary (a += 1 is a short form for the mathematically-confusing statement a = a + 1):

In [6]:
x += 1
TypeError                                 Traceback (most recent call last)
<ipython-input-6-9762fa126317> in <module>()
----> 1 x += 1

TypeError: unsupported operand type(s) for +=: 'dict' and 'int'

And you can't do math on a string - that is, unlike some languages, a string is a string is a string:

In [7]:
2 + "2"
TypeError                                 Traceback (most recent call last)
<ipython-input-7-716307f7c065> in <module>()
----> 1 2 + "2"

TypeError: unsupported operand type(s) for +: 'int' and 'str'

You have to explicitly convert between object types; here we use the int function to change a string into an integer.

2 + int("2")
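The conversion works in both directions. A short sketch (the variable names are only for illustration) using the built-in int, float, and str functions:

```python
# Explicit conversions between strings and numbers.
total = 2 + int("2")          # string -> integer, then add
ratio = float("2.3") * 2      # string -> floating point, then multiply
label = str(2) + "2"          # integer -> string, then concatenate

print(total, ratio, label)
```

Note that + means addition for numbers but concatenation for strings, which is why Python refuses to guess when you mix the two.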

For those that are interested, int can fail:

In [9]:
int("2.3")
ValueError                                Traceback (most recent call last)
<ipython-input-9-1d62e5f34adf> in <module>()
----> 1 int("2.3")

ValueError: invalid literal for int() with base 10: '2.3'

and you can actually "catch" this error and handle it (e.g. when you are writing a program and want to validate user input to make sure you don't end up wasting resources or doing the wrong thing):

In [3]:
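# raw_input is the Python 2 name; in Python 3 use input() instead
inval = raw_input("Think of a number: ")
try:
    ival = int(inval)
    print("Yay: an integer {}".format(ival))
except ValueError:
    # not an integer, so see if it is a floating-point number
    try:
        fval = float(inval)
        print("Oh, a real-valued number {}".format(fval))
    except ValueError:
        print("Errrr: {}".format(inval))
        raise ValueError("I did not expect to be sent '{}'".format(inval))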
Think of a number: 23.2
Oh, a real-valued number 23.2

The indentation (i.e. the spaces) is important; it is what defines the structure of the program (i.e. what parts run together).
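To make the role of indentation concrete, here is a tiny sketch (the names values, seen, and even_total are invented for this example) in which the indentation level alone decides how often each line runs:

```python
values = [1, 2, 3, 4, 5]
even_total = 0
seen = 0

for v in values:
    seen += 1                 # indented once: runs for every value
    if v % 2 == 0:
        even_total += v       # indented twice: runs only for even values

# back at the left margin: runs once, after the loop has finished
print("saw {} values; the even ones sum to {}".format(seen, even_total))
```

Moving the even_total line one level to the left would add up every value rather than just the even ones, without any other change to the code.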

Here we create a function that wraps up the above code (except for the user-input part):

In [2]:
def convert_value(inval):
    """Convert an input value (a string) into an integer or
    floating point value, returning the converted value. For
    other input a ValueError is raised."""
    try:
        val = int(inval)
        print("Yay: an integer {}".format(val))
    except ValueError:
        # not an integer, so try a floating-point value instead
        try:
            val = float(inval)
            print("Oh, a real-valued number {}".format(val))
        except ValueError:
            print("Errrr: {}".format(inval))
            raise ValueError("I did not expect to be sent '{}'".format(inval))
    return val

which we can then call:

In [5]:
convert_value("23")
Yay: an integer 23
In [3]:
convert_value("23.0")
Oh, a real-valued number 23.0
In [6]:
convert_value("23i")
Errrr: 23i
ValueError                                Traceback (most recent call last)
<ipython-input-6-cd4d3ecd93df> in <module>()
----> 1 convert_value("23i")

<ipython-input-4-3657fa1efecd> in convert_value(inval)
     13         except:
     14             print("Errrr: {}".format(inval))
---> 15             raise ValueError("I did not expect to be sent '{}'".format(inval))
     17     return val

ValueError: I did not expect to be sent '23i'

Aside II

Note that the error message is longer, but actually contains some useful information, such as the line that raised the error. For more complicated cases - where an error is raised in a routine called by a routine called by a routine (yadda yadda) - you will see this path (at least up to a certain depth, where the system just stops and says that it is TMI).
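If you want to see such a path without crashing your session, one option is the standard-library traceback module. The sketch below uses three made-up functions (inner, middle, outer) that call each other, catches the error, and keeps the traceback text as a string:

```python
import traceback


def inner(text):
    return int(text)          # this is where the error is actually raised


def middle(text):
    return inner(text)


def outer(text):
    return middle(text)


try:
    outer("23i")
except ValueError:
    # format_exc returns the same text Python would have printed
    report = traceback.format_exc()

print(report)
last_line = report.strip().splitlines()[-1]
```

The report lists outer, middle, and inner in the order they were called, and last_line is the "BlahError: ..." line that you should read first.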

Object-oriented Language

Everything in Python is an object; above we have shown strings, integers, dictionaries, and lists. Objects are mystical containers that provide storage - e.g. the value of the integer or the key and value pairs stored in dictionaries - and functionality (also known as methods). They are an important part of Python, but for now we just care about the fact that:

  • some concepts - such as length - can be applied to many different types of objects; for the common concepts this means we can say len(x) to get its length, whether x is a string, array, dictionary, or ...

  • for specialized concepts - such as the mean of an array - you will end up writing code that looks like x.mean(), where x is the variable.
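The two styles can be seen with nothing more than built-in types. This is only a sketch (the names name and values are invented), using a string and a list rather than an array:

```python
name = "sunpy"
values = [3, 1, 2]

# A general concept: the len function works on both objects
sizes = (len(name), len(values))

# Specialized behaviour is reached through methods: object.method(...)
shouted = name.upper()        # returns a new string
values.sort()                 # sorts the list in place and returns None

print(sizes, shouted, values)
```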

In [4]:
import numpy as np

Wait a minute; what's this all about?

Python comes with a bunch of commands and types built in, but you are almost certain to need something else. In this case, I want to load in the basic numeric-processing package, numpy, which I do with the import command. Since I am lazy, and don't like typing numpy, I use the as np suffix to tell Python that if I say np.foo then I am referring to numpy.foo. You will find that np is a common short-form for numpy!

In [11]:
x = np.arange(1, 10, 2)
print(x)
[1 3 5 7 9]

You can also use the import command to load in just a specific symbol from a package, by saying

from numpy import arange, mean
y = arange(1, 10, 2)
ym = mean(y)

or even just load everything in with the syntax

from numpy import *

but this is strongly discouraged (it makes it hard to identify what a name, particularly common ones such as mean, means).
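The name-clash problem is easy to reproduce with two standard-library modules, math and cmath, which both provide a function called sqrt. This sketch imports single names rather than using the * form, but the effect is the same: the most recent import silently wins.

```python
import math as m              # short alias, in the same spirit as "import numpy as np"
from math import sqrt         # bring one name in directly

real_root = sqrt(4)           # math.sqrt, so a float

from cmath import sqrt        # a second sqrt silently replaces the first

complex_root = sqrt(4)        # cmath.sqrt, so a complex number
still_clear = m.sqrt(4)       # the qualified name is never ambiguous

print(real_root, complex_root, still_clear)
```

With the module prefix you can always tell which sqrt you are getting, which is the main argument for import numpy as np.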

In [12]:

You can find out the methods that are defined for x by using the dir() command:

In [13]:
dir(x)

Here we call the mean method which, in this case, calculates the mean of all the values of the array and returns it. Since I am using the IPython interactive environment, the last return value in a block is automatically displayed:

In [14]:
x.mean()
5.0

In a program (or script), you will want to say something like

In [15]:
m = x.mean()
print("The mean is {0}".format(m))
print("         or {:.3f}".format(m))
print("         or {:13.6e}".format(m))
print("    or even %g" % (m,))
The mean is 5.0
         or 5.000
         or  5.000000e+00
    or even 5
In [16]:
# Oh, don't forget that a # character starts a comment,
print("Unless # it is part of a string")
print('and that strings start and end with either a single or double quote') # this is back to being a comment
Unless # it is part of a string
and that strings start and end with either a single or double quote
In [2]:
"""Although you can also make really long "comments"
using the triple-quote, as shown here.

This is normally used for providing in-line documentation to
your code, but can be useful to quickly comment out a 
region of recalcitrant code, since it creates a string
which is then ignored by the program.
"""
'Although you can also make really long "comments"\nusing the triple-quote, as shown here.\n\nThis is normally used for providing in-line documentation to\nyour code, but can be useful to quickly comment out a \nregion of recalcitrant code, since it creates a string\nwhich is then ignored by the program.\n'
In [5]:
# Let's do something
import matplotlib.pyplot as plt
x = np.linspace(0, 10, 45)      # a one-dimensional array of values
y = np.sin(x) * np.exp(-x/5)    # this calculation is done on each element of the array
In [19]:
plt.plot(x, y)
[<matplotlib.lines.Line2D at 0x7fb3b1370150>]

Oh, that isn't very informative; I need to "show" the plot.

In [20]:
plt.show()

You can get the image to appear inline in the notebook with some IPython magic:

In [21]:
%matplotlib inline
In [22]:
plt.plot(x, y)
[<matplotlib.lines.Line2D at 0x7fb3b013a250>]
In [25]:
plt.plot(x, y, 'r--')
plt.ylabel(r"$sin(x) * e^{-x/5}$") # use LaTeX; to avoid potential confusion with \ characters, use r"..."!
<matplotlib.text.Text at 0x7fb3affbfd90>


From IPython - whether run from the command line or using this fancy notebook interface - you can use help:

In [26]:
help(np.arange)
Help on built-in function arange in module numpy.core.multiarray:

    arange([start,] stop[, step,], dtype=None)
    Return evenly spaced values within a given interval.
    Values are generated within the half-open interval ``[start, stop)``
    (in other words, the interval including `start` but excluding `stop`).
    For integer arguments the function is equivalent to the Python built-in
    `range <http://docs.python.org/lib/built-in-funcs.html>`_ function,
    but returns an ndarray rather than a list.
    When using a non-integer step, such as 0.1, the results will often not
    be consistent.  It is better to use ``linspace`` for these cases.
    start : number, optional
        Start of interval.  The interval includes this value.  The default
        start value is 0.
    stop : number
        End of interval.  The interval does not include this value, except
        in some cases where `step` is not an integer and floating point
        round-off affects the length of `out`.
    step : number, optional
        Spacing between values.  For any output `out`, this is the distance
        between two adjacent values, ``out[i+1] - out[i]``.  The default
        step size is 1.  If `step` is specified, `start` must also be given.
    dtype : dtype
        The type of the output array.  If `dtype` is not given, infer the data
        type from the other input arguments.
    arange : ndarray
        Array of evenly spaced values.
        For floating point arguments, the length of the result is
        ``ceil((stop - start)/step)``.  Because of floating point overflow,
        this rule may result in the last element of `out` being greater
        than `stop`.
    See Also
    linspace : Evenly spaced numbers with careful handling of endpoints.
    ogrid: Arrays of evenly spaced numbers in N-dimensions.
    mgrid: Grid-shaped arrays of evenly spaced numbers in N-dimensions.
    >>> np.arange(3)
    array([0, 1, 2])
    >>> np.arange(3.0)
    array([ 0.,  1.,  2.])
    >>> np.arange(3,7)
    array([3, 4, 5, 6])
    >>> np.arange(3,7,2)
    array([3, 5])

The following will list all symbols that contain lin in the np package; since this is a notebook this appears as a nice little separate frame, listing

In [34]:
np.*lin*?

and don't forget tab completion:

np.lin<TAB>


will give you the option of selecting either linalg or linspace.

If you remember the function we created earlier, we can ask for help on that too:

In [8]:
help(convert_value)
Help on function convert_value in module __main__:

    Convert an input value (a string) into an integer or
    floating point value, returning the converted value. For
    other input a ValueError is raised.


Symbols - that is, variables and functions - can contain alpha-numeric characters and underscores, with a few restrictions: they can't contain characters such as . " or ' and they can't start with a number. That is

  • Good

    • foo
    • bar
    • fooBar
    • foo_bar
    • foo1
    • foo1bar
    • _foo
    • __foo
    • __foo__
  • Bad

    • 1foo
    • foo.bar (this actually refers to bar in the foo module)
    • 123

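By convention, symbols that begin with a capital letter refer to a class, that is, the name of a type. Python does not enforce this, and built-in types such as int and float are lowercase.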

There is an informal convention that says that symbols beginning with a single _ character are not part of the "main" interface of a module; you are free to use them but take some care. Symbols beginning with two _ characters should not be used unless you know what you are doing.
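If you are unsure whether a name is allowed, you can ask Python. The following sketch assumes Python 3, where strings have an isidentifier method; the list of candidates is taken from the examples above.

```python
import keyword

candidates = ["foo", "foo_bar", "foo1", "_foo", "__foo__", "1foo", "foo.bar", "123"]

# str.isidentifier (Python 3) only checks the spelling rules ...
valid = [name for name in candidates if name.isidentifier()]
invalid = [name for name in candidates if not name.isidentifier()]

# ... so reserved words such as "for" need a separate check
reserved = keyword.iskeyword("for")

print("Good:", valid)
print("Bad: ", invalid)
print("Is 'for' reserved?", reserved)
```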

What about ()?

Python uses "()" to indicate arguments to a function - at least it does when it's straight after a "name", that is, something like mean, bob5, or temp.

Confusingly enough, although it's very useful, you can refer to a function without the brackets; functions can even be treated as variables and passed around.

So, how do you know if you need brackets (function) or not (symbol)?

In [7]:
np.mean
<function numpy.core.fromnumeric.mean>
In [9]:
In [10]:
np.mean()
TypeError                                 Traceback (most recent call last)
<ipython-input-10-3b4822f967ee> in <module>()
----> 1 np.mean()

TypeError: mean() takes at least 1 argument (0 given)
In [11]:
np.pi()
TypeError                                 Traceback (most recent call last)
<ipython-input-11-ca867d7a203c> in <module>()
----> 1 np.pi()

TypeError: 'float' object is not callable

We'll see this error message again...
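One way to check before you type the brackets is the built-in callable function. This sketch uses the standard-library math module as a stand-in for numpy, so that it runs without any extra packages:

```python
import math

checks = {
    "math.sqrt": callable(math.sqrt),   # a function: use brackets to run it
    "math.pi": callable(math.pi),       # a plain float: no brackets
    "len": callable(len),
    "23": callable(23),
}

for name in sorted(checks):
    print("{:10s} callable? {}".format(name, checks[name]))
```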

If you are really stuck, then you could read the documentation...

In [8]:
help(np.mean)
Help on function mean in module numpy.core.fromnumeric:

mean(a, axis=None, dtype=None, out=None, keepdims=False)
    Compute the arithmetic mean along the specified axis.
    Returns the average of the array elements.  The average is taken over
    the flattened array by default, otherwise over the specified axis.
    `float64` intermediate and return values are used for integer inputs.
    a : array_like
        Array containing numbers whose mean is desired. If `a` is not an
        array, a conversion is attempted.
    axis : int, optional
        Axis along which the means are computed. The default is to compute
        the mean of the flattened array.
    dtype : data-type, optional
        Type to use in computing the mean.  For integer inputs, the default
        is `float64`; for floating point inputs, it is the same as the
        input dtype.
    out : ndarray, optional
        Alternate output array in which to place the result.  The default
        is ``None``; if provided, it must have the same shape as the
        expected output, but the type will be cast if necessary.
        See `doc.ufuncs` for details.
    keepdims : bool, optional
        If this is set to True, the axes which are reduced are left
        in the result as dimensions with size one. With this option,
        the result will broadcast correctly against the original `arr`.
    m : ndarray, see dtype parameter above
        If `out=None`, returns a new array containing the mean values,
        otherwise a reference to the output array is returned.
    See Also
    average : Weighted average
    std, var, nanmean, nanstd, nanvar
    The arithmetic mean is the sum of the elements along the axis divided
    by the number of elements.
    Note that for floating-point input, the mean is computed using the
    same precision the input has.  Depending on the input data, this can
    cause the results to be inaccurate, especially for `float32` (see
    example below).  Specifying a higher-precision accumulator using the
    `dtype` keyword can alleviate this issue.
    >>> a = np.array([[1, 2], [3, 4]])
    >>> np.mean(a)
    >>> np.mean(a, axis=0)
    array([ 2.,  3.])
    >>> np.mean(a, axis=1)
    array([ 1.5,  3.5])
    In single precision, `mean` can be inaccurate:
    >>> a = np.zeros((2, 512*512), dtype=np.float32)
    >>> a[0, :] = 1.0
    >>> a[1, :] = 0.1
    >>> np.mean(a)
    Computing the mean in float64 is more accurate:
    >>> np.mean(a, dtype=np.float64)

Here's an (admittedly very simple) example of treating a function as a variable (in this case a):

In [14]:
a = np.mean
[ 0.          0.22727273  0.45454545  0.68181818  0.90909091]
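A slightly longer sketch of the same idea, using only built-in functions: summarize is a hypothetical helper, written just for this example, that is handed a function as its first argument and calls it for you.

```python
def summarize(func, values):
    """Apply func to values and return whatever it returns."""
    return func(values)


data = [4, 8, 15, 16, 23, 42]

# Functions are objects too, so they can be stored in a list ...
reducers = [min, max, sum, len]

# ... and passed to another function without using brackets
results = [summarize(f, data) for f in reducers]

for f, r in zip(reducers, results):
    print("{}: {}".format(f.__name__, r))
```

Note that the brackets only appear inside summarize, which is the one place where the function is actually called.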

Beware the Ides of March

Or, at least, redefining symbols:

In [16]:
help(range)
Help on built-in function range in module __builtin__:

    range(stop) -> list of integers
    range(start, stop[, step]) -> list of integers
    Return a list containing an arithmetic progression of integers.
    range(i, j) returns [i, i+1, i+2, ..., j-1]; start (!) defaults to 0.
    When step is given, it specifies the increment (or decrement).
    For example, range(4) returns [0, 1, 2, 3].  The end point is omitted!
    These are exactly the valid indices for a list of 4 elements.

In [17]:
range(1, 10, 2)
[1, 3, 5, 7, 9]

However, you can redefine it by using the name to the left of the equals sign (if your editor does syntax highlighting, as is shown here, where the range symbol is colored green, then you get a visual indication you've done something potentially catastrophic, but Python lets you do this if you want):

In [18]:
range = 23.0

This can lead to confusing error messages, such as seeing 'float' object is not callable and wondering just what on Earth is going on! It's one of the reasons why we suggest not to use the from blah import * syntax if you can help it (although, to be honest, that wouldn't help in this particular case since range is a built-in Python function, but I've spent more-than-enough time re-defining range that I thought I'd save you the trouble).

In [20]:
range(1, 10, 2)
TypeError                                 Traceback (most recent call last)
<ipython-input-20-adbe681dc9aa> in <module>()
----> 1 range(1, 10, 2)

TypeError: 'float' object is not callable
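The same mistake can be reproduced safely inside a function, where the redefinition only lasts until the function returns. In this sketch shadowed_call is a made-up name, and the error is caught so that its message can be inspected:

```python
def shadowed_call():
    range = 23.0                  # this local name now hides the built-in range
    try:
        return range(1, 10, 2)    # fails: a float cannot be called
    except TypeError as err:
        return str(err)


message = shadowed_call()

# Outside the function the built-in is untouched
values = list(range(1, 10, 2))

print(message)
print(values)
```

If you have already done this at the top level of a notebook, del range removes your definition and the built-in function becomes visible again.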

Note that the help message has changed; it now describes the help of floating-point objects (or whatever the value stored in range is):

In [19]:
help(range)
Help on float object:

class float(object)
 |  float(x) -> floating point number
 |  Convert a string or number to a floating point number, if possible.
 |  Methods defined here:
 |  __abs__(...)
 |      x.__abs__() <==> abs(x)
 |  __add__(...)
 |      x.__add__(y) <==> x+y
 |  __coerce__(...)
 |      x.__coerce__(y) <==> coerce(x, y)
 |  __div__(...)
 |      x.__div__(y) <==> x/y
 |  __divmod__(...)
 |      x.__divmod__(y) <==> divmod(x, y)
 |  __eq__(...)
 |      x.__eq__(y) <==> x==y
 |  __float__(...)
 |      x.__float__() <==> float(x)
 |  __floordiv__(...)
 |      x.__floordiv__(y) <==> x//y
 |  __format__(...)
 |      float.__format__(format_spec) -> string
 |      Formats the float according to format_spec.
 |  __ge__(...)
 |      x.__ge__(y) <==> x>=y
 |  __getattribute__(...)
 |      x.__getattribute__('name') <==> x.name
 |  __getformat__(...)
 |      float.__getformat__(typestr) -> string
 |      You probably don't want to use this function.  It exists mainly to be
 |      used in Python's test suite.
 |      typestr must be 'double' or 'float'.  This function returns whichever of
 |      'unknown', 'IEEE, big-endian' or 'IEEE, little-endian' best describes the
 |      format of floating point numbers used by the C type named by typestr.
 |  __getnewargs__(...)
 |  __gt__(...)
 |      x.__gt__(y) <==> x>y
 |  __hash__(...)
 |      x.__hash__() <==> hash(x)
 |  __int__(...)
 |      x.__int__() <==> int(x)
 |  __le__(...)
 |      x.__le__(y) <==> x<=y
 |  __long__(...)
 |      x.__long__() <==> long(x)
 |  __lt__(...)
 |      x.__lt__(y) <==> x<y
 |  __mod__(...)
 |      x.__mod__(y) <==> x%y
 |  __mul__(...)
 |      x.__mul__(y) <==> x*y
 |  __ne__(...)
 |      x.__ne__(y) <==> x!=y
 |  __neg__(...)
 |      x.__neg__() <==> -x
 |  __nonzero__(...)
 |      x.__nonzero__() <==> x != 0
 |  __pos__(...)
 |      x.__pos__() <==> +x
 |  __pow__(...)
 |      x.__pow__(y[, z]) <==> pow(x, y[, z])
 |  __radd__(...)
 |      x.__radd__(y) <==> y+x
 |  __rdiv__(...)
 |      x.__rdiv__(y) <==> y/x
 |  __rdivmod__(...)
 |      x.__rdivmod__(y) <==> divmod(y, x)
 |  __repr__(...)
 |      x.__repr__() <==> repr(x)
 |  __rfloordiv__(...)
 |      x.__rfloordiv__(y) <==> y//x
 |  __rmod__(...)
 |      x.__rmod__(y) <==> y%x
 |  __rmul__(...)
 |      x.__rmul__(y) <==> y*x
 |  __rpow__(...)
 |      y.__rpow__(x[, z]) <==> pow(x, y[, z])
 |  __rsub__(...)
 |      x.__rsub__(y) <==> y-x
 |  __rtruediv__(...)
 |      x.__rtruediv__(y) <==> y/x
 |  __setformat__(...)
 |      float.__setformat__(typestr, fmt) -> None
 |      You probably don't want to use this function.  It exists mainly to be
 |      used in Python's test suite.
 |      typestr must be 'double' or 'float'.  fmt must be one of 'unknown',
 |      'IEEE, big-endian' or 'IEEE, little-endian', and in addition can only be
 |      one of the latter two if it appears to match the underlying C reality.
 |      Override the automatic determination of C-level floating point type.
 |      This affects how floats are converted to and from binary strings.
 |  __str__(...)
 |      x.__str__() <==> str(x)
 |  __sub__(...)
 |      x.__sub__(y) <==> x-y
 |  __truediv__(...)
 |      x.__truediv__(y) <==> x/y
 |  __trunc__(...)
 |      Return the Integral closest to x between 0 and x.
 |  as_integer_ratio(...)
 |      float.as_integer_ratio() -> (int, int)
 |      Return a pair of integers, whose ratio is exactly equal to the original
 |      float and with a positive denominator.
 |      Raise OverflowError on infinities and a ValueError on NaNs.
 |      >>> (10.0).as_integer_ratio()
 |      (10, 1)
 |      >>> (0.0).as_integer_ratio()
 |      (0, 1)
 |      >>> (-.25).as_integer_ratio()
 |      (-1, 4)
 |  conjugate(...)
 |      Return self, the complex conjugate of any float.
 |  fromhex(...)
 |      float.fromhex(string) -> float
 |      Create a floating-point number from a hexadecimal string.
 |      >>> float.fromhex('0x1.ffffp10')
 |      2047.984375
 |      >>> float.fromhex('-0x1p-1074')
 |      -4.9406564584124654e-324
 |  hex(...)
 |      float.hex() -> string
 |      Return a hexadecimal representation of a floating-point number.
 |      >>> (-0.1).hex()
 |      '-0x1.999999999999ap-4'
 |      >>> 3.14159.hex()
 |      '0x1.921f9f01b866ep+1'
 |  is_integer(...)
 |      Return True if the float is an integer.
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |  imag
 |      the imaginary part of a complex number
 |  real
 |      the real part of a complex number
 |  ----------------------------------------------------------------------
 |  Data and other attributes defined here:
 |  __new__ = <built-in method __new__ of type object>
 |      T.__new__(S, ...) -> a new object with type S, a subtype of T

What can we store?

In [21]:
# Simple scalar types
a = 23            # integer
a = -2.3e-43      # floating-point
a = 4.2+2.3j      # complex

a = "A string"    # strings
a = True          # booleans, False is also acceptable

a = None          # None is used to indicate "does not exist"/"invalid"/"undefined"/"missing"
In [23]:
# Containers
a = [1,2,3,"foo",[1,2]]                       # list (elements do not need to have the same type)
a = { "foo": 1, "bar": { "foobar": True }}    # dictionary
a = (True, 23.4, "false")                     # tuples (like lists but are immutable)

# Access elements using [...] syntax
a = [1,2,3]; print(a[1]);                     # the first element of a list is numbered 0
a = { "foo": 1 }; print(a["foo"])
a = (True, 23.4, "false"); print(a[0])        # again, first element is numbered 0


# We can change elements in a list
a = [1,2,3]
print(a)
a[1] = 4
print(a)
2
1
True
[1, 2, 3]
[1, 4, 3]
In [25]:
# but not in tuples
a = (1, 2, 3)
print(a)
a[1] = 4
(1, 2, 3)
TypeError                                 Traceback (most recent call last)
<ipython-input-25-20f3bc8b98a6> in <module>()
      2 a = (1, 2, 3)
      3 print(a)
----> 4 a[1] = 4

TypeError: 'tuple' object does not support item assignment
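If you do need a changed version of a tuple, the usual approach is to build a new one. A minimal sketch (the names point, updated, and as_list are invented for this example):

```python
point = (1, 2, 3)

try:
    point[1] = 4              # not allowed: tuples are immutable
    changed = True
except TypeError:
    changed = False

# Build a new tuple from slices of the old one ...
updated = point[:1] + (4,) + point[2:]

# ... or convert to a list, which can be changed in place
as_list = list(point)
as_list[1] = 4

print(changed, point, updated, as_list)
```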

The numpy module extends Python by providing "arrays"; these are similar in feel to Python lists but are much more efficient (quicker, use less memory) because they are not as flexible (each element has to have the same data type).

In [26]:
# np.arange is like Python's range
a = np.arange(12)
print(a)
[ 0  1  2  3  4  5  6  7  8  9 10 11]
In [28]:
# The resize method works "in place" (i.e. changes the object it is called on)
a.resize(3, 4)
print(a)
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
In [35]:
# The reshape method returns a new array object (a view on the
# same data where possible); b itself keeps its shape
b = np.arange(10, 34, 2)
c = b.reshape(3,4)
print(b)
print(c)
[10 12 14 16 18 20 22 24 26 28 30 32]
[[10 12 14 16]
 [18 20 22 24]
 [26 28 30 32]]
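One thing worth checking for yourself is whether the reshaped array is independent of the original. For a simple array like this one, reshape normally hands back a view on the same data rather than a separate copy, so a change made through c shows up in b as well. A short sketch, assuming numpy is installed and using np.shares_memory to test for the overlap:

```python
import numpy as np

b = np.arange(10, 34, 2)
c = b.reshape(3, 4)

# b keeps its one-dimensional shape; c is a new array object
shapes = (b.shape, c.shape)

# ... but both objects look at the same block of memory
shares = np.shares_memory(b, c)

c[0, 0] = -1                  # change an element through c
first_of_b = int(b[0])        # the change is visible through b too

# ask for an explicit copy if you want independent data
d = b.reshape(3, 4).copy()
d[0, 1] = -2                  # b is not affected by this

print(shapes, shares, first_of_b, int(b[1]))
```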
In [36]:
# By default many routines will "ignore" the dimensions of the data
print("b.sum = {}".format(b.sum()))
print("c.sum = {}".format(c.sum()))
b.sum = 252
c.sum = 252
In [41]:
# Some will work as "reducers" - i.e. here we sum along
# one axis, so converting a 2D array into a 1D one.
print("c.sum[axis=0] = {}".format(c.sum(axis=0)))
print("c.sum[axis=1] = {}".format(c.sum(axis=1)))
c.sum[axis=0] = [54 60 66 72]
c.sum[axis=1] = [ 52  84 116]
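If the axis argument feels mysterious, it can help to do the same sums by hand. This sketch uses plain nested lists (no numpy) holding the same values as c: summing along axis 0 collapses the rows and leaves one total per column, while summing along axis 1 leaves one total per row.

```python
rows = [[10, 12, 14, 16],
        [18, 20, 22, 24],
        [26, 28, 30, 32]]

# No axis: add up every element, ignoring the shape
total = sum(sum(row) for row in rows)

# Like axis=0: zip(*rows) groups the values column by column
column_sums = [sum(col) for col in zip(*rows)]

# Like axis=1: one sum for each row
row_sums = [sum(row) for row in rows]

print(total)
print(column_sums)
print(row_sums)
```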

I'm sold

You can start an interactive Python session with

ipython


or, to run in a notebook (browser)

ipython notebook

assuming you have a Python installation available, such as that provided by Anaconda.