The cell below loads the visual style of the notebook when run.
In [ ]:
from IPython.core.display import HTML
css_file = './styles/styles.css'
HTML(open(css_file, "r").read())

SciPy - Scientific algorithms for Python

Learning Objectives

  • Use SciPy for
    • Integration
    • Interpolation
    • Fourier analysis and period finding
    • Special Functions
  • Where to find help on scipy

So far we've seen how to read and write data to text files, how to plot data and how to do some basic analysis with NumPy. SciPy provides a wide array of tools for the scientist.

Scipy Overview

Some of the topics that SciPy covers are:

As well as the links above, a good place to get started with SciPy topics is the tutorial, which has sections on each topic.

We will come back to many of these topics later in the course. Specifically there will be lessons that focus on fitting models to data and solving differential equations. In this practical, I want to briefly look at some of the functionality within SciPy to tackle some common tasks we might encounter as astronomers. The idea is to get an idea of the kinds of things that SciPy can do, rather than exhaustively explore a single area.

I hope that this lecture will not be two overwhelming - it's purpose is to impress on you that a small amount of Python code, powered by SciPy, can acheive a lot of complex tasks - try not to worry if there are one or two things you do not understand...


Integration

Suppose we want to evaluate

$$ \int_a^b f(x)\, dx. $$

If $f(x)$ can be integrated analytically all is fine. What if there is no analytical solution? Or if we don't know the function $f(x)$ at all, but just have a series of measures of it's value? Numerical evaluation of an integral is called numerical quadrature, and scipy provides a series of functions for this - the quad, dblquad and tplquad for single, double and triple integrals respectively.

In [ ]:
from scipy.integrate import quad, dblquad, tplquad

Thes functions have a large number of optional arguments which alter the behaviour of the function (try quad? for example).

Basic usage works like this:

In [ ]:
# define a simple function for the integrand - in this case y = x
def f(x):
    return x
In [ ]:
x_lower = 2 # lower limit of integral
x_upper = 4 # upper limit

value, abserr = quad(f, x_lower, x_upper) # integrate numerically!

print ("The integral value is {}, with absolute error {}".format(value,abserr))

Thus, numerical integration of simple functions is pretty easy.

One optional argument is worth considering. The args keyword allows us to deal with integrand functions that take more than one argument. For example, the function below calculates $x$ raised to some, user-defined, power.

In [ ]:
# integrand function with two arguments
def f(x,power):
    return x**power

Let's say we want to use this to integrate $x^2$. We want the second argument to the function to be 2. We would integrate it using quad like so:

In [ ]:
x_lower = 3
x_upper = 6

#args is a list of the extra arguments to integrand function
value, abserr = quad(f,x_lower,x_upper, args=(2,) )  

print ("The integral value is {}, with absolute error {}".format(value,abserr))

Integrating under data

What if we don't know the underlying functional form, but just have a series of samples of the data? As an example, below we have some measurements of the radiation dose rate a spacecraft is receiving at different times. Suppose want to work out the total dose?

In [ ]:
import numpy as np
# refer to session 2 if you don't understand the next 3 lines!
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('bmh')

# t = time of measurements, in hours
# G = measured radiation at spacecraft, in Grays/hour
data = np.loadtxt('data/radiation_dose.txt')
t = data[:,0] # first column
G = data[:,1] # second column

fig,ax = plt.subplots()
ax.plot(t, G, 'ro') # plot with red circles
ax.plot(t, G, 'k-') # and a black line
ax.grid(True)
ax.set_xlabel('Time (hours)')
ax.set_ylabel('Radiation dose rate (Grays/hour)')
plt.show()

The total radiation received is the area under this curve. In the past you've probably found this by hand, by using the trapezium rule. Actually, it would not be too hard to write a function to integrate this, but why bother, when SciPy has this coded up for you, along with the (usually) more accurate Simpsons rule.

Add it up

Make a crude estimate by-eye of the area under the curve. Now, look at the online documentation for SciPy's version of Simpson's rule. Use Simpson's rule to estimate the area under the curve, and thus the total radiation dose...

In [ ]:
from scipy.integrate import simps

total_dose = 0.0
# YOUR CODE HERE
print ('Total dose = {:.1f} Grays'.format(total_dose))

To infinity and beyond

quad can accept infinite limits to the integral, by passing np.inf as the limit.

Calculate the integral

$$ E_n(x) = \int_1^\infty \frac{e^{-xt}}{t^n}\, dt $$

For $n=1$ and $x=0.5$

Hint: I have provided the integrand function already - you'll need to use the args keyword to make sure $n$ and $x$ have the correct values

In [ ]:
 def integrand(t, n, x):
    return np.exp(-x*t) / t**n
# YOUR CODE HERE

Interpolation

Interpolation is very easy in scipy. The interp1d function takes arrays of x and y data and returns a function that can be called for an arbitrary value of x, and returns the interpolated y value. Consider our spacecraft above:

In [ ]:
from scipy.interpolate import interp1d

# get a function that does linear interpolation
# arguments are 'x' and 'y' for the curve we wish to 
# interpolate
linear_interpolation = interp1d(t,G)

# call it to find dose at t=120
val = linear_interpolation(120.0)

print("Dose on spacecraft at 120 hours = {} Grays/hr".format(val))

# the value to calculate at can be an array, or list of values
# the function will return the interpolated value at each position
#in the list
print(linear_interpolation([120,130]))

The interp1d function takes a kind argument that allows you to set the type of interpolation:

In [ ]:
# make a fine grid of 1000 values
finely_spaced_t_values = np.linspace(t.min(), t.max(), 1000) 

# get a function that does linear interpolation
linear_interpolation = interp1d(t,G)
# call it
G_lin_interp = linear_interpolation(finely_spaced_t_values)

cubic_interpolation = interp1d(t,G,kind='cubic')
G_cub_interp = cubic_interpolation(finely_spaced_t_values)

# plot
fig,ax = plt.subplots(figsize=(10,5))
ax.plot(t, G, 'ro') # plot with red circles
ax.plot(finely_spaced_t_values, G_lin_interp, 'k--') # dashed line
ax.plot(finely_spaced_t_values, G_cub_interp, 'k:') # dotted line


# format plot
ax.grid(True)
ax.axis('tight')
ax.set_xlabel('Time (hours)')
ax.set_ylabel('Radiation dose rate (Grays/hour)')
plt.show()

Fourier transform and period finding

Fourier transforms are one of the most widespread tools in physics and astronomy. SciPy provides access to the efficient and well tested fourier transform library FFTPACK.

To use the fftpack module, import it with

In [ ]:
from scipy import fftpack

In astronomy, one important use of Fourier transforms is to find periodic signals in data, such as the period of a pulsating or rotating star. This works because the Fourier transform of an infinitely long sine wave is a delta function at the period of the wave. Let's look at an example. First, we'll load and plot the data:

In [ ]:
time, flux = np.loadtxt('data/lightcurve.txt',unpack=True)
fig,ax = plt.subplots()
ax.plot(time,flux,lw=1) # thinner linewidth
ax.set_xlabel('Time (hours)')
ax.set_ylabel('Flux')
plt.show()

We can see that it's a periodically variable star, and the period looks to be around 5 hours. Let's calculate a fourier transform to find the period:

In [ ]:
# calculate the fast fourier transform 
# gives an array of fourier transform values at each frequency
F = fftpack.rfft(flux)
# these values can be positive or negative, we only care about the size
F = np.abs(F)

#calculate the corresponding frequencies
N = len(time) # number of times
dt = time[1]-time[0] # time step between times
freq = fftpack.rfftfreq(N,dt)

#plot the fourier transform
fig,ax = plt.subplots()
ax.plot(freq,F)
ax.set_xlabel('Frequency (1/hour)')
ax.set_ylabel('Power')
ax.set_xlim(0,5)
plt.show()

As expected, the Fourier transform has a sharp peak around a frequency $f = 0.2$ hour$^{-1}$, which corresponds to a period $P = 1/f = 5$ hours. However, note the small second peak - this is telling us there is a second period present in the data at $\sim3$ hours, which we would have missed looking at the data by-eye!

The Lomb-Scargle periodogram

The Fourier transform only works on regularly spaced data. The Lomb-Scargle periodogram is a classic method for finding periodicity in irregularly-sampled data. Scipy has an implementation of the Lomb-Scargle periodogram. You can find a full tutorial on it's use here


Special Functions

We have only skimmed the surface of scipy in this lecture. Later we'll see examples of using scipy to solve differential equations, and fit models to data. Before we finish, I want to mention the special function module within scipy.

Special functions are mathematical functions which have established names because they are important to mathematics or physics. Many special functions are the solutions to differential equations or integrals.

To demonstrate the special functions I will use the Bessel functions. Bessel functions are the functions $y(x)$ which solve Bessel's equation

$$ x^2 \frac{d^2y}{dx^2} + x \frac{dy}{dx} + (x^2 - n^2) y = 0. $$

The solutions to this equation are called Bessel functions of the first kind. The solution for a given $n$ is denoted $J_n(x)$. No simple expressions for them exist, but they crop up frequently in astronomy - from studies of planetary dynamics, to calculating the rotation curves of galaxies. Therefore it is useful to have computer algorithms for calculating their value at a given value of $x$. Naturally, scipy contains these algorithms!

We use them in scipy as follows:

In [ ]:
from scipy.special import jn, jn_zeros
#
# The scipy.special module includes a large number of Bessel-functions
# We only use jn, which are the Bessel functions of the first kind 
# We also include the function jn_zeros that gives the locations of 
# the zeroes of the function jn
#

value = jn(0, 1.5)
print ("J_0(1.5) = {:.2f}".format( value ))

Let's use SciPy to plot the Bessel functions from $n=0...3$:

In [ ]:
x = np.linspace(0,10,100)
fig,ax = plt.subplots()
for n in range(4):
    ax.plot(x, jn(n, x), label=r'$J_{}(x)$'.format(n))
ax.legend()
ax.set_xlabel('x')
ax.set_ylabel('y')
plt.show()

And we can use the jn_zeros function to find the values of $x$ for which the Bessel functions are 0:

In [ ]:
# zeros of Bessel functions
n = 1 # 1st order Bessel Function
m = 3 # number of zeros to compute
print( jn_zeros(n,m) )

Homework #2

Diffraction: The Airy Rings

One of the places Bessel functions crop up in astronomy is diffraction from a circular aperture. As you recall, we have seen that the finite aperture causes diffraction. In turn this means that stars cannot be point sources in astronomical images. Instead, in the absence of blurring from seeing, the star produces a diffraction pattern, known as an Airy disc.

It is possible to show that, for an aperture of diameter $D$, the diffraction pattern produced at an angle $\theta$ has the form

$$ I(\theta) = I_0 \left( \frac{2J_1(x)}{x} \right)^2,$$

where

$$ x = \frac{\pi D}{\lambda} \sin \theta.$$

In the equations above, $\lambda$ is the wavelength of light, $I_0$ is the intensity when $\theta=0$ and $J_1$ is the Bessel function of the Bessel function of the first kind and order 1.

Question 1: 2 points

Complete the code below to write a function that calculates the diffraction pattern from a telescope. Make sure your function passes the tests below.

Hint: read the documentation of the function with care, and think hard about units.

In [ ]:
def diffraction(D,w,theta):
    """Calculate the intensity of the diffraction pattern from a telescope
    
    Args:
        D (float): the diameter of the telescope, in metres
        w (float): the wavelength of light, in metres
        theta (float): the angle from the centre of the star, IN ARCSECONDS
        
    Returns:
        float: the intensity of the diffraction pattern, normalised to unity at theta=0
    """
    # YOUR CODE HERE
    raise NotImplementedError()
In [ ]:
from nose.tools import assert_equals, assert_almost_equal
assert_almost_equal(diffraction(0.2,550e-9,0.5), 0.09191596586)

Question 2: 1 point

Plot the diffraction pattern for a 1m diameter telescope observing at a wavelength of 550 nm. You should calculate plot the diffraction pattern between 0 and 1 arcsecond.

Hint: the function you have written will probably return NaN (not a number) for $\theta=0$. Make sure you start calculating the diffraction pattern value at a very small, but non-zero value of $\theta$.

In [ ]:
# YOUR CODE HERE
raise NotImplementedError()

Question 3: 3 points

Below you have two cells, a code cell and a markdown cell. In the code cell, use the jn_zeros function to find the value of $x$ where the diffraction pattern has it's first zero. In the markdown cell, show that this means that the first zero lies at an angle

$$\theta = 1.22 \frac{\lambda}{D}.$$

In the markdown cell, briefly explain why this leads to the Rayleigh criterion for resolving two objects.

Hint: you might want to look over the bootcamp section on writing equations in markdown

In [ ]:
# YOUR CODE HERE
raise NotImplementedError()

YOUR ANSWER HERE

Question 4a: 1 point

In practice, we hardly ever reach the diffraction limit of a telescope. This is because the image is further blurred by turbulence in the atmosphere, or seeing.

Looking at the linked notes, we can see that one way to think about seeing is that large aperture telescopes collect wavefronts of light with many tiny corrugations. Each straight section of the wavefront produces a little Airy discs, which is shifted in position a small amount due to the tilt of the wavefront. In a single exposure, all these little Airy discs are added together to make the final image of a star.

In the cells below, you will simulate this process for a 1m telescope observing at 550 nm. This is fairly complex, so I will guide you through it step by step.

To start with, we need to modify the Airy disc function above to accept a fourth argument, which will be the amount the Airy disc is shifted due to seeing, measured in arcseconds. Make sure the modified function passes the tests below before proceeding!

In [ ]:
def diffraction(D,w,theta,theta_off):
    """Calculate the intensity of the diffraction pattern from a telescope
    
    Args:
        D (float): the diameter of the telescope, in metres
        w (float): the wavelength of light, in metres
        theta (float): the angle from the centre of the star, IN ARCSECONDS
        theta_off (float): an amount to shift the diffraction pattern by, IN ARCSECONDS
        
    Returns:
        float: the intensity of the diffraction pattern, normalised to unity at theta=0
    """
# YOUR CODE HERE
raise NotImplementedError()
In [ ]:
from nose.tools import assert_equals, assert_almost_equal
assert_almost_equal(diffraction(0.2,550e-9,0.5,0.0), 0.09191596586)
assert_almost_equal(diffraction(0.2,550e-9,0.5,0.5001), 0.999999923313)

Question 4b: 3 points

Next we will simulate the effect of seeing by creating 50,000 little Airy disc patterns, each offset by a random amount and adding them together to find the resulting stellar profile. To help you, I explain how to do this in more detail:

  1. Use np.linspace to make an x grid between 0 and 2 arcseconds, with 100 steps
  2. Use np.zeros to create an array of zeros with 100 entries. We will use this to store our stellar profile as it builds up.
  3. Create a matplotlib figure and axis
  4. Use a for loop to do the following 50,000 times:
    • Simulate the effect of 0.5 arcsecond seeing by calculating a random number, which is normally distributed with a mean of 0 and a standard deviation of 0.5. This random number will be the shift in this Airy disc pattern, caused by the seeing.
    • Using this random number as the offset argument, calculate the diffraction pattern
    • each time around the for loop, add your diffraction pattern into the total diffraction pattern array
  5. Plot the total diffraction pattern
In [ ]:
# YOUR CODE HERE
raise NotImplementedError()

Extra Credit (4 points)

So far everything we have done has been in 1D. For extra credit this week, I want you to calculate and plot the Airy disc in two dimensions. If you've used numpy functions for your diffraction function it should work equally well when the theta argument is a scalar, a 1D array or a 2D array.

The challenge, then, is to create a 2D array of theta values where each entry in the array is the distance (in arcseconds) from the central array entry. I won't give you too much help here - so unleash your Google skills! The only hint I'll give you is that part of the solution is to use numpy's meshgrid function.

When you display your 2D Airy disc, try looking at the help for matplotlib's imshow function and use the vmin and vmax arguments to highlight the outer structure in the Airy disc.

In [ ]:
# YOUR CODE HERE
raise NotImplementedError()