Week 05 - Numpy II and Scipy

Today's Agenda

  • Numpy II
  • Scipy

Numpy II

Last time, in Numpy I, we covered Numpy and Matplotlib. This time we will focus on more advanced concepts of Numpy.

In [1]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

Review

As a review, let's explore some of the concepts that were introduced last time, in Numpy I.

Create 1D-arrays

We introduced how to create a 1D-array

In [2]:
x = np.array([1,2,3,5,6,7,8,10],dtype=float)
x
Out[2]:
array([  1.,   2.,   3.,   5.,   6.,   7.,   8.,  10.])
In [3]:
y = np.arange(10)
y
Out[3]:
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [4]:
z = np.linspace(0,100,50)
z
Out[4]:
array([   0.        ,    2.04081633,    4.08163265,    6.12244898,
          8.16326531,   10.20408163,   12.24489796,   14.28571429,
         16.32653061,   18.36734694,   20.40816327,   22.44897959,
         24.48979592,   26.53061224,   28.57142857,   30.6122449 ,
         32.65306122,   34.69387755,   36.73469388,   38.7755102 ,
         40.81632653,   42.85714286,   44.89795918,   46.93877551,
         48.97959184,   51.02040816,   53.06122449,   55.10204082,
         57.14285714,   59.18367347,   61.2244898 ,   63.26530612,
         65.30612245,   67.34693878,   69.3877551 ,   71.42857143,
         73.46938776,   75.51020408,   77.55102041,   79.59183673,
         81.63265306,   83.67346939,   85.71428571,   87.75510204,
         89.79591837,   91.83673469,   93.87755102,   95.91836735,
         97.95918367,  100.        ])
In [5]:
h = np.random.randn(100)
h
Out[5]:
array([ 0.83750222,  0.72756919,  0.46886074, -2.62190382,  1.42631581,
       -0.67845114,  1.55721883,  0.75780596,  0.33130481, -0.84227672,
       -0.67615973, -0.80770128, -1.06770136,  0.8316753 , -1.48003128,
       -0.62884149,  1.44292394,  1.6873517 , -0.39379518,  0.20679185,
       -0.60037733, -1.13283498, -0.57879882, -1.15478064, -1.02835363,
       -0.27996954, -0.48096109,  0.1021943 ,  1.07518864,  0.63877344,
       -0.3250393 , -0.45777746,  0.74853747, -0.01529722,  1.24361657,
       -0.04615094, -0.42099845, -0.49830421, -0.63905389, -1.41180759,
        0.50483609,  0.37209184, -0.54757462, -0.07875835, -0.77888681,
        0.30262381, -0.29802849, -0.75109444, -1.38946554, -0.68889992,
       -1.16205261, -2.14828345,  0.83808458, -1.12204052,  0.79596845,
        1.12354065,  0.61477048, -0.96238095, -0.22881613, -0.4057673 ,
        0.91075627, -2.64430738,  0.93424681,  0.64474075,  0.5467888 ,
        0.78418197,  2.0177139 , -2.62813792,  0.39324099, -1.47314286,
       -0.86644329, -0.32973047, -0.18273216,  1.58910071, -0.46382031,
       -0.25799498,  0.9110974 ,  0.03276809,  1.26063654, -2.76816364,
       -0.36334006,  0.8052834 , -0.54148883, -0.00623805,  1.06663428,
        0.81046827,  1.17079664,  1.21784954, -1.18059528, -0.08302514,
       -0.48945512,  0.26548053,  0.98639233,  0.91047992, -0.41426257,
        1.18547661,  0.94759834, -1.15599518, -0.08423546, -1.54095918])

Handling arrays

These are just a few of the different ways to create numpy arrays.

You can also use functions like np.max() and np.min() to get the maximum and minimum values, respectively.

In [6]:
print('Min X: {0:.3f} \t Max X: {1:.3f}'.format(np.min(x), np.max(x)) )
Min X: 1.000 	 Max X: 10.000

Apply mathematical functions

In [7]:
zz = x**2 + 3*x**3
zz
Out[7]:
array([    4.,    28.,    90.,   400.,   684.,  1078.,  1600.,  3100.])

Conditionals

Find the indices of the elements in an array that meet some criterion. In this example, we're finding all the elements of "zz" that lie between 100 and 500, inclusive.

In [8]:
zz_idx = np.where((zz>= 100)&(zz <= 500))[0]
print('zz_idx: {0}'.format(zz_idx))

zz[zz_idx]
zz_idx: [3]
Out[8]:
array([ 400.])

Manipulating Arrays

There are a lot of things we can do to a numpy array.

In [9]:
h1 = np.random.randint(10, 50, 50)
h1
Out[9]:
array([49, 16, 47, 32, 14, 31, 29, 18, 39, 13, 17, 47, 42, 23, 27, 20, 37,
       19, 18, 38, 21, 33, 10, 22, 35, 40, 17, 41, 41, 18, 36, 21, 27, 35,
       31, 18, 45, 32, 41, 31, 25, 28, 13, 13, 49, 22, 47, 10, 12, 45])

We can get the overall size and shape of the array.

We can use the functions numpy.size and numpy.shape to get the total number of elements in an array and the shape of the array, respectively.

In [10]:
np.size(h1)
Out[10]:
50
In [11]:
h1.shape
Out[11]:
(50,)
In [12]:
A = np.array([[1,2,3,4,5],
              [6,7,8,9,10],
              [12,13,14,16,17],
              [13,45,67,89,90] ])
A
Out[12]:
array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10],
       [12, 13, 14, 16, 17],
       [13, 45, 67, 89, 90]])
In [13]:
np.shape(A)
Out[13]:
(4, 5)

You can also transpose array A.

In [14]:
A_t = np.transpose(A)
A_t
Out[14]:
array([[ 1,  6, 12, 13],
       [ 2,  7, 13, 45],
       [ 3,  8, 14, 67],
       [ 4,  9, 16, 89],
       [ 5, 10, 17, 90]])

Why are Numpy arrays better than lists?

  • Python lists are very general.
  • Lists do not support matrix and dot multiplications, etc.
  • Numpy arrays are memory efficient.
  • Numpy arrays are statically typed and homogeneous.
  • Mathematical functions on them are fast.
  • They interface easily with compiled languages, e.g. C and Fortran.
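A quick sketch of what "lists do not support elementwise math" means in practice:

```python
import numpy as np

lst = [1, 2, 3, 4]
arr = np.array(lst)

# Elementwise math works directly on arrays
print(arr * 2 + 1)                 # [3 5 7 9]

# On lists, `*` means repetition and `+ 1` is a TypeError,
# so you need an explicit loop or comprehension instead
print([x * 2 + 1 for x in lst])    # [3, 5, 7, 9]

# Arrays are homogeneous: mixed input is upcast to one dtype
print(np.array([1, 2.5]).dtype)    # float64
```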

Array-generating functions

For large arrays it is impractical to initialize the data manually using normal Python lists. Instead, we can use many of the Numpy functions to generate arrays of different forms.

numpy.arange

We use this function to create a sequence of evenly spaced, ordered elements.

In [15]:
np.arange(0,10,1)
Out[15]:
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [16]:
np.arange(0,20,5)
Out[16]:
array([ 0,  5, 10, 15])
In [17]:
np.arange(-40,21,10)
Out[17]:
array([-40, -30, -20, -10,   0,  10,  20])

linspace and logspace

We use these functions to create arrays of ordered values, evenly spaced in linear and log space, respectively.

In [18]:
B = np.linspace(0,50)
B
Out[18]:
array([  0.        ,   1.02040816,   2.04081633,   3.06122449,
         4.08163265,   5.10204082,   6.12244898,   7.14285714,
         8.16326531,   9.18367347,  10.20408163,  11.2244898 ,
        12.24489796,  13.26530612,  14.28571429,  15.30612245,
        16.32653061,  17.34693878,  18.36734694,  19.3877551 ,
        20.40816327,  21.42857143,  22.44897959,  23.46938776,
        24.48979592,  25.51020408,  26.53061224,  27.55102041,
        28.57142857,  29.59183673,  30.6122449 ,  31.63265306,
        32.65306122,  33.67346939,  34.69387755,  35.71428571,
        36.73469388,  37.75510204,  38.7755102 ,  39.79591837,
        40.81632653,  41.83673469,  42.85714286,  43.87755102,
        44.89795918,  45.91836735,  46.93877551,  47.95918367,
        48.97959184,  50.        ])
In [19]:
B = np.linspace(0,100, 20)
B
Out[19]:
array([   0.        ,    5.26315789,   10.52631579,   15.78947368,
         21.05263158,   26.31578947,   31.57894737,   36.84210526,
         42.10526316,   47.36842105,   52.63157895,   57.89473684,
         63.15789474,   68.42105263,   73.68421053,   78.94736842,
         84.21052632,   89.47368421,   94.73684211,  100.        ])

Array of 25 elements from $10^{0}$ to $10^{3}$, with base of 10.

In [20]:
B = np.logspace(0,3,25)
B
Out[20]:
array([    1.        ,     1.33352143,     1.77827941,     2.37137371,
           3.16227766,     4.21696503,     5.62341325,     7.49894209,
          10.        ,    13.33521432,    17.7827941 ,    23.71373706,
          31.6227766 ,    42.16965034,    56.23413252,    74.98942093,
         100.        ,   133.35214322,   177.827941  ,   237.13737057,
         316.22776602,   421.69650343,   562.34132519,   749.89420933,
        1000.        ])

Creating an array of 11 elements from $e^{0}$ to $e^{10}$, with base numpy.e.

In [21]:
B = np.logspace(0,10,11, base=np.e)
B
Out[21]:
array([  1.00000000e+00,   2.71828183e+00,   7.38905610e+00,
         2.00855369e+01,   5.45981500e+01,   1.48413159e+02,
         4.03428793e+02,   1.09663316e+03,   2.98095799e+03,
         8.10308393e+03,   2.20264658e+04])

Random Data

In [22]:
from numpy import random
In [23]:
# Uniform random numbers in [0,1]
random.rand(5,5)
Out[23]:
array([[ 0.70333174,  0.82201784,  0.86694282,  0.45872856,  0.65237078],
       [ 0.92471522,  0.07738043,  0.00792367,  0.89370944,  0.12375922],
       [ 0.01117345,  0.94193399,  0.66596627,  0.97658871,  0.8521725 ],
       [ 0.17140515,  0.27607763,  0.61752201,  0.0452752 ,  0.70188108],
       [ 0.06569232,  0.63847745,  0.6397245 ,  0.99397272,  0.61126429]])
In [24]:
# 20 Random integers from 10 to 30
random.randint(10,30,20)
Out[24]:
array([21, 23, 24, 13, 22, 18, 13, 12, 21, 27, 26, 23, 28, 19, 26, 18, 27,
       15, 12, 27])

Arrays of zeros and ones.

In [25]:
np.zeros(20)
Out[25]:
array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.])

You can use these to populate other arrays

In [26]:
nelem = 10
C = np.ones(10)
C
Out[26]:
array([ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.])
In [27]:
for ii in range(C.size):
    C[ii] = random.rand()
C
Out[27]:
array([ 0.62330987,  0.00501833,  0.85661478,  0.11083291,  0.44725486,
        0.47957761,  0.74933274,  0.59828477,  0.34619114,  0.48713529])

Diagonals

You can also construct an array with another array as the diagonal

In [28]:
np.diag(random.randint(10,20,5))
Out[28]:
array([[15,  0,  0,  0,  0],
       [ 0, 15,  0,  0,  0],
       [ 0,  0, 14,  0,  0],
       [ 0,  0,  0, 18,  0],
       [ 0,  0,  0,  0, 11]])
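np.diag also works in the other direction: given a 2-D array it extracts the diagonal, while a 1-D input builds a diagonal matrix. A quick sketch:

```python
import numpy as np

M = np.arange(9).reshape((3, 3))

# 2-D input: extract the diagonal
print(np.diag(M))           # [0 4 8]

# 1-D input: build a 3x3 matrix with these values on the diagonal
print(np.diag([7, 8, 9]))
```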

Indexing

You can choose which values to select. Normally, you index the rows first, and then the columns, of a numpy.ndarray.

In [29]:
M = random.rand(10,5)
M
Out[29]:
array([[ 0.63421292,  0.51336646,  0.35657472,  0.84240729,  0.96886034],
       [ 0.54602167,  0.61559373,  0.70844189,  0.27409218,  0.97493221],
       [ 0.79749899,  0.48887077,  0.23291903,  0.80607861,  0.95317165],
       [ 0.01174903,  0.44857507,  0.07807561,  0.30203585,  0.71012929],
       [ 0.05193125,  0.78803809,  0.82081371,  0.11816636,  0.79262616],
       [ 0.29631858,  0.48613545,  0.03677773,  0.28241598,  0.0182624 ],
       [ 0.38232493,  0.30529739,  0.77661725,  0.50868523,  0.73230337],
       [ 0.85892041,  0.44281793,  0.30585396,  0.81619482,  0.09236484],
       [ 0.39316536,  0.02240015,  0.87000599,  0.65969264,  0.30898496],
       [ 0.40240788,  0.1709426 ,  0.89597609,  0.14847419,  0.28419652]])

Selecting the second row (index 1)

In [30]:
M[1,:]
Out[30]:
array([ 0.54602167,  0.61559373,  0.70844189,  0.27409218,  0.97493221])

The second column (index 1)

In [31]:
M[:,1]
Out[31]:
array([ 0.51336646,  0.61559373,  0.48887077,  0.44857507,  0.78803809,
        0.48613545,  0.30529739,  0.44281793,  0.02240015,  0.1709426 ])

Select a range of columns and rows

In [32]:
M[1:3, 2:4]
Out[32]:
array([[ 0.70844189,  0.27409218],
       [ 0.23291903,  0.80607861]])

You can easily use this to create a mask, for when you are cleaning your data.

In [33]:
A = random.rand(3,3)
np.fill_diagonal(A, np.nan)
A
Out[33]:
array([[        nan,  0.18696976,  0.45805872],
       [ 0.65087411,         nan,  0.6289959 ],
       [ 0.78895587,  0.04721399,         nan]])
In [34]:
B = np.arange(0,9).reshape((3,3))
B
Out[34]:
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

Applying the mask from $A$ to $B$

In [35]:
A_mask = np.isfinite(A)
A_mask
Out[35]:
array([[False,  True,  True],
       [ True, False,  True],
       [ True,  True, False]], dtype=bool)
In [36]:
B[A_mask]
Out[36]:
array([1, 2, 3, 5, 6, 7])
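As an aside (not covered in the lesson), numpy's masked-array module can do the same job without building the boolean mask yourself: statistics on a masked array skip the masked entries automatically. A minimal sketch:

```python
import numpy as np

A = np.array([[np.nan, 0.2,    0.5],
              [0.6,    np.nan, 0.3],
              [0.8,    0.1,    np.nan]])

# masked_invalid masks NaN/inf entries; reductions then ignore them
A_ma = np.ma.masked_invalid(A)
print(A_ma.mean())    # mean over the six finite entries
```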

Binning your data

This is probably one of the most useful features of Numpy. You can use it to bin your data, and calculate means, standard deviations, etc.

numpy.digitize

In [37]:
# Creating my bin edges
bins = np.arange(0,13)
bins
Out[37]:
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12])
In [38]:
# Generating Data
data = 10*random.rand(100)
data
Out[38]:
array([ 5.90317954,  5.35358527,  5.21419733,  3.42967037,  2.86224222,
        6.11096249,  1.95423629,  4.36342125,  7.25799434,  1.54609088,
        0.72642201,  4.78934623,  3.93302968,  3.68390917,  7.68534614,
        3.55537239,  9.21838273,  9.22987585,  8.96152438,  8.31621782,
        9.30807818,  6.14835972,  9.778559  ,  2.44184111,  1.98587576,
        4.22691609,  5.4981662 ,  1.00559139,  9.01167816,  8.20859076,
        1.97974162,  1.0483506 ,  2.75088446,  3.77448632,  8.18256242,
        6.31300584,  7.50397032,  5.34857889,  5.10785904,  8.93012728,
        9.27372029,  1.7611759 ,  6.60773795,  1.54883243,  6.32796315,
        9.90382221,  6.35113732,  9.34146487,  4.99957054,  4.69721008,
        7.70600515,  7.37731352,  5.79169937,  0.47538048,  6.19525903,
        7.46575739,  3.07154341,  9.47508329,  6.30866702,  4.61427438,
        5.38557581,  7.58639921,  7.23312292,  0.24806187,  5.87968887,
        2.86711537,  6.03587702,  6.44504998,  0.70319294,  0.92092988,
        9.72635171,  1.7052563 ,  5.06069725,  3.09859745,  1.30786496,
        0.85679922,  4.24514051,  9.2030531 ,  0.07054763,  4.89661628,
        4.3830221 ,  9.45581732,  5.65104052,  5.53582213,  4.1341352 ,
        5.51723999,  1.60328303,  1.77015089,  9.71183461,  1.81226424,
        6.98969811,  3.37602433,  4.02928118,  9.25202018,  5.56791132,
        5.25108506,  2.54666044,  3.76347944,  2.93046016,  2.01268476])

Now I want to bin my data and calculate the mean for each bin

In [39]:
# Defining statistical function to use
stat_func = np.nanmean
# Binning the data
data_bins = np.digitize(data, bins)
data_bins
Out[39]:
array([ 6,  6,  6,  4,  3,  7,  2,  5,  8,  2,  1,  5,  4,  4,  8,  4, 10,
       10,  9,  9, 10,  7, 10,  3,  2,  5,  6,  2, 10,  9,  2,  2,  3,  4,
        9,  7,  8,  6,  6,  9, 10,  2,  7,  2,  7, 10,  7, 10,  5,  5,  8,
        8,  6,  1,  7,  8,  4, 10,  7,  5,  6,  8,  8,  1,  6,  3,  7,  7,
        1,  1, 10,  2,  6,  4,  2,  1,  5, 10,  1,  5,  5, 10,  6,  6,  5,
        6,  2,  2, 10,  2,  7,  4,  5, 10,  6,  6,  3,  4,  3,  3])

Calculating the mean for each of the bins

In [40]:
failval = -10
bins_stat = np.array([stat_func(data[data_bins == ii])
                      if len(data[data_bins == ii]) > 0
                      else failval
                      for ii in range(1, len(bins))])
bins_stat
Out[40]:
array([  0.57161915,   1.61759341,   2.63026979,   3.52067917,
         4.48899399,   5.47108844,   6.34851978,   7.47698862,
         8.51980453,   9.42069582, -10.        , -10.        ])
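For reference, scipy.stats.binned_statistic performs the digitize-plus-comprehension steps above in a single call; empty bins come back as NaN rather than a custom fail value. A minimal sketch with the same bin edges:

```python
import numpy as np
from scipy import stats

bins = np.arange(0, 13)           # same bin edges as above
data = 10 * np.random.rand(100)

# Digitizes `data` into `bins` and applies the statistic per bin;
# statistic can be 'mean', 'median', 'std', 'count', or any callable
means, edges, binnumber = stats.binned_statistic(data, data,
                                                 statistic='mean',
                                                 bins=bins)
print(means)    # empty bins come back as NaN
```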

You can put all of this into a function that estimates errors and more...

In [54]:
import math

def myceil(x, base=10):
    """
    Returns the upper-bound integer of 'x' in base 'base'.

    Parameters
    ----------
    x: float
        number to be approximated to closest number to 'base'

    base: float
        base used to calculate the closest 'largest' number

    Returns
    -------
    n_high: float
        Closest float number to 'x', i.e. upper-bound float.

    Example
    -------
    >>> myceil(12, 10)
    20.0
    >>> myceil(12.05, 0.1)
    12.1
    """
    n_high = float(base*math.ceil(float(x)/base))

    return n_high

def myfloor(x, base=10):
    """
    Returns the lower-bound integer of 'x' in base 'base'

    Parameters
    ----------
    x: float
        number to be approximated to closest number of 'base'

    base: float
        base used to calculate the closest 'smallest' number

    Returns
    -------
    n_low: float
        Closest float number to 'x', i.e. lower-bound float.

    Example
    -------
    >>> myfloor(12, 5)
    10.0
    """
    n_low = float(base*math.floor(float(x)/base))

    return n_low

def Bins_array_create(arr, base=10):
    """
    Generates array between [arr.min(), arr.max()] in steps of `base`.

    Parameters
    ----------
    arr: array_like, Shape (N,...), One-dimensional
        Array of numerical elements

    base: float, optional (default=10)
        Interval between bins

    Returns
    -------
    bins_arr: array_like
        Array of bin edges for given arr

    """
    base = float(base)
    arr  = np.array(arr)
    assert(arr.ndim==1)
    arr_min  = myfloor(arr.min(), base=base)
    arr_max  = myceil( arr.max(), base=base)
    bins_arr = np.arange(arr_min, arr_max+0.5*base, base)

    return bins_arr
In [108]:
def Mean_std_calc_one_array(x1, y1, arr_len=0, statfunc=np.nanmean,
                           failval=np.nan, error='std',
                           base=10.):
    """
    Calculates statistics of two arrays, e.g. scatter,
    error in `statfunc`, etc.
    
    Parameters
    ----------
    x1: array-like, shape (N,)
        array of x-values
    
    y1: array-like, shape (N,)
        array of y-values
    
    arr_len: int, optional (default = 0)
        minimum number of elements in the bin
    
    statfunc: numpy function, optional (default = numpy.nanmean)
        statistical function used to evaluate the bins
    
    failval: int or float, optional (default = numpy.nan)
        Number to use to replace when the number of elements in the 
        bin is smaller than `arr_len`
    
    error: string, optional (default = 'std')
        type of error to evaluate
        Options:
            - 'std': Evaluates the standard deviation of the bin
            - 'stat': Evaluates the error in the mean/median of each bin
            - 'none': Does not calculate the error in `y1`
    base: float
        Value of bin width in units of that of `x1`
    
    Returns
    --------
    x1_stat: array-like, shape (N,)
        `statfunc` of each bin in `base` spacings for x1

    y1_stat: array-like, shape (N,)
        `statfunc` of each bin in `base` spacings for y1

    y1_err: array-like, shape (N,)
        error on `y1_stat` in each bin, as selected by `error`
    """
    x1 = np.asarray(x1)
    y1 = np.asarray(y1)
    assert((x1.ndim==1) & (y1.ndim==1))
    assert((x1.size >0) & (y1.size>0))
    n_elem = len(x1)
    ## Computing Bins
    x1_bins  = Bins_array_create(x1, base=base)
    x1_digit = np.digitize(x1, x1_bins)
    ## Computing Statistics in bins
    x1_stat = np.array([statfunc(x1[x1_digit==ii]) 
                                 if len(x1[x1_digit==ii])>arr_len 
                                 else failval
                                 for ii in range(1,x1_bins.size)])
    y1_stat = np.array([statfunc(y1[x1_digit==ii])
                                 if len(y1[x1_digit==ii])>arr_len 
                                 else failval
                                 for ii in range(1,x1_bins.size)])
    ## Computing error in the data
    if error == 'std':
        stat_err = np.nanstd
        y1_err = np.array([stat_err(y1[x1_digit==ii])
                                     if len(y1[x1_digit==ii])>arr_len
                                     else failval
                                     for ii in range(1,x1_bins.size)])
    elif error == 'stat':
        # Error in the mean/median: std / sqrt(N), times 1.253 for medians
        stat_err = np.nanstd
        y1_err = np.array([stat_err(y1[x1_digit==ii])/np.sqrt(len(y1[x1_digit==ii]))
                          if len(y1[x1_digit==ii])>arr_len
                          else failval
                          for ii in range(1,x1_bins.size)])
        if (statfunc==np.median) or (statfunc==np.nanmedian):
            y1_err *= 1.253
    else:
        y1_err = np.zeros(y1_stat.size)
    
    return x1_stat, y1_stat, y1_err

Example of using these functions:

In [125]:
import numpy as np
x_arr = np.arange(100)
y_arr = 50*np.random.randn(x_arr.size)

# Computing mean and error in the mean for `x_arr` and `y_arr`
x_stat, y_stat, y_err = Mean_std_calc_one_array(x_arr, y_arr,
                                               statfunc=np.nanmean,
                                               failval=np.nan,
                                               base=10)
x_stat2, y_stat2, y_err2 = Mean_std_calc_one_array(x_arr, y_arr,
                                               statfunc=np.nanmedian,
                                               failval=np.nan,
                                               base=10)
In [127]:
plt.style.use('seaborn-notebook')
plt.clf()
plt.close()
fig = plt.figure(figsize=(12,8))
ax  = fig.add_subplot(111, facecolor='white')
ax.plot(x_arr, y_arr, 'ro', label='Data')
ax.errorbar(x_stat, y_stat, yerr=y_err, color='blue', marker='o',
           linestyle='--',label='Mean')
ax.errorbar(x_stat2, y_stat2, yerr=y_err2, color='green', marker='o',
           linestyle='--',label='Median')
ax.set_xlabel('X axis', fontsize=20)
ax.set_ylabel('Y axis', fontsize=20)
ax.set_title('Data and the Binned Data', fontsize=24)
plt.legend(fontsize=20)
plt.show()

With this function, it is really easy to apply statistics on binned data, as well as to estimate errors on the data.

Reshaping, resizing and stacking arrays

One can always modify the shape of a numpy.ndarray, as well as stack it with other arrays.

In [41]:
A = np.array([[n+m*10 for n in range(5)] for m in range(5)])
A
Out[41]:
array([[ 0,  1,  2,  3,  4],
       [10, 11, 12, 13, 14],
       [20, 21, 22, 23, 24],
       [30, 31, 32, 33, 34],
       [40, 41, 42, 43, 44]])
In [42]:
n, m = A.shape
In [43]:
B = A.reshape((1,n*m))
B
Out[43]:
array([[ 0,  1,  2,  3,  4, 10, 11, 12, 13, 14, 20, 21, 22, 23, 24, 30, 31,
        32, 33, 34, 40, 41, 42, 43, 44]])
In [44]:
A_f = A.flatten()
In [45]:
C = random.rand(A.size)
C
Out[45]:
array([ 0.19128911,  0.57823882,  0.24878763,  0.39657524,  0.60868046,
        0.63974783,  0.46267723,  0.01998228,  0.55092204,  0.63844936,
        0.08424844,  0.1178531 ,  0.2605398 ,  0.18705487,  0.26524993,
        0.55921608,  0.70568846,  0.80065224,  0.32349928,  0.5289561 ,
        0.91586611,  0.81767972,  0.17038868,  0.28769872,  0.06057029])
In [46]:
C.shape
Out[46]:
(25,)
In [47]:
# Stacking the two arrays
D = np.column_stack((A_f,C))
D
Out[47]:
array([[  0.00000000e+00,   1.91289109e-01],
       [  1.00000000e+00,   5.78238816e-01],
       [  2.00000000e+00,   2.48787626e-01],
       [  3.00000000e+00,   3.96575235e-01],
       [  4.00000000e+00,   6.08680462e-01],
       [  1.00000000e+01,   6.39747834e-01],
       [  1.10000000e+01,   4.62677232e-01],
       [  1.20000000e+01,   1.99822778e-02],
       [  1.30000000e+01,   5.50922038e-01],
       [  1.40000000e+01,   6.38449356e-01],
       [  2.00000000e+01,   8.42484394e-02],
       [  2.10000000e+01,   1.17853100e-01],
       [  2.20000000e+01,   2.60539795e-01],
       [  2.30000000e+01,   1.87054871e-01],
       [  2.40000000e+01,   2.65249932e-01],
       [  3.00000000e+01,   5.59216077e-01],
       [  3.10000000e+01,   7.05688457e-01],
       [  3.20000000e+01,   8.00652237e-01],
       [  3.30000000e+01,   3.23499284e-01],
       [  3.40000000e+01,   5.28956104e-01],
       [  4.00000000e+01,   9.15866113e-01],
       [  4.10000000e+01,   8.17679723e-01],
       [  4.20000000e+01,   1.70388681e-01],
       [  4.30000000e+01,   2.87698715e-01],
       [  4.40000000e+01,   6.05702928e-02]])
In [48]:
# Selecting the 3rd through 10th rows (indices 2-9)
D[2:10]
Out[48]:
array([[  2.        ,   0.24878763],
       [  3.        ,   0.39657524],
       [  4.        ,   0.60868046],
       [ 10.        ,   0.63974783],
       [ 11.        ,   0.46267723],
       [ 12.        ,   0.01998228],
       [ 13.        ,   0.55092204],
       [ 14.        ,   0.63844936]])

np.concatenate

You can also concatenate different arrays

In [49]:
a = np.array([[1, 2], [3, 4]])
b = np.array([[5,6]])
In [50]:
np.concatenate((a,b))
Out[50]:
array([[1, 2],
       [3, 4],
       [5, 6]])
In [51]:
np.concatenate((a,b.T), axis=1)
Out[51]:
array([[1, 2, 5],
       [3, 4, 6]])
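The same two stackings can also be written with np.vstack and np.hstack, which stack along rows (axis 0) and columns (axis 1) respectively:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6]])

# vstack stacks along rows: same result as np.concatenate((a, b))
print(np.vstack((a, b)))

# hstack stacks along columns: same as np.concatenate((a, b.T), axis=1)
print(np.hstack((a, b.T)))
```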

Copy and "Deep Copy"

Sometimes it is important to create new, independent copies of arrays and other objects. For this, one uses numpy.copy to create new copies of arrays

In [52]:
A = np.array([[1, 2], [3, 4]])
A
Out[52]:
array([[1, 2],
       [3, 4]])
In [53]:
# `B` is now referring to the same array data as `A`
B = A

If we make any changes to B, A will also be affected by this change.

In [54]:
B[0,0] = 10
B
Out[54]:
array([[10,  2],
       [ 3,  4]])
In [55]:
A
Out[55]:
array([[10,  2],
       [ 3,  4]])

To get a completely independent, new object, you would use:

In [56]:
B = np.copy(A)
# Modifying `B`
B[0,0] = -5
B
Out[56]:
array([[-5,  2],
       [ 3,  4]])
In [57]:
A
Out[57]:
array([[10,  2],
       [ 3,  4]])

The array A was not affected by this change. This is important when you're constantly re-defining new arrays
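Note that plain slicing also returns a view rather than a copy, so the same aliasing can bite you even without writing B = A. A quick sketch:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])

# A slice is a *view*: it shares memory with the original array
row = A[0, :]
row[0] = 99
print(A[0, 0])    # the original changed too

# .copy() (like np.copy) gives independent data
B = A.copy()
B[0, 0] = -5
print(A[0, 0])    # the original is untouched this time
```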

Scipy - Library of Scientific Algorithms for Python

SciPy provides a large number of higher-level scientific algorithms. It includes:

  • Special Functions
  • Integration
  • Optimization
  • Interpolation
  • Fourier Transforms
  • Signal Processing
  • Linear Algebra
  • Statistics
  • Multi-dimensional image processing
In [58]:
from scipy import *

Interpolation

You can use Scipy to interpolate your data. The interp1d function builds an interpolating function from your data points.

In [59]:
from scipy.interpolate import interp1d
In [60]:
def f(x):
    return np.sin(x)
In [61]:
n = np.arange(0, 10)
x = np.linspace(0, 9, 100)

y_meas = f(n) + 0.1 * np.random.randn(len(n)) # simulate measurement with noise
y_real = f(x)

linear_interpolation = interp1d(n, y_meas)
y_interp1 = linear_interpolation(x)

cubic_interpolation = interp1d(n, y_meas, kind='cubic')
y_interp2 = cubic_interpolation(x)
In [62]:
fig, ax = plt.subplots(figsize=(10,4))
ax.set_facecolor('white')
ax.plot(n, y_meas, 'bs', label='noisy data')
ax.plot(x, y_real, 'k', lw=2, label='true function')
ax.plot(x, y_interp1, 'r', label='linear interp')
ax.plot(x, y_interp2, 'g', label='cubic interp')
ax.legend(loc=3);

KD-Trees

You can also use SciPy to calculate KD-Trees for a set of points

In [63]:
Lbox = 250.
Npts = 1000
# Creating cartesian coordinates
x = np.random.uniform(0, Lbox, Npts)
y = np.random.uniform(0, Lbox, Npts)
z = np.random.uniform(0, Lbox, Npts)
sample1 = np.vstack([x, y, z]).T
sample1
Out[63]:
array([[  29.97084048,   83.70201436,  244.7334759 ],
       [ 169.90821434,   54.39523605,  216.99049523],
       [ 136.16726216,  220.75200499,  100.38244749],
       ..., 
       [  87.29467063,   26.89732224,  222.09370448],
       [ 200.13101232,   27.84003363,   92.85642522],
       [  44.50785257,  111.70047818,   96.49064269]])
In [64]:
sample1.shape
Out[64]:
(1000, 3)

Let's say we want to know how many pairs of points are separated by distances between 30 and 50. To answer this, you construct a KD-Tree

In [65]:
from scipy.spatial import cKDTree
In [66]:
# Initializing KDTree
KD_obj = cKDTree(sample1)
In [67]:
N_neighbours = KD_obj.count_neighbors(KD_obj, 50) - \
                KD_obj.count_neighbors(KD_obj, 30)
print("Number of Neighbours: {0}".format(N_neighbours))
Number of Neighbours: 20042

Let's say you want to get the distances to the Nth-nearest neighbor.

In [68]:
k_nearest = 4
dist_k, dist_k_idx = KD_obj.query(sample1, k_nearest)
dist_k
Out[68]:
array([[  0.        ,  25.93189529,  26.17156604,  26.81214231],
       [  0.        ,   9.5017603 ,  12.9681842 ,  20.79923232],
       [  0.        ,  11.45087107,  15.74440684,  19.75190004],
       ..., 
       [  0.        ,  13.64764887,  19.40903698,  21.12311228],
       [  0.        ,   6.21723664,   6.54602253,  14.30274301],
       [  0.        ,  10.27377938,  16.74578303,  18.35651455]])

You can also get the indices

In [69]:
dist_k_idx
Out[69]:
array([[  0, 800, 364,  67],
       [  1, 522,  74,  98],
       [  2, 442, 578, 218],
       ..., 
       [997, 679, 276, 235],
       [998, 681, 291, 212],
       [999, 354, 599,  70]])

The first column corresponds to the point itself, at distance 0.

You can also find pairs that are separated by at most a distance r

In [70]:
pairs = KD_obj.query_ball_tree(KD_obj, 30)
pairs[0:10]
Out[70]:
[[0, 67, 364, 470, 800, 921],
 [1, 74, 98, 100, 522, 528, 596, 633, 637, 914],
 [2, 218, 279, 296, 361, 442, 486, 578, 624, 756],
 [3, 524, 580, 634, 653, 866, 929],
 [4, 21, 543, 804, 864, 973],
 [5, 409, 464, 559, 784, 792, 903, 944, 958],
 [6, 15, 24, 144, 456, 607, 870, 988],
 [7, 213, 229, 606, 824, 978],
 [8, 55, 197, 655, 677, 686, 863],
 [9, 21, 135, 258, 262, 544, 587]]
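If you only need each pair once (rather than the per-point lists above), cKDTree.query_pairs returns the set of unique (i, j) pairs with i < j whose separation is at most r. A minimal sketch on a fresh random sample:

```python
import numpy as np
from scipy.spatial import cKDTree

np.random.seed(42)
pts = np.random.uniform(0, 250., (1000, 3))
tree = cKDTree(pts)

# Each qualifying pair appears exactly once, as (i, j) with i < j
pairs = tree.query_pairs(30.)
print(len(pairs))
```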

So that's it for today's lesson.

Next week:

  • Advanced Visualization
  • Seaborn