# NumPy arrays¶

Nikolay Koldunov

[email protected]

================

• a powerful N-dimensional array object
• tools for integrating C/C++ and Fortran code
• useful linear algebra, Fourier transform, and random number capabilities
In [1]:
import numpy as np
%matplotlib inline

In [2]:
np.set_printoptions(precision=3 , suppress= True) # this is just to make the output look better


Load data in to a variable:

In [3]:
temp = np.loadtxt('Ham_3column.txt')

In [4]:
temp

Out[4]:
array([[1891.,    1.,    1.,  -72.],
[1891.,    1.,    2.,  -43.],
[1891.,    1.,    3.,  -32.],
...,
[2014.,    8.,   29.,  216.],
[2014.,    8.,   30.,  198.],
[2014.,    8.,   31.,  184.]])
In [5]:
temp.shape

Out[5]:
(45168, 4)

In [6]:
temp.size

Out[6]:
180672

So it's a row-major order. Matlab and Fortran use column-major order for arrays.

In [7]:
type(temp)

Out[7]:
numpy.ndarray

Numpy arrays are statically typed, which allow faster operations

In [8]:
temp.dtype

Out[8]:
dtype('float64')

You can't assign value of different type to element of the numpy array:

In [9]:
temp[0,0] = 'Year'

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-9-c2dbfa0bc312> in <module>
----> 1 temp[0,0] = 'Year'

ValueError: could not convert string to float: 'Year'

Slicing works similarly to Matlab:

In [10]:
temp[0:5,:]

Out[10]:
array([[1891.,    1.,    1.,  -72.],
[1891.,    1.,    2.,  -43.],
[1891.,    1.,    3.,  -32.],
[1891.,    1.,    4.,   12.],
[1891.,    1.,    5.,  -29.]])
In [11]:
temp[-5:-1,:]

Out[11]:
array([[2014.,    8.,   27.,  219.],
[2014.,    8.,   28.,  234.],
[2014.,    8.,   29.,  216.],
[2014.,    8.,   30.,  198.]])

One can look at the data. This is done by matplotlib module:

In [12]:
import matplotlib.pylab as plt
plt.plot(temp[:,3])

Out[12]:
[<matplotlib.lines.Line2D at 0x11b17be10>]

## Index slicing¶

In general it is similar to Matlab

First 12 elements of second column (months). Remember that indexing starts with 0:

In [13]:
temp[0:12,2]

Out[13]:
array([ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11., 12.])

First raw:

In [14]:
temp[0,:]

Out[14]:
array([1891.,    1.,    1.,  -72.])

## Exercise¶

• Plot only first 1000 values
• Plot last 1000 values
In [ ]:


In [ ]:


In [ ]:



We can create mask, selecting all raws where values in third raw (days) equals 10:

In [15]:
mask = (temp[:,2]==10)


Here we apply this mask and show only first 5 raws of the array:

In [16]:
temp[mask][:20,:]

Out[16]:
array([[1891.,    1.,   10.,  -89.],
[1891.,    2.,   10.,  -19.],
[1891.,    3.,   10.,   32.],
[1891.,    4.,   10.,   84.],
[1891.,    5.,   10.,  188.],
[1891.,    6.,   10.,  130.],
[1891.,    7.,   10.,  161.],
[1891.,    8.,   10.,  171.],
[1891.,    9.,   10.,  221.],
[1891.,   10.,   10.,  181.],
[1891.,   11.,   10.,   80.],
[1891.,   12.,   10.,  107.],
[1892.,    1.,   10.,   -4.],
[1892.,    2.,   10.,   36.],
[1892.,    3.,   10.,   16.],
[1892.,    4.,   10.,  146.],
[1892.,    5.,   10.,  195.],
[1892.,    6.,   10.,  205.],
[1892.,    7.,   10.,  209.],
[1892.,    8.,   10.,  155.]])

You don't have to create separate variable for mask, but apply it directly. Here instead of first five rows I show five last rows:

In [17]:
temp[temp[:,2]==10][-5:,:]

Out[17]:
array([[2014.,    4.,   10.,  116.],
[2014.,    5.,   10.,   27.],
[2014.,    6.,   10.,  300.],
[2014.,    7.,   10.,  277.],
[2014.,    8.,   10.,  259.]])

You can combine conditions. In this case we select days from 10 to 12 (only first 10 elements are shown):

In [18]:
temp[(temp[:,2]>=10)&(temp[:,2]<=12)][0:10,:]

Out[18]:
array([[1891.,    1.,   10.,  -89.],
[1891.,    1.,   11.,   16.],
[1891.,    1.,   12.,   21.],
[1891.,    2.,   10.,  -19.],
[1891.,    2.,   11.,   36.],
[1891.,    2.,   12.,   31.],
[1891.,    3.,   10.,   32.],
[1891.,    3.,   11.,   46.],
[1891.,    3.,   12.,   46.],
[1891.,    4.,   10.,   84.]])

## Exercise¶

Select only summer months
Select only first half of the year
In [ ]:



## Basic operations¶

Create example array from first 12 values of second column and perform some basic operations:

In [19]:
days = temp[0:12,2]
days

Out[19]:
array([ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11., 12.])
In [20]:
days+10

Out[20]:
array([11., 12., 13., 14., 15., 16., 17., 18., 19., 20., 21., 22.])
In [21]:
days*20

Out[21]:
array([ 20.,  40.,  60.,  80., 100., 120., 140., 160., 180., 200., 220.,
240.])
In [22]:
days*days

Out[22]:
array([  1.,   4.,   9.,  16.,  25.,  36.,  49.,  64.,  81., 100., 121.,
144.])

What's wrong with this figure?

In [23]:
plt.plot(temp[:100,3])

Out[23]:
[<matplotlib.lines.Line2D at 0x11bfaee50>]

## Exercise¶

• Create new array that will contain only temperatures

• Convert temperature to deg C

• Convert all temperatures to deg F

In [ ]:



## Basic statistics¶

Create temp_values that will contain only data values:

In [24]:
temp_values = temp[:,3]/10.
temp_values

Out[24]:
array([-7.2, -4.3, -3.2, ..., 21.6, 19.8, 18.4])

Simple statistics:

In [25]:
temp_values.min()

Out[25]:
-14.6
In [26]:
temp_values.max()

Out[26]:
37.3
In [27]:
temp_values.mean()

Out[27]:
12.488779667020898
In [28]:
temp_values.std()

Out[28]:
8.05358769929934
In [29]:
temp_values.sum()

Out[29]:
564093.2

You can also use sum function:

In [30]:
np.sum(temp_values)

Out[30]:
564093.2

One can make operations on the subsets:

## Exercise¶

Calculate mean for first 1000 values of temperature
In [ ]:



## Saving data¶

You can save your data as a text file

In [31]:
np.savetxt('temp_only_values.csv',temp[:, 3]/10., fmt='%.4f')


In [32]:
!head temp_only_values.csv

-7.2000
-4.3000
-3.2000
1.2000
-2.9000
-4.3000
-3.7000
-9.7000
-9.9000
-8.9000


You can also save it as binary:

In [33]:
f=open('temp_only_values.bin', 'w')
temp[:,3].tofile(f)
f.close()


## Exercises¶

• Select and plot only data for October
• Calculate monthly means for years from 1990 to 1999 and plot them