NumPy

NumPy is the fundamental package for scientific computing with Python. It contains among other things:

  • a powerful N-dimensional array object
  • sophisticated (broadcasting) functions
  • tools for integrating C/C++ and Fortran code
  • useful linear algebra, Fourier transform, and random number capabilities

Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data-types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases. More info at: http://www.numpy.org/

In [1]:
import numpy as np

Creating arrays

In [3]:
np.choose?

We can create arrays from lists (a nested list gives a two-dimensional array):

In [188]:
a = np.array([1, 2])
a
Out[188]:
array([1, 2])
In [189]:
A = np.array([[0, 1, 2, 3, 4],
              [5, 6, 7, 8, 9]])
A
Out[189]:
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

Each numpy.array is a Python object with attributes and methods; let's look at some of them.

The shape:

In [190]:
a.shape
Out[190]:
(2,)
In [191]:
A.shape
Out[191]:
(2, 5)

The array type:

In [192]:
a.dtype
Out[192]:
dtype('int64')

Arrays are statically typed, so you cannot assign a value that cannot be converted to the array's type.

In [193]:
a[0] = 'python'
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-193-4f48d9ac26e9> in <module>()
----> 1 a[0] = 'python'

ValueError: invalid literal for int() with base 10: 'python'
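
The error above comes from NumPy trying to cast the assigned value to the array's dtype. The following is a minimal sketch (variable names are illustrative) of what happens with a value that can be cast, and of how astype gives a new array of another type:

```python
import numpy as np

a = np.array([1, 2])

# A float can be cast to the integer dtype, so the assignment succeeds,
# but the fractional part is silently dropped.
a[0] = 3.9

# astype returns a new array with the requested dtype; `a` is unchanged.
b = a.astype(float)
b[1] = 2.5

print(a, a.dtype)
print(b, b.dtype)
```

If fractional values matter, it is usually safer to create the array with a floating-point dtype from the start.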

The number of elements:

In [194]:
a.size
Out[194]:
2
In [196]:
A.size
Out[196]:
10

The number of bytes per element:

In [198]:
a.itemsize
Out[198]:
8
In [199]:
A.itemsize
Out[199]:
8

The total number of bytes:

In [195]:
a.nbytes
Out[195]:
16
In [197]:
A.nbytes
Out[197]:
80
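
The three attributes above are related: nbytes is size multiplied by itemsize. A short sketch follows; the int8 conversion is only an illustration of how a smaller dtype reduces memory use, and the default integer size can differ between platforms:

```python
import numpy as np

A = np.arange(10).reshape(2, 5)

# Total memory of the data buffer = number of elements * bytes per element.
print(A.size, A.itemsize, A.nbytes)

# The same values stored as 8-bit integers need one byte per element.
small = A.astype(np.int8)
print(small.size, small.itemsize, small.nbytes)
```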

The number of dimensions:

In [14]:
a.ndim
Out[14]:
1
In [15]:
A.ndim
Out[15]:
2

Generating arrays

In [16]:
np.arange(1, 10, 2)
Out[16]:
array([1, 3, 5, 7, 9])
In [3]:
np.arange(1, 2, 0.2)
Out[3]:
array([ 1. ,  1.2,  1.4,  1.6,  1.8])
In [17]:
a = np.arange(10)
a
Out[17]:
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [19]:
A = np.reshape(a, (5, 2))
A
Out[19]:
array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])

Generate an array filled with ones:

In [34]:
np.ones((10, 2), dtype=np.float16)
Out[34]:
array([[ 1.,  1.],
       [ 1.,  1.],
       [ 1.,  1.],
       [ 1.,  1.],
       [ 1.,  1.],
       [ 1.,  1.],
       [ 1.,  1.],
       [ 1.,  1.],
       [ 1.,  1.],
       [ 1.,  1.]], dtype=float16)

Generate an array filled with zeros:

In [35]:
np.zeros((10, 2), dtype=np.float128)
Out[35]:
array([[ 0.0,  0.0],
       [ 0.0,  0.0],
       [ 0.0,  0.0],
       [ 0.0,  0.0],
       [ 0.0,  0.0],
       [ 0.0,  0.0],
       [ 0.0,  0.0],
       [ 0.0,  0.0],
       [ 0.0,  0.0],
       [ 0.0,  0.0]], dtype=float128)

Split an interval into pieces:

In [23]:
c = np.linspace(0, 10, num=5)
c
Out[23]:
array([  0. ,   2.5,   5. ,   7.5,  10. ])
In [30]:
d = np.logspace(2, 3, num=4, base=2)
d
Out[30]:
array([ 4.        ,  5.0396842 ,  6.34960421,  8.        ])
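
To make the relation between the two functions explicit: logspace spaces the exponents evenly and then raises the base to them. A small check, reusing the arguments from the cell above:

```python
import numpy as np

# Evenly spaced exponents between 2 and 3 ...
exponents = np.linspace(2, 3, num=4)

# ... give the same values as logspace with base 2.
d = np.logspace(2, 3, num=4, base=2)
print(np.allclose(d, 2.0 ** exponents))
```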

Get random numbers:

In [37]:
np.random.rand(2, 5)
Out[37]:
array([[ 0.70923459,  0.36520342,  0.93789577,  0.37788093,  0.88874502],
       [ 0.00887959,  0.07073521,  0.15550365,  0.34043936,  0.34841286]])
In [39]:
np.random.randint(0, 9)
Out[39]:
8
In [40]:
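# random_integers is deprecated; randint with an exclusive upper bound of 10 draws from the same range 0..9
np.random.randint(0, 10, (2, 5))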
Out[40]:
array([[9, 8, 4, 2, 9],
       [9, 0, 4, 2, 1]])
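
Newer NumPy versions (1.17 and later) also offer a Generator interface for random numbers. The sketch below assumes such a version is installed; the seed value is arbitrary and only makes the run reproducible:

```python
import numpy as np

# A Generator with a fixed seed gives reproducible results.
rng = np.random.default_rng(seed=42)

# Uniform floats in [0, 1), same role as np.random.rand(2, 5).
uniform = rng.random((2, 5))

# Integers from 0 to 9; the upper bound (10) is excluded by default.
integers = rng.integers(0, 10, size=(2, 5))

print(uniform.shape, integers.shape)
```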

Generate diagonal matrix:

In [41]:
np.diag([1, 2, 3, 4])
Out[41]:
array([[1, 0, 0, 0],
       [0, 2, 0, 0],
       [0, 0, 3, 0],
       [0, 0, 0, 4]])
In [42]:
np.diag([1, 2, 3, 4], k=2)
Out[42]:
array([[0, 0, 1, 0, 0, 0],
       [0, 0, 0, 2, 0, 0],
       [0, 0, 0, 0, 3, 0],
       [0, 0, 0, 0, 0, 4],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0]])

Generating a mesh:

In [31]:
x, y = np.mgrid[0:5, 5:10]
In [32]:
x
Out[32]:
array([[0, 0, 0, 0, 0],
       [1, 1, 1, 1, 1],
       [2, 2, 2, 2, 2],
       [3, 3, 3, 3, 3],
       [4, 4, 4, 4, 4]])
In [33]:
y
Out[33]:
array([[5, 6, 7, 8, 9],
       [5, 6, 7, 8, 9],
       [5, 6, 7, 8, 9],
       [5, 6, 7, 8, 9],
       [5, 6, 7, 8, 9]])

Basic manipulation

We can combine arrays with scalars and with each other; arithmetic operators work element by element:

In [202]:
a = np.array([1, 2])
a
Out[202]:
array([1, 2])
In [203]:
A = np.array([[0, 1, 2, 3, 4],
              [5, 6, 7, 8, 9]])
A
Out[203]:
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])
In [201]:
a + 2
Out[201]:
array([3, 4])
In [204]:
A * 10
Out[204]:
array([[ 0, 10, 20, 30, 40],
       [50, 60, 70, 80, 90]])
In [359]:
A * A
Out[359]:
array([[ 0,  1,  4,  9, 16],
       [25, 36, 49, 64, 81]])
In [363]:
A.T.shape, a.shape
Out[363]:
((5, 2), (2,))
In [364]:
a * A.T
Out[364]:
array([[ 0, 10],
       [ 1, 12],
       [ 2, 14],
       [ 3, 16],
       [ 4, 18]])
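
The product above works because of broadcasting: shapes are compared from the last axis backwards, and an axis of length 1 (or a missing axis) is stretched to match. A minimal sketch of the rule, using the same a and A as above:

```python
import numpy as np

a = np.array([1, 2])
A = np.arange(10).reshape(2, 5)

# (5, 2) against (2,): the trailing axes match, so the result is (5, 2).
print(np.broadcast(a, A.T).shape)

# (2, 5) against (2,): the trailing axes are 5 and 2, so this fails.
try:
    a * A
except ValueError as err:
    print("cannot broadcast:", err)

# Reshaping a into a column of shape (2, 1) makes it compatible with (2, 5).
scaled = a.reshape(2, 1) * A
print(scaled)
```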
In [368]:
C = np.array([[n + m * 10 for n in range(3)] for m in range(3)])
In [369]:
C
Out[369]:
array([[ 0,  1,  2],
       [10, 11, 12],
       [20, 21, 22]])
In [370]:
np.dot(C, C)
Out[370]:
array([[ 50,  53,  56],
       [350, 383, 416],
       [650, 713, 776]])
In [4]:
c = np.arange(1, 4)
c
Out[4]:
array([1, 2, 3])
In [373]:
np.dot(c, C)
Out[373]:
array([80, 86, 92])
In [375]:
np.dot(c, c)
Out[375]:
14

Matrix

In [383]:
m = np.matrix(c).T
m
Out[383]:
matrix([[1],
        [2],
        [3]])
In [384]:
M = np.matrix(C)
M
Out[384]:
matrix([[ 0,  1,  2],
        [10, 11, 12],
        [20, 21, 22]])
In [379]:
M * M
Out[379]:
matrix([[ 50,  53,  56],
        [350, 383, 416],
        [650, 713, 776]])
In [381]:
M * m
Out[381]:
matrix([[  8],
        [ 68],
        [128]])
In [382]:
M + M
Out[382]:
matrix([[ 0,  2,  4],
        [20, 22, 24],
        [40, 42, 44]])
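
The NumPy documentation no longer recommends the matrix class, even for linear algebra. As a sketch of the alternative, the same products can be written with ordinary two-dimensional arrays and the @ operator (Python 3.5 or later), using the C and c defined earlier:

```python
import numpy as np

C = np.array([[n + m * 10 for n in range(3)] for m in range(3)])
c = np.arange(1, 4)

# A column vector is just a 2-D array with one column.
col = c.reshape(3, 1)

# @ is matrix multiplication; * stays element-wise for arrays.
print(C @ C)
print(C @ col)
```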

For a complete list of the linear algebra functions, please have a look at the documentation.

Indexing

In [5]:
B = np.reshape(np.arange(40), (5, 8))
B
Out[5]:
array([[ 0,  1,  2,  3,  4,  5,  6,  7],
       [ 8,  9, 10, 11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29, 30, 31],
       [32, 33, 34, 35, 36, 37, 38, 39]])

Slice

In [6]:
B[:, 1::3]
Out[6]:
array([[ 1,  4,  7],
       [ 9, 12, 15],
       [17, 20, 23],
       [25, 28, 31],
       [33, 36, 39]])
In [7]:
B[::2, ::3]
Out[7]:
array([[ 0,  3,  6],
       [16, 19, 22],
       [32, 35, 38]])

Arrays of integers

In [8]:
B[np.array([0, 0, 0, 2, 2, 2, 4, 4, 4]), np.array([0, 3, 6, 0, 3, 6, 0, 3, 6])]
Out[8]:
array([ 0,  3,  6, 16, 19, 22, 32, 35, 38])
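
One difference between the two indexing styles is worth a sketch: a slice returns a view on the original data, while indexing with integer arrays returns a copy. The helper np.ix_ is used here only as a compact way to build the same row/column selection:

```python
import numpy as np

B = np.arange(40).reshape(5, 8)

# Slicing gives a view: it shares memory with B.
view = B[::2, ::3]

# Integer-array indexing gives a copy; np.ix_ selects rows 0, 2, 4
# and columns 0, 3, 6 and keeps the 2-D shape.
picked = B[np.ix_([0, 2, 4], [0, 3, 6])]

view[0, 0] = -1      # also changes B[0, 0]
picked[0, 1] = -99   # B[0, 3] is not affected

print(B[0, 0], B[0, 3])
```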

Arrays of booleans

In [9]:
B > 25
Out[9]:
array([[False, False, False, False, False, False, False, False],
       [False, False, False, False, False, False, False, False],
       [False, False, False, False, False, False, False, False],
       [False, False,  True,  True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True,  True,  True,  True]], dtype=bool)
In [10]:
B[B > 25]
Out[10]:
array([26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39])
In [11]:
bool_index = (B % 2).astype(bool)
In [12]:
bool_index
Out[12]:
array([[False,  True, False,  True, False,  True, False,  True],
       [False,  True, False,  True, False,  True, False,  True],
       [False,  True, False,  True, False,  True, False,  True],
       [False,  True, False,  True, False,  True, False,  True],
       [False,  True, False,  True, False,  True, False,  True]], dtype=bool)
In [13]:
B[bool_index]
Out[13]:
array([ 1,  3,  5,  7,  9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33,
       35, 37, 39])
In [14]:
B[~bool_index]
Out[14]:
array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32,
       34, 36, 38])
In [350]:
B[~bool_index] *= 10
In [351]:
B
Out[351]:
array([[  0,   1,  20,   3,  40,   5,  60,   7],
       [ 80,   9, 100,  11, 120,  13, 140,  15],
       [160,  17, 180,  19, 200,  21, 220,  23],
       [240,  25, 260,  27, 280,  29, 300,  31],
       [320,  33, 340,  35, 360,  37, 380,  39]])
In [353]:
bool_index.nonzero()
Out[353]:
(array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4]),
 array([1, 3, 5, 7, 1, 3, 5, 7, 1, 3, 5, 7, 1, 3, 5, 7, 1, 3, 5, 7]))
In [354]:
B[(~bool_index).nonzero()]
Out[354]:
array([  0,  20,  40,  60,  80, 100, 120, 140, 160, 180, 200, 220, 240,
       260, 280, 300, 320, 340, 360, 380])
In [355]:
B
Out[355]:
array([[  0,   1,  20,   3,  40,   5,  60,   7],
       [ 80,   9, 100,  11, 120,  13, 140,  15],
       [160,  17, 180,  19, 200,  21, 220,  23],
       [240,  25, 260,  27, 280,  29, 300,  31],
       [320,  33, 340,  35, 360,  37, 380,  39]])
In [15]:
np.where(B % 2)
Out[15]:
(array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4]),
 array([1, 3, 5, 7, 1, 3, 5, 7, 1, 3, 5, 7, 1, 3, 5, 7, 1, 3, 5, 7]))
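
Besides returning indices, np.where also has a three-argument form that picks values element by element. As a sketch, it can produce the "multiply the even entries by 10" result from above without modifying B in place:

```python
import numpy as np

B = np.arange(40).reshape(5, 8)
odd = (B % 2).astype(bool)

# Where the condition is True keep B, elsewhere take B * 10.
result = np.where(odd, B, B * 10)
print(result)
```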
In [357]:
np.diag(B)
Out[357]:
array([  0,   9, 180,  27, 360])
In [17]:
which = [1, 0, 1, 0]
choices = [[1, 2, 3, 4], [5, 6, 7, 8]]

np.choose(which, choices)
Out[17]:
array([5, 2, 7, 4])

Add new axis

In [9]:
v = np.array([1,2,3])
v.shape
Out[9]:
(3,)
In [11]:
v[:, np.newaxis]
Out[11]:
array([[1],
       [2],
       [3]])
In [12]:
v[np.newaxis, :]
Out[12]:
array([[1, 2, 3]])
In [20]:
v[:, np.newaxis] * v[np.newaxis, :]
Out[20]:
array([[1, 2, 3],
       [2, 4, 6],
       [3, 6, 9]])
In [21]:
v[:, np.newaxis].shape, v[np.newaxis, :].shape
Out[21]:
((3, 1), (1, 3))
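
The column-times-row product above is the outer product of v with itself. A quick check against np.outer, also showing that np.newaxis is simply an alias for None:

```python
import numpy as np

v = np.array([1, 2, 3])

# Broadcasting a (3, 1) column against a (1, 3) row gives a (3, 3) table.
table = v[:, np.newaxis] * v[np.newaxis, :]

print(np.array_equal(table, np.outer(v, v)))
print(np.newaxis is None)
```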

Other operations

In [22]:
np.repeat(v, 4)
Out[22]:
array([1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3])
In [23]:
np.concatenate((v, v*10))
Out[23]:
array([ 1,  2,  3, 10, 20, 30])
In [2]:
v = np.arange(10)
np.roll(v, 1)
Out[2]:
array([9, 0, 1, 2, 3, 4, 5, 6, 7, 8])

Vectorize a function

In [24]:
def func(x):
    if x>1:
        return 5
    return x
In [25]:
func(v)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-25-8e4da8fa57ff> in <module>()
----> 1 func(v)

<ipython-input-24-015cd1760d75> in func(x)
      1 def func(x):
----> 2     if x>1:
      3         return 5
      4     return x

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
In [29]:
newfunc = np.vectorize(func)
In [30]:
print(v)
newfunc(v)
[1 2 3]
Out[30]:
array([1, 5, 5])
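
According to the NumPy documentation, np.vectorize is provided mainly for convenience and is essentially a loop over the elements. For a simple condition like the one in func, a sketch of an equivalent array expression is:

```python
import numpy as np

v = np.array([1, 2, 3])


def func(x):
    if x > 1:
        return 5
    return x


# Element-by-element loop hidden behind np.vectorize.
looped = np.vectorize(func)(v)

# Same result computed on the whole array at once.
direct = np.where(v > 1, 5, v)

print(looped, direct)
```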

Import data from a text file

In [45]:
!head Klima_LT_N_daily_1981-2012_8320_Bozen-Bolzano.csv -n 20
                        AUTONOME PROVINZ           PROVINCI
                       26. Brand- und Ziv             26. P
                       26.4 - Hydrographi             26.4 
                                                           
     Station - stazioneBozen - Bolzano                     
      Nummer - codice :8320                                
     Rechtswert - X UTM677473                              
     Hochwert - Y UTM :5151945                             
         Höhe - quota :254                                 
     Zeitraum - periodo1981              2012              
                                                           
         Data
Datum    Precipitazione NieTemperatura
Temper
                                          massima Maximum    minima Minimum  
         01:01:1981           0,0               -6,0              10,0       
         02:01:1981           0,0               3,0               10,0       
         03:01:1981           0,0               -5,0              6,0        
         04:01:1981           0,4               -4,0              3,0        
         05:01:1981           0,0               0,0               6,0        

Here we have two main problems:

  • how to convert the date;
  • how to convert the numbers;

We can define two functions that take a string and return a Python object.

In [13]:
import numpy as np
from datetime import datetime as dt

now = dt.now()
In [16]:
now.hour += 5
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-16-383803645475> in <module>()
----> 1 now.hour += 5

AttributeError: attribute 'hour' of 'datetime.datetime' objects is not writable

The datetime module in the standard library has functions to convert an object to a string and a string to an object.

In [3]:
# object => string
dt.strftime(now, "%d:%m:%Y")
Out[3]:
'02:02:2014'
In [4]:
# string => object
dt.strptime('01:01:1981', "%d:%m:%Y")
Out[4]:
datetime.datetime(1981, 1, 1, 0, 0)

We can define our functions using lambda, which is just a short way to define a function:

In [5]:
str2npdatetime = lambda bytes: dt.strptime(bytes.decode(), "%d:%m:%Y")
str2float = lambda bytes: float(bytes.decode().replace(',', '.'))

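This is roughly equivalent to writing the following functions (note that bytes2date returns an ISO-formatted string, which NumPy can parse as a datetime64, rather than a datetime object):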

In [17]:
def bytes2date(bytes):
    """Convert bytes to an ISO date string that NumPy can parse. ::
    
        >>> bytes2date(b'01:01:1981')
        '1981-01-01'
    """
    return dt.strftime(dt.strptime(bytes.decode(), "%d:%m:%Y"), "%Y-%m-%d")

def bytes2float(bytes):
    """Convert bytes to float. ::
    
        >>> bytes2float(b'2.5')
        2.5
    """
    return float(bytes.decode().replace(',', '.'))

Check that the functions are working:

In [7]:
bytes2date(b'01:01:1981')
Out[7]:
'1981-01-01'
In [8]:
bytes2float(b'2,5')
Out[8]:
2.5
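
The reason for returning an ISO-formatted string is that NumPy can build a datetime64 value directly from it. A minimal sketch (the parameter name raw is illustrative; the function body follows bytes2date above):

```python
import numpy as np
from datetime import datetime


def bytes2date(raw):
    """Convert b'dd:mm:YYYY' to an ISO 'YYYY-mm-dd' string."""
    return datetime.strptime(raw.decode(), "%d:%m:%Y").strftime("%Y-%m-%d")


# An ISO date string becomes a datetime64 with day resolution.
day = np.datetime64(bytes2date(b'02:01:1981'))

# Differences between datetime64 values are timedelta64 values.
elapsed = day - np.datetime64('1981-01-01')

print(day, day.dtype, elapsed)
```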

OK, we are ready to import the CSV file.

In [18]:
data = np.genfromtxt(# name of the file with the data
                     "Klima_LT_N_daily_1981-2012_8320_Bozen-Bolzano.csv", 
                     # set the number of header rows that we have to skip
                     # (skip_header replaces the older skiprows argument)
                     skip_header=15,
                     # set a dictionary giving a converter function for each column
                     converters={0: bytes2date, 
                                 1: bytes2float, 
                                 2: bytes2float, 
                                 3: bytes2float},
                     # define the name and type for each column
                     dtype=[('date', 'datetime64[D]'), 
                            ('rainfall', 'float'), 
                            ('Tmin', 'float'), 
                            ('Tmax', 'float')])
In [19]:
data
Out[19]:
array([(datetime.date(1981, 1, 1), 0.0, -6.0, 10.0),
       (datetime.date(1981, 1, 2), 0.0, 3.0, 10.0),
       (datetime.date(1981, 1, 3), 0.0, -5.0, 6.0), ...,
       (datetime.date(2012, 12, 29), 0.0, -1.9, 6.5),
       (datetime.date(2012, 12, 30), 0.0, -3.3, 6.7),
       (datetime.date(2012, 12, 31), 0.0, -3.4, 7.2)], 
      dtype=[('date', '<M8[D]'), ('rainfall', '<f8'), ('Tmin', '<f8'), ('Tmax', '<f8')])
In [20]:
data.shape
Out[20]:
(11688,)
In [22]:
data.dtype.fields
Out[22]:
mappingproxy({'Tmin': (dtype('float64'), 16), 'Tmax': (dtype('float64'), 24), 'date': (dtype('<M8[D]'), 0), 'rainfall': (dtype('float64'), 8)})
In [30]:
data[0]
Out[30]:
(datetime.date(1981, 1, 1), 0.0, -6.0, 10.0)
In [31]:
data['date']
Out[31]:
array(['1981-01-01', '1981-01-02', '1981-01-03', ..., '2012-12-29',
       '2012-12-30', '2012-12-31'], dtype='datetime64[D]')
In [32]:
data['rainfall'].mean()
Out[32]:
1.9436344969199226
In [33]:
data['rainfall'].std()
Out[33]:
5.8859954441709244
In [34]:
data['rainfall'].max()
Out[34]:
112.0
In [35]:
np.median(data['rainfall'])
Out[35]:
0.0

We can select only certain columns

In [37]:
data[['date', 'Tmin', 'Tmax']]
Out[37]:
array([(datetime.date(1981, 1, 1), -6.0, 10.0),
       (datetime.date(1981, 1, 2), 3.0, 10.0),
       (datetime.date(1981, 1, 3), -5.0, 6.0), ...,
       (datetime.date(2012, 12, 29), -1.9, 6.5),
       (datetime.date(2012, 12, 30), -3.3, 6.7),
       (datetime.date(2012, 12, 31), -3.4, 7.2)], 
      dtype=[('date', '<M8[D]'), ('Tmin', '<f8'), ('Tmax', '<f8')])
In [23]:
data[['Tmin', 'Tmax']].mean()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-23-cd4bf427c4ac> in <module>()
----> 1 data[['Tmin', 'Tmax']].mean()

/usr/lib/python3.3/site-packages/numpy/core/_methods.py in _mean(a, axis, dtype, out, keepdims)
     60         dtype = mu.dtype('f8')
     61 
---> 62     ret = um.add.reduce(arr, axis=axis, dtype=dtype, out=out, keepdims=keepdims)
     63     if isinstance(ret, mu.ndarray):
     64         ret = um.true_divide(

TypeError: cannot perform reduce with flexible type

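This is a structured array (each element is a record with named fields), so reductions such as mean are not defined on it directly. To compute the mean we first have to convert it to a plain floating-point array, which we can do with: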

In [35]:
data_array = data[['rainfall', 'Tmin', 'Tmax']].view(float).reshape(len(data),-1)
In [36]:
data_array.mean(axis=0, )
Out[36]:
array([  1.9436345 ,          nan,  18.79189767])
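
In recent NumPy versions, selecting several fields returns a view that keeps the original memory layout, so the .view(float) trick above may fail when the selected fields are not contiguous. A hedged sketch of an alternative, using structured_to_unstructured from numpy.lib.recfunctions on a small made-up sample (the values are illustrative, not taken from the full data set):

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Small structured array with the same float fields as the data above.
sample = np.array([(0.0, -6.0, 10.0), (0.4, -4.0, 3.0), (0.0, 0.0, 6.0)],
                  dtype=[('rainfall', 'f8'), ('Tmin', 'f8'), ('Tmax', 'f8')])

# Convert the selected fields to a plain 2-D float array,
# one column per field, regardless of how the fields are laid out.
plain = rfn.structured_to_unstructured(sample[['rainfall', 'Tmax']])

print(plain.shape)
print(plain.mean(axis=0))
```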

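The mean of Tmin is nan because the column contains missing values (marked --- in the file and read as nan), and a reduction over an array that contains nan returns nan. We can locate them with: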

In [37]:
np.nonzero(np.isnan(data['Tmin']))
Out[37]:
(array([10742, 10834, 10972, 11022, 11043]),)
In [42]:
!head -10770 Klima_LT_N_daily_1981-2012_8320_Bozen-Bolzano.csv | tail -20
         24:05:2010           0,0               11,4              29,5       
         25:05:2010           0,0               13,3              30,5       
         26:05:2010           0,0               15,9              27,6       
         27:05:2010           0,0               12,9              22,2       
         28:05:2010           1,5               11,0              24,0       
         29:05:2010           0,0               14,1              24,2       
         30:05:2010           1,3               15,4              22,7       
         31:05:2010           0,7               ---               22,5       
         01:06:2010           0,0               11,9              26,1       
         02:06:2010           0,0               14,6              25,2       
         03:06:2010           0,0               15,6              26,6       
         04:06:2010           0,0               12,9              32,8       
         05:06:2010           0,0               16,2              31,4       
         06:06:2010           0,0               19,5              32,2       
         07:06:2010           0,0               19,7              26,1       
         08:06:2010           0,0               20,4              29,7       
         09:06:2010           0,0               19,9              31,2       
         10:06:2010           0,0               20,9              29,9       
         11:06:2010           0,0               17,2              33,4       
         12:06:2010           0,0               19,5              31,9       

Masked arrays

In [38]:
data_array_ma = np.ma.masked_array(data_array, np.isnan(data_array))
In [39]:
data_array_ma.mean(axis=0, )
Out[39]:
masked_array(data = [1.9436344969199226 6.597629033638557 18.791897672827037],
             mask = [False False False],
       fill_value = 1e+20)
In [45]:
data_array_ma.std(axis=0, )
Out[45]:
masked_array(data = [5.885995444170924 8.01268084851955 9.408077570471425],
             mask = [False False False],
       fill_value = 1e+20)
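
For simple statistics, the nan-aware reductions np.nanmean and np.nanstd are an alternative to masked arrays. A small sketch on made-up values (one missing entry in the second column) showing that both approaches agree:

```python
import numpy as np

values = np.array([[0.0, 11.0, 29.5],
                   [0.7, np.nan, 22.5],
                   [0.0, 11.9, 26.1]])

# A plain mean propagates the nan in the second column.
print(values.mean(axis=0))

# Masking the nan entries, as above.
masked = np.ma.masked_array(values, np.isnan(values))
print(masked.mean(axis=0))

# nanmean ignores nan entries without building a mask explicitly.
print(np.nanmean(values, axis=0))
```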

Visualize the data

In [128]:
%matplotlib inline
import matplotlib.pyplot as plt

date = data['date'].astype('O')

fig, ax0 = plt.subplots(figsize=(20,5))

ax1 = ax0.twinx()
ax1.plot(date, data['rainfall'], 'k-', label='rainfall', alpha=0.5)
ax1.axis('tight')
ax1.set_title('$Rainfall/T_{min}/T_{max}$ in Bozen/Bolzano')
ax1.set_xlabel('year')
ax1.set_ylabel('rainfall [mm]')

ax0.plot(date, data['Tmin'], 'b-', label='Tmin', alpha=0.7)
ax0.plot(date, data['Tmax'], 'r-', label='Tmax', alpha=0.7)
ax0.set_ylabel('Temperature [°C]')
ax0.grid()
In [106]:
import datetime as dt
In [163]:
%matplotlib inline


def get_index(date, start, dyears=0, dmonths=0, ddays=0):
    indexes = []
    dates = []
    stop = date[0]
    while stop < date[-1]:
        stop = dt.date(start.year + dyears, start.month + dmonths, start.day + ddays)
        istart = np.where(date == start)[0]
        istop = np.where(date == stop)[0]
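        # np.where returns an array of matches: use the first match as a plain
        # integer index, or None (open-ended slice) when the date is not found
        indexes.append((int(istart[0]) if istart.size else None,
                        int(istop[0]) if istop.size else None))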
        dates.append((start, stop))
        start = stop
    return indexes, dates


def split_plot(date, data, indexes, dates, figsize=(10, 5)):
    date = [obj for obj in date]
    fig, axes = plt.subplots(nrows=len(indexes), ncols=1, figsize=figsize)
    fig.tight_layout(h_pad=1.)
    axes[0].set_title('$Rainfall/T_{min}/T_{max}$ in Bozen/Bolzano')
    for (istart, istop), (start, stop), ax0 in zip(indexes, dates, axes):
        ax1 = ax0.twinx()
        ax1.plot(date[istart:istop], data['rainfall'][istart:istop], 'k-', label='rainfall', alpha=0.5)
        #ax1.axis('tight')
        ax1.set_xlabel('year')
        ax1.set_ylabel('rainfall [mm]')
        
        ax0.plot(date[istart:istop], data['Tmin'][istart:istop], 'b-', label='Tmin', alpha=0.7)
        ax0.plot(date[istart:istop], data['Tmax'][istart:istop], 'r-', label='Tmax', alpha=0.7)
        ax0.set_ylabel('Temperature [°C]')
        limits = [start, stop]
        print(limits)
        ax0.set_xlim([start, stop])
        ax0.grid()
    #return fig


dyear = 10
start = dt.date(1980, 1, 1)
indexes, dates = get_index(date, start, dyear)
split_plot(date, data, indexes, dates, figsize=(10, 5))
[datetime.date(1980, 1, 1), datetime.date(1990, 1, 1)]
[datetime.date(1990, 1, 1), datetime.date(2000, 1, 1)]
[datetime.date(2000, 1, 1), datetime.date(2010, 1, 1)]
[datetime.date(2010, 1, 1), datetime.date(2020, 1, 1)]
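
As an alternative to searching for start and stop indices, a period can be selected with a boolean mask on datetime64 values. The sketch below builds a stand-in array of daily dates covering the same range as the data (the names days and in_decade are illustrative):

```python
import numpy as np

# One entry per day from 1981-01-01 to 2012-12-31 (the stop date is excluded).
days = np.arange('1981-01-01', '2013-01-01', dtype='datetime64[D]')

# Boolean mask selecting the decade 1990-1999.
start = np.datetime64('1990-01-01')
stop = np.datetime64('2000-01-01')
in_decade = (days >= start) & (days < stop)

print(days.size, in_decade.sum())
```

The same mask could then be used to index data['rainfall'], data['Tmin'] and data['Tmax'].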

Save in the native format

The original file in xls format is more than 3 MB:

In [170]:
%%sh 
du -h Klima_LT_N_daily_1981-2012_8320_Bozen-Bolzano.xls
3.3M	Klima_LT_N_daily_1981-2012_8320_Bozen-Bolzano.xls

The version converted to csv is less than 1 MB:

In [168]:
%%sh 
du -h Klima_LT_N_daily_1981-2012_8320_Bozen-Bolzano.csv
892K	Klima_LT_N_daily_1981-2012_8320_Bozen-Bolzano.csv

We can save the data in the numpy native format, with:

In [44]:
np.save('bozen.npy', data)

And the file size is less than 400 KB.

In [45]:
%%sh 
du -h bozen.npy
368K	bozen.npy

To load from the native format, use the load function:

In [165]:
bozen = np.load('bozen.npy')
In [166]:
bozen
Out[166]:
array([(datetime.date(1981, 1, 1), 0.0, -6.0, 10.0),
       (datetime.date(1981, 1, 2), 0.0, 3.0, 10.0),
       (datetime.date(1981, 1, 3), 0.0, -5.0, 6.0), ...,
       (datetime.date(2012, 12, 29), 0.0, -1.9, 6.5),
       (datetime.date(2012, 12, 30), 0.0, -3.3, 6.7),
       (datetime.date(2012, 12, 31), 0.0, -3.4, 7.2)], 
      dtype=[('date', '<M8[D]'), ('rainfall', '<f8'), ('Tmin', '<f8'), ('Tmax', '<f8')])
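
A self-contained sketch of the save/load round trip, using a small made-up structured array and a temporary directory instead of the real file. It also shows np.savez_compressed, which writes a compressed archive and may reduce the file size further:

```python
import os
import tempfile

import numpy as np

sample = np.array([(0.0, -6.0, 10.0), (0.4, -4.0, 3.0)],
                  dtype=[('rainfall', 'f8'), ('Tmin', 'f8'), ('Tmax', 'f8')])

with tempfile.TemporaryDirectory() as tmp:
    # Native .npy format: a single array, dtype and field names included.
    path = os.path.join(tmp, 'sample.npy')
    np.save(path, sample)
    loaded = np.load(path)

    # Compressed .npz archive: arrays are stored under the given keyword names.
    zpath = os.path.join(tmp, 'sample.npz')
    np.savez_compressed(zpath, data=sample)
    with np.load(zpath) as archive:
        unpacked = archive['data']

print(loaded.dtype.names)
print(unpacked['Tmax'])
```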