Nikolay Koldunov
koldunovn@gmail.com
================
import numpy as np
%matplotlib inline
np.set_printoptions(precision=3, suppress=True)  # this is just to make the output look better
Load the data into a variable:
temp = np.loadtxt('Ham_3column.txt')
temp
array([[1891., 1., 1., -72.], [1891., 1., 2., -43.], [1891., 1., 3., -32.], ..., [2014., 8., 29., 216.], [2014., 8., 30., 198.], [2014., 8., 31., 184.]])
temp.shape
(45168, 4)
temp.size
180672
NumPy stores arrays in row-major (C) order by default, while Matlab and Fortran use column-major order.
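A small sketch of what row-major versus column-major means in practice, using a toy 2x3 array (not the temperature data). It relies only on the standard `flags`, `ravel(order=...)` and `np.asfortranarray` features of NumPy:

```python
import numpy as np

# A toy 2x3 array; NumPy lays it out row by row (C order) by default
a = np.arange(6).reshape(2, 3)
print(a.flags['C_CONTIGUOUS'])   # True
print(a.flags['F_CONTIGUOUS'])   # False

# Flatten row by row (C) versus column by column (Fortran/Matlab)
row_major = a.ravel(order='C')
col_major = a.ravel(order='F')
print(row_major)                 # [0 1 2 3 4 5]
print(col_major)                 # [0 3 1 4 2 5]

# An explicit column-major copy, laid out the way Matlab or Fortran would store it
a_f = np.asfortranarray(a)
print(a_f.flags['F_CONTIGUOUS']) # True
```

The element values are identical in both layouts; only the order in memory differs, which matters mainly for performance and when exchanging raw binary data with Fortran or Matlab code.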
type(temp)
numpy.ndarray
NumPy arrays are statically typed: all elements share a single data type (dtype), which allows faster operations:
temp.dtype
dtype('float64')
You can't assign a value of an incompatible type to an element of a NumPy array:
temp[0,0] = 'Year'
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-9-c2dbfa0bc312> in <module>
----> 1 temp[0,0] = 'Year'

ValueError: could not convert string to float: 'Year'
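"Incompatible" here means "not convertible to the array's dtype". The sketch below, using a small zero-filled array rather than the real data, shows that convertible values are cast silently, while a non-numeric string raises the same ValueError as above:

```python
import numpy as np

arr = np.zeros((2, 2))   # float64, like the array returned by np.loadtxt

# Values that can be converted to float64 are cast on assignment
arr[0, 0] = 5            # int -> 5.0
arr[0, 1] = '3.5'        # numeric string -> 3.5
arr[1, 0] = True         # bool -> 1.0

# A non-numeric string cannot be converted, so NumPy raises ValueError
raised = False
try:
    arr[1, 1] = 'Year'
except ValueError as err:
    raised = True
    print(err)

# In an integer array a float is truncated rather than rejected
ints = np.zeros(3, dtype=int)
ints[0] = 2.9            # stored as 2
```

The silent truncation in the last line is worth remembering: the dtype is fixed, so NumPy converts the value to fit rather than changing the array's type.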
Slicing works similarly to Matlab, except that indices start at 0 and the stop index of a slice is excluded:
temp[0:5,:]
array([[1891., 1., 1., -72.], [1891., 1., 2., -43.], [1891., 1., 3., -32.], [1891., 1., 4., 12.], [1891., 1., 5., -29.]])
temp[-5:-1,:]
array([[2014., 8., 27., 219.], [2014., 8., 28., 234.], [2014., 8., 29., 216.], [2014., 8., 30., 198.]])
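Note that `temp[-5:-1,:]` above returned only four rows and skipped 31 August, because the stop index is excluded. A sketch on a hypothetical 10-row stand-in with the same (year, month, day, value) column layout:

```python
import numpy as np

# Hypothetical stand-in for the data: 10 rows of (year, month, day, value)
data = np.column_stack([
    np.full(10, 2014.0),          # year
    np.full(10, 8.0),             # month
    np.arange(22.0, 32.0),        # day 22..31
    np.arange(10.0) * 10,         # dummy values 0, 10, ..., 90
])

# The stop index is excluded, so -5:-1 returns 4 rows and misses the last one
print(data[-5:-1, :].shape)   # (4, 4)

# Leave the stop index empty to include the last row
print(data[-5:, :].shape)     # (5, 4)
print(data[-1, 2])            # 31.0

# Matlab's temp(1:5,:) corresponds to data[0:5, :] in Python
first_five = data[0:5, :]
```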
You can also plot the data with the matplotlib module:
import matplotlib.pyplot as plt
plt.plot(temp[:,3])
In general, plotting with matplotlib is similar to plotting in Matlab.
First 12 elements of the third column (days). Remember that indexing starts with 0:
temp[0:12,2]
array([ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11., 12.])
First row:
temp[0,:]
array([1891., 1., 1., -72.])
We can create a mask that selects all rows where the value in the third column (days) equals 10:
mask = (temp[:,2]==10)
Here we apply this mask and show only the first 20 rows of the result:
temp[mask][:20,:]
array([[1891., 1., 10., -89.], [1891., 2., 10., -19.], [1891., 3., 10., 32.], [1891., 4., 10., 84.], [1891., 5., 10., 188.], [1891., 6., 10., 130.], [1891., 7., 10., 161.], [1891., 8., 10., 171.], [1891., 9., 10., 221.], [1891., 10., 10., 181.], [1891., 11., 10., 80.], [1891., 12., 10., 107.], [1892., 1., 10., -4.], [1892., 2., 10., 36.], [1892., 3., 10., 16.], [1892., 4., 10., 146.], [1892., 5., 10., 195.], [1892., 6., 10., 205.], [1892., 7., 10., 209.], [1892., 8., 10., 155.]])
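A mask is simply a boolean array with one entry per row. The sketch below uses a hypothetical, simplified table with columns (month, day, value), two "months" of 31 days each, to show its dtype, how to count matches, and how to get the matching row indices with `np.flatnonzero`:

```python
import numpy as np

# Hypothetical simplified table with columns (month, day, value):
# two "months" of 31 days each
days = np.tile(np.arange(1.0, 32.0), 2)
months = np.repeat([1.0, 2.0], 31)
data = np.column_stack([months, days, days * 10])

mask = data[:, 1] == 10           # one boolean per row
print(mask.dtype)                 # bool
print(mask.sum())                 # 2, since True counts as 1

idx = np.flatnonzero(mask)        # row indices where mask is True
selected = data[mask]             # keeps only those rows, all columns
```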
You don't have to create a separate variable for the mask; you can apply the condition directly. Here, instead of the first rows, I show the last five rows:
temp[temp[:,2]==10][-5:,:]
array([[2014., 4., 10., 116.], [2014., 5., 10., 27.], [2014., 6., 10., 300.], [2014., 7., 10., 277.], [2014., 8., 10., 259.]])
You can combine conditions with the & operator; each condition must be wrapped in parentheses. In this case we select days 10 to 12 (only the first 10 rows are shown):
temp[(temp[:,2]>=10)&(temp[:,2]<=12)][0:10,:]
array([[1891., 1., 10., -89.], [1891., 1., 11., 16.], [1891., 1., 12., 21.], [1891., 2., 10., -19.], [1891., 2., 11., 36.], [1891., 2., 12., 31.], [1891., 3., 10., 32.], [1891., 3., 11., 46.], [1891., 3., 12., 46.], [1891., 4., 10., 84.]])
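Why the parentheses matter, and which operators to use, shown on a hypothetical day-of-month column. Python's & binds more tightly than comparisons, and the keyword `and` does not work element-wise on arrays; both failure modes are caught below:

```python
import numpy as np

day = np.arange(1.0, 32.0)   # hypothetical day-of-month column, 1..31

# & is element-wise AND; each comparison needs its own parentheses
between = (day >= 10) & (day <= 12)
print(day[between])          # [10. 11. 12.]

# Equivalent function form
between_fn = np.logical_and(day >= 10, day <= 12)

# | is element-wise OR and ~ inverts a boolean mask
outside = (day < 10) | (day > 12)

# Without parentheses, & binds tighter than >=, so Python evaluates 10 & day,
# which is not defined for float arrays
precedence_error = False
try:
    day >= 10 & day <= 12
except TypeError:
    precedence_error = True

# The keyword `and` needs a single True/False value and fails on arrays
and_error = False
try:
    (day >= 10) and (day <= 12)
except ValueError:
    and_error = True
```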
Select only summer months
Select only first half of the year
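One possible way to approach the two exercises above, as a sketch only. It assumes "summer" means June, July and August (months 6 to 8) and uses a hypothetical 12-row stand-in named `temp` with the same (year, month, day, value) layout, since the real file is not loaded here:

```python
import numpy as np

# Hypothetical stand-in for temp: one row per month of one year,
# columns (year, month, day, value), with day fixed at 1
months = np.arange(1.0, 13.0)
temp = np.column_stack([np.full(12, 1891.0), months, np.ones(12), months * 10])

# Summer, assuming June, July and August (months 6, 7, 8)
summer = temp[(temp[:, 1] >= 6) & (temp[:, 1] <= 8)]

# np.isin is an alternative that also works for non-contiguous months
summer_isin = temp[np.isin(temp[:, 1], [6, 7, 8])]

# First half of the year: months 1 to 6
first_half = temp[temp[:, 1] <= 6]
```

Remember that the month is in the second column (index 1), not the third.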
Create an example array from the first 12 values of the third column (days) and perform some basic operations:
days = temp[0:12,2]
days
array([ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11., 12.])
days+10
array([11., 12., 13., 14., 15., 16., 17., 18., 19., 20., 21., 22.])
days*20
array([ 20., 40., 60., 80., 100., 120., 140., 160., 180., 200., 220., 240.])
days*days
array([ 1., 4., 9., 16., 25., 36., 49., 64., 81., 100., 121., 144.])
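Unlike Matlab, `*` on NumPy arrays is always element-wise (Matlab's `.*`); the matrix or dot product uses `@`. A short sketch with the same 1 to 12 values, also showing broadcasting of a column against a row:

```python
import numpy as np

days = np.arange(1.0, 13.0)    # same values as temp[0:12, 2]

# * is element-wise, like Matlab's .*
squares = days * days

# Dot (inner) product, like Matlab's days * days' for a row vector
dot = days @ days              # 1 + 4 + ... + 144 = 650.0

# Broadcasting: a scalar is applied to every element
shifted = days + 10

# Broadcasting a column against a row gives a 2-D multiplication table
table = days[:, np.newaxis] * days[np.newaxis, :]
print(table.shape)             # (12, 12)
```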
What's wrong with this figure?
plt.plot(temp[:100,3])
Create a new array that contains only the temperatures
Convert the temperatures to deg C
Convert all temperatures to deg F
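A sketch of the conversions, using the first five raw values of the last column from the output above. It assumes, as the division by 10 in the next step suggests, that the raw values are tenths of a degree Celsius; the Fahrenheit formula is the standard deg F = deg C * 9/5 + 32:

```python
import numpy as np

# First five raw values of the last column (assumed tenths of a degree Celsius)
raw = np.array([-72.0, -43.0, -32.0, 12.0, -29.0])

# New array with the temperatures only, converted to deg C
temp_c = raw / 10.0

# deg F = deg C * 9/5 + 32, applied element-wise to the whole array
temp_f = temp_c * 9.0 / 5.0 + 32.0
print(temp_f)
```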
Create temp_values, which contains only the temperature values in deg C (the raw values are in tenths of a degree, hence the division by 10):
temp_values = temp[:,3]/10.
temp_values
array([-7.2, -4.3, -3.2, ..., 21.6, 19.8, 18.4])
Simple statistics:
temp_values.min()
-14.6
temp_values.max()
37.3
temp_values.mean()
12.488779667020898
temp_values.std()
8.05358769929934
temp_values.sum()
564093.2
You can also use the np.sum function:
np.sum(temp_values)
564093.2
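One difference from Matlab worth knowing: NumPy's `std` divides by N by default (`ddof=0`), while Matlab's `std` divides by N - 1. The sketch below uses the first five temperature values and also shows the NaN-aware variants (`np.nanmean` and friends), which skip missing values:

```python
import numpy as np

values = np.array([-7.2, -4.3, -3.2, 1.2, -2.9])   # first five values in deg C

# Method and function forms compute the same thing
same = values.sum() == np.sum(values)

# Population standard deviation (divides by N), NumPy's default
pop_std = values.std()

# Sample standard deviation (divides by N - 1), matching Matlab's std
sample_std = values.std(ddof=1)

# A single NaN makes the ordinary mean NaN; nanmean ignores it
with_gap = np.array([1.0, np.nan, 3.0])
print(np.mean(with_gap))      # nan
print(np.nanmean(with_gap))   # 2.0
```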
You can also perform operations on subsets:
Calculate the mean of the first 1000 temperature values
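A sketch of statistics on subsets, using a hypothetical evenly spaced series as a stand-in for temp_values. It covers a slice, a boolean subset, and a per-column mean via the `axis` argument:

```python
import numpy as np

# Hypothetical stand-in for temp_values
temp_values = np.linspace(-10.0, 30.0, 5000)

# Mean of the first 1000 values: slice first, then reduce
first_1000_mean = temp_values[:1000].mean()

# Statistics of a boolean subset, e.g. only values above freezing
warm_mean = temp_values[temp_values > 0].mean()

# Reduce along one axis of a 2-D array: mean of each of 4 columns
table = temp_values.reshape(-1, 4)
column_means = table.mean(axis=0)
```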
You can save your data as a text file:
np.savetxt('temp_only_values.csv', temp[:, 3]/10., fmt='%.4f')
The head of the resulting file (one value per line):
!head temp_only_values.csv
-7.2000
-4.3000
-3.2000
1.2000
-2.9000
-4.3000
-3.7000
-9.7000
-9.9000
-8.9000
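A round-trip sketch with `np.savetxt` and `np.loadtxt`, writing to a temporary directory so it does not touch the working directory. The five input values are taken from the file head above:

```python
import os
import tempfile
import numpy as np

values = np.array([-7.2, -4.3, -3.2, 1.2, -2.9])
path = os.path.join(tempfile.mkdtemp(), 'temp_only_values.csv')

# A 1-D array is written one value per line; fmt controls the number format
np.savetxt(path, values, fmt='%.4f')

with open(path) as f:
    lines = f.read().splitlines()
print(lines[0])    # -7.2000

# Read it back; loadtxt returns float64 by default
restored = np.loadtxt(path)
```

Despite the .csv extension, a 1-D array produces no commas. For a 2-D array, pass delimiter=',' to get a real comma-separated file.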
You can also save it in binary format:
# open the file in binary mode ('wb'); tofile writes the raw bytes of the array
with open('temp_only_values.bin', 'wb') as f:
    temp[:,3].tofile(f)
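A file written by `tofile` contains only raw bytes, with no dtype or shape, so reading it back with `np.fromfile` requires the correct dtype. The sketch below uses the first five raw values and a temporary directory, and also shows `np.save`/`np.load`, which store dtype and shape in a self-describing .npy file:

```python
import os
import tempfile
import numpy as np

values = np.array([-72.0, -43.0, -32.0, 12.0, -29.0])   # float64
path = os.path.join(tempfile.mkdtemp(), 'temp_only_values.bin')

with open(path, 'wb') as f:
    values.tofile(f)

# Only raw bytes were written: 5 values * 8 bytes each
print(os.path.getsize(path))    # 40

# The dtype must be supplied when reading it back
restored = np.fromfile(path, dtype=np.float64)

# Guessing the wrong dtype silently yields 10 meaningless float32 numbers
wrong = np.fromfile(path, dtype=np.float32)

# np.save / np.load keep dtype and shape in the .npy header
npy_path = os.path.join(os.path.dirname(path), 'temp_only_values.npy')
np.save(npy_path, values)
restored_npy = np.load(npy_path)
```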