Pandas basics

In [1]:
import pandas as pd

Pandas series

A pandas Series is similar to a NumPy array, but it supports a lot of extra functionality, such as Series.describe().

Basic access is similar to a NumPy array: a Series supports access by position ( s[5] ) and slicing ( s[5:10] ).
It also supports vectorised operations and looping, like a NumPy array.
The core operations are implemented in C, so they run fast.
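The access patterns described above can be sketched as follows. This is a minimal illustration with made-up values; it uses .iloc for positional access so that the intent stays explicit.

```python
import pandas as pd

# Hypothetical sample data, chosen only for illustration.
s = pd.Series([2, 3, 4, 5, 6, 7, 8])

# Positional access and slicing (the end position is excluded).
third = s.iloc[2]
middle = s.iloc[2:5]

# Vectorised arithmetic is applied to every element at once.
doubled = s * 2

# Looping works as with a NumPy array, but is usually slower than vectorised code.
total = 0
for value in s:
    total += value

print(third, list(middle), list(doubled), total)
```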

Benefits of pandas Series

In [8]:
s = pd.Series([2, 3, 4, 5, 6])
print(s.describe())
count    5.000000
mean     4.000000
std      1.581139
min      2.000000
25%      3.000000
50%      4.000000
75%      5.000000
max      6.000000
dtype: float64
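The summary returned by describe() is itself a Series, so individual statistics can be looked up by label. The short sketch below assumes the same five values as the cell above and compares the summary entries with the corresponding direct methods.

```python
import pandas as pd

s = pd.Series([2, 3, 4, 5, 6])

# describe() returns a Series indexed by statistic name.
summary = s.describe()

mean_from_summary = summary["mean"]
median_from_summary = summary["50%"]

# The same statistics are available as direct methods.
mean_direct = s.mean()
median_direct = s.median()
std_direct = s.std()  # sample standard deviation (ddof=1)

print(mean_from_summary, median_from_summary, std_direct)
```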

Pandas Index

An indexed Series is a hybrid of a list and a Python dictionary: the index maps keys (labels) to values, while the values keep their order.

In [11]:
sal = pd.Series([40, 12, 43, 56],
                index=["Ram", "Syam", "Rahul", "Ganesh"])
print(sal)
Ram       40
Syam      12
Rahul     43
Ganesh    56
dtype: int64
In [20]:
print(sal[0])
40

Lookup by index label

In [21]:
print(sal.loc["Syam"])
12

Using sal[position] is not preferred; use sal.iloc[position] instead. In a Series, "index" refers to the labels, so an explicit .iloc avoids confusion between positions and labels.

In [19]:
print(sal.iloc[3])
56
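The difference between labels and positions is easiest to see when the labels are themselves integers. The sketch below uses a hypothetical Series (the name scores and its values are not part of this notebook) whose integer labels are deliberately out of order.

```python
import pandas as pd

# Hypothetical data: integer labels that do not match the positions.
scores = pd.Series([10, 20, 30], index=[2, 0, 1])

# .loc looks up by label: the element labelled 0 is the second one.
by_label = scores.loc[0]

# .iloc looks up by position: the first element, whatever its label.
by_position = scores.iloc[0]

print(by_label, by_position)
```

Writing .loc or .iloc explicitly states which of the two meanings is intended.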

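idxmax() returns the index label of the maximum element. In old pandas versions argmax() did the same; since pandas 1.0, argmax() returns the integer position instead.

In [24]:
print(sal.idxmax())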
Ganesh
In [25]:
print(sal.loc["Ganesh"])
print(sal.max())
56
56
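As a rough comparison of the label-based and position-based lookups of the maximum, the sketch below assumes pandas 1.0 or later, where argmax() returns an integer position and idxmax() returns the label.

```python
import pandas as pd

sal = pd.Series([40, 12, 43, 56],
                index=["Ram", "Syam", "Rahul", "Ganesh"])

# Label of the maximum element.
label = sal.idxmax()

# Integer position of the maximum element (assumes pandas >= 1.0).
position = sal.argmax()

# Both routes lead to the same value.
value_by_label = sal.loc[label]
value_by_position = sal.iloc[position]

print(label, position, value_by_label, value_by_position)
```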

Adding Series with different indexes

In [27]:
a = pd.Series([1, 2, 3, 4],
              index=["a", "b", "c", "d"])
b = pd.Series([9, 8, 7, 6],
              index=["c", "d", "e", "f"])
print(a)
a    1
b    2
c    3
d    4
dtype: int64
In [28]:
print(b)
c    9
d    8
e    7
f    6
dtype: int64
In [29]:
print(a + b)
a   NaN
b   NaN
c    12
d    12
e   NaN
f   NaN
dtype: float64

The labels c and d are present in both Series, so their values are added; every other label is assigned NaN (Not a Number).

We can change this behaviour: either drop all NaN results, or keep the original value where a label is missing from one Series instead of producing NaN.

In [35]:
res = a + b
print(res.dropna())
c    12
d    12
dtype: float64

Treat missing values as 0

In [37]:
res = a.add(b, fill_value=0)
print(res)
a     1
b     2
c    12
d    12
e     7
f     6
dtype: float64
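The three ways of handling mismatched labels shown above can be compared side by side. This sketch reuses the same a and b; the final astype(int) step is an addition for illustration, since the aligned result is float64 even when every value is a whole number.

```python
import pandas as pd

a = pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"])
b = pd.Series([9, 8, 7, 6], index=["c", "d", "e", "f"])

# Plain addition: NaN wherever a label exists in only one Series.
plain = a + b

# Keep only the labels present in both Series.
common = plain.dropna()

# Treat the missing side as 0, so unmatched values pass through unchanged.
filled = a.add(b, fill_value=0)

# Optional: convert back to integers once no NaN values remain.
as_int = filled.astype(int)

print(as_int)
```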

s.apply(function_name) applies a function to each element of the Series.

Example:

Adding 5 to each element. We could simply write series + 5, because arithmetic on a Series is vectorised, but let's do it with the new technique, s.apply(function).

In [39]:
print(res)
a     1
b     2
c    12
d    12
e     7
f     6
dtype: float64
In [40]:
print(res + 5)
a     6
b     7
c    17
d    17
e    12
f    11
dtype: float64
In [41]:
def add_5(x):
    return x+5
In [44]:
print(res.apply(add_5))
a     6
b     7
c    17
d    17
e    12
f    11
dtype: float64
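apply() is most useful when the operation has no direct vectorised form. The sketch below rebuilds res by hand and applies a hypothetical helper, size_label, whose name and threshold of 7 are arbitrary choices for illustration.

```python
import pandas as pd

# Same values as res above, rebuilt here so the example is self-contained.
res = pd.Series([1.0, 2.0, 12.0, 12.0, 7.0, 6.0],
                index=["a", "b", "c", "d", "e", "f"])

def size_label(x):
    # Hypothetical rule: values of 7 or more count as "big".
    return "big" if x >= 7 else "small"

# apply() calls the function once per element and collects the results.
labels = res.apply(size_label)

# For plain arithmetic, apply() and the vectorised form give the same result.
plus_five = res.apply(lambda x: x + 5)

print(labels)
```

For simple arithmetic the vectorised form ( res + 5 ) is generally the faster choice; apply() trades speed for flexibility.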

Plotting

Series.plot() automatically plots the data against the index.

In [47]:
%matplotlib inline
res.plot()
Out[47]:
<matplotlib.axes.AxesSubplot at 0x5746350>