In this lesson we're going back to the basics. We will work with a small data set so that you can easily follow what I am explaining. We will add columns, delete columns, and slice the data in several different ways. Enjoy!
# Import libraries
import pandas as pd
import sys
print('Python version ' + sys.version)
print('Pandas version: ' + pd.__version__)
Python version 3.7.4 (default, Aug 9 2019, 18:34:13) [MSC v.1915 64 bit (AMD64)]
Pandas version: 1.3.5
# Our small data set
d = [0,1,2,3,4,5,6,7,8,9]
# Create dataframe
df = pd.DataFrame(d)
df
| | 0 |
---|---|
0 | 0 |
1 | 1 |
2 | 2 |
3 | 3 |
4 | 4 |
5 | 5 |
6 | 6 |
7 | 7 |
8 | 8 |
9 | 9 |
# Let's change the name of the column
df.columns = ['Rev']
df
| | Rev |
---|---|
0 | 0 |
1 | 1 |
2 | 2 |
3 | 3 |
4 | 4 |
5 | 5 |
6 | 6 |
7 | 7 |
8 | 8 |
9 | 9 |
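Assigning to `df.columns` is not the only way to name a column. As a minimal sketch, the column can also be named when the dataframe is created, or renamed afterwards with `rename`. The variable names `named` and `renamed` are only for illustration and are not used elsewhere in this lesson.

```python
import pandas as pd

d = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# Name the column at construction time
named = pd.DataFrame(d, columns=['Rev'])

# Or rename the default column label 0 afterwards.
# rename returns a new dataframe instead of changing the original.
renamed = pd.DataFrame(d).rename(columns={0: 'Rev'})

print(list(named.columns))
print(list(renamed.columns))
```

Both approaches should give a single column called `Rev` holding the same values.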
# Let's add a column
df['NewCol'] = 5
df
| | Rev | NewCol |
---|---|---|
0 | 0 | 5 |
1 | 1 | 5 |
2 | 2 | 5 |
3 | 3 | 5 |
4 | 4 | 5 |
5 | 5 | 5 |
6 | 6 | 5 |
7 | 7 | 5 |
8 | 8 | 5 |
9 | 9 | 5 |
# Let's modify our new column
df['NewCol'] = df['NewCol'] + 1
df
| | Rev | NewCol |
---|---|---|
0 | 0 | 6 |
1 | 1 | 6 |
2 | 2 | 6 |
3 | 3 | 6 |
4 | 4 | 6 |
5 | 5 | 6 |
6 | 6 | 6 |
7 | 7 | 6 |
8 | 8 | 6 |
9 | 9 | 6 |
# We can delete columns
del df['NewCol']
df
| | Rev |
---|---|
0 | 0 |
1 | 1 |
2 | 2 |
3 | 3 |
4 | 4 |
5 | 5 |
6 | 6 |
7 | 7 |
8 | 8 |
9 | 9 |
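`del` removes the column from the dataframe in place. If you would rather keep the original untouched, `drop` returns a new dataframe without the column. The sketch below rebuilds a small frame so it can run on its own; the name `trimmed` is just for illustration.

```python
import pandas as pd

# Rebuild a frame like the one above: Rev = 0..9, NewCol = 6
df = pd.DataFrame({'Rev': range(10), 'NewCol': 6})

# drop returns a new dataframe and leaves df unchanged
trimmed = df.drop(columns=['NewCol'])

print(list(trimmed.columns))
print(list(df.columns))
```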
# Let's add a couple of columns
df['test'] = 3
df['col'] = df['Rev']
df
| | Rev | test | col |
---|---|---|---|
0 | 0 | 3 | 0 |
1 | 1 | 3 | 1 |
2 | 2 | 3 | 2 |
3 | 3 | 3 | 3 |
4 | 4 | 3 | 4 |
5 | 5 | 3 | 5 |
6 | 6 | 3 | 6 |
7 | 7 | 3 | 7 |
8 | 8 | 3 | 8 |
9 | 9 | 3 | 9 |
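The same two columns can also be added in one step with `assign`, which returns a new dataframe rather than modifying the existing one. This is only a sketch of an alternative, and the name `wider` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'Rev': range(10)})

# assign adds the columns to a copy; df itself keeps only 'Rev'
wider = df.assign(test=3, col=df['Rev'])

print(list(wider.columns))
print(list(df.columns))
```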
# If we wanted, we could change the index labels
i = ['a','b','c','d','e','f','g','h','i','j']
df.index = i
df
| | Rev | test | col |
---|---|---|---|
a | 0 | 3 | 0 |
b | 1 | 3 | 1 |
c | 2 | 3 | 2 |
d | 3 | 3 | 3 |
e | 4 | 3 | 4 |
f | 5 | 3 | 5 |
g | 6 | 3 | 6 |
h | 7 | 3 | 7 |
i | 8 | 3 | 8 |
j | 9 | 3 | 9 |
We can now start to select pieces of the dataframe using *loc*.
df.loc['a']
Rev     0
test    3
col     0
Name: a, dtype: int64
# df.loc[inclusive:inclusive]
df.loc['a':'d']
| | Rev | test | col |
---|---|---|---|
a | 0 | 3 | 0 |
b | 1 | 3 | 1 |
c | 2 | 3 | 2 |
d | 3 | 3 | 3 |
# df.iloc[inclusive:exclusive]
# Note: .iloc is strictly integer-position based. It is available from [version 0.11.0](http://pandas.pydata.org/pandas-docs/stable/whatsnew.html#v0-11-0-april-22-2013)
df.iloc[0:3]
| | Rev | test | col |
---|---|---|---|
a | 0 | 3 | 0 |
b | 1 | 3 | 1 |
c | 2 | 3 | 2 |
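The difference between the two slicing rules is easy to miss, so here is a small side-by-side sketch. It rebuilds a frame with the same letter index so it runs on its own; `by_label` and `by_position` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'Rev': range(10)}, index=list('abcdefghij'))

# Label slice: both ends are included, so 'd' is part of the result
by_label = df.loc['a':'d']

# Position slice: the stop position is excluded, like a Python list slice
by_position = df.iloc[0:3]

print(list(by_label.index))
print(list(by_position.index))
```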
We can also select using the column name.
df['Rev']
a    0
b    1
c    2
d    3
e    4
f    5
g    6
h    7
i    8
j    9
Name: Rev, dtype: int64
df[['Rev', 'test']]
| | Rev | test |
---|---|---|
a | 0 | 3 |
b | 1 | 3 |
c | 2 | 3 |
d | 3 | 3 |
e | 4 | 3 |
f | 5 | 3 |
g | 6 | 3 |
h | 7 | 3 |
i | 8 | 3 |
j | 9 | 3 |
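Note that the two forms return different types: a single column name gives back a Series, while a list of names gives back a DataFrame, even when the list holds only one name. A short sketch, with `one` and `two` as illustrative names:

```python
import pandas as pd

df = pd.DataFrame({'Rev': range(10), 'test': 3}, index=list('abcdefghij'))

one = df['Rev']      # single label -> Series
two = df[['Rev']]    # list of labels -> DataFrame with one column

print(type(one).__name__)
print(type(two).__name__)
```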
# Select rows and columns together: df.loc[rows, columns]
# .ix was removed in pandas 1.0; this replaces the old df.ix[0:3,'Rev']
df.loc[df.index[0:3],'Rev']
a    0
b    1
c    2
Name: Rev, dtype: int64
# Replaces the old df.ix[5:,'col']
df.loc[df.index[5:],'col']
f    5
g    6
h    7
i    8
j    9
Name: col, dtype: int64
# Replaces the old df.ix[:3,['col', 'test']]
df.loc[df.index[:3],['col', 'test']]
| | col | test |
---|---|---|
a | 0 | 3 |
b | 1 | 3 |
c | 2 | 3 |
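Another way to combine row positions with column names is to stay in `iloc` and convert the column names to positions first. The sketch below uses `Index.get_loc` and `Index.get_indexer` for that conversion; it is one possible approach rather than the method this lesson relies on, and the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {'Rev': range(10), 'test': 3, 'col': range(10)},
    index=list('abcdefghij'),
)

# Position of a single column label
rev_pos = df.columns.get_loc('Rev')
first_three = df.iloc[0:3, rev_pos]

# Positions of several column labels, in the order requested
col_positions = df.columns.get_indexer(['col', 'test'])
subset = df.iloc[:3, col_positions]

print(first_three.tolist())
print(list(subset.columns))
```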
There are also some handy functions for selecting the top and bottom records of a dataframe.
# Select the first N records (default N = 5)
df.head()
| | Rev | test | col |
---|---|---|---|
a | 0 | 3 | 0 |
b | 1 | 3 | 1 |
c | 2 | 3 | 2 |
d | 3 | 3 | 3 |
e | 4 | 3 | 4 |
# Select the last N records (default N = 5)
df.tail()
| | Rev | test | col |
---|---|---|---|
f | 5 | 3 | 5 |
g | 6 | 3 | 6 |
h | 7 | 3 | 7 |
i | 8 | 3 | 8 |
j | 9 | 3 | 9 |
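Both functions accept the number of records as an argument when the default of 5 is not what you want. A quick sketch on a rebuilt frame, with `top3` and `bottom2` as illustrative names:

```python
import pandas as pd

df = pd.DataFrame({'Rev': range(10)}, index=list('abcdefghij'))

top3 = df.head(3)      # first three records
bottom2 = df.tail(2)   # last two records

print(list(top3.index))
print(list(bottom2.index))
```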
This tutorial was created by HEDARO