pandas methods and GeoPandas GeoDataFrame objects

This notebook demonstrates a number of pandas DataFrame methods and attributes on GeoPandas GeoDataFrame objects. Because GeoDataFrame is a subclass of the pandas DataFrame, a GeoDataFrame can be used as if it were a DataFrame.
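The subclass relationship can be checked directly. The sketch below assumes geopandas and shapely are installed, and it builds a tiny GeoDataFrame from two made-up points rather than from the MTBS shapefile; the column names `name` and `year` are illustrative only.

```python
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point

# Two made-up points, for illustration only (not taken from the MTBS data).
gdf = gpd.GeoDataFrame(
    {"name": ["a", "b"], "year": [1999, 2004]},
    geometry=[Point(-141.9, 65.3), Point(-162.3, 67.8)],
)

# GeoDataFrame inherits from DataFrame, so both checks succeed.
is_subclass = issubclass(gpd.GeoDataFrame, pd.DataFrame)
is_instance = isinstance(gdf, pd.DataFrame)
print(is_subclass, is_instance)

# Inherited pandas attributes and methods work unchanged.
print(gdf.shape)
print(gdf["year"].max())
```

The geometry column counts as an ordinary column here, which is why the shape reports three columns.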

In [36]:
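# import geopandas and read the shapefile
import geopandas as gpd

# raw string, so the backslashes in the Windows path are not read as escape sequences
df = gpd.read_file(r"C:\data\mtbs_fod_pts_data\mtbs_fod_pts_20170501.shp")

# display the first rows; a GeoDataFrame displays just like a pandas DataFrame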
df.head()
Out[36]:
   FIRE_ID                FIRENAME      ASMNT_TYPE    PRE_ID          POST_ID         ND_T     IG_T     LOW_T  MOD_T    HIGH_T   ...  ADMIN  MTBS_ZONE  GACC      HUC4_CODE  HUC4_NAME            Version  RevCode  RelDate     Fire_Type  geometry
0  AK6529614185119990612  WITCH         Extended      50660141994192  50650141999199  -970.0   -150.0   75.0   350.0    600.0    ...  NPS    Alaska     Alaska    19040401   Eagle to Circle      Revised  A        2013-08-31  WF         POINT (-141.851 65.29600000000001)
1  AK6775716231420040531  UVGOON CREEK  Initial       70800122002176  70800122004183  -970.0   -150.0   -10.0  250.0    9999.0   ...  NPS    Alaska     Alaska    19050403   Lower Noatuk River   Revised  A        2013-08-31  WF         POINT (-162.314 67.75700000000001)
2  AL3038308812219980404  UNNAMED       Initial (SS)  NA              50210391998128  -9999.0  -9999.0  650.0  -9999.0  -9999.0  ...  OTHER  Southeast  Southern  03160205   Mobile Bay           Revised  A        2013-08-31  WF         POINT (-88.122 30.383)
3  AL3041008830120050916  BAY FIRE      Initial (SS)  NA              50210392005291  -9999.0  -9999.0  525.0  72.0     -175.0   ...  OTHER  Southeast  Southern  03170009   Mississippi Coastal  Revised  A        2013-08-31  WF         POINT (-88.301 30.41)
4  AL3067208833720050207  UNNAMED       Initial       50210392004049  50210392005083  -970.0   -150.0   60.0   450.0    9999.0   ...  OTHER  Southeast  Southern  03170008   Escatawpa            Revised  A        2013-08-31  RX         POINT (-88.337 30.672)

5 rows × 30 columns

In [9]:
# a GeoDataFrame has the same three components as a pandas DataFrame:
# the row index, the column index, and the data. Here each is assigned to its own variable
index = df.index
columns = df.columns
data = df.values
In [3]:
index # the "row index"; here a RangeIndex running from 0 up to (not including) the number of rows
Out[3]:
RangeIndex(start=0, stop=20340, step=1)
In [6]:
columns # "column index"
Out[6]:
Index(['FIRE_ID', 'FIRENAME', 'ASMNT_TYPE', 'PRE_ID', 'POST_ID', 'ND_T',
       'IG_T', 'LOW_T', 'MOD_T', 'HIGH_T', 'FIRE_YEAR', 'FIRE_MON', 'FIRE_DAY',
       'LAT', 'LONG', 'WRS_PATH', 'WRS_ROW', 'P_ACRES', 'R_ACRES', 'STATE',
       'ADMIN', 'MTBS_ZONE', 'GACC', 'HUC4_CODE', 'HUC4_NAME', 'Version',
       'RevCode', 'RelDate', 'Fire_Type', 'geometry'],
      dtype='object')
In [7]:
data # returns the values of all rows and columns as a NumPy array
Out[7]:
array([['AK6529614185119990612', 'WITCH', 'Extended', ..., '2013-08-31',
        'WF', <shapely.geometry.point.Point object at 0x0000018EB8F4E400>],
       ['AK6775716231420040531', 'UVGOON CREEK', 'Initial', ...,
        '2013-08-31', 'WF',
        <shapely.geometry.point.Point object at 0x0000018EB8F4E9E8>],
       ['AL3038308812219980404', 'UNNAMED', 'Initial (SS)', ...,
        '2013-08-31', 'WF',
        <shapely.geometry.point.Point object at 0x0000018EB8F4EF98>],
       ..., 
       ['WY4495310930920110821', 'HOLE IN THE WALL', 'Extended (SS)', ...,
        '2013-07-31', 'WF',
        <shapely.geometry.point.Point object at 0x0000018EBEDC0C88>],
       ['WY4497710769520030809', 'LITTLE HORN II', 'Extended', ...,
        '2009-11-18', 'WF',
        <shapely.geometry.point.Point object at 0x0000018EBEDC31D0>],
       ['WY4499610614320120921', 'BORDER', 'Initial', ..., '2014-04-16',
        'WF', <shapely.geometry.point.Point object at 0x0000018EBEDC3748>]], dtype=object)
In [10]:
# RangeIndex is a special type of index object analogous to a Python range object;
# it saves memory by storing only the start, stop, and step values
type(index)
Out[10]:
pandas.core.indexes.range.RangeIndex
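The memory saving can be made visible by comparing a RangeIndex with an index that stores every label explicitly. This is a minimal sketch using plain pandas and NumPy; the names `lazy_index` and `full_index` are illustrative, and the row count is the one reported by the RangeIndex in this notebook.

```python
import numpy as np
import pandas as pd

n_rows = 20340  # row count reported by the RangeIndex in this notebook

# RangeIndex keeps only start, stop, and step.
lazy_index = pd.RangeIndex(start=0, stop=n_rows, step=1)

# An index built from an array stores one 8-byte integer per row.
full_index = pd.Index(np.arange(n_rows, dtype="int64"))

# Both indexes hold the same labels ...
same_labels = lazy_index.equals(full_index)

# ... but they differ greatly in size.
print(same_labels)
print(lazy_index.nbytes, full_index.nbytes)
```

The exact byte count of the RangeIndex depends on the Python build, so only the comparison between the two numbers is meaningful.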
In [11]:
type(columns)
Out[11]:
pandas.core.indexes.base.Index
In [12]:
type(data) # all three are exactly the same as with a pandas dataframe. 
Out[12]:
numpy.ndarray
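The array shown for `data` has `dtype=object` because the table mixes text and numbers, and NumPy needs a single element type for the whole array. The sketch below, with illustrative values and the pandas method `to_numpy()` (which returns the same kind of array as `.values`), shows the difference between an all-numeric frame and a mixed one.

```python
import pandas as pd

# All-float frame; the values are illustrative.
numeric = pd.DataFrame({"LAT": [65.296, 67.757], "LONG": [-141.851, -162.314]})

# Same frame with one text column added.
mixed = numeric.assign(
    FIRENAME=pd.Series(["WITCH", "UVGOON CREEK"], dtype=object)
)

numeric_array = numeric.to_numpy()
mixed_array = mixed.to_numpy()

print(numeric_array.dtype)  # a numeric dtype shared by every column
print(mixed_array.dtype)    # falls back to generic Python objects
```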
In [37]:
# print the data type of each column (first five shown); many are object because shapefile attribute tables contain text as well as numbers
df.dtypes.head() 
Out[37]:
FIRE_ID       object
FIRENAME      object
ASMNT_TYPE    object
PRE_ID        object
POST_ID       object
dtype: object
In [14]:
# count the columns of each data type
# (get_dtype_counts() was removed in pandas 1.0; df.dtypes.value_counts() replaces it)
df.get_dtype_counts()
Out[14]:
float64    14
object     16
dtype: int64
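On pandas 1.0 and later, where `get_dtype_counts()` is no longer available, the same tally can be obtained with `dtypes.value_counts()`. A minimal sketch on a small made-up frame (the column `is_wildfire` is hypothetical and not part of the MTBS data):

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    {
        "LOW_T": np.array([75.0, -10.0]),
        "MOD_T": np.array([350.0, 250.0]),
        "is_wildfire": np.array([True, True]),  # hypothetical column
    }
)

# dtypes is a Series of dtype objects, so value_counts() tallies them.
dtype_counts = frame.dtypes.value_counts()
print(dtype_counts)

# Plain dict keyed by dtype name, convenient for further checks.
counts_by_name = {str(dtype): int(n) for dtype, n in dtype_counts.items()}
```

Unlike `get_dtype_counts()`, the result is ordered by count rather than by dtype name.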
In [19]:
# print the type of a single column: selecting one column returns a pandas Series
type(df['FIRENAME'])
Out[19]:
pandas.core.series.Series
In [21]:
# compare different types of columns data
firename = df['FIRENAME'] #object type 
fireyear = df['FIRE_YEAR'] #float64 type
In [38]:
# for an object column, returns the number of occurrences of each unique value, most frequent first
firename.value_counts().head()
Out[38]:
UNNAMED       7594
COTTONWOOD      23
BEAR CREEK      16
COYOTE          16
ROCK CREEK      14
Name: FIRENAME, dtype: int64
In [39]:
# returns counts per year
fireyear.value_counts().head()
Out[39]:
2011.0    1708
2010.0    1220
2015.0    1152
2006.0    1063
2014.0    1001
Name: FIRE_YEAR, dtype: int64
In [24]:
# describe method used on object type
firename.describe()
Out[24]:
count       20340
unique      10534
top       UNNAMED
freq         7594
Name: FIRENAME, dtype: object
In [25]:
# describe method used on float64 type, returns more statistics as expected with numerical values
fireyear.describe()
Out[25]:
count    20340.000000
mean      2003.333972
std          8.894303
min       1984.000000
25%       1997.000000
50%       2006.000000
75%       2011.000000
max       2015.000000
Name: FIRE_YEAR, dtype: float64
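The two `describe()` outputs differ because pandas chooses the summary statistics from the dtype of the Series. The sketch below reproduces this on two short made-up Series, so the result can be checked by hand.

```python
import pandas as pd

names = pd.Series(["UNNAMED", "WITCH", "UNNAMED", "BAY FIRE"], dtype=object)
years = pd.Series([1998.0, 1999.0, 2005.0, 2005.0])

text_summary = names.describe()     # count, unique, top, freq
number_summary = years.describe()   # count, mean, std, min, quartiles, max

print(list(text_summary.index))
print(list(number_summary.index))

# Individual statistics are accessed by label.
print(text_summary["top"], text_summary["freq"])
print(number_summary["mean"])
```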
In [40]:
# returns a boolean Series that is True wherever a value is null
firename.isnull().head()
Out[40]:
0    False
1    False
2    False
3    False
4    False
Name: FIRENAME, dtype: bool
In [41]:
fireyear.isnull().head()
Out[41]:
0    False
1    False
2    False
3    False
4    False
Name: FIRE_YEAR, dtype: bool
In [29]:
# checks whether the column contains any NaN values
firename.hasnans
Out[29]:
False
In [30]:
fireyear.hasnans
Out[30]:
False
In [42]:
# the inverse of isnull(): returns a boolean Series that is True wherever a value is not null
fireyear.notnull().head()
Out[42]:
0    True
1    True
2    True
3    True
4    True
Name: FIRE_YEAR, dtype: bool
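Because neither column in this dataset contains missing values, the outputs above are uniform. A small made-up Series with one NaN and one zero shows how `isnull()`, `notnull()`, and `hasnans` actually behave, and that a zero is an ordinary value rather than a missing one.

```python
import numpy as np
import pandas as pd

# Made-up values: one missing entry and one zero.
years = pd.Series([1999.0, np.nan, 2005.0, 0.0])

null_mask = years.isnull()       # True only for the NaN
not_null_mask = years.notnull()  # element-wise inverse of isnull()

print(null_mask.tolist())
print(not_null_mask.tolist())
print(years.hasnans)             # single boolean for the whole Series
```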
In [33]:
# compares every row with a given value and returns a boolean Series
firename = df['FIRENAME']
firename == "UNNAMED"
Out[33]:
0        False
1        False
2         True
3        False
4         True
5         True
6         True
7         True
8         True
9         True
10        True
11       False
12        True
13        True
14        True
15        True
16       False
17       False
18       False
19       False
20       False
21       False
22       False
23        True
24       False
25       False
26       False
27       False
28       False
29        True
         ...  
20310    False
20311     True
20312    False
20313    False
20314    False
20315    False
20316    False
20317    False
20318    False
20319    False
20320    False
20321    False
20322    False
20323    False
20324    False
20325    False
20326    False
20327    False
20328    False
20329    False
20330    False
20331    False
20332    False
20333    False
20334    False
20335    False
20336    False
20337    False
20338    False
20339    False
Name: FIRENAME, Length: 20340, dtype: bool
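A boolean Series like this one is usually not read row by row; it is summed or used as a mask to select rows. The sketch below uses four rows with names and years taken from the sample output in this notebook; the variable names are illustrative.

```python
import pandas as pd

fires = pd.DataFrame(
    {
        "FIRENAME": pd.Series(
            ["WITCH", "UNNAMED", "BAY FIRE", "UNNAMED"], dtype=object
        ),
        "FIRE_YEAR": [1999.0, 1998.0, 2005.0, 2005.0],
    }
)

is_unnamed = fires["FIRENAME"] == "UNNAMED"

n_unnamed = int(is_unnamed.sum())  # True counts as 1, False as 0
unnamed_fires = fires[is_unnamed]  # keeps only rows where the mask is True
named_fires = fires[~is_unnamed]   # ~ inverts the mask

print(n_unnamed)
print(unnamed_fires)
print(named_fires)
```

The same indexing works on a GeoDataFrame, since it inherits `__getitem__` from the pandas DataFrame.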
In [34]:
# example of chaining methods: isnull() flags the null values, sum() counts them
fireyear.isnull().sum()
Out[34]:
0
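The same chain also works on a whole frame, where it gives the number of missing values per column. This is a sketch on a made-up frame that contains a few NaN values so that the counts are not all zero.

```python
import numpy as np
import pandas as pd

# Made-up values with deliberate gaps.
fires = pd.DataFrame(
    {
        "FIRE_YEAR": [1999.0, np.nan, 2005.0],
        "P_ACRES": [np.nan, np.nan, 1200.0],
    }
)

# isnull() gives a boolean frame; sum() adds up the True values per column.
missing_per_column = fires.isnull().sum()

# A second sum() collapses the per-column counts into one total.
total_missing = int(fires.isnull().sum().sum())

print(missing_per_column)
print(total_missing)
```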
In [43]:
# set a different column as index, in this case "FIRENAME"
df2 = df.set_index('FIRENAME')
df2.head()
Out[43]:
              FIRE_ID                ASMNT_TYPE    PRE_ID          POST_ID         ND_T     IG_T     LOW_T  MOD_T    HIGH_T   FIRE_YEAR  ...  ADMIN  MTBS_ZONE  GACC      HUC4_CODE  HUC4_NAME            Version  RevCode  RelDate     Fire_Type  geometry
FIRENAME
WITCH         AK6529614185119990612  Extended      50660141994192  50650141999199  -970.0   -150.0   75.0   350.0    600.0    1999.0     ...  NPS    Alaska     Alaska    19040401   Eagle to Circle      Revised  A        2013-08-31  WF         POINT (-141.851 65.29600000000001)
UVGOON CREEK  AK6775716231420040531  Initial       70800122002176  70800122004183  -970.0   -150.0   -10.0  250.0    9999.0   2004.0     ...  NPS    Alaska     Alaska    19050403   Lower Noatuk River   Revised  A        2013-08-31  WF         POINT (-162.314 67.75700000000001)
UNNAMED       AL3038308812219980404  Initial (SS)  NA              50210391998128  -9999.0  -9999.0  650.0  -9999.0  -9999.0  1998.0     ...  OTHER  Southeast  Southern  03160205   Mobile Bay           Revised  A        2013-08-31  WF         POINT (-88.122 30.383)
BAY FIRE      AL3041008830120050916  Initial (SS)  NA              50210392005291  -9999.0  -9999.0  525.0  72.0     -175.0   2005.0     ...  OTHER  Southeast  Southern  03170009   Mississippi Coastal  Revised  A        2013-08-31  WF         POINT (-88.301 30.41)
UNNAMED       AL3067208833720050207  Initial       50210392004049  50210392005083  -970.0   -150.0   60.0   450.0    9999.0   2005.0     ...  OTHER  Southeast  Southern  03170008   Escatawpa            Revised  A        2013-08-31  RX         POINT (-88.337 30.672)

5 rows × 29 columns
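The column count drops from 30 to 29 because `set_index` moves FIRENAME out of the columns and into the index, and it returns a new frame rather than changing `df`. The sketch below shows this on four rows taken from the sample output, along with label-based lookup through `.loc` and the inverse operation `reset_index()`; the variable names are illustrative.

```python
import pandas as pd

fires = pd.DataFrame(
    {
        "FIRENAME": pd.Series(
            ["WITCH", "UNNAMED", "BAY FIRE", "UNNAMED"], dtype=object
        ),
        "FIRE_YEAR": [1999.0, 1998.0, 2005.0, 2005.0],
        "Fire_Type": pd.Series(["WF", "WF", "WF", "RX"], dtype=object),
    }
)

by_name = fires.set_index("FIRENAME")

# The original frame is untouched; the new one has one column fewer.
print(fires.shape, by_name.shape)

# Index labels need not be unique: this lookup returns every matching row.
unnamed = by_name.loc["UNNAMED"]
print(unnamed)

# reset_index() moves the index back into an ordinary column.
restored = by_name.reset_index()
print(restored.columns.tolist())
```

Since many fires share the name UNNAMED, FIRENAME is a non-unique index; a column such as FIRE_ID would likely be a better choice when each label should identify a single row.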