fast.ai map of p1v3 participants

In [1]:
import numpy as np
import pandas as pd

#put the path to your data files here
#PATH = '...'
In [2]:
df = pd.read_csv(f'{PATH}fastaip1v3.csv')
df.head()
Out[2]:
lat long
0 -44 173
1 -43 147
2 -39 -73
3 -38 145
4 -38 145
In [3]:
df = df.groupby(['lat','long']).agg({'lat':'count'})
df.rename(columns={'lat': 'count'}, inplace=True)
df.reset_index(level=['lat', 'long'], inplace=True)
df.head()
Out[3]:
lat long count
0 -44 173 1
1 -43 147 1
2 -39 -73 1
3 -38 145 29
4 -35 -59 1
In [4]:
df.shape, df['count'].sum()
Out[4]:
((491, 3), 2721)

World Cities Database downloaded from kaggle

https://www.kaggle.com/max-mind/world-cities-database

In [5]:
df_places = pd.read_csv(f'{PATH}worldcitiespop.csv', low_memory=False)
df_places.head()
Out[5]:
Country City AccentCity Region Population Latitude Longitude
0 ad aixas Aixàs 06 NaN 42.483333 1.466667
1 ad aixirivali Aixirivali 06 NaN 42.466667 1.500000
2 ad aixirivall Aixirivall 06 NaN 42.466667 1.500000
3 ad aixirvall Aixirvall 06 NaN 42.466667 1.500000
4 ad aixovall Aixovall 06 NaN 42.466667 1.483333
In [6]:
df_places['lat'] = df_places.Latitude.round()
df_places['long'] = df_places.Longitude.round()
df_places.head()
Out[6]:
Country City AccentCity Region Population Latitude Longitude lat long
0 ad aixas Aixàs 06 NaN 42.483333 1.466667 42.0 1.0
1 ad aixirivali Aixirivali 06 NaN 42.466667 1.500000 42.0 2.0
2 ad aixirivall Aixirivall 06 NaN 42.466667 1.500000 42.0 2.0
3 ad aixirvall Aixirvall 06 NaN 42.466667 1.500000 42.0 2.0
4 ad aixovall Aixovall 06 NaN 42.466667 1.483333 42.0 1.0
In [ ]:
#hack (see below)
# for checking manual addition of Country	City	AccentCity where
# there are no population figures
tmp = df_places.loc[df_places['lat'] == 24.0]
tmp = tmp.loc[df_places['long'] == 40.0]
tmp.head()

Many places have the same lat and long. Select the place with the highest population.

In [8]:
idx = df_places.groupby(['lat','long'])['Population'].transform(max) == df_places['Population']
df_places = df_places[idx]
df_places.head()
Out[8]:
Country City AccentCity Region Population Latitude Longitude lat long
93 ae abu dhabi Abu Dhabi 01 603687.0 24.466667 54.366667 24.0 54.0
242 ae dubai Dubai 03 1137376.0 25.258172 55.304717 25.0 55.0
4547 af asadabad Asadabad 34 48400.0 34.873113 71.146970 35.0 71.0
5410 af aybak Aybak 32 47877.0 36.264680 68.015512 36.0 68.0
6644 af baglan Baglan 03 108481.0 36.130684 68.708286 36.0 69.0
In [9]:
df_map = df.merge(df_places, how='left', on=['lat','long'])
df_map=df_map[['lat','long','count','Country','AccentCity','Region']]
df_map.head()
Out[9]:
lat long count Country AccentCity Region
0 -44 173 1 nz Christchurch E9
1 -43 147 1 au Hobart 06
2 -39 -73 1 cl Temuco 04
3 -38 145 29 au Melbourne 07
4 -35 -59 1 ar Lujan 01
In [10]:
df_map.shape, df_map['count'].sum() #fails sanity check
Out[10]:
((496, 6), 2801)

Problem with repeated values of lat long count

In [11]:
for i in range (1, df_map.shape[0]):
    if (df_map.loc[i,['lat','long', 'count']] == df_map.loc[i-1,['lat','long', 'count']]).all():
        print (i)
109
149
241
444
490
In [12]:
df_map.iloc[108:110]
Out[12]:
lat long count Country AccentCity Region
108 21 106 4 vn Hanoi 44
109 21 106 4 vn Ha Noi 44
In [13]:
df_map.drop(df_map.index[109], inplace=True)
In [14]:
df_map.iloc[147:149]
Out[14]:
lat long count Country AccentCity Region
148 29 77 67 in Delhi 07
149 29 77 67 in New Delhi 07
In [15]:
df_map.drop(df_map.index[148], inplace=True)
In [16]:
df_map.iloc[238:240]
Out[16]:
lat long count Country AccentCity Region
240 39 -95 1 us Kansas City KS
241 39 -95 1 us Kansas City MO
In [17]:
df_map.drop(df_map.index[239], inplace=True)
In [18]:
df_map.iloc[440:442]
Out[18]:
lat long count Country AccentCity Region
443 53 -1 7 gb Leicester H4
444 53 -1 7 gb Leicester H5
In [19]:
df_map.drop(df_map.index[441], inplace=True)
In [20]:
df_map.iloc[485:487]
Out[20]:
lat long count Country AccentCity Region
489 60 18 1 se Uppsala 21
490 60 18 1 se Uppsala 26
In [21]:
df_map.drop(df_map.index[486], inplace=True)
In [22]:
df_map.shape, df_map['count'].sum() #passes sanity check
Out[22]:
((491, 6), 2721)

Some places had no Population values, all NaNs

In [23]:
df_map.iloc[df_map.isnull().any(1).nonzero()[0]]
Out[23]:
lat long count Country AccentCity Region
122 24 40 1 NaN NaN NaN
172 31 122 1 NaN NaN NaN
186 33 77 2 NaN NaN NaN
200 35 127 1 NaN NaN NaN
215 36 108 1 NaN NaN NaN
216 36 127 1 NaN NaN NaN
217 36 129 1 NaN NaN NaN
234 38 0 2 NaN NaN NaN
249 39 39 1 NaN NaN NaN
354 47 -89 1 NaN NaN NaN

Manually allocate Country AccentCity Region by looking at

https://www.findlatitudeandlongitude.com

and looking at data from df_places at the #hack above

122 24 40 sa Medina 05

172 31 122 cn Shanghai 23

186 33 77 in Darcha 11

200 35 127 kr Gwangju 16

215 36 108 cn Qingyang 15

216 36 127 kr Jeonju 17

217 36 129 kr Yeongcheon 14

234 38 0 es San Juan de Alicante 60

249 39 39 tr Elazig 62

354 47 -89 us Houghton MI

In [24]:
df_map.loc[122, ['Country','AccentCity','Region']]=['sa','Medina','05']
df_map.loc[172, ['Country','AccentCity','Region']]=['cn','Shanghai','23']
df_map.loc[186, ['Country','AccentCity','Region']]=['in','Darcha','11']
df_map.loc[200, ['Country','AccentCity','Region']]=['kr','Gwangju','16']
df_map.loc[215, ['Country','AccentCity','Region']]=['cn','Qingyang','15']
df_map.loc[216, ['Country','AccentCity','Region']]=['kr','Jeonju','17']
df_map.loc[217, ['Country','AccentCity','Region']]=['kr','Yeongcheon','14']
df_map.loc[234, ['Country','AccentCity','Region']]=['es', 'San Juan de Alicante', '60']
df_map.loc[249, ['Country','AccentCity','Region']]=['tr', 'Elazig', '62']
df_map.loc[354, ['Country','AccentCity','Region']]=['us', 'Houghton', 'MI']
df_map.loc[122]
Out[24]:
lat               24
long              40
count              1
Country           sa
AccentCity    Medina
Region            05
Name: 122, dtype: object

Convert two-letter country codes to upper case

In [25]:
df_map['Country'] = df_map['Country'].apply(lambda x: x.upper())
df_map.head()
Out[25]:
lat long count Country AccentCity Region
0 -44 173 1 NZ Christchurch E9
1 -43 147 1 AU Hobart 06
2 -39 -73 1 CL Temuco 04
3 -38 145 29 AU Melbourne 07
4 -35 -59 1 AR Lujan 01

ISO countries and regional codes downloaded from github

https://github.com/lukes/ISO-3166-Countries-with-Regional-Codes

In [26]:
df_iso = pd.read_csv(f'{PATH}all.csv')
df_iso.head()
Out[26]:
name alpha-2 alpha-3 country-code iso_3166-2 region sub-region intermediate-region region-code sub-region-code intermediate-region-code
0 Afghanistan AF AFG 4 ISO 3166-2:AF Asia Southern Asia NaN 142.0 34.0 NaN
1 Åland Islands AX ALA 248 ISO 3166-2:AX Europe Northern Europe NaN 150.0 154.0 NaN
2 Albania AL ALB 8 ISO 3166-2:AL Europe Southern Europe NaN 150.0 39.0 NaN
3 Algeria DZ DZA 12 ISO 3166-2:DZ Africa Northern Africa NaN 2.0 15.0 NaN
4 American Samoa AS ASM 16 ISO 3166-2:AS Oceania Polynesia NaN 9.0 61.0 NaN
In [27]:
df_iso.rename(columns = {'alpha-2':'Country'}, inplace=True)
df_map = df_map.merge(df_iso, how='left', on=['Country'])
df_map.head()
Out[27]:
lat long count Country AccentCity Region name alpha-3 country-code iso_3166-2 region sub-region intermediate-region region-code sub-region-code intermediate-region-code
0 -44 173 1 NZ Christchurch E9 New Zealand NZL 554 ISO 3166-2:NZ Oceania Australia and New Zealand NaN 9.0 53.0 NaN
1 -43 147 1 AU Hobart 06 Australia AUS 36 ISO 3166-2:AU Oceania Australia and New Zealand NaN 9.0 53.0 NaN
2 -39 -73 1 CL Temuco 04 Chile CHL 152 ISO 3166-2:CL Americas Latin America and the Caribbean South America 19.0 419.0 5.0
3 -38 145 29 AU Melbourne 07 Australia AUS 36 ISO 3166-2:AU Oceania Australia and New Zealand NaN 9.0 53.0 NaN
4 -35 -59 1 AR Lujan 01 Argentina ARG 32 ISO 3166-2:AR Americas Latin America and the Caribbean South America 19.0 419.0 5.0
In [28]:
df_map.groupby('sub-region').agg({'count':'sum'}).sort_values('count', ascending=False)
Out[28]:
count
sub-region
Northern America 807
Southern Asia 730
Western Europe 223
Eastern Europe 168
Northern Europe 164
South-eastern Asia 126
Sub-Saharan Africa 93
Latin America and the Caribbean 84
Australia and New Zealand 81
Southern Europe 79
Eastern Asia 69
Northern Africa 47
Western Asia 45
Central Asia 5
In [30]:
df_map.shape, df_map['count'].sum() #passes sanity check
Out[30]:
((491, 16), 2721)
In [29]:
df_map.to_csv('map.csv')

Steps in Tableau

  1. ‘Connect’ to a ‘Text file’ map.csv
  2. Go to worksheet
  3. In ‘Measures’ click on # symbol left of ‘Long’, set ‘Geographic role’ to ‘Longitude’
  4. Drag ‘Lat’ and ‘Long’ to ‘Dimensions’
  5. Drag ‘Lat’ and ‘Long’ to ‘Columns’ and ‘Rows’
  6. Click on world map on the right
  7. Drag to ‘Marks’ : ‘Count’, ‘Accent City’, ‘Country’, ‘Country-Code’, ‘Name’, ‘Sub-Region’
  8. Drag ‘Sub-Region’ to ‘Color’ box
  9. Drag ‘SUM(Count)’ to ‘Size’ box, adjust with slider