First, I load the dataset of Terrassa's districts and neighborhoods from Watson Studio.
import types
import pandas as pd
from botocore.client import Config
import ibm_boto3
def __iter__(self): return 0
# The code was removed by Watson Studio for sharing.
body = client_0c0bb06d086e4548b74bdc75a0202b31.get_object(Bucket='courseracapstone-donotdelete-pr-vgeqqkhebpzdpz',Key='terrassa_cp.csv')['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType( __iter__, body )
df = pd.read_csv(body)
df.head()
df.tail()
index | DISTRITO | NOMBRE | BARRIO | CP |
---|---|---|---|---|
37 | 6 | NORD-EST | CAN MONTLLOR | 8226 |
38 | 6 | NORD-EST | CAN TUSELL | 8226 |
39 | 6 | NORD-EST | ÈGARA | 8226 |
40 | 6 | NORD-EST | SANT PERE NORD | 8226 |
41 | 6 | NORD-EST | SANT LLORENÇ | 8226 |
I need coordinates for each neighborhood, so I import the Nominatim geocoder from geopy.
!pip install geopy
from geopy.geocoders import Nominatim # import geocoder
print('Geopy installed!')
Requirement already satisfied: geopy in /opt/conda/envs/Python-3.7-main/lib/python3.7/site-packages (2.0.0)
Requirement already satisfied: geographiclib<2,>=1.49 in /opt/conda/envs/Python-3.7-main/lib/python3.7/site-packages (from geopy) (1.50)
Geopy installed!
Next, I geocode each neighborhood and attach the latitude and longitude to the initial dataframe.
barrios = df['BARRIO']  # neighborhood names used as the geocoding query
geolocator = Nominatim(user_agent="trs_explorer")
lat1 = []
lng1 = []
for barrio in barrios:
    address = '{}, Terrassa'.format(barrio)
    location = geolocator.geocode(address)
    # geocode() returns None when nothing matches; store NaN so rows stay aligned
    lat1.append(location.latitude if location is not None else float('nan'))
    lng1.append(location.longitude if location is not None else float('nan'))
print(lat1, lng1)
[41.568122, 41.5616629, 41.563695, 41.566705400000004, 41.5698673, 41.5651401, 41.5576742, 41.5636816, 41.5591743, 41.5486406, 41.5558373, 41.5547426, 41.5384149, 41.528402150000005, 41.5529442, 41.5549277, 41.555971, 41.5610392, 41.5415984, 41.5536025, 41.5456455, 41.560813, 41.5580054, 41.5503669, 41.5700966, 41.5706224, 41.5734892, 41.5819269, 41.6030075, 41.592014, 41.5821582, 41.590154, 41.57606295, 41.5722124, 41.5765175, 41.5700474, 41.5735713, 41.5760017, 41.579434, 41.5704929, 41.5754692, 41.5779893] [2.0181104, 2.0198819, 2.011765, 2.0257736010366303, 2.0142622, 2.0310002, 2.0353672, 2.041227, 2.0390651, 2.0188198, 2.0233033, 2.0292697, 2.0422606, 2.033451695297567, 2.027317, 2.012964, 2.0338187, 2.0030872, 1.9864402, 2.0041857, 1.9974909, 1.9958264, 1.9914735, 1.998714, 2.0007054, 1.997071, 1.9786736, 2.0100153, 1.9587632, 2.0292331, 2.0146286, 2.0188661, 2.008812147844067, 2.0140207, 2.0023154, 2.0362315, 2.0392091, 2.0337363, 2.0187554, 2.0305142, 2.0233688, 2.0309005]
df['LATITUD']=lat1
df['LONGITUD']=lng1
df.head()
index | DISTRITO | NOMBRE | BARRIO | CP | LATITUD | LONGITUD |
---|---|---|---|---|---|---|
0 | 1 | CENTRE | ANTIC POBLE DE SANT PERE | 8221 | 41.568122 | 2.018110 |
1 | 1 | CENTRE | CEMENTIRI VELL | 8221 | 41.561663 | 2.019882 |
2 | 1 | CENTRE | CENTRE | 8221 | 41.563695 | 2.011765 |
3 | 1 | CENTRE | PLAÇA DE CATALUNYA / ESCOLA INDUSTRIAL | 8221 | 41.566705 | 2.025774 |
4 | 1 | CENTRE | VALLPARADÍS | 8221 | 41.569867 | 2.014262 |
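The geocoding step above can be sketched as a small reusable helper. This is a minimal sketch, not the notebook's actual code: `geocode_names`, `FakeLocation` and `fake_geocode` are hypothetical names introduced here, and the stand-in geocoder lets the example run offline without calling Nominatim.

```python
import math
from typing import Callable, Iterable, List, Optional, Tuple


def geocode_names(
    names: Iterable[str],
    geocode: Callable[[str], Optional[object]],
    suffix: str = "Terrassa",
) -> Tuple[List[float], List[float]]:
    """Return (latitudes, longitudes), using NaN where no match is found."""
    lats, lngs = [], []
    for name in names:
        location = geocode("{}, {}".format(name, suffix))
        if location is None:
            # Keep the lists the same length as the input names.
            lats.append(math.nan)
            lngs.append(math.nan)
        else:
            lats.append(location.latitude)
            lngs.append(location.longitude)
    return lats, lngs


# Stand-in for a geocoder result (hypothetical, for offline demonstration only).
class FakeLocation:
    def __init__(self, latitude, longitude):
        self.latitude = latitude
        self.longitude = longitude


def fake_geocode(query):
    # Coordinates for CENTRE are copied from the table above.
    known = {"CENTRE, Terrassa": FakeLocation(41.563695, 2.011765)}
    return known.get(query)


demo_lats, demo_lngs = geocode_names(["CENTRE", "NO SUCH PLACE"], fake_geocode)
```

In the notebook, the callable would be geolocator.geocode. geopy also provides geopy.extra.rate_limiter.RateLimiter, which can wrap that callable to space out requests to the public Nominatim service.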
With this dataframe, I want to find all the restaurants in Terrassa that are registered in Foursquare and plot them on a map, coloured by category.
#import required libraries
import numpy as np
import requests
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
!pip install folium
import folium
print('libraries ready!')
Requirement already satisfied: folium in /opt/conda/envs/Python-3.7-main/lib/python3.7/site-packages (0.12.0)
Requirement already satisfied: jinja2>=2.9 in /opt/conda/envs/Python-3.7-main/lib/python3.7/site-packages (from folium) (2.11.2)
Requirement already satisfied: branca>=0.3.0 in /opt/conda/envs/Python-3.7-main/lib/python3.7/site-packages (from folium) (0.4.2)
Requirement already satisfied: numpy in /opt/conda/envs/Python-3.7-main/lib/python3.7/site-packages (from folium) (1.18.5)
Requirement already satisfied: requests in /opt/conda/envs/Python-3.7-main/lib/python3.7/site-packages (from folium) (2.24.0)
Requirement already satisfied: MarkupSafe>=0.23 in /opt/conda/envs/Python-3.7-main/lib/python3.7/site-packages (from jinja2>=2.9->folium) (1.1.1)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /opt/conda/envs/Python-3.7-main/lib/python3.7/site-packages (from requests->folium) (1.25.9)
Requirement already satisfied: chardet<4,>=3.0.2 in /opt/conda/envs/Python-3.7-main/lib/python3.7/site-packages (from requests->folium) (3.0.4)
Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/envs/Python-3.7-main/lib/python3.7/site-packages (from requests->folium) (2020.12.5)
Requirement already satisfied: idna<3,>=2.5 in /opt/conda/envs/Python-3.7-main/lib/python3.7/site-packages (from requests->folium) (2.9)
libraries ready!
I need the location of Terrassa to center the following maps.
# Let's get the coordinates of Terrassa
address = 'Terrassa, Barcelona, Spain'
geolocator = Nominatim(user_agent="to_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Terrassa are {}, {}.'.format(latitude, longitude))
The geographical coordinates of Terrassa are 41.5629885, 2.0102442.
# create map of Terrassa using latitude and longitude values
map_terrassa = folium.Map(location=[latitude, longitude], zoom_start=12)
# add markers to map
for lat, lng, neigh in zip(df['LATITUD'], df['LONGITUD'], df['BARRIO']):
    label = '{}'.format(neigh)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=8,
        popup=label,
        color='purple',
        fill=True,
        fill_color='purple',
        fill_opacity=0.5,
        parse_html=False).add_to(map_terrassa)
map_terrassa
Let's get Foursquare ready to use:
# The code was removed by Watson Studio for sharing.
Now let's limit the number of venues returned and the maximum radius Foursquare searches around each neighborhood.
LIMIT=100
radius=500
def getNearbyVenues(names, latitudes, longitudes, radius=400):
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT)
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        # return only relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood',
                             'Neighborhood Latitude',
                             'Neighborhood Longitude',
                             'Venue',
                             'Venue Latitude',
                             'Venue Longitude',
                             'Venue Category']
    return nearby_venues
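The function indexes straight into the JSON response, so a neighborhood with no results or a venue without a category would raise an exception. The following is a minimal sketch of a more tolerant parser, assuming the same response shape that getNearbyVenues indexes (response, groups, items, venue). `extract_venues` and the sample payload are hypothetical and only illustrate the idea.

```python
from typing import Any, Dict, List, Tuple


def extract_venues(name: str, lat: float, lng: float,
                   payload: Dict[str, Any]) -> List[Tuple]:
    """Flatten one explore-style response into rows, tolerating missing pieces."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories") or []
        # Use None instead of failing when a venue has no category.
        category = categories[0].get("name") if categories else None
        rows.append((name, lat, lng, venue.get("name"),
                     location.get("lat"), location.get("lng"), category))
    return rows


# Hypothetical payload shaped like the one indexed in getNearbyVenues.
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Di Carlo",
               "location": {"lat": 41.569525, "lng": 2.018971},
               "categories": [{"name": "Pizza Place"}]}},
    {"venue": {"name": "Unnamed spot",
               "location": {"lat": 41.57, "lng": 2.02},
               "categories": []}},
]}]}}

rows = extract_venues("ANTIC POBLE DE SANT PERE", 41.568122, 2.01811, sample)
empty = extract_venues("NOWHERE", 0.0, 0.0, {"response": {}})
```

Each returned row has the same seven fields, in the same order, as the columns assigned to nearby_venues.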
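Now we call the function. Note that radius is not passed in this call, so the function's default of 400 m applies rather than the radius=500 set above.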
terrassa_venues = getNearbyVenues(names=df['BARRIO'],
latitudes=df['LATITUD'],
longitudes=df['LONGITUD']
)
ANTIC POBLE DE SANT PERE
CEMENTIRI VELL
CENTRE
PLAÇA DE CATALUNYA / ESCOLA INDUSTRIAL
VALLPARADÍS
CA N'ANGLADA
MONTSERRAT
TORRE-SANA
VILARDELL
CAN JOFRESA
CAN PALET
CAN PALET II
CAN PARELLADA
LES FONTS
GUADALHORCE
SEGLE XX
XÚQUER
CA N'AURELL
CAN PALET DE VISTA ALEGRE
LA COGULLADA
LES MARTINES
LA MAURINA
ROC BLANC
VISTA ALEGRE
CAN BOADA (NUCLI ANTIC)
CAN BOADA DEL PI
CAN GONTERES
CAN ROCA
ELS CAUS / ELS PINETONS
FONT DE L'ESPARDENYERA
PLA DEL BON AIRE
PLA DEL BON AIRE / EL GARROT
POBLE NOU / ZONA ESPORTIVA
SANT PERE
TORRENT D'EN PERE PARRES
LES ARENES
LA GRÍPIA
CAN MONTLLOR
CAN TUSELL
ÈGARA
SANT PERE NORD
SANT LLORENÇ
We now have all the venues for each neighborhood, including parks, squares, restaurants, etc. Only restaurants are needed, so I filter the data.
terrassa_venues.head()
index | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
---|---|---|---|---|---|---|---|
0 | ANTIC POBLE DE SANT PERE | 41.568122 | 2.01811 | La Passtiseria | 41.569552 | 2.015209 | Dessert Shop |
1 | ANTIC POBLE DE SANT PERE | 41.568122 | 2.01811 | Di Carlo | 41.569525 | 2.018971 | Pizza Place |
2 | ANTIC POBLE DE SANT PERE | 41.568122 | 2.01811 | Seu d'Ègara - Esglésies de Sant Pere | 41.565656 | 2.015822 | Historic Site |
3 | ANTIC POBLE DE SANT PERE | 41.568122 | 2.01811 | La Cuina d'en Brich | 41.567885 | 2.018343 | Paella Restaurant |
4 | ANTIC POBLE DE SANT PERE | 41.568122 | 2.01811 | Rostisseria Carme | 41.569014 | 2.021051 | Spanish Restaurant |
target = terrassa_venues[terrassa_venues['Venue Category'].str.contains('Restaurant')]
target
index | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
---|---|---|---|---|---|---|---|
3 | ANTIC POBLE DE SANT PERE | 41.568122 | 2.018110 | La Cuina d'en Brich | 41.567885 | 2.018343 | Paella Restaurant |
4 | ANTIC POBLE DE SANT PERE | 41.568122 | 2.018110 | Rostisseria Carme | 41.569014 | 2.021051 | Spanish Restaurant |
6 | ANTIC POBLE DE SANT PERE | 41.568122 | 2.018110 | Teik It | 41.569686 | 2.014270 | Sushi Restaurant |
8 | ANTIC POBLE DE SANT PERE | 41.568122 | 2.018110 | Restaurant Subit | 41.565655 | 2.021438 | Restaurant |
16 | CEMENTIRI VELL | 41.561663 | 2.019882 | La Kaña Bar | 41.559759 | 2.023489 | Spanish Restaurant |
... | ... | ... | ... | ... | ... | ... | ... |
292 | LA GRÍPIA | 41.573571 | 2.039209 | bar el malagueño | 41.571478 | 2.036649 | Spanish Restaurant |
297 | CAN TUSELL | 41.579434 | 2.018755 | La Cantonada de les Barriques | 41.578382 | 2.015616 | Spanish Restaurant |
298 | CAN TUSELL | 41.579434 | 2.018755 | D'Tapes | 41.576499 | 2.020324 | Fast Food Restaurant |
304 | ÈGARA | 41.570493 | 2.030514 | Restaurant Claret | 41.567243 | 2.029554 | Restaurant |
312 | SANT PERE NORD | 41.575469 | 2.023369 | D'Tapes | 41.576499 | 2.020324 | Fast Food Restaurant |
93 rows × 7 columns
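The filter keeps only categories whose name contains "Restaurant", so a venue such as Di Carlo (category "Pizza Place" in the table above) is dropped. Below is a minimal sketch of a wider filter. `is_food_venue` and its `extra_categories` default are hypothetical choices for illustration, not part of the original analysis.

```python
def is_food_venue(category, extra_categories=("Pizza Place",)):
    """True for categories containing 'Restaurant' or listed explicitly."""
    return "Restaurant" in category or category in extra_categories


# Categories taken from the first rows of terrassa_venues shown above.
sample_categories = ["Dessert Shop", "Pizza Place", "Historic Site",
                     "Paella Restaurant", "Spanish Restaurant"]
kept = [c for c in sample_categories if is_food_venue(c)]
```

On the dataframe, terrassa_venues['Venue Category'].apply(is_food_venue) would give the equivalent boolean mask. Widening the filter this way would change the row and category counts reported in this notebook.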
print('There are {} unique categories of restaurants.'.format(len(target['Venue Category'].unique())))
There are 15 unique categories of restaurants.
unicos = target['Venue Category'].unique()
df_unicos=pd.DataFrame(unicos, columns=['Restaurante'])
rangocolores=len(target['Venue Category'].unique())
# create map of the restaurants of Terrassa using latitude and longitude values
map_restaurants = folium.Map(location=[latitude, longitude], zoom_start=12)
#set color scheme for the clusters
x = np.arange(rangocolores)
ys = [i + x + (i*x)**2 for i in range(rangocolores)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]
# add markers to the map
markers_colors = []
for lat, lng, venu, barri in zip(target['Venue Latitude'], target['Venue Longitude'], target['Venue Category'], target['Neighborhood']):
    # exact comparison; str.match would treat the category name as a regular expression
    mo = df_unicos[df_unicos['Restaurante'] == venu].index[0]
    label = '{},{}'.format(venu, barri)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color=rainbow[mo],
        fill=True,
        fill_color=rainbow[mo],
        fill_opacity=0.7).add_to(map_restaurants)
map_restaurants
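Colouring markers by category comes down to a lookup from category name to colour. The sketch below builds that lookup as a plain dictionary, as an alternative to searching a dataframe on every iteration. `category_colors` is a hypothetical helper, and it uses the standard-library colorsys module instead of the matplotlib rainbow colormap used in the notebook.

```python
import colorsys


def category_colors(categories):
    """Map each distinct category to a hex color, evenly spaced around the hue wheel."""
    unique = sorted(set(categories))
    palette = {}
    for i, category in enumerate(unique):
        # Spread hues evenly; fixed saturation and brightness keep markers readable.
        r, g, b = colorsys.hsv_to_rgb(i / len(unique), 0.8, 0.9)
        palette[category] = "#{:02x}{:02x}{:02x}".format(
            int(r * 255), int(g * 255), int(b * 255))
    return palette


palette = category_colors(["Spanish Restaurant", "Restaurant", "Spanish Restaurant"])
```

Inside the marker loop, palette[venu] could then be used for the color and fill_color arguments.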
# one hot encoding
terrassa_onehot = pd.get_dummies(target[['Venue Category']], prefix='', prefix_sep='')
# add neighborhood column back to dataframe
terrassa_onehot['Neighborhood'] = target['Neighborhood']
# move neighborhood column to the first column
fixed_columns = [terrassa_onehot.columns[-1]] + list(terrassa_onehot.columns[:-1])
terrassa_onehot = terrassa_onehot[fixed_columns]
terrassa_onehot.head()
index | Neighborhood | Asian Restaurant | Chinese Restaurant | Falafel Restaurant | Fast Food Restaurant | Gluten-free Restaurant | Italian Restaurant | Japanese Restaurant | Mediterranean Restaurant | Mexican Restaurant | Paella Restaurant | Restaurant | Spanish Restaurant | Sushi Restaurant | Tapas Restaurant | Vegetarian / Vegan Restaurant |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3 | ANTIC POBLE DE SANT PERE | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
4 | ANTIC POBLE DE SANT PERE | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
6 | ANTIC POBLE DE SANT PERE | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
8 | ANTIC POBLE DE SANT PERE | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
16 | CEMENTIRI VELL | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
terrassa_onehot.shape
(93, 16)
terrassa_grouped = terrassa_onehot.groupby('Neighborhood').mean().reset_index()
terrassa_grouped
index | Neighborhood | Asian Restaurant | Chinese Restaurant | Falafel Restaurant | Fast Food Restaurant | Gluten-free Restaurant | Italian Restaurant | Japanese Restaurant | Mediterranean Restaurant | Mexican Restaurant | Paella Restaurant | Restaurant | Spanish Restaurant | Sushi Restaurant | Tapas Restaurant | Vegetarian / Vegan Restaurant |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | ANTIC POBLE DE SANT PERE | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.25 | 0.250000 | 0.250000 | 0.250000 | 0.000000 | 0.000000 |
1 | CA N'ANGLADA | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.00 | 0.333333 | 0.333333 | 0.000000 | 0.333333 | 0.000000 |
2 | CA N'AURELL | 0.0 | 0.166667 | 0.166667 | 0.000000 | 0.000000 | 0.333333 | 0.000000 | 0.333333 | 0.0 | 0.00 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
3 | CAN BOADA (NUCLI ANTIC) | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 0.0 | 0.00 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
4 | CAN GONTERES | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 0.0 | 0.00 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
5 | CAN PALET | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.00 | 0.666667 | 0.333333 | 0.000000 | 0.000000 | 0.000000 |
6 | CAN PALET DE VISTA ALEGRE | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 0.0 | 0.00 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
7 | CAN PALET II | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.00 | 0.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 |
8 | CAN ROCA | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.00 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
9 | CAN TUSELL | 0.0 | 0.000000 | 0.000000 | 0.500000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.00 | 0.000000 | 0.500000 | 0.000000 | 0.000000 | 0.000000 |
10 | CEMENTIRI VELL | 0.0 | 0.250000 | 0.125000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.125000 | 0.0 | 0.00 | 0.250000 | 0.125000 | 0.000000 | 0.125000 | 0.000000 |
11 | CENTRE | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.043478 | 0.173913 | 0.043478 | 0.391304 | 0.0 | 0.00 | 0.130435 | 0.000000 | 0.000000 | 0.173913 | 0.043478 |
12 | GUADALHORCE | 0.5 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.00 | 0.000000 | 0.500000 | 0.000000 | 0.000000 | 0.000000 |
13 | LA COGULLADA | 0.0 | 0.250000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.500000 | 0.0 | 0.00 | 0.000000 | 0.250000 | 0.000000 | 0.000000 | 0.000000 |
14 | LA GRÍPIA | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.00 | 0.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 |
15 | LA MAURINA | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.00 | 0.500000 | 0.500000 | 0.000000 | 0.000000 | 0.000000 |
16 | LES FONTS | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 0.0 | 0.00 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
17 | MONTSERRAT | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.0 | 0.00 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
18 | PLAÇA DE CATALUNYA / ESCOLA INDUSTRIAL | 0.0 | 0.000000 | 0.000000 | 0.166667 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.00 | 0.333333 | 0.333333 | 0.000000 | 0.166667 | 0.000000 |
19 | POBLE NOU / ZONA ESPORTIVA | 0.0 | 0.000000 | 0.333333 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.00 | 0.333333 | 0.000000 | 0.000000 | 0.333333 | 0.000000 |
20 | ROC BLANC | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.00 | 0.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 |
21 | SANT PERE | 0.0 | 0.000000 | 0.166667 | 0.166667 | 0.000000 | 0.000000 | 0.000000 | 0.166667 | 0.0 | 0.00 | 0.166667 | 0.166667 | 0.166667 | 0.000000 | 0.000000 |
22 | SANT PERE NORD | 0.0 | 0.000000 | 0.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.00 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
23 | SEGLE XX | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.500000 | 0.000000 | 0.0 | 0.00 | 0.000000 | 0.000000 | 0.000000 | 0.500000 | 0.000000 |
24 | TORRENT D'EN PERE PARRES | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.00 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
25 | VALLPARADÍS | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.333333 | 0.0 | 0.00 | 0.333333 | 0.000000 | 0.333333 | 0.000000 | 0.000000 |
26 | VILARDELL | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.0 | 0.00 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
27 | VISTA ALEGRE | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.00 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
28 | XÚQUER | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.0 | 0.00 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
29 | ÈGARA | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.00 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
terrassa_grouped.shape
(30, 16)
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]
num_top_venues = 3
indicators = ['st', 'nd', 'rd']
# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))
# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = terrassa_grouped['Neighborhood']
for ind in np.arange(terrassa_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(terrassa_grouped.iloc[ind, :], num_top_venues)
neighborhoods_venues_sorted
index | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue |
---|---|---|---|---|
0 | ANTIC POBLE DE SANT PERE | Sushi Restaurant | Spanish Restaurant | Restaurant |
1 | CA N'ANGLADA | Tapas Restaurant | Spanish Restaurant | Restaurant |
2 | CA N'AURELL | Mediterranean Restaurant | Italian Restaurant | Falafel Restaurant |
3 | CAN BOADA (NUCLI ANTIC) | Mediterranean Restaurant | Vegetarian / Vegan Restaurant | Tapas Restaurant |
4 | CAN GONTERES | Mediterranean Restaurant | Vegetarian / Vegan Restaurant | Tapas Restaurant |
5 | CAN PALET | Restaurant | Spanish Restaurant | Vegetarian / Vegan Restaurant |
6 | CAN PALET DE VISTA ALEGRE | Mediterranean Restaurant | Vegetarian / Vegan Restaurant | Tapas Restaurant |
7 | CAN PALET II | Spanish Restaurant | Vegetarian / Vegan Restaurant | Tapas Restaurant |
8 | CAN ROCA | Restaurant | Vegetarian / Vegan Restaurant | Tapas Restaurant |
9 | CAN TUSELL | Spanish Restaurant | Fast Food Restaurant | Vegetarian / Vegan Restaurant |
10 | CEMENTIRI VELL | Restaurant | Chinese Restaurant | Tapas Restaurant |
11 | CENTRE | Mediterranean Restaurant | Tapas Restaurant | Italian Restaurant |
12 | GUADALHORCE | Spanish Restaurant | Asian Restaurant | Vegetarian / Vegan Restaurant |
13 | LA COGULLADA | Mediterranean Restaurant | Spanish Restaurant | Chinese Restaurant |
14 | LA GRÍPIA | Spanish Restaurant | Vegetarian / Vegan Restaurant | Tapas Restaurant |
15 | LA MAURINA | Spanish Restaurant | Restaurant | Vegetarian / Vegan Restaurant |
16 | LES FONTS | Mediterranean Restaurant | Vegetarian / Vegan Restaurant | Tapas Restaurant |
17 | MONTSERRAT | Mexican Restaurant | Vegetarian / Vegan Restaurant | Tapas Restaurant |
18 | PLAÇA DE CATALUNYA / ESCOLA INDUSTRIAL | Spanish Restaurant | Restaurant | Tapas Restaurant |
19 | POBLE NOU / ZONA ESPORTIVA | Tapas Restaurant | Restaurant | Falafel Restaurant |
20 | ROC BLANC | Spanish Restaurant | Vegetarian / Vegan Restaurant | Tapas Restaurant |
21 | SANT PERE | Sushi Restaurant | Spanish Restaurant | Restaurant |
22 | SANT PERE NORD | Fast Food Restaurant | Vegetarian / Vegan Restaurant | Tapas Restaurant |
23 | SEGLE XX | Tapas Restaurant | Japanese Restaurant | Vegetarian / Vegan Restaurant |
24 | TORRENT D'EN PERE PARRES | Restaurant | Vegetarian / Vegan Restaurant | Tapas Restaurant |
25 | VALLPARADÍS | Sushi Restaurant | Restaurant | Mediterranean Restaurant |
26 | VILARDELL | Mexican Restaurant | Vegetarian / Vegan Restaurant | Tapas Restaurant |
27 | VISTA ALEGRE | Restaurant | Vegetarian / Vegan Restaurant | Tapas Restaurant |
28 | XÚQUER | Mexican Restaurant | Vegetarian / Vegan Restaurant | Tapas Restaurant |
29 | ÈGARA | Restaurant | Vegetarian / Vegan Restaurant | Tapas Restaurant |
To group neighborhoods with a similar mix of restaurants, I use k-means clustering.
# set number of clusters
kclusters = 5
terrassa_grouped_clustering = terrassa_grouped.drop(columns='Neighborhood')
# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(terrassa_grouped_clustering)
# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]
array([4, 4, 4, 0, 0, 2, 0, 1, 2, 1], dtype=int32)
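The number of clusters is fixed at 5 here. One common way to sanity-check such a choice is to compare mean silhouette scores across several values of k. The following is a sketch on synthetic data, not on the Terrassa table: `silhouette_by_k` is a hypothetical helper, and the toy points are invented so that the expected answer is obvious.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def silhouette_by_k(features, k_values, random_state=0):
    """Fit k-means for each k and return {k: mean silhouette score}."""
    scores = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = model.fit_predict(features)
        scores[k] = silhouette_score(features, labels)
    return scores


# Synthetic data (not the Terrassa table): three tight, well-separated groups.
offsets = np.array([[dx, dy] for dx in (-0.5, 0.0, 0.5) for dy in (-0.5, 0.0, 0.5)])
centers = np.array([[0.0, 0.0], [10.0, 10.0], [20.0, 0.0]])
toy = np.vstack([center + offsets for center in centers])

scores = silhouette_by_k(toy, range(2, 6))
best_k = max(scores, key=scores.get)
```

In the notebook, features would be terrassa_grouped_clustering. With only 30 neighborhoods and many identical rows, the scores may be noisy, so they are a guide rather than a definitive answer.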
Next, I create a dataframe with the three most common restaurant categories and the cluster label for each neighborhood.
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
terrassa_merged = df
# join the cluster labels and top categories onto df, which holds latitude/longitude for each neighborhood
terrassa_merged = terrassa_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='BARRIO')
terrassa_merged.head() # check the last columns!
index | DISTRITO | NOMBRE | BARRIO | CP | LATITUD | LONGITUD | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue |
---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | CENTRE | ANTIC POBLE DE SANT PERE | 8221 | 41.568122 | 2.018110 | 4.0 | Sushi Restaurant | Spanish Restaurant | Restaurant |
1 | 1 | CENTRE | CEMENTIRI VELL | 8221 | 41.561663 | 2.019882 | 4.0 | Restaurant | Chinese Restaurant | Tapas Restaurant |
2 | 1 | CENTRE | CENTRE | 8221 | 41.563695 | 2.011765 | 4.0 | Mediterranean Restaurant | Tapas Restaurant | Italian Restaurant |
3 | 1 | CENTRE | PLAÇA DE CATALUNYA / ESCOLA INDUSTRIAL | 8221 | 41.566705 | 2.025774 | 4.0 | Spanish Restaurant | Restaurant | Tapas Restaurant |
4 | 1 | CENTRE | VALLPARADÍS | 8221 | 41.569867 | 2.014262 | 4.0 | Sushi Restaurant | Restaurant | Mediterranean Restaurant |
Now let's check the resulting clusters
# copy() makes terrassa an independent dataframe, which avoids pandas' SettingWithCopyWarning
terrassa = terrassa_merged.dropna(how='any', axis=0).copy()
terrassa['Cluster Labels']=terrassa['Cluster Labels'].astype(int)
terrassa.head()
index | DISTRITO | NOMBRE | BARRIO | CP | LATITUD | LONGITUD | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue |
---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | CENTRE | ANTIC POBLE DE SANT PERE | 8221 | 41.568122 | 2.018110 | 4 | Sushi Restaurant | Spanish Restaurant | Restaurant |
1 | 1 | CENTRE | CEMENTIRI VELL | 8221 | 41.561663 | 2.019882 | 4 | Restaurant | Chinese Restaurant | Tapas Restaurant |
2 | 1 | CENTRE | CENTRE | 8221 | 41.563695 | 2.011765 | 4 | Mediterranean Restaurant | Tapas Restaurant | Italian Restaurant |
3 | 1 | CENTRE | PLAÇA DE CATALUNYA / ESCOLA INDUSTRIAL | 8221 | 41.566705 | 2.025774 | 4 | Spanish Restaurant | Restaurant | Tapas Restaurant |
4 | 1 | CENTRE | VALLPARADÍS | 8221 | 41.569867 | 2.014262 | 4 | Sushi Restaurant | Restaurant | Mediterranean Restaurant |
The idea is to group neighborhoods by their restaurant mix: k-means runs on the relative frequency of each restaurant category, and the three most common categories summarize each neighborhood. On the map, the outskirts of the city share the same kind of restaurants (the cluster shown in red). Many neighborhoods in and around the city center also fall into a single cluster, probably because of the large number of restaurants there.
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=12)
# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]
# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(terrassa['LATITUD'], terrassa['LONGITUD'], terrassa['BARRIO'], terrassa['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=8,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
map_clusters
Cluster 1
terrassa.loc[terrassa['Cluster Labels'] == 0, terrassa.columns[[2] + list(range(6, terrassa.shape[1]))]]
index | BARRIO | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue |
---|---|---|---|---|---|
13 | LES FONTS | 0 | Mediterranean Restaurant | Vegetarian / Vegan Restaurant | Tapas Restaurant |
18 | CAN PALET DE VISTA ALEGRE | 0 | Mediterranean Restaurant | Vegetarian / Vegan Restaurant | Tapas Restaurant |
24 | CAN BOADA (NUCLI ANTIC) | 0 | Mediterranean Restaurant | Vegetarian / Vegan Restaurant | Tapas Restaurant |
26 | CAN GONTERES | 0 | Mediterranean Restaurant | Vegetarian / Vegan Restaurant | Tapas Restaurant |
Cluster 2
terrassa.loc[terrassa['Cluster Labels'] == 1, terrassa.columns[[2] + list(range(6, terrassa.shape[1]))]]
index | BARRIO | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue |
---|---|---|---|---|---|
11 | CAN PALET II | 1 | Spanish Restaurant | Vegetarian / Vegan Restaurant | Tapas Restaurant |
14 | GUADALHORCE | 1 | Spanish Restaurant | Asian Restaurant | Vegetarian / Vegan Restaurant |
21 | LA MAURINA | 1 | Spanish Restaurant | Restaurant | Vegetarian / Vegan Restaurant |
22 | ROC BLANC | 1 | Spanish Restaurant | Vegetarian / Vegan Restaurant | Tapas Restaurant |
36 | LA GRÍPIA | 1 | Spanish Restaurant | Vegetarian / Vegan Restaurant | Tapas Restaurant |
38 | CAN TUSELL | 1 | Spanish Restaurant | Fast Food Restaurant | Vegetarian / Vegan Restaurant |
Cluster 3
terrassa.loc[terrassa['Cluster Labels'] == 2, terrassa.columns[[2] + list(range(6, terrassa.shape[1]))]]
index | BARRIO | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue |
---|---|---|---|---|---|
10 | CAN PALET | 2 | Restaurant | Spanish Restaurant | Vegetarian / Vegan Restaurant |
23 | VISTA ALEGRE | 2 | Restaurant | Vegetarian / Vegan Restaurant | Tapas Restaurant |
27 | CAN ROCA | 2 | Restaurant | Vegetarian / Vegan Restaurant | Tapas Restaurant |
34 | TORRENT D'EN PERE PARRES | 2 | Restaurant | Vegetarian / Vegan Restaurant | Tapas Restaurant |
39 | ÈGARA | 2 | Restaurant | Vegetarian / Vegan Restaurant | Tapas Restaurant |
Cluster 4
terrassa.loc[terrassa['Cluster Labels'] == 3, terrassa.columns[[2] + list(range(6, terrassa.shape[1]))]]
index | BARRIO | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue |
---|---|---|---|---|---|
6 | MONTSERRAT | 3 | Mexican Restaurant | Vegetarian / Vegan Restaurant | Tapas Restaurant |
8 | VILARDELL | 3 | Mexican Restaurant | Vegetarian / Vegan Restaurant | Tapas Restaurant |
16 | XÚQUER | 3 | Mexican Restaurant | Vegetarian / Vegan Restaurant | Tapas Restaurant |
Cluster 5
terrassa.loc[terrassa['Cluster Labels'] == 4, terrassa.columns[[2] + list(range(6, terrassa.shape[1]))]]
index | BARRIO | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue |
---|---|---|---|---|---|
0 | ANTIC POBLE DE SANT PERE | 4 | Sushi Restaurant | Spanish Restaurant | Restaurant |
1 | CEMENTIRI VELL | 4 | Restaurant | Chinese Restaurant | Tapas Restaurant |
2 | CENTRE | 4 | Mediterranean Restaurant | Tapas Restaurant | Italian Restaurant |
3 | PLAÇA DE CATALUNYA / ESCOLA INDUSTRIAL | 4 | Spanish Restaurant | Restaurant | Tapas Restaurant |
4 | VALLPARADÍS | 4 | Sushi Restaurant | Restaurant | Mediterranean Restaurant |
5 | CA N'ANGLADA | 4 | Tapas Restaurant | Spanish Restaurant | Restaurant |
15 | SEGLE XX | 4 | Tapas Restaurant | Japanese Restaurant | Vegetarian / Vegan Restaurant |
17 | CA N'AURELL | 4 | Mediterranean Restaurant | Italian Restaurant | Falafel Restaurant |
19 | LA COGULLADA | 4 | Mediterranean Restaurant | Spanish Restaurant | Chinese Restaurant |
32 | POBLE NOU / ZONA ESPORTIVA | 4 | Tapas Restaurant | Restaurant | Falafel Restaurant |
33 | SANT PERE | 4 | Sushi Restaurant | Spanish Restaurant | Restaurant |
40 | SANT PERE NORD | 4 | Fast Food Restaurant | Vegetarian / Vegan Restaurant | Tapas Restaurant |
Combining the map of all restaurants with the clusters of neighborhoods, an interested person can better choose where to open a restaurant and of which kind. Some caveats apply. Foursquare is not fully up to date in small cities because it has few users there, so venues may be missing. The naming of venue categories is also inconsistent; in this notebook, for example, pizza places were ignored because their category name does not contain the word "Restaurant". Finally, the statistics are distorted where a neighborhood has few restaurants: if there is only one restaurant, the 2nd and 3rd most common categories are filled with meaningless values.
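The last caveat arises because the ranking sorts every category, including those with a frequency of zero. Below is a minimal sketch of a ranking that pads with a placeholder instead. `top_categories` and its `placeholder` argument are hypothetical names, and the function is an illustration rather than a change to the results reported above.

```python
def top_categories(frequencies, n=3, placeholder=None):
    """Return the n most frequent categories, padding with placeholder
    instead of listing categories whose frequency is zero."""
    present = [(cat, freq) for cat, freq in frequencies.items() if freq > 0]
    # sort() is stable, so ties keep their original order.
    present.sort(key=lambda pair: pair[1], reverse=True)
    top = [cat for cat, _ in present[:n]]
    return top + [placeholder] * (n - len(top))


# A neighborhood with a single restaurant category, as for MONTSERRAT in terrassa_grouped.
single = {"Mexican Restaurant": 1.0, "Tapas Restaurant": 0.0,
          "Vegetarian / Vegan Restaurant": 0.0}
result = top_categories(single)
```

The function only relies on an items() method, so it should accept either a plain dictionary or a pandas Series such as row.iloc[1:] from terrassa_grouped.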