An investor is looking to open a new restaurant in San Francisco but is not sure about the best location for the venue and needs input to make the decision. San Francisco is a busy city known for its business innovation and several famous tourist attractions. So while opening a restaurant there looks promising, the location must be picked carefully in order to maximize profit. According to an analysis in FSR Magazine, the 8 factors for choosing a new restaurant location are
In this capstone project, we will use the Foursquare API to address at least some of these considerations.
Because of limited dataset availability, we will not address all of the factors listed above. However, we will work on some of the most important ones, such as visibility, parking, crime rates, and affordability. We will use the following datasets and tools.
Static datasets:
Search engines:
In this section, we are going to explore the San Francisco crime and housing datasets and address two of the most important factors discussed in the Introduction. Then, using the Foursquare API, we will explore the neighborhoods of San Francisco. The neighborhoods will be clustered using the $k$-means algorithm. The combined results will give us insight into possible locations for opening a new restaurant.
import numpy as np
import pandas as pd
import geopandas as gpd
import folium
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors
import pysal as ps
import requests
from pandas.io.json import json_normalize
from geopandas.tools import sjoin
from geopandas import GeoDataFrame
from geopy.geocoders import Nominatim
from folium.plugins import FastMarkerCluster
from shapely.geometry import Point
from sklearn.cluster import KMeans
#from branca.utilities import split_six
%matplotlib inline
Read in the San Francisco Police Department Incident Reports and perform an initial check.
df_crime = pd.read_csv("./Police_Department_Incident_Reports__2018_to_Present.csv")
df_crime.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 111531 entries, 0 to 111530
Data columns (total 26 columns):
Incident Datetime 111531 non-null object
Incident Date 111531 non-null object
Incident Time 111531 non-null object
Incident Year 111531 non-null int64
Incident Day of Week 111531 non-null object
Report Datetime 111531 non-null object
Row ID 111531 non-null int64
Incident ID 111531 non-null int64
Incident Number 111531 non-null int64
CAD Number 86415 non-null float64
Report Type Code 111531 non-null object
Report Type Description 111531 non-null object
Filed Online 23641 non-null object
Incident Code 111531 non-null int64
Incident Category 111520 non-null object
Incident Subcategory 111520 non-null object
Incident Description 111531 non-null object
Resolution 111531 non-null object
Intersection 105956 non-null object
CNN 105956 non-null float64
Police District 111531 non-null object
Analysis Neighborhood 105913 non-null object
Supervisor District 105956 non-null float64
Latitude 105956 non-null float64
Longitude 105956 non-null float64
point 105956 non-null object
dtypes: float64(5), int64(5), object(16)
memory usage: 22.1+ MB
First five rows of the dataset.
pd.set_option('display.max_columns', 100)
df_crime.head()
| | Incident Datetime | Incident Date | Incident Time | Incident Year | Incident Day of Week | Report Datetime | Row ID | Incident ID | Incident Number | CAD Number | Report Type Code | Report Type Description | Filed Online | Incident Code | Incident Category | Incident Subcategory | Incident Description | Resolution | Intersection | CNN | Police District | Analysis Neighborhood | Supervisor District | Latitude | Longitude | point |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2018/01/01 01:30:00 AM | 2018/01/01 | 01:30 | 2018 | Monday | 2018/01/01 02:13:00 AM | 61870203073 | 618702 | 180000263 | 180010563.0 | II | Initial | NaN | 3073 | Robbery | Robbery - Other | Robbery, W/ Other Weapon | Open or Active | JUSTIN DR \ COLLEGE AVE | 21236000.0 | Ingleside | Bernal Heights | 9.0 | 37.732261 | -122.423486 | (37.732261252752224, -122.42348641495892) |
1 | 2018/01/01 01:59:00 AM | 2018/01/01 | 01:59 | 2018 | Monday | 2018/01/01 01:59:00 AM | 61870768000 | 618707 | 180000326 | 180010504.0 | II | Initial | NaN | 68000 | Fire Report | Fire Report | Fire Report | Open or Active | 16TH ST \ MISSION ST | 24170000.0 | Mission | Mission | 9.0 | 37.765051 | -122.419669 | (37.76505133632968, -122.41966897380142) |
2 | 2018/01/01 02:28:00 AM | 2018/01/01 | 02:28 | 2018 | Monday | 2018/01/01 02:31:00 AM | 61870904134 | 618709 | 180000348 | 180010636.0 | II | Initial | NaN | 4134 | Assault | Simple Assault | Battery | Open or Active | 03RD ST \ PERRY ST | 20657000.0 | Southern | South of Market | 6.0 | 37.782119 | -122.396841 | (37.78211912156566, -122.39684142850209) |
3 | 2018/01/01 02:28:00 AM | 2018/01/01 | 02:28 | 2018 | Monday | 2018/01/01 02:31:00 AM | 61870928160 | 618709 | 180000348 | 180010636.0 | II | Initial | NaN | 28160 | Malicious Mischief | Vandalism | Malicious Mischief, Vandalism to Vehicle | Open or Active | 03RD ST \ PERRY ST | 20657000.0 | Southern | South of Market | 6.0 | 37.782119 | -122.396841 | (37.78211912156566, -122.39684142850209) |
4 | 2018/01/01 02:08:00 AM | 2018/01/01 | 02:08 | 2018 | Monday | 2018/01/01 02:08:00 AM | 61871004014 | 618710 | 180000285 | 180010537.0 | II | Initial | NaN | 4014 | Assault | Aggravated Assault | Assault, Aggravated, W/ Force | Cite or Arrest Adult | CESAR CHAVEZ ST \ CAPP ST \ MISSION ST | 21304000.0 | Mission | Bernal Heights | 9.0 | 37.748166 | -122.418221 | (37.74816568813204, -122.41822117169174) |
The most important columns are Incident Category, Latitude, Longitude, and time stamps. We remove columns that are not needed for the analysis.
columns = ['Incident Datetime', 'Incident Day of Week', 'Incident Year',
'Report Datetime', 'Row ID', 'Incident ID', 'CAD Number', 'Report Type Code',
'Report Type Description', 'Filed Online', 'Incident Code', 'Incident Subcategory',
'Incident Description', 'Intersection', 'CNN', 'Analysis Neighborhood',
'Supervisor District', 'Resolution', 'point']
df_crime = df_crime.drop(columns, axis=1)
Dropping NaN rows from the remaining dataset.
df_crime.isnull().sum()
Incident Date 0
Incident Time 0
Incident Number 0
Incident Category 11
Police District 0
Latitude 5575
Longitude 5575
dtype: int64
df_crime.dropna(inplace=True)
df_crime.isnull().sum()
Incident Date 0
Incident Time 0
Incident Number 0
Incident Category 0
Police District 0
Latitude 0
Longitude 0
dtype: int64
Get a list of the types of incidents reported.
df_crime['Incident Category'].unique()
array(['Robbery', 'Fire Report', 'Assault', 'Malicious Mischief', 'Larceny Theft', 'Non-Criminal', 'Miscellaneous Investigation', 'Disorderly Conduct', 'Warrant', 'Weapons Carrying Etc', 'Recovered Vehicle', 'Other Miscellaneous', 'Burglary', 'Missing Person', 'Suspicious Occ', 'Civil Sidewalks', 'Fraud', 'Motor Vehicle Theft', 'Traffic Violation Arrest', 'Drug Offense', 'Weapons Offense', 'Offences Against The Family And Children', 'Stolen Property', 'Lost Property', 'Other Offenses', 'Traffic Collision', 'Suicide', 'Homicide', 'Vehicle Misplaced', 'Other', 'Family Offense', 'Forgery And Counterfeiting', 'Sex Offense', 'Arson', 'Courtesy Report', 'Case Closure', 'Gambling', 'Drug Violation', 'Prostitution', 'Juvenile Offenses', 'Embezzlement', 'Vehicle Impounded', 'Vandalism', 'Human Trafficking (A), Commercial Sex Acts', 'Liquor Laws', 'Suspicious', 'Motor Vehicle Theft?', 'Rape', 'Weapons Offence', 'Human Trafficking, Commercial Sex Acts'], dtype=object)
Remove the rows in the 'Non-Criminal' category.
df_crime = df_crime[df_crime['Incident Category'] != 'Non-Criminal'].reset_index(drop=True)
Visualize crime distribution by category.
df_crime['Incident Category'].value_counts().plot(kind='bar', figsize=(16,8))
plt.ylabel('Number of incidents')
plt.show()
The number one category is larceny theft, followed by assault and burglary, not counting 'Other Miscellaneous' and 'Malicious Mischief'.
Visualize crime distribution by police districts.
# calculating total number of incidents per district
crimedata_police_district = pd.DataFrame(df_crime['Police District'].value_counts().astype(float))
crimedata_police_district = crimedata_police_district.reset_index()
crimedata_police_district.columns = ['District', 'Number']
crimedata_police_district.plot(kind='bar', figsize=(16,8), legend=None)
xticks = [i for i in range(len(crimedata_police_district))]
plt.xticks(xticks, list(crimedata_police_district['District']))
plt.xlabel('Police district')
plt.ylabel('Number of incidents')
plt.show()
The Central police district has the largest number of incidents, followed by the Mission district.
We first convert the Pandas df_crime into a GeoPandas GeoDataFrame, a spatial version of df_crime. This is done by first creating Shapely point geometry objects with proper coordinate projection for each record. Then we attach the results as a new column to df_crime.
Create a Shapely object for each record. Details of the coordinate system EPSG:4326, which represents the standard WGS84 coordinate system, can be found at this link. Here we use the Point() function from the shapely package.
geometry = gpd.GeoSeries(df_crime.apply(lambda z: Point(z['Longitude'], z['Latitude']), 1), crs={'init': 'epsg:4326'})
Convert df_crime into GeoDataFrame.
df_crime = gpd.GeoDataFrame(df_crime, geometry=geometry)
df_crime.head()
| | Incident Date | Incident Time | Incident Number | Incident Category | Police District | Latitude | Longitude | geometry |
---|---|---|---|---|---|---|---|---|
0 | 2018/01/01 | 01:30 | 180000263 | Robbery | Ingleside | 37.732261 | -122.423486 | POINT (-122.4234864149589 37.73226125275223) |
1 | 2018/01/01 | 01:59 | 180000326 | Fire Report | Mission | 37.765051 | -122.419669 | POINT (-122.4196689738014 37.76505133632968) |
2 | 2018/01/01 | 02:28 | 180000348 | Assault | Southern | 37.782119 | -122.396841 | POINT (-122.3968414285021 37.78211912156566) |
3 | 2018/01/01 | 02:28 | 180000348 | Malicious Mischief | Southern | 37.782119 | -122.396841 | POINT (-122.3968414285021 37.78211912156566) |
4 | 2018/01/01 | 02:08 | 180000285 | Assault | Mission | 37.748166 | -122.418221 | POINT (-122.4182211716917 37.74816568813204) |
To map the crime data, there are three geographic units we can work with: police districts, census tracts, and neighborhoods. Here we choose the last, as defined by the San Francisco Association of Realtors. The data can be obtained from DataSF. We download the shapefile from the website and import it using GeoPandas.
nbrhoods = gpd.read_file('sf_neighborhoods.shp')
nbrhoods.head()
| | nbrhood | nid | sfar_distr | geometry |
---|---|---|---|---|
0 | Alamo Square | 6e | District 6 - Central North | POLYGON ((-122.4294839489174 37.77509623070431... |
1 | Anza Vista | 6a | District 6 - Central North | POLYGON ((-122.4474643913587 37.77986335309237... |
2 | Balboa Terrace | 4a | District 4 - Twin Peaks West | POLYGON ((-122.464508862148 37.73220849554402,... |
3 | Bayview | 10a | District 10 - Southeast | POLYGON ((-122.38758527039 37.7502633777501, -... |
4 | Bernal Heights | 9a | District 9 - Central East | POLYGON ((-122.4037549223623 37.74919006373567... |
nbrhoods.plot(figsize=(12,14))
plt.show()
Check nbrhoods's coordinate reference system.
print(nbrhoods.crs)
{'init': 'epsg:4326'}
Using the geographic information in df_crime, we can count the crimes in each neighborhood with GeoPandas' sjoin function. Since each incident is a point and each neighborhood is a polygon, we set op='intersects' so that every incident is matched to the neighborhood that contains it. The resulting GeoDataFrame is then grouped by neighborhood.
nbh_crime_counts = gpd.tools.sjoin(df_crime.to_crs(nbrhoods.crs), nbrhoods, how="inner", op='intersects').groupby('nbrhood').size()
nbh_crime_counts = pd.DataFrame(data=nbh_crime_counts.reset_index())
nbh_crime_counts.columns=['nbrhood', 'incident_counts']
nbh_crime_counts.head()
| | nbrhood | incident_counts |
---|---|---|
0 | Alamo Square | 687 |
1 | Anza Vista | 310 |
2 | Balboa Terrace | 49 |
3 | Bayview | 3185 |
4 | Bayview Heights | 253 |
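To make the spatial join concrete, here is a minimal pure-Python sketch of the same idea: test each point against each polygon and count the matches per polygon. It is not how GeoPandas implements sjoin (which relies on a spatial index); the helper `point_in_polygon` and the toy squares and points are assumptions made up for illustration.

```python
from collections import Counter

def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon (list of (x, y) vertices)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical toy data: two square "neighborhoods" and four incidents (x=lng, y=lat)
toy_nbrhoods = {
    "A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "B": [(2, 0), (4, 0), (4, 2), (2, 2)],
}
toy_incidents = [(0.5, 0.5), (1.5, 1.0), (3.0, 1.0), (5.0, 5.0)]

toy_counts = Counter()
for x, y in toy_incidents:
    for name, polygon in toy_nbrhoods.items():
        if point_in_polygon(x, y, polygon):
            toy_counts[name] += 1
            break  # a point is counted once; unmatched points are dropped

print(dict(toy_counts))
```

The last incident falls outside both squares and is dropped, which mirrors the effect of how="inner" in the join above.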
Finally, we combine the nbh_crime_counts and the nbrhoods GeoDataFrames using the merge function. We use nbrhood as the key where the two frames are joined. Details of the implementation can be found here.
nbrhoods = nbrhoods.merge(nbh_crime_counts, on='nbrhood')
nbrhoods.head()
| | nbrhood | nid | sfar_distr | geometry | incident_counts |
---|---|---|---|---|---|
0 | Alamo Square | 6e | District 6 - Central North | POLYGON ((-122.4294839489174 37.77509623070431... | 687 |
1 | Anza Vista | 6a | District 6 - Central North | POLYGON ((-122.4474643913587 37.77986335309237... | 310 |
2 | Balboa Terrace | 4a | District 4 - Twin Peaks West | POLYGON ((-122.464508862148 37.73220849554402,... | 49 |
3 | Bayview | 10a | District 10 - Southeast | POLYGON ((-122.38758527039 37.7502633777501, -... | 3185 |
4 | Bernal Heights | 9a | District 9 - Central East | POLYGON ((-122.4037549223623 37.74919006373567... | 1561 |
Delete df_crime in order to reduce memory usage.
del df_crime
Import the San Francisco Historic Secured Property Tax Rolls, 2007-2015, and perform an initial check.
df_import = pd.read_csv('Historic_Secured_Property_Tax_Rolls.csv')
df_import.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1612110 entries, 0 to 1612109
Data columns (total 43 columns):
Closed Roll Fiscal Year 1612109 non-null float64
Property Location 1612110 non-null object
Neighborhood Code 1611431 non-null object
Neighborhood Code Definition 1564368 non-null object
Block and Lot Number 1612110 non-null object
Volume Number 1612110 non-null int64
Property Class Code 1611252 non-null object
Property Class Code Definition 1596776 non-null object
Year Property Built 1483511 non-null float64
Number of Bathrooms 1612110 non-null float64
Number of Bedrooms 1612110 non-null int64
Number of Rooms 1612110 non-null int64
Number of Stories 1612110 non-null int64
Number of Units 1612110 non-null int64
Characteristics Change Date 1404465 non-null object
Zoning Code 1393875 non-null object
Construction Type 1357497 non-null object
Lot Depth 1612110 non-null float64
Lot Frontage 1612110 non-null float64
Property Area in Square Feet 1612110 non-null int64
Basement Area 1612109 non-null float64
Lot Area 1612109 non-null float64
Lot Code 569318 non-null object
Prior Sales Date 0 non-null float64
Recordation Date 1491934 non-null object
Document Number 635747 non-null object
Document Number 2 1612109 non-null float64
Tax Rate Area Code 1607742 non-null float64
Percent of Ownership 1612109 non-null float64
Closed Roll Exemption Type Code 740703 non-null object
Closed Roll Exemption Type Code Definition 740660 non-null object
Closed Roll Status Code 28589 non-null object
Closed Roll Misc Exemption Value 1612109 non-null float64
Closed Roll Homeowner Exemption Value 1612109 non-null float64
Current Sales Date 835570 non-null object
Closed Roll Assessed Fixtures Value 1612109 non-null float64
Closed Roll Assessed Improvement Value 1612109 non-null float64
Closed Roll Assessed Land Value 1612109 non-null float64
Closed Roll Assessed Personal Prop Value 1612109 non-null float64
Zipcode of Parcel 1584672 non-null float64
Supervisor District 1586142 non-null float64
Neighborhoods - Analysis Boundaries 1584816 non-null object
Location 1591808 non-null object
dtypes: float64(19), int64(6), object(18)
memory usage: 528.9+ MB
We will work with the following columns only.
columns = ['Block and Lot Number',
'Closed Roll Assessed Fixtures Value',
'Closed Roll Assessed Improvement Value',
'Closed Roll Assessed Land Value',
'Closed Roll Assessed Personal Prop Value', 'Neighborhoods - Analysis Boundaries',
'Location']
Get the data obtained in 2014. The df_housing will only contain columns defined in the last cell.
df_housing = df_import[df_import['Closed Roll Fiscal Year']==2014.0].loc[:,columns].reset_index(drop=True)
Check if there's any NaNs. If so, drop those rows.
df_housing.isnull().sum()
Block and Lot Number 0
Closed Roll Assessed Fixtures Value 0
Closed Roll Assessed Improvement Value 0
Closed Roll Assessed Land Value 0
Closed Roll Assessed Personal Prop Value 0
Neighborhoods - Analysis Boundaries 2497
Location 1619
dtype: int64
df_housing.dropna(inplace=True)
df_housing.isnull().sum()
Block and Lot Number 0
Closed Roll Assessed Fixtures Value 0
Closed Roll Assessed Improvement Value 0
Closed Roll Assessed Land Value 0
Closed Roll Assessed Personal Prop Value 0
Neighborhoods - Analysis Boundaries 0
Location 0
dtype: int64
Compute the total value of each property. The total value is the sum of the assessed fixtures value, improvement value, land value, and personal property value.
df_housing['total_price'] = df_housing['Closed Roll Assessed Fixtures Value'] + \
df_housing['Closed Roll Assessed Improvement Value'] + \
df_housing['Closed Roll Assessed Land Value'] + \
df_housing['Closed Roll Assessed Personal Prop Value']
df_housing.head()
| | Block and Lot Number | Closed Roll Assessed Fixtures Value | Closed Roll Assessed Improvement Value | Closed Roll Assessed Land Value | Closed Roll Assessed Personal Prop Value | Neighborhoods - Analysis Boundaries | Location | total_price |
---|---|---|---|---|---|---|---|---|
0 | 3751435 | 0.0 | 149168.0 | 149168.0 | 0.0 | South of Market | (37.7816504619473, -122.399116945614) | 298336.0 |
2 | 6276009 | 0.0 | 270000.0 | 405000.0 | 0.0 | Excelsior | (37.7190514589638, -122.433999199176) | 675000.0 |
3 | 3751420 | 0.0 | 128078.0 | 128078.0 | 0.0 | South of Market | (37.7816504619473, -122.399116945614) | 256156.0 |
4 | 7517378 | 0.0 | 129545.0 | 141594.0 | 0.0 | Noe Valley | (37.7463212609468, -122.441519528492) | 271139.0 |
5 | 3735098 | 0.0 | 336716.0 | 336716.0 | 0.0 | Financial District/South Beach | (37.7857477114134, -122.397398669759) | 673432.0 |
Change the format of GPS coordinates so that it is consistent with other datasets.
coordinates = df_housing['Location'].str.strip('()') \
.str.split(', ', expand=True) \
.rename(columns={0:'Latitude', 1:'Longitude'})
columns = list(df_housing.columns) + list(coordinates.columns)
df_housing = pd.concat([df_housing, coordinates], axis=1, ignore_index=True)
df_housing.columns = columns
df_housing = df_housing.drop(columns=['Closed Roll Assessed Fixtures Value',
'Closed Roll Assessed Improvement Value',
'Closed Roll Assessed Land Value',
'Closed Roll Assessed Personal Prop Value',
'Location'])
Latitude and longitude are text objects. Convert them to float.
df_housing[['Latitude','Longitude']] = df_housing[['Latitude','Longitude']].apply(pd.to_numeric)
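As a rough illustration of the string handling above, this sketch parses a single Location value using only the standard library. The helper name `parse_location` is hypothetical; the sample string is the first Location value shown in the table earlier.

```python
def parse_location(text):
    """Convert a '(lat, lng)' string into a (lat, lng) tuple of floats."""
    lat_str, lng_str = text.strip("()").split(",")
    # float() ignores the leading space left over after the comma
    return float(lat_str), float(lng_str)

# Sample value copied from the Location column shown earlier
lat, lng = parse_location("(37.7816504619473, -122.399116945614)")
print(lat, lng)
```

Note the order: the dataset stores latitude first, whereas Shapely's Point(x, y) expects longitude first.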
Final checkup.
df_housing.head()
| | Block and Lot Number | Neighborhoods - Analysis Boundaries | total_price | Latitude | Longitude |
---|---|---|---|---|---|
0 | 3751435 | South of Market | 298336.0 | 37.781650 | -122.399117 |
2 | 6276009 | Excelsior | 675000.0 | 37.719051 | -122.433999 |
3 | 3751420 | South of Market | 256156.0 | 37.781650 | -122.399117 |
4 | 7517378 | Noe Valley | 271139.0 | 37.746321 | -122.441520 |
5 | 3735098 | Financial District/South Beach | 673432.0 | 37.785748 | -122.397399 |
df_housing.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 204319 entries, 0 to 206815
Data columns (total 5 columns):
Block and Lot Number 204319 non-null object
Neighborhoods - Analysis Boundaries 204319 non-null object
total_price 204319 non-null float64
Latitude 204319 non-null float64
Longitude 204319 non-null float64
dtypes: float64(3), object(2)
memory usage: 9.4+ MB
Following the steps in the previous section, we convert df_housing into a GeoDataFrame, which will be used for mapping. First, we generate the Shapely point column by combining the Longitude and Latitude data.
geometry_housing = gpd.GeoSeries(df_housing.apply(lambda z: Point(z['Longitude'], z['Latitude']), 1), crs={'init': 'epsg:4326'})
df_housing = gpd.GeoDataFrame(df_housing, geometry=geometry_housing)
df_housing.head()
| | Block and Lot Number | Neighborhoods - Analysis Boundaries | total_price | Latitude | Longitude | geometry |
---|---|---|---|---|---|---|
0 | 3751435 | South of Market | 298336.0 | 37.781650 | -122.399117 | POINT (-122.399116945614 37.7816504619473) |
2 | 6276009 | Excelsior | 675000.0 | 37.719051 | -122.433999 | POINT (-122.433999199176 37.7190514589638) |
3 | 3751420 | South of Market | 256156.0 | 37.781650 | -122.399117 | POINT (-122.399116945614 37.7816504619473) |
4 | 7517378 | Noe Valley | 271139.0 | 37.746321 | -122.441520 | POINT (-122.441519528492 37.7463212609468) |
5 | 3735098 | Financial District/South Beach | 673432.0 | 37.785748 | -122.397399 | POINT (-122.397398669759 37.7857477114134) |
Using the sjoin and mean functions, we compute the average housing price in each neighborhood. The price is expressed in millions of dollars.
nbh_house_avg_value = gpd.tools.sjoin(df_housing.to_crs(nbrhoods.crs), nbrhoods, how="inner", op='intersects').groupby('nbrhood').mean()
nbh_house_avg_value = pd.DataFrame(data=nbh_house_avg_value.reset_index())
nbh_house_avg_value = nbh_house_avg_value.drop(columns=['Latitude', 'Longitude', 'index_right', 'incident_counts'])
nbh_house_avg_value.columns=['nbrhood', 'house_avg_price']
# Normalize the price by one million.
nbh_house_avg_value['house_avg_price'] = nbh_house_avg_value['house_avg_price'] / 1_000_000
nbh_house_avg_value.head()
| | nbrhood | house_avg_price |
---|---|---|
0 | Alamo Square | 0.862016 |
1 | Anza Vista | 1.121534 |
2 | Balboa Terrace | 0.737963 |
3 | Bayview | 0.439684 |
4 | Bayview Heights | 0.296895 |
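The group-and-average step can be sketched without pandas as follows. This is only an illustration of the arithmetic, assuming each joined record reduces to a (neighborhood, total_price) pair; the toy rows are invented.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (neighborhood, total_price in dollars) pairs after the spatial join
toy_rows = [
    ("A", 800_000.0),
    ("A", 1_200_000.0),
    ("B", 450_000.0),
]

prices_by_nbrhood = defaultdict(list)
for nbrhood, price in toy_rows:
    prices_by_nbrhood[nbrhood].append(price)

# Average per neighborhood, expressed in millions of dollars
avg_price_millions = {
    nbrhood: mean(prices) / 1_000_000
    for nbrhood, prices in prices_by_nbrhood.items()
}
print(avg_price_millions)
```

A plain mean is sensitive to a few very expensive parcels; whether a median would suit the analysis better is a judgment call the notebook does not address.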
Finally, we merge the average housing price information with the nbrhoods GeoDataFrame.
nbrhoods = nbrhoods.merge(nbh_house_avg_value, on='nbrhood')
nbrhoods.head()
| | nbrhood | nid | sfar_distr | geometry | incident_counts | house_avg_price |
---|---|---|---|---|---|---|
0 | Alamo Square | 6e | District 6 - Central North | POLYGON ((-122.4294839489174 37.77509623070431... | 687 | 0.862016 |
1 | Anza Vista | 6a | District 6 - Central North | POLYGON ((-122.4474643913587 37.77986335309237... | 310 | 1.121534 |
2 | Balboa Terrace | 4a | District 4 - Twin Peaks West | POLYGON ((-122.464508862148 37.73220849554402,... | 49 | 0.737963 |
3 | Bayview | 10a | District 10 - Southeast | POLYGON ((-122.38758527039 37.7502633777501, -... | 3185 | 0.439684 |
4 | Bernal Heights | 9a | District 9 - Central East | POLYGON ((-122.4037549223623 37.74919006373567... | 1561 | 0.448200 |
Delete df_import and df_housing datasets in order to save memory.
del df_import
del df_housing
In this section, we are going to use folium to produce crime and housing price maps of San Francisco using the data prepared in the previous two sections. Before that we first use GeoPandas' representative_point() function to generate a representative location for each neighborhood. This data will be used to create popups on the map.
#nbh_centroid = pd.DataFrame(nbrhoods.centroid)
nbh_centroid = pd.DataFrame(nbrhoods.representative_point())
nbh_centroid.columns=(['centroid'])
nbh_centroid['nbrhood'] = nbrhoods['nbrhood']
nbh_centroid['incident_counts'] = nbrhoods['incident_counts']
nbh_centroid['house_avg_price'] = nbrhoods['house_avg_price']
lat = []
lng = []
for index, row in nbh_centroid.iterrows():
    # The first column holds the Shapely point; its text form is 'POINT (lng lat)'
    tmp = str(row[0]).strip('POINT ()').split(' ')
    lng.append(float(tmp[0]))
    lat.append(float(tmp[1]))
    #print(tmp[0], tmp[1])
nbh_centroid['Latitude'] = lat
nbh_centroid['Longitude'] = lng
nbh_centroid = nbh_centroid.drop(columns=['centroid'])
nbh_centroid.head()
| | nbrhood | incident_counts | house_avg_price | Latitude | Longitude |
---|---|---|---|---|---|
0 | Alamo Square | 687 | 0.862016 | 37.776076 | -122.433919 |
1 | Anza Vista | 310 | 1.121534 | 37.780611 | -122.443255 |
2 | Balboa Terrace | 49 | 0.737963 | 37.730649 | -122.468267 |
3 | Bayview | 3185 | 0.439684 | 37.732391 | -122.387170 |
4 | Bernal Heights | 1561 | 0.448200 | 37.740230 | -122.415885 |
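The loop above extracts coordinates by stripping characters from the point's text form. As a stricter alternative, here is a small sketch that parses the same 'POINT (x y)' text with a regular expression and rejects anything else. The helper `parse_wkt_point` and its pattern are assumptions for illustration; Shapely points also expose the coordinates directly through their x and y attributes.

```python
import re

# Matches e.g. 'POINT (-122.433919 37.776076)'; x is longitude, y is latitude
WKT_POINT = re.compile(r"^POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)$")

def parse_wkt_point(wkt):
    """Return (latitude, longitude) from a WKT 'POINT (x y)' string."""
    match = WKT_POINT.match(wkt.strip())
    if match is None:
        raise ValueError("not a WKT point: {!r}".format(wkt))
    lng, lat = (float(group) for group in match.groups())
    return lat, lng

lat, lng = parse_wkt_point("POINT (-122.433919 37.776076)")
print(lat, lng)
```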
Define a function that generates popups.
def get_popups(df, field, name, map_object):
    for lat, lng, nbrhood, value in zip(df['Latitude'],
                                        df['Longitude'],
                                        df['nbrhood'],
                                        df[field]):
        label = "{0}, {1}: {2:.2f}".format(nbrhood, name, value)
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=2,
            popup=label,
            color='green',
            fill=True,
            fill_color='#3186cc',
            fill_opacity=0.3).add_to(map_object)
Define San Francisco's GPS coordinates.
SF_Coord = (37.7792808, -122.4192363)
Create the crime map object.
# Create San Francisco base map
SF_crime_map = folium.Map(location=SF_Coord, zoom_start=12)
#geodata = gpd.read_file('./tmp/geo_export_0e291dd6-c6fb-40dd-8323-68a750ad5743.geojson')
# Crime data at the neighborhood level
threshold_scale = [0, 1000, 2000, 4000, 6000, 8000]
SF_crime_map.choropleth(geo_data = nbrhoods.to_json(),
data = nbrhoods,
columns = ['nbrhood', 'incident_counts'],
key_on = 'feature.properties.nbrhood',
fill_color = 'YlOrRd',
fill_opacity = 0.60,
line_opacity = 0.60,
legend_name = 'Number of incidents',
name = 'Number of Incidents',
threshold_scale = threshold_scale,
reset = True
)
get_popups(nbh_centroid, 'incident_counts', 'Incident Counts', SF_crime_map)
# Add control layer to the map
#folium.LayerControl().add_to(SF_crime_map)
#SF_crime_map
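To see what the threshold scale does to the data, the sketch below assigns a few incident counts to half-open bins [t_i, t_{i+1}). The helper `bin_index` is hypothetical, and the exact way folium treats values on a bin edge may differ; this only illustrates the binning idea.

```python
from bisect import bisect_right

threshold_scale = [0, 1000, 2000, 4000, 6000, 8000]

def bin_index(value, scale):
    """Index of the half-open bin [scale[i], scale[i+1]) that contains value."""
    if not scale[0] <= value <= scale[-1]:
        raise ValueError("value {} is outside the scale".format(value))
    # Clamp so that the top edge falls into the last bin
    return min(bisect_right(scale, value) - 1, len(scale) - 2)

# Incident counts taken from the neighborhood table shown earlier
sample_counts = [("Balboa Terrace", 49), ("Bernal Heights", 1561), ("Bayview", 3185)]
bins = {name: bin_index(count, threshold_scale) for name, count in sample_counts}
print(bins)
```

Because the bins widen from 1000 to 2000 incidents, the color scale resolves low-crime neighborhoods more finely than high-crime ones.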
Create the housing price map object.
# Create San Francisco base map
SF_housing_map = folium.Map(location=SF_Coord, zoom_start=12)
threshold_scale2 = [0.0, 0.5, 1.0, 2.0, 4.0, 8.0]
SF_housing_map.choropleth(geo_data = nbrhoods.to_json(),
data = nbrhoods,
columns = ['nbrhood', 'house_avg_price'],
key_on = 'feature.properties.nbrhood',
fill_color = 'YlOrRd',
fill_opacity = 0.60,
line_opacity = 0.60,
legend_name = 'Average Housing Price (Million)',
name = 'Average Housing Price',
threshold_scale = threshold_scale2,
reset = True
)
get_popups(nbh_centroid, 'house_avg_price', 'Avg. House Price (Million)', SF_housing_map)
# Add control layer to the map
#folium.LayerControl().add_to(SF_housing_map)
#SF_housing_map
In this section, we are going to use the Foursquare API to explore San Francisco neighborhoods and cluster them using k-means clustering. From the Toronto zip-code neighborhood exercise, we learned that k-means tends to generate at least one cluster dominated by restaurants. This is exactly the kind of information we need when looking for possible locations for a new restaurant.
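For readers unfamiliar with k-means, here is a bare-bones sketch of the algorithm: alternate between assigning each point to its nearest center and moving each center to the mean of its members. It is an illustration only, not scikit-learn's KMeans (which adds smarter initialization and multiple restarts); the function `kmeans` and the toy feature vectors are assumptions.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, centers, n_iter=10):
    """Plain k-means on a list of tuples, starting from the given centers."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(n_iter):
        # Assignment step: index of the nearest center for every point
        labels = [
            min(range(len(centers)), key=lambda k: squared_distance(p, centers[k]))
            for p in points
        ]
        # Update step: move each center to the mean of its members
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centers.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centers.append(old)  # leave an empty cluster's center where it was
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return labels, centers

# Hypothetical neighborhoods described by (restaurant share, park share) of venues
toy_points = [(0.8, 0.1), (0.7, 0.2), (0.1, 0.9), (0.2, 0.8)]
labels, centers = kmeans(toy_points, centers=[toy_points[0], toy_points[2]])
print(labels, centers)
```

In this toy case the first cluster is the restaurant-heavy one, which is the kind of group the analysis is looking for.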
Set up the Foursquare API credentials and basic API call parameters.
# @hidden_cell
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID'  # credential redacted; supply your own
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET'  # credential redacted; supply your own
VERSION = '20180927'
# Set up the FourSquare API call parameters
RADIUS = 500
LIMIT = 100
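To get a feel for what a 500-meter radius means on the map, the following sketch computes the great-circle distance between two coordinates with the haversine formula. It assumes a spherical Earth, and it is only an approximation of whatever distance measure the Foursquare service applies internally; the helper `haversine_m` and the sample points are chosen for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (spherical approximation)

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two (lat, lng) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

RADIUS = 500  # meters, as set above

# Two sample points less than one block apart in San Francisco
distance = haversine_m(37.776076, -122.433919, 37.776010, -122.433179)
within_radius = distance <= RADIUS
print(round(distance, 1), within_radius)
```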
Define the function that extracts the category of the venue.
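def get_category_type(row):
    # The categories live under 'categories' or 'venue.categories',
    # depending on how the JSON response was flattened
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']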
We use the function defined in the Toronto exercise. Here the function calls the explore method to return a list of recommended venues for each neighborhood.
def getNearbyVenues(names, latitudes, longitudes, radius, limit):
    venues_check_list = []
    venues_list = []
    idx = 0
    for name, lat, lng in zip(names, latitudes, longitudes):
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            limit)
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        # return only relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])
        num_of_venues_found = len(results)
        if (num_of_venues_found == 0):
            venues_check_list.append(False)
        else:
            venues_check_list.append(True)
        print('{0:4d} Neighborhood: {1:35s}, number of venues found:{2:6d}'.format(idx, name, num_of_venues_found))
        idx = idx + 1
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['nbrhood',
                             'nbrhood Latitude',
                             'nbrhood Longitude',
                             'Venue',
                             'Venue Latitude',
                             'Venue Longitude',
                             'Venue Category']
    return(nearby_venues, venues_check_list)
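The request and parsing logic can be exercised without network access. The sketch below builds the same explore URL with encoded query parameters and extracts venue tuples from a hand-written payload. The helpers `build_explore_url` and `extract_venues` are hypothetical, and the mock payload mirrors only the fields the function above reads; it is not a complete Foursquare response.

```python
from urllib.parse import urlencode

def build_explore_url(lat, lng, radius, limit, client_id, client_secret, version):
    """Assemble the explore request URL with encoded query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)

def extract_venues(payload):
    """Pull (name, lat, lng, category) tuples out of a decoded explore response."""
    items = payload["response"]["groups"][0]["items"]
    venues = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Guard against venues that carry no category at all
        category = categories[0]["name"] if categories else None
        venues.append((venue["name"], venue["location"]["lat"],
                       venue["location"]["lng"], category))
    return venues

# Hypothetical payload containing only the fields read above
mock_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Alamo Square",
               "location": {"lat": 37.775906, "lng": -122.434047},
               "categories": [{"name": "Park"}]}},
    {"venue": {"name": "Uncategorized Spot",
               "location": {"lat": 37.776, "lng": -122.433},
               "categories": []}},
]}]}}

url = build_explore_url(37.776076, -122.433919, 500, 100, "ID", "SECRET", "20180927")
venues = extract_venues(mock_payload)
print(url)
print(venues)
```

Unlike the indexing in the function above, this sketch returns None for a venue with an empty category list instead of raising an IndexError.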
Apply the function getNearbyVenues() to the neighborhoods whose coordinates are extracted from nbh_centroid.
nbhs = nbh_centroid.loc[:, 'nbrhood']
latitudes = nbh_centroid.loc[:, 'Latitude']
longitudes = nbh_centroid.loc[:, 'Longitude']
print('\n Search radius: {0:8.1f} meters'.format(RADIUS))
print(' Maximum number of venues: {0:6d}\n'.format(LIMIT))
SF_venues, SF_venues_check_list = getNearbyVenues(nbhs, latitudes, longitudes, RADIUS, LIMIT)
Search radius: 500.0 meters
Maximum number of venues: 100

0 Neighborhood: Alamo Square, number of venues found: 72
1 Neighborhood: Anza Vista, number of venues found: 18
2 Neighborhood: Balboa Terrace, number of venues found: 15
3 Neighborhood: Bayview, number of venues found: 17
4 Neighborhood: Bernal Heights, number of venues found: 39
5 Neighborhood: Buena Vista Park/Ashbury Heights, number of venues found: 53
6 Neighborhood: Central Richmond, number of venues found: 96
7 Neighborhood: Central Sunset, number of venues found: 12
8 Neighborhood: Clarendon Heights, number of venues found: 13
9 Neighborhood: Corona Heights, number of venues found: 59
10 Neighborhood: Cow Hollow, number of venues found: 100
11 Neighborhood: Crocker Amazon, number of venues found: 16
12 Neighborhood: Diamond Heights, number of venues found: 10
13 Neighborhood: Downtown, number of venues found: 100
14 Neighborhood: Duboce Triangle, number of venues found: 87
15 Neighborhood: Eureka Valley / Dolores Heights, number of venues found: 100
16 Neighborhood: Excelsior, number of venues found: 4
17 Neighborhood: Financial District/Barbary Coast, number of venues found: 100
18 Neighborhood: Yerba Buena, number of venues found: 100
19 Neighborhood: Forest Hill, number of venues found: 6
20 Neighborhood: Forest Hills Extension, number of venues found: 12
21 Neighborhood: Forest Knolls, number of venues found: 11
22 Neighborhood: Glen Park, number of venues found: 47
23 Neighborhood: Golden Gate Heights, number of venues found: 14
24 Neighborhood: Golden Gate Park, number of venues found: 6
25 Neighborhood: Haight Ashbury, number of venues found: 100
26 Neighborhood: Hayes Valley, number of venues found: 100
27 Neighborhood: Hunters Point, number of venues found: 4
28 Neighborhood: Ingleside, number of venues found: 25
29 Neighborhood: Ingleside Heights, number of venues found: 8
30 Neighborhood: Ingleside Terrace, number of venues found: 12
31 Neighborhood: Inner Mission, number of venues found: 81
32 Neighborhood: Inner Parkside, number of venues found: 27
33 Neighborhood: Inner Richmond, number of venues found: 51
34 Neighborhood: Inner Sunset, number of venues found: 14
35 Neighborhood: Jordan Park / Laurel Heights, number of venues found: 35
36 Neighborhood: Lake Street, number of venues found: 19
37 Neighborhood: Monterey Heights, number of venues found: 4
38 Neighborhood: Lake Shore, number of venues found: 6
39 Neighborhood: Lakeside, number of venues found: 54
40 Neighborhood: Lone Mountain, number of venues found: 13
41 Neighborhood: Lower Pacific Heights, number of venues found: 100
42 Neighborhood: Marina, number of venues found: 84
43 Neighborhood: Merced Heights, number of venues found: 5
44 Neighborhood: Merced Manor, number of venues found: 13
45 Neighborhood: Midtown Terrace, number of venues found: 5
46 Neighborhood: South Beach, number of venues found: 56
47 Neighborhood: Miraloma Park, number of venues found: 7
48 Neighborhood: Mission Bay, number of venues found: 57
49 Neighborhood: Mission Dolores, number of venues found: 100
50 Neighborhood: Mission Terrace, number of venues found: 22
51 Neighborhood: Mount Davidson Manor, number of venues found: 18
52 Neighborhood: Noe Valley, number of venues found: 57
53 Neighborhood: North Beach, number of venues found: 100
54 Neighborhood: North Panhandle, number of venues found: 45
55 Neighborhood: North Waterfront, number of venues found: 39
56 Neighborhood: Oceanview, number of venues found: 8
57 Neighborhood: Outer Mission, number of venues found: 17
58 Neighborhood: Outer Parkside, number of venues found: 21
59 Neighborhood: Outer Richmond, number of venues found: 46
60 Neighborhood: Outer Sunset, number of venues found: 30
61 Neighborhood: Pacific Heights, number of venues found: 61
62 Neighborhood: Parkside, number of venues found: 34
63 Neighborhood: Cole Valley/Parnassus Heights, number of venues found: 25
64 Neighborhood: Pine Lake Park, number of venues found: 10
65 Neighborhood: Portola, number of venues found: 5
66 Neighborhood: Potrero Hill, number of venues found: 47
67 Neighborhood: Presidio, number of venues found: 12
68 Neighborhood: Presidio Heights, number of venues found: 26
69 Neighborhood: Russian Hill, number of venues found: 70
70 Neighborhood: Saint Francis Wood, number of venues found: 53
71 Neighborhood: Sea Cliff, number of venues found: 7
72 Neighborhood: Silver Terrace, number of venues found: 4
73 Neighborhood: South of Market, number of venues found: 93
74 Neighborhood: Stonestown, number of venues found: 17
75 Neighborhood: Sunnyside, number of venues found: 16
76 Neighborhood: Telegraph Hill, number of venues found: 100
77 Neighborhood: Twin Peaks, number of venues found: 8
78 Neighborhood: Van Ness/Civic Center, number of venues found: 72
79 Neighborhood: Visitacion Valley, number of venues found: 5
80 Neighborhood: West Portal, number of venues found: 58
81 Neighborhood: Western Addition, number of venues found: 62
82 Neighborhood: Westwood Highlands, number of venues found: 10
83 Neighborhood: Westwood Park, number of venues found: 32
84 Neighborhood: Lincoln Park, number of venues found: 22
85 Neighborhood: Sherwood Forest, number of venues found: 4
86 Neighborhood: Tenderloin, number of venues found: 100
87 Neighborhood: Central Waterfront/Dogpatch, number of venues found: 53
88 Neighborhood: Candlestick Point, number of venues found: 8
89 Neighborhood: Bayview Heights, number of venues found: 4
90 Neighborhood: Little Hollywood, number of venues found: 11
91 Neighborhood: Nob Hill, number of venues found: 48
print(SF_venues.shape)
SF_venues.head()
(3567, 7)
nbrhood | nbrhood Latitude | nbrhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category | |
---|---|---|---|---|---|---|---|
0 | Alamo Square | 37.776076 | -122.433919 | Alamo Square | 37.775906 | -122.434047 | Park |
1 | Alamo Square | 37.776076 | -122.433919 | Painted Ladies | 37.776010 | -122.433179 | Historic Site |
2 | Alamo Square | 37.776076 | -122.433919 | Alamo Square Dog Park | 37.775878 | -122.435740 | Dog Run |
3 | Alamo Square | 37.776076 | -122.433919 | Originals Vinyl | 37.775835 | -122.431227 | Record Shop |
4 | Alamo Square | 37.776076 | -122.433919 | The Center SF | 37.774545 | -122.430730 | Spiritual Center |
Find out how many unique categories appear among all the returned venues.
print('There are {} unique categories.'.format(SF_venues['Venue Category'].nunique()))
There are 343 unique categories.
First we one-hot encode venue categories.
SF_onehot = pd.get_dummies(SF_venues[['Venue Category']], prefix="", prefix_sep="")
# add the neighborhood column back to the dataframe
SF_onehot['nbrhood'] = SF_venues['nbrhood']
# move the neighborhood column to the first position
fixed_columns = [SF_onehot.columns[-1]] + list(SF_onehot.columns[:-1])
SF_onehot = SF_onehot[fixed_columns]
SF_onehot.shape
(3567, 344)
Group the rows by neighborhood name and take the mean of each one-hot column, which gives the frequency of occurrence of each category within that neighborhood.
SF_grouped = SF_onehot.groupby('nbrhood').mean().reset_index()
SF_grouped.head()
nbrhood | ATM | Acai House | Accessories Store | Adult Boutique | Afghan Restaurant | African Restaurant | Alternative Healer | American Restaurant | Antique Shop | Arcade | Argentinian Restaurant | Art Gallery | Art Museum | Arts & Crafts Store | Asian Restaurant | Athletics & Sports | Automotive Shop | BBQ Joint | Baby Store | Bagel Shop | Bakery | Bank | Bar | Baseball Field | Baseball Stadium | Basketball Court | Basketball Stadium | Beach | Bed & Breakfast | Beer Bar | Beer Garden | Beer Store | Belgian Restaurant | Big Box Store | Bike Rental / Bike Share | Bike Shop | Bistro | Board Shop | Boat or Ferry | Bookstore | Boutique | Bowling Alley | Boxing Gym | Brazilian Restaurant | Breakfast Spot | Brewery | Bubble Tea Shop | Buffet | Building | ... | Stadium | Stationery Store | Steakhouse | Street Food Gathering | Supermarket | Supplement Shop | Surf Spot | Sushi Restaurant | Szechuan Restaurant | Taco Place | Taiwanese Restaurant | Tapas Restaurant | Tattoo Parlor | Tea Room | Tech Startup | Tennis Court | Thai Restaurant | Theater | Thrift / Vintage Store | Tiki Bar | Tour Provider | Tourist Information Center | Toy / Game Store | Track | Track Stadium | Trade School | Trail | Train Station | Trattoria/Osteria | Tree | Tunnel | Turkish Restaurant | Tuscan Restaurant | Udon Restaurant | Used Bookstore | Vegetarian / Vegan Restaurant | Veterinarian | Video Game Store | Video Store | Vietnamese Restaurant | Vineyard | Wagashi Place | Weight Loss Center | Whisky Bar | Wine Bar | Wine Shop | Winery | Wings Joint | Women's Store | Yoga Studio | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Alamo Square | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.013889 | 0.013889 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.027778 | 0.0 | 0.0 | 0.013889 | 0.013889 | 0.055556 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.027778 | 0.0 | 0.0 | 0.0 | 0.00 | 0.013889 | 0.013889 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.027778 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.013889 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.027778 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
1 | Anza Vista | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.055556 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.055556 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.00 | 0.000000 | 0.000000 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.055556 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
2 | Balboa Terrace | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.066667 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.066667 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.00 | 0.000000 | 0.000000 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.066667 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
3 | Bayview | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.058824 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.058824 | 0.0 | 0.0 | 0.058824 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.00 | 0.000000 | 0.000000 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.058824 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.058824 | 0.058824 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
4 | Bayview Heights | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.25 | 0.000000 | 0.000000 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
5 rows × 344 columns
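To make the one-hot and group-mean steps above concrete, here is a minimal sketch on a toy table. The neighborhoods `A` and `B` and their venues are made up for illustration; only `pd.get_dummies` and `groupby(...).mean()` mirror the notebook.

```python
import pandas as pd

# Toy venue list: two hypothetical neighborhoods, not real Foursquare data
toy_venues = pd.DataFrame({
    "nbrhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Park", "Café", "Park", "Bar", "Café", "Café"],
})

# One indicator column per category, one row per venue
toy_onehot = pd.get_dummies(toy_venues["Venue Category"], dtype=float)
toy_onehot.insert(0, "nbrhood", toy_venues["nbrhood"])

# The mean of each indicator column is the share of venues in that category
toy_grouped = toy_onehot.groupby("nbrhood").mean()
print(toy_grouped)
```

Each row of the grouped table sums to 1, so the values can be read directly as category shares within a neighborhood.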
Define a function that sorts a neighborhood's venue categories by frequency in descending order and returns the most common ones.
def return_most_common_venues(row, num_top_venues):
    # skip the first entry, which holds the neighborhood name
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]
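One property of this function is worth noting: it always returns `num_top_venues` entries. A neighborhood with fewer distinct categories than that (Bayview Heights, for instance, returned only 4 venues) is padded with categories whose frequency is zero. The following is a sketch of a variant that drops zero-frequency categories; the function name `top_venues_nonzero` and the toy row are hypothetical and not part of the original notebook.

```python
import pandas as pd

def top_venues_nonzero(row, num_top_venues):
    """Return up to num_top_venues categories that actually occur in the row."""
    freqs = row.iloc[1:].astype(float)   # position 0 holds the neighborhood name
    freqs = freqs[freqs > 0]             # drop categories that never occur
    # a stable sort keeps tied categories in their original column order
    ranked = freqs.sort_values(ascending=False, kind="mergesort")
    return list(ranked.index[:num_top_venues])

# Hypothetical row in the same layout as SF_grouped
toy_row = pd.Series({
    "nbrhood": "Toy Heights",
    "Bakery": 0.25,
    "Café": 0.0,
    "Park": 0.5,
    "Trail": 0.25,
    "Zoo": 0.0,
})
top = top_venues_nonzero(toy_row, 4)
print(top)
```

Because the result can be shorter than `num_top_venues`, a caller filling a fixed-width table would need to pad it, for example with `None`.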
Create a dataframe that lists the most common venue categories, in descending order of frequency, for each neighborhood.
num_top_venues = 10
indicators = ['st', 'nd', 'rd']
# create columns according to number of top venues
columns = ['nbrhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # ranks beyond 3rd use the generic 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))
# create a new dataframe
SF_venues_sorted = pd.DataFrame(columns=columns)
SF_venues_sorted['nbrhood'] = SF_grouped['nbrhood']
for ind in np.arange(SF_grouped.shape[0]):
    SF_venues_sorted.iloc[ind, 1:] = return_most_common_venues(SF_grouped.iloc[ind, :], num_top_venues)
print(SF_venues_sorted.shape)
SF_venues_sorted
(92, 11)
nbrhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | Alamo Square | Bar | Record Shop | Boutique | Italian Restaurant | Wine Bar | Sushi Restaurant | Gift Shop | Mediterranean Restaurant | BBQ Joint | Ethiopian Restaurant |
1 | Anza Vista | Café | Health & Beauty Service | Coffee Shop | Southern / Soul Food Restaurant | Mexican Restaurant | Grocery Store | Sandwich Place | Tunnel | Big Box Store | Arts & Crafts Store |
2 | Balboa Terrace | Light Rail Station | Pharmacy | Circus | Bakery | Park | Playground | Dessert Shop | Vietnamese Restaurant | Italian Restaurant | American Restaurant |
3 | Bayview | Southern / Soul Food Restaurant | Park | Light Rail Station | Chinese Restaurant | Thrift / Vintage Store | Theater | African Restaurant | Plaza | Gym | Bakery |
4 | Bayview Heights | Breakfast Spot | Park | Burger Joint | Latin American Restaurant | Food | Exhibit | Farmers Market | Fast Food Restaurant | Filipino Restaurant | Flea Market |
5 | Bernal Heights | Coffee Shop | Park | Bakery | Italian Restaurant | Playground | Gourmet Shop | Yoga Studio | Seafood Restaurant | Sandwich Place | Gym |
6 | Buena Vista Park/Ashbury Heights | Boutique | Park | Coffee Shop | Clothing Store | Convenience Store | Trail | Scenic Lookout | Gift Shop | Breakfast Spot | Café |
7 | Candlestick Point | Football Stadium | Park | Campground | Stadium | American Restaurant | Food & Drink Shop | Soccer Field | Flea Market | Event Space | Exhibit |
8 | Central Richmond | Grocery Store | Chinese Restaurant | Sushi Restaurant | Korean Restaurant | Café | Deli / Bodega | Bakery | Dim Sum Restaurant | Coffee Shop | Vietnamese Restaurant |
9 | Central Sunset | Chinese Restaurant | Gym / Fitness Center | Pilates Studio | Spa | Shoe Store | Food | Pet Store | Playground | Dessert Shop | Ethiopian Restaurant |
10 | Central Waterfront/Dogpatch | Bakery | Art Gallery | Cocktail Bar | Brewery | Wine Bar | Café | Gym / Fitness Center | Coffee Shop | Gift Shop | Dessert Shop |
11 | Clarendon Heights | Trail | Park | Road | Bus Stop | Scenic Lookout | Art Gallery | Playground | Convenience Store | Wine Bar | Dance Studio |
12 | Cole Valley/Parnassus Heights | Wine Bar | Park | Yoga Studio | Athletics & Sports | Burger Joint | Sports Bar | Sports Club | Breakfast Spot | Street Food Gathering | Mexican Restaurant |
13 | Corona Heights | Gay Bar | Park | Thai Restaurant | Cosmetics Shop | Dog Run | Sushi Restaurant | Dim Sum Restaurant | Grocery Store | Coffee Shop | Diner |
14 | Cow Hollow | Cosmetics Shop | Wine Bar | French Restaurant | Gym / Fitness Center | Italian Restaurant | Sandwich Place | Salad Place | Gym | Bakery | Salon / Barbershop |
15 | Crocker Amazon | Pharmacy | Gastropub | Coffee Shop | Bar | Latin American Restaurant | Scenic Lookout | Tennis Court | Basketball Court | Pizza Place | American Restaurant |
16 | Diamond Heights | Trail | Playground | Grocery Store | Pharmacy | Dim Sum Restaurant | Salon / Barbershop | Bus Station | Video Store | Coffee Shop | Shopping Mall |
17 | Downtown | Hotel | Cocktail Bar | Speakeasy | Theater | American Restaurant | Hostel | Breakfast Spot | Thai Restaurant | Bar | Coffee Shop |
18 | Duboce Triangle | Gay Bar | Coffee Shop | Gym | Sushi Restaurant | New American Restaurant | Mexican Restaurant | Grocery Store | Sandwich Place | Jewelry Store | Cocktail Bar |
19 | Eureka Valley / Dolores Heights | Gay Bar | Coffee Shop | New American Restaurant | Thai Restaurant | Pet Store | Playground | Park | Deli / Bodega | Mexican Restaurant | Men's Store |
20 | Excelsior | Convenience Store | Moving Target | Scenic Lookout | Lake | Flower Shop | Ethiopian Restaurant | Event Space | Exhibit | Farmers Market | Fast Food Restaurant |
21 | Financial District/Barbary Coast | Coffee Shop | Italian Restaurant | Food Truck | Men's Store | New American Restaurant | Sandwich Place | Gym | Seafood Restaurant | Café | Japanese Restaurant |
22 | Forest Hill | Japanese Restaurant | Hotpot Restaurant | Tennis Court | Park | Playground | French Restaurant | Cycle Studio | Coworking Space | Event Space | Exhibit |
23 | Forest Hills Extension | Convenience Store | Burger Joint | Hotpot Restaurant | Gym | French Restaurant | Park | Pharmacy | Dive Bar | Sandwich Place | Bus Stop |
24 | Forest Knolls | Trail | Park | Mountain | Garden | Fountain | Football Stadium | Ethiopian Restaurant | Event Space | French Restaurant | Exhibit |
25 | Glen Park | Park | Café | Trail | Coffee Shop | Grocery Store | Cosmetics Shop | Dive Bar | Spa | Tennis Court | Gift Shop |
26 | Golden Gate Heights | Trail | Park | Bus Stop | Playground | Scenic Lookout | Home Service | Tennis Court | Video Game Store | Flower Shop | Farmers Market |
27 | Golden Gate Park | Park | Track | Disc Golf | Bus Stop | Yoga Studio | Ethiopian Restaurant | Exhibit | Farmers Market | Fast Food Restaurant | Filipino Restaurant |
28 | Haight Ashbury | Café | Boutique | Coffee Shop | Thrift / Vintage Store | Pizza Place | Shoe Store | Clothing Store | Thai Restaurant | Gift Shop | Board Shop |
29 | Hayes Valley | Boutique | Wine Bar | Café | French Restaurant | Sushi Restaurant | Clothing Store | Cocktail Bar | Coffee Shop | Bubble Tea Shop | Mexican Restaurant |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
62 | Outer Mission | Latin American Restaurant | Cosmetics Shop | Spanish Restaurant | Chinese Restaurant | Mexican Restaurant | Bakery | Grocery Store | Food | Asian Restaurant | Motel |
63 | Outer Parkside | Coffee Shop | Chinese Restaurant | Pizza Place | Pharmacy | Beach | Thai Restaurant | Grocery Store | Bar | Bakery | Light Rail Station |
64 | Outer Richmond | Chinese Restaurant | Café | Japanese Restaurant | Sporting Goods Shop | Music Store | Bakery | Sandwich Place | Ramen Restaurant | Yoga Studio | Record Shop |
65 | Outer Sunset | Art Gallery | Mexican Restaurant | Thai Restaurant | Light Rail Station | Coffee Shop | Yoga Studio | Arts & Crafts Store | Brewery | Breakfast Spot | Bookstore |
66 | Pacific Heights | Cosmetics Shop | Coffee Shop | Park | Grocery Store | Juice Bar | Gym / Fitness Center | French Restaurant | Salon / Barbershop | Bar | Ice Cream Shop |
67 | Parkside | Chinese Restaurant | Dumpling Restaurant | Light Rail Station | Bubble Tea Shop | Café | Japanese Restaurant | Sandwich Place | Bar | Liquor Store | Burrito Place |
68 | Pine Lake Park | Park | Gym | Music Venue | Sandwich Place | Food Truck | Dog Run | Asian Restaurant | Hawaiian Restaurant | Event Space | Exhibit |
69 | Portola | Recreation Center | Lake | Playground | Shopping Mall | Bus Station | Food Truck | Food Stand | Electronics Store | Ethiopian Restaurant | Event Space |
70 | Potrero Hill | Park | Café | Brewery | Grocery Store | Playground | Breakfast Spot | Burger Joint | Bus Station | Pizza Place | Sandwich Place |
71 | Presidio | Museum | Brewery | American Restaurant | Outdoor Sculpture | Trail | Asian Restaurant | General Entertainment | Art Gallery | Park | Bowling Alley |
72 | Presidio Heights | American Restaurant | Park | Cosmetics Shop | Golf Course | Baby Store | New American Restaurant | Supermarket | Café | Bookstore | Miscellaneous Shop |
73 | Russian Hill | Park | Chocolate Shop | Ice Cream Shop | Coffee Shop | Hotel | Bike Rental / Bike Share | Tour Provider | Diner | Art Gallery | Seafood Restaurant |
74 | Saint Francis Wood | Coffee Shop | Chinese Restaurant | Italian Restaurant | Indian Restaurant | Pizza Place | Gym / Fitness Center | Wine Bar | Mexican Restaurant | Pub | Music Store |
75 | Sea Cliff | Trail | Tea Room | Beach | Scenic Lookout | Golf Course | Neighborhood | Football Stadium | Food Truck | Event Space | Exhibit |
76 | Sherwood Forest | Tree | Park | Monument / Landmark | Trail | Yoga Studio | Flea Market | Ethiopian Restaurant | Event Space | Exhibit | Farmers Market |
77 | Silver Terrace | Park | Liquor Store | Soccer Field | Outdoor Gym | Yoga Studio | Electronics Store | Event Space | Exhibit | Farmers Market | Fast Food Restaurant |
78 | South Beach | Café | Gym | Sandwich Place | Coffee Shop | Deli / Bodega | Park | Scenic Lookout | Residential Building (Apartment / Condo) | Spa | American Restaurant |
79 | South of Market | Nightclub | Sandwich Place | Art Gallery | Coffee Shop | Furniture / Home Store | Food Truck | Jewelry Store | Bar | Café | Clothing Store |
80 | Stonestown | Café | Sandwich Place | Pizza Place | College Cafeteria | Tennis Court | Park | Coffee Shop | Gym | Juice Bar | Mexican Restaurant |
81 | Sunnyside | Bar | Vietnamese Restaurant | Baseball Field | Trail | Grocery Store | Dumpling Restaurant | Soccer Field | Cantonese Restaurant | Spa | College Gym |
82 | Telegraph Hill | Italian Restaurant | Pizza Place | Cocktail Bar | Coffee Shop | Park | Mexican Restaurant | Bakery | Trail | Chinese Restaurant | Scenic Lookout |
83 | Tenderloin | Coffee Shop | Vietnamese Restaurant | Thai Restaurant | Theater | Cocktail Bar | Speakeasy | Indian Restaurant | Burger Joint | Sandwich Place | Hotel |
84 | Twin Peaks | Trail | Scenic Lookout | Reservoir | Yoga Studio | Filipino Restaurant | Egyptian Restaurant | Electronics Store | Ethiopian Restaurant | Event Space | Exhibit |
85 | Van Ness/Civic Center | Thai Restaurant | Vietnamese Restaurant | Sandwich Place | Sushi Restaurant | Coffee Shop | Bar | Korean Restaurant | Southern / Soul Food Restaurant | Vegetarian / Vegan Restaurant | Theater |
86 | Visitacion Valley | Garden | Baseball Field | Park | Pool | Trail | Flower Shop | Event Space | Exhibit | Farmers Market | Fast Food Restaurant |
87 | West Portal | Coffee Shop | Chinese Restaurant | Music Store | Pizza Place | Indian Restaurant | Wine Bar | Park | Mexican Restaurant | Pub | Gym / Fitness Center |
88 | Western Addition | Cosmetics Shop | Shopping Mall | Jazz Club | Japanese Restaurant | New American Restaurant | Furniture / Home Store | Sushi Restaurant | Gift Shop | Tea Room | Grocery Store |
89 | Westwood Highlands | Yoga Studio | Diner | Sushi Restaurant | Bus Line | Food | Monument / Landmark | Gun Range | Cantonese Restaurant | Breakfast Spot | Trail |
90 | Westwood Park | Yoga Studio | Asian Restaurant | Poke Place | Pharmacy | Coffee Shop | Bubble Tea Shop | Café | Gastropub | Food | Grocery Store |
91 | Yerba Buena | Coffee Shop | Hotel | Sushi Restaurant | Sandwich Place | Gym / Fitness Center | Café | Museum | Art Museum | Bar | Pizza Place |
92 rows × 11 columns
We use k-means to cluster the results into 5 clusters.
# Set the number of clusters
kclusters = 5
SF_grouped_clustering = SF_grouped.drop(columns='nbrhood')
# Run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=34).fit(SF_grouped_clustering)
# Check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]
array([1, 1, 3, 3, 0, 1, 3, 3, 1, 3], dtype=int32)
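The notebook fixes the number of clusters at 5 without comparing alternatives. One common way to sanity-check such a choice is the silhouette score. The sketch below uses synthetic data from `make_blobs` as a stand-in for `SF_grouped_clustering`, so the numbers it prints say nothing about the San Francisco data; it only illustrates the procedure.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for SF_grouped_clustering: three well-separated groups
X, _ = make_blobs(
    n_samples=90,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=0.5,
    random_state=34,
)

# Fit k-means for several candidate k and record the mean silhouette score
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=34).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# The candidate with the highest score is the best-separated clustering
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```

On the real frequency table the scores are likely to be much closer together, so the result should be treated as a hint rather than a rule.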
Create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.
# Add clustering labels to the sorted-venues dataframe
# (SF_grouped and SF_venues_sorted share the same row order)
SF_venues_sorted['Cluster Labels'] = kmeans.labels_
# Join the neighborhood centroids (with crime counts and housing prices) to the sorted venues
SF_merged = nbh_centroid.join(SF_venues_sorted.set_index('nbrhood'), on='nbrhood')
SF_merged.head()
nbrhood | incident_counts | house_avg_price | Latitude | Longitude | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue | Cluster Labels | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Alamo Square | 687 | 0.862016 | 37.776076 | -122.433919 | Bar | Record Shop | Boutique | Italian Restaurant | Wine Bar | Sushi Restaurant | Gift Shop | Mediterranean Restaurant | BBQ Joint | Ethiopian Restaurant | 1 |
1 | Anza Vista | 310 | 1.121534 | 37.780611 | -122.443255 | Café | Health & Beauty Service | Coffee Shop | Southern / Soul Food Restaurant | Mexican Restaurant | Grocery Store | Sandwich Place | Tunnel | Big Box Store | Arts & Crafts Store | 1 |
2 | Balboa Terrace | 49 | 0.737963 | 37.730649 | -122.468267 | Light Rail Station | Pharmacy | Circus | Bakery | Park | Playground | Dessert Shop | Vietnamese Restaurant | Italian Restaurant | American Restaurant | 3 |
3 | Bayview | 3185 | 0.439684 | 37.732391 | -122.387170 | Southern / Soul Food Restaurant | Park | Light Rail Station | Chinese Restaurant | Thrift / Vintage Store | Theater | African Restaurant | Plaza | Gym | Bakery | 3 |
4 | Bernal Heights | 1561 | 0.448200 | 37.740230 | -122.415885 | Coffee Shop | Park | Bakery | Italian Restaurant | Playground | Gourmet Shop | Yoga Studio | Seafood Restaurant | Sandwich Place | Gym | 1 |
Generate the San Francisco neighborhood clusters map.
# Create San Francisco base map
SF_cluster_map = folium.Map(location=SF_Coord, zoom_start=12)
# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**3.2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]
# add markers to map
markers_colors = []
for lat, lng, nbrhood, cluster in zip(
        SF_merged['Latitude'],
        SF_merged['Longitude'],
        SF_merged['nbrhood'],
        SF_merged['Cluster Labels']):
    label = ("Cluster : {}, Neighborhood: {}").format(cluster, nbrhood)
    label = folium.Popup(label, parse_html=True)
    # cluster labels run from 0 to kclusters-1, so they index the palette directly
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(SF_cluster_map)
#SF_cluster_map
display(SF_crime_map)
display(SF_housing_map)
display(SF_cluster_map)
The crime and housing price maps suggest that North Beach has a relatively low crime rate and a somewhat below-average housing price. This neighborhood is close to attractions such as Fisherman's Wharf, Lombard Street, and Coit Tower. Let's take a look at the clusters, starting with cluster 0.
def examine_clusters(cluster_id):
    # keep the neighborhood name, incident counts, venue columns, and cluster label
    return SF_merged.loc[SF_merged['Cluster Labels'] == cluster_id, SF_merged.columns[[0] + [1] + list(range(5, SF_merged.shape[1]))]]
pd.set_option('display.max_rows', 100)
examine_clusters(0)
nbrhood | incident_counts | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue | Cluster Labels | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
19 | Forest Hill | 89 | Japanese Restaurant | Hotpot Restaurant | Tennis Court | Park | Playground | French Restaurant | Cycle Studio | Coworking Space | Event Space | Exhibit | 0 |
24 | Golden Gate Park | 544 | Park | Track | Disc Golf | Bus Stop | Yoga Studio | Ethiopian Restaurant | Exhibit | Farmers Market | Fast Food Restaurant | Filipino Restaurant | 0 |
43 | Merced Heights | 33 | Garden | Racetrack | Park | Liquor Store | Light Rail Station | Yoga Studio | Flea Market | Event Space | Exhibit | Farmers Market | 0 |
44 | Merced Manor | 61 | Gym | Park | Art Gallery | Tennis Court | Japanese Restaurant | Bubble Tea Shop | Music Venue | Weight Loss Center | Food Truck | Pet Store | 0 |
47 | Miraloma Park | 191 | Park | Playground | Monument / Landmark | Tree | Bus Line | College Auditorium | Event Space | Exhibit | Farmers Market | Fast Food Restaurant | 0 |
64 | Pine Lake Park | 36 | Park | Gym | Music Venue | Sandwich Place | Food Truck | Dog Run | Asian Restaurant | Hawaiian Restaurant | Event Space | Exhibit | 0 |
72 | Silver Terrace | 678 | Park | Liquor Store | Soccer Field | Outdoor Gym | Yoga Studio | Electronics Store | Event Space | Exhibit | Farmers Market | Fast Food Restaurant | 0 |
89 | Bayview Heights | 253 | Breakfast Spot | Park | Burger Joint | Latin American Restaurant | Food | Exhibit | Farmers Market | Fast Food Restaurant | Filipino Restaurant | Flea Market | 0 |
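Note that the neighborhoods in cluster 0 shown above feature mostly parks, playgrounds, and other recreational venues rather than restaurants. North Beach carries cluster label 1, and neighborhoods with that label, such as Alamo Square, Anza Vista, and Bernal Heights in the merged table, list bars, coffee shops and cafés, and various restaurants among their most common venues.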
According to the maps, North Beach has a relatively low crime rate and housing price, and it is close to several attractions. Let's use the Foursquare API to explore the vicinity of this neighborhood.
nbh_name = 'North Beach'
nbh_index = nbrhoods.index[nbrhoods['nbrhood'] == nbh_name][0]
nbh_lat, nbh_lng = nbh_centroid.loc[nbh_index, ['Latitude', 'Longitude']]
nbrhoods.loc[nbh_index]
nbrhood            North Beach
nid                8d
sfar_distr         District 8 - Northeast
geometry           POLYGON ((-122.4172945622149 37.80506527491107...
incident_counts    732
house_avg_price    0.799696
Name: 53, dtype: object
SF_venues_sorted.loc[SF_venues_sorted['nbrhood'] == nbh_name]
nbrhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue | Cluster Labels | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
58 | North Beach | Italian Restaurant | Pizza Place | Chinese Restaurant | Bakery | Cocktail Bar | Coffee Shop | Café | Dive Bar | Deli / Bodega | Yoga Studio | 1 |
This table shows that North Beach already has many restaurants. This suggests that North Beach is indeed a viable neighborhood for opening a new restaurant, but it also means we will have many competitors. To identify them, we call the Foursquare API again, this time setting the section parameter to food so that only restaurants in this neighborhood are returned. Using the results, we'll map out our competitors.
# Set up the FourSquare API call
section = 'food'
url = 'https://api.foursquare.com/v2/venues/explore?client_id={0}&client_secret={1}&v={2}&ll={3},{4}&section={5}&radius={6}&limit={7}'.format(
CLIENT_ID,
CLIENT_SECRET,
VERSION,
nbh_lat,
nbh_lng,
section,
RADIUS,
LIMIT)
# Fetch the top 100 venues
results = requests.get(url).json()
venues = results['response']['groups'][0]['items']
nearby_venues = json_normalize(venues) # flatten JSON
# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]
# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)
# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]
nearby_venues.head()
name | categories | lat | lng | |
---|---|---|---|---|
0 | Tony’s Pizza Napoletana | Pizza Place | 37.800328 | -122.409040 |
1 | Park Tavern | New American Restaurant | 37.801097 | -122.409301 |
2 | Trattoria Contadina | Trattoria/Osteria | 37.800078 | -122.412422 |
3 | The Italian Homemade Company | Italian Restaurant | 37.801497 | -122.411795 |
4 | Mario's Bohemian Cigar Store Cafe | Café | 37.800391 | -122.409876 |
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))
62 venues were returned by Foursquare.
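Hand-formatting a long query string like the one in the API call above is easy to get wrong. As a minimal sketch, the same explore URL can be assembled from a dictionary with the standard library's `urlencode`. The helper name `build_explore_url` and all argument values in the usage line (credentials, version date, radius) are placeholders, not the notebook's actual settings.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(client_id, client_secret, version, lat, lng,
                      section, radius, limit):
    """Assemble the explore URL from named parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "section": section,
        "radius": radius,
        "limit": limit,
    }
    # safe="," keeps the comma in the ll value unescaped
    return EXPLORE_ENDPOINT + "?" + urlencode(params, safe=",")

# Placeholder values for illustration only
example_url = build_explore_url("ID", "SECRET", "20180605",
                                37.8, -122.41, "food", 500, 100)
print(example_url)
```

Passing the same dictionary as `params=` to `requests.get` would achieve the same result without building the string at all.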
# Create a map centered on the selected neighborhood
target_map = folium.Map(location=(nbh_lat, nbh_lng), zoom_start=17)
# add one marker per restaurant, labeled with its category
for lat, lng, categories in zip(
        nearby_venues['lat'],
        nearby_venues['lng'],
        nearby_venues['categories']):
    label = ("{}").format(categories)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=6,
        popup=label,
        color='green',
        fill=True,
        fill_color='green',
        fill_opacity=0.7).add_to(target_map)
target_map
The above map shows that the restaurants are located mostly in the blocks along Columbus Avenue. We can get a list of restaurant categories from nearby_venues.
venue_category_list = list(nearby_venues['categories'].unique())
venue_category_list
['Pizza Place', 'New American Restaurant', 'Trattoria/Osteria', 'Italian Restaurant', 'Café', 'Seafood Restaurant', 'Argentinian Restaurant', 'Bakery', 'Tuscan Restaurant', 'Latin American Restaurant', 'South American Restaurant', 'Breakfast Spot', 'Deli / Bodega', 'Sicilian Restaurant', 'Chinese Restaurant', 'Taco Place', 'Mexican Restaurant', 'Sushi Restaurant', 'Asian Restaurant', 'Sandwich Place', 'Diner', 'Burger Joint', 'Gastropub', 'French Restaurant', 'Persian Restaurant', 'Thai Restaurant', 'Indian Restaurant', 'Southern / Soul Food Restaurant']
By analyzing public datasets obtained from DataSF, we identify North Beach as a promising neighborhood for opening a new restaurant. The choice rests on aggregated crime and housing price data, two primary factors in determining the location. With the help of the Foursquare API, we further show that North Beach is a good candidate: there are already many restaurants in the area, mostly along Columbus Avenue. Using the Foursquare Explore API again, we also obtain a list of our competitors and pinpoint their locations. In the end, while we have identified a promising neighborhood, we also face rather strong competition. To differentiate our new restaurant from its competitors, we need further input from the data. Our analysis has room for improvement in several areas.
Visibility: Our results suggest that North Beach has sizable foot and car traffic along Columbus Avenue. To take advantage of this traffic, we could open the new restaurant on the avenue itself; the blocks just off the avenue are a good alternative. However, we need to analyze the data in more detail to reveal the actual traffic pattern.
Competitor analysis: In Section 4.2, we list the categories of restaurants in the North Beach neighborhood. This list provides a useful overview of our competitors. It also guides us in deciding which categories to focus on and which to avoid. An obvious improvement is to cluster the restaurant categories; the results would reveal the landscape of the restaurant business in this neighborhood.
Parking: This is an important factor that our analysis does not address. While we have public parking space data, San Francisco also has many private parking facilities for which we have no data. The next step is to obtain the distribution of these parking spaces. Alternatively, one could retrieve parking information in real time through the ParkWhiz API.
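As a first step toward the competitor analysis item above, restaurant categories can be grouped into broader families and counted before any formal clustering. The sketch below is one simple way to do this; the `CATEGORY_FAMILY` mapping and the toy input list are hypothetical choices made for illustration, and in the notebook the input would be `nearby_venues['categories']`.

```python
from collections import Counter

# Hypothetical mapping from Foursquare categories to broader families
CATEGORY_FAMILY = {
    "Pizza Place": "Italian",
    "Trattoria/Osteria": "Italian",
    "Italian Restaurant": "Italian",
    "Tuscan Restaurant": "Italian",
    "Sicilian Restaurant": "Italian",
    "Taco Place": "Latin American",
    "Mexican Restaurant": "Latin American",
    "Latin American Restaurant": "Latin American",
    "Sushi Restaurant": "Asian",
    "Chinese Restaurant": "Asian",
}

def count_families(categories):
    """Count venues per family; unmapped categories fall under 'Other'."""
    return Counter(CATEGORY_FAMILY.get(c, "Other") for c in categories)

# Toy input, not the actual Foursquare results
toy_categories = ["Pizza Place", "Italian Restaurant", "Café",
                  "Taco Place", "Pizza Place", "Sushi Restaurant"]
family_counts = count_families(toy_categories)
print(family_counts.most_common())
```

A family with a high count signals a crowded segment, while a family with few or no entries may point to an underserved one.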
In this capstone project, we address the business problem of finding a good location in San Francisco for opening a new restaurant. We identify the most important factors that could affect the choice. Using crime and housing price datasets, we locate a neighborhood that has a relatively low crime rate and housing cost. The Foursquare API results also appear to support our pick, although they show that we face considerable competition. We have discussed several possible improvements. An interesting follow-up project would be to automate the location recommendation process. In this project, we identify the neighborhood by visually inspecting the maps and results. To build an automatic recommender, we would have to design a quantitative measure that allows us to score each location. The final location is necessarily a compromise among all the factors that could affect the selection. So in addition to the clustering algorithm, we may need other machine learning models, such as regression, to estimate the score. We leave this for future projects.
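The quantitative measure mentioned above could start from something as simple as a weighted score over normalized factors. The sketch below is only an illustration of that idea: the incident counts and average housing prices are copied from the five rows of the `SF_merged.head()` output, while the equal weights and the function names are assumptions, not a tested scoring model.

```python
# (incident_counts, house_avg_price) taken from the SF_merged.head() output
neighborhoods = {
    "Alamo Square": (687, 0.862016),
    "Anza Vista": (310, 1.121534),
    "Balboa Terrace": (49, 0.737963),
    "Bayview": (3185, 0.439684),
    "Bernal Heights": (1561, 0.448200),
}

def min_max(values):
    """Rescale values to the range [0, 1]; constant input maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def score_neighborhoods(data, crime_weight=0.5, price_weight=0.5):
    """Rank neighborhoods; lower crime and lower price give a higher score."""
    names = list(data)
    crime = min_max([data[n][0] for n in names])
    price = min_max([data[n][1] for n in names])
    scores = {
        n: 1.0 - (crime_weight * c + price_weight * p)
        for n, c, p in zip(names, crime, price)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

ranked = score_neighborhoods(neighborhoods)
for name, score in ranked:
    print("{:<16} {:.3f}".format(name, score))
```

A real recommender would need more factors (visibility, parking, competition) and weights justified by data, which is where a regression model could come in.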