import pandas as pd
inspections = pd.read_csv("../data/inspections.csv", index_col=0)
inspections.head(3)
CAMIS | DBA | BORO | BUILDING | STREET | ZIPCODE | PHONE | CUISINE DESCRIPTION | INSPECTION DATE | ACTION | VIOLATION CODE | VIOLATION DESCRIPTION | CRITICAL FLAG | SCORE | GRADE | GRADE DATE | RECORD DATE | INSPECTION TYPE | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 41158108 | NICK'S GOURMET DELI | QUEENS | 7415 | DITMARS BOULEVARD | 11370 | 7182788338 | American | 07/21/2015 | Violations were cited in the following area(s). | 04L | Evidence of mice or live mice present in facil... | Critical | 11.0 | NaN | NaN | 09/27/2016 | Cycle Inspection / Initial Inspection |
1 | 41187577 | HANSOL NUTRITION CENTER | QUEENS | 16026 | NORTHERN BOULEVARD | 11358 | 7188880200 | Korean | 07/13/2016 | Violations were cited in the following area(s). | 06A | Personal cleanliness inadequate. Outer garment... | Critical | 30.0 | NaN | NaN | 09/27/2016 | Cycle Inspection / Initial Inspection |
2 | 41705988 | KURA | MANHATTAN | 130 | ST MARKS PLACE | 10009 | 2122281010 | Japanese | 05/08/2013 | Violations were cited in the following area(s). | 02B | Hot food item not held at or above 140º F. | Critical | 27.0 | NaN | NaN | 09/27/2016 | Pre-permit (Operational) / Initial Inspection |
inspections['INSPECTION DATE'] = pd.to_datetime(inspections['INSPECTION DATE'])
# random_camis = inspections.sample().iloc[0]['CAMIS']
# inspections[inspections['CAMIS'] == random_camis].sort_values(by='INSPECTION DATE')['INSPECTION DATE']
Get initial inspection date.
inspections_f = inspections.groupby('CAMIS')\
.apply(lambda df: inspections[inspections['CAMIS'] == df['CAMIS'].iloc[0]]\
.sort_values(by="INSPECTION DATE")\
.iloc[0]\
.drop('CAMIS')
)\
.rename(columns={'INSPECTION DATE': 'INITIAL INSPECTION DATE'})
inspections_f.sample(1)
DBA | BORO | BUILDING | STREET | ZIPCODE | PHONE | CUISINE DESCRIPTION | INITIAL INSPECTION DATE | ACTION | VIOLATION CODE | VIOLATION DESCRIPTION | CRITICAL FLAG | SCORE | GRADE | GRADE DATE | RECORD DATE | INSPECTION TYPE | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
CAMIS | |||||||||||||||||
50001937 | TASTY CHICKEN | BROOKLYN | 1687 | 86TH STREET | 11214 | 7182591111 | American | 08/21/2013 | Violations were cited in the following area(s). | 15L | Smoke free workplace smoking policy inadequate... | Not Critical | NaN | NaN | NaN | 09/27/2016 | Smoke-Free Air Act / Initial Inspection |
Not sure why, but this operation drops the DBA
field. None others are affected. Maybe something with the indexing?
Let's take the most recent inspection date.
inspections_ff = inspections_f.copy()
inspections_ff['LATEST INSPECTION DATE'] = inspections_ff\
.apply(lambda srs: inspections[inspections['CAMIS'] == srs.name]\
.sort_values(by='INSPECTION DATE')\
.iloc[-1]\
['INSPECTION DATE'],
axis='columns')
Rename the INSPECTION TYPE
to INITIAL INSPECTION TYPE
(this flags new establishments from pre-existing ones, in the cotext of our dataset).
inspections_ff = inspections_ff.rename(columns={'INSPECTION TYPE': 'INITIAL INSPECTION TYPE'})
Checking the flags:
inspections_ff['INITIAL INSPECTION TYPE'].value_counts()
Cycle Inspection / Initial Inspection 12537 Pre-permit (Operational) / Initial Inspection 7630 Pre-permit (Non-operational) / Initial Inspection 2256 Cycle Inspection / Re-inspection 877 Administrative Miscellaneous / Initial Inspection 683 Smoke-Free Air Act / Initial Inspection 281 Trans Fat / Initial Inspection 247 Pre-permit (Operational) / Re-inspection 82 Trans Fat / Compliance Inspection 80 Trans Fat / Re-inspection 75 Inter-Agency Task Force / Initial Inspection 53 Calorie Posting / Initial Inspection 30 Trans Fat / Second Compliance Inspection 27 Administrative Miscellaneous / Re-inspection 27 Smoke-Free Air Act / Re-inspection 23 Administrative Miscellaneous / Compliance Inspection 14 Cycle Inspection / Compliance Inspection 12 Pre-permit (Operational) / Compliance Inspection 12 Cycle Inspection / Reopening Inspection 8 Smoke-Free Air Act / Compliance Inspection 6 Calorie Posting / Re-inspection 5 Pre-permit (Operational) / Reopening Inspection 4 Pre-permit (Non-operational) / Re-inspection 3 Cycle Inspection / Second Compliance Inspection 2 Administrative Miscellaneous / Second Compliance Inspection 2 Pre-permit (Operational) / Second Compliance Inspection 1 Smoke-Free Air Act / Limited Inspection 1 Calorie Posting / Compliance Inspection 1 Smoke-Free Air Act / Second Compliance Inspection 1 Name: INITIAL INSPECTION TYPE, dtype: int64
A lot of them are null.
inspections_ff['INITIAL INSPECTION TYPE'].isnull().astype(int).sum()
1094
Remember, this is new establishments that have not been inspected yet. That the numbers match up here is encouraging.
inspections['INSPECTION TYPE'].isnull().astype(int).sum()
1094
We reattach the lost DBA column.
inspections_ff.head(1)
DBA | BORO | BUILDING | STREET | ZIPCODE | PHONE | CUISINE DESCRIPTION | INITIAL INSPECTION DATE | ACTION | VIOLATION CODE | VIOLATION DESCRIPTION | CRITICAL FLAG | SCORE | GRADE | GRADE DATE | RECORD DATE | INITIAL INSPECTION TYPE | LATEST INSPECTION DATE | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
CAMIS | ||||||||||||||||||
30075445 | NaN | BRONX | 1007.0 | MORRIS PARK AVE | 10462 | 7.188925e+09 | Bakery | 2013-06-01 | Violations were cited in the following area(s). | 16B | The original nutritional fact labels and/or in... | Not Critical | NaN | NaN | NaN | 09/24/2016 | Trans Fat / Compliance Inspection | 2016-02-18 |
inspections_fff = inspections_ff.copy()
inspections_fff['DBA'] = inspections_fff.apply(lambda srs:
inspections[inspections['CAMIS'] == srs.name]\
.iloc[0]['DBA'],
axis='columns')
Prepend descriptors, to more easily distinguish things down the road.
inspections_fff.columns = ["DOHMH " + col for col in inspections_fff.columns]
inspections_fff.sample()
DOHMH DBA | DOHMH BORO | DOHMH BUILDING | DOHMH STREET | DOHMH ZIPCODE | DOHMH PHONE | DOHMH CUISINE DESCRIPTION | DOHMH INITIAL INSPECTION DATE | DOHMH ACTION | DOHMH VIOLATION CODE | DOHMH VIOLATION DESCRIPTION | DOHMH CRITICAL FLAG | DOHMH SCORE | DOHMH GRADE | DOHMH GRADE DATE | DOHMH RECORD DATE | DOHMH INITIAL INSPECTION TYPE | DOHMH LATEST INSPECTION DATE | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
CAMIS | ||||||||||||||||||
40374834 | CASA BELLA | MANHATTAN | 127.0 | MULBERRY STREET | 10013 | 2.124314e+09 | Italian | 2013-08-15 | Violations were cited in the following area(s). | 06D | Food contact surface not properly washed, rins... | Critical | 6.0 | NaN | NaN | 09/24/2016 | Cycle Inspection / Initial Inspection | 2016-04-26 |
Now we get information from Yelp!
from yelp.client import Client
from yelp.oauth1_authenticator import Oauth1Authenticator
from yelp.errors import BusinessUnavailable
import os
import json
def import_credentials(filename='../data/yelp_credentials.json'):
try:
data = json.load(open(filename))
return data
except:
raise IOError('This API requires Yelp credentials to work. Did you forget to define them?')
credentials = import_credentials()
auth = Oauth1Authenticator(
consumer_key=credentials['consumer_key'],
consumer_secret=credentials['consumer_secret'],
token=credentials['token'],
token_secret=credentials['token_secret']
)
client = Client(auth)
from tqdm import tqdm
import httplib2.HttpLib2Error
--------------------------------------------------------------------------- ImportError Traceback (most recent call last) <ipython-input-77-ee8a03be3a31> in <module>() ----> 1 from httplib2.error import HTTPError ImportError: No module named 'httplib2.error'
from urllib.error import HTTPError
import yelp
def yelp_phone_fetch(num):
"""
Performs the phone search described in notebook 02 to fetch information on the entity associated with a number.
"""
if not num:
return None
else:
try:
business = client.phone_search(num).businesses[0]
if business and business.location and business.location.coordinate:
return {
'Yelp ID': business.id,
'Yelp Is Claimed': business.is_claimed,
'Yelp Is Closed': business.is_closed,
'Yelp Name': business.name,
'Yelp URL': business.url,
'Yelp Review Count': business.review_count,
'Yelp Categories': business.categories,
'Yelp Rating': business.rating,
'Yelp Address': business.location.address,
'Yelp Neighborhoods': business.location.neighborhoods,
'Yelp Latitude': business.location.coordinate.latitude,
'Yelp Longitude': business.location.coordinate.longitude,
}
else: # Partial information, skip.
return None
except IndexError: # Phone search failed!
return None
except yelp.errors.InvalidParameter: # Invalid number!
return None
After some testing there appears to be a very significant miss rate. Remember, we're fuzzy matching phone numbers from one data set with phone numbers in another dataset. It's far from 100% that we'll get something back out. I'm interested in what percentage of the time we're successful.
def random_number_dba_tuple():
random_number, random_dba = inspections_fff.sample().iloc[0][['DOHMH PHONE', 'DOHMH DBA']]
random_number = str(int(random_number))
return random_number, random_dba
random_number_dba_tuple()
(2125878880, 'MULTI TASTES DINER')
yelp_phone_fetch(random_number_dba_tuple()[0])
{'Yelp Address': ['41 E 11th St'], 'Yelp Categories': [Category(name='Japanese', alias='japanese')], 'Yelp ID': 'ootoya-greenwich-village-new-york', 'Yelp Is Claimed': True, 'Yelp Is Closed': False, 'Yelp Latitude': 40.7333107, 'Yelp Longitude': -73.9929962, 'Yelp Name': 'Ootoya Greenwich Village', 'Yelp Neighborhoods': ['Greenwich Village'], 'Yelp Rating': 4.0, 'Yelp Review Count': 135, 'Yelp URL': 'https://www.yelp.com/biz/ootoya-greenwich-village-new-york?adjust_creative=dkJPGu_jtTyHwsEgZIZN6g&utm_campaign=yelp_api&utm_medium=api_v2_phone_search&utm_source=dkJPGu_jtTyHwsEgZIZN6g'}
Hey I've been there! Nice. More rigorously:
one_hundred_randoms = [random_number_dba_tuple() for i in range(100)]
testset = [yelp_phone_fetch(num) for num, placename in tqdm(one_hundred_randoms)]
0%| | 0/100 [00:00<?, ?it/s] 1%|▍ | 1/100 [00:00<00:25, 3.85it/s] 2%|▊ | 2/100 [00:00<00:28, 3.45it/s] 3%|█▎ | 3/100 [00:00<00:25, 3.74it/s] 4%|█▋ | 4/100 [00:02<00:52, 1.84it/s] 5%|██ | 5/100 [00:02<00:42, 2.25it/s] 6%|██▌ | 6/100 [00:02<00:35, 2.62it/s] 7%|██▉ | 7/100 [00:02<00:31, 2.99it/s] 8%|███▎ | 8/100 [00:02<00:29, 3.10it/s] 9%|███▊ | 9/100 [00:03<00:32, 2.78it/s] 10%|████ | 10/100 [00:03<00:30, 2.94it/s] 11%|████▌ | 11/100 [00:03<00:26, 3.33it/s] 12%|████▉ | 12/100 [00:04<00:24, 3.54it/s] 13%|█████▎ | 13/100 [00:04<00:23, 3.64it/s] 14%|█████▋ | 14/100 [00:04<00:29, 2.95it/s] 15%|██████▏ | 15/100 [00:05<00:33, 2.58it/s] 16%|██████▌ | 16/100 [00:05<00:29, 2.88it/s] 17%|██████▉ | 17/100 [00:05<00:25, 3.22it/s] 18%|███████▍ | 18/100 [00:06<00:23, 3.47it/s] 19%|███████▊ | 19/100 [00:06<00:20, 3.86it/s] 20%|████████▏ | 20/100 [00:06<00:20, 3.96it/s] 21%|████████▌ | 21/100 [00:06<00:18, 4.35it/s] 22%|█████████ | 22/100 [00:07<00:18, 4.14it/s] 23%|█████████▍ | 23/100 [00:07<00:23, 3.23it/s] 24%|█████████▊ | 24/100 [00:07<00:27, 2.72it/s] 25%|██████████▎ | 25/100 [00:08<00:24, 3.10it/s] 26%|██████████▋ | 26/100 [00:08<00:20, 3.55it/s] 27%|███████████ | 27/100 [00:08<00:19, 3.65it/s] 28%|███████████▍ | 28/100 [00:09<00:24, 2.89it/s] 29%|███████████▉ | 29/100 [00:09<00:21, 3.36it/s] 30%|████████████▎ | 30/100 [00:09<00:24, 2.89it/s] 31%|████████████▋ | 31/100 [00:10<00:20, 3.31it/s] 32%|█████████████ | 32/100 [00:10<00:18, 3.63it/s] 33%|█████████████▌ | 33/100 [00:10<00:17, 3.74it/s] 34%|█████████████▉ | 34/100 [00:10<00:16, 3.98it/s] 35%|██████████████▎ | 35/100 [00:10<00:14, 4.38it/s] 36%|██████████████▊ | 36/100 [00:11<00:14, 4.30it/s] 37%|███████████████▏ | 37/100 [00:11<00:13, 4.82it/s] 38%|███████████████▌ | 38/100 [00:11<00:13, 4.54it/s] 39%|███████████████▉ | 39/100 [00:11<00:12, 4.75it/s] 40%|████████████████▍ | 40/100 [00:11<00:12, 4.87it/s] 41%|████████████████▊ | 41/100 [00:12<00:12, 4.81it/s] 42%|█████████████████▏ | 42/100 [00:12<00:11, 5.05it/s] 43%|█████████████████▋ | 43/100 [00:12<00:10, 5.27it/s] 44%|██████████████████ | 44/100 [00:12<00:11, 4.93it/s] 45%|██████████████████▍ | 45/100 [00:12<00:11, 4.97it/s] 46%|██████████████████▊ | 46/100 [00:13<00:10, 4.99it/s] 47%|███████████████████▎ | 47/100 [00:13<00:11, 4.72it/s] 48%|███████████████████▋ | 48/100 [00:13<00:14, 3.51it/s] 49%|████████████████████ | 49/100 [00:14<00:17, 2.83it/s] 50%|████████████████████▌ | 50/100 [00:14<00:15, 3.23it/s] 51%|████████████████████▉ | 51/100 [00:14<00:17, 2.85it/s] 52%|█████████████████████▎ | 52/100 [00:15<00:15, 3.19it/s] 53%|█████████████████████▋ | 53/100 [00:15<00:16, 2.81it/s] 54%|██████████████████████▏ | 54/100 [00:16<00:17, 2.56it/s] 55%|██████████████████████▌ | 55/100 [00:16<00:15, 2.86it/s] 56%|██████████████████████▉ | 56/100 [00:16<00:13, 3.21it/s] 57%|███████████████████████▎ | 57/100 [00:16<00:11, 3.63it/s] 58%|███████████████████████▊ | 58/100 [00:17<00:12, 3.31it/s] 59%|████████████████████████▏ | 59/100 [00:17<00:11, 3.59it/s] 60%|████████████████████████▌ | 60/100 [00:17<00:10, 3.90it/s] 61%|█████████████████████████ | 61/100 [00:17<00:09, 4.20it/s] 62%|█████████████████████████▍ | 62/100 [00:17<00:08, 4.36it/s] 63%|█████████████████████████▊ | 63/100 [00:18<00:08, 4.42it/s] 64%|██████████████████████████▏ | 64/100 [00:18<00:07, 4.93it/s] 65%|██████████████████████████▋ | 65/100 [00:18<00:07, 4.83it/s] 66%|███████████████████████████ | 66/100 [00:18<00:07, 4.52it/s] 67%|███████████████████████████▍ | 67/100 [00:18<00:06, 4.74it/s] 68%|███████████████████████████▉ | 68/100 [00:19<00:06, 4.88it/s] 69%|████████████████████████████▎ | 69/100 [00:19<00:07, 4.41it/s] 70%|████████████████████████████▋ | 70/100 [00:20<00:09, 3.05it/s] 71%|█████████████████████████████ | 71/100 [00:20<00:08, 3.55it/s] 72%|█████████████████████████████▌ | 72/100 [00:20<00:07, 3.94it/s] 73%|█████████████████████████████▉ | 73/100 [00:20<00:06, 4.22it/s] 74%|██████████████████████████████▎ | 74/100 [00:20<00:05, 4.35it/s] 75%|██████████████████████████████▊ | 75/100 [00:21<00:07, 3.35it/s] 76%|███████████████████████████████▏ | 76/100 [00:21<00:06, 3.96it/s] 77%|███████████████████████████████▌ | 77/100 [00:21<00:05, 3.98it/s] 78%|███████████████████████████████▉ | 78/100 [00:21<00:05, 4.01it/s] 79%|████████████████████████████████▍ | 79/100 [00:22<00:04, 4.24it/s] 80%|████████████████████████████████▊ | 80/100 [00:22<00:04, 4.84it/s] 81%|█████████████████████████████████▏ | 81/100 [00:22<00:04, 4.73it/s] 82%|█████████████████████████████████▌ | 82/100 [00:22<00:03, 5.09it/s] 83%|██████████████████████████████████ | 83/100 [00:22<00:03, 4.94it/s] 84%|██████████████████████████████████▍ | 84/100 [00:22<00:03, 5.22it/s] 85%|██████████████████████████████████▊ | 85/100 [00:23<00:03, 4.78it/s] 86%|███████████████████████████████████▎ | 86/100 [00:24<00:07, 1.96it/s] 87%|███████████████████████████████████▋ | 87/100 [00:24<00:05, 2.35it/s] 88%|████████████████████████████████████ | 88/100 [00:24<00:04, 2.80it/s] 89%|████████████████████████████████████▍ | 89/100 [00:25<00:03, 3.16it/s] 90%|████████████████████████████████████▉ | 90/100 [00:25<00:02, 3.43it/s] 91%|█████████████████████████████████████▎ | 91/100 [00:25<00:02, 3.80it/s] 92%|█████████████████████████████████████▋ | 92/100 [00:25<00:01, 4.44it/s] 93%|██████████████████████████████████████▏ | 93/100 [00:25<00:01, 4.77it/s] 94%|██████████████████████████████████████▌ | 94/100 [00:26<00:01, 3.44it/s] 95%|██████████████████████████████████████▉ | 95/100 [00:26<00:01, 3.80it/s] 96%|███████████████████████████████████████▎ | 96/100 [00:26<00:00, 4.14it/s] 97%|███████████████████████████████████████▊ | 97/100 [00:26<00:00, 4.16it/s] 98%|████████████████████████████████████████▏| 98/100 [00:27<00:00, 4.70it/s] 99%|████████████████████████████████████████▌| 99/100 [00:27<00:00, 4.46it/s] 100%|████████████████████████████████████████| 100/100 [00:27<00:00, 4.71it/s]
np.array([entity == None for entity in testset]).astype(int).sum()
19
19 misses in 100 randoms. So we're able to match 80% of the time. Not bad as far as fuzzy matches go.
If our data is missing at random (MAR), then we are happy, because this makes for a statistically valid sample of all restaurants in New York City—we can simply drop the other values.
But there's a high likelihood, in my professional opinion, that the data is missing not at a random (MNAR). I expect that it's more likely for a restaurant or eatery in a poor neighborhood to be missing the phone number information in Yelp! than one in a richer, whiter community. This is a severe under-reporting issue that will invalidate any conclusions we try to make using this data with regarding to "gentrification" and whatnot.
We'll need to validate the data geospatially. We'll do that next.
Run the full Yelp! API data through, one slice at a time.
len(inspections_fff)
26074
inspections_fff.to_csv("../data/inspections_flattened_initial.csv", encoding="utf-8")
Oy vey. This will require two days of processing, since the API limit is 25000/day.
from tqdm import tqdm_notebook
del inspections
del inspections_f
del inspections_ff
inspections_fff['DOHMH PHONE'].iloc[2702:2705]
CAMIS 40788706 7187231080 40788884 __________ 40788886 6462307208 Name: DOHMH PHONE, dtype: object
Uh, ok.
raw_yelp_5000 = [yelp_phone_fetch(int(num)) if pd.notnull(num) and str.isdigit(num) else None for num in tqdm_notebook(inspections_fff['DOHMH PHONE'][:5000])]
raw_yelp_5000_2 = [yelp_phone_fetch(int(num)) if pd.notnull(num) and str.isdigit(num) else None for num in tqdm_notebook(inspections_fff['DOHMH PHONE'][5000:10000])]
raw_yelp_5000_3 = [yelp_phone_fetch(int(num)) if pd.notnull(num) and str.isdigit(num) else None for num in tqdm_notebook(inspections_fff['DOHMH PHONE'][10000:15000])]
import pickle
with open("../data/raw_yelp_list.pkl", "wb") as f:
pickle.dump(raw_yelp_5000+ raw_yelp_5000_2 + raw_yelp_5000_3, f)
import pickle
with open("../data/raw_yelp_list.pkl", "rb") as f:
raw_yelp_1_to_3 = pickle.load(f)
len(raw_yelp_1_to_3)
15000
raw_yelp_5000_4 = [yelp_phone_fetch(int(num)) if pd.notnull(num) and str.isdigit(str(int(num))) else None for num in tqdm_notebook(inspections_fff['DOHMH PHONE'][15000:20000])]
inspections_fff['DOHMH PHONE'].iloc[20000 + 3270]
1646644665.0
This appears to be an invalid phone number. Parsing it through Yelp raises a reliable error, which I've now coded into the loop.
yelp_phone_fetch(int(inspections_fff['DOHMH PHONE'].iloc[20000 + 3270]))
There seem to be multiple such numbers in there.
yelp_phone_fetch(int(inspections_fff['DOHMH PHONE'].iloc[20000 + 4065]))
raw_yelp_5000_5 = [yelp_phone_fetch(int(num)) if pd.notnull(num) and str.isdigit(str(int(num))) else None for num in tqdm_notebook(inspections_fff['DOHMH PHONE'][20000:25000])]
raw_yelp_5000_6 = [yelp_phone_fetch(int(num)) if pd.notnull(num) and str.isdigit(str(int(num))) else None for num in tqdm_notebook(inspections_fff['DOHMH PHONE'][25000:])]
all_raws = raw_yelp_1_to_3 + raw_yelp_5000_4 + raw_yelp_5000_5 + raw_yelp_5000_6
len(all_raws)
26064
With the data thus acquired, we now assign it to our dataframe. Before we do that, first we need to clean up what we have.
inspections_fff.sample(1)
DOHMH DBA | DOHMH BORO | DOHMH BUILDING | DOHMH STREET | DOHMH ZIPCODE | DOHMH PHONE | DOHMH CUISINE DESCRIPTION | DOHMH INITIAL INSPECTION DATE | DOHMH ACTION | DOHMH VIOLATION CODE | DOHMH VIOLATION DESCRIPTION | DOHMH CRITICAL FLAG | DOHMH SCORE | DOHMH GRADE | DOHMH GRADE DATE | DOHMH RECORD DATE | DOHMH INITIAL INSPECTION TYPE | DOHMH LATEST INSPECTION DATE | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
CAMIS | ||||||||||||||||||
41405535 | TWO BOOTS | MANHATTAN | 625.0 | 9 AVENUE | 10036 | 2.129563e+09 | Pizza | 2014-06-24 | Violations were cited in the following area(s). | 04L | Evidence of mice or live mice present in facil... | Critical | 15.0 | NaN | NaN | 09/27/2016 | Cycle Inspection / Initial Inspection | 2016-09-13 |
all_raws[0] = {'Yelp Address': ['1007 Morris Park Avenue'],
'Yelp Categories': ["Category(name='Bakeries', alias='bakeries')",
"Category(name='Desserts', alias='desserts')"],
'Yelp ID': 'morris-pk-bake-shop-bronx',
'Yelp Is Claimed': False,
'Yelp Is Closed': False,
'Yelp Latitude': 40.848445892334,
'Yelp Longitude': -73.8560791015625,
'Yelp Name': 'Morris Pk Bake Shop',
'Yelp Neighborhoods': ['Morris Park'],
'Yelp Rating': 4.5,
'Yelp Review Count': 27,
'Yelp URL': 'https://www.yelp.com/biz/morris-pk-bake-shop-bronx?adjust_creative=dkJPGu_jtTyHwsEgZIZN6g&utm_campaign=yelp_api&utm_medium=api_v2_phone_search&utm_source=dkJPGu_jtTyHwsEgZIZN6g'}
all_raws[0]
{'Yelp Address': ['1007 Morris Park Avenue'], 'Yelp Categories': ["Category(name='Bakeries', alias='bakeries')", "Category(name='Desserts', alias='desserts')"], 'Yelp ID': 'morris-pk-bake-shop-bronx', 'Yelp Is Claimed': False, 'Yelp Is Closed': False, 'Yelp Latitude': 40.848445892334, 'Yelp Longitude': -73.8560791015625, 'Yelp Name': 'Morris Pk Bake Shop', 'Yelp Neighborhoods': ['Morris Park'], 'Yelp Rating': 4.5, 'Yelp Review Count': 27, 'Yelp URL': 'https://www.yelp.com/biz/morris-pk-bake-shop-bronx?adjust_creative=dkJPGu_jtTyHwsEgZIZN6g&utm_campaign=yelp_api&utm_medium=api_v2_phone_search&utm_source=dkJPGu_jtTyHwsEgZIZN6g'}
import re
import copy
re_cat_word = re.compile("name='[-\w\s&()/,']+(?=')")
def safe_group(match_obj):
try:
return match_obj.group().replace("name='", "")
except AttributeError:
return None
def format_data(yelp_dict):
if yelp_dict:
ret = copy.deepcopy(yelp_dict)
if ret['Yelp Address']:
ret['Yelp Address'] = ret['Yelp Address'][0]
cats = ret['Yelp Categories']
# print(cats)
if cats:
try:
parsed_cats = [re.search(re_cat_word, str(cat)).group().replace("name='", "") for cat in cats]
except:
print(cats)
parsed_cats = []
# print(parsed_cats)
ret['Yelp Categories'] = "|".join(parsed_cats)
neighborhoods = ret['Yelp Neighborhoods']
if neighborhoods:
ret['Yelp Neighborhoods'] = "|".join(ret['Yelp Neighborhoods'])
del ret['Yelp URL']
return ret
else:
return None
formatted_yelp_data = []
for raw in tqdm_notebook(all_raws):
formatted_yelp_data.append(format_data(raw))
[Category(name="Women's Clothing", alias='womenscloth')] [Category(name='Department Stores', alias='deptstores'), Category(name="Men's Clothing", alias='menscloth'), Category(name="Women's Clothing", alias='womenscloth')] [Category(name="Men's Clothing", alias='menscloth'), Category(name="Women's Clothing", alias='womenscloth')] [Category(name="Men's Clothing", alias='menscloth'), Category(name='Used, Vintage & Consignment', alias='vintage')] [Category(name='Jewelry', alias='jewelry'), Category(name="Women's Clothing", alias='womenscloth')] [Category(name='Motorcycle Gear', alias='motorcyclinggear'), Category(name="Men's Clothing", alias='menscloth'), Category(name='Coffee & Tea', alias='coffee')] [Category(name="Children's Clothing", alias='childcloth'), Category(name='Ice Cream & Frozen Yogurt', alias='icecream'), Category(name='Desserts', alias='desserts')] [Category(name='Shoe Stores', alias='shoes'), Category(name="Men's Clothing", alias='menscloth'), Category(name="Women's Clothing", alias='womenscloth')] [Category(name='Toy Stores', alias='toys'), Category(name="Men's Clothing", alias='menscloth'), Category(name="Women's Clothing", alias='womenscloth')]
formatted_yelp_data[0]
{'Yelp Address': '1007 Morris Park Avenue', 'Yelp Categories': 'Bakeries|Desserts', 'Yelp ID': 'morris-pk-bake-shop-bronx', 'Yelp Is Claimed': False, 'Yelp Is Closed': False, 'Yelp Latitude': 40.848445892334, 'Yelp Longitude': -73.8560791015625, 'Yelp Name': 'Morris Pk Bake Shop', 'Yelp Neighborhoods': 'Morris Park', 'Yelp Rating': 4.5, 'Yelp Review Count': 27}
Now assign.
inspections_ffff = inspections_fff.copy()
for key in formatted_yelp_data[0].keys():
inspections_ffff[key] = [s[key] if s else None for s in formatted_yelp_data]
inspections_ffff.columns = [col.upper() for col in inspections_ffff.columns]
inspections_ffff
DOHMH DBA | DOHMH BORO | DOHMH BUILDING | DOHMH STREET | DOHMH ZIPCODE | DOHMH PHONE | DOHMH CUISINE DESCRIPTION | DOHMH INITIAL INSPECTION DATE | DOHMH ACTION | DOHMH VIOLATION CODE | ... | YELP LATITUDE | YELP ADDRESS | YELP ID | YELP RATING | YELP REVIEW COUNT | YELP IS CLAIMED | YELP LONGITUDE | YELP NEIGHBORHOODS | YELP CATEGORIES | YELP NAME | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
CAMIS | |||||||||||||||||||||
30075445 | MORRIS PARK BAKE SHOP | BRONX | 1007.0 | MORRIS PARK AVE | 10462 | 7.188925e+09 | Bakery | 2013-06-01 | Violations were cited in the following area(s). | 16B | ... | 40.848446 | 1007 Morris Park Avenue | morris-pk-bake-shop-bronx | 4.5 | 27.0 | False | -73.856079 | Morris Park | Bakeries|Desserts | Morris Pk Bake Shop |
30112340 | WENDY'S | BROOKLYN | 469.0 | FLATBUSH AVENUE | 11225 | 7.182875e+09 | Hamburgers | 2014-06-05 | Violations were cited in the following area(s). | 10B | ... | 40.662952 | 469 Flatbush Ave | wendys-brooklyn-4 | 2.0 | 23.0 | False | -73.961753 | Prospect Heights|Prospect Lefferts Gardens | Fast Food|Burgers | Wendy's |
30191841 | DJ REYNOLDS PUB AND RESTAURANT | MANHATTAN | 351.0 | WEST 57 STREET | 10019 | 2.122453e+09 | Irish | 2013-07-22 | Violations were cited in the following area(s). | 10B | ... | 40.767750 | 351 W 57th St | dj-reynolds-new-york-3 | 3.0 | 75.0 | False | -73.984870 | Midtown West|Hell's Kitchen | Irish|Pubs | DJ Reynolds |
40356018 | RIVIERA CATERER | BROOKLYN | 2780.0 | STILLWELL AVENUE | 11224 | 7.183723e+09 | American | 2013-06-05 | Violations were cited in the following area(s). | 10F | ... | 40.579521 | 2780 Stillwell Ave | riviera-caterers-brooklyn | 4.0 | 23.0 | True | -73.982430 | Coney Island | Caterers | Riviera Caterers |
40356151 | BRUNOS ON THE BOULEVARD | QUEENS | 8825.0 | ASTORIA BOULEVARD | 11369 | 7.183351e+09 | American | 2014-04-11 | Violations were cited in the following area(s). | 04J | ... | 40.764240 | 8825 Astoria Blvd | events-by-brunos-jackson-heights | 4.0 | 15.0 | True | -73.880410 | East Elmhurst | Caterers|Venues & Event Spaces | Events By Bruno's |
40356483 | WILKEN'S FINE FOOD | BROOKLYN | 7114.0 | AVENUE U | 11234 | 7.184444e+09 | Delicatessen | 2013-07-09 | Violations were cited in the following area(s). | 02H | ... | 40.619900 | 7114 Ave U | wilkens-ii-deli-brooklyn | 3.5 | 25.0 | False | -73.906853 | Bergen Beach | Delis | Wilkens II Deli |
40356731 | TASTE THE TROPICS ICE CREAM | BROOKLYN | 1839.0 | NOSTRAND AVENUE | 11226 | 7.188561e+09 | Ice Cream, Gelato, Yogurt, Ices | 2013-07-10 | Violations were cited in the following area(s). | 10B | ... | 40.640820 | 1839 Nostrand Ave | taste-the-tropics-brooklyn | 4.5 | 16.0 | False | -73.948151 | Flatbush | Ice Cream & Frozen Yogurt | Taste the Tropics |
40357217 | WILD ASIA | BRONX | 2300.0 | SOUTHERN BOULEVARD | 10460 | 7.182208e+09 | American | 2013-06-19 | Violations were cited in the following area(s). | 10B | ... | NaN | None | None | NaN | NaN | None | NaN | None | None | None |
40357437 | C & C CATERING SERVICE | BROOKLYN | 7715.0 | 18 AVENUE | 11214 | 7.182323e+09 | American | 2014-04-16 | Violations were cited in the following area(s). | 06D | ... | 40.611713 | 7715 18th Ave | c-and-c-catering-service-brooklyn | 3.0 | 2.0 | True | -73.997261 | Bensonhurst | Caterers | C & C Catering Service |
40359480 | 1 EAST 66TH STREET KITCHEN | MANHATTAN | 1.0 | EAST 66 STREET | 10065 | 2.128794e+09 | American | 2014-05-07 | Violations were cited in the following area(s). | 10B | ... | 40.768684 | 1 E 66th St | wyeth-james-new-york | 0.0 | 0.0 | False | -73.969337 | Upper East Side | None | Wyeth James |
40359705 | NATHAN'S FAMOUS | BROOKLYN | 1310.0 | SURF AVENUE | 11224 | 7.183332e+09 | Hotdogs | 2013-12-03 | Violations were cited in the following area(s). | 10F | ... | NaN | None | None | NaN | NaN | None | NaN | None | None | None |
40360045 | SEUDA FOODS | BROOKLYN | 705.0 | KINGS HIGHWAY | 11223 | 7.183752e+09 | Jewish/Kosher | 2013-10-10 | Violations were cited in the following area(s). | 08A | ... | 40.606472 | 705 Kings Hwy | seuda-foods-brooklyn | 4.5 | 3.0 | False | -73.965415 | Midwood|Gravesend | Caterers | Seuda Foods |
40360076 | CARVEL ICE CREAM | BROOKLYN | 203.0 | CHURCH AVENUE | 11218 | 7.184390e+09 | Ice Cream, Gelato, Yogurt, Ices | 2014-02-10 | Violations were cited in the following area(s). | 10F | ... | 40.643380 | 203 Church Ave | carvel-brooklyn-3 | 3.5 | 18.0 | False | -73.978199 | Kensington|Flatbush | Ice Cream & Frozen Yogurt | Carvel |
40361322 | CARVEL ICE CREAM | QUEENS | 26515.0 | HILLSIDE AVENUE | 11004 | 7.183430e+09 | Ice Cream, Gelato, Yogurt, Ices | 2013-09-18 | Violations were cited in the following area(s). | 08A | ... | 40.743605 | 26515 Hillside Ave | carvel-glen-oaks-2 | 5.0 | 4.0 | False | -73.660313 | None | Ice Cream & Frozen Yogurt | Carvel |
40361606 | THE MOVABLE FEAST | BROOKLYN | 284.0 | PROSPECT PARK WEST | 11215 | 7.189653e+09 | American | 2013-11-14 | Violations were cited in the following area(s). | 10F | ... | 40.658035 | 284 Prospect Park W | the-movable-feast-brooklyn | 4.5 | 16.0 | True | -73.982879 | South Slope|Windsor Terrace | Caterers | The Movable Feast |
40361618 | SAL'S DELI | QUEENS | 12908.0 | 20 AVENUE | 11356 | 7.186619e+09 | Delicatessen | 2014-08-16 | Violations were cited in the following area(s). | 08C | ... | 40.781471 | 12908 20th Ave | sals-deli-college-point | 4.5 | 4.0 | False | -73.839287 | College Point | Delis|Sandwiches | Sal's Deli |
40361708 | BULLY'S DELI | MANHATTAN | 759.0 | BROADWAY | 10003 | 2.122550e+09 | Delicatessen | 2014-01-21 | Violations were cited in the following area(s). | 10J | ... | 40.730826 | 759 Broadway | bullys-deli-new-york-13 | 3.0 | 110.0 | True | -73.992404 | Greenwich Village | Delis | Bully's Deli |
40361998 | STEVE CHU'S DELI & GROCERY | QUEENS | 3406.0 | 10 STREET | 11106 | 7.187269e+09 | Delicatessen | 2013-03-13 | Violations were cited in the following area(s). | 10F | ... | 40.764748 | 3406 10th St | steve-chus-deli-and-grocery-astoria | 4.0 | 4.0 | False | -73.938873 | Astoria | Delis | Steve Chu's Deli & Grocery |
40362098 | HARRIET'S KITCHEN | MANHATTAN | 502.0 | AMSTERDAM AVENUE | 10024 | 2.127210e+09 | American | 2014-02-03 | Violations were cited in the following area(s). | 04M | ... | 40.786678 | 502 Amsterdam Ave | harriets-kitchen-new-york | 3.0 | 106.0 | True | -73.976074 | Upper West Side | American (Traditional)|Burgers | Harriet's Kitchen |
40362264 | P & S DELI GROCERY | MANHATTAN | 730.0 | COLUMBUS AVENUE | 10025 | 2.129323e+09 | American | 2013-04-08 | Violations were cited in the following area(s). | 02G | ... | 40.792419 | 724 Columbus Ave | ps-deli-and-grocery-manhattan | 4.0 | 2.0 | False | -73.967912 | Upper West Side | Grocery | PS Deli and Grocery |
40362274 | ANGELIKA FILM CENTER | MANHATTAN | 18.0 | WEST HOUSTON STREET | 10012 | 2.129953e+09 | American | 2013-04-05 | Violations were cited in the following area(s). | 10F | ... | 40.725891 | 18 W Houston St | angelika-film-center-and-cafe-new-york-new-york-2 | 3.5 | 396.0 | True | -73.996986 | Greenwich Village | Cinema|Cafes | Angelika Film Center & Cafe - New York |
40362432 | HO MEI RESTAURANT | QUEENS | 10305.0 | 37 AVENUE | 11368 | 7.187797e+09 | Chinese | 2013-05-02 | Violations were cited in the following area(s). | 06D | ... | 40.753552 | 10305 37th Avenue | ho-mei-corona | 5.0 | 3.0 | False | -73.864227 | North Corona | Restaurants | Ho Mei |
40362715 | THE COUNTRY CAFE | MANHATTAN | 60.0 | WALL STREET | 10005 | 3.474279e+09 | Sandwiches/Salads/Mixed Buffet | 2013-09-18 | Violations were cited in the following area(s). | 06B | ... | 40.706241 | 60 Wall St | the-country-cafe-new-york-2 | 2.5 | 16.0 | False | -74.008506 | Financial District | Sandwiches|Cafes|Breakfast & Brunch | The Country Cafe |
40362869 | SHASHEMENE INT'L RESTAURA | BROOKLYN | 195.0 | EAST 56 STREET | 11203 | 3.474301e+09 | Caribbean | 2013-05-08 | Violations were cited in the following area(s). | 10B | ... | NaN | None | None | NaN | NaN | None | NaN | None | None | None |
40363093 | CARVEL ICE CREAM | BRONX | 1006.0 | EAST 233 STREET | 10466 | 7.186527e+09 | Ice Cream, Gelato, Yogurt, Ices | 2013-08-02 | Violations were cited in the following area(s). | 06D | ... | 40.890331 | 1006 E 233rd St | carvel-bronx-3 | 2.0 | 5.0 | False | -73.848595 | Edenwald | Ice Cream & Frozen Yogurt | Carvel |
40363098 | DUNKIN' DONUTS | BROOKLYN | 56.0 | COURT STREET | 11201 | 7.186252e+09 | Donuts | 2013-01-08 | Violations were cited in the following area(s). | 08A | ... | 40.692272 | 56 Court st | dunkin-donuts-brooklyn-131 | 1.0 | 2.0 | False | -73.991493 | Brooklyn Heights | Donuts|Coffee & Tea | Dunkin' Donuts |
40363289 | HAPPY GARDEN | BRONX | 1236.0 | 238 SPOFFORD AVE | 10474 | 7.186172e+09 | Chinese | 2013-12-30 | Violations were cited in the following area(s). | 10F | ... | 40.814002 | 1236 Spofford Avenue | chinese-fried-chicken-bronx-3 | 0.0 | 0.0 | False | -73.889529 | Hunts Point | Chinese | Chinese Fried Chicken |
40363298 | CAFE METRO | MANHATTAN | 625.0 | 8 AVENUE | 10018 | 2.127149e+09 | American | 2013-12-13 | Violations were cited in the following area(s). | 06D | ... | 40.756252 | 625 8th Ave Bsmt 2 | cosane-new-york | 0.0 | 0.0 | False | -73.990685 | Theater District|Midtown West|Hell's Kitchen | Coffee & Tea | Cosane |
40363333 | TONY'S DELI | QUEENS | 1069.0 | WYCKOFF AVENUE | 11385 | 7.183868e+09 | Delicatessen | 2013-06-10 | Violations were cited in the following area(s). | 04N | ... | 40.694920 | 1069 Wyckoff Avenue | tonts-too-deli-and-coffee-shop-ridgewood | 0.0 | 0.0 | False | -73.902460 | Bushwick|Ridgewood | Delis|Sandwiches | Tont's Too Deli & Coffee Shop |
40363426 | LEXLER DELI | MANHATTAN | 405.0 | LEXINGTON AVENUE | 10174 | 2.126871e+09 | Sandwiches/Salads/Mixed Buffet | 2013-08-21 | Violations were cited in the following area(s). | 10F | ... | 40.751910 | 405 Lexington Ave | lexler-deli-new-york | 4.5 | 15.0 | False | -73.975530 | Midtown East | Delis | Lexler Deli |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
50055989 | NaN | QUEENS | 25109.0 | NORTHERN BLVD | 11362 | 3.473282e+09 | Other | 1900-01-01 | NaN | NaN | ... | NaN | None | None | NaN | NaN | None | NaN | None | None | None |
50055990 | RESTAURANT ASSOCIATES INC | MANHATTAN | 1177.0 | AVENUE OF THE AMERICAS | 10036 | 2.123363e+09 | Other | 1900-01-01 | NaN | NaN | ... | NaN | None | None | NaN | NaN | None | NaN | None | None | None |
50055996 | NaN | MANHATTAN | NaN | CANAL ST | 10013 | 7.184619e+09 | Other | 1900-01-01 | NaN | NaN | ... | 40.759792 | 39-15 Main St | alicechencpa-queens | 1.0 | 3.0 | False | -73.830063 | Flushing|Downtown Flushing | Financial Services | AliceChenCPA |
50055997 | NaN | BRONX | 1167.0 | BOSTON RD | 10456 | 7.184508e+09 | Other | 1900-01-01 | NaN | NaN | ... | 40.828840 | 1167 Boston Rd | mikes-pizza-and-pasta-bronx | 5.0 | 1.0 | False | -73.903340 | Morrisania | Pizza | Mike's Pizza and Pasta |
50056005 | SANTA PANZA | BROOKLYN | 1079.0 | BROADWAY | 11221 | 7.186980e+09 | Other | 1900-01-01 | NaN | NaN | ... | NaN | None | None | NaN | NaN | None | NaN | None | None | None |
50056006 | NaN | BROOKLYN | 239.0 | BRIGHTON BEACH AVE | 11235 | 9.019072e+09 | Other | 1900-01-01 | NaN | NaN | ... | NaN | None | None | NaN | NaN | None | NaN | None | None | None |
50056007 | NaN | MANHATTAN | 1.0 | E 35TH ST | 10016 | 9.178265e+09 | Other | 1900-01-01 | NaN | NaN | ... | NaN | None | None | NaN | NaN | None | NaN | None | None | None |
50056009 | NaN | QUEENS | 15702.0 | NORTHERN BLVD | 11354 | 9.176238e+09 | Other | 1900-01-01 | NaN | NaN | ... | NaN | None | None | NaN | NaN | None | NaN | None | None | None |
50056014 | PIZZA ROYALE | BROOKLYN | 6716.0 | FORT HAMILTON PKWY | 11219 | 7.182385e+09 | Other | 1900-01-01 | NaN | NaN | ... | 40.629331 | 6718 Fort Hamilton Pkwy | pizza-royale-brooklyn | 3.0 | 39.0 | True | -74.011227 | Dyker Heights | Pizza|Italian | Pizza Royale |
50056016 | NaN | QUEENS | 1526.0 | COLLEGE POINT BLVD | 11356 | 9.178734e+09 | Other | 1900-01-01 | NaN | NaN | ... | NaN | None | None | NaN | NaN | None | NaN | None | None | None |
50056017 | VANGUARD WINE BAR | MANHATTAN | 1372.0 | 1ST AVE | 10021 | 2.122889e+09 | Other | 1900-01-01 | NaN | NaN | ... | NaN | None | None | NaN | NaN | None | NaN | None | None | None |
50056018 | NaN | BRONX | 3513.0 | E TREMONT AVE | 10465 | 7.188220e+09 | Other | 1900-01-01 | NaN | NaN | ... | 40.830978 | 3513 E Tremont Ave | legendary-bar-and-restaurant-bronx | 4.5 | 11.0 | False | -73.826118 | Schuylerville | Bars|Italian | Legendary Bar & Restaurant |
50056020 | DOT & LINE | BROOKLYN | 253.0 | BERGEN ST | 11217 | 3.475178e+08 | Other | 1900-01-01 | NaN | NaN | ... | NaN | None | None | NaN | NaN | None | NaN | None | None | None |
50056021 | NaN | MANHATTAN | 2656.0 | BROADWAY | 10025 | 7.184900e+09 | Other | 1900-01-01 | NaN | NaN | ... | 40.797590 | 2656 Broadway | lava-kitchen-new-york | 0.0 | 0.0 | True | -73.969310 | Manhattan Valley | Chinese|Asian Fusion|Noodles | Lava Kitchen |
50056022 | NaN | QUEENS | 3526.0 | FARRINGTON ST | 11354 | 7.184600e+09 | Other | 1900-01-01 | NaN | NaN | ... | 40.763670 | 3526 Farrington St | joomakgol-flushing | 4.5 | 47.0 | True | -73.831750 | Flushing|Downtown Flushing | Korean | JooMakGol |
50056030 | NaN | QUEENS | 11916.0 | ROCKAWAY BLVD | 11420 | 9.178057e+09 | Other | 1900-01-01 | NaN | NaN | ... | NaN | None | None | NaN | NaN | None | NaN | None | None | None |
50056031 | NaN | BROOKLYN | 652.0 | PENNSYLVANIA AVE | 11207 | 9.174454e+09 | Other | 1900-01-01 | NaN | NaN | ... | NaN | None | None | NaN | NaN | None | NaN | None | None | None |
50056033 | NaN | QUEENS | 10837.0 | CORONA AVE | 11368 | 9.172952e+09 | Other | 1900-01-01 | NaN | NaN | ... | NaN | None | None | NaN | NaN | None | NaN | None | None | None |
50056036 | NORTHERN CAFE INC. | QUEENS | 10113.0 | NORTHERN BLVD | 11368 | 3.477838e+09 | Other | 1900-01-01 | NaN | NaN | ... | NaN | None | None | NaN | NaN | None | NaN | None | None | None |
50056056 | NaN | QUEENS | 1304.0 | BEACH CHANNEL DR | 11691 | 3.478130e+09 | Other | 1900-01-01 | NaN | NaN | ... | NaN | None | None | NaN | NaN | None | NaN | None | None | None |
50056057 | NaN | QUEENS | 6418.0 | 108TH ST | 11375 | 9.173228e+09 | Other | 1900-01-01 | NaN | NaN | ... | NaN | None | None | NaN | NaN | None | NaN | None | None | None |
50056063 | NaN | MANHATTAN | 200.0 | DYCKMAN ST | 10040 | 3.472959e+09 | Other | 1900-01-01 | NaN | NaN | ... | NaN | None | None | NaN | NaN | None | NaN | None | None | None |
50056065 | NaN | MANHATTAN | 4241.0 | BROADWAY | 10033 | 6.468212e+09 | Other | 1900-01-01 | NaN | NaN | ... | NaN | None | None | NaN | NaN | None | NaN | None | None | None |
50056067 | HAROLD'S MEAT & THREE | MANHATTAN | 231.0 | HUDSON ST | 10013 | 3.476752e+09 | Other | 1900-01-01 | NaN | NaN | ... | NaN | None | None | NaN | NaN | None | NaN | None | None | None |
50056068 | XELAJU RESTAURANT | QUEENS | 8830.0 | PARSONS BLVD | 11432 | 7.186573e+09 | Other | 1900-01-01 | NaN | NaN | ... | 40.706911 | 88-30 Parsons Blvd | luna-de-xelaju-restaurant-and-pizzeria-jamaica | 2.0 | 1.0 | False | -73.802483 | Jamaica | Spanish | Luna de Xelaju Restaurant & Pizzeria |
50056069 | NaN | MANHATTAN | 4484.0 | BROADWAY | 10040 | 9.172041e+09 | Other | 1900-01-01 | NaN | NaN | ... | NaN | None | None | NaN | NaN | None | NaN | None | None | None |
50056073 | CULINART INC | MANHATTAN | 1.0 | UN PLZ | 10017 | 2.127581e+09 | Other | 1900-01-01 | NaN | NaN | ... | NaN | None | None | NaN | NaN | None | NaN | None | None | None |
50056074 | ABRACO | MANHATTAN | 81.0 | E 7TH ST | 10003 | 3.476102e+09 | Other | 1900-01-01 | NaN | NaN | ... | NaN | None | None | NaN | NaN | None | NaN | None | None | None |
50056081 | NaN | MANHATTAN | 232.0 | E 111TH ST | 10029 | 9.174841e+09 | Other | 1900-01-01 | NaN | NaN | ... | NaN | None | None | NaN | NaN | None | NaN | None | None | None |
50056087 | NaN | BROOKLYN | 293.0 | AVENUE X | 11223 | 9.177708e+09 | Other | 1900-01-01 | NaN | NaN | ... | NaN | None | None | NaN | NaN | None | NaN | None | None | None |
26064 rows × 29 columns
inspections_ffff.to_csv("../data/yelp_dohmh_agg_data.csv", encoding='utf-8')