In [1]:

import pandas as pd

In [2]:

inspections = pd.read_csv("../data/inspections.csv", index_col=0)

In [3]:

inspections.head(3)

Out[3]:

	CAMIS	DBA	BORO	BUILDING	STREET	ZIPCODE	PHONE	CUISINE DESCRIPTION	INSPECTION DATE	ACTION	VIOLATION CODE	VIOLATION DESCRIPTION	CRITICAL FLAG	SCORE	GRADE	GRADE DATE	RECORD DATE	INSPECTION TYPE
0	41158108	NICK'S GOURMET DELI	QUEENS	7415	DITMARS BOULEVARD	11370	7182788338	American	07/21/2015	Violations were cited in the following area(s).	04L	Evidence of mice or live mice present in facil...	Critical	11.0	NaN	NaN	09/27/2016	Cycle Inspection / Initial Inspection
1	41187577	HANSOL NUTRITION CENTER	QUEENS	16026	NORTHERN BOULEVARD	11358	7188880200	Korean	07/13/2016	Violations were cited in the following area(s).	06A	Personal cleanliness inadequate. Outer garment...	Critical	30.0	NaN	NaN	09/27/2016	Cycle Inspection / Initial Inspection
2	41705988	KURA	MANHATTAN	130	ST MARKS PLACE	10009	2122281010	Japanese	05/08/2013	Violations were cited in the following area(s).	02B	Hot food item not held at or above 140Âº F.	Critical	27.0	NaN	NaN	09/27/2016	Pre-permit (Operational) / Initial Inspection

In [3]:

inspections['INSPECTION DATE'] = pd.to_datetime(inspections['INSPECTION DATE'])

In [47]:

# random_camis = inspections.sample().iloc[0]['CAMIS']
# inspections[inspections['CAMIS'] == random_camis].sort_values(by='INSPECTION DATE')['INSPECTION DATE']

Get initial inspection date.

In [4]:

inspections_f = inspections.groupby('CAMIS')\
                            .apply(lambda df: inspections[inspections['CAMIS'] == df['CAMIS'].iloc[0]]\
                                               .sort_values(by="INSPECTION DATE")\
                                               .iloc[0]\
                                               .drop('CAMIS')
                                  )\
                            .rename(columns={'INSPECTION DATE': 'INITIAL INSPECTION DATE'})

In [5]:

inspections_f.sample(1)

Out[5]:

	DBA	BORO	BUILDING	STREET	ZIPCODE	PHONE	CUISINE DESCRIPTION	INITIAL INSPECTION DATE	ACTION	VIOLATION CODE	VIOLATION DESCRIPTION	CRITICAL FLAG	SCORE	GRADE	GRADE DATE	RECORD DATE	INSPECTION TYPE
CAMIS
50001937	TASTY CHICKEN	BROOKLYN	1687	86TH STREET	11214	7182591111	American	08/21/2013	Violations were cited in the following area(s).	15L	Smoke free workplace smoking policy inadequate...	Not Critical	NaN	NaN	NaN	09/27/2016	Smoke-Free Air Act / Initial Inspection

Not sure why, but this operation drops the DBA field. None others are affected. Maybe something with the indexing?

Let's take the most recent inspection date.

In [5]:

inspections_ff = inspections_f.copy()
inspections_ff['LATEST INSPECTION DATE'] = inspections_ff\
        .apply(lambda srs: inspections[inspections['CAMIS'] == srs.name]\
                                  .sort_values(by='INSPECTION DATE')\
                                  .iloc[-1]\
                                  ['INSPECTION DATE'],
              axis='columns')

Rename the INSPECTION TYPE to INITIAL INSPECTION TYPE (this flags new establishments from pre-existing ones, in the cotext of our dataset).

In [6]:

inspections_ff = inspections_ff.rename(columns={'INSPECTION TYPE': 'INITIAL INSPECTION TYPE'})

Checking the flags:

In [91]:

inspections_ff['INITIAL INSPECTION TYPE'].value_counts()

Out[91]:

Cycle Inspection / Initial Inspection                          12537
Pre-permit (Operational) / Initial Inspection                   7630
Pre-permit (Non-operational) / Initial Inspection               2256
Cycle Inspection / Re-inspection                                 877
Administrative Miscellaneous / Initial Inspection                683
Smoke-Free Air Act / Initial Inspection                          281
Trans Fat / Initial Inspection                                   247
Pre-permit (Operational) / Re-inspection                          82
Trans Fat / Compliance Inspection                                 80
Trans Fat / Re-inspection                                         75
Inter-Agency Task Force / Initial Inspection                      53
Calorie Posting / Initial Inspection                              30
Trans Fat / Second Compliance Inspection                          27
Administrative Miscellaneous / Re-inspection                      27
Smoke-Free Air Act / Re-inspection                                23
Administrative Miscellaneous / Compliance Inspection              14
Cycle Inspection / Compliance Inspection                          12
Pre-permit (Operational) / Compliance Inspection                  12
Cycle Inspection / Reopening Inspection                            8
Smoke-Free Air Act / Compliance Inspection                         6
Calorie Posting / Re-inspection                                    5
Pre-permit (Operational) / Reopening Inspection                    4
Pre-permit (Non-operational) / Re-inspection                       3
Cycle Inspection / Second Compliance Inspection                    2
Administrative Miscellaneous / Second Compliance Inspection        2
Pre-permit (Operational) / Second Compliance Inspection            1
Smoke-Free Air Act / Limited Inspection                            1
Calorie Posting / Compliance Inspection                            1
Smoke-Free Air Act / Second Compliance Inspection                  1
Name: INITIAL INSPECTION TYPE, dtype: int64

A lot of them are null.

In [93]:

inspections_ff['INITIAL INSPECTION TYPE'].isnull().astype(int).sum()

Out[93]:

Remember, this is new establishments that have not been inspected yet. That the numbers match up here is encouraging.

In [94]:

inspections['INSPECTION TYPE'].isnull().astype(int).sum()

Out[94]:

We reattach the lost DBA column.

In [95]:

inspections_ff.head(1)

Out[95]:

	DBA	BORO	BUILDING	STREET	ZIPCODE	PHONE	CUISINE DESCRIPTION	INITIAL INSPECTION DATE	ACTION	VIOLATION CODE	VIOLATION DESCRIPTION	CRITICAL FLAG	SCORE	GRADE	GRADE DATE	RECORD DATE	INITIAL INSPECTION TYPE	LATEST INSPECTION DATE
CAMIS
30075445	NaN	BRONX	1007.0	MORRIS PARK AVE	10462	7.188925e+09	Bakery	2013-06-01	Violations were cited in the following area(s).	16B	The original nutritional fact labels and/or in...	Not Critical	NaN	NaN	NaN	09/24/2016	Trans Fat / Compliance Inspection	2016-02-18

In [7]:

inspections_fff = inspections_ff.copy()
inspections_fff['DBA'] = inspections_fff.apply(lambda srs: 
                                                   inspections[inspections['CAMIS'] == srs.name]\
                                                   .iloc[0]['DBA'],
                                               axis='columns')

Prepend descriptors, to more easily distinguish things down the road.

In [8]:

inspections_fff.columns = ["DOHMH " + col for col in inspections_fff.columns]

In [105]:

inspections_fff.sample()

Out[105]:

	DOHMH DBA	DOHMH BORO	DOHMH BUILDING	DOHMH STREET	DOHMH ZIPCODE	DOHMH PHONE	DOHMH CUISINE DESCRIPTION	DOHMH INITIAL INSPECTION DATE	DOHMH ACTION	DOHMH VIOLATION CODE	DOHMH VIOLATION DESCRIPTION	DOHMH CRITICAL FLAG	DOHMH SCORE	DOHMH GRADE	DOHMH GRADE DATE	DOHMH RECORD DATE	DOHMH INITIAL INSPECTION TYPE	DOHMH LATEST INSPECTION DATE
CAMIS
40374834	CASA BELLA	MANHATTAN	127.0	MULBERRY STREET	10013	2.124314e+09	Italian	2013-08-15	Violations were cited in the following area(s).	06D	Food contact surface not properly washed, rins...	Critical	6.0	NaN	NaN	09/24/2016	Cycle Inspection / Initial Inspection	2016-04-26

Now we get information from Yelp!

In [9]:

from yelp.client import Client
from yelp.oauth1_authenticator import Oauth1Authenticator
from yelp.errors import BusinessUnavailable
import os
import json

def import_credentials(filename='../data/yelp_credentials.json'):
    try:
        data = json.load(open(filename))
        return data
    except:
        raise IOError('This API requires Yelp credentials to work. Did you forget to define them?')

credentials = import_credentials()

auth = Oauth1Authenticator(
    consumer_key=credentials['consumer_key'],
    consumer_secret=credentials['consumer_secret'],
    token=credentials['token'],
    token_secret=credentials['token_secret']
)

client = Client(auth)

In [10]:

from tqdm import tqdm

In [77]:

import httplib2.HttpLib2Error

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-77-ee8a03be3a31> in <module>()
----> 1 from httplib2.error import HTTPError

ImportError: No module named 'httplib2.error'

In [82]:

from urllib.error import HTTPError
import yelp

def yelp_phone_fetch(num):
    """
    Performs the phone search described in notebook 02 to fetch information on the entity associated with a number.
    """
    if not num:
        return None
    else:
        try:
            business = client.phone_search(num).businesses[0]
            if business and business.location and business.location.coordinate:
                return {
                    'Yelp ID': business.id,
                    'Yelp Is Claimed': business.is_claimed,
                    'Yelp Is Closed': business.is_closed,
                    'Yelp Name': business.name,
                    'Yelp URL': business.url,
                    'Yelp Review Count': business.review_count,
                    'Yelp Categories': business.categories,
                    'Yelp Rating': business.rating,
                    'Yelp Address': business.location.address,
                    'Yelp Neighborhoods': business.location.neighborhoods,
                    'Yelp Latitude': business.location.coordinate.latitude,
                    'Yelp Longitude': business.location.coordinate.longitude,
                       }
            else:  # Partial information, skip.
                return None
        except IndexError:  # Phone search failed!
            return None
        except yelp.errors.InvalidParameter:  # Invalid number!
            return None

After some testing there appears to be a very significant miss rate. Remember, we're fuzzy matching phone numbers from one data set with phone numbers in another dataset. It's far from 100% that we'll get something back out. I'm interested in what percentage of the time we're successful.

In [12]:

def random_number_dba_tuple():
    random_number, random_dba = inspections_fff.sample().iloc[0][['DOHMH PHONE', 'DOHMH DBA']]
    random_number = str(int(random_number))
    return random_number, random_dba

In [15]:

random_number_dba_tuple()

Out[15]:

(2125878880, 'MULTI TASTES DINER')

In [127]:

yelp_phone_fetch(random_number_dba_tuple()[0])

Out[127]:

{'Yelp Address': ['41 E 11th St'],
 'Yelp Categories': [Category(name='Japanese', alias='japanese')],
 'Yelp ID': 'ootoya-greenwich-village-new-york',
 'Yelp Is Claimed': True,
 'Yelp Is Closed': False,
 'Yelp Latitude': 40.7333107,
 'Yelp Longitude': -73.9929962,
 'Yelp Name': 'Ootoya Greenwich Village',
 'Yelp Neighborhoods': ['Greenwich Village'],
 'Yelp Rating': 4.0,
 'Yelp Review Count': 135,
 'Yelp URL': 'https://www.yelp.com/biz/ootoya-greenwich-village-new-york?adjust_creative=dkJPGu_jtTyHwsEgZIZN6g&utm_campaign=yelp_api&utm_medium=api_v2_phone_search&utm_source=dkJPGu_jtTyHwsEgZIZN6g'}

Hey I've been there! Nice. More rigorously:

In [133]:

one_hundred_randoms = [random_number_dba_tuple() for i in range(100)]

In [138]:

testset = [yelp_phone_fetch(num) for num, placename in tqdm(one_hundred_randoms)]

  0%|                                                  | 0/100 [00:00<?, ?it/s]
  1%|▍                                         | 1/100 [00:00<00:25,  3.85it/s]
  2%|▊                                         | 2/100 [00:00<00:28,  3.45it/s]
  3%|█▎                                        | 3/100 [00:00<00:25,  3.74it/s]
  4%|█▋                                        | 4/100 [00:02<00:52,  1.84it/s]
  5%|██                                        | 5/100 [00:02<00:42,  2.25it/s]
  6%|██▌                                       | 6/100 [00:02<00:35,  2.62it/s]
  7%|██▉                                       | 7/100 [00:02<00:31,  2.99it/s]
  8%|███▎                                      | 8/100 [00:02<00:29,  3.10it/s]
  9%|███▊                                      | 9/100 [00:03<00:32,  2.78it/s]
 10%|████                                     | 10/100 [00:03<00:30,  2.94it/s]
 11%|████▌                                    | 11/100 [00:03<00:26,  3.33it/s]
 12%|████▉                                    | 12/100 [00:04<00:24,  3.54it/s]
 13%|█████▎                                   | 13/100 [00:04<00:23,  3.64it/s]
 14%|█████▋                                   | 14/100 [00:04<00:29,  2.95it/s]
 15%|██████▏                                  | 15/100 [00:05<00:33,  2.58it/s]
 16%|██████▌                                  | 16/100 [00:05<00:29,  2.88it/s]
 17%|██████▉                                  | 17/100 [00:05<00:25,  3.22it/s]
 18%|███████▍                                 | 18/100 [00:06<00:23,  3.47it/s]
 19%|███████▊                                 | 19/100 [00:06<00:20,  3.86it/s]
 20%|████████▏                                | 20/100 [00:06<00:20,  3.96it/s]
 21%|████████▌                                | 21/100 [00:06<00:18,  4.35it/s]
 22%|█████████                                | 22/100 [00:07<00:18,  4.14it/s]
 23%|█████████▍                               | 23/100 [00:07<00:23,  3.23it/s]
 24%|█████████▊                               | 24/100 [00:07<00:27,  2.72it/s]
 25%|██████████▎                              | 25/100 [00:08<00:24,  3.10it/s]
 26%|██████████▋                              | 26/100 [00:08<00:20,  3.55it/s]
 27%|███████████                              | 27/100 [00:08<00:19,  3.65it/s]
 28%|███████████▍                             | 28/100 [00:09<00:24,  2.89it/s]
 29%|███████████▉                             | 29/100 [00:09<00:21,  3.36it/s]
 30%|████████████▎                            | 30/100 [00:09<00:24,  2.89it/s]
 31%|████████████▋                            | 31/100 [00:10<00:20,  3.31it/s]
 32%|█████████████                            | 32/100 [00:10<00:18,  3.63it/s]
 33%|█████████████▌                           | 33/100 [00:10<00:17,  3.74it/s]
 34%|█████████████▉                           | 34/100 [00:10<00:16,  3.98it/s]
 35%|██████████████▎                          | 35/100 [00:10<00:14,  4.38it/s]
 36%|██████████████▊                          | 36/100 [00:11<00:14,  4.30it/s]
 37%|███████████████▏                         | 37/100 [00:11<00:13,  4.82it/s]
 38%|███████████████▌                         | 38/100 [00:11<00:13,  4.54it/s]
 39%|███████████████▉                         | 39/100 [00:11<00:12,  4.75it/s]
 40%|████████████████▍                        | 40/100 [00:11<00:12,  4.87it/s]
 41%|████████████████▊                        | 41/100 [00:12<00:12,  4.81it/s]
 42%|█████████████████▏                       | 42/100 [00:12<00:11,  5.05it/s]
 43%|█████████████████▋                       | 43/100 [00:12<00:10,  5.27it/s]
 44%|██████████████████                       | 44/100 [00:12<00:11,  4.93it/s]
 45%|██████████████████▍                      | 45/100 [00:12<00:11,  4.97it/s]
 46%|██████████████████▊                      | 46/100 [00:13<00:10,  4.99it/s]
 47%|███████████████████▎                     | 47/100 [00:13<00:11,  4.72it/s]
 48%|███████████████████▋                     | 48/100 [00:13<00:14,  3.51it/s]
 49%|████████████████████                     | 49/100 [00:14<00:17,  2.83it/s]
 50%|████████████████████▌                    | 50/100 [00:14<00:15,  3.23it/s]
 51%|████████████████████▉                    | 51/100 [00:14<00:17,  2.85it/s]
 52%|█████████████████████▎                   | 52/100 [00:15<00:15,  3.19it/s]
 53%|█████████████████████▋                   | 53/100 [00:15<00:16,  2.81it/s]
 54%|██████████████████████▏                  | 54/100 [00:16<00:17,  2.56it/s]
 55%|██████████████████████▌                  | 55/100 [00:16<00:15,  2.86it/s]
 56%|██████████████████████▉                  | 56/100 [00:16<00:13,  3.21it/s]
 57%|███████████████████████▎                 | 57/100 [00:16<00:11,  3.63it/s]
 58%|███████████████████████▊                 | 58/100 [00:17<00:12,  3.31it/s]
 59%|████████████████████████▏                | 59/100 [00:17<00:11,  3.59it/s]
 60%|████████████████████████▌                | 60/100 [00:17<00:10,  3.90it/s]
 61%|█████████████████████████                | 61/100 [00:17<00:09,  4.20it/s]
 62%|█████████████████████████▍               | 62/100 [00:17<00:08,  4.36it/s]
 63%|█████████████████████████▊               | 63/100 [00:18<00:08,  4.42it/s]
 64%|██████████████████████████▏              | 64/100 [00:18<00:07,  4.93it/s]
 65%|██████████████████████████▋              | 65/100 [00:18<00:07,  4.83it/s]
 66%|███████████████████████████              | 66/100 [00:18<00:07,  4.52it/s]
 67%|███████████████████████████▍             | 67/100 [00:18<00:06,  4.74it/s]
 68%|███████████████████████████▉             | 68/100 [00:19<00:06,  4.88it/s]
 69%|████████████████████████████▎            | 69/100 [00:19<00:07,  4.41it/s]
 70%|████████████████████████████▋            | 70/100 [00:20<00:09,  3.05it/s]
 71%|█████████████████████████████            | 71/100 [00:20<00:08,  3.55it/s]
 72%|█████████████████████████████▌           | 72/100 [00:20<00:07,  3.94it/s]
 73%|█████████████████████████████▉           | 73/100 [00:20<00:06,  4.22it/s]
 74%|██████████████████████████████▎          | 74/100 [00:20<00:05,  4.35it/s]
 75%|██████████████████████████████▊          | 75/100 [00:21<00:07,  3.35it/s]
 76%|███████████████████████████████▏         | 76/100 [00:21<00:06,  3.96it/s]
 77%|███████████████████████████████▌         | 77/100 [00:21<00:05,  3.98it/s]
 78%|███████████████████████████████▉         | 78/100 [00:21<00:05,  4.01it/s]
 79%|████████████████████████████████▍        | 79/100 [00:22<00:04,  4.24it/s]
 80%|████████████████████████████████▊        | 80/100 [00:22<00:04,  4.84it/s]
 81%|█████████████████████████████████▏       | 81/100 [00:22<00:04,  4.73it/s]
 82%|█████████████████████████████████▌       | 82/100 [00:22<00:03,  5.09it/s]
 83%|██████████████████████████████████       | 83/100 [00:22<00:03,  4.94it/s]
 84%|██████████████████████████████████▍      | 84/100 [00:22<00:03,  5.22it/s]
 85%|██████████████████████████████████▊      | 85/100 [00:23<00:03,  4.78it/s]
 86%|███████████████████████████████████▎     | 86/100 [00:24<00:07,  1.96it/s]
 87%|███████████████████████████████████▋     | 87/100 [00:24<00:05,  2.35it/s]
 88%|████████████████████████████████████     | 88/100 [00:24<00:04,  2.80it/s]
 89%|████████████████████████████████████▍    | 89/100 [00:25<00:03,  3.16it/s]
 90%|████████████████████████████████████▉    | 90/100 [00:25<00:02,  3.43it/s]
 91%|█████████████████████████████████████▎   | 91/100 [00:25<00:02,  3.80it/s]
 92%|█████████████████████████████████████▋   | 92/100 [00:25<00:01,  4.44it/s]
 93%|██████████████████████████████████████▏  | 93/100 [00:25<00:01,  4.77it/s]
 94%|██████████████████████████████████████▌  | 94/100 [00:26<00:01,  3.44it/s]
 95%|██████████████████████████████████████▉  | 95/100 [00:26<00:01,  3.80it/s]
 96%|███████████████████████████████████████▎ | 96/100 [00:26<00:00,  4.14it/s]
 97%|███████████████████████████████████████▊ | 97/100 [00:26<00:00,  4.16it/s]
 98%|████████████████████████████████████████▏| 98/100 [00:27<00:00,  4.70it/s]
 99%|████████████████████████████████████████▌| 99/100 [00:27<00:00,  4.46it/s]
100%|████████████████████████████████████████| 100/100 [00:27<00:00,  4.71it/s]

In [145]:

np.array([entity == None for entity in testset]).astype(int).sum()

Out[145]:

19 misses in 100 randoms. So we're able to match 80% of the time. Not bad as far as fuzzy matches go.

If our data is missing at random (MAR), then we are happy, because this makes for a statistically valid sample of all restaurants in New York City—we can simply drop the other values.

But there's a high likelihood, in my professional opinion, that the data is missing not at a random (MNAR). I expect that it's more likely for a restaurant or eatery in a poor neighborhood to be missing the phone number information in Yelp! than one in a richer, whiter community. This is a severe under-reporting issue that will invalidate any conclusions we try to make using this data with regarding to "gentrification" and whatnot.

We'll need to validate the data geospatially. We'll do that next.

Run the full Yelp! API data through, one slice at a time.

In [150]:

len(inspections_fff)

Out[150]:

In [13]:

inspections_fff.to_csv("../data/inspections_flattened_initial.csv", encoding="utf-8")

Oy vey. This will require two days of processing, since the API limit is 25000/day.

In [15]:

from tqdm import tqdm_notebook

In [14]:

del inspections
del inspections_f
del inspections_ff

In [38]:

inspections_fff['DOHMH PHONE'].iloc[2702:2705]

Out[38]:

CAMIS
40788706    7187231080
40788884    __________
40788886    6462307208
Name: DOHMH PHONE, dtype: object

Uh, ok.

Hideous¶

In [39]:

raw_yelp_5000 = [yelp_phone_fetch(int(num)) if pd.notnull(num) and str.isdigit(num) else None for num in tqdm_notebook(inspections_fff['DOHMH PHONE'][:5000])]

In [42]:

raw_yelp_5000_2 = [yelp_phone_fetch(int(num)) if pd.notnull(num) and str.isdigit(num) else None for num in tqdm_notebook(inspections_fff['DOHMH PHONE'][5000:10000])]

In [43]:

raw_yelp_5000_3 = [yelp_phone_fetch(int(num)) if pd.notnull(num) and str.isdigit(num) else None for num in tqdm_notebook(inspections_fff['DOHMH PHONE'][10000:15000])]

In [54]:

import pickle

with open("../data/raw_yelp_list.pkl", "wb") as f:
    pickle.dump(raw_yelp_5000+ raw_yelp_5000_2 + raw_yelp_5000_3, f)

In [18]:

import pickle

with open("../data/raw_yelp_list.pkl", "rb") as f:
    raw_yelp_1_to_3 = pickle.load(f)

In [20]:

len(raw_yelp_1_to_3)

Out[20]:

In [24]:

raw_yelp_5000_4 = [yelp_phone_fetch(int(num)) if pd.notnull(num) and str.isdigit(str(int(num))) else None for num in tqdm_notebook(inspections_fff['DOHMH PHONE'][15000:20000])]

In [35]:

inspections_fff['DOHMH PHONE'].iloc[20000 + 3270]

Out[35]:

1646644665.0

This appears to be an invalid phone number. Parsing it through Yelp raises a reliable error, which I've now coded into the loop.

In [84]:

yelp_phone_fetch(int(inspections_fff['DOHMH PHONE'].iloc[20000 + 3270]))

There seem to be multiple such numbers in there.

In [85]:

yelp_phone_fetch(int(inspections_fff['DOHMH PHONE'].iloc[20000 + 4065]))

In [87]:

raw_yelp_5000_5 = [yelp_phone_fetch(int(num)) if pd.notnull(num) and str.isdigit(str(int(num))) else None for num in tqdm_notebook(inspections_fff['DOHMH PHONE'][20000:25000])]

In [88]:

raw_yelp_5000_6 = [yelp_phone_fetch(int(num)) if pd.notnull(num) and str.isdigit(str(int(num))) else None for num in tqdm_notebook(inspections_fff['DOHMH PHONE'][25000:])]

In [90]:

all_raws = raw_yelp_1_to_3 + raw_yelp_5000_4 + raw_yelp_5000_5 + raw_yelp_5000_6

In [91]:

len(all_raws)

Out[91]:

With the data thus acquired, we now assign it to our dataframe. Before we do that, first we need to clean up what we have.

In [92]:

inspections_fff.sample(1)

Out[92]:

	DOHMH DBA	DOHMH BORO	DOHMH BUILDING	DOHMH STREET	DOHMH ZIPCODE	DOHMH PHONE	DOHMH CUISINE DESCRIPTION	DOHMH INITIAL INSPECTION DATE	DOHMH ACTION	DOHMH VIOLATION CODE	DOHMH VIOLATION DESCRIPTION	DOHMH CRITICAL FLAG	DOHMH SCORE	DOHMH GRADE	DOHMH GRADE DATE	DOHMH RECORD DATE	DOHMH INITIAL INSPECTION TYPE	DOHMH LATEST INSPECTION DATE
CAMIS
41405535	TWO BOOTS	MANHATTAN	625.0	9 AVENUE	10036	2.129563e+09	Pizza	2014-06-24	Violations were cited in the following area(s).	04L	Evidence of mice or live mice present in facil...	Critical	15.0	NaN	NaN	09/27/2016	Cycle Inspection / Initial Inspection	2016-09-13

In [127]:

all_raws[0] = {'Yelp Address': ['1007 Morris Park Avenue'],
 'Yelp Categories': ["Category(name='Bakeries', alias='bakeries')",
  "Category(name='Desserts', alias='desserts')"],
 'Yelp ID': 'morris-pk-bake-shop-bronx',
 'Yelp Is Claimed': False,
 'Yelp Is Closed': False,
 'Yelp Latitude': 40.848445892334,
 'Yelp Longitude': -73.8560791015625,
 'Yelp Name': 'Morris Pk Bake Shop',
 'Yelp Neighborhoods': ['Morris Park'],
 'Yelp Rating': 4.5,
 'Yelp Review Count': 27,
 'Yelp URL': 'https://www.yelp.com/biz/morris-pk-bake-shop-bronx?adjust_creative=dkJPGu_jtTyHwsEgZIZN6g&utm_campaign=yelp_api&utm_medium=api_v2_phone_search&utm_source=dkJPGu_jtTyHwsEgZIZN6g'}

In [131]:

all_raws[0]

Out[131]:

{'Yelp Address': ['1007 Morris Park Avenue'],
 'Yelp Categories': ["Category(name='Bakeries', alias='bakeries')",
  "Category(name='Desserts', alias='desserts')"],
 'Yelp ID': 'morris-pk-bake-shop-bronx',
 'Yelp Is Claimed': False,
 'Yelp Is Closed': False,
 'Yelp Latitude': 40.848445892334,
 'Yelp Longitude': -73.8560791015625,
 'Yelp Name': 'Morris Pk Bake Shop',
 'Yelp Neighborhoods': ['Morris Park'],
 'Yelp Rating': 4.5,
 'Yelp Review Count': 27,
 'Yelp URL': 'https://www.yelp.com/biz/morris-pk-bake-shop-bronx?adjust_creative=dkJPGu_jtTyHwsEgZIZN6g&utm_campaign=yelp_api&utm_medium=api_v2_phone_search&utm_source=dkJPGu_jtTyHwsEgZIZN6g'}

In [232]:

import re
import copy
re_cat_word = re.compile("name='[-\w\s&()/,']+(?=')")

def safe_group(match_obj):
    try:
        return match_obj.group().replace("name='", "")
    except AttributeError:
        return None

def format_data(yelp_dict):
    if yelp_dict:
        ret = copy.deepcopy(yelp_dict)
        if ret['Yelp Address']:
            ret['Yelp Address'] = ret['Yelp Address'][0]
        cats = ret['Yelp Categories']
        # print(cats)
        if cats:
            try:
                parsed_cats = [re.search(re_cat_word, str(cat)).group().replace("name='", "") for cat in cats]
            except:
                print(cats)
                parsed_cats = []
            # print(parsed_cats)
            ret['Yelp Categories'] = "|".join(parsed_cats)
        neighborhoods = ret['Yelp Neighborhoods']
        if neighborhoods:
            ret['Yelp Neighborhoods'] = "|".join(ret['Yelp Neighborhoods'])
        del ret['Yelp URL']
        return ret
    else:
        return None

In [235]:

formatted_yelp_data = []

for raw in tqdm_notebook(all_raws):
    formatted_yelp_data.append(format_data(raw))

[Category(name="Women's Clothing", alias='womenscloth')]
[Category(name='Department Stores', alias='deptstores'), Category(name="Men's Clothing", alias='menscloth'), Category(name="Women's Clothing", alias='womenscloth')]
[Category(name="Men's Clothing", alias='menscloth'), Category(name="Women's Clothing", alias='womenscloth')]
[Category(name="Men's Clothing", alias='menscloth'), Category(name='Used, Vintage & Consignment', alias='vintage')]
[Category(name='Jewelry', alias='jewelry'), Category(name="Women's Clothing", alias='womenscloth')]
[Category(name='Motorcycle Gear', alias='motorcyclinggear'), Category(name="Men's Clothing", alias='menscloth'), Category(name='Coffee & Tea', alias='coffee')]
[Category(name="Children's Clothing", alias='childcloth'), Category(name='Ice Cream & Frozen Yogurt', alias='icecream'), Category(name='Desserts', alias='desserts')]
[Category(name='Shoe Stores', alias='shoes'), Category(name="Men's Clothing", alias='menscloth'), Category(name="Women's Clothing", alias='womenscloth')]
[Category(name='Toy Stores', alias='toys'), Category(name="Men's Clothing", alias='menscloth'), Category(name="Women's Clothing", alias='womenscloth')]

In [236]:

formatted_yelp_data[0]

Out[236]:

{'Yelp Address': '1007 Morris Park Avenue',
 'Yelp Categories': 'Bakeries|Desserts',
 'Yelp ID': 'morris-pk-bake-shop-bronx',
 'Yelp Is Claimed': False,
 'Yelp Is Closed': False,
 'Yelp Latitude': 40.848445892334,
 'Yelp Longitude': -73.8560791015625,
 'Yelp Name': 'Morris Pk Bake Shop',
 'Yelp Neighborhoods': 'Morris Park',
 'Yelp Rating': 4.5,
 'Yelp Review Count': 27}

Now assign.

In [241]:

inspections_ffff = inspections_fff.copy()

for key in formatted_yelp_data[0].keys():
    inspections_ffff[key] = [s[key] if s else None for s in formatted_yelp_data]

In [249]:

inspections_ffff.columns = [col.upper() for col in inspections_ffff.columns]

In [250]:

inspections_ffff

Out[250]:

	DOHMH DBA	DOHMH BORO	DOHMH BUILDING	DOHMH STREET	DOHMH ZIPCODE	DOHMH PHONE	DOHMH CUISINE DESCRIPTION	DOHMH INITIAL INSPECTION DATE	DOHMH ACTION	DOHMH VIOLATION CODE	...	YELP LATITUDE	YELP ADDRESS	YELP ID	YELP RATING	YELP REVIEW COUNT	YELP IS CLAIMED	YELP LONGITUDE	YELP NEIGHBORHOODS	YELP CATEGORIES	YELP NAME
CAMIS
30075445	MORRIS PARK BAKE SHOP	BRONX	1007.0	MORRIS PARK AVE	10462	7.188925e+09	Bakery	2013-06-01	Violations were cited in the following area(s).	16B	...	40.848446	1007 Morris Park Avenue	morris-pk-bake-shop-bronx	4.5	27.0	False	-73.856079	Morris Park	Bakeries\|Desserts	Morris Pk Bake Shop
30112340	WENDY'S	BROOKLYN	469.0	FLATBUSH AVENUE	11225	7.182875e+09	Hamburgers	2014-06-05	Violations were cited in the following area(s).	10B	...	40.662952	469 Flatbush Ave	wendys-brooklyn-4	2.0	23.0	False	-73.961753	Prospect Heights\|Prospect Lefferts Gardens	Fast Food\|Burgers	Wendy's
30191841	DJ REYNOLDS PUB AND RESTAURANT	MANHATTAN	351.0	WEST 57 STREET	10019	2.122453e+09	Irish	2013-07-22	Violations were cited in the following area(s).	10B	...	40.767750	351 W 57th St	dj-reynolds-new-york-3	3.0	75.0	False	-73.984870	Midtown West\|Hell's Kitchen	Irish\|Pubs	DJ Reynolds
40356018	RIVIERA CATERER	BROOKLYN	2780.0	STILLWELL AVENUE	11224	7.183723e+09	American	2013-06-05	Violations were cited in the following area(s).	10F	...	40.579521	2780 Stillwell Ave	riviera-caterers-brooklyn	4.0	23.0	True	-73.982430	Coney Island	Caterers	Riviera Caterers
40356151	BRUNOS ON THE BOULEVARD	QUEENS	8825.0	ASTORIA BOULEVARD	11369	7.183351e+09	American	2014-04-11	Violations were cited in the following area(s).	04J	...	40.764240	8825 Astoria Blvd	events-by-brunos-jackson-heights	4.0	15.0	True	-73.880410	East Elmhurst	Caterers\|Venues & Event Spaces	Events By Bruno's
40356483	WILKEN'S FINE FOOD	BROOKLYN	7114.0	AVENUE U	11234	7.184444e+09	Delicatessen	2013-07-09	Violations were cited in the following area(s).	02H	...	40.619900	7114 Ave U	wilkens-ii-deli-brooklyn	3.5	25.0	False	-73.906853	Bergen Beach	Delis	Wilkens II Deli
40356731	TASTE THE TROPICS ICE CREAM	BROOKLYN	1839.0	NOSTRAND AVENUE	11226	7.188561e+09	Ice Cream, Gelato, Yogurt, Ices	2013-07-10	Violations were cited in the following area(s).	10B	...	40.640820	1839 Nostrand Ave	taste-the-tropics-brooklyn	4.5	16.0	False	-73.948151	Flatbush	Ice Cream & Frozen Yogurt	Taste the Tropics
40357217	WILD ASIA	BRONX	2300.0	SOUTHERN BOULEVARD	10460	7.182208e+09	American	2013-06-19	Violations were cited in the following area(s).	10B	...	NaN	None	None	NaN	NaN	None	NaN	None	None	None
40357437	C & C CATERING SERVICE	BROOKLYN	7715.0	18 AVENUE	11214	7.182323e+09	American	2014-04-16	Violations were cited in the following area(s).	06D	...	40.611713	7715 18th Ave	c-and-c-catering-service-brooklyn	3.0	2.0	True	-73.997261	Bensonhurst	Caterers	C & C Catering Service
40359480	1 EAST 66TH STREET KITCHEN	MANHATTAN	1.0	EAST 66 STREET	10065	2.128794e+09	American	2014-05-07	Violations were cited in the following area(s).	10B	...	40.768684	1 E 66th St	wyeth-james-new-york	0.0	0.0	False	-73.969337	Upper East Side	None	Wyeth James
40359705	NATHAN'S FAMOUS	BROOKLYN	1310.0	SURF AVENUE	11224	7.183332e+09	Hotdogs	2013-12-03	Violations were cited in the following area(s).	10F	...	NaN	None	None	NaN	NaN	None	NaN	None	None	None
40360045	SEUDA FOODS	BROOKLYN	705.0	KINGS HIGHWAY	11223	7.183752e+09	Jewish/Kosher	2013-10-10	Violations were cited in the following area(s).	08A	...	40.606472	705 Kings Hwy	seuda-foods-brooklyn	4.5	3.0	False	-73.965415	Midwood\|Gravesend	Caterers	Seuda Foods
40360076	CARVEL ICE CREAM	BROOKLYN	203.0	CHURCH AVENUE	11218	7.184390e+09	Ice Cream, Gelato, Yogurt, Ices	2014-02-10	Violations were cited in the following area(s).	10F	...	40.643380	203 Church Ave	carvel-brooklyn-3	3.5	18.0	False	-73.978199	Kensington\|Flatbush	Ice Cream & Frozen Yogurt	Carvel
40361322	CARVEL ICE CREAM	QUEENS	26515.0	HILLSIDE AVENUE	11004	7.183430e+09	Ice Cream, Gelato, Yogurt, Ices	2013-09-18	Violations were cited in the following area(s).	08A	...	40.743605	26515 Hillside Ave	carvel-glen-oaks-2	5.0	4.0	False	-73.660313	None	Ice Cream & Frozen Yogurt	Carvel
40361606	THE MOVABLE FEAST	BROOKLYN	284.0	PROSPECT PARK WEST	11215	7.189653e+09	American	2013-11-14	Violations were cited in the following area(s).	10F	...	40.658035	284 Prospect Park W	the-movable-feast-brooklyn	4.5	16.0	True	-73.982879	South Slope\|Windsor Terrace	Caterers	The Movable Feast
40361618	SAL'S DELI	QUEENS	12908.0	20 AVENUE	11356	7.186619e+09	Delicatessen	2014-08-16	Violations were cited in the following area(s).	08C	...	40.781471	12908 20th Ave	sals-deli-college-point	4.5	4.0	False	-73.839287	College Point	Delis\|Sandwiches	Sal's Deli
40361708	BULLY'S DELI	MANHATTAN	759.0	BROADWAY	10003	2.122550e+09	Delicatessen	2014-01-21	Violations were cited in the following area(s).	10J	...	40.730826	759 Broadway	bullys-deli-new-york-13	3.0	110.0	True	-73.992404	Greenwich Village	Delis	Bully's Deli
40361998	STEVE CHU'S DELI & GROCERY	QUEENS	3406.0	10 STREET	11106	7.187269e+09	Delicatessen	2013-03-13	Violations were cited in the following area(s).	10F	...	40.764748	3406 10th St	steve-chus-deli-and-grocery-astoria	4.0	4.0	False	-73.938873	Astoria	Delis	Steve Chu's Deli & Grocery
40362098	HARRIET'S KITCHEN	MANHATTAN	502.0	AMSTERDAM AVENUE	10024	2.127210e+09	American	2014-02-03	Violations were cited in the following area(s).	04M	...	40.786678	502 Amsterdam Ave	harriets-kitchen-new-york	3.0	106.0	True	-73.976074	Upper West Side	American (Traditional)\|Burgers	Harriet's Kitchen
40362264	P & S DELI GROCERY	MANHATTAN	730.0	COLUMBUS AVENUE	10025	2.129323e+09	American	2013-04-08	Violations were cited in the following area(s).	02G	...	40.792419	724 Columbus Ave	ps-deli-and-grocery-manhattan	4.0	2.0	False	-73.967912	Upper West Side	Grocery	PS Deli and Grocery
40362274	ANGELIKA FILM CENTER	MANHATTAN	18.0	WEST HOUSTON STREET	10012	2.129953e+09	American	2013-04-05	Violations were cited in the following area(s).	10F	...	40.725891	18 W Houston St	angelika-film-center-and-cafe-new-york-new-york-2	3.5	396.0	True	-73.996986	Greenwich Village	Cinema\|Cafes	Angelika Film Center & Cafe - New York
40362432	HO MEI RESTAURANT	QUEENS	10305.0	37 AVENUE	11368	7.187797e+09	Chinese	2013-05-02	Violations were cited in the following area(s).	06D	...	40.753552	10305 37th Avenue	ho-mei-corona	5.0	3.0	False	-73.864227	North Corona	Restaurants	Ho Mei
40362715	THE COUNTRY CAFE	MANHATTAN	60.0	WALL STREET	10005	3.474279e+09	Sandwiches/Salads/Mixed Buffet	2013-09-18	Violations were cited in the following area(s).	06B	...	40.706241	60 Wall St	the-country-cafe-new-york-2	2.5	16.0	False	-74.008506	Financial District	Sandwiches\|Cafes\|Breakfast & Brunch	The Country Cafe
40362869	SHASHEMENE INT'L RESTAURA	BROOKLYN	195.0	EAST 56 STREET	11203	3.474301e+09	Caribbean	2013-05-08	Violations were cited in the following area(s).	10B	...	NaN	None	None	NaN	NaN	None	NaN	None	None	None
40363093	CARVEL ICE CREAM	BRONX	1006.0	EAST 233 STREET	10466	7.186527e+09	Ice Cream, Gelato, Yogurt, Ices	2013-08-02	Violations were cited in the following area(s).	06D	...	40.890331	1006 E 233rd St	carvel-bronx-3	2.0	5.0	False	-73.848595	Edenwald	Ice Cream & Frozen Yogurt	Carvel
40363098	DUNKIN' DONUTS	BROOKLYN	56.0	COURT STREET	11201	7.186252e+09	Donuts	2013-01-08	Violations were cited in the following area(s).	08A	...	40.692272	56 Court st	dunkin-donuts-brooklyn-131	1.0	2.0	False	-73.991493	Brooklyn Heights	Donuts\|Coffee & Tea	Dunkin' Donuts
40363289	HAPPY GARDEN	BRONX	1236.0	238 SPOFFORD AVE	10474	7.186172e+09	Chinese	2013-12-30	Violations were cited in the following area(s).	10F	...	40.814002	1236 Spofford Avenue	chinese-fried-chicken-bronx-3	0.0	0.0	False	-73.889529	Hunts Point	Chinese	Chinese Fried Chicken
40363298	CAFE METRO	MANHATTAN	625.0	8 AVENUE	10018	2.127149e+09	American	2013-12-13	Violations were cited in the following area(s).	06D	...	40.756252	625 8th Ave Bsmt 2	cosane-new-york	0.0	0.0	False	-73.990685	Theater District\|Midtown West\|Hell's Kitchen	Coffee & Tea	Cosane
40363333	TONY'S DELI	QUEENS	1069.0	WYCKOFF AVENUE	11385	7.183868e+09	Delicatessen	2013-06-10	Violations were cited in the following area(s).	04N	...	40.694920	1069 Wyckoff Avenue	tonts-too-deli-and-coffee-shop-ridgewood	0.0	0.0	False	-73.902460	Bushwick\|Ridgewood	Delis\|Sandwiches	Tont's Too Deli & Coffee Shop
40363426	LEXLER DELI	MANHATTAN	405.0	LEXINGTON AVENUE	10174	2.126871e+09	Sandwiches/Salads/Mixed Buffet	2013-08-21	Violations were cited in the following area(s).	10F	...	40.751910	405 Lexington Ave	lexler-deli-new-york	4.5	15.0	False	-73.975530	Midtown East	Delis	Lexler Deli
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
50055989	NaN	QUEENS	25109.0	NORTHERN BLVD	11362	3.473282e+09	Other	1900-01-01	NaN	NaN	...	NaN	None	None	NaN	NaN	None	NaN	None	None	None
50055990	RESTAURANT ASSOCIATES INC	MANHATTAN	1177.0	AVENUE OF THE AMERICAS	10036	2.123363e+09	Other	1900-01-01	NaN	NaN	...	NaN	None	None	NaN	NaN	None	NaN	None	None	None
50055996	NaN	MANHATTAN	NaN	CANAL ST	10013	7.184619e+09	Other	1900-01-01	NaN	NaN	...	40.759792	39-15 Main St	alicechencpa-queens	1.0	3.0	False	-73.830063	Flushing\|Downtown Flushing	Financial Services	AliceChenCPA
50055997	NaN	BRONX	1167.0	BOSTON RD	10456	7.184508e+09	Other	1900-01-01	NaN	NaN	...	40.828840	1167 Boston Rd	mikes-pizza-and-pasta-bronx	5.0	1.0	False	-73.903340	Morrisania	Pizza	Mike's Pizza and Pasta
50056005	SANTA PANZA	BROOKLYN	1079.0	BROADWAY	11221	7.186980e+09	Other	1900-01-01	NaN	NaN	...	NaN	None	None	NaN	NaN	None	NaN	None	None	None
50056006	NaN	BROOKLYN	239.0	BRIGHTON BEACH AVE	11235	9.019072e+09	Other	1900-01-01	NaN	NaN	...	NaN	None	None	NaN	NaN	None	NaN	None	None	None
50056007	NaN	MANHATTAN	1.0	E 35TH ST	10016	9.178265e+09	Other	1900-01-01	NaN	NaN	...	NaN	None	None	NaN	NaN	None	NaN	None	None	None
50056009	NaN	QUEENS	15702.0	NORTHERN BLVD	11354	9.176238e+09	Other	1900-01-01	NaN	NaN	...	NaN	None	None	NaN	NaN	None	NaN	None	None	None
50056014	PIZZA ROYALE	BROOKLYN	6716.0	FORT HAMILTON PKWY	11219	7.182385e+09	Other	1900-01-01	NaN	NaN	...	40.629331	6718 Fort Hamilton Pkwy	pizza-royale-brooklyn	3.0	39.0	True	-74.011227	Dyker Heights	Pizza\|Italian	Pizza Royale
50056016	NaN	QUEENS	1526.0	COLLEGE POINT BLVD	11356	9.178734e+09	Other	1900-01-01	NaN	NaN	...	NaN	None	None	NaN	NaN	None	NaN	None	None	None
50056017	VANGUARD WINE BAR	MANHATTAN	1372.0	1ST AVE	10021	2.122889e+09	Other	1900-01-01	NaN	NaN	...	NaN	None	None	NaN	NaN	None	NaN	None	None	None
50056018	NaN	BRONX	3513.0	E TREMONT AVE	10465	7.188220e+09	Other	1900-01-01	NaN	NaN	...	40.830978	3513 E Tremont Ave	legendary-bar-and-restaurant-bronx	4.5	11.0	False	-73.826118	Schuylerville	Bars\|Italian	Legendary Bar & Restaurant
50056020	DOT & LINE	BROOKLYN	253.0	BERGEN ST	11217	3.475178e+08	Other	1900-01-01	NaN	NaN	...	NaN	None	None	NaN	NaN	None	NaN	None	None	None
50056021	NaN	MANHATTAN	2656.0	BROADWAY	10025	7.184900e+09	Other	1900-01-01	NaN	NaN	...	40.797590	2656 Broadway	lava-kitchen-new-york	0.0	0.0	True	-73.969310	Manhattan Valley	Chinese\|Asian Fusion\|Noodles	Lava Kitchen
50056022	NaN	QUEENS	3526.0	FARRINGTON ST	11354	7.184600e+09	Other	1900-01-01	NaN	NaN	...	40.763670	3526 Farrington St	joomakgol-flushing	4.5	47.0	True	-73.831750	Flushing\|Downtown Flushing	Korean	JooMakGol
50056030	NaN	QUEENS	11916.0	ROCKAWAY BLVD	11420	9.178057e+09	Other	1900-01-01	NaN	NaN	...	NaN	None	None	NaN	NaN	None	NaN	None	None	None
50056031	NaN	BROOKLYN	652.0	PENNSYLVANIA AVE	11207	9.174454e+09	Other	1900-01-01	NaN	NaN	...	NaN	None	None	NaN	NaN	None	NaN	None	None	None
50056033	NaN	QUEENS	10837.0	CORONA AVE	11368	9.172952e+09	Other	1900-01-01	NaN	NaN	...	NaN	None	None	NaN	NaN	None	NaN	None	None	None
50056036	NORTHERN CAFE INC.	QUEENS	10113.0	NORTHERN BLVD	11368	3.477838e+09	Other	1900-01-01	NaN	NaN	...	NaN	None	None	NaN	NaN	None	NaN	None	None	None
50056056	NaN	QUEENS	1304.0	BEACH CHANNEL DR	11691	3.478130e+09	Other	1900-01-01	NaN	NaN	...	NaN	None	None	NaN	NaN	None	NaN	None	None	None
50056057	NaN	QUEENS	6418.0	108TH ST	11375	9.173228e+09	Other	1900-01-01	NaN	NaN	...	NaN	None	None	NaN	NaN	None	NaN	None	None	None
50056063	NaN	MANHATTAN	200.0	DYCKMAN ST	10040	3.472959e+09	Other	1900-01-01	NaN	NaN	...	NaN	None	None	NaN	NaN	None	NaN	None	None	None
50056065	NaN	MANHATTAN	4241.0	BROADWAY	10033	6.468212e+09	Other	1900-01-01	NaN	NaN	...	NaN	None	None	NaN	NaN	None	NaN	None	None	None
50056067	HAROLD'S MEAT & THREE	MANHATTAN	231.0	HUDSON ST	10013	3.476752e+09	Other	1900-01-01	NaN	NaN	...	NaN	None	None	NaN	NaN	None	NaN	None	None	None
50056068	XELAJU RESTAURANT	QUEENS	8830.0	PARSONS BLVD	11432	7.186573e+09	Other	1900-01-01	NaN	NaN	...	40.706911	88-30 Parsons Blvd	luna-de-xelaju-restaurant-and-pizzeria-jamaica	2.0	1.0	False	-73.802483	Jamaica	Spanish	Luna de Xelaju Restaurant & Pizzeria
50056069	NaN	MANHATTAN	4484.0	BROADWAY	10040	9.172041e+09	Other	1900-01-01	NaN	NaN	...	NaN	None	None	NaN	NaN	None	NaN	None	None	None
50056073	CULINART INC	MANHATTAN	1.0	UN PLZ	10017	2.127581e+09	Other	1900-01-01	NaN	NaN	...	NaN	None	None	NaN	NaN	None	NaN	None	None	None
50056074	ABRACO	MANHATTAN	81.0	E 7TH ST	10003	3.476102e+09	Other	1900-01-01	NaN	NaN	...	NaN	None	None	NaN	NaN	None	NaN	None	None	None
50056081	NaN	MANHATTAN	232.0	E 111TH ST	10029	9.174841e+09	Other	1900-01-01	NaN	NaN	...	NaN	None	None	NaN	NaN	None	NaN	None	None	None
50056087	NaN	BROOKLYN	293.0	AVENUE X	11223	9.177708e+09	Other	1900-01-01	NaN	NaN	...	NaN	None	None	NaN	NaN	None	NaN	None	None	None

26064 rows × 29 columns

In [251]:

inspections_ffff.to_csv("../data/yelp_dohmh_agg_data.csv", encoding='utf-8')