Simple demo of how to get data out of the Food Standards Agency (FSA). This uses a scraper I've built previously for grabbing data from the FSA into a SQLite database.
To install the scraper, uncomment and run the following:
#%pip install git+https://github.com/ouseful-datasupply/food_gov_uk.git
With the scraper installed, grab the data from the FSA website into a SQLite db (food.sqlite):
#!rm food.sqlite
# The full download takes quite a lot of time (20 mins+); I've raised an issue:
# https://github.com/ouseful-datasupply/food_gov_uk/issues/1
#Please feel free to submit a PR
# If you fancy making PRs, there are also a lot of bare try:except: blocks that should trap specific exceptions.
# It would perhaps also make sense to include a "demo" that just grabs one data file.
# It would also be useful to automagically use a tqdm notebook widget if we are in a notebook,
# though I guess we are calling from the command line here...
!oi_fsa collect --dbname food.sqlite
Check what tables are available:
import pandas as pd
import sqlite3
# Read sqlite query results into a pandas DataFrame
con = sqlite3.connect("food.sqlite")
df = pd.read_sql_query("SELECT name FROM sqlite_master WHERE type='table';", con)
df.head()
name | |
---|---|
0 | fsa_ratings_metadata |
1 | ratingstable |
Preview each table, starting with the fsa_ratings_metadata table:
pd.read_sql_query("SELECT * FROM fsa_ratings_metadata LIMIT 5;", con)
Local authority | Last update | Number of businesses | Link | |
---|---|---|---|---|
0 | Babergh | 20/07/2020 | 864 | http://ratings.food.gov.uk/OpenDataFiles/FHRS2... |
1 | Basildon | 31/07/2020 | 1,213 | http://ratings.food.gov.uk/OpenDataFiles/FHRS1... |
2 | Bedford | 10/07/2020 | 1,419 | http://ratings.food.gov.uk/OpenDataFiles/FHRS7... |
3 | Braintree | 30/07/2020 | 1,145 | http://ratings.food.gov.uk/OpenDataFiles/FHRS1... |
4 | Breckland | 28/07/2020 | 1,276 | http://ratings.food.gov.uk/OpenDataFiles/FHRS2... |
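One thing to note in the metadata: the Number of businesses column includes thousands separators, so it will come through as strings rather than integers. A minimal cleaning sketch with pandas, using made-up sample values for illustration:

```python
import pandas as pd

# Hypothetical sample mimicking the "Number of businesses" column
counts = pd.Series(["864", "1,213", "1,419"])

# Strip the thousands separators, then cast to integers
cleaned = pd.to_numeric(counts.str.replace(",", "", regex=False))
print(cleaned.sum())
```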
And the ratingstable:
pd.read_sql_query("SELECT * FROM ratingstable LIMIT 5;", con)
FHRSID | LocalAuthorityBusinessID | BusinessName | BusinessType | BusinessTypeID | AddressLine1 | AddressLine2 | AddressLine3 | PostCode | RatingValue | ... | LocalAuthorityEmailAddress | SchemeType | NewRatingPending | AddressLine4 | Hygiene | Structural | ConfidenceInManagement | Longitude | Latitude | RightToReply | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 861699 | PI/000109856 | A G Lifestyle | Restaurant/Cafe/Canteen | 1 | 38 Station Road | SUDBURY | Suffolk | CO10 2SS | 5 | ... | foodsafety@baberghmidsuffolk.gov.uk | FHRS | False | None | 5 | 0 | 0 | 0.73160300000000 | 52.03688800000000 | None |
1 | 526543 | PI/000036352 | A Leeder Butchers | Retailers - other | 4613 | 33 Swan Street | Boxford | SUDBURY | CO10 5NZ | 5 | ... | foodsafety@baberghmidsuffolk.gov.uk | FHRS | False | Suffolk | 0 | 0 | 0 | 0.85756600000000 | 52.02934900000000 | None |
2 | 1067344 | PI/000179206 | A Slice of NY | Restaurant/Cafe/Canteen | 1 | 11a Friars Street | SUDBURY | Suffolk | CO10 2AA | 5 | ... | foodsafety@baberghmidsuffolk.gov.uk | FHRS | False | None | 0 | 0 | 0 | 0.72908000000000 | 52.03735200000000 | None |
3 | 526626 | PI/000046693 | A Taste Of Home | Other catering premises | 7841 | None | None | None | None | AwaitingInspection | ... | foodsafety@baberghmidsuffolk.gov.uk | FHRS | False | None | None | None | None | None | None | None |
4 | 526987 | PI/000111131 | Abc Preschool | Hospitals/Childcare/Caring Premises | 5 | ABC Pre School | Country Primary School | Strickmere | CO7 6YG | Exempt | ... | foodsafety@baberghmidsuffolk.gov.uk | FHRS | False | Stratford St Mary | None | None | None | 0.97423600000000 | 51.97134600000000 | None |
5 rows × 25 columns
The ratingstable is the interesting one. Let's just have a guess that there may be things called supermarkets:
ratings_df = pd.read_sql_query("SELECT * FROM ratingstable WHERE BusinessType LIKE '%supermarket%' LIMIT 5;", con)
ratings_df
FHRSID | LocalAuthorityBusinessID | BusinessName | BusinessType | BusinessTypeID | AddressLine1 | AddressLine2 | AddressLine3 | PostCode | RatingValue | ... | LocalAuthorityEmailAddress | SchemeType | NewRatingPending | AddressLine4 | Hygiene | Structural | ConfidenceInManagement | Longitude | Latitude | RightToReply | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 911046 | PI/000174005 | Aldi Foodstore | Retailers - supermarkets/hypermarkets | 7840 | Aldi Store Ltd | 2 Donald Mackintosh Way | Pinewood | IP8 3LQ | 5 | ... | foodsafety@baberghmidsuffolk.gov.uk | FHRS | False | IPSWICH | 5 | 5 | 5 | None | None | None |
1 | 526264 | PI/000011690 | Aldi Foodstore Ltd | Retailers - supermarkets/hypermarkets | 7840 | Girling Street | SUDBURY | Suffolk | CO10 1NB | 5 | ... | foodsafety@baberghmidsuffolk.gov.uk | FHRS | False | None | 5 | 0 | 0 | 0.72920400000000 | 52.04225900000000 | None |
2 | 1025780 | PI/000177356 | Budgens | Retailers - supermarkets/hypermarkets | 7840 | The Pharmacy | Hall Street | Long Melford | CO10 9JG | 5 | ... | foodsafety@baberghmidsuffolk.gov.uk | FHRS | False | SUDBURY | 0 | 0 | 5 | 0.71929200000000 | 52.07931500000000 | None |
3 | 782482 | PI/000170924 | Co-operative Food Group Ltd | Retailers - supermarkets/hypermarkets | 7840 | Co-operative Supermarket | 46 Canhams Road | Great Cornard | CO10 0ER | 5 | ... | foodsafety@baberghmidsuffolk.gov.uk | FHRS | False | SUDBURY | 0 | 0 | 0 | 0.75501000000000 | 52.02886900000000 | None |
4 | 526594 | PI/000039208 | Co-Operative Group Society Ltd | Retailers - supermarkets/hypermarkets | 7840 | 80a High Street | Lavenham | SUDBURY | CO10 9PT | 5 | ... | foodsafety@baberghmidsuffolk.gov.uk | FHRS | False | Suffolk | 0 | 5 | 0 | 0.79535500000000 | 52.10948700000000 | None |
5 rows × 25 columns
What other business types are there?
business_types = pd.read_sql_query("SELECT DISTINCT BusinessType FROM ratingstable;", con)
business_types
BusinessType | |
---|---|
0 | Restaurant/Cafe/Canteen |
1 | Retailers - other |
2 | Other catering premises |
3 | Hospitals/Childcare/Caring Premises |
4 | School/college/university |
5 | Retailers - supermarkets/hypermarkets |
6 | Manufacturers/packers |
7 | Pub/bar/nightclub |
8 | Takeaway/sandwich shop |
9 | Farmers/growers |
10 | Hotel/bed & breakfast/guest house |
11 | Mobile caterer |
12 | Distributors/Transporters |
13 | Importers/Exporters |
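A natural follow-up is to count how many premises there are of each type with a GROUP BY. The query shape is sketched below against a toy in-memory table standing in for ratingstable (the same SQL would run against food.sqlite):

```python
import sqlite3
import pandas as pd

# Toy in-memory stand-in for ratingstable, just to show the query shape
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE ratingstable (BusinessName TEXT, BusinessType TEXT)")
demo.executemany("INSERT INTO ratingstable VALUES (?, ?)",
                 [("A G Lifestyle", "Restaurant/Cafe/Canteen"),
                  ("A Slice of NY", "Restaurant/Cafe/Canteen"),
                  ("Aldi Foodstore", "Retailers - supermarkets/hypermarkets")])

# Count premises per business type, most common first
type_counts = pd.read_sql_query("""
    SELECT BusinessType, COUNT(*) AS Num
    FROM ratingstable
    GROUP BY BusinessType
    ORDER BY Num DESC;
""", demo)
print(type_counts)
```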
You could also filter down on business names:
q='''
SELECT DISTINCT BusinessName, COUNT(BusinessName) AS Num FROM ratingstable
WHERE BusinessType LIKE '%supermarket%'
GROUP BY BusinessName ORDER BY Num DESC;
'''
supermarket_brands = pd.read_sql_query(q, con)
supermarket_brands.head(10)
BusinessName | Num | |
---|---|---|
0 | Tesco | 1664 |
1 | Sainsbury's | 974 |
2 | Lidl | 444 |
3 | Aldi | 421 |
4 | Iceland | 398 |
5 | Co-op | 368 |
6 | Tesco Express | 363 |
7 | Morrisons | 288 |
8 | Asda | 275 |
9 | Marks & Spencer | 214 |
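Note that variants such as Tesco and Tesco Express get counted as separate names. One rough-and-ready way to roll brand variants together is to match on a name prefix with LIKE; again this is a sketch against a toy table, not the real data:

```python
import sqlite3
import pandas as pd

# Toy table with some brand-name variants
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE ratingstable (BusinessName TEXT)")
demo.executemany("INSERT INTO ratingstable VALUES (?)",
                 [("Tesco",), ("Tesco Express",), ("Tesco Extra",), ("Aldi",)])

# Count every row whose name starts with a given brand prefix
tesco_total = pd.read_sql_query(
    "SELECT COUNT(*) AS Num FROM ratingstable WHERE BusinessName LIKE 'Tesco%';",
    demo)
print(tesco_total["Num"][0])
```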
We can map things easily enough:
import folium
q='''
SELECT DISTINCT BusinessName, Latitude, Longitude FROM ratingstable
WHERE BusinessType LIKE '%supermarket%' AND PostCode LIKE 'MK_ %' AND Longitude IS NOT NULL;
'''
mk_supermarkets = pd.read_sql_query(q, con)
mk_supermarkets.head()
BusinessName | Latitude | Longitude | |
---|---|---|---|
0 | Aldi | 52.00415000000000 | -0.79434300000000 |
1 | Aldi | 52.00359344482420 | -0.72759300470352 |
2 | Asda | 52.00531500000000 | -0.72898600000000 |
3 | Co-op | 51.99034900000000 | -0.75381900000000 |
4 | Co-op | 52.00190700000000 | -0.77912200000000 |
Hmm... or maybe not so easily? Something looks off with those location columns, so maybe my scraper is borked :-(
Anyway, here's an example of how to work with what we've got...
mk_supermarkets.dtypes
BusinessName    object
Latitude        object
Longitude       object
dtype: object
Cast the lat/lon to numerics:
mk_supermarkets[['Latitude', 'Longitude']] = mk_supermarkets[['Latitude', 'Longitude']].apply(pd.to_numeric, errors='coerce')
# We could probably alternatively CAST() in the query
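As the comment above suggests, the cast could alternatively be pushed into the query itself with SQLite's CAST(). A sketch against a toy table with string-typed coordinates:

```python
import sqlite3
import pandas as pd

# Toy table mimicking ratingstable's text-typed coordinate columns
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE ratingstable (BusinessName TEXT, Latitude TEXT, Longitude TEXT)")
demo.execute("INSERT INTO ratingstable VALUES ('Aldi', '52.00415000000000', '-0.79434300000000')")

# CAST the text columns to REAL inside the query, so pandas sees floats
cast_df = pd.read_sql_query("""
    SELECT BusinessName,
           CAST(Latitude AS REAL) AS Latitude,
           CAST(Longitude AS REAL) AS Longitude
    FROM ratingstable;
""", demo)
print(cast_df.dtypes["Latitude"])  # float64
```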
Get the average location to center the map:
AVERAGE_LOCATION = mk_supermarkets[['Latitude', 'Longitude']].mean()
AVERAGE_LOCATION
Latitude     52.012905
Longitude    -0.751281
dtype: float64
m = folium.Map(location=AVERAGE_LOCATION.tolist(), width=500, height=800, zoom_start=11)

def add_marker(row, m):
    folium.Circle(location=[row['Latitude'], row['Longitude']],
                  color='red', radius=50, fill=True, fill_opacity=1.0,
                  tooltip=row['BusinessName']).add_to(m)
mk_supermarkets.apply(add_marker, m=m, axis=1)
m
The following is cribbed from https://blog.ouseful.info/2019/03/26/we-need-to-talk-about-geo/ and is a demo at an LA level. You'd probably want to go to lower geography (I'm not sure how low deprivation stats collection areas go? LSOA at least, I think...) but this is just a proof of concept.
There are other examples in Components for Rolling Your Own GIS Inside Jupyter Notebooks showing how to create draggable search areas and run geo-queries; I've also previously done some LSOA examples. The Police API demo has some other interesting examples, like plotting crimes along a route, which could be reversioned here to give a map of food outlets along a road, for example.
Start by grabbing some data. There's possibly an API way somewhere for doing this, or you can do it manually by searching http://geoportal.statistics.gov.uk/ eg for LAs http://geoportal.statistics.gov.uk/datasets/1d78d47c87df4212b79fe2323aae8e08_0
Get a shapefile from ONS geoportal:
import geopandas
#From the downloads area of the page, grab the link for the shapefile download
# This may take some time...
url='https://opendata.arcgis.com/datasets/7ff28788e1e640de8150fb8f35703f6e_2.zip?outSR=%7B%22wkid%22%3A27700%2C%22latestWkid%22%3A27700%7D'
gdf = geopandas.read_file(url)
Get some deprivation data - there may be more recent data available; it'd also be neater if this were done via an API call to something, perhaps on https://opendatacommunities.org/ or similar:
#https://www.gov.uk/government/statistics/english-indices-of-deprivation-2015
#File 10: local authority district summaries
data_url = 'https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/464464/File_10_ID2015_Local_Authority_District_Summaries.xlsx'
#Download and read in the deprivation data Excel file
df = pd.read_excel(data_url, sheet_name=None)
#Preview the name of the sheets in the data loaded from the Excel file
df.keys()
dict_keys(['Notes', 'IMD', 'Income', 'Employment', 'Education', 'Health', 'Crime', 'Barriers', 'Living', 'IDACI', 'IDAOPI'])
Merge boundaries and deprivation data:
#Merge in data
gdf = pd.merge(gdf, df['Education'],
               how='inner', #The type of join (what happens if data is in one dataset and not the other)
               left_on='lad16cd', #Column we're merging on in left dataframe
               right_on='Local Authority District code (2013)' #Column we're merging on in right dataframe
              )
Preview the merged data:
gdf.head()
objectid | lad16cd | lad16nm | lad16nmw | bng_e | bng_n | long | lat | st_areasha | st_lengths | geometry | Local Authority District code (2013) | Local Authority District name (2013) | Education, Skills and Training - Average rank | Education, Skills and Training - Rank of average rank | Education, Skills and Training - Average score | Education, Skills and Training - Rank of average score | Education, Skills and Training - Proportion of LSOAs in most deprived 10% nationally | Education, Skills and Training - Rank of proportion of LSOAs in most deprived 10% nationally | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | E06000001 | Hartlepool | None | 447157 | 531476 | -1.27023 | 54.676159 | 9.359786e+07 | 69382.685924 | MULTIPOLYGON (((447097.001 537152.001, 447228.... | E06000001 | Hartlepool | 20101.48 | 72 | 30.510 | 47 | 0.2069 | 37 |
1 | 2 | E06000002 | Middlesbrough | None | 451141 | 516887 | -1.21099 | 54.544670 | 5.387900e+07 | 42085.584812 | MULTIPOLYGON (((449861.900 521260.700, 449853.... | E06000002 | Middlesbrough | 22728.01 | 24 | 40.640 | 3 | 0.4419 | 1 |
2 | 3 | E06000003 | Redcar and Cleveland | None | 464359 | 519597 | -1.00611 | 54.567520 | 2.448388e+08 | 96189.660709 | MULTIPOLYGON (((455776.701 528322.499, 455684.... | E06000003 | Redcar and Cleveland | 19185.28 | 95 | 27.875 | 71 | 0.1818 | 54 |
3 | 4 | E06000004 | Stockton-on-Tees | None | 444937 | 518183 | -1.30669 | 54.556911 | 2.049366e+08 | 115439.477112 | MULTIPOLYGON (((444126.099 528005.799, 444165.... | E06000004 | Stockton-on-Tees | 16660.09 | 150 | 24.637 | 110 | 0.1750 | 59 |
4 | 5 | E06000005 | Darlington | None | 428029 | 515649 | -1.56835 | 54.535351 | 1.974820e+08 | 105799.568559 | POLYGON ((423475.701 524731.596, 423497.204 52... | E06000005 | Darlington | 16385.06 | 155 | 22.569 | 129 | 0.1385 | 75 |
Example of plotting boundary overlaid on previous map (this will probably clobber the tooltip/popup; choropleth should be plotted first or on a lower layer):
folium.Choropleth(gdf[gdf['lad16nm']=='Milton Keynes'], key_on='feature.properties.lad16cd',
data=df['Education'],
columns=['Local Authority District code (2013)',
'Education, Skills and Training - Rank of average rank'],
fill_color='YlOrBr').add_to(m)
m
Here's the start of a simple recipe for finding areas that neighbour a particular area:
#Via https://gis.stackexchange.com/a/300262/119781
def plotNeighbours(gdf, region='Milton Keynes',
                   indicator='Education, Skills and Training - Rank of average rank',
                   cmap='OrRd'):
    ''' Plot a choropleth for an indicator over a specified region and its neighbours. '''
    targetBoundary = gdf[gdf['lad16nm']==region]['geometry'].values[0]

    # Boolean mask selecting the region itself and any area that touches its boundary
    neighbours = gdf.apply(lambda row: row['geometry'].touches(targetBoundary) or row['geometry']==targetBoundary,
                           axis=1)

    # Plot the selected areas, shaded by the indicator value
    ax = gdf[neighbours].plot(column=indicator, cmap=cmap, legend=True)
    ax.set_axis_off()
    return ax
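The touches() test at the heart of the neighbour-finding recipe can be seen on a pair of toy shapely polygons (shapely comes along with geopandas, so it should already be installed):

```python
from shapely.geometry import Polygon

# Two unit squares sharing an edge, and a third square off on its own
a = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
b = Polygon([(1, 0), (2, 0), (2, 1), (1, 1)])  # shares the x=1 edge with a
c = Polygon([(5, 5), (6, 5), (6, 6), (5, 6)])  # disjoint from both

print(a.touches(b))  # True: shared boundary, no interior overlap
print(a.touches(c))  # False: no shared boundary
```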
Close the database connection:
con.close()