Simple demo of how to get data out of the Food Standards Agency (FSA). This uses a scraper I've built previously for grabbing data from the FSA into a SQLite database.
To install the scraper, uncomment and run the following:
#%pip install git+https://github.com/ouseful-datasupply/food_gov_uk.git
With the scraper installed, grab the data from the FSA website into a SQLite db (food.sqlite):
#!rm food.sqlite
# The full download takes quite a lot of time (20 mins+); I've raised an issue:
# https://github.com/ouseful-datasupply/food_gov_uk/issues/1
#Please feel free to submit a PR
# If you fancy making PRs, there are also a lot of bare try:except: blocks that should trap specific exceptions.
# It would perhaps also make sense to include a "demo" that just grabs one data file.
# It would also be useful to automagically use a tqdm notebook widget if we are in a notebook,
# though I guess we are calling from the command line here...
!oi_fsa collect --dbname food.sqlite
Check what tables are available:
import pandas as pd
import sqlite3
# Read sqlite query results into a pandas DataFrame
con = sqlite3.connect("food.sqlite")
df = pd.read_sql_query("SELECT name FROM sqlite_master WHERE type='table';", con)
df.head()
name | |
---|---|
0 | fsa_ratings_metadata |
1 | ratingstable |
Preview each table, starting with the fsa_ratings_metadata table:
pd.read_sql_query("SELECT * FROM fsa_ratings_metadata LIMIT 5;", con)
Local authority | Last update | Number of businesses | Link | |
---|---|---|---|---|
0 | Babergh | 20/07/2020 | 864 | http://ratings.food.gov.uk/OpenDataFiles/FHRS2... |
1 | Basildon | 31/07/2020 | 1,213 | http://ratings.food.gov.uk/OpenDataFiles/FHRS1... |
2 | Bedford | 10/07/2020 | 1,419 | http://ratings.food.gov.uk/OpenDataFiles/FHRS7... |
3 | Braintree | 30/07/2020 | 1,145 | http://ratings.food.gov.uk/OpenDataFiles/FHRS1... |
4 | Breckland | 28/07/2020 | 1,276 | http://ratings.food.gov.uk/OpenDataFiles/FHRS2... |
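One thing to note in the metadata: the Number of businesses column includes thousands separators, so it will come through as strings rather than integers. A minimal cleaning sketch with pandas, using made-up sample values for illustration:

```python
import pandas as pd

# Hypothetical sample mimicking the "Number of businesses" column
counts = pd.Series(["864", "1,213", "1,419"])

# Strip the thousands separators, then cast to integers
cleaned = pd.to_numeric(counts.str.replace(",", "", regex=False))
print(cleaned.sum())
```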
And the ratingstable:
pd.read_sql_query("SELECT * FROM ratingstable LIMIT 5;", con)
FHRSID | LocalAuthorityBusinessID | BusinessName | BusinessType | BusinessTypeID | AddressLine1 | AddressLine2 | AddressLine3 | PostCode | RatingValue | ... | LocalAuthorityEmailAddress | SchemeType | NewRatingPending | AddressLine4 | Hygiene | Structural | ConfidenceInManagement | Longitude | Latitude | RightToReply | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 861699 | PI/000109856 | A G Lifestyle | Restaurant/Cafe/Canteen | 1 | 38 Station Road | SUDBURY | Suffolk | CO10 2SS | 5 | ... | foodsafety@baberghmidsuffolk.gov.uk | FHRS | False | None | 5 | 0 | 0 | 0.73160300000000 | 52.03688800000000 | None |
1 | 526543 | PI/000036352 | A Leeder Butchers | Retailers - other | 4613 | 33 Swan Street | Boxford | SUDBURY | CO10 5NZ | 5 | ... | foodsafety@baberghmidsuffolk.gov.uk | FHRS | False | Suffolk | 0 | 0 | 0 | 0.85756600000000 | 52.02934900000000 | None |
2 | 1067344 | PI/000179206 | A Slice of NY | Restaurant/Cafe/Canteen | 1 | 11a Friars Street | SUDBURY | Suffolk | CO10 2AA | 5 | ... | foodsafety@baberghmidsuffolk.gov.uk | FHRS | False | None | 0 | 0 | 0 | 0.72908000000000 | 52.03735200000000 | None |
3 | 526626 | PI/000046693 | A Taste Of Home | Other catering premises | 7841 | None | None | None | None | AwaitingInspection | ... | foodsafety@baberghmidsuffolk.gov.uk | FHRS | False | None | None | None | None | None | None | None |
4 | 526987 | PI/000111131 | Abc Preschool | Hospitals/Childcare/Caring Premises | 5 | ABC Pre School | Country Primary School | Strickmere | CO7 6YG | Exempt | ... | foodsafety@baberghmidsuffolk.gov.uk | FHRS | False | Stratford St Mary | None | None | None | 0.97423600000000 | 51.97134600000000 | None |
5 rows × 25 columns
The ratingstable is the interesting one. Let's just have a guess that there may be things called supermarkets:
ratings_df = pd.read_sql_query("SELECT * FROM ratingstable WHERE BusinessType LIKE '%supermarket%' LIMIT 5;", con)
ratings_df
FHRSID | LocalAuthorityBusinessID | BusinessName | BusinessType | BusinessTypeID | AddressLine1 | AddressLine2 | AddressLine3 | PostCode | RatingValue | ... | LocalAuthorityEmailAddress | SchemeType | NewRatingPending | AddressLine4 | Hygiene | Structural | ConfidenceInManagement | Longitude | Latitude | RightToReply | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 911046 | PI/000174005 | Aldi Foodstore | Retailers - supermarkets/hypermarkets | 7840 | Aldi Store Ltd | 2 Donald Mackintosh Way | Pinewood | IP8 3LQ | 5 | ... | foodsafety@baberghmidsuffolk.gov.uk | FHRS | False | IPSWICH | 5 | 5 | 5 | None | None | None |
1 | 526264 | PI/000011690 | Aldi Foodstore Ltd | Retailers - supermarkets/hypermarkets | 7840 | Girling Street | SUDBURY | Suffolk | CO10 1NB | 5 | ... | foodsafety@baberghmidsuffolk.gov.uk | FHRS | False | None | 5 | 0 | 0 | 0.72920400000000 | 52.04225900000000 | None |
2 | 1025780 | PI/000177356 | Budgens | Retailers - supermarkets/hypermarkets | 7840 | The Pharmacy | Hall Street | Long Melford | CO10 9JG | 5 | ... | foodsafety@baberghmidsuffolk.gov.uk | FHRS | False | SUDBURY | 0 | 0 | 5 | 0.71929200000000 | 52.07931500000000 | None |
3 | 782482 | PI/000170924 | Co-operative Food Group Ltd | Retailers - supermarkets/hypermarkets | 7840 | Co-operative Supermarket | 46 Canhams Road | Great Cornard | CO10 0ER | 5 | ... | foodsafety@baberghmidsuffolk.gov.uk | FHRS | False | SUDBURY | 0 | 0 | 0 | 0.75501000000000 | 52.02886900000000 | None |
4 | 526594 | PI/000039208 | Co-Operative Group Society Ltd | Retailers - supermarkets/hypermarkets | 7840 | 80a High Street | Lavenham | SUDBURY | CO10 9PT | 5 | ... | foodsafety@baberghmidsuffolk.gov.uk | FHRS | False | Suffolk | 0 | 5 | 0 | 0.79535500000000 | 52.10948700000000 | None |
5 rows × 25 columns
What other business types are there?
business_types = pd.read_sql_query("SELECT DISTINCT BusinessType FROM ratingstable;", con)
business_types
BusinessType | |
---|---|
0 | Restaurant/Cafe/Canteen |
1 | Retailers - other |
2 | Other catering premises |
3 | Hospitals/Childcare/Caring Premises |
4 | School/college/university |
5 | Retailers - supermarkets/hypermarkets |
6 | Manufacturers/packers |
7 | Pub/bar/nightclub |
8 | Takeaway/sandwich shop |
9 | Farmers/growers |
10 | Hotel/bed & breakfast/guest house |
11 | Mobile caterer |
12 | Distributors/Transporters |
13 | Importers/Exporters |
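A natural follow-up is to count how many premises there are of each type with a GROUP BY. The query shape is sketched below against a toy in-memory table standing in for ratingstable (the same SQL would run against food.sqlite):

```python
import sqlite3
import pandas as pd

# Toy in-memory stand-in for ratingstable, just to show the query shape
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE ratingstable (BusinessName TEXT, BusinessType TEXT)")
demo.executemany("INSERT INTO ratingstable VALUES (?, ?)",
                 [("A G Lifestyle", "Restaurant/Cafe/Canteen"),
                  ("A Slice of NY", "Restaurant/Cafe/Canteen"),
                  ("Aldi Foodstore", "Retailers - supermarkets/hypermarkets")])

# Count premises per business type, most common first
type_counts = pd.read_sql_query("""
    SELECT BusinessType, COUNT(*) AS Num
    FROM ratingstable
    GROUP BY BusinessType
    ORDER BY Num DESC;
""", demo)
print(type_counts)
```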
You could also filter down on business names:
q='''
SELECT DISTINCT BusinessName, COUNT(BusinessName) AS Num FROM ratingstable
WHERE BusinessType LIKE '%supermarket%'
GROUP BY BusinessName ORDER BY Num DESC;
'''
supermarket_brands = pd.read_sql_query(q, con)
supermarket_brands.head(10)
BusinessName | Num | |
---|---|---|
0 | Tesco | 1664 |
1 | Sainsbury's | 974 |
2 | Lidl | 444 |
3 | Aldi | 421 |
4 | Iceland | 398 |
5 | Co-op | 368 |
6 | Tesco Express | 363 |
7 | Morrisons | 288 |
8 | Asda | 275 |
9 | Marks & Spencer | 214 |
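Note that variants such as Tesco and Tesco Express get counted as separate names. One rough-and-ready way to roll brand variants together is to match on a name prefix with LIKE; again this is a sketch against a toy table, not the real data:

```python
import sqlite3
import pandas as pd

# Toy table with some brand-name variants
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE ratingstable (BusinessName TEXT)")
demo.executemany("INSERT INTO ratingstable VALUES (?)",
                 [("Tesco",), ("Tesco Express",), ("Tesco Extra",), ("Aldi",)])

# Count every row whose name starts with a given brand prefix
tesco_total = pd.read_sql_query(
    "SELECT COUNT(*) AS Num FROM ratingstable WHERE BusinessName LIKE 'Tesco%';",
    demo)
print(tesco_total["Num"][0])
```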
We can map things easily enough:
import folium
q='''
SELECT DISTINCT BusinessName, Latitude, Longitude FROM ratingstable
WHERE BusinessType LIKE '%supermarket%' AND PostCode LIKE 'MK_ %' AND Longitude IS NOT NULL;
'''
mk_supermarkets = pd.read_sql_query(q, con)
mk_supermarkets.head()
BusinessName | Latitude | Longitude | |
---|---|---|---|
0 | Aldi | 52.00415000000000 | -0.79434300000000 |
1 | Aldi | 52.00359344482420 | -0.72759300470352 |
2 | Asda | 52.00531500000000 | -0.72898600000000 |
3 | Co-op | 51.99034900000000 | -0.75381900000000 |
4 | Co-op | 52.00190700000000 | -0.77912200000000 |
Hmm... or maybe not so easily? Something looks off with those location columns, so maybe my scraper is borked :-(
Anyway, here's an example of how to work with what we've got...
mk_supermarkets.dtypes
BusinessName    object
Latitude        object
Longitude       object
dtype: object
Cast the lat/lon to numerics:
mk_supermarkets[['Latitude', 'Longitude']] = mk_supermarkets[['Latitude', 'Longitude']].apply(pd.to_numeric, errors='coerce')
# We could probably alternatively CAST() in the query
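As the comment above suggests, the cast could alternatively be pushed into the query itself with SQLite's CAST(). A sketch against a toy table with string-typed coordinates:

```python
import sqlite3
import pandas as pd

# Toy table mimicking ratingstable's text-typed coordinate columns
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE ratingstable (BusinessName TEXT, Latitude TEXT, Longitude TEXT)")
demo.execute("INSERT INTO ratingstable VALUES ('Aldi', '52.00415000000000', '-0.79434300000000')")

# CAST the text columns to REAL inside the query, so pandas sees floats
cast_df = pd.read_sql_query("""
    SELECT BusinessName,
           CAST(Latitude AS REAL) AS Latitude,
           CAST(Longitude AS REAL) AS Longitude
    FROM ratingstable;
""", demo)
print(cast_df.dtypes["Latitude"])  # float64
```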
Get the average location to center the map:
AVERAGE_LOCATION = mk_supermarkets[['Latitude', 'Longitude']].mean()
AVERAGE_LOCATION
Latitude     52.012905
Longitude    -0.751281
dtype: float64
m = folium.Map(location=AVERAGE_LOCATION.tolist(), width=500, height=800, zoom_start=11)

def add_marker(row, m):
    folium.Circle(location=[row['Latitude'], row['Longitude']],
                  color='red', radius=50, fill=True, fill_opacity=1.0,
                  tooltip=row['BusinessName']).add_to(m)
mk_supermarkets.apply(add_marker, m=m, axis=1)
m
The following is cribbed from https://blog.ouseful.info/2019/03/26/we-need-to-talk-about-geo/ and is a demo at an LA level. You'd probably want to go to lower geography (I'm not sure how low deprivation stats collection areas go? LSOA at least, I think...) but this is just a proof of concept.
There are other examples in Components for Rolling Your Own GIS Inside Jupyter Notebooks showing how to create draggable search areas and run geo-queries; I've also previously done some LSOA examples. The Police API demo has some other interesting examples, like plotting crimes along a route, which could be reversioned here to give a map of food outlets along a road, for example.
Start by grabbing some data. There's possibly an API way somewhere for doing this, or you can do it manually by searching http://geoportal.statistics.gov.uk/ eg for LAs http://geoportal.statistics.gov.uk/datasets/1d78d47c87df4212b79fe2323aae8e08_0
Get a shapefile from ONS geoportal:
import geopandas
#From the downloads area of the page, grab the link for the shapefile download
# This may take some time...
url='https://opendata.arcgis.com/datasets/7ff28788e1e640de8150fb8f35703f6e_2.zip?outSR=%7B%22wkid%22%3A27700%2C%22latestWkid%22%3A27700%7D'
gdf = geopandas.read_file(url)
Get some deprivation data - there may be more recent data available; it'd also be neater if this were done via an API call to something, perhaps on https://opendatacommunities.org/ or similar:
#https://www.gov.uk/government/statistics/english-indices-of-deprivation-2015
#File 10: local authority district summaries
data_url = 'https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/464464/File_10_ID2015_Local_Authority_District_Summaries.xlsx'
#Download and read in the deprivation data Excel file
df = pd.read_excel(data_url, sheet_name=None)
#Preview the name of the sheets in the data loaded from the Excel file
df.keys()
dict_keys(['Notes', 'IMD', 'Income', 'Employment', 'Education', 'Health', 'Crime', 'Barriers', 'Living', 'IDACI', 'IDAOPI'])
Merge boundaries and deprivation data:
#Merge in data
gdf = pd.merge(gdf, df['Education'],
               how='inner', #The type of join (what happens if data is in one dataset and not the other)
               left_on='lad16cd', #Column we're merging on in left dataframe
               right_on='Local Authority District code (2013)' #Column we're merging on in right dataframe
              )
Preview the merged data:
gdf.head()
objectid | lad16cd | lad16nm | lad16nmw | bng_e | bng_n | long | lat | st_areasha | st_lengths | geometry | Local Authority District code (2013) | Local Authority District name (2013) | Education, Skills and Training - Average rank | Education, Skills and Training - Rank of average rank | Education, Skills and Training - Average score | Education, Skills and Training - Rank of average score | Education, Skills and Training - Proportion of LSOAs in most deprived 10% nationally | Education, Skills and Training - Rank of proportion of LSOAs in most deprived 10% nationally | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | E06000001 | Hartlepool | None | 447157 | 531476 | -1.27023 | 54.676159 | 9.359786e+07 | 69382.685924 | MULTIPOLYGON (((447097.001 537152.001, 447228.... | E06000001 | Hartlepool | 20101.48 | 72 | 30.510 | 47 | 0.2069 | 37 |
1 | 2 | E06000002 | Middlesbrough | None | 451141 | 516887 | -1.21099 | 54.544670 | 5.387900e+07 | 42085.584812 | MULTIPOLYGON (((449861.900 521260.700, 449853.... | E06000002 | Middlesbrough | 22728.01 | 24 | 40.640 | 3 | 0.4419 | 1 |
2 | 3 | E06000003 | Redcar and Cleveland | None | 464359 | 519597 | -1.00611 | 54.567520 | 2.448388e+08 | 96189.660709 | MULTIPOLYGON (((455776.701 528322.499, 455684.... | E06000003 | Redcar and Cleveland | 19185.28 | 95 | 27.875 | 71 | 0.1818 | 54 |
3 | 4 | E06000004 | Stockton-on-Tees | None | 444937 | 518183 | -1.30669 | 54.556911 | 2.049366e+08 | 115439.477112 | MULTIPOLYGON (((444126.099 528005.799, 444165.... | E06000004 | Stockton-on-Tees | 16660.09 | 150 | 24.637 | 110 | 0.1750 | 59 |
4 | 5 | E06000005 | Darlington | None | 428029 | 515649 | -1.56835 | 54.535351 | 1.974820e+08 | 105799.568559 | POLYGON ((423475.701 524731.596, 423497.204 52... | E06000005 | Darlington | 16385.06 | 155 | 22.569 | 129 | 0.1385 | 75 |
Example of plotting boundary overlaid on previous map (this will probably clobber the tooltip/popup; choropleth should be plotted first or on a lower layer):
folium.Choropleth(gdf[gdf['lad16nm']=='Milton Keynes'], key_on='feature.properties.lad16cd',
data=df['Education'],
columns=['Local Authority District code (2013)',
'Education, Skills and Training - Rank of average rank'],
fill_color='YlOrBr').add_to(m)
m
Here's the start of a simple recipe for finding areas that neighbour a particular area:
#Via https://gis.stackexchange.com/a/300262/119781
def plotNeighbours(gdf, region='Milton Keynes',
                   indicator='Education, Skills and Training - Rank of average rank',
                   cmap='OrRd'):
    ''' Plot a choropleth for an indicator over a specified region and its neighbours. '''
    targetBoundary = gdf[gdf['lad16nm']==region]['geometry'].values[0]

    # Boolean mask selecting the region itself and any area that touches its boundary
    neighbours = gdf.apply(lambda row: row['geometry'].touches(targetBoundary) or row['geometry']==targetBoundary,
                           axis=1)

    # Plot the selected areas, shaded by the indicator value
    ax = gdf[neighbours].plot(column=indicator, cmap=cmap, legend=True)
    ax.set_axis_off()
    return ax
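The touches() test at the heart of the neighbour-finding recipe can be seen on a pair of toy shapely polygons (shapely comes along with geopandas, so it should already be installed):

```python
from shapely.geometry import Polygon

# Two unit squares sharing an edge, and a third square off on its own
a = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
b = Polygon([(1, 0), (2, 0), (2, 1), (1, 1)])  # shares the x=1 edge with a
c = Polygon([(5, 5), (6, 5), (6, 6), (5, 6)])  # disjoint from both

print(a.touches(b))  # True: shared boundary, no interior overlap
print(a.touches(c))  # False: no shared boundary
```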
Close the database connection:
con.close()