MK Spending Data

Pulling together spending data from Milton Keynes Council.

We need to start by pulling down the data files and getting the data into a single data table.

Spending Data Prior to 2014/15

In [8]:
!mkdir mkdata
In [6]:
import requests, re, urllib.request
from bs4 import BeautifulSoup


def linkgrabber_MK(response):
    soup=BeautifulSoup(response.content)

    results=[]

    #Get the list of results
    resultsblock=soup.find('div',{'class':'content right'})

    for result in resultsblock.findAll('a',text=re.compile('^Council Spend Data.*')):
        csvurl=result.attrs['href']
        fn=result.attrs['href'].split('/')[-1]
        results.append((csvurl,fn))
    return results

def harvester(links,path='.'):
    for csvurl,fn in links:
        print('Grabbing {0} as {1}'.format(csvurl, fn))
        urllib.request.urlretrieve(csvurl.strip('/'), '/'.join([path,fn]))
In [9]:
url="http://www.milton-keynes.gov.uk/your-council-and-elections/council-information-and-accounts/data-performance-and-spending/Milton-Keynes-Council-spend-archive"
html2= requests.get(url)
cc=linkgrabber_MK(html2)
harvester(cc,'mkdata')
Grabbing http://www.milton-keynes.gov.uk/assets/attach/18898/03-2014.csv as 03-2014.csv
Grabbing http://www.milton-keynes.gov.uk/assets/attach/18178/02-2014.csv as 02-2014.csv
Grabbing http://www.milton-keynes.gov.uk/assets/attach/17750/01-2014.csv as 01-2014.csv
Grabbing http://www.milton-keynes.gov.uk/assets/attach/17149/12-2013.csv as 12-2013.csv
Grabbing http://www.milton-keynes.gov.uk/assets/attach/16719/11-2013.csv as 11-2013.csv
Grabbing http://www.milton-keynes.gov.uk/assets/attach/16333/10-2013.csv as 10-2013.csv
Grabbing http://www.milton-keynes.gov.uk/assets/attach/15845/09-2013.csv as 09-2013.csv
Grabbing http://www.milton-keynes.gov.uk/assets/attach/15393/Spend%20data%2008-2013.csv as Spend%20data%2008-2013.csv
Grabbing http://www.milton-keynes.gov.uk/assets/attach/15236/07-2013.csv as 07-2013.csv
Grabbing http://www.milton-keynes.gov.uk/assets/attach/15038/06-2013.csv as 06-2013.csv
Grabbing http://www.milton-keynes.gov.uk/assets/attach/14869/05-2013(1).csv as 05-2013(1).csv
Grabbing http://www.milton-keynes.gov.uk/assets/attach/14535/04-2013.csv as 04-2013.csv
Grabbing http://www.milton-keynes.gov.uk/assets/attach/14287/03-2013.csv as 03-2013.csv
Grabbing http://www.milton-keynes.gov.uk/assets/attach/14069/02-2013.csv as 02-2013.csv
Grabbing http://www.milton-keynes.gov.uk/assets/attach/13688/01_2013.csv as 01_2013.csv
Grabbing http://www.milton-keynes.gov.uk/assets/attach/13291/12-2012.csv as 12-2012.csv
Grabbing http://www.milton-keynes.gov.uk/assets/attach/13011/11-2012.csv as 11-2012.csv
Grabbing http://www.milton-keynes.gov.uk/assets/attach/12815/Council_Spend_Data_October_2012.csv as Council_Spend_Data_October_2012.csv
Grabbing http://www.milton-keynes.gov.uk/assets/attach/12569/Council_Spend_September_2012.csv as Council_Spend_September_2012.csv
Grabbing http://www.milton-keynes.gov.uk/assets/attach/12410/August_Spend_2012.csv as August_Spend_2012.csv
Grabbing http://www.milton-keynes.gov.uk/assets/attach/12085/Council_Spend_Data_July_2012.csv as Council_Spend_Data_July_2012.csv
Grabbing http://www.milton-keynes.gov.uk/assets/attach/11674/Council_Spend_Data_June_2012.csv as Council_Spend_Data_June_2012.csv
Grabbing http://www.milton-keynes.gov.uk/assets/attach/11384/05-2012.csv as 05-2012.csv
Grabbing http://www.milton-keynes.gov.uk/assets/attach/11055/04-2012.csv as 04-2012.csv
Grabbing http://www.milton-keynes.gov.uk/assets/attach/10857/10-2011_to_03-2012.csv as 10-2011_to_03-2012.csv
Grabbing http://www.milton-keynes.gov.uk/assets/attach/10856/01-2011_to_09-2011.csv as 01-2011_to_09-2011.csv
Grabbing http://www.milton-keynes.gov.uk/assets/attach/14534/03-2011(1).csv as 03-2011(1).csv
Grabbing http://www.milton-keynes.gov.uk/assets/attach/14533/02-2011(1).csv as 02-2011(1).csv
Grabbing http://www.milton-keynes.gov.uk/assets/attach/14532/01-2011(2).csv as 01-2011(2).csv
Grabbing http://www.milton-keynes.gov.uk/assets/attach/7023/12-2010.csv as 12-2010.csv
Grabbing http://www.milton-keynes.gov.uk/assets/attach/7022/11-2010.csv as 11-2010.csv
Grabbing http://www.milton-keynes.gov.uk/assets/attach/7021/10-2010.csv as 10-2010.csv
In [14]:
def linkgrabber_MK2(response):
    soup=BeautifulSoup(response.content)

    results=[]

    #Get the list of results
    resultsblock=soup.find('ul',{'class':'documents'})

    for result in resultsblock.findAll('a',href=re.compile('.*\.csv$')):
        csvurl=result.attrs['href']
        csvurl='http://www.milton-keynes.gov.uk/'+csvurl
        fn=result.attrs['href'].split('/')[-1]
        results.append((csvurl,fn))
    return results
In [15]:
url='http://www.milton-keynes.gov.uk/your-council-and-elections/council-information-and-accounts/data-performance-and-spending/milton-keynes-council-spend'
html2= requests.get(url)
cc=linkgrabber_MK2(html2)
harvester(cc,'mkdata')
Grabbing http://www.milton-keynes.gov.uk//assets/attach/22476/11-2014.csv as 11-2014.csv
Grabbing http://www.milton-keynes.gov.uk//assets/attach/22178/10-2014.csv as 10-2014.csv
Grabbing http://www.milton-keynes.gov.uk//assets/attach/21862/09-2014.csv as 09-2014.csv
Grabbing http://www.milton-keynes.gov.uk//assets/attach/21228/08-2014.csv as 08-2014.csv
Grabbing http://www.milton-keynes.gov.uk//assets/attach/20768/07-2014.csv as 07-2014.csv
Grabbing http://www.milton-keynes.gov.uk//assets/attach/20306/06-2014.csv as 06-2014.csv
Grabbing http://www.milton-keynes.gov.uk//assets/attach/19870/05-2014.csv as 05-2014.csv
Grabbing http://www.milton-keynes.gov.uk//assets/attach/19433/04-2014.csv as 04-2014.csv

Spending Data for Current Financial Year (2014/15)

In [16]:
!head mkdata/04-2014.csv








Filetype Audit

In [18]:
#derived from /usr/local/bin/chardetect.py
import os
from chardet.universaldetector import UniversalDetector
def sniffer(fname):
    file=open(fname, 'rb')
    u = UniversalDetector()
    for line in file:
        u.feed(line)
    u.close()
    result = u.result
    if result['encoding']:
        return (result['encoding'], result['confidence'])

for fname in os.listdir('mkdata'):
    if fname.endswith('csv'):
        print(fname,sniffer('mkdata/'+fname))
01-2011(2).csv ('ascii', 1.0)
01-2011_to_09-2011.csv ('ascii', 1.0)
01-2014.csv ('ascii', 1.0)
01_2013.csv ('ascii', 1.0)
02-2011(1).csv ('ascii', 1.0)
02-2013.csv ('ascii', 1.0)
02-2014.csv ('ascii', 1.0)
03-2011(1).csv ('ascii', 1.0)
03-2013.csv ('ascii', 1.0)
03-2014.csv ('ascii', 1.0)
04-2012.csv ('ascii', 1.0)
04-2013.csv ('ascii', 1.0)
04-2014.csv ('ascii', 1.0)
05-2012.csv ('ascii', 1.0)
05-2013(1).csv ('ascii', 1.0)
05-2014.csv ('ascii', 1.0)
06-2013.csv ('ascii', 1.0)
06-2014.csv ('ascii', 1.0)
07-2013.csv ('ascii', 1.0)
07-2014.csv ('ascii', 1.0)
08-2014.csv ('ascii', 1.0)
09-2013.csv ('ascii', 1.0)
09-2014.csv ('ISO-8859-2', 0.7888398211101958)
10-2010.csv ('ascii', 1.0)
10-2011_to_03-2012.csv ('ascii', 1.0)
10-2013.csv ('ascii', 1.0)
10-2014.csv ('ascii', 1.0)
11-2010.csv ('ascii', 1.0)
11-2012.csv ('ascii', 1.0)
11-2013.csv ('ascii', 1.0)
11-2014.csv ('ascii', 1.0)
12-2010.csv ('ascii', 1.0)
12-2012.csv ('ascii', 1.0)
12-2013.csv ('ascii', 1.0)
August_Spend_2012.csv ('ascii', 1.0)
Council_Spend_Data_July_2012.csv ('ascii', 1.0)
Council_Spend_Data_June_2012.csv ('ascii', 1.0)
Council_Spend_Data_October_2012.csv ('ascii', 1.0)
Council_Spend_September_2012.csv ('ascii', 1.0)
Spend%20data%2008-2013.csv ('ascii', 1.0)

Create a Single Data Table

In [ ]:
import pandas as pd
In [26]:
dd=pd.DataFrame()
for fname in os.listdir('mkdata/'):
    enc,cert=sniffer('mkdata/'+fname)
    dd=pd.concat([dd,pd.read_csv('mkdata/'+fname,encoding=enc)])
In [27]:
dd.to_csv('mkSpendingData.csv',index=False)

Cast Appropriate Data Types

In [28]:
dd.dtypes
Out[28]:
Body name                           object
Service area categorisation         object
Service division categorisation     object
Expenses type                       object
Date                                object
Transaction number                   int64
Amount                             float64
Supplier name                       object
dtype: object
In [29]:
from dateutil.parser import *
dd['Date']=dd['Date'].apply(parse)

Start to Look at the Data

Quick preview of what the datatable looks like:

In [39]:
dd[:5]
Out[39]:
Body name Service area categorisation Service division categorisation Expenses type Date Transaction number Amount Supplier name Cumul Amount Financial Year
0 Milton Keynes Council Environment Infrastructure Supplies and services 2010-01-10 5100473631 8432.64 A1 Security Home Counties (Uk) Ltd 8432.64 2009
1 Milton Keynes Council CHILDREN AND YOUNG PEOPLE'S SERVICE Early Years and Extended Services Premises-related expenditure 2010-01-10 5100470016 75.00 Acorn (M K) Nurseries 8507.64 2009
2 Milton Keynes Council CHILDREN AND YOUNG PEOPLE'S SERVICE Early Years and Extended Services Premises-related expenditure 2010-01-10 5100470014 350.00 Acorn (M K) Nurseries 8857.64 2009
3 Milton Keynes Council CHILDREN AND YOUNG PEOPLE'S SERVICE Early Years and Extended Services Premises-related expenditure 2010-01-10 5100470017 75.00 Acorn (M K) Nurseries 8932.64 2009
4 Milton Keynes Council Community Wellbeing-GF Head of Adult Social Care Employees 2010-01-10 5100473816 600.00 ACS Care Services Limited 9532.64 2009

Start by looking at accumulated totals.

In [30]:
from ggplot import *
In [32]:
dd.sort('Date',inplace=True)
dd['Cumul Amount'] = dd['Amount'].cumsum()
In [34]:
#Financial year models
#http://stackoverflow.com/a/26342880/454773
def getFiscalYear(dt):
    year = dt.year
    if dt.month<4: year -= 1
    return year
In [37]:
dd['Financial Year']= dd['Date'].apply(getFiscalYear)
In [43]:
dd['Cumul Amount Tax Year'] = dd.groupby('Financial Year')['Amount'].cumsum()
In [45]:
#I don't know (yet?!) how to cast an April-March tax year x-axis that would allow us to compare
# accumulated totals using differently coloured lines in the same chart.
# Also not that the incomplete 2014 tax year has an x-axis of fewer days than the other years...
ggplot(dd,aes(x='Date',y='Cumul Amount Tax Year'))+geom_line()+facet_wrap('Financial Year',scales="free_x")
Out[45]:
<ggplot: (-9223363294681154030)>
In [72]:
pd.options.display.float_format = '{:20,.2f}'.format

Let's look at totals by Service Area Categorisation:

In [79]:
servSpend=pd.pivot_table(dd, values='Amount', index=['Service area categorisation'], columns=['Financial Year'], aggfunc=np.sum)
servSpend
Out[79]:
Financial Year 2009 2010 2011 2012 2013 2014
Service area categorisation
Adult Social Care & Health Group nan 2,507,671.42 66,917,432.11 50,622,120.65 76,771,472.60 40,815,721.67
BBS nan nan nan nan nan 203,630.77
CHILDREN AND YOUNG PEOPLE'S SERVICE 1,481,308.65 19,530,427.28 2,868,301.07 nan nan nan
Capital PCs not allocated nan 612,216.00 1,191,209.76 880,015.12 822,128.10 539,744.52
Chief Executives 1.48 317,793.94 60,454.79 nan nan nan
Children & Families: Education, Effectiv nan 1,667,997.99 31,636,348.07 29,491,161.55 41,141,363.03 23,808,969.52
Children & Families: Integrated Support nan 1,196,990.11 22,621,653.85 22,436,458.02 30,706,208.94 18,541,475.99
Community Facilities Unit nan 155,514.68 6,706,288.32 11,342,114.06 437,578.74 nan
Community Wellbeing-GF 2,249,593.30 23,665,867.38 3,061,094.89 nan nan nan
Corporate Codes nan 32,981.50 2,997,220.00 53,883,296.53 26,704,333.75 14,228,367.00
Corporate Core nan 8,222.61 291,620.39 286,825.18 647,557.65 426,898.86
Debt Financing nan 11,998.00 735,093.15 722,527.40 1,074,312.42 709,355.60
Environment 895,065.20 24,735,756.19 5,320,645.98 nan nan nan
Finance & Risk 3,354,756.67 34,734,949.25 6,422,187.71 nan nan nan
HOUSING REVENUE ACCOUNT 163,061.58 16,796,584.03 1,852,262.94 nan nan nan
Highways & Transportation Group nan 1,470,833.03 27,779,408.21 23,530,954.98 49,991,082.73 27,460,049.83
Housing & Community Group nan 166,380.22 12,759,630.05 22,477,055.85 19,272,726.22 10,132,276.11
Housing Revenue Account nan 41,097.36 26,222,740.82 11,102,537.85 15,595,743.21 7,915,323.07
Neighbourhood Services Group nan 491,764.86 30,507,386.58 27,860,325.86 632,060.74 nan
Organisational Transformation Programme nan 39,026.38 39,954.74 3,821.84 nan nan
Planning & Strategic Transport nan nan nan nan 232,366.84 4,293,981.23
Planning & Transport nan nan nan nan 505,401.39 4,118,557.54
Planning, Economy & Development Group nan 19,827.39 2,382,212.36 2,875,179.69 18,770,095.17 5,299,982.97
Profit Centres No Longer in use nan nan nan 256,297.93 293,838.52 nan
Public Health nan nan nan 197,454.41 6,479,471.47 4,864,291.54
Regulatory Unit nan 33,756.20 2,978,481.19 224,546.58 nan nan
Resources nan 6,165,323.55 1,824,799.93 nan nan nan
Resources: Public Access Group nan 361,435.29 6,098,724.27 8,757,760.53 8,173,702.20 3,941,619.20
Resources:Finance,HR & Governance Group nan 12,065,805.46 111,422,782.02 47,384,457.88 46,524,010.08 28,219,036.37
Strategy & Partnerships 236,446.90 2,194,903.19 300,030.00 nan nan nan
Tariff Projects nan nan nan nan 47,775.00 912,063.14

Do we trust that the nan values are based on no transactions recorded in those areas? Check by listing the unique tax years associated with spending items in each category:

In [76]:
dd.groupby(['Service area categorisation'])['Financial Year'].unique()
Out[76]:
Service area categorisation
Adult Social Care & Health Group            [2010, 2011, 2012, 2013, 2014]
BBS                                                                 [2014]
CHILDREN AND YOUNG PEOPLE'S SERVICE                     [2009, 2010, 2011]
Capital PCs not allocated                   [2010, 2011, 2012, 2013, 2014]
Chief Executives                                        [2009, 2010, 2011]
Children & Families: Education, Effectiv    [2010, 2011, 2012, 2013, 2014]
Children & Families: Integrated Support     [2010, 2011, 2012, 2013, 2014]
Community Facilities Unit                         [2010, 2011, 2012, 2013]
Community Wellbeing-GF                                  [2009, 2010, 2011]
Corporate Codes                             [2010, 2011, 2012, 2013, 2014]
Corporate Core                              [2010, 2011, 2012, 2013, 2014]
Debt Financing                              [2010, 2011, 2012, 2013, 2014]
Environment                                             [2009, 2010, 2011]
Finance & Risk                                          [2009, 2010, 2011]
HOUSING REVENUE ACCOUNT                                 [2009, 2010, 2011]
Highways & Transportation Group             [2010, 2011, 2012, 2013, 2014]
Housing & Community Group                   [2010, 2011, 2012, 2013, 2014]
Housing Revenue Account                     [2010, 2011, 2012, 2013, 2014]
Neighbourhood Services Group                      [2010, 2011, 2012, 2013]
Organisational Transformation Programme                 [2010, 2011, 2012]
Planning & Strategic Transport                                [2013, 2014]
Planning & Transport                                          [2013, 2014]
Planning, Economy & Development Group       [2010, 2011, 2012, 2013, 2014]
Profit Centres No Longer in use                               [2012, 2013]
Public Health                                           [2012, 2013, 2014]
Regulatory Unit                                         [2010, 2011, 2012]
Resources                                                     [2010, 2011]
Resources: Public Access Group              [2010, 2011, 2012, 2013, 2014]
Resources:Finance,HR & Governance Group     [2010, 2011, 2012, 2013, 2014]
Strategy & Partnerships                                 [2009, 2010, 2011]
Tariff Projects                                               [2013, 2014]
Name: Financial Year, dtype: object
In [ ]: