360 Giving Grant Navigator Data Conversation

Earlier today, via @OwenBoswarva, I noticed this post announcing A big day for charity data and the release of "details of more than a quarter of a million different grants covering the past 20 years" by 360 Giving.

The site aggrgegates grant funding donations from a variety of sources and makes them available via an online database and a downloadable CSV document containing about 230,000 rows.

In [184]:
!wget http://lin-360giving.aptivate.org/static/uploads/all_grants.csv.gz -P data
!gunzip "data/all_grants.csv.gz" -d data
--2014-08-14 18:35:54--  http://lin-360giving.aptivate.org/static/uploads/all_grants.csv.gz
Resolving lin-360giving.aptivate.org (lin-360giving.aptivate.org)... 85.159.210.79
Connecting to lin-360giving.aptivate.org (lin-360giving.aptivate.org)|85.159.210.79|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 22082694 (21M) [application/x-gzip]
Saving to: 'data/all_grants.csv.gz'

100%[======================================>] 22,082,694  3.06MB/s   in 6.9s   

2014-08-14 18:36:06 (3.04 MB/s) - 'data/all_grants.csv.gz' saved [22082694/22082694]

gzip: data is a directory -- ignored
In [1]:
import pandas as pd
In [185]:
!ls data
README.md  all_grants.csv
In [231]:
df=pd.read_csv('data/all_grants.csv', dtype={'title':object},
               parse_dates=['applicationdate_startdate', 'awarddate_startdate', 'planneddates_startdate','planneddates_enddate'] )
In [232]:
df[:3]
Out[232]:
id title description fundingorganization_id fundingorganization_name recipientorganization_id recipientorganization_name currency totalamountappliedfor totalamountawarded recipientorganization_charitynumber recipientorganization_companynumber applicationdate_startdate awarddate_startdate planneddates_startdate planneddates_enddate
0 360G-biglottery:f282eb6b-4200-44d9-96a4-9d09aa... NaN This tennis club will work with six local scho... 360G-biglottery:Joint Pot Big Lottery 360G-biglottery:Acklam Tennis Club Acklam Tennis Club NaN NaN 4203 NaN NaN NaT 2008-02-20 NaT NaT
1 360G-biglottery:f282f0e4-77fc-4b21-bff9-d0519d... NaN This school provides educational activities fo... 360G-biglottery:Big Lottery Fund Big Lottery 360G-biglottery:Holy Cross RC Primary School Holy Cross RC Primary School NaN NaN 5972 NaN NaN NaT 2007-03-15 NaT NaT
2 360G-biglottery:f2828b86-14a3-4557-a85b-c69263... NaN This group in Croydon will provide training, h... 360G-biglottery:Joint Pot Big Lottery 360G-biglottery:African Youth Development Asso... African Youth Development Association NaN NaN 10000 NaN NaN NaT 2007-12-10 NaT NaT
In [233]:
df['title'].unique()
Out[233]:
array([nan], dtype=object)
In [191]:
#Who are the funders?
funders=df['fundingorganization_name'].unique().tolist()
funders
Out[191]:
['Big Lottery',
 'Arts Council England',
 'Sport England',
 'Nesta',
 'TSB',
 'Wellcome Trust',
 'Arts Council Wales',
 'Sport Northern Ireland',
 'DSDNI',
 'Sport Wales',
 'Creative Scotland',
 'Nominet Trust']
In [238]:
df[df['fundingorganization_name'].str.contains('Rock')]
Out[238]:
id title description fundingorganization_id fundingorganization_name recipientorganization_id recipientorganization_name currency totalamountappliedfor totalamountawarded recipientorganization_charitynumber recipientorganization_companynumber applicationdate_startdate awarddate_startdate planneddates_startdate planneddates_enddate

So this makes me think that the data from the Gateway to Research is missing and could be added in to the mix, if they make bulk downloads available (or if we scrape it..)? Also, the TSB data at least - which is already included in the 360 Giving data - would have to be deduped.

In [198]:
#Which organisations are the most 20 successful recipents in terms of number of grants awarded?
df.groupby('recipientorganization_name').size().order(ascending=False)[:20]
Out[198]:
recipientorganization_name
Early Years Team                              2476
The Ulster Scots Agency                       2281
VCU                                           2036
Arts & creativity Branch                      1546
NIHE                                          1513
Sport NI                                      1068
UNIVERSITY OF OXFORD                           805
UNIVERSITY OF CAMBRIDGE                        634
Arts Council                                   599
UNIVERSITY COLLEGE LONDON                      553
University of Oxford                           514
NWDO                                           449
UNIVERSITY OF EDINBURGH                        429
Policing and Community Safety Partnership      402
University of Cambridge                        390
Foras na Gaeilge                               374
Youth Justice Agency                           363
University College London                      362
BRO West Team                                  355
UNIVERSITY OF MANCHESTER                       313
dtype: int64

Straight away we see a data cleanliness issue in terms of the different capitalisation of the same recipients. I'd also hazard a guess that there may be whitespace issues. Let's do a bit of tidying:

In [199]:
#Strip any whitespace around the name
df['recipientorganization_name'] = df['recipientorganization_name'].str.strip()
#Generate a new column with names in uppercase
df['recipientorganization_name_upper'] = df['recipientorganization_name'].str.upper()
In [214]:
#So which organisations are actually the most 20 successful recipents in terms of number of grants awarded?
df.groupby('recipientorganization_name_upper').size().order(ascending=False)[:20]
Out[214]:
recipientorganization_name_upper
EARLY YEARS TEAM                             2476
THE ULSTER SCOTS AGENCY                      2281
VCU                                          2036
ARTS & CREATIVITY BRANCH                     1546
NIHE                                         1513
UNIVERSITY OF OXFORD                         1436
UNIVERSITY OF CAMBRIDGE                      1142
SPORT NI                                     1068
UNIVERSITY COLLEGE LONDON                    1051
UNIVERSITY OF EDINBURGH                       717
ARTS COUNCIL                                  599
UNIVERSITY OF MANCHESTER                      585
UNIVERSITY OF GLASGOW                         468
NWDO                                          449
UNIVERSITY OF BRISTOL                         440
POLICING AND COMMUNITY SAFETY PARTNERSHIP     402
UNIVERSITY OF LIVERPOOL                       399
IMPERIAL COLLEGE LONDON                       386
FORAS NA GAEILGE                              374
YOUTH JUSTICE AGENCY                          363
dtype: int64
In [208]:
#Now let's see which are the most 20 successful organisations in terms of the total amounts awarded?
orgs=df[['recipientorganization_name_upper','totalamountawarded']].groupby('recipientorganization_name_upper')
orgs.sum().sort('totalamountawarded',ascending=False)[:10]
Out[208]:
totalamountawarded
recipientorganization_name_upper
NIHE 7.133095e+08
UNIVERSITY OF OXFORD 6.382125e+08
UNIVERSITY OF CAMBRIDGE 4.605349e+08
UNIVERSITY COLLEGE LONDON 4.135489e+08
UNIVERSITY OF EDINBURGH 2.429283e+08
IMPERIAL COLLEGE LONDON 1.588882e+08
UNIVERSITY OF MANCHESTER 1.527562e+08
UNIVERSITY OF DUNDEE 1.501241e+08
KING'S COLLEGE LONDON 1.310204e+08
UNIVERSITY OF BRISTOL 1.160328e+08
In [212]:
#So the universities are getting lots of grants and lots of income. How about if we try to separate out "winners" by funder?
#First in terms of number of grants received
df.groupby(['fundingorganization_name','recipientorganization_name_upper']).size().order(ascending=False)[:30]
Out[212]:
fundingorganization_name  recipientorganization_name_upper         
DSDNI                     EARLY YEARS TEAM                             2476
                          THE ULSTER SCOTS AGENCY                      2281
                          VCU                                          2036
                          ARTS & CREATIVITY BRANCH                     1546
                          NIHE                                         1513
Wellcome Trust            UNIVERSITY OF OXFORD                         1401
DSDNI                     SPORT NI                                     1068
Wellcome Trust            UNIVERSITY OF CAMBRIDGE                      1063
                          UNIVERSITY COLLEGE LONDON                     957
                          UNIVERSITY OF EDINBURGH                       684
DSDNI                     ARTS COUNCIL                                  599
Wellcome Trust            UNIVERSITY OF MANCHESTER                      499
DSDNI                     NWDO                                          449
Wellcome Trust            UNIVERSITY OF GLASGOW                         434
DSDNI                     POLICING AND COMMUNITY SAFETY PARTNERSHIP     402
Wellcome Trust            UNIVERSITY OF BRISTOL                         384
DSDNI                     FORAS NA GAEILGE                              374
                          YOUTH JUSTICE AGENCY                          363
                          BRO WEST TEAM                                 355
Wellcome Trust            MISCELLANEOUS ENGLISH PROVINCES               355
                          UNIVERSITY OF LIVERPOOL                       352
                          KING'S COLLEGE LONDON                         339
                          IMPERIAL COLLEGE LONDON                       312
                          UNIVERSITY OF DUNDEE                          304
DSDNI                     DERRY CITY COUNCIL                            293
                          OMAGH COUNCIL                                 287
                          EQUALITY & STRATEGY DIRECTORATE               276
                          BRO NORTH TEAM                                268
Wellcome Trust            IMPERIAL COLLEGE OF SCI TECH & MED            262
                          MISCELLANEOUS LONDON                          255
dtype: int64
In [215]:
#Which universities have received the most awards?
unis=df[df['recipientorganization_name_upper'].str.contains('UNIVERSITY')]
unis.groupby(['fundingorganization_name','recipientorganization_name_upper']).size().order(ascending=False)[:20]
Out[215]:
fundingorganization_name  recipientorganization_name_upper
Wellcome Trust            UNIVERSITY OF OXFORD                1401
                          UNIVERSITY OF CAMBRIDGE             1063
                          UNIVERSITY COLLEGE LONDON            957
                          UNIVERSITY OF EDINBURGH              684
                          UNIVERSITY OF MANCHESTER             499
                          UNIVERSITY OF GLASGOW                434
                          UNIVERSITY OF BRISTOL                384
                          UNIVERSITY OF LIVERPOOL              352
                          UNIVERSITY OF DUNDEE                 304
                          UNIVERSITY OF LEEDS                  238
                          UNIVERSITY OF BIRMINGHAM             212
                          CARDIFF UNIVERSITY                   170
                          UNIVERSITY OF LEICESTER              169
                          UNIVERSITY OF NOTTINGHAM             161
                          UNIVERSITY OF WARWICK                147
                          UNIVERSITY OF SHEFFIELD              139
                          UNIVERSITY OF NEWCASTLE              129
                          QUEEN MARY, UNIVERSITY OF LONDON     125
                          UNIVERSITY OF SOUTHAMPTON            121
                          UNIVERSITY OF YORK                   119
dtype: int64
In [219]:
#How much cash have these universities received?
uniFunds=unis[['fundingorganization_name','recipientorganization_name_upper','totalamountawarded']]
uniFunds.groupby(['fundingorganization_name','recipientorganization_name_upper']).sum().sort('totalamountawarded',ascending=False)[:10]
Out[219]:
totalamountawarded
fundingorganization_name recipientorganization_name_upper
Wellcome Trust UNIVERSITY OF OXFORD 6.277400e+08
UNIVERSITY OF CAMBRIDGE 4.432572e+08
UNIVERSITY COLLEGE LONDON 3.987253e+08
UNIVERSITY OF EDINBURGH 2.390989e+08
UNIVERSITY OF DUNDEE 1.482222e+08
UNIVERSITY OF MANCHESTER 1.389828e+08
UNIVERSITY OF GLASGOW 1.031193e+08
UNIVERSITY OF BRISTOL 1.026566e+08
UNIVERSITY OF LEEDS 6.190601e+07
UNIVERSITY OF SHEFFIELD 6.070813e+07
In [223]:
#Bored of the universities... how about limited companies?
ltd=df[df['recipientorganization_name_upper'].str.contains('LTD') | df['recipientorganization_name_upper'].str.contains('LIMITED')]
ltd.groupby(['fundingorganization_name','recipientorganization_name_upper']).size().order(ascending=False)[:20]
Out[223]:
fundingorganization_name  recipientorganization_name_upper       
TSB                       TWI LIMITED                                66
                          C-TECH INNOVATION LIMITED                  39
                          CENTRE FOR PROCESS INNOVATION LIMITED      38
                          NPL MANAGEMENT LIMITED                     37
Arts Council Wales        CHAPTER (CARDIFF) LTD.                     32
Creative Scotland         HIGHLANDS AND ISLANDS ARTS LTD             30
TSB                       RICARDO UK LIMITED                         27
                          AIRBUS UK LIMITED                          26
Creative Scotland         DANCE BASE LIMITED                         25
TSB                       BUILDING RESEARCH ESTABLISHMENT LIMITED    25
Arts Council Wales        EARTHFALL DANCE LTD.                       24
TSB                       QINETIQ LIMITED                            23
Creative Scotland         CITIZENS’ THEATRE LIMITED                  23
TSB                       BAE SYSTEMS (OPERATIONS) LIMITED           22
Creative Scotland         HORSECROSS ARTS LIMITED                    21
Arts Council Wales        DAWNS TAN TAN DANCE LTD.                   21
Sport England             BLACK COUNTRY CONSORTIUM LTD               21
TSB                       AIRBUS OPERATIONS LIMITED                  21
                          GKN AEROSPACE SERVICES LIMITED             20
Creative Scotland         TRAVERSE THEATRE (SCOTLAND) LIMITED        20
dtype: int64
In [230]:
#And in terms of the cash received?
ltdFunds=ltd[['fundingorganization_name','recipientorganization_name_upper','totalamountawarded']]
ltdFunds.groupby(['fundingorganization_name','recipientorganization_name_upper']).sum().sort('totalamountawarded',ascending=False)[:15]
Out[230]:
totalamountawarded
fundingorganization_name recipientorganization_name_upper
TSB NATIONAL RENEWABLE ENERGY CENTRE LIMITED 93760524
Big Lottery SUSTRANS LIMITED 76104184
TSB CENTRE FOR PROCESS INNOVATION LIMITED 75132460
Big Lottery LIFE CHANGES (TRUSTEE) LIMITED 49338186
TSB AIRBUS UK LIMITED 37976726
Big Lottery FORCES IN MIND TRUSTEE LIMITED 34808423
TSB GKN AEROSPACE SERVICES LIMITED 30163228
Big Lottery LEGACY TRUST UK LIMITED 28850000
CATCH 22 CHARITY LIMITED 27636814
Sport England ECORYS UK LIMITED 27630000
TSB CELL THERAPY CATAPULT LIMITED 23388900
Big Lottery THE SOIL ASSOCIATION LIMITED 21549732
TSB AGUSTAWESTLAND LIMITED 18523006
Wellcome Trust UK BIOBANK LTD 18297438
TSB AIRBUS OPERATIONS LIMITED 17536265
In [224]:
#Who's most successful in terms of total number of grants received apart from the universities?
nunis=df[~df['recipientorganization_name_upper'].str.contains('UNIVERSITY')]
nunis.groupby(['fundingorganization_name','recipientorganization_name_upper']).size().order(ascending=False)[:20]
Out[224]:
fundingorganization_name  recipientorganization_name_upper         
DSDNI                     EARLY YEARS TEAM                             2476
                          THE ULSTER SCOTS AGENCY                      2281
                          VCU                                          2036
                          ARTS & CREATIVITY BRANCH                     1546
                          NIHE                                         1513
                          SPORT NI                                     1068
                          ARTS COUNCIL                                  599
                          NWDO                                          449
                          POLICING AND COMMUNITY SAFETY PARTNERSHIP     402
                          FORAS NA GAEILGE                              374
                          YOUTH JUSTICE AGENCY                          363
                          BRO WEST TEAM                                 355
Wellcome Trust            MISCELLANEOUS ENGLISH PROVINCES               355
                          KING'S COLLEGE LONDON                         339
                          IMPERIAL COLLEGE LONDON                       312
DSDNI                     DERRY CITY COUNCIL                            293
                          OMAGH COUNCIL                                 287
                          EQUALITY & STRATEGY DIRECTORATE               276
                          BRO NORTH TEAM                                268
Wellcome Trust            IMPERIAL COLLEGE OF SCI TECH & MED            262
dtype: int64
In [226]:
#Hmmm, councils... which have had most grants?
ccls=df[df['recipientorganization_name_upper'].str.contains('COUNCIL')]
ccls.groupby(['fundingorganization_name','recipientorganization_name_upper']).size().order(ascending=False)[:20]
Out[226]:
fundingorganization_name  recipientorganization_name_upper             
DSDNI                     ARTS COUNCIL                                     599
                          DERRY CITY COUNCIL                               293
                          OMAGH COUNCIL                                    287
                          BALLYMONEY COUNCIL                               192
                          NORTHERN IRELAND MUSEUM COUNCIL                  107
                          BELFAST CITY COUNCIL                              97
                          STRABANE DISTRICT COUNCIL                         93
                          DOWN DISTRICT COUNCIL                             91
                          LISBURN CITY COUNCIL                              88
Sport Wales               FLINTSHIRE COUNTY COUNCIL                         70
DSDNI                     MOYLE COUNCIL                                     69
                          COLERAINE COUNCIL                                 64
Wellcome Trust            MEDICAL RESEARCH COUNCIL                          63
DSDNI                     FERMANAGH DISTRICT COUNCIL                        48
                          ANTRIM BOROUGH COUNCIL                            46
Sport Wales               CONWY COUNTY BOROUGH COUNCIL                      43
                          THE COUNCIL OF THE CITY AND COUNTY OF CARDIFF     43
Creative Scotland         CITY OF EDINBURGH COUNCIL                         43
DSDNI                     COOKSTOWN BOROUGH COUNCIL                         43
Sport Wales               GWYNEDD COUNCIL                                   42
dtype: int64
In [228]:
#How about council success in terms of cash...?
cclFunds=ccls[['fundingorganization_name','recipientorganization_name_upper','totalamountawarded']]
cclFunds.groupby(['fundingorganization_name','recipientorganization_name_upper']).sum().sort('totalamountawarded',ascending=False)[:15]
Out[228]:
totalamountawarded
fundingorganization_name recipientorganization_name_upper
DSDNI ARTS COUNCIL 48512098.35
Big Lottery CORNWALL COUNCIL 27033958.00
FALKIRK COUNCIL 25744798.00
BELFAST CITY COUNCIL 25416214.00
TSB GLASGOW CITY COUNCIL 24053259.00
Big Lottery BIRMINGHAM VOLUNTARY SERVICE COUNCIL 18541998.00
Wellcome Trust MEDICAL RESEARCH COUNCIL 18126555.00
Sport England SPORTS COUNCIL TRUST COMPANY 16000000.00
Big Lottery BIRMINGHAM CITY COUNCIL 15172512.00
Sport Northern Ireland NORTH DOWN BOROUGH COUNCIL 14916784.19
Big Lottery STOCKPORT METROPOLITAN BOROUGH COUNCIL 12159922.00
WOLVERHAMPTON VOLUNTARY SECTOR COUNCIL 10422400.00
LIVERPOOL CITY COUNCIL 10116854.00
SCOTTISH COUNCIL FOR VOLUNTARY ORGANISATIONS 9826005.00
THE NATIONAL COUNCIL FOR VOLUNTARY ORGANISATIONS 9674505.00

Putting the Stuff into a Database

The following example shows one way of getting the data into a database that's already up and running...

In [2]:
#psql lets us return tables as dataframes
from pandas import read_sql_query as psql
#psqlx is a convenience fucntion for executing raw queries where we aren't so bothered about the response...
#Errors should still be thrown...
from pandas.io.sql import execute as psqlx

import pandas as pd
from sqlalchemy import create_engine
In [30]:
#Create the database connection for pandas dataframe operations
engine = create_engine('postgresql://test:[email protected]:5432/tm351test')
In [9]:
#This ia a helper routine to help find the maximum length of each field - cast as a string  - in the datafile
def colMaxLen(df,maxlens):
    for col in df.columns:
        cml=df[col].astype(str).map(len).max()
        if cml > maxlens[col]: maxlens[col]=cml
In [43]:
#Let's detect the maximum field lengths in the dataset so we can use it to help configue our database table
fname='data/all_grants.csv'
maxlens={}
for name in df.columns: maxlens[name]=0
            
chunks=pd.read_csv(fname,chunksize=10000)
for chunk in chunks:
    colMaxLen(chunk,maxlens)
maxlens
Out[43]:
{'planneddates_enddate': 3,
 'awarddate_startdate': 25,
 'currency': 3,
 'applicationdate_startdate': 3,
 'recipientorganization_charitynumber': 8,
 'recipientorganization_companynumber': 3,
 'fundingorganization_id': 38,
 'id': 62,
 'totalamountawarded': 11,
 'fundingorganization_name': 22,
 'recipientorganization_id': 96,
 'description': 512,
 'recipientorganization_name': 113,
 'planneddates_startdate': 3,
 'totalamountappliedfor': 3,
 'title': 3}
In [141]:
#If the table we're going to use already exists, get rid of it - we're going for a fresh start...
psqlx("DROP TABLE IF EXISTS charities360data;",engine)
Out[141]:
<sqlalchemy.engine.result.ResultProxy at 0x7fce63dc2b00>
In [142]:
#Here's where I create the table - gonna be lazy and make lots of things varchar for a quick start...
psqlx("CREATE TABLE charities360data ( planneddates_enddate VARCHAR (30), awarddate_startdate VARCHAR (30), currency VARCHAR (10), \
        applicationdate_startdate VARCHAR (30), recipientorganization_charitynumber VARCHAR (10), \
        recipientorganization_companynumber VARCHAR (10), fundingorganization_id VARCHAR (50), \
        id VARCHAR (75) PRIMARY KEY, totalamountawarded FLOAT, fundingorganization_name VARCHAR (30), \
        recipientorganization_id VARCHAR(100), description VARCHAR(1000), recipientorganization_name VARCHAR (150), \
        recipientorganization_name_upper VARCHAR (150), \
        planneddates_startdate VARCHAR (30), totalamountappliedfor FLOAT, title VARCHAR (100) );",engine)
Out[142]:
<sqlalchemy.engine.result.ResultProxy at 0x7fce63dc2438>
In [143]:
#Test the table
psql("SELECT * FROM charities360data LIMIT 1;", engine)
Out[143]:
planneddates_enddate awarddate_startdate currency applicationdate_startdate recipientorganization_charitynumber recipientorganization_companynumber fundingorganization_id id totalamountawarded fundingorganization_name recipientorganization_id description recipientorganization_name recipientorganization_name_upper planneddates_startdate totalamountappliedfor title
In [144]:
#This routine will load data in from a pandas dataframe, itself created from 10k rows at a time read in from the source CSV file
dateFields=[]#['applicationdate_startdate', 'awarddate_startdate', 'planneddates_startdate','planneddates_enddate']
chunks=pd.read_csv(fname,parse_dates=dateFields,chunksize=10000)
for chunk in chunks:
    #Just in case, strip of any whotespace around names
    for col in ['recipientorganization_name','fundingorganization_name']:
        chunk[col]=chunk[col].str.strip()
    #Another normalisation step: introduce a new column corresponding to organisation name in all upper case characters
    chunk['recipientorganization_name_upper']=chunk['recipientorganization_name'].str.upper()
    chunk.to_sql('charities360data', engine, index=False, if_exists='append')
In [145]:
#Quick query to see which organisation names appear the most
psql("SELECT  recipientorganization_name, COUNT(*) AS tally FROM charities360data \
        WHERE recipientorganization_name ILIKE '%%UNIVERSITY%%' \
        GROUP BY recipientorganization_name HAVING COUNT(*)>3  \
        ORDER BY tally DESC LIMIT 20;", engine)
Out[145]:
recipientorganization_name tally
0 UNIVERSITY OF OXFORD 809
1 UNIVERSITY OF CAMBRIDGE 643
2 University of Oxford 592
3 UNIVERSITY COLLEGE LONDON 556
4 University of Cambridge 499
5 University College London 495
6 UNIVERSITY OF EDINBURGH 432
7 UNIVERSITY OF MANCHESTER 316
8 University of Edinburgh 285
9 University of Manchester 269
10 UNIVERSITY OF GLASGOW 256
11 UNIVERSITY OF BRISTOL 222
12 University of Bristol 218
13 UNIVERSITY OF LIVERPOOL 217
14 University of Glasgow 211
15 UNIVERSITY OF DUNDEE 187
16 University of Liverpool 181
17 University of Leeds 155
18 University of Nottingham 151
19 UNIVERSITY OF LEEDS 148
In [154]:
#We note that the data appears to have the same organisation names in a variety of capitalisations...
#That's why we normalise them...
psql("SELECT  recipientorganization_name_upper, COUNT(*) AS tally, to_char(SUM(totalamountawarded),'99999999999D99') FROM charities360data \
        WHERE recipientorganization_name_upper ILIKE '%%UNIVERSITY%%' AND totalamountawarded != 'NaN' AND totalamountawarded>10000000 \
        GROUP BY recipientorganization_name_upper HAVING COUNT(*)>3  \
        ORDER BY tally DESC;", engine)
Out[154]:
recipientorganization_name_upper tally to_char
0 UNIVERSITY OF OXFORD 7 82317012.00
In [160]:
#Quick test query around the total amounts awarded to orgs by normalised name
psql("SELECT  recipientorganization_name_upper ,totalamountawarded FROM charities360data \
        WHERE  totalamountawarded >10000000 AND totalamountawarded != 'NaN' \
        AND  recipientorganization_name_upper ILIKE '%%UNIVERSITY%%' ORDER BY totalamountawarded DESC; ",engine)
Out[160]:
recipientorganization_name_upper totalamountawarded
0 UNIVERSITY COLLEGE LONDON 45000000
1 UNIVERSITY OF BRISTOL- NATIONAL COMPOSITES CENTRE 28000000
2 UNIVERSITY OF SHEFFIELD 16294595
3 UNIVERSITY OF MANCHESTER 15000000
4 UNIVERSITY OF OXFORD 14288026
5 UNIVERSITY OF MANCHESTER 14151671
6 UNIVERSITY OF KWAZULU NATAL 13526338
7 UNIVERSITY OF OXFORD 12548429
8 UNIVERSITY OF OXFORD 12000000
9 UNIVERSITY OF EDINBURGH 11200000
10 UNIVERSITY OF OXFORD 11111368
11 UNIVERSITY OF OXFORD 11046122
12 UNIVERSITY OF OXFORD 11000000
13 UNIVERSITY OF LEEDS 10400000
14 UNIVERSITY OF OXFORD 10323067
15 UNIVERSITY OF CAMBRIDGE 10188580
In [114]:
#By the by, how many unique upper case names are there?
psql("SELECT COUNT(*) FROM (SELECT DISTINCT recipientorganization_name_upper FROM charities360data) tmp",engine)
Out[114]:
count
0 122421
In [241]:
#One of the other tables I have in the same database is a list of company names and company numbers from Companies House
#Here's what each row looks like in that table...
psql("SELECT * from companydata limit 1",engine)
Out[241]:
companyname companynumber regaddress_careof regaddress_pobox regaddress_addressline1 regaddress_addressline2 regaddress_posttown regaddress_county regaddress_country regaddress_postcode companycategory companystatus countryoforigin incorporationdate
0 ! LTD 08209948 NaN NaN METROHOUSE 57 PEPPER ROAD HUNSLET LEEDS YORKSHIRE UNITED KINGDOM LS10 2RU Private Limited Company Active United Kingdom 2012-09-11
In [100]:
#Let's see how many names we an trivially join on between the 360Charities data and the Companies House data...
cnum=psql("SELECT DISTINCT ch.recipientorganization_name_upper, co.companynumber FROM charities360data ch JOIN companydata co  \
        ON ch.recipientorganization_name_upper = co.companyname ;",engine)
cnum[:10]
Out[100]:
recipientorganization_name_upper companynumber
0 100 BLACK MEN OF LONDON 04200429
1 100 BLACK WOMEN OF LONDON 07923522
2 10:10 06958799
3 101 PLAYGROUP 04945702
4 10RADIO CIC 06004254
5 11 TECH 18 05480166
6 170 COMMUNITY PROJECT 03416785
7 198 CONTEMPORARY ARTS AND LEARNING LTD 02369267
8 1ST 4 KIDZ COMMUNITY INTEREST COMPANY 07175119
9 1ST CALL LOCKOUTS LIMITED 04807947
In [103]:
#How many matches have we got?
len(cnum)
Out[103]:
15462
In [121]:
#We can bring in additional data about a company from it's company number using the OpenCorporates API

import json, requests
  
def openCorporatesLookupCompanyNumber(cnum):
    ''' Grab company data from OpenCorporates '''
    url='https://api.opencorporates.com/v0.3/companies/gb/{0}?format=json'.format(cnum)
    jsondata = json.loads(requests.get(url).text)
    return jsondata['results']

    
corpdata=openCorporatesLookupCompanyNumber('04200429')

#We can then do things like look the the list of officers
officers = pd.DataFrame( [ d['officer'] for d in corpdata['company']['officers'] ] )
officers
Out[121]:
end_date id name opencorporates_url position start_date uid
0 None 38555511 OLUKAYODELE ALAKE https://opencorporates.com/officers/38555511 director 2010-01-01 None
1 None 38555524 JONATHAN MASHE THOMAS https://opencorporates.com/officers/38555524 director 2007-07-09 None
2 2014-07-08 38555530 LENNY LAWRENCE https://opencorporates.com/officers/38555530 director 2010-03-01 None
3 2012-10-13 38555533 EDWARD ROYER https://opencorporates.com/officers/38555533 director 2007-07-09 None
4 None 38555536 KOLARELE SONAIKE https://opencorporates.com/officers/38555536 director 2005-10-24 None
5 2001-04-17 38555540 KEVIN BREWER https://opencorporates.com/officers/38555540 nominated secretary 2001-04-17 None
6 2004-01-01 38555541 LLOYD DIXON https://opencorporates.com/officers/38555541 secretary 2001-04-17 None
7 2007-07-09 38555543 KENNETH BARNES https://opencorporates.com/officers/38555543 director 2001-04-17 None
8 2005-10-24 38555545 GAIRY DAVIS https://opencorporates.com/officers/38555545 director 2004-07-01 None
9 2006-10-31 38555546 ANIETIE EKA https://opencorporates.com/officers/38555546 director 2005-10-24 None
10 2005-10-24 38555547 DAVID PETER OKORO https://opencorporates.com/officers/38555547 director 2004-07-01 None
11 2009-01-01 38555548 ANTHONY ROBERT WARNER https://opencorporates.com/officers/38555548 director 2007-07-09 None
12 2014-07-08 39746456 OLUKAYODELE ALAKE https://opencorporates.com/officers/39746456 secretary 2004-01-01 None
13 None 39746460 JONATHAN MASHE THOMAS https://opencorporates.com/officers/39746460 secretary 2007-07-09 None
In [134]:
#We can also search OpenCorporates officers by name
def openCorporatesOfficerSearch(officername):
    ''' Search for officers by exact match name '''
    url='https://api.opencorporates.com/v0.3/officers/search'
    params={'q':'"'+officername+'"'}
    jsondata = json.loads(requests.get(url,params=params).text)
    data=[]
    for d in jsondata['results']['officers']:
        tmp=d['officer']
        for i in ['jurisdiction_code','company_number']:
            tmp[i]=tmp['company'][i]
        tmp['company_name']=tmp['company']['name']
        tmp.pop('company',None)
        data.append(tmp)
    df= pd.DataFrame( data )
    return df


openCorporatesOfficerSearch("JONATHAN MASHE THOMAS")
Out[134]:
company_name company_number id jurisdiction_code name opencorporates_url position retrieved_at
0 MARSH ANALYTICS LTD 07057484 58417258 gb JONATHAN MASHE THOMAS https://opencorporates.com/officers/58417258 director 2014-06-25T14:08:25+00:00
1 AFRICAN & CARIBBEAN DIVERSITY 06513287 52442487 gb JONATHAN MASHE THOMAS https://opencorporates.com/officers/52442487 director 2014-08-12T19:10:55+00:00
2 100 BLACK MEN OF LONDON 04200429 39746460 gb JONATHAN MASHE THOMAS https://opencorporates.com/officers/39746460 secretary 2014-08-13T17:37:31+00:00
3 100 BLACK MEN OF LONDON 04200429 38555524 gb JONATHAN MASHE THOMAS https://opencorporates.com/officers/38555524 director 2014-08-13T17:37:31+00:00
In [104]:
#If we use the join to get a list of company numbers for companies in the 360 charities data,
#we can then look these up in the Companies House datatable to see what postcodes they correspond to,
#and then group and count on this basis.
#That is, for compnay matched charities, what are the most popular postcodes?
pcc=psql("SELECT regaddress_postcode, COUNT(*) FROM companydata \
            WHERE companynumber IN (SELECT DISTINCT co.companynumber  \
            FROM charities360data ch JOIN companydata co  \
            ON ch.recipientorganization_name_upper = co.companyname) \
            AND regaddress_postcode !='NaN' \
            GROUP BY regaddress_postcode HAVING COUNT(*)> 5 ORDER BY COUNT(*) DESC;",engine)
pcc
Out[104]:
regaddress_postcode count
0 EC4M 6YH 12
1 EC1V 4PW 12
2 IV1 1YN 10
3 N1 6AH 10
4 L2 2AH 9
5 CB1 2LA 9
6 E1 6AB 8
7 EC1V 4PY 8
8 N7 6PA 8
9 N9 0PZ 8
10 E8 3DL 8
11 CB4 0WS 8
12 BN1 3XG 7
13 EH7 5QY 7
14 SE8 4AG 7
15 TS10 4RF 7
16 NW1 0NH 7
17 L3 5TF 7
18 WC1H 9NA 6
19 BN44 3TN 6
20 BS15 8DB 6
21 E2 6HG 6
22 EC1R 3GA 6
23 EH2 2PR 6
24 G1 5HD 6
25 G1 5HZ 6
26 G2 1BA 6
27 HU19 2DL 6
28 KT15 2NX 6
29 L8 1XE 6
30 LE1 1RE 6
31 LS1 4ND 6
32 OX4 2HN 6
33 PH16 5BU 6
34 SE1 0EH 6

There may be several reasons why the same postcode appears to be associated with the several different organisations:

  • each postcode is associated with multiple postal addresses, so the organisations may actually have different addresses;
  • the companies may have been set up using a corporate services company that handles things like registered company address;
  • the companies may be part of a corporate group with the same registered address;
  • the companies may have registered addresses associated with a community centre or other form of civic hub.
In [106]:
#We can tunnel in on a particular postcode
mpc=psql("SELECT DISTINCT ch.fundingorganization_name , ch.recipientorganization_name_upper, co.companynumber, regaddress_addressline1 \
        FROM charities360data ch JOIN companydata co  \
        ON ch.recipientorganization_name_upper = co.companyname \
        WHERE co.regaddress_postcode ='HU19 2DL';",engine)
mpc
Out[106]:
fundingorganization_name recipientorganization_name_upper companynumber regaddress_addressline1
0 Big Lottery CHILDREN AND FAMILY ACTION 07612029 29-31 SEASIDE ROAD
1 Big Lottery SHORES TEAM LIMITED 06288967 THE SHORES RESOURCE CENTRE
2 Big Lottery SHORES COMMUNITY DINER LIMITED 06459499 29-31 SEASIDE ROAD
3 Sport England CHILDREN AND FAMILY ACTION 07612029 29-31 SEASIDE ROAD
4 Big Lottery SEASIDE RADIO LIMITED 05031695 29 Ă› 31 SEASIDE ROAD
5 Big Lottery SOUTHERN HOLDERNESS RESOURCE CENTRE 07271654 29-31 SEASIDE ROAD
6 Big Lottery SHORES HOMECARE LIMITED 06459496 29-31 SEASIDE ROAD
In [109]:
#Let's make it easier to run this sort of query...

def searchByCompanyPostcode(pc):
    mpc=psql("SELECT DISTINCT ch.fundingorganization_name, ch.recipientorganization_name_upper, co.companynumber, regaddress_addressline1 \
        FROM charities360data ch JOIN companydata co  \
        ON ch.recipientorganization_name_upper = co.companyname \
        WHERE co.regaddress_postcode ='"+pc+"';",engine)
    return mpc
In [110]:
searchByCompanyPostcode('KT15 2NX')
Out[110]:
fundingorganization_name recipientorganization_name_upper companynumber regaddress_addressline1
0 TSB THALES TRANSPORT AND SECURITY LIMITED 03132438 2 DASHWOOD LANG ROAD
1 TSB QUINTEC ASSOCIATES LIMITED 02375963 2 DASHWOOD LANG ROAD
2 TSB THALES RESEARCH & TECHNOLOGY (UK) LIMITED 00774298 2 DASHWOOD LANG ROAD
3 TSB THALES ATM LIMITED 03229755 2 DASHWOOD LANG ROAD
4 TSB THALES UK LIMITED 00868273 2 DASHWOOD LANG ROAD
5 TSB THALES AVIONICS LIMITED 00523160 2 DASHWOOD LANG ROAD
In [111]:
searchByCompanyPostcode('N9 0PZ')
Out[111]:
fundingorganization_name recipientorganization_name_upper companynumber regaddress_addressline1
0 Arts Council England ONE-TO-ONE (ENFIELD) 03324923 COMMUNITY HOUSE 311 FORE STREET
1 Big Lottery ENFIELD TURKISH CYPRIOT ASSOCIATION 03258314 COMMUNITY HOUSE
2 Sport England ONE-TO-ONE (ENFIELD) 03324923 COMMUNITY HOUSE 311 FORE STREET
3 Big Lottery ONE-TO-ONE (ENFIELD) 03324923 COMMUNITY HOUSE 311 FORE STREET
4 Big Lottery THE ENFIELD CLUBHOUSE LIMITED 04568108 COMMUNITY HOUSE (ROOM 10)
5 Big Lottery ENFIELD RACIAL EQUALITY COUNCIL 04527878 COMMUNITY HOUSE
6 Big Lottery ENFIELD SAHELI 06663604 COMMUNITY HOUSE 311 FORE STREET
7 Big Lottery ENFIELD DISABILITY ACTION 03937507 COMMUNITY HOUSE
8 Big Lottery GARGAAR SOMALI WELFARE ASSOCIATION 05100011 COMMUNITY HOUSE 311 FORE STREET
9 Big Lottery ENFIELD VOLUNTARY ACTION 03755382 COMMUNITY HOUSE

Bringing in Charities Commission Data

The Charities Commission don't currently publish details of their register in as a bulk download dataset, which means that to get the data we either need to scrape the site ourselves, or find an alternative source. One such source is OpenCharities, who make a scrape of the data available at Opencharities.org [I'm not sure how current the scrape is?].

In [4]:
#Grab the data file
!wget http://OpenCharities.org/charities.csv.zip -P data
    
#Uncompress it
!unzip data/charities.csv.zip -d data
--2014-08-18 10:57:31--  http://opencharities.org/charities.csv.zip
Resolving opencharities.org (opencharities.org)... 46.43.37.20
Connecting to opencharities.org (opencharities.org)|46.43.37.20|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 35713392 (34M) [application/zip]
Saving to: 'data/charities.csv.zip'

100%[======================================>] 35,713,392  4.78MB/s   in 7.4s   

2014-08-18 10:57:43 (4.59 MB/s) - 'data/charities.csv.zip' saved [35713392/35713392]

Archive:  data/charities.csv.zip
  inflating: data/charities.csv      
In [157]:
!head -n 3 data/charities.csv
title,charity_number,activities,contact_name,address,website,telephone,date_registered,date_removed,accounts_date,spending,income,company_number,openlylocal_url,twitter_account_name,facebook_account_name,youtube_account_name,feed_url,charity_classification_uids,signed_up_for_1010,last_checked,created_at,updated_at
ASSOCIATION FOR INDUSTRIAL ARCHAEOLOGY,277511,"THE AIA IS THE NATIONAL ORGANISATION FOR PEOPLE WHO SHARE AN INTEREST IN BRITAIN'S INDUSTRIAL PAST. IT BRINGS TOGETHER PEOPLE WHO ARE RESEARCHING, RECORDING, PRESERVING AND PRESENTING THE GREAT VARIETY OF THIS COUNTRY'S INDUSTRIAL HERITAGE.",MR DAVID DE HAAN MSC DIC FMA,"7 ST. MICHAELS CLOSE, MADELEY, TELFORD, TF7 5SD",http://www.industrial-archaeology.org,01952 435934,1979-08-22,,2010-12-31,,,1326854,http://OpenlyLocal.com/charities/1-ASSOCIATION-FOR-INDUSTRIAL-ARCHAEOLOGY,,,,,"102,307,207,308,112",false,2012-01-19T14:10:19+00:00,2010-08-22T20:38:01+01:00,2012-04-15T10:31:36+01:00
INTERNATIONAL SOCIETY OF CHEMOTHERAPY FOR  INFECTION AND CANCER,1029653,"OBJECTIVES ARE TO ADVANCE THE EDUCATION AND SCIENCE OF CHEMOTHERAPY. INTERNATIONAL CONGRESSES PROMOTING THE DEVELOPMENT OF CHEMOTHERAPY ARE HELD REGULARLY. ISC ENCOURAGES, SUPPORTS AND SPONSORS THE FORMATION OF INTERNATIONAL WORKING GROUPS FOR SPECIAL RESEARCH PROJECTS, COMMISSIONS, TRAINING PROJECTS AND THE USE OF CHEMOTHERAPEUTICS IN THE FIELD OF ANTIMICROBIAL AND ANTINEOPLASTIC CHEMOTHERAPY.",FIONA MACKENZIE,"MEDICAL MICROBIOLOGY, ABERDEEN ROYAL INFIRMARY, CORNHILL ROAD, ABERDEEN, AB25 2ZN",http://www.ischemo.org,01224 552127,1993-12-03,,2008-12-31,,,,http://OpenlyLocal.com/charities/2-INTERNATIONAL-SOCIETY-OF-CHEMOTHERAPY-FOR-INFECTION-AND-CANCER,,,,,"304,302,306,301,308,206,102,207,103",false,2010-09-17T15:39:54+01:00,2010-08-22T20:38:01+01:00,2012-04-15T10:56:52+01:00
In [10]:
#Find out how wide each daata field is to help us set up the database table
fname='data/charities.csv'

df=pd.read_csv(fname)

maxlens={}
for name in df.columns: maxlens[name]=0
            
chunks=pd.read_csv(fname,chunksize=10000)
for chunk in chunks:
    colMaxLen(chunk,maxlens)
maxlens
Out[10]:
{'charity_classification_uids': 135,
 'openlylocal_url': 189,
 'charity_number': 11,
 'income': 11,
 'created_at': 25,
 'contact_name': 146,
 'address': 194,
 'youtube_account_name': 255,
 'company_number': 20,
 'twitter_account_name': 255,
 'title': 150,
 'spending': 11,
 'accounts_date': 10,
 'updated_at': 25,
 'website': 129,
 'date_removed': 10,
 'last_checked': 25,
 'activities': 445,
 'feed_url': 163,
 'date_registered': 10,
 'telephone': 35,
 'signed_up_for_1010': 5,
 'facebook_account_name': 136}
In [74]:
#If the table we're going to use already exists, get rid of it - we're going for a fresh start...
psqlx("DROP TABLE IF EXISTS charitiescommission",engine)
Out[74]:
<sqlalchemy.engine.result.ResultProxy at 0x7f7b02e18320>
In [75]:
#Postgresql defaults to lower caps - we can force capitalisation on a table name by double quoting it
#psqlx('DROP TABLE IF EXISTS "charitiesCommission"',engine)
In [76]:
#Create the table using fields a bit bigger than the max width
sql="CREATE TABLE charitiescommission ("
colstr='charity_name_upper'+" VARCHAR ("+str(maxlens['title']+5)+"), pcode_guess VARCHAR (10)"
for i in maxlens:
    tmp=i+" VARCHAR ("+str(maxlens[i]+5)+")"
    colstr=", ".join([colstr,tmp])
        
psqlx( sql +colstr +");",engine)
Out[76]:
<sqlalchemy.engine.result.ResultProxy at 0x7f7afa46c320>
In [77]:
#Get the data into the postgresql database, via a pandas dataframe
chunks=pd.read_csv(fname,chunksize=10000, dtype ={'charity_number':str})
for chunk in chunks:
    for col in ['title']:
        chunk[col]=chunk[col].str.strip()
    #I'm going to add another column which is a guess at the postcode extracted from the address
    chunk['pcode_guess']=chunk['address'].str.extract('([A-Z]{1,2}[0-9]{1,2} [0-9][A-Z]{2})')
    #Let's also get a normalised, upper case variant of the charity name
    chunk['charity_name_upper']=chunk['title'].str.upper()
    chunk.to_sql('charitiescommission', engine, index=False, if_exists='append')
In [95]:
#A quick test query to see what we can join with the 360 charities data using charity name...
chcnum=psql("SELECT DISTINCT ch.recipientorganization_name_upper, cc.charity_number, cc.company_number, cc.pcode_guess \
        FROM charities360data ch JOIN charitiescommission cc  \
        ON ch.recipientorganization_name_upper = cc.charity_name_upper ;",engine)
chcnum[:10]
Out[95]:
recipientorganization_name_upper charity_number company_number pcode_guess
0 100 BLACK MEN OF LONDON 1105904 4200429 NaN
1 101 PLAYGROUP 1101406 4945702.0 N8 9NH
2 101 PLAYGROUP 803462 NaN N8 9BG
3 105TH SHEFFIELD (HIGH GREEN) SCOUT GROUP 516163 NaN S35 4FP
4 1066 CHILDMINDING ASSOCIATION 1029817 NaN TN37 7RG
5 10TH BATH (COMBE DOWN) SCOUT GROUP 305613 NaN BA2 5EQ
6 10TH BIRKENHEAD SCOUT GROUP 520165 NaN CH63 8NZ
7 10TH OXFORD BOY SCOUT GROUP 304435 NaN OX29 8JZ
8 10TH ROYSTON SCOUT GROUP 1017455 NaN SG8 7XE
9 10TH SEVENOAKS BOY SCOUT GROUP 303442 NaN TN11 9BG
In [113]:
#How many matches were there?
len(chcnum)
Out[113]:
20649
In [96]:
#Are there any popular postcodes?
chcnum[['pcode_guess']].groupby(['pcode_guess']).size().order(ascending=False)[:20]

#As with populat postcodes for company registered addresses, there may be several reasons for popular charity address postcodes
Out[96]:
pcode_guess
NaN            1075
W10 5XL          11
N9 0PZ           10
E2 6HG           10
SE1 9BG           9
W6 9LP            9
BN1 3XG           9
CR4 3UD           9
N7 6LA            8
BS15 8DB          8
NE1 8XS           8
SE1 0EH           8
ME14 1HH          8
W13 9LA           7
W2 5ES            7
S1 4FW            7
E1 6AB            7
NE46 3NP          7
LA9 4PU           7
E2 9PL            6
dtype: int64
In [98]:
#Let's have a quick look at charities registered to one of those postcodes
psql("SELECT DISTINCT charity_name_upper,address FROM charitiescommission WHERE pcode_guess='NE46 3NP'",engine)
Out[98]:
charity_name_upper address
0 TYNEDALE WOMEN'S TRAINING GROUP TYNEDALE WOMEN'S TRAINING GROUP, COMMUNITY CEN...
1 HEXHAM COMMUNITY CENTRE HEXHAM COMMUNITY CENTRE, GILESGATE, HEXHAM, NO...
2 TYNEDALE CITIZENS' ADVICE BUREAU THE COMMUNITY CENTRE, GILESGATE, HEXHAM, NORTH...
3 WEST NORTHUMBERLAND CITIZENS ADVICE BUREAU COMMUNITY CENTRE, GILESGATE, HEXHAM, NE46 3NP
4 SIXTYEIGHTYTHIRTY COMMUNITY CENTRE, GILESGATE, HEXHAM, NORTHUMBE...
5 TYNEDALE VOLUNTARY ACTION TYNEDALE VOLUNTARY ACTION, HEXHAM COMMUNITY CE...
6 GATEWAY INTO THE COMMUNITY HEXHAM COMMUNITY CENTRE, GILESGATE, HEXHAM, NO...
7 HEXHAM AND TYNEDALE COMMUNITY TRUST HEXHAM COMMUNITY CENTRE, GILESGATE, HEXHAM, NO...
8 HEXHAM YOUTH INITIATIVE HEXHAM YOUTH INITIATIVE, THE COMMUNITY CENTRE,...
9 SIXTYEIGHTYTHIRTY COMMUNITY CENTRE, GILESGATE, HEXHAM, NE46 3NP
10 THE SAMLING FOUNDATION SAMLING, HEXHAM COMMUNITY CENTRE, GILESGATE, H...
11 THE SAMLING FOUNDATION HEXHAM COMMUNITY CENTRE, GILESGATE, HEXHAM, NO...
In [158]:
#Although the Charity Commission doesn't publish an API, Open Charities does... 
import json, requests

def opencharitiesLookup(id):
    url = 'http://opencharities.org/charities/{0}.json'.format(id)
    jsondata = json.loads(requests.get(url).text)
    return jsondata
In [136]:
#Lets see what we can back...
charityNum='202918'

jdata = opencharitiesLookup(charityNum)
jdata
Out[136]:
{'charity': {'financial_breakdown': {'assets': {'other_assets': '108400000',
    'total_liabilities': '-68600000',
    'long_term_investments': '3000000',
    'own_use_assets': '13100000'},
   'income': {'charitable': '166000000',
    'investment': '900000',
    'voluntary': '111500000',
    'other': '800000',
    'trading': '88700000',
    'investment_gains': '0'},
   'spending': {'investment_management': '100000',
    'governance': '1300000',
    'generating_voluntary_income': '23200000',
    'other': '1400000',
    'trading': '68600000',
    'charitable_activities': '290000000'}},
  'activities': "OXFAM'S OBJECTS ARE TO PREVENT AND RELIEVE POVERTY AND PROTECT THE VULNERABLE ANYWHERE IN THE WORLD. OXFAM FURTHERS ITS OBJECTS THROUGH INTERLINKED ACTIVITIES OF HUMANITARIAN RELIEF, DEVELOPMENT WORK AND ADVOCACY AND CAMPAIGNING. OXFAM IS AN AFFILIATE MEMBER OF OXFAM INTERNATIONAL. DETAILS OF OXFAM'S MISSION AND WAYS OF WORKING CAN BE FOUND AT WWW.OXFAM.ORG.UK",
  'updated_at': '2014-06-11T20:59:28Z',
  'income': None,
  'grants': [{'classification_id': None,
    'cost_centre': None,
    'transaction_type': 'LotteryGrant',
    'invoice_date': None,
    'description': 'Roars Not Whispers Is A 3-Year Youth Led Programme That Will Build The Capacity Of Young People To Represent Themselves And Others, Take Action On Issues Of Concern And Bring These Issues Further Up The Public And Political Agenda In Scotland. Oxfam And The Scottish Youth Parliament Will Deliver The Work With A Wide Range Of Other Youth Focused Groups Involved.  Young People From Every Local Authority Area In Scotland Will Be Trained As Peer Educators. They Will Then Pass On Skills To Other Young People In',
    'date_fuzziness': None,
    'uid': None,
    'created_at': '2010-09-23T09:51:14+01:00',
    'id': 243306,
    'supplier_id': 47408,
    'value': 1187948.0,
    'department_name': None,
    'date': '2006-06-30',
    'invoice_number': None,
    'source_url': 'http://www.lottery.culture.gov.uk/details.asp?ID=YSC/1/010201390&DBID=BL',
    'csv_line_number': None,
    'service': 'Health, Education, Environment and Charitable Expenditure',
    'updated_at': '2010-09-23T09:51:14+01:00'},
   {'classification_id': None,
    'cost_centre': None,
    'transaction_type': 'LotteryGrant',
    'invoice_date': None,
    'description': 'This project will work in partnership with Refugee Community Organisations (grass roots refugee and asylum seeker organisations) in Swansea, Newport and Cardiff to address the particular issues faced by refugee and asylum seeking women. This will include provision of drop-ins and activities to reduce social isolation. It will provide training to women to become champions for their community and advocates for better services to meet the specific needs of the target beneficiaries.',
    'date_fuzziness': None,
    'uid': None,
    'created_at': '2010-09-23T10:41:31+01:00',
    'id': 288237,
    'supplier_id': 47408,
    'value': 247411.0,
    'department_name': None,
    'date': '2008-04-25',
    'invoice_number': None,
    'source_url': 'http://www.lottery.culture.gov.uk/details.asp?ID=PPA/1/010248997&DBID=BL',
    'csv_line_number': None,
    'service': 'Health, Education, Environment and Charitable Expenditure',
    'updated_at': '2010-09-23T10:41:31+01:00'}],
  'area_of_benefit': 'NATIONAL AND OVERSEAS',
  'contact_name': 'MR JOSS SAUNDERS',
  'created_at': '2010-08-23T00:02:03+01:00',
  'annual_reports': [{'endowment_funds': 2600000,
    'grants_to_institutions': 46000000,
    'financial_year_end': '2007-04-30',
    'support_costs': 21300000,
    'voluntary_income': 147800000,
    'total_current_assets': 99200000,
    'created_at': '2010-10-26T22:44:03+01:00',
    'reserves': 48600000,
    'other_expenses': 800000,
    'income_from_endowments': 0,
    'cash': 75700000,
    'investment_gains': 0,
    'total_expenses': 297200000,
    'current_investment_assets': 0,
    'charity_only_accounts': None,
    'charitable_activities_costs': 211200000,
    'total_funds': 70700000,
    'investment_management_costs': 100000,
    'fixed_investment_assets_at_end_of_year': 2900000,
    'voluntary_income_costs': 23900000,
    'fixed_assets_at_end_of_year': 19800000,
    'other_income': 1100000,
    'long_term_creditors_or_provisions': 9900000,
    'annual_return_code': 'AR07',
    'employees': 5728,
    'consolidated_accounts': True,
    'fixed_assets_at_start_of_year': 20300000,
    'updated_at': '2010-10-26T22:44:03+01:00',
    'total_income': 290700000,
    'charity_id': 92548,
    'gains_from_asset_revaluations': 0,
    'financial_year_start': '2006-05-01',
    'depreciation': 4600000,
    'investment_income': 3300000,
    'income_from_charitable_activities': 62700000,
    'activities_generating_funds': 75800000,
    'creditors_within_1_year': 18300000,
    'fundraising_trading_costs': 60000000,
    'total_assets': 70700000,
    'gains_on_pension_fund': 2000000,
    'fixed_investment_assets_at_start_of_year': 2900000,
    'volunteers': 21000,
    'pension_assets': -20100000,
    'governance_costs': 1200000,
    'id': 103,
    'income_from_legacies': 12300000,
    'restricted_funds': 24500000,
    'unrestricted_funds': 43600000},
   {'endowment_funds': 2700000,
    'grants_to_institutions': 53400000,
    'financial_year_end': '2008-04-30',
    'support_costs': 25400000,
    'voluntary_income': 142100000,
    'total_current_assets': 99400000,
    'created_at': '2010-10-26T22:44:03+01:00',
    'reserves': 44600000,
    'other_expenses': 600000,
    'income_from_endowments': 0,
    'cash': 68300000,
    'investment_gains': 100000,
    'total_expenses': 298400000,
    'current_investment_assets': 0,
    'charity_only_accounts': None,
    'charitable_activities_costs': 212300000,
    'total_funds': 76900000,
    'investment_management_costs': 100000,
    'fixed_investment_assets_at_end_of_year': 3300000,
    'voluntary_income_costs': 23500000,
    'fixed_assets_at_end_of_year': 18500000,
    'other_income': 4800000,
    'long_term_creditors_or_provisions': 12100000,
    'annual_return_code': 'AR08',
    'employees': 4800,
    'consolidated_accounts': True,
    'fixed_assets_at_start_of_year': 19800000,
    'updated_at': '2010-10-26T22:44:03+01:00',
    'total_income': 299700000,
    'charity_id': 92548,
    'gains_from_asset_revaluations': 0,
    'financial_year_start': '2007-05-01',
    'depreciation': 4700000,
    'investment_income': 3600000,
    'income_from_charitable_activities': 71500000,
    'activities_generating_funds': 77700000,
    'creditors_within_1_year': 15900000,
    'fundraising_trading_costs': 60600000,
    'total_assets': 76900000,
    'gains_on_pension_fund': 4800000,
    'fixed_investment_assets_at_start_of_year': 2900000,
    'volunteers': 20000,
    'pension_assets': -13000000,
    'governance_costs': 1300000,
    'id': 104,
    'income_from_legacies': 11600000,
    'restricted_funds': 27400000,
    'unrestricted_funds': 46800000},
   {'endowment_funds': 2400000,
    'grants_to_institutions': 53900000,
    'financial_year_end': '2009-04-30',
    'support_costs': 28200000,
    'voluntary_income': 133900000,
    'total_current_assets': 90500000,
    'created_at': '2010-10-26T22:44:03+01:00',
    'reserves': 34700000,
    'other_expenses': 200000,
    'income_from_endowments': 0,
    'cash': 45700000,
    'investment_gains': 0,
    'total_expenses': 318600000,
    'current_investment_assets': 0,
    'charity_only_accounts': None,
    'charitable_activities_costs': 234400000,
    'total_funds': 52300000,
    'investment_management_costs': 100000,
    'fixed_investment_assets_at_end_of_year': 2900000,
    'voluntary_income_costs': 19800000,
    'fixed_assets_at_end_of_year': 17200000,
    'other_income': 4200000,
    'long_term_creditors_or_provisions': 11600000,
    'annual_return_code': 'AR09',
    'employees': 4624,
    'consolidated_accounts': True,
    'fixed_assets_at_start_of_year': 18500000,
    'updated_at': '2010-10-26T22:44:03+01:00',
    'total_income': 308300000,
    'charity_id': 92548,
    'gains_from_asset_revaluations': 0,
    'financial_year_start': '2008-05-01',
    'depreciation': 4500000,
    'investment_income': 2500000,
    'income_from_charitable_activities': 88400000,
    'activities_generating_funds': 79300000,
    'creditors_within_1_year': 19300000,
    'fundraising_trading_costs': 62900000,
    'total_assets': 52300000,
    'gains_on_pension_fund': -14300000,
    'fixed_investment_assets_at_start_of_year': 3300000,
    'volunteers': None,
    'pension_assets': -24500000,
    'governance_costs': 1200000,
    'id': 105,
    'income_from_legacies': 10500000,
    'restricted_funds': 24000000,
    'unrestricted_funds': 25900000},
   {'endowment_funds': 2500000,
    'grants_to_institutions': 54300000,
    'financial_year_end': '2010-03-31',
    'support_costs': 21500000,
    'voluntary_income': 125600000,
    'total_current_assets': 113100000,
    'created_at': '2011-03-05T22:23:09+00:00',
    'reserves': 38100000,
    'other_expenses': 2000000,
    'income_from_endowments': 100000,
    'cash': 66500000,
    'investment_gains': 0,
    'total_expenses': 294800000,
    'current_investment_assets': 0,
    'charity_only_accounts': None,
    'charitable_activities_costs': 216300000,
    'total_funds': 68300000,
    'investment_management_costs': 100000,
    'fixed_investment_assets_at_end_of_year': 3000000,
    'voluntary_income_costs': 17200000,
    'fixed_assets_at_end_of_year': 15600000,
    'other_income': 2800000,
    'long_term_creditors_or_provisions': 12800000,
    'annual_return_code': 'AR10',
    'employees': 4371,
    'consolidated_accounts': True,
    'fixed_assets_at_start_of_year': 17200000,
    'updated_at': '2011-03-05T22:23:09+00:00',
    'total_income': 318000000,
    'charity_id': 92548,
    'gains_from_asset_revaluations': 0,
    'financial_year_start': '2009-04-01',
    'depreciation': 4300000,
    'investment_income': 900000,
    'income_from_charitable_activities': 114300000,
    'activities_generating_funds': 74400000,
    'creditors_within_1_year': 18300000,
    'fundraising_trading_costs': 58100000,
    'total_assets': 68300000,
    'gains_on_pension_fund': -7200000,
    'fixed_investment_assets_at_start_of_year': 2900000,
    'volunteers': None,
    'pension_assets': -29300000,
    'governance_costs': 1100000,
    'id': 27676,
    'income_from_legacies': 13300000,
    'restricted_funds': 39300000,
    'unrestricted_funds': 26500000},
   {'endowment_funds': 2500000,
    'grants_to_institutions': 72500000,
    'financial_year_end': '2011-03-31',
    'support_costs': 26800000,
    'voluntary_income': 138400000,
    'total_current_assets': 120400000,
    'created_at': '2012-01-26T23:44:12+00:00',
    'reserves': 41400000,
    'other_expenses': 1200000,
    'income_from_endowments': 0,
    'cash': 73700000,
    'investment_gains': 400000,
    'total_expenses': 361100000,
    'current_investment_assets': 0,
    'charity_only_accounts': None,
    'charitable_activities_costs': 272700000,
    'total_funds': 83400000,
    'investment_management_costs': 200000,
    'fixed_investment_assets_at_end_of_year': 3400000,
    'voluntary_income_costs': 20800000,
    'fixed_assets_at_end_of_year': 15100000,
    'other_income': 1800000,
    'long_term_creditors_or_provisions': 10700000,
    'annual_return_code': 'AR11',
    'employees': 4689,
    'consolidated_accounts': True,
    'fixed_assets_at_start_of_year': 15600000,
    'updated_at': '2012-01-26T23:44:12+00:00',
    'total_income': 367500000,
    'charity_id': 92548,
    'gains_from_asset_revaluations': 0,
    'financial_year_start': '2010-04-01',
    'depreciation': 3900000,
    'investment_income': 900000,
    'income_from_charitable_activities': 140500000,
    'activities_generating_funds': 85900000,
    'creditors_within_1_year': 23100000,
    'fundraising_trading_costs': 65000000,
    'total_assets': 83400000,
    'gains_on_pension_fund': 8300000,
    'fixed_investment_assets_at_start_of_year': 3000000,
    'volunteers': 22000,
    'pension_assets': -18300000,
    'governance_costs': 1200000,
    'id': 31739,
    'income_from_legacies': 14100000,
    'restricted_funds': 38000000,
    'unrestricted_funds': 42900000},
   {'endowment_funds': 2600000,
    'grants_to_institutions': 90600000,
    'financial_year_end': '2012-03-31',
    'support_costs': 30600000,
    'voluntary_income': 129700000,
    'total_current_assets': 122700000,
    'created_at': '2014-06-12T22:47:36+01:00',
    'reserves': 39200000,
    'other_expenses': 300000,
    'income_from_endowments': 100000,
    'cash': 63300000,
    'investment_gains': 400000,
    'total_expenses': 378700000,
    'current_investment_assets': 0,
    'charity_only_accounts': None,
    'charitable_activities_costs': 286400000,
    'total_funds': 73900000,
    'investment_management_costs': 100000,
    'fixed_investment_assets_at_end_of_year': 3000000,
    'voluntary_income_costs': 23000000,
    'fixed_assets_at_end_of_year': 14800000,
    'other_income': 2300000,
    'long_term_creditors_or_provisions': 12500000,
    'annual_return_code': 'AR12',
    'employees': 4885,
    'consolidated_accounts': True,
    'fixed_assets_at_start_of_year': 15100000,
    'updated_at': '2014-06-12T22:47:36+01:00',
    'total_income': 385500000,
    'charity_id': 92548,
    'gains_from_asset_revaluations': 0,
    'financial_year_start': '2011-04-01',
    'depreciation': 3500000,
    'investment_income': 1200000,
    'income_from_charitable_activities': 162400000,
    'activities_generating_funds': 89900000,
    'creditors_within_1_year': 20500000,
    'fundraising_trading_costs': 67600000,
    'total_assets': 73900000,
    'gains_on_pension_fund': -16300000,
    'fixed_investment_assets_at_start_of_year': 3400000,
    'volunteers': 22000,
    'pension_assets': -30600000,
    'governance_costs': 1300000,
    'id': 45244,
    'income_from_legacies': 15000000,
    'restricted_funds': 39100000,
    'unrestricted_funds': 32200000},
   {'endowment_funds': 2600000,
    'grants_to_institutions': 79300000,
    'financial_year_end': '2013-03-31',
    'support_costs': 27700000,
    'voluntary_income': 111500000,
    'total_current_assets': 108400000,
    'created_at': '2014-06-12T22:47:36+01:00',
    'reserves': 26500000,
    'other_expenses': 1400000,
    'income_from_endowments': 0,
    'cash': 52900000,
    'investment_gains': 0,
    'total_expenses': 384600000,
    'current_investment_assets': 0,
    'charity_only_accounts': None,
    'charitable_activities_costs': 290000000,
    'total_funds': 55900000,
    'investment_management_costs': 100000,
    'fixed_investment_assets_at_end_of_year': 3000000,
    'voluntary_income_costs': 23200000,
    'fixed_assets_at_end_of_year': 16100000,
    'other_income': 800000,
    'long_term_creditors_or_provisions': 14100000,
    'annual_return_code': 'AR13',
    'employees': 5046,
    'consolidated_accounts': True,
    'fixed_assets_at_start_of_year': 14800000,
    'updated_at': '2014-06-12T22:47:36+01:00',
    'total_income': 367900000,
    'charity_id': 92548,
    'gains_from_asset_revaluations': 0,
    'financial_year_start': '2012-04-01',
    'depreciation': 3200000,
    'investment_income': 900000,
    'income_from_charitable_activities': 166000000,
    'activities_generating_funds': 88700000,
    'creditors_within_1_year': 25900000,
    'fundraising_trading_costs': 68600000,
    'total_assets': 55900000,
    'gains_on_pension_fund': -1300000,
    'fixed_investment_assets_at_start_of_year': 3000000,
    'volunteers': 22000,
    'pension_assets': -28600000,
    'governance_costs': 1300000,
    'id': 45245,
    'income_from_legacies': 11800000,
    'restricted_funds': 31700000,
    'unrestricted_funds': 21600000}],
  'corrected_company_number': '00612172',
  'charity_number': '202918',
  'address': {'lat': 51.7332804,
   'addressee_type': 'Charity',
   'postal_code': 'OX4 2JY',
   'updated_at': '2012-04-30T01:03:50+01:00',
   'locality': 'OXFORD',
   'addressee_id': 92548,
   'raw_address': None,
   'created_at': '2010-08-31T23:07:32+01:00',
   'region': None,
   'id': 136328,
   'former': False,
   'country': None,
   'street_address': 'OXFAM GB, OXFAM, 2700 JOHN SMITH DRIVE, OXFORD BUSINESS PARK SOUTH',
   'lng': -1.2060825},
  'youtube_account_name': 'OxfamGreatBritain',
  'company_number': '612172',
  'normalised_title': 'oxfam',
  'twitter_account_name': 'oxfamgb',
  'classifications': [{'id': 678,
    'parent_id': None,
    'extended_title': None,
    'updated_at': '2012-01-26T08:54:30+00:00',
    'grouping': 'CharityClassification',
    'uid': '202',
    'title': 'ELDERLY/OLD PEOPLE',
    'created_at': '2010-10-26T07:28:30+01:00'},
   {'id': 691,
    'parent_id': None,
    'extended_title': None,
    'updated_at': '2012-01-26T08:54:30+00:00',
    'grouping': 'CharityClassification',
    'uid': '308',
    'title': 'SPONSORS OR UNDERTAKES RESEARCH',
    'created_at': '2010-10-26T07:28:30+01:00'},
   {'id': 685,
    'parent_id': None,
    'extended_title': None,
    'updated_at': '2012-01-26T08:54:30+00:00',
    'grouping': 'CharityClassification',
    'uid': '302',
    'title': 'MAKES GRANTS TO ORGANISATIONS',
    'created_at': '2010-10-26T07:28:30+01:00'},
   {'id': 689,
    'parent_id': None,
    'extended_title': None,
    'updated_at': '2012-01-26T08:54:30+00:00',
    'grouping': 'CharityClassification',
    'uid': '306',
    'title': 'PROVIDES SERVICES',
    'created_at': '2010-10-26T07:28:30+01:00'},
   {'id': 683,
    'parent_id': None,
    'extended_title': None,
    'updated_at': '2012-01-26T08:54:30+00:00',
    'grouping': 'CharityClassification',
    'uid': '207',
    'title': 'THE GENERAL PUBLIC/MANKIND',
    'created_at': '2010-10-26T07:28:30+01:00'},
   {'id': 690,
    'parent_id': None,
    'extended_title': None,
    'updated_at': '2012-01-26T08:54:30+00:00',
    'grouping': 'CharityClassification',
    'uid': '307',
    'title': 'PROVIDES ADVOCACY/ADVICE/INFORMATION',
    'created_at': '2010-10-26T07:28:30+01:00'},
   {'id': 677,
    'parent_id': None,
    'extended_title': None,
    'updated_at': '2012-01-26T08:54:30+00:00',
    'grouping': 'CharityClassification',
    'uid': '201',
    'title': 'CHILDREN/YOUNG PEOPLE',
    'created_at': '2010-10-26T07:28:30+01:00'},
   {'id': 733,
    'parent_id': 732,
    'extended_title': None,
    'updated_at': '2010-10-29T22:19:14+01:00',
    'grouping': 'ICNPO',
    'uid': '9100',
    'title': 'International activities',
    'created_at': '2010-10-29T22:19:14+01:00'},
   {'id': 667,
    'parent_id': None,
    'extended_title': None,
    'updated_at': '2012-01-26T08:54:30+00:00',
    'grouping': 'CharityClassification',
    'uid': '105',
    'title': 'THE PREVENTION OR RELIEF OF POVERTY',
    'created_at': '2010-10-26T07:28:30+01:00'},
   {'id': 686,
    'parent_id': None,
    'extended_title': None,
    'updated_at': '2012-01-26T08:54:30+00:00',
    'grouping': 'CharityClassification',
    'uid': '303',
    'title': 'PROVIDES OTHER FINANCE',
    'created_at': '2010-10-26T07:28:30+01:00'},
   {'id': 679,
    'parent_id': None,
    'extended_title': None,
    'updated_at': '2012-01-26T08:54:30+00:00',
    'grouping': 'CharityClassification',
    'uid': '203',
    'title': 'PEOPLE WITH DISABILITIES',
    'created_at': '2010-10-26T07:28:30+01:00'},
   {'id': 668,
    'parent_id': None,
    'extended_title': None,
    'updated_at': '2012-01-26T08:54:30+00:00',
    'grouping': 'CharityClassification',
    'uid': '106',
    'title': 'OVERSEAS AID/FAMINE RELIEF',
    'created_at': '2010-10-26T07:28:30+01:00'},
   {'id': 681,
    'parent_id': None,
    'extended_title': None,
    'updated_at': '2012-01-26T08:54:30+00:00',
    'grouping': 'CharityClassification',
    'uid': '205',
    'title': 'OTHER CHARITIES OR VOLUNTARY BODIES',
    'created_at': '2010-10-26T07:28:30+01:00'}],
  'title': 'OXFAM',
  'accounts': [{'spending': '384600000',
    'sir_url': 'http://apps.charitycommission.gov.uk/SIR/ENDS18\\0000202918_SIR_20130331_E.PDF',
    'accounts_url': None,
    'accounts_date': '31 Mar 2013',
    'income': '367900000',
    'consolidated': 13},
   {'spending': '378700000',
    'sir_url': 'http://apps.charitycommission.gov.uk/SIR/ENDS18\\0000202918_SIR_20120331_E.PDF',
    'accounts_url': None,
    'accounts_date': '31 Mar 2012',
    'income': '385500000',
    'consolidated': 13},
   {'spending': '361100000',
    'sir_url': None,
    'accounts_url': None,
    'accounts_date': '31 Mar 2011',
    'income': '367500000',
    'consolidated': 13},
   {'spending': '294800000',
    'sir_url': 'http://apps.charitycommission.gov.uk/SIR/ENDS18\\0000202918_SIR_10_E.PDF',
    'accounts_url': None,
    'accounts_date': '31 Mar 2010',
    'income': '318000000',
    'consolidated': 13},
   {'spending': '318600000',
    'sir_url': 'http://apps.charitycommission.gov.uk/SIR/ENDS18\\0000202918_SIR_09_E.PDF',
    'accounts_url': None,
    'accounts_date': '30 Apr 2009',
    'income': '308300000',
    'consolidated': 13}],
  'spending': None,
  'trustees': [{'uid': '2490956', 'full_name': 'MS MAJA DARUWALA'},
   {'uid': '3358688', 'full_name': 'MR MATTHEW HENRY MARTIN'},
   {'uid': '3480415', 'full_name': 'MS Katy Steward'},
   {'uid': '3517246', 'full_name': 'MS PATRICIA MARY ZIPFEL'},
   {'uid': '3517247', 'full_name': 'MR RAJIV JOSHI'},
   {'uid': '3570943', 'full_name': 'MS MARJORIE SCARDINO'},
   {'uid': '3666324', 'full_name': 'MR JAMES ANDREW DARCY'},
   {'uid': '3700697', 'full_name': 'KAREN BROWN MS'},
   {'uid': '3710351', 'full_name': 'MR DAVID PITT-WATSON'},
   {'uid': '11169670', 'full_name': 'MS NKOYO TOYO'},
   {'uid': '11369445', 'full_name': 'MR GAVIN MACNEILL STEWART'},
   {'uid': '11369446', 'full_name': 'MR STEPHEN MARK WALTON'}],
  'website': 'http://www.oxfam.org.uk',
  'last_checked': '2014-05-16T16:44:09+01:00',
  'housing_association_number': None,
  'signed_up_for_1010': False,
  'other_names': ['OXFAM GB'],
  'volunteers': 22000,
  'subsidiary_number': None,
  'fax': None,
  'accounts_date': '2013-03-31',
  'id': 92548,
  'date_removed': None,
  'manually_updated': None,
  'facebook_account_name': None,
  'feed_url': None,
  'date_registered': '1965-09-07',
  'employees': 5046,
  'telephone': '0870 333 2444',
  'vat_number': None,
  'governing_document': 'MEMORANDUM AND ARTICLES INCORPORATED 01/10/58 AS AMENDED 07/05/65,  26/07/86, 18/10/96, 10/07/98, 29/10/98, 05/01/99, 02/99, 13/01/ 00, 23/11/07  AND 24/04/09, NOW ARTICLES ADOPTED AND AMENDED BY SPECIAL RESOLUTION DATED 04/12/09.;'}}
In [159]:
#We can start to pull out specifics from this data...
chdata = jdata['charity']
fdata = chdata['financial_breakdown']

data={}
#The data will include employee and volunteer numbers..
for tmp in ['volunteers','employees']:
    data[tmp] = chdata[tmp]
 
#...as well as financial information
for tmp in ['assets','spending','income']:
    if fdata != None and tmp in fdata:
        for tmp2 in fdata[tmp]:
            data[tmp+'_'+tmp2] = fdata[tmp][tmp2]
data
Out[159]:
{'spending_generating_voluntary_income': '23200000',
 'assets_other_assets': '108400000',
 'assets_total_liabilities': '-68600000',
 'assets_own_use_assets': '13100000',
 'income_charitable': '166000000',
 'spending_investment_management': '100000',
 'assets_long_term_investments': '3000000',
 'volunteers': 22000,
 'income_trading': '88700000',
 'income_investment_gains': '0',
 'spending_other': '1400000',
 'income_voluntary': '111500000',
 'spending_charitable_activities': '290000000',
 'spending_trading': '68600000',
 'employees': 5046,
 'spending_governance': '1300000',
 'income_other': '800000',
 'income_investment': '900000'}
In [160]:
#Or how about get a list of trustees?
pd.DataFrame(jdata['charity']['trustees'])

#A search on the Charities Commission site - http://www.charitycommission.gov.uk/find-charities/ - for a particular charity
#  includes a list of trustees. Clicking a trustee name shows you other charities for which they are a trustee.
#Unfortunately, OpenCharities doesnlt seem to support this level of search - to implement it ourselves, we'd have to scrape the
# data ourselves...
Out[160]:
full_name uid
0 MS MAJA DARUWALA 2490956
1 MR MATTHEW HENRY MARTIN 3358688
2 MS Katy Steward 3480415
3 MS PATRICIA MARY ZIPFEL 3517246
4 MR RAJIV JOSHI 3517247
5 MS MARJORIE SCARDINO 3570943
6 MR JAMES ANDREW DARCY 3666324
7 KAREN BROWN MS 3700697
8 MR DAVID PITT-WATSON 3710351
9 MS NKOYO TOYO 11169670
10 MR GAVIN MACNEILL STEWART 11369445
11 MR STEPHEN MARK WALTON 11369446
In [149]:
#Simple function to look for trustees for a charity using OpenCharities by Charity Number
def lookupTrusteesForCharity(charitynumber):
    url = 'http://opencharities.org/charities/{0}.json'.format(charitynumber)
    jsondata = json.loads(requests.get(url).text)
    df = pd.DataFrame(jsondata['charity']['trustees'])
    df['charityname']=jsondata['charity']['title']
    df['charitynumber']=jsondata['charity']['charity_number']
    return df

lookupTrusteesForCharity('516163')
Out[149]:
full_name uid charityname charitynumber
0 MR MICHAEL PEARS 1556031 105TH SHEFFIELD (HIGH GREEN) SCOUT GROUP 516163
1 MR ROBERT THURLBY 1768031 105TH SHEFFIELD (HIGH GREEN) SCOUT GROUP 516163
2 MRS CYNTHIA HINDLEY 2026503 105TH SHEFFIELD (HIGH GREEN) SCOUT GROUP 516163
3 MR DAVID BUTTERWORTH 2026712 105TH SHEFFIELD (HIGH GREEN) SCOUT GROUP 516163
4 MR ADAM LOMAX 3256338 105TH SHEFFIELD (HIGH GREEN) SCOUT GROUP 516163
In [156]:
#Display charity classification areas

def lookupCharityClassification(charitynumber):
    url = 'http://opencharities.org/charities/{0}.json'.format(charitynumber)
    jsondata = json.loads(requests.get(url).text)
    df = pd.DataFrame(jsondata['charity']['classifications'])
    
    df['charityname']=jsondata['charity']['title']
    df['charitynumber']=jsondata['charity']['charity_number']
    return df

lookupCharityClassification('516163')
Out[156]:
created_at extended_title grouping id parent_id title uid updated_at charityname charitynumber
0 2010-10-26T07:28:30+01:00 None CharityClassification 664 NaN EDUCATION/TRAINING 102 2012-01-26T08:54:30+00:00 105TH SHEFFIELD (HIGH GREEN) SCOUT GROUP 516163
1 2010-10-26T07:28:30+01:00 None CharityClassification 677 NaN CHILDREN/YOUNG PEOPLE 201 2012-01-26T08:54:30+00:00 105TH SHEFFIELD (HIGH GREEN) SCOUT GROUP 516163
2 2010-10-26T07:28:30+01:00 None CharityClassification 687 NaN PROVIDES HUMAN RESOURCES 304 2012-01-26T08:54:30+00:00 105TH SHEFFIELD (HIGH GREEN) SCOUT GROUP 516163
3 2010-10-29T22:19:13+01:00 None ICNPO 714 712 Scout groups and youth clubs 4110 2010-10-29T22:19:13+01:00 105TH SHEFFIELD (HIGH GREEN) SCOUT GROUP 516163
4 2010-10-26T07:28:30+01:00 None CharityClassification 688 NaN PROVIDES BUILDINGS/FACILITIES/OPEN SPACE 305 2012-01-26T08:54:30+00:00 105TH SHEFFIELD (HIGH GREEN) SCOUT GROUP 516163
5 2010-10-26T07:28:30+01:00 None CharityClassification 672 NaN AMATEUR SPORT 110 2012-01-26T08:54:30+00:00 105TH SHEFFIELD (HIGH GREEN) SCOUT GROUP 516163
In [ ]:
 
In [ ]:
 
In [ ]:
 
In [ ]:
 
In [ ]:
 
In [ ]:
#Notes on sqlalchemy

#There's something odd goes on - sqlalchemy tablenames may include case by default.
#To drop a tablename with cased characters, enclose the name in "" in the DROP statement.
#I also seem to have a table that is listed but that doesn't exist?
In [42]:
#import sqlalchemy
#meta = sqlalchemy.MetaData(engine, schema='tm351test')
#meta.reflect()
#pdsql = pd.io.sql.PandasSQLAlchemy(engine, meta=meta)
In [43]:
#from sqlalchemy import inspect
#eng=inspect(engine)
#eng.get_columns('charitiescommission')
In [44]:
#dir(eng)
In [45]:
#eng.get_table_names()
In [46]:
#dir(chC)
In [47]:
#psql("SELECT table_schema,table_name FROM information_schema.tables \
#    ORDER BY table_schema,table_name;", engine)
In [ ]: