Quick Look at UK PFI Contracts Data

With cuts in local spending due to continue, I started to wonder about whether the spending commitments made under PFI contracts are fixed, or whether they can be cut too. (If they're fixed, it's perhaps yet another example of how public policymakers allow corporates to lock in their own, presumably profitable, budgets at the expense of everyone else...)

I'm not sure whether I can make any progress at all with unerstanding the above, but my first step is to try to find some data about current PFI contracts. I found a spreadsheet detailing 2013 summary data about current PFI contracts - here's a report of a quick conversation with it.

[For future reference, see eg Treasury - Seventeenth Report: Private Finance Initiative, 18 July 2011 ]

In [1]:
#Load in some libraries to help us work with the data
import pandas as pd
from ggplot import *

Getting the Data

In [2]:
#Data source - download the data to the data/ directory
#wget https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/267640/pfi_current_projects_list_march_2013_published_dec_2013.xlsx -P data
In [3]:
#Import the data we've just downloaded
df=pd.read_excel('data/pfi_current_projects_list_march_2013_published_dec_2013.xlsx',skiprows=1)

#Preview the first few rows
df[:3]
Out[3]:
Unique HMT Project ID Project Name Department Procuring authority Sector Constituency Region Project Status Date Of OJEU Date of preferred bidder ... Equity holder 4: change of ownership since March 2011? (Yes / No) Equity holder 5: Name Equity holder 5: Equity share (%) Equity holder 5: change of ownership since March 2011? (Yes / No) Equity holder 6: Name Equity holder 6: Equity share (%) Equity holder 6: change of ownership since March 2011? (Yes / No) SPV name SPV company number SPV address
0 2 Sunningdale Park Site CABINET OFFICE (GROUP) CABINET OFFICE Other Windsor ENGLAND_South East InOperation 1998-04-17 2002-03-01 ... NaN NaN NaN NaN NaN NaN NaN DeVere Venues Ltd not known DeVere Venues Ltd
1 3 Compass CROWN PROSECUTION SERVICE (GROUP) THE CROWN PROSECUTION SERVICE IT Infrastructure and communications More than one National/More than one region InOperation 2000-10-06 N/K ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 5 RRS Ernest Shackleton DEPT FOR BUSINESS INNOVATION AND SKILLS (GROUP) NATURAL ENVIRONMENT RESEARCH COUNCIL Other More than one National/More than one region InOperation 1997-12-01 N/K ... NaN NaN NaN NaN NaN NaN NaN RRS Ernest Shackleton N/K N/K

3 rows × 107 columns

Reviewing the Data

Each row of the data file corresponds to a separate PFI contract, identified by a Unique HMT Project ID. The Procuring Authority looks like it's the entity that procured the project under the auspices of a more centralised Department. If a Department cuts a budget to the Procuring Authority in the years to come, what happens to covering the costs of the PFI contract?

Contracts can be grouped by constituency, which allows us to look at the political fall out from PFI contracts for a particular MP, I guess? Note that some contracts may just be covered by a blanket "more than one constituency' response.

Let's have a look at what the columns relate to in more comprehesive detail:

In [4]:
#Check the column headings
df.columns.tolist()
Out[4]:
['Unique HMT Project ID',
 'Project Name',
 'Department',
 'Procuring authority',
 'Sector',
 'Constituency',
 'Region',
 'Project Status',
 'Date Of OJEU',
 'Date of preferred bidder',
 'Date of financial close',
 'Date of construction completion',
 'First date of operations',
 'Operational period of contract (years)',
 'On / Off balance sheet under IFRS',
 'On / Off balance sheet under ESA 95',
 'On / Off balance sheet under UK GAAP',
 'Capital Value (£m)',
 'Unitary charge payment 1992-93 (£m)',
 'Unitary charge payment 1993-94 (£m)',
 'Unitary charge payment 1994-95 (£m)',
 'Unitary charge payment 1995-96 (£m)',
 'Unitary charge payment 1996-97 (£m)',
 'Unitary charge payment 1997-98 (£m)',
 'Unitary charge payment 1998-99 (£m)',
 'Unitary charge payment 1999-00 (£m)',
 'Unitary charge payment 2000-01 (£m)',
 'Unitary charge payment 2001-02 (£m)',
 'Unitary charge payment 2002-03 (£m)',
 'Unitary charge payment 2003-04 (£m)',
 'Unitary charge payment 2004-05 (£m)',
 'Unitary charge payment 2005-06 (£m)',
 'Unitary charge payment 2006-07 (£m)',
 'Unitary charge payment 2007-08 (£m)',
 'Unitary charge payment 2008-09 (£m)',
 'Unitary charge payment 2009-10 (£m)',
 'Unitary charge payment 2010-11 (£m)',
 'Unitary charge payment 2011-12 (£m)',
 'Unitary charge payment 2012-13 (£m)',
 'Estimated unitary charge payment 2013-14 (£m)',
 'Estimated unitary charge payment 2014-15 (£m)',
 'Estimated unitary charge payment 2015-16 (£m)',
 'Estimated unitary charge payment 2016-17 (£m)',
 'Estimated unitary charge payment 2017-18 (£m)',
 'Estimated unitary charge payment 2018-19 (£m)',
 'Estimated unitary charge payment 2019-20 (£m)',
 'Estimated unitary charge payment 2020-21 (£m)',
 'Estimated unitary charge payment 2021-22 (£m)',
 'Estimated unitary charge payment 2022-23 (£m)',
 'Estimated unitary charge payment 2023-24 (£m)',
 'Estimated unitary charge payment 2024-25 (£m)',
 'Estimated unitary charge payment 2025-26 (£m)',
 'Estimated unitary charge payment 2026-27 (£m)',
 'Estimated unitary charge payment 2027-28 (£m)',
 'Estimated unitary charge payment 2028-29 (£m)',
 'Estimated unitary charge payment 2029-30 (£m)',
 'Estimated unitary charge payment 2030-31 (£m)',
 'Estimated unitary charge payment 2031-32 (£m)',
 'Estimated unitary charge payment 2032-33 (£m)',
 'Estimated unitary charge payment 2033-34 (£m)',
 'Estimated unitary charge payment 2034-35 (£m)',
 'Estimated unitary charge payment 2035-36 (£m)',
 'Estimated unitary charge payment 2036-37 (£m)',
 'Estimated unitary charge payment 2037-38 (£m)',
 'Estimated unitary charge payment 2038-39 (£m)',
 'Estimated unitary charge payment 2039-40 (£m)',
 'Estimated unitary charge payment 2040-41 (£m)',
 'Estimated unitary charge payment 2041-42 (£m)',
 'Estimated unitary charge payment 2042-43 (£m)',
 'Estimated unitary charge payment 2043-44 (£m)',
 'Estimated unitary charge payment 2044-45 (£m)',
 'Estimated unitary charge payment 2045-46 (£m)',
 'Estimated unitary charge payment 2046-47 (£m)',
 'Estimated unitary charge payment 2047-48 (£m)',
 'Estimated unitary charge payment 2048-49 (£m)',
 'Estimated unitary charge payment 2049-50 (£m)',
 'Estimated unitary charge payment 2050-51 (£m)',
 'Estimated unitary charge payment 2051-52 (£m)',
 'Estimated unitary charge payment 2052-53 (£m)',
 'Estimated unitary charge payment 2053-54 (£m)',
 'Estimated unitary charge payment 2054-55 (£m)',
 'Estimated unitary charge payment 2055-56 (£m)',
 'Estimated unitary charge payment 2056-57 (£m)',
 'Estimated unitary charge payment 2057-58 (£m)',
 'Estimated unitary charge payment 2058-59 (£m)',
 'Estimated unitary charge payment 2059-60 (£m)',
 'Equity holder 1: Name',
 'Equity holder 1: Equity share (%)',
 'Equity holder 1: change of ownership since March 2011? \n(Yes / No)',
 'Equity holder 2: Name',
 'Equity holder 2: Equity share (%)',
 'Equity holder 2: change of ownership since March 2011? \n(Yes / No)',
 'Equity holder 3: Name',
 'Equity holder 3: Equity share (%)',
 'Equity holder 3: change of ownership since March 2011? \n(Yes / No)',
 'Equity holder 4: Name',
 'Equity holder 4: Equity share (%)',
 'Equity holder 4: change of ownership since March 2011? \n(Yes / No)',
 'Equity holder 5: Name',
 'Equity holder 5: Equity share (%)',
 'Equity holder 5: change of ownership since March 2011? \n(Yes / No)',
 'Equity holder 6: Name',
 'Equity holder 6: Equity share (%)',
 'Equity holder 6: change of ownership since March 2011? \n(Yes / No)',
 'SPV name',
 'SPV company number',
 'SPV address']

The column headings split in to what we might term "core" administrative data, payment-by-date columns, other financial columns, and equity share columns.

Let's just tease some of those out into separate groupings that we might be able to make use of later.

In [5]:
#Set up some convenenience groupings of columns
basecols=['Unique HMT Project ID', 'Project Name', 'Department', 'Procuring authority', 'Sector', 'Constituency', 
          'Region', 'Project Status', 'Date Of OJEU', 'Date of preferred bidder', 'Date of financial close',
          'Date of construction completion', 'First date of operations', 'Operational period of contract (years)',
          'On / Off balance sheet under IFRS', 'On / Off balance sheet under ESA 95', 'On / Off balance sheet under UK GAAP',
          'Capital Value (£m)','SPV name', 'SPV company number', 'SPV address']
financecols=['On / Off balance sheet under IFRS', 'On / Off balance sheet under ESA 95', 'On / Off balance sheet under UK GAAP']
corecols=['Unique HMT Project ID','Project Name','Procuring authority','Sector','Capital Value (£m)']
datecols=['Unitary charge payment 1992-93 (£m)', 'Unitary charge payment 1993-94 (£m)', 'Unitary charge payment 1994-95 (£m)',
          'Unitary charge payment 1995-96 (£m)', 'Unitary charge payment 1996-97 (£m)', 'Unitary charge payment 1997-98 (£m)',
          'Unitary charge payment 1998-99 (£m)', 'Unitary charge payment 1999-00 (£m)', 'Unitary charge payment 2000-01 (£m)',
          'Unitary charge payment 2001-02 (£m)', 'Unitary charge payment 2002-03 (£m)', 'Unitary charge payment 2003-04 (£m)',
          'Unitary charge payment 2004-05 (£m)', 'Unitary charge payment 2005-06 (£m)', 'Unitary charge payment 2006-07 (£m)',
          'Unitary charge payment 2007-08 (£m)', 'Unitary charge payment 2008-09 (£m)', 'Unitary charge payment 2009-10 (£m)',
          'Unitary charge payment 2010-11 (£m)', 'Unitary charge payment 2011-12 (£m)', 'Unitary charge payment 2012-13 (£m)',
          'Estimated unitary charge payment 2013-14 (£m)', 'Estimated unitary charge payment 2014-15 (£m)',
          'Estimated unitary charge payment 2015-16 (£m)', 'Estimated unitary charge payment 2016-17 (£m)',
          'Estimated unitary charge payment 2017-18 (£m)', 'Estimated unitary charge payment 2018-19 (£m)',
          'Estimated unitary charge payment 2019-20 (£m)', 'Estimated unitary charge payment 2020-21 (£m)',
          'Estimated unitary charge payment 2021-22 (£m)', 'Estimated unitary charge payment 2022-23 (£m)',
          'Estimated unitary charge payment 2023-24 (£m)', 'Estimated unitary charge payment 2024-25 (£m)',
          'Estimated unitary charge payment 2025-26 (£m)', 'Estimated unitary charge payment 2026-27 (£m)',
          'Estimated unitary charge payment 2027-28 (£m)', 'Estimated unitary charge payment 2028-29 (£m)',
          'Estimated unitary charge payment 2029-30 (£m)', 'Estimated unitary charge payment 2030-31 (£m)',
          'Estimated unitary charge payment 2031-32 (£m)', 'Estimated unitary charge payment 2032-33 (£m)',
          'Estimated unitary charge payment 2033-34 (£m)', 'Estimated unitary charge payment 2034-35 (£m)',
          'Estimated unitary charge payment 2035-36 (£m)', 'Estimated unitary charge payment 2036-37 (£m)',
          'Estimated unitary charge payment 2037-38 (£m)', 'Estimated unitary charge payment 2038-39 (£m)',
          'Estimated unitary charge payment 2039-40 (£m)', 'Estimated unitary charge payment 2040-41 (£m)',
          'Estimated unitary charge payment 2041-42 (£m)', 'Estimated unitary charge payment 2042-43 (£m)',
          'Estimated unitary charge payment 2043-44 (£m)', 'Estimated unitary charge payment 2044-45 (£m)',
          'Estimated unitary charge payment 2045-46 (£m)', 'Estimated unitary charge payment 2046-47 (£m)',
          'Estimated unitary charge payment 2047-48 (£m)', 'Estimated unitary charge payment 2048-49 (£m)',
          'Estimated unitary charge payment 2049-50 (£m)', 'Estimated unitary charge payment 2050-51 (£m)',
          'Estimated unitary charge payment 2051-52 (£m)', 'Estimated unitary charge payment 2052-53 (£m)',
          'Estimated unitary charge payment 2053-54 (£m)', 'Estimated unitary charge payment 2054-55 (£m)',
          'Estimated unitary charge payment 2055-56 (£m)', 'Estimated unitary charge payment 2056-57 (£m)',
          'Estimated unitary charge payment 2057-58 (£m)', 'Estimated unitary charge payment 2058-59 (£m)',
          'Estimated unitary charge payment 2059-60 (£m)']
equitycols=['Equity holder 1: Name', 'Equity holder 1: Equity share (%)',
            'Equity holder 1: change of ownership since March 2011? \n(Yes / No)',
            'Equity holder 2: Name', 'Equity holder 2: Equity share (%)',
            'Equity holder 2: change of ownership since March 2011? \n(Yes / No)',
            'Equity holder 3: Name', 'Equity holder 3: Equity share (%)',
            'Equity holder 3: change of ownership since March 2011? \n(Yes / No)',
            'Equity holder 4: Name', 'Equity holder 4: Equity share (%)',
            'Equity holder 4: change of ownership since March 2011? \n(Yes / No)',
            'Equity holder 5: Name', 'Equity holder 5: Equity share (%)',
            'Equity holder 5: change of ownership since March 2011? \n(Yes / No)',
            'Equity holder 6: Name', 'Equity holder 6: Equity share (%)',
            'Equity holder 6: change of ownership since March 2011? \n(Yes / No)']

What are the unique departments?

In [6]:
df['Department'].unique()
Out[6]:
array(['CABINET OFFICE (GROUP)', 'CROWN PROSECUTION SERVICE (GROUP)',
       'DEPT FOR BUSINESS INNOVATION AND SKILLS (GROUP)',
       'DEPARTMENT FOR EDUCATION (GROUP)',
       'DEPT FOR COMMUNITIES AND LOCAL GOVERNMENT (GROUP)',
       'DEPARTMENT OF HEALTH (GROUP)',
       'DEPARTMENT FOR CULTURE, MEDIA AND SPORT (GROUP)',
       'DEPT FOR ENVIRONMENT, FOOD & RURAL AFFAIRS (GROUP)',
       'DEPARTMENT FOR TRANSPORT (GROUP)',
       'DEPARTMENT FOR WORK AND PENSIONS (GROUP)',
       'FOREIGN AND COMMONWEALTH OFFICE (GROUP)',
       'SECURITY AND INTELLIGENCE AGENCIES (GROUP)',
       'HM REVENUE AND CUSTOMS (GROUP)', 'HM TREASURY (GROUP)',
       'HOME OFFICE (GROUP)', 'MINISTRY OF DEFENCE (GROUP)',
       'MINISTRY OF JUSTICE (GROUP)', 'NORTHERN IRELAND EXECUTIVE (GROUP)',
       'SCOTTISH GOVERNMENT (GROUP)', 'WELSH ASSEMBLY GOVERNMENT (GROUP)',
       'DEPARTMENT OF ENERGY AND CLIMATE CHANGE (GROUP)'], dtype=object)

What sectors do the contracts relate to?

In [7]:
df['Sector'].unique()
Out[7]:
array(['Other', 'IT Infrastructure and communications',
       'Schools (Non-BSF)', 'Schools (BSF)', 'Offices',
       'Housing (Non-HRA)', 'Housing (HRA)', 'Emergency Services',
       'Social Care', 'Leisure Facilities', 'Libraries', 'Waste',
       'Street Lighting', 'Roads and Highway Maintenance',
       'Underground Rail', 'Tram/Light Rail', 'Hospitals and Acute Health',
       'Military facility', 'Housing (Military)', 'Equipment', 'Courts',
       'Prisons', 'Secure Training Centres (YJB)', 'Energy'], dtype=object)

We can also take this line of questioning further - how many contracts are awarded by sector?

In [8]:
#Group by sector and count the number of contracts in each group
df.groupby('Sector').size().order(ascending=False)
Out[8]:
Sector
Schools (Non-BSF)                       149
Hospitals and Acute Health              144
Schools (BSF)                            66
Offices                                  47
Waste                                    41
Other                                    40
Emergency Services                       39
Street Lighting                          32
Roads and Highway Maintenance            32
Social Care                              23
Housing (HRA)                            17
Leisure Facilities                       15
Military facility                        13
Housing (Non-HRA)                        12
Prisons                                  12
IT Infrastructure and communications     11
Courts                                    8
Libraries                                 7
Housing (Military)                        7
Secure Training Centres (YJB)             4
Equipment                                 2
Tram/Light Rail                           2
Underground Rail                          1
Energy                                    1
dtype: int64
In [9]:
#Use the same pattern as before, but this time group by procuring authority and the ten with the most contracts
df.groupby('Procuring authority').size().order(ascending=False)[:10]
Out[9]:
Procuring authority
MINISTRY OF DEFENCE                           44
DEPARTMENT FOR EDUCATION                      14
Highways Agency                               14
Leeds                                         13
National Offender Management Service (WPR)    10
SCOTTISH WATER                                 9
HM REVENUE & CUSTOMS                           8
Newcastle upon Tyne                            8
Manchester                                     7
Sheffield                                      7
dtype: int64

What is the total capital value by sector?

In [10]:
#The Capital Value column may contain things that aren't numbers. If so, ignore them so they don't break the sum calculation
tmp=df[df['Capital Value (£m)'].apply(lambda x: isinstance(x, (int, float)))]
#We can also force the type
tmp['Capital Value (£m)']=tmp['Capital Value (£m)'].astype(float)
#Now use the filtered dataframe to calculate the total Capital Value associated with each sector
tmp[['Sector','Capital Value (£m)']].groupby('Sector').sum().sort('Capital Value (£m)',ascending=False)
-c:4: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
Out[10]:
Capital Value (£m)
Sector
Hospitals and Acute Health 13267.460000
Other 7174.511898
Schools (Non-BSF) 7078.410987
Roads and Highway Maintenance 5155.386000
Waste 4809.200000
Offices 3766.522864
Schools (BSF) 3298.491825
Street Lighting 1427.198626
Military facility 1361.971000
Housing (HRA) 1198.834227
Emergency Services 1041.736000
IT Infrastructure and communications 867.100000
Tram/Light Rail 800.143000
Prisons 637.000000
Underground Rail 475.162100
Housing (Non-HRA) 433.786120
Leisure Facilities 296.387000
Social Care 265.975000
Housing (Military) 198.320000
Courts 195.000000
Equipment 194.340000
Libraries 157.700000
Secure Training Centres (YJB) 67.600000
Energy 43.000000

And how long do the contracts tend to remain operational for?

In [11]:
#With data in a dataframe, we can chart various elements of it
#For example, we can generate a histogram that shows a count of projects with a particular oprsational duration
df['Operational period of contract (years)'].hist()
Out[11]:
<matplotlib.axes.AxesSubplot at 0x7f5c7502add8>

How Much is Estimated as Going to Be Spent Year on Year?

One of the major questions that comes to my mind is "how much will be spent year on year?"

The spend is currently split, by project, across columns. We can look at the total spend in a particular year by summing down a columns:

In [12]:
df['Estimated unitary charge payment 2014-15 (£m)'].sum()
Out[12]:
10217.493401545797

The precision of that results looks a little bit too ragged, possibly in part becuase of the way the numbers are being represented as floating point numbers. Let's round the result:

In [13]:
df['Estimated unitary charge payment 2014-15 (£m)'].sum().round(2)
Out[13]:
10217.49

We can also look at the spend by procuring authority for a particular year, or department.

In [14]:
def spendBy_X(df,column, item, year=2014):
    ''' Display the total spend for a particular year, filtered by a particular value in a particular column '''
    
    if year<2013:  year='Unitary charge payment {0}-{1} (£m)'.format(year,year-1999)
    else: year='Estimated unitary charge payment {0}-{1} (£m)'.format(year,year-1999)

    print('{0} [total] for the {1} {2} is £{3}m across {4} contracts.'.format( 
            year,
            item,
            column,
            df[df[column]==item][year].sum().round(2),
            df[df[column]==item][year].count()
            ))


spendBy_X(df,'Procuring authority', 'Isle of Wight')
spendBy_X(df,'Department', 'DEPARTMENT FOR EDUCATION (GROUP)')
spendBy_X(df,'Sector', 'Emergency Services',2012)
Estimated unitary charge payment 2014-15 (£m) [total] for the Isle of Wight Procuring authority is £25.26m across 2 contracts.
Estimated unitary charge payment 2014-15 (£m) [total] for the DEPARTMENT FOR EDUCATION (GROUP) Department is £1008.42m across 168 contracts.
Unitary charge payment 2012-13 (£m) [total] for the Emergency Services Sector is £423.85m across 39 contracts.

If we want to plot a time series of spend across years, it's easiest if we move the data from the current wide format, in which the data for different years is spread across columns, to a long format in which we have a column for the year and a column for the corresponding value.

In [15]:
#Generate a view of the data that just has the core and date related columns
dfTime=df[corecols+datecols]
In [16]:
#Generate a long version of this data by melting the date related columns
dfTimeLong=pd.melt(dfTime, id_vars=corecols, value_vars= datecols )

#Now we'll do some tidying, using a differnt method to the previous one to ensure we have a numeric datatype for the value
#Turn the values to numerics
dfTimeLong['value']=dfTimeLong['value'].convert_objects(convert_numeric=True)
#Get rid of any non-numeric values
dfTimeLong=dfTimeLong.dropna(subset=['value'])
#Get rid of any zero values
dfTimeLong=dfTimeLong[dfTimeLong['value']!=0]
In [17]:
#Preview the data - note how the variable column identifies the year, and the value column the payment for that year
dfTimeLong[:3]
Out[17]:
Unique HMT Project ID Project Name Procuring authority Sector Capital Value (£m) variable value
3122 239 A1(M) Alconbury to Peterborough Highways Agency Roads and Highway Maintenance 128 Unitary charge payment 1996-97 (£m) 5.805675
3123 240 A19 Dishforth to Tyne Tunnel DBFO Highways Agency Roads and Highway Maintenance 29 Unitary charge payment 1996-97 (£m) 0.205000
3125 242 A30/A35 Exeter to Bere Regis Highways Agency Roads and Highway Maintenance 75 Unitary charge payment 1996-97 (£m) 5.400000

Using the long format data frame, if we filter by project ID we now get the yearly payments as rows rather than columns.

In [18]:
#Preview data for a particular project
dfTimeLong[dfTimeLong['Unique HMT Project ID']==523]
Out[18]:
Unique HMT Project ID Project Name Procuring authority Sector Capital Value (£m) variable value
12807 523 Eschools Data Warehouse DEPARTMENT FOR EDUCATION IT Infrastructure and communications 4.63 Unitary charge payment 2009-10 (£m) 0.37
13532 523 Eschools Data Warehouse DEPARTMENT FOR EDUCATION IT Infrastructure and communications 4.63 Unitary charge payment 2010-11 (£m) 0.35
14257 523 Eschools Data Warehouse DEPARTMENT FOR EDUCATION IT Infrastructure and communications 4.63 Unitary charge payment 2011-12 (£m) 0.36
14982 523 Eschools Data Warehouse DEPARTMENT FOR EDUCATION IT Infrastructure and communications 4.63 Unitary charge payment 2012-13 (£m) 0.37
15707 523 Eschools Data Warehouse DEPARTMENT FOR EDUCATION IT Infrastructure and communications 4.63 Estimated unitary charge payment 2013-14 (£m) 0.38
16432 523 Eschools Data Warehouse DEPARTMENT FOR EDUCATION IT Infrastructure and communications 4.63 Estimated unitary charge payment 2014-15 (£m) 0.32

The date range that a spend relates to is currently wrapped up in textual variable column values. Let's generate a new column that specifies a single year relating to the start of the tax year the spnd relates to. We can then plot our time series against this value.

In [19]:
#Let's do some more tidying - pull out the start year that a value relates to
dfTimeLong['year']=dfTimeLong['variable'].apply(lambda x: x.split('payment')[1].split('-')[0])
dfTimeLong['year']=dfTimeLong['year'].astype(int)

#Identify whether the value was an actual payment or an estimated one
dfTimeLong['typ']=dfTimeLong['variable'].apply(lambda x: 'estimated' if x.startswith('Estimated') else 'actual')
dfTimeLong[:3]
Out[19]:
Unique HMT Project ID Project Name Procuring authority Sector Capital Value (£m) variable value year typ
3122 239 A1(M) Alconbury to Peterborough Highways Agency Roads and Highway Maintenance 128 Unitary charge payment 1996-97 (£m) 5.805675 1996 actual
3123 240 A19 Dishforth to Tyne Tunnel DBFO Highways Agency Roads and Highway Maintenance 29 Unitary charge payment 1996-97 (£m) 0.205000 1996 actual
3125 242 A30/A35 Exeter to Bere Regis Highways Agency Roads and Highway Maintenance 75 Unitary charge payment 1996-97 (£m) 5.400000 1996 actual

We've now got a long format dataset with a handy date column.

Let's generate a view of the long form dataset that finds the total value across all contracts by year.

In [20]:
totalexposure=dfTimeLong[['year','value']].groupby(['year']).sum()
totalexposure.reset_index(inplace=True)
totalexposure[:3]
Out[20]:
year value
0 1996 25.880750
1 1997 156.139501
2 1998 562.001436

We can chart this total exposure using a line chart.

In [21]:
ggplot(totalexposure,aes(x='year',y='value')) \
    + geom_line() \
    + ggtitle('Total UK PFI estimated spend') \
    + ylab('Spend £m')
Out[21]:
<ggplot: (-9223363284646696174)>

It might be more useful to look at the spend required of a particular procuring authority.

Let's generate a new view of the data, exposure, that allows us to easily chart the spend for each one of them.

In [22]:
#What's the exposure of a particular procuring authority to contracted PFI spend each year?
exposure=dfTimeLong[['Procuring authority','year','value']].groupby(['Procuring authority','year']).sum()
exposure.reset_index(inplace=True)
exposure[:3]
Out[22]:
Procuring authority year value
0 Aberdeenshire 2001 0.2
1 Aberdeenshire 2002 2.4
2 Aberdeenshire 2003 3.0
In [23]:
#Have a peek at the data for a particular authority
ggplot(exposure[exposure['Procuring authority']=='Manchester'],aes(x='year',y='value'))+geom_line()
Out[23]:
<repr(<ggplot.ggplot.ggplot at 0x7f5c7445d9b0>) failed: KeyError: 0>
In [24]:
#Let's bundle that up into a function
def procAuthExposureChart(_df,procAuth):
    g=ggplot(_df[_df['Procuring authority']==procAuth],aes(x='year',y='value')) \
        + geom_line() + ylab('(Epected) Unitary Charge Payment (£m)') \
        + ggtitle(procAuth)
    return g

procAuthExposureChart(exposure,'Isle of Wight')
Out[24]:
<repr(<ggplot.ggplot.ggplot at 0x7f5c7453a2e8>) failed: KeyError: 0>

The exposure data view gives us a view of the total year on year exposure of a procuring authority. But what about it we want to break that down into separate projects?

Let's create a new, more refined view, that let's us do just that - again, we'll keep the data in a long form.

In [25]:
groupingrows= ['Unique HMT Project ID','Project Name','Procuring authority','year']
projexposure=dfTimeLong[groupingrows+['value']].groupby(groupingrows).sum()
projexposure.reset_index(inplace=True)
projexposure[:3]
Out[25]:
Unique HMT Project ID Project Name Procuring authority year value
0 2 Sunningdale Park Site CABINET OFFICE 2002 4.0
1 2 Sunningdale Park Site CABINET OFFICE 2003 3.8
2 2 Sunningdale Park Site CABINET OFFICE 2004 3.6

We can now ask about the data for a particular procuring authority, broken out by project.

In [26]:
#If assign the colour aeshetic to the project, we can chart each project with a separately coloured line
ggplot(projexposure[projexposure['Procuring authority']=='Manchester'],aes(x='year',y='value',colour='Project Name')) \
    + geom_line()
Out[26]:
<repr(<ggplot.ggplot.ggplot at 0x7f5c744bf630>) failed: KeyError: 0>

An alternative way of presenting the same data is to break each project out into a separately faceted chart.

In [27]:
ggplot(projexposure[projexposure['Procuring authority']=='Manchester'],aes(x='year',y='value')) \
    + geom_line() \
    + facet_wrap('Project Name', scales = "fixed")
Out[27]:
<ggplot: (8752191591295)>

That chart for Temple Primary School looks like a bit of a future liability? Does the data really say that?!

In [28]:
#Grab the row for the Temple Primary School project and melt the date columns, retaining years where a value is set
pd.melt(df[df['Project Name']=='Temple Primary School'][datecols]).dropna()
Out[28]:
variable value
9 Unitary charge payment 2001-02 (£m) 0.32
10 Unitary charge payment 2002-03 (£m) 0.46
11 Unitary charge payment 2003-04 (£m) 0.47
12 Unitary charge payment 2004-05 (£m) 0.51
13 Unitary charge payment 2005-06 (£m) 0.539
14 Unitary charge payment 2006-07 (£m) 0.551
15 Unitary charge payment 2007-08 (£m) 0.576
16 Unitary charge payment 2008-09 (£m) 0.593
17 Unitary charge payment 2009-10 (£m) 0.599
18 Unitary charge payment 2010-11 (£m) 0.618
19 Unitary charge payment 2011-12 (£m) 0.652
20 Unitary charge payment 2012-13 (£m) 0.701
21 Estimated unitary charge payment 2013-14 (£m) 0.72
22 Estimated unitary charge payment 2014-15 (£m) 0.815
23 Estimated unitary charge payment 2015-16 (£m) 0.835375
24 Estimated unitary charge payment 2016-17 (£m) 0.8562594
25 Estimated unitary charge payment 2017-18 (£m) 0.8776659
26 Estimated unitary charge payment 2018-19 (£m) 0.8996075
27 Estimated unitary charge payment 2019-20 (£m) 0.9220977
28 Estimated unitary charge payment 2020-21 (£m) 0.9451501
29 Estimated unitary charge payment 2021-22 (£m) 0.9687789
30 Estimated unitary charge payment 2022-23 (£m) 0.9929984
31 Estimated unitary charge payment 2023-24 (£m) 1.017823
32 Estimated unitary charge payment 2024-25 (£m) 1.043269
33 Estimated unitary charge payment 2025-26 (£m) 1.069351
34 Estimated unitary charge payment 2026-27 (£m) 4.718884
35 Estimated unitary charge payment 2027-28 (£m) 4.765719
36 Estimated unitary charge payment 2028-29 (£m) 4.813724
37 Estimated unitary charge payment 2029-30 (£m) 4.86293
38 Estimated unitary charge payment 2030-31 (£m) 4.913366
39 Estimated unitary charge payment 2031-32 (£m) 4.965062
40 Estimated unitary charge payment 2032-33 (£m) 5.018051
41 Estimated unitary charge payment 2033-34 (£m) 5.072365
42 Estimated unitary charge payment 2034-35 (£m) 5.128037
43 Estimated unitary charge payment 2035-36 (£m) 5.1851

It seems like it does. I wonder, with education projects, we could presumably start to look at education related open data to get information about things like the number of pupils on the school roll for the schools built under PFI contracts?

Perhaps monitoring PFI contracts over time would be a good test bed of open data for transparency?

For example, for hospitals, data about number of beds occupied and in total would give a clue as to occupancy. What other metrics from HSCIC might make sense when it comes to checking the health (?!;-) of hospitals built under PFI. (In the case of NHS hospitals, there is probably also spending data from the NHS trust that can in part be cross-referenced to on-going costs associated with a hospital?)

Hmm - this could all be worth thinking about more...

But back to the data for now...

How much has been spent in total on each project? Let's look at the top 10 by their (estimated) total unitary amount.

In [29]:
#Let projcodeTotal represent summed total estimated unitary values for each project
projcodeTotal=projexposure[['Project Name','value']].groupby('Project Name').sum()
projcodeTotal.reset_index(inplace=True)
projcodeTotal.sort('value',ascending=False)[:10]
Out[29]:
Project Name value
28 Allenby/Connaught 11630.645419
375 M25 Orbital 10699.460000
243 Future Strategic Tanker Aircraft (FSTA) 10656.300388
483 PRIME 9029.960000
23 Acute site rationalisation 7143.000000
436 New District General Hospital 4441.000000
27 Airwave 4143.948024
437 New District General Hospital, Walsgrave 4027.000000
580 Skynet 5 3971.217000
393 Manchester Waste 3907.450000

And how about estimated total unitary amounts for the most exposed procuring authorities?

In [30]:
projexposure[['Procuring authority','value']].groupby('Procuring authority').sum().sort('value',ascending=False)[:10]
Out[30]:
value
Procuring authority
MINISTRY OF DEFENCE 49437.918160
Highways Agency 18508.535212
DEPARTMENT FOR WORK AND PENSIONS 9029.960000
National Offender Management Service (WPR) 7906.770763
Barts Health NHS Trust 7880.000000
HM REVENUE & CUSTOMS 5901.700000
HOME OFFICE 5628.869578
SCOTTISH WATER 4381.180000
University Hospitals Coventry and Warwickshire NHS 4027.000000
Greater Manchester Waste Disposal Authority 3907.450000

One of the many things I don't understand about these contracts is the balance between the capital Value and the unitary amounts. Is there anything notable about the way these values compare?

For example, the extent to which the total unitary amount is some multiple of the capital amount, or the simple difference between the two amounts?

For each project, let's look at the total estimated unitary amount alongside the capital value, and perhaps also bring in the operational period of the contract too to try to normalise things a little?

In [31]:
#Create a view of the data over some core data for each project and the summed unitary spend for each project
projReview=pd.merge(df[['Unique HMT Project ID','Project Name','Capital Value (£m)','Operational period of contract (years)']],
                    projcodeTotal, on='Project Name')

#Make sure we're dealing with numbers...
projReview['Capital Value (£m)']=projReview['Capital Value (£m)'].convert_objects(convert_numeric=True)

#Let's start to generate some comparative metrics
#I'm not sure if these are in any way meaningful, but they may reveal some sort of signal...
#...if we can work out how to interpet them?!
#First, the extent to which the summed total unitary value is a multiple of the capital amount
projReview['multiplier']=projReview['value'].divide(projReview['Capital Value (£m)'])
#Secondly, the difference betweem the summed total unitary value is a multiple of the capital amount
projReview['excess']= projReview['value']-projReview['Capital Value (£m)']
#Thirdly, that difference on average over the operational period of the contract
projReview['avexcess']=projReview['excess']/projReview['Operational period of contract (years)']

#Now let's start to make some rankings...
#First, let's rank according to the multiplier...
projReview.sort('multiplier',ascending=False)[:5]
Out[31]:
Unique HMT Project ID Project Name Capital Value (£m) Operational period of contract (years) value multiplier excess avexcess
201 216 PFI Eastbrook 0.000000 30 165.100000 inf 165.100000 5.503333
289 366 New District General Hospital 60.000000 33 4441.000000 74.016667 4381.000000 132.757576
290 381 New District General Hospital 86.600000 31 4441.000000 51.281755 4354.400000 140.464516
12 81 BSF Wave1 23.328988 33 1141.466377 48.929099 1118.137389 33.882951
230 245 A69 Carlisle to Newcastle 9.000000 30 350.220000 38.913333 341.220000 11.374000

Let's try for some textual interpretation of that...

In [32]:
def interpreter1(_df):
    #txt='The {0} project has capital value £{1}m and estimated total unitary spend  of £{2}m, ({3} times more than the capital amount),  with an operational contract period of {4} years.'.format(_df['Project Name'],_df['Capital Value (£m)'], _df['value'], _df['multiplier'])
    txt=''
    if _df['Capital Value (£m)']!=0:
        txt='''The {0} project has capital value £{1}m and estimated total unitary spend of £{2}m, \
({3} times more than the capital amount), with an operational contract period of {4} years.\n'''.format(_df['Project Name'],
                           round(_df['Capital Value (£m)'],2),
                           round(_df['value'],2),
                           round(_df['multiplier'],2),
                           _df['Operational period of contract (years)'])
    print(txt)
    
projReview.sort('multiplier',ascending=False)[:5].apply(interpreter1,axis=1 )
The New District General Hospital project has capital value £60.0m and estimated total unitary spend of £4441.0m, (74.02 times more than the capital amount), with an operational contract period of 33 years.

The New District General Hospital project has capital value £86.6m and estimated total unitary spend of £4441.0m, (51.28 times more than the capital amount), with an operational contract period of 31 years.

The BSF Wave1 project has capital value £23.33m and estimated total unitary spend of £1141.47m, (48.93 times more than the capital amount), with an operational contract period of 33 years.

The A69 Carlisle to Newcastle project has capital value £9.0m and estimated total unitary spend of £350.22m, (38.91 times more than the capital amount), with an operational contract period of 30 years.

Out[32]:
201    None
289    None
290    None
12     None
230    None
dtype: object

We could also sort based on difference in amount between the total estimated unitary spend and the capital amount.

In [33]:
projReview.sort('excess',ascending=False)[:5]
Out[33]:
Unique HMT Project ID Project Name Capital Value (£m) Operational period of contract (years) value multiplier excess avexcess
391 421 Allenby/Connaught 1298.425000 35 11630.645419 8.957503 10332.220419 295.206298
232 247 M25 Orbital 988.117000 31 10699.460000 10.828131 9711.343000 313.269129
257 282 PRIME 990.000000 20 9029.960000 9.121172 8039.960000 401.998000
405 437 Future Strategic Tanker Aircraft (FSTA) 2687.627898 24 10656.300388 3.964946 7968.672490 332.028020
261 286 Acute site rationalisation 1149.000000 43 7143.000000 6.216710 5994.000000 139.395349

Are there any projects where the estimated total unitary amount is less than the captial amount? (that is, where the excess column is less than zero?)

In [34]:
projReview.dropna(subset=['excess']).sort('excess')[:15]
Out[34]:
Unique HMT Project ID Project Name Capital Value (£m) Operational period of contract (years) value multiplier excess avexcess
244 263 Doncaster Interchange 200.000000 33 94.870000 0.474350 -1.051300e+02 -3.185758e+00
600 646 Enniskillen Hospital 223.910000 30 158.114169 0.706151 -6.579583e+01 -2.193194e+00
175 188 Brent Round 2 Non-HRA Housing 113.700000 19 91.450000 0.804310 -2.225000e+01 -1.171053e+00
476 517 Classroom 2000 Post Primary and Special local ... 102.370000 7 85.950000 0.839601 -1.642000e+01 -2.345714e+00
477 518 Classroom 2000 Wide Area Network managed servi... 120.510000 7 104.718000 0.868957 -1.579200e+01 -2.256000e+00
158 171 Merlyn Vaz Centre (LIFT project - formerly Cha... 12.095000 26 4.493649 0.371529 -7.601351e+00 -2.923597e-01
148 161 Warrington Phase 1: Anson & Blenheim ... 20.000000 33 14.520000 0.726000 -5.480000e+00 -1.660606e-01
482 523 Eschools Data Warehouse 4.630000 5 2.150000 0.464363 -2.480000e+00 -4.960000e-01
144 157 Selby, Sherburn and Tadcaster 16.320000 31 14.427000 0.884007 -1.893000e+00 -6.106452e-02
124 133 NE Derbyshire Holmewood Non-HRA 1.750000 15 0.744005 0.425146 -1.005995e+00 -6.706633e-02
672 821 Classroom 2000 Local Area Network services, Wi... 169.960000 5 169.960000 1.000000 -2.842171e-14 -5.684342e-15
478 519 Classrooom 2000 Primary Service local area net... 51.080000 5 51.080000 1.000000 7.105427e-15 1.421085e-15
485 526 Wellington College/Balmoral High School PFI Pr... 17.700000 25 18.200000 1.028249 5.000000e-01 2.000000e-02
35 39 Special Schools PFI 70.142387 31 70.828428 1.009781 6.860410e-01 2.213035e-02
189 202 Penzance Leisure Centre 6.700000 8 8.200000 1.223881 1.500000e+00 1.875000e-01

Finally, how about if we rank according the projects with the greatest "excess" amount averaged over the operational life of the contract? (So these projects presumably have expensive annual running costs?)

In [35]:
projReview.sort('avexcess',ascending=False)[:5]
Out[35]:
Unique HMT Project ID Project Name Capital Value (£m) Operational period of contract (years) value multiplier excess avexcess
257 282 PRIME 990.000000 20 9029.960000 9.121172 8039.960000 401.998000
405 437 Future Strategic Tanker Aircraft (FSTA) 2687.627898 24 10656.300388 3.964946 7968.672490 332.028020
232 247 M25 Orbital 988.117000 31 10699.460000 10.828131 9711.343000 313.269129
391 421 Allenby/Connaught 1298.425000 35 11630.645419 8.957503 10332.220419 295.206298
400 431 Defence Fixed Telecommunications Service (DFTS) 277.000000 19 3756.895000 13.562798 3479.895000 183.152368

Questions of Equity

Recall that there were several columns that identify equity shares in the project by one or more named partners.

In [36]:
equitycols
Out[36]:
['Equity holder 1: Name',
 'Equity holder 1: Equity share (%)',
 'Equity holder 1: change of ownership since March 2011? \n(Yes / No)',
 'Equity holder 2: Name',
 'Equity holder 2: Equity share (%)',
 'Equity holder 2: change of ownership since March 2011? \n(Yes / No)',
 'Equity holder 3: Name',
 'Equity holder 3: Equity share (%)',
 'Equity holder 3: change of ownership since March 2011? \n(Yes / No)',
 'Equity holder 4: Name',
 'Equity holder 4: Equity share (%)',
 'Equity holder 4: change of ownership since March 2011? \n(Yes / No)',
 'Equity holder 5: Name',
 'Equity holder 5: Equity share (%)',
 'Equity holder 5: change of ownership since March 2011? \n(Yes / No)',
 'Equity holder 6: Name',
 'Equity holder 6: Equity share (%)',
 'Equity holder 6: change of ownership since March 2011? \n(Yes / No)']

Let's see what we can learn about particular partners.

We'll start by generating a view of the equity partners associated with each project in a wide form, directly from the original dataset.

In [37]:
#Use the equity cols list to filter the columns we want, keying each row by project ID
df[['Unique HMT Project ID']+equitycols][:2]
Out[37]:
Unique HMT Project ID Equity holder 1: Name Equity holder 1: Equity share (%) Equity holder 1: change of ownership since March 2011? (Yes / No) Equity holder 2: Name Equity holder 2: Equity share (%) Equity holder 2: change of ownership since March 2011? (Yes / No) Equity holder 3: Name Equity holder 3: Equity share (%) Equity holder 3: change of ownership since March 2011? (Yes / No) Equity holder 4: Name Equity holder 4: Equity share (%) Equity holder 4: change of ownership since March 2011? (Yes / No) Equity holder 5: Name Equity holder 5: Equity share (%) Equity holder 5: change of ownership since March 2011? (Yes / No) Equity holder 6: Name Equity holder 6: Equity share (%) Equity holder 6: change of ownership since March 2011? (Yes / No)
0 2 De Vere Venues Ltd 1 NO NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 3 Data not supplied 1 NO NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

If we had a multi-index (that is, a hierarchical index) defined over the columns identifying each equity holder number as a three column wide spanning index, and a name, share and ownership change attribute within each, I think we could stack the data into a long format dataframe with name/share/change columns and a row for each equity holder in each project.

I don't offhand know how to create a multi-index from the data frame we have, so can't try this. Instead, we'll have to hack a way of getting the data into an appropriately long formatted data frame. We can do this by grabbing blocks of three columns at a tine to create a partial dataframe of the correct shape, and then just concat the dataframes from each three way grouping.

In [40]:
#Create a dummy dataframe to hold the long format data
dfequity=pd.DataFrame()
#There are six possible equity partner column groupings
for i in range(1, 6):
    #Calculate the index of the left most column in the current grouping
    lower=(i-1)*3
    #Generate a data frame from the project id and the three columns in the current equity grouping
    tmp=df[['Unique HMT Project ID','Project Name','SPV name','SPV company number']+equitycols][[0,1,2,3]+np.arange(lower+4,lower+6).tolist()]
    #Rename the columns
    tmp.columns=['Unique HMT Project ID','Project Name','SPV name','SPV company number','equityName','equityShare']
    #make sure the equity chart converts to a numeric
    tmp['equityShare']=tmp['equityShare'].convert_objects(convert_numeric=True)
    #Only keep rows where we have an equity share recorded
    tmp=tmp.dropna(subset=['equityShare'])
    #Add the data from the current equity grouping to the growing long format dataframe
    dfequity=pd.concat([dfequity,tmp])
In [41]:
#Let's see what our dataframe looks like
dfequity[:3]
Out[41]:
Unique HMT Project ID Project Name SPV name SPV company number equityName equityShare
0 2 Sunningdale Park Site DeVere Venues Ltd not known De Vere Venues Ltd 1
1 3 Compass NaN NaN Data not supplied 1
2 5 RRS Ernest Shackleton RRS Ernest Shackleton N/K Polar Holding ASA of Norway 1

We can also generate a summary of the equity partners involved in a project.

In [42]:
def showPartner(partner):
    txt='\n * {0} ({1}% equity share)'.format(partner[1]['equityName'],100*partner[1]['equityShare'])
    return txt

def equityPartnersInProject(_df, project,typ='name',endstr='\n'):
    if typ=='name':pp=_df[_df['Project Name']==project]
    elif typ=='id':pp=_df[_df['Unique HMT Project ID']==project]
    else: return
    
    txt='The {0} project (project id: {1}) has {2} equity partner(s):'.format(pp['Project Name'].iloc[0],pp['Unique HMT Project ID'].iloc[0],len(pp))
    for partner in pp.iterrows():
        txt+=showPartner(partner)
    txt+=endstr
    print(txt)
In [43]:
equityPartnersInProject(dfequity, 'Lambeth Street Lighting')
The Lambeth Street Lighting project (project id: 256) has 3 equity partner(s):
 * IIC Projects Ltd (68.0% equity share)
 * IIC Lambeth Funding Investment limited (17.0% equity share)
 * ETDE Infrastructure Limited (15.0% equity share)

Let's also just quickly generate a report of equity partners in projects associated with a particular SPV.

In [101]:
def equityPartnersBySPV(_df,spv,typ='name',endstr='\n'):
    if typ=='name':pp=_df[_df['SPV name'].notnull & _df['SPV name'].str.contains(spv)]
    elif typ=='id':pp=_df[_df['SPV company number']==spv]
    else: return
    
    if len(pp)==0:
        txt='No SPV with that {0} found.'.format(typ)
    else:
        txt='The {0} SPV has {1} equity partner(s):'.format(pp['SPV name'].iloc[0],len(pp))
    for partner in pp.iterrows():
        txt+=showPartner(partner)
    txt+=endstr
    print(txt)
In [104]:
equityPartnersBySPV(dfequity,'Hounslow')
equityPartnersBySPV(dfequity,'Island Roads')
The Hounslow Highway Services Limited SPV has 2 equity partner(s):
 * Vinci Infrastructures SAS (50.0% equity share)
 * BIIF Holdco ltd. (50.0% equity share)

The Island Roads Services Ltd SPV has 2 equity partner(s):
 * Vinci Concessions SAS (50.0% equity share)
 * Merdiam Infrastructure Finance II S.a.r.l. (50.0% equity share)

For completeness, let's do a search by equity partner.

In [113]:
def showPartnerInProj(partner):
    txt='\n * {0} ({1}% equity share in {2} (project id: {3}), SPV: {4})'.format(partner[1]['equityName'],100*partner[1]['equityShare'],
                                                     partner[1]['Project Name'],partner[1]['Unique HMT Project ID'],
                                                     partner[1]['SPV name'])
    return txt

def equitySearch(_df,equity,endstr='\n'):
    pp=_df[_df['equityName'].str.contains(equity)]
    
    if len(pp)==0:
        txt='No equity partner with that name found.'
    else:
        txt='That equity is associated with:'
    spv=[]
    for partner in pp.iterrows():
        txt+=showPartnerInProj(partner)
        if partner[1]['SPV name'] not in spv: spv.append(partner[1]['SPV name'])
    txt+=','.join(spv)+endstr
    print(txt)
In [114]:
equitySearch(dfequity,'G4S')
That equity is associated with:
 * G4S plc (51.0% equity share in Three North Wiltshire Secondary Schools (project id: 127), SPV: N/K)
 * G4S PLC (51%) a company in the corporate structure of ownership of UK Court Services (Manchester) Limited. (51.0% equity share in Manchester Magistrates' Courts (project id: 477), SPV: UK Courts Services (Manchester) Ltd)
 * G4S Joint Ventures Limited through G4S Joint Ventures (Fazakerley) Limited (100.0% equity share in HMP Altcourse (project id: 478), SPV: Fazakerley Prison Services Ltd)
 * G4S SPV Holdings Limited (59.0% equity share in HMP Parc (project id: 484), SPV: Bridgend Custodial Services Ltd)
 * G4S Joint Ventures Limited through G4S Joint Ventures (Onley) Limited (100.0% equity share in HMP Rye Hill (project id: 486), SPV: Onley Prison Services Ltd)
 * G4S Justice Services Ltd (50.0% equity share in Milton Keynes STC (project id: 489), SPV: N/K)
 * G4S plc (51.0% equity share in Group Schools (project id: 59), SPV: N/K)
 * G4S Joint Ventures Ltd (25.0% equity share in Orthopaedics/medicine redevelopment at Nuffield Orthopaedic Centre (project id: 342), SPV: Albion Healthcare (Oxford) Ltd)
 * G4S Joint Ventures Ltd (20.0% equity share in Integrated Cancer Centre (project id: 343), SPV: Ochre Solutions Limited)N/K,UK Courts Services (Manchester) Ltd,Fazakerley Prison Services Ltd,Bridgend Custodial Services Ltd,Onley Prison Services Ltd,Albion Healthcare (Oxford) Ltd,Ochre Solutions Limited

Having got a long data frame with equity holdings keyed by project ID, we can annotate each of these rows with data about each project.

For example, we can take project data that describes the project name, procuring authority and spend by year:

In [83]:
projexposure[:3]
Out[83]:
Unique HMT Project ID Project Name Procuring authority year value
0 2 Sunningdale Park Site CABINET OFFICE 2002 4.0
1 2 Sunningdale Park Site CABINET OFFICE 2003 3.8
2 2 Sunningdale Park Site CABINET OFFICE 2004 3.6

and inner join it by project id to the equity data.

This will give us a dataset that we should be able to use as the basis for asking questions about the spend, by equity share, to different partners, by year and/or project, procuring authority or department.

In [84]:
projexco=pd.merge(dfequity,projexposure,on=['Unique HMT Project ID','Project Name'])
projexco[:3]
Out[84]:
Unique HMT Project ID Project Name SPV name SPV company number equityName equityShare Procuring authority year value
0 2 Sunningdale Park Site DeVere Venues Ltd not known De Vere Venues Ltd 1 CABINET OFFICE 2002 4.0
1 2 Sunningdale Park Site DeVere Venues Ltd not known De Vere Venues Ltd 1 CABINET OFFICE 2003 3.8
2 2 Sunningdale Park Site DeVere Venues Ltd not known De Vere Venues Ltd 1 CABINET OFFICE 2004 3.6

Before we think about how to do that though, let's just check that the shares donlt add up to more than 100%.

In [85]:
#Do any rows have equity share sums  > 100% ?
test=projexco[['Unique HMT Project ID','year','equityShare']].groupby(['Unique HMT Project ID','year']).sum().sort(ascending=False)
test.reset_index(inplace=True)
#May get floating point errors
test[test['equityShare']>1.00000000000001]['Unique HMT Project ID'].unique()
Out[85]:
array([853,   8])

Oops - a couple of the projects do... Best go check the data for those two project IDs...

In [86]:
equityPartnersInProject(dfequity, 853,typ='id')
equityPartnersInProject(dfequity, 8,typ='id')
The BSF Wave 3 Phase 1 project (project id: 853) has 5 equity partner(s):
 * Laing O'Rourke (40.0% equity share)
 * Laing O'Rourke (40.0% equity share)
 * John Laing (40.0% equity share)
 * Amber Infrastructure (10.0% equity share)
 * Barnsley Metropolitan Borough Council (10.0% equity share)

The Barnsley Schools PFI project (project id: 8) has 2 equity partner(s):
 * Carillion (50.0% equity share)
 * Innisfree PFI Fund III (100.0% equity share)

Back to thinking about the spend to each partner - we need to work out the value retained by each partner, according to their share.

In [87]:
projexco['valueShare']=projexco['value']*projexco['equityShare']
projexco.reset_index(inplace=True)
projexco[123:129]
Out[87]:
index Unique HMT Project ID Project Name SPV name SPV company number equityName equityShare Procuring authority year value valueShare
123 123 8 Barnsley Schools PFI Modern Schools Barnsley 5259225 Carillion 0.5 Barnsley 2027 8.250000 4.125000
124 124 8 Barnsley Schools PFI Modern Schools Barnsley 5259225 Carillion 0.5 Barnsley 2028 8.234000 4.117000
125 125 8 Barnsley Schools PFI Modern Schools Barnsley 5259225 Carillion 0.5 Barnsley 2029 8.089000 4.044500
126 126 8 Barnsley Schools PFI Modern Schools Barnsley 5259225 Carillion 0.5 Barnsley 2030 7.471000 3.735500
127 127 8 Barnsley Schools PFI Modern Schools Barnsley 5259225 Carillion 0.5 Barnsley 2031 7.071000 3.535500
128 128 8 Barnsley Schools PFI Modern Schools Barnsley 5259225 Carillion 0.5 Barnsley 2032 1.669496 0.834748

Now let's do some more counting. We can take the long format data that gives separate rows for each equity partner by year and project, and group it by equity partner and year - this will give us an estimate of how much each partner will rake in acorss all the projects thay are partnered in each year, based on their equity share in each project and the estimated unitary amount for that project in that year.

In [88]:
projexcoAnnual=projexco[['equityName','year','valueShare']].groupby(['equityName','year']).sum()
projexcoAnnual.reset_index(inplace=True)

Let's see who the winners are for financial year 2014-15.

In [89]:
projexcoAnnual[projexcoAnnual['year']==2014].sort('valueShare',ascending=False)[:10]
Out[89]:
equityName year valueShare
11358 Innisfree 2014 527.497496
21565 Telereal Trillium 2014 507.300000
6862 EADS 2014 415.951000
15453 Maquarie European Infrastructure Fund II (MEIFII) 2014 282.058605
12712 John Laing 2014 280.280541
9156 HICL 2014 251.141000
19704 Semperian 2014 226.540450
4257 British Telecom 2014 211.770000
15433 Mapeley 2014 144.900000
2187 Balfour Beatty Plc 2014 135.069276

Let's just see if Trillium appears in any other contracts for 2014-15.

In [90]:
projexco[(projexco['equityName'].str.contains('Trillium')) & (projexco['year']==2014)]
Out[90]:
index Unique HMT Project ID Project Name SPV name SPV company number equityName equityShare Procuring authority year value valueShare
10246 10246 234 Estates Transformation N/K N/K Telereal Trillium (DV1A) Ltd 1.00 DRIVER AND VEHICLE LICENSING AGENCY 2014 22.95 22.950
12244 12244 282 PRIME Trillium (PRIME) Ltd NaN Telereal Trillium 1.00 DEPARTMENT FOR WORK AND PENSIONS 2014 507.30 507.300
17628 17628 384 Berlin Embassy Arteos HRA 112199 Land Securities Trillium 1.00 FOREIGN AND COMMONWEALTH OFFICE 2014 5.48 5.480
21632 21632 489 Milton Keynes STC N/K N/K Trillium PPP Investment Partners LLP 0.50 YOUTH JUSTICE BOARD 2014 18.30 9.150
21704 21704 494 STC Onley N/K N/K Land Securities Trillium Ltd 0.49 YOUTH JUSTICE BOARD 2014 2.60 1.274

Hmmm... so we can maybe add in about another £40m to their estimated take for that year.

We can also reuse a previous explainer to generate a report about other members of those projects:

In [91]:
trillium=projexco[(projexco['equityName'].str.contains('Trillium')) & (projexco['year']==2014)]
for project in trillium['Project Name']:
    equityPartnersInProject(dfequity, project)
    projReview[projReview['Project Name']==project].apply(interpreter1,axis=1 )
The Estates Transformation project (project id: 234) has 1 equity partner(s):
 * Telereal Trillium (DV1A)  Ltd (100.0% equity share)

The Estates Transformation project has capital value £93.21m and estimated total unitary spend of £466.8m, (5.01 times more than the capital amount), with an operational contract period of 20 years.

The PRIME project (project id: 282) has 1 equity partner(s):
 * Telereal Trillium (100.0% equity share)

The PRIME project has capital value £990.0m and estimated total unitary spend of £9029.96m, (9.12 times more than the capital amount), with an operational contract period of 20 years.

The Berlin Embassy project (project id: 384) has 1 equity partner(s):
 * Land Securities Trillium (100.0% equity share)

The Berlin Embassy project has capital value £17.1m and estimated total unitary spend of £155.59m, (9.1 times more than the capital amount), with an operational contract period of 30 years.

The Milton Keynes STC project (project id: 489) has 2 equity partner(s):
 * G4S Justice Services Ltd (50.0% equity share)
 * Trillium PPP Investment Partners LLP (50.0% equity share)

The Milton Keynes STC project has capital value £19.0m and estimated total unitary spend of £463.2m, (24.38 times more than the capital amount), with an operational contract period of 25 years.

The STC Onley project (project id: 494) has 2 equity partner(s):
 * GSL Ltd (51.0% equity share)
 * Land Securities Trillium Ltd (49.0% equity share)

The STC Onley project has capital value £18.9m and estimated total unitary spend of £127.6m, (6.75 times more than the capital amount), with an operational contract period of 15 years.

Let's tweak that phrasing a little...

In [92]:
def projDesc(_df,startstr='',endstr='\n'):
    if startstr=='': txt='The {0} project'.format(_df['Project Name'])
    else: txt=startstr
    if _df['Capital Value (£m)']!=0:
        txt+='''has capital value £{0}m and estimated total unitary spend of £{1}m, \
({2} times more than the capital amount), with an operational contract period of {3} years.'''.format(
                           round(_df['Capital Value (£m)'],2),
                           round(_df['value'],2),
                           round(_df['multiplier'],2),
                           _df['Operational period of contract (years)'])
    txt+=endstr
    print(txt)
    
In [93]:
for project in trillium['Project Name'][:1]:
    equityPartnersInProject(dfequity, project)
    projDesc(projReview[projReview['Project Name']==project].iloc[0],'It ')
The Estates Transformation project (project id: 234) has 1 equity partner(s):
 * Telereal Trillium (DV1A)  Ltd (100.0% equity share)

It has capital value £93.21m and estimated total unitary spend of £466.8m, (5.01 times more than the capital amount), with an operational contract period of 20 years.

To make further conversations easier, let's do a little more wrapping, for example, generating a report that shows projects for a particular year involving an equity partner whose name includes a specified keyword or keyphrase.

In [94]:
def projReportbyPartnerAndYear(partner,year):
    #Note that the search is case sensitive...
    projects=projexco[(projexco['equityName'].str.contains(partner)) & (projexco['year']==year)]
    for project in projects['Project Name']:
        equityPartnersInProject(dfequity, project,endstr='')
        projDesc(projReview[projReview['Project Name']==project].iloc[0],'It ')
In [95]:
projReportbyPartnerAndYear('G4S',2015)
The Group Schools project (project id: 59) has 2 equity partner(s):
 * Semperian PPP Investment Partners GP (49.0% equity share)
 * G4S plc (51.0% equity share)
It has capital value £nanm and estimated total unitary spend of £374.08m, (nan times more than the capital amount), with an operational contract period of 34 years.

The Three North Wiltshire Secondary Schools project (project id: 127) has 2 equity partner(s):
 * G4S plc (51.0% equity share)
 * Semperian PPP Investment Partners GP (49.0% equity share)
It has capital value £26.8m and estimated total unitary spend of £176.54m, (6.59 times more than the capital amount), with an operational contract period of 36 years.

The Orthopaedics/medicine redevelopment at Nuffield Orthopaedic Centre project (project id: 342) has 4 equity partner(s):
 * Barclays (25.0% equity share)
 * HICL (25.0% equity share)
 * G4S Joint Ventures Ltd (25.0% equity share)
 * Semperian (25.0% equity share)
It has capital value £37.0m and estimated total unitary spend of £325.0m, (8.78 times more than the capital amount), with an operational contract period of 32 years.

The Integrated Cancer Centre project (project id: 343) has 3 equity partner(s):
 * Impregilo International Infrastructure NV (40.0% equity share)
 * HICL (40.0% equity share)
 * G4S Joint Ventures Ltd (20.0% equity share)
It has capital value £129.0m and estimated total unitary spend of £891.0m, (6.91 times more than the capital amount), with an operational contract period of 31 years.

The Manchester Magistrates' Courts project (project id: 477) has 2 equity partner(s):
 * G4S PLC (51%) a company in the corporate structure of ownership of UK Court Services (Manchester) Limited. (51.0% equity share)
 * Secondary Market Infrastructure Fund (SMIF) = 49%, again still a company in the corporate structure of ownership of UK Courts Services(Manchester) Limited, was taken over by Semperian PPP Investment Partners Holdings Ltd, who transferred their PFI equity (49.0% equity share)
It has capital value £30.0m and estimated total unitary spend of £201.78m, (6.73 times more than the capital amount), with an operational contract period of 25 years.

The HMP Altcourse project (project id: 478) has 1 equity partner(s):
 * G4S Joint Ventures Limited through G4S Joint Ventures (Fazakerley) Limited (100.0% equity share)
It has capital value £68.0m and estimated total unitary spend of £907.02m, (13.34 times more than the capital amount), with an operational contract period of 25 years.

The HMP Parc project (project id: 484) has 2 equity partner(s):
 * G4S SPV Holdings Limited (59.0% equity share)
 * Innisfree M&G PPP LLP (41.0% equity share)
It has capital value £47.0m and estimated total unitary spend of £1018.61m, (21.67 times more than the capital amount), with an operational contract period of 25 years.

The HMP Rye Hill project (project id: 486) has 1 equity partner(s):
 * G4S Joint Ventures Limited through G4S Joint Ventures (Onley) Limited (100.0% equity share)
It has capital value £37.0m and estimated total unitary spend of £449.7m, (12.15 times more than the capital amount), with an operational contract period of 25 years.

The Milton Keynes STC project (project id: 489) has 2 equity partner(s):
 * G4S Justice Services Ltd (50.0% equity share)
 * Trillium PPP Investment Partners LLP (50.0% equity share)
It has capital value £19.0m and estimated total unitary spend of £463.2m, (24.38 times more than the capital amount), with an operational contract period of 25 years.

Let's try another one, and see what we can make of it..

How about Sodexo?

In [96]:
projexco[(projexco['equityName'].str.contains('Sodexo')) & (projexco['year']==2014)]
Out[96]:
index Unique HMT Project ID Project Name SPV name SPV company number equityName equityShare Procuring authority year value valueShare
13035 13035 297 Redevelopment of 3 hospitals for Women, Childr... Catalyst Healthcare (Manchester) Limited 4528418 Sodexo 0.2500 Central Manchester University Hospitals NHS Fo... 2014 71.00 17.750000
14048 14048 317 New block HPC King's College Hospital Limited 3696260 Sodexo Investments 0.2500 King's College Hospital NHS Foundation Trust 2014 22.00 5.500000
14778 14778 329 Hereford DGH Mercia Healthcare Limited 3693524 Sodexo Investment Services Ltd 0.2500 Wye Valley NHS Trust 2014 15.00 3.750000
17065 17065 376 University Hospital Health Care Support (North Staffs) Limited 5309720 Sodexo 0.2500 University Hospital Of North Staffordshire NHS... 2014 56.00 14.000000
17142 17142 377 Wythenshawe Hospital South Manchester Healthcare Limited 3502432 Sodexo Investment Services Limited 0.2500 University Hospital Of South Manchester NHS Fo... 2014 34.00 8.500000
17368 17368 380 Queen Mary's, Roehampton Roehampton Hospital Limited 4057595 Sodexo Investment Services Ltd 0.2500 Wandsworth PCT 2014 14.00 3.500000
21243 21243 480 HMP Bronzefield Ashford Prison Services Ltd 4334205 Sodexo Investment Services Ltd 0.3334 National Offender Management Service (WPR) 2014 31.10 10.368740
21343 21343 482 HMP Forest Bank Agecroft Prison Services Ltd 3509050 Sodexo Investment Services Ltd 0.5000 National Offender Management Service (WPR) 2014 38.85 19.425000
21460 21460 485 HMP Peterborough Peterborough Prison Management Ltd 4350276 Sodexo Investment Services Ltd 0.3334 National Offender Management Service (WPR) 2014 35.58 11.862372
23758 23758 552 Fife Schools PPP1 Pinnacle Schools (Fife) Ltd 4095208 Sodexo Alliance S.A. 0.1000 Fife 2014 7.53 0.753000
25986 25986 599 Addiewell Prison Addiewell Prison Ltd (APL) N/K Sodexo Investment Services Ltd 0.3333 Scottish Prison Service 2014 31.00 10.332300
27362 27362 627 Conwy Schools Project (3 schools) Enterprise Education (Conwy) Ltd 0 Sodexo Investment Services Ltd 0.1000 Conwy 2014 6.96 0.696000

Let's see how they're doing more graphically...

In [97]:
#Generate a data view containing rows where the name of the equity partner includes the word Sodexo
sodexo=projexco[(projexco['equityName'].str.contains('Sodexo'))][['equityName','year','valueShare']]
#For each name, generate the sum total of value shares going to that partner each year
sodexo=sodexo.groupby(['equityName','year']).sum()
sodexo.reset_index(inplace=True)
In [98]:
ggplot(sodexo,aes(x='year',y='valueShare')) \
    + geom_line() \
    + facet_wrap('equityName',scales='fixed') \
    + ggtitle('Sodexo')
Out[98]:
<ggplot: (-9223363284663392389)>
In [99]:
#Hmm - two takes on Sodexo? How come?
sodexo['equityName'].unique()
Out[99]:
array(['Sodexo', 'Sodexo ', 'Sodexo Alliance S.A.',
       'Sodexo Investment Services Limited',
       'Sodexo Investment Services Ltd', 'Sodexo Investments'], dtype=object)

Ah - white space...

In [100]: