Learning how to understand a project management dataset and build models in Python for further analysis

Today we will be exploring a synthetically generated dataset based on the PERIL database. Learn more about the PERIL dataset and it's creator Tom Kendrick here.

To accomplish our task of understanding the data, we will need to use some Python libraries. Python libraries can be summarized as code other people have written and shared that we can use to save us time. We can import and use that code with a small import statement. Some of the packages we will use today include Pandas, Scipy, Numpy, and MatPlotLib. If you want to learn more about how these different packages work with examples, click here. If you would like to see the code that imports these packages - click the show code button below.

In [34]:
#Importing python libraries which help us speed our coding up.
#The packages we're importing are below.
import pandas as pd
import random
from pandas.plotting import scatter_matrix
import scipy as sc
import numpy as np
import matplotlib
import matplotlib.pyplot as plt #Can abbreviate specific parts of the packages so it's easier to type
import sklearn as sk
import seaborn as sns
from sklearn.linear_model import LogisticRegression
from sklearn import metrics
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn import datasets, linear_model
In [4]:
#We will now read the data in and store it in a dataframe called dataset
dataset = pd.read_excel('Peril_Dataset.xlsx', header = 0)
dataset.insert(loc = 0, column = 'Project Number', value = np.arange(len(dataset)))

Understanding our data

Let's run some code and see how many rows and columns we have. Below - you will see the result as (Number of Rows, Number of Columns)

In [5]:
dataset.shape
Out[5]:
(5000, 13)

What do the first 10 rows of our data look like?

In [6]:
dataset.head(10)
Out[6]:
Project Number Parameter Category Sub cat Impact TRL Description Region Numeric Region Project Year Month Cost
0 0 Resource Money Limitation 18 5 Did not have sufficient resources and the cont... 0 Americas IT/Solution 2010 October 700560
1 1 Resource Outsourcing Delayed start 17 4 Contractor setup delayed by a week 0 Americas Prod. Dev. 2002 May 347922
2 2 Resource Outsourcing Late or poor output 13 5 Contractor did not spend time on the project b... 0 Americas IT/Solution 2002 August 919828
3 3 Resource Outsourcing Late or poor output 13 7 Contractors failed to show up as committed 1 Asia IT/Solution 2014 July 521495
4 4 Resource Outsourcing Late or poor output 13 3 Third-party vendor inadvertently introduced a ... 0 Americas IT/Solution 2010 November 322010
5 5 Resource Outsourcing Late or poor output 16 7 Outsouced staff lacked the necessary skillset 3 Africa IT/Solution 2014 June 498896
6 6 Resource People Late start 15 5 Planning delayed due to staff being still tied... 0 Americas Prod. Dev. 2015 April 592785
7 7 Resource People Loss 16 4 Chef quit two days before the café was schedul... 3 Africa IT/Solution 2017 October 578928
8 8 Resource People Motivation 11 5 Work at customer site had to be done by union ... 3 Africa IT/Solution 2003 July 318846
9 9 Resource People Queuing 16 6 Critical task assigned to a heavily booked expert 0 Americas IT/Solution 2010 June 1109120

What are our column names?

As you can see above - we have 12 columns with different values

Project Number: A unique number to identify each project.
Parameter: The parameter of the project that was impacted. This dataset categorizes the risk in these three main groups of Scope, Resource, and Schedule.
Category: The category of the risk. We have the following categories in the main parameters - Change(Scope), Defect(Scope), People(Resource), Money(Resource), Outsourcing(Resource), Delay(Schedule), Dependency(Schedule), Estimates(Schedule)
Sub cat: The sub-category of the risk are the following - Creep, Dependency, Gap, Hardware, Software, Integration, Limitation, Late Start, Loss, Motivation, Queuing, Temp Loss, Decision, Information, Parts, Infrastructure, Legal, Project, Deadline, Judgement, and Learning


Below is the mapping for the Parameter -> Category -> Sub category:

  • Scope -> Change -> Creep
  • Scope -> Change -> Dependency
  • Scope -> Change -> Gap

  • Scope -> Defect -> Hardware

  • Scope -> Defect -> Software
  • Scope -> Defect -> Integration

  • Resource -> Money -> Limitation

  • Resource -> People -> Late Start
  • Resource -> People -> Loss
  • Resource -> People -> Motivation
  • Resource -> People -> Queuing
  • Resource -> People -> Temp Loss

  • Schedule -> Delay -> Decision
  • Schedule -> Delay -> Hardware
  • Schedule -> Delay -> Information
  • Schedule -> Delay -> Parts

  • Schedule -> Dependency -> Infrastructure
  • Schedule -> Dependency -> Legal
  • Schedule -> Dependency -> Project

  • Schedule -> Estimates -> Deadline
  • Schedule -> Estimates -> Judgement
  • Schedule -> Estimates -> Learning

Impact: The impact to the project by number of weeks.
TRL: The technology readiness level of the project. You can find more information here.
Description: The description of the issue.
Region Numeric: A numeric value representing the region (0 = Americas, 1 = Asia, 2 = Eur/ME, 3 = Africa).
Region: What region of the world did the issue occur in.
Project: What type of project did the issue occur in. (Prod. Dev., or IT/Solution) Year: The year the issue occurred in.
Cost: The cost, in USD, of the impact on the project.
Month: The month the project failure occurred in.

Now let's get a breakdown of what are the counts of the values in each column

Here we can see that there are three different types of parameters that each project is grouped into that we have: Scope, Resource, and Schedule. Below the chart are the raw counts of the projects in those parameter groups.

In [129]:
plt.figure(figsize = (3,10))
plot_parameter = dataset['Parameter'].value_counts().plot(kind='bar')
plot_parameter.set_xlabel("Parameter")
plot_parameter.set_ylabel("Count (Number of Projects)")
plot_parameter
Out[129]:
<matplotlib.axes._subplots.AxesSubplot at 0x1a1eef67b8>
In [8]:
dataset["Parameter"].value_counts()
Out[8]:
Scope       1991
Resource    1579
Schedule    1430
Name: Parameter, dtype: int64

What is the breakdown of the different Categories that we have? We find the following categories: Change, People, Delay, Defect, Outsourcing, Estimates, Dependency, and Money. Below the chart are the raw counts of the projects in those categories.

In [9]:
plot_category = dataset['Category'].value_counts().plot(kind='bar')
plot_category.set_xlabel("Category")
plot_category.set_ylabel("Count (Number of Projects)")
plot_category
Out[9]:
<matplotlib.axes._subplots.AxesSubplot at 0x1a19c40518>
In [10]:
dataset["Category"].value_counts()
Out[10]:
Change         1459
People          943
Delay           763
Defect          532
Outsourcing     465
Estimates       369
Dependency      298
Money           171
Name: Category, dtype: int64

What is the breakdown of the sub-categories? Here we find the following: Gap, Creep, Late or poor output, Hardware, Parts, Loss, Temp loss, Queueing, Software, Learning, Limitation, Information, Judgement, Decision, Dependency, Infrastructure, Project, Late start, Delayed start, Integration, Deadline, Motivation, Legal, Project, Infrastructure, and Late or poor output. Below the chart are the raw counts of the projects in those sub-categories.

In [11]:
plot_subcat = dataset['Sub cat'].value_counts().plot(kind='bar')
plot_subcat.set_xlabel("Sub-Category")
plot_subcat.set_ylabel("Count (Number of Projects)")
plot_subcat
Out[11]:
<matplotlib.axes._subplots.AxesSubplot at 0x1a19c15048>
In [12]:
dataset["Sub cat"].value_counts()
Out[12]:
Gap                     768
Creep                   576
Late or poor output     380
Hardware                373
Parts                   315
Loss                    298
Temp loss               256
Queuing                 250
Software                249
Learning                173
Limitation              171
Information             161
Judgment                138
Decision                134
Dependency              115
Infrastructure          113
Project                  95
Late start               93
Delayed start            76
Integration              63
Deadline                 58
Motivation               51
Legal                    46
Project                  37
Infrastructure            7
Late or poor output       4
Name: Sub cat, dtype: int64

Here you can see that the Region column has four separate regions and the count of values in those regions is provided below in the table.

In [13]:
dataset["Region"].value_counts()
Out[13]:
Americas    2872
Eur/ME       945
Africa       636
Asia         547
Name: Region, dtype: int64

What years do we have in our dataset? 2001-2019

In [14]:
dataset["Year"].value_counts()
Out[14]:
2010    578
2007    435
2019    364
2015    341
2002    267
2013    235
2003    233
2005    226
2012    225
2006    223
2004    223
2018    221
2001    219
2016    212
2011    205
2009    199
2008    199
2017    198
2014    197
Name: Year, dtype: int64

What is the distribution of the Impact levels in our data?

In [53]:
plot_impact = dataset['Impact'].value_counts().plot(kind='bar')
plot_impact.set_xlabel("Impact Level")
plot_impact.set_ylabel("Count (Number of Projects)")
plot_impact
Out[53]:
<matplotlib.axes._subplots.AxesSubplot at 0x1a19666940>

Analysis

Below we'll start looking at analyzing projects from different perspectives to understand which parameters, categories, sub-categories, TRL levels, regions, project types, and dates produced the highest costs and impacts.

The Column Names and Values that can be inputted below:

The Project Number
  • A number between 0-4999.
Parameter
  • Scope
  • Resource
  • Schedule
Category
  • Change
  • Defect
  • Delay
  • Dependency
  • Estimates
  • Money
  • Outsourcing
  • People
Sub cat
  • Creep
  • Dependency
  • Gap
  • Hardware
  • Software
  • Integration
  • Limitation
  • Late Start
  • Loss
  • Motivation
  • Queuing
  • Temp Loss
  • Decision
  • Information
  • Parts
  • Infrastructure
  • Legal
  • Project
  • Deadline
  • Judgement
  • Learning
Impact
  • Numbers between 1-26 in weeks
TRL
  • Numbers between 1-9
Description
Region Numeric
  • 0: Americas
  • 1: Asia
  • 2: Eur/ME
  • 3: Africa
Region
  • Americas
  • Asia
  • Eur/ME
  • Africa
Project
  • IT/Solution
  • Prod. Dev.
Year
  • Years from 2001-2019
Cost
  • Number between 10787 and 2543242
Month
  • Any month in the calendar year (January - December)

What is the cost distribution in our data? We find that we have a median between "500,000" and "1,000,000" and the cost values are skewed to the left.

In [15]:
boxplot = dataset.boxplot(column=['Cost'], vert = False)
boxplot.set_xlabel("Cost (in USD)")
boxplot
Out[15]:
<matplotlib.axes._subplots.AxesSubplot at 0x1a1985d748>

If we want to understand which projects contribute to the outliers, let's filter the data to projects where the cost was greater than $2,000,000. Let's also sort the data so we can see the projects with the highest cost failures first.

In [77]:
dataset_high_cost = dataset[dataset['Cost']>2000000]
dataset_high_cost = dataset_high_cost.sort_values(['Cost'], ascending = False)
dataset_high_cost
Out[77]:
Project Number Parameter Category Sub cat Impact TRL Description Region Numeric Region Project Year Month Cost
3842 3842 Resource People Late start 26 3 Change Management Lead took unexpected medical... 1 Asia IT/Solution 2003 November 2562560
4940 4940 Scope Change Gap 26 8 Chip failed and had to be refabricated with ch... 2 Eur/ME IT/Solution 2018 January 2548442
3705 3705 Resource Outsourcing Delayed start 26 2 Project had to wait until all data collection ... 2 Eur/ME Prod. Dev. 2001 October 2543034
4956 4956 Scope Change Gap 26 6 The scope of the project was poorly defined 0 Americas IT/Solution 2010 September 2535416
4938 4938 Scope Change Gap 26 3 Ship sinks with critical part; forced to repla... 0 Americas Prod. Dev. 2010 February 2533232
4077 4077 Schedule Dependency Project 26 5 Client's insisted on the use of "the latest te... 0 Americas IT/Solution 2010 May 2526342
4991 4991 Resource People Loss 26 2 To expand business, unrealistic commitments we... 3 Africa Prod. Dev. 2004 May 2504268
4788 4788 Resource People Loss 25 2 Contract lab failed to inform us that they had... 3 Africa Prod. Dev. 2002 December 2469325
4549 4549 Resource Money Limitation 26 4 "Minor" changes added and accepted late in pr... 0 Americas IT/Solution 2002 January 2437292
4552 4552 Resource Money Limitation 26 7 Server technology chage was necessitated to su... 2 Eur/ME Prod. Dev. 2015 December 2425748
4133 4133 Resource Money Limitation 24 6 Shipping documents not filled in correctly 0 Americas IT/Solution 2006 March 2383512
4116 4116 Scope Change Gap 25 6 Key contributor out ill at critical time. 0 Americas IT/Solution 2019 December 2371375
3580 3580 Schedule Delay Parts 24 2 Team leader reassigned to another project and ... 0 Americas IT/Solution 2014 November 2366784
4925 4925 Scope Change Creep 25 2 Outsourcing partner in Switzerland did unneede... 0 Americas Prod. Dev. 2010 November 2360300
3394 3394 Schedule Dependency Infrastructure 26 6 Need to upgrade detected only at project end 2 Eur/ME Prod. Dev. 2003 March 2357602
4251 4251 Resource Outsourcing Late or poor output 26 2 Late in project, the budget and staff are cut ... 0 Americas Prod. Dev. 2017 March 2350920
4843 4843 Resource Outsourcing Late start 25 4 Internal supplier problem; needed components d... 3 Africa Prod. Dev. 2013 June 2326750
3730 3730 Resource People Queuing 24 6 Planning delayed due to staff being still tied... 0 Americas Prod. Dev. 2015 August 2324688
4598 4598 Resource Outsourcing Late or poor output 25 6 Product owner having unique knowledge left the... 0 Americas IT/Solution 2010 February 2316300
4797 4797 Scope Change Creep 24 7 Late test results came back negative 3 Africa IT/Solution 2011 January 2309472
3523 3523 Scope Change Gap 24 8 Several employees had to take a leave of absen... 1 Asia Prod. Dev. 2011 August 2301120
4933 4933 Scope Change Gap 25 1 Negotiation and training on a new tool took mo... 0 Americas Prod. Dev. 2015 September 2292350
3740 3740 Resource People Temp loss 23 5 Development plans failed to include all of the... 0 Americas IT/Solution 2016 January 2286108
4641 4641 Resource People Temp loss 24 5 Delay in hiring more engineers due to cost cut... 3 Africa IT/Solution 2010 June 2286048
4753 4753 Scope Change Gap 25 3 Components do not work as documented. Replace... 0 Americas IT/Solution 2015 September 2285325
4393 4393 Resource People Loss 25 3 Software modules in system do not work togethe... 0 Americas Prod. Dev. 2008 April 2276500
4545 4545 Scope Change Gap 23 1 Escalated late changes from Sponsor 0 Americas IT/Solution 2008 January 2276126
4595 4595 Resource Outsourcing Delayed start 25 5 Market change due to Improvements in technolog... 0 Americas IT/Solution 2015 May 2265925
4725 4725 Scope Change Creep 23 9 User specifications not defined until late in ... 0 Americas IT/Solution 2010 July 2259957
4875 4875 Schedule Delay Hardware 25 9 Issues discovered late in testing 0 Americas Prod. Dev. 2015 May 2257500
... ... ... ... ... ... ... ... ... ... ... ... ... ...
4905 4905 Schedule Estimates Judgment 25 6 Lack of testing of older WWW browser versions ... 0 Americas Prod. Dev. 2015 November 2161575
4864 4864 Resource People Queuing 23 1 Lack of understanding by the manager of the sy... 2 Eur/ME Prod. Dev. 2007 April 2157676
4083 4083 Schedule Estimates Learning 22 5 Project delayed due to recent/last minute chan... 1 Asia Prod. Dev. 2014 March 2150148
4599 4599 Resource Outsourcing Late or poor output 25 1 Poor communications resulted in never truly un... 0 Americas IT/Solution 2010 March 2149800
4747 4747 Scope Change Gap 26 3 Product developed for multiple platforms, but ... 0 Americas Prod. Dev. 2010 July 2127450
4883 4883 Schedule Delay Parts 24 4 Apartments were sold before construction begin... 2 Eur/ME IT/Solution 2007 December 2124000
4160 4160 Schedule Delay Parts 24 7 Cable required had been left off the implement... 2 Eur/ME IT/Solution 2006 April 2106312
4000 4000 Resource People Loss 26 9 Client requested "15 minute" change accepted a... 1 Asia Prod. Dev. 2003 November 2098902
4779 4779 Scope Defect Software 23 3 Too few disc drives, and insufficient "owned" ... 0 Americas Prod. Dev. 2015 August 2095208
3518 3518 Schedule Delay Decision 24 2 Poor Fit/Gap analysis 1 Asia Prod. Dev. 2006 November 2094336
4437 4437 Resource Money Limitation 26 5 None of the developers had experience with the... 2 Eur/ME Prod. Dev. 2009 July 2093078
4316 4316 Scope Change Gap 23 9 Lack of spare parts and expertise 0 Americas IT/Solution 2011 August 2092333
3726 3726 Resource People Loss 24 9 Project start delayed; staffing and buy-in lat... 1 Asia IT/Solution 2011 June 2090328
4947 4947 Scope Change Gap 23 4 Inexperienced staff 0 Americas Prod. Dev. 2015 April 2088676
4986 4986 Scope Defect Software 25 6 Initial code release was a major failure 0 Americas IT/Solution 2010 July 2085275
3831 3831 Schedule Delay Hardware 22 4 After more than 2/3 of the original budget had... 2 Eur/ME Prod. Dev. 2003 March 2085204
4835 4835 Resource Outsourcing Late or poor output 24 7 Parallel tasks require resources beyond what i... 2 Eur/ME IT/Solution 2001 July 2082432
4682 4682 Schedule Delay Parts 26 3 Supplier was unable to make a sample correctly 2 Eur/ME IT/Solution 2004 January 2081560
4183 4183 Scope Change Gap 25 9 Scope not defined well; huge software delays 0 Americas IT/Solution 2003 July 2075050
3345 3345 Resource People Temp loss 24 8 Scope expanded during the course of the project 1 Asia IT/Solution 2017 December 2055624
4318 4318 Resource People Temp loss 22 2 Move out of old location required, but new spa... 2 Eur/ME IT/Solution 2017 November 2052248
3557 3557 Resource People Queuing 24 3 Inadequate procedures for acceptance, quality,... 1 Asia Prod. Dev. 2015 May 2052144
3857 3857 Resource People Loss 25 7 Team was staffed by 3 successive new hires in ... 2 Eur/ME IT/Solution 2010 March 2050425
3556 3556 Resource People Queuing 25 1 A pool of unqualified, poorly performing, marg... 1 Asia Prod. Dev. 2013 September 2041850
3898 3898 Resource Outsourcing Late or poor output 21 5 Errors found in data and critrical reports 1 Asia IT/Solution 2011 March 2040171
3878 3878 Resource Outsourcing Late or poor output 22 8 Late stakeholders redefine requiremnts 2 Eur/ME IT/Solution 2017 April 2035968
3235 3235 Scope Defect Hardware 21 9 Conversion from legacy system to SAP has multi... 2 Eur/ME IT/Solution 2017 April 2033829
3577 3577 Schedule Delay Information 22 3 Project team was not aware that the new releas... 2 Eur/ME IT/Solution 2016 January 2023054
4453 4453 Scope Change Creep 24 4 Unexpectedly high demand requires outlisde help 0 Americas Prod. Dev. 2010 April 2018520
3329 3329 Resource People Motivation 24 8 Lack of head count from a 10% company-wide lay... 0 Americas IT/Solution 2003 March 2002488

72 rows × 13 columns

We find 72 rows where the cost value is greater than $2,000,000. Let's plot the parameters, categories, and sub-categories off these project failures to get a better sense of what types of projects comprise these high cost failures.

In [59]:
plot_high_cost = dataset_high_cost['Parameter'].value_counts().plot(kind='bar')
plt.xlabel('Parameter')
plt.ylabel('Count (of Project Failures)')
plt.title('The Count of High Cost Project Failures by Parameter')
Out[59]:
Text(0.5,1,'The Count of High Cost Project Failures by Parameter')
In [60]:
plot_high_cost_by_category = dataset_high_cost['Category'].value_counts().plot(kind='bar')
plt.xlabel('Category')
plt.ylabel('Count (of Project Failures)')
plt.title('The Count of High Cost Project Failures by Category')
Out[60]:
Text(0.5,1,'The Count of High Cost Project Failures by Category')
In [61]:
plot_high_cost_by_subcat = dataset_high_cost['Sub cat'].value_counts().plot(kind='bar')
plt.xlabel('Sub-Category')
plt.ylabel('Count (of Project Failures)')
plt.title('The Count of High Cost Project Failures by Sub-Category')
Out[61]:
Text(0.5,1,'The Count of High Cost Project Failures by Sub-Category')

Let's plot these high cost project failures to see which one has the most.

In [62]:
plot_high_cost_by_region = dataset_high_cost['Region'].value_counts().plot(kind='bar')
plt.xlabel('Region')
plt.ylabel('Count (of Project Failures)')
plt.title('The Count of High Cost Project Failures by Region')
Out[62]:
Text(0.5,1,'The Count of High Cost Project Failures by Region')

Let's say we want to look at the data for the projects in the Americas region - we can run the following code. Let's also sort the data so we can see the projects with the highest cost failures first. If you want to see the data for a difference region then change 'Americas' to a different region (copy and paste the region name from above). For example you can copy and paste the line below to see the data for just the Europe/Middle East region:

Copy and paste the code below:
region_filter = dataset.loc[dataset['Region'] == 'Eur/ME']
region_filter = region_filter.sort_values(['Cost'], ascending = False)
region_filter.head(50)

In [78]:
region_filter = dataset.loc[dataset['Region'] == 'Americas']
region_filter = region_filter.sort_values(['Cost'], ascending = False)
region_filter.head(50) #Let's look at the first 50 values
Out[78]:
Project Number Parameter Category Sub cat Impact TRL Description Region Numeric Region Project Year Month Cost
4956 4956 Scope Change Gap 26 6 The scope of the project was poorly defined 0 Americas IT/Solution 2010 September 2535416
4938 4938 Scope Change Gap 26 3 Ship sinks with critical part; forced to repla... 0 Americas Prod. Dev. 2010 February 2533232
4077 4077 Schedule Dependency Project 26 5 Client's insisted on the use of "the latest te... 0 Americas IT/Solution 2010 May 2526342
4549 4549 Resource Money Limitation 26 4 "Minor" changes added and accepted late in pr... 0 Americas IT/Solution 2002 January 2437292
4133 4133 Resource Money Limitation 24 6 Shipping documents not filled in correctly 0 Americas IT/Solution 2006 March 2383512
4116 4116 Scope Change Gap 25 6 Key contributor out ill at critical time. 0 Americas IT/Solution 2019 December 2371375
3580 3580 Schedule Delay Parts 24 2 Team leader reassigned to another project and ... 0 Americas IT/Solution 2014 November 2366784
4925 4925 Scope Change Creep 25 2 Outsourcing partner in Switzerland did unneede... 0 Americas Prod. Dev. 2010 November 2360300
4251 4251 Resource Outsourcing Late or poor output 26 2 Late in project, the budget and staff are cut ... 0 Americas Prod. Dev. 2017 March 2350920
3730 3730 Resource People Queuing 24 6 Planning delayed due to staff being still tied... 0 Americas Prod. Dev. 2015 August 2324688
4598 4598 Resource Outsourcing Late or poor output 25 6 Product owner having unique knowledge left the... 0 Americas IT/Solution 2010 February 2316300
4933 4933 Scope Change Gap 25 1 Negotiation and training on a new tool took mo... 0 Americas Prod. Dev. 2015 September 2292350
3740 3740 Resource People Temp loss 23 5 Development plans failed to include all of the... 0 Americas IT/Solution 2016 January 2286108
4753 4753 Scope Change Gap 25 3 Components do not work as documented. Replace... 0 Americas IT/Solution 2015 September 2285325
4393 4393 Resource People Loss 25 3 Software modules in system do not work togethe... 0 Americas Prod. Dev. 2008 April 2276500
4545 4545 Scope Change Gap 23 1 Escalated late changes from Sponsor 0 Americas IT/Solution 2008 January 2276126
4595 4595 Resource Outsourcing Delayed start 25 5 Market change due to Improvements in technolog... 0 Americas IT/Solution 2015 May 2265925
4725 4725 Scope Change Creep 23 9 User specifications not defined until late in ... 0 Americas IT/Solution 2010 July 2259957
4875 4875 Schedule Delay Hardware 25 9 Issues discovered late in testing 0 Americas Prod. Dev. 2015 May 2257500
3337 3337 Resource People Queuing 25 6 Building permit was issued late due to roof da... 0 Americas IT/Solution 2002 June 2227875
4970 4970 Scope Change Gap 23 2 Defects discovered in late testing 0 Americas IT/Solution 2010 February 2227527
3469 3469 Scope Defect Integration 26 3 Program Manager squeezes in additional functio... 0 Americas IT/Solution 2014 January 2226432
3882 3882 Schedule Delay Information 25 5 Foundation was inadaquate and needed upgrading 0 Americas IT/Solution 2003 February 2200725
4423 4423 Scope Change Gap 22 7 Partner software is incompatible with current ... 0 Americas Prod. Dev. 2015 November 2180992
4842 4842 Resource Outsourcing Late or poor output 26 8 The Project Lead did not hold the customer/bus... 0 Americas IT/Solution 2015 June 2164214
4905 4905 Schedule Estimates Judgment 25 6 Lack of testing of older WWW browser versions ... 0 Americas Prod. Dev. 2015 November 2161575
4599 4599 Resource Outsourcing Late or poor output 25 1 Poor communications resulted in never truly un... 0 Americas IT/Solution 2010 March 2149800
4747 4747 Scope Change Gap 26 3 Product developed for multiple platforms, but ... 0 Americas Prod. Dev. 2010 July 2127450
4779 4779 Scope Defect Software 23 3 Too few disc drives, and insufficient "owned" ... 0 Americas Prod. Dev. 2015 August 2095208
4316 4316 Scope Change Gap 23 9 Lack of spare parts and expertise 0 Americas IT/Solution 2011 August 2092333
4947 4947 Scope Change Gap 23 4 Inexperienced staff 0 Americas Prod. Dev. 2015 April 2088676
4986 4986 Scope Defect Software 25 6 Initial code release was a major failure 0 Americas IT/Solution 2010 July 2085275
4183 4183 Scope Change Gap 25 9 Scope not defined well; huge software delays 0 Americas IT/Solution 2003 July 2075050
4453 4453 Scope Change Creep 24 4 Unexpectedly high demand requires outlisde help 0 Americas Prod. Dev. 2010 April 2018520
3329 3329 Resource People Motivation 24 8 Lack of head count from a 10% company-wide lay... 0 Americas IT/Solution 2003 March 2002488
4716 4716 Schedule Estimates Judgment 21 4 Changing product definition and poor estimatio... 0 Americas IT/Solution 2015 October 1999221
4256 4256 Resource Outsourcing Late or poor output 23 9 Resources thin and tasks queued in backlogs. 0 Americas Prod. Dev. 2002 July 1994169
4035 4035 Scope Change Gap 22 6 The project team agreed to new requirements; u... 0 Americas IT/Solution 2015 September 1987920
4400 4400 Schedule Dependency Project 24 2 Controlled Document rejected by Approvers 0 Americas IT/Solution 2010 September 1985688
3734 3734 Resource People Temp loss 22 4 Programmers were volunteers, and ultimately al... 0 Americas Prod. Dev. 2015 July 1975996
3615 3615 Scope Change Creep 26 9 Developer added a new routine that was not ful... 0 Americas Prod. Dev. 2008 October 1974674
4660 4660 Schedule Delay Decision 20 1 Key resource tied to a previous project for tw... 0 Americas IT/Solution 2015 March 1973680
4505 4505 Schedule Delay Parts 25 3 Had only 1/3 of the units needed on time 0 Americas IT/Solution 2015 October 1972475
3838 3838 Scope Change Creep 26 7 New application works, but data collected is n... 0 Americas IT/Solution 2005 December 1961206
4650 4650 Resource People Temp loss 21 9 Project team members reassigned. Not replaced... 0 Americas IT/Solution 2010 May 1955856
4994 4994 Schedule Delay Parts 24 7 Freelancer hired was a nutjob, and lacked t... 0 Americas IT/Solution 2015 September 1952040
4009 4009 Resource Outsourcing Delayed start 25 1 Upgrade to software required. Planning and tr... 0 Americas Prod. Dev. 2016 April 1933950
3560 3560 Resource People Queuing 26 2 Six weeks in, a redesign to the most fundament... 0 Americas IT/Solution 2011 July 1928602
4418 4418 Scope Change Gap 23 1 Contributor couldn't work due to injury and ba... 0 Americas IT/Solution 2010 May 1926181
4500 4500 Schedule Delay Hardware 25 6 Needed system delivered to wrong building; los... 0 Americas IT/Solution 2015 September 1923300

What if we want to see the data for projects in the Americas that had a category of money? Let's also sort the data so we can see the projects with the highest cost failures first. See below for the code to do so, you can change the category that you would like by copying and pasting the category you want from above into the code below.

For example, if I want to look at the project in the Americas where the category is People then I could do:
category_filter = category_filter.sort_values(['Cost'], ascending = False)
dataset.loc[(dataset['Category'] == 'People') & (dataset['Region'] == 'Americas')]

In [73]:
category_filter = dataset.loc[(dataset['Category'] == 'Money') & (dataset['Region'] == 'Americas')]
category_filter = category_filter.sort_values(['Cost'], ascending = False)
category_filter.head(50) #Let's look at the first 50 rows where the region is the Americas and the category is Money
Out[73]:
Project Number Parameter Category Sub cat Impact TRL Description Region Numeric Region Project Year Month Cost
4549 4549 Resource Money Limitation 26 4 "Minor" changes added and accepted late in pr... 0 Americas IT/Solution 2002 January 2437292
4133 4133 Resource Money Limitation 24 6 Shipping documents not filled in correctly 0 Americas IT/Solution 2006 March 2383512
4082 4082 Resource Money Limitation 24 1 Components that could have been purchased were... 0 Americas IT/Solution 2017 December 1864824
4434 4434 Resource Money Limitation 26 6 Key analyst is scheduled to work on another pr... 0 Americas Prod. Dev. 2015 November 1809678
4546 4546 Resource Money Limitation 22 5 Project schedule was revised user interface re... 0 Americas Prod. Dev. 2010 July 1724844
4435 4435 Resource Money Limitation 19 6 Architect who knew how to integrate all compon... 0 Americas Prod. Dev. 2010 February 1678745
1077 1077 Resource Money Limitation 21 6 Sponsor fails to provide money 0 Americas Prod. Dev. 2007 November 1646505
1132 1132 Resource Money Limitation 22 3 IT department was underfunded and had insuffie... 0 Americas IT/Solution 2019 January 1594516
4072 4072 Resource Money Limitation 26 9 Delay in getting material when a fermentation ... 0 Americas IT/Solution 2014 February 1552278
1480 1480 Resource Money Limitation 21 6 Internal resources were not available on a ful... 0 Americas IT/Solution 2007 December 1486023
1138 1138 Resource Money Limitation 21 5 Staffing was not sufficient to complete all as... 0 Americas Prod. Dev. 2019 July 1464645
4369 4369 Resource Money Limitation 24 5 Online badge printing requirement completely m... 0 Americas Prod. Dev. 2001 June 1388904
4473 4473 Resource Money Limitation 16 8 The key person responsible for major marketing... 0 Americas IT/Solution 2015 December 1346976
1100 1100 Resource Money Limitation 20 6 Development team had insufficient resources as... 0 Americas Prod. Dev. 2017 August 1342380
1709 1709 Resource Money Limitation 17 6 Chronically understaffed business environment 0 Americas Prod. Dev. 2016 September 1339753
4292 4292 Resource Money Limitation 16 7 New equipment critical to the success of the p... 0 Americas IT/Solution 2010 December 1262176
2049 2049 Resource Money Limitation 20 4 Contractor setup delayed by a week 0 Americas Prod. Dev. 2019 August 1229220
4553 4553 Resource Money Limitation 23 6 Corporate merger delayed project 0 Americas IT/Solution 2006 February 1213710
531 531 Resource Money Limitation 15 6 Lack of funds for travel at launch 0 Americas Prod. Dev. 2014 September 1160100
4387 4387 Resource Money Limitation 14 8 fittings and assemblies were not up to project... 0 Americas Prod. Dev. 2008 July 1135526
4300 4300 Resource Money Limitation 15 2 Errors in input data had to be fixed and repor... 0 Americas IT/Solution 2007 August 1133325
937 937 Resource Money Limitation 15 5 Without funding, all work ceased, contracts we... 0 Americas Prod. Dev. 2001 March 1124565
1078 1078 Resource Money Limitation 22 4 Project needs $150K per month in supplies but ... 0 Americas Prod. Dev. 2007 August 1105742
4134 4134 Resource Money Limitation 17 7 Dependency on other project not discovered unt... 0 Americas IT/Solution 2013 August 1035181
1521 1521 Resource Money Limitation 15 4 Due to a layoff no writer was immediately avai... 0 Americas Prod. Dev. 2009 January 1028115
1963 1963 Resource Money Limitation 22 5 Did not have sufficient resources and the co... 0 Americas IT/Solution 2019 January 1005026
2169 2169 Resource Money Limitation 22 4 Lack of money for sufficient equipment 0 Americas Prod. Dev. 2019 November 981112
3293 3293 Resource Money Limitation 15 4 Project has funding, but limited staff delays ... 0 Americas IT/Solution 2005 April 957585
1328 1328 Resource Money Limitation 20 6 Insufficient resources assigned to the project 0 Americas Prod. Dev. 2007 September 930900
813 813 Resource Money Limitation 18 6 lack of staff availability due to company down... 0 Americas IT/Solution 2007 February 917766
4445 4445 Resource Money Limitation 17 2 Patch to software required for project work; e... 0 Americas IT/Solution 2002 May 885836
2014 2014 Resource Money Limitation 13 6 Major infrastructure project cut back; delayed... 0 Americas IT/Solution 2007 July 883350
1400 1400 Resource Money Limitation 16 3 The travel budget for the project was cut and ... 0 Americas Prod. Dev. 2003 March 849680
76 76 Resource Money Limitation 15 6 Project budget was dramatically cut 0 Americas IT/Solution 2009 May 832665
2769 2769 Resource Money Limitation 20 4 Delays due to bidding issues and protests 0 Americas IT/Solution 2010 August 822300
4361 4361 Resource Money Limitation 15 2 Details of requirements were submitted incorre... 0 Americas Prod. Dev. 2002 June 820110
2405 2405 Resource Money Limitation 16 5 Unexpectedly high demand requires outlisde help 0 Americas IT/Solution 2013 December 809664
2048 2048 Resource Money Limitation 22 7 Did not have sufficient resources and the cont... 0 Americas IT/Solution 2019 April 804584
1329 1329 Resource Money Limitation 11 7 Project Budget was restricted to the bare mini... 0 Americas Prod. Dev. 2009 June 785147
356 356 Resource Money Limitation 13 5 Development team had insufficient resources as... 0 Americas Prod. Dev. 2005 September 779298
355 355 Resource Money Limitation 13 5 Part of the project team ran of money three mo... 0 Americas Prod. Dev. 2009 July 777777
4004 4004 Resource Money Limitation 10 8 Contract had no penalties for missing deadline... 0 Americas IT/Solution 2005 August 759950
1876 1876 Resource Money Limitation 15 6 Layoffs in mid-project 0 Americas IT/Solution 2013 March 758550
102 102 Resource Money Limitation 11 6 Project had limited funding and limited staffing 0 Americas IT/Solution 2012 April 717541
73 73 Resource Money Limitation 10 5 Failed to allocate enough resources and missed... 0 Americas IT/Solution 2002 May 708760
1453 1453 Resource Money Limitation 12 4 Lack of money for sufficient equipment 0 Americas Prod. Dev. 2007 August 701112
0 0 Resource Money Limitation 18 5 Did not have sufficient resources and the cont... 0 Americas IT/Solution 2010 October 700560
2854 2854 Resource Money Limitation 20 4 Unexpectedly high demand requires outlisde help 0 Americas IT/Solution 2005 July 692300
719 719 Resource Money Limitation 11 4 Client slashed the original budget 0 Americas IT/Solution 2014 December 673464
1405 1405 Resource Money Limitation 15 4 Key consultant was let go at a critical time f... 0 Americas IT/Solution 2014 February 664230

How can we visualize this filtered data to understand how TRL level effects the Impact value in projects in the Americas, in the Money Category?

In [79]:
g = category_filter.groupby('TRL', as_index = False)['Impact'].mean()
g['Impact'] = g['Impact'].round(2)
g
Out[79]:
TRL Impact
0 1 14.75
1 2 13.50
2 3 12.45
3 4 14.17
4 5 14.71
5 6 15.54
6 7 11.78
7 8 10.75
8 9 22.00
In [128]:
plt.figure(figsize = (3,10))
h = plt.bar(g['TRL'],g['Impact'])
plt.xlabel('TRL')
plt.ylabel('Impact (in Weeks)')
plt.title('Average Impact by TRL for Projects in the Americas with a Failure in the Money Category')
h
Out[128]:
<BarContainer object of 9 artists>

Now that we can we graph this data easily and analyze the data for our understanding? Let's start with understanding how the average cost per failure varies by region.

In [125]:
y = dataset.groupby('Region', as_index = False)['Cost'].mean()
y['Cost'] = y['Cost'].round(2)
y
Out[125]:
Region Cost
0 Africa 744616.68
1 Americas 752808.41
2 Asia 724886.88
3 Eur/ME 778307.72
In [126]:
plt.figure(figsize = (3,10))
a = plt.bar(y['Region'],y['Cost'])
plt.xlabel('Region')
plt.ylabel('Average Cost (in USD)')
plt.title('Average Cost (in USD) per Region')
a
Out[126]:
<BarContainer object of 4 artists>

What is the Average Cost Per Year for all the Projects?

In [23]:
z = dataset.groupby('Year', as_index = False)['Cost'].mean()
z['Cost'] = z['Cost'].round(2)
z.sort_values(by = ['Year'])
z
Out[23]:
Year Cost
0 2001 766136.24
1 2002 775338.76
2 2003 751868.87
3 2004 760491.31
4 2005 714075.49
5 2006 715143.45
6 2007 711551.25
7 2008 729581.87
8 2009 782631.19
9 2010 801956.27
10 2011 759503.60
11 2012 705645.49
12 2013 701452.11
13 2014 733437.70
14 2015 875305.61
15 2016 729610.30
16 2017 770409.96
17 2018 764912.75
18 2019 699717.54
In [39]:
b = plt.plot(z['Year'],z['Cost'])
plt.xlabel('Year')
plt.ylabel('Average Cost (in USD)')
plt.title('Average Cost (in USD) over Time for Projects in all the Regions')
plt.xticks(np.arange(2001, 2020, 2)) 
b
Out[39]:
[<matplotlib.lines.Line2D at 0x1a1a64beb8>]
In [40]:
y = region_filter.groupby('Year', as_index = False)['Cost'].mean()
y['Cost'] = y['Cost'].round(2)
y = y.sort_values(by = ['Year'])
y
Out[40]:
Year Cost
0 2001 753027.31
1 2002 759695.25
2 2003 730311.18
3 2004 734516.43
4 2005 694079.68
5 2006 704339.55
6 2007 697654.34
7 2008 724117.90
8 2009 800630.01
9 2010 841943.56
10 2011 768046.24
11 2012 743514.16
12 2013 646923.40
13 2014 752774.03
14 2015 901210.34
15 2016 682162.81
16 2017 802036.48
17 2018 741020.30
18 2019 666180.41

Now let's look at the Average Cost Per Year for Projects in the Americas.

In [26]:
a = plt.plot(y['Year'],y['Cost'])
plt.xlabel('Year')
plt.ylabel('Average Cost (in USD)')
plt.title('Average Cost (in USD) over Time for Projects in the Americas')
plt.xticks(np.arange(2001, 2020, 2)) 
a
Out[26]:
[<matplotlib.lines.Line2D at 0x1a1a2bcf98>]

Let's take this one step further - what if we look at the average cost for projects in all the regions but the Americas? First we'll filter the data to only include projects in the Asia, Eur/ME, and Africa regions. Let's also sort the data so we can see the projects with the highest cost failures first.

In [81]:
region_filter_no_americas = dataset.loc[dataset['Region'] != 'Americas']
region_filter_no_americas = region_filter_no_americas.sort_values(['Cost'], ascending = False)
region_filter_no_americas.head(50) #Let's look at the first 50 values
Out[81]:
Project Number Parameter Category Sub cat Impact TRL Description Region Numeric Region Project Year Month Cost
3842 3842 Resource People Late start 26 3 Change Management Lead took unexpected medical... 1 Asia IT/Solution 2003 November 2562560
4940 4940 Scope Change Gap 26 8 Chip failed and had to be refabricated with ch... 2 Eur/ME IT/Solution 2018 January 2548442
3705 3705 Resource Outsourcing Delayed start 26 2 Project had to wait until all data collection ... 2 Eur/ME Prod. Dev. 2001 October 2543034
4991 4991 Resource People Loss 26 2 To expand business, unrealistic commitments we... 3 Africa Prod. Dev. 2004 May 2504268
4788 4788 Resource People Loss 25 2 Contract lab failed to inform us that they had... 3 Africa Prod. Dev. 2002 December 2469325
4552 4552 Resource Money Limitation 26 7 Server technology chage was necessitated to su... 2 Eur/ME Prod. Dev. 2015 December 2425748
3394 3394 Schedule Dependency Infrastructure 26 6 Need to upgrade detected only at project end 2 Eur/ME Prod. Dev. 2003 March 2357602
4843 4843 Resource Outsourcing Late start 25 4 Internal supplier problem; needed components d... 3 Africa Prod. Dev. 2013 June 2326750
4797 4797 Scope Change Creep 24 7 Late test results came back negative 3 Africa IT/Solution 2011 January 2309472
3523 3523 Scope Change Gap 24 8 Several employees had to take a leave of absen... 1 Asia Prod. Dev. 2011 August 2301120
4641 4641 Resource People Temp loss 24 5 Delay in hiring more engineers due to cost cut... 3 Africa IT/Solution 2010 June 2286048
4869 4869 Resource People Temp loss 25 5 Some of the cross functional teams were not in... 3 Africa IT/Solution 2010 October 2242050
4756 4756 Scope Change Gap 25 2 Insufficient length of anchor bolts protruding... 2 Eur/ME IT/Solution 2018 August 2237275
4568 4568 Resource People Queuing 25 5 Insufficient initial project scope was defined... 2 Eur/ME IT/Solution 2005 April 2218325
4675 4675 Schedule Delay Information 25 4 Key parts would be delivered late 2 Eur/ME IT/Solution 2018 September 2206475
3617 3617 Scope Change Creep 25 1 New database wouldn’t work with the proposed p... 2 Eur/ME IT/Solution 2015 May 2204675
3897 3897 Scope Change Creep 26 9 Multiple projects depend on lone expert scientist 1 Asia Prod. Dev. 2006 August 2194530
4864 4864 Resource People Queuing 23 1 Lack of understanding by the manager of the sy... 2 Eur/ME Prod. Dev. 2007 April 2157676
4083 4083 Schedule Estimates Learning 22 5 Project delayed due to recent/last minute chan... 1 Asia Prod. Dev. 2014 March 2150148
4883 4883 Schedule Delay Parts 24 4 Apartments were sold before construction begin... 2 Eur/ME IT/Solution 2007 December 2124000
4160 4160 Schedule Delay Parts 24 7 Cable required had been left off the implement... 2 Eur/ME IT/Solution 2006 April 2106312
4000 4000 Resource People Loss 26 9 Client requested "15 minute" change accepted a... 1 Asia Prod. Dev. 2003 November 2098902
3518 3518 Schedule Delay Decision 24 2 Poor Fit/Gap analysis 1 Asia Prod. Dev. 2006 November 2094336
4437 4437 Resource Money Limitation 26 5 None of the developers had experience with the... 2 Eur/ME Prod. Dev. 2009 July 2093078
3726 3726 Resource People Loss 24 9 Project start delayed; staffing and buy-in lat... 1 Asia IT/Solution 2011 June 2090328
3831 3831 Schedule Delay Hardware 22 4 After more than 2/3 of the original budget had... 2 Eur/ME Prod. Dev. 2003 March 2085204
4835 4835 Resource Outsourcing Late or poor output 24 7 Parallel tasks require resources beyond what i... 2 Eur/ME IT/Solution 2001 July 2082432
4682 4682 Schedule Delay Parts 26 3 Supplier was unable to make a sample correctly 2 Eur/ME IT/Solution 2004 January 2081560
3345 3345 Resource People Temp loss 24 8 Scope expanded during the course of the project 1 Asia IT/Solution 2017 December 2055624
4318 4318 Resource People Temp loss 22 2 Move out of old location required, but new spa... 2 Eur/ME IT/Solution 2017 November 2052248
3557 3557 Resource People Queuing 24 3 Inadequate procedures for acceptance, quality,... 1 Asia Prod. Dev. 2015 May 2052144
3857 3857 Resource People Loss 25 7 Team was staffed by 3 successive new hires in ... 2 Eur/ME IT/Solution 2010 March 2050425
3556 3556 Resource People Queuing 25 1 A pool of unqualified, poorly performing, marg... 1 Asia Prod. Dev. 2013 September 2041850
3898 3898 Resource Outsourcing Late or poor output 21 5 Errors found in data and critrical reports 1 Asia IT/Solution 2011 March 2040171
3878 3878 Resource Outsourcing Late or poor output 22 8 Late stakeholders redefine requiremnts 2 Eur/ME IT/Solution 2017 April 2035968
3235 3235 Scope Defect Hardware 21 9 Conversion from legacy system to SAP has multi... 2 Eur/ME IT/Solution 2017 April 2033829
3577 3577 Schedule Delay Information 22 3 Project team was not aware that the new releas... 2 Eur/ME IT/Solution 2016 January 2023054
4816 4816 Scope Change Creep 20 3 Inadequate staffing 2 Eur/ME Prod. Dev. 2018 March 1997540
4517 4517 Scope Change Creep 20 9 Needed wiring not available on time 2 Eur/ME Prod. Dev. 2002 February 1992660
4449 4449 Resource People Queuing 22 3 Work at customer site had to be done by union ... 2 Eur/ME Prod. Dev. 2002 June 1991176
4637 4637 Resource People Queuing 21 9 Contract lab failed to inform us that they had... 2 Eur/ME IT/Solution 2015 April 1977528
4815 4815 Scope Change Creep 20 5 Client slashed the original budget 3 Africa Prod. Dev. 2018 February 1974940
3800 3800 Schedule Delay Parts 24 4 Did not have the correct people validate requi... 1 Asia IT/Solution 2004 February 1971720
3191 3191 Resource People Temp loss 21 9 Redesign required late in printer project to m... 2 Eur/ME Prod. Dev. 2003 October 1971123
4477 4477 Resource Outsourcing Late or poor output 22 7 All students did not complete training as sche... 2 Eur/ME IT/Solution 2019 October 1968626
4669 4669 Schedule Delay Hardware 23 9 Decisions are being delayed without apparent r... 3 Africa Prod. Dev. 2013 June 1964062
4326 4326 Resource People Queuing 25 6 Approvals held up at project completion to wai... 2 Eur/ME IT/Solution 2003 June 1960725
3574 3574 Schedule Delay Hardware 26 7 In creating a new system, there was too little... 1 Asia Prod. Dev. 2002 August 1952574
4405 4405 Scope Change Creep 24 5 Delay in patient recruitment for clinical trials 2 Eur/ME IT/Solution 2008 January 1948032
3673 3673 Scope Defect Hardware 22 4 No understanding of the scope and not enough s... 1 Asia Prod. Dev. 2001 March 1934790
In [42]:
c = region_filter_no_americas.groupby('Year', as_index = False)['Cost'].mean()
c['Cost'] = c['Cost'].round(2)
c = c.sort_values(by = ['Year'])
c
Out[42]:
Year Cost
0 2001 779125.99
1 2002 795394.55
2 2003 774372.07
3 2004 784885.11
4 2005 733375.79
5 2006 727068.51
6 2007 744882.14
7 2008 734674.51
8 2009 757989.94
9 2010 730288.23
10 2011 749011.00
11 2012 669423.28
12 2013 756446.87
13 2014 713904.06
14 2015 806226.34
15 2016 789172.05
16 2017 742397.90
17 2018 785768.02
18 2019 792031.09
In [43]:
l = plt.plot(c['Year'],c['Cost'])
plt.xlabel('Year')
plt.ylabel('Average Cost (in USD)')
plt.title('Average Cost (in USD) over Time for Projects not in the Americas')
plt.xticks(np.arange(2001, 2020, 2)) 
l
Out[43]:
[<matplotlib.lines.Line2D at 0x1a1a6b6eb8>]

Now that we've looked at the cost breakdown over time per region - let's look at the difference in the average Impact (in weeks) per region.

In [30]:
d = dataset.groupby('Region', as_index = False)['Impact'].mean()
d['Impact'] = d['Impact'].round(2)
d
Out[30]:
Region Impact
0 Africa 13.53
1 Americas 13.51
2 Asia 13.29
3 Eur/ME 13.41

Let's also take a look at how TRL values affect Impact.

In [31]:
e = dataset.groupby('TRL', as_index = False)['Impact'].mean()
e['Impact'] = e['Impact'].round(2)
e
Out[31]:
TRL Impact
0 1 13.70
1 2 13.13
2 3 13.45
3 4 13.37
4 5 13.76
5 6 13.83
6 7 13.43
7 8 12.82
8 9 12.29
In [127]:
plt.figure(figsize = (3,10))
f = plt.bar(e['TRL'],e['Impact'])
plt.xlabel('TRL')
plt.ylabel('Average Impact (in Weeks)')
plt.title('Average Impact (in Weeks) per TRL')
plt.yticks(np.arange(0, 13, 3)) 
f
Out[127]:
<BarContainer object of 9 artists>