To accomplish our task of understanding the data, we will need to use some Python libraries. Python libraries can be summarized as code other people have written and shared that we can use to save us time. We can import and use that code with a small import statement. Some of the packages we will use today include Pandas, Scipy, Numpy, and MatPlotLib. If you want to learn more about how these different packages work with examples, click here. If you would like to see the code that imports these packages - click the show code button below.
#Importing python libraries which help us speed our coding up.
#The packages we're importing are below.
import pandas as pd
import random
from pandas.plotting import scatter_matrix
import scipy as sc
import numpy as np
import matplotlib
import matplotlib.pyplot as plt #Can abbreviate specific parts of the packages so it's easier to type
import sklearn as sk
import seaborn as sns
from sklearn.linear_model import LogisticRegression
from sklearn import metrics
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn import datasets, linear_model
#We will now read the data in and store it in a dataframe called dataset
dataset = pd.read_excel('Peril_Dataset.xlsx', header = 0)
dataset.insert(loc = 0, column = 'Project Number', value = np.arange(len(dataset)))
Let's run some code and see how many rows and columns we have. Below - you will see the result as (Number of Rows, Number of Columns)
dataset.shape
(5000, 13)
dataset.head(10)
Project Number | Parameter | Category | Sub cat | Impact | TRL | Description | Region Numeric | Region | Project | Year | Month | Cost | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | Resource | Money | Limitation | 18 | 5 | Did not have sufficient resources and the cont... | 0 | Americas | IT/Solution | 2010 | October | 700560 |
1 | 1 | Resource | Outsourcing | Delayed start | 17 | 4 | Contractor setup delayed by a week | 0 | Americas | Prod. Dev. | 2002 | May | 347922 |
2 | 2 | Resource | Outsourcing | Late or poor output | 13 | 5 | Contractor did not spend time on the project b... | 0 | Americas | IT/Solution | 2002 | August | 919828 |
3 | 3 | Resource | Outsourcing | Late or poor output | 13 | 7 | Contractors failed to show up as committed | 1 | Asia | IT/Solution | 2014 | July | 521495 |
4 | 4 | Resource | Outsourcing | Late or poor output | 13 | 3 | Third-party vendor inadvertently introduced a ... | 0 | Americas | IT/Solution | 2010 | November | 322010 |
5 | 5 | Resource | Outsourcing | Late or poor output | 16 | 7 | Outsouced staff lacked the necessary skillset | 3 | Africa | IT/Solution | 2014 | June | 498896 |
6 | 6 | Resource | People | Late start | 15 | 5 | Planning delayed due to staff being still tied... | 0 | Americas | Prod. Dev. | 2015 | April | 592785 |
7 | 7 | Resource | People | Loss | 16 | 4 | Chef quit two days before the café was schedul... | 3 | Africa | IT/Solution | 2017 | October | 578928 |
8 | 8 | Resource | People | Motivation | 11 | 5 | Work at customer site had to be done by union ... | 3 | Africa | IT/Solution | 2003 | July | 318846 |
9 | 9 | Resource | People | Queuing | 16 | 6 | Critical task assigned to a heavily booked expert | 0 | Americas | IT/Solution | 2010 | June | 1109120 |
Project Number: A unique number to identify each project.
Parameter: The parameter of the project that was impacted. This dataset categorizes the risk in these three main groups of Scope, Resource, and Schedule.
Category: The category of the risk. We have the following categories in the main parameters - Change(Scope), Defect(Scope), People(Resource), Money(Resource), Outsourcing(Resource), Delay(Schedule), Dependency(Schedule), Estimates(Schedule)
Sub cat: The sub-category of the risk are the following - Creep, Dependency, Gap, Hardware, Software, Integration, Limitation, Late Start, Loss, Motivation, Queuing, Temp Loss, Decision, Information, Parts, Infrastructure, Legal, Project, Deadline, Judgement, and Learning
Impact: The impact to the project by number of weeks.
TRL: The technology readiness level of the project. You can find more information here.
Description: The description of the issue.
Region Numeric: A numeric value representing the region (0 = Americas, 1 = Asia, 2 = Eur/ME, 3 = Africa).
Region: What region of the world did the issue occur in.
Project: What type of project did the issue occur in. (Prod. Dev., or IT/Solution)
Year: The year the issue occurred in.
Cost: The cost, in USD, of the impact on the project.
Month: The month the project failure occurred in.
Here we can see that there are three different types of parameters that each project is grouped into that we have: Scope, Resource, and Schedule. Below the chart are the raw counts of the projects in those parameter groups.
plt.figure(figsize = (3,10))
plot_parameter = dataset['Parameter'].value_counts().plot(kind='bar')
plot_parameter.set_xlabel("Parameter")
plot_parameter.set_ylabel("Count (Number of Projects)")
plot_parameter
<matplotlib.axes._subplots.AxesSubplot at 0x1a1eef67b8>
dataset["Parameter"].value_counts()
Scope 1991 Resource 1579 Schedule 1430 Name: Parameter, dtype: int64
What is the breakdown of the different Categories that we have? We find the following categories: Change, People, Delay, Defect, Outsourcing, Estimates, Dependency, and Money. Below the chart are the raw counts of the projects in those categories.
plot_category = dataset['Category'].value_counts().plot(kind='bar')
plot_category.set_xlabel("Category")
plot_category.set_ylabel("Count (Number of Projects)")
plot_category
<matplotlib.axes._subplots.AxesSubplot at 0x1a19c40518>
dataset["Category"].value_counts()
Change 1459 People 943 Delay 763 Defect 532 Outsourcing 465 Estimates 369 Dependency 298 Money 171 Name: Category, dtype: int64
What is the breakdown of the sub-categories? Here we find the following: Gap, Creep, Late or poor output, Hardware, Parts, Loss, Temp loss, Queueing, Software, Learning, Limitation, Information, Judgement, Decision, Dependency, Infrastructure, Project, Late start, Delayed start, Integration, Deadline, Motivation, Legal, Project, Infrastructure, and Late or poor output. Below the chart are the raw counts of the projects in those sub-categories.
plot_subcat = dataset['Sub cat'].value_counts().plot(kind='bar')
plot_subcat.set_xlabel("Sub-Category")
plot_subcat.set_ylabel("Count (Number of Projects)")
plot_subcat
<matplotlib.axes._subplots.AxesSubplot at 0x1a19c15048>
dataset["Sub cat"].value_counts()
Gap 768 Creep 576 Late or poor output 380 Hardware 373 Parts 315 Loss 298 Temp loss 256 Queuing 250 Software 249 Learning 173 Limitation 171 Information 161 Judgment 138 Decision 134 Dependency 115 Infrastructure 113 Project 95 Late start 93 Delayed start 76 Integration 63 Deadline 58 Motivation 51 Legal 46 Project 37 Infrastructure 7 Late or poor output 4 Name: Sub cat, dtype: int64
Here you can see that the Region column has four separate regions and the count of values in those regions is provided below in the table.
dataset["Region"].value_counts()
Americas 2872 Eur/ME 945 Africa 636 Asia 547 Name: Region, dtype: int64
What years do we have in our dataset? 2001-2019
dataset["Year"].value_counts()
2010 578 2007 435 2019 364 2015 341 2002 267 2013 235 2003 233 2005 226 2012 225 2006 223 2004 223 2018 221 2001 219 2016 212 2011 205 2009 199 2008 199 2017 198 2014 197 Name: Year, dtype: int64
What is the distribution of the Impact levels in our data?
plot_impact = dataset['Impact'].value_counts().plot(kind='bar')
plot_impact.set_xlabel("Impact Level")
plot_impact.set_ylabel("Count (Number of Projects)")
plot_impact
<matplotlib.axes._subplots.AxesSubplot at 0x1a19666940>
The Column Names and Values that can be inputted below:
What is the cost distribution in our data? We find that we have a median between "500,000" and "1,000,000" and the cost values are skewed to the left.
boxplot = dataset.boxplot(column=['Cost'], vert = False)
boxplot.set_xlabel("Cost (in USD)")
boxplot
<matplotlib.axes._subplots.AxesSubplot at 0x1a1985d748>
If we want to understand which projects contribute to the outliers, let's filter the data to projects where the cost was greater than $2,000,000. Let's also sort the data so we can see the projects with the highest cost failures first.
dataset_high_cost = dataset[dataset['Cost']>2000000]
dataset_high_cost = dataset_high_cost.sort_values(['Cost'], ascending = False)
dataset_high_cost
Project Number | Parameter | Category | Sub cat | Impact | TRL | Description | Region Numeric | Region | Project | Year | Month | Cost | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3842 | 3842 | Resource | People | Late start | 26 | 3 | Change Management Lead took unexpected medical... | 1 | Asia | IT/Solution | 2003 | November | 2562560 |
4940 | 4940 | Scope | Change | Gap | 26 | 8 | Chip failed and had to be refabricated with ch... | 2 | Eur/ME | IT/Solution | 2018 | January | 2548442 |
3705 | 3705 | Resource | Outsourcing | Delayed start | 26 | 2 | Project had to wait until all data collection ... | 2 | Eur/ME | Prod. Dev. | 2001 | October | 2543034 |
4956 | 4956 | Scope | Change | Gap | 26 | 6 | The scope of the project was poorly defined | 0 | Americas | IT/Solution | 2010 | September | 2535416 |
4938 | 4938 | Scope | Change | Gap | 26 | 3 | Ship sinks with critical part; forced to repla... | 0 | Americas | Prod. Dev. | 2010 | February | 2533232 |
4077 | 4077 | Schedule | Dependency | Project | 26 | 5 | Client's insisted on the use of "the latest te... | 0 | Americas | IT/Solution | 2010 | May | 2526342 |
4991 | 4991 | Resource | People | Loss | 26 | 2 | To expand business, unrealistic commitments we... | 3 | Africa | Prod. Dev. | 2004 | May | 2504268 |
4788 | 4788 | Resource | People | Loss | 25 | 2 | Contract lab failed to inform us that they had... | 3 | Africa | Prod. Dev. | 2002 | December | 2469325 |
4549 | 4549 | Resource | Money | Limitation | 26 | 4 | "Minor" changes added and accepted late in pr... | 0 | Americas | IT/Solution | 2002 | January | 2437292 |
4552 | 4552 | Resource | Money | Limitation | 26 | 7 | Server technology chage was necessitated to su... | 2 | Eur/ME | Prod. Dev. | 2015 | December | 2425748 |
4133 | 4133 | Resource | Money | Limitation | 24 | 6 | Shipping documents not filled in correctly | 0 | Americas | IT/Solution | 2006 | March | 2383512 |
4116 | 4116 | Scope | Change | Gap | 25 | 6 | Key contributor out ill at critical time. | 0 | Americas | IT/Solution | 2019 | December | 2371375 |
3580 | 3580 | Schedule | Delay | Parts | 24 | 2 | Team leader reassigned to another project and ... | 0 | Americas | IT/Solution | 2014 | November | 2366784 |
4925 | 4925 | Scope | Change | Creep | 25 | 2 | Outsourcing partner in Switzerland did unneede... | 0 | Americas | Prod. Dev. | 2010 | November | 2360300 |
3394 | 3394 | Schedule | Dependency | Infrastructure | 26 | 6 | Need to upgrade detected only at project end | 2 | Eur/ME | Prod. Dev. | 2003 | March | 2357602 |
4251 | 4251 | Resource | Outsourcing | Late or poor output | 26 | 2 | Late in project, the budget and staff are cut ... | 0 | Americas | Prod. Dev. | 2017 | March | 2350920 |
4843 | 4843 | Resource | Outsourcing | Late start | 25 | 4 | Internal supplier problem; needed components d... | 3 | Africa | Prod. Dev. | 2013 | June | 2326750 |
3730 | 3730 | Resource | People | Queuing | 24 | 6 | Planning delayed due to staff being still tied... | 0 | Americas | Prod. Dev. | 2015 | August | 2324688 |
4598 | 4598 | Resource | Outsourcing | Late or poor output | 25 | 6 | Product owner having unique knowledge left the... | 0 | Americas | IT/Solution | 2010 | February | 2316300 |
4797 | 4797 | Scope | Change | Creep | 24 | 7 | Late test results came back negative | 3 | Africa | IT/Solution | 2011 | January | 2309472 |
3523 | 3523 | Scope | Change | Gap | 24 | 8 | Several employees had to take a leave of absen... | 1 | Asia | Prod. Dev. | 2011 | August | 2301120 |
4933 | 4933 | Scope | Change | Gap | 25 | 1 | Negotiation and training on a new tool took mo... | 0 | Americas | Prod. Dev. | 2015 | September | 2292350 |
3740 | 3740 | Resource | People | Temp loss | 23 | 5 | Development plans failed to include all of the... | 0 | Americas | IT/Solution | 2016 | January | 2286108 |
4641 | 4641 | Resource | People | Temp loss | 24 | 5 | Delay in hiring more engineers due to cost cut... | 3 | Africa | IT/Solution | 2010 | June | 2286048 |
4753 | 4753 | Scope | Change | Gap | 25 | 3 | Components do not work as documented. Replace... | 0 | Americas | IT/Solution | 2015 | September | 2285325 |
4393 | 4393 | Resource | People | Loss | 25 | 3 | Software modules in system do not work togethe... | 0 | Americas | Prod. Dev. | 2008 | April | 2276500 |
4545 | 4545 | Scope | Change | Gap | 23 | 1 | Escalated late changes from Sponsor | 0 | Americas | IT/Solution | 2008 | January | 2276126 |
4595 | 4595 | Resource | Outsourcing | Delayed start | 25 | 5 | Market change due to Improvements in technolog... | 0 | Americas | IT/Solution | 2015 | May | 2265925 |
4725 | 4725 | Scope | Change | Creep | 23 | 9 | User specifications not defined until late in ... | 0 | Americas | IT/Solution | 2010 | July | 2259957 |
4875 | 4875 | Schedule | Delay | Hardware | 25 | 9 | Issues discovered late in testing | 0 | Americas | Prod. Dev. | 2015 | May | 2257500 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
4905 | 4905 | Schedule | Estimates | Judgment | 25 | 6 | Lack of testing of older WWW browser versions ... | 0 | Americas | Prod. Dev. | 2015 | November | 2161575 |
4864 | 4864 | Resource | People | Queuing | 23 | 1 | Lack of understanding by the manager of the sy... | 2 | Eur/ME | Prod. Dev. | 2007 | April | 2157676 |
4083 | 4083 | Schedule | Estimates | Learning | 22 | 5 | Project delayed due to recent/last minute chan... | 1 | Asia | Prod. Dev. | 2014 | March | 2150148 |
4599 | 4599 | Resource | Outsourcing | Late or poor output | 25 | 1 | Poor communications resulted in never truly un... | 0 | Americas | IT/Solution | 2010 | March | 2149800 |
4747 | 4747 | Scope | Change | Gap | 26 | 3 | Product developed for multiple platforms, but ... | 0 | Americas | Prod. Dev. | 2010 | July | 2127450 |
4883 | 4883 | Schedule | Delay | Parts | 24 | 4 | Apartments were sold before construction begin... | 2 | Eur/ME | IT/Solution | 2007 | December | 2124000 |
4160 | 4160 | Schedule | Delay | Parts | 24 | 7 | Cable required had been left off the implement... | 2 | Eur/ME | IT/Solution | 2006 | April | 2106312 |
4000 | 4000 | Resource | People | Loss | 26 | 9 | Client requested "15 minute" change accepted a... | 1 | Asia | Prod. Dev. | 2003 | November | 2098902 |
4779 | 4779 | Scope | Defect | Software | 23 | 3 | Too few disc drives, and insufficient "owned" ... | 0 | Americas | Prod. Dev. | 2015 | August | 2095208 |
3518 | 3518 | Schedule | Delay | Decision | 24 | 2 | Poor Fit/Gap analysis | 1 | Asia | Prod. Dev. | 2006 | November | 2094336 |
4437 | 4437 | Resource | Money | Limitation | 26 | 5 | None of the developers had experience with the... | 2 | Eur/ME | Prod. Dev. | 2009 | July | 2093078 |
4316 | 4316 | Scope | Change | Gap | 23 | 9 | Lack of spare parts and expertise | 0 | Americas | IT/Solution | 2011 | August | 2092333 |
3726 | 3726 | Resource | People | Loss | 24 | 9 | Project start delayed; staffing and buy-in lat... | 1 | Asia | IT/Solution | 2011 | June | 2090328 |
4947 | 4947 | Scope | Change | Gap | 23 | 4 | Inexperienced staff | 0 | Americas | Prod. Dev. | 2015 | April | 2088676 |
4986 | 4986 | Scope | Defect | Software | 25 | 6 | Initial code release was a major failure | 0 | Americas | IT/Solution | 2010 | July | 2085275 |
3831 | 3831 | Schedule | Delay | Hardware | 22 | 4 | After more than 2/3 of the original budget had... | 2 | Eur/ME | Prod. Dev. | 2003 | March | 2085204 |
4835 | 4835 | Resource | Outsourcing | Late or poor output | 24 | 7 | Parallel tasks require resources beyond what i... | 2 | Eur/ME | IT/Solution | 2001 | July | 2082432 |
4682 | 4682 | Schedule | Delay | Parts | 26 | 3 | Supplier was unable to make a sample correctly | 2 | Eur/ME | IT/Solution | 2004 | January | 2081560 |
4183 | 4183 | Scope | Change | Gap | 25 | 9 | Scope not defined well; huge software delays | 0 | Americas | IT/Solution | 2003 | July | 2075050 |
3345 | 3345 | Resource | People | Temp loss | 24 | 8 | Scope expanded during the course of the project | 1 | Asia | IT/Solution | 2017 | December | 2055624 |
4318 | 4318 | Resource | People | Temp loss | 22 | 2 | Move out of old location required, but new spa... | 2 | Eur/ME | IT/Solution | 2017 | November | 2052248 |
3557 | 3557 | Resource | People | Queuing | 24 | 3 | Inadequate procedures for acceptance, quality,... | 1 | Asia | Prod. Dev. | 2015 | May | 2052144 |
3857 | 3857 | Resource | People | Loss | 25 | 7 | Team was staffed by 3 successive new hires in ... | 2 | Eur/ME | IT/Solution | 2010 | March | 2050425 |
3556 | 3556 | Resource | People | Queuing | 25 | 1 | A pool of unqualified, poorly performing, marg... | 1 | Asia | Prod. Dev. | 2013 | September | 2041850 |
3898 | 3898 | Resource | Outsourcing | Late or poor output | 21 | 5 | Errors found in data and critrical reports | 1 | Asia | IT/Solution | 2011 | March | 2040171 |
3878 | 3878 | Resource | Outsourcing | Late or poor output | 22 | 8 | Late stakeholders redefine requiremnts | 2 | Eur/ME | IT/Solution | 2017 | April | 2035968 |
3235 | 3235 | Scope | Defect | Hardware | 21 | 9 | Conversion from legacy system to SAP has multi... | 2 | Eur/ME | IT/Solution | 2017 | April | 2033829 |
3577 | 3577 | Schedule | Delay | Information | 22 | 3 | Project team was not aware that the new releas... | 2 | Eur/ME | IT/Solution | 2016 | January | 2023054 |
4453 | 4453 | Scope | Change | Creep | 24 | 4 | Unexpectedly high demand requires outlisde help | 0 | Americas | Prod. Dev. | 2010 | April | 2018520 |
3329 | 3329 | Resource | People | Motivation | 24 | 8 | Lack of head count from a 10% company-wide lay... | 0 | Americas | IT/Solution | 2003 | March | 2002488 |
72 rows × 13 columns
We find 72 rows where the cost value is greater than $2,000,000. Let's plot the parameters, categories, and sub-categories off these project failures to get a better sense of what types of projects comprise these high cost failures.
plot_high_cost = dataset_high_cost['Parameter'].value_counts().plot(kind='bar')
plt.xlabel('Parameter')
plt.ylabel('Count (of Project Failures)')
plt.title('The Count of High Cost Project Failures by Parameter')
Text(0.5,1,'The Count of High Cost Project Failures by Parameter')
plot_high_cost_by_category = dataset_high_cost['Category'].value_counts().plot(kind='bar')
plt.xlabel('Category')
plt.ylabel('Count (of Project Failures)')
plt.title('The Count of High Cost Project Failures by Category')
Text(0.5,1,'The Count of High Cost Project Failures by Category')
plot_high_cost_by_subcat = dataset_high_cost['Sub cat'].value_counts().plot(kind='bar')
plt.xlabel('Sub-Category')
plt.ylabel('Count (of Project Failures)')
plt.title('The Count of High Cost Project Failures by Sub-Category')
Text(0.5,1,'The Count of High Cost Project Failures by Sub-Category')
Let's plot these high cost project failures to see which one has the most.
plot_high_cost_by_region = dataset_high_cost['Region'].value_counts().plot(kind='bar')
plt.xlabel('Region')
plt.ylabel('Count (of Project Failures)')
plt.title('The Count of High Cost Project Failures by Region')
Text(0.5,1,'The Count of High Cost Project Failures by Region')
Let's say we want to look at the data for the projects in the Americas region - we can run the following code. Let's also sort the data so we can see the projects with the highest cost failures first. If you want to see the data for a difference region then change 'Americas' to a different region (copy and paste the region name from above). For example you can copy and paste the line below to see the data for just the Europe/Middle East region:
Copy and paste the code below:
region_filter = dataset.loc[dataset['Region'] == 'Eur/ME']
region_filter = region_filter.sort_values(['Cost'], ascending = False)
region_filter.head(50)
region_filter = dataset.loc[dataset['Region'] == 'Americas']
region_filter = region_filter.sort_values(['Cost'], ascending = False)
region_filter.head(50) #Let's look at the first 50 values
Project Number | Parameter | Category | Sub cat | Impact | TRL | Description | Region Numeric | Region | Project | Year | Month | Cost | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
4956 | 4956 | Scope | Change | Gap | 26 | 6 | The scope of the project was poorly defined | 0 | Americas | IT/Solution | 2010 | September | 2535416 |
4938 | 4938 | Scope | Change | Gap | 26 | 3 | Ship sinks with critical part; forced to repla... | 0 | Americas | Prod. Dev. | 2010 | February | 2533232 |
4077 | 4077 | Schedule | Dependency | Project | 26 | 5 | Client's insisted on the use of "the latest te... | 0 | Americas | IT/Solution | 2010 | May | 2526342 |
4549 | 4549 | Resource | Money | Limitation | 26 | 4 | "Minor" changes added and accepted late in pr... | 0 | Americas | IT/Solution | 2002 | January | 2437292 |
4133 | 4133 | Resource | Money | Limitation | 24 | 6 | Shipping documents not filled in correctly | 0 | Americas | IT/Solution | 2006 | March | 2383512 |
4116 | 4116 | Scope | Change | Gap | 25 | 6 | Key contributor out ill at critical time. | 0 | Americas | IT/Solution | 2019 | December | 2371375 |
3580 | 3580 | Schedule | Delay | Parts | 24 | 2 | Team leader reassigned to another project and ... | 0 | Americas | IT/Solution | 2014 | November | 2366784 |
4925 | 4925 | Scope | Change | Creep | 25 | 2 | Outsourcing partner in Switzerland did unneede... | 0 | Americas | Prod. Dev. | 2010 | November | 2360300 |
4251 | 4251 | Resource | Outsourcing | Late or poor output | 26 | 2 | Late in project, the budget and staff are cut ... | 0 | Americas | Prod. Dev. | 2017 | March | 2350920 |
3730 | 3730 | Resource | People | Queuing | 24 | 6 | Planning delayed due to staff being still tied... | 0 | Americas | Prod. Dev. | 2015 | August | 2324688 |
4598 | 4598 | Resource | Outsourcing | Late or poor output | 25 | 6 | Product owner having unique knowledge left the... | 0 | Americas | IT/Solution | 2010 | February | 2316300 |
4933 | 4933 | Scope | Change | Gap | 25 | 1 | Negotiation and training on a new tool took mo... | 0 | Americas | Prod. Dev. | 2015 | September | 2292350 |
3740 | 3740 | Resource | People | Temp loss | 23 | 5 | Development plans failed to include all of the... | 0 | Americas | IT/Solution | 2016 | January | 2286108 |
4753 | 4753 | Scope | Change | Gap | 25 | 3 | Components do not work as documented. Replace... | 0 | Americas | IT/Solution | 2015 | September | 2285325 |
4393 | 4393 | Resource | People | Loss | 25 | 3 | Software modules in system do not work togethe... | 0 | Americas | Prod. Dev. | 2008 | April | 2276500 |
4545 | 4545 | Scope | Change | Gap | 23 | 1 | Escalated late changes from Sponsor | 0 | Americas | IT/Solution | 2008 | January | 2276126 |
4595 | 4595 | Resource | Outsourcing | Delayed start | 25 | 5 | Market change due to Improvements in technolog... | 0 | Americas | IT/Solution | 2015 | May | 2265925 |
4725 | 4725 | Scope | Change | Creep | 23 | 9 | User specifications not defined until late in ... | 0 | Americas | IT/Solution | 2010 | July | 2259957 |
4875 | 4875 | Schedule | Delay | Hardware | 25 | 9 | Issues discovered late in testing | 0 | Americas | Prod. Dev. | 2015 | May | 2257500 |
3337 | 3337 | Resource | People | Queuing | 25 | 6 | Building permit was issued late due to roof da... | 0 | Americas | IT/Solution | 2002 | June | 2227875 |
4970 | 4970 | Scope | Change | Gap | 23 | 2 | Defects discovered in late testing | 0 | Americas | IT/Solution | 2010 | February | 2227527 |
3469 | 3469 | Scope | Defect | Integration | 26 | 3 | Program Manager squeezes in additional functio... | 0 | Americas | IT/Solution | 2014 | January | 2226432 |
3882 | 3882 | Schedule | Delay | Information | 25 | 5 | Foundation was inadaquate and needed upgrading | 0 | Americas | IT/Solution | 2003 | February | 2200725 |
4423 | 4423 | Scope | Change | Gap | 22 | 7 | Partner software is incompatible with current ... | 0 | Americas | Prod. Dev. | 2015 | November | 2180992 |
4842 | 4842 | Resource | Outsourcing | Late or poor output | 26 | 8 | The Project Lead did not hold the customer/bus... | 0 | Americas | IT/Solution | 2015 | June | 2164214 |
4905 | 4905 | Schedule | Estimates | Judgment | 25 | 6 | Lack of testing of older WWW browser versions ... | 0 | Americas | Prod. Dev. | 2015 | November | 2161575 |
4599 | 4599 | Resource | Outsourcing | Late or poor output | 25 | 1 | Poor communications resulted in never truly un... | 0 | Americas | IT/Solution | 2010 | March | 2149800 |
4747 | 4747 | Scope | Change | Gap | 26 | 3 | Product developed for multiple platforms, but ... | 0 | Americas | Prod. Dev. | 2010 | July | 2127450 |
4779 | 4779 | Scope | Defect | Software | 23 | 3 | Too few disc drives, and insufficient "owned" ... | 0 | Americas | Prod. Dev. | 2015 | August | 2095208 |
4316 | 4316 | Scope | Change | Gap | 23 | 9 | Lack of spare parts and expertise | 0 | Americas | IT/Solution | 2011 | August | 2092333 |
4947 | 4947 | Scope | Change | Gap | 23 | 4 | Inexperienced staff | 0 | Americas | Prod. Dev. | 2015 | April | 2088676 |
4986 | 4986 | Scope | Defect | Software | 25 | 6 | Initial code release was a major failure | 0 | Americas | IT/Solution | 2010 | July | 2085275 |
4183 | 4183 | Scope | Change | Gap | 25 | 9 | Scope not defined well; huge software delays | 0 | Americas | IT/Solution | 2003 | July | 2075050 |
4453 | 4453 | Scope | Change | Creep | 24 | 4 | Unexpectedly high demand requires outlisde help | 0 | Americas | Prod. Dev. | 2010 | April | 2018520 |
3329 | 3329 | Resource | People | Motivation | 24 | 8 | Lack of head count from a 10% company-wide lay... | 0 | Americas | IT/Solution | 2003 | March | 2002488 |
4716 | 4716 | Schedule | Estimates | Judgment | 21 | 4 | Changing product definition and poor estimatio... | 0 | Americas | IT/Solution | 2015 | October | 1999221 |
4256 | 4256 | Resource | Outsourcing | Late or poor output | 23 | 9 | Resources thin and tasks queued in backlogs. | 0 | Americas | Prod. Dev. | 2002 | July | 1994169 |
4035 | 4035 | Scope | Change | Gap | 22 | 6 | The project team agreed to new requirements; u... | 0 | Americas | IT/Solution | 2015 | September | 1987920 |
4400 | 4400 | Schedule | Dependency | Project | 24 | 2 | Controlled Document rejected by Approvers | 0 | Americas | IT/Solution | 2010 | September | 1985688 |
3734 | 3734 | Resource | People | Temp loss | 22 | 4 | Programmers were volunteers, and ultimately al... | 0 | Americas | Prod. Dev. | 2015 | July | 1975996 |
3615 | 3615 | Scope | Change | Creep | 26 | 9 | Developer added a new routine that was not ful... | 0 | Americas | Prod. Dev. | 2008 | October | 1974674 |
4660 | 4660 | Schedule | Delay | Decision | 20 | 1 | Key resource tied to a previous project for tw... | 0 | Americas | IT/Solution | 2015 | March | 1973680 |
4505 | 4505 | Schedule | Delay | Parts | 25 | 3 | Had only 1/3 of the units needed on time | 0 | Americas | IT/Solution | 2015 | October | 1972475 |
3838 | 3838 | Scope | Change | Creep | 26 | 7 | New application works, but data collected is n... | 0 | Americas | IT/Solution | 2005 | December | 1961206 |
4650 | 4650 | Resource | People | Temp loss | 21 | 9 | Project team members reassigned. Not replaced... | 0 | Americas | IT/Solution | 2010 | May | 1955856 |
4994 | 4994 | Schedule | Delay | Parts | 24 | 7 | Freelancer hired was a nutjob, and lacked t... | 0 | Americas | IT/Solution | 2015 | September | 1952040 |
4009 | 4009 | Resource | Outsourcing | Delayed start | 25 | 1 | Upgrade to software required. Planning and tr... | 0 | Americas | Prod. Dev. | 2016 | April | 1933950 |
3560 | 3560 | Resource | People | Queuing | 26 | 2 | Six weeks in, a redesign to the most fundament... | 0 | Americas | IT/Solution | 2011 | July | 1928602 |
4418 | 4418 | Scope | Change | Gap | 23 | 1 | Contributor couldn't work due to injury and ba... | 0 | Americas | IT/Solution | 2010 | May | 1926181 |
4500 | 4500 | Schedule | Delay | Hardware | 25 | 6 | Needed system delivered to wrong building; los... | 0 | Americas | IT/Solution | 2015 | September | 1923300 |
What if we want to see the data for projects in the Americas that had a category of money? Let's also sort the data so we can see the projects with the highest cost failures first. See below for the code to do so, you can change the category that you would like by copying and pasting the category you want from above into the code below.
For example, if I want to look at the project in the Americas where the category is People then I could do:
category_filter = category_filter.sort_values(['Cost'], ascending = False)
dataset.loc[(dataset['Category'] == 'People') & (dataset['Region'] == 'Americas')]
category_filter = dataset.loc[(dataset['Category'] == 'Money') & (dataset['Region'] == 'Americas')]
category_filter = category_filter.sort_values(['Cost'], ascending = False)
category_filter.head(50) #Let's look at the first 50 rows where the region is the Americas and the category is Money
Project Number | Parameter | Category | Sub cat | Impact | TRL | Description | Region Numeric | Region | Project | Year | Month | Cost | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
4549 | 4549 | Resource | Money | Limitation | 26 | 4 | "Minor" changes added and accepted late in pr... | 0 | Americas | IT/Solution | 2002 | January | 2437292 |
4133 | 4133 | Resource | Money | Limitation | 24 | 6 | Shipping documents not filled in correctly | 0 | Americas | IT/Solution | 2006 | March | 2383512 |
4082 | 4082 | Resource | Money | Limitation | 24 | 1 | Components that could have been purchased were... | 0 | Americas | IT/Solution | 2017 | December | 1864824 |
4434 | 4434 | Resource | Money | Limitation | 26 | 6 | Key analyst is scheduled to work on another pr... | 0 | Americas | Prod. Dev. | 2015 | November | 1809678 |
4546 | 4546 | Resource | Money | Limitation | 22 | 5 | Project schedule was revised user interface re... | 0 | Americas | Prod. Dev. | 2010 | July | 1724844 |
4435 | 4435 | Resource | Money | Limitation | 19 | 6 | Architect who knew how to integrate all compon... | 0 | Americas | Prod. Dev. | 2010 | February | 1678745 |
1077 | 1077 | Resource | Money | Limitation | 21 | 6 | Sponsor fails to provide money | 0 | Americas | Prod. Dev. | 2007 | November | 1646505 |
1132 | 1132 | Resource | Money | Limitation | 22 | 3 | IT department was underfunded and had insuffie... | 0 | Americas | IT/Solution | 2019 | January | 1594516 |
4072 | 4072 | Resource | Money | Limitation | 26 | 9 | Delay in getting material when a fermentation ... | 0 | Americas | IT/Solution | 2014 | February | 1552278 |
1480 | 1480 | Resource | Money | Limitation | 21 | 6 | Internal resources were not available on a ful... | 0 | Americas | IT/Solution | 2007 | December | 1486023 |
1138 | 1138 | Resource | Money | Limitation | 21 | 5 | Staffing was not sufficient to complete all as... | 0 | Americas | Prod. Dev. | 2019 | July | 1464645 |
4369 | 4369 | Resource | Money | Limitation | 24 | 5 | Online badge printing requirement completely m... | 0 | Americas | Prod. Dev. | 2001 | June | 1388904 |
4473 | 4473 | Resource | Money | Limitation | 16 | 8 | The key person responsible for major marketing... | 0 | Americas | IT/Solution | 2015 | December | 1346976 |
1100 | 1100 | Resource | Money | Limitation | 20 | 6 | Development team had insufficient resources as... | 0 | Americas | Prod. Dev. | 2017 | August | 1342380 |
1709 | 1709 | Resource | Money | Limitation | 17 | 6 | Chronically understaffed business environment | 0 | Americas | Prod. Dev. | 2016 | September | 1339753 |
4292 | 4292 | Resource | Money | Limitation | 16 | 7 | New equipment critical to the success of the p... | 0 | Americas | IT/Solution | 2010 | December | 1262176 |
2049 | 2049 | Resource | Money | Limitation | 20 | 4 | Contractor setup delayed by a week | 0 | Americas | Prod. Dev. | 2019 | August | 1229220 |
4553 | 4553 | Resource | Money | Limitation | 23 | 6 | Corporate merger delayed project | 0 | Americas | IT/Solution | 2006 | February | 1213710 |
531 | 531 | Resource | Money | Limitation | 15 | 6 | Lack of funds for travel at launch | 0 | Americas | Prod. Dev. | 2014 | September | 1160100 |
4387 | 4387 | Resource | Money | Limitation | 14 | 8 | fittings and assemblies were not up to project... | 0 | Americas | Prod. Dev. | 2008 | July | 1135526 |
4300 | 4300 | Resource | Money | Limitation | 15 | 2 | Errors in input data had to be fixed and repor... | 0 | Americas | IT/Solution | 2007 | August | 1133325 |
937 | 937 | Resource | Money | Limitation | 15 | 5 | Without funding, all work ceased, contracts we... | 0 | Americas | Prod. Dev. | 2001 | March | 1124565 |
1078 | 1078 | Resource | Money | Limitation | 22 | 4 | Project needs $150K per month in supplies but ... | 0 | Americas | Prod. Dev. | 2007 | August | 1105742 |
4134 | 4134 | Resource | Money | Limitation | 17 | 7 | Dependency on other project not discovered unt... | 0 | Americas | IT/Solution | 2013 | August | 1035181 |
1521 | 1521 | Resource | Money | Limitation | 15 | 4 | Due to a layoff no writer was immediately avai... | 0 | Americas | Prod. Dev. | 2009 | January | 1028115 |
1963 | 1963 | Resource | Money | Limitation | 22 | 5 | Did not have sufficient resources and the co... | 0 | Americas | IT/Solution | 2019 | January | 1005026 |
2169 | 2169 | Resource | Money | Limitation | 22 | 4 | Lack of money for sufficient equipment | 0 | Americas | Prod. Dev. | 2019 | November | 981112 |
3293 | 3293 | Resource | Money | Limitation | 15 | 4 | Project has funding, but limited staff delays ... | 0 | Americas | IT/Solution | 2005 | April | 957585 |
1328 | 1328 | Resource | Money | Limitation | 20 | 6 | Insufficient resources assigned to the project | 0 | Americas | Prod. Dev. | 2007 | September | 930900 |
813 | 813 | Resource | Money | Limitation | 18 | 6 | lack of staff availability due to company down... | 0 | Americas | IT/Solution | 2007 | February | 917766 |
4445 | 4445 | Resource | Money | Limitation | 17 | 2 | Patch to software required for project work; e... | 0 | Americas | IT/Solution | 2002 | May | 885836 |
2014 | 2014 | Resource | Money | Limitation | 13 | 6 | Major infrastructure project cut back; delayed... | 0 | Americas | IT/Solution | 2007 | July | 883350 |
1400 | 1400 | Resource | Money | Limitation | 16 | 3 | The travel budget for the project was cut and ... | 0 | Americas | Prod. Dev. | 2003 | March | 849680 |
76 | 76 | Resource | Money | Limitation | 15 | 6 | Project budget was dramatically cut | 0 | Americas | IT/Solution | 2009 | May | 832665 |
2769 | 2769 | Resource | Money | Limitation | 20 | 4 | Delays due to bidding issues and protests | 0 | Americas | IT/Solution | 2010 | August | 822300 |
4361 | 4361 | Resource | Money | Limitation | 15 | 2 | Details of requirements were submitted incorre... | 0 | Americas | Prod. Dev. | 2002 | June | 820110 |
2405 | 2405 | Resource | Money | Limitation | 16 | 5 | Unexpectedly high demand requires outlisde help | 0 | Americas | IT/Solution | 2013 | December | 809664 |
2048 | 2048 | Resource | Money | Limitation | 22 | 7 | Did not have sufficient resources and the cont... | 0 | Americas | IT/Solution | 2019 | April | 804584 |
1329 | 1329 | Resource | Money | Limitation | 11 | 7 | Project Budget was restricted to the bare mini... | 0 | Americas | Prod. Dev. | 2009 | June | 785147 |
356 | 356 | Resource | Money | Limitation | 13 | 5 | Development team had insufficient resources as... | 0 | Americas | Prod. Dev. | 2005 | September | 779298 |
355 | 355 | Resource | Money | Limitation | 13 | 5 | Part of the project team ran of money three mo... | 0 | Americas | Prod. Dev. | 2009 | July | 777777 |
4004 | 4004 | Resource | Money | Limitation | 10 | 8 | Contract had no penalties for missing deadline... | 0 | Americas | IT/Solution | 2005 | August | 759950 |
1876 | 1876 | Resource | Money | Limitation | 15 | 6 | Layoffs in mid-project | 0 | Americas | IT/Solution | 2013 | March | 758550 |
102 | 102 | Resource | Money | Limitation | 11 | 6 | Project had limited funding and limited staffing | 0 | Americas | IT/Solution | 2012 | April | 717541 |
73 | 73 | Resource | Money | Limitation | 10 | 5 | Failed to allocate enough resources and missed... | 0 | Americas | IT/Solution | 2002 | May | 708760 |
1453 | 1453 | Resource | Money | Limitation | 12 | 4 | Lack of money for sufficient equipment | 0 | Americas | Prod. Dev. | 2007 | August | 701112 |
0 | 0 | Resource | Money | Limitation | 18 | 5 | Did not have sufficient resources and the cont... | 0 | Americas | IT/Solution | 2010 | October | 700560 |
2854 | 2854 | Resource | Money | Limitation | 20 | 4 | Unexpectedly high demand requires outlisde help | 0 | Americas | IT/Solution | 2005 | July | 692300 |
719 | 719 | Resource | Money | Limitation | 11 | 4 | Client slashed the original budget | 0 | Americas | IT/Solution | 2014 | December | 673464 |
1405 | 1405 | Resource | Money | Limitation | 15 | 4 | Key consultant was let go at a critical time f... | 0 | Americas | IT/Solution | 2014 | February | 664230 |
How can we visualize this filtered data to understand how TRL level effects the Impact value in projects in the Americas, in the Money Category?
g = category_filter.groupby('TRL', as_index = False)['Impact'].mean()
g['Impact'] = g['Impact'].round(2)
g
TRL | Impact | |
---|---|---|
0 | 1 | 14.75 |
1 | 2 | 13.50 |
2 | 3 | 12.45 |
3 | 4 | 14.17 |
4 | 5 | 14.71 |
5 | 6 | 15.54 |
6 | 7 | 11.78 |
7 | 8 | 10.75 |
8 | 9 | 22.00 |
plt.figure(figsize = (3,10))
h = plt.bar(g['TRL'],g['Impact'])
plt.xlabel('TRL')
plt.ylabel('Impact (in Weeks)')
plt.title('Average Impact by TRL for Projects in the Americas with a Failure in the Money Category')
h
<BarContainer object of 9 artists>
Now that we can we graph this data easily and analyze the data for our understanding? Let's start with understanding how the average cost per failure varies by region.
y = dataset.groupby('Region', as_index = False)['Cost'].mean()
y['Cost'] = y['Cost'].round(2)
y
Region | Cost | |
---|---|---|
0 | Africa | 744616.68 |
1 | Americas | 752808.41 |
2 | Asia | 724886.88 |
3 | Eur/ME | 778307.72 |
plt.figure(figsize = (3,10))
a = plt.bar(y['Region'],y['Cost'])
plt.xlabel('Region')
plt.ylabel('Average Cost (in USD)')
plt.title('Average Cost (in USD) per Region')
a
<BarContainer object of 4 artists>
What is the Average Cost Per Year for all the Projects?
z = dataset.groupby('Year', as_index = False)['Cost'].mean()
z['Cost'] = z['Cost'].round(2)
z.sort_values(by = ['Year'])
z
Year | Cost | |
---|---|---|
0 | 2001 | 766136.24 |
1 | 2002 | 775338.76 |
2 | 2003 | 751868.87 |
3 | 2004 | 760491.31 |
4 | 2005 | 714075.49 |
5 | 2006 | 715143.45 |
6 | 2007 | 711551.25 |
7 | 2008 | 729581.87 |
8 | 2009 | 782631.19 |
9 | 2010 | 801956.27 |
10 | 2011 | 759503.60 |
11 | 2012 | 705645.49 |
12 | 2013 | 701452.11 |
13 | 2014 | 733437.70 |
14 | 2015 | 875305.61 |
15 | 2016 | 729610.30 |
16 | 2017 | 770409.96 |
17 | 2018 | 764912.75 |
18 | 2019 | 699717.54 |
b = plt.plot(z['Year'],z['Cost'])
plt.xlabel('Year')
plt.ylabel('Average Cost (in USD)')
plt.title('Average Cost (in USD) over Time for Projects in all the Regions')
plt.xticks(np.arange(2001, 2020, 2))
b
[<matplotlib.lines.Line2D at 0x1a1a64beb8>]
y = region_filter.groupby('Year', as_index = False)['Cost'].mean()
y['Cost'] = y['Cost'].round(2)
y = y.sort_values(by = ['Year'])
y
Year | Cost | |
---|---|---|
0 | 2001 | 753027.31 |
1 | 2002 | 759695.25 |
2 | 2003 | 730311.18 |
3 | 2004 | 734516.43 |
4 | 2005 | 694079.68 |
5 | 2006 | 704339.55 |
6 | 2007 | 697654.34 |
7 | 2008 | 724117.90 |
8 | 2009 | 800630.01 |
9 | 2010 | 841943.56 |
10 | 2011 | 768046.24 |
11 | 2012 | 743514.16 |
12 | 2013 | 646923.40 |
13 | 2014 | 752774.03 |
14 | 2015 | 901210.34 |
15 | 2016 | 682162.81 |
16 | 2017 | 802036.48 |
17 | 2018 | 741020.30 |
18 | 2019 | 666180.41 |
Now let's look at the Average Cost Per Year for Projects in the Americas.
a = plt.plot(y['Year'],y['Cost'])
plt.xlabel('Year')
plt.ylabel('Average Cost (in USD)')
plt.title('Average Cost (in USD) over Time for Projects in the Americas')
plt.xticks(np.arange(2001, 2020, 2))
a
[<matplotlib.lines.Line2D at 0x1a1a2bcf98>]
Let's take this one step further - what if we look at the average cost for projects in all the regions but the Americas? First we'll filter the data to only include projects in the Asia, Eur/ME, and Africa regions. Let's also sort the data so we can see the projects with the highest cost failures first.
region_filter_no_americas = dataset.loc[dataset['Region'] != 'Americas']
region_filter_no_americas = region_filter_no_americas.sort_values(['Cost'], ascending = False)
region_filter_no_americas.head(50) #Let's look at the first 50 values
Project Number | Parameter | Category | Sub cat | Impact | TRL | Description | Region Numeric | Region | Project | Year | Month | Cost | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3842 | 3842 | Resource | People | Late start | 26 | 3 | Change Management Lead took unexpected medical... | 1 | Asia | IT/Solution | 2003 | November | 2562560 |
4940 | 4940 | Scope | Change | Gap | 26 | 8 | Chip failed and had to be refabricated with ch... | 2 | Eur/ME | IT/Solution | 2018 | January | 2548442 |
3705 | 3705 | Resource | Outsourcing | Delayed start | 26 | 2 | Project had to wait until all data collection ... | 2 | Eur/ME | Prod. Dev. | 2001 | October | 2543034 |
4991 | 4991 | Resource | People | Loss | 26 | 2 | To expand business, unrealistic commitments we... | 3 | Africa | Prod. Dev. | 2004 | May | 2504268 |
4788 | 4788 | Resource | People | Loss | 25 | 2 | Contract lab failed to inform us that they had... | 3 | Africa | Prod. Dev. | 2002 | December | 2469325 |
4552 | 4552 | Resource | Money | Limitation | 26 | 7 | Server technology chage was necessitated to su... | 2 | Eur/ME | Prod. Dev. | 2015 | December | 2425748 |
3394 | 3394 | Schedule | Dependency | Infrastructure | 26 | 6 | Need to upgrade detected only at project end | 2 | Eur/ME | Prod. Dev. | 2003 | March | 2357602 |
4843 | 4843 | Resource | Outsourcing | Late start | 25 | 4 | Internal supplier problem; needed components d... | 3 | Africa | Prod. Dev. | 2013 | June | 2326750 |
4797 | 4797 | Scope | Change | Creep | 24 | 7 | Late test results came back negative | 3 | Africa | IT/Solution | 2011 | January | 2309472 |
3523 | 3523 | Scope | Change | Gap | 24 | 8 | Several employees had to take a leave of absen... | 1 | Asia | Prod. Dev. | 2011 | August | 2301120 |
4641 | 4641 | Resource | People | Temp loss | 24 | 5 | Delay in hiring more engineers due to cost cut... | 3 | Africa | IT/Solution | 2010 | June | 2286048 |
4869 | 4869 | Resource | People | Temp loss | 25 | 5 | Some of the cross functional teams were not in... | 3 | Africa | IT/Solution | 2010 | October | 2242050 |
4756 | 4756 | Scope | Change | Gap | 25 | 2 | Insufficient length of anchor bolts protruding... | 2 | Eur/ME | IT/Solution | 2018 | August | 2237275 |
4568 | 4568 | Resource | People | Queuing | 25 | 5 | Insufficient initial project scope was defined... | 2 | Eur/ME | IT/Solution | 2005 | April | 2218325 |
4675 | 4675 | Schedule | Delay | Information | 25 | 4 | Key parts would be delivered late | 2 | Eur/ME | IT/Solution | 2018 | September | 2206475 |
3617 | 3617 | Scope | Change | Creep | 25 | 1 | New database wouldn’t work with the proposed p... | 2 | Eur/ME | IT/Solution | 2015 | May | 2204675 |
3897 | 3897 | Scope | Change | Creep | 26 | 9 | Multiple projects depend on lone expert scientist | 1 | Asia | Prod. Dev. | 2006 | August | 2194530 |
4864 | 4864 | Resource | People | Queuing | 23 | 1 | Lack of understanding by the manager of the sy... | 2 | Eur/ME | Prod. Dev. | 2007 | April | 2157676 |
4083 | 4083 | Schedule | Estimates | Learning | 22 | 5 | Project delayed due to recent/last minute chan... | 1 | Asia | Prod. Dev. | 2014 | March | 2150148 |
4883 | 4883 | Schedule | Delay | Parts | 24 | 4 | Apartments were sold before construction begin... | 2 | Eur/ME | IT/Solution | 2007 | December | 2124000 |
4160 | 4160 | Schedule | Delay | Parts | 24 | 7 | Cable required had been left off the implement... | 2 | Eur/ME | IT/Solution | 2006 | April | 2106312 |
4000 | 4000 | Resource | People | Loss | 26 | 9 | Client requested "15 minute" change accepted a... | 1 | Asia | Prod. Dev. | 2003 | November | 2098902 |
3518 | 3518 | Schedule | Delay | Decision | 24 | 2 | Poor Fit/Gap analysis | 1 | Asia | Prod. Dev. | 2006 | November | 2094336 |
4437 | 4437 | Resource | Money | Limitation | 26 | 5 | None of the developers had experience with the... | 2 | Eur/ME | Prod. Dev. | 2009 | July | 2093078 |
3726 | 3726 | Resource | People | Loss | 24 | 9 | Project start delayed; staffing and buy-in lat... | 1 | Asia | IT/Solution | 2011 | June | 2090328 |
3831 | 3831 | Schedule | Delay | Hardware | 22 | 4 | After more than 2/3 of the original budget had... | 2 | Eur/ME | Prod. Dev. | 2003 | March | 2085204 |
4835 | 4835 | Resource | Outsourcing | Late or poor output | 24 | 7 | Parallel tasks require resources beyond what i... | 2 | Eur/ME | IT/Solution | 2001 | July | 2082432 |
4682 | 4682 | Schedule | Delay | Parts | 26 | 3 | Supplier was unable to make a sample correctly | 2 | Eur/ME | IT/Solution | 2004 | January | 2081560 |
3345 | 3345 | Resource | People | Temp loss | 24 | 8 | Scope expanded during the course of the project | 1 | Asia | IT/Solution | 2017 | December | 2055624 |
4318 | 4318 | Resource | People | Temp loss | 22 | 2 | Move out of old location required, but new spa... | 2 | Eur/ME | IT/Solution | 2017 | November | 2052248 |
3557 | 3557 | Resource | People | Queuing | 24 | 3 | Inadequate procedures for acceptance, quality,... | 1 | Asia | Prod. Dev. | 2015 | May | 2052144 |
3857 | 3857 | Resource | People | Loss | 25 | 7 | Team was staffed by 3 successive new hires in ... | 2 | Eur/ME | IT/Solution | 2010 | March | 2050425 |
3556 | 3556 | Resource | People | Queuing | 25 | 1 | A pool of unqualified, poorly performing, marg... | 1 | Asia | Prod. Dev. | 2013 | September | 2041850 |
3898 | 3898 | Resource | Outsourcing | Late or poor output | 21 | 5 | Errors found in data and critrical reports | 1 | Asia | IT/Solution | 2011 | March | 2040171 |
3878 | 3878 | Resource | Outsourcing | Late or poor output | 22 | 8 | Late stakeholders redefine requiremnts | 2 | Eur/ME | IT/Solution | 2017 | April | 2035968 |
3235 | 3235 | Scope | Defect | Hardware | 21 | 9 | Conversion from legacy system to SAP has multi... | 2 | Eur/ME | IT/Solution | 2017 | April | 2033829 |
3577 | 3577 | Schedule | Delay | Information | 22 | 3 | Project team was not aware that the new releas... | 2 | Eur/ME | IT/Solution | 2016 | January | 2023054 |
4816 | 4816 | Scope | Change | Creep | 20 | 3 | Inadequate staffing | 2 | Eur/ME | Prod. Dev. | 2018 | March | 1997540 |
4517 | 4517 | Scope | Change | Creep | 20 | 9 | Needed wiring not available on time | 2 | Eur/ME | Prod. Dev. | 2002 | February | 1992660 |
4449 | 4449 | Resource | People | Queuing | 22 | 3 | Work at customer site had to be done by union ... | 2 | Eur/ME | Prod. Dev. | 2002 | June | 1991176 |
4637 | 4637 | Resource | People | Queuing | 21 | 9 | Contract lab failed to inform us that they had... | 2 | Eur/ME | IT/Solution | 2015 | April | 1977528 |
4815 | 4815 | Scope | Change | Creep | 20 | 5 | Client slashed the original budget | 3 | Africa | Prod. Dev. | 2018 | February | 1974940 |
3800 | 3800 | Schedule | Delay | Parts | 24 | 4 | Did not have the correct people validate requi... | 1 | Asia | IT/Solution | 2004 | February | 1971720 |
3191 | 3191 | Resource | People | Temp loss | 21 | 9 | Redesign required late in printer project to m... | 2 | Eur/ME | Prod. Dev. | 2003 | October | 1971123 |
4477 | 4477 | Resource | Outsourcing | Late or poor output | 22 | 7 | All students did not complete training as sche... | 2 | Eur/ME | IT/Solution | 2019 | October | 1968626 |
4669 | 4669 | Schedule | Delay | Hardware | 23 | 9 | Decisions are being delayed without apparent r... | 3 | Africa | Prod. Dev. | 2013 | June | 1964062 |
4326 | 4326 | Resource | People | Queuing | 25 | 6 | Approvals held up at project completion to wai... | 2 | Eur/ME | IT/Solution | 2003 | June | 1960725 |
3574 | 3574 | Schedule | Delay | Hardware | 26 | 7 | In creating a new system, there was too little... | 1 | Asia | Prod. Dev. | 2002 | August | 1952574 |
4405 | 4405 | Scope | Change | Creep | 24 | 5 | Delay in patient recruitment for clinical trials | 2 | Eur/ME | IT/Solution | 2008 | January | 1948032 |
3673 | 3673 | Scope | Defect | Hardware | 22 | 4 | No understanding of the scope and not enough s... | 1 | Asia | Prod. Dev. | 2001 | March | 1934790 |
c = region_filter_no_americas.groupby('Year', as_index = False)['Cost'].mean()
c['Cost'] = c['Cost'].round(2)
c = c.sort_values(by = ['Year'])
c
Year | Cost | |
---|---|---|
0 | 2001 | 779125.99 |
1 | 2002 | 795394.55 |
2 | 2003 | 774372.07 |
3 | 2004 | 784885.11 |
4 | 2005 | 733375.79 |
5 | 2006 | 727068.51 |
6 | 2007 | 744882.14 |
7 | 2008 | 734674.51 |
8 | 2009 | 757989.94 |
9 | 2010 | 730288.23 |
10 | 2011 | 749011.00 |
11 | 2012 | 669423.28 |
12 | 2013 | 756446.87 |
13 | 2014 | 713904.06 |
14 | 2015 | 806226.34 |
15 | 2016 | 789172.05 |
16 | 2017 | 742397.90 |
17 | 2018 | 785768.02 |
18 | 2019 | 792031.09 |
l = plt.plot(c['Year'],c['Cost'])
plt.xlabel('Year')
plt.ylabel('Average Cost (in USD)')
plt.title('Average Cost (in USD) over Time for Projects not in the Americas')
plt.xticks(np.arange(2001, 2020, 2))
l
[<matplotlib.lines.Line2D at 0x1a1a6b6eb8>]
Now that we've looked at the cost breakdown over time per region - let's look at the difference in the average Impact (in weeks) per region.
d = dataset.groupby('Region', as_index = False)['Impact'].mean()
d['Impact'] = d['Impact'].round(2)
d
Region | Impact | |
---|---|---|
0 | Africa | 13.53 |
1 | Americas | 13.51 |
2 | Asia | 13.29 |
3 | Eur/ME | 13.41 |
Let's also take a look at how TRL values affect Impact.
e = dataset.groupby('TRL', as_index = False)['Impact'].mean()
e['Impact'] = e['Impact'].round(2)
e
TRL | Impact | |
---|---|---|
0 | 1 | 13.70 |
1 | 2 | 13.13 |
2 | 3 | 13.45 |
3 | 4 | 13.37 |
4 | 5 | 13.76 |
5 | 6 | 13.83 |
6 | 7 | 13.43 |
7 | 8 | 12.82 |
8 | 9 | 12.29 |
plt.figure(figsize = (3,10))
f = plt.bar(e['TRL'],e['Impact'])
plt.xlabel('TRL')
plt.ylabel('Average Impact (in Weeks)')
plt.title('Average Impact (in Weeks) per TRL')
plt.yticks(np.arange(0, 13, 3))
f
<BarContainer object of 9 artists>