We have an Excel workbook that details the progress of all One-Health/Veterinary Services products, including posters, brochures, and newsletters.
Taking a look at the initial structure of this spreadsheet, we have missing dates or progress entries for certain products. This may be because certain products do not need certain levels of approval. It may be advantageous to split the data by product and then look at the time associated with each product.
For my sanity, I will be getting rid of the incomplete records.
First, let's go ahead and load the .csv into Python.
import pandas as pd # import the module and alias it as pd
timedateVOH_data = pd.read_csv(r'C:\Users\waugh\Dropbox\Documents\One_Health\VS_Product_Analysis_dates.csv', parse_dates=True)  # raw string so the backslashes in the Windows path aren't treated as escapes
timedateVOH_data.head()
 | Type | Draft Review | SME Content Review | PMD Content Review | VID Design Work/Edits | Submitted to CRC | CRC Review Complete
---|---|---|---|---|---|---|---
0 | Brochure | NaN | NaN | 5/17/2016 | 5/23/2016 | 6/1/2016 | 6/7/2016 |
1 | Brochure | NaN | NaN | 5/17/2016 | 5/23/2016 | 6/3/2016 | 6/13/2016 |
2 | Brochure | NaN | NaN | 5/17/2016 | 10/20/2016 | 11/1/2016 | 12/20/2016 |
3 | Brochure | NaN | NaN | 7/5/2016 | 7/5/2016 | 8/19/2016 | 9/2/2016 |
4 | Brochure | 5/9/2017 | 5/16/2017 | 6/6/2017 | 6/23/2017 | 8/4/2017 | 10/10/2017 |
timedateVOH_data.describe()
 | Type | Draft Review | SME Content Review | PMD Content Review | VID Design Work/Edits | Submitted to CRC | CRC Review Complete
---|---|---|---|---|---|---|---
count | 81 | 52 | 48 | 62 | 70 | 75 | 69 |
unique | 3 | 36 | 29 | 31 | 33 | 33 | 21 |
top | Newsletter | 6/1/2017 | 4/20/2016 | 7/11/2016 | 4/20/2016 | 8/29/2017 | 8/15/2016 |
freq | 41 | 4 | 5 | 8 | 6 | 8 | 9 |
We see that the dates in the csv file did not read into Python as datetime objects (`parse_dates=True` only attempts to parse the index, not the data columns). Without this, we're unable to determine the elapsed time for each column.
Let's fix the CRC columns first, since a product wouldn't exist without CRC approval.
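For reference, passing a list of column names to `parse_dates` converts those columns at read time. A minimal sketch, using a hypothetical two-row sample standing in for the real CSV:

```python
import io

import pandas as pd

# Hypothetical sample standing in for VS_Product_Analysis_dates.csv;
# the trailing space in 'Submitted to CRC ' mirrors the real header
csv_text = """Type,Submitted to CRC ,CRC Review Complete
Brochure,6/1/2016,6/7/2016
Brochure,6/3/2016,6/13/2016
"""

df = pd.read_csv(io.StringIO(csv_text),
                 parse_dates=['Submitted to CRC ', 'CRC Review Complete'])
print(df.dtypes)  # the two date columns come back as datetime64[ns]
```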
# Note: 'Submitted to CRC ' has a trailing space in the source file's header
df_clean = timedateVOH_data[pd.notnull(timedateVOH_data['Submitted to CRC '])]
df_clean_1 = df_clean[pd.notnull(df_clean['CRC Review Complete'])]
df_clean_1
df_clean_2 = df_clean_1.copy()
df_clean_2['CRC Review Complete'] = pd.to_datetime(df_clean_2['CRC Review Complete'], format = '%m/%d/%Y')
df_clean_2['Submitted to CRC '] = pd.to_datetime(df_clean_2['Submitted to CRC '], format = '%m/%d/%Y')
df_clean_2['CRCtime'] = df_clean_2['CRC Review Complete'] - df_clean_2['Submitted to CRC ']
df_clean_2.head()
 | Type | Draft Review | SME Content Review | PMD Content Review | VID Design Work/Edits | Submitted to CRC | CRC Review Complete | CRCtime
---|---|---|---|---|---|---|---|---
0 | Brochure | NaN | NaN | 5/17/2016 | 5/23/2016 | 2016-06-01 | 2016-06-07 | 6 days |
1 | Brochure | NaN | NaN | 5/17/2016 | 5/23/2016 | 2016-06-03 | 2016-06-13 | 10 days |
2 | Brochure | NaN | NaN | 5/17/2016 | 10/20/2016 | 2016-11-01 | 2016-12-20 | 49 days |
3 | Brochure | NaN | NaN | 7/5/2016 | 7/5/2016 | 2016-08-19 | 2016-09-02 | 14 days |
4 | Brochure | 5/9/2017 | 5/16/2017 | 6/6/2017 | 6/23/2017 | 2017-08-04 | 2017-10-10 | 67 days |
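Since each `CRCtime` value is a `Timedelta`, the `.dt.days` accessor pulls out plain integer day counts, which is handy for the summaries later on. A minimal sketch with made-up dates:

```python
import pandas as pd

# Made-up submission/completion dates for illustration only
df = pd.DataFrame({
    'Submitted to CRC ': pd.to_datetime(['2016-06-01', '2016-11-01']),
    'CRC Review Complete': pd.to_datetime(['2016-06-07', '2016-12-20']),
})
# Subtracting two datetime columns yields a timedelta64 column
df['CRCtime'] = df['CRC Review Complete'] - df['Submitted to CRC ']
print(df['CRCtime'].dt.days.tolist())  # [6, 49]
```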
Now that we've finished the column, we have the calculated elapsed time! We see, however, that certain products don't have full processing times through all columns. This probably means that certain products don't need certain approvals. So from now on, let's split the DataFrame by the two main One-Health products (newsletters and brochures).
#Create the dataframe subset
df_clean_2_News = df_clean_2.loc[df_clean_2['Type'] == 'Newsletter']
df_clean_2_News.head()
#Erase rows with NaN values in the remaining date columns; .copy() avoids
#SettingWithCopyWarning on the datetime conversions below
df_clean_6_News = df_clean_2_News.dropna(
    subset=['VID Design Work/Edits', 'PMD Content Review',
            'SME Content Review', 'Draft Review']).copy()
#Convert strings to datetime objects
df_clean_6_News['VID Design Work/Edits'] = pd.to_datetime(df_clean_6_News['VID Design Work/Edits'], format = '%m/%d/%Y')
df_clean_6_News['PMD Content Review'] = pd.to_datetime(df_clean_6_News['PMD Content Review'], format = '%m/%d/%Y')
df_clean_6_News['SME Content Review'] = pd.to_datetime(df_clean_6_News['SME Content Review'], format = '%m/%d/%Y')
df_clean_6_News['Draft Review'] = pd.to_datetime(df_clean_6_News['Draft Review'], format = '%m/%d/%Y')
df_clean_6_News.head()
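The four `to_datetime` calls above can also be written as a single pass over a list of column names. A sketch on a toy frame (illustrative only, not the real dataset):

```python
import pandas as pd

# Toy frame; the real analysis applies this to df_clean_6_News
df = pd.DataFrame({
    'Draft Review': ['5/9/2017', '6/1/2017'],
    'SME Content Review': ['5/16/2017', '6/23/2017'],
})
date_cols = ['Draft Review', 'SME Content Review']
# apply() hands each column Series to pd.to_datetime in turn
df[date_cols] = df[date_cols].apply(pd.to_datetime, format='%m/%d/%Y')
print(df.dtypes)
```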
#Create the elapsed time columns
df_clean_6_News['VID_Edittime'] = df_clean_6_News['Submitted to CRC '] - df_clean_6_News['VID Design Work/Edits']
df_clean_6_News['PMD_Reviewtime'] = df_clean_6_News['VID Design Work/Edits'] - df_clean_6_News['PMD Content Review']
df_clean_6_News['SME_Reviewtime'] = df_clean_6_News['PMD Content Review'] - df_clean_6_News['SME Content Review']
df_clean_6_News['Draft_Reviewtime'] = df_clean_6_News['SME Content Review'] - df_clean_6_News['Draft Review']
#Drop rows whose PMD review elapsed time is negative (bad data entry)
df_clean_8_News = df_clean_6_News[df_clean_6_News.PMD_Reviewtime >= pd.Timedelta(0)].copy()
#Good to go!
df_clean_8_News
 | Type | Draft Review | SME Content Review | PMD Content Review | VID Design Work/Edits | Submitted to CRC | CRC Review Complete | CRCtime | VID_Edittime | PMD_Reviewtime | SME_Reviewtime | Draft_Reviewtime
---|---|---|---|---|---|---|---|---|---|---|---|---
40 | Newsletter | 2016-04-13 | 2016-04-20 | 2016-04-20 | 2016-04-20 | 2016-04-28 | 2016-05-09 | 11 days | 8 days | 0 days | 0 days | 7 days |
41 | Newsletter | 2016-04-15 | 2016-04-20 | 2016-04-20 | 2016-04-20 | 2016-04-28 | 2016-05-09 | 11 days | 8 days | 0 days | 0 days | 5 days |
42 | Newsletter | 2016-04-15 | 2016-04-20 | 2016-04-20 | 2016-04-20 | 2016-04-28 | 2016-05-09 | 11 days | 8 days | 0 days | 0 days | 5 days |
43 | Newsletter | 2016-04-15 | 2016-04-20 | 2016-04-20 | 2016-04-20 | 2016-04-28 | 2016-05-09 | 11 days | 8 days | 0 days | 0 days | 5 days |
48 | Newsletter | 2016-05-17 | 2016-05-17 | 2016-05-17 | 2016-05-17 | 2016-06-03 | 2016-06-07 | 4 days | 17 days | 0 days | 0 days | 0 days |
54 | Newsletter | 2016-07-01 | 2016-07-05 | 2016-07-11 | 2016-07-11 | 2016-07-22 | 2016-08-15 | 24 days | 11 days | 0 days | 6 days | 4 days |
56 | Newsletter | 2016-06-27 | 2016-06-27 | 2016-07-11 | 2016-07-11 | 2016-07-22 | 2016-08-15 | 24 days | 11 days | 0 days | 14 days | 0 days |
58 | Newsletter | 2016-11-17 | 2016-11-23 | 2016-11-28 | 2016-11-29 | 2017-01-27 | 2017-03-30 | 62 days | 59 days | 1 days | 5 days | 6 days |
59 | Newsletter | 2016-11-08 | 2016-11-23 | 2016-11-28 | 2016-11-29 | 2017-01-27 | 2017-03-30 | 62 days | 59 days | 1 days | 5 days | 15 days |
61 | Newsletter | 2017-02-25 | 2017-03-15 | 2017-03-15 | 2017-03-17 | 2017-03-17 | 2017-05-08 | 52 days | 0 days | 2 days | 0 days | 18 days |
62 | Newsletter | 2017-02-10 | 2017-03-13 | 2017-03-15 | 2017-03-20 | 2017-04-04 | 2017-06-01 | 58 days | 15 days | 5 days | 2 days | 31 days |
63 | Newsletter | 2017-02-10 | 2017-03-10 | 2017-03-13 | 2017-03-15 | 2017-04-04 | 2017-06-01 | 58 days | 20 days | 2 days | 3 days | 28 days |
64 | Newsletter | 2017-03-01 | 2017-03-16 | 2017-03-16 | 2017-03-20 | 2017-04-04 | 2017-06-01 | 58 days | 15 days | 4 days | 0 days | 15 days |
66 | Newsletter | 2017-06-01 | 2017-06-23 | 2017-06-29 | 2017-07-25 | 2017-08-29 | 2017-08-29 | 0 days | 35 days | 26 days | 6 days | 22 days |
68 | Newsletter | 2017-06-01 | 2017-06-23 | 2017-06-28 | 2017-07-25 | 2017-08-29 | 2017-08-29 | 0 days | 35 days | 27 days | 5 days | 22 days |
70 | Newsletter | 2016-07-01 | 2016-07-05 | 2016-07-11 | 2017-08-28 | 2017-08-29 | 2017-08-29 | 0 days | 1 days | 413 days | 6 days | 4 days |
71 | Newsletter | 2016-06-07 | 2016-06-30 | 2016-07-11 | 2017-08-28 | 2017-08-29 | 2017-08-29 | 0 days | 1 days | 413 days | 11 days | 23 days |
72 | Newsletter | 2016-06-27 | 2016-06-27 | 2016-07-11 | 2017-08-28 | 2017-08-29 | 2017-08-29 | 0 days | 1 days | 413 days | 14 days | 0 days |
77 | Newsletter | 2018-08-15 | 2018-08-24 | 2018-08-24 | 2018-08-29 | 2018-08-30 | 2018-09-04 | 5 days | 1 days | 5 days | 0 days | 9 days |
78 | Newsletter | 2018-08-19 | 2018-08-30 | 2018-08-29 | 2018-08-29 | 2018-08-30 | 2018-09-04 | 5 days | 1 days | 0 days | -1 days | 11 days |
79 | Newsletter | 2018-08-15 | 2018-08-24 | 2018-08-24 | 2018-08-29 | 2018-08-30 | 2018-09-04 | 5 days | 1 days | 5 days | 0 days | 9 days |
80 | Newsletter | 2018-08-16 | 2018-08-23 | 2018-08-24 | 2018-08-29 | 2018-08-30 | 2018-09-04 | 5 days | 1 days | 5 days | 1 days | 7 days |
Now let's take a look at the summary data.
df_clean_8_News.describe()
 | CRCtime | VID_Edittime | PMD_Reviewtime | SME_Reviewtime | Draft_Reviewtime
---|---|---|---|---|---
count | 22 | 22 | 22 | 22 | 22 |
mean | 21 days 04:21:49.090909 | 14 days 08:43:38.181818 | 60 days 02:10:54.545454 | 3 days 12:00:00 | 11 days 04:21:49.090909 |
std | 24 days 06:22:33.479019 | 17 days 14:47:37.238031 | 143 days 17:24:49.094471 | 4 days 14:02:38.929146 | 9 days 05:32:01.438379 |
min | 0 days 00:00:00 | 0 days 00:00:00 | 0 days 00:00:00 | -1 days +00:00:00 | 0 days 00:00:00 |
25% | 4 days 06:00:00 | 1 days 00:00:00 | 0 days 00:00:00 | 0 days 00:00:00 | 5 days 00:00:00 |
50% | 11 days 00:00:00 | 8 days 00:00:00 | 2 days 00:00:00 | 1 days 12:00:00 | 8 days 00:00:00 |
75% | 45 days 00:00:00 | 16 days 12:00:00 | 5 days 00:00:00 | 5 days 18:00:00 | 17 days 06:00:00 |
max | 62 days 00:00:00 | 59 days 00:00:00 | 413 days 00:00:00 | 14 days 00:00:00 | 31 days 00:00:00 |
print(df_clean_8_News.CRCtime.median())
print(df_clean_8_News.VID_Edittime.median())
print(df_clean_8_News.PMD_Reviewtime.median())
print(df_clean_8_News.SME_Reviewtime.median())
print(df_clean_8_News.Draft_Reviewtime.median())
11 days 00:00:00
8 days 00:00:00
2 days 00:00:00
1 days 12:00:00
8 days 00:00:00
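The five `print` statements above can also be collapsed into a single `median()` call over the elapsed-time columns. A sketch on a toy frame (the real analysis would use `df_clean_8_News`):

```python
import pandas as pd

# Toy elapsed-time columns for illustration
df = pd.DataFrame({
    'CRCtime': pd.to_timedelta(['11 days', '5 days', '24 days']),
    'VID_Edittime': pd.to_timedelta(['8 days', '1 days', '11 days']),
})
# median() works column-wise on timedelta data
print(df[['CRCtime', 'VID_Edittime']].median())
```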
Yikes, we have some weird data that led to some negative elapsed times. We could do one of three things: delete the offending records, treat the negative values as zero days, or go back to the source spreadsheet and correct the dates. For this analysis, I've decided to delete the offending products with a negative elapsed time.
This type of discrepancy also demonstrates the lack of an effective way of recording data. In order to produce an analysis and report of any substance, our data must be reliable. A recommendation for the future is that an accurate working document MUST be established in order to produce reliable production time data and reports.
Let's go ahead and do the other products.
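As an aside, rather than matching specific string values such as '-10 days', every negative duration can be dropped with one comparison against a zero `Timedelta`. A sketch on toy data:

```python
import pandas as pd

# Toy frame with two deliberately bad (negative) durations
df = pd.DataFrame({
    'PMD_Reviewtime': pd.to_timedelta(['0 days', '-10 days', '5 days']),
    'SME_Reviewtime': pd.to_timedelta(['2 days', '1 days', '-1 days']),
})
cols = ['PMD_Reviewtime', 'SME_Reviewtime']
# Keep only rows where every elapsed time is non-negative
clean = df[(df[cols] >= pd.Timedelta(0)).all(axis=1)]
print(len(clean))  # only the first row survives
```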
#Create the dataframe subset
df_clean_2_bro = df_clean_2.loc[df_clean_2['Type'] == 'Brochure']
df_clean_2_bro.head()
#Erase rows with NaN values in the remaining date columns; .copy() avoids
#SettingWithCopyWarning, so the chained_assignment option no longer needs silencing
df_clean_6_bro = df_clean_2_bro.dropna(
    subset=['VID Design Work/Edits', 'PMD Content Review',
            'SME Content Review', 'Draft Review']).copy()
#Convert strings to datetime objects
df_clean_6_bro['VID Design Work/Edits'] = pd.to_datetime(df_clean_6_bro['VID Design Work/Edits'], format = '%m/%d/%Y')
df_clean_6_bro['PMD Content Review'] = pd.to_datetime(df_clean_6_bro['PMD Content Review'], format = '%m/%d/%Y')
df_clean_6_bro['SME Content Review'] = pd.to_datetime(df_clean_6_bro['SME Content Review'], format = '%m/%d/%Y')
df_clean_6_bro['Draft Review'] = pd.to_datetime(df_clean_6_bro['Draft Review'], format = '%m/%d/%Y')
df_clean_6_bro.head()
#Create the elapsed time columns
df_clean_6_bro['VID_Edittime'] = df_clean_6_bro['Submitted to CRC '] - df_clean_6_bro['VID Design Work/Edits']
df_clean_6_bro['PMD_Reviewtime'] = df_clean_6_bro['VID Design Work/Edits'] - df_clean_6_bro['PMD Content Review']
df_clean_6_bro['SME_Reviewtime'] = df_clean_6_bro['PMD Content Review'] - df_clean_6_bro['SME Content Review']
df_clean_6_bro['Draft_Reviewtime'] = df_clean_6_bro['SME Content Review'] - df_clean_6_bro['Draft Review']
#Good to go!
df_clean_6_bro
 | Type | Draft Review | SME Content Review | PMD Content Review | VID Design Work/Edits | Submitted to CRC | CRC Review Complete | CRCtime | VID_Edittime | PMD_Reviewtime | SME_Reviewtime | Draft_Reviewtime
---|---|---|---|---|---|---|---|---|---|---|---|---
4 | Brochure | 2017-05-09 | 2017-05-16 | 2017-06-06 | 2017-06-23 | 2017-08-04 | 2017-10-10 | 67 days | 42 days | 17 days | 21 days | 7 days |
7 | Brochure | 2016-11-23 | 2016-12-01 | 2016-12-02 | 2016-12-06 | 2017-01-30 | 2017-03-30 | 59 days | 55 days | 4 days | 1 days | 8 days |
9 | Brochure | 2016-12-07 | 2017-02-09 | 2017-02-10 | 2017-05-17 | 2017-06-06 | 2017-06-30 | 24 days | 20 days | 96 days | 1 days | 64 days |
10 | Brochure | 2016-12-09 | 2017-01-26 | 2017-01-27 | 2017-01-30 | 2017-02-02 | 2017-03-30 | 56 days | 3 days | 3 days | 1 days | 48 days |
11 | Brochure | 2016-09-06 | 2016-09-09 | 2016-09-09 | 2016-09-10 | 2016-11-29 | 2017-01-02 | 34 days | 80 days | 1 days | 0 days | 3 days |
13 | Brochure | 2017-03-01 | 2017-03-10 | 2017-03-13 | 2017-04-10 | 2017-06-05 | 2017-06-27 | 22 days | 56 days | 28 days | 3 days | 9 days |
df_clean_6_bro.describe()
 | CRCtime | VID_Edittime | PMD_Reviewtime | SME_Reviewtime | Draft_Reviewtime
---|---|---|---|---|---
count | 6 | 6 | 6 | 6 | 6 |
mean | 43 days 16:00:00 | 42 days 16:00:00 | 24 days 20:00:00 | 4 days 12:00:00 | 23 days 04:00:00 |
std | 19 days 09:32:32.152061 | 27 days 14:37:12.911653 | 36 days 08:56:33.395270 | 8 days 03:25:10.375190 | 26 days 00:15:41.341229 |
min | 22 days 00:00:00 | 3 days 00:00:00 | 1 days 00:00:00 | 0 days 00:00:00 | 3 days 00:00:00 |
25% | 26 days 12:00:00 | 25 days 12:00:00 | 3 days 06:00:00 | 1 days 00:00:00 | 7 days 06:00:00 |
50% | 45 days 00:00:00 | 48 days 12:00:00 | 10 days 12:00:00 | 1 days 00:00:00 | 8 days 12:00:00 |
75% | 58 days 06:00:00 | 55 days 18:00:00 | 25 days 06:00:00 | 2 days 12:00:00 | 38 days 06:00:00 |
max | 67 days 00:00:00 | 80 days 00:00:00 | 96 days 00:00:00 | 21 days 00:00:00 | 64 days 00:00:00 |
print(df_clean_6_bro.CRCtime.median())
print(df_clean_6_bro.VID_Edittime.median())
print(df_clean_6_bro.PMD_Reviewtime.median())
print(df_clean_6_bro.SME_Reviewtime.median())
print(df_clean_6_bro.Draft_Reviewtime.median())
45 days 00:00:00
48 days 12:00:00
10 days 12:00:00
1 days 00:00:00
8 days 12:00:00
import numpy as np
import matplotlib.pyplot as plt
plt.rcdefaults()
objects = ('CRC', 'VID', 'PMD Review', 'SME Review', 'Draft Review')
y_pos = np.arange(len(objects))
performance = [df_clean_6_bro.CRCtime.dt.days.median(),
               df_clean_6_bro.VID_Edittime.dt.days.median(),
               df_clean_6_bro.PMD_Reviewtime.dt.days.median(),
               df_clean_6_bro.SME_Reviewtime.dt.days.median(),
               df_clean_6_bro.Draft_Reviewtime.dt.days.median()]
plt.bar(y_pos, performance, align='center', alpha=0.5)
plt.xticks(y_pos, objects)
plt.ylabel('Days (Median)')
plt.title('VS (One-Health) Product Production Time (Brochure)')
for i, v in enumerate(performance):
plt.text(i-.15, v+1 , str(v), color='blue', fontweight='bold', va='center')
plt.show()
Now let's take a look at the Newsletters...
import numpy as np
import matplotlib.pyplot as plt
plt.rcdefaults()
objects = ('CRC', 'VID', 'PMD Review', 'SME Review', 'Draft Review')
y_pos = np.arange(len(objects))
performance = [df_clean_8_News.CRCtime.dt.days.median(),
               df_clean_8_News.VID_Edittime.dt.days.median(),
               df_clean_8_News.PMD_Reviewtime.dt.days.median(),
               df_clean_8_News.SME_Reviewtime.dt.days.median(),
               df_clean_8_News.Draft_Reviewtime.dt.days.median()]
plt.bar(y_pos, performance, align='center', alpha=0.5, color='green')
plt.xticks(y_pos, objects)
plt.ylabel('Days (Median)')
plt.title('VS (One-Health) Product Production Time (Newsletter)')
for i, v in enumerate(performance):
plt.text(i-.15, v+.18 , str(v), color='blue', fontweight='bold', va='center')
plt.show()
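For a direct comparison, the two charts can also be combined into one grouped bar chart. The figures below are the medians printed earlier; treat this as a sketch rather than part of the pipeline:

```python
import numpy as np
import matplotlib.pyplot as plt

stages = ('CRC', 'VID', 'PMD Review', 'SME Review', 'Draft Review')
# Median days copied from the median() output above
brochure = [45, 48.5, 10.5, 1, 8.5]
newsletter = [11, 8, 2, 1.5, 8]

x = np.arange(len(stages))
width = 0.35
plt.bar(x - width / 2, brochure, width, alpha=0.5, label='Brochure')
plt.bar(x + width / 2, newsletter, width, alpha=0.5, color='green', label='Newsletter')
plt.xticks(x, stages)
plt.ylabel('Days (Median)')
plt.title('VS (One-Health) Product Production Time by Stage')
plt.legend()
plt.show()
```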
With the SME, draft, and PMD reviews each taking roughly 75% less time than the VID and CRC stages, recommendations should:

- Include discussions, focus groups, and meetings to potentially streamline the CRC process, decreasing the time spent at certain approval levels
- Require submitters of VS One-Health products to have a clear visual outline of what their product should look like before VID work order submission

A clear and defined end state from the customer will help the VID designer arrive at a product that both the VID directorate and the customer agree upon, potentially decreasing production and processing time. Additionally, focus groups, surveys, and interviews should be conducted within VID to identify other potential bottlenecks in the production process.
Additionally, with our aforementioned FYGVE brochure program, a dedicated person within the One-Health Division should be responsible for the overall setup and design of the brochures. This would increase positive interaction with VID and decrease VID production time, potentially decreasing the overall production time significantly.
In conclusion, my analysis shows that VID production and CRC approval take up more than 50 percent of the total production time for both One-Health newsletters and brochures. I recommend that the One-Health Division open communication between VID and CRC to develop ideas for decreasing processing and approval times.