Potential additions to the Global Burden of Disease Study¶

The Global Burden of Disease (GBD) is a comprehensive look at the causes and extent of life years lost globally. This project began as a World Health Organization (WHO) initiative in 1990, and went through a major update in 2010. The Institute for Health Metrics and Evaluation (IHME) played a leading role in the most recent update, organizing over 1600 GBD collaborators from 16 countries.

One interesting tool that they built is the GBD Compare visualization for viewing and interacting with the data. This tool is especially helpful for comparing the scale of different public health problems, and how they relate to one another. For example, here is a re-creation of one of their visualizations using D3.js, showing the total Disability Adjusted Life Years (DALY) burden by year for the world population:

The rest of this post looks at a few other disease burdens that it might make sense to add to the GBD analysis. First, I think they should add the annualized burden of potential pandemic disease to the study. Second, it might make sense to increase the reference life expectancy so that years of life lost due to future increases in the life expectancy are incorporated into the estimates. I provide the computations and graphics to vizualize the scale of each one of these new burdens below.

In [349]:

%matplotlib inline

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib
import textwrap

matplotlib.style.use('ggplot')  #'ggplot' 'fivethirtyeight'  'seaborn-paper'
matplotlib.rcParams['figure.figsize'] = (11.0, 9.0)
#Reset paramaters:
#matplotlib.rcParams.update(matplotlib.rcParamsDefault)

pd.set_option('max_colwidth', 400)

# print plt.style.available

Import the Data¶

All of the data for this post can be accessed by visiting the vizualization and clicking the download button in the upper right corner. I obtained the life table that I use later from the Web Table 6 of the supplementary appendix: PDF. The CSVs are also available as a zipped file here.

The initial data shows global DALY burden by cause for a number of years between 1990 and 2013.

In [350]:

gbd_df = pd.read_csv('./GBD_global_data.csv')
gbd_df.head()

Out[350]:

	Location	Year	Age	Sex	Cause of death or injury	Measure	Value	Lower bound	Upper bound
0	Global	1990	All ages	Both	Forces of nature, war, and legal intervention	DALYs	13650843.806051	8198677.494192	23356495.474653
1	Global	1995	All ages	Both	Forces of nature, war, and legal intervention	DALYs	11243358.692292	6430940.583227	20036013.689167
2	Global	2000	All ages	Both	Forces of nature, war, and legal intervention	DALYs	13874813.144494	8648476.130667	23398335.472656
3	Global	2005	All ages	Both	Forces of nature, war, and legal intervention	DALYs	11121526.833311	6687655.473904	18862473.121176
4	Global	2010	All ages	Both	Forces of nature, war, and legal intervention	DALYs	18568961.216457	11665386.215099	33760218.249876

The Plotting Function¶

Below, I create a plotting function to re-create a view of the GBD Compare tool, showing the total DALY burden per year, separated into cause areas. Note that you can see a general trend downwards, and the red area of infectious disease is shrinking with time.

As the burden of infectious disease is declining, chronic diseases often associted with old age in high income countries are increasing. Note that this plot is showing absolute numbers, so the fact that the overall DALY burden is declining even as the population is growing is pretty encouraging.

In [351]:

colors = ["#006D2C", "#31A354","#74C476", "#BAE4B3", "#54278F", "#756BB1",
          "#9E9AC8", "#BCBDDC", "#DADAEB", "#08519C", "#3182BD",
          "#6BAED6", "#9ECAE1", "#C6DBEF", "#99000D", "#CB181D", 
          "#EF3B2C", "#FB6A4A", "#FC9272", "#FCBBA1", "#FEE0D2",
          '#ffff80','#ffffcc']

def stacked_plot(gbd_df, width, ylim, colors):

    fig, ax = plt.subplots(figsize=(17,10))  #(20,12)

    causes = gbd_df['Cause of death or injury'].drop_duplicates()

    # http://stackoverflow.com/questions/19060144
    # Keep track of bottom margin for each stack row/year
    margin_bottom = np.zeros(len(gbd_df['Year'].drop_duplicates()))

    for num, cause in enumerate(causes):
        values = list(gbd_df[gbd_df['Cause of death or injury'] == cause].loc[:, 'Value'])
        label = textwrap.fill(cause, 30)

        gbd_df[gbd_df['Cause of death or injury'] == cause].plot.bar(
                x='Year',y='Value', ax=ax, stacked=True, color=colors[num], label=label, 
                bottom = margin_bottom, width=width)

        margin_bottom += values

    #http://stackoverflow.com/questions/4700614/how-to-put-the-legend-out-of-the-plot
    #Shrink current axis by 20%
    box = ax.get_position()
    ax.set_position([box.x0, box.y0, box.width * 0.8, box.height])

    #Put a legend to the right of the current axis
    ax.legend(loc='center left', bbox_to_anchor=(1, 0.5))

    ax.set_ylabel('DALYs, Global (billions)')
    ax.set_ylim([0, ylim])

    plt.show()

stacked_plot(gbd_df, 0.85, 3.0e9, colors)

How would including pandemic disease risk change the picture?¶

I recently read a paper called The Neglected Dimension of Global Security: : A Framework to Counter Infectious Disease Crises [1], which looked at the risk of a global pandemic disease and called for a number of policy changes to increase our preparedness. It's pretty surprising how little emphasis we put on pandemic disease risk, given the potential health and economic costs. So I was interested in adding an annualized DALY [(years of life lost (YLL) only)] burden to the plot to get a feel for the relative scale of the problem.

First, I obtained an annual risk of pandemic outbreak from the paper [1], which they put at 3 percent per year (Appendix C). This is based on the 20th century rate, which had pandemic outbreaks in 1918, 1957 and 1968. The authors also make the point that the risk of a pandemic is increasing with time, so this 3 percent may be an underestimate. I obtained a mean age of death from a study that looked at the 2009 the A/H1N1 outbreak [2], and found an estimate from the 1918-20 spanish flu pandemic to use as the excess mortality figure [3].

As you can see in the table at the end, pandemic disease has a similar annual burden to that of Neglected tropical diseases and malaria, at 92 million DALY/year. This isn't as large as I thought it would be, but it's important to keep in mind that pandemics happen all at once, which would essentially double the global mortality figures in a single year [2]. Pandemics also cause massive social, economic, and political disruption resulting in other costs beyond human health. One estimate puts the annualized economic cost of a pandemic disease outbreak at $60 billion [1].

[1] "The Neglected Dimension of Global Security: A Framework to Counter Infectious Disease Crises." Commission on a Global Health Risk Framework for the Future. https://nam.edu/initiatives/global-health-risk-framework/

[2] "Preliminary Estimates of Mortality and Years of Life Lost Associated with the 2009 A/H1N1 Pandemic in the US and Comparison with Past Influenza Seasons." http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2843747/

[3] "Estimation of potential global pandemic influenza mortality on the basis of vital registry data from the 1918-20 pandemic: a quantitative analysis." http://www.ncbi.nlm.nih.gov/pubmed/17189032

In [352]:

# Pandemic disease annualized burden

#Annual probability, essentially based on 3 pandemics over the years 1900 to 2000 in
# 1918, 1957 and 1968 (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2843747/)
# Figure used in "The Neglected Dimension of Global Security." (www.nam.edu/GHRF)
annual_probability = 0.03  

# (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2843747/)
# Based on the study reported in August 2009 in [9], more than 85% of laboratory-confirmed 
# A/H1N1 deaths occurred in people under 60 years of age, with a mean age of deaths of 37 years 
# (Table 1). This is in marked contrast to seasonal influenza epidemics where 90% of deaths occur 
# in people over 65 years and the mean age of influenza-related deaths is estimated at 76 yrs.
mean_age = 37   
life_table_37 = 49.58 #Years remaining in life, from GBD life table below


# Excess mortality from a pandemic
# "Estimation of potential global pandemic influenza mortality on the basis of vital registry
# data from the 1918-20 pandemic: a quantitative analysis."
# http://www.ncbi.nlm.nih.gov/pubmed/17189032
# FINDINGS:
# Excess mortality data show that, even in 1918-20, population mortality varied over 30-fold across 
# countries. Per-head income explained a large fraction of this variation in mortality. Extrapolation 
# of 1918-20 mortality rates to the worldwide population of 2004 indicates that an estimated 62 million 
# people (10th-90th percentile range 51 million-81 million) would be killed by a similar influenza pandemic; 
# 96% (95% CI 95-98) of these deaths would occur in the developing world. If this mortality were concentrated 
# in a single year, it would increase global mortality by 114%.
excess_mortality = 62000000

#Use life table, don't incorporate potential years up to 100, etc.
pandemic_yll = life_table_37 * excess_mortality * annual_probability
#Annualized years of life lost due to pandemics (billions):  0.0922188

print 'Annualized years of life lost due to pandemics (billions): ', pandemic_yll/ 1e9


gbd2013_df = gbd_df[gbd_df['Year'] == 2013]
pandemic_df = gbd2013_df.copy()
pand_dict = [{'Cause of death or injury': 'Pandemic disease', 'Year': 2013, 'Value': pandemic_yll}]
pandemic_df = pandemic_df.append(pand_dict, ignore_index=True)  

stacked_plot(pandemic_df, 0.1, 3.0e9, colors)

Annualized years of life lost due to pandemics (billions):  0.0922188

What if we use a longer life expectancy as the reference?¶

In order to calculate DALYs, you need two numbers: the years of life lost (YLL) and the years lived with disability (YLD). In order to calculate YLL for an individual, you need to know their age at death and their life expectancy at that age.

But which life expectancy should you use? In past studies, the life expectancy of a male or female within the individual's country at that age was used. In my view, this most recent study is an ethical improvement because it uses Japanese women, who have the longest life expectancy at 86, as the standard to compare everyone against [4]. By doing this, the authors are saying that every person should have the longest possible life expectancy regardless of their location or sex.

But do Japanese women really have the longest life expectancy? As a group with a population that is over 5 million, they do. But we certainly know that it is biologically possible to live much longer than this. Why not set the upper limit at what is currently biologically possible?

One reason might be that a substantial portion of longevity could be due to genetic predisposition. Some estimates put the genetic portion at 20-30% but this might be complicated by epigenetics and gene-environment interactions and might increase in cases of extreme longevity [5]. But at the end of the day, genetic predisposition results in some type of gene expression in the body that we could mimic if we had a better understanding of the aging process. Also, note that the study authors are already ignoring genetic predisposition by using Japanese women as the comparison group.

Another argument against the extended life expectancy is that people are expressing their preference to live a shorter life by choosing a less healthy lifestyle (this often comes up when people express a love for bacon). I think this is a stronger criticism, but it's important to note that people aren't always rational when it comes to long term decision making. Also, plenty of other disease burdens that are due to conscious decisions (e.g. smoking) are included in the DALY estimates.

Anyways, the purpose of this article isn't to hash out every ethical consideration -- I just want to get a sense of the scale of the potential disease burden. So how would using a life expectancy of, say, 100 change the analysis? Below I use two methods, one a simple estimate, and a second more in-depth estimate using life tables and global deaths by age. The end result is that using a life expectancy of 100 would result in an extra disease burden of around 500 million YLL due to premature aging each year.

[4] "Global and regional mortality from 235 causes of death for 20 age groups in 1990 and 2010: a systematic analysis for the Global Burden of Disease Study 2010." The Lancet. http://www.thelancet.com/journals/lancet/article/PIIS0140-6736(12)61728-0/abstract

[5] "Genetic influence on human lifespan and longevity." Human Genetics. https://www.ncbi.nlm.nih.gov/pubmed/16463022/

In [353]:

# Aging annualized burden, simple estimate
annual_death_rate = 765.73/100000 #GBD 2013
potential_expectancy = 100
global_population = 7125000000  #2013, World Bank
gbd_expectancy = 86

#Annual potential years of life lost, estimate
aging_yll_est = (potential_expectancy - gbd_expectancy) * annual_death_rate * global_population

print 'Years lost due to premature aging, simple estimate (billions): ', aging_yll_est / 1e9
#Note it ignores any YLDs
#Also note that it doesn't take into account life expectancy at age of death.  
#Could be a large overestimate? 

#So to get a better estimate, get deaths by age (by sex?) from gbd data
#and get life table, and calculate 120 - lifeexp_age_fromlifetable * #people at that age annually

Years lost due to premature aging, simple estimate (billions):  0.763815675

Life Table Estimate¶

The second approach accounts for the fact that after you've lived through younger age cohorts, your current life expectancy actually exceeds your life expectancy at birth. This is why someone at age 105 in the life_table below can still expect to live 1.63 more years even if this exceeds their life expectancy at birth. Of the people that reach that age, the average length of life afterwards is 1.63 years. Ok, so how do we take that into account?

The GBD already used the age in the life table plus the remaining years of life when they calculated YLL [6]. So I need to subtract this sum from the new upper limit of 100 years for each age cohort, and multiply that result times the number of deaths in each age cohort in 2013. The deaths by age group data from IHME's site comes grouped by five year increments, so I use the mean life table value for the five year cohorts to do the analysis.

[6] "Comprehensive Systematic Analysis of Global Epidemiology: Definitions, Methods, Simplification of DALYs, and Comparative Results from the Global Burden of Disease 2010 Study." Web Table 6: Single year standard lifetable. http://www.thelancet.com/cms/attachment/2017336178/2037711222/mmc1.pdf

In [354]:

# Aging annualized burden, life table estimate

# Comprehensive Systematic Analysis of Global Epidemiology: Definitions, Methods, 
# Simplification of DALYs, and Comparative Results from the Global Burden of Disease 2010 Study
# Web Table 6: Single year standard lifetable
life_table = {0: 86.02, 1: 85.21, 2: 84.22, 3: 83.23, 4: 82.24, 5: 81.25, 6: 80.25, 7: 79.26, 
             8: 78.26, 9: 77.27, 10: 76.27, 11: 75.28, 12: 74.28, 13: 73.29, 14: 72.29, 15: 71.29, 
             16: 70.3, 17: 69.32, 18: 68.33, 19: 67.34, 20: 66.35, 21: 65.36, 22: 64.37, 23: 63.38, 
             24: 62.39, 25: 61.4, 26: 60.41, 27: 59.43, 28: 58.44, 29: 57.45, 30: 56.46, 31: 55.48, 
             32: 54.49, 33: 53.5, 34: 52.52, 35: 51.53, 36: 50.56, 37: 49.58, 38: 48.6, 39: 47.62, 40: 
             46.64, 41: 45.67, 42: 44.71, 43: 43.74, 44: 42.77, 45: 41.8, 46: 40.85, 47: 39.9, 48: 38.95, 
             49: 38.0, 50: 37.05, 51: 36.12, 52: 35.19, 53: 34.25, 54: 33.32, 55: 32.38, 56: 31.47, 
             57: 30.55, 58: 29.64, 59: 28.73, 60: 27.81, 61: 26.91, 62: 26.0, 63: 25.1, 64: 24.2, 65: 23.29, 
             66: 22.42, 67: 21.55, 68: 20.68, 69: 19.8, 70: 18.93, 71: 18.1, 72: 17.28, 73: 16.45, 74: 15.62, 
             75: 14.8, 76: 14.04, 77: 13.27, 78: 12.51, 79: 11.75, 80: 10.99, 81: 10.32, 82: 9.65, 83: 8.98, 
             84: 8.31, 85: 7.64, 86: 7.12, 87: 6.61, 88: 6.09, 89: 5.57, 90: 5.05, 91: 4.7, 92: 4.35, 93: 4.0, 
             94: 3.66, 95: 3.31, 96: 3.09, 97: 2.88, 98: 2.66, 99: 2.44, 100: 2.23, 101: 2.11, 102: 1.99,
             103: 1.87, 104: 1.75, 105: 1.63}

#The mean of life_table for each year range
life_table_5yr = {'60-64': 26.0, '25-29': 59.43, '50-54': 35.19, '90-94': 4.35, '100-104': 1.99, 
                  '75-79': 13.27, '10-14': 74.28, '95-99': 2.88, '15-19': 69.32, '20-24': 64.37, 
                  '1-4': 83.725, '65-69': 21.55, '55-59': 30.55, '40-44': 44.71, '45-49': 39.9, 
                  '30-34': 54.49, '35-39': 49.58, '5-9': 79.26, '70-74': 17.28, '0-1': 86.02,
                  '80-84': 9.65, '85-89': 6.61, '80-105': 4.96}

years = life_table_5yr.keys()
values= life_table_5yr.values()
life_df = pd.DataFrame({'Age': years, 'avg_lost_years': values })


deaths_df = pd.read_csv('./gbd_deaths_age_2013.csv')
deaths_df = deaths_df[['Age', 'Value']]
deaths_df.replace([' years', ' days', '\+'], ['','','-105'], regex=True, inplace=True)
deaths_df = deaths_df.append({'Age': '0-1', 'Value': np.sum(deaths_df.ix[0:2,'Value'])}, ignore_index=True)
deaths_df.drop(deaths_df.index[[0,1,2]], inplace=True)
deaths_df['lower_age'], deaths_df['upper_age'] = deaths_df['Age'].str.split('-').str
deaths_df[['lower_age','upper_age']] = deaths_df[['lower_age','upper_age']].apply(pd.to_numeric)
deaths_df['avg_age'] = (deaths_df['upper_age'] + deaths_df['lower_age']) / 2
deaths_df.rename(columns={'Value':'num_deaths'}, inplace=True)

deaths_df = deaths_df.merge(life_df, on='Age')
deaths_df['age_cohort_lifeexp'] = deaths_df['avg_age'] + deaths_df['avg_lost_years']


# Potential_expectancy is from simple estimate:
deaths_df['aging_yll'] = (potential_expectancy - deaths_df['age_cohort_lifeexp']) * deaths_df['num_deaths']

aging_yll = np.sum(deaths_df['aging_yll'])  

# Add on additional potential years from pandemics, based on new potential expectancy:
# No pandemics: 0.527058402523, with: 0.552019602523
aging_yll += (potential_expectancy - (mean_age + life_table_37)) * excess_mortality * annual_probability

print 'Years lost due to premature aging, life table estimate (billions): ', aging_yll / 1e9

deaths_df.sort_values(by='lower_age')     

Years lost due to premature aging, life table estimate (billions):  0.552019602523

Out[354]:

	Age	num_deaths	lower_age	upper_age	avg_age	avg_lost_years	age_cohort_lifeexp	aging_yll
17	0-1	4463724.165644	0	1	0.5	86.020	86.520	60171001.752885
0	1-4	1816195.614709	1	4	2.5	83.725	86.225	25018094.592611
1	5-9	476742.274921	5	9	7.0	79.260	86.260	6550438.857416
2	10-14	365474.428165	10	14	12.0	74.280	86.280	5014309.154428
3	15-19	600613.932945	15	19	17.0	69.320	86.320	8216398.602691
4	20-24	900540.161535	20	24	22.0	64.370	86.370	12274362.401721
5	25-29	1075687.420244	25	29	27.0	59.430	86.430	14597078.292715
6	30-34	1175301.526232	30	34	32.0	54.490	86.490	15878323.619396
7	35-39	1329198.871632	35	39	37.0	49.580	86.580	17837848.857304
8	40-44	1571115.297667	40	44	42.0	44.710	86.710	20880122.305997
9	45-49	1992295.351683	45	49	47.0	39.900	86.900	26099069.107050
10	50-54	2527296.211479	50	54	52.0	35.190	87.190	32374664.469052
11	55-59	3186138.279997	55	59	57.0	30.550	87.550	39667421.585968
12	60-64	4097604.891095	60	64	62.0	26.000	88.000	49171258.693144
13	65-69	4174311.206536	65	69	67.0	21.550	88.550	47795863.314839
14	70-74	5155833.993336	70	74	72.0	17.280	89.280	55270540.408560
15	75-79	5501272.621034	75	79	77.0	13.270	90.270	53527382.602658
16	80-105	14454418.859887	80	105	92.5	4.960	97.460	36714223.904113

What's the relative scale?¶

Finally, here is an updated vizualization and table showing the added YLL due to premature aging. The premature aging column turns out to be quite large. In reality, this burden would be split up among all the different causes of death, but it's interesting to see it as a stand-alone category.

The final table below shows that the Pandemic disease burden is in the middle, with a similar value to that of Neglected tropical diseases and malaria. The premature aging category has the largest burden of any single category, although it really should be split up among each of the diseases.

In [355]:

#colors_update = colors[:]
#colors_update.extend(['#ffff80','#ffffcc'])  #['#ffff80','#ffffcc']  [ '#A9A9A9', '#BFBFBF']
        
dicts = [{'Cause of death or injury': 'Premature aging', 'Year': 2013, 'Value': aging_yll},
         {'Cause of death or injury': 'Pandemic disease', 'Year': 2013, 'Value': pandemic_yll}]
  
gbd2013add_df = gbd2013_df.copy()
gbd2013add_df = gbd2013add_df.append(dicts, ignore_index=True)  
        
stacked_plot(gbd2013add_df, 0.1, 3.5e9, colors)

gbd2013add_df.sort_values(by='Value')
        

Out[355]:

	Location	Year	Age	Sex	Cause of death or injury	Measure	Value	Lower bound	Upper bound
0	Global	2013	All ages	Both	Forces of nature, war, and legal intervention	DALYs	6.113613e+06	3.504764e+06	1.106874e+07
17	Global	2013	All ages	Both	Maternal disorders	DALYs	1.802781e+07	1.605184e+07	1.998946e+07
14	Global	2013	All ages	Both	Other communicable, maternal, neonatal, and nutritional diseases	DALYs	2.711404e+07	2.168406e+07	3.397773e+07
10	Global	2013	All ages	Both	Cirrhosis	DALYs	3.685807e+07	3.505394e+07	3.902250e+07
9	Global	2013	All ages	Both	Digestive diseases	DALYs	3.734117e+07	3.367044e+07	4.145244e+07
1	Global	2013	All ages	Both	Self-harm and interpersonal violence	DALYs	5.657462e+07	4.867773e+07	6.325653e+07
15	Global	2013	All ages	Both	Nutritional deficiencies	DALYs	7.483442e+07	5.940201e+07	9.408409e+07
3	Global	2013	All ages	Both	Transport injuries	DALYs	7.895289e+07	7.212276e+07	8.511561e+07
8	Global	2013	All ages	Both	Neurological disorders	DALYs	8.404802e+07	6.569416e+07	1.056925e+08
18	Global	2013	All ages	Both	Neglected tropical diseases and malaria	DALYs	9.067684e+07	7.574893e+07	1.077376e+08
22	NaN	2013	NaN	NaN	Pandemic disease	NaN	9.221880e+07	NaN	NaN
2	Global	2013	All ages	Both	Unintentional injuries	DALYs	1.059413e+08	9.699608e+07	1.172652e+08
11	Global	2013	All ages	Both	Chronic respiratory diseases	DALYs	1.127107e+08	9.887194e+07	1.281478e+08
20	Global	2013	All ages	Both	HIV/AIDS and tuberculosis	DALYs	1.191796e+08	1.124977e+08	1.275849e+08
6	Global	2013	All ages	Both	Diabetes, urogenital, blood, and endocrine diseases	DALYs	1.416209e+08	1.187134e+08	1.681583e+08
5	Global	2013	All ages	Both	Musculoskeletal disorders	DALYs	1.494357e+08	1.068885e+08	1.975651e+08
4	Global	2013	All ages	Both	Other non-communicable diseases	DALYs	1.709479e+08	1.309229e+08	2.234843e+08
7	Global	2013	All ages	Both	Mental and substance use disorders	DALYs	1.731774e+08	1.274265e+08	2.217341e+08
16	Global	2013	All ages	Both	Neonatal disorders	DALYs	1.896010e+08	1.790241e+08	2.000440e+08
13	Global	2013	All ages	Both	Neoplasms	DALYs	1.970935e+08	1.892370e+08	2.062585e+08
19	Global	2013	All ages	Both	Diarrhea, lower respiratory, and other common infectious diseases	DALYs	2.498551e+08	2.312221e+08	2.696253e+08
12	Global	2013	All ages	Both	Cardiovascular diseases	DALYs	3.297056e+08	3.111888e+08	3.482062e+08
21	NaN	2013	NaN	NaN	Premature aging	NaN	5.520196e+08	NaN	NaN

In [ ]: