About the project:
This project visualizes how happiness changed at the country level over the last ten years. We find which countries are the happiest and how their happiness changed over time. The notebook ends with an animated bubble plot that displays these changes.
Dataset:
This dataset contains happiness data for 150 countries, collected from the World Happiness Report and from Gapminder.
I researched data sources, combined the different datasets, and cleaned the result before the final visualization.
The World Happiness Report is a landmark survey of the state of global happiness. The first report was published in 2012, the second in 2013, the third in 2015, and the fourth in the 2016 Update. The World Happiness Report 2017, which ranks 155 countries by their happiness levels, was released at the United Nations at an event celebrating International Day of Happiness on March 20th. The report continues to gain global recognition as governments, organizations and civil society increasingly use happiness indicators to inform their policy-making decisions. Leading experts across fields – economics, psychology, survey analysis, national statistics, health, public policy and more – describe how measurements of well-being can be used effectively to assess the progress of nations. The reports review the state of happiness in the world today and show how the new science of happiness explains personal and national variations in happiness.
The happiness scores and rankings use data from the Gallup World Poll. The scores are based on answers to the main life evaluation question asked in the poll. This question, known as the Cantril ladder, asks respondents to think of a ladder with the best possible life for them being a 10 and the worst possible life being a 0 and to rate their own current lives on that scale. The scores are from nationally representative samples for the years 2013-2016 and use the Gallup weights to make the estimates representative. The columns following the happiness score estimate the extent to which each of six factors – economic production, social support, life expectancy, freedom, absence of corruption, and generosity – contribute to making life evaluations higher in each country than they are in Dystopia, a hypothetical country that has values equal to the world’s lowest national averages for each of the six factors. They have no impact on the total score reported for each country, but they do explain why some countries rank higher than others.
Inspiration
What countries or regions rank the highest in overall happiness and each of the six factors contributing to happiness? How did country ranks or scores change between the 2015 and 2016 as well as the 2016 and 2017 reports? Did any country experience a significant increase or decrease in happiness?
What is Dystopia?
Dystopia is an imaginary country that has the world’s least-happy people. The purpose in establishing Dystopia is to have a benchmark against which all countries can be favorably compared (no country performs more poorly than Dystopia) in terms of each of the six key variables, thus allowing each sub-bar to be of positive width. The lowest scores observed for the six key variables, therefore, characterize Dystopia. Since life would be very unpleasant in a country with the world’s lowest incomes, lowest life expectancy, lowest generosity, most corruption, least freedom and least social support, it is referred to as “Dystopia,” in contrast to Utopia.
What are the residuals?
The residuals, or unexplained components, differ for each country, reflecting the extent to which the six variables either over- or under-explain average 2014-2016 life evaluations. These residuals have an average value of approximately zero over the whole set of countries. Figure 2.2 shows the average residual for each country when the equation in Table 2.1 is applied to average 2014-2016 data for the six variables in that country. We combine these residuals with the estimate for life evaluations in Dystopia so that the combined bar will always have positive values. As can be seen in Figure 2.2, although some life evaluation residuals are quite large, occasionally exceeding one point on the scale from 0 to 10, they are always much smaller than the calculated value in Dystopia, where the average life is rated at 1.85 on the 0 to 10 scale.
What do the columns succeeding the Happiness Score (like Family, Generosity, etc.) describe?
The columns that follow (GDP per Capita, Family, Life Expectancy, Freedom, Generosity, Trust Government Corruption) describe the extent to which each factor contributes to the happiness evaluation in each country. The Dystopia Residual metric is the Dystopia Happiness Score (1.85) plus the residual, or unexplained, value for each country, as stated in the previous answer.
If you add all these factors up, you get the happiness score, so using them as features to predict the Happiness Score may be unreliable: a model would only recover the sum.
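As a quick check of the claim that the factor columns add up to the score, the sketch below sums the seven component values from the Denmark row of the `2015-2017.xlsx` sheet displayed later in this notebook. The numbers are typed in by hand for illustration; they are not read from the file.

```python
# Component values copied by hand from the Denmark row shown later in this notebook
denmark = {
    'Economy (GDP per Capita)': 1.44178,
    'Family': 1.16374,
    'Health (Life Expectancy)': 0.79504,
    'Freedom': 0.57941,
    'Trust (Government Corruption)': 0.44453,
    'Generosity': 0.36171,
    'Dystopia Residual': 2.73939,
}
reported_score = 7.526  # 'Happiness Score' in the same row

# Sum the six factors plus the Dystopia Residual and compare with the reported score
component_sum = sum(denmark.values())
print(round(component_sum, 3), reported_score)
```

For this row the sum agrees with the reported score to three decimal places, which is why the factor columns cannot serve as independent predictors of the score.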
Building the final bubble plot requires a long data-cleaning process. This Jupyter Notebook shows every step and method used in the cleaning, but readers who are more interested in the final result can skip this part and go directly to the last part, Data Visualization.
# import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
# chart styling
sns.set_style('whitegrid')
# creates an empty 10x6 figure in this cell only; it does not set the size of later figures
plt.figure(figsize = (10,6))
sns.despine(left=True, bottom=True)
# suppress all warnings
import warnings
warnings.filterwarnings('ignore')
# show up to 150 columns so wide tables are not truncated
pd.options.display.max_columns = 150
<Figure size 720x432 with 0 Axes>
happy_2019 = pd.read_excel('Chapter2OnlineData.xls')
happy_2019.head()
Country name | Year | Life Ladder | Log GDP per capita | Social support | Healthy life expectancy at birth | Freedom to make life choices | Generosity | Perceptions of corruption | Positive affect | Negative affect | Confidence in national government | Democratic Quality | Delivery Quality | Standard deviation of ladder by country-year | Standard deviation/Mean of ladder by country-year | GINI index (World Bank estimate) | GINI index (World Bank estimate), average 2000-16 | gini of household income reported in Gallup, by wp5-year | Most people can be trusted, Gallup | Most people can be trusted, WVS round 1981-1984 | Most people can be trusted, WVS round 1989-1993 | Most people can be trusted, WVS round 1994-1998 | Most people can be trusted, WVS round 1999-2004 | Most people can be trusted, WVS round 2005-2009 | Most people can be trusted, WVS round 2010-2014 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Afghanistan | 2008 | 3.723590 | 7.168690 | 0.450662 | 50.799999 | 0.718114 | 0.177889 | 0.881686 | 0.517637 | 0.258195 | 0.612072 | -1.929690 | -1.655084 | 1.774662 | 0.476600 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1 | Afghanistan | 2009 | 4.401778 | 7.333790 | 0.552308 | 51.200001 | 0.678896 | 0.200178 | 0.850035 | 0.583926 | 0.237092 | 0.611545 | -2.044093 | -1.635025 | 1.722688 | 0.391362 | NaN | NaN | 0.441906 | 0.286315 | NaN | NaN | NaN | NaN | NaN | NaN |
2 | Afghanistan | 2010 | 4.758381 | 7.386629 | 0.539075 | 51.599998 | 0.600127 | 0.134353 | 0.706766 | 0.618265 | 0.275324 | 0.299357 | -1.991810 | -1.617176 | 1.878622 | 0.394803 | NaN | NaN | 0.327318 | 0.275833 | NaN | NaN | NaN | NaN | NaN | NaN |
3 | Afghanistan | 2011 | 3.831719 | 7.415019 | 0.521104 | 51.919998 | 0.495901 | 0.172137 | 0.731109 | 0.611387 | 0.267175 | 0.307386 | -1.919018 | -1.616221 | 1.785360 | 0.465942 | NaN | NaN | 0.336764 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
4 | Afghanistan | 2012 | 3.782938 | 7.517126 | 0.520637 | 52.240002 | 0.530935 | 0.244273 | 0.775620 | 0.710385 | 0.267919 | 0.435440 | -1.842996 | -1.404078 | 1.798283 | 0.475367 | NaN | NaN | 0.344540 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
happy_2019['Country name'].nunique()
165
happy_2019.rename({'Country name':'Country'}, axis=1, inplace=True)
happy_2019.columns
Index(['Country', 'Year', 'Life Ladder', 'Log GDP per capita', 'Social support', 'Healthy life expectancy at birth', 'Freedom to make life choices', 'Generosity', 'Perceptions of corruption', 'Positive affect', 'Negative affect', 'Confidence in national government', 'Democratic Quality', 'Delivery Quality', 'Standard deviation of ladder by country-year', 'Standard deviation/Mean of ladder by country-year', 'GINI index (World Bank estimate)', 'GINI index (World Bank estimate), average 2000-16', 'gini of household income reported in Gallup, by wp5-year', 'Most people can be trusted, Gallup', 'Most people can be trusted, WVS round 1981-1984', 'Most people can be trusted, WVS round 1989-1993', 'Most people can be trusted, WVS round 1994-1998', 'Most people can be trusted, WVS round 1999-2004', 'Most people can be trusted, WVS round 2005-2009', 'Most people can be trusted, WVS round 2010-2014'], dtype='object')
drop_columns=['Positive affect', 'Negative affect',
'Confidence in national government', 'Democratic Quality',
'Delivery Quality', 'Standard deviation of ladder by country-year',
'Standard deviation/Mean of ladder by country-year',
'GINI index (World Bank estimate)',
'GINI index (World Bank estimate), average 2000-16',
'gini of household income reported in Gallup, by wp5-year',
'Most people can be trusted, Gallup',
'Most people can be trusted, WVS round 1981-1984',
'Most people can be trusted, WVS round 1989-1993',
'Most people can be trusted, WVS round 1994-1998',
'Most people can be trusted, WVS round 1999-2004',
'Most people can be trusted, WVS round 2005-2009',
'Most people can be trusted, WVS round 2010-2014']
# columns= already names the axis, so axis=1 is not needed
happy_2019_clean = happy_2019.drop(columns=drop_columns)
happy_2019_clean.head()
Country | Year | Life Ladder | Log GDP per capita | Social support | Healthy life expectancy at birth | Freedom to make life choices | Generosity | Perceptions of corruption | |
---|---|---|---|---|---|---|---|---|---|
0 | Afghanistan | 2008 | 3.723590 | 7.168690 | 0.450662 | 50.799999 | 0.718114 | 0.177889 | 0.881686 |
1 | Afghanistan | 2009 | 4.401778 | 7.333790 | 0.552308 | 51.200001 | 0.678896 | 0.200178 | 0.850035 |
2 | Afghanistan | 2010 | 4.758381 | 7.386629 | 0.539075 | 51.599998 | 0.600127 | 0.134353 | 0.706766 |
3 | Afghanistan | 2011 | 3.831719 | 7.415019 | 0.521104 | 51.919998 | 0.495901 | 0.172137 | 0.731109 |
4 | Afghanistan | 2012 | 3.782938 | 7.517126 | 0.520637 | 52.240002 | 0.530935 | 0.244273 | 0.775620 |
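When far more columns are dropped than kept, a keep-list can be shorter and easier to read than a drop-list. The sketch below shows the idea on a made-up frame; `toy`, `keep_columns`, and the values are hypothetical stand-ins, not the notebook's data.

```python
import pandas as pd

# Toy frame standing in for happy_2019; the values are made up for illustration
toy = pd.DataFrame({
    'Country': ['A', 'B'],
    'Year': [2008, 2008],
    'Life Ladder': [3.7, 5.1],
    'Positive affect': [0.5, 0.6],
    'Negative affect': [0.3, 0.2],
})

keep_columns = ['Country', 'Year', 'Life Ladder']  # hypothetical keep-list
toy_clean = toy[keep_columns].copy()               # .copy() gives an independent frame
print(toy_clean.columns.tolist())
```

Either approach gives the same result; the keep-list has the side benefit of failing loudly if an expected column is missing from the source file.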
happy_2019_clean.groupby('Year')['Country'].count()
Year
2005     27
2006     89
2007    102
2008    110
2009    114
2010    124
2011    146
2012    142
2013    137
2014    145
2015    143
2016    142
2017    147
2018    136
Name: Country, dtype: int64
happy_2019_clean['Country'].nunique()
165
# sheet_name=1 selects the second sheet of the workbook (sheet positions are zero-indexed)
table_2015 = pd.read_excel('2015-2017.xlsx', sheet_name=1)
table_2015.head(1)
Country | Region | Happiness Rank | Happiness Score | Lower Confidence Interval | Upper Confidence Interval | Economy (GDP per Capita) | Family | Health (Life Expectancy) | Freedom | Trust (Government Corruption) | Generosity | Dystopia Residual | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Denmark | Western Europe | 1 | 7.526 | 7.46 | 7.592 | 1.44178 | 1.16374 | 0.79504 | 0.57941 | 0.44453 | 0.36171 | 2.73939 |
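# .copy() makes an independent frame, so the Region edits below do not act on a slice of table_2015
Country_Region = table_2015[['Country','Region']].copy()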
print(Country_Region['Country'].nunique())
Country_Region.head()
157
Country | Region | |
---|---|---|
0 | Denmark | Western Europe |
1 | Switzerland | Western Europe |
2 | Iceland | Western Europe |
3 | Norway | Western Europe |
4 | Finland | Western Europe |
Country_Region['Region'].value_counts()
Sub-Saharan Africa                 38
Central and Eastern Europe         29
Latin America and Caribbean        24
Western Europe                     21
Middle East and Northern Africa    19
Southeastern Asia                   9
Southern Asia                       7
Eastern Asia                        6
North America                       2
Australia and New Zealand           2
Name: Region, dtype: int64
Country_Region.loc[Country_Region['Region'].str.contains('Asia'), 'Region']='Asia'
Country_Region['Region'].value_counts()
Sub-Saharan Africa                 38
Central and Eastern Europe         29
Latin America and Caribbean        24
Asia                               22
Western Europe                     21
Middle East and Northern Africa    19
North America                       2
Australia and New Zealand           2
Name: Region, dtype: int64
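# assign the result back instead of calling replace(inplace=True) on a selected column
Country_Region['Region'] = Country_Region['Region'].replace({'Australia and New Zealand':'North America'})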
Country_Region['Region'].value_counts()
Sub-Saharan Africa                 38
Central and Eastern Europe         29
Latin America and Caribbean        24
Asia                               22
Western Europe                     21
Middle East and Northern Africa    19
North America                       4
Name: Region, dtype: int64
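The two relabelling steps above can also be written as a single mapping, which keeps every regrouping decision in one place. This is a minimal sketch on a hand-made frame; `regions` and `region_map` are hypothetical names, and the six rows are illustrative only.

```python
import pandas as pd

# Small illustrative frame; the notebook's Country_Region has 157 rows
regions = pd.DataFrame({
    'Country': ['Japan', 'India', 'Thailand', 'Australia', 'Canada', 'Denmark'],
    'Region': ['Eastern Asia', 'Southern Asia', 'Southeastern Asia',
               'Australia and New Zealand', 'North America', 'Western Europe'],
})

region_map = {
    'Eastern Asia': 'Asia',
    'Southern Asia': 'Asia',
    'Southeastern Asia': 'Asia',
    'Australia and New Zealand': 'North America',  # same grouping as the notebook
}
# replace() leaves labels that are not keys of the mapping unchanged
regions['Region'] = regions['Region'].replace(region_map)
print(regions['Region'].value_counts())
```

An explicit mapping is slightly more verbose than `str.contains('Asia')`, but it cannot accidentally catch a label that merely contains the substring.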
Country_Region.head()
Country | Region | |
---|---|---|
0 | Denmark | Western Europe |
1 | Switzerland | Western Europe |
2 | Iceland | Western Europe |
3 | Norway | Western Europe |
4 | Finland | Western Europe |
happy_2019_clean.shape
(1704, 9)
happy_2019_clean_region = happy_2019_clean.merge(Country_Region, on='Country',how='outer')
print(happy_2019_clean_region.shape)
happy_2019_clean_region.head()
(1708, 10)
Country | Year | Life Ladder | Log GDP per capita | Social support | Healthy life expectancy at birth | Freedom to make life choices | Generosity | Perceptions of corruption | Region | |
---|---|---|---|---|---|---|---|---|---|---|
0 | Afghanistan | 2008.0 | 3.723590 | 7.168690 | 0.450662 | 50.799999 | 0.718114 | 0.177889 | 0.881686 | Asia |
1 | Afghanistan | 2009.0 | 4.401778 | 7.333790 | 0.552308 | 51.200001 | 0.678896 | 0.200178 | 0.850035 | Asia |
2 | Afghanistan | 2010.0 | 4.758381 | 7.386629 | 0.539075 | 51.599998 | 0.600127 | 0.134353 | 0.706766 | Asia |
3 | Afghanistan | 2011.0 | 3.831719 | 7.415019 | 0.521104 | 51.919998 | 0.495901 | 0.172137 | 0.731109 | Asia |
4 | Afghanistan | 2012.0 | 3.782938 | 7.517126 | 0.520637 | 52.240002 | 0.530935 | 0.244273 | 0.775620 | Asia |
happy_2019_clean_region.isna().sum()
Country                               0
Year                                  4
Life Ladder                           4
Log GDP per capita                   32
Social support                       17
Healthy life expectancy at birth     32
Freedom to make life choices         33
Generosity                           86
Perceptions of corruption           100
Region                               50
dtype: int64
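The outer merge grew the table from 1704 to 1708 rows and left 4 missing `Year` values and 50 missing `Region` values, which means some country names appear in only one of the two sources. One way to list them is the `indicator=True` option of `merge`, sketched here on made-up frames (the country names and values are invented for illustration).

```python
import pandas as pd

# Made-up stand-ins for happy_2019_clean and Country_Region
happy = pd.DataFrame({'Country': ['Aland', 'Aland', 'Borduria'],
                      'Year': [2017, 2018, 2018],
                      'Life Ladder': [5.0, 5.2, 4.1]})
region = pd.DataFrame({'Country': ['Aland', 'Syldavia'],
                       'Region': ['Western Europe', 'Central and Eastern Europe']})

# indicator=True adds a '_merge' column: 'both', 'left_only' or 'right_only'
merged = happy.merge(region, on='Country', how='outer', indicator=True)
only_happy = merged.loc[merged['_merge'] == 'left_only', 'Country'].unique()
only_region = merged.loc[merged['_merge'] == 'right_only', 'Country'].unique()
print('no region:', list(only_happy))
print('no happiness rows:', list(only_region))
```

Rows marked `left_only` are the ones that end up with a missing `Region`; rows marked `right_only` are the extra rows with a missing `Year`.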
happy_2019_clean_region.to_csv('happy_2019_clean_region.csv',index=False)
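# note: this loads an .xlsx file rather than the CSV written above; its missing-value counts
# differ from the CSV's (Region and Year have none), which suggests the file was edited outside the notebook
happy_report=pd.read_excel('happy_2019_clean_region.xlsx')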
happy_report.isna().sum()
Country                              0
Year                                 0
Life Ladder                          0
Log GDP per capita                  28
Social support                      12
Healthy life expectancy at birth    28
Freedom to make life choices        29
Generosity                          82
Perceptions of corruption           96
Region                               0
dtype: int64
happy_report['Country'].nunique()
165
happy_report.groupby('Year')['Country'].count()
Year
2005     27
2006     89
2007    102
2008    110
2009    114
2010    124
2011    146
2012    142
2013    137
2014    145
2015    143
2016    142
2017    147
2018    136
Name: Country, dtype: int64
population = pd.read_csv('population_total.csv')
population.head(1)
country | 1800 | 1801 | 1802 | 1803 | 1804 | 1805 | 1806 | 1807 | 1808 | 1809 | 1810 | 1811 | 1812 | 1813 | 1814 | 1815 | 1816 | 1817 | 1818 | 1819 | 1820 | 1821 | 1822 | 1823 | 1824 | 1825 | 1826 | 1827 | 1828 | 1829 | 1830 | 1831 | 1832 | 1833 | 1834 | 1835 | 1836 | 1837 | 1838 | 1839 | 1840 | 1841 | 1842 | 1843 | 1844 | 1845 | 1846 | 1847 | 1848 | 1849 | 1850 | 1851 | 1852 | 1853 | 1854 | 1855 | 1856 | 1857 | 1858 | 1859 | 1860 | 1861 | 1862 | 1863 | 1864 | 1865 | 1866 | 1867 | 1868 | 1869 | 1870 | 1871 | 1872 | 1873 | ... | 2026 | 2027 | 2028 | 2029 | 2030 | 2031 | 2032 | 2033 | 2034 | 2035 | 2036 | 2037 | 2038 | 2039 | 2040 | 2041 | 2042 | 2043 | 2044 | 2045 | 2046 | 2047 | 2048 | 2049 | 2050 | 2051 | 2052 | 2053 | 2054 | 2055 | 2056 | 2057 | 2058 | 2059 | 2060 | 2061 | 2062 | 2063 | 2064 | 2065 | 2066 | 2067 | 2068 | 2069 | 2070 | 2071 | 2072 | 2073 | 2074 | 2075 | 2076 | 2077 | 2078 | 2079 | 2080 | 2081 | 2082 | 2083 | 2084 | 2085 | 2086 | 2087 | 2088 | 2089 | 2090 | 2091 | 2092 | 2093 | 2094 | 2095 | 2096 | 2097 | 2098 | 2099 | 2100 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Afghanistan | 3280000 | 3280000 | 3280000 | 3280000 | 3280000 | 3280000 | 3280000 | 3280000 | 3280000 | 3280000 | 3280000 | 3280000 | 3280000 | 3280000 | 3290000 | 3290000 | 3300000 | 3300000 | 3310000 | 3320000 | 3320000 | 3330000 | 3340000 | 3350000 | 3360000 | 3380000 | 3390000 | 3400000 | 3420000 | 3430000 | 3450000 | 3470000 | 3480000 | 3500000 | 3520000 | 3540000 | 3550000 | 3570000 | 3590000 | 3610000 | 3630000 | 3640000 | 3660000 | 3680000 | 3700000 | 3720000 | 3730000 | 3750000 | 3770000 | 3790000 | 3810000 | 3830000 | 3840000 | 3860000 | 3870000 | 3890000 | 3910000 | 3920000 | 3940000 | 3960000 | 3970000 | 3990000 | 4010000 | 4030000 | 4050000 | 4070000 | 4080000 | 4110000 | 4130000 | 4150000 | 4170000 | 4190000 | 4220000 | 4240000 | ... | 43300000 | 44100000 | 45000000 | 45800000 | 46700000 | 47600000 | 48400000 | 49200000 | 50100000 | 50900000 | 51700000 | 52500000 | 53300000 | 54100000 | 54900000 | 55700000 | 56400000 | 57200000 | 57900000 | 58600000 | 59300000 | 60000000 | 60700000 | 61300000 | 61900000 | 62500000 | 63100000 | 63700000 | 64300000 | 64800000 | 65400000 | 65900000 | 66400000 | 66800000 | 67300000 | 67700000 | 68200000 | 68600000 | 68900000 | 69300000 | 69700000 | 70000000 | 70300000 | 70600000 | 70800000 | 71100000 | 71300000 | 71500000 | 71700000 | 71800000 | 72000000 | 72100000 | 72200000 | 72300000 | 72300000 | 72400000 | 72400000 | 72400000 | 72400000 | 72400000 | 72300000 | 72300000 | 72200000 | 72100000 | 72000000 | 71900000 | 71800000 | 71600000 | 71500000 | 71300000 | 71200000 | 71000000 | 70800000 | 70600000 | 70400000 |
1 rows × 302 columns
# keep the country column plus the yearly columns 2005-2018 (the year labels are strings)
year_columns = [str(year) for year in range(2005, 2019)]
population_new = population[['country'] + year_columns]
population_new.head()
country | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Afghanistan | 25100000 | 25900000 | 26600000 | 27300000 | 28000000 | 28800000 | 29700000 | 30700000 | 31700000 | 32800000 | 33700000 | 34700000 | 35500000 | 36400000 |
1 | Albania | 3080000 | 3050000 | 3020000 | 2990000 | 2960000 | 2940000 | 2930000 | 2920000 | 2920000 | 2920000 | 2920000 | 2930000 | 2930000 | 2930000 |
2 | Algeria | 33300000 | 33800000 | 34300000 | 34900000 | 35500000 | 36100000 | 36800000 | 37600000 | 38300000 | 39100000 | 39900000 | 40600000 | 41300000 | 42000000 |
3 | Andorra | 78900 | 81000 | 82700 | 83900 | 84500 | 84400 | 83800 | 82400 | 80800 | 79200 | 78000 | 77300 | 77000 | 77000 |
4 | Angola | 19600000 | 20300000 | 21000000 | 21800000 | 22500000 | 23400000 | 24200000 | 25100000 | 26000000 | 26900000 | 27900000 | 28800000 | 29800000 | 30800000 |
population_new_converted = pd.melt(population_new, id_vars=['country'],
value_vars =['2005','2006','2007','2008','2009','2010','2011','2012','2013',
'2014','2015','2016','2017','2018'])
population_new_converted.head()
country | variable | value | |
---|---|---|---|
0 | Afghanistan | 2005 | 25100000 |
1 | Albania | 2005 | 3080000 |
2 | Algeria | 2005 | 33300000 |
3 | Andorra | 2005 | 78900 |
4 | Angola | 2005 | 19600000 |
population_new_converted.rename({'country':'Country','variable':'Year', 'value':'Population'}, axis=1, inplace=True)
population_new_converted.head()
Country | Year | Population | |
---|---|---|---|
0 | Afghanistan | 2005 | 25100000 |
1 | Albania | 2005 | 3080000 |
2 | Algeria | 2005 | 33300000 |
3 | Andorra | 2005 | 78900 |
4 | Angola | 2005 | 19600000 |
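The melt and rename steps can be combined by passing `var_name` and `value_name` to `melt`. The sketch below uses two rows and two years copied from the table above; `wide` and `long` are hypothetical names, and the integer conversion of `Year` is an addition that the notebook only obtains later.

```python
import pandas as pd

# Two countries and two years copied from the population table above
wide = pd.DataFrame({'country': ['Afghanistan', 'Albania'],
                     '2005': [25100000, 3080000],
                     '2006': [25900000, 3050000]})

# Without value_vars, melt unpivots every column that is not listed in id_vars
long = wide.melt(id_vars='country', var_name='Year', value_name='Population')
long = long.rename(columns={'country': 'Country'})
long['Year'] = long['Year'].astype(int)  # year labels start out as strings
print(long)
```

Converting `Year` to integers at this point makes the column directly comparable with a numeric `Year` column in another table.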
population_new = population_new_converted.sort_values(by = ['Country','Year'])
population_new.head(20)
Country | Year | Population | |
---|---|---|---|
0 | Afghanistan | 2005 | 25100000 |
195 | Afghanistan | 2006 | 25900000 |
390 | Afghanistan | 2007 | 26600000 |
585 | Afghanistan | 2008 | 27300000 |
780 | Afghanistan | 2009 | 28000000 |
975 | Afghanistan | 2010 | 28800000 |
1170 | Afghanistan | 2011 | 29700000 |
1365 | Afghanistan | 2012 | 30700000 |
1560 | Afghanistan | 2013 | 31700000 |
1755 | Afghanistan | 2014 | 32800000 |
1950 | Afghanistan | 2015 | 33700000 |
2145 | Afghanistan | 2016 | 34700000 |
2340 | Afghanistan | 2017 | 35500000 |
2535 | Afghanistan | 2018 | 36400000 |
1 | Albania | 2005 | 3080000 |
196 | Albania | 2006 | 3050000 |
391 | Albania | 2007 | 3020000 |
586 | Albania | 2008 | 2990000 |
781 | Albania | 2009 | 2960000 |
976 | Albania | 2010 | 2940000 |
population_new.to_csv('population_new.csv')
population = pd.read_csv('population_new.csv')
population.head()
Unnamed: 0 | Country | Year | Population | |
---|---|---|---|---|
0 | 0 | Afghanistan | 2005 | 25100000 |
1 | 195 | Afghanistan | 2006 | 25900000 |
2 | 390 | Afghanistan | 2007 | 26600000 |
3 | 585 | Afghanistan | 2008 | 27300000 |
4 | 780 | Afghanistan | 2009 | 28000000 |
population.drop('Unnamed: 0',axis=1, inplace=True)
population.head()
Country | Year | Population | |
---|---|---|---|
0 | Afghanistan | 2005 | 25100000 |
1 | Afghanistan | 2006 | 25900000 |
2 | Afghanistan | 2007 | 26600000 |
3 | Afghanistan | 2008 | 27300000 |
4 | Afghanistan | 2009 | 28000000 |
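Writing the table to CSV and reading it back has two side effects here: the index is renumbered from zero, and `Year` is parsed as integers rather than strings, which matters for the later merge on `['Country','Year']`. Assuming no other reason for the round trip, the same result can be had in memory, as this sketch on a four-row sample shows.

```python
import pandas as pd

# Four rows in the shape produced by the melt step; Year is still a string
pop = pd.DataFrame({'Country': ['Afghanistan', 'Albania', 'Afghanistan', 'Albania'],
                    'Year': ['2005', '2005', '2006', '2006'],
                    'Population': [25100000, 3080000, 25900000, 3050000]})

pop = (pop.astype({'Year': int})            # strings -> integers
          .sort_values(['Country', 'Year'])
          .reset_index(drop=True))          # renumber rows 0..n-1 without keeping the old index
print(pop)
```

This also avoids the `Unnamed: 0` column, which only appears because the old index was written to the file.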
happy_report.head()
Country | Year | Life Ladder | Log GDP per capita | Social support | Healthy life expectancy at birth | Freedom to make life choices | Generosity | Perceptions of corruption | Region | |
---|---|---|---|---|---|---|---|---|---|---|
0 | Afghanistan | 2008 | 3.723590 | 7.168690 | 0.450662 | 50.799999 | 0.718114 | 0.177889 | 0.881686 | Asia |
1 | Afghanistan | 2009 | 4.401778 | 7.333790 | 0.552308 | 51.200001 | 0.678896 | 0.200178 | 0.850035 | Asia |
2 | Afghanistan | 2010 | 4.758381 | 7.386629 | 0.539075 | 51.599998 | 0.600127 | 0.134353 | 0.706766 | Asia |
3 | Afghanistan | 2011 | 3.831719 | 7.415019 | 0.521104 | 51.919998 | 0.495901 | 0.172137 | 0.731109 | Asia |
4 | Afghanistan | 2012 | 3.782938 | 7.517126 | 0.520637 | 52.240002 | 0.530935 | 0.244273 | 0.775620 | Asia |
happy_report_population=happy_report.merge(population, on=['Country','Year'])
happy_report_population.head()
Country | Year | Life Ladder | Log GDP per capita | Social support | Healthy life expectancy at birth | Freedom to make life choices | Generosity | Perceptions of corruption | Region | Population | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | Afghanistan | 2008 | 3.723590 | 7.168690 | 0.450662 | 50.799999 | 0.718114 | 0.177889 | 0.881686 | Asia | 27300000 |
1 | Afghanistan | 2009 | 4.401778 | 7.333790 | 0.552308 | 51.200001 | 0.678896 | 0.200178 | 0.850035 | Asia | 28000000 |
2 | Afghanistan | 2010 | 4.758381 | 7.386629 | 0.539075 | 51.599998 | 0.600127 | 0.134353 | 0.706766 | Asia | 28800000 |
3 | Afghanistan | 2011 | 3.831719 | 7.415019 | 0.521104 | 51.919998 | 0.495901 | 0.172137 | 0.731109 | Asia | 29700000 |
4 | Afghanistan | 2012 | 3.782938 | 7.517126 | 0.520637 | 52.240002 | 0.530935 | 0.244273 | 0.775620 | Asia | 30700000 |
happy_report_population.to_csv('happy_report.csv',index=False)
happy_report_population.groupby('Year')['Country'].count()
Year
2005     27
2006     83
2007     97
2008    103
2009    106
2010    116
2011    135
2012    130
2013    127
2014    134
2015    133
2016    131
2017    136
2018    126
Name: Country, dtype: int64
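The per-year counts are lower than before the population merge because `merge` defaults to an inner join, so any country-year without a matching population row is dropped. The sketch below re-enters the two sets of counts printed in this notebook by hand and subtracts them to show how many rows were lost each year.

```python
import pandas as pd

years = list(range(2005, 2019))
# Counts per year before and after the population merge, copied from the outputs in this notebook
before = pd.Series([27, 89, 102, 110, 114, 124, 146, 142, 137, 145, 143, 142, 147, 136], index=years)
after = pd.Series([27, 83, 97, 103, 106, 116, 135, 130, 127, 134, 133, 131, 136, 126], index=years)

lost = before - after
print(lost)
print('rows lost in total:', lost.sum())
```

To see which names are affected in the notebook itself, something like `set(happy_report['Country']) - set(population['Country'])` should list them, assuming both frames are still in memory; differing spellings between the two sources are a likely cause.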
# split the merged data into one frame per year, 2009-2018
df = happy_report_population
happy2009 = df[df['Year']==2009]
happy2010 = df[df['Year']==2010]
happy2011 = df[df['Year']==2011]
happy2012 = df[df['Year']==2012]
happy2013 = df[df['Year']==2013]
happy2014 = df[df['Year']==2014]
happy2015 = df[df['Year']==2015]
happy2016 = df[df['Year']==2016]
happy2017 = df[df['Year']==2017]
happy2018 = df[df['Year']==2018]
df = happy2010.merge(happy2009,on='Country')
df.shape[0]
98
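# note: happy2010 is already part of df, so this merge adds a second copy of the 2010 columns and leaves the row count unchanged
df = df.merge(happy2010,on='Country')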
df.shape[0]
98
df = df.merge(happy2011,on='Country')
df.shape[0]
97
df = df.merge(happy2012,on='Country')
df.shape[0]
95
df = df.merge(happy2013,on='Country')
df.shape[0]
92
df = df.merge(happy2014,on='Country')
df.shape[0]
91
df = df.merge(happy2015,on='Country')
df.shape[0]
91
df = df.merge(happy2016,on='Country')
df.shape[0]
89
df = df.merge(happy2017,on='Country')
df.shape[0]
89
df = df.merge(happy2018,on='Country')
df.shape[0]
81
df.head()
Country | Year_x | Life Ladder_x | Log GDP per capita_x | Social support_x | Healthy life expectancy at birth_x | Freedom to make life choices_x | Generosity_x | Perceptions of corruption_x | Region_x | Population_x | Year_y | Life Ladder_y | Log GDP per capita_y | Social support_y | Healthy life expectancy at birth_y | Freedom to make life choices_y | Generosity_y | Perceptions of corruption_y | Region_y | Population_y | Year_x | Life Ladder_x | Log GDP per capita_x | Social support_x | Healthy life expectancy at birth_x | Freedom to make life choices_x | Generosity_x | Perceptions of corruption_x | Region_x | Population_x | Year_y | Life Ladder_y | Log GDP per capita_y | Social support_y | Healthy life expectancy at birth_y | Freedom to make life choices_y | Generosity_y | Perceptions of corruption_y | Region_y | Population_y | Year_x | Life Ladder_x | Log GDP per capita_x | Social support_x | Healthy life expectancy at birth_x | Freedom to make life choices_x | Generosity_x | Perceptions of corruption_x | Region_x | Population_x | Year_y | Life Ladder_y | Log GDP per capita_y | Social support_y | Healthy life expectancy at birth_y | Freedom to make life choices_y | Generosity_y | Perceptions of corruption_y | Region_y | Population_y | Year_x | Life Ladder_x | Log GDP per capita_x | Social support_x | Healthy life expectancy at birth_x | Freedom to make life choices_x | Generosity_x | Perceptions of corruption_x | Region_x | Population_x | Year_y | Life Ladder_y | Log GDP per capita_y | Social support_y | Healthy life expectancy at birth_y | Freedom to make life choices_y | Generosity_y | Perceptions of corruption_y | Region_y | Population_y | Year_x | Life Ladder_x | Log GDP per capita_x | Social support_x | Healthy life expectancy at birth_x | Freedom to make life choices_x | Generosity_x | Perceptions of corruption_x | Region_x | Population_x | Year_y | Life Ladder_y | Log GDP per capita_y | Social support_y | Healthy life expectancy at birth_y | Freedom to make life choices_y | Generosity_y | Perceptions of corruption_y | Region_y | Population_y | Year | Life Ladder | Log GDP per capita | Social support | Healthy life expectancy at birth | Freedom to make life choices | Generosity | Perceptions of corruption | Region | Population | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Afghanistan | 2010 | 4.758381 | 7.386629 | 0.539075 | 51.599998 | 0.600127 | 0.134353 | 0.706766 | Asia | 28800000 | 2009 | 4.401778 | 7.333790 | 0.552308 | 51.200001 | 0.678896 | 0.200178 | 0.850035 | Asia | 28000000 | 2010 | 4.758381 | 7.386629 | 0.539075 | 51.599998 | 0.600127 | 0.134353 | 0.706766 | Asia | 28800000 | 2011 | 3.831719 | 7.415019 | 0.521104 | 51.919998 | 0.495901 | 0.172137 | 0.731109 | Asia | 29700000 | 2012 | 3.782938 | 7.517126 | 0.520637 | 52.240002 | 0.530935 | 0.244273 | 0.775620 | Asia | 30700000 | 2013 | 3.572100 | 7.522238 | 0.483552 | 52.560001 | 0.577955 | 0.070403 | 0.823204 | Asia | 31700000 | 2014 | 3.130896 | 7.516955 | 0.525568 | 52.880001 | 0.508514 | 0.113184 | 0.871242 | Asia | 32800000 | 2015 | 3.982855 | 7.500539 | 0.528597 | 53.200001 | 0.388928 | 0.089091 | 0.880638 | Asia | 33700000 | 2016 | 4.220169 | 7.497038 | 0.559072 | 53.000000 | 0.522566 | 0.051365 | 0.793246 | Asia | 34700000 | 2017 | 2.661718 | 7.497755 | 0.490880 | 52.799999 | 0.427011 | -0.112198 | 0.954393 | Asia | 35500000 | 2018 | 2.694303 | 7.494588 | 0.507516 | 52.599998 | 0.373536 | -0.084888 | 0.927606 | Asia | 36400000 |
1 | Albania | 2010 | 5.268937 | 9.203032 | 0.733152 | 66.400002 | 0.568958 | -0.175367 | 0.726262 | Central and Eastern Europe | 2940000 | 2009 | 5.485470 | 9.161638 | 0.833047 | 66.199997 | 0.525223 | -0.160855 | 0.863665 | Central and Eastern Europe | 2960000 | 2010 | 5.268937 | 9.203032 | 0.733152 | 66.400002 | 0.568958 | -0.175367 | 0.726262 | Central and Eastern Europe | 2940000 | 2011 | 5.867422 | 9.230904 | 0.759434 | 66.680000 | 0.487496 | -0.207943 | 0.877003 | Central and Eastern Europe | 2930000 | 2012 | 5.510124 | 9.246655 | 0.784502 | 66.959999 | 0.601512 | -0.172262 | 0.847675 | Central and Eastern Europe | 2920000 | 2013 | 4.550648 | 9.258445 | 0.759477 | 67.239998 | 0.631830 | -0.130645 | 0.862905 | Central and Eastern Europe | 2920000 | 2014 | 4.813763 | 9.278104 | 0.625587 | 67.519997 | 0.734648 | -0.028162 | 0.882704 | Central and Eastern Europe | 2920000 | 2015 | 4.606651 | 9.302960 | 0.639356 | 67.800003 | 0.703851 | -0.084411 | 0.884793 | Central and Eastern Europe | 2920000 | 2016 | 4.511101 | 9.337532 | 0.638411 | 68.099998 | 0.729819 | -0.020687 | 0.901071 | Central and Eastern Europe | 2930000 | 2017 | 4.639548 | 9.376145 | 0.637698 | 68.400002 | 0.749611 | -0.032643 | 0.876135 | Central and Eastern Europe | 2930000 | 2018 | 5.004403 | 9.412399 | 0.683592 | 68.699997 | 0.824212 | 0.005385 | 0.899129 | Central and Eastern Europe | 2930000 |
2 | Argentina | 2010 | 6.441067 | 9.836924 | 0.926799 | 67.300003 | 0.730258 | -0.121725 | 0.854695 | Latin America and Caribbean | 41200000 | 2009 | 6.424133 | 9.750825 | 0.918693 | 67.180000 | 0.636646 | -0.125714 | 0.884742 | Latin America and Caribbean | 40800000 | 2010 | 6.441067 | 9.836924 | 0.926799 | 67.300003 | 0.730258 | -0.121725 | 0.854695 | Latin America and Caribbean | 41200000 | 2011 | 6.775805 | 9.884781 | 0.889073 | 67.480003 | 0.815802 | -0.170262 | 0.754646 | Latin America and Caribbean | 41700000 | 2012 | 6.468387 | 9.863960 | 0.901776 | 67.660004 | 0.747498 | -0.143875 | 0.816546 | Latin America and Caribbean | 42100000 | 2013 | 6.582260 | 9.877256 | 0.909874 | 67.839996 | 0.737250 | -0.126476 | 0.822900 | Latin America and Caribbean | 42500000 | 2014 | 6.671114 | 9.841482 | 0.917870 | 68.019997 | 0.745058 | -0.160449 | 0.854192 | Latin America and Caribbean | 43000000 | 2015 | 6.697131 | 9.858329 | 0.926492 | 68.199997 | 0.881224 | -0.170244 | 0.850906 | Latin America and Caribbean | 43400000 | 2016 | 6.427221 | 9.830088 | 0.882819 | 68.400002 | 0.847702 | -0.188304 | 0.850924 | Latin America and Caribbean | 43800000 | 2017 | 6.039330 | 9.848709 | 0.906699 | 68.599998 | 0.831966 | -0.182600 | 0.841052 | Latin America and Caribbean | 44300000 | 2018 | 5.792797 | 9.809972 | 0.899912 | 68.800003 | 0.845895 | -0.206937 | 0.855255 | Latin America and Caribbean | 44700000 |
3 | Armenia | 2010 | 4.367811 | 8.810287 | 0.660342 | 65.199997 | 0.459257 | -0.162075 | 0.890629 | Central and Eastern Europe | 2880000 | 2009 | 4.177582 | 8.784616 | 0.680007 | 65.099998 | 0.441413 | -0.199945 | 0.881887 | Central and Eastern Europe | 2890000 | 2010 | 4.367811 | 8.810287 | 0.660342 | 65.199997 | 0.459257 | -0.162075 | 0.890629 | Central and Eastern Europe | 2880000 | 2011 | 4.260491 | 8.856818 | 0.705108 | 65.360001 | 0.464525 | -0.211577 | 0.874601 | Central and Eastern Europe | 2880000 | 2012 | 4.319712 | 8.924142 | 0.676446 | 65.519997 | 0.501864 | -0.201539 | 0.892544 | Central and Eastern Europe | 2880000 | 2013 | 4.277191 | 8.952597 | 0.723260 | 65.680000 | 0.504082 | -0.181963 | 0.899797 | Central and Eastern Europe | 2890000 | 2014 | 4.453083 | 8.983580 | 0.738764 | 65.839996 | 0.506487 | -0.205098 | 0.920390 | Central and Eastern Europe | 2910000 | 2015 | 4.348320 | 9.011394 | 0.722551 | 66.000000 | 0.551027 | -0.189388 | 0.901462 | Central and Eastern Europe | 2920000 | 2016 | 4.325472 | 9.010698 | 0.709218 | 66.300003 | 0.610987 | -0.157249 | 0.921421 | Central and Eastern Europe | 2920000 | 2017 | 4.287736 | 9.081095 | 0.697925 | 66.599998 | 0.613697 | -0.133958 | 0.864683 | Central and Eastern Europe | 2930000 | 2018 | 5.062449 | 9.119424 | 0.814449 | 66.900002 | 0.807644 | -0.149109 | 0.676826 | Central and Eastern Europe | 2930000 |
4 | Azerbaijan | 2010 | 4.218611 | 9.677230 | 0.687001 | 63.400002 | 0.501071 | -0.142826 | 0.858347 | Central and Eastern Europe | 9030000 | 2009 | 4.573725 | 9.641726 | 0.735970 | 63.020000 | 0.498138 | -0.106351 | 0.753850 | Central and Eastern Europe | 8920000 | 2010 | 4.218611 | 9.677230 | 0.687001 | 63.400002 | 0.501071 | -0.142826 | 0.858347 | Central and Eastern Europe | 9030000 | 2011 | 4.680470 | 9.664859 | 0.725194 | 63.639999 | 0.537484 | -0.125277 | 0.795119 | Central and Eastern Europe | 9150000 | 2012 | 4.910772 | 9.673333 | 0.761873 | 63.880001 | 0.598859 | -0.160711 | 0.763155 | Central and Eastern Europe | 9260000 | 2013 | 5.481178 | 9.716747 | 0.769690 | 64.120003 | 0.671957 | -0.188644 | 0.698820 | Central and Eastern Europe | 9390000 | 2014 | 5.251530 | 9.724068 | 0.799433 | 64.360001 | 0.732773 | -0.228512 | 0.653845 | Central and Eastern Europe | 9500000 | 2015 | 5.146775 | 9.723096 | 0.785703 | 64.599998 | 0.764289 | -0.218102 | 0.615553 | Central and Eastern Europe | 9620000 | 2016 | 5.303895 | 9.680427 | 0.777271 | 64.900002 | 0.712573 | -0.224378 | 0.606771 | Central and Eastern Europe | 9730000 | 2017 | 5.152279 | 9.670762 | 0.787039 | 65.199997 | 0.731030 | -0.245216 | 0.652539 | Central and Eastern Europe | 9830000 | 2018 | 5.167995 | 9.678014 | 0.781230 | 65.500000 | 0.772449 | -0.251795 | 0.561206 | Central and Eastern Europe | 9920000 |
# Keep only the Country column of df: these are the countries retained for the animation
country_81 = df[['Country']]
country_81.head()
 | Country |
---|---|
0 | Afghanistan |
1 | Albania |
2 | Argentina |
3 | Armenia |
4 | Azerbaijan |
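A list like `country_81` could be derived by keeping only the countries that have an observation in every year of the window. The sketch below shows one way to do that on a toy long-format table; the country labels `A`, `B`, `C` and the three-year window are made up for illustration, and this is not necessarily how `df` was built in this notebook.

```python
import pandas as pd

# Toy long-format panel; labels and years are illustrative, not real data.
toy = pd.DataFrame({
    "Country": ["A", "A", "A", "B", "B", "C", "C", "C"],
    "Year":    [2009, 2010, 2011, 2009, 2011, 2009, 2010, 2011],
})

# Hypothetical window of years every kept country must cover.
required_years = {2009, 2010, 2011}

# Restrict to the window, then count the distinct years seen per country.
in_window = toy[toy["Year"].isin(required_years)]
years_per_country = in_window.groupby("Country")["Year"].nunique()

# Countries whose distinct-year count equals the window length are complete.
complete = years_per_country[years_per_country == len(required_years)].index
complete_countries = pd.DataFrame({"Country": list(complete)})
print(complete_countries)
```

Here `B` is missing 2010, so only `A` and `C` survive. An animated plot benefits from this kind of filter because bubbles for incomplete countries would otherwise appear and disappear between frames.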
happy_report_population.head()
 | Country | Year | Life Ladder | Log GDP per capita | Social support | Healthy life expectancy at birth | Freedom to make life choices | Generosity | Perceptions of corruption | Region | Population |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | Afghanistan | 2008 | 3.723590 | 7.168690 | 0.450662 | 50.799999 | 0.718114 | 0.177889 | 0.881686 | Asia | 27300000 |
1 | Afghanistan | 2009 | 4.401778 | 7.333790 | 0.552308 | 51.200001 | 0.678896 | 0.200178 | 0.850035 | Asia | 28000000 |
2 | Afghanistan | 2010 | 4.758381 | 7.386629 | 0.539075 | 51.599998 | 0.600127 | 0.134353 | 0.706766 | Asia | 28800000 |
3 | Afghanistan | 2011 | 3.831719 | 7.415019 | 0.521104 | 51.919998 | 0.495901 | 0.172137 | 0.731109 | Asia | 29700000 |
4 | Afghanistan | 2012 | 3.782938 | 7.517126 | 0.520637 | 52.240002 | 0.530935 | 0.244273 | 0.775620 | Asia | 30700000 |
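# Work on a copy so the cleaned report stays untouched
happy_bubble = happy_report_population.copy()
# Inner merge (the default) keeps only rows whose Country appears in country_81
happy_bubble = happy_bubble.merge(country_81, on='Country')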
happy_bubble.head()
 | Country | Year | Life Ladder | Log GDP per capita | Social support | Healthy life expectancy at birth | Freedom to make life choices | Generosity | Perceptions of corruption | Region | Population |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | Afghanistan | 2008 | 3.723590 | 7.168690 | 0.450662 | 50.799999 | 0.718114 | 0.177889 | 0.881686 | Asia | 27300000 |
1 | Afghanistan | 2009 | 4.401778 | 7.333790 | 0.552308 | 51.200001 | 0.678896 | 0.200178 | 0.850035 | Asia | 28000000 |
2 | Afghanistan | 2010 | 4.758381 | 7.386629 | 0.539075 | 51.599998 | 0.600127 | 0.134353 | 0.706766 | Asia | 28800000 |
3 | Afghanistan | 2011 | 3.831719 | 7.415019 | 0.521104 | 51.919998 | 0.495901 | 0.172137 | 0.731109 | Asia | 29700000 |
4 | Afghanistan | 2012 | 3.782938 | 7.517126 | 0.520637 | 52.240002 | 0.530935 | 0.244273 | 0.775620 | Asia | 30700000 |
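The merge above acts as a filter: with the default `how='inner'`, only rows whose `Country` appears in both frames are kept. The sketch below illustrates this on four rows taken from the tables in this notebook and compares it with an `isin` mask, which is an alternative and not what the notebook itself uses. The names `panel` and `keep` are placeholders.

```python
import pandas as pd

# Four rows copied from the tables shown in this notebook.
panel = pd.DataFrame({
    "Country": ["Afghanistan", "Afghanistan", "Albania", "Albania"],
    "Year": [2009, 2010, 2009, 2010],
    "Life Ladder": [4.401778, 4.758381, 5.485470, 5.268937],
})

# Hypothetical one-column frame listing the countries to keep.
keep = pd.DataFrame({"Country": ["Albania"]})

# Inner merge: keeps matching rows and renumbers the index from 0.
via_merge = panel.merge(keep, on="Country")

# Boolean mask: keeps the same rows but preserves the original index.
via_isin = panel[panel["Country"].isin(keep["Country"])]

print(via_merge)
print(via_isin)
```

Both approaches select the same rows. One difference worth knowing: if the key frame contained a country twice, the merge would duplicate that country's rows, while the `isin` mask would not.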
happy_bubble['Year'].value_counts()
2018    81
2017    81
2016    81
2015    81
2014    81
2013    81
2012    81
2011    81
2010    81
2009    81
2007    68
2008    67
2006    53
2005    20
Name: Year, dtype: int64
# Years before 2009 do not cover all 81 countries, so keep 2009-2018 only
happy_bubble = happy_bubble[happy_bubble['Year'] >= 2009]
happy_bubble['Year'].value_counts()
2018    81
2017    81
2016    81
2015    81
2014    81
2013    81
2012    81
2011    81
2010    81
2009    81
Name: Year, dtype: int64
happy_bubble.head()
 | Country | Year | Life Ladder | Log GDP per capita | Social support | Healthy life expectancy at birth | Freedom to make life choices | Generosity | Perceptions of corruption | Region | Population |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | Afghanistan | 2009 | 4.401778 | 7.333790 | 0.552308 | 51.200001 | 0.678896 | 0.200178 | 0.850035 | Asia | 28000000 |
2 | Afghanistan | 2010 | 4.758381 | 7.386629 | 0.539075 | 51.599998 | 0.600127 | 0.134353 | 0.706766 | Asia | 28800000 |
3 | Afghanistan | 2011 | 3.831719 | 7.415019 | 0.521104 | 51.919998 | 0.495901 | 0.172137 | 0.731109 | Asia | 29700000 |
4 | Afghanistan | 2012 | 3.782938 | 7.517126 | 0.520637 | 52.240002 | 0.530935 | 0.244273 | 0.775620 | Asia | 30700000 |
5 | Afghanistan | 2013 | 3.572100 | 7.522238 | 0.483552 | 52.560001 | 0.577955 | 0.070403 | 0.823204 | Asia | 31700000 |
As shown in the cleaning process, I added a region to each country so that the regions can be compared clearly. (Note that Australia and New Zealand are included in 'North America' because their populations are comparatively small and they are similar to it in culture and economy.) I also added each country's population, so we can see where the happiness levels of most people lie (the two large blue bubbles are China and India).
The chart is fully interactive.
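from __future__ import division  # no effect on Python 3; only matters on Python 2
from plotly.offline import init_notebook_mode, iplot
from bubbly.bubbly import bubbleplot
# Load plotly.js into the notebook so the figure renders inline
init_notebook_mode()
# One bubble per country, sized by population, colored by region, animated by year
figure = bubbleplot(dataset=happy_bubble, x_column='Log GDP per capita', y_column='Life Ladder',
                    bubble_column='Country', color_column='Region', time_column='Year', size_column='Population',
                    x_title="Log GDP per Capita", y_title="Happiness Score for each country",
                    title='Happy Report', scale_bubble=3, height=650)
# Plotly's config key for mouse-wheel zoom is camelCase: scrollZoom
iplot(figure, config={'scrollZoom': True})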
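Sizing bubbles by population shows visually where most people sit on the happiness scale. The same idea can be expressed numerically as a population-weighted average of `Life Ladder`. The sketch below uses only the four 2010 rows visible earlier in this notebook (Albania, Argentina, Armenia, Azerbaijan), so the result is illustrative and says nothing about the full 81-country dataset. The name `rows_2010` is a placeholder.

```python
import pandas as pd

# The 2010 rows for four countries, copied from the tables in this notebook.
rows_2010 = pd.DataFrame({
    "Country": ["Albania", "Argentina", "Armenia", "Azerbaijan"],
    "Year": [2010, 2010, 2010, 2010],
    "Life Ladder": [5.268937, 6.441067, 4.367811, 4.218611],
    "Population": [2940000, 41200000, 2880000, 9030000],
})

# Unweighted mean: every country counts equally.
simple_mean = rows_2010.groupby("Year")["Life Ladder"].mean()

# Weighted mean: sum(score * population) / sum(population) per year.
weighted = rows_2010.assign(
    weighted=rows_2010["Life Ladder"] * rows_2010["Population"]
)
totals = weighted.groupby("Year")[["weighted", "Population"]].sum()
weighted_mean = totals["weighted"] / totals["Population"]

print(simple_mean)
print(weighted_mean)
```

In this small sample the weighted value is pulled toward Argentina, which has by far the largest population of the four. This is the numeric counterpart of a large bubble dominating the picture.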