Dataset: this analysis uses the World Happiness Report data for 2015, available at https://www.kaggle.com/unsdsn/world-happiness/data
# import the necessary packages
import pandas as pd
import numpy as np
import geoplotlib
from geoplotlib.colors import ColorMap
from geoplotlib.colors import create_set_cmap
import pyglet
from sklearn.cluster import KMeans
from geoplotlib.layers import BaseLayer
from geoplotlib.core import BatchPainter
from geoplotlib.utils import BoundingBox
import plotly.graph_objs as go
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
init_notebook_mode(connected=True)
df = pd.read_csv('2015.csv')
print(df.head())
map_data = pd.read_csv('countries.csv')
map_data.head()
| | Country | Region | Happiness Rank | Happiness Score | Standard Error | Economy (GDP per Capita) | Family | Health (Life Expectancy) | Freedom | Trust (Government Corruption) | Generosity | Dystopia Residual |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Switzerland | Western Europe | 1 | 7.587 | 0.03411 | 1.39651 | 1.34951 | 0.94143 | 0.66557 | 0.41978 | 0.29678 | 2.51738 |
1 | Iceland | Western Europe | 2 | 7.561 | 0.04884 | 1.30232 | 1.40223 | 0.94784 | 0.62877 | 0.14145 | 0.43630 | 2.70201 |
2 | Denmark | Western Europe | 3 | 7.527 | 0.03328 | 1.32548 | 1.36058 | 0.87464 | 0.64938 | 0.48357 | 0.34139 | 2.49204 |
3 | Norway | Western Europe | 4 | 7.522 | 0.03880 | 1.45900 | 1.33095 | 0.88521 | 0.66973 | 0.36503 | 0.34699 | 2.46531 |
4 | Canada | North America | 5 | 7.427 | 0.03553 | 1.32629 | 1.32261 | 0.90563 | 0.63297 | 0.32957 | 0.45811 | 2.45176 |

| | ISO 3166 Country Code | Country | Latitude | Longitude |
---|---|---|---|---|
0 | AD | Andorra | 42.50 | 1.50 |
1 | AE | United Arab Emirates | 24.00 | 54.00 |
2 | AF | Afghanistan | 33.00 | 65.00 |
3 | AG | Antigua and Barbuda | 17.05 | -61.80 |
4 | AI | Anguilla | 18.25 | -63.17 |
This map shows the happiness rank of every country in the world in 2015. The darker the colour, the higher the country ranks (rank 1 being the happiest), i.e. the happier the people in that country. The countries of North America (the US, Mexico and Canada), Australia, New Zealand and the countries of western Europe have the happiest citizens. Countries with lower happiness scores tend to be either war-torn (e.g. Iraq) or highly underdeveloped, as is the case for African countries such as Congo and Chad. The top 5 countries are Switzerland, Iceland, Denmark, Norway and Canada.
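The top of the ranking can also be read straight from the data rather than from the map. The following is a minimal sketch: `top_countries` is a hypothetical helper, and `sample` only re-types the five rows printed by `df.head()` above so that the snippet runs without `2015.csv`.

```python
import pandas as pd

# The five rows shown in the df.head() output above (only the columns needed here).
sample = pd.DataFrame({
    "Country": ["Switzerland", "Iceland", "Denmark", "Norway", "Canada"],
    "Happiness Rank": [1, 2, 3, 4, 5],
    "Happiness Score": [7.587, 7.561, 7.527, 7.522, 7.427],
})

def top_countries(frame, n=5):
    """Return the n best-ranked countries (rank 1 is the happiest)."""
    cols = ["Country", "Happiness Rank", "Happiness Score"]
    return frame.nsmallest(n, "Happiness Rank")[cols]

top5 = top_countries(sample)
print(top5.to_string(index=False))
```

With the full dataset loaded, `top_countries(df)` should give the same kind of listing.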
# custom blue colour scale, from dark (0) to light (1)
scl = [[0, "rgb(5, 10, 172)"], [0.35, "rgb(40, 60, 190)"], [0.5, "rgb(70, 100, 245)"],
       [0.6, "rgb(90, 120, 245)"], [0.7, "rgb(106, 137, 247)"], [1, "rgb(240, 210, 250)"]]
# scl = [[0.0, 'rgb(50,10,143)'],[0.2, 'rgb(117,107,177)'],[0.4, 'rgb(158,154,200)'],\
# [0.6, 'rgb(188,189,220)'],[0.8, 'rgb(218,208,235)'],[1.0, 'rgb(250,240,255)']]
data = dict(type='choropleth',
            colorscale=scl,
            # note: while autocolorscale is True, Plotly uses its default palette instead of scl
            autocolorscale=True,
            reversescale=True,
            locations=df['Country'],
            locationmode='country names',
            z=df['Happiness Rank'],
            text=df['Country'],
            colorbar={'title': 'Happiness Rank'})
layout = dict(title='Global Happiness',
              geo=dict(showframe=False,
                       # Plotly expects lowercase projection names
                       projection={'type': 'orthographic'}))
choromap3 = go.Figure(data=[data], layout=layout)
iplot(choromap3)
To take the analysis further, we plot a symbol map in which the size of each circle represents the Happiness Score and its colour represents GDP per capita. The larger the circle, the happier the citizens; the darker the circle, the higher the GDP per capita. From this plot, we can see that the top 5 countries seen above have a much higher GDP per capita than most. **This seems to imply that the better off a country is economically, the happier its citizens are.** Underdeveloped countries such as Chad, Congo, Burundi and Togo have both a very low GDP per capita and a very low happiness score. While we cannot directly say that low GDP causes lower happiness, it appears to be an important factor. This trend remains consistent across countries. **There are almost no countries that have a high GDP but a low happiness score, or vice versa.**
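The visual impression of a GDP/happiness relationship can be checked numerically. This is only a sketch: `gdp_happiness_summary` is a hypothetical helper, the median split is one arbitrary way of defining "high" and "low", and `sample` holds just the five rows printed by `df.head()`, which is far too few for a meaningful statistic. Run it on the full `df` to test the claim.

```python
import pandas as pd

# The five rows printed by df.head(); they only make the sketch runnable
# without 2015.csv and say nothing about the worldwide trend.
sample = pd.DataFrame({
    "Country": ["Switzerland", "Iceland", "Denmark", "Norway", "Canada"],
    "Happiness Score": [7.587, 7.561, 7.527, 7.522, 7.427],
    "Economy (GDP per Capita)": [1.39651, 1.30232, 1.32548, 1.45900, 1.32629],
})

def gdp_happiness_summary(frame):
    """Pearson correlation plus the countries that break the trend."""
    gdp = frame["Economy (GDP per Capita)"]
    score = frame["Happiness Score"]
    r = gdp.corr(score)  # Series.corr uses Pearson correlation by default
    # "High" and "low" are taken relative to the medians (an assumption).
    rich_unhappy = frame[(gdp > gdp.median()) & (score < score.median())]
    poor_happy = frame[(gdp < gdp.median()) & (score > score.median())]
    return r, rich_unhappy["Country"].tolist(), poor_happy["Country"].tolist()

r, rich_unhappy, poor_happy = gdp_happiness_summary(sample)
print("Pearson r:", round(r, 3))
print("High GDP, low happiness:", rich_unhappy)
print("Low GDP, high happiness:", poor_happy)
```

A correlation close to 1 and two nearly empty lists on the full data would support the statements in bold above.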
Geographic location and neighbours may also play an important role. We can see clusters/regions of countries with similar GDPs and similar Happiness Scores. **Countries that have a good economy and good relations with their neighbours seem to benefit from mutual growth, and this is also reflected in their happiness scores. Countries in troubled neighbourhoods, such as parts of the Middle East and Asia (Iraq, Afghanistan, etc.), show much lower economic prosperity as well as lower happiness scores.**
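One way to look at the regional pattern without a map is to average both measures over the dataset's own `Region` column. This is a sketch under the same assumptions as before: `regional_means` is a hypothetical helper and `sample` contains only the five rows shown by `df.head()`.

```python
import pandas as pd

# Five rows from df.head(); the real comparison needs the full dataset.
sample = pd.DataFrame({
    "Country": ["Switzerland", "Iceland", "Denmark", "Norway", "Canada"],
    "Region": ["Western Europe"] * 4 + ["North America"],
    "Happiness Score": [7.587, 7.561, 7.527, 7.522, 7.427],
    "Economy (GDP per Capita)": [1.39651, 1.30232, 1.32548, 1.45900, 1.32629],
})

def regional_means(frame):
    """Mean happiness score and GDP per capita per region, happiest region first."""
    cols = ["Happiness Score", "Economy (GDP per Capita)"]
    return (frame.groupby("Region")[cols]
                 .mean()
                 .sort_values("Happiness Score", ascending=False))

summary = regional_means(sample)
print(summary)
```

If the regional argument holds, regions should sort into roughly the same order by either column when this is run on the full `df`.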
It can also be noted that, in general, countries known for their low population densities (https://www.worldatlas.com/articles/the-10-least-densely-populated-places-in-the-world-2015.html), such as Denmark and Iceland, are much happier than more densely populated countries.
df = df.merge(map_data, how='left', on = ['Country'])
df.head()
| | Country | Region | Happiness Rank | Happiness Score | Standard Error | Economy (GDP per Capita) | Family | Health (Life Expectancy) | Freedom | Trust (Government Corruption) | Generosity | Dystopia Residual | ISO 3166 Country Code | Latitude | Longitude |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Switzerland | Western Europe | 1 | 7.587 | 0.03411 | 1.39651 | 1.34951 | 0.94143 | 0.66557 | 0.41978 | 0.29678 | 2.51738 | CH | 47.0 | 8.0 |
1 | Iceland | Western Europe | 2 | 7.561 | 0.04884 | 1.30232 | 1.40223 | 0.94784 | 0.62877 | 0.14145 | 0.43630 | 2.70201 | IS | 65.0 | -18.0 |
2 | Denmark | Western Europe | 3 | 7.527 | 0.03328 | 1.32548 | 1.36058 | 0.87464 | 0.64938 | 0.48357 | 0.34139 | 2.49204 | DK | 56.0 | 10.0 |
3 | Norway | Western Europe | 4 | 7.522 | 0.03880 | 1.45900 | 1.33095 | 0.88521 | 0.66973 | 0.36503 | 0.34699 | 2.46531 | NO | 62.0 | 10.0 |
4 | Canada | North America | 5 | 7.427 | 0.03553 | 1.32629 | 1.32261 | 0.90563 | 0.63297 | 0.32957 | 0.45811 | 2.45176 | CA | 60.0 | -95.0 |
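Because the merge above is a left join on `Country`, any country whose name is spelled differently in `countries.csv` keeps its row but ends up with missing coordinates, and so cannot be drawn on the symbol map. The sketch below shows one way to list such rows using the `indicator` option of `DataFrame.merge`. The helper `unmatched_countries` is hypothetical, and "Atlantis" is a made-up name used only to force a mismatch.

```python
import pandas as pd

# Tiny stand-ins for the two tables; "Atlantis" is deliberately fictional.
happiness = pd.DataFrame({"Country": ["Switzerland", "Iceland", "Atlantis"]})
coords = pd.DataFrame({
    "Country": ["Switzerland", "Iceland"],
    "Latitude": [47.0, 65.0],
    "Longitude": [8.0, -18.0],
})

def unmatched_countries(left, right, key="Country"):
    """Return the key values of `left` that found no partner in `right`."""
    merged = left.merge(right, how="left", on=key, indicator=True)
    # indicator=True adds a "_merge" column: "both" or "left_only" for a left join.
    return merged.loc[merged["_merge"] == "left_only", key].tolist()

missing = unmatched_countries(happiness, coords)
print(missing)
```

Calling `unmatched_countries` with the original happiness table and `map_data` (before they are merged) would show which country names need to be harmonised.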
df['Happiness Score'].min(), df['Happiness Score'].max()
(2.839, 7.5870000000000006)
df['text']=df['Country'] + '<br>Happiness Score ' + (df['Happiness Score']).astype(str)
scl = [[0, "rgb(5, 10, 172)"], [0.35, "rgb(40, 60, 190)"], [0.5, "rgb(70, 100, 245)"],
       [0.6, "rgb(90, 120, 245)"], [0.7, "rgb(106, 137, 247)"], [1, "rgb(220, 220, 220)"]]
data = [dict(
    type='scattergeo',
    locationmode='country names',
    lon=df['Longitude'],
    lat=df['Latitude'],
    text=df['text'],
    mode='markers',
    marker=dict(
        size=df['Happiness Score'] * 3,  # circle size encodes the happiness score
        opacity=0.8,
        reversescale=True,  # reversed scale: darker circles mean higher GDP per capita
        autocolorscale=False,
        symbol='circle',
        line=dict(
            width=1,
            color='rgb(102, 102, 102)'  # grey marker outline
        ),
        colorscale=scl,
        cmin=0,
        color=df['Economy (GDP per Capita)'],
        cmax=df['Economy (GDP per Capita)'].max(),
        colorbar=dict(
            title="GDP per Capita"
        )
    ))]
layout = dict(
    title='Happiness Scores by GDP',
    geo=dict(
        # scope='usa',
        projection=dict(type='mercator'),  # Plotly expects lowercase projection names
        showland=True,
        landcolor="rgb(250, 250, 250)",
        # subunitcolor = "rgb(217, 217, 217)",
        # countrycolor = "rgb(217, 217, 217)",
        countrywidth=0.5,
        subunitwidth=0.5
    ),
)
symbolmap = go.Figure(data = data, layout=layout)
iplot(symbolmap)
A cluster plot may also be used to see if the clustered regions coincide with any of the regions above. (This plot opens in a new window)
"""
Example of keyboard interaction
"""
class KMeansLayer(BaseLayer):
    def __init__(self, data):
        self.data = data
        self.k = 2  # initial number of clusters

    def invalidate(self, proj):
        # project the coordinates to screen space and cluster them
        self.painter = BatchPainter()
        x, y = proj.lonlat_to_screen(self.data['Longitude'], self.data['Latitude'])
        k_means = KMeans(n_clusters=self.k)
        k_means.fit(np.vstack([x, y]).T)
        labels = k_means.labels_
        self.cmap = create_set_cmap(set(labels), 'hsv')
        # draw each cluster as a convex hull plus its points
        for l in set(labels):
            self.painter.set_color(self.cmap[l])
            self.painter.convexhull(x[labels == l], y[labels == l])
            self.painter.points(x[labels == l], y[labels == l], 2)

    def draw(self, proj, mouse_x, mouse_y, ui_manager):
        ui_manager.info('Use left and right to increase/decrease the number of clusters. k = %d' % self.k)
        self.painter.batch_draw()

    def on_key_release(self, key, modifiers):
        # left arrow: fewer clusters (never below 2); right arrow: more clusters
        if key == pyglet.window.key.LEFT:
            self.k = max(2, self.k - 1)
            return True
        elif key == pyglet.window.key.RIGHT:
            self.k = self.k + 1
            return True
        return False
data = geoplotlib.utils.DataAccessObject(df)
geoplotlib.add_layer(KMeansLayer(data))
geoplotlib.set_smoothing(True)
geoplotlib.set_bbox(BoundingBox.WORLD)  # show the whole world rather than only Denmark (DK)
geoplotlib.show()
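For a quick look at the cluster assignments without opening the geoplotlib window, the same scikit-learn `KMeans` can be run directly on the coordinates. This is a sketch, not the layer above: `cluster_coordinates` is a hypothetical helper, it clusters raw longitude/latitude in degrees rather than projected screen positions, so its groups may differ from those drawn by `KMeansLayer`. The five points are the coordinates shown in the merged table.

```python
import numpy as np
from sklearn.cluster import KMeans

# Coordinates of the five countries listed in the merged table above.
countries = ["Switzerland", "Iceland", "Denmark", "Norway", "Canada"]
lon = np.array([8.0, -18.0, 10.0, 10.0, -95.0])
lat = np.array([47.0, 65.0, 56.0, 62.0, 60.0])

def cluster_coordinates(lon, lat, k=2, seed=0):
    """Cluster lon/lat pairs with k-means; rows with missing values get label -1."""
    points = np.column_stack([lon, lat])
    valid = ~np.isnan(points).any(axis=1)  # KMeans rejects NaN input
    model = KMeans(n_clusters=k, n_init=10, random_state=seed)
    labels = np.full(len(points), -1)
    labels[valid] = model.fit_predict(points[valid])
    return labels

labels = cluster_coordinates(lon, lat)
for name, label in zip(countries, labels):
    print(name, label)
```

On the full dataset, `cluster_coordinates(df['Longitude'].to_numpy(), df['Latitude'].to_numpy(), k=...)` would return one label per country, which can then be compared with the `Region` column.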