#!/usr/bin/env python
# coding: utf-8
# # Guided Project: Analyzing CIA Factbook Data Using SQL
#
# > The data from the [CIA World Factbook](),
# > a compendium of statistics about all of the countries on Earth.
# > The Factbook contains demographic information like:
# >
# > * population - The population as of 2015.
# > * population_growth - The annual population growth rate, as a
# > percentage.
# > * area - The total land and water area.
# >
# > To download the SQLite database - [factbook.db]()
# In[1]:
get_ipython().run_cell_magic('capture', '', '%load_ext sql\n%sql sqlite:///factbook.db\n')
# In[2]:
get_ipython().run_cell_magic('sql', '', "SELECT \n* \nFROM sqlite_master \nWHERE type='table';\n")
# ### Overview of the Data
# In[3]:
get_ipython().run_cell_magic('sql', '', 'SELECT \n* \nFROM facts \nLIMIT 5;\n')
# ### Here are the descriptions of the columns:
#
# |Column Name|Description|
# |:----|:----|
# |name|The name of the country|
# |area|The total land and sea area of the country.|
# |population|The country's population.|
# |population_growth|The country's population growth as a percentage.|
# |birth_rate|The country's birth rate, or the number of births a year per 1,000 people.|
# |death_rate|The country's death rate, or the number of death a year per 1,000 people.|
# |area|The country's total area (both land and water).|
# |area_land|The country's land area in square kilometers.|
# |area_water|The country's waterarea in square kilometers.|
# ### Calculating some summary statistics and looking for any outlier countries
# In[4]:
get_ipython().run_cell_magic('sql', '', 'SELECT \n MIN(population) AS min_pop, \n MAX(population) AS max_pop, \n MIN(population_growth) AS min_pop_grwth, \n MAX(population_growth) AS max_pop_grwth \nFROM facts;\n')
# > The result we got from our above query is really astounding, we
# > have a country with zero population and a country with more
# > than 7.2 billion people. Also we have astounding population
# > growth rate. Can we associate the same with the respective
# > country with population 0 and more than 7.2 billion population?
# >
# > Let us find out these two countries. And the countries with 0.0
# > and 4.02 population growth rate.
# In[5]:
get_ipython().run_cell_magic('sql', '', 'SELECT \n name AS country, \n population AS pop,\n population_growth AS pop_grwth\nFROM facts \nWHERE pop == (SELECT MIN(population) FROM facts);\n')
# > There is only one country with least population (population of
# > 0 or no population at all):
# > **Antarctica**
# In[6]:
get_ipython().run_cell_magic('sql', '', 'SELECT \n name AS country, \n population AS pop,\n population_growth AS pop_grwth\nFROM facts \nWHERE pop == (SELECT MAX(population) FROM facts);\n')
# > There is only one country with maximum population (population
# > of over 7.2 billion):
# > **World**
# >
# > Taking into consideration that here the World as a whole is
# > counted to be one big country itself on our Earth.
# In[7]:
get_ipython().run_cell_magic('sql', '', 'SELECT \n name AS country, \n population AS pop,\n population_growth AS pop_grwth\nFROM facts \nWHERE pop_grwth == (SELECT MIN(population_growth) FROM facts);\n')
# In[8]:
get_ipython().run_cell_magic('sql', '', 'SELECT \n name AS country, \n population AS pop,\n population_growth AS pop_grwth\nFROM facts \nWHERE pop_grwth == (SELECT MAX(population_growth) FROM facts);\n')
# In[9]:
get_ipython().run_cell_magic('sql', '', 'SELECT \n DISTINCT population_growth AS pop_grwth\nFROM facts\nORDER BY pop_grwth DESC;\n')
# > Thus from the results of above SQL queries we conclude
# > that:
# >
# > * the table contains a row for Antarctica, which explains having a value 0 in the population column.
# > * the table contains a row for the whole world, which explains having a value of over 7.2 billion in the population column.
# >
# > This seems to match the CIA Factbook [page for Antarctica]()
# >
# > We also found that Antarctica population growth is None, but
# > obvious there is no population, and World population growth is
# > 1.08. The minimum and maximum values in population_growth
# > columns are not associated with these two countries as we had
# > assumed previously.
# > * the table contains 4 countries which do have some population
# > but numbers shows zero population growth in these countries.
# > * our query above reveals that South Sudan is the country which has maximum population growth rate of 4.02.
# ### Finding the average of population and area column
# In[10]:
get_ipython().run_cell_magic('sql', '', 'SELECT \n AVG(population) AS pop_avg, \n AVG(area) AS area_avg \nFROM facts;\n')
# > The result of above query reveals that the average population
# > is far more than the average area. Does this means
# > overpopulated?
# >
# > Let us find the countries which are overpopulated / densely
# > populated.
# ### Finding countries that are densely populated
# In[11]:
get_ipython().run_cell_magic('sql', '', 'SELECT \n name AS countries,\n population AS pop,\n area AS area\nFROM facts \nWHERE population > (SELECT AVG(population) FROM facts) And \narea < (SELECT AVG(area) FROM facts);\n')
# > Above query outputs the name of the countries in the table
# > having dense population.
# >
# > To find countries that are densely populated we need to
# > identify the countries that have:
# > * Above average values for population
# >
# > and
# > * Below average values for area.
# >
# > The result of our above queries shows that as per 2015 records
# > there are seven countries which are overpopulated / densely
# > populated. And South Sudan is having the highest population
# > growth rate.
# ### Countries having the highest ratios of water to land
# In[12]:
get_ipython().run_cell_magic('sql', '', 'SELECT \n name AS countries, \n ROUND(CAST(area_water AS Float)/CAST(area_land AS Float),2) AS ratios \nFROM facts \nORDER BY ratios DESC \nLIMIT 5;\n')
# > The above query reveals that the country British Indian Ocean
# > Territory has highest ratio of 905.67 of water to land followed
# > by Virgin Islands with 4.52
# ### Countries having more water than land
# In[13]:
get_ipython().run_cell_magic('sql', '', 'SELECT \n name AS countries, \n area_water, \n area_land \nFROM facts \nWHERE area_water > area_land;\n')
# > The above query reveals more details to explain ratios 905.67
# > and 4.52 of water to land for the countries British Indian
# > Ocean Territory and Virgin Islands respectively.
# ### Countries that will add the most people to their population next year 2016. (since the data is as of 2015 records)
# In[14]:
get_ipython().run_cell_magic('sql', '', 'SELECT \n name AS countries, \n population_growth AS pop_grwth \nFROM facts \nORDER BY pop_grwth \nDESC LIMIT 10;\n')
# > Through above query we collected minimum 10 countries with
# > their respective population growth rate, and found South Sudan
# > is leading with highest population growth rate of 4.02.
# >
# > As mentioned this is for 2016 as per 2015 record. However there
# > were some drastic changes seen from 2018 to 2019 in the
# > population growth rate, and hence based on population growth
# > rate records in our table as of 2015, we won't be able to find
# > the countries who have and will add most people in present year
# > 2020.
# ### Countries having higher death rate than birth rate
# In[15]:
get_ipython().run_cell_magic('sql', '', 'SELECT \n name AS countries,\n death_rate, \n birth_rate,\n ROUND((death_rate - birth_rate),2) AS diff\nFROM facts \nWHERE death_rate > birth_rate\nORDER BY diff DESC;\n')
# > The result of our query above shows that Bulgaria is the
# > country which has higher death rate than birth rate.