#!/usr/bin/env python # coding: utf-8 # # Guided Project: Analyzing CIA Factbook Data Using SQL # # > The data from the [CIA World Factbook](), # > a compendium of statistics about all of the countries on Earth. # > The Factbook contains demographic information like: # > # > * population - The population as of 2015. # > * population_growth - The annual population growth rate, as a # > percentage. # > * area - The total land and water area. # > # > To download the SQLite database - [factbook.db]() # In[1]: get_ipython().run_cell_magic('capture', '', '%load_ext sql\n%sql sqlite:///factbook.db\n') # In[2]: get_ipython().run_cell_magic('sql', '', "SELECT \n* \nFROM sqlite_master \nWHERE type='table';\n") # ### Overview of the Data # In[3]: get_ipython().run_cell_magic('sql', '', 'SELECT \n* \nFROM facts \nLIMIT 5;\n') # ### Here are the descriptions of the columns: # # |Column Name|Description| # |:----|:----| # |name|The name of the country| # |area|The total land and sea area of the country.| # |population|The country's population.| # |population_growth|The country's population growth as a percentage.| # |birth_rate|The country's birth rate, or the number of births a year per 1,000 people.| # |death_rate|The country's death rate, or the number of death a year per 1,000 people.| # |area|The country's total area (both land and water).| # |area_land|The country's land area in square kilometers.| # |area_water|The country's waterarea in square kilometers.| # ### Calculating some summary statistics and looking for any outlier countries # In[4]: get_ipython().run_cell_magic('sql', '', 'SELECT \n MIN(population) AS min_pop, \n MAX(population) AS max_pop, \n MIN(population_growth) AS min_pop_grwth, \n MAX(population_growth) AS max_pop_grwth \nFROM facts;\n') # > The result we got from our above query is really astounding, we # > have a country with zero population and a country with more # > than 7.2 billion people. Also we have astounding population # > growth rate. Can we associate the same with the respective # > country with population 0 and more than 7.2 billion population? # > # > Let us find out these two countries. And the countries with 0.0 # > and 4.02 population growth rate. # In[5]: get_ipython().run_cell_magic('sql', '', 'SELECT \n name AS country, \n population AS pop,\n population_growth AS pop_grwth\nFROM facts \nWHERE pop == (SELECT MIN(population) FROM facts);\n') # > There is only one country with least population (population of # > 0 or no population at all): # > **Antarctica** # In[6]: get_ipython().run_cell_magic('sql', '', 'SELECT \n name AS country, \n population AS pop,\n population_growth AS pop_grwth\nFROM facts \nWHERE pop == (SELECT MAX(population) FROM facts);\n') # > There is only one country with maximum population (population # > of over 7.2 billion): # > **World** # > # > Taking into consideration that here the World as a whole is # > counted to be one big country itself on our Earth. # In[7]: get_ipython().run_cell_magic('sql', '', 'SELECT \n name AS country, \n population AS pop,\n population_growth AS pop_grwth\nFROM facts \nWHERE pop_grwth == (SELECT MIN(population_growth) FROM facts);\n') # In[8]: get_ipython().run_cell_magic('sql', '', 'SELECT \n name AS country, \n population AS pop,\n population_growth AS pop_grwth\nFROM facts \nWHERE pop_grwth == (SELECT MAX(population_growth) FROM facts);\n') # In[9]: get_ipython().run_cell_magic('sql', '', 'SELECT \n DISTINCT population_growth AS pop_grwth\nFROM facts\nORDER BY pop_grwth DESC;\n') # > Thus from the results of above SQL queries we conclude # > that: # > # > * the table contains a row for Antarctica, which explains having a value 0 in the population column. # > * the table contains a row for the whole world, which explains having a value of over 7.2 billion in the population column. # > # > This seems to match the CIA Factbook [page for Antarctica]() # > # > We also found that Antarctica population growth is None, but # > obvious there is no population, and World population growth is # > 1.08. The minimum and maximum values in population_growth # > columns are not associated with these two countries as we had # > assumed previously. # > * the table contains 4 countries which do have some population # > but numbers shows zero population growth in these countries. # > * our query above reveals that South Sudan is the country which has maximum population growth rate of 4.02. # ### Finding the average of population and area column # In[10]: get_ipython().run_cell_magic('sql', '', 'SELECT \n AVG(population) AS pop_avg, \n AVG(area) AS area_avg \nFROM facts;\n') # > The result of above query reveals that the average population # > is far more than the average area. Does this means # > overpopulated? # > # > Let us find the countries which are overpopulated / densely # > populated. # ### Finding countries that are densely populated # In[11]: get_ipython().run_cell_magic('sql', '', 'SELECT \n name AS countries,\n population AS pop,\n area AS area\nFROM facts \nWHERE population > (SELECT AVG(population) FROM facts) And \narea < (SELECT AVG(area) FROM facts);\n') # > Above query outputs the name of the countries in the table # > having dense population. # > # > To find countries that are densely populated we need to # > identify the countries that have: # > * Above average values for population # > # > and # > * Below average values for area. # > # > The result of our above queries shows that as per 2015 records # > there are seven countries which are overpopulated / densely # > populated. And South Sudan is having the highest population # > growth rate. # ### Countries having the highest ratios of water to land # In[12]: get_ipython().run_cell_magic('sql', '', 'SELECT \n name AS countries, \n ROUND(CAST(area_water AS Float)/CAST(area_land AS Float),2) AS ratios \nFROM facts \nORDER BY ratios DESC \nLIMIT 5;\n') # > The above query reveals that the country British Indian Ocean # > Territory has highest ratio of 905.67 of water to land followed # > by Virgin Islands with 4.52 # ### Countries having more water than land # In[13]: get_ipython().run_cell_magic('sql', '', 'SELECT \n name AS countries, \n area_water, \n area_land \nFROM facts \nWHERE area_water > area_land;\n') # > The above query reveals more details to explain ratios 905.67 # > and 4.52 of water to land for the countries British Indian # > Ocean Territory and Virgin Islands respectively. # ### Countries that will add the most people to their population next year 2016. (since the data is as of 2015 records) # In[14]: get_ipython().run_cell_magic('sql', '', 'SELECT \n name AS countries, \n population_growth AS pop_grwth \nFROM facts \nORDER BY pop_grwth \nDESC LIMIT 10;\n') # > Through above query we collected minimum 10 countries with # > their respective population growth rate, and found South Sudan # > is leading with highest population growth rate of 4.02. # > # > As mentioned this is for 2016 as per 2015 record. However there # > were some drastic changes seen from 2018 to 2019 in the # > population growth rate, and hence based on population growth # > rate records in our table as of 2015, we won't be able to find # > the countries who have and will add most people in present year # > 2020. # ### Countries having higher death rate than birth rate # In[15]: get_ipython().run_cell_magic('sql', '', 'SELECT \n name AS countries,\n death_rate, \n birth_rate,\n ROUND((death_rate - birth_rate),2) AS diff\nFROM facts \nWHERE death_rate > birth_rate\nORDER BY diff DESC;\n') # > The result of our query above shows that Bulgaria is the # > country which has higher death rate than birth rate.