#!/usr/bin/env python # coding: utf-8 # # CIA WORLD FACTBOOK ANALYSIS IN SQL # In this project, we will be working with data from the [CIA WORLD FACTBOOK](https://www.cia.gov/library/publications/the-world-factbook/), a compedium of statistics about all of the countries on earth. The factbook contains demographic information like: # - population - The popuation as of 2015 # - population_growth - The annual population growth rate, as a percentage # - area - The total land and water area # # The database was prepared by dataquest with sqlite and can be downloaded [here](https://dsserver-prod-resources-1.s3.amazonaws.com/257/factbook.db) # we need to importall the neccesssary extensions to jupyternotebbok which will allow it to run sql # they are # `%%capture # %load_ext sql # %sql sqlite:///factbook.db` # In[3]: get_ipython().run_cell_magic('capture', '', '%load_ext sql\n%sql sqlite:///factbook.db\n') # The code below was used to introduce the name of the tsble and what the table looked like. # In[4]: get_ipython().run_cell_magic('sql', '', "SELECT * FROM sqlite_master where type='table'\n") # In[5]: get_ipython().run_cell_magic('sql', '', 'SELECT * FROM FACTS;\n') # Here are the descriptions for some of the columns above # - name - The name of the country # - area - The total land and sea area of the country. # - population - The country's population # - population_growth - The country's population growth as a percentage # - birth_rate - The country's birth rate, or the number of births a year per 1,000 people # - death_rate - The country's death rate,or the number of death a year per 1,000 people # - area - The country's total area (both land and water) # - area_land - The country's land area in square kilometers # - area_water - The country's waterarea in square kilometers # In[9]: get_ipython().run_cell_magic('sql', '', 'SELECT min(population),max(population),min(population_growth),max(population_growth)\nFROM FACTS;\n') # The query above returned a query of the minimum population, maximum population,minimum population_growth, maximum population growth. # # From here we could see that there are some outliers, we have a minimun population which is zero and the maximum population which is about 7.2 billlion which is weird. # let us explore this in detail # In[21]: get_ipython().run_cell_magic('sql', '', 'select *\nfrom facts\nwhere population = (select min(population) from facts)\n') # In[23]: get_ipython().run_cell_magic('sql', '', 'select * \nfrom facts \nwhere population = (select max(population) from facts)\n') # We could see that the a column was assigned to the whole world, and antarctica corresponds to the zero population column, which is quite correct. # from the [CIA](https://www.cia.gov/library/publications/the-world-factbook/geos/ay.html) website it says "no indigenous inhabitants, but there are both permanent and summer-only staffed research stations" # In[24]: get_ipython().run_cell_magic('sql', '', 'select avg(population),avg(area) from facts\n') # In[26]: get_ipython().run_cell_magic('sql', '', 'select *\nfrom facts\nwhere population > (select avg(population) from facts) and area >(select avg(area) from facts )\n') # the above query returned the all counties that are densely populated, i.e higher than the avg popuation and and area # lets check to see counties tat have the highest ratios of water to land # In[38]: get_ipython().run_cell_magic('sql', '', 'select name, cast(area_water as float)/cast(area_land as float) water_to_land_ratio\nfrom facts\nwhere water_to_land_ratio = (select max(cast(area_water as float)/cast(area_land as float)) from facts)\n') # alas we could see that the british indian ocean territory has the highest ratio of water to land. Well, which country will dare have a higher ratio than one that has ocean attached to its name # In[39]: get_ipython().run_cell_magic('sql', '', 'select name, population_growth\nfrom facts\nwhere population_growth = (select max(population_growth) from facts)\n') # From above we could see that the country that will add the most people to thier population is south sudan with a high population growth of about 4.02 # In[42]: get_ipython().run_cell_magic('sql', '', 'select name,death_rate,birth_rate\nfrom facts\nwhere death_rate > birth_rate\n') # In[ ]: The above query shows country with a higher death_rate than birth_rate # In[ ]: