#!/usr/bin/env python # coding: utf-8 # # Introduction # In[1]: get_ipython().run_cell_magic('capture', '', '%load_ext sql\n%sql sqlite:///factbook.db\n') # # Table viewing # In[2]: get_ipython().run_cell_magic('sql', '', 'SELECT *\n FROM facts\nLIMIT 5; \n') # # Column Description # **Below are the description of some of the columns in the table(facts):** # * id: Country Identification. # * Code: Country's code. # * name: The name of the country. # * area: The country's total area (both land and water). # * area_land:The country's land area in square kilometers. # * area_water: is the area of country's land covered by water in square kilometers. # * population: Contry's Population. # * population_growth- The country's population growth as a percentage. # * birth_rate:The country's birth rate, or the number of births a year per 1,000 people. # * death_rate:The country's death rate, or the number of death a year per 1,000 people. # * migration_rate: the rate at which people move out of the country from other countries. # # SUMMARY STATISTICS # In[3]: get_ipython().run_cell_magic('sql', '', 'SELECT MIN(population) AS min_population, \n MAX(population) AS max_population,\n MIN(population_growth) AS min_pop_growth,\n MAX(population_growth) AS max_pop_growth \n FROM facts;\n') # **From the result of our summary statistics we can see that there is a country with Population 0, # Another one has a population of over 7.2 billion. # In the next cell, we are going to use subqueries to zoom in on these countries with the above populations.** # In[4]: get_ipython().run_cell_magic('sql', '', 'SELECT *\n FROM facts\n WHERE population = (SELECT MIN(Population)\n FROM facts);\n') # **From our output we can see that Antartica is the country with zero population. If we want to know why the # Population is 0, we can find out from [here](https://www.cia.gov/library/publications/the-world-factbook/geos/ay.html).** # In[5]: get_ipython().run_cell_magic('sql', '', 'SELECT *\n FROM facts\n WHERE population = (SELECT MAX(Population)\n FROM facts);\n') # **From our output we can see that the name returned is the Population for the whole world, # This explains the ambiguous population. In our next code cell we will recompute our summary and remove the 'world' # country, so that we can have an accurate result.** # In[6]: get_ipython().run_cell_magic('sql', '', "SELECT MIN(population) AS min_population, \n MAX(population) AS max_population,\n MIN(population_growth) AS min_pop_growth,\n MAX(population_growth) AS max_pop_growth \n FROM facts\nWHERE name <> 'World';\n") # **There is a country that has a population of approximately 1.4 Billion** # In[7]: get_ipython().run_cell_magic('sql', '', "SELECT *\n FROM facts\n WHERE population = (SELECT MAX(Population)\n FROM facts\n WHERE name <> 'World');\n") # **From our output above, we discovered that China is the country with the highest population of about 1.4 Billion.** # # Exploring Average population and Average Area # In[8]: get_ipython().run_cell_magic('sql', '', "SELECT AVG(population) AS avg_pop, AVG(area) AS avg_area\n FROM facts\n WHERE name <> 'World';\n") # **From our output the average population is around 32 million whereas the average area is # approximately 555 thousand Kilometer square.** # # # Density Population # **To finish, we'll build on the query we wrote for the previous screen to find countries that are densely populated. We'll identify countries that have:** # # **Above average values for population. # Below average values for area.** # In[9]: get_ipython().run_cell_magic('sql', '', "SELECT *\n FROM facts\nWHERE population > (SELECT AVG(population) FROM facts\n WHERE name <> 'World')\n AND area < (SELECT AVG(area) FROM facts\n WHERE name <> 'World');\n") # **In conclusion, if we compare our result with the real data of some of these countries, we will find # the result to be true.** # In[10]: get_ipython().run_cell_magic('sql', '', "SELECT *\n FROM facts\n WHERE population_growth = (SELECT MAX(Population_growth)\n FROM facts\n WHERE name <> 'World');\n") # **From our ouput South sudan has the highest growth rate of 4.02** # ## Country with the highest ratio of water to land. # In[11]: get_ipython().run_cell_magic('sql', '', "SELECT *, CAST(area_water AS FLOAT) / area_land AS ratio_water_land\n FROM facts\n WHERE ratio_water_land = (SELECT MAX(CAST(area_water AS FLOAT) / area_land)\n FROM facts\n WHERE name <> 'World');\n") # In[12]: get_ipython().run_cell_magic('sql', '', "SELECT *\n FROM facts\n WHERE area_water > area_land\n AND name <> 'World';\n") # **Our output shows that British Indian Ocean Territory has more water than Land # and higest ratio of water to land.** # In[13]: get_ipython().run_cell_magic('sql', '', "SELECT *\n FROM facts\n WHERE migration_rate = (SELECT MAX(migration_rate) FROM facts\n WHERE name <> 'World')\n") # **Qatar from our output has the highest migration rate** # In[14]: get_ipython().run_cell_magic('sql', '', "SELECT *\n FROM facts\nWHERE population * population_growth/100 = (SELECT MAX(population * population_growth/100)\n FROM facts\n WHERE name <> 'World');\n") # **India has the highest probablity of adding to their populatio next year # based on the ratio between their population and Population growth.** # ## Countries with Higher Death rate than Birth rate. # In[15]: get_ipython().run_cell_magic('sql', '', "SELECT *\n FROM facts\nWHERE death_rate > birth_rate\n AND name <> 'World';\n") # ## Country with the highest population/area ratio # In[16]: get_ipython().run_cell_magic('sql', '', "SELECT *\n FROM facts\nWHERE population / area= (SELECT MAX(population /area)\n FROM facts\n WHERE name <> 'World');\n") # **Macau has the highest populatio/area ratio and the area of land in # square kilometer is so small posibly compared to other countries.**