Using Basic SQL queries in Jupyter Notebook to analyze data from this database.
This data from the CIA World Factbook, a compendium of statistics about all of the countries on Earth.
The Factbook contains demographic information like the following:
Row |
Description |
---|---|
population |
The global population. |
population_growth |
The annual population growth rate, as a percentage |
area |
The total land and water area. |
Before getting started let's connect to the database :
%%capture
%load_ext sql
%sql sqlite:///factbook.db
'Connected: None@factbook.db'
%%sql
SELECT *
FROM sqlite_master
WHERE type='table' ;
Done.
type | name | tbl_name | rootpage | sql |
---|---|---|---|---|
table | sqlite_sequence | sqlite_sequence | 3 | CREATE TABLE sqlite_sequence(name,seq) |
table | facts | facts | 47 | CREATE TABLE "facts" ("id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "code" varchar(255) NOT NULL, "name" varchar(255) NOT NULL, "area" integer, "area_land" integer, "area_water" integer, "population" integer, "population_growth" float, "birth_rate" float, "death_rate" float, "migration_rate" float) |
Selecting the 'Facts' table from the database :
%%sql
SELECT *
FROM facts
LIMIT 5 ;
Done.
id | code | name | area | area_land | area_water | population | population_growth | birth_rate | death_rate | migration_rate |
---|---|---|---|---|---|---|---|---|---|---|
1 | af | Afghanistan | 652230 | 652230 | 0 | 32564342 | 2.32 | 38.57 | 13.89 | 1.51 |
2 | al | Albania | 28748 | 27398 | 1350 | 3029278 | 0.3 | 12.92 | 6.58 | 3.3 |
3 | ag | Algeria | 2381741 | 2381741 | 0 | 39542166 | 1.84 | 23.67 | 4.31 | 0.92 |
4 | an | Andorra | 468 | 468 | 0 | 85580 | 0.12 | 8.13 | 6.96 | 0.0 |
5 | ao | Angola | 1246700 | 1246700 | 0 | 19625353 | 2.78 | 38.78 | 11.49 | 0.46 |
** Column description : **
Row |
Description |
---|---|
name |
The name of the country. |
area |
The total land and water area. |
area_land |
The country's land area in sq km. |
area_water |
The country's water area in sq km. |
population |
The country's population |
population_growth |
The annual population growth rate as percentage |
birth_rate |
The country's birth rate (no.of births per year, per 1000 people) |
death_rate |
The country's death rate (no.of deaths per year, per 1000 people) |
Let's get some basic summary statistics from the data like :
%%sql
SELECT MIN(population) AS 'Min Overall population',
MAX(population) AS 'Max Overall population',
MIN(population_growth) AS 'Min Overall population growth',
MAX(population_growth) AS 'Max Overall population growth'
FROM facts ;
Done.
Min Overall population | Max Overall population | Min Overall population growth | Max Overall population growth |
---|---|---|---|
0 | 7256490011 | 0.0 | 4.02 |
%%sql
SELECT name
FROM facts
WHERE population = (SELECT MIN(population)
FROM facts) ;
Done.
name |
---|
Antarctica |
Antarctica
is a continent, so the name
column also contains the data about continents
listed under it.%%sql
SELECT name
FROM facts
WHERE population = (SELECT MAX(population)
FROM facts) ;
Done.
name |
---|
World |
name
column includes the entry world
that seems to be the sum of the total population in the table. Let's verify our guess.%%sql
SELECT SUM(population) AS 'Total Population'
FROM facts
WHERE name != 'World' ;
Done.
Total Population |
---|
7770482643 |
Total Population
seems to be a bit off, so there must be other listings throwing the total off. Let's continue with our analysis to find the summary statistics without World
and Antarctica
for now.%%sql
SELECT MIN(population) AS 'Min Overall population',
CAST (MAX(population) AS Float) / CAST(1000000000 AS int) AS 'Max Overall population',
MIN(population_growth) AS 'Min Overall population growth',
MAX(population_growth) AS 'Max Overall population growth'
FROM facts
WHERE (name != 'World' AND name != 'Antarctica') ;
Done.
Min Overall population | Max Overall population | Min Overall population growth | Max Overall population growth |
---|---|---|---|
48 | 1.367485388 | 0.0 | 4.02 |
%%sql
SELECT name, population
FROM facts
WHERE population = (SELECT MIN(population)
FROM facts
WHERE name != 'Antarctica') ;
Done.
name | population |
---|---|
Pitcairn Islands | 48 |
%%sql
SELECT name, population
FROM facts
WHERE population = (SELECT MAX(population)
FROM facts
WHERE name != 'World') ;
Done.
name | population |
---|---|
China | 1367485388 |
China
.Let's also find the countries with the maximum and minimum growth rate.
%%sql
SELECT name AS 'Country with Highest population growth', population_growth
FROM facts
WHERE population_growth = (SELECT MAX(population_growth)
FROM facts
WHERE name != 'World') ;
Done.
Country with Highest population growth | population_growth |
---|---|
South Sudan | 4.02 |
%%sql
SELECT name AS 'Countries with Least population growth', population_growth
FROM facts
WHERE population_growth = (SELECT MIN(population_growth)
FROM facts
WHERE population_growth != '0.0')
ORDER BY population;
Done.
Countries with Least population growth | population_growth |
---|---|
Tokelau | 0.01 |
Norfolk Island | 0.01 |
Falkland Islands (Islas Malvinas) | 0.01 |
Greece | 0.01 |
Now that we know about some outliers, let's exclude them and calculate some averages like :
%%sql
SELECT ROUND(CAST (AVG(population) AS Float) / CAST(1000000 AS int), 3) AS 'Average population (in millions)',
AVG(area_land) AS 'Average Land area per country',
ROUND(AVG(area),1) AS 'Average Overall area per country'
FROM facts
WHERE (name != 'World' AND name != 'Antarctica') ;
Done.
Average population (in millions) | Average Land area per country | Average Overall area per country |
---|---|---|
32.377 | 523693.2 | 555093.5 |
Now that we know the Average values, let's try to use it to find all the Densely Populated countries, i.e which are :
%%sql
SELECT name AS 'Densely populated Countries',
CAST (population AS Float) / CAST (area_land AS Float) AS 'Density_ratio per sq_km'
FROM facts
WHERE (name != 'World' AND name != 'Antarctica')
AND (population > (SELECT AVG(population) FROM facts) AND area < (SELECT AVG(area_land) FROM facts))
ORDER BY 'Density_ratio per sq_km' DESC;
Done.
Densely populated Countries | Density_ratio per sq_km |
---|---|
Bangladesh | 1297.9776062072674 |
Germany | 231.8924605359765 |
Japan | 348.21641219803286 |
Philippines | 338.72749102860786 |
Thailand | 133.05487482628354 |
United Kingdom | 264.9039887570785 |
Vietnam | 304.28237172251426 |
Further, let's see which countries have more water area than land area using ratios :
%%sql
SELECT name AS 'More Water Than Land',
ROUND(CAST (area_water AS Float) / CAST (area AS Float), 3) AS 'ratio'
FROM facts
ORDER BY ratio DESC
LIMIT 10;
Done.
More Water Than Land | ratio |
---|---|
British Indian Ocean Territory | 0.999 |
Virgin Islands | 0.819 |
Puerto Rico | 0.357 |
Bahamas, The | 0.279 |
Guinea-Bissau | 0.222 |
Malawi | 0.206 |
Netherlands | 0.184 |
Uganda | 0.182 |
Eritrea | 0.141 |
Liberia | 0.135 |
Let's find the countries with population decline i.e (higher death rate than birth rate):
%%sql
SELECT name AS 'Countries with Population Decline',
ROUND((CAST (death_rate AS Float) / CAST (birth_rate AS Float)) ,3) AS 'Decline_ratio'
FROM facts
WHERE death_rate > birth_rate
ORDER BY Decline_ratio DESC
LIMIT 10;
Done.
Countries with Population Decline | Decline_ratio |
---|---|
Bulgaria | 1.619 |
Serbia | 1.504 |
Latvia | 1.431 |
Lithuania | 1.413 |
Hungary | 1.39 |
Monaco | 1.389 |
Slovenia | 1.35 |
Ukraine | 1.349 |
Germany | 1.348 |
Saint Pierre and Miquelon | 1.31 |
Finally, let's see the country which will increase its population the most in the next year :
%%sql
SELECT name AS 'Countries with highest population addition',
ROUND( CAST ((population_growth * population) AS Float) / CAST (1000000 AS Float), 2 ) AS 'Incline_value_by_mil'
FROM facts
WHERE name != 'World'
ORDER BY Incline_value_by_mil DESC
LIMIT 10;
Done.
Countries with highest population addition | Incline_value_by_mil |
---|---|
India | 1527.07 |
China | 615.37 |
Nigeria | 444.83 |
Pakistan | 290.67 |
Ethiopia | 287.46 |
Bangladesh | 270.33 |
United States | 250.67 |
Indonesia | 235.51 |
Congo, Democratic Republic of the | 194.47 |
Philippines | 162.61 |
From the dataset we see that :
Basic Summary Statistics:
Pitcairn Islands
* with a population of just *48*.China
* with a population of *1.3 Billion*South Sudan
* with *4.02% annual growth rate.*Tokelau
* with *0.01% annual growth rate.*Averages :
32.377 Million
*523,693 sq Kms
*555,093 sq Kms
*Top 3 :
Most Densely populated countries :
Bangladesh
* with *1297.97 people per sq km*Germany
* with *231.89 people per sq km*Japan
* with *348.21 people per sq km*Countries with more Water than Land area :
Virgin Islands
* with *81.9 % water area.*Puerto Rico
* with *35.7 % water area.*The Bahamas
* with *27.9 % water area.*Countries with population decline :
Bulgaria
* with *death rate 1.619 times the birth rate.*Serbia
* with *death rate 1.504 times the birth rate.*Latvia
* with *death rate 1.431 times the birth rate.*Countries with highest population addition :
India
* with *1527.07 Million people getting added* the next year.China
* with *615.37 Million people getting added* the next year.Nigeria
* with *444.83 Million people getting added* the next year.