Let's take a quick virtual tour around the world with data from the CIA World Factbook, a concise collection of information about all of the countries on Earth.
We'll explore the countries that:
First, let's load the database and see what's inside.
%%capture
%load_ext sql
%sql sqlite:///factbook.db
'Connected: None@factbook.db'
%%sql
SELECT *
FROM sqlite_master
WHERE type='table';
Done.
type | name | tbl_name | rootpage | sql |
---|---|---|---|---|
table | sqlite_sequence | sqlite_sequence | 3 | CREATE TABLE sqlite_sequence(name,seq) |
table | facts | facts | 47 | CREATE TABLE "facts" ("id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "code" varchar(255) NOT NULL, "name" varchar(255) NOT NULL, "area" integer, "area_land" integer, "area_water" integer, "population" integer, "population_growth" float, "birth_rate" float, "death_rate" float, "migration_rate" float) |
%%sql
SELECT *
FROM facts
LIMIT 5;
Done.
id | code | name | area | area_land | area_water | population | population_growth | birth_rate | death_rate | migration_rate |
---|---|---|---|---|---|---|---|---|---|---|
1 | af | Afghanistan | 652230 | 652230 | 0 | 32564342 | 2.32 | 38.57 | 13.89 | 1.51 |
2 | al | Albania | 28748 | 27398 | 1350 | 3029278 | 0.3 | 12.92 | 6.58 | 3.3 |
3 | ag | Algeria | 2381741 | 2381741 | 0 | 39542166 | 1.84 | 23.67 | 4.31 | 0.92 |
4 | an | Andorra | 468 | 468 | 0 | 85580 | 0.12 | 8.13 | 6.96 | 0.0 |
5 | ao | Angola | 1246700 | 1246700 | 0 | 19625353 | 2.78 | 38.78 | 11.49 | 0.46 |
Some descriptions of the columns in the file that I will use for analysis.
name
— the name of the country.area
— the country's total area (both land and water).area_land
— the country's land area in square kilometers.area_water
— the country's waterarea in square kilometers.population
— the country's population.population_growth
— the country's population growth as a percentage.birth_rate
— the country's birth rate, or the number of births per year per 1,000 people.death_rate
— the country's death rate, or the number of death per year per 1,000 people.There's also a code
column, which I thought was ISO 3166 two-digit code for country, but actually it's not. But fret not, CIA provides a handy cross-reference list of their country code to ISO 3166 code just in case you need it.
Let's have a quick look on the world's population.
%%sql
SELECT MIN(population) AS 'Min population',
MAX(population) AS 'Max population',
MIN(population_growth) AS 'Min population growth',
MAX(population_growth) AS 'Max population growth'
FROM facts;
Done.
Min population | Max population | Min population growth | Max population growth |
---|---|---|---|
0 | 7256490011 | 0.0 | 4.02 |
Let's find out more about the funny numbers on Min population
and Max population
.
%%sql
SELECT name AS country, population
FROM facts
WHERE population == (SELECT MIN(population)
FROM facts
);
Done.
country | population |
---|---|
Antarctica | 0 |
This is what the Factbook says about Antarctica:
no indigenous inhabitants, but there are both permanent and summer-only staffed research stations"
So, population 0 makes sense.
No worries, it's not all that bad, we still have some friendly penguins there! Just like little Fred here.
To be honest, Antarctica is listed at the top of my bucket list. You can go there through Argentina or New Zealand in a fancy cruise to take cool pics of penguins, yachting, camping, hiking and cross country skiing in South Pole!
Other than that, I also have an unhealthy obsession to see polar bears in North Pole. Let's see what the Factbook says about Arctic (which is called 'Arctic Ocean' in this database).
%%sql
SELECT name AS Country, population
FROM facts
WHERE name == 'Arctic';
Done.
Country | population |
---|
Eh, why is there no Arctic if there is an Antarctica? I demand my cute polar bears!
Ah OK, so Arctic is called 'Arctic Ocean' based on the Factbook.
%%sql
SELECT name AS Country, population
FROM facts
WHERE name == 'Arctic Ocean';
Done.
Country | population |
---|---|
Arctic Ocean | None |
OK, why population
is 'None' and not 0?
Out of curiosity, let's just check is there any other country with population
'None' (which is NULL) instead of 0.
%%sql
SELECT name AS Country, population
FROM facts
WHERE population IS NULL
ORDER BY name;
Done.
Country | population |
---|---|
Arctic Ocean | None |
Ashmore and Cartier Islands | None |
Atlantic Ocean | None |
Bouvet Island | None |
British Indian Ocean Territory | None |
Clipperton Island | None |
Coral Sea Islands | None |
French Southern and Antarctic Lands | None |
Heard Island and McDonald Islands | None |
Indian Ocean | None |
Jan Mayen | None |
Navassa Island | None |
Pacific Ocean | None |
Paracel Islands | None |
South Georgia and South Sandwich Islands | None |
Southern Ocean | None |
Spratly Islands | None |
United States Pacific Island Wildlife Refuges | None |
Wake Island | None |
Oh so we have several NULLs here for countries that are islands and oceans, but why is there an inconsistency between having the NULL value and 0 value?
Anyway, let's end our fascinating polar expeditions right here and let's see the top country that has the lowest, non-0, and non-NULL population.
%%sql
SELECT name AS Country, MIN(population) AS 'Population'
FROM facts
WHERE population != 0
AND population != 'NULL';
Done.
Country | Population |
---|---|
Pitcairn Islands | 48 |
Here's a pic of Adamstown, the only settlement on the Pitcairn island (no one lives on the other three islands (or atolls) named Ducie, Henderson and Oeno).
Looks like a beautiful place!Now let's talk about the country with maximum population.
%%sql
SELECT name AS Country, population
FROM facts
WHERE population == (SELECT MAX(population)
FROM facts
);
Done.
Country | population |
---|---|
World | 7256490011 |
Well, 'World' is not a country, so let's exclude it and see what we got. I'm guessing it's China that has most people.
%%sql
SELECT name AS Country, population
FROM facts
WHERE population == (SELECT MAX(population)
FROM facts
WHERE name != 'World'
);
Done.
Country | population |
---|---|
China | 1367485388 |
Yes, it's China, so here's a cute panda pic, featuring [Yuan Zai](https://en.wikipedia.org/wiki/Yuan_Zai_(giant_panda) the giant panda!
OK since we're now in Asia, let's stop by my country, featuring the famous Raja Ampat in the eastern part of Indonesia.
%%sql
SELECT *
FROM facts
WHERE name == 'Indonesia';
Done.
id | code | name | area | area_land | area_water | population | population_growth | birth_rate | death_rate | migration_rate |
---|---|---|---|---|---|---|---|---|---|---|
78 | id | Indonesia | 1904569 | 1811569 | 93000 | 255993674 | 0.92 | 16.72 | 6.37 | 1.16 |
But I'm pretty sure my country's area_water
is larger than area_land
... hmmm. Water territory is quite a sensitive national issue here.
And the birth rate is, like, almost 3x compared to the death rate. Should our government reinforce the mandatory birth control like several decades ago?
Now that we found the value that might skew our data, let's do some recalculation.
%%sql
SELECT MIN(population) AS 'Min population',
MAX(population) AS 'Max population',
MIN(population_growth) AS 'Min population growth',
MAX(population_growth) AS 'Max population growth'
FROM facts
WHERE name != 'World';
Done.
Min population | Max population | Min population growth | Max population growth |
---|---|---|---|
0 | 1367485388 | 0.0 | 4.02 |
Since the data looks much better now, let's calculate the average.
%%sql
SELECT ROUND(AVG(population), 0) AS 'Average population',
ROUND(AVG(area), 2) AS 'Average area'
FROM facts
WHERE name != 'World';
Done.
Average population | Average area |
---|---|
32242667.0 | 555093.55 |
Let's take a quick check on how my country's population and area compared to the world's averages.
%%sql
SELECT name AS Country, population
FROM facts
WHERE population = (SELECT AVG(population)
FROM facts)
OR name == 'Indonesia';
Done.
Country | population |
---|---|
Indonesia | 255993674 |
world_average_population = 32242667
Indonesia_population = 255993674
world_average_area = 555093.55
Indonesia_area = 1904569
if Indonesia_area > world_average_area:
print("Indonesia's area is higher than the world's average by " + "{:.2f}".format((Indonesia_area - world_average_area)/world_average_area) + "%")
else:
print("Indonesia's area is lower than the world's average by " + "{:.2f}".format((world_average_area - Indonesia_area)/world_average_area) + "%")
if Indonesia_population > world_average_population:
print("Indonesia's population is higher than the world's average by " + "{:.2f}".format((Indonesia_population - world_average_population)/world_average_population) + "%")
else:
print("Indonesia's population is lower than the world's average by " + "{:.2f}".format((world_average_population - Indonesia_population)/world_average_population) + "%")
Indonesia's area is higher than the world's average by 2.43% Indonesia's population is higher than the world's average by 6.94%
Next, let's see all countries with:
population
area
%%sql
SELECT name AS Country, population, area
FROM facts
WHERE population > (SELECT AVG(population)
FROM facts
WHERE name != 'World'
)
AND area < (SELECT AVG(area)
FROM facts
WHERE name != 'World'
);
Done.
Country | population | area |
---|---|---|
Bangladesh | 168957745 | 148460 |
Germany | 80854408 | 357022 |
Iraq | 37056169 | 438317 |
Italy | 61855120 | 301340 |
Japan | 126919659 | 377915 |
Korea, South | 49115196 | 99720 |
Morocco | 33322699 | 446550 |
Philippines | 100998376 | 300000 |
Poland | 38562189 | 312685 |
Spain | 48146134 | 505370 |
Thailand | 67976405 | 513120 |
Uganda | 37101745 | 241038 |
United Kingdom | 64088222 | 243610 |
Vietnam | 94348835 | 331210 |
%%sql
SELECT name AS Country, population_growth AS 'Growth rate'
FROM facts
ORDER BY population_growth DESC
LIMIT 10;
Done.
Country | Growth rate |
---|---|
South Sudan | 4.02 |
Malawi | 3.32 |
Burundi | 3.28 |
Niger | 3.25 |
Uganda | 3.24 |
Qatar | 3.07 |
Burkina Faso | 3.03 |
Mali | 2.98 |
Cook Islands | 2.95 |
Iraq | 2.93 |
Almost all of these countries are located in the African continent.
%%sql
SELECT name AS Country, population_growth AS 'Growth rate'
FROM facts
WHERE population_growth != 'NULL'
ORDER BY population_growth
LIMIT 10;
Done.
Country | Growth rate |
---|---|
Holy See (Vatican City) | 0.0 |
Cocos (Keeling) Islands | 0.0 |
Greenland | 0.0 |
Pitcairn Islands | 0.0 |
Greece | 0.01 |
Norfolk Island | 0.01 |
Tokelau | 0.01 |
Falkland Islands (Islas Malvinas) | 0.01 |
Guyana | 0.02 |
Slovakia | 0.02 |
%%sql
SELECT name AS Country, area_water, area_land,
ROUND(CAST(area_water AS Float)/CAST(area_land AS Float), 1) AS water_to_land_ratio
FROM facts
ORDER BY water_to_land_ratio DESC
LIMIT 10;
Done.
Country | area_water | area_land | water_to_land_ratio |
---|---|---|---|
British Indian Ocean Territory | 54340 | 60 | 905.7 |
Virgin Islands | 1564 | 346 | 4.5 |
Puerto Rico | 4921 | 8870 | 0.6 |
Bahamas, The | 3870 | 10010 | 0.4 |
Guinea-Bissau | 8005 | 28120 | 0.3 |
Malawi | 24404 | 94080 | 0.3 |
Eritrea | 16600 | 101000 | 0.2 |
Liberia | 15049 | 96320 | 0.2 |
Netherlands | 7650 | 33893 | 0.2 |
Uganda | 43938 | 197100 | 0.2 |
Again, I'm not sure if the area_water
includes the territorial water, baseline, internal waters, and exclusive economic zone (EEZ), or anything related to the maritime boundary.
%%sql
SELECT name AS Country, population, population_growth,
ROUND(CAST(population AS Float)/100 * population_growth, 0) AS 'added people in 2016'
FROM facts
WHERE name != 'World'
ORDER BY ROUND(CAST(population AS Float)/100 * population_growth, 0) DESC
LIMIT 10;
Done.
Country | population | population_growth | added people in 2016 |
---|---|---|---|
India | 1251695584 | 1.22 | 15270686.0 |
China | 1367485388 | 0.45 | 6153684.0 |
Nigeria | 181562056 | 2.45 | 4448270.0 |
Pakistan | 199085847 | 1.46 | 2906653.0 |
Ethiopia | 99465819 | 2.89 | 2874562.0 |
Bangladesh | 168957745 | 1.6 | 2703324.0 |
United States | 321368864 | 0.78 | 2506677.0 |
Indonesia | 255993674 | 0.92 | 2355142.0 |
Congo, Democratic Republic of the | 79375136 | 2.45 | 1944691.0 |
Philippines | 100998376 | 1.61 | 1626074.0 |
I spotted my country there haha.. countries from three continents are listed here, so it seems like the increase in people are pretty balanced around the world.
%%sql
SELECT name AS Country, birth_rate, death_rate,
ROUND(death_rate - birth_rate/death_rate, 2) AS death_birth_ratio
FROM facts
ORDER BY death_birth_ratio DESC
LIMIT 10;
Done.
Country | birth_rate | death_rate | death_birth_ratio |
---|---|---|---|
Bulgaria | 8.92 | 14.44 | 13.82 |
Ukraine | 10.72 | 14.46 | 13.72 |
Latvia | 10.0 | 14.31 | 13.61 |
Lithuania | 10.1 | 14.27 | 13.56 |
Lesotho | 25.47 | 14.89 | 13.18 |
Serbia | 9.08 | 13.66 | 13.0 |
Russia | 11.6 | 13.69 | 12.84 |
Belarus | 10.7 | 13.36 | 12.56 |
Namibia | 19.8 | 13.91 | 12.49 |
Hungary | 9.16 | 12.73 | 12.01 |
Mostly in Eastern Europe countries.
%%sql
SELECT name AS Country, birth_rate, death_rate,
ROUND(birth_rate/death_rate, 1) AS birth_death_ratio
FROM facts
ORDER BY birth_death_ratio DESC
LIMIT 10;
Done.
Country | birth_rate | death_rate | birth_death_ratio |
---|---|---|---|
Gaza Strip | 31.11 | 3.04 | 10.2 |
Kuwait | 19.91 | 2.18 | 9.1 |
Iraq | 31.45 | 3.77 | 8.3 |
United Arab Emirates | 15.43 | 1.97 | 7.8 |
Oman | 24.44 | 3.36 | 7.3 |
Jordan | 25.37 | 3.79 | 6.7 |
Solomon Islands | 25.77 | 3.85 | 6.7 |
West Bank | 22.99 | 3.5 | 6.6 |
Qatar | 9.84 | 1.53 | 6.4 |
Marshall Islands | 25.6 | 4.21 | 6.1 |
Mostly from Middle Eastern countries.
%%sql
SELECT name AS Country, population, area,
ROUND(CAST(population AS Float) / area, 2) AS 'population/area'
FROM facts
ORDER BY ROUND(CAST(population AS Float) / area, 2) DESC
LIMIT 10;
Done.
Country | population | area | population/area |
---|---|---|---|
Macau | 592731 | 28 | 21168.96 |
Monaco | 30535 | 2 | 15267.5 |
Singapore | 5674472 | 697 | 8141.28 |
Hong Kong | 7141106 | 1108 | 6445.04 |
Gaza Strip | 1869055 | 360 | 5191.82 |
Gibraltar | 29258 | 6 | 4876.33 |
Bahrain | 1346613 | 760 | 1771.86 |
Maldives | 393253 | 298 | 1319.64 |
Malta | 413965 | 316 | 1310.02 |
Bermuda | 70196 | 54 | 1299.93 |
area water
and area land
data might not be that accurate.