The CIA factbook, also known as the World Factbook, is an annual publication of the US Central Intelligence Agency. It provides basic intelligence by summarizing information about countries and regions worldwide. The factbook contains a mix of demographic and geographic data, among many more.
This project is aimed at exploring the SQLite factbook.db
database and analyzing the data based on demographic indicators like population, mortality, and fertility
Let's use the following code to connect our Jupyter Notebook to the database file.
%%capture
%load_ext sql
%sql sqlite:///factbook.db
'Connected: None@factbook.db'
Let's start by exploring the data to understand its content.
%%sql
SELECT *
FROM sqlite_master
WHERE type='table';
Done.
type | name | tbl_name | rootpage | sql |
---|---|---|---|---|
table | sqlite_sequence | sqlite_sequence | 3 | CREATE TABLE sqlite_sequence(name,seq) |
table | facts | facts | 47 | CREATE TABLE "facts" ("id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "code" varchar(255) NOT NULL, "name" varchar(255) NOT NULL, "area" integer, "area_land" integer, "area_water" integer, "population" integer, "population_growth" float, "birth_rate" float, "death_rate" float, "migration_rate" float) |
We have two tables from the database. We are going to use the 'facts'
table for our analysis.
Let's return its first 5 rows to make sense of the data it contains.
%%sql
SELECT *
FROM facts
LIMIT 5
Done.
id | code | name | area | area_land | area_water | population | population_growth | birth_rate | death_rate | migration_rate |
---|---|---|---|---|---|---|---|---|---|---|
1 | af | Afghanistan | 652230 | 652230 | 0 | 32564342 | 2.32 | 38.57 | 13.89 | 1.51 |
2 | al | Albania | 28748 | 27398 | 1350 | 3029278 | 0.3 | 12.92 | 6.58 | 3.3 |
3 | ag | Algeria | 2381741 | 2381741 | 0 | 39542166 | 1.84 | 23.67 | 4.31 | 0.92 |
4 | an | Andorra | 468 | 468 | 0 | 85580 | 0.12 | 8.13 | 6.96 | 0.0 |
5 | ao | Angola | 1246700 | 1246700 | 0 | 19625353 | 2.78 | 38.78 | 11.49 | 0.46 |
Here are the descriptions for some of the columns:
name
ā the name of the country.area
ā the country's total area (both land and water).area_land
ā the country's land area in square kilometers.area_water
ā the country's waterarea in square kilometers.population
ā the country's population.population_growth
ā the country's population growth as a percentage.birth_rate
ā the country's birth rate, or the number of births per year per 1,000 people.death_rate
ā the country's death rate, or the number of death per year per 1,000 people.Let's start by calculating some summary statistics and look for any outlier countries.
%%sql
SELECT
MIN(population) AS min_pop,
MAX(population) AS max_pop,
MIN(population_growth) AS min_pop_growth,
MAX(population_growth) AS max_pop_growth
FROM
facts
Done.
min_pop | max_pop | min_pop_growth | max_pop_growth |
---|---|---|---|
0 | 7256490011 | 0.0 | 4.02 |
As shown above,
So let's use subqueries
to find out these countries with minimum and maximum populations respectively.
%%sql
SELECT
name
FROM
facts
WHERE
population = (SELECT
MIN(population) AS min_pop
FROM
facts) -- country(ies) with the lowest population
Done.
name |
---|
Antarctica |
It seems like the table contains a row for Antarctica, which explains the population of 0. This seems to match the CIA Factbook page for Antarctica.
%%sql
SELECT
name
FROM
facts
WHERE
population = (SELECT
MAX(population) AS max_pop
FROM
facts) -- country(ies) with the highest population
Done.
name |
---|
World |
It also seems like the table contains a row for the whole world, which explains the population of over 7.2 billion. Therefore, we should remove this row and revisit our summary statistics.
%%sql
SELECT
MIN(population) AS min_pop,
MAX(population) AS max_pop,
MIN(population_growth) AS min_pop_growth,
MAX(population_growth) AS max_pop_growth
FROM
facts
WHERE
name <> 'World'
Done.
min_pop | max_pop | min_pop_growth | max_pop_growth |
---|---|---|---|
0 | 1367485388 | 0.0 | 4.02 |
There is a country whose population is close to 1.4 billion!
Density of a country depends on its population and area. So let's explore the average population and area. We will exclude the world
row again.
%%sql
SELECT
ROUND(AVG(population),2) AS avg_pop,
ROUND(AVG(area),2) AS avg_area
FROM
facts
WHERE
name <> 'World'
Done.
avg_pop | avg_area |
---|---|
32242666.57 | 555093.55 |
We see that the average population is around 32 million and the average area is 555 thousand square kilometers.
To find the most densely populated countries, we'll identify countries that have the following:
%%sql
SELECT
name,
population,
area
FROM
facts
WHERE
population > (SELECT
AVG(population) AS avg_pop
FROM
facts
WHERE
name <> 'World')
AND area < (SELECT
AVG(area) AS avg_area
FROM
facts
WHERE
name <> 'World')
Done.
name | population | area |
---|---|---|
Bangladesh | 168957745 | 148460 |
Germany | 80854408 | 357022 |
Iraq | 37056169 | 438317 |
Italy | 61855120 | 301340 |
Japan | 126919659 | 377915 |
Korea, South | 49115196 | 99720 |
Morocco | 33322699 | 446550 |
Philippines | 100998376 | 300000 |
Poland | 38562189 | 312685 |
Spain | 48146134 | 505370 |
Thailand | 67976405 | 513120 |
Uganda | 37101745 | 241038 |
United Kingdom | 64088222 | 243610 |
Vietnam | 94348835 | 331210 |
Some of these countries are generally known to be densely populated, so we have confidence in our results!
To further my practice of using SQL, I decide to take the analysis further to answer the following questions:
Let's dive in.
%%sql
SELECT -- countries with highest population
name,
population
FROM
facts
WHERE
name NOT IN ('World', 'European Union')
ORDER BY 2 DESC
LIMIT 10
Done.
name | population |
---|---|
China | 1367485388 |
India | 1251695584 |
United States | 321368864 |
Indonesia | 255993674 |
Brazil | 204259812 |
Pakistan | 199085847 |
Nigeria | 181562056 |
Bangladesh | 168957745 |
Russia | 142423773 |
Japan | 126919659 |
China is the country with most people with a population of almost 1.4 billion people.
%%sql
SELECT -- country with highest population growth
name,
population_growth
FROM
facts
WHERE
population_growth = (SELECT MAX(population_growth)
FROM
facts
WHERE
name <> 'World'
)
Done.
name | population_growth |
---|---|
South Sudan | 4.02 |
South Sudan is the country with the highest growth rate with a population growth of 4.02%.
To find the country(ies) with the highest ratios of water to land, we need to find the highest water to land ratio.
%%sql
SELECT
ROUND(MAX(CAST(area_water AS Float) / area_land), 2) AS max_water_land_ratio
FROM
facts
Done.
max_water_land_ratio |
---|
905.67 |
Then we pass this as a subquery to our WHERE
clause to find the country with this ratio (which is the highest).
%%sql
SELECT
name,
ROUND(CAST(area_water AS Float) / area_land, 2) AS water_land_ratio
FROM
facts
WHERE
water_land_ratio = (SELECT
ROUND(MAX(CAST(area_water AS Float) / area_land), 2) AS max_water_land_ratio
FROM
facts
WHERE
name <> 'World')
Done.
name | water_land_ratio |
---|---|
British Indian Ocean Territory | 905.67 |
It seems that our data contains ocean which is not a country. Let's revisit our query to exclude the British Indian Ocean Territory
.
*Revisiting query*
%%sql
SELECT
ROUND(MAX(CAST(area_water AS Float) / area_land), 2) AS max_water_land_ratio
FROM
facts
WHERE
name <> 'World'
AND
name <> 'British Indian Ocean Territory'
Done.
max_water_land_ratio |
---|
4.52 |
It seems there is a country in which the ratio of water to land is 4.52. Let's find out which country is that.
%%sql
SELECT
name,
area_water,
area_land,
ROUND(CAST(area_water AS Float) / CAST(area_land AS Float), 2) AS water_land_ratio
FROM
facts
WHERE
water_land_ratio = (SELECT
ROUND(MAX(CAST(area_water AS Float) / area_land), 2) AS max_water_land_ratio
FROM
facts
WHERE
name <> 'World'
AND
name <> 'British Indian Ocean Territory')
Done.
name | area_water | area_land | water_land_ratio |
---|---|---|---|
Virgin Islands | 1564 | 346 | 4.52 |
Virgin Islands is the country with the highest water to land ratio with 4.52.
To find the countries with more water than land, we need the difference between water and land to be positive.
%%sql
SELECT
name,
area_water,
area_land,
area_water - area_land AS water_land_diff
FROM
facts
WHERE
water_land_diff > 0
AND
name <> 'World'
AND
name <> 'British Indian Ocean Territory'
Done.
name | area_water | area_land | water_land_diff |
---|---|---|---|
Virgin Islands | 1564 | 346 | 1218 |
Virgin Islands
is the only country that thas more water than land.
To identify the fastest rising populations, we select countries whose yearly births are bigger than the yearly deaths.
%%sql
SELECT
name,
ROUND(CAST(population AS Float) * 0.001 * birth_rate, 2) AS new_births,
ROUND(CAST(population AS Float) * 0.001 * death_rate, 2) AS new_deaths,
ROUND(ROUND(CAST(population AS Float) * 0.001 * birth_rate, 2) - ROUND(CAST(population AS Float) * 0.001 * death_rate, 2),2)
AS additional_population
FROM
facts
WHERE
name NOT IN ('World','British Indian Ocean Territory')
ORDER BY
additional_population DESC
LIMIT 10
Done.
name | new_births | new_deaths | additional_population |
---|---|---|---|
India | 24470648.67 | 9162411.67 | 15308237.0 |
China | 17079892.5 | 10297164.97 | 6782727.53 |
Nigeria | 6833995.79 | 2342150.52 | 4491845.27 |
Pakistan | 4495358.43 | 1292067.15 | 3203291.28 |
Ethiopia | 3707091.07 | 814625.06 | 2892466.01 |
Indonesia | 4280214.23 | 1630679.7 | 2649534.53 |
Bangladesh | 3571766.73 | 947852.95 | 2623913.78 |
Congo, Democratic Republic of the | 2768604.74 | 799307.62 | 1969297.12 |
Philippines | 2451230.59 | 617100.08 | 1834130.51 |
Mexico | 2286217.27 | 640335.62 | 1645881.65 |
India, China and Nigeria are the top 3 countries that will add most people to their population in next year.
%%sql
SELECT
name,
death_rate,
birth_rate,
ROUND(death_rate / birth_rate, 2) AS death_birth_ratio
FROM
facts
WHERE
death_rate > birth_rate
ORDER BY
death_birth_ratio
LIMIT
10
Done.
name | death_rate | birth_rate | death_birth_ratio |
---|---|---|---|
Austria | 9.42 | 9.41 | 1.0 |
Moldova | 12.59 | 12.0 | 1.05 |
Poland | 10.19 | 9.74 | 1.05 |
Czech Republic | 10.34 | 9.63 | 1.07 |
Bosnia and Herzegovina | 9.75 | 8.87 | 1.1 |
Italy | 10.19 | 8.74 | 1.17 |
Estonia | 12.4 | 10.51 | 1.18 |
Russia | 13.69 | 11.6 | 1.18 |
Portugal | 11.02 | 9.27 | 1.19 |
Japan | 9.51 | 7.93 | 1.2 |
%%sql
SELECT
name,
birth_rate
FROM
facts
ORDER BY
birth_rate DESC
LIMIT
10
Done.
name | birth_rate |
---|---|
Niger | 45.45 |
Mali | 44.99 |
Uganda | 43.79 |
Zambia | 42.13 |
Burkina Faso | 42.03 |
Burundi | 42.01 |
Malawi | 41.56 |
Somalia | 40.45 |
Angola | 38.78 |
Mozambique | 38.58 |
%%sql
SELECT
name,
birth_rate
FROM
facts
WHERE
name <> 'World'
AND birth_rate IS NOT NULL
ORDER BY
birth_rate
LIMIT
10
Done.
name | birth_rate |
---|---|
Monaco | 6.65 |
Saint Pierre and Miquelon | 7.42 |
Japan | 7.93 |
Andorra | 8.13 |
Korea, South | 8.19 |
Singapore | 8.27 |
Slovenia | 8.42 |
Germany | 8.47 |
Taiwan | 8.47 |
San Marino | 8.63 |
This Wikipedia article provides some insights into the low birth rates recorded in Asian regions (Japan, South Korea, Taiwan, and Singapore). Aging and socio-economic challenges are described as the major reasons for these results.
%%sql
SELECT
name,
death_rate
FROM
facts
ORDER BY
death_rate DESC
LIMIT
10
Done.
name | death_rate |
---|---|
Lesotho | 14.89 |
Ukraine | 14.46 |
Bulgaria | 14.44 |
Guinea-Bissau | 14.33 |
Latvia | 14.31 |
Chad | 14.28 |
Lithuania | 14.27 |
Namibia | 13.91 |
Afghanistan | 13.89 |
Central African Republic | 13.8 |
African countries of Lesotho anf Guinea_Bissau occupies the 1st and 4th positions with 14.89 deaths per 100 people and 14.33 deaths per 100 people respectively. According to the CDC and the Borgen project, the leading causes of death in these territories are infections and poverty.
While in the case of the European countries of Ukraine, Bulgaria, Latvia that occupy the 2nd, 3rd and 5th positions respectively, studies reveal that the leading causes of death are non-infectious diseases like circulatory, digestive, or respiratory diseases and cancers. In Ukraine and Latvia, poor healthcare administration contributes to death.
%%sql
SELECT
name,
death_rate
FROM
facts
WHERE
name <> 'World'
AND death_rate IS NOT NULL
ORDER BY
death_rate
LIMIT
10
Done.
name | death_rate |
---|---|
Qatar | 1.53 |
United Arab Emirates | 1.97 |
Kuwait | 2.18 |
Bahrain | 2.69 |
Gaza Strip | 3.04 |
Turks and Caicos Islands | 3.1 |
Saudi Arabia | 3.33 |
Oman | 3.36 |
Singapore | 3.43 |
West Bank | 3.5 |
Let's find the countries with the highest popualtion/area ratio (without considering the global average population and area) and then compare our result with that of countries with highest population density.
%%sql
SELECT
name,
area,
population,
ROUND(CAST(population AS Float) / area, 2) AS pop_density
FROM
facts
WHERE
birth_rate <> 'None'
AND
area <> 'None'
ORDER BY
pop_density DESC
LIMIT
10
Done.
name | area | population | pop_density |
---|---|---|---|
Macau | 28 | 592731 | 21168.96 |
Monaco | 2 | 30535 | 15267.5 |
Singapore | 697 | 5674472 | 8141.28 |
Hong Kong | 1108 | 7141106 | 6445.04 |
Gaza Strip | 360 | 1869055 | 5191.82 |
Gibraltar | 6 | 29258 | 4876.33 |
Bahrain | 760 | 1346613 | 1771.86 |
Maldives | 298 | 393253 | 1319.64 |
Malta | 316 | 413965 | 1310.02 |
Bermuda | 54 | 70196 | 1299.93 |
This list is different from the first one (with Bangladesh on top) because in this case, we didn't consider the world average population and area.
Notably, all of these locations are tiny. The top five have a combined land area of less than 3,000 kmĀ². Moreover, most are also highly urbanized centers of wealth and finance, with a tendency toward cities and skyscrapers rather than rolling fields or vast wilderness areas.
Throughout this project, we used the CIA World factbook data to identify the leading and lagging territories of the world. Our analysis is based on demographic indicators like population, mortality, and fertility.
Some of the key findings of our analysis are: