The goal of this project is figuring out the countries that are densely populated by listing the countries that have a population above average and an area below average.
We'll work with the CIA factbook database, and therefore we'll use SQL language to extract the list of countries.
Loading SQLite and database:
%%capture
%load_ext sql
%sql sqlite:///factbook.db
Checking proper load of the table:
%%sql
SELECT *
FROM sqlite_master
WHERE type='table';
* sqlite:///factbook.db Done.
type | name | tbl_name | rootpage | sql |
---|---|---|---|---|
table | sqlite_sequence | sqlite_sequence | 3 | CREATE TABLE sqlite_sequence(name,seq) |
table | facts | facts | 47 | CREATE TABLE "facts" ("id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "code" varchar(255) NOT NULL, "name" varchar(255) NOT NULL, "area" integer, "area_land" integer, "area_water" integer, "population" integer, "population_growth" float, "birth_rate" float, "death_rate" float, "migration_rate" float) |
Inspection of the first 5 rows of our table of interest:
%%sql
SELECT *
FROM facts
LIMIT 5;
* sqlite:///factbook.db Done.
id | code | name | area | area_land | area_water | population | population_growth | birth_rate | death_rate | migration_rate |
---|---|---|---|---|---|---|---|---|---|---|
1 | af | Afghanistan | 652230 | 652230 | 0 | 32564342 | 2.32 | 38.57 | 13.89 | 1.51 |
2 | al | Albania | 28748 | 27398 | 1350 | 3029278 | 0.3 | 12.92 | 6.58 | 3.3 |
3 | ag | Algeria | 2381741 | 2381741 | 0 | 39542166 | 1.84 | 23.67 | 4.31 | 0.92 |
4 | an | Andorra | 468 | 468 | 0 | 85580 | 0.12 | 8.13 | 6.96 | 0.0 |
5 | ao | Angola | 1246700 | 1246700 | 0 | 19625353 | 2.78 | 38.78 | 11.49 | 0.46 |
Here are the descriptions for some of the columns:
Let's start by calculating some summary statistics and look for any outlier countries.
We'll begin by writing a single query that returns the following:
%%sql
SELECT MIN(population) AS 'Minimum_population', MAX(population) AS 'Maximum_population', MIN(population_growth) AS 'Minimum_population_growth', MAX(population_growth) AS 'Maximum_population_growth'
FROM facts;
* sqlite:///factbook.db Done.
Minimum_population | Maximum_population | Minimum_population_growth | Maximum_population_growth |
---|---|---|---|
0 | 7256490011 | 0.0 | 4.02 |
We see a few interesting things in the summary statistics on the previous screen:
Next, we'll check what countries are these:
%%sql
SELECT name
FROM facts
WHERE population = (SELECT MIN(population) FROM facts);
* sqlite:///factbook.db Done.
name |
---|
Antarctica |
%%sql
SELECT name
FROM facts
WHERE population = (SELECT MAX(population) FROM facts);
* sqlite:///factbook.db Done.
name |
---|
World |
Now that we've checked that the country with a population is zero is Antarctica -which is correct- and that there's a row for the whole world, we'll repeat our summary statistics excluding the row for the world, as it doesn't belong to the same category as the countries:
%%sql
SELECT MIN(population) AS 'Minimum_population', MAX(population) AS 'Maximum_population', MIN(population_growth) AS 'Minimum_population_growth', MAX(population_growth) AS 'Maximum_population_growth'
FROM facts
WHERE name != 'World';
* sqlite:///factbook.db Done.
Minimum_population | Maximum_population | Minimum_population_growth | Maximum_population_growth |
---|---|---|---|
0 | 1367485388 | 0.0 | 4.02 |
We'll also calculate the average value of population and area columns:
%%sql
SELECT CAST(AVG(population) AS Integer) AS 'Average_population', CAST(AVG(area) AS Integer) AS 'Average_area'
FROM facts
WHERE name != 'World';
* sqlite:///factbook.db Done.
Average_population | Average_area |
---|---|
32242666 | 555093 |
To answer our question, we'll build on the query we wrote for the previous screen to find countries that are densely populated:
%%sql
SELECT name, area, population
FROM facts
WHERE population > (
SELECT AVG(population)
FROM facts
WHERE name != 'World'
)
AND area < (
SELECT AVG(area)
FROM facts
WHERE name != 'World'
)
AND name != 'World'
ORDER BY area;
* sqlite:///factbook.db Done.
name | area | population |
---|---|---|
Korea, South | 99720 | 49115196 |
Bangladesh | 148460 | 168957745 |
Uganda | 241038 | 37101745 |
United Kingdom | 243610 | 64088222 |
Philippines | 300000 | 100998376 |
Italy | 301340 | 61855120 |
Poland | 312685 | 38562189 |
Vietnam | 331210 | 94348835 |
Germany | 357022 | 80854408 |
Japan | 377915 | 126919659 |
Iraq | 438317 | 37056169 |
Morocco | 446550 | 33322699 |
Spain | 505370 | 48146134 |
Thailand | 513120 | 67976405 |
The analysis above shows that there are 14 countries whose populations and areas are above and below average, respectively.
To complete our analysis, we'll calculate the population/area ratio and list the 14 countries with the highest value, to check if both lists match:
%%sql
SELECT name, population, area, population / area AS 'Density_ratio'
FROM facts
WHERE name != 'World'
ORDER BY Density_ratio DESC
LIMIT 14;
* sqlite:///factbook.db Done.
name | population | area | Density_ratio |
---|---|---|---|
Macau | 592731 | 28 | 21168 |
Monaco | 30535 | 2 | 15267 |
Singapore | 5674472 | 697 | 8141 |
Hong Kong | 7141106 | 1108 | 6445 |
Gaza Strip | 1869055 | 360 | 5191 |
Gibraltar | 29258 | 6 | 4876 |
Bahrain | 1346613 | 760 | 1771 |
Maldives | 393253 | 298 | 1319 |
Malta | 413965 | 316 | 1310 |
Bermuda | 70196 | 54 | 1299 |
Sint Maarten | 39689 | 34 | 1167 |
Bangladesh | 168957745 | 148460 | 1138 |
Guernsey | 66080 | 78 | 847 |
Jersey | 97294 | 116 | 838 |
The analysis above shows that, from the 14 countries with a highest density ratio (population / area), only Bangladesh was on our list containing the countries with population above average and area below average.
The rest of the countries on this list, altough more densely populated than those of the previous list, do not pass the 'population above average' filter that we used previously.
We have learned which countries have a population above average with an area below average - which makes them densely populated countries. These are mostly medium-sized countries, such as South Korea, Italy, Spain, etc.
We've also learned which countries are the most densely populated, by dividing their population by their area - which makes them the most densely populated countries. These are mostly small-sized countries, such as Macau, Gibraltar, Malta, etc.
However, the most important lesson comes from the fact that these lists do not coincide (except for Bangladesh). The lesson to learn here is that measuring two variables independently (i.e., area and population) is not the same as measuring their relationship (i.e., the density ratio).