In this project, we will be working with the data from the Central Intelligence Agency World Factbook. The database, factbook.db
used in this analysis can be downloaded here . We will be working with SQLite
in this project to explore and analyse the database.
%%capture
%load_ext sql
%sql sqlite:///factbook.db
'Connected: None@factbook.db'
%%sql
SELECT * from sqlite_master WHERE type='table';
Done.
type | name | tbl_name | rootpage | sql |
---|---|---|---|---|
table | sqlite_sequence | sqlite_sequence | 3 | CREATE TABLE sqlite_sequence(name,seq) |
table | facts | facts | 47 | CREATE TABLE "facts" ("id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "code" varchar(255) NOT NULL, "name" varchar(255) NOT NULL, "area" integer, "area_land" integer, "area_water" integer, "population" integer, "population_growth" float, "birth_rate" float, "death_rate" float, "migration_rate" float) |
From the above cell, we see that there is only one table in the database named facts
that is relevant to our analysis task in hand. Let's view the first 5 rows in the the table facts
%%sql
SELECT * FROM facts LIMIT 5;
Done.
id | code | name | area | area_land | area_water | population | population_growth | birth_rate | death_rate | migration_rate |
---|---|---|---|---|---|---|---|---|---|---|
1 | af | Afghanistan | 652230 | 652230 | 0 | 32564342 | 2.32 | 38.57 | 13.89 | 1.51 |
2 | al | Albania | 28748 | 27398 | 1350 | 3029278 | 0.3 | 12.92 | 6.58 | 3.3 |
3 | ag | Algeria | 2381741 | 2381741 | 0 | 39542166 | 1.84 | 23.67 | 4.31 | 0.92 |
4 | an | Andorra | 468 | 468 | 0 | 85580 | 0.12 | 8.13 | 6.96 | 0.0 |
5 | ao | Angola | 1246700 | 1246700 | 0 | 19625353 | 2.78 | 38.78 | 11.49 | 0.46 |
We have the following columns in the facts
table:
name
: the name of the countryarea
: the total land and sea area of the countrypopulation
: the country's populationpopulation_growth
: the population growth in percentagebirth_rate
: the number of births per 1000 people in a yeardeath_rate
: the number of deaths per 1000 people in a yeararea
: the total of the country area including both land and waterarea_land
: land area in square kilometersarea_water
: water area in square kilometersLet's calculate the minimum and maximum of the population and also of the population growth from the facts
table.
%%sql
SELECT
MIN(population) "Minimum population",
MAX(population) "Maximum population",
MIN(population_growth) "Minimum population growth",
MAX(population_growth) "Maximum population growth"
FROM facts;
Done.
Minimum population | Maximum population | Minimum population growth | Maximum population growth |
---|---|---|---|
0 | 7256490011 | 0.0 | 4.02 |
Now let's also retrieve the names of the countries associated with minimum and maximum population.
%%sql
SELECT
name,
Min "Mininum population"
FROM
(
SELECT *,
MIN(population)Min
FROM facts
);
Done.
name | Mininum population |
---|---|
Antarctica | 0 |
%%sql
SELECT
name,
Max "Maximum population"
FROM
(
SELECT
*,
MAX(population) Max
FROM facts
)
Done.
name | Maximum population |
---|---|
World | 7256490011 |
The above results shows Antartica has a population of 0 and matchs the results in the CIA page
Let's perform more mathematical aggregations on the database on the columns population
and area
%%sql
SELECT
AVG(population) "Average Population",
AVG(area) "Average Area"
FROM facts;
Done.
Average Population | Average Area |
---|---|
62094928.32231405 | 555093.546184739 |
To find the countries that are densly populated, we have to find the countries with above average population and countries that have below average area.
%%sql
WITH avg_population AS (
SELECT name, population,area FROM facts
GROUP BY name
HAVING AVG(population) > 62094928.32231405
AND
AVG(area) < 555093.546184739)
SELECT
name "Countries with population above average",
population "Population",
area "Area"
FROM avg_population
ORDER BY population DESC;
Done.
Countries with population above average | Population | Area |
---|---|---|
Bangladesh | 168957745 | 148460 |
Japan | 126919659 | 377915 |
Philippines | 100998376 | 300000 |
Vietnam | 94348835 | 331210 |
Germany | 80854408 | 357022 |
Thailand | 67976405 | 513120 |
United Kingdom | 64088222 | 243610 |
Bangladesh is the most densely populated country with above average population.
%%sql
SELECT
name,
area_water,
area_land,
(CAST(area_water AS FLOAT)/CAST(area_land AS FLOAT)) "water_to_land_ratio"
FROM facts
ORDER BY 4 DESC
LIMIT 3;
Done.
name | area_water | area_land | water_to_land_ratio |
---|---|---|---|
British Indian Ocean Territory | 54340 | 60 | 905.6666666666666 |
Virgin Islands | 1564 | 346 | 4.520231213872832 |
Puerto Rico | 4921 | 8870 | 0.5547914317925592 |
The top 3 countries with more water to land ratio are British Indian Ocean Territory,Virgin Islands and Puerto Rico.
%%sql
SELECT
name,
population,
CAST((population_growth*population)/100 AS FLOAT) "predicted population growth next year"
FROM facts
ORDER BY 2 desc
LIMIT 5
Done.
name | population | predicted population growth next year |
---|---|---|
World | 7256490011 | 78370092.1188 |
China | 1367485388 | 6153684.246 |
India | 1251695584 | 15270686.1248 |
European Union | 513949445 | 1284873.6125 |
United States | 321368864 | 2506677.1392 |
The World on the whole will add the most people to the population as per the results, which is very broad. However, taking only the country names into consideration, China will add more people to their population in the coming year.
%%sql
SELECT
name,
death_rate,
birth_rate
FROM facts
WHERE
death_rate > birth_rate
ORDER BY death_rate DESC
LIMIT 3;
Done.
name | death_rate | birth_rate |
---|---|---|
Ukraine | 14.46 | 10.72 |
Bulgaria | 14.44 | 8.92 |
Latvia | 14.31 | 10.0 |
Ukraine has the highest death rates than the birth rates from the list of other countries in the database.