In this project, we'll use SQL to analyze CIA World Factbook.you can download the database here.
The aim of this project is to find out the densely populated country.
The code below,will connect our jupyter Notebook to our database file.
%%capture
%load_ext sql
%sql sqlite:///factbook.db
'Connected: None@factbook.db'
The output above confirms to us that our jupyter notebook has been successfully connected to our database.
Let's now name our table , the code below helps to query the database to get information of our table
%%sql
SELECT *
FROM sqlite_master
WHERE type='table';
Done.
type | name | tbl_name | rootpage | sql |
---|---|---|---|---|
table | sqlite_sequence | sqlite_sequence | 3 | CREATE TABLE sqlite_sequence(name,seq) |
table | facts | facts | 47 | CREATE TABLE "facts" ("id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "code" varchar(255) NOT NULL, "name" varchar(255) NOT NULL, "area" integer, "area_land" integer, "area_water" integer, "population" integer, "population_growth" float, "birth_rate" float, "death_rate" float, "migration_rate" float) |
The output has confirm to us the name of the table that is facts
The LIMIT we have used in the code cell below help us to specify the number of rows we want to display.
%%sql
SELECT *
FROM facts
LIMIT 5
Done.
id | code | name | area | area_land | area_water | population | population_growth | birth_rate | death_rate | migration_rate |
---|---|---|---|---|---|---|---|---|---|---|
1 | af | Afghanistan | 652230 | 652230 | 0 | 32564342 | 2.32 | 38.57 | 13.89 | 1.51 |
2 | al | Albania | 28748 | 27398 | 1350 | 3029278 | 0.3 | 12.92 | 6.58 | 3.3 |
3 | ag | Algeria | 2381741 | 2381741 | 0 | 39542166 | 1.84 | 23.67 | 4.31 | 0.92 |
4 | an | Andorra | 468 | 468 | 0 | 85580 | 0.12 | 8.13 | 6.96 | 0.0 |
5 | ao | Angola | 1246700 | 1246700 | 0 | 19625353 | 2.78 | 38.78 | 11.49 | 0.46 |
Here are the descriptions for some of the columns:
name
- the name of the countryarea
— the country's total area (both land and water).area_land
— the country's land area in square kilometersarea_water
— the country's waterarea in square kilometers.population
— the country's populationpopulation_growth
— the country's population growth as a percentagebirth_rate
— the country's birth rate, or the number of births per year per 1,000 peopledeath_rate
— the country's death rate, or the number of death per year per 1,000 people** let's now work on some summary statistic**
%%sql
SELECT min(population),max(population),min(population_growth),max(population_growth)
FROM facts
Done.
min(population) | max(population) | min(population_growth) | max(population_growth) |
---|---|---|---|
0 | 7256490011 | 0.0 | 4.02 |
From the output, we can notice few interesting things in the summary statistics ;
We'll therefore write a query that returns the countries with minimum population and those with maximum populaution
%%sql
SELECT *
FROM facts
WHERE population = 0
GROUP BY name
Done.
id | code | name | area | area_land | area_water | population | population_growth | birth_rate | death_rate | migration_rate |
---|---|---|---|---|---|---|---|---|---|---|
250 | ay | Antarctica | None | 280000 | None | 0 | None | None | None | None |
It seems like the table contains a row for Antarctica, which explains the population of ** 0** which seems to match the CIA Factbook page for Antarctica
%%sql
SELECT *
FROM facts
WHERE population = (select max(population)
FROM facts)
GROUP BY name
Done.
id | code | name | area | area_land | area_water | population | population_growth | birth_rate | death_rate | migration_rate |
---|---|---|---|---|---|---|---|---|---|---|
261 | xx | World | None | None | None | 7256490011 | 1.08 | 18.6 | 7.8 | None |
The output above shows a table that contains a row for the whole World, which explains the population of over 7.2 billion.
With this information, We are going to recompute the summary statistic we found in the above outputs while excluding the row for the whole world.
%%sql
SELECT min(population),max(population),min(population_growth),max(population_growth)
FROM facts
WHERE name != "World"
Done.
min(population) | max(population) | min(population_growth) | max(population_growth) |
---|---|---|---|
0 | 1367485388 | 0.0 | 4.02 |
From the output above,we can confirm the most populated countary has of about 1.3 billion.
We'll calculate the average value for the following columns
%%sql
SELECT avg(population) as avg_population,avg(area) as avg_area
FROM facts
Done.
avg_population | avg_area |
---|---|
62094928.32231405 | 555093.546184739 |
The above output, confirm that the average population and that of an Area is about ** 62 million** and 555 thousand square kilometer respectively.
To finish,we'll build on the query we wrote in the above cell,to find find countries that are densely populated. We'll identify countries that have the following:
%%sql
SELECT *
FROM facts
WHERE population > (SELECT avg(population)
FROM facts)
AND area < (SElECT avg(area)
FROM facts)
ORDER BY population DESC
Done.
id | code | name | area | area_land | area_water | population | population_growth | birth_rate | death_rate | migration_rate |
---|---|---|---|---|---|---|---|---|---|---|
14 | bg | Bangladesh | 148460 | 130170 | 18290 | 168957745 | 1.6 | 21.14 | 5.61 | 0.46 |
85 | ja | Japan | 377915 | 364485 | 13430 | 126919659 | 0.16 | 7.93 | 9.51 | 0.0 |
138 | rp | Philippines | 300000 | 298170 | 1830 | 100998376 | 1.61 | 24.27 | 6.11 | 2.09 |
192 | vm | Vietnam | 331210 | 310070 | 21140 | 94348835 | 0.97 | 15.96 | 5.93 | 0.3 |
65 | gm | Germany | 357022 | 348672 | 8350 | 80854408 | 0.17 | 8.47 | 11.42 | 1.24 |
173 | th | Thailand | 513120 | 510890 | 2230 | 67976405 | 0.34 | 11.19 | 7.8 | 0.0 |
185 | uk | United Kingdom | 243610 | 241930 | 1680 | 64088222 | 0.54 | 12.17 | 9.35 | 2.54 |
The output above confirm to us that: