The data is from the CIA World Factbook, a compendium of statistics about all of the countries on Earth. The Factbook contains demographic information.
code
- Code for evry country name.name
- The name of the country.area
- The total land and sea area of the country.population
- The country's population.population_growth
- The country's population growth as a percentage.birth_rate
- The country's birth rate, or the number of births a year per 1,000 people.death_rate
- The country's death rate, or the number of death a year per 1,000 people.area
- The country's total area (both land and water).area_land
- The country's land area in square kilometers.area_water
- The country's waterarea in square kilometers.migration_rate
- The country's migration rate# run this to enable your notebook write sql
# !conda install -yc conda-forge ipython-sql
%%capture
%load_ext sql
%sql sqlite:///factbook.db
# Establishing a connection to the database file
Let us get more information about the database file.
NB: the "%%sql" magic at the top of the cells helps jupyter recognize the cell as an SQL
query.
%%sql
SELECT *
FROM sqlite_master
WHERE type='table';
* sqlite:///factbook.db Done.
type | name | tbl_name | rootpage | sql |
---|---|---|---|---|
table | sqlite_sequence | sqlite_sequence | 3 | CREATE TABLE sqlite_sequence(name,seq) |
table | facts | facts | 47 | CREATE TABLE "facts" ("id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, "code" varchar(255) NOT NULL, "name" varchar(255) NOT NULL, "area" integer, "area_land" integer, "area_water" integer, "population" integer, "population_growth" float, "birth_rate" float, "death_rate" float, "migration_rate" float) |
%%sql
SELECT *
FROM facts
LIMIT 5;
* sqlite:///factbook.db Done.
id | code | name | area | area_land | area_water | population | population_growth | birth_rate | death_rate | migration_rate |
---|---|---|---|---|---|---|---|---|---|---|
1 | af | Afghanistan | 652230 | 652230 | 0 | 32564342 | 2.32 | 38.57 | 13.89 | 1.51 |
2 | al | Albania | 28748 | 27398 | 1350 | 3029278 | 0.3 | 12.92 | 6.58 | 3.3 |
3 | ag | Algeria | 2381741 | 2381741 | 0 | 39542166 | 1.84 | 23.67 | 4.31 | 0.92 |
4 | an | Andorra | 468 | 468 | 0 | 85580 | 0.12 | 8.13 | 6.96 | 0.0 |
5 | ao | Angola | 1246700 | 1246700 | 0 | 19625353 | 2.78 | 38.78 | 11.49 | 0.46 |
Below, we calculate the MAX
and MIN
of the population
and population_growth
columns
%%sql
SELECT MIN(population) AS min_population,
Max(population) AS max_population,
MIN(population_growth) AS min_pop_growth,
MAX(population_growth) AS max_pop_growth
FROM facts;
* sqlite:///factbook.db Done.
min_population | max_population | min_pop_growth | max_pop_growth |
---|---|---|---|
0 | 7256490011 | 0.0 | 4.02 |
Something seems to be going on here:
0
.This makes a good base for further analysis. We will zoom in on entries with this attributes.
%%sql
SELECT name, population, population_growth
FROM facts
WHERE population = (SELECT MIN(population)
FROM facts )
OR population = (SELECT MAX(population)
FROM facts);
* sqlite:///factbook.db Done.
name | population | population_growth |
---|---|---|
Antarctica | 0 | None |
World | 7256490011 | 1.08 |
0
population. This is in accordance with CIA factbook page for Antartica.We recalculate the summary statistics above this time elimintaing rows for 'World' and 'Antartica'
%%sql
SELECT MAX(population),
MIN(population),
MAX(population_growth),
MIN(population_growth)
FROM facts
WHERE name <> 'World' and name <> 'Antarctica';
* sqlite:///factbook.db Done.
MAX(population) | MIN(population) | MAX(population_growth) | MIN(population_growth) |
---|---|---|---|
1367485388 | 48 | 4.02 | 0.0 |
Now the summary statistics looks more accurate:
Below we calulate the average population and average area from the data.
%%sql
SELECT ROUND(AVG(population),2) AS avg_population,
ROUND(AVG(area),2) AS avg_area
FROM facts
WHERE name != 'World';
* sqlite:///factbook.db Done.
avg_population | avg_area |
---|---|
32242666.57 | 555093.55 |
# # a view is like a saved query that we can reference back to in the FROM clause
# %%sql
# CREATE VIEW facts_wo_world AS
# SELECT *
# FROM facts
# WHERE name <> 'World';
To achieve this, we select rows that meet our criteria:
population
is above average.area
is below average.%%sql
SELECT name, population, area,
ROUND(CAST(population AS FLOAT) / area, 2) AS population_density
FROM facts
WHERE population > (SELECT AVG(population)
FROM facts
WHERE name != 'World')
AND area < (SELECT AVG(area)
FROM facts
WHERE name != 'World')
ORDER BY population_density DESC;
* sqlite:///factbook.db Done.
name | population | area | population_density |
---|---|---|---|
Bangladesh | 168957745 | 148460 | 1138.07 |
Korea, South | 49115196 | 99720 | 492.53 |
Philippines | 100998376 | 300000 | 336.66 |
Japan | 126919659 | 377915 | 335.84 |
Vietnam | 94348835 | 331210 | 284.86 |
United Kingdom | 64088222 | 243610 | 263.08 |
Germany | 80854408 | 357022 | 226.47 |
Italy | 61855120 | 301340 | 205.27 |
Uganda | 37101745 | 241038 | 153.92 |
Thailand | 67976405 | 513120 | 132.48 |
Poland | 38562189 | 312685 | 123.33 |
Spain | 48146134 | 505370 | 95.27 |
Iraq | 37056169 | 438317 | 84.54 |
Morocco | 33322699 | 446550 | 74.62 |
In this section we would be answering question s like:
Let us begin:
%%sql
SELECT name, MAX(population) AS 'population'
FROM facts
WHERE name <> 'World'
* sqlite:///factbook.db Done.
name | population |
---|---|
China | 1367485388 |
%%sql
SELECT *
FROM facts
WHERE population = (SELECT MIN(population)
FROM facts
WHERE name <> 'Antarctica');
* sqlite:///factbook.db Done.
id | code | name | area | area_land | area_water | population | population_growth | birth_rate | death_rate | migration_rate |
---|---|---|---|---|---|---|---|---|---|---|
238 | pc | Pitcairn Islands | 47 | 47 | 0 | 48 | 0.0 | None | None | None |
%%sql
SELECT name, MAX(population_growth)
FROM facts;
* sqlite:///factbook.db Done.
name | MAX(population_growth) |
---|---|
South Sudan | 4.02 |
%%sql
SELECT name, birth_rate
FROM facts
ORDER BY birth_rate DESC
LIMIT 15;
* sqlite:///factbook.db Done.
name | birth_rate |
---|---|
Niger | 45.45 |
Mali | 44.99 |
Uganda | 43.79 |
Zambia | 42.13 |
Burkina Faso | 42.03 |
Burundi | 42.01 |
Malawi | 41.56 |
Somalia | 40.45 |
Angola | 38.78 |
Mozambique | 38.58 |
Afghanistan | 38.57 |
Nigeria | 37.64 |
Ethiopia | 37.27 |
Sierra Leone | 37.03 |
South Sudan | 36.91 |
%%sql
SELECT name, death_rate
FROM facts
ORDER BY death_rate DESC
LIMIT 10;
* sqlite:///factbook.db Done.
name | death_rate |
---|---|
Lesotho | 14.89 |
Ukraine | 14.46 |
Bulgaria | 14.44 |
Guinea-Bissau | 14.33 |
Latvia | 14.31 |
Chad | 14.28 |
Lithuania | 14.27 |
Namibia | 13.91 |
Afghanistan | 13.89 |
Central African Republic | 13.8 |
%%sql
SELECT name, area_land, area_water
FROM facts
WHERE area_land < area_water;
* sqlite:///factbook.db Done.
name | area_land | area_water |
---|---|---|
British Indian Ocean Territory | 60 | 54340 |
Virgin Islands | 346 | 1564 |
We would ask more insightful questions and try to get answers for them.
Firstly, we ask the question, what countries will add the most number of people next year. Let us try to answer this:
%%sql
SELECT name, population AS old_population, population_growth,
CAST(population + (population*(population_growth/100)) AS INTEGER) AS 'new_population',
CAST(population + (population*(population_growth/100)) AS INTEGER) - population AS pop_diff
FROM facts
WHERE name <> 'World'
ORDER BY pop_diff DESC
LIMIT 10;
* sqlite:///factbook.db Done.
name | old_population | population_growth | new_population | pop_diff |
---|---|---|---|---|
India | 1251695584 | 1.22 | 1266966270 | 15270686 |
China | 1367485388 | 0.45 | 1373639072 | 6153684 |
Nigeria | 181562056 | 2.45 | 186010326 | 4448270 |
Pakistan | 199085847 | 1.46 | 201992500 | 2906653 |
Ethiopia | 99465819 | 2.89 | 102340381 | 2874562 |
Bangladesh | 168957745 | 1.6 | 171661068 | 2703323 |
United States | 321368864 | 0.78 | 323875541 | 2506677 |
Indonesia | 255993674 | 0.92 | 258348815 | 2355141 |
Congo, Democratic Republic of the | 79375136 | 2.45 | 81319826 | 1944690 |
Philippines | 100998376 | 1.61 | 102624449 | 1626073 |
Another interesting point of investigation is identfying countries that have a higer death rate than birth rate.
%%sql
SELECT name, death_rate, birth_rate
FROM facts
WHERE death_rate > birth_rate
ORDER BY death_rate DESC;
* sqlite:///factbook.db Done.
name | death_rate | birth_rate |
---|---|---|
Ukraine | 14.46 | 10.72 |
Bulgaria | 14.44 | 8.92 |
Latvia | 14.31 | 10.0 |
Lithuania | 14.27 | 10.1 |
Russia | 13.69 | 11.6 |
Serbia | 13.66 | 9.08 |
Belarus | 13.36 | 10.7 |
Hungary | 12.73 | 9.16 |
Moldova | 12.59 | 12.0 |
Estonia | 12.4 | 10.51 |
Croatia | 12.18 | 9.45 |
Romania | 11.9 | 9.14 |
Germany | 11.42 | 8.47 |
Slovenia | 11.37 | 8.42 |
Greece | 11.09 | 8.66 |
Portugal | 11.02 | 9.27 |
Czech Republic | 10.34 | 9.63 |
Italy | 10.19 | 8.74 |
Poland | 10.19 | 9.74 |
Bosnia and Herzegovina | 9.75 | 8.87 |
Saint Pierre and Miquelon | 9.72 | 7.42 |
Japan | 9.51 | 7.93 |
Austria | 9.42 | 9.41 |
Monaco | 9.24 | 6.65 |
The above cell tells us what countries have higher death rate's than birth rate's. Another insightful way we can view this is to represent it as a ratio and see the countries that really standout.
%%sql
SELECT name, death_rate, birth_rate, ROUND(death_rate/birth_rate,2) AS ratio
FROM facts
WHERE death_rate > birth_rate
ORDER BY ratio DESC
LIMIT 10;
* sqlite:///factbook.db Done.
name | death_rate | birth_rate | ratio |
---|---|---|---|
Bulgaria | 14.44 | 8.92 | 1.62 |
Serbia | 13.66 | 9.08 | 1.5 |
Latvia | 14.31 | 10.0 | 1.43 |
Lithuania | 14.27 | 10.1 | 1.41 |
Hungary | 12.73 | 9.16 | 1.39 |
Monaco | 9.24 | 6.65 | 1.39 |
Germany | 11.42 | 8.47 | 1.35 |
Slovenia | 11.37 | 8.42 | 1.35 |
Ukraine | 14.46 | 10.72 | 1.35 |
Saint Pierre and Miquelon | 9.72 | 7.42 | 1.31 |
This seems a lot like the earlier analysis we did when we found highly Dense countries. We could try to see if there is any similarities with our answers.
%%sql
SELECT name, population, area, ROUND(CAST(population AS FLOAT)/area, 2) AS population_density
FROM facts
ORDER BY population_density DESC
LIMIT 10;
* sqlite:///factbook.db Done.
name | population | area | population_density |
---|---|---|---|
Macau | 592731 | 28 | 21168.96 |
Monaco | 30535 | 2 | 15267.5 |
Singapore | 5674472 | 697 | 8141.28 |
Hong Kong | 7141106 | 1108 | 6445.04 |
Gaza Strip | 1869055 | 360 | 5191.82 |
Gibraltar | 29258 | 6 | 4876.33 |
Bahrain | 1346613 | 760 | 1771.86 |
Maldives | 393253 | 298 | 1319.64 |
Malta | 413965 | 316 | 1310.02 |
Bermuda | 70196 | 54 | 1299.93 |