This CIA World Factbook is a collection of statistics and data about every country on Earth. It contains demographic, geographic, ecomomic, and government related data.
In this project we will use SQL do analyze the data from the database.
Here are the descriptions for some of the columns:
name
— the name of the country.area
— the country's total area (both land and water) in square kilometersarea_land
— the country's land area in square kilometers.area_water
— the country's waterarea in square kilometers.population
— the country's population.population_growth
— the country's population growth as a percentage.birth_rate
— the country's birth rate, or the number of births per year per 1,000 people.death_rate
— the country's death rate, or the number of death per year per 1,000 people.# !conda install -yc conda-forge ipython-sql
%%capture
%load_ext sql
%sql sqlite:///factbook.db
%%sql
SELECT *
FROM facts
LIMIT 5;
* sqlite:///factbook.db Done.
id | code | name | area | area_land | area_water | population | population_growth | birth_rate | death_rate | migration_rate |
---|---|---|---|---|---|---|---|---|---|---|
1 | af | Afghanistan | 652230 | 652230 | 0 | 32564342 | 2.32 | 38.57 | 13.89 | 1.51 |
2 | al | Albania | 28748 | 27398 | 1350 | 3029278 | 0.3 | 12.92 | 6.58 | 3.3 |
3 | ag | Algeria | 2381741 | 2381741 | 0 | 39542166 | 1.84 | 23.67 | 4.31 | 0.92 |
4 | an | Andorra | 468 | 468 | 0 | 85580 | 0.12 | 8.13 | 6.96 | 0.0 |
5 | ao | Angola | 1246700 | 1246700 | 0 | 19625353 | 2.78 | 38.78 | 11.49 | 0.46 |
%%sql
SELECT MIN(population) AS min_pop,
MAX(population) AS max_pop,
MIN(population_growth) AS min_pop_growth,
MAX(population_growth) max_pop_growth
FROM facts;
* sqlite:///factbook.db Done.
min_pop | max_pop | min_pop_growth | max_pop_growth |
---|---|---|---|
0 | 7256490011 | 0.0 | 4.02 |
There is one country with a population of zero and one country with over 7.2 billion people. Clearly these are unrealistic so let's take a closer look.
%%sql
SELECT name, population
FROM facts
WHERE population == (SELECT MIN(population)
FROM facts
);
* sqlite:///factbook.db Done.
name | population |
---|---|
Antarctica | 0 |
%%sql
SELECT name, population
FROM facts
WHERE population == (SELECT MAX(population)
FROM facts
);
* sqlite:///factbook.db Done.
name | population |
---|---|
World | 7256490011 |
The database includes a row for the total world population and also includes the continent of Antartica which has no permanant residents. Let's re-calculate our summary statistics without including 'World' and 'Antartica'
%%sql
SELECT MIN(population) AS min_pop,
MAX(population) AS max_pop,
MIN(population_growth) AS min_pop_growth,
MAX(population_growth) max_pop_growth
FROM facts
WHERE name <> 'Antarctica'
AND name <> 'World';
* sqlite:///factbook.db Done.
min_pop | max_pop | min_pop_growth | max_pop_growth |
---|---|---|---|
48 | 1367485388 | 0.0 | 4.02 |
%%sql
SELECT AVG(population) AS avg_pop,
AVG(area) AS avg_area
FROM facts
WHERE name <> 'Antarctica'
AND name <> 'World';
* sqlite:///factbook.db Done.
avg_pop | avg_area |
---|---|
32377011.0125 | 555093.546184739 |
The average population per country is 32,377,011 and the average total area is 555,093 sq. km.
Let's take a look at this in 2 ways:
population
and below average area
%%sql
SELECT name
FROM facts
WHERE population > (SELECT AVG(population)
FROM facts
WHERE name <> 'Antarctica'
AND name <> 'World'
)
AND area < (SELECT AVG(area)
FROM facts
WHERE name <> 'Antarctica'
AND name <> 'World'
);
* sqlite:///factbook.db Done.
name |
---|
Bangladesh |
Germany |
Iraq |
Italy |
Japan |
Korea, South |
Morocco |
Philippines |
Poland |
Spain |
Thailand |
Uganda |
United Kingdom |
Vietnam |
%%sql
SELECT name, population/area AS pop_area_ratio
FROM facts
ORDER BY pop_area_ratio DESC
LIMIT 20;
* sqlite:///factbook.db Done.
name | pop_area_ratio |
---|---|
Macau | 21168 |
Monaco | 15267 |
Singapore | 8141 |
Hong Kong | 6445 |
Gaza Strip | 5191 |
Gibraltar | 4876 |
Bahrain | 1771 |
Maldives | 1319 |
Malta | 1310 |
Bermuda | 1299 |
Sint Maarten | 1167 |
Bangladesh | 1138 |
Guernsey | 847 |
Jersey | 838 |
Barbados | 675 |
Mauritius | 656 |
Taiwan | 650 |
Aruba | 623 |
Lebanon | 594 |
Saint Martin | 588 |
%%sql
SELECT name, population
FROM facts
WHERE name IN ("Macau", "Monaco", "Singapore", "Hong Kong", "Gaza Strip");
* sqlite:///factbook.db Done.
name | population |
---|---|
Monaco | 30535 |
Singapore | 5674472 |
Hong Kong | 7141106 |
Macau | 592731 |
Gaza Strip | 1869055 |
As we can see that many of the countries with the highest populatio density (traditional population
/area
meansurement) are not on our first list of Countries with above average population
and below average area
.
This is due to the fact that the countries with the highest population density have a total population that is typically less than the worldwide average of 32,377,011. As such Banglasesh is the only country from our first list that is also on the list of the top 20 most densely populated countries.
In conclusion, the first metric omits most of the densely populated countries and as such isn't a reliable metric for determining population density.
%%sql
SELECT name, population
FROM facts
WHERE population = (SELECT MAX(population)
FROM facts
WHERE name <> 'Antarctica'
AND name <> 'World'
);
* sqlite:///factbook.db Done.
name | population |
---|---|
China | 1367485388 |
%%sql
SELECT name, population_growth
FROM facts
WHERE population_growth = (SELECT MAX(population_growth)
FROM facts
WHERE name <> 'Antarctica'
AND name <> 'World'
);
* sqlite:///factbook.db Done.
name | population_growth |
---|---|
South Sudan | 4.02 |
%%sql
SELECT name, CAST(area_water AS FLOAT)/area_land AS ratio_water
FROM facts
WHERE name <> 'Antarctica'
AND name <> 'World'
ORDER BY ratio_water DESC
LIMIT 10;
* sqlite:///factbook.db Done.
name | ratio_water |
---|---|
British Indian Ocean Territory | 905.6666666666666 |
Virgin Islands | 4.520231213872832 |
Puerto Rico | 0.5547914317925592 |
Bahamas, The | 0.3866133866133866 |
Guinea-Bissau | 0.2846728307254623 |
Malawi | 0.25939625850340137 |
Netherlands | 0.22571032366565366 |
Uganda | 0.22292237442922375 |
Eritrea | 0.16435643564356436 |
Liberia | 0.15623961794019933 |
The British Indian Ocian Territory has an extreme water to land ratio. This is due to the fact that the Territory includes seven atolls and several very small islands.
%%sql
SELECT name, ROUND(CAST(area_water AS FLOAT)/CAST(area AS FLOAT), 3) AS ratio_water
FROM facts
WHERE ratio_water > 0.5
AND name <> 'Antarctica'
AND name <> 'World';
### Which countries will add the most people to their populations next year?
* sqlite:///factbook.db Done. (sqlite3.OperationalError) unrecognized token: "#" [SQL: ### Which countries will add the most people to their populations next year?] (Background on this error at: http://sqlalche.me/e/13/e3q8)
The British Indian Ocean Territory and the Virgin Islands have more water than land.
To answer this we will take the population and multiply it by the population growth rate.
%%sql
SELECT name, population, population_growth, ROUND((population * (population_growth/100)), 0)
AS pop_increase
FROM facts
WHERE name <> 'Antarctica'
AND name <> 'World'
ORDER BY pop_increase DESC
LIMIT 10;
* sqlite:///factbook.db Done.
name | population | population_growth | pop_increase |
---|---|---|---|
India | 1251695584 | 1.22 | 15270686.0 |
China | 1367485388 | 0.45 | 6153684.0 |
Nigeria | 181562056 | 2.45 | 4448270.0 |
Pakistan | 199085847 | 1.46 | 2906653.0 |
Ethiopia | 99465819 | 2.89 | 2874562.0 |
Bangladesh | 168957745 | 1.6 | 2703324.0 |
United States | 321368864 | 0.78 | 2506677.0 |
Indonesia | 255993674 | 0.92 | 2355142.0 |
Congo, Democratic Republic of the | 79375136 | 2.45 | 1944691.0 |
Philippines | 100998376 | 1.61 | 1626074.0 |
India will add the most people to its population next year with an increase ov over 15 million. Dispite China having nearly the same population as India, India will add over twice the amonnt of people than China. This is due to India's growth rate being almost triple than China's.
To answer this we will subtract the death_rate
from the birth_rate
to find the net birth rate. Then we can find which ones are negative (decreasing population).
%%sql
SELECT name, ROUND((birth_rate - death_rate), 2) AS net_growth_rate
FROM facts
WHERE death_rate > birth_rate
ORDER BY net_growth_rate;
* sqlite:///factbook.db Done.
name | net_growth_rate |
---|---|
Bulgaria | -5.52 |
Serbia | -4.58 |
Latvia | -4.31 |
Lithuania | -4.17 |
Ukraine | -3.74 |
Hungary | -3.57 |
Germany | -2.95 |
Slovenia | -2.95 |
Romania | -2.76 |
Croatia | -2.73 |
Belarus | -2.66 |
Monaco | -2.59 |
Greece | -2.43 |
Saint Pierre and Miquelon | -2.3 |
Russia | -2.09 |
Estonia | -1.89 |
Portugal | -1.75 |
Japan | -1.58 |
Italy | -1.45 |
Bosnia and Herzegovina | -0.88 |
Czech Republic | -0.71 |
Moldova | -0.59 |
Poland | -0.45 |
Austria | -0.01 |
This list is dominated by Eastern European & Balkan countries