In this project, we will use the Nager.Date API to create datasets of public holidays and long weekends in different countries of the world in 2019. At the end of the project, we will have two datasets that can be analyzed in the future:
- world_holidays_2019
- world_long_weekends_2019
The world_holidays_2019 dataset will have the following columns:
- `date`: the date of the holiday.
- `local_name`: the name of the holiday in the country's language.
- `english_name`: the name of the holiday in English.
- `country_code`: two-letter country code.
- `country_name`: full country name.
- `fixed_date`: whether this holiday is celebrated on the same date every year.
- `global_holiday`: whether this holiday is celebrated worldwide.
- `counties`: federal state code.
- `launch_year`: when this holiday was launched.
- `type`: the holiday's type.

And the world_long_weekends_2019 dataset these columns:
- `start_date`: when the weekend starts.
- `end_date`: when the weekend ends.
- `weekend_length`: how many days the weekend lasts.
- `need_bridge_day`: whether this weekend needs a bridge day.
- `country_code`: two-letter country code.
- `country_name`: full country name.

# Import the necessary libraries
import json
import time
import pandas as pd
import requests
import requests_cache
from IPython.core.display import clear_output
In this section, we will create the dataframes that we will clean a bit and transform into two csv files:
- world_holidays_2019.csv containing the holidays in different countries in 2019.
- world_long_weekends_2019.csv containing the long weekends in different countries in 2019.

# Function to prettify the returned list from response
def prettify_json(python_obj):
text = json.dumps(python_obj, indent=4)
print(text)
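As a quick check, the helper can be exercised on a sample object. A minimal sketch, assuming we also return the serialized string (an addition over the original helper, made purely so the output is easy to inspect programmatically):

```python
import json

def prettify_json(python_obj):
    # Serialize the object with 4-space indentation and print it
    text = json.dumps(python_obj, indent=4)
    print(text)
    return text  # returned as well so the helper is easy to test

# Hypothetical sample mirroring the API's key/value structure
sample = [{"key": "AD", "value": "Andorra"}]
printed = prettify_json(sample)
```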
Before we proceed, we need to extract the two-letter country code for each country, which we will insert as a parameter into the API URLs.
url_countries = "https://date.nager.at/Api/v2/AvailableCountries"
available_countries = requests.get(url_countries)
# Return the first 3 dictionaries to study the structure
prettify_json(available_countries.json()[:3])
[
    {
        "key": "AD",
        "value": "Andorra"
    },
    {
        "key": "AL",
        "value": "Albania"
    },
    {
        "key": "AR",
        "value": "Argentina"
    }
]
We are going to add both country codes and names to our final datasets, so we first create a dataframe of all available countries that we will later merge with the final dataframes. We can then pass each country code from this dataframe to the API URLs to extract information about holidays and long weekends for each country.
# Create the dataframe of available countries
countries_df = pd.DataFrame(available_countries.json())
# Rename columns
countries_df.columns = ["country_code", "country_name"]
# Check the dataframe
countries_df.head()
country_code | country_name | |
---|---|---|
0 | AD | Andorra |
1 | AL | Albania |
2 | AR | Argentina |
3 | AT | Austria |
4 | AU | Australia |
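The key/value structure above maps directly onto the lookup table. A small offline sketch with the three sample countries shown earlier (no network call) illustrates building and querying it:

```python
import pandas as pd

# Sample of the payload returned by the AvailableCountries endpoint
sample = [
    {"key": "AD", "value": "Andorra"},
    {"key": "AL", "value": "Albania"},
    {"key": "AR", "value": "Argentina"},
]

countries_df = pd.DataFrame(sample)
countries_df.columns = ["country_code", "country_name"]

# Look up a full country name by its two-letter code
name = countries_df.loc[countries_df["country_code"] == "AL", "country_name"].iloc[0]
```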
Now we are ready to extract information about public holidays and long weekends in 2019. We will loop over the country codes and get an API response for each country, appending the results to two different lists: one for the holidays and one for the long weekends.

For long weekends, we append the lists of dictionaries directly, while for holidays we append the Response objects. We do so because we want to add the missing country codes to the weekends' dictionaries before turning them into dataframes.
# Create a local cache for holidays requests
requests_cache.install_cache("holidays")
# Initialize empty lists for responses
holidays_responses = []
weekends_responses = []
# Loop over the `country_code` column, passing each code to the API URL
for code in list(countries_df["country_code"]):
holiday_url = "https://date.nager.at/Api/v2/PublicHolidays/2019/{}".format(code)
# Get the API response and append it to the list of responses
holiday_response = requests.get(holiday_url)
holidays_responses.append(holiday_response)
    # If the response was not served from the cache, sleep for 0.5 seconds
if not getattr(holiday_response, "from_cache", False):
time.sleep(0.5)
# Repeat the procedure to get responses for long weekends
weekend_url = "https://date.nager.at/Api/v2/LongWeekend/2019/{}".format(code)
weekend_response = requests.get(weekend_url).json()
# Add a country code to each dictionary of long weekends
for d in weekend_response:
d.update({"country_code": code})
# Append the API response to the list of responses
weekends_responses.append(weekend_response)
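The inner step that injects the country code into each long-weekend record can be illustrated offline with a mocked payload (sample dictionaries standing in for the LongWeekend endpoint's JSON, no network call):

```python
# Mocked JSON payload as the LongWeekend endpoint would return it
code = "AD"
weekend_response = [
    {"startDate": "2018-12-29", "endDate": "2019-01-01",
     "dayCount": 4, "needBridgeDay": True},
    {"startDate": "2019-03-14", "endDate": "2019-03-17",
     "dayCount": 4, "needBridgeDay": True},
]

# Add the country code to each dictionary, since the endpoint omits it
for d in weekend_response:
    d.update({"country_code": code})
```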
Let's now look at how our data is organized in pandas dataframes.
pd.DataFrame(holidays_responses[0].json())
date | localName | name | countryCode | fixed | global | counties | launchYear | type | |
---|---|---|---|---|---|---|---|---|---|
0 | 2019-01-01 | Any nou | New Year's Day | AD | True | True | None | None | Public |
1 | 2019-03-14 | Dia de la Constitució | Constitution Day | AD | True | True | None | None | Public |
2 | 2019-03-14 | Mare de Déu de Meritxell | National Holiday | AD | True | True | None | None | Public |
3 | 2019-12-25 | Nadal | Christmas Day | AD | True | True | None | None | Public |
pd.DataFrame(weekends_responses[0])
startDate | endDate | dayCount | needBridgeDay | country_code | |
---|---|---|---|---|---|
0 | 2018-12-29 | 2019-01-01 | 4 | True | AD |
1 | 2019-03-14 | 2019-03-17 | 4 | True | AD |
We can now turn each list of responses into a list of dataframes and concatenate all the dataframes in each list.
# Create two lists of dataframes
holidays_frames = [pd.DataFrame(x.json()) for x in holidays_responses]
weekends_frames = [pd.DataFrame(x) for x in weekends_responses]
# Concatenate the dataframes from the lists
holidays = pd.concat(holidays_frames, ignore_index=True)
weekends = pd.concat(weekends_frames, ignore_index=True)
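The effect of `ignore_index=True` in the concatenation can be sanity-checked with two tiny stand-in frames (hypothetical rows in place of the per-country ones):

```python
import pandas as pd

# Two minimal per-country frames standing in for the real ones
frames = [
    pd.DataFrame({"countryCode": ["AD"], "name": ["New Year's Day"]}),
    pd.DataFrame({"countryCode": ["AL"], "name": ["New Year's Day"]}),
]

# ignore_index=True gives a clean 0..n-1 index across all countries,
# instead of repeating each frame's own index
holidays = pd.concat(frames, ignore_index=True)
```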
Before we export the dataframes to csv files, we will rename the columns to snake_case and add full country names. It is possible to do some more data cleaning, but we are just preparing the datasets for further analysis, so we will not dive into the details.
# Rename columns in the `holidays` dataframe
holidays = holidays.rename(
columns={
"localName": "local_name",
"name": "english_name",
"countryCode": "country_code",
"fixed": "fixed_date",
"global": "global_holiday",
"launchYear": "launch_year",
}
)
# Check if everything is correct
holidays.head()
date | local_name | english_name | country_code | fixed_date | global_holiday | counties | launch_year | type | |
---|---|---|---|---|---|---|---|---|---|
0 | 2019-01-01 | Any nou | New Year's Day | AD | True | True | None | None | Public |
1 | 2019-03-14 | Dia de la Constitució | Constitution Day | AD | True | True | None | None | Public |
2 | 2019-03-14 | Mare de Déu de Meritxell | National Holiday | AD | True | True | None | None | Public |
3 | 2019-12-25 | Nadal | Christmas Day | AD | True | True | None | None | Public |
4 | 2019-01-01 | Viti i Ri | New Year's Day | AL | True | True | None | None | Public |
# Rename columns in the `weekends` dataframe
weekends = weekends.rename(
columns={
"startDate": "start_date",
"endDate": "end_date",
"dayCount": "weekend_length",
"needBridgeDay": "need_bridge_day",
}
)
# Check if everything is correct
weekends.head()
start_date | end_date | weekend_length | need_bridge_day | country_code | |
---|---|---|---|---|---|
0 | 2018-12-29 | 2019-01-01 | 4 | True | AD |
1 | 2019-03-14 | 2019-03-17 | 4 | True | AD |
2 | 2018-12-29 | 2019-01-02 | 5 | True | AL |
3 | 2019-03-14 | 2019-03-17 | 4 | True | AL |
4 | 2019-03-22 | 2019-03-24 | 3 | False | AL |
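The rename pattern used for both frames can be verified offline with a couple of sample columns (hypothetical row values):

```python
import pandas as pd

# Minimal frame with the API's original camelCase column names
weekends = pd.DataFrame({"startDate": ["2019-03-14"], "dayCount": [4]})

# rename returns a new frame with the mapped column names
weekends = weekends.rename(
    columns={"startDate": "start_date", "dayCount": "weekend_length"}
)
```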
After renaming the columns, we can export the dataframes to csv files that we can analyze in the future. Before doing so, we will add full country names to both dataframes.
# Merge the dataframes
holidays = pd.merge(holidays, countries_df, on="country_code")
weekends = pd.merge(weekends, countries_df, on="country_code")
# Reorder the columns in the `holidays` dataframe to have `country_name` after `country_code`
cols = [
"date",
"local_name",
"english_name",
"country_code",
"country_name",
"fixed_date",
"global_holiday",
"counties",
"launch_year",
"type",
]
holidays = holidays[cols]
# Check the `holidays` dataframe
holidays.head()
date | local_name | english_name | country_code | country_name | fixed_date | global_holiday | counties | launch_year | type | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 2019-01-01 | Any nou | New Year's Day | AD | Andorra | True | True | None | None | Public |
1 | 2019-03-14 | Dia de la Constitució | Constitution Day | AD | Andorra | True | True | None | None | Public |
2 | 2019-03-14 | Mare de Déu de Meritxell | National Holiday | AD | Andorra | True | True | None | None | Public |
3 | 2019-12-25 | Nadal | Christmas Day | AD | Andorra | True | True | None | None | Public |
4 | 2019-01-01 | Viti i Ri | New Year's Day | AL | Albania | True | True | None | None | Public |
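The merge-and-reorder step can be sketched with minimal sample frames (hypothetical rows; note that `pd.merge` defaults to an inner join on the shared `country_code` key, so holidays without a matching country would be dropped):

```python
import pandas as pd

# Minimal stand-ins for the holidays and countries dataframes
holidays = pd.DataFrame({"date": ["2019-01-01"], "country_code": ["AD"]})
countries_df = pd.DataFrame(
    {"country_code": ["AD", "AL"], "country_name": ["Andorra", "Albania"]}
)

# Inner join on country_code adds the full country name
merged = pd.merge(holidays, countries_df, on="country_code")

# Reorder so country_name sits right after country_code
merged = merged[["date", "country_code", "country_name"]]
```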
# Check the `weekends` dataframe
weekends.head()
start_date | end_date | weekend_length | need_bridge_day | country_code | country_name | |
---|---|---|---|---|---|---|
0 | 2018-12-29 | 2019-01-01 | 4 | True | AD | Andorra |
1 | 2019-03-14 | 2019-03-17 | 4 | True | AD | Andorra |
2 | 2018-12-29 | 2019-01-02 | 5 | True | AL | Albania |
3 | 2019-03-14 | 2019-03-17 | 4 | True | AL | Albania |
4 | 2019-03-22 | 2019-03-24 | 3 | False | AL | Albania |
Now we can export the dataframes to csv files that we can analyze in the future.
# Export the dataframes in csv files
holidays.to_csv("world_holidays_2019.csv", index=False)
weekends.to_csv("world_long_weekends_2019.csv", index=False)
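The exported files can later be read back with pandas. A sketch of the round trip, using an in-memory buffer instead of the csv files on disk:

```python
import io
import pandas as pd

df = pd.DataFrame({"date": ["2019-01-01"], "country_code": ["AD"]})

# index=False keeps the row index out of the file, matching the export above
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)

# parse_dates converts the date column back into datetime objects
restored = pd.read_csv(buf, parse_dates=["date"])
```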
One can use these datasets to answer the following questions (and not only!):
- Which holidays are celebrated worldwide (see the `global_holiday` column)?

In this project, we used the Nager.Date API to create two datasets of holidays and long weekends in different countries all over the globe. We also did some data cleaning to prepare the data for a more comfortable analysis, and proposed some questions that can be answered using these datasets.
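As an illustration, counting holidays per country that are celebrated worldwide could look like this (sample rows shown; a real analysis would load world_holidays_2019.csv instead):

```python
import pandas as pd

# Hypothetical sample rows in place of the exported dataset
holidays = pd.DataFrame({
    "country_name": ["Andorra", "Andorra", "Albania"],
    "global_holiday": [True, False, True],
})

# Keep only worldwide holidays, then count them per country
global_counts = (
    holidays[holidays["global_holiday"]]
    .groupby("country_name")
    .size()
)
```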