In this project, we will use the Nager.Date API to create datasets of public holidays and long weekends in different countries of the world in 2019. At the end of the project, we will have two datasets that can be analyzed in the future:
- world_holidays_2019
- world_long_weekends_2019
The world_holidays_2019 dataset will have the following columns:
- `date`: the date of the holiday.
- `local_name`: the name of the holiday in the country's language.
- `english_name`: the name of the holiday in English.
- `country_code`: two-letter country code.
- `country_name`: full country name.
- `fixed_date`: whether this holiday is celebrated on the same date every year.
- `global_holiday`: whether this holiday is celebrated worldwide.
- `counties`: federal state code.
- `launch_year`: when this holiday was launched.
- `type`: the holiday's type.

And the world_long_weekends_2019 dataset these columns:
- `start_date`: when the weekend starts.
- `end_date`: when the weekend ends.
- `weekend_length`: how many days the weekend lasts.
- `need_bridge_day`: whether this weekend needs a bridge day.
- `country_code`: two-letter country code.
- `country_name`: full country name.

# Import the necessary libraries
import json
import time
import pandas as pd
import requests
import requests_cache
from IPython.core.display import clear_output
In this section, we will create the dataframes that we will clean a bit and transform into two csv files:
- world_holidays_2019.csv containing the holidays in different countries in 2019.
- world_long_weekends_2019.csv containing the long weekends in different countries in 2019.

# Function to prettify the returned list from response
def prettify_json(python_obj):
text = json.dumps(python_obj, indent=4)
print(text)
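As a quick check, the helper can be exercised on a sample object. A minimal sketch, assuming we also return the serialized string (an addition over the original helper, made purely so the output is easy to inspect programmatically):

```python
import json

def prettify_json(python_obj):
    # Serialize the object with 4-space indentation and print it
    text = json.dumps(python_obj, indent=4)
    print(text)
    return text  # returned as well so the helper is easy to test

# Hypothetical sample mirroring the API's key/value structure
sample = [{"key": "AD", "value": "Andorra"}]
printed = prettify_json(sample)
```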
Before we proceed, we need to extract the two-letter country code for each country, which we will insert as a parameter into the API URLs.
url_countries = "https://date.nager.at/Api/v2/AvailableCountries"
available_countries = requests.get(url_countries)
# Return the first 3 dictionaries to study the structure
prettify_json(available_countries.json()[:3])
[
    {
        "key": "AD",
        "value": "Andorra"
    },
    {
        "key": "AL",
        "value": "Albania"
    },
    {
        "key": "AR",
        "value": "Argentina"
    }
]
We are going to add both country codes and names to our final datasets, so we first create a dataframe of all available countries that we will later merge with the final dataframes. We can then pass each country code from this dataframe to the API URLs to extract information about holidays and long weekends for each country.
# Create the dataframe of available countries
countries_df = pd.DataFrame(available_countries.json())
# Rename columns
countries_df.columns = ["country_code", "country_name"]
# Check the dataframe
countries_df.head()
country_code | country_name | |
---|---|---|
0 | AD | Andorra |
1 | AL | Albania |
2 | AR | Argentina |
3 | AT | Austria |
4 | AU | Australia |
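The key/value structure above maps directly onto the lookup table. A small offline sketch with the three sample countries shown earlier (no network call) illustrates building and querying it:

```python
import pandas as pd

# Sample of the payload returned by the AvailableCountries endpoint
sample = [
    {"key": "AD", "value": "Andorra"},
    {"key": "AL", "value": "Albania"},
    {"key": "AR", "value": "Argentina"},
]

countries_df = pd.DataFrame(sample)
countries_df.columns = ["country_code", "country_name"]

# Look up a full country name by its two-letter code
name = countries_df.loc[countries_df["country_code"] == "AL", "country_name"].iloc[0]
```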
Now we are ready to extract information about public holidays and long weekends in 2019. We will loop over the country codes and get an API response for each country, appending the results to two different lists: one for the holidays and one for the long weekends.

For long weekends, we append the lists of dictionaries directly, while for holidays we append the Response objects. We do so because we want to add the missing country codes to the weekends' dictionaries before turning them into dataframes.
# Create a local cache for holidays requests
requests_cache.install_cache("holidays")
# Initialize empty lists for responses
holidays_responses = []
weekends_responses = []
# Loop over the `country_code` column, passing each code to the API URL
for code in list(countries_df["country_code"]):
holiday_url = "https://date.nager.at/Api/v2/PublicHolidays/2019/{}".format(code)
# Get the API response and append it to the list of responses
holiday_response = requests.get(holiday_url)
holidays_responses.append(holiday_response)
    # If the response was not served from the cache, sleep for 0.5 seconds
if not getattr(holiday_response, "from_cache", False):
time.sleep(0.5)
# Repeat the procedure to get responses for long weekends
weekend_url = "https://date.nager.at/Api/v2/LongWeekend/2019/{}".format(code)
weekend_response = requests.get(weekend_url).json()
# Add a country code to each dictionary of long weekends
for d in weekend_response:
d.update({"country_code": code})
# Append the API response to the list of responses
weekends_responses.append(weekend_response)
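The inner step that injects the country code into each long-weekend record can be illustrated offline with a mocked payload (sample dictionaries standing in for the LongWeekend endpoint's JSON, no network call):

```python
# Mocked JSON payload as the LongWeekend endpoint would return it
code = "AD"
weekend_response = [
    {"startDate": "2018-12-29", "endDate": "2019-01-01",
     "dayCount": 4, "needBridgeDay": True},
    {"startDate": "2019-03-14", "endDate": "2019-03-17",
     "dayCount": 4, "needBridgeDay": True},
]

# Add the country code to each dictionary, since the endpoint omits it
for d in weekend_response:
    d.update({"country_code": code})
```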
Let's now look at how our data is organized in pandas dataframes.
pd.DataFrame(holidays_responses[0].json())
date | localName | name | countryCode | fixed | global | counties | launchYear | type | |
---|---|---|---|---|---|---|---|---|---|
0 | 2019-01-01 | Any nou | New Year's Day | AD | True | True | None | None | Public |
1 | 2019-03-14 | Dia de la Constitució | Constitution Day | AD | True | True | None | None | Public |
2 | 2019-03-14 | Mare de Déu de Meritxell | National Holiday | AD | True | True | None | None | Public |
3 | 2019-12-25 | Nadal | Christmas Day | AD | True | True | None | None | Public |
pd.DataFrame(weekends_responses[0])
startDate | endDate | dayCount | needBridgeDay | country_code | |
---|---|---|---|---|---|
0 | 2018-12-29 | 2019-01-01 | 4 | True | AD |
1 | 2019-03-14 | 2019-03-17 | 4 | True | AD |
We can now turn each list of responses into a list of dataframes and concatenate all the dataframes in each list.
# Create two lists of dataframes
holidays_frames = [pd.DataFrame(x.json()) for x in holidays_responses]
weekends_frames = [pd.DataFrame(x) for x in weekends_responses]
# Concatenate the dataframes from the lists
holidays = pd.concat(holidays_frames, ignore_index=True)
weekends = pd.concat(weekends_frames, ignore_index=True)
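The effect of `ignore_index=True` in the concatenation can be sanity-checked with two tiny stand-in frames (hypothetical rows in place of the per-country ones):

```python
import pandas as pd

# Two minimal per-country frames standing in for the real ones
frames = [
    pd.DataFrame({"countryCode": ["AD"], "name": ["New Year's Day"]}),
    pd.DataFrame({"countryCode": ["AL"], "name": ["New Year's Day"]}),
]

# ignore_index=True gives a clean 0..n-1 index across all countries,
# instead of repeating each frame's own index
holidays = pd.concat(frames, ignore_index=True)
```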
Before we export the dataframes to csv files, we will rename the columns to snake_case and add full country names. It is possible to do some more data cleaning, but we are just preparing the datasets for further analysis, so we will not dive into the details.
# Rename columns in the `holidays` dataframe
holidays = holidays.rename(
columns={
"localName": "local_name",
"name": "english_name",
"countryCode": "country_code",
"fixed": "fixed_date",
"global": "global_holiday",
"launchYear": "launch_year",
}
)
# Check if everything is correct
holidays.head()
date | local_name | english_name | country_code | fixed_date | global_holiday | counties | launch_year | type | |
---|---|---|---|---|---|---|---|---|---|
0 | 2019-01-01 | Any nou | New Year's Day | AD | True | True | None | None | Public |
1 | 2019-03-14 | Dia de la Constitució | Constitution Day | AD | True | True | None | None | Public |
2 | 2019-03-14 | Mare de Déu de Meritxell | National Holiday | AD | True | True | None | None | Public |
3 | 2019-12-25 | Nadal | Christmas Day | AD | True | True | None | None | Public |
4 | 2019-01-01 | Viti i Ri | New Year's Day | AL | True | True | None | None | Public |
# Rename columns in the `weekends` dataframe
weekends = weekends.rename(
columns={
"startDate": "start_date",
"endDate": "end_date",
"dayCount": "weekend_length",
"needBridgeDay": "need_bridge_day",
}
)
# Check if everything is correct
weekends.head()
start_date | end_date | weekend_length | need_bridge_day | country_code | |
---|---|---|---|---|---|
0 | 2018-12-29 | 2019-01-01 | 4 | True | AD |
1 | 2019-03-14 | 2019-03-17 | 4 | True | AD |
2 | 2018-12-29 | 2019-01-02 | 5 | True | AL |
3 | 2019-03-14 | 2019-03-17 | 4 | True | AL |
4 | 2019-03-22 | 2019-03-24 | 3 | False | AL |
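The rename pattern used for both frames can be verified offline with a couple of sample columns (hypothetical row values):

```python
import pandas as pd

# Minimal frame with the API's original camelCase column names
weekends = pd.DataFrame({"startDate": ["2019-03-14"], "dayCount": [4]})

# rename returns a new frame with the mapped column names
weekends = weekends.rename(
    columns={"startDate": "start_date", "dayCount": "weekend_length"}
)
```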
After renaming the columns, we can export the dataframes to csv files that we can analyze in the future. Before doing so, we will add full country names to both dataframes.
# Merge the dataframes
holidays = pd.merge(holidays, countries_df, on="country_code")
weekends = pd.merge(weekends, countries_df, on="country_code")
# Reorder the columns in the `holidays` dataframe to have `country_name` after `country_code`
cols = [
"date",
"local_name",
"english_name",
"country_code",
"country_name",
"fixed_date",
"global_holiday",
"counties",
"launch_year",
"type",
]
holidays = holidays[cols]
# Check the `holidays` dataframe
holidays.head()
date | local_name | english_name | country_code | country_name | fixed_date | global_holiday | counties | launch_year | type | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 2019-01-01 | Any nou | New Year's Day | AD | Andorra | True | True | None | None | Public |
1 | 2019-03-14 | Dia de la Constitució | Constitution Day | AD | Andorra | True | True | None | None | Public |
2 | 2019-03-14 | Mare de Déu de Meritxell | National Holiday | AD | Andorra | True | True | None | None | Public |
3 | 2019-12-25 | Nadal | Christmas Day | AD | Andorra | True | True | None | None | Public |
4 | 2019-01-01 | Viti i Ri | New Year's Day | AL | Albania | True | True | None | None | Public |
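The merge-and-reorder step can be sketched with minimal sample frames (hypothetical rows; note that `pd.merge` defaults to an inner join on the shared `country_code` key, so holidays without a matching country would be dropped):

```python
import pandas as pd

# Minimal stand-ins for the holidays and countries dataframes
holidays = pd.DataFrame({"date": ["2019-01-01"], "country_code": ["AD"]})
countries_df = pd.DataFrame(
    {"country_code": ["AD", "AL"], "country_name": ["Andorra", "Albania"]}
)

# Inner join on country_code adds the full country name
merged = pd.merge(holidays, countries_df, on="country_code")

# Reorder so country_name sits right after country_code
merged = merged[["date", "country_code", "country_name"]]
```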
# Check the `weekends` dataframe
weekends.head()
start_date | end_date | weekend_length | need_bridge_day | country_code | country_name | |
---|---|---|---|---|---|---|
0 | 2018-12-29 | 2019-01-01 | 4 | True | AD | Andorra |
1 | 2019-03-14 | 2019-03-17 | 4 | True | AD | Andorra |
2 | 2018-12-29 | 2019-01-02 | 5 | True | AL | Albania |
3 | 2019-03-14 | 2019-03-17 | 4 | True | AL | Albania |
4 | 2019-03-22 | 2019-03-24 | 3 | False | AL | Albania |
Now we can export the dataframes to csv files that we can analyze in the future.
# Export the dataframes in csv files
holidays.to_csv("world_holidays_2019.csv", index=False)
weekends.to_csv("world_long_weekends_2019.csv", index=False)
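The exported files can later be read back with pandas. A sketch of the round trip, using an in-memory buffer instead of the csv files on disk:

```python
import io
import pandas as pd

df = pd.DataFrame({"date": ["2019-01-01"], "country_code": ["AD"]})

# index=False keeps the row index out of the file, matching the export above
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)

# parse_dates converts the date column back into datetime objects
restored = pd.read_csv(buf, parse_dates=["date"])
```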
One can use these datasets to answer the following questions (and not only!):
- Which holidays are celebrated worldwide (see the `global_holiday` column)?

In this project, we used the Nager.Date API to create two datasets of holidays and long weekends in different countries all over the globe. We also did some data cleaning to prepare the data for a more comfortable analysis, and proposed some questions that can be answered using these datasets.
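As an illustration, counting holidays per country that are celebrated worldwide could look like this (sample rows shown; a real analysis would load world_holidays_2019.csv instead):

```python
import pandas as pd

# Hypothetical sample rows in place of the exported dataset
holidays = pd.DataFrame({
    "country_name": ["Andorra", "Andorra", "Albania"],
    "global_holiday": [True, False, True],
})

# Keep only worldwide holidays, then count them per country
global_counts = (
    holidays[holidays["global_holiday"]]
    .groupby("country_name")
    .size()
)
```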