Notebook

Covid detection in Ireland¶

Statistics on detected spread of Coronavirus in Ireland. The dictionary below shows several pages from the source at https://www.worldometers.info/coronavirus/#countries. To chose a country other than Ireland, use 'country = key' in line 2 below. You can add to the list by exploring the sibling pages and adding their URLs to the dictionary below.

Locate and frame data¶

In [1]:

# Choose country here...
country = 'Ireland'

# From this dictionary
countries = {
	'Ireland': 		'https://www.worldometers.info/coronavirus/country/ireland/',
	'UK': 			'https://www.worldometers.info/coronavirus/country/uk/',
	'USA': 			'https://www.worldometers.info/coronavirus/country/us/',
	'Italy': 		'https://www.worldometers.info/coronavirus/country/italy/',
	'South Korea': 	'https://www.worldometers.info/coronavirus/country/south-korea',
	'Poland':		'https://www.worldometers.info/coronavirus/country/poland/',
	'China':		'https://www.worldometers.info/coronavirus/country/china/'
}
# You can add to this. The parent page is https://www.worldometers.info/coronavirus/#countries
# Select the country, and paste the key: Url pair above

Now we scrape data from the page source

In [2]:

import requests
from bs4 import BeautifulSoup
import re
import pandas as pd


url = countries[country]
content = requests.get(url).text
soup = BeautifulSoup(content, features='html.parser')

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-2-61db1a0bfbc9> in <module>
      1 import requests
----> 2 from BeautifulSoup import BeautifulSoup
      3 import re
      4 import pandas as pd
      5 

ModuleNotFoundError: No module named 'BeautifulSoup'

We will use this function again later, so define it here as a function.

In [ ]:

def get_data(div_id):
    '''Find the script element following div with id div_id'''
    script = soup.find('div', id = div_id).findNext('script')

    # Find the data line
    lines = script.text.split('\n')
    datas = [l for l in lines if l.strip().startswith('data')]
    data_line = datas[0].strip()

    # Extract the data from the data_line
    counts_text = re.findall('[0-9]+', data_line)
    counts = list(map(int, counts_text))

    # Create DataFrame with Cumulative Sum of the daily data
    df = pd.DataFrame(counts, columns=['DailyIncrease'])
    df['Total'] = df.DailyIncrease.cumsum()
    
    return df

In [ ]:

df = get_data('graph-cases-daily')
print(df)

Plots¶

We can plot this raw data already

In [ ]:

import matplotlib.pyplot as plt
%matplotlib inline

plt.plot(df.DailyIncrease, color = '#888800FF', label = 'Daily Increase')
plt.plot(df.Total, color = '#008800FF', label = 'Total Cases')

plt.title(country)
plt.legend()
plt.grid()
plt.xlabel('Day')
plt.ylabel('Count')
plt.show()

We can fit these curves to the nearest matching exponential. First, define the exponential function, and then use scypy to fit the curve

In [ ]:

# The function to fit datapoints to is exponential. B is the growth rate.
import numpy as np
def exponential(x, a, b, c):
    return a * np.exp(b * x) + c

# Use scypy
from scipy.optimize import curve_fit

Plot again, now with the smoothened version overlaid.

In [ ]:

'''Plot the daily and total rates, and then the smootened version'''
params_daily, _ = curve_fit(exponential, df.index, df.DailyIncrease)
params_total, _ = curve_fit(exponential, df.index, df.Total)
# Plot the data
plt.plot(df.DailyIncrease, color = '#888800FF', label = 'Daily Increase')
plt.plot(exponential(df.index, *params_daily), 'r-', color = '#88880088', label = 'Daily Increase (smooth)')

plt.plot(df.Total, color = '#008800FF', label = 'Total Cases')
plt.plot(exponential(df.index, *params_total), 'r-',color = '#00880088', label = 'Total (smooth)')

plt.title(country)
plt.legend()
plt.grid()
plt.xlabel('Day')
plt.ylabel('Count')
plt.show()

Parameters of interest¶

In the exponential function above, the 'b' parameter represented the growth rate. We can use the rule of seventy to estimate the doubling rate from this.

In [ ]:

# A quick lambda to limit decimal places displayed
short = lambda x: '{:.1f}'.format(x)

# The true value of seventy in the rule of seventy
seventy = np.log(2)

# And a lambda for the rule of seventy
doubling_time = lambda x: seventy/x

daily_growth_rate = params_daily[1]
total_growth_rate = params_total[1]

daily_doubling_rate = doubling_time(daily_growth_rate)
total_doubling_rate = doubling_time(total_growth_rate)


print(f'Daily Detections: growth rate = {short(daily_growth_rate)}, doubling time = {short(daily_doubling_rate)} days')
print(f'Total Detections: growth rate = {short(total_growth_rate)}, doubling time = {short(total_doubling_rate)} days')

Rate of change in parameters of interest¶

We don't really have enough data points for this, but let's look to see if there is any emerging trend in parameters like the doubling rates.

Using the dataframe, for each row, recalculate the exponential curve, and from that, what the estimated doubling rate was on that day. That is, we show the history of estimates of doubling rates as they would have been calculated on that day.

In [ ]:

def get_stats(df):
#     print('start')
    params_daily, _ = curve_fit(exponential, df.index, df.DailyIncrease)
    params_total, _ = curve_fit(exponential, df.index, df.Total)
#     print('fin')
    daily_growth_rate = params_daily[1]
    total_growth_rate = params_total[1]

    daily_doubling_rate = doubling_time(daily_growth_rate)
    total_doubling_rate = doubling_time(total_growth_rate)

    return daily_growth_rate, daily_doubling_rate, total_growth_rate, total_doubling_rate

In [ ]:

minimum = 8

for index, row in df.iterrows():
    
    head = df.head(index)

    try:
        daily_growth_rate, daily_doubling_rate, total_growth_rate, total_doubling_rate = get_stats(head)
    except:
        daily, total = 0, 0
    
    df.at[index, 'DGR'] = daily_growth_rate
    df.at[index, 'DDR'] = daily_doubling_rate
    df.at[index, 'TGR'] = total_growth_rate
    df.at[index, 'TDR'] = total_doubling_rate
        
# print(df)

In the plot below, we want consistently negative growth rates, meaning that the number of new cases is declining daily.

In [ ]:

plt.plot(df.DGR, color = '#888800FF', label = 'Daily Growth Rate')
plt.plot(df.TGR, color = '#008800FF', label = 'Total Growth Rate')

plt.title(country)
plt.legend()
plt.grid()
plt.xlabel('Day')
plt.ylabel('Count')
plt.show()

Death Rates¶

I don't want to be morbid, but these are starting to ramp up now too.

In [ ]:

df = get_data('graph-deaths-daily')
print(df)

In [ ]:

# Plot the data
plt.plot(df.DailyIncrease, color = '#888800FF', label = 'Daily Deaths')
plt.plot(df.Total, color = '#008800FF', label = 'Total Deaths')

plt.title(country)
plt.legend()
plt.grid()
plt.xlabel('Day')
plt.ylabel('Count')
plt.show()