Visualisations and descriptive statistics

This notebook contains some basic visualisations and descriptive statistics for the COVID-19 case numbers reported per Swiss canton.

In [1]:
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
figsize = (12, 9)
In [2]:
import pandas as pd
import glob
In [3]:
dataframes = []

for f in glob.glob("fallzahlen_kanton_total_csv/COVID19_Fallzahlen_Kanton_*total.csv"):
    dataframes.append(pd.read_csv(f))

# The per-canton files do not all share the same columns.
# sort=True keeps the alphabetical column order and avoids
# the pandas FutureWarning about unaligned axes.
df = pd.concat(dataframes, sort=True)

df['date'] = pd.to_datetime(df['date'], dayfirst=True)
In [4]:
df.head()
Out[4]:
   TotalCured  TotalPosTests1  abbreviation_canton_and_fl        date  ncumul_ICU  ncumul_conf  ncumul_deceased  ncumul_hosp  ncumul_released  ncumul_tested  ncumul_vent                                             source   time
0         NaN             NaN                          AG  2020-02-28         NaN          1.0              NaN          NaN              NaN            NaN          NaN  https://www.ag.ch/media/kanton_aargau/themen_1...  15:00
1         NaN             NaN                          AG  2020-03-02         NaN          2.0              NaN          NaN              NaN            NaN          NaN  https://www.ag.ch/media/kanton_aargau/themen_1...  18:00
2         NaN             NaN                          AG  2020-03-03         NaN          6.0              NaN          NaN              NaN            NaN          NaN  https://www.ag.ch/media/kanton_aargau/themen_1...  15:00
3         NaN             NaN                          AG  2020-03-04         NaN          7.0              NaN          NaN              NaN            NaN          NaN  https://www.ag.ch/media/kanton_aargau/themen_1...  15:00
4         NaN             NaN                          AG  2020-03-05         NaN          9.0              NaN          NaN              NaN            NaN          NaN  https://www.ag.ch/media/kanton_aargau/themen_1...  15:00
In [5]:
def transform_df(df, column, pretty_name):
    """Pick out one column of interest as a gap-free Date x Canton table"""
    df = (df
          .rename(columns={"abbreviation_canton_and_fl": "Canton",
                           column: pretty_name,
                           "date": "Date"})
          .pivot_table(index="Date",
                       values=[pretty_name],
                       columns=['Canton'])
         )
    # Create a row for every day
    all_days = pd.date_range(df.index.min(), df.index.max(), freq='D')
    df = df.reindex(all_days)
    # Fill missing values with the previous day's number
    df = df.ffill()
    # Now there are only missing values at the start
    # of the series, so set them to zero
    df = df.fillna(value=0)
    return df


def time_to_double(series):
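    """Compute how long it took for the most recent doubling.

    Returns the time between the last entry and the last day on
    which the series was still below half of its latest value.
    """
    last_day = series.index[-1]
    current = series[last_day]
    half = current / 2

    # Last day on which the count was still below half the current count
    half_day = series[series < half].index[-1]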
    return last_day - half_day
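
As a quick sanity check of `time_to_double`, the sketch below runs it on a small made-up series. The counts and dates are hypothetical, and the function is repeated (in condensed form) only so that the block runs on its own.

```python
import pandas as pd


def time_to_double(series):
    """Time since the series was last below half of its latest value."""
    last_day = series.index[-1]
    half = series[last_day] / 2
    half_day = series[series < half].index[-1]
    return last_day - half_day


# Hypothetical cumulative counts, one value per day
days = pd.date_range("2020-03-01", periods=7, freq="D")
toy = pd.Series([1, 2, 3, 5, 8, 12, 20], index=days)

# The latest value is 20, so half is 10. The last day with a count
# below 10 is 2020-03-05 (count 8), two days before the last entry.
doubling = time_to_double(toy)
print(doubling.days)
```

Note that the function raises an `IndexError` if the series never drops below half of its latest value, for example when it starts with only a few days of data.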

Deaths

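The number of confirmed deaths is a known quantity, in contrast to the number of infections, which can only be estimated from the confirmed cases.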

In [6]:
deaths_per_canton = transform_df(df, "ncumul_deceased", "Deaths")
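
To see what the gap filling inside `transform_df` does, here is a minimal sketch on made-up data. The column names `A` and `B` and all values are hypothetical; `ffill()` is used here as the equivalent of `fillna(method='pad')`.

```python
import pandas as pd

# Hypothetical cumulative counts with reporting gaps:
# nothing was reported on 3 and 5 March, and "B" starts late
raw = pd.DataFrame(
    {"A": [1.0, 2.0, 4.0], "B": [None, 1.0, 3.0]},
    index=pd.to_datetime(["2020-03-02", "2020-03-04", "2020-03-06"]),
)

all_days = pd.date_range(raw.index.min(), raw.index.max(), freq="D")
filled = (
    raw.reindex(all_days)  # one row per calendar day, gaps become NaN
       .ffill()            # carry the previous day's count forward
       .fillna(0)          # leading gaps: nothing reported yet, so zero
)
print(filled)
```

Carrying the last value forward is a reasonable choice for cumulative counts, because a day without a report does not lower the total. It does mean that a flat stretch in the result can reflect missing reports rather than a day without new cases.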
In [7]:
# Look at the last few days to double check the data
deaths_per_canton.tail().T
Out[7]:
               2020-03-17  2020-03-18  2020-03-19  2020-03-20  2020-03-21
       Canton
Deaths AG             0.0         0.0         0.0         1.0         1.0
       BE             1.0         1.0         1.0         2.0         2.0
       BL             2.0         2.0         2.0         3.0         3.0
       BS             4.0         4.0         4.0         4.0         5.0
       FR             0.0         1.0         1.0         1.0         1.0
       GE             0.0         0.0         0.0         7.0         7.0
       GR             0.0         1.0         1.0         3.0         3.0
       NE             0.0         0.0         0.0         2.0         2.0
       TI            10.0        14.0        15.0        22.0        28.0
       VD             5.0         5.0         7.0        12.0        12.0
       VS             3.0         3.0         4.0         5.0         5.0
       ZH             1.0         1.0         3.0         3.0         3.0
In [8]:
ch = deaths_per_canton.sum(axis=1)
In [9]:
print("It took {} days for the number of deaths to double.".format(time_to_double(ch).days))
It took 3 days for the number of deaths to double.
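
A different way to estimate the doubling time, not the one used in this notebook, is to assume roughly exponential growth and fit a straight line to the logarithm of the counts. If the count grows like exp(r * t), the doubling time is ln(2) / r. The sketch below uses a made-up series that doubles exactly every three days; the seven-day fitting window is an arbitrary assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical cumulative counts that double exactly every 3 days
days = pd.date_range("2020-03-01", periods=10, freq="D")
toy = pd.Series(10 * 2 ** (np.arange(10) / 3), index=days)

# Fit log(count) = slope * t + intercept over the last 7 days
recent = toy.iloc[-7:]
t = np.arange(len(recent))
slope, intercept = np.polyfit(t, np.log(recent.to_numpy()), 1)

# Growth rate per day -> doubling time in days
doubling_days = np.log(2) / slope
print(round(doubling_days, 2))
```

Unlike `time_to_double`, this returns a fractional number of days and uses every point in the window, but it is only meaningful while the growth is close to exponential and all counts in the window are positive.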
In [10]:
ax = ch.plot(legend=None, title="Deaths across Switzerland")
ax.set_ylabel("Deaths");
# `nonposy` was renamed to `nonpositive` in Matplotlib 3.3
ax.set_yscale('log', nonpositive='clip')

Confirmed cases

See https://ourworldindata.org/coronavirus#cases-of-covid-19 for caveats on interpreting these numbers.

In [11]:
confirmed_per_canton = transform_df(df, "ncumul_conf", "Cases")
In [12]:
# Look at the last few days to double check data
confirmed_per_canton.tail().T
Out[12]:
               2020-03-17  2020-03-18  2020-03-19  2020-03-20  2020-03-21
       Canton
Cases  AG            67.0       101.0       118.0       165.0       165.0
       AI             5.0         5.0         6.0         6.0         6.0
       AR             5.0        11.0        11.0        11.0        11.0
       BE           123.0       193.0       282.0       377.0       377.0
       BL            89.0       116.0       134.0       184.0       282.0
       BS           165.0       182.0       222.0       272.0       299.0
       FR            11.0        11.0        11.0        11.0        11.0
       GE             0.0         0.0         0.0       873.0       873.0
       GL            10.0        10.0        17.0        17.0        17.0
       GR             0.0       116.0       145.0       213.0       213.0
       JU             0.0         0.0        27.0        29.0        29.0
       LU             0.0        65.0        65.0        92.0       109.0
       NE             0.0         0.0         0.0       159.0       159.0
       NW             4.0         4.0        25.0        28.0        28.0
       OW             1.0         1.0         1.0         1.0         1.0
       SG            47.0        61.0        85.0        98.0        98.0
       SH             0.0         0.0         0.0        14.0        14.0
       SZ            13.0        13.0        13.0        13.0        13.0
       TG            23.0        32.0        36.0        49.0        56.0
       TI           422.0       511.0       638.0       834.0       918.0
       UR             2.0         5.0         7.0         7.0        12.0
       VD           608.0       796.0      1212.0      1432.0      1432.0
       VS            95.0        95.0       232.0       282.0       282.0
       ZG            24.0        24.0        24.0        48.0        48.0
       ZH           294.0       424.0       526.0       773.0       773.0
In [13]:
# This plot is cluttered, but it can serve as a starting
# point for picking out a few cantons
ax = confirmed_per_canton.plot(legend=None, title="Confirmed cases per canton")
# `nonposy` was renamed to `nonpositive` in Matplotlib 3.3
ax.set_yscale('log', nonpositive='clip')
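
One way to pick out a few cantons is to drop the top column level and select by canton abbreviation. The sketch below builds a small stand-in for `confirmed_per_canton` from a few values copied from the table above; the choice of cantons and the variable names `toy` and `selected` are only for illustration.

```python
import pandas as pd

# Stand-in for confirmed_per_canton: the columns form a
# ("Cases", canton) MultiIndex, as produced by pivot_table
days = pd.date_range("2020-03-17", periods=3, freq="D")
columns = pd.MultiIndex.from_product(
    [["Cases"], ["BE", "TI", "VD", "ZH"]], names=[None, "Canton"]
)
toy = pd.DataFrame(
    [[123, 422, 608, 294],
     [193, 511, 796, 424],
     [282, 638, 1212, 526]],
    index=days, columns=columns,
)

# Drop the "Cases" level, then keep only the cantons of interest
selected = toy["Cases"][["TI", "VD", "ZH"]]

# With only a few lines the legend is readable, so keep it
ax = selected.plot(title="Confirmed cases in selected cantons")
ax.set_ylabel("Cases")
ax.set_yscale("log")
```

With the real data, the same selection would read `confirmed_per_canton["Cases"][["TI", "VD", "ZH"]]`.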
In [14]:
# Switzerland-wide cases
ch = confirmed_per_canton.sum(axis=1)
In [15]:
print("It took {} days for the number of confirmed cases to double.".format(time_to_double(ch).days))
It took 3 days for the number of confirmed cases to double.
In [16]:
ax = ch.plot(legend=None, title="Confirmed cases across Switzerland")
ax.set_ylabel("Cases");
# `nonposy` was renamed to `nonpositive` in Matplotlib 3.3
ax.set_yscale('log', nonpositive='clip')

Created by Tim Head, licensed under CC-BY-4.