Exploring data on COVID-19

In this notebook we will explore and analyse some data on the advance of the COVID-19 pandemic. The goal is to produce a plot like this:

[Figure: plot]

Press Shift-Enter to evaluate a cell.

$y = x^2$

In [1]:
url = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv"
Out[1]:
"https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv"
In [2]:
url
Out[2]:
"https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv"
In [3]:
typeof(url)
Out[3]:
String
In [4]:
x = 3
Out[4]:
3
In [5]:
typeof(x)
Out[5]:
Int64
In [6]:
x * x
Out[6]:
9
In [7]:
url * url
Out[7]:
"https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csvhttps://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv"
In [8]:
*
Out[8]:
* (generic function with 357 methods)
In [1]:
methods(*);  # remove `;` to see the output
In [11]:
(1 + 2im) * (3 + im)
Out[11]:
1 + 7im
In [12]:
@which (1 + 2im) * (3 + im)
Out[12]:
*(z::Complex, w::Complex) in Base at complex.jl:277
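The cells above show that `*` is a generic function: Julia chooses a method from the types of the arguments, which is why `x * x` multiplies integers while `url * url` concatenates strings. A minimal sketch of the same mechanism with a user-defined function follows; the name `combine` is hypothetical and introduced only for illustration.

```julia
# `combine` is a hypothetical function, not part of Julia or of this notebook.
# Each definition adds one method; Julia picks the method from the argument types.
combine(a::Number, b::Number) = a + b                          # numbers: add
combine(a::AbstractString, b::AbstractString) = a * " " * b    # strings: join with a space

sum_result = combine(1, 2)                # dispatches to the Number method
text_result = combine("COVID", "data")    # dispatches to the AbstractString method
method_count = length(methods(combine))   # how many methods `combine` has
```

Running `@which combine(1, 2)` would report which of the two methods was selected, just as it did for complex multiplication.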
In [13]:
url
Out[13]:
"https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv"

Grab the data

In [14]:
download(url, "covid_data.csv")
Out[14]:
"covid_data.csv"
In [15]:
readdir
Out[15]:
readdir (generic function with 2 methods)
In [16]:
readdir()
Out[16]:
3-element Array{String,1}:
 ".ipynb_checkpoints"
 "01 - Exploring COVID-19 data.ipynb"
 "covid_data.csv"

Install a package ONCE in our current Julia installation:

In [18]:
using Pkg   # built-in package manager in Julia: Pkg
Pkg.add("CSV")   # calls the `add` function from the module Pkg.  This installs a package
   Updating registry at `~/.julia/registries/General`
   Updating git-repo `https://github.com/JuliaRegistries/General.git`
  Resolving package versions...
  Installed OpenSSL_jll ─ v1.1.1+2
   Updating `~/.julia/environments/v1.4/Project.toml`
 [no changes]
   Updating `~/.julia/environments/v1.4/Manifest.toml`
  [458c3c95] ↑ OpenSSL_jll v1.1.1+1 ⇒ v1.1.1+2

Load a package every time we run a Julia session:

In [20]:
using CSV   # Comma Separated Values
In [21]:
CSV.read("covid_data.csv")  # run the function `read` from the package CSV
Out[21]:

253 rows × 72 columns (omitted printing of 67 columns)

Row | Province/State               | Country/Region         | Lat      | Long     | 1/22/20
    | String⍰                      | String                 | Float64  | Float64  | Int64
1   | missing                      | Afghanistan            | 33.0     | 65.0     | 0
2   | missing                      | Albania                | 41.1533  | 20.1683  | 0
3   | missing                      | Algeria                | 28.0339  | 1.6596   | 0
4   | missing                      | Andorra                | 42.5063  | 1.5218   | 0
5   | missing                      | Angola                 | -11.2027 | 17.8739  | 0
6   | missing                      | Antigua and Barbuda    | 17.0608  | -61.7964 | 0
7   | missing                      | Argentina              | -38.4161 | -63.6167 | 0
8   | missing                      | Armenia                | 40.0691  | 45.0382  | 0
9   | Australian Capital Territory | Australia              | -35.4735 | 149.012  | 0
10  | New South Wales              | Australia              | -33.8688 | 151.209  | 0
11  | Northern Territory           | Australia              | -12.4634 | 130.846  | 0
12  | Queensland                   | Australia              | -28.0167 | 153.4    | 0
13  | South Australia              | Australia              | -34.9285 | 138.601  | 0
14  | Tasmania                     | Australia              | -41.4545 | 145.971  | 0
15  | Victoria                     | Australia              | -37.8136 | 144.963  | 0
16  | Western Australia            | Australia              | -31.9505 | 115.861  | 0
17  | missing                      | Austria                | 47.5162  | 14.5501  | 0
18  | missing                      | Azerbaijan             | 40.1431  | 47.5769  | 0
19  | missing                      | Bahamas                | 25.0343  | -77.3963 | 0
20  | missing                      | Bahrain                | 26.0275  | 50.55    | 0
21  | missing                      | Bangladesh             | 23.685   | 90.3563  | 0
22  | missing                      | Barbados               | 13.1939  | -59.5432 | 0
23  | missing                      | Belarus                | 53.7098  | 27.9534  | 0
24  | missing                      | Belgium                | 50.8333  | 4.0      | 0
25  | missing                      | Benin                  | 9.3077   | 2.3158   | 0
26  | missing                      | Bhutan                 | 27.5142  | 90.4336  | 0
27  | missing                      | Bolivia                | -16.2902 | -63.5887 | 0
28  | missing                      | Bosnia and Herzegovina | 43.9159  | 17.6791  | 0
29  | missing                      | Brazil                 | -14.235  | -51.9253 | 0
30  | missing                      | Brunei                 | 4.5353   | 114.728  | 0
In [27]:
data = CSV.read("covid_data.csv");
In [28]:
data
Out[28]:

253 rows × 72 columns (omitted printing of 67 columns)

Row | Province/State               | Country/Region         | Lat      | Long     | 1/22/20
    | String⍰                      | String                 | Float64  | Float64  | Int64
1   | missing                      | Afghanistan            | 33.0     | 65.0     | 0
2   | missing                      | Albania                | 41.1533  | 20.1683  | 0
3   | missing                      | Algeria                | 28.0339  | 1.6596   | 0
4   | missing                      | Andorra                | 42.5063  | 1.5218   | 0
5   | missing                      | Angola                 | -11.2027 | 17.8739  | 0
6   | missing                      | Antigua and Barbuda    | 17.0608  | -61.7964 | 0
7   | missing                      | Argentina              | -38.4161 | -63.6167 | 0
8   | missing                      | Armenia                | 40.0691  | 45.0382  | 0
9   | Australian Capital Territory | Australia              | -35.4735 | 149.012  | 0
10  | New South Wales              | Australia              | -33.8688 | 151.209  | 0
11  | Northern Territory           | Australia              | -12.4634 | 130.846  | 0
12  | Queensland                   | Australia              | -28.0167 | 153.4    | 0
13  | South Australia              | Australia              | -34.9285 | 138.601  | 0
14  | Tasmania                     | Australia              | -41.4545 | 145.971  | 0
15  | Victoria                     | Australia              | -37.8136 | 144.963  | 0
16  | Western Australia            | Australia              | -31.9505 | 115.861  | 0
17  | missing                      | Austria                | 47.5162  | 14.5501  | 0
18  | missing                      | Azerbaijan             | 40.1431  | 47.5769  | 0
19  | missing                      | Bahamas                | 25.0343  | -77.3963 | 0
20  | missing                      | Bahrain                | 26.0275  | 50.55    | 0
21  | missing                      | Bangladesh             | 23.685   | 90.3563  | 0
22  | missing                      | Barbados               | 13.1939  | -59.5432 | 0
23  | missing                      | Belarus                | 53.7098  | 27.9534  | 0
24  | missing                      | Belgium                | 50.8333  | 4.0      | 0
25  | missing                      | Benin                  | 9.3077   | 2.3158   | 0
26  | missing                      | Bhutan                 | 27.5142  | 90.4336  | 0
27  | missing                      | Bolivia                | -16.2902 | -63.5887 | 0
28  | missing                      | Bosnia and Herzegovina | 43.9159  | 17.6791  | 0
29  | missing                      | Brazil                 | -14.235  | -51.9253 | 0
30  | missing                      | Brunei                 | 4.5353   | 114.728  | 0
In [29]:
typeof(data)
Out[29]:
DataFrames.DataFrame
In [30]:
using DataFrames
In [33]:
data_2 = rename(data, 1 => "province", 2 => "country")
Out[33]:

253 rows × 72 columns (omitted printing of 67 columns)

Row | province                     | country                | Lat      | Long     | 1/22/20
    | String⍰                      | String                 | Float64  | Float64  | Int64
1   | missing                      | Afghanistan            | 33.0     | 65.0     | 0
2   | missing                      | Albania                | 41.1533  | 20.1683  | 0
3   | missing                      | Algeria                | 28.0339  | 1.6596   | 0
4   | missing                      | Andorra                | 42.5063  | 1.5218   | 0
5   | missing                      | Angola                 | -11.2027 | 17.8739  | 0
6   | missing                      | Antigua and Barbuda    | 17.0608  | -61.7964 | 0
7   | missing                      | Argentina              | -38.4161 | -63.6167 | 0
8   | missing                      | Armenia                | 40.0691  | 45.0382  | 0
9   | Australian Capital Territory | Australia              | -35.4735 | 149.012  | 0
10  | New South Wales              | Australia              | -33.8688 | 151.209  | 0
11  | Northern Territory           | Australia              | -12.4634 | 130.846  | 0
12  | Queensland                   | Australia              | -28.0167 | 153.4    | 0
13  | South Australia              | Australia              | -34.9285 | 138.601  | 0
14  | Tasmania                     | Australia              | -41.4545 | 145.971  | 0
15  | Victoria                     | Australia              | -37.8136 | 144.963  | 0
16  | Western Australia            | Australia              | -31.9505 | 115.861  | 0
17  | missing                      | Austria                | 47.5162  | 14.5501  | 0
18  | missing                      | Azerbaijan             | 40.1431  | 47.5769  | 0
19  | missing                      | Bahamas                | 25.0343  | -77.3963 | 0
20  | missing                      | Bahrain                | 26.0275  | 50.55    | 0
21  | missing                      | Bangladesh             | 23.685   | 90.3563  | 0
22  | missing                      | Barbados               | 13.1939  | -59.5432 | 0
23  | missing                      | Belarus                | 53.7098  | 27.9534  | 0
24  | missing                      | Belgium                | 50.8333  | 4.0      | 0
25  | missing                      | Benin                  | 9.3077   | 2.3158   | 0
26  | missing                      | Bhutan                 | 27.5142  | 90.4336  | 0
27  | missing                      | Bolivia                | -16.2902 | -63.5887 | 0
28  | missing                      | Bosnia and Herzegovina | 43.9159  | 17.6791  | 0
29  | missing                      | Brazil                 | -14.235  | -51.9253 | 0
30  | missing                      | Brunei                 | 4.5353   | 114.728  | 0
In [34]:
rename!(data, 1 => "province", 2 => "country") # ! is convention: function *modifies* its argument in place
Out[34]:

253 rows × 72 columns (omitted printing of 67 columns)

Row | province                     | country                | Lat      | Long     | 1/22/20
    | String⍰                      | String                 | Float64  | Float64  | Int64
1   | missing                      | Afghanistan            | 33.0     | 65.0     | 0
2   | missing                      | Albania                | 41.1533  | 20.1683  | 0
3   | missing                      | Algeria                | 28.0339  | 1.6596   | 0
4   | missing                      | Andorra                | 42.5063  | 1.5218   | 0
5   | missing                      | Angola                 | -11.2027 | 17.8739  | 0
6   | missing                      | Antigua and Barbuda    | 17.0608  | -61.7964 | 0
7   | missing                      | Argentina              | -38.4161 | -63.6167 | 0
8   | missing                      | Armenia                | 40.0691  | 45.0382  | 0
9   | Australian Capital Territory | Australia              | -35.4735 | 149.012  | 0
10  | New South Wales              | Australia              | -33.8688 | 151.209  | 0
11  | Northern Territory           | Australia              | -12.4634 | 130.846  | 0
12  | Queensland                   | Australia              | -28.0167 | 153.4    | 0
13  | South Australia              | Australia              | -34.9285 | 138.601  | 0
14  | Tasmania                     | Australia              | -41.4545 | 145.971  | 0
15  | Victoria                     | Australia              | -37.8136 | 144.963  | 0
16  | Western Australia            | Australia              | -31.9505 | 115.861  | 0
17  | missing                      | Austria                | 47.5162  | 14.5501  | 0
18  | missing                      | Azerbaijan             | 40.1431  | 47.5769  | 0
19  | missing                      | Bahamas                | 25.0343  | -77.3963 | 0
20  | missing                      | Bahrain                | 26.0275  | 50.55    | 0
21  | missing                      | Bangladesh             | 23.685   | 90.3563  | 0
22  | missing                      | Barbados               | 13.1939  | -59.5432 | 0
23  | missing                      | Belarus                | 53.7098  | 27.9534  | 0
24  | missing                      | Belgium                | 50.8333  | 4.0      | 0
25  | missing                      | Benin                  | 9.3077   | 2.3158   | 0
26  | missing                      | Bhutan                 | 27.5142  | 90.4336  | 0
27  | missing                      | Bolivia                | -16.2902 | -63.5887 | 0
28  | missing                      | Bosnia and Herzegovina | 43.9159  | 17.6791  | 0
29  | missing                      | Brazil                 | -14.235  | -51.9253 | 0
30  | missing                      | Brunei                 | 4.5353   | 114.728  | 0
In [35]:
data
Out[35]:

253 rows × 72 columns (omitted printing of 67 columns)

Row | province                     | country                | Lat      | Long     | 1/22/20
    | String⍰                      | String                 | Float64  | Float64  | Int64
1   | missing                      | Afghanistan            | 33.0     | 65.0     | 0
2   | missing                      | Albania                | 41.1533  | 20.1683  | 0
3   | missing                      | Algeria                | 28.0339  | 1.6596   | 0
4   | missing                      | Andorra                | 42.5063  | 1.5218   | 0
5   | missing                      | Angola                 | -11.2027 | 17.8739  | 0
6   | missing                      | Antigua and Barbuda    | 17.0608  | -61.7964 | 0
7   | missing                      | Argentina              | -38.4161 | -63.6167 | 0
8   | missing                      | Armenia                | 40.0691  | 45.0382  | 0
9   | Australian Capital Territory | Australia              | -35.4735 | 149.012  | 0
10  | New South Wales              | Australia              | -33.8688 | 151.209  | 0
11  | Northern Territory           | Australia              | -12.4634 | 130.846  | 0
12  | Queensland                   | Australia              | -28.0167 | 153.4    | 0
13  | South Australia              | Australia              | -34.9285 | 138.601  | 0
14  | Tasmania                     | Australia              | -41.4545 | 145.971  | 0
15  | Victoria                     | Australia              | -37.8136 | 144.963  | 0
16  | Western Australia            | Australia              | -31.9505 | 115.861  | 0
17  | missing                      | Austria                | 47.5162  | 14.5501  | 0
18  | missing                      | Azerbaijan             | 40.1431  | 47.5769  | 0
19  | missing                      | Bahamas                | 25.0343  | -77.3963 | 0
20  | missing                      | Bahrain                | 26.0275  | 50.55    | 0
21  | missing                      | Bangladesh             | 23.685   | 90.3563  | 0
22  | missing                      | Barbados               | 13.1939  | -59.5432 | 0
23  | missing                      | Belarus                | 53.7098  | 27.9534  | 0
24  | missing                      | Belgium                | 50.8333  | 4.0      | 0
25  | missing                      | Benin                  | 9.3077   | 2.3158   | 0
26  | missing                      | Bhutan                 | 27.5142  | 90.4336  | 0
27  | missing                      | Bolivia                | -16.2902 | -63.5887 | 0
28  | missing                      | Bosnia and Herzegovina | 43.9159  | 17.6791  | 0
29  | missing                      | Brazil                 | -14.235  | -51.9253 | 0
30  | missing                      | Brunei                 | 4.5353   | 114.728  | 0
In [ ]:
DataFrames.rename!(...)
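The difference between `rename` and `rename!` follows a general Julia naming convention: the version without `!` returns a new object, while the version with `!` changes its argument. A small sketch of the same pattern, using `sort` and `sort!` from Base on a made-up vector as a stand-in:

```julia
v = [3, 1, 2]          # made-up sample vector

w = sort(v)            # returns a new sorted vector
v_after_sort = copy(v) # snapshot: `v` has not been changed by `sort`

sort!(v)               # sorts `v` itself, in place
```

After `sort(v)` the original `v` is still `[3, 1, 2]`; only after `sort!(v)` does `v` itself become sorted. The `!` is only a convention in the name, not something the language enforces.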

Ctrl-M, Y to switch to code cell

Ctrl-M, M to switch to markdown

Esc can be used instead of Ctrl-M in these shortcuts.

Interact.jl: Simple interactive visualizations

In [37]:
using Interact

Unable to load WebIO. Please make sure WebIO works for your Jupyter client. For troubleshooting, please see the WebIO/IJulia documentation.

In [38]:
for i in 1:10
    @show i
end
i = 1
i = 2
i = 3
i = 4
i = 5
i = 6
i = 7
i = 8
i = 9
i = 10
In [40]:
typeof(1:10)
Out[40]:
UnitRange{Int64}
In [41]:
collect(1:10)
Out[41]:
10-element Array{Int64,1}:
  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
In [42]:
for i in 1:10
    println("i = ", i)
end
i = 1
i = 2
i = 3
i = 4
i = 5
i = 6
i = 7
i = 8
i = 9
i = 10
In [43]:
for i in 1:10
    @show i
end
i = 1
i = 2
i = 3
i = 4
i = 5
i = 6
i = 7
i = 8
i = 9
i = 10
In [44]:
for i in 1:10
    i
end
In [47]:
@manipulate for i in 1:10
    HTML(i^2)
end
Out[47]:
In [48]:
countries = data[2:5, 2]
Out[48]:
4-element Array{String,1}:
 "Albania"
 "Algeria"
 "Andorra"
 "Angola"
In [49]:
countries = data[1:end, 2]
Out[49]:
253-element Array{String,1}:
 "Afghanistan"
 "Albania"
 "Algeria"
 "Andorra"
 "Angola"
 "Antigua and Barbuda"
 "Argentina"
 "Armenia"
 "Australia"
 "Australia"
 "Australia"
 "Australia"
 "Australia"
 ⋮
 "West Bank and Gaza"
 "Guinea-Bissau"
 "Mali"
 "Saint Kitts and Nevis"
 "Canada"
 "Canada"
 "Kosovo"
 "Burma"
 "United Kingdom"
 "United Kingdom"
 "United Kingdom"
 "MS Zaandam"
In [52]:
countries = collect(data[:, 2])
Out[52]:
253-element Array{String,1}:
 "Afghanistan"
 "Albania"
 "Algeria"
 "Andorra"
 "Angola"
 "Antigua and Barbuda"
 "Argentina"
 "Armenia"
 "Australia"
 "Australia"
 "Australia"
 "Australia"
 "Australia"
 ⋮
 "West Bank and Gaza"
 "Guinea-Bissau"
 "Mali"
 "Saint Kitts and Nevis"
 "Canada"
 "Canada"
 "Kosovo"
 "Burma"
 "United Kingdom"
 "United Kingdom"
 "United Kingdom"
 "MS Zaandam"
In [53]:
unique_countries = unique(countries)
Out[53]:
177-element Array{String,1}:
 "Afghanistan"
 "Albania"
 "Algeria"
 "Andorra"
 "Angola"
 "Antigua and Barbuda"
 "Argentina"
 "Armenia"
 "Australia"
 "Austria"
 "Azerbaijan"
 "Bahamas"
 "Bahrain"
 ⋮
 "Syria"
 "Timor-Leste"
 "Belize"
 "Laos"
 "Libya"
 "West Bank and Gaza"
 "Guinea-Bissau"
 "Mali"
 "Saint Kitts and Nevis"
 "Kosovo"
 "Burma"
 "MS Zaandam"
In [54]:
@manipulate for i in 1:length(countries)
    countries[i]
end
Out[54]:

Julia has 1-based indexing: indices of vectors start at 1, not 0
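A short sketch of what 1-based indexing means in practice, using a made-up vector rather than the real data:

```julia
letters = ["a", "b", "c"]      # made-up sample vector

first_item = letters[1]        # the first element lives at index 1
last_item = letters[end]       # `end` stands for the last valid index, here 3
lo = firstindex(letters)       # smallest valid index
hi = lastindex(letters)        # largest valid index

# Index 0 is out of range for an ordinary vector and throws a BoundsError.
zero_is_valid = try
    letters[0]
    true
catch err
    err isa BoundsError ? false : rethrow()
end
```

This is why the loops in this notebook run over `1:length(countries)` rather than starting at 0.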

In [60]:
@manipulate for i in 1:length(countries)
    data[i, 1:15]
end
Out[60]:

Extract data and plot

In [61]:
startswith("United", "U")
Out[61]:
true
In [62]:
startswith("David", "U")
Out[62]:
false

Array comprehension:

In [66]:
U_countries = [startswith(country, "U") for country in countries];
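An array comprehension builds a new array by evaluating an expression once per element. The sketch below repeats the idea on a small made-up list (not the real data), and also shows an optional `if` filter and indexing with the resulting vector of Bools:

```julia
names_list = ["Uganda", "Brazil", "Ukraine", "Chile"]   # made-up sample, not the real data

# One Bool per element: does the name start with "U"?
starts_with_U = [startswith(name, "U") for name in names_list]

# A comprehension can also filter while it transforms.
U_lengths = [length(name) for name in names_list if startswith(name, "U")]

# A Bool vector can index another vector of the same length.
U_names = names_list[starts_with_U]
```

Indexing with a Bool vector is what selects the matching rows when `U_countries` is passed to `data[U_countries, :]`.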
In [68]:
data[U_countries, :]
Out[68]:

16 rows × 72 columns (omitted printing of 66 columns)

Row | province                 | country              | Lat      | Long     | 1/22/20 | 1/23/20
    | String⍰                  | String               | Float64  | Float64  | Int64   | Int64
1   | missing                  | Uganda               | 1.0      | 32.0     | 0       | 0
2   | missing                  | Ukraine              | 48.3794  | 31.1656  | 0       | 0
3   | missing                  | United Arab Emirates | 24.0     | 54.0     | 0       | 0
4   | Bermuda                  | United Kingdom       | 32.3078  | -64.7505 | 0       | 0
5   | Cayman Islands           | United Kingdom       | 19.3133  | -81.2546 | 0       | 0
6   | Channel Islands          | United Kingdom       | 49.3723  | -2.3644  | 0       | 0
7   | Gibraltar                | United Kingdom       | 36.1408  | -5.3536  | 0       | 0
8   | Isle of Man              | United Kingdom       | 54.2361  | -4.5481  | 0       | 0
9   | Montserrat               | United Kingdom       | 16.7425  | -62.1874 | 0       | 0
10  | missing                  | United Kingdom       | 55.3781  | -3.436   | 0       | 0
11  | missing                  | Uruguay              | -32.5228 | -55.7658 | 0       | 0
12  | missing                  | US                   | 37.0902  | -95.7129 | 1       | 1
13  | missing                  | Uzbekistan           | 41.3775  | 64.5853  | 0       | 0
14  | Anguilla                 | United Kingdom       | 18.2206  | -63.0686 | 0       | 0
15  | British Virgin Islands   | United Kingdom       | 18.4207  | -64.64   | 0       | 0
16  | Turks and Caicos Islands | United Kingdom       | 21.694   | -71.7979 | 0       | 0
In [69]:
countries == "US"
Out[69]:
false
In [71]:
countries .== "US"  
# . is "broadcasting": apply operation to each element of a vector
Out[71]:
253-element BitArray{1}:
 0
 0
 0
 0
 0
 0
 0
 0
 0
 0
 0
 0
 0
 ⋮
 0
 0
 0
 0
 0
 0
 0
 0
 0
 0
 0
 0
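The two cells above differ only by a dot: `==` compares the whole vector with a single string, while `.==` compares element by element. A minimal sketch on a made-up vector (not the real data) makes the difference visible:

```julia
sample = ["Mali", "US", "Kosovo", "US"]   # made-up sample, not the real data

whole = (sample == "US")       # a vector is never equal to a single string
mask = sample .== "US"         # element-wise comparison: one Bool per element

first_hit = findfirst(mask)    # index of the first `true`
all_hits = findall(mask)       # indices of every `true`

# The dot works for ordinary functions too: f.(xs) applies f to each element.
lengths = length.(sample)
```

`findfirst` on such a mask is how the row of a given country can be located in the table.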
In [75]:
US_row = findfirst(countries .== "US")
Out[75]:
226
In [77]:
US_data_row = data[US_row, :]
Out[77]:

DataFrameRow (72 columns)

Row | province | country | Lat     | Long     | 1/22/20 | 1/23/20 | 1/24/20 | 1/25/20 | 1/26/20
    | String⍰  | String  | Float64 | Float64  | Int64   | Int64   | Int64   | Int64   | Int64
226 | missing  | US      | 37.0902 | -95.7129 | 1       | 1       | 2       | 2       | 5
In [78]:
US_data = convert(Vector, US_data_row[5:end])
Out[78]:
68-element Array{Int64,1}:
      1
      1
      2
      2
      5
      5
      5
      5
      5
      7
      8
      8
     11
      ⋮
   7783
  13677
  19100
  25489
  33276
  43847
  53740
  65778
  83836
 101657
 121478
 140886
In [79]:
using Plots
┌ Info: Precompiling Plots [91a5bcdd-55d7-5caf-9e0b-520d859cae80]
└ @ Base loading.jl:1260
In [80]:
plot(US_data)
Out[80]:
[Plot: US_data against its index; x-axis from 0 to 60, y-axis from 0 to 1.25×10^5; series label "y1"]
In [84]:
col_names = names(data)
Out[84]:
72-element Array{Symbol,1}:
 :province
 :country
 :Lat
 :Long
 Symbol("1/22/20")
 Symbol("1/23/20")
 Symbol("1/24/20")
 Symbol("1/25/20")
 Symbol("1/26/20")
 Symbol("1/27/20")
 Symbol("1/28/20")
 Symbol("1/29/20")
 Symbol("1/30/20")
 ⋮
 Symbol("3/18/20")
 Symbol("3/19/20")
 Symbol("3/20/20")
 Symbol("3/21/20")
 Symbol("3/22/20")
 Symbol("3/23/20")
 Symbol("3/24/20")
 Symbol("3/25/20")
 Symbol("3/26/20")
 Symbol("3/27/20")
 Symbol("3/28/20")
 Symbol("3/29/20")
In [90]:
date_strings = String.(names(data))[5:end];  # apply String function to each element

Parse: convert string representation into a Julia object:

In [91]:
date_strings[1]
Out[91]:
"1/22/20"
In [92]:
using Dates
In [94]:
format = Dates.DateFormat("d/m/Y")
Out[94]:
dateformat"d/m/Y"
In [95]:
parse(Date, date_strings[1], format)
ArgumentError: Month: 22 out of range (1:12)

Stacktrace:
 [1] Date(::Int64, ::Int64, ::Int64) at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.4/Dates/src/types.jl:223
 [2] parse(::Type{Date}, ::String, ::DateFormat{Symbol("d/m/Y"),Tuple{Dates.DatePart{'d'},Dates.Delim{Char,1},Dates.DatePart{'m'},Dates.Delim{Char,1},Dates.DatePart{'Y'}}}) at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.4/Dates/src/parse.jl:285
 [3] top-level scope at In[95]:1
In [96]:
format = Dates.DateFormat("m/d/Y")
Out[96]:
dateformat"m/d/Y"
In [98]:
parse(Date, date_strings[1], format) + Year(2000)
Out[98]:
2020-01-22
In [99]:
dates = parse.(Date, date_strings, format) .+ Year(2000)
Out[99]:
68-element Array{Date,1}:
 2020-01-22
 2020-01-23
 2020-01-24
 2020-01-25
 2020-01-26
 2020-01-27
 2020-01-28
 2020-01-29
 2020-01-30
 2020-01-31
 2020-02-01
 2020-02-02
 2020-02-03
 ⋮
 2020-03-18
 2020-03-19
 2020-03-20
 2020-03-21
 2020-03-22
 2020-03-23
 2020-03-24
 2020-03-25
 2020-03-26
 2020-03-27
 2020-03-28
 2020-03-29
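The parsing steps above can be collected into one small helper. This is only a sketch: the name `parse_us_date` is hypothetical, and adding `Year(2000)` assumes every two-digit year in the data belongs to the 2000s, which holds for this data set.

```julia
using Dates

# Month/day/year, matching strings such as "1/22/20".
us_format = DateFormat("m/d/Y")

# `parse_us_date` is a hypothetical helper name used only in this sketch.
# A two-digit year such as "20" parses as the year 20, so 2000 years are added
# (an assumption: all dates in the data fall in the 2000s).
parse_us_date(s::AbstractString) = parse(Date, s, us_format) + Year(2000)

sample_strings = ["1/22/20", "3/29/20"]
sample_dates = parse_us_date.(sample_strings)   # broadcast over the vector
```

Getting the order of `m` and `d` right matters: with the day-first format the same string fails, as the `ArgumentError` earlier shows.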
In [110]:
plot(dates, US_data, xticks=dates[1:5:end], xrotation=45, leg=:topleft, 
    label="US data", m=:o)

xlabel!("date")
ylabel!("confirmed cases in US")
title!("US confirmed COVID-19 cases")

# annotate!(20, US_data[end], text("US", :blue, :left))
Out[110]:
[Plot: "US confirmed COVID-19 cases"; x-axis: date, ticks every 5 days from 2020-01-22 to 2020-03-27; y-axis: confirmed cases in US, from 0 to 1.25×10^5; series: "US data"]
In [111]:
plot(dates, US_data, xticks=dates[1:5:end], xrotation=45, leg=:topleft, 
    label="US data", m=:o,
    yscale=:log10)

xlabel!("date")
ylabel!("confirmed cases in US")
title!("US confirmed COVID-19 cases")

# annotate!(20, US_data[end], text("US", :blue, :left))
Out[111]:
[Plot: "US confirmed COVID-19 cases"; x-axis: date, ticks every 5 days from 2020-01-22 to 2020-03-27; y-axis: confirmed cases in US, log scale from 10^0 to 10^5; series: "US data"]

Straight line on semi-log scale means exponential growth!
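The reason is that if y = A·r^t, then log10(y) = log10(A) + t·log10(r), which is a straight line in t with slope log10(r). A quick numerical check on synthetic data (not the real case counts; the starting value 50 and the growth factor 1.3 are arbitrary choices for illustration):

```julia
# Synthetic data, not the real case counts: start at 50 and grow by 30% per step.
growth_factor = 1.3
days = 0:20
y = 50 .* growth_factor .^ days

# On a log scale, consecutive points are separated by a constant step,
# which is what a straight line looks like.
log_steps = diff(log10.(y))
is_straight = all(isapprox.(log_steps, log10(growth_factor)))
```

For real data the steps will not be exactly constant, so the semi-log plot is only approximately straight over the period of exponential growth.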

In [114]:
function f(country)
    return country * country
end
Out[114]:
f (generic function with 1 method)
In [115]:
f("US")
Out[115]:
"USUS"

plot! adds a new curve to the same graph
