Monitoring The Progress of the COVID-19 Pandemic

A Julia Data Analysis Tutorial

By Daniel Lakeland

Lakeland Applied Sciences LLC

As a mathematical modeler and data analyst many of my friends have asked me questions about what is going on with the COVID pandemic. On my blog I've posted some graphs as PDFs with updates every few weeks, but it's more convenient to give my friends and family an executable Jupyter notebook where they can update to the latest data any time they want. Let's get started by grabbing the daily data for all states from the Covid Tracking Project

We'll use the Vegalite graphics library this time with the @vlplot macro.

In [1]:
using Queryverse
In [2]:
usdata = "https://covidtracking.com/api/v1/states/daily.csv"


uspopurl = "https://www2.census.gov/programs-surveys/popest/datasets/2010-2019/counties/totals/co-est2019-alldata.csv"

statecodesurl ="https://www2.census.gov/geo/docs/reference/state.txt?#"

uspop = load(uspopurl) |> @filter(_.COUNTY == 0) |> @select(:STATE,:STNAME,:POPESTIMATE2019) |> DataFrame


statecodes = load(File(format"CSV",statecodesurl), delim='|') |> DataFrame



dat = DataFrame(load(usdata))


dat = join(dat,statecodes, on = :state => :STUSAB,kind=:left)

dat = join(dat,uspop, on = :STATE_NAME => :STNAME, kind=:left, makeunique=true)


display(first(dat,5))
display(first(uspop))


display(first(dat,5))
display(first(uspop))

5 rows × 44 columns (omitted printing of 37 columns)

datestatepositivenegativependinghospitalizedCurrentlyhospitalizedCumulative
Int64StringInt64⍰Int64⍰Int64⍰Int64⍰Int64⍰
120200707AK1184130236missing25missing
220200707AL45785415579missing10732961
320200707AR24512326143missing3691604
420200707AS0696missingmissingmissing
520200707AZ105094523703missing33565272

DataFrameRow (3 columns)

STATESTNAMEPOPESTIMATE2019
Int64StringInt64
11Alabama4903185

5 rows × 44 columns (omitted printing of 37 columns)

datestatepositivenegativependinghospitalizedCurrentlyhospitalizedCumulative
Int64StringInt64⍰Int64⍰Int64⍰Int64⍰Int64⍰
120200707AK1184130236missing25missing
220200707AL45785415579missing10732961
320200707AR24512326143missing3691604
420200707AS0696missingmissingmissing
520200707AZ105094523703missing33565272

DataFrameRow (3 columns)

STATESTNAMEPOPESTIMATE2019
Int64StringInt64
11Alabama4903185

Understanding the Data

The Covid Tracking Project aggregates data from all the states on various important measures. For the moment let's focus on daily positive tests, the ratio of total tests to positive tests, the number of hospitalized patients, and the number of deaths per day.

Let's figure out which columns those correspond to:

In [3]:
println(names(dat))

using Dates

function convdate(d::Int)
    return(Date(div(d,10000),div(mod(d,10000),100),mod(d,100)));
end

dat2 = dat |> @mutate(thedate=convdate(_.date)) |> DataFrame

allstates = unique(dat.state);
[:date, :state, :positive, :negative, :pending, :hospitalizedCurrently, :hospitalizedCumulative, :inIcuCurrently, :inIcuCumulative, :onVentilatorCurrently, :onVentilatorCumulative, :recovered, :dataQualityGrade, :lastUpdateEt, :dateModified, :checkTimeEt, :death, :hospitalized, :dateChecked, :totalTestsViral, :positiveTestsViral, :negativeTestsViral, :positiveCasesViral, :fips, :positiveIncrease, :negativeIncrease, :total, :totalTestResults, :totalTestResultsIncrease, :posNeg, :deathIncrease, :hospitalizedIncrease, :hash, :commercialScore, :negativeRegularScore, :negativeScore, :positiveScore, :score, :grade, :STATE, :STATE_NAME, :STATENS, :STATE_1, :POPESTIMATE2019]

Let's create a function that plots percentage positive, positive per day, and deaths per day for a given state...

In [4]:
function plotstate(df,state)
    dfstate = df |> @filter(_.state == state) |> 
        @mutate(testpct=_.positiveIncrease/(_.totalTestResultsIncrease+.1)) |>
        @select(:thedate,:state,:testpct,:positiveIncrease,:deathIncrease)|>DataFrame

    testing = @vlplot(width=300,layer=[],title="Testing in $state") +
        @vlplot(:point,x=:thedate,y={:testpct,axis={title="Percentage Positive"}}) + 
        @vlplot(transform=[{loess=:testpct,on=:thedate}],
                mark=:line,x=:thedate,y=:testpct)
    
    cases = @vlplot(width=300,layer=[],title="Cases in $state") + 
        @vlplot(mark={:point,filled=true},
                x=:thedate,y={:positiveIncrease,axis={title="Cases Per Day"}})+
    @vlplot(transform=[{loess=:positiveIncrease,on=:thedate,bandwidth=.2}],
            mark=:line,x=:thedate,y=:positiveIncrease)
    
    deaths = @vlplot(width=300,layer=[],title="Deaths in $state") + 
        @vlplot(:point,x=:thedate,y={:deathIncrease,axis={title="Deaths Per Day"}}) + 
        @vlplot(transform=[{loess=:deathIncrease,on=:thedate,bandwidth=.2}],
                mark=:line,x=:thedate,y=:deathIncrease)
    
    return(dfstate |> hcat(testing,cases,deaths))
    
end

#test output
#plotstate(dat2,"CA")
Out[4]:
plotstate (generic function with 1 method)

Plotting All The States:

In [5]:
for i in allstates
    display(plotstate(dat2,i))
end
AprilMayJuneJulythedate0.00.20.40.60.81.0Percentage PositiveTesting in AKAprilMayJuneJulythedate01020304050Cases Per DayCases in AKAprilMayJuneJulythedate0.00.51.01.52.0Deaths Per DayDeaths in AK
AprilMayJuneJulythedate0.00.20.40.60.81.0Percentage PositiveTesting in ALAprilMayJuneJulythedate05001,0001,500Cases Per DayCases in ALAprilMayJuneJulythedate0510152025Deaths Per DayDeaths in AL
AprilMayJuneJulythedate0.00.20.40.60.81.0Percentage PositiveTesting in ARAprilMayJuneJulythedate02004006008001,000Cases Per DayCases in ARAprilMayJuneJulythedate051015Deaths Per DayDeaths in AR
Mar 22Mar 29Apr 05Apr 12Apr 19Apr 26May 03May 10May 17May 24May 31Jun 07Jun 14Jun 21Jun 28Jul 05thedate0Percentage PositiveTesting in ASMar 22Mar 29Apr 05Apr 12Apr 19Apr 26May 03May 10May 17May 24May 31Jun 07Jun 14Jun 21Jun 28Jul 05thedate0Cases Per DayCases in ASMar 22Mar 29Apr 05Apr 12Apr 19Apr 26May 03May 10May 17May 24May 31Jun 07Jun 14Jun 21Jun 28Jul 05thedate0Deaths Per DayDeaths in AS
AprilMayJuneJulythedate0.00.20.40.60.81.0Percentage PositiveTesting in AZAprilMayJuneJulythedate01,0002,0003,0004,0005,000Cases Per DayCases in AZAprilMayJuneJulythedate020406080100120Deaths Per DayDeaths in AZ
AprilMayJuneJulythedate0.00.20.40.60.81.0Percentage PositiveTesting in CAAprilMayJuneJulythedate02,0004,0006,0008,00010,00012,000Cases Per DayCases in CAAprilMayJuneJulythedate020406080100120Deaths Per DayDeaths in CA
AprilMayJuneJulythedate0.00.20.40.60.8Percentage PositiveTesting in COAprilMayJuneJulythedate02004006008001,000Cases Per DayCases in COAprilMayJuneJulythedate-200-1000100Deaths Per DayDeaths in CO
AprilMayJuneJulythedate0.00.20.40.60.81.0Percentage Positive