What this file does

Examines the relative contribution of the four main SVI themes (socioeconomic status; household composition & disability; minority status & language; housing type & transportation) to vaccination rates.

Background reading: https://www.cdc.gov/mmwr/volumes/70/wr/mm7012e1.htm?s_cid=mm7012e1_w#contribAff

Key quote: "State and local jurisdictions should also consider analyzing SVI metrics at the level of the census tract." Main findings: 1) socioeconomic factors had the strongest effect on vaccine uptake; 2) education was associated with the greatest disparity.

My findings are consistent with this, though my methodology differs slightly. I don't think I can take the analysis as far as the report does.

Steps:

  • Join SVI data with vaccination rates
  • Run correlations of vaccination rates with overall SVI and with each theme
  • Run a separate regression for each of the four themes
  • Run a multivariate regression with all four themes
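The two regression steps can be sketched with the statsmodels formula API already imported in the Tools cell. This is a minimal sketch, not the analysis run later in the notebook: it assumes a merged DataFrame containing the `Cov_16Plus` outcome and the `RPL_THEME1` through `RPL_THEME4` columns, and the helper name `fit_theme_models` is hypothetical.

```python
import pandas as pd
from statsmodels.formula.api import ols

# The four SVI theme percentile rankings used as predictors
THEME_COLUMNS = ["RPL_THEME1", "RPL_THEME2", "RPL_THEME3", "RPL_THEME4"]


def fit_theme_models(df: pd.DataFrame, outcome: str = "Cov_16Plus"):
    """Fit one single-predictor OLS model per theme, plus one model with all four."""
    # Drop tracts where any theme ranking carries the -999 suppression code,
    # so every model below is estimated on the same set of tracts
    clean = df[(df[THEME_COLUMNS] != -999).all(axis=1)]
    separate = {
        theme: ols(f"{outcome} ~ {theme}", data=clean).fit() for theme in THEME_COLUMNS
    }
    combined = ols(f"{outcome} ~ " + " + ".join(THEME_COLUMNS), data=clean).fit()
    return separate, combined
```

With the merged frame built in the cells below, `fit_theme_models(df_final_svi_not_suppressed)` would return both sets of fits. Because the themes are correlated with one another, a theme's coefficient in `combined.summary()` can differ noticeably from its single-predictor fit; comparing each `separate[theme].rsquared` with the multivariate coefficients is one way to read "relative contribution".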

Tools

In [1]:
#analysis tools
import pandas as pd
import plotly.express as px
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt

#regression tools
import statsmodels.api as sm
from statsmodels.formula.api import ols
from statsmodels.sandbox.regression.predstd import wls_prediction_std

Read in data

In [2]:
#read in SVI data 
df_ct_svi = pd.read_csv('Connecticut.csv')
In [3]:
#isolate the tract ID plus the SVI percentile rankings: one per theme (RPL_THEME1-4) and the overall ranking (RPL_THEMES)
df_ct_svi_themes = df_ct_svi[['FIPS','RPL_THEME1','RPL_THEME2','RPL_THEME3','RPL_THEME4','RPL_THEMES']]
In [7]:
#CT vaccination data 
#df_vax_ct = pd.read_csv('COVID-19_Vaccinations_by_Census_Tract (5).csv')
df_vax_ct = pd.read_csv('COVP Coverage by Census Tract_  Ages 16 and Up (5).csv')
In [46]:
#check when the vaccination data were last updated (all tracts share one date)
df_vax_ct['DateUpdate'].value_counts()
Out[46]:
5/26/2021    831
Name: DateUpdate, dtype: int64
In [8]:
#the SVI file has one fewer census tract than the vaccination data
len(df_ct_svi_themes),len(df_vax_ct)
Out[8]:
(830, 831)
In [9]:
#inner merge on the tract FIPS code; the unmatched tract is dropped
df_final = df_vax_ct.merge(df_ct_svi_themes, left_on='GEOID10', right_on='FIPS')
len(df_final)
Out[9]:
830
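To see which tract was lost in the merge, one option is an outer merge with pandas' `indicator=True`, which adds a `_merge` column flagging rows found in only one table. A minimal sketch, assuming (as the successful merge above suggests) that `GEOID10` and `FIPS` share a numeric dtype; `unmatched_tracts` is a hypothetical helper name.

```python
import pandas as pd


def unmatched_tracts(df_vax: pd.DataFrame, df_svi: pd.DataFrame) -> pd.DataFrame:
    """Return tract IDs that appear in only one of the two tables."""
    merged = df_vax[["GEOID10"]].merge(
        df_svi[["FIPS"]],
        left_on="GEOID10",
        right_on="FIPS",
        how="outer",
        indicator=True,
    )
    # '_merge' is 'left_only' (vaccination data only), 'right_only' (SVI only) or 'both'
    return merged[merged["_merge"] != "both"]
```

Given the counts above (831 vs 830, with all 830 SVI tracts matched), `unmatched_tracts(df_vax_ct, df_ct_svi_themes)` should come back as a single `left_only` row, assuming tract IDs are unique in each file.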
In [10]:
#drop tracts whose overall SVI ranking is suppressed (coded as -999)
df_final_svi_not_suppressed = df_final[df_final['RPL_THEMES']!=-999]
len(df_final_svi_not_suppressed)
Out[10]:
827
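The filter above only checks the overall ranking. Before fitting per-theme models it may be worth counting the -999 suppression code in every ranking column; a sketch, with `suppressed_counts` as a hypothetical helper:

```python
import pandas as pd

# Per-theme and overall SVI percentile rankings in the SVI file
RANK_COLUMNS = ["RPL_THEME1", "RPL_THEME2", "RPL_THEME3", "RPL_THEME4", "RPL_THEMES"]


def suppressed_counts(df: pd.DataFrame, code: int = -999) -> pd.Series:
    """Count tracts carrying the suppression code in each ranking column."""
    return (df[RANK_COLUMNS] == code).sum()
```

Applied to `df_final`, the `RPL_THEMES` count should be 3 (830 minus 827); non-zero counts in the theme columns would mean those tracts need dropping before the per-theme regressions.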

Descriptive statistics

In [13]:
df_vax_ct.columns
Out[13]:
Index(['OBJECTID', 'GEOID10', 'CTTractID', 'DateUpdate', 'TractTown',
       'Cov_16Plus', 'Cov_16_44', 'Cov_45_64', 'Cov_65Plus', 'needvac_16_plus',
       'needvac_16_44', 'needvac_45_64', 'needvac_65_plus'],
      dtype='object')
In [14]:
#median tract coverage for ages 16+ (%); by definition, half of the tracts fall below this value
# df_vax_ct['Sixteen_plus'].hist()
# df_vax_ct['Sixteen_plus'].median()

df_vax_ct['Cov_16Plus'].hist()
df_vax_ct['Cov_16Plus'].median()
Out[14]:
65.98694943
In [15]:
#coverage is much higher for ages 65+ alone (median ~86% vs ~66% for 16+)
df_vax_ct['Cov_65Plus'].hist()
df_vax_ct['Cov_65Plus'].median()
Out[15]:
86.4142539
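Comparing medians across age bands does not show whether the gap holds within individual tracts. A sketch that reports the median for each band together with the share of tracts where 65+ coverage exceeds 16-44 coverage, assuming the `Cov_*` columns listed in `df_vax_ct.columns` are percentages; `age_band_comparison` is a hypothetical name.

```python
import pandas as pd

# Age-band coverage columns in the vaccination file (assumed to be percentages)
AGE_BANDS = ["Cov_16_44", "Cov_45_64", "Cov_65Plus"]


def age_band_comparison(df: pd.DataFrame) -> dict:
    """Median coverage per age band, plus share of tracts where 65+ beats 16-44."""
    medians = df[AGE_BANDS].median()
    # Compare within each tract, skipping tracts missing either value
    pair = df[["Cov_65Plus", "Cov_16_44"]].dropna()
    share_above = (pair["Cov_65Plus"] > pair["Cov_16_44"]).mean()
    return {"medians": medians, "share_65_above_16_44": share_above}
```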
In [16]:
#SVI index value distribution
df_final_svi_not_suppressed['RPL_THEMES'].hist()
Out[16]:
<AxesSubplot:>
In [17]:
#Theme 1: socioeconomic status
df_final_svi_not_suppressed['RPL_THEME1'].hist()
df_final_svi_not_suppressed['RPL_THEME1'].median()
Out[17]:
0.5
In [18]:
#Theme 3: minority status & language (English-speaking ability)
df_final_svi_not_suppressed['RPL_THEME3'].hist()
df_final_svi_not_suppressed['RPL_THEME3'].median()
Out[18]:
0.4994

Correlation analyses

  1. Overall SVI
In [19]:
#strong negative correlation with overall SVI (r ≈ -0.60): more vulnerable tracts have lower vaxx rates
# df_final_svi_not_suppressed[df_final_svi_not_suppressed['RPL_THEMES']!=-999][['Sixteen_plus','RPL_THEMES']].corr()
#suppressed (-999) tracts were already dropped above, so no extra filter is needed here
df_final_svi_not_suppressed[['Cov_16Plus','RPL_THEMES']].corr()
Out[19]:
            Cov_16Plus  RPL_THEMES
Cov_16Plus    1.000000   -0.598524
RPL_THEMES   -0.598524    1.000000
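The same correlation can be repeated for each theme. Because the `RPL_` columns are percentile rankings, a rank-based Spearman coefficient is a natural companion to Pearson's r. A sketch assuming the merged frame's column names; `theme_correlations` is hypothetical, and the theme labels follow the CDC SVI documentation.

```python
import pandas as pd

# SVI ranking columns and the theme each one summarizes
SVI_COLUMNS = {
    "RPL_THEMES": "Overall SVI",
    "RPL_THEME1": "Socioeconomic status",
    "RPL_THEME2": "Household composition & disability",
    "RPL_THEME3": "Minority status & language",
    "RPL_THEME4": "Housing type & transportation",
}


def theme_correlations(df: pd.DataFrame, outcome: str = "Cov_16Plus") -> pd.DataFrame:
    """Pearson and Spearman correlations between coverage and each SVI ranking."""
    rows = []
    for col, label in SVI_COLUMNS.items():
        # Exclude tracts where this particular ranking is suppressed
        sub = df.loc[df[col] != -999, [outcome, col]].dropna()
        rows.append({
            "theme": label,
            "n": len(sub),
            "pearson_r": sub[outcome].corr(sub[col]),
            # Spearman's rho is the Pearson correlation of the ranks
            "spearman_rho": sub[outcome].rank().corr(sub[col].rank()),
        })
    return pd.DataFrame(rows).set_index("theme")
```

Calling `theme_correlations(df_final)` handles suppression per column, so each row's `n` shows how many tracts fed that coefficient.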
In [20]:
#plot to confirm
fig = px.scatter(df_final_svi_not_suppressed, x="RPL_THEMES", y="Cov_16Plus")
fig.show()
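For a simple linear regression, R² equals the square of Pearson's r, so the correlation of -0.60 above means overall SVI alone accounts for roughly 36% of the tract-to-tract variance in 16+ coverage (0.5985² ≈ 0.358). The sketch below quantifies the scatter with an OLS fit and predicted coverage, with 95% confidence intervals, at a few SVI levels; `fit_overall_svi` and the chosen SVI levels are illustrative assumptions.

```python
import pandas as pd
from statsmodels.formula.api import ols


def fit_overall_svi(df: pd.DataFrame, outcome: str = "Cov_16Plus"):
    """Simple OLS of tract coverage on the overall SVI percentile ranking."""
    clean = df[df["RPL_THEMES"] != -999]
    model = ols(f"{outcome} ~ RPL_THEMES", data=clean).fit()
    # Predicted mean coverage, with 95% confidence intervals, at three SVI levels
    grid = pd.DataFrame({"RPL_THEMES": [0.25, 0.5, 0.75]})
    preds = model.get_prediction(grid).summary_frame(alpha=0.05)
    return model, preds
```

The slope reads directly as the percentage-point difference in coverage between the least (0) and most (1) vulnerable ranking, which is easier to communicate than r.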