# What this file does

Examines the relative contribution of the four main SVI themes (socioeconomic status; household composition and disability; minority status and language; housing type and transportation) to vaccination rates.

Background reading: https://www.cdc.gov/mmwr/volumes/70/wr/mm7012e1.htm?s_cid=mm7012e1_w#contribAff

The report recommends: "State and local jurisdictions should also consider analyzing SVI metrics at the level of the census tract." Main findings: 1) socioeconomic factors have the strongest effect on vaccine uptake; 2) education was associated with the greatest disparity.

My findings are consistent with this, though my methodology differs slightly. I don't think I can really go this far with the analysis.

Steps:

• Join SVI with vaxx rate
• Run correlations with overall SVI, themes
• Run regression with four themes separately
• Run multivariate regression with all four
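
The steps above can be sketched end to end on synthetic data. This is only an illustration under stated assumptions: the numbers are randomly generated, the overall index is a crude mean of the four themes, and `fit_ols` is a hypothetical helper that uses `numpy.linalg.lstsq` as a stand-in for the `statsmodels` regressions this notebook imports.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data; every value here is invented for illustration.
rng = np.random.default_rng(0)
n = 200
themes = ['RPL_THEME1', 'RPL_THEME2', 'RPL_THEME3', 'RPL_THEME4']

svi = pd.DataFrame(rng.uniform(0, 1, size=(n, 4)), columns=themes)
svi['RPL_THEMES'] = svi[themes].mean(axis=1)  # assumption: crude overall index
svi['FIPS'] = np.arange(n)

# Assumption: coverage falls mainly with theme 1, a little with theme 3, plus noise
coverage = 80 - 20 * svi['RPL_THEME1'] - 5 * svi['RPL_THEME3'] + rng.normal(0, 3, n)
vax = pd.DataFrame({'GEOID10': np.arange(n), 'Cov_16Plus': coverage})

# Step 1: join SVI with vaccination rate
df = vax.merge(svi, left_on='GEOID10', right_on='FIPS')

# Step 2: correlation of coverage with the overall index and each theme
corrs = df[['Cov_16Plus', 'RPL_THEMES'] + themes].corr()['Cov_16Plus']

def fit_ols(frame, predictors, target='Cov_16Plus'):
    """Least-squares fit with an intercept; returns (coefficients, R^2)."""
    X = np.column_stack([np.ones(len(frame)), frame[predictors].to_numpy()])
    y = frame[target].to_numpy()
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    centered = y - y.mean()
    r2 = 1 - (resid @ resid) / (centered @ centered)
    return pd.Series(beta, index=['Intercept'] + predictors), r2

# Step 3: one regression per theme
single = {t: fit_ols(df, [t]) for t in themes}

# Step 4: one regression with all four themes together
coefs_all, r2_all = fit_ols(df, themes)

print(corrs.round(3))
print(coefs_all.round(2), round(r2_all, 3))
```

Comparing each single-theme R² in `single` against `r2_all` gives a rough sense of how much each theme contributes on its own versus jointly.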

# Tools

In [1]:
#analysis tools
import pandas as pd
import plotly.express as px
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt

#regression tools
import statsmodels.api as sm
from statsmodels.formula.api import ols
from statsmodels.sandbox.regression.predstd import wls_prediction_std


In [2]:
#read in SVI data

In [3]:
#isolate SVI theme variables
df_ct_svi_themes = df_ct_svi[['FIPS','RPL_THEME1','RPL_THEME2','RPL_THEME3','RPL_THEME4','RPL_THEMES']]

In [7]:
#CT vaccination data
df_vax_ct = pd.read_csv('COVP Coverage by Census Tract_  Ages 16 and Up (5).csv')

In [46]:
#when was the vaccination data last updated? all 831 rows share one date
df_vax_ct['DateUpdate'].value_counts()

Out[46]:
5/26/2021    831
Name: DateUpdate, dtype: int64

In [8]:
#the SVI data has one fewer Census tract (830) than the vaccination data (831)
len(df_ct_svi_themes),len(df_vax_ct)

Out[8]:
(830, 831)
In [9]:
#prepped data after merge
df_final = df_vax_ct.merge(df_ct_svi_themes, left_on='GEOID10', right_on='FIPS')
len(df_final)

Out[9]:
830
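
The row counts above (830 vs. 831) show that the default inner join silently drops one tract. A minimal sketch of how the unmatched tract could be identified, using `how='outer'` and `indicator=True` in `merge`; the toy frames and tract IDs below are made up and only stand in for `df_vax_ct` and `df_ct_svi_themes`.

```python
import pandas as pd

# Toy stand-ins for df_vax_ct and df_ct_svi_themes; tract IDs are invented.
vax = pd.DataFrame({'GEOID10': [1, 2, 3, 4],
                    'Cov_16Plus': [60.0, 70.0, 65.0, 80.0]})
svi = pd.DataFrame({'FIPS': [1, 2, 3],
                    'RPL_THEMES': [0.2, 0.5, 0.9]})

# An outer join keeps unmatched rows; indicator=True adds a '_merge' column
# marking each row as 'left_only', 'right_only', or 'both'.
check = vax.merge(svi, left_on='GEOID10', right_on='FIPS',
                  how='outer', indicator=True)

# Rows that exist in only one of the two sources
unmatched = check[check['_merge'] != 'both']
print(unmatched[['GEOID10', 'FIPS', '_merge']])
```

On the real data the same pattern would list the tract present in the vaccination file but absent from the SVI file.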
In [10]:
#removing values for which the overall SVI index is suppressed
df_final_svi_not_suppressed = df_final[df_final['RPL_THEMES']!=-999]
len(df_final_svi_not_suppressed)

Out[10]:
827
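
The filter above only checks `RPL_THEMES`. Since `-999` is the SVI placeholder for a missing value, an individual theme column could in principle still contain it even when the overall index is present. I have not checked whether that happens in the Connecticut file, so the following is only a sketch on made-up rows of how the sentinel could be converted to `NaN` across all theme columns before any regression.

```python
import numpy as np
import pandas as pd

theme_cols = ['RPL_THEME1', 'RPL_THEME2', 'RPL_THEME3', 'RPL_THEME4', 'RPL_THEMES']

# Toy rows with invented values: row 2 has one suppressed theme but a valid
# overall index; row 3 is suppressed everywhere.
toy = pd.DataFrame({
    'FIPS': [1, 2, 3],
    'RPL_THEME1': [0.10, 0.40, -999],
    'RPL_THEME2': [0.20, -999, -999],
    'RPL_THEME3': [0.30, 0.60, -999],
    'RPL_THEME4': [0.25, 0.55, -999],
    'RPL_THEMES': [0.20, 0.50, -999],
})

# Filtering on the overall index alone keeps row 2, sentinel and all
overall_only = toy[toy['RPL_THEMES'] != -999]

# Replace the sentinel with NaN in every theme column, then drop incomplete rows
cleaned = toy.copy()
cleaned[theme_cols] = cleaned[theme_cols].replace(-999, np.nan)
complete = cleaned.dropna(subset=theme_cols)

print(len(overall_only), len(complete))
```

Using `NaN` also keeps the sentinel out of medians, histograms, and correlations, where a stray `-999` would badly distort the result.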

# Descriptive statistics

In [13]:
df_vax_ct.columns

Out[13]:
Index(['OBJECTID', 'GEOID10', 'CTTractID', 'DateUpdate', 'TractTown',
'Cov_16Plus', 'Cov_16_44', 'Cov_45_64', 'Cov_65Plus', 'needvac_16_plus',
'needvac_16_44', 'needvac_45_64', 'needvac_65_plus'],
dtype='object')
In [14]:
#50% of Census tracts under the statewide coverage rate
# df_vax_ct['Sixteen_plus'].hist()
# df_vax_ct['Sixteen_plus'].median()

df_vax_ct['Cov_16Plus'].hist()
df_vax_ct['Cov_16Plus'].median()

Out[14]:
65.98694943
In [15]:
#much higher covg overall with just 65+
df_vax_ct['Cov_65Plus'].hist()
df_vax_ct['Cov_65Plus'].median()

Out[15]:
86.4142539
In [16]:
#SVI index value distribution
df_final_svi_not_suppressed['RPL_THEMES'].hist()

Out[16]:
<AxesSubplot:>
In [17]:
#socioeconomic factors
df_final_svi_not_suppressed['RPL_THEME1'].hist()
df_final_svi_not_suppressed['RPL_THEME1'].median()

Out[17]:
0.5
In [18]:
#minority and english language ability
df_final_svi_not_suppressed['RPL_THEME3'].hist()
df_final_svi_not_suppressed['RPL_THEME3'].median()

Out[18]:
0.4994

# Correlation analyses

1. Overall SVI
In [19]:
#fairly strong negative correlation (r = -0.60) with overall SVI: more vulnerable tracts have lower vaxx rates
#suppressed (-999) values were already removed in In [10], so no extra filter is needed here
df_final_svi_not_suppressed[['Cov_16Plus','RPL_THEMES']].corr()

Out[19]:
            Cov_16Plus  RPL_THEMES
Cov_16Plus    1.000000   -0.598524
RPL_THEMES   -0.598524    1.000000
In [20]:
#plot to confirm
fig = px.scatter(df_final_svi_not_suppressed, x="RPL_THEMES", y="Cov_16Plus")
fig.show()
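
To put the correlation from Out[19] in context: for a regression with a single predictor, R² equals the square of Pearson's r, so r = -0.598524 corresponds to roughly 36% of the variance in `Cov_16Plus` being associated with the overall SVI. The sketch below computes that figure and checks the identity on synthetic data; the synthetic `x` and `y` are invented and are not the Connecticut data.

```python
import numpy as np

r = -0.598524        # Pearson r between Cov_16Plus and RPL_THEMES, from Out[19]
r_squared = r ** 2   # share of variance a one-predictor linear fit would explain
print(round(r_squared, 3))  # 0.358

# Check the identity R^2 == r^2 on made-up data
rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 500)
y = 75 - 20 * x + rng.normal(0, 8, 500)

slope, intercept = np.polyfit(x, y, 1)   # simple least-squares line
resid = y - (slope * x + intercept)
r2_fit = 1 - resid.var() / y.var()       # R^2 of the fitted line
r_toy = np.corrcoef(x, y)[0, 1]          # Pearson r of the same data

print(round(r2_fit, 6), round(r_toy ** 2, 6))
```

This identity only holds for one predictor; with all four themes in the model, R² has to be read from the regression output rather than from pairwise correlations.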