In this demo, we will explore how world development indicators relate to a country's early COVID-19 response efforts.
To enable Lux, simply add import lux along with your Pandas import statement.
import lux
import pandas as pd
Lux preserves the Pandas dataframe semantics, which means that you can apply any command from the Pandas API to dataframes in Lux and expect the same behavior. For example, we can load the Happy Planet Index (HPI) dataset via the standard Pandas read_csv command.
df = pd.read_csv("https://github.com/lux-org/lux-datasets/blob/master/data/hpi_full.csv?raw=True")
We can quickly get an overview of the dataframe simply by printing out the dataframe df.
df
From the Pandas table view, we see that the dataframe contains country-level data on sustainability and well-being. By clicking on the Toggle button, you can explore the data visually through Lux. You should see several tabs of recommended visualizations, including scatterplots, bar charts, and maps. Lux recommends visualizations that may be relevant or interesting to you across different actions, which are displayed as different tabs.
Lux is designed to be tightly integrated with Pandas and can be used as-is, without modifying your existing Pandas code. This means that you can seamlessly transition from data cleaning and transformation to visualizing your dataframes. The goal of this section is largely to demonstrate how Lux can help you visualize your dataframe in a realistic scenario that involves a lot of complex data cleaning and transformation.
Note: If you're short on time, you can quickly execute these cells and skip to the next section.
# We add an additional feature column, describing whether the country is one of the G10 nations
df["G10"] = df["Country"].isin(["Belgium","Canada","France","Germany","Italy","Japan","Netherlands","United Kingdom","Switzerland","Sweden","United States of America"])
We drop the inequality-adjusted measures, since they are clearly correlated with each other, and also drop HPI Rank, keeping only Happy Planet Index.
df = df[df.columns.drop(list(df.filter(regex='IneqAdj'))+["HPIRank"])]
df
Now after dropping these columns, the correlations are a bit more realistic.
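The column-dropping step above can be tried on a small toy dataframe. This is only a sketch: the column names below are illustrative stand-ins and are not guaranteed to match the real HPI schema.

```python
import pandas as pd

# Toy frame with made-up values; column names are assumptions for illustration.
toy = pd.DataFrame({
    "Country": ["A", "B"],
    "HPIRank": [1, 2],
    "HappyPlanetIndex": [40.1, 35.2],
    "IneqAdjLifeExp": [70.0, 60.0],
    "IneqAdjWellbeing": [6.1, 5.0],
})

# filter(regex=...) keeps only the columns whose names match the pattern,
# so its column index is the list of names we want to remove.
ineq_cols = list(toy.filter(regex="IneqAdj").columns)

# Drop the matched columns plus the rank column in one call.
slim = toy.drop(columns=ineq_cols + ["HPIRank"])
print(list(slim.columns))
```

Using drop(columns=...) is an alternative spelling of the same operation; it leaves the original frame untouched and returns a new one.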
The Country column needs to be mapped to a code that is easier to work with later on, so we load a countries dataset that contains the ISO-3 country code and information such as currency, language, and geography.
countries = pd.read_csv("https://github.com/lux-org/lux-datasets/blob/master/data/countries.csv?raw=True")
countries["Country"]=countries["name"].apply(lambda x:x.split(",")[0])
countries.loc[countries["Country"]=='United States',"Country"] = 'United States of America'
countries
# The countries dataset has some additional columns that we can turn into features
countries["landlocked"] = countries["landlocked"].fillna("False").replace(1,"True")
countries["NumOfficialLanguages"]=countries.languages.str.count(",")+1
countries["NumBorderingCountries"]=countries.borders.str.count(",")+1
countries["NumBorderingCountries"]=countries["NumBorderingCountries"].fillna(0)
countries = countries[['Country','cca3', 'landlocked', "NumOfficialLanguages", "NumBorderingCountries",'area']]
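The "count the commas and add one" trick used above for languages and borders can be checked on toy data. The sketch below assumes, as the cell above appears to, that the values are comma-separated strings and that a missing value means zero items; the frame and its values are made up.

```python
import pandas as pd

# Toy data: comma-separated codes, with one missing entry.
toy = pd.DataFrame({
    "name": ["X", "Y", "Z"],
    "borders": ["AAA,BBB,CCC", "AAA", None],
})

# Number of items = number of commas + 1; missing entries stay missing here.
toy["NumBorders"] = toy["borders"].str.count(",") + 1

# Treat a missing list as zero neighbours, then store the result as integers.
toy["NumBorders"] = toy["NumBorders"].fillna(0).astype(int)
print(toy["NumBorders"].tolist())
```

One caveat of this approach: an empty string (as opposed to a missing value) would be counted as one item, so it is worth checking how the source encodes "no borders".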
# Combining the HPI information to get ISO-3 code
df = df.merge(countries)
df = df.rename(index=str, columns={"SubRegion":"Region","subregion":"SubRegion"})
df["Region"] = df.Region.replace("Middle East and North Africa","Middle East")
df.area = df.area.astype(int)
# Ensure well-formatted country names based on: https://github.com/deactivated/python-iso3166/blob/master/iso3166/__init__.py
df.loc[df.Country=="Russia","Country"]="Russian Federation"
df.loc[df["Country"]=="Czech Republic","Country"]="Czechia"
df.loc[df.Country=="DR Congo","Country"]="Congo, Democratic Republic of the"#not working?
df.loc[df.Country=="Bolivia","Country"]="Bolivia, Plurinational State of"
df.loc[df["Country"]=="Cote d'Ivoire","Country"]="Côte d'Ivoire"
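Because the merge above uses the default inner join on the shared column, rows whose country names differ between the two tables are dropped silently. A hedged way to spot such mismatches is a left merge with indicator=True, sketched here on toy data (the scores are made up, and the frames are stand-ins for the real ones):

```python
import pandas as pd

# Toy stand-ins for the HPI table and the countries table.
left = pd.DataFrame({
    "Country": ["United States of America", "France", "DR Congo"],
    "score": [1.0, 2.0, 3.0],
})
right = pd.DataFrame({
    "Country": ["United States", "France", "DR Congo"],
    "cca3": ["USA", "FRA", "COD"],
})

# how="left" keeps every row of `left`; indicator=True adds a `_merge`
# column saying whether a match was found in `right`.
checked = left.merge(right, on="Country", how="left", indicator=True)

# Names that found no partner would vanish under an inner join.
unmatched = checked.loc[checked["_merge"] == "left_only", "Country"].tolist()
print(unmatched)
```

Running a check like this before the real merge shows which names need a manual fix, such as the United States rename done earlier.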
After all this data cleaning, we print out the combined dataframe again to look at the visualizations and patterns in the dataset.
df
By inspecting the Correlation tab, we learn that there is a negative correlation between AvrgLifeExpectancy and Inequality. In other words, countries with higher levels of inequality also have a lower average life expectancy. We can also look at other tabs, which show the Distribution of quantitative attributes and the Occurrence of categorical attributes.
Let's say that we want to investigate whether any country-level characteristics explain the observed negative correlation between inequality and life expectancy. Beyond the basic recommendations, you can further specify your analysis intent, i.e., the data attributes and values that you are interested in visualizing.
We can do this by specifying our analysis intent to Lux via df.intent:
df.intent = ["Inequality","AvrgLifeExpectancy"]
Upon printing the dataframe again, Lux leverages the analysis intent to steer the recommendations towards what the user might be interested in.
df
By looking at the colored scatterplots in the Enhance tab, we find that most G10 industrialized countries are in the upper-left quadrant of the scatterplot (low inequality, high life expectancy). In the breakdown by Region, we observe that countries in Sub-Saharan Africa (yellow points) tend to be in the bottom right, with lower life expectancy and higher inequality.
ℹ️ Check out this tutorial to learn more about how to specify intent in Lux.
Continuing our analysis, we are now interested in how these country-level metrics relate to a country's COVID intervention strategy and response. We download the COVID pandemic policy dataset.
covid = pd.read_csv("https://github.com/lux-org/lux-datasets/blob/master/data/covid-stringency.csv?raw=True")
covid= covid.rename(columns={"stringency_index":"stringency"})
covid['Day'] = pd.to_datetime(covid['Day'], format="%Y-%m-%d")
When we print out the column Day, we see that the records track stringency daily from January 2020 to March 2021. We also see the temporal patterns across year, month, and day of the week.
covid["Day"]
Our COVID dataset contains a column stringency, which is a number from 0 to 100, with 100 being the highest level of response (i.e., enacting measures such as travel bans, stay-at-home orders, and school closures).
We want to plot the distribution of stringency by creating a Vis object in Lux.
To generate a Vis, users should specify their intent (i.e., which columns or values to plot, in this case ['stringency']) and a source dataframe (in this case covid).
from lux.vis.Vis import Vis
Vis(["stringency"],covid)
When we display the Vis, we see that stringency is mostly at medium to high levels, with the distribution peaking at a stringency of 60-80.
We are only interested in the records on March 11, 2020, the day the WHO declared COVID-19 a pandemic. By filtering to the records on this day only, the stringency score becomes a proxy for the strictness of a country's early intervention efforts.
early = covid[covid["Day"]=="2020-03-11"]
Vis(["stringency"],early)
Somewhat interestingly, we see that during this early date, the stringency is heavily right-skewed, suggesting that most countries didn't enact strict measures in the early days of the pandemic.
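The single-day filter above can be sketched on toy data. The frame below is made up; it only mirrors the Entity, Day, and stringency column names used in this demo, and it compares against an explicit pd.Timestamp rather than a string.

```python
import pandas as pd

# Toy records: two entities observed on two consecutive days (values invented).
toy = pd.DataFrame({
    "Entity": ["A", "A", "B", "B"],
    "Day": ["2020-03-10", "2020-03-11", "2020-03-10", "2020-03-11"],
    "stringency": [10.0, 20.0, 5.0, 50.0],
})

# Parse the strings into datetime64 values so that date comparisons are exact.
toy["Day"] = pd.to_datetime(toy["Day"], format="%Y-%m-%d")

# Keep one row per entity: the snapshot on the chosen day.
snapshot = toy[toy["Day"] == pd.Timestamp("2020-03-11")]
print(snapshot)
```

Since each entity contributes at most one row per day, the snapshot has one stringency value per entity, which is what makes it usable as a per-country measure later on.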
Lux is built on the principle that users should always be able to visualize and explore anything they specify, without having to think about what the visualization should look like. The programmatic generation of Vis provides a quick and dirty way to ask specific questions about the dataframe without having to write a lot of code.
ℹ️ In addition to Vis, you can also specify a list of visualizations to browse through via a VisList. Check out this tutorial to learn more about how to create Vis and VisList in Lux.
We now join the countries dataframe df with the filtered early COVID dataframe:
result = early.merge(df,left_on=["Entity","Code"],right_on=["Country","cca3"])
result.intent = ["stringency"]
result
When we set the intent as stringency, we see that China and Italy have the strictest measures (corresponding to dark blue on the geo map, among a sea of light yellow and green).
We want to discern these country-level differences further, so we convert the stringency index into a categorical variable stringency_level. We use pd.qcut to ensure that there is an equal number of records in the Low and High bins.
result["stringency_level"] = pd.qcut(result["stringency"],2,labels=["Low","High"])
result = result.drop(columns=["stringency"])
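To see what pd.qcut does in the step above, here is a minimal sketch on a made-up series of scores. With two quantile bins, the cut point is the median, so each label receives half of the records.

```python
import pandas as pd

# Six invented scores; the median (30) separates the two bins.
scores = pd.Series([5, 10, 20, 40, 60, 80])

# Two quantile-based bins, labelled from lowest to highest.
levels = pd.qcut(scores, 2, labels=["Low", "High"])

# Count how many records fall under each label.
counts = levels.value_counts()
print(levels.tolist())
print(counts)
```

Note that the bins are only exactly balanced when no values tie at the cut point; with many repeated scores the two groups can differ in size.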
With the modified dataframe, we revisit the negative correlation that we observed previously by setting the intent as average life expectancy and inequality again. The result is similar to what we saw before, with one visualization showing the breakdown by stringency_level.
result.intent = ["Inequality","AvrgLifeExpectancy"]
result
We see a strong separation showing how stricter countries (blue) corresponded to countries with higher life expectancy and lower levels of inequality. This visualization indicates that these countries could possibly have a more well-developed public health infrastructure that promoted the early pandemic response. However, we observe three outliers that seem to defy this trend.
When we filter to these dataframe records, we find that these countries correspond to Afghanistan, Pakistan, and Rwanda, countries that were praised for their early pandemic response despite limited resources.
result[(result["Inequality"]>0.35)&(result["stringency_level"]=="High")]
To download this visualization insight and share it with others, we can click on the visualization in the Lux view above and then click the export button.
result
This exports the visualization from the widget to a Vis object. We can access the exported Vis object via the exported property and print it as code.
result.exported
print(result.exported[0].to_code("altair"))
We can copy-and-paste the output Altair code into a separate cell. Then let's tweak the plotting code a bit before sharing this insight with our colleagues.
import altair as alt
c = "#e7298a"
chart = alt.Chart(result,title="Check out this cool insight!").mark_circle().encode(
x=alt.X('Inequality',scale=alt.Scale(domain=(0.04, 0.51)),type='quantitative', axis=alt.Axis(title='Inequality')),
y=alt.Y('AvrgLifeExpectancy',scale=alt.Scale(domain=(48.9, 83.6)),type='quantitative', axis=alt.Axis(title='AvrgLifeExpectancy'))
)
highlight = result[(result["Inequality"]>0.35)&(result["stringency_level"]=="High")]
hchart = alt.Chart(highlight).mark_point(color=c,size=50,shape="cross").encode(
x=alt.X('Inequality',scale=alt.Scale(domain=(0.04, 0.51)),type='quantitative', axis=alt.Axis(title='Inequality')),
y=alt.Y('AvrgLifeExpectancy',scale=alt.Scale(domain=(48.9, 83.6)),type='quantitative', axis=alt.Axis(title='AvrgLifeExpectancy')),
)
text = alt.Chart(highlight).mark_text(color=c,dx=-35,dy=0,fontWeight=800).encode(
x=alt.X('Inequality',scale=alt.Scale(domain=(0.04, 0.51)),type='quantitative', axis=alt.Axis(title='Inequality')),
y=alt.Y('AvrgLifeExpectancy',scale=alt.Scale(domain=(48.9, 83.6)),type='quantitative', axis=alt.Axis(title='AvrgLifeExpectancy')),
text=alt.Text('Country')
)
chart = chart.encode(color=alt.Color('stringency_level',type='nominal'))
chart = chart.properties(width=160,height=150)
(chart + hchart + text).configure_title(color=c)
ℹ️ Check out this tutorial to learn more about exporting visualizations in Lux.
To get started, Lux can be installed through PyPI.
pip install lux-api
To use Lux in Jupyter notebook or VSCode, activate the notebook extension:
jupyter nbextension install --py luxwidget
jupyter nbextension enable --py luxwidget
To use Lux in Jupyter Lab, activate the lab extension:
jupyter labextension install @jupyter-widgets/jupyterlab-manager
jupyter labextension install luxwidget
If you encounter issues with the installation, please refer to this page to troubleshoot the installation.