%load_ext autoreload
%autoreload 2
%matplotlib inline
from sklearn.datasets import load_iris
import panel as pn
import pandas as pd
import hvplot.pandas
import holoviews as hv
import janitor
import pickle as pkl
import gzip
import numpy as np
from io import StringIO
from Bio import SeqIO
from uuid import uuid4
from pathlib import Path
from utils import molecular_weights, featurize_sequence_
from panel.interact import fixed
hv.extension("bokeh")
In this notebook, we will walk through the main ideas you'll need to build a Panel app quickly.
We will use a number of modern data visualization tools that work with Panel, primarily HoloViews and hvPlot. In particular, I want to showcase hvPlot, as it is intended to be a drop-in replacement (up to a certain point) for pandas' matplotlib-based `.plot()` API.
The first thing we are going to build is an interactive visualization tool for the Iris dataset. As a minimal example, we will only concern ourselves with adding drop-down selection capabilities, but know that more can be done.
iris = load_iris()
X = iris['data']
y = iris['target']
names = iris['target_names']
mapping = {i:v for i, v in enumerate(iris['target_names'])}
df = pd.DataFrame(X)
df.columns = iris['feature_names']
df['flower_type'] = iris['target']
df['flower_type'] = df['flower_type'].apply(lambda x: mapping.get(x))
df.head(3)
|   | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | flower_type |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
The first plot we are going to build is a scatterplot of pairs of the iris dataset features.
# Declare the plotting function.
# It should return the plot of interest.
def pairplot(df, x, y):
    return df.hvplot.scatter(x=x, y=y, c="flower_type").opts(width=600, height=400)
Let's verify that the plot returns correctly.
pairplot(df, x=iris["feature_names"][0], y=iris["feature_names"][1])
Now, we're going to add widgets.
x = pn.widgets.Select(options=iris['feature_names'], value=iris["feature_names"][0])
y = pn.widgets.Select(options=iris['feature_names'], value=iris["feature_names"][1])
scatter = pn.interact(pairplot, df=fixed(df), x=x, y=y)
scatter
Now, we're going to add in some text as a preface to the plot, and combine them into a single logical unit; let's call them "tabs", for we'll be building an interface that has multiple tabs. The text and the plot should be arranged as a single column.
scatter_txt = pn.pane.Markdown("""
# Iris Dataset
Pick the x- and y-axes from the dropdown menus to explore how the three flower types differ from one another.
""")
scatter_tab = pn.Column(scatter_txt, scatter)
The next two tabs showcase how to build in the ability to upload a file that contains a data point, and how to use a machine learning model to make predictions for that data point.
# Let's write the introduction first, since we're sure on what we're building.
intro_hiv = pn.pane.Markdown("""
# HIV Resistance Prediction
This shows how to write an app that accepts a file input
and returns a model prediction.
""")
# Also pre-declare a list of drugs. This will be handy later.
drugs = ['ATV', 'DRV', 'FPV', 'IDV', 'LPV', 'NFV', 'SQV', 'TPV']
The first of these tabs is the easier one: a visualization of model performance.
These scores were obtained by performing cross-validation scoring.
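The scores loaded below were computed offline. As a hedged sketch of how such a per-drug dictionary of cross-validation scores could be produced with scikit-learn (the synthetic `X_demo` and targets here are stand-ins, not the real HIV data):

```python
# Hypothetical sketch: per-drug 5-fold CV scores with scikit-learn.
# X_demo and the targets are synthetic stand-ins for the real featurized data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
X_demo = rng.normal(size=(100, 10))
drug_subset = ["ATV", "DRV"]  # two drugs, for illustration
targets = {
    drug: X_demo @ rng.normal(size=10) + rng.normal(scale=0.1, size=100)
    for drug in drug_subset
}

cv_scores = {}
for drug, y_demo in targets.items():
    model = RandomForestRegressor(n_estimators=50, random_state=0)
    # scoring="neg_mean_squared_error" returns negated MSE; flip the sign back.
    cv_scores[drug] = -cross_val_score(
        model, X_demo, y_demo, cv=5, scoring="neg_mean_squared_error"
    )
print({drug: s.shape for drug, s in cv_scores.items()})
```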
# Load scores to visualize.
with gzip.open("data/scores.pkl.gz", "rb") as f:
    scores = pkl.load(f)
scores
{'ATV': array([0.20048292, 0.18764676, 0.28482497, 0.18413254, 0.0963426 ]), 'DRV': array([0.17326838, 0.18668307, 0.07524106, 0.1235008 , 0.10624229]), 'FPV': array([0.10610533, 0.13212618, 0.13680233, 0.17547875, 0.11057616]), 'IDV': array([0.10894821, 0.13180615, 0.09262525, 0.16391846, 0.09632168]), 'LPV': array([0.13758661, 0.13609392, 0.19091143, 0.13969725, 0.09384091]), 'NFV': array([0.14737556, 0.20635308, 0.11037221, 0.17550392, 0.11768701]), 'SQV': array([0.19770716, 0.18196959, 0.11561523, 0.34781876, 0.16124411]), 'TPV': array([0.24116757, 0.19185904, 0.15004696, 0.12898619, 0.04866918])}
# Convert this into a pandas dataframe.
model_performance = pd.DataFrame(scores)
model_performance.head(3)
|   | ATV | DRV | FPV | IDV | LPV | NFV | SQV | TPV |
|---|---|---|---|---|---|---|---|---|
| 0 | 0.200483 | 0.173268 | 0.106105 | 0.108948 | 0.137587 | 0.147376 | 0.197707 | 0.241168 |
| 1 | 0.187647 | 0.186683 | 0.132126 | 0.131806 | 0.136094 | 0.206353 | 0.181970 | 0.191859 |
| 2 | 0.284825 | 0.075241 | 0.136802 | 0.092625 | 0.190911 | 0.110372 | 0.115615 | 0.150047 |
# Make the plot
perfplot = (
    model_performance
    .hvplot.box()
    .opts(xlabel="drug", ylabel="mse (lower is better)", width=400)
)
# Add preface text
perftext = pn.pane.Markdown("""
# Can we trust these models?
For certain drugs, the random forest model that was trained
on the data shows low error; for other drugs, the error is higher.
We should place more trust in the models with lower cross-validation error.
We used 5-fold cross-validation to measure model performance,
and report below the distribution of model performances.
""")
# Make a column of the text and plot.
perftab = pn.Column(perftext, perfplot)
In this section, we will use pre-trained models to predict drug resistance for a given protein sequence.
First, we are going to define some convenience functions.
def predict(model, seq):
    """Given a sequence, predict its drug resistance value."""
    x = featurize_sequence_(seq).reshape(1, -1)
    return model.predict(x)

def predict_uncertainty(model, seq, q=[25, 75]):
    """Estimate an uncertainty band from the forest's individual trees."""
    x = featurize_sequence_(seq).reshape(1, -1)
    predrange = []
    for estimator in model.estimators_:
        predrange.append(estimator.predict(x))
    minimum, maximum = np.percentile(predrange, q=q)
    return minimum, maximum
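`predict_uncertainty` exploits the fact that a random forest is an ensemble: each tree in `model.estimators_` yields its own prediction, and percentiles over those per-tree predictions give a spread. A self-contained toy version of the same trick (synthetic data, not the real featurized sequences):

```python
# Toy illustration of per-tree prediction spread in a random forest.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 5))
y_toy = X_toy[:, 0] + rng.normal(scale=0.2, size=200)

forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_toy, y_toy)

x_new = rng.normal(size=(1, 5))
# Each tree in the ensemble makes its own prediction...
per_tree = np.array([tree.predict(x_new) for tree in forest.estimators_])
# ...and percentiles over them give an interquartile uncertainty band.
lo, hi = np.percentile(per_tree, q=[25, 75])
print(lo, hi)
```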
def make_preds_df(preds):
    """Convenience function to make preds dataframe."""
    return (
        pd.DataFrame(preds)
        .melt()
        .rename_column("variable", "drug")
        .rename_column("value", "resistance")
    )
def make_preds_plot(preds_df):
    """Prediction plot for drug resistance."""
    preds_plot = (
        preds_df
        .hvplot.scatter(x="drug", y="resistance", color='red')
        .opts(ylabel="drug resistance (higher is more resistant)")
    )
    return preds_plot
Now, we are going to plot the distribution of drug resistances for context.
data = (
    pd.read_csv("data/hiv-protease-data-expanded.csv", index_col=0)
    .query("weight == 1.0")
    .transform_column("sequence", lambda x: len(x), "seq_length")
    .query("seq_length == 99")
    .transform_column("sequence", featurize_sequence_, "features")
    .transform_columns(drugs, np.log10)
)
distplot = (
    data
    .select_columns(drugs)
    .hvplot.box()
    .opts(xlabel="drug", ylabel="resistance", width=500, height=300)
)
distplot
Now, let's build the interface. We need a function that will loop over all of the models that we've saved and cached after training, and use those models to predict drug resistance.
fileinput = pn.widgets.FileInput()
distpane = pn.pane.Pane(distplot)
def predict_all_drug_resistances(event):
    file = StringIO(event.new.decode("utf-8"))
    seq = SeqIO.read(file, format='fasta')
    preds = dict()
    for drug in drugs:
        with gzip.open(f"data/models/{drug}.pkl.gz", "rb") as f:
            model = pkl.load(f)
        preds[drug] = predict(model, seq)
    preds_df = make_preds_df(preds)
    preds_plot = make_preds_plot(preds_df)
    distpane.object = (distplot * preds_plot).opts(show_legend=False)
# Make "predict_all_drug_resisistances" "watch" the "value" of "fileinput"
fileinput.param.watch(predict_all_drug_resistances, "value")
pn.Column(fileinput, distpane)
Finally, we are going to build the "resistance prediction" tab.
markdown = pn.pane.Markdown("""
# Predict Drug Resistance
Upload a FASTA file containing an HIV protease sequence
(99 amino acids long, no invalid amino acid characters)
to obtain predictions for the drug resistance value
of that sequence.
If you need an example file,
you can find one [here](https://github.com/ericmjl/minimal-panel-app/raw/master/data/hiv-protease-consensus.txt).
For reference, all measured resistances
in the training set are also provided.
The predictions will be shown as red dots
overlaid on the box whisker plots
after you upload the file.
""")
resistance_tab = pn.Column(markdown, fileinput, distpane)
resistance_tab
We're now going to build the overall introduction tab, which is effectively like a landing page.
intro_txt = pn.pane.Markdown("""
# Minimal Panel Example
This is a minimal Panel example that shows you how to serve and deploy a Panel app.
Panel is a dashboarding toolkit that works inside and outside of Jupyter notebooks.
You can prototype your dashboard visualizations inside a Jupyter notebook,
and then choose how you want to serve it:
- As a standalone `.py` file
- Served using a Jupyter notebook
Click on the next tab to see a plot generated using Panel and hvPlot.
The source code for this project can be found [here](https://github.com/bjrnfrdnnd/minimal-panel-app).
""")
We can finally assemble all of the parts into an interface! The key idea here is composability: if all of the code needed to produce a plot is abstracted into a function, then we can compose multiple plots (i.e. multiple functions) together to build our app interface, and all of that code remains reusable if we need to re-order the front-facing parts of the app.
tabs = dict()
tabs['Introduction'] = intro_txt
tabs['1. Iris'] = scatter_tab
tabs['2. Drug Resistance'] = resistance_tab
tabs['3. Drug Model Performance'] = perftab
def dict2tuple(d):
    """Convert a dict into a tuple of (key, value) pairs."""
    return tuple(d.items())
app = pn.Tabs(*dict2tuple(tabs))
# TAKE NOTE!
# This line is extremely important. This is what Panel will recognize and make servable.
app.servable()
Now that we have the app interface built, we can serve it up using panel:
panel serve minimal-panel.ipynb --port 8866