popmon introductory notebook
This notebook contains examples of how to generate popmon reports from a pandas DataFrame.
# (optional) Adjust the jupyter notebook style for easier navigation of the reports
from IPython.display import HTML, display
# Wider notebook
display(HTML("<style>.container { width:80% !important; }</style>"))
# Make scrollable output cells taller by default
display(HTML("<style>div.output_scroll { height: 44em; }</style>"))
Set up popmon and load our dataset
Install popmon (if not installed yet) in the current environment.
import sys
!"{sys.executable}" -m pip install -q popmon
Import pandas and popmon, load an example dataset provided by popmon, and show the first few rows.
import pandas as pd
import popmon
from popmon import resources
from popmon.config import Settings
df = pd.read_csv(resources.data("test.csv.gz"), parse_dates=["date"])
df.head()
report = df.pm_stability_report(
# Use the 'date' column as our time axis
time_axis="date",
# Create batches for every two weeks of data
time_width="2w",
# Select a subset of features
features=["date:age", "date:isActive", "date:eyeColor"],
)
report
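To make the time_width="2w" setting more concrete, here is a minimal pandas-only sketch of what splitting data into two-week batches looks like. It does not use popmon and is not popmon's internal implementation; the toy DataFrame, the batch_means name, and the use of pd.Grouper with a 14-day frequency are assumptions made purely for illustration.

```python
import pandas as pd

# Toy data (hypothetical): one row per day for 28 days
toy = pd.DataFrame(
    {
        "date": pd.date_range("2024-01-01", periods=28, freq="D"),
        "age": list(range(28)),
    }
)

# Group rows into consecutive 14-day batches and summarise each batch
batch_means = toy.groupby(pd.Grouper(key="date", freq="14D"))["age"].mean()
print(batch_means)
```

Each row of the result corresponds to one time batch, which is roughly the granularity at which a stability report compares the data over time.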
You can change the report parameters without rerunning the computational part of the pipeline by using the regenerate method. For example, a short (limited) report is generated when the extended_report flag is set to False. To configure which statistics are shown, set the show_stats argument accordingly. Another option is to change the plot_hist_n parameter, which controls the number of histograms displayed per feature.
report_settings = Settings()
report_settings.report.extended_report = False
report_settings.report.section.histograms.plot_hist_n = 6
report.regenerate(settings=report_settings)
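The show_stats argument is mentioned above but not demonstrated. Assuming it accepts a list of glob-style name patterns (an assumption; check the popmon documentation for the exact semantics), the selection logic could look roughly like the stdlib-only sketch below. The statistic names and patterns are hypothetical and the sketch does not call popmon.

```python
from fnmatch import fnmatch

# Hypothetical statistic names, for illustration only
available_stats = ["mean", "mean_zscore", "std", "p05", "p95", "nan", "distinct"]

# Hypothetical selection patterns, in the style one might pass as show_stats
show_stats = ["mean*", "p*"]

# Keep a statistic if it matches at least one pattern
selected = [
    name
    for name in available_stats
    if any(fnmatch(name, pattern) for pattern in show_stats)
]
print(selected)
```

Under that assumption, narrowing the pattern list is a way to keep a report focused on a handful of statistics.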
popmon also supports generating the report directly from histograms. First, we generate the histograms (we could also load pre-generated histograms from a pickle or JSON file):
hists = df.pm_make_histograms(
time_axis="date",
time_width="2w",
features=["date:age", "date:gender", "date:isActive"],
)
list(hists.keys())
Then generate the report from the histograms:
report = popmon.stability_report(hists)
report
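The text above notes that pre-generated histograms could be loaded from a pickle file. A minimal sketch of that round trip using only the standard library is shown below. The example_hists dict is a stand-in with made-up contents, the file name is hypothetical, and the sketch assumes the real histogram objects are picklable.

```python
import pickle
import tempfile
from pathlib import Path

# Stand-in for the dict returned by pm_make_histograms (hypothetical contents)
example_hists = {"date:age": [1, 2, 3], "date:isActive": [4, 5]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "hists.pkl"  # hypothetical file name

    # Save the histograms once...
    with path.open("wb") as f:
        pickle.dump(example_hists, f)

    # ...and load them back later instead of recomputing them
    with path.open("rb") as f:
        loaded_hists = pickle.load(f)

print(list(loaded_hists.keys()))
```

The loaded dict could then be passed to popmon.stability_report in the same way as hists above. Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.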