This notebook demonstrates the use of the timeline displays built using the Bokeh library.
You must have msticpy installed:
%pip install --upgrade msticpy
There are two display types:
# Imports
import sys
import warnings
from msticpy.common.utility import check_py_version
MIN_REQ_PYTHON = (3,6)
check_py_version(MIN_REQ_PYTHON)
import ipywidgets as widgets
import pandas as pd
pd.set_option('display.max_rows', 100)
pd.set_option('display.max_columns', 50)
pd.set_option('display.max_colwidth', 100)
from msticpy.vis.timeline import display_timeline
WIDGET_DEFAULTS = {'layout': widgets.Layout(width='95%'),
'style': {'description_width': 'initial'}}
processes_on_host = pd.read_csv(
"data/processes_on_host.csv",
parse_dates=["TimeGenerated"],
infer_datetime_format=True,
index_col=0
);
# At a minimum we need to pass a dataframe with timestamp column
# (defaults to TimeGenerated)
display_timeline(processes_on_host)
The Bokeh graph is interactive and has the following features:
Additionally an interactive timeline navigation bar is displayed below the main graph. You can change the timespan shown on the main graph by dragging or resizing the selected area on this navigation bar.
Note: pass the source_columns parameter explicitly to have the hover tooltips populated correctly.
display_timeline also takes a number of optional parameters that give you more flexibility to show multiple data series and change the way the graph appears.
The majority of these parameters are optional so don't be too overwhelmed by them.
help(display_timeline)
Help on function display_timeline in module msticpy.vis.timeline:

display_timeline(data: Union[pandas.core.frame.DataFrame, dict], time_column: str = 'TimeGenerated', source_columns: Optional[List[str]] = None, **kwargs) -> bokeh.models.layouts.LayoutDOM
    Display a timeline of events.

    Parameters
    ----------
    data : Union[dict, pd.DataFrame]
        Either
        dict of data sets to plot on the timeline with the following structure::

            Key (str) - Name of data set to be displayed in legend
            Value (Dict[str, Any]) - containing:
                data (pd.DataFrame) - Data to plot
                time_column (str, optional) - Name of the timestamp column
                source_columns (list[str], optional) - source columns to use
                    in tooltips
                color (str, optional) - color of data points for this data
                size (int) - size of plotted event glyphs
            If any of the last values are omitted, they default to the values
            supplied as parameters to the function (see below)

        Or
        DataFrame as a single data set or grouped into individual
        plot series using the `group_by` parameter
    time_column : str, optional
        Name of the timestamp column
        (the default is 'TimeGenerated')
    source_columns : Optional[List[str]]
        List of default source columns to use in tooltips
        (the default is None)

    Other Parameters
    ----------------
    title : str, optional
        Title to display (the default is None)
    alert : SecurityAlert, optional
        Add a reference line/label using the alert time (the default is None)
    ref_event : Any, optional
        Add a reference line/label using the alert time (the default is None)
    ref_time : datetime, optional
        Add a reference line/label using `ref_time` (the default is None)
    group_by : str
        (where `data` is a DataFrame)
        The column to group timelines on
    size : Union[int, str]
        The size of the event glyph. If a string the size is taken as
        a column in the input data. If an integer, this is used as the
        fixed size.
    legend: str, optional
        "left", "right", "inline" or "none"
        (the default is to show a legend when plotting multiple series
        and not to show one when plotting a single series)
    yaxis : bool, optional
        Whether to show the yaxis and labels (default is False)
    ygrid : bool, optional
        Whether to show the yaxis grid (default is False)
    xgrid : bool, optional
        Whether to show the xaxis grid (default is True)
    range_tool : bool, optional
        Show the the range slider tool (default is True)
    height : int, optional
        The height of the plot figure
        (the default is auto-calculated height)
    width : int, optional
        The width of the plot figure (the default is 900)
    color : str
        Default series color (default is "navy")
    overlay_data : pd.DataFrame:
        A second dataframe to plot as a different series.
    overlay_color : str
        Overlay series color (default is "green")
    hide : bool, optional
        If True, create but do not display the plot.
        By default, False.
    ref_events : pd.DataFrame, optional
        Add references line/label using the event times in the dataframe.
        (the default is None)
    ref_time_col : str, optional
        Add references line/label using the this column in `ref_events`
        for the time value (x-axis).
        (this defaults the value of the `time_column` parameter or
        'TimeGenerated' `time_column` is None)
    ref_col : str, optional
        The column name to use for the label from `ref_events`
        (the default is None)
    ref_times : List[Tuple[datetime, str]], optional
        Add one or more reference line/label using (the default is None)

    Returns
    -------
    LayoutDOM
        The bokeh plot figure.
display_timeline(
processes_on_host,
group_by="Account",
source_columns=["NewProcessName", "ParentProcessName"],
legend="left",
);
We can use the group_by parameter to specify a column on which to split individually plotted series.
Specifying a legend, we can see the value of each series group. The legend is interactive: click on a series name to hide/show the data. The legend can be placed inside the chart (legend="inline") or to the left or right.
Alternatively we can enable the yaxis - although this is not guaranteed to show all values of the groups.
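To make the effect of group_by more concrete, the sketch below splits a handful of event records into one series per distinct value of a column, which is conceptually what happens before each series is plotted. It uses plain Python and made-up records; split_into_series is a hypothetical helper for illustration only and is not part of msticpy.

```python
from collections import defaultdict
from datetime import datetime

# Made-up events standing in for DataFrame rows (illustrative data only).
events = [
    {"TimeGenerated": datetime(2019, 2, 9, 8, 0), "Account": "alice"},
    {"TimeGenerated": datetime(2019, 2, 9, 8, 5), "Account": "bob"},
    {"TimeGenerated": datetime(2019, 2, 9, 9, 30), "Account": "alice"},
    {"TimeGenerated": datetime(2019, 2, 9, 10, 15), "Account": "SYSTEM"},
]


def split_into_series(rows, group_by):
    """Return a dict mapping each distinct group_by value to its rows."""
    series = defaultdict(list)
    for row in rows:
        series[row[group_by]].append(row)
    return dict(series)


series_by_account = split_into_series(events, "Account")
for name, rows in series_by_account.items():
    # Each key would become one legend entry / one plotted row of glyphs.
    print(f"{name}: {len(rows)} event(s)")
```

Each key of the resulting dictionary corresponds to one entry in the legend.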
Note: pass the source_columns parameter explicitly to have the hover tooltips populated correctly.
display_timeline(
processes_on_host,
group_by="Account",
source_columns=["NewProcessName", "ParentProcessName"],
legend="none",
yaxis=True,
ygrid=True,
);
We've implemented the timeline plotting functions as pandas accessors so you can plot directly from the DataFrame using mp_plot.timeline().
All of the parameters used in the standalone function are available in the pandas accessor functions.
host_logons = pd.read_csv(
"data/host_logons.csv",
parse_dates=["TimeGenerated"],
infer_datetime_format=True,
index_col=0,
)
host_logons.mp_plot.timeline(
title="Logons by Account name",
group_by="Account",
source_columns=["Account", "TargetLogonId", "LogonType"],
legend="left",
height=200,
)
host_logons.mp_plot.timeline(
title="Logons by logon type",
group_by="LogonType",
source_columns=["Account", "TargetLogonId", "LogonType"],
legend="left",
height=200,
range_tool=False,
ygrid=True,
);
You can annotate your timeline with one or more reference markers. These can be supplied as timestamped events in a DataFrame or a list of datetime/label pairs.
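To use a DataFrame, pass it as the ref_events parameter. Use ref_col to name the column that supplies the label text and, optionally, ref_time_col to name the timestamp column (it defaults to the value of time_column).
To use a list of times, use the ref_times parameter. This should be a list of (datetime, label) tuples.
E.g. ref_times=[(date1, "item1"), (date2, "item2")...]
You can use either ref_events or ref_times with a single row or list entry.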
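As a small illustration of the ref_times format, the sketch below builds a list of (datetime, label) tuples using only the standard library. The timestamps and labels are invented for the example; substitute values from your own investigation.

```python
from datetime import datetime, timedelta

# Hypothetical incident times (assumptions for illustration only).
first_alert = datetime(2019, 2, 10, 3, 15)
follow_up = first_alert + timedelta(hours=6)

# Each entry is a (datetime, label) tuple, matching the
# List[Tuple[datetime, str]] type documented for ref_times.
ref_times = [
    (follow_up, "Follow-up activity"),
    (first_alert, "Initial alert"),
]

# Sorting by timestamp makes the markers easier to review chronologically.
ref_times.sort(key=lambda pair: pair[0])
for when, label in ref_times:
    print(f"{when.isoformat()}  {label}")
```

A list like this could then be passed as the ref_times argument, for example display_timeline(host_logons, ref_times=ref_times).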
alerts = processes_on_host.sample(3)
display_timeline(
host_logons,
title="Processes with marker",
group_by="Account",
source_columns=["Account", "TargetLogonId", "LogonType"],
ref_events=alerts,
ref_col="SubjectUserName",
legend="left",
ygrid=True,
);
For a single reference point you can also use alert, ref_event or ref_time, although these are now deprecated in favor of ref_events and ref_times.
Use ref_event (note: this is different from ref_events)
fake_alert = processes_on_host.sample().iloc[0]
display_timeline(
host_logons,
title="Processes with marker",
group_by="LogonType",
source_columns=["Account", "TargetLogonId", "LogonType"],
alert=fake_alert,
legend="left",
);
When you want to plot data sets with different schema on the same plot it is difficult to put them in a single DataFrame.
To do this we need to assemble the different data sets into a dictionary and pass that to the display_timeline function.
The dictionary has this format:
Key: str
    Name of data set to be displayed in legend
Value: dict, the value holds the settings for each data series:
    data: pd.DataFrame
        Data to plot
    time_column: str, optional
        Name of the timestamp column
        (defaults to `time_column` function parameter)
    source_columns: list[str], optional
        List of source columns to use in tooltips
        (defaults to `source_columns` function parameter)
    color: str, optional
        Color of datapoints for this data
        (defaults to autogenerating colors)
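The fallback behavior described above (per-series settings defaulting to the function-level parameters) can be sketched in plain Python. This is only an illustration of the idea, not msticpy's implementation: resolve_series_settings is a hypothetical helper, the placeholder strings stand in for DataFrames, and "LogonTime" is an invented column name.

```python
def resolve_series_settings(data_sets, time_column="TimeGenerated", source_columns=None):
    """Return a copy of data_sets with omitted optional settings filled in."""
    resolved = {}
    for name, settings in data_sets.items():
        if "data" not in settings:
            raise ValueError(f"Series '{name}' has no 'data' entry")
        resolved[name] = {
            "data": settings["data"],
            # Fall back to the function-level values when a key is omitted.
            "time_column": settings.get("time_column", time_column),
            "source_columns": settings.get("source_columns", source_columns),
            # None here means "let the plotting code choose a color".
            "color": settings.get("color"),
        }
    return resolved


# Placeholder strings stand in for DataFrames in this sketch.
example = {
    "Processes": {"data": "<processes DataFrame>", "source_columns": ["NewProcessName"]},
    "Logons": {"data": "<logons DataFrame>", "time_column": "LogonTime", "color": "green"},
}
resolved = resolve_series_settings(example, source_columns=["Account"])
for name, settings in resolved.items():
    print(name, settings["time_column"], settings["source_columns"], settings["color"])
```

Only the data entry is required for each series; everything else can be left out and inherited.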
procs_and_logons = {
"Processes": {
"data": processes_on_host,
"source_columns": ["NewProcessName", "Account"],
},
"Logons": {
"data": host_logons,
"source_columns": ["Account", "TargetLogonId", "LogonType"],
},
}
display_timeline(
data=procs_and_logons, title="Logons and Processes", legend="left", yaxis=False
);
Often you may want to see a scalar value plotted with the series.
The first example below uses the pandas mp_plot.timeline_values() accessor to plot network flow data using the total flows recorded between a pair of IP addresses.
You can also import and use display_timeline_values from msticpy.vis.timeline_values. This is shown in later examples.
Note that the majority of parameters are the same as display_timeline but include a mandatory value_column parameter, which indicates the value you want to plot on the y (vertical) axis. (This can also be specified as y or value_col.)
from msticpy.vis.timeline import display_timeline_values
az_net_flows_df = pd.read_csv(
"data/az_net_flows.csv",
parse_dates=["TimeGenerated", "FlowStartTime", "FlowEndTime"],
infer_datetime_format=True,
index_col=0,
)
az_net_flows_df.mp_plot.timeline_values(
group_by="L7Protocol",
source_columns=[
"FlowType",
"AllExtIPs",
"L7Protocol",
"FlowDirection",
"TotalAllowedFlows",
],
time_column="FlowStartTime",
value_column="TotalAllowedFlows",
legend="right",
height=500,
);
By default the plot uses vertical bars to show the values, but you can use any combination of vbar, circle and line, using the kind parameter. You specify the plot types as a list of strings (all lowercase).
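If you build the kind value programmatically, a small pre-check can catch typos before plotting. The sketch below is a hypothetical helper (normalize_kind is not part of msticpy) that accepts either a single string or a list and rejects anything other than the three lowercase glyph names mentioned above.

```python
SUPPORTED_KINDS = ("vbar", "circle", "line")


def normalize_kind(kind="vbar"):
    """Return kind as a list of glyph names, rejecting unsupported ones."""
    # Accept a single string as shorthand for a one-item list.
    kinds = [kind] if isinstance(kind, str) else list(kind)
    unknown = [item for item in kinds if item not in SUPPORTED_KINDS]
    if unknown:
        raise ValueError(
            f"Unsupported plot kind(s) {unknown}; expected any of {SUPPORTED_KINDS}"
        )
    return kinds


print(normalize_kind())                    # default: vertical bars only
print(normalize_kind(["vbar", "circle"]))  # combination of two glyph types
```

The check is case-sensitive on purpose, since the plot types are expected in lowercase.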
Notes
flow_plot = display_timeline_values(
data=az_net_flows_df,
group_by="L7Protocol",
source_columns=[
"FlowType",
"AllExtIPs",
"L7Protocol",
"FlowDirection",
"TotalAllowedFlows",
],
time_column="FlowStartTime",
value_column="TotalAllowedFlows",
legend="right",
height=500,
kind=["vbar", "circle"],
);
display_timeline_values(
data=az_net_flows_df[az_net_flows_df["L7Protocol"] == "http"],
group_by="L7Protocol",
title="Line plot can be misleading",
source_columns=[
"FlowType",
"AllExtIPs",
"L7Protocol",
"FlowDirection",
"TotalAllowedFlows",
],
time_column="FlowStartTime",
value_column="TotalAllowedFlows",
legend="right",
height=300,
kind=["line", "circle"],
range_tool=False,
)
display_timeline_values(
data=az_net_flows_df[az_net_flows_df["L7Protocol"] == "http"],
group_by="L7Protocol",
title="Vbar and circle show zero gaps in data",
source_columns=[
"FlowType",
"AllExtIPs",
"L7Protocol",
"FlowDirection",
"TotalAllowedFlows",
],
time_column="FlowStartTime",
value_column="TotalAllowedFlows",
legend="right",
height=300,
kind=["vbar", "circle"],
range_tool=False,
);
help(display_timeline_values)
Help on function display_timeline_values in module msticpy.vis.timeline_values:

display_timeline_values(data: pandas.core.frame.DataFrame, value_column: str = None, time_column: str = 'TimeGenerated', source_columns: list = None, **kwargs) -> bokeh.models.layouts.LayoutDOM
    Display a timeline of events.

    Parameters
    ----------
    data : pd.DataFrame
        DataFrame as a single data set or grouped into individual
        plot series using the `group_by` parameter
    time_column : str, optional
        Name of the timestamp column
        (the default is 'TimeGenerated')
    value_column : str
        The column name holding the value to plot vertically
    source_columns : list, optional
        List of default source columns to use in tooltips
        (the default is None)

    Other Parameters
    ----------------
    x : str, optional
        alias of `time_column`
    y : str, optional
        alias of `value_column`
    value_col : str, optional
        alias of `value_column`
    title : str, optional
        Title to display (the default is None)
    ref_event : Any, optional
        Add a reference line/label using the alert time (the default is None)
    ref_time : datetime, optional
        Add a reference line/label using `ref_time` (the default is None)
    ref_label : str, optional
        A label for the `ref_event` or `ref_time` reference item
    group_by : str
        (where `data` is a DataFrame)
        The column to group timelines on
    legend: str, optional
        "left", "right", "inline" or "none"
        (the default is to show a legend when plotting multiple series
        and not to show one when plotting a single series)
    yaxis : bool, optional
        Whether to show the yaxis and labels
    range_tool : bool, optional
        Show the the range slider tool (default is True)
    height : int, optional
        The height of the plot figure
        (the default is auto-calculated height)
    width : int, optional
        The width of the plot figure (the default is 900)
    color : str
        Default series color (default is "navy"). This is overridden by
        automatic color assignments if plotting a grouped chart
    kind : Union[str, List[str]]
        one or more glyph types to plot., optional
        Supported types are "circle", "line" and "vbar" (default is "vbar")
    hide : bool, optional
        If True, create but do not display the plot.
        By default, False.
    ref_events : pd.DataFrame, optional
        Add references line/label using the event times in the dataframe.
        (the default is None)
    ref_time_col : str, optional
        Add references line/label using the this column in `ref_events`
        for the time value (x-axis).
        (this defaults the value of the `time_column` parameter or
        'TimeGenerated' `time_column` is None)
    ref_col : str, optional
        The column name to use for the label from `ref_events`
        (the default is None)
    ref_times : List[Tuple[datetime, str]], optional
        Add one or more reference line/label using (the default is None)

    Returns
    -------
    LayoutDOM
        The bokeh plot figure.
Sometimes it's useful to be able to group data and see the start and ending activity over a period. The timeline durations plot gives you that option. It creates bands for the start and ending duration of each group, as well as the locations of the individual events.
Note that, unlike other timeline controls, you must specify a group_by parameter. This defines the way that the data is grouped before calculating the start and end of the events within that group. group_by can be a single column or a list of columns.
Durations are shown using boxes with individual events superimposed (as diamonds).
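The start and end calculation described above can be illustrated with a few lines of plain Python. This is a sketch of the concept only, using invented (group, timestamp) pairs; group_durations is a hypothetical helper and does not reflect how msticpy computes the bands internally.

```python
from datetime import datetime

# Invented (group key, event time) pairs standing in for DataFrame rows.
events = [
    ("alice", datetime(2019, 2, 9, 8, 0)),
    ("bob", datetime(2019, 2, 9, 9, 30)),
    ("alice", datetime(2019, 2, 9, 17, 45)),
    ("bob", datetime(2019, 2, 9, 11, 0)),
    ("alice", datetime(2019, 2, 9, 12, 10)),
]


def group_durations(rows):
    """Return {group: (first_event_time, last_event_time)}."""
    spans = {}
    for key, when in rows:
        # Start from (when, when) the first time a group is seen.
        start, end = spans.get(key, (when, when))
        spans[key] = (min(start, when), max(end, when))
    return spans


durations = group_durations(events)
for key, (start, end) in durations.items():
    # Each (start, end) pair corresponds to one duration box on the plot.
    print(f"{key}: {start} -> {end} (duration {end - start})")
```

The individual timestamps inside each span are the events drawn on top of the duration box.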
from msticpy.vis.timeline_duration import display_timeline_duration
display_timeline_duration(
host_logons,
group_by="Account",
ref_events=host_logons.sample(3),
ref_col="TargetUserName",
);
az_net_flows_df.mp_plot.timeline_duration(
group_by=["SrcIP", "DestIP", "L7Protocol"]
)
To use bokeh.io image export functions you need selenium, phantomjs and pillow installed:
conda install -c bokeh selenium phantomjs pillow
or
pip install selenium pillow
npm install -g phantomjs-prebuilt
For phantomjs see https://phantomjs.org/download.html.
Once the prerequisites are installed you can create a plot and save the return value to a variable.
Then export the plot using the export_png function.
from bokeh.io import export_png
# Markdown and display are used below to caption the saved image
from IPython.display import Image, Markdown, display
# Create a plot and keep the returned Bokeh layout object
flow_plot = display_timeline_values(
    data=az_net_flows_df,
    group_by="L7Protocol",
    source_columns=[
        "FlowType",
        "AllExtIPs",
        "L7Protocol",
        "FlowDirection",
        "TotalAllowedFlows",
    ],
    time_column="FlowStartTime",
    y="TotalAllowedFlows",
    legend="right",
    height=500,
    kind=["vbar", "circle"],
);
# Export
file_name = "plot.png"
export_png(flow_plot, filename=file_name)
# Read it and show it
display(Markdown(f"## Here is our saved plot: {file_name}"))
Image(filename=file_name)