This notebook demonstrates the use of the MSTICPy matrix visualization built using the Bokeh library.
You must have msticpy installed:
%pip install --upgrade msticpy
The matrix plot is designed to show interactions between two items stored in a pandas DataFrame in an x-y grid.
To take an example, if you have a DataFrame with source and destination IP addresses (for example, a firewall log), you can plot the source IPs on the y axis and destination IPs on the x axis. Where there is an event (row) that links a given source and destination, the matrix plot draws a circle.
By default, the size of the circle is proportional to the number of events containing a given source/destination pair (x and y).
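To make the default sizing concrete, here is a minimal sketch that counts rows per source/destination pair using only the standard library. The events list and the pair_counts name are made up for illustration; this shows the idea of one count per grid cell and is not MSTICPy's internal code.

```python
from collections import Counter

# Hypothetical firewall-style events as (source, destination) pairs.
events = [
    ("10.0.0.1", "52.1.1.1"),
    ("10.0.0.1", "52.1.1.1"),
    ("10.0.0.2", "52.1.1.1"),
    ("10.0.0.1", "40.2.2.2"),
]

# One count per (source, destination) cell of the grid.
pair_counts = Counter(events)

for (src, dst), count in sorted(pair_counts.items()):
    print(f"{src} -> {dst}: {count} event(s)")
```

A pair that appears in more rows gets a larger count, which is the quantity the default circle size reflects.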
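The matrix plot also has the following variations: sizing circles by a numeric column (value_col), plotting the log of the size (log_size), sizing by the number of distinct values (dist_count), inverting sizes to highlight rare interactions (invert), and showing only whether an interaction occurred (intersect).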
# Imports
from msticpy.common.utility import check_py_version
MIN_REQ_PYTHON = (3,6)
check_py_version(MIN_REQ_PYTHON)
import pandas as pd
from msticpy import init_notebook
init_notebook(globals())
True
all_df = pd.read_csv(
    "data/az_net_flows.csv",
    index_col=0,
    parse_dates=[
        "TimeGenerated",
        "FlowStartTime",
        "FlowEndTime",
        "FlowIntervalEndTime",
    ],
)
# Create some sample data to work with
net_df = (
    all_df[["AllExtIPs", "L7Protocol", "TotalAllowedFlows"]]
    .rename(columns={"AllExtIPs": "SourceIP"})
    .sample(100)
)


def get_dest_ip(row):
    """Pick a random non-10.x address from the sample that differs from the row's source IP."""
    dest_ip = None
    while dest_ip is None or row.SourceIP == dest_ip:
        dest_ip = (
            net_df[~net_df["SourceIP"].str.startswith("10.")]
            .sample(1)["SourceIP"]
            .values[0]
        )
    return dest_ip


net_df["DestinationIP"] = net_df.apply(get_dest_ip, axis=1)
net_df.head(3)
| | SourceIP | L7Protocol | TotalAllowedFlows | DestinationIP |
|---|---|---|---|---|
| 690 | 20.38.98.100 | https | 1.0 | 65.55.44.109 |
| 544 | 13.67.143.117 | https | 1.0 | 13.71.172.130 |
| 957 | 65.55.163.76 | https | 5.0 | 13.65.107.32 |
The basic plot displays a circle at each interaction between the X and Y axis items. The size of the circle is proportional to the number of records/rows in which the X and Y parameters interact.
Here we are using the MSTICPy pandas accessor to plot the graph directly from the DataFrame:
data.mp_plot.matrix()
net_df.mp_plot.matrix(x="SourceIP", y="DestinationIP", title="IP Interaction")
The Bokeh graph is interactive and has the following features:
You can use sort to sort both axes, or sort_x and sort_y to sort each axis individually.
The sort parameters take the values "asc" (ascending), "desc" (descending), and True (ascending). None and False produce no sorting.
net_df.mp_plot.matrix(
    x="SourceIP",
    y="DestinationIP",
    title="IP Interaction",
    sort_y="asc",
    sort_x=False,
)
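The accepted sort values can be summarized with a small sketch. The order_labels helper below is hypothetical and exists only to illustrate the value semantics described above; it is not part of MSTICPy, and it assumes labels are compared as plain strings.

```python
def order_labels(values, sort=None):
    """Return unique axis labels ordered according to a sort value (illustrative only)."""
    labels = list(dict.fromkeys(values))  # unique labels, first-seen order
    if sort is None or sort is False:
        return labels  # no sorting
    if sort is True or sort == "asc":
        return sorted(labels)
    if sort == "desc":
        return sorted(labels, reverse=True)
    raise ValueError(f"Unsupported sort value: {sort!r}")


ips = ["65.55.163.76", "13.67.143.117", "20.38.98.100", "13.67.143.117"]
print(order_labels(ips, "asc"))
print(order_labels(ips, "desc"))
print(order_labels(ips, False))
```

Because the comparison here is on strings, "13.67.143.117" sorts before "20.38.98.100" by character order rather than by numeric address value.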
plot_matrix function directly
Supply the input DataFrame as the first parameter (or as the named parameter data).
from msticpy.vis.matrix_plot import plot_matrix
plot_matrix(data=net_df, x="SourceIP", y="DestinationIP", title="IP Interaction")
Instead of a simple count of rows linking an X-Y pair of entities, you can use a numeric column in the input DataFrame to control the size of the plotted circle.
In this example, we're using the "TotalAllowedFlows" column.
all_df.mp_plot.matrix(
    x="L7Protocol",
    y="AllExtIPs",
    value_col="TotalAllowedFlows",
    title="External IP protocol flows",
    sort="asc",
)
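As a rough illustration of sizing by a numeric column, the sketch below totals a value per X-Y pair. It assumes the values are summed for each pair, which the text does not state explicitly, and the rows and totals names are invented for the example.

```python
from collections import defaultdict

# Hypothetical rows of (protocol, external IP, allowed flows).
rows = [
    ("https", "13.67.143.117", 1.0),
    ("https", "13.67.143.117", 5.0),
    ("http", "65.55.163.76", 2.0),
]

# Assumption: the value column is summed for each (x, y) pair.
totals = defaultdict(float)
for protocol, ip, flows in rows:
    totals[(protocol, ip)] += flows

for pair, total in sorted(totals.items()):
    print(pair, total)
```

With a plain row count, the first pair would have size 2; with the value column it reflects the 6.0 flows recorded for that pair.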
Note: because of a few large values in the data, many points are difficult to see in the previous plot. We can change this by plotting the log of the values.
all_df.mp_plot.matrix(
    x="L7Protocol",
    y="AllExtIPs",
    value_col="TotalAllowedFlows",
    title="External IP protocol flows (log of size)",
    log_size=True,
    sort="asc",
)
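The effect of log scaling on a skewed set of values can be seen in a few lines. This sketch uses math.log1p (log of 1 + value) so that a value of 1 does not collapse to zero; that choice is an assumption for the example, and the exact transform MSTICPy applies may differ.

```python
import math

# Hypothetical flow totals: a few small values and one very large outlier.
flows = [1.0, 5.0, 40.0, 25000.0]

# Assumption: log1p keeps a value of 1 above zero after the transform.
log_flows = [math.log1p(value) for value in flows]

raw_ratio = max(flows) / min(flows)
log_ratio = max(log_flows) / min(log_flows)

print(f"largest/smallest, raw: {raw_ratio:.0f}")
print(f"largest/smallest, log: {log_ratio:.1f}")
```

The ordering of the values is preserved, but the gap between the largest and smallest shrinks sharply, so small circles remain visible next to the outlier.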
Use the dist_count parameter with the value_col parameter to size each circle by the number of distinct values in the value_col column.
The plot below sizes each circle in proportion to the number of distinct Layer 7 protocols used between the endpoints.
net_df.mp_plot.matrix(
    x="SourceIP",
    y="DestinationIP",
    # Count distinct protocols per pair, matching the description and title
    value_col="L7Protocol",
    dist_count=True,
    title="External IP protocol flows (distinct protocols)",
    sort="asc",
    max_label_font_size=9,
)
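The distinct-count idea can be sketched without any plotting: collect the set of values seen for each pair and take its size. The rows below are invented sample data and the variable names are illustrative, not MSTICPy internals.

```python
from collections import defaultdict

# Hypothetical rows of (source IP, destination IP, Layer 7 protocol).
rows = [
    ("20.38.98.100", "65.55.44.109", "https"),
    ("20.38.98.100", "65.55.44.109", "https"),
    ("20.38.98.100", "65.55.44.109", "ssh"),
    ("13.67.143.117", "13.71.172.130", "https"),
]

# Gather the set of protocols seen for each (source, destination) pair.
protocols_by_pair = defaultdict(set)
for src, dst, protocol in rows:
    protocols_by_pair[(src, dst)].add(protocol)

# The distinct count ignores how often each protocol was repeated.
distinct_counts = {pair: len(seen) for pair, seen in protocols_by_pair.items()}
print(distinct_counts)
```

The first pair appears in three rows but only uses two protocols, so its distinct count is 2 rather than 3.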
Where you want to highlight unusual interactions, you can plot the inverse of the value_col value or of the count of interactions using the invert=True parameter.
This results in a plot with larger circles for rarer interactions.
net_df.mp_plot.matrix(
    x="SourceIP",
    y="DestinationIP",
    value_col="TotalAllowedFlows",
    title="External IP flows (rare flows == larger)",
    invert=True,
    sort="asc",
)
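A small numeric sketch shows why inversion makes rare interactions stand out. It assumes that inverting means taking the reciprocal (1 / value), which is one reading of the description above; the counts are invented sample data.

```python
# Hypothetical interaction counts per (source, destination) pair.
counts = {
    ("10.0.0.1", "52.1.1.1"): 50,
    ("10.0.0.1", "40.2.2.2"): 2,
    ("10.0.0.2", "52.1.1.1"): 1,
}

# Assumption: inversion is the reciprocal, so small counts become large sizes.
inverted = {pair: 1 / count for pair, count in counts.items()}

rarest = max(inverted, key=inverted.get)
print("largest circle after inversion:", rarest)
```

The pair seen only once ends up with the largest inverted value, while the pair seen 50 times shrinks to the smallest.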
Where you do not care about any value associated with the interaction and only want to see if there has been an interaction, you can use the intersect parameter.