MSTICPy has a lot of functionality distributed across many classes and modules. However, there is no simple way to discover where these functions are and what types of data each function is relevant to.
Pivot functions bring this functionality together grouped around Entities.
Entities are representations of real-world objects found commonly in CyberSec investigations. Some examples are: IpAddress, Host, Account, URL
>>> IpAddress.util.ip_type(ip_str="157.53.1.1")
ip result
157.53.1.1 Public
>>> IpAddress.util.whois("157.53.1.1")
asn asn_cidr asn_country_code asn_date asn_description asn_registry nets nir query raw raw_referral referral
NA NA US 2015-04-01 NA arin [{'cidr': '157.53.0.0/16', 'name': 'NETACTUATE-MDN-04', 'handle': 'NET-157-53-0-0-1', 'range': '... None 157.53.1.1 None None None
>>> IpAddress.util.geoloc(value="157.53.1.1")
CountryCode CountryName State City Longitude Latitude Asn edges Type AdditionalData IpAddress
US United States None None -97.822 37.751 None {} geolocation {} 157.53.1.1
>>> Host.MSSentinel.list_host_logons(host_name="VictimPc")
Account EventID TimeGenerated SourceComputerId Computer SubjectUserName SubjectDomainName
NT AUTHORITY\SYSTEM 4624 2020-10-01 22:39:36.987000+00:00 f6638b82-98a5-4542-8bec-6bc0977f793f VictimPc.Contoso.Azure VictimPc$ CONTOSO
NT AUTHORITY\SYSTEM 4624 2020-10-01 22:39:37.220000+00:00 f6638b82-98a5-4542-8bec-6bc0977f793f VictimPc.Contoso.Azure VictimPc$ CONTOSO
NT AUTHORITY\SYSTEM 4624 2020-10-01 22:39:42.603000+00:00 f6638b82-98a5-4542-8bec-6bc0977f793f VictimPc.Contoso.Azure VictimPc$ CONTOSO
You can also chain pivot functions together to create a processing pipeline that does multiple operations on data:
>>> (
suspicious_ips_df
# Lookup IPs at VT
.mp_pivot.run(IpAddress.ti.lookup_ipv4_VirusTotal, column="IPAddress")
# Filter on high severity
.query("Severity == 'high'")
.mp_pivot.run(IpAddress.util.whois, column="Ioc", join="left")
# Query IPs that have login attempts
.mp_pivot.run(IpAddress.MSSentinel.list_aad_signins_for_ip, ip_address_list="Ioc")
# Send the output of this to a plot
.mp_plot.timeline(
title="High Severity IPs with Logon attempts",
source_columns=["UserPrincipalName", "IPAddress", "ClientAppUsed", "Location"],
group_by="UserPrincipalName"
)
)
We'll see examples of how to do these pivoting queries later in the notebook.
MSTICPy has had entity classes from the very early days but, until now, these have only been used sporadically in the rest of the package.
The pivot functionality exposes operations relevant to a particular entity as methods of that entity. These operations can include data queries, threat intelligence lookups, and other lookup and utility functions.
The name comes from the common practice of Cyber investigators navigating between related entities. For example an entity/investigation chain might look like the following:
Step | Source | Operation | Target |
---|---|---|---|
1 | Alert | Review alert -> | Source IP(A) |
2 | Source IP(A) | Lookup TI -> | Related URLs, Malware names |
3 | URL | Query web logs -> | Requesting hosts |
4 | Host | Query host logons -> | Accounts |
At each step there are one or more directions that you can take to follow the chain of related indicators of activity in a possible attack.
Bringing these functions into a few, well-known locations makes it easier to use MSTICPy to carry out this common pivoting pattern in Jupyter notebooks.
import msticpy as mp
mp.init_notebook();
The pivoting subsystem will automatically load functions from MSTICPy that do not need any authentication. This is the case with providers such as Threat Intelligence (TILookup) and GeoIP. If not initialized before running init_notebook, they will be loaded with the defaults specified in your msticpyconfig.yaml.
For query providers, pivot functions are added dynamically as you connect/authenticate to the data provider.
You can add custom functions as pivot functions by creating a registration template and importing the function. Details of this are covered later in the document.
az_provider = QueryProvider("MSSentinel")
az_provider.connect(workspace="Default")
Please wait. Loading Kqlmagic extension...done Connecting...
connected
You can view the pivot functions loaded for an entity. Note that because we've loaded an MS Sentinel query provider, a subset of the queries (those that have a host_name parameter) are also loaded.
Note: entity classes are also automatically imported by init_notebook, so you do not need to import them manually.
Host.pivots()
['MSSentinel.VMComputer_vmcomputer', 'MSSentinel.auditd_auditd_all', 'MSSentinel.az_nsg_interface', 'MSSentinel.az_nsg_net_flows', 'MSSentinel.az_nsg_net_flows_depr', 'MSSentinel.heartbeat', 'MSSentinel.heartbeat_for_host_depr', 'MSSentinel.sec_alerts', 'MSSentinel.sent_bookmarks', 'MSSentinel.syslog_all_syslog', 'MSSentinel.syslog_cron_activity', 'MSSentinel.syslog_logon_failures', 'MSSentinel.syslog_logons', 'MSSentinel.syslog_squid_activity', 'MSSentinel.syslog_sudo_activity', 'MSSentinel.syslog_user_group_activity', 'MSSentinel.syslog_user_logon', 'MSSentinel.wevt_all_events', 'MSSentinel.wevt_events_by_id', 'MSSentinel.wevt_get_process_tree', 'MSSentinel.wevt_list_other_events', 'MSSentinel.wevt_logon_attempts', 'MSSentinel.wevt_logon_failures', 'MSSentinel.wevt_logon_session', 'MSSentinel.wevt_logons', 'MSSentinel.wevt_parent_process', 'MSSentinel.wevt_process_session', 'MSSentinel.wevt_processes', 'RiskIQ.articles', 'RiskIQ.artifacts', 'RiskIQ.certificates', 'RiskIQ.components', 'RiskIQ.cookies', 'RiskIQ.hostpair_children', 'RiskIQ.hostpair_parents', 'RiskIQ.malware', 'RiskIQ.projects', 'RiskIQ.reputation', 'RiskIQ.resolutions', 'RiskIQ.summary', 'RiskIQ.trackers', 'RiskIQ.whois', 'dns_is_resolvable', 'dns_resolve', 'util.dns_components', 'util.dns_in_abuse_list', 'util.dns_is_resolvable', 'util.dns_resolve', 'util.dns_validate_tld']
Notice that TILookup was loaded even though we did not create an instance of TILookup beforehand.
mp.pivot.providers
{'TILookup': <msticpy.context.tilookup.TILookup at 0x207f6fb7ac0>, 'MSSentinel': <msticpy.data.core.data_providers.QueryProvider at 0x207f7132ee0>}
Data queries are grouped into a container with the name of the data provider to which they belong. E.g. MSSentinel queries are in a container of that name; Splunk queries would be in a "Splunk" container.
Note: if you have multiple instances of a provider type the name of the provider container will have a suffix with the instance name (e.g. the Sentinel Workspace name).
TI lookups are put into a "ti" container. All other built-in functions are added to the "other" container.
The containers themselves are callable and will return a list of their contents. Containers are also iterable - each iteration returns a tuple (pair) of name/function values.
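The callable/iterable container behavior can be sketched as a minimal class. This is an illustration of the behavior described above, not MSTICPy's actual implementation:

```python
class PivotContainer:
    """Minimal sketch of a callable, iterable pivot function container."""

    def __init__(self, funcs):
        self._funcs = dict(funcs)

    def __call__(self):
        # Calling the container lists its contents
        return list(self._funcs)

    def __iter__(self):
        # Iterating yields (name, function) pairs
        return iter(self._funcs.items())


# Toy container with one hypothetical function
util = PivotContainer(
    {"ip_type": lambda ip: "Private" if ip.startswith("10.") else "Public"}
)
print(util())  # lists the function names
for name, func in util:
    print(name, func("10.1.1.1"))
```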
In notebooks/IPython you can also use tab completion to get to the right function.
You can pass a substring to the pivots (or get_pivot_list) function:
Host.pivots("logon")
['MSSentinel.syslog_logon_failures', 'MSSentinel.syslog_logons', 'MSSentinel.syslog_user_logon', 'MSSentinel.wevt_logon_attempts', 'MSSentinel.wevt_logon_failures', 'MSSentinel.wevt_logon_session', 'MSSentinel.wevt_logons']
Pivot also has a utility that allows you to browse entities and the pivot functions attached to them. You can search for functions with desired keywords, view help for the specific function and copy the function signature to paste into a code cell.
Pivot.browse()
VBox(children=(HBox(children=(VBox(children=(HTML(value='<b>Entities</b>'), Select(description='entity', layou…
Pivot functions have flexible input types. They can be used with the following types of parameters: a single value, a list (or other iterable) of values, a dataframe with a column specifier, or an entity instance.
Pivot functions normally return results as a dataframe (although some complex functions, such as Notebooklets, can return composite results objects containing multiple dataframes and other object types).
print("List 'utility' pivot functions for IpAddress\n")
IpAddress.util()
print()
print("-------------------------------\n")
print("Print help for a function - IpAddress.util.type\n")
help(IpAddress.util.ip_type)
List 'utility' pivot functions for IpAddress

whois function
ip_type function
ip_rev_resolve function
geoloc function
geoloc_ips function

-------------------------------

Print help for a function - IpAddress.util.ip_type

Help on function get_ip_type in module msticpy.context.ip_utils:

get_ip_type(ip: str = None, ip_str: str = None) -> str
    Validate value is an IP address and determine IPType category.

    (IPAddress category is e.g. Private/Public/Multicast).

    Parameters
    ----------
    ip : str
        The string of the IP Address
    ip_str : str
        The string of the IP Address - alias for `ip`

    Returns
    -------
    str
        Returns ip type string using ip address module
If in doubt, use help(entity.container.func) or entity.container.func?
IpAddress.util.ip_type("10.1.1.1")
ip | result | src_row_index | |
---|---|---|---|
0 | 10.1.1.1 | Private | 0 |
display(IpAddress.util.ip_type("10.1.1.1"))
display(IpAddress.util.ip_type(ip_str="157.53.1.1"))
display(IpAddress.util.whois("157.53.1.1"))
display(IpAddress.util.geoloc(value="157.53.1.1"))
ip | result | src_row_index | |
---|---|---|---|
0 | 10.1.1.1 | Private | 0 |
ip | result | src_row_index | |
---|---|---|---|
0 | 157.53.1.1 | Public | 0 |
asn | asn_cidr | asn_country_code | asn_date | asn_description | asn_registry | nets | nir | query | raw | raw_referral | referral | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | NA | NA | US | 2015-04-01 | NA | arin | [{'cidr': '157.53.0.0/16', 'name': 'NETACTUATE-MDN-04', 'handle': 'NET-157-53-0-0-1', 'range': '... | None | 157.53.1.1 | None | None | None |
CountryCode | CountryName | Longitude | Latitude | TimeGenerated | Type | IpAddress | |
---|---|---|---|---|---|---|---|
0 | US | United States | -97.822 | 37.751 | 2022-06-08 19:38:16.207819 | geolocation | 157.53.1.1 |
Behind the scenes, the Pivot API uses a mapping of entity attributes to supply the right value to the function parameter.
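The mapping idea can be sketched as follows. The table and function names here are hypothetical, purely for illustration (the Address and fqdn attributes are the ones used by the examples in this document):

```python
# Hypothetical sketch of an attribute-to-parameter mapping - not MSTICPy internals.
ENTITY_ATTR_MAP = {
    "IpAddress": "Address",  # IpAddress pivots read the entity's Address attribute
    "Host": "fqdn",          # Host pivots read the entity's fqdn attribute
}


def value_for_pivot(entity_type: str, entity_attrs: dict) -> str:
    """Pick the entity attribute value to pass as the pivot function parameter."""
    return entity_attrs[ENTITY_ATTR_MAP[entity_type]]


ip_value = value_for_pivot("IpAddress", {"Address": "10.1.1.1"})
```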
ip1 = IpAddress(Address="10.1.1.1")
ip2 = IpAddress(Address="157.53.1.1")
display(IpAddress.util.ip_type(ip1))
display(IpAddress.util.ip_type(ip2))
display(IpAddress.util.whois(ip2))
display(IpAddress.util.geoloc(ip2))
ip | result | src_row_index | |
---|---|---|---|
0 | 10.1.1.1 | Private | 0 |
ip | result | src_row_index | |
---|---|---|---|
0 | 157.53.1.1 | Public | 0 |
asn | asn_cidr | asn_country_code | asn_date | asn_description | asn_registry | nets | nir | query | raw | raw_referral | referral | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | NA | NA | US | 2015-04-01 | NA | arin | [{'cidr': '157.53.0.0/16', 'name': 'NETACTUATE-MDN-04', 'handle': 'NET-157-53-0-0-1', 'range': '... | None | 157.53.1.1 | None | None | None |
CountryCode | CountryName | Longitude | Latitude | TimeGenerated | Type | IpAddress | |
---|---|---|---|---|---|---|---|
0 | US | United States | -97.822 | 37.751 | 2022-06-08 19:38:19.432876 | geolocation | 157.53.1.1 |
Many of the underlying functions will accept either single values or collections (usually in DataFrames) of values as input. Even in cases where the underlying function does not accept iterables as parameters, the Pivot library will usually be able to iterate through each value and collate the results to hand you back a single dataframe.
Note: there are some exceptions to this - usually where the underlying function is long-running or expensive and has opted not to accept iterated calls. Notebooklets are an example of these.
Where the function has multiple parameters you can supply a mixture of iterables and single values.
You can also use multiple iterables for multiple parameters.
For example:
list_1 = [1, 2, 3, 4]
list_2 = ["a", "b", "c"]
entity.util.func(p1=list_1, p2=list_2)
The function will execute with the pairings (1, "a"), (2, "b") and (3, "c") - (4, _) will be ignored.
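This pairing rule behaves like Python's built-in zip, which stops at the shortest iterable:

```python
list_1 = [1, 2, 3, 4]
list_2 = ["a", "b", "c"]

# zip stops at the shorter list, so 4 has no partner and is dropped
pairs = list(zip(list_1, list_2))
print(pairs)  # [(1, 'a'), (2, 'b'), (3, 'c')]
```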
Use our magic function to convert a pasted-in list to a dataframe:
%%txt2df --headers --name ip_df1
AllExtIPs
9, 172.217.15.99
10, 40.85.232.64
11, 20.38.98.100
12, 23.96.64.84
13, 65.55.44.108
14, 131.107.147.209
15, 10.0.3.4
16, 10.0.3.5
17, 13.82.152.48
AllExtIPs | |
---|---|
9 | 172.217.15.99 |
10 | 40.85.232.64 |
11 | 20.38.98.100 |
12 | 23.96.64.84 |
13 | 65.55.44.108 |
14 | 131.107.147.209 |
15 | 10.0.3.4 |
16 | 10.0.3.5 |
17 | 13.82.152.48 |
ip_list1 = ip_df1.AllExtIPs.values[-6:]
display(IpAddress.util.ip_type(ip_list1))
display(IpAddress.util.ip_type(ip_str=list(ip_list1)))
display(IpAddress.util.whois(value=tuple(ip_list1)))
display(IpAddress.util.geoloc(ip_list1))
ip | result | src_row_index | |
---|---|---|---|
0 | 23.96.64.84 | Public | 0 |
1 | 65.55.44.108 | Public | 1 |
2 | 131.107.147.209 | Public | 2 |
3 | 10.0.3.4 | Private | 3 |
4 | 10.0.3.5 | Private | 4 |
5 | 13.82.152.48 | Public | 5 |
ip | result | src_row_index | |
---|---|---|---|
0 | 23.96.64.84 | Public | 0 |
1 | 65.55.44.108 | Public | 1 |
2 | 131.107.147.209 | Public | 2 |
3 | 10.0.3.4 | Private | 3 |
4 | 10.0.3.5 | Private | 4 |
5 | 13.82.152.48 | Public | 5 |
nir | asn_registry | asn | asn_cidr | asn_country_code | asn_date | asn_description | query | nets | raw | referral | raw_referral | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | NaN | arin | 8075 | 23.96.0.0/14 | US | 2013-06-18 | MICROSOFT-CORP-MSN-AS-BLOCK, US | 23.96.64.84 | [{'cidr': '23.96.0.0/13', 'name': 'MSFT', 'handle': 'NET-23-96-0-0-1', 'range': '23.96.0.0 - 23.... | NaN | NaN | NaN |
1 | NaN | arin | 8075 | 65.52.0.0/14 | US | 2001-02-14 | MICROSOFT-CORP-MSN-AS-BLOCK, US | 65.55.44.108 | [{'cidr': '65.52.0.0/14', 'name': 'MICROSOFT-1BLK', 'handle': 'NET-65-52-0-0-1', 'range': '65.52... | NaN | NaN | NaN |
2 | NaN | arin | 3598 | 131.107.0.0/16 | US | 1988-11-11 | MICROSOFT-CORP-AS, US | 131.107.147.209 | [{'cidr': '131.107.0.0/16', 'name': 'MICROSOFT', 'handle': 'NET-131-107-0-0-1', 'range': '131.10... | NaN | NaN | NaN |
3 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
4 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
5 | NaN | arin | 8075 | 13.64.0.0/11 | US | 2015-03-26 | MICROSOFT-CORP-MSN-AS-BLOCK, US | 13.82.152.48 | [{'cidr': '13.96.0.0/13, 13.104.0.0/14, 13.64.0.0/11', 'name': 'MSFT', 'handle': 'NET-13-64-0-0-... | NaN | NaN | NaN |
CountryCode | CountryName | State | City | Longitude | Latitude | TimeGenerated | Type | IpAddress | |
---|---|---|---|---|---|---|---|---|---|
0 | US | United States | Virginia | Tappahannock | -76.8545 | 37.9273 | 2022-06-08 19:38:37.428898 | geolocation | 23.96.64.84 |
1 | US | United States | Virginia | Boydton | -78.3750 | 36.6534 | 2022-06-08 19:38:37.429901 | geolocation | 65.55.44.108 |
2 | US | United States | Washington | Redmond | -122.1257 | 47.6722 | 2022-06-08 19:38:37.429901 | geolocation | 131.107.147.209 |
3 | NaN | Private address | NaN | location unknown | NaN | NaN | 2022-06-08 19:38:37.429901 | geolocation | 10.0.3.4 |
4 | NaN | Private address | NaN | location unknown | NaN | NaN | 2022-06-08 19:38:37.429901 | geolocation | 10.0.3.5 |
5 | US | United States | Virginia | Tappahannock | -76.8545 | 37.9273 | 2022-06-08 19:38:37.430901 | geolocation | 13.82.152.48 |
Using a dataframe as input requires a slightly different syntax since you not only need to pass the dataframe as a parameter but also tell the function which column to use for input.
To specify the column to use, you can use the name of the parameter that the underlying function expects or one of these generic names: column, input_col, or src_col.
Note: these generic names are not shown in the function help.
display(IpAddress.util.ip_type(data=ip_df1, input_col="AllExtIPs"))
display(IpAddress.util.ip_type(data=ip_df1, ip="AllExtIPs"))
display(IpAddress.util.whois(data=ip_df1, column="AllExtIPs"))
display(IpAddress.util.geoloc(data=ip_df1, src_col="AllExtIPs"))
ip | result | src_row_index | |
---|---|---|---|
0 | 172.217.15.99 | Public | 0 |
1 | 40.85.232.64 | Public | 1 |
2 | 20.38.98.100 | Public | 2 |
3 | 23.96.64.84 | Public | 3 |
4 | 65.55.44.108 | Public | 4 |
5 | 131.107.147.209 | Public | 5 |
6 | 10.0.3.4 | Private | 6 |
7 | 10.0.3.5 | Private | 7 |
8 | 13.82.152.48 | Public | 8 |
ip | result | src_row_index | |
---|---|---|---|
0 | 172.217.15.99 | Public | 0 |
1 | 40.85.232.64 | Public | 1 |
2 | 20.38.98.100 | Public | 2 |
3 | 23.96.64.84 | Public | 3 |
4 | 65.55.44.108 | Public | 4 |
5 | 131.107.147.209 | Public | 5 |
6 | 10.0.3.4 | Private | 6 |
7 | 10.0.3.5 | Private | 7 |
8 | 13.82.152.48 | Public | 8 |
nir | asn_registry | asn | asn_cidr | asn_country_code | asn_date | asn_description | query | nets | raw | referral | raw_referral | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
9 | NaN | arin | 15169 | 172.217.15.0/24 | US | 2012-04-16 | GOOGLE, US | 172.217.15.99 | [{'cidr': '172.217.0.0/16', 'name': 'GOOGLE', 'handle': 'NET-172-217-0-0-1', 'range': '172.217.0... | NaN | NaN | NaN |
10 | NaN | arin | 8075 | 40.80.0.0/12 | US | 2015-02-23 | MICROSOFT-CORP-MSN-AS-BLOCK, US | 40.85.232.64 | [{'cidr': '40.96.0.0/12, 40.74.0.0/15, 40.124.0.0/16, 40.80.0.0/12, 40.125.0.0/17, 40.112.0.0/13... | NaN | NaN | NaN |
11 | NaN | arin | 8075 | 20.36.0.0/14 | US | 2017-10-18 | MICROSOFT-CORP-MSN-AS-BLOCK, US | 20.38.98.100 | [{'cidr': '20.40.0.0/13, 20.33.0.0/16, 20.64.0.0/10, 20.128.0.0/16, 20.34.0.0/15, 20.36.0.0/14, ... | NaN | NaN | NaN |
12 | NaN | arin | 8075 | 23.96.0.0/14 | US | 2013-06-18 | MICROSOFT-CORP-MSN-AS-BLOCK, US | 23.96.64.84 | [{'cidr': '23.96.0.0/13', 'name': 'MSFT', 'handle': 'NET-23-96-0-0-1', 'range': '23.96.0.0 - 23.... | NaN | NaN | NaN |
13 | NaN | arin | 8075 | 65.52.0.0/14 | US | 2001-02-14 | MICROSOFT-CORP-MSN-AS-BLOCK, US | 65.55.44.108 | [{'cidr': '65.52.0.0/14', 'name': 'MICROSOFT-1BLK', 'handle': 'NET-65-52-0-0-1', 'range': '65.52... | NaN | NaN | NaN |
14 | NaN | arin | 3598 | 131.107.0.0/16 | US | 1988-11-11 | MICROSOFT-CORP-AS, US | 131.107.147.209 | [{'cidr': '131.107.0.0/16', 'name': 'MICROSOFT', 'handle': 'NET-131-107-0-0-1', 'range': '131.10... | NaN | NaN | NaN |
15 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
16 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
17 | NaN | arin | 8075 | 13.64.0.0/11 | US | 2015-03-26 | MICROSOFT-CORP-MSN-AS-BLOCK, US | 13.82.152.48 | [{'cidr': '13.96.0.0/13, 13.104.0.0/14, 13.64.0.0/11', 'name': 'MSFT', 'handle': 'NET-13-64-0-0-... | NaN | NaN | NaN |
CountryCode | CountryName | Longitude | Latitude | TimeGenerated | Type | IpAddress | State | City | |
---|---|---|---|---|---|---|---|---|---|
0 | US | United States | -97.8220 | 37.7510 | 2022-06-08 19:38:44.907125 | geolocation | 172.217.15.99 | NaN | NaN |
1 | CA | Canada | -79.3623 | 43.6547 | 2022-06-08 19:38:44.907125 | geolocation | 40.85.232.64 | Ontario | Toronto |
2 | US | United States | -76.8545 | 37.9273 | 2022-06-08 19:38:44.907125 | geolocation | 20.38.98.100 | Virginia | Tappahannock |
3 | US | United States | -76.8545 | 37.9273 | 2022-06-08 19:38:44.908124 | geolocation | 23.96.64.84 | Virginia | Tappahannock |
4 | US | United States | -78.3750 | 36.6534 | 2022-06-08 19:38:44.908124 | geolocation | 65.55.44.108 | Virginia | Boydton |
5 | US | United States | -122.1257 | 47.6722 | 2022-06-08 19:38:44.909124 | geolocation | 131.107.147.209 | Washington | Redmond |
6 | NaN | Private address | NaN | NaN | 2022-06-08 19:38:44.909124 | geolocation | 10.0.3.4 | NaN | location unknown |
7 | NaN | Private address | NaN | NaN | 2022-06-08 19:38:44.909124 | geolocation | 10.0.3.5 | NaN | location unknown |
8 | US | United States | -76.8545 | 37.9273 | 2022-06-08 19:38:44.909124 | geolocation | 13.82.152.48 | Virginia | Tappahannock |
You might want to return a data set that is joined to your input set. To do that use the "join" parameter.
The value of join can be "inner", "left", "right", or "outer".
To preserve all rows from the input, use a "left" join. To keep only rows that have a valid result from the function, use "inner" or "right".
Note: while most functions return only a single output row for each input row, some return multiple rows. Be cautious using "outer" in these cases.
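The join semantics mirror a pandas merge. A small sketch (toy data, not a pivot call) of the difference between "left" and "inner":

```python
import pandas as pd

# Toy input and toy "function result" frames
input_df = pd.DataFrame({"ip": ["10.0.3.4", "23.96.64.84"]})
result_df = pd.DataFrame({"ip": ["23.96.64.84"], "result": ["Public"]})

# "left" keeps every input row (NaN where there was no result);
# "inner" keeps only rows that produced a result.
left_join = input_df.merge(result_df, on="ip", how="left")
inner_join = input_df.merge(result_df, on="ip", how="inner")
```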
display(IpAddress.util.geoloc(data=ip_df1, src_col="AllExtIPs", join="left"))
AllExtIPs | CountryCode | CountryName | Longitude | Latitude | TimeGenerated | Type | IpAddress | State | City | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 172.217.15.99 | US | United States | -97.8220 | 37.7510 | 2022-06-08 23:18:07.276418 | geolocation | 172.217.15.99 | NaN | NaN |
1 | 40.85.232.64 | CA | Canada | -79.3623 | 43.6547 | 2022-06-08 23:18:07.277418 | geolocation | 40.85.232.64 | Ontario | Toronto |
2 | 20.38.98.100 | US | United States | -76.8545 | 37.9273 | 2022-06-08 23:18:07.277418 | geolocation | 20.38.98.100 | Virginia | Tappahannock |
3 | 23.96.64.84 | US | United States | -76.8545 | 37.9273 | 2022-06-08 23:18:07.277418 | geolocation | 23.96.64.84 | Virginia | Tappahannock |
4 | 65.55.44.108 | US | United States | -78.3750 | 36.6534 | 2022-06-08 23:18:07.278418 | geolocation | 65.55.44.108 | Virginia | Boydton |
5 | 131.107.147.209 | US | United States | -122.1257 | 47.6722 | 2022-06-08 23:18:07.278418 | geolocation | 131.107.147.209 | Washington | Redmond |
6 | 10.0.3.4 | NaN | Private address | NaN | NaN | 2022-06-08 23:18:07.278418 | geolocation | 10.0.3.4 | NaN | location unknown |
7 | 10.0.3.5 | NaN | Private address | NaN | NaN | 2022-06-08 23:18:07.278418 | geolocation | 10.0.3.5 | NaN | location unknown |
8 | 13.82.152.48 | US | United States | -76.8545 | 37.9273 | 2022-06-08 23:18:07.278418 | geolocation | 13.82.152.48 | Virginia | Tappahannock |
A significant difference between the functions that we've seen so far and data query functions is that the latter do not accept generic parameter names.
When you use a named parameter in a data query pivot, you must specify the name that the query function is expecting. If in doubt, append "?" to the function to show its help.
Example:
Host.MSSentinel.list_host_events_by_id?
qry_prov2 = QueryProvider("MSSentinel")
qry_prov2.connect(workspace="CyberSecuritySOC")
Host.pivots()
['MSSentinel.VMComputer_vmcomputer', 'MSSentinel.auditd_auditd_all', 'MSSentinel.az_nsg_interface', 'MSSentinel.az_nsg_net_flows', 'MSSentinel.az_nsg_net_flows_depr', 'MSSentinel.heartbeat', 'MSSentinel.heartbeat_for_host_depr', 'MSSentinel.sec_alerts', 'MSSentinel.sent_bookmarks', 'MSSentinel.syslog_all_syslog', 'MSSentinel.syslog_cron_activity', 'MSSentinel.syslog_logon_failures', 'MSSentinel.syslog_logons', 'MSSentinel.syslog_squid_activity', 'MSSentinel.syslog_sudo_activity', 'MSSentinel.syslog_user_group_activity', 'MSSentinel.syslog_user_logon', 'MSSentinel.wevt_all_events', 'MSSentinel.wevt_events_by_id', 'MSSentinel.wevt_get_process_tree', 'MSSentinel.wevt_list_other_events', 'MSSentinel.wevt_logon_attempts', 'MSSentinel.wevt_logon_failures', 'MSSentinel.wevt_logon_session', 'MSSentinel.wevt_logons', 'MSSentinel.wevt_parent_process', 'MSSentinel.wevt_process_session', 'MSSentinel.wevt_processes', 'MSSentinel_cybersecuritysoc.VMComputer_vmcomputer', 'MSSentinel_cybersecuritysoc.auditd_auditd_all', 'MSSentinel_cybersecuritysoc.az_nsg_interface', 'MSSentinel_cybersecuritysoc.az_nsg_net_flows', 'MSSentinel_cybersecuritysoc.az_nsg_net_flows_depr', 'MSSentinel_cybersecuritysoc.heartbeat', 'MSSentinel_cybersecuritysoc.heartbeat_for_host_depr', 'MSSentinel_cybersecuritysoc.sec_alerts', 'MSSentinel_cybersecuritysoc.sent_bookmarks', 'MSSentinel_cybersecuritysoc.syslog_all_syslog', 'MSSentinel_cybersecuritysoc.syslog_cron_activity', 'MSSentinel_cybersecuritysoc.syslog_logon_failures', 'MSSentinel_cybersecuritysoc.syslog_logons', 'MSSentinel_cybersecuritysoc.syslog_squid_activity', 'MSSentinel_cybersecuritysoc.syslog_sudo_activity', 'MSSentinel_cybersecuritysoc.syslog_user_group_activity', 'MSSentinel_cybersecuritysoc.syslog_user_logon', 'MSSentinel_cybersecuritysoc.wevt_all_events', 'MSSentinel_cybersecuritysoc.wevt_events_by_id', 'MSSentinel_cybersecuritysoc.wevt_get_process_tree', 'MSSentinel_cybersecuritysoc.wevt_list_other_events', 
'MSSentinel_cybersecuritysoc.wevt_logon_attempts', 'MSSentinel_cybersecuritysoc.wevt_logon_failures', 'MSSentinel_cybersecuritysoc.wevt_logon_session', 'MSSentinel_cybersecuritysoc.wevt_logons', 'MSSentinel_cybersecuritysoc.wevt_parent_process', 'MSSentinel_cybersecuritysoc.wevt_process_session', 'MSSentinel_cybersecuritysoc.wevt_processes', 'RiskIQ.articles', 'RiskIQ.artifacts', 'RiskIQ.certificates', 'RiskIQ.components', 'RiskIQ.cookies', 'RiskIQ.hostpair_children', 'RiskIQ.hostpair_parents', 'RiskIQ.malware', 'RiskIQ.projects', 'RiskIQ.reputation', 'RiskIQ.resolutions', 'RiskIQ.summary', 'RiskIQ.trackers', 'RiskIQ.whois', 'dns_is_resolvable', 'dns_resolve', 'util.dns_components', 'util.dns_in_abuse_list', 'util.dns_is_resolvable', 'util.dns_resolve', 'util.dns_validate_tld']
Host.pivots()[:10]
['MSSentinel.VMComputer_vmcomputer', 'MSSentinel.auditd_auditd_all', 'MSSentinel.az_nsg_interface', 'MSSentinel.az_nsg_net_flows', 'MSSentinel.az_nsg_net_flows_depr', 'MSSentinel.heartbeat', 'MSSentinel.heartbeat_for_host_depr', 'MSSentinel.sec_alerts', 'MSSentinel.sent_bookmarks', 'MSSentinel.syslog_all_syslog']
Use the edit_query_time function to set/change the time range used by queries.
With no parameters it defaults to a period of [UtcNow - 1 day] to [UtcNow].
Alternatively, you can set the timespan to use with the TimeSpan class. Changes that you make to the time range take effect immediately.
help(mp.pivot.edit_query_time)
Help on method edit_query_time in module msticpy.init.pivot:

edit_query_time(timespan: Optional[msticpy.common.timespan.TimeSpan] = None) method of msticpy.init.pivot.Pivot instance
    Display a QueryTime widget to get the timespan.

    Parameters
    ----------
    timespan : Optional[TimeSpan], optional
        Pre-populate the timespan shown by the QueryTime editor, by default None
mp.pivot.edit_query_time()
VBox(children=(HTML(value='<h4>Set time range for pivot functions.</h4>'), HBox(children=(DatePicker(value=dat…
You can also just set the timespan directly on the pivot object
from msticpy.common.timespan import TimeSpan
ts = TimeSpan(start="2020-10-01", period="1d")
mp.pivot.timespan = ts
Host.MSSentinel()
sec_alerts function
VMComputer_vmcomputer function
sent_bookmarks function
az_nsg_net_flows_depr function
az_nsg_interface function
heartbeat function
az_nsg_net_flows function
heartbeat_for_host_depr function
auditd_auditd_all function
syslog_sudo_activity function
syslog_cron_activity function
syslog_user_group_activity function
syslog_all_syslog function
syslog_squid_activity function
syslog_user_logon function
syslog_logons function
syslog_logon_failures function
wevt_all_events function
wevt_events_by_id function
wevt_list_other_events function
wevt_logon_session function
wevt_logons function
wevt_logon_failures function
wevt_logon_attempts function
wevt_processes function
wevt_get_process_tree function
wevt_parent_process function
wevt_process_session function
host = Host(HostName="VictimPc")
Host.MSSentinel.heartbeat(host)
TenantId | SourceSystem | TimeGenerated | MG | ManagementGroupName | SourceComputerId | ComputerIP | Computer | Category | OSType | OSName | OSMajorVersion | OSMinorVersion | Version | SCAgentChannel | IsGatewayInstalled | RemoteIPLongitude | RemoteIPLatitude | RemoteIPCountry | SubscriptionId | ResourceGroup | ResourceProvider | Resource | ResourceId | ResourceType | ComputerEnvironment | Solutions | VMUUID | ComputerPrivateIPs | Type | _ResourceId |
---|
Host.MSSentinel.wevt_logons(host_name="VictimPc").head()
TenantId | Account | EventID | TimeGenerated | SourceComputerId | Computer | SubjectUserName | SubjectDomainName | SubjectUserSid | TargetUserName | TargetDomainName | TargetUserSid | TargetLogonId | LogonProcessName | LogonType | LogonTypeName | AuthenticationPackageName | Status | IpAddress | WorkstationName | TimeCreatedUtc |
---|
The example below shows using the host entity as an initial parameter (Pivot uses the attribute mapping to assign the value of host.fqdn to the host_name function parameter). The second parameter is a list of event IDs, specified explicitly.
Host.MSSentinel.wevt_logons?
Signature: Host.MSSentinel.wevt_logons(*args, **kwargs) -> Union[pandas.core.frame.DataFrame, Any]
Docstring:
Retrieves the logon events on the host

Parameters
----------
add_query_items: str (optional)
    Additional query clauses
end: datetime
    Query end time
event_filter: str (optional)
    Event subset (default value is: | where EventID == 4624)
host_name: str
    Name of host
query_project: str (optional)
    Column project statement (default value is: | project TenantId, Account, EventID, TimeGenerat...)
start: datetime
    Query start time
subscription_filter: str (optional)
    Optional subscription/tenant filter expression (default value is: true)
table: str (optional)
    Table name (default value is: SecurityEvent)
File:      f:\anaconda\envs\msticpy\lib\functools.py
Type:      function
(
Host.MSSentinel.wevt_events_by_id( # Pivot query returns DataFrame
host, event_list=[4624, 4625, 4672]
)
[["Computer", "EventID", "Activity"]] # we could have save the output to a dataframe
.groupby(["EventID", "Activity"]) # variable but we can also use pandas
.count() # functions/syntax directly on the output
)
Computer | ||
---|---|---|
EventID | Activity |
Some data queries accept "list" items as parameters (e.g. many of the IP queries accept a list of IP addresses). These work as expected, with a single query sending the whole list as a single parameter.
ip_list = [
"203.23.68.64",
"67.10.68.45",
"182.69.173.164",
"79.176.167.161",
"167.220.197.230",
]
IpAddress.MSSentinel.aad_signins?
Signature: IpAddress.MSSentinel.aad_signins(*args, **kwargs) -> Union[pandas.core.frame.DataFrame, Any]
Docstring:
Lists Azure AD Signins for an IP Address

Parameters
----------
add_query_items: str (optional)
    Additional query clauses
end: datetime
    Query end time
ip_address_list: list
    The IP Address or list of Addresses
start: datetime
    Query start time
table: str (optional)
    Table name (default value is: SigninLogs)
File:      f:\anaconda\envs\msticpy\lib\functools.py
Type:      function
type(IpAddress.MSSentinel.aad_signins(ip_address_list=ip_list).head(5))
Where a query accepts only a single value for a parameter, the pivot function will iterate through the values of the iterable, making a separate query for each and then joining the results.
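This iterate-and-join behavior can be sketched with plain pandas. The helper and query function here are illustrative stand-ins, not MSTICPy internals:

```python
import pandas as pd


def run_per_value(values, func):
    """Call func once per value and concatenate the resulting dataframes."""
    return pd.concat([func(val) for val in values], ignore_index=True)


# Toy stand-in for a single-value query function
def fake_query(ip):
    return pd.DataFrame({"ip": [ip], "result": ["ok"]})


combined = run_per_value(["10.0.0.1", "10.0.0.2"], fake_query)
```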
We can see that this function only accepts a single value for "account_name".
Account.MSSentinel.aad_signins?
Signature: Account.MSSentinel.aad_signins(*args, **kwargs) -> Union[pandas.core.frame.DataFrame, Any]
Docstring:
Lists Azure AD Signins for Account

Parameters
----------
account_id: str (optional)
    Azure user ID to find (default value is: !!DEFAULT!!)
account_name: str (optional)
    The account name to find (default value is: !!DEFAULT!!)
add_query_items: str (optional)
    Additional query clauses
end: datetime
    Query end time
start: datetime
    Query start time
table: str (optional)
    Table name (default value is: SigninLogs)
File:      f:\anaconda\envs\msticpy\lib\functools.py
Type:      function
accounts = [
"ofshezaf",
"moshabi",
]
Account.MSSentinel.aad_signins(account_name=accounts)
TenantId | SourceSystem | TimeGenerated | ResourceId | OperationName | OperationVersion | Category | ResultType | ResultSignature | ResultDescription | DurationMs | CorrelationId | Resource | ResourceGroup | ResourceProvider | Identity | Level | Location | AlternateSignInName | AppDisplayName | AppId | AuthenticationContextClassReferences | AuthenticationDetails | AuthenticationMethodsUsed | AuthenticationProcessingDetails | ... | RiskState | ResourceDisplayName | ResourceIdentity | ResourceServicePrincipalId | ServicePrincipalId | ServicePrincipalName | Status | TokenIssuerName | TokenIssuerType | UserAgent | UserDisplayName | UserId | UserPrincipalName | AADTenantId | UserType | FlaggedForReview | IPAddressFromResourceProvider | SignInIdentifier | SignInIdentifierType | ResourceTenantId | HomeTenantId | UniqueTokenIdentifier | SessionLifetimePolicies | AutonomousSystemNumber | Type |
---|
0 rows × 71 columns
The same rules as outlined earlier for multiple parameters of different types apply to data queries.
project = "| project UserPrincipalName, Identity"
Account.MSSentinel.aad_signins(account_name=accounts, add_query_items=project)
UserPrincipalName | Identity |
---|
This is similar to using dataframes for other pivot functions.
We must use the data
parameter to specify the input dataframe.
You supply the column name from your input dataframe as the value of
the parameters expected by the function.
account_df = pd.DataFrame(accounts, columns=["User"])
display(account_df)
User | |
---|---|
0 | ofshezaf |
1 | moshabi |
Now we can supply our dataframe, account_df, as the value of the data
parameter, and the column name "User" as the value of the account_name parameter.
On each iteration, the column value from the next row is extracted and passed as the value of the corresponding function parameter.
Note:
If the function parameter type is a "list" type - i.e. it expects a list of values -
the parameter value will be sent as a list and only a single query is executed.
If the query function has multiple "list" type parameters, these will be
populated in the same way.
Note 2:
If you have multiple parameters fed by multiple input columns AND one or more
of the function parameters is not a list type, then the query will be broken
into sub-queries, one per row, with each sub-query taking its values from a
single row of the input dataframe.
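The splitting rules in the two notes above can be sketched in plain Python. The parameter names (`account_name`, `host_name`) and the `build_queries` helper are illustrative, not the MSTICPy implementation:

```python
import pandas as pd

def build_queries(data, list_col, scalar_col=None):
    """Sketch of the splitting rules:

    - only list-type parameters: a single query, values sent as a list
    - any non-list parameter present: one sub-query per input row
    """
    if scalar_col is None:
        return [{"account_name": list(data[list_col])}]
    return [
        {"account_name": [row[list_col]], "host_name": row[scalar_col]}
        for _, row in data.iterrows()
    ]

df = pd.DataFrame({"User": ["ofshezaf", "moshabi"], "Host": ["Pc1", "Pc2"]})
single = build_queries(df, "User")            # one query, list parameter
per_row = build_queries(df, "User", "Host")   # two sub-queries, one per row
```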
Account.MSSentinel.aad_signins(data=account_df, account_name="User", add_query_items=project)
UserPrincipalName | Identity |
---|
These work in the same way as the functions described earlier. However, there are a few peculiarities of the Threat Intel functions:
Some providers treat IPv4 and IPv6 interchangeably and use the same endpoint for both. Other providers do not explicitly support IPv6 (e.g. the Tor exit nodes provider). Still others (notably OTX) use different endpoints for IPv4 and IPv6.
If you are querying IPv4 you can use either the lookup_ip
function or one
of the lookup_ipv4
functions. In most cases, you can also use these functions
for a mixture of IPv4 and v6 addresses. However, in cases where a provider
does not support IPv6 or uses a different endpoint for IPv6 queries you
will get no responses.
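If you have a mixed list of addresses, one way to avoid silent non-responses is to partition them by IP version first, using the standard-library `ipaddress` module (this helper is a sketch, not part of MSTICPy):

```python
import ipaddress

def split_by_version(ip_strings):
    """Partition mixed addresses so v4 and v6 can go to different lookups."""
    v4_addrs, v6_addrs = [], []
    for ip_str in ip_strings:
        version = ipaddress.ip_address(ip_str).version
        (v4_addrs if version == 4 else v6_addrs).append(ip_str)
    return v4_addrs, v6_addrs

v4_addrs, v6_addrs = split_by_version(["157.53.1.1", "2001:db8::1"])
# send v4_addrs to a lookup_ipv4_* function and
# v6_addrs to an IPv6-capable provider
```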
This table shows the mapping between entity types and IoC types:
Entity | IoCType |
---|---|
IpAddress | ipv4, ipv6 |
Dns | domain |
File | filehash (incl. md5, sha1, sha256) |
Url | url |
Note: Where you are using a File entity as a parameter, there is a complication.
A file entity can have multiple hash values (md5, sha1, sha256 and even sha256 authenticode).
The file_hash
attribute of File is used as the default parameter.
In cases where a file has multiple hashes the highest priority hash (in order
sha256, sha1, md5, sha256ac) is returned.
If you are not using file entities as parameters (and specifying the input values
explicitly or via a Dataframe or iterable) you can ignore this.
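The priority order described above can be sketched as a simple helper (illustrative only, not the MSTICPy implementation):

```python
def preferred_hash(hashes):
    """Return the highest-priority hash present, or None.

    Priority order, as described above: sha256, sha1, md5, sha256ac.
    """
    for algo in ("sha256", "sha1", "md5", "sha256ac"):
        if hashes.get(algo):
            return hashes[algo]
    return None

# A file with both md5 and sha1 hashes resolves to the sha1 value
chosen = preferred_hash({"md5": "md5value", "sha1": "sha1value"})
```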
IpAddress.ti()
lookup_ip function
lookup_ipv4 function
lookup_ipv6 function
Dns.ti.lookup_dns(value="fkksjobnn43.org")
Ioc | IocType | SafeIoc | QuerySubtype | Provider | Result | Severity | Details | RawResult | Reference | Status | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | fkksjobnn43.org | dns | fkksjobnn43.org | None | OTX | True | high | {'pulse_count': 36, 'names': ['Vertek - Jaff Ransomware', 'Jaff - Malware Domain Feed V2', 'Jaff... | {'sections': ['general', 'geo', 'url_list', 'passive_dns', 'malware', 'whois', 'http_scans'], 'w... | https://otx.alienvault.com/api/v1/indicators/domain/fkksjobnn43.org/general | 0 |
0 | fkksjobnn43.org | dns | None | OPR | True | warning | {'rank': None, 'error': 'Domain not found'} | {'status_code': 404, 'error': 'Domain not found', 'page_rank_integer': 0, 'page_rank_decimal': 0... | https://openpagerank.com/api/v1.0/getPageRank?domains[0]=fkksjobnn43.org | 0 | |
0 | fkksjobnn43.org | dns | fkksjobnn43.org | None | RiskIQ | True | high | {'summary': {'resolutions': 7, 'certificates': 0, 'malware_hashes': 79, 'projects': 0, 'articles... | {'summary': {'resolutions': 7, 'certificates': 0, 'malware_hashes': 79, 'projects': 0, 'articles... | https://community.riskiq.com | 0 |
0 | fkksjobnn43.org | dns | fkksjobnn43.org | None | VirusTotal | True | information | {'verbose_msg': 'Domain found in dataset', 'response_code': 1, 'positives': 0, 'detected_urls': ... | {'Sophos category': 'command and control', 'undetected_downloaded_samples': [], 'whois_timestamp... | https://www.virustotal.com/vtapi/v2/domain/report | 0 |
0 | fkksjobnn43.org | dns | fkksjobnn43.org | None | XForce | False | information | Authorization failed. Check account and key details. | <Response [401 Unauthorized]> | https://api.xforce.ibmcloud.com/url/fkksjobnn43.org | 401 |
hashes = [
"02a7977d1faf7bfc93a4b678a049c9495ea663e7065aa5a6caf0f69c5ff25dbd",
"06b020a3fd3296bc4c7bf53307fe7b40638e7f445bdd43fac1d04547a429fdaf",
"06c676bf8f5c6af99172c1cf63a84348628ae3f39df9e523c42447e2045e00ff",
]
File.ti.lookup_file_hash_VirusTotal(hashes)
Ioc | IocType | SafeIoc | QuerySubtype | Provider | Result | Severity | Details | RawResult | Reference | Status | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 02a7977d1faf7bfc93a4b678a049c9495ea663e7065aa5a6caf0f69c5ff25dbd | sha256_hash | 02a7977d1faf7bfc93a4b678a049c9495ea663e7065aa5a6caf0f69c5ff25dbd | None | VirusTotal | True | high | {'verbose_msg': 'Scan finished, information embedded', 'response_code': 1, 'positives': 51, 'res... | {'scans': {'Bkav': {'detected': True, 'version': '1.3.0.9899', 'result': 'W32.AIDetect.malware2'... | https://www.virustotal.com/vtapi/v2/file/report | 0 |
1 | 06b020a3fd3296bc4c7bf53307fe7b40638e7f445bdd43fac1d04547a429fdaf | sha256_hash | 06b020a3fd3296bc4c7bf53307fe7b40638e7f445bdd43fac1d04547a429fdaf | None | VirusTotal | True | high | {'verbose_msg': 'Scan finished, information embedded', 'response_code': 1, 'positives': 55, 'res... | {'scans': {'Bkav': {'detected': False, 'version': '1.3.0.9899', 'result': None, 'update': '20201... | https://www.virustotal.com/vtapi/v2/file/report | 0 |
2 | 06c676bf8f5c6af99172c1cf63a84348628ae3f39df9e523c42447e2045e00ff | sha256_hash | 06c676bf8f5c6af99172c1cf63a84348628ae3f39df9e523c42447e2045e00ff | None | VirusTotal | True | high | {'verbose_msg': 'Scan finished, information embedded', 'response_code': 1, 'positives': 53, 'res... | {'scans': {'Bkav': {'detected': True, 'version': '1.3.0.9899', 'result': 'W32.AIDetect.malware1'... | https://www.virustotal.com/vtapi/v2/file/report | 0 |
To specify the source column you can use either "column" or "obs_column"
hashes_df = pd.DataFrame(
[(fh, f"item_{idx}", "stuff") for idx, fh in enumerate(hashes)],
columns=["hash", "ref", "desc"],
)
display(hashes_df)
File.ti.lookup_file_hash_VirusTotal(data=hashes_df, column="hash")
hash | ref | desc | |
---|---|---|---|
0 | 02a7977d1faf7bfc93a4b678a049c9495ea663e7065aa5a6caf0f69c5ff25dbd | item_0 | stuff |
1 | 06b020a3fd3296bc4c7bf53307fe7b40638e7f445bdd43fac1d04547a429fdaf | item_1 | stuff |
2 | 06c676bf8f5c6af99172c1cf63a84348628ae3f39df9e523c42447e2045e00ff | item_2 | stuff |
Ioc | IocType | SafeIoc | QuerySubtype | Provider | Result | Severity | Details | RawResult | Reference | Status | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 02a7977d1faf7bfc93a4b678a049c9495ea663e7065aa5a6caf0f69c5ff25dbd | sha256_hash | 02a7977d1faf7bfc93a4b678a049c9495ea663e7065aa5a6caf0f69c5ff25dbd | None | VirusTotal | True | high | {'verbose_msg': 'Scan finished, information embedded', 'response_code': 1, 'positives': 51, 'res... | {'scans': {'Bkav': {'detected': True, 'version': '1.3.0.9899', 'result': 'W32.AIDetect.malware2'... | https://www.virustotal.com/vtapi/v2/file/report | 0 |
1 | 06b020a3fd3296bc4c7bf53307fe7b40638e7f445bdd43fac1d04547a429fdaf | sha256_hash | 06b020a3fd3296bc4c7bf53307fe7b40638e7f445bdd43fac1d04547a429fdaf | None | VirusTotal | True | high | {'verbose_msg': 'Scan finished, information embedded', 'response_code': 1, 'positives': 55, 'res... | {'scans': {'Bkav': {'detected': False, 'version': '1.3.0.9899', 'result': None, 'update': '20201... | https://www.virustotal.com/vtapi/v2/file/report | 0 |
2 | 06c676bf8f5c6af99172c1cf63a84348628ae3f39df9e523c42447e2045e00ff | sha256_hash | 06c676bf8f5c6af99172c1cf63a84348628ae3f39df9e523c42447e2045e00ff | None | VirusTotal | True | high | {'verbose_msg': 'Scan finished, information embedded', 'response_code': 1, 'positives': 53, 'res... | {'scans': {'Bkav': {'detected': True, 'version': '1.3.0.9899', 'result': 'W32.AIDetect.malware1'... | https://www.virustotal.com/vtapi/v2/file/report | 0 |
Because pivot functions can take dataframes as inputs and return them as outputs, you can create chains of pivot functions. You can also add other items to the chain that input or output dataframes.
For example, you could build a chain that combines data queries, TI or whois enrichment steps and standard pandas operations.
To make building these types of pipelines easier we've implemented some
pandas helper functions. These are available in the mp_pivot
property of pandas DataFrames, once Pivot is imported.
run
lets you run a pivot function as a pandas pipeline operation.
Let's take an example of a simple pivot function using a dataframe as input
IpAddress.util.whois(data=my_df, column="Ioc")
We can use mp_pivot.run to do this:
(
my_df
.query("UserCount > 1")
.mp_pivot.run(IpAddress.util.whois, column="Ioc")
.drop_duplicates()
)
The pandas extension takes care of the data=my_df
parameter. We still have
to add any other required parameters (like the column specification in this case).
When it runs it returns its output as a DataFrame and the next operation
(drop_duplicates()) runs on this output.
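In plain pandas terms, `mp_pivot.run` behaves roughly like a `pipe` call that fills in the `data=` parameter for you. The sketch below uses a hypothetical `whois_like` function standing in for a real pivot function:

```python
import pandas as pd

def whois_like(data, column):
    """Hypothetical pivot-style function: accepts data= and a column name."""
    return pd.DataFrame({"Ioc": data[column], "asn": "AS64496"})

my_df = pd.DataFrame({"Ioc": ["157.53.1.1", "10.0.0.1"], "UserCount": [2, 1]})

result = (
    my_df
    .query("UserCount > 1")
    # mp_pivot.run(whois_like, column="Ioc") is roughly this pipe:
    .pipe(lambda df: whois_like(data=df, column="Ioc"))
    .drop_duplicates()
)
```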
Depending on the scenario you might want to preserve the existing dataframe
contents (most of the pivot functions only return the results of their specific
operation - e.g. whois returns ASN information for an IP address). You
can carry the columns of the input dataframe over to the output from
the pivot function by adding a join
parameter to the mp_pivot.run() call.
Use a "left" join to keep all of the input rows regardless of whether the pivot
function returned a result for that row.
Use an "inner" join to return only rows where the input had a positive result
in the pivot function.
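The join behavior maps onto ordinary pandas merge semantics, as this sketch with made-up lookup results shows:

```python
import pandas as pd

input_df = pd.DataFrame({"Ioc": ["10.0.0.1", "157.53.1.1", "172.16.0.1"]})
# Pretend the pivot function only returned a result for one of the IPs
pivot_out = pd.DataFrame({"Ioc": ["157.53.1.1"], "asn_description": ["EXAMPLE-AS"]})

# "left": all 3 input rows kept, NaN where the pivot returned nothing
left_join = input_df.merge(pivot_out, on="Ioc", how="left")
# "inner": only the 1 row that had a positive result
inner_join = input_df.merge(pivot_out, on="Ioc", how="inner")
```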
.mp_pivot.run(IpAddress.util.whois, column="Ioc", join="inner")
There are also a couple of convenience functions. These only work in an IPython/Jupyter environment.
mp_pivot.display
will display the intermediate results of the dataframe in the middle
of a pipeline. It does not change the data at all, but does give you the
chance to display a view of the data partway through processing. This
is useful for debugging but its main purpose is to give you a way to
show partial results without having to break the pipeline into pieces
and create unnecessary throw-away variables that will add bulk to your
code and clutter to your memory.
display
supports some options that you can use to modify the displayed
output:
title - displays a title above the output
cols - a list of columns to display
query - a pandas query expression used to filter the displayed rows
head - limits the display to the first head rows
These options do not affect the data being passed through the pipeline - only how the intermediate output is displayed.
mp_pivot.tee
behaves a little like the Linux "tee" command. It allows the
data to pass through unchanged but allows you to create a variable that
is a snapshot of the data at that point in the pipeline. It takes
a parameter var_name
and assigns the current DataFrame instance
to that name. So, when your pipeline has run you can access partial results (again,
without having to break up your pipeline to do so).
By default, it will not overwrite an existing variable of the same name
unless you specify clobber=True
in the call to tee
.
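A minimal sketch of the tee behavior in plain pandas: the data passes through unchanged while a snapshot is saved under a name. The real mp_pivot.tee assigns into your notebook namespace; here a plain dict stands in for it:

```python
import pandas as pd

snapshots = {}  # stand-in for the caller's namespace

def tee(df, var_name, clobber=False):
    """Save a snapshot under var_name; pass the data through unchanged."""
    if var_name in snapshots and not clobber:
        raise ValueError(f"'{var_name}' already exists; use clobber=True")
    snapshots[var_name] = df
    return df

df = pd.DataFrame({"a": [1, 2]})
out = df.pipe(tee, var_name="mid_df").assign(b=lambda d: d["a"] * 2)
# snapshots["mid_df"] holds the data as it was mid-pipeline
```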
mp_pivot.tee_exec
behaves similarly to the "tee" function above, except that it
also executes a named DataFrame accessor function on the input DataFrame.
The input data passes through unchanged, but a snapshot of the data at that
point in the pipeline is sent to the named function.
The name of the function (as a string) can be passed as the value of the
df_func
named parameter, or as the first positional argument.
The function must be a method of a pandas DataFrame - this includes
built-in functions such as .query
or .sort_values
, or a custom function
added as a pandas accessor function (see
Extending pandas).
You can also pass arbitrary other named arguments to tee_exec
. These arguments will be passed on to the df_func
function.
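A sketch of the tee_exec idea in plain pandas: look up a DataFrame method by name, run it for its side effects, and return the original data untouched (illustrative only, not the MSTICPy implementation):

```python
import pandas as pd

def tee_exec(df, df_func, *args, **kwargs):
    """Run a DataFrame method by name for its side effects; return df unchanged."""
    getattr(df, df_func)(*args, **kwargs)
    return df

df = pd.DataFrame({"a": [3, 1, 2]})
out = df.pipe(tee_exec, "sort_values", by="a")
# The sorted copy is discarded; the pipeline data is untouched
```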
The example below shows the use of mp_pivot.run and mp_pivot.display.
This takes an existing DataFrame - suspicious_ips - and:
- looks up the IPs with the OTX TI provider
- uses query to filter only the high severity hits
- runs whois with join='left' so our output will be all TI result data plus whois data
- queries Azure AD sign-ins for those IP addresses
The final step uses another MSTICPy pandas extension to plot the login attempts on a timeline chart.
suspicious_ips = [
"212.109.217.155",
"103.141.68.38",
"165.255.70.149",
"206.1.228.141",
]
suspicious_ips_df = pd.DataFrame(suspicious_ips, columns=["IPAddress"])
(
suspicious_ips_df
.mp_pivot.display(title=f"Initial IPs {len(suspicious_ips)}", head=5)
# Lookup IPs at OTX
.mp_pivot.run(IpAddress.ti.lookup_ipv4_OTX, column="IPAddress")
# Filter on high severity
.query("Severity == 'high'")
.mp_pivot.run(IpAddress.util.whois, column="Ioc", join="left")
.mp_pivot.display(title="TI High Severity IPs", head=5)
# Query IPs that have login attempts
.mp_pivot.run(IpAddress.MSSentinel.aad_signins, ip_address_list="Ioc")
# Send the output of this to a plot
.mp_plot.timeline(
title="High Severity IPs with Logon attempts",
source_columns=["UserPrincipalName", "IPAddress", "ResultType", "ClientAppUsed", "UserAgent", "Location"],
group_by="UserPrincipalName"
)
)
IPAddress | |
---|---|
0 | 113.190.36.2 |
1 | 118.163.135.17 |
2 | 118.163.135.18 |
3 | 118.163.97.19 |
4 | 125.34.240.33 |
Ioc | IocType | SafeIoc | QuerySubtype | Provider | Result | Severity | Details | RawResult | Reference | Status | nir | asn_registry | asn | asn_cidr | asn_country_code | asn_date | asn_description | query | nets | raw | referral | raw_referral | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 113.190.36.2 | ipv4 | 113.190.36.2 | None | OTX | True | high | {'pulse_count': 46, 'names': ['Analysis Report (AR21-013A) - Strengthening Security Configuratio... | {'whois': 'http://whois.domaintools.com/113.190.36.2', 'reputation': 0, 'indicator': '113.190.36... | https://otx.alienvault.com/api/v1/indicators/IPv4/113.190.36.2/general | 0 | None | apnic | 45899 | 113.190.32.0/20 | VN | 2008-11-04 | VNPT-AS-VN VNPT Corp, VN | 113.190.36.2 | [{'cidr': '113.160.0.0/11', 'name': 'VNPT-VN', 'handle': 'PTH13-AP', 'range': '113.160.0.0 - 113... | None | None | None |
1 | 118.163.135.17 | ipv4 | 118.163.135.17 | None | OTX | True | high | {'pulse_count': 50, 'names': ['IOCs - 20201122247 - ANIA Threat Feeds - IP Segment 0', 'IOCs - 2... | {'whois': 'http://whois.domaintools.com/118.163.135.17', 'reputation': 0, 'indicator': '118.163.... | https://otx.alienvault.com/api/v1/indicators/IPv4/118.163.135.17/general | 0 | None | apnic | 3462 | 118.163.0.0/16 | TW | 2007-10-04 | HINET Data Communication Business Group, TW | 118.163.135.17 | [{'cidr': '118.160.0.0/13', 'name': 'HINET-NET', 'handle': 'AT939-AP', 'range': '118.160.0.0 - 1... | None | None | None |
2 | 118.163.135.18 | ipv4 | 118.163.135.18 | None | OTX | True | high | {'pulse_count': 50, 'names': ['Analysis Report (AR21-013A) - Strengthening Security Configuratio... | {'whois': 'http://whois.domaintools.com/118.163.135.18', 'reputation': 0, 'indicator': '118.163.... | https://otx.alienvault.com/api/v1/indicators/IPv4/118.163.135.18/general | 0 | None | apnic | 3462 | 118.163.0.0/16 | TW | 2007-10-04 | HINET Data Communication Business Group, TW | 118.163.135.18 | [{'cidr': '118.160.0.0/13', 'name': 'HINET-NET', 'handle': 'AT939-AP', 'range': '118.160.0.0 - 1... | None | None | None |
3 | 118.163.97.19 | ipv4 | 118.163.97.19 | None | OTX | True | high | {'pulse_count': 50, 'names': ['Analysis Report (AR21-013A) - Strengthening Security Configuratio... | {'whois': 'http://whois.domaintools.com/118.163.97.19', 'reputation': 0, 'indicator': '118.163.9... | https://otx.alienvault.com/api/v1/indicators/IPv4/118.163.97.19/general | 0 | None | apnic | 3462 | 118.163.0.0/16 | TW | 2007-10-04 | HINET Data Communication Business Group, TW | 118.163.97.19 | [{'cidr': '118.160.0.0/13', 'name': 'HINET-NET', 'handle': 'AT939-AP', 'range': '118.160.0.0 - 1... | None | None | None |
4 | 125.34.240.33 | ipv4 | 125.34.240.33 | None | OTX | True | high | {'pulse_count': 50, 'names': ['IOCs - 20223111348 - ANIA Threat Feeds - IP Segment 4', 'IOCs - 2... | {'whois': 'http://whois.domaintools.com/125.34.240.33', 'reputation': 0, 'indicator': '125.34.24... | https://otx.alienvault.com/api/v1/indicators/IPv4/125.34.240.33/general | 0 | None | apnic | 4837 | 125.34.240.0/24 | CN | 2006-01-09 | CHINA169-BACKBONE CHINA UNICOM China169 Backbone, CN | 125.34.240.33 | [{'cidr': '125.34.0.0/16', 'name': 'UNICOM-BJ', 'handle': 'CH1302-AP', 'range': '125.34.0.0 - 12... | None | None | None |
No data to plot.
This is what the pipelined functions should output (although the results will obviously not be the same for your environment).
To register your own function as a pivot function you need the following information:
Item | Description | Required |
---|---|---|
src_module | The module containing the class or function | Yes |
class | The class containing the function | No |
src_func_name | The name of the function to wrap | Yes |
func_new_name | Rename the function | No |
input_type | The input type that the wrapped function expects (dataframe, iterable, value) | Yes |
entity_map | Mapping of entity and attribute used for function | Yes |
func_df_param_name | The param name that the function uses as input param for DataFrame | If DF input |
func_df_col_param_name | The param name that function uses to identify the input column name | If DF input |
func_out_column_name | Name of the column in the output DF to use as a key to join | If DF output |
func_static_params | dict of static name/value params always sent to the function | No |
func_input_value_arg | Name of the param that the wrapped function uses for its input value | No |
can_iterate | True if the function supports being called multiple times | No |
entity_container_name | The name of the container in the entity where the func will appear | No |
The entity_map controls where the pivot function will be added. Each entry
requires an Entity name (see msticpy.datamodel.entities) and an entity
attribute name. This is only used if an instance of the entity is used
as a parameter to the function. For IpAddress
in the example below,
the pivot function will try to extract the value of the Address
attribute
when an instance of IpAddress is used as a function parameter.
entity_map:
IpAddress: Address
Host: HostName
Account: Name
This means that you can specify different attributes of the same entity for different functions (or even for two instances of the same function).
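Conceptually, the entity_map drives a simple attribute lookup when an entity instance is passed as a parameter. The sketch below uses a minimal stand-in class, not the real msticpy.datamodel.entities implementation:

```python
class IpAddress:
    """Minimal stand-in for the MSTICPy entity class."""
    def __init__(self, address):
        self.Address = address

entity_map = {"IpAddress": "Address", "Host": "HostName", "Account": "Name"}

def extract_param_value(entity):
    # Look up which attribute of this entity type holds the function input
    attr_name = entity_map[type(entity).__name__]
    return getattr(entity, attr_name)

ip_entity = IpAddress("157.53.1.1")
# extract_param_value(ip_entity) yields the Address attribute value
```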
The func_df_param_name
and func_df_col_param_name
are needed only if
the source function takes a dataframe and column name as input parameters.
func_out_column_name
is relevant if the source function returns a
dataframe. In order to join input data with output data this needs to
be the column in the output that has the same value as the function
input (e.g. if you are processing IP addresses and the column name
in the output DF containing the IP is named "ip_addr", put "ip_addr" here.)
When you have this information, create or add it to a yaml file
with the top-level element pivot_providers
.
Example from the msticpy ip_utils who_is
function
pivot_providers:
...
who_is:
src_module: msticpy.sectools.ip_utils
src_func_name: get_whois_df
func_new_name: whois
input_type: dataframe
entity_map:
IpAddress: Address
func_df_param_name: data
func_df_col_param_name: ip_column
func_out_column_name: ip
func_static_params:
whois_col: whois_result
func_input_value_arg: ip_address
Once you have your yaml definition file you can call
Pivot.register_pivot_providers(
pivot_reg_path=path_to_your_yaml,
namespace=globals(),
def_container="my_container",
force_container=True
)
Note, this is not persistent. You will need to call this each time you start a new session.
Pivot.register_pivot_providers(
pivot_reg_path: str,
namespace: Dict[str, Any] = None,
def_container: str = 'custom',
force_container: bool = False,
)
Docstring:
Register pivot functions from configuration file.
Parameters
----------
pivot_reg_path : str
    Path to config yaml file
namespace : Dict[str, Any], optional
    Namespace to search for existing instances of classes, by default None
def_container : str, optional
    Container name to use for entity pivot functions, by default "other"
force_container : bool, optional
Force `container` value to be used even if entity definitions have
specific setting for a container name, by default False
Pivot.register_pivot_providers?
Signature:
Pivot.register_pivot_providers(
    pivot_reg_path: str,
    namespace: Dict[str, Any] = None,
    def_container: str = 'custom',
    force_container: bool = False,
)
Docstring:
Register pivot functions from configuration file.

Parameters
----------
pivot_reg_path : str
    Path to config yaml file
namespace : Dict[str, Any], optional
    Namespace to search for existing instances of classes, by default None
def_container : str, optional
    Container name to use for entity pivot functions, by default "other"
force_container : bool, optional
    Force `container` value to be used even if entity definitions have
    specific setting for a container name, by default False

Raises
------
ValueError
    An entity specified in the config file is not recognized.

File: e:\src\msticpy\msticpy\init\pivot.py
Type: function
You can also add ad hoc functions as pivot functions. This is probably a less common scenario but may be useful for testing and development.
You can either create a PivotRegistration object and supply it, along with
the function to wrap (the func
parameter), to Pivot.add_pivot_function:
from msticpy.init.pivot import PivotRegistration
def my_func(input: str):
return input.upper()
piv_reg = PivotRegistration(
input_type="value",
entity_map={"Host": "HostName"},
func_input_value_arg="input",
func_new_name="upper_name"
)
Pivot.add_pivot_function(my_func, piv_reg, container="change_case")
Alternatively, you can supply the pivot registration parameters as keyword arguments:
def my_func(input: str):
return input.upper()
Pivot.add_pivot_function(
func=my_func,
container="change_case",
input_type="value",
entity_map={"Host": "HostName"},
func_input_value_arg="input",
func_new_name="upper_name",
)
You can also define reusable pipelines in a YAML file. Each pipeline has a description and a list of steps, for example:
pipelines:
pipeline1:
description: Pipeline 1 description
steps:
- name: get_logons
step_type: pivot
function: util.whois
entity: IpAddress
comment: Standard pivot function
params:
column: IpAddress
join: inner
- name: disp_logons
step_type: pivot_display
comment: Pivot display
params:
title: "The title"
cols:
- Computer
- Account
query: Computer.str.startswith('MSTICAlerts')
head: 10
- name: tee_logons
step_type: pivot_tee
comment: Pivot tee
params:
var_name: var_df
clobber: True
- name: tee_logons_disp
step_type: pivot_tee_exec
comment: Pivot tee_exec with mp_plot.timeline
function: mp_plot.timeline
params:
source_columns:
- Computer
- Account
- name: logons_timeline
step_type: pd_accessor
comment: Standard accessor with mp_plot.timeline
function: mp_plot.timeline
params:
source_columns:
- Computer
- Account
pipeline2:
description: Pipeline 2 description
steps:
- name: get_logons
step_type: pivot
function: util.whois
entity: IpAddress
comment: Standard pivot function
params:
column: IpAddress
join: inner
- name: disp_logons
step_type: pivot_display
comment: Pivot display
params:
title: "The title"
cols:
- Computer
- Account
query: Computer.str.startswith('MSTICAlerts')
head: 10
- name: tee_logons
step_type: pivot_tee
comment: Pivot tee
params:
var_name: var_df
clobber: True
from msticpy.init.pivot_core.pivot_pipeline import Pipeline
pipelines_yml = """
pipelines:
pipeline1:
description: Pipeline 1 description
steps:
- name: get_ip_type
step_type: pivot
function: util.ip_type
entity: IpAddress
comment: Get IP Type
params:
column: IPAddress
join: inner
- name: filter_public
step_type: pd_accessor
comment: Filter to only public IPs
function: query
pos_params:
- result == "Public"
- name: whois
step_type: pivot
function: util.whois
entity: IpAddress
comment: Get Whois info
params:
column: IPAddress
join: inner
"""
pipelines = list(Pipeline.from_yaml(pipelines_yml))
print(pipelines[0].print_pipeline())
# Pipeline 1 description
(
    input_df
    # Get IP Type
    .mp_pivot.run(IpAddress.util.ip_type, column='IPAddress', join='inner')
    # Filter to only public IPs
    .query('result == "Public"')
    # Get Whois info
    .mp_pivot.run(IpAddress.util.whois, column='IPAddress', join='inner')
)
pipeline1 = pipelines[0]
result_df = pipeline1.run(data=suspicious_ips_df)
result_df.head(3)
Steps: 0%| | 0/3 [00:00<?, ?it/s]
step = get_ip_type
PipelineExecStep(accessor='mp_pivot.run', pos_params=[], params={'func': <function get_ip_type at 0x000001C163E4E0D0>, 'column': 'IPAddress', 'join': 'inner'}, text=".mp_pivot.run(IpAddress.util.ip_type, column='IPAddress', join='inner')", comment='Get IP Type', step_type='pivot')
step = filter_public
PipelineExecStep(accessor='query', pos_params=['result == "Public"'], params={}, text='.query(\'result == "Public"\')', comment='Filter to only public IPs', step_type='pd_accessor')
step = whois
PipelineExecStep(accessor='mp_pivot.run', pos_params=[], params={'func': <function get_whois_df at 0x000001C163E4E040>, 'column': 'IPAddress', 'join': 'inner'}, text=".mp_pivot.run(IpAddress.util.whois, column='IPAddress', join='inner')", comment='Get Whois info', step_type='pivot')
Steps: 100%|██████████| 3/3 [00:00<00:00, 3.49it/s]
IPAddress | ip | result | nir | asn_registry | asn | asn_cidr | asn_country_code | asn_date | asn_description | query | nets | raw | referral | raw_referral | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 113.190.36.2 | 113.190.36.2 | Public | None | apnic | 45899 | 113.190.32.0/20 | VN | 2008-11-04 | VNPT-AS-VN VNPT Corp, VN | 113.190.36.2 | [{'cidr': '113.160.0.0/11', 'name': 'VNPT-VN', 'handle': 'PTH13-AP', 'range': '113.160.0.0 - 113... | None | None | None |
1 | 118.163.135.17 | 118.163.135.17 | Public | None | apnic | 3462 | 118.163.0.0/16 | TW | 2007-10-04 | HINET Data Communication Business Group, TW | 118.163.135.17 | [{'cidr': '118.160.0.0/13', 'name': 'HINET-NET', 'handle': 'AT939-AP', 'range': '118.160.0.0 - 1... | None | None | None |
2 | 118.163.135.18 | 118.163.135.18 | Public | None | apnic | 3462 | 118.163.0.0/16 | TW | 2007-10-04 | HINET Data Communication Business Group, TW | 118.163.135.18 | [{'cidr': '118.160.0.0/13', 'name': 'HINET-NET', 'handle': 'AT939-AP', 'range': '118.160.0.0 - 1... | None | None | None |