NFStream: a Flexible Network Data Analysis Framework

In [ ]:
import nfstream
print(nfstream.__version__)

NFStream is a Python framework providing fast, flexible, and expressive data structures designed to make working with online or offline network data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real-world network data analysis in Python. Additionally, it has the broader goal of becoming a common network data analytics framework for researchers, providing data reproducibility across experiments.

  • Performance: NFStream is designed to be fast: parallel processing, native C (using CFFI) for critical computation, and PyPy support.
  • Encrypted layer-7 visibility: NFStream deep packet inspection is based on nDPI. It allows NFStream to perform reliable encrypted application identification and metadata fingerprinting (e.g. TLS, SSH, DHCP, HTTP).
  • Statistical features extraction: NFStream provides state-of-the-art flow-based statistical feature extraction. It includes both post-mortem statistical features (e.g. min, mean, stddev and max of packet size and inter-arrival time) and early flow features (e.g. sequence of the first n packets' sizes, inter-arrival times and directions).
  • Flexibility: NFStream is easily extensible using NFPlugins. It allows creating a new flow feature within a few lines of Python.
  • Machine Learning oriented: NFStream aims to make Machine Learning approaches for network traffic management reproducible and deployable. By using NFStream as a common framework, researchers ensure that models are trained using the same feature computation logic, and thus a fair comparison is possible. Moreover, trained models can be deployed and evaluated on live networks using NFPlugins.

In this notebook, we demonstrate a subset of features provided by nfstream.

In [ ]:
from nfstream import NFStreamer, NFPlugin
import pandas as pd
pd.set_option('display.max_columns', 500)
pd.set_option('display.max_rows', 500)

Flow aggregation made simple

In the following, we are going to use the main object provided by nfstream, NFStreamer, which has the following parameters (a combined usage sketch follows the list):

  • source [default=None]: Packet capture source. Pcap file path or network interface name.
  • decode_tunnels [default=True]: Enable/Disable GTP/TZSP tunnels decoding.
  • bpf_filter [default=None]: Specify a BPF filter for selecting specific traffic.
  • promiscuous_mode [default=True]: Enable/Disable promiscuous capture mode.
  • snapshot_length [default=1500]: Control packet slicing size (truncation) in bytes.
  • idle_timeout [default=15]: Flows that are idle (no packets received) for more than this value in seconds are expired.
  • active_timeout [default=1800]: Flows that are active for more than this value in seconds are expired.
  • accounting_mode [default=0]: Specify the accounting mode that will be used to report bytes-related features (0: Link layer, 1: IP layer, 2: Transport layer, 3: Payload).
  • udps [default=None]: Specify user defined NFPlugins used to extend NFStreamer.
  • n_dissections [default=20]: Number of per-flow packets to dissect for the L7 visibility feature. When set to 0, L7 visibility is disabled.
  • statistical_analysis [default=False]: Enable/Disable post-mortem flow statistical analysis.
  • splt_analysis [default=0]: Specify the number of first packets to analyze for early statistical features (sequence of packet lengths, inter-arrival times and directions). When set to 0, splt_analysis is disabled.
  • n_meters [default=0]: Specify the number of parallel metering processes. When set to 0, NFStreamer will automatically scale metering according to available physical cores on the running host.
  • performance_report [default=0]: Performance report interval in seconds. Disabled when set to 0. Ignored for offline captures.
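
As an illustrative sketch (the parameter values below are arbitrary placeholders, not recommendations), several of these parameters can be combined when building a streamer:

In [ ]:
# Illustrative sketch only: combine several NFStreamer parameters on an offline capture.
example_streamer = NFStreamer(source="tests/pcap/instagram.pcap",
                              bpf_filter="tcp port 443",   # keep only traffic to/from port 443
                              idle_timeout=60,
                              active_timeout=300,
                              accounting_mode=1,           # report bytes at the IP layer
                              statistical_analysis=True,
                              splt_analysis=7)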

NFStreamer returns a flow iterator. We can iterate over flows or convert it directly to a pandas DataFrame using the to_pandas() method.

In [ ]:
df = NFStreamer(source="tests/pcap/instagram.pcap").to_pandas()
In [ ]:
df.head()
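
The streamer can also be consumed directly as a flow iterator; the minimal sketch below (reusing the same pcap, with default dissection so that application_name is populated) simply prints a few core flow attributes:

In [ ]:
# Minimal sketch: iterate over flows instead of converting to a DataFrame.
for flow in NFStreamer(source="tests/pcap/instagram.pcap"):
    print(flow.src_ip, flow.dst_ip, flow.application_name)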

We can enable post-mortem statistical flow features extraction as follows:

In [ ]:
df = NFStreamer(source="tests/pcap/instagram.pcap", statistical_analysis=True).to_pandas()
In [ ]:
df.head()

We can enable early statistical flow features extraction as follows:

In [ ]:
df = NFStreamer(source="tests/pcap/instagram.pcap", splt_analysis=10).to_pandas()
In [ ]:
df.head()

We can enable IP anonymization as follows:

In [ ]:
df = NFStreamer(source="tests/pcap/instagram.pcap", 
                statistical_analysis=True).to_pandas(columns_to_anonymize=["src_ip", "src_mac", "dst_ip", "dst_mac"])
In [ ]:
df.head()

Now that we have our DataFrame, we can start analyzing it like any other dataset. For example, we can compute additional features:

  • Compute the bytes ratio in both directions (src2dst and dst2src)
In [ ]:
df["src2dst_bytes_data_ratio"] = df['src2dst_bytes'] / df['bidirectional_bytes']
df["dst2src_bytes_data_ratio"] = df['dst2src_bytes'] / df['bidirectional_bytes']
In [ ]:
df.head()
  • Filter data according to some criteria:
In [ ]:
df[df["dst_port"] == 443].head()

Extend nfstream

In some use cases, we need to add features that are computed at the packet level. nfstream handles such scenarios using NFPlugin.

  • Let's suppose that we want, per flow, a counter of bidirectional packets with an IP size of exactly 40 bytes.
In [ ]:
class Packet40Count(NFPlugin):
    def on_init(self, pkt, flow):  # flow creation with the first packet
        if pkt.ip_size == 40:
            flow.udps.packet_with_40_ip_size = 1
        else:
            flow.udps.packet_with_40_ip_size = 0

    def on_update(self, pkt, flow):  # flow update with each packet belonging to the flow
        if pkt.ip_size == 40:
            flow.udps.packet_with_40_ip_size += 1
In [ ]:
df = NFStreamer(source="tests/pcap/google_ssl.pcap", udps=[Packet40Count()]).to_pandas()
In [ ]:
df.head()

Our DataFrame now has a new column named udps.packet_with_40_ip_size.
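
As a quick sanity check on this plugin-generated column, we can summarize it (a minimal sketch reusing the DataFrame computed above):

In [ ]:
# Summarize the per-flow counter added by the Packet40Count plugin.
df["udps.packet_with_40_ip_size"].describe()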