WIP: Rekall to Pandas Dataframe

This notebook demonstrates a particularly kewl feature of workbench: quickly and efficiently going from raw data to a Pandas DataFrame.

Here we're using the workbench server to look at a forensic memory image, which workbench processes with the Rekall Python module (https://github.com/google/rekall). Anything that is kewl in this notebook is because of Rekall; anything that is lame is probably Workbench (our Rekall integration is days old).

Super Big Thanks

  • JPH Security: This notebook utilizes the 'Baseline Approach' outlined in this JPH Security Blog. Resources like this are terrific and greatly appreciated by us and the community.
  • Michael Cohen (scudette): Main developer of the Google Rekall project and amazingly patient with my dumb questions.

Tools in this Notebook:

More Info:

Let's start up the workbench server...

Run the workbench server (from somewhere, for the demo we're just going to start a local one)

$ workbench_server
In [1]:
# Let's start interacting with workbench. Note that there is NO specific client for workbench;
# just use the ZeroRPC Python, Node.js, or CLI interfaces.
import zerorpc
c = zerorpc.Client(timeout=120)
c.connect("tcp://127.0.0.1:4242")
Out[1]:
[None]

Read in the Data

The data comes from a popular, publicly available memory image called exemplar4.vmem.

In [2]:
# Load in the Memory Image file
with open('../data/mem_images/exemplar4.vmem','rb') as f:
    mem_md5 = c.store_sample(f.read(), 'exemplar4.vmem', 'mem')
In [3]:
# Let's look at the workers that we might invoke
# (the parenthesized form of print works on both Python 2 and Python 3)
print(c.help_workers())
Workbench Workers:
	json_meta ['sample', 'meta']
	log_meta ['sample', 'meta']
	mem_base ['sample']
	mem_connscan ['sample']
	mem_dlllist ['sample']
	mem_meta ['sample']
	mem_procdump ['sample']
	mem_pslist ['sample']
	meta ['sample']
	meta_deep ['sample', 'meta']
	pcap_bro ['sample']
	pcap_graph ['pcap_bro']
	pcap_graph_0_1 ['pcap_bro']
	pcap_http_graph ['pcap_bro']
	pe_classifier ['pe_features', 'pe_indicators']
	pe_deep_sim ['meta_deep']
	pe_features ['sample']
	pe_indicators ['sample']
	pe_peid ['sample']
	strings ['sample']
	swf_meta ['sample', 'meta']
	unzip ['sample']
	url ['strings']
	view ['meta']
	view_customer ['meta']
	view_log_meta ['log_meta']
	view_meta ['meta']
	view_pcap ['pcap_bro']
	view_pcap_details ['view_pcap']
	view_pdf ['meta', 'strings']
	view_pe ['meta', 'strings', 'pe_peid', 'pe_indicators', 'pe_classifier', 'pe_disass']
	view_zip ['meta', 'unzip']
	vt_query ['meta']
	yara_sigs ['sample']
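The listing above pairs each worker with a bracketed list. Assuming those lists name the inputs a worker needs before it can run (the notebook does not say so explicitly), a rough sketch of resolving a run order for `view_pe` could look like the following. The `resolve` helper is hypothetical and is not part of Workbench.

```python
# Subset of the help_workers() listing above. Assumption: each bracketed list
# names the inputs a worker needs before it can run.
WORKERS = {
    'meta': ['sample'],
    'strings': ['sample'],
    'pe_features': ['sample'],
    'pe_indicators': ['sample'],
    'pe_peid': ['sample'],
    'pe_classifier': ['pe_features', 'pe_indicators'],
    'view_pe': ['meta', 'strings', 'pe_peid', 'pe_indicators',
                'pe_classifier', 'pe_disass'],
}


def resolve(worker, workers, order=None, missing=None):
    """Depth-first walk: returns (run order, inputs with no listed worker)."""
    order = [] if order is None else order
    missing = [] if missing is None else missing
    for dep in workers.get(worker, []):
        if dep == 'sample':
            # The raw sample is stored data, not a worker.
            continue
        if dep not in workers:
            if dep not in missing:
                missing.append(dep)
            continue
        if dep not in order:
            resolve(dep, workers, order, missing)
    if worker not in order:
        order.append(worker)
    return order, missing


order, missing = resolve('view_pe', WORKERS)
print(order)
print(missing)
```

In this sketch `pe_disass` ends up in `missing`, because it is named as an input of `view_pe` but does not appear in the worker listing.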
In [4]:
# Now we invoke the mem_meta worker (all memory workers start with mem_)
output = c.work_request('mem_meta', mem_md5)['mem_meta']
output
Out[4]:
{'md5': '359df4feb25af47cfef228f393d07c10',
 'plugin_name': 'imageinfo',
 'sections': {'Info': [{'Fact': 'Kernel DTB', 'Value': '0x7d0000'},
   {'Fact': 'NT Build', 'Value': '2600.xpsp_sp2_rtm.040803-2158'},
   {'Fact': 'NT Build Ex', 'Value': '-'},
   {'Fact': 'Signed Drivers', 'Value': '-'},
   {'Fact': 'Time (UTC)', 'Value': '2009-01-08 02:02:18+0000'},
   {'Fact': 'Time (Local)', 'Value': '2009-01-08 07:02:18+0000'},
   {'Fact': 'Sec Since Boot', 'Value': 937.34375},
   {'Fact': 'NtSystemRoot', 'Value': 'C:\\WINDOWS'}],
  'Physical Layout': [{'Number of Pages': 158,
    'Phys End': 651264,
    'Phys Start': 4096},
   {'Number of Pages': 3839, 'Phys End': 16773120, 'Phys Start': 1048576},
   {'Number of Pages': 61168, 'Phys End': 267321344, 'Phys Start': 16777216},
   {'Number of Pages': 256, 'Phys End': 268435456, 'Phys Start': 267386880}]}}
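As a quick sanity check on the 'Physical Layout' section, here is a small sketch that verifies each run against a page size of 4096 bytes (an assumption; the output above does not state the page size) and totals the mapped memory. The values are copied from the mem_meta output.

```python
# Physical memory runs copied from the mem_meta output above.
runs = [
    {'Number of Pages': 158, 'Phys Start': 4096, 'Phys End': 651264},
    {'Number of Pages': 3839, 'Phys Start': 1048576, 'Phys End': 16773120},
    {'Number of Pages': 61168, 'Phys Start': 16777216, 'Phys End': 267321344},
    {'Number of Pages': 256, 'Phys Start': 267386880, 'Phys End': 268435456},
]

PAGE_SIZE = 4096  # assumption: standard 4 KiB pages


def run_is_consistent(run, page_size=PAGE_SIZE):
    """True when the byte span of the run equals its page count times page size."""
    span = run['Phys End'] - run['Phys Start']
    return span == run['Number of Pages'] * page_size


all_consistent = all(run_is_consistent(r) for r in runs)
total_pages = sum(r['Number of Pages'] for r in runs)
total_mib = total_pages * PAGE_SIZE / (1024.0 * 1024.0)

print(all_consistent, total_pages, total_mib)
```

All four runs are consistent with 4 KiB pages, and together they cover a little under 256 MiB.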
In [5]:
# Now we look at the pslist worker (which is just a big blob of Python data)
output = c.work_request('mem_pslist', mem_md5)['mem_pslist']
str(output)[:50]
Out[5]:
"{'sections': {'Info': [{'Sess': '-', 'Hnds': 244, "

Let's look at the Data

We're going to use some nice functionality in the Pandas DataFrame to look at our memory image data. Specifically, we're going to group by Parent Process IDs (PPIDs) and see which processes came from which parents.

This type of operation is really just scratching the surface when it comes to dataframes, so quickly and efficiently populating a dataframe is super awesome.

In [6]:
# Okay, that didn't seem very useful, just a gigantic ugly blob of Python.
# Let's push the pslist info section into a pandas DataFrame
import pandas as pd
df = pd.DataFrame(output['sections']['Info'])
df.head()
Out[6]:
Exit Hnds Name Offset (V) PID PPID Sess Start Thds Wow64
0 - 244 System [_EPROCESS _EPROCESS] @ 0x817CC7F8 (pid=4)\n ... 4 0 - - 53 False
1 - 98 alg.exe [_EPROCESS _EPROCESS] @ 0x8163D020 (pid=408)\n... 408 656 0 2009-01-08T01:48:23Z 5 False
2 - 21 smss.exe [_EPROCESS _EPROCESS] @ 0x8140F600 (pid=516)\n... 516 4 - 2009-01-08T01:46:50Z 3 False
3 - 303 csrss.exe [_EPROCESS _EPROCESS] @ 0x81712170 (pid=588)\n... 588 516 0 2009-01-08T01:46:56Z 9 False
4 - 599 winlogon.exe [_EPROCESS _EPROCESS] @ 0x8172D2D8 (pid=612)\n... 612 516 0 2009-01-08T01:46:56Z 20 False

5 rows × 10 columns

In [7]:
# Now let's use the Pandas groupby methods
df['count'] = 1
df.groupby(['PPID','Name','PID']).sum()
Out[7]:
PPID  Name          PID   Hnds  Thds  count
0     System        4     244   53    1
4     smss.exe      516   21    3     1
516   csrss.exe     588   303   9     1
      winlogon.exe  612   599   20    1
612   lsass.exe     668   321   20    1
      services.exe  656   249   15    1
656   alg.exe       408   98    5     1
      spoolsv.exe   1516  109   11    1
      svchost.exe   888   222   9     1
                    984   1491  69    1
                    1020  197   18    1
                    1232  79    5     1
                    1304  202   13    1
984   wscntfy.exe   1048  27    1     1
1888  svhost.exe    1936  83    7     1
2000  explorer.exe  1928  311   12    1

16 rows × 3 columns
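The `count` column above is one way to tally rows per group. As a hedged alternative, the sketch below uses `groupby(...).size()` and collects child names per parent. The records are a hand-copied subset of the pslist rows, not live worker output, and the variable names are illustrative.

```python
import pandas as pd

# A hand-copied subset of the pslist rows shown above.
records = [
    {'Name': 'smss.exe', 'PID': 516, 'PPID': 4, 'Hnds': 21, 'Thds': 3},
    {'Name': 'csrss.exe', 'PID': 588, 'PPID': 516, 'Hnds': 303, 'Thds': 9},
    {'Name': 'winlogon.exe', 'PID': 612, 'PPID': 516, 'Hnds': 599, 'Thds': 20},
    {'Name': 'svhost.exe', 'PID': 1936, 'PPID': 1888, 'Hnds': 83, 'Thds': 7},
]
df_small = pd.DataFrame(records)

# Number of child processes per parent, without a helper 'count' column.
children_per_parent = df_small.groupby('PPID').size()

# Names of the children under each parent PID.
names_per_parent = df_small.groupby('PPID')['Name'].apply(list)

print(children_per_parent)
print(names_per_parent)
```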

The groupby is well organized, but as we break it down we get nervous...

  • smss.exe[516] spawned:
    • csrss.exe
    • winlogon.exe[612]
  • winlogon.exe[612] spawned:
    • lsass.exe
    • services.exe[656]
  • services.exe[656] spawned:
    • alg.exe
    • spoolsv.exe
    • svchost.exe (5 of them)

Why is 1888 spawning svhost.exe[1936]? It's not quite svchost.exe and, more importantly, it wasn't spawned by services.exe!
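The parent/child reasoning above can be sketched in plain Python. The triples below are copied from the groupby output, and the 'orphan' check (a parent PID that is absent from the process list) is an illustrative heuristic of ours, not something Rekall or Workbench reports.

```python
from collections import defaultdict

# (PID, PPID, Name) triples copied from the groupby output above.
procs = [
    (4, 0, 'System'), (516, 4, 'smss.exe'), (588, 516, 'csrss.exe'),
    (612, 516, 'winlogon.exe'), (668, 612, 'lsass.exe'),
    (656, 612, 'services.exe'), (408, 656, 'alg.exe'),
    (1516, 656, 'spoolsv.exe'), (888, 656, 'svchost.exe'),
    (984, 656, 'svchost.exe'), (1020, 656, 'svchost.exe'),
    (1232, 656, 'svchost.exe'), (1304, 656, 'svchost.exe'),
    (1048, 984, 'wscntfy.exe'), (1936, 1888, 'svhost.exe'),
    (1928, 2000, 'explorer.exe'),
]

# Map each parent PID to its children.
children = defaultdict(list)
for pid, ppid, name in procs:
    children[ppid].append((pid, name))

known_pids = {pid for pid, _, _ in procs}

# Processes whose parent is not in the process list (PPID 0 is treated as the root).
orphans = sorted(name for pid, ppid, name in procs
                 if ppid != 0 and ppid not in known_pids)

print(len(children[656]), orphans)
```

A missing parent is not malicious by itself, since the parent may simply have exited; explorer.exe is flagged here too. The combination of a missing parent and a near-miss name is what makes svhost.exe stand out.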

In [8]:
# Now we look at the connscan worker
output = c.work_request('mem_connscan', mem_md5)['mem_connscan']
output
Out[8]:
{'md5': '359df4feb25af47cfef228f393d07c10',
 'plugin_name': 'connscan',
 'sections': {'Info': [{'Local Address': '192.168.30.128:1052',
    'Offset(P)': 23629832,
    'Pid': 1644,
    'Remote Address': '204.160.104.126:80'},
   {'Local Address': '192.168.30.128:1051',
    'Offset(P)': 23665408,
    'Pid': 1644,
    'Remote Address': '204.160.104.126:80'},
   {'Local Address': '192.168.30.128:1049',
    'Offset(P)': 25150656,
    'Pid': 1644,
    'Remote Address': '65.54.152.225:80'},
   {'Local Address': '192.168.30.128:1048',
    'Offset(P)': 25151712,
    'Pid': 1644,
    'Remote Address': '65.54.77.76:80'},
   {'Local Address': '192.168.30.128:1050',
    'Offset(P)': 25152768,
    'Pid': 1644,
    'Remote Address': '192.221.98.124:80'},
   {'Local Address': '192.168.30.128:1056',
    'Offset(P)': 25727888,
    'Pid': 1936,
    'Remote Address': '66.249.128.230:9899'},
   {'Local Address': '192.168.30.128:1055',
    'Offset(P)': 26176800,
    'Pid': 1644,
    'Remote Address': '192.168.30.129:80'},
   {'Local Address': '3.0.48.2:18776',
    'Offset(P)': 27477112,
    'Pid': 2168284584,
    'Remote Address': '192.221.114.126:19277'}]}}
In [9]:
# Same as above, we'll throw it into a DataFrame and do a groupby
conn_df = pd.DataFrame(output['sections']['Info'])
conn_df['count'] = 1
conn_df.groupby(['Pid','Remote Address']).sum()
Out[9]:
Pid         Remote Address         Offset(P)  count
1644        192.168.30.129:80      26176800   1
            192.221.98.124:80      25152768   1
            204.160.104.126:80     47295240   2
            65.54.152.225:80       25150656   1
            65.54.77.76:80         25151712   1
1936        66.249.128.230:9899    25727888   1
2168284584  192.221.114.126:19277  27477112   1

7 rows × 2 columns

Okay, we see that our 'weird' svhost.exe[1936] is communicating with the outside world...

  • "Port 9989/TCP is associated with the Ini-Killer remote access Trojan" (quote from JPH Security Blog)
  • The 66.249.128.230 host is now clean, but at one point it was probably naughty. The important part here is that we can very quickly 'pivot' on the data from Rekall using Workbench.
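The 'pivot' can be sketched as a cross-reference between connscan and pslist. The pairs below are copied from the connscan output and the PID set from the pslist output. `COMMON_PORTS` is an illustrative assumption about which ports are unremarkable, and `split_endpoint` is a hypothetical helper.

```python
from collections import Counter

# (Pid, Remote Address) pairs copied from the connscan output above.
conns = [
    (1644, '204.160.104.126:80'),
    (1644, '204.160.104.126:80'),
    (1644, '65.54.152.225:80'),
    (1644, '65.54.77.76:80'),
    (1644, '192.221.98.124:80'),
    (1936, '66.249.128.230:9899'),
    (1644, '192.168.30.129:80'),
    (2168284584, '192.221.114.126:19277'),
]

# PIDs present in the pslist output earlier in the notebook.
pslist_pids = {4, 408, 516, 588, 612, 656, 668, 888, 984,
               1020, 1048, 1232, 1304, 1516, 1928, 1936}

COMMON_PORTS = {80, 443}  # assumption: ports we treat as unremarkable


def split_endpoint(endpoint):
    """Split 'host:port' into (host, int(port))."""
    host, port = endpoint.rsplit(':', 1)
    return host, int(port)


per_pid = Counter(pid for pid, _ in conns)

# Connections to ports outside the 'common' set.
unusual = [(pid, remote) for pid, remote in conns
           if split_endpoint(remote)[1] not in COMMON_PORTS]

# Of those, keep only connections owned by a process that pslist still shows.
live_unusual = [(pid, remote) for pid, remote in unusual if pid in pslist_pids]

print(per_pid[1644], unusual, live_unusual)
```

Only PID 1936 survives both filters. PID 1644 does not appear in the pslist output, and the very large PID on the last connscan row is probably a scanning artifact rather than a real process.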
In [10]:
# Now let's look at the DLLs for the various processes
output = c.work_request('mem_dlllist', mem_md5)['mem_dlllist']
In [11]:
# Each process has its own section
output['sections'].keys()
Out[11]:
['Info',
 'lsass_exe pid: 668',
 'services_exe pid: 656',
 'winlogon_exe pid: 612',
 'svchost_exe pid: 1020',
 'svchost_exe pid: 1232',
 'svchost_exe pid: 1304',
 'System pid: 4',
 'spoolsv_exe pid: 1516',
 'svhost_exe pid: 1936',
 'csrss_exe pid: 588',
 'explorer_exe pid: 1928',
 'svchost_exe pid: 888',
 'alg_exe pid: 408',
 'smss_exe pid: 516',
 'svchost_exe pid: 984',
 'wscntfy_exe pid: 1048']
In [12]:
# Let's look at the process of interest
dll_df = pd.DataFrame(output['sections']['svhost_exe pid: 1936'])
dll_df
Out[12]:
Base Load Reason/Count Path Size
0 Pointer to Pointer to - 65535 C:\Windows\msagent\svhost.exe 430080
1 Pointer to Pointer to - 65535 C:\WINDOWS\system32\ntdll.dll 720896
2 Pointer to Pointer to - 65535 C:\WINDOWS\system32\kernel32.dll 999424
3 Pointer to Pointer to - 65535 C:\WINDOWS\system32\advapi32.dll 634880
4 Pointer to Pointer to - 65535 C:\WINDOWS\system32\RPCRT4.dll 593920
5 Pointer to Pointer to - 65535 C:\WINDOWS\system32\shell32.dll 8470528
6 Pointer to Pointer to - 65535 C:\WINDOWS\system32\msvcrt.dll 360448
7 Pointer to Pointer to - 65535 C:\WINDOWS\system32\GDI32.dll 286720
8 Pointer to Pointer to - 65535 C:\WINDOWS\system32\USER32.dll 589824
9 Pointer to Pointer to - 65535 C:\WINDOWS\system32\SHLWAPI.dll 483328
10 Pointer to Pointer to - 65535 C:\WINDOWS\system32\ws2_32.dll 94208
11 Pointer to Pointer to - 65535 C:\WINDOWS\system32\WS2HELP.dll 32768
12 Pointer to Pointer to - 65535 C:\WINDOWS\system32\ole32.dll 1294336
13 Pointer to Pointer to - 65535 C:\WINDOWS\system32\oleaut32.dll 573440
14 Pointer to Pointer to - 3 C:\WINDOWS\WinSxS\x86_Microsoft.Windows.Common... 1056768
15 Pointer to Pointer to - 1 C:\WINDOWS\system32\comctl32.dll 618496
16 Pointer to Pointer to - 2 C:\WINDOWS\system32\version.dll 32768
17 Pointer to Pointer to - 2 C:\WINDOWS\system32\wsock32.dll 36864
18 Pointer to Pointer to - 2 C:\WINDOWS\system32\uxtheme.dll 229376
19 Pointer to Pointer to - 1 C:\WINDOWS\system32\wininet.dll 679936
20 Pointer to Pointer to - 1 C:\WINDOWS\system32\CRYPT32.dll 606208
21 Pointer to Pointer to - 1 C:\WINDOWS\system32\MSASN1.dll 73728
22 Pointer to Pointer to - 1 C:\WINDOWS\system32\Secur32.dll 69632
23 Pointer to Pointer to - 1 C:\WINDOWS\system32\urlmon.dll 638976
24 Pointer to Pointer to - 1 C:\WINDOWS\system32\icmp.dll 16384
25 Pointer to Pointer to - 3 C:\WINDOWS\system32\iphlpapi.dll 102400
26 Pointer to Pointer to - 4 C:\WINDOWS\system32\mswsock.dll 258048
27 Pointer to Pointer to - 1 C:\WINDOWS\system32\hnetcfg.dll 360448
28 Pointer to Pointer to - 1 C:\WINDOWS\System32\wshtcpip.dll 32768
29 Pointer to Pointer to - 2 C:\WINDOWS\system32\DNSAPI.dll 159744
30 Pointer to Pointer to - 1 C:\WINDOWS\System32\winrnr.dll 32768
31 Pointer to Pointer to - 1 C:\WINDOWS\system32\WLDAP32.dll 180224
32 Pointer to Pointer to - 1 C:\WINDOWS\system32\rasadhlp.dll 24576

33 rows × 4 columns

We see from the DLL disk locations that our 'weird' svhost.exe[1936] is still weird...

  • C:\Windows\msagent\svhost.exe doesn't look like a standard Windows system location
  • We see communication DLLs being loaded (wsock32.dll, ws2_32.dll)
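Both observations can be expressed as a small path check. The paths are a subset copied from the DLL table above. `EXPECTED_DIRS` and `NETWORK_DLLS` are illustrative assumptions, not an authoritative baseline. `ntpath` is used so that Windows paths parse the same way on any host OS.

```python
import ntpath

# A subset of the Path column from the DLL table above.
paths = [
    r'C:\Windows\msagent\svhost.exe',
    r'C:\WINDOWS\system32\ntdll.dll',
    r'C:\WINDOWS\system32\ws2_32.dll',
    r'C:\WINDOWS\system32\wsock32.dll',
    r'C:\WINDOWS\system32\wininet.dll',
]

# Assumptions for illustration only.
EXPECTED_DIRS = {r'c:\windows\system32'}
NETWORK_DLLS = {'ws2_32.dll', 'wsock32.dll', 'wininet.dll', 'mswsock.dll'}


def normalized_dir(path):
    """Directory part of a Windows path, lowercased for case-insensitive compare."""
    return ntpath.dirname(path).lower()


# Modules loaded from somewhere other than the expected system directory.
outside = [p for p in paths if normalized_dir(p) not in EXPECTED_DIRS]

# Networking-related modules among the loaded DLLs.
network = sorted(ntpath.basename(p).lower() for p in paths
                 if ntpath.basename(p).lower() in NETWORK_DLLS)

print(outside)
print(network)
```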
In [13]:
# Dump PE Files from all the processes
output = c.work_request('mem_procdump', mem_md5)['mem_procdump']
output
Out[13]:
{'dumped_files': [{'md5': '3dfee6c10e85fa73af431541ae57af1b',
   'name': 'alg_exe_408.exe'},
  {'md5': 'f79c50b2a6e7b1a532c72d7e2e99a1b4', 'name': 'csrss_exe_588.exe'},
  {'md5': '4c4e5b2aed0a2d0776cd1bb640d94f10', 'name': 'explorer_exe_1928.exe'},
  {'md5': 'e5dd6fbc3580b523005e16e76677d942', 'name': 'lsass_exe_668.exe'},
  {'md5': '0a02acdfb8c10511aee002805eb324e4', 'name': 'services_exe_656.exe'},
  {'md5': 'a11279e7f15a0a9f342e0809d3460e26', 'name': 'smss_exe_516.exe'},
  {'md5': '01979d4ac9b6b0ac7a7f92dce6c1a5a7', 'name': 'spoolsv_exe_1516.exe'},
  {'md5': '5e4ab0ef8720a60cd986dfbc9673cfad', 'name': 'svchost_exe_1020.exe'},
  {'md5': '3936b4cc812630cc484096da94ca20b0', 'name': 'svchost_exe_1232.exe'},
  {'md5': '3d0857eaadc6f6f2281ace126a153906', 'name': 'svchost_exe_1304.exe'},
  {'md5': '919c1bfa6a544cf585c9e42462b61841', 'name': 'svchost_exe_888.exe'},
  {'md5': 'a128462ec37cffe7e57a1bc7defbc178', 'name': 'svchost_exe_984.exe'},
  {'md5': '0374a3a1689771e93432f8803cc2a09c', 'name': 'svhost_exe_1936.exe'},
  {'md5': 'd41d8cd98f00b204e9800998ecf8427e', 'name': 'System_4.exe'},
  {'md5': 'f5804b6e198a633c3e47703e8792e3d7', 'name': 'winlogon_exe_612.exe'},
  {'md5': 'eb891ddaffbae5e69a28363220ac79ee', 'name': 'wscntfy_exe_1048.exe'}],
 'md5': '359df4feb25af47cfef228f393d07c10'}
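One detail in the output above is worth checking: the md5 listed for System_4.exe is the well-known MD5 of zero bytes, which suggests that particular dump is empty. A minimal check, using three of the entries copied from the output:

```python
import hashlib

# MD5 of an empty byte string.
EMPTY_MD5 = hashlib.md5(b'').hexdigest()

# A few entries copied from the mem_procdump output above.
dumped = {
    'System_4.exe': 'd41d8cd98f00b204e9800998ecf8427e',
    'svhost_exe_1936.exe': '0374a3a1689771e93432f8803cc2a09c',
    'smss_exe_516.exe': 'a11279e7f15a0a9f342e0809d3460e26',
}

# Dumps whose hash matches the empty-input hash contain no data.
empty_dumps = [name for name, md5 in sorted(dumped.items()) if md5 == EMPTY_MD5]

print(empty_dumps)
```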
In [14]:
# Okay nice, now let's look deeper at the files with Workbench
# First the file that we're pretty sure is naughty
c.work_request('view', '0374a3a1689771e93432f8803cc2a09c')
Out[14]:
{'view': {'md5': '0374a3a1689771e93432f8803cc2a09c',
  'view_pe': {'classification': 'Evil!',
   'customer': 'Huge Inc',
   'disass': 'plugin_failed',
   'encoding': 'binary',
   'file_size': 126464,
   'file_type': 'PE32 executable (GUI) Intel 80386, for MS Windows',
   'filename': 'svhost_exe_1936.exe',
   'import_time': '2014-06-16T21:54:21.990000Z',
   'indicators': [{'category': 'PE_WARN',
     'description': 'Suspicious flags set for section 0. Both IMAGE_SCN_MEM_WRITE and IMAGE_SCN_MEM_EXECUTE are set. This might indicate a packed executable.',
     'severity': 2},
    {'category': 'PE_WARN',
     'description': 'Suspicious flags set for section 1. Both IMAGE_SCN_MEM_WRITE and IMAGE_SCN_MEM_EXECUTE are set. This might indicate a packed executable.',
     'severity': 2},
    {'category': 'PE_WARN',
     'description': 'Suspicious flags set for section 2. Both IMAGE_SCN_MEM_WRITE and IMAGE_SCN_MEM_EXECUTE are set. This might indicate a packed executable.',
     'severity': 2},
    {'category': 'PE_WARN',
     'description': 'Suspicious flags set for section 3. Both IMAGE_SCN_MEM_WRITE and IMAGE_SCN_MEM_EXECUTE are set. This might indicate a packed executable.',
     'severity': 2},
    {'category': 'PE_WARN',
     'description': 'Suspicious flags set for section 4. Both IMAGE_SCN_MEM_WRITE and IMAGE_SCN_MEM_EXECUTE are set. This might indicate a packed executable.',
     'severity': 2},
    {'category': 'PE_WARN',
     'description': 'Suspicious flags set for section 5. Both IMAGE_SCN_MEM_WRITE and IMAGE_SCN_MEM_EXECUTE are set. This might indicate a packed executable.',
     'severity': 2},
    {'category': 'PE_WARN',
     'description': "Error parsing the import directory. Invalid Import data at RVA: 0x4ea64 ('Invalid entries in the Import Table. Aborting parsing.')",
     'severity': 2},
    {'category': 'MALFORMED',
     'description': 'Reported Checksum does not match actual checksum',
     'severity': 2},
    {'attributes': True,
     'category': 'MALFORMED',
     'description': 'Corrupted import table',
     'severity': 3},
    {'category': 'MALFORMED',
     'description': 'Image size does not match reported size',
     'severity': 3},
    {'attributes': ['', '', '', '.xxx', '.adata'],
     'category': 'MALFORMED',
     'description': 'Section(s) with a non-standard name, tamper indication',
     'severity': 3}],
   'length': 126464,
   'md5': '0374a3a1689771e93432f8803cc2a09c',
   'mime_type': 'application/x-dosexec',
   'peid_Matches': [],
   'type_tag': 'exe'}}}
In [20]:
# Now smss_exe_516.exe
c.work_request('view', 'a11279e7f15a0a9f342e0809d3460e26')
Out[20]:
{'view': {'md5': 'a11279e7f15a0a9f342e0809d3460e26',
  'view_pe': {'classification': 'Evil!',
   'customer': 'Mega Corp',
   'disass': 'plugin_failed',
   'encoding': 'binary',
   'file_size': 50688,
   'file_type': 'PE32 executable (native) Intel 80386, for MS Windows',
   'filename': 'smss_exe_516.exe',
   'import_time': '2014-06-16T21:54:21.959000Z',
   'indicators': [{'category': 'MALFORMED',
     'description': 'Reported Checksum does not match actual checksum',
     'severity': 2}],
   'length': 50688,
   'md5': 'a11279e7f15a0a9f342e0809d3460e26',
   'mime_type': 'application/x-dosexec',
   'peid_Matches': [],
   'type_tag': 'exe'}}}
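Both files come back with the same 'classification', so the indicator lists are where they differ. The sketch below reduces each list to (category, severity) pairs transcribed from the two view outputs above and summarizes them. The `profile` helper is hypothetical, not a Workbench call.

```python
from collections import Counter

# (category, severity) pairs transcribed from the two 'view' outputs above.
svhost_indicators = [('PE_WARN', 2)] * 7 + [
    ('MALFORMED', 2), ('MALFORMED', 3), ('MALFORMED', 3), ('MALFORMED', 3),
]
smss_indicators = [('MALFORMED', 2)]


def profile(indicators):
    """Summarize an indicator list: total, highest severity, count per category."""
    severities = [sev for _, sev in indicators]
    return {
        'total': len(indicators),
        'max_severity': max(severities) if severities else 0,
        'by_category': dict(Counter(cat for cat, _ in indicators)),
    }


svhost_profile = profile(svhost_indicators)
smss_profile = profile(smss_indicators)

print(svhost_profile)
print(smss_profile)
```

The svhost dump carries eleven indicators, several at severity 3, while the smss dump carries a single checksum mismatch.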
In [16]:
# Virus Total Query (on svhost.exe)
c.work_request('vt_query', '0374a3a1689771e93432f8803cc2a09c')
Out[16]:
{'vt_query': {'file_type': 'PE32 executable (GUI) Intel 80386, for MS Windows',
  'md5': '0374a3a1689771e93432f8803cc2a09c',
  'positives': 17,
  'scan_date': '2013-04-09 02:20:01',
  'scan_results': [['Mal_Otorun7', 2],
   ['Heuristic.LooksLike.Win32.Suspicious.C', 1],
   ['Backdoor.Rbot!2E9D', 1],
   ['Worm:Win32/Pushbot.gen!C', 1],
   ['TR/Autorun.126464', 1]],
  'total': 46}}
In [17]:
# Virus Total Query (on smss_exe_516.exe)
c.work_request('vt_query', 'a11279e7f15a0a9f342e0809d3460e26')
Out[17]:
{'vt_query': {'file_type': 'PE32 executable (native) Intel 80386, for MS Windows',
  'md5': 'a11279e7f15a0a9f342e0809d3460e26',
  'not_found': True}}
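To compare the two vt_query results side by side, a small helper can turn 'positives' and 'total' into a detection ratio while handling the 'not_found' case. The field names are the ones shown in the outputs above, and `detection_ratio` is a hypothetical helper of ours.

```python
def detection_ratio(vt_result):
    """Fraction of engines flagging the file, or None when the hash is unknown."""
    if vt_result.get('not_found'):
        return None
    return float(vt_result['positives']) / vt_result['total']


# Relevant fields copied from the two vt_query outputs above.
svhost_vt = {'positives': 17, 'total': 46}
smss_vt = {'not_found': True}

svhost_ratio = detection_ratio(svhost_vt)
smss_ratio = detection_ratio(smss_vt)

print(svhost_ratio, smss_ratio)
```

A `None` here only means the hash was not found in the query results; it says nothing about whether the file is clean.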

Wrap Up

Well, for this notebook we went from a forensic memory image to a Pandas DataFrame with the power of Rekall (http://www.rekall-forensic.com). We hope this exercise showed some neato functionality using Workbench. We encourage you to check out the GitHub repository and our other notebooks: