Intro material: Generator Pipelines in Python
More advanced, LOTS of exercises: David M. Beazley
Video presentation: V James Powell @ PyData NYC 2012
tl;dr - Good for streaming data and building pipelines that iterate over streaming data.
** How easy is it to get workbench to build streaming data pipeline thingys? **
Run the workbench server (from somewhere, for the demo we're just going to start a local one)
$ workbench_server
# Lets start to interact with workbench, please note there is NO specific client to workbench,
# Just use the ZeroRPC Python, Node.js, or CLI interfaces.
import zerorpc
c = zerorpc.Client()
c.connect("tcp://127.0.0.1:4242")
[None]
# Load in 100 PE Files
import os
file_list = [os.path.join('../data/pe/bad', child) for child in os.listdir('../data/pe/bad')]
file_list += [os.path.join('../data/pe/good', child) for child in os.listdir('../data/pe/good')]
md5_list = []
for filename in file_list:
with open(filename,'rb') as f:
md5_list.append(c.store_sample(f.read(), filename, 'exe'))
print 'Files loaded: %d' % len(md5_list)
md5_list[:5]
Files loaded: 100
['033d91aae8ad29ed9fbb858179271232', '0cb9aa6fb9c4aa3afad7a303e21ac0f3', '0e882ec9b485979ea84c7843d41ba36f', '0e8b030fb6ae48ffd29e520fc16b5641', '0eb9e990c521b30428a379700ec5ab3e']
# Compute pe_features on all files of type pe, just pull back the sparse features
imports = c.batch_work_request('pe_features', {'md5_list': md5_list, 'subkeys':['md5','sparse_features.imported_symbols']})
imports
<generator object iterator at 0x10b545cd0>
# Client generators
# First we're going going to filter PE Files only getting ones with communication related imports
def comm_imports(import_info_list):
comm_imports = ['accept', 'bind', 'connect', 'connectnamedpipe', 'gethostbyname', 'gethostname', 'inet_addr', 'recv', 'send']
for import_info in import_info_list:
md5 = import_info['md5']
import_symbols = import_info['imported_symbols']
if any(comm in sym for comm in comm_imports for sym in import_symbols ):
yield md5
def peid_sigs(md5_list):
for md5 in md5_list:
yield c.work_request('pe_peid', md5)
# Now and only now will our generator pipeline unwind. The work will get pulled from the workbench
# server and ONLY what needs to get processed based on our generator filters will get processed.
# Note: Out of 100 PE Files, only 19 matched our filter, so only 19 will get pulled through the peid
# worker. Imagine a generator pipeline that ended in a dynamic sandbox.. super handy to downselect first.
peid_results = peid_sigs(comm_imports(imports))
for peid_info in peid_results:
print peid_info
{'pe_peid': {'match_list': ['Microsoft Visual C++ v6.0'], 'md5': '0cb9aa6fb9c4aa3afad7a303e21ac0f3'}} {'pe_peid': {'match_list': [], 'md5': '0e8b030fb6ae48ffd29e520fc16b5641'}} {'pe_peid': {'match_list': [], 'md5': '0eb9e990c521b30428a379700ec5ab3e'}} {'pe_peid': {'match_list': ['Microsoft Visual C++ v6.0'], 'md5': '13dcc5b4570180118eb65529b77f6d89'}} {'pe_peid': {'match_list': ['Armadillo v4.x'], 'md5': '1cac80a2147cd8f3860547e43edcaa00'}} {'pe_peid': {'match_list': [], 'md5': '2058c50de5976c67a09dfa5e0e1c7eb5'}} {'pe_peid': {'match_list': ['UPX -> www.upx.sourceforge.net'], 'md5': '2d09e4aff42aebac87ae2fd737aba94f'}} {'pe_peid': {'match_list': [], 'md5': '9ceccd9f32cb2ad0b140b6d15d8993b6'}} {'pe_peid': {'match_list': [], 'md5': 'b681485cb9e0cad73ee85b9274c0d3c2'}} {'pe_peid': {'match_list': [], 'md5': '093dee8d97fd9d35884ed52179b3d142'}} {'pe_peid': {'match_list': [], 'md5': '2352ab5f9f8f097bf9d41d5a4718a041'}} {'pe_peid': {'match_list': ['Installer VISE Custom'], 'md5': '2459a629ace148286360b860442221a2'}} {'pe_peid': {'match_list': ['Microsoft Visual C++ 8'], 'md5': '2cbfc3993bd5134d58d12668f21d91da'}} {'pe_peid': {'match_list': ['Microsoft Visual C++ 8.0 (DLL)'], 'md5': '4d71227301dd8d09097b9e4cc6527e5a'}} {'pe_peid': {'match_list': ['Microsoft Visual C++ 8'], 'md5': '550388c7aecde4bf6fae9f77755d54de'}} {'pe_peid': {'match_list': ['Microsoft Visual C++ 8.0 (DLL)'], 'md5': '8003518cf919df3475531124b327ceb8'}} {'pe_peid': {'match_list': [], 'md5': '803b370865d907ea21dc0c2b6a8936b5'}} {'pe_peid': {'match_list': [], 'md5': '9333096b59dad0c4294749755963dba7'}} {'pe_peid': {'match_list': ['Borland Delphi 3.0 (???)'], 'md5': 'a3661a61f7e7b7d37e6d037ed747e7ef'}}
- Just the import symbols for 100 samples
- Just the PEID signature matches for 19 samples