Fuzzing in the Large

In the past chapters, we have always looked at fuzzing taking place on one machine for a few seconds only. In the real world, however, fuzzers are run on dozens or even thousands of machines; for hours, days and weeks; for one program or dozens of programs. In such contexts, one needs an infrastructure to collect failure data from the individual fuzzer runs, and to aggregate such data in a central repository. In this chapter, we will examine such an infrastructure, the FuzzManager framework from Mozilla.

Prerequisites

In [1]:
import fuzzingbook_utils
In [2]:
import Fuzzer

Synopsis

To use the code provided in this chapter, write

>>> from fuzzingbook.FuzzingInTheLarge import <identifier>

and then make use of the following features.

The Python FuzzManager package allows for programmatic submission of failures from a large number of (fuzzed) programs. One can query crashes and their details, collect them into buckets to ensure they will be treated the same, and also retrieve coverage information for debugging both programs and their tests.

Collecting Crashes from Multiple Fuzzers

So far, all our fuzzing scenarios have been one fuzzer on one machine testing one program. Failures would be shown immediately, and diagnosed quickly by the same person who started the fuzzer. Alas, testing in the real world is different. Fuzzing is still fully automated; but now, we are talking about multiple fuzzers running on multiple machines testing multiple programs (and versions thereof), producing multiple failures that have to be handled by multiple people. This raises the question of how to manage all these activities and their interplay.

A common means to coordinate several fuzzers is to have a central repository that collects all crashes as well as their crash information. Whenever a fuzzer detects a failure, it connects via the network to a crash server, which then stores the crash information in a database.
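
To make the idea concrete, here is a minimal in-process sketch of what such a crash server does with each incoming report: it inserts the report into a database that developers can query later. The table layout and the helpers create_crash_db() and store_crash() are hypothetical illustrations, not FuzzManager's schema or API, and the network part is left out entirely.

```python
import sqlite3

def create_crash_db():
    """Create an in-memory crash database (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute("""CREATE TABLE crashes (
                      id INTEGER PRIMARY KEY AUTOINCREMENT,
                      fuzzer TEXT, product TEXT, version TEXT, trace TEXT)""")
    return db

def store_crash(db, fuzzer, product, version, trace):
    """Store one incoming crash report and return its new crash ID."""
    cur = db.execute(
        "INSERT INTO crashes (fuzzer, product, version, trace) VALUES (?, ?, ?, ?)",
        (fuzzer, product, version, trace))
    db.commit()
    return cur.lastrowid

db = create_crash_db()
# Six fuzzers report crashes to the same central database
for i in range(1, 7):
    store_crash(db, "Fuzzer %d" % i, "simple-crash", "1.0", "crash <- main <- start")

# Developers query the central repository, e.g. the number of crashes per product
for product, count in db.execute("SELECT product, COUNT(*) FROM crashes GROUP BY product"):
    print(product, count)   # prints: simple-crash 6
```

A real crash server would receive these reports over the network and authenticate the senders; the database side, however, is essentially this kind of insert-and-query.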

In [3]:
from graphviz import Digraph
In [4]:
g = Digraph()
server = 'Crash Server'
g.node('Crash Database', shape='cylinder')
for i in range(1, 7):
    g.edge('Fuzzer ' + repr(i), server)
g.edge(server, 'Crash Database')
g
Out[4]:
[Figure: Fuzzer 1 to Fuzzer 6 each send crash reports to the Crash Server, which stores them in the Crash Database.]

The resulting crash database can be queried to find out which failures have occurred – typically, using a Web interface. It can also be integrated with other process activities. Most importantly, entries in the crash database can be linked to the bug database, and vice versa, such that bugs (= crashes) can be assigned to individual developers.

In such an infrastructure, collecting crashes is not limited to fuzzers. Crashes and failures occurring in the wild can also be automatically reported to the crash server. In industry, it is not uncommon to have crash databases collecting thousands of crashes from production runs – especially if the software in question is used by millions of people every day.

What information is stored in such a database?

  • Most important is the identifier of the product – that is, the product name, version information as well as the platform and the operating system. Without this information, there is no way developers can tell whether the bug is still around in the latest version, or whether it has already been fixed.

  • For debugging, the most helpful piece of information for developers is how to reproduce the failure – in a fuzzing scenario, this would be the input to the program in question. (In a production scenario, the user's input is not collected for obvious privacy reasons.)

  • Second most helpful for debugging is a stack trace, which lets developers inspect which internal functionality was active at the moment of the failure. A coverage map also comes in handy, since developers can query which functions were executed and which were not.

  • If general failures are collected, developers also need to know what the expected behavior was; for crashes, this is simple, as users do not expect their software to crash.

All of this information can be collected automatically if the fuzzer (or the program in question) is set up accordingly.
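
The items listed above can be captured in a simple record type. The following dataclass is a hypothetical illustration of such a crash report (its field names are ours, not FuzzManager's); it stores the steps to reproduce as the failure-inducing input, which is absent for crashes collected in production.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class CrashReport:
    # Product identification: tells developers whether the bug affects the latest version
    product: str
    version: str
    platform: str
    os: str
    # Steps to reproduce: in fuzzing, the input that triggered the failure;
    # None for production crashes, where inputs are not collected
    test_case: Optional[bytes] = None
    # Stack trace at the moment of the failure, innermost frame first
    stack: List[str] = field(default_factory=list)
    # Coverage map: names of the functions executed during the run
    covered_functions: List[str] = field(default_factory=list)

    def is_reproducible(self) -> bool:
        """A report that carries its input can be replayed by developers."""
        return self.test_case is not None

fuzzer_report = CrashReport("simple-crash", "83038f7", "x86-64", "linux",
                            test_case=b"", stack=["crash", "main", "start"])
field_report = CrashReport("simple-crash", "83038f7", "x86-64", "linux",
                           stack=["crash", "main", "start"])
print(fuzzer_report.is_reproducible(), field_report.is_reproducible())  # True False
```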

In this chapter, we will explore a platform that automates all these steps. The FuzzManager platform allows you to

  1. collect failure data from failing runs,
  2. enter this data into a centralized server, and
  3. query the server via a Web interface.

In this chapter, we will show how to conduct basic steps with FuzzManager, including crash submission and triage as well as coverage measurement tasks.

Running a Crash Server

FuzzManager is a tool chain for managing large-scale fuzzing processes. It is modular in the sense that you can use only those parts you need; it is versatile in the sense that it does not impose a particular process. It consists of a server that collects crash data, as well as various collector utilities that gather crash data and send it to the server.

Setting up the Server

To run the examples in this notebook, we need to run a crash server – that is, the FuzzManager server. You can either

  1. Run your own server. To do so, you need to follow the installation steps listed under "Server Setup" on the FuzzManager page. The FuzzManager folder should be created in the same folder as this notebook.

  2. Have the notebook start (and stop) a server. The following commands do this automatically. They are meant for the purposes of this notebook only, though; if you want to experiment with your own server, run it manually, as described above.

We start by getting the fresh server code from the repository.

In [5]:
import os
import shutil
In [6]:
if os.path.exists('FuzzManager'):
    shutil.rmtree('FuzzManager')
In [7]:
!git clone https://github.com/MozillaSecurity/FuzzManager
Cloning into 'FuzzManager'...
remote: Enumerating objects: 224, done.
remote: Counting objects: 100% (224/224), done.
remote: Compressing objects: 100% (113/113), done.
remote: Total 9822 (delta 135), reused 170 (delta 110), pack-reused 9598
Receiving objects: 100% (9822/9822), 4.70 MiB | 4.55 MiB/s, done.
Resolving deltas: 100% (6539/6539), done.
In [8]:
!pip install -r FuzzManager/server/requirements.txt > /dev/null
Cannot uninstall 'PyYAML'. It is a distutils installed project and thus we cannot accurately determine which files belong to it which would lead to only a partial uninstall.
You are using pip version 19.0.2, however version 19.2.3 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
In [9]:
!cd FuzzManager/server; python ./manage.py migrate > /dev/null

We create a user named demo with a password demo, using this handy trick.

In [10]:
!(cd FuzzManager/server; echo "from django.contrib.auth import get_user_model; User = get_user_model(); User.objects.create_superuser('demo', '[email protected]', 'demo')" | python manage.py shell)

We create a token for this user. This token will later be used by automatic commands for authentication.

In [11]:
import subprocess
import sys
In [12]:
result = subprocess.run(['python', 'FuzzManager/server/manage.py', 'get_auth_token', 'demo'], 
                        stdout=subprocess.PIPE,
                        stderr=subprocess.PIPE)
err = result.stderr.decode('ascii')
if len(err) > 0:
    print(err, file=sys.stderr, end="")
In [13]:
token = result.stdout
token = token.decode('ascii').strip()
token
Out[13]:
'6eba7ab4427eece25ef428f1e89979f82bfc4e80'
In [14]:
assert len(token) > 10, "Invalid token " + repr(token)

We store the token in the file ~/.fuzzmanagerconf in our home folder, which is where FuzzManager's client tools look for their configuration.

In [15]:
home = os.path.expanduser("~")
conf = os.path.join(home, ".fuzzmanagerconf")
conf
Out[15]:
'/Users/zeller/.fuzzmanagerconf'
In [16]:
fuzzmanagerconf = """
[Main]
sigdir = /home/example/fuzzingbok
serverhost = 127.0.0.1
serverport = 8000
serverproto = http
serverauthtoken = %s
tool = fuzzingbook
""" % token
In [17]:
with open(conf, "w") as file:
    file.write(fuzzmanagerconf)
In [18]:
from pygments.lexers.configs import IniLexer
In [19]:
from fuzzingbook_utils import print_file
In [20]:
print_file(conf, lexer=IniLexer())
[Main]
sigdir = /home/example/fuzzingbok
serverhost = 127.0.0.1
serverport = 8000
serverproto = http
serverauthtoken = 6eba7ab4427eece25ef428f1e89979f82bfc4e80
tool = fuzzingbook

Starting the Server

Once the server is set up, we can start it. On the command line, we use

$ python FuzzManager/server/manage.py runserver

In our notebook, we can do this programmatically, using the Process framework introduced in the chapter on fuzzing Web servers. We let the FuzzManager server run in its own process, which we start in the background.

In [21]:
from multiprocessing import Process
In [22]:
import subprocess
In [23]:
def run_fuzzmanager():
    def run_fuzzmanager_forever():
        # Start the FuzzManager (Django) server and forward its output
        proc = subprocess.Popen(['python', 'FuzzManager/server/manage.py', 'runserver'],
                                stdout=subprocess.PIPE,
                                stdin=subprocess.PIPE,
                                stderr=subprocess.STDOUT,
                                universal_newlines=True)
        while True:
            line = proc.stdout.readline()
            if not line:
                break  # end of output: the server has exited
            print(line, end='')

    # Run the server in a separate process in the background
    fuzzmanager_process = Process(target=run_fuzzmanager_forever)
    fuzzmanager_process.start()

    return fuzzmanager_process

While the server is running, you will be able to see its output below.

In [24]:
fuzzmanager_process = run_fuzzmanager()
In [25]:
import time
In [26]:
time.sleep(2)
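
A fixed sleep is a guess: on a slow machine, two seconds may not suffice for the server to come up. A somewhat more robust alternative, sketched below as a hypothetical helper (not part of FuzzManager or the book's utilities), is to poll the server's port until it accepts TCP connections.

```python
import socket
import time

def wait_for_server(host, port, timeout=30.0, interval=0.2):
    """Poll until a TCP connection to (host, port) succeeds or timeout expires.
    Returns True if the server accepted a connection, False otherwise."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: server not listening yet
            time.sleep(interval)
    return False
```

In this notebook, one could call wait_for_server("127.0.0.1", 8000) in place of time.sleep(2). Note that an open port only tells us that the server is listening; it does not guarantee that every part of the Web application is ready.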

Logging In

FuzzManager can now be reached on the local host at http://127.0.0.1:8000. To log in, use the username demo and the password demo. In this notebook, we do this programmatically, using the Selenium interface introduced in the chapter on GUI fuzzing.

In [27]:
fuzzmanager_url = "http://127.0.0.1:8000"
In [28]:
from IPython.display import display, Image
In [29]:
from fuzzingbook_utils import HTML, rich_output
In [30]:
from GUIFuzzer import start_webdriver  # minor dependency

For an interactive session, set headless to False; then you can interact with FuzzManager at the same time you are interacting with this notebook.

In [31]:
gui_driver = start_webdriver(headless=True, zoom=1.2)
In [32]:
gui_driver.set_window_size(1400, 600)
In [33]:
gui_driver.get(fuzzmanager_url)

This is the starting screen of FuzzManager:

In [34]:
Image(gui_driver.get_screenshot_as_png())
Out[34]:

We now log in by sending demo both as username and password, and then click on the Login button.

In [35]:
# find_element_by_*() helpers were deprecated in Selenium 4 and later removed
from selenium.webdriver.common.by import By
username = gui_driver.find_element(By.NAME, "username")
username.send_keys("demo")
In [36]:
password = gui_driver.find_element(By.NAME, "password")
password.send_keys("demo")
In [37]:
login = gui_driver.find_element(By.TAG_NAME, "button")
login.click()
time.sleep(1)

After login, we find an empty database. This is where crashes will appear, once we have collected them.

In [38]:
Image(gui_driver.get_screenshot_as_png())
Out[38]:

Collecting Crashes

To fill our database, we need some crashes. Let us take a look at simply-buggy, an example repository containing trivial C++ programs for illustration purposes.

In [39]:
!git clone https://github.com/choller/simply-buggy
Cloning into 'simply-buggy'...
remote: Enumerating objects: 22, done.
remote: Total 22 (delta 0), reused 0 (delta 0), pack-reused 22
Unpacking objects: 100% (22/22), done.

The make command compiles our target programs, including our first target, the simple-crash example. Alongside the program, a configuration file is generated as well.

In [40]:
!(cd simply-buggy && make)
clang++ -fsanitize=address -g -o maze maze.cpp
clang++ -fsanitize=address -g -o out-of-bounds out-of-bounds.cpp
clang++ -fsanitize=address -g -o simple-crash simple-crash.cpp

Let's take a look at the simple-crash source code. As you can see, the source code is fairly simple: A forced crash by writing to a (near)-NULL pointer. This should immediately crash on most machines.

In [41]:
from fuzzingbook_utils import print_file
In [42]:
print_file("simply-buggy/simple-crash.cpp")
/*
 * simple-crash - A simple NULL crash.
 *
 * WARNING: This program neither makes sense nor should you code like it is
 *          done in this program. It is purely for demo purposes and uses
 *          bad and meaningless coding habits on purpose.
 */

int crash() {
  int* p = (int*)0x1;
  *p = 0xDEADBEEF;
  return *p;
}

int main(int argc, char** argv) {
  return crash();
}

The configuration file generated for the binary also contains some straightforward information, like the version of the program and other metadata that is required, or at least useful, later on when submitting crashes.

In [43]:
print_file("simply-buggy/simple-crash.fuzzmanagerconf", lexer=IniLexer())
[Main]
platform = x86-64
product = simple-crash-simple-crash
product_version = 83038f74e812529d0fc172a718946fbec385403e
os = linux

[Metadata]
pathPrefix = /Users/zeller/Projects/fuzzingbook/notebooks/simply-buggy/
buildFlags = -fsanitize=address -g

Let us run the program! We immediately get a crash trace as expected:

In [44]:
!simply-buggy/simple-crash
AddressSanitizer:DEADLYSIGNAL
=================================================================
==13283==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000001 (pc 0x000109132e78 bp 0x7ffee6acd460 sp 0x7ffee6acd430 T0)
==13283==The signal is caused by a WRITE memory access.
==13283==Hint: address points to the zero page.
    #0 0x109132e77 in crash() simple-crash.cpp:11
    #1 0x109132efa in main simple-crash.cpp:16
    #2 0x7fff585c53d4 in start (libdyld.dylib:x86_64+0x163d4)

==13283==Register values:
rax = 0x0000000000000001  rbx = 0x0000000000000000  rcx = 0x0000000000000001  rdx = 0x0000100000000000  
rdi = 0x0000000000000000  rsi = 0x0000100000000000  rbp = 0x00007ffee6acd460  rsp = 0x00007ffee6acd430  
 r8 = 0x0000000000000000   r9 = 0x0000000000000000  r10 = 0x0000000000000000  r11 = 0x0000000000000000  
r12 = 0x0000000000000000  r13 = 0x0000000000000000  r14 = 0x0000000000000000  r15 = 0x0000000000000000  
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV simple-crash.cpp:11 in crash()
==13283==ABORTING

Now, what we would actually like to do is to run this binary from Python instead, detect that it crashed, collect the trace and submit it to the server. Let's start with a simple script that just runs the program we give it and detects the presence of an AddressSanitizer (ASan) trace:

In [45]:
import subprocess
In [46]:
cmd = ["simply-buggy/simple-crash"]
In [47]:
result = subprocess.run(cmd, stderr=subprocess.PIPE)
stderr = result.stderr.decode().splitlines()
crashed = False

for line in stderr:
    if "ERROR: AddressSanitizer" in line:
        crashed = True
        break

if crashed:
    print("Yay, we crashed!")
else:
    print("Move along, nothing to see...")
Yay, we crashed!
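
The crash check can be packaged as a small reusable helper, which comes in handy once we run many programs. The function names asan_crashed() and run_and_check() are our own. As a self-check, the sketch runs a stand-in Python process that merely prints an ASan-like error line, since the simply-buggy binaries may not be available everywhere.

```python
import subprocess
import sys

def asan_crashed(stderr_lines):
    """True if any line on stderr contains an AddressSanitizer error report."""
    return any("ERROR: AddressSanitizer" in line for line in stderr_lines)

def run_and_check(cmd):
    """Run cmd; return (crashed, stdout_lines, stderr_lines)."""
    result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    stdout = result.stdout.decode(errors="replace").splitlines()
    stderr = result.stderr.decode(errors="replace").splitlines()
    return asan_crashed(stderr), stdout, stderr

# Stand-in for a crashing binary: writes an ASan-like report line to stderr
fake_cmd = [sys.executable, "-c",
            "import sys; sys.stderr.write('==1==ERROR: AddressSanitizer: SEGV\\n')"]
crashed, out, err = run_and_check(fake_cmd)
print(crashed)   # True
```

With the real binary, the call would be run_and_check(["simply-buggy/simple-crash"]).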

With this script, we can now run the binary and indeed detect that it crashed. But how do we send this information to the crash server now? Let's add a few features from the FuzzManager toolbox.

Program Configurations

A ProgramConfiguration is largely a container class storing various properties of the program, e.g. product name, the platform, version and runtime options. By default, it reads the information from the .fuzzmanagerconf file created for the program under test.

In [48]:
from FTB.ProgramConfiguration import ProgramConfiguration
In [49]:
configuration = ProgramConfiguration.fromBinary('simply-buggy/simple-crash')
(configuration.product, configuration.platform)
Out[49]:
('simple-crash-simple-crash', 'x86-64')
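
Judging from the file names above, the configuration sits next to the binary as <binary>.fuzzmanagerconf and uses INI syntax. The following sketch parses such a file with Python's configparser module to show what kind of data ends up in a ProgramConfiguration. The helpers conf_path() and read_program_conf() are hypothetical, and FuzzManager's own parsing may differ in detail.

```python
import configparser

def conf_path(binary):
    # Assumption based on the files above: simple-crash -> simple-crash.fuzzmanagerconf
    return binary + ".fuzzmanagerconf"

def read_program_conf(text):
    """Parse a per-program configuration into a plain dict (illustration only)."""
    parser = configparser.ConfigParser()
    parser.optionxform = str   # preserve key case, e.g. 'buildFlags'
    parser.read_string(text)
    conf = dict(parser["Main"])
    conf["metadata"] = dict(parser["Metadata"]) if parser.has_section("Metadata") else {}
    return conf

SAMPLE = """
[Main]
platform = x86-64
product = simple-crash-simple-crash
product_version = 83038f74e812529d0fc172a718946fbec385403e
os = linux

[Metadata]
buildFlags = -fsanitize=address -g
"""

conf = read_program_conf(SAMPLE)
print(conf["product"], conf["platform"])   # simple-crash-simple-crash x86-64
```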

Crash Info

A CrashInfo object stores all the necessary data about a crash, including

  • the stdout output of your program
  • the stderr output of your program
  • crash information as produced by GDB or AddressSanitizer
  • a ProgramConfiguration instance
In [50]:
from FTB.Signatures.CrashInfo import CrashInfo

Let's collect the information for the run of simple-crash:

In [51]:
cmd = ["simply-buggy/simple-crash"]
result = subprocess.run(cmd, stderr=subprocess.PIPE, stdout=subprocess.PIPE)
In [52]:
stderr = result.stderr.decode().splitlines()
stderr[0:3]
Out[52]:
['AddressSanitizer:DEADLYSIGNAL',
 '=================================================================',
 '==13300==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000001 (pc 0x000101d57e78 bp 0x7ffeedea8430 sp 0x7ffeedea8400 T0)']
In [53]:
stdout = result.stdout.decode().splitlines()
stdout
Out[53]:
[]

The CrashInfo.fromRawCrashData() method reads and parses our ASan trace into a more generic format, returning a CrashInfo object that we can inspect and/or submit to the server:

In [54]:
crashInfo = CrashInfo.fromRawCrashData(stdout, stderr, configuration)
print(crashInfo)
Crash trace:

# 00    crash
# 01    main
# 02    start

Crash address: 0x1

Last 5 lines on stderr:
 r8 = 0x0000000000000000   r9 = 0x0000000000000000  r10 = 0x0000000000000000  r11 = 0x0000000000000000  
r12 = 0x0000000000000000  r13 = 0x0000000000000000  r14 = 0x0000000000000000  r15 = 0x0000000000000000  
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV simple-crash.cpp:11 in crash()
==13300==ABORTING
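
To get a feeling for how such a condensed trace can be derived from the raw ASan output, here is a much simplified parser that extracts the function names from the numbered frames and the faulting address from the error line. It is a hypothetical sketch, not FuzzManager's parser, which handles many more output formats and corner cases.

```python
import re

# "#0 0x109132e77 in crash() simple-crash.cpp:11" -> "crash"
FRAME_RE = re.compile(r"^\s*#\d+\s+0x[0-9a-fA-F]+\s+in\s+([^\s(]+)")
# "... SEGV on unknown address 0x000000000001 (pc ..." -> 0x1
ADDRESS_RE = re.compile(r"on unknown address (0x[0-9a-fA-F]+)")

def parse_asan(stderr_lines):
    """Return (function names, innermost first; crash address or None)."""
    frames, address = [], None
    for line in stderr_lines:
        frame = FRAME_RE.match(line)
        if frame:
            frames.append(frame.group(1))
            continue
        addr = ADDRESS_RE.search(line)
        if addr and address is None:
            address = int(addr.group(1), 16)
    return frames, address

asan_lines = [
    "==13283==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000001 "
    "(pc 0x000109132e78 bp 0x7ffee6acd460 sp 0x7ffee6acd430 T0)",
    "    #0 0x109132e77 in crash() simple-crash.cpp:11",
    "    #1 0x109132efa in main simple-crash.cpp:16",
    "    #2 0x7fff585c53d4 in start (libdyld.dylib:x86_64+0x163d4)",
]
frames, address = parse_asan(asan_lines)
print(frames, hex(address))   # ['crash', 'main', 'start'] 0x1
```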

Collector

The last step is to send the crash info to our crash manager. A Collector is the client-side component that communicates with a CrashManager server. It provides an easy interface that allows your clients to submit crashes as well as to download and match existing signatures, so that frequently occurring issues are not reported over and over again.

In [55]:
from Collector.Collector import Collector

We create a collector instance; this will be our entry point for talking to the server.

In [56]:
collector = Collector()

To submit the crash info, we use the collector's submit() method:

In [57]:
collector.submit(crashInfo);

Inspecting Crashes

We have now submitted a crash to our local FuzzManager demo instance. If you run the crash server on your local machine, you can go to http://127.0.0.1:8000/crashmanager/crashes/, where you should see the crash info just submitted. There, you can inspect the product, version, operating system, and further crash details.

In [58]:
gui_driver.refresh()
In [59]:
Image(gui_driver.get_screenshot_as_png())
Out[59]:

If you click on the crash ID, you can further inspect the submitted data.

In [60]:
from selenium.webdriver.common.by import By
crash = gui_driver.find_element(By.XPATH, '//td/a[contains(@href,"/crashmanager/crashes/")]')
crash.click()
time.sleep(1)
In [61]:
Image(gui_driver.get_screenshot_as_png())
Out[61]:

Since Collectors can be called from any program (provided they are configured to talk to the correct server), you can now collect crashes from anywhere – fuzzers on remote machines, crashes occurring during beta testing, or even crashes during production.

Crash Buckets

One challenge with collecting crashes is that the same crashes occur multiple times. If a product is in the hands of millions of users, chances are that thousands of them will encounter the same bug, and thus the same crash. Therefore, the database will have thousands of entries that are all caused by one and the same bug. Hence, it is necessary to identify those failures that are similar and to group them together in a set called a crash bucket, or bucket for short.

In FuzzManager, a bucket is defined through a crash signature, a list of predicates matching a set of crashes. Such a predicate can refer to a number of features, the most important being

  • the current program counter, reporting the instruction executed at the moment of the crash;
  • elements from the stack trace, showing which functions were active at the moment of the crash.
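
As a toy illustration of bucketing (hypothetical code with made-up crash data, far simpler than FuzzManager's signatures), the sketch below groups crash reports by their innermost stack frames plus the crash address. It deliberately avoids the raw program counter: in the two runs of simple-crash shown earlier, the pc differs (0x000109132e78 vs. 0x000101d57e78), presumably because of address space layout randomization, so raw code addresses make a poor bucket key on their own.

```python
from collections import defaultdict

def bucket_key(frames, crash_address, depth=3):
    """Hypothetical signature: the `depth` innermost frames plus the crash address."""
    return (tuple(frames[:depth]), crash_address)

def bucket_crashes(crashes, depth=3):
    """Group (crash_id, frames, crash_address) triples into buckets."""
    buckets = defaultdict(list)
    for crash_id, frames, crash_address in crashes:
        buckets[bucket_key(frames, crash_address, depth)].append(crash_id)
    return dict(buckets)

crashes = [
    (1, ["crash", "main", "start"], 0x1),
    (2, ["crash", "main", "start"], 0x1),
    (3, ["printLast", "validateAndPerformAction", "main"], 0x602000000020),
    (4, ["crash", "main", "start"], 0x1),
]
buckets = bucket_crashes(crashes)
for (frames, address), ids in buckets.items():
    print(frames[0], hex(address), ids)
```

Four reports end up in two buckets: crashes 1, 2, and 4 share a key, while crash 3 gets a bucket of its own.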

We can create such a signature right away when viewing a single crash:

In [62]:
Image(gui_driver.get_screenshot_as_png())
Out[62]:

Clicking the red Create button creates a bucket for this crash. A crash signature will be proposed to you for matching this and future crashes of the same type:

In [63]:
from selenium.webdriver.common.by import By
create = gui_driver.find_element(By.XPATH, '//a[contains(@href,"/signatures/new/")]')
create.click()
time.sleep(1)
In [64]:
gui_driver.set_window_size(1400, 1200)
In [65]:
Image(gui_driver.get_screenshot_as_png())
Out[65]:

Accept it by clicking Save.

In [66]:
from selenium.webdriver.common.by import By
save = gui_driver.find_element(By.NAME, "submit_save")
save.click()
time.sleep(1)

You will be redirected to the newly created bucket, which shows you its size (how many crashes it holds), its bug report status (buckets can be linked to bugs in an external bug tracker like Bugzilla), and a lot of other useful information.

Crash Signatures

If you click on the Signatures entry in the top menu, you should also see your newly created entry.

In [67]:
gui_driver.set_window_size(1400, 800)
Image(gui_driver.get_screenshot_as_png())
Out[67]:

You see that this signature refers to a crash occurring in the function crash() (duh!), which was called from main(), which in turn was called from start() (an internal OS function). We also see the current crash address.

Buckets and their signatures are a central concept in FuzzManager. If you receive a lot of crash reports from various sources, bucketing allows you to easily group crashes and filter duplicates.

Coarse-Grained Signatures

The flexible signature system starts out with an initially proposed fine-grained signature, but it can be adjusted as needed to capture variations of the same bug and make tracking easier.

Next, we look at a more complex program that reads data from a file and produces multiple crash signatures.

In [68]:
print_file("simply-buggy/out-of-bounds.cpp")
/*
 * out-of-bounds - A simple multi-signature out-of-bounds demo.
 *
 * WARNING: This program neither makes sense nor should you code like it is
 *          done in this program. It is purely for demo purposes and uses
 *          bad and meaningless coding habits on purpose.
 */
#include <cstring>
#include <fstream>
#include <iostream>

void printFirst(char* data, size_t count) {
  std::string first(data, count);
  std::cout << first << std::endl;
}

void printLast(char* data, size_t count) {
  std::string last(data + strlen(data) - count, count);
  std::cout << last << std::endl;
}

int validateAndPerformAction(char* buffer, size_t size) {
  if (size < 2) {
    std::cerr << "Buffer is too short." << std::endl;
    return 1;
  }

  uint8_t action = buffer[0];
  uint8_t count = buffer[1];
  char* data = buffer + 2;

  if (!count) {
    std::cerr << "count must be non-zero." << std::endl;
    return 1;
  }

  // Forgot to check count vs. the length of data here, doh!

  if (!action) {
    std::cerr << "Action can't be zero." << std::endl;
    return 1;
  } else if (action >= 128) {
    printLast(data, count);
    return 0;
  } else {
    printFirst(data, count);
    return 0;
  }
}

int main(int argc, char** argv) {
  if (argc < 2) {
    std::cerr << "Usage is: " << argv[0] << " <file>" << std::endl;
    exit(1);
  }

  std::ifstream input(argv[1], std::ifstream::binary);
  if (!input) {
    std::cerr << "Error opening file." << std::endl;
    exit(1);
  }

  input.seekg(0, input.end);
  int size = input.tellg();
  input.seekg(0, input.beg);

  if (size < 0) {
    std::cerr << "Error seeking in file." << std::endl;
    exit(1);
  }

  char* buffer = new char[size];
  input.read(buffer, size);

  if (!input) {
    std::cerr << "Error while reading file." << std::endl;
    exit(1);
  }

  int ret = validateAndPerformAction(buffer, size);

  delete[] buffer;
  return ret;
}

This program looks way more elaborate compared to the last one, but don't worry, it is not really doing a whole lot:

  • The code in the main() function simply reads a file provided on the command line and puts its contents into a buffer that is passed to validateAndPerformAction().

  • That validateAndPerformAction() function pulls out two bytes of the buffer (action and count) and treats the rest as data. Depending on the value of action, it then calls either printFirst() or printLast(), which prints either the first or the last count bytes of data.

If this sounds pointless, that is because it is. The whole idea of this program is that the security check (that count is not larger than the length of data) is missing in validateAndPerformAction() but that the illegal access happens later in either of the two print functions. Hence, we would expect this program to generate at least two (slightly) different crash signatures - one with printFirst() and one with printLast().
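
The dispatch logic of validateAndPerformAction() can be mirrored in a few lines of Python, which helps to construct inputs for either path. The helpers make_input() and dispatch() are hypothetical. The sketch only models the missing count check described above, not the additional hazard that printLast() calls strlen() on data that need not be NUL-terminated.

```python
def make_input(action, count, data):
    """Build an input for out-of-bounds: action byte, count byte, then the data."""
    return bytes([action, count]) + data

def dispatch(buf):
    """Return (target, count_exceeds_data) as validateAndPerformAction() decides."""
    if len(buf) < 2:
        return ("Buffer is too short.", False)
    action, count, data = buf[0], buf[1], buf[2:]
    if count == 0:
        return ("count must be non-zero.", False)
    if action == 0:
        return ("Action can't be zero.", False)
    target = "printLast" if action >= 128 else "printFirst"
    # The check the C++ code forgets: count may exceed the available data
    return (target, count > len(data))

print(dispatch(make_input(1, 200, b"abc")))    # ('printFirst', True)
print(dispatch(make_input(200, 2, b"abc")))    # ('printLast', False)
```

Writing make_input(1, 200, b"abc") to a file and passing it to simply-buggy/out-of-bounds should take the printFirst() path with a count far beyond the three available data bytes.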

Let's try it out with very simple fuzzing based on the last Python script.

In [69]:
import os
import random
import subprocess
import tempfile
import sys

Since FuzzManager can have trouble with 8-bit characters in the input, we introduce an escapelines() function that converts raw bytes into lines of ASCII-only text, escaping every byte above 127 as \xNN.

In [70]:
def isascii(s):
    return all([0 <= ord(c) <= 127 for c in s])
In [71]:
isascii('Hello,')
Out[71]:
True
In [72]:
def escapelines(data):
    """Split raw bytes into lines, escaping non-ASCII bytes as \\xNN."""
    def ascii_chr(byte):
        # Keep 7-bit ASCII bytes as they are; escape everything else
        if 0 <= byte <= 127:
            return chr(byte)
        return r"\x%02x" % byte

    def escape_line(line):
        ret = "".join(map(ascii_chr, line))
        assert isascii(ret)
        return ret

    return [escape_line(line) for line in data.splitlines()]
In [73]:
escapelines(b"Hello,\nworld!")
Out[73]:
['Hello,', 'world!']
In [74]:
escapelines(b"abc\xffABC")
Out[74]:
['abc\\xffABC']

Now to the actual script. As above, we set up a collector that collects and sends crash info whenever a crash occurs.

In [75]:
cmd = ["simply-buggy/out-of-bounds"]

# Connect to crash server
collector = Collector()

random.seed(2048)

crash_count = 0
TRIALS = 20

for itnum in range(0, TRIALS):
    rand_len = random.randint(1, 1024)
    rand_data = bytes([random.randrange(0, 256) for i in range(rand_len)])

    (fd, current_file) = tempfile.mkstemp(prefix="fuzztest", text=True)
    os.write(fd, rand_data)
    os.close(fd)
    
    current_cmd = []
    current_cmd.extend(cmd)
    current_cmd.append(current_file)
    
    result = subprocess.run(current_cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    stdout = []   # escapelines(result.stdout)
    stderr = escapelines(result.stderr)
    crashed = False

    for line in stderr:
        if "ERROR: AddressSanitizer" in line:
            crashed = True
            break
            
    print(itnum, end=" ")

    if crashed:
        sys.stdout.write("(Crash) ")
        
        # This reads the out-of-bounds.fuzzmanagerconf file
        configuration = ProgramConfiguration.fromBinary(cmd[0])

        # This reads and parses our ASan trace into a more generic format,
        # returning us a generic "CrashInfo" object that we can inspect
        # and/or submit to the server.
        crashInfo = CrashInfo.fromRawCrashData(stdout, stderr, configuration)

        # Submit the crash
        collector.submit(crashInfo, testCase = current_file)
        
        crash_count += 1
    
    os.remove(current_file)

print("")
print("Done, submitted %d crashes after %d runs." % (crash_count, TRIALS))
0 (Crash) 1 2 (Crash) 3 4 5 6 7 8 (Crash) 9 10 11 12 (Crash) 13 14 (Crash) 15 16 17 18 19 
Done, submitted 5 crashes after 20 runs.

If you run this script, you will see its progress and notice that it produces quite a few crashes. And indeed, if you visit the FuzzManager crashes page, you will notice a variety of crashes that have accumulated:

In [76]:
gui_driver.get(fuzzmanager_url + "/crashmanager/crashes")
In [77]:
Image(gui_driver.get_screenshot_as_png())
Out[77]:

Pick the first crash and create a bucket for it, like you did the last time. After saving, you will notice that not all of your crashes went into the bucket. The reason is that our program created several different stacks that are somewhat similar but not exactly identical. This is a common problem when fuzzing real world applications.

Fortunately, there is an easy way to deal with this. While on the bucket page, hit the Optimize button for the bucket. FuzzManager will then automatically propose a change to your signature. Accept the change by hitting Edit with Changes and then Save. Repeat these steps until all crashes are part of the bucket. After 3 to 4 iterations, your signature will likely look like this:

{
  "symptoms": [
    {
      "type": "output",
      "src": "stderr",
      "value": "/ERROR: AddressSanitizer: heap-buffer-overflow/"
    },
    {
      "type": "stackFrames",
      "functionNames": [
        "?",
        "?",
        "?",
        "validateAndPerformAction",
        "main",
        "__libc_start_main",
        "_start"
      ]
    },
    {
      "type": "crashAddress",
      "address": "> 0xFF"
    }
  ]
}

As you can see in the stackFrames symptom of the signature, the validateAndPerformAction function is still present in the stack, because this function is common across all stack traces in all crashes; in fact, this is where the bug lives. The innermost stack frames, however, vary across the set of submitted crashes and have therefore been generalized into arbitrary functions (?).
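
To make the semantics of such a signature more tangible, the following sketch evaluates the three symptoms of the signature above against a crash. It is a hypothetical, simplified matcher, not FuzzManager's implementation. In particular, we assume that ? matches exactly one arbitrary frame, that the frame list is compared against the innermost frames, and that output values enclosed in slashes are regular expressions. The example stacks and addresses are made up.

```python
import json
import operator
import re

SIGNATURE = json.loads("""
{"symptoms": [
  {"type": "output", "src": "stderr",
   "value": "/ERROR: AddressSanitizer: heap-buffer-overflow/"},
  {"type": "stackFrames",
   "functionNames": ["?", "?", "?", "validateAndPerformAction", "main",
                     "__libc_start_main", "_start"]},
  {"type": "crashAddress", "address": "> 0xFF"}
]}
""")

COMPARE = {">": operator.gt, "<": operator.lt, "==": operator.eq}

def symptom_holds(symptom, stderr_lines, frames, crash_address):
    kind = symptom["type"]
    if kind == "output":
        value = symptom["value"]
        if value.startswith("/") and value.endswith("/"):   # regular expression
            return any(re.search(value[1:-1], line) for line in stderr_lines)
        return any(value in line for line in stderr_lines)
    if kind == "stackFrames":
        names = symptom["functionNames"]
        return len(frames) >= len(names) and all(
            name in ("?", frame) for name, frame in zip(names, frames))
    if kind == "crashAddress":
        op, limit = symptom["address"].split()
        return COMPARE[op](crash_address, int(limit, 16))
    raise ValueError("unsupported symptom type: " + kind)

def matches(signature, stderr_lines, frames, crash_address):
    """A crash belongs to the bucket if all symptoms hold."""
    return all(symptom_holds(s, stderr_lines, frames, crash_address)
               for s in signature["symptoms"])

overflow = ["==42==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000000031"]
tail = ["validateAndPerformAction", "main", "__libc_start_main", "_start"]
stack_first = ["__asan_memcpy", "std::string::string", "printFirst"] + tail
stack_last = ["__asan_memcpy", "std::string::string", "printLast"] + tail
print(matches(SIGNATURE, overflow, stack_first, 0x602000000031),
      matches(SIGNATURE, overflow, stack_last, 0x602000000031))   # True True
```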

The Optimize function is designed to automate this process as much as possible: It attempts to broaden the signature by fitting it to untriaged crashes and then checks whether the modified signature would touch other existing buckets. This works under the assumption that other buckets are indeed other bugs; that is, if you had created two buckets from your crashes first, optimizing would not work anymore. Also, if the existing bucket data is sparse and you have a lot of untriaged crashes, the algorithm could propose changes that include crashes of different bugs in the same bucket. There is no way to fully automatically detect and prevent this; hence, the process is semi-automated and requires you to review all proposed changes.
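
The broaden-and-check idea can be sketched in a few lines. This is a hypothetical, greedy simplification, not FuzzManager's actual Optimize algorithm: stacks are plain lists of function names of equal length, ? matches any single frame, a pattern is broadened by turning every disagreeing position into ?, and a proposal is dropped if it would also match a crash from another bucket. As with the real feature, the result still needs human review.

```python
def frames_match(pattern, frames):
    """'?' matches any single frame; other entries must match exactly."""
    return len(pattern) == len(frames) and all(
        p in ("?", f) for p, f in zip(pattern, frames))

def broaden(pattern, frames):
    """Replace every position where pattern and frames disagree by '?'."""
    return [p if p in ("?", f) else "?" for p, f in zip(pattern, frames)]

def optimize(pattern, untriaged, other_buckets):
    """Greedily broaden pattern to cover untriaged stacks, rejecting any
    proposal that would also match a stack from another bucket."""
    for frames in untriaged:
        if len(frames) != len(pattern) or frames_match(pattern, frames):
            continue  # different shape, or already covered
        proposal = broaden(pattern, frames)
        if any(frames_match(proposal, other) for other in other_buckets):
            continue  # would swallow another bug: leave it for manual triage
        pattern = proposal
    return pattern

pattern = ["__asan_memcpy", "printFirst", "validateAndPerformAction", "main"]
untriaged = [
    ["__interceptor_strlen", "printLast", "validateAndPerformAction", "main"],
    ["__asan_memcpy", "printLast", "validateAndPerformAction", "main"],
]
other_buckets = [["crash", "main", "start"]]
print(optimize(pattern, untriaged, other_buckets))
# ['?', '?', 'validateAndPerformAction', 'main']
```

The second test case below shows the safeguard: if the broadened pattern would also cover another bucket's crash, the original pattern is kept.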

Collecting Code Coverage

In the chapter on coverage, we have seen how measuring code coverage can be beneficial to assess fuzzer performance. Holes in code coverage can reveal particularly hard-to-reach locations as well as bugs in the fuzzer itself. Because this is an important part of the overall fuzzing operations, FuzzManager supports visualizing the code coverage that fuzzing achieves on a repository – that is, we can interactively inspect which code was covered during fuzzing, and which was not.

To illustrate coverage collection and visualization in FuzzManager, we take a look at another simple C++ program, the maze example:

In [78]:
print_file("simply-buggy/maze.cpp")
/*
 * maze - A simple constant maze that crashes at some point.
 *
 * WARNING: This program neither makes sense nor should you code like it is
 *          done in this program. It is purely for demo purposes and uses
 *          bad and meaningless coding habits on purpose.
 */

#include <cstdlib>
#include <iostream>

int boom() {
  int* p = (int*)0x1;
  *p = 0xDEADBEEF;
  return *p;
}

int main(int argc, char** argv) {
  if (argc != 5) {
    std::cerr << "All I'm asking for is four numbers..." << std::endl;
    return 1;
  }

  int num1 = atoi(argv[1]);
  if (num1 > 0) {
    int num2 = atoi(argv[2]);
    if (num1 > 2040109464) {
      if (num2 < 0) {
        std::cerr << "You found secret 1" << std::endl;
        return 0;
      }
    } else {
      if ((unsigned int)num2 == 3735928559) {
        unsigned int num3 = atoi(argv[3]);
        if (num3 == 3405695742) {
          int num4 = atoi(argv[4]);
          if (num4 == 1111638594) {
            std::cerr << "You found secret 2" << std::endl;
            boom();
            return 0;
          }
        }
      }
    }
  }

  return 0;
}

As you can see, all this program does is read some numbers from the command line, compare them to some magical constants and arbitrary criteria, and if everything works out, you reach one of the two secrets in the program. Reaching the second secret also triggers a crash, through the call to boom().
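
The nested conditions in main() can be mirrored in Python to see exactly which inputs reach which secret. The sketch below is hypothetical. It assumes purely numeric arguments and models out-of-range numbers the way atoi() typically behaves on 64-bit platforms, namely by wrapping them to 32 bits (C itself does not guarantee this).

```python
def to_int32(value):
    """Wrap an integer to a signed 32-bit value (two's complement)."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value

def maze_outcome(args):
    """Mirror of maze's main(); args are argv[1:] as decimal strings."""
    if len(args) != 4:
        return "usage error"
    num1 = to_int32(int(args[0]))
    if num1 > 0:
        num2 = to_int32(int(args[1]))
        if num1 > 2040109464:
            if num2 < 0:
                return "secret 1"
        elif (num2 & 0xFFFFFFFF) == 3735928559:           # (unsigned int)num2
            num3 = to_int32(int(args[2])) & 0xFFFFFFFF    # stored as unsigned int
            if num3 == 3405695742:
                num4 = to_int32(int(args[3]))
                if num4 == 1111638594:
                    return "secret 2 (crash)"
    return "nothing found"

print(maze_outcome(["2100000000", "-5", "0", "0"]))                    # secret 1
print(maze_outcome(["1", "3735928559", "3405695742", "1111638594"]))   # secret 2 (crash)
```

The mirror makes explicit that reaching secret 2 requires a small positive first argument plus exact values for the second, third, and fourth arguments.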

Before we start to work on this program, we recompile the programs with coverage support. In order to emit code coverage with either Clang or GCC, programs typically need to be built and linked with special CFLAGS like --coverage. In our case, the Makefile does this for us:

In [79]:
!(cd simply-buggy && make clean && make coverage)
rm -f ./maze ./out-of-bounds ./simple-crash
clang++ -fsanitize=address -g --coverage -o maze maze.cpp
clang++ -fsanitize=address -g --coverage -o out-of-bounds out-of-bounds.cpp
clang++ -fsanitize=address -g --coverage -o simple-crash simple-crash.cpp

Also, if we want to use FuzzManager to look at our code, we need to do the initial repository setup (essentially giving the server its own working copy of our git repository to pull the source from). Normally, the client and server run on different machines, so this involves checking out the repository on the server and telling it where to find it (and what version control system it uses):

In [80]:
!git clone https://github.com/choller/simply-buggy $HOME/simply-buggy-server    
Cloning into '/Users/zeller/simply-buggy-server'...
remote: Enumerating objects: 22, done.
remote: Total 22 (delta 0), reused 0 (delta 0), pack-reused 22
Unpacking objects: 100% (22/22), done.
In [81]:
!python3 FuzzManager/server/manage.py setup_repository simply-buggy GITSourceCodeProvider $HOME/simply-buggy-server
Successfully created repository 'simply-buggy' with provider 'GITSourceCodeProvider' located at /Users/zeller/simply-buggy-server

We now assume that we know some of the magic constants (as in practice, where we often know some things about the target but might miss a detail), and we fuzz the program with them:

In [82]:
import random
import subprocess
In [83]:
random.seed(0)
cmd = ["simply-buggy/maze"]

constants = [3735928559, 1111638594]

TRIALS = 1000

for itnum in range(0, TRIALS):
    current_cmd = []
    current_cmd.extend(cmd)
    
    for _ in range(0,4):
        if random.randint(0, 9) < 3:
            current_cmd.append(str(constants[random.randint(0, len(constants) - 1)]))
        else:
            current_cmd.append(str(random.randint(-2147483647, 2147483647)))
    
    result = subprocess.run(current_cmd, stderr=subprocess.PIPE)
    stderr = result.stderr.decode().splitlines()
    crashed = False
    
    if stderr and "secret" in stderr[0]:
        print(stderr[0])

    for line in stderr:
        if "ERROR: AddressSanitizer" in line:
            crashed = True
            break

    if crashed:
        print("Found the bug!")
        break

print("Done!")
You found secret 1
You found secret 1
You found secret 1
You found secret 1
You found secret 1
Done!

As you can see, with 1000 runs we found secret 1 a few times, but secret 2 (and the crash) are still missing. In order to determine how to improve this, we are now going to look at the coverage data.

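We use Mozilla's grcov tool to collect the coverage data produced by the instrumented binaries and convert it into the coveralls+ JSON format. FuzzManager's CovReporter client then submits this data to the server.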

In [84]:
!export PATH=$HOME/.cargo/bin:$PATH; grcov simply-buggy/ -t coveralls+ --commit-sha $(cd simply-buggy && git rev-parse HEAD) --token NONE -p `pwd`/simply-buggy/ > coverage.json
In [85]:
!python3 -mCovReporter --repository simply-buggy --description "Test1" --submit coverage.json
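
The file coverage.json can also be inspected locally. The sketch below assumes the Coveralls job layout: a list of source_files, each with a name and a per-line coverage array in which null marks non-executable lines and numbers are hit counts. The coveralls+ variant produced by grcov may contain further fields, which we ignore. The helper coverage_summary() and the sample data are hypothetical.

```python
import json

def coverage_summary(coveralls_json):
    """Map each file name to (percent of executable lines hit, never-hit line numbers)."""
    data = json.loads(coveralls_json)
    summary = {}
    for source in data.get("source_files", []):
        executable = [(lineno, hits)
                      for lineno, hits in enumerate(source["coverage"], start=1)
                      if hits is not None]
        missed = [lineno for lineno, hits in executable if hits == 0]
        percent = (100.0 * (len(executable) - len(missed)) / len(executable)
                   if executable else 100.0)
        summary[source["name"]] = (percent, missed)
    return summary

# Made-up sample data; in the notebook, one would read coverage.json instead
sample = '{"source_files": [{"name": "maze.cpp", "coverage": [null, 3, 0, null, 5, 0]}]}'
print(coverage_summary(sample))   # {'maze.cpp': (50.0, [3, 6])}
```

To apply it to the real data, one would pass the contents of coverage.json, e.g. by reading the file with open("coverage.json").read().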

We can now go to the FuzzManager coverage page to take a look at our source code and its coverage.

In [86]:
gui_driver.get(fuzzmanager_url + "/covmanager")
In [87]:
Image(gui_driver.get_screenshot_as_png())
Out[87]: