# Mutation-Based Fuzzing¶

Most randomly generated inputs are syntactically invalid and thus are quickly rejected by the processing program. To exercise functionality beyond input processing, we must increase chances to obtain valid inputs. One such way is so-called mutational fuzzing – that is, introducing small changes to existing inputs that may still keep the input valid, yet exercise new behavior. We show how to create such mutations, and how to guide them towards yet uncovered code, applying central concepts from the popular AFL fuzzer.

Prerequisites

• You should know how basic fuzzing works; for instance, from the "Fuzzing" chapter.

## Synopsis¶

>>> from fuzzingbook.MutationFuzzer import <identifier>


and then make use of the following features.

This chapter introduces a MutationFuzzer class that takes a list of seed inputs which are then mutated:

>>> seed_input = "http://www.google.com/search?q=fuzzing"
>>> mutation_fuzzer = MutationFuzzer(seed=[seed_input])
>>> [mutation_fuzzer.fuzz() for i in range(10)]
'http://www.g=oNogl.om/search?q=fuzzing/',
'RttpX://w)ww.goo(gle.comq/sarc(q=fuzzng',
'hdt8p://"wWw.goole.com/seDarb*?q=fuzzing',
'httop://www.CooglGe.om/s$arch?q=fuzzingY', 'http://wwlw.google.c"om/secrch?yq=fuzzin', 'hup://www.google.comC/search?q=fuzzing', 'http://w7w.google.com/search?q=ufuzgzing', 'http://www,google.com/sear4ch?.q=fuzzing', 'http://w&ww.google.cKom/search7q=fuzzing']  The MutationCoverageFuzzer maintains a population of inputs, which are then evolved in order to maximize coverage. >>> mutation_fuzzer = MutationCoverageFuzzer(seed=[seed_input]) >>> mutation_fuzzer.runs(http_runner, trials=10000) >>> mutation_fuzzer.population[:5] ['http://www.google.com/search?q=fuzzing', 'htTp://www.googld.cqom/searchq=fuzzIng', 'htTp://www.gloogld.qom/|searchq=fuzzng', 'htTp://www.googld.cqomo0searchq=fuzzIng', 'htTp://www*goegld.cqoe/sa7#hq=fuzIng']  ## Fuzzing with Mutations¶ On November 2013, the first version of American Fuzzy Lop (AFL) was released. Since then, AFL has become one of the most successful fuzzing tools and comes in many flavours, e.g., AFLFast, AFLGo, and AFLSmart (which are discussed in this book). AFL has made fuzzing a popular choice for automated vulnerability detection. It was the first to demonstrate that vulnerabilities can be detected automatically at a large scale in many security-critical, real-world applications. Figure 1. American Fuzzy Lop Command Line User Interface In this chapter, we are going to introduce the basics of mutational fuzz testing; the next chapter will then further show how to direct fuzzing towards specific code goals. ## Fuzzing a URL Parser¶ Many programs expect their inputs to come in a very specific format before they would actually process them. As an example, think of a program that accepts a URL (a Web address). The URL has to be in a valid format (i.e., the URL format) such that the program can deal with it. When fuzzing with random inputs, what are our chances to actually produce a valid URL? To get deeper into the problem, let us explore what URLs are made of. A URL consists of a number of elements: scheme://netloc/path?query#fragment  where • scheme is the protocol to be used, including http, https, ftp, file... • netloc is the name of the host to connect to, such as www.google.com • path is the path on that very host, such as search • query is a list of key/value pairs, such as q=fuzzing • fragment is a marker for a location in the retrieved document, such as #result In Python, we can use the urlparse() function to parse and decompose a URL into its parts. In [1]: import fuzzingbook_utils  In [2]: try: from urlparse import urlparse # Python 2 except ImportError: from urllib.parse import urlparse # Python 3 urlparse("http://www.google.com/search?q=fuzzing")  Out[2]: ParseResult(scheme='http', netloc='www.google.com', path='/search', params='', query='q=fuzzing', fragment='') We see how the result encodes the individual parts of the URL in different attributes. Let us now assume we have a program that takes a URL as input. To simplify things, we won't let it do very much; we simply have it check the passed URL for validity. If the URL is valid, it returns True; otherwise, it raises an exception. In [3]: def http_program(url): supported_schemes = ["http", "https"] result = urlparse(url) if result.scheme not in supported_schemes: raise ValueError("Scheme must be one of " + repr(supported_schemes)) if result.netloc == '': raise ValueError("Host must be non-empty") # Do something with the URL return True  Let us now go and fuzz http_program(). To fuzz, we use the full range of printable ASCII characters, such that :, /, and lowercase letters are included. In [4]: from Fuzzer import fuzzer  In [5]: fuzzer(char_start=32, char_range=96)  Out[5]: '"N&+slk%h\x7fyp5o\'@[3(rW*M5W]tMFPU4\\[email protected]%[X?uo\\1?b4T;1bDeYtHx #UJ5w}pMmPodJM,_' Let's try to fuzz with 1000 random inputs and see whether we have some success. In [6]: for i in range(1000): try: url = fuzzer() result = http_program(url) print("Success!") except ValueError: pass  What are the chances of actually getting a valid URL? We need our string to start with "http://" or "https://". Let's take the "http://" case first. These are seven very specific characters we need to start with. The chance of producing these seven characters randomly (with a character range of 96 different characters) is$1 : 96^7$, or In [7]: 96 ** 7  Out[7]: 75144747810816 The odds of producing a "https://" prefix are even worse, at$1 : 96^8$: In [8]: 96 ** 8  Out[8]: 7213895789838336 which gives us a total chance of In [9]: likelihood = 1 / (96 ** 7) + 1 / (96 ** 8) likelihood  Out[9]: 1.344627131107667e-14 And this is the number of runs (on average) we'd need to produce a valid URL scheme: In [10]: 1 / likelihood  Out[10]: 74370059689055.02 Let's measure how long one run of http_program() takes: In [11]: from Timer import Timer  In [12]: trials = 1000 with Timer() as t: for i in range(trials): try: url = fuzzer() result = http_program(url) print("Success!") except ValueError: pass duration_per_run_in_seconds = t.elapsed_time() / trials duration_per_run_in_seconds  Out[12]: 5.87425309995524e-05 That's pretty fast, isn't it? Unfortunately, we have a lot of runs to cover. In [13]: seconds_until_success = duration_per_run_in_seconds * (1 / likelihood) seconds_until_success  Out[13]: 4368685536.722877 which translates into In [14]: hours_until_success = seconds_until_success / 3600 days_until_success = hours_until_success / 24 years_until_success = days_until_success / 365.25 years_until_success  Out[14]: 138.4352909195527 Even if we parallelize things a lot, we're still in for months to years of waiting. And that's for getting one successful run that will get deeper into http_program(). What basic fuzzing will do well is to test urlparse(), and if there is an error in this parsing function, it has good chances of uncovering it. But as long as we cannot produce a valid input, we are out of luck in reaching any deeper functionality. ## Mutating Inputs¶ The alternative to generating random strings from scratch is to start with a given valid input, and then to subsequently mutate it. A mutation in this context is a simple string manipulation - say, inserting a (random) character, deleting a character, or flipping a bit in a character representation. This is called mutational fuzzing – in contrast to the generational fuzzing techniques discussed earlier. Here are some mutations to get you started: In [15]: import random  In [16]: def delete_random_character(s): """Returns s with a random character deleted""" if s == "": return s pos = random.randint(0, len(s) - 1) # print("Deleting", repr(s[pos]), "at", pos) return s[:pos] + s[pos + 1:]  In [17]: seed_input = "A quick brown fox" for i in range(10): x = delete_random_character(seed_input) print(repr(x))  'A uick brown fox' 'A quic brown fox' 'A quick brown fo' 'A quic brown fox' 'A quick bown fox' 'A quick bown fox' 'A quick brown fx' 'A quick brown ox' 'A quick brow fox' 'A quic brown fox'  In [18]: def insert_random_character(s): """Returns s with a random character inserted""" pos = random.randint(0, len(s)) random_character = chr(random.randrange(32, 127)) # print("Inserting", repr(random_character), "at", pos) return s[:pos] + random_character + s[pos:]  In [19]: for i in range(10): print(repr(insert_random_character(seed_input)))  'A quick brvown fox' 'A quwick brown fox' 'A qBuick brown fox' 'A quick broSwn fox' 'A quick brown fvox' 'A quick brown 3fox' 'A quick brNown fox' 'A quick brow4n fox' 'A quick brown fox8' 'A equick brown fox'  In [20]: def flip_random_character(s): """Returns s with a random bit flipped in a random position""" if s == "": return s pos = random.randint(0, len(s) - 1) c = s[pos] bit = 1 << random.randint(0, 6) new_c = chr(ord(c) ^ bit) # print("Flipping", bit, "in", repr(c) + ", giving", repr(new_c)) return s[:pos] + new_c + s[pos + 1:]  In [21]: for i in range(10): print(repr(flip_random_character(seed_input)))  'A quick bRown fox' 'A quici brown fox' 'A"quick brown fox' 'A quick brown$fox'
'A quick bpown fox'
'A quick brown!fox'
'A 1uick brown fox'
'@ quick brown fox'
'A quic+ brown fox'
'A quick bsown fox'


Let us now create a random mutator that randomly chooses which mutation to apply:

In [22]:
def mutate(s):
"""Return s with a random mutation applied"""
mutators = [
delete_random_character,
insert_random_character,
flip_random_character
]
mutator = random.choice(mutators)
# print(mutator)
return mutator(s)

In [23]:
for i in range(10):
print(repr(mutate("A quick brown fox")))

'A qzuick brown fox'
' quick brown fox'
'A quick Brown fox'
'A qMuick brown fox'
'A qu_ick brown fox'
'A quick bXrown fox'
'A quick brown fx'
'A quick!brown fox'
'A! quick brown fox'
'A quick brownfox'


The idea is now that if we have some valid input(s) to begin with, we may create more input candidates by applying one of the above mutations. To see how this works, let's get back to URLs.

## Mutating URLs¶

Let us now get back to our URL parsing problem. Let us create a function is_valid_url() that checks whether http_program() accepts the input.

In [24]:
def is_valid_url(url):
try:
result = http_program(url)
return True
except ValueError:
return False

In [25]:
assert is_valid_url("http://www.google.com/search?q=fuzzing")
assert not is_valid_url("xyzzy")


Let us now apply the mutate() function on a given URL and see how many valid inputs we obtain.

In [26]:
seed_input = "http://www.google.com/search?q=fuzzing"
valid_inputs = set()
trials = 20

for i in range(trials):
inp = mutate(seed_input)
if is_valid_url(inp):


We can now observe that by mutating the original input, we get a high proportion of valid inputs:

In [27]:
len(valid_inputs) / trials

Out[27]:
0.8

What are the odds of also producing a https: prefix by mutating a http: sample seed input? We have to insert ($1 : 3$) the right character 's' ($1 : 96$) into the correct position ($1 : l$), where $l$ is the length of our seed input. This means that on average, we need this many runs:

In [28]:
trials = 3 * 96 * len(seed_input)
trials

Out[28]:
10944

We can actually afford this. Let's try:

In [29]:
from Timer import Timer

In [30]:
trials = 0
with Timer() as t:
while True:
trials += 1
inp = mutate(seed_input)
if inp.startswith("https://"):
print(
"Success after",
trials,
"trials in",
t.elapsed_time(),
"seconds")
break

Success after 3656 trials in 0.010670263999600138 seconds


Of course, if we wanted to get, say, an "ftp://" prefix, we would need more mutations and more runs – most important, though, we would need to apply multiple mutations.

## Multiple Mutations¶

So far, we have only applied one single mutation on a sample string. However, we can also apply multiple mutations, further changing it. What happens, for instance, if we apply, say, 20 mutations on our sample string?

In [31]:
seed_input = "http://www.google.com/search?q=fuzzing"
mutations = 50

In [32]:
inp = seed_input
for i in range(mutations):
if i % 5 == 0:
print(i, "mutations:", repr(inp))
inp = mutate(inp)

0 mutations: 'http://www.google.com/search?q=fuzzing'
10 mutations: 'http:/L/www.ggoWglej.com/seaRchqfu:in'
15 mutations: 'http:/L/wwggoWglej.com/seaR3hqf,u:in'
20 mutations: 'htt://wwggoVgle"j.som/seaR3hqf,u:in'
25 mutations: 'htt://fwggoVgle"j.som/eaRd3hqf,u^:in'
30 mutations: 'htv://>fwggoVgle"j.qom/ea0Rd3hqf,u^:i'
35 mutations: 'htv://>fwggozVle"Bj.qom/eapRd[3hqf,u^:i'
40 mutations: 'htv://>fwgeo6zTle"Bj.\'qom/eapRd[3hqf,tu^:i'
45 mutations: 'htv://>fwgeo]6zTle"BjM.\'qom/eaR[3hqf,tu^:i'


As you see, the original seed input is hardly recognizable anymore. By mutating the input again and again, we get a higher variety in the input.

To implement such multiple mutations in a single package, let us introduce a MutationFuzzer class. It takes a seed (a list of strings) as well as a minimum and a maximum number of mutations.

In [33]:
from Fuzzer import Fuzzer

In [34]:
class MutationFuzzer(Fuzzer):
def __init__(self, seed, min_mutations=2, max_mutations=10):
self.seed = seed
self.min_mutations = min_mutations
self.max_mutations = max_mutations
self.reset()

def reset(self):
self.population = self.seed
self.seed_index = 0


In the following, let us develop MutationFuzzer further by adding more methods to it. The Python language requires us to define an entire class with all methods as a single, continuous unit; however, we would like to introduce one method after another. To avoid this problem, we use a special hack: Whenever we want to introduce a new method to some class C, we use the construct

class C(C):
def new_method(self, args):
pass


This seems to define C as a subclass of itself, which would make no sense – but actually, it introduces a new C class as a subclass of the old C class, and then shadowing the old C definition. What this gets us is a C class with new_method() as a method, which is just what we want. (C objects defined earlier will retain the earlier C definition, though, and thus must be rebuilt.)

Using this hack, we can now add a mutate() method that actually invokes the above mutate() function. Having mutate() as a method is useful when we want to extend a MutationFuzzer later.

In [35]:
class MutationFuzzer(MutationFuzzer):
def mutate(self, inp):
return mutate(inp)


Let's get back to our strategy, maximizing diversity in coverage in our population. First, let us create a method create_candidate(), which randomly picks some input from our current population (self.population), and then applies between min_mutations and max_mutations mutation steps, returning the final result:

In [36]:
class MutationFuzzer(MutationFuzzer):
def create_candidate(self):
candidate = random.choice(self.population)
trials = random.randint(self.min_mutations, self.max_mutations)
for i in range(trials):
candidate = self.mutate(candidate)
return candidate


The fuzz() method is set to first pick the seeds; when these are gone, we mutate:

In [37]:
class MutationFuzzer(MutationFuzzer):
def fuzz(self):
if self.seed_index < len(self.seed):
# Still seeding
self.inp = self.seed[self.seed_index]
self.seed_index += 1
else:
# Mutating
self.inp = self.create_candidate()
return self.inp

In [38]:
seed_input = "http://www.google.com/search?q=fuzzing"
mutation_fuzzer = MutationFuzzer(seed=[seed_input])
mutation_fuzzer.fuzz()

Out[38]:
'http://www.google.com/search?q=fuzzing'
In [39]:
mutation_fuzzer.fuzz()

Out[39]:
'http://www.gogl9ecom/earch?qfuzzing'
In [40]:
mutation_fuzzer.fuzz()

Out[40]:
'htotq:/www.googleom/yseach?q=fzzijg'

With every new invocation of fuzz(), we get another variant with multiple mutations applied. The higher variety in inputs, though, increases the risk of having an invalid input. The key to success lies in the idea of guiding these mutations – that is, keeping those that are especially valuable.

## Guiding by Coverage¶

To cover as much functionality as possible, one can rely on either specified or implemented functionality, as discussed in the "Coverage" chapter. For now, we will not assume that there is a specification of program behavior (although it definitely would be good to have one!). We will assume, though, that the program to be tested exists – and that we can leverage its structure to guide test generation.

Since testing always executes the program at hand, one can always gather information about its execution – the least is the information needed to decide whether a test passes or fails. Since coverage is frequently measured as well to determine test quality, let us also assume we can retrieve coverage of a test run. The question is then: How can we leverage coverage to guide test generation?

One particularly successful idea is implemented in the popular fuzzer named American fuzzy lop, or AFL for short. Just like our examples above, AFL evolves test cases that have been successful – but for AFL, "success" means finding a new path through the program execution. This way, AFL can keep on mutating inputs that so far have found new paths; and if an input finds another path, it will be retained as well.

Let us build such a strategy. We start with introducing a Runner class that captures the coverage for a given function. First, a FunctionRunner class:

In [41]:
from Fuzzer import Runner

In [42]:
class FunctionRunner(Runner):
def __init__(self, function):
"""Initialize.  function is a function to be executed"""
self.function = function

def run_function(self, inp):
return self.function(inp)

def run(self, inp):
try:
result = self.run_function(inp)
outcome = self.PASS
except Exception:
result = None
outcome = self.FAIL

return result, outcome

In [43]:
http_runner = FunctionRunner(http_program)
http_runner.run("https://foo.bar/")

Out[43]:
(True, 'PASS')

We can now extend the FunctionRunner class such that it also measures coverage. After invoking run(), the coverage() method returns the coverage achieved in the last run.

In [44]:
from Coverage import Coverage, population_coverage

In [45]:
class FunctionCoverageRunner(FunctionRunner):
def run_function(self, inp):
with Coverage() as cov:
try:
result = super().run_function(inp)
except Exception as exc:
self._coverage = cov.coverage()
raise exc

self._coverage = cov.coverage()
return result

def coverage(self):
return self._coverage

In [46]:
http_runner = FunctionCoverageRunner(http_program)
http_runner.run("https://foo.bar/")

Out[46]:
(True, 'PASS')

Here are the first five locations covered:

In [47]:
print(list(http_runner.coverage())[:5])

[('urlparse', 375), ('__exit__', 25), ('_coerce_args', 115), ('http_program', 10), ('http_program', 3)]


Now for the main class. We maintain the population and a set of coverages already achieved (coverages_seen). The fuzz() helper function takes an input and runs the given function() on it. If its coverage is new (i.e. not in coverages_seen), the input is added to population and the coverage to coverages_seen.

In [48]:
class MutationCoverageFuzzer(MutationFuzzer):
def reset(self):
super().reset()
self.coverages_seen = set()
# Now empty; we fill this with seed in the first fuzz runs
self.population = []

def run(self, runner):
"""Run function(inp) while tracking coverage.
If we reach new coverage,
add inp to population and its coverage to population_coverage
"""
result, outcome = super().run(runner)
new_coverage = frozenset(runner.coverage())
if outcome == Runner.PASS and new_coverage not in self.coverages_seen:
# We have new coverage
self.population.append(self.inp)

return result


Let us now put this to use:

In [49]:
seed_input = "http://www.google.com/search?q=fuzzing"
mutation_fuzzer = MutationCoverageFuzzer(seed=[seed_input])
mutation_fuzzer.runs(http_runner, trials=10000)
mutation_fuzzer.population

Out[49]:
['http://www.google.com/search?q=fuzzing',
'http://www.goog.com/search;q=fuzzilng',
'http://ww.6goog\x0eoomosearch;/q=f}zzilng',
'http://uv.Lboo.comoseakrch;q=fuzilng',
'http://ww.6goog\x0eo/mosarch;/q=f}z{il~g',
'Http://www.g/ogle.com/earchq=fuzzing',
'http://www.goog.com/lsearkh;q=fuzzilng',
'http://www.Ifnole.com/searchq=fzzing',
'Http://ww.g/ogle.com/earchq=furzing',
'Http://wwQw.g/oGle.ug/m/ear#hq=uzzGing',
'Http://ww.g/ogle.cnom?earchq=fZrzing',
'http://www.goog.co#m/lsearkh;q=fuzilng',
'Http://wwQw.g/oGle.ug/m/ear#hqg=uzzGvi~g',
'http://RuV.Lboo.comosekrch;q=ftzil~g',
"http://www.googco#_mx/lsa'rkhq=fuzilng",
'htTp://w.6goog\x0eo/mosarch;/q=f}z{il~g',
'Http://wwQ?w.g/oGle6.ug/m/ekar#hq=uzzGing',
'httP://uv.Lboo.comoseakrch;q=fzilng',
'http://ww.6goog\x0eo/mosarch;/?q=fz{il~g',
'htTp://w.6eoog\x0eo/mosarch;?p=f}z{il~w',
'http://www.goog\x0eom/sea7rch;#q=fuezzi-mog',
'httP://uv.L"Noo.omosekrch;q=fziln',
'htTp://w.6go7opg\x0eo/mosayrch;/ #q=b}z{il~g',
'Http://wwQ?w.g/%oGle6.ug//ekar!hq=uzzGinc',
'htTp://wS.6eoog\x0e/msarcah;?p=f}z{i~7,',
'http://www.goog\x0eo}/qa5ch;Y#q=fuezzi-mog',
'http://ww.W=6goxog\x0eo/mosarXch;+?q=fz{il~g',
'htTp://w.6Foo\x0en/mosarch;q=fK}z{il~g',
'htTp://7.6Gogo\x0e/iowarcj;q=fK}z{Il~gX',
'http://ww.62gogo/m%/saRch;/q=fl}ziil~g',
'http://www.go?og\x0eo}/qacP;Y#eq=fuezzik-mog',
'http://www.go?g\x0ek}/qaP;Y#q=fuezzik-moga',
'htTp://Hw.6g7g\x0eo/mosayrc)h;/K #q=j}zyil~g',
'htTp://Hw.6gg\x0eo/moseyrc)h;OK(#q=j}zyil~g',
'http://*ww.W=goxog\x0e/2RmosarXch;+?Cq=f{i2l~g',
'htTp://.6,gokg\x0eo~/mosrch;/?f\\}z{il~g',
'htTp://Hw.6gg\x0eo/}oseyrc)h;OK(#p=j}zyil~g',
'htTp://w.A6go7evopg\nFo/mosayvch;?R#q=b}z{il~g',
'htTp://w.Ago7e%vopgV\nzFo/oayv=ch;?R6#q=b}z{il~g',
'Http://wwQ?w.g/oGle6.ug/m/ka2#hq9=uzzGing',
'http://wHw.67g\x0eo/mosaygrc)h;/K #q=j}zyil~g',
'htTp://w.g&o\x0eo/mocarh;/q=b}Mx{iBl~g',
'http://w.Ago7e%vopoV\nzIFo/oayv=ch;?R6#q=b}z{il~+g=',
'htTp://.6O,gokg\x0eo~/mTmsrc;h;/?f\\}Ez{il~g',
'http://ww%.6goog\x0eo/mosach;/?q=fz{il~g',
'http://wHw.67g\x0eo/mosaygc)h;/K #q#=j}zyilg']

Success! In our population, each and every input now is valid and has a different coverage, coming from various combinations of schemes, paths, queries, and fragments.

In [50]:
all_coverage, cumulative_coverage = population_coverage(
mutation_fuzzer.population, http_program)

In [51]:
import matplotlib.pyplot as plt

In [52]:
plt.plot(cumulative_coverage)
plt.title('Coverage of urlparse() with random inputs')
plt.xlabel('# of inputs')
plt.ylabel('lines covered');


The nice thing about this strategy is that, applied to larger programs, it will happily explore one path after the other – covering functionality after functionality. All that is needed is a means to capture the coverage.

## Synopsis¶

This chapter introduces a MutationFuzzer class that takes a list of seed inputs which are then mutated:

In [53]:
seed_input = "http://www.google.com/search?q=fuzzing"
mutation_fuzzer = MutationFuzzer(seed=[seed_input])
[mutation_fuzzer.fuzz() for i in range(10)]

Out[53]:
['http://www.google.com/search?q=fuzzing',
'http://www.g=oNogl.om/search?q=fuzzing/',
'RttpX://w)ww.goo(gle.comq/sarc(q=fuzzng',
'hdt8p://"wWw.goole.com/seDarb*?q=fuzzing',
'httop://www.CooglGe.om/s$arch?q=fuzzingY', 'http://wwlw.google.c"om/secrch?yq=fuzzin', 'hup://www.google.comC/search?q=fuzzing', 'http://w7w.google.com/search?q=ufuzgzing', 'http://www,google.com/sear4ch?.q=fuzzing', 'http://w&ww.google.cKom/search7q=fuzzing'] The MutationCoverageFuzzer maintains a population of inputs, which are then evolved in order to maximize coverage. In [54]: mutation_fuzzer = MutationCoverageFuzzer(seed=[seed_input]) mutation_fuzzer.runs(http_runner, trials=10000) mutation_fuzzer.population[:5]  Out[54]: ['http://www.google.com/search?q=fuzzing', 'htTp://www.googld.cqom/searchq=fuzzIng', 'htTp://www.gloogld.qom/|searchq=fuzzng', 'htTp://www.googld.cqomo0searchq=fuzzIng', 'htTp://www*goegld.cqoe/sa7#hq=fuzIng'] ## Lessons Learned¶ • Randomly generated inputs are frequently invalid – and thus exercise mostly input processing functionality. • Mutations from existing valid inputs have much higher chances to be valid, and thus to exercise functionality beyond input processing. ## Next Steps¶ In the next chapter on greybox fuzzing, we further extend the concept of mutation-based testing with power schedules that allow to pends more energy on seeds that exercise "unlikely" paths and seeds that are "closer" to a target location. ## Exercises¶ ### Exercise 1: Fuzzing CGI decode with Mutations¶ Apply the above guided mutation-based fuzzing technique on cgi_decode() from the "Coverage" chapter. How many trials do you need until you cover all variations of +, % (valid and invalid), and regular characters? In [55]: from Coverage import cgi_decode  In [56]: seed = ["Hello World"] cgi_runner = FunctionCoverageRunner(cgi_decode) m = MutationCoverageFuzzer(seed) results = m.runs(cgi_runner, 10000)  In [57]: m.population  Out[57]: ['Hello World', 'he_<+llo(or<D', 'L}eml &Wol%dD', 'L)q<}aml &cWol%d3D+'] In [58]: cgi_runner.coverage()  Out[58]: {('__exit__', 25), ('cgi_decode', 9), ('cgi_decode', 10), ('cgi_decode', 11), ('cgi_decode', 12), ('cgi_decode', 15), ('cgi_decode', 16), ('cgi_decode', 17), ('cgi_decode', 18), ('cgi_decode', 19), ('cgi_decode', 20), ('cgi_decode', 21), ('cgi_decode', 22), ('cgi_decode', 23), ('cgi_decode', 24), ('cgi_decode', 25), ('cgi_decode', 26), ('cgi_decode', 30), ('cgi_decode', 31), ('cgi_decode', 32), ('run_function', 7)} In [59]: all_coverage, cumulative_coverage = population_coverage( m.population, cgi_decode) import matplotlib.pyplot as plt plt.plot(cumulative_coverage) plt.title('Coverage of cgi_decode() with random inputs') plt.xlabel('# of inputs') plt.ylabel('lines covered');  After 10,000 runs, we have managed to synthesize a + character and a valid %xx form. We can still do better. ### Exercise 2: Fuzzing bc with Mutations¶ Apply the above mutation-based fuzzing technique on bc, as in the chapter "Introduction to Fuzzing". #### Part 1: Non-Guided Mutations¶ Start with non-guided mutations. How many of the inputs are valid? Solution. This is just a matter of tying a ProgramRunner to a MutationFuzzer: In [60]: from Fuzzer import ProgramRunner  In [61]: seed = ["1 + 1"] bc = ProgramRunner(program="bc") m = MutationFuzzer(seed) outcomes = m.runs(bc, trials=100)  In [62]: outcomes[:3]  Out[62]: [(CompletedProcess(args='bc', returncode=0, stdout='', stderr='(standard_in) 1: parse error\n'), 'PASS'), (CompletedProcess(args='bc', returncode=0, stdout='', stderr='(standard_in) 1: illegal character: \n(standard_in) 1: parse error\n'), 'PASS'), (CompletedProcess(args='bc', returncode=0, stdout='', stderr='(standard_in) 1: parse error\n'), 'PASS')] In [63]: sum(1 for completed_process, outcome in outcomes if completed_process.stderr == "")  Out[63]: 5 #### Part 2: Guided Mutations¶ Continue with guided mutations. To this end, you will have to find a way to extract coverage from a C program such as bc. Proceed in these steps: First, get GNU bc; download, say, bc-1.07.1.tar.gz and unpack it: In [64]: !curl -O mirrors.kernel.org/gnu/bc/bc-1.07.1.tar.gz   % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 100 410k 100 410k 0 0 146k 0 0:00:02 0:00:02 --:--:-- 146k  In [65]: !tar xfz bc-1.07.1.tar.gz  Second, configure the package: In [66]: !cd bc-1.07.1; ./configure  checking for a BSD-compatible install... /usr/bin/install -c checking whether build environment is sane... yes checking for a thread-safe mkdir -p... ./install-sh -c -d checking for gawk... no checking for mawk... no checking for nawk... no checking for awk... awk checking whether make sets$(MAKE)... yes
checking whether make supports nested variables... yes
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking whether gcc understands -c and -o together... yes
checking for style of include used by make... GNU
checking dependency style of gcc... gcc3
checking how to run the C preprocessor... gcc -E
checking for grep that handles long lines and -e... /usr/bin/grep
checking for egrep... /usr/bin/grep -E
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking minix/config.h usability... no
checking minix/config.h presence... no
checking for minix/config.h... no
checking whether it is safe to define __EXTENSIONS__... yes
checking for flex... flex
checking lex output file root... lex.yy
checking lex library... -ll
checking whether yytext is a pointer... yes
checking for ar... ar
checking the archiver (ar) interface... ar
checking for bison... bison -y
checking for ranlib... ranlib
checking whether make sets $(MAKE)... (cached) yes checking for stdarg.h... yes checking for stddef.h... yes checking for stdlib.h... (cached) yes checking for string.h... (cached) yes checking for errno.h... yes checking for limits.h... yes checking for unistd.h... (cached) yes checking for lib.h... no checking for an ANSI C-conforming const... yes checking for size_t... yes checking for ptrdiff_t... yes checking for vprintf... yes checking for _doprnt... no checking for isgraph... yes checking for setvbuf... yes checking for fstat... yes checking for strtol... yes Adding GCC specific compile flags. checking that generated files are newer than configure... done configure: creating ./config.status config.status: creating Makefile config.status: creating bc/Makefile config.status: creating dc/Makefile config.status: creating lib/Makefile config.status: creating doc/Makefile config.status: creating doc/texi-ver.incl config.status: creating config.h config.status: executing depfiles commands  Third, compile the package with special flags: In [67]: !cd bc-1.07.1; make CFLAGS="--coverage"  /Applications/Xcode.app/Contents/Developer/usr/bin/make all-recursive Making all in lib gcc -DHAVE_CONFIG_H -I. -I.. -I. -I.. -I./../h -g -O2 -Wall -funsigned-char --coverage -MT getopt.o -MD -MP -MF .deps/getopt.Tpo -c -o getopt.o getopt.c mv -f .deps/getopt.Tpo .deps/getopt.Po gcc -DHAVE_CONFIG_H -I. -I.. -I. -I.. -I./../h -g -O2 -Wall -funsigned-char --coverage -MT getopt1.o -MD -MP -MF .deps/getopt1.Tpo -c -o getopt1.o getopt1.c mv -f .deps/getopt1.Tpo .deps/getopt1.Po gcc -DHAVE_CONFIG_H -I. -I.. -I. -I.. -I./../h -g -O2 -Wall -funsigned-char --coverage -MT vfprintf.o -MD -MP -MF .deps/vfprintf.Tpo -c -o vfprintf.o vfprintf.c mv -f .deps/vfprintf.Tpo .deps/vfprintf.Po gcc -DHAVE_CONFIG_H -I. -I.. -I. -I.. -I./../h -g -O2 -Wall -funsigned-char --coverage -MT number.o -MD -MP -MF .deps/number.Tpo -c -o number.o number.c mv -f .deps/number.Tpo .deps/number.Po rm -f libbc.a ar cru libbc.a getopt.o getopt1.o vfprintf.o number.o ranlib libbc.a Making all in bc gcc -DHAVE_CONFIG_H -I. -I.. -I. -I./../h -g -O2 -Wall -funsigned-char --coverage -MT main.o -MD -MP -MF .deps/main.Tpo -c -o main.o main.c mv -f .deps/main.Tpo .deps/main.Po gcc -DHAVE_CONFIG_H -I. -I.. -I. -I./../h -g -O2 -Wall -funsigned-char --coverage -MT bc.o -MD -MP -MF .deps/bc.Tpo -c -o bc.o bc.c mv -f .deps/bc.Tpo .deps/bc.Po gcc -DHAVE_CONFIG_H -I. -I.. -I. -I./../h -g -O2 -Wall -funsigned-char --coverage -MT scan.o -MD -MP -MF .deps/scan.Tpo -c -o scan.o scan.c mv -f .deps/scan.Tpo .deps/scan.Po gcc -DHAVE_CONFIG_H -I. -I.. -I. -I./../h -g -O2 -Wall -funsigned-char --coverage -MT execute.o -MD -MP -MF .deps/execute.Tpo -c -o execute.o execute.c mv -f .deps/execute.Tpo .deps/execute.Po gcc -DHAVE_CONFIG_H -I. -I.. -I. -I./../h -g -O2 -Wall -funsigned-char --coverage -MT load.o -MD -MP -MF .deps/load.Tpo -c -o load.o load.c mv -f .deps/load.Tpo .deps/load.Po gcc -DHAVE_CONFIG_H -I. -I.. -I. -I./../h -g -O2 -Wall -funsigned-char --coverage -MT storage.o -MD -MP -MF .deps/storage.Tpo -c -o storage.o storage.c mv -f .deps/storage.Tpo .deps/storage.Po gcc -DHAVE_CONFIG_H -I. -I.. -I. -I./../h -g -O2 -Wall -funsigned-char --coverage -MT util.o -MD -MP -MF .deps/util.Tpo -c -o util.o util.c mv -f .deps/util.Tpo .deps/util.Po gcc -DHAVE_CONFIG_H -I. -I.. -I. -I./../h -g -O2 -Wall -funsigned-char --coverage -MT warranty.o -MD -MP -MF .deps/warranty.Tpo -c -o warranty.o warranty.c mv -f .deps/warranty.Tpo .deps/warranty.Po echo '{0}' > libmath.h /Applications/Xcode.app/Contents/Developer/usr/bin/make global.o gcc -DHAVE_CONFIG_H -I. -I.. -I. -I./../h -g -O2 -Wall -funsigned-char --coverage -MT global.o -MD -MP -MF .deps/global.Tpo -c -o global.o global.c mv -f .deps/global.Tpo .deps/global.Po gcc -g -O2 -Wall -funsigned-char --coverage -o libmath.h -o fbc main.o bc.o scan.o execute.o load.o storage.o util.o warranty.o global.o ../lib/libbc.a -ll ./fbc -c ./libmath.b </dev/null >libmath.h ./fix-libmath_h 2655 2793 rm -f ./fbc ./global.o gcc -DHAVE_CONFIG_H -I. -I.. -I. -I./../h -g -O2 -Wall -funsigned-char --coverage -MT global.o -MD -MP -MF .deps/global.Tpo -c -o global.o global.c mv -f .deps/global.Tpo .deps/global.Po gcc -g -O2 -Wall -funsigned-char --coverage -o bc main.o bc.o scan.o execute.o load.o storage.o util.o global.o warranty.o ../lib/libbc.a -ll Making all in dc gcc -DHAVE_CONFIG_H -I. -I.. -I./.. -I./../h -g -O2 -Wall -funsigned-char --coverage -MT dc.o -MD -MP -MF .deps/dc.Tpo -c -o dc.o dc.c mv -f .deps/dc.Tpo .deps/dc.Po gcc -DHAVE_CONFIG_H -I. -I.. -I./.. -I./../h -g -O2 -Wall -funsigned-char --coverage -MT misc.o -MD -MP -MF .deps/misc.Tpo -c -o misc.o misc.c mv -f .deps/misc.Tpo .deps/misc.Po gcc -DHAVE_CONFIG_H -I. -I.. -I./.. -I./../h -g -O2 -Wall -funsigned-char --coverage -MT eval.o -MD -MP -MF .deps/eval.Tpo -c -o eval.o eval.c mv -f .deps/eval.Tpo .deps/eval.Po gcc -DHAVE_CONFIG_H -I. -I.. -I./.. -I./../h -g -O2 -Wall -funsigned-char --coverage -MT stack.o -MD -MP -MF .deps/stack.Tpo -c -o stack.o stack.c mv -f .deps/stack.Tpo .deps/stack.Po gcc -DHAVE_CONFIG_H -I. -I.. -I./.. -I./../h -g -O2 -Wall -funsigned-char --coverage -MT array.o -MD -MP -MF .deps/array.Tpo -c -o array.o array.c mv -f .deps/array.Tpo .deps/array.Po gcc -DHAVE_CONFIG_H -I. -I.. -I./.. -I./../h -g -O2 -Wall -funsigned-char --coverage -MT numeric.o -MD -MP -MF .deps/numeric.Tpo -c -o numeric.o numeric.c mv -f .deps/numeric.Tpo .deps/numeric.Po gcc -DHAVE_CONFIG_H -I. -I.. -I./.. -I./../h -g -O2 -Wall -funsigned-char --coverage -MT string.o -MD -MP -MF .deps/string.Tpo -c -o string.o string.c mv -f .deps/string.Tpo .deps/string.Po gcc -g -O2 -Wall -funsigned-char --coverage -o dc dc.o misc.o eval.o stack.o array.o numeric.o string.o ../lib/libbc.a Making all in doc restore=: && backupdir=".am$$" && \ am__cwd=pwd && CDPATH="{ZSH_VERSION+.}:" && cd . && \ rm -rf backupdir && mkdir backupdir && \ if (makeinfo --no-split --version) >/dev/null 2>&1; then \ for f in bc.info bc.info-[0-9] bc.info-[0-9][0-9] bc.i[0-9] bc.i[0-9][0-9]; do \ if test -f f; then mv f backupdir; restore=mv; else :; fi; \ done; \ else :; fi && \ cd "am__cwd"; \ if makeinfo --no-split -I . \ -o bc.info bc.texi; \ then \ rc=0; \ CDPATH="{ZSH_VERSION+.}:" && cd .; \ else \ rc=?; \ CDPATH="{ZSH_VERSION+.}:" && cd . && \ restore backupdir/* echo "./bc.info" | sed 's|[^/]*||'; \ fi; \ rm -rf backupdir; exit rc restore=: && backupdir=".am$$" && \ am__cwd=pwd && CDPATH="${ZSH_VERSION+.}:" && cd . && \
rm -rf $backupdir && mkdir$backupdir && \
if (makeinfo --no-split --version) >/dev/null 2>&1; then \
for f in dc.info dc.info-[0-9] dc.info-[0-9][0-9] dc.i[0-9] dc.i[0-9][0-9]; do \
if test -f $f; then mv$f $backupdir; restore=mv; else :; fi; \ done; \ else :; fi && \ cd "$am__cwd"; \
if makeinfo --no-split   -I . \
-o dc.info dc.texi; \
then \
rc=0; \
CDPATH="${ZSH_VERSION+.}:" && cd .; \ else \ rc=$?; \
CDPATH="${ZSH_VERSION+.}:" && cd . && \$restore $backupdir/* echo "./dc.info" | sed 's|[^/]*$||'; \
fi; \
rm -rf $backupdir; exit$rc
make[4]: Nothing to be done for all-am'.


The file bc/bc should now be executable...

In [68]:
!cd bc-1.07.1/bc; echo 2 + 2 | ./bc

4


...and you should be able to run the gcov program to retrieve coverage information.

In [69]:
!cd bc-1.07.1/bc; gcov main.c

File 'main.c'
Lines executed:51.69% of 118
main.c:creating 'main.c.gcov'



As sketched in the "Coverage" chapter, the file bc-1.07.1/bc/main.c.gcov now holds the coverage information for bc.c. Each line is prefixed with the number of times it was executed. ##### means zero times; - means non-executable line.

Parse the GCOV file for bc and create a coverage set, as in FunctionCoverageRunner. Make this a ProgramCoverageRunner class that would be constructed with a list of source files (bc.c, main.c, load.c) to run gcov on.

When you're done, don't forget to clean up:

In [70]:
!rm -fr bc-1.07.1 bc-1.07.1.tar.gz
`

### Exercise 3¶

In this blog post, the author of American Fuzzy Lop (AFL), a very popular mutation-based fuzzer discusses the efficiency of various mutation operators. Implement four of them and evaluate their efficiency as in the examples above.

### Exercise 4¶

When adding a new element to the list of candidates, AFL does actually not compare the coverage, but adds an element if it exercises a new branch. Using branch coverage from the exercises of the "Coverage" chapter, implement this "branch" strategy and compare it against the "coverage" strategy, above.

### Exercise 5¶

Design and implement a system that will gather a population of URLs from the Web. Can you achieve a higher coverage with these samples? What if you use them as initial population for further mutation?