# Concolic Fuzzing¶

We have previously seen how one can use dynamic taints to produce more intelligent test cases than simply looking for program crashes. We have also seen how one can use the taints to update the grammar, and hence focus more on the dangerous methods.

While taints are helpful, uninterpreted strings is only one of the attack vectors. Can we say anything more about the properties of variables at any point in the execution? For example, can we say for sure that a function will always receive the buffers with the correct length? Concolic execution offers a solution.

The idea of concolic execution over a function is as follows: We start with a sample input for the function, and execute the function under trace. At each point the execution passes through a conditional, we save the conditional encountered in an in the form of relations between symbolic variables. (A symbolic variable can be thought of as a sort of placeholder for the real variable, sort of like the x in solving for x in Algebra. The symbolic variables can be used to specify relations without actually solving them.)

With concolic execution, one can collect the constraints that an execution path encounters, and use it to answer questions about the program behavior at any point we prefer along the program execution path. We can further use concolic execution to enhance fuzzing.

In this chapter, we explore in depth how to execute a Python function concolically, and how concolic execution can be used to enhance fuzzing.

Prerequisites

We first setup our infrastructure so that we can make use of previously defined functions.

In [1]:
import fuzzingbook_utils


## Synopsis¶

>>> from fuzzingbook.ConcolicFuzzer import <identifier>


and then make use of the following features.

This chapter defines two main classes: SimpleConcolicFuzzer and ConcolicGrammarFuzzer. The SimpleConcolicFuzzer first uses a sample input to collect predicates encountered. The fuzzer then negates random predicates to generate new input constraints. These, when solved, produce inputs that explore paths that are close to the original path. It can be used as follows.

The cgi_decode() function from the chapter on coverage is the subject program.

>>> from Coverage import cgi_decode


We first obtain the constraints using ConcolicTracer.

>>> with ConcolicTracer() as _:
>>>     _[cgi_decode]('abcd')


These constraints are added to the concolic fuzzer as follows:

>>> scf = SimpleConcolicFuzzer()


The concolic fuzzer then uses the constraints added to guide its fuzzing as follows:

>>> for i in range(10):
>>>     v = scf.fuzz()
>>>     if v is None:
>>>         break
>>>     print(v)
%\x00
\x00\x00+\x00
\x00\x00+\x04
\x00\x00+\x02
\x00\x00\x00+\x00
\x00+\x00
\x00\x00+@
\x00+
\x00%\x00
\x00+\x08


The SimpleConcolicFuzzer simply explores all paths near the original path traversed by the sample input. It does not use any kind of feedback from fuzzing. The ConcolicGrammarFuzzer on the other hand, can collect feedback from the subject under fuzzing, and lift some of the constraints encountered to the grammar, enabling deeper fuzzing. It is used as follows:

>>> from InformationFlow import INVENTORY_GRAMMAR, SQLException
>>> cgf = ConcolicGrammarFuzzer(INVENTORY_GRAMMAR)
>>> cgf.prune_tokens(prune_tokens)
>>> for i in range(10):
>>>     query = cgf.fuzz()
>>>     print(query)
>>>     with ConcolicTracer() as _:
>>>         with ExpectError():
>>>             try:
>>>                 res = _[db_select](query)
>>>                 print(repr(res))
>>>             except SQLException as e:
>>>                 print(e)
>>>         cgf.update_grammar(_)
>>>         print()
update NK5f set L=D,o=G,q=X where A!=S

update vehicles set V9=B,y=r,E=E where r(U)/x(i)>b

insert into V6S (V,R2,O) values ('2/','@')

update months set q=J1,Y=n,r=X where o-S*B!=(u!=q)

select e-e/B>Z(i)-(_) from vehicles where 59.2!=S==C
Invalid WHERE ('(59.2!=S==C)')

delete from months where (M)*I<p+e(S)-R*O(_)*p(8,u)
Invalid WHERE ('(M)*I<p+e(S)-R*O(_)*p(8,u)')

insert into months (CRl,i,AW5,_j) values ('6')

select a(A),J from K8z where p-f-i<H

delete from vehicles where h(2,a)>((Sg90))+M5((az1(p)*r/W/V+b9(H)))
Invalid WHERE ('h(2,a)>((Sg90))+M5((az1(p)*r/W/V+b9(H)))')

select 9.7 from months
[9.7, 9.7, 9.7, 9.7, 9.7, 9.7, 9.7, 9.7, 9.7, 9.7, 9.7, 9.7]


## Tracking Constraints¶

In the chapter on information flow, we have seen how dynamic taints can be used to direct fuzzing by indicating which part of input reached interesting places. However, dynamic taint tracking is limited in the information that it can propagate. For example, we might want to explore what happens when certain properties of the input changes.

For example, say we have a function factorial() that returns the factorial value of its input.

In [2]:
def factorial(n):
if n < 0:
return None
if n == 0:
return 1
if n == 1:
return 1
v = 1
while n != 0:
v = v * n
n = n - 1
return v


We exercise the function with a value of 5.

In [3]:
factorial(5)

Out[3]:
120

Is this sufficient to explore all the features of the function? How do we know? One way to verify that we have explored all features is to look at the coverage obtained. First we need to extend the Coverage class from the chapter on coverage to provide us with coverage arcs.

In [4]:
from Coverage import Coverage

In [5]:
import inspect

In [6]:
class ArcCoverage(Coverage):
def traceit(self, frame, event, args):
if event != 'return':
f = inspect.getframeinfo(frame)
self._trace.append((f.function, f.lineno))
return self.traceit

def arcs(self):
t = [i for f, i in self._trace]
return list(zip(t, t[1:]))


Next, we use the Tracer to obtain the coverage arcs.

In [7]:
with ArcCoverage() as cov:
factorial(5)


We can now use the coverage arcs to visualize the coverage obtained.

In [8]:
from ControlFlow import PyCFG, CFGNode, to_graph, gen_cfg

In [9]:
from graphviz import Source, Graph

In [10]:
Source(to_graph(gen_cfg(inspect.getsource(factorial)), arcs=cov.arcs()))

Out[10]:

We see that the path [1, 2, 4, 6, 8, 9, 10, 11, 12] is covered (green) but sub-paths such as [2, 3], [4, 5] and [6, 7] are unexplored (red). What we need is the ability to generate inputs such that the True branch is taken at 2. How do we do that?

## Concolic Execution¶

One way is to look at the execution path being taken, and collect the conditional constraints that the path encounters. Then we can try to produce inputs that lead us to taking the non-traversed path.

First, let us step through the function.

In [11]:
lines = [i[1] for i in cov._trace if i[0] == 'factorial']
src = {i + 1: s for i, s in enumerate(
inspect.getsource(factorial).split('\n'))}

• The line (1) is simply the entry point of the function. We know that the input is n, which is an integer.
In [12]:
src[1]

Out[12]:
'def factorial(n):'
• The line (2) is a predicate n < 0. Since the next line taken is line (4), we know that at this point in the execution path, the predicate was true.
In [13]:
src[2], src[3], src[4]

Out[13]:
('    if n < 0:', '        return None', '    if n == 0:')

We notice that this is one of the predicates where the true branch was not taken. How do we generate a value that takes the true branch here? One way is to use symbolic variables to represent the input, encode the constraint, and use an SMT Solver to solve the negation of the constraint.

As we mentioned in the introduction to the chapter, a symbolic variable can be thought of as a sort of placeholder for the real variable, sort of like the x in solving for x in Algebra. These variables can be used to encode constraints placed on the variables in the program. We identify what constraints the variable is supposed to obey, and finally produce a value that obeys all constraints imposed.

## SMT Solvers¶

To solve these constraints, one can use a Satisfiability Modulo Theories (SMT) solver. An SMT solver is built on top of a SATISFIABILITY (SAT) solver. A SAT solver is be used to check whether boolean formulas in first order logic (e.g (a | b ) & (~a | ~b)) can be satisfied using any assignments for the variables (e.g a = true, b = false). An SMT solver extends these SAT solvers to specific background theories -- for example, theory of integers, or theory of strings. That is, given a string constraint expressed as a formula with string variables (e.g h + t == 'hello,world'), an SMT solver that understands theory of strings can be used to check if that constraint can be satisfied, and if satisfiable, provide an instantiation of concrete values for the variables used in the formula (e.g h = 'hello,', t = 'world').

We use the SMT solver, Z3 in this chapter.

In [14]:
import z3


To ensure that the string constraints we use in this chapter are successfully evaluated, we need to specify the z3str3 solver. Further, we set the timeout for Z3 computations to to 30 seconds.

In [15]:
assert z3.get_version() == (4, 8, 0, 0)
z3.set_option('smt.string_solver', 'z3str3')
z3.set_option('timeout', 30 * 1000)  # milliseconds


Encoding the constraint requires declaring a corresponding symbolic variable to the input n.

In [16]:
zn = z3.Int('n')


Remember the constraint (n < 0) from line 2 in factorial()? We can now encode the constraint as follows.

In [17]:
zn < 0

Out[17]:
n < 0

We previously traced factorial(5). We saw that with input 5, the execution took the else branch on the predicate n < 0. We can express this observation as follows.

In [18]:
z3.Not(zn < 0)

Out[18]:
Not(n < 0)

The z3.solve() method can also be used to check if the constraints are satisfiable, and if it is, provide an values for variables such that the constraints are satisfied. For example, we can ask z3 for an input that will take the else branch as follows:

In [19]:
z3.solve(z3.Not(zn < 0))

[n = 0]


This is a solution (albeit a trivial one). SMT solvers can be used to solve much harder problems. For example, here is how one can solve a quadratic equation.

In [20]:
x = z3.Real('x')
eqn = (2 * x**2 - 11 * x + 5 == 0)
z3.solve(eqn)

[x = 5]


Again, this is one solution. We can ask z3 to give us another solution as follows.

In [21]:
z3.solve(x != 5, eqn)

[x = 1/2]


Indeed, both x = 5 and x = 1/2 are solutions to the quadratic equation $2x^2 -11x + 5 = 0$

Similarly, we can ask Z3 for an input that satisfies the constraint encoded in line 2 of factorial() so that we take the if branch.

In [22]:
z3.solve(zn < 0)

[n = -1]


That is, if one uses -1 as an input to factorial(), it is guaranteed to take the if branch in line 2 during execution.

Let us try using that with our coverage. Here, the -1 is the solution from above.

In [23]:
with cov as cov:
factorial(-1)

In [24]:
Source(to_graph(gen_cfg(inspect.getsource(factorial)), arcs=cov.arcs()))

Out[24]:

Ok, so we have managed to cover a little more of the graph. Let us continue with our original input of factorial(5):

• In line (4) we encounter a new predicate n == 0, for which we again took the false branch.
In [25]:
src[4]

Out[25]:
'    if n == 0:'

The predicates required to follow the path until this point is as follows.

In [26]:
predicates = [z3.Not(zn < 0), z3.Not(zn == 0)]

• If we continue to line (6), we encounter another predicate, for which again, we took the false branch
In [27]:
src[6]

Out[27]:
'    if n == 1:'

The predicates encountered so far is as follows

In [28]:
predicates = [z3.Not(zn < 0), z3.Not(zn == 0), z3.Not(zn == 1)]


To take the branch at (6), we essentially have to obey the predicates until that point, but invert the last predicate.

In [29]:
last = len(predicates) - 1
z3.solve(predicates[0:-1] + [z3.Not(predicates[-1])])

[n = 1]


What we are doing here is tracing the execution corresponding to a particular input factorial(5), using concrete values, and along with it, keeping symbolic shadow variables that enable us to capture the constraints. As we mentioned in the introduction, this particular method of execution where one tracks concrete execution using symbolic variables is called Concolic Execution.

How do we automate this process? One method is to use a similar infrastructure as that of the chapter on information flow, and use the Python inheritance to create symbolic proxy objects that can track the concrete execution.

## A Concolic Tracer¶

Given that there is a symbolic context under which the program is executed (that is the symbolic variables that are used in the program execution) we define a context manager called ConcolicTracer that keeps track of the context.

The ConcolicTracer accepts a single argument which contains the declarations for the symbolic variables seen so far, and the pre-conditions if any.

In [30]:
class ConcolicTracer:
def __init__(self, context=None):
self.context = context if context is not None else ({}, [])
self.decls, self.path = self.context


We add the enter and exit methods for the context manager.

In [31]:
class ConcolicTracer(ConcolicTracer):
def __enter__(self):
return self

def __exit__(self, exc_type, exc_value, tb):
return


We use introspection to determine the arguments to the function, which is hooked into the getitem method.

In [32]:
class ConcolicTracer(ConcolicTracer):
def __getitem__(self, fn):
self.fn = fn
self.fn_args = {i: None for i in inspect.signature(fn).parameters}
return self


Finally, the function itself is invoked using the call method.

In [33]:
class ConcolicTracer(ConcolicTracer):
def __call__(self, *args):
self.result = self.fn(*self.concolic(args))
return self.result


For now, we define concolic() as a transparent function. It will be modified to produce symbolic variables later.

In [34]:
class ConcolicTracer(ConcolicTracer):
def concolic(self, args):
return args


It can be used as follows

In [35]:
with ConcolicTracer() as _:
_[factorial](1)

In [36]:
_.context

Out[36]:
({}, [])

The context is empty as we are yet to hook up the necessary infrastructure to ConcolicTracer.

### Concolic Proxy Objects¶

We now define the concolic proxy objects that can be used for concolic tracing. First, we define the zproxy_create() method that given a class name, correctly creates an instance of that class, and the symbolic corresponding variable, and registers the symbolic variable in the context information context.

In [37]:
def zproxy_create(cls, sname, z3var, context, zn, v=None):
zv = cls(context, z3var(zn), v)
context[0][zn] = sname
return zv


#### A Proxy Class for Booleans¶

First, we define the zbool class which is used to track the predicates encountered. It is a wrapper class that contains both symbolic (z) as well as concrete (v) values. The concrete value is used to determine which path to take, and the symbolic value is used to collect the predicates encountered.

The initialization done in two parts. The first one is using zproxy_create() to correctly initialize and register the shadow symbolic variable corresponding to the passed argument. This is used exclusively when the symbolic variable needs to be initialized first. In all other cases, the constructor is called with the preexisting symbolic value.

In [38]:
class zbool:
@classmethod
def create(cls, context, zn, v):
return zproxy_create(cls, 'Bool', z3.Bool, context, zn, v)

def __init__(self, context, z, v=None):
self.context, self.z, self.v = context, z, v
self.decl, self.path = self.context


Here is how it can be used.

In [39]:
with ConcolicTracer() as _:
za, zb = z3.Ints('a b')
val = zbool.create(_.context, 'my_bool_arg', True)
print(val.z, val.v)
_.context

my_bool_arg True

Out[39]:
({'my_bool_arg': 'Bool'}, [])
##### Negation of Encoded formula¶

The zbool class allows negation of its concrete and symbolic values.

In [40]:
class zbool(zbool):
def __not__(self):
return zbool(self.context, z3.Not(self.z), not self.v)


Here is how it can be used.

In [41]:
with ConcolicTracer() as _:
val = zbool.create(_.context, 'my_bool_arg', True).__not__()
print(val.z, val.v)
_.context

Not(my_bool_arg) False

Out[41]:
({'my_bool_arg': 'Bool'}, [])
##### Registering Predicates on Conditionals¶

The zbool class is be used to track boolean conditions that arise during program execution. It tracks such conditions by registering the corresponding symbolic expressions in the context.

In [42]:
class zbool(zbool):
def __bool__(self):
r, pred = (True, self.z) if self.v else (False, z3.Not(self.z))
self.path.append(pred)
return r


The zbool class can be used to keep track of boolean values and conditions encountered during the execution. For example, we can encode the conditions encountered by line 6 in factorial() as follows:

First, we define the concrete value (ca), and its shadow symbolic variable (za).

In [43]:
ca, za = 5, z3.Int('a')


Then, we wrap it in zbool, and use it in a conditional, forcing the conditional to be registered in the context.

In [44]:
with ConcolicTracer() as _:
if zbool(_.context, za == z3.IntVal(5), ca == 5):
print('success')

success


We can retrieve the registered conditional as follows.

In [45]:
_.path

Out[45]:
[5 == a]

#### A Proxy Class for Integers¶

Next, we define a symbolic wrapper zint for int. This class keeps track of the int variables used and the predicates encountered in context. Finally, it also keeps the concrete value used so that it can be used to determine the path to take. As the zint extends the primitive int class, we have to define a new method to open it for extension.

In [46]:
class zint(int):
def __new__(cls, context, zn, v, *args, **kw):
return int.__new__(cls, v, *args, **kw)


As in the case of zbool, the initialization takes place in two parts. The first using create() if a new symbolic argument is being registered, and then the usual initialization.

In [47]:
class zint(zint):
@classmethod
def create(cls, context, zn, v=None):
return zproxy_create(cls, 'Int', z3.Int, context, zn, v)

def __init__(self, context, z, v=None):
self.z, self.v = z, v
self.context = context


The int value of an zint object is its concrete value.

In [48]:
class zint(zint):
def __int__(self):
return self.v

def __pos__(self):
return self.v


Using these proxies are as follows.

In [49]:
with ConcolicTracer() as _:
val = zint.create(_.context, 'int_arg', 0)
print(val.z, val.v)
_.context

int_arg 0

Out[49]:
({'int_arg': 'Int'}, [])

The zint class is often used to do arithmetic with, or compare to other ints. These ints can be either a variable or a constant value. We define a helper method _zv() that checks what kind of int a given value is, and produces the correct symbolic equivalent.

In [50]:
class zint(zint):
def _zv(self, o):
return (o.z, o.v) if isinstance(o, zint) else (z3.IntVal(o), o)


It can be used as follows

In [51]:
with ConcolicTracer() as _:
val = zint.create(_.context, 'int_arg', 0)
print(val._zv(0))
print(val._zv(val))

(0, 0)
(int_arg, 0)

##### Equality between Integers¶

Two integers can be compared for equality using ne and eq.

In [52]:
class zint(zint):
def __ne__(self, other):
z, v = self._zv(other)
return zbool(self.context, self.z != z, self.v != v)

def __eq__(self, other):
z, v = self._zv(other)
return zbool(self.context, self.z == z, self.v == v)


We also define req using eq in case the int being compared is on the left hand side.

In [53]:
class zint(zint):
def __req__(self, other):
return self.__eq__(other)


It can be used as follows.

In [54]:
with ConcolicTracer() as _:
ia = zint.create(_.context, 'int_a', 0)
ib = zint.create(_.context, 'int_b', 0)
v1 = ia == ib
v2 = ia != ib
v3 = 0 != ib
print(v1.z, v2.z, v3.z)

int_a == int_b int_a != int_b 0 != int_b

##### Comparisons between Integers¶

Integers can also be compared for ordering, and the methods for this are defined below.

In [55]:
class zint(zint):
def __lt__(self, other):
z, v = self._zv(other)
return zbool(self.context, self.z < z, self.v < v)

def __gt__(self, other):
z, v = self._zv(other)
return zbool(self.context, self.z > z, self.v > v)


We use the comparisons and equality operators to provide the other missing operators.

In [56]:
class zint(zint):
def __le__(self, other):
z, v = self._zv(other)
return zbool(self.context, z3.Or(self.z < z, self.z == z),
self.v < v or self.v == v)

def __ge__(self, other):
z, v = self._zv(other)
return zbool(self.context, z3.Or(self.z > z, self.z == z),
self.v > v or self.v == v)


These functions can be used as follows.

In [57]:
with ConcolicTracer() as _:
ia = zint.create(_.context, 'int_a', 0)
ib = zint.create(_.context, 'int_b', 1)
v1 = ia > ib
v2 = ia < ib
print(v1.z, v2.z)
v3 = ia >= ib
v4 = ia <= ib
print(v3.z, v4.z)

int_a > int_b int_a < int_b
Or(int_a > int_b, int_a == int_b) Or(int_a < int_b, int_a == int_b)

##### Binary Operators for Integers¶

We implement relevant arithmetic operators for integers as described in Python documentation. (The commented out operators are not directly available for z3.ArithRef. They need to be implemented separately if needed. See the exercises for how it can be done.)

In [58]:
INT_BINARY_OPS = [
'__sub__',
'__mul__',
'__truediv__',
# '__div__',
'__mod__',
# '__divmod__',
'__pow__',
# '__lshift__',
# '__rshift__',
# '__and__',
# '__xor__',
# '__or__',
'__rsub__',
'__rmul__',
'__rtruediv__',
# '__rdiv__',
'__rmod__',
# '__rdivmod__',
'__rpow__',
# '__rlshift__',
# '__rrshift__',
# '__rand__',
# '__rxor__',
# '__ror__',
]

In [59]:
def make_int_binary_wrapper(fname, fun, zfun):
def proxy(self, other):
z, v = self._zv(other)
z_ = zfun(self.z, z)
v_ = fun(self.v, v)
if isinstance(v_, float):
# we do not implement float results yet.
assert round(v_) == v_
v_ = round(v_)
return zint(self.context, z_, v_)

return proxy

In [60]:
for fname in INT_BINARY_OPS:
fun = getattr(int, fname)
zfun = getattr(z3.ArithRef, fname)
setattr(zint, fname, make_int_binary_wrapper(fname, fun, zfun))

In [61]:
with ConcolicTracer() as _:
ia = zint.create(_.context, 'int_a', 0)
ib = zint.create(_.context, 'int_b', 1)
print((ia + ib).z)
print((ia + 10).z)
print((11 + ib).z)
print((ia - ib).z)
print((ia * ib).z)
print((ia / ib).z)
print((ia ** ib).z)

int_a + int_b
int_a + 10
11 + int_b
int_a - int_b
int_a*int_b
int_a/int_b
int_a**int_b

##### Integer Unary Operators¶

We also implement the relevant unary operators as below.

In [62]:
INT_UNARY_OPS = [
'__neg__',
'__pos__',
# '__abs__',
# '__invert__',
# '__round__',
# '__ceil__',
# '__floor__',
# '__trunc__',
]

In [63]:
def make_int_unary_wrapper(fname, fun, zfun):
def proxy(self):
return zint(self.context, zfun(self.z), fun(self.v))

return proxy

In [64]:
for fname in INT_UNARY_OPS:
fun = getattr(int, fname)
zfun = getattr(z3.ArithRef, fname)
setattr(zint, fname, make_int_unary_wrapper(fname, fun, zfun))


We can use the unary operators we defined above as follows:

In [65]:
with ConcolicTracer() as _:
ia = zint.create(_.context, 'int_a', 0)
print((-ia).z)
print((+ia).z)

-int_a
int_a

##### Using an Integer in a Boolean Context¶

An integer may be converted to a boolean context in conditionals or as part of boolean predicates such as or, and and not. In these cases, the __bool__() method gets called. Unfortunately, this method requires a primitive boolean value. Hence, we force the current integer formula to a boolean predicate and register it in the current context.

In [66]:
class zint(zint):
def __bool__(self):
# return zbool(self.context, self.z, self.v) <-- not allowed
# force registering boolean condition
if self != 0:
return True
return False


It is used as follows

In [67]:
with ConcolicTracer() as _:
za = zint.create(_.context, 'int_a', 1)
zb = zint.create(_.context, 'int_b', 0)
if za and zb:
print(1)

In [68]:
_.context

Out[68]:
({'int_a': 'Int', 'int_b': 'Int'}, [0 != int_a, Not(0 != int_b)])

#### Remaining Methods of the ConcolicTracer¶

We now complete some of the methods of the ConcolicTracer.

##### Translating to the SMT Expression Format¶

Given that we are using an SMT Solver z3, it is often useful to retrieve the corresponding SMT expression for a symbolic expression. This can be used as an argument to z3 or other SMT solvers.

The format of the SMT expression (SMT-LIB) is as follows:

• Variables declarations in S-EXP format. E.g. The following declares a symbolic integer variable x
(declare-const x Int)
This declares a bit vector b of length 8
(declare-const b (_ BitVec 8))
This declares a symbolic real variable r
(declare-const x Real)
This declares a symbolic string variable s
(declare-const s String)

The declared variables can be used in logical formulas that are encoded in S-EXP format. For example, here is a logical formula.

(assert
(and
(= a b)
(= a c)
(! b c)))

Here is another example, using string variables.

(or (< 0 (str.indexof (str.substr my_str1 7 19) " where " 0))
(= (str.indexof (str.substr my_str1 7 19) " where " 0) 0))
In [69]:
class ConcolicTracer(ConcolicTracer):
def smt_expr(self, show_decl=False, simplify=False, path=[]):
r = []
if show_decl:
for decl in self.decls:
v = self.decls[decl]
v = '(_ BitVec 8)' if v == 'BitVec' else v
r.append("(declare-const %s %s)" % (decl, v))
path = path if path else self.path
if path:
path = z3.And(path)
if show_decl:
if simplify:
return '\n'.join([
*r,
"(assert %s)" % z3.simplify(path).sexpr()
])
else:
return '\n'.join(
[*r, "(assert %s)" % path.sexpr()])
else:
return z3.simplify(path).sexpr()
else:
return ''


To see how to use smt_expr(), let us consider an example. The triangle() function is used to determine if the given sides to a triangle result in an equilateral triangle, an isosceles triangle, or a scalene triangle. It is implemented as follows.

In [70]:
def triangle(a, b, c):
if a == b:
if b == c:
return 'equilateral'
else:
return 'isosceles'
else:
if b == c:
return 'isosceles'
else:
if a == c:
return 'isosceles'
else:
return 'scalene'

In [71]:
triangle(1, 2, 1)

Out[71]:
'isosceles'

To translate make it run under ConcolicTracer, we first define the arguments. The triangle being defined has sides 1, 1, 1. i.e. it is an equilateral triangle.

In [72]:
with ConcolicTracer() as _:
za = zint.create(_.context, 'int_a', 1)
zb = zint.create(_.context, 'int_b', 1)
zc = zint.create(_.context, 'int_c', 1)
triangle(za, zb, zc)
print(_.context)

({'int_a': 'Int', 'int_b': 'Int', 'int_c': 'Int'}, [int_a == int_b, int_b == int_c])


We can now call smt_expr() to retrieve the SMT expression as below.

In [73]:
print(_.smt_expr(show_decl=True))

(declare-const int_a Int)
(declare-const int_b Int)
(declare-const int_c Int)
(assert (and (= int_a int_b) (= int_b int_c)))


The collected predicates can also be solved directly using the Python z3 API.

In [74]:
z3.solve(_.path)

[int_c = 0, int_a = 0, int_b = 0]

##### Generating Fresh Names¶

While using the proxy classes, we often will have to generate new symbolic variables, with names that have not been used before. For this, we define fresh_name() that always generates unique integers for names.

In [75]:
COUNTER = 0

In [76]:
def fresh_name():
global COUNTER
COUNTER += 1
return COUNTER


It can be used as follows

In [77]:
fresh_name()

Out[77]:
1
##### Translating Arguments to Concolic Proxies¶

We had previously defined concolic() as a transparent function. We now provide the full implementation of this function. It inspects a given function's parameters, and infers the parameter types from the concrete arguments passed in. It then uses this information to instantiate the correct proxy classes for each argument.

In [78]:
class ConcolicTracer(ConcolicTracer):
def concolic(self, args):
my_args = []
for name, arg in zip(self.fn_args, args):
t = type(arg).__name__
zwrap = globals()['z' + t]
vname = "%s_%s_%s_%s" % (self.fn.__name__, name, t, fresh_name())
my_args.append(zwrap.create(self.context, vname, arg))
self.fn_args[name] = vname
return my_args


This is how it gets used:

In [79]:
with ConcolicTracer() as _:
_[factorial](5)


With the new concolic() method, the arguments to the factorial are correctly associated with symbolic variables, which allows us to retrieve the predicates encountered.

In [80]:
_.context

Out[80]:
({'factorial_n_int_2': 'Int'},
[Not(0 > factorial_n_int_2),
Not(0 == factorial_n_int_2),
Not(1 == factorial_n_int_2),
0 != factorial_n_int_2,
0 != factorial_n_int_2 - 1,
0 != factorial_n_int_2 - 1 - 1,
0 != factorial_n_int_2 - 1 - 1 - 1,
0 != factorial_n_int_2 - 1 - 1 - 1 - 1,
Not(0 != factorial_n_int_2 - 1 - 1 - 1 - 1 - 1)])

As before, we can also print out the SMT expression which can be passed directly to command line SMT solvers.

In [81]:
print(_.smt_expr(show_decl=True))

(declare-const factorial_n_int_2 Int)
(assert (let ((a!1 (distinct 0 (- (- (- factorial_n_int_2 1) 1) 1)))
(a!2 (- (- (- (- factorial_n_int_2 1) 1) 1) 1)))
(and (not (> 0 factorial_n_int_2))
(not (= 0 factorial_n_int_2))
(not (= 1 factorial_n_int_2))
(distinct 0 factorial_n_int_2)
(distinct 0 (- factorial_n_int_2 1))
(distinct 0 (- (- factorial_n_int_2 1) 1))
a!1
(distinct 0 a!2)
(not (distinct 0 (- a!2 1))))))


We next define methods to evaluate the SMT expression both in Python and from command line.

##### Evaluating the Concolic Expressions¶

We define zeval() to solve the predicates in a context, and return results. It has two modes. The python mode uses z3 Python API to solve and return the results. If the python mode is false, it writes the SMT expression to a file, and invokes the command line z3 for a solution.

In [82]:
class ConcolicTracer(ConcolicTracer):
def zeval(self, python=False, log=False):
r, sol = (zeval_py if python else zeval_smt)(self.path, self, log)
if r == 'sat':
return r, {k: sol.get(self.fn_args[k], None) for k in self.fn_args}
else:
return r, None

##### Using the Python API¶

Given a set of predicates that the function encountered, and the tracer under which the function was executed, the zeval_py() function first declares the relevant symbolic variables, and uses the z3.Solver()to provide a set of inputs that would trace the same path through the function.

In [83]:
def zeval_py(path, cc, log):
for decl in cc.decls:
if cc.decls[decl] == 'BitVec':
v = "z3.%s('%s', 8)" % (cc.decls[decl], decl)
else:
v = "z3.%s('%s')" % (cc.decls[decl], decl)
exec(v)
s = z3.Solver()
if s.check() == z3.unsat:
return 'No Solutions', {}
elif s.check() == z3.unknown:
return 'Gave up', None
assert s.check() == z3.sat
m = s.model()
return 'sat', {d.name(): m[d] for d in m.decls()}


It can be used as follows:

In [84]:
with ConcolicTracer() as _:
_[factorial](5)

In [85]:
_.zeval(python=True)

Out[85]:
('sat', {'n': 5})

That is, given the set of constraints, the assignment n == 5 conforms to all constraints.

##### Using the Command Line¶

The zeval_smt() function writes the SMT expression to the file system, and calls the z3 SMT solver command line to solve it. The result of SMT expression is again an sexpr. Hence, we first define parse_sexp() to parse and return the correct values.

In [86]:
import re

In [87]:
import subprocess

In [88]:
SEXPR_TOKEN = r'''(?mx)
\s*(?:
(?P<bra>$)| (?P<ket>$)|
(?P<token>[^"()\s]+)|
(?P<string>"[^"]*")
)'''

In [89]:
def parse_sexp(sexp):
stack, res = [], []
for elements in re.finditer(SEXPR_TOKEN, sexp):
kind, value = [(t, v) for t, v in elements.groupdict().items() if v][0]
if kind == 'bra':
stack.append(res)
res = []
elif kind == 'ket':
last, res = res, stack.pop(-1)
res.append(last)
elif kind == 'token':
res.append(value)
elif kind == 'string':
res.append(value[1:-1])
else:
assert False
return res


The parse_sexp() function can be used as follows

In [90]:
parse_sexp('abcd (hello 123 (world "hello world"))')

Out[90]:
['abcd', ['hello', '123', ['world', 'hello world']]]

We now define zeval_smt() which uses the z3 command line directly, and uses parse_sexp() to parse and return the solutions to function arguments if any.

In [91]:
import tempfile

In [92]:
def zeval_smt(path, cc, log):
s = cc.smt_expr(True, True, path)

with tempfile.NamedTemporaryFile(mode='w', suffix='.smt') as f:
f.write(s)
f.write("\n(check-sat)")
f.write("\n(get-model)")
f.flush()

if log:
print(s, '(check-sat)', '(get-model)', sep='\n')
output = subprocess.getoutput("z3 " + f.name)

if log:
print(output)
o = parse_sexp(output)
kind = o[0]
if kind == 'unknown':
return 'Gave up', None
elif kind == 'unsat':
return 'No Solutions', {}
assert kind == 'sat'
assert o[1][0] == 'model'
return 'sat', {i[1]: i[-1] for i in o[1][1:]}


We can now use zeval() as follows.

In [93]:
with ConcolicTracer() as _:
_[factorial](5)

In [94]:
_.zeval()

Out[94]:
('sat', {'n': '5'})

Indeed, we get similar results (n == 5) from using the command line as from using the Python API.

#### A Proxy Class for Strings¶

Here, we define the proxy string class zstr. First we define our initialization routines. Since str is a primitive type, we define new to extend it.

In [95]:
class zstr(str):
def __new__(cls, context, zn, v):
return str.__new__(cls, v)


As before, initialization proceeds with create() and the constructor.

In [96]:
class zstr(zstr):
@classmethod
def create(cls, context, zn, v=None):
return zproxy_create(cls, 'String', z3.String, context, zn, v)

def __init__(self, context, z, v=None):
self.context, self.z, self.v = context, z, v
self._len = zint(context, z3.Length(z), len(v))
#self.context[1].append(z3.Length(z) == z3.IntVal(len(v)))


We also define _zv() helper to help us with methods that accept another string

In [97]:
class zstr(zstr):
def _zv(self, o):
return (o.z, o.v) if isinstance(o, zstr) else (z3.StringVal(o), o)

##### Retrieving Ordinal Value¶

We define zord that given a symbolic one character long string, obtains the ord() for that. It returns two values. The first one is the variable that corresponds to ord(), and second is the predicate that links the variable to the passed in single character string.

In [98]:
def zord(context, c):
bn = "bitvec_%d" % fresh_name()
v = z3.BitVec(bn, 8)
context[0][bn] = 'BitVec'
z = (z3.Unit(v) == c)
context[1].append(z)
return v


We use it as follows

In [99]:
zc = z3.String('arg_%d' % fresh_name())

In [100]:
with ConcolicTracer() as _:
zi = zord(_.context, zc)


The symbolic bitvector is in zi. It is linked to the passed in argument in context

In [101]:
_.context

Out[101]:
({'bitvec_6': 'BitVec'}, [seq.unit(bitvec_6) == arg_5])

We can specify what the result of ord() should be, and call z3.solve() to provide us with a solution that will provide the required result.

In [102]:
z3.solve(_.path + [zi == 65])

[arg_5 = "A", bitvec_6 = 65]

##### Translating an Ordinal Value to ASCII¶

Similarly, we can convert the ASCII value back to a single character string using zchr()

In [103]:
def zchr(context, i):
sn = 'string_%d' % fresh_name()
s = z3.String(sn)
context[0][sn] = 'String'
z = z3.And([s == z3.Unit(i), z3.Length(s) == 1])
context[1].append(z)
return s


For using it, we first define a bitvector that is 8 bits long.

In [104]:
i = z3.BitVec('bv_%d' % fresh_name(), 8)


We can now retrieve the chr() representation as below.

In [105]:
with ConcolicTracer() as _:
zc = zchr(_.context, i)

In [106]:
_.context

Out[106]:
({'string_8': 'String'},
[And(string_8 == seq.unit(bv_7), str.len(string_8) == 1)])

As before, we can specify what the end result of calling chr() should be to get the original argument.

In [107]:
z3.solve(_.path + [zc == z3.StringVal('a')])

[bv_7 = 97, string_8 = "a"]

##### Equality between Strings¶

The equality of zstr is defined similar to that of zint

In [108]:
class zstr(zstr):
def __eq__(self, other):
z, v = self._zv(other)
return zbool(self.context, self.z == z, self.v == v)

def __req__(self, other):
return self.__eq__(other)


The zstr class is used as follows.

In [109]:
def tstr1(s):
if s == 'h':
return True
else:
return False

In [110]:
with ConcolicTracer() as _:
r = _[tstr1]('h')

In [111]:
_.zeval()

Out[111]:
('sat', {'s': 'h'})

It works even if we have more than one character.

In [112]:
def tstr1(s):
if s == 'hello world':
return True
else:
return False

In [113]:
with ConcolicTracer() as _:
r = _[tstr1]('hello world')

In [114]:
_.context

Out[114]:
({'tstr1_s_str_10': 'String'}, [tstr1_s_str_10 == "hello world"])
In [115]:
_.zeval()

Out[115]:
('sat', {'s': 'hello world'})
##### Concatenation of Strings¶

What if we need to concatenate two strings? We need additional helpers to accomplish that.

In [116]:
class zstr(zstr):
z, v = self._zv(other)
return zstr(self.context, self.z + z, self.v + v)



Here is how it can be used. First, we create the wrapped arguments

In [117]:
with ConcolicTracer() as _:
v1, v2 = [zstr.create(_.context, 'arg_%d' % fresh_name(), s)
for s in ['hello', 'world']]
if (v1 + ' ' + v2) == 'hello world':
print('hello world')

hello world


The addition of symbolic variables is preserved in context

In [118]:
_.context

Out[118]:
({'arg_11': 'String', 'arg_12': 'String'},
[str.++(str.++(arg_11, " "), arg_12) == "hello world"])
##### Producing Substrings¶

Similarly, accessing substrings also require extra help.

In [119]:
class zstr(zstr):
def __getitem__(self, idx):
if isinstance(idx, slice):
start, stop, step = idx.indices(len(self.v))
assert step == 1  # for now
assert stop >= start  # for now
rz = z3.SubString(self.z, start, stop - start)
rv = self.v[idx]
elif isinstance(idx, int):
rz = z3.SubString(self.z, idx, 1)
rv = self.v[idx]
else:
assert False  # for now
return zstr(self.context, rz, rv)

def __iter__(self):
return zstr_iterator(self.context, self)

##### An Iterator Class for Strings¶

We define the iterator as follows.

In [120]:
class zstr_iterator():
def __init__(self, context, zstr):
self.context = context
self._zstr = zstr
self._str_idx = 0
self._str_max = zstr._len  # intz is not an _int_

def __next__(self):
if self._str_idx == self._str_max:  # intz#eq
raise StopIteration
c = self._zstr[self._str_idx]
self._str_idx += 1
return c

def __len__(self):
return self._len


Here is how it can be used.

In [121]:
def tstr2(s):
if s[0] == 'h' and s[1] == 'e' and s[3] == 'l':
return True
else:
return False

In [122]:
with ConcolicTracer() as _:
r = _[tstr2]('hello')


Again, the context shows predicates encountered.

In [123]:
_.context

Out[123]:
({'tstr2_s_str_13': 'String'},
[str.substr(tstr2_s_str_13, 0, 1) == "h",
str.substr(tstr2_s_str_13, 1, 1) == "e",
str.substr(tstr2_s_str_13, 3, 1) == "l"])

The function zeval() returns a solution for the predicate. Note that the value returned is not exactly the argument that we passed in. This is a consequence of the predicates we have. That is, we have no constraints on what the character value on s[2] should be.

In [124]:
_.zeval()

Out[124]:
('sat', {'s': 'he\\x00l'})
##### Translating to Upper and Lower Equivalents¶

A major complication is supporting upper() and lower() methods. We use the previously defined zchr() and zord() functions to accomplish this.

In [125]:
class zstr(zstr):
def upper(self):
empty = ''
ne = 'empty_%d' % fresh_name()
result = zstr.create(self.context, ne, empty)
self.context[1].append(z3.StringVal(empty) == result.z)
cdiff = (ord('a') - ord('A'))
for i in self:
oz = zord(self.context, i.z)
uz = zchr(self.context, oz - cdiff)
rz = z3.And([oz >= ord('a'), oz <= ord('z')])
ov = ord(i.v)
uv = chr(ov - cdiff)
rv = ov >= ord('a') and ov <= ord('z')
if zbool(self.context, rz, rv):
i = zstr(self.context, uz, uv)
else:
i = zstr(self.context, i.z, i.v)
result += i
return result


The lower() function is similar to upper() except that the character ranges are switched, and the lowercase is above uppercase. Hence, we add the difference to the ordinal to make a character to lowercase.

In [126]:
class zstr(zstr):
def lower(self):
empty = ''
ne = 'empty_%d' % fresh_name()
result = zstr.create(self.context, ne, empty)
self.context[1].append(z3.StringVal(empty) == result.z)
cdiff = (ord('a') - ord('A'))
for i in self:
oz = zord(self.context, i.z)
uz = zchr(self.context, oz + cdiff)
rz = z3.And([oz >= ord('A'), oz <= ord('Z')])
ov = ord(i.v)
uv = chr(ov + cdiff)
rv = ov >= ord('A') and ov <= ord('Z')
if zbool(self.context, rz, rv):
i = zstr(self.context, uz, uv)
else:
i = zstr(self.context, i.z, i.v)
result += i
return result


Here is how it is used.

In [127]:
def tstr3(s):
if s.upper() == 'H':
return True
else:
return False

In [128]:
with ConcolicTracer() as _:
r = _[tstr3]('h')


Again, we use zeval() to solve the collected constraints, and verify that our constraints are correct.

In [129]:
_.zeval()

Out[129]:
('sat', {'s': 'h'})

Here is a larger example using upper()

In [130]:
def tstr4(s):
if s.lower() == 'hello world':
return True
else:
return False

In [131]:
with ConcolicTracer() as _:
r = _[tstr4]('Hello World')

In [132]:
_.zeval()

Out[132]:
('sat', {'s': 'Hello World'})

Again, we obtain the right input value.

##### Checking for String Prefixes¶

We define startswith().

In [133]:
class zstr(zstr):
def startswith(self, other, beg=0, end=None):
assert end is None  # for now
assert isinstance(beg, int)  # for now
zb = z3.IntVal(beg)

others = other if isinstance(other, tuple) else (other, )

last = False
for o in others:
z, v = self._zv(o)
r = z3.IndexOf(self.z, z, zb)
last = zbool(self.context, r == zb, self.v.startswith(v))
if last:
return last
return last


An example.

In [134]:
def tstr5(s):
if s.startswith('hello'):
return True
else:
return False

In [135]:
with ConcolicTracer() as _:
r = _[tstr5]('hello world')

In [136]:
_.zeval()

Out[136]:
('sat', {'s': 'hello'})
In [137]:
with ConcolicTracer() as _:
r = _[tstr5]('my world')

In [138]:
_.zeval()

Out[138]:
('sat', {'s': '\\x00hello'})

As before, the predicates only ensure that the startswith() returned a true value. Hence, our solution reflects that.

##### Finding Substrings¶

We also define find()

In [139]:
class zstr(zstr):
def find(self, other, beg=0, end=None):
assert end is None  # for now
assert isinstance(beg, int)  # for now
zb = z3.IntVal(beg)
z, v = self._zv(other)
zi = z3.IndexOf(self.z, z, zb)
vi = self.v.find(v, beg, end)
return zint(self.context, zi, vi)


An example.

In [140]:
def tstr6(s):
if s.find('world') != -1:
return True
else:
return False

In [141]:
with ConcolicTracer() as _:
r = _[tstr6]('hello world')

In [142]:
_.zeval()

Out[142]:
('sat', {'s': '\\x00world'})

As before, the predicates only ensure that the find() returned a value greater than -1. Hence, our solution reflects that.

##### Remove Space from Ends¶

We next implement strip().

In [143]:
import string

In [144]:
class zstr(zstr):
def rstrip(self, chars=None):
if chars is None:
chars = string.whitespace
if self._len == 0:
return self
else:
last_idx = self._len - 1
cz = z3.SubString(self.z, last_idx.z, 1)
cv = self.v[-1]
zcheck_space = z3.Or([cz == z3.StringVal(char) for char in chars])
vcheck_space = any(cv == char for char in chars)
if zbool(self.context, zcheck_space, vcheck_space):
return zstr(self.context, z3.SubString(self.z, 0, last_idx.z),
self.v[0:-1]).rstrip(chars)
else:
return self

In [145]:
def tstr7(s):
if s.rstrip(' ') == 'a b':
return True
else:
return False

In [146]:
with ConcolicTracer() as _:
r = _[tstr7]('a b   ')
print(r)

True

In [147]:
_.zeval()

Out[147]:
('sat', {'s': 'a b   '})
In [148]:
class zstr(zstr):
def lstrip(self, chars=None):
if chars is None:
chars = string.whitespace
if self._len == 0:
return self
else:
first_idx = 0
cz = z3.SubString(self.z, 0, 1)
cv = self.v[0]
zcheck_space = z3.Or([cz == z3.StringVal(char) for char in chars])
vcheck_space = any(cv == char for char in chars)
if zbool(self.context, zcheck_space, vcheck_space):
return zstr(self.context, z3.SubString(
self.z, 1, self._len.z), self.v[1:]).lstrip(chars)
else:
return self

In [149]:
def tstr8(s):
if s.lstrip(' ') == 'a b':
return True
else:
return False

In [150]:
with ConcolicTracer() as _:
r = _[tstr8]('   a b')
print(r)

True

In [151]:
_.zeval()

Out[151]:
('sat', {'s': '   a b'})
In [152]:
class zstr(zstr):
def strip(self, chars=None):
return self.lstrip(chars).rstrip(chars)


Example usage.

In [153]:
def tstr9(s):
if s.strip() == 'a b':
return True
else:
return False

In [154]:
with ConcolicTracer() as _:
r = _[tstr9]('    a b  ')
print(r)

True

In [155]:
_.zeval()

Out[155]:
('sat', {'s': '    a b  '})

The strip() has generated the right constraints.

##### Splitting Strings¶

We implement string split() as follows.

In [156]:
class zstr(zstr):
def split(self, sep=None, maxsplit=-1):
assert sep is not None  # default space based split is complicated
assert maxsplit == -1  # for now.
zsep = z3.StringVal(sep)
zl = z3.Length(zsep)
zi = z3.IndexOf(self.z, zsep, z3.IntVal(0))  # zi would be the length of prefix
# Z3Bug: There is a bug in the z3.IndexOf method which returns
# z3.SeqRef instead of z3.ArithRef. So we need to fix it.
zi = z3.ArithRef(zi.ast, zi.ctx)

vi = self.v.find(sep)
if zbool(self.context, zi >= z3.IntVal(0), vi >= 0):
zprefix = z3.SubString(self.z, z3.IntVal(0), zi)
zmid = z3.SubString(self.z, zi, zl)
zsuffix = z3.SubString(self.z, zi + zl,
z3.Length(self.z))
return [zstr(self.context, zprefix, self.v[0:vi])] + zstr(
self.context, zsuffix, self.v[vi + len(sep):]).split(
sep, maxsplit)
else:
return [self]

In [157]:
def tstr10(s):
if s.split(',') == ['a', 'b', 'c']:
return True
else:
return False

In [158]:
with ConcolicTracer() as _:
r = _[tstr10]('a,b,c')
print(r)

True

In [159]:
_.zeval()

Out[159]:
('sat', {'s': 'a,b,c'})
##### Trip Wire¶

For easier debugging, we abort any calls to methods in str that are not overridden by zstr.

In [160]:
def make_str_abort_wrapper(fun):
def proxy(*args, **kwargs):
raise Exception( '%s Not implemented in zstr' % fun.__name__)
return proxy

In [161]:
strmembers = inspect.getmembers(zstr, callable)
zstrmembers = {m[0] for m in strmembers if len(
m) == 2 and 'zstr' in m[1].__qualname__}
for name, fn in inspect.getmembers(str, callable):
# Omitted 'splitlines' as this is needed for formatting output in
# IPython/Jupyter
if name not in zstrmembers and name not in [
'splitlines',
'__class__',
'__contains__',
'__delattr__',
'__dir__',
'__format__',
'__ge__',
'__getattribute__',
'__getnewargs__',
'__gt__',
'__hash__',
'__le__',
'__len__',
'__lt__',
'__mod__',
'__mul__',
'__ne__',
'__reduce__',
'__reduce_ex__',
'__repr__',
'__rmod__',
'__rmul__',
'__setattr__',
'__sizeof__',
'__str__']:
setattr(zstr, name, make_str_abort_wrapper(fn))


## Examples¶

### Triangle¶

We previously showed how to run triangle() under ConcolicTracer.

In [162]:
with ConcolicTracer() as _:
print(_[triangle](1, 2, 3))

scalene


The predicates are as follows:

In [163]:
_.path

Out[163]:
[Not(triangle_a_int_49 == triangle_b_int_50),
Not(triangle_b_int_50 == triangle_c_int_51),
Not(triangle_a_int_49 == triangle_c_int_51)]
In [164]:
_.zeval()

Out[164]:
('sat', {'a': '0', 'b': ['-', '2'], 'c': ['-', '1']})

We can modify the predicates if necessary. First, we retrieve the symbolic variables.

In [165]:
za, zb, zc = [z3.Int(s) for s in _.context[0].keys()]


Then, we pass a modified predicate to zeval(). The key determines which predicate the new predicate will replace.

In [166]:
_.zeval({1: zb == zc})

Out[166]:
('sat', {'a': -1, 'b': -2, 'c': 0})
In [167]:
triangle(1, 0, 1)

Out[167]:
'isosceles'

The updated predicate returns isosceles as expected.

### Round¶

Here is a function that gives you the nearest ten's multiplier

In [168]:
def round10(r):
while r % 10 != 0:
r += 1
return r


As before, we execute the function under the ConcolicTracer context.

In [169]:
with ConcolicTracer() as _:
r = _[round10](1)


We verify that we were able to capture all the predicates

In [170]:
_.context

Out[170]:
({'round10_r_int_52': 'Int'},
[0 != round10_r_int_52%10,
0 != (round10_r_int_52 + 1)%10,
0 != (round10_r_int_52 + 1 + 1)%10,
0 != (round10_r_int_52 + 1 + 1 + 1)%10,
0 != (round10_r_int_52 + 1 + 1 + 1 + 1)%10,
0 != (round10_r_int_52 + 1 + 1 + 1 + 1 + 1)%10,
0 != (round10_r_int_52 + 1 + 1 + 1 + 1 + 1 + 1)%10,
0 != (round10_r_int_52 + 1 + 1 + 1 + 1 + 1 + 1 + 1)%10,
0 != (round10_r_int_52 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1)%10,
Not(0 !=
(round10_r_int_52 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1)%
10)])

We use zeval() to obtain results.

In [171]:
_.zeval()

Out[171]:
('sat', {'r': '41'})

### Absolute Maximum¶

Does our concolic proxies work across functions? Say we have a function max_value() as below.

In [172]:
def abs_value(a):
if a > 0:
return a
else:
return -a


It is called by another function abs_max()

In [173]:
def abs_max(a, b):
a1 = abs_value(a)
b1 = abs_value(b)
if a1 > b1:
c = a1
else:
c = b1
return c


Using the Concolic() context on abs_max().

In [174]:
with ConcolicTracer() as _:
_[abs_max](2, 1)


As expected, we have the predicates across functions.

In [175]:
_.context

Out[175]:
({'abs_max_a_int_53': 'Int', 'abs_max_b_int_54': 'Int'},
[0 < abs_max_a_int_53,
0 < abs_max_b_int_54,
abs_max_a_int_53 > abs_max_b_int_54])
In [176]:
_.zeval()

Out[176]:
('sat', {'a': '2', 'b': '1'})

Solving the predicates works as expected.

Using negative numbers as arguments so that a different branch is taken in abs_value()

In [177]:
with ConcolicTracer() as _:
_[abs_max](-2, -1)

In [178]:
_.context

Out[178]:
({'abs_max_a_int_55': 'Int', 'abs_max_b_int_56': 'Int'},
[Not(0 < abs_max_a_int_55),
Not(0 < abs_max_b_int_56),
-abs_max_a_int_55 > -abs_max_b_int_56])
In [179]:
_.zeval()

Out[179]:
('sat', {'a': ['-', '1'], 'b': '0'})

The solution reflects our predicates. (We used a > 0 in abs_value()).

### Binomial Coefficient¶

For a larger example that uses different kinds of variables, say we want to compute the binomial coefficient by the following formulas

$$^nP_k=\frac{n!}{(n-k)!}$$$$\binom nk=\,^nC_k=\frac{^nP_k}{k!}$$

we define the functions as follows.

In [180]:
def factorial(n):
v = 1
while n != 0:
v *= n
n -= 1
return v

In [181]:
def permutation(n, k):
return factorial(n) / factorial(n - k)

In [182]:
def combination(n, k):
return permutation(n, k) / factorial(k)

In [183]:
def binomial(n, k):
if n < 0 or k < 0 or n < k:
raise Exception('Invalid values')
return combination(n, k)


As before, we run the function under ConcolicTracer.

In [184]:
with ConcolicTracer() as _:
v = _[binomial](4, 2)


Then call zeval() to evaluate.

In [185]:
_.zeval()

Out[185]:
('sat', {'n': '4', 'k': '2'})
The values returned are same as the input values as expected.

### Database¶

For a larger example using the Concolic String class zstr, We use the DB class from the chapter on information flow.

In [186]:
from InformationFlow import DB, sample_db, update_inventory


We first populate our database.

In [187]:
from GrammarMiner import VEHICLES  # minor dependency

In [188]:
db = sample_db()
for V in VEHICLES:
update_inventory(db, V)

In [189]:
db.db

Out[189]:
{'inventory': ({'year': int, 'kind': str, 'company': str, 'model': str},
[{'year': 1997, 'kind': 'van', 'company': 'Ford', 'model': 'E350'},
{'year': 2000, 'kind': 'car', 'company': 'Mercury', 'model': 'Cougar'},
{'year': 1999, 'kind': 'car', 'company': 'Chevy', 'model': 'Venture'}])}

We are now ready to fuzz our DB class. Hash functions are difficult to handle directly (because they rely on internal C functions). Hence we modify table() slightly.

In [190]:
class ConcolicDB(DB):
def table(self, t_name):
for k, v in self.db:
if t_name == k:
return v

def column(self, decl, c_name):
for k in decl:
if c_name == k:
return decl[k]


To make it easy, we define a single function db_select() that directly invokes db.sql().

In [191]:
def db_select(s):
my_db = ConcolicDB()
my_db.db = [(k, v) for (k, v) in db.db.items()]
r = my_db.sql(s)
return r


We now want to run SQL statements under our ConcolicTracer, and collect predicates obtained.

In [192]:
with ConcolicTracer() as _:
_[db_select]('select kind from inventory')


The predicates encountered during the execution are as follows:

In [193]:
_.path

Out[193]:
[str.indexof(db_select_s_str_59, "select ", 0) == 0,
str.indexof(db_select_s_str_59, "select ", 0) == 0,
Not(0 >
str.indexof(str.substr(db_select_s_str_59, 7, 19),
" from ",
0)),
Not(19 <
str.indexof(str.substr(db_select_s_str_59, 7, 19),
" from ",
0)),
Or(0 <
str.indexof(str.substr(db_select_s_str_59, 7, 19),
" from ",
0),
str.indexof(str.substr(db_select_s_str_59, 7, 19),
" from ",
0) ==
0),
Not(Or(0 <
str.indexof(str.substr(db_select_s_str_59, 7, 19),
" where ",
0),
str.indexof(str.substr(db_select_s_str_59, 7, 19),
" where ",
0) ==
0)),
Not(19 <
str.indexof(str.substr(db_select_s_str_59, 7, 19),
" from ",
0) +
6),
Or(19 >
str.indexof(str.substr(db_select_s_str_59, 7, 19),
" from ",
0) +
6,
19 ==
str.indexof(str.substr(db_select_s_str_59, 7, 19),
" from ",
0) +
6),
str.substr(str.substr(db_select_s_str_59, 7, 19), 10, 9) ==
"inventory"]

We can use zeval() as before to solve the constraints.

In [194]:
_.zeval()

Out[194]:
('Gave up', None)

Note: Due to the state of the theory of strings in SMT Solvers, you will mostly get gave up when executing the zeval(). This is perfectly fine. It just means that the SMT solver gave up before it reached an answer.

## Fuzzing with Constraints¶

In this section, we show how to use the infrastructure we built for concolic execution for guiding fuzzing.

### SimpleConcolicFuzzer¶

The SimpleConcolicFuzzer starts with a sample input generated by some other fuzzer. It then runs the function being tested under ConcolicTracer, and collects the path predicates. It then negates random predicates within the path and solves it with z3 to produce a new output that is guaranteed to take a different path than the original.

First, we import the Fuzzer interface, and an example program hang_if_no_space()

In [195]:
from Fuzzer import Fuzzer, hang_if_no_space

In [196]:
from ExpectError import ExpectTimeout

In [197]:
import random


The SimpleConcolicFuzzer is defined with the Fuzzer interface.

In [198]:
class SimpleConcolicFuzzer(Fuzzer):
def __init__(self):
self.ct = []
self.seen = set()
self.seen_strs = set()
self.max_tries = 1000
self.last = None
self.last_idx = None


The add_trace() method provides a way for new traces to be added. It is kept separate from the initialization as we might want to add more than one trace from the same function.

In [199]:
class SimpleConcolicFuzzer(SimpleConcolicFuzzer):
self.ct.append(tracer)
self.last = tracer
self.last_idx = len(tracer.context[1]) - 1


The to_num() method translates a series of predicates to a bit pattern that corresponds to the decision taken at each predicate. If the if branch is taken, the pattern is 1, while else branch is indicated by 0. This allows us to represent any execution path as a single integer.

In [200]:
class SimpleConcolicFuzzer(SimpleConcolicFuzzer):
def to_num(self, arr):
return int(
''.join(
reversed([
'0' if z3.simplify(i).decl().name() == 'not' else '1'
for i in arr
] + ['1'])), 2)


It is used as follows.

In [201]:
scf = SimpleConcolicFuzzer()
a, b = z3.Ints('a b')
print(bin(scf.to_num([z3.Not(a == b)])))
print(bin(scf.to_num([z3.Not(a == b), a == b])))
print(bin(scf.to_num([z3.Not(a == b), a == b, z3.Not(a == b)])))

0b10
0b110
0b1010


The bit pattern is read from left to right. The first 0 indicates the first else branch (corresponding to Not), while the last 1 is added as a sentinel. The pattern 0b1010 indicates two else branches taken.

The get_newpath() function chooses a random point in the list of predicates to negate. That is, it creates a new list with a prefix of random length (with same predicates as original), and a single negated value at the end. It also ensures that values seen once are never repeated.

In [202]:
class SimpleConcolicFuzzer(SimpleConcolicFuzzer):
def get_newpath(self):
switch = random.randint(0, self.last_idx)
if self.seen:
param = list(self.last.fn_args.values())[0]
sparam = z3.String(param)
seen = [sparam != z3.StringVal(i) for i in self.seen_strs]
else:
seen = []
new_path = self.last.path[0:switch] + \
[z3.Not(self.last.path[switch])] + seen
return self.to_num(new_path), new_path


The fuzz() method simply generates new lists of predicates, and solves them to produce new inputs.

In [203]:
class SimpleConcolicFuzzer(SimpleConcolicFuzzer):
def fuzz(self):
for i in range(self.max_tries):
pattern, path = self.get_newpath()
if pattern in self.seen:
continue
s, v = zeval_smt(path, self.last, log=False)
if s != 'sat':
continue
s = list(v.values())[0]
return s
return None


The SimpleConcolicFuzzer is used as follows. First, we use a random string to generate the concolic trace.

In [204]:
with ExpectTimeout(2):
with ConcolicTracer() as _:
_[hang_if_no_space]('abcd')

Traceback (most recent call last):
File "<ipython-input-204-ff2b46f545ec>", line 3, in <module>
_[hang_if_no_space]('abcd')
File "<ipython-input-33-635d2b1b13c2>", line 3, in __call__
self.result = self.fn(*self.concolic(args))
File "<string>", line 4, in hang_if_no_space
File "<string>", line 4, in hang_if_no_space
File "<string>", line 16, in check_time
TimeoutError (expected)


Next, we initialize and add this trace to the fuzzer.

In [205]:
scf = SimpleConcolicFuzzer()


Finally, we fuzz.

In [206]:
for i in range(10):
v = scf.fuzz()
if v is None:
break
print(v)

 \x00
\x00\x00 \x00
\x02
\x80
@
\x00 \x00
\x10
\x00\x00 \x08
\x00\x00\x00 \x00
\x00


Here is another example using cgi_decode()

In [207]:
from Coverage import cgi_decode

In [208]:
with ConcolicTracer() as _:
_[cgi_decode]('abcd')

In [209]:
scf = SimpleConcolicFuzzer()

In [210]:
for i in range(10):
v = scf.fuzz()
if v is None:
break
print(v)

%\x00
\x00\x00\x00+\x00
%\x02
\x00\x00%\x00
\x00+\x00
\x00\x00\x00+\x02
\x00\x00\x00%\x00
\x00\x00%
\x00\x00+\x00
\x00\x00\x00+\x80


The SimpleConcolicFuzzer is reasonably efficient at exploring paths near the path followed by a given sample input. However, it is not very intelligent when it comes to choosing which paths to follow. We look at another fuzzer that lifts the predicates obtained to the grammar and achieves better fuzzing.

### ConcolicGrammarFuzzer¶

The concolic framework can be used directly in fuzzing. First, we extend our GrammarFuzzer with a a helper method tree_to_string() such that we can retrieve the derivation tree of the fuzz output. We also define prune_tree() and coalesce() to reduce the depth of sub trees. These methods accept a list of tokens types such that a node belonging to the token type gets converted from a tree to a leaf node by calling tree_to_string().

In [211]:
from InformationFlow import INVENTORY_GRAMMAR, SQLException

In [212]:
from GrammarFuzzer import GrammarFuzzer

In [213]:
class ConcolicGrammarFuzzer(GrammarFuzzer):
def tree_to_string(self, tree):
symbol, children, *_ = tree
e = ''
if children:
return e.join([self.tree_to_string(c) for c in children])
else:
return e if symbol in self.grammar else symbol

def prune_tree(self, tree, tokens):
name, children = tree
children = self.coalesce(children)
if name in tokens:
return (name, [(self.tree_to_string(tree), [])])
else:
return (name, [self.prune_tree(c, tokens) for c in children])

def coalesce(self, children):
last = ''
new_lst = []
for cn, cc in children:
if cn not in self.grammar:
last += cn
else:
if last:
new_lst.append((last, []))
last = ''
new_lst.append((cn, cc))
if last:
new_lst.append((last, []))
return new_lst


We can now use the fuzzer to produce inputs for our DB.

In [214]:
tgf = ConcolicGrammarFuzzer(INVENTORY_GRAMMAR)
while True:
qtree = tgf.fuzz_tree()
query = str(tgf.tree_to_string(qtree))
if query.startswith('select'):
break

In [215]:
from ExpectError import ExpectError

In [216]:
with ExpectError():
print(repr(query))
with ConcolicTracer() as _:
res = _[db_select](str(query))
print(repr(res))

'select z,(k(w)+N) from W0 where T9(e)*i+b-E!=Yt'

Traceback (most recent call last):
File "<ipython-input-216-6cbadb7bcf98>", line 4, in <module>
res = _[db_select](str(query))
File "<ipython-input-33-635d2b1b13c2>", line 3, in __call__
self.result = self.fn(*self.concolic(args))
File "<ipython-input-191-26b52b86ed9e>", line 4, in db_select
r = my_db.sql(s)
File "<string>", line 18, in sql
File "<string>", line 17, in do_select
File "<ipython-input-190-d1ea3580462a>", line 6, in table


Our fuzzer returns with an exception. It is unable to find the specified table. Let us examine the predicates it encountered.

In [217]:
for i, p in enumerate(_.path):
print(i, p)

0 str.indexof(db_select_s_str_62, "select ", 0) == 0
1 str.indexof(db_select_s_str_62, "select ", 0) == 0
2 Not(0 >
str.indexof(str.substr(db_select_s_str_62, 7, 40),
" from ",
0))
3 Not(40 <
str.indexof(str.substr(db_select_s_str_62, 7, 40),
" from ",
0))
4 Or(0 <
str.indexof(str.substr(db_select_s_str_62, 7, 40),
" from ",
0),
str.indexof(str.substr(db_select_s_str_62, 7, 40),
" from ",
0) ==
0)
5 Or(0 <
str.indexof(str.substr(db_select_s_str_62, 7, 40),
" where ",
0),
str.indexof(str.substr(db_select_s_str_62, 7, 40),
" where ",
0) ==
0)
6 Not(40 <
str.indexof(str.substr(db_select_s_str_62, 7, 40),
" from ",
0) +
6)
7 Not(40 <
str.indexof(str.substr(db_select_s_str_62, 7, 40),
" where ",
0))
8 Or(str.indexof(str.substr(db_select_s_str_62, 7, 40),
" from ",
0) +
6 <
str.indexof(str.substr(db_select_s_str_62, 7, 40),
" where ",
0),
str.indexof(str.substr(db_select_s_str_62, 7, 40),
" where ",
0) ==
str.indexof(str.substr(db_select_s_str_62, 7, 40),
" from ",
0) +
6)
9 Not(40 <
str.indexof(str.substr(db_select_s_str_62, 7, 40),
" where ",
0) +
7)
10 Or(40 >
str.indexof(str.substr(db_select_s_str_62, 7, 40),
" where ",
0) +
7,
40 ==
str.indexof(str.substr(db_select_s_str_62, 7, 40),
" where ",
0) +
7)
11 Not(str.substr(str.substr(db_select_s_str_62, 7, 40), 16, 2) ==
"inventory")


Note that we can obtain constraints that are not present in the grammar from using the ConcolicTracer. In particular, see how we are able to obtain the condition that the table needs to be inventory (Predicate 11) for the fuzzing to succeed.

How do we lift these to the grammar? and in particular how do we do it automatically? One option we have is to simply switch the last predicate obtained. In our case, the last predicate is (11). Can we simply invert the predicate and solve it again?

In [218]:
new_path = _.path[0:-1] + [z3.Not(_.path[-1])]

In [219]:
new_ = ConcolicTracer((_.decls, new_path))
new_.fn = _.fn
new_.fn_args = _.fn_args

In [220]:
new_.zeval()

Out[220]:
('No Solutions', None)

Indeed, this will not work as the string lengths being compared to are different.

In [221]:
print(_.path[-1])
z3.solve(z3.Not(_.path[-1]))

Not(str.substr(str.substr(db_select_s_str_62, 7, 40), 16, 2) ==
"inventory")
no solution


A better idea is to investigate what string comparisons are being made, and associate that with the corresponding nodes in the grammar. Let us examine our derivation tree (pruned to avoid recursive structures, and to focus on important parts).

In [222]:
from GrammarFuzzer import display_tree

In [223]:
prune_tokens = [
'<value>', '<table>', '<column>', '<literals>', '<exprs>', '<bexpr>'
]
dt = tgf.prune_tree(qtree, prune_tokens)
display_tree(dt)

Out[223]:

Can we identify which part of the input was supplied by which part of the grammar? We define span() that can recover this information from the derivation tree. For a given node, let us assume that the start point is known. Then, for processing the children, we proceed as follows: We choose one child at a time from left to right, and compute the length of the child. The length of the children before the current child in addition to our starting point gives the starting point of the current child. The end point for each node is simply the end point of its last children (or the length of its node if it is a leaf).

In [224]:
from GrammarFuzzer import START_SYMBOL

In [225]:
def span(node, g, node_start=0):
hm = {}
k, cs = node
end_i = node_start
new_cs = []
for c in cs:
chm, (ck, child_start, child_end, gcs) = span(c, g, end_i)
new_cs.append((ck, child_start, child_end, gcs))
end_i = child_end
hm.update(chm)
node_end = end_i if cs else node_start + len(k)
if k in g and k != START_SYMBOL:
hm[k] = (node_start, node_end - node_start)
return hm, (k, node_start, node_end, new_cs)


We use it as follows:

In [226]:
span_hm, _n = span(dt, INVENTORY_GRAMMAR)

In [227]:
span_hm

Out[227]:
{'<exprs>': (7, 10),
'<table>': (23, 2),
'<bexpr>': (32, 15),
'<query>': (0, 47)}

We can check if we got the right values as follows.

In [228]:
print("query:", query)
for k in span_hm:
start, l = span_hm[k]
print(k, query[start:start + l])

query: select z,(k(w)+N) from W0 where T9(e)*i+b-E!=Yt
<exprs> z,(k(w)+N)
<table> W0
<bexpr> T9(e)*i+b-E!=Yt
<query> select z,(k(w)+N) from W0 where T9(e)*i+b-E!=Yt


Next, we need to obtain all the comparisons made in each predicate. For that, we define two helper functions. The first is unwrap_substrings() that translates multiple calls to z3.SubString and returns the start, and length of the given z3 string expression.

In [229]:
def unwrap_substrings(s):
assert s.decl().name() == 'str.substr'
cs, frm, l = s.children()
fl = frm.as_long()
ll = l.as_long()
if cs.decl().name() == 'str.substr':
newfrm, _l = unwrap_substrings(cs)
return (fl + newfrm, ll)
else:
return (fl, ll)


We define traverse_z3() that traverses a given z3 string expression, and collects all direct string comparisons to a substring of the original argument.

In [230]:
def traverse_z3(p, hm):
def z3_as_string(v):
s = v.as_string()
# Z3 bug: Python z3 API returns quoted strings for as_string
assert s[0] == '"' and s[-1] == '"'
return s[1:-1]

n = p.decl().name()
if n == 'not':
return traverse_z3(p.children()[0], hm)
elif n == '=':
i, j = p.children()
if isinstance(i, (int, z3.IntNumRef)):
return traverse_z3(j, hm)
elif isinstance(j, (int, z3.IntNumRef)):
return traverse_z3(i, hm)
else:
if i.is_string() and j.is_string():
if i.is_string_value():
cs, frm, l = j.children()
if (isinstance(frm, z3.IntNumRef)
and isinstance(l, z3.IntNumRef)):
hm[z3_as_string(i)] = unwrap_substrings(j)
elif j.is_string_value():
cs, frm, l = i.children()
if (isinstance(frm, z3.IntNumRef)
and isinstance(l, z3.IntNumRef)):
hm[z3_as_string(j)] = unwrap_substrings(i)
else:
assert False  # for now
elif n == '<' or n == '>':
i, j = p.children()
if isinstance(i, (int, z3.IntNumRef)):
return traverse_z3(j, hm)
elif isinstance(j, (int, z3.IntNumRef)):
return traverse_z3(i, hm)
else:
assert False
return p

In [231]:
comparisons = {}
for p in _.path:
traverse_z3(p, comparisons)
comparisons

Out[231]:
{'inventory': (23, 2)}

All that we need now is to declare string variables that match the substrings in comparisons, and solve for them for each item in the path. For that, we define find_alternatives().

In [232]:
def find_alternatives(spans, cmp):
alts = {}
for key in spans:
start, l = spans[key]
rset = set(range(start, start + l))
for ckey in cmp:
cstart, cl = cmp[ckey]
cset = set(range(cstart, cstart + cl))
# if rset.issubset(cset): <- ignoring subsets for now.
if rset == cset:
if key not in alts:
alts[key] = set()
return alts


We use it as follows.

In [233]:
alternatives = find_alternatives(span_hm, comparisons)
alternatives

Out[233]:
{'<table>': {'inventory'}}

So, we have our alternatives for each key in the grammar. We can now update our grammar as follows.

In [234]:
INVENTORY_GRAMMAR_NEW = dict(INVENTORY_GRAMMAR)

In [235]:
for k in alternatives:
INVENTORY_GRAMMAR_NEW[k] = INVENTORY_GRAMMAR_NEW[k] + list(alternatives[k])


We made a choice here. We could have completely overwritten the definition of <table> . Instead, we added our new alternatives to the existing definition. This way, our fuzzer will also attempt other values for <table> once in a while.

In [236]:
INVENTORY_GRAMMAR_NEW['<table>']

Out[236]:
['<word>', 'inventory']

Let us try fuzzing with our new grammar.

In [237]:
cgf = ConcolicGrammarFuzzer(INVENTORY_GRAMMAR_NEW)

In [238]:
for i in range(10):
qtree = cgf.fuzz_tree()
query = cgf.tree_to_string(qtree)
print(query)
with ExpectError():
try:
with ConcolicTracer() as _:
res = _[db_select](query)
print(repr(res))
except SQLException as e:
print(e)
print()

select (A*S*l(e)+h) from inventory where l/j<G(G)<P
Invalid WHERE ('(l/j<G(G)<P)')

update W5RhS set TT8=P7 where E!=b+e

delete from inventory where Z(F)/y*W/v<R/w/L(Y)/R
Invalid WHERE ('Z(F)/y*W/v<R/w/L(Y)/R')

delete from Y where -02==:

insert into ZQ (C3y,h4,o3) values ('D','.{Y')

select oy((r42),K==_==D!=(:),((h>N/W)),(k!=p)) from inventory
Invalid WHERE ('(oy((r42),K==_==D!=(:),((h>N/W)),(k!=p)))')

select X==(r),Y,l from inventory where J/x*f>M(h)/B
Invalid WHERE ('(J/x*f>M(h)/B)')

select _6(U,K5)>a-N-n+C*e+s-n from inventory
Invalid WHERE ('(_6(U,K5)>a-N-n+C*e+s-n)')

select n-V/X/X<e(s),fqw(f+v) from inventory
Invalid WHERE ('(n-V/X/X<e(s),fqw(f+v))')

select (d769R090x1Of),(b9S9G89) from inventory
Invalid WHERE ('((d769R090x1Of),(b9S9G89))')



That is, we were able to reach the dangerous method my_eval(). In effect, what we have done is to lift parts of predicates to the grammar. The new grammar can generate inputs that reach deeper into the program than before. Note that we have only handled the equality predicate. One can also lift the '<' and '>' comparison operators to the grammar if required.

Compare the output of our fuzzer to the original GrammarFuzzer below.

In [239]:
gf = GrammarFuzzer(INVENTORY_GRAMMAR)
for i in range(10):
query = gf.fuzz()
print(query)
with ExpectError():
try:
res = db_select(query)
print(repr(res))
except SQLException as e:
print(e)
print()

select (D),j from _4 where N/d/I*o>h

select r-u*Z/d*S/L-h,qN,T!=r from t

delete from _04e where -5!=(X)*p(x)+w*l(J,i,b)/Y/d

delete from c3 where --6!=o8M!=o==79156.443

insert into F (Lu9g:917s) values ('B')

delete from W where j/S*S+P/d/x>a(a)+M(V)

delete from NQ where D(u)-R+X(X,X)==x/w*s/R

delete from C3 where A*q+u/M(z,X)<v<8+X*I(A)==821.2

delete from Tg06M6 where t+V/S<s+.+S

select 5,G from z3 where W>Z==l(W)/g*D-T



As can be seen, the original grammar fuzzer is unable to proceed beyond the table verification.

#### All together¶

We implement these methods in ConcolicGrammarFuzzer. The method update_grammar() allows ConcolicGrammarFuzzer to collect feedback from concolic fuzzing, and update the grammar used for fuzzing accordingly.

In [240]:
class ConcolicGrammarFuzzer(ConcolicGrammarFuzzer):
def prune_tokens(self, tokens):
self.prune_tokens = tokens

def update_grammar(self, trace):
self.comparisons = {}
for p in trace.path:
traverse_z3(p, comparisons)
alternatives = find_alternatives(self.span_range, comparisons)
if self.log:
print('Alternatives:', alternatives, 'Span:', self.span_range)
new_grammar = dict(self.grammar)
for k in alternatives:
new_grammar[k] = list(set(new_grammar[k] + list(alternatives[k])))
self.grammar = new_grammar


The fuzz() method simply generates the derivation tree, computes the span range, and returns the string generated from the derivation tree.

In [241]:
class ConcolicGrammarFuzzer(ConcolicGrammarFuzzer):
def fuzz(self):
qtree = self.fuzz_tree()
self.pruned_tree = self.prune_tree(qtree, self.prune_tokens)
query = self.tree_to_string(qtree)
self.span_range, _n = span(self.pruned_tree, self.grammar)
return query


To ensure that our approach works, let us update our tables slightly.

In [242]:
inventory = db.db.pop('inventory', None)

In [243]:
db.db['vehicles'] = inventory
db.db['months'] = ({
'month': int,
'name': str
}, [{
'month': i + 1,
'name': m
} for i, m in enumerate([
'jan', 'feb', 'mar', 'apr', 'may', 'jun', 'jul', 'aug', 'sep', 'oct',
'nov', 'dec'
])])
db.db

Out[243]:
{'vehicles': ({'year': int, 'kind': str, 'company': str, 'model': str},
[{'year': 1997, 'kind': 'van', 'company': 'Ford', 'model': 'E350'},
{'year': 2000, 'kind': 'car', 'company': 'Mercury', 'model': 'Cougar'},
{'year': 1999, 'kind': 'car', 'company': 'Chevy', 'model': 'Venture'}]),
'months': ({'month': int, 'name': str},
[{'month': 1, 'name': 'jan'},
{'month': 2, 'name': 'feb'},
{'month': 3, 'name': 'mar'},
{'month': 4, 'name': 'apr'},
{'month': 5, 'name': 'may'},
{'month': 6, 'name': 'jun'},
{'month': 7, 'name': 'jul'},
{'month': 8, 'name': 'aug'},
{'month': 9, 'name': 'sep'},
{'month': 10, 'name': 'oct'},
{'month': 11, 'name': 'nov'},
{'month': 12, 'name': 'dec'}])}

The ConcolicGrammarFuzzer is used as follows.

In [244]:
cgf = ConcolicGrammarFuzzer(INVENTORY_GRAMMAR)
cgf.prune_tokens(prune_tokens)
for i in range(10):
query = cgf.fuzz()
print(query)
with ConcolicTracer() as _:
with ExpectError():
try:
res = _[db_select](query)
print(repr(res))
except SQLException as e:
print(e)
cgf.update_grammar(_)
print()

delete from b where m/C!=L-K*Q<H+c(O)+-0*H/G

update O8F set E=g,p=Z where y>p

update months set wz=w5 where _*E/O-K(U)<Z+v/G/r/c

delete from months where r-z-Q-d*d-h-P!=r1R
Invalid WHERE ('r-z-Q-d*d-h-P!=r1R')

insert into muc (n23f2W6sOP) values ('y')

insert into months (.9) values ('T|gi','O',1.86,-501.9,0.1,48199462.8)

select w<O/X-S from months where Y/b>Z*K*Z*B
Invalid WHERE ('(Y/b>Z*K*Z*B)')

insert into vehicles (name) values (5.0,25,4.0,'/')

update months set kind=kN,company=a,kind=O where (S<S(Q))==I+t<z

update vehicles set month=kU7,company=k,company=M,name=X where (((b+L)==0.4))==4



As can be seen, the fuzzer starts with no knowledge of the tables vehicles, months and years, but identifies it from the concolic execution, and lifts it to the grammar. This allows us to improve the effectiveness of fuzzing.

## Limitations¶

As with dynamic taint analysis, implicit control flow can obscure the predicates encountered during concolic execution. However, this limitation could be overcome to some extent by wrapping any constants in the source with their respective proxy objects. Similarly, calls to internal C functions can cause the symbolic information to be discarded, and only partial information may be obtained.

## Synopsis¶

This chapter defines two main classes: SimpleConcolicFuzzer and ConcolicGrammarFuzzer. The SimpleConcolicFuzzer first uses a sample input to collect predicates encountered. The fuzzer then negates random predicates to generate new input constraints. These, when solved, produce inputs that explore paths that are close to the original path. It can be used as follows.

The cgi_decode() function from the chapter on coverage is the subject program.

In [245]:
from Coverage import cgi_decode


We first obtain the constraints using ConcolicTracer.

In [246]:
with ConcolicTracer() as _:
_[cgi_decode]('abcd')


These constraints are added to the concolic fuzzer as follows:

In [247]:
scf = SimpleConcolicFuzzer()


The concolic fuzzer then uses the constraints added to guide its fuzzing as follows:

In [248]:
for i in range(10):
v = scf.fuzz()
if v is None:
break
print(v)

%\x00
\x00\x00+\x00
\x00\x00+\x04
\x00\x00+\x02
\x00\x00\x00+\x00
\x00+\x00
\x00\[email protected]
\x00+
\x00%\x00
\x00+\x08


The SimpleConcolicFuzzer simply explores all paths near the original path traversed by the sample input. It does not use any kind of feedback from fuzzing. The ConcolicGrammarFuzzer on the other hand, can collect feedback from the subject under fuzzing, and lift some of the constraints encountered to the grammar, enabling deeper fuzzing. It is used as follows:

In [249]:
from InformationFlow import INVENTORY_GRAMMAR, SQLException

In [250]:
cgf = ConcolicGrammarFuzzer(INVENTORY_GRAMMAR)
cgf.prune_tokens(prune_tokens)
for i in range(10):
query = cgf.fuzz()
print(query)
with ConcolicTracer() as _:
with ExpectError():
try:
res = _[db_select](query)
print(repr(res))
except SQLException as e:
print(e)
cgf.update_grammar(_)
print()

update NK5f set L=D,o=G,q=X where A!=S

update vehicles set V9=B,y=r,E=E where r(U)/x(i)>b

insert into V6S (V,R2,O) values ('2/','@')

update months set q=J1,Y=n,r=X where o-S*B!=(u!=q)

select e-e/B>Z(i)-(_) from vehicles where 59.2!=S==C
Invalid WHERE ('(59.2!=S==C)')

delete from months where (M)*I<p+e(S)-R*O(_)*p(8,u)
Invalid WHERE ('(M)*I<p+e(S)-R*O(_)*p(8,u)')

insert into months (CRl,i,AW5,_j) values ('6')

select a(A),J from K8z where p-f-i<H

delete from vehicles where h(2,a)>((Sg90))+M5((az1(p)*r/W/V+b9(H)))
Invalid WHERE ('h(2,a)>((Sg90))+M5((az1(p)*r/W/V+b9(H)))')

select 9.7 from months
[9.7, 9.7, 9.7, 9.7, 9.7, 9.7, 9.7, 9.7, 9.7, 9.7, 9.7, 9.7]



## Lessons Learned¶

• Concolic execution can often provide more information than taint analysis with respect to the program behavior. However, this comes at a much larger runtime cost. Hence, unlike taint analysis, real-time analysis is often not possible.

• Similar to taint analysis, concolic execution also suffers from limitations such as indirect control flow and internal function calls.

• Predicates from concolic execution can be used in conjunction with fuzzing to provide an even more robust indication of incorrect behavior than taints, and can be used to create grammars that are better at producing valid inputs.

## Next Steps¶

A costlier but stronger alternative to concolic fuzzing is symbolic fuzzing. Similarly, search based fuzzing can often provide a cheaper exploration strategy than relying on SMT solvers to provide inputs slightly different from the current path.

## Background¶

The technique of concolic execution was originally used to inform and expand the scope of symbolic execution \cite{king1976symbolic}, a static analysis technique for program analysis. Laron et al. cite{Larson2003} was the first to use the concolic execution technique.

The idea of using proxy objects for collecting constraints was pioneered by Cadar et al. \cite{cadar2005execution}. The concolic execution technique for Python programs used in this chapter was pioneered by PeerCheck \cite{PeerCheck}, and Python Error Finder \cite{Barsotti2018}.

## Exercises¶

### Exercise 1: Implment a Concolic Float Proxy Class¶

While implementing the zint binary operators, we asserted that the results were int. However, that need not be the case. For example, division can result in float. Hence, we need proxy objects for float. Can you implement a similar proxy object for float and fix the zint binary operator definition?

Solution. The solution is as follows.

As in the case of zint, we first open up zfloat for extension.

In [251]:
class zfloat(float):
def __new__(cls, context, zn, v, *args, **kw):
return float.__new__(cls, v, *args, **kw)


We then implement the initialization methods.

In [252]:
class zfloat(zfloat):
@classmethod
def create(cls, context, zn, v=None):
return zproxy_create(cls, 'Real', z3.Real, context, zn, v)

def __init__(self, context, z, v=None):
self.z, self.v = z, v
self.context = context


The helper for when one of the arguments in a binary operation is not float.

In [253]:
class zfloat(zfloat):
def _zv(self, o):
return (o.z, o.v) if isinstance(o, zfloat) else (z3.RealVal(o), o)


Coerce float into bool value for use in conditionals.

In [254]:
class zfloat(zfloat):
def __bool__(self):
# force registering boolean condition
if self != 0.0:
return True
return False


Define the common proxy method for comparison methods

In [255]:
def make_float_bool_wrapper(fname, fun, zfun):
def proxy(self, other):
z, v = self._zv(other)
z_ = zfun(self.z, z)
v_ = fun(self.v, v)
return zbool(self.context, z_, v_)

return proxy


We apply the comparison methods on the defined zfloat class.

In [256]:
FLOAT_BOOL_OPS = [
'__eq__',
# '__req__',
'__ne__',
# '__rne__',
'__gt__',
'__lt__',
'__le__',
'__ge__',
]

In [257]:
for fname in FLOAT_BOOL_OPS:
fun = getattr(float, fname)
zfun = getattr(z3.ArithRef, fname)
setattr(zfloat, fname, make_float_bool_wrapper(fname, fun, zfun))


Similarly, we define the common proxy method for binary operators.

In [258]:
def make_float_binary_wrapper(fname, fun, zfun):
def proxy(self, other):
z, v = self._zv(other)
z_ = zfun(self.z, z)
v_ = fun(self.v, v)
return zfloat(self.context, z_, v_)

return proxy


And apply them on zfloat

In [259]:
FLOAT_BINARY_OPS = [
'__sub__',
'__mul__',
'__truediv__',
# '__div__',
'__mod__',
# '__divmod__',
'__pow__',
# '__lshift__',
# '__rshift__',
# '__and__',
# '__xor__',
# '__or__',
'__rsub__',
'__rmul__',
'__rtruediv__',
# '__rdiv__',
'__rmod__',
# '__rdivmod__',
'__rpow__',
# '__rlshift__',
# '__rrshift__',
# '__rand__',
# '__rxor__',
# '__ror__',
]

In [260]:
for fname in FLOAT_BINARY_OPS:
fun = getattr(float, fname)
zfun = getattr(z3.ArithRef, fname)
setattr(zfloat, fname, make_float_binary_wrapper(fname, fun, zfun))


These are used as follows.

In [261]:
with ConcolicTracer() as _:
za = zfloat.create(_.context, 'float_a', 1.0)
zb = zfloat.create(_.context, 'float_b', 0.0)
if za * zb:
print(1)

In [262]:
_.context

Out[262]:
({'float_a': 'Real', 'float_b': 'Real'}, [Not(float_a*float_b != 0)])

Finally, we fix the zint binary wrapper to correctly create zfloat when needed.

In [263]:
def make_int_binary_wrapper(fname, fun, zfun):
def proxy(self, other):
z, v = self._zv(other)
z_ = zfun(self.z, z)
v_ = fun(self.v, v)
if isinstance(v_, float):
return zfloat(self.context, z_, v_)
elif isinstance(v_, int):
return zint(self.context, z_, v_)
else:
assert False

return proxy

In [264]:
for fname in INT_BINARY_OPS:
fun = getattr(int, fname)
zfun = getattr(z3.ArithRef, fname)
setattr(zint, fname, make_int_binary_wrapper(fname, fun, zfun))


Checking whether it worked as expected.

In [265]:
with ConcolicTracer() as _:
v = _[binomial](4, 2)

In [266]:
_.zeval()

Out[266]:
('sat', {'n': '4', 'k': '2'})

### Exercise 2: Bit Manipulation¶

Similar to floats, implementing the bit manipulation functions such as xor involves converting int to its bit vector equivalents, performing operations on them, and converting it back to the original type. Can you implement the bit manipulation operations for zint?

Solution. The solution is as follows.

We first define the proxy method as before.

In [267]:
def make_int_bit_wrapper(fname, fun, zfun):
def proxy(self, other):
z, v = self._zv(other)
z_ = z3.BV2Int(
zfun(
z3.Int2BV(
self.z, num_bits=64), z3.Int2BV(
z, num_bits=64)))
v_ = fun(self.v, v)
return zint(self.context, z_, v_)

return proxy


It is then applied to the zint class.

In [268]:
BIT_OPS = [
'__lshift__',
'__rshift__',
'__and__',
'__xor__',
'__or__',
'__rlshift__',
'__rrshift__',
'__rand__',
'__rxor__',
'__ror__',
]

In [269]:
for fname in BIT_OPS:
fun = getattr(int, fname)
zfun = getattr(z3.BitVecRef, fname)
setattr(zint, fname, make_int_bit_wrapper(fname, fun, zfun))


Invert is the only unary bit manipulation method.

In [270]:
class zint(zint):
def __invert__(self):
return zint(self.context, z3.BV2Int(
~z3.Int2BV(self.z, num_bits=64)), ~self.v)


The my_fn() computes xor and returns True if the xor results in a non zero value.

In [271]:
def my_fn(a, b):
o_ = (a | b)
a_ = (a & b)
if o_ & ~a_:
return True
else:
return False


Using that under ConcolicTracer

In [272]:
with ConcolicTracer() as _:
print(_[my_fn](2, 1))

True


We log the computed SMT expression to verify that everything went well.

In [273]:
_.zeval(log=True)

(declare-const my_fn_a_int_96 Int)
(declare-const my_fn_b_int_97 Int)
(assert (let ((a!1 (bvnot (bvor (bvnot ((_ int2bv 64) my_fn_a_int_96))
(bvnot ((_ int2bv 64) my_fn_b_int_97))))))
(let ((a!2 (bvor (bvnot (bvor ((_ int2bv 64) my_fn_a_int_96)
((_ int2bv 64) my_fn_b_int_97)))
a!1)))
(not (= 0 (bv2int (bvnot a!2)))))))
(check-sat)
(get-model)
sat
(model
(define-fun my_fn_a_int_96 () Int
(- 18446744073709551617))
(define-fun my_fn_b_int_97 () Int
(- 2))
)

Out[273]:
('sat', {'a': ['-', '18446744073709551617'], 'b': ['-', '2']})

We can confirm from the formulas generated that the bit manipulation functions worked correctly.

### Exercise 3: String Translation Functions¶

We have seen how to define upper() and lower(). Can you define the capitalize(), title(), and swapcase() methods?

Solution. Solution not yet available.