Boolean Satisifiability Based ASIC Resistant Proof of Work

Matthew Wampler-Doty

Abstract: A “Proof of Work” (abbreviated PoW) system for a cryptocurrency based on providing logical formulae and variable assignments is proposed. Miners are required to both mine the logical formulae as well as provide variable assignments, as per the Stephen Cook’s celebrated Satisfaction Problem (abbreviated SAT) from his paper "The Complexity of Theorem Proving Procedures" (1970). To ensure the problem is non-trivial, formulae must conform to a large number of constraints, among them are: (1) they must have a fixed number of literals in each clause, (2) the graph of clauses must be connected, and a spanning tree traversal must be provided to prove this, (3) a minimal number of variables is necessary and finally (4) there must be exactly $N$ clauses. A naïve miner is proposed, and it is shown that it follows the same statistical behavior as the conventional BitCoin mining system. Given the nature of the problems this system is based on, we anticipate occasional algorithmic breakthroughs will result in speedups which will be difficult to introduce into ASICs, given their long development cycle. Verification is linear in the number of clauses in the mined solution.

Introduction

One of the problems facing Bitcoin is the rise of ASIC hardware specially designed for mining. These systems are designed to solve problems of the the form:

$$sha256(sha256(block\_header + nonce)) < target$$

Where $nonce$ is a free variable which may be incremented from some minimal value. Solutions to such problems constitute Proof of Work that is used as the basis for BitCoin. Since $sha256$ is straightforward to implement, ASICs can readily compute it in very few clock cycles, as compared to conventional CPU based architectures. As of November 2014, it is no longer feasible to mine BitCoin using a conventional CPU based architecture.

Alternatives systems are currently being investigated in the crypto-currency community to avoid this issue. One proposal is Hashimoto by Thaddeus Dryja (2014), who suggests using a two stage hash function. Another system, due to Vitalik Buterin, uses randomly generated logic circuits.

An independent thread of research involves mining protocols is based on traditional, Nondeterministic Polynomial (NP) problems from the mathematics literature. One such system is PrimeCoin, which is based on finding numbers that pass the popular test for primality using Fermat's Little Theorem. This was done to allow for reasonable control of inflation. As such numbers used in its blockchain are only probabilistic primes rather than true prmes, and some are not primes at all but rather composites known as Carmichael numbers (see A002997 in the Online Encyclopedia of Integer Sequences). A similar cryptocurrency, RieCoin, is also based finding prime numbers, but it uses the Miller-Rabin test to ensure true primality.

Traditional NP problems from computer science theory present an intriguing possibility for avoiding ASIC block chain mining. For algorithmically rich problems, speedups can come from researcher insight, rather than simple hardware improvements. Since ASICs have a long development cycle, this allows for savvy researchers to compete with ASICs designers and allow for CPUs to remain a competitive means of mining. In Boolean Satisfaction (SAT) in particular, researchers routinely report increases in performance in orders of magnitude (see, for instance, Duraira (2004), Aloul (2006), Velev (2009), etc.). It is noteworthy that all commercial SAT solvers are CPU based.

To this end, we propose SATCoin, for use as a Proof of Work specification based on SAT problems intended to be difficult to implement in an ASIC. We first go over how proof of work is verified, and then go on to present an algorithm suitable for mining.

Verification

In this section, we present the verification scheme for SATCoin, which exhibits linear time complexity.

Satisfaction

A satisfaction problem is the problem of finding an assignment of logical variables to either True or False such that a logical formula witha particular form is true. Formulae for SAT problems are presented in Conjunctive Normal Form (CNF), which are comprised of conjuncts of disjuncts of literals. A literal is a logical formula or its negation. Here is an example of a CNF formula:

$(A\vee \neg B \vee \neg D) \wedge \\ (\neg B \vee C \vee \neg D \vee E) \wedge \\ (C \vee \neg C \vee \neg A)$

SAT problems are solved by some assignment $\alpha$ of variables to either $True$ or $False$ that make each disjunct of the conjunct true. For instance, as long as $B$ is assigned to $False$, $E$ is assigned to $True$, and $C$ is assigned to $False$, all of the disjuncts of the above formula are made true.

In python, a literal can be modeled as an (immutable) object with two attributes: sign and a variable. A sign is either True or False, while variable is the name of the underlying variable.

A CNF formula can then be modeled as a collection of collections of literals. Here is how we'd express the example above:

In [1]:
example_sat_problem = \
  [[{"sign": True, "variable": "A"}, 
    {"sign": False, "variable": "B"}, 
    {"sign": False, "variable": "D"}],
   [{"sign": False, "variable": "B"}, 
    {"sign": True, "variable": "C"}, 
    {"sign": False, "variable": "D"},
    {"sign": True, "variable": "E"}],
   [{"sign": True, "variable": "C"}, 
    {"sign": False, "variable": "C"}, 
    {"sign": True, "variable": "A"}]]

An assignment can be modeled as a dictionary, with the keys being the variable names and the values being True or False.

In [2]:
example_assignment = {"B": False, "E": True, "C": False}

Checking if a SAT problem is satisfied by a variable assignment is straight forward. We make the assumption that if a variable's truth value is not specified then it is False.

In [3]:
def satisfied(SAT_problem, variable_assignment):
    from collections import defaultdict
    variable_assignment = defaultdict(lambda: False, **variable_assignment)
    return any(variable_assignment[literal["variable"]] == literal["sign"]
               for clause in SAT_problem
               for literal in clause)

We can verify satisfied is implemented correctly by checking that our example evaluates to True:

In [4]:
satisfied(example_sat_problem, example_assignment)
Out[4]:
True

SAT Problem Generation

We next a present how SAT problems are generated. We opt to allow SAT problems to be generated non-deterministically, allowing for the miner to construct the problem themselves.

To begin, define a linear congruential random number generator based on the secp256k1 elliptic curve that is used in BitCoin.

In [5]:
def next_random(n):
    return (abs(n) * 3**160 + 7**80) % (2**256 - 4294968273)

Next, we define the initial conditions for blockchain generation. A header is provided by the cryptocurrency network, which is a sha256 hash of the current blockchain. At the beginning of generation SAT problem generation, the miner selects a set of generators, which are all positive integers.

At each stage of SAT problem generation, the generators initially chosen are used to compute $n$ literals, using the latest random number $n$ supplied by next. For the $i$th generator $g_i$, the variable for its literal is $n \text{ mod } g_i$ while the sign is True if the $i$th lowest bit of $n$ is 1, and False if the $i$th lowest bit is 0. This is implemented as follows:

In [6]:
def generate_literals(n, generators):
    return [{"variable": n % g, "sign": ((n >> idx) % 2) == 1, "generator": g} 
            for idx,g in enumerate(generators)]

The miner then picks some number of literals from the generated literals to form a clause. The chosen generators are added to the current random value, and next is called on the result.

To give an example, suppose that the current random value is 3681122926843, and the user chose [9,13,14,21,33,35] as their initial generators. Then they could choose from among the following possibilities of literals to generate a clause:

In [7]:
generate_literals(3681122926843, [9,13,14,21,33,35])
Out[7]:
[{'generator': 9, 'sign': True, 'variable': 1},
 {'generator': 13, 'sign': True, 'variable': 11},
 {'generator': 14, 'sign': False, 'variable': 5},
 {'generator': 21, 'sign': True, 'variable': 19},
 {'generator': 33, 'sign': True, 'variable': 10},
 {'generator': 35, 'sign': True, 'variable': 33}]

Suppose the miner wanted to make a clause with 3 literals. Given the 6 possible generators, they would have ${6 \choose 3} = 20$ possibilities to choose from. Let's suppose they chose the literals generated by [9,21,33], then the next random value would be:

In [8]:
next_random(3681122926843+9+21+33)
Out[8]:
72951700441152273488650601127460710619839498214931026171753573659363821185246L

Putting everything together, we can verify if a miner correctly generated a SAT problem with the following check:

In [9]:
def sat_problem_generated(header, generators, sat_problem):
    n = header
    for clause in sat_problem:
        possible_literals = generate_literals(n, generators)
        if not all(literal in possible_literals for literal in clause):
            return False
        n = next_random(n + sum(literal["generator"] for literal in clause))
    return True

Here is an example with 8 generators:

In [10]:
header = 5751887518028651109

example_generators = [5,7,8,9,11,13,17,19]

example_generated_sat_problem = \
 [({'generator': 5, 'sign': True, 'variable': 4},
   {'generator': 7, 'sign': False, 'variable': 5},
   {'generator': 9, 'sign': False, 'variable': 6}),
  ({'generator': 5, 'sign': False, 'variable': 3L},
   {'generator': 7, 'sign': True, 'variable': 4L},
   {'generator': 8, 'sign': False, 'variable': 2L}),
  ({'generator': 5, 'sign': False, 'variable': 0L},
   {'generator': 7, 'sign': True, 'variable': 6L},
   {'generator': 8, 'sign': True, 'variable': 6L})]

sat_problem_generated(5751887518028651109, example_generators, example_generated_sat_problem)
Out[10]:
True

Constraints

So far, we have presented SAT problems, and shown how a miner can non-deterministically construct one. Right now the problem has no constraints, however. In order to make the problem non-trivial, we introduce a series of structural constraints.

Constraints on Clauses

Fixed Number of Clauses

One simple criterion is to have a fixed number of clauses. More clauses put more constraints on the possible SAT solutions.

In [11]:
def number_of_clauses(n, clauses):
    return n == len(clauses)

Clause Width

The first demand we make is that there be a specified number of unique variables in a clause; this is the clause's width.

Since it will come up somewhat often, here is a simple function for extracting the set of variables from a clause:

In [12]:
def clause_variables(clause):
    "Returns a frozenset of variables in a clause"
    return frozenset(literal["variable"] for literal in clause)

We can easily check that there are $n$ variables in a clause by checking the length of the result of clause_variables:

In [13]:
def clause_width(n, clause):
    return n == len(clause_variables(clause))

def all_clauses_width(n, clauses):
    return all(clause_width(n, clause) for clause in clauses)

Unique Clauses

SAT problems with repitious clauses are easier to solve, so we will want to constrain problems so that no two have the same variables.

In [14]:
def no_repeated(clauses):
    visited = set([])
    for clause in clauses:
        variables = clause_variables(clause)
        if variables in visited:
            return False
        visited.add(variables)
    return True

Here's an example of a set of clauses where everything is unique:

In [15]:
example_unique = \
 [({'sign': True, 'variable': 4L},
   {'sign': False, 'variable': 6L},
   {'sign': True, 'variable': 5L}),
  ({'sign': True, 'variable': 3L},
   {'sign': False, 'variable': 3L},
   {'sign': True, 'variable': 5L}),
  ({'sign': True, 'variable': 3L},
   {'sign': True, 'variable': 2L},
   {'sign': False, 'variable': 3L}),
  ({'sign': False, 'variable': 4L},
   {'sign': False, 'variable': 6L},
   {'sign': True, 'variable': 4L})]

no_repeated(example_unique)
Out[15]:
True

Here's an example with repeated clauses that we do not want to permit:

In [16]:
example_not_unique = \
 [({'sign': True, 'variable': 0L},
   {'sign': False, 'variable': 0L},
   {'sign': False, 'variable': 7L}),
  ({'sign': False, 'variable': 0L},
   {'sign': False, 'variable': 1L},
   {'sign': False, 'variable': 0L}),
  ({'sign': True, 'variable': 3L},
   {'sign': True, 'variable': 2L},
   {'sign': True, 'variable': 7L}),
  ({'sign': True, 'variable': 4L},
   {'sign': False, 'variable': 3L},
   {'sign': True, 'variable': 5L}),
  ({'sign': True, 'variable': 4L},
   {'sign': True, 'variable': 1L},
   {'sign': False, 'variable': 3L}),
  ({'sign': True, 'variable': 0L},
   {'sign': True, 'variable': 0L},
   {'sign': True, 'variable': 7L})]

no_repeated(example_not_unique)
Out[16]:
False

Connectedness

Another property we wish to enforce is that the graph of clauses must be connected. This enforces a certain level of interaction among all of the clauses; setting one literal can potentially force the value of any other literal.

To ensure that the graph of clauses is connected, we make use of the following observation:

Fact. A graph is connected if and only if there is a traversal of a spanning tree that visits every node.

Checking that a given traversal of a spanning tree visits every node can be carried out in linear time. To start, we define a binary predicate which asserts whether two clauses are connected:

In [17]:
def connected(c1, c2):
    ls1 = clause_variables(c1)
    ls2 = clause_variables(c2)
    return any(l in ls2 for l in ls1)

A spanning tree traversal is represented as a list of data structures, each with two fields:

  • An index of a clause in the SAT problem (the 'child clause index')
  • An index of a parent clause in the SAT problem (the 'parent clause index')

In the case of the root, there is no parent clause, but all subsequent entries must have a parent clause. Morever, the parent clause must always have been a clause that's been previously listed, and the parent and child must be connected according to the connected predicate.

Given that the spanning tree traversal never repeats and has as many entries as clauses, we know that the graph of clauses is indeed connected.

In [18]:
def clauses_connected_by_traversal(clauses, traversal):
    if len(traversal) != len(clauses) or \
       not (0 <= traversal[0]['child clause index'] < len(clauses)):
        return False
    visited = set([traversal[0]['child clause index']])
    for i in range(1,len(traversal)):
        child_clause_index = traversal[i]['child clause index']
        parent_clause_index = traversal[i]['parent clause index']
        if child_clause_index in visited or \
           not (parent_clause_index in visited) or \
           not (0 <= child_clause_index < len(clauses)) or \
           not connected(clauses[child_clause_index],
                         clauses[parent_clause_index]):
               return False
        visited.add(child_clause_index)
    return True

It's of course helpful to see an example:

In [19]:
connected_example_sat_problem = \
 [({'sign': True, 'variable': 4L},
   {'sign': False, 'variable': 4L},
   {'sign': True, 'variable': 5L}),
  ({'sign': True, 'variable': 3L},
   {'sign': False, 'variable': 4L},
   {'sign': False, 'variable': 1L}),
  ({'sign': True, 'variable': 2L},
   {'sign': True, 'variable': 6L},
   {'sign': False, 'variable': 3L}),
  ({'sign': True, 'variable': 3L},
   {'sign': False, 'variable': 0L},
   {'sign': True, 'variable': 5L}),
  ({'sign': True, 'variable': 4L},
   {'sign': False, 'variable': 0L},
   {'sign': False, 'variable': 1L}),
  ({'sign': False, 'variable': 2L},
   {'sign': False, 'variable': 1L},
   {'sign': True, 'variable': 4L})]
    
connected_example_spanning_tree_traversal = \
 [{'child clause index': 0},
  {'child clause index': 1, 'parent clause index': 0},
  {'child clause index': 2, 'parent clause index': 1},
  {'child clause index': 3, 'parent clause index': 2},
  {'child clause index': 4, 'parent clause index': 1},
  {'child clause index': 5, 'parent clause index': 2}]

clauses_connected_by_traversal(connected_example_sat_problem, connected_example_spanning_tree_traversal)
Out[19]:
True

The resulting spanning tree can of course be easily recovered from the traversal. Here's the spanning tree used in the above example:

In [20]:
%load_ext hierarchymagic
In [21]:
%%dot -f svg
graph G {
    0--1; 1--2; 2--3; 1--4; 2--5;
}
G 0 0 1 1 0--1 2 2 1--2 4 4 1--4 3 3 2--3 5 5 2--5

Constraints on Generators

Fixed Number of Generators

Similar to clauses, we also demand that there be a fixed number of generators:

In [22]:
def number_of_generators(n, generators):
    return n == len(generators)

Minimal Value For Generators

Some criteria for clauses are easy to satisfy with a choice of small generators. So we constrain the problem by presenting a minimal value for generator.

In [23]:
def min_generator(m, generators):
    return m <= min(*generators)

BitCoin Style SHA256 Checking

So far, we've presented a number of criteria for enforcing that the SAT problems generated by miners be non-trivial. Ideally, we'd like to be able to set a hard statistical probability that a proposed solution will pass.

We adopt the approach taken by BitCoin, but with a modification. We introduce the criterion that:

$sha256(block\_header,variable\_assignment) < target$

Like BitCoin, this effectively means that with probability $p$ a solution will be rejected, where $p = \frac{target}{2^{256}}$. Unlike BitCoin, where solutions can be found via linear search (simply incrementing a nonce value), the miner must instead enumerate possible satisifying variable assignments for a SAT problem they have constructed. Enumerating every satisfying variable assignment for a given SAT problem is #P-complete, which is NP-Hard. Of course, a miner could also try to compose other SAT problems with varying solutions, which is in NP.

Below we give an implementation of this criterion; we make use of the JSON representation of the variable assignment, although there are many possible representations.

In [24]:
def sha256_less_than_target(block_header, variable_assignment, target):
    import hashlib
    import json
    s = hashlib.sha256()
    s.update(str(block_header))
    s.update(json.dumps(variable_assignment, sort_keys=True))
    return int('0x' + s.hexdigest(), 16) < target

Here is an example; as we can see, variable_assignment and block_header will pass if we are rejecting approximately $66.\overline{6}\%$ of solutions.

In [25]:
example_block_header = 28850183455121774801176625632015427996404687904360779760892949139292104258030L
example_variable_assignment = {
  1L: True,
  3L: True,
  6L: False,
  8L: False,
  9L: True,
  15L: True,
  17L: True,
  18L: False,
  19L: True,
  23L: True,
  28L: False,
  30L: False,
  35L: True,
  36L: False,
  37L: True }

sha256_less_than_target(example_block_header, example_variable_assignment, 2**256 / 3)
Out[25]:
True

If we instead demand that approximately $75\%$ of solutions be rejected, this example does not pass.

In [26]:
sha256_less_than_target(example_block_header, example_variable_assignment, 2**256 / 4)
Out[26]:
False

Full Solution Validation

We now present all of our criteria together as a comprehensive proof of work validation scheme.

In [27]:
def proof_of_work_validate(solution, parameters):
    return (satisfied(solution["clauses"], solution["variable assignment"]) and
            sat_problem_generated(parameters["block header"], solution["generators"], solution["clauses"]) and
            number_of_clauses(parameters["number of clauses"], solution["clauses"]) and
            all_clauses_width(parameters["clause width"], solution["clauses"]) and
            no_repeated(solution["clauses"]) and
            clauses_connected_by_traversal(solution["clauses"], solution["spanning tree traversal"]) and
            number_of_generators(parameters["number of generators"], solution["generators"]) and
            min_generator(parameters["minimal generator"], solution["generators"]) and
            sha256_less_than_target(parameters["block header"], solution["variable assignment"], parameters["target"]))

Mining

In this section, we present a mining algorithm. Since the acceptance criteria are open ended, there are many possible mining algorithms.

Mining, like ordinary SAT solving, can be carried out via heuristic backtrack search. Below is a simple depth first search algorithm making use of python generators. By construction, this algorithm runs in $O(EXP(n))$, where $n$ is the number of clauses.

In [28]:
def mine(params):
    i = 0
    while True:
        start = params["minimal generator"] + i
        generators = range(start, start + parameters["number of generators"])
        for state in initial_states(generators, params):
            for solution in search(state,params):
                if sha256_less_than_target(params["block header"], solution["variable assignment"], params["target"]):
                    yield solution
                else:
                    break
        i += 1
    
def search(state, params):
    if (len(state["clauses"]) == params["number of clauses"]):
        yield {"clauses": state["clauses"], 
               "spanning tree traversal": state["spanning tree traversal"],
               "variable assignment": state["variable assignment"],
               "generators": state["generators"]}
    else:
        for new_state in possible_new_states(state, params):
            for solution in search(new_state, params):
                yield solution

Our solution makes use of a search state data structure; for simplicity we model it as immutable and copy-on-write. At each stage of the search, a pool of possible state transitions is lazily generated, and each one is searched recursively.

In [29]:
def possible_new_states(state, params):
    from itertools import combinations
    for clause in combinations(generate_literals(state["current random"], 
                                                 state["generators"]), 
                               params["clause width"]):
        parent_clause_index = None
        for literal in clause:
            if literal["variable"] in state["variable spanning tree indexes"]:
                parent_clause_index = state["variable spanning tree indexes"][literal["variable"]]
                break
        if (parent_clause_index == None or
            not clause_width(params["clause width"], clause) or
            clause_variables(clause) in state["visited clause variables"]):
            continue
        for pivot in clause:
            if not (pivot["variable"] in state["variable assignment"] and
                    state["variable assignment"][pivot["variable"]] != pivot["sign"]):
                yield update_state(state, clause, parent_clause_index, pivot)

Below, we give how initial states are generated. Book keeping variables include: which clauses have been visited, the state of the constructed spanning tree, and the state of the variable assignment under construction.

In [30]:
def initial_states(generators, params):
    from itertools import combinations
    for clause in combinations(generate_literals(params["block header"], generators),
                               params["clause width"]):
        if not clause_width(params["clause width"], clause):
            continue
        for pivot in clause:
            yield {"current random": next_random(params["block header"] + 
                                                 sum(literal["generator"] 
                                                     for literal in clause)),
                   "clauses": [clause],
                   "variable assignment": {pivot["variable"]: pivot["sign"]},
                   "spanning tree traversal": [{ "child clause index": 0 }],
                   "visited clause variables": set([clause_variables(clause)]),
                   "variable spanning tree indexes": { literal["variable"]: 0 
                                                       for literal in clause },
                   "generators": generators }

Finally, we give how states are updated when new information arrives.

In [31]:
def update_state(state, clause, parent_clause_index, pivot):
    from copy import deepcopy
    index = len(state["clauses"])
    new_state = deepcopy(state)
    new_state["clauses"].append(clause)
    new_state["visited clause variables"].add(clause_variables(clause))
    new_state["variable assignment"][pivot["variable"]] = pivot["sign"]
    for l in clause:
        new_state["variable spanning tree indexes"][l["variable"]] = index
    new_state["spanning tree traversal"].append({"parent clause index": parent_clause_index,
                                                 "child clause index": index})
    new_state["current random"] = \
        next_random(state["current random"] + 
                    sum(literal["generator"] for literal in clause))
    return new_state

Putting it together, we can verify that indeed our mining algorithm satisfies our proof of work validation scheme

In [32]:
block_header = 59807446017497444054206621583131621044750062436043876690313232144928752617878L
parameters = {
    "block header": 59807446017497444054206621583131621044750062436043876690313232144928752617878L,
    "number of clauses": 50,
    "clause width": 3,
    "target": 2**256 / 2,
    "minimal generator": 40,
    "number of generators": 8
}
solution = mine(parameters).next()
proof_of_work_validate(solution, parameters)
Out[32]:
True

Performance

In this section, we look at the performance of the mining algorithm under different difficulties. A model for expected performance given a set difficulty is provided. We first show empirically that BitCoin mining conforms to this model, and then go on to show that SATcoin mining also conforms.

For Reference: BitCoin Mining Performance

In this section, we give a simple reference model for BitCoin mining: bitcoin mining time follows a geometric distribution with respect to difficulty.

We begin by assuming that sha256 is an unbiased, uniformly random value. For a set value, the probability that a given nonce will fail to return a hash with value less than $2^{256} / D$ is $1 / D$. This means that the number of times $X$ the miner must compute a sha256 hash to mine a currency is governed by a geometric random distribution:

$$\mathbb{P}[X = k] = \left(1 - \frac{1}{D}\right)^k \frac{1}{D}$$

Next, we model computing the sha256 hash of a block header and nonce, as in conventional BitCoin mining, as taking constant time $\hat{C}$. We can think of $\hat{C}$ as the time it takes to mine when difficulty is set to 0. Let $T$ be a random variable describing the time to mine a coin. We can derive a closed form for the expected time mine a bitcoin as follows:

$$\begin{eqnarray*} \mathbb{E}[T] & = & \sum_{k=1}^\infty \left(1 - \frac{1}{D}\right)^k \frac{1}{D} k \hat{C} \\ & = & \hat{C} \sum_{k=1}^\infty k \left(1 - \frac{1}{D}\right)^k \frac{1}{D} \\ & = & \hat{C} \mathbb{E}[X]\\ & = & \hat{C} \frac{1}{1 / D}\\ & = & \hat{C} D \end{eqnarray*}$$

This model predicts that expect time for mining should increase linearly with difficulty $D$.

We can test this model by writing a simple miner, as follows:

In [33]:
def bitcoin_mine(block_header, target):
    import hashlib
    nonce = 0
    while int('0x' + hashlib.sha256(hashlib.sha256(str(block_header + nonce)).digest()).hexdigest(), 16) >= target:
        nonce += 1
    return nonce

Timing for mining can be carried out using the standard python benchmarking library timeit:

In [34]:
from timeit import timeit

max_val = 2**256
def time_bitcoin_difficulty(d):
    return timeit('bitcoin_mine(randint(0,max_val), %d)' % (max_val / d), 
                  'from random import randint ; from __main__ import bitcoin_mine, max_val', 
                  number=1)

bitcoin_difficulties = range(100,1000,10)
bitcoin_sample_times = [[time_bitcoin_difficulty(d) for _ in range(500)] for d in bitcoin_difficulties]

We can now carry out Monty-Carlo experiments involving our expected time to mine model. Since our hypothesis is that mining should increase linearly with difficulty, it should fit nicely with a linear regression:

In [35]:
%matplotlib inline
%config InlineBackend.figure_format = 'svg'
In [36]:
import numpy as np
from pylab import polyfit
import matplotlib.pyplot as plt
average_times = map(np.mean, bitcoin_sample_times)
C,k = polyfit(bitcoin_difficulties, average_times, 1)
plt.plot(bitcoin_difficulties, average_times)
plt.plot(bitcoin_difficulties, [C*d + k for d in bitcoin_difficulties], '-r')
plt.title("Average Mining Time (Linear)")
plt.show()
print "Estimated Ĉ:", C, "secs"
print "Overhead:", k, "secs"
Estimated Ĉ: 4.00361967002e-06 secs
Overhead: 9.21835800638e-05 secs

We can see that mining occasionally has some outliers which create spikes in the average mining time. This is a consequence of the geometric distribution; while the average time increases linearly with difficulty, variance increases quadratically. We can derive a closed form the variance as follows:

$$\begin{eqnarray*} \operatorname{Var}[T] & = & \mathbb{E}[T^2] - (\mathbb{E}[T])^2 \\ & = & \sum_{k=1}^\infty \left(1 - \frac{1}{D}\right)^k \frac{1}{D} (k \hat{C})^2 - (\hat{C} D)^2 \\ & = & \hat{C}^2 \operatorname{Var}[X]\\ & = & \hat{C}^2 \frac{1 - (1/D)}{(1/D)^2}\\ & = & \hat{C}^2 (D^2 - D) \end{eqnarray*}$$

Empirically, we should expect variance to roughly follow a quadratic formula $k + C_1 D + C_2 D^2$, which takes into account computational overhead and other sources of noise. This model should follow the idealized model closely; we expect:

  1. $C_1 \approx - \hat{C}^2$
  2. $C_2 \approx \hat{C}^2$
In [37]:
variances = map(np.var, bitcoin_sample_times)
C2,C1,k = polyfit(bitcoin_difficulties, variances, 2)
plt.plot(bitcoin_difficulties, variances)
plt.plot(bitcoin_difficulties, [C2*d*d + C1*d + k for d in bitcoin_difficulties], '-r')
plt.title("Mining Time Variance (Quadratic)")
plt.show()
print "C₁:", C1, "C₂:", C2
print "Estimated Ĉ²:", C**2
C₁: -3.4142328846e-10 C₂: 1.61648134516e-11
Estimated Ĉ²: 1.60289704622e-11

As we can see, our model's predictions are quite accurate for the projected variance of mining times.

Finally, we can infer a Cummulative Distribution Function (CDF) for mining time based on the geometric distribution; it is approximated as follows:

$$\mathbb{P}[T < x] \approx 1 - (1 - 1/D)^{x / \hat{C}}$$

Based on the CDF, we can perform hypothesis testing based on the Kolmogorov-Smirnov Test. The default in scipy.stats is two-sided.

In [38]:
import scipy.stats
import pandas

data = []
for d, samples in zip(bitcoin_difficulties[::5], bitcoin_sample_times[::5]):
    cdf = lambda t: 1 - (1 - (1. / d))**(t / C)
    ks_statistic, p_value = scipy.stats.kstest(samples, cdf)
    data.append({"Difficulty D": d,
                 "KS Statistic": ks_statistic,
                 "P-Value": p_value})


pandas.DataFrame.from_records(data)
Out[38]:
Difficulty D KS Statistic P-Value
0 100 0.108416 1.424640e-05
1 150 0.154282 7.376921e-11
2 200 0.066713 2.227100e-02
3 250 0.030847 7.282142e-01
4 300 0.072371 1.008654e-02
5 350 0.037753 4.690235e-01
6 400 0.052314 1.250579e-01
7 450 0.041803 3.388827e-01
8 500 0.021436 9.756494e-01
9 550 0.040351 3.821977e-01
10 600 0.047582 2.013377e-01
11 650 0.038995 4.260011e-01
12 700 0.059341 5.675518e-02
13 750 0.044131 2.769786e-01
14 800 0.038618 4.387742e-01
15 850 0.065284 2.692867e-02
16 900 0.026932 8.613362e-01
17 950 0.039018 4.252390e-01

Recall that the null-hypothesis in the KS-Test is that the two distributions are the same, and that we reject the null hypothesis when the p-value is high or the KS-Statistic is high. As we can see, the KS-Statistic itself is always low, while the p-value it typically low up to higher difficulties D. As D increases the underlying geometric distribution approaches an exponential distribution, where statistics can be dominated by outliar behavior even in Monty-Carlo experiments (a point Nassim Taleb makes at length in his writings, most notably The Black Swan (2007)).

SATCoin Mining Performance

Unlike bitcoin, the SATCoin protocol outlined has more difficulty parameters. The parameters we investigate are:

  • Target Difficulty (same as BitCoin)
  • Number of Clauses
  • Minimal Generator

Target Difficulty

We begin by looking at adjusting the target difficulty. We observe that SATCoin follows the same geometric distribution statistics that BitCoin follows:

In [39]:
def random_parameters_with_difficulty(d, number_of_clauses, min_generator):
    from random import randint
    return {
        "block header": randint(0,2**256),
        "number of clauses": number_of_clauses,
        "clause width": 3,
        "target": 2**256 / d,
        "minimal generator": min_generator,
        "number of generators": 8
    }

def time_sat_difficulty(d,n,m):
    return timeit('next(mine(random_parameters_with_difficulty(%d,%d,%d)))' % (d,n,m), 
                  'from __main__ import random_parameters_with_difficulty, mine',number=1)

sat_difficulties = range(1,50)
sat_sample_times = [[time_sat_difficulty(d,10,5) for _ in range(100)] for d in sat_difficulties]

The statistical behavior of mining is the same as BitCoin, however there is a lot more variance and computational overhead. As we can see the average mining time increases linearly:

In [40]:
plt.plot(sat_difficulties, map(np.mean, sat_sample_times))
C,k = polyfit(sat_difficulties, map(np.mean, sat_sample_times), 1)
plt.plot(sat_difficulties, [C*d + k for d in sat_difficulties], '-r')
plt.title("Average Mining Time (Linear)")
plt.show()
print "Estimated Ĉ:", C, "secs"
print "Overhead:", k, "secs"
Estimated Ĉ: 0.00395911297141 secs
Overhead: -0.00852927855691 secs

Similarly variance increases quadratically, however we can see that it is much higher than BitCoin:

In [41]:
variances = map(np.var, sat_sample_times)
C2,C1,k = polyfit(sat_difficulties, variances, 2)
plt.plot(sat_difficulties, variances)
plt.plot(sat_difficulties, [C2*d*d + C1*d + k for d in sat_difficulties], '-r')
plt.title("Mining Time Variance (Quadratic)")
plt.show()
print "C₁:", C1, "C₂:", C2
print "Estimated Ĉ²:", C**2
C₁: 7.84860076556e-05 C₂: 1.35902456088e-05
Estimated Ĉ²: 1.56745755204e-05

As a result, our statistical hypothesis testing is also weaker:

In [42]:
data = []
for d, samples in zip(sat_difficulties[4::5], sat_sample_times[4::5]):
    cdf = lambda t: 1 - (1 - (1. / d))**(t / C)
    ks_statistic, p_value = scipy.stats.kstest(samples, cdf)
    data.append({"Difficulty D": d,
                 "KS Statistic": ks_statistic,
                 "P-Value": p_value})


pandas.DataFrame.from_records(data)
Out[42]:
Difficulty D KS Statistic P-Value
0 5 0.126027 0.076387
1 10 0.094064 0.320197
2 15 0.091186 0.356983
3 20 0.093278 0.329957
4 25 0.120947 0.098627
5 30 0.124660 0.081910
6 35 0.153303 0.016174
7 40 0.072371 0.669294
8 45 0.093874 0.322536

This poor behavior may just be an artifact that our sample miner is written in Python rather than in C. However, below we see some other statistics that lead us to believe that our miner is in general less statistically stable than BitCoin mining.

Number of Clauses

Our miner, like all SAT solvers, performs backtrack tree search. We can thus expect that its worst case performance is exponential in the clause length. Since we are giving it random input, Murphy's law dictates we should expect it to stick to its worst case performance, especially because it is written quite naïvely.

In [43]:
clause_lengths = range(10,100,10)
sat_clause_sample_times = [[time_sat_difficulty(1,l,5) for _ in range(100)] for l in clause_lengths]
In [44]:
plt.plot(clause_lengths, map(np.mean, sat_clause_sample_times))
C,k = polyfit(clause_lengths, map(lambda x: np.log(np.mean(x)), sat_clause_sample_times), 1)
plt.plot(clause_lengths, [np.exp(C*d + k) for d in clause_lengths], '-r')
plt.yscale('log')
plt.title("Average Mining Time (Exponential)")
plt.show()

Variance too scales exponentially with clause length:

In [45]:
plt.plot(clause_lengths, map(np.var, sat_clause_sample_times))
C,k = polyfit(clause_lengths, map(lambda x: np.log(np.var(x)), sat_clause_sample_times), 1)
plt.plot(clause_lengths, [np.exp(C*d + k) for d in clause_lengths], '-r')
plt.yscale('log')
plt.title("Mining Time Variance (Exponential)")
plt.show()

While our miner is naïve, we have clearly defined mining as a kind of exponential search. As a consequence, we can expect that using the number of clauses as a measure of difficulty for SATCoin will lead to greater variance in coin issuance than BitCoin. On the other hand, dynamic clause depth is where SAT research shines the most, since theoretical breakthroughs such as conflict clause learning are arguably what have enabled advances in SAT over the last 10 years.

Minimal Generator

We finally look at restrictions on the minimal generator as a difficulty measure. Monty Carlo statistics once again suggests an exponential distribution.

In [46]:
minimal_generators = range(100,1000,100)
sat_min_gen_sample_times = [[time_sat_difficulty(1,5,g) for _ in range(100)] for g in minimal_generators]
In [47]:
plt.plot(minimal_generators, map(np.mean, sat_min_gen_sample_times))
C,k = polyfit(minimal_generators, map(lambda x: np.log(np.mean(x)), sat_min_gen_sample_times), 1)
plt.plot(minimal_generators, [np.exp(C*d + k) for d in minimal_generators], '-r')
plt.yscale('log')
plt.title("Average Mining Time (Exponential)")
plt.show()
In [48]:
plt.plot(minimal_generators, map(np.var, sat_min_gen_sample_times))
C,k = polyfit(minimal_generators, map(lambda x: np.log(np.var(x)), sat_min_gen_sample_times), 1)
plt.plot(minimal_generators, [np.exp(C*d + k) for d in minimal_generators], '-r')
plt.yscale('log')
plt.title("Mining Time Variance (Exponential)")
plt.show()

Conclusions

We have presented a viable alternative to BitCoin, albeit with less simple statistical behavior. Even so, by design it can take advantage of BitCoin's conventional issuance protocol, since it has the same statistical behavior with respect to difficulty as BitCoin.

It is likely that given sufficient market incentive, special hardware will be divised to tackle any programming challenge, no matter how formidable. However, an engineering reality is that challenging computer science problems, such as SAT and Linear Programming solvers, have resisted such innovations to date, despite wide use in industry.

There are many refinements that can be made to the SAT solver presented here, which is solely presented as a proof of concept. Primarily, the innovations found in MiniSat, most notably conflict clause management, should be introduced.

Bibliography

Aloul, Fadi A., Karem A. Sakallah, and Igor L. Markov. “Efficient Symmetry Breaking for Boolean Satisfiability.” Computers, IEEE Transactions on 55, no. 5 (2006): 549–58. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1613836.
Cook, S. A. “The Complexity of Theorem-Proving Procedures.” In Proceedings of the Third Annual ACM Symposium on Theory of Computing, 151–58, 1971.
Davis, Martin, and Hilary Putnam. “A Computing Procedure for Quantification Theory.” J. ACM 7, no. 3 (July 1960): 201–15. doi:10.1145/321033.321034.
Dryja, Thaddeus. “Hashimoto: I/O Bound Proof of Work,” 2014. https://vaurum.com/hashimoto.pdf.
Duraira, V., and Priyank Kalla. “Exploiting Hypergraph Partitioning for Efficient Boolean Satisfiability.” In High-Level Design Validation and Test Workshop, 2004. Ninth IEEE International, 141–46. IEEE, 2004. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1431257.
Harrison, John. Handbook of Practical Logic and Automated Reasoning. Cambridge University Press New York, NY, USA, 2009.
King, Sunny. “Primecoin: Cryptocurrency with Prime Number Proof-of-Work,” July 17, 2014. http://primecoin.io/bin/primecoin-paper.pdf.
Marques-Silva, J.P., and K.A. Sakallah. “GRASP: A Search Algorithm for Propositional Satisfiability.” IEEE Transactions on Computers 48, no. 5 (May 1999): 506 –521. doi:10.1109/12.769433.
McDonald, Austin, and others. “Parallel WalkSAT with Clause Learning.” Accessed November 2, 2014. https://www.ml.cmu.edu/research/dap-papers/dap_mcdonald.pdf.
Moskewicz, Matthew W., Conor F. Madigan, Ying Zhao, Lintao Zhang, and Sharad Malik. “Chaff: Engineering an Efficient SAT Solver.” In Proceedings of the 38th Annual Design Automation Conference, 530–35. ACM, 2001. http://dl.acm.org/citation.cfm?id=379017.
Stålmarck, Gunnar M. N. “System for Determining Propositional Logic Theorems by Applying Values and Rules to Triplets That Are Generated from Boolean Formula,” January 4, 1994. http://www.google.com/patents/US5276897.
Stålmarck, Gunnar M. N., and Mårten Säflund. “Modeling and Verifying Systems and Software in Propositional Logic.” Safety of Computer Control Systems (SAFECOMP’90) 656 (1990): 31–36.
Taleb, Nassim Nicholas. The Black Swan: The Impact of the Highly Improbable. 2nd ed., Random trade pbk. ed. New York: Random House Trade Paperbacks, 2010.
Tseitin, GS. “On the Complexity of Derivations in the Propositional Calculus.” Studies in Mathematics and Mathematical Logic Part II (1968): 115–25.
Velev, Miroslav N., and Ping Gao. “Efficient SAT Techniques for Relative Encoding of Permutations with Constraints.” In AI 2009: Advances in Artificial Intelligence, edited by Ann Nicholson and Xiaodong Li, 517–27. Lecture Notes in Computer Science 5866. Springer Berlin Heidelberg, 2009. http://link.springer.com/chapter/10.1007/978-3-642-10439-8_52.
Weron, Rafał. “Levy-Stable Distributions Revisited: Tail Index> 2 Does Not Exclude the Levy-Stable Regime.” International Journal of Modern Physics C 12, no. 02 (February 2001): 209–23. doi:10.1142/S0129183101001614.