The Statistics of Buterin's Random Circuit Protocol

Matthew Wampler-Doty

Abstract. In this paper, the statistical behavior of of a mining protocol due to Vitalik Buterin (2014), dubbed Random Circuit, is investigated. It is revealed to follow a form of the geometric distribution. Several means of parameter estimation are demonstrated, and a statistical model for pooled mining is presented. Finally, a control mechanism for both average pooled mining time and variance is suggested. This opens the doorway to subsequent study of control protocols based on this mechanism.

Overview

It is well known that BitCoin was designed so with a simple control mechanism for keeping issuance at a constant expected rate, based on a difficulty $D$. In the BitCoin design, a single miner can expect mining time to follow a geometric distribution. In fact, any pool of miners can also be modeled as following a geometric distribution in their mining time.

This paper presents a generalization of the BitCoin mining protocol intended to provide relief to the issue of mining time variance. This is accomplished by looking at mining protocols governed by two difficulty variables, rather than just one. A generalization of the geometric distribution describing BitCoin is presented, and shown to faithfully capture the behavior of the new proposed mining protocols. Finally, we show that the new mining protocols provide a simple mechanism for controlling mining time variance to a desired value.

Mining Protocol

In this section we present Vitalik Buterin's (2014) Random Circuit (RC) mining protocol. The system is admittedly rather complicated; this is because its original intent was to thwart an easy implementation in an ASIC. We have modified it slightly to reuse our bitcoin_hash helper function where appropriate, and to have the same two parameter difficulty API we provided for IterCoin. The idea behind RC is to run a random O(S) computation requiring O(S) space.

In [1]:
from random import randint
def bitcoin_hash(block_header,nonce):
    import hashlib
    return int('0x' + hashlib.sha256(hashlib.sha256(str(block_header + nonce)).digest()).hexdigest(), 16)

max_val = 2**256

class SeedObj():
    "A class for managing a random number state, using a linear congruential generator"
    def __init__(self, seed):
        self.seed = seed
        self.a = 3**160
        self.c = 7**80
        self.n = 2**256 - 4294968273 # secp256k1n, why not        

    def rand(self, r):
        self.seed = (self.seed * self.a + self.c) % self.n
        return self.seed % r

def encode_int(x):
    "A helper function to provide a standard 64-bit big-endian encoding for numbers"
    o = ''
    for _ in range(8):
        o = chr(x % 256) + o
        x //= 256 
    return o
    
ops = {
    "plus": lambda x,y: (x + y) % 2**64,
    "times": lambda x,y: (x * y) % 2**64,
    "xor": lambda x,y: x^y,
    "and": lambda x,y: x&y,
    "or": lambda x,y: x|y,
    "not": lambda x,y: 2**64-1-x,
    "nxor": lambda x,y: (2**64-1-x) ^ y,
    "rshift": lambda x,y: x >> (y % 64)
}

def gentape(W, H, SEED):
    "Generate a random tape of binary instructions"
    s = SeedObj(SEED)
    tape = []
    for i in range(H):
        op = ops.keys()[s.rand(len(ops))]
        r = s.rand(100)
        if r < 20 and i > 20:
            x1 = tape[-r]["x1"]
        else:
            x1 = s.rand(W)
        x2 = s.rand(W)
        tape.append({"op": op, "x1": x1, "x2": x2})
    return tape

def runtape(TAPE, SEED, S):
    import hashlib
    s = SeedObj(SEED)
    mem = [s.rand(2**64) for _ in range(S)]
    dir = 1
    for i in range(S // 100):
        for j in (range(100) if dir == 1 else range(99, -1, -1)):
            t = TAPE[i * 100 + j]
            mem[t["x1"]] = ops[t["op"]](mem[t["x1"]], mem[t["x2"]])
        if 2 < mem[t["x1"]] % 37 < 9:
            dir *= -1
    return int('0x' + hashlib.sha256(''.join(encode_int(x) for x in mem)).hexdigest(), 16)

def PoWVerify(block_header, nonce, S, D):
    tape = gentape(S, S, bitcoin_hash(block_header, nonce))
    h = runtape(tape, bitcoin_hash(block_header, nonce), S)
    return h < max_val / D

def rc_mine(block_header, S, D):
    nonce = 1
    while not PoWVerify(block_header, nonce, S, D):
        nonce += 1
    return nonce

# Example Run
rc_mine(randint(0,max_val), 10, 10)
Out[1]:
5

Statistical Behavior

Geometric Random Distributions

The statistical behavior of our two proposed two parameter difficulty systems can be understood using a geometric random distribution. Let $T$ be a random variable describing the mining time for one of our two algorithms. In either case, we may model it as following a CDF:

$$\mathbb{P}[T < t] \approx 1 - (1 - 1/D)^{t / (\hat{C} S)}$$

Here $D$ reflects the conventional difficulty D parameter due to Nakamoto, $S$ reflects the number of steps S in itercoin and RC, and $\hat{C}$ represents the amount of time to compute a single step; we will call it the compute power.

We can think of $T$ as having support on a discrete set of values $\hat{C} S \mathbb{Z}^+ = \{\hat{C} S, 2 \hat{C} S, 3 \hat{C} S, \ldots\}$. This reflects the observation that the mining process involves a discrete number of trials: either the miner successfully mines on their first try, or their second try, etc. This allows us to recover a conventional geometric distribution PDF:

$$\mathbb{P}\left[ T / (\hat{C} S) = k\right] = \left(1 - \frac{1}{D}\right)^k \frac{1}{D}$$

Median

The first statistic we investigate is the median. This is selected because it is well known that the geometric distribution converges to an exponential distribution as $D$ tends to infinity, and exponential distributions often have extreme outliars. Since the median tends to be robust to outliars, it should be expected to be more stable even for large values of $D$. Given our model, we can calculate the median for $T$ to be:

$$\begin{eqnarray*} \operatorname{Median}[T] & \approx & -\frac{\hat{C}S}{\log_2(1-1/D)} \end{eqnarray*}$$

This suggests that the median should increase linearly with $S$. This is straightforward to test with Monte Carlo; we first give timing programs so we can test our mining protocols in vivo:

In [2]:
from timeit import timeit

def time_rc_mine(S,D):
    return timeit('rc_mine(randint(0,max_val),  %d, %d)' % (S,D), 
                  'from random import randint ; from __main__ import rc_mine, max_val', 
                  number=1)

We begin with looking at the behavior of Random Circuit's median mining time, keeping $D=2$ constant while varying $S$. We expect median mining time to increase linearly with $S$. Moreover, since $D$ is known, we can easily recover an estimate of $\hat{C}$:

In [3]:
rc_S_difficulties = range(10,370,50)
rc_S_samples = [[time_rc_mine(S,2) for _ in range(1000)] for S in rc_S_difficulties]
In [4]:
%matplotlib inline
%config InlineBackend.figure_format = 'svg'
In [5]:
import numpy as np
from pylab import polyfit
import matplotlib.pyplot as plt
rc_S_medians = map(np.median, rc_S_samples)
rc_S_medians_A, rc_S_medians_k = \
    polyfit(rc_S_difficulties, rc_S_medians, 1)
plt.plot(rc_S_difficulties, rc_S_medians)
plt.plot(rc_S_difficulties, 
         [rc_S_medians_A*S + rc_S_medians_k 
          for S in rc_S_difficulties], '-r')
plt.title('$S$ vs. Median Mining Time (Linear)')
plt.ylabel('Seconds')
plt.xlabel('$S$')
plt.show()
rc_S_medians_C = - rc_S_medians_A * np.log2(1-1./2)
print "Estimated Ĉ:", rc_S_medians_C
Estimated Ĉ: 2.22365345274e-05

The behavior with respect to $D$ is more complicated. Depending on how large $D$ is, different patterns of behavior emerge. If $D$ is large, then $\log_2(1-1/D) \approx 0$, making it hard to recover $\hat{C}$ from median estimates. A reasonable limit approximation is available, however. Let $p=\frac{1}{D}$, this means that as $D$ tends to infinity then $p$ tends to 0. The Laurent series expansion of $-\frac{1}{\log_2(1-p)}$ around $p = 0$ is given by:

$$ \begin{eqnarray*} -\frac{1}{\log_2(1-p)} & = & -\frac{\log(2)}{2} + \frac{\log(2)}{p} + O(p) \\ & \approx & -\frac{\log(2)}{2} + \log(2) D \end{eqnarray*} $$

Hence for large $D$, we should expect $\operatorname{Median}[T] \propto \log(2) \hat{C} S D$. This implies empirically for a collection of samples with varying values of $D$ that $\hat{C}$ can be recovered via a simple linear regression.

In [6]:
rc_D_difficulties = range(1,502,50)
rc_D_samples = [[time_rc_mine(1,D) for _ in range(1000)] for D in rc_D_difficulties]
In [7]:
rc_D_medians = map(np.median, rc_D_samples)
plt.plot(rc_D_difficulties, rc_D_medians)
rc_D_medians_A, rc_D_medians_k = \
    polyfit(rc_D_difficulties, rc_D_medians, 1)
plt.plot(rc_D_difficulties, 
         [rc_D_medians_A*S + rc_D_medians_k 
          for S in rc_D_difficulties], '-r')
plt.title('$D$ vs. Median Mining Time (Linear)')
plt.ylabel('Seconds')
plt.xlabel('$D$')
plt.show()
rc_D_medians_C = rc_D_medians_A / np.log(2)
print "Estimated Ĉ:"
print "   using S medians:", rc_S_medians_C
print "   using D medians:", rc_D_medians_C
Estimated Ĉ:
   using S medians: 2.22365345274e-05
   using D medians: 3.09285469223e-05

As we can see, the results of both estimations of $\hat{C}$ are similar.

Mean and Variance

The next two statistics we investigate are the mean and variance. Given our model, these are given as follows:

$$\begin{eqnarray*} \mathbb{E}[T] & = & \hat{C} S D \end{eqnarray*}$$

$$\begin{eqnarray*} \operatorname{Var}[T] & = & \hat{C}^2 S^2 (D^2 - D) \end{eqnarray*}$$

The mean $\mathbb{E}[T]$ is a linear function of both $S$ and $D$, respectively, and the variance $\operatorname{Var}[T]$ is a quadratic function of both $S$ and $D$. In every case another estimate of $\hat{C}$ can be recovered using regresion.

In [8]:
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(10,5))
fig.subplots_adjust(wspace=0.3, hspace=0.4)

rc_S_means = map(np.mean, rc_S_samples)
rc_D_means = map(np.mean, rc_D_samples)
ax1.plot(rc_S_difficulties, rc_S_means)
ax2.plot(rc_D_difficulties, rc_D_means)

rc_S_vars = map(np.var, rc_S_samples)
rc_D_vars = map(np.var, rc_D_samples)
ax3.plot(rc_S_difficulties, rc_S_vars)
ax4.plot(rc_D_difficulties, rc_D_vars)

rc_S_means_A, rc_S_means_k = \
    polyfit(rc_S_difficulties, rc_S_means, 1)
ax1.plot(rc_S_difficulties, 
         [rc_S_means_A*S + rc_S_means_k 
          for S in rc_S_difficulties], '-r')
rc_S_means_C = rc_S_means_A / 2  # Recall that we set D=2 for these samples

rc_D_means_C, rc_D_means_k = \
    polyfit(rc_D_difficulties, rc_D_means, 1)
ax2.plot(rc_D_difficulties, 
         [rc_D_means_C*D + rc_D_means_k 
          for D in rc_D_difficulties], '-r')

rc_S_vars_A, rc_S_vars_B, rc_S_vars_k = \
    polyfit(rc_S_difficulties, rc_S_vars, 2)
ax3.plot(rc_S_difficulties, 
         [rc_S_vars_A*S*S + rc_S_vars_B*S + rc_S_vars_k 
          for S in rc_S_difficulties], '-r')
rc_S_vars_C = np.sqrt(rc_S_vars_A / 2)  # Recall that we set D=2 for these samples

rc_D_vars_A, rc_D_vars_B, rc_D_vars_k = \
    polyfit(rc_D_difficulties, rc_D_vars, 2)
ax4.plot(rc_D_difficulties, 
         [rc_D_vars_A*D*D + rc_D_vars_B*D + rc_D_vars_k 
          for D in rc_D_difficulties], '-r')
rc_D_vars_C = np.sqrt(rc_D_vars_A)

ax1.set_title('$S$ vs. Mean Mining Time (Linear)')
ax1.set_ylabel('Seconds')
ax1.set_xlabel('$S$')
ax2.set_title('$D$ vs. Mean Mining Time (Linear)')
ax2.set_ylabel('Seconds')
ax2.set_xlabel('$D$')
ax3.set_title('$S$ vs. Mining Time Variance (Quadratic)')
ax3.set_ylabel('Seconds')
ax3.set_xlabel('$S$')
ax4.set_title('$D$ vs. Mining Time Variance (Quadratic)')
ax4.set_ylabel('Seconds')
ax4.set_xlabel('$D$')
plt.show()

print "Estimated Ĉ:"
print "   using S medians:", rc_S_medians_C
print "   using S means:", rc_S_means_C
print "   using S variance:", rc_S_vars_C
print "   using D medians:", rc_D_medians_C
print "   using D means:", rc_D_means_C
print "   using D variance:", rc_D_vars_C
Estimated Ĉ:
   using S medians: 2.22365345274e-05
   using S means: 1.14086116496e-05
   using S variance: 1.30566190394e-05
   using D medians: 3.09285469223e-05
   using D means: 3.07988837849e-05
   using D variance: 3.17806087131e-05

Statistical Hypothesis Testing

As we can see from above there are a variety of ways to estimate $\hat{C}$. From our estimate we can perform statistical hypothesis testing using the closed form cummulative distribution function and the Kolmogorov-Smirnov (KS) test, as in Wampler-Doty (2014).

In [9]:
import scipy.stats
import pandas

C = rc_S_medians_C
S_data = []
for S, samples in zip(rc_S_difficulties, rc_S_samples):
    cdf = lambda t: 1 - (1 - (1. / 2))**(t / (S * C))
    ks_statistic, p_value = scipy.stats.kstest(samples, cdf)
    S_data.append({"Difficulty D": 2,
                   "Difficulty S": S,
                   "KS Statistic": ks_statistic,
                   "P-Value": p_value})


pandas.DataFrame.from_records(S_data)
Out[9]:
Difficulty D Difficulty S KS Statistic P-Value
0 2 10 0.377975 0
1 2 60 0.313981 0
2 2 110 0.314904 0
3 2 160 0.304511 0
4 2 210 0.302906 0
5 2 260 0.296714 0
6 2 310 0.298224 0
7 2 360 0.295487 0
In [10]:
D_data = []
for D, samples in zip(rc_D_difficulties, rc_D_samples):
    cdf = lambda t: 1 - (1 - (1. / D))**(t / C)
    ks_statistic, p_value = scipy.stats.kstest(samples, cdf)
    D_data.append({"Difficulty D": D,
                   "Difficulty S": 1,
                   "KS Statistic": ks_statistic,
                   "P-Value": p_value})
pandas.DataFrame.from_records(D_data)
Out[10]:
Difficulty D Difficulty S KS Statistic P-Value
0 1 1 1.000000 0.000000e+00
1 51 1 0.159877 0.000000e+00
2 101 1 0.165789 0.000000e+00
3 151 1 0.137124 0.000000e+00
4 201 1 0.130921 1.998401e-15
5 251 1 0.147080 0.000000e+00
6 301 1 0.108973 8.504752e-11
7 351 1 0.148628 0.000000e+00
8 401 1 0.110684 3.990630e-11
9 451 1 0.119436 6.905587e-13
10 501 1 0.147076 0.000000e+00

Lower P-values imply that the geometric distribution model is more likely to be correct.

Mining Pool Statistics

We next turn to the statistics of mining pools. The following result effectively characterizes a mining pool as behaving like a single miner:

Lemma: Fix $S$ and $D$. Let $\{T_i : i \in \mathcal{I}\}$ be a pool of miners, each with geometrically distributed mining times, and each with a respective compute power $\hat{C}_i$. Then $T_\min = \min_{i \in \mathcal{I}} T_i$ follows a geometric random distribution with CDF:

$$\mathbb{P}[T_\min < t] \approx 1 - (1 - 1/D)^{x / (S \hat{C}_{pooled})}$$

With a pooled compute power $\hat{C}_{pooled} = \left({\sum_{i\in\mathcal{I}}\hat{C}_i^{-1}}\right)^{-1}$

Proof. We prove the result in the special case of just two miners, since the arbitrary case is a straightforward generalization. Let their compute powers be $\hat{C}_1$ and $\hat{C}_2$, respectively. We have that $$\mathbb{P}[T_\min < t] = 1 - \mathbb{P}[T_1 \geq t\ \&\ T_2 \geq t]$$

Assuming the two times are independent we have $$\begin{eqnarray*} \mathbb{P}[T_1 \geq t\ \&\ T_2 \geq t] & = & \mathbb{P}[T_1 \geq t]\mathbb{P}[ T_2 \geq t] \\ & = & (1 - 1/D)^{x / (S \hat{C}_{1})}(1 - 1/D)^{x / (S \hat{C}_{2})} \\ & = & (1 - 1/D)^{(x / S) (1/\hat{C}_{1} + 1/\hat{C}_{2})} \\ \end{eqnarray*}$$

We want $1/\hat{C}_{pooled} = 1/\hat{C}_{1} + 1/\hat{C}_{2}$, hence $$\hat{C}_{pooled} = (\hat{C}_{1}^{-1} + \hat{C}_{2}^{-1})^{-1}$$

$\Box$

It is helpful to look at special cases in understanding the above formula. Imagine that the pool consisted of $n$ miners each taking exactly 1 $\mu$s to mine a coin at $D = S = 1$. Then the pool would effectively be able to mine a coin at $1/n$ $\mu$s. Of course, the result holds even in extremely heterogeneous pools, it is just harder to understand the effective compute power with a simple closed form.

One problem the approximation presented in the above lemma is that it neglects the fact that $\mathbb{P}[T < \min_{i\in\mathcal{I}}S \hat{C}_i] = 0$. Fundamentally, the compute pool cannot mine any faster than its fastest miner. In the classic BitCoin model this didn't matter, since effectively $S=1$ and $\min_{i\in\mathcal{I}}\hat{C}_i$ can be taken to be small. However, since Random Circuit has the option of enforcing $S$ to be arbitrarily large, this factor must be taken into account.

Towards Variance Control

Together, mean and variance give rise to two equations with two free variables. This suggests an alternative approach to BitCoin's difficulty model, which relies entirely on controlling $\mathbb{E}[T]$. Random Circuit has the possibility of controlling both $\mathbb{E}[T_{pooled}]$ and $\operatorname{Var}[T_{pooled}]$.

Let $\mu$ be the target average mining time, and let $\sigma^2$ be the target mining variance. Solving the mean and average equations above gives:

$$\begin{eqnarray*} D & = & \frac{\mu^2}{\mu^2 - \sigma^2} \\ S & = & \frac{\mu^2 - \sigma^2}{\hat{C}_{pooled} \mu} \end{eqnarray*}$$

This is somewhat intuitive; demanding $\sigma = 0$ requires that $D = 1$, which means that the miner always succeeds on their first try and mining time is completely deterministic. By the same token if mining is completely deterministic then we can still achieve our expected mining time target by just calculating how many steps it would take to run out the clock, which is given by $\mu / \hat{C}$.

The chief problem with this proposal is that for small values of $D$, this invalidates the exponential model that Nakamoto appeals to in his classic argument for the safety of the block chain protocol. Moreover, this likely invalidates BitCoin's technique of estimating $\hat{C}_{pooled}$ using an average.

Conclusions

Random Circuit gives a well behaved statistical model, and has the novelty of offering not one but two parameters for difficulty control. More research is required for determining how Random Circuit's second difficulty parameter $S$ can be leveraged effectively in a rate control mechanism that does not compromise blockchain security. Preliminary results appear promising.