table = zeros((12, 12), dtype=int16)
for i in range(12):
    for j in range(12):
        table[i, j] = (i + 1) * (j + 1)
print table
[[  1   2   3   4   5   6   7   8   9  10  11  12]
 [  2   4   6   8  10  12  14  16  18  20  22  24]
 [  3   6   9  12  15  18  21  24  27  30  33  36]
 [  4   8  12  16  20  24  28  32  36  40  44  48]
 [  5  10  15  20  25  30  35  40  45  50  55  60]
 [  6  12  18  24  30  36  42  48  54  60  66  72]
 [  7  14  21  28  35  42  49  56  63  70  77  84]
 [  8  16  24  32  40  48  56  64  72  80  88  96]
 [  9  18  27  36  45  54  63  72  81  90  99 108]
 [ 10  20  30  40  50  60  70  80  90 100 110 120]
 [ 11  22  33  44  55  66  77  88  99 110 121 132]
 [ 12  24  36  48  60  72  84  96 108 120 132 144]]
The example approximation problem: 7203 * 6892.
print 7203 * 6892
print 7000 * 7000
print (7000 * 7000 - 7203 * 6892) / float(7203 * 6892) * 100
49643076
49000000
-1.29539918115
If I knew my 72 times table, I could have used 7200 * 6900 instead, since that's just 72 * 69 * 10^4.
print 7203 * 6892
print 7200 * 6900
print (7200 * 6900 - 7203 * 6892) / float(7203 * 6892) * 100
49643076
49680000
0.074378952666
Let's build an approximation function that takes into account the lead digit we want to use. This is just a proof of concept.
def approximate(number, lead):
    return lead * 10 ** (round(log10(number / lead)))
approximate(12345, 9)
9000.0
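To see where the 9000 comes from: $12345 / 9 \approx 1372$, and $\log_{10}(1372) \approx 3.14$ rounds to 3, so the function returns $9 \times 10^3 = 9000$.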
Now, let's do the same thing but with a list of lead digits.
def approximate(number, possible_leads):
    approx = array([lead * 10 ** (round(log10(number / lead))) for lead in possible_leads])
    return approx[abs(number - approx).argmin()]
approximate(18345, [1, 2, 3, 4])
20000.0
The approximate product is just the product of the approximations.
def approximate_product(a, b, possible_leads):
    return approximate(a, possible_leads) * approximate(b, possible_leads)
From there follows the relative error.
def relative_error(a, b, possible_leads):
    return abs(approximate_product(a, b, possible_leads) / (a * b) - 1)
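In other words, writing $\hat{x}$ for the chosen approximation of $x$, the quantity computed above is $\epsilon(a, b) = \left| \frac{\hat{a}\,\hat{b}}{a\,b} - 1 \right|$.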
relative_error(545, 999, range(1, 11))
0.081650457797246778
relative_error(549, 999, range(1, 11))
0.088341529142986319
On to more serious things: sample random pairs of numbers and compute the mean relative error, comparing the 10 times table with the 12 times table.
def mean_error_uniform(max_lead_digit, N=100000):
    return mean([relative_error(randint(1, 1e6), randint(1, 1e6), range(1, max_lead_digit + 1)) for i in xrange(N)])
mean_error_uniform(10)
0.093972687794082327
mean_error_uniform(10, 10000)
0.095496434040729439
mean_error_uniform(12)
0.081577214232046474
mean_errors = [mean_error_uniform(i, 10000) for i in range(2, 21)]
plot(range(2, 21), mean_errors)
ylim((0, 0.5))
grid(True)
I notice that the routine is quite slow. Maybe I could make it faster with some fancy NumPy trick?
approximate(ones((10, 1)), range(1, 4))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-75-41cdb1b69303> in <module>()
----> 1 approximate(ones((10, 1)), range(1, 4))

<ipython-input-30-95979ec72590> in approximate(number, possible_leads)
      1 def approximate(number, possible_leads):
----> 2     approx = array([lead * 10 ** (round(log10(number / lead))) for lead in possible_leads])
      3     return approx[abs(number - approx).argmin()]

TypeError: only length-1 arrays can be converted to Python scalars
No, it doesn't work as is: the built-in round chokes on arrays, so the function doesn't broadcast. A vectorized rewrite is sketched below for reference, but here let's just use fewer trials in the functions...
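Something along these lines might work, swapping the built-in round for NumPy's elementwise around and broadcasting over a whole array of numbers at once; the name approximate_vec is mine, and this sketch isn't used in the rest of the post.
def approximate_vec(numbers, possible_leads):
    # one row per number, one column per candidate lead
    numbers = asarray(numbers, dtype=float)
    leads = asarray(possible_leads, dtype=float)
    candidates = leads * 10 ** around(log10(numbers[:, None] / leads))
    # for each number, keep the candidate closest to it
    best = abs(numbers[:, None] - candidates).argmin(axis=1)
    return candidates[arange(len(numbers)), best]
approximate_vec([18345], [1, 2, 3, 4]) should reproduce the 20000.0 from above.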
mean_errors = [mean_error_uniform(i, N=10000) for i in range(2, 51)]
plot(range(2, 51), mean_errors)
ylim((0, 0.5))
grid(True)
Let's display the relative improvement for each value.
mean_errors = array(mean_errors)
plot(range(2, 50), (mean_errors[:-1] - mean_errors[1:]) / mean_errors[:-1])
ylim((0, 0.6))
grid(True)
Next, the relative improvement per extra multiplication fact memorized: going from the $(n-1)$ times table to the $n$ times table means learning $n^2 - (n-1)^2 = 2n - 1$ new products, which is the denominator used below.
rng = arange(2, 50)
denom = map(lambda n: n ** 2 - (n - 1) ** 2, rng)
plot(rng, (mean_errors[:-1] - mean_errors[1:]) / mean_errors[:-1] / (denom))
ylim((0, 0.05))
grid(True)
Now let's simulate with Benford's law: for a leading digit $d \in \{1, \dots, 9\}$, $P(d) = \log_{10}\left(1 + \frac{1}{d}\right)$. Steps to follow: compute the cumulative distribution $G$, draw a uniform random number $r$, and return the first digit whose cumulative probability exceeds $r$.
probs = log10(1 + 1/arange(1, 10, dtype=float32))
G = cumsum(probs)
G
array([ 0.30103001, 0.47712126, 0.60206002, 0.69897002, 0.77815127, 0.84509802, 0.90309 , 0.95424253, 1. ], dtype=float32)
r = rand(1)
r
array([ 0.12225127])
(r < G).argmax()
0
probs = log10(1 + 1/arange(1, 10, dtype=float32))
G = cumsum(probs)
def random_benford_number():
    return (rand(1) < G).argmax() + 1
random_benford_number()
2
benford_numbers = array([random_benford_number() for i in range(1000000)])
bar(arange(1, 10), [count_nonzero(benford_numbers == i) for i in arange(1, 10)], label='Empirical')
# expected counts under Benford's law: N * P(d)
plot(arange(1, 10), len(benford_numbers) * log10(1 + 1/arange(1, 10, dtype=float32)), 'or', label="Benford's law")
legend()
<matplotlib.legend.Legend at 0xb3558f0>
%timeit random_benford_number()
100000 loops, best of 3: 6.44 us per loop
%timeit randint(1, 1e6)
10000000 loops, best of 3: 114 ns per loop
6.44 / 0.114
56.49122807017544
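So the Benford generator is roughly 56 times slower than randint, mostly because it draws one digit per call. If that ever mattered, the draws could presumably be vectorized with searchsorted against the cumulative distribution G; this is only a sketch of mine and isn't used below.
def random_benford_digits(n):
    # draw n Benford-distributed leading digits in one shot by inverting G
    return searchsorted(G, rand(n)) + 1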
def random_integer_benford(n):
    r = randint(1, n)
    # use the uniform draw only to pick a number of digits, then draw each digit from Benford's law
    return float("".join([str(random_benford_number()) for digit in str(r)]))
random_integer_benford(1000)
814.0
Finally, we can rerun the analysis with these more "realistic" numbers.
def mean_error_benford(max_lead_digit, N=100000):
    return mean([relative_error(random_integer_benford(1e6), random_integer_benford(1e6), range(1, max_lead_digit + 1)) for i in xrange(N)])
mean_error_benford(10, N=10000)
0.14795717887345317
mean_errors_benford = [mean_error_benford(i, N=10000) for i in range(2, 21)]
mean_errors_benford = array(mean_errors_benford)
rng = arange(2, 20)
plot(rng + 1, (mean_errors_benford[:-1] - mean_errors_benford[1:]) / mean_errors_benford[:-1])
grid(True)
ylim((0, 0.5))
(0, 0.5)
rng = arange(2, 20)
denom = map(lambda n: n ** 2 - (n - 1) ** 2, rng)
plot(rng, (mean_errors_benford[:-1] - mean_errors_benford[1:]) / mean_errors_benford[:-1] / (denom))
ylim((0, 0.05))
grid(True)
The last result is quite different from what is shown at the original link, but I'll stop here. My conclusions are as follows: