Generate Zipf-distributed data

Based on

  • David J. Schwab, Ilya Nemenman, Pankaj Mehta: Zipf's law and criticality in multivariate data without fine-tuning
In [1]:
%pylab inline
from numba import jit, vectorize, float64, int32
Populating the interactive namespace from numpy and matplotlib

Parameter values

In [2]:
N = 26        # number of spins (bits) 
C = 2**(N-4)  # number of draws
h = 2**randn(N)  # local magnetic field at each bit ($h_i$ in the paper)
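For orientation, the parameters above can be read as defining the following model. This is an interpretation of the sampling cells further down, not a quotation from the paper: spins are written as $\sigma_i \in \{-1, +1\}$, a set bit is assumed to mean $\sigma_i = +1$, and the inverse temperature $\beta$ is assumed to be drawn from a standard normal, as the `randn` call in the tallying cell suggests.

```latex
P(\sigma \mid \beta) = \prod_{i=1}^{N} \frac{e^{\beta h_i \sigma_i}}{2\cosh(\beta h_i)},
\qquad
P(\sigma) = \int \frac{e^{-\beta^{2}/2}}{\sqrt{2\pi}} \, P(\sigma \mid \beta) \, d\beta .
```

Under this reading, each spin is independent once $\beta$ is fixed, and the correlations between spins come only from averaging over the shared $\beta$.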

Pack bit vector into an integer for efficient frequency counting.

In [3]:
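@jit
def pack(xs):
    # Fold the bits into one integer; the first element becomes the most significant bit.
    r = 0
    for x in xs:
        r = (r << 1) | int(x)
    return r

Sample spins at a given inverse temperature: bit $i$ is set with probability $e^{\beta h_i}/(2\cosh \beta h_i)$.

In [4]:
@vectorize([int32(float64)])       # apply element-wise to an array of beta values
def sample(beta):
    H = beta * h                   # field on each spin at this inverse temperature
    p = exp(H)/(2*cosh(H))         # probability that each bit is 1
    return pack(rand(N) < p)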
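As a cross-check that does not depend on numba, the same sampling step can be sketched in plain NumPy for a whole batch of inverse temperatures at once. The names `pack_rows`, `sample_batch`, and `h_demo` are illustrative and not part of the notebook, and the small field vector of 8 spins is an assumption chosen to keep the demo light. The probability is written as a logistic function, which is algebraically equal to $e^{H}/(2\cosh H)$.

```python
import numpy as np

def pack_rows(bits):
    """Pack each row of a 0/1 array into one integer (first column = most significant bit)."""
    n = bits.shape[1]
    weights = 1 << np.arange(n - 1, -1, -1, dtype=np.int64)   # [2**(n-1), ..., 2, 1]
    return bits.astype(np.int64) @ weights

def sample_batch(betas, h, rng):
    """Draw one packed spin configuration per inverse temperature in `betas`."""
    H = np.outer(betas, h)                    # H[k, i] = beta_k * h_i
    p_up = 1.0 / (1.0 + np.exp(-2.0 * H))     # same value as exp(H) / (2 * cosh(H))
    bits = rng.random(H.shape) < p_up         # one Bernoulli draw per spin
    return pack_rows(bits)

rng = np.random.default_rng(0)
h_demo = 2.0 ** rng.standard_normal(8)        # hypothetical small field vector
codes = sample_batch(rng.standard_normal(1000), h_demo, rng)
```

The intermediate arrays have one row per draw and one column per spin, so for the notebook's full `C` and `N` this approach would have to be run in chunks to keep memory use reasonable.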

Sample spins at random inverse temperatures and tally configuration frequencies.

In [5]:
%time d = bincount(sample(randn(C)))
CPU times: user 8.61 s, sys: 523 ms, total: 9.14 s
Wall time: 9.14 s
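`bincount` allocates one counter for every integer up to the largest packed configuration, which for 26 bits can mean tens of millions of entries, most of them zero. A possible alternative, sketched here on stand-in data rather than on the notebook's samples, is `np.unique` with `return_counts=True`, which tallies only the configurations that actually occurred.

```python
import numpy as np

rng = np.random.default_rng(1)
codes = rng.integers(0, 2**20, size=10_000)   # stand-in for packed configurations

# Dense tally: one counter per integer from 0 to codes.max().
dense = np.bincount(codes)

# Sparse tally: only the codes that were observed, in ascending order.
values, counts = np.unique(codes, return_counts=True)
```

The nonzero entries of `dense` and the `counts` array hold the same numbers, so either can feed the rank-frequency plot. The sparse version sorts the samples, which trades some time for a much smaller result.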

Rank-frequency log-log plot.

In [6]:
loglog(sorted(d[d>0], reverse=True), base=2)  # Matplotlib >= 3.3; older releases used basex=2, basey=2
[<matplotlib.lines.Line2D at 0x7f61348a8090>]
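Zipf's law corresponds to a straight line of slope close to -1 on this plot. One rough way to put a number on that is a least-squares fit in log-log space, sketched below. The helper `zipf_slope` and its `max_rank` argument are illustrative names, not part of the notebook, and a plain least-squares fit is only a quick diagnostic: the low-frequency tail, where many configurations share a count of 1 or 2, can bias it, which is why the sketch allows restricting the fit to the top ranks.

```python
import numpy as np

def zipf_slope(counts, max_rank=None):
    """Least-squares slope of log2(frequency) against log2(rank)."""
    freq = np.sort(np.asarray(counts, dtype=float))[::-1]   # descending frequencies
    freq = freq[freq > 0]                                    # drop unobserved configurations
    if max_rank is not None:
        freq = freq[:max_rank]                               # optionally ignore the noisy tail
    rank = np.arange(1, len(freq) + 1)
    slope, _intercept = np.polyfit(np.log2(rank), np.log2(freq), 1)
    return slope

# Sanity check on an exact power law, frequency proportional to 1 / rank.
ideal = 1e6 / np.arange(1, 1001)
slope = zipf_slope(ideal)
```

On the notebook's data this would be called as `zipf_slope(d)`, possibly with `max_rank` set to exclude the plateau of rarely seen configurations.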