You Could Have Invented Fast Inverse Roots Using $8^{th}$ Grade Math.¶

Algebra, "Music Theory", and Logarithms Oh My!

Press the Down button to see the motivation for this deck.

What's the point?¶

What's the Fast Inverse Root Method you ask?¶

This is actual code used in Quake and other early 3D games to help us calculate $\frac{1}{\sqrt{x}}$, seriously

float rsqrt(float x) {
  long i  = * ( long * ) &x;                  // floating point bit hack
  i  = 0x5f3759df - ( i >> 1 );               // what the fuck? 
  return * ( float * ) &i;
}```

Supposedly, $$ \textsf{rsqrt}(x) \approx \frac{1}{\sqrt{x}} $$ and the secret is unlocked behind this impenetrable magical constant 0x5f3759df.

Does this actually work?¶

In [200]:

%matplotlib inline

from struct import pack, unpack
import numpy as np
import matplotlib.pyplot as plt

@np.vectorize
def sharp(x):
    return unpack('I', pack('f', x))[0]

@np.vectorize
def flat(y):
    return unpack('f', pack('I', int(y) & 0xffffffff))[0]

star_long_star_amp = sharp;
star_float_star_amp = flat;

hide_code_in_slideshow();

[Double Click for Code]

In [73]:

@np.vectorize
def rsqrt(x):                      # float rsqrt(float x) {
    i = star_long_star_amp(x);     #   long i  = * ( long * ) &x;
    i = 0x5f3759df - ( i >> 1 );   #   i = 0x5f3759df - ( i >> 1 );
    return star_float_star_amp(i); #   return * ( float * ) &i;
                                   # }

In [74]:

# Construct a plot
fig = plt.figure(figsize=(16,8));
ax = plt.axes();

# Plot the approximation and the actual inverse sqrt function
x = np.linspace(1, 50, 5000);
approximation, = ax.plot(x, rsqrt(x))
actual, = ax.plot(x, 1/np.sqrt(x))

fig.legend(handles=[approximation, actual], labels=[r'qsqrt(x)', r'$\frac{1}{\sqrt{x}}$'], fontsize=20);
fig.suptitle(r"$\frac{1}{\sqrt{x}}$ versus qsqrt(x)", fontsize=26);

hide_code_in_slideshow()

[Double Click for Code]

Close Enough!¶

So what's up with the `0x5f3759df`¶

Where did this thing come from?

It's alien technology long lost to our generation.

It's super complicated and requires some advanced machinery from mathematics.

It's just plain dark magic that only wizards wield.

Can we open up Pandora's Box and peer inside without going crazy?

As it turns out, yes we can¶

In fact, I believe that the idea behind this technique is so easy that an $8^{th}$ grader can figure it all out.

Notation¶

Recall that the fast inverse square root method looks like

long i  = * ( long * ) &x;                  
i  = 0x5f3759df - ( i >> 1 );               
return * ( float * ) &i;```

*(long*)& takes a float and outputs a long
*(float*)& takes a long and outputs a float

Let's take some inspiration from musical notation and name these two operations the sharp ($\cdot^\sharp: \textsf{float} \to \textsf{long}$) and the flat ($\cdot^\flat: \textsf{long} \to \textsf{float}$) operators.

$$ \textsf{rsqrt}(x) = \left(\textsf{0x5f3759df} - \frac{1}{2} x^\sharp\right)^\flat $$

In a way, you can think of a float as a "halftone" up from a long.

However, the anology isn't quite "right":

Taking a note a half-tone up or down still gives you a note. We give you a completely different type!
In order to bring $F^\sharp$ down to $F$, you would use the natural ($\natural$) operator.

But whatever, it's close enough and it looks fancy 😉

Do Snakes Dream of EDM?¶

What would this look like in Python?

In [114]:

from struct import pack, unpack

to_long    = lambda hole: unpack('i', hole)[0] # y = (long*) x
to_float   = lambda hole: unpack('f', hole)[0] # y = (float*) x

from_long  = lambda hole: pack('i', int(hole) % 0x80000000) # long*  y = &x
from_float = lambda hole: pack('f', float(hole))            # float* y = &x

hide_code_in_slideshow()

[Double Click for Code]

In [115]:

@np.vectorize
def sharp(x):
    return to_long(from_float(x))

@np.vectorize
def flat(y):
    return to_float(from_long(y))

Here, $$\begin{align*} \textsf{sharp}(x) = x^\sharp &= *(\textsf{long}*) \textsf{&} x \\ \textsf{flat}(y) = y^\flat &= *(\textrm{float}*) \textsf{&} y \end{align*} $$

Don't worry too much about the implementation details.

Music Theory¶

What do we know about $\sharp$ and $\flat$?

Inverse Laws:$$

\left(x^\sharp\right)^\flat = x = \left(x^\flat\right)^\sharp $$ _{e.g. If you raise a number a half-tone and then lower it a half-tone, you get the same number back.}

Anihilator Laws:

$$ 0^\sharp = 0 = 0^\flat $$

There are a few other algebraic properties of $\sharp$ and $\flat$, but unfortunately, they're pretty unstructured. In particular, $\left(x^\sharp + y^\sharp\right)^\flat \ne x + y$.

In [174]:

int( flat(sharp(1) + sharp(1)) ) # 1 + 1 is ...

Out[174]:

170141183460469231731687303715884105728

A Tale of two Variables¶

Let's forget about the magic constant 𝟢𝗑𝟧𝖿𝟥𝟩𝟧𝟫𝖽𝖿 for a moment.

There are actually two magical constants in rsqrt, the other constant is the $-\frac{1}{2}$.
- Hey, $\frac{1}{\sqrt{x}}$ is just $x^{-\frac{1}{2}}$, maybe they are related!

Let's look at what happens when we change these two constants.

$$ \textsf{foobar}_{M,C}(x) = \left(M + C \cdot x^\sharp\right)^\flat $$

Foo, meet Bar¶

$$ \textsf{foobar}_{M,C}(x) = \left(M + C \cdot x^\sharp\right)^\flat $$

In [117]:

def foobar(M, C):
    return np.vectorize(lambda x: flat(M + C * sharp(x)))

# rsqrt(x) is instantiated with M = 0x5f3759df and C = -1/2
rsqrt = foobar(0x5f3759df, -1.0/2.0)

Recall that the log-log plot for $y = M \cdot x^C$ is linear because

$$\begin{align*} \log(y) &= \log\left(M x^C\right) \\ &= \log(M) + C \cdot \log(x) \end{align*}$$

where $\log(y)$ varies linearly with $\log(x)$ with slope $C$.

What does foobar looks like under the log-log scale?

In [160]:

import matplotlib
matplotlib.rcParams['text.usetex'] = False
matplotlib.rcParams['text.latex.unicode'] = False

x = np.linspace(1, 1000, 5000)
allM = (1 << 26, 1 << 28, 0x5f3759df)
properties = {
    (0, 0): {'M': allM, 'C': -2},
    (1, 0): {'M': allM, 'C': 8},
    (0, 1): {'M': allM, 'C': 0.3},
    (1, 1): {'M': allM, 'C': -0.6},
}
fig, axarr = plt.subplots(2, 2, figsize=(14,8));
for key, property in properties.items():
    C = property['C']
    axarr[key].set_ylim(1e-39, 1e41)
    handle, = axarr[key].loglog(x, x ** C, linestyle='dotted');
    handles = [handle]
    for M in property['M']:
        baz = foobar(M, C)
        kwargs = {'ls' : 'dashed'} if M == 0x5f3759df else {}
        handle, = axarr[key].loglog(x, np.abs(baz(x)), **kwargs)
        handles.append(handle)
    axarr[key].set_title(r'For slope C = $%s$, ${\rm foobar}_{M,%s}(x)$' % (C, C))
    axarr[key].legend(
        handles, 
        [
            r'$x^{%s}$' % C, 
            r'$M = 2^{26}$', 
            r'$M = 2^{28}$',
            r'$M = {\rm 0x5f3759df}$'
        ], loc=4)

hide_code_in_slideshow()

/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/numpy/lib/function_base.py:2276: RuntimeWarning: invalid value encountered in ? (vectorized)
  outputs = ufunc(*inputs)

[Double Click for Code]

/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/matplotlib/scale.py:93: RuntimeWarning: invalid value encountered in less_equal
  mask = a <= 0.0

Notice how as $C$ gets closer to $-\frac{1}{2}$, the 0x5f3759df line also gets closer to $x^C$.

What do you see?¶

All the slopes are parallel to each other.

Honestly, it's hard to grok anything off of a series of pictures, so let's invoke the ancient wisdom of Python and look at a video of this instead.

In [171]:

from IPython.display import HTML
from matplotlib import animation
animation.Animation._repr_html_ = lambda anim: anim.to_html5_video()

x = np.linspace(1, 1000, 5000)
allM = (1 << 26, 1 << 28, 0x5f3759df)
fig = plt.figure(figsize=(14,8))
ax = plt.axes(ylim=(1e-39, 1e41))

def plotSomeMagic(C, fig, ax, handles=None):
    if not handles:
        handle, = ax.loglog(x, x ** C, linestyle='dotted');
        handles = [handle]
        for M in allM:
            baz = foobar(M, C)
            kwargs = {'ls' : 'dashed'} if M == 0x5f3759df else {}
            handle, = ax.loglog(x, np.abs(baz(x)), **kwargs)
            handles.append(handle)
    else:
        handles[0].set_data(x, x ** C)
        baz = foobar(allM[0], C)
        handles[1].set_data(x, np.abs(baz(x)))
        baz = foobar(allM[1], C)
        handles[2].set_data(x, np.abs(baz(x)))
        baz = foobar(allM[2], C)
        handles[3].set_data(x, np.abs(baz(x)))
    ax.set_title(r'For slope C = $%s$, ${\rm foobar}_{M,%s}(x)$' % (C, C))
    ax.legend(
        handles, 
        [
            r'$x^{%s}$' % C, 
            r'$M = 2^{26}$', 
            r'$M = 2^{28}$',
            r'$M = {\rm 0x5f3759df}$'
        ], loc=4)
    return tuple(handles)

handles = plotSomeMagic(0, fig, ax)

# initialization function: plot the background of each frame
def init():
    return plotSomeMagic(1, fig, ax, handles)

# animation function.  This is called sequentially
def animate(i):
    return plotSomeMagic(i, fig, ax, handles)

hide_code_in_slideshow()

video = animation.FuncAnimation(fig, animate, init_func=init, frames=np.arange(-2,8,0.10), interval=100, blit=True)

plt.close();

video

[Double Click for Code]

/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/numpy/lib/function_base.py:2276: RuntimeWarning: invalid value encountered in ? (vectorized)
  outputs = ufunc(*inputs)
/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/matplotlib/scale.py:93: RuntimeWarning: invalid value encountered in less_equal
  mask = a <= 0.0

Out[171]:

Graphing Calculator Woes¶

This clearly suggests that all of the lines have the same slopes since they are all parallel.

We only varied the $C$ with time, which changes the slopes of the lines.

We can also definitely see the lines shift vertically over time as well, with the exception of $x^C$

Also, the distances between the solid lines do not change even though their slopes do.

Throwback to the $8^{th}$ Grade¶

If you think back to Algebra 1, you should remember that the equation for a line is just

$$ w = a + b \cdot z $$

Here, we can substitute $w = \log(y)$, $z = \log(x)$, and $b = C$ to get it log-log equation:

$$ \log(y) = a + C \cdot \log(x) $$

In the case of $y = 1 \cdot x^C$, the corresponding log-log equation is

$$ \begin{align*} \log(y) &= \log\left(1 \cdot x^C\right) \\ &= \log(1) + C \cdot \log(x). \end{align*} $$

Since all of the other lines are parallel to the $x^C$ line, we also know, for some unknown constant $\alpha$, that

$$ \log\left(\textsf{foobar}_{M,C}(x)\right) = \alpha + C \cdot \log(x)\\ $$

This confirms our earlier suspicion that the slope $C$ is related to the exponent.

In fact, since the distances between the lines do not change as we vary the slope $C$ in the video, it's clear that the unknown constant $\alpha$ will only depend on the value of our magical constant $M$:

$$ \alpha = \alpha_M $$

Eureka!¶

It means that for every exponent $C$, there exists some constant $M$ such that $\textsf{foobar}_{M,C}(x) \approx x^C$.

This is all we need to find meaningful "magic" constants!

In fact, you can find a magic constant for every exponent $x^C$.¶

Fast Exponentiation [1]¶

We know that for each $C$, there's some magic $M^*$ such that $$ \textsf{foobar}_{M^*,C}(x) = \left(M^* + C \cdot x^\sharp\right)^\flat \approx x^C $$

How do we find $M^*$? We know that for inverse square-roots, $M^*$ is around 0x5f3759df.

Answer: More $8^{th}$ grade algebra.

Fast Exponentiation [2]¶

Question: What happens if you send $x = 1$ into $x^C$?

Answer: $1^C = 1$ no matter what $C$ is. (Dude, Spoilers!)

This is the only crucial insight that we need.

Fast Exponentiation [3]¶

So let's use $x = 1$ as a fixed boundary. If $\textsf{foobar}_{M^*,C}(x)$ is supposed to approximate $x^C$, then we should also expect that $\textsf{foobar}_{M^*,C}(x = 1) = 1^C$. $$ \textsf{foobar}_{M^*,C}(1) = \left(M^* + C \cdot 1^\sharp\right)^\flat \approx 1^C = 1 $$

Recall that ${a^\flat}^\sharp = a$, so let's raise both sides by half-tone. $$ {\left(M^* + C \cdot 1^\sharp\right)^\flat}^\sharp = M^* + C \cdot 1^\sharp \approx 1^\sharp $$

Unfortunately, $1^\sharp \ne 1$, but we can subtract the $C \cdot 1^\sharp$ from both sides to get $$ \begin{align*} \left(M^* + C \cdot 1^\sharp\right) - C \cdot 1^\sharp &\approx \left(1^\sharp\right) - C \cdot 1^\sharp \\ M^* &\approx \boxed{(1 - C) \cdot 1^\sharp} \end{align*} $$

Fast Exponentiations [Fin]¶

This seems to suggest that $\textsf{foobar}_{(1 - C)1^\sharp,C}(x) \approx x^C$. How close is this to the truth?

In [218]:

# What is 1#?
display(Latex(r'Just $1^\sharp = \textsf{%s}$.' % hex(sharp(1))))

# What about inverse square-root?
display(Latex(r'For the inverse square-root, its magical constant should be \
  $$\left(1 - \frac{-1}{2}\right)1^\sharp = \textsf{%s}$$' 
              % hex(3 * sharp(1) // 2)))

hide_code_in_slideshow()

Just $1^\sharp = \textsf{0x3f800000}$.

For the inverse square-root, its magical constant should be $$\left(1 - \frac{-1}{2}\right)1^\sharp = \textsf{0x5f400000}$$

[Double Click for Code]

Hmm, weren't we expecting 0x5f3759df instead of 0x5f400000?

It turns out that magic constants that are close to each other have nearly identicle behavior. Here, 0x5f3759df is only 0.035% away from 0x5f400000! Close enough.

Behold! The Fast Arbitrary Power Function!¶

In [203]:

def qexp(C):
    # (1 - C) * sharp(1) + C * sharp(x)
    return np.vectorize(lambda x: flat((1 - C) * sharp(1) + C * sharp(x)))

#define MAGIC 0x3f800000
float qpow(float x, float exponent) {
  long i  = * ( long * ) &x;                  // floating point bit hack
  i  = (1 - exponent) * MAGIC + exponent * i; // what the fuck? 
  return * ( float * ) &i;
}```

In [211]:

x = np.linspace(1, 1000, 5000)
properties = {
    (0, 0): {'M': allM, 'C': -1},
    (1, 0): {'M': allM, 'C': 2},
    (0, 1): {'M': allM, 'C': 0.3},
    (1, 1): {'M': allM, 'C': -0.6},
}
fig, axarr = plt.subplots(2, 2, figsize=(14,8));
for key, property in properties.items():
    C = property['C']
    handle, = axarr[key].plot(x, x ** C);
    handles = [handle]
    baz = qexp(C)
    handle, = axarr[key].plot(x, baz(x))
    handles.append(handle)
    # axarr[key].set_title(r'For slope C = $%s$, ${\rm foobar}_{M,%s}(x)$' % (C, C))
    axarr[key].legend(
        handles, 
        [
            r'$x^{%s}$' % C, 
            r'$M^* = $ %s' % hex(int(C * sharp(1))),
        ], loc=4)

hide_code_in_slideshow()

[Double Click for Code]

Hey, that actually looks pretty good! But what about the errors?

In [216]:

x = np.linspace(1, 1000, 5000)
properties = {
    (0, 0): {'C': -1},
    (1, 0): {'C': 2},
    (0, 1): {'C': 0.3},
    (1, 1): {'C': -0.6},
}
fig, axarr = plt.subplots(2, 2, figsize=(14,8));
for key, property in properties.items():
    axarr[key].set_ylim(0, 0.5)
    axarr[key].yaxis.set_major_formatter(formatter)
    C = property['C']
    baz = qexp(C)
    handle, = axarr[key].plot(x, np.abs(x ** C - baz(x))/(x ** C));
    axarr[key].set_title(r'Relative error for $x^{%s}$' % C)
    axarr[key].legend(
        [handle], 
        [r'Relative error for $x^{%s}$' % C])

hide_code_in_slideshow()

[Double Click for Code]

An error of around $10\%$? That's like nothing!

You Could Have Invented Fast Inverse Roots Using $8^{th}$ Grade Math.¶

What's the point?¶

What's the Fast Inverse Root Method you ask?¶

Does this actually work?¶

Close Enough!¶

So what's up with the 0x5f3759df¶

As it turns out, yes we can¶

Notation¶

Do Snakes Dream of EDM?¶

Music Theory¶

A Tale of two Variables¶

Foo, meet Bar¶

What do you see?¶

Graphing Calculator Woes¶

Throwback to the $8^{th}$ Grade¶

Eureka!¶

In fact, you can find a magic constant for every exponent $x^C$.¶

Fast Exponentiation [1]¶

Fast Exponentiation [2]¶

Fast Exponentiation [3]¶

Fast Exponentiations [Fin]¶

Behold! The Fast Arbitrary Power Function!¶

So what's up with the `0x5f3759df`¶