Empirical Cumulative Distribution Function on the GPU

In this Jupyter notebook we show how to use the GPU seamlessly in scripting mode with Alea GPU V3 to calculate the empirical cumulative distribution function for a very large sample size. The purpose of this notebook is to demonstrate the new, highly efficient GPU primitives that will come with the next release 3.1 of Alea GPU, such as highly optimized parallel sort, scan, reduce, copy-if, stream compaction, merge, and join.

Background

The empirical cumulative distribution function $\hat{F}_n(t)$ for the samples $x_1, \ldots, x_n$ is defined as

$$\hat{F}_n(t) = \frac{\text{number of elements in the sample} \leq t}{n} = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}_{x_i \leq t}$$

It is an estimator of the true distribution function from which the samples $x_1, \ldots, x_n$ are generated. More details can be found on Wikipedia.
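To make the definition concrete, here is a minimal CPU sketch in F# (an illustration only, not one of the notebook's GPU cells): it sorts the samples once and then evaluates $\hat{F}_n(t)$ by counting, with a binary search, how many sorted samples are at most $t$.

// Minimal CPU sketch for illustration (not one of the notebook's GPU cells):
// sort the samples once, then evaluate F_n(t) by counting, via binary search,
// how many sorted samples are <= t.
let ecdf (samples : float[]) =
    let sorted = Array.sort samples
    let n = float sorted.Length
    fun (t : float) ->
        // find the index of the first element strictly greater than t
        let mutable lo = 0
        let mutable hi = sorted.Length
        while lo < hi do
            let mid = (lo + hi) / 2
            if sorted.[mid] <= t then lo <- mid + 1 else hi <- mid
        float lo / n

// Example: (ecdf [| 0.2; 0.5; 0.9 |]) 0.5 evaluates to 2/3.

On the GPU, the same idea maps onto the parallel sort and search primitives mentioned above.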

Let's Get Started!

Before you continue, you should install the F# kernel for Jupyter.

We use Paket to get the latest beta NuGet packages of Alea GPU. You can run Alea GPU for free on any CUDA-capable GeForce or Quadro GPU. If you want to run it on an enterprise GPU such as a Tesla K80 or the new NVIDIA P100, you need a license.

Unfortunately, as of now (January 2017), it is not possible to run this notebook on Azure Notebooks with server-side execution, because the Azure Notebooks service does not yet provide GPU instances.

In [1]:
#load "Paket.fsx"
Paket.Version [ ("Alea", "3.0.3-beta2") ]
Paket.Package [ "NUnit" ]
Out[1]:
<null>
In [2]:
#load "packages/Alea/Alea.fsx"
#r "packages/Alea/lib/net45/Alea.Parallel.dll" 
#r "packages/NUnit/lib/net45/nunit.framework.dll"
In [3]:
#load "XPlot.Plotly.Paket.fsx"
#load "XPlot.Plotly.fsx"
open XPlot.Plotly
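
With XPlot.Plotly loaded, the ECDF can later be visualized as a line chart. A hedged sketch of such a plot is shown below; the array names `sortedSamples` and `ecdfValues` are placeholders standing in for the arrays the GPU computation produces further on in the notebook.

// Hedged sketch only: `sortedSamples` and `ecdfValues` are placeholder names
// for the sorted sample points and the ECDF values computed later.
let sortedSamples = [| 0.1; 0.4; 0.7; 0.9 |]
let ecdfValues    = [| 0.25; 0.5; 0.75; 1.0 |]

Scatter(x = sortedSamples, y = ecdfValues, mode = "lines")
|> Chart.Plot
|> Chart.WithTitle "Empirical CDF"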