Here we give a brief description of where to find the main data sets used in this tutorial. Detailed descriptions of how to work with this data once it has been downloaded are given within the main tutorial content (links given below).
The MNIST handwritten digits data set is arguably the best-known benchmark data set for pattern recognition tasks. The data is available at
http://yann.lecun.com/exdb/mnist/
for anyone with an internet connection. No registration is required.
We assume that the raw data is stored in the data/mnist directory, where it can be downloaded and decompressed as follows.
$ mkdir -p data/mnist
$ cd data/mnist
$ wget http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
$ wget http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
$ wget http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
$ wget http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
$ gunzip *.gz
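As a quick sanity check on the decompressed files, the following is a minimal Python sketch of reading the IDX header, assuming the layout described on the MNIST page: two zero bytes, a type code (0x08 for unsigned bytes), the number of dimensions, and then one big-endian 32-bit size per dimension. The function names parse_idx and load_idx are hypothetical helpers for illustration, not part of the tutorial code.

```python
import struct


def parse_idx(raw: bytes):
    """Parse IDX-format bytes into (shape, payload bytes).

    Sketch only: handles the unsigned-byte type (code 0x08) used by MNIST.
    """
    # Header: two zero bytes, a type code, and the number of dimensions.
    zero, type_code, ndim = struct.unpack(">HBB", raw[:4])
    if zero != 0 or type_code != 0x08:
        raise ValueError("not an unsigned-byte IDX file")
    # Each dimension size is a big-endian 32-bit unsigned integer.
    header_end = 4 + 4 * ndim
    shape = struct.unpack(">" + "I" * ndim, raw[4:header_end])
    expected = 1
    for size in shape:
        expected *= size
    payload = raw[header_end:]
    if len(payload) != expected:
        raise ValueError("payload size does not match the header")
    return shape, payload


def load_idx(path):
    """Read an IDX file from disk (hypothetical convenience wrapper)."""
    with open(path, "rb") as f:
        return parse_idx(f.read())
```

If run from within data/mnist, calling load_idx("train-images-idx3-ubyte") should return a three-dimensional shape (images, rows, columns), and load_idx("train-labels-idx1-ubyte") a one-dimensional shape whose single entry matches the number of images.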
From here, we can get to work.
For the CIFAR-10 data set, let's prepare a sub-directory called cifar10 to store the data. Three versions of the data are available: a Python version, a MATLAB version, and a binary version. While the Python version is perfectly acceptable, here we use the binary version.
$ mkdir -p data/cifar10
$ cd data/cifar10
$ wget http://www.cs.toronto.edu/~kriz/cifar-10-binary.tar.gz
$ tar -xzf cifar-10-binary.tar.gz
This creates a directory cifar-10-batches-bin, with contents as follows:
$ ls cifar-10-batches-bin
batches.meta.txt data_batch_2.bin data_batch_4.bin readme.html
data_batch_1.bin data_batch_3.bin data_batch_5.bin test_batch.bin
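To give a sense of what the .bin files contain, here is a minimal Python sketch of splitting one batch into records. It assumes the binary layout documented on the CIFAR-10 page: each record is one label byte followed by 3072 pixel bytes (1024 red, then 1024 green, then 1024 blue values for a 32x32 image). The name parse_cifar10_batch is a hypothetical helper used only for illustration.

```python
PLANE_BYTES = 32 * 32                 # one colour channel of a 32x32 image
RECORD_BYTES = 1 + 3 * PLANE_BYTES    # label byte plus three channels


def parse_cifar10_batch(raw: bytes):
    """Split raw batch bytes into (label, red, green, blue) tuples."""
    if len(raw) % RECORD_BYTES != 0:
        raise ValueError("length is not a multiple of the record size")
    records = []
    for start in range(0, len(raw), RECORD_BYTES):
        label = raw[start]  # integer class label stored in the first byte
        pixels = raw[start + 1:start + RECORD_BYTES]
        red = pixels[:PLANE_BYTES]
        green = pixels[PLANE_BYTES:2 * PLANE_BYTES]
        blue = pixels[2 * PLANE_BYTES:]
        records.append((label, red, green, blue))
    return records
```

For example, reading cifar-10-batches-bin/data_batch_1.bin in binary mode and passing its contents to parse_cifar10_batch should yield one tuple per image in that batch.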
From here, we can get to work.
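The Iris data set is available from the UCI Machine Learning Repository, and can be prepared as follows.
$ mkdir -p data/iris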
$ cd data/iris
$ wget [ URL ]/bezdekIris.data
$ wget [ URL ]/iris.data
$ wget [ URL ]/iris.names
where [ URL ] is as follows:
https://archive.ics.uci.edu/ml/machine-learning-databases/iris
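As a rough illustration of the file contents, the sketch below parses text in the format of iris.data, assuming each non-empty line holds four comma-separated measurements followed by a class name, and that blank lines may appear at the end of the file. The name parse_iris is a hypothetical helper, not part of the tutorial code.

```python
import csv
import io


def parse_iris(text: str):
    """Parse iris.data-style text into (measurements, species) pairs."""
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        if not fields:  # skip blank lines, e.g. at the end of the file
            continue
        *measurements, species = fields
        rows.append(([float(v) for v in measurements], species))
    return rows


# Small inline sample in the same format as iris.data.
sample = "5.1,3.5,1.4,0.2,Iris-setosa\n7.0,3.2,4.7,1.4,Iris-versicolor\n\n"
rows = parse_iris(sample)
```

With the files downloaded, the same function can be applied to the text read from iris.data or bezdekIris.data.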
From here, we can get to work.
The vim-2 data set, also known as the "Gallant Lab Natural Movie 4T fMRI Data set", is available from the website of Collaborative Research in Computational Neuroscience (CRCNS), at the following URL:
https://crcns.org/data-sets/vc/vim-2
This requires free registration to CRCNS.org, which can be done quickly using their "Request Account" page:
https://crcns.org/request-account
The application is screened, and so it may take a day or two before it is (hopefully) accepted.
If you only need the data on your local machine, then logging in and downloading via your browser is perfectly acceptable. If you are using a remote server for computation, be it your own or some cloud-based solution, it is best to use the download scripts that are provided:
https://crcns.org/download
Under "Batch download method", there is a link (https://portal.nersc.gov/project/crcns/download/tools) to a page which requires input of your username and password. From here, we get access to the download sub-directory within tools. Looking inside tools/download, there are a handful of files, including crcns-download-tools-instuctions, which explains how to set up the configuration file and how to use the download/verification scripts. Setup requires only a few minutes; just follow the lucid instructions and take a break while the files are downloaded.
Once the raw data has been acquired, we assume that it is stored in the data/vim-2 directory under your working directory.
From here, we can get to work.