For this tutorial we assume the reader is familiar with Jobman and its jobdispatch
helper script. For more information, see the jobman documentation.
Suppose you have a yaml file describing an experiment for which you'd like to do hyperparameter optimization by random search:
!obj:pylearn2.train.Train {
    dataset: &train !obj:pylearn2.datasets.mnist.MNIST {
        which_set: 'train',
        start: 0,
        stop: 50000
    },
    model: !obj:pylearn2.models.mlp.MLP {
        layers: [
            !obj:pylearn2.models.mlp.Sigmoid {
                layer_name: 'h0',
                dim: 500,
                sparse_init: 15,
            }, !obj:pylearn2.models.mlp.Softmax {
                layer_name: 'y',
                n_classes: 10,
                irange: 0.
            }
        ],
        nvis: 784,
    },
    algorithm: !obj:pylearn2.training_algorithms.sgd.SGD {
        batch_size: 100,
        learning_rate: 1e-3,
        learning_rule: !obj:pylearn2.training_algorithms.learning_rule.Momentum {
            init_momentum: 0.5,
        },
        monitoring_batches: 10,
        monitoring_dataset: *train,
        termination_criterion: !obj:pylearn2.termination_criteria.EpochCounter {
            max_epochs: 1
        },
    },
    save_path: "mlp.pkl",
    save_freq: 5
}
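Before involving Jobman at all, you can sanity-check this yaml by loading and running it directly; this is essentially what Pylearn2's train.py script does. A minimal sketch, assuming the yaml above is saved as mlp.yaml:

# Load the yaml string and run the training loop directly,
# just to verify the experiment description is valid.
from pylearn2.config import yaml_parse

train_obj = yaml_parse.load(open('mlp.yaml').read())
train_obj.main_loop()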
Here's how you can do it using Pylearn2 and Jobman:
1. Adapt your yaml file: replace every hyperparameter value you want to search over with a string substitution statement of the form %(hyperparameter_name)x, just like you'd do for string substitution in Python
2. Write an extraction method that takes a Train object as input and returns results extracted from it as output
3. Use an experiment method that instantiates the Train object, train it and extract results by calling the extraction method on the Train object
4. Write a configuration file describing the yaml template, the hyperparameters and the extraction method
5. Call the jobman executable with the experiment method and the configuration file

Let's now break it down a little.
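Throughout, we'll assume everything lives in a jobman_demo directory laid out roughly like this (the file names are inferred from the commands and warnings shown later in this tutorial):

jobman_demo/
    mlp.yaml    # the yaml template describing the experiment
    mlp.conf    # the jobman configuration file
    utils.py    # log_uniform and results_extractor, defined below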
Very little has to be done: just replace each hyperparameter value you want to search over with a string substitution statement.
For example, if you want to optimize the learning rate and the momentum coefficient, here's what your yaml file would look like:
!obj:pylearn2.train.Train {
    dataset: &train !obj:pylearn2.datasets.mnist.MNIST {
        which_set: 'train',
        start: 0,
        stop: 50000
    },
    model: !obj:pylearn2.models.mlp.MLP {
        layers: [
            !obj:pylearn2.models.mlp.Sigmoid {
                layer_name: 'h0',
                dim: 500,
                sparse_init: 15,
            }, !obj:pylearn2.models.mlp.Softmax {
                layer_name: 'y',
                n_classes: 10,
                irange: 0.
            }
        ],
        nvis: 784,
    },
    algorithm: !obj:pylearn2.training_algorithms.sgd.SGD {
        batch_size: 100,
        learning_rate: %(learning_rate)f,
        learning_rule: !obj:pylearn2.training_algorithms.learning_rule.Momentum {
            init_momentum: %(init_momentum)f,
        },
        monitoring_batches: 10,
        monitoring_dataset: *train,
        termination_criterion: !obj:pylearn2.termination_criteria.EpochCounter {
            max_epochs: 1
        },
    },
    save_path: "mlp.pkl",
    save_freq: 5
}
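The %(...)f placeholders are ordinary Python string substitution. A toy sketch of what happens to the template (not part of the tutorial's files):

# %-substitution with a dict: the same mechanism train_experiment applies
# to the full yaml template above.
template = "learning_rate: %(learning_rate)f, init_momentum: %(init_momentum)f"
print template % {'learning_rate': 1e-3, 'init_momentum': 0.5}
# prints: learning_rate: 0.001000, init_momentum: 0.500000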
Luckily for you, there's already an experiment method built into Pylearn2: pylearn2.scripts.jobman.experiment.train_experiment.
Like all methods compatible with Jobman, it expects two arguments (state and channel) and returns channel.COMPLETE when done. Furthermore, it expects state to contain at least:

- a yaml_template key pointing to the (not yet complete) yaml string describing the experiment,
- a hyper_parameters key pointing to a DD object containing all variable hyperparameters for the experiment, and
- an extract_results key pointing to a string of the module.method form referencing the result extraction method.

Here's the method's implementation:
def train_experiment(state, channel):
    """
    Train a model specified in state, and extract required results.

    This function builds a YAML string from ``state.yaml_template``, taking
    the values of hyper-parameters from ``state.hyper_parameters``, creates
    the corresponding object and trains it (like train.py), then runs the
    function in ``state.extract_results`` on it, and stores the returned
    values into ``state.results``.

    To know how to use this function, you can check the example in tester.py
    (in the same directory).
    """
    yaml_template = state.yaml_template

    # Convert nested DD into nested ydict.
    hyper_parameters = expand(flatten(state.hyper_parameters), dict_type=ydict)

    # This will be the complete yaml string that should be executed
    final_yaml_str = yaml_template % hyper_parameters

    # Instantiate an object from YAML string
    train_obj = pylearn2.config.yaml_parse.load(final_yaml_str)

    try:
        iter(train_obj)
        iterable = True
    except TypeError:
        iterable = False
    if iterable:
        raise NotImplementedError(
            ('Current implementation does not support running multiple '
             'models in one yaml string. Please change the yaml template '
             'and parameters to contain only one single model.'))
    else:
        # print "Executing the model."
        train_obj.main_loop()
        # This line will call a function defined by the user and pass
        # train_obj to it.
        state.results = jobman.tools.resolve(state.extract_results)(train_obj)

    return channel.COMPLETE
It simply builds a dictionary out of state.hyper_parameters and does string substitution on state.yaml_template with it.
It then instantiates the Train object as described in the yaml string and calls its main_loop method.
Finally, when the method returns, it calls the method referenced in the state.extract_results string, passing it the Train object as argument. That method is responsible for extracting any relevant results from the Train object and returning them, either as-is or in a DD object. The return value is stored in state.results.
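To make the contract concrete, here's roughly what a suitable state could look like if you assembled it by hand; normally jobman builds it from the configuration file described below. A sketch:

from jobman.tools import DD

# Hand-assembled equivalent of the state jobman builds from a
# configuration file.
state = DD()
state.yaml_template = open('mlp.yaml').read()
state.hyper_parameters = DD(learning_rate=1e-3, init_momentum=0.5)
state.extract_results = "utils.results_extractor"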
Your extraction method should accept a Train object instance and return either a single value (float, int, str, etc.) or a DD object containing your values.
For the purpose of this tutorial, let's write a simple method which extracts the misclassification rate and the NLL from the model's monitor:
from jobman.tools import DD


def results_extractor(train_obj):
    channels = train_obj.model.monitor.channels
    train_y_misclass = channels['y_misclass'].val_record[-1]
    train_y_nll = channels['y_nll'].val_record[-1]

    return DD(train_y_misclass=train_y_misclass,
              train_y_nll=train_y_nll)
Here we extract the misclassification rate and NLL values at the last training epoch from their respective channels of the model's monitor and return a DD object containing those values.
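If you're unsure which channel names are available, you can list them from a trained Train object (a sketch, assuming train_obj has already run its main loop):

# The monitor's channels attribute is a dictionary keyed by channel name;
# 'y_misclass' and 'y_nll' come from the softmax layer named 'y'.
print sorted(train_obj.model.monitor.channels.keys())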
Your configuration file should contain:

- yaml_template: a yaml string representing your experiment
- hyper_parameters.[name]: the value of the [name] hyperparameter. You must have at least one such item, but you can have as many as you want.
- extract_results: a string of the module.method form referencing the result extraction method to be used

Here's how a configuration file could look for our experiment:
yaml_template:=@__builtin__.open('mlp.yaml').read()
hyper_parameters.learning_rate:=@utils.log_uniform(1e-5, 1e-1)
hyper_parameters.init_momentum:=@utils.log_uniform(0.5, 1.0)
extract_results = "sheldon.code.pylearn2.scripts.jobman.extraction.trivial_extractor"
Notice how we're using the key:=@method statement. This serves two purposes: it lets configuration values be computed by Python code (here, reading the yaml template from mlp.yaml and sampling random hyperparameter values), and since the expression is re-evaluated at every call, every time jobman is called with this configuration file, it'll get different hyperparameters.

For reference, here's utils.log_uniform's implementation:
import numpy


def log_uniform(low, high):
    """
    Generates a number that's uniformly distributed in the log-space between
    `low` and `high`.

    Parameters
    ----------
    low : float
        Lower bound of the randomly generated number
    high : float
        Upper bound of the randomly generated number

    Returns
    -------
    rval : float
        Random number uniformly distributed in the log-space specified by
        `low` and `high`
    """
    log_low = numpy.log(low)
    log_high = numpy.log(high)
    log_rval = numpy.random.uniform(log_low, log_high)
    rval = float(numpy.exp(log_rval))

    return rval
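As a quick illustration (not part of the tutorial's files): each draw is exp(u) with u uniform between log(low) and log(high), so samples spread evenly across the orders of magnitude between the bounds rather than clustering near the upper bound.

import numpy

numpy.random.seed(0)  # only to make this illustration reproducible
print [log_uniform(1e-5, 1e-1) for _ in range(5)]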
Here's how you would call jobman to train your model:
!jobman cmdline -g numbered pylearn2.scripts.jobman.experiment.train_experiment workid_prefix=jobman_demo jobman_demo/mlp.conf
/opt/lisa/os/epd-7.1.2/lib/python2.7/site-packages/scikits/__init__.py:1: UserWarning: Module jobman was already imported from /data/lisa/exp/dumouliv/jobman/__init__.pyc, but /opt/lisa/os/lib64/python2.7/site-packages is being added to sys.path
The working directory is: /data/lisa/exp/dumouliv/Pylearn2/pylearn2/scripts/tutorials/jobman1
/data/lisa/exp/dumouliv/Pylearn2/pylearn2/models/mlp.py:44: UserWarning: MLP changing the recursion limit.
/data/lisa/exp/dumouliv/Pylearn2/pylearn2/space/__init__.py:272: UserWarning: It looks like the <class 'pylearn2.space.CompositeSpace'> subclass of Space does not call the superclass __init__ method. Currently this is a warning. It will become an error on or after 2014-06-17.
Parameter and initial learning rate summary:
    h0_W: 0.000205
    h0_b: 0.000205
    softmax_b: 0.000205
    softmax_W: 0.000205
Compiling sgd_update...
Compiling sgd_update done. Time elapsed: 3.680988 seconds
compiling begin_record_entry...
compiling begin_record_entry done. Time elapsed: 0.094591 seconds
Monitored channels:
    h0_col_norms_max
    ...
    y_misclass
    y_nll
    ...
Compiling accum...
graph size: 116
Compiling accum done. Time elapsed: 1.706080 seconds
Monitoring step:
    Epochs seen: 0
    Batches seen: 0
    Examples seen: 0
    ...
    learning_rate: 0.000205
    momentum: 0.583961
    objective: 2.30258509299
    ...
    y_misclass: 0.903
    y_nll: 2.30258509299
    ...
Time this epoch: 8.862564 seconds
Monitoring step:
    Epochs seen: 1
    Batches seen: 500
    Examples seen: 50000
    ...
    objective: 2.21611916125
    ...
    y_misclass: 0.57
    y_nll: 2.21611916125
    ...
Saving to mlp.pkl...
Saving to mlp.pkl done. Time elapsed: 0.421483 seconds
The experiment returned value is None
jobdispatch

Launching 10 hyperparameter optimization jobs is as easy as:
!jobdispatch --local --repeat_jobs=10 /opt/lisa/os/bin/jobman cmdline -g numbered pylearn2.scripts.jobman.experiment.train_experiment workdir_prefix=jobman_demo/ jobman_demo/mlp.conf
The jobs will be launched on the system: Local
With options: ['repeat_jobs:10', "tasks_filename:['nb0', 'compact']", 'launch_cmd:Local']
We generate the DBI object with 10 command
Fri Jan 10 15:41:21 2014
[DBI] The Log file are under LOGS.NOBACKUP/jobman_cmdline_-g_numbered_pylearn2.scripts.jobman.experiment.train_experiment_workdir_prefix_jobman_demo__jobman_demo_mlp.conf_2014-01-10_15-41-21.548948
[DBI,1/10,Fri Jan 10 15:41:22 2014] /opt/lisa/os/bin/jobman cmdline -g numbered pylearn2.scripts.jobman.experiment.train_experiment workdir_prefix=jobman_demo/ jobman_demo/mlp.conf
...
[DBI,10/10,Fri Jan 10 15:44:29 2014] /opt/lisa/os/bin/jobman cmdline -g numbered pylearn2.scripts.jobman.experiment.train_experiment workdir_prefix=jobman_demo/ jobman_demo/mlp.conf
[DBI,Fri Jan 10 15:44:50 2014] left running: 0/10
[DBI] 10 jobs. finished: 10, running: 0, waiting: 0, init: 0
[DBI] jobs unfinished (starting at 1): []
jobman.tools.find_conf_files

Once all relevant results have been extracted, you'll probably want to find the best set of hyperparameters.
One way to do that is to call jobman.tools.find_conf_files on the directory containing your experiment subdirectories; for every experiment found in that directory and its subdirectories, the method returns a tuple whose second element is a DD object holding that experiment's state. You can then go through that list and quickly extract the best hyperparameters:
import numpy
from jobman import tools


def parse_results(cwd):
    optimal_dd = None
    optimal_measure = numpy.inf
    for tup in tools.find_conf_files(cwd):
        dd = tup[1]
        if 'results.train_y_misclass' in dd:
            if dd['results.train_y_misclass'] < optimal_measure:
                optimal_measure = dd['results.train_y_misclass']
                optimal_dd = dd
    print "Optimal results.train_y_misclass: " + str(optimal_measure)
    for key, value in optimal_dd.items():
        if 'hyper_parameters' in key:
            print key + ": " + str(value)

parse_results("jobman_demo/")
WARNING: jobman_demo/__init__.pyc/current.conf file not found. Skipping it
WARNING: jobman_demo/mlp.conf/current.conf file not found. Skipping it
WARNING: jobman_demo/__init__.py/current.conf file not found. Skipping it
WARNING: jobman_demo/mlp.pkl/current.conf file not found. Skipping it
WARNING: jobman_demo/utils.py/current.conf file not found. Skipping it
WARNING: jobman_demo/utils.pyc/current.conf file not found. Skipping it
WARNING: jobman_demo/mlp.yaml/current.conf file not found. Skipping it
Optimal results.train_y_misclass: 0.217
hyper_parameters.learning_rate: 0.00191878940445
hyper_parameters.init_momentum: 0.782112604517
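From here, you could re-train with the winning hyperparameters by substituting them back into the same template. A sketch, reusing the values printed by parse_results above:

from pylearn2.config import yaml_parse

# Fill the template with the best hyperparameters found by parse_results.
best = {'learning_rate': 0.00191878940445, 'init_momentum': 0.782112604517}
train_obj = yaml_parse.load(open('jobman_demo/mlp.yaml').read() % best)
train_obj.main_loop()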