For this tutorial we assume the reader is familiar with Jobman and its jobdispatch
helper script. For more information, see the jobman documentation.
Suppose you have a yaml file describing an experiment for which you'd like to do hyperparameter optimization by random search:
!obj:pylearn2.train.Train {
    dataset: &train !obj:pylearn2.datasets.mnist.MNIST {
        which_set: 'train',
        start: 0,
        stop: 50000
    },
    model: !obj:pylearn2.models.mlp.MLP {
        layers: [
            !obj:pylearn2.models.mlp.Sigmoid {
                layer_name: 'h0',
                dim: 500,
                sparse_init: 15,
            }, !obj:pylearn2.models.mlp.Softmax {
                layer_name: 'y',
                n_classes: 10,
                irange: 0.
            }
        ],
        nvis: 784,
    },
    algorithm: !obj:pylearn2.training_algorithms.sgd.SGD {
        batch_size: 100,
        learning_rate: 1e-3,
        learning_rule: !obj:pylearn2.training_algorithms.learning_rule.Momentum {
            init_momentum: 0.5,
        },
        monitoring_batches: 10,
        monitoring_dataset: *train,
        termination_criterion: !obj:pylearn2.termination_criteria.EpochCounter {
            max_epochs: 1
        },
    },
    save_path: "mlp.pkl",
    save_freq: 5
}
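Before involving Jobman at all, you can sanity-check this yaml by loading and running it directly; this is essentially what Pylearn2's train.py script does. A minimal sketch, assuming the yaml above is saved as mlp.yaml:

# Load the yaml string and run the training loop directly,
# just to verify the experiment description is valid.
from pylearn2.config import yaml_parse

train_obj = yaml_parse.load(open('mlp.yaml').read())
train_obj.main_loop()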
Here's how you can do it using Pylearn2 and Jobman:
1. Adapt your yaml file: replace every hyperparameter value you want to search over with a string substitution statement of the form %(hyperparameter_name)x, just like you'd do for string substitution in Python
2. Write an extraction method that takes a Train object as input and returns results extracted from it as output
3. Use an experiment method that instantiates the Train object, train it and extract results by calling the extraction method on the Train object
4. Write a configuration file describing the yaml template, the hyperparameters and the extraction method
5. Call the jobman executable with the experiment method and the configuration file

Let's now break it down a little.
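Throughout, we'll assume everything lives in a jobman_demo directory laid out roughly like this (the file names are inferred from the commands and warnings shown later in this tutorial):

jobman_demo/
    mlp.yaml    # the yaml template describing the experiment
    mlp.conf    # the jobman configuration file
    utils.py    # log_uniform and results_extractor, defined below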
Very little has to be done: just replace each hyperparameter value you want to search over with a string substitution statement.
For example, if you want to optimize the learning rate and the momentum coefficient, here's what your yaml file would look like:
!obj:pylearn2.train.Train {
    dataset: &train !obj:pylearn2.datasets.mnist.MNIST {
        which_set: 'train',
        start: 0,
        stop: 50000
    },
    model: !obj:pylearn2.models.mlp.MLP {
        layers: [
            !obj:pylearn2.models.mlp.Sigmoid {
                layer_name: 'h0',
                dim: 500,
                sparse_init: 15,
            }, !obj:pylearn2.models.mlp.Softmax {
                layer_name: 'y',
                n_classes: 10,
                irange: 0.
            }
        ],
        nvis: 784,
    },
    algorithm: !obj:pylearn2.training_algorithms.sgd.SGD {
        batch_size: 100,
        learning_rate: %(learning_rate)f,
        learning_rule: !obj:pylearn2.training_algorithms.learning_rule.Momentum {
            init_momentum: %(init_momentum)f,
        },
        monitoring_batches: 10,
        monitoring_dataset: *train,
        termination_criterion: !obj:pylearn2.termination_criteria.EpochCounter {
            max_epochs: 1
        },
    },
    save_path: "mlp.pkl",
    save_freq: 5
}
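The %(...)f placeholders are ordinary Python string substitution. A toy sketch of what happens to the template (not part of the tutorial's files):

# %-substitution with a dict: the same mechanism train_experiment applies
# to the full yaml template above.
template = "learning_rate: %(learning_rate)f, init_momentum: %(init_momentum)f"
print template % {'learning_rate': 1e-3, 'init_momentum': 0.5}
# prints: learning_rate: 0.001000, init_momentum: 0.500000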
Luckily for you, there's already an experiment method built into Pylearn2: pylearn2.scripts.jobman.experiment.train_experiment.
Like all methods compatible with Jobman, it expects two arguments (state and channel) and returns channel.COMPLETE when done. Furthermore, it expects state to contain at least:

- a yaml_template key pointing to the (not yet complete) yaml string describing the experiment,
- a hyper_parameters key pointing to a DD object containing all variable hyperparameters for the experiment, and
- an extract_results key pointing to a string of the module.method form referencing the result extraction method.

Here's the method's implementation:
def train_experiment(state, channel):
    """
    Train a model specified in state, and extract required results.

    This function builds a YAML string from ``state.yaml_template``, taking
    the values of hyper-parameters from ``state.hyper_parameters``, creates
    the corresponding object and trains it (like train.py), then runs the
    function in ``state.extract_results`` on it, and stores the returned
    values into ``state.results``.

    To know how to use this function, you can check the example in tester.py
    (in the same directory).
    """
    yaml_template = state.yaml_template

    # Convert nested DD into nested ydict.
    hyper_parameters = expand(flatten(state.hyper_parameters), dict_type=ydict)

    # This will be the complete yaml string that should be executed
    final_yaml_str = yaml_template % hyper_parameters

    # Instantiate an object from YAML string
    train_obj = pylearn2.config.yaml_parse.load(final_yaml_str)

    try:
        iter(train_obj)
        iterable = True
    except TypeError:
        iterable = False
    if iterable:
        raise NotImplementedError(
            ('Current implementation does not support running multiple '
             'models in one yaml string. Please change the yaml template '
             'and parameters to contain only one single model.'))
    else:
        # print "Executing the model."
        train_obj.main_loop()
        # This line will call a function defined by the user and pass
        # train_obj to it.
        state.results = jobman.tools.resolve(state.extract_results)(train_obj)

    return channel.COMPLETE
It simply builds a dictionary out of state.hyper_parameters and does string substitution on state.yaml_template with it.
It then instantiates the Train object as described in the yaml string and calls its main_loop method.
Finally, when the method returns, it calls the method referenced in the state.extract_results string, passing it the Train object as argument. That method is responsible for extracting any relevant results from the Train object and returning them, either as-is or in a DD object. The return value is stored in state.results.
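To make the contract concrete, here's roughly what a suitable state could look like if you assembled it by hand; normally jobman builds it from the configuration file described below. A sketch:

from jobman.tools import DD

# Hand-assembled equivalent of the state jobman builds from a
# configuration file.
state = DD()
state.yaml_template = open('mlp.yaml').read()
state.hyper_parameters = DD(learning_rate=1e-3, init_momentum=0.5)
state.extract_results = "utils.results_extractor"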
Your extraction method should accept a Train object instance and return either a single value (float, int, str, etc.) or a DD object containing your values.
For the purpose of this tutorial, let's write a simple method which extracts the misclassification rate and the NLL from the model's monitor:
from jobman.tools import DD


def results_extractor(train_obj):
    channels = train_obj.model.monitor.channels
    train_y_misclass = channels['y_misclass'].val_record[-1]
    train_y_nll = channels['y_nll'].val_record[-1]

    return DD(train_y_misclass=train_y_misclass,
              train_y_nll=train_y_nll)
Here we extract the misclassification rate and NLL values at the last training epoch from their respective channels of the model's monitor and return a DD object containing those values.
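If you're unsure which channel names are available, you can list them from a trained Train object (a sketch, assuming train_obj has already run its main loop):

# The monitor's channels attribute is a dictionary keyed by channel name;
# 'y_misclass' and 'y_nll' come from the softmax layer named 'y'.
print sorted(train_obj.model.monitor.channels.keys())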
Your configuration file should contain:

- yaml_template: a yaml string representing your experiment
- hyper_parameters.[name]: the value of the [name] hyperparameter. You must have at least one such item, but you can have as many as you want.
- extract_results: a string of the module.method form referencing the result extraction method to be used

Here's how a configuration file could look for our experiment:
yaml_template:=@__builtin__.open('mlp.yaml').read()
hyper_parameters.learning_rate:=@utils.log_uniform(1e-5, 1e-1)
hyper_parameters.init_momentum:=@utils.log_uniform(0.5, 1.0)
extract_results = "sheldon.code.pylearn2.scripts.jobman.extraction.trivial_extractor"
Notice how we're using the key:=@method statement. This serves two purposes: it lets configuration values be computed by Python code (here, reading the yaml template from mlp.yaml and sampling random hyperparameter values), and since the expression is re-evaluated at every call, every time jobman is called with this configuration file, it'll get different hyperparameters.

For reference, here's utils.log_uniform's implementation:
import numpy


def log_uniform(low, high):
    """
    Generates a number that's uniformly distributed in the log-space between
    `low` and `high`.

    Parameters
    ----------
    low : float
        Lower bound of the randomly generated number
    high : float
        Upper bound of the randomly generated number

    Returns
    -------
    rval : float
        Random number uniformly distributed in the log-space specified by
        `low` and `high`
    """
    log_low = numpy.log(low)
    log_high = numpy.log(high)
    log_rval = numpy.random.uniform(log_low, log_high)
    rval = float(numpy.exp(log_rval))

    return rval
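As a quick illustration (not part of the tutorial's files): each draw is exp(u) with u uniform between log(low) and log(high), so samples spread evenly across the orders of magnitude between the bounds rather than clustering near the upper bound.

import numpy

numpy.random.seed(0)  # only to make this illustration reproducible
print [log_uniform(1e-5, 1e-1) for _ in range(5)]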
Here's how you would call jobman to train your model:
!jobman cmdline -g numbered pylearn2.scripts.jobman.experiment.train_experiment workid_prefix=jobman_demo jobman_demo/mlp.conf
/opt/lisa/os/epd-7.1.2/lib/python2.7/site-packages/scikits/__init__.py:1: UserWarning: Module jobman was already imported from /data/lisa/exp/dumouliv/jobman/__init__.pyc, but /opt/lisa/os/lib64/python2.7/site-packages is being added to sys.path
The working directory is: /data/lisa/exp/dumouliv/Pylearn2/pylearn2/scripts/tutorials/jobman1
/data/lisa/exp/dumouliv/Pylearn2/pylearn2/models/mlp.py:44: UserWarning: MLP changing the recursion limit.
/data/lisa/exp/dumouliv/Pylearn2/pylearn2/space/__init__.py:272: UserWarning: It looks like the <class 'pylearn2.space.CompositeSpace'> subclass of Space does not call the superclass __init__ method. Currently this is a warning. It will become an error on or after 2014-06-17.
Parameter and initial learning rate summary:
    h0_W: 0.000205
    h0_b: 0.000205
    softmax_b: 0.000205
    softmax_W: 0.000205
Compiling sgd_update...
Compiling sgd_update done. Time elapsed: 3.680988 seconds
compiling begin_record_entry...
compiling begin_record_entry done. Time elapsed: 0.094591 seconds
Monitored channels:
    h0_col_norms_max
    ...
    y_misclass
    y_nll
    ...
Compiling accum...
graph size: 116
Compiling accum done. Time elapsed: 1.706080 seconds
Monitoring step:
    Epochs seen: 0
    Batches seen: 0
    Examples seen: 0
    ...
    learning_rate: 0.000205
    momentum: 0.583961
    objective: 2.30258509299
    ...
    y_misclass: 0.903
    y_nll: 2.30258509299
    ...
Time this epoch: 8.862564 seconds
Monitoring step:
    Epochs seen: 1
    Batches seen: 500
    Examples seen: 50000
    ...
    objective: 2.21611916125
    ...
    y_misclass: 0.57
    y_nll: 2.21611916125
    ...
Saving to mlp.pkl...
Saving to mlp.pkl done. Time elapsed: 0.421483 seconds
The experiment returned value is None
jobdispatch

Launching 10 hyperparameter optimization jobs is as easy as:
!jobdispatch --local --repeat_jobs=10 /opt/lisa/os/bin/jobman cmdline -g numbered pylearn2.scripts.jobman.experiment.train_experiment workdir_prefix=jobman_demo/ jobman_demo/mlp.conf
The jobs will be launched on the system: Local
With options: ['repeat_jobs:10', "tasks_filename:['nb0', 'compact']", 'launch_cmd:Local']
We generate the DBI object with 10 command
Fri Jan 10 15:41:21 2014
[DBI] The Log file are under LOGS.NOBACKUP/jobman_cmdline_-g_numbered_pylearn2.scripts.jobman.experiment.train_experiment_workdir_prefix_jobman_demo__jobman_demo_mlp.conf_2014-01-10_15-41-21.548948
[DBI,1/10,Fri Jan 10 15:41:22 2014] /opt/lisa/os/bin/jobman cmdline -g numbered pylearn2.scripts.jobman.experiment.train_experiment workdir_prefix=jobman_demo/ jobman_demo/mlp.conf
...
[DBI,10/10,Fri Jan 10 15:44:29 2014] /opt/lisa/os/bin/jobman cmdline -g numbered pylearn2.scripts.jobman.experiment.train_experiment workdir_prefix=jobman_demo/ jobman_demo/mlp.conf
[DBI,Fri Jan 10 15:44:50 2014] left running: 0/10
[DBI] 10 jobs. finished: 10, running: 0, waiting: 0, init: 0
[DBI] jobs unfinished (starting at 1): []
jobman.tools.find_conf_files

Once all relevant results have been extracted, you'll probably want to find the best set of hyperparameters.
One way to do that is to call jobman.tools.find_conf_files on the directory containing your experiment subdirectories; for every experiment found in that directory and its subdirectories, the method returns a tuple whose second element is a DD object holding that experiment's state. You can then go through that list and quickly extract the best hyperparameters:
import numpy
from jobman import tools


def parse_results(cwd):
    optimal_dd = None
    optimal_measure = numpy.inf
    for tup in tools.find_conf_files(cwd):
        dd = tup[1]
        if 'results.train_y_misclass' in dd:
            if dd['results.train_y_misclass'] < optimal_measure:
                optimal_measure = dd['results.train_y_misclass']
                optimal_dd = dd
    print "Optimal results.train_y_misclass: " + str(optimal_measure)
    for key, value in optimal_dd.items():
        if 'hyper_parameters' in key:
            print key + ": " + str(value)

parse_results("jobman_demo/")
WARNING: jobman_demo/__init__.pyc/current.conf file not found. Skipping it
WARNING: jobman_demo/mlp.conf/current.conf file not found. Skipping it
WARNING: jobman_demo/__init__.py/current.conf file not found. Skipping it
WARNING: jobman_demo/mlp.pkl/current.conf file not found. Skipping it
WARNING: jobman_demo/utils.py/current.conf file not found. Skipping it
WARNING: jobman_demo/utils.pyc/current.conf file not found. Skipping it
WARNING: jobman_demo/mlp.yaml/current.conf file not found. Skipping it
Optimal results.train_y_misclass: 0.217
hyper_parameters.learning_rate: 0.00191878940445
hyper_parameters.init_momentum: 0.782112604517
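From here, you could re-train with the winning hyperparameters by substituting them back into the same template. A sketch, reusing the values printed by parse_results above:

from pylearn2.config import yaml_parse

# Fill the template with the best hyperparameters found by parse_results.
best = {'learning_rate': 0.00191878940445, 'init_momentum': 0.782112604517}
train_obj = yaml_parse.load(open('jobman_demo/mlp.yaml').read() % best)
train_obj.main_loop()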