An overview of DeepArchitect

DeepArchitect is a framework for representing expressive search spaces over deep architectures and automatically searching over them. We use an example workflow to give a step-by-step explanation of the main aspects of the framework.

If you use this work, please cite: DeepArchitect: Automatically Designing and Training Deep Architectures (TODO: update with arXiv reference; temporary local version here).

Loading the data

We will be working with CIFAR-10. Run the two cells below to download the dataset (this only needs to be done once), load the training, validation, and test splits into memory, and set up standard data augmentation techniques.

Download the data

In [ ]:
import subprocess
subprocess.call(['./download_cifar10.sh'])
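If you want to verify that the download succeeded, a quick optional check is to list the extracted batch files. This is a small sketch; it assumes the script unpacks the data under data/cifar10/cifar-10-batches-py/, which is the path used by the loading cell below.

In [ ]:
import os

# optional check: list the extracted CIFAR-10 batch files
# (the path is assumed to match the data_dir used in the loading cell below).
data_dir = 'data/cifar10/cifar-10-batches-py/'
if os.path.isdir(data_dir):
    print(sorted(os.listdir(data_dir)))
else:
    print('CIFAR-10 not found; run the download cell above first.')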

Load the data

In [ ]:
import darch.datasets as ds

# options for data augmentation (random cropping, random flipping, zero padding)
trans_height = 32
trans_width = 32
p_flip = 0.5
pad_size = 4 
in_d = (trans_height, trans_width, 3)
nclasses = 10

# load cifar-10.
(Xtrain, ytrain, Xval, yval, Xtest, ytest) = ds.load_cifar10(
        data_dir='data/cifar10/cifar-10-batches-py/', 
        flatten=False,
        one_hot=True,
        normalize_range=False,
        whiten_pixels=True,
        border_pad_size=pad_size)

augment_train_fn = ds.get_augment_cifar_data_train(trans_height, trans_width, p_flip)
augment_eval_fn = ds.get_augment_cifar_data_eval(trans_height, trans_width)

# wrap the data into InMemoryDataset objects.
train_dataset = ds.InMemoryDataset(Xtrain, ytrain, True, augment_train_fn)
val_dataset = ds.InMemoryDataset(Xval, yval, False, augment_eval_fn)
test_dataset = ds.InMemoryDataset(Xtest, ytest, False, augment_eval_fn)
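As a quick sanity check, we can look at the shapes of the splits we just loaded. This sketch assumes that load_cifar10 returns NumPy arrays, with image tensors of shape (num_examples, height, width, channels) because flatten=False, and one-hot label matrices because one_hot=True.

In [ ]:
# shapes of the splits returned by load_cifar10 (assumed to be NumPy arrays).
for name, X, y in [('train', Xtrain, ytrain), ('val', Xval, yval), ('test', Xtest, ytest)]:
    print('%s: X %s, y %s' % (name, X.shape, y.shape))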

Creating a search space over models

We propose an extensible and modular language that allows the human expert to compactly represent complex search spaces over architectures and their hyperparameters.

We can construct a search space capturing many structural hyperparameters. In the complex search space example below, we choose whether to use batch normalization before or after ReLU, whether to use dropout, whether to use residual connections, and how many times to repeat each of the different modules. Besides these structural hyperparameters, there are also hyperparameters for the number of filters in the convolutional modules, the size of the filters, the dropout probability, and the scale of the initialization of the parameters.

The code below exemplifies the usage of the model search space specification language. Other types of modules are available in darch/modules.py. We recommend taking a look at the different modules there and thinking about what types of search spaces can be constructed with them.

Run one of the cells below.

A simple search space:

In [ ]:
import numpy as np

from darch.base import *
from darch.modules import *
from darch.initializers import *

conv_initers = [ kaiming2015delving_initializer_conv(1.0) ]
aff_initers = [ invsqrt_size_gaussian_initializer_affine( np.sqrt(2.0) )]

b_search = Concat([
                Conv2D([32, 48, 64, 96, 128], [1, 3, 5], [1], ["SAME"], conv_initers), 
                ReLU(),
                MaxPooling2D([2], [2], ["SAME"]),
                Conv2D([32, 48, 64, 96, 128], [1, 3, 5], [1], ["SAME"], conv_initers), 
                ReLU(),
                MaxPooling2D([2], [2], ["SAME"]),
                Affine([256, 512, 1024], aff_initers),
                ReLU(),
                Dropout([0.25, 0.5, 0.75]),
                Affine([nclasses], aff_initers) 
           ])

A complex search space:

In [ ]:
from darch.base import *
from darch.modules import *
from darch.initializers import *

# the initializers for the parameters (with different gains).
gains = [0.1, 1.0]
conv_initers = [ kaiming2015delving_initializer_conv( g ) for g in gains]
aff_initers = [ xavier_initializer_affine( g ) for g in gains]

# define some auxiliary modules (which represent some auxiliary spaces).
def InnerModule_fn(filter_ns, filter_ls, keep_ps, stride=1):
    return Concat([
                Conv2D(filter_ns, filter_ls, [stride], ["SAME"], conv_initers),
                MaybeSwap_fn( ReLU(), BatchNormalization() ),
                Optional_fn( Dropout(keep_ps) )
            ])

def Module_fn(filter_ns, filter_ls, keep_ps, repeat_ns):
    return RepeatTied(
                Or([
                    InnerModule_fn(filter_ns, filter_ls, keep_ps),
                    Residual( InnerModule_fn(filter_ns, filter_ls, keep_ps) )
                ]), 
            repeat_ns)

# the composite search space.
b_search = Concat([
                InnerModule_fn([48, 64, 80, 96, 112, 128], [3, 5, 7], [0.5, 0.9], stride=2),
                Module_fn([48, 64, 80, 96, 112, 128], [3, 5], [0.5, 0.9], [1, 2, 4, 8, 16, 32]),
                InnerModule_fn([48, 64, 80, 96, 112, 128], [3, 5, 7], [0.5, 0.9], stride=2),
                Module_fn([96, 128, 160, 192, 224, 256], [3, 5], [0.5, 0.9], [1, 2, 4, 8, 16, 32]),
                Affine([nclasses], aff_initers)
            ])

Instantiating the searcher

The definition of the search space only tells us the set of models that we will be considering, but it does not determine how the search space is going to be explored. This is the role of the model search algorithm.

For this example, we will use a simple random searcher. Other model search algorithms are defined in darch/searchers.py. In our paper, we describe different model search algorithms and present experimental results.

In [ ]:
import darch.searchers as srch

searcher = srch.RandomSearcher(b_search, in_d)

Instantiating the evaluator

The model evaluation algorithm determines how each of the different models in the search space is going to be evaluated.

This example uses a model evaluation algorithm that is specific to classification problems. Similar evaluators could be defined for other problems. Later in this notebook, we will look at an evaluator that has hyperparameters that will be searched jointly with the architecture of the model.

In [ ]:
import darch.evaluators as ev

evaluator = ev.ClassifierEvaluator(train_dataset=train_dataset,
                                   val_dataset=val_dataset,
                                   in_d=in_d,
                                   nclasses=nclasses,
                                   training_epochs_max=int(1e6),
                                   time_minutes_max=60, ###
                                   display_step=1,
                                   stop_patience=256, ###
                                   rate_patience=8, ###
                                   batch_patience=int(1e6),
                                   save_patience=2, 
                                   rate_mult=0.66, ###
                                   optimizer_type='adam', ###
                                   learning_rate_init=2.5e-4, ###
                                   learning_rate_min=1e-9, ###
                                   batch_size_init=64,
                                   model_path='out/model.ckpt',
                                   output_to_terminal=True)

Searching the search space

Now that we have instantiated the search space, the searcher, and the evaluator, we can search over models. Let us use the searcher and the evaluator to evaluate 3 models from the search space.

In [ ]:
num_models = 3

# printing a textual representation for the search space.
from pprint import pprint
print "*** Search space ***"
pprint( b_search.repr_program() , width=40, indent=2)
print 

# running the searcher and evaluator.
# prints a textual representation for the model sampled 
# from the space and logs the training process.
print "*** Models evaluated ***"
(scores, choice_hists) = srch.run_random_searcher(evaluator, 
                                                  searcher, 
                                                  num_models, 
                                                  output_to_terminal=True)
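run_random_searcher returns the scores of the evaluated models together with the corresponding sequences of choices made in the search space. Below is a minimal sketch for picking out the best model found, assuming scores is a list of validation performances (higher is better) aligned with choice_hists:

In [ ]:
import numpy as np

# index of the best-scoring model among the ones evaluated.
best_idx = int(np.argmax(scores))
print('best validation score: %s' % scores[best_idx])
print('choices made for the best model: %s' % (choice_hists[best_idx],))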

Searching over models and evaluation hyperparameters

The hyperparameters of the previous evaluator (e.g., the optimizer and the learning rate schedule) were fixed. These hyperparameters can have a large impact on performance, and often the human expert does not know good values for them a priori. In this example, we will see how we can jointly search over model architectures and evaluation hyperparameters.

Adding the hyperparameters for the evaluation algorithm to the search space

In [ ]:
import numpy as np

b_hps = UserHyperparams(['optimizer_type',
                         'learning_rate_init',
                         'rate_mult',
                         'rate_patience',
                         'stop_patience',
                         'learning_rate_min'],
                        [['adam', 'sgd_mom'],
                         list( np.logspace(-2, -7, num=32) ),
                         list( np.logspace(-2, np.log10(0.9), num=8) ),
                         range(4, 33, 4),
                         [64],
                         [1e-9]])

# concatenating with the previously defined search space
b_search_with_hps = Concat([b_hps, b_search])

Defining a custom evaluator that extracts the values for the evaluator hyperparameters, instantiates the evaluator, and evaluates the model

In [ ]:
import darch.evaluators as ev

# NOTE that this evaluator relies on the specific structure of the search space
# to extract the values of the hyperparameters.
class CustomEvaluator:
    """Custom evaluator whose performance depends on the values of certain
    hyperparameters specified in the hyperparameter module. Hyperparameters that 
    we do not expect to set this way will take default values.
    """

    def __init__(self, train_dataset, val_dataset, test_dataset, in_d, nclasses, 
            max_minutes_per_model, model_path, output_to_terminal):
        self.train_dataset = train_dataset
        self.val_dataset = val_dataset
        self.test_dataset = test_dataset
        self.in_d = in_d
        self.nclasses = nclasses
        self.max_minutes_per_model = max_minutes_per_model
        self.model_path = model_path
        self.output_to_terminal = output_to_terminal
        
    def eval_model(self, b):
        """Extract parameters from a UserHyperparams module and uses then to 
        udpate the values of certain hyperparameters of the evaluator. This 
        code is still very much based on ClassifierEvaluator.
        """
        
        # extraction of the evaluator hyperparameters (NOTE: slight violation of encapsulation).
        b_hp, b_search = b.bs
        b_hp.compile(None, None, None)
        order = b_hp.scope.s['UserHyperparams-0']['hyperp_names']
        vals = b_hp.scope.s['UserHyperparams-0']['hyperp_vals']
        hps = dict(zip(order, vals))

        evaluator = ev.ClassifierEvaluator(train_dataset=self.train_dataset,
                                        val_dataset=self.val_dataset,
                                        test_dataset=self.test_dataset,
                                        in_d=self.in_d,
                                        nclasses=self.nclasses,
                                        training_epochs_max=int(1e6),
                                        time_minutes_max=self.max_minutes_per_model,
                                        display_step=1,
                                        stop_patience=hps['stop_patience'], ###
                                        rate_patience=hps['rate_patience'], ###
                                        batch_patience=int(1e6),
                                        save_patience=2, 
                                        rate_mult=hps['rate_mult'], ###
                                        optimizer_type=hps['optimizer_type'], ###
                                        learning_rate_init=hps['learning_rate_init'], ###
                                        learning_rate_min=hps['learning_rate_min'], ###
                                        batch_size_init=64,
                                        model_path=self.model_path,
                                        output_to_terminal=self.output_to_terminal)
        return evaluator.eval_model(b_search)

Once the custom evaluator is defined, searching the search space is done in the same way as before

In [ ]:
ctm_num_models = 3

import darch.searchers as srch

# instantiate the evaluator.
ctm_evaluator = CustomEvaluator(train_dataset=train_dataset,
                                val_dataset=val_dataset,
                                test_dataset=None,
                                in_d=in_d,
                                nclasses=nclasses,
                                max_minutes_per_model=60, ###
                                model_path='out/model_custom.ckpt',
                                output_to_terminal=True)

# instantiate the searcher.
ctm_searcher = srch.RandomSearcher(b_search_with_hps, in_d)

# printing a textual representation for the search space.
from pprint import pprint
print "*** Search space ***"
pprint( b_search_with_hps.repr_program() , width=40, indent=2)
print 

# using the searcher and evaluator to explore the space.
print "*** Models evaluated ***"
(ctm_scores, ctm_choice_hists) = srch.run_random_searcher(ctm_evaluator, 
                                                          ctm_searcher, 
                                                          ctm_num_models, 
                                                          output_to_terminal=True)
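As before, we can inspect the outcome of the joint search. A small sketch under the same assumption that ctm_scores holds the validation performances of the evaluated models and is aligned with ctm_choice_hists:

In [ ]:
import numpy as np

# best model found when searching jointly over architectures and evaluator hyperparameters.
ctm_best_idx = int(np.argmax(ctm_scores))
print('best validation score (joint search): %s' % ctm_scores[ctm_best_idx])
print('choices made for the best model: %s' % (ctm_choice_hists[ctm_best_idx],))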

Summary

In this tutorial we have seen how DeepArchitect allows us to easily set up a search space over models and to automatically search over and evaluate them. The three fundamental components of the framework are the model search space specification language, the model search algorithm, and the model evaluation algorithm. Any of these components can be modified or extended while keeping the others fixed.

To explore the code further, we suggest the following:

  • Understand the module interface, e.g., look at BasicModule in darch/modules.py.
  • Understand how the module interface is used by the searcher to traverse the search space, e.g., look at RandomSearcher in darch/searchers.py.
  • Understand how an evaluator uses the module interface to compile a fully specified model to a computational graph and evaluate it, e.g., look at ClassifierEvaluator in darch/evaluators.py.
  • Implement new modules capturing interesting structural transformations that you want to define. Look at some instructive implementations such as BasicModule, Affine, Conv2D, Or, Repeat, and Residual.
  • Implement new modules with complex wiring structure (but perhaps still single-input single-output). Residual is an instructive example to look at.
  • Implement new evaluators for tasks of interest, e.g., reinforcement learning.

The extension of DeepArchitect to multiple-input multiple-output modules can be done along the same lines as the single-input single-output case. The main differences are that the search space needs to be traversed slightly differently and that compilation requires slightly more bookkeeping. Nonetheless, the same module interface is valid, and the insights described in the paper carry over unchanged or with slight adaptations.

In the near future, we intend to implement the multiple-input multiple-output extension and provide support for PyTorch.

If you use or extend DeepArchitect, cite: DeepArchitect: Automatically Designing and Training Deep Architectures (TODO: update with arXiv reference).

Want to contribute or have ideas?

Reach out to negrinho at cs dot cmu dot edu.