Local Search in Discrete State and Action Spaces

For some search problems, the path to reach the goal is not important. Discovering a goal state or a state with the best value of some objective function is the important part. Such problems are often called optimization problems.

What are some examples?

What is the best setting for a control on your stove to cook your soup as quickly as possible without burning it?

What is the best way to pack the words of a word cloud, or a wordle? See an answer here.

What is the best way to divide up the computations of a matrix calculation to be distributed across multiple cores?

Local search algorithms operate by maintaining a single state (or a set of pretty good states) and generating another state to try based on this limited memory. A set of actions is available to transform a state into a "neighboring" state, that is, a neighbor in an abstract graph of states connected by single actions.

Imagine all of the possible states for a problem laid out along a line, along an x-axis. The value of the objective function can be plotted as a height above the points on the x-axis. A local search algorithm tries to find the state with the best height, where best could be the maximal or minimal value depending on the search problem. The best value over all states is the global optimum (a minimum or a maximum depending on the problem).

A local search algorithm is complete if it always finds a goal state. It is optimal if it always finds the optimum state, the one with the global maximum or minimum.

Hill-Climbing Search

This is our first example of a local search algorithm. Imagine you are climbing a mountain and you are in a very thick fog. You can only see a distance equal to one step length. To climb, you take the step in the steepest direction, which takes you to the highest of all the locations you can currently see.

In other words, hill-climbing search simply evaluates the objective function for all states that are neighbors of the current state, and takes the neighbor with the best objective function value as the new current state. If more than one neighbor ties for the best value, one of them is picked randomly.

Hill-climbing search is sometimes called greedy search, because a step is taken after only considering the immediate neighbors. No time is spent considering possible future states.
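
The loop described above can be sketched in a few lines. This is a minimal sketch for a maximizing search, not a definitive implementation: the names hill_climb, neighbors, value and the toy problem are assumptions made up for illustration, and the search stops as soon as no neighbor is strictly better than the current state.

```python
import random

def hill_climb(start, neighbors, value, max_steps=1000):
    """Steepest-ascent hill climbing for a maximizing search."""
    current = start
    for _ in range(max_steps):
        candidates = neighbors(current)
        if not candidates:
            return current
        best_value = max(value(s) for s in candidates)
        if best_value <= value(current):
            # No neighbor is strictly better: local optimum or plateau.
            return current
        # Break ties among equally good neighbors at random.
        best = [s for s in candidates if value(s) == best_value]
        current = random.choice(best)
    return current

# Hypothetical toy problem: integer states 0..20, actions are +1 and -1.
def toy_neighbors(x):
    return [n for n in (x - 1, x + 1) if 0 <= n <= 20]

def toy_value(x):
    return -(x - 13) ** 2  # single peak at x = 13

print(hill_climb(0, toy_neighbors, toy_value))
```

Because the toy objective has a single peak, the search reaches it from any start. With several peaks, the same code would stop on whichever peak is uphill from the start state.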

Hill-climbing is easy to formulate and implement, and it often finds pretty good states quickly. But it has the following problems:

  • it gets stuck on local optima (hills for maximizing searches, valleys for minimizing searches),
  • it may get stuck on a ridge, if no single action can advance the search along the ridge,
  • it may get stuck wandering on a plateau for which all neighboring states have equal value.

Common variations include

  • sideways moves: allow moves to neighbors of equal value (when on a plateau),
  • stochastic hill-climbing: choose the next state with probability related to the increase in value of the objective function,
  • first-choice hill-climbing: generate neighbors by random choice of available actions and keep the first state that has a better value,
  • random-restart hill-climbing: conduct multiple hill-climbing searches from multiple, randomly generated, initial states.

Only the last variation, random-restart hill-climbing, is complete. In the limit, every state will be tried as a starting state, so a goal, or best state, will eventually be found.

The Eight-Queens Problem

Place eight queens on a chess board so that no queen is attacking another. If each column holds exactly one queen, each queen can be placed in any of the 8 rows of its column, for a total of $8^8 \approx 17$ million states. Actions are moving a single queen to a different row in the same column. The objective function is the number of pairs of queens attacking each other. This function is to be minimized, of course. See this animation and this nice explanation.

Hill-climbing search as described only finds solutions 14% of the time, but solves those instances quickly, in an average of 4 steps. When it gets stuck this is discovered in an average of 3 steps.

| Hill-climbing Variation | Percent Solved | Steps to Find Solution | Steps to Know Failure |
|---|---|---|---|
| basic | 14% | 4 | 3 |
| with sideways moves | 94% | 21 | 64 |
| with restarts | 100% | 22 | |
| with sideways moves and restarts | 100% | 28 | |
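
To make the eight-queens formulation concrete, here is a hedged sketch of the state representation, the objective function, and basic hill-climbing with random restarts. All function names are made up for illustration. A state is assumed to be a tuple in which entry c is the row of the queen in column c, and ties among best neighbors are broken by taking the first one rather than randomly.

```python
import random

N = 8

def attacking_pairs(state):
    """Number of pairs of queens attacking each other (to be minimized)."""
    count = 0
    for c1 in range(N):
        for c2 in range(c1 + 1, N):
            same_row = state[c1] == state[c2]
            same_diagonal = abs(state[c1] - state[c2]) == c2 - c1
            if same_row or same_diagonal:
                count += 1
    return count

def queen_neighbors(state):
    """All 8 * 7 = 56 states reached by moving one queen within its column."""
    result = []
    for col in range(N):
        for row in range(N):
            if row != state[col]:
                result.append(state[:col] + (row,) + state[col + 1:])
    return result

def hill_climb_queens(state):
    """Basic hill climbing: stop when no neighbor has fewer attacking pairs."""
    while True:
        best = min(queen_neighbors(state), key=attacking_pairs)
        if attacking_pairs(best) >= attacking_pairs(state):
            return state
        state = best

def random_restart_queens(rng=random):
    """Restart from random states until a state with no attacks is found."""
    restarts = 0
    while True:
        start = tuple(rng.randrange(N) for _ in range(N))
        final = hill_climb_queens(start)
        if attacking_pairs(final) == 0:
            return final, restarts
        restarts += 1

solution, restarts = random_restart_queens()
print(solution, 'found after', restarts, 'restarts')
```

The number of restarts varies from run to run, since each start state is drawn at random.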

Hard Problems

Problems that are hard for hill-climbing are those with many local optima. NP-hard problems often have an exponential number of local optima, but states with pretty good value can often be found with a small number of restarts.

Simulated Annealing

Hill-climbing searches will get stuck on local optima. Only by adding random restarts can you have a hill-climbing algorithm that is complete.

To get off of a local optimum, a search must be defined to allow steps that are "downhill" for maximizing searches, and "uphill" for minimizing searches, away from the optimum.

Simulated annealing is an algorithm that does this probabilistically. Assume we are doing a maximizing search, meaning we want to find the state with the maximum value. Let the value of the current state be $v$. Imagine an action has been applied to that state and the resulting state has a lower (worse) value $v'$. Simulated annealing will accept this new state as the current state with probability $e^{(v' - v)/T}$; a new state with a higher value is always accepted. $T$ is like a "temperature": the higher its value, the more likely we are to take a step to a state with a worse value. In practice, $T$ starts at a high value and is slowly decreased towards zero. If it is decreased "slowly enough", the global optimum will be found with probability approaching 1. In this sense, the algorithm is complete.

In [2]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

def probOfAcceptance(dE, T):
    r = np.exp(dE/T)
    r[r>1] = 1.0
    return r
    
dE = np.linspace(-5, 5, num=100)

plt.figure(figsize=(10,7))
plt.clf()
legendText = []
for T in [0.1, 1, 10, 100]:
    plt.plot(dE, probOfAcceptance(dE,T), 'o-')
    legendText.append('Temperature = {:.1f}'.format(T))
plt.xlabel('New Value - Current Value')
plt.ylabel('Probability of Accepting New State')
plt.legend(legendText,loc='lower right');
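
The acceptance rule plotted above can be placed inside a complete search loop. The following is a sketch under stated assumptions, not a definitive implementation: the geometric cooling schedule, the parameter names (T0, cooling, T_min), the toy objective, and the choice to return the best state seen so far are all assumptions added for illustration.

```python
import math
import random

def simulated_annealing(start, random_neighbor, value,
                        T0=10.0, cooling=0.995, T_min=1e-3, rng=random):
    """Maximizing search that sometimes accepts worse states."""
    current = best = start
    T = T0
    while T > T_min:
        candidate = random_neighbor(current, rng)
        dE = value(candidate) - value(current)
        # Better states are always accepted; worse ones with probability e^(dE/T).
        if dE >= 0 or rng.random() < math.exp(dE / T):
            current = candidate
        if value(current) > value(best):
            best = current  # remember the best state seen (an added convenience)
        T *= cooling        # assumed schedule: geometric cooling
    return best

# Hypothetical toy problem: integer states 0..100 with several local maxima.
def bumpy_value(x):
    return 10 * math.sin(x / 5.0) + 0.1 * x

def step_neighbor(x, rng):
    return min(100, max(0, x + rng.choice((-1, 1))))

rng = random.Random(0)
best = simulated_annealing(0, step_neighbor, bumpy_value, rng=rng)
print(best, bumpy_value(best))
```

At high temperature the search wanders almost freely. As $T$ shrinks, worse moves become rare and the loop behaves more and more like hill-climbing.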

Local Beam Search

The above searches keep just one state in memory. Local beam search keeps the best $k$ states in memory. The successors of all $k$ states are generated, and the best $k$ among them are kept.

A variant is stochastic beam search that selects the $k$ states to keep as a probabilistic function of their values. This search tends to maintain more diversity in the set of kept $k$ states on each iteration.
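
A minimal sketch of the deterministic version follows. The function and parameter names are hypothetical, states are assumed to be hashable, and the stopping rule (quit when no successor beats the best state already in the beam) is an assumption rather than part of the description above.

```python
import heapq

def local_beam_search(starts, neighbors, value, iterations=100):
    """Keep the k best successors of the current k states (maximizing)."""
    k = len(starts)
    beam = list(starts)
    for _ in range(iterations):
        # Pool the successors of all k states, dropping duplicates.
        successors = {s for state in beam for s in neighbors(state)}
        if not successors:
            break
        new_beam = heapq.nlargest(k, successors, key=value)
        if value(new_beam[0]) <= max(value(s) for s in beam):
            break  # no successor improves on the best state in the beam
        beam = new_beam
    return max(beam, key=value)

# Hypothetical toy problem: integer states 0..20, actions are +1 and -1.
def toy_neighbors(x):
    return [n for n in (x - 1, x + 1) if 0 <= n <= 20]

def toy_value(x):
    return -(x - 13) ** 2  # single peak at x = 13

print(local_beam_search([0, 5, 20], toy_neighbors, toy_value))
```

Note that the $k$ states are not $k$ independent searches: successors are pooled, so the beam quickly concentrates around the most promising states.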

Genetic Algorithms

Genetic algorithms are a lot like stochastic beam search. A set of $k$ states is kept. The difference is in how successor states are generated.

Genetic algorithms generate successor states by combining parts of good states to make new ones (the crossover operator) and by randomly modifying parts (the mutation operator). Then the values of the new ones are used to select the best to keep for the next iteration, or generation.

Many variations exist in how states are represented as strings of symbols, in the operators used, and in the way winners are selected for the next generation.
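
One of these many variations can be sketched as follows. Everything specific here is an assumption chosen for illustration: states are tuples of bits, parents are picked uniformly at random, crossover is single-point, mutation flips each bit independently, and the best $k$ of parents plus children survive.

```python
import random

def genetic_algorithm(population, value, generations=50,
                      mutation_rate=0.05, rng=random):
    """Maximize value over bit-tuples with crossover and mutation."""
    k = len(population)
    for _ in range(generations):
        children = []
        for _ in range(k):
            mom, dad = rng.sample(population, 2)
            cut = rng.randrange(1, len(mom))      # single-point crossover
            child = mom[:cut] + dad[cut:]
            # Mutation: flip each bit with a small probability.
            child = tuple(1 - bit if rng.random() < mutation_rate else bit
                          for bit in child)
            children.append(child)
        # Selection: keep the best k among parents and children.
        population = sorted(population + children, key=value, reverse=True)[:k]
    return population[0]

# Hypothetical toy objective: the number of 1 bits in a 20-bit string.
rng = random.Random(0)
initial = [tuple(rng.randrange(2) for _ in range(20)) for _ in range(10)]
best = genetic_algorithm(initial, sum, rng=rng)
print(best, sum(best))
```

Because parents compete with their children for survival, the best value in the population never decreases from one generation to the next in this particular variation.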

Local Search in Continuous State and Action Spaces

Without Derivatives of Value Function

For a monotonic function of a single variable, the input that produces a desired output value can be found using binary search.

Here is a useful application of binary search.

Effective Branching Factor

How can we compare the efficiency of different search algorithms, or of the A* algorithm using different heuristic functions? We can use the computation time and the maximum amount of memory used during the search.

It would also be nice to know how "focused" a search is. If we use a perfect heuristic function, we should get a search that won't explore any nodes that do not lie on an optimal search path.

A measure related to this is the "effective branching factor". The branching factor of a tree is the number of children at each node. If this is not the same for all nodes, then we can find an average, or "effective" branching factor.

If we do a breadth-first (or depth-first) search on a tree with one root node, 3 nodes at depth 1, and 6 nodes at depth 2, we will have explored 1 + 3 + 6 = 10 nodes. What is the effective branching factor for this tree? Or, stated mathematically, what is the value of $b$ in

$$ 1 + b + b^2 = 10$$

Now we can solve this for $b$. What should the value be, roughly? Solve it exactly. What do you get?

We can solve this one exactly, but what if the search is 10 levels deep? We will have to do a search (yay!) to find the value of $b$. How? We could pick a whole bunch of values randomly...

How about a binary search? To start a binary search we need to pick a low and high value of $b$ that we know bracket the true value. For a search that explored $n$ nodes to a maximum depth of $d$, what would good low and high values be? Now do a binary search between these two values and for each new guess at $b$, calculate

$$ 1 + b + b^2 + \cdots + b^d$$

and compare the result with the actual number of nodes, $n$. Use the comparison to continue the binary search. Continue until the range of possible values of $b$ is within a desired precision, such as $0.01$. A faster way to calculate the above quantity, valid when $b \neq 1$, is

$$ \frac{1-b^{d+1}}{1-b}$$

How would you derive this?
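
One way to answer this question, sketched for $b \neq 1$: call the sum $S$, multiply it by $b$, and subtract. All but two of the terms cancel.

$$ S = 1 + b + b^2 + \cdots + b^d $$

$$ bS = b + b^2 + \cdots + b^d + b^{d+1} $$

$$ S - bS = 1 - b^{d+1} \quad\Rightarrow\quad S = \frac{1-b^{d+1}}{1-b} $$

When $b = 1$ the closed form is undefined, and the sum is simply $d + 1$.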

To estimate the effective branching factor, you could define a function def ebf(numberNodes, depth, precision=0.01). Then, to use it:

In [3]: ebf(100, 12, 0.01)
Out[3]: 1.3034343719482422

In [4]: ebf(0, 0)
Out[4]: 0

In [5]: ebf(1, 1,)
Out[5]: 0.0078125
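
One possible way to write such a function is sketched below. It follows the description above (binary search on $b$ until the bracket is narrower than the precision) and is not necessarily the implementation that produced the outputs shown, so its results may differ from them in the later digits. The helper nodes_in_tree, the bracket from 0 to numberNodes, and the choice to return 0 when numberNodes or depth is 0 are assumptions.

```python
def nodes_in_tree(b, depth):
    """1 + b + b^2 + ... + b^depth, using the closed form when b != 1."""
    if abs(b - 1.0) < 1e-12:
        return depth + 1.0
    return (1.0 - b ** (depth + 1)) / (1.0 - b)

def ebf(numberNodes, depth, precision=0.01):
    """Estimate the effective branching factor by binary search on b."""
    if numberNodes == 0 or depth == 0:
        return 0.0  # assumed convention for these degenerate cases
    low, high = 0.0, float(numberNodes)  # b cannot exceed the node count
    while high - low > precision:
        mid = (low + high) / 2
        if nodes_in_tree(mid, depth) > numberNodes:
            high = mid   # guess is too large
        else:
            low = mid    # guess is too small (or exact)
    return (low + high) / 2

print(ebf(10, 2))
print(ebf(100, 12, 0.01))
```

For the small tree with 10 nodes and depth 2, the exact answer is the positive root of $b^2 + b - 9 = 0$, which is $(\sqrt{37} - 1)/2 \approx 2.54$, so the first call should print a value within the requested precision of that.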

Run and Twiddle

Take a step in a random direction from current state. If value is higher, keep going in that direction. If value is lower, pick a new random direction. The "smallest" implementation of this algorithm is found in bacteria.
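
A rough sketch of this idea in a two-dimensional continuous state space is shown below. The fixed step length, the names, and the toy objective are assumptions. The sketch also assumes that a step to a lower value is not taken, which is one of several reasonable readings of the description above.

```python
import math
import random

def run_and_twiddle(start, value, step=0.1, n_steps=2000, rng=random):
    """Maximize value(x, y) by running in one direction while it pays off."""
    x, y = start
    current_value = value(x, y)
    angle = rng.uniform(0, 2 * math.pi)
    for _ in range(n_steps):
        nx = x + step * math.cos(angle)
        ny = y + step * math.sin(angle)
        new_value = value(nx, ny)
        if new_value > current_value:
            # Run: the step helped, so take it and keep the same direction.
            x, y, current_value = nx, ny, new_value
        else:
            # Twiddle: the step did not help, so pick a new random direction.
            angle = rng.uniform(0, 2 * math.pi)
    return (x, y), current_value

# Hypothetical toy objective with a single peak at (1, -2).
def peak_value(x, y):
    return -(x - 1) ** 2 - (y + 2) ** 2

state, v = run_and_twiddle((0.0, 0.0), peak_value, rng=random.Random(0))
print(state, v)
```

No derivative is needed here. The only information used is whether the last step made things better or worse.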

Nelder-Mead Simplex Algorithm

This is not in our text. See the Scholarpedia entry for Nelder-Mead. This maintains a set of $k$ good states. To generate one next state to evaluate, the worst of the $k$ states is reflected through the centroid, or mean, of the other $k-1$ states. If the new state is better than the worst of the $k$ states, it replaces the worst state.

Also see the Python documentation for the fmin function in scipy.optimize.

CMA-ES

CMA-ES stands for Covariance Matrix Adaptation Evolution Strategy. A Gaussian distribution is fit to the set of good states and a new state is generated by sampling probabilistically from the Gaussian distribution. The Gaussian distribution is updated when a new good state is found. Recent results show that certain quasi-random methods are computationally more efficient.

With Derivatives of Value Function

When the derivative of the value function with respect to each component of the state can be calculated, then many derivative-based searches are available. If you know the slope at the current state, you know which direction to go. If you also know the second derivative, you can make some intelligent guesses at how far to go.

If the state has multiple components, which is almost always the case, the first derivatives form a gradient vector and the second derivatives form a Hessian matrix.

Let's find the minimum of the function

$$f(x) = 2 x^4 + 3 x^3 + 3$$

Its first derivative is

$$\frac{df(x)}{dx} = 8 x^3 + 9 x^2$$

and its second derivative is

$$\frac{d^2f(x)}{dx^2} = 24 x^2 + 18 x$$
In [31]:
import time
import IPython.display as ipd  # for display and clear_output
import random

def f(x):
    return 2 * x**4 + 3 * x**3 + 3

def df(x): 
    return 8 * x**3 + 9 * x**2

def ddf(x):
    return 24 * x**2 + 18*x

def taylorf(x,dx):
    return f(x) + df(x) * dx + 0.5 * ddf(x) * dx**2

xs = np.linspace(-2,1,num=100)

dxs = np.linspace(-0.5,0.5,num=100)

fig = plt.figure(figsize=(18, 8))

for rep in range(5):
    x = random.uniform(-2, 1)  # first guess at minimum
    for step in range(10):
        time.sleep(1)  # pause 1 second between frames
        plt.clf()
        plt.plot(xs, f(xs))
        plt.grid(True)
        plt.plot(x+dxs, taylorf(x, dxs),'g-',linewidth=5, alpha=0.4)
        plt.plot(x, f(x), 'ro')         
        y0, y1 = plt.ylim()
        plt.plot([x, x], [y0, y1], 'r--')
        if step == 0:
            plt.text(x, 10, 'New first x', color='r', fontsize=40)
        plt.text(x + 0.05, (y0 + y1) * 0.5, str(x), color='r', fontsize=20)
     
        x = x - df(x) / float(ddf(x))
        plt.plot(x, f(x), 'go')
        plt.legend(('$f(x)$', r'$\hat{f}(x)$'))
        
   
        ipd.clear_output(wait=True)
        ipd.display(fig)
    
ipd.clear_output(wait=True)
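
The update x = x - df(x) / ddf(x) in the cell above is a Newton step: it jumps to the stationary point of the second-order Taylor approximation drawn in green. Stripped of the plotting, the search can be sketched as below. The function name newton_minimize, the tolerance, and the guard against zero curvature are assumptions added for illustration.

```python
def df(x):
    return 8 * x**3 + 9 * x**2

def ddf(x):
    return 24 * x**2 + 18 * x

def newton_minimize(x, steps=50, tol=1e-10):
    """Repeat Newton steps on the derivative until the step is tiny."""
    for _ in range(steps):
        curvature = ddf(x)
        if curvature == 0:
            break  # quadratic approximation is flat; no Newton step is defined
        step = df(x) / curvature
        x = x - step
        if abs(step) < tol:
            break
    return x

print(newton_minimize(-2.0))
```

Setting the first derivative to zero gives $x^2 (8x + 9) = 0$, so the stationary points are $x = 0$ and $x = -9/8$. Only $x = -9/8 = -1.125$ is a minimum, and a start at $x = -2$ converges to it. Newton steps look only for points where the derivative is zero, so in general the second derivative at the result should be checked before calling it a minimum.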

Try these

4.1 What kind of search do we get from

  • Local beam search with $k=1$?
  • Local beam search with one initial state and no limit on the number of states retained?
  • Simulated annealing with $T=0$ at all times?
  • Simulated annealing with $T=\infty$?
  • Genetic algorithm with population size $k=1$?
  • Gradient-ascent in a discrete state and action space?
  • Nelder-Mead with $k=2$ in a two-dimensional state space?

On-Line Search Algorithms (from Section 4.5)

For many real-world problems, an agent cannot predict the outcome of applying an action in the current state. It must take the action and observe the new state.

An agent can end up in a dead-end from which it cannot recover. We will ignore this, and assume that a search problem is safely explorable---some goal state is reachable from every reachable state.

An on-line search agent cannot explore the successors of an arbitrary node, but only the successors of the current node. Depth-first search has this same local character, so the authors adapt off-line depth-first search into an on-line search algorithm. The result of each action is stored in a map that associates each state-action pair with the resulting state. The algorithm assumes it is possible to undo, or reverse, each action in order to backtrack.

An on-line version of A* is learning real-time A*, or LRTA*. Instead of maintaining memory of states as in the above depth-first algorithm, LRTA* only maintains a memory of the current best estimate of the cost to reach a goal from each node. It updates this estimate after each action is tried. It assumes it can identify all successors of a state and compare their remaining cost estimates. An interesting aspect is that every untried action is assumed to lead immediately to a goal at the least possible cost, namely the heuristic value of the current state, a feature called optimism under uncertainty.

Learning better estimates of remaining cost based on single steps is at the heart of reinforcement learning, covered in Chapter 21.