Code examples from Think Complexity, 2nd edition, Chapter 11

Copyright 2016 Allen Downey, MIT License

In [ ]:

```
from __future__ import print_function, division
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from thinkstats2 import Cdf
from thinkstats2 import RandomSeed
import thinkplot
from matplotlib import rc
rc('animation', html='html5')
```

A genotype is represented by a length-N array of 0s and 1s.

The fitness landscape maps from each location in N-D space to a random fitness.

In [ ]:

```
class FitnessLandscape:

    def __init__(self, N):
        """Create a fitness landscape.

        N: number of dimensions
        """
        self.N = N
        self.set_values()

    def set_values(self):
        self.one_values = np.random.random(self.N)
        self.zero_values = np.random.random(self.N)

    def random_loc(self):
        """Choose a random location."""
        # in NumPy versions prior to 1.11, you might need
        # return np.random.randint(2, size=self.N).astype(np.int8)
        return np.random.randint(2, size=self.N, dtype=np.int8)

    def fitness(self, loc):
        """Evaluates the fitness of a location.

        loc: array of N 0s and 1s

        returns: float fitness
        """
        fs = np.where(loc, self.one_values, self.zero_values)
        return fs.mean()

    def distance(self, loc1, loc2):
        return np.sum(np.logical_xor(loc1, loc2))
```

As an example, here's a 3-D landscape.

In [ ]:

```
fit_land = FitnessLandscape(3)
fit_land.N
```

`one_values` and `zero_values` contain the fitness contributions of having a 1 or 0 at each element of the location array.

In [ ]:

```
fit_land.one_values, fit_land.zero_values
```

The fitness of a location is the mean of its fitness contributions.

In [ ]:

```
loc = fit_land.random_loc()
loc
```

In [ ]:

```
a = np.where(loc, fit_land.one_values, fit_land.zero_values)
a, np.mean(a)
```

`fitness` evaluates the fitness of a location.

In [ ]:

```
loc, fit_land.fitness(loc)
```

`distance` computes the number of bit flips to get from one location to another.

In [ ]:

```
loc1 = fit_land.random_loc()
loc2 = fit_land.random_loc()
print(loc1)
print(loc2)
fit_land.distance(loc1, loc2)
```

It uses `np.logical_xor`:

In [ ]:

```
np.logical_xor(loc1, loc2)
```
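Summing the XOR counts the positions where the two locations differ, so `distance` is the Hamming distance between genotypes. An equivalent formulation, as a small check of my own (not from the book):

```python
import numpy as np

# two example locations that differ in positions 0 and 2
loc1 = np.array([0, 1, 1, 0], dtype=np.int8)
loc2 = np.array([1, 1, 0, 0], dtype=np.int8)

# summing the elementwise XOR counts differing positions...
d_xor = np.sum(np.logical_xor(loc1, loc2))

# ...which is the same as counting unequal elements directly
d_neq = np.count_nonzero(loc1 != loc2)
```

Both expressions yield the same Hamming distance, 2, for this pair.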

Here's the class that represents agents.

In [ ]:

```
class Agent:
    """Represents an agent in an NK model."""

    def __init__(self, loc, fit_land):
        """Create an agent at the given location.

        loc: array of N 0s and 1s
        fit_land: reference to a FitnessLandscape
        """
        self.loc = loc
        self.fit_land = fit_land
        self.fitness = fit_land.fitness(self.loc)

    def copy(self):
        return Agent(self.loc, self.fit_land)
```

Each agent has a location, a reference to a FitnessLandscape, and a fitness.

In [ ]:

```
loc = fit_land.random_loc()
agent = Agent(loc, fit_land)
agent.loc, agent.fitness
```

The `Simulation` class provides methods to run the simulations.

In [ ]:

```
class Simulation:

    def __init__(self, fit_land, agents):
        """Create the simulation.

        fit_land: FitnessLandscape
        agents: sequence of Agents
        """
        self.fit_land = fit_land
        self.agents = np.asarray(agents)
        self.instruments = []

    def add_instrument(self, instrument):
        """Adds an instrument to the list.

        instrument: Instrument object
        """
        self.instruments.append(instrument)

    def plot(self, index, *args, **kwargs):
        """Plot the results from the indicated instrument."""
        self.instruments[index].plot(*args, **kwargs)

    def run(self, num_steps=500):
        """Run the given number of steps.

        num_steps: integer
        """
        # initialize any instruments before starting
        self.update_instruments()

        for _ in range(num_steps):
            self.step()

    def step(self):
        """Simulate a time step and update the instruments."""
        fits = self.get_fitnesses()

        # see who dies
        index_dead = self.choose_dead(fits)
        num_dead = len(index_dead)

        # replace the dead with copies of the living
        replacements = self.choose_replacements(num_dead, fits)
        self.agents[index_dead] = replacements

        # update any instruments
        self.update_instruments()

    def update_instruments(self):
        for instrument in self.instruments:
            instrument.update(self)

    def get_locs(self):
        """Returns a list of agent locations."""
        return [tuple(agent.loc) for agent in self.agents]

    def get_fitnesses(self):
        """Returns an array of agent fitnesses."""
        fits = [agent.fitness for agent in self.agents]
        return np.array(fits)

    def choose_dead(self, ps):
        """Choose which agents die in the next timestep.

        ps: probability of survival for each agent

        returns: indices of the chosen ones
        """
        n = len(self.agents)
        is_dead = np.random.random(n) < 0.1
        index_dead = np.nonzero(is_dead)[0]
        return index_dead

    def choose_replacements(self, n, weights):
        """Choose which agents reproduce in the next timestep.

        n: number of choices
        weights: array of weights

        returns: sequence of Agent objects
        """
        agents = np.random.choice(self.agents, size=n, replace=True)
        replacements = [agent.copy() for agent in agents]
        return replacements
```

We'll use a few functions to create agents. If we want to start with identical agents:

In [ ]:

```
def make_identical_agents(fit_land, num_agents, agent_maker):
    """Make an array of Agents.

    fit_land: FitnessLandscape
    num_agents: integer
    agent_maker: class used to make Agent

    returns: array of Agents
    """
    loc = fit_land.random_loc()
    agents = [agent_maker(loc, fit_land) for _ in range(num_agents)]
    return np.array(agents)
```

Or agents at random locations:

In [ ]:

```
def make_random_agents(fit_land, num_agents, agent_maker):
    """Make an array of Agents.

    fit_land: FitnessLandscape
    num_agents: integer
    agent_maker: class used to make Agent

    returns: array of Agents
    """
    locs = [fit_land.random_loc() for _ in range(num_agents)]
    agents = [agent_maker(loc, fit_land) for loc in locs]
    return np.array(agents)
```

Or one agent at each possible location:

In [ ]:

```
def make_all_agents(fit_land, agent_maker):
    """Make an array of Agents.

    fit_land: FitnessLandscape
    agent_maker: class used to make Agent

    returns: array of Agents
    """
    N = fit_land.N
    xs = np.arange(2**N)
    ys = 2**np.arange(N)[::-1]
    locs = np.bitwise_and.outer(xs, ys) > 0
    agents = [agent_maker(loc, fit_land) for loc in locs]
    return np.array(agents)
```

`make_all_agents` uses the outer product of `bitwise_and`, which is not the most obvious operation. The following cells demonstrate how it works.

We want an array that contains one row for each number from $0$ to $2^N-1$, where each row contains the binary representation of that number.

With `N=4`, here are the numbers.

In [ ]:

```
N=4
xs = np.arange(2**N)
xs
```

To compute the binary representation of each number, we compute the bitwise AND of each number with descending powers of two.

In [ ]:

```
ys = 2**np.arange(N)[::-1]
ys
```

The outer product puts the `xs` down the rows and the `ys` across the columns, then computes the bitwise AND of each `(x, y)` pair at each intersection.

In [ ]:

```
np.bitwise_and.outer(xs, ys) > 0
```

We could convert those booleans to integers, but it's not really necessary.
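If integers are preferred, a cast does the conversion. This is a small sketch of my own, not part of the original code:

```python
import numpy as np

N = 4
xs = np.arange(2**N)
ys = 2**np.arange(N)[::-1]

# cast the boolean result of the outer bitwise AND to small integers
locs = (np.bitwise_and.outer(xs, ys) > 0).astype(np.int8)
```

Each row is now the binary representation of its row index; for example, row 3 is `[0, 0, 1, 1]`.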

Here's how `make_all_agents` works:

In [ ]:

```
fit_land = FitnessLandscape(4)
agents = make_all_agents(fit_land, Agent)

for agent in agents:
    print(agent.loc, agent.fitness)
```

Let's create a fitness landscape and see what the distribution of fitness looks like.

In [ ]:

```
RandomSeed(17)
N = 8
fit_land = FitnessLandscape(N)
agents = make_all_agents(fit_land, Agent)
sim = Simulation(fit_land, agents)
```

`plot_fitnesses` plots the CDF of fitness across the population.

In [ ]:

```
def plot_fitnesses(sim):
    """Plot the CDF of fitnesses.

    sim: Simulation object
    """
    fits = sim.get_fitnesses()
    cdf_fitness = Cdf(fits)
    thinkplot.Cdf(cdf_fitness)
    return np.mean(fits)
```

Initially the distribution is approximately Gaussian, because each fitness is the mean of 8 independent uniformly distributed variates. See the Central Limit Theorem.
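Here's a quick numerical check of that claim (my own sketch, not from the book): the mean of 8 independent Uniform(0, 1) variates has mean 1/2 and standard deviation $\sqrt{1/(12 \cdot 8)} \approx 0.102$.

```python
import numpy as np

np.random.seed(17)

# sample means of N=8 uniform variates, many times over
samples = np.random.random((100000, 8)).mean(axis=1)

sample_mean = samples.mean()         # should be close to 0.5
sample_std = samples.std()           # should be close to the theoretical value
theory_std = np.sqrt(1 / (12 * 8))   # std of the mean of 8 uniforms
```

A histogram of `samples` would look close to Gaussian, which is why the CDF below has the characteristic sigmoid shape.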

In [ ]:

```
plot_fitnesses(sim)
thinkplot.Config(xlabel='Fitness', ylabel='CDF')
```

After one time step, there's not much change.

In [ ]:

```
sim.step()
plot_fitnesses(sim)
thinkplot.Config(xlabel='Fitness', ylabel='CDF')
```

After 100 time steps, we can see that the number of unique values has decreased.

In [ ]:

```
sim.run(100)
plot_fitnesses(sim)
thinkplot.Config(xlabel='Fitness', ylabel='CDF')
```

To measure these changes over the course of the simulations, we'll use Instrument objects.

In [ ]:

```
class Instrument:
    """Computes a metric at each timestep."""

    def __init__(self):
        self.metrics = []

    def update(self, sim):
        """Compute the current metric.

        Appends to self.metrics.

        sim: Simulation object
        """
        # child classes should implement this method
        pass

    def plot(self, **options):
        thinkplot.plot(self.metrics, **options)
```

The `MeanFitness` instrument computes the mean fitness after each time step.

In [ ]:

```
class MeanFitness(Instrument):
    """Computes mean fitness at each timestep."""
    label = 'Mean Fitness'

    def update(self, sim):
        mean = np.nanmean(sim.get_fitnesses())
        self.metrics.append(mean)
```

Here's mean fitness as a function of (simulated) time for a single run.

In [ ]:

```
RandomSeed(17)
N = 8
fit_land = FitnessLandscape(N)
agents = make_all_agents(fit_land, Agent)
sim = Simulation(fit_land, agents)
instrument = MeanFitness()
sim.add_instrument(instrument)
sim.run(500)
sim.plot(0)
```

We can get a better sense of average behavior, and variation around the average, by plotting multiple runs.

In [ ]:

```
def plot_sims(fit_land, agent_maker, sim_maker, instrument_maker, **plot_options):
    """Runs simulations and plots metrics.

    fit_land: FitnessLandscape
    agent_maker: function that makes an array of Agents
    sim_maker: function that makes a Simulation
    instrument_maker: function that makes an instrument
    plot_options: passed along to plot
    """
    plot_options['alpha'] = 0.3

    for _ in range(10):
        agents = agent_maker(fit_land)
        sim = sim_maker(fit_land, agents)
        instrument = instrument_maker()
        sim.add_instrument(instrument)
        sim.run()
        sim.plot(0, **plot_options)

    thinkplot.Config(xlabel='Time', ylabel=instrument.label)
    return sim
```

`agent_maker1` puts one agent at each location.

In [ ]:

```
def agent_maker1(fit_land):
    return make_all_agents(fit_land, Agent)
```

With no differential survival or reproduction, we get a random walk.

In [ ]:

```
RandomSeed(17)
plot_sims(fit_land, agent_maker1, Simulation, MeanFitness, color='blue')
thinkplot.Save('chap11-1', clf=False)
```

We can add differential survival by overriding `choose_dead`:

In [ ]:

```
class SimWithDiffSurvival(Simulation):

    def choose_dead(self, ps):
        """Choose which agents die in the next timestep.

        ps: probability of survival for each agent

        returns: indices of the chosen ones
        """
        n = len(self.agents)
        is_dead = np.random.random(n) > ps
        index_dead = np.nonzero(is_dead)[0]
        return index_dead
```

With differential survival, mean fitness increases and then levels off.

In [ ]:

```
RandomSeed(17)
plot_sims(fit_land, agent_maker1, SimWithDiffSurvival, MeanFitness, color='blue')
thinkplot.Save('chap11-2', clf=False)
```

We can add differential reproduction by overriding `choose_replacements`:

In [ ]:

```
class SimWithDiffReproduction(Simulation):

    def choose_replacements(self, n, weights):
        """Choose which agents reproduce in the next timestep.

        n: number of choices
        weights: array of weights

        returns: sequence of Agent objects
        """
        p = weights / np.sum(weights)
        agents = np.random.choice(self.agents, size=n, replace=True, p=p)
        replacements = [agent.copy() for agent in agents]
        return replacements
```

With differential reproduction (but not survival), mean fitness increases and then levels off.

In [ ]:

```
RandomSeed(17)
plot_sims(fit_land, agent_maker1, SimWithDiffReproduction, MeanFitness, color='blue')
```

**Exercise:** What if you have both? Write a class called `SimWithBoth` that uses the new versions of `choose_dead` and `choose_replacements`. Does mean fitness increase more quickly?

In [ ]:

```
# Solution

class SimWithBoth(Simulation):
    choose_dead = SimWithDiffSurvival.choose_dead
    choose_replacements = SimWithDiffReproduction.choose_replacements
```

In [ ]:

```
# Solution
RandomSeed(17)
plot_sims(fit_land, agent_maker1, SimWithBoth, MeanFitness, color='blue')
None
```

Without mutation, we have no way to add diversity. The number of occupied locations goes down over time.

`OccupiedLocations` is an instrument that counts the number of occupied locations.

In [ ]:

```
class OccupiedLocations(Instrument):
    label = 'Occupied Locations'

    def update(self, sim):
        uniq_agents = len(set(sim.get_locs()))
        self.metrics.append(uniq_agents)
```

Here's what that looks like with no differential survival or reproduction.

In [ ]:

```
RandomSeed(17)
plot_sims(fit_land, agent_maker1, Simulation, OccupiedLocations, color='green')
```

**Exercise:** What effect do differential survival and reproduction have on the number of occupied locations?

In [ ]:

```
# Solution
RandomSeed(17)
plot_sims(fit_land, agent_maker1, SimWithDiffSurvival, OccupiedLocations, color='green')
None
```

In [ ]:

```
# Solution
RandomSeed(17)
plot_sims(fit_land, agent_maker1, SimWithDiffReproduction, OccupiedLocations, color='green')
None
```

In [ ]:

```
# Solution
RandomSeed(17)
plot_sims(fit_land, agent_maker1, SimWithBoth, OccupiedLocations, color='green')
None
```

The model we have so far might explain changes in existing populations, but it doesn't explain increasing diversity or complexity.

Mutation is one way of increasing, or at least maintaining, diversity.

`Mutant` is a kind of agent that overrides `copy`:

In [ ]:

```
class Mutant(Agent):

    prob_mutate = 0.05

    def copy(self):
        if np.random.random() > self.prob_mutate:
            loc = self.loc.copy()
        else:
            direction = np.random.randint(self.fit_land.N)
            loc = self.mutate(direction)
        return Mutant(loc, self.fit_land)

    def mutate(self, direction):
        """Computes the location in the given direction.

        Result differs from the current location along the given axis.

        direction: int index from 0 to N-1

        returns: new array of N 0s and 1s
        """
        new_loc = self.loc.copy()
        new_loc[direction] ^= 1
        return new_loc
```

To test it out, I'll create an agent at a random location.

In [ ]:

```
N = 8
fit_land = FitnessLandscape(N)
loc = fit_land.random_loc()
agent = Mutant(loc, fit_land)
agent.loc
```

If we make 20 copies, we expect about one mutant.
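That expectation is just $n \cdot p$: the number of mutants among 20 copies is Binomial(20, 0.05), with mean 1. A quick check, using my own simulation rather than the classes above:

```python
import numpy as np

np.random.seed(17)
n, p = 20, 0.05   # copies per batch, probability of mutation per copy

# simulate many batches of 20 copies, counting mutants in each batch
counts = np.random.binomial(n, p, size=100000)
mean_mutants = counts.mean()   # should be close to n * p = 1
```
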

In [ ]:

```
for i in range(20):
    copy = agent.copy()
    print(fit_land.distance(agent.loc, copy.loc))
```

`agent_maker2` makes identical agents.

In [ ]:

```
def agent_maker2(fit_land):
    agents = make_identical_agents(fit_land, 100, Mutant)
    return agents
```

If we start with identical mutants, we still see increasing fitness.

In [ ]:

```
RandomSeed(17)
sim = plot_sims(fit_land, agent_maker2, SimWithBoth, MeanFitness, color='blue')
thinkplot.Save('chap11-3', clf=False)
```

And now the number of occupied locations increases, reaching a steady state at about 10.

In [ ]:

```
RandomSeed(17)
sim = plot_sims(fit_land, agent_maker2, SimWithBoth, OccupiedLocations, color='green')
thinkplot.Save('chap11-4', clf=False)
```

In steady state, many agents are at the optimal location, and others are usually just a few mutations away. To quantify that, we can compute the mean distance between all pairs of agents.

The distance between two agents is the number of bit flips to get from one location to another.
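`MeanDistance`, below, uses `np.triu_indices` to enumerate pairs of agents. Here's how it works on a small example of my own: with `k=1` it yields the indices strictly above the diagonal, i.e. each unordered pair exactly once.

```python
import numpy as np

# index pairs (i, j) with i < j for n = 4 items
i1, i2 = np.triu_indices(4, k=1)
pairs = [(int(i), int(j)) for i, j in zip(i1, i2)]
# pairs is [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
```

Without `k=1`, the diagonal pairs `(i, i)` are included as well, so they would have to be filtered out separately.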

In [ ]:

```
class MeanDistance(Instrument):
    """Computes mean distance between pairs at each timestep."""
    label = 'Mean Distance'

    def update(self, sim):
        n = len(sim.agents)
        # indices of all unordered pairs of agents
        i1, i2 = np.triu_indices(n, k=1)
        agents = zip(sim.agents[i1], sim.agents[i2])
        distances = [sim.fit_land.distance(a1.loc, a2.loc)
                     for a1, a2 in agents]
        mean = np.mean(distances)
        self.metrics.append(mean)
```

Mean distance is initially 0, when all agents are identical. It increases as the population migrates toward the optimal location, then settles into a steady state around 1.5.

In [ ]:

```
RandomSeed(17)
fit_land = FitnessLandscape(10)
agents = make_identical_agents(fit_land, 100, Mutant)
sim = SimWithBoth(fit_land, agents)
sim.add_instrument(MeanDistance())
sim.run(500)
sim.plot(0, color='orange')
thinkplot.Config(xlabel='Time', ylabel='Mean distance')
thinkplot.Save('chap11-5', clf=False)
```

One cause of speciation is a change in the landscape. Suppose a population in steady state in one landscape is transported to another landscape. After a period of migration, it would settle around the new equilibrium point, with a small mean distance between agents. The mean distance between the original agents and the migrants is generally much larger.

The following simulation runs 500 steps on one landscape, then switches to a different landscape and resumes the simulation.

In [ ]:

```
RandomSeed(17)
fit_land = FitnessLandscape(10)
agents = make_identical_agents(fit_land, 100, Mutant)
sim = SimWithBoth(fit_land, agents)
sim.add_instrument(MeanFitness())
sim.add_instrument(OccupiedLocations())
sim.add_instrument(MeanDistance())
sim.run(500)
locs_before = sim.get_locs()
fit_land.set_values()
sim.run(500)
locs_after = sim.get_locs()
```

After the switch to a new landscape, mean fitness drops briefly, then the population migrates to the new optimal location, which happens to be higher in this example.

In [ ]:

```
sim.plot(0, color='blue')
plt.axvline(500, color='gray', linewidth=3, alpha=0.5)
thinkplot.Config(xlabel='Time', ylabel='Mean Fitness')
thinkplot.Save('chap11-6', clf=False)
```

The number of occupied locations (sometimes) increases while the population is migrating.

In [ ]:

```
sim.plot(1, color='green')
plt.axvline(500, color='gray', linewidth=3, alpha=0.5)
thinkplot.Config(xlabel='Time', ylabel='Occupied Locations')
```

And the mean distance (sometimes) increases until the population reaches the new steady state.

In [ ]:

```
sim.plot(2, color='orange')
plt.axvline(500, color='gray', linewidth=3, alpha=0.5)
thinkplot.Config(xlabel='Time', ylabel='Mean Distance')
```

The mean distance between clusters is much bigger than the dispersion within clusters, so we can interpret the clusters as distinct species.
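To make "bigger than the dispersion within clusters" concrete, here's a self-contained sketch of my own using the same Hamming distance: two synthetic clusters of points, each point one mutation away from one of two distant centers.

```python
import numpy as np

rng = np.random.default_rng(17)
N = 10

def hamming(a, b):
    return np.sum(np.logical_xor(a, b))

def one_flip_cluster(center, num_points):
    """Points that are each one bit flip away from center."""
    locs = []
    for _ in range(num_points):
        loc = center.copy()
        loc[rng.integers(N)] ^= 1
        locs.append(loc)
    return locs

cluster1 = one_flip_cluster(np.zeros(N, dtype=np.int8), 20)
cluster2 = one_flip_cluster(np.ones(N, dtype=np.int8), 20)

# dispersion within a cluster: mean distance over unordered pairs
i1, i2 = np.triu_indices(20, k=1)
within = np.mean([hamming(cluster1[i], cluster1[j])
                  for i, j in zip(i1, i2)])

# distance between clusters: mean over all cross pairs
between = np.mean([hamming(a, b) for a in cluster1 for b in cluster2])
```

Here `within` is at most 2 (two one-flip neighbors of the same center), while `between` is at least 8, so the clusters are clearly separable.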

In [ ]:

```
distances = []
for loc1 in locs_before:
    for loc2 in locs_after:
        distances.append(fit_land.distance(loc1, loc2))

np.mean(distances)
```

**Exercise:** When we change the landscape, the number of occupied locations and the mean distance sometimes increase, but the effect is not always clear. Try some different random seeds to see how general this behavior is.