#!/usr/bin/env python
# coding: utf-8

# # Wedding Seat Optimization using Simulated Annealing

# *Or what I do when I go back to Singapore for one friend's wedding (congratulations Crystalbel and Kenneth!) and find that all but three of my friends are married / engaged.*
# 
# Turns out they all have a similar problem: wedding dinner planning. In particular, seat planning is a nightmare. It doesn't take much imagination for one to picture one's favorite relative specifying seating arrangements to the most minute detail. ("eh my sister's uncle's brother cannot sit with my brother's cousin because they don't see each other eye to eye.") You probably don't want to seat ex-es together as well. Same goes for ex-colleagues who may be awkward with each other. Whatever the reason is, seating arrangement is tricky.
# 
# Thankfully, I'm not the first person to come across this problem. There's a [very nice paper here by Bellows and Peterson](https://www.improbable.com/news/2012/Optimal-seating-chart.pdf) on this exact problem. This paper provided a neat framework to think about this problem, but aimed for the global optimal via mixed-integer linear optimization. I find that probably too overkill, and an approximate method that sufficient avoids the local minima trap should suffice. The paper also uses a proprietary solver which is a huge turn-off, and no one wants that for a wedding. Heh. Finally, I also wanted to learn some simulated annealing. So here's a very short script that accomplishes that.

# In[37]:


import matplotlib.pyplot as plt
plt.style.use('ggplot')
get_ipython().run_line_magic('matplotlib', 'inline')
get_ipython().run_line_magic('config', "InlineBackend.figure_format = 'svg'")

import numpy as np
import networkx as nx
import string
import pandas as pd


# ## Framing This as a Combinatorial Optimization Problem
# 
# Going back to the problem statement, we can think about this as an combinatorial optimization problem. Essentially, seating preferences can be expressed as a series of weights between two parties. For example, if Linan hates sitting with Spongebob, I can assign the event that they sit together a **positive cost** of 50. If Gabriel and Kristie are a couple, and they would very much want to sit together, then them sitting together would incur a **negative cost** of -50. If Spongebob don't want to sit with Squidward but won't die if they do end up sit together, I can assign them a slightly lower positive cost (e.g. 20).
# 
# Then, my overall goal is to minimize the cost of an arrangement. For example, if I had an arrangement that made Gabriel sit together with Kristie, Linan sit with Spongebob, and Spongebob sit with Squidward, the overall cost would be `-50 + 50 + 20 = 20`. However, if I moved Spongebob to a different table such that Gabriel, Kristie, Linan, and Squiward are at the same table, the cost for the arrangement becomes `-50` and that would be a way more preferable arrangement compared to the original. We assume of course that unspecified pairs (e.g. Gabriel and Squidward, Kristie and Linan) have a cost of 0 i.e. they don't care if they do sit together or not.
# 
# You can think of this as a graph problem with nodes being the dinner participants and edges being the costs.
# 
# So let's set up the problem. Assume that we have 10 guests initially and they hate / love each other in the following manner. Assume that each table seats 5 people, so we have two tables. Admire how creative the guest names are and how I have to import a whole damn package for that.

# In[45]:


guest_list = list(string.ascii_uppercase)[:10]

# negative is good (those two like each other), positive is bad (those two hate each other)
relationships_edges = {
    ('A', 'B'): -50,
    ('C', 'D'): -50,
    ('A', 'D'): 50,
    ('E', 'F'): 25,
    ('F', 'G'): -50,
    ('H', 'I'): -50,
}
table_size = 5
table_count = len(guest_list) // table_size

print(guest_list)


# Given `relationships_edges`, I want to convert them into a symmetric cost matrix so that matrix operations become easier (and faster! thanks `numpy`). To do this, I use the `networkx`.
# 
# I also normalize the costs by a factor of `100` so that it plays nice with the cost and acceptance probability later.

# In[46]:


temp_graph = nx.Graph()
for k, v in relationships_edges.items():
    temp_graph.add_edge(k[0], k[1], weight=v)
relationships_mat_unnormed = nx.to_numpy_matrix(temp_graph.to_undirected(), nodelist=guest_list)

relationships_mat = relationships_mat_unnormed / 100

print(relationships_mat)


# In[47]:


relationships_mat[0, 1]


# Now we have a symmetric matrix of costs. An element at `relationships_mat[i, j]` describes the cost of putting guest `i` and guest `j` together in the same table. For example, `guest_list[0] = 'A'` and `guest_list[1] = 'B'`. The cost of putting `A` and `B` together at the same table is `-50` (i.e. they love each other! yay!). `relationships_mat[0, 1]` gives us the normalized cost of `-50 / 100 = -0.5` as expected.
# 
# Notice how `networkx` converted a `nodelist` of `guest_list` into positional integers for me. Isn't that neat?

# In[48]:


table_0_seats = np.matrix([[1, 0, 1, 0, 0, 1, 1, 1, 0, 0]])
print(table_0_seats)


# We can express a single "table" as a row in a matrix by marking guests present at the table with a `1` and those not present as `0` for the positional integer that represents them. For example, a table that contains `'A', 'C', 'F', 'G', 'H'` can be expressed using the `table_0_seats` above.

# In[55]:


table_0_seats = np.matrix([[1, 0, 1, 0, 0, 1, 1, 1, 0, 0]])
print(table_0_seats * relationships_mat * table_0_seats.T)


# We then observe the interesting (okay probably not interesting) property that $SRS^T$ where $S$ is the seats matrix, $R$ is the relationships cost matrix `relationships_mat` and $S^T$ is the transpose of the seats matrix gives us the cost of a particular arrangement (or more precisely, the total cost incurred per person. e.g. since `F` and `G` are sitting at the same table, the cost incurred by each of them is `-0.5`, so the total cost for the table is `-1`).
# 
# Using `table_0_seats` as an example, the result of doing the quadratic multiplication is:
# 
# $$SRS^T = \sum^{guests}_{i=0} \sum^{guests}_{j=0} S_{1,i} \times R_{i,j} \times S^T_{j,1} $$
# 
# Now this is just the formulation for a single table. We can easily imagine each row of $S$ as a single table. In that case, we get:
# 
# $$SRS^T = C$$
# 
# $$C_{k, l} = \sum^{guests}_{i=0} \sum^{guests}_{j=0} S_{k,i} \times R_{i,j} \times S^T_{j,l} $$
# 
# for all tables $k$ and $l$
# 
# This is what we do below:

# In[6]:


table_seats_a = np.matrix([
    [1, 0, 1, 0, 0, 1, 1, 1, 0, 0],
    [0, 1, 0, 1, 1, 0, 0, 0, 1, 1]
])
table_costs_a = table_seats_a * relationships_mat * table_seats_a.T
print(table_costs_a)
print(np.trace(table_costs_a))


# Then, the diagonal of $C$ should give the cost of tables where $k=l$ i.e. our actual costs. The sum of the diagonal (or the *trace*) gives us the total cost of a given arrangement.
# 
# $$Cost = \sum^{tables}_{m} C_{m,m}$$
# 
# Using the example above, `table_seats_a` puts `'A', 'C', 'F', 'G', 'H'` in one table and `'B', 'D', 'E', 'I', 'J'` in another. `F` loves `G`, so each incurs a normalized negative cost of `-0.5`. So `np.trace(table_costs_a)` gives us `-1`.

# In[7]:


table_seats_b = np.matrix([
    [1, 1, 1, 0, 1, 0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0, 1, 1, 1, 1, 0]
])
table_costs_b = table_seats_b * relationships_mat * table_seats_b.T
print(table_costs_b)
print(np.trace(table_costs_b))


# We can probably do better than that. So let's try putting `'A', 'B', 'C', 'E', 'J'` in one table and `'D', 'F', 'G', 'H', 'I'` in the other. In this case, our first table gives us a cost of `-1` because of `A` and `B`. The second table gives us `-2` due to `F` `G` and `H` `I`. This leaves us with a total cost of `np.trace(table_costs_b) = -3`.
# 
# The advantage of defining our cost calculation in terms of a matrix quadratic operation is that **`numpy` is fucking fast** and tons of optimizations are built around those things. Plus it's kind of elegant heh.
# 
# ## Constraints, Requirements, and Strategy
# 
# Now we just need to minimize $C$ subject to some constraints:
# 
# - The number of people in each table should not change. 
# - As a corollary (and if you defined the starting point correctly), the total number of guests with seats should not change and hence everyone should be seated (otherwise one solution where people only hate each other and there's no love is to keep removing people).
# - Avoiding local minima is very important for obvious reasons.
# - Finding the absolute best answer is not that important. Having an approximate answer's probably good enough.
# 
# I chose [simulated annealing](https://www.wikiwand.com/en/Simulated_annealing) because I've been wanting to learn that for quite a while. (thanks Quan!) It also gives me a really simple way to solve a pretty intractable problem (wedding dinners often contain 200 or more guests) while satisfying all the requirements I have above -- especially the one on local minimum.
# 
# If you want a good introduction to simulated annealing, I suggest you read this: [http://katrinaeg.com/simulated-annealing.html](http://katrinaeg.com/simulated-annealing.html). It's a really good tutorial, and she kicks ass at explaining simulated annealing. If you're too lazy to read that, I stole the best part of the article that sums up simulated annealing nicely:
# 
# > 1. First, generate a random solution
# > 2. Calculate its cost using some cost function you've defined
# > 3. Generate a random neighboring solution
# > 4. Calculate the new solution's cost
# > 5. Compare them:
# >     - If `c_new` < `c_old`: move to the new solution
# >     - If `c_new` > `c_old`: maybe move to the new solution. **Be more eager to move initially, and less eager to move later**
# > 6. Repeat steps 3-5 above until an acceptable solution is found or you reach some maximum number of iterations.
# 
# Seriously though, go read the article. It's awesomesauce.
# 
# ## Simulated Annealing Functions
# 
# First we build our seat reshaping function:

# In[60]:


def reshape_to_table_seats(x):
    table_seats = x.reshape(table_count, len(guest_list))
    return table_seats


# `reshape` is a constant time operation that `numpy` provides that's good for matrix heavy code. Helps you keep track of where you are and avoid cryptic bugs. I want my seat matrix $S$ to be of the shape `(table_count, guests)` so that the $SRS^T$ operation can be done easily.

# In[61]:


def cost(x):
    table_seats = reshape_to_table_seats(x)
    table_costs = table_seats * relationships_mat * table_seats.T
    table_cost = np.trace(table_costs)
    return table_cost


# Then we define the cost function. This should be nothing new. It is basically the trace of the cost matrix as mentioned earlier. It calculates the cost for a given seating arrangement using the relationship cost matrix.

# In[62]:


def take_step(x):
    table_seats = reshape_to_table_seats(np.matrix(x, copy=True))
    # randomly swap two guests
    table_from, table_to = np.random.choice(table_count, 2, replace=False)
    
    table_from_guests = np.where(table_seats[table_from] == 1)[1]
    table_to_guests = np.where(table_seats[table_to] == 1)[1]
    
    table_from_guest = np.random.choice(table_from_guests)
    table_to_guest = np.random.choice(table_to_guests)
    
    table_seats[table_from, table_from_guest] = 0
    table_seats[table_from, table_to_guest] = 1
    table_seats[table_to, table_to_guest] = 0
    table_seats[table_to, table_from_guest] = 1
    return table_seats


# The `take_step` function does step 3: generate a random neighboring solution. It does so by picking two random tables and swapping two random guests from the two tables. This has the very nice property of:
# 
# - Being random (no seriously this is important. If you're proceeding in a definite direction ala [gradient descent](https://www.wikiwand.com/en/Gradient_descent) you'll find yourself in a local minimum pretty quickly. You'll also limit your space exploration tremendously if you're not random.
# - Never changing the number of guests per table. This affords us tons of flexibility, including specifying tables with different number of guests (a common problem for odd ballrooms)
# - Similarly, not seating the same guest at two tables / having empty table etc.

# In[63]:


def prob_accept(cost_old, cost_new, temp):
    a = 1 if cost_new < cost_old else np.exp((cost_old - cost_new) / temp)
    return a


# Finally, `prob_accept` is the function that gives the probability of us moving from the current position to the neighboring position. If the neighboring position is indeed better (i.e. `cost_new < cost_old`) we move (i.e. probabilty is 1). Otherwise, we "may" move using the function `np.exp((cost_old - cost_new) / temp)` where `temp` is a decreasing function of the number of iterations. This means that our probability for taking a worse neighboring position over the current position
# 
# - Decreases the shittier the new position is. (i.e. our neighboring position is already going to be worse than our current position because otherwise, we would have returned probability 1. Given that it is worse than our current position, we are more willing to move to a position that is less shitty than a position that is more shitty.) This is expressed in `cost_old - cost_new`. Since `cost_old < cost_new`, `(cost_old - cost_new) < 0`. The more negative `(cost_old - cost_new)` is, the lower `np.exp((cost_old - cost_new) / temp)` is and hence the lower the probability.
# - Decreases as we do more iteration (i.e. we become more conservative as we run more iterations). This is expressed in the `temp` denominator. Since the numerator `(cost_old - cost_new)` is negative, having an increasingly smaller `temp` would make `(cost_old - cost_new) / temp` more negative, hence the probability lower.
# 
# The four functions are reproduced below:

# In[8]:


def reshape_to_table_seats(x):
    table_seats = x.reshape(table_count, len(guest_list))
    return table_seats

def cost(x):
    table_seats = reshape_to_table_seats(x)
    table_costs = table_seats * relationships_mat * table_seats.T
    table_cost = np.trace(table_costs)
    return table_cost

def take_step(x):
    table_seats = reshape_to_table_seats(np.matrix(x, copy=True))
    # randomly swap two guests
    table_from, table_to = np.random.choice(table_count, 2, replace=False)
    
    table_from_guests = np.where(table_seats[table_from] == 1)[1]
    table_to_guests = np.where(table_seats[table_to] == 1)[1]
    
    table_from_guest = np.random.choice(table_from_guests)
    table_to_guest = np.random.choice(table_to_guests)
    
    table_seats[table_from, table_from_guest] = 0
    table_seats[table_from, table_to_guest] = 1
    table_seats[table_to, table_to_guest] = 0
    table_seats[table_to, table_from_guest] = 1
    return table_seats

def prob_accept(cost_old, cost_new, temp):
    a = 1 if cost_new < cost_old else np.exp((cost_old - cost_new) / temp)
    return a


# ## Annealing
# 
# Now we get to the `anneal` function that runs the whole damn thing. We basically keep iterating until we reach a minimum temperature. The code should be pretty self explanatory and maps 1:1 to the pseudocode earlier. Essentially, it:
# 
# - Starts with an initial position
# - While the current temperature is above the minimum temperature,
#     - For `n_iter` steps,
#         - Calculate a neighboring position (`pos_new`)
#         - Calculate the cost of the neighboring position (`cost_new`)
#         - Calculate the acceptance probability (`p_accept`) for the neighboring position
#         - "Maybe" move to the neighboring position
#     - Reduce current temperature by a given multiple (default value of `alpha=0.9`)
# 
# `audit_trail` lets me plot pretty graphs.

# In[21]:


def anneal(pos_current, temp=1.0, temp_min=0.00001, alpha=0.9, n_iter=100, audit=False):
    cost_old = cost(pos_current)
    
    audit_trail = []
    
    while temp > temp_min:
        for i in range(0, n_iter):
            pos_new = take_step(pos_current)
            cost_new = cost(pos_new)
            p_accept = prob_accept(cost_old, cost_new, temp)
            if p_accept > np.random.random():
                pos_current = pos_new
                cost_old = cost_new
            if audit:
                audit_trail.append((cost_new, cost_old, temp, p_accept))
        temp *= alpha
    
    return pos_current, cost_old, audit_trail


# In[22]:


result = anneal(table_seats_b, audit=True)


# In[23]:


print(result[0])
print(cost(result[0]))


# And voila! We get our result. Turns out an optimal arrangement is to put `'C', 'D', 'E', 'H', 'I'` in one table and `'A', 'B', 'F', 'G', 'J'` in the other. The first table makes use of `C` `D` and `H` `I`. The second table makes use of `A` `B` and `F` `G`. That's all the negative weights we have specified. Furthermore, it has avoided the positive weight combinations (`A` `D` and `E` `F`).
# 
# To see the journey we took to get here, we can use the `audit_trail` data.

# In[30]:


audit_df = pd.DataFrame(result[2], columns=['cost_new', 'cost_old', 'temp', 'p_accept'])


# In[69]:


audit_df[['cost_old']].plot()
audit_df[['temp']].plot()
audit_df[['p_accept']].plot()


# We see that
# 
# - `cost_old` decreases with random spikes initially due to us (adventurously!) exploring neighboring solutions.
# - `temp` decreases (duh)
# - `p_accept` decreases along with `temp` unless we find better solutions (in which case we set `p_accept` to 1)
# 
# This is fucking awesome `:D`

# ## Questions?
# 
# **Why are all your friends getting married?**
# 
# Because they're on a mission to repopulate Singapore and reduce our dependency on foreign talents like me. They're also probably more charming than me.
# 
# **When's yours?**
# 
# Fuck off.
# 
# **I noticed you used `table_count = len(guest_list) // table_size`. That means all tables have to be of the same size right? What if I want different table sizes?**
# 
# Ah! Yes. Great question. There's actually no such assumption. You just need to have a constant number of tables. We take neighboring steps by swapping two guests in two random tables. That means we don't actually change the number of guests in each other. So even if in our example (using 10 guests) we have one table with 1 guest and the other with 9 guests, this would still work.
# 
# However, the number of tables would need to be constant. In other words, the table configuration (i.e. number of tables and number of people per table) need to be fixed. If you want to optimize over that, you'll have to do multiple runs of this algorithm.
# 
# **Why did you scale by `100` for the cost matrix and not use the cost matrix directly?**
# 
# Because the way `p_accept` is defined (`np.exp((cost_old - cost_new) / temp)`) causes strange things to occur if `cost_old` is way too different from `cost_new`. You'd end up getting either 0 or 1 probabilities, which is not useful. Just observe the graph of $f(x) = e^x$ in the region of $f(x) \leq 1$.
# 
# **Won't performance suffer when you start having a 100 tables or thousands of guests?**
# 
# Yes! Great question again. Given that our cost computation is $SRS^T$ where $S$ is of the shape `(table_count, guest count)` and $R$ is a symmetric matrix of the shape `(guest_count, guest_count)`, our cost computation can get pretty intensive. Fortunately, we can exploit certain properties of the matrices, namely:
# 
# - $S$ is going to be super sparse. Each row of $S$ represents a table, and it contains `1`s for the guests that are at the table and `0`s for the other guests. That means a single row will be predominantly `0`s. That means $S$ itself will be super sparse. `scipy` has an implementation of a matrix that optimizes for sparse performance. We can use that when things start to slow down.
# - $R$ is a symmetric matrix (as with most undirected cost matrices). Again, `scipy` to the rescue -- `scipy` has an implementation of a symmetric matrix that we can use.
# 
# Fortunately, `numpy` is already blazing fast and I haven't ran into any performance problems for the larger tests I've ran (up to 100 guests). We can also farm out the large matrix operations to GPUs (especially if we modify the operations to be in place so that we don't have to keep loading and unloading memory into the GPU's RAM).
# 
# For now, YAGNI. No performance problems yet, and we know squeezing performance can be easy.
# 
# **Are you going to make this into a website / command line app / app**
# 
# Maybe. The fun's all gone once the solution's here though. I solved this three weeks ago and procrastinated writing this post for that long. Is this even worth making? I don't know. Maybe more friends should get married.
# 
# **Speaking of that, you should get married...**
# 
# Fuck off.