This project draws inspiration from a specific aspect of a popular application called SplitWise. SplitWise basically helps people settle expenses among different people in a group in an efficient manner. It's easiest to understand with an example, so here's a minimal example (which we will actually get working later) :

- Bob pays \$20 for gas, split between him and Alice. (Alice owes Bob \$10)
- Charles pays \$20 for lunch, split between him and Bob. (Bob owes Charles \$10)

How can they settle their debts? Of course, the simplest option is :

- Alice pays Bob the \$10 she owes him
- Bob pays Charles the \$10 he owes him.

If Alice, Bob and Charles are friends (we'll say they're in the same group from here on) then of course, it's easy for us to understand that intuitively, the most effective way to settle their debts would just be :

- Alice pays Charles the \$10 that Bob owes him.

This reduces the total number of transanctions required for the group to reach a **settled state** from 2 to 1.

The utility of such an application is obvious. However, what's interesting is that this problem of finding the minimum number of transanctions that will lead to a group reaching a "settled" state, is actually **NP-Complete**. Also, it can actually be modelled in some interesting ways, including as an **integer programming problem**!

Over the course of this notebook, we will see one possible formulations of this problem, and also explore some extensions of the problem statement.

This seemingly straightforward problem can actually be reduced to the well known **subset-sum** problem, which is NP-Complete.

The subset-sum problem is defined as follows :

Given a set of Integers $S$, determine the existence of a non-empty subset who's elements add up to a given integer $n$

One possible transformation of our problem into the subset-sum problem is as follows :

Initialize the subset $S$ with all amounts of money owed

Set each integer $n$ to the amount of money owed to each individual

This means that finding an optimal solution to this problem for large sets of transanctions is computationally intractable, and interestingly enough, according to the founder of the actual SplitWise app they use a greedy solution (which, as greedy algorithms often go, isn't always optimal but run in reasonable time).

The data used in this notebook is synthetically generated by a Python script that writes to a CSV file. The CSV file is in the format :

(owed by), (owed to), (amount owed)

This format was chosen because even though real transanctions most likely occur in the format :

(person who paid), (amount paid), (people splitting the expense)

Transforming from that format to the one used by this notebook is simple, and therefore does not affect the modelling in any way. It would be simple to add some pre-processing to convert data between these two formats.

To understand the integer programming problem, let's go back to our minimal example and try and solve that.

We can represent the transanctions in terms of a matrix $T$ such that :

each element $T_{ij}$ of the matrix is the amount of money (in dollars)

person $i$ owes person $j$.

Here's a quick recap of the transanctions we had, simplified down :

Alice owes Bob \$10 Bob owes Charles \$10

So based on our above formulation of the $T$ matrix:

$T_{alice, bob} = 10$

$T_{bob, charles} = 10$

For convenience, let's assign integer indices to Alice, Bob and Charles as follows :

Alice = 1

Bob = 2

Charles = 3

We can now form our $T$ matrix :

$T = \begin{bmatrix} 0 & 10 & 0 \\ 0 & 0 & 10 \\ 0 & 0 & 0 \end{bmatrix}$

Note here that $T_{i,i}$ will always be zero (as you can't owe yourself any money).

Similarly, we can construct a **settlement matrix $S$** which represents the optimal solution.

Let's define each element of $S$, **$S_{i,j}$ as they money that person $i$ needs to pay person $j$** in order for all debts to be settled. In our simple case, we can construct this by observation as :

$S = \begin{bmatrix} 0 & 0 & 10 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$

So our settlement matrix tells us that Alice must pay Charles \$10, which is what we want. This is in some sense, our "target" matrix.

Intepreted in a different way, our $S$ matrix is also an *alternate to our $T$ matrix*, because it's just telling us who owes who, and how much, which is exactly what the $T$ matrix too. In other words, it basically says "Alice finally owes charles \$10."

The important thing to understand here, is that **we don't really care who pays who**, as long as everyone receives or pays what they're supposed to. We can express this as :

Net Amount for person $i$ = Amount owed to $i$ - Amount owed by $i$

How do we represent that using the matrices we've constructed?

Net amount owed by $i$ = $\sum_{j} T_{ij}$

(sum along row $i$)

Net amount owed to $i$ = $\sum_{j} T_{ji}$(sum along column $i$)

Since the net amount owed to or by each person must be the same in the the $S$ and $T$ matrices we arrive at this important **constraint** :

$\sum_{j} T_{ji} - \sum_{j} T_{ij} = \sum_{j} S_{ji} - \sum_{j} S_{ij}$

Therefore, as long as this constraint holds, $S$ and $T$ are basically equivalent in the manner we require.

Our task has now reduced to finding a sparisified version of $T$ that satisifies the above constraint!

How do we actually sparisfy $T$ though?

One simple way of doing this is to express this as our objective:

Minimize the number of elements of $S$ that are not 0.

And to do this, we can turn to integer programming, by simply using a binary indicator variable to determine if an element of $S$ is zero or nonzero.

Therefore, our final formulation is as follows :

$$ \begin{aligned} \underset{I \in \mathbb{R^{n x n}}}{\text{minimize}}\qquad& \sum_{i, j} I_{ij} && i,j = 1,\dots,n \\ \text{subject to:}\qquad& S_{i,j} \ge 0 && i,j = 1,\dots,n\\ & S_{i,j} \le MI_{i,j} && i,j = 1,\dots,n\\ & \sum_{j} S_{ji} - \sum_{j} S_{ij} = \sum_{j} T_{ji} - \sum_{j} T_{ij} && i = 1,\dots,n\\ \end{aligned} $$

where $M$ is an upper-bound on the values $S_{i,j}$ can take.

We can now solve an MWE of our problem.

In [1]:

```
using JuMP, NamedArrays, Cbc
```

Let's reuse our example from above.

In [2]:

```
T = [0 10 0; 0 0 10; 0 0 0 ]
```

Out[2]:

In [3]:

```
m = Model(solver = CbcSolver())
@variable(m, s[1:3, 1:3])
@variable(m, indicator[1:3, 1:3], Bin)
# while some possible formulations of this problem might allow
# for a negative S(i,j) value in the sense that a negative
# amount of money owed can be construed as money lent, but the
# formulation described above does not allow for negative values of S.
@constraint(m, s .>= 0)
@expression(m, rowsums[i in 1:3], sum(s[i,j] for j in 1:3))
@expression(m, colsums[i in 1:3], sum(s[j,i] for j in 1:3))
# This is the constraint that forces equivalence between
# the S matrix and the T matrix.
@constraint(m, rowsumc[i in 1:3],
(colsums[i] - rowsums[i]) == (sum(T[j,i] for j in 1:3) - sum(T[i,j] for j in 1:3)))
# This constraint forces the indicator variables to be set to 1
# if a transanction occurs i.e. if any element of the S matrix is non-zero
# we're just going to use an arbitrarily large number for the upper-bound for now
# more on this later.
@constraint(m, s .<= indicator*10000)
@objective(m, Min, sum(indicator))
status = solve(m)
```

Out[3]:

Let's examine the value of our S matrix :

In [4]:

```
s_opt = getvalue(s)
```

Out[4]:

This matches what we were expecting, which is great. Now that we have a simple working model, we can extend it to more realistic examples, with several transanctions.

Now that we have a minimum working example, let's test it out with a larger data set.

We will use a random set of 20 transanctions among 5 people. First, let's just parse the CSV file.

In [5]:

```
function get_T_from_csv(datafile)
transanctions = []
all_users = []
f = open(datafile)
lines = readlines(f)
for line in lines
owed_by, owed_to, amount = split(line, ",")
push!(transanctions, [owed_by, owed_to, parse(Int, amount)])
push!(all_users, owed_by)
push!(all_users, owed_to)
end
# Set up some variables we'll need for later
unique_users = sort(unique(all_users))
n = length(unique_users)
# Initialize the T Matrix
T = NamedArray(zeros(n,n), (unique_users, unique_users), ("OWED BY", "OWED TO"));
# Fill in the T Matrix
# T[i,j] => i owes j T[i,j]
for transanction in transanctions
owed_by, owed_to, amount = transanction
T[owed_by, owed_to] = amount
end
return T, n, unique_users
end
```

Out[5]:

In [6]:

```
T, n, unique_users = get_T_from_csv("dense.csv")
println(T)
```

Here, we have a total of 20 transanctions that occured. Let's examine them.

In [7]:

```
for i in unique_users
for j in unique_users
if T[i,j] != 0
println("$i owes $j \$$(T[i,j])")
end
end
end
```

Here's a graph showing us just how messy these debts are right now :

Let's see how many transanctions we can reduce this down to.

In [8]:

```
function find_S_matrix(T, n, unique_users)
m = Model(solver = CbcSolver())
@variable(m, s[1:n, 1:n])
@variable(m, indicator[1:n, 1:n], Bin)
@constraint(m, s .>= 0)
@expression(m, rowsums[i in 1:n], sum(s[i,j] for j in 1:n))
@expression(m, colsums[i in 1:n], sum(s[j,i] for j in 1:n))
@constraint(m, rowsumc[i in 1:n],
(colsums[i] - rowsums[i]) == (sum(T[j,i] for j in 1:n) - sum(T[i,j] for j in 1:n)))
@constraint(m, s .<= indicator*10000)
@objective(m, Min, sum(indicator))
status = solve(m)
println(status)
s_opt = getvalue(s)
S = NamedArray(s_opt, (unique_users, unique_users), ("OWED BY", "OWED TO"));
return S
end
```

Out[8]:

In [10]:

```
S = find_S_matrix(T, n, unique_users)
```

Out[10]:

In [12]:

```
for i in unique_users
for j in unique_users
if S[i,j] != 0
println("$i owes $j \$$(S[i,j])")
#println("\"$i,$j,$(S[i,j])\",")
end
end
end
```

And here's our new, simplified graph :

Just 4 transanctions! Let's verify that this result ensures everyone is getting what they deserve.

First, let's see what everyone's receiving (we know that this is the sum along columns) :

In [13]:

```
colsums = []
for i in 1:n
colsum = sum(S[j,i] for j in 1:n)
# we'll also save the colsum for later
push!(colsums, colsum)
println( "$(unique_users[i]) => $colsum" )
end
```

Next, let's see what everyone's paying (sum along the rows):

In [14]:

```
rowsums = []
for i in 1:n
rowsum = sum(S[i,j] for j in 1:n)
push!(rowsums, rowsum)
println( "$(unique_users[i]) => $rowsum" )
end
```

Finally, let's see the net amount everyone receives or gets. This is just what they're getting minus what they owe.

In [15]:

```
net = colsums - rowsums
for i in 1:n
println( "$(unique_users[i]) => $(net[i])" )
end
```

No surprises there. Let's repeat this process for the $T$ matrix we started off with. If we end up at the same thing, we know for sure that we're being fair to everyone.

In [16]:

```
colsums = []
for i in 1:n
colsum = sum(T[j,i] for j in 1:n)
push!(colsums, colsum)
end
rowsums = []
for i in 1:n
rowsum = sum(T[i,j] for j in 1:n)
push!(rowsums, rowsum)
end
net = colsums - rowsums
for i in 1:n
println( "$(unique_users[i]) => $(net[i])" )
end
```

Great, it matches our values from the $S$ matrix.

So now we know that our model works, as no one is paying more than they owe, or receiving less than they should.

An interesting case to try out would be to have many circular transanctions.

Basically, something along the lines of :

Alice owes Bob \$10 Bob owes Charles \$10

Charles owes Dick \$10 Dick owes Alice \$10

We can of course tell that in this case, there's no transanctions needed as the group is already in a settled state. But let's see if our model realizes that.

Let's get our data first

In [17]:

```
T, n, unique_users = get_T_from_csv("circular.csv")
```

Out[17]:

In [18]:

```
for i in unique_users
for j in unique_users
if T[i,j] != 0
println("$i owes $j \$$(T[i,j])")
end
end
end
```

If we were to graph these, it becomes clear why I keep calling them "circular transanctions" :

Let's give them to our solver and see what it comes up with.

In [19]:

```
S = find_S_matrix(T, n, unique_users);
```

In [20]:

```
S.array
```

Out[20]:

It works as we expected it to. Now we can get more complex.

In a more realistic situation, there isn't going to be just one group of people on the app. There are going to exist multiple groups. Let's define what a group is exactly :

- A user can belong to multiple groups.
- An input transanction will only occur among members of one group.
- A settlement transanction can be across groups only if it's between two people who belong to both groups.
- The contrapositive to the above is that if money is flowing between two people, they must belong to at least one group together.

Here's a simple example to illustrate why this is important :

**Example 1**

**Group 1** (Alice, Bob and Charles)

Alice owes Bob \$10 Bob owes Charles \$10

**Group 2** (Alice, Dick, and Charles)

Charles owes Dick \$20 Dick owes Alice \$20

If we just treated them as individual groups, we'd end up with these 2 transanctions to stabilize each group :

Alice owes Charles \$10 Charles owes Alice \$20

Whereas what we really want is :

Charles owes Alice \$10

So how do we approach this problem? We definitely can't combine everything into one big $T$ matrix. Consider the following example to understand why :

**Example 2**

**Group 1** (Alice, Bob and Charles)

Alice owes Bob \$10 Bob owes Charles \$10

**Group 2** (Dick and Charles)

Charles owes Dick \$10

Here, we *wouldn't* want to end up with an answer that looks like :

Alice owes Dick \$10

Because Alice and Dick aren't in the same group together. This time, we do want our solver to give us two separate transanctions that respect the group structuring. Or in other words :

Alice owes Charles \$10 Charles owes Dick \$10

It turns out, all we have to do is tell our solver who money can't move between. This is just an additional (set of) constraints that we need to derive based on the group information.

But first off, we need this group information, which we will generate another CSV file for, in the format :

Group Number, User1, User2...

For simplicity, we will assume the group numbers start at 1 and increase by 1 with each line.

Let's parse our group information CSV file.

In [21]:

```
function get_group_info(groupfile)
groups = []
all_users = []
f = open(groupfile)
lines = readlines(f)
for line in lines
this_group = split(line, ",")
users_only = this_group[2:end]
push!(groups, users_only)
for user in users_only
push!(all_users, user)
end
end
return groups, sort(unique(all_users))
end
groups, unique_users = get_group_info("groups.csv");
```

Here's our groups :

In [22]:

```
group_number = 1
for group in groups
println("Group $group_number : ")
for name in group
print("$name, ")
end
println("\n")
group_number+=1
end
```

Now, we need to construct a list of disallowed transanctions. This is basically a list of (user1, user2) pairs that money cannot flow between. Basically, if two users are never in the same group, then money can't flow between them.

In [23]:

```
function get_disallowed_transanctions(groups, unique_users)
disallowed = []
n = length(unique_users)
for i in 1:n-1
for j in i:n
first_user = unique_users[i]
second_user = unique_users[j]
allowed = false
for k in 1:3
is_first_in_group_k = first_user in groups[k]
is_second_in_group_k = second_user in groups[k]
are_both_in_group_k = is_first_in_group_k & is_second_in_group_k
if are_both_in_group_k
allowed=true
# no need to check the other groups as
# we found at least one group in common they share
break
end
end
if allowed == false
# money can't flow in either direction
# for disallowed users
push!(disallowed, (i,j))
push!(disallowed, (j,i))
end
end
end
return disallowed
end
disallowed_transanctions = get_disallowed_transanctions(groups, unique_users);
```

Now that we have that, let's load up their transanctions as usual. The trasnanction file has the same structure as before.

In [24]:

```
T, n, unique_users = get_T_from_csv("group_transanctions.csv");
```

We need to change our model to include the additional constraint that now, money can't flow between certain people.

In [27]:

```
function find_S_matrix_grouped(T, n, unique_users)
m = Model(solver = CbcSolver())
# first, all the same variables and constraints as before.
@variable(m, s[1:n, 1:n])
@variable(m, indicator[1:n, 1:n], Bin)
@constraint(m, s .>= 0)
@expression(m, rowsums[i in 1:n], sum(s[i,j] for j in 1:n))
@expression(m, colsums[i in 1:n], sum(s[j,i] for j in 1:n))
@constraint(m, rowsumc[i in 1:n],
(colsums[i] - rowsums[i]) == (sum(T[j,i] for j in 1:n) - sum(T[i,j] for j in 1:n)))
@constraint(m, s .<= indicator*10000)
# and now, the additional constraint of disallowed transanctions
for user_tuple in disallowed_transanctions
# we can actually define a constraint either on indicator, or S itself.
i = user_tuple[1]
j = user_tuple[2]
@constraint(m, s[i,j] == 0)
end
# objective doesn't change either
@objective(m, Min, sum(indicator))
status = solve(m)
println(status)
s_opt = getvalue(s)
S = NamedArray(s_opt, (unique_users, unique_users), ("OWED BY", "OWED TO"));
return S
end;
```

Let's run this updated model.

In [28]:

```
S = find_S_matrix_grouped(T, n, unique_users);
```

In [29]:

```
for i in unique_users
for j in unique_users
if S[i,j] != 0
println("$i owes $j \$$(S[i,j])")
end
end
end
```

If we examine the results, we can actually see that none of the transanctions occur between two people who are never in the same group at least once. This is exactly what we wanted. Here's the groups again just so you can look for yourself :

In [30]:

```
group_number = 1
for group in groups
println("Group $group_number : ")
for name in group
print("$name, ")
end
println("\n")
group_number+=1
end
```

Let's re-run this for the example we constructed by hand above, and compare it against our earlier model

Just to recap, here's the earlier example :

**Group 1** (Alice, Bob and Charles)

Alice owes Bob \$10 Bob owes Charles \$10

**Group 2** (Dick and Charles)

Charles owes Dick \$10

Let's set up the $T$ and disallowed_transanction matrices manually.

In [31]:

```
T = [
0 10 0 0;
0 0 10 0;
0 0 0 10;
0 0 0 0;
];
unique_users = ["Alice", "Bob", "Charles", "Dick"]
n = 4
# Dick can't transanct with either Bob or Alice
disallowed_transanctions = [
(1,4)
(4,1)
(2,4)
(4,2)
];
```

In [32]:

```
S = find_S_matrix_grouped(T, n, unique_users)
for i in unique_users
for j in unique_users
if S[i,j] != 0
println("$i owes $j \$$(S[i,j])")
end
end
end
```

This is what we expected, but now let's just compare it to what we would get if we just used our earlier solver.

In [33]:

```
S = find_S_matrix(T, n, unique_users)
for i in unique_users
for j in unique_users
if S[i,j] != 0
println("$i owes $j \$$(S[i,j])")
end
end
end
```

Of course, it doesn't know anything about groups, so it just (wrongly) suggests Alice should pay Dick.

Till now, we've been using an arbitrarily large value for the upper bound on the problem, but we can actually determine this by just looking at the raw transanctions.

The upper bound in our case is essentially the largest single transanction that could occur in the settlement phase.

This can also be thought of as the maximum amount of money that a single person has to receive. We don't have to worry about the maximum amount of money a single person owes, because even if it's greater than the maximum someone has to receive, that particular value will never occur in the settlement phase because the amount owed would be split between at least two people.

In [34]:

```
function find_upper_bound(T, n)
colsums = []
for i in 1:n
colsum = sum(T[j,i] for j in 1:n)
push!(colsums, colsum)
end
return maximum(colsums)
end
```

Out[34]:

In [35]:

```
T, n, unique_users = get_T_from_csv("dense.csv");
```

In [36]:

```
find_upper_bound(T, n)
```

Out[36]:

It's trivial to integrate this reasonining into our function that actually solves the problem, so for brevity, it's not done here.

So we looked at how we can use integer programming in order to simplify debts across multiple groups. This can make a big difference in the number of transnactions required to get to a settled state.

There's many different aspects of this problem that can be explored further :

**Are there other, simpler methods of modelling this problem?**The number of variables we use is of the order $n^2$ for every $n$ users. Therefore this approach will not scale to hundreds or thousands of transanctions. I had initially explored the possibility of modelling this as a flow problem, which would simplify the model greatly. I think this is something definitely worth purusing.

**Zero diagonal.**One of the characteristics of the problem we could make use of is the fact that the main diagonal of both the $T$ and $S$ matrix have to always be zero. This would require changing the model quite a bit, but it would reduce the problem to requiring*n(n-1)*variables, which on paper doesn't seem like a big difference, but it can make a difference for large values for $n$.

Do you have debts you need simplified? Go ahead and put your transanctions in and run the cell after that to find the quickest way to get to a settled state with your friends!

In [37]:

```
my_transanctions = [
# format : "owed by,owed to,amount",
# don't forget the , at the end
# sample :
"Tony,Bob,10",
];
```

In [38]:

```
function get_T_manual()
transanctions = []
all_users = []
lines = my_transanctions
for line in lines
owed_by, owed_to, amount = split(line, ",")
push!(transanctions, [owed_by, owed_to, parse(Int, amount)])
push!(all_users, owed_by)
push!(all_users, owed_to)
end
unique_users = sort(unique(all_users))
n = length(unique_users)
T = NamedArray(zeros(n,n), (unique_users, unique_users), ("OWED BY", "OWED TO"));
for transanction in transanctions
owed_by, owed_to, amount = transanction
T[owed_by, owed_to] = amount
end
return T, n, unique_users
end
T, n, unique_users = get_T_manual()
S = find_S_matrix(T, n, unique_users)
for i in unique_users
for j in unique_users
if S[i,j] != 0
println("$i owes $j \$$(S[i,j])")
#println("\"$i,$j,$(S[i,j])\",")
end
end
end
```