This notebook contains course material from CBE40455 by Jeffrey Kantor (jeff at nd.edu); the content is available on Github. The text is released under the CC-BY-NC-ND-4.0 license, and code is released under the MIT license.
THe risk neutral gambler enters a game with the idea of betting until he or she either reaches a goal $N$ or runs out of money. Beginning with a stake $x$ and wager $u$, the resulting stake is either $x+u$ with probability $p$, or $x-u$ with probability $q$ (where $p + q \leq 1$.) The wager must be smaller than the stake or any maximum wager established for the game. The future value of money may be discounted by a factor $a \leq 1$.
Given an initial stake $x \lt N$, what is the optimal gambling strategy?
This classic problem in Dynamic Programming is discussed, for example, by Richard Sutton and Andrew Barto in their book Reinforcement Learning (MIT Press, 1998). The function $V(k,x)$ is the expected value of the game after the $k^{th}$ wager and with a stake $x$. If the gambler reaches the goal of winning a stake $N$ at $k$ then the value of the game is $V(k,N) = N$. If the gambler loses everything, then $V(k,0) = 0$. Otherwise, for $x \lt N$, the Bellman equation for optimality provides the recursion
$$V(k-1,x) = a \max_u \left[ p V(k,x+u) + q V(k,x-u) \right]$$where $a$ is the discount factor for future values. The maximization is over the set of possible bets ranging from $0$ to the minimum of $x$, $N-x$, or the bet limit $B$. Note that the state space and set of control actions are finite.
The optimality equation can be solved by well known methods for policy iteration. Alternatively, as shown, for example, by Sheldon Ross in Introduction to Stochastic Dynamic Programming (Academic Press, 1983), an exact solution can be found by linear programming. We seek a stationary solution $V[x]$ by minimizing $\sum_{x \in 0..N} V[x]$ subject to
$$ V[x] \geq a \left( p V[x + u] + q V[x-u]\right) $$for all feasible bets and all $x$ in $1..N-1$ with boundary conditions $V[0] = 0$ and $V[N] = N$. The set of optimal wagers $u[x]$ are found by determing the constraints that are active at optimality. $u[x]$ may have multiple values.
%%script glpsol -m /dev/stdin -y output.txt --out output
/* Problem Parameters. Any of these can be adjusted in a data section. */
param N default 100, >= 1; # Goal
param p default 0.25, >= 0, <= 1; # Winning probability
param q default 1-p, >= 0, <= 1-p; # Losing probability
param B default N, >= 1, <= N; # Maximum wager size
param a default 1, >= 0, <= 1; # Discount factor
/* Set of States */
set X:= 0..N;
/* Sets of possible wagers. These are parameterized by the State */
set U{x in X} := 1..min(B,min(N-x,x));
/* Value function */
var V{X};
/* Exact Linear Program Equivalent of the DP */
minimize OBJ: sum{x in X} V[x] ;
s.t. C1 {x in 1..N-1, u in U[x]}: V[x] >= a*(p*V[x+u] + q*V[x-u]);
s.t. C2: V[0] = 0;
s.t. C3: V[N] = N;
solve;
table tab1 {x in X} OUT "CSV" "output.csv" :
x~Stake, V[x]~ExpectedValue;
printf " Goal = %4d", N;
printf "\n Maximum Bet = %4d", B;
printf "\nWinning Probability = %8.3f", p ;
printf "\n Losing Probability = %8.3f", q ;
printf "\n Discount Factor = %8.3f", a;
printf "\n\n %7s %10s %4s\n",'x','V[x]','u[x]: Optimal Wagers';
printf " %7s %10s %4s" ,'-','----','---------------------';
for {x in X}{
printf "\n %7d %10.4f ",x, V[x];
printf {u in U[x]: abs(-V[x] + a*(p*V[x+u] + q*V[x-u])) < 0.00001} " %3d",u;
}
end;
import pandas as pd
df = pd.read_csv('output.csv')
df['ExpectedValue'].plot()
xlabel('Stake')
ylabel('Expected Value')
<matplotlib.text.Text at 0x109a55c90>
f = open("output.txt")
print(f.read())
f.close()
Goal = 100 Maximum Bet = 100 Winning Probability = 0.250 Losing Probability = 0.750 Discount Factor = 1.000 x V[x] u[x]: Optimal Wagers - ---- --------------------- 0 0.0000 1 0.0073 1 2 0.0291 2 3 0.0695 3 4 0.1166 4 5 0.1771 5 6 0.2781 6 7 0.4037 7 8 0.4663 8 9 0.5601 9 10 0.7085 10 11 0.9041 11 12 1.1124 12 13 1.5680 12 13 14 1.6146 11 14 15 1.6953 10 15 16 1.8652 9 16 17 1.9826 8 17 18 2.2406 7 18 19 2.7385 6 19 20 2.8340 5 20 21 3.0495 4 21 22 3.6164 3 22 23 3.8496 2 23 24 4.4497 1 24 25 6.2500 25 26 6.2719 1 24 26 27 6.3374 2 23 27 28 6.4586 3 22 28 29 6.5997 4 21 29 30 6.7814 5 20 30 31 7.0843 6 19 31 32 7.4610 7 18 32 33 7.6489 8 17 33 34 7.9304 9 16 34 35 8.3755 10 15 35 36 8.9623 11 14 36 37 9.5873 12 13 37 38 10.9539 12 38 39 11.0939 11 39 40 11.3360 10 40 41 11.8457 9 41 42 12.1978 8 42 43 12.9717 7 43 44 14.4654 6 44 45 14.7520 5 45 46 15.3984 4 46 47 17.0991 3 47 48 17.7988 2 48 49 19.5991 1 49 50 25.0000 50 51 25.0219 1 49 52 25.0874 2 48 53 25.2086 3 47 54 25.3497 4 46 55 25.5314 5 45 56 25.8343 6 44 57 26.2110 7 43 58 26.3989 8 42 59 26.6804 9 41 60 27.1255 10 40 61 27.7123 11 39 62 28.3373 12 38 63 29.7039 12 13 37 64 29.8439 11 14 36 65 30.0860 10 15 35 66 30.5957 9 16 34 67 30.9478 8 17 33 68 31.7217 7 18 32 69 33.2154 6 19 31 70 33.5020 5 20 30 71 34.1484 4 21 29 72 35.8491 3 22 28 73 36.5488 2 23 27 74 38.3491 1 24 26 75 43.7500 25 76 43.8156 1 24 77 44.0123 2 23 78 44.3757 3 22 79 44.7992 4 21 80 45.3441 5 20 81 46.2530 6 19 82 47.3830 7 18 83 47.9468 8 17 84 48.7913 9 16 85 50.1265 10 15 86 51.8868 11 14 87 53.7618 12 13 88 57.8617 12 89 58.2818 11 90 59.0081 10 91 60.5372 9 92 61.5935 8 93 63.9151 7 94 68.3963 6 95 69.2561 5 96 71.1951 4 97 76.2972 3 98 78.3963 2 99 83.7972 1 100 100.0000