Notebook

This notebook contains course material from CBE40455 by Jeffrey Kantor (jeff at nd.edu); the content is available on Github. The text is released under the CC-BY-NC-ND-4.0 license, and code is released under the MIT license.

< Scenario Analysis for a Plant Expansion | Contents | Risk Averse Gambler >

Risk Neutral Gambler¶

Problem Statement¶

THe risk neutral gambler enters a game with the idea of betting until he or she either reaches a goal $N$ or runs out of money. Beginning with a stake $x$ and wager $u$, the resulting stake is either $x+u$ with probability $p$, or $x-u$ with probability $q$ (where $p + q \leq 1$.) The wager must be smaller than the stake or any maximum wager established for the game. The future value of money may be discounted by a factor $a \leq 1$.

Given an initial stake $x \lt N$, what is the optimal gambling strategy?

Formulation¶

This classic problem in Dynamic Programming is discussed, for example, by Richard Sutton and Andrew Barto in their book Reinforcement Learning (MIT Press, 1998). The function $V(k,x)$ is the expected value of the game after the $k^{th}$ wager and with a stake $x$. If the gambler reaches the goal of winning a stake $N$ at $k$ then the value of the game is $V(k,N) = N$. If the gambler loses everything, then $V(k,0) = 0$. Otherwise, for $x \lt N$, the Bellman equation for optimality provides the recursion

$$V(k-1,x) = a \max_u \left[ p V(k,x+u) + q V(k,x-u) \right]$$

where $a$ is the discount factor for future values. The maximization is over the set of possible bets ranging from $0$ to the minimum of $x$, $N-x$, or the bet limit $B$. Note that the state space and set of control actions are finite.

Solution by Linear Programming¶

The optimality equation can be solved by well known methods for policy iteration. Alternatively, as shown, for example, by Sheldon Ross in Introduction to Stochastic Dynamic Programming (Academic Press, 1983), an exact solution can be found by linear programming. We seek a stationary solution $V[x]$ by minimizing $\sum_{x \in 0..N} V[x]$ subject to

$$ V[x] \geq a \left( p V[x + u] + q V[x-u]\right) $$

for all feasible bets and all $x$ in $1..N-1$ with boundary conditions $V[0] = 0$ and $V[N] = N$. The set of optimal wagers $u[x]$ are found by determing the constraints that are active at optimality. $u[x]$ may have multiple values.

MathProg Model¶

In [4]:

%%script glpsol -m /dev/stdin -y output.txt --out output

/* Problem Parameters.  Any of these can be adjusted in a data section.  */

param N default 100, >= 1;               # Goal
param p default 0.25, >= 0, <= 1;        # Winning probability
param q default 1-p, >= 0, <= 1-p;       # Losing probability
param B default N, >= 1, <= N;           # Maximum wager size
param a default 1, >= 0, <= 1;           # Discount factor

/* Set of States */

set X:= 0..N;

/* Sets of possible wagers. These are parameterized by the State */

set U{x in X} := 1..min(B,min(N-x,x));

/* Value function */

var V{X};

/* Exact Linear Program Equivalent of the DP */

minimize OBJ: sum{x in X} V[x] ;

s.t. C1 {x in 1..N-1, u in U[x]}: V[x] >=  a*(p*V[x+u] + q*V[x-u]);
s.t. C2: V[0] = 0;
s.t. C3: V[N] = N;

solve;

table tab1 {x in X} OUT "CSV" "output.csv" : 
    x~Stake, V[x]~ExpectedValue;

printf "               Goal = %4d", N;
printf "\n        Maximum Bet = %4d", B;
printf "\nWinning Probability = %8.3f", p ;
printf "\n Losing Probability = %8.3f", q ;
printf "\n    Discount Factor = %8.3f", a;
printf "\n\n %7s  %10s   %4s\n",'x','V[x]','u[x]: Optimal Wagers';
printf     " %7s  %10s   %4s"  ,'-','----','---------------------';
for {x in X}{
   printf "\n %7d  %10.4f  ",x, V[x];
   printf {u in U[x]: abs(-V[x] + a*(p*V[x+u] + q*V[x-u])) < 0.00001} " %3d",u;
}

end;

In [10]:

import pandas as pd
df = pd.read_csv('output.csv')
df['ExpectedValue'].plot()
xlabel('Stake')
ylabel('Expected Value')

Out[10]:

<matplotlib.text.Text at 0x109a55c90>

In [5]:

f = open("output.txt")
print(f.read())
f.close()

               Goal =  100
        Maximum Bet =  100
Winning Probability =    0.250
 Losing Probability =    0.750
    Discount Factor =    1.000

       x        V[x]   u[x]: Optimal Wagers
       -        ----   ---------------------
       0      0.0000  
       1      0.0073     1
       2      0.0291     2
       3      0.0695     3
       4      0.1166     4
       5      0.1771     5
       6      0.2781     6
       7      0.4037     7
       8      0.4663     8
       9      0.5601     9
      10      0.7085    10
      11      0.9041    11
      12      1.1124    12
      13      1.5680    12  13
      14      1.6146    11  14
      15      1.6953    10  15
      16      1.8652     9  16
      17      1.9826     8  17
      18      2.2406     7  18
      19      2.7385     6  19
      20      2.8340     5  20
      21      3.0495     4  21
      22      3.6164     3  22
      23      3.8496     2  23
      24      4.4497     1  24
      25      6.2500    25
      26      6.2719     1  24  26
      27      6.3374     2  23  27
      28      6.4586     3  22  28
      29      6.5997     4  21  29
      30      6.7814     5  20  30
      31      7.0843     6  19  31
      32      7.4610     7  18  32
      33      7.6489     8  17  33
      34      7.9304     9  16  34
      35      8.3755    10  15  35
      36      8.9623    11  14  36
      37      9.5873    12  13  37
      38     10.9539    12  38
      39     11.0939    11  39
      40     11.3360    10  40
      41     11.8457     9  41
      42     12.1978     8  42
      43     12.9717     7  43
      44     14.4654     6  44
      45     14.7520     5  45
      46     15.3984     4  46
      47     17.0991     3  47
      48     17.7988     2  48
      49     19.5991     1  49
      50     25.0000    50
      51     25.0219     1  49
      52     25.0874     2  48
      53     25.2086     3  47
      54     25.3497     4  46
      55     25.5314     5  45
      56     25.8343     6  44
      57     26.2110     7  43
      58     26.3989     8  42
      59     26.6804     9  41
      60     27.1255    10  40
      61     27.7123    11  39
      62     28.3373    12  38
      63     29.7039    12  13  37
      64     29.8439    11  14  36
      65     30.0860    10  15  35
      66     30.5957     9  16  34
      67     30.9478     8  17  33
      68     31.7217     7  18  32
      69     33.2154     6  19  31
      70     33.5020     5  20  30
      71     34.1484     4  21  29
      72     35.8491     3  22  28
      73     36.5488     2  23  27
      74     38.3491     1  24  26
      75     43.7500    25
      76     43.8156     1  24
      77     44.0123     2  23
      78     44.3757     3  22
      79     44.7992     4  21
      80     45.3441     5  20
      81     46.2530     6  19
      82     47.3830     7  18
      83     47.9468     8  17
      84     48.7913     9  16
      85     50.1265    10  15
      86     51.8868    11  14
      87     53.7618    12  13
      88     57.8617    12
      89     58.2818    11
      90     59.0081    10
      91     60.5372     9
      92     61.5935     8
      93     63.9151     7
      94     68.3963     6
      95     69.2561     5
      96     71.1951     4
      97     76.2972     3
      98     78.3963     2
      99     83.7972     1
     100    100.0000

In [ ]:

< Scenario Analysis for a Plant Expansion | Contents | Risk Averse Gambler >