This is part of a larger project on the practical application of quantal response equilibrium and related concepts of statistical game theory.

It is inspired in part by several recent papers by Camerer, Palfrey, and coauthors, developing various generalizations of quantal response equilibrium (including heterogeneous QRE) and links between QRE and cognitive hierarchy/level-k models.

These are exciting developments (in the view of this researcher) because this appears to offer a coherent and flexible statistical framework to use as a foundation for behavioural modeling in games, and a data-driven approach to behavioural game theory.

What we will do in this talk is to use the facilities in Gambit to explore how to fit QRE to small games. We'll focus on small games so that all computations can be done live in this Jupyter notebook - making this talk equally a useful practical tutorial, and automatically documenting all the computations done in support of the development of the lines of thought herein.

**Advertisement**: Thanks to some friends of Gambit at Cardiff (and support from a Google Summer of Code internship), Gambit is being integrated into the SAGE package for computational mathematics in Python (http://www.sagemath.org). Also as part of this, it will soon be possible to interact with Gambit (alongside all of the mathetmatical tools that SAGE bundles) using notebooks like this one, on a server via your web browser.

We begin by looking at some of the examples from the original Quantal Response Equilibrium paper:

McKelvey, R. D. and Palfrey, T. R. (1995) Quantal response equilibrium for normal form games. *Games and Economic Behavior* **10**: 6-38.

We will take the examples slightly out of order, and look first at the non-constant sum game of "asymmetric" matching pennies they consider from

Ochs, J. (1995) Games with unique mixed strategy equilibria: An experimental study. *Games and Economic Behavior* **10**: 174-189.

In [1]:

```
import math
import numpy
import scipy.optimize
import pandas
import matplotlib.pyplot as plt
%matplotlib inline
import gambit
```

In [2]:

```
gambit.__version__
```

Out[2]:

In [3]:

```
m1 = numpy.array([[ 4, 0 ], [ 0, 1 ]], dtype=gambit.Decimal) * gambit.Decimal("0.2785")
m2 = numpy.array([[ 0, 1 ], [ 1, 0 ]], dtype=gambit.Decimal) * gambit.Decimal("1.1141")
m1, m2
```

Out[3]:

Create a Gambit game object from these payoff matrices (`from_arrays`

is new in Gambit 15.1.0)

In [4]:

```
g = gambit.Game.from_arrays(m1, m2)
g
```

Out[4]:

In [5]:

```
gambit.nash.enummixed_solve(g)
```

Out[5]:

In [6]:

```
data1 = g.mixed_strategy_profile()
data1[g.players[0]] = [ 16*128*0.527, 16*128*(1.0-0.527) ]
data1[g.players[1]] = [ 16*128*0.366, 16*128*(1.0-0.366) ]
data1
```

Out[6]:

In [7]:

```
qre1 = gambit.nash.logit_estimate(data1)
qre1
```

Out[7]:

We successfully replicate the same value of lambda and the same fitted mixed strategy profile as reported by McKPal95. Note that the method Gambit uses to find the maximizer is quite accurate; you can trust these to about 8 decimal places.

*
What Gambit does is to use standard path-following methods to traverse the "principal branch" of the QRE correspondence, and look for extreme points of the log-likelihood function restricted to that branch. This is the "right way" to do it. Some people try to use fixed-point iteration or other methods at various values of lambda. These are wrong; do not listen to them!
*

What does not quite match up is the log-likelihood; McKPal95 claim a logL of -1747, whereas ours is much larger in magnitude. I have not been able to reconcile the difference in an obvious way; it is much too large simply to be rounding errors.

In [8]:

```
numpy.dot(list(data1), numpy.log(list(qre1)))
```

Out[8]:

In [9]:

```
data2 = g.mixed_strategy_profile()
data2[g.players[0]] = [ 16*128*0.573, 16*128*(1.0-0.573) ]
data2[g.players[1]] = [ 16*128*0.393, 16*128*(1.0-0.393) ]
qre2 = gambit.nash.logit_estimate(data2)
qre2
```

Out[9]:

And for periods 33-48:

In [10]:

```
data3 = g.mixed_strategy_profile()
data3[g.players[0]] = [ 16*128*0.610, 16*128*(1.0-0.610) ]
data3[g.players[1]] = [ 16*128*0.302, 16*128*(1.0-0.302) ]
qre3 = gambit.nash.logit_estimate(data3)
qre3
```

Out[10]:

In [11]:

```
data4 = g.mixed_strategy_profile()
data4[g.players[0]] = [ 4*128*0.455, 4*128*(1.0-0.455) ]
data4[g.players[1]] = [ 4*128*0.285, 4*128*(1.0-0.285) ]
qre4 = gambit.nash.logit_estimate(data4)
qre4
```

Out[11]:

As an estimation method, this works a treat if the game is small. However, under the hood, `logit_estimate`

is solving a fixed-point problem, and fixed-point problems are computationally difficult problems, meaning the running times does not scale well as you make the size of the problem (i.e., the game) larger. A practical feasible limit for estimating QRE is with games of perhaps 100 strategies in total - less if one also wants to recover estimates of e.g. risk-aversion parameters (of which more anon).

Recently, in several papers, Camerer, Palfrey, and coauthors have been using a method first proposed by Bajari and Hortacsu in 2005. The idea is that, given the actual observed data, under the assumption that the model (QRE) is correct, then one can compute a consistent estimator of the expected payoffs for each strategy from the existing data, and from there compute a consistent estimator of lambda.

This is an interesting development because it finesses the problem of having to compute fixed points, and therefore one can compute estimates of lambda (and more interestingly of other parameters) in a computationally efficient way.

To fix ideas, return to `data1`

. We can ask Gambit to compute the expected payoffs to players facing that empirical distribution of play:

In [12]:

```
p1 = data1.copy()
p1.normalize()
p1
```

Out[12]:

In [13]:

```
p1.strategy_values()
```

Out[13]:

In [14]:

```
resp = 1.84*numpy.array(p1.strategy_values(g.players[0]))
resp = numpy.exp(resp)
resp /= resp.sum()
resp
```

Out[14]:

To estimate $\lambda$, then, it is a matter of picking the $\lambda$ that maximises the likelihood of the actual choice probabilities against those computed via the logit rule. If we assume for the moment the same $\lambda$ for both players, we see this is going to be very efficient: the choice probabilities are monotonic in $\lambda$, and we have a one-dimensional optimisation problem, so it is going to be very fast to find the likelihood-maximising choice -- even if we had hundreds or thousands of strategies!

It takes just a few lines of code to set up the optimisation:

In [15]:

```
import scipy.optimize
def estimate_payoff_method(freqs):
def log_like(freqs, values, lam):
logit_probs = [ [ math.exp(lam*v) for v in player ] for player in values ]
sums = [ sum(v) for v in logit_probs ]
logit_probs = [ [ v/s for v in vv ]
for (vv, s) in zip(logit_probs, sums) ]
logit_probs = [ v for player in logit_probs for v in player ]
logit_probs = [ max(v, 1.0e-293) for v in logit_probs ]
return sum([ f*math.log(p) for (f, p) in zip(list(freqs), logit_probs) ])
p = freqs.copy()
p.normalize()
v = p.strategy_values()
res = scipy.optimize.minimize(lambda x: -log_like(freqs, v, x[0]), (0.1,),
bounds=((0.0, None),))
return res.x[0]
```

In [16]:

```
estimate_payoff_method(data1), qre1.lam
```

Out[16]:

In [17]:

```
estimate_payoff_method(data2), qre2.lam
```

Out[17]:

In [18]:

```
estimate_payoff_method(data3), qre3.lam
```

Out[18]:

In [19]:

```
estimate_payoff_method(data4), qre4.lam
```

Out[19]:

We get different estimates $\hat{\lambda}$ for the different methods. This is of course completely OK; they are different estimation procedures so we don't expect to get the same answer. But there are some patterns: in three of the four cases, the payoff method estimates a smaller $\hat{\lambda}$ than the fixed-point method, that is, it estimates less precise behaviour. In the last case, where the fixed-point method claims play is close to Nash, the payoff method claims it is closest to uniform randomisation.

We can pursue this idea further by some simulation exercises. Suppose we knew that players were actually playing a QRE with a given value of $\lambda$, which we set to $1.5$ for our purposes here, as being roughly in the range observed over most of the Ochs experiment. Using Gambit we can compute the corresponding mixed strategy profile.

In [20]:

```
qre = gambit.nash.logit_atlambda(g, 1.5)
qre
```

Out[20]:

`trials`

times, and then the results are summarised for us in a handy dataset (a `pandas.DataFrame`

).

In [21]:

```
def simulate_fits(game, lam, N, trials=100):
qre = gambit.nash.logit_atlambda(game, lam).profile
samples = [ ]
for sample in xrange(trials):
f = game.mixed_strategy_profile()
for player in game.players:
f[player] = numpy.random.multinomial(N, qre[player], size=1)[0]
samples.append(f)
labels = [ "p%ds%d" % (i, j)
for (i, player) in enumerate(game.players)
for (j, strategy) in enumerate(player.strategies) ]
return pandas.DataFrame([ list(freqs) +
[ gambit.nash.logit_estimate(freqs).lam,
estimate_payoff_method(freqs),
lam ]
for freqs in samples ],
columns=labels + [ 'fixedpoint', 'payoff', 'actual' ])
```

Let's have a look at 100 simulations of 50 plays each with $\lambda=1.5.$

In [22]:

```
simdata = simulate_fits(g, 1.5, 50)
simdata
```

Out[22]:

In [23]:

```
(simdata.fixedpoint>simdata.payoff).astype(int).mean()
```

Out[23]:

In [24]:

```
simdata.fixedpoint.describe()
```

Out[24]:

In [25]:

```
simdata.payoff.describe()
```

Out[25]:

In [26]:

```
def method_scatterplot(df, lam):
plt.plot([ row['payoff']/(1.0+row['payoff']) for (index, row) in df.iterrows() ],
[ row['fixedpoint']/(1.0+row['fixedpoint']) for (index, row) in df.iterrows() ],
'kp')
plt.plot([ 0, 1 ], [ 0, 1 ], 'k--')
plt.plot([ lam/(1.0+lam) ], [ lam/(1.0+lam) ], 'm*', ms=20)
plt.xticks([ 0.0, 0.25, 0.50, 0.75, 1.00 ],
[ '0.00', '0.50', '1.00', '2.00', 'Nash' ])
plt.xlabel('Estimated lambda via payoff method')
plt.yticks([ 0.0, 0.25, 0.50, 0.75, 1.00 ],
[ '0.00', '0.50', '1.00', '2.00', 'Nash' ])
plt.ylabel('Estimated lambda via fixed point method')
plt.show()
```

In [27]:

```
method_scatterplot(simdata, 1.5)
```

In [28]:

```
def correspondence_plot(game, df=None):
corresp = gambit.nash.logit_principal_branch(game)
plt.plot([ x.profile[game.players[0]][0] for x in corresp ],
[ x.profile[game.players[1]][0] for x in corresp ], 'k--')
plt.xlabel("Pr(Player 0 strategy 0)")
plt.ylabel("Pr(Player 1 strategy 0)")
for lam in numpy.arange(1.0, 10.1, 1.0):
qrelam = gambit.nash.logit_atlambda(game, lam).profile
plt.plot([ qrelam[game.players[0]][0] ],
[ qrelam[game.players[1]][0] ], 'kd')
if df is None: return
for (index, sample) in df.iterrows():
lam = sample['payoff']
fitted = gambit.nash.logit_atlambda(game, lam).profile
plt.plot([ sample['p0s0']/(sample['p0s0']+sample['p0s1']),
fitted[game.players[0]][0] ],
[ sample['p1s0']/(sample['p1s0']+sample['p1s1']),
fitted[game.players[1]][0] ], 'r')
lam = sample['fixedpoint']
fitted = gambit.nash.logit_atlambda(game, lam).profile
plt.plot([ sample['p0s0']/(sample['p0s0']+sample['p0s1']),
fitted[game.players[0]][0] ],
[ sample['p1s0']/(sample['p1s0']+sample['p1s1']),
fitted[game.players[1]][0] ], 'b')
```

In [29]:

```
correspondence_plot(g)
```

In [30]:

```
correspondence_plot(g, simdata)
```

It is reasonable to ask whether this biased performance is due to some special characteristic of the games studied by Ochs. We can look to some other games from McKelvey-Palfrey (1995) to investigate this further.

The second case study is a constant-sum game with four strategies for each player reported in

O'Neill, B. (1987) Nonmetric test of the minimax theory of two-person zerosum games. *Proceedings of the National Academy of Sciences of the USA*, **84**:2106-2109.

In [31]:

```
matrix = numpy.array([ [ 5, -5, -5, -5 ],
[ -5, -5, 5, 5 ],
[ -5, 5, -5, 5 ],
[ -5, 5, 5, -5 ] ], dtype=gambit.Rational)
matrix
```

Out[31]:

We again follow McKPal95 and translate these payoffs into 1982 cents for comparability.

In [32]:

```
matrix *= gambit.Rational(913, 1000)
game = gambit.Game.from_arrays(matrix, -matrix)
```

We simulate our data around the $\hat\lambda\approx 1.3$ reported by McKPal95.

In [33]:

```
simdata = simulate_fits(g, 1.3, 50)
```

In [34]:

```
(simdata.fixedpoint>simdata.payoff).astype(int).mean()
```

Out[34]:

In [35]:

```
simdata.fixedpoint.describe()
```

Out[35]:

In [36]:

```
simdata.payoff.describe()
```

Out[36]:

In [37]:

```
method_scatterplot(simdata, 1.5)
```

Another game considered by McKPal95 is a 5x5 constant-sum game from

Rapoport, A. and Boebel, R. (1992) Mixed strategies in strictly competitive games: A further test of the minimax hypothesis. *Games and Economic Behavior* **4**: 261-283.

We put the estimation approaches through their paces, around the $\hat\lambda \approx 0.25$ reported by McKPal95.

In [38]:

```
matrix = numpy.array([ [ 10, -6, -6, -6, -6 ],
[ -6, -6, 10, 10, 10 ],
[ -6, 10, -6, -6, 10 ],
[ -6, 10, -6, 10, -6 ],
[ -6, 10, 10, -6, -6 ] ], dtype=gambit.Rational)
matrix *= gambit.Rational(713, 1000)
game = gambit.Game.from_arrays(matrix, -matrix)
simdata = simulate_fits(g, 0.25, 50)
```

In [39]:

```
simdata.fixedpoint.describe()
```

Out[39]:

In [40]:

```
simdata.payoff.describe()
```

Out[40]:

In [41]:

```
method_scatterplot(simdata, 0.25)
```

While it appears, from these (still preliminary) results, that the payoff method for estimating $\lambda$ is biased towards estimates which are more "noisy" than the true behaviour, this may not in practice be much of a concern.

In the view of this researcher, the real strength of the logit model is in using it to estimate other parameters of interest. The logit model has some attractive foundations in terms of information theory which make it ideal for this task (in addition to being computationally quite tractable).

It is precisely in these applications where the payoff-based estimation approach is most attractive. The fixed-point method becomes progressively more computationally infeasible as the game gets larger, or as the number of parameters to be considered grows. The payoff approach on the other hand scales much more attractively. What we really want to know, then, is not so much whether $\hat\lambda$ is biased, but whether estimates of *other* parameters of interest are systematically biased as well.

We pick this up by looking at some examples from

Goeree, Holt, and Palfrey (2002), Risk averse behavior in generalized matching pennies games.

In [42]:

```
matrix1 = numpy.array([ [ 200, 160 ], [ 370, 10 ]], dtype=gambit.Decimal)
matrix2 = numpy.array([ [ 160, 10 ], [ 200, 370 ]], dtype=gambit.Decimal)
game = gambit.Game.from_arrays(matrix1, matrix2)
```

In [43]:

```
def transform_matrix(m, r):
r = gambit.Decimal(str(r))
return (numpy.power(m, 1-r) - numpy.power(10, 1-r)) / \
(numpy.power(370, 1-r) - numpy.power(10, 1-r))
```

In [44]:

```
def estimate_risk_fixedpoint_method(matrix1, matrix2, freqs):
def log_like(r):
tm1 = transform_matrix(matrix1, r)
tm2 = transform_matrix(matrix2, r)
g = gambit.Game.from_arrays(tm1, tm2)
profile = g.mixed_strategy_profile()
for i in xrange(len(profile)):
profile[i] = freqs[i]
qre = gambit.nash.logit_estimate(profile)
logL = numpy.dot(numpy.array(list(freqs)), numpy.log(list(qre.profile)))
return logL
results = [ (x0, log_like(x0))
for x0 in numpy.linspace(0.01, 0.99, 100) ]
results.sort(key=lambda r: -r[1])
return results[0][0]
```

In [45]:

```
def estimate_payoff_method(freqs):
def log_like(freqs, values, lam):
logit_probs = [ [ math.exp(lam*v) for v in player ] for player in values ]
sums = [ sum(v) for v in logit_probs ]
logit_probs = [ [ v/s for v in vv ]
for (vv, s) in zip(logit_probs, sums) ]
logit_probs = [ v for player in logit_probs for v in player ]
logit_probs = [ max(v, 1.0e-293) for v in logit_probs ]
return sum([ f*math.log(p) for (f, p) in zip(list(freqs), logit_probs) ])
p = freqs.copy()
p.normalize()
v = p.strategy_values()
res = scipy.optimize.minimize(lambda x: -log_like(freqs, v, x[0]), (0.1,),
bounds=((0.0, 10.0),))
return log_like(freqs, v, res.x[0])
def estimate_risk_payoff_method(matrix1, matrix2, freqs):
def log_like(r):
tm1 = transform_matrix(matrix1, r)
tm2 = transform_matrix(matrix2, r)
g = gambit.Game.from_arrays(tm1, tm2)
profile = g.mixed_strategy_profile()
for i in xrange(len(profile)):
profile[i] = freqs[i]
logL = estimate_payoff_method(profile)
return logL
results = [ (x0, log_like(x0))
for x0 in numpy.linspace(0.01, 0.99, 100) ]
results.sort(key=lambda r: -r[1])
return results[0][0]
```

In [46]:

```
def simulate_fits(game, matrix1, matrix2, r, lam, N, trials=100):
qre = gambit.nash.logit_atlambda(game, lam).profile
samples = [ ]
for sample in xrange(trials):
f = game.mixed_strategy_profile()
for player in game.players:
f[player] = numpy.random.multinomial(N, qre[player], size=1)[0]
samples.append(f)
labels = [ "p%ds%d" % (i, j)
for (i, player) in enumerate(game.players)
for (j, strategy) in enumerate(player.strategies) ]
return pandas.DataFrame([ list(freqs) +
[ estimate_risk_fixedpoint_method(matrix1, matrix2, freqs),
estimate_risk_payoff_method(matrix1, matrix2, freqs),
r ]
for freqs in samples ],
columns=labels + [ 'r_fixedpoint', 'r_payoff', 'actual' ])
```

In [47]:

```
game = gambit.Game.from_arrays(transform_matrix(matrix1, gambit.Decimal("0.44")),
transform_matrix(matrix2, gambit.Decimal("0.44")))
```

In [48]:

```
simdata = simulate_fits(game, matrix1, matrix2, 0.44, 1.0/0.150, 100, trials=30)
simdata
```

Out[48]:

In [49]:

```
simdata.r_fixedpoint.describe()
```

Out[49]:

In [50]:

```
simdata.r_payoff.describe()
```

Out[50]:

- The payoff-based approach to estimating QRE is attractive and offers hope of fitting much richer models.
- Preliminary indications are that estimated QRE parameters $\hat\lambda$ obtained by using the payoff method are biased (in the direction of understating precision), so these should be used with care in small samples.
- At least in the generalized matching pennies game, the bias in $\hat\lambda$ does not seem to result in any bias in risk attitude estimates.

In [ ]:

```
```