```julia
function FP_ZSG(u, N; initial = [1 1])
    # Best response maps: the row player maximizes and the
    # column player minimizes the expected utility
    BR_row(u, σ) = argmax(u*σ)[1]
    BR_col(u, σ) = argmin(σ'*u)[2]

    # Size of strategy spaces
    (m₁, m₂) = size(u)

    # Pure strategies over time (row 1: P1, row 2: P2)
    t = zeros(Int, 2, N)

    # Beliefs over time (column n is the belief at time n)
    σ₁ = zeros(m₁, N)
    σ₂ = zeros(m₂, N)

    # Initial strategies
    t[:, 1] = initial

    # Initial beliefs put all mass on the initial strategies
    σ₁[t[1,1], 1] = 1
    σ₂[t[2,1], 1] = 1

    for n = 1:N-1
        # Each player best responds to its belief about the other
        t[1, n+1] = BR_row(u, σ₂[:, n])
        t[2, n+1] = BR_col(u, σ₁[:, n])

        # Recursive belief update: σᵢ,ₙ₊₁ = (n σᵢ,ₙ + e_{tᵢ,ₙ₊₁})/(n+1)
        σ₁[:, n+1] = n*σ₁[:, n]/(n+1)
        σ₁[t[1, n+1], n+1] += 1/(n+1)

        σ₂[:, n+1] = n*σ₂[:, n]/(n+1)
        σ₂[t[2, n+1], n+1] += 1/(n+1)
    end
    return (t, σ₁, σ₂)
end
```
4 Fictitious Play
Fictitious play is an iterative algorithm to compute the solution of a static game. It was first proposed by Brown (1951), who conjectured that it could be used to find the optimal solution of two player zero sum games, a result that was soon proved by Robinson (1951). Later it was shown to converge for \(2 × 2\) games (Miyasawa 1961), potential games (Monderer and Shapley 1996), and games with an interior evolutionarily stable equilibrium (Hofbauer 1995). For a detailed discussion, see Fudenberg and Levine (1998).
In this section, we will describe fictitious play for two player games and present the high-level idea of why it converges in the zero sum setting.
4.1 Fictitious play for two player zero sum games
Consider a two player ZSG that is played iteratively over an infinite horizon. Let \(\{t_{i,n}\}_{n \ge 1}\) denote the (pure) strategies played by player \(i\) at time \(n\). Define the sequence of mixed strategy profiles \(\{ σ_{i,n} \}_{n \ge 1}\) as follows: \[ σ_{i,n}(s_i) = \frac{1}{n} \sum_{k=1}^n \IND\{ t_{i,k} = s_i \}. \] These mixed strategies are called beliefs. Think of \(σ_{i,n}\) as the belief of player \(-i\) (in general, all players other than \(i\)) about how player \(i\) will play.
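As a quick sanity check, the beliefs can be computed directly from a history of plays. Here is a minimal sketch in Julia (the helper `belief` is illustrative and not part of the implementation above):

```julia
# Empirical belief: σᵢ,ₙ(sᵢ) = (1/n) Σₖ₌₁ⁿ 1{tᵢ,ₖ = sᵢ}
# `plays` is a vector of pure strategy indices and `m` = |Sᵢ|
belief(plays, m) = [count(==(s), plays) for s in 1:m] ./ length(plays)

belief([1, 1, 3, 3], 3)    # → [0.5, 0.0, 0.5]
```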
Then, at time \(n+1\), each player \(i\) chooses a pure strategy \(t_{i,n+1}\) that is a best response to its belief on how the others will play: \[ t_{i,n+1} \in \text{BR}_i(σ_{-i,n}) \] where \(\text{BR}_i(σ_{-i})\) denotes the best response of player \(i\) to the other players playing the mixed strategy \(σ_{-i}\).
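Note that the beliefs need not be recomputed from scratch at every step; the definition implies the recursion \[ σ_{i,n+1} = \frac{n}{n+1}\, σ_{i,n} + \frac{1}{n+1}\, e_{t_{i,n+1}}, \] where \(e_{s}\) denotes the unit vector that puts all mass on the pure strategy \(s\). This is exactly the belief update used in the implementation at the start of this section.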
This is a simple algorithm; the Julia function `FP_ZSG` shown at the start of this section implements it directly. We now test it over Rock-Paper-Scissors, whose utility matrix for the row player is given below.
 | $\mathsf{R}$ | $\mathsf{P}$ | $\mathsf{S}$ |
---|---|---|---|
$\mathsf{R}$ | $0$ | $-1$ | $1$ |
$\mathsf{P}$ | $1$ | $0$ | $-1$ |
$\mathsf{S}$ | $-1$ | $1$ | $0$ |
To fix ideas, we run the FP algorithm for \(N=4\) starting with the initial strategy of P1 playing rock and P2 playing scissors.
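This run can be reproduced with the implementation above; a minimal sketch, assuming rock, paper, and scissors are indexed 1, 2, 3:

```julia
# Row player's utility matrix for Rock-Paper-Scissors
u = [ 0 -1  1;
      1  0 -1;
     -1  1  0 ]

# P1 starts with rock (index 1), P2 with scissors (index 3)
(t, σ₁, σ₂) = FP_ZSG(u, 4; initial = [1 3])
```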
Time | $n=1$ | $n=2$ | $n=3$ | $n=4$ |
---|---|---|---|---|
Action P1 | R | R | S | S |
Action P2 | S | P | P | R |
Belief P1 | $\begin{bmatrix} 1.0 \\ 0.0 \\ 0.0 \end{bmatrix}$ | $\begin{bmatrix} 1.0 \\ 0.0 \\ 0.0 \end{bmatrix}$ | $\begin{bmatrix} 0.67 \\ 0.0 \\ 0.33 \end{bmatrix}$ | $\begin{bmatrix} 0.5 \\ 0.0 \\ 0.5 \end{bmatrix}$ |
Belief P2 | $\begin{bmatrix} 0.0 \\ 0.0 \\ 1.0 \end{bmatrix}$ | $\begin{bmatrix} 0.0 \\ 0.5 \\ 0.5 \end{bmatrix}$ | $\begin{bmatrix} 0.0 \\ 0.67 \\ 0.33 \end{bmatrix}$ | $\begin{bmatrix} 0.25 \\ 0.5 \\ 0.25 \end{bmatrix}$ |
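The beliefs in the table above are simply the columns of the returned matrices, e.g.,

```julia
# Columns of σ₁ are P1's beliefs at n = 1, …, 4
round.(σ₁, digits = 2)
# 3×4 Matrix{Float64}:
#  1.0  1.0  0.67  0.5
#  0.0  0.0  0.0   0.0
#  0.0  0.0  0.33  0.5
```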
If we run the algorithm for a longer time, say \(N=1000\), we observe that the beliefs are slowly converging to \([1/3, 1/3, 1/3]\), which is the optimal mixed strategy for both players in this example.
Time | $n=996$ | $n=997$ | $n=998$ | $n=999$ | $n=1000$ |
---|---|---|---|---|---|
Belief P1 | $\begin{bmatrix} 0.35 \\ 0.32 \\ 0.33 \end{bmatrix}$ | $\begin{bmatrix} 0.35 \\ 0.32 \\ 0.33 \end{bmatrix}$ | $\begin{bmatrix} 0.35 \\ 0.32 \\ 0.33 \end{bmatrix}$ | $\begin{bmatrix} 0.35 \\ 0.32 \\ 0.33 \end{bmatrix}$ | $\begin{bmatrix} 0.35 \\ 0.32 \\ 0.33 \end{bmatrix}$ |
Belief P2 | $\begin{bmatrix} 0.31 \\ 0.35 \\ 0.34 \end{bmatrix}$ | $\begin{bmatrix} 0.31 \\ 0.35 \\ 0.34 \end{bmatrix}$ | $\begin{bmatrix} 0.31 \\ 0.35 \\ 0.34 \end{bmatrix}$ | $\begin{bmatrix} 0.31 \\ 0.35 \\ 0.34 \end{bmatrix}$ | $\begin{bmatrix} 0.31 \\ 0.35 \\ 0.34 \end{bmatrix}$ |
However, the rate of convergence is pretty slow, as can be seen from Figure 4.1.
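For reference, a plot along the lines of Figure 4.1 can be generated as follows; this is a minimal sketch, assuming the Plots.jl package (not used elsewhere in these notes):

```julia
using Plots

(t, σ₁, σ₂) = FP_ZSG(u, 1000; initial = [1 3])

# Plot the three components of P1's belief against time,
# along with the optimal mixed strategy 1/3
plot(σ₁', label = ["R" "P" "S"], xlabel = "n", ylabel = "Belief P1")
hline!([1/3], linestyle = :dash, label = "1/3")
```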
4.2 Convergence Guarantee
Theorem 4.1 For a two player ZSG, the beliefs in fictitious play converge to an optimal strategy, i.e., \[ \lim_{n \to ∞} σ_{i,n} = σ_i^*, \quad \forall i \in \{1,2\} \] where \((σ_1^*, σ_2^*)\) is an optimal strategy profile.
This result was proved in Robinson (1951). The main idea of the proof is as follows: Define \[\begin{align*} \underline V_n(s_2) &= \frac{1}{n} \sum_{k=1}^n u(t_{1,k}, s_2), & \forall s_2 & \in \ALPHABET S_2 \\ \bar V_n(s_1) &= \frac{1}{n} \sum_{k=1}^n u(s_1, t_{2,k}), & \forall s_1 & \in \ALPHABET S_1 \end{align*}\] Recall that the expected utility \(U(σ_1, σ_2)\) is bilinear. Therefore, \[\begin{align*} \underline V_n(s_2) &= U(σ_{1,n}, s_2) \\ \bar V_n(s_1) &= U(s_1, σ_{2,n}). \end{align*}\] Moreover, we have that \[\begin{align*} \min_{s_2 \in \ALPHABET S_2} \underline V_n(s_2) &= \min_{s_2 \in \ALPHABET S_2} U(σ_{1,n}, s_2) \\ &\le \max_{σ_1 \in Δ(\ALPHABET S_1)} \min_{s_2 \in \ALPHABET S_2} U(σ_1, s_2) \\ &\stackrel{(a)}= \max_{σ_1 \in Δ(\ALPHABET S_1)} \min_{σ_2 \in Δ(\ALPHABET S_2)} U(σ_1, σ_2) \\ &= v \end{align*}\] where \((a)\) uses the fact that for a fixed \(σ_1\), the term \(U(σ_1, σ_2)\) is linear in \(σ_2\), therefore \(\min_{σ_2 \in Δ(\ALPHABET S_2)} U(σ_1, σ_2) = \min_{s_2 \in \ALPHABET S_2} U(σ_1, s_2)\). [This is the same idea we used when coming up with an algorithm to compute the value of a ZSG.]
By a symmetric argument, we have \[ \max_{s_1 \in \ALPHABET S_1} \bar V_n(s_1) \ge v. \] Thus, we have established that \[ \min_{s_2 \in \ALPHABET S_2} \underline V_n(s_2) \le v \le \max_{s_1 \in \ALPHABET S_1} \bar V_n(s_1). \]
The proof of Theorem 4.1 works by establishing that \[ \lim_{n \to ∞} \min_{s_2 \in \ALPHABET S_2} \underline V_n(s_2) = v \quad\text{and}\quad \lim_{n \to ∞} \max_{s_1 \in \ALPHABET S_1} \bar V_n(s_1) = v. \] See Robinson (1951) for details.
Exercises
Exercise 4.1 Run fictitious play for \(N=1000\) steps for each of the games given in Exercise 2.5. Plot the beliefs of each player as a function of time. (To simplify grading, please start with \((1,1)\) as the initial pure strategy profile.)
Part of the objective of this exercise is to completely understand fictitious play by coding the algorithm. So, instead of blindly copy-pasting the code provided above, implement the algorithm yourself from the description in this section. You may use any programming language to do this exercise.
The beliefs as a function of time are shown below.
Exercise 4.2 In this exercise, we describe a variation of fictitious play called stochastic fictitious play. This version is useful when we consider general sum games, but for the purpose of this exercise, we will continue to work with zero sum games but will use the extended notation \(u_1, u_2 \colon \ALPHABET S \to \reals\) for the utility functions of the two players (instead of the short notation \(u \colon \ALPHABET S \to \reals\) for the utility of the row player).
One of the technical challenges in analyzing fictitious play is that the best response is a set valued object. To circumvent this issue, one can replace the best response by the best response of the regularized utility function \[ \tilde U_i(σ_1, σ_2) = U_i(σ_1, σ_2) - θ \sum_{s_i \in \ALPHABET S_i} σ_i(s_i) \log σ_i(s_i) \] where \(θ > 0\) is a design parameter (often called temperature). Observe that the regularized utility function is strictly concave in \(σ_i\) (the second derivative of the regularizer with respect to \(σ_i(s_i)\) is \(-θ/σ_i(s_i) < 0\)); therefore the best response to the regularized utility function is unique.
Stochastic fictitious play is similar to regular fictitious play except that at each time, player \(i\) chooses a pure strategy \(t_{i,n}\) by sampling from the (mixed) best response to the regularized utility function \(\tilde U_i\).
Such a best response can be computed efficiently from the following result: Let \(μ_i = \arg \max_{σ_i \in Δ(\ALPHABET S_i)} \tilde U_i(σ_i, σ_{-i})\). Then \(μ_i\) can be computed in closed form as \[ μ_{i}(s_i) = \frac{ \exp\bigl(\frac{1}{θ} U_i(s_i, σ_{-i}) \bigr) } { \sum_{\tilde s_i \in \ALPHABET S_i} \exp\bigl(\frac{1}{θ} U_i(\tilde s_i, σ_{-i}) \bigr) }. \] This means that the players pick a pure strategy according to the distribution \(μ_i\).
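A minimal sketch of this smoothed best response in Julia, assuming the StatsBase package for weighted sampling (the function name `smoothed_BR` is illustrative):

```julia
using StatsBase

# Sample a pure strategy from μᵢ(sᵢ) ∝ exp(Uᵢ(sᵢ, σ₋ᵢ)/θ).
# `payoffs` is the vector of Uᵢ(sᵢ, σ₋ᵢ), one entry per sᵢ
# (e.g., u*σ₂ for the row player).
function smoothed_BR(payoffs, θ)
    w = exp.((payoffs .- maximum(payoffs)) ./ θ)   # shift by max for numerical stability
    μ = w ./ sum(w)                                # softmax distribution μᵢ
    return sample(1:length(μ), Weights(μ))         # sampled pure strategy
end
```

Subtracting the maximum before exponentiating leaves \(μ_i\) unchanged but avoids overflow when \(θ\) is small.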
Repeat Exercise 4.1 for stochastic fictitious play and \(θ \in \{1, 0.1, 0.01\}\). Compare with the solution that you obtained in Exercise 4.1.
The beliefs as a function of time for \(θ = 0.1\) are shown below.