$\mathsf{H}$ | $\mathsf{T}$ | |||
$\mathsf{H}$ | $1$ | $-1$ | $-1$ | $1$ |
$\mathsf{T}$ | $-1$ | $1$ | $1$ | $-1$ |
2 Zero-sum games
Recall the game of matching pennies.
This game has the property that for any strategy profile \((s_1, s_2) \in \ALPHABET S\), \[ u_1(s_1, s_2) + u_2(s_1, s_2) = 0. \] Games with such property are called zero-sum games. In this course, we will focus on two player zero sum games (ZSG).
- Some remarks
-
- Many classical games such sa chess, checkers, Go, etc. are two player ZSGs.
ZSGs are a highly restrictive and are, therefore, much easier to analyze as compared to general-sum games.
ZSGs emerge in many engineering applications when we consider worst case performance. Such scenarios can be considered as games against nature.
When we consider the maxmin value (or the security level) of player in general-sum game, we are effectively doing worst case analysis. This means that each player is considering an auxiliary ZSG in which all other players collude and act as a single opponnet who try to minimize the payff of the player. Thus, ZSGs can be useful even when analyzing non-zero-sum games.
2.1 Simplified notation
For two player ZSGs, we can simplify the notation. Since \(u_2(s_1, s_2) = -u_1(s_1, s_2)\), we can just specify the payoff of player 1 instead of specifying the payoff of both players. For instance, the matching pennies game can be represented as
$\mathsf{H}$ | $\mathsf{T}$ | |
$\mathsf{H}$ | $1$ | $-1$ |
$\mathsf{T}$ | $-1$ | $1$ |
Here we can think of \(P_1\) as the maximizing player and \(P_2\) as the minimizing player. Thus, instead of a bimatrix representation, we will specify the payoffs by a matrix and assume that the row player is the maximizer and the column player is the minimizer.
Moreover, we will use \(u(s_1, s_2)\) to denote \(u_1(s_1, s_2)\) and \(-u_2(s_1, s_2)\).
Now recall that the maxmin levels of the two players: \[\begin{align*} \underline v_1 &= \max_{s_1 \in \ALPHABET S_1} \min_{s_2 \in \ALPHABET S_2} u_1(s_1, s_2) = \textcolor{red}{ \max_{s_1 \in \ALPHABET S_1} \min_{s_2 \in \ALPHABET S_2} u(s_1, s_2)} \\ \underline v_2 &= \max_{s_2 \in \ALPHABET S_2} \min_{s_1 \in \ALPHABET S_1} u_2(s_1, s_2) = \textcolor{red}{ - \min_{s_2 \in \ALPHABET S_2} \max_{s_1 \in \ALPHABET S_1} u(s_1, s_2)} \end{align*}\]
For ZSGs we use a simpler notation and define \[\begin{align*} \text{Maxmin value:} && \underline v &= \max_{s_1 \in \ALPHABET S_1} \min_{s_2 \in \ALPHABET S_2} u(s_1, s_2) \\ \text{Minmax value:} && \bar v &= \min_{s_2 \in \ALPHABET S_2} \max_{s_1 \in \ALPHABET S_1} u(s_1, s_2) \end{align*}\]
Example 2.1 Find the maxmin and minmax value of matching pennies.
Theorem 2.1 In a two player ZSG, \[\underline v \le \bar v.\]
Let \(s_1^*\) be a maxmin strategy for \(P_1\) and \(s_2^*\) be a minmax strategy for \(P_2\). Then, by definition of maxmin strategy, we have \[ \underline v = \max_{s_1 \in \ALPHABET S_1} \min_{s_2 \in \ALPHABET S_2} u(s_1, s_2) = \min_{s_2 \in \ALPHABET S_2} u(\textcolor{red}{s^*_1}, s_2). \] Thus, for any \(s_2 \in \ALPHABET S_2\), we have \[ \underline v \le u(s_1^*, s_2). \] Hence, by taking \(s_2 = s_2^*\), we get \[\begin{equation}\label{eq:maxmin-bound} \underline v \le u(s_1^*, s_2^*). \end{equation}\]
Similarly, by the definition of minmax strategy, we have \[ \bar v = \min_{s_2 \in \ALPHABET S_2} \max_{s_1 \in \ALPHABET S_1} u(s_1, s_2) = \max_{s_1 \in \ALPHABET S_1} u(s_1, \textcolor{red}{s^*_2}) \] Thus, for any \(s_1 \in \ALPHABET S_1\), we have \[ \bar v \ge u(s_1, s_2^*). \] Hence, by taking \(s_1 = s_1^*\), we get \[\begin{equation}\label{eq:minmax-bound} \bar v \ge u(s_1^*, s_2^*). \end{equation}\]
Combining \(\eqref{eq:maxmin-bound}\) and \(\eqref{eq:minmax-bound}\), we get \[ \underline v \le u(s_1^*, s_2^*) \le \bar v. \]
As shown in Example 2.1, in general the inequality is strict. However, for some games, the maxmin value is equal to the minmax value as seen from the following example.
Example 2.2 Find the maxmin and minmax value of the following game.
$\mathsf{L}$ | $\mathsf{C}$ | $\mathsf{R}$ | |
$\mathsf{T}$ | $2$ | $-1$ | $-2$ |
$\mathsf{M}$ | $1$ | $0$ | $1$ |
$\mathsf{B}$ | $-2$ | $-1$ | $2$ |
Definition 2.1 In a two player ZSG if \(\underline v = \bar v\), then the quantity \[ v \coloneqq \underline v = \bar v \] is called the value of the game. Any \((\text{maxmin}, \text{minmax})\) strategy of the players is called the optimal strategy profile.
As we have seen earlier, a two player ZSG may not have a value in pure strategies. In the sequel, we will show that if we allow mixed strategies, then every finite two player ZSG has a value!
2.2 Zero sum games on the unit square
As an intermediate step to mixed strategies, we consider two player ZSGs on the unit square, i.e., games on which \(\ALPHABET S_1 = \ALPHABET S_2 = [0,1]\).
For the ease of notation, we will use \(\ALPHABET X = [0, 1]\) and \(\ALPHABET Y = [0, 1]\) to denote the strategy spaces of the player.
Example 2.3 Consider a two player ZSG on the unit square with \[ u(x,y) = 4xy - 2x - y + 3, \quad \forall x \in \ALPHABET X, y \in \ALPHABET Y. \] Does this game have a value? If so, find all optimal strategy profiles
To check if the game has a value, we will compute the maxmin value \[ \underline v = \max_{x \in \ALPHABET X} \min_{y \in \ALPHABET Y} u(x,y) \] and the minmax value \[ \bar v = \min_{y \in \ALPHABET Y} \max_{x \in \ALPHABET X} u(x,y) \] and check if they are equal.
Consider, \[\begin{align*} \min_{y \in \ALPHABET Y} u(x,y) &= \min_{y \in [0,1]} \bigl[ 4xy - 2x - y - 4 \bigr] \\ &= \min_{y \in [0,1]} \bigl[ (4x -1) y - 2x + 3 \bigr]. \end{align*}\] For a fixed \(x\), this is a linear function in \(y\) and the minimizer depends on the slope.
- If the slope is positive, then the function is inreasing in \(y\) and the miniizer is \(0\).
- If the slope is negative, then the function is decreasing in \(y\) and the minimizer is \(1\).
Therefore, we have the following: \[ \min_{y \in [0,1]} u(x,y) = \begin{cases} 2x + 2, & \text{if } x < \tfrac 14 \\ 2.5, & \text{if } x = \tfrac 14 \\ -2x + 3, & \text{if } x > \tfrac 14 \end{cases} \]
The plot of \(\min_{y \in [0,1]} u(x,y)\) is shown below in Figure 2.1.
As we can see from the plot, \[ \bbox[5pt,border: 1px solid]{ \underline v = \max_{x \in [0,1]} \min_{y \in [0,1]} u(x,y) = 2.5. } \]
Now consider \[\begin{align*} \max_{x \in \ALPHABET X} u(x,y) &= \max_{x \in [0,1]} \bigl[ 4xy - 2x - y + 3 \bigr] \\ &= \max_{x \in [0,1]} \bigl[ (4y - 2) x - y + 3 \bigr] \end{align*}\]
By the same argument as before, we have \[ \max_{x \in [0,1]} u(x,y) = \begin{cases} -y + 3, & \text{if } y < \tfrac 12 \\ 2.5, & \text{if } y = \tfrac 12 \\ 3y + 1, & \text{if } y > \tfrac 12 \end{cases} \]
The plot of \(\max_{x \in [0,1]} u(x,y)\) is shown below Figure 2.2.
As we can see from the plot, \[ \bbox[5pt,border: 1px solid]{ \bar v = \min_{y \in [0,1]} \max_{x \in [0,1]} u(x,y) = 2.5. } \]
Since \(\underline v = \bar v = 2.5\), the game has a value of \(2.5\). The unique optimal strategy profile is \((\tfrac 14, \tfrac 12)\).
The previous example is an illustration of what is known as the :Minimax Theorem. One of the foundational results of Game Theory is the von Neumann minimax theorem (von Neumann 1928).
Recall that a function \(f \colon \ALPHABET X × \ALPHABET Y \to \reals\) is called bilinear if it is linear in each argument separately, i.e.,
- \(f(x_1 + x_2, y) = f(x_1,y) + f(x_2, y)\) and \(f(α x, y) = α f(x,y)\);
- \(f(x, y_1 + y_2) = f(x, y_1) + f(x, y_2)\) and \(f(x, αy) = α f(x,y)\).
Note that the utility function in Example 2.3 is bilinear.
Theorem 2.2 (von Neumann’s Minimax Theorem) Let \(\ALPHABET X\) and \(\ALPHABET Y\) be compact (i.e., closed and bounded) subsets of Eucledian spaces and \(f \colon \ALPHABET X × \ALPHABET Y \to \reals\) be a bilinear function. Then, \[ \max_{x \in \ALPHABET X} \min_{y \in \ALPHABET Y} f(x,y) = \min_{y \in \ALPHABET Y} \max_{x \in \ALPHABET X} f(x,y). \]
Some remarks:
For a facinating historial discussion of this result, see Kjeldsen (2001).
For a self contained proof of the result using elementary ideas from convex analysis, see Neumann and Morgenstern (1944), Chapter 3 and Hespanha (2017), Chapter 5.
For a short proof based on :Brouwer’s fixed point theorem, see Raghavan (1994).
The minmax theorem is equivalent to the duality theorem of linear programming. See Dantzig (1951) and Stengel (2024).
In Example 2.3, since the utility function is bilinear, we could have simply used Theorem 2.2 to conclude that the game has a value, without doing any calculations.
A useful generalization of von Neumann’s minimax theorem is the following.
Theorem 2.3 Let \(\ALPHABET X\) and \(\ALPHABET Y\) be compact subsets of Eucledian space. If \(f \colon \ALPHABET X × \ALPHABET Y \to \reals\) is concave-convex, i.e.,
- for any fixed \(y\), \(f(⋅, y) \colon \ALPHABET X \to \reals\) is concave;
- for any fixed \(x\), \(f(x, ⋅) \colon \ALPHABET Y \to \reals\) is convex.
Then, \[ \max_{x \in \ALPHABET X} \min_{y \in \ALPHABET Y} f(x,y) = \min_{y \in \ALPHABET Y} \max_{x \in \ALPHABET X} f(x,y). \]
For a generalization that removes the compactness assumption, see Sion (1958). For a general discussion of various fixed point theorems and their relevance to zero sum games, see Raghavan (1994).
2.3 Mixed strategies in finite games
We now revist the notion of mixed strategies in finite games. Recall that given a finite game \(\mathscr{G} = \langle N, (\ALPHABET S_i)_{i \in N}, (u_i)_{i \in N} \rangle\) and mixed strategies \(σ = (σ_i)_{i \in N}\) for the players, the expected utility is defined as \[\begin{equation} \label{eq:expected-utility} U_i(σ) = \sum_{s_1 \in \ALPHABET S_1} \cdots \sum_{s_N \in \ALPHABET S_N} σ_1(s_1) \cdots σ_N(s_N) u_i(s_1, \dots, s_N) \end{equation}\] or more compactly for two player games \[ U_i(σ_1, σ_2) = \sum_{s_1 \in \ALPHABET S_1} \sum_{s_2 \in \ALPHABET S_2} σ_1(s_1) σ_2(s_2) u_i(s_1, s_2). \]
A game in which players are playing mixed strategies may be viewed as another game, called the mixed extension, in which all players have continuous action spaces and are playing pure strategies.
Definition 2.2 (Mixed extension) Given a game \(\mathscr{G} = \langle N, (\ALPHABET S_i)_{i \in N}, (u_i)_{i \in N} \rangle\), its mixed extension is a game \(\mathscr{G}^* = \langle N, ( Δ(\ALPHABET S_i) )_{i \in N}, (U_i)_{i \in N} \rangle\) where
- \(Δ(\ALPHABET S_i)\) denotes the set of probability distributions over \(\ALPHABET S_i\)
- \(U_i \colon \prod_{j \in N} Δ(\ALPHABET S_j) \to \reals\) is the expected utility function defined in \(\eqref{eq:expected-utility}\).
For a discussion of different interpretations of mixed strategies, see Osborne and Rubinstein (1994, Sec 3.2).
Observe that if the original game \(\mathscr{G}\) is a two player ZSG, then its mixed extension \(\mathscr{G}^*\) is also a two player ZSG because for any mixed strategy \((σ_1, σ_2)\) \[\begin{align*} U_1(σ_1, σ_2) + U_2(σ_1, σ_2) &= \sum_{s_1 \in \ALPHABET S_1} \sum_{s_2 \in \ALPHABET S_2} σ_1(s_1) σ_2(s_2) \underbrace{\bigl[ u_1(s_1,s_2) + u_2(s_1, s_2) \bigr]}_{=0} \\ &= 0. \end{align*}\] Thus, we may simply use \(U\) to denote the expected utility of \(P_1\) with the understanding that the expected utility of \(P_2\) is \(-U\).
Furthermore, observe that \(U\) is bilinear. Thus, we can conclude the following from Theorem 2.2.
Theorem 2.4 For any finite game \(\mathscr{G}\), its mixed extesion \(\mathscr{G}^*\) has a value, which is called value of \(\mathscr{G}\) in mixed strategies.
Thus, we finally have a solution concept that always exists, albeit for a special subclass of games.
2.4 Properties of optimal strategy profiles
The value of a game has a nice geometric interpretation.
Definition 2.3 Let \(\ALPHABET X\) and \(\ALPHABET Y\) be sets and \(f \colon \ALPHABET X × \ALPHABET Y \to \reals\). A point \((x^*, y^*) \in \ALPHABET X × \ALPHABET Y\) is said to be a saddle point of \(f\) if \[\begin{align*} f(x^*, y^*) &\ge f(x, y^*), & \forall x &\in \ALPHABET X \\ f(x^*, y^*) &\le f(x^*, y), & \forall y &\in \ALPHABET Y \end{align*}\]
The term saddle point comes from the fact that the typical two dimensional example of a function with a saddle point looks like a saddle of a horse (curves up in one direction and curves down in the other).
A key property of an optimal strategy profile is the following.
Theorem 2.5 In a two player zero sum game, \((σ_1^*, σ_2^*)\) is a saddle point of the expected utility function \(U\) if and only if \(σ_1^*\) is an optimal strategy for player 1 and \(σ_2^*\) is an optimal strategy for player 2.
In this case \(U(σ_1^*, σ_2^*)\) is the value of the game.
This follows immediately from the definition of saddle point and that of the value of a game.
In general, a ZSG can have more than one optimal strategy profile. Suppose \((σ_1^*, σ_2^*)\) and \((τ_1^*, τ_2^*)\) where \(σ_1^* \neq τ_1^*\) and \(σ_2^* \neq τ_2^*\) both satisfy \[ \max_{σ_1 \in Δ(\ALPHABET S_1)} \min_{σ_2 \in Δ(\ALPHABET S_2)} U(σ_1, σ_2) = \min_{σ_2 \in Δ(\ALPHABET S_2)} \max_{σ_1 \in Δ(\ALPHABET S_1)} U(σ_1, σ_2) = v. \] Then,
\(U(σ_1^*, σ_2^*) = U(τ_1^*, τ_2^*)\)
Hence, if a game has a value and multiple optimal strategy profiles, then each optimal strategy profiles gives the value of the game.
\(U(σ_1^*, τ_2^*) = U(τ_1^*, σ_2^*) = v\)
Hence, it does not matter which optimal strategy is chosen by players 1 and 2. Every combination of optimal strategies is an optimal strategy profile.
The proof of this statement is left as an exercise (see Exercise 2.3).
2.5 Computing the value of ZSG
We now consider how to compute the value of ZSG in mixed strategies. The key simplification is as follows. When computing the maxmin value \[ \underline v = \max_{σ_1 \in Δ(\ALPHABET S_1)} \min_{σ_2 \in Δ(\ALPHABET S_2)} U(σ_1, σ_2), \] consider the inner minimization problem for a fixed \(σ_1 \in Δ(\ALPHABET S_1)\). In this case, the set of best resposes of \(P_2\) will always include pure strategies. This is because, we can write \[ U(σ_1, σ_2) = \sum_{s_2 \in \ALPHABET S_2} σ_2(s_2) U(σ_1, s_2). \] Thus, if the minimizer \(σ_2\) gives positive weights to pure strategies \(s_{2,i}, s_{2,j}, s_{2,k}\), etc., each of them must have the same \(U(σ_1, s_2)\); otherwise, we can omit putting positive weight on the pure strategy which has strictly large payoff and reduce the expected utility, which is not possible because \(σ_2\) is a minimizer.
Therefore, when computing the maxmin value, we may consider \[ \underline v = \max_{σ_1 \in Δ(\ALPHABET S_1)} \textcolor{red}{ \min_{s_2 \in \ALPHABET S_2} } U(σ_1, \textcolor{red}{s_2}). \]
By a similar argument, for minmax value, we may consider \[ \bar v = \min_{σ_2 \in Δ(\ALPHABET S_2)} \textcolor{red}{ \max_{s_1 \in \ALPHABET S_1} } U(\textcolor{red}{s_1}, σ_2). \]
To illustrate how this simplification helps, we start with an example of a \(2 × 2\) game. In this case, the mixed extension is similar to the game on the unit square, which we have considered earlier.
Example 2.4 Find the value in mixed strategies of the game below:
$\mathsf{L}$ | $\mathsf{R}$ | |
$\mathsf{T}$ | $3$ | $0$ |
$\mathsf{B}$ | $1$ | $2$ |
Strategy for row player
We first consider the row player. Suppose the row player is playing \(σ_1 = (p, 1-p)\), i.e., it chooses action \(\mathsf{T}\) with probability \(p\) and action \(\mathsf{B}\) with probability \(1-p\). Then,
\[\begin{alignat*}{2} U_1(σ_1, \mathsf{L}) &= 3p + (1-p) &&= 2p + 1, \\ U_1(σ_1, \mathsf{R}) &= 2(1-p) &&= -2p + 2. \end{alignat*}\]
The two payoffs are shown in Figure 2.3.
We now consider \[ \underline v = \max_{p \in [0, 1]} \min \bigl{ 2p + 1, -2p + 2 \bigr\}. \] Note that the inner minimization can be easily carried out graphically. The minimization of the two curves is what is called a lower envelop which is shown in red in Figure 2.4.
Since each curve is linear, the lower envelop is convex and its maximum value is the peak point, which is also the point of intersection of the two curves and is given by \[ 2p + 1 - -2p + 2 \implies \bbox[5pt,border: 1px solid]{ p = \tfrac{1}{4} }. \]
Thus, \(σ_1^* = (\frac 14, \frac 34)\) and \[ \underline v = U(σ_1^*, \mathsf{L}) = U(σ_1^*, \mathsf{R}) = \tfrac 32. \]
Strategy for column player
Now, let’s repeat the calculations for the column player. Suppose the column player is playing \(σ_2 = (q, 1-q)\), i.e., it chooses action \(\mathsf{L}\) with probability \(q\) and action \(\mathsf{R}\) with probability \(1-q\). Then,The two payoffs are shown in Figure 2.5.
We now consider \[ \bar v = \min_{q \in [0, 1]} \max \bigl\{3q, -q + 2 \bigr\}. \] As before, the inner maximization can be easily carried out graphically. The maximization of the two curves is called an upper envelop which is shown in red in Figure 2.6.
Since each curve is linear, the upper envelop is concave and its minimum value is the lowest point, which is also the point of intersection of the two curves and is given by \[ 3q = -q + 2 \implies \bbox[5pt,border: 1px solid]{ q = \tfrac{1}{2} }. \]
Thus, \(σ_2^* = (\frac 12, \frac 12)\) and \[ \bar v = U(\mathsf{T}, σ_2^*) = U(\mathsf{B}, σ_2^*) = \tfrac 32. \]
Note that, as expected, \(\bar v = \underline v\).
The above idea works for general \(2 × 2\) games. See Exercise 2.1 for some examples. We now show that the idea also be extended to general \(2 × n\) games as well.
Example 2.5 Find the value in mixed strategies of the game below:
$\mathsf{L}$ | $\mathsf{C}$ | $\mathsf{R}$ | |
$\mathsf{T}$ | $2$ | $5$ | $-1$ |
$\mathsf{B}$ | $0$ | $-2$ | $5$ |
Strategy for row player
The row player has two pure strategies, so we use the same idea as before. Suppose the row player is playing \(σ_1 = (p, 1-p)\), i.e., it chooses action \(\mathsf{T}\) with probability \(p\) and action \(\mathsf{B}\) with probability \(1-p\). Then,
\[\begin{alignat*}{2} U_1(σ_1, \mathsf{L}) &= 2p + (1-p) &&= 2p, \\ U_1(σ_1, \mathsf{C}) &= 5p - 2(1-p) &&= 7p - 2, \\ U_1(σ_1, \mathsf{R}) &= -p + 5(1-p) &&= -6p + 5. \end{alignat*}\]
As before, we can plot these functions as shown in Figure 2.7, where the lower concave envelop is shown in red.
Note that the maximum of the lower convex envelop is the intersection of \(U(σ_1, \mathsf{L})\) and \(U(σ_1, \mathsf{R})\) which happens when \[ 2p = -6p + 5 \implies \bbox[5pt,border: 1px solid]{ p = \tfrac{5}{8} }. \] Thus, \(σ_1^* = (\tfrac 58, \tfrac 38)\) and \[ v = \underline v = U(σ_1^*, \mathsf{L}) = U(σ_1^*, \mathsf{R}) = \tfrac 54. \]
Observe that \(U(σ_1^*, \mathsf{C}) = \tfrac{19}{8} > v\). Thus, if the row player plays \(σ_1^*\), then the column player will either play \(\mathsf{L}\) or \(\mathsf{R}\), but not \(\mathsf{C}\).
Strategy for column player
We now consider the column player. We cannot follow the previous procedure because we will need to construct \(U(⋅, σ_2)\), where \(σ_2\) lies in a subset of \(\reals^2\). So, we take an alternative approach.
Suppose \(σ_1 = (p, 1-p)\) and \(σ_2\) are mixed strategies. For the ease of notation, we will write \(U(σ_1, σ_2)\) as \(U(p, σ_2)\). By construction, \(U(p, σ_2)\) is linear in \(p\). Thus, if \(σ_2^*\) is optimal strategy for the column player, then \[ U(p, σ_2^*) \le v = \tfrac 54, \quad \forall p \in [0, 1]. \] Now, consider the graph in Figure 2.7. As we have previously seen, at \(p^* = \tfrac 58\), \[ U(p^*, \mathsf{L}) = U(p^*, \mathsf{R}) = \tfrac 54 = v \quad\text{but}\quad U(p^*, \mathsf{C}) > \tfrac 54 = v. \] So, \(σ_2^*\) must give zero weight to \(\mathsf{C}\); otherwise \(U(p^*, σ_2^*)\) cannot be equal to \(\tfrac 54\). Thus, we know that \[ σ_2^* = (q, 0, 1-q). \] Hence, we effectively have a \(2 × 2\) game. For find \(q\), we can solve \[ U(\mathsf{T}, σ_2^*) = U(\mathsf{B}, σ_2^*) = v = \tfrac 54. \] We know that these equations will have a consistent solution, so we only need to solve one of these. Let’s take \[ U(\mathsf{T}, σ_2^*) = 2q - (1-q) = 3q - 1 = v = \tfrac{5}{4}. \] Thus, \[ \bbox[5pt,border: 1px solid]{ q = \tfrac{3}{4} }. \]
Final solution
Thus, the optimal strategy is \[ σ_1^* = (\tfrac 58, \tfrac 38) \quad\text{and}\quad σ_2^* = (\tfrac 34, 0, \tfrac 14) \] and the value is \(v = \tfrac 54\).
In Example 2.5, the column player had three strategies but the optimal strategy randomizes between only two of them. This is an instance of the following general result.
Theorem 2.6 In a two player zero-sum game where player 1 has \(m_1\) pure strategies and player 2 has \(m_2\) pure strategies, with \(m_1 < m_2\), then player 2 has an optimal strategy that puts positive weight on at most \(m_1\) pure strategies.
Note that Theorem 2.6 means that there exists an optimal strategy with the above feature; not that all optimal strategies have this feature.
An implication of Theorem 2.6 is that in a two player ZSG where one player, say the row player, has \(2\) pure strategies and the other player, the column player, has \(m\) pure strategies with \(m > 2\), there is an optimal strategy for the column player that puts positive weight on at most two pure strategies. We can find the solution of such \(2 × m\) games using the procedure described above. In particular, we have the following:
STEP 1: Consider \(σ_1 = (p, 1-p)\) and for each pure strategy \(s_2\) of player 2, compute \(U(σ_1, s_2)\). In general, we can have six possibilities as shown in Figure 2.8.
Observe the following:
In cases (a) and (f), the optimal strategy is attained on an internal point \(p^*\)
In cases (b) and (c), the optimal strategy is a boundary point and corresponds to a pure strategy.
In case (d) and (e), the maximum is attained in an interval; hence every point in this interval is an optimal strategy.
STEP 2: It can be shown that for player 2:
case (a) implies that the optimal strategy is an internal point
case (b)–(e) implies thta the optimal strategy is a pure strategy
case (f) implies that there is an interval of randomization probabilities that are optimal.
2.6 Computing optimal strategy profile using linear programming
The graphical method described in the previous section does not work if both players have more than two actions. In general, the optimal solution can be obtained using linear programming.
Let \((σ_1, σ_2)\) be a candidate optimal strategy profile. Then, it must satisfy \[\begin{align} U(σ_1, s_2) &\ge v, & \forall s_2 &\in \ALPHABET S_2 \label{eq:constraint-1} \\ U(s_1, σ_2) &\le v, & \forall s_1 &\in \ALPHABET S_1. \label{eq:constraint-2} \end{align}\]
We can write \(\eqref{eq:constraint-1}\)–\(\eqref{eq:constraint-2}\) as two linear programs. Suppose \(\ABS{\ALPHABET S_1} = n\) and \(\ABS{\ALPHABET S_2} = m\). For the ease of notation, we assume that \(\ALPHABET S_1 = \{1, \dots, n\}\) and \(\ALPHABET S_2 = \{1, \dots, m\}\) and use \(u(i,j)\) to denote the utility of player 1 (the maximizer) when player 1 plays \(i \in \ALPHABET S_1\) and player 2 plays \(j \in \ALPHABET S_2\).
Let \[ σ_1 = (p_1, \dots, p_n) \quad\text{and}\quad σ_2 = (q_1, \dots, q_m) \] be an optimal strategy profile of the game and \(v\) be the value. Then, \(\eqref{eq:constraint-1}\) and \(\eqref{eq:constraint-2}\) are equalent to the following linear programs.
\[ \bbox[5pt,border: 1px solid]{ \begin{gathered} \max v \\ \text{s.t. } \begin{aligned}[t] & \sum_{i=1}^n p_i u(i,j) \ge v, \quad \forall j \in \ALPHABET S_2 \\ & p_i \ge 0, \quad \forall i \in \{1, \dots, n\} \\ & \sum_{i=1}^n p_i = 1 \end{aligned} \end{gathered}} \qquad \bbox[5pt,border: 1px solid]{ \begin{gathered} \min v \\ \text{s.t. } \begin{aligned}[t] & \sum_{j=1}^m q_j u(i,j) \le v, \quad \forall i \in \ALPHABET S_1 \\ & q_j \ge 0, \quad \forall j \in \{1, \dots, m\} \\ & \sum_{j=1}^m q_j = 1 \end{aligned} \end{gathered}} \]
A naive implementation of the above LP is given below:
using JuMP, GLPK
function solve_ZSG_naive(u)
= size(u)
n, m
## Primal LP to find strategies of row player
= Model(GLPK.Optimizer)
primal
@variable(primal, v_lower)
@variable(primal, p[1:n] >= 0)
@objective(primal, Max, v_lower)
@constraint(primal, sum(p[i] for i ∈ 1:n) == 1)
for j ∈ 1:m
@constraint(primal, sum(p[i]*u[i,j] for i ∈ 1:n) >= v_lower)
end
## Dual LP to find strategies for column player
= Model(GLPK.Optimizer)
dual
@variable(dual, v_upper)
@variable(dual, q[1:m] >= 0)
@objective(dual, Min, v_upper)
@constraint(dual, sum(q[j] for j ∈ 1:m) == 1)
for i ∈ 1:n
@constraint(dual, sum(q[j]*u[i,j] for j ∈ 1:m) <= v_upper)
end
optimize!(primal)
JuMP.optimize!(dual)
JuMP.
= JuMP.value.(p)
row_strategy = JuMP.value.(q)
col_strategy = JuMP.value(v_lower) # Same as v_upper
game_value
return row_strategy, col_strategy, game_value
end
However, from LP duality, we know that the primal and the dual linear programs have the same solution. So, a more efficient implementation is the following, where we compute the strategies of the column player from the dual variables of the primal LP.
function solve_ZSG(u)
= size(u)
n, m
= Model(GLPK.Optimizer)
lp
@variable(lp, v)
@variable(lp, p[1:n] >= 0)
@objective(lp, Max, v)
= Vector{JuMP.ConstraintRef}(undef, m)
q
@constraint(lp, sum(p[i] for i ∈ 1:n) == 1)
for j ∈ 1:m
= @constraint(lp, sum(p[i]*u[i,j] for i ∈ 1:n) >= v)
q[j] end
optimize!(lp)
JuMP.
= JuMP.value.(p)
row_strategy = JuMP.dual.(q)
col_strategy = JuMP.value(v)
game_value
return row_strategy, col_strategy, game_value
end
Example 2.6 Solve Example 2.5 using linear programming
We first use the naive solution presented above:
= [2 5 -1; 0 -2 5]
u = solve_ZSG_naive(u) p, q, v
([0.6249999999999998, 0.375], [0.75, 0.0, 0.25], 1.2500000000000002)
Next, we use the efficient solution that uses the dual variables
= [2 5 -1; 0 -2 5]
u = solve_ZSG(u) p, q, v
([0.6249999999999998, 0.375], [0.75, -0.0, 0.25], 1.2500000000000002)
Note that both solutions give the same answer (up to numerical precision) that we obtained “by hand”.
See Exercise 2.5 for larger examples.
2.7 Games with non-compact action spaces
So far, we have restricted the discussion to finite games and shown that the game has a value in mixed strategies and there exist optimal strategies that achieve the value. We now present an example to show that this result is not always true if the game is not finite.
Consider a \(∞ × 2\) game with the following payoff matrix:
$\mathsf{L}$ | $\mathsf{R}$ | |
$\mathsf{1}$ | $2$ | $0$ |
$\mathsf{2}$ | $\tfrac 12$ | $\tfrac 12$ |
$\mathsf{3}$ | $\tfrac 13$ | $\tfrac 23$ |
$\mathsf{4}$ | $\tfrac 14$ | $\tfrac 34$ |
$\mathsf{5}$ | $\tfrac 15$ | $\tfrac 45$ |
$\mathsf{\cdot}$ | $\cdot$ | $\cdot$ |
$\mathsf{\cdot}$ | $\cdot$ | $\cdot$ |
$\mathsf{\cdot}$ | $\cdot$ | $\cdot$ |
where the payoffs for \(k > 1\) are \[ U(k, \mathsf{L}) = \frac{1}{k} \quad \text{and} \quad U(k, \mathsf{R}) = \frac{k-1}{k}. \]
Note that the row player has countable infinite number of strategies. A natural solution approach in this case is to truncate the strategy space of the row player to \(K\) strategies and then solve the resulting finite game. Since the column player has two strategies, we use the graphical method presented above.
In particular, suppose the column player is playering a mixed srtategy \(σ_2 = (q_K, 1-q_K)\), where we use subscript \(K\) to empahsize the fact that this is the strategy for the \(K × 2\) truncated game.
Then, we have \[ U(1, σ_2) = 2q_K, \quad\text{and}\quad U(k, σ_2) = \frac{q_K}{k} + \left(1 - \frac 1k\right)(1-q_K), \quad \forall k > 1. \]
The payoffs for \(K = 4\) are shown in Figure 2.9, where the upper convex envelop is shown in red.
Observe that the upper convex envelop consists of \(U(1, σ_2)\) and and \(U(K, σ_2)\). We find the maxmin strategy (for the truncated game) by solving \[ U(1,σ_2) = U(K, σ_2) \implies 2q_K = \frac{q_K}{K} + \left(1 - \frac 1K\right)(1-q_K). \] Solving the above, we have \[ q_K = \frac{K-1}{3K -2} \quad\text{and}\quad v_K = \frac{2(K-1)}{3K-2} \]
Now consider the row player. When the column player is randomizing according to \(q_K\), the row player gets the same payoff when playing strategies \(1\) and \(K\) and gets a lower payoff when playing any other strategy. So, the row player will choose a strategy \[ σ_1 = (p_K, 0, 0, \dots, 0, 1-p_K) \] where \(p_K\) can be computed by solving \[ U(σ_1, \mathsf{L}) = U(σ_1, \mathsf{R}) = v_K \implies 2 p_K + \frac{1-p}{K} = \left(1 - \frac 1K\right) (1 - p_K) = \frac{2(K-1)}{3K - 2} \] Solving this, we get \[ p_K = \frac{K-2}{3K-2}. \]
Observe that \[ \lim_{K \to ∞} v_K = \tfrac{2}{3}, \] So, the original game has a value. However, to achieve this value, the row player has to randomize between strategies \(1\) and \(K \to ∞\); so there is no strategy for the row player which actually achieves this value.
Thus, when action sets are not compact, the existence of a value does not imply that there is a strategy that will achieve that value.
In some sense, finding a strategy that achieves a value is equivalent to maximizing \[ f(k) = 1 - \frac 1k, \quad k \in \naturalnumbers. \] Clearly, \(\sup_{k \in \naturalnumbers} f(k) = 1\), but there is no value of \(k \in \naturalnumbers\) which achieves \(f(k) = 1\).
Nonetheless, we can talk about an \(ε\)-optimal strategy. In the above example, pick \[ σ_1 = (\tfrac 13, 0, \dots, 0, \tfrac 23, 0, 0, \dots) \] where player is randomizing between strategy \(1\) and strategy \(k\). Then, \[ U(σ_1, \mathsf{L}) = \frac 23 + \frac{2}{3k} \quad\text{and}\quad U(σ_1, \mathsf{R}) = \frac 23 \left(1 - \frac 1k\right) = \frac{2}{3} - \frac{2}{3k}. \] Thus, for any \(ε > 0\), we can guarantee a value of \(\frac 23 - ε\) by choosing \(k > 2/(3 ε)\).
2.8 Sufficient conditions for existence of optimal strategies for general strategy spaces
We now present a general theorem that guarantees the existence of optimal strategies for a two player zero sum game with continuous action spaces.
Theorem 2.7 Consider a two player ZSG where \(\ALPHABET S_1\) and \(\ALPHABET S_2\) are convex sets and \(u(s_1, s_2)\) is a continuous function and:
- \(u(⋅, s_2)\) is strictly concave for each \(s_2\)
- \(u(s_1, ⋅)\) is strictly convex for each \(s_1\).
Furthermore, either
\(\ALPHABET S_1\) and \(\ALPHABET S_2\) are closed and bounded
\(\ALPHABET S_i = \reals^{m_i}\), \(i \in \{1,2\}\) and
\(\displaystyle \lim_{\NORM{s_1} \to ∞} u(s_1, s_2) = -∞\), for all \(s_2 \in \ALPHABET S_2\)
\(\displaystyle \lim_{\NORM{s_2} \to ∞} u(s_1, s_2) = ∞\), for all \(s_1 \in \ALPHABET S_1\)
Then the game has a unique optimal pure strategy.
When strategy spaces are compact, a solution in mixed strategies exist under very weak conditions. This result is know as :Glicksberg Theorem. We present a simplified version of it below (in the more general setting we can relax continuity of the utility with semi-continuity).
Theorem 2.8 A two player zero sum game with compact strategy spaces and continuous utility function has a value in mixed strategies.
We conclude this section by presenting an example to show that a game with continuous strategy spaces may not have a solution in pure strategies.
Example 2.7 Consider a ZSG game with both players guess a number in the interval \([0, 1]\). Player 1 wants to be close to player while player 2 wants to be far from player 1. We model this game as follows:
We take \(\ALPHABET S_1 = \ALPHABET S_2 = [0, 1]\) and \[ u(s_1, s_2) = - (s_1 - s_2)^2. \]
Observe that the utility function is concave in both \(s_1\) and \(s_2\). So, the above example does not satisfy the sufficient conditions of Theorem 2.7. We check if a value exists from first principles.
Strategy for row player
Consider the optimization problem \[ \underline v = \max_{s_1 \in [0,1]} \min_{s_2 \in [0,1]} u(s_1, s_2). \] In the inner optimization problem, player 2 is minimizing a concave function. So the minima is attached at the boundary. Thus, the above optimization problem is same as one where player 2 is choosing from the discrete set \(\{0, 1\}\), that is \[\begin{align*} \max_{s_1 \in [0,1]} \min_{s_2 \in [0,1]} u(s_1, s_2) &= \max_{s_1 \in [0,1]} \min_{s_2 \in \{0,1\}} u(s_1, s_2) \\ \max_{s_1 \in [0,1]} \min \{ -s_1^2, -(1-s_1)^2 \}. \end{align*}\] Although the utility functions are not linear, we can follow the same argument as before: we plot both functions, find the lower envelop, and the find the maxmin value \(s_1\). See Figure 2.10.
Thus, we have that \[ s_1^* = \tfrac 12 \quad\text{and}\quad \underline v = - \tfrac{1}{4}. \]
Strategy for column player
We know that player 2 only chooses among the actions in \(\{0, 1\}\). Suppose player 2 is playing a randomized strategy \(σ_2\) where it picks strategy \(0\) with probability \(q\) and strategy \(1\) with probability \(1\).
Then, player 2 wants to solve: \[ \bar v = \min_{q \in [0, 1]} \max_{s_1 \in [0, 1]} u(s_1, s_2). \]
Consider the inner optimization problem for player 1: \[ max_{s_1 \in [0, 1]} u(s_1, σ_2) = \max_{s_1 \in [0, 1]} \bigl[ - q s_1^2 - (1-q) (1 - s_1)^2 \bigr] \] which is concave in \(s_1\). So, we look at the first order optimality conditions to find the optima: \[ \frac{∂u}{∂ s_1} = 0 \implies -2q s_1 + 2(1-q)(1-s_1) = 0 \implies s_1 = 1 - q. \] Thus, we have \[ \max_{s_1 \in [0, 1]} \min_{s_2 \in \ALPHABET S_2} = - q (1-q)^2 - (1-q)q^2 = -q(1-q). \]
Substituting this back in the equation for the minmax, we have \[ \bar v = \min_{q \in [0, 1]} - q(1-q). \] By the :AM-GM inequality, we know that the minima is achieved at \(q = \tfrac 12\). Therefore, \[ \bar v = - \tfrac 14. \]
Therefore, we have that \(\underline v = \bar v = -\tfrac 14\); thus, the game has a value, where player 1 plays a pure strategy of \(s_1 = \tfrac 12\) and player 2 randomizes between \(0\) and \(1\) with equal probability (this makes sense from the description of the problem).
In the above example, the utility function was not convex in \(s_2\). The game does not have a value in pure strategies, but it still has a value in mixed strategies. There are general sufficient conditions to verify the existence of value in mixed strategies for games with continuous strategy spaces, but we will not discuss them in the course.
Exercises
Exercise 2.1 For each of the following games, find the value of the game and the optimal strategy profile.
$\mathsf{L}$ | $\mathsf{R}$ | |
$\mathsf{T}$ | $6$ | $0$ |
$\mathsf{B}$ | $-3$ | $3$ |
$\mathsf{L}$ | $\mathsf{R}$ | |
$\mathsf{T}$ | $-3$ | $8$ |
$\mathsf{B}$ | $4$ | $4$ |
Exercise 2.2 Find the optimal strategy profile and the value of the following game.
$\mathsf{L}$ | $\mathsf{C}$ | $\mathsf{R}$ | |
$\mathsf{T}$ | $0$ | $4$ | $6$ |
$\mathsf{M}$ | $5$ | $7$ | $4$ |
$\mathsf{B}$ | $9$ | $6$ | $3$ |
Hint: Use iterative elimination of strongly dominated strategies to first reduce the above to a \(2 × 2\) game and then simplify the resulting game.
Exercise 2.3 Suppose \((σ_1^*, σ_2^*)\) and \((τ_1^*, τ_2^*)\) both satisfy \[ \max_{σ_1 \in Δ(\ALPHABET S_1)} \min_{σ_2 \in Δ(\ALPHABET S_2)} U(σ_1, σ_2) = \min_{σ_2 \in Δ(\ALPHABET S_2)} \max_{σ_1 \in Δ(\ALPHABET S_1)} U(σ_1, σ_2) = v. \] Prove that \[ U(σ_1^*, τ_2^*) = U(τ_1^*, σ_2^*) = v. \]
Hint: Use Theorem 2.5 to argue that \[ U(τ_1^*, τ_2^*) \le U(τ_1^*, σ_2^*) \le U(σ_1^*, σ_2^*) \quad\text{and}\quad U(σ_1^*, σ_2^*) \le U(σ_1^*, τ_2^*) \le U(τ_1^*, τ_2^*). \]
Exercise 2.4 A zero-sum game with \(\ALPHABET S_1 = \ALPHABET S_2\) is called symmetric if the utlity function is skew-symmetric, i.e., \[ u(s_1, s_2) = - u(s_2, s_1), \quad \forall s_1, s_2 \in \ALPHABET S_1. \] An example of a symmetric game is rock-paper-scissors.
Let \(\mathscr{G}\) be a finite symmetric game. Show that:
The value of \(\mathscr{G}\) in mixed strategies is zero.
If \((σ_1^*, σ_2^*)\) is an optimal (mixed) strategy for \(\mathscr{G}\), then \((σ_2^*, σ_1^*)\) is also an optimal (mixed) strategy.
Use part b and Exercise 2.3 to argue that a symmetric zero-sum game always has a symmetric optimal (mixed) strategy of the form \((σ^*, σ^*)\), where both players are playing the same mixed strategy.
Exercise 2.5 Use the LP formulation to find optimal strategy profile of the following games.
$\mathsf{1}$ | $\mathsf{2}$ | $\mathsf{3}$ | |
$\mathsf{1}$ | $3$ | $-1$ | $2$ |
$\mathsf{2}$ | $1$ | $2$ | $-2$ |
$\mathsf{1}$ | $\mathsf{2}$ | $\mathsf{3}$ | $\mathsf{4}$ | |
$\mathsf{1}$ | $6$ | $0$ | $5$ | $6$ |
$\mathsf{2}$ | $-3$ | $3$ | $-4$ | $3$ |
$\mathsf{3}$ | $8$ | $1$ | $2$ | $2$ |
Note that part of the objective of this exercise is for you to learn how to use a linear programming solver. So, instead of blindly copy pasting the code provided above, implement the solution from the equations using the programming language of your choice.
Exercise 2.6 Consider a two player zero-sum game with \(\ALPHABET S_1 = \ALPHABET S_2 = \reals\).
\[ u(s_1, s_2) = - s_1^2 + s_2^2 - 2 s_1 s_2 - s_1 + 2 s_2. \]
Show that the game satisfies the sufficient conditions of Theorem 2.7. Therefore, the game has a value and a unique optimal strategy.
Find the maxmin strategy of player 1 and the maxmin value of the game by solving the following: \[ \underline v = \sup_{s_1 \in \reals} \inf_{s_2 \in \reals} u(s_1, s_2). \]
Find the minmax strategy of player 2 and the minmax value of the game by solving the following: \[ \bar v = \inf_{s_2 \in \reals} \sup_{s_1 \in \reals} u(s_1, s_2). \]
Is the maxmin value same as the minmax value?