10 Markov chains
Let \(\ALPHABET X\) be a finite set. A stochastic process \(\{X_n\}_{n \ge 0}\), \(X_n \in \ALPHABET X\), is called a Markov chain if it satisfies the Markov property: for any \(n \in \integers_{\ge 0}\) and any \(x_{1:n+1} \in \ALPHABET X^{n+1}\), we have \[\begin{equation}\tag{Markov property}\label{eq:Markov} \PR(X_{n+1} = x_{n+1} \mid X_{1:n} = x_{1:n}) = \PR(X_{n+1} = x_{n+1} \mid X_n = x_n). \end{equation}\]
The variable \(X_n\) is called the state of the Markov chain at time \(n\); the set \(\ALPHABET X\) is called the state space. The Markov property \(\eqref{eq:Markov}\) implies that the current state captures all the information from the past that is relevant for the future. Stated differently, conditioned on the present, the past is independent of the future.
Independence is a symmetric relationship. Thus, we expect the Markov property to also hold if time is reversed! Exercise 10.1 asks you to formally prove that.
We now present some examples of Markov chains arising in different applications.
Example 10.1 (Gilbert-Elliott channel model for burst erasures) The Gilbert-Elliott channel model is used to model communication channels with burst errors. The channel can be in one of two states: a “good” state (state 0) with low error probability, or a “bad” state (state 1) with high error probability. The channel state evolves as a Markov chain, and the error probability depends on the current state. This is essentially the same as the on-off Markov chain described in Example 10.4.
Example 10.2 (Ehrenfest model of diffusion) The following simplified model of diffusion through a porous membrane was proposed by @Ehrenfest1907 to describe the exchange of heat between two systems at different temperatures.
There are \(K\) particles that can either be in compartment \(A\) or compartment \(B\). At each time, a particle is picked at random and moved from its compartment to the other. Thus, if there are \(X_n = i\) particles in compartment \(A\) at time \(n\), then the next state \(X_{n+1}\) is either \(i-1\) (a particle was displaced from compartment \(A\) to \(B\)) with probability \(i/K\), or \(i+1\) (a particle was displaced from compartment \(B\) to \(A\)) with probability \((K-i)/K\).
Example 10.3 (Gambler’s ruin) Consider a gambler who is playing in a casino. The gambler starts with an initial fortune of $\(x_0\). At each time, the gambler places a bet where he wins $\(1\) with probability \(p\) and loses $\(1\) with probability \(1-p\). Thus, his fortune evolves as \[ X_{n+1} = X_n + Z_n \] where \(\{Z_n\}_{n \ge 0}\) is an i.i.d. sequence which takes values \(+1\) with probability \(p\) and \(-1\) with probability \(1-p\).
The gambler stops when his fortune is ruined, i.e., \(X_n = 0\). In some variations, the gambler may also stop when his fortune reaches a pre-specified amount of $\(K\).
See the following video from Veritasium for an excellent history of Markov chains and their applications.
10.1 Time-homogeneous Markov chains
In this course, we will focus on time-homogeneous Markov chains. The Markov chain is called time-homogeneous if the right hand side of \(\eqref{eq:Markov}\) does not depend on \(n\). In this case, we describe the Markov chain by a state transition matrix \(P\), where \(P_{ij} = \PR(X_{n+1} = j | X_n = i)\). Such Markov chains can also be visualized using state transition diagrams as we illustrate in the examples below.
Example 10.4 (On-Off Markov chain) The On-Off Markov chain discussed earlier can be modelled with \(\ALPHABET X = \{0, 1\}\) and a general transition probability matrix of the form \[ P = \MATRIX{ 1 - a & a \\ b & 1 - b }. \] The transition matrix can be visualized as follows.
In spite of their simplicity, on-off Markov chains are used in various applications, including telecommunication systems, network traffic modeling, machine failure-repair models, gene activation, and others. Some properties of the on-off Markov chain are as follows:
If \(a + b = 1\), then both rows of the transition matrix are identical. Therefore, the Markov chain has no memory and is equivalent to a Bernoulli process with success probability \(a = 1-b\).
When \(a\) or \(b\) are small, the corresponding state is “sticky”, i.e., when the Markov chain enters a sticky state, it stays there for a long time.
Example 10.5 The Ehrenfest model of diffusion presented in Example 10.2 can be modelled as a Markov chain with state space \(\{0, 1, \dots, K\}\) and transition probability \[ P_{ij} = \begin{cases} i/K & j = i-1 \\ (K-i)/K & j = i + 1 \\ 0 & \text{otherwise}. \end{cases} \] The transition matrix can be visualized as follows.
Example 10.6 (Random walk in one dimension) Imagine a particle which moves in a straight line in unit steps. Each step is one unit to the right with probability \(p\) or one unit to the left with probability \(q = 1-p\). It moves until it reaches one of two extreme points, which are called boundary points. The behavior of the particle at the boundary determines several different possibilities.
We will consider the case where the state space is \(\ALPHABET X = \{-2, -1, 0, 1, 2\}\) and the process starts in state \(0\).
Absorbing random walk: Assume that when the particle reaches a boundary state, it stays there from that time on. We may visualize the Markov chain as follows.
Absorbing random walk

In this case, the transition matrix is given by \[ P = \MATRIX{ 1 & 0 & 0 & 0 & 0 \\ q & 0 & p & 0 & 0 \\ 0 & q & 0 & p & 0 \\ 0 & 0 & q & 0 & p \\ 0 & 0 & 0 & 0 & 1}. \]
Reflected random walk: Assume that when the particle reaches a boundary state, it is reflected and returns to the point it came from. We may visualize the Markov chain as follows.
Reflected random walk

In this case, the transition matrix is given by \[ P = \MATRIX{ 0 & 1 & 0 & 0 & 0 \\ q & 0 & p & 0 & 0 \\ 0 & q & 0 & p & 0 \\ 0 & 0 & q & 0 & p \\ 0 & 0 & 0 & 1 & 0}. \]
Random walk with restart: Assume that when the particle reaches a boundary state, it restarts in the initial state. We may visualize the Markov chain as follows.
Random walk with restart

In this case, the transition matrix is given by \[ P = \MATRIX{ 0 & 0 & 1 & 0 & 0 \\ q & 0 & p & 0 & 0 \\ 0 & q & 0 & p & 0 \\ 0 & 0 & q & 0 & p \\ 0 & 0 & 1 & 0 & 0}. \]
Random walk with periodic boundary: Assume that when the particle reaches a boundary state, it moves to the other boundary. We may visualize the Markov chain as follows.
Random walk with periodic boundary

In this case, the transition matrix is given by \[ P = \MATRIX{ 0 & 0 & 0 & 0 & 1 \\ q & 0 & p & 0 & 0 \\ 0 & q & 0 & p & 0 \\ 0 & 0 & q & 0 & p \\ 1 & 0 & 0 & 0 & 0}. \]
Note that the gambler’s fortune in Example 10.3 is a random walk on non-negative integers \(\{0,1,2, \dots\}\) with absorption at \(0\). In the variation where the gambler stops when his fortune reaches $\(K\), it is a random walk over \(\{0,1,\dots,K\}\) with absorption at both ends: \(0\) and \(K\).
10.1.1 Properties of interest
Depending on the application, we are typically interested in the following properties of a Markov chain:
If the chain starts in state \(i\), what is the probability that after \(n\) steps it is in state \(j\)?
If the chain starts in state \(i\), what is the expected number of visits to state \(j\) in \(n\) steps?
What is the expected number of steps that it takes for a chain starting in state \(i\) to visit state \(j\) for the first time?
What is the long-run fraction of time that the chain spends in state \(i\)? How does this depend on the initial state?
In the rest of this section, we will present results in Markov chain theory that answer the above questions.
10.2 State occupancy probabilities
Let \(μ^{(n)}\) denote the PMF of the state of the Markov chain at time \(n\); these are called the state occupancy probabilities. We will think of \(μ^{(n)}\) as a row vector.
By the law of total probability, we have \[ \PR(X_n = j) = \sum_{i \in \ALPHABET X} \PR(X_{n-1} = i) \PR(X_n = j | X_{n-1} = i) \] or, equivalently, \[ μ^{(n)}_j = \sum_{i \in \ALPHABET X} μ^{(n-1)}_i P_{ij} \] which can be written in matrix form as \[ μ^{(n)} = μ^{(n-1)} P \] and, by recursively expanding the right hand side, we have \[ μ^{(n)} = μ^{(0)} P^n. \]
We will abbreviate \([P^n]_{ij}\) as \(P^{(n)}_{ij}\). Note that \[ P^{(n)}_{ij} = \PR(X_n = j \mid X_0 = i) \] denotes the \(n\)-step transition probability.
Chapman-Kolmogorov equations: The multi-step transition probabilities satisfy the following: for any positive integers \(m\) and \(n\) and \(i,j \in \ALPHABET X\), we have \[ P^{(n+m)}_{ij} = \sum_{k \in \ALPHABET X} P^{(n)}_{ik} P^{(m)}_{kj}. \] Thus, the probability of going from \(i\) to \(j\) in \(n+m\) steps equals the probability of going from \(i\) to somewhere in \(n\) steps and then from there to \(j\) in \(m\) steps.
Example 10.7 Numerically compute the state occupancy probabilities for the different examples of random walk presented in Example 10.6 when \(p = q = \tfrac 12\) for \(n \in \{1, \dots, 8\}\). In all cases, the initial distribution is \[ μ^{(0)} = \MATRIX{0 & 0 & 1 & 0 & 0 }. \]
Absorbing random walk: In this case, we have
- \(μ^{(0)} = \left[ \begin{array}{ccccc} 0 & 0 & 1 & 0 & 0 \\ \end{array} \right]\)
- \(μ^{(1)} = \left[ \begin{array}{ccccc} 0 & \frac{1}{2} & 0 & \frac{1}{2} & 0 \\ \end{array} \right]\)
- \(μ^{(2)} = \left[ \begin{array}{ccccc} \frac{1}{4} & 0 & \frac{1}{2} & 0 & \frac{1}{4} \\ \end{array} \right]\)
- \(μ^{(3)} = \left[ \begin{array}{ccccc} \frac{1}{4} & \frac{1}{4} & 0 & \frac{1}{4} & \frac{1}{4} \\ \end{array} \right]\)
- \(μ^{(4)} = \left[ \begin{array}{ccccc} \frac{3}{8} & 0 & \frac{1}{4} & 0 & \frac{3}{8} \\ \end{array} \right]\)
- \(μ^{(5)} = \left[ \begin{array}{ccccc} \frac{3}{8} & \frac{1}{8} & 0 & \frac{1}{8} & \frac{3}{8} \\ \end{array} \right]\)
- \(μ^{(6)} = \left[ \begin{array}{ccccc} \frac{7}{16} & 0 & \frac{1}{8} & 0 & \frac{7}{16} \\ \end{array} \right]\)
- \(μ^{(7)} = \left[ \begin{array}{ccccc} \frac{7}{16} & \frac{1}{16} & 0 & \frac{1}{16} & \frac{7}{16} \\ \end{array} \right]\)
- \(μ^{(8)} = \left[ \begin{array}{ccccc} \frac{15}{32} & 0 & \frac{1}{16} & 0 & \frac{15}{32} \\ \end{array} \right]\)
Reflected random walk: In this case, we have
- \(μ^{(0)} = \left[ \begin{array}{ccccc} 0 & 0 & 1 & 0 & 0 \\ \end{array} \right]\)
- \(μ^{(1)} = \left[ \begin{array}{ccccc} 0 & \frac{1}{2} & 0 & \frac{1}{2} & 0 \\ \end{array} \right]\)
- \(μ^{(2)} = \left[ \begin{array}{ccccc} \frac{1}{4} & 0 & \frac{1}{2} & 0 & \frac{1}{4} \\ \end{array} \right]\)
- \(μ^{(3)} = \left[ \begin{array}{ccccc} 0 & \frac{1}{2} & 0 & \frac{1}{2} & 0 \\ \end{array} \right]\)
- \(μ^{(4)} = \left[ \begin{array}{ccccc} \frac{1}{4} & 0 & \frac{1}{2} & 0 & \frac{1}{4} \\ \end{array} \right]\)
- \(μ^{(5)} = \left[ \begin{array}{ccccc} 0 & \frac{1}{2} & 0 & \frac{1}{2} & 0 \\ \end{array} \right]\)
- \(μ^{(6)} = \left[ \begin{array}{ccccc} \frac{1}{4} & 0 & \frac{1}{2} & 0 & \frac{1}{4} \\ \end{array} \right]\)
- \(μ^{(7)} = \left[ \begin{array}{ccccc} 0 & \frac{1}{2} & 0 & \frac{1}{2} & 0 \\ \end{array} \right]\)
- \(μ^{(8)} = \left[ \begin{array}{ccccc} \frac{1}{4} & 0 & \frac{1}{2} & 0 & \frac{1}{4} \\ \end{array} \right]\)
Random walk with restart: In this case, we have
- \(μ^{(0)} = \left[ \begin{array}{ccccc} 0 & 0 & 1 & 0 & 0 \\ \end{array} \right]\)
- \(μ^{(1)} = \left[ \begin{array}{ccccc} 0 & \frac{1}{2} & 0 & \frac{1}{2} & 0 \\ \end{array} \right]\)
- \(μ^{(2)} = \left[ \begin{array}{ccccc} \frac{1}{4} & 0 & \frac{1}{2} & 0 & \frac{1}{4} \\ \end{array} \right]\)
- \(μ^{(3)} = \left[ \begin{array}{ccccc} 0 & \frac{1}{4} & \frac{1}{2} & \frac{1}{4} & 0 \\ \end{array} \right]\)
- \(μ^{(4)} = \left[ \begin{array}{ccccc} \frac{1}{8} & \frac{1}{4} & \frac{1}{4} & \frac{1}{4} & \frac{1}{8} \\ \end{array} \right]\)
- \(μ^{(5)} = \left[ \begin{array}{ccccc} \frac{1}{8} & \frac{1}{8} & \frac{1}{2} & \frac{1}{8} & \frac{1}{8} \\ \end{array} \right]\)
- \(μ^{(6)} = \left[ \begin{array}{ccccc} \frac{1}{16} & \frac{1}{4} & \frac{3}{8} & \frac{1}{4} & \frac{1}{16} \\ \end{array} \right]\)
- \(μ^{(7)} = \left[ \begin{array}{ccccc} \frac{1}{8} & \frac{3}{16} & \frac{3}{8} & \frac{3}{16} & \frac{1}{8} \\ \end{array} \right]\)
- \(μ^{(8)} = \left[ \begin{array}{ccccc} \frac{3}{32} & \frac{3}{16} & \frac{7}{16} & \frac{3}{16} & \frac{3}{32} \\ \end{array} \right]\)
Random walk with periodic boundary: In this case, we have
- \(μ^{(0)} = \left[ \begin{array}{ccccc} 0 & 0 & 1 & 0 & 0 \\ \end{array} \right]\)
- \(μ^{(1)} = \left[ \begin{array}{ccccc} 0 & \frac{1}{2} & 0 & \frac{1}{2} & 0 \\ \end{array} \right]\)
- \(μ^{(2)} = \left[ \begin{array}{ccccc} \frac{1}{4} & 0 & \frac{1}{2} & 0 & \frac{1}{4} \\ \end{array} \right]\)
- \(μ^{(3)} = \left[ \begin{array}{ccccc} \frac{1}{4} & \frac{1}{4} & 0 & \frac{1}{4} & \frac{1}{4} \\ \end{array} \right]\)
- \(μ^{(4)} = \left[ \begin{array}{ccccc} \frac{3}{8} & 0 & \frac{1}{4} & 0 & \frac{3}{8} \\ \end{array} \right]\)
- \(μ^{(5)} = \left[ \begin{array}{ccccc} \frac{3}{8} & \frac{1}{8} & 0 & \frac{1}{8} & \frac{3}{8} \\ \end{array} \right]\)
- \(μ^{(6)} = \left[ \begin{array}{ccccc} \frac{7}{16} & 0 & \frac{1}{8} & 0 & \frac{7}{16} \\ \end{array} \right]\)
- \(μ^{(7)} = \left[ \begin{array}{ccccc} \frac{7}{16} & \frac{1}{16} & 0 & \frac{1}{16} & \frac{7}{16} \\ \end{array} \right]\)
- \(μ^{(8)} = \left[ \begin{array}{ccccc} \frac{15}{32} & 0 & \frac{1}{16} & 0 & \frac{15}{32} \\ \end{array} \right]\)
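These occupancy probabilities are easy to reproduce numerically by iterating \(μ^{(n)} = μ^{(n-1)} P\). A minimal sketch, assuming NumPy is available and using the absorbing random walk from Example 10.6 with \(p = q = \tfrac12\):

```python
import numpy as np

# Absorbing random walk on the states (-2, -1, 0, 1, 2) with p = q = 1/2
P_absorbing = np.array([
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0, 0.0],
    [0.0, 0.5, 0.0, 0.5, 0.0],
    [0.0, 0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.0, 0.0, 1.0],
])

mu = np.array([0.0, 0.0, 1.0, 0.0, 0.0])   # initial distribution mu^(0)
for n in range(1, 9):
    mu = mu @ P_absorbing                  # mu^(n) = mu^(n-1) P
    print(n, mu)
```

Replacing `P_absorbing` with the reflected, restart, or periodic-boundary transition matrices reproduces the other three tables.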
Example 10.8 Analytically compute the state occupancy probabilities for the on-off Markov chain of Example 10.4 when it starts from the initial probability distribution \(μ^{(0)}\).
Since \(μ^{(n+1)} = μ^{(n)} P\), we have \[\begin{align*} μ^{(n+1)}_0 &= μ^{(n)}_0 (1-a) + μ^{(n)}_1 b \\ &= μ^{(n)}_0 (1-a) + (1 - μ^{(n)}_0) b \\ &= μ^{(n)}_0 (1-a-b) + b. \end{align*}\] If \(a = b = 0\), then \(μ^{(n+1)}_0 = μ^{(n)}_0 = \cdots = μ^{(0)}_0\). If not, we exploit the fact that \[ b = \frac{b}{a+b} - \frac{b}{a+b}(1-a-b) \] to recursively write \[\begin{align*} μ^{(1)}_0 &= μ^{(0)}_0 (1-a-b) + b \\ &= \left(μ^{(0)}_0 - \frac{b}{a+b}\right)(1-a-b) + \frac{b}{a+b} \end{align*}\] and \[\begin{align*} μ^{(2)}_0 &= μ^{(1)}_0 (1-a-b) + b \\ &= \left(μ^{(0)}_0 - \frac{b}{a+b}\right)(1-a-b)^2 + \frac{b}{a+b}(1-a-b) + b \\ &= \left(μ^{(0)}_0 - \frac{b}{a+b}\right)(1-a-b)^2 + \frac{b}{a+b} \end{align*}\] and, so on, to get \[\begin{align*} μ^{(n)}_0 &= μ^{(n-1)}_0 (1-a-b) + b \\ &= \left(μ^{(0)}_0 - \frac{b}{a+b}\right)(1-a-b)^n + \frac{b}{a+b}(1-a-b) + b \\ &= \left(μ^{(0)}_0 - \frac{b}{a+b}\right)(1-a-b)^n + \frac{b}{a+b}. \end{align*}\] Therefore, \[ μ^{(n)}_1 = 1 - μ^{(n)}_0 = \left(μ^{(0)}_1 - \frac{a}{a+b}\right)(1-a-b)^n + \frac{a}{a+b}. \]
The above analysis appears to be a bit of black magic. To understand what is going on, note that we are interested in computing \(μ^{(0)} P^n\). What is an efficient way to compute \(P^n\)? Eigendecomposition! Since \(P\) is a row stochastic matrix, we have \(P \mathbf{1} = \mathbf{1}\), where \(\mathbf{1}\) is the all-ones vector. Thus, \(λ_1 = 1\) is always an eigenvalue of any transition matrix, with corresponding eigenvector \(\mathbf{1}\).
For Example 10.4, we can explicitly compute all eigenvalues by finding the roots of the characteristic equation \[\begin{align*} \det(λI - P) &= \DET{λ - 1 + a & - a \\ -b & λ - 1 + b} \\ &= (λ-1 +a)(λ-1+b) - ab \\ &= (λ-1)^2 + (a+b)(λ-1) = (λ-1)(λ-1 + a + b). \end{align*}\] Thus, the eigenvalues are \(λ_1 = 1\) and \(λ_2 = 1 - a - b\).
For the special case of \(2 × 2\) transition matrices, we can find the second eigenvalue by observing that \(λ_1 = 1\) is always an eigenvalue and \(\TR(P) = λ_1 + λ_2 = 1 + λ_2\) or \(\det(P) = λ_1 λ_2 = λ_2\).
To find the eigenvector, we find a vector \(v\) such that \((λI - P)v = 0\).
- For \(λ_1 = 1\), we already know that \(v_1 = [1; 1]\) is an eigenvector.
- For \(λ_2 = (1-a-b)\), we have \[ λ_2I - P = \MATRIX{-b & -a \\ -b & -a}. \] Therefore, one possible eigenvector is \([a; -b]\).
Then, from the spectral decomposition, we know that \[ P = V Λ V^{-1} \] where \(V = [v_1\; v_2] = [1, a; 1, -b]\) and \(Λ = \diag(1, 1-a-b)\). Therefore, \[\begin{align*} μ^{(n)} &= μ^{(0)} P^n = μ^{(0)} V Λ^n V^{-1} \\ &= \MATRIX{ μ^{(0)}_0 & 1 - μ^{(0)}_0 } \MATRIX{1 & a \\ 1 & -b } \MATRIX{1 & 0 \\ 0 & (1 - a -b)^n} \frac{1}{a+b} \MATRIX{b & a \\ 1 & -1 } \\[5pt] &= \MATRIX{\dfrac{1}{a+b} & μ^{(0)}_0 - \dfrac{b}{a+b} } \MATRIX{1 & 0 \\ 0 & (1 - a -b)^n} \MATRIX{b & a \\ 1 & -1 } \\[5pt] &=\MATRIX{\dfrac{1}{a+b} & \left(μ^{(0)}_0 - \dfrac{b}{a+b}\right)(1-a-b)^n } \MATRIX{b & a \\ 1 & -1 } \\[5pt] &= \MATRIX{ \dfrac{b}{a+b} + \left(μ^{(0)}_0 - \dfrac{b}{a+b}\right)(1-a-b)^n & \dfrac{a}{a+b} - \left(μ^{(0)}_0 - \dfrac{b}{a+b}\right)(1-a-b)^n } \end{align*}\] which matches the expression derived above.
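As a sanity check, the closed-form expression can be compared against a direct computation of \(μ^{(0)} P^n\). A small sketch, assuming NumPy and the illustrative values \(a = 0.3\), \(b = 0.1\):

```python
import numpy as np

a, b = 0.3, 0.1                       # illustrative values
P = np.array([[1 - a, a], [b, 1 - b]])
mu0 = np.array([0.9, 0.1])            # some initial distribution
n = 5

# Direct computation: mu^(n) = mu^(0) P^n
mu_direct = mu0 @ np.linalg.matrix_power(P, n)

# Closed form derived above for mu^(n)_0
mu_closed0 = (mu0[0] - b / (a + b)) * (1 - a - b) ** n + b / (a + b)

print(mu_direct)                      # [mu^(n)_0, mu^(n)_1]
print(mu_closed0, 1 - mu_closed0)     # should agree with the line above
```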
TODO: Add more examples!
10.3 Class structure
We say that a state \(j\) is accessible from state \(i\) (abbreviated as \(i \rightsquigarrow j\)) if there exists an \(m \in \integers_{\ge 0}\) (which may depend on \(i\) and \(j\)) such that \([P^m]_{ij} > 0\). The fact that \(P^{(m)}_{ij} > 0\) implies that there exists an ordered sequence of states \((i_0, \dots, i_m)\) with \(i_0 = i\) and \(i_m = j\) such that \(P_{i_k i_{k+1}} > 0\) for all \(k\); thus, there is a path of positive probability from state \(i\) to state \(j\).
Accessibility is a transitive relationship, i.e., if \(i \rightsquigarrow j\) and \(j \rightsquigarrow k\), then \(i \rightsquigarrow k\).
Example 10.9 Consider the Markov chain shown below.
- Identify all states that are accessible from state \(1\).
- Identify all states from which state \(1\) is accessible.
Solution.

- States accessible from state \(1\) are \(\{1,2,3\}\).
- States from which state \(1\) is accessible are \(\{1,2,3,4,5,6\}\).
Two states \(i\) and \(j\) are said to communicate (abbreviated to \(i \leftrightsquigarrow j\)) if \(i\) is accessible from \(j\) (i.e., \(j \rightsquigarrow i\)) and \(j\) is accessible from \(i\) (\(i \rightsquigarrow j\)). Alternatively, we say that \(i\) and \(j\) communicate if there exist \(m, m' \in \integers_{\ge 0}\) such that \(P^{(m)}_{ij} > 0\) and \(P^{(m')}_{ji} > 0\).
For instance, in Example 10.9, state \(1\) communicates with state \(2\) but does not communicate with state \(5\).
Communication is an equivalence relationship, i.e., it is reflexive (\(i \leftrightsquigarrow i\)), symmetric (\(i \leftrightsquigarrow j\) if and only if \(j \leftrightsquigarrow i\)), and transitive (\(i \leftrightsquigarrow j\) and \(j \leftrightsquigarrow k\) implies \(i \leftrightsquigarrow k\)).
The states in a finite-state Markov chain can be partitioned into two sets: recurrent states and transient states. A state is recurrent if it is accessible from all states that are accessible from it (i.e., \(i\) is recurrent if \(i \rightsquigarrow j\) implies that \(j \rightsquigarrow i\)). States that are not recurrent are transient.
It can be shown that a state \(i\) is recurrent if and only if \[\sum_{m=1}^{\infty} P^{(m)}_{ii} = \infty.\] This is a consequence of the second Borel-Cantelli lemma (Lemma 8.2) and strong Markov property (to be discussed later).
States \(i\) and \(j\) are said to belong to the same communicating class if \(i\) and \(j\) communicate. Communicating classes form a partition of the state space. Within a communicating class, all states are of the same type, i.e., either all states are recurrent (in which case the class is called a recurrent class) or all states are transient (in which case the class is called a transient class).
For example, in Example 10.9, there are two communicating classes: \(\{1,2,3\}\) and \(\{4,5,6\}\). The communicating class \(\{4,5,6\}\) is transient while the communicating class \(\{1,2,3\}\) is recurrent.
A communicating class \(C\) is said to be closed if \[ i \in C \text{ and } i \rightsquigarrow j \implies j \in C. \] Thus, there is no escape from a closed class. For finite state spaces, a recurrent class is always closed and a transient class is never closed. But this is not the case for countable state Markov chains.
A state \(i\) is called absorbing if \(\{i\}\) is a closed class, i.e., if \(P_{ii} = 1\).
A Markov chain with a single communicating class (thus, all states communicate with each other and are, therefore, recurrent) is called irreducible.
If \(C\) is a finite irreducible closed set of states, then every state in \(C\) is recurrent. This is an important result for finite-state Markov chains: any finite closed communicating class must be recurrent.
The period of a state \(i\), denoted by \(d(i)\), is defined as \[d(i) = \gcd\{ t \in \integers_{\ge 1} : [P^t]_{ii} > 0 \}.\] If the period is \(1\), the state is aperiodic, and if the period is \(2\) or more, the state is periodic. It can be shown that all states in the same class have the same period.
A Markov chain is aperiodic if all states are aperiodic. A simple sufficient (but not necessary) condition for an irreducible Markov chain to be aperiodic is that there exists a state \(i\) such that \(P_{ii} > 0\). In general, for a finite, irreducible, and aperiodic Markov chain, there exists a positive integer \(M\) such that \[ P^{(m)}_{ii} > 0, \quad \forall m \ge M, i \in \ALPHABET X.\]
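The period of a state can also be estimated numerically by examining powers of \(P\). The sketch below assumes NumPy; `period` is a hypothetical helper, and looking only at powers up to a finite cutoff is a heuristic stand-in for the gcd over all return times in the definition. For the reflected random walk of Example 10.6 it reports period \(2\) for every state:

```python
import numpy as np
from functools import reduce
from math import gcd

def period(P, i, max_power=50):
    """Heuristic estimate of d(i): gcd of all t <= max_power with [P^t]_{ii} > 0."""
    Pt = np.eye(P.shape[0])
    return_times = []
    for t in range(1, max_power + 1):
        Pt = Pt @ P
        if Pt[i, i] > 1e-12:
            return_times.append(t)
    return reduce(gcd, return_times) if return_times else 0

# Reflected random walk with p = q = 1/2 on states (-2, -1, 0, 1, 2)
P = np.array([[0.0, 1.0, 0.0, 0.0, 0.0],
              [0.5, 0.0, 0.5, 0.0, 0.0],
              [0.0, 0.5, 0.0, 0.5, 0.0],
              [0.0, 0.0, 0.5, 0.0, 0.5],
              [0.0, 0.0, 0.0, 1.0, 0.0]])
print([period(P, i) for i in range(5)])   # [2, 2, 2, 2, 2]
```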
10.4 Hitting times and absorption probabilities
We use the following notation:
- For any event \(E\), \(\PR_i(E)\) denotes \(\PR(E \mid X_0 = i)\)
- For any random variable \(Y\), \(\EXP_i[Y]\) denotes \(\EXP[Y \mid X_0 = i]\).
Let \(A\) be a subset of \(\ALPHABET X\). The hitting time \(H^A\) of \(A\) is a random variable \(H^A \colon Ω \to \{0, 1, \dots \} \cup \{∞\}\) given by \[ H^A(ω) = \min\{n \ge 0 : X_n(ω) \in A\}. \] The standard convention is that \(H^A\) is taken to be \(∞\) if \(X_n \notin A\) for all \(n \ge 0\). For a state \(j \in \ALPHABET X\), we use the short-hand \(H^j\) to denote \(H^{\{j\}}\).
The probability that starting from state \(i\) the Markov chain ever hits \(A\) is then given by \[ h^A_i = \PR_i(H^A < ∞). \] This is called the hitting probability. When \(A\) is a closed class, \(h^A_i\) is called the absorption probability.
The mean time taken for the Markov chain to reach \(A\) is given by \[ m^A_i = \EXP_i[H^A] = \sum_{n=0}^{∞} n \PR_i(H^A = n), \] with the convention that \(m^A_i = ∞\) if \(\PR_i(H^A = ∞) > 0\). This is called the mean hitting time.
A remarkable property of Markov chains is that these quantities can be computed by solving a system of linear equations associated with the transition probability matrix \(P\). We start with some examples to illustrate the main idea.
Example 10.10 Suppose we toss a coin multiple times. What is the probability of getting a head before getting a tail?
We model this using a Markov chain where the state represents the outcome we’re waiting for. Let \(p\) denote the probability of heads and \(q = 1-p\) denote the probability of tails. The Markov chain has three states:

- State \(-1\): tail has occurred (absorbing state, we lose)
- State \(0\): initial state (no outcome yet)
- State \(1\): head has occurred (absorbing state, we win)
Markov chain for coin tossing until head or tail

Let \(h^1_i\) denote the probability of hitting state \(1\) (i.e., getting a head) when starting at state \(i\). Then, we have \[\begin{align*} h^1_{-1} &= 0, \\ h^1_0 &= p h^1_1 + q h^1_{-1} = p \cdot 1 + q \cdot 0 = p, \\ h^1_1 &= 1. \end{align*}\] Therefore, starting from state \(0\), the probability of getting a head before a tail is \(h^1_0 = p\).
Example 10.11 Two players are playing a game. They repeatedly toss a coin. Player 1 wins if two consecutive heads occur before two consecutive tails. Player 2 wins if two consecutive tails occur before two consecutive heads. What is the probability that Player 1 wins?
We model this using a Markov chain where the state tracks the pattern of recent coin tosses. Let \(p\) denote the probability of heads and \(q = 1-p\) denote the probability of tails. The Markov chain has five states:

- State \(-2\): two consecutive tails (Player 2 wins, absorbing state)
- State \(-1\): one tail (last toss was a tail)
- State \(0\): initial state (or after alternating outcomes like HT or TH)
- State \(1\): one head (last toss was a head)
- State \(2\): two consecutive heads (Player 1 wins, absorbing state)
Markov chain for two heads before two tails

Let \(h^2_i\) denote the probability of hitting state \(2\) (i.e., Player 1 wins) when starting at state \(i\). Then, we have \[\begin{align*} h^2_{-2} &= 0, \\ h^2_{-1} &= p h^2_0 + q h^2_{-2} = p h^2_0 + q \cdot 0 = p h^2_0, \\ h^2_0 &= p h^2_1 + q h^2_{-1}, \\ h^2_1 &= p h^2_2 + q h^2_0 = p \cdot 1 + q h^2_0 = p + q h^2_0, \\ h^2_2 &= 1. \end{align*}\] From the second equation, we get \(h^2_{-1} = p h^2_0\). From the fourth equation, we get \(h^2_1 = p + q h^2_0\). Substituting into the third equation: \[ h^2_0 = p h^2_1 + q h^2_{-1} = p(p + q h^2_0) + q(p h^2_0) = p^2 + pq h^2_0 + pq h^2_0 = p^2 + 2pq h^2_0. \] Solving for \(h^2_0\), we get \(h^2_0(1 - 2pq) = p^2\), so \[ h^2_0 = \frac{p^2}{1 - 2pq} = \frac{p^2}{1 - 2p(1-p)} = \frac{p^2}{1 - 2p + 2p^2} = \frac{p^2}{2p^2 - 2p + 1}. \] For a fair coin (\(p = q = \frac{1}{2}\)), we get \(h^2_0 = \frac{(1/2)^2}{1 - 2(1/2)(1/2)} = \frac{1/4}{1 - 1/2} = \frac{1/4}{1/2} = \frac{1}{2}\).
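A quick Monte Carlo check of this answer, assuming NumPy and the illustrative value \(p = 0.6\):

```python
import numpy as np

def player1_wins(p, rng):
    """Toss coins until two consecutive identical outcomes; True if they are heads."""
    last = None
    while True:
        toss = 'H' if rng.random() < p else 'T'
        if toss == last:
            return toss == 'H'
        last = toss

p = 0.6
rng = np.random.default_rng(0)
estimate = sum(player1_wins(p, rng) for _ in range(100_000)) / 100_000
print(estimate, p**2 / (1 - 2 * p * (1 - p)))   # empirical vs. analytical (~0.69)
```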
We now provide a general formula for computing hitting probabilities.
Theorem 10.1 (Hitting probabilities) The hitting probabilities \(\{h^A_i\}_{i \in \ALPHABET X}\) satisfy the following system of linear equations: \[ h^A_i = \begin{cases} 1, & i \in A \\ \sum_{j \in \ALPHABET X} P_{ij} h^A_j, & i \not\in A \end{cases} \] When \(\ALPHABET X\) is finite and \(A\) is accessible from every state, the above system has a unique solution. In general, the system may have multiple solutions (for instance, when \(A\) is not accessible from some state), and the hitting probabilities correspond to the minimal non-negative solution.
Proof. When \(X_0 = i \in A\), the hitting time \(H^A = 0\), so \(h^A_i = 1\). This proves the first part of the formula.
For the second part, consider \(X_0 = i \not\in A\). Then \(H^A \ge 1\). By the Markov property, we have \[ \PR_i(H^A < ∞ \mid X_1 = j) = \PR_j(H^A < ∞) = h^A_j. \] Moreover, by the law of total probability, we have \[\begin{align*} h^A_i &= \PR_i(H^A < ∞) = \sum_{j \in \ALPHABET X} \PR_i(H^A < ∞, X_1 = j) \\ &= \sum_{j \in \ALPHABET X} \PR_i(H^A < ∞ \mid X_1 = j) \PR_i(X_1 = j) \\ &= \sum_{j \in \ALPHABET X} P_{ij} h^A_j. \end{align*}\]
Example 10.12 Consider a gambler’s ruin problem, where the gambler starts with $\(1\) and stops either when he is ruined or when his fortune reaches $\(K\).
Find the probability of ruin (i.e., the fortune gets absorbed in state \(0\) rather than state \(K\)).
Solution. For ease of notation, we use \(h_i\) as a short-form for \(h^{\{0\}}_i\). The hitting probabilities satisfy the linear system of equations:
\[\begin{align*} h_0 &= 1 \\ h_i &= ph_{i+1} + q h_{i-1}, \quad i \in \{1,\dots,K-1\} \\ h_K &= 0, \end{align*}\] where \(q = 1-p\).
The characteristic equation associated with the linear recurrence relationship is \[ λ = p λ^2 + q \] which has two distinct roots, \(λ_1 = 1\) and \(λ_2 = q/p\), if \(p \neq q\), and a double root at \(λ = 1\) if \(p = q = \frac 12\). For \(p \neq q\), the general solution is of the form \[ h_i = a λ_1^i + b λ_2^i = a + b\Bigl(\tfrac {q}{p} \Bigr)^i \] where we determine the coefficients \(a\) and \(b\) from the boundary conditions \(h_0 = 1\) and \(h_K = 0\). Solving for \(a\) and \(b\), we get that for \(p \neq q\), \[ h_i = \frac{\Bigl(\frac qp\Bigr)^i - \Bigl(\frac qp\Bigr)^K}{1 - \Bigl(\frac qp\Bigr)^K}. \] For \(p = q = \frac 12\), the general solution is of the form \(h_i = a + b i\), and the boundary conditions give \[ h_i = 1 - \frac{i}{K}. \]
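The same probabilities can be obtained by setting up the linear system of Theorem 10.1 and solving it numerically. A sketch, assuming NumPy and the illustrative values \(p = 0.4\) and \(K = 10\):

```python
import numpy as np

p, K = 0.4, 10
q = 1 - p

# Unknowns h_0, ..., h_K with h_0 = 1, h_K = 0, and
# h_i - p h_{i+1} - q h_{i-1} = 0 for 1 <= i <= K-1.
A = np.zeros((K + 1, K + 1))
rhs = np.zeros(K + 1)
A[0, 0], rhs[0] = 1.0, 1.0
A[K, K], rhs[K] = 1.0, 0.0
for i in range(1, K):
    A[i, i - 1], A[i, i], A[i, i + 1] = -q, 1.0, -p

h = np.linalg.solve(A, rhs)
i = np.arange(K + 1)
h_closed = ((q / p) ** i - (q / p) ** K) / (1 - (q / p) ** K)
print(np.allclose(h, h_closed))   # True
```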
In a similar manner, we can derive a formula for computing mean hitting times. We start with a couple of examples to illustrate the main idea.
Example 10.13 Suppose we toss a coin multiple times and stop at the first head. What is the expected number of coin tosses until stopping?
From elementary probability we know that the number of tosses until stopping is a geometric random variable. However, we will model this using a Markov chain where the state denotes the number of consecutive heads so far. Let \(p\) denote the probability of heads and \(q = 1-p\) denote the probability of tails. Then, the Markov chain model is as follows.
Markov chain for coin tossing until one head

Let \(m^1_i\) denote the expected number of tosses until stopping (i.e., hitting state \(1\)) when starting at state \(i\). Then, we have \[\begin{align*} m^1_0 &= 1 + q m^1_0 + p m^1_1, \\ m^1_1 &= 0. \end{align*}\] Solving this system of equations, we get \(m^1_0 = 1/(1-q) = 1/p\).
Example 10.14 Suppose we toss a coin multiple times and stop at two consecutive heads. What is the expected number of coin tosses until stopping?
We can model this in the same manner as before, where the state denotes the number of consecutive heads so far. The Markov chain is as follows:
Markov chain for coin tossing until two heads

Let \(m^2_i\) denote the expected number of tosses until stopping (i.e., hitting state \(2\)) when starting at state \(i\). Then, we have \[\begin{align*} m^2_0 &= 1 + q m^2_0 + p m^2_1, \\ m^2_1 &= 1 + q m^2_0 + p m^2_2, \\ m^2_2 &= 0. \end{align*}\] Solving this system of equations, we get \(m^2_0 = (1+p)/p^2\), which equals \(6\) tosses for a fair coin.
We now provide a general formula for computing mean hitting times.
Theorem 10.2 (Mean hitting times) The mean hitting times \(\{m^A_i\}_{i \in \ALPHABET X}\) satisfy the following system of linear equations: \[ m^A_i = \begin{cases} 0, & i \in A \\ 1 + \sum_{j \not\in A} P_{ij} m^A_j, & i \not\in A \end{cases} \] When \(\ALPHABET X\) is finite and the chain reaches \(A\) with probability one from every state, the above system has a unique solution. In general, the system may have multiple solutions, and the mean hitting times correspond to the minimal non-negative solution.
Proof. The proof is similar to the proof of Theorem 10.1. When \(X_0 = i \in A\), the hitting time \(H^A = 0\), so \(m^A_i = 0\). This proves the first part of the formula.
For the second part, consider \(X_0 = i \not\in A\). Then \(H^A \ge 1\). By the Markov property, we have \[ \EXP_i[ H^A \mid X_1 = j] = 1 + \EXP_j[ H^A ] = 1 + m^A_j. \] Moreover, by the law of total expectation, we have \[\begin{align*} m^A_i &= \EXP_i[ H^A ] = \sum_{j \in \ALPHABET X} \EXP_i[ H^A \mid X_1 = j ] \PR_i(X_1 = j) \\ &= \sum_{j \in \ALPHABET X} P_{ij} \bigl[ 1 + m^A_j \bigr] \\ &= 1 + \sum_{j \not\in A} P_{ij} m^A_j. \end{align*}\]
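For states outside \(A\), the system above can be written in matrix form as \((I - Q) m = \mathbf{1}\), where \(Q\) is \(P\) restricted to \(\ALPHABET X \setminus A\). A small generic solver sketch, assuming NumPy (`mean_hitting_times` is a hypothetical helper), applied to the chain of Example 10.14 with a fair coin:

```python
import numpy as np

def mean_hitting_times(P, A):
    """Solve (I - Q) m = 1 on the states outside A; m_i = 0 for i in A."""
    n = P.shape[0]
    T = [i for i in range(n) if i not in A]     # states outside A
    Q = P[np.ix_(T, T)]
    m = np.zeros(n)
    m[T] = np.linalg.solve(np.eye(len(T)) - Q, np.ones(len(T)))
    return m

# Example 10.14 with p = 1/2: state = number of consecutive heads so far (0, 1, 2)
p = 0.5
P = np.array([[1 - p, p, 0.0],
              [1 - p, 0.0, p],
              [0.0, 0.0, 1.0]])
print(mean_hitting_times(P, A={2}))   # m_0 = (1 + p)/p^2 = 6 for a fair coin
```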
For absorbing Markov chains, we can use matrix methods to efficiently compute hitting probabilities and mean hitting times. The key idea is to rearrange the transition matrix into a canonical form where absorbing states are grouped together.
Suppose the state space \(\ALPHABET X\) can be partitioned into transient states \(\ALPHABET T\) and absorbing states \(\ALPHABET C\). By reordering the states so that all transient states come first, followed by all absorbing states, the transition matrix \(P\) can be written in the canonical form: \[ P = \MATRIX{ Q & R \\ 0 & I}. \] where:
- \(Q\) is a \(|\ALPHABET T| \times |\ALPHABET T|\) matrix containing transition probabilities between transient states,
- \(R\) is a \(|\ALPHABET T| \times |\ALPHABET C|\) matrix containing transition probabilities from transient states to absorbing states,
- \(0\) is a \(|\ALPHABET C| \times |\ALPHABET T|\) zero matrix (since absorbing states cannot transition to transient states),
- \(I\) is a \(|\ALPHABET C| \times |\ALPHABET C|\) identity matrix (since absorbing states stay in themselves).
Multi-step transitions: For the canonical form, we can show by induction that \[ P^n = \MATRIX{ Q^n & (I + Q + Q^2 + \cdots + Q^{n-1}) R \\ 0 & I}. \] This follows from the fact that matrix multiplication preserves the block structure, and the lower-left block remains zero while the lower-right block remains \(I\) for all powers.
Fundamental Matrix: The fundamental matrix \(N\) is defined as \[ N = (I - Q)^{-1} = I + Q + Q^2 + Q^3 + \cdots. \] The series expansion is valid because \(Q\), the restriction of \(P\) to the transient states, has spectral radius strictly less than 1, which ensures convergence. Thus the fundamental matrix exists and is well-defined whenever every non-absorbing state is transient (equivalently, whenever absorption occurs with probability one from every state), which is the setting assumed here.
Interpretation of \(N_{ij}\): The entry \(N_{ij}\) represents the expected number of visits to transient state \(j\) before absorption, starting from transient state \(i\). This can be seen from the series expansion: \[ N_{ij} = \sum_{n=0}^∞ [Q^n]_{ij} = \sum_{n=0}^∞ \PR_i(X_n = j, \text{ chain hasn't been absorbed by time } n). \]
Mean hitting times: Let \(\ONES\) denote a column vector of ones with dimension \(|\ALPHABET T|\). Then the mean hitting times (expected time until absorption) starting from each transient state are given by \[ m = N \ONES, \] where \(m_i\) is the mean time until absorption starting from transient state \(i\).
Proof. The mean hitting time \(m_i\) equals the expected number of steps spent in transient states before absorption. Since \(N_{ij}\) represents the expected number of visits to transient state \(j\) starting from transient state \(i\), we have \[ m_i = \sum_{j \in \ALPHABET T} N_{ij} = [N \ONES]_i, \] which gives the desired result in matrix form.
Hitting probabilities: The hitting probabilities (absorption probabilities) are given by \[ h = N R, \] where \(h_{ij}\) is the probability of being absorbed in absorbing state \(j\) when starting from transient state \(i\).
Proof. Starting from transient state \(i\), the chain is absorbed in absorbing state \(j\) if and only if it takes some number \(n \ge 0\) of steps within transient states, and then transitions to \(j\) in step \(n+1\). The probability of being in transient state \(k\) after \(n\) steps (without being absorbed) is \([Q^n]_{ik}\), and the probability of transitioning from \(k\) to \(j\) is \(R_{kj}\). Therefore, \[ h_{ij} = \sum_{n=0}^∞ \sum_{k \in \ALPHABET T} [Q^n]_{ik} R_{kj} = \sum_{n=0}^∞ [Q^n R]_{ij} = [N R]_{ij}, \] where the last equality follows from \(N = I + Q + Q^2 + \cdots\).
Example 10.15 Consider the gambler’s ruin problem from Example 10.12 with \(K = 3\) (i.e., the gambler starts with $\(1\) and stops at either $\(0\) or $\(3\)). Write the transition matrix in canonical form and use the fundamental matrix to compute the probability of ruin and the expected duration of play.
Solution. The state space is \(\{0, 1, 2, 3\}\) where states \(0\) and \(3\) are absorbing. In the natural ordering \((0, 1, 2, 3)\), the transition matrix is: \[ P = \MATRIX{ 1 & 0 & 0 & 0 \\ q & 0 & p & 0 \\ 0 & q & 0 & p \\ 0 & 0 & 0 & 1 } \] where \(q = 1-p\). Reordering to put transient states first, we use the ordering \((1, 2, 0, 3)\): \[ P = \MATRIX{ 0 & p & q & 0 \\ q & 0 & 0 & p \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 } = \MATRIX{ Q & R \\ 0 & I } \] where \[ Q = \MATRIX{ 0 & p \\ q & 0 }, \quad R = \MATRIX{ q & 0 \\ 0 & p }. \]
The fundamental matrix is: \[ N = (I - Q)^{-1} = \MATRIX{ 1 & -p \\ -q & 1 }^{-1} = \frac{1}{1-pq} \MATRIX{ 1 & p \\ q & 1 }. \]
The hitting probabilities are: \[ h = N R = \frac{1}{1-pq} \MATRIX{ 1 & p \\ q & 1 } \MATRIX{ q & 0 \\ 0 & p } = \frac{1}{1-pq} \MATRIX{ q & p^2 \\ q^2 & p }. \]
Thus, starting from state \(1\), the probability of ruin (absorption in state \(0\)) is \(h_{1,0} = q/(1-pq)\) and the probability of winning (absorption in state \(3\)) is \(h_{1,3} = p^2/(1-pq)\).
The mean hitting times are: \[ m = N \ONES = \frac{1}{1-pq} \MATRIX{ 1 & p \\ q & 1 } \MATRIX{ 1 \\ 1 } = \frac{1}{1-pq} \MATRIX{ 1+p \\ q+1 }. \]
Thus, starting from state \(1\), the expected duration is \(m_1 = (1+p)/(1-pq)\).
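These expressions are easy to verify numerically. A sketch, assuming NumPy and \(p = q = \tfrac12\) for concreteness:

```python
import numpy as np

p = 0.5
q = 1 - p

# Canonical-form blocks for the K = 3 gambler's ruin, transient states (1, 2) first
Q = np.array([[0.0, p], [q, 0.0]])
R = np.array([[q, 0.0], [0.0, p]])

N = np.linalg.inv(np.eye(2) - Q)   # fundamental matrix
h = N @ R                          # absorption probabilities (columns: states 0 and 3)
m = N @ np.ones(2)                 # mean absorption times

print(h)   # [[q/(1-pq), p^2/(1-pq)], [q^2/(1-pq), p/(1-pq)]]
print(m)   # [(1+p)/(1-pq), (1+q)/(1-pq)]
```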
10.5 Invariant Distribution
A probability distribution \(π\) on the state space \(\ALPHABET X\) is called an invariant distribution if it satisfies the equation \[ π = π P. \]
An invariant distribution is also called a stationary distribution. If the initial distribution of the Markov chain is \(π\), then the distribution remains \(π\) at all future times. In other words, if \(μ^{(0)} = π\) then \(μ^{(n)} = π\) for all \(n \ge 0\).
Consider the on-off Markov chain from Example 10.4 with transition matrix \[ P = \MATRIX{ 1 - a & a \\ b & 1 - b }. \] This chain has an invariant distribution given by \(π = [b/(a+b), a/(a+b)]\) (assuming \(a+b > 0\)). One can verify that \(π = π P\), which shows that an invariant distribution exists for this chain.
A finite Markov chain with only one communicating class (i.e., a finite irreducible Markov chain) has a unique invariant distribution. Note that aperiodicity is not required. For instance, consider the on-off Markov chain from Example 10.4 with \(a = b = 1\). This chain is periodic, but has a unique invariant distribution \(π = [\frac 12, \frac 12]\).
A Markov chain with more than one closed communicating class has multiple invariant distributions. For example, consider a Markov chain with two absorbing states \(0\) and \(1\). The transition matrix is \[ P = \MATRIX{ 1 & 0 \\ 0 & 1 }. \]
Markov chain with two absorbing states

In this case, any distribution of the form \(π = [α, 1-α]\) with \(α \in [0,1]\) is an invariant distribution, since \[ π P = [α, 1-α] \MATRIX{ 1 & 0 \\ 0 & 1 } = [α, 1-α] = π. \] Thus, there are uncountably many invariant distributions. This happens because the chain consists of two disjoint recurrent classes (the absorbing states \(0\) and \(1\)), and any probability mass can be put on either class and be preserved.
More generally, if a Markov chain has multiple closed communicating classes, each class has its own invariant distribution (supported only on that class), and any convex combination of these distributions is also an invariant distribution for the whole chain.
A Markov chain that is irreducible and aperiodic has a special property: for any initial distribution \(μ^{(0)}\), we have \[\lim_{n \to ∞} μ^{(n)}_j = π_j, \quad \forall j \in \ALPHABET X,\] where \(π\) is the unique invariant distribution.
Irreducible and aperiodic Markov chains are also called ergodic. Thus, an ergodic Markov chain converges to its invariant distribution. For this reason, the invariant distribution is also called the limiting distribution or the steady-state distribution.
Computing invariant distributions. The equation \(π = π P\) can be written component-wise as \[ π_j = \sum_{i \in \ALPHABET X} π_i P_{ij}, \quad \forall j \in \ALPHABET X. \] These equations are called balance equations and can be used to find invariant distributions. Together with the constraint that \(π\) is a probability distribution (i.e., \(\sum_{j \in \ALPHABET X} π_j = 1\) and \(π_j \ge 0\) for all \(j\)), they form a system of linear equations that can be solved to determine the invariant distribution.
As an example, let’s compute the invariant distribution of the on-off Markov chain from Example 10.4. The transition matrix is \[ P = \MATRIX{ 1 - a & a \\ b & 1 - b }. \] To find the invariant distribution \(π = [π_0, π_1]\), we solve the balance equations \(π = π P\): \[\begin{align*} π_0 &= π_0 (1-a) + π_1 b, \\ π_1 &= π_0 a + π_1 (1-b). \end{align*}\] From the first equation, we get \(π_0 = π_0 (1-a) + π_1 b\), which simplifies to \(π_0 a = π_1 b\), or equivalently \(π_1 = (a/b) π_0\) (assuming \(b > 0\)). Using the normalization constraint \(π_0 + π_1 = 1\), we have \[ π_0 + \frac{a}{b} π_0 = 1 \quad \Rightarrow \quad π_0 \left(1 + \frac{a}{b}\right) = 1 \quad \Rightarrow \quad π_0 = \frac{b}{a+b}. \] Therefore, \(π_1 = 1 - π_0 = \frac{a}{a+b}\). The invariant distribution is \(π = [b/(a+b), a/(a+b)]\), which matches the result stated earlier.
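For larger chains, solving the balance equations by hand becomes tedious; the invariant distribution can instead be computed numerically as a left eigenvector of \(P\) with eigenvalue \(1\). A minimal sketch, assuming NumPy (`invariant_distribution` is a hypothetical helper; for chains with several closed classes it returns only one of the invariant distributions):

```python
import numpy as np

def invariant_distribution(P):
    """Return a left eigenvector of P for eigenvalue 1, normalized to sum to 1."""
    evals, evecs = np.linalg.eig(P.T)        # right eigenvectors of P^T = left eigenvectors of P
    k = np.argmin(np.abs(evals - 1.0))
    pi = np.real(evecs[:, k])
    return pi / pi.sum()

a, b = 0.3, 0.1                              # illustrative values
P = np.array([[1 - a, a], [b, 1 - b]])
print(invariant_distribution(P))             # [b/(a+b), a/(a+b)] = [0.25, 0.75]
```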
We can write balance equations directly from the state transition diagram. For each state \(j\), the balance equation states that the probability flow into state \(j\) equals the probability flow out of state \(j\). Consider the three-state Markov chain with transition matrix \[ P = \MATRIX{ 0.3 & 0.7 & 0 \\ 0 & 0.3 & 0.7 \\ 0.7 & 0 & 0.3 }. \]
Three-state aperiodic Markov chain

The balance equations, equating the flow into and out of each state, can be written as:

- State 0: Inflow \(= \pi_2 \cdot 0.7\); Outflow \(= \pi_0 \cdot 0.7\)
- State 1: Inflow \(= \pi_0 \cdot 0.7\); Outflow \(= \pi_1 \cdot 0.7\)
- State 2: Inflow \(= \pi_1 \cdot 0.7\); Outflow \(= \pi_2 \cdot 0.7\)
Thus, the balance equations are given by \[\begin{align*} \pi_0 \cdot 0.7 = \pi_2 \cdot 0.7 \\ \pi_1 \cdot 0.7 = \pi_0 \cdot 0.7 \\ \pi_2 \cdot 0.7 = \pi_1 \cdot 0.7 \end{align*}\] Together with the normalization constraint \(π_0 + π_1 + π_2 = 1\), we get \(π_0 = π_1 = π_2 = 1/3\). This illustrates how balance equations can be derived directly from the state diagram by considering probability flows.
The above example is an illustration of a more general principle. If \(P\) is doubly stochastic (i.e., both the row sums and the column sums are \(1\)), then the uniform distribution is an invariant distribution.
Exercises: Computing Invariant Distributions
Example 10.16 Consider the Markov chain with transition matrix
\[ P = \MATRIX{ 0.6 & 0.4 & 0 & 0 \\ 0.3 & 0.7 & 0 & 0 \\ 0.2 & 0.3 & 0.3 & 0.2 \\ 0.1 & 0.4 & 0.1 & 0.4 }. \]

Markov chain with transient states and closed class

This chain has states \(\{0, 1, 2, 3\}\) where states \(\{0, 1\}\) form a closed communicating class and states \(\{2, 3\}\) are transient. Find all invariant distributions for this chain.
Example 10.17 Consider the Markov chain with transition matrix
\[ P = \MATRIX{ 1 & 0 & 0 \\ 0.2 & 0.4 & 0.4 \\ 0 & 0 & 1 }. \]

Markov chain with absorbing states

Note that this chain is not irreducible (it has two absorbing states). Compute all possible invariant distributions.
10.6 Limiting Distribution
Suppose a finite Markov chain has the property that starting from any initial distribution, the distribution of \(X_n\) converges to some distribution \(\pi\) as \(n \to \infty\): \[ \lim_{n \to \infty} P^{(n)}_{i, j} = \pi_j \qquad \text{for all } i, j \in \ALPHABET X. \] Then, \(\pi\) is an invariant distribution.
Proof. We start by showing that \(π\) must be a valid probability distribution. In particular, \[ \sum_{j \in \ALPHABET X} π_j = \sum_{j \in \ALPHABET X} \lim_{n \to ∞} P^{(n)}_{ij} = \lim_{n \to ∞} \sum_{j \in \ALPHABET X} P^{(n)}_{ij} = 1, \] where the interchange of limit and sum is valid because \(\ALPHABET X\) is finite.
We now show that \(π\) is an invariant distribution. \[ π_j = \lim_{n \to ∞} P^{(n)}_{ij} = \lim_{n \to ∞} \sum_{k \in \ALPHABET X} P^{(n-1)}_{ik} P_{kj} = \sum_{k \in \ALPHABET X} \lim_{n \to ∞} P^{(n-1)}_{ik} P_{kj} = \sum_{k \in \ALPHABET X} π_k P_{kj} \] Thus, \(π\) is an invariant distribution.
As discussed above, limiting distributions exist for ergodic Markov chains (i.e., irreducible and aperiodic Markov chains): for any initial distribution \(μ^{(0)}\), we have \[\lim_{n \to ∞} μ^{(n)}_j = π_j, \quad \forall j \in \ALPHABET X.\]
This is equivalent to the statement that for all \(i, j \in \ALPHABET X\), \[\lim_{n \to \infty} P^{(n)}_{ij} = π_j.\] This is a consequence of the Perron-Frobenius theorem. In matrix form, we have \[ \lim_{n \to \infty} P^n = \MATRIX{ \pi \\ \pi \\ \vdots \\ \pi } \] where each row of the limiting matrix is the invariant distribution \(π\). That is, for large \(n\), every row of \(P^n\) approaches \(π\).
In fact, we have a stronger result: the Strong Law of Large Numbers (SLLN) for (finite) Markov chains. Let \(\{X_n\}_{n \ge 0}\) be an irreducible and aperiodic (i.e., ergodic) Markov chain. Then, for any initial distribution, \[ \frac{1}{N} \sum_{n=1}^N \IND\{X_n = j\} \xrightarrow{a.s.} \pi_j \quad \text{as } N \to \infty, \quad \forall j \in \ALPHABET X. \] This is known as the ergodic property: sample averages converge to ensemble averages.
An immediate consequence of the ergodic property is that given an ergodic Markov chain \(\{X_n\}_{n \ge 0}\) and any function \(f \colon \ALPHABET X \to \reals\), we have \[ \frac{1}{N} \sum_{n=1}^N f(X_n) \xrightarrow{a.s.} \sum_{j\in \ALPHABET X} \pi_j f(j) \quad \text{as } N \to \infty. \]
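The ergodic property can be illustrated by simulation. A sketch, assuming NumPy and the on-off chain with illustrative values \(a = 0.3\), \(b = 0.1\):

```python
import numpy as np

a, b = 0.3, 0.1
P = np.array([[1 - a, a], [b, 1 - b]])
rng = np.random.default_rng(0)

N = 100_000
x = 0
visits = np.zeros(2)
for _ in range(N):
    x = rng.choice(2, p=P[x])   # simulate one step of the chain
    visits[x] += 1

print(visits / N)               # close to [b/(a+b), a/(a+b)] = [0.25, 0.75]
```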
If the chain is periodic with period \(d > 1\) (i.e., not aperiodic), the limit \(\lim_{n \to \infty} P^n\) does not exist, as the transition probabilities continue to oscillate. However, if the chain is irreducible, it still possesses a unique invariant distribution \(π\): the \(n\)-step transition probabilities converge along subsequences of multiples of the period \(d\) (rather than along every \(n\)), and the time-average (ergodic) results above continue to hold.
10.7 Censored Markov chains
Suppose a Markov chain is only observed when the state lies in a subset \(\ALPHABET O \subset \ALPHABET X\). Let \(\{τ_k\}_{k \ge 1}\) denote the successive times at which the Markov chain visits the set \(\ALPHABET O\), i.e., \(τ_1\) is the first time when the Markov chain \(\{X_n\}_{n \ge 1}\) hits \(\ALPHABET O\); \(τ_2\) denotes the second time when the Markov chain hits \(\ALPHABET O\); and so on.
Define the process \(\{O_k\}_{k \ge 1}\) by \(O_k = X_{τ_k}\). The strong Markov property implies that \(\{O_k\}_{k \ge 1}\) is a Markov chain. What is the transition probability matrix \(\tilde P\) of this chain?
As we did for the absorbing Markov chains, we re-order states so that the unobservable states \(\ALPHABET O^c\) come first, followed by all observable states \(\ALPHABET O\). Then the transition matrix \(P\) can be written as \[ P = \MATRIX{ Q & R \\ S & P_O } \] where:
- \(Q\) is a \(\ABS{\ALPHABET O^c} × \ABS{\ALPHABET O^c}\) matrix containing the transition probabilities between unobserved states,
- \(R\) is a \(\ABS{\ALPHABET O^c} × \ABS{\ALPHABET O}\) matrix containing the transition probabilities from unobserved to observed states,
- \(S\) is the \(\ABS{\ALPHABET O} × \ABS{\ALPHABET O^c}\) matrix containing the transition probabilities from observed to unobserved states
- \(P_O\) is the \(\ABS{\ALPHABET O} × \ABS{\ALPHABET O}\) matrix containing the transitions from observed to observed states.
Suppose we start with a state \(O_1 = X_1 \in \ALPHABET O\). Then the probability of going to each of the states in \(\ALPHABET O\) in one step is given by \(P_O\). To take more than one step, the chain must enter a state in \(\ALPHABET O^c\), with probabilities given by \(S\). We will assume that the set \(\ALPHABET O^c\) does not contain any closed (absorbing) subsets. Then, as we showed in the analysis for absorbing Markov chains, starting from a given state in \(\ALPHABET O^c\), the probabilities with which the Markov chain enters each state of \(\ALPHABET O\) for the first time are given by the corresponding row of \(NR = (I - Q)^{-1} R\). Thus, we have that \[ \tilde P = P_O + S(I - Q)^{-1}R. \]
If \(P\) has only one communicating class, then \(\{X_n\}_{n \ge 1}\) has a unique invariant distribution, which we denote by \(π\). Then \((π_k)_{k \in \ALPHABET O}\), normalized to have sum \(1\), is the unique invariant distribution of \(\{O_k\}_{k \ge 1}\).
As an example, consider the reflected random walk from Example 10.6 (point 2) with state space \(\{-2, -1, 0, 1, 2\}\).

Reflected random walk

Suppose we only observe the process when \(X_n \in \ALPHABET O = \{0, 1, 2\}\), so the unobservable states are \(\ALPHABET O^c = \{-2, -1\}\).
Since the states are already ordered in the right order, we can write the transition matrix as: \[ P = \left[\begin{array}{cc|ccc} 0 & 1 & 0 & 0 & 0 \\ q & 0 & p & 0 & 0 \\ \hline 0 & q & 0 & p & 0 \\ 0 & 0 & q & 0 & p \\ 0 & 0 & 0 & 1 & 0\end{array}\right]. \]
To simplify computations, we will assume \(p = \tfrac{1}{4}\) and \(q = \tfrac{3}{4}\). Then, the transition matrix of the Markov chain \(\{O_k\}_{k \ge 1}\) is given by
\[\begin{align*} \tilde P &= P_O + S(I - Q)^{-1} R \\ &= \MATRIX{ 0 & \tfrac{1}{4} & 0 \\ \tfrac{3}{4} & 0 & \tfrac{1}{4} \\ 0 & 1 & 0 } + \MATRIX{ 0 & \tfrac{3}{4} \\ 0 & 0 \\ 0 & 0 } \MATRIX{ 1 & -1 \\ -\tfrac{3}{4} & 1 }^{-1} \MATRIX{ 0 & 0 & 0 \\ \tfrac{1}{4} & 0 & 0 } \\ &= \MATRIX{ \tfrac{3}{4} & \tfrac{1}{4} & 0 \\ \tfrac{3}{4} & 0 & \tfrac{1}{4} \\ 0 & 1 & 0 }. \end{align*}\]
This intuitively makes sense. The chain \(\{X_n\}_{n \ge 1}\) can only go to unobserved states from state \(X_n = 0\). When it does go to \(\ALPHABET O^c\), the first observed state it visits must be \(0\). Then \(\tilde P_{00} = P_{0,-1} = \frac 34\). The rest of the entries of \(\tilde P\) are the same as \(P\).
The invariant distribution of \(\tilde P\) is \[\MATRIX{ \tfrac{12}{17} & \tfrac{4}{17} & \tfrac{1}{17} }\] while the invariant distribution of \(P\) is \[\MATRIX{ \tfrac{27}{80} & \tfrac{36}{80} & \tfrac{12}{80} & \tfrac{4}{80} & \tfrac{1}{80} }.\]
Observe that if we take the last three elements of the latter and normalize them, we get the former.
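This censoring computation is also easy to check numerically. A sketch, assuming NumPy and \(p = \tfrac14\), \(q = \tfrac34\) as above:

```python
import numpy as np

p, q = 0.25, 0.75

# Blocks of the reflected random walk, unobserved states (-2, -1) listed first
Q = np.array([[0.0, 1.0], [q, 0.0]])
R = np.array([[0.0, 0.0, 0.0], [p, 0.0, 0.0]])
S = np.array([[0.0, q], [0.0, 0.0], [0.0, 0.0]])
P_O = np.array([[0.0, p, 0.0], [q, 0.0, p], [0.0, 1.0, 0.0]])

P_tilde = P_O + S @ np.linalg.inv(np.eye(2) - Q) @ R
print(P_tilde)   # [[0.75, 0.25, 0], [0.75, 0, 0.25], [0, 1, 0]]
```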
Exercises
Exercise 10.1 (Time reversal of Markov chains) Let \(\{X_n\}_{n \ge 0}\) be a Markov chain. Show that for any \(N > n\), \[ \PR(X_n = x_n \mid X_{n+1:N} = x_{n+1:N}) = \PR(X_n = x_n \mid X_{n+1} = x_{n+1}). \] Thus, a time reversed Markov chain is also Markov.
Exercise 10.2 Suppose \(\{X_n\}_{n \ge 0}\) is a Markov chain with transition matrix \(P\). For a fixed positive integer \(k\), define \(Y_n = X_{kn}\). Show that \(\{Y_n\}_{n \ge 0}\) is a Markov chain with transition matrix \(P^k\).
Exercise 10.3 Suppose a (6-sided) die is ‘fixed’ so that two consecutive rolls cannot have the same outcome. In particular, if the outcome of a roll is \(i\), then the next roll cannot be \(i\); all \(5\) other outcomes are equally likely.
- Model the above as a Markov chain.
- If the outcome of the first roll is \(1\), what is the probability that the outcome of the \(n\)th roll is also \(1\)?
Further Reading
Gubner, Chapter 12.
Grinstead and Snell, Introduction to Probability, Chapter 11 has an excellent treatment of finite Markov chains. Has interesting exercises.
Kemeny and Snell, Finite Markov Chains, Chapter 2–4.
Kemeny, Snell, and Knapp, Denumerable Markov Chains, a slightly advanced version of the previous book.
Doyle and Snell, Random Walks and Electrical Networks, shows a fascinating relationship between random walks and circuits!