9  Markov chains

Updated: November 7, 2025

Let \(\ALPHABET X\) be a finite set. A stochastic process \(\{X_n\}_{n \ge 0}\), \(X_n \in \ALPHABET X\), is called a Markov chain if it satisfies the Markov property: for any \(n \in \integers_{\ge 0}\) and any \(x_{1:n+1} \in \ALPHABET X^{n+1}\), we have \[\begin{equation}\tag{Markov property}\label{eq:Markov} \PR(X_{n+1} = x_{n+1} \mid X_{1:n} = x_{1:n}) = \PR(X_{n+1} = x_{n+1} \mid X_n = x_n). \end{equation}\]

The variable \(X_n\) is called the state of the Markov chain at time \(n\); the set \(\ALPHABET X\) is called the state space. Property \(\eqref{eq:Markov}\) implies that the current state captures all the information from the past that is relevant for the future. Stated differently, conditioned on the present, the past is independent of the future.

Independence is a symmetric relationship. Thus, we expect the Markov property to also hold if time is reversed! Exercise 9.1 asks you to formally prove that.

We now present some examples of Markov chains arising in different applications.

Example 9.1 (Gilbert-Elliott channel model for burst erasures)  

Example 9.2 (Ehrenfest model of diffusion) The following simplified model of diffusion through a porous membrane was proposed by @Ehrenfest1907 to describe the exchange of heat between two systems at different temperatures.

There are \(K\) particles that can either be in compartment \(A\) or compartment \(B\). At each time, a particle is picked at random and moved from its compartment to the other. Thus, if there are \(X_n = i\) particles in compartment \(A\) at time \(n\), then the next state \(X_{n+1}\) is either \(i-1\) (a particle was displaced from compartment \(A\) to \(B\)) with probability \(i/K\), or \(i+1\) (a particle was displaced from compartment \(B\) to \(A\)) with probability \((K-i)/K\).

Example 9.3 (Gambler’s ruin) Consider a gambler who is playing in a casino. The gambler starts with an initial fortune of $\(x_0\). At each time, the gambler places a bet where he wins $\(1\) with probability \(p\) and loses $\(1\) with probability \(1-p\). Thus, his fortune evolves as \[ X_{n+1} = X_n + Z_n \] where \(\{Z_n\}_{n \ge 0}\) is an i.i.d. sequence which takes values \(+1\) with probability \(p\) and \(-1\) with probability \(1-p\).

The gambler stops when his fortune is ruined, i.e., \(X_n = 0\). In some variations, the gambler may also stop when his fortune reaches a pre-specified amount of $\(K\).
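
The evolution of the gambler's fortune is straightforward to simulate. Below is a minimal sketch in Python (assuming NumPy; the function name `simulate_gamblers_ruin` and the parameter values are illustrative, not part of the example):

```python
import numpy as np

def simulate_gamblers_ruin(x0, p, K=None, rng=None, max_steps=10_000):
    """Simulate the fortune X_{n+1} = X_n + Z_n until ruin (state 0),
    until the fortune reaches K (if K is given), or until max_steps."""
    rng = np.random.default_rng() if rng is None else rng
    trajectory = [x0]
    x = x0
    for _ in range(max_steps):
        if x == 0 or (K is not None and x == K):
            break                                      # absorbed
        x += rng.choice([1, -1], p=[p, 1 - p])         # win or lose $1
        trajectory.append(x)
    return trajectory

# Start with $5, fair bets, stop at $0 or $10.
print(simulate_gamblers_ruin(x0=5, p=0.5, K=10, rng=np.random.default_rng(0)))
```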

See the following video from Veritasium for an excellent history of Markov chains and their applications.

9.1 Time-homogeneous Markov chains

In this course, we will focus on time-homogeneous Markov chains. The Markov chain is called time-homogeneous if the right hand side of \(\eqref{eq:Markov}\) does not depend on \(n\). In this case, we describe the Markov chain by a state transition matrix \(P\), where \(P_{ij} = \PR(X_{n+1} = j | X_n = i)\). Such Markov chains can also be visualized using state transition diagrams as we illustrate in the examples below.

Example 9.4 (On-Off Markov chain) The On-Off Markov chain discussed earlier can be modelled with \(\ALPHABET X = \{0, 1\}\) and a general transition probability matrix of the form \[ P = \MATRIX{ 1 - a & a \\ b & 1 - b }. \] The transition matrix can be visualized as follows.

On-Off Markov chain

In spite of its simplicity, the on-off Markov chain is used in various applications, including telecommunication systems, network traffic modeling, machine failure-repair models, and gene activation. Some properties of the on-off Markov chain are as follows:

  • If \(a + b = 1\), then both rows of the transition matrix are identical. Therefore, the Markov chain has no memory and is equivalent to a Bernoulli process with success probability \(a = 1-b\).

  • When \(a\) or \(b\) are small, the corresponding state is “sticky”, i.e., when the Markov chain enters a sticky state, it stays there for a long time.

Example 9.5 The Ehrenfest model of diffusion presented in Example 9.2 can be modelled as a Markov chain with state space \(\{0, 1, \dots, K\}\) and transition probability \[ P_{ij} = \begin{cases} i/K & j = i-1 \\ (K-i)/K & j = i + 1 \\ 0 & \text{otherwise}. \end{cases} \] The transition matrix can be visualized as follows.

Ehrenfest model of diffusion
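
The transition matrix of the Ehrenfest model is easy to construct programmatically, which is convenient for the numerical experiments later in this chapter. A minimal sketch in Python (assuming NumPy; the helper name `ehrenfest_matrix` is illustrative):

```python
import numpy as np

def ehrenfest_matrix(K):
    """Transition matrix of the Ehrenfest model on the state space {0, 1, ..., K}."""
    P = np.zeros((K + 1, K + 1))
    for i in range(K + 1):
        if i > 0:
            P[i, i - 1] = i / K          # a particle moves from compartment A to B
        if i < K:
            P[i, i + 1] = (K - i) / K    # a particle moves from compartment B to A
    return P

P = ehrenfest_matrix(4)
print(P)
print(P.sum(axis=1))   # every row sums to 1, as required of a transition matrix
```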

Example 9.6 (Random walk in one dimension) Imagine a particle which moves in a straight line in unit steps. Each step is one unit to the right with probability \(p\) or one unit to the left with probability \(q = 1-p\). The particle moves until it reaches one of two extreme points, called boundary points. The behavior of the particle at the boundary points gives rise to several different variations.

We will consider the case where the state space is \(\ALPHABET X = \{-2, -1, 0, 1, 2\}\) and the process starts in state \(0\).

  1. Absorbing random walk: Assume that when the particle reaches a boundary state, it stays there from that time on. We may visualize the Markov chain as follows.

    Absorbing random walk

    In this case, the transition matrix is given by \[ P = \MATRIX{ 1 & 0 & 0 & 0 & 0 \\ q & 0 & p & 0 & 0 \\ 0 & q & 0 & p & 0 \\ 0 & 0 & q & 0 & p \\ 0 & 0 & 0 & 0 & 1}. \]

  2. Reflected random walk: Assume that when the particle reaches a boundary state, it is reflected and returns to the point it came from. We may visualize the Markov chain as follows.

    Reflected random walk

    In this case, the transition matrix is given by \[ P = \MATRIX{ 0 & 1 & 0 & 0 & 0 \\ q & 0 & p & 0 & 0 \\ 0 & q & 0 & p & 0 \\ 0 & 0 & q & 0 & p \\ 0 & 0 & 0 & 1 & 0}. \]

  3. Random walk with restart: Assume that when the particle reaches a boundary state, it restarts in the initial state. We may visualize the Markov chain as follows.

    Random walk with restart

    In this case, the transition matrix is given by \[ P = \MATRIX{ 0 & 0 & 1 & 0 & 0 \\ q & 0 & p & 0 & 0 \\ 0 & q & 0 & p & 0 \\ 0 & 0 & q & 0 & p \\ 0 & 0 & 1 & 0 & 0}. \]

  4. Random walk with periodic boundary: Assume that when the particle reaches a boundary state, it moves to the other boundary. We may visualize the Markov chain as follows.

    Random walk with periodic boundary

    In this case, the transition matrix is given by \[ P = \MATRIX{ 0 & 0 & 0 & 0 & 1 \\ q & 0 & p & 0 & 0 \\ 0 & q & 0 & p & 0 \\ 0 & 0 & q & 0 & p \\ 1 & 0 & 0 & 0 & 0}. \]

Note that the gambler’s fortune in Example 9.3 is a random walk on non-negative integers \(\{0,1,2, \dots\}\) with absorption at \(0\). In the variation where the gambler stops when his fortune reaches $\(K\), it is a random walk over \(\{0,1,\dots,K\}\) with absorption at both ends: \(0\) and \(K\).

TODO: Add other examples

  • Success runs

9.1.1 Properties of interest

Depending on the application, we are typically interested in the following properties of a Markov chain:

  • If the chain starts in state \(i\), what is the probability that after \(n\) steps it is in state \(j\)?

  • If the chain starts in state \(i\), what is the expected number of visits to state \(j\) in \(n\) steps?

  • What is the expected number of steps that it takes for a chain starting in state \(i\) to visit state \(j\) for the first time?

  • What is the average number of times that the chain is in state \(i\)? How does this depend on the initial state?

In the rest of this section, we will present results in Markov chain theory that answer the above questions.

9.2 State occupancy probabilities

  1. Let \(μ^{(n)}\) denote the PMF of the state of the Markov chain at time \(n\). These are also called the state occupancy probabilities. We will think of \(μ^{(n)}\) as a row vector.

  2. By the law of total probability, we have \[ \PR(X_n = j) = \sum_{i \in \ALPHABET X} \PR(X_{n-1} = i) \PR(X_n = j | X_{n-1} = i) \] or, equivalently, \[ μ^{(n)}_j = \sum_{i \in \ALPHABET X} μ^{(n-1)}_i P_{ij} \] which can be written in matrix form as \[ μ^{(n)} = μ^{(n-1)} P \] and, by recursively expanding the right hand side, we have \[ μ^{(n)} = μ^{(0)} P^n. \]

  3. We will abbreviate \([P^n]_{ij}\) as \(P^{(n)}_{ij}\). Note that \[ P^{(n)}_{ij} = \PR(X_n = j \mid X_0 = i) \] denotes the \(n\)-step transition probability.

  4. Chapman-Kolmogorov equations: The multi-step transition probabilities satisfy the following: for any positive integers \(m\) and \(n\) and \(i,j \in \ALPHABET X\), we have \[ P^{(n+m)}_{ij} = \sum_{k \in \ALPHABET X} P^{(n)}_{ik} P^{(m)}_{kj}. \] Thus, the probability of going from \(i\) to \(j\) in \(n+m\) steps equals the probability of going from \(i\) to somewhere in \(n\) steps and then from there to \(j\) in \(m\) steps.

Example 9.7 Numerically compute the state occupancy probabilities for the different examples of random walk presented in Example 9.6 when \(p = q = \tfrac 12\) for \(n \in \{1, \dots, 8\}\). In all cases, the initial distribution is \[ μ^{(0)} = \MATRIX{0 & 0 & 1 & 0 & 0 }. \]

  1. Absorbing random walk: In this case, we have

    • \(μ^{(0)} = \left[ \begin{array}{ccccc} 0 & 0 & 1 & 0 & 0 \\ \end{array} \right]\)
    • \(μ^{(1)} = \left[ \begin{array}{ccccc} 0 & \frac{1}{2} & 0 & \frac{1}{2} & 0 \\ \end{array} \right]\)
    • \(μ^{(2)} = \left[ \begin{array}{ccccc} \frac{1}{4} & 0 & \frac{1}{2} & 0 & \frac{1}{4} \\ \end{array} \right]\)
    • \(μ^{(3)} = \left[ \begin{array}{ccccc} \frac{1}{4} & \frac{1}{4} & 0 & \frac{1}{4} & \frac{1}{4} \\ \end{array} \right]\)
    • \(μ^{(4)} = \left[ \begin{array}{ccccc} \frac{3}{8} & 0 & \frac{1}{4} & 0 & \frac{3}{8} \\ \end{array} \right]\)
    • \(μ^{(5)} = \left[ \begin{array}{ccccc} \frac{3}{8} & \frac{1}{8} & 0 & \frac{1}{8} & \frac{3}{8} \\ \end{array} \right]\)
    • \(μ^{(6)} = \left[ \begin{array}{ccccc} \frac{7}{16} & 0 & \frac{1}{8} & 0 & \frac{7}{16} \\ \end{array} \right]\)
    • \(μ^{(7)} = \left[ \begin{array}{ccccc} \frac{7}{16} & \frac{1}{16} & 0 & \frac{1}{16} & \frac{7}{16} \\ \end{array} \right]\)
    • \(μ^{(8)} = \left[ \begin{array}{ccccc} \frac{15}{32} & 0 & \frac{1}{16} & 0 & \frac{15}{32} \\ \end{array} \right]\)
  2. Reflected random walk: In this case, we have

    • \(μ^{(0)} = \left[ \begin{array}{ccccc} 0 & 0 & 1 & 0 & 0 \\ \end{array} \right]\)
    • \(μ^{(1)} = \left[ \begin{array}{ccccc} 0 & \frac{1}{2} & 0 & \frac{1}{2} & 0 \\ \end{array} \right]\)
    • \(μ^{(2)} = \left[ \begin{array}{ccccc} \frac{1}{4} & 0 & \frac{1}{2} & 0 & \frac{1}{4} \\ \end{array} \right]\)
    • \(μ^{(3)} = \left[ \begin{array}{ccccc} 0 & \frac{1}{2} & 0 & \frac{1}{2} & 0 \\ \end{array} \right]\)
    • \(μ^{(4)} = \left[ \begin{array}{ccccc} \frac{1}{4} & 0 & \frac{1}{2} & 0 & \frac{1}{4} \\ \end{array} \right]\)
    • \(μ^{(5)} = \left[ \begin{array}{ccccc} 0 & \frac{1}{2} & 0 & \frac{1}{2} & 0 \\ \end{array} \right]\)
    • \(μ^{(6)} = \left[ \begin{array}{ccccc} \frac{1}{4} & 0 & \frac{1}{2} & 0 & \frac{1}{4} \\ \end{array} \right]\)
    • \(μ^{(7)} = \left[ \begin{array}{ccccc} 0 & \frac{1}{2} & 0 & \frac{1}{2} & 0 \\ \end{array} \right]\)
    • \(μ^{(8)} = \left[ \begin{array}{ccccc} \frac{1}{4} & 0 & \frac{1}{2} & 0 & \frac{1}{4} \\ \end{array} \right]\)
  3. Random walk with restart: In this case, we have

    • \(μ^{(0)} = \left[ \begin{array}{ccccc} 0 & 0 & 1 & 0 & 0 \\ \end{array} \right]\)
    • \(μ^{(1)} = \left[ \begin{array}{ccccc} 0 & \frac{1}{2} & 0 & \frac{1}{2} & 0 \\ \end{array} \right]\)
    • \(μ^{(2)} = \left[ \begin{array}{ccccc} \frac{1}{4} & 0 & \frac{1}{2} & 0 & \frac{1}{4} \\ \end{array} \right]\)
    • \(μ^{(3)} = \left[ \begin{array}{ccccc} 0 & \frac{1}{4} & \frac{1}{2} & \frac{1}{4} & 0 \\ \end{array} \right]\)
    • \(μ^{(4)} = \left[ \begin{array}{ccccc} \frac{1}{8} & \frac{1}{4} & \frac{1}{4} & \frac{1}{4} & \frac{1}{8} \\ \end{array} \right]\)
    • \(μ^{(5)} = \left[ \begin{array}{ccccc} \frac{1}{8} & \frac{1}{8} & \frac{1}{2} & \frac{1}{8} & \frac{1}{8} \\ \end{array} \right]\)
    • \(μ^{(6)} = \left[ \begin{array}{ccccc} \frac{1}{16} & \frac{1}{4} & \frac{3}{8} & \frac{1}{4} & \frac{1}{16} \\ \end{array} \right]\)
    • \(μ^{(7)} = \left[ \begin{array}{ccccc} \frac{1}{8} & \frac{3}{16} & \frac{3}{8} & \frac{3}{16} & \frac{1}{8} \\ \end{array} \right]\)
    • \(μ^{(8)} = \left[ \begin{array}{ccccc} \frac{3}{32} & \frac{3}{16} & \frac{7}{16} & \frac{3}{16} & \frac{3}{32} \\ \end{array} \right]\)
  4. Random walk with periodic boundary: In this case, we have

    • \(μ^{(0)} = \left[ \begin{array}{ccccc} 0 & 0 & 1 & 0 & 0 \\ \end{array} \right]\)
    • \(μ^{(1)} = \left[ \begin{array}{ccccc} 0 & \frac{1}{2} & 0 & \frac{1}{2} & 0 \\ \end{array} \right]\)
    • \(μ^{(2)} = \left[ \begin{array}{ccccc} \frac{1}{4} & 0 & \frac{1}{2} & 0 & \frac{1}{4} \\ \end{array} \right]\)
    • \(μ^{(3)} = \left[ \begin{array}{ccccc} \frac{1}{4} & \frac{1}{4} & 0 & \frac{1}{4} & \frac{1}{4} \\ \end{array} \right]\)
    • \(μ^{(4)} = \left[ \begin{array}{ccccc} \frac{3}{8} & 0 & \frac{1}{4} & 0 & \frac{3}{8} \\ \end{array} \right]\)
    • \(μ^{(5)} = \left[ \begin{array}{ccccc} \frac{3}{8} & \frac{1}{8} & 0 & \frac{1}{8} & \frac{3}{8} \\ \end{array} \right]\)
    • \(μ^{(6)} = \left[ \begin{array}{ccccc} \frac{7}{16} & 0 & \frac{1}{8} & 0 & \frac{7}{16} \\ \end{array} \right]\)
    • \(μ^{(7)} = \left[ \begin{array}{ccccc} \frac{7}{16} & \frac{1}{16} & 0 & \frac{1}{16} & \frac{7}{16} \\ \end{array} \right]\)
    • \(μ^{(8)} = \left[ \begin{array}{ccccc} \frac{15}{32} & 0 & \frac{1}{16} & 0 & \frac{15}{32} \\ \end{array} \right]\)
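
The lists above can be reproduced numerically by iterating \(μ^{(n)} = μ^{(n-1)} P\). A minimal sketch in Python (assuming NumPy; shown for the absorbing random walk, with the matrix taken from Example 9.6 and \(p = q = \tfrac12\)):

```python
import numpy as np

p = q = 0.5
P_absorbing = np.array([
    [1, 0, 0, 0, 0],
    [q, 0, p, 0, 0],
    [0, q, 0, p, 0],
    [0, 0, q, 0, p],
    [0, 0, 0, 0, 1],
])

mu = np.array([0, 0, 1, 0, 0], dtype=float)   # mu^(0): start in the middle state
for n in range(1, 9):
    mu = mu @ P_absorbing                     # mu^(n) = mu^(n-1) P
    print(n, mu)
```

Swapping in the reflected, restart, or periodic-boundary matrices from Example 9.6 reproduces the other three lists.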

Example 9.8 Analytically compute the state occupancy probabilities for the on-off Markov chain of Example 9.4 when it starts from the initial probability distribution \(μ^{(0)}\).

Since \(μ^{(n+1)} = μ^{(n)} P\), we have \[\begin{align*} μ^{(n+1)}_0 &= μ^{(n)}_0 (1-a) + μ^{(n)}_1 b \\ &= μ^{(n)}_0 (1-a) + (1 - μ^{(n)}_0) b \\ &= μ^{(n)}_0 (1-a-b) + b. \end{align*}\] If \(a = b = 0\), then \(μ^{(n+1)}_0 = μ^{(n)}_0 = \cdots = μ^{(0)}_0\). If not, we exploit the fact that \[ b = \frac{b}{a+b} - \frac{b}{a+b}(1-a-b) \] to recursively write \[\begin{align*} μ^{(1)}_0 &= μ^{(0)}_0 (1-a-b) + b \\ &= \left(μ^{(0)}_0 - \frac{b}{a+b}\right)(1-a-b) + \frac{b}{a+b} \end{align*}\] and \[\begin{align*} μ^{(2)}_0 &= μ^{(1)}_0 (1-a-b) + b \\ &= \left(μ^{(0)}_0 - \frac{b}{a+b}\right)(1-a-b)^2 + \frac{b}{a+b}(1-a-b) + b \\ &= \left(μ^{(0)}_0 - \frac{b}{a+b}\right)(1-a-b)^2 + \frac{b}{a+b} \end{align*}\] and, so on, to get \[\begin{align*} μ^{(n)}_0 &= μ^{(n-1)}_0 (1-a-b) + b \\ &= \left(μ^{(0)}_0 - \frac{b}{a+b}\right)(1-a-b)^n + \frac{b}{a+b}(1-a-b) + b \\ &= \left(μ^{(0)}_0 - \frac{b}{a+b}\right)(1-a-b)^n + \frac{b}{a+b}. \end{align*}\] Therefore, \[ μ^{(n)}_1 = 1 - μ^{(n)}_0 = \left(μ^{(0)}_1 - \frac{a}{a+b}\right)(1-a-b)^n + \frac{a}{a+b}. \]

The above analysis appears to be a bit of black magic. To understand what is going on, note that we are interested in computing \(μ^{(0)} P^n\). What is an efficient way to compute \(P^n\)? Eigendecomposition! Since \(P\) is a row stochastic matrix, we have \(P \mathbf{1} = \mathbf{1}\). Thus, \(λ_1 = 1\) is always an eigenvalue of any transition matrix, with eigenvector \(\mathbf{1}\).

For Example 9.4, we can explicitly compute all eigenvalues by finding the roots of the characteristic equation \[\begin{align*} \det(λI - P) &= \DET{λ - 1 + a & - a \\ -b & λ - 1 + b} \\ &= (λ-1 +a)(λ-1+b) - ab \\ &= (λ-1)^2 + (a+b)(λ-1) = (λ-1)(λ-1 + a + b). \end{align*}\] Thus, the eigenvalues are \(λ_1 = 1\) and \(λ_2 = 1 - a - b\).

For the special case of \(2 × 2\) transition matrices, we can find the second eigenvalue by observing that \(λ_1 = 1\) is always an eigenvalue and \(\TR(P) = λ_1 + λ_2 = 1 + λ_2\) or \(\det(P) = λ_1 λ_2 = λ_2\).

To find the eigenvector, we find a vector \(v\) such that \((λI - P)v = 0\).

  • For \(λ_1 = 1\), we already know that \(v_1 = [1; 1]\) is an eigenvector.
  • For \(λ_2 = (1-a-b)\), we have \[ λ_2I - P = \MATRIX{-b & -a \\ -b & -a}. \] Therefore, one possible eigenvector is \([a; -b]\).

Then, from the eigendecomposition, we know that \[ P = V Λ V^{-1} \] where \(V = [v_1 \; v_2] = \MATRIX{1 & a \\ 1 & -b}\) and \(Λ = \diag(1, 1-a-b)\). Therefore, \[\begin{align*} μ^{(n)} &= μ^{(0)} P^n = μ^{(0)} V Λ^n V^{-1} \\ &= \MATRIX{ μ^{(0)}_0 & 1 - μ^{(0)}_0 } \MATRIX{1 & a \\ 1 & -b } \MATRIX{1 & 0 \\ 0 & (1 - a -b)^n} \frac{1}{a+b} \MATRIX{b & a \\ 1 & -1 } \\[5pt] &= \MATRIX{\dfrac{1}{a+b} & μ^{(0)}_0 - \dfrac{b}{a+b} } \MATRIX{1 & 0 \\ 0 & (1 - a -b)^n} \MATRIX{b & a \\ 1 & -1 } \\[5pt] &= \MATRIX{\dfrac{1}{a+b} & \left(μ^{(0)}_0 - \dfrac{b}{a+b}\right)(1-a-b)^n } \MATRIX{b & a \\ 1 & -1 } \\[5pt] &= \MATRIX{ \dfrac{b}{a+b} + \left(μ^{(0)}_0 - \dfrac{b}{a+b}\right)(1-a-b)^n & \dfrac{a}{a+b} - \left(μ^{(0)}_0 - \dfrac{b}{a+b}\right)(1-a-b)^n } \end{align*}\] which matches the expressions for \(μ^{(n)}_0\) and \(μ^{(n)}_1\) derived earlier.
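
The closed-form expression can be sanity-checked against a direct computation of \(μ^{(0)} P^n\). A minimal sketch in Python (assuming NumPy; the values of \(a\), \(b\), \(μ^{(0)}_0\), and \(n\) are arbitrary illustrative choices):

```python
import numpy as np

a, b, mu0, n = 0.3, 0.1, 0.8, 7                 # illustrative values
P = np.array([[1 - a, a], [b, 1 - b]])

direct = np.array([mu0, 1 - mu0]) @ np.linalg.matrix_power(P, n)   # mu^(0) P^n

first = (mu0 - b / (a + b)) * (1 - a - b) ** n + b / (a + b)       # closed form
closed_form = np.array([first, 1 - first])

print(direct, closed_form)    # the two rows agree
```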

TODO: Add more examples!

9.3 Class structure

  1. We say that a state \(j\) is accessible from state \(i\) (abbreviated as \(i \rightsquigarrow j\)) if there exists an \(m \in \integers_{\ge 0}\) (which may depend on \(i\) and \(j\)) such that \([P^m]_{ij} > 0\). The fact that \(P^{(m)}_{ij} > 0\) implies that there exists an ordered sequence of states \((i_0, \dots, i_m)\) with \(i_0 = i\) and \(i_m = j\) such that \(P_{i_k i_{k+1}} > 0\) for all \(k\); thus, there is a path of positive probability from state \(i\) to state \(j\).

    Accessibility is a transitive relationship, i.e., if \(i \rightsquigarrow j\) and \(j \rightsquigarrow k\), then \(i \rightsquigarrow k\).

    Example 9.9 Consider the Markov chain shown below.

    1. Identify all states that are accessible from state \(1\).
    2. Identify all states from which state \(1\) is accessible.

    Solution:

    1. States accessible from state \(1\) are \(\{1,2,3\}\).
    2. States from which state \(1\) is accessible are \(\{1,2,3,4,5,6\}\).
  2. Two states \(i\) and \(j\) are said to communicate (abbreviated to \(i \leftrightsquigarrow j\)) if \(i\) is accessible from \(j\) (i.e., \(j \rightsquigarrow i\)) and \(j\) is accessible from \(i\) (\(i \rightsquigarrow j\)). Alternatively, we say that \(i\) and \(j\) communicate if there exist \(m, m' \in \integers_{\ge 0}\) such that \(P^{(m)}_{ij} > 0\) and \(P^{(m')}_{ji} > 0\).

    For instance, in Example 9.9, state \(1\) communicates with state \(2\) but does not communicate with state \(5\).

    Communication is an equivalence relationship, i.e., it is reflexive (\(i \leftrightsquigarrow i\)), symmetric (\(i \leftrightsquigarrow j\) if and only if \(j \leftrightsquigarrow i\)), and transitive (\(i \leftrightsquigarrow j\) and \(j \leftrightsquigarrow k\) implies \(i \leftrightsquigarrow k\)).

  3. The states in a finite-state Markov chain can be partitioned into two sets: recurrent states and transient states. A state is recurrent if it is accessible from all states that are accessible from it (i.e., \(i\) is recurrent if \(i \rightsquigarrow j\) implies that \(j \rightsquigarrow i\)). States that are not recurrent are transient.

  4. It can be shown that a state \(i\) is recurrent if and only if \[\sum_{m=1}^{\infty} P^{(m)}_{ii} = \infty.\] This is a consequence of the second Borel-Cantelli lemma (Lemma 7.2) and strong Markov property (to be discussed later).

  5. States \(i\) and \(j\) are said to belong to the same communicating class if \(i\) and \(j\) communicate. Communicating classes form a partition of the state space. Within a communicating class, all states are of the same type, i.e., either all states are recurrent (in which case the class is called a recurrent class) or all states are transient (in which case the class is called a transient class).

    For example, in Example 9.9, there are two communicating classes: \(\{1,2,3\}\) and \(\{4,5,6\}\). The class \(\{4,5,6\}\) is transient while the class \(\{1,2,3\}\) is recurrent.

  6. A communicating class \(C\) is said to be closed if \[ i \in C \text{ and } i \rightsquigarrow j \implies j \in C. \] Thus, there is no escape from a closed class. For finite state spaces, a recurrent class is always closed and a transient class is never closed. But this is not the case for countable state Markov chains.

  7. A state \(i\) is called absorbing if \(\{i\}\) is a closed class, i.e., if \(P_{ii} = 1\).

  8. A Markov chain with a single communicating class (thus, all states communicate with each other and are, therefore, recurrent) is called irreducible.

  9. The period of a state \(i\), denoted by \(d(i)\), is defined as \[d(i) = \gcd\{ t \in \integers_{\ge 1} : [P^t]_{ii} > 0 \}.\] If the period is \(1\), the state is aperiodic, and if the period is \(2\) or more, the state is periodic. It can be shown that all states in the same class have the same period.

  10. A Markov chain is aperiodic if all states are aperiodic. A simple sufficient (but not necessary) condition for an irreducible Markov chain to be aperiodic is that there exists a state \(i\) such that \(P_{ii} > 0\). In general, for a finite and aperiodic Markov chain, there exists a positive integer \(M\) such that \[ P^{(m)}_{ii} > 0, \quad \forall m \ge M, i \in \ALPHABET X.\]
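
The period of each state can be computed directly from its definition by taking the gcd of the times \(t\) at which \([P^t]_{ii} > 0\). A minimal sketch in Python (assuming NumPy; truncating the set of return times at a finite horizon `T` is a heuristic, but taking `T` to be a few multiples of the number of states is typically sufficient for small chains):

```python
import numpy as np
from math import gcd
from functools import reduce

def periods(P, T=None):
    """Estimate d(i) = gcd{ t >= 1 : [P^t]_{ii} > 0 } using powers of P up to T."""
    n = P.shape[0]
    T = 4 * n if T is None else T
    return_times = {i: [] for i in range(n)}
    Pt = np.eye(n)
    for t in range(1, T + 1):
        Pt = Pt @ P
        for i in range(n):
            if Pt[i, i] > 1e-12:
                return_times[i].append(t)
    return {i: reduce(gcd, ts) if ts else None for i, ts in return_times.items()}

# Reflected random walk from Example 9.6 with p = q = 1/2: every state has period 2.
q = p = 0.5
P_reflected = np.array([
    [0, 1, 0, 0, 0],
    [q, 0, p, 0, 0],
    [0, q, 0, p, 0],
    [0, 0, q, 0, p],
    [0, 0, 0, 1, 0],
])
print(periods(P_reflected))   # {0: 2, 1: 2, 2: 2, 3: 2, 4: 2}
```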

9.4 Hitting times and absorption probabilities

  1. We use the following notation:

    • For any event \(E\), \(\PR_i(E)\) denotes \(\PR(E \mid X_0 = i)\)
    • For any random variable \(Y\), \(\EXP_i[Y]\) denotes \(\EXP[Y \mid X_0 = i]\).
  2. Let \(A\) be a subset of \(\ALPHABET X\). The hitting time \(H^A\) of \(A\) is a random variable \(H^A \colon Ω \to \{0, 1, \dots \} \cup \{∞\}\) given by \[ H^A(ω) = \min\{n \ge 0 : X_n(ω) \in A\}. \] The standard convention is that \(H^A\) is taken to be \(∞\) if \(X_n(ω) \not\in A\) for all \(n \ge 0\). For a state \(j \in \ALPHABET X\), we use the short-hand \(H^j\) to denote \(H^{\{j\}}\).

  3. The probability that, starting from state \(i\), the Markov chain ever hits \(A\) is given by \[ h^A_i = \PR_i(H^A < ∞). \] This is called the hitting probability. When \(A\) is a closed class, \(h^A_i\) is called the absorption probability.

  4. The mean time taken for the Markov chain to reach \(A\) is given by \[ m^A_i = \EXP_i[H^A] = \sum_{n=0}^{∞} n \PR_i(H^A = n). \] This is called the mean hitting time. Note that \(m^A_i = ∞\) whenever \(\PR_i(H^A = ∞) > 0\).

  5. A remarkable property of Markov chains is that these quantities can be computed by solving a system of linear equations associated with the transition probability matrix \(P\). We start with some examples to illustrate the main idea.

    Example 9.10 Suppose we toss a coin multiple times. What is the probability of getting a head before getting a tail?

    We model this using a Markov chain where the state represents the outcome we’re waiting for. Let \(p\) denote the probability of heads and \(q = 1-p\) denote the probability of tails. The Markov chain has three states:
    • State \(-1\): tail has occurred (absorbing state, we lose)
    • State \(0\): initial state (no outcome yet)
    • State \(1\): head has occurred (absorbing state, we win)

    Markov chain for coin tossing until head or tail

    Let \(h^1_i\) denote the probability of hitting state \(1\) (i.e., getting a head) when starting at state \(i\). Then, we have \[\begin{align*} h^1_{-1} &= 0, \\ h^1_0 &= p h^1_1 + q h^1_{-1} = p \cdot 1 + q \cdot 0 = p, \\ h^1_1 &= 1. \end{align*}\] Therefore, starting from state \(0\), the probability of getting a head before a tail is \(h^1_0 = p\).

  6. Example 9.11 Two players are playing a game. They repeatedly toss a coin. Player 1 wins if two consecutive heads occur before two consecutive tails. Player 2 wins if two consecutive tails occur before two consecutive heads. What is the probability that Player 1 wins?

    We model this using a Markov chain where the state tracks the pattern of recent coin tosses. Let \(p\) denote the probability of heads and \(q = 1-p\) denote the probability of tails. The Markov chain has five states:
    • State \(-2\): two consecutive tails (Player 2 wins, absorbing state)
    • State \(-1\): one tail (last toss was a tail)
    • State \(0\): initial state (no tosses yet)
    • State \(1\): one head (last toss was a head)
    • State \(2\): two consecutive heads (Player 1 wins, absorbing state)

    Markov chain for two heads before two tails

    Let \(h^2_i\) denote the probability of hitting state \(2\) (i.e., Player 1 wins) when starting at state \(i\). Then, we have \[\begin{align*} h^2_{-2} &= 0, \\ h^2_{-1} &= p h^2_1 + q h^2_{-2} = p h^2_1, \\ h^2_0 &= p h^2_1 + q h^2_{-1}, \\ h^2_1 &= p h^2_2 + q h^2_{-1} = p + q h^2_{-1}, \\ h^2_2 &= 1. \end{align*}\] Substituting the second equation into the fourth gives \(h^2_1 = p + qp\, h^2_1\), so \(h^2_1 = \dfrac{p}{1-pq}\) and therefore \(h^2_{-1} = p h^2_1 = \dfrac{p^2}{1-pq}\). Substituting into the third equation: \[ h^2_0 = p h^2_1 + q h^2_{-1} = \frac{p^2}{1-pq} + \frac{q p^2}{1-pq} = \frac{p^2(1+q)}{1-pq}. \] For a fair coin (\(p = q = \frac{1}{2}\)), we get \(h^2_0 = \dfrac{\tfrac14 \cdot \tfrac32}{1 - \tfrac14} = \dfrac{3/8}{3/4} = \dfrac{1}{2}\), as expected by symmetry.

  7. We now provide a general formula for computing hitting probabilities.

    Theorem 9.1 (Hitting probabilities) The hitting probabilities \(\{h^A_i\}_{i \in \ALPHABET X}\) satisfy the following system of linear equations: \[ h^A_i = \begin{cases} 1, & i \in A \\ \sum_{j \in \ALPHABET X} P_{ij} h^A_j, & i \not\in A \end{cases} \] When \(\ALPHABET X\) is finite and \(A\) is accessible from every state, the above system has a unique solution; in general, the system may have multiple solutions and the hitting probabilities correspond to the minimal non-negative solution.

    When \(X_0 = i \in A\), the hitting time \(H^A = 0\), so \(h^A_i = 1\). This proves the first part of the formula.

    For the second part, consider \(X_0 = i \not\in A\). Then \(H^A \ge 1\). By the Markov property, we have \[ \PR_i(H^A < ∞ \mid X_1 = j) = \PR_j(H^A < ∞) = h^A_j. \] Moreover, by the law of total probability, we have \[\begin{align*} h^A_i &= \PR_i(H^A < ∞) = \sum_{j \in \ALPHABET X} \PR_i(H^A < ∞, X_1 = j) \\ &= \sum_{j \in \ALPHABET X} \PR_i(H^A < ∞ \mid X_1 = j) \PR_i(X_1 = j) \\ &= \sum_{j \in \ALPHABET X} P_{ij} h^A_j. \end{align*}\]

  8. Example 9.12 Consider a gambler’s ruin problem, where the gambler starts with $\(1\) and stops either when he is ruined or when his fortune reaches $\(K\).

    Find the probability of ruin (i.e., the fortune gets absorbed in state \(0\) rather than state \(K\)).

    For ease of notation, we use \(h_i\) as a short-form for \(h^{\{0\}}_i\). The hitting probabilities satisfy the linear system of equations:

    \[\begin{align*} h_0 &= 1 \\ h_i &= ph_{i+1} + q h_{i-1}, \quad i \in \{1,\dots,K-1\} \\ h_K &= 0, \end{align*}\] where \(q = 1-p\).

    The characteristic equation associated with the linear recurrence is \[ λ = p λ^2 + q, \] which has two distinct roots, \(λ_1 = 1\) and \(λ_2 = q/p\), if \(p \neq q\), and a double root at \(λ_1 = 1\) if \(p = q = \frac 12\). Therefore, for \(p \neq q\), the general solution is of the form \[ h_i = a λ_1^i + b λ_2^i = a + b\Bigl(\tfrac {q}{p} \Bigr)^i, \] while for \(p = q = \frac 12\) it is of the form \(h_i = a + b i\); in both cases we determine the coefficients \(a\) and \(b\) from the boundary conditions \(h_0 = 1\) and \(h_K = 0\). Solving for \(a\) and \(b\), we get that for \(p \neq q\), \[ h_i = \frac{\Bigl(\frac qp\Bigr)^i - \Bigl(\frac qp\Bigr)^K}{1 - \Bigl(\frac qp\Bigr)^K} \] and for \(p = q = \frac 12\), \[ h_i = 1 - \frac{i}{K}. \] As a sanity check, both expressions satisfy \(h_0 = 1\) and \(h_K = 0\).

  9. Similarly as above, we can derive a formula for computing mean hitting times. We start with a couple of examples to illustrate the main idea.

    Example 9.13 Suppose we toss a coin multiple times and stop at the first head. What is the expected number of coin tosses until stopping?

    From elementary probability we know that the number of tosses until stopping is a geometric random variable. However, we will model this using a Markov chain where the state denotes the number of consecutive heads so far. Let \(p\) denote the probability of heads and \(q = 1-p\) denote the probability of tails. Then, the Markov chain model is as follows.

    Markov chain for coin tossing until one head

    Let \(m^1_i\) denote the expected number of tosses until stopping (i.e., hitting state \(1\)) when starting at state \(i\). Then, we have \[\begin{align*} m^1_0 &= 1 + q m^1_0 + p m^1_1, \\ m^1_1 &= 0. \end{align*}\] Solving this system of equations, we get \(m^1_0 = 1/(1-q) = 1/p\).

  10. Example 9.14 Suppose we toss a coin multiple times and stop at two consecutive heads. What is the expected number of coin tosses until stopping?

    We can model this in the same manner as before, where the state denotes the number of consecutive heads so far. The Markov chain is as follows:

    Markov chain for coin tossing until two heads

    Let \(m^2_i\) denote the expected number of tosses until stopping (i.e., hitting state \(2\)) when starting at state \(i\). Then, we have \[\begin{align*} m^2_0 &= 1 + q m^2_0 + p m^2_1, \\ m^2_1 &= 1 + q m^2_0 + p m^2_2, \\ m^2_2 &= 0. \end{align*}\] Substituting the second equation into the first gives \(m^2_0 = 1 + p + (q + pq) m^2_0\), so \(m^2_0 (1 - q - pq) = 1 + p\). Since \(1 - q - pq = p(1-q) = p^2\), we get \(m^2_0 = \dfrac{1+p}{p^2}\). For a fair coin (\(p = \tfrac12\)), \(m^2_0 = 6\).

  11. We now provide a general formula for computing mean hitting times.

    Theorem 9.2 (Mean hitting times) The mean hitting times \(\{m^A_i\}_{i \in \ALPHABET X}\) satisfy the following system of linear equations: \[ m^A_i = \begin{cases} 0, & i \in A \\ 1 + \sum_{j \not\in A} P_{ij} m^A_j, & i \not\in A \end{cases} \] When \(\ALPHABET X\) is finite and \(A\) is accessible from every state, the above system has a unique solution; in general, the system may have multiple solutions and the mean hitting times correspond to the minimal non-negative solution.

    The proof is similar to the proof of Theorem 9.1. When \(X_0 = i \in A\), the hitting time \(H^A = 0\), so \(m^A_i = 0\). This proves the first part of the formula.

    For the second part, consider \(X_0 = i \not\in A\). Then \(H^A \ge 1\). By the Markov property, we have \[ \EXP_i[ H^A \mid X_1 = j] = 1 + \EXP_j[ H^A ] = 1 + m^A_j. \] Moreover, by the law of total expectation, we have \[\begin{align*} m^A_i &= \EXP_i[ H^A ] = \sum_{j \in \ALPHABET X} \EXP_i[ H^A \mid X_1 = j ] \PR_i(X_1 = j) \\ &= \sum_{j \in \ALPHABET X} P_{ij} \bigl[ 1 + m^A_j \bigr] \\ &= 1 + \sum_{j \not\in A} P_{ij} m^A_j. \end{align*}\]

  12. For absorbing Markov chains, we can use matrix methods to efficiently compute hitting probabilities and mean hitting times. The key idea is to rearrange the transition matrix into a canonical form where absorbing states are grouped together.

    Suppose the state space \(\ALPHABET X\) can be partitioned into transient states \(\ALPHABET T\) and absorbing states \(\ALPHABET C\). By reordering the states so that all transient states come first, followed by all absorbing states, the transition matrix \(P\) can be written in the canonical form: \[ P = \MATRIX{ Q & R \\ 0 & I}. \] where:

    • \(Q\) is a \(|\ALPHABET T| \times |\ALPHABET T|\) matrix containing transition probabilities between transient states,
    • \(R\) is a \(|\ALPHABET T| \times |\ALPHABET C|\) matrix containing transition probabilities from transient states to absorbing states,
    • \(0\) is a \(|\ALPHABET C| \times |\ALPHABET T|\) zero matrix (since absorbing states cannot transition to transient states),
    • \(I\) is a \(|\ALPHABET C| \times |\ALPHABET C|\) identity matrix (since absorbing states stay in themselves).
  13. Multi-step transitions: For the canonical form, we can show by induction that \[ P^n = \MATRIX{ Q^n & (I + Q + Q^2 + \cdots + Q^{n-1}) R \\ 0 & I}. \] This follows from the fact that matrix multiplication preserves the block structure, and the lower-left block remains zero while the lower-right block remains \(I\) for all powers.

  14. Fundamental Matrix: The fundamental matrix \(N\) is defined as \[ N = (I - Q)^{-1} = I + Q + Q^2 + Q^3 + \cdots. \] The series expansion is valid because, when every non-absorbing state is transient, the spectral radius of \(Q\) is strictly less than 1, ensuring convergence. Thus, the fundamental matrix exists and is well-defined for any absorbing Markov chain (i.e., any finite chain in which every non-absorbing state can reach an absorbing state).

  15. Interpretation of \(N_{ij}\): The entry \(N_{ij}\) represents the expected number of visits to transient state \(j\) before absorption, starting from transient state \(i\). This can be seen from the series expansion: \[ N_{ij} = \sum_{n=0}^∞ [Q^n]_{ij} = \sum_{n=0}^∞ \PR_i(X_n = j, \text{ chain hasn't been absorbed by time } n). \]

  16. Mean hitting times: Let \(\ONES\) denote a column vector of ones with dimension \(|\ALPHABET T|\). Then the mean hitting times (expected time until absorption) starting from each transient state are given by \[ m = N \ONES, \] where \(m_i\) is the mean time until absorption starting from transient state \(i\).

    The mean hitting time \(m_i\) equals the expected number of steps spent in transient states before absorption. Since \(N_{ij}\) represents the expected number of visits to transient state \(j\) starting from transient state \(i\), we have \[ m_i = \sum_{j \in \ALPHABET T} N_{ij} = [N \ONES]_i, \] which gives the desired result in matrix form.

  17. Hitting probabilities: The hitting probabilities (absorption probabilities) are given by \[ h = N R, \] where \(h_{ij}\) is the probability of being absorbed in absorbing state \(j\) when starting from transient state \(i\).

    Starting from transient state \(i\), the chain is absorbed in absorbing state \(j\) if and only if it takes some number \(n \ge 0\) of steps within transient states, and then transitions to \(j\) in step \(n+1\). The probability of being in transient state \(k\) after \(n\) steps (without being absorbed) is \([Q^n]_{ik}\), and the probability of transitioning from \(k\) to \(j\) is \(R_{kj}\). Therefore, \[ h_{ij} = \sum_{n=0}^∞ \sum_{k \in \ALPHABET T} [Q^n]_{ik} R_{kj} = \sum_{n=0}^∞ [Q^n R]_{ij} = [N R]_{ij}, \] where the last equality follows from \(N = I + Q + Q^2 + \cdots\).

  18. Example 9.15 Consider the gambler’s ruin problem from Example 9.12 with \(K = 3\) (i.e., the gambler starts with $\(1\) and stops at either $\(0\) or $\(3\)). Write the transition matrix in canonical form and use the fundamental matrix to compute the probability of ruin and the expected duration of play.

    The state space is \(\{0, 1, 2, 3\}\) where states \(0\) and \(3\) are absorbing. In the natural ordering \((0, 1, 2, 3)\), the transition matrix is: \[ P = \MATRIX{ 1 & 0 & 0 & 0 \\ q & 0 & p & 0 \\ 0 & q & 0 & p \\ 0 & 0 & 0 & 1 } \] where \(q = 1-p\). Reordering to put transient states first, we use the ordering \((1, 2, 0, 3)\): \[ P = \MATRIX{ 0 & p & q & 0 \\ q & 0 & 0 & p \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 } = \MATRIX{ Q & R \\ 0 & I } \] where \[ Q = \MATRIX{ 0 & p \\ q & 0 }, \quad R = \MATRIX{ q & 0 \\ 0 & p }. \]

    The fundamental matrix is: \[ N = (I - Q)^{-1} = \MATRIX{ 1 & -p \\ -q & 1 }^{-1} = \frac{1}{1-pq} \MATRIX{ 1 & p \\ q & 1 }. \]

    The hitting probabilities are: \[ h = N R = \frac{1}{1-pq} \MATRIX{ 1 & p \\ q & 1 } \MATRIX{ q & 0 \\ 0 & p } = \frac{1}{1-pq} \MATRIX{ q & p^2 \\ q^2 & p }. \]

    Thus, starting from state \(1\), the probability of ruin (absorption in state \(0\)) is \(h_{1,0} = q/(1-pq)\) and the probability of winning (absorption in state \(3\)) is \(h_{1,3} = p^2/(1-pq)\).

    The mean hitting times are: \[ m = N \ONES = \frac{1}{1-pq} \MATRIX{ 1 & p \\ q & 1 } \MATRIX{ 1 \\ 1 } = \frac{1}{1-pq} \MATRIX{ 1+p \\ q+1 }. \]

    Thus, starting from state \(1\), the expected duration is \(m_1 = (1+p)/(1-pq)\).
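
The quantities in Example 9.15 can be cross-checked numerically; the same few lines also illustrate how the canonical-form recipe of the preceding items is used in practice. A minimal sketch in Python (assuming NumPy; the value \(p = 0.6\) is an arbitrary illustration):

```python
import numpy as np

p = 0.6
q = 1 - p

# Canonical form for gambler's ruin with K = 3: transient states (1, 2),
# absorbing states (0, 3), as in Example 9.15.
Q = np.array([[0, p], [q, 0]])
R = np.array([[q, 0], [0, p]])

N = np.linalg.inv(np.eye(2) - Q)     # fundamental matrix N = (I - Q)^{-1}
h = N @ R                            # absorption probabilities h[i, j]
m = N @ np.ones(2)                   # mean times to absorption

print("P(ruin | start at $1)     :", h[0, 0], "vs", q / (1 - p * q))
print("P(reach $3 | start at $1) :", h[0, 1], "vs", p**2 / (1 - p * q))
print("E[duration | start at $1] :", m[0], "vs", (1 + p) / (1 - p * q))
```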

9.5 Invariant Distribution

  1. \(π = π P\).

  2. Balance equations to find invariant distributions

9.6 First passage time and strong Markov property

  1. Define the first passage time to state \(j\) as \(T_j \coloneqq \min\{n \ge 1 : X_n = j\}\) (with the convention that \(T_j = ∞\) if the chain never visits \(j\) at a positive time), and let \[ f^{(n)}_{ij} \coloneqq P_i(T_j = n) = P_i(X_1 \neq j, \dots, X_{n-1} \neq j, X_n = j). \]

  2. \(f^{(n)}_{ij}\) satisfies the following recursion.

    • \(f^{(1)}_{ij} = P_i(X_1 = j) = P_{ij}\).
    • And for \(n \ge 1\), \(f^{(n+1)}_{ij} = \sum_{k \neq j} P_{ik} f^{(n)}_{kj}\).
  3. Let \[ f_{ij} = \sum_{n=1}^∞ f^{(n)}_{ij} = P_i(T_j < ∞) \] denote the probability that a chain starting in \(i\) eventually visits \(j\). In particular, \(f_{jj}\) denotes the probability that a chain starting in \(j\) will return to \(j\).

  4. \(f_{ij}\) satisfies the following property.
    \[\displaystyle P^{(n)}_{ij} = \sum_{m=1}^n f^{(m)}_{ij} P^{(n-m)}_{jj}.\]

    An immediate implication of the above is that if \(j\) is an absorbing state then \(P^{(n)}_{ij} = \sum_{m=1}^n f^{(m)}_{ij} = P_i(T_j \le n).\)

  5. Let \(N^{(n)}_j = \sum_{m=1}^{n} \IND\{X_m = j\}\) denote the number of visits to state \(j\) in \(n\) steps. Define \[ G_{ij}^{(n)} = \EXP_i[ N^{(n)}_j ] = \sum_{m=1}^n P^{(m)}_{ij} \] to be the expected number of visits to state \(j\) in \(n\) steps, starting in \(i\).

  6. Let \(N_j = \lim_{n \to ∞} N^{(n)}_j = \sum_{m=1}^∞ \IND\{X_m = j \}\) denote the number of visits to state \(j\). Similarly, define \[ G_{ij} = \EXP_{i}[N_j] = \lim_{n \to ∞} G_{ij}^{(n)} = \sum_{m=1}^{∞} P^{(m)}_{ij} \] to denote the expected number of visits to state \(j\) for a chain starting in \(i\).

  7. A state \(j\) is recurrent if \(f_{jj} = 1\) and transient if \(f_{jj} < 1\).

  8. A state \(i\) is called periodic if \(f^{(n)}_{ii}\) is non-zero only when \(n\) is a multiple of some integer \(d > 1\).

  9. For every transient state \(j\), we have, for every \(i\), \(P_i(N_j < ∞) = 1\) and \[ G_{ij} = \dfrac{f_{ij}}{1 - f_{jj}}.\] On the other hand, if \(j\) is recurrent, then \(P_j(N_j = ∞) = 1\) and \(G_{jj} = ∞\). Moreover, \[ P_i(N_j = ∞) = P_i(T_j < ∞) = f_{ij}. \] So, if \(f_{ij} = 0\), then \(G_{ij} = 0\), while if \(f_{ij} > 0\), then \(G_{ij} = ∞\).

  10. Thus, a state \(i\) is recurrent if and only if \[G_{ii} = \sum_{n=1}^∞ P^{(n)}_{ii} = ∞. \]

  11. A state \(j\) is said to be accessible from \(i\) (abbreviated as \(i \rightsquigarrow j\)) if there is an ordered sequence of states \((i_0, \dots, i_m)\) such that \(i_0 = i\), \(i_m = j\), and \(P_{i_k i_{k+1}} > 0\) for all \(k\). Equivalently, \(i \rightsquigarrow j\) if there exists an \(m\) such that \(P^{(m)}_{ij} > 0\).

  12. Accessibility is a transitive relationship, i.e., \(i \rightsquigarrow j\) and \(j \rightsquigarrow k\) imply that \(i \rightsquigarrow k\).

  13. If \(f_{ij} > 0\) but \(f_{ji} < 1\), then \(i\) is transient.

  14. If \(i\) is recurrent and \(i \rightsquigarrow j\), then \(j\) is also recurrent and \(f_{ij} = f_{ji} = 1\).

  15. A subset \(C\) of \(\ALPHABET X\) is said to be closed if no state inside \(C\) can lead to any state outside \(C\), i.e., \[ f_{ij} = 0, \quad \forall i \in C \text{ and } j \not\in C. \]

  16. A closed set \(C\) is called irreducible if \(i \rightsquigarrow j\) for all \(i,j \in C\). Thus, if \(C\) is an irreducible set, then either all states in \(C\) are recurrent or all states in \(C\) are transient.

  17. Consequently, if \(C\) is an irreducible and closed set of recurrent states, then for all \(i,j \in C\):

    • \(f_{ij} = 1\)
    • \(P_i(N_j = ∞) = 1\)
    • \(G_{ij} = ∞\)
  18. If \(C\) is a finite, irreducible, and closed set of states, then every state in \(C\) is recurrent.

  19. Let \(\ALPHABET X_T\) and \(\ALPHABET X_R\) denote the sets of transient and recurrent states, respectively. The set \(\ALPHABET X_R\) can be partitioned into a finite or countable number of irreducible closed sets \(C_1\), \(C_2\), \(\dots\).

9.7 Stationary distribution

  1. A distribution \(π\) is said to be a stationary distribution if \[ π = π P. \]

  2. Stationary distributions can be computed by solving balance equations.

  3. Let \(C\) be a recurrent class of a finite Markov chain. Then, there is a unique stationary distribution \(π\) that assigns positive probability only to states in \(C\).

  4. If \(π_1\) and \(π_2\) are stationary distributions of a Markov chain and \(α \in (0,1)\), then \(α π_1 + (1-α) π_2\) is also a stationary distribution.

  5. Thus, if a finite Markov chain has a single recurrent class, then it has a unique stationary distribution; if it has multiple recurrent classes, then it has an uncountable number of stationary distributions.
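
For a finite chain, a stationary distribution can be computed by solving \(π = πP\) together with the normalization \(\sum_i π_i = 1\). A minimal sketch in Python (assuming NumPy and a chain with a single recurrent class, so that the stationary distribution is unique; the helper name `stationary` is illustrative):

```python
import numpy as np

def stationary(P):
    """Solve pi = pi P together with sum(pi) = 1 as a least-squares problem."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones(n)])   # (P^T - I) pi = 0 and 1^T pi = 1
    rhs = np.zeros(n + 1)
    rhs[-1] = 1
    pi, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return pi

# On-off chain of Example 9.4: the stationary distribution is (b, a) / (a + b).
a, b = 0.3, 0.1
P = np.array([[1 - a, a], [b, 1 - b]])
print(stationary(P), np.array([b, a]) / (a + b))
```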

9.8 Limiting distribution

  1. Suppose \(j\) is a transient state. Then, we know that \(N_j < ∞\) and \(G_{ij} < ∞\). Therefore, \[ \lim_{n \to ∞} \frac{N^{(n)}_{j}}{n} = 0, a.s., \quad\text{and}\quad \lim_{n \to ∞} \frac{G^{(n)}_{ij}}{n} = 0, \forall i \in \ALPHABET X. \]

  2. Suppose \(j\) is a recurrent state. Then, \[ \lim_{n \to ∞} \frac{N^{(n)}_{j}}{n} = \frac{\IND\{T_j < ∞\}}{μ_j}, a.s., \quad\text{and}\quad \lim_{n \to ∞} \frac{G^{(n)}_{ij}}{n} = \frac{f_{ij}}{μ_j}, \forall i \in \ALPHABET X, \] where \(μ_j = \EXP_{j}[T_j]\) is the mean return time to state \(j\). In particular, for a chain starting in \(j\), the long-run fraction of time spent in \(j\) is \(1/μ_j\).

A Markov chain is said to have a steady state distribution if \(μ^{(n)}\) converges to a limit as \(n \to ∞\) and the limit does not depend on the initial distribution \(μ^{(0)}\).

A Markov chain has a steady state distribution if it is ergodic (for finite chains, irreducible and aperiodic). We can find the steady state distribution by solving the balance equation \(π = π P\).
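
For the on-off chain (which is irreducible and aperiodic whenever \(a, b \in (0,1)\)), this convergence is easy to observe by iterating \(μ^{(n+1)} = μ^{(n)} P\) from several initial distributions. A minimal sketch in Python (assuming NumPy; the values of \(a\) and \(b\) are illustrative):

```python
import numpy as np

a, b = 0.3, 0.1
P = np.array([[1 - a, a], [b, 1 - b]])             # on-off chain of Example 9.4

for mu0 in ([1.0, 0.0], [0.0, 1.0], [0.5, 0.5]):   # different initial distributions
    mu = np.array(mu0)
    for _ in range(50):
        mu = mu @ P                                # mu^(n+1) = mu^(n) P
    print(mu0, "->", mu)                           # all converge to (b, a)/(a+b)
```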

Exercises

Exercise 9.1 (Time reversal of Markov chains) Let \(\{X_n\}_{n \ge 1}\) be a Markov chain. Show that for any \(N > n\), \[ \PR(X_n = x_n \mid X_{n+1:N} = x_{n+1:N}) = \PR(X_n = x_n \mid X_{n+1} = x_{n+1}). \] Thus, a time-reversed Markov chain is also Markov.

Exercise 9.2 Suppose \(\{X_n\}_{n \ge 0}\) is a Markov chain with transition matrix \(P\). For a fixed positive integer \(k\), define \(Y_n = X_{kn}\). Show that \(\{Y_n\}_{n \ge 0}\) is a Markov chain with transition matrix \(P^k\).

Exercise 9.3 Suppose a (6-sided) die is ‘fixed’ so that two consecutive rolls cannot have the same outcome. In particular, if the outcome of a roll is \(i\), then the next roll cannot be \(i\); all \(5\) other outcomes are equally likely.

  1. Model the above as a Markov chain.
  2. If the outcome of the first roll is \(1\), what is the probability that the outcome of the \(n\)th roll is also \(1\)?