7 Convergence of random variables
Suppose we have an infinite sequence \(\{X_n\}_{n \ge 1}\) of random variables defined on a common probability space \((Ω, \ALPHABET F, \PR)\). Thus, for every \(ω \in Ω\), there is an infinite sequence \[ X_1(ω), X_2(ω), X_3(ω), \dots \] A sequence of random variables, also called a stochastic process, can be thought of as a generalization of random vectors. When does this sequence converge?
7.1 Almost sure convergence
Recall that a sequence \(\{x_n\}_{n \ge 1}\) of real numbers converges to a limit \(x\) if for every \(ε > 0\), there exists an \(N\) such that for all \(n \ge N\), we have that \[ \ABS{ x_n - x } < ε. \]
The simplest way to define convergence of a sequence of random variables is as follows: a sequence \(\{X_n\}_{n \ge 1}\) of random variables converges to a limit \(X\) surely if for every \(ω \in Ω\), the sequence \(\{X_n(ω)\}_{n \ge 1}\) of real numbers converges to \(X(ω)\).
Sure convergence can be too strong, as is illustrated by the following example.
Example 7.1 Consider a probability space \((Ω, \ALPHABET F, \PR)\) where \(Ω = [0,1]\), \(\ALPHABET F = \mathscr{B}(0,1)\), and \(\PR\) is the uniform distribution on \(Ω\). Define \(X_n(ω) = \IND_{A_n}(ω)\) where \(A_n = [0, \frac 1n]\), i.e., \[ X_n(ω) = \begin{cases} 1, & ω \in [0, \frac{1}{n}] \\ 0, & ω \in (\frac{1}{n}, 1] \end{cases} \]
Show that \(\{X_n\}_{n \ge 1}\) does not converge in the sure sense.
Solution
For any \(ω \in (0,1]\), \(X_n(ω) = 1\) for \(n \le \lfloor 1/ω \rfloor\), and \(0\) afterwards. Thus, \(X_n(ω) \to 0\).
However, for \(ω = 0\), \(X_n(ω) = 1\) for all \(n\). Thus, \(X_n(ω) \to 1\).
From a practical point of view, we do not care about not converging at \(ω = 0\), because that is an event of zero probability. Stated differently, the set \[ \{ ω : X_n(ω) \to 0 \} \] has probability 1.
Based on the previous discussion, we can relax the notion of sure convergence to almost sure convergence. A sequence \(\{X_n\}_{n \ge 1}\) of random variables converges to a random variable \(X\) almost surely if \[ \PR\left( \left\{ ω : \lim_{n \to ∞} X_n(ω) = X(ω) \right\} \right) = 1 \] Or, equivalently, for any \(ε > 0\), \[ \PR\left( \limsup_{n \to ∞} \{ ω : | X_n(ω) - X(ω) | > ε \} \right) = 0 \] We denote such convergence as \(X_n \xrightarrow{a.s.} X\).
Another way to generalize the definition of convergence of a sequence of real numbers to that of random variables is called convergence in probability. A sequence \(\{X_n\}_{n \ge 1}\) of random variables converges to a random variable \(X\) in probability if for every \(ε > 0\), \[ \lim_{n \to ∞} \PR( \ABS{ X_n - X } > ε ) = 0. \] Or, equivalently, for every \(ε > 0\) and \(δ > 0\), there exists an \(N\) such that for all \(n \ge N\), we have \[ \PR( \ABS{ X_n - X} > ε ) \le δ. \] We denote such convergence as \(X_n \xrightarrow{p} X\).
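As a quick numerical illustration (not part of the notes), the following Python sketch estimates \(\PR(\ABS{X_n - 0} > ε)\) by Monte Carlo for the sequence of Example 7.1; the sample size, seed, and the choice \(ε = 0.5\) are arbitrary.

```python
import numpy as np

# Sketch: estimate P(|X_n - 0| > eps) for X_n(w) = 1{w in [0, 1/n]},
# with w ~ Uniform(0, 1) as in Example 7.1. The exact value is 1/n,
# so the estimates should shrink toward 0 as n grows.
rng = np.random.default_rng(0)
eps = 0.5
omega = rng.random(100_000)          # samples of w

def prob_exceeds(n):
    X_n = (omega <= 1.0 / n).astype(float)
    return np.mean(np.abs(X_n - 0.0) > eps)

estimates = {n: prob_exceeds(n) for n in [1, 10, 100, 1000]}
```

For \(n = 1\) the indicator is identically one, so the estimate is exactly \(1\); for larger \(n\) the estimates track \(1/n\), which is precisely the statement \(X_n \xrightarrow{p} 0\).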
7.2 Convergence in probability
Almost sure convergence implies convergence in probability, i.e., \[ [X_n \xrightarrow{a.s.} X] \implies [X_n \xrightarrow{p} X] \]
Proof. Fix \(ε > 0\). Define \[ A_n = \{ ω : \exists m \ge n \text{ such that } \ABS{X_m(ω) - X(ω)} \ge ε \}. \] Then, \(\{A_n\}_{n \ge 1}\) is a decreasing sequence of events. If \(ω \in \bigcap_{n \ge 1} A_n\), then \(X_n(ω) \not\to X(ω)\). This implies \[\PR\Bigl( \bigcap_{n \ge 1} A_n \Bigr) \le \PR\Bigl( \Bigl\{ ω : \lim_{n \to ∞}X_n(ω) \neq X(ω) \Bigr\}\Bigr) = 0. \] By continuity of probability, \[ \lim_{n \to ∞} \PR(A_n) = \PR\Bigl( \bigcap_{n \ge 1} A_n \Bigr) = 0. \] Finally, since \(\{ ω : \ABS{X_n(ω) - X(ω)} \ge ε \} \subseteq A_n\), we have \(\PR(\ABS{X_n - X} \ge ε) \le \PR(A_n) \to 0\), which establishes convergence in probability.
However, the converse is not true. Convergence in probability does not imply almost sure convergence, as is illustrated by the following example. An easier-to-understand example is presented in Example 7.5, part (a).
Example 7.2 Consider a probability space as in Example 7.1. Define \(X_n(ω) = \IND_{A_n}(ω)\), where \(A_n = \{ ω : n ω - \lfloor n ω \rfloor \in [0, \frac 1n] \}\).
Show that \(\{X_n\}_{n \ge 1}\) converges in probability but not almost surely.
Solution
Note that \(X_n(ω) \in \{0, 1\}\). Therefore, the sequence \(\{X_n(ω)\}_{n \ge 1}\) of real numbers converges if and only if there exists an \(N(ω)\) such that \(X_n(ω)\) takes a constant value for all \(n \ge N(ω)\). However, for any value of \(ω \neq 0\), \(X_n(ω)\) takes both values \(0\) and \(1\) infinitely often. Therefore, \(X_n(ω)\) does not converge to a limit.
For \(ω = 0\), \(X_n(ω) = 1\), and therefore \(X_n(ω) \to 1\). However, \(\PR(\{ω = 0 \}) = 0\). Hence, the sequence \(\{X_n(ω)\}_{n \ge 1}\) does not converge almost surely.
Now fix an \(ε > 0\) and consider the set \[ E_n = \{ ω : |X_n(ω) - 0| > ε \}. \] If \(ε > 1\), then \(E_n = \emptyset\), and \(\PR(E_n) = 0\) for all \(n\). So, we assume that \(ε < 1\). Then, \[ E_n = \{ ω : X_n(ω) = 1 \}. \] Note that \(X_n(ω) = 1\) for roughly a \(1/n\) fraction of points (one can make this argument formal by showing that the sequence \(\{nω\}_{n \ge 1}\) is equidistributed modulo \(1\) using Weyl's criterion). So, \[ \PR(E_n) = \PR(X_n = 1) = \frac 1n. \] Thus, \[ \lim_{n \to ∞} \PR(E_n) = 0. \] Hence, \(\{X_n\}\) converges in probability.
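To see the on/off behavior concretely, here is a small Python sketch (an illustration, not from the notes) that evaluates \(X_n(ω)\) of Example 7.2 at the arbitrarily chosen irrational point \(ω = \sqrt 2 - 1\).

```python
import math

# Sketch for Example 7.2 at the fixed point w = sqrt(2) - 1 (an arbitrary
# irrational choice): X_n(w) = 1 exactly when the fractional part of n*w
# lies in [0, 1/n]. Ones keep recurring (at ever larger n), so the sequence
# X_n(w) does not converge, yet the ones become very sparse.
omega = math.sqrt(2) - 1
N = 100_000
ones = [n for n in range(1, N + 1) if (n * omega) % 1.0 <= 1.0 / n]
frequency = len(ones) / N
```

The list `ones` contains arbitrarily large indices (so both values occur infinitely often along the full sequence), while `frequency` is tiny, consistent with \(\PR(X_n = 1) = 1/n \to 0\).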
There is however a partial converse. If \(\{X_n\}_{n \ge 1}\) is a monotone sequence of random variables (i.e., either \(X_1 \le X_2 \le \cdots\) almost surely or \(X_1 \ge X_2 \ge \cdots\) almost surely) and \(c\) is a constant, then \[ [X_n \xrightarrow{p} c] \implies [X_n \xrightarrow{a.s.} c]. \]
If \(X_n \xrightarrow{p} X\), then there exists a subsequence \(\{n_k : k \in \naturalnumbers \}\) such that \(\{X_{n_k}\}_{k \ge 1}\) converges almost surely to \(X\).
\(X_n \xrightarrow{p} X\) if and only if every subsequence \(\{n_k : k \in \naturalnumbers \}\) has a sub-subsequence \(\{n_{k_m} : m \in \naturalnumbers \}\) such that \(\{X_{n_{k_m}} \}_{m \ge 1}\) converges to \(X\) almost surely.
7.3 Convergence in distribution
We have already seen the notion of convergence in distribution (also called weak convergence). A sequence \(\{X_n\}_{n \ge 1}\) of random variables converges to a random variable \(X\) in distribution if \[ \lim_{n \to ∞} \PR(X_n \le x) = \PR(X \le x) \] for every \(x\) at which \(F(x) = \PR(X \le x)\) is continuous.
Note that the limiting object \(\lim_{n \to ∞} \PR(X_n \le x)\) need not be a valid CDF. For example, consider the deterministic sequence, \(X_n = n\). The corresponding CDF is \[ F_{X_n}(x) = \begin{cases} 0 & x < n \\ 1 & x \ge n \end{cases} \] Thus, for any \(x\), we have \(\lim_{n \to ∞} F_{X_n}(x) = 0\) which is not a valid CDF.
Convergence in probability implies convergence in distribution. \[ [X_n \xrightarrow{p} X] \implies [X_n \xrightarrow{D} X] \]
Proof. Let \(F_n\) and \(F\) denote the CDFs of \(X_n\) and \(X\), respectively. Fix \(ε > 0\), pick \(x\) such that \(F\) is continuous at \(x\), and consider \[\begin{align*} F_n(x) &= \PR(X_n \le x) = \PR(X_n \le x, X \le x + ε) + \PR(X_n \le x, X > x + ε) \\ &\le \PR(X \le x + ε) + \PR(X - X_n > ε) \\ &\le F(x + ε) + \PR(\ABS{X_n - X} > ε). \end{align*}\] Similarly, \[\begin{align*} F(x-ε) &= \PR(X \le x-ε) = \PR(X \le x-ε, X_n \le x ) + \PR(X \le x - ε, X_n > x ) \\ &\le \PR(X_n \le x) + \PR(X_n - X > ε) \\ &\le F_n(x) + \PR(\ABS{X_n - X} > ε). \end{align*}\]
Thus, \[ F(x-ε) - \PR(\ABS{X_n - X} > ε) \le F_n(x) \le F(x+ε) + \PR(\ABS{X_n - X} > ε). \] Taking \(n \to ∞\), we have \[ F(x-ε) \le \liminf_{n \to ∞} F_n(x) \le \limsup_{n \to ∞} F_n(x) \le F(x + ε). \] The result is true for all \(ε > 0\). Since \(F\) is continuous at \(x\), when we take \(ε \downarrow 0\), we have \[ F(x - ε) \uparrow F(x) \quad\text{and}\quad F(x + ε) \downarrow F(x) \] which implies that \(F_n(x) \to F(x)\).
The converse is not true. For example, let \(X_n\) be a sequence of i.i.d. \(\text{Uniform}(0,1)\) random variables. Clearly, \(X_n \xrightarrow{D} X\), where \(X\) is also \(\text{Uniform}(0,1)\). However, for any \(ε \in (0,1)\), \[ \PR( \ABS{X_n - X} > ε ) = (1-ε)^2 \]
[Think of the area of the unit square corresponding to the event \(|X_n - X| > ε\)]. Thus, \[ \lim_{n \to ∞} \PR(\ABS{X_n - X} > ε) \neq 0. \] So, \(X_n\) does not converge in probability to \(X\).
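A seeded Monte Carlo sketch (illustrative only, with the arbitrary choice \(ε = 0.25\)) confirms that this probability stays near \((1-ε)^2\) rather than vanishing:

```python
import numpy as np

# Sketch of the counterexample: X_n and X are i.i.d. Uniform(0,1), so
# P(|X_n - X| > eps) equals (1 - eps)^2 for every n; it does not vanish.
rng = np.random.default_rng(1)
eps = 0.25
samples = 1_000_000
X_n = rng.random(samples)
X = rng.random(samples)
estimate = np.mean(np.abs(X_n - X) > eps)   # theory: (1 - 0.25)^2 = 0.5625
```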
There is a partial converse. If \(c\) is a constant then \[ [X_n \xrightarrow{D} c] \implies [X_n \xrightarrow{p} c]. \]
Convergence in distribution does not imply convergence of moments! For example, consider the sequence of random variables where \[ X_n = \begin{cases} 0 & \text{w.p. } 1 - \frac 1n \\ n & \text{w.p. } \frac 1n \end{cases} \] Then, \[ F_{X_n}(x) = \begin{cases} 0, & x < 0 \\ 1 - \frac 1n, & 0 \le x < n \\ 1, & x \ge n \end{cases} \] Thus, we have \[ \lim_{n \to ∞} F_{X_n}(x) = \begin{cases} 0 & x < 0 \\ 1 & x \ge 0 \end{cases} \] which corresponds to a random variable \(X = 0\).
However, for any \(k \ge 1\), we have \(\EXP[X_n^k] = n^{k-1}\) but \(\EXP[X^k] = 0\). Thus, the moments do not converge!
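Because \(X_n\) takes only the values \(0\) and \(n\), its moments can be verified with exact rational arithmetic; the following sketch (not part of the notes) does so:

```python
from fractions import Fraction

# Exact check: for X_n = 0 w.p. 1 - 1/n and X_n = n w.p. 1/n, the k-th
# moment is n^(k-1), which diverges for k >= 2 even though X_n converges
# in distribution to X = 0 (whose moments are all zero).
def moment(n, k):
    p = Fraction(1, n)
    return (1 - p) * 0**k + p * n**k

checks = all(moment(n, k) == n**(k - 1) for n in range(2, 50) for k in range(1, 5))
```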
Skorokhod’s representation theorem. If \(X_n \xrightarrow{D} X\), then there exist random variables \(\{Y_n\}_{n \ge 1}\) and \(Y\), defined on a common probability space, such that each \(Y_n\) has the same distribution as \(X_n\), \(Y\) has the same distribution as \(X\), and \(Y_n \xrightarrow{a.s.} Y\).
7.4 Convergence in \(r\)-th mean
There is another form of convergence: convergence in the \(r\)-th mean. We say that a sequence \(\{X_n\}_{n \ge 1}\) of random variables converges to \(X\) in \(r\)-th mean, \(r \ge 1\), if \(\EXP[\ABS{X_n}^r] < ∞\) for all \(n\), \(\EXP[\ABS{X}^r] < ∞\), and \[ \lim_{n \to ∞} \EXP[ \ABS{X_n - X}^r ] = 0. \]
When \(r = 1\), we say that \(\{X_n\}_{n \ge 1}\) converges in the mean. When \(r=2\), we say that \(\{X_n\}_{n \ge 1}\) converges in the mean-square. We will mainly focus on mean-square convergence and denote it by \(X_n \xrightarrow{m.s.} X\).
If \(\{X_n\}_{n \ge 1}\) converges to \(X\) in the \(r\)-th mean, then it also converges in the \(s\)-th mean for all \(s\) with \(1 \le s < r\).
Convergence in \(r\)-th mean implies convergence in probability.
Proof. By Markov's inequality, we have (for any \(r \ge 1\)) \[ \PR(\ABS{X_n - X} > ε) = \PR(\ABS{X_n - X}^r > ε^r) \le \frac{ \EXP[ \ABS{X_n - X}^r ] }{ε^r}. \] So, if the RHS goes to zero as \(n \to ∞\), then so does the LHS.
However, the converse is not true as is illustrated by Example 7.3, part (c).
Convergence in mean-square and almost surely are not comparable. Example 7.3, part (c) gives an example that converges almost surely but not in mean square. Example 7.5, part (a) gives an example that converges in mean square but not almost surely.
7.5 Continuity properties of convergence
Continuity preserves convergence almost surely, in probability, and in distribution. Thus, if \(X_n\) converges to \(X\) in any of these modes and \(g\) is a continuous function, then \(g(X_n)\) converges in the same mode to \(g(X)\).
This is not the case for mean-square convergence. For example, let \(X_n = 0\) with probability \(1 - \frac 1{n^3}\) and \(X_n = n\) with probability \(\frac 1{n^3}\). Then, \[ \EXP[|X_n - 0|^2] = n^2 \frac{1}{n^3} = \frac 1n \to 0. \] Thus, \(X_n \xrightarrow{m.s.} 0\). Now consider the function \(g(x) = x^2\). \[ \EXP[ | g(X_n) - g(0)|^2 ] = n^4 \frac{1}{n^3} = n \to ∞. \] Thus, \(g(X_n)\) does not converge in mean square to \(g(0)\).
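The two moment computations above can be reproduced with exact arithmetic; the helper `mean_square` below is a hypothetical convenience, not from the notes:

```python
from fractions import Fraction

# Exact check: X_n = 0 w.p. 1 - 1/n^3 and X_n = n w.p. 1/n^3.
# E[|X_n|^2] = 1/n -> 0, but with g(x) = x^2, E[|g(X_n)|^2] = E[X_n^4] = n.
def mean_square(n, power):
    p = Fraction(1, n**3)
    return p * n**power          # only the X_n = n branch contributes

second_moments = [mean_square(n, 2) for n in range(2, 100)]   # equals 1/n
fourth_moments = [mean_square(n, 4) for n in range(2, 100)]   # equals n
```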
If \(X_n \xrightarrow{a.s.} X\) and \(Y_n \xrightarrow{a.s.} Y\), then \((X_n, Y_n) \xrightarrow{a.s.} (X,Y)\). Therefore, for any continuous function \(g\), we have \(g(X_n, Y_n) \xrightarrow{a.s.} g(X,Y)\).
Proof. Define \[\begin{align*} Ω_X &= \{ ω : \lim_{n \to ∞} X_n(ω) = X(ω) \} \\ Ω_Y &= \{ ω : \lim_{n \to ∞} Y_n(ω) = Y(ω) \} \end{align*}\] We know that \(\PR(Ω_X) = 1\) and \(\PR(Ω_Y) = 1\). From Example 6.2, we have that \(\PR(Ω_X \cap Ω_Y) = 1\). Thus, \((X_n, Y_n) \xrightarrow{a.s.} (X,Y)\).
The second part follows from continuity property of convergence.
The above property is also true for convergence in probability. The proof is left as a homework exercise.
However, this property is not true for convergence in distribution, as is illustrated by the following example. Let \(X_n \sim \text{Unif}(-1,1)\), \(n \ge 1\), and \(X \sim \text{Unif}(-1,1)\) be independent random variables. Define \(Y_n = X_n\) and \(Y = - X\). Then, \(X_n \xrightarrow{D} X\) and \(Y_n \xrightarrow{D} Y\), but \(X_n + Y_n = 2X_n\) does not converge in distribution to \(X + Y = 0\).
Slutsky’s theorem: Suppose \(X_n \xrightarrow{D} X\) and \(Y_n \xrightarrow{D} c\), where \(c\) is a constant. Then, \((X_n, Y_n) \xrightarrow{D} (X, c)\). Therefore, for any continuous function \(g\), we have \(g(X_n, Y_n) \xrightarrow{D} g(X, c)\). In particular, \(X_n + Y_n \xrightarrow{D} X + c\) and \(X_n Y_n \xrightarrow{D} cX\).
Note that we have already argued that \(Y_n \xrightarrow{D} c\) is equivalent to \(Y_n \xrightarrow{p} c\). So, one can replace \(Y_n \xrightarrow{D} c\) in Slutsky’s theorem by \(Y_n \xrightarrow{p} c\).
7.6 Some examples
Example 7.3 Consider the probability space \((Ω, \ALPHABET F, \PR)\) as in Example 7.1, and consider different choices of \(X_n\) shown below. For each case, determine whether \(X_n\) converges almost surely, in probability, or in mean-square.
- \(X_n(ω) = \IND_{A_n}(ω)\), where \(A_n = [0, \frac 1n]\).
- \(X_n(ω) = \IND_{A_n}(ω)\), where \(A_n = [0, \frac 1{n^2}]\).
- \(X_n(ω) = \textcolor{red}{n}\IND_{A_n}(ω)\), where \(A_n = [0, \frac 1n]\).
Solution
We only prove part (a). The proof of the other two parts is similar.
For \(ω \neq 0\), the sequence \(\{X_n(ω)\}_{n \ge 1}\) is a finite sequence of ones followed by an infinite sequence of zeros. Then, \(\lim_{n \to ∞} X_n(ω) = 0\). Thus, \[ \PR\Bigl( \Bigl\{ ω : \lim_{n \to ∞} X_n(ω) = 0 \Bigr\} \Bigr) = \PR( ω \in (0,1]) = 1. \] Hence, \(X_n \xrightarrow{a.s} 0\).
Fix an \(ε > 0\) and consider the set \[ E_n = \{ ω : |X_n(ω) - 0| > ε \}. \] If \(ε > 1\), then \(E_n = \emptyset\), and \(\PR(E_n) = 0\) for all \(n\). So, we assume that \(ε < 1\). Then, \[ E_n = \{ ω : X_n(ω) = 1 \}. \] Hence, \[ \PR(E_n) = \PR(X_n = 1) = \frac 1n \] which converges to \(0\) as \(n \to ∞\). Therefore, \(X_n \xrightarrow{p} 0\).
Consider \[ \EXP[\ABS{X_n}^2] = \int_{A_n} 1\, d ω = \frac{1}{n} \] which converges to \(0\) as \(n \to ∞\). Thus, \(X_n \xrightarrow{m.s.} 0\).
However, observe that for part (c) \[ \EXP[\ABS{X_n}^2] = \int_{A_n} n^2\, d ω = n \] which goes to \(∞\) as \(n \to ∞\). Thus, \(\{X_n\}_{n \ge 1}\) does not converge in mean square.
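The three second moments can be checked with exact arithmetic; the helper `second_moment` below is a hypothetical convenience, not from the notes:

```python
from fractions import Fraction

# Exact second moments for the three cases of Example 7.3, computed as
# integrals of simple functions over [0, 1]:
#   (a) E[X_n^2] = 1/n,  (b) E[X_n^2] = 1/n^2,  (c) E[X_n^2] = n^2 * (1/n) = n.
def second_moment(height, length):
    # X_n equals `height` on a set of measure `length`, and zero elsewhere
    return Fraction(height) ** 2 * Fraction(length)

n = 100
case_a = second_moment(1, Fraction(1, n))        # tends to 0
case_b = second_moment(1, Fraction(1, n**2))     # tends to 0
case_c = second_moment(n, Fraction(1, n))        # grows without bound
```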
Example 7.4 Consider an i.i.d. sequence \(\{X_n\}_{n \ge 1}\), where \(X_n \sim \text{Uniform}(0,1)\). Define \[ Y_n = \min\{X_1, \dots, X_n\}. \] Determine whether \(Y_n\) converges almost surely, in probability, or in mean-square.
Solution
Convergence in probability
Fix an \(ε > 0\) and consider the set \[ E_n = \{ ω : |Y_n(ω) - 0| > ε \}. \] As in Example 7.3, if \(ε > 1\), then \(\PR(E_n) = 0\). So, we assume that \(ε < 1\). Then, \[\begin{align*} \PR(E_n) &= \PR(Y_n > ε) = \PR\bigl( \min\{ X_1, \dots, X_n \} > ε \bigr) \\ &= \PR( X_1 > ε, X_2 > ε, \dots, X_n > ε ) \\ &= (1-ε)^n \end{align*}\] which goes to zero as \(n \to ∞\). Thus, \(Y_n \xrightarrow{p} 0\).
Almost sure convergence
We now consider almost sure convergence. Note that for a fixed \(ω\), the sequence \(\{Y_n(ω)\}_{n \ge 1}\) is a decreasing sequence. Hence, it must have a limit. Denote that limit by \(Y(ω)\), i.e., \[ Y(ω) = \lim_{n \to ∞} Y_n(ω). \]
Since \(\{Y_n\}_{n \ge 1}\) is a decreasing sequence, we have that \(Y(ω) \le Y_n(ω)\). Hence, for any \(ε > 0\), \[ \PR(Y > ε) \le \PR(Y_n > ε) = (1 - ε)^n \] where the last equality follows from the calculations done above.
The above inequality holds for every \(n\), so we have \[ \PR(Y > ε) \le \lim_{n \to ∞} (1-ε)^n = 0. \] Recall that \(ε > 0\) was arbitrary. Therefore, we have shown that \[ \PR\Bigl( \lim_{n\to ∞} Y_n > ε \Bigr) = 0. \] Thus, the only possibility is that \[ \PR\Bigl( \lim_{n\to ∞} Y_n = 0 \Bigr) = 1. \] Hence \(Y_n \xrightarrow{a.s.} 0\).
Mean square convergence
Let \(F_n\) denote the CDF of \(Y_n\) and \(f_n\) denote the PDF. As computed above, \[ F_n(y) = 1 - (1-y)^n, \quad y \in (0,1). \] Therefore, \[ f_n(y) = \frac{d F_n(y)}{dy} = n(1-y)^{n-1}, \quad y \in (0,1). \] Since \(Y_n \in [0, 1]\), we have \[ \EXP[ \ABS{Y_n - 0}^2 ] = \EXP[ Y_n^2 ] \le \EXP[ Y_n] \] where the last inequality uses the fact that \(y^2 \le y\) for \(y \in [0,1]\). We now compute \(\EXP[Y_n]\). \[\begin{align*} \EXP[Y_n] &= \int_{0}^{1} y f_n(y)\, dy \\ &= \int_{0}^1 y n (1-y)^{n-1}\, dy \\ &= n \int_{0}^1 y (1-y)^{n-1}\, dy. \end{align*}\] We could compute the integral using integration by parts, but here we simply use the fact that the above is a Beta integral. Thus, \[ \EXP[Y_n] = n \int_{0}^1 y (1-y)^{n-1}\, dy = n B(2,n) = n\frac{1!(n-1)!}{(n+1)!} = \frac{1}{n+1}. \] Thus, \[ \lim_{n \to ∞} \EXP[Y_n^2 ] \le \lim_{n \to ∞} \EXP[Y_n ] = 0. \] Hence, \(Y_n \xrightarrow{m.s.} 0\).
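A seeded simulation sketch (illustrative only; the values of \(n\), the trial count, and \(ε\) are arbitrary) checks both formulas derived above:

```python
import numpy as np

# Sketch for Example 7.4: Y_n = min(X_1, ..., X_n) with i.i.d. Uniform(0,1)
# samples. The notes give E[Y_n] = 1/(n+1) and P(Y_n > eps) = (1 - eps)^n.
rng = np.random.default_rng(2)
n, trials, eps = 10, 200_000, 0.2
Y = rng.random((trials, n)).min(axis=1)
mean_est = Y.mean()                      # theory: 1/11 ~ 0.0909
tail_est = np.mean(Y > eps)              # theory: 0.8**10 ~ 0.1074
```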
7.7 Almost sure convergence from convergence in probability
Example 7.4 shows that verifying almost sure convergence can be a bit tricky. In this section, we show that sometimes it is possible to infer almost sure convergence from convergence in probability.
Lim inf and lim sup of sets
Explanation adapted from math.stackexchange [1] and [2]
Limits of sets is easy to describe when we have weakly increasing or weakly decreasing sequence of sets. In particular, if \(\{C_n\}_{n \ge 1}\) is a weakly increasing sequence of sets, then \[ \lim_{n \to ∞} C_n = \bigcup_{n=1}^{∞} C_n. \] Thus, the limit is the union of all sets. Moreover, if \(\{D_n\}_{n \ge 1}\) is weakly decreasing sequence of sets, then \[ \lim_{n \to ∞} D_n = \bigcap_{n=1}^{∞} D_n. \] Thus, the limit is the intersection of all sets.
What happens when a sequence \(\{A_n\}_{n \ge 1}\) is neither increasing nor decreasing? We can sandwich it between an increasing sequence \(\{C_n\}_{n \ge 1}\) and a decreasing sequence \(\{D_n\}\) as follows:
\[ \begin{array}{lcl} C_1 = A_1 \cap A_2 \cap A_3 \cap \dotsb &\quad\subseteq\qquad A_1 \qquad &\subseteq \qquad D_1 = A_1 \cup A_2 \cup A_3 \cup \dotsb \\ C_2 = \phantom{A_1 \cap{}} A_2 \cap A_3 \cap \dotsb &\quad\subseteq\qquad A_2 \qquad &\subseteq \qquad D_2 = \phantom{A_1 \cup{}} A_2 \cup A_3 \cup \dotsb \\ C_3 = \phantom{A_1 \cap A_2 \cap{}} A_3 \cap \dotsb &\quad\subseteq\qquad A_3 \qquad &\subseteq \qquad D_3 = \phantom{A_1 \cup A_2 \cup{}} A_3 \cup \dotsb \\ & \qquad\vdots & \end{array} \]
The limit of \(\{C_n\}_{n \ge 1}\) is called the lim inf of \(\{A_n\}_{n \ge 1}\), i.e., \[ \liminf_{n \to ∞} A_n = \lim_{n \to ∞} C_n = \bigcup_{n=1}^∞ C_n = \bigcup_{n=1}^{∞} \bigcap_{i=n}^{∞} A_i. \] Similarly, the limit of \(\{D_n\}_{n \ge 1}\) is called the lim sup of \(\{A_n\}_{n \ge 1}\), i.e., \[ \limsup_{n \to ∞} A_n = \lim_{n \to ∞} D_n = \bigcap_{n=1}^∞ D_n = \bigcap_{n=1}^{∞} \bigcup_{i=n}^{∞} A_i. \] When the two limits are equal, we say that the sequence \(\{A_n\}_{n \ge 1}\) has a limit.
Another way to think about these definitions is as follows. \[ ω \in \limsup_{n \to ∞} A_n \iff \limsup_{n \to ∞} \IND_{A_n}(ω) = 1 \] which holds if and only if the binary sequence \(\{\IND_{A_n}(ω)\}\) has infinitely many ones, i.e., \(ω\) is a member of infinitely many \(A_n\).
Similarly, \[ ω \in \liminf_{n \to ∞} A_n \iff \liminf_{n \to ∞} \IND_{A_n}(ω) = 1 \] which holds if and only if the binary sequence \(\{\IND_{A_n}(ω)\}\) eventually becomes \(1\) forever, i.e., \(ω\) is eventually a member of \(A_n\) forever.
For example, suppose we toss a coin an infinite number of times. Let \((Ω, \ALPHABET F, \PR)\) denote the corresponding probability space, and let \(A_n\) denote the event that the \(n\)-th toss is a heads. Then,
- \(\limsup_{n \to ∞} A_n\) is the event that there are infinitely many heads.
- \(\liminf_{n \to ∞} A_n\) is the event that all but finitely many of the tosses were heads, i.e., there were only finitely many tails.
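The tail constructions \(C_n\) (intersection of a tail) and \(D_n\) (union of a tail) can be played with directly on a truncated index range; the alternating sets below are an illustrative choice, not from the notes:

```python
# Sketch of liminf/limsup of sets on a truncated index range. Every A_n
# contains 0, while the element 1 appears only at even n: so 1 belongs to
# infinitely many A_n (limsup) but is not eventually in every A_n (liminf).
N = 200
A = [{0, 1} if n % 2 == 0 else {0} for n in range(N)]

# C_n = intersection of the tail, D_n = union of the tail; we only compute
# them for n <= N/2 so that truncation does not distort the tails.
C = [set.intersection(*A[n:]) for n in range(N // 2)]
D = [set.union(*A[n:]) for n in range(N // 2)]

liminf = set.union(*C)         # elements eventually in every A_n
limsup = set.intersection(*D)  # elements in infinitely many A_n
```

As expected, the liminf is always contained in the limsup.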
We now state two fundamental results. The proofs are not difficult, but we omit them due to lack of time.
Lemma 7.1 (Borel Cantelli Lemma) Let \(\{A_n\}_{n \ge 1}\) be a sequence of events defined on a common probability space \((Ω, \ALPHABET F, \PR)\). If the sum of the probability of the events is finite, i.e., \[ \sum_{n=1}^∞ \PR(A_n) < ∞, \] then the probability that infinitely many of them occur is zero, i.e., \[ \PR\Bigl(\limsup_{n \to ∞} A_n \Bigr) = 0. \]
There is a partial converse of Borel-Cantelli lemma.
Lemma 7.2 (Second Borel Cantelli Lemma) Let \(\{A_n\}_{n \ge 1}\) be a sequence of independent events defined on a common probability space \((Ω, \ALPHABET F, \PR)\). If the sum of the probability of the events is infinite, i.e., \[ \sum_{n=1}^∞ \PR(A_n) = ∞, \] then the probability that infinitely many of them occur is one, i.e., \[ \PR\Bigl(\limsup_{n \to ∞} A_n \Bigr) = 1. \]
An immediate implication of Borel-Cantelli lemma is the following:
Lemma 7.3 Suppose \(X_n \xrightarrow{p} X\) and, for every \(ε > 0\), we have \[ \sum_{n=1}^{∞} \PR(\ABS{X_n - X} > ε) < ∞. \] Then \(X_n \xrightarrow{a.s.} X\).
In light of the above result, we revisit some variations of the examples of the previous section.
For Example 7.3, part (a), we have \[ \PR(E_n) = \frac 1n. \] Since \(\sum_{n \ge 1} \PR(E_n) = ∞\), we cannot use Lemma 7.3 to infer that \(X_n \xrightarrow{a.s.} 0\). This is not a contradiction because Lemma 7.3 gives a sufficient condition, not a necessary one.
For Example 7.3, part (b), we have \[ \PR(E_n) = \frac 1{n^2}. \] Since \(\sum_{n \ge 1} \PR(E_n) < ∞\), we can use Lemma 7.3 to infer that \(X_n \xrightarrow{a.s.} 0\).
In Example 7.4, we have argued that \(\PR(Y_n > ε) = (1-ε)^n\). Therefore, \(\sum_{n \ge 1} \PR(Y_n > ε) = \sum_{n=1}^∞ (1-ε)^n = \frac{1-ε}{ε} < ∞\) (for \(0 < ε < 1\)). Hence, by Lemma 7.3, \(Y_n \xrightarrow{a.s.} 0\).
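The three series used above can be checked numerically (and, for the geometric series, exactly); a small sketch, not part of the notes:

```python
from fractions import Fraction

# Partial sums of the series used with the Borel-Cantelli lemma:
# sum 1/n keeps growing (diverges), sum 1/n^2 stays below pi^2/6 < 2, and
# the geometric series sums exactly to (1 - eps)/eps in the limit.
N = 2000
harmonic = sum(1.0 / n for n in range(1, N + 1))        # ~ log(N) + 0.577
p_series = sum(1.0 / n**2 for n in range(1, N + 1))     # ~ pi^2 / 6

eps = Fraction(1, 2)
geometric = sum((1 - eps) ** n for n in range(1, 100))  # = 1 - 2**(-99)
limit = (1 - eps) / eps                                 # = 1 for eps = 1/2
```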
We can also consider a variation of the above.
Example 7.5 Consider a variation of Example 7.3, where we no longer specify \(X_n\) as a function of \(ω\) but simply assume that \[
X_n = \begin{cases}
0 & \text{with probability } 1 - p_n \\
1 & \text{with probability } p_n
\end{cases}
\] and \(\{X_n\}_{n \ge 1}\) are independent.
Determine whether \(\{X_n\}_{n \ge 1}\) converges almost surely, in probability, or in mean square when:
- \(p_n = \frac 1n\)
- \(p_n = \frac 1{n^2}\).
Solution
Convergence in probability: Fix \(ε \in (0,1)\). As before define \(E_n = \{ ω : \ABS{X_n(ω)} > ε \} = \{ ω : X_n(ω) = 1 \}\). Thus, \(\PR(E_n) = p_n\) and for both cases, \(p_n \to 0\) as \(n \to ∞\). Therefore, \(X_n \xrightarrow{p} 0\).
Almost sure convergence: For part (a), \(\sum_{n \ge 1} p_n = ∞\); hence by the Second Borel-Cantelli lemma, \[ \PR(\limsup_{n \to ∞} \{ \ABS{X_n} > ε \}) = 1. \] So, \(X_n\) does not converge almost surely!
For part (b), \(\sum_{n \ge 1} p_n < ∞\); hence by Lemma 7.3, \(X_n \xrightarrow{a.s.} 0\).
Mean square convergence: We have \(\EXP[\ABS{X_n}^2] = p_n\). For both cases \(p_n \to 0\). Hence, \(X_n \xrightarrow{m.s.} 0\).
Note that part (a) converges in probability and mean square but not almost surely.
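A seeded simulation sketch of Example 7.5 (the parameters are arbitrary choices): averaged over many independent runs, the number of ones up to \(N\) tracks \(\sum_{n \le N} p_n\), which diverges slowly for \(p_n = 1/n\) but stays bounded for \(p_n = 1/n^2\).

```python
import numpy as np

# Independent X_n ~ Bernoulli(p_n). The expected number of ones up to N is
# sum of p_n: about log(N) + 0.577 for p_n = 1/n, and about pi^2/6 ~ 1.64
# for p_n = 1/n^2. The Borel-Cantelli lemmas turn these sums into the
# almost-sure statements derived above.
rng = np.random.default_rng(3)
N, runs = 10_000, 500
n = np.arange(1, N + 1)
U = rng.random((runs, N))

avg_ones_a = np.mean((U < 1.0 / n).sum(axis=1))      # ~ H_N ~ 9.79
avg_ones_b = np.mean((U < 1.0 / n**2).sum(axis=1))   # ~ 1.64
```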
A variation of Lemma 7.3 is the following:
Lemma 7.4 Let \(\{X_n\}_{n \ge 1}\) be a sequence of random variables with finite expectations and let \(X\) be another random variable. If \[ \sum_{n=1}^{∞} \EXP[ \ABS{X_n - X} ] < ∞ \] then \[ X_n \xrightarrow{a.s.} X. \]
Proof
To simplify the notation, we assume that \(X = 0\).
Pick any \(ε > 0\) and define the sequence of events \[ A_n = \bigl\{ ω : \ABS{X_n} > ε \bigr\}, \quad n \in \naturalnumbers. \]
From Markov's inequality, we have \[\PR(A_n) = \PR\bigl( \ABS{X_n} > ε \bigr) \le \dfrac{\EXP[\ABS{X_n}]}{ε}. \] Therefore, \[ \sum_{n=1}^{∞} \PR(A_n) \le \frac{1}{ε} \sum_{n=1}^{∞} \EXP[\ABS{X_n}] < ∞ \] by the hypothesis of the result. Therefore, by the Borel-Cantelli lemma, we have \[ \PR\Bigl( \limsup_{n \to ∞} A_n \Bigr) = 0. \] Since \(ε > 0\) was arbitrary, this shows that \(X_n \xrightarrow{a.s.} 0\).
7.8 Strong law of large numbers
Theorem 7.1 Let \(\{X_n\}_{n \ge 1}\) be an i.i.d. sequence of random variables with mean \(μ\) and variance \(σ^2\). Let \[ \bar X_n = \frac 1n \sum_{i=1}^n X_i \] be the sample average. Then, \(\bar X_n \xrightarrow{a.s.} μ\), i.e., \[ \PR\Bigl( \Bigl\{ ω : \lim_{n \to ∞} \bar X_n(ω) = μ \Bigr\} \Bigr) = 1. \]
We provide a proof under the assumption that the fourth moment exists.
Proof
We assume that \(μ = 0\) (this is just for notational simplicity) and \(\EXP[X_1^4] = γ < ∞\) (this is a strong assumption).
Since we know that the fourth moment exists, we can use a fourth-moment version of Markov's inequality: \[ \PR(\ABS{\bar X_n} \ge ε) = \PR(\ABS{\bar X_n}^4 \ge ε^4) \le \frac{ \EXP[ \bar X_n^4]}{ε^4}. \]
Then, by the multinomial theorem, we have \[ \EXP[\bar X_n^4] = \frac{1}{n^4} \EXP\biggl[ \sum_{i} X_i^4 + \binom{4}{1,3} \sum_{i \neq j} X_i X_j^3 + \binom{4}{2,2} \sum_{i < j} X_i^2 X_j^2 + \binom{4}{1,1,2} \sum_{\substack{i < j \\ k \neq i,j}} X_i X_j X_k^2 + 4! \sum_{i < j < k < \ell} X_i X_j X_k X_{\ell} \biggr]. \]
Since the \(\{X_i\}_{i \ge 1}\) are independent and zero mean, we have
- \(\EXP[X_i X_j^3] = \EXP[X_i] \EXP[X_j^3] = 0\).
- \(\EXP[X_i X_j X_k^2] = \EXP[X_i] \EXP[X_j] \EXP[X_k^2] = 0\).
- \(\EXP[X_i X_j X_k X_{\ell}] = \EXP[X_i] \EXP[X_j] \EXP[X_k] \EXP[X_{\ell}] = 0\).
Therefore, \[ \EXP[\bar X_n^4] = \frac{1}{n^4} \bigl[ n \EXP[X_i^4] + 3 n(n-1) \EXP[ X_i^2 X_j^2] \bigr] \le \frac{γ}{n^3} + \frac{3 σ^4}{n^2}, \] where we have used \(\EXP[X_i^2 X_j^2] = \EXP[X_i^2]\EXP[X_j^2] = σ^4\) for \(i \neq j\). Now, from the fourth-moment version of Markov's inequality, we have \[ \PR(\ABS{\bar X_n} \ge ε) \le \frac{ \frac{γ}{n^3} + \frac{3 σ^4}{n^2} }{ε^4}. \] This implies that \[ \sum_{n=1}^{∞} \PR(\ABS{\bar X_n} \ge ε) < ∞. \] Thus, from Lemma 7.3, we have that \(\bar X_n \xrightarrow{a.s.} 0\).
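A one-path simulation sketch of the strong law (the seed and sample size are arbitrary choices, not from the notes):

```python
import numpy as np

# Sample averages of i.i.d. Uniform(0,1) draws along a single path should
# settle near mu = 1/2, illustrating \bar X_n -> mu almost surely.
rng = np.random.default_rng(4)
X = rng.random(1_000_000)
running_mean = np.cumsum(X) / np.arange(1, X.size + 1)
final_error = abs(running_mean[-1] - 0.5)
```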
7.9 Further Reading
- Gubner: Sec. 13.1, 14.1, 14.2, 14.3 (and corresponding problems at the end of the chapters)
- Grimmett and Stirzaker: Sec. 7.1, 7.2, 7.3, 7.4, 7.5 (and corresponding problems at the end of the sections and the end of the chapters)