Convergence of random variables
Suppose we have an infinite sequence of random variables \(\{X_n\}_{n \ge 1}\) defined on a common probability space \((Ω, \ALPHABET F, \PR)\). A sequence of random variables, also called a stochastic process, can be thought of as a generalization of a random vector. When does such a sequence converge?
1 Different notions of convergence
Recall that a sequence \(\{x_n\}_{n \ge 1}\) of real numbers converges to a limit \(x^*\) if for every \(ε > 0\), there exists an \(N\) such that for all \(n \ge N\), we have that \[ \ABS{ x_n - x^* } < ε. \]
Convergence of a sequence of random variables is more delicate. We have already seen one form of convergence: convergence in distribution.
There are two further ways in which the notion of convergence of real sequences can be extended to sequences of random variables:
Convergence in probability. A sequence \(\{X_n\}_{n \ge 1}\) of random variables converges to a random variable \(X\) in probability if for every \(ε > 0\), \[ \lim_{n \to ∞} \PR( \ABS{ X_n - X } > ε ) = 0. \] Or, equivalently, for any \(ε > 0\) and \(δ > 0\), there exists an \(N\) such that for all \(n \ge N\), we have \[ \PR( \ABS{ X_n - X} > ε ) \le δ. \] We denote such convergence as \(X_n \xrightarrow{p} X\).
Almost sure convergence. A sequence \(\{X_n\}_{n \ge 1}\) of random variables converges to a random variable \(X\) almost surely if \[ \PR\left( \left\{ ω : \lim_{n \to ∞} X_n(ω) = X(ω) \right\} \right) = 1 \] Or, equivalently, for any \(ε > 0\), \[ \PR\left( \limsup_{n \to ∞} \{ ω : | X_n(ω) - X(ω) | > ε \} \right) = 0 \] We denote such convergence as \(X_n \xrightarrow{a.s.} X\).
Explanation adapted from math.stackexchange [1] and [2]
Limits of sets are easy to describe when we have a weakly increasing or weakly decreasing sequence of sets. In particular, if \(\{C_n\}_{n \ge 1}\) is a weakly increasing sequence of sets, then \[ \lim_{n \to ∞} C_n = \bigcup_{n=1}^{∞} C_n. \] Thus, the limit is the union of all sets. Moreover, if \(\{D_n\}_{n \ge 1}\) is a weakly decreasing sequence of sets, then \[ \lim_{n \to ∞} D_n = \bigcap_{n=1}^{∞} D_n. \] Thus, the limit is the intersection of all sets.
What happens when a sequence \(\{A_n\}_{n \ge 1}\) is neither increasing nor decreasing? We can sandwich it between an increasing sequence \(\{C_n\}_{n \ge 1}\) and a decreasing sequence \(\{D_n\}_{n \ge 1}\) as follows:
\[ \begin{array}{lcl} C_1 = A_1 \cap A_2 \cap A_3 \cap \dotsb &\quad\subseteq\qquad A_1 \qquad &\subseteq \qquad D_1 = A_1 \cup A_2 \cup A_3 \cup \dotsb \\ C_2 = \phantom{A_1 \cap{}} A_2 \cap A_3 \cap \dotsb &\quad\subseteq\qquad A_2 \qquad &\subseteq \qquad D_2 = \phantom{A_1 \cup{}} A_2 \cup A_3 \cup \dotsb \\ C_3 = \phantom{A_1 \cap A_2 \cap{}} A_3 \cap \dotsb &\quad\subseteq\qquad A_3 \qquad &\subseteq \qquad D_3 = \phantom{A_1 \cup A_2 \cup{}} A_3 \cup \dotsb \\ & \qquad\vdots & \end{array} \]
The limit of \(\{C_n\}_{n \ge 1}\) is called the lim inf of \(\{A_n\}_{n \ge 1}\), i.e., \[ \liminf_{n \to ∞} A_n = \lim_{n \to ∞} C_n = \bigcup_{n=1}^∞ C_n = \bigcup_{n=1}^{∞} \bigcap_{i=n}^{∞} A_i. \] Similarly, the limit of \(\{D_n\}_{n \ge 1}\) is called the lim sup of \(\{A_n\}_{n \ge 1}\), i.e., \[ \limsup_{n \to ∞} A_n = \lim_{n \to ∞} D_n = \bigcap_{n=1}^∞ D_n = \bigcap_{n=1}^{∞} \bigcup_{i=n}^{∞} A_i. \] When the two limits are equal, we say that the sequence \(\{A_n\}_{n \ge 1}\) has a limit.
Another way to think about these definitions is as follows. \[ ω \in \limsup_{n \to ∞} A_n \iff \limsup_{n \to ∞} \IND_{A_n}(ω) = 1 \] which holds if and only if the binary sequence \(\{\IND_{A_n}(ω)\}\) has infinitely many ones, i.e., \(ω\) is a member of infinitely many \(A_n\).
Similarly, \[ ω \in \liminf_{n \to ∞} A_n \iff \liminf_{n \to ∞} \IND_{A_n}(ω) = 1 \] which holds if and only if the binary sequence \(\{\IND_{A_n}(ω)\}\) eventually becomes \(1\) forever, i.e., \(ω\) is eventually a member of \(A_n\) forever.
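To make the indicator-sequence view concrete, here is a minimal Python/NumPy sketch. The finite horizon \(N\) and the tail cutoff are arbitrary truncations chosen purely for illustration; the true definitions involve infinite tails ("infinitely often" and "eventually").

```python
import numpy as np

# Truncated illustration of the indicator-sequence view of limsup/liminf.
N = 1_000
n = np.arange(1, N + 1)

for omega in (0.0, 0.01):
    # A_n = [0, 1/n): omega = 0 belongs to every A_n, while omega = 0.01
    # belongs to A_n only for finitely many n.
    ind = (omega < 1.0 / n).astype(int)
    tail = ind[N // 2:]          # finite stand-in for the infinite tail
    print(f"omega={omega}: in limsup (approx) = {bool(tail.max())}, "
          f"in liminf (approx) = {bool(tail.min())}")
```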
2 Examples of convergence in probability
Example 1 Consider a probability space \((Ω, \ALPHABET F, \PR)\) where \(Ω = [0,1]\), \(\ALPHABET F = \mathscr{B}(0,1)\), and \(\PR\) is the uniform distribution on \(Ω\). Define \[ X_n(ω) = \begin{cases} 1, & ω \in [0, \frac{1}{n^2}) \\ 0, & ω \in [\frac{1}{n^2}, 1] \end{cases} \] Show that \(X_n \xrightarrow{p} 0\).
Note that \[ X_n = \begin{cases} 1 & \text{with probability } \frac 1{n^2} \\ 0 & \text{with probability } 1 - \frac 1{n^2} \end{cases} \] Pick any \(ε > 0\). If \(ε \ge 1\), then \(\PR(X_n > ε) = 0\). So, we assume that \(ε \in (0,1)\). Then, \[ \PR(X_n > ε) = \PR(X_n = 1) = \frac 1{n^2}, \] which converges to \(0\) as \(n \to ∞\). Therefore, \(X_n \xrightarrow{p} 0\).
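As a quick sanity check of this calculation, here is a minimal Monte Carlo sketch in Python/NumPy; the value of \(ε\), the sample size, and the random seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
eps = 0.5                 # any eps in (0, 1) gives the same conclusion
num_samples = 200_000     # arbitrary sample size

for n in (2, 5, 10, 20):
    omega = rng.uniform(0.0, 1.0, size=num_samples)
    x_n = (omega < 1.0 / n**2).astype(float)      # X_n(omega) from Example 1
    estimate = np.mean(x_n > eps)                 # Monte Carlo estimate of P(X_n > eps)
    print(f"n={n:2d}: P(X_n > eps) ~ {estimate:.4f}, exact 1/n^2 = {1 / n**2:.4f}")
```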
Example 2 Consider the same setup as Example 1, but the random variable \(X_n\) defined as \[ X_n(ω) = \begin{cases} 1, & ω \in [0, \frac{1}{n}) \\ 0, & ω \in [\frac{1}{n}, 1] \end{cases} \] Show that \(X_n \xrightarrow{p} 0\).
This can be solved in exactly the same manner as Example 1.
Example 3 Consider the same setup as Example 1, but the random variable \(X_n\) defined as \[ X_n(ω) = \begin{cases} n, & ω \in [0, \frac{1}{n}) \\ 0, & ω \in [\frac{1}{n}, 1] \end{cases} \] Show that \(X_n \xrightarrow{p} 0\).
This can be solved in the same manner as Example 1: for any fixed \(ε > 0\) and all \(n > ε\), we have \(\PR(\ABS{X_n} > ε) = \PR(X_n = n) = \frac{1}{n}\), which converges to \(0\) as \(n \to ∞\). Therefore, \(X_n \xrightarrow{p} 0\).
Example 4 Consider an i.i.d. sequence \(\{X_n\}_{n \ge 1}\), where \(X_n \sim \text{Uniform}(0,1)\). Define \[ Y_n = \min\{X_1, \dots, X_n\}. \] Show that \(Y_n \xrightarrow{p} 0\).
Pick any \(ε > 0\). As in Example 1, if \(ε \ge 1\), then \(\PR(Y_n > ε) = 0\). So, we assume that \(ε \in (0,1)\). Then, \[\begin{align*} \PR(Y_n > ε) &= \PR\bigl( \min\{ X_1, \dots, X_n \} > ε \bigr) \\ &= \PR( X_1 > ε, X_2 > ε, \dots, X_n > ε ) \\ &= (1-ε)^n, \end{align*}\] which goes to zero as \(n \to ∞\). Thus, \(Y_n \xrightarrow{p} 0\).
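This calculation can also be sanity-checked with a short Monte Carlo sketch (the value of \(ε\) and the number of trials are arbitrary choices, assuming Python with NumPy):

```python
import numpy as np

rng = np.random.default_rng(0)
eps = 0.1                 # arbitrary eps in (0, 1)
num_trials = 100_000      # arbitrary number of Monte Carlo trials

for n in (5, 20, 50):
    x = rng.uniform(0.0, 1.0, size=(num_trials, n))
    y_n = x.min(axis=1)                            # Y_n = min{X_1, ..., X_n}
    estimate = np.mean(y_n > eps)                  # estimate of P(Y_n > eps)
    print(f"n={n:2d}: P(Y_n > eps) ~ {estimate:.4f}, exact (1-eps)^n = {(1 - eps)**n:.4f}")
```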
3 Examples of almost sure convergence
We revisit the examples of the previous section.
In Example 1, for any \(ω \neq 0\), the sequence \(\{X_n(ω)\}_{n \ge 1}\) consists of finitely many \(1\)’s followed by an infinite sequence of \(0\)’s (since \(ω \ge \frac{1}{n^2}\) for all sufficiently large \(n\)). Thus, \(\lim_{n\to ∞} X_n(ω) = 0\), and therefore \[ \PR\Bigl( \Bigl\{ ω : \lim_{n \to ∞} X_n(ω) = 0 \Bigr\} \Bigr) = \PR( ω \in (0,1]) = 1. \] Hence, \(X_n \xrightarrow{a.s.} 0\).
In Example 2, the same argument as above works.
In Example 3, a slight variation of the above argument works.
In Example 4, we proceed as follows. Fix \(ω\) and consider the sequence \(\{Y_n(ω)\}_{n \ge 1}\). Since this is a decreasing sequence, it must have a limit. Denote that limit by \(Y(ω)\), i.e., \[ Y(ω) = \lim_{n \to ∞} Y_n(ω). \]
Since \(\{Y_n\}_{n \ge 1}\) is a decreasing sequence, we have that \(Y(ω) \le Y_n(ω)\) for every \(n\). Hence, for any \(ε \in (0,1)\), \[ \PR(Y > ε) \le \PR(Y_n > ε) = (1 - ε)^n \] (where the last equality follows from the calculation in the solution of Example 4).
The above inequality holds for every \(n\), so we have \[ \PR(Y > ε) \le \lim_{n \to ∞} (1-ε)^n = 0. \] Recall that \(ε > 0\) was arbitrary, so we have shown that for every \(ε > 0\), \[ \PR\Bigl( \lim_{n\to ∞} Y_n > ε \Bigr) = 0. \] Since \(Y_n \ge 0\) implies \(Y \ge 0\), the only possibility is that \[ \PR\Bigl( \lim_{n\to ∞} Y_n = 0 \Bigr) = 1. \] Hence \(Y_n \xrightarrow{a.s.} 0\).
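A path-wise simulation sketch of this argument (the number of paths, the horizon, and the seed are arbitrary choices, assuming Python with NumPy): each realization of the running minimum is non-increasing and drifts towards \(0\), which is what almost sure convergence asserts.

```python
import numpy as np

rng = np.random.default_rng(1)
num_paths, horizon = 5, 10_000     # arbitrary choices

x = rng.uniform(0.0, 1.0, size=(num_paths, horizon))
y = np.minimum.accumulate(x, axis=1)   # Y_n = min{X_1, ..., X_n} along each sample path

# Every row is a non-increasing sequence that gets closer and closer to 0;
# almost sure convergence asserts that this happens for (almost) every realization.
print(y[:, [9, 99, 999, 9999]])        # Y_10, Y_100, Y_1000, Y_10000 for each path
```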
4 Almost sure convergence from convergence in probability
It is possible to infer almost sure convergence from convergence in probability. For that, we need the following result. The proof is not difficult, but is omitted in the interest of time.
Lemma 1 (Borel Cantelli Lemma) Let \(\{A_n\}_{n \ge 1}\) be a sequence of events defined on a common probability space \((Ω, \ALPHABET F, \PR)\). If the sum of the probability of the events is finite, i.e., \[ \sum_{n=1}^∞ \PR(A_n) < ∞, \] then the probability that infinitely many of them occur is zero, i.e., \[ \PR\Bigl(\limsup_{n \to ∞} A_n \Bigr) = 0. \]
There is a partial converse of the Borel-Cantelli lemma.
Lemma 2 (Second Borel Cantelli Lemma) Let \(\{A_n\}_{n \ge 1}\) be a sequence of independent events defined on a common probability space \((Ω, \ALPHABET F, \PR)\). If the sum of the probability of the events is infinite, i.e., \[ \sum_{n=1}^∞ \PR(A_n) = ∞, \] then the probability that infinitely many of them occur is one, i.e., \[ \PR\Bigl(\limsup_{n \to ∞} A_n \Bigr) = 1. \]
An immediate implication of the Borel-Cantelli lemma is the following:
Lemma 3 Suppose \(X_n \xrightarrow{p} X\) and, for any \(ε > 0\), we have \[ \sum_{n=1}^{∞} \PR(\ABS{X_n - X} > ε) < ∞. \] Then \(X_n \xrightarrow{a.s.} X\).
In light of the above result, we revisit some variations of the examples of the previous section.
Consider a variation of Example 1 where we no longer specify \(X_n\) as a function of \(ω\) but simply assume that \[ X_n = \begin{cases} 1 & \text{with probability } \frac 1{n^2} \\ 0 & \text{with probability } 1 - \frac 1{n^2} \end{cases} \] Then for any \(ε \in (0,1)\), \(\PR(\ABS{X_n} > ε) = \PR(X_n = 1) = 1/n^2\). Therefore, \(\sum_{n \ge 1} \PR(\ABS{X_n} > ε) < ∞\); hence, by Lemma 3, \(X_n \xrightarrow{a.s.} 0\).
Consider a variation of Example 2 where we no longer specify \(X_n\) as a function of \(ω\) but simply assume that \[ X_n = \begin{cases} 1 & \text{with probability } \frac 1{n} \\ 0 & \text{with probability } 1 - \frac 1{n} \end{cases} \] and that the random variables \(\{X_n\}_{n \ge 1}\) are independent.
Then for any \(ε \in (0,1)\), \(\PR(\ABS{X_n} > ε) = \PR(X_n = 1) = 1/n\). Therefore, \(\sum_{n \ge 1} \PR(\ABS{X_n} > ε) = ∞\); hence, by the Second Borel-Cantelli lemma, \[ \PR\Bigl(\limsup_{n \to ∞} \{ \ABS{X_n} > ε \}\Bigr) = 1. \] That is, with probability one, \(\ABS{X_n} > ε\) for infinitely many \(n\). So, \(X_n\) does not converge almost surely!
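The contrast between the two variations can be glimpsed by simulating a single sample path of each and counting how often \(X_n = 1\) occurs. The finite horizon and the seed below are arbitrary choices, so this is only an illustration of typical behaviour, not a proof.

```python
import numpy as np

rng = np.random.default_rng(2)
horizon = 100_000                        # arbitrary finite horizon
n = np.arange(1, horizon + 1)
u = rng.uniform(0.0, 1.0, size=horizon)  # independent across n

x_sq = (u < 1.0 / n**2).astype(int)      # variation of Example 1: P(X_n = 1) = 1/n^2
x_lin = (u < 1.0 / n).astype(int)        # variation of Example 2: P(X_n = 1) = 1/n

# Typical behaviour: the 1/n^2 path has only a few ones, all at small n
# (Borel-Cantelli), while the 1/n path keeps producing ones at arbitrarily
# large n (Second Borel-Cantelli), so it cannot settle down to 0.
print("1/n^2 case: ones =", x_sq.sum(), ", last one at n =", n[x_sq == 1].max())
print("1/n   case: ones =", x_lin.sum(), ", last one at n =", n[x_lin == 1].max())
```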
In Example 4, we can directly apply Lemma 3 to argue that \(Y_n \xrightarrow{a.s.} 0\).
A variation of Lemma 3 is the following:
Lemma 4 Let \(\{X_n\}_{n \ge 1}\) be a sequence of random variables with finite expectations and let \(X\) be another random variable. If \[ \sum_{n=1}^{∞} \EXP[ \ABS{X_n - X} ] < ∞ \] then \[ X_n \xrightarrow{a.s.} X. \]
To simplify the notation, we assume that \(X = 0\).
Pick any \(ε > 0\) and define the sequence of events \[ A_n = \bigl\{ ω : \ABS{X_n(ω)} > ε \bigr\}, \quad n \in \naturalnumbers. \]
From Markov's inequality, we have \[\PR(A_n) = \PR\bigl( \ABS{X_n} > ε \bigr) \le \dfrac{\EXP[\ABS{X_n}]}{ε}. \] Therefore, \[ \sum_{n=1}^{∞} \PR(A_n) \le \frac{1}{ε} \sum_{n=1}^{∞} \EXP[\ABS{X_n}] < ∞ \] by the hypothesis of the result. Therefore, by the Borel-Cantelli lemma, we have \[ \PR\Bigl( \limsup_{n \to ∞} A_n \Bigr) = 0, \] i.e., with probability one, \(\ABS{X_n} > ε\) for only finitely many \(n\). Since \(ε > 0\) was arbitrary, it follows that \(X_n \xrightarrow{a.s.} 0\).
5 Some properties of convergence of sequence of random variables
We now state some properties without proof.
- The three notions of convergence that we have defined are related as follows: \[ [X_n \xrightarrow{a.s.} X] \implies [X_n \xrightarrow{p} X] \implies [X_n \xrightarrow{D} X] \]
Fix \(ε > 0\). Define \[ A_n = \{ ω : \exists m \ge n, \ABS{X_m(ω) - X(ω)} \ge ε \}. \] Then, \(\{A_n\}_{n \ge 1}\) is a decreasing sequence of events. If \(ω \in \bigcap_{n \ge 1} A_n\), then \(X_n(ω) \not\to X(ω)\). This implies \[\PR\Bigl( \bigcap_{n \ge 1} A_n \Bigr) \le \PR\Bigl( \Bigl\{ ω : \lim_{n \to ∞}X_n(ω) \neq X(ω) \Bigr\}\Bigr) = 0. \] By continuity of probability, \[ \lim_{n \to ∞} \PR(A_n) = \PR\Bigl( \lim_{n \to ∞} A_n \Bigr) = \PR\Bigl( \bigcap_{n \ge 1} A_n \Bigr) = 0. \] Since \(\PR(\ABS{X_n - X} \ge ε) \le \PR(A_n)\), it follows that \(X_n \xrightarrow{p} X\).
Let \(F_n\) and \(F\) denote the CDFs of \(X_n\) and \(X\), respectively. Fix \(ε > 0\), pick \(x\) such that \(F\) is continuous at \(x\), and consider \[\begin{align*} F_n(x) &= \PR(X_n \le x) = \PR(X_n \le x, X \le x + ε) + \PR(X_n \le x, X > x + ε) \\ &\le \PR(X \le x + ε) + \PR(X - X_n > ε) \\ &\le F(x + ε) + \PR(\ABS{X_n - X} > ε). \end{align*}\] Similarly, \[\begin{align*} F(x-ε) &= \PR(X \le x-ε) = \PR(X \le x-ε, X_n \le x ) + \PR(X \le x - ε, X_n > x ) \\ &\le \PR(X_n \le x) + \PR(X_n - X > ε) \\ &\le F_n(x) + \PR(\ABS{X_n - X} > ε). \end{align*}\]
Thus, \[ F(x-ε) - \PR(\ABS{X_n - X} > ε) \le F_n(x) \le F(x+ε) + \PR(\ABS{X_n - X} > ε). \] Taking \(n \to ∞\), we have \[ F(x-ε) \le \liminf_{n \to ∞} F_n(x) \le \limsup_{n \to ∞} F_n(x) \le F(x + ε). \] The result is true for all \(ε > 0\). Since \(F\) is continuous at \(x\), when we take \(ε \downarrow 0\), we have \[ F(x - ε) \uparrow F(x) \quad\text{and}\quad F(x + ε) \downarrow F(x), \] which implies that \(F_n(x) \to F(x)\).
There are partial converses. For any constant \(c\), \[ [X_n \xrightarrow{D} c] \implies [X_n \xrightarrow{p} c]. \] If \(\{X_n\}_{n \ge 1}\) is a strictly decreasing sequence, then \[ [X_n \xrightarrow{p} c] \implies [X_n \xrightarrow{a.s.} c]. \]
If \(X_n \xrightarrow{p} X\), then there exists a subsequence \(\{n_k : k \in \naturalnumbers \}\) such that \(\{X_{n_k}\}_{k \ge 1}\) converges almost surely to \(X\).
\(X_n \xrightarrow{p} X\) if and only if every subsequence \(\{n_k : k \in \naturalnumbers \}\) has a further subsequence \(\{n_{k_m} : m \in \naturalnumbers \}\) such that \(\{X_{n_{k_m}} \}_{m \ge 1}\) converges to \(X\) almost surely.
Skorokhod’s representation theorem. If \(X_n \xrightarrow{D} X\), then there exist random variables \(\{Y_n\}_{n \ge 1}\) and \(Y\), defined on a common probability space, such that each \(Y_n\) has the same distribution as \(X_n\), \(Y\) has the same distribution as \(X\), and \(Y_n \xrightarrow{a.s.} Y\).
Continuous mapping theorems (a simulation sketch follows this list). Let \(g \colon \reals \to \reals\) be a continuous function. Then,
- \(X_n \xrightarrow{a.s.} X\) implies \(g(X_n) \xrightarrow{a.s.} g(X)\).
- \(X_n \xrightarrow{p} X\) implies \(g(X_n) \xrightarrow{p} g(X)\).
- \(X_n \xrightarrow{D} X\) implies \(g(X_n) \xrightarrow{D} g(X)\).
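Here is a minimal simulation sketch of the second statement (a continuous map preserves convergence in probability). The choice \(g(x) = x^2\), the noise scale \(1/n\), the tolerance, and the sample size are arbitrary illustrations, assuming Python with NumPy.

```python
import numpy as np

rng = np.random.default_rng(3)
num_samples = 200_000
eps = 0.05                        # arbitrary tolerance

def g(t):                         # an arbitrary continuous map
    return t**2

x = rng.uniform(0.0, 1.0, size=num_samples)                # the limit X
for n in (10, 100, 1000):
    x_n = x + rng.normal(0.0, 1.0 / n, size=num_samples)   # X_n -> X in probability
    p_x = np.mean(np.abs(x_n - x) > eps)
    p_g = np.mean(np.abs(g(x_n) - g(x)) > eps)
    print(f"n={n:4d}: P(|X_n - X| > eps) ~ {p_x:.4f},  P(|g(X_n) - g(X)| > eps) ~ {p_g:.4f}")
```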
Convergence of sums.
- If \(X_n \xrightarrow{a.s.} X\) and \(Y_n \xrightarrow{a.s.} Y\), then \(X_n + Y_n \xrightarrow{a.s.} X + Y\).
- If \(X_n \xrightarrow{p} X\) and \(Y_n \xrightarrow{p} Y\), then \(X_n + Y_n \xrightarrow{p} X + Y\).
- It is not true in general that \(X_n + Y_n \xrightarrow{D} X + Y\) whenever \(X_n \xrightarrow{D} X\) and \(Y_n \xrightarrow{D} Y\). The result is true when \(X\) or \(Y\) is a constant.
If \(X_n \xrightarrow{p} X\) and \(Y_n \xrightarrow{p} Y\), then \(X_n + Y_n \xrightarrow{p} X + Y\) and \(X_n Y_n \xrightarrow{p} X Y\).
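A simulation sketch of the sum and product statements under convergence in probability (the particular distributions, noise scales, and tolerance below are arbitrary choices, assuming Python with NumPy):

```python
import numpy as np

rng = np.random.default_rng(4)
num_samples = 200_000
eps = 0.05                                     # arbitrary tolerance

x = rng.uniform(0.0, 1.0, size=num_samples)    # the limit X
y = rng.normal(1.0, 1.0, size=num_samples)     # the limit Y
for n in (10, 100, 1000):
    x_n = x + rng.normal(0.0, 1.0 / n, size=num_samples)   # X_n -> X in probability
    y_n = y + rng.normal(0.0, 1.0 / n, size=num_samples)   # Y_n -> Y in probability
    p_sum = np.mean(np.abs((x_n + y_n) - (x + y)) > eps)
    p_prod = np.mean(np.abs(x_n * y_n - x * y) > eps)
    print(f"n={n:4d}: sum error prob ~ {p_sum:.4f},  product error prob ~ {p_prod:.4f}")
```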
6 Strong law of large numbers
Theorem 1 Let \(\{X_n\}_{n \ge 1}\) be an i.i.d. sequence of random variables with mean \(μ\) and variance \(σ^2\). Let \[ \bar X_n = \frac 1n \sum_{i=1}^n X_i \] be the sample average. Then, \(\bar X_n \xrightarrow{a.s.} μ\), i.e., \[ \PR\Bigl( ω : \lim_{n \to ∞} \bar X_n(ω) = μ \Bigr) = 1. \]
We provide a proof under the assumption that the fourth moment exists.
We assume that \(μ = 0\) (this is just for notational simplicity) and that \(\EXP[X_n^4] = M_4 < ∞\) (this is a strong assumption).
Since the fourth moment exists, we can use a fourth-moment version of Chebyshev's inequality: \[ \PR( \ABS{\bar X_n} \ge ε ) \le \frac{ \EXP[ \bar X_n^4]}{ε^4}. \]
Then, by the multinomial theorem, we have \[ \EXP[\bar X_n^4] = \frac{1}{n^4} \EXP\biggl[ \sum_{i} X_i^4 + \binom{4}{1,3} \sum_{i \neq j} X_i X_j^3 + \binom{4}{2,2} \sum_{i < j} X_i^2 X_j^2 + \binom{4}{1,1,2} \sum_{i \neq j \neq k} X_i X_j X_k^2 + \sum_{i \neq j \neq k \neq \ell} X_i X_j X_k X_{\ell} \biggr]. \]
Since the \(\{X_i\}_{i \ge 1}\) are independent and zero mean, we have
- \(\EXP[X_i X_j^3] = \EXP[X_i] \EXP[X_j^3] = 0\).
- \(\EXP[X_i X_j X_k^2] = \EXP[X_i] \EXP[X_j] \EXP[X_k^2] = 0\) and \(\EXP[X_i X_j X_k X_{\ell}] = \EXP[X_i] \EXP[X_j] \EXP[X_k] \EXP[X_{\ell}] = 0\).
Therefore, \[ \EXP[\bar X_n^4] = \frac{1}{n^4} \bigl[ n \EXP[X_i^4] + 3 n(n-1) \EXP[ X_i^2 X_j^2] \bigr] \le \frac{M_4}{n^3} + \frac{3 σ^4}{n^2}, \] where we have used \(\EXP[X_i^4] = M_4\) and, by independence, \(\EXP[X_i^2 X_j^2] = σ^4\) for \(i \neq j\). Now, from the fourth-moment version of Chebyshev's inequality, we have \[ \PR( \ABS{\bar X_n} \ge ε ) \le \frac{1}{ε^4} \biggl[ \frac{M_4}{n^3} + \frac{3 σ^4}{n^2} \biggr]. \] This implies that \[ \sum_{n=1}^{∞} \PR(\ABS{\bar X_n} \ge ε) < ∞. \] Thus, from Lemma 3, we have that \(\bar X_n \xrightarrow{a.s.} 0\).
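As a sanity check of the theorem, here is a minimal Python/NumPy simulation sketch; the exponential distribution, the horizon, the number of paths, and the seed are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
num_paths, horizon = 3, 100_000       # arbitrary choices
mu = 2.0                              # mean of the (arbitrary) exponential distribution

x = rng.exponential(scale=mu, size=(num_paths, horizon))    # i.i.d. with mean mu
x_bar = np.cumsum(x, axis=1) / np.arange(1, horizon + 1)    # running sample averages

# Each individual path of the running average settles down near mu,
# which is what the strong law (almost sure convergence) asserts.
print(x_bar[:, [99, 9_999, 99_999]])  # \bar X_100, \bar X_10000, \bar X_100000 per path
```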
7 There’s more
There’s another type of convergence commonly used in engineering: convergence in the mean-squared sense. In the interest of time, we will not study this notion in class.