7  Convergence of random variables

Updated: October 16, 2025

Suppose we have an infinite sequence \(\{X_n\}_{n \ge 1}\) of random variables defined on a common probability space \((Ω, \ALPHABET F, \PR)\). Thus, for every \(ω \in Ω\), there is an infinite sequence \[ X_1(ω), X_2(ω), X_3(ω), \dots \] A sequence of random variables, also called a stochastic process, can be thought of as a generalization of random vectors. When does this sequence converge?

7.1 Different notions of convergence

  1. Recall that a sequence \(\{x_n\}_{n \ge 1}\) of real numbers converges to a limit \(x\) if for every \(ε > 0\), there exists a \(N\) such that for all \(n \ge N\), we have that \[ \ABS{ x_n - x } < ε. \]

  2. The simplest way to define convergence of a sequence of random variables is as follows: a sequence \(\{X_n\}_{n \ge 1}\) of random variables converges to a limit \(X\) surely if for every \(ω \in Ω\), the sequence \(\{X_n(ω)\}_{n \ge 1}\) of real numbers converges to \(X(ω)\).

  3. Sure convergence can be too strong, as is illustrated by the following example.

Example 7.1 Consider a probability space \((Ω, \ALPHABET F, \PR)\) where \(Ω = [0,1]\), \(\ALPHABET F = \mathscr{B}(0,1)\), and \(\PR\) is the uniform distribution on \(Ω\). Define \(X_n(ω) = \IND_{A_n}(ω)\) where \(A_n = [0, \frac 1n]\), i.e., \[ X_n(ω) = \begin{cases} 1, & ω \in [0, \frac{1}{n}] \\ 0, & ω \in (\frac{1}{n}, 1] \end{cases} \]

Show that \(\{X_n\}_{n \ge 1}\) does not converge surely to \(0\).

NoteSolution

For any \(ω \in (0,1]\), \(X_n(ω) = 1\) for all \(n \le 1/ω\), and \(X_n(ω) = 0\) afterwards. Thus, \(X_n(ω) \to 0\).

However, for \(ω = 0\), \(X_n(ω) = 1\) for all \(n\). Thus, \(X_n(ω) \to 1\).
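Here is a quick numerical check of this behaviour (a minimal sketch in NumPy; the function name `X` and the choice of sample points are ours):

```python
import numpy as np

def X(n, omega):
    """X_n(omega) = 1 if omega <= 1/n, else 0 (Example 7.1)."""
    return (omega <= 1.0 / n).astype(float)

rng = np.random.default_rng(0)
omegas = rng.uniform(0, 1, size=5)        # a few points omega drawn uniformly from [0, 1)
for n in [1, 10, 100, 1000, 10_000]:
    print(n, X(n, omegas))                # each column is eventually 0, since omega > 1/n for large n
```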

  1. From a practical point of view, we do not care that the limit is different at \(ω = 0\), because \(\{ω = 0\}\) is an event of zero probability. Stated differently, the set \[ \{ ω : X_n(ω) \to 0 \} \] has probability 1.

  2. Based on the previous discussion, we can relax the notion of sure convergence to almost sure convergence. A sequence \(\{X_n\}_{n \ge 1}\) of random variables converges to a random variable \(X\) almost surely if \[ \PR\left( \left\{ ω : \lim_{n \to ∞} X_n(ω) = X(ω) \right\} \right) = 1 \] Or, equivalently, for any \(ε > 0\), \[ \PR\left( \limsup_{n \to ∞} \{ ω : | X_n(ω) - X(ω) | > ε \} \right) = 0 \] We denote such convergence as \(X_n \xrightarrow{a.s.} X\).

  3. Another way to generalize the definition of convergence of a sequence of real numbers to that of random variables is called convergence in probability. A sequence \(\{X_n\}_{n \ge 1}\) of random variables converges to a random variable \(X\) in probability if, for every \(ε > 0\), \[ \lim_{n \to ∞} \PR( \ABS{ X_n - X } > ε ) = 0. \] Or, equivalently, for every \(ε > 0\) and \(δ > 0\), there exists an \(N\) such that for all \(n \ge N\), we have \[ \PR( \ABS{ X_n - X} > ε ) \le δ. \] We denote such convergence as \(X_n \xrightarrow{p} X\).

  4. Convergence in probability is a weaker notion than almost sure convergence, as is illustrated by the following example.

Example 7.2 Consider a probability space as in Example 7.1. Define \(X_n(ω) = \IND_{A_n}(ω)\), where \(A_n = \{ ω : n ω - \lfloor n ω \rfloor \in [0, \frac 1n] \}\).

Show that \(\{X_n\}_{n \ge 1}\) converges in probability but not almost surely.

Note that \(X_n(ω) \in \{0, 1\}\). Therefore, the sequence \(\{X_n(ω)\}_{n \ge 1}\) of real numbers converges if and only if there exists an \(N(ω)\) such that \(X_n(ω)\) takes a constant value for all \(n \ge N(ω)\). However, for any value of \(ω \neq 0\), the fractional part of \(nω\) keeps returning to the interval \([0, \frac 1n]\), so \(X_n(ω)\) takes both values \(0\) and \(1\) infinitely often. Therefore, \(X_n(ω)\) does not converge to a limit.

For \(ω = 0\), \(X_n(ω) = 1\) for all \(n\), and therefore \(X_n(ω) \to 1\). Thus, the sequence converges only on the set \(\{ω = 0\}\), which has probability \(0\). Hence, the sequence \(\{X_n(ω)\}_{n \ge 1}\) does not converge almost surely.

Now fix an \(ε > 0\) and consider the set \[ E_n = \{ ω : |X_n(ω) - 0| > ε \}. \] If \(ε \ge 1\), then \(E_n = \emptyset\), and \(\PR(E_n) = 0\) for all \(n\). So, we assume that \(ε < 1\). Then, \[ E_n = \{ ω : X_n(ω) = 1 \}. \] Note that \(X_n(ω) = 1\) for roughly a \(1/n\) fraction of points (one can make this argument formal by showing that the sequence \(\{nω\}_{n \ge 1}\) is equidistributed modulo \(1\) using Weyl’s criterion). So, \[ \PR(E_n) = \PR(X_n = 1) = \frac 1n. \] Thus, \[ \lim_{n \to ∞} \PR(E_n) = 0. \] Hence, \(\{X_n\}\) converges to \(0\) in probability.
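The following sketch (ours; it uses NumPy for the empirical probability and exact fractions for a fixed \(ω\)) illustrates both halves of the argument: \(\PR(X_n = 1)\) shrinks like \(1/n\), while for a fixed \(ω\) the sequence keeps returning to \(1\):

```python
import numpy as np
from fractions import Fraction

def X(n, omega):
    """X_n(omega) = 1 if the fractional part of n*omega lies in [0, 1/n]."""
    frac = n * omega - np.floor(n * omega)
    return (frac <= 1.0 / n).astype(float)

rng = np.random.default_rng(1)
omegas = rng.uniform(0, 1, size=200_000)
for n in [10, 100, 1000]:
    print(n, X(n, omegas).mean())        # empirical P(X_n = 1), close to 1/n

# For a fixed omega, X_n(omega) = 1 happens again and again (here omega = 1/3,
# computed with exact fractions to avoid rounding issues):
omega = Fraction(1, 3)
hits = [n for n in range(1, 50) if (n * omega) % 1 <= Fraction(1, n)]
print(hits)                              # 1, 3, 6, 9, ... -- X_n(omega) = 1 infinitely often
```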

7.2 Some examples

Example 7.3 Consider the probability space \((Ω, \ALPHABET F, \PR)\) as in Example 7.1, and consider different choices of \(X_n\) shown below. For each case, determine whether \(X_n\) converges almost surely or in probability.

  a. \(X_n(ω) = \IND_{A_n}(ω)\), where \(A_n = [0, \frac 1n]\).
  b. \(X_n(ω) = \IND_{A_n}(ω)\), where \(A_n = [0, \frac 1{n^2}]\).
  c. \(X_n(ω) = \textcolor{red}{n}\IND_{A_n}(ω)\), where \(A_n = [0, \frac 1n]\).
NoteSolution

We only prove part (a). The proof of the other two parts is similar.

For \(ω \neq 0\), the sequence \(\{X_n(ω)\}_{n \ge 1}\) is a finite sequence of ones followed by an infinite sequence of zeros. Then, \(\lim_{n \to ∞} X_n(ω) = 0\). Thus, \[ \PR\Bigl( \Bigl\{ ω : \lim_{n \to ∞} X_n(ω) = 0 \Bigr\} \Bigr) = \PR( ω \in (0,1]) = 1. \] Hence, \(X_n \xrightarrow{a.s.} 0\).

Fix an \(ε > 0\) and consider the set \[ E_n = \{ ω : |X_n(ω) - 0| > ε \}. \] If \(ε \ge 1\), then \(E_n = \emptyset\), and \(\PR(E_n) = 0\) for all \(n\). So, we assume that \(ε < 1\). Then, \[ E_n = \{ ω : X_n(ω) = 1 \}. \] Hence, \[ \PR(E_n) = \PR(X_n = 1) = \frac 1n \] which converges to \(0\) as \(n \to ∞\). Therefore, \(X_n \xrightarrow{p} 0\).
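A short Monte-Carlo sketch (ours; the helper names `Xa`, `Xb`, `Xc` are hypothetical) shows how quickly \(\PR(\ABS{X_n} > ε)\) decays in the three cases. Note that in part c the event has probability \(1/n\) even though the value taken on that event grows like \(n\):

```python
import numpy as np

rng = np.random.default_rng(0)
omegas = rng.uniform(0, 1, size=500_000)

def Xa(n, w): return (w <= 1.0 / n).astype(float)        # part a
def Xb(n, w): return (w <= 1.0 / n**2).astype(float)     # part b
def Xc(n, w): return n * (w <= 1.0 / n).astype(float)    # part c

eps = 0.5
for n in [10, 100, 1000]:
    print(n,
          np.mean(np.abs(Xa(n, omegas)) > eps),   # roughly 1/n
          np.mean(np.abs(Xb(n, omegas)) > eps),   # roughly 1/n^2 (often 0 at this sample size)
          np.mean(np.abs(Xc(n, omegas)) > eps))   # roughly 1/n, even though X_n takes the large value n
```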

Example 7.4 Consider an i.i.d. sequence \(\{X_n\}_{n \ge 1}\), where \(X_n \sim \text{Uniform}(0,1)\). Define \[ Y_n = \min\{X_1, \dots, X_n\}. \] Determine whether \(Y_n\) converges almost surely or in probability.

NoteSolution

We first check convergence in probability and then almost surely.

Fix an \(ε > 0\) and consider the set \[ E_n = \{ ω : |Y_n(ω) - 0| > ε \}. \] As in Example 7.3, if \(ε \ge 1\), then \(\PR(E_n) = 0\). So, we assume that \(ε < 1\). Then, \[\begin{align*} \PR(E_n) &= \PR(Y_n > ε) = \PR\bigl( \min\{ X_1, \dots, X_n \} > ε \bigr) \\ &= \PR( X_1 > ε, X_2 > ε, \dots, X_n > ε ) \\ &= (1-ε)^n \end{align*}\] which goes to zero as \(n \to ∞\). Thus, \(Y_n \xrightarrow{p} 0\).

We now consider almost sure convergence. Note that for a fixed \(ω\), the sequence \(\{Y_n(ω)\}_{n \ge 1}\) is a decreasing sequence. Hence, it must have a limit. Denote that limit by \(Y(ω)\), i.e., \[ Y(ω) = \lim_{n \to ∞} Y_n(ω). \]

Since \(\{Y_n\}_{n \ge 1}\) is a decreasing sequence, we have that \(Y(ω) \le Y_n(ω)\). Hence, for any \(ε > 0\), \[ \PR(Y > ε) \le \PR(Y_n > ε) = (1 - ε)^n \] where the last equality follows from the calculations done above.

The above inequality holds for every \(n\), so we have \[ \PR(Y > ε) \le \lim_{n \to ∞} (1-ε)^n = 0. \] Recall that \(ε > 0\) was arbitrary. Therefore, we have shown that for every \(ε > 0\), \[ \PR\Bigl( \lim_{n\to ∞} Y_n > ε \Bigr) = 0. \] Since \(Y_n \ge 0\) implies \(Y \ge 0\), the only possibility is that \[ \PR\Bigl( \lim_{n\to ∞} Y_n = 0 \Bigr) = 1. \] Hence \(Y_n \xrightarrow{a.s.} 0\).
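A short simulation of the running minimum (a sketch, ours) shows each sample path of \(Y_n\) decreasing towards \(0\):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(1000, 5))     # 5 independent sample paths, 1000 steps each
Y = np.minimum.accumulate(X, axis=0)      # Y_n = min(X_1, ..., X_n) along each path

for n in [1, 10, 100, 1000]:
    print(n, Y[n - 1])                    # each running minimum decreases towards 0
```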

7.3 Almost sure convergence from convergence in probability

Example 7.4 shows that verifying almost sure convergence can be a bit tricky. In this section, we show that sometimes it is possible to infer almost sure convergence from convergence in probability.

WarningLim inf and lim sup of sets

Explanation adapted from math.stackexchange [1] and [2]

Limits of sets are easy to describe when we have a weakly increasing or weakly decreasing sequence of sets. In particular, if \(\{C_n\}_{n \ge 1}\) is a weakly increasing sequence of sets, then \[ \lim_{n \to ∞} C_n = \bigcup_{n=1}^{∞} C_n. \] Thus, the limit is the union of all sets. Moreover, if \(\{D_n\}_{n \ge 1}\) is a weakly decreasing sequence of sets, then \[ \lim_{n \to ∞} D_n = \bigcap_{n=1}^{∞} D_n. \] Thus, the limit is the intersection of all sets.

What happens when a sequence \(\{A_n\}_{n \ge 1}\) is neither increasing nor decreasing? We can sandwich it between an increasing sequence \(\{C_n\}_{n \ge 1}\) and a decreasing sequence \(\{D_n\}\) as follows:

\[ \begin{array}{lcl} C_1 = A_1 \cap A_2 \cap A_3 \cap \dotsb &\quad\subseteq\qquad A_1 \qquad &\subseteq \qquad D_1 = A_1 \cup A_2 \cup A_3 \cup \dotsb \\ C_2 = \phantom{A_1 \cap{}} A_2 \cap A_3 \cap \dotsb &\quad\subseteq\qquad A_2 \qquad &\subseteq \qquad D_2 = \phantom{A_1 \cup{}} A_2 \cup A_3 \cup \dotsb \\ C_3 = \phantom{A_1 \cap A_2 \cap{}} A_3 \cap \dotsb &\quad\subseteq\qquad A_3 \qquad &\subseteq \qquad D_3 = \phantom{A_1 \cup A_2 \cup{}} A_3 \cup \dotsb \\ & \qquad\vdots & \end{array} \]

The limit of \(\{C_n\}_{n \ge 1}\) is called the lim inf of \(\{A_n\}_{n \ge 1}\), i.e., \[ \liminf_{n \to ∞} A_n = \lim_{n \to ∞} C_n = \bigcup_{n=1}^∞ C_n = \bigcup_{n=1}^{∞} \bigcap_{i=n}^{∞} A_i. \] Similarly, the limit of \(\{D_n\}_{n \ge 1}\) is called the lim sup of \(\{A_n\}_{n \ge 1}\), i.e., \[ \limsup_{n \to ∞} A_n = \lim_{n \to ∞} D_n = \bigcap_{n=1}^∞ D_n = \bigcap_{n=1}^{∞} \bigcup_{i=n}^{∞} A_i. \] When the two limits are equal, we say that the sequence \(\{A_n\}_{n \ge 1}\) has a limit.


Another way to think about these definitions is as follows. \[ ω \in \limsup_{n \to ∞} A_n \iff \limsup_{n \to ∞} \IND_{A_n}(ω) = 1 \] which holds if and only if the binary sequence \(\{\IND_{A_n}(ω)\}\) has infinitely many ones, i.e., \(ω\) is a member of infinitely many \(A_n\).

Similarly, \[ ω \in \liminf_{n \to ∞} A_n \iff \liminf_{n \to ∞} \IND_{A_n}(ω) = 1 \] which holds if and only if the binary sequence \(\{\IND_{A_n}(ω)\}\) eventually becomes \(1\) forever, i.e., \(ω\) is eventually a member of \(A_n\) forever.
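To connect with Example 7.1: for the sets \(A_n = [0, \frac 1n]\), the sequence is decreasing and both limits coincide, \[ \liminf_{n \to ∞} A_n = \bigcup_{n=1}^{∞} \bigcap_{i=n}^{∞} \Bigl[0, \tfrac 1i\Bigr] = \bigcup_{n=1}^{∞} \{0\} = \{0\}, \qquad \limsup_{n \to ∞} A_n = \bigcap_{n=1}^{∞} \bigcup_{i=n}^{∞} \Bigl[0, \tfrac 1i\Bigr] = \bigcap_{n=1}^{∞} \Bigl[0, \tfrac 1n\Bigr] = \{0\}, \] so \(\lim_{n \to ∞} A_n = \{0\}\): the only point that lies in infinitely many (in fact, all) of the \(A_n\) is \(ω = 0\).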

For example, suppose we toss a coin an infinite number of times. Let \((Ω, \ALPHABET F, \PR)\) denote the corresponding probability space, and let \(A_n\) denote the event that the \(n\)-th toss is a heads. Then,

  • \(\limsup_{n \to ∞} A_n\) is the event that there are infinitely many heads.
  • \(\liminf_{n \to ∞} A_n\) is the event that all but finitely many of the tosses are heads, i.e., there are only finitely many tails.

We now state two fundamental results. The proofs are not difficult but are omitted due to time.

Lemma 7.1 (Borel-Cantelli Lemma) Let \(\{A_n\}_{n \ge 1}\) be a sequence of events defined on a common probability space \((Ω, \ALPHABET F, \PR)\). If the sum of the probabilities of the events is finite, i.e., \[ \sum_{n=1}^∞ \PR(A_n) < ∞, \] then the probability that infinitely many of them occur is zero, i.e., \[ \PR\Bigl(\limsup_{n \to ∞} A_n \Bigr) = 0. \]

There is a partial converse of the Borel-Cantelli lemma.

Lemma 7.2 (Second Borel-Cantelli Lemma) Let \(\{A_n\}_{n \ge 1}\) be a sequence of independent events defined on a common probability space \((Ω, \ALPHABET F, \PR)\). If the sum of the probabilities of the events is infinite, i.e., \[ \sum_{n=1}^∞ \PR(A_n) = ∞, \] then the probability that infinitely many of them occur is one, i.e., \[ \PR\Bigl(\limsup_{n \to ∞} A_n \Bigr) = 1. \]

An immediate implication of the Borel-Cantelli lemma is the following:

Lemma 7.3 Suppose \(X_n \xrightarrow{p} X\) and, for every \(ε > 0\), \[ \sum_{n=1}^{∞} \PR(\ABS{X_n - X} > ε) < ∞. \] Then \(X_n \xrightarrow{a.s.} X\).

In light of the above result, we revisit some variations of the examples of the previous section.

  • In Example 7.3, part a, we have \[ \PR(E_n) = \frac 1n. \] Since \(\sum_{n \ge 1} \PR(E_n) = ∞\), we cannot use Lemma 7.3 to infer that \(X_n \xrightarrow{a.s.} 0\). This is not a contradiction because Lemma 7.3 gives a sufficient condition, not a necessary one.

  • Consider a variation of the above problem, where we no longer specify \(X_n\) as a function of \(ω\) but simply assume that \[ X_n = \begin{cases} 1 & \text{with probability } \frac 1{n} \\ 0 & \text{with probability } 1 - \frac 1{n} \end{cases} \] and \(\{X_n\}_{n \ge 1}\) are independent.

    Then for any \(ε \in (0,1)\), \(\PR(\ABS{X_n} > ε) = \PR(X_n > ε) = 1/n\). Therefore, \(\sum_{n \ge 1} \PR(\ABS{X_n} > ε) = ∞\); hence by the Second Borel-Cantelli lemma, \[ \PR(\limsup_{n \to ∞} \{ \ABS{X_n} > ε \}) = 1. \] So, \(X_n\) does not converge almost surely!

  • In Example 7.3, part b, we have \[ \PR(E_n) = \frac 1{n^2}. \] Since \(\sum_{n \ge 1} \PR(E_n) < ∞\), we can use Lemma 7.3 to infer that \(X_n \xrightarrow{a.s.} 0\).

  • Consider a variation of the above problem, where we no longer specify \(X_n\) as a function of \(ω\) but simply assume that \[ X_n = \begin{cases} 1 & \text{with probability } \frac 1{n^2} \\ 0 & \text{with probability } 1 - \frac 1{n^2} \end{cases} \] Then for any \(ε \in (0,1)\), \(\PR(\ABS{X_n} > ε) = \PR(X_n > ε) = 1/n^2\). Therefore, \(\sum_{n \ge 1} \PR(\ABS{X_n} > ε) < ∞\); hence by Lemma 7.3, \(X_n \xrightarrow{a.s.} 0\). (The contrast between this case and the \(1/n\) case above is illustrated numerically after this list.)

  • In Example 7.4, we have argued that \(\PR(Y_n > ε) = (1-ε)^n\). Therefore, \(\sum_{n \ge 1} \PR(Y_n > ε) = \frac{1-ε}{ε} < ∞\). Hence, by Lemma 7.3, \(Y_n \xrightarrow{a.s.} 0\).
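The contrast between the \(1/n\) and \(1/n^2\) cases can be seen numerically. This is only a suggestive sketch (ours): a finite horizon cannot show an infinite tail, but it indicates where the last \(1\) typically occurs in each sequence.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 200_000
n = np.arange(1, N + 1)

# Independent indicators with P(X_n = 1) = 1/n and P(Z_n = 1) = 1/n^2.
X = rng.random(N) < 1.0 / n
Z = rng.random(N) < 1.0 / n ** 2

print("last 1 in the 1/n   sequence:", n[X].max() if X.any() else "none")
print("last 1 in the 1/n^2 sequence:", n[Z].max() if Z.any() else "none")
# Typically the 1/n sequence still has 1s at large indices (ones keep occurring,
# consistent with the second Borel-Cantelli lemma), while the 1/n^2 sequence
# usually has no 1s beyond a small index (Borel-Cantelli / Lemma 7.3).
```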

A variation of Lemma 7.3 is the following:

Lemma 7.4 Let \(\{X_n\}_{n \ge 1}\) be a sequence of random variables with finite expectations and let \(X\) be another random variable. If \[ \sum_{n=1}^{∞} \EXP[ \ABS{X_n - X} ] < ∞ \] then \[ X_n \xrightarrow{a.s.} X. \]

NoteProof

To simplify the notation, we assume that \(X = 0\).

Pick any \(ε > 0\) and define the sequence of events \[ A_n = \bigl\{ ω : \ABS{X_n(ω)} > ε \bigr\}, \quad n \in \naturalnumbers. \]

From Markov’s inequality, we have \[\PR(A_n) = \PR\bigl( \ABS{X_n} > ε \bigr) \le \dfrac{\EXP[\ABS{X_n}]}{ε}. \] Therefore, \[ \sum_{n=1}^{∞} \PR(A_n) \le \frac{1}{ε} \sum_{n=1}^{∞} \EXP[\ABS{X_n}] < ∞ \] by the hypothesis of the result. Therefore, by the Borel-Cantelli lemma, we have \[ \PR\Bigl( \limsup_{n \to ∞} A_n \Bigr) = 0. \] Since \(ε > 0\) was arbitrary, this means that \(X_n \xrightarrow{a.s.} 0\).

7.4 Some properties of convergence of sequence of random variables

We now state some properties without proof.

  1. The three notions of convergence that we have defined are related as follows: \[ [X_n \xrightarrow{a.s.} X] \implies [X_n \xrightarrow{p} X] \implies [X_n \xrightarrow{D} X] \]
NoteProof that almost sure convergence implies convergence in probability

Fix \(ε > 0\). Define \[ A_n = \{ ω : \exists m \ge n \text{ such that } \ABS{X_m(ω) - X(ω)} \ge ε \}. \] Then, \(\{A_n\}_{n \ge 1}\) is a decreasing sequence of events. If \(ω \in \bigcap_{n \ge 1} A_n\), then \(X_n(ω) \not\to X(ω)\). This implies \[\PR\Bigl( \bigcap_{n \ge 1} A_n \Bigr) \le \PR\Bigl( \Bigl\{ ω : \lim_{n \to ∞}X_n(ω) \neq X(ω) \Bigr\}\Bigr) = 0. \] By continuity of probability, \[ \lim_{n \to ∞} \PR(A_n) = \PR\Bigl( \lim_{n \to ∞} A_n \Bigr) = \PR\Bigl( \bigcap_{n \ge 1} A_n \Bigr) = 0. \] Since \(\PR(\ABS{X_n - X} \ge ε) \le \PR(A_n)\), it follows that \(\PR(\ABS{X_n - X} \ge ε) \to 0\), i.e., \(X_n \xrightarrow{p} X\).

NoteProof that convergence in probability implies convergence in distribution

Let \(F_n\) and \(F\) denote the CDFs of \(X_n\) and \(X\), respectively. Fix \(ε > 0\), pick \(x\) such that \(F\) is continuous at \(x\), and consider \[\begin{align*} F_n(x) &= \PR(X_n \le x) = \PR(X_n \le x, X \le x + ε) + \PR(X_n \le x, X > x + ε) \\ &\le \PR(X \le x + ε) + \PR(X - X_n > ε) \\ &\le F(x + ε) + \PR(\ABS{X_n - X} > ε). \end{align*}\] Similarly, \[\begin{align*} F(x-ε) &= \PR(X \le x-ε) = \PR(X \le x-ε, X_n \le x ) + \PR(X \le x - ε, X_n > x ) \\ &\le \PR(X_n \le x) + \PR(X_n - X > ε) \\ &\le F_n(x) + \PR(\ABS{X_n - X} > ε). \end{align*}\]

Thus, \[ F(x-ε) - \PR(\ABS{X_n - X} > ε) \le F_n(x) \le F(x+ε) + \PR(\ABS{X_n - X} > ε). \] Taking \(n \to ∞\), we have \[ F(x-ε) \le \liminf_{n \to ∞} F_n(x) \le \limsup_{n \to ∞} F_n(x) \le F(x + ε). \] The result is true for all \(ε > 0\). Since \(F\) is continuous at \(x\), when we take \(ε \downarrow 0\), we have \[ F(x - ε) \uparrow F(x) \quad\text{and}\quad F(x + ε) \downarrow F(x), \] which implies that \(F_n(x) \to F(x)\).
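A small numerical illustration of this implication (ours; we take \(X_n = X + Z_n/n\) with standard normal noise \(Z_n\), so that \(X_n \xrightarrow{p} X\)) shows the empirical \(F_n(x)\) approaching \(F(x)\) at a continuity point:

```python
import numpy as np

rng = np.random.default_rng(3)
m = 200_000
X = rng.standard_normal(m)                 # samples of X ~ N(0, 1)
x = 0.5                                    # a continuity point of F

print("F(x)   ~", np.mean(X <= x))
for n in [1, 10, 100]:
    Xn = X + rng.standard_normal(m) / n    # X_n = X + Z_n / n, so |X_n - X| -> 0 in probability
    print("F_%d(x) ~" % n, np.mean(Xn <= x))
```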

  1. There are partial converses. For any constant \(c\), \[ [X_n \xrightarrow{D} c] \implies [X_n \xrightarrow{p} c]. \] If \(\{X_n\}_{n \ge 1}\) is a strictly decreasing sequence, then \[ [X_n \xrightarrow{p} c] \implies [X_n \xrightarrow{a.s.} c]. \]

  2. If \(X_n \xrightarrow{p} X\), then there exists a subsequence \(\{n_k : k \in \naturalnumbers \}\) such that \(\{X_{n_k}\}_{k \ge 1}\) converges almost surely to \(X\).

  3. \(X_n \xrightarrow{p} X\) if and only if every subsequence \(\{n_k : k \in \naturalnumbers \}\) has a further subsequence \(\{n_{k_m} : m \in \naturalnumbers \}\) such that \(\{X_{n_{k_m}} \}_{m \ge 1}\) converges to \(X\) almost surely.

  4. Skorokhod’s representation theorem. If \(X_n \xrightarrow{D} X\), then there exists a sequence \(\{Y_n\}_{n \ge 1}\) (possibly defined on a different probability space) such that each \(Y_n\) has the same distribution as \(X_n\), \(Y_n \xrightarrow{a.s.} Y\), and \(Y\) has the same distribution as \(X\).

  5. Continuous mapping theorems. Let \(g \colon \reals \to \reals\) be a continuous function. Then,

    • \(X_n \xrightarrow{a.s.} X\) implies \(g(X_n) \xrightarrow{a.s.} g(X)\).
    • \(X_n \xrightarrow{p} X\) implies \(g(X_n) \xrightarrow{p} g(X)\).
    • \(X_n \xrightarrow{D} X\) implies \(g(X_n) \xrightarrow{D} g(X)\).
  6. Convergence of sums.

    • If \(X_n \xrightarrow{a.s.} X\) and \(Y_n \xrightarrow{a.s.} Y\), then \(X_n + Y_n \xrightarrow{a.s.} X + Y\).
    • If \(X_n \xrightarrow{p} X\) and \(Y_n \xrightarrow{p} Y\), then \(X_n + Y_n \xrightarrow{p} X + Y\).
    • It is not true in general that \(X_n + Y_n \xrightarrow{D} X + Y\) whenever \(X_n \xrightarrow{D} X\) and \(Y_n \xrightarrow{D} Y\). The result is true when \(X\) or \(Y\) is a constant (Slutsky’s theorem).

If \(X_n \xrightarrow{p} X\) and \(Y_n \xrightarrow{p} Y\), then \(X_n + Y_n \xrightarrow{p} X + Y\) and \(X_n Y_n \xrightarrow{p} X Y\). These properties are illustrated numerically below.
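The following sanity check is a sketch (ours), with \(X_n = 2 + Z_n/n\) and \(Y_n = 3 + W_n/n\), so that \(X_n \xrightarrow{p} 2\) and \(Y_n \xrightarrow{p} 3\). It estimates the probability that the sum, the product, and a continuous mapping deviate from their limits:

```python
import numpy as np

rng = np.random.default_rng(4)
m = 100_000
eps = 0.1

for n in [1, 10, 100, 1000]:
    Xn = 2 + rng.standard_normal(m) / n          # X_n ->p 2
    Yn = 3 + rng.standard_normal(m) / n          # Y_n ->p 3
    print(n,
          np.mean(np.abs(Xn + Yn - 5) > eps),    # sum ->p 5
          np.mean(np.abs(Xn * Yn - 6) > eps),    # product ->p 6
          np.mean(np.abs(Xn**2 - 4) > eps))      # continuous mapping g(x) = x^2: g(X_n) ->p 4
```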

7.5 Strong law of large numbers

Theorem 7.1 Let \(\{X_n\}_{n \ge 1}\) be an i.i.d. sequence of random variables with mean \(μ\) and variance \(σ^2\). Let \[ \bar X_n = \frac 1n \sum_{i=1}^n X_i \] be the sample average. Then, \(\bar X_n \xrightarrow{a.s.} μ\), i.e., \[ \PR\Bigl( \Bigl\{ ω : \lim_{n \to ∞} \bar X_n(ω) = μ \Bigr\} \Bigr) = 1. \]

We provide a proof under the assumption that the fourth moment exists.

NoteProof

We assume that \(μ = 0\) (this is just for notational simplicity) and \(\EXP[X_1^4] = M_4 < ∞\) (this is a strong assumption).

Since the fourth moment exists, we can use a fourth-moment version of Chebyshev’s inequality: \[ \PR( \ABS{\bar X_n} \ge ε) \le \frac{ \EXP[ \bar X_n^4]}{ε^4}. \]

Then, expanding the fourth power using the multinomial theorem (all sums below are over tuples of distinct indices), we have \[ \EXP[\bar X_n^4] = \frac{1}{n^4} \EXP\biggl[ \sum_{i} X_i^4 + 4 \sum_{i \neq j} X_i X_j^3 + 3 \sum_{i \neq j} X_i^2 X_j^2 + 6 \sum_{i \neq j \neq k} X_i X_j X_k^2 + \sum_{i \neq j \neq k \neq \ell} X_i X_j X_k X_{\ell} \biggr]. \]

Since the \(\{X_i\}_{i \ge 1}\) are independent and zero mean, we have

  • \(\EXP[X_i X_j^3] = \EXP[X_i] \EXP[X_j^3] = 0\).
  • \(\EXP[X_i X_j X_k^2] = \EXP[X_i] \EXP[X_j] \EXP[X_k^2] = 0\) and \(\EXP[X_i X_j X_k X_\ell] = \EXP[X_i] \EXP[X_j] \EXP[X_k] \EXP[X_\ell] = 0\).

Therefore, \[ \EXP[\bar X_n^4] = \frac{1}{n^4} \bigl[ n \EXP[X_1^4] + 3 n(n-1) \EXP[ X_1^2 X_2^2] \bigr] \le \frac{M_4}{n^3} + \frac{3 σ^4}{n^2} \] where \(M_4\) is the fourth moment. Now, from the fourth-moment version of Chebyshev’s inequality, we have \[ \PR( \ABS{\bar X_n} \ge ε) \le \frac{1}{ε^4} \biggl[ \frac{M_4}{n^3} + \frac{3 σ^4}{n^2} \biggr]. \] This implies that \[ \sum_{n=1}^{∞} \PR(\ABS{\bar X_n} \ge ε) < ∞. \] Thus, from Lemma 7.3, we have that \(\bar X_n \xrightarrow{a.s.} 0\) (and hence \(\bar X_n \xrightarrow{a.s.} μ\) in the general case).
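The strong law is easy to visualize numerically (a minimal sketch, ours, using i.i.d. Uniform(0,1) samples, for which \(μ = 0.5\)):

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.uniform(0, 1, size=(100_000, 3))       # 3 independent sample paths of i.i.d. Uniform(0,1)
n = np.arange(1, X.shape[0] + 1).reshape(-1, 1)
Xbar = np.cumsum(X, axis=0) / n                # running sample averages along each path

for k in [10, 100, 1000, 10_000, 100_000]:
    print(k, Xbar[k - 1])                      # each path approaches mu = 0.5
```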

7.6 There’s more

There’s another type of convergence commonly used in engineering: convergence in the mean-square sense. In the interest of time, we will not study this notion in class.