Convergence of random variables

Updated

November 11, 2024

Suppose we have an infinite sequence of random variables $\{X_n\}_{n \ge 1}$ defined on a common probability space $(\Omega, \mathcal{F}, P)$. A sequence of random variables, also called a stochastic process, can be thought of as a generalization of random vectors. When does this sequence converge?

1 Different notions of convergence

Recall that a sequence $\{x_n\}_{n \ge 1}$ of real numbers converges to a limit $x$ if for every $\varepsilon > 0$, there exists an $N$ such that for all $n \ge N$, we have $|x_n - x| < \varepsilon$.

Convergence of a sequence of random variables is more subtle. We have already seen one form of convergence: convergence in distribution.

There are two ways in which this notion can be extended to sequences of random variables:

  • Convergence in probability. A sequence $\{X_n\}_{n \ge 1}$ of random variables converges to a random variable $X$ in probability if for every $\varepsilon > 0$, $\lim_{n\to\infty} P(|X_n - X| > \varepsilon) = 0$. Or, equivalently, for any $\varepsilon > 0$ and $\delta > 0$, there exists an $N$ such that for all $n \ge N$, we have $P(|X_n - X| > \varepsilon) \le \delta$. We denote such convergence as $X_n \xrightarrow{p} X$.

  • Almost sure convergence. A sequence $\{X_n\}_{n \ge 1}$ of random variables converges to a random variable $X$ almost surely if $$P\big(\{\omega : \lim_{n\to\infty} X_n(\omega) = X(\omega)\}\big) = 1.$$ Or, equivalently, for any $\varepsilon > 0$, $$P\Big(\limsup_{n\to\infty} \{\omega : |X_n(\omega) - X(\omega)| > \varepsilon\}\Big) = 0.$$ We denote such convergence as $X_n \xrightarrow{a.s.} X$.

Explanation adapted from math.stackexchange [1] and [2]

Limits of sets are easy to describe when we have a weakly increasing or weakly decreasing sequence of sets. In particular, if $\{C_n\}_{n\ge1}$ is a weakly increasing sequence of sets, then $\lim_{n\to\infty} C_n = \bigcup_{n=1}^{\infty} C_n$. Thus, the limit is the union of all the sets. Moreover, if $\{D_n\}_{n\ge1}$ is a weakly decreasing sequence of sets, then $\lim_{n\to\infty} D_n = \bigcap_{n=1}^{\infty} D_n$. Thus, the limit is the intersection of all the sets.

What happens when a sequence $\{A_n\}_{n\ge1}$ is neither increasing nor decreasing? We can sandwich it between an increasing sequence $\{C_n\}_{n\ge1}$ and a decreasing sequence $\{D_n\}_{n\ge1}$ as follows:

$$\begin{aligned}
C_1 &= A_1 \cap A_2 \cap A_3 \cap \cdots \subseteq A_1 \subseteq A_1 \cup A_2 \cup A_3 \cup \cdots = D_1 \\
C_2 &= A_2 \cap A_3 \cap A_4 \cap \cdots \subseteq A_2 \subseteq A_2 \cup A_3 \cup A_4 \cup \cdots = D_2 \\
C_3 &= A_3 \cap A_4 \cap A_5 \cap \cdots \subseteq A_3 \subseteq A_3 \cup A_4 \cup A_5 \cup \cdots = D_3 \\
&\;\;\vdots
\end{aligned}$$

The limit of $\{C_n\}_{n\ge1}$ is called the $\liminf$ of $\{A_n\}_{n\ge1}$, i.e., $$\liminf_{n\to\infty} A_n = \lim_{n\to\infty} C_n = \bigcup_{n=1}^{\infty} C_n = \bigcup_{n=1}^{\infty} \bigcap_{i=n}^{\infty} A_i.$$ Similarly, the limit of $\{D_n\}_{n\ge1}$ is called the $\limsup$ of $\{A_n\}_{n\ge1}$, i.e., $$\limsup_{n\to\infty} A_n = \lim_{n\to\infty} D_n = \bigcap_{n=1}^{\infty} D_n = \bigcap_{n=1}^{\infty} \bigcup_{i=n}^{\infty} A_i.$$ When the two limits are equal, we say that the sequence $\{A_n\}_{n\ge1}$ has a limit.


Another way to think about these definitions is as follows. $$\omega \in \limsup_{n\to\infty} A_n \iff \limsup_{n\to\infty} \mathbb{1}_{A_n}(\omega) = 1,$$ which holds if and only if the binary sequence $\{\mathbb{1}_{A_n}(\omega)\}$ has infinitely many ones, i.e., $\omega$ is a member of infinitely many $A_n$.

Similarly, $$\omega \in \liminf_{n\to\infty} A_n \iff \liminf_{n\to\infty} \mathbb{1}_{A_n}(\omega) = 1,$$ which holds if and only if the binary sequence $\{\mathbb{1}_{A_n}(\omega)\}$ eventually becomes $1$ forever, i.e., $\omega$ is eventually a member of every $A_n$.
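As a concrete illustration of this characterization (a sketch that is not part of the original notes; the family of sets is an arbitrary choice), take $A_n = [0, 1/n)$ on $\Omega = [0,1]$: $\omega = 0$ belongs to every $A_n$, while any $\omega > 0$ belongs to only finitely many, so $\liminf_{n\to\infty} A_n = \limsup_{n\to\infty} A_n = \{0\}$.

```python
# Sketch: indicator sequences 1_{A_n}(omega) for the hypothetical family A_n = [0, 1/n).
# omega is in limsup A_n iff the sequence has infinitely many ones;
# omega is in liminf A_n iff the sequence is eventually all ones.

def indicator(omega, n):
    """Return 1_{A_n}(omega) for A_n = [0, 1/n)."""
    return 1 if 0 <= omega < 1 / n else 0

for omega in [0.0, 0.3, 0.7]:
    seq = [indicator(omega, n) for n in range(1, 21)]
    print(f"omega = {omega}: {seq}")
# omega = 0.0 gives all ones (it is in every A_n); any omega > 0 gives only
# finitely many ones, so it belongs to neither liminf nor limsup of {A_n}.
```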

2 Examples of convergence in probability

Example 1 Consider a probability space $(\Omega, \mathcal{F}, P)$ where $\Omega = [0,1]$, $\mathcal{F} = \mathcal{B}([0,1])$, and $P$ is the uniform distribution on $\Omega$. Define $$X_n(\omega) = \begin{cases} 1, & \omega \in [0, \tfrac{1}{n^2}) \\ 0, & \omega \in [\tfrac{1}{n^2}, 1] \end{cases}$$ Show that $X_n \xrightarrow{p} 0$.

Solution

Note that $$X_n = \begin{cases} 1 & \text{with probability } \tfrac{1}{n^2} \\ 0 & \text{with probability } 1 - \tfrac{1}{n^2} \end{cases}$$ Pick any $\varepsilon > 0$. If $\varepsilon \ge 1$, then $P(X_n > \varepsilon) = 0$. So, we assume that $\varepsilon \in (0,1)$. Then, $$P(X_n > \varepsilon) = P(X_n = 1) = \frac{1}{n^2},$$ which converges to $0$ as $n \to \infty$. Therefore, $X_n \xrightarrow{p} 0$.
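A quick Monte Carlo check of this calculation (a hedged sketch, not part of the notes; the choice of $\varepsilon$ and the number of trials are arbitrary):

```python
# Sketch: estimate P(X_n > eps) for X_n(omega) = 1{omega in [0, 1/n^2)}, omega ~ Uniform[0, 1].
import random

def estimate_exceedance(n, eps=0.5, trials=100_000):
    """Monte Carlo estimate of P(X_n > eps) for the X_n of Example 1."""
    count = 0
    for _ in range(trials):
        omega = random.random()
        x_n = 1.0 if omega < 1 / n**2 else 0.0
        count += (x_n > eps)
    return count / trials

for n in [1, 2, 5, 10, 50]:
    print(f"n={n:>2}  estimate={estimate_exceedance(n):.5f}  theory={1 / n**2:.5f}")
```

The estimates should track $1/n^2$ up to Monte Carlo noise, which is the statement $X_n \xrightarrow{p} 0$ made quantitative.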

Example 2 Consider the same setup as Example 1, but with the random variable $X_n$ defined as $$X_n(\omega) = \begin{cases} 1, & \omega \in [0, \tfrac{1}{n}) \\ 0, & \omega \in [\tfrac{1}{n}, 1] \end{cases}$$ Show that $X_n \xrightarrow{p} 0$.

Solution

This can be solved in exactly the same manner as Example 1: for any $\varepsilon \in (0,1)$, $P(X_n > \varepsilon) = P(X_n = 1) = \tfrac{1}{n}$, which converges to $0$ as $n \to \infty$.

Example 3 Consider the same setup as Example 1, but with the random variable $X_n$ defined as $$X_n(\omega) = \begin{cases} n, & \omega \in [0, \tfrac{1}{n}) \\ 0, & \omega \in [\tfrac{1}{n}, 1] \end{cases}$$ Show that $X_n \xrightarrow{p} 0$.

Solution

This can be solved in exactly the same manner as Example 2: for any $\varepsilon > 0$ and all $n > \varepsilon$, $P(X_n > \varepsilon) = P(X_n = n) = \tfrac{1}{n}$, which converges to $0$ as $n \to \infty$.

Example 4 Consider an i.i.d. sequence $\{X_n\}_{n\ge1}$, where $X_n \sim \text{Uniform}(0,1)$. Define $Y_n = \min\{X_1, \dots, X_n\}$. Show that $Y_n \xrightarrow{p} 0$.

Solution

Pick any $\varepsilon > 0$. As in Example 1, if $\varepsilon \ge 1$, then $P(Y_n > \varepsilon) = 0$. So, we assume that $\varepsilon \in (0,1)$. Then, $$P(Y_n > \varepsilon) = P(\min\{X_1, \dots, X_n\} > \varepsilon) = P(X_1 > \varepsilon, X_2 > \varepsilon, \dots, X_n > \varepsilon) = (1-\varepsilon)^n,$$ which goes to zero as $n \to \infty$. Thus, $Y_n \xrightarrow{p} 0$.
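The rate $(1-\varepsilon)^n$ can also be checked by simulation; the following is a sketch (not from the notes, with illustrative parameter choices):

```python
# Sketch: estimate P(Y_n > eps) for Y_n = min(X_1, ..., X_n), X_i i.i.d. Uniform(0, 1).
import random

def estimate_min_exceedance(n, eps=0.1, trials=50_000):
    """Monte Carlo estimate of P(min(X_1, ..., X_n) > eps)."""
    count = 0
    for _ in range(trials):
        y_n = min(random.random() for _ in range(n))
        count += (y_n > eps)
    return count / trials

for n in [1, 5, 10, 20, 50]:
    print(f"n={n:>2}  estimate={estimate_min_exceedance(n):.5f}  theory={(1 - 0.1) ** n:.5f}")
```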

3 Examples of almost sure convergence

We revisit the examples of the previous section.

  • In Example 1, for any $\omega \in (0,1]$, the sequence $\{X_n(\omega)\}_{n\ge1}$ is a finite sequence of $1$'s followed by an infinite sequence of $0$'s (since $\tfrac{1}{n^2} < \omega$ for all sufficiently large $n$). Thus, $\lim_{n\to\infty} X_n(\omega) = 0$. Thus, $P(\{\omega : \lim_{n\to\infty} X_n(\omega) = 0\}) = P(\omega \in (0,1]) = 1$. Hence, $X_n \xrightarrow{a.s.} 0$.

  • In Example 2, the same argument as above works.

  • In Example 3, a slight variation of the above argument works.

  • In Example 4, we proceed as follows. Fix $\omega$ and consider the sequence $\{Y_n(\omega)\}_{n\ge1}$. Since this is a decreasing sequence bounded below by $0$, it must have a limit. Denote that limit by $Y(\omega)$, i.e., $Y(\omega) = \lim_{n\to\infty} Y_n(\omega)$.

    Since $\{Y_n\}_{n\ge1}$ is a decreasing sequence, we have that $Y(\omega) \le Y_n(\omega)$ for every $n$. Hence, for any $\varepsilon > 0$, $P(Y > \varepsilon) \le P(Y_n > \varepsilon) = (1-\varepsilon)^n$ (where the last equality follows from the calculation in the solution of Example 4).

    The above inequality holds for every $n$, so we have $P(Y > \varepsilon) \le \lim_{n\to\infty} (1-\varepsilon)^n = 0$. Recall that $\varepsilon > 0$ was arbitrary. Therefore, we have shown that $P(\lim_{n\to\infty} Y_n > \varepsilon) = 0$ for every $\varepsilon > 0$. Thus, the only possibility is that $P(\lim_{n\to\infty} Y_n = 0) = 1$. Hence $Y_n \xrightarrow{a.s.} 0$. (A simulation sketch of this path-wise convergence follows below.)
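Since almost sure convergence is a statement about individual sample paths, a natural sanity check (a sketch with arbitrary parameter choices, not part of the notes) is to simulate a few realizations of $\{Y_n\}$ and observe that each path decreases toward $0$:

```python
# Sketch: a few sample paths of Y_n = min(X_1, ..., X_n); each realized path decreases to 0.
import random

def running_min_path(n_max):
    """One realization of (Y_1, ..., Y_{n_max}), computed as a running minimum of uniforms."""
    path, current = [], 1.0
    for _ in range(n_max):
        current = min(current, random.random())
        path.append(current)
    return path

for seed in range(3):
    random.seed(seed)
    path = running_min_path(10_000)
    # Print Y_n at a few values of n for this realization.
    print([round(path[k - 1], 5) for k in (1, 10, 100, 1_000, 10_000)])
```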

4 Almost sure convergence from convergence in probability.

Under an additional summability condition, it is possible to infer almost sure convergence from convergence in probability. For that, we need to state a result. The proof is not difficult, but is omitted in the interest of time.

Lemma 1 (Borel-Cantelli Lemma) Let $\{A_n\}_{n\ge1}$ be a sequence of events defined on a common probability space $(\Omega, \mathcal{F}, P)$. If the sum of the probabilities of the events is finite, i.e., $\sum_{n=1}^{\infty} P(A_n) < \infty$, then the probability that infinitely many of them occur is zero, i.e., $P(\limsup_{n\to\infty} A_n) = 0$.

There is a partial converse of the Borel-Cantelli lemma.

Lemma 2 (Second Borel-Cantelli Lemma) Let $\{A_n\}_{n\ge1}$ be a sequence of independent events defined on a common probability space $(\Omega, \mathcal{F}, P)$. If the sum of the probabilities of the events is infinite, i.e., $\sum_{n=1}^{\infty} P(A_n) = \infty$, then the probability that infinitely many of them occur is one, i.e., $P(\limsup_{n\to\infty} A_n) = 1$.

An immediate implication of the Borel-Cantelli lemma is the following:

Lemma 3 Suppose $X_n \xrightarrow{p} X$ and, for any $\varepsilon > 0$, we have $$\sum_{n=1}^{\infty} P(|X_n - X| > \varepsilon) < \infty.$$ Then $X_n \xrightarrow{a.s.} X$.

In light of the above result, we revisit some variations of the examples of the previous section.

  • Consider a variation where we no longer specify $X_n$ as a function of $\omega$ but simply assume that $$X_n = \begin{cases} 1 & \text{with probability } \tfrac{1}{n^2} \\ 0 & \text{with probability } 1 - \tfrac{1}{n^2} \end{cases}$$ Then for any $\varepsilon \in (0,1)$, $P(|X_n| > \varepsilon) = P(X_n = 1) = 1/n^2$. Therefore, $\sum_{n\ge1} P(|X_n| > \varepsilon) < \infty$; hence, by Lemma 3, $X_n \xrightarrow{a.s.} 0$.

  • Consider a variation where we no longer specify $X_n$ as a function of $\omega$ but simply assume that $$X_n = \begin{cases} 1 & \text{with probability } \tfrac{1}{n} \\ 0 & \text{with probability } 1 - \tfrac{1}{n} \end{cases}$$ and the $\{X_n\}_{n\ge1}$ are independent.

    Then for any $\varepsilon \in (0,1)$, $P(|X_n| > \varepsilon) = P(X_n = 1) = 1/n$. Therefore, $\sum_{n\ge1} P(|X_n| > \varepsilon) = \infty$; hence, by the second Borel-Cantelli lemma, $P(\limsup_{n\to\infty} \{|X_n| > \varepsilon\}) = 1$. So, $X_n$ does not converge to $0$ almost surely, even though $X_n \xrightarrow{p} 0$! (See the simulation sketch after this list.)

  • In Example 4, we can directly apply Lemma 3 to argue that $Y_n \xrightarrow{a.s.} 0$.
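The two Borel-Cantelli regimes above can be seen side by side in a simulation. The following sketch (an illustration with arbitrary horizons, not part of the original notes) draws independent indicators with $P(X_n = 1) = 1/n^2$ and with $P(X_n = 1) = 1/n$ and counts the ones: the first count stabilizes, while the second keeps growing, roughly like $\log N$.

```python
# Sketch: count the ones among independent Bernoulli(p_n) indicators.
# p_n = 1/n^2: probabilities are summable, so only finitely many ones occur (first Borel-Cantelli).
# p_n = 1/n  : probabilities are not summable, so infinitely many ones occur a.s. (second Borel-Cantelli).
import random

random.seed(0)
count_square, count_harmonic = 0, 0
checkpoints = {10, 100, 1_000, 10_000, 100_000}
for n in range(1, 100_001):
    count_square += (random.random() < 1 / n**2)
    count_harmonic += (random.random() < 1 / n)
    if n in checkpoints:
        print(f"N={n:>6}   ones with p=1/n^2: {count_square:>2}   ones with p=1/n: {count_harmonic:>3}")
```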

A variation of Lemma 3 is the following:

Lemma 4 Let $\{X_n\}_{n\ge1}$ be a sequence of random variables with finite expectations and let $X$ be another random variable. If $$\sum_{n=1}^{\infty} E[|X_n - X|] < \infty,$$ then $X_n \xrightarrow{a.s.} X$.

Proof

To simplify the notation, we assume that $X = 0$.

Pick any $\varepsilon > 0$ and define the sequence of events $A_n = \{\omega : |X_n(\omega)| > \varepsilon\}$, $n \in \mathbb{N}$.

From Markov's inequality, we have $P(A_n) = P(|X_n| > \varepsilon) \le \dfrac{E[|X_n|]}{\varepsilon}$. Therefore, $$\sum_{n=1}^{\infty} P(A_n) \le \frac{1}{\varepsilon} \sum_{n=1}^{\infty} E[|X_n|] < \infty$$ by the hypothesis of the result. Therefore, by the Borel-Cantelli lemma, we have $P(\limsup_{n\to\infty} A_n) = 0$, i.e., with probability one, $|X_n| > \varepsilon$ occurs only finitely often. Since $\varepsilon > 0$ was arbitrary (it suffices to take $\varepsilon = 1/k$, $k \in \mathbb{N}$, and a countable union of null events), this implies $X_n \xrightarrow{a.s.} 0$.

5 Some properties of convergence of sequence of random variables

We now state some properties without proof.

  1. The three notions of convergence that we have defined are related as follows: $$[X_n \xrightarrow{a.s.} X] \implies [X_n \xrightarrow{p} X] \implies [X_n \xrightarrow{D} X]$$
Proof that almost sure convergence implies convergence in probability

Fix $\varepsilon > 0$. Define $A_n = \{\omega : \exists\, m \ge n \text{ such that } |X_m(\omega) - X(\omega)| \ge \varepsilon\}$. Then, $\{A_n\}_{n\ge1}$ is a decreasing sequence of events. If $\omega \in \bigcap_{n\ge1} A_n$, then $X_n(\omega) \not\to X(\omega)$. This implies $P(\bigcap_{n\ge1} A_n) \le P(\{\omega : X_n(\omega) \not\to X(\omega)\}) = 0$. By continuity of probability, $\lim_{n\to\infty} P(A_n) = P(\lim_{n\to\infty} A_n) = 0$. Finally, since $\{\omega : |X_n(\omega) - X(\omega)| \ge \varepsilon\} \subseteq A_n$, we get $P(|X_n - X| \ge \varepsilon) \to 0$, i.e., $X_n \xrightarrow{p} X$.

Proof that convergence in probability implies convergence in distribution

Let $F_n$ and $F$ denote the CDFs of $X_n$ and $X$, respectively. Fix $\varepsilon > 0$, pick $x$ such that $F$ is continuous at $x$, and consider $$F_n(x) = P(X_n \le x) = P(X_n \le x, X \le x+\varepsilon) + P(X_n \le x, X > x+\varepsilon) \le P(X \le x+\varepsilon) + P(X - X_n > \varepsilon) \le F(x+\varepsilon) + P(|X_n - X| > \varepsilon).$$ Similarly, $$F(x-\varepsilon) = P(X \le x-\varepsilon) = P(X \le x-\varepsilon, X_n \le x) + P(X \le x-\varepsilon, X_n > x) \le P(X_n \le x) + P(X_n - X > \varepsilon) \le F_n(x) + P(|X_n - X| > \varepsilon).$$

Thus, $$F(x-\varepsilon) - P(|X_n - X| > \varepsilon) \le F_n(x) \le F(x+\varepsilon) + P(|X_n - X| > \varepsilon).$$ Taking $n \to \infty$, we have $$F(x-\varepsilon) \le \liminf_{n\to\infty} F_n(x) \le \limsup_{n\to\infty} F_n(x) \le F(x+\varepsilon).$$ The result is true for all $\varepsilon > 0$. Since $F$ is continuous at $x$, when we take $\varepsilon \downarrow 0$, we have $F(x-\varepsilon) \to F(x)$ and $F(x+\varepsilon) \to F(x)$, which implies that $F_n(x) \to F(x)$.

  2. There are partial converses. For any constant $c$, $[X_n \xrightarrow{D} c] \implies [X_n \xrightarrow{p} c]$. If $\{X_n\}_{n\ge1}$ is a decreasing sequence, then $[X_n \xrightarrow{p} c] \implies [X_n \xrightarrow{a.s.} c]$.

  3. If $X_n \xrightarrow{p} X$, then there exists a subsequence $\{n_k : k \in \mathbb{N}\}$ such that $\{X_{n_k}\}_{k\ge1}$ converges almost surely to $X$.

  4. $X_n \xrightarrow{p} X$ if and only if every subsequence $\{n_k : k \in \mathbb{N}\}$ has a sub-subsequence $\{n_{k_m} : m \in \mathbb{N}\}$ such that $\{X_{n_{k_m}}\}_{m\ge1}$ converges to $X$ almost surely.

  5. Skorokhod’s representation theorem. If $X_n \xrightarrow{D} X$, then there exists a sequence $\{Y_n\}_{n\ge1}$, with each $Y_n$ identically distributed to the corresponding $X_n$, such that $Y_n \xrightarrow{a.s.} Y$, where $Y$ is identically distributed to $X$.

  6. Continuous mapping theorems. Let $g \colon \mathbb{R} \to \mathbb{R}$ be a continuous function. Then,

    • $X_n \xrightarrow{a.s.} X$ implies $g(X_n) \xrightarrow{a.s.} g(X)$.
    • $X_n \xrightarrow{p} X$ implies $g(X_n) \xrightarrow{p} g(X)$.
    • $X_n \xrightarrow{D} X$ implies $g(X_n) \xrightarrow{D} g(X)$.
  7. Convergence of sums.

    • If $X_n \xrightarrow{a.s.} X$ and $Y_n \xrightarrow{a.s.} Y$, then $X_n + Y_n \xrightarrow{a.s.} X + Y$.
    • If $X_n \xrightarrow{p} X$ and $Y_n \xrightarrow{p} Y$, then $X_n + Y_n \xrightarrow{p} X + Y$.
    • It is not true in general that $X_n + Y_n \xrightarrow{D} X + Y$ whenever $X_n \xrightarrow{D} X$ and $Y_n \xrightarrow{D} Y$. The result is true when $X$ or $Y$ is a constant.

In fact, if $X_n \xrightarrow{p} X$ and $Y_n \xrightarrow{p} Y$, then both $X_n + Y_n \xrightarrow{p} X + Y$ and $X_n Y_n \xrightarrow{p} X Y$.

6 Strong law of large numbers

Theorem 1 Let $\{X_n\}_{n\ge1}$ be an i.i.d. sequence of random variables with mean $\mu$ and variance $\sigma^2$. Let $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$ be the sample average. Then, $\bar{X}_n \xrightarrow{a.s.} \mu$, i.e., $$P\big(\{\omega : \lim_{n\to\infty} \bar{X}_n(\omega) = \mu\}\big) = 1.$$

We provide a proof under the assumption that the fourth moment exists.

Proof

We assume that $\mu = 0$ (this is just for notational simplicity) and that $E[X_1^4] = M_4 < \infty$ (this is a strong assumption).

Since we know that the fourth moment exists, we can use a fourth-moment version of Chebyshev's inequality: $$P(|\bar{X}_n| \ge \varepsilon) \le \frac{E[\bar{X}_n^4]}{\varepsilon^4}.$$

Then, by the multinomial theorem, we have $$E[\bar{X}_n^4] = \frac{1}{n^4} E\Bigg[ \sum_{i} X_i^4 + \binom{4}{1,3} \sum_{i \ne j} X_i X_j^3 + \binom{4}{2,2} \sum_{i < j} X_i^2 X_j^2 + \binom{4}{1,1,2} \sum_{\substack{i < j \\ k \ne i,j}} X_i X_j X_k^2 + \sum_{\substack{i,j,k,l \\ \text{distinct}}} X_i X_j X_k X_l \Bigg].$$

Since the $\{X_i\}_{i\ge1}$ are independent and zero mean, we have

  • $E[X_i X_j^3] = E[X_i]\, E[X_j^3] = 0$ for $i \ne j$.
  • $E[X_i X_j X_k^2] = E[X_i]\, E[X_j]\, E[X_k^2] = 0$ and $E[X_i X_j X_k X_l] = E[X_i]\, E[X_j]\, E[X_k]\, E[X_l] = 0$ for distinct indices.

Therefore, $$E[\bar{X}_n^4] = \frac{1}{n^4}\Big[ n\, E[X_i^4] + 3n(n-1)\, E[X_i^2 X_j^2] \Big] \le \frac{M_4}{n^3} + \frac{3\sigma^4}{n^2},$$ where $M_4 = E[X_1^4]$ is the fourth moment. Now, from the fourth-moment version of Chebyshev's inequality, we have $$P(|\bar{X}_n| \ge \varepsilon) \le \frac{M_4/n^3 + 3\sigma^4/n^2}{\varepsilon^4}.$$ This implies that $\sum_{n=1}^{\infty} P(|\bar{X}_n| \ge \varepsilon) < \infty$. Thus, from Lemma 3, we have that $\bar{X}_n \xrightarrow{a.s.} 0$.
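The strong law is again a statement about individual realizations, so a simple sanity check (a sketch with an illustrative distribution and sample sizes, not part of the notes) is to follow the running sample average along one realization of an i.i.d. sequence:

```python
# Sketch: the running sample average of i.i.d. Uniform(0, 1) draws settles near mu = 0.5
# along a single realization, as the strong law of large numbers asserts.
import random

random.seed(1)
total = 0.0
checkpoints = {10, 100, 1_000, 10_000, 100_000, 1_000_000}
for n in range(1, 1_000_001):
    total += random.random()
    if n in checkpoints:
        print(f"n={n:>7}   sample average = {total / n:.5f}")
```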

7 There’s more

There’s another type of convergence commonly used in engineering: convergence in mean-squared sense. In the interest of time, we will not study this notion in class.
