3  Random vectors

Updated: October 16, 2025

Suppose \(X\) and \(Y\) are two random variables defined on the same probability space. The CDFs \(F_X\) and \(F_Y\) provide information about their individual probabilities. However, we are often interested in functions of the two random variables, e.g., \[ X + Y, \quad XY, \quad \max(X,Y), \quad \min(X,Y), \quad \text{etc.} \] To understand how they behave together, we need to think of the random vector \((X,Y)\) taking values in \(\reals^2\). The natural way to do so is to think of the joint CDF \[ F_{X,Y}(x,y) = \PR(\{ ω \in Ω : X(ω) \le x, Y(ω) \le y \}) \] where we may write the right hand side as \(\PR(X \le x, Y \le y)\) for short.

Lemma 3.1 (Properties of CDFs)  

  • Regularity properties

    1. \(\lim_{x \to -∞} F_{X,Y}(x,y) = 0\), \(\lim_{y \to -∞} F_{X,Y}(x,y) = 0\), and \(\lim_{x,y \to +∞} F_{X,Y}(x,y) = 1\).

    2. Joint CDFs are non-decreasing, i.e., if \(x_1 \le x_2\) and \(y_1 \le y_2\), then \(F_{X,Y}(x_1,y_1) \le F_{X,Y}(x_2,y_2)\).

    3. Joint CDFs are continuous from above, i.e., \[\lim_{u,v \downarrow 0}F_{X,Y}(x+u,y+v) = F_{X,Y}(x,y).\]

    4. \(\PR(X = x, Y = y) = F_{X,Y}(x,y) - F_{X,Y}(x^{-},y) - F_{X,Y}(x,y^{-}) + F_{X,Y}(x^{-},y^{-})\).

  • Marginalization of joint CDFs

    1. \(\lim_{y \to ∞} F_{X,Y}(x,y) = F_X(x)\)
    2. \(\lim_{x \to ∞} F_{X,Y}(x,y) = F_Y(y)\)

Example 3.1 Consider random variables \(X\) and \(Y\) with joint CDF \(F_{X,Y}\). Show that \[ \PR(a < X \le b, c < Y \le d) = F_{X,Y}(b,d) - F_{X,Y}(a,d) - F_{X,Y}(b,c) + F_{X,Y}(a,c). \] Hint: Draw each region.
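
The identity can also be checked numerically. Below is a minimal sketch that assumes \(X\) and \(Y\) are independent \(\text{Exp}(1)\) random variables (a choice made only for this check, not part of the example) and compares a Monte Carlo estimate of the left-hand side with the right-hand side.

```python
# Minimal numerical check of the rectangle formula, assuming X and Y are
# independent Exp(1) random variables (an assumption made only for this sketch).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
X = rng.exponential(size=1_000_000)
Y = rng.exponential(size=1_000_000)

a, b, c, d = 0.5, 1.5, 0.2, 1.0
F = lambda u, v: stats.expon.cdf(u) * stats.expon.cdf(v)   # joint CDF under independence

lhs = np.mean((a < X) & (X <= b) & (c < Y) & (Y <= d))      # Monte Carlo estimate
rhs = F(b, d) - F(a, d) - F(b, c) + F(a, c)                 # rectangle formula
print(lhs, rhs)   # the two values agree up to Monte Carlo error
```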

Example 3.2 Suppose \[ F_{X,Y}(x,y) = \begin{cases} \dfrac{y + e^{-x(y+1)}}{y+1} - e^{-x}, & x,y > 0 \\ 0, & \text{otherwise} \end{cases} \] Find \(F_X(x)\) and \(F_Y(y)\).

We have \[ F_X(x) = \lim_{y \to ∞} F_{X,Y}(x,y) = \begin{cases} 1 - e^{-x}, & x > 0 \\ 0 , & \text{otherwise} \end{cases} \] \[ F_Y(y) = \lim_{x \to ∞} F_{X,Y}(x,y) = \begin{cases} \dfrac{y}{y+1}, & y > 0 \\ 0 , & \text{otherwise} \end{cases} \]

3.1 Classification of random vectors

As was the case for random variables, we can also classify random vectors as discrete, continuous, and mixed.

  1. A random vector \((X,Y)\) is called jointly discrete if it takes values in a countable subset of \(\reals^2\) (we denote this subset by \(\ALPHABET X \times \ALPHABET Y\)). The jointly discrete random variables have a joint PMF \(P_{X,Y} \colon \reals^2 \to [0,1]\) given by \[ \PR(X = x, Y = y) = P_{X,Y}(x,y). \]

  2. A random vector \((X, Y)\) is called jointly continuous (or absolutely continuous) if there exists an integrable function \(f_{X,Y} \colon \reals^2 \to [0, ∞)\) such that \[ F_{X,Y}(x,y) = \int_{-∞}^x \int_{-∞}^{y} f_{X,Y}(u,v)\, dv\, du, \quad x,y \in \reals. \] The function \(f_{X,Y}\) is called the joint PDF and can be computed as \[ f_{X,Y}(x,y) = \frac{∂^2}{∂x ∂y}F_{X,Y}(x,y). \]

Tip: Can we always define a density if the CDF is continuous?

Continuity of the CDF is not sufficient for a joint density to exist. Consider the joint CDF \[ F_{X,Y}(x,y) = \begin{cases} 0, & \min(x,y) < 0 \\ \min(x,y), & 0 \le \min(x,y) \le 1 \\ 1, & \min(x,y) > 1 \end{cases} \] This is a valid CDF. It corresponds to \[ (X,Y) = (U,U), \quad \text{where } U \sim \text{Unif}[0,1], \] i.e., the distribution is uniform along the diagonal of the unit square.

The diagonal is a set of Lebesgue measure zero. That means that for any ordinary (integrable) function \(f_{X,Y}\), \[ \iint_{A} f_{X,Y}(x,y)\, dx\,dy = 0 \] for every subset \(A\) of the diagonal. Outside the diagonal, the density must be zero. Hence, no ordinary density can integrate to one over the entire plane, so, mathematically, a density does not exist. Such random variables are said to be singular.

In the engineering literature, the PDF is sometimes written as \[ f_{X,Y}(x,y) = δ(x-y), \quad 0 \le x,y \le 1, \] which produces the correct \(F_{X,Y}\), but delta functions are generalized functions, not ordinary densities. Thus, although the CDF is continuous, the random vector does not have a joint density.

Lemma 3.2 (Properties of PMFs and PDFs)  

  • Properties of PMFs

    1. Normalization. For a jointly discrete random vector \((X,Y)\), \[\sum_{(x,y) \in \ALPHABET X × \ALPHABET Y}P_{X,Y}(x,y) = 1.\]

    2. For any (Borel) subset \(A\) of \(\reals^2\), \[\PR((X,Y) \in A) = \sum_{(x,y) \in (\ALPHABET X \times \ALPHABET Y) \cap A} P_{X,Y}(x,y).\]

    3. Marginalization.

      • \(\displaystyle \sum_{x \in \ALPHABET X} P_{X,Y}(x,y) = P_Y(y)\).
      • \(\displaystyle \sum_{y \in \ALPHABET Y} P_{X,Y}(x,y) = P_X(x)\).
  • Properties of PDFs

    1. Normalization. For a jointly continuous random vector \((X,Y)\), \[\int_{-∞}^{∞} \int_{-∞}^{∞} f_{X,Y}(x,y)\, dxdy = 1.\]

    2. For any (Borel) subset \(A\) of \(\reals^2\), \[\PR((X,Y) \in A) = \iint_{(x,y) \in A} f_{X,Y}(x,y)\,dxdy.\]

    3. Marginalization.

      • \(\displaystyle \int_{-∞}^{∞} f_{X,Y}(x,y) dx = f_Y(y)\).
      • \(\displaystyle \int_{-∞}^{∞} f_{X,Y}(x,y) dy = f_X(x)\).

The above discussion generalizes in the obvious manner to more than two random variables. Thus, we can talk about random vectors \(X = (X_1, \dots, X_n) \in \reals^n\). In practice, we often do not make a distinction between random variables and random vectors and refer to both simply as random variables.

Example 3.3 Consider jointly discrete random variables \(X \in \{1,2,3\}\) and \(Y \in \{1, 2, 3\}\) with joint PMF \[ P_{X,Y} = \MATRIX{ 0.1 & 0.1 & 0.2 \\ 0.2 & 0.1 & 0 \\ 0.3 & 0 & 0 } \]

  • Find the marginals \(P_X\) and \(P_Y\).
  • Find the probability of the event \(A = \{ X + Y = 3 \}\).
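
A minimal numerical sketch for Example 3.3 is given below; it assumes that the rows of \(P_{X,Y}\) index the values of \(X\) and the columns index the values of \(Y\) (the example does not state this explicitly).

```python
import numpy as np

# Joint PMF from Example 3.3; rows are assumed to index X = 1, 2, 3 and
# columns to index Y = 1, 2, 3.
P = np.array([[0.1, 0.1, 0.2],
              [0.2, 0.1, 0.0],
              [0.3, 0.0, 0.0]])

P_X = P.sum(axis=1)   # marginal of X: sum over y
P_Y = P.sum(axis=0)   # marginal of Y: sum over x

# P(X + Y = 3): add P(x, y) over all pairs with x + y = 3.
vals = np.arange(1, 4)
prob_A = sum(P[i, j] for i in range(3) for j in range(3) if vals[i] + vals[j] == 3)
print(P_X, P_Y, prob_A)   # [0.4 0.3 0.3], [0.6 0.2 0.2], 0.3
```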

Example 3.4 Consider \(F_{X,Y}\) given in Example 3.2.

  • Find the joint density \(f_{X,Y}\)
  • Find the marginal densities \(f_X\) and \(f_Y\).
  • Find the probability of the event \(A = \{ X + Y \le 1 \}\).

For \(x,y > 0\), we have \[ f_{X,Y}(x,y) = \frac{∂^2}{∂x ∂y}F_{X,Y}(x,y) = x e^{-x(y+1)}. \] Thus, \[ f_{X,Y}(x,y) = \begin{cases} x e^{-x(y+1)}, & x,y > 0 \\ 0, & \text{otherwise} \end{cases} \]

Thus, \[ f_X(x) = \int_{-∞}^{∞} f_{X,Y}(x,y)\,dy = \begin{cases} e^{-x}, & x > 0 \\ 0, & x \le 0 \end{cases} \] and \[ f_Y(y) = \int_{-∞}^{∞} f_{X,Y}(x,y)\,dx = \begin{cases} \dfrac{1}{(1+y)^2}, & y > 0 \\ 0, & y \le 0 \end{cases} \]
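
As a sanity check, the marginals and the probability of the event \(A\) can also be computed by numerical integration of the joint density. The sketch below uses scipy.integrate and is only a numerical cross-check of the calculations above.

```python
import numpy as np
from scipy import integrate

# Joint density from Example 3.4 (zero outside x, y > 0).
def f(x, y):
    return x * np.exp(-x * (y + 1)) if (x > 0 and y > 0) else 0.0

# Marginal f_X at a test point: integrate out y; should equal exp(-x0).
x0 = 0.7
fx, _ = integrate.quad(lambda y: f(x0, y), 0, np.inf)
print(fx, np.exp(-x0))

# P(X + Y <= 1): integrate f over the triangle {x > 0, y > 0, x + y <= 1}.
pA, _ = integrate.dblquad(lambda y, x: f(x, y), 0, 1, lambda x: 0.0, lambda x: 1 - x)
print(pA)
```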

Example 3.5 Consider a joint PDF \[ f_{X,Y}(x,y) = c x y, \quad 0 \le y \le x \le 1. \] Find the constant \(c\).

We know that the joint PDF must integrate to \(1\). Thus, \[\begin{align*} 1 &= \int_{-∞}^{∞} f_{X,Y}(x,y)\, dx\, dy \\ &= \int_{0}^{1} \int_{0}^x c xy\, dy\, dx \\ &= \int_{0}^{1} c x\frac{x^2}{2} dx \\ &= c \frac{x^3}{8}\biggr|_{0}^1 = \frac{c}{8} \end{align*}\] Therefore, \(c=8\).
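
The same computation can be verified symbolically; a minimal sketch using sympy (assumed available) is shown below.

```python
import sympy as sp

x, y, c = sp.symbols('x y c', positive=True)

# Normalization over the region 0 <= y <= x <= 1: inner integral in y, outer in x.
total = sp.integrate(sp.integrate(c * x * y, (y, 0, x)), (x, 0, 1))
print(total)                          # c/8
print(sp.solve(sp.Eq(total, 1), c))   # [8]
```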

3.2 Independence of random vectors

Definition 3.1 Two random variables \(X\) and \(Y\) defined on a common probability space \((Ω, \ALPHABET F, \PR)\) are said to be independent if the sigma algebras \(σ(X)\) and \(σ(Y)\) are independent.

The above definition means that if we take any (Borel) subsets \(B_1\) and \(B_2\) of \(\reals\), then the events \(\{X \in B_1\}\) and \(\{Y \in B_2\}\) are independent, i.e., \[ \PR(X \in B_1, Y \in B_2) = \PR(X \in B_1) \PR(Y \in B_2). \]

Using this, we can show the following:

  1. \(X\) and \(Y\) are independent if and only if \[ F_{X,Y}(x,y) = F_X(x) F_Y(y), \quad \forall x, y \in \reals. \]

  2. Two jointly continuous random variables \(X\) and \(Y\) are independent if and only if \[ f_{X,Y}(x,y) = f_X(x) f_Y(y), \quad \forall x, y \in \reals. \]

  3. Two jointly discrete random variables \(X\) and \(Y\) are independent if and only if \[ P_{X,Y}(x,y) = P_X(x) P_Y(y), \quad \forall x, y \in \reals. \]

Example 3.6 Consider the random variables \(X\) and \(Y\) with the joint PMF \(P_{X,Y}\) given in Example 3.3. Are these random variables independent?

Example 3.7 Consider the random variables \(X\) and \(Y\) with the joint CDF \(F_{X,Y}\) given in Example 3.2. Are these random variables independent?

Observe that \(F_{X,Y}(x,y) \neq F_X(x) F_Y(y)\). Hence, the two random variables are not independent.

We can also see this from the densities, since \(f_{X,Y}(x,y) \neq f_X(x) f_Y(y)\).

Example 3.8 Consider random variables \(X\) and \(Y\) with joint PDF \(f_{X,Y}\) which is a uniform distribution on the unit square. Are \(X\) and \(Y\) independent?

Example 3.9 Consider random variables \(X\) and \(Y\) with joint PDF \(f_{X,Y}\) which is a uniform distribution on the unit triangle. Are \(X\) and \(Y\) independent?

These definitions extend naturally to any number of random variables:

  1. A sequence of random variables \(X_1, \dots, X_n\) are independent if and only if \[ F_{X_1, \dots, X_n}(x_1, \dots, x_n) = \prod_{i=1}^n F_{X_i}(x_i). \]

  2. A sequence of jointly continuous random variables \(X_1, \dots, X_n\) are independent if and only if \[ f_{X_1, \dots, X_n}(x_1, \dots, x_n) = \prod_{i=1}^n f_{X_i}(x_i). \]

  3. A sequence of jointly discrete random variables \(X_1, \dots, X_n\) are independent if and only if \[ P_{X_1, \dots, X_n}(x_1, \dots, x_n) = \prod_{i=1}^n P_{X_i}(x_i). \]

Unlike independence of events, we do not need to separately check for independence of subsets of random variables because that is automatically implied due to the marginalization property.

Example 3.10 Suppose \((X, Y, Z)\) are independent with joint PDF: \[ f_{X,Y,Z}(x,y,z) = f_X(x)\, f_Y(y)\, f_Z(z). \]

Show that \(X\) and \(Y\) are independent.

To find the joint PDF of \(X\) and \(Y\), marginalize over \(Z\): \[ f_{X,Y}(x,y) = \int_{-\infty}^{\infty} f_{X,Y,Z}(x,y,z)\, dz = \int_{-\infty}^{\infty} f_X(x)\, f_Y(y)\, f_Z(z)\, dz. \]

Since \(f_X(x)\) and \(f_Y(y)\) do not depend on \(z\): \[ f_{X,Y}(x,y) = f_X(x)\, f_Y(y) \int_{-\infty}^{\infty} f_Z(z)\, dz = f_X(x)\, f_Y(y) \cdot 1. \]

Thus, \(X\) and \(Y\) remain independent after marginalization.

An immediate implication of the definition of independence is the following.

Proposition 3.1 Let \(X\) and \(Y\) be independent random variables defined on a common probability space. Consider \(U = g(X)\) and \(V = h(Y)\) for some (measurable) functions \(g\) and \(h\). Then, \(U\) and \(V\) are independent.

Consider any (Borel) subsets \(B_1\) and \(B_2\) of \(\reals\) and consider the events \(\{ U \in B_1 \}\) and \(\{ V \in B_2 \}\). Note that

  • \(\{ U \in B_1 \} = \{ X \in g^{-1}(B_1) \}\).
  • \(\{ V \in B_2 \} = \{ Y \in h^{-1}(B_2) \}\).

Since the random variables \(X\) and \(Y\) are independent, the events \(\{ X \in g^{-1}(B_1) \}\) and \(\{ Y \in h^{-1}(B_2) \}\) are independent. This implies that the events \(\{ U \in B_1 \}\) and \(\{ V \in B_2 \}\) are independent. Consequently, the random variables \(U\) and \(V\) are independent.

Proposition 3.2 Let \(X\) and \(Y\) be random variables defined on a common probability space. Then \(X\) and \(Y\) are independent if and only if \[\begin{equation}\label{eq:expectation-product} \EXP[ g(X) h(Y) ] = \EXP[ g(X) ] \EXP[ h(Y) ] \end{equation}\] for all (measurable) functions \(g\) and \(h\) for which the expectations exist.

There are two claims here.

  1. If \(X\) and \(Y\) are independent then \(\eqref{eq:expectation-product}\) holds.

  2. If \(\eqref{eq:expectation-product}\) holds, then \(X\) and \(Y\) are independent.

We will prove the first claim assuming that \(X\) and \(Y\) are continuous. Similar argument works for the discrete case as well. \[\begin{align*} \EXP[ g(X) h(Y) ] &= \int_{-∞}^∞ \int_{-∞}^∞ g(x) h(y) f_{X,Y}(x,y)\, dx dy \\ &\stackrel{(a)}= \int_{-∞}^∞ \int_{-∞}^∞ g(x) h(y) f_{X}(x) f_{Y}(y)\, dy dx \\ &\stackrel{(b)}= \int_{-∞}^∞ \left[ \int_{-∞}^∞ g(x)f_{X}(x)\, dx \right]h(y) f_{Y}(y) \, dy \\ &\stackrel{(c)}= \left[ \int_{-∞}^∞ g(x)f_{X}(x)\, dx \right] \left[\int_{-∞}^∞ h(y) f_{Y}(y) \, dy \right] \\ &= \EXP[ g(X) ] \EXP [ h(Y) ] \end{align*}\] where \((a)\) follows from the fact that \(X \independent Y\), \((b)\) and \((c)\) are simple algebra, and the last step uses the definition of expectation.

To prove the second claim, pick any (Borel) subsets \(B_1\) and \(B_2\) of \(\reals\) and consider the functions \(g(x) = \IND_{B_1}(x)\) and \(h(y) = \IND_{B_2}(y)\). Observe that \[\begin{align*} \PR(X \in B_1, Y \in B_2) &= \EXP[\IND_{ \{ X \in B_1, Y \in B_2 \}}] \\ &\stackrel{(d)}= \EXP[\IND_{ \{ X \in B_1 \}} \IND_{\{ Y \in B_2 \}}] \\ &\stackrel{(e)}=\EXP[\IND_{ \{ X \in B_1 \}}]\, \EXP[ \IND_{\{ Y \in B_2 \}}] \\ &\stackrel{(f)}= \PR(X \in B_1) \PR(Y \in B_2) \end{align*}\] where \((d)\) follows because the indicator of an intersection is the product of the indicators, \((e)\) follows from \(\eqref{eq:expectation-product}\), and \((f)\) follows from the fact that the expectation of an indicator equals the probability of the corresponding event.

The above equation shows that for any arbitrary (Borel) subsets \(B_1\) and \(B_2\) of \(\reals\), \(\PR(X \in B_1, Y \in B_2) = \PR(X \in B_1) \PR(Y \in B_2)\). Hence, \(\{X \in B_1\} \independent \{Y \in B_2 \}\). Since \(B_1\) and \(B_2\) were arbitrary, we have \(X \independent Y\).

Example 3.11 Let \(X\) and \(Y\) be independent random variables defined on a common probability space. Show that

  1. \(\EXP[XY] = \EXP[X] \EXP[Y]\).
  2. \(\VAR(X + Y) = \VAR(X) + \VAR(Y)\).
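
A quick Monte Carlo sanity check of both identities is sketched below; it assumes \(X \sim \mathcal{N}(1, 4)\) and \(Y \sim \mathcal{N}(-2, 9)\) are independent, a choice made only for this illustration.

```python
import numpy as np

# Monte Carlo check, assuming X ~ N(1, 4) and Y ~ N(-2, 9) are independent
# (these distributions are chosen only for this illustration).
rng = np.random.default_rng(1)
X = rng.normal(1.0, 2.0, size=1_000_000)
Y = rng.normal(-2.0, 3.0, size=1_000_000)

print(np.mean(X * Y), np.mean(X) * np.mean(Y))   # both close to 1 * (-2) = -2
print(np.var(X + Y), np.var(X) + np.var(Y))      # both close to 4 + 9 = 13
```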

Definition 3.2 A collection of random variables \(X_1, \dots, X_n\) is called independent and identically distributed (i.i.d.) if all random variables are independent and have the same distribution, i.e., \[ F_{X_1}(x) = F_{X_2}(x) = \cdots = F_{X_n}(x), \quad \forall x \in \reals. \]

3.3 Functions of random variables

In interconnected systems, the output of one system is used as the input to another system. To analyze such systems, it is important to understand how to analyze functions of random variables.

The same idea can be used for functions of multiple random variables as we illustrate via the following examples.

Example 3.12 Let \(X\) and \(Y\) be random variables defined on a common probability space. Define \[ U = \max(X,Y) \quad\text{and}\quad V = \min(X,Y). \] Find \(F_U\) and \(F_V\).

Solution
  1. We first look at \(F_U\). By definition, \[ F_U(u) = \PR(\max(X,Y) \le u) = \PR(X \le u, Y \le u) = F_{X,Y}(u,u).\]

  2. Now consider \(F_V\). The event \(\{V \le v\}\) can be expressed as \[ \{ V \le v \} = \{ X \le v \} \cup \{Y \le v \}.\] Thus, by inclusion-exclusion, \[F_V(v) = F_X(v) + F_Y(v) - F_{X,Y}(v,v). \]

Example 3.13 Suppose \(X_1\) and \(X_2\) are continuous random variables and \(Y = X_1 + X_2\). Find the PDF \(f_Y(y)\).

Solution

We can write the CDF \(F_Y(y)\) as follows: \[ F_Y(y) = \int_{-∞}^∞ \int_{-∞}^{y - x_1} f_{X_1,X_2}(x_1, x_2)\, d x_2\, d x_1. \] Therefore, \[\begin{align*} f_Y(y) &= \frac{d F_Y(y)}{dy} \\ &= \int_{-∞}^∞ \frac{d}{dy} \int_{-∞}^{y-x_1} f_{X_1, X_2}(x_1, x_2) \, dx_2\, dx_1 \\ &= \int_{-∞}^∞ f_{X_1, X_2}(x_1, y - x_1)\, dx_1. \end{align*}\]

Example 3.14 Repeat Example 3.13 when \(X_1\) and \(X_2\) are independent.

Solution

In this case, \(f_{X_1, X_2}(x_1, x_2) = f_{X_1}(x_1) f_{X_2}(x_2)\). Therefore, we get \[f_Y(y) = \int_{-∞}^{∞} f_{X_1}(x_1) f_{X_2}(y - x_1)\, d x_1 = (f_{X_1} * f_{X_2})(y)\] where \(*\) denotes convolution.
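
A numerical illustration of the convolution formula: assuming \(X_1\) and \(X_2\) are i.i.d. \(\text{Exp}(1)\) (our choice for this sketch), the sum has a \(\text{Gamma}(2,1)\) density, which the discretized convolution should reproduce up to discretization error.

```python
import numpy as np
from scipy import stats

# Discretized convolution check, assuming X1, X2 are i.i.d. Exp(1); then
# Y = X1 + X2 ~ Gamma(shape=2, scale=1).
dx = 0.001
x = np.arange(0, 20, dx)
f1 = stats.expon.pdf(x)
f2 = stats.expon.pdf(x)

fY = np.convolve(f1, f2) * dx              # approximates (f_{X1} * f_{X2})(y)
y = np.arange(len(fY)) * dx
exact = stats.gamma.pdf(y, a=2)

print(np.max(np.abs(fY[:len(x)] - exact[:len(x)])))   # small discretization error
```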

Example 3.15 Repeat Example 3.14 when \(X_1 \sim \text{Poisson}(λ_1)\) and \(X_2 \sim \text{Poisson}(λ_2)\).

Solution

Recall that for a Poisson random variable \(X\) with parameter \(λ\) \[ P_X(k) = e^{-λ} \frac{λ^k}{k!}, \quad k \ge 0 \]

Thus, \[\begin{align*} P_Y(n) &= (P_{X_1} * P_{X_2})(n) = \sum_{k=-∞}^{∞} P_{X_1}(k) P_{X_2}(n-k) \\ &=\sum_{k=0}^{n} P_{X_1}(k) P_{X_2}(n-k) \\ &= \sum_{k=0}^n e^{-λ_1 - λ_2} \frac{ λ_1^k λ_2^{n-k} }{ k! (n-k)! } \\ &= e^{-(λ_1 + λ_2)} \frac{1}{n!} \sum_{k=0}^n \frac{n!}{k!(n-k)!} λ_1^k λ_2^{n-k} \\ &= e^{-(λ_1 + λ_2)} \frac{(λ_1 + λ_2)^n}{n!} \end{align*}\]

Thus, \(Y \sim \text{Poisson}(λ_1 + λ_2)\).
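
This identity is easy to check numerically: the discrete convolution of the two Poisson PMFs should match the \(\text{Poisson}(λ_1 + λ_2)\) PMF exactly (up to floating point). A minimal sketch with \(λ_1 = 2\) and \(λ_2 = 3\) (values chosen for illustration):

```python
import numpy as np
from scipy import stats

# The convolution of Poisson(2) and Poisson(3) PMFs should equal the Poisson(5) PMF.
lam1, lam2 = 2.0, 3.0
k = np.arange(0, 60)
p1 = stats.poisson.pmf(k, lam1)
p2 = stats.poisson.pmf(k, lam2)

pY = np.convolve(p1, p2)[:len(k)]          # (P_{X1} * P_{X2})(n) for n = 0, ..., 59
exact = stats.poisson.pmf(k, lam1 + lam2)
print(np.max(np.abs(pY - exact)))          # essentially zero (floating point only)
```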

3.3.1 Change of variables formulas

For continuous random variables, it is possible to obtain a general change of variables formula for the PDF of functions of random variables in terms of their joint PDF.

Suppose \(\{X_1, \dots, X_n\}\) are jointly continuous random variables with joint PDF \(f\). Consider \(n\) random variables: \[\begin{align*} Y_1 &= g_1(X_1, \dots, X_n), \\ Y_2 &= g_2(X_1, \dots, X_n), \\ \vdots &= \vdots \\ Y_n &= g_n(X_1, \dots, X_n). \end{align*}\] We can view this as an equation between two \(n\)-dimensional vectors \(Y = \VEC(Y_1, \dots, Y_n)\) and \(X = \VEC(X_1, \dots, X_n)\) written as \[ Y = g(X) \]

As was the case in the scalar setting, for a given \(y \in \reals^n\), the vector equation \(y = g(x)\) may have zero, one, or multiple solutions.

  • If \(y = g(x)\), \(y \in \reals^n\) has no solution, then \[ f_Y(y) = 0. \]

  • If \(y = g(x)\), \(y \in \reals^n\) has one solution \(x \in \reals^n\), then \[ f_Y(y) = \frac{f_X(x)}{\ABS{J(x)}}, \quad \text{where } y = g(x)\] and \(J(x)\) denotes the Jacobian determinant of \(g\) evaluated at \(x = (x_1, \dots, x_n)\), i.e., \[ \def\1#1#2{\dfrac{∂ g_{#1}}{∂ x_{#2}}} J(x_1, \dots, x_n) = \DET{ \1 11 & \cdots & \1 1n \\ \vdots & \ddots & \vdots \\ \1 n1 & \cdots & \1 nn } \]

  • If \(y = g(x)\), \(y \in \reals^n\) has multiple solutions given by \(\{x^{(1)}, \dots, x^{(m)}\}\), then \[ f_Y(y) = \sum_{k=1}^m \frac{f_X(x^{(k)})}{\ABS{J(x^{(k)})}}.\]
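
The one-solution case can be checked on a simple example. The sketch below (our own choice, not from the notes) takes \(Y = AX + b\) with \(X \sim \mathcal{N}(0, I)\) in \(\reals^2\); the formula gives \(f_Y(y) = f_X(A^{-1}(y-b))/\ABS{\det A}\), which should coincide with the known \(\mathcal{N}(b, A A^\TRANS)\) density.

```python
import numpy as np
from scipy import stats

# One-solution case of the change-of-variables formula for Y = A X + b with
# X ~ N(0, I) in R^2 (this specific example is our own choice).
A = np.array([[2.0, 1.0],
              [0.5, 1.5]])
b = np.array([1.0, -1.0])
fX = stats.multivariate_normal(mean=np.zeros(2), cov=np.eye(2))
fY_exact = stats.multivariate_normal(mean=b, cov=A @ A.T)   # known result for Gaussians

y = np.array([0.3, -0.7])
x = np.linalg.solve(A, y - b)                 # unique solution of y = g(x)
J = np.linalg.det(A)                          # Jacobian of the linear map
print(fX.pdf(x) / abs(J), fY_exact.pdf(y))    # the two values coincide
```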

Example 3.16 Resolve Example 3.12 using the change of variables formula.

Solution

Let \(g_1(x,y) = \max\{x, y\}\) and \(g_2(x,y) = \min\{x,y\}\). Define \[ U = g_1(X,Y) \quad\text{and}\quad V = g_2(X,Y).\]

Define \(g(x,y) = \VEC(g_1(x,y), g_2(x,y))\). Note that \(g\) is not differentiable at \(x=y\).

  • When \(x > y\), we have \(g_1(x,y) = x\) and \(g_2(x,y) = y\). Thus, \[ J(x,y) = \DET{\1 11 & \1 12 \\ \1 21 & \1 22} = \DET{1 & 0 \\ 0 & 1} = 1. \]

  • When \(x < y\), we have \(g_1(x,y) = y\) and \(g_2(x,y) = x\). Thus, \[ J(x,y) = \DET{\1 11 & \1 12 \\ \1 21 & \1 22} = \DET{0 & 1 \\ 1 & 0} = -1. \]

We now compute \(f_{U,V}(u,v)\).

  • If \(u < v\), then the equation \((u,v) = g(x,y)\) has no solution. So we set \[ f_{U,V}(u,v) = 0. \]

  • If \(u > v\), then the equation \((u,v) = g(x,y)\) has two solutions: \(\{ (u,v), (v,u) \}\). Thus, \[ f_{U,V}(u,v) = \frac{f_{X,Y}(u,v)}{\ABS{1}} + \frac{f_{X,Y}(v,u)}{\ABS{-1}} = f_{X,Y}(u,v) + f_{X,Y}(v,u). \]

  • If \(u = v\), then the equation \((u,u) = g(x,y)\) has one solution \((u,u)\). Thus, \[ f_{U,V}(u,u) = f_{X,Y}(u,u). \] Note that \(u = v\) is a line in two-dimensional space. (Formally, it is a set of measure zero.) Hence, the choice of \(f_{U,V}\) at \(u = v\) will not affect any probability computations. So we can also set \[ f_{U,V}(u,u) = 0. \]

From the joint PDF \(f_{U,V}\), we can compute the marginals as follows:

  • For \(U\), we have \[ f_U(u) = \int_{-∞}^{∞} f_{U,V}(u,v) dv = \int_{-∞}^{u} \bigl[ f_{X,Y}(u,v) + f_{X,Y}(v,u) \bigr] dv. \] Therefore, \[ F_U(u) = \int_{-∞}^{u} f_U(\tilde u) d\tilde u = \int_{-∞}^u \int_{-∞}^{\tilde u} \bigl[ f_{X,Y}(\tilde u,v) + f_{X,Y}(v,\tilde u) \bigr] dv d\tilde u. \] Note that \[ \int_{-∞}^u \int_{-∞}^{\tilde u} f_{X,Y}(\tilde u, v) dv d\tilde u = \int_{-∞}^u \int_{-∞}^{x} f_{X,Y}(x, y) dy dx \] and \[\begin{align*} \int_{-∞}^u \int_{-∞}^{\tilde u} f_{X,Y}(v, \tilde u) dv d\tilde u &= \int_{-∞}^u \int_{-∞}^y f_{X,Y}(x,y) dx dy \\ &= \int_{-∞}^u \int_{x}^u f_{X,Y}(x,y) dy dx \end{align*}\] where the last step follows from changing the order of integration.

    Substituting these back in the expression for \(F_U(u)\), we get \[ F_U(u) = \int_{-∞}^u \int_{-∞}^{x} f_{X,Y}(x, y) dy dx + \int_{-∞}^u \int_{x}^u f_{X,Y}(x,y) dy dx = \int_{-∞}^u \int_{-∞}^u f_{X,Y}(x,y) dy dx = F_{X,Y}(u,u). \]

  • For \(V\), we can follow similar algebra as above.

Example 3.17 Let \(X\) and \(Y\) be random variables defined on a common probability space. Define \[ U = X^2 \quad\text{and}\quad V = X + Y. \] Find \(f_{U,V}\).

Solution

Let’s consider the system of equations \[ u = x^2 \quad\text{and}\quad v = x + y \] for a given value of \((u,v)\). First observe that \[ J(x,y) = \DET{ 2x & 0 \\ 1 & 1 } = 2x. \]

  1. If \(u < 0\), then the system of equations has no solutions. Therefore, \[ f_{U,V}(u,v) = 0, \quad u < 0. \]

  2. If \(u = 0\), then the system of equations has one solution: \[ x^{(1)} = 0 \quad\text{and}\quad y^{(1)} = v. \] However, \(J(0,v) = 0\). So, \[ f_{U,V}(0,v) = \frac{f_{X,Y}(0,v)}{J(0,v)} \] is undefined. However, since \(u = 0\) is a line in two-dimensions (i.e., a set of measure zero), the choice of \(f_{U,V}\) at \(u = 0\) will not affect any probability computations. So, we set \[ f_{U,V}(0,v) = 0. \]

  3. If \(u > 0\), then the system of equations has two solutions: \[ (x^{(1)}, y^{(1)}) = (+\sqrt{u}, v - \sqrt{u}) \quad\text{and}\quad (x^{(2)}, y^{(2)}) = (-\sqrt{u}, v + \sqrt{u}) \] Therefore, \[ f_{U,V}(u,v) = \frac{f_{X,Y}(\sqrt{u}, v - \sqrt{u})}{2 \sqrt{u}} + \frac{f_{X,Y}(-\sqrt{u}, v + \sqrt{u})}{2 \sqrt{u}}. \]
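
A numerical sanity check of the \(u > 0\) branch, assuming \(X\) and \(Y\) are i.i.d. \(\mathcal{N}(0,1)\) (an assumption made only for this sketch): integrating \(f_{U,V}(u, \cdot)\) over \(v\) should recover the density of \(U = X^2\), which is chi-squared with one degree of freedom.

```python
import numpy as np
from scipy import stats, integrate

# Sanity check of the u > 0 branch, assuming X, Y are i.i.d. N(0, 1).
def f_XY(x, y):
    return stats.norm.pdf(x) * stats.norm.pdf(y)

def f_UV(u, v):
    if u <= 0:
        return 0.0
    r = np.sqrt(u)
    return (f_XY(r, v - r) + f_XY(-r, v + r)) / (2 * r)

# Marginal of U at a test point: should match the chi-squared(1) density,
# since U = X^2 with X standard normal.
u0 = 1.3
fU, _ = integrate.quad(lambda v: f_UV(u0, v), -np.inf, np.inf)
print(fU, stats.chi2.pdf(u0, df=1))
```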

3.4 Covariance of random vectors

We start with the definition of covariance of two real-valued random variables. We will then generalize the notion to random vectors.

3.4.1 Real-valued random variables

Let \(X\) and \(Y\) be real-valued random variables defined on the same probability space.

  • Covariance measures how two random variables vary together. In particular, let \(X\) and \(Y\) be jointly distributed random variables and let \(μ_X\) and \(μ_Y\) denote their means. Then, \[ \COV(X,Y) = \EXP[(X - μ_X) (Y - μ_Y)].\] Properties of expectation imply that \[ \COV(X,Y) = \EXP[XY] - \EXP[X] \EXP[Y]. \]

  • Correlation coefficient between \(X\) and \(Y\) is defined as \[ρ_{XY} = \frac{\COV(X,Y)}{\sqrt{\VAR(X) \VAR(Y)}}.\]

  • The correlation coefficient satisfies \(\ABS{ρ_{XY}} \le 1\) with equality if and only if \(\PR(aX + bY = c) = 1\) for some \(a,b,c \in \reals\) with \((a,b) \neq (0,0)\). [The proof follows from the Cauchy-Schwarz inequality, which we will study later.]

  • \(X\) and \(Y\) are said to be uncorrelated if \(ρ_{XY} = 0\), which is equivalent to \(\COV(X,Y) = 0\) or \(\EXP[XY] = \EXP[X] \EXP[Y]\).

  • Note that \[\begin{align*} \VAR(X + Y) &= \EXP[ ((X - \EXP[X]) + (Y - \EXP[Y]) )^2 ] \\ &= \VAR(X) + \VAR(Y) + 2\COV(X,Y). \end{align*}\] Thus, when \(X\) and \(Y\) are uncorrelated, we have \[ \VAR(X + Y) = \VAR(X) + \VAR(Y). \]

  • Independent random variables are uncorrelated, but the converse is not true.

Example 3.18 Consider the probability space \((Ω, \ALPHABET F, \PR)\) where \(Ω = [0, 2 π)\), \(\ALPHABET F\) is the Borel \(σ\)-algebra on \([0, 2 π)\) and \(\PR\) is the uniform distribution on \(Ω\). Define \(X(ω) = \cos ω\) and \(Y(ω) = \sin ω\). Show that \(X\) and \(Y\) are uncorrelated but not independent.

Solution

Since \(X^2 + Y^2 = 1\), the value of \(X\) constrains the value of \(Y\). For instance, \(\PR(X > \tfrac{1}{\sqrt 2}, Y > \tfrac{1}{\sqrt 2}) = 0\) while \(\PR(X > \tfrac{1}{\sqrt 2}) \PR(Y > \tfrac{1}{\sqrt 2}) > 0\). Thus, \(X\) and \(Y\) are not independent.

Observe that

  • \(\displaystyle \EXP[X] = \int_{0}^{2 π} \cos ω \frac{1}{2 π}\, d ω = 0\).

  • \(\displaystyle \EXP[Y] = \int_{0}^{2 π} \sin ω \frac{1}{2 π}\, d ω = 0\).

  • \(\displaystyle \EXP[XY] = \int_{0}^{2 π} \cos ω \sin ω \frac{1}{2 π}\, d ω = \frac{1}{4 π} \int_0^{2 π} \sin 2 ω\, d ω = 0\).

Thus, \[\EXP[XY] = \EXP[X]\EXP[Y],\] i.e., \(X\) and \(Y\) are uncorrelated.
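
A short Monte Carlo illustration of this example is given below; the nonlinear functions \(X^2\) and \(Y^2\) are used (our choice) to exhibit the dependence.

```python
import numpy as np

# Monte Carlo illustration of Example 3.18: X = cos(w), Y = sin(w), w ~ Unif[0, 2*pi).
rng = np.random.default_rng(2)
w = rng.uniform(0, 2 * np.pi, size=1_000_000)
X, Y = np.cos(w), np.sin(w)

print(np.mean(X * Y) - np.mean(X) * np.mean(Y))              # ~ 0: uncorrelated
# Dependence shows up through nonlinear functions (since X^2 + Y^2 = 1 always):
print(np.mean(X**2 * Y**2), np.mean(X**2) * np.mean(Y**2))   # ~ 0.125 vs ~ 0.25
```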

3.4.2 Random vectors

The idea of covariance generalizes to random vectors. First we recall the definition of expectation for random vectors and random matrices.

  • For a random vector \(X = [X_1, \dots, X_n] \in \reals^n\), we have \[ \EXP[X] = \MATRIX{ \EXP[X_1] & \cdots & \EXP[X_n] }. \]

  • For a random matrix \(X = \MATRIX{ X_{1,1} & \cdots & X_{1,n} \\ X_{2,1} & \cdots & X_{2,n} \\ \vdots & \vdots & \vdots \\ X_{m,1} & \cdots & X_{m,n} } \in \reals^{m \times n}\), we have \[ \EXP[X] = \MATRIX{ \EXP[X_{1,1}] & \cdots & \EXP[X_{1,n}] \\ \EXP[X_{2,1}] & \cdots & \EXP[X_{2,n}] \\ \vdots & \vdots & \vdots \\ \EXP[X_{m,1}] & \cdots & \EXP[X_{m,n}] }. \]

With the above notation, we have the following.

  1. The covariance matrix of a random vector \(X \in \reals^n\) is defined as \[Σ_X = \COV(X) = \EXP[ (X - μ_X) (X - μ_X)^\TRANS].\]

  2. When \(X = [X_1, \dots, X_n]^\TRANS\), the covariance can be written as \[ Σ_X = \MATRIX{ \VAR(X_1) & \COV(X_1, X_2) & \cdots & \COV(X_1, X_n) \\ \COV(X_2, X_1) & \VAR(X_2) & \cdots & \COV(X_2, X_n) \\ \vdots & \vdots & \ddots & \vdots \\ \COV(X_n, X_1) & \COV(X_n, X_2) & \cdots & \VAR(X_n) } \]

  3. It is easy to see that \(Σ_X = Σ_X^\TRANS\). Thus, the covariance matrix is symmetric.

  4. If the components \(X_1, \dots, X_n\) are independent, then \(Σ_X\) is a diagonal matrix.

  5. \(Σ_X\) is positive semidefinite, i.e., its eigenvalues are real and non-negative. To see this, observe that for any (deterministic) vector \(v\), we have \[\begin{align*} v^\TRANS Σ_X v &= v^\TRANS \EXP[ (X - μ_X) (X - μ_X)^\TRANS ] v \\ &= \EXP[ v^\TRANS (X - μ_X) (X - μ_X)^\TRANS v ] \\ &= \EXP[ w^\TRANS w ] = \EXP[ \NORM{w}^2 ] \ge 0. \end{align*}\] where \(w = (X - μ_X)^\TRANS v\).

  6. The cross covariance matrix of random vectors \(X \in \reals^n\) and \(Y \in \reals^m\) is an \(n × m\) matrix given by \[ Σ_{XY} = \COV(X,Y) = \EXP[ (X - μ_X) (Y - μ_Y)^\TRANS ]. \]

  7. Two random vectors \(X\) and \(Y\) are called uncorrelated if \(Σ_{XY} = 0\), which is equivalent to \[ \EXP[XY^\TRANS] = \EXP[X] \EXP[Y]^\TRANS. \]

  8. Two random vectors \(X\) and \(Y\) are called orthogonal if \[\EXP[X Y^\TRANS] = 0 \]

3.4.3 Linear transformation of mean and covariance

Consider a random vector (not necessarily Gaussian) \(X = [X_1, \dots, X_n]^\TRANS\) with mean \(μ_X\) and covariance \(Σ_X\).

Let \(A \in \reals^{n × n}\) be a deterministic matrix and \(b \in \reals^n\) be a deterministic vector.

Define \(Y = A X + b\) and let \(μ_Y\) and \(Σ_Y\) denote the mean and covariance of \(Y\). Then, \[ μ_Y = A μ_X + b \quad\text{and}\quad Σ_Y = A Σ_X A^\TRANS \]

The property of the mean follows immediately from properties of expectation. For the covariance, note that \[\begin{align*} Σ_Y &= \EXP[ (Y - μ_Y) (Y - μ_Y)^\TRANS ] \\ &= \EXP[ (A X - A μ_X) (AX - A μ_X)^\TRANS ] \\ &= \EXP[ A (X - μ_X) (X - μ_X)^\TRANS A^\TRANS ] \\ &= A Σ_X A^\TRANS \end{align*}\]
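
A minimal empirical check of these relations is sketched below, with \(μ_X\), \(Σ_X\), \(A\), and \(b\) chosen arbitrarily for the sketch (the samples are drawn from a Gaussian purely for convenience; the relations do not require Gaussianity).

```python
import numpy as np

# Empirical check that Y = A X + b has mean A mu_X + b and covariance A Sigma_X A^T.
# mu_X, Sigma_X, A, b below are arbitrary; Gaussian samples are used only for convenience.
rng = np.random.default_rng(3)
mu_X = np.array([1.0, -1.0, 0.5])
Sigma_X = np.array([[2.0, 0.3, 0.0],
                    [0.3, 1.0, 0.2],
                    [0.0, 0.2, 0.5]])
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, -1.0],
              [3.0, 0.0, 1.0]])
b = np.array([0.0, 1.0, 2.0])

X = rng.multivariate_normal(mu_X, Sigma_X, size=500_000)   # one sample per row
Y = X @ A.T + b

print(np.abs(Y.mean(axis=0) - (A @ mu_X + b)).max())               # ~ 0
print(np.abs(np.cov(Y, rowvar=False) - A @ Sigma_X @ A.T).max())   # ~ 0 (Monte Carlo error)
```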

3.5 Multi-dimensional Gaussian

An \(n\)-dimensional Gaussian with mean \(μ\) and covariance \(Σ\), written as \(\mathcal{N}(μ, Σ)\), has the PDF \[ f_X(x) = \frac{1}{\sqrt{(2 π)^n \det(Σ)}} \exp\left( - \tfrac{1}{2} (x - μ)^\TRANS Σ^{-1} (x-μ) \right), \quad x \in \reals^n. \]

We consider a few special cases to understand the definition.

  1. Consider the special case when all components are independent. Let \[ μ = \MATRIX{μ_1 \\ \vdots \\ μ_n}, \quad\text{and}\quad Σ = \MATRIX{σ_1^2 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & σ_n^2 } \] where \(μ_i = \EXP[X_i]\) and \(σ_i^2 = \VAR(X_i)\). Observe that \[ \det(Σ) = \prod_{i=1}^n σ_i^2 \] and \[ (x - μ)^\TRANS Σ^{-1} (x - μ) = \sum_{i=1}^n \frac{(x_i - μ_i)^2}{σ_i^2}. \] Thus, \[ f_X(x) = \prod_{i=1}^n \frac{1}{\sqrt{2 π σ_i^2}} \exp\left(-\frac{(x_i-μ_i)^2}{2 σ_i^2} \right) \] which is the product of the marginal distributions.

  2. Consider the two-dimensional Gaussian vector and let \(σ_i^2 = \VAR(X_i)\) and \(ρ = \COV(X_1,X_2)/σ_1 σ_2\). Then, \[ Σ = \MATRIX{ σ_1^2 & ρ σ_1 σ_2 \\ ρ σ_1 σ_2 & σ_2^2 }. \] When \(\ABS{ρ} = 1\), the distribution is singular. When \(\ABS{ρ} < 1\), \(\det(Σ) = (1 - ρ^2)σ_1^2 σ_2^2\) and \[ Σ^{-1} = \frac{1}{(1 - ρ^2)σ_1^2 σ_2^2} \MATRIX{ σ_2^2 & -ρ σ_1 σ_2 \\ -ρ σ_1 σ_2 & σ_1^2}. \] Thus, \[ (x-μ)^\TRANS Σ^{-1} (x - μ) = \frac{1}{1 - ρ^2} \biggl[ \frac{(x_1 - μ_1)^2}{σ_1^2 } - 2 ρ \frac{(x_1 - μ_1)(x_2 - μ_2)}{σ_1 σ_2} + \frac{(x_2 - μ_2)^2}{σ_2^2 } \biggr]. \] Thus, the level set \[(x - μ)^\TRANS Σ^{-1} (x - μ) = k^2\] is an ellipse centered at \(μ\).

    • When \(ρ = 0\), the axes of the ellipse are aligned with the coordinates.
    • When \(ρ > 0\), the ellipse tilts towards the \(x_1 = x_2\) diagonal.
    • When \(ρ < 0\), the ellipse tilts towards the anti-diagonal.
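
The orientation of the ellipse can be read off from the eigenvectors of \(Σ\). A small sketch with \(σ_1 = σ_2 = 1\) and \(ρ = 0.8\) (values chosen only for illustration):

```python
import numpy as np

# Level-set orientation of a 2D Gaussian: the ellipse axes are the eigenvectors
# of Sigma. Here sigma_1 = sigma_2 = 1 and rho = 0.8 (values chosen for illustration).
rho = 0.8
Sigma = np.array([[1.0, rho],
                  [rho, 1.0]])
eigvals, eigvecs = np.linalg.eigh(Sigma)
print(eigvals)           # [1 - rho, 1 + rho] = [0.2, 1.8]
print(eigvecs[:, -1])    # eigenvector of the largest eigenvalue: proportional to [1, 1]
```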

3.5.1 Linear transformation of Gaussian vectors

  1. If real-valued random variables \(X\) and \(Y\) are jointly Gaussian, then \(X + Y\) is Gaussian. We will prove this later using moment generating functions.

  2. If \(X\) and \(Y\) are jointly Gaussian, then they are independent if and only if they are uncorrelated.

  3. Let \(X\) be a Gaussian random vector with mean \(μ_X\) and covariance \(Σ_X\) and let \(Y = AX + b\) for a constant matrix \(A\) and vector \(b\). Then \(Y\) is also a Gaussian vector with \[μ_Y = A μ_X + b \quad\text{and}\quad Σ_Y = A Σ_X A^\TRANS. \]

  4. Let \(X \sim \mathcal{N}(μ, Σ)\) with \(Σ\) positive definite and define \(Z = Σ^{-\frac 12}(X - μ)\). Then, \[ μ_Z = Σ^{-\frac 12}(μ - μ) = 0 \quad\text{and}\quad Σ_Z = Σ^{-\frac 12} Σ Σ^{-\frac 12} = I. \] Thus, \(Z \sim \mathcal{N}(0, I)\).
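
A minimal sketch of this whitening transformation, with \(μ\) and \(Σ\) chosen arbitrarily: \(Σ^{-1/2}\) is formed from an eigendecomposition and the sample covariance of \(Z\) is checked against the identity.

```python
import numpy as np

# Whitening sketch: build Sigma^{-1/2} from an eigendecomposition and check that
# Z = Sigma^{-1/2} (X - mu) has sample covariance close to the identity.
# The particular mu and Sigma are arbitrary choices for this sketch.
rng = np.random.default_rng(4)
mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])

eigvals, U = np.linalg.eigh(Sigma)
Sigma_inv_sqrt = U @ np.diag(eigvals ** -0.5) @ U.T   # symmetric inverse square root

X = rng.multivariate_normal(mu, Sigma, size=500_000)
Z = (X - mu) @ Sigma_inv_sqrt                         # Sigma_inv_sqrt is symmetric
print(np.cov(Z, rowvar=False))                        # close to the 2x2 identity
```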