3 Random vectors
Suppose \(X\) and \(Y\) are two random variables defined on the same probability space. The CDFs \(F_X\) and \(F_Y\) provide information about their individual probabilities. However, we are often interested in functions of the two random variables, e.g., \[ X + Y, \quad XY, \quad \max(X,Y), \quad \min(X,Y), \quad \text{etc.} \] To understand how they behave together, we need to think of the random vector \((X,Y)\) taking values in \(\reals^2\). The natural way to do so is to think of the joint CDF \[ F_{X,Y}(x,y) = \PR(\{ ω \in Ω : X(ω) \le x, Y(ω) \le y \}) \] where we may write the right hand side as \(\PR(X \le x, Y \le y)\) for short.
Lemma 3.1 (Properties of CDFs)
Regularity properties
\(\lim_{x \to -∞} F_{X,Y}(x,y) = 0\), \(\lim_{y \to -∞} F_{X,Y}(x,y) = 0\), and \(\lim_{x,y \to +∞} F_{X,Y}(x,y) = 1\).
Joint CDFs are non-decreasing, i.e., if \(x_1 \le x_2\) and \(y_1 \le y_2\), then \(F_{X,Y}(x_1,y_1) \le F_{X,Y}(x_2,y_2)\).
Joint CDFs are continuous from above, i.e., \[\lim_{u,v \downarrow 0}F_{X,Y}(x+u,y+v) = F_{X,Y}(x,y).\]
\(\PR(X = x, Y = y) = F_{X,Y}(x,y) - F_{X,Y}(x^{-},y) - F_{X,Y}(x,y^{-}) + F_{X,Y}(x^{-},y^{-})\), where \(F_{X,Y}(x^-,y)\) denotes \(\lim_{u \uparrow x} F_{X,Y}(u,y)\), and similarly for the other terms.
Marginalization of joint CDFs
- \(\lim_{y \to ∞} F_{X,Y}(x,y) = F_X(x)\)
- \(\lim_{x \to ∞} F_{X,Y}(x,y) = F_Y(y)\)
Example 3.1 Consider random variables \(X\) and \(Y\) with joint CDF \(F_{X,Y}\). Show that \[ \PR(a < X \le b, c < Y \le d) = F_{X,Y}(b,d) - F_{X,Y}(a,d) - F_{X,Y}(b,c) + F_{X,Y}(a,c). \] Hint: Draw each region.
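The identity can also be checked numerically. The sketch below is only for illustration: it assumes \(F_{X,Y}\) is the CDF of a bivariate normal with an arbitrarily chosen correlation and rectangle, and compares the right-hand side against a Monte Carlo estimate of the left-hand side.

```python
# Numerical sanity check of the rectangle formula (illustrative choice of F_{X,Y}).
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
dist = multivariate_normal(mean=[0, 0], cov=[[1, 0.5], [0.5, 1]])

def F(x, y):
    """Joint CDF F_{X,Y}(x, y) of the chosen bivariate normal."""
    return dist.cdf([x, y])

a, b, c, d = -0.5, 1.0, -1.0, 0.8
rhs = F(b, d) - F(a, d) - F(b, c) + F(a, c)

# Monte Carlo estimate of P(a < X <= b, c < Y <= d)
samples = dist.rvs(size=200_000, random_state=rng)
X, Y = samples[:, 0], samples[:, 1]
lhs = np.mean((X > a) & (X <= b) & (Y > c) & (Y <= d))

print(lhs, rhs)   # the two numbers should agree to ~2-3 decimal places
```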
Example 3.2 Suppose \[ F_{X,Y}(x,y) = \begin{cases} \dfrac{y + e^{-x(y+1)}}{y+1} - e^{-x}, & x,y > 0 \\ 0, & \text{otherwise} \end{cases} \] Find \(F_X(x)\) and \(F_Y(y)\).
We have \[ F_X(x) = \lim_{y \to ∞} F_{X,Y}(x,y) = \begin{cases} 1 - e^{-x}, & x > 0 \\ 0 , & \text{otherwise} \end{cases} \] \[ F_Y(y) = \lim_{x \to ∞} F_{X,Y}(x,y) = \begin{cases} \dfrac{y}{y+1}, & y > 0 \\ 0 , & \text{otherwise} \end{cases} \]
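For readers who want to double-check such limit computations, here is a minimal symbolic sketch using sympy. The symbols are declared positive since the given formula for \(F_{X,Y}\) applies only for \(x, y > 0\).

```python
# Symbolic verification of the marginals in Example 3.2.
import sympy as sp

x, y = sp.symbols('x y', positive=True)
F = (y + sp.exp(-x*(y + 1)))/(y + 1) - sp.exp(-x)   # joint CDF for x, y > 0

F_X = sp.limit(F, y, sp.oo)   # marginal CDF of X
F_Y = sp.limit(F, x, sp.oo)   # marginal CDF of Y

print(sp.simplify(F_X))       # 1 - exp(-x)
print(sp.simplify(F_Y))       # y/(y + 1)
```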
3.1 Classification of random vectors
As was the case for random variables, we can also classify random vectors as discrete, continuous, and mixed.
A random vector \((X,Y)\) is called jointly discrete if it takes values in a countable subset of \(\reals^2\) (we denote this subset by \(\ALPHABET X \times \ALPHABET Y\)). The jointly discrete random variables have a joint PMF \(P_{X,Y} \colon \reals^2 \to [0,1]\) given by \[ \PR(X = x, Y = y) = P_{X,Y}(x,y). \]
A random vector \((X, Y)\) is called jointly continuous (or absolutely continuous) if there exists an integrable function \(f_{X,Y} \colon \reals^2 \to [0, ∞)\) such that \[ F_{X,Y}(x,y) = \int_{-∞}^x \int_{-∞}^{y} f_{X,Y}(u,v)\, dv\, du, \quad x,y \in \reals. \] \(f_{X,Y}\) is called the joint PDF and can be computed as \[ f_{X,Y}(x,y) = \frac{∂^2}{∂x ∂y}F_{X,Y}(x,y). \]
A random vector \((X,Y)\) is called jointly mixed if it is neither jointly discrete nor jointly continuous.
Continuity of the CDF is not sufficient for a joint density to exist. Consider the joint CDF \[ F_{X,Y}(x,y) = \begin{cases} 0, & \min(x,y) < 0 \\ \min(x,y), & 0 \le \min(x,y) \le 1 \\ 1, & \min(x,y) > 1 \end{cases} \] This is a valid CDF. It corresponds to \[ (X,Y) = (U,U), \quad \text{where } U \sim \text{Unif}[0,1], \] i.e., the distribution is uniform along the diagonal of the unit square.
The diagonal is a set of Lebesgue measure zero. That means that for any ordinary function \(f_{X,Y}\), \[ \iint_{A} f_{X,Y}(x,y)\, dx\,dy = 0 \] for every subset \(A\) of the diagonal. Since all the probability mass lies on the diagonal, the density would have to be zero outside it; but then the density cannot integrate to one over the entire plane. Therefore, mathematically, a density does not exist. Such random variables are said to be singular.
Sometimes in the engineering literature, the PDF is set to \[ f_{X,Y}(x,y) = δ(x-y), \quad 0 \le x,y \le 1 \] which produces the correct \(F_{X,Y}\); but delta functions are generalized functions, not ordinary densities. Thus, although the CDF is continuous, the random vector does not have a joint density.
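A short simulation makes the singularity visible: the empirical joint CDF of \((U,U)\) matches \(\min(x,y)\) on the unit square, yet every sample lies exactly on the diagonal. (The sample size and test points below are arbitrary choices.)

```python
# Simulation of the singular example (X, Y) = (U, U) with U ~ Unif[0, 1].
import numpy as np

rng = np.random.default_rng(0)
U = rng.uniform(0, 1, size=200_000)
X, Y = U, U

for (x, y) in [(0.3, 0.7), (0.5, 0.5), (0.9, 0.2)]:
    empirical = np.mean((X <= x) & (Y <= y))   # empirical F_{X,Y}(x, y)
    print(empirical, min(x, y))                # should agree closely

print(np.max(np.abs(X - Y)))   # 0.0 -- every sample lies on the diagonal
```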
Lemma 3.2 (Properties of PMFs and PDFs)
Properties of PMFs
Normalization. For a jointly discrete random vector \((X,Y)\), \[\sum_{(x,y) \in \ALPHABET X × \ALPHABET Y}P_{X,Y}(x,y) = 1.\]
For any event \(A \in \ALPHABET F\), \[\PR((X,Y) \in A) = \sum_{(x,y) \in (\ALPHABET X \times \ALPHABET Y) \cap A} P_{X,Y}(x,y).\]
Marginalization.
- \(\displaystyle \sum_{x \in \ALPHABET X} P_{X,Y}(x,y) = P_Y(y)\).
- \(\displaystyle \sum_{y \in \ALPHABET Y} P_{X,Y}(x,y) = P_X(x)\).
Properties of PDFs
Normalization. For a jointly continuous random vector \((X,Y)\), \[\int_{-∞}^{∞} \int_{-∞}^{∞} f_{X,Y}(x,y)\, dxdy = 1.\]
For any event \(A \in \ALPHABET F\), \[\PR((X,Y) \in A) = \iint_{(x,y) \in A} f_{X,Y}(x,y)\,dxdy.\]
Marginalization.
- \(\displaystyle \int_{-∞}^{∞} f_{X,Y}(x,y) dx = f_Y(y)\).
- \(\displaystyle \int_{-∞}^{∞} f_{X,Y}(x,y) dy = f_X(x)\).
The above discussion generalizes in the obvious manner to more than two random variables. Thus, we can talk about random vectors \(X = (X_1, \dots, X_n) \in \reals^n\). In practice, we often do not distinguish between random variables and random vectors and refer to both simply as random variables.
Example 3.3 Consider jointly discrete random variables \(X \in \{1,2,3\}\) and \(Y \in \{1, 2, 3\}\) with joint PMF \[ P_{X,Y} = \MATRIX{ 0.1 & 0.1 & 0.2 \\ 0.2 & 0.1 & 0 \\ 0.3 & 0 & 0 } \]
- Find the marginals \(P_X\) and \(P_Y\).
- Find the probability of the event \(A = \{ X + Y = 3 \}\).
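A minimal numerical sketch of this example is given below; it assumes that the rows of the matrix index \(x = 1,2,3\) and the columns index \(y = 1,2,3\).

```python
# Marginals and event probability for Example 3.3 (rows: x, columns: y).
import numpy as np

P = np.array([[0.1, 0.1, 0.2],
              [0.2, 0.1, 0.0],
              [0.3, 0.0, 0.0]])
xs = ys = np.array([1, 2, 3])

P_X = P.sum(axis=1)     # marginal of X: [0.4, 0.3, 0.3]
P_Y = P.sum(axis=0)     # marginal of Y: [0.6, 0.2, 0.2]

# P(X + Y = 3): sum the PMF over all (x, y) pairs with x + y = 3
mask = xs[:, None] + ys[None, :] == 3
prob_A = P[mask].sum()  # 0.1 + 0.2 = 0.3

print(P_X, P_Y, prob_A)
```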
Example 3.4 Consider \(F_{X,Y}\) given in Example 3.2.
- Find the joint density \(f_{X,Y}\)
- Find the marginal densities \(f_X\) and \(f_Y\).
- Find the probability of the event \(A = \{ X + Y \le 1 \}\).
For \(x,y > 0\), we have \[ f_{X,Y}(x,y) = \frac{∂^2}{∂x ∂y}F_{X,Y}(x,y) = x e^{-x(y+1)}. \] Thus, \[ f_{X,Y}(x,y) = \begin{cases} x e^{-x(y+1)}, & x,y > 0 \\ 0, & \text{otherwise} \end{cases} \]
Thus, \[ f_X(x) = \int_{-∞}^{∞} f_{X,Y}(x,y)\,dy = \begin{cases} e^{-x}, & x > 0 \\ 0, & x \le 0 \end{cases} \] and \[ f_Y(y) = \int_{-∞}^{∞} f_{X,Y}(x,y)\,dx = \begin{cases} \dfrac{1}{(1+y)^2}, & y > 0 \\ 0, & y \le 0 \end{cases} \]
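The differentiation and integration above can be cross-checked symbolically; the sketch below uses sympy with the symbols declared positive, since the formulas hold only for \(x, y > 0\). The probability in the last part can be computed the same way by integrating \(f_{X,Y}\) over the region \(\{x, y > 0,\ x + y \le 1\}\).

```python
# Symbolic cross-check of Example 3.4: differentiate the joint CDF to get the
# density, then integrate the density back to the marginals.
import sympy as sp

x, y = sp.symbols('x y', positive=True)
F = (y + sp.exp(-x*(y + 1)))/(y + 1) - sp.exp(-x)

f = sp.simplify(sp.diff(F, x, y))                  # x*exp(-x*(y + 1))
f_X = sp.simplify(sp.integrate(f, (y, 0, sp.oo)))  # exp(-x)
f_Y = sp.simplify(sp.integrate(f, (x, 0, sp.oo)))  # 1/(y + 1)**2

print(f, f_X, f_Y)
```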
Example 3.5 Consider a joint PDF \[ f_{X,Y}(x,y) = c e^{-x} e^{-y}, \quad 0 \le y \le x < ∞. \] Find the constant \(c\).
We know that the joint PDF must integrate to \(1\). Thus, \[\begin{align*} 1 &= \int_{-∞}^{∞} \int_{-∞}^{∞} f_{X,Y}(x,y)\, dy\, dx \\ &= \int_{0}^{∞} \int_{0}^x c e^{-x} e^{-y}\, dy\, dx \\ &= \int_{0}^{∞} c e^{-x}(1 - e^{-x})\, dx \\ &= \frac{c}{2}. \end{align*}\] Therefore, \(c=2\).
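As a numerical cross-check, the double integral over the region \(0 \le y \le x < ∞\) can be evaluated with scipy; the value should be close to \(1/2\), confirming \(c = 2\).

```python
# Numerical cross-check of Example 3.5.
import numpy as np
from scipy.integrate import dblquad

# dblquad integrates func(y, x) with y running between gfun(x) and hfun(x)
integral, _ = dblquad(lambda y, x: np.exp(-x) * np.exp(-y),
                      0, np.inf,                  # x from 0 to infinity
                      lambda x: 0, lambda x: x)   # y from 0 to x
print(integral, 1 / integral)                     # ~0.5 and ~2.0
```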
3.2 Independence of random vectors
Definition 3.1 Two random variables \(X\) and \(Y\) defined on a common probability space \((Ω, \ALPHABET F, \PR)\) are said to be independent if the sigma algebras \(σ(X)\) and \(σ(Y)\) are independent.
The above definition means that if we take any (Borel) subsets \(B_1\) and \(B_2\) of \(\reals\), then the events \(\{X \in B_1\}\) and \(\{Y \in B_2\}\) are independent, i.e., \[ \PR(X \in B_1, Y \in B_2) = \PR(X \in B_1) \PR(Y \in B_2). \]
Using this, we can show the following:
\(X\) and \(Y\) are independent if and only if \[ F_{X,Y}(x,y) = F_X(x) F_Y(y), \quad \forall x, y \in \reals. \]
Two jointly continuous random variables \(X\) and \(Y\) are independent if and only if \[ f_{X,Y}(x,y) = f_X(x) f_Y(y), \quad \forall x, y \in \reals. \]
Two jointly discrete random variables \(X\) and \(Y\) are independent if and only if \[ P_{X,Y}(x,y) = P_X(x) P_Y(y), \quad \forall x, y \in \reals. \]
Example 3.6 Consider the random variables \(X\) and \(Y\) with the joint PMF \(P_{X,Y}\) given in Example 3.3. Are these random variables independent?
Example 3.7 Consider the random variables \(X\) and \(Y\) with the joint CDF \(F_{X,Y}\) given in Example 3.2. Are these random variables independent?
Observe that \(F_{X,Y}(x,y) \neq F_X(x) F_Y(y)\). Hence, the two random variables are not independent.
We can also see this from the densities, because \(f_{X,Y}(x,y) \neq f_X(x) f_Y(y)\).
Example 3.8 Consider random variables \(X\) and \(Y\) with joint PDF \(f_{X,Y}\) which is a uniform distribution on the unit square. Are \(X\) and \(Y\) independent?
Example 3.9 Consider random variables \(X\) and \(Y\) with joint PDF \(f_{X,Y}\) which is a uniform distribution on the unit triangle. Are \(X\) and \(Y\) independent?
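A Monte Carlo sketch contrasting the two examples is given below. It checks the CDF factorization \(F_{X,Y}(x,y) = F_X(x) F_Y(y)\) at a single point; for the triangle we assume, for concreteness, the region \(\{(x,y) : 0 \le y \le x \le 1\}\).

```python
# Independence check at one point: uniform on the unit square vs. unit triangle.
import numpy as np

rng = np.random.default_rng(0)
n = 400_000
x0 = y0 = 0.5   # test point for F_{X,Y}(x0, y0) ?= F_X(x0) F_Y(y0)

# Uniform on the unit square: X and Y independent
Xs, Ys = rng.uniform(size=n), rng.uniform(size=n)
joint = np.mean((Xs <= x0) & (Ys <= y0))
product = np.mean(Xs <= x0) * np.mean(Ys <= y0)
print("square:", joint, product)        # both ~0.25

# Uniform on the triangle {0 <= y <= x <= 1} via rejection: X and Y dependent
keep = Ys <= Xs
Xt, Yt = Xs[keep], Ys[keep]
joint = np.mean((Xt <= x0) & (Yt <= y0))
product = np.mean(Xt <= x0) * np.mean(Yt <= y0)
print("triangle:", joint, product)      # ~0.25 vs ~0.1875 -- not equal
```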
These definitions extend naturally to any number of random variables:
A sequence of random variables \(X_1, \dots, X_n\) are independent if and only if \[ F_{X_1, \dots, X_n}(x_1, \dots, x_n) = \prod_{i=1}^n F_{X_i}(x_i). \]
A sequence of jointly continuous random variables \(X_1, \dots, X_n\) are independent if and only if \[ f_{X_1, \dots, X_n}(x_1, \dots, x_n) = \prod_{i=1}^n f_{X_i}(x_i). \]
A sequence of jointly discrete random variables \(X_1, \dots, X_n\) are independent if and only if \[ P_{X_1, \dots, X_n}(x_1, \dots, x_n) = \prod_{i=1}^n P_{X_i}(x_i). \]
Unlike independence of events (where the product rule must be verified for every sub-collection), we do not need to separately check independence of subsets of random variables: it is automatically implied by the marginalization property, as the next example illustrates.
Example 3.10 Suppose \((X, Y, Z)\) are independent random variables with joint PDF \[ f_{X,Y,Z}(x,y,z) = f_X(x)\, f_Y(y)\, f_Z(z). \]
Show that \(X\) and \(Y\) are independent.
To find the joint PDF of \(X\) and \(Y\), marginalize over \(Z\): \[ f_{X,Y}(x,y) = \int_{-\infty}^{\infty} f_{X,Y,Z}(x,y,z)\, dz = \int_{-\infty}^{\infty} f_X(x)\, f_Y(y)\, f_Z(z)\, dz. \]
Since \(f_X(x)\) and \(f_Y(y)\) do not depend on \(z\): \[ f_{X,Y}(x,y) = f_X(x)\, f_Y(y) \int_{-\infty}^{\infty} f_Z(z)\, dz = f_X(x)\, f_Y(y) \cdot 1. \]
Thus, \(X\) and \(Y\) remain independent after marginalization.
An immediate implication of the definition of independence is the following.
Proposition 3.1 Let \(X\) and \(Y\) be independent random variables defined on a common probability space. Consider \(U = g(X)\) and \(V = h(Y)\) for some (measurable) functions \(g\) and \(h\). Then, \(U\) and \(V\) are independent.
Consider any (Borel) subsets \(B_1\) and \(B_2\) of \(\reals\) and consider the events \(\{ U \in B_1 \}\) and \(\{ V \in B_2 \}\). Note that
- \(\{ U \in B_1 \} = \{ X \in g^{-1}(B_1) \}\).
- \(\{ V \in B_2 \} = \{ Y \in h^{-1}(B_2) \}\).
Since the random variables \(X\) and \(Y\) are independent, the events \(\{ X \in g^{-1}(B_1) \}\) and \(\{ Y \in h^{-1}(B_2) \}\) are independent, which implies that the events \(\{ U \in B_1 \}\) and \(\{ V \in B_2 \}\) are independent. Consequently, the random variables \(U\) and \(V\) are independent.
Proposition 3.2 Let \(X\) and \(Y\) be random variables defined on a common probability space. Then \(X\) and \(Y\) are independent if and only if \[\begin{equation}\label{eq:expectation-product} \EXP[ g(X) h(Y) ] = \EXP[ g(X) ] \EXP[ h(Y) ] \end{equation}\] for all (measurable) functions \(g\) and \(h\).
There are two claims here.
If \(X\) and \(Y\) are independent then \(\eqref{eq:expectation-product}\) holds.
If \(\eqref{eq:expectation-product}\) holds, then \(X\) and \(Y\) are independent.
We will prove the first claim assuming that \(X\) and \(Y\) are jointly continuous. A similar argument works for the discrete case as well. \[\begin{align*} \EXP[ g(X) h(Y) ] &= \int_{-∞}^∞ \int_{-∞}^∞ g(x) h(y) f_{X,Y}(x,y)\, dx\, dy \\ &\stackrel{(a)}= \int_{-∞}^∞ \int_{-∞}^∞ g(x) h(y) f_{X}(x) f_{Y}(y)\, dy\, dx \\ &\stackrel{(b)}= \int_{-∞}^∞ \left[ \int_{-∞}^∞ g(x)f_{X}(x)\, dx \right]h(y) f_{Y}(y) \, dy \\ &\stackrel{(c)}= \left[ \int_{-∞}^∞ g(x)f_{X}(x)\, dx \right] \left[\int_{-∞}^∞ h(y) f_{Y}(y) \, dy \right] \\ &= \EXP[ g(X) ] \EXP [ h(Y) ] \end{align*}\] where \((a)\) follows from the fact that \(X \independent Y\), \((b)\) and \((c)\) are simple algebra, and the last step uses the definition of expectation.
To prove the second claim, pick any (Borel) subsets \(B_1\) and \(B_2\) of \(\reals\) and consider the functions \(g(x) = \IND_{B_1}(x)\) and \(h(y) = \IND_{B_2}(y)\). Observe that \[\begin{align*} \PR(X \in B_1, Y \in B_2) &= \EXP[\IND_{ \{ X \in B_1, Y \in B_2 \}}] \\ &\stackrel{(d)}= \EXP[\IND_{ \{ X \in B_1 \}} \IND_{\{ Y \in B_2 \}}] \\ &\stackrel{(e)}= \EXP[\IND_{ \{ X \in B_1 \}}]\, \EXP[ \IND_{\{ Y \in B_2 \}}] \\ &\stackrel{(f)}= \PR(X \in B_1) \PR(Y \in B_2) \end{align*}\] where \((d)\) uses the fact that the indicator of an intersection is the product of the indicators, \((e)\) follows from \(\eqref{eq:expectation-product}\), and \((f)\) follows from the expectation of an indicator.
The above equation shows that for any arbitrary (Borel) subsets \(B_1\) and \(B_2\) of \(\reals\), \(\PR(X \in B_1, Y \in B_2) = \PR(X \in B_1) \PR(Y \in B_2)\). Hence, \(\{X \in B_1\} \independent \{Y \in B_2 \}\). Since \(B_1\) and \(B_2\) were arbitrary, we have \(X \independent Y\).
Example 3.11 Let \(X\) and \(Y\) be independent random variables defined on a common probability space. Show that
- \(\EXP[XY] = \EXP[X] \EXP[Y]\).
- \(\VAR(XY) = \VAR(X) \VAR(Y) + \VAR(X)\, \EXP[Y]^2 + \VAR(Y)\, \EXP[X]^2\). In particular, \(\VAR(XY) = \VAR(X)\VAR(Y)\) when \(\EXP[X] = \EXP[Y] = 0\).
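A quick Monte Carlo sanity check of both identities is sketched below, assuming (purely for illustration) that \(X\) is normal with mean 1 and standard deviation 2, \(Y\) is uniform on \((0,3)\), and the two are independent.

```python
# Monte Carlo evidence for Example 3.11 with illustrative distributions.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
X = rng.normal(1.0, 2.0, size=n)
Y = rng.uniform(0.0, 3.0, size=n)

# E[XY] vs E[X] E[Y]
print(np.mean(X * Y), np.mean(X) * np.mean(Y))

# The product-variance identity for independent X, Y
lhs = np.var(X * Y)
rhs = (np.var(X) * np.var(Y)
       + np.var(X) * np.mean(Y)**2
       + np.var(Y) * np.mean(X)**2)
print(lhs, rhs)
```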
Definition 3.2 A collection of random variables \(X_1, \dots, X_n\) is called independent and identically distributed (i.i.d.) if the random variables are independent and have the same distribution, i.e., \[ F_{X_1}(x) = F_{X_2}(x) = \cdots = F_{X_n}(x), \quad \forall x \in \reals. \]
3.3 Functions of random variables
In interconnected systems, the output of one system is used as the input to another. To analyze such systems, it is important to understand how to work with functions of random variables.
The same ideas can be used for functions of multiple random variables, as we illustrate via the following examples.
Example 3.12 Let \(X\) and \(Y\) be random variables defined on a common probability space. Define \[ U = \max(X,Y) \quad V = \min(X,Y). \] Find \(F_U\) and \(F_V\).
We first look at \(F_U\). Since \(\max(X,Y) \le u\) if and only if both \(X \le u\) and \(Y \le u\), we have \[ F_U(u) = \PR(X \le u, Y \le u) = F_{X,Y}(u,u).\]
Now consider \(F_V\). Since \(\min(X,Y) \le v\) if and only if \(X \le v\) or \(Y \le v\), the event \(\{V \le v\}\) can be expressed as \[ \{ V \le v \} = \{ X \le v \} \cup \{Y \le v \}.\] Thus, by inclusion–exclusion, \[F_V(v) = F_X(v) + F_Y(v) - F_{X,Y}(v,v). \]
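These formulas are easy to verify by simulation. The sketch below assumes \(X\) and \(Y\) are independent standard normals, so that \(F_{X,Y}(t,t) = F_X(t)F_Y(t)\); any joint distribution we can sample from would work equally well.

```python
# Numeric check of the max/min CDF formulas for independent standard normals.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 500_000
X, Y = rng.standard_normal(n), rng.standard_normal(n)
U, V = np.maximum(X, Y), np.minimum(X, Y)

for t in [-1.0, 0.0, 0.5]:
    F_joint = norm.cdf(t) ** 2                         # F_{X,Y}(t, t)
    print(np.mean(U <= t), F_joint)                    # F_U(t) = F_{X,Y}(t, t)
    print(np.mean(V <= t), 2 * norm.cdf(t) - F_joint)  # F_V(t) = F_X + F_Y - F_{X,Y}
```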
Example 3.13 Suppose \(X_1\) and \(X_2\) are continuous random variables and \(Y = X_1 + X_2\). Find the PDF \(f_Y(y)\).
We can write the CDF \(F_Y(y)\) as follows: \[ F_Y(y) = \PR(X_1 + X_2 \le y) = \int_{-∞}^∞ \int_{-∞}^{y - x_1} f_{X_1,X_2}(x_1, x_2)\, d x_2\, d x_1. \] Therefore, \[\begin{align*} f_Y(y) &= \frac{d F_Y(y)}{dy} \\ &= \int_{-∞}^∞ \frac{d}{dy} \int_{-∞}^{y-x_1} f_{X_1, X_2}(x_1, x_2) \, dx_2\, dx_1 \\ &= \int_{-∞}^∞ f_{X_1, X_2}(x_1, y - x_1)\, dx_1. \end{align*}\]
Example 3.14 Repeat Example 3.13 when \(X_1\) and \(X_2\) are independent.
In this case, \(f_{X_1, X_2}(x_1, x_2) = f_{X_1}(x_1) f_{X_2}(x_2)\). Therefore, we get \[f_Y(y) = \int_{-∞}^{∞} f_{X_1}(x_1) f_{X_2}(y - x_1)\, d x_1 = (f_{X_1} * f_{X_2})(y)\] where \(*\) denotes convolution.
Example 3.15 Repeat Example 3.14 when \(X_1 \sim \text{Poisson}(λ_1)\) and \(X_2 \sim \text{Poisson}(λ_2)\).
Recall that for a Poisson random variable \(X\) with parameter \(λ\) \[ P_X(k) = e^{-λ} \frac{λ^k}{k!}, \quad k \ge 0 \]
Thus, \[\begin{align*} P_Y(n) &= (P_{X_1} * P_{X_2})(n) = \sum_{k=-∞}^{∞} P_{X_1}(k) P_{X_2}(n-k) \\ &=\sum_{k=0}^{n} P_{X_1}(k) P_{X_2}(n-k) \\ &= \sum_{k=0}^n e^{-λ_1 - λ_2} \frac{ λ_1^k λ_2^{n-k} }{ k! (n-k)! } \\ &= e^{-(λ_1 + λ_2)} \frac{1}{n!} \sum_{k=0}^n \frac{n!}{k!(n-k)!} λ_1^k λ_2^{n-k} \\ &= e^{-(λ_1 + λ_2)} \frac{(λ_1 + λ_2)^n}{n!} \end{align*}\]
Thus, \(Y \sim \text{Poisson}(λ_1 + λ_2)\).
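This is easy to confirm numerically: convolving the two Poisson PMFs (truncated to a finite range) reproduces the Poisson\((λ_1 + λ_2)\) PMF. The parameter values below are arbitrary.

```python
# Checking Example 3.15: convolution of two Poisson PMFs vs. Poisson(l1 + l2).
import numpy as np
from scipy.stats import poisson

lam1, lam2 = 2.0, 3.5
k = np.arange(0, 60)                      # truncation; the tails are negligible
conv = np.convolve(poisson.pmf(k, lam1), poisson.pmf(k, lam2))[:len(k)]
direct = poisson.pmf(k, lam1 + lam2)

print(np.max(np.abs(conv - direct)))      # ~0 (floating-point error only)
```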
3.3.1 Change of variables formulas
For continuous random variables, it is possible to obtain a general change of variables formula for the PDF of functions of random variables in terms of their joint PDF. My personal view is that it is simpler to reason about such changes of variables from first principles, but nonetheless it is good to know the results.
Now suppose \(\{X_1, \dots, X_n\}\) are jointly continuous random variables with joint PDF \(f\). Consider \(n\) random variables: \[\begin{align*} Y_1 &= g_1(X_1, \dots, X_n) \\ Y_2 &= g_2(X_1, \dots, X_n) \\ \vdots &= \vdots \\ Y_n &= g_n(X_1, \dots, X_n). \end{align*}\] We can view this as an equation between two \(n\)-dimensional vectors \(Y = \VEC(Y_1, \dots, Y_n)\) and \(X = \VEC(X_1, \dots, X_n)\) written as \[ Y = g(X) \]
As was the case for the scalar system, for a given \(y \in \reals^n\), the vector equation \(y = g(x)\) may have zero, one, or multiple solutions.
If, for a given \(y \in \reals^n\), the equation \(y = g(x)\) has no solution, then \[ f_Y(y) = 0. \]
If the equation \(y = g(x)\) has exactly one solution \(x \in \reals^n\), then \[ f_Y(y) = \frac{f_X(x)}{\ABS{J(x)}}, \quad \text{where } y = g(x)\] and \(J(x)\) denotes the Jacobian of \(g\) evaluated at \(x = (x_1, \dots, x_n)\), i.e., \[ \def\1#1#2{\dfrac{∂ g_{#1}}{∂ x_{#2}}} J(x_1, \dots, x_n) = \DET{ \1 11 & \cdots & \1 1n \\ \vdots & \vdots & \vdots \\ \1 n1 & \cdots & \1 nn } \]
If the equation \(y = g(x)\) has multiple solutions, say \(\{x^{(1)}, \dots, x^{(m)}\}\), then \[ f_Y(y) = \sum_{k=1}^m \frac{f_X(x^{(k)})}{\ABS{J(x^{(k)})}}.\]
Example 3.16 Resolve Example 3.12 using the change of variables formula.
Let \(g_1(x,y) = \max\{x, y\}\) and \(g_2(x,y) = \min\{x,y\}\). Define \[ U = g_1(X,Y) \quad\text{and}\quad V = g_2(X,Y).\]
Define \(g(x,y) = \VEC(g_1(x,y), g_2(x,y))\). Note that \(g\) is not differentiable at \(x=y\).
When \(x > y\), we have \(g_1(x,y) = x\) and \(g_2(x,y) = y\). Thus, \[ J(x,y) = \DET{\1 11 & \1 12 \\ \1 21 & \1 22} = \DET{1 & 0 \\ 0 & 1} = 1. \]
When \(x < y\), we have \(g_1(x,y) = y\) and \(g_2(x,y) = x\). Thus, \[ J(x,y) = \DET{\1 11 & \1 12 \\ \1 21 & \1 22} = \DET{0 & 1 \\ 1 & 0} = -1. \]
We now compute \(f_{U,V}(u,v)\).
If \(u < v\), then the equation \((u,v) = g(x,y)\) has no solution. So we set \[ f_{U,V}(u,v) = 0. \]
If \(u > v\), then the equation \((u,v) = g(x,y)\) has two solutions: \(\{ (u,v), (v,u) \}\). Thus, \[ f_{U,V}(u,v) = \frac{f_{X,Y}(u,v)}{\ABS{1}} + \frac{f_{X,Y}(v,u)}{\ABS{-1}} = f_{X,Y}(u,v) + f_{X,Y}(v,u). \]
If \(u = v\), then the equation \((u,u) = g(x,y)\) has one solution \((u,u)\). Thus, \[ f_{U,V}(u,u) = f_{X,Y}(u,u). \] Note that \(u = v\) is a line in two-dimensional space. (Formally, it is a set of measure zero.) Hence, the choice of \(f_{U,V}\) at \(u = v\) will not affect any probability computations. So we can also set \[ f_{U,V}(u,u) = 0. \]
From the joint PDF \(f_{U,V}\), we can compute the marginals as follows:
For \(U\), we have \[ f_U(u) = \int_{-∞}^{∞} f_{U,V}(u,v) dv = \int_{-∞}^{u} \bigl[ f_{X,Y}(u,v) + f_{X,Y}(v,u) \bigr] dv. \] Therefore, \[ F_U(u) = \int_{-∞}^{u} f_U(\tilde u) d\tilde u = \int_{-∞}^u \int_{-∞}^{\tilde u} \bigl[ f_{X,Y}(\tilde u,v) + f_{X,Y}(v,\tilde u) \bigr] dv d\tilde u. \] Note that \[ \int_{-∞}^u \int_{-∞}^{\tilde u} f_{X,Y}(\tilde u, v) dv d\tilde u = \int_{-∞}^u \int_{-∞}^{x} f_{X,Y}(x, y) dy dx \] and \[\begin{align*} \int_{-∞}^u \int_{-∞}^{\tilde u} f_{X,Y}(v, \tilde u) dv d\tilde u &= \int_{-∞}^u \int_{-∞}^y f_{X,Y}(x,y) dx dy \\ &= \int_{-∞}^u \int_{x}^u f_{X,Y}(x,y) dy dx \end{align*}\] where the last step follows from changing the order of integration.
Substituting these back in the expression for \(F_U(u)\), we get \[ F_U(u) = \int_{-∞}^u \int_{-∞}^{x} f_{X,Y}(x, y) dy dx + \int_{-∞}^u \int_{x}^u f_{X,Y}(x,y) dy dx = \int_{-∞}^u \int_{-∞}^u f_{X,Y}(x,y) dy dx = F_{X,Y}(u,u). \]
For \(V\), we can follow similar algebra as above.
Example 3.17 Let \(X\) and \(Y\) be random variables defined on a common probability space. Define \[ U = X^2 \quad\text{and}\quad V = X + Y. \] Find \(f_{U,V}\).
Let’s consider the system of equations \[ u = x^2 \quad\text{and}\quad v = x + y \] for a given value of \((u,v)\). First observe that \[ J(x,y) = \DET{ 2x & 0 \\ 1 & 1 } = 2x. \]
If \(u < 0\), then the system of equations has no solutions. Therefore, \[ f_{U,V}(u,v) = 0, \quad u < 0. \]
If \(u = 0\), then the system of equations has one solution: \[ x^{(1)} = 0 \quad\text{and}\quad y^{(1)} = v. \] However, \(J(0,v) = 0\). So, \[ f_{U,V}(0,v) = \frac{f_{X,Y}(0,v)}{J(0,v)} \] is undefined. However, since \(u = 0\) is a line in two-dimensions (i.e., a set of measure zero), the choice of \(f_{U,V}\) at \(u = 0\) will not affect any probability computations. So, we set \[ f_{U,V}(0,v) = 0. \]
If \(u > 0\), then the system of equations has two solutions: \[ (x^{(1)}, y^{(1)}) = (+\sqrt{u}, v - \sqrt{u}) \quad\text{and}\quad (x^{(2)}, y^{(2)}) = (-\sqrt{u}, v + \sqrt{u}) \] Therefore, \[ f_{U,V}(u,v) = \frac{f_{X,Y}(\sqrt{u}, v - \sqrt{u})}{2 \sqrt{u}} + \frac{f_{X,Y}(-\sqrt{u}, v + \sqrt{u})}{2 \sqrt{u}}. \]
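As a sanity check of the derived density, the sketch below assumes (for illustration) that \(X\) and \(Y\) are independent standard normals, integrates the derived \(f_{U,V}\) over a region numerically, and compares the result with a direct Monte Carlo estimate of the same probability.

```python
# Numerical check of Example 3.17 under an illustrative standard-normal model.
import numpy as np
from scipy.integrate import dblquad
from scipy.stats import norm

def f_UV(u, v):
    """Derived density of (U, V) for u > 0 (the line u = 0 carries no mass)."""
    s = np.sqrt(u)
    return (norm.pdf(s) * norm.pdf(v - s) + norm.pdf(-s) * norm.pdf(v + s)) / (2 * s)

u0, v0 = 1.0, 0.5
prob, _ = dblquad(lambda v, u: f_UV(u, v),
                  0, u0,                                  # u from 0 to u0
                  lambda u: -np.inf, lambda u: v0)        # v from -inf to v0

rng = np.random.default_rng(0)
X, Y = rng.standard_normal(500_000), rng.standard_normal(500_000)
estimate = np.mean((X**2 <= u0) & (X + Y <= v0))

print(prob, estimate)   # should agree to ~2-3 decimal places
```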
3.4 Correlation and covariance
Let \(X\) and \(Y\) be random variables defined on the same probability space.
Correlation between \(X\) and \(Y\) is defined as \(\EXP[XY]\).
Covariance between \(X\) and \(Y\) is defined as \(\COV(X,Y) = \EXP[(X - μ_X) (Y - μ_Y)]\). The covariance satisfies the following: \[ \COV(X,Y) = \EXP[XY] - \EXP[X] \EXP[Y]. \]
Correlation coefficient between \(X\) and \(Y\) is defined as \[ρ_{XY} = \frac{\COV(X,Y)}{\sqrt{\VAR(X) \VAR(Y)}}.\]
The correlation coefficient satisfies \(\ABS{ρ_{XY}} \le 1\) with equality if and only if \(\PR(aX + bY = c) = 1\) for some \(a,b,c \in \reals\) with \((a,b) \neq (0,0)\). [The proof follows from the Cauchy–Schwarz inequality, which we will study later.]
\(X\) and \(Y\) are said to be uncorrelated if \(ρ_{XY} = 0\), which is equivalent to \(\COV(X,Y) = 0\) or \(\EXP[XY] = \EXP[X] \EXP[Y]\).
Note that \[\begin{align*} \VAR(X + Y) &= \EXP[ ((X - \EXP[X]) + (Y - \EXP[Y]) )^2 ] \\ &= \VAR(X) + \VAR(Y) + 2\COV(X,Y). \end{align*}\] Thus, when \(X\) and \(Y\) are uncorrelated, we have \[ \VAR(X + Y) = \VAR(X) + \VAR(Y). \]
Independent random variables are uncorrelated but the converse is not true.
For example, suppose \(ω \sim \text{Unif}[0, 2π]\) and define \(X = \cos ω\) and \(Y = \sin ω\). The event \(\{X = 1\}\) corresponds to \(ω = 0\), which forces \(\{Y = 0\}\). Thus, \(X\) and \(Y\) are not independent.
Observe that
\(\displaystyle \EXP[X] = \int_{0}^{2 π} \cos ω \frac{1}{2 π}\, d ω = 0\).
\(\displaystyle \EXP[Y] = \int_{0}^{2 π} \sin ω \frac{1}{2 π}\, d ω = 0\).
\(\displaystyle \EXP[XY] = \int_{0}^{2 π} \cos ω \sin ω \frac{1}{2 π}\, d ω = \frac{1}{4{π}} \int_0^{2 π} \cos 2 ω\, d ω = 0\).
Thus, \[\EXP[XY] = \EXP[X]\EXP[Y],\] i.e., \(X\) and \(Y\) are uncorrelated even though they are not independent.
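A simulation of this counterexample is sketched below: the empirical correlation is essentially zero, while the joint behaviour clearly violates independence (for instance, \(|X|\) and \(|Y|\) cannot both be close to one since \(X^2 + Y^2 = 1\)).

```python
# Uncorrelated but dependent: X = cos(omega), Y = sin(omega), omega ~ Unif[0, 2*pi].
import numpy as np

rng = np.random.default_rng(0)
omega = rng.uniform(0, 2 * np.pi, size=1_000_000)
X, Y = np.cos(omega), np.sin(omega)

print(np.mean(X * Y), np.mean(X) * np.mean(Y))    # both ~0: uncorrelated

# Dependence: P(|X| > 0.9, |Y| > 0.9) = 0, yet P(|X| > 0.9) * P(|Y| > 0.9) > 0
print(np.mean((np.abs(X) > 0.9) & (np.abs(Y) > 0.9)),
      np.mean(np.abs(X) > 0.9) * np.mean(np.abs(Y) > 0.9))
```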
3.4.1 Correlation and covariance for random vectors
Some of these concepts also generalize to random vectors. First, we define expected value for random vectors and random matrices.
If \(X = [X_1, \dots, X_n] \in \reals^n\), then \[ \EXP[X] = \MATRIX{ \EXP[X_1] & \cdots & \EXP[X_n] }. \]
If \(X = \MATRIX{ X_{1,1} & \cdots & X_{1,n} \\ X_{2,1} & \cdots & X_{2,n} \\ \vdots & \vdots & \vdots \\ X_{m,1} & \cdots & X_{m,n} } \in \reals^{m \times n}\) is a random matrix, then \[ \EXP[X] = \MATRIX{ \EXP[X_{1,1}] & \cdots & \EXP[X_{1,n}] \\ \EXP[X_{2,1}] & \cdots & \EXP[X_{2,n}] \\ \vdots & \vdots & \vdots \\ \EXP[X_{m,1}] & \cdots & \EXP[X_{m,n}] }. \]
With the above notation, we can define the following:
The correlation matrix of a random vector \(X \in \reals^n\) is defined as \[ R = \EXP[X X^\TRANS],\] where \(X^\TRANS\) denotes the transpose of \(X\).
The correlation matrix is symmetric, i.e., \(R = R^\TRANS\).
The covariance matrix of a random vector \(X \in \reals^n\) is defined as \[\COV(X) = \EXP[ (X - μ_X) (X - μ_X)^\TRANS].\]
The covariance matrix is symmetric. Moreover, \([\COV(X)]_{i,j} = \COV(X_i,X_j)\).
The cross correlation matrix of random vectors \(X \in \reals^n\) and \(Y \in \reals^m\) is an \(n × m\) matrix given by \[ R_{XY} = \EXP[X Y^\TRANS]. \]
The cross covariance matrix of random vectors \(X \in \reals^n\) and \(Y \in \reals^m\) is an \(n × m\) matrix given by \[ \COV(X,Y) = \EXP[ (X - μ_X) (Y - μ_Y)^\TRANS ]. \]
Two random vectors \(X\) and \(Y\) are called uncorrelated if \[\EXP[ (X - μ_X) (Y - μ_Y)^\TRANS ] = 0. \]
Two random vectors \(X\) and \(Y\) are called orthogonal if \[\EXP[X Y^\TRANS] = 0. \]
Both the correlation and covariance matrices are positive semidefinite. Thus, their eigenvalues are real and non-negative.
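The sketch below estimates \(R\) and \(\COV(X)\) from samples of an arbitrarily chosen linearly transformed Gaussian vector (the transform \(A\) and mean are illustrative choices) and checks symmetry and the non-negativity of the eigenvalues.

```python
# Sample-based correlation and covariance matrices of a 3-dimensional vector.
import numpy as np

rng = np.random.default_rng(0)
n, dim = 100_000, 3
A = rng.normal(size=(dim, dim))                    # arbitrary mixing matrix
mean = np.array([1.0, -2.0, 0.5])                  # arbitrary mean vector
samples = rng.normal(size=(n, dim)) @ A.T + mean   # rows are samples of X

R = samples.T @ samples / n                  # estimate of R = E[X X^T]
mu = samples.mean(axis=0)
C = (samples - mu).T @ (samples - mu) / n    # estimate of Cov(X)

print(np.allclose(R, R.T), np.allclose(C, C.T))      # both symmetric
print(np.linalg.eigvalsh(R), np.linalg.eigvalsh(C))  # all eigenvalues >= 0
```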