2  Random variables

Updated

September 13, 2025

  1. In many situations, we are not directly interested in the outcome of a random experiment, but a consequence or measurement of the outcome. Such consequences may be thought of as a function of the outcome and are called random variables.

  2. Example 2.1 Suppose a fair coin is tossed twice and we measure the number of heads. For any \(ω \in Ω\), let \(X(ω)\) denote the number of heads when \(ω\) occurs. Thus, \[ X(HH) = 2, \quad X(HT) = 1, \quad X(TH) = 1, \quad X(TT) = 0. \]

  3. There are two ways to model this situation:

    • Model the underlying probability space (the coin tosses): \[ (Ω, \ALPHABET F, \PR), \quad \text{ where } Ω = \{ HH, HT, TH, TT \}, \ALPHABET F = 2^Ω, \text{ and $\PR$ is uniform on $Ω$}. \] This probability space is shown in Figure 2.1.

      Figure 2.1: Probability space for two independent coin tosses
    • Model the measurement (the number of heads): \[ (\ALPHABET X, \ALPHABET G, \PR_X), \quad \text{ where } \ALPHABET X = \{0,1,2\}, \ALPHABET G = 2^{\ALPHABET X}, \text{ and $\PR_X$ is derived from $\PR$}. \] This probability space is shown in Figure 2.2.

      Figure 2.2: Probability space for number of heads in two independent coin tosses
  4. We don’t need to specify \(\PR_X\) directly, since it is computed from \(\PR\). Therefore, it is parsimonious to model \(X\) as a map: \[ X \colon (Ω, \ALPHABET F) \to (\ALPHABET X, \ALPHABET G) \] with \[ \PR_X(B) = \PR(X^{-1}(B)), \quad \forall B \in \ALPHABET G. \]

  5. For the above probability calculations to make sense, we require that \(X^{-1}(B) \in \ALPHABET F\) whenever \(B \in \ALPHABET G\). This property is called measurability, and a function with this property is called \(\ALPHABET F/\ALPHABET G\)-measurable.

  6. Definition 2.1 A random variable \(X\) is a measurable function from a measurable space \((Ω, \ALPHABET F)\) to another measurable space \((\ALPHABET X, \ALPHABET G)\), which means that for any \(B \in \ALPHABET G\), \(X^{-1}(B) \in \ALPHABET F\).

    If \(P\) is a probability measure on \((Ω, \ALPHABET F)\) (i.e., \((Ω, \ALPHABET F, \PR)\) is a probability space), then \[ \PR_X(B) = \PR(X^{-1}(B)), \quad \forall B \in \ALPHABET G. \] is a probability measure on \((\ALPHABET X, \ALPHABET G)\).

  7. In this lecture, we focus on real-valued random variables, i.e., \((\ALPHABET X, \ALPHABET G) = (\reals, \mathscr{B}(\reals))\), where \(\mathscr{B}(\reals)\) denotes the Borel \(σ\)-algebra (see the last lecture).

  8. In fact, introductory probability textbooks often define a random variable as a measurable map from \((Ω, \ALPHABET F)\) to \((\reals, \mathscr{B}(\reals))\). The more general definition that we have used (a measurable mapping from a measurable space to any other measurable space) is often reserved for a follow-up course on measure theory. I think that it is conceptually simpler to start with the measure-theoretic definition.
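The two models above, and the relation \(\PR_X(B) = \PR(X^{-1}(B))\), can be sketched in a few lines of Python. This is a minimal illustration; the names `P`, `X`, and `pushforward` are ours, purely for illustration.

```python
from fractions import Fraction

# Underlying probability space: uniform measure on the four outcomes.
Omega = ["HH", "HT", "TH", "TT"]
P = {w: Fraction(1, 4) for w in Omega}

# The random variable X: number of heads in the outcome.
def X(w):
    return w.count("H")

def pushforward(B):
    """P_X(B) = P(X^{-1}(B)): total mass of outcomes mapped into B."""
    return sum(P[w] for w in Omega if X(w) in B)

# P_X({1}) collects the mass of {HT, TH}.
assert pushforward({1}) == Fraction(1, 2)
# P_X is a probability measure: total mass 1.
assert pushforward({0, 1, 2}) == 1
```

Exact fractions avoid floating-point noise, so the pushforward probabilities match the values in Example 2.1 exactly.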

2.1 Real-valued random variables

  1. The standard notation in probability theory is to use uppercase letters such as \(X\), \(Y\), \(Z\), etc. to denote random variables and the corresponding lowercase letters \(x\), \(y\), \(z\), etc. to denote the possible numerical values of these variables.

  2. If \(X\) is a (real-valued) random variable on \((Ω, \ALPHABET F, \PR)\), then measurability implies that for every \(B \in \mathscr{B}(\reals)\), \(X^{-1}(B) \in \ALPHABET F\). Rather than explicitly defining the measure \(\PR_X\), we simply use the notation \[\PR(X \in B) \coloneqq \PR(\{ω \in Ω : X(ω) \in B \}).\]

  3. If we take \(B\) to be the interval \((-∞, x]\), then the event \(X^{-1}(B)\) is \(\{ ω \in Ω : X(ω) \le x \}\). Such events are abbreviated as \(\{ω : X(ω) \le x \}\) or \(\{X \le x\}\). Thus, we have

    • \(\PR(X \le x) = \PR(\{ω \in Ω : X(ω) \le x \})\).
    • \(\PR(X = x) = \PR(\{ω \in Ω : X(ω) = x \})\).
    • \(\PR(x < X \le y) = \PR(\{ω \in Ω : x < X(ω) \le y \})\).
  4. The cumulative distribution function (CDF) of a (real-valued) random variable \(X\) is the function \(F_X \colon \reals \to [0,1]\) given by \[ F_X(x) \coloneqq \PR(X \le x) = \PR(\{ ω \in Ω : X(ω) \le x \}). \]

    For instance, in Example 2.1, the CDF is given by \[ F_X(x) = \begin{cases} 0, & \hbox{if } x < 0, \\ \frac 14, & \hbox{if } 0 \le x < 1, \\ \tfrac 34, & \hbox{if } 1 \le x < 2,\\ 1, &\hbox{if } 2 \le x. \end{cases}\]

    Figure 2.3: CDF of number of heads
  5. Some examples of CDFs of random variables:

    Example 2.2 (Constant random variables) The simplest random variable takes a constant value on the whole domain \(Ω\), i.e., \[ X(ω) = c, \quad \forall ω \in Ω \] where \(c\) is a constant. The CDF \(F_X(x) = \PR(X \le x)\) is the step function \[ F_X(x) = \begin{cases} 0, & x < c \\ 1, & x \ge c. \end{cases} \]

    Slightly more generally, we say that \(X\) is almost surely a constant if there exists a \(c \in \reals\) such that \(\PR(X=c) = 1\).

    Figure 2.4: CDF of a constant random variable

    Example 2.3 (Indicator functions) Let \(A\) be an event. Define the indicator of event \(A\), denoted by \(\IND_{A} \colon Ω \to \reals\), as \[ \IND_{A}(ω) = \begin{cases} 1, & \hbox{if } ω \in A \\ 0, & \hbox{otherwise } \end{cases}.\]

    Example 2.4 (Bernoulli random variable) A Bernoulli random variable takes two possible values: value \(0\) with probability \(1-p\) and value \(1\) with probability \(p\). Its CDF is given by \[ F_X(x) = \begin{cases} 0, & x < 0 \\ 1 - p, & 0 \le x < 1 \\ 1, & x \ge 1. \end{cases} \] Observe that \(\IND_A\) is a Bernoulli random variable which takes values \(1\) and \(0\) with probabilities \(\PR(A)\) and \(1 - \PR(A)\), respectively.

    Figure 2.5: CDF of a Bernoulli random variable
  6. Lemma 2.1 (Properties of CDFs)  

    1. \(\PR(X > x) = 1 - F_X(x)\).

    2. \(\PR(x < X \le y) = F_X(y) - F_X(x)\).

    3. \(\lim_{x \to -∞} F_X(x) = 0\) and \(\lim_{x \to +∞} F_X(x) = 1\).

    4. CDFs are non-decreasing, i.e., if \(x < y\), then \(F_X(x) \le F_X(y)\).

    5. CDFs are right continuous, i.e., \(\lim_{h \downarrow 0}F_X(x+h) = F_X(x)\).

    6. \(\PR(X = x) = F_X(x) - F_X(x^{-})\), where \(F_X(x^{-})\) is defined as \(\lim_{h \downarrow 0} F_X(x - h)\).

    1. By definition, \(\{X > x\} = Ω \setminus \{X \le x\}\). Thus, \[\PR(X > x) = 1 - \PR(X \le x) = 1 - F_X(x).\]

    2. \[\PR(x < X \le y) = \PR(X \le y) - \PR(X \le x) = F_X(y) - F_X(x).\] [See assignment 1 for the first equality.]

    3. Define the increasing sequence of events \(A_n = \{ X \le n\}\), \(n \in \naturalnumbers\). By continuity of probability, we have \[\begin{align*} &\quad & \PR\biggl( \bigcup_{n=1}^{∞} A_n \biggr) & = \lim_{n \to ∞} \PR(A_n) \\ \implies && \PR(\{X < ∞\}) &= \lim_{n \to ∞} \PR(X \le n) \\ \implies && \PR(Ω) &= \lim_{n \to ∞} F_X(n) \\ \implies && 1 &= \lim_{n \to ∞} F_X(n). \end{align*}\]

      The argument for the other limit is similar: consider the decreasing sequence of events \(B_n = \{ X \le -n \}\), \(n \in \naturalnumbers\), whose intersection is empty. Then, by continuity of probability, we have \[\begin{align*} &\quad & \PR\biggl( \bigcap_{n=1}^{∞} B_n \biggr) & = \lim_{n \to ∞} \PR(B_n) \\ \implies && \PR(\emptyset) &= \lim_{n \to ∞} \PR(X \le -n) \\ \implies && 0 &= \lim_{n \to ∞} F_X(-n). \end{align*}\]

    4. Recall that \[\begin{align*} F_X(x) &= \PR(X \le x) \\ F_X(y) &= \PR(X \le y) \end{align*}\] Observe that since \(x < y\), we have \(\{X \le x\} \subseteq \{X \le y\}\). Hence, by monotonicity of probability, we have \[\PR(X \le x) \le \PR(X \le y),\] which proves the result.

    5. Consider the decreasing sequence of sets: \[ A_n = \{ X \le x + \tfrac 1n \}, \quad n \in \naturalnumbers. \] Then, by continuity of probability, we have \[\begin{align*} &\quad & \PR\biggl( \bigcap_{n=1}^{∞} A_n \biggr) & = \lim_{n \to ∞} \PR(A_n) \\ \implies && \PR(X \le x) &= \lim_{n \to ∞} \PR(X \le x + \tfrac 1n ) \\ \implies && F_X(x) &= \lim_{n \to ∞} F_X(x + \tfrac 1n). \end{align*}\]

    6. Define the decreasing sequence of sets \[ A_n = \biggl\{ x - \frac 1n < X \le x \biggr\}, \quad n \in \naturalnumbers.\] Observe that by the previous property \[\PR(A_n) = F_X(x) - F_X(x - \tfrac 1n).\] Since \(A_n\) is a decreasing sequence of sets, we have \[\begin{align*} & \quad & \PR\biggl( \bigcap_{n=1}^∞ A_n \biggr) &= \lim_{n \to ∞} \PR(A_n) \\ \implies && \PR(X = x) &= F_X(x) - \lim_{n \to ∞} F_X(x - \tfrac 1n) \\ &&& = F_X(x) - F_X(x^{-}). \end{align*}\]

Exercise 2.1 For \(x < y\), express the following in terms of the CDF:

  1. \(\PR(x \le X \le y)\).
  2. \(\PR(x \le X < y)\).
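The properties in Lemma 2.1 are easy to check numerically on a concrete CDF. Below is a small Python sketch that transcribes the piecewise CDF of Example 2.1 and verifies right-continuity, the jump formula for \(\PR(X = x)\), and the interval formula.

```python
# CDF of the number of heads in two fair coin tosses (Example 2.1),
# transcribed from the piecewise formula in the notes.
def F(x):
    if x < 0:
        return 0.0
    if x < 1:
        return 0.25
    if x < 2:
        return 0.75
    return 1.0

# Right-continuity at the jump x = 1: F(1 + h) -> F(1) as h -> 0+.
assert abs(F(1 + 1e-9) - F(1)) < 1e-6

# P(X = 1) = F(1) - F(1^-): the jump size recovers the point mass 1/2.
jump = F(1) - F(1 - 1e-9)
assert abs(jump - 0.5) < 1e-6

# P(0 < X <= 2) = F(2) - F(0).
assert F(2) - F(0) == 0.75
```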

2.2 Classification of random variables

There are three types of random variables:

  1. A random variable \(X\) is said to be discrete if it takes values in a finite or countable subset \(\ALPHABET X \coloneqq \{x_1, x_2, \dots\}\) of \(\reals\). A discrete random variable has a probability mass function (PMF) \(p \colon \reals \to [0,1]\) which satisfies the following properties:

    • \(p(x) = \PR(X = x) = F_X(x) - F_X(x^{-})\).
    • \(F_X(x) = \sum_{x_n : x_n \le x} p(x_n).\)

    Thus, for a discrete random variable, the CDF is a piecewise constant function.

    Figures 2.3, 2.4, and 2.5 all show CDFs of discrete random variables.

  2. A random variable \(X\) is called continuous if there exists an integrable function \(f_X \colon \reals \to [0, ∞)\), called the probability density function, such that the CDF can be written as \[ F_X(x) = \int_{-∞}^x f_X(t)\, dt. \]

    Thus, for a continuous random variable, the CDF is a continuous function.

    Figure 2.6: CDF of a continuous random variable
  3. A random variable is called mixed if it is neither discrete nor continuous. For a mixed random variable, the CDF has jumps at a finite or countably infinite number of points, and it is continuous over one or more intervals.

    Figure 2.7: CDF of a mixed random variable

    As an example, consider the following random experiment. A fair coin is tossed: if the outcome is heads, then \(X \sim \text{Bernoulli}(0.5)\); if the outcome is tails, then \(X \sim \text{Uniform}[0,1]\). Thus (from the law of total probability), the CDF of \(X\) is given by \[ F_X(x) = \begin{cases} 0, & \hbox{if } x < 0 \\ \frac 14 + \frac x2, & \hbox{if } 0 \le x < 1 \\ 1, & \hbox{if } x \ge 1. \end{cases} \] and is shown in Figure 2.7.
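The mixed example above lends itself to a quick simulation check. The sketch below (helper names, sample size, and seed are ours) draws samples from the two-stage experiment and compares the empirical CDF with the formula from the law of total probability.

```python
import random

random.seed(0)

def sample_mixed():
    """One draw of the mixed random variable: toss a fair coin;
    heads -> Bernoulli(0.5), tails -> Uniform[0,1]."""
    if random.random() < 0.5:                  # heads
        return float(random.random() < 0.5)    # Bernoulli(0.5): 0.0 or 1.0
    return random.random()                     # tails: Uniform[0,1]

def F_mixed(x):
    """CDF from the law of total probability (as derived above)."""
    if x < 0:
        return 0.0
    if x < 1:
        return 0.25 + x / 2
    return 1.0

N = 100_000
samples = [sample_mixed() for _ in range(N)]

# Empirical CDF should track F_mixed, including the jump of 1/4 at x = 0.
for x in [0.0, 0.25, 0.5, 0.9]:
    empirical = sum(s <= x for s in samples) / N
    assert abs(empirical - F_mixed(x)) < 0.01
```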

Lemma 2.2 (Properties of discrete and continuous random variables)  

  • Properties of discrete random variables

    For a discrete random variable \(X\), define the probability mass function (PMF) \(P_X \colon \reals \to \reals\) as \[ P_X(x) = F_X(x) - F_X(x^{-}).\] By construction, \(P_X(x) \ge 0\) and \(\sum_{x \in \ALPHABET X} P_X(x) = 1\).

    Then, for any event \(A \in \mathscr{B}(\reals)\), \[\PR(X \in A) = \sum_{x \in \ALPHABET X \cap A} P_X(x).\]

  • Properties of continuous random variables

    For a continuous random variable \(X\), define the probability density function (PDF) \(f_X \colon \reals \to [0, ∞)\) as \[ f_X(x) = \frac{d}{dx} F_X(x). \] By construction, \(f_X(x) \ge 0\) and \(\int_{-∞}^{∞} f_X(x)\, dx = 1\).

    Then, for any event \(A \in \mathscr{B}(\reals)\), \[\PR(X \in A) = \int_{x \in A} f_X(x)\,dx.\]

2.2.1 Examples of discrete random variables

We now consider some other examples of discrete random variables, which are typically described by specifying their PMF.

Example 2.5 (Binomial random variable) A Binomial random variable is the sum of independent and identically distributed Bernoulli random variables (we will prove this fact later). For example, if a biased coin (with \(\PR(H) = p\)) is tossed \(n\) times, then the number of heads is a binomial random variable with parameters \(n\) and \(p\), which is denoted by \(\text{Binomial}(n,p)\). For such a random variable, \[ P_X(k) = \binom n k p^k (1-p)^{n-k}, \quad 0 \le k \le n. \]
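A minimal Python sketch of the Binomial PMF, using `math.comb` for the binomial coefficient (the parameter values below are ours):

```python
from math import comb

def binomial_pmf(k, n, p):
    """P_X(k) = C(n, k) p^k (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Number of heads in n = 10 tosses of a coin with P(H) = 0.3.
n, p = 10, 0.3
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

assert abs(sum(pmf) - 1.0) < 1e-12         # a PMF sums to 1
assert abs(pmf[0] - (1 - p)**n) < 1e-12    # k = 0 means all tails
```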

Example 2.6 (Geometric random variable) A geometric random variable is the number of trials needed for the first success in a sequence of i.i.d. Bernoulli trials. For example, if a biased coin (with \(\PR(H) = p\)) is tossed repeatedly, the number of tosses needed for the first head is a geometric random variable with parameter \(p \in (0,1)\), which is denoted by \(\text{Geo}(p)\). For such a random variable, \[ P_X(k) = (1-p)^{k-1} p, \quad k \in \integers_{> 0}. \]

Geometric random variables have the property of being memoryless: the distribution of the remaining waiting time does not depend on how much time has already elapsed.
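The memoryless property can be verified numerically from the PMF: \(\PR(X > m + n \mid X > m) = \PR(X > n)\), using \(\PR(X > k) = (1-p)^k\). A small sketch (the helper names and parameter values are ours):

```python
def geom_pmf(k, p):
    """P_X(k) = (1-p)^(k-1) p, for k = 1, 2, ..."""
    return (1 - p)**(k - 1) * p

def tail(k, p, kmax=10_000):
    """P(X > k), computed by summing the PMF; it equals (1-p)^k."""
    return sum(geom_pmf(j, p) for j in range(k + 1, kmax))

p, m, n = 0.3, 5, 7

# Memorylessness: P(X > m + n | X > m) = P(X > n).
lhs = tail(m + n, p) / tail(m, p)
rhs = tail(n, p)
assert abs(lhs - rhs) < 1e-9

# Closed form for the tail probability.
assert abs(tail(n, p) - (1 - p)**n) < 1e-9
```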

Example 2.7 (Poisson random variable) Poisson random variables model many different phenomena, ranging from the photoelectric effect in photonics to packet arrivals in computer networks. A random variable is said to be a Poisson random variable with parameter \(λ > 0\), which is denoted by \(\text{Poisson}(λ)\), if \[ P_X(k) = \frac{λ^k}{k!} e^{-λ}, \quad k \in \integers_{\ge 0}. \]

Poisson random variables model rare events. The Poisson distribution is the limit of a \(\text{Binomial}(n, λ/n)\) distribution as the number \(n\) of trials increases. Poisson random variables have the stability property that the sum of independent Poisson random variables is Poisson.
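The Binomial-to-Poisson limit is easy to check numerically; a sketch comparing \(\text{Binomial}(n, λ/n)\) with \(\text{Poisson}(λ)\) for a large \(n\) (parameter values are ours):

```python
from math import comb, exp, factorial

def poisson_pmf(k, lam):
    """P_X(k) = lam^k e^{-lam} / k!"""
    return lam**k / factorial(k) * exp(-lam)

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

lam = 2.0
# Binomial(n, lam/n) approaches Poisson(lam) as n grows.
for k in range(6):
    approx = binom_pmf(k, 10_000, lam / 10_000)
    exact = poisson_pmf(k, lam)
    assert abs(approx - exact) < 1e-3
```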

Example 2.8 (Uniform random variable) A random variable is said to have a (discrete) uniform distribution over a discrete set \(\ALPHABET S\) if \[P_X(k) = \frac 1{\ABS{\ALPHABET S}}, \quad k \in \ALPHABET S.\]

2.2.2 Some examples of continuous random variables

Example 2.9 (Uniform random variable) A random variable is said to have a (continuous) uniform distribution over an interval \([a, b]\), where \(a < b\) if \[f_X(x) = \frac 1{b - a}, \quad x \in [a,b].\]

Example 2.10 (Exponential random variable) A random variable is said to have an exponential distribution with parameter \(λ > 0\), which is denoted by \(\text{Exp}(λ)\), if \[f_X(x) = λ e^{-λ x}, \quad x \ge 0.\]

Exponential random variables arise in queueing theory, network traffic, and photonics. They have the property of being memoryless: the distribution of the waiting time does not depend on how much time has already elapsed.

Example 2.11 (Gaussian random variable) A random variable is said to have a Gaussian distribution with mean \(μ\) and standard deviation \(σ > 0\), which is denoted by \(\mathcal N(μ, σ^2)\) if \[f_X(x) = \frac 1{\sqrt{2 π}\, σ} \exp\left( -\frac {(x-μ)^2}{2 σ^2} \right), \quad x \in \reals.\]

A Gaussian distribution is also called a Normal distribution. When \(μ = 0\) and \(σ^2 = 1\), the distribution is called standard Normal.

The Gaussian distribution is perhaps the most important continuous distribution because of its role in the Central Limit Theorem. Gaussian random variables have the stability property that the sum of independent Gaussian random variables is Gaussian.

2.3 Functions of random variables

We often encounter situations where we are interested in functions of random variables. Functions of random variables are random variables.

In particular, suppose \[ X \colon (Ω, \ALPHABET F) \to (\reals, \mathscr{B}(\reals)) \] is a random variable, \(g \colon \reals \to \reals\) is a measurable function, and \(Y = g(X)\).

Since \(g\) is measurable, for any (Borel) subset \(B\) of \(\reals\), we have that \(C = g^{-1}(B) \in \mathscr B(\reals)\). Therefore, \(Y^{-1}(B) = X^{-1}(C) \in \ALPHABET F\). Thus, \(Y = g(X)\) is a random variable.

Since \(Y\) is a random variable, it is possible to compute its CDF and PMF/PDF as appropriate. We discuss the details separately for discrete and continuous random variables.

2.3.1 Functions of discrete random variables

In the discrete case, it is easy to find the PMF of \(Y\) in terms of the PMF of \(X\). For example, consider the random variable \(X\) defined in Example 2.1. Let \(g \colon \{0, 1, 2\} \to \{0, 1\}\) be given by \[ g(0) = 0, \quad g(1) = 0, \quad g(2) = 1. \] Then, \[ P_Y(0) = P_X(0) + P_X(1) \quad\text{and}\quad P_Y(1) = P_X(2). \]
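The computation above amounts to pushing each mass \(P_X(x)\) onto the value \(g(x)\); a short Python sketch:

```python
from collections import defaultdict
from fractions import Fraction

# PMF of X from Example 2.1 (number of heads in two fair tosses).
P_X = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

# g maps 0, 1 -> 0 and 2 -> 1.
g = {0: 0, 1: 0, 2: 1}

# PMF of Y = g(X): accumulate each mass P_X(x) at the value g(x).
P_Y = defaultdict(Fraction)
for x, mass in P_X.items():
    P_Y[g[x]] += mass

assert P_Y[0] == Fraction(3, 4)   # P_X(0) + P_X(1)
assert P_Y[1] == Fraction(1, 4)   # P_X(2)
```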

However, it will be useful to visualize this slightly differently. We start by revisiting measurability for discrete random variables. The main points are that

  • A discrete random variable creates a partition on the sample space (we expand on this point below).

  • The collection of all unions of the events \(\{A_1, A_2, \dots\}\) in this partition (equivalently, the power set of the partition) is called the \(σ\)-algebra generated by \(X\) and denoted by \(σ(X)\). This \(σ\)-algebra captures the crux of measurability.

We now discuss the partition generated by a discrete random variable. Let \(X\) be a random variable and \(\ALPHABET X = \{x_1, x_2, \dots, x_n\}\) be the range of \(X\). Define \[A_i = \{ω \in Ω : X(ω) = x_i \} = X^{-1}(x_i).\] Then, \(X\) can be written as \[ X(ω) = \sum_{i=1}^{n} x_i \IND_{A_i}(ω). \]

Note that \(\{A_1, \dots, A_n \}\) are disjoint events and their union is the entire sample space \(Ω\) (because one of \(x_i\)’s must occur). Thus, \(\{A_1, \dots, A_n \}\) is a partition of \(Ω\).

As an illustration, let’s reconsider Example 2.1. In this case, the range of \(X\) is \(\ALPHABET X = \{0, 1, 2\}\). The partition corresponding to \(X\) is shown in Figure 2.8.

Figure 2.8: Illustration of the partition created by \(X\) for Example 2.1

The partition corresponding to \(Y = g(X)\) is shown in Figure 2.9. Observe that this partition is a coarsening of the partition corresponding to \(X\).

Figure 2.9: Illustration of the partition created by \(Y = g(X)\) for Example 2.1

The PMFs of \(X\) and \(Y\) can be obtained by summing up the masses in each element of their respective partitions.

2.3.2 Functions of continuous random variables

For continuous random variables, computing the PDF of a function of a random variable is more involved. We first illustrate the main idea via some examples.

Example 2.12 Suppose \(X \sim \text{Uniform}[0,2]\). Consider a function \(g\) given by \[ g(x) = \begin{cases} x & x \in (0,1] \\ 2 - x & x \in (1,2] \\ 0 & \hbox{otherwise} \end{cases} \] Define \(Y = g(X)\). Find \(F_Y(y)\) and \(f_Y(y)\).

Figure 2.10: The function \(g(x)\) for Example 2.12
Solution

From the definition of \(g\), the range of \(g\) is \([0,1]\); thus, the support of \(Y\) is \([0,1]\).

  1. For any \(y < 0\), the event \(\{Y \le y\} = \emptyset\). Therefore, \(F_Y(y) = 0\).

  2. For any \(y > 1\), the event \(\{Y \le y\} = Ω\). Therefore, \(F_Y(y) = 1\).

  3. Now consider a \(y \in (0,1)\). We have \[ \{Y \le y \} = \{ X \le y \} \cup \{X \ge 2 - y \}. \] Thus, \[ F_Y(y) = F_X(y) + \PR(X \ge 2-y) = \frac {y}{2} + 1 - \frac{2-y}{2} = y. \] Therefore, \[ f_Y(y) = \dfrac{d}{dy} F_Y(y) = 1, \quad y \in [0,1], \] i.e., \(Y\) is \(\text{Uniform}[0,1]\).
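The conclusion of Example 2.12 can be double-checked by simulation: if \(Y = g(X)\) is \(\text{Uniform}[0,1]\), its empirical CDF should be close to \(F_Y(y) = y\). A sketch (sample size and seed are arbitrary choices of ours):

```python
import random

random.seed(1)

def g(x):
    """The tent-shaped function from Example 2.12 on [0, 2]."""
    if 0 < x <= 1:
        return x
    if 1 < x <= 2:
        return 2 - x
    return 0.0

N = 100_000
samples = [g(random.uniform(0, 2)) for _ in range(N)]

# If Y is Uniform[0,1], then F_Y(y) = y on [0, 1].
for y in [0.2, 0.5, 0.8]:
    empirical = sum(s <= y for s in samples) / N
    assert abs(empirical - y) < 0.01
```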

Example 2.13 Suppose \(X \sim \text{Uniform}[0,4]\). Consider a function \(g\) given by \[ g(x) = \begin{cases} x & x \in (0,1] \\ 1 & x \in (1, 3) \\ 4- x & x \in (3,4] \\ 0 & \hbox{otherwise} \end{cases} \] Define \(Y = g(X)\). Find \(F_Y(y)\) and \(f_Y(y)\).

Figure 2.11: The function \(g(x)\) for Example 2.13

Example 2.14 Suppose \(X \sim \mathcal{N}(μ,σ^2)\). Show that \(Z = (X - μ)/σ\) is a standard normal random variable, i.e., \(Z \sim \mathcal{N}(0,1)\).

Solution

We can write the CDF \(F_Z(z)\) as \[\begin{align*} F_Z(z) &= \PR(Z \le z) = \PR\left( \frac{X - μ}{σ} \le z \right) = \PR(X \le σ z + μ) \\ &= \int_{-∞}^{σ z + μ} \frac{1}{\sqrt{2 π}\, σ} \exp\left( - \frac{(x-μ)^2}{2 σ^2} \right) dx \\ &= \int_{-∞}^{z} \frac{1}{\sqrt{2 π}} \exp\left( - \frac{y^2}{2} \right) dy \end{align*}\] where the last step uses the change of variables \(y = (x-μ)/σ\).

Thus, \[f_Z(z) = \frac{d F_Z(z)}{dz} = \frac{1}{\sqrt{2 π}} e^{-z^2/2}.\] Thus, \(Z \sim \mathcal{N}(0,1)\).
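The standardization identity \(F_Z(z) = F_X(σ z + μ)\) can be checked directly with the standard library's `statistics.NormalDist` (note that `NormalDist` takes the standard deviation \(σ\), not the variance; the parameter values are ours):

```python
from statistics import NormalDist

mu, sigma = 3.0, 2.0
X = NormalDist(mu, sigma)     # N(mu, sigma^2)
Z = NormalDist(0.0, 1.0)      # standard normal

# F_Z(z) = P(X <= sigma*z + mu): standardization preserves the CDF.
for z in [-2.0, -0.5, 0.0, 1.0, 2.5]:
    assert abs(X.cdf(sigma * z + mu) - Z.cdf(z)) < 1e-12
```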

Example 2.15 Suppose \(F \colon \reals \to [0,1]\) is a function that satisfies the following properties: there exists a pair \((a,b)\) with \(a < b\) (we allow \(a\) to be \(-∞\) and \(b\) to be \(∞\)) such that

  • \(F(y) = 0\) for \(y \le a\)
  • \(F(y) = 1\) for \(y \ge b\)
  • \(F(y)\) is strictly increasing in \((a,b)\).

Thus, \(F\) satisfies the properties of the CDF of a continuous random variable, and \(F\) is invertible on the interval \((a,b)\).

Suppose \(X \sim \text{Uniform}[0,1]\) and \(Y = F^{-1}(X)\). Show that \(F_Y(y) = F(y)\).

Solution

We can write the CDF \(F_Y(y)\) as \[ F_Y(y) = \PR(Y \le y) = \PR(F^{-1}(X) \le y) \]

Since \(F\) is strictly increasing, \(F^{-1}(X) \le y\) is equivalent to \(X \le F(y)\). Thus, \[ F_Y(y) = \PR(X \le F(y)) = F_X(F(y)) = F(y) \] where the last step uses the fact that \(X\) is uniform over \([0,1]\).
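Example 2.15 is the basis of inverse-transform sampling: feeding \(\text{Uniform}[0,1]\) samples through \(F^{-1}\) produces samples with CDF \(F\). A sketch for \(F\) the \(\text{Exp}(λ)\) CDF, where \(F^{-1}(u) = -\ln(1-u)/λ\) (parameter values, sample size, and seed are ours):

```python
import math
import random

random.seed(2)

lam = 1.5
# For Exp(lam), F(y) = 1 - exp(-lam*y) on (0, inf), so
# F^{-1}(u) = -ln(1 - u) / lam.
def F_inv(u):
    return -math.log(1 - u) / lam

N = 100_000
samples = [F_inv(random.random()) for _ in range(N)]

# Empirical CDF of F^{-1}(U) should match F itself.
for y in [0.5, 1.0, 2.0]:
    target = 1 - math.exp(-lam * y)
    empirical = sum(s <= y for s in samples) / N
    assert abs(empirical - target) < 0.01
```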

2.3.3 Change of variables formula

It is possible to derive a general change-of-variables formula for \(f_Y\).

  1. Suppose \(g\) is a continuous and one-to-one function (from \(\text{Range}(X)\) to \(\text{Range}(Y)\)). Thus, \(g\) must be either strictly increasing or strictly decreasing, and in both cases the inverse \(g^{-1}\) is well defined.

    • If \(g\) is strictly increasing, we have \[ F_Y(y) = \PR(Y \le y) = \PR(X \le g^{-1}(y)) = F_X(g^{-1}(y)). \] Therefore, \[ f_Y(y) = \frac{d F_X(g^{-1}(y))}{dy} = f_X(g^{-1}(y)) \frac{d g^{-1}(y)}{dy}. \]

    • If \(g\) is strictly decreasing, we have \[ F_Y(y) = \PR(Y \le y) = \PR(X \ge g^{-1}(y)) = 1 - F_X(g^{-1}(y)).\] Therefore, \[ f_Y(y) = - \frac{d F_X(g^{-1}(y))}{dy} = - f_X(g^{-1}(y)) \frac{d g^{-1}(y)}{dy}. \]

    • The above two formulas can be combined as \[ \bbox[5pt,border: 1px solid]{f_Y(y) = f_X(g^{-1}(y)) \left| \frac{d g^{-1}(y)}{dy} \right|} \]

  2. From calculus, we know that if \(h(y) = g^{-1}(y)\), then \(h'(y) = 1/g'(h(y))\). Thus, the above expression can be simplified as \[ \bbox[5pt, border: 1px solid]{f_Y(y) = \frac{f_X(x)}{\ABS{g'(x)}}, \quad \text{where } x = g^{-1}(y).} \]

    Example 2.16 Resolve Example 2.14 using the above formula.

  3. If the transform \(g(x)\) is not one-to-one (as in Example 2.12), we can obtain \(f_Y(y)\) as follows. Suppose \(y = g(x)\) has finitely many roots, denoted by \(\{x^{(k)}\}_{k=1}^m\). Then, \[ f_Y(y) = \sum_{k=1}^m \frac{f_X(x^{(k)})}{\ABS{g'(x^{(k)})}}. \]

    Example 2.17 Resolve Example 2.12 using the above formula.
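As a numerical check of the many-to-one formula, consider \(X \sim \text{Uniform}[-1,1]\) and \(Y = X^2\) (our own example, not from the notes): the two roots \(\pm\sqrt{y}\) give \(f_Y(y) = 2 \cdot \frac{1/2}{2\sqrt{y}} = \frac{1}{2\sqrt{y}}\) on \((0,1)\), whose CDF is \(F_Y(y) = \sqrt{y}\). The sketch below compares this against the empirical CDF.

```python
import math
import random

random.seed(3)

# X ~ Uniform[-1, 1], Y = X^2. Two roots +-sqrt(y), f_X = 1/2,
# |g'(x)| = 2|x|, so the formula gives f_Y(y) = 1/(2*sqrt(y))
# on (0, 1), i.e., F_Y(y) = sqrt(y).
N = 100_000
samples = [random.uniform(-1, 1) ** 2 for _ in range(N)]

for y in [0.1, 0.4, 0.9]:
    empirical = sum(s <= y for s in samples) / N
    assert abs(empirical - math.sqrt(y)) < 0.01
```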

2.4 Expectation of random variables

Suppose we generate \(N\) i.i.d. (independent and identically distributed) samples \(\{s_1, s_2, \dots, s_N\}\) of a random variable \(X\) and compute the average: \[ m = \frac 1N \sum_{n=1}^N s_n. \] When \(X\) is discrete and takes values \(\{x_1, x_2, \dots, x_n\}\), we expect that the number of times we obtain a value \(x_i\) is approximately \(NP_X(x_i)\) when \(N\) is large. Thus, \[ m \approx \frac 1N \sum_{i=1}^n x_i \, N P_X(x_i) = \sum_{i=1}^n x_i P_X(x_i). \]

This quantity is called the expectation or the expected value or the mean value of the random variable \(X\) and denoted by \(\EXP[X]\).
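The sample-mean heuristic above is easy to see in simulation; a sketch for the PMF of Example 2.1 (sample size and seed are arbitrary choices of ours):

```python
import random

random.seed(4)

# Values and PMF of the number of heads in two fair tosses (Example 2.1).
values = [0, 1, 2]
pmf = [0.25, 0.5, 0.25]

# Expectation from the PMF: sum of x * P_X(x).
mean = sum(x * p for x, p in zip(values, pmf))

# Average of N i.i.d. samples approaches the expectation.
N = 100_000
m = sum(random.choices(values, weights=pmf, k=N)) / N
assert abs(m - mean) < 0.01
```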

Definition 2.2 The expectation of a random variable \(X\) is defined as follows:

  • when \(X\) is discrete and takes values \(\{x_1, x_2, \dots, x_n \}\), then \[\EXP[X] = \sum_{i=1}^n x_i P_X(x_i).\]

  • when \(X\) is continuous, then \[\EXP[X] = \int_{-∞}^{∞} x f_X(x)\, dx. \]

    Thus, we can think of the expected value as the center of mass of the PDF.

WarningDoes the summation or integration exist?

When \(X\) takes countably or uncountably infinite values, we need to be a bit more precise about what we mean by the summation (or integration) formula above. In particular, we do not want the answer to depend on the order in which we do the summation or the integration (i.e., we do not want an \(∞ - ∞\) situation). This means that the sum or the integral should be absolutely convergent. Such random variables are called integrable random variables.

Formally, expectation is defined only for integrable random variables.

To illustrate why this is important, consider a discrete random variable defined over \(\integers\setminus\{0\}\) where \[ P_X(n) = P_X(-n) = \frac {1}{2C n^2}, \quad n \in \naturalnumbers \] where \(C\) is a normalizing constant given by \[ C = \sum_{n=1}^∞ \frac 1{n^2} = \frac{π^2}{6}. \] Then, observe that \[\begin{align*} \EXP[X] &= \sum_{n=1}^∞ \frac{n}{2 C n^2} + \sum_{n=-∞}^{-1} \frac{n}{2 C n^2} \\ &= \frac 1{2C} \sum_{n=1}^∞ \frac{1}{n} + \frac 1{2C} \sum_{n=-∞}^{-1} \frac{1}{n} \\ &= \frac{∞}{2C} - \frac{∞}{2C} \end{align*}\] which is undefined.

The concern here is that the summation is undefined. Mathematically, we are okay when the summation is unambiguously infinite. For example, consider another random variable \(Y\) defined over \(\naturalnumbers\) for which \[ P_Y(n) = \frac {1}{C n^2}, \quad n \in \naturalnumbers \] where \(C\) is as defined above. This is called the Zipf distribution. By an argument similar to the one above, we see that \[\EXP[Y] = ∞.\]

Example 2.18 Find the mean of \(X\) defined in Example 2.1.

\[ \EXP[X] = \frac 14 \cdot 0 + \frac 12 \cdot 1 + \frac 14 \cdot 2 = 1. \]

Exercise 2.2 Find the expected value of the random variables with the following distributions:

  • \(\text{Bernoulli}(p)\).
  • \(\text{Binomial}(n,p)\).
  • \(\text{Geo}(p)\).
  • \(\text{Poisson}(λ)\).
  • \(\text{Uniform}[a,b]\).
  • \(\text{Exp}(λ)\).

Lemma 2.3 For any (measurable) function \(g \colon \reals \to \reals\), we have

  • when \(X\) is discrete and takes values \(\{x_1, x_2, \dots, x_n \}\), then \[\EXP[g(X)] = \sum_{i=1}^n g(x_i) P_X(x_i).\]

  • when \(X\) is continuous, then \[\EXP[g(X)] = \int_{-∞}^{∞} g(x) f_X(x)\, dx. \]

Both expressions are defined only when the sum/integral is absolutely convergent.

This result is sometimes called the law of the unconscious statistician (LOTUS). One typically shows this result by defining a new random variable \(Y = g(X)\), computing its PMF/PDF \(f_Y\), and then using the definition in Definition 2.2.

A simpler approach is to take Lemma 2.3 as the definition of expectation for any (measurable) function \(g\). Then Definition 2.2 is the special case \(g(x) = x\). No proofs needed!
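Both routes give the same answer on a finite example; a sketch using \(X\) from Example 2.1 and the \(g\) from Section 2.3.1:

```python
from fractions import Fraction

# X from Example 2.1 and g with g(0) = g(1) = 0, g(2) = 1.
P_X = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
g = {0: 0, 1: 0, 2: 1}

# Route 1 (LOTUS): E[g(X)] = sum_x g(x) P_X(x).
lotus = sum(g[x] * p for x, p in P_X.items())

# Route 2: first build the PMF of Y = g(X), then take its mean.
P_Y = {}
for x, p in P_X.items():
    P_Y[g[x]] = P_Y.get(g[x], Fraction(0)) + p
direct = sum(y * p for y, p in P_Y.items())

assert lotus == direct == Fraction(1, 4)
```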

Exercise 2.3 Suppose \(X \sim \text{Unif}[-1,1]\). Compute \(\EXP[X^2]\).

Lemma 2.4 (Properties of expectation)  

  1. Linearity. For any (measurable) functions \(g\) and \(h\) \[\EXP[g(X) + h(X)] = \EXP[ g(X)] + \EXP[ h(X) ]. \] As a special case, for a constant \(c\), \[\EXP[X + c] = \EXP[X] + c.\]

  2. Scaling. For any constant \(c\), \[\EXP[cX] = c\EXP[X].\]

  3. Bounds. If \(a \le X(ω) \le b\) for all \(ω \in Ω\), then \[ a \le \EXP[X] \le b. \]

  4. Indicator of events. For any (Borel) subset \(B\) of \(\reals\), we have \[\EXP[ \IND_{\{ X \in B \}}] = \PR(X \in B). \]

  1. A continuous random variable is said to be symmetric if \(f_X(-x) = f_X(x)\) for all \(x \in \reals\). An integrable symmetric random variable has mean \(0\).

  2. A continuous random variable is said to be symmetric around \(m\) if \(f_X(m - x) = f_X(m + x)\) for all \(x \in \reals\). The mean of such a random variable (when it exists) is \(m\).

2.4.1 Higher moments

  1. The \(m\)-th moment, \(m \ge 1\) of a random variable \(X\) is defined as \(\EXP[X^m]\).

  2. The \(m\)-th central moment is defined as \(\EXP[(X - μ)^m]\), where \(μ = \EXP[X]\).

  3. The second central moment (i.e., \(m=2\)) is called the variance. The variance satisfies the following: \[\VAR(X) = \EXP[X^2] - (\EXP[X])^2.\]

  4. The positive square root of variance is called the standard deviation. Variance is often denoted by \(σ^2\) and the standard deviation by \(σ\).

Example 2.19 Find the variance of \(X\) defined in Example 2.1.

We first compute \[ \EXP[X^2] = \frac 14 \cdot 0^2 + \frac 12 \cdot 1^2 + \frac 14 \cdot 2^2 = \frac 32. \] Therefore, \[ \VAR(X) = \EXP[X^2] - \EXP[X]^2 = \frac 32 - 1 = \frac 12. \]

Lemma 2.5 (Properties of variance)  

  1. Scaling. For any constant \(c\), \[\VAR(cX) = c^2 \VAR(X).\]

  2. Shift invariance. For any constant \(c\), \[\VAR(X + c) = \VAR(X).\]
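The variance computation of Example 2.19 and the two properties of Lemma 2.5 can be checked directly on the PMF; a short sketch:

```python
from fractions import Fraction

# PMF of the number of heads in two fair tosses (Example 2.1).
P_X = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def mean(pmf):
    return sum(x * p for x, p in pmf.items())

def var(pmf):
    m = mean(pmf)
    return sum((x - m) ** 2 * p for x, p in pmf.items())

assert var(P_X) == Fraction(1, 2)        # matches Example 2.19

# Scaling: Var(cX) = c^2 Var(X); shift invariance: Var(X + c) = Var(X).
c = 3
assert var({c * x: p for x, p in P_X.items()}) == c ** 2 * var(P_X)
assert var({x + c: p for x, p in P_X.items()}) == var(P_X)
```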

The mean and variance of common random variables are shown in Table 2.1.

Table 2.1: Mean and variance of common random variables
| Random variable | Parameter(s) | Mean | Variance |
|---|---|---|---|
| Bernoulli | \(p\) | \(p\) | \(p(1-p)\) |
| Binomial | \((n,p)\) | \(np\) | \(np(1-p)\) |
| Geometric | \(p\) | \(\dfrac 1p\) | \(\dfrac{1-p}{p^2}\) |
| Poisson | \(λ\) | \(λ\) | \(λ\) |
| Uniform | \((a,b)\) | \(\frac 12 (a+b)\) | \(\frac 1{12}(b-a)^2\) |
| Exponential | \(λ\) | \(\dfrac 1 λ\) | \(\dfrac 1{λ^2}\) |
| Gaussian | \((μ,σ^2)\) | \(μ\) | \(σ^2\) |