Conditional probability and conditional expectation
Conditional probability is perhaps the most important aspect of probability theory as it explains how to incorporate new information in a probability model. However, formally defining conditional probability is a bit intricate. In these notes, I will first provide an intuitive high-level explanation of conditional probability. We will then do a deeper dive, trying to develop a bit more intuition about what is actually going on.
1 Conditioning on events
Recall that conditional probability for events is defined as follows: given a probability space $(\Omega, \mathcal{F}, P)$ and events $A, B \in \mathcal{F}$ such that $P(B) > 0$, we have
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}.$$
Building on this definition, we can define the conditional CDF of a random variable $X$ conditioned on an event $B$ (such that $P(B) > 0$) as follows:
$$F_{X \mid B}(x) = P(X \le x \mid B) = \frac{P(\{X \le x\} \cap B)}{P(B)}.$$
As we pointed out, conditional probabilities are probabilities; therefore, the conditional CDF defined above satisfies the properties of regular CDFs. In particular,

- $F_{X \mid B}$ is a non-decreasing function.
- $F_{X \mid B}$ is a right-continuous function.
Since $F_{X \mid B}$ is a CDF, we can classify random variables conditioned on an event as discrete or continuous in the usual way. In particular,

- If $F_{X \mid B}$ is piecewise constant, then $X$ conditioned on $B$ is a discrete random variable which takes values in a finite or countable subset of $\mathbb{R}$. Furthermore, $X$ conditioned on $B$ has a conditional PMF defined as
  $$p_{X \mid B}(x) = P(X = x \mid B).$$
- If $F_{X \mid B}$ is continuous, then $X$ conditioned on $B$ is a continuous random variable which has a conditional PDF given by
  $$f_{X \mid B}(x) = \frac{d F_{X \mid B}(x)}{dx}.$$
- If $F_{X \mid B}$ is neither piecewise constant nor continuous, then $X$ conditioned on $B$ is a mixed random variable.
Therefore, a random variable conditioned on an event behaves exactly like a regular random variable. We can define the conditional expectation $E[X \mid B]$ and the conditional variance $\operatorname{var}(X \mid B)$ in the obvious manner. An immediate implication of the law of total probability is the following.
If $B_1, \dots, B_n$ is a partition of $\Omega$, then
$$F_X(x) = \sum_{i=1}^n P(B_i)\, F_{X \mid B_i}(x).$$
Furthermore, if $X$ and $X$ conditioned on $B_i$ are both discrete, we have
$$p_X(x) = \sum_{i=1}^n P(B_i)\, p_{X \mid B_i}(x),$$
and if $X$ and $X$ conditioned on $B_i$ are both continuous, we have
$$f_X(x) = \sum_{i=1}^n P(B_i)\, f_{X \mid B_i}(x).$$
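The PMF decomposition above can be checked with exact rational arithmetic. The sketch below uses a made-up example (not from the notes): a fair coin chooses between rolling a 4-sided die and a 6-sided die, so the two coin outcomes form a partition.

```python
from fractions import Fraction

# Hypothetical example: a fair coin decides whether we roll a 4-sided die
# (on B1 = heads) or a 6-sided die (on B2 = tails). B1, B2 partition the
# sample space, so the law of total probability gives
#   p_X(x) = P(B1) * p_{X|B1}(x) + P(B2) * p_{X|B2}(x).
P_B = {"B1": Fraction(1, 2), "B2": Fraction(1, 2)}
cond_pmf = {
    "B1": {x: Fraction(1, 4) for x in range(1, 5)},   # p_{X|B1}
    "B2": {x: Fraction(1, 6) for x in range(1, 7)},   # p_{X|B2}
}

# Combine the conditional PMFs into the unconditional PMF of X.
p_X = {}
for event, pmf in cond_pmf.items():
    for x, p in pmf.items():
        p_X[x] = p_X.get(x, Fraction(0)) + P_B[event] * p

print(p_X[1])             # 1/2 * 1/4 + 1/2 * 1/6 = 5/24
print(sum(p_X.values()))  # a valid PMF sums to 1
```

Values $5$ and $6$ can only come from the 6-sided die, so $p_X(5) = \tfrac{1}{2}\cdot\tfrac{1}{6} = \tfrac{1}{12}$, which the code reproduces.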
Exercise 1 Consider the following experiment. A fair coin is tossed. If the outcome is heads,
Example 1 (Memoryless property of geometric random variable) Let $X \sim \text{Geometric}(p)$.

Recall that the PMF of a geometric random variable is
$$p_X(k) = (1-p)^{k-1} p, \quad k \in \{1, 2, \dots\},$$
which implies that $P(X > n) = (1-p)^n$.

Now consider
$$P(X = m + n \mid X > n) = \frac{P(X = m + n)}{P(X > n)} = \frac{(1-p)^{m+n-1} p}{(1-p)^n} = (1-p)^{m-1} p = P(X = m).$$

This is called the memoryless property of a geometric random variable.
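The identity above can be verified with exact arithmetic. The parameters below ($p = 1/3$, $m = 4$, $n = 7$) are assumptions for illustration only.

```python
from fractions import Fraction

# Exact check of the memoryless property: for X ~ Geometric(p) on {1, 2, ...}
# with PMF (1-p)^(k-1) * p, we should have P(X = m + n | X > n) = P(X = m).
p = Fraction(1, 3)

def pmf(k):
    """p_X(k) = (1-p)^(k-1) * p."""
    return (1 - p) ** (k - 1) * p

def tail(n):
    """P(X > n) = (1-p)^n."""
    return (1 - p) ** n

m, n = 4, 7
lhs = pmf(m + n) / tail(n)  # P(X = m + n | X > n); {X = m+n} is inside {X > n}
rhs = pmf(m)                # P(X = m)
print(lhs == rhs)  # True
```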
Example 2 (Memoryless property of exponential random variable) Let $X \sim \text{Exponential}(\lambda)$.

Recall that the PDF of an exponential random variable is
$$f_X(x) = \lambda e^{-\lambda x}, \quad x \ge 0,$$
which implies that $P(X > t) = e^{-\lambda t}$.

Now consider
$$P(X > s + t \mid X > s) = \frac{P(X > s + t)}{P(X > s)} = \frac{e^{-\lambda(s+t)}}{e^{-\lambda s}} = e^{-\lambda t} = P(X > t).$$

This is called the memoryless property of an exponential random variable.
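The same check works in floating point for the exponential case; the rate and the values of $s, t$ below are arbitrary choices for illustration.

```python
import math

# Floating-point check (assumed parameters): for X ~ Exponential(lam),
# P(X > t) = exp(-lam * t), so P(X > s + t | X > s) should equal P(X > t).
lam, s, t = 0.7, 1.5, 2.3

def tail(x):
    """P(X > x) for the exponential distribution."""
    return math.exp(-lam * x)

lhs = tail(s + t) / tail(s)  # P(X > s + t | X > s)
rhs = tail(t)                # P(X > t)
print(abs(lhs - rhs) < 1e-12)  # True
```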
2 Conditioning on random variables
We start with the case where we are conditioning on a discrete random variable.

If $X$ and $Y$ are random variables defined on a common probability space and $Y$ is discrete, then
$$F_{X \mid Y}(x \mid y) = P(X \le x \mid Y = y) = \frac{P(\{X \le x\} \cap \{Y = y\})}{P(Y = y)}$$
for any $y$ such that $P(Y = y) > 0$. If $X$ is also discrete, the conditional PMF is defined as
$$p_{X \mid Y}(x \mid y) = P(X = x \mid Y = y) = \frac{p_{XY}(x, y)}{p_Y(y)}$$
for any $y$ such that $p_Y(y) > 0$. Moreover, we have that
$$p_X(x) = \sum_y p_Y(y)\, p_{X \mid Y}(x \mid y).$$

The definition of the conditional PMF can be written differently to give the chain rule for random variables:
$$p_{XY}(x, y) = p_Y(y)\, p_{X \mid Y}(x \mid y).$$

For any event $A$, the law of total probability may be written as
$$P(A) = \sum_y p_Y(y)\, P(A \mid Y = y).$$

If $X$ is independent of $Y$, we have
$$p_{X \mid Y}(x \mid y) = p_X(x).$$
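The chain rule and the law of total probability for PMFs can be checked on a small made-up joint PMF (the numbers below are illustrative, not from the notes).

```python
from fractions import Fraction

# A made-up joint PMF p_{XY} on {0, 1} x {0, 1}.
p_XY = {
    (0, 0): Fraction(1, 8), (0, 1): Fraction(3, 8),
    (1, 0): Fraction(1, 4), (1, 1): Fraction(1, 4),
}

# Marginal of Y, then the conditional PMF p_{X|Y}(x|y) = p_{XY}(x, y) / p_Y(y).
p_Y = {y: sum(p for (x, yy), p in p_XY.items() if yy == y) for y in (0, 1)}
p_X_given_Y = {(x, y): p_XY[(x, y)] / p_Y[y] for (x, y) in p_XY}

# Chain rule: p_{XY}(x, y) = p_Y(y) * p_{X|Y}(x|y) for every pair.
chain_ok = all(p_XY[(x, y)] == p_Y[y] * p_X_given_Y[(x, y)] for (x, y) in p_XY)
print(chain_ok)  # True

# Law of total probability: p_X(x) = sum_y p_Y(y) * p_{X|Y}(x|y).
p_X = {x: sum(p_Y[y] * p_X_given_Y[(x, y)] for y in (0, 1)) for x in (0, 1)}
print(p_X[0])  # 1/8 + 3/8 = 1/2
```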
We now consider the case when we are conditioning on a continuous random variable.
If $Y$ is continuous, $P(Y = y) = 0$ for all $y$. We may think of conditioning on $\{Y = y\}$ as a limit of conditioning on the positive-probability event $\{y \le Y \le y + \delta\}$ as $\delta \to 0$.

When $X$ and $Y$ are jointly continuous, we define the conditional PDF
$$f_{X \mid Y}(x \mid y) = \frac{f_{XY}(x, y)}{f_Y(y)}$$
for any $y$ such that $f_Y(y) > 0$. Note that the conditional PDF cannot be interpreted in the same manner as the conditional PMF because it gives the impression that we are conditioning on a zero-probability event. However, we can view it as a limit as follows. If $\delta$ is small, we can approximate
$$P(y \le Y \le y + \delta) \approx f_Y(y)\, \delta$$
and
$$P(\{X \le x\} \cap \{y \le Y \le y + \delta\}) \approx \delta \int_{-\infty}^{x} f_{XY}(u, y)\, du.$$
Substituting in the definition of conditional probability, we get
$$P(X \le x \mid y \le Y \le y + \delta) \approx \int_{-\infty}^{x} \frac{f_{XY}(u, y)}{f_Y(y)}\, du.$$
Thus, when $X$ and $Y$ are jointly continuous, we have
$$F_{X \mid Y}(x \mid y) = \int_{-\infty}^{x} f_{X \mid Y}(u \mid y)\, du.$$
The formal definition of conditional densities requires some ideas from advanced probability theory, which we will not cover in this course. Nonetheless, I will try to explain the intuition behind the formal definitions in the next section.
The above expression may be written differently to give the chain rule for random variables:
$$f_{XY}(x, y) = f_Y(y)\, f_{X \mid Y}(x \mid y).$$

For any event $A$, the law of total probability may be written as
$$P(A) = \int_{-\infty}^{\infty} f_Y(y)\, P(A \mid Y = y)\, dy.$$
An immediate implication of this is
$$f_X(x) = \int_{-\infty}^{\infty} f_Y(y)\, f_{X \mid Y}(x \mid y)\, dy.$$

If $X$ is independent of $Y$, we have
$$f_{X \mid Y}(x \mid y) = f_X(x).$$

We can show that the conditional PMF and conditional PDF satisfy all the properties of PMFs and PDFs. Therefore, we can define the conditional expectation $E[X \mid Y = y]$ in terms of $p_{X \mid Y}(\cdot \mid y)$ or $f_{X \mid Y}(\cdot \mid y)$. We can similarly define the conditional variance $\operatorname{var}(X \mid Y = y)$.
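The density version of the law of total probability can be checked numerically. The sketch below assumes a standard bivariate normal pair with correlation $\rho$ (a made-up example), for which $X \mid Y = y \sim N(\rho y, 1 - \rho^2)$; integrating out $y$ should recover the standard normal marginal of $X$.

```python
import numpy as np

# Assumed example: (X, Y) standard bivariate normal with correlation rho,
# so X | Y = y ~ N(rho * y, 1 - rho^2) and f_X is the standard normal PDF.
# We check f_X(x0) = int f_Y(y) * f_{X|Y}(x0|y) dy by Riemann summation.
rho = 0.6

def normal_pdf(x, mean=0.0, var=1.0):
    return np.exp(-(x - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

y = np.linspace(-8.0, 8.0, 4001)   # integration grid for y
x0 = 1.3                           # evaluate the marginal at one point
integrand = normal_pdf(y) * normal_pdf(x0, mean=rho * y, var=1 - rho ** 2)
f_X_recovered = float(np.sum(integrand) * (y[1] - y[0]))
print(abs(f_X_recovered - normal_pdf(x0)) < 1e-8)  # True
```

The plain Riemann sum is very accurate here because the integrand is smooth and decays to essentially zero well inside the grid boundaries.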
Example 3 Suppose $X$ and $Y$ are jointly continuous with a given joint PDF.

We first compute the marginal density of $Y$.
Example 4 Suppose
We will use the law of total probability.
We compute the required probability for each of the three cases in turn, and then combine the cases to obtain the result.
3 Conditional expectation
We now define conditional expectation as a random variable, rather than a number.
3.1 Conditioning on a $\sigma$-algebra
The key idea is conditioning on a $\sigma$-algebra. Consider a probability space $(\Omega, \mathcal{F}, P)$ where $\mathcal{F}$ is a finite $\sigma$-algebra. Let $\mathcal{G}$ be a sub-$\sigma$-algebra of $\mathcal{F}$. In particular, we assume that there is a partition $\{B_1, \dots, B_n\}$ of $\Omega$ such that $\mathcal{G} = \sigma(B_1, \dots, B_n)$. The elements $B_1, \dots, B_n$ are called the atoms of the $\sigma$-algebra $\mathcal{G}$.
TODO: Add example. 4x4 grid. partition for
We define $P(A \mid \mathcal{G})$ (which we will write as $P(A \mid \mathcal{G})(\omega)$) as
$$P(A \mid \mathcal{G})(\omega) = P(A \mid B_i) \quad \text{for all } \omega \in B_i.$$
Thus, on each atom $B_i$, the value of $P(A \mid \mathcal{G})$ is equal to $P(A \mid B_i)$.

This idea can be extended to any random variable $X$ instead of the indicator $\mathbb{1}_A$, that is, for any random variable $X$,
$$E[X \mid \mathcal{G}](\omega) = E[X \mid B_i] \quad \text{for all } \omega \in B_i.$$
Thus, on each atom $B_i$, the value of $E[X \mid \mathcal{G}]$ is equal to $E[X \mid B_i]$.

- When $\mathcal{G}$ is the trivial $\sigma$-algebra $\{\emptyset, \Omega\}$, $E[X \mid \mathcal{G}] = E[X]$.
- When $\mathcal{G} = \mathcal{F}$, $E[X \mid \mathcal{G}] = X$.
- If $X$ and $Y$ are joint random variables and $a$ and $b$ are constants, then
  $$E[aX + bY \mid \mathcal{G}] = a\, E[X \mid \mathcal{G}] + b\, E[Y \mid \mathcal{G}].$$
- If $Z$ is another random variable which is $\mathcal{G}$-measurable (i.e., takes constant values on the atoms of $\mathcal{G}$), then
  $$E[ZX \mid \mathcal{G}] = Z\, E[X \mid \mathcal{G}].$$
  [The result can be proved pictorially.]
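On a finite probability space, $E[X \mid \mathcal{G}]$ can be computed directly from its definition on atoms. The sketch below uses an assumed example: a uniform eight-point space with a two-atom partition, and checks the "pulling out what is known" property for a $\mathcal{G}$-measurable $Z$.

```python
from fractions import Fraction

# Assumed example: Omega = {0,...,7} with uniform measure. G is generated by
# the partition B1 = {0..3}, B2 = {4..7}; E[X|G] is constant on each atom.
omega = list(range(8))
prob = {w: Fraction(1, 8) for w in omega}
partition = [set(range(0, 4)), set(range(4, 8))]
X = {w: w for w in omega}                    # X(omega) = omega
Z = {w: 10 if w < 4 else 20 for w in omega}  # G-measurable: constant on atoms

def cond_exp(f):
    """E[f | G] as a function on Omega: on each atom B, the value E[f | B]."""
    out = {}
    for B in partition:
        pB = sum(prob[w] for w in B)
        val = sum(prob[w] * f[w] for w in B) / pB
        for w in B:
            out[w] = val
    return out

E_X_given_G = cond_exp(X)
print(E_X_given_G[0], E_X_given_G[5])  # 3/2 11/2

# "Pulling out what is known": E[Z X | G] = Z * E[X | G].
ZX = {w: Z[w] * X[w] for w in omega}
pulled = all(cond_exp(ZX)[w] == Z[w] * E_X_given_G[w] for w in omega)
print(pulled)  # True
```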
3.2 Smoothing property of conditional expectation
Let $\mathcal{G}_1 \subseteq \mathcal{G}_2$ be sub-$\sigma$-algebras of $\mathcal{F}$. Then
$$E\bigl[ E[X \mid \mathcal{G}_2] \bigm| \mathcal{G}_1 \bigr] = E[X \mid \mathcal{G}_1].$$

A special case of the above property is that
$$E\bigl[ E[X \mid \mathcal{G}] \bigr] = E[X].$$
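The smoothing (tower) property can be checked on the same kind of finite space, using a coarse partition and a finer one that refines it (an assumed example, not from the notes).

```python
from fractions import Fraction

# Tower property sketch on Omega = {0,...,7}, uniform measure. The coarse
# sigma-algebra G1 is generated by {0..3},{4..7}; G2 refines it into atoms of
# size 2, so G1 is a sub-sigma-algebra of G2 and
#   E[ E[X|G2] | G1 ] = E[X | G1].
omega = list(range(8))
prob = {w: Fraction(1, 8) for w in omega}
coarse = [set(range(0, 4)), set(range(4, 8))]           # atoms of G1
fine = [set(range(i, i + 2)) for i in range(0, 8, 2)]   # atoms of G2
X = {w: w * w for w in omega}                           # an arbitrary X

def cond_exp(f, partition):
    out = {}
    for B in partition:
        pB = sum(prob[w] for w in B)
        val = sum(prob[w] * f[w] for w in B) / pB
        for w in B:
            out[w] = val
    return out

inner = cond_exp(X, fine)            # E[X | G2]
smoothed = cond_exp(inner, coarse)   # E[ E[X|G2] | G1 ]
direct = cond_exp(X, coarse)         # E[X | G1]
print(smoothed == direct)  # True

# Special case: E[ E[X|G] ] = E[X].
E_X = sum(prob[w] * X[w] for w in omega)
E_inner = sum(prob[w] * inner[w] for w in omega)
print(E_inner == E_X)  # True
```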
3.3 Conditioning on a random variable
Now suppose $Y$ is a discrete random variable; then $P(A \mid Y)$ and $E[X \mid Y]$ may be viewed as short-hand notation for $P(A \mid \sigma(Y))$ and $E[X \mid \sigma(Y)]$. Similar interpretations hold for conditioning on multiple random variables (or, equivalently, conditioning on random vectors). The smoothing property of conditional expectation can then be stated as
$$E\bigl[ E[X \mid Y] \bigr] = E[X].$$

An implication of the smoothing property is the following: for any (measurable) function $g$,
$$E\bigl[ E[X \mid Y]\, g(Y) \bigr] = E[X\, g(Y)].$$
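This property is easy to verify exactly for discrete random variables. The joint PMF and the function $g$ below are made up for illustration.

```python
from fractions import Fraction

# Discrete check of E[ E[X|Y] g(Y) ] = E[ X g(Y) ] for a made-up joint PMF
# of (X, Y) and an arbitrary function g.
p_XY = {
    (1, 0): Fraction(1, 6), (2, 0): Fraction(1, 3),
    (1, 1): Fraction(1, 4), (2, 1): Fraction(1, 4),
}
g = {0: 5, 1: -2}

p_Y = {y: sum(p for (x, yy), p in p_XY.items() if yy == y) for y in (0, 1)}
# E[X | Y = y] = sum_x x * p_{X|Y}(x|y)
E_X_given_Y = {
    y: sum(x * p / p_Y[y] for (x, yy), p in p_XY.items() if yy == y)
    for y in (0, 1)
}

lhs = sum(p_Y[y] * E_X_given_Y[y] * g[y] for y in (0, 1))  # E[ E[X|Y] g(Y) ]
rhs = sum(p * x * g[y] for (x, y), p in p_XY.items())      # E[ X g(Y) ]
print(lhs == rhs)  # True
```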
This previous property is used for generalizing the definition of conditional expectation to continuous random variables. First, we consider conditioning with respect to a $\sigma$-algebra $\mathcal{G}$, which is not necessarily finite (or countable). Then, for any non-negative^1 random variable $X$, $E[X \mid \mathcal{G}]$ is defined as a $\mathcal{G}$-measurable random variable that satisfies
$$E\bigl[ E[X \mid \mathcal{G}]\, \mathbb{1}_B \bigr] = E[X\, \mathbb{1}_B] \quad \text{for every } B \in \mathcal{G}.$$
1 We start with non-negative random variables just to avoid the concerns with existence of expectation due to $\infty - \infty$ terms.
It can be shown that $E[X \mid \mathcal{G}]$ exists and is unique up to sets of measure zero. Formally, one talks about a "version" of conditional expectation. Then $E[X \mid Y]$ for continuous $Y$ may be viewed as $E[X \mid \sigma(Y)]$.

The formal definition of conditional expectation implies that $E[X \mid Y]$ is a (measurable) function of $Y$ that satisfies
$$E\bigl[ E[X \mid Y]\, \mathbb{1}_{\{Y \in B\}} \bigr] = E\bigl[ X\, \mathbb{1}_{\{Y \in B\}} \bigr]$$
for all Borel subsets $B$ of $\mathbb{R}$.

We will show that
$$E[X \mid Y = y] = \int_{-\infty}^{\infty} x\, f_{X \mid Y}(x \mid y)\, dx$$
satisfies this property. In particular, the RHS of the property is
$$E\bigl[ X\, \mathbb{1}_{\{Y \in B\}} \bigr] = \int_B \int_{-\infty}^{\infty} x\, f_{XY}(x, y)\, dx\, dy = \int_B \left( \int_{-\infty}^{\infty} x\, f_{X \mid Y}(x \mid y)\, dx \right) f_Y(y)\, dy,$$
which equals the LHS of the property. This is why the conditional density is defined the way it is defined!

Finally, it can be shown that $P(A \mid \mathcal{G}) := E[\mathbb{1}_A \mid \mathcal{G}]$, $A \in \mathcal{F}$, satisfies the axioms of probability. Therefore, conditional probability satisfies all the properties of probability (and consequently, conditional expectations satisfy all the properties of expectations). Note that the definition of conditional expectation generalizes Bayes rule. In particular, for any (measurable) function $g$ we have
$$E\bigl[ E[X \mid Y]\, g(Y) \bigr] = E[X\, g(Y)].$$
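The partial-averaging property can also be checked numerically for a continuous pair. The sketch below assumes a standard bivariate normal with correlation $\rho$, for which $\int x\, f_{X \mid Y}(x \mid y)\, dx = \rho y$, and an assumed Borel set $B = [0.5, 2]$.

```python
import numpy as np

# Assumed example: (X, Y) standard bivariate normal with correlation rho.
# With g(y) = int x f_{X|Y}(x|y) dx = rho * y, we check the partial-averaging
# property E[ g(Y) 1_{Y in B} ] = E[ X 1_{Y in B} ] for B = [0.5, 2].
rho = 0.6
x = np.linspace(-8.0, 8.0, 801)
y = np.linspace(-8.0, 8.0, 801)
X, Y = np.meshgrid(x, y)                       # rows indexed by y, cols by x
det = 1 - rho ** 2
f_XY = np.exp(-(X ** 2 - 2 * rho * X * Y + Y ** 2) / (2 * det)) / (
    2 * np.pi * np.sqrt(det)
)
dx = x[1] - x[0]
dy = y[1] - y[0]

B = (y >= 0.5) & (y <= 2.0)                    # the Borel set B = [0.5, 2]
f_Y = np.exp(-y ** 2 / 2) / np.sqrt(2 * np.pi)  # marginal density of Y
lhs = float(np.sum(rho * y[B] * f_Y[B]) * dy)           # E[ g(Y) 1_{Y in B} ]
rhs = float(np.sum((X * f_XY)[B, :]) * dx * dy)         # E[ X 1_{Y in B} ]
print(abs(lhs - rhs) < 1e-4)  # True
```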
Exercise 2 Let
Consider the events
, . Find .Compute
.