# 38 Martingales

Consider a probability space \((Ω, \mathcal F, P)\). We define a few key terms:

A

**filteration**\(\{\mathcal F_t\}_{t \ge 1}\) is an increasing familty of sub-sigma-fields of \(\mathcal F\), i.e., for any \(t_1 < t_2\), \(\mathcal F_{t_1} \subseteq \mathcal F_{t_2}\).The most common way of generating a filtration is through a stochastic process. In particular, given a stochastic process \(\{X_t\}_{t \ge 1}\), the filtration \(\{\mathcal F_t\}_{t \ge 1}\) given by \[ \mathcal F_t = σ(X_{1:t}), \quad t \ge 1 \] is called the

**filtration generated by \(\{X_t\}_{t \ge 1}\)**or the**natural filtration**of \(\{X_t\}_{t \ge 1}\).A family of integrable random variables \(\{X_t\}_{t \ge 1}\) is said to be

**adapted**with respect to the filtration \(\{\mathcal F_t\}_{t \ge 1}\) if \(X_t\) is \(\mathcal F_t\)-measurable for each \(t\).A family of integrable random variables \(\{X_t\}_{t \ge 1}\) is said to be

**predictable**with respect to the filtration \(\{\mathcal F_t\}_{t \ge 1}\) if \(X_t\) is \(\mathcal F_{t-1}\)-measurable for each \(t\).

Intuitively, a process \(\{X_t\}_{t \ge 1}\) is adapted with respect to the filtration \(\{\mathcal F_t\}_{t \ge 1}\) the value of \(X_t\) is fully known at time \(t\) based on the information \(\mathcal F_t\).

**Definition 38.1** A family of integrable random vabiables \(\{X_t\}_{t \ge 1}\) adapted to a filtration \(\{\mathcal F_t\}_{t \ge 1}\) is said to be a **martingale** if \[\begin{equation}\label{eq:martingale}
X_s = \EXP[X_{t} \mid \mathcal F_s], \quad \forall s < t.
\end{equation}\]

Property \eqref{eq:martingale} has the interpretation that \(X_s\) is the best predictor for \(X_t\) based on the information available at time \(s\).

The condition \(\EXP[X_{t+1} \mid \mathcal F_t] = X_t\) may be thought of as a “fair-game” requirement. If we are playing a fair game, then we expect neither to win nor to lose moeny on average. Given the history of our fortunes up to time \(s\), our expected fortune \(X_{t}\) at a future time \(t\) should just be the fortune \(X_s\) that we have at time \(s\).

In the definition of a martingale (and its variants that we will define below), it is assumed that the random variables \(\{X_t\}_{t \ge 1}\) are indegrable, i.e., \(\EXP[|X_t|] < ∞\) for all \(t\). Therefore, the conditional expectations used in the definitions are well defined.

Several desirable properties of martingales are shared by families of random variables for which \eqref{eq:martingale} is replaced by an inequality.

A family of integrable random variables \(\{X_t\}_{t \ge 1}\) adapted to a filtration \(\{\mathcal F_t\}_{t \ge 1}\) is said to be a

**submartingale**if \[\begin{equation}\label{eq:submartingale} X_s \le \EXP[X_{t} \mid \mathcal F_s], \quad \forall s < t. \end{equation}\]A family of integrable random variables \(\{X_t\}_{t \ge 1}\) adapted to a filtration \(\{\mathcal F_t\}_{t \ge 1}\) is said to be a

**supermartingale**if \[\begin{equation}\label{eq:supermartingale} X_s \ge \EXP[X_{t} \mid \mathcal F_s], \quad \forall s < t. \end{equation}\]

Note the definitions of the inequalities: \[\begin{alignat*}{2}
\text{supermartingale:} &\quad & X_s &\ge \EXP[X_t \mid \mathcal F_t] \\
\text{submartingale:} &\quad& X_s &\le \EXP[X_t \mid \mathcal F_t]
\end{alignat*}\] Thus, at each time, a *sub*martingale is *below* its future expected value, whereas a *supermartingale* is *above* its future expected value.

If \(\{X_t\}_{t \ge 1}\) is a submartingale, then \(\{-X_t\}_{t \ge 1}\) is a supermartingale. This property allows us to translate results between submartingales and supermartingales.

## 38.1 Examples and Immediate Properties

**Martingales generalize the theory of sums of independent random variables.**Let \(ξ_1, ξ_2, \dots\) be indepdent, integrable random variables. Define \(X_0 = 0\) and \(X_t = ξ_1 + \dots + ξ_t\). If \(\EXP[X_t] = 0\) for \(t \ge 1\), then the sequence \(\{X_t\}_{t \ge 1}\) is a martingale with respect to the natural filtration: \[ \EXP[X_{t+1} \mid X_{1:t}] = \EXP[X_t + ξ_{t+1} \mid X_{1:t}] = X_t. \]Similarly, if \(ξ_t\) has positive mean, \(\{X_t\}_{t \ge 1}\) is a submartingale, and if \(ξ_t\) has negative mean, \(\{X_t\}_{t \ge 1}\) is a supermartingale.

Let \(\{X_t\}_{t \ge 1}\) be martingale and let \(f\) be a convex function for which each \(f(X_t)\) is integrable. Then, a direct application of Jensen’s inequality implies that \(\{f(X_t)\}_{t \ge 1}\) is a submartingale.

A consequence of the previous result is the following. Let \(\{X_t\}_{t \ge 1}\) be a martingale and \(p \ge 1\) be such that \(|X_t|^p\) is integrable for all \(t\). Then \(\{|X_t|^p\}_{t \ge 1}\) is a submartingale.

Let \(\{X_t\}_{t \ge 1}\) be

**sub**martingale and let \(f\) be a convex**and increasing**function for which each \(f(X_t)\) is integrable. Then, \(\{f(X_t)\}_{t \ge 1}\) is a submartingale because \[ \EXP[ f(X_{t+1}) \mid \mathcal F_t] \ge f( \EXP[ X_{t+1} \mid \mathcal F_t ]) \ge f(X_t) \] where the first inequality follows from Jensen’s inequality and the second from the fact that \(\{X_t\}_{t \ge 1}\) is a sub-martingale and \(f\) is increasing.A consequence of the previous result is the following. Let \(\{X_t\}_{t \ge 1}\) be a submartingale. Then, for any constant \(a\), \(\{(X_t - a)^+\}_{t \ge 1}\) is also a submartingale.

If \(\{X_t\}_{t \ge 1}\) and \(\{Y_t\}_{t \ge 1}\) are martingales defined on the same family of \(σ\)-algebras, then \(\{a X_t + b Y_t\}_{t \ge 1}\) is a martingale.

The previous result holds for supermartingales provided \(a\) and \(b\) are positive.

If \(\{X_t\}_{t \ge 1}\) and \(\{Y_t\}_{t \ge 1}\) are supermartingales defined on the same family of \(σ\)-algebras, then \(\{X_t \wedge Y_t\}_{t \ge 1}\) is a supermartingale.

Suppose \(X\) is an integrable random variable and \(\{\mathcal F_t\}_{t \ge 1}\) is a filtration. Define \(X_t \coloneqq \EXP[X \mid \mathcal F_t]\). Then \(\{X_t\}_{t \ge 1}\) is a martingale with respect to the filtration (follows from the smoothing property of conditional expectations).

**Likelihood ratios are martingales.**Suppose \(X_1\), \(X_2\), are independent random variables with density \(f\). Imagine we are considering the alternative hypothesis that these random variables are independent with a different probability density \(g\) (but they are*really*distributed according to the density \(f\)). We assume that \(f\) and \(g\) have the same support. Define the likelihood ratio \[ Λ_t = \frac{g(X_1)}{f(X_1)} \dots \frac{g(X_t)}{f(X_t)}. \] Then, since \(Λ_{t+1} = Λ_t g(X_{t+1})/f(X_{t+1})\), we have \[\begin{align*} \EXP[Λ_{t+1} \mid X_{1:t}] &= Λ_t \EXP\biggl[ \frac{g(X_{t+1})}{f(X_{t+1})} \biggm| X_{1:t} \biggr] \\ &= Λ_t \EXP\biggl[ \frac{g(X_{t+1})}{f(X_{t+1})} \biggr] \\ &= Λ_t \int\biggl( \frac{g(x)}{f(x)} \biggr) f(x)dx \\ &= Λ_t \int g(x)dx \\ &= Λ_t. \end{align*}\] Hence, \(\{Λ_t\}_{t \ge 1}\) is a martingale.A martingale \(\{X_t\}_{t \ge 1}\) may be written as a sum of increments: \(X_t = X_0 + ξ_1 + \dots + ξ_t\). The increments \(\{ξ_t\}_{t \ge 1}\) are called

**martingale differences**. Each \(ξ_t\) is integrable and \(\EXP[ξ_t \mid \mathcal F_{t-1}] = 0\) for all \(t\).

## 38.2 Doob’s decomposition

**Theorem 38.1 (Doob’s decomposition)** Every sequence \(\{X_t\}_{t \ge 1}\) of integrable random variables adapted to a filtration \(\{\mathcal F_t\}_{t \ge 1}\) can be decomposed into a sum of a martingale and an integrable predictable process, i.e., \[
X_t = M_t + A_t
\] where \(\{M_t\}_{t \ge 1}\) is a martingale and \(\{A_t\}_{t \ge 1}\) is a predictable process

**Existence.**We can identify the Doob’s decomposition by defining \[ M_t = X_0 + \sum_{s=1}^{t}\bigl( X_s - \EXP[ X_s \mid \mathcal F_{s-1} \bigr],\] and \[A_t = \sum_{s=1}^t \bigl( \EXP[ X_s \mid \mathcal F_{s-1} ] - X_{s-1} \bigr).\]**Uniqueness.**The decomposition is almost surely unique. If \(X_t = M'_t + A'_t\) is another decomposition, then \(Y_t = M_t - M'_t\) being the difference of martingales is a martingale. But \(M_t - M'_t = A_t - A'_t\), which is a predictable process; combining this with the martingale property of \(M_t - M'_t\) implies that \[ A_t - A'_t = \EXP[ A_t - A'_t \mid \mathcal F_t] = \EXP[ M_t - M'_t \mid \mathcal F_t ] = M_{t-1} - M'_{t-1} = A_{t-1} - A'_{t-1}. \] Since \(A_0 = A'_0 = 0\), we get that \(A_t = A'_t\) for all \(t \ge 1\).A stochastic process \(\{X_t\}_{t \ge 1}\) is a submartingale if and only if its Doob’s decomposition gives a predictable process that is almost surely increasing.

A stochastic process \(\{X_t\}_{t \ge 1}\) is a supermartingale if and only if its Doob’s decomposition gives a predictable process that is almost surely decreasing

## 38.3 Martingale Inequalities

### Burkholder’s inequality

Let \(\{X_t\}_{t \ge 1}\) be a martingale and let \(\xi_1 = X_1\) and \(\xi_t = X_t - X_{t-1}\) be the corresponding martingale difference sequence. The *quadratic variation* process is defined as \[
Q_t \coloneqq \sum_{s = 1}^t \NORM{ξ_s}^2.
\]

**Proposition 38.1 (Burkholder’s inequality)** For any \(p > 1\) be such that \(\EXP[|X_t|^p] < ∞\) for all \(t\), there exists positive constants \(c_p\) and \(C_p\) (depending on \(p\) alone) such that \[
c_p \EXP\bigl[ Q_t^{p/2} \bigr]
\le
\EXP\Bigl[ \sup_{1 \le s \le t} \NORM{X_s}^p \Bigr]
\le
C_p \EXP\bigl[ Q_t^{p/2} \bigr]
\]

The inequality is due to Burkholder (1966). See notes of David Pollard for an accessible proof based on martingale ideas and Bogdan and Więcek (2022) for an elementary proof using Bergman divergence.

### Lenglart inequality

**Proposition 38.2 (Lenglart inequality)** Given a filtration \(\{\ALPHABET F_t\}_{t \ge 1}\), let \(\{X_t\}_{t \ge 1}\) be a non-negative adapted process and \(\{Y_t\}_{t \ge 1}\) be non-negative non-decreasing predictable process such that for any bounded stopping time \(τ\), we have \(\EXP[X_{τ} \mid \ALPHABET F_0] \le \EXP[ Y_{τ} \mid \ALPHABET F_0]\). Then:

For all \(p \in (0,1)\), \[ \EXP\biggl[ \Bigl( \sup_{t \ge 1} X_t \Bigr)^p \Bigm| \ALPHABET F_0 \biggr] \le C_p \EXP\biggl[ \Bigl( \sup_{t \ge 1} Y_t \Bigr)^p \Bigm| \ALPHABET F_0 \biggr], \] where \(C_p = p^{-p}/(1-p)\) and the constant \(C_p\) is sharp.

In addition, if \(\{X_t\}_{t \ge 1}\) is non-decreasing, then we have \[ \EXP\biggl[ \Bigl( \sup_{t \ge 1} X_t \Bigr)^p \Bigm| \ALPHABET F_0 \biggr] \le p^{-p} \EXP\biggl[ \Bigl( \sup_{t \ge 1} Y_t \Bigr)^p \Bigm| \ALPHABET F_0 \biggr], \] and the constant \(p^{-p}\) is sharp.

Moreover, for any \(c,d > 0\), we have \[ \PR\Bigl( \sup_{t \ge 1} X_t > c \Bigm| \ALPHABET F_0 \Bigr) \le \frac 1c \EXP\Bigl[ \sup_{t \ge 1} Y_t \wedge d \Bigm| \ALPHABET F_0 \Bigr] + \PR\Bigl( \sup_{t \ge 1} Y_t > d \Bigm| \ALPHABET F_0 \Bigr) \]

The result is due to Lenglart (1977). See Geiss and Scheutzow (2021) for the sharpness of the bounds.

## 38.4 Convergence of nonnegative supermartingales

We state certain properties of nonnegative supermartingales without proof. This material is taken from Neveu (1975).

**Maximal inequality for nonnegative supermartingales.**For every nonnegative supermartingale \(\{X_t\}_{t \ge 1}\), the random variable \(\sup_{t \ge 1} X_t\) is a.s. finite on the set \(\{X_0 < ∞\}\) and, more precisely, satisfies the following inequality: \[ \PR\Bigl( \sup_{t \ge 1} X_t \ge a \Bigm| \mathcal F_t \Bigr) \le \max\biggl( \frac{X_0}{a}, 1 \biggr) \]**Doob’s inequalities.**Let \(\{X_t\}_{t \ge 1}\) be a generalized martingale or a nonnegative submartingale. Then for all \(p \in (1,∞)\), and all \(t\), we have: \[ \EXP[\ABS{X_t}^p] \le \EXP\Bigl[ \sup_{1 \le s \le t} \ABS{X_s}^p \Bigr] \le \biggl(\frac{p}{p-1}\biggr)^p \EXP[\ABS{X_t}^p]. \]**Switching principle for supermartingales.**Given two nonnegative supermartingales \(\{X^{(i)}_t\}_{t \ge 1}\), \(i \in \{1, 2\}\), and a stopping time \(τ\) such that \(X^{(1)}_{τ} \ge X^{(2)}_{τ}\) on \(\{τ < ∞\}\), the formula \[ X_t(ω) = \begin{cases} X^{(1)}_t(ω), & \text{if $t < τ(ω)$} \\ X^{(2)}_t(ω), & \text{if $t \ge τ(ω)$} \end{cases} \] defines a new nonnegative supermartingale.**Supermartingale convergence theorem.**Every nonnegative supermartingale \(\{X_t\}_{t \ge 1}\) converges almost surely. Furthermore, the limit \(X_{∞} = \lim_{t \to ∞} X_t\) satisfies the inequality \[ \EXP[ X_{∞} \mid \mathcal F_t] \le X_t. \]A nonnegative supermartingale converges in \(\mathcal L^1\) to its limit \(X_{∞}\) if and only if \(\EXP[X_t] \to \EXP[X_{∞}]\).

For a nonnegative supermartingale \(\{X_t\}_{t \ge 1}\) and a stopping time \(τ\), the sequence \(\{X_{τ \wedge t}\}_{t \ge 1}\) “stopped at the stopping time \(τ\)” is also a non-negative supermartingale.

**Stopping theorem**Let \(\{X_t\}_{t \ge 1}\) be a nonnegative supermartingale. Then, for any*bounded*stopping times \(τ_1\) and \(τ_2\), we have \[ X_{τ_1} \ge \EXP[ X_{τ_2} \mid \mathcal F_{τ_1}], \quad \text{ on } \{τ_1 \le τ_2 \}. \]

The following interpretation is taken from Dellacherie and Meyer (1982).

Suppose the random variable \(X_t\) represents a gambler’s fortune at time \(t\). Then his successive gains are presented by the random variable \(ξ_t = X_t - X_{t-1}\). The gambler may be in an arbitrarily complicated casino, where he may choose between all sorts of games, move from one table to another, bet on other player’s fortunes, etc., but it is understood that his decisions are unprophetic, i.e., they can only be taken as functions for the *past* and not as functions of the future, with the convention that the present, i.e., the game which is in the process of being played, forms part of the future.

The supermartingale inequality \(\EXP[ξ_t \mid \ALPHABET F_t] \le 0\) means that, whatever decisions are taken by the gambler just before the \(t\)-th game, the *average* profits from that will be negative. In other words, the game favors the casino—which is what happens in reality! (The martingale equality corresponds to the case of an equitable casino)

Now imagine that the gambler, fearing that he is under the influence of an evil star, confides his fortune to a “luckier” (but still unprophetic) friend and goes out for fresh air and returns at random times \(τ_1 < τ_2 < \dots\). The stopping theorem says that **what he observes at these random instances is also a game favorable to the casino** (or merely equitable in case of martingales). In other words, things are no better.

The restriction on the length of the stopping time has the following meaning: suppose the gambler tells his friend to “call me at the first moment \(τ_1\) when my gain \(X_{τ_1} - X_0\) is positive. Then call me again at time \(τ_2\) when my gain \(X_{τ_2} - X_{τ_1}\) is positive. And so on” At such moments, mean gain is also positive and hence stopping theorem seems to be contradicted; however, the stopping theorem affirms that the stopping times \(τ_1\), \(τ_2\), etc., are not finite.

The converges result continues to hold for “almost” supermartingales.

**Theorem 38.2 (Almost supermartingale convergence theorem)** Suppose \(\{X_t\}_{t \ge 1}\), \(\{β_t\}_{t \ge 1}\), \(\{Y_t\}_{t \ge 1}\), \(\{Z_t\}_{t \ge 1}\) are nonnegative stochastic processes adapted to some filtration \(\{\mathcal F_t\}_{t \ge 1}\) that satisfy \[\begin{equation}\label{eq:almost-supermartingale}
\EXP[X_{t+1} \mid \mathcal F_t ] \le
(1 + β_t) X_t + Y_t - Z_t,
\quad t \ge 1
\end{equation}\] Define the set \(Ω_0\) by \[
Ω_0 = \biggl\{ ω : \sum_{t \ge 1} β_t(ω) < ∞ \biggr\}
\cap
\biggl\{ ω : \sum_{t \ge 1} Y_t(ω) < ∞ \biggr\}.
\] Then, for all \(ω \in Ω_0\), we have that

- \(\lim_{t \to ∞} X_t(ω)\) exists and is finite
- \(\sum_{t \ge 1} Z_t(ω) < ∞\).

The key idea is that we can fiddle with \eqref{eq:almost-supermartingale} to make it a supermartingale. Let \(b_t = 1/\prod_{\tau = 1}^{t-1} ( 1 + β_{τ} )\). Note that \(\{b_t\}_{t \ge 1}\) is \(\{\mathcal F_t\}_{t \ge 1}\) predictable. Define the nonnegative processes \(\{X'_t\}_{t \ge 1}\) \(\{Y'_t\}_{t \ge 1}\) and \(\{Z'_t\}_{t \ge 1}\), where \[ X'_t = b_t X_t, \quad Y'_t = b_{t+1} Y_t, \quad Z'_t = b_{t+1} Z_t. \] \[\begin{align} \EXP[X'_{t+1} \mid \mathcal F_t] &= b_{t+1} \EXP[ X_{t+1} \mid \mathcal F_t] \notag \\ &\le b_{t+1}(1 + β_t) X_t + b_{t+1} Y_t - b_{t+1} Z_t \notag \\ &= X'_t + Y'_t - Z'_t. \label{eq:almost-supermartingale-st-1} \end{align}\]

Define \[\begin{equation}\label{eq:almost-supermartingale-st-2} W_t = X'_t - \sum_{s = 1}^{t-1} (Y'_s - Z'_s) . \end{equation}\] From \eqref{eq:almost-supermartingale-st-1} we have \[ \EXP[W_{t+1} \mid \mathcal F_t] = \EXP\biggl[ X'_{t+1} - \sum_{s=1}^t(Y'_s - Z'_s) \biggm| \mathcal F_t \biggr] \le X_t - \sum_{s=1}^{t-1} (Y'_s - Z'_s) = W_t. \] Therefore, \(\{W_t\}_{t \ge 1}\) is a supermartingale, but we cannot immediately apply the supermartingale convergence theorem because we don’t know that \(W_t\) is bounded from below.

For an arbitrary \(a > 0\), define the stopping time \(τ_a = \inf \bigl\{ t : \sum_{s=1}^{t} Y'_s > a \bigr\}\). Now define the stopped sequence \[W^{(a)}_t := W_{τ_a \wedge t} = \begin{cases} W_t, & \hbox{if } t \le τ_a \\ W_{τ_a}, & \hbox{if } t > τ_a \end{cases}, \quad t \ge 1.\] Note that the stopped sequence satisfies: \[ \EXP[W^{(a)}_{t+1} \mid \ALPHABET F_t] \stackrel{(a)}= \begin{cases} \EXP[W_{t+1} \mid \ALPHABET F_t] , & \hbox{if } t+1 \le τ_a \\ W_{τ_a}, & \hbox{if } t + 1 > τ_a \end{cases} \stackrel{(b)}\le \begin{cases} W_t, & \hbox{if } t < τ_a \\ W_{τ_a}, & \hbox{if } t \ge τ_a \end{cases} \stackrel{(c)}= W^{(a)}_t, \] where \((a)\) uses the definition of \(W^{(a)}_t\) and the fact that \(\EXP[W_{τ_a} \mid \ALPHABET F_t] = W_{τ_a}\) since \(τ_a \le t\); \((b)\) uses the fact that \(\{W_t\}_{t \ge 1}\) is a supermartingale; and \((c)\) uses the fact that the two terms coincide when \(t = τ_a\).

Thus, the stopped sequence \(\{W^{(a)}_{t}\}_{t \ge 1}\) is a supermartingale. Moreover, it is bounded from below since \[ W^{(a)}_t = W_{t \wedge τ} \ge - \sum_{s = 1}^{τ \wedge (t-1)} Y_s \ge -a,\] So, we can think of \(W^{(a)}_{t} + a\) as a nonnegative supermartingale. Therefore, by the supermartingale convergence theorem, for any given \(a\), \(\lim_{t \to ∞} W^{(a)}_{t}\) exists and is finite a.s.

Now define \(Ω^{(a)}\) to be the set \(\{ ω : \sum_{t \ge 1} Y_t(ω) < a \}\). We know that \(Y'_t = b_{t+1} Y_t < Y_t\). So, on the set \(Ω^{(a)}\), \(\sum_{t \ge 1} Y'_t(ω) < a\). Hence, for all \(ω \in Ω^{(a)}\), we have \(τ_{a}(ω) = ∞\). Therefore, \(W_t = W^{(a)}_t\) and, by the previous argument, \(\{W_t\}_{t \ge 1}\) converges to a finite limit.

By countable additivity, we get that \(W_t\) converges to a finite limit for all \(a \in \integers\).

**Side remark:** It took me a while to understand why this is the case, so I am including a complete formal argument.

- Fix an \(a\). We know that on \(Ω^{(a)}\), \(W_t\) has a finite limit. Denote that limit by \(W_{∞}\).
- For any \(ε > 0\), define the event \(\ALPHABET E = \cap_{s = 1}^∞ \cup_{t = s}^∞ |W_t(ω) - W_{∞}(ω)| > ε\).
- The fact that \(W_t\) converges almost surely to \(W_{∞}\) on \(Ω^{(a)}\) means that \(\PR(Ω^{(a)} \cap \ALPHABET E) = 0\).
*We now restrict attention to integer values of \(a\)*. Since \(Ω^{(a)} \cap \ALPHABET E\) is an increasing sequence, we have \[ \PR(Ω_0 \cap \ALPHABET E) = \PR(\lim_{a \to ∞} Ω^{(a)} \cap \ALPHABET E) = \lim_{a \to ∞} \PR(Ω^{(a)} \cap \ALPHABET E) = 0. \] Hence, \(W_t\) converges to \(W_{∞}\) almost surely on \(Ω_0\).

Now \eqref{eq:almost-supermartingale-st-2} implies that \[ \lim_{t \to ∞} \biggl \{ X'_t - \sum_{s=1}^{t-1} (Y'_s - Z'_s) \biggr\} < ∞. \] Hence, on \(Ω_0\), \(\sum_{s=1}^∞ Z'_s < ∞\) and \(\lim_{t \to ∞} X'_t\) exists and is finite.

From Exercise 38.1, it follows that for \(ω \in Ω_0\), \(b_t(ω)\) has a limit. \[ X_t = X'_t/b'_t \] has a limit. Moreover, the inequality, \[ Z_t \le Z'_t \prod_{s \ge 1} (1 + β_s) \] implies that \(\sum_{t \ge 1} Z_t < ∞\) for \(ω \in Ω_0\).

The following is a determinisitic version of the above result, taken from Bertsekas and Tsitsiklis (2000).

**Proposition 38.3** Let \(\{X_t\}_{t \ge 1}\), \(\{Y_t\}_{t \ge 1}\), and \(\{Z_t\}_{t \ge 1}\) be sequences such that \(\{Z_t\}_{t \ge 1}\) is nonnegative and the sequences satisfy: \[
X_{t+1} \le X_t + Y_t - Z_t
\] and \(\sum_{t \ge 1} Y_t < ∞\). Then, either \(X_t \to -∞\) or else \(X_t\) converges to a finite value and \(\sum_{t \ge 1} Z_t < ∞\).

## 38.5 Convergence of submartingales

**Theorem 38.3 (Krickeberg decomposition)** Let \(\{S_t\}_{t \ge 1}\) be a submartingale for which \(\sup_t \EXP[S_t^{+}] < ∞\). Then, there exists a positive martingale \(\{M_t\}_{t \ge 1}\) and a positive supermartingale \(\{X_t\}_{t \ge 1}\) such that \(S_t = M_t - X_t\) almost sure for each \(t\).

The finiteness of \(\sup_{t} \EXP[ S_{t}^{+}]\) is equivalent to the finiteness of \(\sup_{t} \EXP[|S_t|]\) because \(|S_t| = 2S_t{+} - (S_t^{+} - S_t^{-})\) and by the submartingale property, \(\EXP[S_t^{+} - S_t^{-}] = \EXP[S_t]\) increases with \(t\).

**Corollary 38.1** A submartingale with \(\sup_{t} \EXP[S_t^{+}] < ∞\) converges almost surely to an integrable limit.

## 38.6 Square integrable martingales

If \(\{X_t\}_{t \ge 1}\) is an integrable sequence adapted to \(\{\ALPHABET F_t\}_{t \ge 1}\). Define the

**compensator**\(\{ \langle X_t \rangle\}_{t \ge 1}\) as: \(\langle X_1 \rangle = 0\) and for \(t > 1\): \[ \langle X_t \rangle - \langle X_{t-1} \rangle = \EXP[ X_t \mid \ALPHABET F_{t-1} ] - X_{t-1}. \] Note that the compensator is predictable and \(X_t - \langle X_t \rangle\) is a martingale.Now consder a square integrable martingle \(\{X_t\}_{t \ge 1}\). The compensator is given by: \[ \langle X_t^2 \rangle - \langle X_{t-1}^2 \rangle = \EXP[ X_t^2 - X_{t-1}^2 \mid \ALPHABET F_{t-1} ] = \EXP[ (X_t - X_{t-1})^2 \mid \ALPHABET F_{t-1} ]. \] Thus, the compensator \(\langle X_t^2 \rangle\) is increasing.

**Convergence of square integrable martinales.**Let \(\langle X_{∞} \rangle = \lim_{t \to ∞} \langle X_t \rangle\). If \(\EXP[ \langle X_{∞} \rangle ] < ∞\), then \(\{X_t\}_{t \ge 1}\) converges almost surely in the mean-square sense to a random variable \(X_{∞}\).Let \(\{X_t\}_{t \ge 1}\) be a square integrable martingale adapted to \(\{\ALPHABET F_t\}_{t \ge 1}\). Let \(τ\) be a stopping time such that \(\EXP[ \langle X_{τ} \rangle ] < ∞\). Then,

- \(\EXP[X_{τ} \mid \ALPHABET F_1] = X_1\).
- \(\EXP[X_{τ}^2 \mid \ALPHABET F_1 ] = \EXP[ \langle M_{τ} \rangle \mid \ALPHABET F_1]\).

## 38.7 Strong law of large number for martingale difference sequence

The following is a generalization of the strong law of large numbers for martingale differences. See Stout (1974) (Theorem 3.3.1).

**Theorem 38.4** Suppose \(\{M_t\}_{t \ge 1}\) is a martingale difference sequence with respect to the filtration \(\{\ALPHABET F_t\}_{t \ge 1}\). Let \(\{a_t\}_{t \ge 1}\) be positive sequence that is \(\{\ALPHABET F_t\}_{t \ge 1}\) predictable and \(\lim_{t \to ∞} a_t = ∞\).

If for some \(p \in (0,2]\), we have \[ \sum_{t=1}^∞ \EXP \biggl[ \frac{|M_t|^p}{a_t^p} \biggm| \ALPHABET F_{t-1} \biggr] < ∞, \] then \[ \sum_{t=1}^∞ \frac{M_t}{a_t} = 0, \quad a.s. \]

## Exercises

**Exercise 38.1** The purpose of this exercise is to establish the following: For a sequence of nonnegative real numbers \(\{a_t\}_{t \ge 1}\) \[\sum_{t \ge 1} a_t < ∞ \iff
\prod_{t \ge 1} (1 + a_t) \text{ converges to a finite non-zero limit}.
\]

Show that \(\sum_{t \ge 1}a_t \le \prod_{t \ge 1}(1 + a_t)\).

*Hint*: Expand the right hand side.Show that \(\prod_{t \ge 1}(1 + a_t) \le \exp\Bigl(\sum_{t \ge 1} a_t\Bigr)\).

*Hint*: Use Taylor series expansion of \(e^x\).

**Exercise 38.2** **(Kronecker’s Lemma)** This result is useful intermediate step for showing convergence of various iterative algorithms.

Suppose \(\{a_t\}_{t \ge 1}\) is a strictly positive increasing real sequence which diverges to \(∞\) and \(\{x_t\}_{t \ge 1}\) is a real sequence such that the series \(\sum_{t=1}^∞ x_t/a_t\) converges. Then, \[ \lim_{T \to ∞} \frac 1{a_T} \sum_{t=1}^T x_t = 0. \]

**Exercise 38.3** Consider a sequence \(\{α_t\}_{t \ge 1}\) adapted to \(\{\mathcal F_t\}\) such that \[
\sum_{t=1}^{∞} α_t = \infty
\quad\text{and}\quad
\sum_{t=1}^{∞} α_t^2 < ∞.
\] Now suppose \(\{X_t\}_{t \ge 1}\) is an adapted sequence of square integrable random variables such that \[
\EXP[ X_{t+1} | \mathcal F_t ] = 0
\quad\text{and}\quad
\EXP[ \| X_{t+1}\|_2^2 \mid \mathcal F_t ] \le B
\] where \(B\) is a deterministic constant. Then \[
\sum_{t=1}^T α_t X_{t+1}
\quad\text{and}\quad
\sum_{t=1}^T α_t^2 \| X_t\|_2^2,
\quad
T = 1,2, \dots
\] converge to finite limits almost surely.

**Exercise 38.4 (Generalization of Theorem 38.2)** Suppose \(\{X_t\}_{t \ge 1}\), \(\{a_t\}_{t \ge 1}\), \(\{Y_t\}_{t \ge 1}\) be nonnegative processes which are adapted to some filtration \(\{\ALPHABET F_t\}_{t \ge 1}\) such that \[
\EXP[X_{t+1} \mid \ALPHABET F_t] \le a_t X_t + Y_t.
\] Define the set \(Ω_0\) by \[
Ω_0 = \biggl\{ ω : \prod_{t \ge 1} a_t(\omega) < ∞ \biggr\}
\cap
\biggl\{ ω : \sum_{t \ge 1} Y_t(ω) < ∞ \biggr\}
\] Then, show that for all \(ω \in Ω_0\), we have that \(\lim_{t \to ∞} X_t(ω)\) exists and is finite.

**Exercise 38.5 (Vector generalization of Theorem 38.2)** In all the discussion of almost supermartingale results so far, we have assumed that the processes were scalar valued. Now consider the case where \(\{X_t\}_{t \ge 1}\) and \(\{Y_t\}_{t \ge 1}\) are processes in \(\reals_{\ge 0}^n\) and \(\{A_t\}_{t \ge 1}\) is a process in \(\reals_{\ge 0}\), all adapted to a filteration \(\{\ALPHABET F_t\}_{t \ge 1}\) such that \[
\EXP[X_{t+1} \mid \ALPHABET F_t] \le A_t X_t + Y_t.
\]

Assume that for all \(t\), the matrix \(A_t\) is of the form \(P_t D_t\), where \(P_t\) is a permutation matrix and \(D_t\) is a diagonal matrix with all positive elements.^{1}

^{1} Such matrices are called :positive monomial matrices and have the property that they are invertible and all entries of the inverse are non-negative.

Define the set \(Ω_0\) by \[ Ω_0 = \biggl\{ ω : \prod_{t \ge 1} A_t(\omega) < ∞ \biggr\} \cap \biggl\{ ω : \sum_{t \ge 1} Y_t(ω) < ∞ \biggr\} \] Then, show that for all \(ω \in Ω_0\), we have that \(\lim_{t \to ∞} X_t(ω)\) exists and is finite.

**Exercise 38.6 (Delayed almost supermartingales)** Suppose \(\{X_t\}_{t \ge 1}\), \(\{β_t\}_{t \ge 1}\), and \(\{Y_t\}_{t \ge 1}\) be nonnegative processes which are adapted to some filteration \(\{\ALPHABET F_t\}_{t \ge 1}\) such that \[
\EXP[ X_{t+1} \mid \ALPHABET F_t] \le (1 + β_t) X_{t-d} + Y_t,
\] where \(d \in \integers_{\ge 0}\) is a constant. Define the set \(Ω_0\) by \[
Ω_0 = \biggl\{ ω : \sum_{t \ge 1} β_t(\omega) < ∞ \biggr\}
\cap
\biggl\{ ω : \sum_{t \ge 1} Y_t(ω) < ∞ \biggr\}
\] Then, show that for all \(ω \in Ω_0\), we have that \(\lim_{t \to ∞} X_t(ω)\) exists and is finite.

*Hint:* Write the dynamics equation in vector form and use Exercise 38.5

**Exercise 38.7** Let \(\{X_t\}_{t \ge 1}\) be square integrable martingale with \(\EXP[ X_T^2 ] \le T\) for all \(T\). Prove that \(X_T/T \to 0\), a.s.

[This may be viewed as a generalization of a SLLN for i.i.d. sequences.]

*Hint:* Use Abel’s lemma to show \(\sum_{t=1}^T \EXP[ M_t^2 ]/t^2 < ∞\) and then use Theorem 38.4.

## Notes

The material in the introduction is taken from Pollard (2002), Chang (2007), and Dellacherie and Meyer (1982). The latter is an excellent reference for this material. Also see Doob (1971) for a detailed informal introduction to the topic. See Mazliak and Shafer (2022) for a historical account of the development of martingales, including a fascinating deep dive into the origin of the name.

Theorem 38.2 and its proof is adapted from Robbins and Siegmund (1971) and this post on math stackexchange. A version of Theorem 38.2 with \(Z_t = 0\) is stated as Exercise II-4 of Neveu (1975).

Exercise 38.3 is from Bertsekas and Tsitsiklis (2000). Exercise 38.5 is from Mahajan et al. (2024), which provides other vector generalizations of Theorem 38.2 as well.