43 Martingales
Consider a probability space \((Ω, \mathcal F, P)\). We define a few key terms:
- A filtration \(\{\mathcal F_t\}_{t \ge 1}\) is an increasing familty of sub-sigma-fields of \(\mathcal F\), i.e., for any \(t_1 < t_2\), \(\mathcal F_{t_1} \subseteq \mathcal F_{t_2}\). 
- The most common way of generating a filtration is through a stochastic process. In particular, given a stochastic process \(\{X_t\}_{t \ge 1}\), the filtration \(\{\mathcal F_t\}_{t \ge 1}\) given by \[ \mathcal F_t = σ(X_{1:t}), \quad t \ge 1 \] is called the filtration generated by \(\{X_t\}_{t \ge 1}\) or the natural filtration of \(\{X_t\}_{t \ge 1}\). 
- A family of integrable random variables \(\{X_t\}_{t \ge 1}\) is said to be adapted with respect to a filtration \(\{\mathcal F_t\}_{t \ge 1}\) if \(X_t\) is \(\mathcal F_t\)-measurable for each \(t\). 
- A family of integrable random variables \(\{X_t\}_{t \ge 1}\) is said to be predictable with respect to a filtration \(\{\mathcal F_t\}_{t \ge 1}\) if \(X_t\) is \(\mathcal F_{t-1}\)-measurable for each \(t\). 
Intuitively, a process \(\{X_t\}_{t \ge 1}\) is adapted with respect to a filtration \(\{\mathcal F_t\}_{t \ge 1}\) when the value of \(X_t\) is fully known at time \(t\) based on the information \(\mathcal F_t\).
Definition 43.1 A family of integrable random vabiables \(\{X_t\}_{t \ge 1}\) adapted to a filtration \(\{\mathcal F_t\}_{t \ge 1}\) is said to be a martingale if \[\begin{equation}\label{eq:martingale} X_s = \EXP[X_{t} \mid \mathcal F_s], \quad \forall s < t. \end{equation}\]
Property \eqref{eq:martingale} has the interpretation that \(X_s\) is the best predictor for \(X_t\) based on the information available at time \(s\).
The condition \(\EXP[X_{t+1} \mid \mathcal F_t] = X_t\) may be thought of as a “fair-game” requirement. If we are playing a fair game, then we expect neither to win nor to lose moeny on average. Given the history of our fortunes up to time \(s\), our expected fortune \(X_{t}\) at a future time \(t\) should just be the fortune \(X_s\) that we have at time \(s\).
In the definition of a martingale (and its variants that we will define below), it is assumed that the random variables \(\{X_t\}_{t \ge 1}\) are integrable, i.e., \(\EXP[|X_t|] < ∞\) for all \(t\). Therefore, the conditional expectations used in the definitions are well defined.
Several desirable properties of martingales are shared by families of random variables for which \eqref{eq:martingale} is replaced by an inequality.
- A family of integrable random variables \(\{X_t\}_{t \ge 1}\) adapted to a filtration \(\{\mathcal F_t\}_{t \ge 1}\) is said to be a submartingale if \[\begin{equation}\label{eq:submartingale} X_s \le \EXP[X_{t} \mid \mathcal F_s], \quad \forall s < t. \end{equation}\] 
- A family of integrable random variables \(\{X_t\}_{t \ge 1}\) adapted to a filtration \(\{\mathcal F_t\}_{t \ge 1}\) is said to be a supermartingale if \[\begin{equation}\label{eq:supermartingale} X_s \ge \EXP[X_{t} \mid \mathcal F_s], \quad \forall s < t. \end{equation}\] 
Note the definitions of the inequalities: \[\begin{alignat*}{2} \text{supermartingale:} &\quad & X_s &\ge \EXP[X_t \mid \mathcal F_t] \\ \text{submartingale:} &\quad& X_s &\le \EXP[X_t \mid \mathcal F_t] \end{alignat*}\] Thus, at each time, a submartingale is below its future expected value, whereas a supermartingale is above its future expected value.
If \(\{X_t\}_{t \ge 1}\) is a submartingale, then \(\{-X_t\}_{t \ge 1}\) is a supermartingale. This property allows us to translate results between submartingales and supermartingales.
43.1 Examples and Immediate Properties
- Martingales generalize the theory of sums of independent random variables. Let \(ξ_1, ξ_2, \dots\) be indepdent, integrable random variables. Define \(X_0 = 0\) and \(X_t = ξ_1 + \dots + ξ_t\). If \(\EXP[X_t] = 0\) for \(t \ge 1\), then the sequence \(\{X_t\}_{t \ge 1}\) is a martingale with respect to the natural filtration: \[ \EXP[X_{t+1} \mid X_{1:t}] = \EXP[X_t + ξ_{t+1} \mid X_{1:t}] = X_t. \] - Similarly, if \(ξ_t\) has positive mean, \(\{X_t\}_{t \ge 1}\) is a submartingale, and if \(ξ_t\) has negative mean, \(\{X_t\}_{t \ge 1}\) is a supermartingale. 
- Let \(\{X_t\}_{t \ge 1}\) be martingale and let \(f\) be a convex function for which each \(f(X_t)\) is integrable. Then, a direct application of Jensen’s inequality implies that \(\{f(X_t)\}_{t \ge 1}\) is a submartingale. 
- A consequence of the previous result is the following. Let \(\{X_t\}_{t \ge 1}\) be a martingale and \(p \ge 1\) be such that \(|X_t|^p\) is integrable for all \(t\). Then \(\{|X_t|^p\}_{t \ge 1}\) is a submartingale. 
- Let \(\{X_t\}_{t \ge 1}\) be submartingale and let \(f\) be a convex and increasing function for which each \(f(X_t)\) is integrable. Then, \(\{f(X_t)\}_{t \ge 1}\) is a submartingale because \[ \EXP[ f(X_{t+1}) \mid \mathcal F_t] \ge f( \EXP[ X_{t+1} \mid \mathcal F_t ]) \ge f(X_t) \] where the first inequality follows from Jensen’s inequality and the second from the fact that \(\{X_t\}_{t \ge 1}\) is a sub-martingale and \(f\) is increasing. 
- A consequence of the previous result is the following. Let \(\{X_t\}_{t \ge 1}\) be a submartingale. Then, for any constant \(a\), \(\{(X_t - a)^+\}_{t \ge 1}\) is also a submartingale. 
- If \(\{X_t\}_{t \ge 1}\) and \(\{Y_t\}_{t \ge 1}\) are martingales defined on the same family of \(σ\)-algebras, then \(\{a X_t + b Y_t\}_{t \ge 1}\) is a martingale. 
- The previous result holds for supermartingales provided \(a\) and \(b\) are positive. 
- If \(\{X_t\}_{t \ge 1}\) and \(\{Y_t\}_{t \ge 1}\) are supermartingales defined on the same family of \(σ\)-algebras, then \(\{X_t \wedge Y_t\}_{t \ge 1}\) is a supermartingale. 
- Suppose \(X\) is an integrable random variable and \(\{\mathcal F_t\}_{t \ge 1}\) is a filtration. Define \(X_t \coloneqq \EXP[X \mid \mathcal F_t]\). Then \(\{X_t\}_{t \ge 1}\) is a martingale with respect to the filtration (follows from the smoothing property of conditional expectations). 
- Likelihood ratios are martingales. Suppose \(X_1\), \(X_2\), are independent random variables with density \(f\). Imagine we are considering the alternative hypothesis that these random variables are independent with a different probability density \(g\) (but they are really distributed according to the density \(f\)). We assume that \(f\) and \(g\) have the same support. Define the likelihood ratio \[ Λ_t = \frac{g(X_1)}{f(X_1)} \dots \frac{g(X_t)}{f(X_t)}. \] Then, since \(Λ_{t+1} = Λ_t g(X_{t+1})/f(X_{t+1})\), we have \[\begin{align*} \EXP[Λ_{t+1} \mid X_{1:t}] &= Λ_t \EXP\biggl[ \frac{g(X_{t+1})}{f(X_{t+1})} \biggm| X_{1:t} \biggr] \\ &= Λ_t \EXP\biggl[ \frac{g(X_{t+1})}{f(X_{t+1})} \biggr] \\ &= Λ_t \int\biggl( \frac{g(x)}{f(x)} \biggr) f(x)dx \\ &= Λ_t \int g(x)dx \\ &= Λ_t. \end{align*}\] Hence, \(\{Λ_t\}_{t \ge 1}\) is a martingale. 
- A martingale \(\{X_t\}_{t \ge 1}\) may be written as a sum of increments: \(X_t = X_0 + ξ_1 + \dots + ξ_t\). The increments \(\{ξ_t\}_{t \ge 1}\) are called martingale differences. Each \(ξ_t\) is integrable and \(\EXP[ξ_t \mid \mathcal F_{t-1}] = 0\) for all \(t\). 
43.2 Doob’s decomposition
Theorem 43.1 (Doob’s decomposition) Every sequence \(\{X_t\}_{t \ge 1}\) of integrable random variables adapted to a filtration \(\{\mathcal F_t\}_{t \ge 1}\) can be decomposed into a sum of a martingale and an integrable predictable process, i.e., \[ X_t = M_t + A_t \] where \(\{M_t\}_{t \ge 1}\) is a martingale and \(\{A_t\}_{t \ge 1}\) is a predictable process
- Existence. We can identify the Doob’s decomposition by defining \[ M_t = X_0 + \sum_{s=1}^{t}\bigl( X_s - \EXP[ X_s \mid \mathcal F_{s-1} \bigr],\] and \[A_t = \sum_{s=1}^t \bigl( \EXP[ X_s \mid \mathcal F_{s-1} ] - X_{s-1} \bigr).\] 
- Uniqueness. The decomposition is almost surely unique. If \(X_t = M'_t + A'_t\) is another decomposition, then \(Y_t = M_t - M'_t\) being the difference of martingales is a martingale. But \(M_t - M'_t = A_t - A'_t\), which is a predictable process; combining this with the martingale property of \(M_t - M'_t\) implies that \[ A_t - A'_t = \EXP[ A_t - A'_t \mid \mathcal F_t] = \EXP[ M_t - M'_t \mid \mathcal F_t ] = M_{t-1} - M'_{t-1} = A_{t-1} - A'_{t-1}. \] Since \(A_0 = A'_0 = 0\), we get that \(A_t = A'_t\) for all \(t \ge 1\). 
- A stochastic process \(\{X_t\}_{t \ge 1}\) is a submartingale if and only if its Doob’s decomposition gives a predictable process that is almost surely increasing. 
- A stochastic process \(\{X_t\}_{t \ge 1}\) is a supermartingale if and only if its Doob’s decomposition gives a predictable process that is almost surely decreasing. 
43.3 Martingale Inequalities
Burkholder’s inequality
Let \(\{X_t\}_{t \ge 1}\) be a martingale and let \(\xi_1 = X_1\) and \(\xi_t = X_t - X_{t-1}\) be the corresponding martingale difference sequence. The quadratic variation process is defined as \[ Q_t \coloneqq \sum_{s = 1}^t \NORM{ξ_s}^2. \]
Proposition 43.1 (Burkholder’s inequality) For any \(p > 1\) be such that \(\EXP[|X_t|^p] < ∞\) for all \(t\), there exists positive constants \(c_p\) and \(C_p\) (depending on \(p\) alone) such that \[ c_p \EXP\bigl[ Q_t^{p/2} \bigr] \le \EXP\Bigl[ \sup_{1 \le s \le t} \NORM{X_s}^p \Bigr] \le C_p \EXP\bigl[ Q_t^{p/2} \bigr] \]
The inequality is due to Burkholder (1966). See notes of David Pollard for an accessible proof based on martingale ideas and Bogdan and Więcek (2022) for an elementary proof using Bergman divergence.
Lenglart inequality
Proposition 43.2 (Lenglart inequality) Given a filtration \(\{\ALPHABET F_t\}_{t \ge 1}\), let \(\{X_t\}_{t \ge 1}\) be a non-negative adapted process and \(\{Y_t\}_{t \ge 1}\) be non-negative non-decreasing predictable process such that for any bounded stopping time \(τ\), we have \(\EXP[X_{τ} \mid \ALPHABET F_0] \le \EXP[ Y_{τ} \mid \ALPHABET F_0]\). Then:
- For all \(p \in (0,1)\), \[ \EXP\biggl[ \Bigl( \sup_{t \ge 1} X_t \Bigr)^p \Bigm| \ALPHABET F_0 \biggr] \le C_p \EXP\biggl[ \Bigl( \sup_{t \ge 1} Y_t \Bigr)^p \Bigm| \ALPHABET F_0 \biggr], \] where \(C_p = p^{-p}/(1-p)\) and the constant \(C_p\) is sharp. 
- In addition, if \(\{X_t\}_{t \ge 1}\) is non-decreasing, then we have \[ \EXP\biggl[ \Bigl( \sup_{t \ge 1} X_t \Bigr)^p \Bigm| \ALPHABET F_0 \biggr] \le p^{-p} \EXP\biggl[ \Bigl( \sup_{t \ge 1} Y_t \Bigr)^p \Bigm| \ALPHABET F_0 \biggr], \] and the constant \(p^{-p}\) is sharp. 
- Moreover, for any \(c,d > 0\), we have \[ \PR\Bigl( \sup_{t \ge 1} X_t > c \Bigm| \ALPHABET F_0 \Bigr) \le \frac 1c \EXP\Bigl[ \sup_{t \ge 1} Y_t \wedge d \Bigm| \ALPHABET F_0 \Bigr] + \PR\Bigl( \sup_{t \ge 1} Y_t > d \Bigm| \ALPHABET F_0 \Bigr) \] 
The result is due to Lenglart (1977). See Geiss and Scheutzow (2021) for the sharpness of the bounds.
43.4 Convergence of nonnegative supermartingales
We state certain properties of nonnegative supermartingales without proof. This material is taken from Neveu (1975).
- Maximal inequality for nonnegative supermartingales. For every nonnegative supermartingale \(\{X_t\}_{t \ge 1}\), the random variable \(\sup_{t \ge 1} X_t\) is a.s. finite on the set \(\{X_0 < ∞\}\) and, more precisely, satisfies the following inequality: \[ \PR\Bigl( \sup_{t \ge 1} X_t \ge a \Bigm| \mathcal F_t \Bigr) \le \max\biggl( \frac{X_0}{a}, 1 \biggr) \] 
- Doob’s inequalities. Let \(\{X_t\}_{t \ge 1}\) be a generalized martingale or a nonnegative submartingale. Then for all \(p \in (1,∞)\), and all \(t\), we have: \[ \EXP[\ABS{X_t}^p] \le \EXP\Bigl[ \sup_{1 \le s \le t} \ABS{X_s}^p \Bigr] \le \biggl(\frac{p}{p-1}\biggr)^p \EXP[\ABS{X_t}^p]. \] 
- Switching principle for supermartingales. Given two nonnegative supermartingales \(\{X^{(i)}_t\}_{t \ge 1}\), \(i \in \{1, 2\}\), and a stopping time \(τ\) such that \(X^{(1)}_{τ} \ge X^{(2)}_{τ}\) on \(\{τ < ∞\}\), the formula \[ X_t(ω) = \begin{cases} X^{(1)}_t(ω), & \text{if $t < τ(ω)$} \\ X^{(2)}_t(ω), & \text{if $t \ge τ(ω)$} \end{cases} \] defines a new nonnegative supermartingale. 
- Supermartingale convergence theorem. Every nonnegative supermartingale \(\{X_t\}_{t \ge 1}\) converges almost surely. Furthermore, the limit \(X_{∞} = \lim_{t \to ∞} X_t\) satisfies the inequality \[ \EXP[ X_{∞} \mid \mathcal F_t] \le X_t. \] 
- A nonnegative supermartingale converges in \(\mathcal L^1\) to its limit \(X_{∞}\) if and only if \(\EXP[X_t] \to \EXP[X_{∞}]\). 
- For a nonnegative supermartingale \(\{X_t\}_{t \ge 1}\) and a stopping time \(τ\), the sequence \(\{X_{τ \wedge t}\}_{t \ge 1}\) “stopped at the stopping time \(τ\)” is also a non-negative supermartingale. 
- Stopping theorem Let \(\{X_t\}_{t \ge 1}\) be a nonnegative supermartingale. Then, for any bounded stopping times \(τ_1\) and \(τ_2\), we have \[ X_{τ_1} \ge \EXP[ X_{τ_2} \mid \mathcal F_{τ_1}], \quad \text{ on } \{τ_1 \le τ_2 \}. \] 
The following interpretation is taken from Dellacherie and Meyer (1982).
Suppose the random variable \(X_t\) represents a gambler’s fortune at time \(t\). Then his successive gains are presented by the random variable \(ξ_t = X_t - X_{t-1}\). The gambler may be in an arbitrarily complicated casino, where he may choose between all sorts of games, move from one table to another, bet on other player’s fortunes, etc., but it is understood that his decisions are unprophetic, i.e., they can only be taken as functions for the past and not as functions of the future, with the convention that the present, i.e., the game which is in the process of being played, forms part of the future.
The supermartingale inequality \(\EXP[ξ_t \mid \ALPHABET F_t] \le 0\) means that, whatever decisions are taken by the gambler just before the \(t\)-th game, the average profits from that will be negative. In other words, the game favors the casino—which is what happens in reality! (The martingale equality corresponds to the case of an equitable casino)
Now imagine that the gambler, fearing that he is under the influence of an evil star, confides his fortune to a “luckier” (but still unprophetic) friend and goes out for fresh air and returns at random times \(τ_1 < τ_2 < \dots\). The stopping theorem says that what he observes at these random instances is also a game favorable to the casino (or merely equitable in case of martingales). In other words, things are no better.
The restriction on the length of the stopping time has the following meaning: suppose the gambler tells his friend to “call me at the first moment \(τ_1\) when my gain \(X_{τ_1} - X_0\) is positive. Then call me again at time \(τ_2\) when my gain \(X_{τ_2} - X_{τ_1}\) is positive. And so on” At such moments, mean gain is also positive and hence stopping theorem seems to be contradicted; however, the stopping theorem affirms that the stopping times \(τ_1\), \(τ_2\), etc., are not finite.
The convergence result continues to hold for “almost” supermartingales.
Theorem 43.2 (Almost supermartingale convergence theorem) Suppose \(\{X_t\}_{t \ge 1}\), \(\{β_t\}_{t \ge 1}\), \(\{Y_t\}_{t \ge 1}\), \(\{Z_t\}_{t \ge 1}\) are nonnegative stochastic processes adapted to some filtration \(\{\mathcal F_t\}_{t \ge 1}\) that satisfy \[\begin{equation}\label{eq:almost-supermartingale} \EXP[X_{t+1} \mid \mathcal F_t ] \le (1 + β_t) X_t + Y_t - Z_t, \quad t \ge 1 \end{equation}\] Define the set \(Ω_0\) by \[ Ω_0 = \biggl\{ ω : \sum_{t \ge 1} β_t(ω) < ∞ \biggr\} \cap \biggl\{ ω : \sum_{t \ge 1} Y_t(ω) < ∞ \biggr\}. \] Then, for all \(ω \in Ω_0\), we have that
- \(\lim_{t \to ∞} X_t(ω)\) exists and is finite
- \(\sum_{t \ge 1} Z_t(ω) < ∞\).
The key idea is that we can fiddle with \eqref{eq:almost-supermartingale} to make it a supermartingale. Let \(b_t = 1/\prod_{\tau = 1}^{t-1} ( 1 + β_{τ} )\). Note that \(\{b_t\}_{t \ge 1}\) is \(\{\mathcal F_t\}_{t \ge 1}\) predictable. Define the nonnegative processes \(\{X'_t\}_{t \ge 1}\) \(\{Y'_t\}_{t \ge 1}\) and \(\{Z'_t\}_{t \ge 1}\), where \[ X'_t = b_t X_t, \quad Y'_t = b_{t+1} Y_t, \quad Z'_t = b_{t+1} Z_t. \] \[\begin{align} \EXP[X'_{t+1} \mid \mathcal F_t] &= b_{t+1} \EXP[ X_{t+1} \mid \mathcal F_t] \notag \\ &\le b_{t+1}(1 + β_t) X_t + b_{t+1} Y_t - b_{t+1} Z_t \notag \\ &= X'_t + Y'_t - Z'_t. \label{eq:almost-supermartingale-st-1} \end{align}\]
Define \[\begin{equation}\label{eq:almost-supermartingale-st-2} W_t = X'_t - \sum_{s = 1}^{t-1} (Y'_s - Z'_s) . \end{equation}\] From \eqref{eq:almost-supermartingale-st-1} we have \[ \EXP[W_{t+1} \mid \mathcal F_t] = \EXP\biggl[ X'_{t+1} - \sum_{s=1}^t(Y'_s - Z'_s) \biggm| \mathcal F_t \biggr] \le X_t - \sum_{s=1}^{t-1} (Y'_s - Z'_s) = W_t. \] Therefore, \(\{W_t\}_{t \ge 1}\) is a supermartingale, but we cannot immediately apply the supermartingale convergence theorem because we don’t know that \(W_t\) is bounded from below.
For an arbitrary \(a > 0\), define the stopping time \(τ_a = \inf \bigl\{ t : \sum_{s=1}^{t} Y'_s > a \bigr\}\). Now define the stopped sequence \[W^{(a)}_t := W_{τ_a \wedge t} = \begin{cases} W_t, & \hbox{if } t \le τ_a \\ W_{τ_a}, & \hbox{if } t > τ_a \end{cases}, \quad t \ge 1.\] Note that the stopped sequence satisfies: \[ \EXP[W^{(a)}_{t+1} \mid \ALPHABET F_t] \stackrel{(a)}= \begin{cases} \EXP[W_{t+1} \mid \ALPHABET F_t] , & \hbox{if } t+1 \le τ_a \\ W_{τ_a}, & \hbox{if } t + 1 > τ_a \end{cases} \stackrel{(b)}\le \begin{cases} W_t, & \hbox{if } t < τ_a \\ W_{τ_a}, & \hbox{if } t \ge τ_a \end{cases} \stackrel{(c)}= W^{(a)}_t, \] where \((a)\) uses the definition of \(W^{(a)}_t\) and the fact that \(\EXP[W_{τ_a} \mid \ALPHABET F_t] = W_{τ_a}\) since \(τ_a \le t\); \((b)\) uses the fact that \(\{W_t\}_{t \ge 1}\) is a supermartingale; and \((c)\) uses the fact that the two terms coincide when \(t = τ_a\).
Thus, the stopped sequence \(\{W^{(a)}_{t}\}_{t \ge 1}\) is a supermartingale. Moreover, it is bounded from below since \[ W^{(a)}_t = W_{t \wedge τ} \ge - \sum_{s = 1}^{τ \wedge (t-1)} Y_s \ge -a,\] So, we can think of \(W^{(a)}_{t} + a\) as a nonnegative supermartingale. Therefore, by the supermartingale convergence theorem, for any given \(a\), \(\lim_{t \to ∞} W^{(a)}_{t}\) exists and is finite a.s.
Now define \(Ω^{(a)}\) to be the set \(\{ ω : \sum_{t \ge 1} Y_t(ω) < a \}\). We know that \(Y'_t = b_{t+1} Y_t < Y_t\). So, on the set \(Ω^{(a)}\), \(\sum_{t \ge 1} Y'_t(ω) < a\). Hence, for all \(ω \in Ω^{(a)}\), we have \(τ_{a}(ω) = ∞\). Therefore, \(W_t = W^{(a)}_t\) and, by the previous argument, \(\{W_t\}_{t \ge 1}\) converges to a finite limit.
By countable additivity, we get that \(W_t\) converges to a finite limit for all \(a \in \integers\).
Side remark: It took me a while to understand why this is the case, so I am including a complete formal argument.
- Fix an \(a\). We know that on \(Ω^{(a)}\), \(W_t\) has a finite limit. Denote that limit by \(W_{∞}\).
- For any \(ε > 0\), define the event \(\ALPHABET E = \cap_{s = 1}^∞ \cup_{t = s}^∞ |W_t(ω) - W_{∞}(ω)| > ε\).
- The fact that \(W_t\) converges almost surely to \(W_{∞}\) on \(Ω^{(a)}\) means that \(\PR(Ω^{(a)} \cap \ALPHABET E) = 0\).
- We now restrict attention to integer values of \(a\). Since \(Ω^{(a)} \cap \ALPHABET E\) is an increasing sequence, we have \[ \PR(Ω_0 \cap \ALPHABET E) = \PR(\lim_{a \to ∞} Ω^{(a)} \cap \ALPHABET E) = \lim_{a \to ∞} \PR(Ω^{(a)} \cap \ALPHABET E) = 0. \] Hence, \(W_t\) converges to \(W_{∞}\) almost surely on \(Ω_0\).
Now \eqref{eq:almost-supermartingale-st-2} implies that \[ \lim_{t \to ∞} \biggl \{ X'_t - \sum_{s=1}^{t-1} (Y'_s - Z'_s) \biggr\} < ∞. \] Hence, on \(Ω_0\), \(\sum_{s=1}^∞ Z'_s < ∞\) and \(\lim_{t \to ∞} X'_t\) exists and is finite.
From Exercise 43.1, it follows that for \(ω \in Ω_0\), \(b_t(ω)\) has a limit. \[ X_t = X'_t/b'_t \] has a limit. Moreover, the inequality, \[ Z_t \le Z'_t \prod_{s \ge 1} (1 + β_s) \] implies that \(\sum_{t \ge 1} Z_t < ∞\) for \(ω \in Ω_0\).
The following is a determinisitic version of the above result, taken from Bertsekas and Tsitsiklis (2000).
Proposition 43.3 Let \(\{X_t\}_{t \ge 1}\), \(\{Y_t\}_{t \ge 1}\), and \(\{Z_t\}_{t \ge 1}\) be sequences such that \(\{Z_t\}_{t \ge 1}\) is nonnegative and the sequences satisfy: \[ X_{t+1} \le X_t + Y_t - Z_t \] and \(\sum_{t \ge 1} Y_t < ∞\). Then, either \(X_t \to -∞\) or else \(X_t\) converges to a finite value and \(\sum_{t \ge 1} Z_t < ∞\).
The following generalization of Theorem 43.2 is useful in some settings.
Corollary 43.1 Suppose \(\{X_t\}_{t \ge 1}\), \(\{β_t\}_{t \ge 1}\), \(\{Y_t\}_{t \ge 1}\), \(\{Z_t\}_{t \ge 1}\) are nonnegative stochastic processes adapted to some filtration \(\{\mathcal F_t\}_{t \ge 1}\) that satisfy \[\begin{equation}\label{eq:almost-supermartingale-alpha} \EXP[X_{t+1} \mid \mathcal F_t ] \le (1 + β_t) X_t + Y_t - \textcolor{red}{γ_t} Z_t, \quad t \ge 1 \end{equation}\] Define the set \(Ω_0\) by \[ Ω_0 = \biggl\{ ω : \sum_{t \ge 1} β_t(ω) < ∞ \biggr\} \cap \biggl\{ ω : \sum_{t \ge 1} Y_t(ω) < ∞ \biggr\}. \] and \[ Ω_1 = \biggl\{ ω : \sum_{t \ge 1} γ_t(ω) = ∞ \biggr\}. \] Then,
- For all \(ω \in Ω_0\), we have that \(\lim_{t \to ∞} X_t(ω)\) exists and is finite 
- For all \(ω \in Ω_0 \cap Ω_1\), \(\lim_{t \to ∞} Z_t(ω) = 0\). 
Part (i) follows immediately from Theorem 43.2.
For part (ii), first observe that Theorem 43.2 implies that for all \(ω \in Ω_0\), \[ \sum_{t \ge 1} γ_t(ω) Z_t(ω) < ∞. \] But, for \(ω \in Ω_1\), we have that \(\sum_{t \ge 1} γ_t(ω) = ∞\). Then, it must be the case that \(Z_t(ω) \to 0\) for all \(ω \in Ω_0 \cap Ω_1\).
This can be proved by contradiction. Fix \(ω \in Ω_0 \cap Ω_1\). Suppose that there exists a \(c > 0\) such that for all \(t \ge T(ω)\), \(|Z_t(ω)| > c\). Then, (recall all variables are non-negative) \[ \sum_{t \ge T(ω)} γ_t(ω) Z_t(ω) \ge c \sum_{t \ge T(ω)} Z_t(ω) = ∞, \] which implies that \(\sum_{t \ge 1} γ_t(ω) Z_t(ω)\) diverges, which is a contradiction.
43.5 Convergence of submartingales
Theorem 43.3 (Krickeberg decomposition) Let \(\{S_t\}_{t \ge 1}\) be a submartingale for which \(\sup_t \EXP[S_t^{+}] < ∞\). Then, there exists a positive martingale \(\{M_t\}_{t \ge 1}\) and a positive supermartingale \(\{X_t\}_{t \ge 1}\) such that \(S_t = M_t - X_t\) almost sure for each \(t\).
The finiteness of \(\sup_{t} \EXP[ S_{t}^{+}]\) is equivalent to the finiteness of \(\sup_{t} \EXP[|S_t|]\) because \(|S_t| = 2S_t^{+} - (S_t^{+} - S_t^{-})\) and by the submartingale property, \(\EXP[S_t^{+} - S_t^{-}] = \EXP[S_t]\) increases with \(t\).
Corollary 43.2 A submartingale with \(\sup_{t} \EXP[S_t^{+}] < ∞\) converges almost surely to an integrable limit.
43.6 Square integrable martingales
- If \(\{X_t\}_{t \ge 1}\) is an integrable sequence adapted to \(\{\ALPHABET F_t\}_{t \ge 1}\). Define the compensator \(\{ \langle X_t \rangle\}_{t \ge 1}\) as: \(\langle X_1 \rangle = 0\) and for \(t > 1\): \[ \langle X_t \rangle - \langle X_{t-1} \rangle = \EXP[ X_t \mid \ALPHABET F_{t-1} ] - X_{t-1}. \] Note that the compensator is predictable and \(X_t - \langle X_t \rangle\) is a martingale. 
- Now consder a square integrable martingle \(\{X_t\}_{t \ge 1}\). The compensator is given by: \[ \langle X_t^2 \rangle - \langle X_{t-1}^2 \rangle = \EXP[ X_t^2 - X_{t-1}^2 \mid \ALPHABET F_{t-1} ] = \EXP[ (X_t - X_{t-1})^2 \mid \ALPHABET F_{t-1} ]. \] Thus, the compensator \(\langle X_t^2 \rangle\) is increasing. 
- Convergence of square integrable martinales. Let \(\langle X_{∞} \rangle = \lim_{t \to ∞} \langle X_t \rangle\). If \(\EXP[ \langle X_{∞} \rangle ] < ∞\), then \(\{X_t\}_{t \ge 1}\) converges almost surely in the mean-square sense to a random variable \(X_{∞}\). 
- Let \(\{X_t\}_{t \ge 1}\) be a square integrable martingale adapted to \(\{\ALPHABET F_t\}_{t \ge 1}\). Let \(τ\) be a stopping time such that \(\EXP[ \langle X_{τ} \rangle ] < ∞\). Then, - \(\EXP[X_{τ} \mid \ALPHABET F_1] = X_1\).
- \(\EXP[X_{τ}^2 \mid \ALPHABET F_1 ] = \EXP[ \langle M_{τ} \rangle \mid \ALPHABET F_1]\).
 
43.7 Strong law of large number for martingale difference sequence
The following is a generalization of the strong law of large numbers for martingale differences. See Stout (1974) (Theorem 3.3.1).
Theorem 43.4 Let \(\{M_t\}_{t \ge 1}\) be a martingale difference sequence with respect to a filtration \(\{\ALPHABET F_t\}_{t \ge 1}\) and \(\{a_t\}_{t \ge 1}\) be positive sequence that is \(\{\ALPHABET F_t\}_{t \ge 1}\) predictable and \(\lim_{t \to ∞} a_t = ∞\).
If for some \(p \in (0,2]\), we have \[ \sum_{t=1}^∞ \EXP \biggl[ \frac{|M_t|^p}{a_t^p} \biggm| \ALPHABET F_{t-1} \biggr] < ∞, \] then \[ \sum_{t=1}^∞ \frac{M_t}{a_t} = 0, \quad a.s. \]
43.8 Central limit theorem
There are various versions of central limit theorems for martingales. The following is the simplest. See (Billingsley 2013, Theorem 35.11)
Theorem 43.5 Let \(\{M_t\}_{t \ge 1}\) be a martingale difference sequence with respect to a filteration \(\{\ALPHABET F_t\}_{t \ge 1}\) such that there exists a deterministic bounded sequence \(\{c_t\}_{t \ge 1}\) such that \[ \ABS{M_t} \le c_t, \text{ a.s.}, \quad \forall t \ge 1. \] The increasing process associated with \(\{ M_t^2 \}_{t \ge 1}\) is defined as \[ A_T = \sum_{t = 1}^T \EXP[ M_{t} \mid \ALPHABET F_{t-1} ], \quad \forall T \ge 1. \]
Let \(Ω_0\) denote the sample paths along with \(A_t\) is unbounded, i.e., \[ Ω_0 = \{ ω \in Ω : \lim_{t \to ∞} A_t = ∞ \}. \] For a given \(t\), let \(ν_t\) denote the first time \(T\) when \(A_T \ge t\). Note that \(ν_t\) is a stopping time and \(ν_t(ω) < ∞\) for \(ω \in Ω_0\). Then, if \(\PR(Ω_0) = 1\) then \[ \frac{1}{\sqrt{T}} \sum_{t=1}^{ν_T} Y_t \xrightarrow{(d)} \mathcal{N}(0,1). \]
43.9 Martingale concentration
As mentioned earlier, martingales generalize sums of independent random variables. Then, it is natural to expect, that in addition to SLLN and CLT, they would also satisfy some form of concentration.
The simplest of these is the generalization of Azuma-Hoeffding inequalty. See (Raginsky and Sason 2013, Theorem 2.2.1)
Theorem 43.6 (Azuma-Hoeffding inequality for martingales) Let \(\{M_t\}_{t \ge 1}\) be a martingale difference sequence with respect to a filteration \(\{\ALPHABET F_t\}_{t \ge 1}\) such that there exists a deterministic bounded sequence \(\{c_t\}_{t \ge 1}\) such that \[ \ABS{M_t} \le c_t, \text{ a.s.}, \quad \forall t \ge 1. \] Then, for any \(T \ge 1\) and any \(ε > 0\), we have \[ \PR\biggl( \biggl| \sum_{t=1}^T M_t \biggr| > ε \biggr) \le 2 \exp\biggl( \frac{ - ε^2 }{2 \sum_{t=1}^T c_t^2 } \biggr). \]
Using the smoothing property of conditional expectation, we have \[\begin{align} \EXP\biggl[ \exp\biggl( s \biggl( \sum_{t=1}^T M_t \biggr) \biggr) \biggl] &= \EXP\biggl[ \exp\biggl( s \biggl( \sum_{i=1}^{n-1} M_t \biggr) \biggr) \, \EXP\bigl[ \exp\bigl( s M_T \bigr) \bigm| \mathcal{F}_{T-1} \bigl] \biggl] \notag \\ &\le \EXP\biggl[ \exp\biggl( s \biggl( \sum_{i=1}^{n-1} M_t \biggr) \biggr) \biggl] \, \exp\bigl( \tfrac12 s^2 c_t^2 \bigr), \end{align}\] where the inequality follows from :Hoeffding’s lemma applied to \(M_T\) conditioned on \(\ALPHABET F_{T-1}\). Iterating backwards this way, we get \[ \EXP\biggl[ \exp\biggl( s \biggl( \sum_{t=1}^T M_t \biggr) \biggr) \biggl] \le \exp\biggl(\tfrac{1}{2} s^2 \sum_{t=1}^T c_T^2 \biggr). \] Thus, \(\sum_{t=1}^T M_t\) is sub-Gaussian with proxy \(\sum_{t=1}^T c_t^2\). Then, from Lemma 39.1, we get \[ \PR\biggl( \sum_{t=1}^T M_t \ge t \biggr) \le \exp\biggl( - \frac{t^2}{2 \sum_{t=1}^T c_t^2 } \biggr) \] and \[ \PR\biggl( \sum_{t=1}^T M_t \le -t \biggr) \le \exp\biggl( - \frac{t^2}{2 \sum_{t=1}^T c_t^2 } \biggr). \] Conbining these two, we get the stated result.
For simplicity, assume that there exists a \(C\) such that \(c_t \le C\). Then, an alternate way to state Theorem 43.6 is the following: for any \(T\) and any \(δ > 0\), it holds that with probability at least \(1 - δ\), \[ \frac{1}{T} \sum_{t=1}^T M_t \le C \sqrt{ \frac{ 2 \log(2/δ) }{T} } \]
If you look at the proof of Theorem 43.6, then the only place we used the assumption that \(M_t\) was bounded was while invoking Hoeffding’s lemma to show that \[ \EXP[ \exp(s M_t) \mid \ALPHABET F_{t-1} ] \le \exp(\frac 12 s^2 c_t^2). \] But such a bound on moment generating function holds for any sub-Guassian random variable! However, defining (and verifying) conditionally sub-Gaussian random variables is a bit tricky, so we can use Lemma 39.3 to assume a stronger condition that is easier to verify.
Let \(\{M_t\}_{t \ge 1}\) be a martingle difference sequence with respect to a filtration \(\{\ALPHABET F_t\}_{t \ge 1}\) such that \[ \PR(|M_t| > t \mid \ALPHABET F_{t-1}) \le 2\exp\biggl(\frac{-t^2}{2 σ_t^2} \biggr). \] Then, from Lemma 39.3, it follows that \[\begin{equation}\label{eq:sub-Gaussian-tails} \EXP[ \exp(s M_t) \mid \ALPHABET F_{t-1} ] \le \exp(4 s^2 σ_t^2). \end{equation}\] Consequently, following the proof argument of Theorem 43.6, we can show that for any \(T \ge 1\) and any \(ε > 0\), we have \[ \PR\biggl( \biggl| \sum_{t=1}^T M_t \biggr| > ε \biggr) \le 2 \exp\biggl( \frac{ - ε^2 }{16 \sum_{t=1}^T σ_t^2 } \biggr). \]
A variation of the above result is presented in Shamir (2011), with slightly weaker assumptions (but it appears with slightly looser constants).
We now present Bernstein inequality counterpart for martingales. See (Tropp 2011, Theorem 2.3) for proof.
Theorem 43.7 (Tropp Inequality) Let \(\{M_t\}_{t \ge 1}\) be an adapted sequence and \(\{V_t\}_{t \ge 1}\) be a predictable sequence of real symmetric matrices (or complex Hermitian matrices) of dimension \(d\) with respect to a filtration \(\{\ALPHABET F_t\}_{t \ge 1}\). Assume that there is a function \(g : (0, ∞) \to [0, ∞]\) such that for all \(s > 0\), \[ \EXP[ e^{s M_t} \mid \ALPHABET F_{t-1} ] \preceq e^{g(s) V_t}, \quad a.s. \] Then, for all \(ε, δ \in \reals\), \[ \PR\biggl(\exists T \ge 1 : λ_{\max}\biggl(\sum_{t=1}^{T} M_t \biggr) \ge ε \text{ and } λ_{\max}\biggl( \sum_{t=1}^{T} V_t \biggr) \le δ \biggr) \le d \inf_{s > 0} e^{-s ε + g(s) δ} \]
- The inequality in Theorem 43.7 is a maximal inequality. It also provides a bound on maximal eigenvalues at each time because for any sequence of events \(\{E_t\}_{t \ge 1}\), we have \[ \PR(E_T) \le \PR(\exists T \ge 1 : E_T). \] 
- We can view the Azuma-Hoeffding inequality with sub-Gaussian tails as a special case of Theorem 43.7. In that setting, the process \(V_t = σ_t^2\) is a deterministic process and the function \(g(s) = 4s^2\). Therefore, for \(d=1\), the upper bound is given by \[ \PR\biggl(\exists T \ge 1 : \sum_{t=1}^T M_t > ε \biggr) \le \inf_{s > 0} e^{ - s ε + 4s^2 δ } = e^{-s/{16 δ}} \] where \(δ = \sum_{t=1}^T σ_t^2\). Thus, we recover the one-sided version of the result. 
- Theorem 43.7 generalizes the Azuma-Hoeffding inequality to the setting where the variance proxy is a random variable. An example is Freedman inequality. See Tropp (2011) for discussion. 
Exercises
Exercise 43.1 The purpose of this exercise is to establish the following: For a sequence of nonnegative real numbers \(\{a_t\}_{t \ge 1}\) \[\sum_{t \ge 1} a_t < ∞ \iff \prod_{t \ge 1} (1 + a_t) \text{ converges to a finite non-zero limit}. \]
- Show that \(\sum_{t \ge 1}a_t \le \prod_{t \ge 1}(1 + a_t)\). - Hint: Expand the right hand side. 
- Show that \(\prod_{t \ge 1}(1 + a_t) \le \exp\Bigl(\sum_{t \ge 1} a_t\Bigr)\). - Hint: Use Taylor series expansion of \(e^x\). 
Exercise 43.2 (Kronecker’s Lemma) This result is useful intermediate step for showing convergence of various iterative algorithms.
Suppose \(\{a_t\}_{t \ge 1}\) is a strictly positive increasing real sequence which diverges to \(∞\) and \(\{x_t\}_{t \ge 1}\) is a real sequence such that the series \(\sum_{t=1}^∞ x_t/a_t\) converges. Then, \[ \lim_{T \to ∞} \frac 1{a_T} \sum_{t=1}^T x_t = 0. \]
Exercise 43.3 Consider a sequence \(\{α_t\}_{t \ge 1}\) adapted to \(\{\mathcal F_t\}\) such that \[ \sum_{t=1}^{∞} α_t = \infty \quad\text{and}\quad \sum_{t=1}^{∞} α_t^2 < ∞. \] Now suppose \(\{X_t\}_{t \ge 1}\) is an adapted sequence of square integrable random variables such that \[ \EXP[ X_{t+1} | \mathcal F_t ] = 0 \quad\text{and}\quad \EXP[ \| X_{t+1}\|_2^2 \mid \mathcal F_t ] \le B \] where \(B\) is a deterministic constant. Then \[ \sum_{t=1}^T α_t X_{t+1} \quad\text{and}\quad \sum_{t=1}^T α_t^2 \| X_t\|_2^2, \quad T = 1,2, \dots \] converge to finite limits almost surely.
Exercise 43.4 (Generalization of Theorem 43.2) Suppose \(\{X_t\}_{t \ge 1}\), \(\{a_t\}_{t \ge 1}\), \(\{Y_t\}_{t \ge 1}\) be nonnegative processes which are adapted to some filtration \(\{\ALPHABET F_t\}_{t \ge 1}\) such that \[ \EXP[X_{t+1} \mid \ALPHABET F_t] \le a_t X_t + Y_t. \] Define the set \(Ω_0\) by \[ Ω_0 = \biggl\{ ω : \prod_{t \ge 1} a_t(\omega) < ∞ \biggr\} \cap \biggl\{ ω : \sum_{t \ge 1} Y_t(ω) < ∞ \biggr\} \] Then, show that for all \(ω \in Ω_0\), we have that \(\lim_{t \to ∞} X_t(ω)\) exists and is finite.
Exercise 43.5 (Vector generalization of Theorem 43.2) In all the discussion of almost supermartingale results so far, we have assumed that the processes were scalar valued. Now consider the case where \(\{X_t\}_{t \ge 1}\) and \(\{Y_t\}_{t \ge 1}\) are processes in \(\reals_{\ge 0}^n\) and \(\{A_t\}_{t \ge 1}\) is a process in \(\reals_{\ge 0}\), all adapted to a filtration \(\{\ALPHABET F_t\}_{t \ge 1}\) such that \[ \EXP[X_{t+1} \mid \ALPHABET F_t] \le A_t X_t + Y_t. \]
Assume that for all \(t\), the matrix \(A_t\) is of the form \(P_t D_t\), where \(P_t\) is a permutation matrix and \(D_t\) is a diagonal matrix with all positive elements.1
1 Such matrices are called :positive monomial matrices and have the property that they are invertible and all entries of the inverse are non-negative.
Define the set \(Ω_0\) by \[ Ω_0 = \biggl\{ ω : \prod_{t \ge 1} A_t(\omega) < ∞ \biggr\} \cap \biggl\{ ω : \sum_{t \ge 1} Y_t(ω) < ∞ \biggr\} \] Then, show that for all \(ω \in Ω_0\), we have that \(\lim_{t \to ∞} X_t(ω)\) exists and is finite.
Exercise 43.6 (Delayed almost supermartingales) Suppose \(\{X_t\}_{t \ge 1}\), \(\{β_t\}_{t \ge 1}\), and \(\{Y_t\}_{t \ge 1}\) be nonnegative processes which are adapted to some filtration \(\{\ALPHABET F_t\}_{t \ge 1}\) such that \[ \EXP[ X_{t+1} \mid \ALPHABET F_t] \le (1 + β_t) X_{t-d} + Y_t, \] where \(d \in \integers_{\ge 0}\) is a constant. Define the set \(Ω_0\) by \[ Ω_0 = \biggl\{ ω : \sum_{t \ge 1} β_t(\omega) < ∞ \biggr\} \cap \biggl\{ ω : \sum_{t \ge 1} Y_t(ω) < ∞ \biggr\} \] Then, show that for all \(ω \in Ω_0\), we have that \(\lim_{t \to ∞} X_t(ω)\) exists and is finite.
Hint: Write the dynamics equation in vector form and use Exercise 43.5
Exercise 43.7 Let \(\{X_t\}_{t \ge 1}\) be square integrable martingale with \(\EXP[ X_T^2 ] \le T\) for all \(T\). Prove that \(X_T/T \to 0\), a.s.
[This may be viewed as a generalization of a SLLN for i.i.d. sequences.]
Hint: Use Abel’s lemma to show \(\sum_{t=1}^T \EXP[ M_t^2 ]/t^2 < ∞\) and then use Theorem 43.4.
Notes
The material in the introduction is taken from Pollard (2002), Chang (2007), and Dellacherie and Meyer (1982). The latter is an excellent reference for this material. Also see Doob (1971) for a detailed informal introduction to the topic. See Mazliak and Shafer (2022) for a historical account of the development of martingales, including a fascinating deep dive into the origin of the name.
Theorem 43.2 and its proof is adapted from Robbins and Siegmund (1971) and this post on math stackexchange. A version of Theorem 43.2 with \(Z_t = 0\) is stated as Exercise II-4 of Neveu (1975). Corollary 43.1 is from Karandikar and Vidyasagar (2024).
Exercise 43.3 is from Bertsekas and Tsitsiklis (2000). Exercise 43.5 is from Mahajan et al. (2024), which provides other vector generalizations of Theorem 43.2 as well.