ECSE 506: Stochastic Control and Decision Theory

Aditya Mahajan
Winter 2022

About  |  Lectures  |  Notes  |  Coursework

Theory: A Martingale Principle of Optimal Control

Consider a discounted cost MDP (with bounded rewards) and fix a Markov policy \(g \colon \ALPHABET X \to \reals\). Suppose for a bounded function \(Φ \colon \ALPHABET X \to \reals\), we define a process \(\{M_t\}_{t \ge 1}\) starting at \(M_1 = 0\) and its increments \(ΔM_t = M_{t+1} - M_t\) given by \[ ΔM_t = c(X_t, g(X_t)) + β Φ(X_{t+1}) - Φ(X_t). \]

Proposition

If \(\{M_t\}_{t\ge1}\) is a submartingale for all policies \(g\) and, for some policy \(g^*\), \(\{M_t\}_{t \ge 1}\) is a martingale, then \(g^*\) is an optimal policy and \(Φ(x) = V^{g^*}(x) = V(x)\).

Proof

Let \(\{\mathcal{F}_t\}_{t \ge 1}\) denote the natural filtration of the process.

As a first step, we define a new process \(\{M^β_t\}_{\ge 1}\) where

\[ M^β_t = \sum_{s=1}^{t-1} β^{s-1} ΔM_s. \]

Note that

\[\begin{align*} \EXP[ M^β_{t+1} \mid \mathcal{F}_t ] &= \sum_{s=1}^{t-1} β^{s-1} ΔM_s + β^t \EXP[ M_{t+1} \mid \mathcal{F}_t ] - β^t M_t. \\ &= M^β_t + β^t \EXP[ M_{t+1} \mid \mathcal{F}_t ] - β^t M_t. \end{align*}\]

Thus, if \(\{M_t\}_{t \ge 1}\) is a martingale or submartingale or supermartingale, then so is \(\{M^β_t\}_{t \ge 1}\).

Now, we assume that \(\{M_t\}_{t \ge 1}\) is a submatrignale for all policies. Then, as argued above, so is \(\{M^β_t\}_{t \ge 1}\). Therefore,

\[ 0 = M^β_1 \le \EXP[ M^β_t | \mathcal{F}_1 ] = \EXP\biggl[ \sum_{s=1}^{t-1} β^{s-1} c(X_s, g(X_s)) + β^{t} Φ(X_{t+1}) - Φ(X_1) \biggm| X_1 = x \biggr]. \]

Rearranging terms and letting \(t \to ∞\), we get that for any policy \(g\) \[ Φ(x) \le \EXP\biggl[ \sum_{s=1}^∞ β^{s-1} c(X_s, g(X_s)) \biggm| X_1 = x \biggr], \] where the inequality holds with equality of \(\{ M^β_t \}_{t \ge 1}\) is a martingale for some policy \(g^*\). Thus, we get that

\[ Φ(x) = V^{g^*}(x) \le V^g(x). \]

Hence, the policy \(g^*\) is optimal.

References

See Davis (1979) for a historical review of martingale methods for stochastic control.

Davis, M.H.A. 1979. Martingale methods in stochastic control. In: Stochastic control theory and stochastic differential systems. Springer-Verlag, 85–117. DOI: 10.1007/bfb0009377.

This entry was last updated on 27 Jul 2020 and posted in MDP and tagged infinite horizon, discounted cost, martingales.