# ECSE 506: Stochastic Control and Decision Theory

Theory: A Martingale Principle of Optimal Control

Consider a discounted cost MDP (with bounded rewards) and fix a Markov policy $$g \colon \ALPHABET X \to \reals$$. Suppose for a bounded function $$Φ \colon \ALPHABET X \to \reals$$, we define a process $$\{M_t\}_{t \ge 1}$$ starting at $$M_1 = 0$$ and its increments $$ΔM_t = M_{t+1} - M_t$$ given by $ΔM_t = c(X_t, g(X_t)) + β Φ(X_{t+1}) - Φ(X_t).$

Proposition

If $$\{M_t\}_{t\ge1}$$ is a submartingale for all policies $$g$$ and, for some policy $$g^*$$, $$\{M_t\}_{t \ge 1}$$ is a martingale, then $$g^*$$ is an optimal policy and $$Φ(x) = V^{g^*}(x) = V(x)$$.

#### Proof

Let $$\{\mathcal{F}_t\}_{t \ge 1}$$ denote the natural filtration of the process.

As a first step, we define a new process $$\{M^β_t\}_{\ge 1}$$ where

$M^β_t = \sum_{s=1}^{t-1} β^{s-1} ΔM_s.$

Note that

\begin{align*} \EXP[ M^β_{t+1} \mid \mathcal{F}_t ] &= \sum_{s=1}^{t-1} β^{s-1} ΔM_s + β^t \EXP[ M_{t+1} \mid \mathcal{F}_t ] - β^t M_t. \\ &= M^β_t + β^t \EXP[ M_{t+1} \mid \mathcal{F}_t ] - β^t M_t. \end{align*}

Thus, if $$\{M_t\}_{t \ge 1}$$ is a martingale or submartingale or supermartingale, then so is $$\{M^β_t\}_{t \ge 1}$$.

Now, we assume that $$\{M_t\}_{t \ge 1}$$ is a submatrignale for all policies. Then, as argued above, so is $$\{M^β_t\}_{t \ge 1}$$. Therefore,

$0 = M^β_1 \le \EXP[ M^β_t | \mathcal{F}_1 ] = \EXP\biggl[ \sum_{s=1}^{t-1} β^{s-1} c(X_s, g(X_s)) + β^{t} Φ(X_{t+1}) - Φ(X_1) \biggm| X_1 = x \biggr].$

Rearranging terms and letting $$t \to ∞$$, we get that for any policy $$g$$ $Φ(x) \le \EXP\biggl[ \sum_{s=1}^∞ β^{s-1} c(X_s, g(X_s)) \biggm| X_1 = x \biggr],$ where the inequality holds with equality of $$\{ M^β_t \}_{t \ge 1}$$ is a martingale for some policy $$g^*$$. Thus, we get that

$Φ(x) = V^{g^*}(x) \le V^g(x).$

Hence, the policy $$g^*$$ is optimal.

# References

See Davis (1979) for a historical review of martingale methods for stochastic control.

Davis, M.H.A. 1979. Martingale methods in stochastic control. In: Stochastic control theory and stochastic differential systems. Springer-Verlag, 85–117. DOI: 10.1007/bfb0009377.

This entry was last updated on 27 Jul 2020 and posted in MDP and tagged infinite horizon, discounted cost, martingales.