# ECSE 506: Stochastic Control and Decision Theory

## Aditya Mahajan

Winter 2022

### About | Lectures | Notes | Coursework

# 1 Preliminaries

## 1.1 Lipschitz continuous functions

Given two metric spaces \((\ALPHABET X, d_X)\) and \((\ALPHABET Y, d_Y)\), the Lipschitz constant of function \(f \colon \ALPHABET X \to \ALPHABET Y\) is defined by \[ \| f\|_{L} = \sup_{x_1 \neq x_2}
\left\{ \frac{ d_Y(f(x_1), f(x_2)) } { d_X(x_1, x_2) } :
x_1, x_2 \in \ALPHABET X \right\} \in [0, ∞]. \] The function is called *Lipschitz continuous* if its Lipschitz constant is finite.

Intuitively, a Lipschitz continuous function is limited by how fast it can change. For example, the following image from Wikipedia shows that for a Lipschitz continuous function, there exists a double cone (white) whose origin can be moved along the graph so that the whole graph always stays outside the double cone.

Let \(\ALPHABET U\) be an arbitrary set. A function \(f \colon \ALPHABET X × \ALPHABET U \to \ALPHABET Y\) is said to be *uniformly Lipschitz* in \(u\) if \[ \sup_{u \in \ALPHABET U} \| f(\cdot, u) \|_L =
\sup_{u \in \ALPHABET U} \sup_{x_1 \neq x_2}
\dfrac{ d_Y(f(x_1,u), f(x_2, u)) }{ d_X(x_1, x_2) } < ∞. \]

## 1.2 Some examples

A function \(f \colon \reals \to \reals\) is Lipschitz continuous if and only if it has bounded first derivative. The Lipschitz constant of such a function is equal to the maximum absolute value of the derivative.

Here are some examples of Lipschitz continuous functions:

The function \(f(x) = \sqrt{x^2 + 1}\) defined over \(\reals\) is Lipschitz continuous because it is everywhere differentiable and the maximum value of the derivative is \(L = 1\).

The function \(f(x) = |x|\) defined over \(\reals\) is Lipschitz continuous with Lipschitz constant equal to \(1\). Note that this function is continuous but not differentiable.

The function \(f(x) = x + \sin x\) defined over \(\reals\) is Lipschitz continuous with a Lipschitz constant equal to \(1\).

The function \(f(x) = \sqrt{x}\) defined over \([0,1]\) is

*not*Lipschitz continuous because the function becomes infinitely steep as \(x\) approaches \(0\).The function \(f(x) = x^2\) defined over \(\reals\) is

*not*Lipschitz continuous because it becomes arbitrarily steep as \(x\) approaches infinity.The function \(f(x) = \sin(1/x)\) is bounded but

*not*Lipschitz because becomes infinitely steep as \(x\) approaches \(0\).

## 1.3 Properties of Lipschitz functions

**Prop. 1**-
Lipschitz continuous functions have the following properties:

If a function \(f \colon (\ALPHABET X, d_X) \to (\ALPHABET Y, d_Y)\) is Lipschitz continuous, then \(f\) is uniformly continuous and measurable.

\(\| f\|_L = 0\) if and only if \(f\) is a constant.

If \(f \colon (\ALPHABET X, d_X) \to (\ALPHABET Y, d_Y)\) and \(g \colon (\ALPHABET Y, d_Y) \to (\ALPHABET Z, d_Z)\) are Lipschitz continuous, then \[ \| f \circ g \|_L \le \| f \|_L \cdot \| g \|_L. \]

The \(\| \cdot \|_{L}\) is a seminorm on the vector space of Lipschitz functions from a metric space \((\ALPHABET X, d_X)\) to \((\ALPHABET Y, d_Y)\). In particular, \(\| \cdot \|_L\) has the following properties: \(\| f \|_L \in [0, ∞]\), \(\| α f\|_L = |α| \cdot \|f\|_L\) for any \(α \in \reals\), and \(\| f_1 + f_2 \|_L \le \|f_1 \|_L + \|f_2 \|_L\).

Given a family of functions \(f_i\), \(i \in I\), on the same metric space such that \(\sup_{i \in I} f_i < ∞\), \[ \| \sup_{i \in I} f_i \|_{L} \le \sup_{i \in I} \| f_i \|_{L}. \]

Let \(f_n\), \(n \in \integers_{\ge 1}\), and \(f\) be functions from \((\ALPHABET X, d_X)\) to \((\ALPHABET Y, d_Y)\). If \(f_n\) converges pointwise to \(f\) for \(n \to ∞\), then \[ \| f \|_{L} \le \lim\inf_{n \to ∞} \| f_i \|_{L}. \]

## 1.4 Kantorovich distance

Let \(\mu\) and \(\nu\) be probability measures on \((\ALPHABET X, d_X)\). The *Kantorovich distance* between distributions \(\mu\) and \(\nu\) is defined as: \[ K(\mu,\nu) = \sup_{f : \| f\|_L \le 1 }
\left| \int_{\ALPHABET X} f\, d\mu - \int_{\ALPHABET X} f\, d\nu \right|. \]

The next two results follow immediately from the definition of Kantorovich distance.

**Lemma 1**-
For any Lipschitz function \(f \colon (\ALPHABET X, d_X) \to (\reals, \lvert \cdot \rvert)\), and \(μ,ν\) are probability measures on \((\ALPHABET X, d_X)\), \[ \left| \int_{\ALPHABET X} f\, dμ - \int_{\ALPHABET X} f\, dν \right| \le \| f \|_L \cdot K(μ,ν). \]

## 1.5 Some examples

Let \((\ALPHABET X, d_X)\) be a metric space and for any \(x,y \in \ALPHABET X\), let \(δ_x\) and \(δ_y\) denote the Dirac delta distributions centered at \(x\) and \(y\). Then, \[ K(δ_x, δ_y) = d_X(x,y). \]

Let \((\ALPHABET X, d_X)\) be a Euclidean space with Euclidean norm. Let \(μ \sim \mathcal{N}(m_1, \Sigma_1)\) and \(ν \sim \mathcal{N}(m_2, \Sigma_2)\) be two Gaussian distributions on \(\ALPHABET X\). Then, \[K(μ,ν) = \sqrt{ \| m_1 - m_2 \|_2^2 + \text{Tr}( \Sigma_1 + \Sigma_2 - 2(\Sigma_2^{1/2} \Sigma_1 \Sigma_2^{1/2})^{1/2} ) }. \] If the two covariances commute, i.e. \(\Sigma_1\Sigma_2 = \Sigma_2 \Sigma_1\), then, \[K(μ,ν) = \sqrt{ \| m_1 - m_2 \|_2^2 + \| \Sigma_1^{1/2} - \Sigma_2^{1/2} \|^2_F},\] where \(\| ⋅ \|_{F}\) denotes the Frobeinus norm of a matrix.

When \(\Sigma_1 = \Sigma_2\), we have \[K(μ,ν) = \| m_1 - m_2 \|_2. \]

If \(\ALPHABET X = \reals\) and \(d_X = | \cdot |\), then for any two distributions \(μ\) and \(ν\), \[ K(μ,ν) = \int_{-∞}^∞ \left| F_μ(x) - F_ν(x) \right| dx, \] where \(F_μ\) and \(F_ν\) denote the CDF of \(μ\) and \(ν\).

Furthermore, if \(μ\) is stochastically dominated by \(ν\), then \(F_μ(x) \ge F_ν(x)\). Thus, \[ K(μ, ν) = \bar μ - \bar ν \] where \(\bar μ\) and \(\bar ν\) are the means of \(μ\) and \(ν\).

# 2 Lipschitz MDPs

Now consider an MDP where the state and action spaces are Metric spaces. We denote the corresponding metric by \(d_X\) and \(d_U\) respectively. For ease of exposition, we define a metric \(d\) on \(\ALPHABET X × \ALPHABET U\) by \[ d( (x_1, u_1), (x_2, u_2) ) = d_X(x_1, x_2) + d_U(u_1, u_2). \]

We allow for randomized policies. Thus, given any state \(x \in \ALPHABET X\), \(g(\cdot | x)\) is a probability distribution on \(\ALPHABET U\). We say that a (possibly) randomized policy \(g\) has a Lipschitz constant of \(L_g\) if for any \(x_1, x_2 \in \ALPHABET X\), \[ K(g(\cdot| x_1), g(\cdot | x_2)) \le L_g d_X(x_1, x_2). \]

Note that if \(g\) is deterministic, then due to property of Kantorovich distance between delta distributions, the above relationship simplifies to \[ d_U(g(x_1), g(x_2)) \le L_g d_X(x_1, x_2). \]

**Definition 1**-
An MDP is \((L_c, L_p)\)-Lipschitz if for all \(x_1, x_2 \in \ALPHABET X\) and \(u_1, u_2 \in \ALPHABET U\),

- \(| c(x_1, u_1) - c(x_2, u_2) | \le L_c\bigl( d_X(x_1, x_2) + d_U(u_1, u_2) \bigr)\).
- \(K(p(\cdot | x_1, u_1), p(\cdot | x_2, u_2)) \le L_p\bigl( d_X(x_1, x_2) + d_U(u_1, u_2) \bigr)\).

## 2.1 Lipschitz continuity of Bellman updates

We now prove a series of results for the Lipschitz continuity of Bellman updates.

**Lemma 2**-
Let \(V \colon \ALPHABET X \to \reals\) be \(L_V\)-Lipschitz continuity. Define \[ Q(x,u) = c(x,u) + β \int V(y) p(y|x,u)dy. \] Then \(Q\) is \((L_c + β L_p L_V)\)-Lipschitz continuous.

#### Proof

Consider, \[\begin{align*} | Q(x_1, u_1) - Q(x_2, u_2) | &\stackrel{(a)}\le | c(x_1, u_1) - c(x_2, u_2) | \\ & \quad + \beta \left|\int V(y) p(y|x_1, u_1) dy - \int V(y) p(y|x_2, u_2) dy \right| \\ &\stackrel{(b)}\le L_c d( (x_1, u_1), (x_2, u_2) ) + \beta L_V L_p d( (x_1, u_1), (x_2, u_2) ), \end{align*}\] where \((a)\) follows from the triangle inequality and \((b)\) follows from Lemma 1. Thus, \(L_Q = L_c + β L_p L_V\). \(\Box\)

**Lemma 3**-
Let \(Q \colon \ALPHABET X × \ALPHABET U \to \reals\) be \(L_Q\)-Lipschitz continuous. Define \[V(x) = \min_{u \in \ALPHABET U} Q(x,u).\] Then \(V\) is \(L_Q\)-Lipschitz continuous.

#### Proof

Consider \(x_1, x_2 \in \ALPHABET X\) and let \(u_1\) and \(u_2\) denote the corresponding optimal action. Then, \[ \begin{align*} V(x_1) - V(x_2) &= Q(x_1, u_1) - Q(x_2, u_2) \\ &\stackrel{(a)}\le Q(x_1, u_2) - Q(x_2, u_2) \\ &\stackrel{(b)}\le L_Q( d_X(x_1, x_2) + d_U(u_2, u_2) )\\ &= L_Q d_X(x_1, x_2). \end{align*} \]

By symmetry, \[ V(x_2) - V(x_1) \le L_Q d_X(x_2, x_1). \] Thus, \[ | V(x_1) - V(x_2) | \le L_Q d_X(x_1, x_2). \] Thus, \(V\) is \(L_Q\)-Lipschitz continuous. \(\Box\)

**Lemma 4**-
Let \(Q \colon \ALPHABET X × \ALPHABET U \to \reals\) be \(L_Q\)-Lipschitz continuous and \(g\) be a (possibly randomized) \(L_g\)-Lipschitz policy. Define \[V_g(x) = \int Q(x, u) g(u | x) du.\] Then, \(V_g\) is \(L_Q( 1 + L_g)\)-Lipschitz continuous.

#### Proof

For any \(x_1, x_2 \in \ALPHABET X\), consider \[ \begin{align} | V_g(x_1) - V_g(x_2) | &= \left| \int Q(x_1, u) g(u | x_1) du - \int Q(x_2, u) g(u | x_2) du \right| \notag \\ &\stackrel{(a)}\le \left| \int Q(x_1, u) g(u | x_1) du - \int Q(x_2, u) g(u | x_1) du \right| \notag \\ & \quad + \left| \int Q(x_2, u) g(u | x_1) du - \int Q(x_2, u) g(u | x_2) du \right| \label{eq:split} \end{align} \] where \((a)\) follows from the triangle inequality. Now we consider both terms separately.

The first term of \eqref{eq:split} simplifies as follows: \[\begin{align} \left| \int Q(x_1, u) g(u | x_1) du - \int Q(x_2, u) g(u | x_1) du \right| &\stackrel{(b)}\le \int \left|Q(x_1, u) - Q(x_2, u)\right| g(u | x_1) du \notag \\ &\stackrel{(c)}\le \int L_Q d_X(x_1, x_2) g(u | x_1) du \notag \\ &= L_Q d_X(x_1, x_2), \label{eq:first} \end{align} \] where \((b)\) follows from the triangle inequality and \((c)\) follows from Lipschitz continuity of \(Q\).

The second term of \eqref{eq:split} simplifies as follows: \[ \begin{align} \left| \int Q(x, u) g(u | x_1) du - \int Q(x,u) g(u | x_2) du \right| &\stackrel{(d)}\le L_Q K (g(\cdot | x_1), g(\cdot | x_2)) \notag \\ &\stackrel{(e)}\le L_Q L_g d_X(x_1, x_2), \label{eq:second} \end{align} \] where the \((d)\) inequality follows from Lemma 1 and \((e)\) follows from the definition of Lipschitz continuous policy.

Substituting \eqref{eq:first} and \eqref{eq:second} in \eqref{eq:split}, we get \[ \begin{align*} | V_g(x_1) - V_g(x_2) | &\le L_Q d_X(x_1, x_2) + L_Q L_g d_X(x_1, x_2) \\ &= L_Q(1 + L_g) d_X(x_1, x_2). \end{align*} \] Thus, \(V\) is Lipschitz continuous with Lipschitz constant \(L_Q(1 + L_g)\). \(\Box\)

## 2.2 Lipschitz continuity of Picard iteration

**Lemma 5**-
Consider a discounted infinite horizon MDP which is \((L_c, L_p)\)-Lipschitz. Start with \(V^{(0)} = 0\) and recursively define

- \(\displaystyle Q^{(n+1)}(x,u) = c(x,u) + β \int V^{(n)}(y) p(y|x,u) dy.\)
- \(\displaystyle V^{(n+1)}(x) = \min_{u \in \ALPHABET U} Q^{(n+1)}(x,u).\)

Then, \(V^{(n)}\) is Lipschitz continuous and its Lipschitz constant \(L_{V^{(n)}}\) satisfies the following recursion: \[L_{V^{(n+1)}} = L_c + β L_p L_{V^{(n)}}.\]

#### Proof

We prove the result by induction. For \(n=1\), \(Q^{(1)}(x,u) = c(x,u)\), which is Lipschitz with Lipschitz constant \(L_{Q^{(1)}} = L_c\). Then, by Lemma 3, \(V^{(1)}\) is Lipschitz with Lipschitz constant \(L_{V^{(1)}} = L_{Q^{(1)}} = L_c\). This forms the basis of induction. Now assume that \(V^{(n)}\) is \(L_{V^{(n)}}\)-Lipschitz. Then, by Lemma 2, \(Q^{(n+1)}\) is \((L_c + βL_p L_{V^{(n)}})\)-Lipschitz. Therefore, by Lemma 3, \(V^{(n+1)}\) is Lipschitz with constant \[ L_{V^{(n+1)}} = L_c + β L_p L_{V^{(n)}}. \space\Box\]

**Lemma 6**-
Consider a discounted infinite horizon MDP which is \((L_c, L_p)\)-Lipschitz and let \(g\) be any randomized time-homogeneous policy which is \(L_g\)-Lipschitz. Start with \(V^{(0)} = 0\) and then recursively define

- \(V^{(n)}_g(x) = \int Q^{(n)}_g(x,u)g(u|x) du.\)
- \(\displaystyle Q^{(n+1)}_g(x,u) = c(x,u) + β \int V^{(n)}_g(y) p(y|x,u) dy.\)

Then, then \(Q^{(n)}_g\) is Lipschitz continuous and its Lipschitz constant \(L_{Q^{(n)}_g}\) satisfies the follwoing recursion: \[ L_{Q^{(n+1)}_g} + L_c + \beta(1 + L_g)L_p L_{Q^{(n)}_g}. \]

#### Proof

We prove the result by induction. For \(n=1\), \(Q^{(1)}_g(x,u) = c(x,u)\), which is Lipschitz with Lipschitz constant \(L_{Q^{(1)}_g} = L_c\). This forms the basis of induction. Now assume that \(Q^{(n)}_g\) is \(L_{Q^{(n)}_g}\)-Lipschitz. Then, by Lemma 4, \(V^{(n)}_g\) is Lipschitz with Lipschitz constant \(L_{V^{(n)}_g} = L_{Q^{(n)}_g}(1 + L_g)\) and by Lemma 2, \(Q^{(n+1)}_g\) is Lipschitz with Lipschitz constant \(L_{Q^{(n+1)}_g} = L_c + βL_p L_{V^{(n)}_g}.\) Combining these two we get \[ L_{Q^{(n+1)}_g} + L_c + \beta(1 + L_g)L_p L_{Q^{(n)}_g}. \]

**Theorem 1**-
Given any \((L_c, L_p)\)-Lipschitz MDP, if \(\beta L_p < 1\), then the infinite horizon \(\beta\)-discounted value function \(V\) is Lipschitz continuous with Lipschitz constant \[ L_{V} = \frac{L_c}{1 - β L_p} \] and the action-value function \(Q\) is Lipschitz with Lipschitz constant \[ L_Q = L_V = \frac{L_c}{1 - β L_p}. \]

#### Proof

Consider the sequence of \(L_n = L_{V^{(n)}}\) values. For simplicity write \(α = β L_p\). Then the sequence \(\{L_n\}_{n \ge 1}\) is given by: \(L_1 = L_c\) and for \(n \ge 1\), \[ L_{n+1} = L_c + α L_n. \] Hence, \[ L_n = L_c + α L_c + \dots + α^{n-1} L_c = \frac{1 - α^n}{1 - α} L_c. \] This sequence converges if \(|α| < 1\). Since \(α\) is non-negative, this is equivalent to \(α < 1\), which is true by hypothesis. Hence \(L_n\) is a convergent sequence. At convergence, the limit \(L_V\) must satisfy the fixed point of the recursion relationship introduced in Lemma 5, hence \[ L_V = L_c + β L_p L_V. \] Consequently, the limit is equal to \[ L_V = \frac{L_c}{1 - β L_p}. \] The Lipschitz constant of \(Q\) follows from Lemma 2. \(\Box\)

**Theorem 2**-
Given any \((L_c, L_p)\)-Lipschitz MDP and an \(L_g\)-Lipschitz (possibly randomized) time-homogeneous policy \(g\), if \(\beta (1 + L_g) L_p < 1\), then the infinite horizon \(\beta\)-discounted value-action function \(Q_g\) is Lipschitz continuous with Lipschitz constant \[ L_{Q_g} = \frac{L_c}{1 - β(1 + L_g) L_p} \] and the value function \(V_g\) is Lipschitz with Lipschitz constant \[ L_{V_g} = L_{Q_g}(1 + L_g) = \frac{L_c(1 + L_g)}{1 - β(1 + L_g) L_p}. \]

**Remark 1**-
The restrictive assumption in the result is that \(β(1 + L_g)L_p < 1\). For a specific model, even when this assumption does not hold, it may be possible to directly check if the \(Q\)-function is Lipschitz continuous. Such a direct check often gives a better Lipschitz constant.

#### Proof

Consider the sequence of \(L_n = L_{Q^{(n)}_g}\) values. For simplicity, write \(α = β(1 + L_g)L_p\). Then, the sequence \(\{L_n\}_{n \ge 1}\) is given by: \(L_1 = L_c\) and for \(n \ge 1\), \[L_{n+1} = L_c + α L_n. \] Hence, \[ L_n = L_c + α L_c + \dots + α^{n-1} L_c = \frac{1 - α^n}{1 - α} L_c. \] This sequence converges if \(|α| < 1\). Since \(α\) is non-negative, this is equivalent to \(α < 1\), which is true by hypothesis. Hence \(L_n\) is a convergent sequence. At convergence, the limit \(L_{Q_g}\) must satisfy the fixed point of the recursion relationship introduced in Lemma 6, hence \[ L_{Q_g} = L_c + β(1 + L_g)L_p L_{Q_g}. \] Consequently, the limit is equal to \[ L_{Q_g} = \frac{L_c}{1 - β(1 + L_g) L_p}. \]

The Lipschitz constant of \(V_g\) follows from Lemma 4.

# 3 Influence Radius

When the \(Q\)-function of an MDP is Lipschitz continuous, then the optimal action does not change too abruptly. More precisely, suppose an action \(u\) is optimal at state \(x\). Then, we can identify a hyperball \(B(x, ρ(x))\) of radius \(ρ(x)\) centered around \(x\) such that \(u\) is guaranteed to be the dominating action in \(ρ(x)\). This radius \(ρ(s)\) is called the *influence radius*.

Let \(g\) denote the optimal policy, i.e., \[ g(x) = \arg \min_{u \in \ALPHABET U} Q(x,u) \] and \(h\) denote the second best action, i.e., \[ h(x) = \arg \min_{u \in \ALPHABET U \setminus \{g(x)\}} Q(x,u). \] Define the *domination value* of state \(x\) to be \[ Δ(x) = Q(x, h(x)) - Q(x, g(x)). \]

**Theorem 3**-
For a Lipschitz continuous \(Q\)-function, the influence radius at state \(x\) is given by \[ ρ(x) = \frac{ Δ(x) }{ 2 L_Q }. \]

**Remark 2**-
Combining Theorem 2 and Theorem 3 implies that under the condition of Theorem 2, the influence radius at state \(x\) is at least \[ ρ(x) = Δ(x)(1 - β(1 + L_g)L_p)/2L_c. \]

#### Proof

The intuition behind the proof is the following. The value of the action \(g(x)\) can only decrease by \(L_Q ρ(x)\) in \(B(x, ρ(x))\), while the value of the second best action \(h(x)\) can only increase by \(L_Q ρ(x)\). So, the shortest distance \(ρ(x)\) from \(x\) needed for an action \(h(x)\) to “catch-up” with action \(g(x)\) should satisfy \(2 L_Q ρ(x) = Δ(x)\) or \(ρ(x) = Δ(x)/2L_Q\).

Formally, for any \(x' \in B(x,ρ(x))\), \(d_X(x,x') \le ρ(x)\). Thus, for any action \(u \in \ALPHABET U\), \[ | Q(x,u) - Q(x',u)| \le L_Q d_X(x,x') \le L_Q ρ(x). \] Equivalently, \[ Q(x,u) - L_Q ρ(x) \le Q(x',u) \le Q(x,u) + L_Q ρ(x) \] which states that as \(x'\) moves away from \(x\), the value of \(Q(x',u)\) remains within a symmetric bound that depends on the radius \(ρ(x)\). Since this bound holds for all \(u\), they also hold for \(u = g(x)\). Thus, \[ Q(x, g(x)) - L_Q ρ(x) \le Q(x', g(x)) \le Q(x, g(x)) + L_Q ρ(x). \]

Since \(g(x)\) is the optimal action, for any other action \(u \neq g(x)\), \[ Q(x,g(x)) \le Q(x,u). \] Thus, the action \(g(x)\) is optimal as long as the upper bound on \(Q(x', g(x))\) is lower than the lower bound on \(Q(x',u)\), i.e., \[ Q(x, g(x)) + L_Q ρ(x) \le Q(x,u) - L_Q ρ(x). \] Thus, the maximum value of \(ρ(x)\) is when the relationship holds with equality, i.e., \[ ρ(x) = \frac{Q(x,u) - Q(x,g(x))}{2 L_Q} \ge \frac{Δ(x)}{2 L_Q}. \space\Box\]

# Exercises

Let \((\ALPHABET X, d_X)\) be a metric space and \(x, y \in \ALPHABET X\). Consider two Bernoulli measures \[ μ = a δ_x + (1-a) δ_y, \qquad ν = b δ_x + (1-b) δ_y. \]

Show that \[ K(μ,ν) = |a - b| d(x,y). \]

# References

The material in this section is taken from Rachelson and Lagoudakis (2010) and Hinderer (2005).

*Mathematical Methods of Operations Research*

*62*, 1, 3–22. DOI: 10.1007/s00186-005-0438-1.

*Proceedings of 11th international symposium on artificial intelligence and mathematics*. Available at: https://oatao.univ-toulouse.fr/17977/.

This entry was last updated on 25 May 2022 and posted in MDP and tagged infinite horizon, discounted cost, lipschitz continuity.