16 Lipschitz MDPs
16.1 Preliminaries
Lipschitz continuous functions
Given two metric spaces \((\ALPHABET X, d_X)\) and \((\ALPHABET Y, d_Y)\), the Lipschitz constant of function \(f \colon \ALPHABET X \to \ALPHABET Y\) is defined by \[ \| f\|_{L} = \sup_{x_1 \neq x_2} \left\{ \frac{ d_Y(f(x_1), f(x_2)) } { d_X(x_1, x_2) } : x_1, x_2 \in \ALPHABET X \right\} \in [0, ∞]. \] The function is called Lipschitz continuous if its Lipschitz constant is finite.
Intuitively, a Lipschitz continuous function is limited in how fast it can change. For example, the following image from Wikipedia shows that for a Lipschitz continuous function, there exists a double cone (white) whose apex can be moved along the graph so that the whole graph always stays outside the double cone.
Let \(\ALPHABET Z\) be an arbitrary set. A function \(f \colon \ALPHABET X × \ALPHABET Z \to \ALPHABET Y\) is said to be uniformly Lipschitz in \(z\) if \[ \sup_{z \in \ALPHABET Z} \| f(\cdot, z) \|_L = \sup_{z \in \ALPHABET Z} \sup_{x_1 \neq x_2} \dfrac{ d_Y(f(x_1,z), f(x_2, z)) }{ d_X(x_1, x_2) } < ∞. \]
Some examples
A differentiable function \(f \colon \reals \to \reals\) is Lipschitz continuous if and only if its first derivative is bounded; in that case, the Lipschitz constant equals the supremum of the absolute value of the derivative. Differentiability is not necessary for Lipschitz continuity, however, as the examples below show.
Here are some examples of Lipschitz continuous functions:
The function \(f(x) = \sqrt{x^2 + 1}\) defined over \(\reals\) is Lipschitz continuous with Lipschitz constant \(L = 1\) because it is everywhere differentiable and \(|f'(x)| = |x|/\sqrt{x^2+1} \le 1\).
The function \(f(x) = |x|\) defined over \(\reals\) is Lipschitz continuous with Lipschitz constant equal to \(1\). Note that this function is continuous but not differentiable.
The function \(f(x) = x + \sin x\) defined over \(\reals\) is Lipschitz continuous with a Lipschitz constant equal to \(1\).
The function \(f(x) = \sqrt{x}\) defined over \([0,1]\) is not Lipschitz continuous because the function becomes infinitely steep as \(x\) approaches \(0\).
The function \(f(x) = x^2\) defined over \(\reals\) is not Lipschitz continuous because it becomes arbitrarily steep as \(x\) approaches infinity.
The function \(f(x) = \sin(1/x)\) defined over \((0, 1]\) is bounded but not Lipschitz continuous because it becomes infinitely steep as \(x\) approaches \(0\).
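The examples above can be probed numerically. The following sketch (not from the source) estimates \(\| f \|_L\) by the largest difference quotient over a finite sample of points; this only lower-bounds the true Lipschitz constant, but it is enough to see which functions blow up.

```python
import numpy as np

def lipschitz_estimate(f, xs):
    """Largest |f(x1) - f(x2)| / |x1 - x2| over all pairs of sample points."""
    xs = np.asarray(xs, dtype=float)
    fx = f(xs)
    dx = np.abs(xs[:, None] - xs[None, :])   # pairwise |x1 - x2|
    df = np.abs(fx[:, None] - fx[None, :])   # pairwise |f(x1) - f(x2)|
    mask = dx > 0
    return (df[mask] / dx[mask]).max()

grid = np.linspace(-10, 10, 2001)
est_smooth = lipschitz_estimate(lambda x: np.sqrt(x**2 + 1), grid)  # close to 1
est_abs    = lipschitz_estimate(np.abs, grid)                       # exactly 1
est_square = lipschitz_estimate(lambda x: x**2, grid)               # grows with the domain
est_sqrt   = lipschitz_estimate(np.sqrt, np.linspace(0, 1, 1001))   # blows up near 0
print(est_smooth, est_abs, est_square, est_sqrt)
```

The estimates for \(x^2\) and \(\sqrt{x}\) keep growing as the domain widens or the sample gets denser near \(0\), which is the numerical signature of a non-Lipschitz function.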
Properties of Lipschitz functions
Proposition 16.1 Lipschitz continuous functions have the following properties:
If a function \(f \colon (\ALPHABET X, d_X) \to (\ALPHABET Y, d_Y)\) is Lipschitz continuous, then \(f\) is uniformly continuous and measurable.
\(\| f\|_L = 0\) if and only if \(f\) is a constant.
If \(f \colon (\ALPHABET X, d_X) \to (\ALPHABET Y, d_Y)\) and \(g \colon (\ALPHABET Y, d_Y) \to (\ALPHABET Z, d_Z)\) are Lipschitz continuous, then \[ \| g \circ f \|_L \le \| g \|_L \cdot \| f \|_L. \]
\(\| \cdot \|_{L}\) is a seminorm on the vector space of Lipschitz functions from a metric space \((\ALPHABET X, d_X)\) to a normed space \((\ALPHABET Y, \| \cdot \|_Y)\). In particular, \(\| \cdot \|_L\) has the following properties: \(\| f \|_L \ge 0\), \(\| α f\|_L = |α| \cdot \|f\|_L\) for any \(α \in \reals\), and \(\| f_1 + f_2 \|_L \le \|f_1 \|_L + \|f_2 \|_L\).
Given a family of real-valued functions \(f_i\), \(i \in I\), on the same metric space such that \(\sup_{i \in I} f_i(x) < ∞\) for every \(x\), \[ \| \sup_{i \in I} f_i \|_{L} \le \sup_{i \in I} \| f_i \|_{L}. \]
Let \(f_n\), \(n \in \integers_{\ge 1}\), and \(f\) be functions from \((\ALPHABET X, d_X)\) to \((\ALPHABET Y, d_Y)\). If \(f_n\) converges pointwise to \(f\) as \(n \to ∞\), then \[ \| f \|_{L} \le \liminf_{n \to ∞} \| f_n \|_{L}. \]
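The supremum property follows from a short calculation: for any \(x_1, x_2 \in \ALPHABET X\), \[ \sup_{i \in I} f_i(x_1) - \sup_{i \in I} f_i(x_2) \le \sup_{i \in I} \bigl( f_i(x_1) - f_i(x_2) \bigr) \le \Bigl( \sup_{i \in I} \| f_i \|_{L} \Bigr) d_X(x_1, x_2), \] and the same bound holds with the roles of \(x_1\) and \(x_2\) exchanged, which gives the claimed inequality.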
16.2 Kantorovich distance
Let \(\mu\) and \(\nu\) be probability measures on \((\ALPHABET X, d_X)\). The Kantorovich distance between distributions \(\mu\) and \(\nu\) is defined as: \[ K(\mu,\nu) = \sup_{f : \| f\|_L \le 1 } \left| \int_{\ALPHABET X} f\, d\mu - \int_{\ALPHABET X} f\, d\nu \right|. \]
The next result follows immediately from the definition of the Kantorovich distance.
Proposition 16.2 For any Lipschitz function \(f \colon (\ALPHABET X, d_X) \to (\reals, \lvert \cdot \rvert)\) and any probability measures \(μ, ν\) on \((\ALPHABET X, d_X)\), \[ \left| \int_{\ALPHABET X} f\, dμ - \int_{\ALPHABET X} f\, dν \right| \le \| f \|_L \cdot K(μ,ν). \]
Some examples
Let \((\ALPHABET X, d_X)\) be a metric space and for any \(x,y \in \ALPHABET X\), let \(δ_x\) and \(δ_y\) denote the Dirac delta distributions centered at \(x\) and \(y\). Then, \[ K(δ_x, δ_y) = d_X(x,y). \]
Let \(\ALPHABET X\) be a Euclidean space with the Euclidean metric. Let \(μ = \mathcal{N}(m_1, \Sigma_1)\) and \(ν = \mathcal{N}(m_2, \Sigma_2)\) be two Gaussian distributions on \(\ALPHABET X\). The Kantorovich (1-Wasserstein) distance between Gaussians has no closed form in general, but it is upper bounded by the 2-Wasserstein distance, which does: \[W_2(μ,ν) = \sqrt{ \| m_1 - m_2 \|_2^2 + \text{Tr}( \Sigma_1 + \Sigma_2 - 2(\Sigma_2^{1/2} \Sigma_1 \Sigma_2^{1/2})^{1/2} ) }. \] If the two covariances commute, i.e., \(\Sigma_1\Sigma_2 = \Sigma_2 \Sigma_1\), then \[W_2(μ,ν) = \sqrt{ \| m_1 - m_2 \|_2^2 + \| \Sigma_1^{1/2} - \Sigma_2^{1/2} \|^2_F},\] where \(\| ⋅ \|_{F}\) denotes the Frobenius norm of a matrix.
When \(\Sigma_1 = \Sigma_2\), the two distributions are translates of each other, and the Kantorovich distance is exactly \[K(μ,ν) = \| m_1 - m_2 \|_2. \]
If \(\ALPHABET X = \reals\) and \(d_X = | \cdot |\), then for any two distributions \(μ\) and \(ν\), \[ K(μ,ν) = \int_{-∞}^∞ \left| F_μ(x) - F_ν(x) \right| dx, \] where \(F_μ\) and \(F_ν\) denote the CDF of \(μ\) and \(ν\).
Furthermore, if \(μ\) is stochastically dominated by \(ν\), then \(F_μ(x) \ge F_ν(x)\) for all \(x\). Thus, \[ K(μ, ν) = \bar ν - \bar μ, \] where \(\bar μ\) and \(\bar ν\) are the means of \(μ\) and \(ν\).
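For discrete distributions on the real line, the CDF formula is easy to evaluate directly. The following sketch (names and numbers are illustrative, not from the source) computes \(K(μ,ν) = \int |F_μ - F_ν|\,dx\) and checks it on Dirac masses and on a stochastically dominated pair.

```python
import numpy as np

def cdf_on_grid(support, weights, grid):
    """Step-function CDF of a discrete distribution, evaluated on grid points."""
    support = np.asarray(support, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return np.array([weights[support <= g].sum() for g in grid])

def kantorovich_1d(xs, ps, ys, qs):
    """K(mu, nu) on the real line via the CDF formula, for discrete mu, nu."""
    grid = np.unique(np.concatenate([np.asarray(xs, float), np.asarray(ys, float)]))
    F_mu = cdf_on_grid(xs, ps, grid)
    F_nu = cdf_on_grid(ys, qs, grid)
    # both CDFs are constant on each interval [grid[i], grid[i+1])
    return np.sum(np.abs(F_mu[:-1] - F_nu[:-1]) * np.diff(grid))

# Dirac masses: K(delta_0, delta_3) = |0 - 3| = 3
k_dirac = kantorovich_1d([0.0], [1.0], [3.0], [1.0])

# nu stochastically dominates mu, so K equals the difference of means
mu_x, mu_p = [0.0, 1.0], [0.5, 0.5]   # mean 0.5
nu_x, nu_p = [2.0, 3.0], [0.5, 0.5]   # mean 2.5
k_dom = kantorovich_1d(mu_x, mu_p, nu_x, nu_p)
print(k_dirac, k_dom)   # 3.0, 2.0
```

The second check illustrates the stochastic-dominance formula above: \(K(μ,ν) = \bar ν - \bar μ = 2.5 - 0.5 = 2\).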
16.3 Lipschitz MDPs
Now consider an MDP where the state and action spaces are metric spaces. We denote the corresponding metrics by \(d_S\) and \(d_A\), respectively. For ease of exposition, we define a metric \(d\) on \(\ALPHABET S × \ALPHABET A\) by \[ d( (s_1, a_1), (s_2, a_2) ) = d_S(s_1, s_2) + d_A(a_1, a_2). \]
We allow for randomized policies. Thus, given any state \(s \in \ALPHABET S\), \(π(\cdot | s)\) is a probability distribution on \(\ALPHABET A\). We say that a (possibly) randomized policy \(π\) has a Lipschitz constant of \(L_π\) if for any \(s_1, s_2 \in \ALPHABET S\), \[ K(π(\cdot| s_1), π(\cdot | s_2)) \le L_π d_S(s_1, s_2). \]
Note that if \(π\) is deterministic, then by the formula for the Kantorovich distance between Dirac distributions, the above relationship simplifies to \[ d_A(π(s_1), π(s_2)) \le L_π d_S(s_1, s_2). \]
Definition 16.1 An MDP is \((L_c, L_p)\)-Lipschitz if for all \(s_1, s_2 \in \ALPHABET S\) and \(a_1, a_2 \in \ALPHABET A\),
- \(| c(s_1, a_1) - c(s_2, a_2) | \le L_c\bigl( d_S(s_1, s_2) + d_A(a_1, a_2) \bigr)\).
- \(K(p(\cdot | s_1, a_1), p(\cdot | s_2, a_2)) \le L_p\bigl( d_S(s_1, s_2) + d_A(a_1, a_2) \bigr)\).
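As a concrete illustration (not from the source), consider a scalar linear system with additive noise: \(\ALPHABET S = \ALPHABET A = \reals\), dynamics \(s_{t+1} = α s_t + β a_t + w_t\) with i.i.d. noise \(w_t\), and cost \(c(s,a) = |s| + |a|\). Then \[ |c(s_1, a_1) - c(s_2, a_2)| \le d_S(s_1, s_2) + d_A(a_1, a_2), \] so \(L_c = 1\). The transition kernel \(p(\cdot|s,a)\) is the law of \(w\) translated by \(α s + β a\), and the Kantorovich distance between a distribution and its translate equals the size of the translation, so \[ K(p(\cdot | s_1, a_1), p(\cdot | s_2, a_2)) = |α(s_1 - s_2) + β(a_1 - a_2)| \le \max(|α|, |β|) \bigl( d_S(s_1, s_2) + d_A(a_1, a_2) \bigr), \] i.e., the MDP is \((1, \max(|α|,|β|))\)-Lipschitz.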
Lipschitz continuity of Bellman updates
We now prove a series of results for the Lipschitz continuity of Bellman updates.
Lemma 16.1 Let \(V \colon \ALPHABET S \to \reals\) be \(L_V\)-Lipschitz continuous. Define \[ Q(s,a) = c(s,a) + γ \int V(y) p(y|s,a)dy. \] Then \(Q\) is \((L_c + γ L_p L_V)\)-Lipschitz continuous.
Lemma 16.2 Let \(Q \colon \ALPHABET S × \ALPHABET A \to \reals\) be \(L_Q\)-Lipschitz continuous. Define \[V(s) = \min_{a \in \ALPHABET A} Q(s,a).\] Then \(V\) is \(L_Q\)-Lipschitz continuous.
Lemma 16.3 Let \(Q \colon \ALPHABET S × \ALPHABET A \to \reals\) be \(L_Q\)-Lipschitz continuous and \(π\) be a (possibly randomized) \(L_π\)-Lipschitz policy. Define \[V_π(s) = \int Q(s, a) π(a | s) da.\] Then, \(V_π\) is \(L_Q( 1 + L_π)\)-Lipschitz continuous.
16.4 Lipschitz continuity of value iteration
Lemma 16.4 Consider a discounted infinite horizon MDP which is \((L_c, L_p)\)-Lipschitz. Start with \(V^{(0)} = 0\) and recursively define
- \(\displaystyle Q^{(n+1)}(s,a) = c(s,a) + γ \int V^{(n)}(y) p(y|s,a) dy.\)
- \(\displaystyle V^{(n+1)}(s) = \min_{a \in \ALPHABET A} Q^{(n+1)}(s,a).\)
Then, \(V^{(n)}\) is Lipschitz continuous and its Lipschitz constant \(L_{V^{(n)}}\) satisfies the following recursion: \[L_{V^{(n+1)}} = L_c + γ L_p L_{V^{(n)}}.\]
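A quick numeric sanity check (with illustrative constants, not from the source): when \(γ L_p < 1\), the recursion of Lemma 16.4 is a contraction, so the Lipschitz constants converge to the fixed point \(L_c / (1 - γ L_p)\).

```python
# Iterate L_{V^(n+1)} = L_c + gamma * L_p * L_{V^(n)} from Lemma 16.4
# with illustrative constants satisfying gamma * L_p < 1.
L_c, L_p, gamma = 2.0, 0.5, 0.9      # gamma * L_p = 0.45 < 1

L = 0.0                              # V^(0) = 0 has Lipschitz constant 0
for _ in range(200):
    L = L_c + gamma * L_p * L

fixed_point = L_c / (1 - gamma * L_p)
print(L, fixed_point)                # both approximately 3.6363...
```

Each iteration shrinks the distance to the fixed point by a factor \(γ L_p\), which is exactly the geometric-series argument behind Theorem 16.1 below.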
Lemma 16.5 Consider a discounted infinite horizon MDP which is \((L_c, L_p)\)-Lipschitz and let \(π\) be any randomized time-homogeneous policy which is \(L_π\)-Lipschitz. Start with \(V^{(0)}_π = 0\) and recursively define
- \(\displaystyle Q^{(n+1)}_π(s,a) = c(s,a) + γ \int V^{(n)}_π(y) p(y|s,a) dy.\)
- \(\displaystyle V^{(n+1)}_π(s) = \int Q^{(n+1)}_π(s,a) π(a|s) da.\)
Then \(Q^{(n)}_π\) is Lipschitz continuous and its Lipschitz constant \(L_{Q^{(n)}_π}\) satisfies the following recursion: \[ L_{Q^{(n+1)}_π} = L_c + γ(1 + L_π) L_p L_{Q^{(n)}_π}. \]
Theorem 16.1 Given any \((L_c, L_p)\)-Lipschitz MDP, if \(γ L_p < 1\), then the infinite horizon \(γ\)-discounted value function \(V\) is Lipschitz continuous with Lipschitz constant \[ L_{V} = \frac{L_c}{1 - γ L_p} \] and the action-value function \(Q\) is Lipschitz continuous with Lipschitz constant \[ L_Q = L_V = \frac{L_c}{1 - γ L_p}. \]
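The bound of Theorem 16.1 can be checked on a small example. The following sketch (all modelling choices are illustrative, not from the source) runs value iteration on a deterministic MDP whose states form a grid on \([0,1]\); actions are exact multiples of the grid step, so the discretization is exact, and the empirical Lipschitz constant of \(V\) should not exceed \(L_c/(1 - γ L_p)\).

```python
import numpy as np

n = 101
S = np.linspace(0.0, 1.0, n)                       # state grid, step 0.01
A = np.array([-0.1, 0.0, 0.1])                     # actions (multiples of the step)
gamma = 0.5

def cost(s, a):
    return abs(s - 0.5) + abs(a)                   # L_c = 1 w.r.t. d_S + d_A

# deterministic transition s' = clip(s + a, 0, 1) is 1-Lipschitz, so L_p = 1
next_idx = np.array([[int(round(np.clip(s + a, 0.0, 1.0) * (n - 1)))
                      for a in A] for s in S])

V = np.zeros(n)
for _ in range(100):                               # value iteration
    Q = np.array([[cost(S[i], A[j]) + gamma * V[next_idx[i, j]]
                   for j in range(len(A))] for i in range(n)])
    V = Q.min(axis=1)

L_V_bound = 1.0 / (1 - gamma * 1.0)                # L_c / (1 - gamma L_p) = 2
emp = max(abs(V[i] - V[j]) / abs(S[i] - S[j])
          for i in range(n) for j in range(i + 1, n))
print(emp, L_V_bound)                              # empirical constant <= 2
```

Here \(γ L_p = 0.5 < 1\), so the theorem guarantees \(L_V \le 2\), and the empirical difference quotients of the computed \(V\) respect that bound.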
Theorem 16.2 Given any \((L_c, L_p)\)-Lipschitz MDP and an \(L_π\)-Lipschitz (possibly randomized) time-homogeneous policy \(π\), if \(γ (1 + L_π) L_p < 1\), then the infinite horizon \(γ\)-discounted action-value function \(Q_π\) is Lipschitz continuous with Lipschitz constant \[ L_{Q_π} = \frac{L_c}{1 - γ(1 + L_π) L_p} \] and the value function \(V_π\) is Lipschitz continuous with Lipschitz constant \[ L_{V_π} = L_{Q_π}(1 + L_π) = \frac{L_c(1 + L_π)}{1 - γ(1 + L_π) L_p}. \]
16.5 Influence Radius
When the \(Q\)-function of an MDP is Lipschitz continuous, the optimal action does not change too abruptly across states. More precisely, suppose an action \(a\) is optimal at state \(s\). Then, we can identify a hyperball \(B(s, ρ(s))\) of radius \(ρ(s)\) centered at \(s\) such that \(a\) is guaranteed to remain the dominating action everywhere in \(B(s, ρ(s))\). This radius \(ρ(s)\) is called the influence radius.
Let \(π\) denote the optimal policy, i.e., \[ π(s) = \arg \min_{a \in \ALPHABET A} Q(s,a) \] and \(h\) denote the second best action, i.e., \[ h(s) = \arg \min_{a \in \ALPHABET A \setminus \{π(s)\}} Q(s,a). \] Define the domination value of state \(s\) to be \[ Δ(s) = Q(s, h(s)) - Q(s, π(s)). \]
Theorem 16.3 For a Lipschitz continuous \(Q\)-function, the influence radius at state \(s\) is given by \[ ρ(s) = \frac{ Δ(s) }{ 2 L_Q }. \]
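The computation at a single state is straightforward. The following sketch (the Q-values and \(L_Q\) below are illustrative, not from the source) finds the best action, the domination value \(Δ(s)\), and the influence radius \(ρ(s)\) for a finite action set.

```python
import numpy as np

def influence_radius(q_values, L_Q):
    """q_values[j] = Q(s, a_j) at a fixed state s (costs: smaller is better).
    Returns the best action index, the domination value Delta(s), and the
    radius rho(s) = Delta(s) / (2 L_Q) from Theorem 16.3."""
    order = np.argsort(q_values)
    best, runner_up = order[0], order[1]
    delta = q_values[runner_up] - q_values[best]
    return int(best), float(delta), float(delta / (2 * L_Q))

best, delta, rho = influence_radius(np.array([1.0, 1.8, 2.5]), L_Q=2.0)
print(best, delta, rho)   # 0 0.8 0.2
```

Within distance \(0.2\) of this state, no other action's Q-value can drop below the current best, since each Q-value can change by at most \(L_Q ρ(s) = Δ(s)/2\).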
Exercises
Exercise 16.1 Let \((\ALPHABET S, d_S)\) be a metric space and \(s, s' \in \ALPHABET S\). Consider two Bernoulli measures \[ μ = a δ_s + (1-a) δ_{s'}, \qquad ν = b δ_s + (1-b) δ_{s'}. \]
Show that \[ K(μ,ν) = |a - b|\, d_S(s,s'). \]
Notes
The material in this section is taken from Rachelson and Lagoudakis (2010) and Hinderer (2005).