# ECSE 506: Stochastic Control and Decision Theory

## Aditya Mahajan

Winter 2022

### About | Lectures | Notes | Coursework

TL;DR *Another benefit of infinite horizon models is that it is possible to prove structural properties of the optimal policy which might not hold for a finite horizon model. We illustrate this using a model for service migration in mobile edge computing.*

There are many mobile applications which consist of a front-end component running on a mobile device and a back-end component running on a cloud, where the cloud provides additional data processing and computing resources. Examples include real-time video processing, social networking, video games, etc. In such applications, communicating with the back-end server causes delay which can cause poor quality of service. This delay can be reduced by moving the computational server closer to the user giving rise to a architecture which is called *mobile edge computing (MEC)*. In an MEC, multiple edge servers are distributed across the networks which provide cloud services to the user. Ideally, a mobile user should be connected to the closest edge server. However, as the mobile user moves, the closest edge server may change. Moving the tasks from one server to another incur a setup cost. In this section, we study a stylized model of MEC from the point of view of the service provider who has to decide how to trade-off between the delay and migration cost in mobile edge computing.

We assume that the network is a two-dimensional topology \(\ALPHABET X\) with a distance metric \(\| \cdot \|\). The metric may either correspond to Eucledian distance or may depend on the network topology.

Let \(X_t \in \ALPHABET X\) denote the location of a mobile user at time \(t\). The user moves in \(\ALPHABET X\) according to a Markovian motion model. There are a finite set \(\ALPHABET S\) of mobile edge servers. Let \(S_t \in \ALPHABET S\) denote the server to which the user is connected at time \(t\). The state of the system is given by the tuple \((X_t, S_t)\).

At the beginning of each time slot, the MEC controller has the following control options:

Migrate the service from server \(S_t\) to some other server \(A_t \in \ALPHABET S\). This incurs a migration cost of \(b(\|S_t - A_t \|)\), where \(b(⋅)\) is a weakly increasing function with \(b(0) = 0\). Furthermore, we assume that \(b \circ \| ⋅ \|\) satisfies the triangle inequality, i.e., for any \(x, y, z \in \ALPHABET X\), \(b(\|x - y\|) + b(\|y - z \|) \ge b(\|x - z \|)\). Once the migration is complete, the state of the system is \((X_t, A_t)\)

Do not migrate the service, which can be indicated by \(A_t = S_t\), in which case the migration cost is \(b(0) = 0\).

In addition to migration, there is a data *transmission cost* incurred by the user for connecting to the currently active server. The transmission cost is related to the distance between the user and the server (after possible migration). The data transmission cost is given by \(c(\| X_t - A_t \|)\), where \(c(⋅)\) is a weakyl increasing function with \(c(0) = 0\).

We assume that the system runs for an infinite horizon. The objective is to choose a time-homogeneous control policy \(π \colon \ALPHABET X × \ALPHABET S \to \ALPHABET S\) to minimize the infinite horizon discounted cost given by \[ J(π) = \EXP^π \Bigl[ \sum_{t=1}^∞ γ^{t-1} \bigl( b(\| S_t - A_t\|) + c( \| X_t - A_t \|) \bigr) \Bigr], \] where \(γ \in (0,1)\) is the discount factor.

From the standard results in Markov decision theory, we know that the optimal policy is given by the unique fixed point of the following fixed-point equation:

\[V(x,s) = \min_{a \in \ALPHABET S} \bigl\{ b(\|s - a\|) + c(\|a - x\|) + γ \sum_{y \in \ALPHABET X} P_{xy} V(y, a) \bigr\}. \]

# 1 Structure of the optimal policy

We provide a basic characterization of the optimal policy.

**Prop. 1**-
Let \(π^*\) denote the optimal policy. Then for any \((x,s) \in \ALPHABET X × \ALPHABET S\), we have \[ \| x - π^*(x,s) \| \le \| x - s \|. \]

Prop. 1 states that the optimal policy always migrates the user to a server which is closer than the one already serving the user.

#### Proof

We prove the result using an interchange argument. Suppose we are given a service migration policy \(π\) such that the service is migrated to a location farther away from the user, i.e., \(\|x - a\| > \| x - s \|\). We will show that for an arbitrary sample path of the user locations \(\{ x_t\}_{t \ge 1}\), we can find a (possibly history dependent) policy \(μ\) that does not migrate to locations further away from the user in any time slot and performs no worse than policy \(π\).

Given a arbitrary sample path of user locations \(\{x_t\}_{t \ge 1}\) let \(t_0\) denote the *first* timeslot in which the service is migrated to somewhere farther away from the user when following policy \(π\). The state of \(t_0\) is \((x_{t_0}, s_{t_0})\) and the policy \(π\) moves the service to server \(a_{t_0} = π(x_{t_0}, s_{t_0})\), where \(\|x_{t_0} - a_{t_0}\| > \| x_{t_0} - s_{t_0} \|\). Let \(\{a^π_t\}_{t \ge t_0}\) denote the subsequent locations of the server (after migration) under policy \(π\).

Now, we define a policy \(μ\) such that the following conditions are satisfied for the given sample path \(\{x_t\}_{t \ge 1}\) of the user locations as follows. The policy \(μ\) chooses the same migration actions as policy \(π\) in timeslots \(t < t_0\).

Now, suppose \[\begin{equation} \label{eq:1} \| x_{t} - s^π_{t_0} \| \le \| x_{t} - a^π_{t} \|, \quad \forall t > t_0. \end{equation}\] Then, the policy \(μ\) does not choose any migrations from time \(t_0\) onwards. Hence, \(a^h_t = s^π_{t_0}\) for all \(t \ge t_0\). Note that from time \(t_0\) onwards, policy \(μ\) doesn’t incur any migration cost and always incurs a transmission cost which is less than \(π\). Hence, policy \(μ\) performs at least as well as policy \(π\).

Now suppose \eqref{eq:1} does not hold. Then define \(t_m\) to be the *first* timeslot after \(t_0\) such that \[
\| x_{t_m} - s^π_{t_0} \| > \| x_{t_m} - a^π_{t_m} \|.
\]

Now, we define policy \(μ\) as a policy which does not specify any migrations for time \(t \in [t_0, t_m - 1]\), migrates to location \(a^π_{t_m}\) at timeslot \(t_m\), and follows policy \(π\) from \(t_m\) onwards.

Note that policies \(π\) and \(μ\) agree on \([1, t_0 -1]\) and \([t_m + 1, ∞)\). In the interval \([t_0, t_m]\), \[ \| x_{t} - a^h_t \| \le \| x_t - a^π_t \|. \] Thus, the transmission cost of policy \(μ\) is no more than the transmission cost of policy \(π\).

Now, the migration cost incurred by policy \(π\) in the interval \([t_0, t_m]\) can be lower bounded by the migration cost incurred by policy \(μ\) as follows: \[\begin{align*} \hskip 2em & \hskip -2em γ^{t_0 - 1} b(\| s^π_{t_0} - a^π_{t_0} \|) + γ^{t_0 } b(\| a^π_{t_0} - a^π_{t_0 + 1} \|) + \cdots + γ^{t_m - 1} b(\| a^π_{t_m -1} - a^π_{t_m} \|) \\ &\ge γ^{t_m - 1}\bigl[ b(\| s^π_{t_0} - a^π_{t_0} \|) + b(\| a^π_{t_0} - a^π_{t_0 + 1} \|) + \cdots + b(\| a^π_{t_m -1} - a^π_{t_m} \|) \bigr] \\ &\ge γ^{t_m - 1} b(\| s^π_{t_0} - a^π_{t_m} \|), \end{align*}\] where the first inequality follows because \(γ < 1\) and the second follows from the triangle inequality.

Hence, policy \(μ\) performs at least as well as policy \(π\). The above procedure can be repeated so that all the mitigation actions to a location farther away from the user can be removed without increasing the overall cost.

Note that the policy \(μ\) constructed above is a history dependent policy. From the result for infinite horizon MDP, we know that a history dependent policy cannot outperform Markovian policies. Therefore, there exists a Markovian policy that does not migrate to a location farther away from the user, which does not perform worse than \(π\). \(\Box\)

# References

The model and results presented here are taken from Wang et al. (2019). See Urgaonkar et al. (2015) for a variation of this model.

*Performance Evaluation*

*91*, 205–228. DOI: 10.1016/j.peva.2015.06.013.

*IEEE/ACM Transactions on Networking*

*27*, 3, 1272–1288. DOI: 10.1109/tnet.2019.2916577.

This entry was last updated on 17 Feb 2022 and posted in MDP and tagged mobile edge computing, structural results, interchange argument, communication.