12  Service migration in mobile edge computing

Updated

February 17, 2026

Keywords

mobile edge computing, structural results, interchange argument, communication

Note Summary

We present an example that uses an interchange argument to establish a qualitative property of an optimal policy.

Many mobile applications consist of a front-end component running on a mobile device and a back-end component running on a cloud, where the cloud provides additional data processing and computing resources. Examples include real-time video processing, social networking, video games, etc. In such applications, communicating with the back-end server introduces delay, which can degrade the quality of service. This delay can be reduced by moving the computational server closer to the user, giving rise to an architecture called mobile edge computing (MEC). In MEC, multiple edge servers are distributed across the network and provide cloud services to the users. Ideally, a mobile user should be connected to the closest edge server. However, as the mobile user moves, the closest edge server may change, and moving the tasks from one server to another incurs a setup cost. In this section, we study a stylized model of MEC from the point of view of a service provider who has to decide how to trade off between delay and migration cost.

We model the network as a two-dimensional space \(\ALPHABET X\) with a distance metric \(\NORM{\cdot}\). The metric may either correspond to the Euclidean distance or may depend on the network topology.

Let \(X_t \in \ALPHABET X\) denote the location of a mobile user at time \(t\). The user moves in \(\ALPHABET X\) according to a Markovian motion model. There are a finite set \(\ALPHABET S\) of mobile edge servers. Let \(S_t \in \ALPHABET S\) denote the server to which the user is connected at time \(t\). The state of the system is given by the tuple \((X_t, S_t)\).

At the beginning of each time slot, the MEC controller chooses the server \(A_t \in \ALPHABET S\) that hosts the back-end service during that slot, so that \(S_{t+1} = A_t\). If \(A_t \neq S_t\), the service is migrated, which incurs a migration cost \(b(\NORM{S_t - A_t})\), where \(b(\cdot)\) is a weakly increasing function with \(b(0) = 0\).

In addition to the migration cost, the user incurs a data transmission cost for communicating with the currently active server. This cost depends on the distance between the user and the server (after possible migration) and is given by \(c(\NORM{X_t - A_t})\), where \(c(\cdot)\) is a weakly increasing function with \(c(0) = 0\).

We assume that the system runs for a finite horizon \(T\). The objective is to choose a migration policy \(π = (π_1, \dots, π_T)\), \(π_t \colon \ALPHABET X × \ALPHABET S \to \ALPHABET S\) to minimize the expected total cost given by \[ J(π) = \EXP^π \Bigl[ \sum_{t=1}^T \bigl( b(\NORM{S_t - A_t}) + c(\NORM{X_t - A_t}) \bigr) \Bigr]. \]

For this model, the state at time \(t\) is given by \((X_t, S_t)\). We know that the optimal policy is given by the solution of the following dynamic program: \(V_{T+1}(x,s) = 0\) and for \(t \in \{T, \dots, 1\}\), \[V_t(x,s) = \min_{a \in \ALPHABET S} \bigl\{ b(\NORM{s - a}) + c(\NORM{x - a}) + \sum_{y \in \ALPHABET X} P_{xy} V_{t+1}(y, a) \bigr\}. \]
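As a sanity check, the dynamic program above can be implemented by backward induction on a small instance. The instance below is illustrative and not part of the model: a one-dimensional location space, four servers, a lazy random walk for the user, and linear costs \(b(d) = c(d) = d\).

```python
import numpy as np

T = 10
X = np.arange(10)            # user locations on a line (illustrative 1-D instance)
S = np.array([0, 3, 6, 9])   # server locations

# Markovian motion: lazy random walk, reflecting at the boundary
P = np.zeros((len(X), len(X)))
for x in X:
    for y in (x - 1, x, x + 1):
        P[x, min(max(y, 0), len(X) - 1)] += 1 / 3

b = lambda d: d   # migration cost (weakly increasing, b(0) = 0)
c = lambda d: d   # transmission cost (weakly increasing, c(0) = 0)

# Backward induction: V[t] has shape (|X|, |S|); V[T+1] = 0
V = np.zeros((T + 2, len(X), len(S)))
pi = np.zeros((T + 1, len(X), len(S)), dtype=int)
for t in range(T, 0, -1):
    for xi, x in enumerate(X):
        for si, s in enumerate(S):
            q = [b(abs(s - a)) + c(abs(x - a)) + P[xi] @ V[t + 1, :, ai]
                 for ai, a in enumerate(S)]
            pi[t, xi, si] = int(np.argmin(q))   # an optimal action at (t, x, s)
            V[t, xi, si] = min(q)
```

Here `pi[t, xi, si]` stores the index (into `S`) of a minimizing action, so the computed policy and value function can be inspected directly.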

12.1 Structure of the optimal policy

We provide a basic characterization of the optimal policy.

Proposition 12.1 There exists an optimal policy \(π^* = (π^*_1, \dots, π^*_T)\) such that for every \((x,s) \in \ALPHABET X × \ALPHABET S\), we have \[ \NORM{x - π^*_t(x,s)} \le \NORM{x - s}. \]

Proposition 12.1 states that there is an optimal policy that never migrates the service to a server that is farther from the user than the server currently hosting it.

The proof is somewhat subtle and proceeds as follows. Let \(\ALPHABET M\) denote the original MDP and \(\ALPHABET M'\) denote a modified MDP where the action set at each time is given by \[ \ALPHABET A(x,s) = \{ a \in \ALPHABET S : \NORM{x - a} \le \NORM{x - s} \}. \] Let \(π^*\) and \(μ^*\) denote the optimal policies of MDPs \(\ALPHABET M\) and \(\ALPHABET M'\). Since any feasible policy of \(\ALPHABET M'\) is also feasible for \(\ALPHABET M\), we have \[ J(μ^*) \ge J(π^*). \]
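The relationship between \(\ALPHABET M\) and \(\ALPHABET M'\) can also be checked numerically: solving the dynamic program with the restricted action sets \(\ALPHABET A(x,s)\) yields the same value function as the unrestricted one. The instance below is illustrative (1-D locations, lazy random walk, linear costs), not part of the model.

```python
import numpy as np

T = 10
X = np.arange(10)            # illustrative 1-D location space
S = np.array([0, 3, 6, 9])   # server locations

# lazy random walk, reflecting at the boundary
P = np.zeros((len(X), len(X)))
for x in X:
    for y in (x - 1, x, x + 1):
        P[x, min(max(y, 0), len(X) - 1)] += 1 / 3

def solve(restricted):
    """Backward induction; if restricted, use the action sets A(x, s) of M'."""
    V = np.zeros((T + 2, len(X), len(S)))
    for t in range(T, 0, -1):
        for xi, x in enumerate(X):
            for si, s in enumerate(S):
                q = [abs(s - a) + abs(x - a) + P[xi] @ V[t + 1, :, ai]
                     for ai, a in enumerate(S)
                     if not restricted or abs(x - a) <= abs(x - s)]
                V[t, xi, si] = min(q)   # a = s is always feasible, so q is nonempty
    return V

V_M = solve(restricted=False)    # original MDP M
V_Mp = solve(restricted=True)    # modified MDP M'
same = np.allclose(V_M, V_Mp)
```

On this instance `same` is `True`: restricting the actions does not increase the optimal cost, which is exactly the content of the key step of the proof.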

The key step is to show that for any (Markov) policy \(π\) of MDP \(\ALPHABET M\), there exists a history-dependent policy \(μ\) that has the desired structure of Proposition 12.1, and is therefore a feasible policy for MDP \(\ALPHABET M'\), such that \[ J(π) \ge J(μ) \ge J(μ^*), \] where the second inequality holds because a history-dependent policy cannot outperform the optimal Markov policy of an MDP. Since \(π\) was arbitrary, we have \[ J(π^*) \ge J(μ^*). \] Combined with the above, we get \(J(μ^*) = J(π^*)\); that is, there exists a Markov policy with the structural property of Proposition 12.1 that is optimal.

To complete the proof, we only need to establish the key step, which we prove via an interchange argument. Consider an arbitrary policy \(π\) that does not have the desired property. Let \(\{x_t\}_{t \ge 1}\) be a sample path of user locations such that, for some \(t_0\), the policy \(π\) moves the service to a server \[ a_{t_0} = π_{t_0}(x_{t_0}, s_{t_0}) \] such that \(\NORM{x_{t_0} - a_{t_0}} > \NORM{x_{t_0} - s_{t_0}}\). Given the sample path \(\{x_t\}_{t \ge 1}\) of user locations, let \(\{s^π_t\}_{t \ge 1}\) and \(\{a^π_t\}_{t \ge 1}\) denote the realizations of the server locations and actions under policy \(π\).

We construct a (history-dependent) policy \(μ\) that does not violate the desired property at \(t_0\) as follows.

For time slots \(t < t_0\), the policy \(μ\) chooses the same migration actions as policy \(π\).

Let \(t_m\) be the first time after \(t_0\) such that \[\begin{equation} \label{eq:1} \NORM{x_{t_m} - s^π_{t_0}} > \NORM{x_{t_m} - a^π_{t_m}}. \end{equation}\]

If no such \(t_m\) exists, then for time slots \(t \ge t_0\), the policy \(μ\) does not migrate, i.e., it sets \(a^μ_t = s^π_{t_0}\) for all \(t \in [t_0, T]\). The policy \(μ\) has the same performance as \(π\) in \([1, t_0)\), does not incur any migration cost in \([t_0, T]\), and incurs at most the same data transmission cost as \(π\) in \([t_0, T]\) (because \(c\) is weakly increasing and \(\NORM{x_t - s^π_{t_0}} \le \NORM{x_t - a^π_t}\) for all \(t \in [t_0, T]\)).

If such a \(t_m\) exists, then for time slots in \([t_0, t_m)\), the policy \(μ\) does not migrate (i.e., sets \(a^μ_t = s^π_{t_0}\)); at \(t_m\), it migrates to \(a^π_{t_m}\); and from \(t_m\) onwards it follows policy \(π\). It can be verified that \(μ\) is a non-anticipative policy (i.e., the action at time \(t\) depends only on the history \(x_{1:t}\) and not on the future realizations \(x_{t+1:T}\)).

Note that the actions of policies \(π\) and \(μ\) agree on \([1, t_0)\) and \([t_m, T]\). In the interval \([t_0, t_m]\), \[ \NORM{x_{t} - a^μ_t} \le \NORM{x_t - a^π_t}. \] Thus, since \(c\) is weakly increasing, the transmission cost of policy \(μ\) is no more than the transmission cost of policy \(π\).

Moreover, the migration cost of policy \(μ\) in \([t_0, t_m]\) is no more than the migration cost of policy \(π\) because: \[\begin{align*} \hskip 2em & \hskip -2em b(\NORM{s^π_{t_0} - a^π_{t_0}}) + b(\NORM{a^π_{t_0} - a^π_{t_0 + 1}}) + \cdots + b(\NORM{a^π_{t_m -1} - a^π_{t_m}}) \\ &\ge b(\NORM{s^π_{t_0} - a^π_{t_m}}), \end{align*}\] which follows from the triangle inequality, the monotonicity of \(b\), and the implicit assumption that \(b\) is subadditive (e.g., any concave \(b\) with \(b(0) = 0\) is subadditive).

Hence, policy \(μ\) performs at least as well as policy \(π\). The above procedure can be repeated so that all migrations to a location farther away from the user are removed without increasing the overall cost. Thus, we have constructed a history-dependent policy that satisfies the required property.
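The interchange can be traced on a concrete sample path. The instance below is hypothetical (1-D locations, servers at 0, 3, 6, 9, costs \(b(d) = c(d) = d\)): policy \(π\) migrates from server 0 to server 9 in the first slot even though the user is at \(x_1 = 0\); the constructed policy \(μ\) delays this migration until \(t_m\) and turns out to be strictly cheaper.

```python
S = [0, 3, 6, 9]            # server locations (illustrative 1-D instance)
x = list(range(10))         # a realized sample path x_1, ..., x_T with T = 10
s0 = 0                      # initial server

def total_cost(a):
    """Realized cost sum_t b(|s_t - a_t|) + c(|x_t - a_t|) with b(d) = c(d) = d."""
    cost, s = 0, s0
    for t in range(len(x)):
        cost += abs(s - a[t]) + abs(x[t] - a[t])
        s = a[t]            # the chosen server hosts the service in the next slot
    return cost

# pi violates the property in the first slot: it migrates to server 9,
# which is farther from the user (at location 0) than the current server 0
a_pi = [9] * 10

# interchange: mu keeps the service at s^pi_{t_0} = 0 until the first
# time t_m at which |x_t - 0| > |x_t - a^pi_t|, then copies pi
t_m = next(t for t in range(10) if abs(x[t] - 0) > abs(x[t] - a_pi[t]))
a_mu = [0] * t_m + a_pi[t_m:]

print(total_cost(a_pi), total_cost(a_mu))   # → 54 29
```

As the proof guarantees, \(μ\) pays the single migration cost \(b(\NORM{s^π_{t_0} - a^π_{t_m}}) = 9\) instead of paying it early, and saves on transmission cost while the user is still near server 0.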

Notes

The model and results presented here are taken from Wang et al. (2019). See Urgaonkar et al. (2015) for a variation of this model.

Wang et al. (2019) provide a proof of the key step stated in the proof of Proposition 12.1 but do not provide a complete argument to show that there exists a Markov policy with the desired structure.