# ECSE 506: Stochastic Control and Decision Theory

Simplest model of a networked control systems

Networked control systems (NCS) refer to systems where there a communication network between different components of the control system. In this section, we consider the simplest setup of a NCS where there is a communication channel between the sensor and the controller. There are multiple ways to model the communication channel and we choose the simplest model—the communication channel is an i.i.d. packet drop channel. The details will be explained later.

Consider a linear system with state $$x_t \in \reals^n$$ and control action $$u_t \in \reals^n$$ where the initial state $$x_1$$ has zero mean and finite variance $$Σ^x_1$$. The system dynamics are given by $x_{t+1} = A_t x_t + B_t u_t + w_t,$ where $$A_t \in \reals^{n×n}$$ and $$B_t \in \reals^{n×m}$$ are known matrices and $$\{w_t\}_{t\ge 1}$$ is $$\reals^n$$-valued i.i.d. noise process with zero mean and finite variance $$\Sigma^w$$. We make the standard assumption that the primitve random variables $$\{x_1, w_1, \dots, w_T\}$$ are independent.

The key modeling assumption of a NCS is that the controller is not co-located with the plant. Rather, it is remotely located and connected to the plant over an i.i.d. packet-drop communication channel. In particular, there is a sensor, co-located with the plant, which observes the state of the plant and sends it to the remotely located controller over the packet-drop channel, which may be thought of as an ON-OFF channel. We use the variable $$γ_t \in \{0, 1\}$$ to denote the state of the channel, where $$γ_t = 0$$ denotes that the channel is OFF (which means that any transmitted packet is dropped) and $$γ_t = 1$$ denotes that the channel is ON (which means that any transmitted packet is believed). We assume that $$\{γ_1, \dots, γ_T \}$$ is an i.i.d. Bernoulli process with success probability $$\PR(γ_t = 1) = 1-p$$.

Let $$y_t \in \reals^n \cup \{\BLANK\}$$ denote the observation of the receiver, where $$\BLANK$$ denotes the event that the transmitted packet was dropped. Then, we have that $y_t = \begin{cases} x_t, & \text{if } γ_t = 1, \\ \BLANK, & \text{if } γ_t = 0. \end{cases}$

The controller generates a control action $$u_t$$ using all the information $$I_t = \{y_{1:t}, u_{1:t-1}\}$$ available to it at time $$t$$. Thus, $u_t = g_t(I_t),$ where $$g = (g_1, \dots, g_{T-1})$$ is called a control strategy. We consider the optimal regulation problem where the objective is to minimize the finite horizon cost given by $$$\label{eq:cost} J(g) = \EXP^{g} \Bigl[ \sum_{t=1}^{T-1} \bigl[ x_t^\TRANS Q_t x_t + u_t^\TRANS R_t u_t \bigr] + x_T^\TRANS Q_T x_T \Bigr],$$$ where $$\{Q_t\}_{t=1}^T$$ are positive semi-definite matrices and $$\{R_t\}_{t=1}^{T-1}$$ are positive definite matrices.

Given the system dynamics, the noise statistics, and the channel statistics, we are interested in choosing a control strategy $$g$$ to minimizes the total expected cost $$J(g)$$ given by \eqref{eq:cost}.

# 1 Completion of squares argument

Although it is possible to obtain a solution to the above model using a dynamic programming approach, we will follow the completion of squares based approach introduced in the notes on LQR.

Using Prop. 1 of LQR, the total cost of any strategy $$g$$ may be written as follows: \begin{align} J(g) = & \EXP\bigg[ \sum_{t=1}^{T-1} (u_t + L_t x_t)^\TRANS [R_t + B_t^\TRANS S_{t+1}B_t] (u_t + L_t x_t) \bigg] \nonumber \\ & \quad + \EXP\bigg[ x_1^\TRANS S_1 x_t + \sum_{t=1}^{T-1} w_t S_{t+1} w_t \bigg], \label{eq:astrom} \end{align} where the gain matrices $$\{L_t\}_{t\ge 1}$$ are given by: $L_t = [R_t + B_t^\TRANS S_{t+1} B_t]^{-1} \Lambda_t$ where $\Lambda_t = B_t^\TRANS S_{t+1} A_t$ and $$\{S_t\}_{t=1}^T$$ are determined by the solution of the backward Riccati equation: $$S_T = Q_T$$ and for $$t \in \{T-1, \dots, 1\}$$: $$$\label{eq:riccati} S_t = A_t^\TRANS S_{t+1} A_t + Q_t - \Lambda_t^\TRANS [ R_t + B_t^\TRANS S_{t+1} B_t ] ^{-1} \Lambda_t.$$$

Remark

The matrices $$\{L_t\}_{t=1}^T$$ and $$\{S_t\}_{t=1}^T$$ are the same as in the basic LQR model.

Now, as in the solution to the LQR problem, we note that the second term of \eqref{eq:astrom} is a function of the primitive random variables and does not depend on the choice of the control strategy $$g$$. Thus, in order to minimize the total expected cost, it suffices to minimize the first term of \eqref{eq:astrom}. However, unlike the case in LQR with perfect state observation, we cannot simply choose $$u_t = -L_t x_t$$ because the state $$x_t$$ is not known to the observer at all time instances. In the next section, we use state splitting and orthogonal projection to minimize the first term of \eqref{eq:astrom}.

# 2 State splitting and static reduction

We split the state $$x_t$$ into two components: $$x_t = x^g_t + x^s_t$$, where \begin{align*} x^g_1 &=0, & x^s_1 &= x_1, \\ x^g_{t+1} &= A_t x^g_t + B_t u_t, & x^s_{t+1} &= A_t x^s_t + w_t. \end{align*} We refer to $$x^g_t$$ and $$x^s_t$$ as the controlled and control-free components of the state, respectively. Now, define controlled and control-free components $$(y^g_t, y^s_t)$$ of the observation as follows: $(y^g_t, y^s_t) = \begin{cases} (x^g_t, x^s_t), & \text{if } γ_t = 1, \\ (\BLANK, \BLANK), & \text{if } γ_t = 0. \end{cases}$ If we define $$\BLANK + \BLANK = \BLANK$$, then we have that $$y_t = y^g_t + y^s_t$$. Define $$I^s_t = \{y^s_{1:t}\}$$.

Remark

Compared to the partially observed LQR, the components $$y^g_t$$ and $$y^s_t$$ of the state are defined differently. But the result of the proof argument, remains almost exactly the same.

Lemma 1

(Static reduction) For any control strategy $$g$$, the information sets $$I_t$$ and $$I^s_t$$ generate the same sigma algebra. Equivalently, $$I_t$$ and $$I^s_t$$ are functions of each other.

#### Proof

To be written $$\Box$$

An implication of Lemma 1 is that we can replace conditioning on $$I_t$$ by conditioning on $$I^s_t$$ in any conditional probability expression.

# 3 Orthogonal projection

To simplify the first term of \eqref{eq:astrom}, define $\hat x_t = \EXP[ x_t | I_t ]$ as the conditional estimate of the state given the observations at the controller and define $\tilde x_t = x_t - \hat x_t$ as the corresponding estimation error.

Then, these have the following properties.

Lemma 2

For any control strategy $$g$$, we have

1. $$\tilde x_t = x^s_t - \EXP[x^s_t | I^s_t]$$ is control-free and may be written just in terms of the primitive random variables.

Furthermore, for any matrix $$M$$ of appropriate dimensions:

1. $$\EXP[\hat x_t^\TRANS M \tilde x_t ] = 0$$.
2. $$\EXP[ u_t^\TRANS M \tilde x_t ] = 0$$.

#### Proof

To prove part 1, we note that \begin{align} \tilde x_t &= x_t - \EXP[x_t | I_t ] \notag \\ &\stackrel{(a)}= x^g_t + x^s_t - \EXP[ x^g_t + x^s_t | I_t] \notag \\ &\stackrel{(b)}= x^s_t - \EXP[x^s_t | I_t ] \notag \\ &\stackrel{(c)}= x^s_t - \EXP[x^s_t | I^s_t ] \label{eq:tilde-x} \end{align} where $$(a)$$ follows from state splitting, $$(b)$$ follows from the fact that $$x^g_t$$ is a function of $$u_{1:t-1}$$ which is a part of $$I_t$$, and $$(c)$$ follows from Lemma 1. Part 1 then follows by observing that \eqref{eq:tilde-x} depends only on primitive random variables.

To prove parts 2 and 3, let $$ξ_t$$ be a function of $$I_t$$ and $$M$$ be a matrix of appropriate dimensions. Then, \begin{align} \EXP[ξ_t^\TRANS M \tilde x_t] &\stackrel{(d)}= \EXP[ \EXP[ ξ_t^\TRANS M \tilde x_t | I_t ] ] \notag \\ &\stackrel{(e)}= \EXP[ ξ_t^\TRANS M \EXP[ \tilde x_t | I_t ] ] \notag \\ &\stackrel{(f)}= 0. \end{align} where $$(d)$$ follows from the smoothing property of conditional expectation, $$(e)$$ follows from the fact that $$ξ_t$$ is a function of $$I_t$$, and $$(f)$$ follows from the fact that $$\EXP[\tilde x_t | I_t] = 0$$ by construction.

Part 2 follows from observing that $$\hat x_t$$ is a function of $$I_t$$. Part 3 follows from observing that $$u_t$$ is a function of $$I_t$$$$\Box$$

Lemma 3

For any control strategy $$g$$, the first term of \eqref{eq:astrom} may be written as \begin{align} & \EXP^{g}\Bigl[ \sum_{t=1}^{T-1} (u_t + L_t \hat x_t)^\TRANS [R_t + B_t^\TRANS S_{t+1} B_t](u_t + L_t \hat x_t) ] \notag \\ &\quad + \EXP\Bigl[ \sum_{t=1}^{T-1} (L_t \tilde x_t)^\TRANS [R_t + B_t^\TRANS S_{t+1} B_t](L_t \tilde x_t) ] \label{eq:simple} \end{align}

#### Proof

Lemma 2 implies that for any matrix $$M$$ of appropriate dimensions, $\EXP[ (u_t + L_t x_t)^\TRANS M (u_t + L_t x_t) = \EXP[ (u_t + L_t \hat x_t)^\TRANS M (u_t + L_t \hat x_t) ] + \EXP[ (L_t \tilde x_t)^\TRANS M (L_t \tilde x_t) ],$ where the cross-terms are zero due to parts 2 and 3 of Lemma 2. The result of the Lemma follows by repeatedly using the above property at each time step.

# 4 Main Result

Theorem 1

The optimal control strategy for the networked control system discussed in this section is given by $$$\label{eq:optimal} u_t = - L_t \hat x_t.$$$ Furthermore, the state estimate $$\hat x_t$$ evolves as $\hat x_{t+1} = \begin{cases} x_{t+1}, & \text{if } γ_{t+1} = 1, \\ A_t \hat x_t + B_t u_t, & \text{ if } γ_{t+1} = 0. \end{cases}$.

#### Proof

The proof of the structure of the optimal controller follows by combining various properties described above. In particular, we have shown that for any any control strategy $$g$$, the total cost can be written as \eqref{eq:astrom}, where the second term depends just on the primitive random variables. Moreover, the first term of \eqref{eq:astrom} can be written as \eqref{eq:simple}, where (by Lemma 2, part 1) the second term is control free and depends just on the primitive random variables. Therefore, it suffices to minimize the first term of \eqref{eq:simple} to minimizing $$J(g)$$. By assumption, $$S_T = Q_T$$ is positive semi-definite. It can be recursively shown that $$S_t$$ is also positive definite. Therefore, the first term of \eqref{eq:simple} is greater than or equal to zero, with equality if and only if the strategy is given by \eqref{eq:optimal}. Since the policy \eqref{eq:optimal} achieves the minimal value of the cost, it is optimal.

We show the structure of the update of the state estimate separately for each value of $$γ_t$$. When $$γ_{t+1} = 1$$, we have that $$y_{t+1} = x_{t+1}$$. Thus, $$\hat x_{t+1} = x_{t+1}$$. When $$γ_{t+1} = 0$$, \begin{align*} \hat x_{t+1} &= \EXP[ x_{t+1} | I_{t+1} ] \\ &= \EXP[ A_t x_t + B_t u_t + w_t | y_{1:t}, u_{1:t}, y_{t+1} = \BLANK ] \\ &\stackrel{(a)}= A_t \EXP[ x_t | y_{1:t}, u_{1:t}, y_{t+1} = \BLANK ] + B_t u_t \end{align*} where $$(a)$$ follows from observing that $$w_t$$ is a zero mean random variable which is independent of $$(w_{1:t-1}, γ_{1:t})$$.

Now observe that the event $$\{y_{t+1} = \BLANK\}$$ is equivalent to the event $$\{γ_{t+1} = 0\}$$. So, we can remove it from the conditioning because $$x_t$$ is a function of $$(x_1, w_{1:t-1}, γ_{1:t-1})$$ which are independent of $$γ_{t+1}$$. Furthermore, we can remove $$u_t$$ from the conditioning because it is a function of the remaining variables. Thus, $\EXP[ x_t | y_{1:t}, u_{1:t}, y_{t+1} = \BLANK ] = \EXP[ x_t | y_{1:t}, u_{1:t-1} ] = \hat x_t.$ Substituting this in the above formula, we get the update equation for the state estimate. $$\Box$$

TODO: Discuss the implications of the result. Compare with certainty equivalence and dual effect. And separation of estimation and control.

This entry was last updated on 19 Oct 2020 and posted in NCS and tagged linear systems, riccati equation, lqr.