ECSE 506: Stochastic Control and Decision Theory
Aditya Mahajan
Winter 2022
About  Lectures  Notes  Coursework
In this section, we extend the model of a simple NCS to one where there a cost associated with sending data from the sensor to the controller. This cost may be due to considerations of transmit power at the sensor or it may be a viewed as a Lagrange multiplier if multiple sensors are sharing the same communication channel with limited bandwidth.
As before, we consider a linear system with state \(x_t \in \reals^n\) and control action \(u_t \in \reals^n\) where the initial state \(x_1\) has zero mean and finite variance \(Σ^x_1\). The system dynamics are given by \[ x_{t+1} = A_t x_t + B_t u_t + w_t, \] where \(A_t \in \reals^{n×n}\) and \(B_t \in \reals^{n×m}\) are known matrices and \(\{w_t\}_{t\ge 1}\) is \(\reals^n\)valued i.i.d. noise process with zero mean and finite variance \(\Sigma^w\). We make the standard assumption that the primitive random variables \(\{x_1, w_1, \dots, w_T\}\) are independent.
There is a sensor which is colocated with the plant. The key modeling assumption of this model is that the sensor decides whether or not to transmit to the controller. Let \(δ_t \in \{0, 1\}\) denote the decision of the sensor, where \(δ_t = 0\) denotes that the sensor decides not to transmit and \(δ_t = 1\) denotes that the sensor decides to transmit.
As before, we assume that there is a i.i.d. packetdrop communication channel between the sensor and the controller. The packetdrop channel may be thought of as an ONOFF channel. We use the variable \(γ_t \in \{0, 1\}\) to denote the state of the channel, where \(γ_t = 0\) denotes that the channel is OFF (which means that any transmitted packet is dropped) and \(γ_t = 1\) denotes that the channel is ON (which means that any transmitted packet is delieved). We assume that \(\{γ_1, \dots, γ_T \}\) is an i.i.d. Bernoulli process with success probability \(\PR(γ_t = 1) = 1p\). We assume that the transmission over the packetdrop channel takes place using a TCPlike protocol, so the sensor gets an ACK (acknowledgment) or NACK (negative acknowledgment) to indicate whether the controller received the packet.
Let \(y_t \in \reals^n \cup \{\BLANK\}\) denote the observation of the receiver, where \(\BLANK\) denotes the event that the transmitted packet was dropped. Then, we have that \[ y_t = \begin{cases} x_t, & \text{if } δ_t γ_t = 1, \\ \BLANK, & \text{if } δ_t γ_t = 0. \end{cases}\] For ease of notation, we use \(z_t = δ_t γ_t \in \{0, 1\}\) to denote event whether the controller received a packet or not.
Let \(I^δ_t\) and \(I^u_t\) denote the information available to the sensor and the controller respectively. We assume that \[\begin{align*} I^δ_t &= \{ x_{1:t}, δ_{1:t1}, z_{1:t1} \}, \\ I^u_t &= \{ y_{1_t}, u_{1:t1} \}. \end{align*}\] Thus, the sensor knows the history \(x_{1:t}\) of its past observations, the history \(δ_{1:t1}\) of its past decisions and the history \(z_{1:t1}\) of the ACK/NACK received from the controller. The controller knows the history \(y_{1:t}\) of the past channel outputs and the history \(u_{1:t1}\) of its past control actions.
The sensor generates the transmit decision \(δ_t\) using the information \(I^δ_t\) and the controller generates a control action \(u_t\) using the information \(I^u_t\). Thus, \[ δ_t = f_t(I^δ_t) \quad\text{and}\quad u_t = g_t(I^u_t), \] where \(f = (f_1, \dots, f_T)\) is called the transmission strategy and \(g = (g_1, \dots, g_T)\) is called the control strategy.
It is helpful to work with an expanded information structure at the sensor. In particular, note that the sensor can infer \(y_{1:t1}\) from \(x_{1:t1}\) and \(z_{1:t1}\) and can recursively infer \(u_{1:t1}\) from \(y_{1:t1}\). So, in the sequel, we assume that the information available at the sensor is given by \[ I^δ_t = \{ x_{1:t}, δ_{1:t1}, z_{1:t1}, y_{1:t1}, u_{1:t1}\}. \]
There are two costs at each time. A communication cost \(λ δ_t\), where \(λ\) denotes the cost of transmitting a packet and a regulation cost \(x_t^\TRANS Q_t x_t + u_t^\TRANS R_t u_t\), where \(Q_t\) is a positive semidefinite matrix and \(R_t\) is a postitive definite matrix. Thus, the performance of any strategy \((f,g)\) is given by \[\begin{equation} \label{eq:cost} J(f, g) = \EXP^{f,g} \Bigl[ \sum_{t=1}^{T1} \bigl[ λδ_t + x_t^\TRANS Q_t x_t + u_t^\TRANS R_t u_t \bigr] + x_T^\TRANS Q_T x_T \Bigr]. \end{equation} \]
Given the system dynamics, the noise statistics, and the channel statistics, we are interested in choosing a decision strategy \((f,g)\) to minimize the total expected cost \(J(f,g)\) given by \eqref{eq:cost}.
1 Completion of squares argument
Again, we will follow the completion of squares based approach introduced in the notes of LQR.
Using Prop. 1 of LQR, the total cost of any strategy \(g\) may be written as follows: \[ \begin{align} J(g) = & \EXP\bigg[ \sum_{t=1}^{T1} (u_t + L_t x_t)^\TRANS [R_t + B_t^\TRANS S_{t+1}B_t] (u_t + L_t x_t) \bigg] \nonumber \\ & \quad + \EXP\bigg[ x_1^\TRANS S_1 x_t + \sum_{t=1}^{T1} w_t S_{t+1} w_t \bigg], \label{eq:astrom} \end{align}\] where the gain matrices \(\{L_t\}_{t\ge 1}\) are given by: \[ L_t = [R_t + B_t^\TRANS S_{t+1} B_t]^{1} \Lambda_t \] where \[ \Lambda_t = B_t^\TRANS S_{t+1} A_t \] and \(\{S_t\}_{t=1}^T\) are determined by the solution of the backward Riccati equation: \(S_T = Q_T\) and for \(t \in \{T1, \dots, 1\}\): \[\begin{equation}\label{eq:riccati} S_t = A_t^\TRANS S_{t+1} A_t + Q_t  \Lambda_t^\TRANS [ R_t + B_t^\TRANS S_{t+1} B_t ] ^{1} \Lambda_t. \end{equation}\]
 Remark

The matrices \(\{L_t\}_{t=1}^T\) and \(\{S_t\}_{t=1}^T\) are the same as in the basic LQR model.
Now, as in the solution to the LQR problem, we note that the second term of \eqref{eq:astrom} is a function of the primitive random variables and does not depend on the choice of the control strategy \(g\). Thus, in order to minimize the total expected cost, it sufficies to minimize the first term of \eqref{eq:astrom}. However, unlike the case in LQR with perfect state observation, we cannot simply choose \(u_t = L_t x_t\) because the state \(x_t\) is not known to the observer at all time instances. In the next section, we use state splitting and orthogonal projection to minimize the first term of \eqref{eq:astrom}.
2 State splitting and static reduction
We split the state \(x_t\) into two components: \(x_t = x^g_t + x^s_t\), where \[\begin{align*} x^g_1 &=0, & x^s_1 &= x_1, \\ x^g_{t+1} &= A_t x^g_t + B_t u_t, & x^s_{t+1} &= A_t x^s_t + w_t. \end{align*}\] We refer to \(x^g_t\) and \(x^s_t\) as the controlled and controlfree components of the state, respectively. Now, define controlled and controlfree components \((y^g_t, y^s_t)\) of the observation as follows: \[ (y^g_t, y^s_t) = \begin{cases} (x^g_t, x^s_t), & \text{if } z_t = 1, \\ (\BLANK, \BLANK), & \text{if } z_t = 0. \end{cases}\] If we define \(\BLANK + \BLANK = \BLANK\), then we have that \(y_t = y^g_t + y^s_t\). Now define the static reduction of the information structure: \[ I^{δ,s}_t = \{ x^s_{1:t} \} \quad\text{and}\quad I^{u,s}_t = \{ y^s_{1:t} \}. \]
 Lemma 1

(Static reduction) For any strategy \((f,g)\), \[ I^δ_t ≡ I^{δ,s}_t \quad\text{and}\quad I^u_t ≡ I^{u,s}_t, \] that is, these information sets generate the same sigma algebras. Equivalently, \(I^δ_t\) and \(I^{δ,s}_t\) are functions of each other and so are \(I^u_t\) and \(I^{u,s}_t\).
Proof
To be written \(\Box\)
An implication of Lemma 1 is that we can replace conditioning on \(I^u_t\) by conditioning on \(I^{u,s}_t\) in any conditional probability expression.
Another implication is the following.
 Corollary 1

For any strategy \((f,g)\), there exists a transmission strategy \(\tilde f = (\tilde f_1, \dots, \tilde f_{T1})\), where \(\tilde f_t \colon I^{δ,s} \mapsto δ_t\), such that \(J(f, g) = J(\tilde f, g)\).
Now, in the rest of this section, we assume that the strategy at the sensor is of the form \(\tilde f\) described in Corollary 1. Therefore, the transmission decision \(δ_t = \tilde f_t(I^{δ,s}_t\) depends only on the primitive decisions. Thus, \(\{z_1, \dots, z_T\}\), where \(z_t = δ_t γ_t\), depends only on the primitive random variables. Note that \[\begin{align*} \PR(z_{t+1} = 1  z_{1:t}) &= \PR(δ_{t+1} = 1, γ_{t+1} = 1  z_{1:t}) \\ &= \PR(δ_{t+1} = 1  z_{1:t}) \PR(γ_{t+1} = 1) \end{align*}\]
3 Orthogonal projection
To simplify the first term of \eqref{eq:astrom}, define \[ \hat x_t = \EXP[ x_t  I^u_t ]\] as the conditional estimate of the state given the observations at the controller and define \[ \tilde x_t = x_t  \hat x_t\] as the corresponding estimation error.
Then, these have the following properties.
 Lemma 2

For any control strategy \(g\), we have
 $x_t = x^s_t  $ is controlfree and can be written just in terms of the primitive random variables.
Furthermore, for any matrix \(M\) of appropriate dimensions:
 \(\EXP[\hat x_t^\TRANS M \tilde x_t ] = 0\).
 \(\EXP[ u_t^\TRANS M \tilde x_t ] = 0\).
Proof
To prove part 1, we note that \[\begin{align} \tilde x_t &= x_t  \EXP[x_t  I^u_t ] \notag \\ &\stackrel{(a)}= x^g_t + x^s_t  \EXP[ x^g_t + x^s_t  I^u_t] \notag \\ &\stackrel{(b)}= x^s_t  \EXP[x^s_t  I^u_t ] \notag \\ &\stackrel{(c)}= x^s_t  \EXP[x^s_t  I^{u,s}_t ] \label{eq:tildex} \end{align}\] where \((a)\) follows from state splitting, \((b)\) follows from the fact that \(x^g_t\) is a function of \(u_{1:t1}\) which is a part of \(I^u_t\), and \((c)\) follows from Lemma 1. Part 1 then follows by observing that \eqref{eq:tildex} depends only on primitive random variables.
To prove parts 2 and 3, let \(z_t\) be a function of \(I^u_t\) and \(M\) be a matrix of appropriate dimensions. Then, \[\begin{align} \EXP[z_t^\TRANS M \tilde x_t] &\stackrel{(d)}= \EXP[ \EXP[ z_t^\TRANS M \tilde x_t  I^u_t ] ] \notag \\ &\stackrel{(e)}= \EXP[ z_t^\TRANS M \EXP[ \tilde x_t  I^u_t ] ] \notag \\ &\stackrel{(f)}= 0. \end{align}\] where \((d)\) follows from the smoothing property of conditional expectation, \((e)\) follows from the fact that \(z_t\) is a function of \(I^u_t\), and \((f)\) follows from the fact that \(\EXP[\tilde x_t  I^u_t] = 0\) by construction.
Part 2 follows from observing that \(\hat x_t\) is a function of \(I^u_t\). Part 3 follows from observing that \(u_t\) is a function of \(I^u_t\). \(\Box\)
 Lemma 3

For any control strategy \(g\), the first term of \eqref{eq:astrom} may be written as \[\begin{align} & \EXP^{g}\Bigl[ \sum_{t=1}^{T1} (u_t + L_t \hat x_t)^\TRANS [R_t + B_t^\TRANS S_{t+1} B_t](u_t + L_t \hat x_t) ] \notag \\ &\quad + \EXP\Bigl[ \sum_{t=1}^{T1} (L_t \tilde x_t)^\TRANS [R_t + B_t^\TRANS S_{t+1} B_t](L_t \tilde x_t) ] \label{eq:simple} \end{align}\]
Proof
Lemma 2 implies that for any matrix \(M\) of appropriate dimensions, \[ \EXP[ (u_t + L_t x_t)^\TRANS M (u_t + L_t x_t) = \EXP[ (u_t + L_t \hat x_t)^\TRANS M (u_t + L_t \hat x_t) ] + \EXP[ (L_t \tilde x_t)^\TRANS M (L_t \tilde x_t) ], \] where the crossterms are zero due to parts 2 and 3 of Lemma 2. The result of the Lemma follows by repeatedly using the above property at each time step.
4 Main Result
 Theorem 1

For any transmission strategy \(\tilde f\) of the form given in Corollary 1, the best response control strategy for the networked control system discussed in this section is given by \[\begin{equation} \label{eq:optimal} u_t =  L_t \hat x_t. \end{equation}\] Furthermore, the state estimate \(\hat x_t\) is given by \[\hat x_{t} = x^g_t + \EXP[ x^s_t  I^{u,s}_t]\]
Proof
The proof of the structure of the optimal controller follows by combining various properties described above. In particular, we have shown that for any any control strategy \(g\), the total cost can be written as \eqref{eq:astrom}, where the second term depends just on the primitive random variables. Moreover, the first term of \eqref{eq:astrom} can be written as \eqref{eq:simple}, where (by Lemma 2, part 1) the second term is control free and depends just on the primitive random variables. Therefore, it suffices to minimize the first term of \eqref{eq:simple} to minimizing \(J(g)\). By assumption, \(S_T = Q_T\) is positive semidefinite. It can be recursively shown that \(S_t\) is also positive definite. Therefore, the first term of \eqref{eq:simple} is greater than or equal to zero, with equality if and only if the strategy is given by \eqref{eq:optimal}. Since the policy \eqref{eq:optimal} achieves the minimal value of the cost, it is optimal.
We now show the structure of the state estimate. Note that \[\begin{align*} \hat x_{t} &= \EXP[ x_{t}  I^{u}_{t} ] = \EXP[ x^g_t + x^s_t  I^u_t ] \\ &\stackrel{(a)}= x^g_t + \EXP[ x^s_t  I^u_t ] \\ &\stackrel{(b)}= x^g_t + \EXP[ x^s_t  I^{u,s}_t ] \end{align*}\] where \((a)\) follows from the fact that \(x^g_t\) is a function of \(u_{1:t1}\) which is a part of \(I^u_t\) and \((b)\) follows from Lemma 1. \(\Box\)
Note that unlike the previous model of NCS, in the current model, we cannot identify the update rule for the state estimator in closed form. This is because the state estimator will depend on the transmission strategy. However, the problem of choosing the best transmission strategy is separate from that of choosing the control strategy. We have shown that for any given transmission strategy \(\tilde f\), if we choose the best response strategy at the controller, the performance is given by \[\begin{align} J^*(\tilde f) &= \EXP\Bigl[ \sum_{t=1}^{T1} (L_t \tilde x_t)^\TRANS [R_t + B_t^\TRANS S_{t+1} B_t](L_t \tilde x_t) \Bigr] \notag \\ & \quad + \EXP\bigg[ x_1^\TRANS S_1 x_t + \sum_{t=1}^{T1} w_t S_{t+1} w_t \bigg], \end{align}\] where \(\tilde x_t = x^s_t  \EXP[ x^s_t  I^{u,s}_t]\). Note that the second term of the above expression depends only on the primitive random variable. Thus, to find the optimal transmission strategy, we need to minimize the first term, which is a purely communication/estimation problem known as remote state estimation in the literature. The optimal solution to the remote state estimation problem for the case of scalar state is derived in Lipsa and Martins (2009) and Chakravorty and Mahajan (2020) (using basic proof idea developed in Lipsa and Martins (2011)).
References
This entry was last updated on 19 Oct 2020 and posted in NCS and tagged linear systems, riccati equation, lqr, communication.