# 29 Large scale systems

LQR, mean-field control

## 29.1 Mean-field control

Consider a system consisting of \(N\) subsystems, indexed by the set \(\ALPHABET N \coloneqq \{1, \dots, N\}\). Each subsystem \(i \in \ALPHABET N\) has a state \(x^i_t \in \reals^{n}\) and a control input \(u^i_t \in \reals^m\). The dynamics of each subsystem are given as \[\begin{equation}\label{eq:dynamics}
x^i_{t+1} = A x^i_t + B u^i_t + D \bar x_t + E \bar u_t + w^i_t
\end{equation}\] where \[\begin{equation}
\bar x_t \coloneqq \frac 1N \sum_{i \in \ALPHABET N} x^i_t
\quad\text{and}\quad
\bar u_t \coloneqq \frac 1N \sum_{i \in \ALPHABET N} u^i_t
\end{equation}\] are the *emperical mean-field* of the state and control, respectively and \(A\), \(B\), \(D\), \(E\) are matrices of appropriate dimensions. The noise processes \(\{w^i_t\}_{t \ge 1}\), \(i \in \ALPHABET N\) are correlated across subsystem but are assumed to be independent across time.

We use \(\pmb x_t \coloneqq (x^1_t, \dots, x^N_t)\) and \(\pmb u_t \coloneqq(u^1_t, \dots, u^N_t)\) to denote the global state and control of the system. The system incurs a per-step cost given by \[\begin{equation}\label{eq:mf-cost} c(\pmb x_t, \pmb u_t) = \bar x_t^\TRANS \bar Q \bar x_t + \bar u_t^\TRANS \bar R \bar u_t + \frac 1N \sum_{i \in \ALPHABET N} \bigl[ (x^i_t)^\TRANS Q x^i_t + (u^i_t)^\TRANS R u^i_t \bigr] \end{equation}\] and a terminal cost \[\begin{equation}\label{eq:mf-cost-T} c_T(\pmb x_T) = \bar x_T^\TRANS \bar Q_T \bar x_T + \frac 1N \sum_{i \in \ALPHABET N} (x^i_T)^\TRANS Q_T x^i_T \end{equation}\]

We are interested in identifying policies \(g = (g_1, \dots, g_{T-1})\) where \(\pmb u_t = g_t(\pmb x_t)\) to minimize \[\begin{equation}\label{eq:performance} J(g) = \EXP^g\biggl[\sum_{t=1}^{T-1} c(\pmb x_t, \pmb u_t) + c_T(\pmb x_T) \biggr] \end{equation}\]

Note that the subsystems are weakly coupled in the dynamics and cost. A naive solution using by solving a single Riccati equation has a complexity of \(\mathcal O(n^3 N^3)\), which scales cubically in the number of agents.

### 29.1.1 State space decomposition

We now present a decomposition method to simplify the above optimization problem. Note that \(\eqref{eq:dynamics}\) implies that \[\begin{equation}\label{eq:mf-dynamics} \bar x_t = (A + D) \bar x_t + (B + E) \bar u_t + \bar w_t \end{equation}\] where \(\bar w_t = (\sum_{i \in \ALPHABET N}w^i_t)/N\). Define \[ \breve x^i_t = x^i_t - \bar x_t \quad\text{and}\quad \breve u^i_t = u^i_t - \bar u_t. \] Then, subtracting \(\eqref{eq:mf-dynamics}\) from \(\eqref{eq:dynamics}\), we get \[\begin{equation}\label{eq:breve-dynamics} \breve x^i_t = A \breve x^i_t + B \breve u^i_t + \breve w^i_t \end{equation}\] where \(\breve w^i_t = w^i_t - \bar w_t\).

We can think of \(\bar x_t\) as the “center of mass” of the system and \(\breve x^i_t\) to be the relative coordinates of subsystem \(i\) wrt the center of mass. Building on this intuition, we make the following simple observation, which may be viewed as an analog of the :parallel axis theorem in physics.

**Lemma 29.1** We have the following:

- \(\displaystyle \frac 1N\sum_{i \in \ALPHABET N} (x^i_t)^\TRANS Q x^i_t = \bar x_t^\TRANS Q \bar x_t + \frac 1N \sum_{i \in \ALPHABET N} (\breve x^i_t)^\TRANS Q \breve x^i_t\).
- \(\displaystyle \frac 1N\sum_{i \in \ALPHABET N} (u^i_t)^\TRANS Q u^i_t = \bar u_t^\TRANS Q \bar u_t + \frac 1N \sum_{i \in \ALPHABET N} (\breve u^i_t)^\TRANS Q \breve u^i_t\).

The proof follows from the observation that \(\sum_{i \in \ALPHABET N} \breve x^i_t = 0\) and elementary algebra.

An immediate implication of Lemma 29.1 is that the per-step cost can be decomposed as follows: \[\begin{equation} c(\pmb x_t, \pmb u_t) = \bar c(\bar x_t, \bar u_t) + \frac 1N \sum_{i \in \ALPHABET N} \breve c^i(\breve x^i_t, \breve u^i_t) \end{equation}\] where \[\begin{align*} \bar c(\bar x_t, \bar u_t) &= \bar x_t^\TRANS (Q + \bar Q) \bar x_t + \bar u_t^\TRANS (R + \bar R) \bar u_t, \notag \\ \breve c^i(\breve x^i_t, \breve u^i_t) &= \frac 1N \sum_{i \in \ALPHABET N} \bigl[ (\breve x^i_t)^\TRANS Q \breve x^i_t + (\breve u^i_t)^\TRANS Q \breve u^i_t \bigr]. \end{align*}\] A similar decomposition also holds for the terminal cost \(c_t(\pmb x_t, \pmb u_t)\).

Thus, the original system is equivalent to \(N+1\) coupled subsystems:

- a mean-field subsystem with state \(\bar x_t\), control input \(\bar u_t\), and per-step cost \(\bar c(\bar x_t, \bar u_t)\).
- \(N\) auxiliary subsytems, where subsystem \(i \in \ALPHABET N\) has state \(\breve x^i_t\), control input \(\breve u^i_t\), and per-step cost \(\breve c(\breve x^i_t, \breve u^i_t)\).

Note that the only coupling between the subsystems is through the noise. Therefore, by the argument presented in Exercise 28.3, the optimal control strategy is of the following form.

**Proposition 29.1** The optimal control policy for the mean-field control system described above is given by \[
u_t = - \bar K_t \bar x_t + \breve K_t (x^i_t - \bar x_t)
\] where

- \(\bar K_{1:T-1} = \LQR_T(A+D, B+E, Q + \bar Q, R + \bar R; Q_T + \bar Q_T)\)
- \(\breve K_{1:T-1} = \LQR_T(A,B, Q, R; Q_T)\).

The above result is significant, both for synthesis and implementation.

For synthesis, rather than solving one Riccati equation with state dimension \(nN\), we need to solve two Riccati equations with dimension \(n\). Thus, the complexity of computing the optimal controller gains does not depend on the number \(N\) of subsystems.

For implementation, each subsystem does not need access to the global state \(\pmb x_t\); instead it just needs access to the mean-field \(\bar x_t\) in addition to its local state \(x^i_t\).

## 29.2 Network coupled subsystems

## Notes

The results for mean-field control are adapted from Arabneydi and Mahajan (2016). The discussion above is restricted to the simplest setting of homogenous subsystems. Generalization to hetrogeneous subsystems and infinite horizon settings are also presented in Arabneydi and Mahajan (2016). A similar result for \(N \to \infty\) is also presented in Elliott et al. (2013).

The results for network coupled subsystems are adapted from Gao and Mahajan (2022).