State space models

Updated

November 19, 2024

The main idea behind state space models is to express a constant-coefficient linear differential equation as a first-order vector differential equation. This representation allows us to use linear algebra to design controllers. To understand this, let’s start with a homogeneous first-order LDE: \[ \frac{dy(t)}{dt} = a y(t) \] with the initial condition \(y(0) = y_0\). We will write this equation as \[\bbox[5pt,border: 1px solid] {\dot y(t) = a y(t), \quad y(0) = y_0} \] By inspection, we know that the solution is \[y(t) = e^{at} y_0.\]

However, we lose this simplicity if we go to second or higher order LDEs. For example, consider the LDE: \[\frac{d^2y(t)}{dt^2} + a_1 \frac{dy(t)}{dt} + a_0 y(t) = 0\] with known initial conditions \(y(0)\) and \(\dot y(0)\).

How can we find the solution? A commonly used method is to use (one-sided) Laplace transforms (with initial conditions) to compute the output. Recall \[\begin{align*} \dot x(t) &\xleftrightarrow{\quad \mathcal L\quad} s X(s) - x(0) \\ \ddot x(t) &\xleftrightarrow{\quad \mathcal L\quad} s^2 X(s) - sx(0) - \dot x(0). \end{align*}\]

So, if we take the LT of the DE, we get \[\begin{equation*} \bigl[ s^2 Y(s) - s y(0) - \dot y(0) \bigr] + a_1 \bigl[s Y(s) - y(0) \bigr] + a_0 Y(s) = 0 \end{equation*}\] Hence, \[ Y(s) = \frac{y(0) s + (a_1 y(0) + \dot y(0))}{s^2 + a_1 s + a_0}. \] We can now simplify the above using a partial fraction expansion to compute \(y(t)\).
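As a quick sanity check, here is a small Python (sympy) sketch of this recipe. The numbers \(a_1 = 3\), \(a_0 = 2\), \(y(0) = 1\), \(\dot y(0) = 0\) are illustrative assumptions (chosen to match the denominator of Example 1 later in these notes).

```python
import sympy as sp

t, s = sp.symbols('t s', positive=True)

# Illustrative (assumed) values: y'' + 3 y' + 2 y = 0, y(0) = 1, y'(0) = 0
a1, a0, y0, yd0 = 3, 2, 1, 0

# Y(s) from the formula above
Y = (y0*s + (a1*y0 + yd0)) / (s**2 + a1*s + a0)

print(sp.apart(Y, s))                          # 2/(s + 1) - 1/(s + 2)
print(sp.inverse_laplace_transform(Y, s, t))   # 2*exp(-t) - exp(-2*t)  (possibly times Heaviside(t))
```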

State space models provide a nicer way to solve such DEs. In particular, we will provide a method to convert a general second-order LDE into a vector equation \[\begin{align*} \dot x(t) &= A x(t) \\ y(t) &= C x(t) \end{align*}\] where \(x(t) \in \reals^2\), \(A \in \reals^{2 × 2}\) and \(C \in \reals^{1 \times 2}\).

We will then look at general LTI systems and show that any \(n\)-th order TF can be written as a SSM: \[\begin{align*} \dot x(t) &= A x(t) + B u(t) \\ y(t) &= C x(t) \end{align*}\] where \(x(t) \in \reals^n\), \(A \in \reals^{n × n}\), \(B \in \reals^{n \times 1}\) and \(C \in \reals^{1 \times n}\).

In the next lecture, we will study how to solve such differential equations for a particular choice of input.

1 Converting DE to SSM

In this section, we show how to construct an SSM from a DE. The key point to realize is that the SSM is not unique, so we will consider different representations. Each representation is useful for a specific purpose, as we will see later in the course.

We assume that we are either given a normalized DE (i.e., the leading coefficient is 1): \[ \frac{d^n y(t)}{dt^n} + a_{n-1} \frac{d^{n-1}y(t)}{dt^{n-1}} + \cdots + a_0 y(t) = b_m \frac{d^m u(t)}{dt^m} + \cdots + b_0 u(t) \] or a normalized TF \[G(s) = \frac{b_m s^m + \cdots + b_0}{s^n + a_{n-1}s^{n-1} + \cdots + a_0}.\]

We will restrict attention to strictly proper TFs, i.e., \(m < n\). There are three canonical realizations, which we illustrate via examples first.

  1. Controllable Canonical Form (CCF)
  2. Observable Canonical Form (OCF)
  3. Diagonal Canonical Form (DCF)

2 Illustrative Example

2.1 CCF representation

Consider the homogeneous second order DE considered above: \[\begin{equation}\label{eq:2nd-DE} \frac{d^2y(t)}{dt^2} + a_1 \frac{dy(t)}{dt} + a_0 y(t) = 0. \end{equation}\]

We consider a state vector: \[ x(t) = \MATRIX{x_1(t) \\ x_2(t)} = \MATRIX{y(t) \\ \dot y(t)}. \] Then, we can write \(\eqref{eq:2nd-DE}\) as follows: \[\begin{align*} \dot x_1(t) &= x_2(t) \\ \dot x_2(t) &= - a_0 x_1(t) - a_1 x_2(t) \end{align*}\] This equation can be written in vector form as follows: \[\begin{align*} \MATRIX{\dot x_1(t) \\ \dot x_2(t) } &= \MATRIX{0 & 1 \\ -a_0 & -a_1} \MATRIX{x_1(t) \\ x_2(t) }, \\ y(t) &= \MATRIX{\phantom{-a_0}\llap{1} & \phantom{-a_1}\llap{0} } \MATRIX{x_1(t) \\ x_2(t)}. \end{align*}\] This is called the controllable canonical form (CCF) realization of \(\eqref{eq:2nd-DE}\). The meaning of the terminology will become clear later.
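To see numerically that the vector form is the same equation, here is a minimal sketch (reusing the illustrative values \(a_1 = 3\), \(a_0 = 2\), \(y(0) = 1\), \(\dot y(0) = 0\) assumed above): integrating \(\dot x = Ax\) reproduces the Laplace-domain solution \(y(t) = 2e^{-t} - e^{-2t}\).

```python
import numpy as np
from scipy.integrate import solve_ivp

a0, a1 = 2.0, 3.0                        # assumed illustrative coefficients (same as above)
A = np.array([[0.0, 1.0],
              [-a0, -a1]])               # CCF A matrix
x0 = [1.0, 0.0]                          # x(0) = [y(0), ydot(0)]

sol = solve_ivp(lambda t, x: A @ x, (0.0, 3.0), x0, t_eval=np.linspace(0.0, 3.0, 7))
y_ssm = sol.y[0]                               # y(t) = C x(t) with C = [1 0]
y_ref = 2*np.exp(-sol.t) - np.exp(-2*sol.t)    # solution obtained via Laplace transforms
print(np.max(np.abs(y_ssm - y_ref)))           # small (within solver tolerance)
```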

2.2 OCF representation

The second method to obtain a SSM from the LDE is as follows. Write the LDE \(\eqref{eq:2nd-DE}\) as an integro-differential equation: \[\begin{align*} \ddot y(t) &= - a_1 \dot y(t) - a_0 y(t) \\ \implies \quad \dot y(t) &= - a_1 y(t) - a_0 \int_{0^-}^t y(\tau)d\tau \end{align*}\] (where we ignore the constant of integration, which only affects the initial condition of the state). Now define \[\begin{align*} x_1(t) &= -a_0\int_{0^-}^t y(\tau) d\tau , & x_2(t) &= y(t) \end{align*}\] Then, \[\begin{align*} \dot x_1(t) &= -a_0 y(t) = -a_0 x_2(t) \\ \dot x_2(t) &= \dot y(t) = x_1(t) - a_1 x_2(t) \end{align*}\] This equation can be written in vector form as follows: \[\begin{align*} \MATRIX{\dot x_1(t) \\ \dot x_2(t) } &= \MATRIX{0 & -a_0 \\ 1 & -a_1} \MATRIX{x_1(t) \\ x_2(t) }, \\ y(t) &= \MATRIX{0 & \phantom{-a_1}\llap{1} } \MATRIX{x_1(t) \\ x_2(t)}. \end{align*}\] This is called the observable canonical form (OCF) realization of \(\eqref{eq:2nd-DE}\). Again, the reason for the terminology will become clear later.

Non-uniqueness of the SSM realization

The above example shows that the same DE can have multiple SSM realizations.

3 General form

We now consider a general second order TF: \[ G(s) = \frac{b_1 s+b_0}{s^2 + a_1 s + a_0} \]

As an illustration, we will also use the following numerical example:

Example 1 \[ G(s) = \frac{s+5}{s^2 + 3s + 2} \]

3.1 CCF representation

For the generic second order system, the CCF form is \[ A = \MATRIX{ 0 & 1 \\ \textcolor{red}{-a_0} & \textcolor{red}{-a_1} }, \quad B = \MATRIX{ 0 \\ 1}, \quad C = \MATRIX{\textcolor{red}{b_0} & \textcolor{red}{b_1} } \] This is the general structure of the matrices, where only the red entries need to be filled in based on the TF. For example, for the TF in Example 1, we have \[ A = \MATRIX{ 0 & 1 \\ -2 & -3 }, \quad B = \MATRIX{ 0 \\ 1}, \quad C = \MATRIX{5 & 1} \]

In general, we have the following:

Figure 1: Structure of CCF representation

3.2 OCF representation

For the generic second order system, the OCF form is \[ A = \MATRIX{ 0 & \textcolor{red}{-a_0} \\ 1 & \textcolor{red}{-a_1} }, \quad B = \MATRIX{\textcolor{red}{b_0} \\ \textcolor{red}{b_1} }, \quad C = \MATRIX{ 0 & 1} \] This is the general structure of the matrices, where only the red entries need to be filled in based on the TF. For example, for the TF in Example 1, we have \[ A = \MATRIX{ 0 & -2 \\ 1 & -3 }, \quad B = \MATRIX{5 \\ 1}, \quad C = \MATRIX{ 0 & 1} \]

In general, we have the following:

Figure 2: Structure of OCF representation
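As a sanity check that the CCF and OCF realizations of Example 1 describe the same system, the sketch below converts both back to a TF using scipy.signal.ss2tf; both return \((s+5)/(s^2+3s+2)\), which also illustrates the non-uniqueness of the realization.

```python
import numpy as np
from scipy import signal

# CCF realization of Example 1
A_ccf = np.array([[0., 1.], [-2., -3.]])
B_ccf = np.array([[0.], [1.]])
C_ccf = np.array([[5., 1.]])

# OCF realization of Example 1
A_ocf = np.array([[0., -2.], [1., -3.]])
B_ocf = np.array([[5.], [1.]])
C_ocf = np.array([[0., 1.]])

D = np.array([[0.]])
print(signal.ss2tf(A_ccf, B_ccf, C_ccf, D))   # num ~ [0, 1, 5], den ~ [1, 3, 2]
print(signal.ss2tf(A_ocf, B_ocf, C_ocf, D))   # same TF: (s + 5)/(s^2 + 3s + 2)
```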

3.3 DCF representation

This representation is special and only applicable when the system has distinct real poles. Then, using the partial fraction expansion, the TF may be written as \[ G(s) = \frac{c_1}{s+p_1} + \frac{c_2}{s + p_2} + \dots + \frac{c_n}{s+p_n} \] In DCF, the system matrices are given by

Figure 3: Structure of DCF representation

In particular, for the TF in Example 1, we have \[ G(s) = \frac{s+5}{s^2 + 3s + 2} = \frac{4}{s+1} - \frac{3}{s+2}. \]

Thus, \[ A = \MATRIX{-1 & 0 \\ 0 & -2}, \quad B = \MATRIX{ 1 \\ 1 }, \quad C = \MATRIX{4 & -3}. \]
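The residues and poles needed for the DCF can also be computed numerically. Here is a sketch using scipy.signal.residue (the ordering of the returned poles may differ from the hand calculation):

```python
import numpy as np
from scipy import signal

num = [1., 5.]        # s + 5
den = [1., 3., 2.]    # s^2 + 3 s + 2

r, p, k = signal.residue(num, den)
print(r, p, k)        # residues {4, -3}, poles {-1, -2}, empty direct term

# Build the DCF matrices from the partial fraction expansion
# (poles and residues are real here; .real guards against a complex dtype)
A = np.diag(p.real)                # poles on the diagonal
B = np.ones((len(p), 1))
C = r.real.reshape(1, -1)          # residues in C
print(signal.ss2tf(A, B, C, np.zeros((1, 1))))   # recovers (s + 5)/(s^2 + 3 s + 2)
```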

In general, the poles may be complex or repeated. In this setting, a generalization of DCF known as the Jordan canonical form (JCF) is used. We will not consider JCF in this course.

4 SSMs for circuits

We will use the following notation in this section: \(v(t)\) denotes voltage, \(i(t)\) denotes current, and \(q(t)\) denotes charge.

For the three main circuit elements, we have the following relationship:

Relationship Resistor Capacitor Inductor
voltage-current \(v(t) = R i(t)\) \(\displaystyle v(t) = \frac{1}{C} \int_{0}^t i(τ) d τ\) \(\displaystyle \bbox[5pt, background-color:lightgray]{v(t) = L \frac{di(t)}{dt}}\)
current-voltage \(\displaystyle i(t) = \frac{1}{R} v(t)\) \(\bbox[5pt, background-color:lightgray]{\displaystyle i(t) = C \frac{dv(t)}{dt}}\) \(i(t) = \displaystyle \frac 1L \int_{0}^t v(τ) d τ\)
voltage-charge \(v(t) = R \dfrac{dq(t)}{dt}\) \(v(t) = \dfrac{1}{C} q(t)\) \(v(t) = L \dfrac{d^2q(t)}{dt^2}\)

Each energy storage element (i.e., capacitor and inductor) gives rise to a state variable. We want to avoid integral equations, so for a capacitor we choose its voltage as the state variable and for an inductor we choose its current as the state variable.

4.1 Example 1

Let’s start with a simple example to understand how state space modeling works. Consider the series RLC circuit shown in Figure 4.

Figure 4: A simple circuit

We construct the state space model as follows.

4.1.1 Step 1: Select the state variables

We write the appropriate relation for the two energy storage elements: \[\begin{align*} C \frac{d v_C(t)}{dt} &= i_C(t) = i_L(t) \\ L \frac{d i_L(t)}{dt} &= v_L(t) \end{align*}\]

Thus, we use \(\MATRIX{ v_C(t) \\ i_L(t) }\) as state.

4.1.2 Step 2: Use KCL/KVL to simplify the RHS of step 1

We want the RHS of the above equations to be expressed in terms of input \(u(t)\) and state \((v_C(t), i_L(t))\). Note that \(dv_C(t)/dt\) is already in terms of \(i_L(t)\). We now use KVL to find an expression for \(v_L(t)\) in terms of the other variables: \[u(t) = v_R(t) + v_L(t) + v_C(t)\] Thus, \[ v_L(t) = -v_R(t) - v_C(t) + u(t) = - R i_L(t) - v_C(t) + u(t). \]

Thus, we have \[\begin{align*} \frac{d v_C(t)}{dt} &= \frac{1}{C} i_L(t) \\ \frac{d i_L(t)}{dt} &= - \frac{1}{L} v_C(t) - \frac{R}{L} i_L(t) + \frac{1}{L} u(t) \end{align*}\]

Thus, the SSM is: \[ \def\1{\vphantom{\dfrac{d}{dt}}} \bbox[5pt,background-color: lightgray] { \begin{align*} \MATRIX{ \dfrac{d v_C(t)}{dt} \\ \dfrac {d i_L(t)}{dt} } &= \MATRIX{0 & \dfrac{1}{C} \\ -\dfrac{1}{L} & - \dfrac{R}{L} } \MATRIX{ \1 v_C(t) \\ \1 i_L(t) } + \MATRIX{\1 0 \\ \dfrac{1}{L}} u(t) \\ y(t) &= \MATRIX{1 & 0} \MATRIX{ \1 v_C(t) \\ \1 i_L(t) } \end{align*} } \]

Thus, we have \[ \def\1{\vphantom{\dfrac{d}{dt}}} A = \MATRIX{0 & \dfrac{1}{C} \\ -\dfrac{1}{L} & - \dfrac{R}{L} } , \quad B = \MATRIX{\1 0 \\ \dfrac{1}{L}}, \quad C = \MATRIX{1 & 0 } \]
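As a symbolic check (this previews the formula \(G(s) = C(sI-A)^{-1}B\) derived in Section 5 below), the following sympy sketch recovers the familiar series-RLC transfer function \(1/(LCs^2 + RCs + 1)\) from these matrices; the symbol Cap stands for the capacitance, renamed only to avoid clashing with the output matrix \(C\).

```python
import sympy as sp

s, R, L, Cap = sp.symbols('s R L C', positive=True)   # Cap: the capacitance

A = sp.Matrix([[0, 1/Cap], [-1/L, -R/L]])
B = sp.Matrix([[0], [1/L]])
C = sp.Matrix([[1, 0]])

G = sp.cancel(sp.simplify((C * (s*sp.eye(2) - A).inv() * B)[0, 0]))
print(G)   # 1/(C*L*s**2 + C*R*s + 1), i.e. the usual series-RLC transfer function
```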

4.2 Example 2

Now consider the circuit shown in Figure 5.

Figure 5: An RLC circuit

We construct the state space model as follows.

4.2.1 Step 1: Select the state variables

We write the appropriate relation for the two energy storage elements: \[\begin{align*} C \frac{d v_C(t)}{dt} &= i_C(t) \\ L \frac{d i_L(t)}{dt} &= v_L(t) \end{align*}\]

Thus, we use \(\MATRIX{ v_C(t) \\ i_L(t) }\) as state.

4.2.2 Step 2: Use KCL/KVL to simplify the RHS of step 1

We use KCL/KVL to write the right hand side of Step 1 as a linear combination of state and input. From KCL we have \[\begin{align*} i_L(t) &= i_R(t) + i_C(t) \\ \implies i_C(t) &= i_L(t) - i_R(t) \\ &= i_L(t) - \frac{1}{R} v_C(t) \qquad [\because v_C(t) = v_R(t)] \end{align*}\]

Moreover, from KVL we have \[\begin{align*} u(t) &= v_L(t) + v_C(t) \\ \implies v_L(t) &= -v_C(t) + u(t) \end{align*}\]

Thus, we have \[\begin{align*} \dfrac{d v_C(t)}{dt} &= \frac{1}{C} i_C(t) = \frac{1}{C} i_L(t) - \frac{1}{RC} v_C(t) \\ \dfrac{d i_L(t)}{dt} &= \frac{1}{L} v_L(t) = -\frac{1}{L} v_C(t) + \frac{1}{L} u(t) \end{align*}\]

Thus, the SSM is: \[ \def\1{\vphantom{\dfrac{d}{dt}}} \bbox[5pt,background-color: lightgray] { \begin{align*} \MATRIX{ \dfrac{d v_C(t)}{dt} \\ \dfrac {d i_L(t)}{dt} } &= \MATRIX{-\dfrac{1}{RC} & \dfrac{1}{C} \\ -\dfrac{1}{L} & 0 } \MATRIX{ \1 v_C(t) \\ \1 i_L(t) } + \MATRIX{\1 0 \\ \dfrac{1}{L}} u(t) \\ y(t) &= \MATRIX{1 & 0} \MATRIX{ \1 v_C(t) \\ \1 i_L(t) } \end{align*} } \]

Thus, we have \[ \def\1{\vphantom{\dfrac{d}{dt}}} A = \MATRIX{-\dfrac{1}{RC} & \dfrac{1}{C} \\ -\dfrac{1}{L} & 0 } , \quad B = \MATRIX{\1 0 \\ \dfrac{1}{L}}, \quad C = \MATRIX{1 & 0 } \]

5 From state space models to transfer functions

So far, we have seen how to write different SSM representations from a TF. It is also possible to go in the other direction.

Suppose we have a SSM: \[\begin{align*} \dot x(t) &= A x(t) + B u(t) \\ y(t) &= C x(t) \end{align*}\] Then the transfer function is \[ \bbox[5pt,border: 1px solid] {G(s) = C(sI - A)^{-1} B } \]

Take LT of the SSM: \[\begin{align*} s X(s) &= A X(s) + B U(s) & \implies&& X(s) &= (sI - A)^{-1} B U(s) \\ Y(s) &= C X(s) & \implies&& Y(s) &= C(sI - A)^{-1} B U(s) \end{align*}\]

Hence, \[ G(s) = \frac{Y(s)}{U(s)} = C(sI - A)^{-1} B.\]

We can think of this in terms of the following block diagram.

Figure 6: Block diagram of SSM

The TF can be “read backwards” from the block diagram.

Example 2 Let \(A = \MATRIX{-1 & 2 \\ 3 & -1}\), \(B = \MATRIX{1 \\ 0}\), and \(C = \MATRIX{2 & 1}\). Find the TF?

Note that \[sI - A = \MATRIX{s+1 & -2 \\ -3 & s+1}.\]

Therefore, \[\det(sI-A) = (s+1)^2 - 6 = s^2 + 2s -5. \]

Hence, \[(sI - A)^{-1} = \frac{1}{\det(sI-A)} \MATRIX{s+1 & 2 \\ 3 & s+1}.\]

Finally, \[G(s) = C(sI-A)^{-1}B = \frac{1}{\det(sI-A)} \MATRIX{2 & 1} \MATRIX{s+1 & 2 \\ 3 & s+1} \MATRIX{1 \\ 0} = \frac{2s + 5}{s^2 + 2s - 5}.\]

Recall that \(\det(sI-A)\) is the characteristic polynomial of \(A\) and the roots of the characteristic equation \(\det(sI-A) = 0\) are the eigenvalues of \(A\). Therefore, an important implication of the above formula is \[\bbox[5pt,border: 1px solid] {\text{Poles of $G(s)$} = \text{evals of $A$}}\] This means that we can control the poles of the TF by controlling the eigenvalues of \(A\).
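Here is a quick numerical sketch (not part of the derivation) that checks both the TF of Example 2 and the poles-equal-eigenvalues relation:

```python
import numpy as np
from scipy import signal

A = np.array([[-1., 2.], [3., -1.]])
B = np.array([[1.], [0.]])
C = np.array([[2., 1.]])
D = np.array([[0.]])

num, den = signal.ss2tf(A, B, C, D)
print(num, den)               # num ~ [0, 2, 5], den ~ [1, 2, -5], i.e. (2s + 5)/(s^2 + 2s - 5)
print(np.roots(den))          # poles of G(s): -1 +/- sqrt(6)
print(np.linalg.eigvals(A))   # eigenvalues of A: the same values
```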

Changing the eigenvalues of the \(A\) matrix will be one of the simplest and most important design methods that we will learn in this course.

6 Change of coordinates

A key idea for controller design using state space methods is change of coordinates. To fix ideas, we first review the change of coordinates for vectors and then we will come back to SSMs.

Consider a vector space \(\reals^2\). The standard basis for \(\reals^2\) is \[ e_1 = \MATRIX{1 \\ 0} \quad\text{and}\quad e_2 = \MATRIX{0 \\ 1}. \] When we write \(x = \MATRIX{x_1 \\ x_2} \in \reals^2\), what we really mean is \[x = x_1 e_1 + x_2 e_2.\]

What happens if we choose a different basis, say \(v_1\) and \(v_2\)? If the vector can be written as \[ x = \tilde x_1 v_1 + \tilde x_2 v_2 \] then we say that the vector is equal to \(\MATRIX{\tilde x_1 \\ \tilde x_2}\) in the coordinate system \((v_1, v_2)\). Note that the vector hasn’t changed, only the coordinate system has.

We use the following example to illustrate how to move from one coordinate system to another.

Example 3 Let \(x = \MATRIX{2 \\ 3}\). Find its representation in the coordinate system \[v_1 = \MATRIX{1 \\ 0} \quad\text{and}\quad v_2 = \MATRIX{1 \\ 1}. \]

Solution

Let the representation in the new coordinate system be \(\tilde x = \MATRIX{\tilde x_1 \\ \tilde x_2}\). Thus, we must have \[ 2 e_1 + 3 e_2 = \tilde x_1 v_1 + \tilde x_2 v_2. \] Or, in vector form, \[ \MATRIX{1 & 0 \\ 0 & 1} \MATRIX{2 \\ 3} = \MATRIX{1 & 1 \\ 0 & 1} \MATRIX{\tilde x_1 \\ \tilde x_2} \] which can be written as \[\bbox[5pt,border: 1px solid] { x = T \tilde x} \] where \[T = \MATRIX{v_1 & v_2} = \MATRIX{1 & 1 \\ 0 & 1}.\] The matrix \(T\) is full rank (which is always the case if \(\text{span}(v_1,v_2) = \reals^2\)). Thus, we have \[\bbox[5pt,border: 1px solid] { \tilde x = T^{-1} x} \]

So, for the above example we have \[\tilde x = \MATRIX{1 & 1 \\ 0 & 1}^{-1} \MATRIX{2 \\ 3} = \MATRIX{-1 \\ 3}.\]
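Numerically, the change of coordinates is just a linear solve; a minimal sketch for Example 3:

```python
import numpy as np

T = np.array([[1., 1.],
              [0., 1.]])           # columns are the new basis vectors v1, v2
x = np.array([2., 3.])

x_tilde = np.linalg.solve(T, x)    # computes T^{-1} x without forming the inverse explicitly
print(x_tilde)                     # [-1.  3.]
```

Using np.linalg.solve rather than explicitly forming \(T^{-1}\) is the standard numerical practice.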

(a) Vector \(x\) in standard coordinate system

(b) Vector \(x\) in oblique coordinate system
Figure 7: Vector \(x\) in different coordinate systems

7 Review of eigenvalues and eigenvectors

For the last several years, eigenvalues and eigenvectors have not been covered in CEGEP. Therefore, students who come from CEGEP and don’t take the linear algebra course at McGill haven’t seen this material before. Eigenvalues and eigenvectors are perhaps the most fundamental idea in linear algebra. We present a quick review of this material here, but if you haven’t seen it before, I strongly recommend that you study it in detail. Some resources:

We can think of an \(n × n\) square matrix \(A\) as a function that takes vectors in \(\reals^n\) and outputs vectors in \(\reals^n\), i.e., \[ f(x) = A x. \]

Such a function is a linear function, i.e., for any scalars \(α_1\) and \(α_2\) and any vectors \(x_1\) and \(x_2\), we have \[ f(α_1 x_1 + α_2 x_2) = α_1 f(x_1) + α_2 f(x_2). \]

Suppose we were free to choose the coordinate system in which we evaluate \(f\). Which coordinate system should we choose? This is where eigenvalues and eigenvectors come in.

An eigenvector is a special vector \(x\) such that \(f(x)\) is a scaled (and possibly flipped) version of \(x\) (i.e., \(f(x)\) lies in the linear subspace spanned by \(x\)). Mathematically, there exists a scalar \(λ\) such that \[ A x = λ x \]

Example 4 Consider \(A = \MATRIX{0 & 1 \\ 1 & 0}\). This is a permutation matrix. \[ A \MATRIX{x_1 \\ x_2} = \MATRIX{x_2 \\ x_1}. \]

  • Find an eigenvector with eigenvalue \(λ = 1\)?

    We want a vector \(\MATRIX{x_1 \\ x_2}\) such that \[ \underbrace{\MATRIX{ x_2 \\ x_1 }}_{A x} = \underbrace{\MATRIX{x_1 \\ x_2}}_{λ x} \] Thus, any vector of the form \(\MATRIX{x_1 \\ x_1}\) (i.e., any vector that lies in \(\SPAN\left(\MATRIX{1 \\ 1}\right)\)) is an eigenvector with eigenvalue \(λ = 1\).

  • Find an eigenvector with eigenvalue \(λ = -1\)?

    We want a vector \(\MATRIX{x_1 \\ x_2}\) such that \[ \underbrace{\MATRIX{ x_2 \\ x_1 }}_{A x} = \underbrace{-\MATRIX{x_1 \\ x_2}}_{λ x} \] Thus, any vector of the form \(\MATRIX{-x_1 \\ x_1}\) (i.e., any vector that lies in \(\SPAN\left(\MATRIX{-1 \\ 1}\right)\)) is an eigenvector with eigenvalue \(λ = -1\).

Note that \(A\) has two eigenvalues \(λ \in \{-1, 1\}\). From the definition of the eigenvectors it is clear that the eigenvectors corresponding to distinct eigenvalues must be linearly independent.

7.1 How to compute eigenvalues and eigenvectors

The previous example shows how to find eigenvectors if we know the eigenvalues. But how do we find the eigenvalues? Let’s consider the equation \[ A x = λ x \] which can also be written as \[ Ax = λ I x \implies (λ I - A) x = 0.\]

If there is a \((λ, x)\) with \(x \neq 0\) that satisfies the above equation, then it must be the case that \(λI - A\) is singular, i.e., \[ \bbox[5pt,border: 1px solid] {\det(λI - A) = 0} \] The above equation is called the characteristic equation of \(A\) and the eigenvalues of \(A\) are the roots of the characteristic equation!

Some special cases
  • If \(A\) is a diagonal matrix, then the eigenvalues are the diagonal elements! For example, for \(A = \MATRIX{3 & 0 \\ 0 & 2}\) we have \[ \det(λ I - A) = \DET{ λ -3 & 0 \\ 0 & λ -2} = (λ-3)(λ - 2). \]

  • If \(A\) is an upper or lower triangular matrix, then the eigenvalues are the diagonal elements! For example, for \(A = \MATRIX{3 & 1 \\ 0 & 2}\) we have \[ \det(λ I - A) = \DET{ λ -3 & -1 \\ 0 & λ -2} = (λ-3)(λ - 2). \]

Example 5 Find the eigenvalues and eigenvectors of \(A = \MATRIX{3 & 1 \\ 1 & 3 }\).

Solution

The characteristic polynomial of \(A\) is \[ \det(λI - A) = \DET{ λ - 3 & -1 \\ -1 & λ - 3 } = (λ-3)^2 - 1 = (λ-2)(λ-4). \] Thus, the eigenvalues are \(\{2, 4\}\).

To find the eigenvectors corresponding to \(λ = 4\), observe that \[ λ I - A = \MATRIX{ 1 & -1 \\ -1 & 1}. \] Thus, the eigenvector \(x\) must satisfy \[ (λI - A) x = 0 \implies \MATRIX{ 1 & -1 \\ -1 & 1} \MATRIX{ x_1 \\ x_2 } = 0 \implies \MATRIX{ x_1 - x_2 \\ -x_1 + x_2 } = 0. \] Thus, any vector of the form \(\MATRIX{ α \\ α}\) is an eigenvector.

To find the eigenvector corresponding to \(λ = 2\), observe that \[ λ I - A = \MATRIX{ -1 & -1 \\ -1 & -1}. \] Thus, the eigenvector \(x\) must satisfy \[ (λI - A) x = 0 \implies \MATRIX{ -1 & -1 \\ -1 & -1} \MATRIX{ x_1 \\ x_2 } = 0 \implies \MATRIX{ x_1 + x_2 \\ x_1 + x_2 } = 0. \] Thus, any vector of the form \(\MATRIX{ -α \\ α}\) is an eigenvector.

Which eigenvector to pick?

For any eigenvalue \(λ\), the matrix \(λ I - A\) is singular. Therefore, the equation \((λI - A) x = 0\) will have multiple solutions. Which solution to pick?

The standard convention (especially when you are using a mathematical programming language to compute eigenvectors) is to pick the unit vector. For example, in Example 5, one would pick:

  • For \(λ = 4\), the eigenvector is \(\MATRIX{ \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}}}\).
  • For \(λ = 2\), the eigenvector is \(\MATRIX{ \frac{-1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}}}\).

However, when doing analysis by hand, it is often more convenient to pick eigenvectors which are easier to represent. For example, in Example 5, one would pick

  • For \(λ = 4\), the eigenvector is \(\MATRIX{1 \\ 1}\).
  • For \(λ = 2\), the eigenvector is \(\MATRIX{-1 \\ 1}\).
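For instance, NumPy follows the unit-norm convention. A small sketch for the matrix of Example 5 (the order in which eig returns the eigenvalues is not guaranteed):

```python
import numpy as np

A = np.array([[3., 1.],
              [1., 3.]])

lam, V = np.linalg.eig(A)
print(lam)                        # eigenvalues 4 and 2 (order is not guaranteed)
print(V)                          # columns are eigenvectors, scaled to unit norm
print(np.linalg.norm(V, axis=0))  # [1. 1.]
```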
Some remarks
  • Note that in Example 5, the matrix \(A\) is symmetric. Symmetric matrices have real eigenvalues.

  • We can show that \[ λ_1 + λ_2 + \cdots + λ_n = \text{Trace}(A) \] (where \(\text{Trace}(A)\) means the sum of the diagonal elements of \(A\)) and \[ λ_1 λ_2 \cdots λ_n = \det(A). \] You can verify these properties for Example 5.

  • In general for any \(2 × 2\) matrix, \[ \det(λI - A) = λ^2 - \text{Trace}(A) λ + \det(A). \]

If the matrix is not symmetric, then the eigenvalues may be complex.

Example 6 (Complex eigenvalues) Find the eigenvalues of the rotation matrix \(A = \MATRIX{0 & -1 \\ 1 & 0}\).

Solution

The characteristic equation is \[ \det(λI - A) = \DET{λ & 1 \\ -1 & λ} = λ^2 + 1 = 0. \] Thus, the eigenvalues are \(\pm j\).

7.2 Why are eigenvalues useful?

Let’s consider an example. Suppose \(A = \MATRIX{3 & 1 \\ 1 & 3}\). We have seen in Example 5 that the eigenvalues are \(λ_1 = 4\) and \(λ_2 = 2\) and the corresponding eigenvectors are \[ v_1 = \MATRIX{1 \\ 1}, \quad v_2 = \MATRIX{-1 \\ 1}. \]

Suppose we want to compute \(y = A x\). Instead of working with the standard basis, let’s pick the basis as \[ T = \MATRIX{v_1 & v_2} = \MATRIX{1 & -1 \\ 1 & 1 } \]

What is the matrix \(A\) in the new coordinate system? We have \(\tilde y = T^{-1} y\) and \(\tilde x = T^{-1} x\). Thus, we have \[ y = A x \implies T \tilde y = A T \tilde x \implies \tilde y = T^{-1} A T \tilde x. \] Thus, the matrix \(A\) in the new coordinate system is \[ \tilde A = T^{-1} A T. \]

When \(T = \MATRIX{ v_1 & v_2}\), the matrix \(T^{-1} A T\) has a very special form. Recall that we have \[ A v_1 = λ_1 v_1 \quad A v_2 = λ_2 v_2. \] In matrix form, we have \[ A \MATRIX{v_1 & v_2} = \MATRIX{v_1 & v_2} \MATRIX{λ_1 & 0 \\ 0 & λ_2}. \] Thus, \[ T^{-1} A T = \MATRIX{λ_1 & 0 \\ 0 & λ_2} \Eqqcolon Λ \] Thus, \(\tilde A = T^{-1} A T\) is a diagonal matrix. So computing \(\tilde A \tilde x\) in the new coordinate system is trivial.

But there is more. Suppose we wanted to compute \(A^{100}\). Directly computing this by matrix multiplication is going to be very tedious. However, we can do the following. We know that \[ A = T Λ T^{-1} \] Thus, \[ A^2 = T Λ T^{-1} T Λ T^{-1} = T Λ^2 T^{-1}. \] Continuing this way, we have \[ A^3 = T Λ^3 T^{-1} \] and so on. Therefore, \[ A^{100} = T Λ^{100} T^{-1}. \] We can read this equation from right-to-left. To compute \(A^{100}\), we first change to the “eigen-coordinate system” (by multiplying by \(T^{-1}\)), we do the computation “take the 100-th power” in the eigen-coordinate system (by computing \(Λ^{100}\)), and then we translate the result back to the original coordinate system (by multiplying by \(T\)).
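A short numerical sketch of this “change coordinates, take the power, change back” recipe, compared against NumPy’s direct matrix power:

```python
import numpy as np

A = np.array([[3., 1.],
              [1., 3.]])

lam, T = np.linalg.eig(A)                            # columns of T are eigenvectors
A100_eig = T @ np.diag(lam**100) @ np.linalg.inv(T)  # change coordinates, take the power, change back
A100_direct = np.linalg.matrix_power(A, 100)         # direct repeated multiplication

print(np.allclose(A100_eig, A100_direct))            # True (up to floating-point error)
```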

7.3 Where have you seen this before?

The basic idea that we should always work in the eigen-coordinate system is at the heart of all we do in linear algebra, signals and systems, and differential equations (and as we shall see in this course, in control systems).

How are eigenvalues and eigenvectors related to Signals and Systems? Consider the LTI system shown in Figure 8.

Figure 8: An LTI system with input \(x(t)\), impulse response \(g(t)\) and output \(y(t)\).

We know that \[ y(t) = x(t) * g(t) = \int_{-∞}^{∞} g(τ) x(t-τ) d τ. \] Note that an LTI system is also a “linear transform”. So it has eigenvalues and eigenvectors. What are they?

Claim. Complex exponentials (i.e., \(e^{st}\)) are eigenfunctions of LTI systems (see Figure 9)

Figure 9: \(G(s)\) is the eigenvalue corresponding to \(e^{st}\)

Note that we use the term eigenfunction instead of eigenvector because LTI systems are infinite dimensional so “vector” becomes a “function”.

Proof of claim. \[\begin{align*} y(t) &= \int_{-∞}^{∞} g(τ) e^{s(t-τ)} d τ \\ &= \left[ \int_{-∞}^{∞} g(τ) e^{-s τ} d τ \right] e^{st} \end{align*}\] where the term in the square brackets is \(G(s)\), the Laplace transform of \(g(t)\).

So, in LTI systems, when we need to find the output corresponding to an input \(x(t)\), we follow the same recipe of “always work in the eigen-coordinate system”

  1. Convert \(x(t)\) to the new coordinate system. By the inverse Laplace transform formula, we know \[ x(t) = \int_{-∞}^{∞} X(s) e^{st} ds \] (where the integral is along a straight-line contour, but we simply write \(s\) for notational simplicity). This is similar to the equation \(x = T \tilde x\). Thus, in the new coordinate system, \(x(t)\) has the representation \(X(s)\).

  2. Compute the output in the new coordinate system. Since \[ e^{st} \longrightarrow G(s) e^{st} \] we have by linearity
    \[ X(s) e^{st} \longrightarrow X(s) G(s) e^{st} \] and therefore \[ \int_{-∞}^{∞} X(s) e^{st} \, ds \longrightarrow \int_{-∞}^{∞} X(s) G(s) e^{st} \, ds \]

    Therefore, the output \(y(t)\) is the inverse Laplace transform of \(X(s) G(s)\). That’s all of Signals and Systems in a nutshell!

8 Change of coordinates in SSMs

Consider a SSM: \[\begin{align*} \dot x(t) &= A x(t) + B u(t) \\ y(t) &= C x(t) \end{align*}\]

Let \(T\) be any invertible \(n × n\) matrix. Such a matrix is called a similarity matrix. Think of columns of \(T\) as the basis of a new coordinate system. Then, the state can be written as \[ \tilde x(t) = T^{-1} x(t) \] in the new coordinate system.

  • What are the state space equations in the new coordinate system?
  • How does the change of coordinates affect the TF?

We look at both these questions in detail.

Since \(\tilde x(t) = T^{-1} x(t)\), we have \(x(t) = T \tilde x(t)\). Substituting this in the SSM, we have \[\begin{align*} T \dot {\tilde x}(t) &= A T \tilde x(t) + B u(t) \\ y(t) &= C T \tilde x(t) \end{align*}\]

Pre-multiply the first equation by \(T^{-1}\): \[\begin{align*} \dot {\tilde x}(t) &= T^{-1} A T \tilde x(t) + T^{-1} B u(t) \\ y(t) &= C T \tilde x(t) \end{align*}\]

So, we can think of this as a new SSM: \[\begin{align*} \dot {\tilde x}(t) &= \tilde A \tilde x(t) + \tilde B u(t) \\ y(t) &= \tilde C \tilde x(t) \end{align*}\] where \[\bbox[5pt,border: 1px solid] {\tilde A = T^{-1} A T, \quad \tilde B = T^{-1} B, \quad \tilde C = CT} \]

The coordinate transformation has the following properties:

  1. Coordinate transformation does not change the characteristic polynomial

    Note that \[\begin{align*} (sI - \tilde A) &= sI - T^{-1} A T \\ &= s T^{-1} I T - T^{-1} A T \\ &= T^{-1} (sI - A) T. \end{align*}\] Therefore, \[ \det(sI - \tilde A) = \det(T^{-1}) \det(sI - A) \det(T) = \det(sI - A) \] because \(\det(T^{-1}) \det(T) = \det(T^{-1} T) = \det(I) = 1\).

  2. Coordinate transformation does not change the TF

    The transfer function in the new coordinate system is \[\begin{align*} \tilde G(s) &= \tilde C(sI - \tilde A)^{-1} \tilde B \\ &= C T T^{-1} (sI - A)^{-1} T T^{-1} B \\ &= C(sI - A)^{-1} B = G(s) \end{align*}\]

These properties hold because the change of coordinates (or similarity transformation) does not physically change the state; it only changes the coordinate system that we are using to represent the state.
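These invariance properties are easy to verify numerically. The sketch below applies an arbitrarily chosen similarity transformation (the matrix \(T\) here is an assumption for illustration only) to the SSM of Example 2 and checks that the eigenvalues and the TF are unchanged:

```python
import numpy as np
from scipy import signal

# SSM from Example 2
A = np.array([[-1., 2.], [3., -1.]])
B = np.array([[1.], [0.]])
C = np.array([[2., 1.]])
D = np.array([[0.]])

# An arbitrary invertible T (chosen only for illustration)
T = np.array([[1., 1.], [0., 1.]])
Ti = np.linalg.inv(T)
At, Bt, Ct = Ti @ A @ T, Ti @ B, C @ T

print(np.linalg.eigvals(A), np.linalg.eigvals(At))   # same eigenvalues
print(signal.ss2tf(A, B, C, D))
print(signal.ss2tf(At, Bt, Ct, D))                   # same transfer function (2s + 5)/(s^2 + 2s - 5)
```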

Change of coordinates (or similarity transformation) is a very useful tool for control system synthesis. As we will see later, it is easy to design controllers for systems in CCF. So, when we have to design controllers for systems not in CCF, we first find a similarity transformation that converts the system to CCF; then we design a controller for the system in CCF; and finally we convert the controller back to the original coordinate system. Don’t worry if this doesn’t make sense now. We will discuss this in detail in the next few lectures.