11 Gaussian processes
11.1 Gaussian processes
Recall that a discrete-time stochastic process is a collection \(\{X_n\}_{n \ge 0}\) of random variables defined on a common probability space, where the joint distribution of \((X_{n_1}, \dots, X_{n_k})\) is specified for every finite collection \(\{n_1, \dots, n_k\}\) of time indices.
A discrete-time stochastic process is called Gaussian if for any collection \(\{n_1, \dots, n_k\}\) of time indices, the random vector \((X_{n_1}, \dots, X_{n_k})\) has a multivariate Gaussian distribution.
Recall from the definition of multivariate Gaussian random vectors that we can equivalently define a discrete-time stochastic process \(\{X_n\}_{n \ge 0}\) to be a Gaussian process if every finite linear combination of the random variables \(X_n\) is a Gaussian random variable.
Since a multivariate Gaussian distribution is completely characterized by its mean vector and covariance matrix, a Gaussian process is completely characterized by:
- Mean function: \(μ_X(n) = \EXP[X_n]\)
- Covariance function: \(R_X(m,n) = \COV(X_m, X_n) = \EXP[(X_m - μ_X(m))(X_n - μ_X(n))]\)
Recall that a stochastic process is strict sense stationary if for any finite collection of time indices \(\{n_1, n_2, \dots, n_k\}\) with \(n_i \ge 0\), the random vector \((X_{n_1}, X_{n_2}, \dots, X_{n_k})\) has the same distribution as the random vector \((X_{n_1 + h}, X_{n_2 + h}, \dots, X_{n_k + h})\) for any \(h \ge 0\). For a Gaussian process, this is equivalent to the mean function being constant and the covariance function depending only on the time difference, i.e., \[ μ_X(n) = μ_X \quad \text{for all } n \ge 0, \quad R_X(m,n) = R_X(|n-m|) \quad \text{for all } m, n \ge 0. \] An implication is that the variance of \(X_n\) does not depend on \(n\).
11.2 Properties of covariance functions
To fix ideas, consider three time indices \((\ell,n,m)\). Then, the random vector \((X_\ell, X_n, X_m)\) has a multivariate Gaussian distribution given by \[ \MATRIX{X_{\ell}\\ X_n\\ X_m} \sim \mathcal{N}\left( \MATRIX{\mu_X(\ell)\\ \mu_X(n)\\ \mu_X(m)}, \MATRIX{R_X(\ell,\ell) & R_X(\ell,n) & R_X(\ell,m) \\ R_X(n,\ell) & R_X(n,n) & R_X(n,m) \\ R_X(m,\ell) & R_X(m,n) & R_X(m,m)} \right) \] The covariance matrix above must be symmetric, non-negative definite, and satisfy the Cauchy-Schwarz inequality, i.e., \(\COV(X_m, X_n)^2 \le \COV(X_m, X_m) \COV(X_n, X_n)\). These properties hold in general, as we now show.
Symmetry: \(R_X(m,n) = R_X(n,m)\) for all \(m, n \ge 0\).
NoteProof\(R_X(m,n) = \COV(X_m, X_n) = \COV(X_n, X_m) = R_X(n,m)\) by the symmetry of the covariance operator.
Non-negative definiteness: For any finite collection \(\{n_1, \dots, n_k\}\) with \(n_i \ge 0\) and any real numbers \(a_1, \dots, a_k\), we have \[ \sum_{i=1}^k \sum_{j=1}^k a_i a_j R_X(n_i, n_j) \ge 0. \]
NoteProofLet \(\Sigma\) denote the covariance matrix of the random vector \((X_{n_1}, \dots, X_{n_k})\). Then, we have \[ \Sigma = \MATRIX{R_X(n_1,n_1) & R_X(n_1,n_2) & \dots & R_X(n_1,n_k) \\ R_X(n_2,n_1) & R_X(n_2,n_2) & \dots & R_X(n_2,n_k) \\ \vdots & \vdots & \ddots & \vdots \\ R_X(n_k,n_1) & R_X(n_k,n_2) & \dots & R_X(n_k,n_k)} \] Since \(\Sigma\) is a covariance matrix, it is symmetric and non-negative definite, i.e., \(\Sigma \succeq 0\). Therefore, \[ \sum_{i=1}^k \sum_{j=1}^k a_i a_j R_X(n_i, n_j) = a^T \Sigma a \ge 0. \]
Cauchy-Schwarz inequality: \(|R_X(m,n)|^2 \le R_X(m,m) R_X(n,n)\).
NoteProofThis is a direct consequence of the Cauchy-Schwarz inequality for random variables: \[ |\COV(X_m, X_n)|^2 \leq \COV(X_m, X_m) \COV(X_n, X_n). \]
The variance function is \(R_X(n,n) = \VAR(X_n) \ge 0\).
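To see these properties concretely, here is a minimal numerical check. The covariance function \(R(m,n) = e^{-|m-n|}\) used below is a hypothetical choice made only for illustration (it is the covariance of a stationary AR(1)-type process, so it is indeed a valid covariance function).

```python
import numpy as np

# Hypothetical covariance function, used only for illustration.
def R(m, n):
    return np.exp(-abs(m - n))

idx = [0, 1, 3, 7, 12]                      # an arbitrary finite collection of time indices
Sigma = np.array([[R(m, n) for n in idx] for m in idx])

# Symmetry: R(m, n) = R(n, m)
assert np.allclose(Sigma, Sigma.T)

# Non-negative definiteness: all eigenvalues of Sigma are >= 0 (up to round-off)
assert np.all(np.linalg.eigvalsh(Sigma) >= -1e-12)

# Cauchy-Schwarz: |R(m, n)|^2 <= R(m, m) R(n, n) for every pair of indices
for m in idx:
    for n in idx:
        assert R(m, n) ** 2 <= R(m, m) * R(n, n) + 1e-12

print("symmetry, non-negative definiteness, and Cauchy-Schwarz all verified")
```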
11.3 Examples of Gaussian processes
Example 11.1 (Cosine with Gaussian amplitude) Consider a process \(\{X_n\}_{n \ge 0}\) defined as: \[ X_n = A \cos(\Omega n) + B \sin(\Omega n) \] where \(\Omega \in \mathbb{R}\) is a constant frequency, and \(A\) and \(B\) are independent Gaussian random variables with \(A, B \sim \mathcal{N}(0, σ^2)\).
A sample path of this process is shown below.
Find the mean and covariance function of the process. Is the process stationary?
NoteSolution
Mean function:
\[\begin{align*} \mu_X(n) &= \EXP[X_n] = \EXP[A \cos(\Omega n) + B \sin(\Omega n)] \\ &= \EXP[A] \cos(\Omega n) + \EXP[B] \sin(\Omega n) \\ &= 0 \end{align*}\] since \(\EXP[A] = \EXP[B] = 0\).
Covariance function:
\[\begin{align*} R_X(m,n) &= \EXP[X_m X_n] \\ &= \EXP[(A \cos(\Omega m) + B \sin(\Omega m))(A \cos(\Omega n) + B \sin(\Omega n))] \\ &= \EXP[A^2] \cos(\Omega m) \cos(\Omega n) + \EXP[AB] \cos(\Omega m) \sin(\Omega n) \\ &\quad + \EXP[BA] \sin(\Omega m) \cos(\Omega n) + \EXP[B^2] \sin(\Omega m) \sin(\Omega n) \end{align*}\]
Since \(A\) and \(B\) are independent with \(\EXP[A] = \EXP[B] = 0\), we have: - \(\EXP[A^2] = \VAR(A) = σ^2\) - \(\EXP[B^2] = \VAR(B) = σ^2\) - \(\EXP[AB] = \EXP[A]\EXP[B] = 0\)
Therefore: \[\begin{align*} R_X(m,n) &= σ^2 [\cos(\Omega m) \cos(\Omega n) + \sin(\Omega m) \sin(\Omega n)] \\ &= σ^2 \cos(\Omega (m-n)) \end{align*}\] using the identity \(\cos(a-b) = \cos(a)\cos(b) + \sin(a)\sin(b)\).
Since cosine is even: \[ R_X(m,n) = σ^2 \cos(\Omega |m-n|) \]
Thus, the process is stationary.
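The covariance formula can also be checked by simulation. Below is a minimal sketch (the values of \(σ\), \(\Omega\), the horizon, and the number of sample paths are arbitrary choices for illustration): draw many independent sample paths and compare the empirical covariance matrix with \(σ^2 \cos(\Omega(m-n))\).

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, Omega = 1.0, 0.5                 # assumed parameter values, for illustration only
N, trials = 50, 100_000                 # time horizon and number of independent sample paths

A = rng.normal(0.0, sigma, size=trials)
B = rng.normal(0.0, sigma, size=trials)
n = np.arange(N)
X = A[:, None] * np.cos(Omega * n) + B[:, None] * np.sin(Omega * n)   # each row is one sample path

R_emp = (X.T @ X) / trials              # empirical E[X_m X_n], averaged over sample paths
R_theory = sigma**2 * np.cos(Omega * (n[:, None] - n[None, :]))
print(np.max(np.abs(R_emp - R_theory)))  # small, and shrinks as `trials` grows
```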
Example 11.2 (White noise process (i.i.d. Gaussian)) A white noise process \(\{W_n\}_{n \ge 0}\) is a Gaussian process where \(\{W_n\}_{n \ge 0}\) is an i.i.d. sequence with \(W_n \sim \mathcal{N}(0, σ^2)\) for some \(σ > 0\).
A sample path of white noise is shown below.
Find the mean and covariance function of the process. Is the process stationary?
NoteSolution
Mean function:
Since \(W_n \sim \mathcal{N}(0, σ^2)\) for all \(n \ge 0\), we have: \[ \mu_W(n) = \EXP[W_n] = 0 \quad \text{for all } n \ge 0 \]
Covariance function:
For \(m, n \ge 0\):
- If \(m = n\): \(R_W(m,n) = \COV(W_m, W_n) = \VAR(W_n) = σ^2\)
- If \(m \neq n\): Since \(W_m\) and \(W_n\) are independent, \(R_W(m,n) = \COV(W_m, W_n) = 0\)
Therefore: \[ R_W(m,n) = σ^2 \IND_{\{m=n\}} \quad \text{for } m, n \ge 0 \] where \(\IND_{\{m=n\}}\) is the indicator function that equals 1 when \(m = n\) and 0 otherwise.
Example 11.3 (Gaussian random walk) A Gaussian random walk \(\{X_n\}_{n \ge 0}\) is a Gaussian process defined by: \[ X_0 = 0, \quad X_{n+1} = X_n + W_n, \quad n \ge 0 \] where \(\{W_n\}_{n \ge 0}\) is an i.i.d. sequence with \(W_n \sim \mathcal{N}(0, σ^2)\) for some \(σ > 0\).
A sample path of a Gaussian random walk is shown below.
Find the mean and covariance function of the process. Is the process stationary?
NoteSolution
Observe that the process satisfies the Markov property, so it is also called a Gauss Markov process. Since the marginal distribution of \(X_n\) is Gaussian, we can write recursive formulas for the mean and covariance function (which are the analog of the \(\mu^{(n+1)} = \mu^{(n)} P\) type recursions for Markov chains): \[\begin{align*} \mu_X(n+1) &= \mu_X(n) + \EXP[W_{n}] = \mu_X(n), \\ R_X(n+1,n+1) &= \EXP[ (X_n + W_n)(X_n + W_n)] \\ &= R_X(n,n) + \EXP[W_n^2] = R_X(n,n) + σ^2, \end{align*}\] where the cross terms vanish because \(W_n\) is independent of \(X_n\) and has zero mean.
Thus, \(\mu_X(n) = \mu_X(0) = 0\) and \(R_X(n,n) = σ^2 n\) for all \(n \ge 0\).
For computing \(R_X(n,m)\) for \(n \neq m\), we use the fact that (assuming \(n < m\)) the increment \(X_{m} - X_{n} = W_{n} + W_{n+1} + \cdots + W_{m-1}\) is independent of \(X_n\). Therefore \[\begin{align*} R_X(n,m) &= \EXP[X_n X_m] \\ &= \EXP[X_n (X_{n} + W_{n} + W_{n+1} + \cdots + W_{m-1})] \\ &= R_X(n,n). \end{align*}\]
Combining the above we have \(R_X(n,m) = σ^2 \min(n,m)\) for all \(n, m \ge 0\). Thus, the process is not stationary.
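As a quick check of \(R_X(n,m) = σ^2 \min(n,m)\), we can simulate the random walk and compare the empirical covariance with the formula (the horizon and number of sample paths below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, N, trials = 1.0, 40, 100_000

W = rng.normal(0.0, sigma, size=(trials, N))
# X_0 = 0 and X_{n+1} = X_n + W_n, i.e., X_n is the cumulative sum of the noise
X = np.concatenate([np.zeros((trials, 1)), np.cumsum(W, axis=1)], axis=1)

n = np.arange(N + 1)
R_emp = (X.T @ X) / trials
R_theory = sigma**2 * np.minimum(n[:, None], n[None, :])
print(np.max(np.abs(R_emp - R_theory)))   # only Monte Carlo error remains
```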
Example 11.4 (AR(1) process) An AR(1) process (also called discrete-time Ornstein-Uhlenbeck process) \(\{X_n\}_{n \ge 0}\) is a Gaussian process defined by the recursion: \[ X_{n+1} = a X_n + W_n, \quad n \ge 0 \] where \(|a| < 1\) and \(\{W_n\}_{n \ge 0}\) is an i.i.d. sequence with \(W_n \sim \mathcal{N}(0, σ^2)\).
A sample path of an AR(1) process is shown below.
Assuming \(X_0 = 0\), find the mean and covariance function of the process. Is the process stationary?
NoteSolution
As was the case for Example 11.3, AR(1) processes are also Gauss Markov processes. As before, we can write recursive formulas for the mean and covariance function: \[\begin{align*} \mu_X(n+1) &= a \mu_X(n) + \EXP[W_{n}] = a \mu_X(n), \\ R_X(n+1,n+1) &= \EXP[ (a X_n + W_n)(a X_n + W_n)] \\ &= a^2 R_X(n,n) + \EXP[W_n^2] = a^2 R_X(n,n) + σ^2. \end{align*}\]
Thus, \(\mu_X(n) = a^n \mu_X(0) = 0\) and \(R_X(n,n) = σ^2 \sum_{k=0}^{n-1} a^{2k}\) for all \(n \ge 0\).
For computing \(R_X(n,m)\) for \(n \neq m\), we use the fact that (assuming \(n < m\)) \[\begin{align*} X_{m} &= a X_{m-1} + W_{m-1} \\ &= a (a X_{m-2} + W_{m-2}) + W_{m-1} \\ &= a^2 X_{m-2} + a W_{m-2} + W_{m-1} \\ &= \cdots \\ &= a^{m-n} X_{n} + a^{m-n-1} W_{n} + \cdots + W_{m-1}. \end{align*}\] Note that the increment \(a^{m-n-1} W_{n} + \cdots + W_{m-1}\) is independent of \(X_n\). Therefore \[\begin{align*} R_X(n,m) &= \EXP[X_n X_m] \\ &= \EXP[X_n (a^{m-n} X_{n} + a^{m-n-1} W_{n} + \cdots + W_{m-1})] \\ &= a^{m-n} R_X(n,n). \end{align*}\]
Combining the above we have \[R_X(n,m) = σ^2 a^{|m-n|} \frac{1 - a^{2\min(n,m)}}{1 - a^2}, \quad \forall n, m \ge 0.\] where we have used the geometric series formula \(\sum_{k=0}^{n-1} a^{2k} = \frac{1 - a^{2n}}{1 - a^2}\) (for \(|a| < 1\)).
Thus, the process is not stationary.
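The closed-form covariance can again be verified by simulation; the sketch below uses arbitrary values of \(a\), \(σ\), the horizon, and the number of sample paths, with \(X_0 = 0\) as assumed above.

```python
import numpy as np

rng = np.random.default_rng(0)
a, sigma, N, trials = 0.8, 1.0, 30, 100_000

X = np.zeros((trials, N + 1))                        # X_0 = 0
for t in range(N):
    X[:, t + 1] = a * X[:, t] + rng.normal(0.0, sigma, size=trials)

n = np.arange(N + 1)
lag = np.abs(n[:, None] - n[None, :])
mn = np.minimum(n[:, None], n[None, :])
R_emp = (X.T @ X) / trials
R_theory = sigma**2 * a**lag * (1 - a**(2 * mn)) / (1 - a**2)
print(np.max(np.abs(R_emp - R_theory)))              # small Monte Carlo error
```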
TipInvariant distribution and stationarity
From the results of Example 11.4, observe that \[ \lim_{n \to \infty} \mu_X(n) = 0 \quad \text{and} \quad \lim_{n \to \infty} R_X(n,n) = \frac{σ^2}{1-a^2}. \] The above limits show that the distribution of \(X_n\) converges to an invariant distribution with mean 0 and variance \(\frac{σ^2}{1-a^2}\).
As was the case for Markov chains, if we start with the invariant distribution, i.e., \(X_0 \sim \mathcal{N}(0, \frac{σ^2}{1-a^2})\), the process will remain in the invariant distribution for all time because \[ \mu_X(n+1) = a \mu_X(n) = 0 \] and \[ R_X(1,1) = σ^2 + a^2 R_X(0,0) = σ^2 + a^2 \frac{σ^2}{1-a^2} = \frac{σ^2}{1-a^2} = R_X(0,0), \] and similarly \(R_X(n,n) = \frac{σ^2}{1-a^2}\) for all \(n \ge 0\).
Furthermore, the process is a strict sense stationary process because, in addition to \(\mu_X(n) = 0\), we have \[ R_X(n, m) = a^{|n-m|} R_X(n,n) = a^{|n-m|} \frac{σ^2}{1-a^2}, \] which is a function of \(|n-m|\) only.
The above ideas can be generalized to vector valued processes as well.
In general, let \(\{W_n\}_{n \ge 0}\) be a vector valued white noise process, where \(W_n \sim \mathcal{N}(0, Σ_W)\). Then \(μ_W(n) = 0\) and \(R_W(n,m) = Σ_W \IND\{n = m\}\). The process is called a standard white noise process if \(Σ_W = I\).
Consider a stochastic process \(\{X_n\}_{n \ge 0}\) defined as \[ X_{n+1} = A_n X_n + B_n W_n, \] where \(A_n\) and \(B_n\) are deterministic matrices and \(\{W_n\}_{n \ge 0}\) is standard white noise. Then, \(\{X_n\}_{n \ge 0}\) is a Gauss Markov process and it can be shown that for any \(\ell < n < m\), we have \[ R_X(\ell, m) = R_X(\ell,n) R_X(n,n)^{-1} R_X(n,m). \]
When \(A_n\) and \(B_n\) do not depend on \(n\), the system is time invariant; if, in addition, \(A\) is stable (all eigenvalues inside the unit circle) and \(X_0\) is drawn from the corresponding invariant distribution, the process is stationary.
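The identity \(R_X(\ell, m) = R_X(\ell,n) R_X(n,n)^{-1} R_X(n,m)\) can be checked numerically. The sketch below uses a hypothetical time-invariant pair \((A, B)\), chosen only for illustration, together with the facts that \(P_n := R_X(n,n)\) satisfies \(P_{n+1} = A P_n A^{\mathsf{T}} + B B^{\mathsf{T}}\) and that, for \(i \le j\), \(R_X(i,j) = P_i (A^{j-i})^{\mathsf{T}}\).

```python
import numpy as np

# Hypothetical time-invariant system matrices, chosen only for illustration.
A = np.array([[0.7, 0.2], [-0.1, 0.5]])
B = np.array([[1.0, 0.0], [0.3, 0.8]])
P0 = np.eye(2)                           # covariance of X_0 (an assumption)

# Marginal covariances P_n = R_X(n, n) via the recursion P_{n+1} = A P_n A^T + B B^T
N = 10
P = [P0]
for _ in range(N):
    P.append(A @ P[-1] @ A.T + B @ B.T)

def R(i, j):
    """Covariance R_X(i, j); for i <= j it equals P_i (A^{j-i})^T, and R_X(j, i) = R_X(i, j)^T."""
    if i <= j:
        return P[i] @ np.linalg.matrix_power(A, j - i).T
    return R(j, i).T

l, n, m = 2, 5, 9
lhs = R(l, m)
rhs = R(l, n) @ np.linalg.inv(R(n, n)) @ R(n, m)
print(np.max(np.abs(lhs - rhs)))         # ~ 1e-16: the Gauss-Markov identity holds
```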
11.4 Jointly Gaussian stochastic processes
Consider two stochastic processes defined on a common probability space: \(\{X_n\}_{n \ge 0}\) and \(\{Y_n\}_{n \ge 0}\). Suppose for any collection \(\{n_1, \dots, n_k\}\) of time indices, the random vector \(\{X_{n_1}, \dots, X_{n_k}, Y_{n_1}, \dots, Y_{n_k}\}\) has a multivariate Gaussian distribution. Then, the processes are called jointly Gaussian.
The joint process \(\{X_n, Y_n\}_{n \ge 0}\) is a Gaussian process with mean \(\mu_{X,Y}(n) = \MATRIX{\mu_X(n)\\ \mu_Y(n)}\) and covariance \(R_{X,Y}(n,m) = \MATRIX{R_X(n,m) & R_{XY}(n,m) \\ R_{YX}(n,m) & R_Y(n,m)}\).
The function \(R_{XY}(n,m) = \COV(X_n, Y_m)\) is called the cross-covariance function of the processes (and the functions \(R_X(n,m)\) and \(R_Y(n,m)\) are sometimes called the auto-covariance functions of the processes).
The cross-covariance function satisfies \(R_{XY}(n,m) = \COV(X_n, Y_m) = R_{YX}(m,n)\) for all \(n, m \ge 0\).
The processes are uncorrelated if \(R_{XY}(n,m) = 0\) for all \(n, m \ge 0\). Since the joint process is Gaussian, uncorrelated implies independent.
11.5 Review of discrete-time LTI systems
Consider an LTI system with impulse response \(\{h_k\}_{k \ge 0}\). The transfer function of the system is given by the Z-transform of the impulse response: \[ H(z) = \sum_{k=0}^∞ h_k z^{-k} \]
The system is stable if \[ \sum_{k=0}^∞ |h_k| < ∞. \] Equivalently, the ROC of \(H(z)\) contains the unit circle \(|z| = 1\); for a causal system, this means that all poles of \(H(z)\) lie inside the unit circle.
The Z-transform evaluated on the unit circle is called the (discrete-time) Fourier transform, i.e., \[ H(e^{j\omega}) = \sum_{k=-\infty}^∞ h_k e^{-j\omega k}. \] Note that we have a clash of notation. In Signals and Systems, \(\omega\) is used to denote the frequency in radians per sample, while in probability \(\omega\) is used to denote outcomes in the sample space. To avoid confusion, we will use \(\Omega\) to denote the frequency (it still clashes with the notation for the sample space, but the meaning should be clear from the context). Thus, the Fourier transform is given by \[ H(e^{j\Omega}) = \sum_{k=-\infty}^∞ h_k e^{-j\Omega k} \]
The inverse Fourier transform is given by: \[ h_k = \frac{1}{2\pi} \int_{-\pi}^{\pi} H(e^{j\Omega}) e^{j\Omega k} d\Omega \]
A SISO (single-input single-output) discrete-time LTI system can be represented in state space form as: \[ \begin{align*} x_{n+1} &= A x_n + B u_n \\ y_n &= C x_n + D u_n \end{align*} \] where \(x_n \in \mathbb{R}^d\) is the state vector, \(u_n \in \mathbb{R}\) is the input, \(y_n \in \mathbb{R}\) is the output, and \(A \in \mathbb{R}^{d \times d}\), \(B \in \mathbb{R}^{d \times 1}\), \(C \in \mathbb{R}^{1 \times d}\), and \(D \in \mathbb{R}^{1 \times 1}\) are constant matrices. The system is stable if all eigenvalues of \(A\) lie inside the unit circle. The transfer function \(H(z)\) can be recovered from the state space representation as \(H(z) = C(zI - A)^{-1}B + D\).
Similar state space representations exist for MIMO (multiple-input multiple-output) systems, where the input and output are vectors.
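As a small illustration of the SISO state-space relations, the sketch below evaluates \(H(z) = C(zI - A)^{-1}B + D\), checks stability through the eigenvalues of \(A\), and recovers the impulse response \(h_0 = D\), \(h_k = C A^{k-1} B\) for \(k \ge 1\). It uses the AR(1) model of Example 11.4 written in state-space form, with an assumed value of \(a\).

```python
import numpy as np

def transfer_function(A, B, C, D, z):
    """Evaluate H(z) = C (zI - A)^{-1} B + D for a SISO state-space model."""
    d = A.shape[0]
    return (C @ np.linalg.inv(z * np.eye(d) - A) @ B + D).item()

def is_stable(A):
    """Stable iff every eigenvalue of A lies strictly inside the unit circle."""
    return np.all(np.abs(np.linalg.eigvals(A)) < 1)

# AR(1) model of Example 11.4 in state-space form (the value of a is an assumption).
a = 0.8
A = np.array([[a]]); B = np.array([[1.0]]); C = np.array([[1.0]]); D = np.array([[0.0]])

z = 1.5 + 0.3j                       # arbitrary test point outside the unit circle
print(transfer_function(A, B, C, D, z), 1 / (z - a))   # both equal 1/(z - a)
print(is_stable(A))                  # True since |a| < 1

# Impulse response from the state-space model: h_0 = D, h_k = C A^{k-1} B for k >= 1
h = [D.item()] + [(C @ np.linalg.matrix_power(A, k - 1) @ B).item() for k in range(1, 6)]
print(h)                             # [0, 1, a, a^2, ...], matching h_k = a^{k-1}
```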
11.6 Complex-valued random variables and processes
As we shall see, when viewing random processes in frequency domain, complex-valued random variables and processes arise naturally. If \(X\) and \(Y\) are real-valued joint random variables, then \(Z = X + jY\) is a complex-valued random variable. Similarly, if \(\{X_n\}_{n \ge 0}\) and \(\{Y_n\}_{n \ge 0}\) are real-valued joint random processes, then \(\{Z_n\}_{n \ge 0} = \{X_n + jY_n\}_{n \ge 0}\) is a complex-valued random process.
We can think of the joint CDF of a complex-valued random variable as \(F_Z(x + jy) = F_{X,Y}(x,y)\).
The covariance of two complex-valued random variables \(Z_1\) and \(Z_2\) is defined as \[ \COV(Z_1, Z_2) = \EXP[Z_1 Z_2^*] \] where \(Z_2^*\) denotes the complex conjugate of \(Z_2\). Note that we use \(Z_2^*\) (rather than \(Z_2\)) to ensure that the variance \(\VAR(Z) = \COV(Z, Z) = \EXP[|Z|^2]\) is real and non-negative. The same definition works for complex-valued random vectors, where \(Z_2^*\) denotes the conjugate transpose of \(Z_2\).
We say that \(Z\) is a complex Gaussian random vector with mean \(\mu_Z\) and covariance matrix \(\Sigma_Z\) if the real and imaginary parts of \(Z\) are jointly Gaussian and \(\EXP[Z] = \mu_Z\) and \(\VAR(Z) = \COV(Z,Z) = \Sigma_Z\).
For a complex-valued random process \(\{X_n\}_{n \ge 0}\), the auto-covariance function is conjugate symmetric, i.e., \(R_X(m,n) = R_X(n,m)^*\). Note that \(R_X(n,n) = \EXP[|X_n|^2]\) is always real and non-negative. When the process is stationary, \(R_X(m,n) = R_X(m-n)\), and therefore \(R_X(k) = R_X(-k)^*\) where \(k = m-n\).
The cross-covariance between two complex-valued random processes \(\{X_n\}_{n \ge 0}\) and \(\{Y_n\}_{n \ge 0}\) is conjugate symmetric, i.e., \(R_{XY}(n,m) = R_{YX}(m,n)^*\). When both processes are stationary, \(R_{XY}(n,m) = R_{XY}(n-m)\), and therefore \(R_{XY}(k) = R_{YX}(-k)^*\) where \(k = n-m\).
11.7 Stationary Gaussian processes through LTI systems
In many applications in signal processing and stochastic control, we are interested in the output of a linear system when the input is a stationary Gaussian process. There are two ways to study such systems: either in the time domain using state space models or in the frequency domain using the transfer function. For the purposes of this section, we will restrict our discussion to linear time-invariant (LTI) systems, but the main ideas can be generalized to time-varying linear systems as well.
Consider Example 11.3 which discussed a Gaussian random walk. The model was already presented in state space form but we can also explicitly write it as \[ \begin{align*} X_{n+1} &= X_n + W_n \\ Y_n &= X_n \end{align*} \] Thus, the transfer function is given by \[ H(z) = 1 \cdot (z - 1)^{-1} \cdot 1 + 0 = \frac{1}{z - 1} = \frac{z^{-1}}{1 - z^{-1}} = \sum_{k=1}^∞ z^{-k} \] Thus, the impulse response is given by \[ h_0 = 0 \quad \text{and} \quad h_k = 1, \quad \forall k \ge 1 \]
Similarly, consider Example 11.4 which discussed an AR(1) process. The model was already presented in state space form but we can also explicitly write it as \[ \begin{align*} X_{n+1} &= a X_n + W_n \\ Y_n &= X_n \end{align*} \] The transfer function is given by \[ H(z) = 1 \cdot (z - a)^{-1} \cdot 1 + 0 = \frac{1}{z - a} = \frac{z^{-1}}{1 - a z^{-1}} = \sum_{k=1}^∞ a^{k-1} z^{-k} \] Thus, the impulse response is given by \[ h_0 = 0 \quad \text{and} \quad h_k = a^{k-1}, \quad \forall k \ge 1 \]
Thus, these two examples show the output of two LTI systems when the input is white noise. Note that the transfer function for Example 11.3 is not stable (as it has a pole on the unit circle) and as a consequence the output is unstable (i.e., the marginal distribution does not converge to an invariant distribution). In contrast, the transfer function for Example 11.4 is stable (as it has all poles inside the unit circle) and as a consequence the output is stable (i.e., the marginal distribution converges to an invariant distribution).
Suppose a (complex-valued) Gaussian process \(\{X_n\}_{n ≥ 0}\) is passed through an LTI system \(H(z)\). For every \(ω ∈ Ω\), the output process is given by \[ Y_n(ω) = \sum_{k=-\infty}^∞ h_k X_{n-k}(ω) \] which is a sum of Gaussian random variables, and hence is Gaussian.
For simplicity, we express the sum from \(-\infty\) to \(\infty\), even though, for causal systems, the impulse response is zero for negative indices and we are assuming that the input is zero for negative indices.
Suppose \(\{X_n\}_{n \ge 0}\) is a stationary Gaussian process with constant mean \(\mu_X\) and covariance function \(R_X(n,m) = R_X(k)\), where \(k = n-m\). Then, the output process \(\{Y_n\}_{n \ge 0}\) is also stationary. We establish this by a sequence of steps.
The mean function of the output process is given by \[ μ_Y(n) = \EXP[Y_n] = \EXP\left[\sum_{k=-\infty}^∞ h_k X_{n-k}\right] = \sum_{k=-\infty}^∞ h_k \EXP[X_{n-k}] = \sum_{k=-\infty}^∞ h_k μ_X = μ_X \sum_{k=-\infty}^∞ h_k = μ_X H(1) \] which does not depend on \(n\). This makes intuitive sense: the average of the output is the average of the input times the DC-gain of the system.
The cross-covariance between the output and input is given as follows: \[\begin{align*} R_{YX}(n,m) &= \COV(Y_n, X_m) = \COV\left(\sum_{k=-\infty}^∞ h_{n-k} X_k, X_m\right) \\ &= \sum_{k=-\infty}^∞ h_{n-k} \COV(X_k, X_m) \\ &= \sum_{k=-\infty}^∞ h_{n-k} R_X(k-m) \\ &= (h * R_X)(n-m) \end{align*}\] which only depends on \(n-m\). Thus, the cross-covariance between the output and input is the convolution of the impulse response with the covariance function of the input.
Note that in the above calculations, we have used the following convolution identity: \[\begin{equation}\label{eq:convolution-identity} (f * g)(n-m) = \sum_{k=-\infty}^∞ f(n-k) g(k-m) \end{equation}\] which can be verified by change of variables in the standard definition of convolution.
The auto-covariance of the output is given by \[\begin{align*} R_Y(n,m) &= \COV(Y_n, Y_m) = \COV\left(Y_n, \sum_{k=-\infty}^∞ h_{m-k} X_k\right) \\ &= \sum_{k=-\infty}^∞ h_{m-k}^* \COV(Y_n, X_k) \\ &= \sum_{k=-\infty}^∞ h_{m-k}^* R_{YX}(n-k) \end{align*}\] The above expression looks like a convolution but is not. To convert it to a convolution, define a new function \(\tilde h_k = h^*_{-k}\). Then, \[ R_Y(n,m) = \sum_{k=-\infty}^\infty \tilde h_{k-m} R_{YX}(n-k) = (\tilde h * R_{YX})(n-m) \] due to \(\eqref{eq:convolution-identity}\), which is just a function of \(n-m\).
Thus, we have shown that \(\mu_Y(n)\) does not depend on \(n\) and \(R_Y(n,m)\) only depends on \(n-m\). Therefore, the output process is a stationary Gaussian process.
To summarize, when a stationary Gaussian process is passed through a stable LTI system, the output is also a stationary Gaussian process with \[ \mu_Y(n) = μ_X H(1), \quad R_{YX} = h * R_X, \quad R_Y = \tilde h * R_{YX} = \tilde h * h * R_X. \]
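These summary relations can be checked numerically. The sketch below passes a stationary AR(1) input (started from its invariant distribution) through a short, hypothetical FIR filter \(h\) and compares the empirical auto-covariance of the output with \((\tilde h * h * R_X)(k) = \sum_{i,j} h_i h_j R_X(k - i + j)\) (the filter is real, so \(\tilde h_k = h_{-k}\)); the parameter values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
a, sigma = 0.8, 1.0                          # assumed AR(1) parameters
h = np.array([1.0, 0.5, 0.25])               # a short hypothetical FIR impulse response

# Stationary AR(1) input, started from the invariant distribution N(0, sigma^2 / (1 - a^2))
N = 500_000
W = rng.normal(0.0, sigma, size=N - 1)
X = np.empty(N)
X[0] = rng.normal(0.0, sigma / np.sqrt(1 - a**2))
for t in range(N - 1):
    X[t + 1] = a * X[t] + W[t]

Y = np.convolve(X, h, mode="valid")          # Y_n = sum_k h_k X_{n-k}

def R_X(k):                                  # stationary AR(1) auto-covariance
    return sigma**2 * a**abs(k) / (1 - a**2)

def R_Y_theory(k):                           # (h~ * h * R_X)(k) = sum_{i,j} h_i h_j R_X(k - i + j)
    return sum(h[i] * h[j] * R_X(k - i + j) for i in range(len(h)) for j in range(len(h)))

for k in range(4):
    R_Y_emp = np.mean(Y[k:] * Y[:len(Y) - k])
    print(k, R_Y_emp, R_Y_theory(k))         # empirical vs. theoretical auto-covariance
```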
11.8 Power spectral density
For a stationary Gaussian process \(\{X_n\}_{n \ge 0}\), the power spectral density (PSD) \(S_X(e^{j \Omega})\) is defined as the discrete-time Fourier transform (DTFT) of the auto-covariance function: \[ R_X(n) \overset{\mathcal{F}}{\longleftrightarrow} S_X(e^{j \Omega}) \] where we are implicitly assuming that the covariance function is absolutely summable, so that the Fourier transform exists.
Since \(R_X(k) = R_X(-k)^*\), the PSD \(S_X(e^{j \Omega})\) is real-valued. For a real-valued process, \(R_X\) is real and even, so \(S_X\) is also even, i.e., \(S_X(e^{j \Omega}) = S_X(e^{-j \Omega})\).
Example 11.5 Compute the power spectral density of white noise.
NoteSolutionRecall that \(R_W(n) = \sigma^2 \delta_n\) and \(\delta_n \overset{\mathcal{F}}{\longleftrightarrow} 1\). Therefore, \[ S_W(e^{j \Omega}) = \sigma^2. \] This is the reason that the process is called white noise, because its spectrum is flat (in contrast to colored noise, where the spectrum is not flat).
Example 11.6 Compute the power spectral density of an AR(1) process.
NoteSolutionRecall that \(R_X(n) = \dfrac{σ^2}{1-a^2} a^{|n|}\) and \(a^{|n|} \overset{\mathcal{F}}{\longleftrightarrow} \dfrac{1-a^2}{\ABS{1 - a e^{j \Omega}}^2}\). Therefore, \[ S_X(e^{j \Omega}) = \frac{σ^2}{1-a^2} \frac{1-a^2}{\ABS{1 - a e^{j \Omega}}^2} = \frac{σ^2}{\ABS{1 - a e^{j \Omega}}^2}. \]
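The PSD formula can be verified by averaging periodograms over independent sample paths, since \(\EXP\big[|\hat X(\Omega)|^2/N\big] \to S_X(e^{j\Omega})\) for a stationary process. Below is a minimal sketch with arbitrary parameter values.

```python
import numpy as np

rng = np.random.default_rng(0)
a, sigma = 0.8, 1.0                         # assumed AR(1) parameters
N, trials = 1024, 2000

# Stationary AR(1) sample paths, started from the invariant distribution
X = np.empty((trials, N))
X[:, 0] = rng.normal(0.0, sigma / np.sqrt(1 - a**2), size=trials)
for t in range(N - 1):
    X[:, t + 1] = a * X[:, t] + rng.normal(0.0, sigma, size=trials)

# Averaged periodogram at Omega = 2 pi k / N approximates S_X(e^{j Omega})
S_emp = np.mean(np.abs(np.fft.fft(X, axis=1))**2, axis=0) / N
Omega = 2 * np.pi * np.fft.fftfreq(N)
S_theory = sigma**2 / np.abs(1 - a * np.exp(1j * Omega))**2
print(np.max(np.abs(S_emp - S_theory) / S_theory))   # small relative error; shrinks as `trials` grows
```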
For jointly Gaussian processes \(\{X_n\}_{n \ge 0}\) and \(\{Y_n\}_{n \ge 0}\), the cross-power spectral density \(S_{YX}(e^{j \Omega})\) is defined as the DTFT of the cross-covariance function: \[ R_{YX}(n) \overset{\mathcal{F}}{\longleftrightarrow} S_{YX}(e^{j \Omega}). \]
The reason for the terminology “power spectral density” is the following. The power of the process is given by \[ \frac{1}{N} \sum_{n=0}^{N-1} \VAR(X_n) = \VAR(X_0) = R_X(0) \] due to stationarity. From the formula for inverse DTFT, we have that \[ R_X(0) = \frac{1}{2\pi} \int_{-\pi}^{\pi} S_X(e^{j \Omega}) \, d\Omega. \] Thus, the power spectral density tells how the power of the process is distributed across different frequencies.
Since \(R_X(0) \ge 0\), the above relationship suggests that \(S_X(e^{j \Omega}) \ge 0\) for all \(\Omega \in [-\pi, \pi]\). This is a general property of power spectral densities and is a consequence of the Wiener-Khinchin theorem. See the discussion in Chapter 10.2 of Oppenheim and Verghese.
Observe that \(\tilde h_k = h_{-k}^*\). Therefore, \(\tilde H(e^{j \Omega}) = H(e^{j \Omega})^*\). Consequently, the fundamental relationships between input and output in frequency domain are given by \[\begin{equation}\label{eq:psd-relationships} S_{YX}(e^{j \Omega}) = H(e^{j \Omega}) S_X(e^{j \Omega}), \quad S_Y(e^{j \Omega}) = |H(e^{j \Omega})|^2 S_X(e^{j \Omega}). \end{equation}\]
We can re-derive the result of Example 11.6 using the above relationship. Recall that we showed that the AR(1) process is the output of an LTI system with transfer function \(H(z) = \frac{1}{z - a}\) when the input is white noise.
Note that \(H(e^{j \Omega}) = \frac{1}{e^{j \Omega} - a}\). To compute \(|H(e^{j \Omega})|^2\), we observe that \[ |e^{j \Omega} - a|^2 = |e^{j \Omega}(1 - a e^{-j \Omega})|^2 = |1 - a e^{-j \Omega}|^2 = |1 - a e^{j \Omega}|^2, \] where the last equality follows from the fact that \(|z| = |z^*|\) for any complex number \(z\). Therefore, \[ S_X(e^{j \Omega}) = |H(e^{j \Omega})|^2 S_W(e^{j \Omega}) = \frac{1}{|e^{j \Omega} - a|^2} \sigma^2 = \frac{1}{\ABS{1 - a e^{j \Omega}}^2} \sigma^2, \] which matches the result in Example 11.6.
Note that \(\eqref{eq:psd-relationships}\) gives a way to identify the impulse response of an LTI system. The simplest setting is when the input is standard white noise, in which case \(S_{YX}(e^{j \Omega}) = H(e^{j \Omega})\), or equivalently, \(R_{YX}(n) = h_n\). Therefore, we can identify the system by estimating the cross-covariance function \[ R_{YX}(n) \approx \frac 1N \sum_{k=0}^{N-1} Y_{k+n} X_{k}^* \] for some large \(N\); this estimate directly gives the impulse response, and taking its DTFT gives the frequency response \(H(e^{j \Omega})\).
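Here is a minimal sketch of this identification procedure with a hypothetical impulse response: the input is standard white noise, so the estimated cross-covariance \(\hat R_{YX}(n)\) should reproduce \(h_n\) directly.

```python
import numpy as np

rng = np.random.default_rng(0)
h_true = np.array([0.0, 1.0, 0.8, -0.3, 0.1])   # hypothetical impulse response to identify

N = 200_000
X = rng.normal(0.0, 1.0, size=N)                # standard white noise input
Y = np.convolve(X, h_true)[:N]                  # output of the "unknown" LTI system

# Estimate R_{YX}(n) = E[Y_{k+n} X_k]; with standard white noise input this equals h_n
lags = np.arange(len(h_true) + 2)
R_YX_hat = np.array([np.mean(Y[n:] * X[:N - n]) for n in lags])
print(np.round(R_YX_hat, 2))                    # ~ [0, 1, 0.8, -0.3, 0.1, 0, 0]
```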
The relationship \(\eqref{eq:psd-relationships}\) is also useful in filtering. An important application is a whitening filter, which is a filter that transforms a colored noise process into white noise.
Suppose \(\{X_n\}_{n \ge 0}\) is a zero mean colored noise process with auto-covariance function \(R_X(n)\). Let \(H(z)\) be a (possibly non-causal) transfer function such that \[ \ABS{H(e^{j \Omega})}^2 = \frac{1}{S_X(e^{j \Omega})}. \] Then, the output of the filter is a standard white noise process (i.e., white noise with unit variance).
As an illustration, consider a moving average process \(\{X_n\}_{n \ge 0}\) given by \[ X_{n} = W_n + \tfrac{1}{2} W_{n-1} \] where \(\{W_n\}_{n \ge 0}\) is white noise with variance \(\sigma^2\). The mean of the process is \[ μ_X(n) = 0. \] To compute the auto-covariance function, we first observe that for \(n = m\), \[ R_X(n,n) = \VAR(W_n + \tfrac{1}{2} W_{n-1}) = \sigma^2( 1 + \tfrac{1}{4}) = \tfrac{5}{4} \sigma^2. \] For \(m = n+1\), we have \[\begin{align*} R_X(n,n+1) &= \COV(W_n + \tfrac{1}{2} W_{n-1}, W_{n+1} + \tfrac{1}{2} W_{n}) \\ &= \COV(W_n, W_{n+1}) + \tfrac{1}{2} \COV(W_n, W_{n}) + \tfrac{1}{2} \COV(W_{n-1}, W_{n+1}) + \tfrac{1}{4} \COV(W_{n-1}, W_{n}) \\ &= \tfrac{1}{2} \sigma^2 \end{align*}\] as all terms other than \(\COV(W_n, W_n)\) are zero. Furthermore, for all \(m \ge n+2\), there is no overlap between the two terms and therefore \(R_X(n,m) = 0\).
Thus, the auto-covariance function depends only on the time difference \(|n-m|\) and may be simplified as \[ R_X(n) = \begin{cases} \tfrac{5}{4} \sigma^2, & n = 0 \\ \tfrac{1}{2} \sigma^2, & |n| = 1 \\ 0, & |n| > 1 \end{cases} \]
Therefore, the power spectral density is given by \[ S_X(e^{j \Omega}) = \sigma^2 \left( \frac{5}{4} + \frac{1}{2} e^{-j \Omega} + \frac{1}{2} e^{j \Omega} \right) \]
Now, suppose we want to “whiten” the process. Observe that \[ S_X(e^{j \Omega}) = \sigma^2 \left( \frac{5}{4} + \frac{1}{2} e^{-j \Omega} + \frac{1}{2} e^{j \Omega} \right) = \sigma^2 \left| 1 + \frac{1}{2} e^{-j \Omega} \right|^2 \] Therefore, if we pick \[H(e^{j \Omega}) = \frac{1}{\sigma} \frac{1}{1 + \frac{1}{2} e^{-j \Omega}}\] then the output of the filter is a standard white noise process.
We can “see” this by writing the input-output relationship of this filter in time domain. Note that the filter transfer function is given by \[ H(z) = \frac{1}{\sigma} \frac{1}{1 + \frac{1}{2} z^{-1}} = \frac{1}{\sigma} \frac{z}{z + \frac{1}{2}} \] For simplicity, let’s consider deterministic inputs and outputs. We know \[ Y(z) = X(z) H(z) \implies Y(z) \left( 1 + \frac{1}{2} z^{-1} \right) = \frac{1}{\sigma} X(z) \] which implies \[ y_n + \tfrac{1}{2} y_{n-1} = \frac{1}{\sigma} x_n. \]
The same relationship holds for the stochastic case, i.e., \[ Y_n = \frac{1}{\sigma} X_n - \frac{1}{2} Y_{n-1} \] We can observe recursively that (for simplicity take \(\sigma^2 = 1\)) \[\begin{align*} Y_0 &= X_0 = W_0 \\ Y_1 &= X_1 - \frac{1}{2} Y_0 = W_1 \\ Y_2 &= X_2 - \frac{1}{2} Y_1 = W_2 \\ Y_3 &= X_3 - \frac{1}{2} Y_2 = W_3 \\ &\vdots \\ \end{align*}\] Thus, the output is white noise.
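The whitening recursion can also be verified in code. The sketch below generates the moving-average process (with \(W_{-1} = 0\), matching the unrolling above) and applies \(Y_n = X_n/\sigma - \tfrac{1}{2} Y_{n-1}\); the output coincides with the driving white noise. The horizon is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, N = 1.0, 100_000

# Moving-average (colored) process: X_0 = W_0 and X_n = W_n + 0.5 W_{n-1} for n >= 1
W = rng.normal(0.0, sigma, size=N)
X = W.copy()
X[1:] += 0.5 * W[:-1]

# Whitening filter in the time domain: Y_n = X_n / sigma - 0.5 Y_{n-1}
Y = np.empty(N)
prev = 0.0
for n in range(N):
    Y[n] = X[n] / sigma - 0.5 * prev
    prev = Y[n]

print(np.max(np.abs(Y - W / sigma)))   # numerically zero: the filter recovers the white noise
```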
11.9 Note on non-Gaussian processes
Since all the properties discussed above (mean function, covariance function, linear transformations, etc.) depend only on the first two moments (mean and covariance), they apply to stochastic processes beyond Gaussian processes, as long as we restrict our attention to first and second moments.
However, it is important to distinguish between two notions of stationarity:
A stochastic process \(\{X_n\}_{n \ge 0}\) is called wide-sense stationary (or weakly stationary) if:
- The mean function is constant: \(μ_X(n) = μ_X\) for all \(n \ge 0\)
- The covariance function depends only on the time difference: \(R_X(m,n) = R_X(|n-m|)\) for some function \(R_X\)
A stochastic process \(\{X_n\}_{n \ge 0}\) is called strict-sense stationary (or strongly stationary) if for any finite collection of time indices \(\{n_1, \dots, n_k\}\) and any integer \(h \ge 0\), the joint distribution of \((X_{n_1}, \dots, X_{n_k})\) is the same as the joint distribution of \((X_{n_1+h}, \dots, X_{n_k+h})\).
For any stochastic process, strict-sense stationarity implies wide-sense stationarity. However, the converse is not true in general. For Gaussian processes, since they are completely determined by their mean and covariance, wide-sense stationarity implies strict-sense stationarity. Thus, for Gaussian processes, the two notions are equivalent.
11.10 Wiener filtering
Let \(\{X_n\}_{n \ge 0}\) and \(\{Y_n\}_{n \ge 0}\) be jointly Gaussian. Suppose we observe the \(\{Y_n\}_{n \ge 0}\) process and want to design a filter with input \(\{Y_n\}_{n \ge 0}\) and output \(\{ \hat X_n\}_{n \ge 0}\) such that \(\hat X_n\) is an MMSE estimator of \(X_n\).
Let \(\{h_n\}\) denote the impulse response of the filter. Ideally, the filter should be causal (i.e., \(h_n = 0\) for \(n < 0\)), but we start by considering a non-causal filter.
For \(\hat X_n\) to be an MMSE estimator of \(X_n\), the error \(X_n - \hat X_n\) must be orthogonal to \(\{Y_m\}_{m \ge 0}\), i.e., \[ \COV(X_n - \hat X_n, Y_m) = 0 \quad \text{for all } m \implies R_{XY}(n-m) = R_{\hat X Y}(n-m). \]
We also know that \(R_{\hat X Y} = h * R_Y\). Thus, \(R_{XY} = h * R_Y\) or in frequency domain: \[ S_{XY}(z) = H(z) S_Y(z) \implies H(z) = S_{XY}(z)/S_Y(z) \]
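A minimal frequency-domain sketch of the non-causal Wiener filter, under assumed parameters: the signal \(X\) is a stationary AR(1) process and the observation is \(Y_n = X_n + V_n\) with independent white noise \(V\), so \(S_{XY} = S_X\) and \(S_Y = S_X + \sigma_v^2\). The filter is evaluated on a frequency grid, converted to a two-sided impulse response by an inverse FFT, and applied by convolution.

```python
import numpy as np

rng = np.random.default_rng(0)
a, sigma_x, sigma_v = 0.9, 1.0, 1.0        # assumed signal and observation-noise parameters
N = 100_000

# Signal: stationary AR(1); observation: Y_n = X_n + V_n with independent white noise V
W = rng.normal(0.0, sigma_x, size=N - 1)
X = np.empty(N)
X[0] = rng.normal(0.0, sigma_x / np.sqrt(1 - a**2))
for t in range(N - 1):
    X[t + 1] = a * X[t] + W[t]
Y = X + rng.normal(0.0, sigma_v, size=N)

# Non-causal Wiener filter on a frequency grid: H = S_XY / S_Y = S_X / (S_X + sigma_v^2)
M = 1024                                    # number of frequency samples / filter taps
Omega = 2 * np.pi * np.fft.fftfreq(M)
S_X = sigma_x**2 / np.abs(1 - a * np.exp(1j * Omega))**2
H = S_X / (S_X + sigma_v**2)

# Two-sided impulse response via inverse FFT (zero lag moved to the centre), then filter
h = np.fft.fftshift(np.real(np.fft.ifft(H)))
X_hat = np.convolve(Y, h)[M // 2 : M // 2 + N]

print(np.mean((Y - X)**2), np.mean((X_hat - X)**2))   # the Wiener estimate has lower MSE
```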
Further Reading
Gubner, Chapter 10.
Grimmett and Stirzaker, Chapters 8 and 9.
Hajek, Random Processes for Engineers, Chapter 7 and Chapter 8. Most of the discussion there is for continuous time processes, but the ideas are the same.
Oppenheim and Verghese, Random Signals and Systems, Chapter 9 and 10.