ECSE 506: Stochastic Control and Decision Theory

Aditya Mahajan
Winter 2022

About  |  Lectures  |  Notes  |  Coursework

Singular value decomposition

Recall that if \(A\) is a symmetric \(n × n\) matrix, then \(A\) has real eigenvalues \(λ_1, \dots, λ_n\) (possibly repeated) and \(\reals^n\) has an orthonormal basis \(v_1, \dots, v_n\), where each \(v_i\) is an eigenvector of \(A\) with eigenvalue \(λ_i\). Then, \[ A = V D V^{-1} \] where \(V\) is the matrix whose columns are \(v_1, \dots, v_n\) and \(D\) is a diagonal matrix whose diagonals are \(λ_1, \dots, λ_n\). Since the vectors \(v_1, \dots, v_n\) are orthonormal, the matrix \(V\) is orthonormal, i.e., \(V^\TRANS V = I\), so we can alternatively write the above equation as \[ A = V D V^\TRANS.\]

The singular value decomposition (SVD) is a generalization of this where \(A\) is a \(n × d\) matrix, which does not have to be symmetric or even square.

1 Singular values

Let \(A\) be a \(n × d\) matrix. Consider the matrix \(A^\TRANS A\). This is a symmetric \(d × d\) matrix, so its eigenvalues are real. Let \(\{ λ_1, \dots, λ_d \}\) denote the eigenvalues of \(A^\TRANS A\), with repetitions. Order then so that \(λ_1 \ge λ_2 \ge \dots \ge λ_d \ge 0\). Let \(σ_i = \sqrt{λ_i}\), so that \(σ_1 \ge σ_2 \ge \dots σ_d \ge 0\). These numbers are called the singular values of \(A\).

1.1 Properties of singular values

  1. The number of non-zero singular values of \(A\) equals to the rank of \(A\). In particular, if \(A\) is \(n × d\) where \(n < d\), then \(A\) has at most \(n\) nonzero singular values.

  2. It can be shown that

    \[ σ_1 = \max_{\|x\| = 1} \| A x \| . \]

    Let \(v_1\) denote the arg-max of the above optimization. \(v_1\) is called the first singular vector of \(A\). Then,

    \[ σ_2 = \max_{ x \perp v_1, \|x \| = 1} \| A x\|. \]

    Let \(v_2\) denote the arg-max of the above optimization. \(v_2\) is called the second singular vector of \(A\), and so on.

  3. Let \(A\) be a \(n × d\) matrix and \(v_1, \dots, v_r\) be the singular vectors, where \(r = \text{rank}(A)\). Then for any \(k \in \{1, \dots, r\}\), let \(V_k\) be the subspace spanned by \(\{v_1, \dots, v_k\}\). Then, \(V_k\) is the best \(k\)-dimensional subspace for \(A\).

  4. For any matrix \(A\), \[ \sum_{i =1}^r σ_i^2(A) = \| A \|_{F}^2 := \sum_{j,k} a_{jk}^2. \]

  5. Any vector \(v\) can be written as a linear combination of \(v_1, \dots, v_r\) and a vector perpendicular to \(V_r\) (defined above). Now, \(Av\) can be written as the same linear combination of \(Av_1, Av_2, \dots, Av_r\). So, \(Av_1, \dots, Av_r\) form a fundamental set of vectors associated with \(A\). We normalize them to length one by \[ u_i = \frac{1}{σ_i(A)} A v_i. \] The vectors \(u_1, \dots, u_r\) are called the left singular vectors of \(A\). The \(v_i\) are called the right singular vectors.

  6. Both the left and the right singular vectors are orthogonal.

Singular value decomposition. For any matrix \(A\), \[ A = \sum_{i=1}^r σ_i u_i v_i^\TRANS \] where \(u_i\) and \(v_i\) are the left and right singular vectors, and \(σ_i\) are the singular values.

In matrix notation: \[ A = U D V^\TRANS \] where the columns of \(U\) and \(V\) consist of the left and right singular vectors, respectively, and \(D\) is a diagonal matrix whose diagonal entries are the singular values of \(A\).

If \(A\) is a positive definite square matrix, then the SVD and the eigen-decomposition coincide.

2 Best rank-\(k\) approximations

There are two important matrix norms, the Frobenius norm which is defined as \[ \| A \|_{F} = \sqrt{ \sum_{i,j} a_{ij}^2 } \] and the induced norm which is defined as \[ \| A \|_2 = \max_{\|x \| = 1} \| A x \|. \]

Note that the Frobenius norm is equal to the square root of the sum of squares of the singular values and the \(2\)-norm is the largest singular value.

Let \(A\) be an \(n × d\) matrix and think of \(A\) as the \(n\) points in \(d\)-dimensional space. The Frobenius norm of \(A\) is the square root of the sum of squared distance of the points to the origin. The induced norm is the square root of the sum of squared distances to the origin along the direction that maximizes this quantity.

Let \[ A = \sum_{i = 1}^r σ_i u_i v_i^\TRANS \] be the SVD of \(A\). For \(k \in \{1, \dots, r\}\), let \[ A_k = \sum_{i=1}^k σ_i u_i v_i^\TRANS \] be the sum truncated after \(k\) terms.

Proposition. \(A_k\) is the best rank \(k\) approximation to \(A\), when the error is measured in either the induced norm or the Frobenius norm.

This result is established by showing the following properties:

  1. The rows of \(A_k\) are the projections of the rows of \(A\) onto the subspace \(V_k\) spanned by the first \(k\) right singular vectors of \(A\).

  2. For any matrix \(B\) of rank at most \(k\) \[ \| A - A_k \|_{F} \le \|A - B \|_{F}. \]

  3. \(\| A - A_k\|_2^2 = σ_{k+1}^2.\)

  4. For any matrix \(B\) of rank at most \(k\) \[ \| A - A_k \|_{2} \le \|A - B \|_{2}. \]


Nice reference for an intuitive explanation of SVD:

This entry was last updated on 24 Aug 2020