Brief introduction to Singular Value Decomposition (SVD)

1. What is SVD (Singular Value Decomposition)?

1.1 Definition

Define $ \mathbf{X} = (\vec{x_1}, \vec{x_2}, \dots, \vec{x_m}) $ as an $n\times m$ matrix ($n \ge m$). A factorization of $\mathbf{X}$ is: $$ \mathbf{X} = \mathbf{ U }\mathbf{ \Sigma } \mathbf{ V }^\top $$ where:

  • $ \mathbf{U} $: $n \times n$ unitary matrix ($\mathbf{U}^* = \mathbf{U}^{-1}$)

  • $ \mathbf{\Sigma} $: $n \times m$ rectangular diagonal matrix with non-negative real numbers on the diagonal

$$ \mathbf{\Sigma} = \begin{bmatrix} \sigma_1 & & & \\ & \sigma_2 & & \\ & & \ddots & \\ & & & \sigma_m \\ 0 & 0 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & 0 \end{bmatrix}, \quad \sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_m \ge 0 $$

  • $ \mathbf{V} $: $m\times m$ unitary matrix
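
As a quick sanity check of these shapes (a minimal NumPy sketch; note that `np.linalg.svd` returns the singular values as a vector rather than as the rectangular matrix $\mathbf{\Sigma}$):

```python
import numpy as np

n, m = 6, 4                          # n >= m, as in the definition above
X = np.random.randn(n, m)

# full_matrices=True gives the full n x n U and m x m V^T
U, s, Vt = np.linalg.svd(X, full_matrices=True)
print(U.shape, s.shape, Vt.shape)    # (6, 6), (4,), (4, 4)

# Rebuild the n x m rectangular diagonal matrix Sigma
Sigma = np.zeros((n, m))
Sigma[:m, :m] = np.diag(s)

assert np.allclose(X, U @ Sigma @ Vt)   # X = U Sigma V^T
```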

1.2 Singular values and singular vectors

Singular values: the values $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_m$ on the diagonal of the matrix $\mathbf{\Sigma}$

Singular vector:

  • Left singular vectors: the columns of $ \mathbf{U} = (\vec{u_1}, \vec{u_2}, \dots , \vec{u_n}) $ are the left singular vectors of matrix $\mathbf{X}$
  • Right singular vectors: the columns of $ \mathbf{V} = (\vec{v_1}, \vec{v_2}, \dots , \vec{v_m}) $ are the right singular vectors of matrix $\mathbf{X}$

2. Matrix Approximation

Notice that there are only $m$ singular values, so $\mathbf{X}$ can be written as a sum of $m$ rank-one matrices: $$ \mathbf{X} = \sigma_1 u_1 v_1^\top + \sigma_2 u_2 v_2^\top + \dots + \sigma_m u_m v_m^\top \approx \sigma_1 u_1 v_1^\top + \dots + \sigma_r u_r v_r^\top = \mathbf{\tilde{U}}\mathbf{\tilde{\Sigma}}\mathbf{\tilde{V}}^\top $$ By the Eckart–Young theorem (1936), this truncation at $r$ terms is the best rank-$r$ approximation of $\mathbf{X}$ in the Frobenius norm.
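
A quick numerical check of this expansion (a minimal sketch, reusing `X`, `U`, `s`, `Vt`, `m` from the snippet in Section 1.1):

```python
# X equals the sum of all m rank-one matrices sigma_i * u_i v_i^T
X_full = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(m))
assert np.allclose(X, X_full)

# Truncating the sum after r terms gives the best rank-r approximation
r = 2
X_r = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(r))
print("rank-%d approximation error:" % r, np.linalg.norm(X - X_r))
```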

3. Dominant Correlations
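
The two singular-vector bases diagonalize the correlation matrices of $\mathbf{X}$: $$ \mathbf{X}^\top\mathbf{X} = \mathbf{V}\mathbf{\Sigma}^\top\mathbf{\Sigma}\mathbf{V}^\top, \qquad \mathbf{X}\mathbf{X}^\top = \mathbf{U}\mathbf{\Sigma}\mathbf{\Sigma}^\top\mathbf{U}^\top $$ so the columns of $\mathbf{V}$ are eigenvectors of $\mathbf{X}^\top\mathbf{X}$, the columns of $\mathbf{U}$ are eigenvectors of $\mathbf{X}\mathbf{X}^\top$, and the eigenvalues in both cases are $\sigma_i^2$. The leading singular vectors therefore capture the dominant correlations in the data.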

4. SVD and Truncation

In $\mathbf{X} = \mathbf{U}\mathbf{\Sigma}\mathbf{V}^\top$, we can keep only the first $r$ columns of $\mathbf{U}$ and $\mathbf{V}$ (and the top-left $r \times r$ block of $\mathbf{\Sigma}$): $$ \mathbf{X} \approx \mathbf{\tilde{U}}_r\mathbf{\tilde{\Sigma}}_r\mathbf{\tilde{V}}_r^\top $$ “How to choose $r$” is a fundamental problem in SVD. Generally we set a threshold for $\log \sigma$ and keep those $\sigma$s on the diagonal of $\mathbf{\Sigma}$ that lie above it, as sketched below.
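
A minimal sketch of this heuristic, reusing the arrays from Section 1.1 (the threshold value here is a hypothetical choice; in practice it is read off a plot of $\log \sigma_i$):

```python
# Keep only the singular values above a chosen threshold
# (hypothetical choice: 1% of the largest singular value)
threshold = 0.01 * s[0]
r = int(np.sum(s > threshold))

# Truncated SVD: first r columns of U and V, top-left r x r block of Sigma
X_approx = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]
```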

Recommended extra reading: Gavish & Donoho (2014), “The Optimal Hard Threshold for Singular Values is $4/\sqrt{3}$”

Applying Gaussian noise to $\mathbf{X}$:

$$ \mathbf{X} = \mathbf{X}_\text{true} + \delta \mathbf{X}_\text{noise} $$

PS: I did not manage to fully understand the idea here; more details about the truncation can be found in the video and the paper using the URL in this section
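
That said, the headline result is easy to state: for a square $n \times n$ matrix with Gaussian noise of known magnitude $\delta$, Gavish–Donoho show the optimal hard threshold for the singular values is $\tau = \frac{4}{\sqrt{3}}\sqrt{n}\,\delta$, i.e. keep exactly the $\sigma_i > \tau$. A minimal sketch of this square, known-noise case (the rectangular and unknown-noise cases need a correction factor given in the paper):

```python
import numpy as np

def optimal_rank_square(X_noisy, delta):
    """Hard-threshold the singular values of a square n x n matrix
    with Gaussian noise of known magnitude delta (Gavish-Donoho 2014)."""
    n = X_noisy.shape[0]
    s = np.linalg.svd(X_noisy, compute_uv=False)
    tau = (4 / np.sqrt(3)) * np.sqrt(n) * delta   # optimal hard threshold
    return int(np.sum(s > tau))

# Hypothetical usage: a rank-5 signal buried in noise
n, delta = 200, 0.1
X_true = np.random.randn(n, 5) @ np.random.randn(5, n)
X_noisy = X_true + delta * np.random.randn(n, n)
print(optimal_rank_square(X_noisy, delta))    # recovers r = 5
```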

5. Example: Image Compression

To better understand what SVD does in a practical task, we consider an image processing task. It is easy to represent an image as a matrix. After computing the SVD of this matrix, choosing different $r$ keeps different features of the image. When we choose to keep all the singular values, we recover the original image, as the sketch below illustrates.
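
A minimal sketch of this experiment (assuming a grayscale conversion; the file name is a placeholder):

```python
import numpy as np
import matplotlib.pyplot as plt

# Represent the image as a matrix (file name is a placeholder);
# average the color channels to get a single grayscale matrix
A = plt.imread("image.png")
if A.ndim == 3:
    A = A[..., :3].mean(axis=2)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Larger r keeps more features; r = min(n, m) reproduces the original
for r in (5, 20, 100):
    A_r = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]
    plt.imshow(A_r, cmap="gray")
    plt.title(f"r = {r}")
    plt.show()
```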

Videos: