  1. What’s there?
    1. Why do I write this?
  2. Eigen decomposition
    1. Wait…
  3. Basics!
    1. Linear Transformation
    2. Eigenvectors and eigenvalues
    3. Important matrix types
  4. Eigendecomposition
    1. Basics
    2. When can you eigendecompose?
    3. Ok, why can you eigendecompose?
    4. Eigendecomposition is powerful!
    5. The Spectral Theorem for Symmetric Matrices
    6. Cool, ok, but where do I use it?
  5. Sources

What’s there?

There are basically 5 (ಐದು / पाँच) types of matrix decompositions that are like super important. I strive to go in depth into all of them. These are

  1. Eigendecomposition (This post!)
  2. QR decomposition
  3. Cholesky decomposition
  4. LU decomposition
  5. (perhaps the most important) Singular Value Decomposition

Edit (July 20, 2025): Added proof of spectral theorem.

Why do I write this?

I wanted to create an authoritative document I could refer to for later. Thought that I might as well make it public (build in public, serve society and that sort of stuff).

Eigen decomposition

Wait…

Basics!

Linear Transformation

A linear transformation is a function \Phi: V \to W between two vector spaces V and W that preserves the operations of vector addition and scalar multiplication. For finite-dimensional vector spaces, any linear transformation \Phi: \mathbb{R}^n \to \mathbb{R}^m can be represented by an m \times n matrix \mathbf{A}. The action of the transformation on a vector \mathbf{x} \in \mathbb{R}^n is given by matrix-vector multiplication: \Phi(\mathbf{x}) = \mathbf{A}\mathbf{x}.

Ok, got it? Basic linear algebra.
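To make this concrete, here's a minimal NumPy sketch (the matrix and vectors below are arbitrary examples, not anything from the text) showing that a matrix both represents the map and respects linearity:

```python
import numpy as np

# A linear map Phi: R^3 -> R^2, represented by a 2x3 matrix A
A = np.array([[1.0, 2.0, 0.0],
              [0.0, -1.0, 3.0]])

x = np.array([1.0, 0.5, -2.0])
y = np.array([2.0, 1.0, 1.0])
a, b = 3.0, -0.5

# Linearity: Phi(a*x + b*y) == a*Phi(x) + b*Phi(y)
lhs = A @ (a * x + b * y)
rhs = a * (A @ x) + b * (A @ y)
print(np.allclose(lhs, rhs))  # True
```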

Eigenvectors and eigenvalues

Let \mathbf{A} \in \mathbb{R}^{n \times n} be a square matrix. A nonzero vector \mathbf{x} \in \mathbb{R}^n \setminus \{\mathbf{0}\} is an eigenvector of \mathbf{A} if there exists a scalar \lambda \in \mathbb{C} such that \mathbf{A}\mathbf{x} = \lambda\mathbf{x}. The scalar \lambda is called the eigenvalue corresponding to the eigenvector \mathbf{x}.

I will now add a bit of geometry. The equation states that the vector \mathbf{A}\mathbf{x} (the result of the transformation) is collinear (fancy term for along the same line) with the original vector \mathbf{x}. The eigenvalue \lambda is the scaling factor.

  • If \lambda > 1, the eigenvector is stretched.
  • If 0 < \lambda < 1, the eigenvector is compressed.
  • If \lambda < 0, the eigenvector is flipped and scaled.
  • If \lambda = 1, the eigenvector is unchanged.
  • If \lambda = 0, the eigenvector is mapped to the zero vector, meaning it lies in the null space of \mathbf{A}.

Want a reminder on how to find these? The expression p_{\mathbf{A}}(\lambda) = \det(\mathbf{A} - \lambda\mathbf{I}) is a polynomial in \lambda of degree n, called the characteristic polynomial. The roots of this polynomial are the eigenvalues of \mathbf{A}. By the fundamental theorem of algebra, a polynomial of degree n has n roots (counting multiplicity) in the complex plane \mathbb{C}.
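As a quick sanity check (the 2×2 matrix is an arbitrary example), the characteristic-polynomial route and NumPy's eigenvalue routine agree:

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

# np.poly on a square matrix returns the coefficients of its characteristic
# polynomial, highest degree first: here lambda^2 - 7*lambda + 10
coeffs = np.poly(A)
roots = np.roots(coeffs)            # roots of p_A(lambda)

eigvals = np.linalg.eigvals(A)      # eigenvalues computed directly
print(np.sort(roots), np.sort(eigvals))  # both: [2. 5.]
```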

Important matrix types

Ok, this is purely LLM generated.

  • Diagonal Matrices (\mathbf{D}): Square matrices where all non-diagonal entries are zero (D_{ij} = 0 for i \neq j). They represent pure scaling along the coordinate axes. Their powers (\mathbf{D}^k) and inverses (\mathbf{D}^{-1}) are trivial to compute.
  • Triangular Matrices (\mathbf{L}, \mathbf{U}): Square matrices that are either lower triangular (\mathbf{L}, where L_{ij} = 0 for i < j) or upper triangular (\mathbf{U}, where U_{ij} = 0 for i > j). Systems of equations involving triangular matrices can be solved efficiently via forward or backward substitution.
  • Orthogonal Matrices (\mathbf{Q}): Square matrices whose columns (and rows) form an orthonormal basis. They satisfy the property \mathbf{Q}^\top\mathbf{Q} = \mathbf{Q}\mathbf{Q}^\top = \mathbf{I}, which implies \mathbf{Q}^{-1} = \mathbf{Q}^\top. Geometrically, orthogonal matrices represent pure rotations or reflections, which preserve lengths and angles of vectors: \|\mathbf{Q}\mathbf{x}\|_2 = \|\mathbf{x}\|_2.

I wrote this (!!) –

  • Also, Diagonalizability. A square matrix \mathbf{A} \in \mathbb{R}^{n \times n} is said to be diagonalizable if it has n linearly independent eigenvectors. Such matrices are also known as non-defective matrices.
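To make the diagonalizability condition concrete, here's a rough numerical check (the helper name and example matrices are mine; rank-based tests like this can be fragile for nearly defective matrices):

```python
import numpy as np

def seems_diagonalizable(A, tol=1e-10):
    """Heuristic: does A appear to have n linearly independent eigenvectors?"""
    _, P = np.linalg.eig(A)                      # columns of P are eigenvectors
    return np.linalg.matrix_rank(P, tol=tol) == A.shape[0]

print(seems_diagonalizable(np.array([[2.0, 0.0], [0.0, 3.0]])))  # True  (diagonal)
print(seems_diagonalizable(np.array([[1.0, 1.0], [0.0, 1.0]])))  # False (defective Jordan block)
```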

Eigendecomposition

Basics

Ok, we can start now. For eigen decompositions, we are super concerned with endomorphisms. This is when the domain and the codomain of the linear transformation are the same space. This is basically – \Phi: \mathbb{R}^n \to \mathbb{R}^n, represented by a square matrix \mathbf{A} \in \mathbb{R}^{n \times n}.

Why do you need to know this? You don’t. I knew this and barfed the information on you.

When can you eigendecompose?

If a matrix \mathbf{A} \in \mathbb{R}^{n \times n} is diagonalizable, it can be factored as \mathbf{A} = \mathbf{P}\mathbf{D}\mathbf{P}^{-1}, where:

  • \mathbf{P} \in \mathbb{R}^{n \times n} is a matrix whose columns are the n linearly independent eigenvectors of \mathbf{A}, i.e., \mathbf{P} = [\mathbf{p}_1, \mathbf{p}_2, \dots, \mathbf{p}_n].
  • \mathbf{D} \in \mathbb{R}^{n \times n} is a diagonal matrix whose diagonal entries are the corresponding eigenvalues, D_{ii} = \lambda_i.
  • \mathbf{P}^{-1} is the inverse of the matrix \mathbf{P}. The existence of \mathbf{P}^{-1} is guaranteed by the linear independence of the eigenvectors.
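A minimal NumPy sanity check of this factorization (the matrix is an arbitrary diagonalizable example):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

eigvals, P = np.linalg.eig(A)        # columns of P are the eigenvectors
D = np.diag(eigvals)                 # D_{ii} = lambda_i

A_rebuilt = P @ D @ np.linalg.inv(P)
print(np.allclose(A, A_rebuilt))     # True: A = P D P^{-1}
```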

Ok, why can you eigendecompose?

Let \mathbf{A} have n linearly independent eigenvectors \mathbf{p}_1, \dots, \mathbf{p}_n with corresponding eigenvalues \lambda_1, \dots, \lambda_n. We can write the n eigenvalue equations as:

\mathbf{A}\mathbf{p}_i = \lambda_i\mathbf{p}_i, \quad i=1, \dots, n

We can concatenate these equations into a single matrix equation. Let \mathbf{P} = [\mathbf{p}_1, \dots, \mathbf{p}_n]. Then:

\mathbf{A}\mathbf{P} = \mathbf{A}[\mathbf{p}_1, \dots, \mathbf{p}_n] = [\mathbf{A}\mathbf{p}_1, \dots, \mathbf{A}\mathbf{p}_n] = [\lambda_1\mathbf{p}_1, \dots, \lambda_n\mathbf{p}_n]

The right-hand side can be expressed as the product of \mathbf{P} and a diagonal matrix \mathbf{D}:

[\lambda_1\mathbf{p}_1, \dots, \lambda_n\mathbf{p}_n] = [\mathbf{p}_1, \dots, \mathbf{p}_n] \begin{bmatrix} \lambda_1 & 0 & \dots & 0 \\ 0 & \lambda_2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \lambda_n \end{bmatrix} = \mathbf{P}\mathbf{D}

Thus, we have the relationship \mathbf{A}\mathbf{P} = \mathbf{P}\mathbf{D}. Since the eigenvectors are linearly independent, the matrix \mathbf{P} is invertible. We can right-multiply by \mathbf{P}^{-1} to obtain the eigendecomposition:

\mathbf{A} = \mathbf{P}\mathbf{D}\mathbf{P}^{-1}

Side Note: Notice how \mathbf{P} \in \mathbb{R}^{n \times n} need not be orthogonal. It can be, sure. Your call. Why doesn't the scaling of the eigenvectors matter? Because any scaling of the columns of \mathbf{P} is undone by \mathbf{P}^{-1}. Trivial enough (math textbook, much?).

Eigendecomposition is powerful!

Eigendecomposition reveals fundamental properties of a matrix:

  • Determinant: The determinant of \mathbf{A} is the product of its eigenvalues.

\det(\mathbf{A}) = \det(\mathbf{P}\mathbf{D}\mathbf{P}^{-1}) = \det(\mathbf{P})\det(\mathbf{D})\det(\mathbf{P}^{-1}) = \det(\mathbf{D}) = \prod_{i=1}^n \lambda_i

  • Trace: The trace of \mathbf{A} is the sum of its eigenvalues.

\text{tr}(\mathbf{A}) = \text{tr}(\mathbf{P}\mathbf{D}\mathbf{P}^{-1}) = \text{tr}(\mathbf{D}\mathbf{P}^{-1}\mathbf{P}) = \text{tr}(\mathbf{D}) = \sum_{i=1}^n \lambda_i

  • Matrix Powers: Eigendecomposition simplifies the computation of matrix powers.

\mathbf{A}^k = (\mathbf{P}\mathbf{D}\mathbf{P}^{-1})^k = \mathbf{P}\mathbf{D}^k\mathbf{P}^{-1}
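All three identities are easy to verify numerically; a sketch with an arbitrary diagonalizable matrix:

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
eigvals, P = np.linalg.eig(A)
D = np.diag(eigvals)

print(np.isclose(np.linalg.det(A), np.prod(eigvals)))   # det(A) = product of eigenvalues
print(np.isclose(np.trace(A), np.sum(eigvals)))         # tr(A)  = sum of eigenvalues

# A^5 via the decomposition vs. repeated multiplication
A5 = P @ np.linalg.matrix_power(D, 5) @ np.linalg.inv(P)
print(np.allclose(A5, np.linalg.matrix_power(A, 5)))    # True
```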

The Spectral Theorem for Symmetric Matrices

Ok, here’s a special case – what if the matrix is symmetric? (Why do we care about symmetric matrices? Remind yourself about covariance matrices.)

Theorem (Spectral Theorem): For any real symmetric matrix \mathbf{A} \in \mathbb{R}^{n \times n}:

  1. All eigenvalues of \mathbf{A} are real numbers.
  2. There exists an orthonormal basis of \mathbb{R}^n consisting of the eigenvectors of \mathbf{A}. That is, the eigenvectors can be chosen to be mutually orthogonal and have unit norm.

Proof 1: Eigenvalues are Real

Let \lambda be an eigenvalue of \mathbf{A} with a corresponding eigenvector \mathbf{x}. These may be complex, so we start with the standard eigenvalue equation:

\mathbf{A}\mathbf{x} = \lambda\mathbf{x}

Now, take the conjugate transpose (denoted by a star, *) of both sides:

(\mathbf{A}\mathbf{x})^* = (\lambda\mathbf{x})^*

\mathbf{x}^*\mathbf{A}^* = \bar{\lambda}\mathbf{x}^*

Since the matrix \mathbf{A} is real and symmetric, its conjugate transpose is itself (\mathbf{A}^* = \mathbf{A}^\top = \mathbf{A}). This simplifies the equation to:

\mathbf{x}^{*}\mathbf{A} = \bar{\lambda}\mathbf{x}^{*}

Next, we right-multiply this equation by the original eigenvector \mathbf{x}:

\mathbf{x}^*\mathbf{A}\mathbf{x} = \bar{\lambda}\mathbf{x}^*\mathbf{x} \quad \cdots \text{(Eq. 1)}

Now, let’s return to the original equation, \mathbf{A}\mathbf{x} = \lambda\mathbf{x}, and left-multiply it by \mathbf{x}^*:

\mathbf{x}^*\mathbf{A}\mathbf{x} = \mathbf{x}^*(\lambda\mathbf{x}) = \lambda\mathbf{x}^*\mathbf{x} \quad \cdots \text{(Eq. 2)}

By comparing Eq. 1 and Eq. 2, we can see that:

\lambda\mathbf{x}^*\mathbf{x} = \bar{\lambda}\mathbf{x}^*\mathbf{x}

(\lambda - \bar{\lambda})\mathbf{x}^*\mathbf{x} = 0

The term \mathbf{x}^*\mathbf{x} is the squared magnitude of the vector \mathbf{x}. Since an eigenvector cannot be a zero vector, \mathbf{x}^*\mathbf{x} must be a positive real number. Therefore, we can safely divide by it, leaving:

\lambda - \bar{\lambda} = 0 \quad \implies \quad \lambda = \bar{\lambda}

A number is equal to its own complex conjugate if and only if it is a real number. This completes the proof that all eigenvalues of a real symmetric matrix are real.
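A quick numerical illustration (the symmetric matrix is an arbitrary example): NumPy's general eigenvalue routine returns purely real eigenvalues for a symmetric input, and np.linalg.eigvalsh exploits the symmetry directly.

```python
import numpy as np

S = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

vals = np.linalg.eigvals(S)             # general routine (complex-capable)
print(np.allclose(np.imag(vals), 0.0))  # True: no imaginary parts
print(np.sort(np.linalg.eigvalsh(S)))   # symmetric-aware routine: [1. 2. 4.]
```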

Proof 2: Eigenvectors of Distinct Eigenvalues are Orthogonal

Let \lambda_1 and \lambda_2 be two distinct eigenvalues (\lambda_1 \neq \lambda_2) of \mathbf{A}, with corresponding eigenvectors \mathbf{x}_1 and \mathbf{x}_2. We have:

  1. \mathbf{A}\mathbf{x}_1 = \lambda_1\mathbf{x}_1
  2. \mathbf{A}\mathbf{x}_2 = \lambda_2\mathbf{x}_2

Let’s start with the first equation and left-multiply by \mathbf{x}_2^\top:

\mathbf{x}_2^\top\mathbf{A}\mathbf{x}_1 = \mathbf{x}_2^\top(\lambda_1\mathbf{x}_1) = \lambda_1\mathbf{x}_2^\top\mathbf{x}_1

Now, let’s take the transpose of the entire expression. Since the result is a scalar, it is equal to its own transpose:

(\mathbf{x}_2^\top\mathbf{A}\mathbf{x}_1)^\top = (\lambda_1\mathbf{x}_2^\top\mathbf{x}_1)^\top

\mathbf{x}_1^\top\mathbf{A}^\top\mathbf{x}_2 = \lambda_1\mathbf{x}_1^\top\mathbf{x}_2

Because \mathbf{A} is symmetric (\mathbf{A}^\top = \mathbf{A}), this becomes:

\mathbf{x}_1^\top\mathbf{A}\mathbf{x}_2 = \lambda_1\mathbf{x}_1^\top\mathbf{x}_2 \quad \cdots \text{(Eq. 3)}

Next, we return to the second original equation (\mathbf{A}\mathbf{x}_2 = \lambda_2\mathbf{x}_2) and left-multiply by \mathbf{x}_1^\top:

\mathbf{x}_1^\top\mathbf{A}\mathbf{x}_2 = \mathbf{x}_1^\top(\lambda_2\mathbf{x}_2) = \lambda_2\mathbf{x}_1^\top\mathbf{x}_2 \quad \cdots \text{(Eq. 4)}

By comparing Eq. 3 and Eq. 4, we get:

\lambda_1\mathbf{x}_1^\top\mathbf{x}_2 = \lambda_2\mathbf{x}_1^\top\mathbf{x}_2

(\lambda_1 - \lambda_2)\mathbf{x}_1^\top\mathbf{x}_2 = 0

We started with the assumption that the eigenvalues are distinct, so \lambda_1 - \lambda_2 \neq 0. For the product to be zero, the other term must be zero:

\mathbf{x}_1^\top\mathbf{x}_2 = 0

This shows that the eigenvectors \mathbf{x}_1 and \mathbf{x}_2 are orthogonal.

Yeah, that’s it.
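The orthogonality claim checks out numerically too (same arbitrary symmetric matrix as above): np.linalg.eigh returns an orthonormal set of eigenvectors for a symmetric input.

```python
import numpy as np

S = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

vals, P = np.linalg.eigh(S)              # eigenvalues here are distinct: 1, 2, 4
print(np.allclose(P.T @ P, np.eye(3)))   # True: eigenvectors are orthonormal
```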

Cool, now think about eigendecomposition. Normalize the eigenvectors (and, for repeated eigenvalues, pick an orthonormal basis within each eigenspace), and \mathbf{P} is suddenly an orthogonal matrix. Thus, for any real symmetric matrix \mathbf{A}, its eigendecomposition is:

\mathbf{A} = \mathbf{P}\mathbf{D}\mathbf{P}^\top

This form has a powerful geometric interpretation: any transformation by a symmetric matrix can be understood as a sequence of three operations:

  • A rotation back to the original coordinate system (\mathbf{P}).
  • A rotation or reflection (\mathbf{P}^\top) to align the coordinate system with the eigenvectors.
  • A scaling along these new axes by the eigenvalues (multiplication by \mathbf{D}).

“Huh?”, you ask. Yeah, the above order is wrong.

When you apply the transformation \mathbf{A} to a vector \mathbf{x}, you calculate \mathbf{A}\mathbf{x} = \mathbf{P}\mathbf{D}\mathbf{P}^\top\mathbf{x}. The correct sequence of geometric operations is:

  1. Rotation/Reflection (\mathbf{P}^\top): Aligns the coordinate system with the eigenvectors. This is essentially a change of basis.
  2. Scaling (\mathbf{D}): Scales the vector along these new axes.
  3. Rotation Back (\mathbf{P}): Returns to the original coordinate system.
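Here's a small sketch of that pipeline (arbitrary symmetric matrix and vector), confirming that rotate, scale, rotate back reproduces \mathbf{A}\mathbf{x}:

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0]])
vals, P = np.linalg.eigh(A)   # orthonormal eigenvectors, real eigenvalues
D = np.diag(vals)

x = np.array([1.0, 2.0])

step1 = P.T @ x               # 1. rotate/reflect into the eigenvector basis
step2 = D @ step1             # 2. scale along the eigen-axes
step3 = P @ step2             # 3. rotate back to the original coordinates

print(np.allclose(step3, A @ x))  # True
```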

Cool, ok, but where do I use it?

I point you back to the “Eigendecomposition is powerful!” subsection.

Sources

  1. Mathematics for Machine Learning (link)
