This is a technique commonly used in machine learning to reduce data complexity. Based on the concept of projections, the goal is to reduce the total number of variables used in the analysis.
- Leads to smaller datasets while minimizing information loss.
- Makes the data easier to visualize.
The main idea is to project the data onto the directions with the most variance. Combining the concepts of eigenvalues, eigenvectors, projections, and the covariance matrix, it is possible to find these directions. The covariance matrix characterizes the spread of the data; its eigenvectors give the directions along which the matrix acts as a pure stretching, and the largest eigenvalue tells in which direction that stretching is greatest.
Note
The larger the eigenvalue, the larger the variance of the data when projected onto the corresponding eigenvector.
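This relationship can be checked numerically. The sketch below (names and the synthetic data are illustrative) verifies that the variance of the data projected onto an eigenvector of the covariance matrix equals the corresponding eigenvalue:

```python
import numpy as np

rng = np.random.default_rng(0)
# Correlated 2-D synthetic data (illustrative)
X = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])

Xc = X - X.mean(axis=0)               # center the data
C = np.cov(Xc, rowvar=False)          # covariance matrix (ddof=1)
eigvals, eigvecs = np.linalg.eigh(C)  # eigenvalues in ascending order
v = eigvecs[:, -1]                    # eigenvector of the largest eigenvalue

proj = Xc @ v                         # 1-D projection onto that direction
print(np.isclose(proj.var(ddof=1), eigvals[-1]))  # variance equals eigenvalue
```

The equality follows because the projected variance is $v^\top C v = \lambda v^\top v = \lambda$ for a unit eigenvector $v$.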
Mathematical Formulation
- Given a dataset matrix $X \in \mathbb{R}^{n \times p}$, where $n$ is the number of observations and $p$ is the number of variables (features).
- Center the data, calculating $X_c = X - \bar{x}$, where $\bar{x}$ is the vector of column means.
- Calculate the covariance matrix $C = \frac{1}{n-1} X_c^\top X_c$.
- Calculate the eigenvalues and eigenvectors of $C$ and sort them from largest to smallest eigenvalue.
- Create the projection matrix $W \in \mathbb{R}^{p \times k}$, whose columns are the $k$ eigenvectors with the largest eigenvalues.
- Project the centered data: $Z = X_c W$.
- Reconstruct the information: $\hat{X} = Z W^\top + \bar{x}$.
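The steps above can be sketched directly in NumPy. This is a minimal illustration, not a production implementation; the function and variable names are assumptions of this sketch:

```python
import numpy as np

def pca(X, k):
    """Center, compute covariance, eigendecompose, project, reconstruct."""
    x_bar = X.mean(axis=0)
    Xc = X - x_bar                          # center the data
    C = (Xc.T @ Xc) / (X.shape[0] - 1)      # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)    # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]       # sort largest -> smallest
    W = eigvecs[:, order[:k]]               # projection matrix (p x k)
    Z = Xc @ W                              # project centered data
    X_hat = Z @ W.T + x_bar                 # reconstruct in the original space
    return Z, X_hat

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 3))  # illustrative data
Z, X_hat = pca(X, k=3)                      # keep all components
print(np.allclose(X, X_hat))                # reconstruction is lossless when k = p
```

Keeping all $p$ components makes the reconstruction exact; choosing $k < p$ trades some reconstruction error for a smaller representation, in line with the information-loss remark above.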