UNDERSTANDING PRINCIPAL COMPONENT ANALYSIS (PCA): A LINEAR ALGEBRA APPROACH TO DIMENSIONALITY REDUCTION

Sohail Ahmed Memon; Imtiaz Ahmed; Shoaibullah

Keywords:

Principal Component Analysis (PCA), Dimensionality Reduction, Multivariate Data Analysis, Machine Learning Pre-processing, Classification, Multi-class Classification

Abstract

Principal Component Analysis (PCA) is one of the most widely used techniques for dimensionality reduction in data analysis and machine learning. This work offers a mathematically grounded introduction to PCA, interpreted through the lens of linear algebra. We begin with the motivation for dimensionality reduction and introduce the foundational concepts: vectors, matrices, covariance, and eigendecomposition. We then present the PCA algorithm step by step, from centring the data to projecting it onto a lower-dimensional subspace. An example based on the two-dimensional Seeds dataset illustrates the entire process, supported by a machine learning model (a LightGBM classifier) and visualizations of each stage. For this model, we achieved an improved classification score (ROC AUC) after applying PCA, and we discuss how classification performance compares with and without it. We further explore practical applications of PCA in image compression, noise reduction, and machine learning. Finally, we discuss the strengths and limitations of PCA, highlighting when it is appropriate and when more complex techniques may be needed. This work is intended for machine learning practitioners with a basic understanding of linear algebra and programming.
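
The steps summarized in the abstract (centring, covariance, eigendecomposition, projection) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name pca_project is hypothetical, and the input is synthetic random data rather than the Seeds dataset.

```python
import numpy as np


def pca_project(X, n_components):
    """Project the rows of X onto the top principal components.

    Hypothetical helper for illustration only.
    """
    # Step 1: centre each feature (column) at zero mean.
    X_centred = X - X.mean(axis=0)
    # Step 2: sample covariance matrix of the features.
    cov = np.cov(X_centred, rowvar=False)
    # Step 3: eigendecomposition. eigh is meant for symmetric matrices
    # and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Step 4: keep the eigenvectors with the largest eigenvalues.
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]
    # Fraction of total variance captured by each kept component.
    explained = eigvals[order] / eigvals.sum()
    # Step 5: project the centred data onto the chosen subspace.
    return X_centred @ components, components, explained


# Synthetic data (an assumption, not the Seeds dataset): 200 samples, 3 features.
rng = np.random.default_rng(0)
mixing = np.array([[3.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.5, 0.2, 0.1]])
X = rng.normal(size=(200, 3)) @ mixing

Z, W, ratio = pca_project(X, n_components=2)
print(Z.shape)  # shape of the reduced data
print(ratio)    # explained variance ratio per component
```

The sketch uses np.linalg.eigh rather than np.linalg.eig because the covariance matrix is symmetric, so the eigenvalues are real and the eigenvectors are orthonormal. In practice, a library routine such as scikit-learn's PCA class would normally be preferred.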