
Efficient tools for principal component analysis of complex data— a tutorial


Abstract: Basic tools for exploring and interpreting Principal Component Analysis (PCA) results are well known and thoroughly described in many comprehensive tutorials. However, in the past decade, several new tools have been developed. Some of them were originally created for solving authentication and classification tasks. In this paper we demonstrate that they can also be useful for exploratory data analysis. We discuss several important aspects of PCA exploration of high-dimensional datasets, such as estimating the proper complexity of a PCA model, dependence on the data structure, and the presence of outliers. We introduce new tools for assessing PCA model complexity, such as plots of the degrees of freedom developed for the orthogonal and score distances, as well as the Extreme and Distance plots, which offer a new look at the features of the training and test (new) data. These tools are simple and fast to compute. In some cases, they are more efficient than the conventional PCA tools. A simulated example provides an intuitive illustration of their application. Three real-world examples from various fields demonstrate the capabilities of the new tools and the ways they can be used. The first example considers the reproducibility of a handheld spectrometer using a dataset that is presented for the first time. The other two datasets, which describe the authentication of olives in brine and the classification of wines by their geographical origin, are already known and are often used for illustrative purposes. The paper is written in the form of a tutorial; however, we do not cover well-known topics, such as the algorithms for PCA decomposition or the interpretation of scores and loadings. Instead, we focus primarily on more advanced topics, such as the exploration of data homogeneity and the understanding and evaluation of optimal model complexity. The tutorial is accompanied by links to free software that implements the tools.
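
The score and orthogonal distances mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' software: the function names `pca_distances` and `moment_dof` are hypothetical, the score distance is assumed to be the sum of squared scores normalized by each component's sum of squares, and the degrees-of-freedom estimate assumes a scaled chi-squared distribution fitted by the method of moments, which may differ from the estimators used in the paper.

```python
import numpy as np


def pca_distances(X, n_components):
    """Return (h, q): score and orthogonal distances for each row of X.

    Assumed definitions (not taken from the paper):
      h_i = sum over retained components of t_ia^2 / sum_i(t_ia^2)
      q_i = squared norm of the residual of row i after projection
    """
    Xc = X - X.mean(axis=0)                      # mean-center the columns
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = U[:, :n_components] * s[:n_components]   # scores
    P = Vt[:n_components].T                      # loadings
    h = np.sum((T / s[:n_components]) ** 2, axis=1)
    E = Xc - T @ P.T                             # residual matrix
    q = np.sum(E ** 2, axis=1)
    return h, q


def moment_dof(d):
    """Method-of-moments degrees of freedom for a scaled chi-squared variable.

    If d ~ (d0 / N) * chi2(N), then mean = d0 and var = 2 * d0**2 / N,
    so N = 2 * mean**2 / var.
    """
    m = d.mean()
    return 2.0 * m ** 2 / d.var()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 10))                # synthetic data, illustration only
    for a in range(1, 6):
        h, q = pca_distances(X, a)
        print(a, round(moment_dof(h), 2), round(moment_dof(q), 2))
```

Printing the two estimates against the number of components gives a rough idea of what a degrees-of-freedom plot tracks; on real data, the training set and centering choices would need to match the model being explored.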

Keywords: principal component analysis; efficient tools; new tools

Journal Title: Chemometrics and Intelligent Laboratory Systems
Year Published: 2021
