Abstract Vibrational spectroscopy is an effective non-destructive technique that has been successfully applied to identifying the characteristics of agro-food samples. However, owing to the high dimensionality of spectral datasets, it is difficult to distinguish samples with different characteristics by inspecting the raw spectra. In this study, t-Distributed Stochastic Neighbor Embedding (t-SNE), a state-of-the-art method, was applied to visualize five vibrational spectroscopy data sets. The performance of t-SNE and of two reference methods (PCA and Isomap) was evaluated in terms of both the differentiation ability in the 2-dimensional space and the accuracy of a subsequent classification model. For the former, t-SNE gave more satisfactory visual discrimination in the 2-dimensional space and obtained better scores on two clustering metrics: the Silhouette Coefficient (0.59 average score compared to 0.24 achieved by PCA and 0.59 by Isomap) and the Davies-Bouldin Index (1.51 average score compared to 2.58 achieved by PCA and 1.52 by Isomap). For the latter, two supervised classification models, k-nearest neighbor (KNN) and support vector machine (SVM), were constructed on the new representations in the 2-dimensional space. In both cases, the representations given by t-SNE outperformed the other methods in terms of accuracy (for KNN, 96% average accuracy compared to the 85% achieved by PCA and 92% by Isomap; for SVM, 96% average accuracy compared to the 86% achieved by PCA and 92% by Isomap). The results showed the great potential of t-SNE for recognizing minute spectral differences between classes, and demonstrated that t-SNE is an effective dimensionality reduction and visualization method, especially when complex and highly overlapping vibrational spectra are analyzed.
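The two clustering metrics quoted in the abstract run in opposite directions: a higher Silhouette Coefficient and a lower Davies-Bouldin Index both indicate better-separated classes. The following is a minimal pure-Python sketch of their standard textbook definitions, not the authors' implementation. The function names and the toy 2-dimensional points are hypothetical and were invented for illustration; they are not data from the study.

```python
import math
from collections import defaultdict


def silhouette_coefficient(points, labels):
    """Mean silhouette over all points; lies in [-1, 1], higher is better."""
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        # Group distances from point i to every other point by class label.
        by_label = defaultdict(list)
        for j, (q, other) in enumerate(zip(points, labels)):
            if j != i:
                by_label[other].append(math.dist(p, q))
        own = by_label.pop(lab, [])
        if not own or not by_label:
            # Singleton class or only one class present: score defined as 0.
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance within the point's own class
        b = min(sum(d) / len(d) for d in by_label.values())  # nearest other class
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


def davies_bouldin_index(points, labels):
    """Mean worst-case similarity between classes; lower is better."""
    clusters = defaultdict(list)
    for p, lab in zip(points, labels):
        clusters[lab].append(p)
    # Centroid of each class and mean distance of its members to that centroid.
    centroids = {
        lab: tuple(sum(coord) / len(pts) for coord in zip(*pts))
        for lab, pts in clusters.items()
    }
    scatter = {
        lab: sum(math.dist(p, centroids[lab]) for p in pts) / len(pts)
        for lab, pts in clusters.items()
    }
    total = 0.0
    for i in clusters:
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in clusters
            if j != i
        )
    return total / len(clusters)


# Hypothetical 2-D embeddings of the same six samples (two classes of three).
toy_labels = [0, 0, 0, 1, 1, 1]
separated = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
overlapping = [(0, 0), (0, 1), (1, 0), (1, 1), (1, 2), (2, 1)]

for name, embedding in (("separated", separated), ("overlapping", overlapping)):
    sil = silhouette_coefficient(embedding, toy_labels)
    dbi = davies_bouldin_index(embedding, toy_labels)
    print(f"{name}: silhouette={sil:.2f}, davies_bouldin={dbi:.2f}")
```

On these toy points the well-separated embedding scores a higher silhouette and a lower Davies-Bouldin value than the overlapping one, which is the direction in which the abstract's comparison of t-SNE with PCA should be read.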
               