In the domain of chemometrics, multiblock data analysis is widely performed to explore or fuse data from multiple sources. The methods commonly used for multiblock predictive analysis are extensions of latent space modelling approaches. Recently, however, deep learning (DL) approaches such as convolutional neural networks (CNNs) have outperformed traditional single block latent space chemometric approaches such as partial least squares (PLS) regression. CNN-based DL modelling can also handle multiblock data simultaneously, but this possibility had not been explored before this study. Hence, this study presents, for the first time, the concept of parallel input CNN-based DL modelling for multiblock predictive chemometric analysis. The parallel input CNN uses separate convolutional layers for each data block to extract key features, which are then combined and passed to a regression module composed of fully connected layers. The method was tested on a large, real visible and near-infrared (Vis-NIR) data set for dry matter prediction in mango fruit. To obtain multiblock data, the visible (Vis) and near-infrared (NIR) parts of the spectra were treated as two separate blocks. The performance of the parallel input CNN was compared with traditional single block CNN-based DL modelling and with a commonly used multiblock chemometric approach, sequentially orthogonalized partial least squares (SO-PLS) regression. The results showed that the proposed parallel input CNN-based deep multiblock analysis outperformed both single block CNN-based DL modelling and SO-PLS regression. The root mean squared error of prediction (RMSEP) obtained with deep multiblock analysis was 0.818%, which is 4% and 20% lower, in relative terms, than that of the single block CNNs and SO-PLS regression, respectively. Furthermore, the deep multiblock approach attained an RMSE ∼3% lower than the best previously reported result on the mango data set used in this study.
The deep multiblock analysis approach based on parallel input CNNs could be considered a useful tool for fusing data from multiple sources.
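As a rough illustration of the parallel input idea, the following NumPy sketch runs a forward pass through two independent 1-D convolutional branches (one per data block), concatenates their flattened features, and feeds them to a small fully connected regression head. It is not the authors' implementation: the block sizes (100 Vis and 150 NIR variables), the kernel width of 11, the four filters per branch, the 32-unit hidden layer, and the random weights and data are all hypothetical, and training is omitted.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)

def conv1d_valid(x, kernels, bias):
    """'Valid' 1-D cross-correlation, as computed by DL convolution layers.
    x: (n_samples, n_vars), kernels: (n_filters, k), bias: (n_filters,)
    Returns an array of shape (n_samples, n_filters, n_vars - k + 1)."""
    k = kernels.shape[1]
    windows = sliding_window_view(x, k, axis=1)  # (n_samples, n_out, k)
    return np.einsum("nok,fk->nfo", windows, kernels) + bias[None, :, None]

def init_branch(kernel_width, n_filters):
    # One independent set of convolution weights per data block (hypothetical init).
    return {"W": rng.normal(0.0, 0.1, (n_filters, kernel_width)),
            "b": np.zeros(n_filters)}

def branch_features(x, params):
    # Convolution -> ReLU -> flatten, applied to a single block.
    z = np.maximum(conv1d_valid(x, params["W"], params["b"]), 0.0)
    return z.reshape(x.shape[0], -1)

def init_dense(sizes):
    # List of (weight, bias) pairs for consecutive fully connected layers.
    return [(rng.normal(0.0, 0.05, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def parallel_cnn_forward(blocks, branch_params, dense_params):
    # Each block passes through its own convolutional branch...
    feats = np.concatenate([branch_features(x, p)
                            for x, p in zip(blocks, branch_params)], axis=1)
    # ...and the concatenated features go to a fully connected regression head.
    h = feats
    for W, b in dense_params[:-1]:
        h = np.maximum(h @ W + b, 0.0)
    W, b = dense_params[-1]
    return (h @ W + b).ravel()

def rmse(y_true, y_pred):
    # Root mean squared error between reference and predicted values.
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Hypothetical data: 8 spectra split into a Vis block (100 variables)
# and an NIR block (150 variables); y stands in for dry matter.
X_vis = rng.normal(size=(8, 100))
X_nir = rng.normal(size=(8, 150))
y = rng.normal(size=8)

branches = [init_branch(11, 4), init_branch(11, 4)]
n_feat = 4 * (100 - 11 + 1) + 4 * (150 - 11 + 1)  # 360 + 560 = 920
dense = init_dense([n_feat, 32, 1])

preds = parallel_cnn_forward([X_vis, X_nir], branches, dense)
print(preds.shape, rmse(y, preds))
```

Keeping a separate branch per block lets each block use its own kernel width and number of filters; in practice the weights would be learned, for example with a DL framework, by minimizing a prediction error such as the RMSE on calibration data.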