The identification of biomarkers through mass spectrometry imaging (MSI) is gaining popularity in the clinical field. However, given the complexity of the spectral and spatial variables involved, data mining of these hyperspectral images can be troublesome. The discovery of markers generally depends on building classification models, which must be validated to ensure the statistical significance of the discriminant m/z values detected. When sample sizes are limited and no independent test set is available, internal validation using resampling methods such as cross-validation (CV) is widely used for model selection, for estimating generalization performance, and for biomarker discovery. Here, we introduce for the first time the use of Constrained Repeated Random Subsampling CV (CORRS-CV) on multiple images for the validation of classification models in MSI. Although several aspects must be taken into account (e.g. image size, the CORRS-CV value, the similarity across spatially close pixels, and the total computation time), CORRS-CV provides more accurate estimates of model performance than k-fold CV that uses biological replicates to define the data split, when biological replicates are scarce and holding images back for testing would waste valuable information. In addition, combining CORRS-CV with rank products increases the robustness of the selection of discriminant features as candidate biomarkers. This matters because biological, environmental and technical variability increases when analysing multiple images, especially of human tissues collected in clinical studies.
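The general idea of repeated random subsampling with a spatial constraint, followed by a rank-product summary of feature importance, can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' CORRS-CV implementation. The constraint used here is an assumption: test pixels are drawn at random within each image, and any pixel within `min_dist` (Chebyshev distance, in pixels) of a test pixel is excluded from training so that near-identical neighbouring spectra do not straddle the split. Pooling the per-image splits, the nearest-centroid classifier, and the mean-difference feature score are likewise hypothetical stand-ins. The rank product of a feature is the geometric mean of its ranks across iterations, so lower values indicate features that are consistently discriminant.

```python
import numpy as np


def constrained_split(coords, test_fraction, min_dist, rng):
    """Randomly choose test pixels in ONE image; keep as training only pixels
    farther than `min_dist` (Chebyshev distance) from every test pixel.
    ASSUMPTION: this buffer rule is an illustrative stand-in for the CORRS-CV
    constraint, not its published definition. Buffer pixels are unused."""
    n = len(coords)
    n_test = max(1, int(round(test_fraction * n)))
    test_idx = rng.choice(n, size=n_test, replace=False)
    # (n, n_test) matrix of distances from every pixel to every test pixel
    dist = np.abs(coords[:, None, :] - coords[test_idx][None, :, :]).max(axis=2)
    train_idx = np.flatnonzero((dist > min_dist).all(axis=1))
    return train_idx, np.sort(test_idx)


def feature_scores(X, y):
    """Hypothetical per-m/z discriminant score: |mean difference| / pooled std."""
    a, b = X[y == 0], X[y == 1]
    pooled = np.sqrt((a.var(axis=0) + b.var(axis=0)) / 2) + 1e-12
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / pooled


def rank_products(scores):
    """scores: (n_iterations, n_features), higher = more discriminant.
    Returns the geometric mean of each feature's per-iteration rank (1 = best).
    Ties are broken by argsort order."""
    ranks = np.argsort(np.argsort(-scores, axis=1), axis=1) + 1
    return np.exp(np.log(ranks).mean(axis=0))


def corrs_cv_sketch(images, n_iter=20, test_fraction=0.05, min_dist=1, seed=0):
    """images: list of (coords[n, 2], X[n, p], y[n]) tuples, one per image.
    Returns the mean test accuracy of a nearest-centroid classifier and the
    rank product of every feature over all iterations."""
    rng = np.random.default_rng(seed)
    accs, all_scores = [], []
    for _ in range(n_iter):
        tr_X, tr_y, te_X, te_y = [], [], [], []
        for coords, X, y in images:  # split each image, then pool (assumption)
            tr, te = constrained_split(coords, test_fraction, min_dist, rng)
            tr_X.append(X[tr])
            tr_y.append(y[tr])
            te_X.append(X[te])
            te_y.append(y[te])
        Xtr, ytr = np.vstack(tr_X), np.concatenate(tr_y)
        Xte, yte = np.vstack(te_X), np.concatenate(te_y)
        # Nearest-centroid classifier fitted on the pooled training pixels
        centroids = np.stack([Xtr[ytr == c].mean(axis=0) for c in (0, 1)])
        pred = np.argmin(((Xte[:, None, :] - centroids) ** 2).sum(axis=2), axis=1)
        accs.append((pred == yte).mean())
        all_scores.append(feature_scores(Xtr, ytr))
    return float(np.mean(accs)), rank_products(np.array(all_scores))


if __name__ == "__main__":
    # Synthetic demo: three 10x10 "images", left half class 0, right half class 1,
    # five random channels of which only channel 2 carries a class shift.
    rng = np.random.default_rng(1)
    yy, xx = np.mgrid[0:10, 0:10]
    coords = np.column_stack([yy.ravel(), xx.ravel()])
    labels = (coords[:, 1] >= 5).astype(int)
    images = []
    for _ in range(3):
        X = rng.normal(size=(100, 5))
        X[:, 2] += 3.0 * labels
        images.append((coords, X, labels))
    acc, rp = corrs_cv_sketch(images)
    print(f"mean test accuracy: {acc:.2f}")
    print("rank products (lower = more consistently discriminant):", np.round(rp, 2))
```

Excluding buffer pixels rather than assigning them to either set is one way to reduce the optimistic bias that arises when a test pixel's near-identical neighbours sit in the training set; the buffer width plays a role analogous to the constraint parameter the abstract says must be tuned.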