MOTIVATION The classification of high-throughput protein data based on mass spectrometry (MS) is of great practical significance in medical diagnosis. Generally, MS data is characterized by high dimension, which inevitably… Click to show full abstract
MOTIVATION The classification of high-throughput protein data based on mass spectrometry (MS) is of great practical significance in medical diagnosis. Generally, MS data is characterized by high dimension, which inevitably leads to prohibitive cost of computation. To solve this problem, one-bit compressed sensing (CS), which is an extreme case of quantized CS, has been employed on MS data to select important features with low dimension. Though enjoying remarkably reduction of computation complexity, the current one-bit CS method does not consider the unavoidable noise contained in MS dataset, and does not exploit the inherent structure of the underlying MS data. RESULTS We propose two feature selection methods based on one-bit CS to deal with the noise and the underlying block-sparsity features, respectively. In the first method, the feature selection problem is modeled as a perturbed one-bit CS problem, where the perturbation represents the noise in MS data. By iterating between perturbation refinement and feature selection, this method selects the significant features from noisy data. The second method formulates the problem as a perturbed one-bit block CS problem and selects the features block by block. Such block extraction is due to the fact that the significant features in the first method usually cluster in groups. Experiments show that, the two proposed methods have better classification performance for real MS data when compared with the existing method, and the second one outperforms the first one. AVAILABILITY The source code of our methods is available at: https://github.com/tianyan8023/OBCS. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
               
Click one of the above tabs to view related content.