Abstract With the rapid development of genetic sequencing and DNA microarray technologies, a large amount of gene expression data has been generated, which provides an important reference for tumor diagnosis. However, classifying these data is challenging because of their high dimensionality and the small number of samples. In this work, we propose an effective method to select the most discriminative genes from high-dimensional microarray data to benefit tumor classification. In detail, each gene is regarded as a feature dimension, and we build a novel computational model based on dual latent feature representation learning, referred to as DLRL, which captures both the internal association of data samples and the relationships between different genes. Instead of measuring the importance of genes in the original data space, we perform gene selection in the learned latent representation space, which is more robust to noisy and redundant information. We first construct affinity matrices for both samples and genes, which represent the correlation information among data samples and among genes, respectively. The dual latent representation learning is then modelled via non-negative matrix factorization of the two affinity matrices. The low-dimensional latent representation matrix of the sample space is treated as a pseudo-label matrix to guide the latent space projection of the original data. Meanwhile, the sample projection matrix is unified with the latent representation matrix of the gene space. An alternating algorithm is designed to solve the resulting optimization problem. Extensive experiments on six commonly used public microarray datasets demonstrate that the proposed method consistently outperforms other state-of-the-art methods in microarray data classification.
We also test the proposed model on a face image dataset and a digit image dataset to validate its efficacy for general unsupervised feature selection.
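The pipeline described in the abstract can be illustrated with a rough sketch. This is not the DLRL algorithm itself: the abstract does not give the objective function or the update rules, so everything below is an assumption made for illustration. The Gaussian-kernel affinity, the damped multiplicative update for symmetric non-negative matrix factorization, the ridge-regression projection, and the names `gaussian_affinity`, `symmetric_nmf`, `rank_genes`, `k`, and `alpha` are all hypothetical choices. In particular, the sketch computes the gene-side latent factor separately and does not couple it with the projection matrix, whereas the abstract states that the two are unified in a joint optimization.

```python
import numpy as np

def gaussian_affinity(M, sigma=None):
    """Dense Gaussian-kernel affinity between the rows of M (assumed choice)."""
    sq = np.sum(M ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (M @ M.T), 0.0)
    if sigma is None:
        # Heuristic bandwidth: root of the median non-zero squared distance.
        pos = d2[d2 > 0]
        sigma = np.sqrt(np.median(pos)) if pos.size else 1.0
    return np.exp(-d2 / (2.0 * sigma ** 2))

def symmetric_nmf(A, k, n_iter=200, seed=0, eps=1e-10):
    """Approximate A by V @ V.T with V >= 0, using damped multiplicative updates."""
    rng = np.random.default_rng(seed)
    V = rng.random((A.shape[0], k))
    for _ in range(n_iter):
        V *= 0.5 + 0.5 * (A @ V) / (V @ (V.T @ V) + eps)
    return V

def rank_genes(X, k=3, alpha=1.0):
    """X has shape (samples, genes). Returns gene order, projection W, gene factor Q."""
    n = X.shape[0]
    # Sample-side latent representation, used here as a pseudo-label matrix.
    V = symmetric_nmf(gaussian_affinity(X), k)
    # Gene-side latent representation (kept separate in this sketch).
    Q = symmetric_nmf(gaussian_affinity(X.T), k)
    # Ridge regression X @ W ~ V, solved in its dual form because n << genes.
    W = X.T @ np.linalg.solve(X @ X.T + alpha * np.eye(n), V)
    # Genes with larger projection-row norms are treated as more important.
    scores = np.linalg.norm(W, axis=1)
    return np.argsort(-scores), W, Q

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.random((12, 30))          # toy data: 12 samples, 30 genes
    order, W, Q = rank_genes(X, k=3)
    print("top 5 gene indices:", order[:5])
```

Ranking genes by the row norms of the projection matrix is a common convention in unsupervised feature selection; whether DLRL scores genes in exactly this way is not stated in the abstract.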
               