BACKGROUND AND OBJECTIVE Cancer has become a complex health problem due to its high mortality. Over the past few decades, with the rapid development of the high-throughput sequencing technology and… Click to show full abstract
BACKGROUND AND OBJECTIVE Cancer has become a complex health problem due to its high mortality. Over the past few decades, with the rapid development of the high-throughput sequencing technology and the application of various machine learning methods, remarkable progress in cancer research has been made based on gene expression data. At the same time, a growing amount of high-dimensional data has been generated, such as RNA-seq data, which calls for superior machine learning methods able to deal with mass data effectively in order to make accurate treatment decision. METHODS In this paper, we present a semi-supervised deep learning strategy, the stacked sparse auto-encoder (SSAE) based classification, for cancer prediction using RNA-seq data. The proposed SSAE based method employs the greedy layer-wise pre-training and a sparsity penalty term to help capture and extract important information from the high-dimensional data and then classify the samples. RESULTS We tested the proposed SSAE model on three public RNA-seq data sets of three types of cancers and compared the prediction performance with several commonly-used classification methods. The results indicate that our approach outperforms the other methods for all the three cancer data sets in various metrics. CONCLUSIONS The proposed SSAE based semi-supervised deep learning model shows its promising ability to process high-dimensional gene expression data and is proved to be effective and accurate for cancer prediction.
               
Click one of the above tabs to view related content.