LAUSR.org creates dashboard-style pages of related content for over 1.5 million academic articles. Sign Up to like articles & get recommendations!

HDMC: a novel deep learning-based framework for removing batch effects in single-cell RNA-seq data

Photo from wikipedia

MOTIVATION With the development of single-cell RNA sequencing (scRNA-seq) techniques, increasingly more large-scale gene expression datasets become available. However, to analyze datasets produced by different experiments, batch effects among different… Click to show full abstract

MOTIVATION With the development of single-cell RNA sequencing (scRNA-seq) techniques, increasingly more large-scale gene expression datasets become available. However, to analyze datasets produced by different experiments, batch effects among different datasets must be considered. Although several methods have been recently published to remove batch effects in scRNA-seq data, two problems remain to be challenging and not completely solved: 1) how to reduce the distribution differences of different batches more accurately; 2) how to align samples from different batches to recover the cell type clusters. RESULTS We proposed a novel deep learning approach, which is a hierarchical distribution matching framework assisted with contrastive learning to address these two problems. Firstly, we design a hierarchical framework for distribution matching based on a deep autoencoder. This framework employs an adversarial training strategy to match the global distribution of different batches. This provides an improved foundation to further match the local distributions with a maximum mean discrepancy (MMD) based loss. For local matching, we divide cells in each batch into clusters and develop a contrastive learning mechanism to simultaneously align similar cluster pairs and keep noisy pairs apart from each other. This allows to obtain clusters with all cells of the same type (true positives), and avoid clusters with cells of different type (false positives). We demonstrate the effectiveness of our method on both simulated and real datasets. Results show that our new method significantly outperforms the state-of-the-art methods and has the ability to prevent overcorrection. AVAILABILITY The python code to generate results and figures in this paper is available at https://github.com/zhanglabNKU/HDMC. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.

Keywords: framework; single cell; seq data; novel deep; batch effects; cell rna

Journal Title: Bioinformatics
Year Published: 2022

Link to full text (if available)


Share on Social Media:                               Sign Up to like & get
recommendations!

Related content

More Information              News              Social Media              Video              Recommended



                Click one of the above tabs to view related content.