MOTIVATION Rapid advances in single cell RNA sequencing (scRNA-seq) have produced higher-resolution cellular subtypes in multiple tissues and species. Methods are increasingly needed across datasets and species to i) remove… Click to show full abstract
MOTIVATION Rapid advances in single cell RNA sequencing (scRNA-seq) have produced higher-resolution cellular subtypes in multiple tissues and species. Methods are increasingly needed across datasets and species to i) remove systematic biases, ii) model multiple datasets with ambiguous labels, and iii) classify cells and map cell type labels. However, most methods only address one of these problems on broad cell types or simulated data using a single model type. It is also important to address higher-resolution cellular subtypes, subtype labels from multiple datasets, models trained on multiple datasets simultaneously, and generalizability beyond a single model type. RESULTS We developed a species- and dataset-independent transfer learning framework (LAmbDA) to train models on multiple datasets (even from different species) and applied our framework on simulated, pancreas, and brain scRNA-seq experiments. These models mapped corresponding cell types between datasets with inconsistent cell subtype labels while simultaneously reducing batch effects. We achieved high accuracy in labeling cellular subtypes (weighted accuracy simulated 1 datasets: 90%, simulated 2 datasets: 94%, pancreas datasets: 88%, and brain datasets: 66%) using LAmbDA Feedforward 1 Layer Neural Network with bagging. This method achieved higher weighted accuracy in labeling cellular subtypes than two other state-of-the-art methods, scmap and CaSTLe in brain (66% vs. 60% and 32%). Furthermore, it achieved better performance in correctly predicting ambiguous cellular subtype labels across datasets in 88% of test cases compared to CaSTLe (63%), scmap (50%), and MetaNeighbor (50%). LAmbDA is model- and dataset- independent and generalizable to diverse data types representing an advance in biocomputing. AVAILABILITY github.com/tsteelejohnson91/LAmbDA. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
               
Click one of the above tabs to view related content.