Refining statistics clarifies leukaemic stem cell genomics

Most current genomic acute myeloid leukaemia (AML) stem cell datasets consist of gene expression values obtained from subsets of cells partitioned by immunophenotype. Such boundaries, however, do not map 1-to-1 to functional stemness. Recently, Ng et al (2016) introduced the first genomic stem cell dataset in which the boundaries were functionally defined; this dataset is quickly becoming the gold standard for leukaemic stem cell (LSC) research in AML. Here we apply to this dataset a statistical model that better accounts for the vast heterogeneity between patients, in order to identify more accurately the genes responsible for functional stem cell behaviour.

To identify functional AML stem cell genes, Ng et al (2016) collected samples from 78 AML patients and sorted each by CD34+/−/CD38+/− status into 4 groups. These cell fractions were simultaneously profiled for gene expression by Illumina microarrays and challenged for engraftment potential in immunocompromised mice. About a third of the cell fractions were not successfully assayed and were excluded. LSC+ fractions were defined as those that engrafted (at any percentage); all other assayed fractions were classified as LSC−. In the original analysis, for each gene, Ng et al (2016) compared the gene expression profiles from all 138 LSC+ cell fractions to the 89 LSC− fractions using Smyth’s moderated t-test with a Benjamini-Hochberg correction for multiple testing. A subset of 17 of the most differentially expressed genes was further developed into a model for a “stemness score” to predict resistance to therapy using bulk sample measurements. While reasonable, this statistical approach fails to account for a novel aspect of the original experimental design: the use of multiple samples from the same patients for testing engraftment potential.
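The consequence of ignoring the patient of origin can be illustrated with a toy comparison. The sketch below is not the analysis of Ng et al (2016) or of this letter. It assumes one LSC+ and one LSC− fraction per patient, uses invented log2 expression values, and contrasts an ordinary Welch t statistic (standing in for a pooled comparison, not Smyth's moderated t-test) with a paired t statistic computed on within-patient differences. The names `toy`, `pooled_t` and `paired_t` are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical log2 expression of one gene: patient -> (LSC+ value, LSC- value).
# Baselines differ widely between patients; the within-patient shift is about +1.
toy = {"P1": (5.1, 4.0), "P2": (9.0, 8.1), "P3": (13.0, 11.9)}


def pooled_t(pairs):
    """Welch t statistic that ignores which patient each fraction came from."""
    pos = [p for p, _ in pairs.values()]
    neg = [n for _, n in pairs.values()]
    se = sqrt(variance(pos) / len(pos) + variance(neg) / len(neg))
    return (mean(pos) - mean(neg)) / se


def paired_t(pairs):
    """Paired t statistic computed on within-patient LSC+ minus LSC- differences."""
    diffs = [p - n for p, n in pairs.values()]
    return mean(diffs) / sqrt(variance(diffs) / len(diffs))


if __name__ == "__main__":
    print("pooled t:", round(pooled_t(toy), 2))
    print("paired t:", round(paired_t(toy), 2))
```

On these invented values the pooled statistic stays below 1 while the paired statistic exceeds 10, because the between-patient spread in baseline expression inflates the pooled standard error but cancels out of the within-patient differences.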
The gene expression differences identified by Ng et al (2016) must be big enough to stand out not only against intra-patient variability, but also against the well-appreciated inter-patient heterogeneity in baseline expression levels. The experimental design, however, is such that we can fit a multi-factor analysis of variance (ANOVA) and factor out the inter-patient heterogeneity to focus more precisely on the within-patient LSC+/LSC− division of primary interest. Further, we can quantitatively assess the relative magnitudes of intra- and inter-tumoural heterogeneity. If inter-patient heterogeneity is larger, refining the analysis should markedly expand the set of detectable LSC-related changes.

More specifically, for each gene, we fitted an ANOVA model with factors for likely sources of variability in gene expression: patient source, array batch and LSC status. We used the data from the 48 patients that had both LSC+ and LSC− fractions. Log2-fold change for a given gene was calculated as the average, across patients, of the difference in mean log2 expression between LSC+ and LSC− cells from the same patient.

We recreated the results reported by Ng et al (2016), using the methods they described, and compared these to results obtained from our ANOVA analysis (Fig 1A). Ng et al (2016) identified 104 unique genes. We identified 237 using the same thresholding rules (P < 0.01 after Benjamini-Hochberg correction for multiple testing, and absolute fold-change greater than 2); 90 were in both lists (Fig 1A). Only 1 gene of the 17-gene signature was lost using the modified approach (Table SI). From our model, we can also assess which variables contributed most to the variation in gene expression across all genes. By plotting the P-values corresponding to ‘batch’, ‘LSC status’ and ‘sample ID’, we observed that ‘sample ID’ had, by far, the most extreme (smallest) P-values (Fig 1B).
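The fold-change definition and thresholding rule described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code used for the analysis: the function names and the `(patient, is_lsc_positive, log2_expression)` tuple layout are hypothetical, the per-gene P-values are assumed to come from an ANOVA fit performed elsewhere, and "absolute fold-change greater than 2" is read here as |log2 fold-change| > 1, which is one possible interpretation.

```python
from statistics import mean


def within_patient_log2fc(fractions):
    """Average over patients of (mean LSC+ minus mean LSC-) log2 expression.

    fractions: iterable of (patient_id, is_lsc_positive, log2_expression).
    Patients lacking either an LSC+ or an LSC- fraction are skipped.
    """
    by_patient = {}
    for patient, is_pos, value in fractions:
        groups = by_patient.setdefault(patient, {True: [], False: []})
        groups[bool(is_pos)].append(value)
    diffs = [mean(g[True]) - mean(g[False])
             for g in by_patient.values() if g[True] and g[False]]
    return mean(diffs)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_genes(log2fc, p_values, alpha=0.01, min_abs_log2fc=1.0):
    """Genes with adjusted P < alpha and |log2FC| > min_abs_log2fc.

    log2fc and p_values are dicts keyed by gene; thresholds are assumptions.
    """
    genes = list(p_values)
    adjusted = benjamini_hochberg([p_values[g] for g in genes])
    return [g for g, q in zip(genes, adjusted)
            if q < alpha and abs(log2fc[g]) > min_abs_log2fc]
```

Averaging per-patient differences, rather than differencing two pooled means, keeps a patient with many assayed fractions from dominating the estimate and removes that patient's baseline expression level from the comparison.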
This quantitatively supports the clinical and biological observation that the majority of differential gene expression is due to inter-patient heterogeneity. Some genes, like CD34, were detected in both the original and modified analyses because they both (i) demonstrate a clear partition between expression in all engrafting cell fractions versus all non-engrafting cell fractions (Fig 1C) and (ii) have significantly higher expression in LSC+ cells when compared with LSC− cells within a given patient sample (Fig 1D). However, other genes of potential relevance in LSC function, like MN1, a well-known haematopoietic oncogene, are missed by the bulk analysis (Fig 1E) (Carella et al, 2007). When we instead compared cell fractions from the same patient, to look at what appears to be essential for engraftment, we found that MN1 is more highly expressed in LSC+ cells in the same number of patients as CD34 (Fig 1F).

Manual inspection of the up-regulated genes identified by the ANOVA approach revealed multiple genes known to play a significant role in AML stem cell behaviour, such as MPL (Yoshihara et al, 2007). Interestingly, we also identified multiple genes recognized as regulators of various epithelial stem cells which have not yet been widely explored in AML LSCs, including MSRB3 (Morel et al, 2017) and SMYD3 (Wang et al, 2018).

We performed gene set enrichment analysis using the same scoring criteria on results from the original manuscript and our ANOVA analyses (Subramanian et al, 2005). The fraction of significantly enriched pathways (false discovery rate, FDR < 25%) was substantially greater for our ANOVA approach across the 3 major pathway databases we examined: Hallmark, KEGG, and Reactome (Fig 1H) (Kanehisa & Goto, 2000; Joshi-Tope et al, 2005; Liberzon et al, 2015). The pattern was consistent regardless of significance criteria (i.e. nominal P-value, FDR, etc.). Specifically looking within the well-defined Hallmark gene sets, the original…

Keywords: lsc; stem cell; expression; gene; cell

Journal Title: British Journal of Haematology
Year Published: 2019
