To the Editor: In recent years, there has been substantial interest in clonally stable monoallelic expression of autosomal genes in mammalian cells. This ‘autosomal analog of X-chromosome inactivation’ has been observed to affect hundreds of genes in a variety of cell types (overview in ref. 1). These genes tend to encode cell-surface proteins (refs. 2,3) and to show high heterozygosity in human populations (ref. 4). This observation has intriguing implications regarding the role of monoallelic expression in biological variation, especially if such genes are abundant.
In a recent paper (ref. 5), Sandberg and colleagues proposed that transient monoallelic expression due to transcriptional bursts is abundant, whereas clonally stable monoallelic expression (random monoallelic expression of autosomal genes (aRME), a term also used herein) is “surprisingly scarce (<1% of genes).” They go on to state that their observations “[call] into question the notion of widespread clonal aRME affecting thousands of genes” (refs. 1–3,6), suggesting a sharp contrast to previous work from several groups, including ours. Upon careful analysis, we argue that the findings they report are consistent with the literature on aRME, and that apparent discrepancies are due to either semantics or simple methodological choices.
It is outside the scope of our comments to discuss the complex technical issues involved in allele-specific analysis of single-cell RNA-sequencing data. Thus, we take the factual findings by Sandberg and colleagues as reported and focus on how these findings relate to previous claims regarding clonal aRME (recent overview in ref. 7).
First, we examine the meaning of ‘prevalence’ in the context of aRME. On the one hand, prevalence could refer to the number of aRME genes per clone, which can be relatively low. On the other hand, it could apply to the number of genes in the genome that are subject to this mode of regulation, which is much greater.
In each individual clone, relatively few genes are classified as monoallelically expressed, and the same genes can be stably biallelically expressed in other clones. When multiple clones are assessed, however, the cumulative number of genes exhibiting aRME reaches into the hundreds. For example, in the first genome-wide analysis of aRME in human cells, we used SNP-array analysis in bulk clonal cell populations to identify 30–50 genes with monoallelic expression per lymphoblast clone, with a total of approximately 400 observed across 12 clones (ref. 2). We and other groups have also used RNA-seq analysis in bulk clonal populations of different mouse cell types (refs. 8–10 among others). These studies were designed to identify clonally stable aRME, which is consistent across most cells of a given clone, but would not detect transient monoallelic expression, which varies from cell to cell. By applying to these studies the same 98:2 allelic bias cutoff used by Sandberg and colleagues (ref. 5), we found 362 monoallelic genes per clone in mouse B cells (701 over two clones; ref. 9), 178 per fibroblast clone (330 over two clones; ref. 9), and 301 per neuronal progenitor clone (1,079 genes over eight clones; ref. 10). These numbers are in line with the observations of clonal aRME by Sandberg and colleagues, especially considering the challenges in detecting the allelic bias of all but the highest expressing genes in single cells (ref. 11).
In their study (ref. 5), Sandberg and colleagues reported 41 and 47 clonally stable aRME genes in two fibroblast clones, and five of these genes showed aRME in both clones. A straightforward extrapolation from these numbers (with a 10–15% probability that a gene that is monoallelic in one clone would also show monoallelic expression in the second clone) would bring the total estimated number of aRME genes in fibroblasts to 300–400; these genes should be detected if a sufficiently large number of clones were analyzed.
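The arithmetic behind such an extrapolation can be sketched in a few lines of Python. This is only one way to arrive at a number in the stated range, not necessarily the calculation we performed: it assumes a capture-recapture (Lincoln-Petersen) style estimate, and it assumes that each aRME gene is monoallelic in any given clone independently and with the same probability. The function names are hypothetical and chosen for illustration.

```python
def estimate_total_arme(n_clone_a: int, n_clone_b: int, n_shared: int) -> float:
    """Capture-recapture style estimate of the pool of aRME genes.

    Assumption: the two clones sample the same pool of genes independently,
    so pool size is approximately n_a * n_b / n_shared.
    """
    if n_shared <= 0:
        raise ValueError("at least one shared gene is needed for the estimate")
    return n_clone_a * n_clone_b / n_shared


def fraction_detected(p_per_clone: float, n_clones: int) -> float:
    """Chance that a gene is monoallelic in at least one of n independent clones."""
    return 1.0 - (1.0 - p_per_clone) ** n_clones


# Counts reported for the two fibroblast clones (ref. 5).
clone_a, clone_b, shared = 41, 47, 5

total = estimate_total_arme(clone_a, clone_b, shared)

# Observed overlap, seen from either clone, as a rough per-clone probability.
p_low, p_high = shared / clone_b, shared / clone_a

print(f"estimated aRME pool in fibroblasts: about {total:.0f} genes")
print(f"per-clone probability: {p_low:.3f} to {p_high:.3f}")
for n in (2, 5, 10, 20):
    low = fraction_detected(p_low, n)
    high = fraction_detected(p_high, n)
    print(f"{n:>2} clones: {low:.2f} to {high:.2f} of the pool seen at least once")
```

Under these assumptions the estimated pool falls within the 300–400 range quoted above, and the loop shows how quickly the cumulative set of detected genes grows as more clones are analyzed.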
In addition, the reported overlap of aRME genes between the two clones suggests (on the basis of a simple binomial model) that as few as 10–20 independent clones may be sufficient to identify most informative aRME genes. The estimate of the potential genome-wide prevalence of clonal aRME may be even higher, given that such genes show highly cell-type-specific expression (refs. 8,9). Thus, a union over multiple cell types would cumulatively reach ~30% of all protein-coding genes in humans and mice (refs. 2,4,9), as we have reported before by using a different approach.
The second issue leading to apparent discrepancies in the numbers of aRME genes arises from a straightforward question of methodology: the number of genes classified as aRME depends strongly on the allelic bias threshold used in the analysis. The choice of the 98% threshold by Sandberg and colleagues appears appropriate given the challenges of single-cell RNA-seq, which make confident detection of less extreme bias difficult (ref. 11). However, given that this issue is one of measurement, the threshold imposed is by necessity arbitrary. Sandberg and colleagues advocate for a more stringent threshold; whether that threshold is functionally relevant is unclear. For instance, there is no biological reason to expect a dramatic functional difference between ‘monoallelic’ expression with a 98:2 allelic bias and that with a 97:3 bias. Thus, in settings that allow for more precise measurement of bias, the rationale for choosing any particular threshold depends on the biological question asked. We and other groups have often used more permissive thresholds, which are robust in bulk RNA-seq analysis. For instance, an RNA-seq study of neuronal lineage cells (ref. 10) applied an 85% allelic bias threshold and reported up to 2,444 genes with clonal monoallelic expression across eight clones.
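The dependence of the gene count on the threshold can be illustrated with a minimal Python sketch. The gene names and read counts below are invented toy values, not data from any of the cited studies, and the `min_reads` coverage filter is an assumption added for illustration rather than a parameter taken from those analyses.

```python
from typing import Dict, List, Tuple


def allelic_bias(ref_reads: int, alt_reads: int) -> float:
    """Fraction of allele-informative reads that come from the dominant allele."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no allele-informative reads")
    return max(ref_reads, alt_reads) / total


def monoallelic_genes(
    counts: Dict[str, Tuple[int, int]],
    threshold: float,
    min_reads: int = 20,  # hypothetical coverage filter, not from the cited studies
) -> List[str]:
    """Genes whose allelic bias meets the threshold at sufficient coverage."""
    return [
        gene
        for gene, (ref, alt) in counts.items()
        if ref + alt >= min_reads and allelic_bias(ref, alt) >= threshold
    ]


# Toy (reference, alternative) read counts for one clone; values are invented.
toy_counts = {
    "geneA": (99, 1),   # 99:1 bias
    "geneB": (97, 3),   # 97:3 bias, just below a 98% cutoff
    "geneC": (12, 88),  # 88:12 bias toward the alternative allele
    "geneD": (55, 45),  # essentially biallelic
    "geneE": (9, 0),    # too few reads to call
}

strict = monoallelic_genes(toy_counts, threshold=0.98)
permissive = monoallelic_genes(toy_counts, threshold=0.85)

print("98% threshold:", strict)
print("85% threshold:", permissive)
```

In this toy set, the gene with a 97:3 bias is excluded at the 98% cutoff but included at the 85% cutoff, which mirrors the point that the reported number of aRME genes reflects the threshold chosen as much as the underlying biology.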
Defining larger, more inclusive sets of aRME genes allows the sets to be interrogated in genome-wide analyses, thus yielding new biological insights. For example, we have recently found that in neurodevelopmental disease, point mutations, but not copy number variants, are linked to pathology (ref. 12). We have also reported an unexpected observation that this group of genes has been subject to large-scale balancing selection (ref. 4).
After carefully considering the definitions of aRME, we believe that the findings on clonally stable aRME reported by Sandberg and colleagues using single-cell analysis confirm, rather than call into question, previous analyses performed in bulk clonal cell populations. These findings all suggest that aRME is a phenomenon that affects a large fraction of genes in the genome.
               