Incomplete lineage sorting (ILS) causes the phylogeny of some parts of the genome to differ from the species tree. In this work, we investigate the frequencies and determinants of ILS in 29 major ancestral nodes across the entire primate phylogeny. We find up to 64% of the genome affected by ILS at individual nodes. We exploit ILS to reconstruct speciation times and ancestral population sizes. Estimated speciation times are much more recent than genomic divergence times and are in good agreement with the fossil record. We show extensive variation of ILS along the genome, driven mainly by recombination but also by distance to genes, which highlights a major impact of selection on variation along the genome. At many nodes, ILS on the X chromosome is reduced relative to the autosomes more than expected under neutrality, which suggests a stronger impact of natural selection on the X chromosome. Finally, we show an excess of ILS in genes with immune functions and a deficit of ILS in housekeeping genes. The extensive ILS in primates discovered in this study provides insights into the speciation times, ancestral population sizes, and patterns of natural selection that shape primate evolution.

INTRODUCTION
Incomplete lineage sorting generates gene trees that are incongruent with the species tree. It has been described in many clades, including birds, marsupials, and primates. For example, incomplete lineage sorting on the human-chimp-gorilla branch amounts to ~30%: even though chimps are our closest primate relatives, 15% of our genome is more closely related to the gorilla genome than to the chimp genome, and another 15% groups the chimp with the gorilla first.
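The ~30% figure and its even 15%/15% split follow from the standard neutral coalescent for three species. The sketch below assumes a single constant-size diploid ancestral population of effective size N_e on an internal branch of t generations; the function names and the N_e value are hypothetical, and this is textbook coalescent arithmetic rather than the study's CoalHMM. Under these assumptions, two sister lineages fail to coalesce on the internal branch with probability exp(−t/(2N_e)), after which each of the three topologies is equally likely, so total discordance is (2/3)·exp(−t/(2N_e)). The same model also suggests why genomic divergence times exceed speciation times: once two lineages meet in the ancestral population, they need on average 2N_e generations to coalesce.

```python
import math
from typing import Dict


def ils_topology_probs(t_gen: float, ne: float) -> Dict[str, float]:
    """Gene-tree topology probabilities for a species trio ((A,B),C).

    Assumes the neutral multispecies coalescent with one constant-size
    diploid ancestral population of size `ne` on an internal branch of
    `t_gen` generations (hypothetical helper, not the study's model).
    """
    no_coal = math.exp(-t_gen / (2.0 * ne))  # A and B do not coalesce on the branch
    return {
        "AB_early": 1.0 - no_coal,  # species topology, sorted on the internal branch
        "AB_ils": no_coal / 3.0,    # species topology, sorted only in the deeper ancestor
        "AC": no_coal / 3.0,        # discordant topology
        "BC": no_coal / 3.0,        # discordant topology
    }


def divergence_time(speciation_gen: float, ne: float) -> float:
    """Expected A-B genomic divergence time in generations: the speciation
    time plus the mean coalescence time (2 * ne) of two lineages, assuming
    the ancestral population keeps size `ne` indefinitely."""
    return speciation_gen + 2.0 * ne


if __name__ == "__main__":
    # Hypothetical branch length chosen so that discordance is ~30%:
    # (2/3) * exp(-tau) = 0.30  ->  tau = t / (2 Ne) = -ln(0.45), about 0.80
    ne = 50_000
    t = -math.log(0.45) * 2 * ne
    probs = ils_topology_probs(t, ne)
    print(round(probs["AC"] + probs["BC"], 3), round(probs["AC"], 3))  # prints: 0.3 0.15
```

Because the gap returned by `divergence_time` grows with N_e, large ancestral populations push divergence-based dates well before the speciation events themselves.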
RATIONALE
Although incomplete lineage sorting is usually regarded as an obstacle for phylogenetic reconstruction, it holds valuable information about the evolutionary history of species, because its extent depends on ancestral effective population sizes and on the time between speciation events. Additionally, recurrent ancestral selective processes are expected to influence how the proportion of incongruent trees varies along the genome, which makes incomplete lineage sorting a useful tool for studying ancient evolutionary events. In this study, we estimate the incomplete lineage sorting landscape by running a coalescent hidden Markov model (CoalHMM) on species trios along a 50-way primate genome alignment. We then leverage the incomplete lineage sorting signal to reconstruct ancestral effective population parameters and to analyze the genomic determinants that influence the sorting of lineages.

RESULTS
We find widespread incomplete lineage sorting at 29 nodes across the primate tree, reaching as much as 64% of the genome at some nodes. Combining CoalHMM with a machine learning pipeline, we reconstruct the speciation times of the primate phylogeny without the need for fossil calibrations. Our speciation time estimates are more recent than divergence times and agree with previous estimates based on fossil evidence. The reconstructed ancestral effective population sizes increase toward the past. We also detect regions that consistently show low or high incomplete lineage sorting across several nodes. We show that incomplete lineage sorting proportions increase with the recombination rate of the genomic region, and this dependence translates into up to a fourfold variation in the inferred local effective population size. Moreover, we report low levels of incomplete lineage sorting on the X chromosome.
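Because discordance on a branch is (2/3)·exp(−t/(2N_e)) under the neutral three-species coalescent, ILS levels measured in two regions of the same branch imply a ratio of local effective population sizes in which the unknown branch length t cancels; this is one way to read a statement such as a fourfold variation in local N_e. The same relation gives a neutral expectation for the X chromosome, assuming N_X = (3/4)N_A (equal numbers of breeding males and females). This is a minimal sketch with hypothetical function names and input values, not the study's pipeline.

```python
import math


def discordance(tau: float) -> float:
    """Discordant-topology fraction (2/3) * exp(-tau), with tau = t / (2 Ne)."""
    return (2.0 / 3.0) * math.exp(-tau)


def ne_ratio_from_ils(ils_focal: float, ils_reference: float) -> float:
    """Local Ne of a focal region relative to a reference region on the same
    branch. Since Ne = -t / (2 ln(1.5 p)), the branch length t cancels."""
    for p in (ils_focal, ils_reference):
        if not 0.0 < p < 2.0 / 3.0:
            raise ValueError("discordant ILS must lie strictly between 0 and 2/3")
    return math.log(1.5 * ils_reference) / math.log(1.5 * ils_focal)


def neutral_x_ils(autosomal_ils: float, x_to_a: float = 0.75) -> float:
    """Expected X-chromosome discordance if Ne_X = x_to_a * Ne_A
    (3/4 under neutrality with a 1:1 breeding sex ratio)."""
    tau_a = -math.log(1.5 * autosomal_ils)  # autosomal branch length in 2*Ne_A units
    return discordance(tau_a / x_to_a)      # a smaller Ne means a longer scaled branch


if __name__ == "__main__":
    p_auto = 0.30  # hypothetical autosomal discordance
    p_x = neutral_x_ils(p_auto)
    print(f"neutral X expectation: {p_x:.3f}")                         # 0.230
    print(f"implied Ne_X / Ne_A: {ne_ratio_from_ils(p_x, p_auto):.2f}")  # 0.75
```

With an autosomal discordance of 0.30, the neutral X expectation is about 0.23; an observed X value below that corresponds to an implied N_X/N_A ratio under 3/4. The same ratio function can express an exon-versus-intergenic ILS difference as a relative reduction in local N_e.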
The reduction of incomplete lineage sorting on the X chromosome is more pronounced than expected under neutral evolution, which suggests that selective forces affect the X chromosome more strongly than the autosomes, reducing the effective population size of the X chromosome and, consequently, its level of incomplete lineage sorting. We further assess how selection affects the distribution of incomplete lineage sorting by comparing incomplete lineage sorting proportions in exons with those in intergenic regions. Exons show an overall decrease in incomplete lineage sorting that amounts to a 31% reduction in local effective population size relative to intergenic regions. Finally, we perform a gene ontology enrichment analysis on genes with low and high incomplete lineage sorting. Immune system genes show large proportions of incomplete lineage sorting at many nodes, whereas housekeeping genes with basic cellular functions show a lack of incomplete lineage sorting.

CONCLUSION
Most molecular methods for timing a species tree estimate divergence times, which, unlike the actual speciation times, are confounded by ancestral population sizes. We showed that coalescent theory and the signal of incomplete lineage sorting allow us to accurately estimate speciation times and ancestral population sizes in the primate tree, yielding key insights into several aspects of primate biology. Our study also emphasizes the prevalence of natural selection at linked sites, which shapes the landscape of both genetic diversity and incomplete lineage sorting along the primate genome.

Inference of the speciation history and the genomic landscape of natural selection in primates from patterns of incomplete lineage sorting.
CoalHMM was used to detect incomplete lineage sorting (ILS) segments along the genomes of 50 primate species and to estimate coalescent parameters, namely the ancestral effective population sizes and speciation times. Moreover, genome-wide variation in the levels of incomplete lineage sorting allowed inference of selective processes in primates. ChrX, X chromosome.
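CoalHMM treats the local genealogy of a species trio as a hidden state that changes along the alignment at recombination breakpoints, so posterior decoding can label which stretches of sequence carry ILS. The toy model below only illustrates that idea and is not CoalHMM's parameterization. It assumes four hidden states for a trio ((A,B),C) (V0: A and B coalesce before the deeper speciation; V1: species topology reached only through ILS; V2 and V3: the two discordant topologies), made-up emission probabilities over informative site patterns, a simple per-site switching probability in place of a real recombination model, and hypothetical function names. Decoding uses the scaled forward-backward algorithm.

```python
"""Toy coalescent HMM for a species trio ((A,B),C); all numbers hypothetical."""
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
STATES = ("V0", "V1", "V2", "V3")
# Observations: 0 = uninformative site, 1 = A,B share the derived allele,
# 2 = A,C share it, 3 = B,C share it.


def make_model(p_no_coal: float, switch: float) -> Tuple[List[float], Matrix, Matrix]:
    # Initial distribution from the trio coalescent: lineages escape the
    # internal branch with probability p_no_coal, then each topology is 1/3.
    init = [1.0 - p_no_coal] + [p_no_coal / 3.0] * 3
    # With probability `switch` per site the genealogy is redrawn from `init`
    # (a crude stand-in for recombination); this keeps `init` stationary.
    trans = [[(1.0 - switch) * (1.0 if i == j else 0.0) + switch * init[j]
              for j in range(4)] for i in range(4)]
    # Made-up emissions: each state favours the site pattern matching its tree;
    # V0 has the longest (A,B) internal branch, hence the most pattern-1 sites.
    emit = [
        [0.970, 0.025, 0.0025, 0.0025],  # V0
        [0.980, 0.010, 0.0050, 0.0050],  # V1
        [0.980, 0.0050, 0.010, 0.0050],  # V2
        [0.980, 0.0050, 0.0050, 0.010],  # V3
    ]
    return init, trans, emit


def posteriors(obs: Sequence[int], init: List[float], trans: Matrix,
               emit: Matrix) -> Matrix:
    """Scaled forward-backward; returns P(state | all observations) per site."""
    n, k = len(obs), len(init)
    alpha: Matrix = []
    scale: List[float] = []
    for t in range(n):
        if t == 0:
            cur = [init[s] * emit[s][obs[0]] for s in range(k)]
        else:
            cur = [emit[j][obs[t]] * sum(alpha[t - 1][i] * trans[i][j] for i in range(k))
                   for j in range(k)]
        c = sum(cur)  # per-site scaling constant prevents underflow
        alpha.append([a / c for a in cur])
        scale.append(c)
    beta: Matrix = [[1.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        beta[t] = [sum(trans[i][j] * emit[j][obs[t + 1]] * beta[t + 1][j]
                       for j in range(k)) / scale[t + 1] for i in range(k)]
    post: Matrix = []
    for t in range(n):
        w = [alpha[t][s] * beta[t][s] for s in range(k)]
        z = sum(w)
        post.append([x / z for x in w])
    return post


def discordant_fraction(post: Matrix) -> float:
    """Mean posterior mass on the two discordant topologies (V2 + V3)."""
    return sum(p[2] + p[3] for p in post) / len(post)


if __name__ == "__main__":
    # Synthetic sequence: an (A,B)-supporting stretch, then a (B,C) stretch.
    obs = [0, 0, 0, 0, 1] * 20 + [0, 0, 0, 0, 3] * 20
    init, trans, emit = make_model(p_no_coal=0.45, switch=0.001)
    post = posteriors(obs, init, trans, emit)
    calls = [STATES[max(range(4), key=lambda s: p[s])] for p in post]
    print(calls[50], calls[150], round(discordant_fraction(post), 2))
```

On this synthetic sequence, sites in the first stretch are called V0 and sites in the second stretch V3, so roughly half of the sequence is labeled discordant. A real analysis would derive emissions and transitions from branch lengths, population sizes, and recombination rate rather than fixing them by hand.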
               