Despite the fundamental importance of single nucleotide polymorphisms (SNPs) to human evolution, there are still large gaps in our understanding of the forces that shape their distribution across the genome. SNPs have been shown not to be evenly distributed, with directly adjacent SNPs found unusually frequently. Why this is the case is unclear. We illustrate how neighboring SNPs that cannot be explained by a single mutation event, which we term sequential dinucleotide mutations (SDMs), are driven by processes distinct from those underlying SNPs and multinucleotide polymorphisms (MNPs). By studying variation across populations, including a novel cohort of 1,358 Scottish genomes, we show that SDMs are over twice as common as MNPs and, like SNPs, display distinct mutational spectra across populations. These biases are not only different from those observed among SNPs and MNPs but are also more divergent between human population groups. We show that the changes that make up SDMs are not independent and identify a distinct mutational profile, CA → CG → TG, that is observed an order of magnitude more often than expected from background SNP rates and the numbers of other SDMs involving the gain and deamination of CpG sites. Intriguingly, particular pathways through the amino acid code appear to have been favored relative to expectations from intergenic SDM rates and the occurrences of coding SNPs, in particular those that lead to the creation of single-codon amino acids. Finally, we present evidence that epistatic selection has potentially disfavored sequential nonsynonymous changes in the human genome.
               