The use of array-based SNP genotyping in the beef and dairy industries has produced an astounding amount of medium-to-low density genomic data in the last decade. While low-density assays work exceptionally well in the context of genomic prediction, they are less useful in mapping and causal variant discovery. This project focuses on maximizing imputation accuracies to the marker set of two high-density research assays: the Illumina Bovine HD and the GGP-F250, the latter of which contains a large proportion of rare and potentially functional variants (~850,000 SNPs in total across the two assays). This 850K SNP set is well-suited for both imputation to sequence-level genotypes and direct downstream analysis. For testing, 310 animals from multiple breeds, all with observed HD and F250 genotypes, were downsampled to various commercial chip densities ranging from 8K to 130K markers. We use both well-established and novel measures of imputation accuracy to categorize precisely where, why, and how imputation errors are made. These metrics provide insights into downstream interpretation and identify situations where caution should be exercised when analyzing imputed variants. We find that a large multi-breed composite imputation reference comprising 36,131 samples with either HD or F250 genotypes significantly increases imputation accuracy compared to a standard within-breed reference panel, particularly at low minor allele frequencies. Breed composition information for each animal in our testing panel allowed us to identify how a breed’s representation in the reference panel affects the imputation accuracy of both purebred and admixed animals. Starting chip density also impacts imputation accuracy, but gains appear to plateau at around 50,000 markers. The addition of the F250’s rare variation to the reference panel increased the imputation accuracy of rare variants from the HD assay by an average of 4.32%.
We expect this low MAF content from the F250 to have a similar positive impact on rare variant imputation at the sequence level. Early work using 850K imputed data in genomic predictions has shown substantial increases in both chip heritability and prediction accuracies. Using a large multi-breed reference and the best practices identified through this work will maximize imputation accuracies in virtually all cattle populations, particularly ones that are highly admixed with little or no available pedigree information.
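
The evaluation design summarized above (mask observed high-density genotypes down to a commercial chip density, impute, then compare imputed against observed calls by minor allele frequency) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the 0/1/2 genotype coding, the MAF bin edges, and the use of simple genotype concordance as the accuracy measure are all assumptions made for illustration, and MAF is computed here from the test animals' observed genotypes only.

```python
from statistics import mean


def minor_allele_frequency(genotypes):
    """MAF for one SNP, with genotypes coded 0/1/2 (copies of one allele)."""
    allele_freq = sum(genotypes) / (2 * len(genotypes))
    return min(allele_freq, 1 - allele_freq)


def concordance(observed, imputed):
    """Fraction of animals whose imputed genotype equals the observed one."""
    matches = sum(o == i for o, i in zip(observed, imputed))
    return matches / len(observed)


def accuracy_by_maf_bin(observed_by_snp, imputed_by_snp,
                        bin_edges=(0.0, 0.01, 0.05, 0.5)):
    """Mean per-SNP concordance within MAF bins.

    Both arguments map a SNP name to a list of genotypes, one per animal,
    in the same animal order. The bin edges are hypothetical defaults.
    """
    bins = list(zip(bin_edges, bin_edges[1:]))
    per_bin = {b: [] for b in bins}
    for snp, observed in observed_by_snp.items():
        maf = minor_allele_frequency(observed)
        # Assign the SNP to the first bin whose upper edge is >= its MAF.
        for low, high in bins:
            if maf <= high:
                per_bin[(low, high)].append(
                    concordance(observed, imputed_by_snp[snp]))
                break
    # Report only bins that actually contain SNPs.
    return {b: mean(v) for b, v in per_bin.items() if v}


# Toy data: one common SNP in 4 animals, one rare SNP in 20 animals.
observed = {"snp_common": [0, 1, 2, 1], "snp_rare": [0] * 19 + [1]}
imputed = {"snp_common": [0, 1, 2, 2], "snp_rare": [0] * 20}
result = accuracy_by_maf_bin(observed, imputed)
```

In the toy data the rare SNP scores higher concordance than the common one even though its only minor-allele carrier was imputed incorrectly. Simple concordance tends to flatter rare variants in this way, which is one reason an evaluation like the one described would combine several accuracy measures rather than rely on a single one.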
               