Background Reference-based phasing and genotype imputation algorithms have been developed with sublinear theoretical runtime behaviour, but runtimes are still high in practice when large genome-wide reference datasets are used. Methods… Click to show full abstract
Background Reference-based phasing and genotype imputation algorithms have been developed with sublinear theoretical runtime behaviour, but runtimes are still high in practice when large genome-wide reference datasets are used. Methods We developed EagleImp, a software with algorithmic and technical improvements and new features for accurate and accelerated phasing and imputation in a single tool. Results We compared accuracy and runtime of EagleImp with Eagle2, PBWT and prominent imputation servers using whole-genome sequencing data from the 1000 Genomes Project, the Haplotype Reference Consortium and simulated data with more than 1 million reference genomes. EagleImp is 2 to 10 times faster (depending on the single or multiprocessor configuration selected) than Eagle2/PBWT, with the same or better phasing and imputation quality in all tested scenarios. For common variants investigated in typical GWAS studies, EagleImp provides same or higher imputation accuracy than the Sanger Imputation Service, Michigan Imputation Server and the newly developed TOPMed Imputation Server, despite larger (not publicly available) reference panels. It has many new features, including automated chromosome splitting and memory management at runtime to avoid job aborts, fast reading and writing of large files, and various user-configurable algorithm and output options. Conclusions Due to the technical optimisations, EagleImp can perform fast and accurate reference-based phasing and imputation for future very large reference panels with more than 1 million genomes. EagleImp is freely available for download from https://github.com/ikmb/eagleimp.
               
Click one of the above tabs to view related content.