We present a new Rep-Seq analysis tool called corecount, for analyzing genotypic variation in immunoglobulin (IG) and T cell receptor (TCR) genes. corecount is highly efficient at identifying V alleles,… Click to show full abstract
We present a new Rep-Seq analysis tool called corecount, for analyzing genotypic variation in immunoglobulin (IG) and T cell receptor (TCR) genes. corecount is highly efficient at identifying V alleles, including those that are infrequently used in expressed repertoires and those that contain 3’ end variation that are otherwise refractory to reliable identification during germline inference from expressed libraries. Furthermore, corecount facilitates accurate D and J gene genotyping. The output is highly reproducible and facilitates the comparison of genotypes from multiple individuals, such as those from clinical cohorts. Here, we applied corecount to the genotypic analysis of IgM libraries from 16 individuals. To demonstrate the accuracy of corecount, we Sanger sequenced all the heavy chain IG alleles (65 IGHV, 27 IGHD and 7 IGHJ) from one individual from whom we also produced two independent IgM Rep-seq datasets. Genomic analysis revealed that 5 known IGHV and 2 IGHJ sequences are truncated in current reference databases. This dataset of genomically validated alleles and IgM libraries from the same individual provides a useful resource for benchmarking other bioinformatic programs that involve V, D and J assignments and germline inference, and may facilitate the development of AIRR-Seq analysis tools that can take benefit from the availability of more comprehensive reference databases.
               
Click one of the above tabs to view related content.