Abstract

Premise: DNA‐based species identification is critical when morphological identification is restricted, but DNA‐based identification pipelines typically rely on the ability to compare homologous sequence data across species. Because many clades lack robust genomic resources, we present here a bioinformatics pipeline capable of generating genome‐wide single‐nucleotide polymorphism (SNP) data while circumventing the need for any reference genome or annotation data.

Methods: Using the SISRS bioinformatics pipeline, we generated de novo ortholog data for the genus Carya, isolating sites where genetic variation was restricted to a single Carya species (i.e., species‐informative SNPs). We leveraged these SNPs to identify both full‐species and hybrid Carya specimens, even at very low sequencing depths.

Results: We identified between 46,000 and 476,000 species‐identifying SNPs for each of eight diploid Carya species, and all species identifications were concordant with the species of record. For all putative F1 hybrid specimens, both parental species were correctly identified in all cases, and more punctate patterns of introgression were detectable in more cryptic crosses.

Discussion: Bioinformatics pipelines that use only short‐read sequencing data provide vital new tools enabling rapid expansion of DNA identification assays for model and non‐model clades alike.
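
The notion of a species‐informative SNP used in the Methods can be illustrated with a toy sketch. This is not the SISRS implementation. It assumes a small, already aligned set of orthologous sites, and the species labels (`sp_a` to `sp_d`), the function names, and the sample calls are all hypothetical. A site counts as informative here when exactly one species carries an allele that differs from a single allele shared by every other species. A sample is then scored by counting how many of its base calls match each species' private alleles.

```python
from collections import Counter

def species_informative_sites(alignment):
    """Return {site_index: (species, allele)} for sites where exactly one
    species differs from an allele shared by all other species.

    alignment: dict mapping species name -> string of bases (equal lengths).
    """
    species = list(alignment)
    length = len(next(iter(alignment.values())))
    informative = {}
    for i in range(length):
        column = {sp: alignment[sp][i] for sp in species}
        # Skip sites with missing or ambiguous calls.
        if any(base not in "ACGT" for base in column.values()):
            continue
        counts = Counter(column.values())
        # Require exactly two alleles: one shared, one private.
        if len(counts) != 2:
            continue
        (major, n_major), (minor, n_minor) = counts.most_common(2)
        if n_minor == 1 and n_major >= 2:
            odd_species = next(sp for sp, b in column.items() if b == minor)
            informative[i] = (odd_species, minor)
    return informative

def score_sample(sample_calls, informative):
    """Count sample base calls that match each species' private allele.

    sample_calls: dict mapping site_index -> observed base (sparse is fine,
    which mimics low sequencing depth).
    """
    hits = Counter()
    for i, base in sample_calls.items():
        if i in informative and informative[i][1] == base:
            hits[informative[i][0]] += 1
    return hits

# Hypothetical four-species alignment of six orthologous sites.
alignment = {
    "sp_a": "ACGTAC",
    "sp_b": "ACGTAT",
    "sp_c": "AGGTAC",
    "sp_d": "ACGAAC",
}
informative = species_informative_sites(alignment)

# Hypothetical low-coverage sample covering only three sites.
sample_calls = {1: "G", 3: "T", 5: "T"}
hits = score_sample(sample_calls, informative)
print(informative)
print(dict(hits))
```

In this toy case the sample matches private alleles from two different species, the kind of mixed signal one would expect from an F1 hybrid. A single‐species specimen would instead match the private alleles of only one species.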
               