Species tree estimation from multi-locus datasets is complicated by processes such as incomplete lineage sorting (ILS), which cause different loci to have different trees. Summary methods, which estimate species trees by combining gene trees, are popular, but their accuracy is impaired by gene tree estimation error. Other approaches use only the site patterns to estimate the species tree, and so are not affected by gene tree estimation error. In particular, PAUP∗ provides a method in which SVDquartets is used to compute a set Q of quartet trees (i.e., trees on four leaves), and a heuristic search is then used to combine the quartet trees into a species tree T, seeking to maximize the number of quartet trees in Q that agree with T. The PAUP∗ method based on SVDquartets (henceforth referred to as SVDquartets + PAUP∗) is increasingly used in phylogenomic studies because it can reconstruct species trees without requiring accurate gene tree estimates. We present SVDquest∗, a new method for constructing species trees from site patterns that is guaranteed to produce species trees that satisfy at least as many quartet trees as SVDquartets + PAUP∗. We show that SVDquest∗ is competitive with ASTRAL and ASTRID (two leading summary methods) in terms of topological accuracy, and tends to be more accurate than ASTRAL and ASTRID under conditions with relatively high gene tree estimation error. SVDquest∗ is available in open source form at https://github.com/pranjalv123/SVDquest.
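To make the optimization criterion concrete, the following is a minimal sketch of how the number of quartet trees in Q that agree with a candidate tree T could be counted. It is not the SVDquest∗ or PAUP∗ implementation: the nested-tuple tree encoding, the quartet encoding, and the function names `clades`, `agrees`, and `quartet_score` are all assumptions made for illustration. A quartet ab|cd is taken to agree with T when some edge of T separates {a, b} from {c, d}.

```python
def clades(tree):
    """Return (leaf_set, clade_list) for a tree given as nested tuples of leaf names.

    Each clade (the leaf set below a node) stands for one edge of the unrooted
    tree: the split between that leaf set and all remaining leaves.
    """
    if isinstance(tree, str):
        leaf = frozenset([tree])
        return leaf, [leaf]
    leaves = frozenset()
    found = []
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found.extend(child_clades)
    found.append(leaves)
    return leaves, found


def agrees(quartet, clade_list):
    """True if some split of the tree separates the two pairs of the quartet."""
    (a, b), (c, d) = quartet
    left, right = {a, b}, {c, d}
    for clade in clade_list:
        if left <= clade and not (right & clade):
            return True
        if right <= clade and not (left & clade):
            return True
    return False


def quartet_score(tree, quartets):
    """Count the quartet trees in `quartets` that agree with `tree`."""
    _, clade_list = clades(tree)
    return sum(agrees(q, clade_list) for q in quartets)


# Hypothetical five-taxon example (taxon names are placeholders).
T = ((("A", "B"), "C"), ("D", "E"))
Q = [
    (("A", "B"), ("C", "D")),  # AB|CD: separated by the edge above clade {A, B}
    (("A", "C"), ("D", "E")),  # AC|DE: separated by the edge above clade {D, E}
    (("A", "D"), ("B", "E")),  # AD|BE: no edge of T separates these pairs
]
print(quartet_score(T, Q))
```

A search over tree space would evaluate this score for many candidate trees and keep the best one; the sketch shows only the scoring step, not any search strategy.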
               