Microeukaryotes are among the most important components of the microbial food web in almost all aquatic and terrestrial ecosystems worldwide. To better understand their roles and functions in ecosystems, researchers increasingly use sequencing coupled with phylogenomic analyses of entire genomes or transcriptomes to reconstruct the evolutionary history and classification of these microeukaryotes, thereby providing a more robust framework for determining their systematics and diversity. However, phylogenomic research usually requires extensive hands-on bioinformatics experience. Here, we propose an efficient automated method, “Guided Phylogenomic Search in trees” (GPSit), which starts from the predicted protein sequences of newly sequenced species and a well-defined, customized orthologous database. Compared with previous protocols, our method streamlines the entire workflow by integrating all essential steps together with optional ones, reducing the manual operation time needed to reconstruct phylogenetic relationships from days to several hours. Furthermore, GPSit supports user-defined parameters in most steps, allowing users to adapt it to their own studies. We demonstrate the effectiveness of GPSit by incorporating available online data and new single-cell data from three nonculturable marine ciliates (Anteholosticha monilata, Deviata sp. and Diophrys scutum) sequenced at moderate coverage (~5×). Our results indicate that the available online data could reconstruct robust “deep” phylogenetic relationships, while the new single-cell data reveal the presence of intermediate taxa in shallow relationships. Based on empirical phylogenomic data, we also applied GPSit to evaluate how different levels of missing data affect two commonly used phylogenetic methods, maximum likelihood (ML) and Bayesian inference (BI). We found that BI is less sensitive to missing data when fast-evolving sites are removed.
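One common way to build phylogenomic datasets with different levels of missing data, offered here purely as an illustration and not as GPSit's own implementation, is to vary a gene-occupancy threshold: keep only orthologous genes present in at least a given fraction of taxa, concatenate them into a supermatrix, and measure the proportion of missing cells. The minimal sketch below assumes a hypothetical in-memory layout (gene name -> {taxon -> aligned protein sequence}); the function names `occupancy`, `concatenate`, `missing_proportion` and `datasets_by_occupancy` are hypothetical, and treating `-`, `?` and `X` as missing data is an assumption.

```python
from typing import Dict, Iterable, List, Tuple

# Illustrative sketch only; not part of GPSit.
# Hypothetical data layout: gene name -> {taxon name -> aligned protein sequence}
Alignment = Dict[str, str]
GeneSet = Dict[str, Alignment]

# Assumed missing-data characters: gaps, unknown states and ambiguous residues
MISSING_CHARS = set("-?X")


def occupancy(gene: Alignment, taxa: List[str]) -> float:
    """Fraction of the target taxa that have a sequence for this gene."""
    return sum(taxon in gene for taxon in taxa) / len(taxa)


def concatenate(genes: GeneSet, taxa: List[str]) -> Alignment:
    """Build a supermatrix; a taxon absent from a gene is padded with '?'."""
    parts: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    for name in sorted(genes):  # fixed gene order so every row lines up
        gene = genes[name]
        if not gene:
            continue  # skip genes with no sequences at all
        length = len(next(iter(gene.values())))
        for taxon in taxa:
            parts[taxon].append(gene.get(taxon, "?" * length))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


def missing_proportion(matrix: Alignment) -> float:
    """Proportion of supermatrix cells that are gaps or unknown residues."""
    cells = "".join(matrix.values()).upper()
    if not cells:
        return 0.0
    return sum(c in MISSING_CHARS for c in cells) / len(cells)


def datasets_by_occupancy(
    genes: GeneSet, taxa: List[str], thresholds: Iterable[float]
) -> Dict[float, Tuple[int, float]]:
    """For each occupancy threshold, report (genes kept, missing-data proportion)."""
    results: Dict[float, Tuple[int, float]] = {}
    for threshold in thresholds:
        kept = {n: g for n, g in genes.items() if occupancy(g, taxa) >= threshold}
        matrix = concatenate(kept, taxa)
        results[threshold] = (len(kept), missing_proportion(matrix))
    return results


if __name__ == "__main__":
    # Toy data: three genes with decreasing taxon coverage
    toy_genes: GeneSet = {
        "g1": {"A": "MKV-", "B": "MKVL", "C": "MRVL"},
        "g2": {"A": "PQ", "B": "PE"},
        "g3": {"A": "WWW"},
    }
    summary = datasets_by_occupancy(toy_genes, ["A", "B", "C"], [1.0, 0.5, 0.0])
    for thr, (n_genes, missing) in summary.items():
        print(f"occupancy >= {thr:.1f}: {n_genes} genes, {missing:.1%} missing")
```

On the toy data, lowering the occupancy threshold from 1.0 to 0.0 keeps more genes but raises the missing-data proportion from about 8% to about 33%. The resulting matrices could then be analysed with ML and BI software to compare how each method responds to increasing missing data.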
               
Click one of the above tabs to view related content.