MOTIVATION
Nearly 40% of the genes in sequenced genomes have no experimentally- or computationally-derived functional annotations. To fill this gap, we seek to develop methods for network-based gene function prediction that can integrate heterogeneous data for multiple species with experimentally-based functional annotations and systematically transfer them to newly-sequenced organisms on a genomewide scale. However, the large sizes of such networks pose a challenge for the scalability of current methods.

RESULTS
We develop a label propagation algorithm called FastSinkSource. By formally bounding its rate of progress, we decrease the running time by a factor of 100 without sacrificing accuracy. We systematically evaluate many approaches to construct multi-species bacterial networks and apply FastSinkSource and other state-of-the-art methods to these networks. We find that the most accurate and efficient approach is to pre-compute annotation scores for species with experimental annotations, and then to transfer them to other organisms. In this manner, FastSinkSource runs in under three minutes for 200 bacterial species.

AVAILABILITY AND IMPLEMENTATION
An implementation of our framework and all data used in this research are available at https://github.com/Murali-group/multi-species-GOA-prediction.

CONTACT
[email protected].

SUPPLEMENTARY INFORMATION
Supplementary information is available at Bioinformatics online.
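
For readers unfamiliar with label propagation on gene networks, the following is a minimal illustrative sketch of the general idea, not the authors' FastSinkSource implementation (which is available at the repository above). It assumes a weighted, undirected network stored as a nested dictionary, holds positive examples at score 1 and negative examples at score 0, and repeatedly sets each unlabeled gene's score to the damped, weighted average of its neighbors' scores. The function name `propagate_labels`, the damping parameter `alpha`, the stopping tolerance `tol`, and the toy network are all assumptions made for illustration.

```python
def propagate_labels(weights, positives, negatives,
                     alpha=1.0, max_iters=1000, tol=1e-9):
    """Iteratively score unlabeled nodes in a weighted network.

    weights:   dict mapping node -> {neighbor: edge weight}
    positives: set of nodes held fixed at score 1
    negatives: set of nodes held fixed at score 0
    alpha:     hypothetical damping factor in (0, 1]
    """
    unknown = [u for u in weights
               if u not in positives and u not in negatives]
    scores = {u: 0.0 for u in unknown}

    for _ in range(max_iters):
        new_scores = {}
        max_change = 0.0
        for u in unknown:
            total = sum(weights[u].values())
            if total == 0:
                # Isolated node: no neighbors to learn from.
                new_scores[u] = 0.0
                continue
            acc = 0.0
            for v, w in weights[u].items():
                if v in positives:
                    acc += w                      # fixed score 1
                elif v not in negatives:
                    acc += w * scores.get(v, 0.0)  # current estimate
                # negatives contribute 0
            new_scores[u] = alpha * acc / total
            max_change = max(max_change, abs(new_scores[u] - scores[u]))
        scores = new_scores
        if max_change < tol:
            # Scores have stabilized; stop early.
            break
    return scores


# Toy path network: pos - a - b - neg, all edge weights 1.
toy_network = {
    "pos": {"a": 1.0},
    "a": {"pos": 1.0, "b": 1.0},
    "b": {"a": 1.0, "neg": 1.0},
    "neg": {"b": 1.0},
}
toy_scores = propagate_labels(toy_network, {"pos"}, {"neg"})
```

On this toy path, the gene adjacent to the positive example ends up with a higher score than the gene adjacent to the negative example. Stopping once the largest per-node change falls below `tol` is only a simple heuristic; the abstract describes a formal bound on the rate of progress, which this sketch does not reproduce.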
               