There is a wealth of information contained within one’s microbiome regarding their physiology and environment, and this is a promising avenue for developing non-invasive diagnostic tools. Here, we utilize 5643… Click to show full abstract
There is a wealth of information contained within one’s microbiome regarding their physiology and environment, and this is a promising avenue for developing non-invasive diagnostic tools. Here, we utilize 5643 aggregated, annotated whole-community metagenomes from 19 different diseases to implement the first multiclass microbiome disease classifier of this scale. We compared three different machine learning models: random forests, deep neural nets, and a novel graph convolutional architecture which exploits the graph structure of phylogenetic trees as its input. We show that the graph convolutional model outperforms deep neural nets in terms of accuracy (achieving 75% average test-set accuracy), receiver-operator-characteristics (92.1% average AUC), and precision-recall (50% average AUPR). Additionally, the convolutional net’s performance complements that of the random forest, achieving similar accuracy but better receiver-operator-characteristics and lower area under precision-recall. Lastly, we are able to achieve over 90% average top-3 accuracy across all of our models. Together, these results indicate that there are predictive, disease specific signatures across microbiomes which could potentially be used for diagnostic purposes.
               
Click one of the above tabs to view related content.