Large-scale untargeted metabolomics studies suffer from individual variation, batch effects and instrument variability, making comparisons of common spectral features across studies difficult. One solution is to compare studies after compound identification. However, compound identification is expensive and time-consuming. We identified common spectral features across multiple studies using a generalizable experimental design approach. First, we included an anchor strain, PD1074, during sample and data collection. Second, we collected data in blocks with multiple controls. These anchors enabled us to integrate three studies of Caenorhabditis elegans, using nuclear magnetic resonance (NMR) spectroscopy and liquid chromatography-mass spectrometry (LC-MS) data from five different assays. We found 34% and 14% of features to be significant in LC-MS and NMR, respectively. Between 20% and 50% of spectral features differ in a mutant and among a set of genetically diverse natural strains, suggesting that this reduced set of spectral features provides excellent targets for compound identification.

GRAPHICAL ABSTRACT
Fourteen C. elegans strains are used in three individual studies. PD1074, the anchor control strain (orange), is grown alongside test strains (green, yellow, purple). Multiple biological replicates of PD1074 capture environmental variation in growth conditions. Non-polar and polar metabolic data across the three studies (i.e., natural strains, central metabolism mutants, and UGT mutants) were collected by nuclear magnetic resonance (NMR) spectroscopy and liquid chromatography-mass spectrometry (LC-MS). Data acquisition controls in each block included biological reference material and pooled PD1074 samples. Biological replicates of PD1074 (n = 42 for LC-MS, n = 52 for NMR) were included in all batches.
Meta-analysis provided inferences comparable to those of mixed-effects models, and estimating the relative effect of each test strain against PD1074 enabled straightforward comparisons of test strains across experiments.
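The abstract does not describe the statistical pipeline in detail, so the following is only a minimal sketch of the general idea of anchor-relative effects, not the authors' implementation. It assumes log2-transformed peak intensities for one spectral feature, a Welch-style standard error for the test-versus-PD1074 difference within each block, and a fixed-effect inverse-variance meta-analysis to pool blocks. The function names (`block_effect`, `fixed_effect_meta`) and the toy intensities are hypothetical. The second block is deliberately scaled by a factor of 2 to mimic a multiplicative batch effect, which cancels because each test strain is compared with the anchor measured in the same block.

```python
import math
from statistics import mean, variance


def block_effect(test, anchor):
    """Log2 fold change of a test strain vs. the PD1074 anchor within one block.

    Hypothetical helper: intensities are log2-transformed, and the standard
    error uses a Welch-style sum of (sample variance / n) for each group.
    """
    lt = [math.log2(x) for x in test]
    la = [math.log2(x) for x in anchor]
    effect = mean(lt) - mean(la)
    se = math.sqrt(variance(lt) / len(lt) + variance(la) / len(la))
    return effect, se


def fixed_effect_meta(effects):
    """Pool (effect, se) pairs across blocks by inverse-variance weighting."""
    weights = [1.0 / se ** 2 for _, se in effects]
    pooled = sum(w * e for w, (e, _) in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se


# Toy, made-up intensities for one feature; block 2 carries a 2x batch shift.
blocks = [
    {"test": [200.0, 220.0, 210.0], "anchor": [100.0, 110.0, 105.0]},
    {"test": [400.0, 380.0, 420.0], "anchor": [200.0, 190.0, 210.0]},
]

per_block = [block_effect(b["test"], b["anchor"]) for b in blocks]
pooled, pooled_se = fixed_effect_meta(per_block)
print(f"pooled log2FC = {pooled:.3f}")  # prints: pooled log2FC = 1.000
```

Because both blocks show the test strain at twice the anchor level, each block estimate and the pooled estimate are close to a log2 fold change of 1 despite the batch shift. A random-effects model or a mixed-effects model with a block term would be a natural alternative when between-block heterogeneity is expected.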