Abstract:
Microbial communities shape nearly every environment on Earth, from the human gut to soil and marine ecosystems, through dense networks of chemical interactions. These interactions are mediated by metabolites that microbes produce, modify, exchange, or degrade, influencing processes such as drug metabolism, nutrient cycling, and host physiology. Untargeted LC-MS/MS metabolomics provides a direct window into this chemical layer, yet interpreting the resulting high-dimensional, sparse, and largely unannotated data remains a major challenge. This thesis develops computational approaches that transform raw metabolomics data into mechanistic and ecological insight, focusing on three complementary goals: statistical exploration, biotransformation inference, and cross-omics integration.
Chapter 1 presents FBMN-STATS, a statistical workflow that integrates Feature-Based Molecular Networking (FBMN) with robust exploratory and differential analyses. Available as R, Python, and QIIME 2 workflows and as an interactive web application, FBMN-STATS guides users through data preprocessing and through multivariate and univariate analyses. It provides a reproducible and accessible framework for analyzing LC-MS/MS datasets within the GNPS ecosystem.
Chapter 2 introduces ChemProp2, a method to infer directional chemical relationships from time-resolved metabolomics data. While FBMN groups structurally related features, it lacks directional information. ChemProp2 addresses this by quantifying precursor-product patterns through anti-correlated temporal trajectories and cascade scoring, revealing multi-step transformation pathways. Applied to a gut synthetic community treated with 50 clinical drugs, ChemProp2 uncovered sequential degradation products and linked several transformations to microbial dynamics.
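The idea of reading direction from anti-correlated trajectories can be illustrated with a minimal sketch. This is not ChemProp2's actual scoring: the function name `direction_score`, the use of Pearson correlation, and the use of linear slopes to assign direction are all assumptions made for illustration only.

```python
import numpy as np

def direction_score(a, b, times):
    """Hypothetical score in [-1, 1] for a pair of feature trajectories.

    Positive values suggest a -> b (a is consumed while b accumulates),
    negative values suggest b -> a, and 0.0 means no anti-correlation.
    Illustrative only; not the scoring used by ChemProp2.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    t = np.asarray(times, dtype=float)

    # Flat trajectories carry no directional information.
    if a.std() == 0 or b.std() == 0:
        return 0.0

    # Pearson correlation between the two abundance trajectories.
    r = np.corrcoef(a, b)[0, 1]
    if r >= 0:
        return 0.0  # co-varying features are not treated as precursor/product

    # Linear trend of each trajectory over time (assumed proxy for direction).
    slope_a = np.polyfit(t, a, 1)[0]
    slope_b = np.polyfit(t, b, 1)[0]

    # The feature with the lower slope is taken as the putative precursor.
    sign = 1.0 if slope_a < slope_b else -1.0
    return float(sign * -r)

# Toy example: a drug that declines while a putative product accumulates.
times = [0, 1, 2, 3, 4]
drug = [100, 80, 55, 30, 10]
product = [0, 15, 40, 65, 90]
print(direction_score(drug, product, times))
```

Applying such a score to every edge of a molecular network would turn an undirected graph of structurally related features into a directed one, which is the kind of information a multi-step cascade analysis needs as input.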
Chapter 3 extends beyond metabolite-metabolite relationships and links chemical patterns to microbial ecology. Because metabolite trajectories are strongly influenced by microbial abundance dynamics, we developed CorrOmics to perform scalable, correlation-based integration of LC-MS/MS features with microbial profiles. The tool incorporates a target-decoy FDR strategy to reduce false positives and supports hierarchical binning to stabilize taxa-level interpretation. Benchmarking with a 13-strain synthetic community demonstrated the strengths and limitations of correlation-based cross-omics analysis, emphasizing the role of experimental design and normalization.
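A target-decoy FDR estimate for correlation-based integration can be sketched as follows. This is an illustration under stated assumptions, not the CorrOmics implementation: the function names are hypothetical, decoys are generated here by shuffling the sample order of the metabolite table (the abstract does not specify how decoys are built), and all columns are assumed to have non-zero variance.

```python
import numpy as np

def abs_correlations(X, Y):
    """Absolute Pearson r between every column of X and every column of Y.

    X: samples x metabolite features, Y: samples x taxa.
    Assumes no column is constant (non-zero variance).
    """
    Xz = (X - X.mean(axis=0)) / X.std(axis=0)
    Yz = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    return np.abs(Xz.T @ Yz) / X.shape[0]

def target_decoy_fdr(X, Y, threshold, seed=0):
    """Estimate the FDR at a correlation threshold (hypothetical helper).

    Decoy correlations come from a sample-shuffled copy of X, which breaks
    the pairing between metabolite and microbial profiles (an assumption).
    """
    rng = np.random.default_rng(seed)
    target = abs_correlations(X, Y)
    decoy = abs_correlations(X[rng.permutation(X.shape[0])], Y)

    n_target = int((target >= threshold).sum())
    n_decoy = int((decoy >= threshold).sum())
    # Decoy hits above the threshold approximate the false positives
    # expected among the target hits.
    fdr = min(1.0, n_decoy / n_target) if n_target else 0.0
    return n_target, n_decoy, fdr

# Toy data: 40 samples, 5 features, 3 taxa; taxon 0 tracks feature 0 exactly.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 5))
Y = rng.normal(size=(40, 3))
Y[:, 0] = X[:, 0]
n_target, n_decoy, fdr = target_decoy_fdr(X, Y, threshold=0.9)
```

In practice, the threshold would be scanned and the lowest value that keeps the estimated FDR below a chosen level (for example 5%) would be retained.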
Together, FBMN-STATS, ChemProp2, and CorrOmics form a cohesive workflow that advances metabolomics from statistical exploration to mechanistic inference and ecological integration. All tools are openly accessible via GNPS2, with source code provided through the Functional Metabolomics Lab GitHub.