SREL Reprint #3771

 

Machine-learning based approach to examine ecological processes influencing the diversity of riverine dissolved organic matter composition

Moritz Müller¹, Juliana D’Andrilli², Victoria Silverman³, Raven L. Bier⁴, Malcolm A. Barnard⁵˒⁶,
Miko Chang May Lee¹, Florina Richard¹˒⁷˒⁸, Andrew J. Tanentzap⁹, Jianjun Wang¹⁰,
Michaela de Melo¹¹, and YueHan Lu¹²

¹ Faculty of Engineering, Computing and Science, Swinburne University of Technology
Sarawak Campus, Kuching, Malaysia
² Department of Biological Sciences and the Advanced Environmental Research Institute,
University of North Texas, Denton, TX, USA
³ Woods Hole Oceanographic Institution, Woods Hole, MA, USA
⁴ Savannah River Ecology Laboratory, University of Georgia, Aiken, SC, USA
⁵ Center for Reservoir and Aquatic Systems Research and Department of Biology,
Baylor University, Waco, TX, USA
⁶ Institute of Marine Sciences and Department of Earth, Marine, and Environmental Sciences,
University of North Carolina at Chapel Hill, Morehead City, NC, USA
⁷ School of the Environment, The University of Queensland, Brisbane, QLD, Australia
⁸ CSIRO Environment, Brisbane, QLD, Australia
⁹ Ecosystems and Global Change Group, School of the Environment, Trent University,
Peterborough, ON, Canada
¹⁰ State Key Laboratory of Lake Science and Environment, Nanjing Institute of Geography and Limnology, Chinese Academy of Sciences, Nanjing, China
¹¹ Interuniversity Research Group in Limnology (GRIL), University of Quebec at Montreal,
Montreal, QC, Canada
¹² Molecular Eco-Geochemistry Laboratory, Department of Geological Sciences,
The University of Alabama, Tuscaloosa, AL, USA

Abstract: Dissolved organic matter (DOM) assemblages in freshwater rivers are formed from mixtures of simple to complex compounds that are highly variable across time and space. These mixtures largely form due to the environmental heterogeneity of river networks and the contribution of diverse allochthonous and autochthonous DOM sources. Most studies are, however, confined to local and regional scales, which precludes an understanding of how these mixtures arise at large (e.g., continental) spatial scales. The processes contributing to these mixtures are also difficult to study because of the complex interactions between various environmental factors and DOM. Here we propose the use of machine learning (ML) approaches to identify ecological processes contributing toward mixtures of DOM at a continental scale. We related a dataset that characterized the molecular composition of DOM from river water and sediment with Fourier-transform ion cyclotron resonance mass spectrometry to explanatory physicochemical variables such as nutrient concentrations and stable water isotopes (²H and ¹⁸O). Using unsupervised ML, distinctive clusters for sediment and water samples were identified, with unique molecular compositions influenced by environmental factors like terrestrial input and microbial activity. Sediment clusters showed a higher proportion of protein-like and unclassified compounds than water clusters, while water clusters exhibited a more diversified chemical composition. We then applied a supervised ML approach, involving a two-stage use of SHapley Additive exPlanations (SHAP) values. In the first stage, SHAP values were obtained and used to identify key physicochemical variables. These variables were employed to train models using both the default and subsequently tuned hyperparameters of the Histogram-based Gradient Boosting (HGB) algorithm. The supervised ML approach, using HGB and SHAP values, highlighted complex relationships between environmental factors and DOM diversity; in particular, the existence of dams upstream, precipitation events, and other watershed characteristics were important in predicting higher chemical diversity in DOM. Our data-driven approach can now be used more generally to reveal the interplay between physical, chemical, and biological factors in determining the diversity of DOM in other ecosystems.
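
To illustrate the idea behind SHAP-based variable ranking mentioned in the abstract, the following minimal Python sketch computes exact Shapley values for a single sample of a toy model. It is not the authors' pipeline: the toy model, its coefficients, the baseline of zeros, and the variable names `dam_upstream`, `precipitation`, and `nitrate` are hypothetical stand-ins chosen for illustration, and the sketch uses brute-force enumeration of feature coalitions rather than an HGB model or a SHAP library.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample `x` relative to `baseline`.

    Features outside a coalition are held at their baseline value.
    Cost grows exponentially with the number of features, so this is
    only practical for a handful of variables.
    """
    names = list(x)
    n = len(names)
    phi = {}
    for name in names:
        others = [f for f in names if f != name]
        total = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                without = {f: (x[f] if f in subset else baseline[f]) for f in names}
                with_f = dict(without)
                with_f[name] = x[name]
                # Marginal contribution of `name` to this coalition
                total += weight * (predict(with_f) - predict(without))
        phi[name] = total
    return phi


def toy_model(s):
    # Hypothetical "DOM diversity" response with one interaction term.
    # `nitrate` is deliberately given no effect.
    return (2.0 * s["dam_upstream"]
            + 1.0 * s["precipitation"]
            + 0.5 * s["dam_upstream"] * s["precipitation"])


sample = {"dam_upstream": 1.0, "precipitation": 1.0, "nitrate": 3.0}
baseline = {"dam_upstream": 0.0, "precipitation": 0.0, "nitrate": 0.0}

phi = shapley_values(toy_model, sample, baseline)

# Rank variables by absolute attribution, largest first
ranking = sorted(phi, key=lambda f: abs(phi[f]), reverse=True)
for feature in ranking:
    print(f"{feature}: {phi[feature]:.3f}")
```

In this toy setting the attributions sum to the difference between the model output for the sample and for the baseline, and the interaction term is split equally between the two interacting variables. A variable with no influence on the model receives an attribution of zero, which is the property that makes such values usable for screening explanatory variables.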

Keywords: DOM, river networks, FTICR-MS, molecular composition, random forest, cluster analysis, ecosystem properties, unsupervised machine learning

Müller, M., J. D'Andrilli, V. Silverman, R. L. Bier, M. A. Barnard, M. M. L. Chang, F. Richard, A. J. Tanentzap, J. Wang, M. de Melo, and Y. Lu. 2024. Machine-learning based approach to examine ecological processes influencing the diversity of riverine dissolved organic matter composition. Frontiers in Water 6(1379284).

 

This information was provided by the University of Georgia's Savannah River Ecology Laboratory (srel.uga.edu).