Title: Semantic Data Management for Synchrotron Science: Materials Data Science Ontology for FAIR, Learning, and Reasoning
Keywords: Materials Data Science, Ontology, FAIR, Learning, Reasoning
Abstract: With advances in data science, deep learning, and machine reasoning, the opportunities for Data-Centric AI have expanded rapidly. The foundations of Materials Data Science (MDS) computing integrate distributed computing (e.g., Cloudera, Hadoop) and high performance computing, neural network training engines such as Nvidia AISC, and RDF triplestore graph databases (e.g., Ontotext’s GraphDB); together they make statistical and deep learning on sparse, massive, and historical datasets possible. Study Protocols are an essential feature of robust MDS studies, where data FAIRification is enabled by an ontology (MDS-Onto [1]) that transforms messy data into linked data [2,3]. Our ontology strives to be a comprehensive mapping, integration, and transformation tool for semantically relevant concepts and their relationships across multiple materials domains and throughout the Study Stages of a scientific study: sample, tool, recipe, pre-processing, analysis, modeling, and results publishing. These stages guide the research process by allowing the linked data to be serialized as JSON-LD files for exchange and deserialized with Apache Arrow into language-independent dataframes for analysis. With FAIR linked data and MDS-Onto, we can prepare these data for semantic or machine reasoning.
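As a rough illustration of the serialize-then-deserialize round trip described above, the following minimal sketch wraps plain records in a JSON-LD document and reads them back into a column-oriented structure. It uses only the Python standard library. The namespace URL, the term names (sample, twoTheta, intensity), the type mds:XrayMeasurement, and the numeric values are all hypothetical placeholders, not terms or data taken from MDS-Onto.

```python
import json

# Hypothetical namespace; the real MDS-Onto IRIs are not given in the abstract.
MDS = "https://example.org/mds#"


def to_jsonld(records):
    """Wrap plain measurement records in a JSON-LD document (as a string)."""
    context = {
        "mds": MDS,
        # Hypothetical term mappings, for illustration only.
        "sample": "mds:sample",
        "twoTheta": "mds:twoTheta",
        "intensity": "mds:intensity",
    }
    graph = [
        {"@id": f"mds:measurement/{i}", "@type": "mds:XrayMeasurement", **rec}
        for i, rec in enumerate(records)
    ]
    return json.dumps({"@context": context, "@graph": graph}, indent=2)


def to_columns(jsonld_text):
    """Deserialize JSON-LD text into a dict holding one list per field."""
    doc = json.loads(jsonld_text)
    rows = doc["@graph"]
    # Every context key except the namespace prefix is treated as a data field.
    fields = [key for key in doc["@context"] if key != "mds"]
    return {field: [row.get(field) for row in rows] for field in fields}


# Made-up placeholder values, not experimental results.
records = [
    {"sample": "Ti-6Al-4V", "twoTheta": 35.1, "intensity": 1200.0},
    {"sample": "Ti-6Al-4V", "twoTheta": 38.4, "intensity": 860.0},
]
text = to_jsonld(records)
columns = to_columns(text)
print(columns["twoTheta"])
```

In practice the column dictionary would be replaced by an Apache Arrow table, for example by passing the rows to pyarrow's Table.from_pylist; the plain dictionary is used here only to keep the sketch free of third-party dependencies.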
In Synchrotron Science, we have initially focused on X-ray diffraction studies, where terabyte-scale datasets are commonly acquired during beamtime and scientist users have struggled to analyze the experimental results completely. Here we show how a Study Protocol approach, which formally defines the Study Stages from samples through synchrotron X-ray beamlines to analysis or modeling, enables complete data analysis and workflow automation using the mds:Xray domain ontology of MDS-Onto. We illustrate this for two materials studies, X-ray diffraction of Ti3Nb and of Ti-6Al-4V, performed at DOE’s Advanced Photon Source (APS) and NSF’s Cornell High Energy Synchrotron Source (CHESS), respectively. We automate the execution of these workflows using FAIRified data analyses, FAIRshake [4] (a Python package built on a TensorFlow data loader), and Apache Airflow.
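The idea of formally defined Study Stages driving workflow automation can be sketched as a small dependency graph executed in topological order. This is not the authors' Airflow pipeline or the FAIRshake API. The stage names come from the abstract, but the dependency edges, the run_study helper, and the placeholder actions are assumptions made for illustration.

```python
from graphlib import TopologicalSorter

# Stage names are from the abstract; the dependency edges are assumed.
# Each key maps to the set of stages that must finish before it starts.
STAGES = {
    "sample": set(),
    "tool": set(),
    "recipe": {"sample", "tool"},
    "pre-processing": {"recipe"},
    "analysis": {"pre-processing"},
    "modeling": {"analysis"},
    "results publishing": {"modeling"},
}


def run_study(stages, actions):
    """Run one action per stage in an order that respects the dependencies."""
    order = list(TopologicalSorter(stages).static_order())
    log = [actions[stage]() for stage in order]
    return order, log


# Hypothetical placeholder actions; a real study would call analysis code here.
actions = {name: (lambda n=name: f"ran {n}") for name in STAGES}

order, log = run_study(STAGES, actions)
for entry in log:
    print(entry)
```

A scheduler such as Apache Airflow expresses the same structure as a DAG of tasks, adding scheduling, retries, and logging on top of the ordering shown here.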
Authors of this Presentation: Roger H. French, Gabriel O. Ponon, Van D. Tran, Redad Mehdi, Jonah A. Bachman, Weiqi Yue, Kyle R. Henrikson, Frank Ernst, Neamul H. Khansur, Matthew A. Willard, Laura S. Bruckman, Yinghui Wu, Ozan Dernek, Quynh D. Tran, Erika I. Barcelos
Related Papers
1. B. P. Rajamohan et al., “Materials Data Science Ontology (MDS-Onto): Unifying Domain Knowledge in Materials and Applied Data Science,” Sci Data, vol. 12, p. 628, Apr. 2025, doi: 10.1038/s41597-025-04938-5.
2. Van D. Tran et al., “FAIRLinked: Transform research data into FAIR-compliant RDF Data.” July 2025. Available: https://pypi.org/project/FAIRLinked/
3. Gabriel O. Ponon, Balashanmuga Priyan Rajamohan, Hemant Sharma, and Roger H. French, “CEMENTO: A package to view and write ontologies directly from draw.io diagram files.” July 27, 2025. Available: https://github.com/cwru-sdle/CEMENTO.
4. Finley Holt, Daniel Savage, and Roger H. French, “FAIRshake: For the FAIR analysis of XRay Datasets.” Oct. 08, 2024. Available: https://github.com/cwru-sdle/FAIRshake.