Experimental Data-Driven Material Research
Experimental data plays a crucial role in advancing the discovery and design of molecules and materials. However, experimental datasets in chemistry and materials science are often limited, heterogeneous, and difficult to integrate due to differences in measurement conditions, reporting standards, and data formats.
In our group, we focus on developing data-driven approaches that leverage experimental data to improve machine learning models and materials discovery. Our research aims to collect, curate, and integrate experimental datasets from diverse sources, including literature, databases, and high-throughput experiments. By combining experimental observations with machine learning techniques, we seek to build models that better reflect real-world chemical systems.
Through the integration of experimental data with AI models and computational methods, we aim to bridge the gap between theoretical predictions and experimental outcomes. This approach enables more reliable materials discovery and helps accelerate the translation of computational insights into practical applications.
Extracting synthesis conditions, material properties, and experimental information from scientific literature using natural language processing and large language models.
Building high-quality datasets by collecting, cleaning, and standardizing experimental data from diverse sources such as literature, databases, and laboratory measurements.
Training machine learning models using experimental datasets to improve prediction accuracy and enable data-driven discovery of molecules and materials.
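As a minimal sketch of the cleaning and standardizing step described above, the example below converts extracted values to one canonical unit per property and sets aside records it cannot interpret. The `Measurement` record, the `CONVERTERS` table, the choice of canonical units, and the sample values are all hypothetical and serve only as illustration; they do not describe a specific pipeline.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    """Hypothetical record for one value extracted from a paper or database."""
    material: str
    prop: str
    value: float
    unit: str


# Assumed conventions: one canonical unit per property, with a converter per source unit.
CONVERTERS = {
    "temperature": ("K", {"K": lambda v: v, "C": lambda v: v + 273.15}),
    "time": ("h", {"h": lambda v: v, "min": lambda v: v / 60.0, "d": lambda v: v * 24.0}),
}


def standardize(m: Measurement) -> Measurement:
    """Return a copy of m expressed in the canonical unit for its property."""
    canonical_unit, table = CONVERTERS[m.prop]  # KeyError if the property is unknown
    if m.unit not in table:
        raise ValueError(f"unsupported unit {m.unit!r} for {m.prop!r}")
    return Measurement(m.material, m.prop, table[m.unit](m.value), canonical_unit)


def standardize_all(records):
    """Split records into standardized ones and ones needing manual review."""
    clean, rejected = [], []
    for r in records:
        try:
            clean.append(standardize(r))
        except (KeyError, ValueError):
            rejected.append(r)
    return clean, rejected


# Made-up sample records, for illustration only.
records = [
    Measurement("MOF-5", "temperature", 100.0, "C"),
    Measurement("MOF-5", "time", 90.0, "min"),
    Measurement("MOF-5", "temperature", 212.0, "F"),  # unit not in the table
]
clean, rejected = standardize_all(records)
```

Keeping rejected records rather than dropping them silently makes it possible to review ambiguous or unusual reports by hand before they enter a training set.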
Selected papers
[1] Y. Kang#, W. Lee#, T. Bae#, S. Han, H. Jang, and J. Kim*
Harnessing Large Language Models to Collect and Analyze Metal–Organic Framework Property Data Set
Journal of the American Chemical Society, 2025
[2] H. Park#, Y. Park#, W. Choi, and J. Kim*
Mining Insights on Metal–Organic Framework Synthesis from Scientific Literature Texts
Journal of Chemical Information and Modeling, 2022