Sangkeun (Matt) Lee
Hillary K. Fishler
Thomaz Carvalhaes
Minsu Kim
Big data, machine learning, artificial intelligence, and data science technologies have driven breakthroughs across diverse fields by enabling innovative ways to integrate, reuse, and analyze vast datasets. These successes have inspired scientists in physics, chemistry, materials science, and medicine to explore how these tools can advance their research. However, challenges remain. Many existing software tools and systems were not designed for scientific research or the specific needs of scientists. Scientists without programming or computer science expertise may struggle to use these tools effectively, while computer scientists may lack the domain knowledge needed to address field-specific problems.
This workshop aims to bridge the gap between domain scientists and computer/data scientists. It will foster collaboration by exploring tools, systems, and methodologies to enhance scientific discovery. Participants will share success stories, discuss lessons learned, and address challenges to promote effective cross-disciplinary partnerships.
The workshop will focus on the following questions:
• How do big data, machine learning, data science, and AI tools specifically designed for scientific research differ from traditional analytical tools?
• What challenges do scientists face in terms of capturing, representing, maintaining, integrating, validating, and extrapolating data for scientific discovery?
• What are the unique needs and hurdles domain scientists encounter when incorporating big data tools into their research?
• How can computer scientists and domain scientists collaboratively identify and define research problems more effectively?
• What obstacles hinder the application of big data in scientific discovery, and how do these challenges vary across different scientific domains?
• How can big data technologies be leveraged to improve the accuracy and reliability of scientific experiments and simulations?
• What influence does big data exert on the scientific method, and how can we ensure its utilization promotes scientific integrity and reproducibility?
Research Topics Included in the Workshop:
Big data tools, systems, and methods related to, but not limited to:
Scientific data processing (integration, standardization, sampling, etc.)
Artificial intelligence and machine learning
Text and graph mining
Database management, query processing, and query optimization
Parallel computation and high-performance computing
Visualization and user interface/HCI
Parallelization, performance, and scalability of data tools
Uncertainty quantification
Combinational usage of simulation, experiment, machine learning models, and data
Data fusion
which facilitate innovation and discovery in scientific domains such as:
Submissions describing use cases, success stories, ongoing research with interesting questions, and lessons learned in scientific discovery using big data tools, systems, and methods are highly encouraged.
We also welcome tutorial papers that demonstrate the application of useful tools for processing big data in scientific research and that can be shared with the research community. We seek submissions that highlight the practicality of these tools and clearly show their utility, as well as submissions that describe innovative tools designed to accelerate scientific discovery through big data.
Program Committee Members
Please submit a short paper (minimum 4 pages, up to 6 pages, in IEEE 2-column format) or a full paper (minimum 8 pages, up to 10 pages, in IEEE 2-column format) through the online submission system. Submissions undergo single-blind review.
https://wi-lab.com/cyberchair/2024/bigdata24/scripts/submit.php?subarea=S23&undisplay_detail=1&wh=/cyberchair/2024/bigdata24/scripts/ws_submit.php
Papers should be formatted according to the IEEE Computer Society Proceedings Manuscript Formatting Guidelines (see link to "formatting instructions" below).
Formatting Instructions
8.5" x 11" (DOC, PDF)
LaTeX Formatting Macros
Oct 1, 2025: Due date for full workshop paper submissions
Nov 4, 2025: Notification of paper acceptance to authors
Nov 23, 2025: Camera-ready versions of accepted papers
Dec 8-11, 2025: Workshops
Sangkeun Matthew Lee earned his Ph.D. in Computer Science and Engineering from Seoul National University in 2012 and joined Oak Ridge National Laboratory (ORNL) in 2013 as a postdoctoral research associate. He currently serves as a Senior Research Staff member in the Critical Infrastructure Resilience Group within the Geospatial Science and Human Security Division of the National Security Sciences Directorate. Lee has led interdisciplinary research in data analytics for power systems, building science, materials science, and medical sciences. His work supports ORNL’s mission by advancing energy resilience analytics and developing high-impact publications and software tools for scientific and governmental applications.
Thomaz Carvalhaes is an interdisciplinary scientist and R&D Associate in the Critical Infrastructure Resilience Group at Oak Ridge National Laboratory. His work focuses on grid energy infrastructure as complex systems that face unexpected disruptions and a rapidly changing future. He has worked on a range of disaster resilience and infrastructure-related projects, using a combination of quantitative and qualitative approaches, including geospatial data analytics and modeling, to develop impactful infrastructure resilience decision tools and datasets. He currently supports several energy resilience and reliability projects, including providing data-driven and direct technical assistance to states to guide infrastructure investments and developing resilience-as-service capabilities together with local utilities.
Hillary K. Fishler serves as a research scientist in the Critical Infrastructure Resilience Group at Oak Ridge National Laboratory. Her current research centers on coupled human-natural systems related to electric grid infrastructure and ancillary services. Dr. Fishler’s passion for stakeholder engagement is driven by applying team science to data-driven narratives for actionable infrastructure and industry applications. Her multidisciplinary work supports current projects related to data governance, risk management, information security, social sciences, land-use planning, ecology, and vegetation management.
Minsu Kim, a research scientist at Oak Ridge National Laboratory, is deeply engaged in the application of advanced AI technologies, especially transformer models, to the study of Electronic Health Records (EHR) and genomic data within the field of bioinformatics. His focused efforts are geared towards refining healthcare analytics, with an aim to bring about incremental improvements in patient care through more accurate predictive modeling. Kim’s significant work has garnered attention in respected scientific journals, including BMC Medical Genomics, Scientific Reports, Clinical Cancer Research, and PLOS ONE, highlighting his contributions to the evolving landscape of medical research and data analysis. By integrating sophisticated AI methods into his research, Kim is dedicated to enhancing the interpretability and utility of complex biomedical datasets, thus supporting the ongoing advancement of healthcare technology and personalized medicine approaches.