The 5th International Workshop on Big Data & AI Tools, Models, and Use Cases for Innovative Scientific Discovery (BTSD) 2024
Workshop Date/Time: TBD
Conference Date: December 15-18, 2024, Washington DC, USA
Call for Papers
Program Chairs
Sangkeun (Matt) Lee
Jong Youl Choi
Anika Tabassum
Minsu Kim
Introduction to Workshop
Big data, machine learning, artificial intelligence, and data science technologies have paved the way for numerous success stories across various fields. They offer innovative methods to integrate, reuse, and analyze extensive data volumes. These achievements have encouraged scientists in disciplines such as physics, chemistry, materials science, and medicine to investigate how these tools can enhance scientific research.
However, realizing these potential benefits comes with its own set of challenges. Many existing software tools and systems were not designed with scientific research or the unique needs of scientists in mind. Moreover, scientists who lack programming or computer science expertise may find these tools difficult to use. Conversely, computer scientists may struggle to grasp domain-specific issues without adequate background knowledge.
This workshop is designed to bridge the gap between domain scientists and computer scientists. It aims to explore avenues for creating and using tools, systems, and methodologies that advance scientific discovery. Participants will exchange success stories, share lessons learned, and discuss the challenges that must be addressed to foster fruitful collaborations across different fields.
The workshop will focus on the following questions:
How do big data, machine learning, data science, and AI tools specifically designed for scientific research differ from traditional analytical tools?
What challenges do scientists face in terms of capturing, representing, maintaining, integrating, validating, and extrapolating data for scientific discovery?
What are the unique needs and hurdles domain scientists encounter when incorporating big data tools into their research?
How can computer scientists and domain scientists collaboratively identify and define research problems more effectively?
What obstacles hinder the application of big data in scientific discovery, and how do these challenges vary across different scientific domains?
How can big data technologies be leveraged to improve the accuracy and reliability of scientific experiments and simulations?
What influence does big data exert on the scientific method, and how can we ensure its utilization promotes scientific integrity and reproducibility?
Research Topics Included in the Workshop:
Big data tools, systems, and methods related to, but not limited to, the following topics:
Scientific data processing (integration, standardization, sampling, etc.)
Artificial intelligence and machine learning
Text and graph mining
Database management, query processing, and query optimization
Parallel computation and high-performance computing
Visualization and user interface/HCI
Parallelization, performance, and scalability of data tools
Uncertainty quantification
Combined use of simulations, experiments, machine learning models, and data
Data fusion
which facilitate innovation and discovery in scientific domains such as:
Physics
Chemistry
Materials science
Mechanical engineering
Nuclear engineering
National security
Biomedical science, and more.
Tutorials
We are organizing a demo and tutorial session for this year's workshop to broaden its reach and attract more researchers. This session will offer attendees the opportunity to showcase their work, exchange ideas, and receive hands-on training from experts in the field. We welcome tutorial submissions in a short-paper format. Tutorials can be either 30 minutes or 1 hour long. Topics include, but are not limited to, scientific tools and software, AI/ML applications in science, and data processing and management for large-scale scientific data.
Program Committee Members
Feng Bao - Florida State University, USA
Sisi Duan - Tsinghua University, China
Guimu Guo - Rowan University, USA
Wontack Han - Icahn School of Medicine at Mount Sinai, USA
Ramkrishnan Kannan - Oak Ridge National Laboratory, USA
Kangil Kim - Gwangju Institute of Science and Technology, South Korea
Youngjae Kim - Sogang University, South Korea
Ralph Kube - Zap Energy, USA
Ohyung Kwon - Korea University of Technology and Education, South Korea
Sangsoo Lim - Dongguk University, South Korea
Ji Hwan Moon - Samsung Genome Institute, South Korea
Muralidhar Nikhil - Stevens Institute of Technology, USA
Minwoo Pak - Weill Cornell Medical College, USA
Jaehui Park - University of Seoul, South Korea
Sungjoon Park - Seoul National University, South Korea
Jian Peng - Wuhan University of Technology, China
Dongwon Shin - Oak Ridge National Laboratory, USA
Seungha Shin - University of Tennessee, USA
Yang Zhou - Auburn University, USA
Paper Submission
Please submit a short paper (minimum 4 pages, up to 6 pages, in IEEE two-column format) or a full paper (minimum 8 pages, up to 10 pages, in IEEE two-column format) through the online submission system.
Papers should be formatted according to the IEEE Computer Society Proceedings Manuscript Formatting Guidelines (see the "Formatting Instructions" link below).
Formatting Instructions
Important Dates
Oct 1, 2024: Submission deadline for full workshop papers
Nov 4, 2024: Notification of paper acceptance to authors
Nov 20, 2024: Camera-ready versions of accepted papers due
Dec 15-18, 2024: Workshops
Presentation Preparation
TBD
Registration
TBD
Workshop Primary Contact
Sangkeun (Matt) Lee <lees4@ornl.gov>
Jong Youl Choi <choij@ornl.gov>
Anika Tabassum <tabassuma@ornl.gov>
Minsu Kim <kimm@ornl.gov>
Organizers’ Background
Sangkeun (Matt) Lee obtained his Ph.D. degree in computer science and engineering from Seoul National University in 2012. He is currently an R&D Associate in the Computer Science and Mathematics Division at Oak Ridge National Laboratory. His research focuses on big data, data science, and machine learning, and he has applied state-of-the-art data analysis technologies to various application domains. Dr. Lee has developed numerous data analytics software tools, including ORiGAMI, which won the 2016 DOE R&D 100 Award. He has also made significant contributions to leading computer science conferences and journals, such as ACM WWW, ACM RecSys, and Expert Systems with Applications. Over the past few years, Dr. Lee has collaborated with scientists from various disciplines, including materials science, nuclear science, and mechanical engineering, and has published research articles in esteemed scientific journals such as the Journal of Nuclear Materials, Acta Materialia, The Electricity Journal, and Advanced Theory and Simulations. His interdisciplinary research has advanced knowledge in data science and machine learning applications, and he continues to lead innovative research projects in his field.
Jong Youl (Jong) Choi is a researcher in the Discrete Algorithms Group, Computer Science and Mathematics Division, Oak Ridge National Laboratory (ORNL), Oak Ridge, Tennessee, USA. He earned his Ph.D. degree in Computer Science from Indiana University Bloomington in 2012 and his MS degree in Computer Science from New York University in 2004. His research interests span data mining and machine learning algorithms, high-performance data-intensive computing, and parallel and distributed systems. More specifically, he focuses on researching and developing data-centric machine learning algorithms for large-scale data management, in situ/in-transit data processing, and data management for code coupling. Jong Choi actively serves on conference program committees and as a journal reviewer for venues such as ParaMo, CCPE, and CLUS.
Anika Tabassum is currently a postdoctoral researcher at Oak Ridge National Laboratory, where she contributes to deep learning for multi-scale and multimodal battery analytics and plasma simulation for fusion energy. Her research focuses on developing deep learning models for robust scientific computing; specifically, she works on knowledge-guided ML and scientific ML. She received her Ph.D. from the Department of Computer Science at Virginia Tech, where she worked on applying knowledge-guided ML to multiple challenges in power system failures and clean energy. Her Ph.D. research was funded by an NSF Urban Computing fellowship. Apart from her primary research focus, she also worked on designing a COVID-19 forecasting model for the CDC challenge. She has published in multiple venues such as ACM SIGKDD, AAAI, CIKM, IEEE BigData, and IAAI, and in journals such as ACM TIST and journals published by Elsevier. She received her bachelor's degree in Computer Science and Engineering from the Bangladesh University of Engineering and Technology.
Minsu Kim, a research scientist at Oak Ridge National Laboratory, applies advanced AI technologies, especially transformer models, to the study of Electronic Health Records (EHR) and genomic data in the field of bioinformatics. His work aims to improve healthcare analytics and, through more accurate predictive modeling, to bring about incremental improvements in patient care. His research has appeared in respected scientific journals, including BMC Medical Genomics, Scientific Reports, Clinical Cancer Research, and PLOS ONE, contributing to the evolving landscape of medical research and data analysis. By integrating sophisticated AI methods into his research, Kim seeks to enhance the interpretability and utility of complex biomedical datasets, supporting the ongoing advancement of healthcare technology and personalized medicine.