- Sangkeun (Matt) Lee 
- Jong Youl Choi 
- Anika Tabassum 
- Minsu Kim 
Big data, machine learning, artificial intelligence, and data science technologies have paved the way for numerous success stories across various fields. They offer innovative methods to integrate, reuse, and analyze extensive data volumes. These achievements have encouraged scientists in disciplines such as physics, chemistry, materials science, and medicine to investigate how these tools can enhance scientific research.
However, realizing the potential benefits comes with its set of challenges. Many of the existing software tools and systems were not designed with scientific research or the unique needs of scientists in mind. Moreover, scientists who lack programming or computer science expertise may find these tools difficult to use. Conversely, computer scientists might face hurdles in grasping domain-specific issues without adequate background knowledge.
This workshop is designed to bridge the gap between domain scientists and computer scientists. It aims to explore avenues for creating and utilizing tools, systems, and methodologies to advance scientific discovery. Participants will exchange success stories, share insights from lessons learned, and tackle the challenges that need to be addressed to foster fruitful collaborations across different fields.
The workshop will focus on the following questions:
- How do big data, machine learning, data science, and AI tools specifically designed for scientific research differ from traditional analytical tools? 
- What challenges do scientists face in terms of capturing, representing, maintaining, integrating, validating, and extrapolating data for scientific discovery? 
- What are the unique needs and hurdles domain scientists encounter when incorporating big data tools into their research? 
- How can computer scientists and domain scientists collaboratively identify and define research problems more effectively? 
- What obstacles hinder the application of big data in scientific discovery, and how do these challenges vary across different scientific domains? 
- How can big data technologies be leveraged to improve the accuracy and reliability of scientific experiments and simulations? 
- What influence does big data exert on the scientific method, and how can we ensure its utilization promotes scientific integrity and reproducibility? 
Research Topics Included in the Workshop:
Big data tools, systems, and methods related to, but not limited to:
- Scientific data processing (integration, standardization, sampling, etc.) 
- Artificial intelligence and machine learning 
- Text and graph mining 
- Database management, query processing, and query optimization 
- Parallel computation and high-performance computing 
- Visualization and user interface/HCI 
- Parallelization, performance, and scalability of data tools 
- Uncertainty quantification 
- Combinational usage of simulation, experiment, machine learning models, and data 
- Data fusion 
which facilitate innovation and discovery in scientific domains such as:
- Physics 
- Chemistry 
- Materials science 
- Mechanical engineering 
- Nuclear engineering 
- National security 
- Biomedical science, and more 
Tutorials
We are organizing a demo and tutorial session for this year's workshop to broaden its reach and attract more researchers. This session will offer attendees the opportunity to showcase their work, exchange ideas, and receive hands-on training from field experts. We welcome tutorial submissions in a short-paper format. Tutorials can be either 30 minutes or 1 hour long. Topics will include, but are not limited to, scientific tools and software, AI/ML applications in science, and data processing and management for large-scale scientific data.
Program Committee Members 
- Feng Bao - Florida State University, USA 
- Sisi Duan - Tsinghua University, China 
- Guimu Guo - Rowan University, USA 
- Wontack Han - Icahn School of Medicine at Mount Sinai, USA 
- Ramkrishnan Kannan - Oak Ridge National Laboratory, USA 
- Kangil Kim - Gwangju Institute of Science and Technology, South Korea 
- Youngjae Kim - Sogang University, South Korea 
- Ralph Kube - Zap Energy, USA 
- Ohyung Kwon - Korea University of Technology and Education, South Korea 
- Sangsoo Lim - Dongguk University, South Korea 
- Ji Hwan Moon - Samsung Genome Institute, South Korea 
- Nikhil Muralidhar - Stevens Institute of Technology, USA 
- Minwoo Pak - Weill Cornell Medical College, USA 
- Jaehui Park - University of Seoul, South Korea 
- Sungjoon Park - Seoul National University, South Korea 
- Jian Peng - Wuhan University of Technology, China 
- Dongwon Shin - Oak Ridge National Laboratory, USA 
- Seungha Shin - University of Tennessee, USA 
- Yang Zhou - Auburn University, USA 
Please submit a short paper (minimum 4 pages, up to 6 pages, in IEEE 2-column format) or a full paper (minimum 8 pages, up to 10 pages, in IEEE 2-column format) through the online submission system. Submissions undergo single-blind review.
https://wi-lab.com/cyberchair/2024/bigdata24/scripts/submit.php?subarea=S23&undisplay_detail=1&wh=/cyberchair/2024/bigdata24/scripts/ws_submit.php
Papers should be formatted according to the IEEE Computer Society Proceedings Manuscript Formatting Guidelines (see the link to "formatting instructions" below). 
Formatting Instructions
8.5" x 11" (DOC, PDF) 
LaTeX Formatting Macros
- Oct 7, 2024 (extended from Oct 1, 2024): Due date for workshop paper submission. Many authors requested an extension, so please adhere to the new deadline. 
- Nov 8, 2024 (delayed from Nov 4, 2024 due to the submission deadline extension and review delay; we apologize for any inconvenience): Notification of paper acceptance to authors 
- Nov 17, 2024: Camera-ready of accepted papers 
- Dec 15: Washington DC, USA (Full Day Workshop, Time TBD) 
A Generalized Outage Prediction Model for Various Types of Extreme Climate Events in Texas 
- Authors: Jangjae Lee (Texas A&M University), Sangkeun Lee (Oak Ridge National Laboratory), Stephanie Paal (Texas A&M University), Supriya Chinthavali (Oak Ridge National Laboratory) 
A Decision Support System to Compile Environmental Mitigations from Hydropower Licensing Documents
- Authors: Hong-Jun Yoon (Oak Ridge National Laboratory), Tom Ruggles (Oak Ridge National Laboratory), Huanhuan Zhao (University of Tennessee), Debjani Singh (Oak Ridge National Laboratory) 
A Deep Learning Approach to Maximizing Electrostatic Sieve Efficiency in Regolith Beneficiation
- Authors: Kalpit Vadnerkar (Clemson University), Amen Eze (Missouri University of Science and Technology), Rinoj Gautam (University of North Texas), Daoru Han (Missouri University of Science and Technology), Xin Liang (University of Kentucky), Tong Shu (University of North Texas) 
A framework for compressing unstructured scientific data via serialization
- Authors: Viktor Reshniak (Oak Ridge National Laboratory), Qian Gong (Oak Ridge National Laboratory), Rick Archibald (Oak Ridge National Laboratory), Scott Klasky (Oak Ridge National Laboratory), Norbert Podhorszki (Oak Ridge National Laboratory) 
AAD-LLM: Adaptive Anomaly Detection Using Large Language Models
- Authors: Alicia Russell-Gilbert (Mississippi State University), Alexander Sommers (Mississippi State University), Shahram Rahimi (Mississippi State University), Sudip Mittal (Mississippi State University) 
Discovering Propagating Signals in High-Content Multivariate Time Series via Spatio-Temporal Subsequence Clustering
- Authors: Jan David Hüwel (FernUniversität in Hagen), Georg Stefan Schlake (FernUniversität in Hagen), Kevin Albrechts, Christian Beecks (FernUniversität in Hagen) 
DISTRI: Development and Integration of Simulation Tools for Resilient Infrastructure
- Authors: Imtiaz Mahmud (Lawrence Berkeley National Laboratory), Pawel Zuk (University of Southern California), Cong Wang (RENCI), Mariam Kiran (Oak Ridge National Laboratory), Kesheng Wu (Lawrence Berkeley National Laboratory), Komal Thareja (RENCI), Krishnan Raghavan (Argonne National Laboratory), Anirban Mandal (RENCI), Ewa Deelman (University of Southern California) 
Exploration of TPU Architectures for the Optimized Transformer in Drainage Crossing Detection
- Authors: Amirhossein Nazeri (Clemson University), Denys Godwin (Clark University), Aikaterini Panteleaki (Southern Illinois University), Iraklis Anagnostopoulos (Southern Illinois University), Michael Edidem (Southern Illinois University), Ruopu Li (Southern Illinois University), Tong Shu (University of North Texas) 
LOCOS: A cosine based local gene expression pattern finding algorithm on time-series data
Model and Data Management for Machine Learning (M2ML): Integrating Instruments, Edge and HPC for Accelerated Machine Learning
- Authors: Weijian Zheng (Argonne National Laboratory), Hemant Sharma (Argonne National Laboratory), Ryan Chard (Argonne National Laboratory), Peter Kenesei (Argonne National Laboratory), Jun-Sang Park (Argonne National Laboratory), Nicholas Schwarz (Argonne National Laboratory), Antonino Miceli (Argonne National Laboratory), Ian T. Foster (Argonne National Laboratory), Rajkumar Kettimuthu (Argonne National Laboratory) 
Modeling Lunar Surface Charging Using Physics-Informed Neural Networks
- Authors: Niloofar Zendehdel (Missouri University of Science and Technology), Adib Mosharrof (University of Kentucky), Katherine Delgado (University of Kentucky), Daoru Han (Missouri University of Science and Technology), Xin Liang (University of Kentucky), Tong Shu (University of North Texas) 
Multivariate Data Augmentation for Predictive Maintenance using Diffusion
- Authors: Andrew Thompson (Mississippi State University), Alexander Sommers (Mississippi State University), Alicia Russell-Gilbert (Mississippi State University), Logan Cummins (Mississippi State University), Sudip Mittal (Mississippi State University), Shahram Rahimi (Mississippi State University), Maria Seale (U.S. Army Engineer Research and Development Center), Joseph Jaboure (U.S. Army Engineer Research and Development Center), Thomas Arnold (U.S. Army Engineer Research and Development Center), Joshua Church (U.S. Army Engineer Research and Development Center) 
Multivariate Time Series Clustering for Environmental State Characterization of Ground-Based Gravitational-Wave Detectors
- Authors: Rutuja Gurav, Isaac Kelly, Pooyan Goodarzi (University of California, Riverside), Anamaria Effler (LIGO Livingston Observatory), Barry Barish (University of California, Riverside), Evangelos Papalexakis (University of California, Riverside), Jonathan Richardson 
Privacy Preserving Federated Learning for Advanced Scientific Ecosystems
- Authors: Rick Archibald (Oak Ridge National Laboratory), Addi Malviya Thakur (Oak Ridge National Laboratory), Marshall McDonnell (Oak Ridge National Laboratory), Gregory Cage (Oak Ridge National Laboratory), Cody Stiner (Oak Ridge National Laboratory), Lance Drane (Oak Ridge National Laboratory), Michael Brim (Oak Ridge National Laboratory), Paul Laiu (Oak Ridge National Laboratory), Mathieu Doucet (Oak Ridge National Laboratory), William Heller (Oak Ridge National Laboratory), Ryan Coffee (SLAC National Accelerator Laboratory) 
SLWM: A Library for Implementing Complex Training Workflows for surrogates of MPC’s
- Authors: Stijn Bellis (University of Antwerp), Joachim Denil (University of Antwerp), Ramesh Krishnamurthy (University of Antwerp), Guillermo Pérez (University of Antwerp) 
Toward Smart Scheduling in Tapis
Tuning the interpolation basis in a multigrid decomposition for local error control
- Authors: Nicolas Vidal (Oak Ridge National Laboratory), Qian Gong (Oak Ridge National Laboratory), Viktor Reshniak (Oak Ridge National Laboratory), Scott Klasky (Oak Ridge National Laboratory) 
- The final camera-ready paper submission deadline is Nov 17, 2024. Please do not miss this deadline; otherwise, your paper will not be published in the conference proceedings. Please follow this URL for the camera-ready paper submission: https://wi-lab.com/cyberchair/2024/bigdata24/scripts/BigData_2024_Camera_ready_instruction.php?subarea=S 
- Please prepare a 15-minute presentation for your accepted paper, followed by a 5-minute Q&A session. 
- Our workshop will be held on December 15. A detailed workshop schedule will be announced in early December. 
Virtual Presentation
Those presenting virtually should join the Teams meeting 20 minutes before their presentation time. 
- Sangkeun (Matt) Lee obtained his Ph.D. degree in computer science and engineering from Seoul National University in 2012. He is currently an R&D Associate in the Computer Science and Mathematics Division at Oak Ridge National Laboratory. His research focuses on big data, data science, and machine learning, and he has applied state-of-the-art data analysis technologies to various application domains. Dr. Lee has developed numerous data analytics software tools, including ORiGAMI, which won the 2016 DOE R&D 100 Award. He has also contributed to leading computer science conferences and journals, such as ACM WWW, ACM RecSys, and Expert Systems with Applications. Over the past few years, Dr. Lee has collaborated with scientists from various disciplines, including materials science, nuclear science, and mechanical engineering, and published research articles in scientific journals such as the Journal of Nuclear Materials, Acta Materialia, The Electricity Journal, and Advanced Theory and Simulations. His interdisciplinary research has advanced knowledge in data science and machine learning applications, and he continues to lead innovative research projects in his field. 
- Jong Youl (Jong) Choi is a researcher in the Discrete Algorithms Group, Computer Science and Mathematics Division, Oak Ridge National Laboratory (ORNL), Oak Ridge, Tennessee, USA. He earned his Ph.D. degree in Computer Science from Indiana University Bloomington in 2012 and his MS degree in Computer Science from New York University in 2004. His research interests span data mining and machine learning algorithms, high-performance data-intensive computing, and parallel and distributed systems. More specifically, he researches and develops data-centric machine learning algorithms for large-scale data management, in situ/in-transit data processing, and data management for code coupling. He actively serves on committees and as a reviewer for conferences and journals such as ParaMo, CCPE, and CLUS. 
- Anika Tabassum is a research scientist at Oak Ridge National Laboratory, where she contributes to developing deep learning for multi-scale data arising from energy research. Her research interests broadly lie in domain-guided, domain-adaptation, and domain-generalization machine learning models for scientific data. Recently, she has been focusing on foundation model capabilities for generalizing and solving science problems. She was selected as a Rising Star 2023 by UT Austin and received an outstanding postdoctoral award from her division at ORNL in 2022. She received her Ph.D. from the Department of Computer Science at Virginia Tech, where she worked on bringing domain-guided ML to address multiple challenges in preparing for and mitigating power system failures and disaster vulnerabilities. Her Ph.D. research was funded by an NSF Urban Computing fellowship. She won 1st prize for designing the COVID-19 forecasting model in the Facebook-CDC challenge. She has published in multiple venues such as NeurIPS, AAAI, ACM SIGKDD, CIKM, IEEE BigData, and IAAI, and in journals such as ACM TIST and Elsevier publications. 
- Minsu Kim, a research scientist at Oak Ridge National Laboratory, is deeply engaged in the application of advanced AI technologies, especially transformer models, to the study of Electronic Health Records (EHR) and genomic data within the field of bioinformatics. His focused efforts are geared towards refining healthcare analytics, with an aim to bring about incremental improvements in patient care through more accurate predictive modeling. Kim’s significant work has garnered attention in respected scientific journals, including BMC Medical Genomics, Scientific Reports, Clinical Cancer Research, and PLOS ONE, highlighting his contributions to the evolving landscape of medical research and data analysis. By integrating sophisticated AI methods into his research, Kim is dedicated to enhancing the interpretability and utility of complex biomedical datasets, thus supporting the ongoing advancement of healthcare technology and personalized medicine approaches.