The 5th International Workshop on Big Data & AI Tools, Models, and Use Cases for Innovative Scientific Discovery (BTSD) 2024

Workshop Date/Time: December 15, 2024, Washington DC, USA (Full Day Workshop)
Conference Date: December 15-18, 2024, Washington DC, USA

Call for Papers

Program Chairs

Sangkeun (Matt) Lee
Jong Youl Choi
Anika Tabassum
Minsu Kim

Introduction to Workshop

Big data, machine learning, artificial intelligence, and data science technologies have paved the way for numerous success stories across various fields. They offer innovative methods to integrate, reuse, and analyze extensive data volumes. These achievements have encouraged scientists in disciplines such as physics, chemistry, materials science, and medicine to investigate how these tools can enhance scientific research.

However, realizing the potential benefits comes with its own set of challenges. Many existing software tools and systems were not designed with scientific research or the unique needs of scientists in mind. Moreover, scientists who lack programming or computer science expertise may find these tools difficult to use. Conversely, computer scientists might face hurdles in grasping domain-specific issues without adequate background knowledge.

This workshop is designed to bridge the gap between domain scientists and computer scientists. It aims to explore avenues for creating and utilizing tools, systems, and methodologies to advance scientific discovery. Participants will exchange success stories, share insights from lessons learned, and tackle the challenges that need to be addressed to foster fruitful collaborations across different fields.

The workshop will focus on the following questions:

How do big data, machine learning, data science, and AI tools specifically designed for scientific research differ from traditional analytical tools?
What challenges do scientists face in terms of capturing, representing, maintaining, integrating, validating, and extrapolating data for scientific discovery?
What are the unique needs and hurdles domain scientists encounter when incorporating big data tools into their research?
How can computer scientists and domain scientists collaboratively identify and define research problems more effectively?
What obstacles hinder the application of big data in scientific discovery, and how do these challenges vary across different scientific domains?
How can big data technologies be leveraged to improve the accuracy and reliability of scientific experiments and simulations?
What influence does big data exert on the scientific method, and how can we ensure its utilization promotes scientific integrity and reproducibility?

Research Topics Included in the Workshop:

Big data tools, systems, and methods related to, but not limited to:

Scientific data processing (integration, standardization, sampling, etc.)
Artificial intelligence and machine learning
Text and graph mining
Database management, query processing, and query optimization
Parallel computation and high-performance computing
Visualization and user interface/HCI
Parallelization, performance, and scalability of data tools
High-performance computing
Uncertainty quantification
Combinational usage of simulation, experiment, machine learning models, and data
Data fusion

which facilitate innovation and discovery in scientific domains such as:

Physics
Chemistry
Materials science
Mechanical engineering
Nuclear engineering
National security
Biomedical science, and more.

Tutorials

We are organizing a demo and tutorial session for this year's workshop to broaden its reach and attract more researchers. This session will offer attendees the opportunity to showcase their work, exchange ideas, and receive hands-on training from field experts. We welcome tutorial submissions in a short-paper format. Tutorials can be either 30 minutes or 1 hour long. Topics will include, but are not limited to, scientific tools and software, AI/ML applications in science, and data processing and management for large-scale scientific data.

Program Committee Members

Feng Bao - Florida State University, USA
Sisi Duan - Tsinghua University, China
Guimu Guo - Rowan University, USA
Wontack Han - Icahn School of Medicine at Mount Sinai, USA
Ramkrishnan Kannan - Oak Ridge National Laboratory, USA
Kangil Kim - Gwangju Institute of Science and Technology, South Korea
Youngjae Kim - Sogang University, South Korea
Ralph Kube - Zap Energy, USA
Ohyung Kwon - Korea University of Technology and Education, South Korea
Sangsoo Lim - Dongguk University, South Korea
Ji Hwan Moon - Samsung Genome Institute, South Korea
Nikhil Muralidhar - Stevens Institute of Technology, USA
Minwoo Pak - Weill Cornell Medical College, USA
Jaehui Park - University of Seoul, South Korea
Sungjoon Park - Seoul National University, South Korea
Jian Peng - Wuhan University of Technology, China
Dongwon Shin - Oak Ridge National Laboratory, USA
Seungha Shin - University of Tennessee, USA
Yang Zhou - Auburn University, USA

Paper Submission

Please submit a short paper (minimum 4 pages, up to 6 pages, in IEEE 2-column format) or a full paper (minimum 8 pages, up to 10 pages, in IEEE 2-column format) through the online submission system. Submissions undergo single-blind review.

https://wi-lab.com/cyberchair/2024/bigdata24/scripts/submit.php?subarea=S23&undisplay_detail=1&wh=/cyberchair/2024/bigdata24/scripts/ws_submit.php

Papers should be formatted according to the IEEE Computer Society Proceedings Manuscript Formatting Guidelines (see the link to "formatting instructions" below).

Formatting Instructions

8.5" x 11" (DOC, PDF)

LaTeX Formatting Macros

Important Dates

Oct 7, 2024: Due date for workshop paper submission (extended from Oct 1, 2024)
Please note that the deadline was extended because many authors requested an extension, so kindly adhere to the new deadline.
Nov 8, 2024: Notification of paper acceptance to authors
(delayed from Nov 4 due to the submission deadline extension and review delay; we apologize for any inconvenience)

Nov 17, 2024: Camera-ready versions of accepted papers due
Dec 15, 2024: Workshop in Washington DC, USA (Full Day Workshop, Time TBD)

Accepted Papers

A Generalized Outage Prediction Model for Various Types of Extreme Climate Events in Texas

Authors: Jangjae Lee (Texas A&M University), Sangkeun Lee (Oak Ridge National Laboratory), Stephanie Paal (Texas A&M University), Supriya Chinthavali (Oak Ridge National Laboratory)

A Decision Support System to Compile Environmental Mitigations from Hydropower Licensing Documents

Authors: Hong-Jun Yoon (Oak Ridge National Laboratory), Tom Ruggles (Oak Ridge National Laboratory), Huanhuan Zhao (University of Tennessee), Debjani Singh (Oak Ridge National Laboratory)

A Deep Learning Approach to Maximizing Electrostatic Sieve Efficiency in Regolith Beneficiation

Authors: Kalpit Vadnerkar (Clemson University), Amen Eze (Missouri University of Science and Technology), Rinoj Gautam (University of North Texas), Daoru Han (Missouri University of Science and Technology), Xin Liang (University of Kentucky), Tong Shu (University of North Texas)

A framework for compressing unstructured scientific data via serialization

Authors: Viktor Reshniak (Oak Ridge National Laboratory), Qian Gong (Oak Ridge National Laboratory), Rick Archibald (Oak Ridge National Laboratory), Scott Klasky (Oak Ridge National Laboratory), Norbert Podhorszki (Oak Ridge National Laboratory)

AAD-LLM: Adaptive Anomaly Detection Using Large Language Models

Authors: Alicia Russell-Gilbert (Mississippi State University), Alexander Sommers (Mississippi State University), Shahram Rahimi (Mississippi State University), Sudip Mittal (Mississippi State University)

Discovering Propagating Signals in High-Content Multivariate Time Series via Spatio-Temporal Subsequence Clustering

Authors: Jan David Hüwel (FernUniversität in Hagen), Georg Stefan Schlake (FernUniversität in Hagen), Kevin Albrechts, Christian Beecks (FernUniversität in Hagen)

DISTRI: Development and Integration of Simulation Tools for Resilient Infrastructure

Authors: Imtiaz Mahmud (Lawrence Berkeley National Laboratory), Pawel Zuk (University of Southern California), Cong Wang (RENCI), Mariam Kiran (Oak Ridge National Laboratory), Kesheng Wu (Lawrence Berkeley National Laboratory), Komal Thareja (RENCI), Krishnan Raghavan (Argonne National Laboratory), Anirban Mandal (RENCI), Ewa Deelman (University of Southern California)

Exploration of TPU Architectures for the Optimized Transformer in Drainage Crossing Detection

Authors: Amirhossein Nazeri (Clemson University), Denys Godwin (Clark University), Aikaterini Panteleaki (Southern Illinois University), Iraklis Anagnostopoulos (Southern Illinois University), Michael Edidem (Southern Illinois University), Ruopu Li (Southern Illinois University), Tong Shu (University of North Texas)

LOCOS: A cosine based local gene expression pattern finding algorithm on time-series data

Authors: Youjeong Suk, Jaemin Jeon, Inuk Jung (Kyungpook National University)

Model and Data Management for Machine Learning (M2ML): Integrating Instruments, Edge and HPC for Accelerated Machine Learning

Authors: Weijian Zheng (Argonne National Laboratory), Hemant Sharma (Argonne National Laboratory), Ryan Chard (Argonne National Laboratory), Peter Kenesei (Argonne National Laboratory), Jun-Sang Park (Argonne National Laboratory), Nicholas Schwarz (Argonne National Laboratory), Antonino Miceli (Argonne National Laboratory), Ian T. Foster (Argonne National Laboratory), Rajkumar Kettimuthu (Argonne National Laboratory)

Modeling Lunar Surface Charging Using Physics-Informed Neural Networks

Authors: Niloofar Zendehdel (Missouri University of Science and Technology), Adib Mosharrof (University of Kentucky), Katherine Delgado (University of Kentucky), Daoru Han (Missouri University of Science and Technology), Xin Liang (University of Kentucky), Tong Shu (University of North Texas)

Multivariate Data Augmentation for Predictive Maintenance using Diffusion

Authors: Andrew Thompson (Mississippi State University), Alexander Sommers (Mississippi State University), Alicia Russell-Gilbert (Mississippi State University), Logan Cummins (Mississippi State University), Sudip Mittal (Mississippi State University), Shahram Rahimi (Mississippi State University), Maria Seale (U.S. Army Engineer Research and Development Center), Joseph Jaboure (U.S. Army Engineer Research and Development Center), Thomas Arnold (U.S. Army Engineer Research and Development Center), Joshua Church (U.S. Army Engineer Research and Development Center)

Multivariate Time Series Clustering for Environmental State Characterization of Ground-Based Gravitational-Wave Detectors

Authors: Rutuja Gurav, Isaac Kelly, Pooyan Goodarzi (University of California, Riverside), Anamaria Effler (LIGO Livingston Observatory), Barry Barish (University of California, Riverside), Evangelos Papalexakis (University of California, Riverside), Jonathan Richardson

Privacy Preserving Federated Learning for Advanced Scientific Ecosystems

Authors: Rick Archibald (Oak Ridge National Laboratory), Addi Malviya Thakur (Oak Ridge National Laboratory), Marshall McDonnell (Oak Ridge National Laboratory), Gregory Cage (Oak Ridge National Laboratory), Cody Stiner (Oak Ridge National Laboratory), Lance Drane (Oak Ridge National Laboratory), Michael Brim (Oak Ridge National Laboratory), Paul Laiu (Oak Ridge National Laboratory), Mathieu Doucet (Oak Ridge National Laboratory), William Heller (Oak Ridge National Laboratory), Ryan Coffee (SLAC National Accelerator Laboratory)

SLWM: A Library for Implementing Complex Training Workflows for surrogates of MPC’s

Authors: Stijn Bellis (University of Antwerp), Joachim Denil (University of Antwerp), Ramesh Krishnamurthy (University of Antwerp), Guillermo Pérez (University of Antwerp)

Toward Smart Scheduling in Tapis

Authors: Joe Stubbs (Texas Advanced Computing Center), Smruti Padhy (Texas Advanced Computing Center), Richard Cardone (Texas Advanced Computing Center)

Tuning the interpolation basis in a multigrid decomposition for local error control

Authors: Nicolas Vidal (Oak Ridge National Laboratory), Qian Gong (Oak Ridge National Laboratory), Viktor Reshniak (Oak Ridge National Laboratory), Scott Klasky (Oak Ridge National Laboratory)

Presentation Preparation

The final camera-ready paper submission deadline is Nov 17, 2024. Please do not miss this deadline; otherwise, your paper will not be published in the conference proceedings. Please use the following URL for the camera-ready paper submission: https://wi-lab.com/cyberchair/2024/bigdata24/scripts/BigData_2024_Camera_ready_instruction.php?subarea=S
Please prepare a 15-minute presentation for your accepted paper, followed by a 5-minute Q&A session.
Our workshop will be held on December 15, 2024. A detailed workshop schedule will be announced in early December.

Virtual Presentation
If you are presenting virtually, please join the Teams meeting 20 minutes before your presentation time.

Join the meeting now
Meeting ID: 258 143 082 497
Passcode: 6GM33Uk7
+1 865-276-6990,,864851769# United States, Oak Ridge
Find a local number
Phone conference ID: 864 851 769#

Registration

Please submit the camera-ready version of your paper by Nov 17: https://wi-lab.com/cyberchair/2024/bigdata24/scripts/BigData_2024_Camera_ready_instruction.php?subarea=S
Please register for the conference (author registration is due by Nov 23): https://www3.cs.stonybrook.edu/~ieeebigdata2024/Registration.html
A recorded video presentation must be prepared for every paper, regardless of whether the authors attend the conference in person. Authors need to upload the video recording to the conference video upload site https://urldefense.us/v2/url?u=https-3A__ieeecps.org_-23-21_auth_login-3Fconference-5Fvideo-3D1-26pid-3D5lC3ltqVGxcgvU3jUbIUNv&d=DwIDaQ&c=v4IIwRuZAmwupIjowmMWUmLasxPEgYsgNI-O7C4ViYc&r=KMUQr-b1bQv8kCohm-_GHg&m=3VvK1SWSib6q0DK5ON2T_XERD0CG-U8Jw_ykVJjjjUQUlddVIN1jjaPbvoJhTXlA&s=wCi4vZ4P_mzmL1q8Ti-aOWcpt_5zGdKwl3Tkcrg1d2I&e= . The deadline to upload the video presentation is Nov 20, 2024.

Workshop Primary Contact

Sangkeun (Matt) Lee <lees4@ornl.gov>
Jong Youl Choi <choij@ornl.gov>
Anika Tabassum <tabassuma@ornl.gov>
Minsu Kim <kimm@ornl.gov>

Organizers’ Background

Sangkeun (Matt) Lee obtained his Ph.D. degree in computer science and engineering from Seoul National University in 2012. He is currently an R&D Associate in the Computer Science and Mathematics Division at Oak Ridge National Laboratory. His research focuses on big data, data science, and machine learning, and he has applied state-of-the-art data analysis technologies to various application domains. Dr. Lee has developed numerous data analytics software tools, including ORiGAMI, which won the 2016 DOE R&D 100 Award. He has also published at leading computer science conferences and in journals, such as ACM WWW, ACM RecSys, and Expert Systems with Applications. Over the past few years, Dr. Lee has collaborated with scientists from various disciplines, including materials science, nuclear science, and mechanical engineering, and published research articles in scientific journals such as the Journal of Nuclear Materials, Acta Materialia, The Electricity Journal, and Advanced Theory and Simulations. His interdisciplinary research has advanced knowledge in data science and machine learning applications, and he continues to lead innovative research projects in his field.
Jong Youl (Jong) Choi is a researcher working in the Discrete Algorithms Group, Computer Science and Mathematics Division, Oak Ridge National Laboratory (ORNL), Oak Ridge, Tennessee, USA. He earned his Ph.D. degree in Computer Science at Indiana University Bloomington in 2012 and his MS degree in Computer Science from New York University in 2004. His areas of research interest span data mining and machine learning algorithms, high-performance data-intensive computing, and parallel and distributed systems. More specifically, he focuses on researching and developing data-centric machine learning algorithms for large-scale data management, in situ/in-transit data processing, and data management for code coupling. Jong Choi actively serves on conference committees and as a journal reviewer, for venues such as ParaMo, CCPE, and CLUS.
Anika Tabassum is a research scientist at Oak Ridge National Laboratory, where she contributes to developing deep learning for multi-scale data arising from energy research. Her research interests broadly lie in domain-guided, domain-adaptation, and domain-generalization machine learning models for scientific data. Recently, she has been focusing on foundation model capabilities for generalizing and solving science problems. She was selected as a Rising Star 2023 by UT Austin and received an outstanding postdoctoral award from her division at ORNL in 2022. She received her Ph.D. from the Department of Computer Science at Virginia Tech, where she worked on bringing domain-guided ML to bear on multiple challenges in preparing for and mitigating power system failures and disaster vulnerabilities. Her Ph.D. research was funded by an NSF Urban Computing fellowship. She won 1st prize in designing the COVID-19 forecasting model for the Facebook-CDC challenge. She has published in multiple venues, such as NeurIPS, AAAI, ACM SIGKDD, CIKM, IEEE BigData, and IAAI, and in journals such as ACM TIST and Elsevier journals.

Minsu Kim, a research scientist at Oak Ridge National Laboratory, is deeply engaged in the application of advanced AI technologies, especially transformer models, to the study of Electronic Health Records (EHR) and genomic data within the field of bioinformatics. His focused efforts are geared towards refining healthcare analytics, with an aim to bring about incremental improvements in patient care through more accurate predictive modeling. Kim’s significant work has garnered attention in respected scientific journals, including BMC Medical Genomics, Scientific Reports, Clinical Cancer Research, and PLOS ONE, highlighting his contributions to the evolving landscape of medical research and data analysis. By integrating sophisticated AI methods into his research, Kim is dedicated to enhancing the interpretability and utility of complex biomedical datasets, thus supporting the ongoing advancement of healthcare technology and personalized medicine approaches.

Workshop Schedule

Please upload your slides at least 10 minutes before your presentation (if you plan to use your own laptop, an HDMI connection is available):

https://tinyurl.com/4jzjt9r9

Or please send your slides to leesangkeun@gmail.com

We’d love to know if you’d be interested in joining the BTSD 2024 workshop as a PC member. Your expertise and insights would greatly contribute to the success of this event. Please share your email if you’re willing to participate.

https://forms.gle/SjMQXrPHhPUqWz197
