The First International Workshop on Big Data Tools, Methods, and Use Cases for Innovative Scientific Discovery (BTSD) 2019

in conjunction with 2019 IEEE International Conference on Big Data (IEEE BigData 2019)

December 9-12, 2019 @ Los Angeles, CA, USA

Workshop Date/Time: December 9, 1:30pm - 6:30pm

Call for Papers

Program Co-chairs

Sangkeun (Matt) Lee, Computational Data Analytics Group, Computer Science and Mathematics Division, Oak Ridge National Laboratory, lees4@ornl.gov
Travis Johnston, Computational Data Analytics Group, Computer Science and Mathematics Division, Oak Ridge National Laboratory, johnstonjt@ornl.gov

Organizers’ Background

Sangkeun (Matt) Lee received his Ph.D. degree in computer science and engineering from Seoul National University in 2012. He is currently an R&D Associate in the Computational Data Analytics Group of the Computer Science and Mathematics Division. He has studied big data, data science, and machine learning, and has applied state-of-the-art data analysis technologies in many application domains. He has developed a number of data analytics software packages, one of which, ORiGAMI, won a 2016 DOE R&D 100 Award. He has contributed to many leading computer science conferences and journals, such as ACM WWW, ACM RecSys, and Expert Systems with Applications. For the last few years, he has collaborated with scientists across various domains, including materials science, nuclear science, and mechanical engineering, and has published papers in science journals such as Journal of Nuclear Materials, Acta Materialia, The Electricity Journal, and Advanced Theory and Simulations.
Travis Johnston received his Ph.D. in mathematics from the University of South Carolina in 2014. He is currently a Staff Research Scientist for Artificial Intelligence (AI) in High Performance Computing (HPC) at Oak Ridge National Laboratory. He has a highly interdisciplinary background, having worked with physicists (studying ferroelectric polymers, quantum computing, neutron scattering, neutron powder diffraction, and electron microscopy), materials scientists specializing in alloy design, computational chemists (protein folding and other molecular dynamics simulations), medical researchers (cancer pathology), and others (geographers, climate scientists, and engineers). For his recent work on MENNDL (an evolutionary algorithm to design neural networks tailored to unique scientific datasets), he was a Gordon Bell finalist (SC18). He has served on the program committees of several conferences and workshops, including IEEE Cluster, Supercomputing (SC), and Machine Learning in High Performance Computing Environments (MLHPC workshop).

Introduction to Workshop

Advances in big data technology, artificial intelligence, and machine learning have produced many success stories in a wide range of areas, especially in industry. These success stories have motivated scientists in physics, chemistry, materials, medicine, and many other fields to explore new ways of using big data tools in their scientific work.
However, there are barriers to overcome. Most existing big data tools, systems, and methodologies were developed without considering scientific purposes or scientists’ specific requirements. They were not designed for scientists who have little or no knowledge of programming or computer science. Computer scientists, in turn, often find it very challenging to understand the domain problem because they lack sufficient background knowledge.
We expect that big data technologies can contribute substantially to scientific innovation in many ways. Many ongoing scientific projects around the world already aim to discover novel hypotheses, analyze big multidimensional data that could not be handled manually, and reduce the time required for complex calculations by machine. This workshop intends to bring domain scientists and computer scientists together to explore and extend opportunities in the development of big data tools, systems, and methodologies for scientific discovery; to share success stories and lessons learned; and to discuss challenges that, if overcome, would enable successful collaboration across domains, especially between domain scientists and computer/data scientists.
In this workshop, we discuss the following questions:
- What makes big data tools for scientists different from the existing tools?
- What specific needs and challenges do domain scientists face when they try to adopt big data tools?
- How can computer scientists and domain scientists communicate to define a feasible problem together?
- What are the barriers to using big data for scientific discovery, and how do these barriers differ across science domains?

Research Topics Included in the Workshop

Big data tools, systems, and methods related to, but not limited to:
- Scientific data processing
- Artificial intelligence/Deep neural networks/Machine learning
- Text mining/Graph mining
- Database/Query processing/Query Optimization
- Parallel computation/High Performance Computing
- Visualization/User Interface/HCI
- Parallelization/Performance/Scalability
- High Performance Computing …
that facilitate innovation and discovery in a scientific domain, such as:
- Physics
- Chemistry
- Materials science
- Mechanical engineering
- Nuclear engineering
- Biomedical science …
Use cases, success stories, and lessons learned in scientific discovery using big data tools, systems, and methods

Program Committee Members

Tom Potok, Oak Ridge National Laboratory, potokte at ornl.gov
Da Yan, University of Alabama Birmingham, yanda at uab.edu
Sisi Duan, University of Maryland, Baltimore County, sduan at umbc.edu
Feng Bao, Florida State University, fbao at fsu.edu
Kyunghyun Cho, New York University, kyunghyun.cho at nyu.edu
Minsuk Kahng, Georgia Tech, kahng at gatech.edu
Youngjae Kim, SOGANG University, Seoul, Republic of Korea, youkim at sogang.ac.kr
Kangil Kim, GIST(Gwangju Institute of Science & Technology), kangil.kim.01 at gmail.com
Ramakrishnan Kannan, Oak Ridge National Laboratory, kannanr at ornl.gov
Sreenivas Rangan Sukumar, Cray Inc., ranganutk at gmail.com
Seungha Shin, University of Tennessee, sshin at utk.edu
Michael R Wyatt, University of Tennessee, mrwyattii at gmail.com

Paper Submission

Please submit a short paper (up to 4 pages in IEEE 2-column format) or a full paper (up to 8 pages in IEEE 2-column format) through the online submission system.

Paper Submission Page

Papers should be formatted according to the IEEE Computer Society Proceedings Manuscript Formatting Guidelines (see link to "formatting instructions" below).

Formatting Instructions

8.5" x 11" (DOC, PDF)

LaTeX Formatting Macros

Important Dates

Oct 11, 2019 (Extended): Due date for abstract submission
Oct 17, 2019 (FINAL): Due date for short/full workshop paper submission
Nov 4, 2019: Notification of paper acceptance to authors
Nov 15, 2019: Camera-ready versions of accepted papers due
Dec 9, 2019: Workshop Day (1:30pm - 7:00pm)

Location

The Westin Bonaventure, LA.

San Pedro

If you are a presenter, please find one of the workshop chairs (Sangkeun (Matt) Lee or Travis Johnston) at San Pedro before the session in which you are presenting.

Agenda

12/9/2019

Presentation (15 minutes) & Q/A (5 minutes)

Session 1: Scientific Applications, 1:50pm – 3:40pm (Travis Johnston)

1:30pm

Welcome: Where are we at? What is missing?

(Matt Lee/Travis Johnston)

1:50pm

Identifying Time Series Similarity in Large-Scale Earth System Datasets

(Payton Linton, William Melodia, Alina Lazar, Deborah Agarwal, Ludovico Bianchi, Devarshi Ghoshal, Kesheng Wu, Gilberto Pastorello, and Lavanya Ramakrishnan)

2:10pm

Learning to Predict Material Structure from Neutron Scattering Data

(Cristina Garcia Cardona, Ramakrishnan Kannan, Travis Johnston, Thomas Proffen, Katharine Page, and Sudip Seal)

2:30pm

Realistic Transport Simulation: Tackling the Small Data Challenge with Open Data

(Guimu Guo, Jalal Majed Khalil, Da Yan, and Virginia Sisiopiku)

2:50pm

Information Extraction from Cancer Pathology Reports with Graph Convolution Networks for Natural Language Texts

(Hong-Jun Yoon, John Gounley, M. Todd Young, and Georgia Tourassi)

3:10pm

Machine Learning for Prediction of Mid to Long Term Habitual Transportation Mode Use

(Alina Lazar, Alexandra Ballow, Ling Jin, C. Anna Spurlock, Alex Sim, and Kesheng Wu)

3:40pm

Coffee Break

Session 2: Scientific Tools & Methods, 4:00pm – 5:20pm (Matt Lee)

4:00pm

Clustered Latent Dirichlet Allocation for Scientific Discovery

(Christopher Gropp, Alexander Herzog, Ilya Safro, Paul Wilson, and Amy Apon)

4:20pm

Quantum Grover search-based optimization for innovative material discovery

(Sima Esfandiarpour Borujeni, Ramkumar Harikrishnakumar, and Saideep Nannapaneni)

4:40pm

Detecting Dependency Between Discrete Random Variables and Application

(Edgar Llamas, Ivan García, and Andrés Méndez)

5:00pm

Visualization System for Evolutionary Neural Networks for Deep Learning

(Junghoon Chae, Catherine Schuman, Steven Young, Travis Johnston, Derek Rose, Robert Patton, and Thomas Potok)

5:20pm

Session Break

Session 3: Scientific System/Data Management, 5:40pm – 6:40pm (Travis Johnston)

5:40pm

Exploration of Workflow Management Systems Emerging Features from Users Perspectives

(Ryan Mitchell, Loic Pottier, Steve Jacobs, Rafael Ferreira da Silva, Mats Rynge, Karan Vahi, and Ewa Deelman)

6:00pm

Empowering Agroecosystem Modeling with HTC Scientific Workflows: The Cycles Model Use Case

(Rafael Ferreira da Silva, Rajiv Mayani, Yuning Shi, Armen R. Kemanian, Mats Rynge, and Ewa Deelman)

6:20pm

Evaluating Scientific Workflow Engines for Data and Compute Intensive Discoveries

(Rina Singh, Jeffrey Graves, Sreenivas Sukumar, and Valentine Anantharaj)

6:40pm

Wrap up: What’s next?

(Matt Lee/Travis Johnston)

Workshop Primary Contact

Sangkeun (Matt) Lee, Computational Data Analytics Group, Computer Science and Mathematics Division, Oak Ridge National Laboratory, TN, USA. Tel: +1 865 574 8858 Email: lees4@ornl.gov
