The First International Workshop on Big Data Tools, Methods, and Use Cases for Innovative Scientific Discovery (BTSD) 2019

in conjunction with 2019 IEEE International Conference on Big Data (IEEE BigData 2019)

December 9-12, 2019 @ Los Angeles, CA, USA

Workshop Date/Time: December 9, 1:30pm - 6:30pm

Call for Papers

Program Co-chairs

  • Sangkeun (Matt) Lee, Computational Data Analytics Group, Computer Science and Mathematics Division, Oak Ridge National Laboratory
  • Travis Johnston, Computational Data Analytics Group, Computer Science and Mathematics Division, Oak Ridge National Laboratory

Organizers’ Background

  • Sangkeun (Matt) Lee received his Ph.D. in computer science and engineering from Seoul National University in 2012. He is currently an R&D Associate in the Computational Data Analytics Group of the Computer Science and Mathematics Division. He studies big data, data science, and machine learning, and has applied state-of-the-art data analysis technologies in many application domains. He has developed several data analytics software packages, one of which, ORiGAMI, won the 2016 DOE R&D 100 Award. He has contributed to many leading computer science conferences and journals, such as ACM WWW, ACM RecSys, and Expert Systems with Applications. For the last few years, he has collaborated with scientists across various domains, including materials science, nuclear science, and mechanical engineering, and has published papers in science journals such as Journal of Nuclear Materials, Acta Materialia, The Electricity Journal, and Advanced Theory and Simulations.
  • Travis Johnston received his Ph.D. in mathematics from the University of South Carolina in 2014. He is currently a Staff Research Scientist for Artificial Intelligence (AI) in High Performance Computing (HPC) at Oak Ridge National Laboratory. He has a highly interdisciplinary background, having worked with physicists (studying ferroelectric polymers, quantum computing, neutron scattering, neutron powder diffraction, and electron microscopy), materials scientists specializing in alloy design, computational chemists (protein folding and other molecular dynamics simulations), medical researchers (cancer pathology), and others (geographers, climate scientists, and engineers). For his recent work on MENNDL (an evolutionary algorithm to design neural networks tailored to unique scientific datasets), he was a Gordon Bell finalist (SC18). He has served on the program committees of several conferences and workshops, including IEEE Cluster, Supercomputing (SC), and Machine Learning in High Performance Computing Environments (MLHPC, Workshop).

Introduction to Workshop

  • Advances in big data technology, artificial intelligence, and machine learning have produced many success stories in a wide range of areas, especially in industry. These success stories have motivated scientists in physics, chemistry, materials, medicine, and many other fields to explore new ways of using big data tools in their scientific work.
  • However, there are barriers to overcome. Most existing big data tools, systems, and methodologies were developed without considering scientific purposes or scientists’ specific requirements. They were not designed for scientists who have little or no knowledge of programming or computer science. Computer scientists, on the other hand, often find it very challenging to understand the domain problem because they lack sufficient background knowledge.
  • We expect that big data technologies can contribute to scientific innovation in many ways. Many ongoing scientific projects around the world already aim to discover novel hypotheses, analyze big multidimensional data that could not be handled manually, and use machines to reduce the time required by complex calculations. This workshop intends to bring domain scientists and computer scientists together to explore and extend opportunities in the development of big data tools, systems, and methodologies for scientific discovery, to share success stories and lessons learned, and to discuss challenges that, if overcome, would enable successful collaboration across different domains, especially between domain scientists and computer/data scientists.
  • In this workshop, we discuss the following questions:
    • What makes big data tools for scientists different from the existing tools?
    • What specific needs and challenges do domain scientists face when they try to adopt big data tools?
    • How can computer scientists and domain scientists communicate to define a feasible problem together?
    • What are the barriers to using big data for scientific discovery, and how do these barriers differ across science domains?

Research Topics Included in the Workshop

  • Big data tools, systems, and methods related to, but not limited to:
    • Scientific data processing
    • Artificial intelligence/Deep neural networks/Machine learning
    • Text mining/Graph mining
    • Database/Query processing/Query Optimization
    • Parallel computation/High Performance Computing
    • Visualization/User Interface/HCI
    • Parallelization/Performance/Scalability
    • High Performance Computing …
  • that facilitate innovation and discovery in a scientific domain, such as:
    • Physics
    • Chemistry
    • Material science
    • Mechanical engineering
    • Nuclear engineering
    • Biomedical science …
  • Use cases, success stories, and lessons learned in scientific discovery using big data tools, systems, and methods

Program Committee Members

  • Tom Potok, Oak Ridge National Laboratory, potokte at
  • Da Yan, University of Alabama Birmingham, yanda at
  • Sisi Duan, University of Maryland, Baltimore County, sduan at
  • Feng Bao, Florida State University, fbao at
  • Kyunghyun Cho, New York University, kyunghyun.cho at
  • Minsuk Kahng, Georgia Tech, kahng at
  • Youngjae Kim, SOGANG University, Seoul, Republic of Korea, youkim at
  • Kangil Kim, GIST (Gwangju Institute of Science & Technology)
  • Ramakrishnan Kannan, Oak Ridge National Laboratory, kannanr at
  • Sreenivas Rangan Sukumar, Cray Inc., ranganutk at
  • Seungha Shin, University of Tennessee, sshin at
  • Michael R Wyatt, University of Tennessee, mrwyattii at

Paper Submission

Please submit a short paper (up to 4 pages in IEEE 2-column format) or a full paper (up to 8 pages in IEEE 2-column format) through the online submission system.

Paper Submission Page

Papers should be formatted according to the IEEE Computer Society Proceedings Manuscript Formatting Guidelines (see link to "formatting instructions" below).

Formatting Instructions

8.5" x 11" (DOC, PDF)

LaTeX Formatting Macros

Important Dates

  • Oct 11, 2019 (Extended): Due date for abstract submission
  • Oct 17, 2019 (Final): Due date for short/full workshop paper submission
  • Nov 4, 2019: Notification of paper acceptance to authors
  • Nov 15, 2019: Camera-ready submission of accepted papers
  • Dec 9, 2019: Workshop Day (1:30pm - 7:00pm)


The Westin Bonaventure, LA.

San Pedro

  • If you are a presenter, please find one of the workshop chairs (Sangkeun (Matt) Lee or Travis Johnston) in the San Pedro room before the session in which you are presenting.



Presentation (15 minutes) & Q/A (5 minutes)

Session 1: Scientific Applications, 1:50pm – 3:40pm (Travis Johnston)


Welcome: Where are we at? What is missing?

(Matt Lee/Travis Johnston)


Identifying Time Series Similarity in Large-Scale Earth System Datasets

(Payton Linton, William Melodia, Alina Lazar, Deborah Agarwal, Ludovico Bianchi, Devarshi Ghoshal, Kesheng Wu, Gilberto Pastorello, and Lavanya Ramakrishnan)


Learning to Predict Material Structure from Neutron Scattering Data

(Cristina Garcia Cardona, Ramakrishnan Kannan, Travis Johnston, Thomas Proffen, Katharine Page, and Sudip Seal)


Realistic Transport Simulation: Tackling the Small Data Challenge with Open Data

(Guimu Guo, Jalal Majed Khalil, Da Yan, and Virginia Sisiopiku)


Information Extraction from Cancer Pathology Reports with Graph Convolution Networks for Natural Language Texts

(Hong-Jun Yoon, John Gounley, M. Todd Young, and Georgia Tourassi)


Machine Learning for Prediction of Mid to Long Term Habitual Transportation Mode Use

(Alina Lazar, Alexandra Ballow, Ling Jin, C. Anna Spurlock, Alex Sim, and Kesheng Wu)


Coffee Break

Session 2: Scientific Tools & Methods, 4:00pm – 5:20pm (Matt Lee)


Clustered Latent Dirichlet Allocation for Scientific Discovery

(Christopher Gropp, Alexander Herzog, Ilya Safro, Paul Wilson, and Amy Apon)


Quantum Grover search-based optimization for innovative material discovery

(Sima Esfandiarpour Borujeni, Ramkumar Harikrishnakumar, and Saideep Nannapaneni)


Detecting Dependency Between Discrete Random Variables and Application

(Edgar Llamas, Ivan García, and Andrés Méndez)


Visualization System for Evolutionary Neural Networks for Deep Learning

(Junghoon Chae, Catherine Schuman, Steven Young, Travis Johnston, Derek Rose, Robert Patton, and Thomas Potok)


Session Break

Session 3: Scientific System/Data Management, 5:40pm – 6:40pm (Travis Johnston)


Exploration of Workflow Management Systems Emerging Features from Users Perspectives

(Ryan Mitchell, Loic Pottier, Steve Jacobs, Rafael Ferreira da Silva, Mats Rynge, Karan Vahi, and Ewa Deelman)


Empowering Agroecosystem Modeling with HTC Scientific Workflows: The Cycles Model Use Case

(Rafael Ferreira da Silva, Rajiv Mayani, Yuning Shi, Armen R. Kemanian, Mats Rynge, and Ewa Deelman)


Evaluating Scientific Workflow Engines for Data and Compute Intensive Discoveries

(Rina Singh, Jeffrey Graves, Sreenivas Sukumar, and Valentine Anantharaj)


Wrap up: What’s next?

(Matt Lee/Travis Johnston)

Workshop Primary Contact

  • Sangkeun (Matt) Lee, Computational Data Analytics Group, Computer Science and Mathematics Division, Oak Ridge National Laboratory, TN, USA. Tel: +1 865 574 8858