Efficient analysis and visualization of data using statistical methods have benefited the visualization community for many years. As data sizes grow rapidly, researchers increasingly rely on techniques that efficiently identify and analyze regions containing salient features, rather than examining the data in its entirety. Statistical distributions can compactly represent the statistical characteristics of data, which can then be analyzed and visualized efficiently. Recent developments have demonstrated the broad applicability of statistical distributions in data visualization by introducing novel stochastic algorithms and addressing important problems such as feature identification, extraction, and tracking; multivariate relationship exploration; query-driven visualization; in situ data summarization; and many more.
Besides compactly representing statistical data properties, a key advantage of statistical distribution-based data analysis techniques is the ability to quantify uncertainty during visualization. Uncertainty-aware visualization algorithms built on statistical methods and distribution-based data representations can communicate the trustworthiness of a visual representation to application scientists, so that they can draw meaningful conclusions from the visualization results. As we enter the era of big data, the relevance of statistical distribution-based methods has become even more prominent, since statistical distributions can be used to generate compact data summaries that are significantly smaller than the full-resolution raw data, and such data triage can be performed in situ. As a result, a variety of visualization applications using statistical distributions have been developed, which indicates that such uncertainty-aware statistical methods provide a promising path forward.
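To make this kind of distribution-based data triage concrete, the following is a minimal Python sketch (the function name `summarize_in_situ` and all parameters are hypothetical, and a real in situ pipeline would run inside the simulation loop rather than on a stored array): each block of a regular grid is reduced to a normalized histogram, a common nonparametric choice for distribution-based data summaries.

```python
import numpy as np

def summarize_in_situ(field, block_shape=(16, 16, 16), n_bins=32):
    """Reduce each block of a 3D field to a normalized histogram."""
    summaries = {}
    nz, ny, nx = field.shape
    bz, by, bx = block_shape
    for z in range(0, nz, bz):
        for y in range(0, ny, by):
            for x in range(0, nx, bx):
                block = field[z:z+bz, y:y+by, x:x+bx]
                counts, edges = np.histogram(block, bins=n_bins)
                # Store probabilities rather than raw counts so that
                # partially filled boundary blocks remain comparable.
                summaries[(z, y, x)] = (counts / counts.sum(), edges)
    return summaries

# Example: a 128^3 float32 field (8 MiB) reduces to 512 block summaries
# of 32 bins each, roughly 30x smaller than the raw data.
field = np.random.default_rng(0).normal(size=(128, 128, 128)).astype(np.float32)
summaries = summarize_in_situ(field)
print(len(summaries), "block summaries")
```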
Considering the aforementioned benefits of statistical methods in data visualization, we propose to organize a half-day tutorial on statistical data representation, visualization, and uncertainty analysis. The tutorial will highlight concepts related to general statistical methods of data visualization, with a focus on statistical distribution-based techniques. A comprehensive discussion of the state of the art in uncertainty-aware visualization algorithms using distribution-based data will serve as a foundation for attendees interested in research on statistical data representation and processing, with applications in data analysis, visualization, and uncertainty quantification. We will systematically introduce the categories of visual-analytics algorithms that use and benefit from statistical methods. In addition, concepts and applications of uncertainty-aware visualization techniques will be presented. Finally, the latest research trends and applications utilizing statistical data representations will be discussed. The tutorial will conclude by highlighting future research directions and open problems that must be solved to further advance statistically supported methods of data visualization.
Soumya Dutta, Staff Scientist, Los Alamos National Laboratory
Hanqi Guo, Assistant Computer Scientist, Argonne National Laboratory
Hans-Christian Hege, Professor, Zuse Institute Berlin
Han-Wei Shen, Professor, The Ohio State University
Scientists obtain overviews of data and identify regions of interest by transforming data into compact information descriptors that characterize simulation results and allow detailed analysis on demand. Among many existing feature descriptors, statistical information derived from data samples is a promising approach to taming the big-data avalanche, because data distributions computed from a population can compactly describe the presence and characteristics of salient data features with minimal data movement. The ability to computationally model, summarize, and process data in situ using statistical distributions also provides an efficient and representative capture of information that can adapt to size and resource constraints, with the added benefit that the uncertainty associated with the results can be quantified and communicated. In this talk, several recent works will be discussed that use probability distributions as a new paradigm for in situ data summarization as well as a flexible approach to post hoc statistical data analytics for large-scale scientific data sets. The goal is to ensure that application scientists can easily obtain an overview of the entire data set regardless of the size of the simulation output; understand the characteristics and locations of features; easily interact with the data and select regions and features of interest; and perform all of these analysis tasks with a small memory footprint. In addition, this talk will present a set of stochastic data visualization algorithms that leverage statistical methods and probability distributions for efficient data exploration. The visualization community has developed a wide range of data analysis algorithms that either transform the data into statistical distributions to perform uncertainty-aware feature analysis and tracking, or utilize data distributions as a core step in the analysis pipeline to answer probabilistic data queries and perform robust data classification. The goal of the talk is to encompass this broad class of important and fundamental visualization tasks and show how each task benefits from stochastic algorithms.
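As a companion to the summarization sketch above, the following hypothetical Python sketch (all function names are illustrative, not from any published system) shows how per-block histogram summaries can answer probabilistic range queries post hoc, under the common assumption that values are uniformly distributed within each bin; the returned probability doubles as an uncertainty cue for query-driven visualization.

```python
import numpy as np

def block_query_probability(probs, edges, lo, hi):
    """Estimate P(lo <= value <= hi) for one block from its normalized
    histogram, assuming values are uniform within each bin."""
    widths = np.diff(edges)
    overlap = np.clip(np.minimum(edges[1:], hi) - np.maximum(edges[:-1], lo),
                      0.0, None)
    return float(np.sum(probs * overlap / widths))

def query_driven_selection(summaries, lo, hi, threshold=0.05):
    """Select blocks likely to contain values in [lo, hi]; only these
    blocks need to be fetched, visited, or rendered."""
    hits = {}
    for key, (probs, edges) in summaries.items():
        p = block_query_probability(probs, edges, lo, hi)
        if p > threshold:
            hits[key] = p  # the probability doubles as an uncertainty cue
    return hits

# Demo on two synthetic block summaries: only the shifted block is
# likely to contain values in the query range [2, 5].
rng = np.random.default_rng(0)
summaries = {}
for key, shift in {(0, 0, 0): 0.0, (0, 0, 16): 3.0}.items():
    counts, edges = np.histogram(rng.normal(shift, 1.0, 4096), bins=32)
    summaries[key] = (counts / counts.sum(), edges)
print(query_driven_selection(summaries, 2.0, 5.0))
```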
Han-Wei Shen is a full professor at The Ohio State University. He received his BS degree from the Department of Computer Science and Information Engineering at National Taiwan University in 1988, his MS degree in computer science from the State University of New York at Stony Brook in 1992, and his PhD degree in computer science from the University of Utah in 1998. From 1996 to 1999, he was a research scientist at NASA Ames Research Center in Mountain View, California. His primary research interests are scientific visualization and computer graphics. Professor Shen is a winner of the National Science Foundation's CAREER award and the US Department of Energy's Early Career Principal Investigator award. He has also won the Outstanding Teaching award twice in the Department of Computer Science and Engineering at The Ohio State University.
This talk will present applications of statistical distribution-driven visualization to understanding uncertain unsteady flows. The goal of this work is to understand transport behavior in uncertain time-varying flow fields by redefining the finite-time Lyapunov exponent (FTLE) and the Lagrangian coherent structure (LCS) as stochastic counterparts of their traditional deterministic definitions. Three new concepts are introduced: the distribution of the FTLE (D-FTLE), the FTLE of distributions (FTLE-D), and uncertain LCS (U-LCS). The D-FTLE is the probability density function of FTLE values at every spatiotemporal location, which can be visualized with different statistical measures. The FTLE-D extends the deterministic FTLE by measuring the divergence of particle distributions, giving a statistical overview of how transport behaviors vary across neighboring locations. The U-LCS, the probability of finding LCSs over the domain, can be extracted with stochastic ridge-finding and density estimation algorithms. In addition, this talk will cover scalability issues in estimating uncertain transport behaviors, namely stochastic flow maps (SFMs), for visualizing and analyzing uncertain unsteady flows. Computing flow maps from uncertain flow fields is extremely expensive because it requires many Monte Carlo runs to trace densely seeded particles in the flow. To reduce this computational cost, we parallelize over tasks (packets of particles in our design) to achieve high efficiency with hybrid MPI/thread programming. This task model also enables CPU/GPU co-processing. We will show scalability on two supercomputers, Mira (up to 256K Blue Gene/Q cores) and Titan (up to 128K Opteron cores and 8K GPUs), tracing billions of particles in seconds.
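To make the D-FTLE concrete, the following is a minimal, hypothetical Python sketch, not the implementation discussed in the talk: it uses the standard analytic double-gyre flow as a stand-in for an uncertain velocity field, models the uncertainty as additive white noise on particle motion (Euler-Maruyama integration), and estimates the per-location FTLE distribution from repeated Monte Carlo particle traces. The hybrid MPI/thread, particle-packet parallelization described above is omitted; a production code would distribute these traces across tasks.

```python
import numpy as np

def double_gyre(p, t, A=0.1, eps=0.25, om=2 * np.pi / 10):
    """Classic analytic unsteady 2D flow on [0,2] x [0,1]."""
    x, y = p[..., 0], p[..., 1]
    a, b = eps * np.sin(om * t), 1 - 2 * eps * np.sin(om * t)
    f = a * x**2 + b * x
    u = -np.pi * A * np.sin(np.pi * f) * np.cos(np.pi * y)
    v = np.pi * A * np.cos(np.pi * f) * np.sin(np.pi * y) * (2 * a * x + b)
    return np.stack([u, v], axis=-1)

def flow_map(seeds, t0, T, n_steps, rng, sigma):
    """One stochastic realization: deterministic advection plus
    Euler-Maruyama noise of magnitude sigma."""
    dt = T / n_steps
    p, t = seeds.copy(), t0
    for _ in range(n_steps):
        p += double_gyre(p, t) * dt + sigma * np.sqrt(dt) * rng.standard_normal(p.shape)
        t += dt
    return p

def ftle(phi, dx, dy, T):
    """FTLE from the flow-map Jacobian: largest eigenvalue of the
    right Cauchy-Green tensor C = J^T J."""
    J = np.stack([np.gradient(phi, dx, axis=1),
                  np.gradient(phi, dy, axis=0)], axis=-1)  # (ny, nx, 2, 2)
    C = np.einsum('...ki,...kj->...ij', J, J)
    lam_max = np.linalg.eigvalsh(C)[..., -1]
    return np.log(np.sqrt(lam_max)) / abs(T)

# D-FTLE: the empirical FTLE distribution at every grid point,
# estimated from repeated Monte Carlo particle traces.
nx, ny, n_runs = 64, 32, 50
xs, ys = np.linspace(0, 2, nx), np.linspace(0, 1, ny)
seeds = np.stack(np.meshgrid(xs, ys), axis=-1)  # (ny, nx, 2)
rng = np.random.default_rng(1)
samples = np.array([ftle(flow_map(seeds, 0.0, 10.0, 200, rng, 5e-3),
                         xs[1] - xs[0], ys[1] - ys[0], 10.0)
                    for _ in range(n_runs)])    # (n_runs, ny, nx)
# Summary statistics of the D-FTLE, e.g., its mean field and spread:
print(samples.mean(axis=0).shape, float(samples.std(axis=0).max()))
```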
Hanqi Guo is an assistant computer scientist in the Mathematics and Computer Science Division at Argonne National Laboratory. He received his PhD degree in computer science from Peking University in 2014 and his BS degree in mathematics and applied mathematics from Beijing University of Posts and Telecommunications in 2009. His research interests mainly include uncertainty visualization, flow visualization, and large-scale scientific data visualization.