Interactive Data Analytics

Interactive Data Analytics conducts research and development on visual analysis techniques that support effective, interactive knowledge discovery. We target not only data defined on a continuous physical space, such as simulation data, but also data defined on an abstract, discontinuous space. Below, we give an overview of some of our visual analysis studies, particularly those on multivariate time series data.

Angular-based Edge Bundled Parallel Coordinates Plot for the Visual Analysis of Large Ensemble Simulation Data

With the continuous increase in the computational power and resources of modern high-performance computing (HPC) systems, large-scale ensemble simulations have become widely used in various fields of science and engineering, especially in meteorology and climate science. The simulation outputs are large, time-varying, multivariate, and multivalued datasets, which pose a particular challenge for visualization and analysis. In this work, we focused on the widely used Parallel Coordinates Plot (PCP) to analyze the interrelations between different parameters, such as variables, among the ensemble members. However, as the size of the data to be analyzed (that is, the number of polylines) increases, PCP can suffer from visual clutter and degraded drawing performance. To overcome this problem, we present an extension to the PCP that adds Bézier curves connecting angular distribution plots, which represent the mean and variance of the inclination of the line segments between parallel axes. The proposed Angular-based Parallel Coordinates Plot (APCP) presents a simplified overview of the entire ensemble dataset while maintaining the correlation information between adjacent variables. To verify its effectiveness, we developed a visual analytics prototype system and evaluated it using meteorological ensemble simulation output from the supercomputer Fugaku.
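As a rough illustration of the angular summary described above, the following minimal Python sketch computes the mean and variance of the segment inclination between each pair of adjacent axes. It is not the APCP implementation: the min-max normalization, the unit spacing between axes, and the function names normalize_columns and angular_stats are all assumptions made for this example, and the Bézier curve rendering is omitted.

```python
import math
from statistics import fmean, pvariance


def normalize_columns(rows):
    """Min-max normalize each variable (column) to [0, 1] (an assumed scaling)."""
    normed_cols = []
    for col in zip(*rows):
        lo, hi = min(col), max(col)
        span = hi - lo
        # A constant column is placed at the middle of its axis.
        normed_cols.append([(v - lo) / span if span else 0.5 for v in col])
    return [list(r) for r in zip(*normed_cols)]


def angular_stats(rows, axis_gap=1.0):
    """Return (mean, variance) of segment inclination, in radians,
    for each pair of adjacent axes.

    rows: one list of variable values per ensemble member (one polyline each).
    axis_gap: assumed horizontal distance between neighboring axes.
    """
    normed = normalize_columns(rows)
    n_axes = len(normed[0])
    stats = []
    for j in range(n_axes - 1):
        # Inclination of the segment from axis j to axis j + 1.
        # The angle always lies in (-pi/2, pi/2), so a plain arithmetic
        # mean is adequate and no circular statistics are needed.
        angles = [math.atan2(r[j + 1] - r[j], axis_gap) for r in normed]
        stats.append((fmean(angles), pvariance(angles)))
    return stats


# Hypothetical ensemble: three members, three variables.
members = [[0.0, 0.0, 2.0], [1.0, 1.0, 1.0], [2.0, 2.0, 0.0]]
for pair_index, (mean_angle, var_angle) in enumerate(angular_stats(members)):
    print(pair_index, mean_angle, var_angle)
```

In this toy input, the first two variables rise together, so their segments are parallel and the angular variance is zero. The third variable falls as the second rises, so the segments cross and the variance is large. This is the kind of correlation cue between adjacent variables that a per-axis-pair angular summary can retain without drawing every polyline.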

Related papers

A Visual Analytics Method for Time-Series Log Data Using Multiple Dimensionality Reduction

The size and complexity of supercomputer systems and their power and cooling facilities have continuously increased, posing additional challenges for long-term, stable operation. Supercomputers are shared computational resources and usually run different computational workloads at different locations (space) and times (time). From the facility operation side, a better understanding of the supercomputer system's heat generation and cooling behavior is highly desirable for decision making and optimization planning. In this work, we present a dimensionality reduction-based visual analytics method for time-series log data from a supercomputer system and its facility, which captures characteristic spatio-temporal features and behaviors during operation.
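The text above does not state which dimensionality reduction techniques are used. As a generic illustration only, the following Python sketch applies principal component analysis (PCA), computed by power iteration, to a small table of time-series sensor readings and projects each time step onto the first principal component. The layout of the data (rows are time steps, columns are sensors), the sample values, and all function names are assumptions made for this example.

```python
import math


def center(rows):
    """Subtract each column's mean so every sensor has zero mean."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(r, means)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of already centered data."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def first_component(cov, iters=200):
    """Approximate the dominant eigenvector of cov by power iteration.

    The start vector is deterministic and non-uniform. Power iteration
    fails if the start is exactly orthogonal to the dominant eigenvector,
    which this simple sketch does not guard against.
    """
    d = len(cov)
    v = [float(i + 1) for i in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # all variables are constant; keep the current vector
        v = [x / norm for x in w]
    return v


def project_onto_first_component(rows):
    """Return (scores, pc1): one scalar per time step and the component."""
    centered = center(rows)
    pc1 = first_component(covariance(centered))
    scores = [sum(a * b for a, b in zip(r, pc1)) for r in centered]
    return scores, pc1


# Hypothetical log: four time steps, two temperature sensors.
log = [[20.0, 30.0], [21.0, 31.0], [22.0, 32.0], [23.0, 33.0]]
scores, pc1 = project_onto_first_component(log)
print(pc1, scores)
```

Here the two sensors move together, so a single component captures the whole trend and each time step reduces to one score. In a visual analytics setting, such low-dimensional scores can be plotted over time or over rack location to highlight characteristic behavior.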

Related papers

Visual Analytics for Failure Cause Identification on HPC Systems

Large-scale scientific computing facilities usually operate expensive HPC (High Performance Computing) systems, whose computational and storage resources are shared among authorized users. On such shared systems, continuous and stable operation is fundamental for providing the hardware resources that different users need, including for large-scale numerical simulations, which are the main target of such facilities. For instance, the K computer installed at the R-CCS (RIKEN Center for Computational Science) in Kobe, Japan, enables users to continuously run large jobs on tens of thousands of nodes (a maximum of 36,864 computational nodes) for up to 24 hours, and a huge job on the entire K computer system (82,944 computational nodes) for up to 8 hours. Critical hardware failures can directly impact the affected job and may also indirectly impact subsequent scheduled jobs. To monitor the health of the K computer and its supporting facility, a large number of sensors provide a vast amount of measured data. Since it is almost impossible to analyze all of this data in real time, the information has been stored as log files for post-hoc analysis. In this work, we propose a visual analytics system that uses these large log files to identify the possible causes of critical hardware failures.

Related papers

Visual Analytics for Cell Division Dynamics

To elucidate the developmental mechanisms of multicellular organisms, it is important to quantify the spatiotemporal features (phenotypic characteristics) of cells that appear during cell division and to analyze their relationships (correlations). Many analytical techniques have been proposed, including graph visualization. However, identifying interesting characteristics in large data and obtaining biological interpretations are both difficult and time-consuming. To address these problems, we are developing a visual analysis system that enables exploratory analysis by linking the phenotypic characteristics of nematodes to the spatiotemporal shape of the cell nucleus.

Related papers