Workshop

Workshop on Multiscale Methodology (2024)

Introduction

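Members and alumni of the Multiscale Lab (다중척도연구실) hold this workshop for personal and academic exchange. We expect it to help participants understand and discuss one another's research topics. Each speaker presents their research topic for about 15 to 40 minutes, followed by a Q&A session. The main topics are multiscale methods, non-Euclidean statistics, functional data analysis, and graph signal processing.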

Date and Venue


Program and Schedule

February 22 (Thu)

On-site Registration [13:00~13:20]

Opening Remarks: 오희석 [13:20~13:30]

Session 1: Functional Data Analysis and Ocean Science Analysis (Chair: 승우) [13:30~14:50]

The use of quantile curves to extend the distribution scope of function data in functional data clustering (김준표, Department of Mathematics and Statistics, Sejong University) [13:30~14:10]

In this talk, we propose a novel clustering method for functional data that significantly extends the range of distributions that can be handled. Most existing methods have been developed for a specific distributional structure of functional data and therefore give good results only if the distributional assumptions are met. In addition, most methods based on the basis expansion approach use only one curve, representing the centrality of the data, as a clustering input, which prevents them from performing well on functional data with more complex distributional structures. To solve this problem, we consider several types of curves, including mean and quantile curves, as input variables for clustering, which may better capture the distribution of the data. Moreover, we apply the concept of sparse clustering to the several curve types, resulting in good clustering performance if at least one curve type can divide the data well into several subgroups. Results from numerical experiments and real data analysis are also presented.


Keywords: Functional principal component analysis, Functional data clustering, Non-Gaussianity, Quantile curve

Multivariate Functional Partial Least Squares regression (오승희, Department of Statistics, Seoul National University) [14:10~14:35]

Partial least squares (PLS) regression of multivariate functional data X on a real-valued response Y is considered. Unlike principal component analysis (PCA), the PLS approach obtains its components using the relationship between the predictors and the response. For computation, we use a relationship between PLS regression with univariate functional data (FPLS) and PLS regression with multivariate functional data (MFPLS). Simulation studies compare the performance of our approach with that of previous methods.


Keywords: Multivariate functional data analysis, Supervised learning, Regression, Partial least squares regression

Marine Heat Wave in the East Sea of Korea: Spatio-Temporal Characteristics and its Future (김주연, Department of Applied Statistics, Sejong University) [14:35~14:50]

Recently, due to climate change and global warming, interest has been growing in marine heatwaves (MHWs), the phenomenon in which sea surface temperature (SST) becomes extraordinarily high. However, there are only a few studies on MHWs occurring around Korea. This talk presents the spatial and temporal characteristics of SST in the East Sea. We group MHW events into several clusters via K-means clustering, using the strength, frequency, and duration of the events as features. In addition, the vertical distribution of temperature is presented based on ESROB data observed in the southeastern area. We demonstrate that MHW events are becoming more frequent and that their durations are increasing. Furthermore, the strength of MHWs occurring in summer or winter is also growing. Finally, we obtain return periods and return levels by applying extreme value theory. This work was exhibited at the 3rd Ocean Science Big Data Contest.


Keywords: Marine Heatwaves, K-means clustering, Extreme Value Analysis

Coffee Break [14:50~15:20]

Session 2: Graph Signal Processing (Chair: 김준표) [15:20~16:40]

Quantile-based fitting for graph signals (김규순, Department of Statistics, Seoul National University) [15:20~16:00]

We propose a quantile-based fitting method for analyzing graph signals. Unlike traditional data-fitting approaches such as smoothing splines and quantile smoothing splines, which operate on Euclidean space, the proposed method is designed for the graph domain and takes the inherent graph structure into account. In contrast to prevalent graph signal denoising methods that rely on an optimization problem with an L2-norm fidelity term, our approach provides denoised signals that are robust to outliers and identifies varying structural relationships within graph signals. We validate the efficacy of our method through comprehensive simulation studies and real data analysis.


Keywords: Denoising, Graph Signal, Quantiles, Regularization

Statistical Graph Empirical Mode Decomposition by Graph Denoising and Boundary Treatment (조형래, Department of Statistics, Seoul National University) [16:00~16:40]

This study proposes a new decomposition method for graph signals (or graph-valued data), termed "statistical graph empirical mode decomposition (SGEMD)," which adopts graph denoising and boundary treatment. The main contribution of SGEMD is that it extends the scope of decomposition to noisy graph signals that cannot be efficiently handled by existing graph decomposition methods, such as graph Fourier-based decomposition and graph empirical mode decomposition. The proposed SGEMD can efficiently decompose various graph signals into several components without distortions and achieve stable decomposition results near the boundaries. Finally, the effectiveness of SGEMD is demonstrated through simulation experiments and real data analysis.


Keywords: Graph signal, Graph Fourier transform, Graph empirical mode decomposition, Graph denoising, Boundary treatment

Evening Program [19:00~21:00]

History of the Multiscale Lab and Career Exploration (박민수, Department of Information and Statistics, Chungnam National University)

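This seminar introduces the history of the Multiscale Lab, the current career paths of its alumni, and career options after completing graduate school in statistics.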

February 23 (Fri)

Session 3: Non-Euclidean Data Analysis (Chair: 권영) [9:10~10:30]

On a Notion of Graph Centrality Based on $L_1$ Data Depth (강승우, Department of Statistics, Seoul National University) [9:10~9:50]

A new measure to assess the centrality of vertices in an undirected and connected graph is proposed. The proposed measure, $L_1$ centrality, can adequately handle graphs with weights assigned to vertices and edges. The study provides tools for graphical and multiscale analysis based on the $L_1$ centrality. Specifically, the suggested analysis tools include the target plot, $L_1$ centrality-based neighborhood, local $L_1$ centrality, multiscale edge representation, and heterogeneity plot and index. Most importantly, our work is closely associated with the concept of data depth for multivariate data, which allows for a wide range of practical applications of the proposed measure. Throughout the paper, we demonstrate our tools with two interesting examples: the Marvel Cinematic Universe movie network and the bill cosponsorship network of the 21st National Assembly of South Korea. An R package L1centrality, available from the Comprehensive R Archive Network (CRAN), provides all methods and data sets used in this paper.


Keywords: Graph centrality, Data depth, Multiscale analysis, Visualization, Network data

Statistics with the boundary at infinity on Hadamard manifolds (신하영, Department of Statistics, Seoul National University) [9:50~10:30]

This presentation is an outline of my doctoral dissertation. When doing statistics with metric space-valued data, we sometimes need a notion of direction. This exists in Euclidean space, but not necessarily on general metric spaces. On Hadamard spaces (also known as spaces of global non-positive curvature or complete CAT(0) spaces), the so-called boundary at infinity gives a canonical sense of direction at each point, which can be used to do statistical inference on these spaces. I will detail how this property can be used to define quantiles on Hadamard spaces and introduce some large-sample properties of sample quantiles. The presentation concludes with a discussion on further statistical applications of this boundary at infinity, including expectiles on Hadamard spaces and treatment effects on Hadamard spaces.


Keywords: Geometric statistics

Session 4: Tensor and Time Series Data Analysis (Chair: 박민수) [10:40~11:50]

Robust Penalized Rank-One Tensor Approximation (권영욱, Department of Statistics, Seoul National University) [10:40~11:20]

Many examples of tensor data are smooth along one or more modes and often contain outliers beyond the level of noise. Therefore, considering smoothness and robustness simultaneously in tensor decomposition helps to identify continuously varying factors that are less sensitive to contamination of the data. We present a novel rank-one approximation approach that incorporates these two viewpoints in the objective of the CP decomposition model. Our formulation adopts a robust loss and a penalty in each mode of the tensor to ensure robustness and smoothness. We develop an iteratively reweighted least squares algorithm as an extension of Zhang et al. (2013). Furthermore, we increase the flexibility of our models by applying spline constraints to each factor of the tensor. We demonstrate the advantages of our methods over non-robust or non-smooth decomposition methods via simulation studies and an analysis of Korean sea weather data.


Keywords: Low-rank approximation, CP decomposition, Smoothness, Robustness

An adjusted boxplot for skewed distribution (정수빈, Department of Statistics, Chungnam National University) [11:20~11:35]

The boxplot visualizes the distribution of continuous unimodal data, providing information about the location, spread, skewness, and tails of the data. However, when the data are skewed, regular observations often exceed the fences and are erroneously classified as outliers.

In this study, a method for adjusting the boxplot is proposed that incorporates a robust measure of skewness in the determination of the fences. This allows for a more accurate identification of outliers in the data. As a result, the adjusted boxplot can be used as a fast and automatic outlier detection tool without making parametric assumptions about the distribution of the bulk of the data. Several examples and simulation results show the advantages of this new procedure.

Furthermore, we propose a new outlier detection algorithm for time series data. Seasonal and trend patterns are first separated using LOESS (local regression). The remaining residuals are then used for outlier detection with an adjusted boxplot that can handle asymmetric distributions. The algorithm is expected to be effective in identifying outliers in nonlinear, non-stationary, and asymmetric time series with trends and seasonal variations.


Keywords: Outlier detection, Medcouple, LOESS, Skewness

Short-term load forecasting using EMD-LSTM neural networks with a XGBoost algorithm for feature importance evaluation (고민규, Department of Statistics, Chungnam National University) [11:35~11:50]

Predicting electricity consumption is important for a stable and efficient electricity supply. In this study, a single electricity time series is divided into groups of days with similar characteristics, and more accurate prediction is achieved by applying LSTM to the major trends in each group. I would like to give an overall introduction to this method and discuss the parts that could be extended.


Keywords: Extreme gradient boosting, Similar day, Empirical mode decomposition, Long short-term memory neural networks

Closing Remarks: 오희석 [11:50~12:00]

Photos

Slides