The Multiscale Lab (다중척도연구실) holds this workshop for personal and academic exchange among current lab members and the members of its graduates' research groups. Through the workshop, participants are expected to understand and discuss one another's research topics. Each participant gives a 20-40 minute presentation on their research, followed by a question-and-answer session.
Thursday, January 8, 2026, 1:00 PM to Saturday, January 10, 2026, 12:00 noon
Room 609 (tiered lecture room), Convention Center, Seoul National University Siheung Campus
Dinner, January 8: 건강밥상심마니 배곧점 (18:00)
Breakfast, January 9 and 10: S-LOUNGE (에스라운지), Seoul National University Siheung Campus
Lunch, January 9 and 10: on the Seoul National University Siheung Campus (exact location to be announced)
Dinner, January 9: 투파인드피터 배곧점 (17:30)
Elastic-Band Transform–Integrated Spatio-Temporal Graph Neural Networks for Solar Radiation Forecasting (최규빈, 전북대학교 통계학과) [13:30~14:10]
This study presents a forecasting framework that simultaneously captures the strong periodicity and irregular meteorological fluctuations inherent in solar radiation time series. Existing approaches typically define inter-regional correlations using either simple correlation coefficients or distance-based measures when applying spatio-temporal graph neural networks (STGNNs). However, such definitions can lead to spurious correlations due to the dominance of periodic structures. To address this issue, we adopt the Elastic-Band Transform (EBT) to decompose solar radiation into periodic and amplitude-modulated components, which are then modeled independently with separate graph neural networks. The periodic component, characterized by strong nationwide correlations, is learned with a relatively simple architecture, whereas the amplitude-modulated component is modeled with more flexible STGNNs that capture climatological similarities between regions. The predictions from the two components are subsequently recombined to produce final forecasts that reflect both periodic patterns and aperiodic variability. The proposed framework is evaluated with multiple STGNN architectures, and experimental results show improved predictive accuracy and interpretability compared with conventional methods.
Keywords: Spatio-temporal graph neural network (STGNN), Elastic-band transform (EBT), Solar radiation forecasting
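As a schematic reading aid (illustrative notation, not taken from the abstract), the two-branch design can be summarized as: decompose each regional series with the EBT, forecast the components with separate graph models, and recombine,
\[
y_{i,t} \;\xrightarrow{\ \mathrm{EBT}\ }\; \bigl(p_{i,t},\, a_{i,t}\bigr), \qquad
\hat p_{i,t+h} = f_{\mathrm{per}}\bigl(p_{i,\,t-w+1:t};\, G_{\mathrm{per}}\bigr), \quad
\hat a_{i,t+h} = f_{\mathrm{amp}}\bigl(a_{i,\,t-w+1:t};\, G_{\mathrm{amp}}\bigr), \qquad
\hat y_{i,t+h} = \mathrm{EBT}^{-1}\bigl(\hat p_{i,t+h},\, \hat a_{i,t+h}\bigr),
\]
where p and a denote the periodic and amplitude-modulated components, f_per is the simpler network for the strongly correlated periodic part, f_amp the more flexible STGNN, and the two graphs encode nationwide periodic correlation and climatological similarity, respectively.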
Graph Canonical Coherence Analysis (김규순, 숭실대학교 정보통계보험수리학과) [14:10~14:50]
We propose a graph canonical coherence analysis, a new framework that extends canonical correlation analysis to multivariate graph signals in the graph frequency domain.
The method addresses the challenges arising from the distinctive characteristics of graphs: discreteness, finiteness, and irregularity. It identifies pairs of canonical graph signals that maximize their coherence, enabling the investigation of relationships between two sets of graph signals from a spectral perspective. This framework reveals how such associations vary across different structural scales of the graph. A real data application to economic and energy datasets from the G20 countries is provided.
Keywords: Canonical correlation analysis, Frequency domain, Graph signal processing, Multivariate graph signal
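For orientation only (the classical form, not the graph construction proposed in the talk), canonical correlation analysis seeks projection directions maximizing
\[
\rho \;=\; \max_{\mathbf a,\, \mathbf b}\;
\frac{\mathbf a^{\top}\boldsymbol\Sigma_{XY}\,\mathbf b}
{\sqrt{\mathbf a^{\top}\boldsymbol\Sigma_{XX}\,\mathbf a}\;\sqrt{\mathbf b^{\top}\boldsymbol\Sigma_{YY}\,\mathbf b}},
\]
and coherence analysis applies the same idea to spectral matrices frequency by frequency; the proposed framework carries this out in the graph frequency domain defined by the graph Fourier transform, yielding canonical pairs of graph signals whose association can be read off across structural scales.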
A Data-Driven Geometric Metric for Long-Range Interactions in 3D Hi-C Data (정윤채, 충남대학교 통계데이터사이언스학과) [14:50~15:10]
Hi-C-based three-dimensional genome reconstructions offer a geometric view of chromatin organization, and contact intensities provide complementary information on local interaction patterns. To characterize large-scale structure, we introduce a data-driven geometric metric that summarizes the relation between spatial separation in the reconstructed three-dimensional space and linear genomic distance along the polymer. The resulting scale-free quantity highlights long-range pairs with unusually small spatial separation and remains stable across multiple binning resolutions without reliance on parametric decay functions. Applied to cancer-associated Hi-C datasets, the metric uncovers distinct patterns of long-range structural organization that escape detection through contact intensities alone. These findings show that the proposed measure captures features of chromatin architecture that enrich the analysis of three-dimensional genome structure within a statistically interpretable framework.
Keywords: Hi-C, Chromatin structure, Distance ratio metric, Long-range interactions, Cancer genomics
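The exact definition of the metric is the paper's; purely as an illustration of the ingredients named above (3D separation in the reconstruction versus linear genomic distance), the following Python sketch computes a hypothetical distance ratio and flags long-range bin pairs that sit unusually close in 3D. The bin size, minimum separation, and flagging rule are placeholders, not the authors' definition.

```python
import numpy as np

def distance_ratio(coords, bin_size=100_000, min_sep=20):
    """Hypothetical distance-ratio summary for a reconstructed 3D structure.

    coords : (n, 3) array of 3D coordinates for consecutive genomic bins.
    Returns pair indices (i, j), the ratio of 3D distance to genomic
    distance, and a placeholder flag for unusually close long-range pairs.
    """
    n = coords.shape[0]
    i, j = np.triu_indices(n, k=min_sep)           # long-range pairs only
    d3d = np.linalg.norm(coords[i] - coords[j], axis=1)
    dgen = (j - i) * bin_size                      # linear genomic separation (bp)
    ratio = d3d / dgen
    flag = ratio < np.quantile(ratio, 0.01)        # placeholder threshold
    return i, j, ratio, flag

# Toy usage with a random-walk "structure"
rng = np.random.default_rng(0)
coords = np.cumsum(rng.normal(size=(500, 3)), axis=0)
i, j, ratio, flag = distance_ratio(coords)
print(flag.sum(), "candidate long-range contacts")
```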
Multiview Representation and Clustering of Functional Data (강승우, 서울대학교 기초과학연구원) [15:30~16:10]
The problem of clustering functional data from multiple points of view is addressed. I present a latent space representation of random functions and statistical methods to obtain representative curves and cluster functional data within this framework. The contribution of this research is three-fold: (i) development of ways to cluster and (optimally) represent a functional dataset from multiple points of view, (ii) proposal of a new separable Hilbert space structure for warping functions, and (iii) theoretical treatment of this new representation and clustering algorithm. I will demonstrate the work using synthetic and real datasets.
Keywords: Clustering, Functional data, Registration, Dimension reduction
Anomaly Detection Followed by Fault Type Classification of Functional Data on the Semiconductor Manufacturing Process (김민주, 한양대학교 응용통계학과) [16:10~16:30]
This study proposes a two-stage analytical framework that interprets the wafer sensor time-series data collected from semiconductor etching processes as functional data, integrating anomaly detection and fault-type classification. In the first stage, instead of the conventional functional logistic regression based on segment-wise mean levels, a slope-based quadratic logistic regression model is introduced. To this end, the effective on-region is automatically detected, and dynamic programming (DP) is used to perform minimum sum-of-squared-error (SSE) segmentation, dividing each signal into five linear segments from which slopes are extracted as representative features. Subsequently, three types of simulated datasets are used to compare the performance of the proposed method with that of the comparison approach. In the second stage, a one-dimensional convolutional neural network (1D-CNN) is employed to classify abnormal wafers into five integrated fault classes, and mixup-based data augmentation is applied to alleviate the problem of class imbalance. The proposed pipeline shows consistent performance improvements over the comparison method in both stages. In the anomaly detection stage, the proposed model achieved an F1-score of 0.9524 (comparison: 0.9419) on the training data and 0.9474 (comparison: 0.9136) on the validation data. Across all three simulated datasets (each repeated 1000 times), the quartiles and medians of the F1-scores improved, indicating enhanced robustness. In the fault-type classification stage, the CNN model using the proposed features achieved improvements of approximately 1.4–1.9 percentage points in accuracy, precision, recall, and F1-score (F1 = 0.9610 vs. 0.9416, 95% CI [0.9495, 0.9725]) across 30 repeated evaluations, while misclassification between the Residue and Uniformity classes was substantially reduced. These results demonstrate that the proposed quadratic logistic model more precisely captures the structural variations in the signals and that the slope-based features effectively reflect class-specific characteristics in the CNN stage. In conclusion, the proposed two-stage framework, comprising quadratic functional logistic regression and a 1D-CNN, improves both accuracy and stability in anomaly detection and fault-type classification, offering practical potential for early warning and root-cause diagnosis in real semiconductor manufacturing lines.
Keywords: Semiconductor manufacturing, Anomaly detection, Functional logistic regression, Dynamic programming, 1D-CNN
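The first stage above rests on minimum-SSE segmentation of each signal into five linear segments by dynamic programming, with segment slopes as features. The Python sketch below is a minimal, unoptimized implementation of that generic step only (the effective on-region detection, the quadratic logistic model, and the CNN stage are not reproduced, and all names and defaults are illustrative).

```python
import numpy as np

def linear_sse(x, y):
    """SSE and slope of the least-squares line fit to (x, y)."""
    A = np.vstack([x, np.ones_like(x)]).T
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid), float(coef[0])

def dp_segment(y, n_seg=5):
    """Minimum-SSE partition of y into n_seg contiguous linear segments.

    Returns the breakpoints [0, b1, ..., n] and the per-segment slopes.
    Costs are precomputed by brute force, which is fine for short signals.
    """
    n = len(y)
    t = np.arange(n, dtype=float)
    cost = np.full((n + 1, n + 1), np.inf)
    for s in range(n - 1):
        for e in range(s + 2, n + 1):              # segment y[s:e], at least 2 points
            cost[s, e], _ = linear_sse(t[s:e], y[s:e])
    D = np.full((n_seg + 1, n + 1), np.inf)
    back = np.zeros((n_seg + 1, n + 1), dtype=int)
    D[0, 0] = 0.0
    for k in range(1, n_seg + 1):
        for e in range(2 * k, n + 1):
            s_range = range(2 * (k - 1), e - 1)
            vals = [D[k - 1, s] + cost[s, e] for s in s_range]
            best = int(np.argmin(vals))
            D[k, e] = vals[best]
            back[k, e] = s_range.start + best
    bounds, e = [n], n
    for k in range(n_seg, 0, -1):                  # backtrack the chosen breakpoints
        e = back[k, e]
        bounds.append(e)
    bounds = bounds[::-1]
    slopes = [linear_sse(t[s:e], y[s:e])[1] for s, e in zip(bounds[:-1], bounds[1:])]
    return bounds, slopes

# Toy usage on a noisy piecewise-linear signal
rng = np.random.default_rng(0)
y = np.concatenate([np.linspace(0, 1, 30), np.linspace(1, 3, 30),
                    np.linspace(3, 2.2, 30), np.full(30, 2.2),
                    np.linspace(2.2, 0, 30)]) + 0.05 * rng.normal(size=150)
print(dp_segment(y, n_seg=5))
```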
Dual Homotopy EM (최지수, 서울대학교 통계학과) [16:30~17:10]
The expectation-maximization (EM) algorithm is a powerful tool for parameter estimation involving latent variables; however, in constrained parameter settings, its performance is degraded by challenges in the M-step. These challenges include instability, sensitivity to initialization, and biased estimates near constraint boundaries. For example, survival data in reliability engineering, which often exhibit phase-dependent hazard rates, are typically modeled by the bathtub curve. This structure involves a mixture of Weibull distributions with parameter constraints that reflect known reliability behavior, so the standard EM does not yield stable parameter estimates in this situation. To overcome these issues, we propose a new algorithm, ‘Dual-Homotopy EM (DH-EM)’, which incorporates homotopy-based continuation strategies in both the E-step and the M-step to improve robustness and constraint handling. In DH-EM, the E-step employs an entropy-regularized formulation inspired by the deterministic annealing EM algorithm to stabilize latent variable inference, while the M-step uses a barrier method to ensure feasibility under parameter constraints. Numerical experiments, including a real data analysis, show that the proposed DH-EM provides stable and interpretable estimates, establishing it as a principled solution for constrained estimation and change-point detection in structured survival models.
Keywords: Constrained parameter estimation, Deterministic Annealing EM algorithm, Barrier method
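The two homotopy ingredients named above have standard generic forms (shown for reference only; the paper's exact schedules and constraint sets are not reproduced): a tempered E-step and a log-barrier-augmented M-step,
\[
r_{ik}^{(T)} \;=\; \frac{\bigl[\pi_k f_k(x_i \mid \theta_k)\bigr]^{1/T}}
{\sum_{l}\bigl[\pi_l f_l(x_i \mid \theta_l)\bigr]^{1/T}},
\qquad
\theta^{(t+1)} \;=\; \arg\max_{\theta}\; Q\bigl(\theta \mid \theta^{(t)}\bigr) + \mu \sum_{j} \log c_j(\theta),
\]
where the temperature T is driven toward 1 and the barrier weight μ toward 0 along the continuation path, and c_j(θ) > 0 are the parameter constraints (for example, constraints on Weibull shape parameters implied by the bathtub structure).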
Spatio-Temporal Hotspot Analysis of Emergency Dispatches on the Ulsan Road Network (최민석, 한양대학교 응용통계학과) [9:00~9:20]
In emergencies such as fires, rescues, and medical calls, securing the golden time is a key task for minimizing casualties and property damage. Korean firefighting policy sets the golden time as arrival on scene within about seven minutes of a fire outbreak, and Ulsan has for several years been reported as the metropolitan city with the lowest golden-time arrival rate. Against this background, this study aims to identify, from a linear-network perspective, the road segments and time periods in Ulsan where emergency dispatches are concentrated in space and time. We first partition the Ulsan road network into lixels of fixed length and reconstruct each dispatch location from the Ulsan emergency dispatch data as a point pattern on the road network. We then apply Temporal Network Kernel Density Estimation (TNKDE) to define a spatio-temporal adjacency structure whose nodes are lixel-by-time cells, and use an Equal-Split Discontinuous (ESD) TNKDE-based Local Indicator of Spatio-Temporal Association (LISTA) together with Monte Carlo tests to detect clusters whose dispatch intensity is significantly higher or lower than that of their surroundings. This allows us to distinguish and visualize road segments on the Ulsan road network where high volumes of emergency dispatches repeatedly concentrate in particular time periods and segments that remain relatively stable.
Keywords: Network, Point pattern, Kernel density estimation, Hotspot
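As a generic reference for the estimator family used above (the equal-split discontinuous edge corrections and the exact TNKDE weights are omitted), a network space-time kernel intensity estimate at network location u and time t has the form
\[
\hat\lambda(u, t) \;=\; \sum_{i=1}^{n} \frac{1}{h_s h_t}\,
\kappa\!\left(\frac{d_N(u, u_i)}{h_s}\right)
\kappa\!\left(\frac{t - t_i}{h_t}\right),
\]
where d_N is the shortest-path distance along the road network, (u_i, t_i) are dispatch locations and times, and h_s, h_t are network and temporal bandwidths; the LISTA statistics then compare each lixel-time cell's local intensity against Monte Carlo randomizations.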
Adaptive Boosting in Linear Networks and Its Application to River Network (임승연, 한양대학교 응용통계학과) [9:20~9:40]
Classification is a supervised machine learning method that predicts a categorical response variable using several explanatory variables. If observations are sampled from a spatial point process, then we can also use the x- and y-coordinates as explanatory variables. If the observations are sampled from a known linear network instead of the whole space, then the distance between two points is defined differently, and we require a classifier suited to data clustered on the network. In this study, we address the classification problem on a tree-shaped linear network. We select a point on the edges of the given linear network to split the space, and then construct a decision tree through recursive splits. We propose an adaptive boosting algorithm using this decision tree as a weak classifier. Finally, we provide some simulated examples and a real data analysis, comparing with adaptive boosting based on decision trees constructed using Cartesian coordinates. The proposed method achieves better accuracy than the comparison method when the observations are clustered on the linear network.
Keywords: Linear network, Classification, Decision tree, Adaptive boosting, River network
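The network-based splitting rule is the contribution here; the boosting wrapper itself follows the standard AdaBoost recursion (stated for reference, with labels y_i ∈ {−1, +1} and weak classifiers h_m):
\[
\varepsilon_m = \sum_{i=1}^{n} w_i\, \mathbb 1\{h_m(x_i) \ne y_i\}, \qquad
\alpha_m = \tfrac12 \log\frac{1-\varepsilon_m}{\varepsilon_m}, \qquad
w_i \;\leftarrow\; \frac{w_i \exp\bigl(-\alpha_m y_i h_m(x_i)\bigr)}{\sum_{i'} w_{i'} \exp\bigl(-\alpha_m y_{i'} h_m(x_{i'})\bigr)},
\]
with the final classifier given by the sign of Σ_m α_m h_m(x); in the proposed method each h_m is a decision tree whose splits are points on the edges of the linear network rather than axis-aligned cuts in Cartesian coordinates.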
Exploring Distributions of the Data Observed on Several Network Domains (김준표, 세종대학교 수학통계학과) [9:40~10:20]
This talk presents two case studies exploring distributions of the data observed on network domains. First, we explore the level of pollutants on the river stream network, observed along the stream paths. To achieve this goal, we propose a spatio-temporal additive model for the river network data combined with quantile and expectile regression. Second, we explore the number of passengers at each subway station in Seoul, treating it as an observation on the subway network. For this, we adopt methodologies that have been proposed for graph signals, as well as the heavy-snow transform, a multiscale tool.
Keywords: River network, Subway network, Quantile
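For reference, the two asymmetric losses underlying the additive models above are, for residual u and level τ ∈ (0, 1),
\[
\rho_\tau^{\mathrm{quantile}}(u) \;=\; u\bigl(\tau - \mathbb 1\{u < 0\}\bigr),
\qquad
\rho_\tau^{\mathrm{expectile}}(u) \;=\; \bigl|\tau - \mathbb 1\{u < 0\}\bigr|\,u^{2},
\]
so the spatio-temporal additive terms are fitted by minimizing the chosen asymmetric loss instead of squared error, which is what lets the models target the upper or lower parts of the pollutant distribution.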
A Surrogate-based Framework for Modeling Hawkes Processes Under Spatial Uncertainty (Junhyeon Kwon, Department of Data Analytics and Statistics, University of North Texas) [10:40~11:20]
Hawkes processes are commonly used to capture clustered structures in point pattern data, as they allow each event to elevate the chance of subsequent event occurrences. However, this triggering mechanism is difficult to model accurately when spatial information is measured at varying levels of precision, a situation frequently encountered not only in ecological field studies (e.g., animal observation data with imprecise geolocation) but also in social science applications such as terrorism incident records, where event locations can be reported at varying spatial resolutions. A common strategy is to use only events with the most precise geolocation, but this can lead to both a loss of information and inaccurate estimates of the underlying triggering structure. In this research, we propose a novel framework that retains events with less precise location data by incorporating location-relevant marks as surrogate measures of spatial information. We integrate this surrogate into nonparametric intensity estimation through a modified weighting scheme in the Model-Independent Stochastic Declustering algorithm. Simulation studies verify that the proposed method can recover the triggering structure more accurately than standard approaches. We further illustrate its usefulness with an application to real-world data, demonstrating how the suggested framework can enhance our understanding of space-time clustering by carefully incorporating imprecise events.
Keywords: Hawkes process, Nonparametric intensity estimation, Spatial imprecision, Spatio-temporal point process
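For reference, a spatio-temporal Hawkes process is specified through its conditional intensity
\[
\lambda(s, t \mid \mathcal H_t) \;=\; \mu(s) \;+\; \sum_{i:\, t_i < t} g\bigl(s - s_i,\, t - t_i\bigr),
\]
where μ is the background intensity and g the triggering kernel; the surrogate marks enter through modified weights in the Model-Independent Stochastic Declustering step, so that imprecisely located events still contribute to the nonparametric estimates of μ and g.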
Statistical Modeling of Marine Extremes around Korea (박선철, 한양대학교 수학과) [Jan 9, 11:20~12:00]
In this talk, we present two case studies on marine extremes around Korea: extreme wave heights and marine heatwaves (MHWs). For the wave height data, we first examine its statistical characteristics and use an F-madogram–based clustering approach to identify spatial differences between the Yellow Sea and the East Sea. For marine heatwaves in the East Sea, we apply generalized additive models for location, scale, and shape (GAMLSS-type GAMs) to capture nonlinear structures and spatial correlations. Together, these examples demonstrate how clustering methods and flexible distributional regression can enhance our understanding of marine extremes in Korea.
Keywords: Wave heights, Marine heatwaves, F-madogram, Clustering, Generalized additive models
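For reference, the F-madogram between two sites s₁ and s₂ with common margin F is
\[
\nu_F(s_1, s_2) \;=\; \tfrac12\, \mathbb E\,\bigl|F\bigl(Z(s_1)\bigr) - F\bigl(Z(s_2)\bigr)\bigr|,
\]
a bounded dependence summary that, for max-stable processes, is in one-to-one correspondence with the pairwise extremal coefficient θ = (1 + 2ν_F)/(1 − 2ν_F); the pairwise F-madogram values serve as the dissimilarities behind the clustering of wave-height extremes.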
Expectile Factor Model for Panel Data (박세은, 서울대학교 통계학과) [13:30~14:10]
This study introduces an Expectile Factor Model (EFM), a new class of factor models designed to extract common structure specifically at certain expectile levels for large panel datasets. By leveraging the smoothness and tail sensitivity of expectiles, EFM is particularly suited for identifying latent forces associated with extremely high or low levels, as well as uncovering factors that become prominent only in the tails or hidden drivers of rare events. Compared with quantile-based methods, EFM enjoys both improved computational stability and enhanced ability to characterize the extreme deviations, making it a powerful tool for tail-oriented factor analysis.
Keywords: Tail behavior, Extreme shocks, Latent factors
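In schematic form (illustrative notation, not the paper's), the level-τ factors and loadings minimize an asymmetric squared loss rather than the usual least-squares criterion,
\[
\min_{\{\boldsymbol\lambda_i\},\,\{\mathbf f_t\}} \;\sum_{i=1}^{N}\sum_{t=1}^{T}
\rho_\tau\!\bigl(x_{it} - \boldsymbol\lambda_i^{\top}\mathbf f_t\bigr),
\qquad
\rho_\tau(u) \;=\; \bigl|\tau - \mathbb 1\{u<0\}\bigr|\,u^{2},
\]
where the smoothness of ρ_τ in u underlies the computational stability relative to check-loss-based quantile factor models, while τ near 0 or 1 targets the tails of the panel.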
Abnormal Behavior Detection in Unmanned Stores with Low Light Robust Features (임지원, 충남대학교 통계데이터사이언스학과) [14:10~14:30]
Unmanned stores increasingly encounter harmful behaviors such as theft, property damage, and smoking, while patrol-based or after-the-fact monitoring responses remain inefficient. This study presents a real-time abnormal behavior detection model that combines skeleton-based motion information with scene-level context. Joint coordinates are extracted to model body movement, and a graph-based representation is used to learn structural and temporal patterns. Contextual signals related to object movement or smoke are added to strengthen behavioral interpretation. Low-light conditions often degrade pose estimation in unmanned stores. To reduce this effect, an illumination-aware training strategy uses both normal and darkened images, which improves the stability of joint extraction and supports reliable classification without additional enhancement steps. Experiments show that the proposed model consistently distinguishes normal and abnormal behaviors and achieves real-time performance. The integration of motion cues, contextual information, and low-light robustness provides a practical approach for safety management in unmanned store environments.
Keywords: Abnormal behavior detection, Skeleton representation, Context information, Low light robustness, Unmanned store monitoring
Bayesian Additive Tree Ensembles for Composite Quantile Regressions (임예지, 중앙대학교 응용통계학과) [14:30~15:10]
In this paper, we introduce a novel approach that integrates Bayesian additive regression trees (BART) with the composite quantile regression (CQR) framework, creating a robust method for modeling complex relationships between predictors and outcomes under various error distributions. Unlike traditional quantile regression, which focuses on specific quantile levels, our proposed method, composite quantile BART, offers greater flexibility in capturing the entire conditional distribution of the response variable. By leveraging the strengths of BART and CQR, the proposed method provides enhanced predictive performance, especially in the presence of heavy-tailed errors and non-linear covariate effects. Numerical studies confirm that the proposed composite quantile BART method generally outperforms classical BART, quantile BART, and composite quantile linear regression models in terms of RMSE, especially under heavy-tailed or contaminated error distributions.
Keywords: Bayesian additive regression trees, Composite quantile regression, Heavy-tailed errors, Non-linear covariate effects
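For reference, the composite quantile criterion that the sum-of-trees model is coupled with has the generic form (level-specific intercepts with a shared regression function; a Bayesian formulation typically embeds this in a working likelihood):
\[
\sum_{k=1}^{K}\sum_{i=1}^{n} \rho_{\tau_k}\!\bigl(y_i - b_{\tau_k} - f(x_i)\bigr),
\qquad
\rho_\tau(u) \;=\; u\bigl(\tau - \mathbb 1\{u<0\}\bigr),
\]
where τ_1 < ⋯ < τ_K are the quantile levels and f is, in composite quantile BART, a sum of regression trees.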
Quantile-based Graph Empirical Mode Decomposition (하지훈, 서울대학교 통계학과) [15:30~16:00]
Statistical graph empirical mode decomposition (SGEMD) is a data-driven method for decomposing noisy graph signals into intrinsic mode components using graph-based envelope construction. We propose an extension of SGEMD that replaces its conventional upper and lower envelope construction with a quantile-based fitting approach. While SGEMD provides stable decomposition of noisy graph signals, envelope estimation based on mean-oriented fitting may be sensitive to outliers and distributional heterogeneity. The proposed method leverages quantile-based graph fitting to construct robust envelopes that capture distributional features beyond the mean while respecting the underlying graph structure.
Keywords: Graph denoising, Graph empirical mode decomposition, Quantile, Graph signal
Wavelet Scattering Meets Transformers: Detecting Anomalies in Robotic Trajectories (류하은, 충남대학교 정보통계학과) [16:00~16:20]
Modern robots generate a wealth of multivariate sensor signals, yet detecting when these trajectories go wrong remains challenging. In this work, we propose an unsupervised framework for abnormal trajectory detection using wavelet scattering and Transformer networks. Unlike vision-based approaches, our method focuses purely on trajectory-level sensor dynamics. Using the DROID (Dataset for Robot Instruction and Demonstration), we model normal motion from joint positions and velocities without supervision. The Wavelet Scattering Transform (WST) provides stable, physics-informed representations of temporal–frequency structure, which are then fed into a transformer-based time-series encoder to capture long-range dependencies across sensors and time. The model is trained exclusively on normal trajectories and evaluated through controlled injections of drifts, spikes, and temporal delays. Results indicate that the WST-transformer combination effectively identifies subtle deviations in robotic motion, outperforming baseline unsupervised detectors. The proposed framework offers a promising step toward robust, sensor-level anomaly monitoring and predictive maintenance in autonomous robotic systems.
Keywords: Statistical signal processing, Wavelet scattering transform, Transformer time-series models, Unsupervised anomaly detection, Robotic systems
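As background for the feature extractor (standard definitions, with ψ_λ a wavelet filter bank and φ a low-pass filter), the first- and second-order scattering coefficients of a signal x are
\[
S_1x(t, \lambda_1) \;=\; \bigl(\,|x \star \psi_{\lambda_1}| \star \phi\,\bigr)(t),
\qquad
S_2x(t, \lambda_1, \lambda_2) \;=\; \bigl(\,\bigl|\,|x \star \psi_{\lambda_1}| \star \psi_{\lambda_2}\,\bigr| \star \phi\,\bigr)(t),
\]
which are locally translation invariant and stable to small time warps; in the proposed framework such coefficients provide the representations passed to the Transformer-based encoder.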
Penalized Empirical Mode Decomposition with Second-Order Constraints (박민수, 충남대학교 정보통계학과) [16:20~17:00]
Empirical Mode Decomposition (EMD) is widely used for analyzing nonlinear and non-stationary signals, but it lacks a theoretical foundation and is prone to mode mixing and boundary effects. We propose a penalized EMD framework that formulates the extraction of Intrinsic Mode Functions (IMFs) as a regularized optimization problem with second-derivative constraints. This curvature-based penalty enhances mode separation, reduces artifacts, and yields smoother, more interpretable IMFs. The method extends naturally to two-dimensional signals via Laplacian regularization. Simulation studies and real-world applications demonstrate improved robustness and decomposition quality over classical EMD, and theoretical properties regarding identifiability and convergence are also established.
Keywords: Empirical mode decomposition, Penalized optimization, Second-order smoothness constraint, Signal decomposition
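The paper's penalized objective is not reproduced here; as a minimal stand-in that shows the effect of a second-difference (curvature) penalty, the Python snippet below solves the Whittaker-type problem min_m ‖y − m‖² + λ‖D₂m‖², whose smooth solution separates a slowly varying component from an oscillatory remainder. Parameter values are illustrative only.

```python
import numpy as np

def second_difference_smooth(y, lam=1e4):
    """Whittaker-type smoother: argmin_m ||y - m||^2 + lam * ||D2 m||^2."""
    n = len(y)
    D2 = np.diff(np.eye(n), n=2, axis=0)   # second-difference operator, shape (n-2, n)
    return np.linalg.solve(np.eye(n) + lam * D2.T @ D2, y)

# Toy signal: slow trend + fast oscillation + noise
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 400)
y = np.sin(2 * np.pi * 2 * t) + 0.3 * np.sin(2 * np.pi * 25 * t) + 0.05 * rng.normal(size=t.size)
trend = second_difference_smooth(y)
fast = y - trend   # oscillatory remainder, loosely analogous to a first IMF
```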
Huber Means on Riemannian Manifolds (이종민, 부산대학교 통계학과) [9:00~9:40]
This article introduces Huber means on Riemannian manifolds, providing a robust alternative to the Fréchet mean by integrating elements of both absolute and quadratic loss functions. The Huber means are designed to be highly resistant to outliers while maintaining efficiency, making them a valuable generalization of Huber’s M-estimator for manifold-valued data. We comprehensively investigate the statistical and computational aspects of Huber means, demonstrating their utility in manifold-valued data analysis. Specifically, we establish nearly minimal conditions for ensuring the existence and uniqueness of the Huber mean and discuss regularity conditions for unbiasedness. The Huber means are consistent and satisfy a central limit theorem. Additionally, we propose a novel moment-based estimator for the limiting covariance matrix, which is used to construct a robust one-sample location test procedure and an approximate confidence region for location parameters. The Huber mean is shown to be highly robust and efficient in the presence of outliers or under heavy-tailed distributions. Specifically, it achieves a breakdown point of at least 0.5, the highest among all isometric equivariant estimators, and is more efficient than the Fréchet mean under heavy-tailed distributions.
Keywords: Geometric statistics, Robust statistics, Riemannian center of mass, Hypothesis testing, Statistics on manifolds
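For reference, with d the geodesic distance on the manifold M, the Huber mean is the M-estimator
\[
\hat m \;=\; \arg\min_{m \in \mathcal M} \sum_{i=1}^{n} \rho_c\bigl(d(x_i, m)\bigr),
\qquad
\rho_c(u) \;=\;
\begin{cases}
\tfrac12 u^{2}, & |u| \le c,\\[2pt]
c|u| - \tfrac12 c^{2}, & |u| > c,
\end{cases}
\]
so that c → ∞ recovers the Fréchet (L2) mean while small c approaches the geometric-median (L1) behavior, which is the source of the robustness and efficiency trade-off discussed above.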
Radial Fields on the Manifolds of Symmetric Positive Definite Matrices (신하영, 숭실대학교 정보통계보험수리학과) [9:40~10:20]
On Hadamard manifolds, the radial fields, which are the negative gradients of the Busemann functions, can be used to designate a canonical sense of direction. This has many potential applications to Hadamard manifold-valued data, for example in defining notions of quantiles or treatment effects. Some of the most commonly encountered Hadamard manifolds in statistics are the spaces of symmetric positive definite matrices, which are used in, for example, covariance matrix analysis and diffusion tensor imaging. Surprisingly, an expression for the radial fields on these manifolds is unavailable in the literature, even though the issue arises quite naturally when studying the geometry of these spaces. This research fills this gap by deriving such an expression and also establishes the smoothness of the resulting fields.
Keywords: Positive definite matrices, Hadamard manifolds, Geometric statistics, Radial fields
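For reference, for a unit-speed geodesic ray γ on a Hadamard manifold, the Busemann function and the associated radial field are
\[
B_\gamma(x) \;=\; \lim_{t \to \infty}\bigl(\,d(x, \gamma(t)) - t\,\bigr),
\qquad
R_\gamma(x) \;=\; -\operatorname{grad} B_\gamma(x),
\]
and the talk derives an explicit expression for R_γ, together with its smoothness, on the manifolds of symmetric positive definite matrices.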
Randomized QLP Decomposition for Third-Order Tensors with Unitary Transform (권영욱, 서울대학교 통계학과) [10:40~11:20]
Recently, randomized algorithms have gained considerable attention as efficient techniques for dimension reduction in large-scale data across various scientific fields. In this study, we introduce a randomized algorithm for third-order tensor decomposition based on the tensor-tensor product (t-product) using a unitary transform. Our approach is motivated by randomized tensor approximation methods that depend on random projections of each frontal slice of a tensor. However, these methods still incur significant computational costs when applying SVD or column-pivoted QR decomposition to the slices. To improve the efficiency of randomized algorithms, we propose a randomized tensor QLP decomposition (rt-QLP) without pivoting for third-order tensors, extending the matrix-based QLP to the tensor setting in the transformed domain. Deterministic and probabilistic error bounds are derived by combining properties of the t-product with existing error analysis results of matrix QLP. The effectiveness and efficiency of the proposed method are demonstrated through extensive numerical experiments on tasks such as data compression, image completion, and facial recognition.
Keywords: Randomized algorithm, Tensor decomposition, Transformed t-product, QLP decomposition, Randomized tensor QLP decomposition
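The randomized sketching and the transform-domain t-product are specific to the paper; as background only, the Python snippet below shows the unpivoted matrix QLP factorization that the method extends to tensor slices in the transform domain: two QR factorizations give A = Q L Pᵀ with L lower triangular.

```python
import numpy as np

def qlp(A):
    """Unpivoted QLP factorization: A = Q @ L @ P.T with L lower triangular."""
    Q, R = np.linalg.qr(A)       # A = Q R
    P, Lt = np.linalg.qr(R.T)    # R^T = P L^T, hence R = L P^T
    return Q, Lt.T, P

rng = np.random.default_rng(1)
A = rng.normal(size=(60, 40))
Q, L, P = qlp(A)
print(np.allclose(Q @ L @ P.T, A))   # reconstruction check
print(np.allclose(L, np.tril(L)))    # L is lower triangular; diag(L) roughly tracks the spectrum
```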
Development of an R Package for Expression Analysis of Longitudinal Omics Data (강하주, 한양대학교 응용통계학과) [11:20~11:40]
Longitudinal omics data consist of repeated measurements of thousands of molecular features on the same biological samples across multiple time points. However, the existing analysis package maSigPro assumes independent errors and thus fails to reflect temporal dependence, and it does not account for between-subject heterogeneity, which distorts statistical inference and limits the interpretation of dynamic expression changes. To address this, we propose a new analysis package that explicitly models the temporal correlation structure and between-subject heterogeneity using linear mixed-effects models. The proposed package comprises data preprocessing, gene-level model fitting and statistical testing, visualization of temporal expression patterns, and clustering based on temporal trends. Applied to real genomics data, it detected time-varying expression patterns better than the existing independence-based approach and summarized expression patterns efficiently. This work provides a practical, statistically grounded framework for expression analysis of longitudinal omics data that accounts for temporal dependence, and shows potential for extension to various omics studies beyond genomics.
Keywords: Omics data, R package, Longitudinal data analysis, Linear mixed-effects model
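A schematic per-feature model of the kind described above (illustrative notation; the package's exact parameterization may differ) is
\[
y_{gij} \;=\; \mathbf x_{ij}^{\top}\boldsymbol\beta_g + b_{gi} + \varepsilon_{gij},
\qquad
b_{gi} \sim N\!\bigl(0, \sigma_{b,g}^{2}\bigr), \quad
\boldsymbol\varepsilon_{gi} \sim N\!\bigl(\mathbf 0, \boldsymbol\Sigma_g\bigr),
\]
where g indexes molecular features, i subjects, and j time points; the subject-level random effect b_{gi} captures between-subject heterogeneity, and a structured Σ_g (for example, serial correlation over j) captures the temporal dependence that an independence-based fit such as maSigPro ignores.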
gglite: A Lightweight and Intuitive R Package for Extending Graphic Grammar (최규빈, 전북대학교 통계학과) [11:40~12:20]
This paper introduces gglite, a new R package designed to provide a lightweight and intuitive graphic grammar for data visualization. While maintaining full compatibility with the commonly used ggplot2 package, gglite offers additional features to simplify data exploration and facilitate graphical modeling. The package includes tools for visual inspection, interactive exploratory analysis, linear and nonlinear graphical modeling, and various auxiliary functions for descriptive analytics. Built upon an easy-to-use syntax, gglite enables users to construct diverse visualization workflows, perform exploratory data analysis efficiently, and extend ggplot2’s expressive power through enhanced defaults and convenient high-level operations.
Keywords: Graphic grammar, Data visualization
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (No. 2021R1A2C1091357).