Poster Session 1
(12/12, Fri, 13:00~14:15)
[Part 1] 12/12(Fri.) 13:00~14:15
[1-1] DomCLP: Domain-wise Contrastive Learning with Prototype Mixup for Unsupervised Domain Generalization (AAAI 2025)
이진섭 (Integrated M.S.-Ph.D. student)
Self-supervised learning (SSL) methods based on instance discrimination tasks with InfoNCE have achieved remarkable success. Despite this success, SSL models often struggle to generate effective representations for unseen-domain data. To address this issue, research on unsupervised domain generalization (UDG), which aims to develop SSL models that generate domain-irrelevant features, has been conducted. Most UDG approaches use contrastive learning with InfoNCE to generate representations and perform feature alignment based on strong assumptions to generalize domain-irrelevant common features from multiple source domains. However, existing methods that rely on instance discrimination tasks are not effective at extracting domain-irrelevant common features. This leads to the suppression of domain-irrelevant common features and the amplification of domain-relevant features, thereby hindering domain generalization. Furthermore, the strong assumptions underlying feature alignment can lead to biased feature learning, reducing the diversity of common features. In this paper, we propose a novel approach, DomCLP: Domain-wise Contrastive Learning with Prototype Mixup. We explore how InfoNCE suppresses domain-irrelevant common features and amplifies domain-relevant features. Based on this analysis, we propose Domain-wise Contrastive Learning (DCon) to enhance domain-irrelevant common features. We also propose Prototype Mixup Learning (PMix) to generalize domain-irrelevant common features across multiple domains without relying on strong assumptions. The proposed method consistently outperforms state-of-the-art methods on the PACS and DomainNet datasets across various label fractions, showing significant improvements.
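As a rough illustration of the difference between standard InfoNCE and a domain-wise variant, the sketch below computes InfoNCE for a single anchor and then restricts the negatives to samples from the anchor's own domain, so that domain-specific cues cannot be used to tell the anchor apart from its negatives. The restriction rule and the names `info_nce` and `domain_wise_info_nce` are hypothetical simplifications, not the authors' DCon implementation; vectors are plain Python lists.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def info_nce(anchor, positive, negatives, temperature=0.1):
    """Standard InfoNCE for one anchor:
    -log( exp(s_pos / t) / (exp(s_pos / t) + sum_j exp(s_neg_j / t)) )."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))

def domain_wise_info_nce(anchor, positive, bank, anchor_domain, temperature=0.1):
    """Hypothetical domain-wise variant: bank holds (vector, domain) pairs and
    only vectors from the anchor's own domain are used as negatives."""
    negatives = [vec for vec, dom in bank if dom == anchor_domain]
    return info_nce(anchor, positive, negatives, temperature)

# Usage: the "sketch" sample is excluded from the negatives of a "photo" anchor.
bank = [([1.0, 0.0], "photo"), ([0.0, 1.0], "sketch"), ([0.6, 0.8], "photo")]
loss = domain_wise_info_nce([1.0, 0.1], [0.9, 0.2], bank, anchor_domain="photo")
print(round(loss, 3))
```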
[1-2] Mitigating Semantic Collapse in Partially Relevant Video Retrieval (NeurIPS 2025)
정민석 (Integrated M.S.-Ph.D. student)
Partially Relevant Video Retrieval (PRVR) seeks videos in which only part of the content matches a text query. Existing methods treat every annotated text–video pair as a positive and all others as negatives, ignoring the rich semantic variation both within a single video and across different videos. Consequently, the embeddings of queries and of their corresponding video-clip segments for distinct events within the same video collapse together, while the embeddings of semantically similar queries and segments from different videos are driven apart. This limits retrieval performance when videos contain multiple, diverse events. This paper addresses these problems, which we term semantic collapse, in both the text and video embedding spaces. We first introduce Text Correlation Preservation Learning, which preserves the semantic relationships encoded by the foundation model across text queries. To address collapse in video embeddings, we propose Cross-Branch Video Alignment (CBVA), a contrastive alignment method that disentangles hierarchical video representations across temporal scales. Subsequently, we introduce order-preserving token merging and adaptive CBVA to enhance alignment by producing video segments that are internally coherent yet mutually distinctive. Extensive experiments on PRVR benchmarks demonstrate that our framework effectively prevents semantic collapse and substantially improves retrieval accuracy.
[1-3] Multi-Agent LLM Framework for PRECEDE-PROCEED Aligned Behavior Change Intervention Design (SAC 2026)
이진권 (M.S. student)
Behavior-change programs demand collaborative design and consensus among stakeholders, yet real deployments suffer from fragmented planning due to time pressure, uneven expertise, and limited access to evidence. We present a multi-agent large language model system aligned with the PRECEDE–PROCEED framework that synthesizes standardized intervention manuals via iterative propose–critique–revise cycles. Role-specialized agents perform retrieval-augmented generation from guideline and evidence corpora, while a moderator coordinates consensus and validates outputs against reporting standards (TIDieR, BCTTv1). We target young-adult behavioral health with a focus on sleep hygiene, physical activity, and digital overuse. Our contributions are: (i) a role-aligned RAG architecture that encodes stakeholder perspectives, (ii) code-level orchestration for multi-round deliberation with embedded validation loops, and (iii) automated conformance checking to intervention reporting checklists. We outline a comprehensive evaluation comparing our system to single-LLM baselines, self-consistency, generic multi-agent debate, and domain-specific consultation frameworks, using completeness, feasibility, stakeholder alignment, and expert-preference metrics. Results indicate improved checklist completeness and stakeholder alignment with fewer contradictions and higher expert preference, supporting the system's practicality for guideline-grounded intervention design.
[1-4] TimePerceiver: An Encoder-Decoder Framework for Generalized Time-Series Forecasting (NeurIPS 2025)
이재빈 (Integrated M.S.-Ph.D. student)
In machine learning, effective modeling requires a holistic consideration of how to encode inputs, make predictions (i.e., decoding), and train the model. However, in time-series forecasting, prior work has predominantly focused on encoder design, often treating prediction and training as separate or secondary concerns. In this paper, we propose TimePerceiver, a unified encoder-decoder forecasting framework that is tightly aligned with an effective training strategy. To be specific, we first generalize the forecasting task to include diverse temporal prediction objectives such as extrapolation, interpolation, and imputation. Since this generalization requires handling input and target segments that are arbitrarily positioned along the temporal axis, we design a novel encoder-decoder architecture that can flexibly perceive and adapt to these varying positions. For encoding, we introduce a set of latent bottleneck representations that can interact with all input segments to jointly capture temporal and cross-channel dependencies. For decoding, we leverage learnable queries corresponding to target timestamps to effectively retrieve relevant information. Extensive experiments demonstrate that our framework consistently and significantly outperforms prior state-of-the-art baselines across a wide range of benchmark datasets.
[1-5] Question-Aware Gaussian Experts for Audio-Visual Question Answering (CVPR 2025)
정인영 (M.S. student)
Audio-Visual Question Answering (AVQA) requires not only question-based multimodal reasoning but also precise temporal grounding to capture subtle dynamics for accurate prediction. However, existing methods mainly use question information implicitly, limiting focus on question-specific details. Furthermore, most studies rely on uniform frame sampling, which can miss key question-relevant frames. Although recent Top-K frame selection methods aim to address this, their discrete nature still overlooks fine-grained temporal details. This paper proposes QA-TIGER, a novel framework that explicitly incorporates question information and models continuous temporal dynamics. Our key idea is to use Gaussian-based modeling to adaptively focus on both consecutive and non-consecutive frames based on the question, while explicitly injecting question information and applying progressive refinement. We leverage a Mixture of Experts (MoE) to flexibly implement multiple Gaussian models, activating temporal experts specifically tailored to the question. Extensive experiments on multiple AVQA benchmarks show that QA-TIGER consistently achieves state-of-the-art performance.
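A Gaussian over the time axis gives every frame a soft, continuous weight, so one Gaussian can cover a contiguous span and several Gaussians can cover separated spans, unlike a hard Top-K selection. The sketch below pools per-frame features with a gated mixture of Gaussian temporal experts. In QA-TIGER the means, widths, and gates would be derived from the question; here they are hypothetical inputs, and the function names are illustrative only.

```python
import math

def gaussian_weights(num_frames, mu, sigma):
    """Normalized weights w_t proportional to exp(-(t - mu)^2 / (2 sigma^2)),
    with frame positions t rescaled to [0, 1] (requires num_frames > 1)."""
    ts = [t / (num_frames - 1) for t in range(num_frames)]
    raw = [math.exp(-((t - mu) ** 2) / (2 * sigma ** 2)) for t in ts]
    total = sum(raw)
    return [r / total for r in raw]

def mixture_of_gaussian_experts(features, experts, gates):
    """features: list of per-frame vectors; experts: list of (mu, sigma);
    gates: one weight per expert, assumed to sum to 1.
    Returns the gated sum of each expert's Gaussian-pooled feature."""
    n, dim = len(features), len(features[0])
    pooled = [0.0] * dim
    for (mu, sigma), gate in zip(experts, gates):
        weights = gaussian_weights(n, mu, sigma)
        for t in range(n):
            for d in range(dim):
                pooled[d] += gate * weights[t] * features[t][d]
    return pooled
```

Because each expert's weights are normalized, the gates alone decide how much each temporal region contributes to the pooled feature.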
[1-6] DC4GS: Directional Consistency-Driven Adaptive Density Control for 3D Gaussian Splatting (NeurIPS 2025)
정문수 (Integrated M.S.-Ph.D. student)
We present a Directional Consistency (DC)-driven Adaptive Density Control (ADC) for 3D Gaussian Splatting (DC4GS). Whereas conventional ADC bases primitive splitting on the magnitudes of positional gradients, we further incorporate the DC of the gradients into ADC, realizing it through the angular coherence of the gradients. Our DC better captures local structural complexity in ADC, avoiding redundant splitting. When splitting is required, we again use the DC to define optimal split positions, so that sub-primitives align with local structures better than with the conventional random placement. As a result, DC4GS reduces the number of primitives by up to 30% in our experiments compared with the existing ADC, while also substantially improving reconstruction fidelity.
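The abstract does not give the exact DC formula. One common way to measure the angular coherence of a set of gradients, assumed here purely for illustration, is the ratio of the norm of their sum to the sum of their norms: it equals 1 when all gradients point the same way and approaches 0 when they cancel. The split rule and both threshold values below are hypothetical placeholders, not the paper's settings.

```python
import math

def norm(v):
    """Euclidean norm of a vector given as a list."""
    return math.sqrt(sum(x * x for x in v))

def directional_consistency(grads):
    """Hypothetical DC measure: ||sum_i g_i|| / sum_i ||g_i||, in [0, 1]."""
    total = [sum(components) for components in zip(*grads)]
    denom = sum(norm(g) for g in grads)
    return norm(total) / denom if denom > 0 else 0.0

def should_split(grads, grad_threshold=0.0002, dc_threshold=0.5):
    """Hypothetical rule: split only when the mean gradient magnitude is large
    AND the gradients disagree in direction (low DC). Coherent gradients are
    assumed to be resolvable by moving the primitive instead of splitting it."""
    mean_mag = sum(norm(g) for g in grads) / len(grads)
    return mean_mag > grad_threshold and directional_consistency(grads) < dc_threshold
```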
[1-7] Fine-Tuning Visual Autoregressive Models for Subject-Driven Generation (ICCV 2025)
정지우 (Ph.D. student)
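With recent advances in text-to-image generation models, subject-driven generation, which learns a specific person or object from a few example images, has attracted growing attention. However, although diffusion models generate high-quality images, their inference is slow because of the many-step generation process. In this work, we propose the first subject-driven generation method built on the faster Visual Autoregressive (VAR) model. By selectively fine-tuning mainly the important layers and the early (low) resolutions, our strategy simultaneously mitigates increased computation, distortion of language representations, and loss of diversity. Extensive experiments show that the proposed method runs faster than existing diffusion-based methods while achieving high quality, demonstrating its practical applicability.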
[1-8] Federated domain generalization with source knowledge preservation via discriminative ensembles (Information Sciences)
강용훈 (Ph.D. student)
Many federated learning approaches assume that all clients have datasets within the same domain. However, in real-world scenarios, this assumption rarely holds, because clients collect data from distinct environments. Federated domain generalization attempts to address this challenge by improving model generalization to unseen domains. However, existing approaches suffer from a trade-off: they enhance performance in the unseen domains at the cost of degrading performance in the source domains due to the suppression of domain-specific features. To overcome this limitation, we propose Federated Discriminative Ensemble (FedDE) to improve generalization on both unseen and source domains by maximizing domain-invariant feature learning while minimizing domain-specific information loss. FedDE introduces two components: a common model that captures domain-invariant features and a localizer that preserves domain-specific features ignored by the common model. To ensure feature separation, we apply L2-norm regularization and adversarial training to encourage each component to learn distinct types of information. During inference, FedDE employs a client model ensemble strategy, leveraging both domain-invariant and domain-specific knowledge to enhance performance across all domains. This ensemble approach mitigates information loss and significantly boosts accuracy on both the seen and unseen domains. We conducted extensive experiments on multiple benchmark datasets and demonstrated that FedDE outperformed existing methods by achieving superior performance across both the source and unseen domains.
[1-9] Model Risk-sensitive offline Reinforcement Learning (ICLR 2025)
유광표 (Ph.D. student)
Offline reinforcement learning (RL) is becoming critical in risk-sensitive areas such as finance and autonomous driving, where incorrect decisions can lead to substantial financial loss or compromised safety. However, traditional risk-sensitive offline RL methods often struggle to assess risk accurately, as minor errors in the estimated return can cause significant inaccuracies in risk estimation. These challenges are intensified by the distribution shift inherent in offline RL. To mitigate these issues, we propose a model risk-sensitive offline RL framework designed to minimize the worst-case risk across a set of plausible alternative scenarios rather than solely minimizing the estimated risk. We present a critic-ensemble criterion method that identifies the plausible alternative scenarios without introducing additional hyperparameters. We also incorporate the learned Fourier feature framework and the IQN framework to address spectral bias in neural networks, which can otherwise lead to severe errors in calculating model risk. Our experiments in finance and self-driving scenarios demonstrate that the proposed framework significantly reduces risk, by to , compared to the best-performing risk-sensitive offline RL baseline, particularly in highly uncertain environments.
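To make "worst case of risks across plausible scenarios" concrete, the sketch below treats each critic in an ensemble as one plausible model, computes CVaR at level alpha from that critic's equally weighted return quantiles (the kind of output a quantile critic such as IQN produces), and scores the policy by the most pessimistic critic. This is a simplified assumption for illustration; it does not reproduce the paper's critic-ensemble criterion for selecting scenarios.

```python
def cvar_from_quantiles(quantiles, alpha):
    """CVaR_alpha of a return distribution given as equally weighted quantile
    samples: the mean of the lowest alpha-fraction (lower return = worse)."""
    ordered = sorted(quantiles)
    k = max(1, int(round(alpha * len(ordered))))
    return sum(ordered[:k]) / k

def worst_case_cvar(ensemble_quantiles, alpha=0.1):
    """Hypothetical model-risk objective: the minimum CVaR over all critics,
    i.e. the risk under the most pessimistic plausible model."""
    return min(cvar_from_quantiles(q, alpha) for q in ensemble_quantiles)
```

A policy optimized against `worst_case_cvar` cannot exploit a single critic's overly optimistic estimate, which is the intuition behind guarding against model risk.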
[1-10] MEMHD: Multi-Centroid Hyperdimensional Computing for Fully-Utilized In-Memory Computing Architectures (DATE 2025)
강도영 (M.S. student)
The implementation of Hyperdimensional Computing (HDC) on In-Memory Computing (IMC) architectures faces significant challenges due to the mismatch between high-dimensional vectors and IMC array sizes, leading to inefficient memory utilization and increased computation cycles. This paper presents MEMHD, a Memory-Efficient Multi-centroid HDC framework designed to address these challenges. MEMHD introduces a clustering-based initialization method and quantization-aware iterative learning for multi-centroid associative memory. Through these approaches and its overall architecture, MEMHD achieves a significant reduction in memory requirements while maintaining or improving classification accuracy. Our approach fully utilizes IMC arrays and enables one-shot (or few-shot) associative search. Experimental results demonstrate that MEMHD outperforms state-of-the-art binary HDC models, achieving up to 13.69% higher accuracy with the same memory usage, or 13.25x higher memory efficiency at the same accuracy level. Moreover, when mapped to 128x128 IMC arrays, MEMHD reduces computation cycles by up to 80x and array usage by up to 71x compared to baseline IMC mapping methods, significantly improving energy efficiency.
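A multi-centroid associative memory stores several prototype hypervectors per class and predicts the class of the single most similar prototype, which lets one class cover several distinct clusters. The sketch below assumes binary hypervectors compared by Hamming similarity; MEMHD's clustering-based initialization, quantization-aware training, and IMC mapping are not reproduced, and the data layout is hypothetical.

```python
def hamming_similarity(a, b):
    """Number of positions where two equal-length binary hypervectors agree."""
    return sum(1 for x, y in zip(a, b) if x == y)

def classify(query, centroids):
    """centroids: list of (label, hypervector) pairs; a class may own several
    centroids. Returns the label of the most similar centroid (first wins ties)."""
    best_label, best_score = None, -1
    for label, centroid in centroids:
        score = hamming_similarity(query, centroid)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

With a single centroid per class this reduces to conventional binary HDC classification; adding centroids trades memory for the ability to model multi-modal classes.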
[1-11] Policy Compatible Skill Incremental Learning via Lazy Learning Interface (NeurIPS 2025)
이대희 (Integrated M.S.-Ph.D. student)
Skill Incremental Learning (SIL) is the process by which an embodied agent expands and refines its skill set over time by leveraging experience gained through interaction with its environment or through the integration of additional data. SIL facilitates efficient acquisition of hierarchical policies grounded in reusable skills for downstream tasks. However, as the skill repertoire evolves, it can disrupt compatibility with existing skill-based policies, limiting their reusability and generalization. In this work, we propose SIL-C, a novel framework that ensures skill-policy compatibility, allowing improvements in incrementally learned skills to enhance the performance of downstream policies without requiring policy re-training or structural adaptation. SIL-C employs a bilateral lazy learning-based mapping technique to dynamically align the subtask space referenced by policies with the skill space decoded into agent behaviors. This enables each subtask, derived from the policy’s decomposition of a complex task, to be executed by selecting an appropriate skill based on trajectory distribution similarity. We evaluate SIL-C across diverse SIL scenarios and demonstrate that it maintains compatibility between evolving skills and downstream policies while ensuring efficiency throughout the learning process.
[1-12] Compile-Time QoS Scheme for Deep Learning Inferences (SC 2025)
홍성인 (Ph.D. student)
With the proliferation of deep learning technologies across various service domains, the sharing of accelerators such as GPUs, TPUs, and NPUs for inference processing has become increasingly common. These accelerators must efficiently handle multiple deep learning services operating concurrently. However, inference requests, characterized by sequences of short-duration kernels, create significant challenges for online schedulers attempting to maintain Quality of Service (QoS) guarantees. This paper presents QoSlicer, a novel compile-time QoS management framework that employs kernel slicing to relieve the burden on schedulers. By generating multiple pre-determined slicing plans, QoSlicer enables more efficient, lightweight QoS scheduling while ensuring target latency requirements are met. Our approach incorporates a heuristic search algorithm to identify optimal slicing plans and implements robust performance estimation models to validate these plans. Our experimental evaluation across 75 diverse workload combinations demonstrates that QoSlicer improves throughput by an average of 20.2% compared to state-of-the-art scheduling techniques.
[1-13] Self-supervised Adversarial Purification for Graph Neural Networks (ICML 2025)
이우현 (M.S. student)
Defending Graph Neural Networks (GNNs) against adversarial attacks requires balancing accuracy and robustness, a trade-off often mishandled by traditional methods such as adversarial training, which intertwine these conflicting objectives within a single classifier. To overcome this limitation, we propose a self-supervised adversarial purification framework. We separate robustness from the classifier by introducing a dedicated purifier, which cleanses the input data before classification. In contrast to prior adversarial purification methods, we propose GPR-GAE, a novel graph auto-encoder (GAE), as a specialized purifier trained with a self-supervised strategy that adapts to diverse graph structures in a data-driven manner. Using multiple Generalized PageRank (GPR) filters, GPR-GAE captures diverse structural representations for robust and effective purification. Our multi-step purification process further enables GPR-GAE to achieve precise graph recovery and robust defense against structural perturbations. Experiments across diverse datasets and attack scenarios demonstrate the state-of-the-art robustness of GPR-GAE, showcasing it as an independent plug-and-play purifier for GNN classifiers.
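A Generalized PageRank filter propagates node features as Z = sum_k gamma_k Â^k X, where Â is a normalized adjacency matrix and the gamma_k weight each propagation depth. The pure-Python sketch below assumes symmetric normalization with self-loops, a common choice; in GPR-GAE the coefficients would be learned and several such filters combined inside the auto-encoder, none of which is shown here.

```python
import math

def normalized_adjacency(adj):
    """Â = D^{-1/2} (A + I) D^{-1/2} for a dense 0/1 adjacency (list of lists)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Dense matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def gpr_filter(adj, features, gammas):
    """Z = sum_k gammas[k] * Â^k X, accumulated one propagation step at a time."""
    a_norm = normalized_adjacency(adj)
    h = features  # Â^0 X
    z = [[gammas[0] * v for v in row] for row in h]
    for gamma in gammas[1:]:
        h = matmul(a_norm, h)  # advance from Â^k X to Â^(k+1) X
        z = [[zv + gamma * hv for zv, hv in zip(zrow, hrow)] for zrow, hrow in zip(z, h)]
    return z
```

Different gamma profiles act as different graph filters (e.g. emphasizing local versus multi-hop structure), which is one way to read "multiple GPR filters capture diverse structural representations".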
[1-14] Bayesian NeRF: Quantifying Uncertainty with Volume Density for Neural Implicit Fields (ICRA 2026)
이시백 (Integrated M.S.-Ph.D. student)
We present a Bayesian Neural Radiance Field (NeRF), which explicitly quantifies uncertainty in the volume density by modeling uncertainty in the occupancy, without the need for additional networks, making it particularly suited to challenging observations and uncontrolled image environments. NeRF diverges from traditional geometric methods by providing an enriched scene representation, rendering color and density in 3D space from various viewpoints. However, NeRF is limited in addressing uncertainty solely through geometric structure information, leading to inaccuracies when interpreting scenes with insufficient real-world observations. While previous efforts have relied on auxiliary networks, we propose a series of formulation extensions to NeRF that model uncertainty in density, in color and density jointly, and in occupancy, all without additional networks. In experiments, we show that our method significantly improves performance on RGB and depth images on a comprehensive dataset. Since uncertainty modeling aligns well with the inherently uncertain environments of Simultaneous Localization and Mapping (SLAM), we applied our approach to SLAM systems and observed notable improvements in mapping and tracking performance. These results confirm the effectiveness of our Bayesian NeRF approach in quantifying uncertainty based on geometric structure, making it a robust solution for challenging real-world scenarios.
[1-15] Why is Normalization Necessary for Linear Recommenders? (SIGIR 2025)
박성민 (Integrated M.S.-Ph.D. student)
Despite their simplicity, linear autoencoder (LAE)-based models have shown comparable or even better performance with faster inference speed than neural recommender models. However, LAEs face two critical challenges: (i) popularity bias, which tends to recommend popular items, and (ii) neighborhood bias, which overly focuses on capturing local item correlations. To address these issues, this paper first analyzes the effect of two existing normalization methods for LAEs, i.e., random-walk and symmetric normalization. Our theoretical analysis reveals that normalization highly affects the degree of popularity and neighborhood biases among items. Inspired by this analysis, we propose a versatile normalization solution, called Data-Adaptive Normalization (DAN), which flexibly controls the popularity and neighborhood biases by adjusting item- and user-side normalization to align with unique dataset characteristics. Owing to its model-agnostic property, DAN can be easily applied to various LAE-based models. Experimental results show that DAN-equipped LAEs consistently improve existing LAE-based models across six benchmark datasets, with significant gains of up to 128.57% and 12.36% for long-tail items and unbiased evaluations, respectively.
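The two existing normalizations named in the abstract can be written for an item-item co-occurrence matrix G = XᵀX as D^{-1} G (random walk) and D^{-1/2} G D^{-1/2} (symmetric), with D the diagonal degree matrix of G. The sketch below covers both with a single exponent, D^{-alpha} G D^{-(1-alpha)}; this interpolation is a hypothetical illustration of how the strength of degree normalization can be tuned, not DAN's actual formula, which the abstract describes only as adjusting item- and user-side normalization.

```python
def gram(interactions):
    """Item-item co-occurrence G = X^T X from a user-by-item 0/1 matrix X."""
    n_items = len(interactions[0])
    return [[sum(row[i] * row[j] for row in interactions) for j in range(n_items)]
            for i in range(n_items)]

def normalize(g, alpha):
    """Hypothetical D^{-alpha} G D^{-(1 - alpha)}, with D the row sums of G.
    alpha = 1.0 gives random-walk normalization (rows sum to 1);
    alpha = 0.5 gives symmetric normalization D^{-1/2} G D^{-1/2}.
    Assumes every item has at least one interaction (no zero degrees)."""
    d = [sum(row) for row in g]
    n = len(g)
    return [[g[i][j] / (d[i] ** alpha * d[j] ** (1 - alpha)) for j in range(n)]
            for i in range(n)]
```

Dividing by powers of item degree down-weights co-occurrences with popular items, which is the lever the abstract links to popularity bias.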
[1-16] Optimized Minimal 3D Gaussian Splatting (NeurIPS 2025)
이주찬 (Integrated M.S.-Ph.D. student)
3D Gaussian Splatting (3DGS) has emerged as a powerful representation for real-time, high-performance rendering, enabling a wide range of applications. However, representing 3D scenes with numerous explicit Gaussian primitives imposes significant storage and memory overhead. Recent studies have shown that high-quality rendering can be achieved with a substantially reduced number of Gaussians when they are represented with high-precision attributes. Nevertheless, existing 3DGS compression methods still rely on a relatively large number of Gaussians, focusing primarily on attribute compression. This is because a smaller set of Gaussians becomes increasingly sensitive to lossy attribute compression, leading to severe quality degradation. Since the number of Gaussians is directly tied to computational cost, it is essential to reduce the number of Gaussians effectively rather than only optimizing storage. In this paper, we propose the Optimized Minimal Gaussians representation (OMG), which significantly reduces storage while using a minimal number of primitives. First, we identify distinct Gaussians among their nearby neighbors, minimizing redundancy without sacrificing quality. Second, we propose a compact and precise attribute representation that efficiently captures both continuity and irregularity among primitives. Additionally, we propose a sub-vector quantization technique for improved representation of irregularity, maintaining fast training with a negligible codebook size. Extensive experiments demonstrate that OMG reduces storage requirements by nearly 50% compared to the previous state of the art and enables rendering at over 600 FPS while maintaining high rendering quality.
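Sub-vector quantization, in the product-quantization style assumed here, splits an attribute vector into equal chunks and quantizes each chunk with its own small codebook, so a few tiny codebooks can represent many combinations. The sketch below shows only the encode and decode steps with fixed, hypothetical codebooks; how OMG learns its codebooks and which attributes it quantizes are not reproduced.

```python
def nearest(codebook, vec):
    """Index of the codeword with the smallest squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)))

def subvector_quantize(vec, codebooks):
    """Split vec into len(codebooks) equal chunks and quantize each chunk with
    its own codebook; returns one codeword index per chunk."""
    m = len(codebooks)
    assert len(vec) % m == 0, "vector length must be divisible by the number of codebooks"
    size = len(vec) // m
    return [nearest(codebooks[k], vec[k * size:(k + 1) * size]) for k in range(m)]

def dequantize(indices, codebooks):
    """Reconstruct the vector by concatenating the selected codewords."""
    out = []
    for k, idx in enumerate(indices):
        out.extend(codebooks[k][idx])
    return out
```

With m codebooks of K entries each, the scheme can express K^m distinct vectors while storing only m * K codewords.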
[1-17] Graph-Assisted Stitching for Offline Hierarchical Reinforcement Learning (ICML 2025)
백승호 (M.S. student)
Existing offline hierarchical reinforcement learning methods rely on high-level policy learning to generate subgoal sequences. However, their efficiency degrades as task horizons increase, and they lack effective strategies for stitching useful state transitions across different trajectories. We propose Graph-Assisted Stitching (GAS), a novel framework that formulates subgoal selection as a graph search problem rather than learning an explicit high-level policy. By embedding states into a Temporal Distance Representation (TDR) space, GAS clusters semantically similar states from different trajectories into unified graph nodes, enabling efficient transition stitching. A shortest-path algorithm is then applied to select subgoal sequences within the graph, while a low-level policy learns to reach the subgoals. To improve graph quality, we introduce the Temporal Efficiency (TE) metric, which filters out noisy or inefficient transition states, significantly enhancing task performance. GAS outperforms prior offline HRL methods across locomotion, navigation, and manipulation tasks. Notably, in the most stitching-critical task, it achieves a score of 88.3, dramatically surpassing the previous state-of-the-art score of 1.0.
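Once states are clustered into graph nodes, choosing subgoals reduces to a standard shortest-path query. The sketch below is plain Dijkstra over a weighted adjacency dictionary; treating node IDs as cluster labels and edge costs as temporal distances in the TDR space is an assumption for illustration, and GAS's graph construction and TE filtering are not shown.

```python
import heapq

def shortest_subgoal_path(edges, start, goal):
    """Dijkstra's algorithm. edges maps node -> list of (neighbor, cost) with
    non-negative costs. Returns the node sequence from start to goal, or None
    if goal is unreachable."""
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == goal:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, cost in edges.get(node, []):
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    if goal not in dist:
        return None
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]
```

The returned nodes (excluding the start) would then serve as the subgoal sequence handed to the low-level policy.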
[1-18] Temporal Alignment-Free Video Matching for Few-shot Action Recognition (CVPR 2025)
이수빈 (Ph.D. student)
Few-Shot Action Recognition (FSAR) aims to train a model with only a few labeled video instances. A key challenge in FSAR is handling divergent narrative trajectories for precise video matching. While frame- and tuple-level alignment approaches have been promising, they rely heavily on pre-defined, length-dependent alignment units (e.g., frames or tuples), which limits flexibility for actions of varying lengths and speeds. In this work, we introduce a novel TEmporal Alignment-free Matching (TEAM) approach, which eliminates the need for temporal units in action representation and for brute-force alignment during matching. Specifically, TEAM represents each video with a fixed set of pattern tokens that capture globally discriminative clues within the video instance regardless of action length or speed, ensuring its flexibility. Furthermore, TEAM is inherently efficient, using token-wise comparisons to measure similarity between videos, unlike existing methods that rely on pairwise comparisons for temporal alignment. Additionally, we propose an adaptation process that identifies and removes information common across classes, establishing clear boundaries even between novel categories. Extensive experiments demonstrate the effectiveness of TEAM.
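Because every video is summarized by the same number of pattern tokens, two videos can be compared token by token instead of aligning all frame pairs. The sketch below assumes the k-th token of one video is compared with the k-th token of the other and that the scores are averaged cosine similarities; the paper's actual scoring function may differ, and the function names are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def token_wise_similarity(tokens_a, tokens_b):
    """Average cosine similarity of corresponding pattern tokens: K comparisons
    for K tokens, versus T_a * T_b for pairwise frame alignment."""
    assert len(tokens_a) == len(tokens_b), "both videos must use the same number of tokens"
    return sum(cosine(a, b) for a, b in zip(tokens_a, tokens_b)) / len(tokens_a)
```

The cost is independent of video length, which is the efficiency argument the abstract makes against alignment-based matching.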
[1-19] Spatial Coordinate Transformation for 3D Neural Implicit Mapping (ICRA 2026)
강경수 (Ph.D. student)
Implicit Neural Representation (INR)-based SLAM has a critical issue: because the neural network's weights themselves represent the map, all keyframes must be kept in memory for post-training whenever remapping is needed. To address this, previous INR-based SLAM studies proposed methods to modify INR-based maps without changing the network's weights. However, these approaches suffer from low memory efficiency and increased space complexity. In this paper, we introduce a remapping method for INR-based maps that does not require post-training the network's weights and has a low space cost. Modifying a function, such as updating a map defined as a neural network function, can be viewed as transforming the function's domain. Leveraging this function domain transformation, we propose a method to update INR-based maps by identifying the transformation function between the post-optimization and pre-optimization domains. Additionally, to prevent cases where the transformation between the post-optimization and pre-optimization domains becomes one-to-many, we introduce a temporal domain and propose a method to find the spatial coordinate transformation function accordingly. Evaluations on INR-based techniques demonstrate that our method effectively updates maps while requiring significantly less memory than existing remapping approaches.
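The core idea, updating a map by transforming the function's domain instead of retraining its weights, can be illustrated with a rigid 2D transform: if optimization moves the map by T, the updated map is f_new(x) = f_old(T⁻¹(x)), so each query is mapped back into the coordinates of the frozen network. The rigid transform and the toy occupancy function are assumptions for illustration; the paper's transformation function, including its temporal domain, is more general.

```python
import math

def make_rigid_2d(theta, tx, ty):
    """Return (T, T_inv) for T(p) = R(theta) p + t and its exact inverse."""
    c, s = math.cos(theta), math.sin(theta)

    def T(p):
        x, y = p
        return (c * x - s * y + tx, s * x + c * y + ty)

    def T_inv(p):
        x, y = p[0] - tx, p[1] - ty  # undo translation, then apply R^T
        return (c * x + s * y, -s * x + c * y)

    return T, T_inv

def remap(f_old, T_inv):
    """Updated map without touching f_old's parameters: f_new(x) = f_old(T_inv(x))."""
    return lambda p: f_old(T_inv(p))

def occupancy(p):
    """Hypothetical stand-in for a frozen implicit map: a unit disk at the origin."""
    return 1.0 if p[0] ** 2 + p[1] ** 2 <= 1.0 else 0.0
```

Storing only the transform is far cheaper than storing keyframes for retraining, which is the memory argument in the abstract; the approach requires T to be invertible, which is why one-to-many correspondences must be avoided.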
[1-20] DCG-SQL: Enhancing In-Context Learning for Text-to-SQL with Deep Contextual Schema Link Graph (ACL 2025)
이지형 (Ph.D. student)
Text-to-SQL, which translates a natural language question into an SQL query, has advanced with the in-context learning of Large Language Models (LLMs). However, existing methods show little improvement over randomly chosen demonstrations and suffer significant performance drops when smaller LLMs (e.g., Llama 3.1-8B) are used. This indicates that these methods rely heavily on the intrinsic capabilities of hyper-scaled LLMs rather than on effectively retrieving useful demonstrations. In this paper, we propose a novel approach for effectively retrieving demonstrations and generating SQL queries. We construct a Deep Contextual Schema Link Graph, which contains key information and the semantic relationships between a question and its database schema items. This graph-based structure enables effective representation of Text-to-SQL samples and retrieval of useful demonstrations for in-context learning. Experimental results on the Spider benchmark demonstrate the effectiveness of our approach, showing consistent improvements in SQL generation performance and efficiency across both hyper-scaled LLMs and small LLMs.