Poster Session
Details
(DAY 1,2)
[Day1] 9/26(Thu.) 16:30~18:00
AdFlush: A Real-World Deployable Machine Learning Solution for Effective Advertisement and Web Tracker Prevention, WWW 2024
임채진 (M.S. student)
Conventional ad blocking and tracking prevention tools often fall short in addressing web content manipulation. Machine learning approaches have been proposed to enhance detection accuracy, yet aspects of practical deployment have frequently been overlooked. This paper introduces AdFlush, a novel machine learning model for real-world browsers. To develop AdFlush, we evaluated the effectiveness of 883 features, ultimately selecting 27 key features for optimal performance. We tested AdFlush on a dataset of 10,000 real-world websites, achieving an F1 score of 0.98, thereby outperforming AdGraph (F1 score: 0.93), WebGraph (F1 score: 0.90), and WTAgraph (F1 score: 0.84). Additionally, AdFlush significantly reduces computational overhead, requiring 56% less CPU and 80% less memory than AdGraph. We also assessed AdFlush's robustness against adversarial manipulations, demonstrating superior resilience with F1 scores ranging from 0.89 to 0.98, surpassing the performance of AdGraph and WebGraph, which recorded F1 scores between 0.81 and 0.87. A six-month longitudinal study confirmed that AdFlush maintains a high F1 score above 0.97 without the need for retraining, underscoring its effectiveness.
HandDAGT: A Denoising Adaptive Graph Transformer for 3D Hand Pose Estimation, ECCV 2024
김은지 (Integrated M.S./Ph.D. student)
The extraction of keypoint positions from input hand frames, known as 3D hand pose estimation, is crucial for various human-computer interaction applications. However, current approaches often struggle with the dynamic nature of self-occlusion of hands and intra-occlusion with interacting objects. To address this challenge, this paper proposes the Denoising Adaptive Graph Transformer, HandDAGT, for hand pose estimation. The proposed HandDAGT leverages a transformer structure to thoroughly explore effective geometric features from input patches. Additionally, it incorporates a novel attention mechanism to adaptively weigh the contribution of kinematic correspondence and local geometric features for the estimation of specific keypoints. This attribute enables the model to adaptively employ kinematic and local information based on the occlusion situation, enhancing its robustness and accuracy. Furthermore, we introduce a novel denoising training strategy aimed at improving the model’s robust performance in the face of occlusion challenges. Experimental results show that the proposed model significantly outperforms the existing methods on four challenging hand pose benchmark datasets.
Multi-granularity Guided Fusion-in-Decoder, NAACL 2024
최은성 (Integrated M.S./Ph.D. student)
In Open-domain Question Answering (ODQA), it is essential to discern relevant contexts as evidence and avoid spurious ones among retrieved results. The model architecture that uses concatenated multiple contexts in the decoding phase, i.e., Fusion-in-Decoder, demonstrates promising performance but generates incorrect outputs from seemingly plausible contexts. To address this problem, we propose the Multi-Granularity guided Fusion-in-Decoder (MGFiD), discerning evidence across multiple levels of granularity. Based on multi-task learning, MGFiD harmonizes passage re-ranking with sentence classification. It aggregates evident sentences into an anchor vector that instructs the decoder. Additionally, it improves decoding efficiency by reusing the results of passage re-ranking for passage pruning. Through our experiments, MGFiD outperforms existing models on the Natural Questions (NQ) and TriviaQA (TQA) datasets, highlighting the benefits of its multi-granularity solution.
Style Injection in Diffusion: A Training-free Approach for Adapting Large-scale Diffusion Models for Style Transfer, CVPR 2024
현상익 (Ph.D. student)
Despite the impressive generative capabilities of diffusion models, existing diffusion model-based style transfer methods either require inference-stage optimization (e.g., fine-tuning or textual inversion of style), which is time-consuming, or fail to leverage the generative ability of large-scale diffusion models. To address these issues, we introduce a novel artistic style transfer method based on a pre-trained large-scale diffusion model without any optimization. Specifically, we manipulate the features of self-attention layers in the way the cross-attention mechanism works: during generation, we substitute the key and value of the content with those of the style image. This approach provides several desirable characteristics for style transfer, including 1) preservation of content by transferring similar styles into similar image patches and 2) transfer of style based on similarity of local texture (e.g., edges) between content and style images. Furthermore, we introduce query preservation and attention temperature scaling to mitigate the disruption of the original content, and initial latent Adaptive Instance Normalization (AdaIN) to deal with disharmonious color (failure to transfer the colors of the style). Our experimental results demonstrate that our proposed method surpasses state-of-the-art methods in both conventional and diffusion-based style transfer baselines.
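The attention manipulation described in this abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the linear blend used for query preservation, and the default values of `gamma` and `tau` are assumptions chosen for illustration, and the features are plain 2-D arrays rather than real diffusion U-Net activations.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def style_injected_attention(q_content, q_stylized, k_style, v_style,
                             gamma=0.75, tau=1.5):
    """Self-attention whose key/value come from the style image.

    q_content, q_stylized: (N, d) queries from the content path and from the
    image being generated. k_style, v_style: (M, d) key/value from the style.
    gamma and tau are hypothetical defaults, not values from the abstract.
    """
    # Query preservation (assumed form): blend the two queries.
    q = gamma * q_content + (1.0 - gamma) * q_stylized
    d = q.shape[-1]
    # Attention temperature scaling: sharpen the logits by a factor tau.
    logits = tau * (q @ k_style.T) / np.sqrt(d)
    return softmax(logits, axis=-1) @ v_style


def adain(content_latent, style_latent, eps=1e-5):
    """AdaIN on (C, H, W) latents: give the content latent the per-channel
    mean and standard deviation of the style latent."""
    c_mean = content_latent.mean(axis=(1, 2), keepdims=True)
    c_std = content_latent.std(axis=(1, 2), keepdims=True)
    s_mean = style_latent.mean(axis=(1, 2), keepdims=True)
    s_std = style_latent.std(axis=(1, 2), keepdims=True)
    return (content_latent - c_mean) / (c_std + eps) * s_std + s_mean
```

Because the output is a weighted average of style values only, every output token is built from style features, while the (blended) content query decides which style tokens each position draws from.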
Improving Multi-hop Logical Reasoning in Knowledge Graphs with Context-Aware Query Representation Learning, ACL 2024
김정훈 (Ph.D. student)
Multi-hop logical reasoning on knowledge graphs is a pivotal task in natural language processing, with numerous approaches aiming to answer First-Order Logic (FOL) queries. Recent geometry (e.g., box, cone) and probability (e.g., beta distribution)-based methodologies have effectively addressed complex FOL queries. However, a common challenge across these methods lies in determining accurate geometric bounds or probability parameters for these queries. The challenge arises because existing methods rely on linear sequential operations within their computation graphs, overlooking the logical structure of the query and the relation-induced information that can be gleaned from the relations of the query, which we call the context of the query. To address the problem, we propose a model-agnostic methodology that enhances the effectiveness of existing multi-hop logical reasoning approaches by fully integrating the context of the FOL query graph. Our approach distinctively discerns (1) the structural context inherent to the query structure and (2) the relation-induced context unique to each node in the query graph as delineated in the corresponding knowledge graph. This dual-context paradigm helps nodes within a query graph attain refined internal representations throughout the multi-hop reasoning steps. Through experiments on two datasets, our method consistently enhances the three multi-hop reasoning foundation models, achieving performance improvements of up to 19.5%.
Pareto Inverse Reinforcement Learning for Diverse Expert Policy Generation, IJCAI 2024
김우경 (Ph.D. student)
Data-driven offline reinforcement learning and imitation learning approaches have been gaining popularity in addressing sequential decision-making problems. Yet, these approaches rarely consider learning Pareto-optimal policies from a limited pool of expert datasets. This becomes particularly marked due to practical limitations in obtaining comprehensive datasets for all preferences, where multiple conflicting objectives exist and each expert might hold a unique optimization preference for these objectives. In this paper, we adapt inverse reinforcement learning (IRL) by using reward distance estimates for regularizing the discriminator. This enables progressive generation of a set of policies that accommodate diverse preferences on the multiple objectives, while using only two distinct datasets, each associated with a different expert preference. In doing so, we present a Pareto IRL framework (ParIRL) that establishes a Pareto policy set from these limited datasets. In the framework, the Pareto policy set is then distilled into a single, preference-conditioned diffusion model, thus allowing users to immediately specify which expert's patterns they prefer. Through experiments, we show that ParIRL outperforms other IRL algorithms for various multi-objective control tasks, achieving the dense approximation of the Pareto frontier. We also demonstrate the applicability of ParIRL with autonomous driving in CARLA.
ESG2PreEM: Automated ESG grade assessment framework using pre-trained ensemble models, Heliyon
강충원, 황승현 (M.S. students)
Incorporating environmental, social, and governance (ESG) criteria is essential for promoting sustainability in business and is considered a set of principles that can increase a firm’s value. This research proposes a strategy using text-based automated techniques to rate ESG. For autonomous classification, data were collected from the news archive LexisNexis and classified as E, S, or G based on the ESG materials provided by the Refinitiv-Sustainable Leadership Monitor, which has over 450 metrics. In addition, Bidirectional Encoder Representations from Transformers (BERT), Robustly optimized BERT approach (RoBERTa), and A Lite BERT (ALBERT) models were trained to accurately categorize preprocessed ESG documents using a voting ensemble model, and their performances were measured. The accuracy of the ensemble model utilizing BERT and ALBERT was found to be 80.79% with batch size 20. Additionally, this research validated the performance of the framework for companies included in the Dow Jones Industrial Average (DJIA) and compared it with the grade provided by Morgan Stanley Capital International (MSCI), a globally renowned ESG rating agency known for having the highest creditworthiness. This study supports the use of sophisticated natural language processing (NLP) techniques to attain important knowledge from large amounts of text-based data to improve ESG assessment criteria established by different rating agencies.
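As a rough illustration of the voting ensemble mentioned in this abstract, the sketch below combines per-document E/S/G predictions from several classifiers by simple majority. The abstract does not say whether hard or soft voting is used or how ties are resolved, so hard voting and the tie-break rule here are assumptions, and the prediction lists stand in for real BERT, RoBERTa, and ALBERT outputs.

```python
from collections import Counter


def majority_vote(votes):
    """Return the most common label among one document's votes.

    Ties are broken in favour of the earliest model in the list
    (an assumed rule, chosen only to make the result deterministic).
    """
    if not votes:
        raise ValueError("at least one vote is required")
    counts = Counter(votes)
    best = max(counts.values())
    for label in votes:
        if counts[label] == best:
            return label


def ensemble_predict(per_model_predictions):
    """Combine aligned prediction lists (one list per model) document by document."""
    return [majority_vote(list(votes)) for votes in zip(*per_model_predictions)]


# Hypothetical predictions for four documents from three models.
bert = ["E", "S", "G", "E"]
roberta = ["E", "G", "G", "S"]
albert = ["S", "G", "E", "G"]
combined = ensemble_predict([bert, roberta, albert])
```

In this toy input the last document receives three different votes, so the tie-break returns the first model's label.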
Model Adaptation for Time Constrained Embodied Control, CVPR 2024
유민종 (Ph.D. student)
When adopting a deep learning model for embodied agents, it is required that the model structure be optimized for specific tasks and operational conditions. Such optimization can be static, such as model compression, or dynamic, such as adaptive inference. Yet, these techniques have not been fully investigated for embodied control systems subject to time constraints, which necessitate sequential decision-making for multiple tasks, each with distinct inference latency limitations. In this paper, we present MoDeC, a time constraint-aware embodied control framework using modular model adaptation. We formulate model adaptation to varying operational conditions on resource and time restrictions as dynamic routing on a modular network, incorporating these conditions as part of multi-task objectives. Our evaluation across several vision-based embodied environments demonstrates the robustness of MoDeC, showing that it outperforms other model adaptation methods in both performance and adherence to time constraints in robotic manipulation and autonomous driving applications.
Native DRAM Cache: Re-architecting DRAM as a Large-Scale Cache for Data Centers, ISCA 2024
유예신 (Ph.D. student)
Contemporary data center CPUs are experiencing an unprecedented surge in core count. This trend calls for carefully designed Last-Level Cache (LLC) strategies to accommodate increasing capacity demands. While DRAM offers significant capacity, using it as a cache poses challenges related to latency and energy. This paper introduces Native DRAM Cache (NDC), a novel DRAM architecture specifically designed to operate as a cache. NDC features innovative approaches, such as conducting tag matching and way selection within a DRAM subarray and repurposing existing precharge transistors for tag matching. These innovations facilitate Caching-In-Memory (CIM) and enable NDC to serve as a high-capacity LLC with high set associativity, low latency, high throughput, and low energy. Our evaluation demonstrates that NDC significantly outperforms state-of-the-art DRAM cache solutions, enhancing performance by 2.8%/52.5%/44.2% (up to 8.4%/140.6%/85.5%) in SPEC/NPB/GAP benchmark suites, respectively.
Self-supervised One-Stage Learning for RF-based Multi-Person Pose Estimation, CIKM 2024
오승준 (Ph.D. student)
In the field of Multi-Person Pose Estimation (MPPE), Radio Frequency (RF)-based methods can operate effectively regardless of lighting conditions and obscured line-of-sight situations. Existing RF-based MPPE methods typically involve either 1) converting RF signals into heatmap images through complex preprocessing, or 2) applying a deep embedding network directly to raw RF signals. The first approach, while delivering decent performance, is computationally intensive and time-consuming. The second method, though simpler in preprocessing, results in lower MPPE accuracy and generalization performance. This paper proposes an efficient and lightweight one-stage MPPE model based on raw RF signals. By sub-grouping RF signals and embedding them using a shared single-layer CNN followed by multi-head attention, this model outperforms previous methods that embed all signals at once through a large and deep CNN. Additionally, we propose a new self-supervised learning (SSL) method that takes inputs from both one unmasked subgroup and the remaining masked subgroups to predict the latent representations of the masked data. Empirical results demonstrate that our model improves MPPE accuracy by up to 15 in PCKh@0.5 compared to previous methods using raw RF signals. In particular, the proposed SSL method significantly improves performance when the system is placed in new locations or when obstacles stand in front of the RF antennas, and its gains grow as the number of people increases.
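For readers unfamiliar with the metric quoted above, PCKh@0.5 counts a predicted keypoint as correct when its distance to the ground truth is at most half the person's head segment length, and reports the percentage of correct keypoints. The sketch below follows that standard definition; the array layout and the function name are assumptions, and it is not the evaluation code used in the paper.

```python
import numpy as np


def pckh(pred, gt, head_sizes, threshold=0.5):
    """Percentage of Correct Keypoints, normalized by head size.

    pred, gt: arrays of shape (N, K, D) holding K keypoints for N people.
    head_sizes: array of shape (N,) with each person's head segment length.
    Returns a percentage in [0, 100].
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    head_sizes = np.asarray(head_sizes, dtype=float)
    # Euclidean error per keypoint, shape (N, K).
    dist = np.linalg.norm(pred - gt, axis=-1)
    # A keypoint is correct if its error is within threshold * head size.
    correct = dist <= threshold * head_sizes[:, None]
    return 100.0 * correct.mean()


# Toy example: two people, two 2-D keypoints each, head size 1.0.
gt = np.zeros((2, 2, 2))
pred = np.array([[[0.1, 0.0], [1.0, 0.0]],
                 [[0.0, 0.0], [0.0, 0.4]]])
score = pckh(pred, gt, head_sizes=[1.0, 1.0])
```

Here three of the four keypoints fall within 0.5 head lengths, so `score` is 75.0.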
BinAdapter: Leveraging Continual Learning for Inferring Function Symbol Names in a Binary, ASIACCS 2024
권유정 (M.S. student)
Binary reverse engineering is crucial to gaining insights into the inner workings of a stripped binary. Yet, it is challenging to read the original semantics from a binary code snippet because of the unavailability of high-level information in the source, such as function names, variable names, and types. Recent advancements in deep learning show the possibility of recovering such vanished information with a well-trained model from a pre-defined dataset. Despite a static model’s notable performance, it can hardly cope with an ever-increasing data stream (e.g., compiled binaries) by nature. The two viable approaches for ceaseless learning are retraining on the whole dataset from scratch and fine-tuning a pre-trained model; however, retraining suffers from large computational overheads and fine-tuning from performance degradation (i.e., catastrophic forgetting). Recently, continual learning (CL) has tackled the problem of handling incremental data in security domains (e.g., network intrusion detection, malware detection) using reasonable resources while maintaining performance in practice. In this paper, we focus on how CL assists in the improvement of a generative model that predicts a function symbol name from a series of machine instructions. To this end, we introduce BinAdapter, a system that can infer function names from an incremental dataset without performance degradation from an original dataset by leveraging CL techniques. Our major finding shows that incremental tokens in the source (i.e., machine instructions) or the target (i.e., function names) largely affect the overall performance of a CL-enabled model. Accordingly, BinAdapter adopts three built-in approaches: (1) inserting adapters when there are no incremental tokens in either the source or the target, (2) harnessing multilingual neural machine translation (M-NMT) and fine-tuning the source embeddings together with (1) when there are incremental tokens in the source, and (3) fine-tuning the target embeddings together with (2) when there are incremental tokens in both. To demonstrate the effectiveness of BinAdapter, we evaluate the above three scenarios using incremental datasets with or without a set of new tokens (e.g., unseen machine instructions or function names), spanning different architectures and optimization levels. Our empirical results show that BinAdapter outperforms the state-of-the-art CL techniques by up to 24.3% in F1 or 21.5% in ROUGE-L.
RGBD GS-ICP SLAM (Real-time Gaussian Splatting SLAM), ECCV 2024
하성보 (Integrated M.S./Ph.D. student)
Simultaneous Localization and Mapping (SLAM) with dense representation plays a key role in robotics, Virtual Reality (VR), and Augmented Reality (AR) applications. Recent advancements in dense representation SLAM have highlighted the potential of leveraging neural scene representation and 3D Gaussian representation for high-fidelity spatial representation. In this paper, we propose a novel dense representation SLAM approach with a fusion of Generalized Iterative Closest Point (G-ICP) and 3D Gaussian Splatting (3DGS). In contrast to existing methods, we utilize a single Gaussian map for both tracking and mapping, resulting in mutual benefits. Through the exchange of covariances between tracking and mapping processes with scale alignment techniques, we minimize redundant computations and achieve an efficient system. Additionally, we enhance tracking accuracy and mapping quality through our keyframe selection methods. Experimental results demonstrate the effectiveness of our approach, showing impressive speeds up to 107 FPS (for the entire system) and superior quality of the reconstructed map.
Disrupting Diffusion-based Inpainters with Semantic Digression, IJCAI 2024
손건호 (M.S. student)
The fabrication of visual misinformation on the web and social media has increased exponentially with the advent of foundational text-to-image diffusion models. In particular, Stable Diffusion inpainters allow the synthesis of maliciously inpainted images of personal and private figures, and copyrighted contents, also known as deepfakes. To combat such generations, a disruption framework, Photoguard, has been proposed, which adds adversarial noise to the context image to disrupt inpainting synthesis. While this framework suggested a diffusion-friendly approach, the disruption is not sufficiently strong and it requires a significant amount of GPU and time to immunize the context image. In our work, we re-examine both the minimal and favorable conditions for a successful inpainting disruption, proposing DDD, a "Digression guided Diffusion Disruption" framework. First, we identify the most adversarially vulnerable diffusion timestep range with respect to the hidden space. Within this scope of noised manifold, we pose the problem as a semantic digression optimization. We maximize the distance between the inpainting instance's hidden states and a semantic-aware hidden state centroid, calibrated both by Monte Carlo sampling of hidden states and a discretely projected optimization in the token space. Effectively, our approach achieves stronger disruption and a higher success rate than Photoguard while lowering the GPU memory requirement and speeding up the optimization by up to three times.
Visual defect obfuscation based self-supervised anomaly detection, Scientific Reports
박영현 (Ph.D. student)
Due to the scarcity of anomalies in the early manufacturing stage, an unsupervised anomaly detection (UAD) approach, which uses only normal samples for training, is widely adopted. This approach is based on the assumption that the trained UAD model will accurately reconstruct normal patterns but struggle with unseen anomalies. To enhance UAD performance, reconstruction-by-inpainting based methods have recently been investigated, especially regarding the masking strategy for suspected defective regions. However, there are still issues to overcome: (1) time-consuming inference due to multiple masking, (2) output inconsistency caused by random masking, and (3) inaccurate reconstruction of normal patterns for large masked areas. Motivated by this, this study proposes a novel reconstruction-by-inpainting method, dubbed Excision And Recovery (EAR), that features single deterministic masking based on the ImageNet pre-trained DINO-ViT and visual obfuscation for hint-providing. Experimental results on the MVTec AD dataset show that deterministic masking by pre-trained attention effectively cuts out suspected defective regions and resolves issues (1) and (2). Also, hint-providing by mosaicing enhances performance more than emptying those regions by binary masking, thereby overcoming issue (3). The proposed approach achieves high performance without any change to the model structure. Promising results are shown through laboratory tests with public industrial datasets. To establish EAR as a practically deployable solution for various industries, future steps include evaluating its applicability in relevant manufacturing environments.
Residual Learning in Diffusion Models, CVPR 2024
장준유 (Ph.D. student)
Diffusion models (DMs) have achieved remarkable generative performance, particularly with the introduction of stochastic differential equations (SDEs). Nevertheless, a gap emerges in the model sampling trajectory constructed by reverse-SDE due to the accumulation of score estimation and discretization errors. This gap results in a residual in the generated images, adversely impacting the image quality. To remedy this, we propose a novel residual learning framework built upon a correction function. The optimized function improves image quality by effectively rectifying the sampling trajectory. Importantly, our framework exhibits transferable residual correction ability, i.e., a correction function optimized for one pre-trained DM can also enhance the sampling trajectory constructed by other DMs on the same dataset. Experimental results on four widely-used datasets demonstrate the effectiveness and transferable capability of our framework.
[Day2] 9/27(Fri.) 16:30~18:00
FVTTS: Face Based Voice Synthesis for Text-to-Speech, INTERSPEECH 2024
이민영 (Integrated M.S./Ph.D. student)
A face is expressive of individual identity and used in various studies such as identification, authentication, and personalization. Similarly, a voice is a means of expressing individuals, and personalized voice synthesis based on voice reference is active. However, the voice-based method confronts voice sample dependency limitations. We propose Face-based Voice synthesis for Text-To-Speech (FVTTS) to synthesize voice from face images that are more expressive of personal identity than voice samples. A major challenge in face-based TTS methods is extracting distinct voice features highly related to voice from the face image. Our face encoder is designed to tackle this by integrating global facial attributes with voice-related features to represent personalized characteristics. FVTTS has shown superiority in various metrics and adaptability across different data domains. We establish a new standard in face-based TTS, leading the way in personalized voice synthesis.
All but One: Surgical Concept Erasing with Model Preservation in Text-to-Image Diffusion Models, AAAI 2024
홍승후 (Integrated M.S./Ph.D. student)
Text-to-Image models such as Stable Diffusion have shown impressive image generation synthesis, thanks to the utilization of large-scale datasets. However, these datasets may contain sexually explicit, copyrighted, or undesirable content, which allows the model to directly generate them. Given that retraining these large models on individual concept deletion requests is infeasible, fine-tuning algorithms have been developed to tackle concept erasing in diffusion models. While these algorithms yield good concept erasure, they all present one of the following issues: 1) the corrupted feature space yields synthesis of disintegrated objects, 2) the initially synthesized content undergoes a divergence in both spatial structure and semantics in the generated images, and 3) sub-optimal training updates heighten the model's susceptibility to utility harm. These issues severely degrade the original utility of generative models. In this work, we present a new approach that solves all of these challenges. We take inspiration from the concept of classifier guidance and propose a surgical update on the classifier guidance term while constraining the drift of the unconditional score term. Furthermore, our algorithm empowers the user to select an alternative to the erasing concept, allowing for more controllability. Our experimental results show that our algorithm not only erases the target concept effectively but also preserves the model’s generation capability.
Diversity-aware Channel Pruning for StyleGAN Compression, CVPR 2024
정지우 (Ph.D. student)
StyleGAN has shown remarkable performance in unconditional image generation. However, its high computational cost poses a significant challenge for practical applications. Although recent efforts have been made to compress StyleGAN while preserving its performance, existing compressed models still lag behind the original model, particularly in terms of sample diversity. To overcome this, we propose a novel channel pruning method that leverages varying sensitivities of channels to latent vectors, which is a key factor in sample diversity. Specifically, by assessing channel importance based on their sensitivities to latent vector perturbations, our method enhances the diversity of samples in the compressed model. Since our method solely focuses on the channel pruning stage, it has complementary benefits with prior training schemes without additional training cost. Extensive experiments demonstrate that our method significantly enhances sample diversity across various datasets. Moreover, in terms of FID scores, our method not only surpasses the state of the art by a large margin but also achieves comparable scores with only half the training iterations.
Agile-DRAM: Agile Trade-Offs in Memory Capacity, Latency, and Energy for Data Centers, HPCA 2024
이준승 (M.S. student)
Data centers frequently face significant memory under-utilization due to factors such as infrastructure overprovisioning, inefficient workload scheduling, and limited server configurations. This paper introduces Agile-DRAM, a novel DRAM architecture that addresses this issue by flexibly converting the under-utilized memory capacity into enhanced latency performance and reduced power consumption. Through minor modifications to the conventional DRAM architecture, Agile-DRAM supports multiple operational modes: low-latency, low-power, and the default max-capacity mode. Notably, Agile-DRAM facilitates agile transitions between these modes in response to workload fluctuations in data centers at runtime. Evaluation results demonstrate that the low-latency mode can boost single-core execution speed by up to 25.8% and diminish energy usage by up to 22.4%. Similarly, the low-power mode can reduce DRAM standby and self-refresh power by 31.6% and 85.7%, respectively.
MARS: Matching Attribute-aware Representations for Text-based Sequential Recommendation, CIKM 2024
김현수 (M.S. student)
Sequential recommendation aims to predict the next item a user is likely to prefer based on their sequential interaction history. Recently, text-based sequential recommendation has emerged as a promising paradigm that uses pre-trained language models to exploit textual item features to enhance performance and facilitate knowledge transfer to unseen datasets. However, existing text-based recommender models still struggle with two key challenges: representing users and items with multiple attributes, and matching items with complex user interests. To address these challenges, we propose a novel model, Matching Attribute-aware Representations for Text-based Sequential Recommendation (MARS). MARS extracts detailed user and item representations through attribute-aware text encoding, capturing diverse user intents with multiple attribute-aware representations. It then computes user-item scores via attribute-wise interaction matching, effectively capturing attribute-level user preferences. Our extensive experiments demonstrate that MARS significantly outperforms existing sequential models, achieving improvements of up to 24.43% and 29.26% in Recall@10 and NDCG@10 across five benchmark datasets.
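The two metrics reported above can be sketched for the usual next-item setting, where each test sequence has exactly one ground-truth item. This single-target assumption, the function names, and the toy data are illustrative; the paper's exact evaluation protocol is not described in the abstract.

```python
import math


def recall_at_k(ranked_items, target, k=10):
    """1.0 if the ground-truth item appears in the top-k list, else 0.0."""
    return 1.0 if target in ranked_items[:k] else 0.0


def ndcg_at_k(ranked_items, target, k=10):
    """NDCG@k with a single relevant item.

    The ideal DCG is 1, so the score reduces to 1 / log2(rank + 1),
    where rank is the 1-based position of the target in the top-k list.
    """
    top_k = ranked_items[:k]
    if target not in top_k:
        return 0.0
    position = top_k.index(target)  # 0-based position
    return 1.0 / math.log2(position + 2)


def mean_metric(metric, rankings, targets, k=10):
    """Average a per-user metric over all test users."""
    scores = [metric(r, t, k) for r, t in zip(rankings, targets)]
    return sum(scores) / len(scores)


# Hypothetical rankings for two users.
rankings = [["a", "b", "c"], ["x", "y", "z"]]
targets = ["b", "q"]
recall = mean_metric(recall_at_k, rankings, targets)
ndcg = mean_metric(ndcg_at_k, rankings, targets)
```

Recall@k only checks whether the target is retrieved, while NDCG@k also rewards placing it closer to the top of the list.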
GENE: GraphRAG Elicits Notions for Evidence based medicine, AAAI 2024
최광준, 김재현 (M.S. students)
In this paper, we propose a framework that integrates the Diagnostic and Statistical Manual of Mental Disorders (DSM-5) and patient information datasets into a unified knowledge graph database, designed to provide effective queries to patients. Frameworks utilizing Large Language Models (LLMs) like GPT have gained attention for achieving high scores on the USMLE. However, evaluations based on USMLE data, where patient information is fully and accurately provided, do not reflect the reality of clinical environments where diagnoses often rely on fragmented, ambiguous, or even false information. Identifying “Clinical Missing Information” and asking patients “Good Questions” that effectively elicit crucial clinical data are essential aspects of medical AI. This approach acknowledges the “ignorance” of our models and humbly listens to new data, overcoming the “Dunning–Kruger effect” problem associated with generating unconditional responses using LLMs. Our framework first establishes an ontology suited for inference based on medical evidence, analyzing patient information texts from USMLE dataset to construct the first Knowledge Graph. Next, we preprocess the DSM-5 dataset to build the second Knowledge Graph by adopting its hierarchical structure. Each knowledge graph is categorized into clinical knowledge accumulated from medical practice and medically-based factual or procedural knowledge recognized through research. By combining these two knowledge graphs, we create an integrated knowledge graph database and apply GraphRAG to provide four types of questions tailored to the dialogue context when basic patient information and initial utterances are input, simulating real-world medical scenarios. This approach is expected to enhance the potential for AI to assist in clinical settings.
Deblurring 3D Gaussian Splatting, ECCV 2024
이병현 (Ph.D. student)
Recent studies in Radiance Fields have paved the way for novel view synthesis with their photorealistic rendering quality. Nevertheless, they usually employ neural networks and volumetric rendering, which are costly to train and impede their broad use in various real-time applications due to the lengthy rendering time. Recently, a 3D Gaussian splatting-based approach has been proposed to model the 3D scene, and it achieves remarkable visual quality while rendering the images in real-time. However, it suffers from severe degradation in the rendering quality if the training images are blurry. Blurriness commonly occurs due to lens defocusing, object motion, and camera shake, and it inevitably interferes with clean image acquisition. Several previous studies have attempted to render clean and sharp images from blurry input images using neural fields. The majority of those works, however, are designed only for volumetric rendering-based neural radiance fields and are not straightforwardly applicable to rasterization-based 3D Gaussian splatting methods. Thus, we propose a novel real-time deblurring framework, Deblurring 3D Gaussian Splatting, using a small Multi-Layer Perceptron (MLP) that manipulates the covariance of each 3D Gaussian to model the scene blurriness. While Deblurring 3D Gaussian Splatting can still enjoy real-time rendering, it can reconstruct fine and sharp details from blurry images. A variety of experiments have been conducted on the benchmark, and the results have revealed the effectiveness of our approach for deblurring. Qualitative results are available at https://benhenryl.github.io/Deblurring-3D-Gaussian-Splatting/
Just Flip: Flipped Observation Generation and Optimization for Neural Radiance Fields to Cover Unobserved View, IROS 2024
이시백 (Integrated M.S./Ph.D. student)
Compact 3D Gaussian Representation for Radiance Field, CVPR 2024
이주찬 (Integrated M.S./Ph.D. student)
Neural Radiance Fields (NeRFs) have demonstrated remarkable potential in capturing complex 3D scenes with high fidelity. However, one persistent challenge that hinders the widespread adoption of NeRFs is the computational bottleneck due to volumetric rendering. On the other hand, 3D Gaussian splatting (3DGS) has recently emerged as an alternative representation that leverages a 3D Gaussian-based representation and adopts the rasterization pipeline to render the images rather than volumetric rendering, achieving very fast rendering speed and promising image quality. However, a significant drawback arises as 3DGS entails a substantial number of 3D Gaussians to maintain the high fidelity of the rendered images, which requires a large amount of memory and storage. To address this critical issue, we place a specific emphasis on two key objectives: reducing the number of Gaussian points without sacrificing performance and compressing the Gaussian attributes, such as view-dependent color and covariance. To this end, we propose a learnable mask strategy that significantly reduces the number of Gaussians while preserving high performance. In addition, we propose a compact but effective representation of view-dependent color by employing a grid-based neural field rather than relying on spherical harmonics. Finally, we learn codebooks to compactly represent the geometric attributes of Gaussians by vector quantization. With model compression techniques such as quantization and entropy coding, we consistently show over 25× reduced storage and enhanced rendering speed, while maintaining the quality of the scene representation, compared to 3DGS. Our work provides a comprehensive framework for 3D scene representation, achieving high performance, fast training, compactness, and real-time rendering.
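As a rough illustration of the vector quantization step mentioned in the abstract above, the sketch below replaces each attribute vector with the index of its nearest codebook entry, so only small integer indices and one shared codebook need to be stored. It is a generic nearest-neighbor quantizer and not the authors' implementation: the function names (`nearest_code`, `quantize`, `dequantize`), the fixed two-entry codebook, and the sample vectors are all hypothetical, and in the paper the codebooks are learned rather than hand-written.

```python
from typing import List, Sequence

Vector = Sequence[float]


def nearest_code(vector: Vector, codebook: Sequence[Vector]) -> int:
    """Return the index of the codebook entry closest to `vector` (squared L2)."""
    best_index, best_dist = 0, float("inf")
    for index, code in enumerate(codebook):
        dist = sum((v - c) ** 2 for v, c in zip(vector, code))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def quantize(vectors: Sequence[Vector], codebook: Sequence[Vector]) -> List[int]:
    """Map every vector to a codebook index; the indices are what gets stored."""
    return [nearest_code(v, codebook) for v in vectors]


def dequantize(indices: Sequence[int], codebook: Sequence[Vector]) -> List[Vector]:
    """Recover approximate vectors by looking the indices up in the codebook."""
    return [codebook[i] for i in indices]


# Hypothetical 2-D attributes and a hand-written codebook (learned in practice).
codebook = [(0.0, 0.0), (1.0, 1.0)]
attributes = [(0.1, -0.1), (0.9, 1.2), (0.4, 0.3)]

indices = quantize(attributes, codebook)
approximated = dequantize(indices, codebook)
print(indices)
print(approximated)
```

The storage saving comes from keeping one index per Gaussian instead of a full attribute vector; the price is the approximation error between each original vector and its codebook entry.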
Novelty-aware Graph Traversal and Expansion for Hierarchical Reinforcement Learning, CIKM 2024
박종찬 (Ph.D. student)
Hierarchical Reinforcement Learning (HRL) is specially designed for environments characterized by long-term goals and sparse rewards. High-level policies in HRL learn to generate appropriate subgoals aimed at accomplishing the final goal, while low-level policies focus on achieving these designated subgoals. Recently, graph-based HRL algorithms have demonstrated enhanced learning capabilities through the structural representation of state spaces as graphs. However, existing graph-based HRL methods still often generate inefficient subgoals. This paper introduces a new method, Novelty-aware Graph Traversal and Expansion (NGTE), which selects an optimal node at the graph boundary, termed an Outpost Subgoal, as a direct path toward the final goal. Once the Outpost Subgoal is reached, NGTE transitions into an exploration phase, offering exploration subgoals within a reachable distance to efficiently expand the graph. Demonstrated in complex environments such as quadruped robot navigation and robotic arm manipulation, NGTE consistently outperforms existing graph and non-graph HRL methods, showing outstanding performance, especially in the most challenging scenarios with fixed start and fixed goal conditions.
Risk-Conditioned Reinforcement Learning: A Generalized Approach for Adapting to Varying Risk Measures, AAAI 2024
유광표 (Ph.D. student)
In application domains requiring mission-critical decision-making, such as finance and robotics, the optimal policy derived by reinforcement learning (RL) often hinges on a preference for risk management.
A Conflict-Embedded Narrative Generation Using Commonsense Reasoning, IJCAI 2024
조건희 (Integrated M.S./Ph.D. student)
Conflict is a critical element in the narrative, inciting dramatic tension. This paper introduces CNGCI (Conflict-driven Narrative Generation through Commonsense Inference), a neuro-symbolic framework designed to generate coherent stories embedded with conflict using commonsense inference. Our framework defines narrative conflict by leveraging the concept of a soft causal threat, where conflict serves as an obstacle that reduces the likelihood of achieving the protagonist’s goal by weakening the causal link between context and goal through defeasible inference. Comparative studies against multiple story generation baselines utilizing commonsense reasoning show that our framework outperforms the baselines in creating narratives that distinctly embody conflict while maintaining coherency.
CacheCraft: Enhancing GPU Performance under Memory Protection through Reconstructed Caching, MICRO 2024
박소영 (M.S. student)
Contemporary GPUs use Error Correcting Codes (ECC) to protect against memory errors. GPUs with Graphics DDR (GDDR) utilize in-band ECC (a.k.a. inline ECC), which sequentially accesses data and redundancy to enable ECC functionality using non-ECC memory chips. However, the additional access reduces data throughput and can incur significant performance penalties for bandwidth-intensive applications. This paper introduces CacheCraft, a novel GPU micro-architecture engineered to address the inefficiencies of current in-band ECC protection. It reconfigures the traditional 128B cache line from four 32B sectors into four 30B sectors and one 8B sector. This adjustment creates a 2B space in each 32B memory chunk, designated for storing the redundancy of the sector data, thereby enabling a single memory access to deliver reliable data. Our evaluation shows that this single-access in-band ECC can significantly mitigate the bandwidth penalty of memory protection. While traditional in-band ECC increases memory access by 41.9% (peaking at 96.9%), CacheCraft reduces this extra bandwidth requirement to 21.9% (peaking at 28.2%). This significant reduction (47.8% on average and up to 89.4%) can substantially enhance the performance of memory-intensive applications by as much as 23.5%.
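The sector arithmetic in the abstract above can be checked with a few lines of code. The sketch below only restates the byte counts given in the text (a 128B line, 32B memory chunks, 30B data sectors); the function name `cachecraft_layout` and the dictionary keys are hypothetical, and the reading that the fifth 8B sector holds the data displaced from the four chunks is an inference from the numbers, not a statement from the paper.

```python
def cachecraft_layout(line_bytes: int = 128,
                      chunk_bytes: int = 32,
                      data_sector_bytes: int = 30) -> dict:
    """Recompute the sector sizes described in the abstract (illustrative only)."""
    chunks_per_line = line_bytes // chunk_bytes               # 32B chunks per line
    redundancy_per_chunk = chunk_bytes - data_sector_bytes    # space freed per chunk
    # Data that no longer fits in the reshaped sectors forms the small sector.
    small_sector_bytes = line_bytes - chunks_per_line * data_sector_bytes
    return {
        "chunks_per_line": chunks_per_line,
        "redundancy_per_chunk": redundancy_per_chunk,
        "small_sector_bytes": small_sector_bytes,
        "total_data_bytes": chunks_per_line * data_sector_bytes + small_sector_bytes,
    }


layout = cachecraft_layout()
for name, value in layout.items():
    print(f"{name}: {value}")
```

With the default values this gives four chunks, 2B of redundancy space per chunk, and an 8B small sector, so the four 30B sectors plus the 8B sector still add up to the original 128B of data.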
Embodied CoT Distillation From LLM To Off-the-shelf Agents, ICML 2024
최원제 (Ph.D. student)
We address the challenge of utilizing large language models (LLMs) for complex embodied tasks in environments where decision-making systems must operate in a timely manner on capacity-limited, off-the-shelf devices. We present DeDer, a framework for decomposing and distilling the embodied reasoning capabilities from LLMs to efficient, small language model (sLM)-based policies. In DeDer, the decision-making process of LLM-based strategies is restructured into a hierarchy with a reasoning-policy and planning-policy. The reasoning-policy is distilled from the data that is generated through the embodied in-context learning and self-verification of an LLM, so it can produce effective rationales. The planning-policy, guided by the rationales, can render optimized plans efficiently. In turn, DeDer allows for adopting sLMs for both policies, deployed on off-the-shelf devices. Furthermore, to enhance the quality of intermediate rationales, specific to embodied tasks, we devise the embodied knowledge graph, and to generate multiple rationales in a timely manner through a single inference, we also use the contrastively prompted attention model. Our experiments with the ALFRED benchmark demonstrate that DeDer surpasses leading language planning and distillation approaches, indicating the applicability and efficiency of sLM-based embodied policies derived through DeDer.
Cross-Domain Semantic Segmentation on Inconsistent Taxonomy using Vision Language Models, ECCV 2024
정혜린 (M.S. student)
The challenge of semantic segmentation in Unsupervised Domain Adaptation (UDA) emerges not only from domain shifts between source and target images but also from discrepancies in class taxonomies across domains. Traditional UDA research assumes consistent taxonomy between the source and target domains, thereby limiting its ability to recognize and adapt to the taxonomy of the target domain. This paper introduces a novel approach, Cross-Domain Semantic Segmentation on Inconsistent Taxonomy using Vision Language Models (CSI), which effectively performs domain-adaptive semantic segmentation even in situations of source-target class mismatches. CSI leverages the semantic generalization potential of Vision Language Models (VLMs) to create synergy with previous UDA methods. It leverages segment reasoning obtained through traditional UDA methods, combined with the rich semantic knowledge embedded in VLMs, to relabel new classes in the target domain. This approach allows for effective adaptation to extended taxonomies without requiring any ground truth label for the target domain. Our method has been shown to be effective across various benchmarks in situations of inconsistent taxonomy settings (coarse-to-fine taxonomy and open taxonomy) and demonstrates consistent synergy effects when integrated with previous state-of-the-art UDA methods.
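[Project Results] Development of a Federated AI Computing Model in Distributed Environments for Medical Data Privacy Preservation
Project period: 2020.04~2024.12, Funding agency: NIPA
Research institutions: Sungkyunkwan University, Chosun University, (주)유티소프트, Samsung Medical Center | Funding agency: National IT Industry Promotion Agency (NIPA)
Research goal: Develop an edge-computing-based federated AI platform that can analyze and learn from medical data without moving it between nodes, and validate it using a wearable system capable of measuring multiple bio-signals (ECG/EMG/PPG)
Privacy concerns over medical data are holding back the commercialization of AI healthcare services. To address this at its root, a way to train AI models without moving the data is urgently needed. Federated learning combines knowledge by moving AI models rather than data, so it can preserve data privacy while achieving an effect similar to pooling the data of many users. It is not a perfect technique, however: information can be lost when knowledge is merged, and similar data can be generated by reverse-engineering the model. Through a consortium of experts in AI applications, database security, bio-signal devices, and medical data, this project develops a federated AI platform that can train medical AI models safely and with minimal information loss.
<Figure. Overview of the federated AI platform>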
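The idea of moving models instead of data can be sketched with federated averaging (FedAvg), a widely used aggregation rule in federated learning. The project summary does not say which aggregation method the platform uses, so this is only an assumed example: the function name `federated_average`, the two hypothetical hospitals, and their parameter values and sample counts are made up for illustration.

```python
from typing import List, Sequence


def federated_average(client_weights: Sequence[Sequence[float]],
                      client_sizes: Sequence[int]) -> List[float]:
    """Average client model parameters, weighted by local dataset size.

    Only parameter vectors and sample counts reach the server;
    the raw records never leave each client.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client model")
    total = sum(client_sizes)
    dim = len(client_weights[0])
    averaged = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all client models must have the same shape")
        for i, value in enumerate(weights):
            averaged[i] += value * size / total
    return averaged


# Hypothetical locally trained parameters from two sites.
hospital_a = [1.0, 2.0]   # trained on 100 local samples
hospital_b = [3.0, 4.0]   # trained on 300 local samples

global_model = federated_average([hospital_a, hospital_b], [100, 300])
print(global_model)
```

Weighting by sample count lets sites with more data influence the shared model more. The information loss and model-inversion risks noted in the text are not addressed by plain averaging.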