Presentation Details
(DAY 1)
Online via YouTube
10:00~11:00
[Invited Talk]
Building an Information Seeking Agent for an Evolving World
Prof. Eunsol Choi (Univ of Texas at Austin)
Abstract: NLP systems increasingly rely on large pre-trained language models and their parametric knowledge. Yet, models often provide unreliable and outdated information, having been trained on a hodgepodge of web corpora. In this talk, I will describe our lab’s work on model adaptation. First, I will investigate targeted updates to language models – can language models remember injected facts and make inferences based on them? We find that fine-tuning shows little propagation of injected knowledge, while prepending the same information at inference time works robustly. We present a context distillation approach that can both impart knowledge and enable broader inferences. In the second part of the talk, I will present our recent work on improving models from user interaction data. Focusing on extractive question answering (QA), we study iteratively improving an NLP system by learning from human user feedback over time. Together, this talk will present challenges and progress in building models that can adapt to an evolving world.
Bio: Eunsol Choi is an assistant professor in the Computer Science department at the University of Texas at Austin. Her research spans natural language processing and machine learning. She is particularly interested in interpreting and reasoning about text in a real-world context. Prior to UT, she spent time as a visiting scholar at Google AI. She received a Ph.D. from the University of Washington and a B.A. from Cornell University. She is a recipient of a Facebook Research Fellowship, a Google Faculty Research Award, a Sony Research Award, and an outstanding paper award at EMNLP.
11:10~12:00
DIP: Dead code Insertion based Black-box Attack for Programming Language Model, ACL 2023
나철원 (Integrated M.S./Ph.D. student)
Automatic processing of source code, such as code clone detection and software vulnerability detection, is very helpful to software engineers. Large pre-trained Programming Language (PL) models (such as CodeBERT, GraphCodeBERT, and CodeT5) show very powerful performance on these tasks. However, these PL models are vulnerable to adversarial examples generated with slight perturbations. Unlike natural language, an adversarial example of code must be semantic-preserving and compilable. Due to these requirements, it is hard to directly apply existing attack methods for natural language models. In this paper, we propose DIP (Dead code Insertion based Black-box Attack for Programming Language Model), a high-performance and efficient black-box attack method that generates adversarial examples using dead code insertion. We evaluate our proposed method on 9 victim downstream-task large code models. Our method outperforms the state-of-the-art black-box attack in both attack efficiency and attack quality, while the generated adversarial examples remain compilable and preserve semantic functionality.
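The perturbation primitive behind the attack can be illustrated with a minimal sketch (a hypothetical helper, not the authors' implementation): inserting a never-executed statement keeps the program compilable and semantically identical while changing the token sequence the PL model sees.

```python
def insert_dead_code(src, snippet="if False:\n    pass", line=0):
    """Insert a never-executed block at a chosen line of a Python
    program. The perturbed source stays compilable and behaves
    identically, but its token sequence changes."""
    lines = src.splitlines()
    lines.insert(line, snippet)
    out = "\n".join(lines)
    compile(out, "<adv>", "exec")  # sanity check: still valid Python
    return out
```

DIP additionally queries the victim model in a black-box loop to choose where and what to insert; the sketch above shows only the semantic-preserving primitive.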
Towards Suicide Prevention from Bipolar Disorder with Temporal Symptom-Aware Multitask Learning, KDD 2023
이다은 (Integrated M.S./Ph.D. student)
Bipolar disorder (BD) is closely associated with an increased risk of suicide. However, while prior work has revealed valuable insight into understanding the behavior of BD patients on social media, little attention has been paid to developing a model that can predict the future suicidality of a BD patient. Therefore, this study proposes a multi-task learning model for predicting the future suicidality of BD patients by jointly learning current symptoms. We build a novel BD dataset clinically validated by psychiatrists, including 14 years of posts on bipolar-related subreddits written by 818 BD patients, along with annotations of future suicidality and BD symptoms. We also suggest a temporal symptom-aware attention mechanism to determine which symptoms are the most influential for predicting future suicidality over time through a sequence of BD posts. Our experiments demonstrate that the proposed model outperforms the state-of-the-art models in both BD symptom identification and future suicidality prediction tasks. In addition, the proposed temporal symptom-aware attention provides interpretable attention weights, helping clinicians to apprehend BD patients more comprehensively and to provide timely intervention by tracking mental state progression.
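A rough sketch of the idea behind temporal attention over a post sequence (the function name and the recency bias are our own illustrative assumptions, not the paper's exact formulation): softmax weights over a user's posts are what make the model's focus inspectable.

```python
import math

def temporal_symptom_attention(symptom_scores, recency_tau=1.0):
    """symptom_scores: per-post relevance logits for one symptom over
    time (oldest first). A softmax with a recency bias (an assumption)
    yields interpretable weights; the pooled score would feed a
    downstream suicidality predictor."""
    n = len(symptom_scores)
    logits = [s + (i - (n - 1)) / recency_tau for i, s in enumerate(symptom_scores)]
    m = max(logits)                       # subtract max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    weights = [e / z for e in exps]
    pooled = sum(w * s for w, s in zip(weights, symptom_scores))
    return weights, pooled
```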
D-ViSA: A Dataset for Detecting Visual Sentiment from Art Images, ICCV 2023
김서윤 (M.S. student)
Detecting emotions evoked by art has been receiving great attention recently. Although previous works provide a variety of datasets consisting of art images and corresponding emotion labels, little attention has been paid to the continuous and dimensional characteristics of human emotions, especially in the domain of art. We propose D-ViSA, a dataset for detecting visual sentiment from art images, whose labels include both categorical and dimensional emotion labels and which can be used in a wide range of visual sentiment analysis research on art. We compare several deep learning baselines on two specific tasks: single-feature and multi-feature dimensional emotion regression. Our experiments lead to the conclusion that our dataset is suitable for both regression tasks with deep learning baselines. We expect that our dataset contributes to the field of artwork analysis and provides insights into human emotions evoked by art.
13:30~14:45
Toward a Better Understanding of Loss Functions for Collaborative Filtering, CIKM 2023
박성민 (Integrated M.S./Ph.D. student)
Collaborative filtering (CF) is a pivotal technique in modern recommender systems. The learning process of CF models typically consists of three components: interaction encoder, loss function, and negative sampling. Although many existing studies have proposed various CF models with sophisticated interaction encoders, recent work shows that simply reformulating the loss functions can achieve significant performance gains. This paper delves into analyzing the relationships among existing loss functions. Our mathematical analysis reveals that the previous loss functions can be interpreted as alignment and uniformity functions: (i) the alignment matches user and item representations, and (ii) the uniformity disperses user and item distributions. Inspired by this analysis, we propose Margin-aware Alignment and Weighted Uniformity (MAWU), a novel loss function that improves the design of alignment and uniformity by considering the unique patterns of datasets. The key novelty of MAWU is two-fold: (i) margin-aware alignment (MA) mitigates user/item-specific popularity biases, and (ii) weighted uniformity (WU) adjusts the relative significance of user and item uniformities to reflect the inherent characteristics of datasets. Extensive experimental results show that MF and LightGCN equipped with MAWU are comparable or superior to state-of-the-art CF models with various loss functions on three public datasets.
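The alignment-and-uniformity view can be sketched in a few lines of pure Python (a minimal illustration; the scalar `margin` and the weights `gamma_u`/`gamma_i` are simplified stand-ins for MAWU's actual margin-aware and weighted formulations):

```python
import math

def normalize(v):
    # project onto the unit hypersphere (assumes a non-zero vector)
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def alignment(users, items, margin=0.0):
    # mean squared distance between matched user/item pairs;
    # a margin (simplified here to one scalar) relaxes the pull
    total = 0.0
    for u, i in zip(users, items):
        u, i = normalize(u), normalize(i)
        d2 = sum((a - b) ** 2 for a, b in zip(u, i))
        total += max(d2 - margin, 0.0)
    return total / len(users)

def uniformity(vecs, t=2.0):
    # log-mean Gaussian potential over all pairs: lower when the
    # embeddings spread out over the unit hypersphere
    vecs = [normalize(v) for v in vecs]
    pairs, acc = 0, 0.0
    for a in range(len(vecs)):
        for b in range(a + 1, len(vecs)):
            d2 = sum((x - y) ** 2 for x, y in zip(vecs[a], vecs[b]))
            acc += math.exp(-t * d2)
            pairs += 1
    return math.log(acc / pairs)

def mawu_loss(users, items, margin=0.0, gamma_u=1.0, gamma_i=1.0):
    # gamma_u / gamma_i trade off user vs. item uniformity
    return (alignment(users, items, margin)
            + gamma_u * uniformity(users)
            + gamma_i * uniformity(items))
```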
Dual Policy Learning for Aggregation Optimization in Graph Neural Network-based Recommender Systems, The Web Conference (WWW) 2023
정희수 (Ph.D. student)
Graph Neural Networks (GNNs) provide powerful representations for recommendation tasks. GNN-based recommendation systems capture the complex high-order connectivity between users and items by aggregating information from distant neighbors, which can improve the performance of recommender systems. Recently, Knowledge Graphs (KGs) have also been incorporated into the user-item interaction graph to provide more abundant contextual information; they are exploited to address cold-start problems and enable more explainable aggregation in GNN-based recommender systems (GNN-Rs). However, due to the heterogeneous nature of users and items, developing an effective aggregation strategy that works across multiple GNN-Rs, such as LightGCN and KGAT, remains a challenge. In this paper, we propose a novel reinforcement learning-based message passing framework for recommender systems, which we call DPAO (Dual Policy framework for Aggregation Optimization). This framework adaptively determines high-order connectivity to aggregate users and items using dual policy learning. Dual policy learning leverages two Deep-Q-Network models to exploit user- and item-aware feedback from a GNN-R and boost the performance of the target GNN-R. Our proposed framework was evaluated with both non-KG-based and KG-based GNN-R models on six real-world datasets, and the results show that it significantly enhances recent base models, improving nDCG and Recall by up to 63.7% and 42.9%, respectively.
KID34K: A Dataset for Online Identity Card Fraud Detection, CIKM 2023
백승연 (M.S. student)
To mitigate the risks associated with fraudulent ID card verification, we present a novel dataset for classifying whether the ID card images that users upload to a verification system are genuine or digitally represented. Our dataset consists of replicas designed to resemble real ID cards, making it publicly available while avoiding privacy issues. Through extensive experiments, we demonstrate that our dataset is effective for detecting digitally represented ID card images, not only on our replica dataset but also on a dataset consisting of real ID cards.
Towards Understanding of Deepfake Videos in the Wild, CIKM 2023
김지원 (M.S. student)
Our contributions in this IRB-approved study bridge this knowledge gap in current real-world deepfakes by providing in-depth analysis. We first present the largest, most diverse, and most recent deepfake dataset collected from the wild to date (RWDF-23), consisting of 2,000 deepfake videos collected from 4 platforms (Reddit, YouTube, TikTok, and Bilibili), spanning 4 different languages and created in 21 countries. By expanding the dataset's scope beyond previous research, we capture a broader range of real-world deepfake content, reflecting the ever-evolving landscape of online platforms. We also conduct a comprehensive analysis encompassing various aspects of deepfakes, including creators, manipulation strategies, purposes, and real-world content production methods. This allows us to gain valuable insights into the nuances and characteristics of deepfakes in different contexts. Lastly, in addition to the video content, we collect viewer comments and interactions, enabling us to explore how internet users engage with deepfake content.
15:00~16:15
Self-Feedback DETR for Temporal Action Detection, ICCV 2023
김지환 (Ph.D. student)
Temporal Action Detection (TAD) is challenging but fundamental for real-world video applications. Recently, DETR-based models have been devised for TAD but have not performed well yet. In this paper, we point out a problem in the self-attention of DETR for TAD: the attention modules focus on only a few key elements, which we call the temporal collapse problem. It degrades the capability of the encoder and decoder since their self-attention modules play no role. To solve the problem, we propose a novel framework, Self-DETR, which utilizes cross-attention maps of the decoder to reactivate the self-attention modules. We recover the relationships between encoder features by simple matrix multiplication of the cross-attention map and its transpose. Likewise, we also recover the information within decoder queries. By guiding collapsed self-attention maps with the calculated guidance maps, we resolve the temporal collapse of the self-attention modules in the encoder and decoder. Our extensive experiments demonstrate that Self-DETR resolves the temporal collapse problem by keeping high diversity of attention over all layers.
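The guidance-map construction is essentially two matrix products (a pure-Python sketch under our own naming; the real model applies this per attention head with learned features):

```python
def matmul(A, B):
    # plain row-by-column matrix product for nested lists
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(M):
    return [list(r) for r in zip(*M)]

def normalize_rows(M):
    # renormalize each row so it reads as an attention distribution
    out = []
    for row in M:
        s = sum(row) or 1.0
        out.append([x / s for x in row])
    return out

def guidance_maps(cross_attn):
    """cross_attn: decoder-query x encoder-feature attention (rows sum
    to 1). A @ A^T relates decoder queries to each other; A^T @ A
    relates encoder features; both serve as guidance for the collapsed
    self-attention maps."""
    At = transpose(cross_attn)
    dec_guide = normalize_rows(matmul(cross_attn, At))
    enc_guide = normalize_rows(matmul(At, cross_attn))
    return enc_guide, dec_guide
```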
Query-Dependent Video Representation for Moment Retrieval and Highlight Detection, CVPR 2023
현상익 (Ph.D. student)
Recently, video moment retrieval and highlight detection (MR/HD) have been spotlighted as the demand for video understanding has drastically increased. The key objective of MR/HD is to localize the moment and estimate the clip-wise accordance level, i.e., saliency score, for the given text query. Although recent transformer-based models brought some advances, we found that these methods do not fully exploit the information of a given query. For example, the relevance between the text query and video contents is sometimes neglected when predicting the moment and its saliency. To tackle this issue, we introduce Query-Dependent DETR (QD-DETR), a detection transformer tailored for MR/HD. As we observe the insignificant role of a given query in transformer architectures, our encoding module starts with cross-attention layers to explicitly inject the context of the text query into the video representation. Then, to enhance the model's capability of exploiting the query information, we manipulate the video-query pairs to produce irrelevant pairs. Such negative (irrelevant) video-query pairs are trained to yield low saliency scores, which, in turn, encourages the model to estimate precise accordance between query-video pairs. Lastly, we present an input-adaptive saliency predictor which adaptively defines the criterion of saliency scores for the given video-query pairs. Our extensive studies verify the importance of building the query-dependent representation for MR/HD. Specifically, QD-DETR outperforms state-of-the-art methods on the QVHighlights, TVSum, and Charades-STA datasets.
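The negative-pair objective can be sketched as a simple hinge loss (our illustrative stand-in; QD-DETR's actual saliency losses differ in detail): clips paired with the true query should outscore the same clips paired with an irrelevant query by a margin.

```python
def saliency_margin_loss(pos_scores, neg_scores, margin=1.0):
    """Hinge loss over matched clips: pos_scores come from a clip paired
    with its true query, neg_scores from the same clip paired with an
    irrelevant query. Zero loss once the gap exceeds `margin`."""
    loss = 0.0
    for p, n in zip(pos_scores, neg_scores):
        loss += max(0.0, margin + n - p)
    return loss / len(pos_scores)
```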
Boost-up Efficiency of Defective Solar Panel Detection with Pre-trained Attention Recycling, IEEE Transactions on Industry Applications 2023
박영현 (Ph.D. student)
Methods that enable the visual inspection of solar panels are currently in demand, as a huge number of solar panels are now being deployed as a sustainable energy source. One solution for inspection automation is an end-to-end deep learning framework, but this is not recommended for this problem because such a framework requires not only powerful computational resources but also a large-scale class-balanced dataset. In this study, we present a cost-effective solar panel defect detection method. We emphasize the spatial features of defects by utilizing an attention map generated by a pre-trained attention mechanism that attends to stroke ends, gatherings, and bends. We define and extract 13 statistical features from the attention map and then feed them into a conventional machine learning model. Therefore, we no longer require energy-depleting models such as end-to-end neural classifiers to discriminate between non-defective and defective panels. Five conventional machine learning models and one state-of-the-art (SOTA) deep learning model, i.e., EfficientNet, are used to generalize the experimental results. The results of the comparative experiments indicate that our approach, which includes attention mechanism recycling and statistical feature extraction, provides cost-effective defect detection with performance competitive with that of recent SOTA models. In future research, we expect that our approach can be adopted in other defect detection tasks, such as steel or film manufacturing processes.
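The feature-extraction step can be sketched as follows (the specific statistics below are illustrative examples; the paper defines 13 of them):

```python
import statistics

def attention_features(attn):
    """attn: 2-D attention map as nested lists. Flatten it and compute
    summary statistics that a conventional classifier (e.g., a random
    forest) can consume instead of an end-to-end CNN."""
    vals = [v for row in attn for v in row]
    mean = statistics.mean(vals)
    std = statistics.pstdev(vals)
    peak = max(vals)
    # fraction of cells above mean + std: a crude "hot region" ratio
    hot = sum(v > mean + std for v in vals) / len(vals)
    return [mean, std, peak, hot]
```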
Kernel Shape Control for Row-Efficient Convolution on Processing-In-Memory Arrays, ICCAD 2023
이존이 (Integrated M.S./Ph.D. student)
Processing-in-memory (PIM) architectures have been highlighted as one of the viable solutions for faster and more power-efficient convolutional neural network (CNN) inference. Recently, the shift and duplicate kernel (SDK) convolutional weight mapping scheme was proposed, achieving up to 50% throughput improvement over prior art. However, the traditional pattern-based pruning methods, which were adopted for row-skipping and computing cycle reduction, are not optimal for the latest SDK mapping due to structural irregularity caused by the shifted and duplicated kernels. To address this issue, we propose a method called kernel shape control (KERNTROL) that aims to promote structural regularity for achieving a high row-skipping ratio and model accuracy. Instead of pruning certain weight elements permanently, KERNTROL controls the kernel shapes through the omission of certain weights based on their mapped columns. In comparison to the latest pattern-based pruning approaches, KERNTROL achieves up to a 36.4% improvement in compression rate and a 38.6% improvement in array utilization while maintaining the original model accuracy.
16:30~18:00 [Doctoral Workshop]
Scalable and Accurate Session-based Recommendation, WSDM 2022
최민진 (Integrated M.S./Ph.D. student)
Session-based recommendation (SR) predicts the next items from a sequence of previous items consumed by an anonymous user, e.g., on e-commerce or multimedia streaming services. One critical aspect of recommender systems is computational efficiency and scalability, considering practical feasibility in commercial applications. To account for both accuracy and scalability, we propose novel session-based recommendation models. First, we introduce simple-yet-effective linear models that consider the holistic aspects of sessions, namely the Session-aware Item Similarity/Transition (SLIST) model. Second, we present a session-based recommendation model based on a random walk with restart, namely SWalk. Finally, to capture multiple user interests within a session, we propose Exploiting Diverse user Interests for Session-based recommendatiON (EDISON). Extensive experimental results demonstrate that our models outperform existing state-of-the-art SR models on real-world datasets.
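The linear, item-similarity flavor of these models can be sketched in a few lines (the recency decay and data layout here are our own simplifications, not SLIST's learned weights):

```python
def recommend(session, sim, k=2):
    """session: list of item ids in consumption order.
    sim: dict-of-dicts mapping item -> {candidate: similarity weight}.
    Score candidates by summing similarity contributions from items in
    the session, weighting recent items more (the 0.5 decay is an
    illustrative assumption), and return the top-k new items."""
    scores = {}
    for pos, item in enumerate(session):
        w = 0.5 ** (len(session) - 1 - pos)  # recency decay
        for cand, s in sim.get(item, {}).items():
            if cand not in session:          # don't re-recommend
                scores[cand] = scores.get(cand, 0.0) + w * s
    return sorted(scores, key=scores.get, reverse=True)[:k]
```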
Learning to Understand Code and Generate Text in Pre-trained Language Models for Program and Natural Language, Doctoral Dissertation
최윤석 (Integrated M.S./Ph.D. student)
The rapid advancement of artificial intelligence (AI) technology in the field of software development has led to a growing demand for techniques that enable software developers to understand code and translate it into natural language. Precise and high-quality descriptions of programming languages offer several advantages, including enhancing software comprehension and collaboration, and increasing accessibility for non-experts in software development. The recent success of Pre-trained Language Models (PLMs) for program and natural language has been attributed to their utilization of large-scale code and text corpora. However, they did not consider the distinct properties of programming languages. This dissertation focuses on the task of generating natural language descriptions from code, considering those distinct properties. To better understand code and generate text, we propose three main components for utilizing PLMs for program and natural language: effective modeling of code, integration of code and natural language, and efficient parameter learning for programming language models. In the first component, we address how to effectively utilize both the structural and sequential information of code when modeling it. The second component aims to bridge the gap between code and natural language by utilizing external knowledge to connect both as inputs. We develop techniques that effectively combine code and natural language, enabling connections between the two different kinds of language. The third component focuses on the efficient learning of large pre-trained language models for program and natural language. We explore parameter-efficient learning techniques for full-data learning, few-shot learning, and cross-lingual settings.
In this dissertation, we propose novel components for generating natural language descriptions from programming languages. Our contributions include learning sequential and structural features of code for source code summarization, utilizing external knowledge and prompt tuning for code question answering, and designing parameter-efficient learning approaches for PLMs in the context of program and natural language. By addressing these perspectives and challenges, this dissertation provides insights and advancements in generating natural language descriptions from code, contributing to enhanced code comprehension and communication among software developers.
Techniques for Alleviating Regional Quality Imbalance within Synthesized Scene and Clothed Human Images, Doctoral Dissertation
심상헌 (Ph.D. student)
We discuss techniques to alleviate the regional quality imbalance problem in synthesizing realistic scene and clothed human images. First, we propose an attention mechanism for scene image generation that encourages GANs to generate diverse object classes at high quality by amplifying local peaks of activations. Second, we propose a network architecture for clothed human image synthesis that avoids texture squeezing artifacts around the sleeves. Experimental results show that our proposed methods successfully alleviate the regional quality imbalance problem.