Publications
TFM2: Training-Free Mask Matching for Open-Vocabulary Semantic Segmentation
Yaoxin Zhuo, Zachary Bessinger, Lichen Wang, Naji Khosravan, Baoxin Li, Sing Bing Kang
IEEE Winter Conference on Applications of Computer Vision (WACV), 2025
We introduce a novel Training-Free Mask Matching (TFM2) framework to enhance Vision-Language Models (VLMs) for open-vocabulary semantic segmentation tasks. Experiments demonstrate that TFM2 improves on SOTA performance by up to 5%. TFM2 not only saves on costly data annotation but also significantly reduces GPU computation requirements. Moreover, TFM2 is not limited to any specific method or backbone, which makes it flexible for a wide range of VLM-based applications.
[PDF] [PDF_supplement]
ZInD-Tell: Towards Translating Indoor Panoramas into Descriptions
Tonmoay Deb, Lichen Wang, Zachary Bessinger, Naji Khosravan, Eric Penner, Sing Bing Kang
IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshop, 2024
This paper focuses on bridging the gap between natural language descriptions, 360 panoramas, room shapes, and layouts/floorplans of indoor spaces. To enable new multi-modal (image, geometry, language) research directions in indoor environment understanding, we propose a novel extension to the Zillow Indoor Dataset (ZInD), in which we use CV and LLM models to build a large-scale indoor description dataset under human supervision. We have further created a multimodal generative AI baseline capable of generating home-level descriptions.
[PDF] [PDF_supplement] [GitHub]
iBARLE: imBalance-Aware Room Layout Estimation
Taotao Jing, Lichen Wang, Naji Khosravan, Zhiqiang Wan, Zachary Bessinger, Zhengming Ding, Sing Bing Kang
IEEE Winter Conference on Applications of Computer Vision (WACV), 2024
Room layout estimation predicts layouts from a single panorama. Real-world datasets are significantly imbalanced along several dimensions, including layout complexity, camera locations, and variation in scene appearance. We propose the imBalance-Aware Room Layout Estimation (iBARLE) framework. iBARLE consists of: (1) an Appearance Variation Generation (AVG) module, which promotes visual appearance domain generalization, (2) a Complex Structure Mix-up (CSMix) module, which enhances generalizability w.r.t. room structure, and (3) a gradient-based layout objective function, which allows more effective accounting for occlusions in complex layouts.
[PDF]
Rethinking Neighborhood Consistency Learning on Unsupervised Domain Adaptation
Chang Liu, Lichen Wang, Yun Fu
ACM International Conference on Multimedia (MM), 2023
Unsupervised Domain Adaptation (UDA) involves transferring learned knowledge from large-scale labeled data to target data by exploring small-scale unlabeled target samples. In this work, a novel framework called Neighborhood Consistency Learning (NCL) is proposed. NCL optimizes the classification boundary at semantic, instance/local, and batch/global levels, thereby enhancing the effectiveness and generalization of the learned machine learning models.
[PDF]
Human Motion Segmentation via Velocity-Sensitive Dual-Side Auto-Encoder
Yue Bai, Lichen Wang, Yunyu Liu, Yu Yin, Hang Di, Yun Fu
IEEE Transactions on Image Processing (TIP)
Human motion segmentation involves the segmentation of a long video into shorter and meaningful clips. In this paper, we present an unsupervised framework called Velocity-Sensitive Dual-Side Auto-Encoder (VSDA). VSDA incorporates a multi-neighbor auto-encoder to preserve long and short distance information across frames. Additionally, we introduce a velocity-sensitive (VS) guidance mechanism to further enhance performance. Comprehensive results demonstrate the effectiveness and efficiency of our model.
[PDF]
Semi-supervised Domain Adaptive Structure Learning
Can Qin, Lichen Wang, Qianqian Ma, Yu Yin, Huan Wang, Yun Fu
IEEE Transactions on Image Processing (TIP)
Semi-supervised domain adaptation (SSDA) presents a significant challenge, as simply combining SSL and DA methods often falls short in addressing both objectives. In this paper, we propose an adaptive structure learning method to regulate the cooperation between SSL and DA. Our approach involves a feature encoder and two classifier networks, which are updated for contradictory purposes to achieve effective SSDA. Experimental results on both 2D and 3D datasets demonstrate the accuracy and robustness of our method.
[PDF]
Generative Multi-Label Correlation Learning
Lichen Wang, Zhengming Ding, Kasey Lee, Seungju Han, Jae-Joon Han, Changkyu Choi, Yun Fu
ACM Transactions on Knowledge Discovery from Data (TKDD)
Multi-label learning predicts multiple labels from a single instance. It presents challenges like complex label correlations and long-tail label distribution. In this paper, we proposed the Multi-Label Correlation Learning (MUCO) framework, which is a general and compact approach. MUCO explicitly and effectively learns latent label correlations by updating a label correlation tensor and utilizing a generative strategy. All networks are jointly trained to achieve the optimal performance.
[PDF]
MemREIN: Rein the Domain Shift for Cross-Domain Few-Shot Learning
Yi Xu, Lichen Wang, Yizhou Wang, Can Qin, Yulun Zhang, Yun Fu
International Joint Conference on Artificial Intelligence (IJCAI), 2022
Few-shot learning enables models to generalize to new categories using limited samples. We presented the MemREIN framework, which incorporates Memorized, Restitution, and Instance Normalization algorithms for cross-domain few-shot learning. The memorized module is designed to capture and retain refined knowledge. MemREIN effectively addresses the domain shift challenge and achieves a performance improvement of up to 16.43% compared with SOTA baselines.
[PDF]
Adaptive Trajectory Prediction via Transferable GNN
Yi Xu, Lichen Wang, Yizhou Wang, Yun Fu
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022
Pedestrian trajectory prediction plays a crucial role in various AI applications. We propose a Transferable Graph Neural Network (TGNN) framework that enables the transfer of prediction results across different domains, such as shopping malls and streets. TGNN introduces a domain-invariant GNN and an attention-based adaptive knowledge learning module. Our work is a pioneering effort that fills the gap in cross-domain pedestrian trajectory prediction.
[PDF]
Meta Adversarial Weight for Unsupervised Domain Adaptation
Chang Liu, Lichen Wang, Yun Fu
SIAM International Conference on Data Mining (SDM), 2022
Unsupervised Domain Adaptation (UDA) aims to align a trained model to a target domain without labeled data. UDA approaches ignore class-level alignment due to their unsupervised nature. We address this limitation by constructing a meta-dataset that captures the target-like distribution as meta knowledge. By leveraging this meta knowledge, we achieve category-wise domain alignment.
[PDF]
Collaborative Attention Mechanism for Multi-Modal Time Series Classification
Yue Bai, Zhiqiang Tao, Lichen Wang, Sheng Li, Yu Yin, Yun Fu
SIAM International Conference on Data Mining (SDM), 2022
Multi-modal time series classification (MTC) leverages complementary knowledge from different modalities to enhance learning performance. In this study, we propose a Collaborative Attention Mechanism (CAM) that detects attention differences among modalities and adaptively integrates attention information for mutual benefit. Additionally, we design a modified Long Short-Term Memory (LSTM) module called Mutual-Aid RNN to facilitate multi-modal collaboration.
[PDF]
Semi-supervised Dual Relation Learning for Multi-Label Classification
Lichen Wang, Yunyu Liu, Hang Di, Can Qin, Gan Sun, Yun Fu
IEEE Transactions on Image Processing (TIP)
Multi-label learning (MLL) involves predicting multiple labels for a single instance. The main challenges in MLL include long-tail label distribution and complex label correlations. When conducting MLL in a semi-supervised manner, an additional domain shift problem arises. We proposed the Semi-supervised Dual Relation Learning (SDRL) framework. The SDRL jointly explores the inter-instance feature-level relation and the intra-instance label-level relation. Furthermore, a dual-classifier structure is deployed to facilitate domain invariant learning, enabling MLL in semi-supervised scenarios.
[PDF]
Slow Learning and Fast Inference: Efficient Graph Similarity Computation via Knowledge Distillation
Can Qin, Handong Zhao, Lichen Wang, Huan Wang, Yulun Zhang, Yun Fu
Neural Information Processing Systems (NeurIPS), 2021
Graph Similarity Computation (GSC) plays a crucial role in various graph applications. However, existing GNN-based methods are computationally expensive. We propose a novel multi-level early-fusion approach based on a co-attention fusion network to address this issue. Additionally, a knowledge distilling strategy is deployed to enable faster inference. Our model achieves a speed improvement of over 10X compared to previous SOTA methods.
Aspect-based Sentiment Classification via Reinforcement Learning
Lichen Wang, Bo Zong, Yunyu Liu, Can Qin, Wei Cheng, Wenchao Yu, Xuchao Zhang, Haifeng Chen, Yun Fu
IEEE International Conference on Data Mining (ICDM), 2021
Aspect-based sentiment classification is an NLP task that predicts sentiment polarities aligned with specific aspects. Existing approaches heavily rely on large language models trained on extensive data. We proposed SentRL, a reinforcement learning-based framework. SentRL encourages the model to disregard task-irrelevant text and instead prioritize identifying the most effective clues within the given text. SentRL reduces the resource requirements in both the training and testing stages.
[PDF]
Domain Generalization via Feature Variation Decorrelation
Chang Liu, Lichen Wang, Kai Li, Yun Fu
ACM International Conference on Multimedia (MM), 2021
Domain generalization aims to develop models that can generalize to unseen target domains based on multiple source domains. Considering that class-irrelevant information can result in negative transfer, we propose a model that linearly disentangles variations in the feature space and applies a novel class decorrelation regularization to the feature variations. This approach effectively and robustly improves the model performance.
[PDF]
Skeleton Aware Multi-Modal Sign Language Recognition
Songyang Jiang, Bin Sun, Lichen Wang, Yue Bai, Kunpeng Li, Yun Fu
IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshop (First prize), 2021
We propose a multi-modal sign language recognition method that achieved top-1 performance in both RGB (98.42%) and RGB-D (98.53%) tracks of the 2021 CVPR Challenge on Sign Language Recognition (Chal-SLR). Specifically, we introduced a Sign Language Graph Convolution Network (SL-GCN) to capture embedded dynamics and a Separable Spatial-Temporal Convolution Network (SSTCN) to exploit skeleton features. The code has been released and verified by the committee.
Generic Multi-label Annotation via Adaptive Graph and Marginalized Augmentation
Lichen Wang, Zhengming Ding, Yun Fu
ACM Transactions on Knowledge Discovery from Data (TKDD)
Multi-label learning involves predicting multiple labels for a single instance. We proposed the Adaptive Graph and Marginalized Augmentation (AGMA) framework for multi-label learning in a semi-supervised scenario. By incorporating a small amount of labeled data with unlabeled data, we observed a significant performance boost. Additionally, a feature-label autoencoder is introduced to enhance model efficiency during the inference stage.
[PDF]
Contradictory Structure Learning for Semi-supervised Domain Adaptation
Can Qin, Lichen Wang, Qianqian Ma, Yu Yin, Huan Wang, Yun Fu
SIAM International Conference on Data Mining (SDM), 2021
Our objective is to simultaneously tackle semi-supervised learning and domain adaptation. However, simply combining these methods often results in failure. To address this issue, we propose a novel model consisting of a shared feature encoder network and two classifier networks. These components are iteratively trained in contradictory directions, as this training strategy enhances the performance of both modules.
Correlative Channel-Aware Fusion for Multi-View Time Series Classification
Yue Bai, Lichen Wang, Zhiqiang Tao, Sheng Li, Yun Fu
AAAI Conference on Artificial Intelligence (AAAI), 2021
Multi-modal time series data classification leverages temporal information from multiple modalities to achieve higher learning performance. We propose a Correlative Channel-Aware Fusion (C2AF) network. The C2AF network incorporates a two-stream structured encoder to learn intra/inter-modal correlations. Additionally, a channel-aware learnable fusion mechanism is employed to assemble labels across modalities and obtain the final results.
I3DOL: Incremental 3D Object Learning without Catastrophic Forgetting
Jiahua Dong, Yang Cong, Gan Sun, Bingtao Ma, Lichen Wang
AAAI Conference on Artificial Intelligence (AAAI), 2021
We propose an Incremental 3D Object Learning (I3DOL) framework, which is the first work to learn new 3D classes in an online manner. To address the challenges in this setting, we design an adaptive geometric-centroid module, a geometric-aware attention mechanism, and a score fairness compensation strategy. Through experiments on representative 3D datasets, we validate the superiority of our I3DOL framework.
Dual-Side Auto-Encoder for High-Dimensional Time Series Segmentation
Yue Bai, Lichen Wang, Yunyu Liu, Yu Yin, Yun Fu
IEEE International Conference on Data Mining (ICDM), 2020
We propose a novel unsupervised representation learning framework called Dual-Side Auto-Encoder (DSAE) for segmenting high-dimensional time series data. DSAE includes a single-to-multiple auto-encoder module that is designed to encode and decode the given high-dimensional data based on the same weights. We demonstrate the effectiveness of the model on six human action datasets.
Generative View-Correlation Adaptation for Semi-Supervised Multi-View Learning
Yunyu Liu, Lichen Wang, Yue Bai, Can Qin, Zhengming Ding, Yun Fu
European Conference on Computer Vision (ECCV), 2020
Multi-Modal Learning (MML) explores data with multiple modalities. We propose a View-Correlation Adaptation (VCA) framework for conducting MML in a semi-supervised manner. The VCA framework incorporates a cross-view adversarial learning strategy, a data augmentation method, and a late fusion network. Experiments are conducted on human action recognition tasks.
EV-Action: Electromyography-Vision Multi-Modal Action Dataset
Lichen Wang, Bin Sun, Joseph Robinson, Taotao Jing, Yun Fu
IEEE International Conference on Automatic Face and Gesture Recognition (FG), 2020
We introduced EV-Action, a large-scale multi-modal human action dataset comprising five modalities: RGB, depth, electromyography (EMG), and two skeleton modalities. The uniqueness of EV-Action lies in the inclusion of EMG signals collected from human muscles. An industry-level motion capture system is deployed to obtain comprehensive, high-precision, and fine-grained skeleton motion data. A baseline method for analyzing all five modalities is also provided.
Inductive and Unsupervised Representation Learning on Graph Structured Objects
Lichen Wang, Bo Zong, Qianqian Ma, Wei Cheng, Jingchao Ni, Wenchao Yu, Yanchi Liu, Dongjing Song, Haifeng Chen, Yun Fu
International Conference on Learning Representations (ICLR), 2020
We proposed a novel mechanism for graph representation learning called SEED, which consists of the Sampling, Encoding, and Embedding Distributions modules. SEED enables the inductive and unsupervised learning of graph representations. Furthermore, through theoretical analysis, we demonstrate the close connection between SEED and graph isomorphism.
Dual Relation Semi-Supervised Multi-Label Learning
Lichen Wang, Yunyu Liu, Can Qin, Gan Sun, Yun Fu
AAAI Conference on Artificial Intelligence (AAAI), 2020
Multi-label learning associates multiple labels with a single instance. It is a challenging task due to the long-tail label distribution and intricate label correlations. To tackle these challenges, a dual relation learning approach is proposed, which explores correlations at both the instance level and the label level. This strategy leads to higher performance in multi-label learning scenarios.
PointDAN: A Multi-Scale 3D Domain Adaption Network for Point Cloud Representation
Can Qin, Haoxuan You, Lichen Wang, C.-C. Jay Kuo, Yun Fu
Neural Information Processing Systems (NeurIPS), 2019
Domain Adaptation (DA) enables models to adapt to new tasks with limited training data. A 3D Domain Adaptation Network (PointDAN) is specifically designed to enhance DA performance when working with 3D point cloud data. PointDAN incorporates a local-global adaptive module and a node-attention module. Additionally, a large-scale dataset (PointDA-10) for 3D DA tasks has been collected and made available to the research community.
Generative Correlation Discovery Network for Multi-Label Learning
Lichen Wang, Zhengming Ding, Seungju Han, Jae-Joon Han, Changkyu Choi, Yun Fu
IEEE International Conference on Data Mining (ICDM) (Long paper), 2019
Multi-label learning recovers multiple labels from a single instance. This task requires an abundance of training data due to the long-tail label distribution and complex label correlations. A Generative Correlation Discovery Network (GCDN) is proposed, which generates extra training samples and efficiently learns label correlations. The two modules are jointly trained to enhance overall performance.
Job2Vec: Job Title Benchmarking with Collective Multi-View Representation Learning
Denghui Zhang, Junming Liu, Hengshu Zhu, Yanchi Liu, Lichen Wang, Pengyang Wang, Hui Xiong
ACM International Conference on Information and Knowledge Management (CIKM) (Long paper), 2019
Job Title Benchmarking (JTB) aims to match job titles with similar expertise levels across different companies. JTB faces several challenges, including subjective naming, missing information, and sparse title connections. This study constructs a Job-Graph that captures all aspects of title transaction knowledge. A collective multi-modal method called Job2Vec is proposed to effectively handle these challenges.
[PDF]
Generative Multi-View Human Action Recognition
Lichen Wang, Zhengming Ding, Zhiqiang Tao, Yunyu Liu, Yun Fu
International Conference on Computer Vision (ICCV) (Oral), 2019
Multi-modal action recognition classifies human actions based on multiple modalities (e.g., RGB and depth). We proposed a generative framework that generates new cross-modal samples to enhance the learning performance. Correlations across different modalities are further learned by a correlation learning module. The framework is evaluated on several action datasets.
Generatively Inferential Co-Training for Unsupervised Domain Adaptation
Can Qin, Lichen Wang, Yulun Zhang, Yun Fu
International Conference on Computer Vision (ICCV) Workshop (Best Paper), 2019
Domain Adaptation (DA) helps model training by utilizing existing data instead of collecting new data. However, the available data is often of lower quality. We propose a novel Generatively Inferential Co-Training (GICT) framework. It leverages cross-domain feature generation and a modified co-training strategy to achieve high performance.
Online Multi-task Clustering for Human Motion Segmentation
Gan Sun, Yang Cong, Lichen Wang, Zhengming Ding, Yun Fu
International Conference on Computer Vision (ICCV) Workshop, 2019
Human motion segmentation (HMS) segments long human action videos into multiple clips. In contrast to existing HMS methods that typically perform segmentation in offline settings, we present an Online Multi-task Clustering (OMTC) model designed for online and multi-agent scenarios. OMTC enables video segmentation even when only partial frames are captured, making it more practical for real-time human action analysis tasks.
[PDF]
Low-Rank Transfer Human Motion Segmentation
Lichen Wang, Zhengming Ding, Yun Fu
IEEE Transactions on Image Processing (TIP)
Human motion segmentation (HMS) involves segmenting long human action videos into multiple clips. We have developed a novel transfer subspace clustering method that leverages pre-existing well-labeled source data. Our approach incorporates a temporal information preserving graph and a weighted low-rank constraint, and is validated on several human action datasets.
Image Super-Resolution Using Very Deep Residual Channel Attention Networks
Yulun Zhang, Kunpeng Li, Kai Li, Lichen Wang, Bineng Zhong, Yun Fu
European Conference on Computer Vision (ECCV), 2018 (3000+ citations)
Image Super-Resolution (SR) recovers high-resolution images from low-resolution images. We proposed the very deep Residual Channel Attention Network (RCAN). It contains a specifically designed Residual in Residual (RIR) module with long and short skip connections, which fully utilize knowledge learned across layers. An attention mechanism is further adopted. RCAN achieved SOTA performance in all general SR tasks.
Adaptive Graph Guided Embedding for Multi-label Annotation
Lichen Wang, Zhengming Ding, Yun Fu
International Joint Conference on Artificial Intelligence (IJCAI), 2018
Multi-label learning aims to recover multiple labels from a single instance. We proposed a novel approach to achieve higher performance in a semi-supervised fashion, where the distributions of labeled and unlabeled samples are jointly explored. Extra information is extracted from the distributions to improve the performance.
Learning Transferable Subspace for Human Motion Segmentation
Lichen Wang, Zhengming Ding, Yun Fu
AAAI Conference on Artificial Intelligence (AAAI), 2018
We propose a novel transferable subspace clustering approach that aims to enhance the clustering performance of target data by leveraging extra information from relevant well-labeled source data. Our method incorporates a specifically designed affiliation graph and is applied to human action segmentation tasks, where the goal is to segment a single long action video into a few meaningful clips.
Modified Multi-target Recognition Based on CamCom
Lichen Wang, Aimin Zhang, Chujia Guo, Pervez Bhan, Tian Yan
Chinese Control Conference (CCC), 2015
Camera Communication (CamCom) receives digital information via cameras. We designed an ID signal sending protocol as well as a machine-learning-based pipeline to process and identify the signal from the captured videos. Our method is able to robustly identify and localize multiple cooperative targets with pixel-level accuracy.
[PDF]
3-D Reconstruction for SMT Solder Joint Based on Joint Shadow
Lichen Wang, Aimin Zhang, Chujia Guo, Songyun Zhao, Pervez Bhan
Chinese Control and Decision Conference (CCDC), 2015
A computer vision algorithm is proposed which predicts the shape of a solder joint. It extracts the shadows of joints and estimates the shape from the shadows. A few assumptions (e.g., solder surface tension) are made for this task-specific application. Experiments conducted on a real-world system achieved practical results.
[PDF]
Thesis
Correlation Discovery for Multi-View and Multi-Label Learning
Lichen Wang
Thesis, Doctor of Philosophy, 2021
A few algorithms and applications are proposed for multi-modal and multi-label learning. Multi-modal learning algorithms aim to fully explore the latent correlations across the given modalities/views to improve the learning performance. Multi-label learning methods explore the correlations in label space for better prediction performance. Comprehensive experiments provide a better understanding of the value of correlations between modalities and labels.
Vision based PCB Defects Detection Algorithms Research and System Implementation
Lichen Wang
Thesis, Master of Science in Engineering, 2016
We built a system for detecting defects in the surface mount soldering process. It is able to recognize several types of defects, such as missing components, dislocation, and reversed polarity. The solution includes hardware design (e.g., structure, illumination, camera setup) and software implementation (e.g., computer vision and machine learning algorithms for classification as well as measurement tasks).
Vision Based Intravenous Bottle Foreign Matter Inspection
Lichen Wang
Thesis, Bachelor of Engineering, 2013
A systematic solution for the detection of foreign substances in intravenous bottles is implemented. The solution contains both hardware and software components. The hardware includes an illumination structure and an electromechanical system, while the software incorporates a specifically designed computer vision algorithm. The system demonstrates high accuracy and efficiency, as it has been successfully tested and deployed in real-world manufacturing procedures.