Haichao Zhang, Yao Lu, Lichen Wang, Yunzhe Li, Daiwei Chen, Yunpeng Xu, Yun Fu
arXiv Preprint 2512.16891, 2025
We developed LinkedOut, the first Video LLM-based recommendation framework that extracts world knowledge directly from raw video pixels, eliminating the language bottleneck of text summarization approaches. The system uses a Cross-layer Knowledge-fusion MoE for adaptive semantic granularity selection and a store-and-retrieve architecture for 1000x faster inference. It achieves state-of-the-art results on MicroLens, with a 27% HR@10 improvement. Applications include personalized video feeds, content-aware recommendation, and large-scale video ranking systems.
[PDF]
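The store-and-retrieve idea is easy to illustrate. Below is a minimal PyTorch sketch — not the LinkedOut implementation, and all names are hypothetical — in which the expensive video embeddings are computed once offline and online ranking reduces to a cheap similarity lookup:

```python
import torch

# Hypothetical sketch of store-and-retrieve ranking: the expensive video
# encoder runs once per item offline; online scoring is a dot product.
class EmbeddingStore:
    def __init__(self):
        self.ids, self.embs = [], []

    def add(self, video_id: str, emb: torch.Tensor):
        self.ids.append(video_id)
        self.embs.append(emb / emb.norm())              # cache L2-normalized

    def retrieve(self, user_emb: torch.Tensor, k: int = 10):
        table = torch.stack(self.embs)                  # (N, d), cached offline
        scores = table @ (user_emb / user_emb.norm())   # cosine similarity
        top = scores.topk(min(k, len(self.ids)))
        return [(self.ids[i], scores[i].item()) for i in top.indices]

store = EmbeddingStore()
for vid in ["v1", "v2", "v3"]:
    store.add(vid, torch.randn(64))    # stand-in for a Video LLM embedding
print(store.retrieve(torch.randn(64), k=2))
```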
Yaoxin Zhuo, Zachary Bessinger, Lichen Wang, Naji Khosravan, Baoxin Li, Sing Bing Kang
IEEE Winter Conference on Applications of Computer Vision (WACV), 2025
We developed a training-free mask cache framework for few-shot open-vocabulary semantic segmentation that enhances vision-language models without fine-tuning or additional labeled data. The system constructs adaptive key-value mask caches from cross-modal vision-language embeddings using dynamic filtering, channel reduction, and feature alignment with only 2-32 shots. It achieves up to 5% mIoU improvement over state-of-the-art across ViT, ResNet, and Swin-Transformer architectures, enabling efficient adaptation to new visual categories without retraining. Applications include vision-language foundation model adaptation, embodied AI perception, and open-world scene understanding.
[PDF] [PDF_supplement]
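For intuition, here is a minimal training-free key-value cache classifier in the spirit of the approach above; the shapes, affinity function, and beta hyperparameter are illustrative assumptions, not the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def build_cache(support_feats, support_labels, num_classes):
    keys = F.normalize(support_feats, dim=-1)                 # (S, d) few-shot keys
    values = F.one_hot(support_labels, num_classes).float()   # (S, C) label values
    return keys, values

def cache_predict(query_feats, keys, values, beta=5.0):
    q = F.normalize(query_feats, dim=-1)                      # (Q, d)
    affinity = torch.exp(-beta * (1.0 - q @ keys.T))          # (Q, S) similarities
    return affinity @ values                                  # (Q, C) class scores

keys, values = build_cache(torch.randn(32, 512), torch.randint(0, 4, (32,)), 4)
print(cache_predict(torch.randn(8, 512), keys, values).argmax(dim=-1))
```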
Tonmoay Deb, Lichen Wang, Zachary Bessinger, Naji Khosravan, Eric Penner, Sing Bing Kang
IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshop, 2024
We created ZInD-Tell, the first large-scale multimodal dataset bridging 360° panoramas, floor plans, and natural language for indoor scene understanding, with 1575 properties and 3150 human-validated descriptions. The system uses GPT-4/vision with schema/template-based prompt engineering to extract room connectivity graphs and geometric constraints. We further developed ZInD-Agent, a zero-shot baseline integrating vision-language models with layout estimation, achieving significant improvements on language-based home retrieval and description generation for real estate.
[PDF] [PDF_supplement] [GitHub]
Taotao Jing, Lichen Wang, Naji Khosravan, Zhiqiang Wan, Zachary Bessinger, Zhengming Ding, Sing Bing Kang
IEEE Winter Conference on Applications of Computer Vision (WACV), 2024
We developed iBARLE, a data imbalance-aware framework for panoramic room layout estimation that addresses appearance variations and structural complexity through domain generalization. The system combines appearance variation generation with adaptive style transfer, complex structure mix-up via cross-room data augmentation, and gradient-based constraints for occlusion handling. It achieves state-of-the-art performance on large-scale indoor datasets, accurately predicting layouts for diverse room geometries, enabling automated property visualization, virtual touring, and robust AR/VR experiences.
[PDF]
Chang Liu, Lichen Wang, Yun Fu
ACM International Conference on Multimedia (MM), 2023
We developed NCL, a neighborhood consistency learning framework for pseudo-labeling-based unsupervised domain adaptation that prevents uncertain neighborhoods from being pushed to wrong categories. The system uses correlation matrix matching as the consistency objective, performs dual-level semantic and instance learning, and employs uncertainty-aware aggregation to handle negative neighbors. It achieves up to 5.9% improvement over state-of-the-art. Applications include foundation model adaptation, zero-shot domain transfer, and model robustness under distribution shift.
[PDF]
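A correlation-matrix-matching consistency objective can be sketched in a few lines; this is one plausible form under our assumptions, not necessarily the paper's exact loss:

```python
import torch
import torch.nn.functional as F

def correlation_matrix(feats):
    z = F.normalize(feats, dim=-1)      # (B, d) unit-norm features
    return z @ z.T                      # (B, B) pairwise cosine similarities

def consistency_loss(feats_a, feats_b):
    # Match the neighborhood structure of two views of the same batch.
    return F.mse_loss(correlation_matrix(feats_a), correlation_matrix(feats_b))

x = torch.randn(16, 128, requires_grad=True)
loss = consistency_loss(x, x + 0.05 * torch.randn(16, 128))
loss.backward()
print(loss.item())
```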
Yue Bai, Lichen Wang, Yunyu Liu, Yu Yin, Hang Di, Yun Fu
IEEE Transactions on Image Processing (TIP)
We developed VSDA, an unsupervised auto-encoder framework that improves action boundary detection in temporal video segmentation. The system employs dual-side long-short distance constraints with velocity-sensitive guidance based on motion energy variations, performing multi-neighbor reconstruction to capture both local temporal patterns and global distinctiveness. It achieves state-of-the-art performance on standard benchmarks, with potential applications in temporal action localization, video content moderation, and video-language understanding.
[PDF]
Can Qin, Lichen Wang, Qianqian Ma, Yu Yin, Huan Wang, Yun Fu
IEEE Transactions on Image Processing (TIP)
We developed an adaptive structure learning framework for semi-supervised domain adaptation. The system employs dual classifiers with contradictory objectives (source-scattering and target-clustering). It integrates MMD-based explicit alignment and entropy-driven implicit alignment. Self-training with consistency regularization further enhances the framework. It achieves state-of-the-art results on domain adaptation benchmarks, enabling applications in label-efficient transfer learning, few-shot adaptation, and domain generalization under distribution shift.
[PDF]
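As a concrete reference for the explicit alignment term, here is a self-contained RBF-kernel MMD estimator; the kernel choice and bandwidth are assumptions, and the paper's setup may use a different kernel mixture:

```python
import torch

def mmd_rbf(source, target, gamma=1.0):
    # Plain V-statistic estimate of MMD^2 with an RBF kernel.
    def kernel(a, b):
        return torch.exp(-gamma * torch.cdist(a, b).pow(2))
    return (kernel(source, source).mean()
            + kernel(target, target).mean()
            - 2.0 * kernel(source, target).mean())

src = torch.randn(64, 256)
tgt = torch.randn(64, 256) + 0.5     # shifted distribution
print(mmd_rbf(src, tgt).item())      # grows as the domains drift apart
```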
Lichen Wang, Zhengming Ding, Kasey Lee, Seungju Han, Jae-Joon Han, Changkyu Choi, Yun Fu
ACM Transactions on Knowledge Discovery from Data (TKDD)
We developed MUCO, a generative multi-label learning framework combining adversarial data augmentation with explicit correlation modeling. The system uses GANs to address class imbalance and long-tail distributions, paired with a trainable correlation tensor for interpretable label dependency learning. Through end-to-end training, it achieves up to 5.2% precision and 7.8% recall improvements. The framework powers applications in automated content moderation, visual search and retrieval, and intelligent tagging systems.
[PDF]
Yi Xu, Lichen Wang, Yizhou Wang, Can Qin, Yulun Zhang, Yun Fu
International Joint Conference on Artificial Intelligence (IJCAI), 2022
We developed MemREIN, a modular framework for cross-domain few-shot learning that tackles domain shift challenges. It uses instance normalization to reduce domain-specific features, memory banks to preserve discriminative information, and reverse contrastive loss for feature separation. This plug-and-play approach works with existing architectures, achieving up to 16.37% accuracy gains. Applications include edge AI deployment, rapid model customization, and scalable transfer learning pipelines.
[PDF]
Yi Xu, Lichen Wang, Yizhou Wang, Yun Fu
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022
We developed T-GNN, the first framework to tackle domain shift in pedestrian trajectory prediction. It combines domain-invariant GNNs with attention-based adaptive learning to transfer motion models from source domains (e.g., ETH streets) to target domains (e.g., HOTEL indoor) without retraining. Through spatial-temporal feature extraction and individual-level knowledge transfer, it outperforms baselines by over 20%. Applications include autonomous vehicle planning, robot crowd navigation, video surveillance, and smart city pedestrian flow.
[PDF]
Chang Liu, Lichen Wang, Yun Fu
SIAM International Conference on Data Mining (SDM), 2022
We developed MAW, a meta-learning framework that mitigates representation distortion in adversarial domain adaptation. It employs a meta-learner to estimate optimal per-sample weights, preventing well-aligned samples from being misaligned. The system constructs a meta-dataset via mix-up of source and pseudo-labeled target samples for bi-level optimization. MAW boosts DANN by 7.2% and CDAN by 2.3% on Office-31. Applications include test-time adaptation, source-free domain transfer, label-efficient learning, and foundation model fine-tuning.
[PDF]
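The meta-dataset construction can be illustrated with a standard mix-up of source samples and pseudo-labeled target samples; this sketch mixes at the input level and is an assumption about the form, not the authors' exact recipe:

```python
import torch

def mixup(x_src, y_src, x_tgt, y_pseudo, alpha=0.2):
    # Convex combination of a source batch and a pseudo-labeled target batch.
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    x_mix = lam * x_src + (1.0 - lam) * x_tgt
    y_mix = lam * y_src + (1.0 - lam) * y_pseudo     # soft labels, (B, C)
    return x_mix, y_mix

x_s, x_t = torch.randn(8, 128), torch.randn(8, 128)
y_s = torch.eye(10)[torch.randint(0, 10, (8,))]      # one-hot source labels
y_t = torch.softmax(torch.randn(8, 10), dim=-1)      # pseudo-label distribution
x_m, y_m = mixup(x_s, y_s, x_t, y_t)
print(x_m.shape, y_m.sum(dim=-1))                    # labels stay normalized
```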
Yue Bai, Zhiqiang Tao, Lichen Wang, Sheng Li, Yu Yin, Yun Fu
SIAM International Conference on Data Mining (SDM), 2022
We developed CAM, a collaborative attention framework for multi-modal time series that addresses overlooked temporal patterns. It introduces Mutual-Aid RNN cells that leverage attention differences across modalities. When one modality focuses on certain time steps, it guides others to explore overlooked information. This strategy enhances knowledge discovery without direct fusion, improving accuracy by 4% on action recognition benchmarks. Applications include multimodal foundation models, embodied AI, and video-language understanding.
[PDF]
Lichen Wang, Yunyu Liu, Hang Di, Can Qin, Gan Sun, Yun Fu
IEEE Transactions on Image Processing (TIP)
We developed SDRL, a semi-supervised framework for multi-label classification addressing label-efficient learning under long-tail distributions. The system employs adversarial dual-classifier domain adaptation to align labeled and unlabeled data, confidence-based pseudo-label selection from classifier disagreements, and a trainable correlation tensor for pairwise label dependencies. It achieves up to 3% mAP gain across benchmarks. Applications include automated image tagging, content moderation, and visual search.
[PDF]
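One way to realize confidence-based selection from classifier disagreement is sketched below; the thresholds and agreement rule are illustrative assumptions:

```python
import torch

def select_pseudo_labels(logits_a, logits_b, thresh=0.9):
    # Two classifier heads give independent multi-label probabilities.
    p_a, p_b = logits_a.sigmoid(), logits_b.sigmoid()
    conf_a = torch.maximum(p_a, 1 - p_a)         # per-entry confidence
    conf_b = torch.maximum(p_b, 1 - p_b)
    agree = (p_a > 0.5) == (p_b > 0.5)           # same hard decision
    mask = agree & (conf_a > thresh) & (conf_b > thresh)
    return (p_a > 0.5).float(), mask             # pseudo-labels + reliability mask

labels, mask = select_pseudo_labels(torch.randn(4, 6), torch.randn(4, 6))
print(mask.float().mean().item(), "fraction of label entries kept")
```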
Songyao Jiang, Bin Sun, Lichen Wang, Yue Bai, Kunpeng Li, Yun Fu
arXiv Preprint 2110.06161, 2021
We developed SAM-SLR-v2, an extended version of our CVPR 2021 Challenge championship solution for multi-modal sign language recognition. The system fuses skeleton, RGB, depth, and optical flow via a learned ensemble. It uses graph reduction to extract skeleton graphs, processed through a multi-stream SL-GCN and SSTCN. A Global Ensemble Model (GEM) automatically learns optimal fusion weights. It achieves state-of-the-art results, including 99% accuracy on SLR500 and an 8% improvement on WLASL2000. Applications include multimodal video understanding and real-time gesture interfaces.
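A learned multi-stream ensemble in the spirit of GEM can be as small as a softmax over per-stream weights; the sketch below is a simplification under our assumptions, not the released model:

```python
import torch
import torch.nn as nn

class LearnedEnsemble(nn.Module):
    def __init__(self, num_streams: int):
        super().__init__()
        self.w = nn.Parameter(torch.zeros(num_streams))   # one logit per stream

    def forward(self, stream_logits):                     # list of (B, C) tensors
        weights = torch.softmax(self.w, dim=0)            # trained end-to-end
        return sum(w * s for w, s in zip(weights, stream_logits))

fuse = LearnedEnsemble(num_streams=4)                     # skeleton/RGB/flow/depth
out = fuse([torch.randn(2, 500) for _ in range(4)])       # e.g. 500 sign classes
print(out.shape)
```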
Can Qin, Handong Zhao, Lichen Wang, Huan Wang, Yulun Zhang, Yun Fu
Neural Information Processing Systems (NeurIPS), 2021
We developed a knowledge distillation framework that accelerates graph similarity computation by 10x while enabling offline embedding storage for real-time retrieval. The system uses a multi-level co-attention teacher network with GIN backbone, then distills knowledge to a lightweight student model via embedding decomposition. Applications include molecular drug screening, malware detection, graph-based anomaly detection, and large-scale graph retrieval.
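The distillation step can be sketched as matching the student's embeddings to a frozen teacher while keeping the task loss; the paper's embedding-decomposition details are omitted and the loss form here is an assumption:

```python
import torch
import torch.nn.functional as F

def distill_loss(student_emb, teacher_emb, student_pred, target, alpha=0.5):
    # Pull student embeddings toward the frozen teacher's embedding space,
    # while still fitting the similarity-regression target (e.g., a GED score).
    emb_loss = F.mse_loss(student_emb, teacher_emb.detach())
    task_loss = F.mse_loss(student_pred, target)
    return alpha * emb_loss + (1 - alpha) * task_loss

s_emb = torch.randn(8, 64, requires_grad=True)
t_emb = torch.randn(8, 64)                   # from the teacher, frozen
pred = torch.rand(8, 1, requires_grad=True)
target = torch.rand(8, 1)                    # normalized similarity labels
distill_loss(s_emb, t_emb, pred, target).backward()
```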
Lichen Wang, Bo Zong, Yunyu Liu, Can Qin, Wei Cheng, Wenchao Yu, Xuchao Zhang, Haifeng Chen, Yun Fu
IEEE International Conference on Data Mining (ICDM), 2021
We developed SentRL, a reinforcement learning framework for aspect-based sentiment classification enabling learning under limited supervision. The system transforms sentences into dependency graphs and deploys an RL-guided agent to explore optimal paths to sentiment-bearing words, skipping irrelevant context to focus on discriminative clues. It achieves up to 3.7% F1 improvement over state-of-the-art. Applications include LLM-enhanced review systems, social listening, voice of customer platforms, and automated brand intelligence pipelines.
[PDF]
Chang Liu, Lichen Wang, Kai Li, Yun Fu
ACM International Conference on Multimedia (MM), 2021
We developed a feature variation decorrelation method for domain generalization that improves robustness across unseen domains. The system uses online memory banks to estimate class prototypes and disentangles semantic variations from features. A decorrelation loss makes variations class-independent, focusing on categorical concepts while ignoring domain-specific changes. It achieves state-of-the-art results with a 3% improvement across benchmarks. Applications include foundation model adaptation, cross-environment autonomous systems, and training-free vision.
[PDF]
Songyao Jiang, Bin Sun, Lichen Wang, Yue Bai, Kunpeng Li, Yun Fu
IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshop (First prize), 2021
We developed SAM-SLR, a skeleton-aware multi-modal sign language recognition framework that won 1st place in the 2021 CVPR Challenge with 98.42% (RGB) and 98.53% (RGB-D) accuracy. The key innovation is graph reduction that simplifies the whole-body skeleton for efficient hand gesture and body motion modeling. An SL-GCN with decoupled graph convolution and STC attention captures spatial-temporal dynamics. A multi-modal ensemble fuses skeleton, RGB, optical flow, and depth for comprehensive understanding. Applications include embodied AI, spatial computing, and multimodal video understanding.
Lichen Wang, Zhengming Ding, Yun Fu
ACM Transactions on Knowledge Discovery from Data (TKDD)
We developed AGMA, a semi-supervised framework for multi-label learning that handles long-tail distributions and label noise with limited annotations. The system learns adaptive similarity graphs for structure discovery, applies marginalized augmentation for robustness, and uses a feature-label autoencoder for visual-semantic projection. Joint optimization achieves improvements across benchmarks in general and zero-shot settings. Applications include vision-language adaptation, content tagging, multi-modal retrieval, and label-efficient training.
[PDF]
Can Qin, Lichen Wang, Qianqian Ma, Yu Yin, Huan Wang, Yun Fu
SIAM International Conference on Data Mining (SDM), 2021
We developed UODA, a semi-supervised domain adaptation framework enabling effective transfer with only 1-3 labeled samples per class. It employs opposite structure learning: a source-scattering classifier expands decision boundaries while a target-clustering classifier groups features through entropy optimization, enabling efficient cross-domain alignment. It achieves up to 3% accuracy improvement on benchmarks. Applications include few-shot domain adaptation, label-efficient training, and cross-domain deployment.
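The opposite-structure objective can be caricatured with entropy terms of opposite sign on target predictions; the signs, weighting, and scheduling below are illustrative assumptions, not the paper's exact optimization:

```python
import torch

def entropy(logits):
    p = torch.softmax(logits, dim=-1)
    return -(p * torch.log(p + 1e-8)).sum(dim=-1).mean()

# Stand-ins for the two heads' predictions on unlabeled target data.
scatter_logits = torch.randn(32, 10, requires_grad=True)
cluster_logits = torch.randn(32, 10, requires_grad=True)

# The scattering head is pushed toward high entropy (expanded boundaries),
# the clustering head toward low entropy (tight target clusters).
loss = -entropy(scatter_logits) + entropy(cluster_logits)
loss.backward()
```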
Yue Bai, Lichen Wang, Zhiqiang Tao, Sheng Li, Yun Fu
AAAI Conference on Artificial Intelligence (AAAI), 2021
We developed C2AF, an end-to-end framework for multimodal time series classification fusing complementary sensor information. The system combines LSTM and CNN encoders for global-local temporal patterns, then captures intra-modal and inter-modal correlations through cross-modal learnable fusion with 1x1 convolutions. It achieves state-of-the-art on action recognition and sensor classification benchmarks. Applications include multimodal action recognition, healthcare monitoring, and time series foundation models.
Jiahua Dong, Yang Cong, Gan Sun, Bingtao Ma, Lichen Wang
AAAI Conference on Artificial Intelligence (AAAI), 2021
We developed I3DOL, the first incremental learning framework for 3D perception enabling learning new classes without forgetting previous ones. The key challenge is catastrophic forgetting from irregular point cloud structures. The system uses adaptive geometric centroid for discriminative features, attention-based selection for informative 3D characteristics, and score compensation for balanced prediction. It improves accuracy by up to 25%. Applications include autonomous systems, robotic perception, and continual 3D learning.
Yue Bai, Lichen Wang, Yunyu Liu, Yu Yin, Yun Fu
IEEE International Conference on Data Mining (ICDM), 2020
We developed DSAE, an unsupervised framework for high-dimensional time series segmentation that captures temporal patterns without labeled data. It uses a single-to-multiple auto-encoder to preserve local correlations, with dual-side long-short constraints ensuring nearby similarity and distant distinctiveness. It achieves state-of-the-art performance with up to 8.5% NMI improvement. Applications include VideoLLM preprocessing, temporal grounding, embodied AI, and long-form video understanding.
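The dual-side long-short constraint can be written directly on per-frame embeddings; the gap sizes and margin below are assumptions for illustration:

```python
import torch
import torch.nn.functional as F

def dual_side_loss(z, short_gap=1, long_gap=50, margin=1.0):
    # z: (T, d) per-frame embeddings; pull short-range pairs together,
    # push long-range pairs at least `margin` apart.
    near = F.mse_loss(z[:-short_gap], z[short_gap:])
    far_dist = (z[:-long_gap] - z[long_gap:]).norm(dim=-1)
    far = F.relu(margin - far_dist).mean()
    return near + far

z = torch.randn(200, 32, requires_grad=True)
dual_side_loss(z).backward()
```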
Yunyu Liu, Lichen Wang, Yue Bai, Can Qin, Zhengming Ding, Yun Fu
European Conference on Computer Vision (ECCV), 2020
We developed GVCA, a generative semi-supervised framework for multimodal learning that enables effective training with limited annotations. The key innovation is SeMix, a generative augmentation method that leverages unlabeled data to expand feature distributions. Combined with entropy-guided cross-modal adversarial adaptation and label correlation fusion, the system achieves 89%+ accuracy using only 50% of the labeled samples. Applications include generative data augmentation, vision-language foundation model pretraining, and label-efficient multimodal learning.
Lichen Wang, Bin Sun, Joseph Robinson, Taotao Jing, Yun Fu
IEEE International Conference on Automatic Face and Gesture Recognition (FG), 2020
We introduced EV-Action, the first large-scale action dataset fusing 5 modalities across visual and biosignal domains. It includes RGB, depth, EMG, and dual skeleton streams (Kinect and Vicon) from 70 subjects performing 20 action classes. The Vicon system captures skeletons at 100 fps with sub-millimeter accuracy for fine-grained motion analysis. EMG signals reveal muscle intention and force patterns invisible to visual sensors. An FFT-LSTM baseline demonstrates EMG's unique contribution to recognition. Applications include embodied AI, wearable analytics, and biomechanical research.
Lichen Wang, Bo Zong, Qianqian Ma, Wei Cheng, Jingchao Ni, Wenchao Yu, Yanchi Liu, Dongjin Song, Haifeng Chen, Yun Fu
International Conference on Learning Representations (ICLR), 2020
We developed SEED, a production-ready unsupervised graph learning framework that generates embeddings for arbitrary graphs without requiring labeled training data. The system combines WEAVE-based subgraph sampling with distribution embedding, enabling efficient inference on unseen graph structures. It achieves up to 10% improvement over GNN baselines while supporting graphs with or without node attributes. Applications include malware analysis, knowledge graph systems, and large-scale graph retrieval.
Lichen Wang, Yunyu Liu, Can Qin, Gan Sun, Yun Fu
AAAI Conference on Artificial Intelligence (AAAI), 2020
We developed DRML, a semi-supervised framework that automatically discovers label correlations for multi-label prediction without manual annotation of label relationships. The system combines dual-classifier domain adaptation with a graph-based relation network, enabling effective learning from limited labeled data. It achieves best F1 scores on 6 image annotation benchmarks while supporting zero-shot generalization. Applications include image tagging, scene attribute recognition, and multi-label content understanding.
Can Qin, Haoxuan You, Lichen Wang, C.-C. Jay Kuo, Yun Fu
Neural Information Processing Systems (NeurIPS), 2019
We developed PointDAN, a multi-scale domain adaptation framework that enables 3D point cloud models trained on synthetic data to generalize to real-world scans. It uses Self-Adaptive nodes for local geometric alignment and adversarial training for global feature matching. We also released the PointDA-10 benchmark dataset with 25K+ samples across synthetic and real-world domains. It improves accuracy by 14% across all 6 adaptation scenarios. Applications include autonomous driving, sim-to-real robotics transfer, and industrial 3D inspection.
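The adversarial global alignment relies on the standard gradient-reversal trick, sketched below; the Self-Adaptive node module for local geometric alignment is not reproduced here:

```python
import torch

class GradReverse(torch.autograd.Function):
    # Identity in the forward pass; flips and scales gradients in the backward
    # pass, so the feature extractor is trained to fool a domain classifier.
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lam * grad_out, None

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)

feat = torch.randn(4, 256, requires_grad=True)
domain_score = grad_reverse(feat, lam=0.5).sum()  # stand-in for a domain head
domain_score.backward()
print(feat.grad[0, :3])   # gradients arrive sign-flipped and scaled
```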
Lichen Wang, Zhengming Ding, Seungju Han, Jae-Joon Han, Changkyu Choi, Yun Fu
IEEE International Conference on Data Mining (ICDM) (Long paper), 2019
We developed GCDN, a generative multi-label learning framework that handles long-tail distributions and discovers label correlations automatically. The system uses conditional feature generation to augment sparse training data and learns label dependencies end-to-end without expert-defined relationships. It achieves up to 13% mAP improvement with foundation model-style generalization to novel categories. Applications include vision-language model training, large-scale content tagging, and cross-modal retrieval and understanding.
Denghui Zhang, Junming Liu, Hengshu Zhu, Yanchi Liu, Lichen Wang, Pengyang Wang, Hui Xiong
ACM International Conference on Information and Knowledge Management (CIKM) (Long paper), 2019
We developed Job2Vec, a multi-view graph learning framework for job title matching across companies. The system learns unified embeddings by jointly modeling graph topology, semantic descriptions, transition balance, and transition duration. It achieves 200% improvement in matching accuracy on large-scale career data from major tech and finance companies. Applications include AI-powered talent acquisition, compensation intelligence, career recommendation systems, and workforce analytics.
[PDF]
Lichen Wang, Zhengming Ding, Zhiqiang Tao, Yunyu Liu, Yun Fu
International Conference on Computer Vision (ICCV) (Oral), 2019
We developed GMVAR, a multimodal action recognition framework that works seamlessly with complete, partial, or missing modalities. The system combines conditional generation for cross-modal synthesis with View Correlation Discovery Network (VCDN) for label-space fusion. Unlike feature-level fusion, VCDN captures action-specific distinctiveness across modalities. It achieves up to 5% improvement on 3 RGB-D benchmarks. Applications include multimodal foundation model training, embodied AI, and robust video understanding under sensor degradation.
Can Qin, Lichen Wang, Yulun Zhang, Yun Fu
International Conference on Computer Vision (ICCV) Workshop (Best Paper), 2019
We developed GICT, an unsupervised domain adaptation framework that eliminates the need for expensive real-world data annotation by transferring knowledge from synthetic sources. The system aligns low-level features via CycleGAN generation and matches class-conditional distributions through a dual-classifier co-training strategy. It improves segmentation mIoU by 3% on GTA5-to-Cityscapes. Applications include autonomous vehicle perception, industrial sim-to-real transfer, and annotation-free model adaptation.
Gan Sun, Yang Cong, Lichen Wang, Zhengming Ding, Yun Fu
International Conference on Computer Vision (ICCV) Workshop, 2019
We developed OMTC, an online clustering framework that segments human motion videos in real-time across multiple agents without requiring complete video access. The key innovation is an encoder-decoder architecture that learns transferable motion representations across distributed cameras while preserving temporal structure. It improves accuracy by 8-12% with significantly lower memory footprint. Applications include multi-camera surveillance, healthcare monitoring, and distributed video analytics.
[PDF]
Lichen Wang, Zhengming Ding, Yun Fu
IEEE Transactions on Image Processing (TIP)
We developed a transfer learning framework for human motion segmentation that enables knowledge transfer from labeled to unlabeled video data. The system aligns distributions with domain-invariant projections, preserves temporal structure via graph regularization, and applies low-rank clustering constraints. It achieves 5% accuracy improvement over previous methods. Applications include VideoLLM preprocessing, temporal action localization, embodied AI, and automated video content analysis.
Yulun Zhang, Kunpeng Li, Kai Li, Lichen Wang, Bineng Zhong, Yun Fu
European Conference on Computer Vision (ECCV), 2018 (6000+ citations)
We developed RCAN, a channel attention-based super-resolution network enabling very deep architectures (400+ layers) with stable training. The key innovations are a Residual-in-Residual (RIR) structure with long and short skip connections, and Channel Attention (CA) that adaptively rescales features by modeling channel interdependencies. It has become foundational in image restoration, achieving state-of-the-art PSNR/SSIM with 60% fewer parameters. Applications include image enhancement, medical imaging, surveillance, and content delivery.
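The CA block itself is compact: global average pooling followed by a bottleneck gating MLP that rescales each channel. The sketch below mirrors that published design, though the module organization is our own:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                        # (B, C, 1, 1) global stats
            nn.Conv2d(channels, channels // reduction, 1),  # channel bottleneck
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),                                   # per-channel gate in (0, 1)
        )

    def forward(self, x):            # x: (B, C, H, W)
        return x * self.gate(x)      # rescale channels adaptively

x = torch.randn(1, 64, 48, 48)
print(ChannelAttention(64)(x).shape)   # torch.Size([1, 64, 48, 48])
```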
Lichen Wang, Zhengming Ding, Yun Fu
International Joint Conference on Artificial Intelligence (IJCAI), 2018
We developed AG2E, a graph-based semi-supervised framework for multi-label annotation that learns from both labeled and unlabeled data. The system jointly learns adaptive graph structure and feature embeddings while propagating labels. This design enables robust annotation under noise and label imbalance. It improves precision by 3.5% and recall by 6% across image, audio, and emotion datasets. Applications include automated content tagging, scene attribute recognition, and zero-shot learning.
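The propagation mechanism at the core of such graph-based frameworks is classical; the sketch below fixes the affinity graph for brevity, whereas AG2E learns it adaptively:

```python
import numpy as np

def propagate(W, Y, alpha=0.9, iters=50):
    # W: (n, n) nonnegative affinities; Y: (n, c) labels, zero rows if unlabeled.
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))           # symmetric normalization
    F = Y.copy()
    for _ in range(iters):
        F = alpha * S @ F + (1 - alpha) * Y   # spread labels along the graph
    return F                                  # soft multi-label scores

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
W = np.exp(-np.square(X[:, None] - X[None]).sum(-1))   # RBF affinities
Y = np.zeros((6, 2)); Y[0, 0] = Y[1, 1] = 1.0          # two labeled points
print(propagate(W, Y).round(2))
```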
Lichen Wang, Zhengming Ding, Yun Fu
AAAI Conference on Artificial Intelligence (AAAI), 2018
We developed the first transfer learning framework for human motion segmentation that leverages labeled source data to improve clustering on unlabeled target videos. The system jointly learns domain-invariant projections and temporal graph regularization to align distributions while preserving sequential structure. It achieves 5.3% accuracy and 7.1% NMI improvement over state-of-the-art methods. Applications include video temporal segmentation, action localization, and preprocessing for action recognition.
Lichen Wang, Aimin Zhang, Chujia Guo, Pervez Bhan, Tian Yan
Chinese Control Conference (CCC), 2015
We developed a vision-based optical communication system for real-time multi-target localization and identification. The system uses LED-encoded signals with FFT-based feature extraction and machine learning classification, achieving 1000x speedup over conventional methods. Applications include indoor positioning systems, AGV/robot guidance, autonomous vehicle navigation, and smart manufacturing.
[PDF]
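The decoding step reduces to finding the dominant frequency in each tracked LED's brightness-over-time signal; the frame rate and window length below are assumptions for illustration:

```python
import numpy as np

fs, n = 240.0, 256                   # camera frame rate (Hz), window length
t = np.arange(n) / fs
signal = 0.5 * (1 + np.sign(np.sin(2 * np.pi * 30.0 * t)))   # LED on/off at 30 Hz
signal += 0.1 * np.random.default_rng(0).normal(size=n)      # sensor noise

spectrum = np.abs(np.fft.rfft(signal - signal.mean()))
freqs = np.fft.rfftfreq(n, d=1.0 / fs)
print(f"dominant frequency: {freqs[spectrum.argmax()]:.1f} Hz")   # recovers 30 Hz
```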
Lichen Wang, Aimin Zhang, Chujia Guo, Songyun Zhao, Pervez Bhan
Chinese Control and Decision Conference (CCDC), 2015
We developed a computer vision algorithm for 3D reconstruction of SMT solder joints. The system uses structured LED lighting to capture shadows, extracts profiles via image processing, and reconstructs geometry using a physics-based shape model derived from solder surface tension. It achieves 2x speedup over Shape from Shading methods with single-camera low-cost hardware. Applications include automated optical inspection, smart factory quality control, and electronics manufacturing.
[PDF]
Lichen Wang
Thesis, Doctor of Philosophy, 2021
This dissertation develops algorithms for multimodal correlation discovery and multi-label learning. For multimodal learning, I developed cross-modal fusion networks that discover complementary patterns across RGB and depth modalities for action recognition. For multi-label prediction, I proposed generative frameworks that automatically learn label correlations and address long-tail distributions through conditional feature generation. For graph-structured data, I introduced SEED, an unsupervised framework that learns graph representations through sampling and embedding distributions. These methods advance vision-language systems, multimodal foundation models, content understanding pipelines, and knowledge graph applications.
Lichen Wang
Thesis, Master of Science in Engineering, 2016
I developed an automated optical inspection (AOI) system for PCB quality control using computer vision and machine learning. The system integrates AdaBoost-based component localization, SVM classification for defect detection, and a novel shadow-based 3D solder joint reconstruction. It achieves pixel-level positioning accuracy and real-time performance, outperforming Shape-from-Shading on reflective surfaces. Applications include smart manufacturing, edge-deployed quality inspection, and electronics production automation.
Lichen Wang
Thesis, Bachelor of Engineering, 2013
I developed an automated optical inspection system for pharmaceutical manufacturing that detects foreign particles in intravenous bottles. The system integrates dual-source illumination, motion-based segmentation, and blob analysis to achieve 50-micron detection sensitivity at 60+ bottles/minute throughput. It was deployed in real manufacturing environments and resulted in a granted patent. Applications include smart manufacturing, edge-deployed quality inspection, and pharmaceutical production automation.