Confidence-aware Pseudo-label Learning for Weakly Supervised Visual Grounding

Yang Liu1,2*, Jiahua Zhang1, Qingchao Chen3, Yuxin Peng1

1 Wangxuan Institute of Computer Technology, Peking University

2 National Key Laboratory of General Artificial Intelligence, BIGAI

3 National Institute of Health Data Science, Peking University

Abstract

Visual grounding aims to localize the object in an image that is most relevant to a given free-form natural language query. Since labeling the position of the target object is labor-intensive, weakly supervised methods, which require only image-sentence annotations during training, have recently received increasing attention. Most existing weakly supervised methods first generate region proposals with a pre-trained object detector and then select a proposal using either a cross-modal similarity score or a reconstruction loss. However, due to the cross-modal heterogeneity gap, these methods often suffer from high-confidence spurious associations and are prone to error propagation. In this paper, we propose Confidence-aware Pseudo-label Learning (CPL) to overcome these limitations. Specifically, we first adopt both uni-modal and cross-modal pre-trained models and propose conditional prompt engineering to automatically generate multiple 'descriptive, realistic and diverse' pseudo language queries for each region proposal; reliable cross-modal associations for model training are then established based on the uni-modal similarity score between the pseudo and real text queries. Secondly, we propose a confidence-aware pseudo-label verification module that reduces the noise encountered during training and the risk of error propagation. Experiments on five widely used datasets validate the efficacy of the proposed components and demonstrate state-of-the-art performance.
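To make the selection and verification steps concrete, below is a minimal sketch of how a proposal could be chosen from the uni-modal similarity between pseudo and real text queries, with a confidence check before the pseudo label is kept. The function names, the cosine-similarity measure, the max aggregation over pseudo queries, and the `conf_threshold` value are illustrative assumptions, not the authors' released implementation.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between row vectors: (N, D) x (M, D) -> (N, M)."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

def select_pseudo_label(real_query_emb, pseudo_query_embs, conf_threshold=0.6):
    """
    real_query_emb:    (D,)      text embedding of the real query (uni-modal text encoder)
    pseudo_query_embs: (P, K, D) embeddings of K generated pseudo queries for each of P proposals
    Returns (proposal index or None, confidence score).
    conf_threshold is a hypothetical cut-off for illustration only.
    """
    P, K, D = pseudo_query_embs.shape
    # Similarity of every pseudo query to the real query, reshaped back per proposal.
    sims = cosine_sim(pseudo_query_embs.reshape(P * K, D),
                      real_query_emb[None, :]).reshape(P, K)
    proposal_scores = sims.max(axis=1)      # best-matching pseudo query per proposal
    best = int(proposal_scores.argmax())    # candidate pseudo label
    confidence = float(proposal_scores[best])
    # Confidence-aware verification: drop low-confidence pseudo labels rather than
    # propagating a potentially spurious association into training.
    return (best, confidence) if confidence >= conf_threshold else (None, confidence)

# Toy usage with random vectors standing in for encoder outputs.
rng = np.random.default_rng(0)
real_q = rng.normal(size=64)
pseudo_q = rng.normal(size=(5, 3, 64))      # 5 proposals, 3 pseudo queries each
print(select_pseudo_label(real_q, pseudo_q))
```

In this sketch, selection relies only on text-to-text similarity, which reflects the paper's idea of sidestepping the cross-modal gap, while the threshold check stands in for the confidence-aware verification module; the actual CPL design may differ in how scores are aggregated and verified.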

Publications

Confidence-aware Pseudo-label Learning for Weakly Supervised Visual Grounding.

Yang Liu, Jiahua Zhang, Qingchao Chen, Yuxin Peng

ICCV, 2023

PDF  | Code | Bibtex