Database-adaptive Re-ranking
for Enhancing Cross-modal Image Retrieval
ACM Multimedia 2021
Rintaro Yanagi, Ren Togo, Takahiro Ogawa and Miki Haseyama
Hokkaido University
Concept of the proposed retrieval method. Even with an ambiguous text query and a biased DB, a user can reach a single desired image simply by answering questions.
In this paper, we propose an approach that enhances the performance of arbitrary existing cross-modal image retrieval methods. Most cross-modal image retrieval methods focus on accurately computing direct similarities between a text query and candidate images. However, their retrieval performance is degraded by the ambiguity of text queries and the bias of target databases (DBs). Handling ambiguous text queries and biased DBs is essential for accurate cross-modal image retrieval in real-world applications. We propose a DB-adaptive re-ranking method using modality-driven spaces, which can extend arbitrary cross-modal image retrieval methods to enhance their performance. The proposed method comprises two approaches: ``DB-adaptive re-ranking'' and ``modality-driven clue information extraction''. Our method estimates clue information that can effectively clarify the desired image from the whole target DB, and then receives user feedback on the estimated information. Furthermore, by focusing on modality-driven spaces, our method extracts more detailed information from the query text and the target DB, enabling more accurate re-ranking. As a result, users can reach a single desired image simply by answering questions. Experimental results on MSCOCO, Visual Genome, and newly introduced datasets containing images with a particular object show that the proposed method can enhance the performance of state-of-the-art cross-modal image retrieval methods.
Overview of the proposed method. First, the proposed method obtains initial retrieval results by computing similarities between a text query and candidate images using modality-driven spaces. Then, in each representation space, it estimates clue information that can effectively re-rank the initial retrieval results. Finally, it receives user feedback on the estimated clue information and re-ranks the initial retrieval results.
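The three-step pipeline above (initial ranking, clue estimation, feedback-based re-ranking) can be illustrated with a minimal sketch. This is not the authors' implementation: the embeddings, the single yes/no clue attribute, and the re-ranking rule are hypothetical toy stand-ins chosen only to show the control flow.

```python
# Illustrative sketch (NOT the paper's implementation) of an
# interactive re-ranking loop: rank by query-image similarity,
# then promote images consistent with the user's answer to a
# question about an estimated clue.
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

def initial_ranking(query_vec, db_vecs):
    """Step 1: rank DB images by similarity to the text query."""
    scores = {img_id: cosine(query_vec, v) for img_id, v in db_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)

def rerank_with_feedback(ranking, clue_labels, answer):
    """Step 3: move images consistent with the user's yes/no answer
    about the estimated clue to the front, keeping order stable."""
    consistent = [i for i in ranking if clue_labels[i] == answer]
    rest = [i for i in ranking if clue_labels[i] != answer]
    return consistent + rest

# Toy DB: three candidate images with 2-D embeddings and a
# hypothetical binary clue attribute estimated in step 2.
db_vecs = {"img_a": [1.0, 0.0], "img_b": [0.8, 0.6], "img_c": [0.0, 1.0]}
clue_labels = {"img_a": False, "img_b": True, "img_c": True}

query_vec = [1.0, 0.1]  # embedding of an ambiguous text query
ranking = initial_ranking(query_vec, db_vecs)
reranked = rerank_with_feedback(ranking, clue_labels, answer=True)
print(ranking)   # ['img_a', 'img_b', 'img_c']
print(reranked)  # ['img_b', 'img_c', 'img_a']
```

In the actual method, the clue information is estimated per modality-driven representation space rather than given as a fixed label, and the combination of feedback with the initial similarities is learned, not a hard partition as here.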
Retrieval results of our method for the biased MSCOCO dataset. The image surrounded by the orange frame indicates the ground truth paired with the query text in the biased MSCOCO dataset.
・R. Yanagi, R. Togo, T. Ogawa and M. Haseyama, “Database-adaptive re-ranking for enhancing cross-modal image retrieval,” ACM International Conference on Multimedia (ACM MM), 2021. (Accepted)
・R. Yanagi, R. Togo, T. Ogawa and M. Haseyama, “IR Questioner: QA-based interactive retrieval system,” ACM International Conference on Multimedia Retrieval (ICMR), 2021. (Accepted)
・R. Yanagi, R. Togo, T. Ogawa and M. Haseyama, “Interactive re-ranking for cross-modal retrieval based on object-wise question answering,” ACM International Conference on Multimedia in Asia (ACM MM Asia), 2020. (Accepted, Best Paper Runner-up Award)
Presentation Video (Pass: 0ZhjBfPq, ACM MM Asia 2020 official password)
@inproceedings{yanagi2021DB,
title={Database-adaptive Re-ranking for Enhancing Cross-modal Image Retrieval},
author={Yanagi, Rintaro and Togo, Ren and Ogawa, Takahiro and Haseyama, Miki},
booktitle={ACM MM},
year={2021}
}
@inproceedings{yanagi2021ir,
title={IR Questioner: QA-based interactive retrieval system},
author={Yanagi, Rintaro and Togo, Ren and Ogawa, Takahiro and Haseyama, Miki},
booktitle={ACM ICMR},
year={2021}
}
@inproceedings{yanagi2020interactive,
title={Interactive re-ranking for cross-modal retrieval based on object-wise question answering},
author={Yanagi, Rintaro and Togo, Ren and Ogawa, Takahiro and Haseyama, Miki},
booktitle={ACM MM Asia},
year={2020}
}