FashionNTM

FashionNTM: Multi-turn Fashion Image Retrieval via Cascaded Memory

Accepted at ICCV 2023

Anwesan Pal, Sahil Wadhwa, Ayush Jaiswal, Xu Zhang, Yue Wu, Rakesh Chada, Pradeep Natarajan and Henrik I. Christensen

*UC San Diego, **Amazon Alexa Natural Understanding

Abstract

Multi-turn textual feedback-based fashion image retrieval focuses on a real-world setting in which users iteratively provide information to refine retrieval results until they find an item that fits all their requirements. In this work, we present a novel memory-based method, called FashionNTM, for such a multi-turn system. Our framework incorporates a new Cascaded Memory Neural Turing Machine (CM-NTM) approach for implicit state management, thereby learning to integrate information across all past turns to retrieve new images for a given turn. Unlike the vanilla Neural Turing Machine (NTM), our CM-NTM operates on multiple inputs, which interact with their respective memories via individual read and write heads, to learn complex relationships. Extensive evaluation shows that our proposed method outperforms the previous state-of-the-art algorithm by 50.5% on Multi-turn FashionIQ, currently the only existing multi-turn fashion dataset, and achieves a relative improvement of 12.6% on Multi-turn Shoes, an extension of the single-turn Shoes dataset that we created in this work. We further analyze the model's performance in a real-world setting, where users actively interacted with our system to retrieve their desired image across multiple turns. Finally, the results of a human preference study further validate that FashionNTM largely outperforms previous works.
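To make the memory mechanism in the abstract more concrete, here is a minimal sketch of NTM-style content addressing, reading, and erase/add writing, applied to several inputs that each own a separate memory. It is not the FashionNTM implementation. The function names, the fixed erase vector of 0.5, the use of the input itself as the key, and in particular the cascade wiring (each stage's read vector is added to the next stage's input) are assumptions made for illustration; in an actual NTM the keys, erase vectors, and add vectors are produced by a learned controller.

```python
import math


def content_weights(memory, key, beta=5.0):
    """NTM content addressing: softmax over beta-scaled cosine similarity."""
    def norm(v):
        return math.sqrt(sum(a * a for a in v))

    sims = [
        sum(m * k for m, k in zip(row, key)) / (norm(row) * norm(key) + 1e-8)
        for row in memory
    ]
    top = max(sims)  # subtract the maximum for numerical stability
    exps = [math.exp(beta * (s - top)) for s in sims]
    total = sum(exps)
    return [e / total for e in exps]


def read(memory, w):
    """Read head: weighted sum of the memory rows."""
    width = len(memory[0])
    return [sum(w[i] * memory[i][j] for i in range(len(memory))) for j in range(width)]


def write(memory, w, erase, add):
    """Write head: erase then add, each scaled by the row's attention weight."""
    return [
        [m * (1.0 - wi * e) + wi * a for m, e, a in zip(row, erase, add)]
        for wi, row in zip(w, memory)
    ]


def cascaded_step(memories, inputs, beta=5.0):
    """One turn. Each input reads from and writes to its own memory.

    Assumption (illustrative only): the read vector of one stage is added
    to the input of the next stage, forming the cascade.
    """
    carry = [0.0] * len(inputs[0])
    updated = []
    for memory, x in zip(memories, inputs):
        query = [a + c for a, c in zip(x, carry)]
        w = content_weights(memory, query, beta)
        carry = read(memory, w)  # read before writing
        updated.append(write(memory, w, [0.5] * len(x), query))
    return updated, carry


# Two memories (3 slots of width 3), updated over two turns of hypothetical inputs.
memories = [[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]] for _ in range(2)]
for turn_inputs in ([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [[0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]):
    memories, retrieved = cascaded_step(memories, turn_inputs)
```

Because the memories persist between calls, the vector returned at a later turn depends on what earlier turns wrote, which is the sense in which state is managed implicitly rather than by concatenating past feedback.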

Paper arXiv Supplementary

The complete FashionNTM framework

Quantitative results on Multi-turn FashionIQ

Quantitative results on Multi-turn Shoes

Cite as:

@inproceedings{pal2023fashionntm,
  author={A. Pal and S. Wadhwa and A. Jaiswal and X. Zhang and Y. Wu and R. Chada and P. Natarajan and H. I. Christensen},
  booktitle={2023 International Conference on Computer Vision (ICCV)},
  title={FashionNTM: Multi-turn Fashion Image Retrieval via Cascaded Memory},
  year={2023}
}
