December 10, 2021 (all times in GMT)
Join the breakout session for the Image Similarity Challenge 2021 using the zoom link found here! Note: you must be registered for NeurIPS 2021 workshops to attend.
Update December 13, 2021: Please find here the recording of the entire breakout session. Links to slides, GitHub code, & Arxiv for individual talks can be found in the respective expandable section below. Thank you to everyone who participated in or helped organize this competition, congratulations again to our winners, and to everyone else who attended or is watching later, thanks for your interest!
Update June 27, 2022: The dataset for the Image Similarity Challenge 2021 (DISC21) can be downloaded from the Meta AI website here.
10:05-25am: Plenary presentation (recorded)
10:25-35am: Welcome - Zoe Papakipos & Ondrej Chum
Zoe Papakipos has been a Research Engineer at FAIR Paris since 2019. She has a Bachelor's degree in Computer Science from Brown University, where she did research with Michael Littman on RL for self-driving cars. She currently works on data augmentations and copy detection, and her research interests include adversarial robustness and AI fairness.
Ondrej Chum leads a team within the Visual Recognition Group at the Department of Cybernetics, Faculty of Electrical Engineering, Czech Technical University in Prague. He received his MSc degree in computer science from Charles University, Prague, in 2001 and his PhD degree from the Czech Technical University in Prague in 2005. From 2006 to 2007, Ondrej was a postdoctoral researcher at the Visual Geometry Group, University of Oxford, United Kingdom. His research interests include large-scale image and particular-object retrieval, object recognition, and robust estimation of geometric models. He is a member of the Image and Vision Computing editorial board, and he has served as an area chair at major international conferences (e.g., ICCV, ECCV, CVPR, WACV and BMVC). Ondrej co-organizes the Computer Vision and Sports Summer School in Prague. He was the recipient of the Best Paper Prize at BMVC 2002, the Best Science Paper Honorable Mention at BMVC 2017, and the Saburo Tsuji Best Paper Award at ACCV 2018. He was the runner-up for the 2012 Outstanding Young Researcher in Image & Vision Computing award for researchers within seven years of their PhD.
10:35-11:10am: Keynote - Serge Belongie - Representation Learning for Narratives in Social Media
Abstract: While advances in automated fact-checking are critical in the fight against the spread of misinformation in social media, we argue that more attention is needed in the domain of unfalsifiable claims. In this talk, we outline some promising directions for identifying the prevailing narratives in shared content (image & text) and explore how the associated learned representations can be used to identify misinformation campaigns and sources of polarization.
Serge Belongie is a professor of Computer Science at the University of Copenhagen, where he also serves as the head of the Pioneer Centre for Artificial Intelligence. Previously, he was a professor of Computer Science at Cornell University, an Associate Dean at Cornell Tech, and a member of the Visiting Faculty program at Google. His research interests include Computer Vision, Machine Learning, Augmented Reality, and Human-in-the-Loop Computing. He is also a co-founder of several companies including Digital Persona, Anchovi Labs, and Orpix. He is a recipient of the NSF CAREER Award, the Alfred P. Sloan Research Fellowship, the MIT Technology Review “Innovators Under 35” Award, and the Helmholtz Prize for fundamental contributions in Computer Vision.
Slides here.
11:10-40am: Matthijs Douze - Description of the competition & main findings
Matthijs Douze has been a research scientist at the Facebook AI Research (FAIR) lab in Paris since November 2015. At Facebook he works on large-scale indexing (see the Faiss library), machine learning with graphs, and similarity search on images and videos. He obtained a master's degree from the ENSEEIHT engineering school and defended his PhD at the University of Toulouse in 2004. From 2005 to 2015, Matthijs was a member of the LEAR team at INRIA Grenoble, where he worked on a variety of topics, including image indexing, large-scale vector indexing, event recognition in videos, and similar-video search. Between 2010 and 2015 he also managed Kinovis, a large 3D motion-capture studio at INRIA, and developed high-performance geometric algorithms for constructive solid geometry operations. In addition to FAIR's general research topics, Matthijs is interested in snappy algorithms that process images and produce graphical results.
11:45am-12pm: Descriptor track 1st place - Shuhei Yokoo - Contrastive Learning with Large Memory-Bank and Negative Embedding Subtraction for Accurate Copy Detection
Abstract: We tackled copy detection by training CNNs with contrastive learning. Training with a large memory bank and hard data augmentation enables the CNNs to learn more discriminative representations. Our proposed negative embedding subtraction further improves copy detection accuracy by a large margin.
GitHub code: https://github.com/lyakaap/fbisc
Arxiv paper: http://arxiv.org/abs/2112.04323
Slides here.
Shuhei Yokoo works at DeNA in Japan as a data scientist and is a Kaggle Grandmaster. His interest area is computer vision.
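The negative embedding subtraction described in the abstract can be sketched roughly as follows: each descriptor is pushed away from its hardest "negative" (training-set) embeddings and re-normalized. The function name, the iteration count, and the `beta`/`k` values below are illustrative assumptions, not the authors' exact formulation (see their paper and code for that).

```python
import numpy as np

def negative_embedding_subtraction(desc, negatives, k=10, beta=0.35, n_iter=3):
    """Illustrative sketch: for each descriptor, subtract components along
    its k most similar "negative" (training-set) embeddings, weighted by
    similarity, then re-normalize to unit length."""
    desc = desc / np.linalg.norm(desc, axis=1, keepdims=True)
    negatives = negatives / np.linalg.norm(negatives, axis=1, keepdims=True)
    for _ in range(n_iter):
        sims = desc @ negatives.T                    # cosine similarities (n_desc, n_neg)
        top = np.argsort(-sims, axis=1)[:, :k]       # k hardest negatives per descriptor
        for i in range(len(desc)):
            hard = negatives[top[i]]                 # (k, d)
            desc[i] -= beta * (sims[i, top[i]] @ hard) / k
        desc /= np.linalg.norm(desc, axis=1, keepdims=True)
    return desc
```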
12-12:15pm: Descriptor track 2nd place - Sanjay Addicam, Sergio Manuel Papadakis - Producing augmentation-invariant embeddings from real-life imagery
Abstract: This talk presents an efficient way to produce feature-rich, high-dimensional embedding spaces from real-life images. The features are designed to be invariant to the augmentations that appear in real-life social media use. Our approach uses a convolutional neural network (CNN) to produce an embedding space. An ArcFace head was used to train the model on automatically produced augmentations. Additionally, we present a way to ensemble different embeddings containing the same semantic information, a way to normalize the resulting embedding using an external dataset, and a novel way to quickly train these models with a high number of classes in the ArcFace head. Using this approach we achieved 2nd place in the 2021 Facebook AI Image Similarity Challenge: Descriptor Track.
GitHub code: https://github.com/socom20/facebook-image-similarity-challenge-2021
Arxiv paper: https://arxiv.org/abs/2112.03415
Slides here.
Sanjay Addicam is a Sr. Principal Engineer and CTO of an Intel group focused on retail, banking, hospitality, and education. He focuses on deploying autonomous deep learning solutions at the edge and on performing training at the edge. Sanjay is a Kaggle Competitions Master with a best rank of 23.
Sergio Manuel Papadakis trained as a nuclear engineer. His curiosity led him to the field of Artificial Intelligence, which he began studying in a self-taught way in 2015. In 2019 he formalized what he had learned by obtaining a master's degree in Artificial Intelligence, and at the end of 2020 he decided to participate in computer vision competitions. Sergio has won several competitions on Kaggle, where he holds the title of Competitions Master with a best worldwide rank of 39.
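The ArcFace head mentioned in the abstract adds an angular margin to the true-class logit before scaling, pulling same-class embeddings tighter together. Below is a minimal numpy sketch of the logit computation only; the `margin` and `scale` values are illustrative assumptions, and in the actual solution the head is trained end-to-end inside a network.

```python
import numpy as np

def arcface_logits(embeddings, weights, labels, margin=0.5, scale=30.0):
    """Illustrative ArcFace-style logits: cosine similarities between
    L2-normalized embeddings and class weights, with an additive angular
    margin applied to each sample's true-class angle, then scaled."""
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = np.clip(e @ w.T, -1.0, 1.0)          # (batch, n_classes)
    theta = np.arccos(cos)                     # angles in [0, pi]
    rows = np.arange(len(labels))
    theta[rows, labels] += margin              # widen the true-class angle
    return scale * np.cos(theta)
```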
12:15-30pm: Descriptor track 3rd place - Wenhao Wang, Yifan Sun, Weipu Zhang, Yi Yang - Bag of Tricks and a Strong Baseline for Image Copy Detection
Abstract: Image copy detection is of great importance in real-life social media. In this paper, a bag of tricks and a strong baseline are proposed for image copy detection. Unsupervised pre-training replaces the commonly used supervised pre-training. Beyond that, we design a descriptor stretching strategy to stabilize the scores of different queries. Experiments demonstrate that the proposed method is effective. The proposed baseline ranks third out of 526 participants in the Facebook AI Image Similarity Challenge: Descriptor Track.
GitHub code: https://github.com/WangWenhao0716/ISC-Track2-Submission
Arxiv paper: https://arxiv.org/abs/2111.08004
Slides here.
Wenhao Wang is a Research Intern at Baidu Research, co-supervised by Yifan Sun and Yi Yang. His research interests are deep metric learning and computer vision. Prior to Baidu, he was a remote research intern at the Inception Institute of Artificial Intelligence from 2020 to 2021. He received his bachelor's degree in 2021 and will join the ReLER Lab to pursue a Ph.D. degree in the near future.
Yifan Sun is currently a Senior Expert at Baidu Research. His research interests focus on deep metric learning, long-tailed visual recognition, domain adaptation, and generalization. He has publications at many top-tier conferences and journals, including CVPR, ICCV, ECCV, NeurIPS, and TPAMI. His papers have received over 3,000 citations, and some of his research has been applied in real-world AI products.
Weipu Zhang is an undergraduate student majoring in EECS and a research intern at Baidu Research. He has studied and practiced classical algorithms and data structures for about three years. He is currently interested in data science and machine learning, especially computer vision and game agents.
Yi Yang is a Professor with the College of Computer Science and Technology, Zhejiang University. He has authored over 200 papers in top-tier journals and conferences, which have received over 32,000 citations, giving him an H-index of 92. He has received more than 10 international awards in the field of artificial intelligence, including the National Excellent Doctoral Dissertation Award, the Zhejiang Provincial Science Award First Prize, the Australian Research Council Discovery Early Career Researcher Award, the Australian Computer Society Gold Digital Disruptor Award, the Google Faculty Research Award, and the AWS Machine Learning Research Award.
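The score stabilization that motivates the descriptor stretching in the abstract above can be illustrated with a generic similarity-normalization sketch: each query's raw similarities are shifted by its mean similarity to its nearest external "training" descriptors, so scores become comparable across queries. All names and parameter values here are illustrative assumptions, not the authors' exact stretching formulation.

```python
import numpy as np

def normalize_scores(query_desc, ref_desc, train_desc, k=10, alpha=1.0):
    """Illustrative per-query score normalization: subtract each query's
    mean similarity to its k most similar external (training) descriptors
    from its query-reference similarities."""
    raw = query_desc @ ref_desc.T                  # (n_query, n_ref)
    bg = query_desc @ train_desc.T                 # similarities to external negatives
    topk = -np.sort(-bg, axis=1)[:, :k]            # k highest per query
    bias = topk.mean(axis=1, keepdims=True)        # per-query score bias
    return raw - alpha * bias
```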
12:30-45pm: Matching track 1st place - Wenhao Wang, Yifan Sun, Weipu Zhang, Yi Yang - D2LV: A Data-Driven and Local-Verification Approach for Image Copy Detection
Abstract: Image copy detection is of great importance in real-life social media. In this paper, a data-driven and local-verification (D2LV) approach is proposed to compete in the Image Similarity Challenge: Matching Track at NeurIPS'21. In D2LV, unsupervised pre-training replaces the commonly used supervised pre-training. For training, we design a set of basic and six advanced transformations, and a simple but effective baseline learns robust representations. During testing, a global-local and local-global matching strategy is proposed, which performs local verification between reference and query images. Experiments demonstrate that the proposed method is effective. The proposed approach ranks first out of 1,103 participants in the Facebook AI Image Similarity Challenge: Matching Track.
GitHub code: https://github.com/WangWenhao0716/ISC-Track1-Submission
Arxiv paper: https://arxiv.org/abs/2111.07090
Slides here.
For photos & bios of speakers, see Descriptor track 3rd place just above.
12:45-1pm: Matching track 2nd place - SeungKee Jeon - 2nd Place Solution to Facebook AI Image Similarity Challenge: Matching Track
Abstract: This presentation demonstrates the 2nd place solution to the Facebook AI Image Similarity Challenge: Matching Track on DrivenData. The solution is based on self-supervised learning and the Vision Transformer (ViT). The main breakthrough comes from concatenating the query and reference images to form one image and asking the ViT to predict directly from that image whether the query was derived from the reference. This result shows that the ViT's global attention is strong enough to learn relations between patches, even when the patches come from two different images, query and reference.
GitHub code: https://github.com/seungkee/2nd-place-solution-to-Facebook-Image-Similarity-Matching-Track
Arxiv paper: https://arxiv.org/abs/2111.09113
Slides here.
SeungKee Jeon has a Bachelor's degree in Computer Science from Sungkyunkwan University and is a Kaggle Competition Grandmaster. He has been a Deep Learning Engineer at Samsung Electronics since 2017.
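The pairing idea in the abstract above, concatenating query and reference into a single image for the ViT to classify as match / no-match, can be sketched as a simple preprocessing step. The nearest-neighbor resize and the 224-pixel input size are illustrative assumptions; the actual pipeline's preprocessing may differ.

```python
import numpy as np

def make_pair_image(query, reference, size=224):
    """Illustrative sketch: resize each image to half-width and concatenate
    them side by side into one (size, size, 3) array. A ViT binary
    classifier would then predict match / no-match from this single input."""
    def resize(img, h, w):
        # plain nearest-neighbor resize for illustration
        ys = (np.arange(h) * img.shape[0] / h).astype(int)
        xs = (np.arange(w) * img.shape[1] / w).astype(int)
        return img[ys][:, xs]
    left = resize(query, size, size // 2)
    right = resize(reference, size, size // 2)
    return np.concatenate([left, right], axis=1)
```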
1-1:15pm: Matching track 3rd place - Xinlong Sun, Yangyang Qin - A Global and Local Dual Retrieval Solution to Facebook AI Image Similarity Challenge
Abstract: As a basic task of computer vision, image similarity retrieval faces the challenges of large-scale data and image copy attacks. This paper presents our 3rd place solution to the matching track of the Image Similarity Challenge (ISC) 2021 organized by Facebook AI. We propose a multi-branch retrieval method that combines global descriptors and local descriptors to cover all attack cases. Specifically, we employ several strategies to optimize the global descriptors, including abundant data augmentation, self-supervised learning with a single Transformer model, and overlay-detection preprocessing. Moreover, we introduce the robust SIFT feature and GPU Faiss for local retrieval, which makes up for the shortcomings of the global retrieval. Finally, a KNN-matching algorithm is used to judge matches and merge scores. We present ablation experiments that reveal the complementary advantages of global and local features.
GitHub code: https://github.com/sun-xl/ISC2021
Arxiv paper: https://arxiv.org/abs/2112.02373
Slides here.
Xinlong Sun currently works for Tencent in Shenzhen, China. His recent research interests include image retrieval and video retrieval.
Yangyang Qin currently works for Tencent in Shenzhen, China. His recent research interests include image retrieval and object detection.
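The final merging step in the abstract above, fusing global and local retrieval results before KNN matching, can be sketched with a weighted sum of the two similarity matrices. The weights and `k` below are illustrative assumptions; at challenge scale, the brute-force similarities would come from a (GPU) Faiss index rather than dense matrices.

```python
import numpy as np

def dual_retrieval(global_sim, local_sim, w_global=0.7, w_local=0.3, k=5):
    """Illustrative fusion of a global-descriptor and a local-descriptor
    similarity matrix (both n_query x n_ref): weighted sum, then keep each
    query's top-k reference matches with their fused scores."""
    fused = w_global * global_sim + w_local * local_sim
    order = np.argsort(-fused, axis=1)[:, :k]          # top-k refs per query
    scores = np.take_along_axis(fused, order, axis=1)
    return order, scores
```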
1:15-30pm: Descriptor track 1st place closed-source - titanshield2 - Feature compatible self-supervised learning for large-scale image similarity detection
Abstract: This talk introduces our approach, named Feature cOmpatible Self-Supervised Learning (FOSSL), to this image similarity challenge. The challenge aims to learn good representations of images in order to perform retrieval over original and manipulated images. The key is to extract informative image features from unlabelled data that have been edited by hand-crafted Photoshop manipulations or automated transformations. In FOSSL, we adopt recent popular and effective self-supervised learning techniques and use a multi-task self-supervised learning framework to learn representation models. The framework contains a self-distillation branch and an image-restoring branch, which together learn features invariant to various transformations. Moreover, FOSSL ensembles features obtained from different models to enhance performance. Specifically, we design an embedding-compatibility technique to solve the feature-compatibility issue resulting from multiple models: it "fixes" the reference image features so that query image features from different models become comparable, and we fuse the query features to enhance the image representations. With these two methods, FOSSL extracts good representations from images and achieved top performance in this challenge.
Slides here.
The titanshield2 team is from ZOLOZ. All members of the team are computer vision engineers working on algorithms for image representation, image retrieval, face recognition, and related areas.
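Since this solution is closed-source, the following is only a generic illustration of the embedding-compatibility idea from the abstract: keep the reference features from one fixed model, and learn a linear map that projects another model's features into that fixed space (here by least squares over a set of shared images). This is a stand-in sketch under stated assumptions, not the team's actual technique.

```python
import numpy as np

def fit_compatibility_map(feats_new, feats_ref):
    """Illustrative embedding-compatibility step: given features of the
    same images from a new model (n, d_new) and the fixed reference model
    (n, d_ref), fit a linear map W minimizing ||feats_new @ W - feats_ref||.
    Apply it as feats_new @ W to make new-model features comparable with
    the fixed reference features."""
    W, *_ = np.linalg.lstsq(feats_new, feats_ref, rcond=None)
    return W
```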
1:30-50pm: Discussion about the challenge & future plans