Challenge
Welcome to the Visual Anomaly and Novelty Detection (VAND) 2023 Challenge! This year our challenge aims to bring visual anomaly detection closer to industrial visual inspection, which has wide real-world applications. We look forward to participation from both academia and industry.
For industrial visual inspection, most previous methods focus on training a specific model for each category, given a large number of normal images as reference. However, in real-world scenarios there are millions of industrial products, and it is not cost-effective to collect a large training set for each object and deploy a different model for each category. In fact, building cold-start models, i.e. models trained with zero or only a few normal images, is essential in many cases, as defects are rare and vary widely.
Building a single model that can be rapidly adapted to numerous categories without or with only a handful of normal reference images is an ideal solution and an open challenge to the community. To encourage the research in this direction, we propose two relevant tracks:
Track 1: Zero-shot Anomaly Detection (Anomaly Classification + Segmentation)
Track 2: Few-shot Anomaly Detection (Anomaly Classification + Segmentation)
Note that in both tracks, no training examples of defects will be provided. The challenge has two phases, each with a different test dataset. The first phase aims to kick-start research and development for the given tasks with public datasets. The second phase will release a new test set, and winners will be announced according to the results in Phase 2.
Feasibility Study and Clarification on Zero-shot Track
To verify the feasibility of zero-shot anomaly detection, we have conducted a pilot study in WinCLIP: Zero-/Few-Shot Anomaly Classification and Segmentation, CVPR 2023. With textual prompt engineering, OpenCLIP pre-trained on LAION-400M yields promising language-guided zero-shot performance for visual inspection on the MVTec-AD dataset, e.g. 91.8% AUROC for classification and 85.1% pAUROC for segmentation without any fine-tuning. This indicates that OpenCLIP has learned image-text alignment for concepts in visual inspection, providing a powerful pre-trained representation for the challenging and practical zero-shot anomaly detection task.
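As a rough illustration of this language-guided zero-shot setup, the sketch below scores an image by comparing its embedding against "normal" and "damaged" text prompts with OpenCLIP. This is a minimal sketch, not the WinCLIP implementation; the model name, prompts, file path, and scoring rule are illustrative assumptions.

```python
import torch
import open_clip
from PIL import Image

# Minimal sketch of language-guided zero-shot anomaly classification with
# OpenCLIP; not the official WinCLIP code. Model name, prompts, and the
# image path are illustrative assumptions.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-16-plus-240", pretrained="laion400m_e32")
tokenizer = open_clip.get_tokenizer("ViT-B-16-plus-240")

object_name = "transistor"  # hypothetical subset name
prompts = [f"a photo of a normal {object_name}",
           f"a photo of a damaged {object_name}"]

image = preprocess(Image.open("sample.png")).unsqueeze(0)
text = tokenizer(prompts)

with torch.no_grad():
    img_feat = model.encode_image(image)
    txt_feat = model.encode_text(text)
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
    # Softmax over the two prompts; the probability assigned to the
    # "damaged" prompt serves as the image-level anomaly score.
    anomaly_score = (100.0 * img_feat @ txt_feat.T).softmax(dim=-1)[0, 1].item()
```

WinCLIP goes well beyond this single-prompt comparison (e.g. prompt ensembles and window-based aggregation for segmentation), but the sketch conveys why a strong image-text alignment is enough to produce non-trivial zero-shot anomaly scores.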
In the zero-shot anomaly detection track, any public pre-trained model may be used, and pre-training/fine-tuning on external datasets that do not contain the test data (e.g. any MVTec-AD data) is allowed. However, when evaluating the model on the test set, only textual descriptions of normality and anomaly (e.g. defect types) may be used for in-context learning. You cannot use any in-task images for model pre-training. Note that the scale of the pre-training data for a public pre-trained model is not limited. If you use private data or a subset of public data to pre-train or fine-tune a model, the data is capped at 50 million samples (for ease of reproduction), and you must publish the data if you want to be considered as one of the top two winners per track.
Timeline
Phase 1: March 13th - April 14th (UTC)
Phase 2: April 17th - May 12th (UTC)
Code/report Deadline: May 29th
Winner Announcement: June 6th
Winner Presentation Date: June 18th (Workshop Date)
Challenge Results
Track 1: Zero-shot Anomaly Detection
Winner: April-GAN
Xuhai Chen, Yue Han (Zhejiang University), Jiangning Zhang (Youtu Lab, Tencent)
Runner-up: SegmentAnyAnomaly
Yunkang Cao, Xiaohao Xu, Chen Sun, Yuqi Cheng, Liang Gao, Weiming Shen (Huazhong University of Science and Technology)
Honorable mentions (Invited to poster session):
Variance Vigilance Vanguard
Matthew Baugh, James Batten (Imperial College London), Johanna P. Müller (Friedrich-Alexander-Universität Erlangen-Nürnberg)
MediaBrain
Chaoqin Huang, Aofan Jiang, Ya Zhang, Yanfeng Wang (Shanghai Jiao Tong University)
Track 2: Few-shot Anomaly Detection
Winner: Scortex
Oliver Rippel, João Santos, Triet Tran, Nicolas Morisson, Yann Chéné, Hugues Poiget (Scortex)
Runner-up: MediaBrain
Chaoqin Huang, Aofan Jiang, Ya Zhang, Yanfeng Wang (Shanghai Jiao Tong University)
Honorable mentions (Invited to poster session):
PatchCore+
Stefan Wezel, Karsten Roth, Zeynep Akata (Tübingen University)
April-GAN
Xuhai Chen, Yue Han (Zhejiang University), Jiangning Zhang (Youtu Lab, Tencent)
Rules and Requirements
Participants from both academic and industrial institutions are welcome to participate in this challenge. If you work as a team, you must provide the list of team members during registration, and you cannot change the list of team members after registration. Each individual can participate in only one team and should provide their institution/corporate email and a phone number during registration.
Each of the top two winning teams must publish the full code, along with a complete list of instructions and scripts, that shows how to reproduce their results, ideally with the exact random seeds used to get the best result. If the organizing committee for this competition determines that 1) a winning team's submitted code runs with errors or does not yield results comparable to those in the final leaderboard, or 2) the winning team is not willing to cooperate with the committee in reproducing results, then such team may be disqualified, and the next highest team in the leaderboard may be selected as a top two winning team per track.
For this purpose, we also request each of the top five teams per track to submit the full code, instructions and scripts to be reviewed.
Model pre-training: Teams may use any publicly available and appropriately licensed data to pre-train their models (except the dataset used for the challenge). Each participant must ensure that their use of any data in connection with this competition complies with applicable law and all other applicable legal requirements. If a team uses private data for pre-training, the team must release the data for reproducibility to be considered as a winning team. Any public pre-trained models, such as CLIP, OpenCLIP, or models pre-trained on ImageNet, can also be used.
For each track, each team is required to develop a single algorithm that can conduct both anomaly classification and segmentation, with a unified configuration of hyper-parameters across all the subsets (e.g. transistor, pill, etc.). We require the algorithm to share the same backbone/encoder for both tasks. For few-shot learning, each team must propose a unified algorithm that can be trained for any number of shots k, instead of developing a different algorithm for each k-shot setting.
The final team rankings depend only on the results, regardless of the algorithm's complexity. Model ensembles are allowed, but efficient solutions, in terms of speed/FLOPS, will be highlighted even if they do not finish among the top two winning teams.
For any submission, content, or other data that you provide us in connection with the challenge, you grant us a non-exclusive, worldwide, non-transferable, non-sublicensable, and royalty-free right and license to view, download, use, and reproduce such submission, content, or other data only for the purpose of the competition.
Phase 2 (Closed)
Phase 2 is our main challenge for both the zero-shot and few-shot tracks. Participants can use the knowledge gained in Phase 1 to develop a model for the Phase 2 dataset. To avoid overfitting, evaluation scripts won't be provided during the challenge. The top two winners of each track will be selected based on the Phase 2 results.
Registration
To officially participate in our challenge, please register your information and team information (name, email, members, etc.) with the following form first. If you have already registered for Phase 1, you don't need to register again:
We allow only one registration per team. Also, one registration covers all phases (phase 1 and phase 2) and tasks (zero-shot and few-shot).
Note that we only consider submissions with a Team Name provided in our registration form (above link).
You can set your team name for the competition at the User Setting of CodaLab.
Challenge link
Note that these sites are different from our Phase 1 CodaLab sites. Therefore, a new CodaLab registration is required for each track.
Zero-shot (Phase 2): Challenge Link
Few-shot (Phase 2): Challenge Link
Zero-shot anomaly classification and segmentation
For zero-shot anomaly classification and segmentation, the goal is to develop a single model with zero-shot anomaly classification and segmentation ability on various downstream datasets, given a pre-trained model and a textual description of the object and its potential defects.
Dataset Definition
Test data: Test set of the VisA dataset, modified for the VAND challenge (links can be found on our CodaLab website).
Train and pre-train data: Participants can train and pre-train their models on any datasets that they are allowed to use, except the train and test sets of VisA.
Evaluation Metrics
For both classification and segmentation, the evaluation metric is the F1-max score, i.e. the F1 score at the optimal threshold, averaged across multiple downstream subsets, e.g. transistor, pill, etc.
For each downstream set, we choose the harmonic mean of the classification F1-max and the segmentation F1-max as the summary metric. Because we favor algorithms that perform well on both tasks, the harmonic mean is chosen instead of the arithmetic mean, as it is more sensitive to the smaller value.
The final metric to compare different models is the arithmetic mean over all the downstream sets.
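For concreteness, the sketch below shows one way this scoring could be implemented; variable names and data layout are assumptions, and the official evaluation script may differ in details.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

def f1_max(labels, scores):
    """F1 score at the optimal threshold (F1-max)."""
    precision, recall, _ = precision_recall_curve(labels, scores)
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    return f1.max()

def subset_score(img_labels, img_scores, px_labels, px_scores):
    """Harmonic mean of classification and segmentation F1-max for one subset."""
    cls = f1_max(img_labels, img_scores)                # image-level classification
    seg = f1_max(px_labels.ravel(), px_scores.ravel())  # pixel-level segmentation
    return 2 * cls * seg / (cls + seg)

def final_score(subsets):
    """Arithmetic mean of the per-subset harmonic means (transistor, pill, ...)."""
    return float(np.mean([subset_score(*data) for data in subsets.values()]))
```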
We provide the score of WinCLIP (CVPR 2023) as a baseline in the leaderboard.
Few-shot anomaly classification and segmentation
For few-shot anomaly classification and segmentation, the goal is to develop an algorithm that learns to conduct anomaly classification and segmentation for downstream datasets using only a few normal images from those datasets.
Dataset Definition
Test data: Test set of the VisA dataset, modified for the VAND challenge (links can be found on our CodaLab website).
Train data: k normal images randomly sampled from the train set of VisA. We require k = 1, 5, 10. The randomly selected samples are provided by the organizing committee (links can be found on our CodaLab website).
Pre-train data: Participants can pre-train the model on any public data that they are allowed to use, except for the VisA data.
Evaluation Metrics
For each k-normal-shot (or k-shot) setup, on a specific subset (e.g. transistor), the evaluation metric is the harmonic mean of F1-max over classification and segmentation, the same as the metric in the zero-shot setup.
For each k-normal-shot on a specific subset, we require 3 random runs on the pre-selected samples. Then we compute the arithmetic mean over multiple runs.
Given the above averaged metrics, we take the arithmetic mean over subsets (e.g. transistor, pill, etc) as the evaluation metric for a k-normal-shot setup.
To evaluate the performance of a few-shot algorithm, we plot the F1-max curve, i.e. the aggregated F1-max vs. k-shot curve. The final metric is the Area Under the F1-max Curve (AUFC).
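The sketch below shows one way these numbers could be aggregated into the AUFC, reusing the per-subset harmonic-mean score defined for the zero-shot track; the data layout and the trapezoidal integration (including its normalization) are assumptions and may differ from the official evaluation script.

```python
import numpy as np

def k_shot_score(runs_per_subset):
    """Mean over the 3 random runs for each subset, then mean over subsets."""
    return float(np.mean([np.mean(runs) for runs in runs_per_subset.values()]))

def aufc(scores_per_k):
    """Area under the aggregated F1-max vs. k curve (trapezoidal rule).
    `scores_per_k` maps each k in {1, 5, 10} to {subset: [score_run1, ...]}."""
    ks = sorted(scores_per_k)                        # e.g. [1, 5, 10]
    ys = [k_shot_score(scores_per_k[k]) for k in ks]
    return float(np.trapz(ys, ks))
```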
We provide the score of WinCLIP (CVPR 2023) as a baseline in the leaderboard.
Phase 1 (Closed)
Phase 1 is designed to initiate algorithmic development and introduce our challenge's objectives, such as the descriptions of the two tracks (zero-shot and few-shot) and the target metrics we will use. To avoid overfitting, evaluation scripts won't be provided during the challenge. The top two winners of each track will be selected based on the Phase 2 results.
Registration
To officially participate in our challenge, please register your information and team information (name, email, members, etc.) with the following form first:
We allow only one registration per team. Also, one registration covers all phases (phase 1 and phase 2) and tasks (zero-shot and few-shot).
Note that we only consider submissions with a Team Name provided in our registration form (above link).
You can set your team name for the competition at the User Setting of CodaLab.
Challenge link
Zero-shot (Phase 1): Challenge link
Few-shot (Phase 1): Challenge link
Submission
For Phase 1, we allow one submission per day per team. Further details on submissions, including the format, structure, and zip packaging, can be found under "Learn the Details -> Evaluation" on each challenge site.
Zero-shot anomaly classification and segmentation
For zero-shot anomaly classification and segmentation, the goal is to develop a single model with zero-shot anomaly classification and segmentation ability on various downstream datasets, given a pre-trained model and a textual description of the object and its potential defects.
Dataset Definition
Test data: Test set of MVTec-AD
Train and pre-train data: Participants can train and pre-train their models on any datasets that they are allowed to use, except the train and test sets of MVTec-AD.
Evaluation Metrics
For both classification and segmentation, the evaluation metric is the F1-max score, i.e. the F1 score at the optimal threshold, averaged across multiple downstream subsets, e.g. transistor, pill, etc.
For each downstream set, we choose the harmonic mean of the classification F1-max and the segmentation F1-max as the summary metric. Because we favor algorithms that perform well on both tasks, the harmonic mean is chosen instead of the arithmetic mean, as it is more sensitive to the smaller value.
The final metric to compare different models is the arithmetic mean over all the downstream sets.
Few-shot anomaly classification and segmentation
For few-shot anomaly classification and segmentation, the goal is to develop an algorithm that learns to conduct anomaly classification and segmentation for downstream datasets using only a few normal images from those datasets.
Dataset Definition
Test data: Test set of MVTec-AD
Train data: k normal images randomly sampled from the train set of MVTec-AD. We require k = 1, 5, 10. Randomly selected samples are provided by the organizing committee.
Pre-train data: Participants can pre-train the model on any public data that they are allowed to use, except for the MVTec-AD data.
Evaluation Metrics
For each k-normal-shot (or k-shot) setup, on a specific subset (e.g. transistor), the evaluation metric is the harmonic mean of F1-max over classification and segmentation, the same as the metric in the zero-shot setup.
For each k-normal-shot on a specific subset, we require 3 random runs on the pre-selected samples. Then we compute the arithmetic mean over multiple runs.
Given the above averaged metrics, we take the arithmetic mean over subsets (e.g. transistor, pill, etc) as the evaluation metric for a k-normal-shot setup.
To evaluate the performance of a few-shot algorithm, we plot the F1-max curve, i.e. the aggregated F1-max vs. k-shot curve. The final metric is the Area Under the F1-max Curve (AUFC).
Clarification on Evaluation Metrics
We believe the metrics used in the challenge are more practical for real-world applications than the standard ROC-AUC and PR-AUC. We will host a server to evaluate all the submitted results for a fair comparison. However, we won't open-source the evaluation code until the winner announcement, to avoid overfitting on the test set. Participants can use either standard metrics (e.g. ROC-AUC) or their own implementations of the proposed metrics to evaluate their methods locally, and then submit results to our server for fair benchmarking.
Prizes
The top two winning teams or individual participants of each track will each get to spend up to ten minutes presenting their work at the VAND workshop, CVPR 2023.
There will be no other prizes or rewards as part of this competition.