The DataCV Challenge is held in conjunction with the 5th DataCV Workshop at CVPR 2026. This year, the challenge focuses on data-centric evaluation of vision-language models under visual illusions and perceptual anomalies, with an emphasis on improving robustness without any model training or fine-tuning.
The 2026 challenge features two tracks:
Task I: Classic Illusion Understanding (participate link)
Task II: Real-World Visual Illusions and Anomalies Understanding (participate link)
❗ Participants must register with an official school or institutional email address. Registrations using non-institutional emails (e.g., Gmail, Outlook, QQ, 163) will not be approved.
Feb 5th, 2026: Release of training and validation data for the DataCV Challenge ✅
Feb 10th–28th, 2026: Open result submission for the validation phase
Feb 24th–28th, 2026: Release of test data and open result submission for the test phase
Mar 4th, 2026: Deadline for workshop paper submission
Mar 16th, 2026: Notification of Acceptance
Apr 10th, 2026: Camera Ready submission deadline
Motivation. Recent Vision-Language Models (VLMs) achieve strong performance on standard benchmarks, yet their robustness under atypical perceptual conditions, such as visual illusions, perceptual conflicts, and intuition-breaking scenes, remains under-examined. In these settings, models may rely on linguistic priors or memorized prototypes rather than direct visual evidence, sometimes producing answers that remain unchanged even when the visual stimulus is counterfactually altered.
Our competition is designed to explore potential solutions from a data-centric perspective. Rather than improving models through training or fine-tuning, we investigate how far performance and robustness can be advanced by improving the evaluation data and the prompting / in-context interface. In particular, we encourage approaches that (i) make the task more resistant to shortcut heuristics, (ii) make failures more diagnosable through counterfactual design such as matched controls, and (iii) better reflect direct visual perception. Our goal is to promote methods that reduce shortcut exploitation and improve the coordination between visual perception and language-based understanding, encouraging models to ground their decisions in what is actually seen while using reasoning to interpret the visual evidence.
Task I: Classic illusion understanding. Participants design a prompting / in-context learning (ICL) strategy that enables a fixed, frozen VLM to answer binary (Yes/No) questions about classic optical illusions.
Input: an illusion image + a binary question (Yes/No).
Goal: maximize accuracy while remaining sensitive to controlled visual changes.
Constraint: no model training; only zero-shot / few-shot inference-time methods are allowed.
Output format: please create the result file according to the GitHub guidelines for this task and submit it to Codabench (a minimal inference-only sketch is given below).
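As a rough illustration of an allowed inference-only pipeline, the sketch below hides whatever frozen VLM a participant uses behind a hypothetical query_vlm function and assumes placeholder field names (id, image, question) and a placeholder JSON output layout; the authoritative submission schema is the one in the GitHub guidelines.

```python
import json
from pathlib import Path

# Hypothetical stand-in for the participant's frozen VLM (local model or API).
# Replace with your own inference call; no weights are updated at any point.
def query_vlm(image_path: str, prompt: str) -> str:
    raise NotImplementedError("plug in your frozen VLM here")

# A simple zero-shot prompt that asks the model to rely on visual evidence
# and then commit to a single Yes/No token.
PROMPT_TEMPLATE = (
    "Look carefully at the image and answer from what is actually visible, "
    "not from what such scenes usually look like.\n"
    "Question: {question}\n"
    "Answer with exactly one word: Yes or No."
)

def parse_yes_no(raw: str) -> str:
    """Map a free-form model reply to the required Yes/No label."""
    return "Yes" if raw.strip().lower().startswith("yes") else "No"

def run_task1(samples: list[dict], out_path: str = "task1_results.json") -> None:
    """samples: [{"id": ..., "image": ..., "question": ...}, ...] (assumed field names)."""
    results = []
    for s in samples:
        prompt = PROMPT_TEMPLATE.format(question=s["question"])
        answer = parse_yes_no(query_vlm(s["image"], prompt))
        results.append({"id": s["id"], "answer": answer})
    # The actual file schema must follow the Task I GitHub guidelines;
    # a flat JSON list is only an illustrative placeholder.
    Path(out_path).write_text(json.dumps(results, indent=2))
```

Few-shot ICL variants follow the same pattern: prepend the exemplars from the released pool to the prompt (or pass them as additional images, if the chosen VLM supports interleaved inputs) without changing any model weights.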
Task II: Real-world visual illusions and anomalies. Participants design any form of strategy that enables any VLM to answer multiple-choice (A/B/C/D) questions about real-world visual illusions and anomalies.
Input: an image + a prompt (including a question and multiple-choice options {A, B, C, D}).
Goal: select the single correct option under diverse illusion/anomaly scenarios.
Constraint: no model training.
Output format: please create the result file according to the GitHub guidelines for this task and submit it to Codabench (a minimal sketch is given below).
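In the same spirit, a minimal sketch for the multiple-choice setting is shown below; query_vlm, the sample field names, and the output layout are again placeholders, and the Task II GitHub guidelines define the actual format.

```python
import json
import re
from pathlib import Path

# Hypothetical wrapper around any frozen VLM (the same no-training rule applies).
def query_vlm(image_path: str, prompt: str) -> str:
    raise NotImplementedError("plug in your frozen VLM here")

MC_TEMPLATE = (
    "Study the image, then answer the question by choosing one option.\n"
    "Question: {question}\n"
    "A. {A}\nB. {B}\nC. {C}\nD. {D}\n"
    "Reply with a single letter: A, B, C, or D."
)

def parse_choice(raw: str) -> str:
    """Extract the first standalone A/B/C/D letter from the model reply."""
    match = re.search(r"\b([ABCD])\b", raw.strip().upper())
    return match.group(1) if match else "A"  # fall back to A if nothing parses

def run_task2(samples: list[dict], out_path: str = "task2_results.json") -> None:
    """samples: [{"id": ..., "image": ..., "question": ..., "options": {"A": ..., "B": ..., "C": ..., "D": ...}}, ...]
    (assumed field names)."""
    results = []
    for s in samples:
        prompt = MC_TEMPLATE.format(question=s["question"], **s["options"])
        choice = parse_choice(query_vlm(s["image"], prompt))
        results.append({"id": s["id"], "answer": choice})
    # Replace this placeholder schema with the format in the Task II GitHub guidelines.
    Path(out_path).write_text(json.dumps(results, indent=2))
```

More elaborate strategies (option shuffling, self-consistency over multiple prompts, routing, or tool/agent pipelines) can be layered on top of this skeleton as long as no model training occurs.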
Participants may choose to enter Task I, Task II, or both. Both tasks are inference-only: participants must not fine-tune or update model weights in any form. Instead, participants may design prompts, routing strategies, tool use, or agent-style pipelines to improve performance and robustness under strict no-training rules.
(a) Datasets. We will release a public development pool and provide validation and test splits for both Task I and Task II. For Task I, we additionally provide a limited few-shot exemplar pool that participants may optionally use for in-context learning (ICL).
(b) Dataset availability and evaluation. The development pool is publicly accessible. The hidden validation and test sets are hosted on the evaluation server (task1, task2). Participants submit their system outputs for the designated split, and the organizers compute the scores and publish the results.
(c) Ethical considerations. All images are created by the organizers and released with explicit licenses.
Mingyang Li
Stanford University
Wenjin Hou
Zhejiang University
Mingkang Zhou
Shandong University
Xin Chen
Shandong University
Hehe Fan
Zhejiang University
For additional information, please contact us.