The 5th DataCV Workshop and Challenge will be held as a half-day workshop in Denver, Colorado, USA, in conjunction with CVPR 2026.
Paper submission deadline: March 4th, 2026 (23:59 Anywhere on Earth). Submissions will be handled via OpenReview.
The DataCV Challenge can be found here.
The training and validation data have been released!
Paper acceptance and competition results announcement: March 16th, 2026.
Camera-ready deadline: Friday, April 10th, 2026.
Information about the previous workshops, DataCV (VDU) 2022, 2023, 2024, and 2025, is available here.
This workshop aims to bring together research and discussion on data-centric challenges in computer vision and beyond, in contrast to the more commonly seen algorithm-centric workshops. Following the successful VDU/DataCV workshops and challenges at CVPR 2022–2024 and ICCV 2025, we have built a growing community interested in data-centric work: analyzing large datasets, uncovering biases and underrepresented classes, improving training data with active learning and augmentation, and learning from synthetic data. These efforts strongly shape model performance in both computer vision and multimodal settings. This year, we broaden our scope to include the understanding of multimodal datasets that incorporate vision data, building on our existing focus on vision datasets. Meanwhile, we seek to further strengthen our community by hosting invited talks, presenting research papers, and organizing the DataCV Challenge. The workshop will focus on the following topics:
Exploring Vision-Language Models (VLMs) from a data-centric perspective. VLMs depend on large-scale multimodal datasets for vision-language alignment but often face biases, noise, and imbalances. A data-centric approach is key to improving training efficiency and model generalization. This workshop explores how dataset properties, biases, and alignment strategies impact VLMs and how to construct high-quality multimodal datasets for real-world use.
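As a small, concrete illustration of such data-centric curation, the sketch below filters image-caption pairs by the cosine similarity of their embeddings, in the spirit of CLIP-score filtering. It assumes the embeddings were precomputed with some pretrained vision-language encoder; the function name and the 0.28 threshold are illustrative assumptions, not a prescribed pipeline.

```python
import numpy as np

def filter_pairs_by_alignment(image_embs: np.ndarray,
                              text_embs: np.ndarray,
                              threshold: float = 0.28) -> np.ndarray:
    """Keep image-caption pairs whose cosine similarity clears a threshold.

    image_embs, text_embs: (N, D) embeddings from a pretrained
    vision-language encoder (assumed precomputed). Returns indices to keep.
    """
    img = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = np.sum(img * txt, axis=1)        # per-pair cosine similarity
    return np.where(sims >= threshold)[0]   # indices of well-aligned pairs

# Toy usage: random vectors stand in for real encoder outputs, so almost
# no pair clears the threshold -- the point is only the filtering step.
rng = np.random.default_rng(0)
keep = filter_pairs_by_alignment(rng.normal(size=(1000, 512)),
                                 rng.normal(size=(1000, 512)))
print(f"kept {len(keep)} / 1000 pairs")
```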
Properties and attributes of vision datasets. The first and foremost problem is the definition of dataset-level properties. While image-level properties (e.g., categories, bounding boxes, captions) are well-studied, dataset-level properties (e.g., noise, diversity, bias) require more attention. These properties necessitate specialized evaluation methods, posing challenges for dataset representation learning.
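To make "dataset-level property" concrete, here is a minimal sketch of one candidate property: a diversity score defined as the mean pairwise cosine distance between per-image features. The feature matrix is assumed to come from any pretrained image encoder; the subsampling and the score definition are illustrative simplifications rather than an endorsed metric.

```python
import numpy as np

def diversity_score(features: np.ndarray, sample_size: int = 2000,
                    seed: int = 0) -> float:
    """Mean pairwise cosine distance between per-image features (N, D).

    Features are assumed to come from a pretrained image encoder.
    Subsamples the dataset so the pairwise matrix stays tractable.
    """
    rng = np.random.default_rng(seed)
    if len(features) > sample_size:
        idx = rng.choice(len(features), sample_size, replace=False)
        features = features[idx]
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sims = f @ f.T                                # pairwise cosine similarities
    off_diag = sims[~np.eye(len(f), dtype=bool)]  # drop self-similarities
    return float(1.0 - off_diag.mean())           # ~0 = redundant, higher = diverse
```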
Applications of dataset-level analysis. Dataset-level analysis offers numerous application opportunities. For example, analyzing dataset quality opens up new opportunities for designing dynamic dataset compositions, leading to improved model accuracy. This is especially valuable because most datasets are static, so adaptive compositions leave ample room for innovation. Additionally, understanding content and label biases deepens our insight into model generalization, allowing for targeted dataset improvements. Automated metrics at the dataset level are also crucial for optimizing active learning processes.
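As one familiar example of such a metric, the sketch below ranks an unlabelled pool by predictive entropy, the classic uncertainty-based acquisition rule in active learning. It assumes the current model's softmax outputs over the pool are already available as an (N, C) array.

```python
import numpy as np

def entropy_acquisition(probs: np.ndarray, budget: int) -> np.ndarray:
    """Return indices of the `budget` most uncertain unlabelled samples.

    probs: (N, C) softmax outputs of the current model on the pool
    (assumed precomputed). Uncertainty is measured by predictive entropy.
    """
    eps = 1e-12                                              # avoid log(0)
    entropy = -np.sum(probs * np.log(probs + eps), axis=1)   # per-sample entropy
    return np.argsort(entropy)[::-1][:budget]                # most uncertain first
```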
Representations of and similarities between vision datasets. While image-level representations are well-studied, dataset-level representations remain underexplored, often limited to basic statistics. Exploring how to aggregate image features into global dataset representations could be valuable. Moreover, (semi-)end-to-end learning approaches could train models to extract task-specific dataset features. With such dataset representations, we can then analyze similarities between datasets, for example to measure domain gaps.
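One simple baseline for both problems is to summarize each dataset by the mean and covariance of its image features and compare datasets via the Fréchet distance between the resulting Gaussians (the quantity behind FID). The sketch below assumes per-image features from a shared pretrained encoder; it is a rough domain-gap proxy, not a definitive similarity measure.

```python
import numpy as np
from scipy import linalg

def dataset_representation(features: np.ndarray):
    """Aggregate per-image features (N, D) into a simple dataset-level
    representation: the feature mean and covariance."""
    return features.mean(axis=0), np.cov(features, rowvar=False)

def frechet_distance(mu1, sigma1, mu2, sigma2) -> float:
    """FID-style Frechet distance between two feature Gaussians,
    usable as a rough proxy for the domain gap between two datasets."""
    diff = mu1 - mu2
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    if np.iscomplexobj(covmean):   # sqrtm can return tiny imaginary noise
        covmean = covmean.real
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))
```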
Improving vision dataset quality through generation and simulation. Recent research has leveraged synthetic data from simulation engines, GANs, and diffusion models, as well as data extracted from existing real datasets, to create new training sets. These methods offer flexible and cost-effective solutions when collecting real training data is expensive, or for addressing rare edge cases. Consequently, studying strategies for evaluating and generating high-quality synthetic data is crucial for promoting its effective use.
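Selection strategies for synthetic data can likewise start very simply. As a deliberately naive illustration, the sketch below keeps only the synthetic samples whose encoder features lie closest to the centroid of the real training data; the centroid criterion and the keep ratio are assumptions made for the example, not a recommended recipe.

```python
import numpy as np

def select_synthetic(real_feats: np.ndarray, synth_feats: np.ndarray,
                     keep_ratio: float = 0.5) -> np.ndarray:
    """Keep synthetic samples whose features are nearest the real-data
    centroid -- a crude proxy for 'realistic enough to train on'.

    Both inputs are (N, D) features from a shared pretrained encoder.
    Returns indices of the retained synthetic samples.
    """
    centroid = real_feats.mean(axis=0)
    dists = np.linalg.norm(synth_feats - centroid, axis=1)
    k = max(1, int(keep_ratio * len(synth_feats)))
    return np.argsort(dists)[:k]   # indices of the k closest samples
```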
In summary, questions relevant to the workshop include, but are not limited to:
Can vision/multimodal datasets be analyzed on a large scale?
How to holistically understand the visual semantics contained in a dataset?
How to define vision-related properties and problems on the dataset level?
How can we improve algorithm design by better understanding vision datasets?
Can we predict the performance of an existing model on a new dataset?
What makes a good dataset representation (hand-crafted, learned, a mix of both, or other methods)?
How do we measure similarities between datasets, as well as their bias and fairness?
Can we improve training data quality through data engineering or simulation?
How to efficiently create labelled datasets under new environments?
How to create realistic datasets that serve our real-world application purpose?
How can we alleviate the need for large-scale labelled datasets in model learning?
How to analyze model performance in environments lacking annotated data?
How can we assess model bias and fairness in vision language models from a data perspective?
How can generated data be used to alleviate privacy concerns in computer vision tasks?
How to better evaluate diffusion models and large language models using data-centric approaches?
The 1st DataCV (VDU) Workshop @ CVPR 2022, New Orleans, Louisiana
The 2nd DataCV (VDU) Workshop @ CVPR 2023, Vancouver, Canada
The 3rd DataCV (VDU) Workshop @ CVPR 2024, Seattle, Washington
The 4th DataCV Workshop @ ICCV 2025, Honolulu, Hawai'i