Mar 4th, 2026 11:59 PM (AoE): Deadline for Submission. Submission Website: OpenReview
Mar 16th, 2026 11:59 PM (AoE): Notification of Acceptance
Apr 10th, 2026 11:59 PM (AoE): Camera-Ready Deadline
Note: The above deadlines apply to those who want to have their papers included in the proceedings. If you prefer not to have your paper included in the proceedings but still want to share your work with the community, please contact the organizing committee to discuss possible arrangements.
To ensure the high quality of accepted papers, all submissions will be reviewed by experts from both academia and industry in the relevant fields. The review process will be double-blind. We welcome submissions describing unpublished work, work currently under review, and recently published work. All accepted workshop papers will be published in the CVPR 2026 Workshop Proceedings and made openly available through the Computer Vision Foundation (CVF) Open Access platform. Authors of all accepted papers (oral and poster presentations) will be invited to present their work at the DataCV Workshop held in conjunction with CVPR 2026.
Paper submissions must be written in English and submitted in PDF format. In accordance with CVPR 2026 workshop guidelines, submissions must be between 4 and 8 pages in length, excluding references. Works shorter than 4 pages are not eligible for inclusion in the official proceedings; authors of such works are encouraged to share their contributions independently (e.g., via arXiv or personal/project websites). All submissions must follow the official CVPR 2026 formatting guidelines. The CVPR author kit provides a LaTeX2e template and detailed instructions for paper preparation.
For information about whether the workshop will be in-person, virtual, or hybrid, please visit the CVPR 2026 website.
This workshop aims to bring together research and discussion on data-centric challenges in computer vision and beyond, as opposed to the more common algorithm-centric focus. Through the successful VDU/DataCV workshops and challenges at CVPR 2022–2024 and ICCV 2025, we have built a growing community interested in data-centric work: analyzing large datasets, uncovering biases and underrepresented classes, improving training data with active learning and augmentation, and learning from synthetic data. These efforts strongly shape model performance in both computer vision and multimodal settings. This year, we broaden our scope to include the understanding of multimodal datasets that incorporate vision data, building on our existing focus on vision datasets. We also seek to further strengthen our community by hosting invited talks, presenting research papers, and organizing the DataCV challenge. The following topics will be the focus of this workshop:
Exploring Vision-Language Models (VLMs) from a data-centric perspective. VLMs depend on large-scale multimodal datasets for vision-language alignment but often face biases, noise, and imbalances. A data-centric approach is key to improving training efficiency and model generalization. This workshop explores how dataset properties, biases, and alignment strategies impact VLMs and how to construct high-quality multimodal datasets for real-world use.
Properties and attributes of vision datasets. The first and foremost problem is the definition of dataset-level properties. While image-level properties (e.g., categories, bounding boxes, captions) are well studied, dataset-level properties (e.g., noise, diversity, bias) require more attention. These properties necessitate specialized evaluation methods, posing challenges for dataset representation learning (a toy diversity measure is sketched after this topic list).
Application of dataset-level analysis. Dataset-level analysis offers numerous application opportunities. For example, assessing dataset quality enables the design of dynamic dataset compositions, leading to improved model accuracy; this is especially valuable because most existing datasets are static, so adaptive composition leaves ample room for innovation. Additionally, understanding content and label biases deepens our insight into model generalization, allowing for targeted dataset improvements. Automated dataset-level metrics are also crucial for optimizing active learning pipelines.
Representations of and similarities between vision datasets. While image-level representations are well studied, dataset-level representations remain underexplored, often limited to basic statistics. Exploring how to aggregate image features into global dataset representations could be valuable. Moreover, (semi-)end-to-end learning approaches could train models to extract task-specific dataset features. Using such dataset representations, we can then analyze similarities between datasets for further study, such as measuring domain gaps (see the Fréchet-distance sketch following this topic list).
Improving vision dataset quality through generation and simulation. Recent research has leveraged synthetic data from simulation engines, GANs, and diffusion models, as well as data mined from existing real sources, to create new training sets. These methods offer flexible and cost-effective solutions when collecting real training data is expensive, or for covering rare edge cases. Consequently, studying strategies to evaluate and generate high-quality synthetic data is crucial for promoting its effective use.
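To make the notion of a dataset-level property concrete, below is a minimal, illustrative sketch of one such measure: diversity estimated as the mean pairwise cosine distance between image embeddings. This is not a prescribed metric; the embeddings are assumed to come from any pretrained image encoder, and the function name is ours for exposition.

```python
import numpy as np

def diversity_score(embeddings: np.ndarray) -> float:
    """Toy dataset-level diversity: mean pairwise cosine distance.

    `embeddings` is an (n, d) array with one row per image, e.g. from
    a pretrained encoder (an assumption for illustration, not a
    requirement). Higher scores suggest more visually varied content.
    """
    # Row-normalize so that dot products equal cosine similarities.
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T                          # (n, n) similarities
    off_diag = sims[~np.eye(len(sims), dtype=bool)]   # drop self-similarity
    return float(1.0 - off_diag.mean())               # distance = 1 - similarity
```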
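As a second illustrative (and by no means definitive) sketch, one simple way to instantiate a dataset representation and a similarity measure is to summarize each dataset as a Gaussian over its image embeddings and compare two datasets with the Fréchet distance, the quantity behind FID. The same number can serve as a rough check of how closely a synthetic training set matches a real target set. All variable names below are placeholders.

```python
import numpy as np
from scipy import linalg

def dataset_representation(embeddings: np.ndarray):
    """Summarize a dataset as the Gaussian (mean, covariance) of its embeddings."""
    return embeddings.mean(axis=0), np.cov(embeddings, rowvar=False)

def frechet_distance(mu1, sigma1, mu2, sigma2) -> float:
    """Fréchet distance between two Gaussians (as used in FID)."""
    diff = mu1 - mu2
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    if np.iscomplexobj(covmean):   # discard tiny imaginary parts from sqrtm
        covmean = covmean.real
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))

# Hypothetical usage: domain gap between a synthetic and a real dataset,
# where synthetic_feats / real_feats are (n, d) embedding arrays.
#   mu_s, cov_s = dataset_representation(synthetic_feats)
#   mu_r, cov_r = dataset_representation(real_feats)
#   gap = frechet_distance(mu_s, cov_s, mu_r, cov_r)
```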
In summary, the questions related to this workshop include, but are not limited to:
Can vision/multimodal datasets be analyzed on a large scale?
How to holistically understand the visual semantics contained in a dataset?
How to define vision-related properties and problems on the dataset level?
How can we improve algorithm design by better understanding vision datasets?
Can we predict the performance of an existing model on a new dataset?
What makes a good dataset representation? (hand-crafted, learned, a mix of both, or other methods)
How do we measure similarity between datasets, as well as their bias and fairness?
Can we improve training data quality through data engineering or simulation?
How to efficiently create labelled datasets in new environments?
How to create realistic datasets that serve our real-world application purpose?
How can we alleviate the need for large-scale labelled datasets in model learning?
How to analyze model performance in environments lacking annotated data?
How can we assess model bias and fairness in vision-language models from a data perspective?
How can generated data be used to alleviate privacy concerns in computer vision tasks?
How to better evaluate diffusion models and large language models using data-centric approaches?
For additional information, please contact us.