Exploring the Next Generation of Data
CVPR Workshop 2026
Denver, CO
Half Day Workshop
Data-centric methods, from training large models to developing industrial-grade systems, are more critical and, frankly, more opaque than ever. These models and systems are rapidly being incorporated into safety-critical applications such as medical diagnosis, autonomous driving, and AI chatbots. It is widely acknowledged that such systems can be approached from a data-centric perspective, for example by using large quantities of high-quality data. Whether there is too much data (for LLMs and autonomous driving) or too little (for robotics), data-centric methods focus on the ability to scale, rank, mix, and generate data, or even to discover causal relationships in data, for training and retraining models. Furthermore, given the scale of the data, automatic ways to improve models through these data algorithms are more important than ever. Recently, foundation models themselves have been used to discover even more data to feed into further foundation model training. This cyclic relationship between data and foundation models introduces another layer of complexity and bias to consider.
Overall, the enormous challenge of discovering the next generation of data requires several considerations: the definition of data quality, automatic data flywheels, scalability, data mixtures, data generation, data mining, and more. The objective of this proposed CVPR 2026 workshop, the 2nd edition of Exploring the Next Generation of Data, is to gather researchers and engineers across academia and industry to discuss how to tackle this large challenge together. We have invited leading academic and industry experts as speakers and will call for papers to encourage further engagement and research in this new and challenging field. As the topic is broad, we have expanded this year's workshop to include a challenge and an hour-long moderated open discussion on data-centric approaches to AI.
We hope this workshop can be a platform for gathering the cutting-edge research required to address this challenge.
This workshop will feature invited talks and the publication of selected papers. See the program section for details.
Workshop paper submission deadline: TBD
Notification to authors: TBD
Camera ready deadline: TBD
We invite original paper submissions that address data mixtures, data distillation, data generation, and bias-free and fair data selection. Topics of interest include:
Scaling laws
Data mixtures
Data curation
Causality
Dataset bias, fairness, and ethical considerations
Data distillation
Scalable data mining
Generative models for synthetic data generation
Foundation models for data mining
Foundation models for data annotation
Hallucination-free vision-language models
We are following the CVPR paper format: https://cvpr.thecvf.com/Conferences/2026/AuthorGuidelines
LaTeX/Word Templates: CVPR 2026 Paper Template
We accept full-length submissions (max 8 pages, excluding references).
All submissions will be peer-reviewed by at least two reviewers.
Blind review: We adopt double-blind review for this workshop. Submitted papers and supplementary materials should not reveal any information about the authors' identities.
Dual submission: We do not accept paper submissions that have been published or are under review at other conferences or workshops. Accepted papers are expected to be published in the CVPR proceedings.
By submitting a manuscript to the NEXD Workshop, the authors agree to the review process and agree to contribute to the reviewing process.
Submission site: TBD