Call for Papers
DPFM@ICLR'24 Workshop Call for Papers 🙋
We are pleased to announce that the DPFM@ICLR'24 Workshop is now accepting submissions via OpenReview (https://openreview.net/group?id=ICLR.cc/2024/Workshop/DPFM).
Workshop Format and Presentation
The workshop is a hybrid event with a physical venue co-located with the ICLR'24 conference and will be streamed online via Zoom.
All accepted works will feature a poster presentation at the conference venue and online on the workshop website along with the papers.
Selected best papers will feature an oral presentation and awards.
Dual Submission Policy
The DPFM@ICLR'24 Workshop is non-archival. Accepted papers will be posted online and indexed by Google Scholar.
We are open to works that have been recently published, are currently under review, or are in an early stage of research.
We welcome submissions that have not been presented at another venue. We discourage papers that will also be presented at the main conference of ICLR 2024.
Submission Categories
Spotlight Paper (Short Paper):
Up to 4 pages of content, with up to 1 additional page for discussions on limitations and social impacts.
Suitable for disseminating novel ideas, preliminary results, or controversial findings, or as an extended abstract of a report or early-stage paper.
A shortened manuscript of an already published work is also acceptable so long as it has not been presented at a conference.
Full Research Papers (Long Paper):
Up to 9 pages of content, with up to 1 additional page for discussions on limitations and social impacts.
Manuscripts should be structurally complete with concrete findings. Examples include works that are in submission or currently under review.
Manuscripts submitted to this category should not have been published elsewhere.
Areas of Interest
Data Problems x Foundation Models
Data Quality, Dataset Curation, and Data Generation
Data Perspective to Efficiency, Interpretability, and Alignment
Data Perspective on Safety and Ethics
Data Copyright, Legal Issues, and Data Economy
Submission Guidelines
Please use the provided ICLR template: https://github.com/ICLR/Master-Template/raw/master/iclr2024.zip
Adhere to the general guidelines of ICLR such as anonymity. The reviewing process is double-blind.
Please ensure that the submitted document is correctly formatted and checked for language issues, and that the main paper is self-contained without the Appendix. This is the best way to show respect for the dedicated work of our voluntary reviewers.
We ask each submission to nominate at least one author to be considered for an invitation to join the program committee.
Important Dates
Submission deadline: February 3, 2024 (Anywhere on Earth) [Attention: submission deadline extended to February 11 (AoE)]
Notification of acceptance: March 3, 2024
Workshop date: Saturday, May 11, 2024 (Messe Wien Exhibition Congress Center, Vienna, Austria + Zoom)
Contact
For any inquiries about paper submission or workshop details, feel free to contact us anytime at: dpfm-workshop-iclr24@googlegroups.com
We look forward to learning about your amazing work at the workshop!
--
ICLR 2024 Workshop on Navigating and Addressing Data Problems for Foundation Models (DPFM)
Spotlight/Finding/Perspective Paper (Short Paper)
- up to 4 pages of content
Short papers typically feature novel ideas, early findings, or latest results, or could be an extended abstract for a scientific report, project, or paper published elsewhere. Controversial findings and preliminary results are also accepted so long as they hold conceptual value and spark constructive discussions. We also welcome perspective papers that share insights on emerging problems at the research frontier.
Full Research Paper (Long Paper)
- up to 9 pages of content
Long papers are full documentation of a research investigation, structurally complete, and have concrete findings. The format follows the norms and guidelines of typical academic publications. Examples include manuscripts that have not been published or are currently under review.
Best Short/Long Paper Awards
The workshop will select a number of exceptional submissions for best paper awards based on the recommendation of the program committee. Awards for short and long papers will be decided separately. Selected best papers will be offered opportunities for oral presentations at the workshop. We will present a formal certificate for each award and also list the awards in the position paper we will publish after the workshop.
Questions?
Email dpfm-workshop-iclr24@googlegroups.com to get more information on the workshop.
Areas of Interest
[Module A] Data Quality, Dataset Curation, and Data Generation
–Recent Achievements and Current Efforts
[Data Quality] How to quantify data quality in the context of FMs or select a good subset?
What are the aspects to consider and what are the quantitative metrics?
How can this be performed given the scale and nature of data for FMs?
[Data Influence] How to model the influence of data throughout the lifecycle of FMs (pre-training, fine-tuning, deployment, etc.)? What is the impact of each stage, and how do the influences of data interact?
What are the data-perspective efforts in adapting FMs to target tasks/scenarios for deployment? A particular interest is data curation/labeling for fine-tuning, the role of in-context learning, and adaptation in dynamic environments.
[Data Generation] The capability of foundation models brings data generation to a new level. How can the generation process be controlled to produce high-quality or task-relevant data, and what can the generated data be used for?
Can generated data be used directly to train a model in the same way as natural data? What problems might this cause?
How can generated data help with alignment, improving fairness/safety, and/or adaptation in low-resource scenarios where labeled data is scarce?
[Data Quality] [Scalability] For FMs pre-trained on massive and broad data, how to consider data acquisition/composition/quality/resource efficiency at scale?
The resource requirements for training and deploying FMs may be out of reach for many. What does this mean for researchers and practitioners working on data problems, and how can they adapt to the new norm?
[Scaling Laws] Scaling laws help in many scenarios and enable research on large models with a small budget, but some unique capabilities of very large models, such as chain-of-thought reasoning, only emerge at a sufficiently large scale. What complications will this cause?
[Module B] A Data Perspective to Efficiency, Interpretability, and Alignment
–Latest Advancement and Breakthroughs
[Data Efficiency] Beyond their impressive capability and generalizability, foundation models have reached unprecedented scales in model size and training data, making efficiency problems more pronounced in every aspect.
This includes data efficiency in model training, such as the typically resource-intensive pre-training or fine-tuning at deployment, which often has limited labeled data,
[Attribution at Scale] [Interpretability] and also the efficiency of inference methods for interpretability, explainability, fact tracing, etc. How do existing approaches scale up to foundation models and what are the new solutions to these problems?
[Data and Alignment] Alignment is one of the most active topics for FMs. How can data-perspective research best contribute to important issues such as data attribution/interpretability, harmlessness/truthfulness, AI safety (fake/harmful content), etc.?
[Module C] A Data Perspective to Safety and Ethics
–Risks, Limitations, and Opportunities
[Safety and Trustworthiness] The unprecedented size and capability of FMs pose unprecedented challenges for safety and trustworthiness (e.g., jailbreaking, security loopholes, harmful content, misuse, privacy violations, etc.). What are the risks and current limitations?
[Data and Safety] [Data and Ethics] How can data-perspective research help, and which issues benefit most from it?
[Evaluation Techniques] How does data-perspective research contribute to the evaluation of FMs (e.g., fairness/ethics, defects of FMs/failure cases)?
[Data and Evaluation] How can these issues be addressed from the data side?
[Module D] Copyright, Legal Issues, and Data Economy
–A Broader Landscape
[Data Copyright] [Legal Challenges and Practical Risks] Copyright issues and privacy concerns are the sword of Damocles for the deployment of FMs. What are the current risks and limitations?
[Data Research and Technical, Economic, and Governance Solutions] How can data-perspective research contribute technical, economic, and governance solutions to this topic?
[Data Economy] [Data Acquisition] What is the perspective of data economy? What are the potential market solutions for the acquisition of data?
[Data Valuation] [Data Exchange] What research opportunities exist for the data problems associated with the data economy, such as quantifying the value of data and designing schemes for data exchange?