The goal of this workshop is to bring together academic researchers, industry practitioners, and financial experts to explore practical examples of small data transforming AI capabilities.
We welcome short papers (1-2 pages) and long papers (4 pages) addressing one or more of the topics of interest below. Papers will be peer-reviewed by the program committee, and accepted papers will be presented as lightning talks during the workshop.
Tools and approaches for effectively injecting domain knowledge from experts into the AI system
Algorithms for working with limited labeled data and improving label efficiency
Tools and approaches that quantify and reduce the time needed to source and prepare high quality data
Tools and approaches that ensure that the data is labeled consistently, such as label consensus
Tools and approaches that make improving data quality more systematic
Tools and approaches that automate the creation of high quality supervised learning training data from low quality resources
Tools and approaches that produce consistent and low noise data samples, or remove labeling noise or inconsistencies from existing data
Tools and approaches for controlling what goes into the dataset and for making high level edits efficiently to very large datasets
Search methods for finding suitably licensed datasets based on public resources
Tools for creating training datasets for small data problems, or for rare classes in the long tail of big data problems
Tools and approaches for timely incorporation of feedback from production systems into datasets
Tools for understanding dataset coverage of important classes, and for editing datasets to cover newly identified important cases
Dataset importers that allow easy combination and composition of existing datasets
Data selection techniques such as active learning and core-set selection for identifying the most valuable examples to label
Semi-supervised learning, few-shot learning, and weak supervision methods for maximizing the power of limited labeled data
Transfer learning and self-supervised learning approaches for developing powerful representations that can be used for many downstream tasks with limited labeled data
Novelty and drift detection to identify when more data needs to be labeled
Synthetic data where real data are limited or not available
Format: Short papers (1-2 pages excluding references), long papers (4 pages excluding references)
Suggested Framework:
Situation
Complication
What actions were taken to address it
How it was solved
Template:
The LaTeX template is available on Overleaf: [overleaf.com]. Use the sigconf template.
ACM Microsoft Word template: [download]
Submit materials by email: small.data@jpmorgan.com
Important Dates:
August 29, 2022: Workshop Paper Due Date
September 12, 2022: Paper Notification Deadline
November 2, 2022: Workshop
All deadlines are 11:59PM (UTC-12:00 / Anywhere on Earth).
This workshop is non-archival. The review process is single-blind. Submitting work that was previously submitted to or published in other proceedings or conferences does not create a conflict.