Machine Learning for Data

Automated Creation, Privacy, Bias

July 23, 2021 @ ICML 2021

Videos & slides are available here:

https://slideslive.com/icml-2021/machine-learning-for-data-automated-creation-privacy-bias

Virtual Event

Virtual event website: https://icml.cc/virtual/2021/workshop/8356

Poster session via Gather: https://eventhosts.gather.town/0j0So7wYRCaUvO7F/icml2021ml4data

Overview

As the use of machine learning (ML) becomes ubiquitous, there is a growing understanding and appreciation of the role that data plays in building successful ML solutions. Classical ML research has primarily focused on learning algorithms and their guarantees. Recent progress has shown that data is playing an increasingly central role in creating ML solutions, such as the massive text data used for training powerful language models, (semi-)automatic engineering of weak supervision data that enables applications in few-label settings, and various data augmentation and manipulation techniques that lead to performance boosts on many real-world tasks. On the other hand, data is one of the main sources of security, privacy, and bias issues in deploying ML solutions in the real world.

This workshop will focus on the new perspective of machine learning for data: specifically, how ML techniques can be used to facilitate and automate a range of data operations (e.g., ML-assisted labeling, synthesis, selection, augmentation), and the associated challenges of quality, security, privacy, and fairness, for which ML techniques can also enable solutions. In this workshop, we aim to bring together researchers and practitioners working on methodology, theory, applications, and systems to exchange ideas, identify key challenges, and advance the field towards the most exciting and promising future directions.

Topics of particular interest include, but are not limited to:

Methods of using ML to assist human annotators in data labeling
Methods of automated data engineering, such as synthesis, augmentation, re-weighting, etc.
Theories, methods, and studies to characterize, detect, or mitigate data bias
Methods of detecting private information in data and preserving privacy
Systems for automating data operations and analytics
Applications based on data-human-machine interactions

Invited Speakers

David Alvarez-Melis (Microsoft)

Lora Aroyo (Google)

Kamalika Chaudhuri (UCSD)

Kumar Chellapilla (Amazon)

Hoifung Poon (Microsoft)

Alex Ratner (University of Washington, Snorkel)

Dawn Song (Berkeley)

Eric P. Xing (CMU, Petuum)

Important Dates and Links

Paper Submission Deadline: June 14, 2021 (11:59pm AOE)

Author Notification: July 1, 2021

Camera-ready paper submission due: July 16, 2021 (11:59pm AOE)

Workshop date: July 23, 2021

Call for Papers (CFP) and submission instructions: https://sites.google.com/view/ml4data/call-for-papers

Submission site: https://cmt3.research.microsoft.com/ICML2021ML4data

Follow us on Twitter: https://twitter.com/ml4data

Schedule

The following is the workshop schedule (all in Pacific Time) on Friday, July 23, 2021:

08:00 - 08:10 - Opening Remarks
08:10 - 08:50 - Invited Talk: David Alvarez-Melis: Comparing, Transforming, and Optimizing Datasets with Optimal Transport
08:50 - 09:30 - Invited Talk: Lora Aroyo: TBA
09:30 - 09:45 - Contributed Oral: Myra Cheng: SNoB: Social Norm Bias of “Fair” Algorithms
09:45 - 10:00 - Contributed Oral: Hari Prasanna Das: CDCGen: Cross-Domain Conditional Generation via Normalizing Flows and Adversarial Training
10:00 - 10:20 - Coffee Break
10:20 - 11:00 - Invited Talk: Eric Xing: A Data-Centric View for Composable Natural Language Processing
11:00 - 11:40 - Invited Talk: Kamalika Chaudhuri: TBA
11:40 - 12:30 - Poster Session
12:30 - 13:30 - Lunch Break
13:30 - 14:10 - Invited Talk: Hoifung Poon: Task-Specific Self-Supervised Learning for Precision Medicine
14:10 - 14:50 - Invited Talk: Dawn Song: Towards Building a Responsible Data Economy
14:50 - 15:05 - Contributed Oral: Mayana Pereira: An Analysis of the Deployment of Models Trained on Private Tabular Synthetic Data: Unexpected Surprises
15:05 - 15:20 - Coffee Break
15:20 - 16:00 - Invited Talk: Alex Ratner: Programmatic Weak Supervision for Data-centric AI
16:00 - 16:40 - Invited Talk: Kumar Chellapilla: Machine Learning with Humans-in-the-loop (HITL)
16:40 - 17:20 - Panel Discussion: Hoifung Poon, Paroma Varma, Kumar Chellapilla, Kamalika Chaudhuri
17:20 - 17:25 - Closing Remarks

Thanks to our sponsors!
