PPML for Audio

Special Session at INTERSPEECH 2021 on

Privacy-preserving Machine Learning for Audio, Speech and Language Processing

Aug 30, 2021 - Sep 3, 2021


Wiki CFP: http://www.wikicfp.com/cfp/servlet/event.showcfp?eventid=119373

Paper Submission Link: https://www.softconf.com/l/interspeech2021/user/

To submit a paper to this session, please choose "14.16 Privacy-preserving Machine Learning for Audio, Speech and Language Processing" under "14 Special Sessions" in the INTERSPEECH 2021 paper submission system.

Scope & Objectives

This special session focuses on privacy-preserving machine learning (PPML) techniques in speech, language and audio processing, including centralized, distributed and on-device processing approaches. We invite novel contributions and overviews on the theory and applications of PPML in speech, language and audio. We also encourage submissions on the ethical and regulatory aspects of PPML in this context.

Sending speech, language or audio data to a cloud server exposes private information. One approach, called anonymization, is to preprocess the data so as to hide information that could identify the user by disentangling it from other useful attributes. PPML takes a different approach: it addresses the problem by moving computation closer to the clients.

Thanks to recent advances in edge computing and neural processing units on mobile devices, PPML is now a feasible technology for most speech, language and audio applications. It enables companies to train on customer data without requiring customers to share that data. With PPML, data can stay on a customer's device, where it is used for model training. During training, the models from several clients are typically sent to aggregator nodes, which average them and sync the new averaged model back to each client. Each client then continues training from the averaged model. This process repeats, so that each client benefits from the training data on all other clients. Such processes were not possible in conventional audio/speech ML. In addition, high-quality synthetic data can be used for training thanks to advances in speech, text and audio synthesis.
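The train, average and sync loop described above can be illustrated with a minimal sketch. It assumes a toy linear model, plain gradient descent on each client, and averaging weighted by each client's sample count (in the style of federated averaging). The function names, hyperparameters and data below are hypothetical and chosen only for illustration; they are not taken from any particular system.

```python
from typing import List, Tuple

Model = List[float]                  # [weight, bias] of a toy linear model
Dataset = List[Tuple[float, float]]  # (feature, target) pairs kept on-device


def local_train(model: Model, data: Dataset, lr: float = 0.05, steps: int = 20) -> Model:
    """Run a few gradient-descent steps on one client's private data."""
    w, b = model
    n = len(data)
    for _ in range(steps):
        # Gradients of the mean squared error of y ~ w * x + b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return [w, b]


def aggregate(models: List[Model], sizes: List[int]) -> Model:
    """Average client models, weighting each by its number of samples (an assumption)."""
    total = sum(sizes)
    return [
        sum(m[i] * s for m, s in zip(models, sizes)) / total
        for i in range(len(models[0]))
    ]


def federated_rounds(clients: List[Dataset], rounds: int = 50) -> Model:
    """Alternate local training and aggregation for a fixed number of rounds."""
    global_model: Model = [0.0, 0.0]
    for _ in range(rounds):
        # Each client starts from the current global model and trains locally.
        local_models = [local_train(list(global_model), data) for data in clients]
        # Only model parameters, never raw data, are passed to the aggregator.
        global_model = aggregate(local_models, [len(d) for d in clients])
    return global_model


if __name__ == "__main__":
    # Hypothetical data: every client observes y = 2x + 1 on different inputs.
    clients = [
        [(0.0, 1.0), (1.0, 3.0)],
        [(2.0, 5.0), (3.0, 7.0), (4.0, 9.0)],
        [(-1.0, -1.0), (0.5, 2.0)],
    ]
    w, b = federated_rounds(clients)
    print(f"w={w:.3f}, b={b:.3f}")
```

In this toy setting the averaged model should end up close to w = 2 and b = 1, even though no client ever shares its raw data. Note that exchanging model parameters alone does not guarantee privacy; techniques such as differential privacy or secure aggregation, listed among the topics below, are commonly combined with this loop.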

PPML is especially beneficial in application sectors with high privacy requirements, e.g., health, finance, or military. In addition to privacy, PPML can improve the user experience through the increased accuracy and lower latency of on-device processing. Tech companies have recently reported leveraging PPML approaches such as federated learning to improve their user experience while preserving users' privacy.

Relevant topics include but are not limited to:

● Theory, implementation, and applications of statistical notions of privacy such as Differential Privacy

● Federated or decentralized learning for speech, language and audio processing

● Privacy-preserving representation learning for audio and speech tasks, including adversarial approaches

● On-device training, adaptation, and inference of audio ML models

● Hardware optimization, sparsity, quantization, power savings and algorithmic trade-offs for on-device training and inference

● Generation and usage of synthetic speech, text, and audio for training ML models for Speech Recognition, Speaker Recognition, Keyword Spotting, etc.

● Secure multi-party computation, homomorphic encryption, secure enclaves, privacy attacks and mitigation approaches for PPML in audio and speech processing

● Machine learning on encrypted speech, language, and audio data

● Tools, processes and benchmarks for PPML in speech, language, and audio processing

Keynote Talk

Prof. Isabel Trancoso, Instituto Superior Técnico (IST, Univ. Lisbon)

Isabel Trancoso is a full professor at Instituto Superior Técnico (IST, Univ. Lisbon), and the President of the Scientific Council of INESC ID Lisbon. She received her PhD in ECE from IST in 1987 and has chaired the ECE Department of IST. She was Editor-in-Chief of the IEEE Transactions on Speech and Audio Processing and has held many leadership roles in SPS (Signal Processing Society of IEEE) and ISCA (International Speech Communication Association), including President of ISCA and Chair of the Fellow Evaluation Committees of both SPS and ISCA. She was elevated to IEEE Fellow in 2011, and to ISCA Fellow in 2014. Her recent research interests include microblog translation, privacy-preserving speech mining, lexical and prosodic entrainment in spoken dialogues, and disfluency detection in spontaneous speech.

Session Format

This session will be organized into two phases. In Phase I, authors will give 2-minute lightning talks on their papers, followed by a keynote talk. In Phase II, papers will be presented as posters.

Paper Submission

Papers for the PPML Special Session must be submitted following the same schedule and procedure as regular INTERSPEECH 2021 papers. Submitted papers will undergo the same review process by anonymous and independent reviewers.

  • Submission deadline: same as INTERSPEECH submission deadline (March 26, 2021)

  • Notification of acceptance: June 2, 2021

  • Camera Ready: June 15, 2021


Harishchandra Dubey (Microsoft)

Amin Fazel (Amazon)

Mirco Ravanelli (MILA, Université de Montréal)

Emmanuel Vincent (Inria)