Schedule
Thursday, June 8th, 2023
Start Time — Topic — Speaker
9:00 am — Opening remarks — Colby Banbury
9:15 am — Keynote — Nicholas Lane (University of Cambridge, Samsung)
10:15 am — Morning break
10:30 am — Simple and efficient pseudo-labeling for speech recognition — Tatiana Likhomanenko (Apple)
11:00 am — Accepted Paper Lightning Talks
11:30 am — Poster Session
12:00 pm — Lunch break
2:00 pm — Smart Homes Without Spying — Nat Jeffries (Useful Sensors)
2:30 pm — HLS4ML — Benjamin Hawks (Fermilab)
3:00 pm — Afternoon break
3:30 pm — System-Algorithm Co-Design for TinyML — Ji Lin (MIT)
4:00 pm — Invited Speaker — Ankita Nayak
4:30 pm — Expert Panel / Discussion
4:50 pm — Closing remarks — Matt Stewart
Colby Banbury
The vast majority of machine learning (ML) today occurs in a data center. But there is a very real possibility that in the (near?) future, we will view this situation much as we now view lead paint, fossil fuels, and asbestos: a technological means to an end that was used for a time because, at that stage, we had no viable alternatives and did not fully appreciate the negative externalities it caused. Awareness of the unwanted side effects of the current data-center-centric ML paradigm is building: it saddles ML with an alarming carbon footprint, a reliance on biased closed-world datasets, and serious risks to user privacy, and it promotes centralized control by large organizations because of the extreme compute resources assumed. In this talk, I will offer a sketch of preliminary thoughts on how a data-center-free future for ML might come about, and describe how some of our recent research results and system solutions (including the Flower framework -- http://flower.dev) might offer a foundation along this path.
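Federated learning, the approach underlying the Flower framework mentioned above, trains a shared model while raw data stays on-device; only model updates leave the client. A minimal sketch of one federated-averaging (FedAvg) round, using made-up client weights and plain NumPy rather than Flower's actual API:

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """One FedAvg round: average client parameters, weighted by dataset size."""
    total = sum(client_sizes)
    n_tensors = len(client_weights[0])
    return [
        sum(w[i] * (n / total) for w, n in zip(client_weights, client_sizes))
        for i in range(n_tensors)
    ]

# Three hypothetical clients, each holding two parameter tensors locally.
clients = [
    [np.full(4, 1.0), np.full(2, 1.0)],
    [np.full(4, 2.0), np.full(2, 2.0)],
    [np.full(4, 3.0), np.full(2, 3.0)],
]
sizes = [10, 10, 20]  # local dataset sizes; larger clients count for more
global_weights = fedavg(clients, sizes)  # weighted mean = 2.25 everywhere
```

The server never sees the clients' data, only their parameters; this is the structural property that removes the data center as a data aggregation point.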
Professor Nicholas D. Lane (University of Cambridge, Flower Labs)
Nic Lane is a full Professor in the Department of Computer Science and Technology, and a Fellow of St. John's College, at the University of Cambridge. He also leads the Cambridge Machine Learning Systems Lab (CaMLSys). Alongside his academic appointments, Nic is the co-founder and Chief Scientific Officer of Flower Labs, a venture-backed AI company (YC W23) behind the Flower framework. Nic has received multiple best paper awards, including ACM/IEEE IPSN 2017 and two from ACM UbiComp (2012 and 2015). In 2018 and 2019, he (and his co-authors) received the ACM SenSys Test-of-Time award and the ACM SIGMOBILE Test-of-Time award for pioneering research, performed during his PhD thesis, that devised machine learning algorithms used today on devices like smartphones. Nic was the 2020 ACM SIGMOBILE Rockstar award winner for his contributions to “the understanding of how resource-constrained mobile devices can robustly understand, reason and react to complex user behaviors and environments through new paradigms in learning algorithms and system design.”
Pseudo-labeling (PL) algorithms have recently emerged as a powerful strategy for semi-supervised learning in speech recognition in the era of transformers and large-scale data. In this talk, I will walk you from the first successful pseudo-labeling algorithms based on teacher-student training, which alternate between training a model and generating pseudo-labels (PLs) with it, to continuous pseudo-labeling algorithms, where PLs are generated in an end-to-end manner as training proceeds, improving both training speed and the accuracy of the final model. We will discuss the different aspects of PL algorithms that make them simple and resource-efficient: what exactly the model learns, how the training dynamics change, how speaker diversity and the amount of audio affect training, the dependence on language models, and what the key components of this success are.
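The alternating teacher-student scheme can be sketched in a few lines, with a toy nearest-centroid classifier standing in for the acoustic model (purely illustrative; real speech PL generates transcripts, often filtered with a language model):

```python
import numpy as np

def fit_centroids(X, y):
    """Toy 'model': per-class centroids, a stand-in for acoustic model training."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict(centroids, X):
    """Assign each point to the nearest class centroid."""
    classes = sorted(centroids)
    dists = np.stack([np.linalg.norm(X - centroids[c], axis=1) for c in classes])
    return np.array(classes)[dists.argmin(axis=0)]

rng = np.random.default_rng(0)
# Small labeled set and larger unlabeled set from two well-separated clusters.
X_lab = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(3, 0.3, (20, 2))])
y_lab = np.array([0] * 20 + [1] * 20)
X_unlab = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])

model = fit_centroids(X_lab, y_lab)        # 1. train teacher on labeled data
for _ in range(3):                         # 2. alternate:
    pl = predict(model, X_unlab)           #    generate pseudo-labels (PLs)
    X_all = np.vstack([X_lab, X_unlab])
    y_all = np.concatenate([y_lab, pl])
    model = fit_centroids(X_all, y_all)    #    retrain student on labeled + PLs
```

Continuous PL collapses the outer loop: instead of retraining from scratch each round, PLs are regenerated on the fly as the single training run proceeds.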
Tatiana Likhomanenko (Apple)
Tatiana is a research scientist on the machine learning research team at Apple. Prior to Apple, she was an AI resident and later a postdoctoral research scientist on the speech recognition team at Facebook AI Research. Back in the day, Tatiana received a Ph.D. in mixed-type partial differential equations from Moscow State University. For four years she worked on applications of machine learning to high-energy physics as a researcher in the joint lab of Yandex and CERN, and later at the startup NTechLab, a leader in face recognition. The main focus of her recent research is transformer training and generalization, and speech recognition (semi-, weakly-, and unsupervised learning, domain transfer, and robustness).
Accepted Papers (3 Minutes Each)
Low-cost camera and microphone-based sensors enable applications beyond the limitations of traditional passive sensors. Even with the best intentions, adding cameras and microphones directly to a connected product puts the end-user’s privacy at risk if the device is compromised. Further, many consumers do not trust OEMs to respect their privacy. A standalone image or audio sensor which only outputs information relevant to the application greatly reduces the information available to a potentially compromised system or ill-intentioned OEM. Application-specific sensors lay the foundation for a broader initiative to improve data privacy, product labeling, and standards around connected devices with cameras, microphones, or biometric sensors.
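The privacy argument comes down to a deliberately narrow interface. The sketch below (a hypothetical API, not Useful Sensors' actual product) shows a sensor object whose only output is the application-relevant fact; the raw frame never crosses the boundary:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PersonDetection:
    """The ONLY data the sensor exposes: no pixels, no audio."""
    person_present: bool
    confidence: float

class PersonSensor:
    """Hypothetical application-specific sensor. The raw camera frame is
    consumed inside read() and discarded; callers see only the result."""

    def read(self, frame) -> PersonDetection:
        # Placeholder "inference" over a flat list of pixel values;
        # a real sensor would run an on-module ML model here.
        score = float(sum(frame) / (255 * len(frame)))
        return PersonDetection(person_present=score > 0.5, confidence=score)

sensor = PersonSensor()
result = sensor.read([255] * 10)  # dummy all-white "frame"
```

Even if the host system is compromised, the attacker can only learn what the interface exposes: a boolean and a score, not imagery.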
Nat Jeffries (Useful Sensors)
Nat Jeffries is a founding engineer at Useful Sensors, where he designs privacy-preserving embedded ML sensors. He graduated from Carnegie Mellon University in 2016 with a degree in ECE. He joined Google, where he worked on embedded systems before joining Pete Warden to spin up TensorFlow Lite for Microcontrollers. He has previously spoken at TensorFlow World in São Paulo, Brazil, and guest lectured on TinyML at Harvard.
Born from the high-energy physics community at the Large Hadron Collider, hls4ml is an open-source Python package for machine learning inference on FPGAs (Field-Programmable Gate Arrays). It creates firmware implementations of machine learning algorithms by translating models from traditional open-source machine learning packages into optimized high-level synthesis (HLS) C++ that can then be customized for your use case and implemented on devices such as FPGAs and Application-Specific Integrated Circuits (ASICs). hls4ml can easily scale the implementation of a model to take advantage of the parallel processing capabilities that FPGAs offer, allowing not only low-latency, high-throughput designs but also designs sized to fit on lower-cost, resource-constrained hardware. hls4ml also supports generating accelerators with different drivers that build minimal, self-contained implementations, enabling control from Python or C/C++ with little extra development effort or hardware expertise.
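The main knobs for sizing an hls4ml design are the fixed-point precision and the reuse factor. A sketch of a typical configuration and the conversion flow follows (key names follow the hls4ml documentation but may vary by version; the conversion calls are shown as comments since they require hls4ml, a vendor HLS toolchain, and a trained Keras model):

```python
# Sketch of an hls4ml model-level configuration (assumed typical values):
hls_config = {
    "Model": {
        "Precision": "ap_fixed<16,6>",  # 16-bit fixed point, 6 integer bits
        "ReuseFactor": 4,               # >1 reuses multipliers over cycles:
                                        # fewer DSPs, higher latency
        "Strategy": "Latency",          # or "Resource" for smaller designs
    }
}

# The conversion itself (requires hls4ml and a trained Keras model):
#   import hls4ml
#   hls_model = hls4ml.converters.convert_from_keras_model(
#       keras_model, hls_config=hls_config, output_dir="my_prj")
#   hls_model.compile()          # C simulation for bit-accurate checks
#   hls_model.predict(X_test)    # compare against the float model
#   hls_model.build(synth=True)  # run HLS synthesis to firmware
```

Raising `ReuseFactor` is the usual lever for fitting a model onto a smaller, cheaper FPGA at the cost of throughput.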
Benjamin Hawks (Fermilab) - bhawks@fnal.gov
Ben Hawks is an AI Researcher at Fermi National Accelerator Laboratory, focusing on optimizing and compressing neural networks to be tiny, fast, and accurate for use on FPGAs and other specialized hardware. Since he was young, he’s had a personal interest in computer security, programming, and electronics, and is interested in learning how to make machine learning fair, efficient, and fast. Outside of work, he spends his time messing with electronics, tabletop RPGs, and catering to the whims of a small feline overlord.
On-device training enables a model to adapt to new data collected from sensors by fine-tuning a pre-trained model. Users can benefit from customized AI models without having to transfer their data to the cloud, protecting their privacy. However, training memory consumption is prohibitive for IoT devices, which have tiny memory resources. We propose an algorithm-system co-design framework to make on-device training possible with only 256KB of memory. On-device training faces two unique challenges: (1) the quantized graphs of neural networks are hard to optimize due to low bit precision and the lack of normalization; (2) the limited hardware resources do not allow full back-propagation. To cope with the optimization difficulty, we propose Quantization-Aware Scaling to calibrate the gradient scales and stabilize 8-bit quantized training. To reduce the memory footprint, we propose Sparse Update, which skips the gradient computation of less important layers and sub-tensors. The algorithm innovation is implemented by a lightweight training system, Tiny Training Engine, which prunes the backward computation graph to support sparse updates and offloads runtime auto-differentiation to compile time. Our framework is the first solution to enable tiny on-device training of convolutional neural networks under 256KB of SRAM and 1MB of Flash without auxiliary memory, using less than 1/1000 of the memory of PyTorch and TensorFlow while matching the accuracy on the tinyML application VWW (Visual Wake Words). Finally, we will briefly discuss how similar techniques facilitate efficient LLM inference and training at the other extreme of the scaling spectrum.
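The sparse-update idea can be illustrated in a few lines (a conceptual sketch, not the Tiny Training Engine implementation): layers judged unimportant offline are frozen, so on-device they need no gradient computation or gradient memory at all.

```python
import numpy as np

def sparse_update(weights, grads, update_mask, lr=0.01):
    """Apply SGD only to layers selected as important; frozen layers are
    returned untouched (on-device, their gradients are never computed)."""
    return [
        w - lr * g if selected else w
        for w, g, selected in zip(weights, grads, update_mask)
    ]

# Toy 3-layer model: fine-tune only the last layer.
weights = [np.ones(3), np.ones(3), np.ones(3)]
grads = [np.full(3, 10.0), np.full(3, 10.0), np.full(3, 10.0)]
mask = [False, False, True]
new_w = sparse_update(weights, grads, mask, lr=0.1)
# Layers 0 and 1 are unchanged; only layer 2 moved.
```

In a real system the mask extends below layer granularity to sub-tensors (e.g., a slice of a weight matrix), and the backward graph is pruned at compile time so the skipped gradients cost nothing at runtime.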
Ji Lin (MIT)
Ji Lin is currently a fourth-year Ph.D. student at MIT EECS, advised by Prof. Song Han. His research interests lie in efficient deep learning algorithms and systems. He received his B.Eng. in Electronic Engineering from Tsinghua University and his M.Sc. in EECS from MIT. His work has been covered by media like MIT Tech Review, MIT News, WIRED, Engadget, VentureBeat, etc. His research is sponsored by Qualcomm Innovation Fellowship.
The successful deployment of 5G technology has led to an explosion of new use cases, driving ever-increasing dimensionality in the optimizations needed to improve the power, performance, and area of cellular modems. Hence there is an emerging, competitive need to explore novel solutions that improve cellular performance in ways that translate into real, tangible improvements to the user experience. This talk discusses the role of on-device machine learning in paving the path to a differentiated product with an AI-enabled 5G NR modem. It covers the challenges and considerations in building an ML inference engine for a 5G modem system, as well as on-device ML aspects beyond inference.
Ankita Nayak (Qualcomm, Stanford)
Ankita is a Senior Staff Engineer in Wireless R&D at Qualcomm Technologies, Inc. She has over eleven years of research and product development experience in hardware and software systems for deep learning, computer vision, and wireless domains. She works on on-device ML initiatives for AI/ML-enabled 5G modems. She is also pursuing a doctoral degree at Stanford University, where her research focuses on energy-efficient agile hardware systems for deep learning and computer vision. Ankita has served as a reviewer and technical program committee member, and has published at various systems and architecture conferences.
Join our invited speakers for a Q/A and discussion session covering cross-cutting topics in the field of on-device intelligence.
Moderator: Colby Banbury
Matthew Stewart