Invited talks

Efficient 3D Perception for Autonomous Vehicles

Speaker: Zhijian Liu

Abstract: Autonomous vehicles rely on 3D perception to understand their surrounding environment. Although there has been remarkable progress in improving the accuracy of perception models, their efficiency still falls short of real-time requirements, impeding their use in real-world applications. In this talk, I will present our recent work, BEVFusion (ICRA 2023), which enables efficient multi-task multi-sensor fusion by unifying camera, LiDAR, and radar features in a shared bird's-eye view (BEV) space. We addressed a key efficiency bottleneck by accelerating the view transformation operator by 40 times. BEVFusion ranked first on three popular 3D perception benchmarks, nuScenes, Argoverse, and Waymo, across tasks such as object detection, object tracking, and map segmentation. I will then discuss two of our latest works, FlatFormer (CVPR 2023) and SparseViT (CVPR 2023), which accelerate 3D point cloud and 2D image backbones for perception. FlatFormer is an efficient point cloud transformer that attains real-time performance on edge GPUs and is faster than sparse convolutional methods while delivering superior accuracy. SparseViT explores spatial sparsity in 2D image transformers and delivers a 1.5 times measured speedup over its dense counterpart without compromising accuracy.
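
For readers unfamiliar with the approach, the sketch below illustrates the core idea of BEV-space fusion in PyTorch: once each sensor's features have been lifted into the same bird's-eye-view grid, task-agnostic fusion reduces to concatenation followed by convolution. The channel sizes and module names are illustrative assumptions, not the official BEVFusion implementation.

    # Minimal sketch of BEV-space sensor fusion (illustrative, not the
    # official BEVFusion code; channel sizes are assumptions).
    import torch
    import torch.nn as nn

    class BEVFuser(nn.Module):
        def __init__(self, cam_ch=80, lidar_ch=256, out_ch=256):
            super().__init__()
            # A small convolutional encoder mixes the concatenated BEV features.
            self.fuse = nn.Sequential(
                nn.Conv2d(cam_ch + lidar_ch, out_ch, 3, padding=1),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            )

        def forward(self, cam_bev, lidar_bev):
            # cam_bev:   (B, cam_ch,   H, W) from the camera view transform
            # lidar_bev: (B, lidar_ch, H, W) from the LiDAR encoder
            return self.fuse(torch.cat([cam_bev, lidar_bev], dim=1))

    fused = BEVFuser()(torch.randn(1, 80, 180, 180), torch.randn(1, 256, 180, 180))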

Bio: Zhijian Liu is a fifth-year Ph.D. student at MIT, advised by Professor Song Han. He received his B.Eng. degree from Shanghai Jiao Tong University in 2018. His research focuses on developing efficient algorithms and systems for deep learning, with applications in computer vision. His work on efficient 3D deep learning has ranked first on multiple competitive 3D benchmarks, won first place in the nuScenes LiDAR segmentation challenge, and been featured by media outlets such as NVIDIA News and MIT News. He is a recipient of the Qualcomm Innovation Fellowship, the NVIDIA Graduate Fellowship (declined), and the MIT Ho-Ching and Han-Ching Fund Award.


MobileNets: From First Principles to Adapting to Real World Scenarios

Speaker: Andrew Howard

Abstract: We present the building blocks of efficient computer vision models and how they are used in the MobileNet models. After introducing the base models, we focus on how to adapt them to different real-world scenarios. We present adaptation to individual hardware targets as well as to an ecosystem of hardware. We then show how to adapt models to perform better on an image-by-image basis, and conclude with adapting to a user's intent to improve computer vision models.
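
As background for the talk, the sketch below shows the depthwise separable convolution at the heart of the MobileNet family: a standard convolution is factorized into a per-channel spatial filter and a 1x1 channel mixer, cutting compute roughly by a factor of the kernel area. Layer sizes here are illustrative.

    # Depthwise separable convolution, the basic MobileNet building block
    # (a minimal sketch; layer sizes are illustrative).
    import torch
    import torch.nn as nn

    class DepthwiseSeparableConv(nn.Module):
        def __init__(self, in_ch, out_ch, stride=1):
            super().__init__()
            # Depthwise: one 3x3 filter per input channel (groups=in_ch).
            self.depthwise = nn.Conv2d(in_ch, in_ch, 3, stride=stride,
                                       padding=1, groups=in_ch, bias=False)
            # Pointwise: 1x1 convolution mixes information across channels.
            self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
            self.bn1, self.bn2 = nn.BatchNorm2d(in_ch), nn.BatchNorm2d(out_ch)
            self.act = nn.ReLU6(inplace=True)  # ReLU6, as used in MobileNets

        def forward(self, x):
            x = self.act(self.bn1(self.depthwise(x)))
            return self.act(self.bn2(self.pointwise(x)))

    y = DepthwiseSeparableConv(32, 64)(torch.randn(1, 32, 112, 112))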

Bio: Andrew Howard is a Senior Staff Software Engineer at Google Research, working on efficient computer vision models for mobile applications. He leads a team of applied researchers focusing on both novel research and production use cases. Andrew is the originator of Google’s popular MobileNet models. He received his PhD in computer science from Columbia University, focusing on machine learning.


Efficient Deep Learning Computing with Sparsity

Speaker: Song Han

Abstract: Modern deep learning requires a massive amount of computational resources, energy, and engineering effort. The first principle of efficient AI computing is to be lazy: avoid redundant computation, quickly reject the work, or delay the work. In this talk, I will start by presenting a series of works aimed at accelerating generative AI models, including SIGE, FastComposer, GAN Compression, DiffAugment, and Anycost GANs. I will then introduce two efficient LLM projects, SmoothQuant and AWQ, and provide demonstrations of their practical applications. Finally, I will conclude with a TinyML project focused on on-device training under 256 KB of memory. The presentation will highlight full-stack optimizations that open up a larger design space and unearth the underlying principles of sparsity and efficient AI.
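
As a taste of the LLM quantization work, the sketch below illustrates the smoothing transform behind SmoothQuant: a per-channel scale migrates activation outliers into the weights so that both become easier to quantize, while the layer's output is mathematically unchanged. The alpha value and tensor shapes are illustrative assumptions; see the paper for the full quantization pipeline.

    # Sketch of the SmoothQuant smoothing transform (illustrative shapes).
    import torch

    def smooth(x, w, alpha=0.5):
        # x: (tokens, c_in) activations, w: (c_in, c_out) weights; y = x @ w.
        # Per-input-channel scales move activation outliers into the weights.
        s = (x.abs().amax(dim=0).clamp(min=1e-5) ** alpha
             / w.abs().amax(dim=1).clamp(min=1e-5) ** (1 - alpha))
        return x / s, w * s[:, None]  # x @ w is unchanged: (x/s) @ (s*w)

    x, w = torch.randn(16, 64) * 10, torch.randn(64, 64)
    xs, ws = smooth(x, w)
    assert torch.allclose(x @ w, xs @ ws, rtol=1e-3, atol=1e-3)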

Bio: Song Han is an associate professor at MIT EECS. He received his PhD degree from Stanford University. He proposed the “Deep Compression” technique, including pruning and quantization, that is widely used for efficient AI computing, and the “Efficient Inference Engine” that first brought weight sparsity to modern AI chips. He pioneered TinyML research that brings deep learning to IoT devices, enabling learning on the edge. His team’s work on hardware-aware neural architecture search (once-for-all network) enables users to design, optimize, shrink, and deploy AI models on resource-constrained hardware devices, and won first place in many low-power computer vision contests at flagship AI conferences. Song received best paper awards at ICLR and FPGA, and faculty awards from Amazon, Facebook, NVIDIA, Samsung, and Sony. He was named one of MIT Technology Review’s “35 Innovators Under 35” for his contribution to the “deep compression” technique that “lets powerful artificial intelligence (AI) programs run more efficiently on low-power mobile devices.” He received the NSF CAREER Award for “efficient algorithms and hardware for accelerated machine learning,” the IEEE “AI’s 10 to Watch: The Future of AI” award, and a Sloan Research Fellowship.


Generative AI for Data-Efficient Machine Learning

Speaker: Oncel Tuzel

Abstract: How do we increase the size of our training data without collecting more? Can we go beyond simple data augmentation? A major bottleneck in building reliable machine learning systems is collecting and annotating large datasets. In this talk, we will discuss our recent work on a sequence generative model that improves downstream recognition tasks by synthesizing missing training data through control of content and style. We will present results from applying this technique to handwriting and speech recognition, and share insights and techniques for addressing the domain gap between the synthetic and real data distributions.
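
Abstractly, the training recipe resembles the hypothetical sketch below: draw synthetic samples from a generator conditioned on content and style, and mix them with real data when training the recognizer. The generator interface and mixing ratio are assumptions for illustration, not the specific model from the talk.

    # Hypothetical sketch of mixing real and synthetic training data;
    # `generator` stands in for a content/style-conditional generative model.
    import random

    def make_training_batch(real_batch, generator, contents, styles,
                            synth_ratio=0.5):
        # Replace a fraction of the batch with synthetic (input, label) pairs.
        batch = list(real_batch)
        n_synth = int(len(batch) * synth_ratio)
        for i in range(n_synth):
            content = random.choice(contents)  # e.g. a text string to render
            style = random.choice(styles)      # e.g. a writer or speaker style
            batch[i] = (generator(content, style), content)  # label = content
        random.shuffle(batch)
        return batch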

Bio: Oncel Tuzel is a senior principal researcher and research manager on the MIND team at Apple. He received his Ph.D. from the computer science department at Rutgers University in 2008. His research interests are broadly in machine learning, particularly generative AI and the computational and data efficiency of learning algorithms. He has co-authored over 70 peer-reviewed publications and holds over 50 US and international patents. His work received the best paper award at CVPR 2017, the best paper runner-up award at CVPR 2007, and a 2014 R&D 100 Award, given to the 100 most innovative technologies introduced in 2013.


CLIP for Less: Efficient Training and Adaptation of Foundational Models

Speaker: Kate Saenko

Bio: Kate is an AI Research Scientist at FAIR, Meta, and a Full Professor of Computer Science at Boston University (currently on leave), where she leads the Computer Vision and Learning Group. Kate received a PhD in EECS from MIT and did postdoctoral training at UC Berkeley and Harvard. Her research interests are in artificial intelligence, with a focus on out-of-distribution learning, dataset bias, domain adaptation, vision and language understanding, and other topics in deep learning.


Mobile Super Resolution: MLPerf App - Benchmarking and Challenges

Speakers: David Tafur & Sanghyun Son

Bio: David Tafur is a Product Manager at MLCommons. He graduated magna cum laude with a BSc in Industrial Engineering from the University of Lima and the University of Queensland, Australia, and has over six years of experience managing digital products across a variety of industries, including AI/ML, banking, and SaaS. At MLCommons, he plays a critical role in the development of the MLPerf App and other benchmarking tools.

Sanghyun Son is a Ph.D. candidate in the Department of Electrical and Computer Engineering at Seoul National University (SNU), Seoul, Korea. He graduated summa cum laude from SNU with a B.S. degree in Electrical and Computer Engineering in 2017. He is interested in low-level computer vision problems, including image super-resolution, denoising, and deblurring. He co-authored EDSR, one of the representative super-resolution algorithms of the deep learning era. More recently, he has been working to generalize super-resolution toward more practical applications.


Increasing Efficiency by Reducing Redundancy

Speaker: Judy Hoffman

Abstract: The best computer vision models today require training large models on large datasets, which makes both training and inference expensive. In this talk, we will discuss ways to increase efficiency both at training and at test time. At training time, we will present ZipIt!, an algorithm that merges multiple pre-trained models into a single model with their combined capabilities without retraining, thus avoiding redundant training for overlapping data or tasks. To increase inference-time efficiency, we will discuss Token Merging (ToMe), a method that explicitly reduces computational redundancy by finding and merging similar tokens before processing, significantly increasing inference speed without additional training.
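
To make the token-merging idea concrete, here is a simplified sketch; ToMe itself uses a lightweight bipartite soft matching inside each transformer block, and the token counts and r below are illustrative.

    # Simplified token merging in the spirit of ToMe (illustrative, not the
    # paper's exact bipartite soft matching).
    import torch
    import torch.nn.functional as F

    def merge_tokens(x, r):
        # x: (n_tokens, dim). Split tokens into two alternating sets, match
        # each token in A to its most similar token in B, and merge the r
        # most similar pairs by averaging.
        a, b = x[0::2], x[1::2]
        sim = F.normalize(a, dim=-1) @ F.normalize(b, dim=-1).T  # cosine sim
        best_sim, best_b = sim.max(dim=-1)      # best B match per A token
        merged = best_sim.topk(r).indices       # A tokens in the r best pairs
        keep = torch.ones(a.shape[0], dtype=torch.bool)
        keep[merged] = False
        b = b.clone()
        b[best_b[merged]] = (b[best_b[merged]] + a[merged]) / 2
        return torch.cat([a[keep], b], dim=0)   # n_tokens - r tokens remain

    out = merge_tokens(torch.randn(196, 384), r=16)  # 196 -> 180 tokens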

Bio: Dr. Judy Hoffman is an Assistant Professor in the School of Interactive Computing at Georgia Tech and a member of the Machine Learning Center. Her research lies at the intersection of computer vision and machine learning, with specializations in domain adaptation, transfer learning, adversarial robustness, and algorithmic fairness. She has received numerous awards, including the NSF CAREER Award (2022), the Google Research Scholar Award (2022), the Samsung AI Researcher of the Year Award (2021), the NVIDIA Female Leader in Computer Vision Award (2020), recognition among AMiner's top 100 most influential scholars in machine learning (2020), selection as an MIT EECS Rising Star (2015), and a Diversity and Inclusion Fellowship (2021-2022); she also served as a CVPR 2023 Program Co-Chair. In addition to her research, she co-founded and continues to advise Women in Computer Vision, an organization that provides mentorship and travel support for early-career women in the computer vision community. Prior to joining Georgia Tech, she was a Research Scientist at Facebook AI Research. She received her PhD in Electrical Engineering and Computer Science from UC Berkeley in 2016, after which she completed postdocs at Stanford University and UC Berkeley.