Invited talks

Topic: Efficient Single- and Multi-Modal Vision Learning with Language


Speaker: Diana Marculescu, The University of Texas at Austin


Bio: Diana Marculescu is Department Chair, Cockrell Family Chair for Engineering Leadership #5, and Professor, Motorola Regents Chair in Electrical and Computer Engineering #2, at The University of Texas at Austin. She is also the Founding Director of the iMAGiNE Consortium on Intelligent Machine Engineering, a joint industry-university partnership focused on engineering the machines that support intelligent applications from cloud to edge. Prior to joining UT Austin in 2019, she was the David Edward Schramm Professor of Electrical and Computer Engineering at Carnegie Mellon University. Her research interests include energy- and reliability-aware computing, hardware-aware machine learning, and computing for sustainability and natural science applications. Diana is a recipient of multiple best paper, research achievement, and mentoring awards. She is a Lifetime ACM Fellow, a Fellow of IEEE, and a Fellow of AAAS.





Efficient Multi-Modal LLM


Speaker: Song Han, MIT


Bio: Song Han is an associate professor at MIT EECS and a distinguished scientist at NVIDIA. He received his PhD degree from Stanford University. He proposed the “Deep Compression” technique, including pruning and quantization, that became the standard lexicon for efficient AI computing, and the “Efficient Inference Engine” that first brought weight sparsity to modern AI chips, one of the top-5 cited papers in 50 years of ISCA. He pioneered the TinyML research that brings deep learning to IoT devices. His team’s recent work on LLM quantization and acceleration (SmoothQuant, AWQ, StreamingLLM) improved the efficiency of LLM inference and was adopted by NVIDIA TensorRT-LLM. Song received best paper awards at ICLR, FPGA, and MLSys, the NSF CAREER Award, and a Sloan Research Fellowship. Song was named one of the “35 Innovators Under 35” by MIT Technology Review.





MobileNetV4 - Universal Models for the Mobile Ecosystem


Speaker: Danfeng Qin, Google Research


Bio: Danfeng Qin is a software engineer at Google Research, co-leading the MobileNet-V4 and Mobile VLM projects. Her current research includes efficient model architecture design and model-hardware co-design.





Efficient Gaussian Splatting


Speaker: Forrest Iandola, Meta


Bio: Forrest Iandola completed a PhD in EECS at UC Berkeley, where his research focused on squeezing deep neural networks onto small devices. As part of his dissertation research, he developed the energy-efficient SqueezeNet neural network. His advances in deep learning led to the founding of DeepScale, which was acquired by Tesla in 2019. He is currently an AI Research Scientist at Meta.





Topic: Matformers -- Nested Transformers for Elastic Inference


Speaker: Prateek Jain, Google Research


Bio: Prateek Jain is a Principal Scientist/Director at Google DeepMind India, where he also leads the Machine Learning and Optimization team. His research interests are in machine learning, efficient and elastic inference of large models, retrieval and indexing, rigorous non-convex optimization, and reinforcement learning. Earlier, he was a Sr. Principal Research Scientist at Microsoft Research India. He completed his PhD at the University of Texas at Austin under Prof. Inderjit S. Dhillon.





Scalable 3D/4D Assets Creation


Speaker: Zhangyang “Atlas” Wang, The University of Texas at Austin


Bio: Atlas Wang is a tenured Associate Professor and holds the Temple Foundation Endowed Faculty Fellowship #7 in the Chandra Family Department of Electrical and Computer Engineering at The University of Texas at Austin. He leads the VITA group at UT (https://vita-group.github.io/). Since May 2024, he has been on leave from UT Austin to serve as the full-time Research Director for XTX Markets, heading their new AI Lab in New York City. Dr. Wang’s core research mission is to leverage, understand, and expand the role of low dimensionality in ML and optimization, whose impact spans many important topics such as efficiency and trust in large language models (LLMs) as well as generative vision.





Distribution-aware Post-training Quantization for Large Vision Language Models


Speaker: Huanrui Yang, University of Arizona


Bio: Huanrui Yang is an incoming Assistant Professor in the ECE Department at the University of Arizona. He was a Postdoctoral Scholar in Berkeley AI Research under Prof. Kurt Keutzer. He received his PhD in ECE from Duke University in 2022 under Prof. Helen Li and Prof. Yiran Chen. His primary research aims to improve the efficiency and robustness of deep learning algorithms, with applications spanning computer vision, generative models, and natural language processing. Huanrui has published multiple peer-reviewed papers in top-tier journals and conferences and received the 2021 Shanghai World AI Conference Yunfan Rising Star Award.





Efficient Vision Foundation Models: Backbones and PEFT


Speaker: Pavlo Molchanov, NVIDIA Research


Bio: Pavlo Molchanov is a Distinguished Research Scientist and Team Manager at NVIDIA Research, where he leads the Deep Learning Efficiency Team. He obtained a PhD from Tampere University of Technology, Finland, in 2014. During his studies, he received the Nokia Foundation Scholarship, the GETA Graduate School grant, a Best Paper Award, and the Young Researcher Award at EuRAD. Recently, he has focused on efficiency in LLMs and multi-modal models: compression, NAS-like acceleration, novel architectures, and adaptive/conditional inference. His past research has led to several NVIDIA product integrations: hand, body, and facial keypoint estimation and recognition in DriveIX, Broadcast, Omniverse, and Maxine; efficient vision backbones in TAO; compression techniques in TAO, NVIDIA AV, and TRT Model Optimization; and small in-game LLMs.