Dr. Yanzhi Wang is currently an associate professor and faculty fellow in the Department of Electrical and Computer Engineering at Northeastern University, Boston, MA. He received his B.S. degree from Tsinghua University in 2009 and his Ph.D. degree from the University of Southern California in 2014. His research focuses on model compression and platform-specific acceleration of deep learning applications. His work has been published broadly in top conference and journal venues (e.g., DAC, ICCAD, ASPLOS, ISCA, MICRO, HPCA, PLDI, ICS, PACT, ISSCC, AAAI, ICML, NeurIPS, CVPR, ICLR, IJCAI, ECCV, ICDM, ACM MM, FPGA, LCTES, CCS, VLDB, ICDCS, RTAS, INFOCOM, CACM, JSSC, TComputer, TCAS-I, TCAD, JSAC, TNNLS, etc.) and has been cited more than 16,000 times. He has received six Best Paper and Top Paper Awards and one Communications of the ACM cover-featured article, along with 13 Best Paper Nominations and four Popular Paper Awards. He has received the U.S. Army Young Investigator Program (YIP) Award, the IEEE TC-SDM Early Career Award, the APSIPA Distinguished Leader Award, the Massachusetts Acorn Innovation Award, the Martin Essigmann Excellence in Teaching Award, the Ming Hsieh Scholar Award, and other research awards from Google, MathWorks, etc. He has received 26 federal grants from NSF, DARPA, IARPA, ARO, AFRL/AFOSR, the Department of Homeland Security, etc. He has participated in grants totaling $40M, with a personal share of $8.5M. Eleven of his academic descendants have become tenure-track faculty at the University of Connecticut, Clemson University, Chongqing University, the University of Georgia, Beijing University of Technology, the University of Texas at San Antonio, and Cleveland State University, and they have collectively secured around $4M in personal-share funding.
GPT and Stable Diffusion on the Mobile: Towards Ultimate Efficiency in Deep Learning Acceleration
Abstract: Mobile and embedded computing devices have become key carriers of deep learning, facilitating the widespread use of machine intelligence. However, achieving real-time DNN inference on edge devices remains a widely recognized challenge because of the limited computation and storage resources on such devices. Model compression of DNNs, including weight pruning and weight quantization, has been investigated to overcome this challenge. However, current work on DNN compression suffers from the limitation that accuracy and hardware performance are conflicting goals that are difficult to satisfy simultaneously.
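For readers unfamiliar with the two compression techniques named above, the following minimal NumPy sketch illustrates magnitude-based weight pruning and uniform 8-bit weight quantization on a toy weight tensor; it is a hypothetical illustration, not the speaker's implementation, and the function names and sparsity/bit-width settings are assumptions.

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.9):
    """Zero out the smallest-magnitude entries so that a `sparsity`
    fraction of the weights becomes zero (unstructured pruning)."""
    threshold = np.quantile(np.abs(weights), sparsity)
    mask = np.abs(weights) > threshold
    return weights * mask

def uniform_quantize(weights, num_bits=8):
    """Uniform symmetric quantization to `num_bits`-bit integers,
    then de-quantization back to floats to simulate inference."""
    qmax = 2 ** (num_bits - 1) - 1             # e.g. 127 for 8 bits
    scale = np.max(np.abs(weights)) / qmax     # one scale per tensor
    q = np.clip(np.round(weights / scale), -qmax, qmax)
    return q * scale

# Toy layer: prune 90% of the weights, then quantize the rest to 8 bits.
w = np.random.randn(256, 256).astype(np.float32)
w_compressed = uniform_quantize(magnitude_prune(w, sparsity=0.9))
```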
We present our recent work, Compression-Compilation Co-Design, which overcomes this limitation and moves toward the best possible DNN acceleration on edge devices. The neural network model is optimized hand in hand with compiler-level code generation, achieving the best possible hardware performance while maintaining zero accuracy loss, which is beyond the capability of prior work. We achieve real-time on-device execution of a number of DNN tasks, including object detection, pose estimation, activity detection, and speech recognition, using only an off-the-shelf mobile device, with up to 180× speedup compared with prior work. Recently, for the first time, we have enabled large-scale language and AIGC models such as GPT and Stable Diffusion on mobile devices. We will also introduce our breakthroughs in digital avatars, Stable Diffusion for video generation, and interaction systems. Finally, we will introduce our recent breakthrough in superconducting-logic-based neural network acceleration, which achieves a 10^6× energy-efficiency gain compared with state-of-the-art solutions, reaching the quantum limit in computing.
Dr. Xiaojun Ruan is an Associate Professor in the Department of Computer Science at California State University, East Bay. He earned his Ph.D. in Computer Science and Software Engineering from Auburn University in 2011, following a B.E. in Computer Science and Technology from Shandong University in Jinan, China, in 2005. Prior to his current role, he held a tenured position as an Associate Professor of Computer Science at West Chester University of Pennsylvania. His research interests include Cloud Computing, Distributed and Parallel Computing, Computer Security, Energy-Efficient Computing, Data Storage Systems, and Artificial Intelligence.
Virtual machine allocation and migration based on performance-to-power ratio in energy-efficient clouds
The last decade has witnessed dramatic advances in cloud computing research and techniques. One of the key challenges in this field is reducing the massive energy consumption of cloud computing data centers. Many power-aware virtual machine (VM) allocation and consolidation approaches have been proposed to reduce energy consumption efficiently. However, most existing energy-efficient cloud solutions save energy at the cost of significant performance degradation. In this paper, we propose a strategy to calculate the optimal working utilization levels for host computers. Because performance and power data must be measured on real platforms, to make our design practical we propose a strategy named “PPRGear,” which samples utilization levels with distinct Performance-to-Power Ratios (PPR), where PPR is calculated as the number of server-side Java operations completed during a given time period divided by the average active power consumption over that period. In addition, we present a framework for virtual machine allocation and migration that leverages the PPR of each host type. By achieving the optimal balance between host utilization and energy consumption, our framework ensures that host computers run at their most power-efficient utilization levels, i.e., the levels with the highest PPR, thus substantially reducing energy consumption with negligible performance sacrifice. Our extensive experiments with real-world traces show that, compared with three baseline energy-efficient VM allocation and selection algorithms, IqrMc, MadMmt, and ThrRs, our framework reduces energy consumption by up to 69.31% for various host computer types, with fewer migrations, fewer shutdowns, and little performance degradation for cloud computing data centers.
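As a concrete illustration of the PPR metric defined above, the short sketch below computes PPR for a set of sampled utilization levels and selects the most power-efficient level (the one with the highest PPR). The measurement values are hypothetical placeholders, not data from the paper.

```python
# Hypothetical sampled measurements for one host type:
# utilization level -> (server-side Java operations completed, average active power in watts)
samples = {
    0.2: (120_000,  90.0),
    0.4: (260_000, 130.0),
    0.6: (420_000, 185.0),
    0.8: (540_000, 260.0),
    1.0: (600_000, 330.0),
}

# PPR = operations completed in the period / average active power over that period
ppr = {u: ops / power for u, (ops, power) in samples.items()}

# The most power-efficient working utilization level is the one with the highest PPR.
best_utilization = max(ppr, key=ppr.get)
print(f"target utilization: {best_utilization:.0%}, PPR = {ppr[best_utilization]:.1f} ops/W")
```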