Invited Talks

Quickly Generating Diverse Valid Test Inputs with Reinforcement Learning

[Presentation]

Abstract

Test input generators are a popular tool for assuring the quality of software and hardware. A test input generator is a non-deterministic function that produces a large number of randomized inputs for a particular test driver. However, when the test driver imposes strict validity constraints on its inputs, purely random generation fails to produce enough valid inputs. Existing approaches to this problem rely on whitebox or greybox information collected by instrumenting the input generator and/or test driver; collecting such information, however, reduces the speed at which tests can be executed. This talk discusses a blackbox approach to generating valid test inputs, based on reinforcement learning (RL). First, I will show how the problem can be framed as an RL problem. Then, I will present our solution, RLCheck, which uses a tabular, on-policy RL approach to guide the test input generator. I will outline the keys to RLCheck's success on our software benchmarks, in particular the choice of a good level of state/action abstraction. I will also present results for a deep Q-learning prototype of RLCheck, which uses curiosity to increase the diversity of the valid inputs it generates, and highlight the strengths and weaknesses of this approach.
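
To make the idea concrete, here is a minimal sketch of how a tabular, on-policy learner can guide a generator toward valid, diverse inputs. The toy generator, the coarse state abstraction, and the reward values below are illustrative assumptions built around a sorted-list validity constraint; they are not RLCheck's actual design.

    import random
    from collections import defaultdict

    # Toy validity constraint: a generated list is valid only if sorted.
    # The state abstraction, epsilon-greedy rule, and reward values are
    # illustrative assumptions, not RLCheck's exact design.
    EPSILON, ALPHA = 0.25, 0.5
    q_table = defaultdict(float)      # (abstract_state, action) -> value

    def choose(state, actions):
        # Epsilon-greedy choice over the generator's available actions.
        if random.random() < EPSILON:
            return random.choice(actions)
        return max(actions, key=lambda a: q_table[(state, a)])

    def generate_list(length, lo=0, hi=100):
        # Abstract state: (position, coarse bucket of the previous value).
        trace, out, prev = [], [], lo
        for i in range(length):
            state = (i, prev // 10)
            action = choose(state, range(lo, hi))
            trace.append((state, action))
            out.append(action)
            prev = action
        return out, trace

    def learn(trace, reward):
        # Tabular update applied to every choice made during generation.
        for state, action in trace:
            q = q_table[(state, action)]
            q_table[(state, action)] = q + ALPHA * (reward - q)

    seen, valid = set(), 0
    for _ in range(2000):
        xs, trace = generate_list(5)
        is_valid = all(a <= b for a, b in zip(xs, xs[1:]))
        is_new = tuple(xs) not in seen
        seen.add(tuple(xs))
        # Reward validity, with a bonus for novel (diverse) valid inputs.
        learn(trace, 1.0 if is_valid and is_new else 0.2 if is_valid else -1.0)
        valid += is_valid
    print(f"valid inputs: {valid}/2000")

Because the learner receives only a validity verdict per generated input, this remains a blackbox approach: no instrumentation of the generator or test driver is required.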

Bio

Caroline Lemieux is a PhD candidate at UC Berkeley, advised by Koushik Sen. Her research interests center on improving the correctness and reliability of software systems by developing automated methods for engineering tasks such as testing, debugging, and comprehension. Her current projects tackle these goals with a focus on fuzz testing and program synthesis. Her work on fuzz testing has been recognized with an ACM SIGSOFT Distinguished Paper Award, a Distinguished Artifact Award, a Tool Demonstration Award, and a Best Paper Award (Industry Track). Before Berkeley, she received her B.Sc. in Computer Science and Mathematics from the University of British Columbia, where she won the Governor General's Silver Medal in Science, awarded to the undergraduate student with the highest standing in the Faculty of Science. She is the recipient of a Berkeley Fellowship for Graduate Study and a Google PhD Fellowship in Programming Technologies and Software Engineering.

DNN Training Acceleration through Better Communication-Computation Overlap

[Presentation]

Abstract

As deep learning continues to revolutionize a variety of domains, training of Deep Neural Networks (DNNs) is emerging as a prominent workload in data centers. Data-parallel DNN training is commonly employed for scalability. However, the interplay between communication and computation, a key factor affecting DNN training throughput, is often overlooked in this network- and compute-intensive workload. In this talk, I will shed light on the communication-computation interdependencies that are critical for DNN training acceleration, and present two systems that significantly improve training performance by leveraging this understanding. I will first discuss the two dominant communication paradigms, Parameter Server and AllReduce, and examine the scalability challenges in each of them. I will then present TicTac, a system that improves training throughput by up to 37% through computation-aware parameter transfer scheduling in Parameter Servers. Finally, I will explain why the same problem requires a different approach under AllReduce, and introduce our system, Caramel, which improves training throughput under AllReduce by up to 3.62x by scheduling computation to achieve better communication-computation overlap.
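
As a rough illustration of why transfer ordering matters, the following toy simulation compares a naive parameter transfer order against one that matches the order in which the forward pass consumes parameters. The layer names, timings, and the iteration_time helper are hypothetical assumptions for intuition only; TicTac's actual schedulers operate on dataflow graphs, not this simplified model.

    # Each layer needs its parameters transferred before its forward pass
    # can run; transfers share one serialized network link.
    layers = ["conv1", "conv2", "fc1", "fc2"]        # forward-pass order
    transfer_ms = {"conv1": 2, "conv2": 4, "fc1": 10, "fc2": 8}
    compute_ms = {"conv1": 6, "conv2": 6, "fc1": 3, "fc2": 2}

    def iteration_time(transfer_order):
        # Transfers proceed in the given order on the link; each layer's
        # computation starts once the previous layer is done AND its own
        # parameters have arrived.
        ready_at, t_net = {}, 0.0
        for p in transfer_order:
            t_net += transfer_ms[p]
            ready_at[p] = t_net
        t_compute = 0.0
        for layer in layers:
            start = max(t_compute, ready_at[layer])  # stall if params late
            t_compute = start + compute_ms[layer]
        return t_compute

    # Largest-first transfers starve the early layers: 41.0 ms per step.
    print(iteration_time(["fc1", "fc2", "conv2", "conv1"]))
    # Computation-aware ordering overlaps transfers with compute: 26.0 ms.
    print(iteration_time(["conv1", "conv2", "fc1", "fc2"]))

The same total bytes move over the network in both cases; only the schedule changes, which is the sense in which better overlap, rather than more bandwidth, drives the speedup.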

Bio

Sangeetha Abdu Jyothi is an incoming Assistant Professor in the Department of Computer Science at the University of California, Irvine (from July 2020). Her research interests are in the broad areas of computer networking and systems, with a current focus on systems and machine learning. Her current work includes building high-performance systems for machine learning and leveraging learning techniques to design efficient, verifiable, and interpretable control for systems. She is currently spending a year at VMware Research as a postdoctoral researcher after completing her Ph.D. at the University of Illinois at Urbana-Champaign in 2019. She is a winner of the Facebook Graduate Fellowship (2017) and was invited to attend the Heidelberg Laureate Forum (2019) and the Rising Stars in EECS Workshop at MIT (2018).

Website: https://www.ics.uci.edu/~sabdujyo

Algorithm-Hardware Co-Design for Edge DNN Deployment and ML for Hardware Design

[Presentation]

Abstract

Deploying deep neural networks (DNNs) on embedded systems for computer vision is challenging due to limited compute resources and strict energy budgets. In this talk, I will present an algorithm-hardware co-design approach we adopt to accelerate DNN models on embedded FPGAs for computer vision tasks such as image classification and object detection. Our DNN designs employ novel operators, such as the shift operation and the deformable convolution, and we modify both the models and the operations to make them more compatible with FPGA architectures. For image classification, we developed DiracDeltaNet, a ConvNet that uses only 1x1 convolutions, with spatial convolutions replaced by more efficient shift operations. For object detection, we designed an efficient anchor-free DNN model and co-designed the deformable convolution operation. In both works, we achieve real-time inference with competitive accuracy on the target embedded FPGA. We use High-Level Synthesis (HLS) to generate hardware for the FPGA accelerators, which greatly reduces our design time. Since the quality of results from HLS is strongly influenced by the optimization decisions made in the frontend compiler, we apply machine learning to drive those decisions. More specifically, we apply deep reinforcement learning to the phase-ordering problem in the HLS compiler and show that RL significantly improves circuit performance compared to the -O3 compiler flag, achieving results competitive with state-of-the-art solutions while requiring fewer samples.
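
For intuition, here is a minimal sketch of the shift operation in NumPy. The fixed per-channel shift assignment shown is an illustrative assumption in the spirit of shift-based ConvNets like DiracDeltaNet: the shift itself needs no multiplications, leaving cross-channel mixing to the subsequent 1x1 convolutions, which is what makes the operator attractive on FPGAs.

    import numpy as np

    def shift_op(x):
        # x: feature map of shape (C, H, W). Each channel is spatially
        # shifted by one fixed offset from a 3x3 neighborhood, with
        # zero-padding at the borders; no arithmetic is performed.
        c, h, w = x.shape
        offsets = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
        out = np.zeros_like(x)
        for i in range(c):
            dy, dx = offsets[i % len(offsets)]   # fixed per-channel shift
            src = x[i,
                    max(0, -dy):h - max(0, dy),
                    max(0, -dx):w - max(0, dx)]
            out[i,
                max(0, dy):h - max(0, -dy),
                max(0, dx):w - max(0, -dx)] = src
        return out

    x = np.random.rand(16, 8, 8).astype(np.float32)
    y = shift_op(x)      # spatial information mixing, zero multiplications
    print(y.shape)       # (16, 8, 8)

In hardware, such a shift reduces to rewiring or address offsets rather than multiply-accumulate logic, which is why replacing spatial convolutions with shifts plus 1x1 convolutions maps well onto resource-constrained FPGAs.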

Bio

Qijing Jenny Huang is a PhD student at the University of California, Berkeley, advised by Prof. John Wawrzynek. Her interests are in computer architecture, computer-aided design, reconfigurable computing, and machine learning. She has been working on building efficient FPGA accelerators for emerging ML applications, HLS-based hardware/software flows, and ML-assisted HLS and compiler transformations. Her thesis work focuses on novel design and scheduling techniques for accelerating machine learning algorithms on heterogeneous spatial architectures. She received her B.A.Sc. in Electrical and Computer Engineering from the University of Toronto, where she was granted the University of Toronto Excellence Award for her undergraduate research.