Workshop Program

[March 25, 2018]

(1.00 - 1.50 pm EST) Keynote Talk: Kamal Khouri, NXP Semiconductors

(2.00 - 2.45 pm EST) Invited Talk: Charles Qi, Tensilica/Cadence

(2.50 - 3.00 pm EST) Short Break

(3.00 - 3.45 pm EST) Invited Talk: To Be Announced

(3.50 - 5.30 pm EST) Paper Presentations: To Be Announced

(5.30 - 6.00 pm EST) ReQuEST Overview: Grigori Fursin, dividiti and cTuning

Keynote Talk: "Safety and Security at the Heart of Autonomous Driving"

Speaker: Kamal Khouri, NXP Semiconductors

Bio: Kamal Khouri is General Manager and Vice President of Automotive Microcontrollers and Processors for the ADAS product line at NXP Semiconductors. Kamal holds a BS in Electrical Engineering from Bucknell University and a master's degree and Ph.D. in Computer Engineering from Princeton University. He has over 17 years of semiconductor industry experience, with multiple patents and over 25 publications.

He started his career at Motorola SPS and later Freescale, working in various engineering and product management roles within the compute and networking divisions. Kamal was also Director of Products for several businesses at AMD, ranging from embedded to gaming and semi-custom products.

At NXP, his team is now defining the future of autonomous vehicles and the processing power they need to make them a reality.

Invited Talk: "Challenges and Solutions for Embedding Vision AI"

Speaker: Charles Qi, Tensilica/Cadence

Abstract: Computer vision and neural-network-based AI have recently seen explosive demand in embedded systems such as robots, drones, and autonomous vehicles. Because of cost and power constraints, it remains challenging to achieve satisfactory performance while maintaining power efficiency and scalability for embedded vision AI. This presentation first analyzes the technical challenges of embedding vision AI from the perspectives of algorithm complexity, computation and memory bandwidth demands, and power consumption constraints. The analysis shows that modern neural networks for vision AI have complex topologies and diverse computation steps. These networks are often part of a large embedded vision processing pipeline, intermixed with conventional vision algorithms. As a result, a vision AI implementation demands several TOPS of compute performance and tens of GB/s of memory bandwidth.

The presentation then describes the architecture of the Tensilica Vision AI DSP processor technology and its three distinctive advantages. First, the optimized instruction sets of the Vision P6 and Vision C5 DSPs are explained as examples of achieving instruction-level computation efficiency and performance. Second, unique processor architecture features achieve SoC-level data processing efficiency and scalability, leading to a high-performance vision AI subsystem. Third, the fully automated AI optimization framework, software libraries, and tools provide a practical performance tuning methodology and rapid turnaround time for embedded vision AI system design. In conclusion, the presentation offers considerations for future research and development to bring embedded vision AI to the next performance level.

Bio: Charles Qi is a system solutions architect in Cadence's IPG System and Software team, responsible for providing vision system solutions based on the Cadence® Tensilica Vision DSP technology and a broad interface IP portfolio. At the system level, his primary focus is image sensing, computer vision, and deep learning hardware and software for high-performance automotive vision ADAS SoCs. He is also an active member of the internal architecture team for high-performance neural network acceleration hardware IP.

Prior to joining Cadence, Charles held various technical positions at Intel, Broadcom, and several high-tech startups.

Invited Talk: "Introducing ReQuEST: an Open Platform for Reproducible and Quality-Efficient Systems-ML Tournaments"

Speaker: Grigori Fursin, dividiti and cTuning foundation

Abstract: Co-designing efficient machine-learning-based systems across the whole application/hardware/software stack to trade off speed, accuracy, energy, and cost is becoming extremely complex and time-consuming. Researchers often struggle to evaluate and compare published works across rapidly evolving software frameworks, heterogeneous hardware platforms, compilers, libraries, algorithms, data sets, models, and environments. I will present our community effort to develop an open co-design tournament platform with an online public scoreboard, based on the Collective Knowledge workflow framework (CK). The platform gradually incorporates best research practices while giving multidisciplinary researchers a common way to optimize and compare the quality-versus-efficiency Pareto optimality of various workloads on diverse and complete hardware/software systems. All winning solutions will be made available to the community as portable and customizable "plug&play" components with a common API to accelerate research and innovation.

I will then discuss how our open competition and collaboration can help achieve energy efficiency for cognitive workloads, drawing on the energy-efficient submissions to the first ReQuEST tournament, co-located with ASPLOS'18. Further details:

Bio: Grigori Fursin is the CTO of dividiti and the Chief Scientist of the non-profit cTuning foundation. He is developing an open Collective Knowledge platform to crowdsource multi-objective optimization and co-design of deep learning and other emerging workloads across the whole SW/HW/model stack. Before co-founding dividiti in 2015, he was the head of the workload optimization group at the Intel Exascale Lab and a senior tenured scientist at INRIA. Grigori has an interdisciplinary background in computer engineering, physics, electronics, and machine learning, with a PhD in Computer Science from the University of Edinburgh (2004). He received a personal INRIA fellowship in 2012 for "making an outstanding contribution to research" and the ACM CGO "test of time" award in 2017. Further info: