Key Dates and Schedule

July 7: Extended abstracts due
July 21: Notification of acceptance
July 28: Final papers due
August 10: Workshop

  8:45am Introduction
Forrest Iandola (DeepScale)
Title: Small Deep Neural Networks: Their Advantages and Their Design
Abstract: Deep neural networks (DNNs) have led to significant improvements to the accuracy of machine-learning applications. For many problems, such as object classification and object detection, DNNs have led to levels of accuracy that are acceptable for commercial applications. In other words, thanks to DNNs, an ever-growing range of ML-enabled applications are now ready to be put into commercial use. However, the next hurdle is that many DNN-enabled applications can only achieve their highest value when they are deployed on smartphones or other small, low-wattage, embedded hardware.
When deploying DNNs on embedded hardware, there are a number of reasons why small DNN models (i.e. models with few parameters) are either required or strongly recommended. These reasons include:
 - Small models require less communication bandwidth when sending updated models from the cloud to the client (e.g. a smartphone or autonomous car)
 - Small models train faster
 - Small models require fewer memory transfers during inference, and off-chip memory transfers require 100x more power than arithmetic operations
To meet the requirements of embedded systems and reap the advantages of small DNNs, we set out in 2015 to identify smaller DNN models that can be deployed on embedded devices. The first result of our efforts was SqueezeNet, a DNN targeted at the object-classification problem that achieves the same accuracy as the popular AlexNet with a 50x reduction in the number of model parameters.
SqueezeNet was created using a few basic techniques, including kernel reduction, channel reduction, and delayed pooling. Over the last year, many other researchers have pursued the same goal of small, fast, energy-efficient DNNs for computer-vision problems ranging from object classification to style transfer. In this talk we review these developments and report our progress in developing a systematic approach to the design of small DNNs.
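The parameter savings from kernel reduction (favoring 1x1 over 3x3 kernels) and channel reduction (squeezing the channel count before expensive layers) can be seen with a little arithmetic. The sketch below uses a Fire-module-like layer with illustrative channel counts, not SqueezeNet's actual configuration:

```python
# Parameter-count arithmetic behind kernel and channel reduction.
# Channel counts here are illustrative, not the exact SqueezeNet layout.

def conv_params(in_ch, out_ch, k):
    """Weights in a k x k convolution (biases ignored)."""
    return in_ch * out_ch * k * k

# Plain 3x3 layer: 128 input channels -> 128 output channels.
plain = conv_params(128, 128, 3)                      # 147,456 weights

# Fire-style layer: squeeze to 16 channels with 1x1 kernels,
# then expand with a mix of 1x1 and 3x3 kernels (64 each).
squeeze = conv_params(128, 16, 1)
expand = conv_params(16, 64, 1) + conv_params(16, 64, 3)
fire = squeeze + expand                               # 12,288 weights

print(plain, fire, plain / fire)  # 12x fewer parameters, same I/O shape
```

The same output shape (128 channels) is produced with roughly a twelfth of the weights, which is the kind of reduction that, compounded across layers, yields the 50x figure cited above.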
Venkatesh Saligrama (Boston University)
Title: An Adaptive Approximation for Prediction Under a Budget
Abstract: We propose a novel adaptive approximation approach for test-time resource-constrained prediction in classification and sequential decision-making problems. Given an input instance at test time, a gating function identifies a prediction model or policy for the input from among a collection of models or policies. Our objective is to minimize overall average cost without sacrificing accuracy. We present a novel bottom-up method based on adaptively approximating a high-accuracy model in regions where low-cost models are capable of making highly accurate predictions. We pose an empirical loss minimization problem with cost constraints to jointly train the gating and prediction models. On a number of benchmark datasets, our method outperforms the state of the art, achieving higher accuracy for the same cost.
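The gating idea can be illustrated with a toy sketch: a gate routes an input to a cheap model when the input lies in a region where the cheap model is reliable, and to an expensive model otherwise. The models, costs, and gating rule below are invented for illustration and are not the paper's learned gating function:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical models; costs are illustrative units, not measurements.
def cheap_model(x):      # cost: 1 unit
    return int(x > 0.0)

def accurate_model(x):   # cost: 10 units; treated here as ground truth
    return int(x > 0.1)

def gate(x, margin=0.5):
    """Route to the cheap model only far from its decision boundary,
    where the two models agree; otherwise pay for the accurate one."""
    return "cheap" if abs(x) > margin else "accurate"

xs = rng.uniform(-1, 1, 1000)
cost = 0
correct = 0
for x in xs:
    if gate(x) == "cheap":
        pred, cost = cheap_model(x), cost + 1
    else:
        pred, cost = accurate_model(x), cost + 10
    correct += pred == accurate_model(x)

print(correct / len(xs), cost / len(xs))
```

Roughly half the inputs are handled at a tenth of the cost with no loss of accuracy relative to the expensive model, which is the trade-off the adaptive approximation framework optimizes jointly rather than by a hand-set margin.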
 10:00am Coffee Break
Sujith Ravi (Google)
Title: On-Device Machine Intelligence with Neural Projections
Abstract: Deep neural networks and other machine learning models have been transformative for building intelligent systems capable of visual recognition, speech and language understanding. While recent advances have led to progress for machine intelligence applications running on the cloud, it is often infeasible to use typical machine learning models on devices like mobile phones or smart watches due to computation and memory constraints — model sizes are huge and cannot fit into the limited memory available on such devices. While these devices could make use of models running on high-performance data centers with CPUs or GPUs, this is not feasible for many applications and scenarios where inference needs to be performed directly “on” device. This requires re-thinking existing machine learning algorithms and coming up with new models that are directly optimized for on-device machine intelligence rather than doing post-hoc model compression.
In this talk, I will introduce a novel “projection-based” machine learning system for training compact neural networks. The approach uses a joint optimization framework to simultaneously train a “full” deep network, such as a feed-forward or recursive neural network, and a lightweight “projection” network. Unlike the full deep network, the projection network uses random projection operations that are efficient to compute and operates in bit space, yielding a low memory footprint. The system is trained end-to-end using backpropagation. The approach is flexible and easily extensible to other machine learning paradigms; for example, we learn graph-based projection models using label propagation. The trained “projection” models are used directly for inference and achieve significant model-size reductions and efficiency gains on several visual and language tasks while providing competitive performance. We have used these networks to power machine intelligence applications on devices such as mobile phones and smart watches, for example a fully on-device Smart Reply model that runs on Android smart watches.
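The projection primitive can be sketched as an LSH-style random projection that maps a float feature vector to a short bit vector. In the actual system the projection network is trained jointly with the full network; here the projection matrix is simply random and all dimensions are illustrative:

```python
import numpy as np

rng = np.random.default_rng(42)

def project_to_bits(x, planes):
    """LSH-style binary projection: one bit per random hyperplane,
    set by which side of the hyperplane the input falls on."""
    return (planes @ x >= 0).astype(np.uint8)

# A "full" float feature vector (illustrative dimensionality).
x = rng.normal(size=256)                # 256 floats = 1024 bytes in fp32
planes = rng.normal(size=(64, 256))     # shared random projection matrix

bits = project_to_bits(x, planes)       # 64 bits = 8 bytes once packed
print(bits[:8], np.packbits(bits).nbytes)
```

Downstream layers that consume such bit vectors instead of raw floats are what give the projection network its small memory footprint; the bits approximately preserve angular similarity between inputs, as in SimHash.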
Bill March (Apple)
Title: Core ML: High-performance on-device machine learning

Abstract: Considering the limited computing power available on mobile devices, application developers have typically been constrained to either small, simple models or expensive network access to remote servers. This year, Apple introduced Core ML, a new framework for on-device inference.
Core ML combines an open format for encoding a wide range of models with simple programming interfaces and highly optimized, on-device evaluation methods. The combination of these factors makes Core ML a powerful tool for bridging the gap between cutting-edge ML research and large-scale impact on mobile device users.
While on-device inference is typically regarded as limited by power and computing constraints, we show that optimized methods can achieve excellent performance. We first demonstrate Core ML in action, showing that efficient evaluation of state-of-the-art deep neural networks on a mobile device is possible with an extremely simple programming interface. We then discuss some of the optimizations underlying this performance in detail, including graph optimizations and automatic hardware-selection algorithms. Finally, we discuss Core ML’s open-source tools and model format, and highlight several ways in which we hope to work together with the wider machine learning community.
Shiv Naga Prasad (Amazon)
Title: Building Amazon Alexa’s embedded wake word detector
Abstract: Alexa is a conversational AI agent that is accessible through several consumer devices such as Echo, Dot, Tap, and Echo Show. A key feature is that users can talk to Alexa eyes-free and hands-free by uttering a wake-up phrase, “Alexa”. Detecting the wake word on the device is one of the grand challenges in far-field speech, given the CPU and memory constraints, background noise in household environments, and variation in users’ speech characteristics. We will provide an overview of the technical challenges in this area and some of the research being conducted at Alexa on efficient ML for edge platforms.
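A common small-footprint keyword-spotting recipe (not necessarily Alexa's production detector) smooths per-frame wake-word posteriors from a compact acoustic model and fires when the smoothed score crosses a threshold. A minimal sketch on synthetic posteriors:

```python
import numpy as np

def smoothed_detection(posteriors, window=30, threshold=0.8):
    """Fire a detection when the moving average of per-frame wake-word
    posteriors exceeds a threshold; returns the frame index of the
    first detection, or None. Window and threshold are illustrative."""
    kernel = np.ones(window) / window
    smooth = np.convolve(posteriors, kernel, mode="valid")
    hits = np.flatnonzero(smooth > threshold)
    return int(hits[0]) + window - 1 if hits.size else None

# Synthetic posterior track: silence, a burst for the wake word, silence.
p = np.concatenate([np.full(100, 0.05), np.full(50, 0.95), np.full(100, 0.05)])
print(smoothed_detection(p))
```

The smoothing window suppresses isolated high-posterior frames from background noise, one of the robustness concerns the abstract mentions, at the price of a small detection latency.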
 12:15pm Lunch Break
Manik Varma (Microsoft)
Title: The Edge of Machine Learning: Resource-efficient ML in 2 KB RAM for the Internet of Things
Abstract: We propose an alternative paradigm for the Internet of Things (IoT) where machine learning algorithms run locally on severely resource-constrained edge and endpoint devices without necessarily needing cloud connectivity. This enables many scenarios beyond the pale of the traditional paradigm including low-latency brain implants, precision agriculture on disconnected farms, privacy-preserving smart spectacles, etc.
Towards this end, we develop novel tree- and kNN-based algorithms, called Bonsai and ProtoNN, for efficient prediction on IoT devices -- such as those based on the Arduino Uno board, which has an 8-bit ATmega328P microcontroller operating at 16 MHz with no native floating-point support, 2 KB of RAM, and 32 KB of read-only flash memory. Bonsai and ProtoNN maintain prediction accuracy while minimizing model size and prediction costs by: (a) developing novel compressed yet expressive models; (b) sparsely projecting all data into a low-dimensional space in which the models are learnt; and (c) jointly learning all model and projection parameters. Experimental results on multiple benchmark datasets demonstrate that Bonsai and ProtoNN can make predictions in milliseconds even on slow microcontrollers, can fit in a few KB of memory, and have lower battery consumption than all other algorithms, while achieving prediction accuracies that can be as much as 30% higher than state-of-the-art methods for resource-efficient machine learning. Bonsai and ProtoNN are also shown to generalize to resource-constrained settings beyond IoT, generating significantly better search results than Bing's L3 ranker when the model size is restricted to 300 bytes.
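ProtoNN-style inference can be sketched as a sparse low-dimensional projection followed by an RBF-weighted vote over learned prototypes. The parameters below are random placeholders (the paper learns the projection, the prototypes, and the label scores jointly) and the dimensions are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

def protonn_predict(x, W, B, Z, gamma=1.0):
    """ProtoNN-style inference sketch: project x into a low-dimensional
    space with a sparse matrix W, then take an RBF-weighted vote over
    prototypes B with per-prototype label scores Z."""
    z = W @ x                                         # low-dim embedding
    sim = np.exp(-gamma * ((B - z) ** 2).sum(axis=1)) # RBF similarity
    return (sim @ Z).argmax()                         # weighted label vote

d, d_low, m, classes = 100, 10, 20, 3
W = rng.normal(size=(d_low, d)) * (rng.random((d_low, d)) < 0.1)  # ~90% sparse
B = rng.normal(size=(m, d_low))   # prototypes in the projected space
Z = rng.random((m, classes))      # label scores per prototype

x = rng.normal(size=d)
print(protonn_predict(x, W, B, Z))
```

The model stores only the sparse projection, the prototypes, and the score matrix, which is how the total footprint can be driven down to a few KB independent of the training-set size.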
 2:40pm Spotlight Presentations
 3:00pm Coffee Break
Adarsh Subbaswamy (JHU)
Title: Trading-Off Cost of Deployment Versus Accuracy in Learning Predictive Models
Abstract: Predictive models are finding an increasing number of applications in many industries. As a result, a practical means of trading off the cost of deploying a model against its effectiveness is needed. Our work is motivated by risk-prediction problems in healthcare, where cost structures are quite complex, posing a significant challenge to existing approaches. We propose a novel framework for designing cost-sensitive structured regularizers that is suitable for problems with complex cost dependencies, drawing on a surprising connection to Boolean circuits. In particular, we represent the problem costs as a multi-layer Boolean circuit, and then use properties of Boolean circuits to define an extended feature vector and a group regularizer that exactly captures the underlying cost structure. The resulting regularizer may then be combined with a fidelity function to train a predictive model. For the challenging real-world application of risk prediction for sepsis in intensive care units, the use of our regularizer leads to models that are in harmony with the underlying cost structure and thus provide an excellent trade-off between prediction accuracy and cost.
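The flavor of a cost-sensitive structured regularizer can be sketched with a cost-weighted group penalty: weights for features that are paid for together form a group, so the group's cost is avoided only when the whole group is zeroed out as a unit. The groups and costs below are illustrative and far simpler than the paper's circuit-derived construction:

```python
import numpy as np

def group_penalty(w, groups, costs):
    """Cost-weighted group-lasso-style penalty: the L2 norm of a group
    is zero only when every weight in the group is zero, so sparsity
    is encouraged at the granularity at which costs are incurred."""
    return sum(c * np.linalg.norm(w[idx]) for idx, c in zip(groups, costs))

w = np.array([0.0, 0.0, 1.5, -0.2, 0.0])  # learned model weights
groups = [[0, 1], [2, 3], [4]]            # features billed together
costs = [5.0, 2.0, 1.0]                   # acquisition cost per group

print(group_penalty(w, groups, costs))
```

Here the expensive first group contributes nothing because both of its weights are zero, so a model minimizing loss plus this penalty is pushed to drop whole billable panels rather than individual features.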
Shaked Shammah (Mobileye)
Title: Resource Efficient Driving Policy
Abstract: When attacking the problem of autonomous driving, one must take into account strict computational constraints, imposed by the desired low cost of sensors and processors and by the required real-time performance. Specifically, when considering driving policy, many of the current state-of-the-art solutions for planning in large state spaces (applied to other problems) are ruled out. We discuss approaches that allow feasible planning through different representations of the state space, along with the use of both supervised and reinforcement learning algorithms.
Panel Discussion
Workshop Ends