Purpose: This project develops the first comprehensive data, algorithm, and platform-aware framework that provably optimizes the performance cost of machine learning on big and dense data. Our devised algorithms and hardware-customized engines reach the theoretical bounds on the system performance (in terms of runtime, energy, and/or memory usage) given a data geometry and platform architecture. The framework is applicable to a myriad of resource-intensive iterative ML applications. 

Applications: The usage scenarios include learning and analysis of dense visual content on constrained platforms (e.g., smartphones), which is a key enabler for next-generation applications in important domains including healthcare, social media, robotics, and autonomous vehicles Our work so far has focused on the two widely-adopted ML algorithms: 
  • Kernel based learning (e.g., regular and penalized regression, power iteration) 
  • Deep neural networks

Contemporary optimization algorithms and big data analytics are often focused on functionality and accuracy with system performance as an afterthought. As their use grows and they make the leap from servers and desktops to smartphones and Internet of Things (IoT) devices, functionality is not just about algorithmic efficiency and accuracy, but also real-world practicality. In this project, our aim is to address two sets of standing challenges in practical machine learning. First the physical constraints such as real-time requirement, power/memory budget, and inter-core/memory bandwidth. Second, the dense data processing hurdles; dense dependencies not only lead to costly computations, but also they congest the limited I/O and memory bandwidth in modern platforms. Dense data, which includes omnipresent image and video, accounts for the majority of digital content today. My dissertation simultaneously advances the: (i) complex ML and data-dependent dimensionality reduction (projection) methods that are traditionally abstracted from physical constraints, and (ii) hardware mapping/synthesis methods that generally do not alter the known ML/projection algorithms, while optimizing for physical resources.

Leveraging data geometrical properties for resource efficiencyModern high-dimensional dense datasets often exhibit structural properties that can be leveraged for projecting content into an ensemble of lower dimensional subspaces. While the subspace structures had been mainly used for knowledge extraction purposes, with little attention to their potential benefits for resource efficiency. In this project we introduce the idea of pre-processing and alignment of these subspaces to optimize performance given the constraints of a specific platform. 

The f
igure shows a dataset in R2 space where each point is represented by two non-zero elements in the direction of X and Y axis. We factorize the data to a dictionary matrix D, and a coefficient matrix C, using SVD and our approach, called TinyML. Since the dataset is full-rank, an SVD factorization finds a 2-dimensional orthogonal basis (matrix D) and a fully dense coefficient matrix (matrix C). However, by choosing a more suitable basis for factorization such as the ones shown by the gray arrows, we can approximately represent each data point using only one non-zero coefficient element. Matrix C in this case becomes half sparse. Thus, although the dataset is not inherently low-rank in the classic definition, it lies on a union of two lower dimensional (rank-1) subspaces. When the data is much higher dimensional, there is a higher degree of freedom in the way atoms in D can be selected. We leverage this degree of freedom to find the best dictionary that minimizes the cost on the pertinent platform.

Privacy Preserving Machine Learning.  For resource- limited clients such as smartphones and sensory IoT devices, it is common to delegate the expensive ML tasks to cloud servers . Our work systematically models the trade-off between the ML accuracy and the resource efficiency and, for the first time, enables provably secure outsourcing of iterative ML computations on datasets with billions of records. We devised new parametric transforms on the private data which can be customized to the pertinent security protocol and network/memory architecture. The analysis on the transformed content is then interactively performed between the client and servers, and is adaptable to streaming applications. The proposed approach can be applied to various security primitives and protocols; while our initial focus was on secret sharing primitive and outsourcing scenarios, we are now extending my work to more generic security primitives, e.g., Yao’s Garbled Circuits and Fully Hommomorphic Encryption, as well as to large-scale multi-party computing scenarios.