My research revolves around the algorithm-hardware co-design of energy-efficient machine learning (ML) systems. Although current state-of-the-art ML algorithms have become popular for the accuracy of their results, the energy required to run them has been rising steadily as well. To be deployed on a resource-constrained edge node, these algorithms need to deliver similar accuracy while consuming less energy, preferably at low latency.
PEDRA - Programmable Engine for Drone RL Applications
PEDRA is a programmable engine for drone reinforcement learning (RL) applications. The engine is developed in Python and is programmable module by module. PEDRA mainly targets goal-oriented RL problems for drones, but it can be extended to other problems as well. It ships with a set of rich, realistic 3D environments created with Unreal Engine that can be used for the underlying drone problem, with varying levels of detail added to make the environments look as realistic as possible. Once an environment is selected, it is interfaced with PEDRA through AirSim, an open-source plugin developed by Microsoft that connects Unreal Engine to Python. AirSim provides basic Python functions for reading the drone's sensory inputs and issuing its control signals. PEDRA is built on top of these low-level AirSim Python modules and adds higher-level Python modules for drone RL applications.
Processing-In-Memory based DNN accelerator
This work presents a novel, modular, STT-MRAM-based analog Processing-In-Memory (PIM) DNN accelerator along with the end-to-end simulation framework needed to find a power-performance-optimized solution for a given DNN topology. The simulator supports various layer types, logical-to-physical crossbar mapping schemes, and crossbar configurations. A list of control parameters is used to override the DRAM read/write bandwidth, set the number of parallel inputs for pipelining, and select a mapping scheme. Results show a significant improvement in energy and latency compared with a digital DNN accelerator.
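To make the idea of a logical-to-physical crossbar mapping concrete, here is a minimal sketch of one simple scheme: tiling an unrolled weight matrix across fixed-size crossbars. It is not the simulator's actual mapping code. The function names and the 128 x 128 crossbar size are assumptions chosen for illustration, and the sketch assumes one weight per crossbar cell.

```python
import math


def conv_as_matrix(in_channels, out_channels, kernel_size):
    """Unroll a square conv kernel into a logical weight matrix.

    Rows hold the flattened receptive field, columns hold output channels.
    """
    rows = kernel_size * kernel_size * in_channels
    cols = out_channels
    return rows, cols


def crossbars_needed(rows, cols, xbar_rows=128, xbar_cols=128):
    """Count the physical crossbars needed to tile a rows x cols matrix.

    Hypothetical helper: assumes one weight per cell and simple row/column
    tiling, with partially filled crossbars at the edges.
    """
    return math.ceil(rows / xbar_rows) * math.ceil(cols / xbar_cols)


# A 3x3 convolution with 64 input and 128 output channels.
rows, cols = conv_as_matrix(64, 128, 3)
n_xbars = crossbars_needed(rows, cols)
```

Under these assumptions the layer unrolls to a 576 x 128 matrix, which occupies five crossbars, the last of which is only half full in the row dimension. A different mapping scheme or crossbar configuration would change this count, which is the kind of trade-off the simulator is meant to explore.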
CoRL - Conditional Reinforcement Learning
Conditional reinforcement learning (CoRL) aims to reduce inference latency and energy on edge nodes, and is joint work with Dr. Kaushik’s group at Purdue. The DQN being trained is augmented with shallow side branches, and the outputs of these branches are monitored to decide whether an action can be inferred at the current stage without invoking the deeper main branch. The main and side networks are trained jointly. During inference, the deeper main branch is invoked only if the shallow side branch is unable to predict the drone action. The results of applying CoRL to autonomous drone navigation have been tabulated: inference latency was reduced by about 30 percent without any decrease in the MSF of the drone.
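The inference-time decision described above can be sketched as follows. This is only an illustration: the text does not say how the side-branch outputs are monitored, so the confidence test used here (the gap between the two largest Q-values, compared against a margin) is an assumption, as are the function name, the margin value, and the stand-in branch functions.

```python
def conditional_action(state, side_branch, main_branch, margin=0.5):
    """Pick an action, using the deep main branch only when needed.

    side_branch and main_branch are callables that map a state to a list
    of Q-values (at least two actions). The margin test is a hypothetical
    confidence criterion, not necessarily the one used in CoRL.
    """
    q_side = side_branch(state)
    ranked = sorted(q_side, reverse=True)
    # Confident early exit: the best action clearly beats the runner-up.
    if ranked[0] - ranked[1] >= margin:
        return q_side.index(ranked[0]), "side"
    # Otherwise fall back to the deeper, more expensive main branch.
    q_main = main_branch(state)
    return q_main.index(max(q_main)), "main"


# Stand-in branches returning fixed Q-values for three actions.
def confident_side(state):
    return [0.1, 0.9, 0.2]


def unsure_side(state):
    return [0.5, 0.6, 0.4]


def main(state):
    return [0.3, 0.1, 0.8]


early = conditional_action(None, confident_side, main)
late = conditional_action(None, unsure_side, main)
```

Here `early` exits at the side branch with action 1, while `late` falls through to the main branch and selects action 2. The latency saving comes from the fraction of states that take the early exit.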
Hierarchical RL mapped onto Hierarchical memory sub-system
This article presents transfer learning (TL) followed by a reinforcement learning (RL) algorithm, mapped onto a hierarchical embedded memory system to meet the stringent power budgets of autonomous drones. The power reduction is achieved by (1) TL on meta-environments followed by online RL on only the last few layers of a deep convolutional neural network (CNN), instead of end-to-end (E2E) RL, and (2) mapping the algorithm onto a memory hierarchy in which the pre-trained weights of all the conv layers and the first few fully connected (FC) layers are stored in dense, low-standby-leakage Spin Transfer Torque (STT) RAM eNVM arrays, while the weights of the last few FC layers are stored in on-die SRAM. This memory hierarchy enables real-time RL as the drone explores unknown territory: the system only reads weights from the eNVM arrays (which are slow and power-hungry to write) for inference, and uses the on-die SRAM for low-latency training, which requires both writing and reading the weights of the last few layers. The proposed system is extensively simulated in a virtual environment and dissipates 83.5% less energy per image frame, with 79.4% lower latency, compared with E2E RL, without any loss of accuracy. The higher frame rate also improves the speed of the drone by a factor of 3.
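The weight placement described above can be sketched as a small bookkeeping routine. This is an illustrative sketch only: the function name, the layer names, and the choice of two online-trained FC layers are assumptions, since the text says only that the "last few" FC layers are trained online.

```python
def place_weights(layers, n_trainable_fc):
    """Assign each layer's weights to eNVM (frozen) or SRAM (trained online).

    layers is an ordered list of (name, kind) pairs with kind "conv" or
    "fc". Only the last n_trainable_fc FC layers go to SRAM; everything
    else keeps its pre-trained weights in read-mostly eNVM.
    """
    fc_names = [name for name, kind in layers if kind == "fc"]
    if not 0 <= n_trainable_fc <= len(fc_names):
        raise ValueError("n_trainable_fc must be between 0 and the FC layer count")
    trainable = set(fc_names[len(fc_names) - n_trainable_fc:])
    return {
        name: {
            "memory": "SRAM" if name in trainable else "eNVM",
            "trainable": name in trainable,
        }
        for name, _kind in layers
    }


# Hypothetical five-layer network; the real topology may differ.
network = [
    ("conv1", "conv"),
    ("conv2", "conv"),
    ("fc1", "fc"),
    ("fc2", "fc"),
    ("fc3", "fc"),
]
placement = place_weights(network, n_trainable_fc=2)
```

With this placement, online RL updates write only to `fc2` and `fc3` in SRAM, while `conv1`, `conv2`, and `fc1` are read from eNVM and never written during flight.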