Computer Architecture
Autonomous Robots
Machine Learning
During my Ph.D., I have worked on the energy-efficient acceleration of computation tasks in autonomous robots through algorithm and hardware optimization. Specifically, my work targets motion planning and deep regression algorithms. In our motion planning acceleration work, we explore how the physical spatial locality between a robot's positions in its environment can be exploited to reduce computation, and we propose a motion planning hardware accelerator. Further, we propose an application-specific reliability metric for low-overhead error mitigation in robotic hardware accelerators. Our work on label encoding for deep regression networks explores solving regression problems with binary classification. Label encoding significantly improves regression accuracy for both dense and sparse regression networks. More details on these projects can be found in the published papers.
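To make the idea of solving regression with binary classification concrete, here is a minimal sketch of one simple label encoding. It assumes the target is quantized to 2^N levels and written in plain binary, so a network can predict one bit per binary classifier output and the bits are decoded back into a value. The function names and the plain-binary code are illustrative assumptions, not necessarily the encoding used in the published work.

```python
def encode_label(y, y_min, y_max, num_bits):
    """Quantize y into 2**num_bits levels and return its bits, MSB first."""
    levels = 2 ** num_bits
    q = round((y - y_min) / (y_max - y_min) * (levels - 1))
    q = max(0, min(levels - 1, q))  # clamp targets that fall outside [y_min, y_max]
    return [(q >> b) & 1 for b in reversed(range(num_bits))]


def decode_bits(bits, y_min, y_max):
    """Map a list of 0/1 bits (MSB first) back to a value in [y_min, y_max]."""
    q = 0
    for bit in bits:
        q = (q << 1) | bit
    return y_min + q * (y_max - y_min) / (2 ** len(bits) - 1)


def decode_probs(probs, y_min, y_max):
    """Decode per-bit classifier probabilities by thresholding each at 0.5."""
    return decode_bits([1 if p >= 0.5 else 0 for p in probs], y_min, y_max)


# Hypothetical usage: 8-bit encoding of a target in [0, 1]
bits = encode_label(0.5, 0.0, 1.0, 8)
print(bits, decode_bits(bits, 0.0, 1.0))
```

The choice of code matters because it decides how a single wrong bit maps to regression error; for example, in a Gray code adjacent levels differ in exactly one bit, whereas in plain binary a flipped MSB moves the prediction by half the range.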
o Supervisor: Prof. Sachin Patkar, Department of Electrical Engineering, IIT Bombay
o Associated Lab: High Performance Computing Lab, IIT Bombay
The Network-on-Chip (NoC) architecture plays an important role in distributed on-chip computing. A programmable router architecture incurs higher resource utilization and more cycles per packet than a conventional non-programmable router, but it offers flexibility and opens up interesting applications.
Developed a lightweight stack processor that can run Forth programs and built a programmable router around these cores in Verilog. Implemented a configurable, synthesizable NoC generator that users can easily customize for their application. Rather than requiring dedicated hardware, new router features can be added by programming the Forth core.
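As a rough illustration of why a stack processor makes the router programmable, here is a minimal Python sketch of a Forth-like interpreter: numbers are pushed onto a data stack, primitive words operate on it, and colon definitions (`: name ... ;`) add new words without changing the underlying machine. The word set and the function names are hypothetical and do not reflect the actual instruction set of the Verilog core.

```python
def _binary(op):
    # Build a word that pops b then a and pushes op(a, b), as Forth does
    def word(stack):
        b = stack.pop()
        a = stack.pop()
        stack.append(op(a, b))
    return word


# Hypothetical primitive words of a minimal Forth-like core
PRIMITIVES = {
    "+": _binary(lambda a, b: a + b),
    "-": _binary(lambda a, b: a - b),
    "*": _binary(lambda a, b: a * b),
    "=": _binary(lambda a, b: -1 if a == b else 0),  # Forth uses -1 for true
    "dup": lambda s: s.append(s[-1]),
    "drop": lambda s: s.pop(),
    "swap": lambda s: s.extend([s.pop(), s.pop()]),
    "over": lambda s: s.append(s[-2]),
}


def _execute(tokens, stack, words):
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == ":":
            # Colon definition: store the body under the new word's name
            end = tokens.index(";", i)
            words[tokens[i + 1]] = tokens[i + 2:end]
            i = end + 1
            continue
        if tok in words:
            _execute(words[tok], stack, words)  # user words shadow primitives
        elif tok in PRIMITIVES:
            PRIMITIVES[tok](stack)
        else:
            stack.append(int(tok))  # anything else is a numeric literal
        i += 1
    return stack


def run_forth(source):
    """Run a whitespace-separated Forth-like program and return the data stack."""
    return _execute(source.split(), [], {})


print(run_forth(": square dup * ; 3 square 4 +"))
```

In the same spirit, a new router feature would be a new word defined in software on the core, rather than a new block of dedicated logic.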
To illustrate this flexibility, broadcast, multicast, systolic array computing, and adaptive routing were implemented by programming the router. The approach was also evaluated by characterizing network performance under different traffic patterns and by comparing resource overhead.
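The difference between fixed and adaptive routing, one of the features listed above, can be sketched as follows. This minimal Python example assumes a 2D mesh with (x, y) router coordinates, north as +y, and a per-port occupancy count as the congestion signal; the topology, port names, and congestion metric are all assumptions, since the text does not specify them.

```python
from enum import Enum


class Port(Enum):
    LOCAL = "local"
    EAST = "east"
    WEST = "west"
    NORTH = "north"
    SOUTH = "south"


def productive_ports(cur, dst):
    """Output ports that move a packet closer to dst (X direction listed first)."""
    (cx, cy), (dx, dy) = cur, dst
    ports = []
    if dx > cx:
        ports.append(Port.EAST)
    elif dx < cx:
        ports.append(Port.WEST)
    if dy > cy:
        ports.append(Port.NORTH)
    elif dy < cy:
        ports.append(Port.SOUTH)
    return ports


def xy_route(cur, dst):
    # Dimension-order (XY) routing: fully correct X before moving in Y
    ports = productive_ports(cur, dst)
    return ports[0] if ports else Port.LOCAL


def minimal_adaptive_route(cur, dst, occupancy):
    # Among productive ports, pick the least occupied one; ties favor X
    ports = productive_ports(cur, dst)
    if not ports:
        return Port.LOCAL
    return min(ports, key=lambda p: occupancy.get(p, 0))


print(xy_route((0, 0), (2, 3)))
print(minimal_adaptive_route((0, 0), (2, 3), {Port.EAST: 4, Port.NORTH: 1}))
```

XY routing is deadlock-free on a mesh, while unrestricted minimal adaptive routing generally needs an extra mechanism such as virtual channels or a turn-model restriction to avoid deadlock.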
Superscalar Processor and pipelined processor: Designed and implemented a 2-wide-fetch, 6-stage pipelined superscalar processor for an ARMv7-equivalent ISA using VHDL and Altera Quartus. We also analyzed prevailing methods for handling data, control, and memory hazards, and incorporated the major blocks needed to handle them: a forwarding unit, stalling unit, register renaming logic, branch predictor, branch history table, distributed reservation stations, and load queue.
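Of the blocks listed above, the branch predictor is the easiest to show in isolation. The sketch below models one common design, a table of 2-bit saturating counters indexed by the branch address, in Python rather than VHDL; the table size, the indexing scheme (assuming 4-byte ARM-state instructions), and the initial state are assumptions and not necessarily what the processor uses.

```python
class TwoBitPredictor:
    """Table of 2-bit saturating counters: 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, entries=16):
        self.entries = entries
        self.counters = [1] * entries  # start every entry in "weakly not taken"

    def _index(self, pc):
        # Drop the 2 byte-offset bits of 4-byte instructions, then wrap to table size
        return (pc >> 2) % self.entries

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def count_mispredictions(predictor, pc, outcomes):
    """Predict, compare with the real outcome, then train, for each dynamic branch."""
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


# A loop-closing branch: taken 9 times, then falls through once
p = TwoBitPredictor()
loop = [True] * 9 + [False]
print(count_mispredictions(p, 0x100, loop), count_mispredictions(p, 0x100, loop))
```

The hysteresis of the 2-bit counter is the design point here: after warm-up, a single loop exit costs one misprediction instead of two, because the counter only drops from "strongly" to "weakly" taken.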
Study of GPU-based implementation of a Circuit Simulator: We studied Modified Nodal Analysis (MNA) and circuit simulator implementations based on it. We then implemented the circuit simulator using the GPU-accelerated linear algebra libraries provided by CULATools for matrix solving, and performed a detailed performance analysis.
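To show what the matrix-solving step actually solves, here is a minimal MNA sketch in plain Python for DC circuits with resistors and independent voltage sources. It builds the MNA system (node voltages plus one current unknown per voltage source) and solves it with Gaussian elimination with partial pivoting; in the GPU version, this solve is the step that would be delegated to a library such as CULATools. The function names and the netlist tuple format are hypothetical.

```python
def build_mna(num_nodes, resistors, vsources):
    """Nodes are 1..num_nodes, 0 is ground. resistors: (n1, n2, ohms);
    vsources: (pos, neg, volts). Unknowns: node voltages, then source currents."""
    size = num_nodes + len(vsources)
    A = [[0.0] * size for _ in range(size)]
    z = [0.0] * size
    for n1, n2, ohms in resistors:
        g = 1.0 / ohms
        # Conductance stamp; entries touching ground (node 0) are dropped
        for r, c, val in ((n1, n1, g), (n2, n2, g), (n1, n2, -g), (n2, n1, -g)):
            if r and c:
                A[r - 1][c - 1] += val
    for k, (pos, neg, volts) in enumerate(vsources):
        row = num_nodes + k
        if pos:
            A[pos - 1][row] += 1.0
            A[row][pos - 1] += 1.0
        if neg:
            A[neg - 1][row] -= 1.0
            A[row][neg - 1] -= 1.0
        z[row] = volts  # constraint v_pos - v_neg = volts
    return A, z


def solve(A, z):
    """Gaussian elimination with partial pivoting, then back substitution."""
    n = len(A)
    M = [row[:] + [z[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


# Voltage divider: 10 V source at node 1, 1 kOhm from node 1 to 2, 1 kOhm from node 2 to ground
A, z = build_mna(2, [(1, 2, 1000.0), (2, 0, 1000.0)], [(1, 0, 10.0)])
print(solve(A, z))
```

With this sign convention the source-current unknown comes out negative when the source delivers power to the circuit.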
Parallelism exploration in different maze-solving algorithms: We explored parallelism in different maze-solving algorithms (namely BFS, DFS, and Game of Life) and implemented them using CUDA, OpenMP, MPI, and pthreads. We profiled their performance with Intel VTune Amplifier, Valgrind, and nvprof to identify the factors limiting the achievable speedup relative to the theoretical speedup, and explored methods to overcome them.
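BFS is the most naturally parallel of these algorithms when written level-synchronously: every cell in the current frontier can be expanded independently. The sequential Python sketch below follows that structure, assuming a grid maze where `#` is a wall and moves are 4-directional; the maze format and function name are assumptions for illustration, not the project's code.

```python
def bfs_maze(grid, start, goal):
    """Level-synchronous BFS on a grid of strings ('#' = wall).
    Returns the shortest path length from start to goal, or None if unreachable."""
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0}
    frontier = [start]
    level = 0
    while frontier:
        next_frontier = []
        # Each frontier cell could be expanded by a separate thread or CUDA thread;
        # in a parallel version the visited check-and-set below must be atomic.
        for r, c in frontier:
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if (0 <= nr < rows and 0 <= nc < cols
                        and grid[nr][nc] != "#" and (nr, nc) not in dist):
                    dist[(nr, nc)] = level + 1
                    next_frontier.append((nr, nc))
        frontier = next_frontier  # implicit barrier between levels
        level += 1
    return dist.get(goal)


maze = [
    "..#",
    ".##",
    "...",
]
print(bfs_maze(maze, (0, 0), (2, 2)))
```

The per-level barrier and the contention on the shared visited set are typical sources of the gap between measured and theoretical speedup that profiling tools help expose.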
Automatic Test Pattern Generator (ATPG) using circuit CNF formula: Developed a flow that uses the circuit's CNF formula with the MiniSAT solver to automatically generate test patterns covering all stuck-at faults in MUX-based circuits.
Line follower car using Arduino
Gesture recognition using Accelerometer
Graph plotter using Stepper Motor
Design of Differential Operational Amplifier in CMOS
RTL-to-layout implementation of an 8-bit counter in CMOS