Research
About: Hardware accelerators are vital for efficient, high-performance deep neural network (DNN) computation, offering significant speed and energy advantages over software-based approaches. Research in digital design for DNN accelerators must focus on optimizing arithmetic modules, software-hardware co-validation, and VLSI design to enable resource-efficient, high-performance hardware implementations for edge-AI applications.
To address this, my research focuses on the design of low-power, efficient VLSI architectures for deep neural network (DNN) accelerators. DNN workloads demand significant computational resources in hardware, so efficient architectures are essential for acceleration targeting edge-AI and IoT applications.
Work Contribution Overview:
The research investigates the use of the Coordinate Rotation Digital Computer (CORDIC) architecture for MAC and non-linear activation function (AF) operations.
Although CORDIC-based designs are area- and power-efficient, they suffer from low throughput. To address this, a performance-centric pipelined architecture for CORDIC-based MAC and AF units is proposed, optimizing hardware resource utilization. Different CORDIC-based MAC and AF topologies with iterative and pipelined architectures are explored, offering significant hardware resource savings.
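As a concrete illustration, the linear (multiplication) mode of CORDIC replaces a full multiplier with shift-and-add iterations. Below is a minimal behavioral sketch in Python, not the actual RTL; the function name and iteration count are illustrative, and convergence assumes |z| < 2:

```python
def cordic_multiply(x, z, iterations=16):
    """Approximate x * z using linear-mode CORDIC (shift-and-add only)."""
    y = 0.0
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0   # rotation direction from residual sign
        y += d * x * 2.0 ** -i        # shift-add: (x >> i) in hardware
        z -= d * 2.0 ** -i            # drive the residual toward zero
    return y

# A MAC then accumulates: acc += cordic_multiply(weight, activation)
```

Pipelining unrolls these iterations into stages, so one product completes per clock instead of one per `iterations` cycles, which is the throughput gain the proposed architecture targets.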
Digital implementation of an efficient CORDIC-based DNN architecture with a performance-centric pipelined MAC unit, achieving high throughput and area efficiency. The design is parameterizable and adaptable for ASIC implementation without relying on Xilinx macros.
Hardware implementation of a DNN architecture with a reused (hardware-costly) activation function unit, optimized memory addressing, and high throughput via a Parallel-In-Serial-Out (PISO) architecture. The design achieves better performance with reduced hardware resources and power consumption, making it suitable for edge computing.
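A PISO structure streams a wide parallel result out over a narrow serial interface. A behavioral sketch follows (class and method names are illustrative; the actual design is RTL, not Python):

```python
class PISO:
    """Behavioral model of a Parallel-In-Serial-Out shift register."""

    def __init__(self, width):
        self.width = width
        self.reg = 0

    def load(self, word):
        """Parallel load of a full word in one cycle."""
        self.reg = word & ((1 << self.width) - 1)

    def shift(self):
        """Emit one bit per cycle, MSB first."""
        bit = (self.reg >> (self.width - 1)) & 1
        self.reg = (self.reg << 1) & ((1 << self.width) - 1)
        return bit

# Example: a 4-bit result 0b1011 leaves the interface as 1, 0, 1, 1
```

Serializing wide MAC outputs this way trades cycles for wiring and pin count, which is one reason the design suits resource-constrained edge devices.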
Design of an efficient hardware-based processing engine (PQPE) for deep learning networks, supporting various Posit number formats. The PQPE achieves high accuracy with reduced hardware resources, making it suitable for matrix computations in convolutional neural networks.
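For context, a posit number encodes sign, regime, exponent, and fraction fields. The behavioral decoder below sketches that format in Python (parameter names are illustrative; the PQPE itself extracts these fields in hardware):

```python
def decode_posit(bits, n=8, es=0):
    """Decode an n-bit posit with es exponent bits to a Python float."""
    mask = (1 << n) - 1
    bits &= mask
    if bits == 0:
        return 0.0
    if bits == 1 << (n - 1):
        return float('nan')              # NaR (Not a Real)
    sign = -1.0 if (bits >> (n - 1)) & 1 else 1.0
    if sign < 0:
        bits = (-bits) & mask            # two's complement for negatives
    s = format(bits & (mask >> 1), f'0{n - 1}b')   # bits after the sign
    run = len(s) - len(s.lstrip(s[0]))   # regime: leading run length
    k = run - 1 if s[0] == '1' else -run
    rest = s[run + 1:]                   # skip the regime terminator bit
    e = int(rest[:es].ljust(es, '0'), 2) if es else 0
    frac = rest[es:]
    f = int(frac, 2) / (1 << len(frac)) if frac else 0.0
    return sign * float((1 << (1 << es)) ** k) * 2.0 ** e * (1.0 + f)
```

The tapered precision of this encoding (more fraction bits near 1.0) is what lets posit-based engines hold accuracy at narrow bit-widths.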
Development of fixed-point Multiply-Accumulate (MAC) operations with an approximation technique and dynamic quantization, reducing area, power consumption, and latency while maintaining acceptable accuracy. A bias-preloading scheme further improves resource efficiency and throughput.
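The idea behind dynamic quantization is that the fixed-point scale is derived from the data itself, and accumulation stays in wide integers as a hardware accumulator would. A minimal sketch, assuming symmetric signed 8-bit quantization with a per-tensor scale (names and bit-widths are illustrative, not the actual design):

```python
def quantize(values, bits=8):
    """Return (integer values, scale) for symmetric dynamic quantization."""
    qmax = (1 << (bits - 1)) - 1                  # e.g. 127 for 8 bits
    scale = max(abs(v) for v in values) / qmax or 1.0
    return [round(v / scale) for v in values], scale

def mac_fixed_point(weights, activations, bits=8):
    """Integer MAC over quantized operands, rescaled once at the end."""
    qw, sw = quantize(weights, bits)
    qa, sa = quantize(activations, bits)
    acc = 0                                       # wide integer accumulator
    for w, a in zip(qw, qa):
        acc += w * a                              # integer multiply-accumulate
    return acc * sw * sa                          # single rescale step
```

Keeping the rescale outside the loop is the key hardware saving: the datapath needs only integer multipliers and adders, with one scaling stage per output.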
Design of a configurable In-Memory Advanced Computing (In-MAC) architecture based on 6T SRAM. It performs complex Boolean operations and logic functions efficiently, offers flexibility, and improves latency compared to conventional designs.
Design of a configurable dynamic-comparator-based analog-to-digital converter (ADC) for low-power applications, motivated by the importance of in-memory computing architectures for DNN acceleration.
Research Work Area and Expertise:
Digital circuit design targeting SoCs for DNN accelerators
FPGA implementation of DNN accelerators with PS and PL utilization, HW-SW co-design
VHDL/Verilog design for UART, AXI, and module integration for RISC-V; 2D systolic arrays; customized, scalable control engines for DNN accelerators
Efficient arithmetic module design with CORDIC architecture, Posit/Bfloat16/fixed-point arithmetic, and rounding schemes targeting Multiply-Accumulate (MAC) arithmetic units
Software platform design and development for validating hardware performance and accuracy using Python libraries
DMA controller architecture for efficient data access targeting DNN accelerators
ASIC design flow down to GDS-II
CMOS circuit design for mixed-signal VLSI applications (in-memory computing using SRAM, amplifiers, ADCs)
Layout design for CMOS technology; DRC, LVS, and PEX at 45 nm/180 nm; pad-ring design and tape-out
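As an example of the rounding schemes mentioned above, truncating a wide fixed-point MAC result with round-half-to-even (unbiased) rounding can be sketched as follows (the function name and usage are illustrative):

```python
def round_shift(value, shift):
    """Round-half-to-even right shift of a two's-complement integer,
    avoiding the positive bias of plain round-half-up truncation."""
    if shift == 0:
        return value
    half = 1 << (shift - 1)
    frac = value & ((1 << shift) - 1)    # bits being discarded
    out = value >> shift                 # arithmetic shift (floor)
    if frac > half or (frac == half and out & 1):
        out += 1                         # round up above half, or on tie-to-odd
    return out

# Example: 6/4 = 1.5 and 10/4 = 2.5 both round to the even value 2
```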