Applied the concepts from the ECE260B course to design a 1-D vector processor for the attention mechanism in transformer models. Each processor core performs a Q*K vector multiplication (where Q and K are vectors of eight 8-bit elements), normalization, and then the Norm*V computation. The baseline design comprises two such cores and processes up to 16 vectors in parallel. To improve performance, area, and power, the baseline was further optimized with pipelining to meet timing at 1 GHz, a controller that reduces the number of ports and the testbench overhead, data gating to save dynamic power, and thorough RTL/GLS verification to rule out functional and timing bugs. The project deliverable was a GDSII file.
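The per-core dataflow described above (Q*K, normalization, Norm*V) can be sketched in software as a reference model. This is only an illustrative sketch: the function names are hypothetical, and the normalization step (dividing each score by the sum of absolute scores) is an assumption, since the description does not state which normalization the hardware uses.

```python
def dot(a, b):
    """Dot product of two equal-length integer vectors."""
    return sum(x * y for x, y in zip(a, b))


def attention_row(q, keys, values):
    """Reference model for one query vector (hypothetical helper).

    q      : query vector (e.g. eight 8-bit integers)
    keys   : list of key vectors, same length as q
    values : list of value vectors, one per key
    """
    # Step 1: Q*K, one score per key vector.
    scores = [dot(q, k) for k in keys]
    # Step 2: normalization. ASSUMPTION: divide by the sum of absolute
    # scores; the actual design may normalize differently.
    total = sum(abs(s) for s in scores) or 1
    norm = [s / total for s in scores]
    # Step 3: Norm*V, a weighted sum of the value vectors.
    dim = len(values[0])
    out = [sum(n * v[i] for n, v in zip(norm, values)) for i in range(dim)]
    return scores, norm, out
```

A model like this is mainly useful for generating expected values that an RTL testbench can compare against.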
This project focuses on an efficient implementation of a 2-D systolic array architecture for mapping a CNN layer of the VGG16 model, which was trained with quantization-aware training. The conventional 2-D systolic array architecture was implemented and then optimized to reduce gate count, power, and latency. The enhancements include memory banking to enable simultaneous accumulation and writing of partial sums, pipelining of the implementation stages to reduce latency, input reordering for memory and latency optimization (input sieving), pruning, clock gating, a dual-core implementation, and time-domain tiling. Batch processing was also explored to verify the design thoroughly and to identify memory bottlenecks. Both the reconfigurable and the vanilla versions of the 2-D systolic array were implemented on an FPGA.
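To illustrate how a 2-D systolic array schedules its multiply-accumulate operations, the sketch below models the array cycle by cycle. The description does not state which dataflow the design uses, so an output-stationary dataflow with skewed inputs is assumed here, and the function name is hypothetical.

```python
def systolic_matmul(a, b):
    """Cycle-level model of an output-stationary systolic array.

    ASSUMPTION: PE(i, j) keeps its own accumulator, and operands are
    skewed so that PE(i, j) sees a[i][k] and b[k][j] at cycle i + j + k.
    Returns the product matrix and the number of cycles simulated.
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    acc = [[0] * cols for _ in range(rows)]
    cycles = rows + cols + inner - 2  # last operand pair arrives at cycles - 1
    for t in range(cycles):
        for i in range(rows):
            for j in range(cols):
                k = t - i - j  # which operand pair reaches PE(i, j) now
                if 0 <= k < inner:
                    acc[i][j] += a[i][k] * b[k][j]
    return acc, cycles
```

For a 2x2 by 2x2 product this model takes 4 cycles, which shows the fill and drain overhead that pipelining and tiling try to hide.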
Designed a pipelined model of the tanh function on an FPGA for maximum throughput and minimum resource utilization at a targeted accuracy. The data representation, lookup tables, and memory vectors were optimized in MATLAB using a genetic algorithm.
The tanh function is antisymmetric about the origin.
The function can be mapped to two linear ranges and one saturation range.
Pipelined design of the tanh function with a 3-cycle latency.
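The three observations above can be sketched as a piecewise-linear approximation. The breakpoints (1.0 and 2.5) and the coefficients below are hypothetical values picked by hand for illustration; the original design derived its parameters with a genetic algorithm, and those values are not given here. The three commented steps are one plausible mapping onto three pipeline stages, which is also an assumption.

```python
import math


def tanh_pwl(x):
    """Piecewise-linear tanh: two linear ranges plus saturation.

    ASSUMPTION: breakpoints and slopes are illustrative, not the
    optimized values from the original design.
    """
    # Step 1: exploit antisymmetry, work on |x| and remember the sign.
    ax = abs(x)
    # Step 2: select the range and evaluate it.
    if ax < 1.0:
        y = 0.76 * ax            # first linear range
    elif ax < 2.5:
        y = 0.6 + 0.16 * ax      # second linear range
    else:
        y = 1.0                  # saturation range
    # Step 3: restore the sign.
    return y if x >= 0 else -y


def max_abs_error(lo=-4.0, hi=4.0, steps=800):
    """Largest deviation from math.tanh on an evenly spaced grid."""
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(abs(tanh_pwl(x) - math.tanh(x)) for x in xs)
```

With these hand-picked coefficients the error stays below 0.1 over the sampled range; an optimizer such as a genetic algorithm would search for breakpoints and slopes that meet a tighter accuracy target.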
Designed a pipelined processor based on the MIPS architecture in Verilog, with hazard-detection and forwarding units. Wrote a small compiler in Python to convert the ISA instructions to opcodes.
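A tool of the kind described above can be sketched as a tiny assembler. The sketch assumes standard MIPS32 field layouts and opcode/funct values, numeric register names such as `$3`, and a small hypothetical instruction subset; the project's actual ISA and encodings may differ.

```python
import re

# ASSUMPTION: standard MIPS32 encodings for a small instruction subset.
R_FUNCT = {"add": 0x20, "sub": 0x22, "and": 0x24, "or": 0x25}
I_OPCODE = {"addi": 0x08, "lw": 0x23, "sw": 0x2B}


def reg(token):
    """Parse a numeric register name such as '$3' into 3."""
    return int(token.lstrip("$"))


def assemble(line):
    """Encode one assembly instruction as a 32-bit machine word."""
    mnemonic, rest = line.strip().split(None, 1)
    # Split operands on commas, whitespace, and parentheses.
    ops = [t for t in re.split(r"[,\s()]+", rest) if t]
    if mnemonic in R_FUNCT:
        # R-type: op rd, rs, rt  (opcode field is 0)
        rd, rs, rt = (reg(t) for t in ops)
        return (rs << 21) | (rt << 16) | (rd << 11) | R_FUNCT[mnemonic]
    if mnemonic == "addi":
        # I-type: addi rt, rs, imm
        rt, rs, imm = reg(ops[0]), reg(ops[1]), int(ops[2])
    elif mnemonic in ("lw", "sw"):
        # I-type: lw/sw rt, offset(rs)
        rt, imm, rs = reg(ops[0]), int(ops[1]), reg(ops[2])
    else:
        raise ValueError(f"unsupported instruction: {mnemonic}")
    # Immediates are truncated to 16 bits (two's complement).
    return (I_OPCODE[mnemonic] << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)
```

Calling `assemble` on each line of a program and writing the words out in hexadecimal gives a memory image that a Verilog testbench could load, for example with `$readmemh`.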
Implemented a method from a research paper for predicting the opening, minimum, and maximum price of a stock from its past trends, using LSTM and RNN models built with Keras in Python.
The objective of this project is to model an RF-based community management system in which the nodes communicate through wireless Zigbee modules. The system controls the humidity of an environment by reading data from DHT sensors and driving a motor that pumps water to raise the humidity. It was implemented using four STM32F4 Discovery boards, two DHT sensors, four XBee transceivers, a DC motor with a motor driver, and a YF-S201 water flow sensor. One of the boards emulated a node by transmitting dummy data to the base station.