ARYABHAT-1
(Analog Reconfigurable Technology and Bias-scalable Hardware for AI Tasks)
Team Members
Project Video
Project Background
ARYABHAT-1 is a next-generation analog computing chipset designed to target Artificial Intelligence (AI) and Machine Learning (ML) applications at the edge. Presently, such computations are handled by application-specific digital accelerators, which use spatial arrays of parallel processing elements to significantly improve performance and energy efficiency over general-purpose platforms. This work focuses on building a first-of-its-kind, technology-scalable, reconfigurable analog processor that can be scaled down to advanced nanometer process nodes.
This post summarizes the contributions and developments to date, along with the upcoming work in the pipeline.
Project Motivation
We started this research back in 2019, intrigued by how powerful and energy-efficient the human brain is. It has roughly 86 billion processing units (neurons) yet consumes only about 25 watts of power. Even the most powerful supercomputer in the world falls short of matching the brain's raw computational power and energy efficiency; there is still an orders-of-magnitude gap between what we have built and what nature offers. While replicating the human brain is not a realistic path forward (at least not in the near future), it was clear that existing digital hardware could be augmented with analog computing to move in that direction.
Furthermore, we wanted to take a route distinct from where the industry was focusing, because it will soon become infeasible to keep increasing power and area to squeeze more performance per watt out of digital deep-learning accelerators.
Current AI Challenges & Industry Approach
It was important to understand the current challenges in designing analog AI accelerators, along with the alternative solutions the industry was pursuing to tackle them. Below I highlight a few of the most important ones.
Slowing Moore's Law and the Breakdown of Dennard Scaling:
Today, with Moore's law approaching its end and Dennard scaling having already hit a wall, the digital accelerators the industry is currently pursuing (GPUs, TPUs, and IPUs) are not enough to execute demanding workloads efficiently. The combined effect of a saturating Moore's law and already saturated Dennard scaling is that we can no longer pack more computation into a given square millimetre of silicon and expect more performance per watt. For instance, if one could do a million operations in 1 mm² at the 180nm node, one might do 20 million operations in the same 1 mm² in 7nm FinFET, and with improved power efficiency. That scaling has now stalled.
Reducing Bit Precision vs. Performance-Accuracy Trade-off:
To tackle the scaling problem above, the industry has moved to low-precision arithmetic, such as 16-bit or even 8-bit fixed point, in the hope of squeezing out even more performance per watt. However, it will soon become infeasible to pack more operations onto a chip or to reduce bit precision further; there is an inherent limit to implementing machine learning on digital electronics. On the one hand, we cannot reduce bit precision further because computational accuracy starts dropping; on the other, we cannot scale the technology down because of physical limitations. We are, in a sense, stuck.
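To make the precision/accuracy trade-off concrete, here is a minimal, illustrative sketch (plain Python, with hypothetical weight values) of symmetric fixed-point quantization in [-1, 1). It is not how any particular accelerator quantizes, only a toy model of the effect:

```python
def quantize(x, bits):
    """Quantize x in [-1, 1) to a signed fixed-point value with `bits` bits."""
    scale = 2 ** (bits - 1)
    q = max(-scale, min(scale - 1, round(x * scale)))  # round, then clamp to range
    return q / scale

weights = [0.1234, -0.5678, 0.9012, -0.25]  # hypothetical model weights
for bits in (16, 8, 4):
    worst = max(abs(w - quantize(w, bits)) for w in weights)
    print(f"{bits:>2}-bit worst-case error: {worst:.6f}")
```

Away from the clamped edges of the range, the worst-case rounding error is bounded by 2^-bits, so each bit removed roughly doubles it: that geometric growth in error, against only linear savings in word length, is the crux of the trade-off.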
Exponentially Increasing Machine Learning Compute:
Machine Learning (ML) models are growing exponentially in size. In 2015, ResNet, a deep neural network that surpassed human-level accuracy on the ImageNet benchmark, had over a hundred layers and performed millions of operations. Since then, model size has increased drastically, and this is a problem for two reasons: computations now run into the billions, which we cannot sustain because of fundamental physical limits, and energy consumption grows accordingly.
Challenge of Process-Technology Scalability in Energy-Efficient Analog:
Analog designs offer power and area efficiency unmatched by their digital counterparts, but porting a design from one process node to a more advanced one is challenging and generally requires an architectural redesign. For example, a design that works at the 65nm node, where most industrial analog design still lives, will not work in a FinFET process.
Lack of Modularity in Traditional Analog Design:
Analog designs are generally non-modular. There is no analog concept of fundamental building blocks like the standard cells of digital design, which can be reused recursively irrespective of the implementation technology node. Each analog ML architecture therefore requires a complete rework, consuming enormous time and manpower.
Lack of Bias-Scalability in Analog Design:
Analog designs are stable only under their designed bias conditions. They tend to lose performance and functionality when operated beyond their specified operating regimes, which makes it difficult to tune an architecture to the needs of a given application.
Challenges of Reconfigurability in Analog Design:
Analog designs lack the reconfigurability of their digital counterparts. This is also why we always hear about FPGAs (Field-Programmable Gate Arrays) but rarely about FPAAs (Field-Programmable Analog Arrays).
Challenge of Computational Precision vs. Computational Accuracy:
In any analog ML design, it is generally difficult to control computational (bit) precision at the hardware level, which limits the ability to trade area and power against accuracy as the application demands.
Proposed Solution: Analog, "A Road Less Travelled"
We proposed a novel analog computing framework called "Shape-based Analog Computing" (S-AC), based on our previous work on Generalized Margin Propagation (GMP).
We made analog designs modular, just like digital designs, while retaining the area and power efficiency of the analog world.
Designs implemented using the S-AC framework are built from analog standard cells, analogous to digital standard cells, which makes them easily synthesizable.
Designs implemented using the S-AC framework are process-technology scalable: a design done in one process node can be implemented in another without architectural changes.
Designs implemented using the S-AC framework are bias scalable: the same design can serve performance-hungry applications at the server and energy-efficient operation at the edge.
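For intuition on the margin-propagation (MP) primitive underlying GMP (see the Gu and Chakrabartty paper in the publications below): given inputs x_1..x_n and a hyperparameter γ > 0, MP computes the value z satisfying Σ_i max(x_i − z, 0) = γ, a bias-scalable approximation to log-sum-exp. The sketch below solves that constraint by simple bisection; the silicon solves it with analog circuits, so this is only an illustrative behavioral model, not the hardware algorithm:

```python
def margin_propagation(x, gamma, iters=60):
    """Solve sum_i max(x_i - z, 0) = gamma for z by bisection.

    The left-hand side is continuous and non-increasing in z, so bisection
    between z = max(x) - gamma (where the sum is >= gamma) and z = max(x)
    (where the sum is 0) always brackets the root.
    """
    lo, hi = max(x) - gamma, max(x)
    for _ in range(iters):
        z = (lo + hi) / 2.0
        if sum(max(v - z, 0.0) for v in x) > gamma:
            lo = z  # constraint sum too large: raise z
        else:
            hi = z  # constraint sum too small: lower z
    return (lo + hi) / 2.0

# For x = [1, 2, 3] and gamma = 0.5, only the largest input clears z,
# so 3 - z = 0.5 gives z = 2.5.
print(margin_propagation([1.0, 2.0, 3.0], 0.5))
```

Because the constraint involves only thresholding and summation (no exponentials or multipliers), it maps naturally onto simple analog circuits whose behavior is preserved across bias currents and process nodes, which is what makes the primitive bias- and technology-scalable.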
ARYABHAT-1 Design Stages & Timelines
Stage 1: Design of Novel Analog Computing framework called Shape-based Analog Computing
(Aug-2019 to Aug-2020)
Stage 2: Design of Computational Modules based on Analog Framework designed in Stage 1
(Jan-2020 to Dec-2020)
Stage 3: Prototyping the Analog Computational Modules from Stage 2 and Checking Their Feasibility
(Aug-2020 to Aug-2021)
Stage 4: Design of ML Computational Blocks for High-Performance Computing
(Dec-2020 to Sep-2021)
Stage 5: Design of Analog AI accelerators using computational blocks from Stage 4
(Jan-2021 to Aug-2021)
Stage 6: Design of Analog Test Framework using Open-Source tools for Analog-AI Accelerator testing
(Mar-2021 to Aug-2021)
Stage 7: Design of Compiler and APIs to Interface Open-Source Libraries with the Actual AI Accelerator
(May-2021 to Dec-2021)
Stage 8: Prototyping and testing of the first version of Analog AI accelerator chip called "ARYABHAT".
(Jan-2022 to Dec-2022)
ARYABHAT Snapshots
ARYABHAT 1
A snapshot of the first-generation analog AI accelerator chip, ARYABHAT
ARYABHAT 1
A snapshot of ARYABHAT with the embedded testbed
Test-In Progress
ARYABHAT in the News
ARYABHAT-1 Architecture
Pratik Kumar, Ankita Nandi, Shantanu Chakrabartty, Chetan Singh Thakur, "ARYABHAT: A Digital-Like Field Programmable Analog Computing Array for Edge AI," in IEEE Transactions on Circuits and Systems I: Regular Papers (Early Access), pp. 1-14, Jan. 2024, doi: 10.1109/TCSI.2024.3349776.
Related Publications
P. Kumar, A. Nandi, S. Chakrabartty and C. S. Thakur, "Process, Bias, and Temperature Scalable CMOS Analog Computing Circuits for Machine Learning," in IEEE Transactions on Circuits and Systems I: Regular Papers, 2022, doi: 10.1109/TCSI.2022.3216287.
P. Kumar, A. Nandi, S. Chakrabartty and C. S. Thakur, "Bias-Scalable Near-Memory CMOS Analog Processor for Machine Learning," in IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 2023, doi: 10.1109/JETCAS.2023.3234570.
M. Gu and S. Chakrabartty, "Synthesis of Bias-Scalable CMOS Analog Computational Circuits Using Margin Propagation," in IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 59, no. 2, pp. 243-254, 2011.
Patents
“A Reconfigurable and Scalable Multi-Core Analog Computing Chip”, India Provisional Pat. App. No. 202141054561, filed 25th November 2021.