M. Hassan Najafi - Research

Brain-Inspired Hyperdimensional Computing (HDC)

HDC is a computational model based on the observation that the human brain operates on high-dimensional data. HDC has shown significant promise for ultra-efficient and robust learning. It is powerful at reasoning about and associating abstract information. HDC can transform data into knowledge at a very low cost and with better or comparable accuracy to state-of-the-art methods for diverse learning and cognitive applications. An important aspect of our research is to address the limitations of HDC and exploit its strengths for the next generation of AI systems.

Stochastic Computing

Ultra Low-Cost and Robust Data Processing with Stochastic Logic

SC was introduced in the 1960s as a collection of techniques for representing and processing data in the form of random bit-streams. The key advantages of this paradigm are the very simple hardware needed to implement complex operations (e.g., multiplication using a single AND gate) and its ability to gracefully tolerate high noise rates. SC-based designs can achieve 50× to 100× reductions in gate count compared to conventional binary designs. SC has the potential to enable fully parallel and scalable hardware implementations. Its limitations, however, are latency and energy consumption. A primary focus of our research is to address SC's long-standing limitations and challenges, unveiling its potential for the next generation of AI systems. This is what we call SC 2.0, an evolution of the SC paradigm with the added benefits of high performance, high accuracy, and energy efficiency. At the application level, we developed novel low-cost and energy-efficient SC designs for application domains ranging from image and sound processing to convolutional and deep neural networks.
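The AND-gate multiplication mentioned above can be sketched in software. This is only an illustrative model, not a hardware design: the helper names (`to_stream`, `stream_value`, `sc_multiply`), the stream length, and the seeded software random source are all assumptions made for the example.

```python
import random

def to_stream(p, length, rng):
    """Encode a value p in [0, 1] as a random bit-stream: each bit is 1 with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(length)]

def stream_value(bits):
    """Decode a bit-stream: its value is the fraction of bits that are 1."""
    return sum(bits) / len(bits)

def sc_multiply(a_bits, b_bits):
    """Bitwise AND of two independent streams; the result encodes roughly a * b."""
    return [a & b for a, b in zip(a_bits, b_bits)]

rng = random.Random(0)        # fixed seed (assumption) so the run is repeatable
N = 4096                      # stream length (assumption); longer streams are more accurate
a = to_stream(0.5, N, rng)
b = to_stream(0.25, N, rng)
product = stream_value(sc_multiply(a, b))
print(product)                # an approximation of 0.5 * 0.25 = 0.125, not an exact result
```

The result is only approximate, and improving it requires longer streams. This is the latency cost noted above.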

Deterministic Methods to Stochastic Computing

For the first time, we showed that SC circuits can produce deterministic and completely accurate results if properly structured. We introduced four methods for accurate computation with SC logic: clock division of bit-streams, bit-streams with relatively prime lengths, rotation of bit-streams, and low-discrepancy (LD) bit-streams.
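A minimal sketch of two of these ideas, under the assumption that each method works by making every bit of one stream meet every bit of the other exactly once. The function names and the software model are hypothetical and are not taken from the published circuits.

```python
from fractions import Fraction
from math import gcd

def unary(k, n):
    """k ones followed by n - k zeros: encodes the value k / n."""
    return [1] * k + [0] * (n - k)

def multiply_relatively_prime(a, b):
    """Repeat two streams of coprime lengths for len(a) * len(b) cycles and AND them."""
    assert gcd(len(a), len(b)) == 1, "stream lengths must be relatively prime"
    total = len(a) * len(b)
    ones = sum(a[t % len(a)] & b[t % len(b)] for t in range(total))
    return Fraction(ones, total)

def multiply_clock_division(a, b):
    """Hold each bit of a for len(b) cycles while b repeats, then AND them."""
    total = len(a) * len(b)
    ones = sum(a[t // len(b)] & b[t % len(b)] for t in range(total))
    return Fraction(ones, total)

p1 = multiply_relatively_prime(unary(2, 3), unary(3, 4))  # 2/3 * 3/4
p2 = multiply_clock_division(unary(2, 4), unary(3, 4))    # 2/4 * 3/4
print(p1, p2)
```

In both functions the AND output counts every (bit of a, bit of b) pair exactly once, so the fraction of ones equals the exact product rather than a statistical estimate.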

Polysynchronous Clocking: Computing with Crappy Clocks

For modern integrated circuits, the global clock distribution network (CDN) is a major bottleneck in terms of design effort, area, and performance. High skew tolerance can mitigate these costs: either the global CDN can be eliminated entirely, or it can be designed to relaxed requirements. We investigate polysynchronous clocking, a design strategy in which clock domains are split at a very fine level, reducing the power of an otherwise large global clock tree. Each domain is synchronized by an inexpensive local clock. Alternatively, the skew requirements for the global clock tree can be relaxed, which allows for a higher working frequency and so lower latency. Polysynchronous clocking yields significant latency, area, and energy savings for a wide variety of applications, including image and signal processing.

Unary Computing

A recent evolution of the idea of SC is unary computing (UC). UC shares some characteristics with SC but is deterministic and produces accurate results. UC operates on thermometer-coded data called unary bit-streams. A unary bit-stream consists of a sequence of one value (say 1) followed by a sequence of the other value (say 0). Examples of functions with a low-cost unary implementation are the minimum and maximum functions (using an AND and an OR gate, respectively), absolute-value subtraction (using an XOR gate), and multiplication (using an AND gate, as in SC).
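The gate-level operations on unary bit-streams can be illustrated with a short sketch. It assumes both streams have the same length and are aligned with their ones first, and the helper names are hypothetical. Multiplication is left out because it needs a different stream alignment: the AND of two aligned streams yields the minimum.

```python
def unary(k, n):
    """Thermometer code: k ones followed by n - k zeros."""
    return [1] * k + [0] * (n - k)

def decode(bits):
    """The encoded integer is the number of ones in the stream."""
    return sum(bits)

def unary_min(a, b):
    """AND gate: a bit is 1 only where both streams are still 1."""
    return [x & y for x, y in zip(a, b)]

def unary_max(a, b):
    """OR gate: a bit is 1 where at least one stream is still 1."""
    return [x | y for x, y in zip(a, b)]

def unary_abs_diff(a, b):
    """XOR gate: a bit is 1 only where exactly one stream is still 1."""
    return [x ^ y for x, y in zip(a, b)]

a, b = unary(3, 8), unary(5, 8)
print(decode(unary_min(a, b)))       # 3
print(decode(unary_max(a, b)))       # 5
print(decode(unary_abs_diff(a, b)))  # 2
```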

Low-Cost and Energy Efficient Near-Sensor Processing

We explore a novel and unorthodox idea for fast, low-cost, and energy-efficient data processing near the sensor: complex structures for computing in binary are replaced with ultra-low-cost UC designs. Instead of encoding data in space, as digital bit-streams, we encode values in time. Costly analog-to-digital converters (ADCs) are replaced with low-cost analog-to-time converters (ATCs). Our time encoding consists of periodic signals, with the value encoded as the fraction of each cycle during which the signal is in the high (on) state, i.e., its duty cycle.
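The time encoding can be modeled in a few lines. This sketch assumes a discretized time axis and signals that share a period and a rising edge. The names `pwm_level` and `measured_value` and the tick counts are illustrative only, and a real ATC operates on continuous-time analog signals.

```python
def pwm_level(duty_ticks, period_ticks, t):
    """Signal level at time tick t: high for the first duty_ticks of every period."""
    return 1 if (t % period_ticks) < duty_ticks else 0

def measured_value(levels):
    """Recover the encoded value as the fraction of time the signal is high."""
    return sum(levels) / len(levels)

PERIOD = 100                      # ticks per cycle (assumption for the sketch)
TICKS = range(3 * PERIOD)         # observe three full cycles
signal_a = [pwm_level(30, PERIOD, t) for t in TICKS]   # encodes 0.30
signal_b = [pwm_level(70, PERIOD, t) for t in TICKS]   # encodes 0.70

# With aligned rising edges, an AND gate on the two signals yields their minimum.
minimum = [x & y for x, y in zip(signal_a, signal_b)]
print(measured_value(signal_a), measured_value(signal_b), measured_value(minimum))
```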

Low-Cost Sorting Networks using Unary Bit-Streams

We introduce novel area- and power-efficient hardware designs for implementing sorting circuits based on unary bit-streams. Our designs inherit the fault tolerance and low-cost design advantages of stochastic computing but produce completely accurate results. Synthesis results for complete sorting networks show more than 90% area and power savings compared to the conventional binary implementation. To mitigate the increased latency of processing bit-streams, we exploit time-based encoding of data, which enables fast and energy-efficient processing with the developed sorting circuits.
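As a rough illustration of why unary sorting is cheap, a compare-and-swap unit on aligned unary bit-streams reduces to one AND gate (minimum) and one OR gate (maximum). The sketch below wires such units into an odd-even transposition network. That layout is chosen here only for brevity and is not necessarily the network used in our designs, and all names are hypothetical.

```python
def unary(k, n):
    """Thermometer code: k ones followed by n - k zeros."""
    return [1] * k + [0] * (n - k)

def compare_and_swap(a, b):
    """One sorting unit: AND gives the smaller stream, OR gives the larger one."""
    low = [x & y for x, y in zip(a, b)]
    high = [x | y for x, y in zip(a, b)]
    return low, high

def sort_network(streams):
    """Odd-even transposition network: n rounds of neighbor compare-and-swap units."""
    s = list(streams)
    n = len(s)
    for rnd in range(n):
        # Even rounds compare pairs (0,1), (2,3), ...; odd rounds compare (1,2), (3,4), ...
        for i in range(rnd % 2, n - 1, 2):
            s[i], s[i + 1] = compare_and_swap(s[i], s[i + 1])
    return s

values = [5, 2, 7, 0, 3]
sorted_values = [sum(s) for s in sort_network([unary(v, 8) for v in values])]
print(sorted_values)  # [0, 2, 3, 5, 7]
```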

Fuzzy Logic Systems using Bit-stream Processing

For the first time, we applied the concept of unary processing to fuzzy logic systems. We developed a low-cost and high-performance UC-based fuzzy inference controller. Synthesis results for a fuzzy inference controller with 81 inference rules showed up to 82% savings in area, 46% reduction in power, and 67% savings in energy consumption compared to the conventional binary counterpart.

In-Memory Computing

In-memory computing (IMC), also known as processing in memory, is a promising solution for accelerating big data applications: it addresses the data movement issue between memory and the processing unit and enables extensive parallelism. IMC with non-volatile memories (NVMs) has shown great potential for in-place computations. However, multiple key technical challenges make it difficult to use IMC as a part of today's computing systems. Emerging memory devices have various reliability issues such as endurance, durability, and variability, and they are prone to soft errors in their logical states. This often causes significant errors in the computation when using the traditional binary representation.

An important focus of our research is to exploit unconventional and alternative computation techniques to develop efficient and robust IMC architectures. SC and UC use a redundant bit-stream representation that can inherently tolerate high rates of noise. Our research combines the complementary advantages of SC/UC and IMC to achieve reliable and fast computation in NVMs. We introduced the first exact SC-based in-memory multiplier. The proposed multiplier can perform fast and accurate multiplication, replacing the conventional binary multiplier. The latency decreases from hundreds of cycles to only a few. We further developed the first architectures for in-memory sorting of data: a binary and a unary sorting design. The latency and energy are significantly reduced compared to prior off-memory CMOS-based sorting designs.
