Energy-Efficient Machine Learning Systems using Deterministic Bit-Streams

To be completed...

Time-based Computing with Stochastic Constructs

We explore an evolution of stochastic computing and propose a highly unorthodox idea: performing computation with digital constructs on time-encoded analog signals. Instead of encoding data in space, as random bit streams, we encode values in time. The time encoding consists of periodic signals, with the value encoded as the fraction of each period that the signal spends in the high (on) state, that is, its duty cycle. This time-based representation is an excellent fit for low-power applications that include time-based sensors, for instance, image processing circuits in vision chips. Converting a variety of signals from an external voltage to a time-based representation can be done much more efficiently than a full conversion to binary radix. Stochastic image processing based on time-encoded signals could have a significant impact in this application area because of its significantly lower hardware cost, power, and energy consumption, and its remarkably lower processing time. This work has led to several conference publications at ASP-DAC 2017, ISCAS 2017, and ICCD 2017, three journal publications in IEEE TVLSI 2017, IEEE TVLSI 2018, and IEEE Micro 2017, and one non-provisional patent application.
Figure: Example of multiplying two PWM signals using an AND gate. IN1 represents 0.5 (50% duty cycle) with a period of 20 ns, and IN2 represents 0.6 (60% duty cycle) with a period of 13 ns. The output signal from t=0 to 260 ns represents 0.30 (78 ns/260 ns).
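The arithmetic in the figure can be checked with a small discrete-time simulation. The following is a minimal sketch, not the hardware design: it assumes both signals start their high phase at t=0, and it uses a time step of 0.1 ns so that all edges fall on integer ticks. The function names are hypothetical.

```python
from fractions import Fraction
from math import gcd


def pwm_level(t, period, high_ticks):
    """Return 1 while the PWM signal is high at integer time t, else 0."""
    return 1 if t % period < high_ticks else 0


def and_multiply(period1, high1, period2, high2):
    """AND two PWM signals over one common period.

    Returns (high_ticks, total_ticks, value), where value is the
    fraction of time the AND output is high.
    """
    total = period1 * period2 // gcd(period1, period2)  # least common multiple
    high = sum(
        pwm_level(t, period1, high1) & pwm_level(t, period2, high2)
        for t in range(total)
    )
    return high, total, Fraction(high, total)


# Time unit is 0.1 ns (assumption made for this sketch).
# IN1: period 20 ns, 50% duty cycle -> 200 ticks, high for 100 ticks.
# IN2: period 13 ns, 60% duty cycle -> 130 ticks, high for 78 ticks.
high, total, value = and_multiply(200, 100, 130, 78)
print(high / 10, "ns high out of", total / 10, "ns ->", float(value))
```

Under these assumptions the output is high for 78 ns of the 260 ns common period, matching the 0.30 stated in the caption.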

Deterministic Computing with Stochastic Bitstreams

Poor progressive precision is the main challenge with the recently developed deterministic methods of stochastic computing (SC). Relatively prime stream lengths, clock division, and rotation of bitstreams are the three deterministic methods of processing bit-streams, all initially proposed for unary bitstreams. For applications in which slight inaccuracy is acceptable, these unary stream-based approaches must still run for a relatively long time to produce acceptable results. This long processing time makes the deterministic approaches energy-inefficient compared to conventional random stream-based SC. We propose a high-quality down-sampling method that significantly improves the processing time and energy consumption of the deterministic methods by pseudo-randomizing bitstreams. We also propose two novel deterministic methods of processing bit-streams using low-discrepancy sequences. The proposed methods yield significant improvements in processing time and energy consumption. This project has led to one journal article in IEEE Transactions on Emerging Topics in Computing (TETC’18), three conference and workshop presentations at ICCAD’18, ICCD’17, and IWLS’18, and a provisional U.S. patent application. The work presented at ICCD received the conference's Best Paper Award.
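The three unary-stream methods named above can be illustrated in software. The sketch below is an assumption-laden model, not the published circuits: each method is written as an index pattern that pairs every bit of one stream with every bit of the other exactly once, so that an AND gate yields the exact product. All function names are hypothetical.

```python
from fractions import Fraction
from math import gcd


def unary(k, n):
    """Unary bitstream of length n encoding k/n: k ones, then n-k zeros."""
    return [1] * k + [0] * (n - k)


def relatively_prime(a, b):
    """Repeat both streams; their lengths must be relatively prime."""
    na, nb = len(a), len(b)
    assert gcd(na, nb) == 1, "stream lengths must be relatively prime"
    total = na * nb
    ones = sum(a[t % na] & b[t % nb] for t in range(total))
    return Fraction(ones, total)


def clock_division(a, b):
    """Hold each bit of a for len(b) cycles while b repeats."""
    na, nb = len(a), len(b)
    total = na * nb
    ones = sum(a[t // nb] & b[t % nb] for t in range(total))
    return Fraction(ones, total)


def rotation(a, b):
    """Repeat a; rotate b by one position after each full pass."""
    n = len(a)
    assert len(b) == n, "this sketch assumes equal stream lengths"
    total = n * n
    ones = sum(a[t % n] & b[(t % n + t // n) % n] for t in range(total))
    return Fraction(ones, total)


print(relatively_prime(unary(3, 4), unary(2, 5)))  # 3/4 * 2/5
print(clock_division(unary(3, 4), unary(2, 5)))    # 3/4 * 2/5
print(rotation(unary(3, 4), unary(2, 4)))          # 3/4 * 2/4
```

In every case the exact result is available only after the full product of the stream lengths has elapsed, which illustrates why truncating these unary streams early gives poor progressive precision.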

Low-Cost Sorting Network Circuits using Unary Bit-Streams

We introduced a novel area- and power-efficient synthesis approach for implementing sorting network circuits based on unary bit-streams. The proposed method inherits the fault tolerance and low-cost design advantages of processing random stochastic bit-streams while producing a completely accurate result. Synthesis results of complete sorting networks show more than 90% area and power savings compared to the conventional binary implementation. These savings come at the cost of increased latency, which we mitigate with our time-encoding method: time-based encoding of data enables fast and energy-efficient processing with the developed sorting circuits. The approach is validated with a low-cost, high-performance, and energy-efficient implementation of median filtering, an important application of sorting. This project resulted in one journal article in IEEE Transactions on Very Large Scale Integration Systems (TVLSI’18) and one conference presentation at ICCD’17.
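To illustrate why unary bit-streams make sorting cheap, the sketch below assumes that a compare-and-swap unit is built from one AND gate (minimum) and one OR gate (maximum) operating on aligned unary streams of equal length. This relies on a general property of unary encoding; the helper names and the three-input network are hypothetical choices for illustration, not the synthesized circuits.

```python
def unary(value, length):
    """Unary bitstream: `value` ones followed by zeros, `length` bits total."""
    return [1] * value + [0] * (length - value)


def decode(stream):
    """Recover the integer value by counting ones."""
    return sum(stream)


def compare_swap(a, b):
    """Bitwise AND gives the minimum, bitwise OR the maximum."""
    lo = [x & y for x, y in zip(a, b)]
    hi = [x | y for x, y in zip(a, b)]
    return lo, hi


def sort3(a, b, c):
    """Three-input sorting network built from three compare-and-swap units."""
    a, b = compare_swap(a, b)
    b, c = compare_swap(b, c)
    a, b = compare_swap(a, b)
    return a, b, c


def median3(x, y, z, length=8):
    """Median of three integers in [0, length], as in a 3-tap median filter."""
    streams = [unary(v, length) for v in (x, y, z)]
    return decode(sort3(*streams)[1])


print([decode(s) for s in sort3(unary(5, 8), unary(2, 8), unary(7, 8))])
print(median3(7, 5, 2))
```

Each compare-and-swap costs two gates per bit position regardless of data width, but the stream length grows with the required precision, which is the latency cost noted above.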

Polysynchronous Clocking: Computing with Crappy Clocks

For modern integrated circuits, the global clock distribution network (CDN) is a major bottleneck in terms of design effort, area, and performance. High skew tolerance can mitigate the costs: either the global CDN can be eliminated entirely, or one can design a relaxed CDN. We investigate Polysynchronous Clocking, a design strategy in which clock domains are split at a very fine level, each synchronized by an inexpensive local clock, which saves the power otherwise spent on a large global clock tree. Alternatively, the skew requirements for a global clock tree network can be relaxed, which allows for a higher working frequency and thus lower latency. Polysynchronous clocking results in significant latency, area, and energy savings for a wide variety of applications, including image and signal processing. This work has led to one conference publication at ASP-DAC 2016, one journal publication in IEEE Transactions on Computers 2017, and one non-provisional patent application.
Figure: Stochastic multiplication using an AND with unsynchronized bit-streams.
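The skew tolerance behind the figure can be sketched with a small simulation. This is an illustrative model under stated assumptions, not the evaluated design: two random bit-streams are driven by local clocks with different periods and a phase offset (all values chosen arbitrarily here), and the AND output is measured as the fraction of time it is high. The function name and every parameter are hypothetical.

```python
import random


def unsynchronized_and(p_a, p_b, period_a=20, period_b=32,
                       offset_b=7, n_bits_a=20000, seed=1):
    """Estimate p_a * p_b by ANDing two bit-streams on unrelated clocks.

    Time advances in integer ticks. Stream A changes every `period_a`
    ticks; stream B changes every `period_b` ticks and is shifted by
    `offset_b` ticks relative to A.
    """
    rng = random.Random(seed)
    total = period_a * n_bits_a
    n_bits_b = (total + offset_b) // period_b + 1
    bits_a = [rng.random() < p_a for _ in range(n_bits_a)]
    bits_b = [rng.random() < p_b for _ in range(n_bits_b)]
    high = sum(
        bits_a[t // period_a] and bits_b[(t + offset_b) // period_b]
        for t in range(total)
    )
    return high / total


estimate = unsynchronized_and(0.5, 0.6)
print(f"estimated product: {estimate:.3f} (ideal 0.300)")
```

Because the two streams are statistically independent, the time-averaged AND output stays close to the product even though bit boundaries never line up.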

Memory System Design for Stochastic Computing

In the first study of its kind, we rethink the memory system design for SC. We integrate analog memory with conventional stochastic systems to reduce the energy wasted in conversion units. We propose a seamless stochastic system, StochMem, which features analog memory to trade the energy and area overhead of data conversion for computation accuracy. Compared to a baseline system that features conventional digital memory, StochMem significantly reduces energy and area at the cost of a slight loss in computation accuracy. This project resulted in one journal article in IEEE Computer Architecture Letters (CAL’18), one accepted paper at DAC’17, and a provisional U.S. patent application.

Figure: StochMem featuring Analog Memory