Chip Gallery

Shenjing Chip

Conventional neuromorphic accelerators primarily rely on the split-merge method to accommodate neural networks that exceed a single core's size, leading to possible accuracy loss, extra core usage, and significant power and energy overhead. This work presents an energy-efficient, reconfigurable neuromorphic processor that addresses the problem with (i) a partial-sum router circuit that enables in-network computing and removes the need for extra merge cores; (ii) software-defined Networks-on-Chip that eliminate power-hungry routing computation; and (iii) fine-grained power-gating and clock-gating techniques for power reduction. Our test chip achieves lossless mapping, matching algorithm-level accuracy, and an energy efficiency of 1.7 pJ/SOP at 0.5 V, 19% lower than the state-of-the-art. Please visit https://github.com/Angela-WangBo/Shenjing-RTL for more information.
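
The idea behind in-network partial-sum accumulation can be illustrated with a minimal NumPy sketch. The core fan-in below is a hypothetical parameter, and the loop stands in for the partial-sum router adding each core's contribution while forwarding; it is a conceptual model, not the chip's RTL. The final assertion shows why the mapping is lossless: the split-and-accumulate result equals the unsplit matrix-vector product exactly.

```python
import numpy as np

# Hypothetical parameter: each core holds at most CORE_FANIN synapse inputs.
CORE_FANIN = 256

def split_layer(W):
    """Split a weight matrix column-wise so each slice fits one core."""
    return [W[:, i:i + CORE_FANIN] for i in range(0, W.shape[1], CORE_FANIN)]

def in_network_accumulate(core_slices, x):
    """Each core computes a partial sum over its input slice; the
    partial-sum router adds it to the running total while forwarding,
    so no dedicated merge core is needed."""
    psum = np.zeros(core_slices[0].shape[0])
    for i, Wc in enumerate(core_slices):
        xc = x[i * CORE_FANIN:(i + 1) * CORE_FANIN]
        psum += Wc @ xc  # accumulation happens en route, hop by hop
    return psum

rng = np.random.default_rng(0)
W = rng.standard_normal((128, 1024))  # a layer too wide for one core
x = rng.standard_normal(1024)
assert np.allclose(in_network_accumulate(split_layer(W), x), W @ x)  # lossless
```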

HyCUBE Chip

IoT devices use ultra-low-power microcontrollers that cannot meet the performance demands of emerging compute-intensive applications; accelerators can be added to improve system power-performance efficiency. We present HyCUBE, a Coarse-Grained Reconfigurable Array (CGRA) accelerator chip that achieves a 127× improvement in power efficiency over the TI SensorTag IoT platform. HyCUBE features a bufferless Network-on-Chip (NoC) that enables single-cycle multi-hop data traversal to boost throughput, and a software-scheduled architecture that automatically extracts application parallelism. Our 40nm test chip delivers a peak efficiency of 26.4 MOPS/mW at 290 pJ/cycle, a power-efficiency improvement of 28.6× and 26.5× over a Xilinx Zynq FPGA and an ARM Cortex-A7 core, respectively.
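
To make "software-scheduled" concrete, here is a toy interpreter for a statically scheduled array: the compiler emits a per-cycle configuration (operation plus operand sources) for each PE, and the fabric simply replays it, so no routing decisions happen in hardware at runtime. The slot format, opcode names, and the tiny 2×2 array are invented for illustration and do not reflect HyCUBE's actual configuration encoding.

```python
from dataclasses import dataclass

@dataclass
class Slot:
    op: str      # 'add', 'mul', 'pass', or implicitly 'nop'
    srcs: tuple  # operand sources: PE indices or named external inputs

# Static schedule for y = (a + b) * c; one dict of {pe: Slot} per cycle.
schedule = [
    {0: Slot('add', ('in_a', 'in_b')), 1: Slot('pass', ('in_c',))},
    {2: Slot('mul', (0, 1))},  # reads PE0 and PE1 results in one cycle
]

def run(schedule, inputs):
    regs = {}
    for cycle in schedule:
        nxt = dict(regs)
        for pe, slot in cycle.items():
            ops = [inputs[s] if isinstance(s, str) else regs[s]
                   for s in slot.srcs]
            if slot.op == 'add':
                nxt[pe] = ops[0] + ops[1]
            elif slot.op == 'mul':
                nxt[pe] = ops[0] * ops[1]
            elif slot.op == 'pass':
                nxt[pe] = ops[0]
        regs = nxt
    return regs

print(run(schedule, {'in_a': 2, 'in_b': 3, 'in_c': 4})[2])  # prints 20
```

Because the schedule is fixed at compile time, the hardware needs no buffers or arbitration in the NoC, which is the property the bufferless, single-cycle interconnect exploits.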

Bit-Reconfigurable Compute-In-Memory Chip 

This work proposes a versatile digital SRAM-based computing-in-memory (CIM) macro with precision reconfigurable from 1 bit to 16 bits and programmable mathematical functions, including addition and multiplication. The macro supports 1- to 16-bit weight-stationary addition (WSA) and operands-stationary addition (OSA), and 1- to 8-bit bit-serial multiplication (BSM), and it accelerates machine learning algorithms such as convolutional neural networks (CNNs) and self-organizing maps (SOMs). A test chip fabricated in 65nm CMOS technology achieves an energy efficiency of up to 40.7 TOPS/W for WSA (1-bit), 39.4 TOPS/W for OSA (1-bit), and 84.1 TOPS/W for BSM (1-bit).
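
Bit-serial multiplication is what lets one 1-bit adder datapath serve every precision: an N-bit multiply is decomposed into N cycles of a 1-bit AND followed by a shifted accumulation. The behavioral sketch below shows the arithmetic only (for unsigned operands), not the macro's circuits or timing.

```python
def bit_serial_mul(w, x, bits=8):
    """Multiply unsigned w (up to `bits` bits) by x, one weight bit per cycle."""
    acc = 0
    for b in range(bits):          # one cycle per weight bit
        if (w >> b) & 1:           # the 1-bit "multiply" is just an AND
            acc += x << b          # shifted partial-sum addition
    return acc

# Exhaustive check over all 8-bit unsigned operand pairs.
assert all(bit_serial_mul(w, x) == w * x
           for w in range(256) for x in range(256))
```

The precision/latency trade-off follows directly: a 1-bit multiply finishes in one cycle (hence the highest TOPS/W), while an 8-bit multiply takes eight cycles on the same hardware.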