This work presents a low-power motion gesture recognition system-on-chip (SoC) for smart devices. The SoC incorporates a low-power image sensor and a memory-efficient outermost-edge-based gesture sensing DSP. The DSP utilizes a self-adaptive motion detector that automatically updates a motion-pixel threshold for accurately sensing hand movements. A convolution-based noise-tolerant feature extraction technique is also developed for preventing detection errors caused by random noise in the images from the low-power sensor. The feature extraction architecture is highly accelerated using parallelism and pipelining for achieving low-latency real-time gesture recognition. Measurements from a test chip fabricated in 65 nm CMOS show that the SoC consumes 213.7 μW with only 3 μW dynamic power at 30 fps. The SoC occupies only 0.54 mm², making it very well-suited for wearable devices and sensor nodes. The image sensor is fully operational down to 0.6 V while the DSP can be scaled down to 0.46 V. The average recognition accuracy of the system is 85% while the latency is 1.056 ms.

This work proposes a 2D sequence-dependent PUF based on SRAM. It expands the number of challenge-response pairs (CRPs) by the order of rows(sequence length - 1) × columns(sequence length - 1) for reliable authentication. This is achieved by configuring the sequence of SRAM cell selection. Each bit cell has a vertical word-line and a horizontal word-line, and the orthogonal word-lines connect four cells simultaneously to generate one bit of data. The proposed technique allows multiple data maps to be generated from one chip. This non-linear behavior also makes the chip more secure. A test chip was fabricated in 65 nm CMOS technology with an area of 12580 μm². The bit error rate is 3% at the nominal point (0.8 V/20 °C) and the inter-chip Hamming distance is 0.497. A Hamming distance of 0.427 was measured when using the same sequence length with different orders.

This work presents an energy harvesting system (EHS) based on a triboelectric nanogenerator (TENG). A novel TENG HDL SPICE model was developed to optimize the proposed ultra-low power (ULP) EHS. The proposed TENG-EHS utilizes a novel single-comparator-control (SCC) algorithm for improving the power conversion efficiency (PCE). It modulates the switching frequency of the implemented switched capacitor charge pump (SCCP) in proportion to the load condition at a given applied vibration frequency (i.e., the excitation frequency). Moreover, a novel hysteresis control technique was introduced. It regulates the input voltage at the maximum possible power point without IC breakdown, and adopts a dropout excess charge storage technique. The fabricated test chip in 65-nm CMOS technology achieves a peak PCE of 88% with 2.4 µW to 15.6 µW input power and a power density of 39.59 µW/mm².

Gesture recognition has increasingly become one of the most popular human-machine interaction techniques for smart devices. Existing gesture recognition systems suffer from either excessive power consumption or large size, limiting their applications for ultra-low power IoT and wearable devices. This paper presents an accurate, area-efficient, and ultra-low power real-time gesture recognition system for smart wearable devices. The proposed work utilizes a peak-based gesture classification engine with less memory and a low-resolution and low-power on-chip image sensor for achieving high area efficiency and low power. The feature extraction architecture removes fixed-pattern noise from the low-power on-chip image sensor for accuracy improvement and employs parallelism for recognition speed enhancement. The proposed system requires only 3.2 KB on-chip memory for processing 32×32 pixel data. Measurement results of a test chip fabricated in 65 nm CMOS demonstrate that the proposed system consumes 137.0 μW at 0.8 V and 30 fps while occupying only 1.78 mm², which achieves the lowest power and smallest area among existing gesture recognition systems.

This work presents a low-power infrared motion detection system suitable for smart devices such as wearables. The SoC incorporates instrumentation chopper amplifiers (ICA), LPFs, ADCs, and a DSP. The low-noise ICAs amplify very low frequency µV-level thermopile outputs with 2.0 NEF and provide programmable gain modes. To reduce standby power the ICA uses lower current when the system is in idle mode. Wakeup can be triggered by detection of a simple gesture. For the LPF, source degeneration by pseudo-resistors and gm division techniques are used for both improved linearity and 30Hz bandwidth. The DSP employs a motion history image technique to achieve low-power detection. The system consumes 260µW in active mode and 46µW in idle mode while processing 16×4 infrared data at 30fps. A complete system demonstration is shown.

We propose a column-based split cell-VSS (CS-CVSS) data-aware write-assisted (DAWA) 9T ultra-low voltage SRAM with enhanced read sensing margin in 28 nm FDSOI technology. The proposed write-assist techniques (CS-CVSS and DAWA) improve both half-select SNM and write margin. The proposed 3T low-leakage read port enhances read sensing margin by minimizing bitline leakage through negative gate-to-source voltage. A 16kb 9T SRAM test chip demonstrates VDDMIN-Write improvement of 0.39 V and VDDMIN-Read of 0.25 V with 1.57 µs read access time. The energy of 6.72 pJ is achieved at 0.5 V.

This work presents a novel three-dimensional maximum power point tracking (3D-MPPT) system for ultra-low power (ULP) solar energy harvesting systems (EHS) for internet of things (IoT) smart nodes. The proposed 3D-MPPT utilizes a gate-source voltage (Vgs) dependent switch width modulation (SWM) technique for improving power efficiency (PE) at standby (<1 µA) and heavy (>300 µA) load scenarios, and eliminating the gate driver and conduction loss trade-off for a reconfigurable switched capacitor charge pump (SCCP). The proposed SWM technique modulates the SC transistor size in proportion to the load condition, the input voltage, and the applied Vgs. The tested chip, fabricated in 65-nm CMOS technology, can harvest from 0.35 V and provides a regulated output voltage at 1 V with peak efficiency of 88% at 200 µW and PE > 60% at 100 nW.

Radix-2^k delay feedback and Radix-K delay commutator are the most well-known pipeline architectures for FFT design. This paper proposes a novel Radix-2² multiple delay commutator architecture utilizing the advantages of the Radix-2² algorithm such as simple butterflies and lower memory requirements. Therefore, it is more hardware efficient when implementing parallelism for higher throughput using multiple delay commutators or feed-forward data paths. Here, we propose an improved memory-based input scheduling algorithm to eliminate the energy required to shift data along the delay lines. A 1024-point FFT processor with two parallel data paths is implemented in 65 nm CMOS process technology. The FFT processor occupies an area of 3.6 mm², operates successfully over the supply voltage range from 0.4 V to 1 V, and reaches a maximum clock frequency of 600 MHz. For low-voltage, high-performance applications, the processor is able to operate at 400 MHz and consumes 60.3 mW or 77.2 nJ/FFT generating 800 Msamples/s at 0.6 V supply.
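
The energy-per-FFT figure above can be cross-checked from the reported power and throughput. The following is a minimal sketch, assuming each of the two parallel data paths delivers one sample per clock cycle (which matches 2 × 400 MHz = 800 Msamples/s); the function names are illustrative and not taken from the original work.

```python
def throughput_sps(clock_hz: float, parallel_paths: int) -> float:
    """Samples per second, assuming one sample per path per clock cycle."""
    return clock_hz * parallel_paths


def energy_per_fft(power_w: float, fft_points: int, samples_per_s: float) -> float:
    """Energy in joules spent while one FFT's worth of samples streams through."""
    time_per_fft_s = fft_points / samples_per_s
    return power_w * time_per_fft_s


# Reported operating point: 400 MHz, two data paths, 60.3 mW, 1024 points.
rate = throughput_sps(400e6, 2)              # 800 Msamples/s
energy = energy_per_fft(60.3e-3, 1024, rate)
print(f"{energy * 1e9:.1f} nJ/FFT")          # prints "77.2 nJ/FFT"
```

Under this assumption one 1024-point FFT takes 1.28 µs, so the 60.3 mW power figure and the 77.2 nJ/FFT energy figure are consistent with each other.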

This work proposes an 8T SRAM utilizing a column-based data encoding scheme to reduce read and write power when there are similarities between consecutive data. It is useful in image processing applications where nearby pixels tend to have similar values. A 32Kb SRAM implemented in 65 nm CMOS process demonstrated successful operation down to 0.36 V. The total power consumption is 0.37 µW, corresponding to the maximum frequency of 0.25 MHz. Its minimum energy is 0.3 pJ/access achieved at 0.5 V.

This work describes a fully integrated, low voltage digital low-dropout voltage (DLDO) regulator for ultra-low power applications with a load current aware clock modulation scheme. The proposed DLDO uses a clock modulation technique that provides a fast transient response during load state transitions. The proposed clock modulation (CM) controls the clock frequency when it senses a sudden load current transition. This eliminates the tradeoff between transient time and power efficiency that arises with a fixed clock frequency. Thus, it minimizes the transient response time and maximizes the power and current efficiency. The proposed DLDO operates at 0.6 V and generates 0.55 V output voltage. A test chip is fabricated using 65-nm CMOS technology and demonstrates a current efficiency of 99.7% for load currents from 10 μA to 200 μA and a quiescent current of 0.9 μA.

This work presents a novel ultra-low power maximum power point tracking (MPPT) technique with a wide tracking range. An indirect and non-interrupting approach using a novel timing-based tracking algorithm is proposed. This reduces processing current consumption down to 3.4 μA. Moreover, the proposed tracking method is self-adaptive to various types of photo-voltaic cells and thermo-electric generators and avoids external re-configuration or change of passive components for different operating conditions. A test chip was fabricated in 65-nm CMOS technology. It can harvest energy within 0.4 V to 1.7 V with a tracking response time of less than 300 ms and a minimum supply voltage of 0.8 V. The tracking efficiency is up to 96.2% when supplied by a PV micro-cell array using an irradiation range of 200 lux to 1000 lux.

This work proposes a power and area efficient Laplacian Pyramid processing engine (LPPE) for multi-resolution image representation in image/video processing. In the proposed LPPE, a novel FIFO architecture with adaptive data compression is proposed to reduce the power and area consumption of the LPPE. A new filtering extension method is also proposed to reduce the output errors. At the circuit level, near-threshold design is adopted to further reduce the power consumption by supply voltage scaling. The proposed LPPE fabricated in a 0.18 µm CMOS process technology can process 112 frames per second at 3.68 MHz and 0.5 V while consuming only 452 µW.

This work introduces an ultra-low voltage open loop VCO-based ADC with background calibration for ultra-low power applications. A novel calibration scheme is proposed to calibrate the nonlinear voltage-to-frequency tuning curve of the VCO. A replica VCO is used to compute the correction coefficients and the corrected values are stored in a lookup table. The proposed calibration method is at least 64 times faster than other state-of-the-art ones. The proposed VCO-based ADC achieves a resolution of 8.8 bits at 10 kHz bandwidth with the power consumption of 1.15 µW in the open loop architecture.

Dual-port SRAMs improve the performance of various hardware accelerators. This paper presents a low voltage 12T dual-port SRAM for biomedical hardware accelerators. The proposed dual-port SRAM cell decreases the disturbance of the common-row-access mode for improving the worst case stability issue and realizing ultra-low voltage operation. In addition, hierarchical bitlines and a virtual ground technique are employed to further lower the minimum operating voltage and power consumption. A 16 Kb 12T dual-port SRAM was fabricated in a 65 nm CMOS process technology and showed successful dual-port SRAM operation down to 0.4 V in the common-row-access mode.

A cognitive multi-functional ECG processor has been designed and fabricated in a 0.18 μm CMOS process. Various power-saving techniques are applied across different levels of the design hierarchy, including global cognitive clocking, pseudo-downsampling WT & IWT, adaptive storing, denoising-based run-length compression, and near-threshold operation.

The presented processor realizes comprehensive cardiac analysis functions while consuming only 457 nW from 0.5 V supply voltage, which is the lowest consumption among the processor designs for long-term ECG monitoring. (Collaboration IME, A*STAR)

This work presents a new NVM NEM device, called the shuttle memory. It consists of a floating metal electrode guided inside a pod cavity, which is actuated by electrostatic forces and has two stable positions. Permanent data retention is obtained by adhesion forces only, which eliminates leakage current and promises good reliability at high temperature (HT). Besides, the anchorless geometry provides better compactness compared with planar MEMS memory devices and does not suffer from elastic fatigue or thermal drifts. A 3-terminal cantilever-based NEMS device is also proposed as an NVM structure with fast READ/WRITE and good data retention at extremely high temperature (up to 300 °C). The proposed vibrational reset operation significantly reduces the device structure complexity and hence the fabrication cost. A selective set-after-reset scheme is introduced for energy efficient WRITE operation in the proposed array architecture.

Low Power CAM: This work reports a fully parallel match-line (ML) structure with an automated background checking (ABC) scheme. MLs are pre-charged by a pulsed current source to minimize power. The proposed ABC scheme monitors the ML sensing using two dummy rows. It digitally adjusts the pulse width and the delay of the search control signals of the CAM without disturbing the normal operation. The test chip was prototyped using a standard 65 nm CMOS process.

Capacitive-coupled Transceiver for 3DICs: This brief presents a simultaneous bidirectional capacitive coupling transceiver for intertier communication in 3-D integrated circuits. A novel capacitive coupling interconnect structure is proposed. Optimization of the proposed interconnect structure for minimizing parasitic capacitance achieves the voltage swing VSW of 200 mV at the voltage sensing nodes. The data rate of 3 Gb/s/ch is demonstrated in the emulated-3D interconnect. The proposed transceiver consumes 140 μW at 3 Gb/s/ch. The test chip was fabricated in a 65-nm CMOS technology.

Two ultra-low voltage SRAMs are implemented in 1.2 V, 65 nm CMOS technology. In the first SRAM, we propose a bitline equalization technique for single-ended bitlines for improving sensing margin and fast local write-back for eliminating the stability issue of the half-selected cells. In the second SRAM, we propose a novel 7T SRAM cell, write through virtual GND, and ultra-fine grain power gating switches. The write through virtual GND scheme improves the dynamic stability of unselected cells while eliminating the conventional half-selection. Ultra-fine grain power gating switches reduce the cell leakage from garbage data. The cell power is enabled only when the first write access occurs. This technique is more effective in wireless sensor nodes where memory data is newly generated after each power-up.

A voltage scalable 0.26 V, 64kb 8T SRAM with 512 cells per bitline is implemented in a 130 nm CMOS process. Utilization of the reverse short channel effect in an SRAM cell design improves cell write margin and read performance without the aid of peripheral circuits. A Marginal Bitline Leakage Compensation (MBLC) scheme compensates for the bitline leakage current, which becomes comparable to the read current at subthreshold supply voltages. The MBLC allows us to lower Vmin to 0.26 V and also eliminates the need for precharged read bitlines. A floating read bitline and write bitline scheme reduces the leakage power consumption. A deep sleep mode minimizes the standby leakage power consumption without degrading the hold mode cell stability. Finally, an automatic wordline pulse width control circuit tracks PVT variations and shuts off the bitline leakage current upon completion of a read operation.

A novel 8T SRAM-based bitcell is proposed for current-based compute-in-memory dot-product operations. The proposed bitcell with two extra NMOS transistors (vs. standard 6T SRAM) decouples SRAM read and write operation. A 128x128 8T SRAM bitcell array is built for processing a vector-matrix multiplication (or parallel dot-products) with 64x binary (0 or 1) inputs, 64x128 binary (-1 or +1) weights, and 128x 1-5bit outputs. Each column (i.e. neuron) of the proposed SRAM compute-in-memory macro consists of 64x bitcells for dot-product, 32x bitcells for ADC, and 32x bitcells for calibration. The column-based neuron minimizes the ADC overhead by reusing a sense amplifier for SRAM read. The column-wise ADC converts the analog dot-product results to N-bit output codes (N=1 to 5) by sweeping reference levels using replica bitcells for 2^N − 1 cycles for each conversion. Monte-Carlo simulations and test-chip measurement results have verified both linearity and process variation. The largest variation (σ=2.48%) results in the MNIST classification accuracy of 96.2% (i.e. 0.4% lower than a baseline with no variation). A test chip is fabricated in 65 nm CMOS, and the 16K SRAM bitcell array occupies 0.055 mm². The energy efficiency of the 1-bit operation is 490-to-15.8 TOPS/W at 1-5bit ADC mode using 0.45/0.8 V core supply and 200 MHz.
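
The column operation described above can be modelled behaviourally. The following is a minimal sketch, not the chip's circuit: it assumes uniformly spaced reference levels over a symmetric full-scale range and an ideal comparator, whereas the actual macro generates its references with replica bitcells. The function names and the `full_scale` parameter are illustrative assumptions.

```python
def dot_product(inputs, weights):
    """Binary inputs (0 or 1) times binary weights (-1 or +1), summed per column."""
    return sum(x * w for x, w in zip(inputs, weights))


def sweep_adc(value, n_bits, full_scale):
    """Convert `value` to an n_bits code by sweeping 2**n_bits - 1 references.

    One comparison is made per cycle; the output code is the number of
    reference levels that the value meets or exceeds.
    Returns (code, cycles_used).
    """
    levels = 2 ** n_bits
    step = 2 * full_scale / levels       # assumed uniform reference spacing
    code = 0
    cycles = 0
    for k in range(1, levels):           # exactly 2**n_bits - 1 iterations
        reference = -full_scale + k * step
        cycles += 1
        if value >= reference:
            code += 1
    return code, cycles


# One column with 64 inputs and 64 weights, as in the described macro.
inputs = [1, 0] * 32
weights = [1] * 64
result = dot_product(inputs, weights)    # 32 active inputs, all weights +1
code, cycles = sweep_adc(result, n_bits=5, full_scale=64)
print(result, code, cycles)
```

The cycle count grows as 2^N − 1 with the output precision, which is in line with the reported drop in energy efficiency between the 1-bit and 5-bit ADC modes.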

This work presents an 8T SRAM macro with vertical read word line (RWL) and selective dual split power line techniques. The proposed vertical RWL reduces dynamic power consumption during read operation by charging and discharging only selected read bitlines (RBLs). The data-aware dual split power line enhances the write margin (WM) and the static noise margin (SNM) when combined with vertical write bitlines. The 16kb SRAM test chip in 65 nm CMOS technology demonstrates the minimum energy consumption of 0.506 pJ at 0.4 V, and the minimum operating voltage of 0.26 V.

This work presents an ultra-low voltage level shifter (LS) with fast and energy-efficient voltage conversion from the deep subthreshold region to the superthreshold region. The proposed LS achieves better performance and higher energy efficiency by addressing the reduced swing and the slow fall transition issues in prior art. A novel reduced-swing buffer design is proposed to obtain lower standby power consumption while a pass transistor is used for improving the speed of the fall transition. The proposed LS consists of only 11 transistors, which is the same as WCMLS [4]. A test chip fabricated in 65 nm technology demonstrates that the proposed LS shows the maximum leakage and speed improvements of 16.3× and 2.7× compared to WCMLS. The proposed LS also accomplishes the maximum energy reduction of 8.5× and can convert the deep subthreshold voltage as low as 100 mV to the superthreshold voltage of 1.2 V.

This work proposes a novel 24-transistor change-sensing flip-flop (CSFF) for ultra-low power applications. With the aid of an internal change-sensing unit, the proposed CSFF eliminates redundant transitions of internal clocked nodes when the input and output data are identical. No additional transistors are required compared to the conventional transmission-gate flip-flop (TGFF). Test chip measurement in 40nm CMOS technology shows that CSFF exhibits the power reduction of 82% and 68% at 10% activity rate and 1.0V, and the delay improvement of 37% and 11% compared to conventional TGFF and static single-phase contention-free flip-flop (SSCFF) in the supply range of 0.4V to 1.0V. While achieving better power and energy efficiencies, CSFF still maintains robust functionality at ultra low voltage operations. The proposed CSFF shows the minimum operating voltage of 0.19 V.

A temperature-aware low voltage 8T SRAM for high temperature operations is presented. A dedicated read port with virtual ground and optimal body bias improves sensing margin under very high temperature (up to 300 °C). Bitline offset voltage for data ‘0’ caused by the virtual ground scheme is also compensated by a replica bitline. The independent body bias control feature of the employed SOI technology allows the write margin to be enhanced significantly without using any write-assist circuitry. Test chips were fabricated in 1 µm SOI technology with Tungsten interconnect for reliability at high temperature. Measurement results demonstrate that the proposed SRAM operates successfully up to 300 °C with the supply voltage range of 2 - 5 V. At the minimum performance variation point (VDD = 2.5 V), the SRAM consumes 1.48 mW and shows the access time of 156 ns and maximum clock frequency of 14.38 MHz at 300 °C.

This work proposes a novel architecture and circuit implementation for a Capacitance to Digital Converter (CDC). Capacitance information is digitized using a continuous time second order delta-sigma modulator with multi-bit quantization. The proposed architecture embeds a Capacitance to Voltage Converter (CVC) in the delta-sigma loop, which improves the dynamic range and energy efficiency of the CDC. An active-RC integrator and a multi-bit VCO-integrator/quantizer are used as the loop filters. Measurement results from a test chip fabricated in 0.18 µm CMOS technology show that the CDC achieves 13-bit resolution for capacitance-to-digital conversion with a measurement time of 0.125 ms while consuming only 42 µA from 1.2 V supply. This corresponds to a state-of-the-art figure-of-merit (FoM) of 0.84 pJ/conversion-step.

This work presents a highly efficient 3-stage boost converter with an isolated Power-on-Reset (PoR) based starter for thermal energy harvesting. The automatic pulse generation property of the proposed PoR is coupled with a charge-pump (CP) based clock enhancer (CE) to enhance the gate-driving capability for fast and efficient boost conversion during startup. Unlike conventional PoR-based startup circuits, where the reset signals cannot be directly utilized to execute a boost conversion during startup, the proposed starter converts a chain of pulses from the PoR into level-shifted clock signals to aid direct boost conversion from sub-threshold voltages. The proposed boost converter has a minimum self-startup TEG voltage of 150 mV at the series resistance (ESR) of 450 Ω without using external devices or native MOSFET. The maximum ESR for startup is 600 Ω at the TEG voltage of 320 mV. The peak power conversion efficiency of the proposed boost converter is 78%.

This work proposes an ultra-low voltage, VCO-based delta-sigma modulator with a self-compensated current reference against process and temperature variations. The proposed current reference generator sets the feedback current of the multi-bit Non-Return-to-Zero (NRZ) DAC and the VCO tuning coefficient (KVCO) at ultra-low voltage. A test chip fabricated in 65 nm CMOS technology demonstrated successful operation at 0.3 V. It consumes 510 nW and occupies 0.015 mm². The proposed VCO-based delta-sigma modulator achieves a peak SNDR of 56.1 dB at 0.3 V and 10 kHz input bandwidth, and a FoM of 49 fJ/conv.-step.
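
The reported figure of merit can be reproduced from the other numbers in the abstract. The sketch below assumes the commonly used Walden definition, FoM = P / (2 · BW · 2^ENOB) with ENOB = (SNDR − 1.76) / 6.02; the abstract does not state which definition was used, so this is an assumption, and the function names are illustrative.

```python
def enob(sndr_db: float) -> float:
    """Effective number of bits from SNDR in dB."""
    return (sndr_db - 1.76) / 6.02


def walden_fom(power_w: float, bandwidth_hz: float, sndr_db: float) -> float:
    """Walden figure of merit in joules per conversion step."""
    return power_w / (2 * bandwidth_hz * 2 ** enob(sndr_db))


# Reported values: 510 nW, 10 kHz bandwidth, 56.1 dB peak SNDR.
fom = walden_fom(510e-9, 10e3, 56.1)
print(f"ENOB = {enob(56.1):.2f} bits, FoM = {fom * 1e15:.0f} fJ/conv.-step")
```

With these inputs the ENOB is about 9.03 bits and the result rounds to 49 fJ/conv.-step, which agrees with the reported value under the assumed definition.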

This work presents a power and area efficient processor for real-time neural spike-sorting. We propose a robust spike detector (SD), a feature extractor (FE), and an improved k-means algorithm for better clustering accuracy. Furthermore, a time-multiplexing architecture is used in the SD for dynamic power reduction. A customized 39kb 8T SRAM is also implemented to minimize leakage and storage area. The proposed processor consumes 0.175 µW/ch with leakage of 0.03 µW/ch at 0.54 V and an area of 0.0033 mm²/ch.

This work also proposes an 8T SRAM utilizing a column-based data encoding scheme to reduce read and write power when there are similarities between consecutive data. It is useful in image processing applications where nearby pixels tend to have similar values. The proposed design has two modes of operation: normal and sequential modes. In the normal mode, it operates as a normal SRAM. In the sequential mode, bit-wise differences between consecutive data are written instead of the actual data. This leads to a much higher number of zeros in the array. Accordingly, a new data-aware bitline pre-charge scheme is proposed to minimize write power when writing a zero. A PVT-tracking reference voltage generator is also employed to compensate for the read-bitline leakage for ultra-low voltage operation.
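
The sequential mode can be illustrated in software. The following is a minimal sketch that assumes the "bit-wise difference" is a bit-wise XOR with the previous word and that the first word is compared against zero; both are assumptions, since the abstract does not specify the encoding details, and the function names are illustrative.

```python
def encode_sequential(words):
    """Store each word as its bit-wise XOR with the previous word."""
    previous = 0                      # assumed reference for the first word
    stored = []
    for word in words:
        stored.append(word ^ previous)
        previous = word
    return stored


def decode_sequential(stored):
    """Recover the original words by accumulating the XOR differences."""
    previous = 0
    words = []
    for diff in stored:
        previous ^= diff
        words.append(previous)
    return words


def count_ones(words):
    """Total number of '1' bits across all words."""
    return sum(bin(word).count("1") for word in words)


# Neighbouring 8-bit pixel values that differ only slightly.
pixels = [200, 201, 203, 203, 202]
stored = encode_sequential(pixels)    # [200, 1, 2, 0, 1]
assert decode_sequential(stored) == pixels
print(count_ones(pixels), "ones before,", count_ones(stored), "ones after")
```

For this example the number of stored ones drops from 21 to 6, which illustrates why a pre-charge scheme optimized for writing zeros pays off when consecutive data are similar.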

This work presents circuit techniques that support on-chip SRAM dynamic reliability management to prevent half-selected cell stability failure due to Bias Temperature Instability (BTI) degradation. The proposed techniques monitor the BTI degradation in SRAM cells through a replica row and adjust the WWL voltage level with the assistance of a two-phase write operation, where the WWL voltage level is divided into two phases to maintain the half-selected cell stability under BTI without compromising other circuit parameters. Test chip measurements show that the half-selected cell stability failure is reduced significantly with the proposed techniques at 10% area and 3.42% power overheads in a 28-nm FDSOI 16kb SRAM.

An energy-efficient sensor node processor (SNP) is presented for intelligent sensing in Internet of Things (IoT) applications. To achieve ultra-low energy consumption while meeting required performance, the proposed processor incorporates an ARM Cortex-M0 RISC core and diverse hardware accelerators, including a discrete wavelet packet transform (DWPT) engine, FIR filtering engine, FFT engine, and CORDIC engine, to accelerate common signal processing tasks in intelligent sensing. At the architecture level, a dual-bus architecture with automatic bus sensing and a reconfigurable memory access scheme are proposed. At the circuit level, digital-assisted cognitive sampling and ultra-low-voltage operation with in-situ timing-error monitoring techniques are employed. When applied to neural spike classification and vehicle speed detection, the proposed SNP consumes only 39 and 29 pJ/cycle at 0.5 V, respectively.

In this work, a simplified Linear Feedback Shift Register (LFSR) is used to shuffle input data so that the distribution of “1” and “0” in each column is close to 50%. As a result, bit-line sensing margin is enhanced. In addition, a bit-line boost biasing scheme is applied to further increase the bit-line swing and the sensing window. A 16Kb test chip fabricated in a 65 nm CMOS technology demonstrates successful SRAM operation at 0.2 V and room temperature. The power consumption of 0.94 mW and the access time of 256 ns were achieved at 0.2 V and room temperature.
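
One way to picture LFSR-based data shuffling is as scrambling: each data word is XORed with a pseudo-random word before it is written, and XORed again with the same word when it is read. The sketch below is an illustration under that assumption and is not the circuit used in the test chip; the 8-bit width, the tap positions, the seed, and the function names are all hypothetical choices.

```python
def lfsr8_step(state: int) -> int:
    """Advance an 8-bit Fibonacci LFSR by one step (taps at bits 7, 5, 4, 3)."""
    feedback = ((state >> 7) ^ (state >> 5) ^ (state >> 4) ^ (state >> 3)) & 1
    return ((state << 1) | feedback) & 0xFF


def scramble(words, seed: int = 0x5A):
    """XOR each word with the current LFSR state, then advance the LFSR.

    Applying the function twice with the same seed restores the input,
    so the same routine serves for both write and read.
    """
    state = seed                      # must be non-zero
    out = []
    for word in words:
        out.append(word ^ state)
        state = lfsr8_step(state)
    return out


# A worst-case column pattern: every stored word is all zeros.
data = [0x00] * 16
stored = scramble(data)
assert scramble(stored) == data       # reading back recovers the data
ones = sum(bin(word).count("1") for word in stored)
print(f"{ones} ones out of {8 * len(stored)} stored bits")
```

Without scrambling, this input would place only zeros on every bit-line; after scrambling, the stored pattern contains a mix of ones and zeros, which is the property the abstract relies on to improve sensing margin.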

This work introduces a hybrid capacitive coupling interconnect (CCI) array suitable for bumpless flip-chip 3D integration. Inside the hybrid array, single-ended and common-centroid differential CCIs are interleaved to cancel the crosstalk among them. This inherent crosstalk cancellation capability allows CCIs to be placed closer together and thus improves the area efficiency. A high-gain, high-CMRR receiver is also presented to minimize jitter caused by common-mode noise. A process-variation-tracking biasing circuit is also proposed for the receiver. Measurements verify that the proposed transceiver in a 3 × 3 pseudo-hybrid CCI array produces only 84 ps or 0.2 UI of crosstalk-related jitter under the worst-case crosstalk condition. The 9 transceivers in the array achieve a total data rate of 20.79 Gbps and consume only 53 µW/Gbps. The chip was fabricated in 65 nm CMOS technology.
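
The jitter and power figures above can be related to each other with simple arithmetic. The following is a minimal sketch, assuming the aggregate data rate is divided evenly across the 9 transceivers; the function name and its return layout are illustrative.

```python
def link_summary(total_rate_bps, channels, jitter_s, power_per_gbps_w):
    """Derive per-channel rate, unit interval, jitter in UI, and total power."""
    per_channel_bps = total_rate_bps / channels      # assumed even split
    unit_interval_s = 1.0 / per_channel_bps
    jitter_ui = jitter_s / unit_interval_s
    total_power_w = power_per_gbps_w * (total_rate_bps / 1e9)
    return per_channel_bps, unit_interval_s, jitter_ui, total_power_w


rate, ui, jitter_ui, power = link_summary(20.79e9, 9, 84e-12, 53e-6)
print(f"{rate / 1e9:.2f} Gbps/ch, UI = {ui * 1e12:.0f} ps, "
      f"jitter = {jitter_ui:.2f} UI, total = {power * 1e3:.2f} mW")
```

Under this assumption each channel runs at 2.31 Gbps with a unit interval of about 433 ps, so 84 ps of jitter is roughly 0.19 UI, in line with the quoted 0.2 UI, and the array draws about 1.1 mW in total.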

A 0.18 V energy-delay efficient 16-transistor DFF targeting near-/sub-threshold operation is presented. With the aid of charge pumps and an anti-INWE sizing strategy, a 23% boosted C-Q delay from 0.18 V to 0.3 V is observed. The delay variability is minimized by pumped gate voltages. Power consumption is remarkably reduced, mainly due to the minimum transistor count. The DFF proves to have a 50.8% lower energy-delay product compared to TGFF. The circuit is verified by a 256-bit FIFO and achieves 31.2% power reduction at 0.3 V. Experimental results validate that the proposed DFF is well suited to near-/sub-threshold applications. (Collaboration IME, A*STAR)

An 8-Kbit low power 8-T SRAM for high temperature (up to 300 °C) applications is presented. Near-threshold operation is selected for minimum performance variations over a wide temperature range. We propose a PVT-tracking bitline sensing margin enhancement technique to improve the bitline swing and the sensing window. Test chips fabricated in a commercial 1.0-µm SOI technology with a high temperature interconnection option demonstrate successful SRAM operation at 2 V, 300 °C.

An energy efficient 9T SRAM with bitline leakage equalization and Content-Addressable-Memory-assisted performance boosting techniques is presented. The equalized read bitline leakage improves the read bitline swing by 6.8× at 0.2V. The proposed CAM-assisted boosting technique enhances the write performance of the multi-threshold CMOS (MTCMOS) SRAM array implemented with higher-Vth (HVT) devices. The inserted tiny CAM conceals the slow data development after data flipping, and therefore improves overall operating frequency in the near threshold region. A 16Kb SRAM test chip was fabricated in 65nm CMOS technology and showed the minimum energy of 0.33 pJ at 0.4V.

An SRAM reliability test macro is designed in a 1.2V, 65nm CMOS process for statistical measurements of Vmin degradation. An automated test program efficiently collects statistical Vmin data and reduces test time. The proposed test structure enables Vmin degradation measurements for different SRAM failure modes such as the SNM-limited case and the access-time-limited case. The impact of voltage stress on the time to cell data flip was measured.

Precise measurement of digital circuit degradation is a key aspect of aging tolerant digital circuit design. In this work, we present a fully-digital on-chip reliability monitor for high resolution frequency degradation measurements of digital circuits. The proposed technique measures the beat frequency of two ring oscillators, one stressed and the other unstressed, to achieve 50× higher delay sensing resolution than prior techniques. This differential frequency measurement technique also eliminates the effect of common-mode environmental variation. A 265 × 132 µm² test chip implementing this design has been fabricated in a 1.2 V, 130 nm CMOS technology. The measured resolution of the proposed monitoring circuit was 0.02%; as the ring oscillator in this design has a period of 4 ns, this translates to a temporal resolution of 0.8 ps. The 2 µs measurement time was short enough to prevent the unwanted recovery effect from concealing the actual circuit degradation.
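
The resolution figures above, and the reason a beat-frequency measurement is sensitive to small shifts, can be sketched numerically. This is an illustrative model only: the 1% initial frequency offset between the two oscillators is a hypothetical value chosen for the example, and the function names are not from the original work.

```python
def temporal_resolution(period_s: float, fractional_resolution: float) -> float:
    """Smallest detectable change in oscillator period, in seconds."""
    return period_s * fractional_resolution


def beat_count(f_ref_hz: float, f_stressed_hz: float) -> float:
    """Number of reference periods that fit in one beat period."""
    return f_ref_hz / abs(f_ref_hz - f_stressed_hz)


# Reported values: 4 ns period and 0.02% frequency resolution.
print(f"{temporal_resolution(4e-9, 0.0002) * 1e12:.1f} ps")   # prints "0.8 ps"

# Hypothetical setup: 250 MHz reference, stressed oscillator initially 1% slower.
f_ref = 250e6
f_fresh = 247.5e6
f_aged = f_fresh * (1 - 0.0002)          # an additional 0.02% slowdown
print(beat_count(f_ref, f_fresh), beat_count(f_ref, f_aged))
```

In this example a 0.02% frequency shift moves the count from 100 to about 98, so a change far too small to resolve by counting oscillator cycles directly becomes a change of roughly two counts in the beat measurement.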

A 2 mW, 100 kHz, 480kb subthreshold SRAM operating at 0.2 V is demonstrated in a 130 nm CMOS process. A 10-T SRAM cell allows 1k cells per bitline by eliminating the data-dependent bitline leakage. A virtual ground replica scheme is proposed for logic ‘0’ level tracking and optimal sensing margin in read buffers. Utilizing the strong reverse short channel effect in the subthreshold region improves cell writability and row decoder performance due to the increased current drivability at a longer channel length. The sizing method leads to an equivalent write wordline voltage boost of 70 mV and a delay improvement of 28% in the row decoder compared to the conventional sizing scheme at 0.2 V. A bitline writeback scheme was used to eliminate the pseudo-write problem in unselected columns.