Abstract
DRAM suffers from a growing number of errors from process scaling. System companies and DRAM vendors have introduced system ECC (Error Correcting Codes) and on-die ECC to protect memory from errors. The protection schemes, however, increase protection costs with separate redundancy. This paper proposes a unified memory protection scheme called You Code Only Once (YOCO). In YOCO, the redundancy generated by the system ECC is used for both system-level symbol-error correction and on-die bit-error correction. YOCO can reduce HBM2E ECC redundancy overheads from 21.9% down to 12.5% with negligible yield losses.
YOCO Architecture
Figure 1. A comparison between prior work and YOCO architecture
Fig. 1 compares the ECC architectures of the prior work [1] and YOCO. The prior work uses 16 bits of redundancy (12.5%) per 128 bits of data to provide S-ECC with 8-bit Single Symbol Correction (SSC). A standard HBM2E device receives two S-ECC blocks (288 bits), partitions them into three 96-bit chunks, and adds 8 bits of extra redundancy per chunk to provide SEC-DED O-ECC per chunk. This flow requires two encodings and two decodings, one each for S-ECC and O-ECC, and the total cell overhead is 21.9%.
With YOCO, the shared encoder utilizes the same ECC block size as the S-ECC in the prior work (i.e., 8-bit SSC over 128-bit data and 16-bit redundancy). However, it does not have separate encoding and redundancy for O-ECC. Instead, the S-ECC encoding scheme is shared with O-ECC designers, possibly by standardization, and the O-ECC designer develops a simpler decoding algorithm. Using the shared redundancy, the redundant cell overhead decreases to 12.5%.
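The two overhead figures follow directly from the bit counts given above. The short calculation below reproduces them; the variable names are illustrative and not taken from the paper.

```python
# Bit counts taken from the text; variable names are illustrative.
DATA_BITS = 128            # data bits per S-ECC block
S_ECC_BITS = 16            # SSC redundancy bits per S-ECC block
BLOCKS_PER_ACCESS = 2      # HBM2E receives two S-ECC blocks at a time
CHUNK_BITS = 96            # O-ECC chunk size in the prior work
O_ECC_BITS_PER_CHUNK = 8   # extra O-ECC redundancy per chunk

data = BLOCKS_PER_ACCESS * DATA_BITS       # 256 data bits
s_ecc = BLOCKS_PER_ACCESS * S_ECC_BITS     # 32 S-ECC bits
chunks = (data + s_ecc) // CHUNK_BITS      # 288 / 96 = 3 chunks
o_ecc = chunks * O_ECC_BITS_PER_CHUNK      # 24 O-ECC bits

prior_overhead = (s_ecc + o_ecc) / data    # 56 / 256
yoco_overhead = s_ecc / data               # 32 / 256

print(f"prior work: {prior_overhead:.1%}, YOCO: {yoco_overhead:.1%}")
```

The prior work stores 56 redundant bits per 256 data bits, while YOCO stores only the 32 S-ECC bits.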
The O-ECC decoding utilizes S-ECC redundancy, which is symbol-based, to correct bit errors (i.e., SEC). It may use the same decoding as S-ECC, yet full symbol-based decoding can take significant time (e.g., Samsung reported 3.5ns for SSC decoding [2]). Because the primary goal of O-ECC is to correct random bit errors, we simplify the decoding to correct only single-bit errors, which reduces the decoding latency and avoids degrading performance.
YOCO Decoding
Figure 2. An overview of the O-ECC decoder in YOCO
The goal of YOCO O-ECC decoding is to provide SEC-DED using SSC redundancy, and Fig. 2 shows the overview. Single-bit errors are a subset of single-symbol errors and have unique syndromes in the SSC. YOCO computes the syndrome value of each single-bit error in advance, adds a comparator at each bit position to compare the computed syndrome against the corresponding precomputed value, and flips the bit on a match. The latency of this simple decoding is significantly shorter (e.g., 0.42ns with a UMC 28nm HVT process) than that of the full Reed-Solomon decoding of SSC.
Double error detection is not guaranteed by SSC. Two bit errors in different symbols corrupt two symbols, which SSC cannot always detect. However, the SSC for HBM2E utilizes shortened codes to match its access granularity. 8-bit Reed-Solomon allows codes of up to 255 symbols, yet the S-ECC shortens the code to 18 symbols to accommodate 16 data symbols and 2 redundancy symbols. If a double-bit error produces a syndrome that corresponds to an unused position, it can be detected.
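The comparator-based decoding can be sketched in software as follows. This is a minimal illustration, not the authors' circuit: the text does not give the parity-check matrix, so the sketch assumes one whose column for symbol j is (1, α^j), with α = 0x02 in GF(2^8) defined by x^8 = x^4 + x^3 + x^2 + 1. A dictionary lookup stands in for the per-bit comparators, and all names are hypothetical.

```python
POLY = 0x11D   # x^8 + x^4 + x^3 + x^2 + 1 (assumed field polynomial)
N_SYM = 18     # 16 data symbols + 2 redundancy symbols

def gf_mul(a, b):
    """Multiply two GF(2^8) elements modulo POLY."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= POLY
        b >>= 1
    return r

# ALPHA_POW[j] = alpha^j; the assumed H has column (1, alpha^j) for symbol j.
ALPHA_POW = [1]
for _ in range(N_SYM - 1):
    ALPHA_POW.append(gf_mul(ALPHA_POW[-1], 2))

def syndrome(word):
    """Return (S0, S1) for a list of N_SYM byte-valued symbols."""
    s0 = s1 = 0
    for j, sym in enumerate(word):
        s0 ^= sym
        s1 ^= gf_mul(sym, ALPHA_POW[j])
    return s0, s1

# Precomputed syndrome of every single-bit error -> (symbol index, bit index).
# In hardware this corresponds to one comparator per bit position.
BIT_SYNDROMES = {
    (1 << b, gf_mul(1 << b, ALPHA_POW[j])): (j, b)
    for j in range(N_SYM)
    for b in range(8)
}

def decode_sec(word):
    """Correct a single-bit error, or flag anything else that is nonzero."""
    s = syndrome(word)
    if s == (0, 0):
        return list(word), "clean"
    hit = BIT_SYNDROMES.get(s)
    if hit is None:
        return list(word), "detected"   # no comparator matched
    j, b = hit
    fixed = list(word)
    fixed[j] ^= 1 << b                  # flip the matching bit
    return fixed, "corrected"
```

Because the code is linear, the all-zero word is a valid codeword and is enough for a quick check: flipping one bit yields "corrected", while flipping two bits yields "detected" under this assumed matrix. Multi-bit errors inside one symbol are also only flagged here. Correcting them would be left to the full S-ECC decoder.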
To find a shortened SSC with the DED property, we built symbol-based parity-check matrices of 8-bit Reed-Solomon codes with different primitive polynomials, converted them into binary matrices, and checked whether any sum of three columns of the binary matrix is zero. If such a zero sum exists, a two-bit error has the same syndrome as a single-bit error and can be miscorrected. We found several codes satisfying the required property (e.g., the primitive polynomial x^8 = x^4 + x^3 + x^2 + 1 with the first 18 columns selected). At the circuit level, the detection can be implemented as a NOR of the correctable-error syndrome matches.
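The three-column check can be sketched as below. The exact form of the authors' parity-check matrix is not stated, so this sketch assumes symbol columns (1, α^j) with α = 0x02 and expands each into eight 16-bit binary columns, one per bit position. Function names are hypothetical. Under this assumed form the check passes for the example polynomial, but since the authors' matrix may differ, this does not reproduce their search.

```python
from itertools import combinations

def gf_mul(a, b, poly):
    """Multiply two GF(2^8) elements modulo the given field polynomial."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return r

def binary_columns(poly=0x11D, n_sym=18):
    """Binary expansion of the assumed H: one 16-bit column per code bit.

    The column for bit b of symbol j is the syndrome (2^b, 2^b * alpha^j),
    packed as a 16-bit integer with S0 in the high byte.
    """
    cols = []
    alpha_j = 1
    for _ in range(n_sym):
        for b in range(8):
            e = 1 << b
            cols.append((e << 8) | gf_mul(e, alpha_j, poly))
        alpha_j = gf_mul(alpha_j, 2, poly)
    return cols

def has_zero_three_column_sum(cols):
    """True if some three columns XOR to zero (a miscorrectable 2-bit error)."""
    seen = set(cols)
    for x, y in combinations(cols, 2):
        z = x ^ y
        if z in seen and z != x and z != y:
            return True
    return False

cols = binary_columns()
print("columns:", len(cols), "DED property holds:", not has_zero_three_column_sum(cols))
```

Looking up the XOR of each column pair in a set keeps the search at roughly ten thousand pair checks for 144 columns, so sweeping all primitive polynomials of degree 8 would be inexpensive.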
Evaluation
Figure 3. An evaluation of DRAM yield against different cell fault ratios
YOCO can reduce the overall ECC storage overhead from 21.9% down to 12.5% by eliminating O-ECC redundancy. It can also reduce write latency by eliminating the O-ECC encoding step. The decoding latency is estimated as 0.42ns, which is the same as the estimated latency of the current HBM2E O-ECC (SEC on 104-bit blocks) and significantly less than the reported SSC decoding latency (3.5ns) [2].
To estimate the yield, we assume random cell faults with varying ratios and apply a binomial distribution. We assume a 1Gb HBM2E sub-channel can tolerate up to 160 uncorrectable blocks using its spare rows and columns. If the number of uncorrectable blocks exceeds this count, the sub-channel is regarded as bad. The baseline has no ECC over 128b blocks, and the prior work utilizes (104, 96) SEC O-ECC. Fig. 3 presents the results. With no protection, the yield starts decreasing at 10^-9. Prior work and YOCO maintain near 100% yield up to 10^-6. At the 3x10^-6 ratio, the prior work shows a slightly better yield than YOCO (100% vs. 98.2%). All schemes fail to achieve a high yield at the higher ratios. Despite the slightly lower yield, YOCO can still correct severe errors using the second decoding of S-ECC.
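The per-block part of this estimate can be illustrated with a binomial tail. The sketch below is not the authors' evaluation code: it computes only the probability that one block is uncorrectable, assumes independent cell faults, treats the YOCO O-ECC block as 144 bits with one correctable bit, and ignores spare rows and columns. It is therefore not expected to reproduce the yields in Fig. 3.

```python
from math import comb

def block_fail_prob(n_bits, fault_ratio, correctable_bits):
    """P(more than `correctable_bits` of `n_bits` independent cells are faulty).

    The binomial upper tail is summed directly to avoid the cancellation
    that 1 - CDF would suffer at very small fault ratios.
    """
    p = fault_ratio
    return sum(
        comb(n_bits, k) * p**k * (1.0 - p) ** (n_bits - k)
        for k in range(correctable_bits + 1, n_bits + 1)
    )

for ratio in (1e-9, 1e-6, 3e-6):
    none = block_fail_prob(128, ratio, 0)    # baseline: no ECC, 128b block
    prior = block_fail_prob(104, ratio, 1)   # prior work: (104, 96) SEC
    yoco = block_fail_prob(144, ratio, 1)    # YOCO: assumed 144b block, SEC
    print(f"{ratio:.0e}: no ECC {none:.2e}, prior {prior:.2e}, YOCO {yoco:.2e}")
```

Under these assumptions a 144-bit block is roughly twice as likely as a 104-bit block to contain two or more faulty cells, which is consistent in direction with the prior work's slightly higher yield.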
Conclusion
This paper presents and analyzes an efficient memory protection scheme called YOCO. YOCO can provide the same level of protection as separate S-ECC and O-ECC but reduces the redundancy by encoding only once.
References
[1] K. C. Chun et al., “A 16-GB 640-GB/s HBM2E DRAM with a databus window extension technique and a synergetic on-die ECC scheme,” IEEE J. Solid-State Circuits, vol. 56, no. 1, pp. 199–211, Jan. 2021.
[2] S. Cha et al., “Defect Analysis and Cost-Effective Resilience Architecture for Future DRAM Devices,” in HPCA, 2017.