Unity ECC: Unified Memory Protection Against Bit and Chip Errors (2023). Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC). Best Student Paper Finalist, Invited to SAIF 2023
Dongwhee Kim, Jaeyoon Lee, Wonyeong Jung, Michael Sullivan, and Jungrae Kim
Abstract
DRAM vendors use On-Die Error Correction Codes (OD-ECC) to correct random bit errors internally, while system companies use Rank-Level ECC (RL-ECC) to protect data against chip errors. This separate protection increases the redundancy ratio in DDR5 to 32.8% and incurs significant performance penalties. This paper proposes a novel RL-ECC, Unity ECC, that can correct both single-chip and double-bit error patterns. Unity ECC corrects double-bit errors using syndromes that single-chip correction leaves unused. Our evaluation shows that Unity ECC without OD-ECC can provide the same reliability level as Chipkill RL-ECC with OD-ECC. Moreover, by eliminating OD-ECC, it can significantly improve system performance and reduce DRAM energy and area.
Motivation
Figure 1. Applying 8-bit Symbol AMD Chipkill to DDR5.
This study is motivated by the high cost of separate bit-level and chip-level protection. Combining OD-ECC and RL-ECC provides robust memory protection against both bit-level and chip-level errors. However, it increases redundancy and hurts performance because of the overfetching and Read-Modify-Writes (RMWs) that OD-ECC requires. Meanwhile, Chipkill-correct RL-ECC for DDR5 leaves many syndromes unused; if they were repurposed to correct additional bit errors, OD-ECC could be eliminated, reducing redundancy, energy consumption, and performance overheads.
A. OD-ECC Overheads
DDR5 OD-ECC employs (136, 128) codes to correct single-bit errors [1]. This requires an additional 6.25% of cells for redundancy (8 check bits per 128 data bits), and the extra encoding and decoding circuitry further enlarges the chip area. A DRAM vendor has reported a total chip area increase of 6.9% for OD-ECC [2], which presents a substantial challenge for cost-sensitive manufacturers. When combined with the 25% extra chips of a DDR5 ECC-DIMM, the overall cell redundancy rises to 32.8% (1.0625 × 1.25 ≈ 1.328).
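The 32.8% figure follows from multiplying the two overheads, since every cell, including those on the RL-ECC chips, also carries OD-ECC. A minimal sketch of that arithmetic, assuming a 40-pin DDR5 sub-channel with 32 data pins (the helper name `overhead` is illustrative):

```python
# Cell-redundancy arithmetic for DDR5 with OD-ECC plus a rank-level ECC-DIMM.

def overhead(check_bits: int, data_bits: int) -> float:
    """Extra storage relative to the protected data (check / data)."""
    return check_bits / data_bits

# OD-ECC: a (136, 128) code stores 8 check bits per 128 data bits in every chip.
od_ecc = overhead(136 - 128, 128)

# ECC-DIMM: 8 redundancy pins per 32 data pins on a sub-channel (25% extra chips).
rl_ecc = overhead(40 - 32, 32)

# Every cell, including the RL-ECC chips, also carries OD-ECC, so the factors multiply.
combined = (1 + od_ecc) * (1 + rl_ecc) - 1

print(f"OD-ECC:   {od_ecc:.2%}")
print(f"RL-ECC:   {rl_ecc:.2%}")
print(f"Combined: {combined:.1%}")
```

Removing OD-ECC leaves only the 25% RL-ECC term, which is the redundancy Unity ECC targets.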
OD-ECC also degrades performance because of the mismatch between the access granularity (64-bit data) and the ECC granularity (128-bit data). A ×4 DDR5 chip transfers 64 bits of data over a 16-beat burst. Ideally, the OD-ECC block size would match the access granularity, but SEC over 64-bit data needs 7 check bits, raising the redundancy to 10.9%. The mismatch between access and ECC granularities causes overfetching and RMW operations, which increase power consumption and degrade performance.
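The 7-bit figure comes from the Hamming bound: an SEC code with r check bits needs a distinct non-zero syndrome for each of the k + r bit positions, so 2^r − 1 ≥ k + r. A small sketch (the function name `sec_check_bits` is illustrative) shows why halving the ECC block raises the relative redundancy:

```python
def sec_check_bits(data_bits: int) -> int:
    """Smallest r such that a binary SEC (Hamming) code can protect data_bits.

    Each of the data_bits + r single-bit error positions needs its own
    non-zero r-bit syndrome: 2**r - 1 >= data_bits + r.
    """
    r = 1
    while 2 ** r - 1 < data_bits + r:
        r += 1
    return r

for k in (64, 128):
    r = sec_check_bits(k)
    print(f"SEC over {k:3d} data bits: {r} check bits ({r / k:.2%} redundancy)")
```

Doubling the data block adds only one check bit, which is why OD-ECC uses 128-bit blocks despite the 64-bit access granularity.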
For every 64-bit read, a DRAM chip must internally fetch 128 bits of data along with their redundancy, decode them, and transfer only half of the fetched data. This consumes more power and lengthens the read time (by up to 2 ns in [3]). Writes are even more problematic, as each requires fetching the original 128-bit block, partially updating it with the new data, re-encoding it, and writing the block back to the cells [2, 4, 5, 6]. DDR5 has kept most timing parameters unchanged despite this overhead, with one exception: tCCD_L_WR, the latency between two consecutive writes to the same bank group, which has doubled due to OD-ECC. Because of the longer read time and tCCD_L_WR, OD-ECC is reported to reduce the performance of memory-intensive applications by 5% to 10% on average [2].
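The read-modify-write sequence can be illustrated with a toy model. The sketch below assumes a hypothetical (136, 128) SEC matrix built from non-power-of-two 8-bit columns; it is not the JEDEC-defined OD-ECC code, and `masked_write` is an illustrative name. It shows why a 64-bit write needs the untouched half: the check bits depend on all 128 bits.

```python
# Toy model of a 64-bit write into a 128-bit OD-ECC block (read-modify-write).
# The SEC matrix is hypothetical, NOT the JEDEC-defined code.

# Data-bit columns: the first 128 non-zero 8-bit values that are not powers of two.
# Check-bit columns are the powers of two, so each single-bit error has a unique
# non-zero syndrome (single-error correction).
DATA_COLS = [v for v in range(1, 256) if v & (v - 1)][:128]
HALF_MASK = (1 << 64) - 1

def encode(data: int) -> int:
    """Return the 8 check bits of a 128-bit data block."""
    check = 0
    for i, col in enumerate(DATA_COLS):
        if (data >> i) & 1:
            check ^= col
    return check

def syndrome(data: int, check: int) -> int:
    """Zero for a consistent block; for one flipped bit, that bit's column."""
    return encode(data) ^ check

def masked_write(stored_data: int, new_half: int, upper: bool) -> tuple[int, int]:
    """Write 64 new bits into one half of a 128-bit block.

    stored_data stands for step 1 (read the whole block from the array). A device
    could also correct the old block at this point; this sketch skips that step.
    """
    shift = 64 if upper else 0
    # Step 2 (modify): keep the untouched half and splice in the new 64 bits.
    merged = (stored_data & ~(HALF_MASK << shift)) | ((new_half & HALF_MASK) << shift)
    # Step 3 (re-encode): check bits cover all 128 bits, so the old half is needed.
    # Step 4 (write back): the caller stores merged data and check bits to the cells.
    return merged, encode(merged)
```

Encoding only the new 64 bits would give the wrong check bits whenever the other half is non-zero, which is why the chip must fetch the full block first.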
B. Shortened Codes in RL-ECC
Meanwhile, DDR5 RL-ECC has the potential to provide more-than-Chipkill correction. As an example, we apply AMD Chipkill to a DDR5 sub-channel and show that many syndromes are used for detection only.
On a DDR5 sub-channel with 32 data pins and 8 redundancy pins, we construct each 8-bit symbol from two consecutive beats of data from a ×4 chip, similar to the AMD approach (Figure 1). Consequently, an ECC word comprises 8 data symbols and 2 redundant symbols. The two redundant symbols (16 bits in total) offer 65535 distinct nonzero syndromes, which can be used to identify any single-symbol error (255 error values for an 8-bit symbol) across 255 symbol positions.
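The symbol construction can be sketched as a reshuffle of one 16-beat burst into eight 10-symbol codewords. This assumes a hypothetical pin and bit ordering (chip c drives pins 4c to 4c+3; beat 2k forms the low nibble of symbol k and beat 2k+1 the high nibble); the actual wiring may differ. The point it illustrates is that a whole-chip failure becomes a single-symbol error in every codeword.

```python
PINS_PER_CHIP = 4
CHIPS = 10   # 8 data chips + 2 redundancy chips on a 40-pin sub-channel
BEATS = 16

def burst_to_codewords(burst: list[int]) -> list[list[int]]:
    """Group a 16-beat, 40-pin burst into 8 codewords of 10 8-bit symbols.

    burst[t] is a 40-bit integer: bit 4*c + p is pin p of chip c on beat t
    (hypothetical ordering). Symbol k of chip c = its nibble on beat 2k (low)
    and beat 2k+1 (high); codeword k gathers symbol k from every chip.
    """
    assert len(burst) == BEATS
    codewords = []
    for k in range(BEATS // 2):
        word = []
        for c in range(CHIPS):
            lo = (burst[2 * k] >> (PINS_PER_CHIP * c)) & 0xF
            hi = (burst[2 * k + 1] >> (PINS_PER_CHIP * c)) & 0xF
            word.append(lo | (hi << 4))
        codewords.append(word)
    return codewords

# A chip that flips all of its bits corrupts exactly one symbol per codeword.
faulty_chip3 = [0xF << (PINS_PER_CHIP * 3)] * BEATS
for word in burst_to_codewords(faulty_chip3):
    print(word)
```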
However, each ECC word contains only 10 symbols (8 for data and 2 for redundancy); the remaining 245 symbol positions are fixed to zero during encoding and decoding (i.e., the code is shortened). If a syndrome corresponds to an error in one of the shortened positions, it is treated as detection of a more severe error (e.g., a two-chip error) rather than as a correction of a symbol known to be error-free. As a result, only 2,550 syndromes (10 × 255, or 3.89%) out of the 65535 are used for correction, and the remaining 96.11% are used for detection only.
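The syndrome budget above can be checked with a few lines; the constant names are illustrative.

```python
SYMBOL_BITS = 8
REDUNDANT_SYMBOLS = 2
CODEWORD_SYMBOLS = 10   # 8 data + 2 redundant symbols per codeword

total = 2 ** (REDUNDANT_SYMBOLS * SYMBOL_BITS) - 1   # non-zero 16-bit syndromes
error_values = 2 ** SYMBOL_BITS - 1                  # non-zero errors per symbol
correcting = CODEWORD_SYMBOLS * error_values         # syndromes used by SSC

print(f"correction:     {correcting} ({correcting / total:.2%})")
print(f"detection only: {total - correcting} ({1 - correcting / total:.2%})")
```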
If these syndromes could be repurposed to correct multi-bit errors, OD-ECC could potentially be eliminated, reducing redundancy, power consumption, and performance overheads. This change trades detection capability for correction, so it must be carefully controlled to avoid degrading detection coverage, which is important for large-scale and mission-critical systems.
Unity ECC
Algorithm 1. Flow of code construction algorithm (Greedy search)
Figure 2. 2 × 257 unshortened extended RS code H-matrix. The last two columns indicate the identity matrix.
Figure 3. H-matrix example of (10, 8) Unity-ECC with generator polynomial = 0x15F.
This paper proposes a novel ECC, called Unity ECC, that corrects both bit errors and chip errors at the rank level. With Single Symbol Correcting and Double Error Correcting (SSC-DEC) capabilities, Unity ECC offers robust protection against both increasingly common scaling-induced bit errors and infrequent but severe chip-level errors. By integrating double-bit error correction into RL-ECC, Unity ECC eliminates the storage, power, and performance costs associated with OD-ECC. Its efficiency stems from repurposing detection-only syndromes in RL-ECC to correct multi-bit errors.
Unity ECC is a strong single-tier RL-ECC designed to correct DRAM bit and chip errors. Like AMD Chipkill, Unity ECC forms 8-bit symbols from two beats of data per ×4 chip, resulting in eight (10, 8) codewords of 8-bit symbols per memory transfer. Like RS codes, Unity ECC can correct a chip error using SSC with 2 redundant symbols per codeword. In addition, its novel SSC-DEC capability corrects double-bit errors by mapping them to syndromes that are detection-only in the SSC code. Unity ECC thus unifies the roles of RL-ECC and OD-ECC within a single RL-ECC without additional redundancy.
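One way to picture "mapping double errors to detection-only syndromes" is a lookup table from syndrome to error pattern that covers every single-symbol error and every double-bit error, with all other non-zero syndromes left as detected-uncorrectable (DUE). This is a generic software sketch, not the paper's decoding method; `build_table` and `decode` are hypothetical names, and H is passed as a list of integer columns.

```python
from itertools import combinations

def syndrome(word: int, cols: list[int]) -> int:
    """XOR of the H columns selected by the set bits of word."""
    s = 0
    for i, col in enumerate(cols):
        if (word >> i) & 1:
            s ^= col
    return s

def build_table(cols: list[int], symbol_bits: int) -> dict[int, int]:
    """Map every correctable syndrome to its error pattern (bit mask over the word).

    Correctable: any error confined to one symbol, plus any two-bit error
    spanning two symbols. A collision means H is not SSC-DEC.
    """
    n = len(cols)
    if n % symbol_bits:
        raise ValueError("codeword length must be a multiple of the symbol size")
    table: dict[int, int] = {}

    def add(mask: int) -> None:
        s = syndrome(mask, cols)
        if s == 0 or s in table:
            raise ValueError("H is not SSC-DEC: syndrome collision")
        table[s] = mask

    for start in range(0, n, symbol_bits):
        for value in range(1, 1 << symbol_bits):
            add(value << start)
    for i, j in combinations(range(n), 2):
        if i // symbol_bits != j // symbol_bits:   # same-symbol pairs already added
            add((1 << i) | (1 << j))
    return table

def decode(received: int, cols: list[int], table: dict[int, int]) -> tuple[int, str]:
    """Return (word, status) with status 'clean', 'corrected', or 'DUE'."""
    s = syndrome(received, cols)
    if s == 0:
        return received, "clean"
    if s in table:
        return received ^ table[s], "corrected"
    return received, "DUE"   # a syndrome reserved for detection only
```

A hardware decoder would use parallel logic rather than a software dictionary; the table only makes the syndrome-to-error mapping explicit.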
A. Code Property
Our proposed Unity ECC codes can correct all single-symbol errors and all random double-bit errors. A linear block code is fully determined by its parity-check matrix, H, which dictates the structure of the encoder and decoder as well as the code's error correction and detection capabilities. The H-matrix of Unity ECC should have the following properties:
1) All columns are non-zero.
2) DEC: The sums (XOR) of any two distinct columns are unique non-zero values.
3) SSC: The sums (XOR) of any non-empty set of columns within one symbol are unique non-zero values.
4) DEC+SSC: All sums from properties 2 and 3 are unique (except double-bit errors within one symbol, which are treated as symbol errors).
The first and second properties provide DEC capability: the syndrome of a double-bit error is the XOR of the two corresponding columns, so every such sum must be non-zero and distinct. The first and third properties relate to SSC, where the syndrome is the XOR of the erroneous columns within one symbol. All syndromes derived from DEC and SSC must be non-zero and unique, with overlapping cases (a 2-bit error within a single symbol) counted only once.
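The four properties translate directly into a checker over a binary H-matrix given as integer columns. A minimal sketch, assuming columns are grouped into consecutive symbols of `symbol_bits` columns each (`is_ssc_dec` is a hypothetical name):

```python
from itertools import combinations

def is_ssc_dec(cols: list[int], symbol_bits: int) -> bool:
    """Return True if binary H (given as column integers) meets properties 1-4."""
    n = len(cols)
    if n % symbol_bits or 0 in cols:          # property 1: no zero column
        return False
    seen: set[int] = set()

    def fresh(s: int) -> bool:
        """Record s; False if it is zero or already used by another error pattern."""
        if s == 0 or s in seen:
            return False
        seen.add(s)
        return True

    # Property 3 (SSC): every non-zero error pattern confined to one symbol.
    for start in range(0, n, symbol_bits):
        group = cols[start:start + symbol_bits]
        for value in range(1, 1 << symbol_bits):
            s = 0
            for b, col in enumerate(group):
                if (value >> b) & 1:
                    s ^= col
            if not fresh(s):
                return False
    # Properties 2 and 4 (DEC, DEC+SSC): two-bit errors spanning two symbols.
    # Two-bit errors inside one symbol were already counted as symbol errors.
    for i, j in combinations(range(n), 2):
        if i // symbol_bits != j // symbol_bits and not fresh(cols[i] ^ cols[j]):
            return False
    return True
```

For instance, with 1-bit symbols the columns `[1, 2, 3]` fail because 1 XOR 2 equals the third column, whereas `[1, 2, 4, 8, 15]` pass.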
B. Code Construction
Consider an 80-bit codeword with 8-bit symbols. The sums of any two H-matrix columns yield $\binom{80}{2} = 3160$ cases, while the sums of columns within any one symbol yield $\binom{10}{1} \times (2^8 - 1) = 2550$ cases. The overlapping cases, $10 \times \binom{8}{2} = 280$ double-bit errors within a single symbol, are counted only once, giving $3160 + 2550 - 280 = 5430$ cases. If all of these syndromes are non-zero and unique, the code satisfies the SSC-DEC requirements.
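The same count can be computed for any codeword and symbol size, which is handy when adapting the construction to other configurations. A small sketch (the function name is illustrative):

```python
from math import comb

def required_syndromes(n_bits: int, symbol_bits: int) -> dict[str, int]:
    """Count the distinct syndromes an SSC-DEC code must reserve for correction."""
    n_symbols = n_bits // symbol_bits
    double_bit = comb(n_bits, 2)                                 # any two columns
    single_symbol = comb(n_symbols, 1) * (2 ** symbol_bits - 1)  # any one-symbol error
    overlap = n_symbols * comb(symbol_bits, 2)                   # 2-bit errors in one symbol
    return {
        "double_bit": double_bit,
        "single_symbol": single_symbol,
        "overlap": overlap,
        "total": double_bit + single_symbol - overlap,
    }

counts = required_syndromes(80, 8)
print(counts, "of", 2 ** 16 - 1, "available")
```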
The number of possible non-zero syndromes with two 8-bit redundant symbols is $2^{16} - 1 = 65535$. Although this far exceeds the 5430 unique syndromes needed for single-symbol and double-bit errors, finding such an SSC-DEC code is non-trivial. As a starting point, one might build on RS or BCH codes: RS codes provide SSC, while BCH codes provide DEC. We construct the Unity ECC H-matrix from the unshortened extended RS code H-matrix (Figure 2), because building DEC properties on RS codes may be easier than building SSC properties on BCH codes. Unity ECC codes are constructed as systematic codes for convenience.
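A sketch of the starting point: the 257 symbol columns of an unshortened extended RS H-matrix over GF(2^8), expanded into binary 16-bit columns. It assumes α = 0x02 as the primitive element, the generator polynomial 0x15F from the Figure 3 example, and column order (1, α^j) for j = 0..254 followed by the identity columns (1, 0) and (0, 1); the paper's exact layout may differ, and all function names are illustrative.

```python
POLY = 0x15F   # primitive polynomial x^8 + x^6 + x^4 + x^3 + x^2 + x + 1 (Figure 3)

def gf_mul(a: int, b: int, poly: int = POLY) -> int:
    """Multiply two GF(2^8) elements: shift-and-add, reducing modulo poly."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= poly
    return r

def rs_symbol_columns() -> list[tuple[int, int]]:
    """The 257 symbol columns (h0, h1): (1, alpha^j) for j < 255, then (1, 0), (0, 1)."""
    cols, x = [], 1
    for _ in range(255):
        cols.append((1, x))
        x = gf_mul(x, 2)
    return cols + [(1, 0), (0, 1)]

def binary_columns(h0: int, h1: int) -> list[int]:
    """Expand one symbol column into 8 binary 16-bit H columns, one per error bit.

    An error value e in this symbol yields syndrome (e*h0, e*h1); because the field
    product is linear over GF(2), the column for bit b is (2^b*h0, 2^b*h1).
    """
    return [(gf_mul(1 << b, h0) << 8) | gf_mul(1 << b, h1) for b in range(8)]
```

Because the extended RS code has minimum symbol distance 3, the 257 × 255 single-symbol syndromes are all distinct, so this matrix already uses all 65535 syndromes for SSC before shortening; the selection step decides which positions to keep.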
We select columns from the unshortened H-matrix (Figure 2) until the codeword length is reached. As in [7, 8, 9, 10], a greedy search chooses each new column based on the previously selected columns. Algorithm 1 presents the Unity ECC construction algorithm using a greedy search.
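Algorithm 1 itself is not reproduced in this text, so the following is only a generic greedy loop in the spirit of [7, 8, 9, 10], not the paper's algorithm. It keeps a set of syndromes already claimed by correctable errors and accepts a candidate symbol (a group of binary columns) only if its single-symbol syndromes and its two-bit errors against already-selected symbols are all non-zero and unclaimed. The candidate pool, its ordering, and the name `greedy_select` are assumptions, and the loop may return fewer symbols than requested.

```python
def greedy_select(initial: list[list[int]], candidates: list[list[int]],
                  target: int) -> list[list[int]]:
    """Greedily grow an SSC-DEC H-matrix one symbol (group of binary columns) at a time.

    initial:    groups that must be kept (e.g. the identity part of a systematic H)
    candidates: groups to try, in order (hypothetical pool and ordering)
    target:     desired number of symbols; fewer are returned if the pool runs out
    """
    selected: list[list[int]] = []
    used: set[int] = set()   # syndromes already claimed by correctable errors

    def try_add(group: list[int]) -> bool:
        new: set[int] = set()

        def claim(s: int) -> bool:
            if s == 0 or s in used or s in new:
                return False
            new.add(s)
            return True

        # Every non-zero error pattern inside the candidate symbol (SSC).
        for value in range(1, 1 << len(group)):
            s = 0
            for b, col in enumerate(group):
                if (value >> b) & 1:
                    s ^= col
            if not claim(s):
                return False
        # Two-bit errors between the candidate and already-selected symbols (DEC).
        for col in group:
            for other in selected:
                for prev in other:
                    if not claim(col ^ prev):
                        return False
        used.update(new)
        selected.append(group)
        return True

    for group in initial:
        if not try_add(group):
            raise ValueError("initial symbols already violate SSC-DEC")
    for group in candidates:
        if len(selected) >= target:
            break
        try_add(group)
    return selected
```

As a toy example with 1-bit symbols, starting from the 4-bit identity columns and trying the values 1 to 15 in order, the loop accepts only 15, which yields the (5, 1) repetition code. For Unity ECC the candidates would be 8-column symbol groups derived from the extended RS H-matrix.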
Our Unity ECC construction algorithm is flexible: codeword and data lengths can be adjusted, making it applicable to various systems. This paper focuses on DDR5 protection; Figure 3 shows an example Unity ECC code with 64-bit data and an 80-bit codeword, matching DDR5's code configuration.
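Because the code is systematic, encoding reduces to XOR-ing H-matrix columns. A minimal sketch, assuming H = [A | I16] with the 64 data columns first and the 16 identity columns last; the Figure 3 matrix is not reproduced here, so `PLACEHOLDER_COLS` is a hypothetical stand-in for its data columns.

```python
def syndrome(word: int, cols: list[int]) -> int:
    """XOR of the H columns selected by the set bits of word."""
    s = 0
    for i, col in enumerate(cols):
        if (word >> i) & 1:
            s ^= col
    return s

def encode_systematic(data: int, data_cols: list[int]) -> int:
    """Return the codeword: data in the low bits, 16 check bits above them.

    With H = [A | I16], the syndrome A*d XOR c is zero exactly when c = A*d,
    i.e. c is the XOR of the data columns selected by the set data bits.
    """
    check = 0
    for i, col in enumerate(data_cols):
        if (data >> i) & 1:
            check ^= col
    return data | (check << len(data_cols))

# Hypothetical placeholder data columns; a real Unity ECC matrix (e.g. Figure 3) goes here.
PLACEHOLDER_COLS = [((i + 1) * 0x79B1) & 0xFFFF for i in range(64)]
H_COLS = PLACEHOLDER_COLS + [1 << t for t in range(16)]

word = encode_systematic(0x0123456789ABCDEF, PLACEHOLDER_COLS)
print(hex(word), syndrome(word, H_COLS))
```

The check bits are simply the syndrome the data alone would produce, so the encoder and the syndrome generator can share the same XOR trees.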
Evaluation results can be found in our paper (Open Access)
Conclusion
This paper presents Unity ECC, a novel memory protection scheme that addresses key challenges in DRAM technology: high access latency, energy consumption, hardware overhead, and susceptibility to errors. Implemented for DDR5 DRAM as a single-tier RL-ECC, Unity ECC eliminates OD-ECC and reduces DRAM redundancy from 32.8% to 25%, improving performance and reducing energy consumption. The proposed flexible algorithm and efficient decoding method allow Unity ECC to offer significant benefits over conventional DDR5 while maintaining acceptable levels of system reliability.
References
[1] JEDEC. 2022. DDR5 SDRAM Standard, JESD79-5B v1.20.
[2] Sanguhn Cha, O Seongil, Hyunsung Shin, Sangjoon Hwang, Kwangil Park, Seong Jin Jang, Joo Sun Choi, Gyo Young Jin, Young Hoon Son, Hyunyoon Cho, et al. 2017. Defect analysis and cost-effective resilience architecture for future DRAM devices. In 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 61–72.
[3] Sanghyuk Kwon, Young Hoon Son, and Jung Ho Ahn. 2014. Understanding DDR4 in pursuit of in-DRAM ECC. In 2014 International SoC Design Conference (ISOCC). IEEE, 276–277.
[4] Seong-Lyong Gong, Jungrae Kim, Sangkug Lym, Michael Sullivan, Howard David, and Mattan Erez. 2018. DUO: Exposing on-chip redundancy to rank-level ECC for high reliability. In 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 683–695.
[5] Uksong Kang, Hak-Soo Yu, Churoo Park, Hongzhong Zheng, John Halbert, Kuljit Bains, S Jang, and Joo Sun Choi. 2014. Co-architecting controllers and DRAM to enhance DRAM process scaling. In The Memory Forum, Vol. 14.
[6] Saeng-Hwan Kim, Won-Oh Lee, Jung-Ho Kim, Seong-Seop Lee, Sun-Young Hwang, Chang-Il Kim, Tae-Woo Kwon, Bong-Seok Han, Sung-Kwon Cho, DaeHui Kim, et al. 2007. A low power and highly reliable 400Mbps mobile DDR SDRAM with on-chip distributed ECC. In 2007 IEEE Asian Solid-State Circuits Conference. IEEE, 34–37.
[7] Avijit Dutta and Nur A Touba. 2007. Multiple bit upset tolerant memory using a selective cycle avoidance based SEC-DED-DAEC code. In 25th IEEE VLSI Test Symposium (VTS’07). IEEE, 349–354.
[8] Jiaqiang Li, Pedro Reviriego, Liyi Xiao, Costas Argyrides, and Jie Li. 2017. Extending 3-bit burst error-correction codes with quadruple adjacent error correction. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 26, 2 (2017), 221–229.
[9] Zhu Ming, Xiao Li Yi, and Luo Hong Wei. 2011. New SEC-DED-DAEC codes for multiple bit upsets mitigation in memory. In 2011 IEEE/IFIP 19th International Conference on VLSI and System-on-Chip. IEEE, 254–259.
[10] Chauchin Su and Jyrghong Wang. 1993. ECCSyn-A synthesis tool for ECC circuits. In 1993 IEEE International Symposium on Circuits and Systems (ISCAS). IEEE, 1706–1709.