Abstract
Conventional DRAM’s limitations in volatility, high static power consumption, and scalability have led to the exploration of alternative technologies such as Phase Change Memory (PCM) and Resistive RAM (ReRAM). Storage Class Memory (SCM) has emerged as a target application for these technologies, offering non-volatility and higher capacity through Multi-Level Cells (MLCs). However, MLCs suffer from lower reliability and reduced endurance. To address this, our paper introduces a novel Error Correction Code (ECC), “Single Eight-Level Cell Correcting” (SELCC) ECC. This technique efficiently corrects single-cell errors in 8-level cell (8LC) memories using existing ECC syndromes without added redundancy. SELCC enhances memory reliability and improves 8LC memory endurance by 3.2 times, surpassing previous solutions without significant overheads.
SELCC ECC
Fig. 1. An example of (72,64) SELCC H-matrix in multiplicative form.
Fig. 2. An example of (72,64) SELCC H-matrix in binary form.
We propose a novel ECC scheme, SELCC, tailored to correct every single-cell error in 8LC memories. Despite its enhanced correction capability, SELCC does not require additional redundancy beyond what the conventional SEC-DED requires.
SELCC achieves this efficiency by leveraging the unused syndromes of shortened codes together with error boundaries. Conventional SEC-DED codes employ only 72 of the 255 possible nonzero syndromes to correct all single-bit errors (SEs) across a 72-bit codeword. SEC-DAEC codes use another 71 of the remaining 183 syndromes to correct the 71 double-adjacent errors (DAEs). Similarly, SEC-DAEC-TAEC uses another 70 of the residual 112 syndromes to correct the 70 triple-adjacent errors (TAEs). This adjacency-based approach reaches its limit here: the 42 leftover syndromes are insufficient to also correct double non-adjacent errors (DNEs).
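The syndrome budget described above can be tallied in a few lines of Python. This is only an illustrative sketch: the variable names are ours, and the count of 70 DNE patterns assumes that a DNE means bits i and i+2 of a 3-bit window, counted at every bit offset.

```python
# Syndrome budget of a (72,64) code with 8 check bits.
CODEWORD_BITS = 72
CHECK_BITS = 8

nonzero_syndromes = 2 ** CHECK_BITS - 1   # all syndromes except "no error"
se_patterns = CODEWORD_BITS               # one per bit position
dae_patterns = CODEWORD_BITS - 1          # bits (i, i+1)
tae_patterns = CODEWORD_BITS - 2          # bits (i, i+1, i+2)
dne_patterns = CODEWORD_BITS - 2          # assumption: bits (i, i+2)

left_after_se = nonzero_syndromes - se_patterns
left_after_dae = left_after_se - dae_patterns
left_after_tae = left_after_dae - tae_patterns

print("left after SEC:", left_after_se)
print("left after DAEC:", left_after_dae)
print("left after TAEC:", left_after_tae)
print("DNEs still fit:", dne_patterns <= left_after_tae)
```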
Instead, SELCC focuses on error boundaries to reduce the number of targeted error patterns. In memory systems, it is rare for adjacent errors to cross a cell boundary, making such errors prime candidates for exclusion from the targeted error pool. A 72-bit word holds 24 8LC cells, and each cell can exhibit 7 distinct nonzero error patterns, ranging from 001 through 111. This gives a total of 24 × 7 = 168 errors, all contained within cell boundaries, which fits within the 255 nonzero syndromes that 8 redundancy bits provide.
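As a quick check of this count, the cell-confined error patterns can be enumerated directly. The function name and constants below are hypothetical and serve only to illustrate the arithmetic.

```python
BITS_PER_CELL = 3      # an 8-level cell stores 3 bits
CODEWORD_BITS = 72

def cell_error_patterns(codeword_bits=CODEWORD_BITS, bits_per_cell=BITS_PER_CELL):
    """Return (cell index, nonzero 3-bit error pattern) for every single-cell error."""
    cells = codeword_bits // bits_per_cell
    return [(cell, pattern)
            for cell in range(cells)
            for pattern in range(1, 2 ** bits_per_cell)]   # 0b001 .. 0b111

patterns = cell_error_patterns()
print("single-cell error patterns:", len(patterns))
print("fits in 8-bit syndrome space:", len(patterns) <= 2 ** 8 - 1)
```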
The challenge then lies in devising a suitable mapping for these 168 errors onto unique syndromes, setting the stage for effective error correction. The subsequent subsections delve into this in detail, outlining the required properties for such codes, introducing a construction algorithm to find these codes, and describing their hardware implementation.
A. Code Properties
In an H-matrix, each column represents the syndrome generated when an SE arises at the corresponding bit position. Likewise, the syndrome resulting from a multi-bit error is the sum (XOR) of the columns from the corresponding bit positions. Consequently, the H-matrix of SELCC should adhere to the following properties to correct all single-cell errors.
1) Every column must be nonzero.
2) Each column must be unique.
3) The sum of two or three columns within any cell boundaries must be nonzero and unique.
The first and second properties guarantee that syndromes arising from any single error are distinct. In the third criterion, each trio of columns represents a cell, and the property ensures that syndromes from double or triple errors within a cell are unique.
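Taken together, the three properties say that the XOR of every nonempty subset of a cell's columns must be nonzero and must not repeat anywhere in the matrix. A minimal checker under that reading might look as follows; the function names are ours, and columns are assumed to be given as 8-bit integers in left-to-right order with three consecutive columns per cell.

```python
from itertools import combinations

def cell_syndromes(cell_columns):
    """XOR of every nonempty subset of one cell's columns (7 values for 3 bits)."""
    syndromes = []
    for size in range(1, len(cell_columns) + 1):
        for subset in combinations(cell_columns, size):
            s = 0
            for column in subset:
                s ^= column
            syndromes.append(s)
    return syndromes

def satisfies_selcc_properties(columns, bits_per_cell=3):
    """True if all single-cell error syndromes are nonzero and mutually distinct."""
    seen = set()
    for start in range(0, len(columns), bits_per_cell):
        for s in cell_syndromes(columns[start:start + bits_per_cell]):
            if s == 0 or s in seen:
                return False
            seen.add(s)
    return True

# Toy two-cell examples (not the SELCC matrix).
print(satisfies_selcc_properties([1, 2, 4, 8, 16, 32]))  # all 14 syndromes differ
print(satisfies_selcc_properties([1, 2, 4, 8, 16, 3]))   # column 3 equals 1 XOR 2
```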
B. Code Construction
We employ GF(2^8) arithmetic to derive an H-matrix that satisfies these properties. GF(2^8) is a Galois field with 256 elements and a primitive element denoted α. The field is closed under addition and multiplication, and its elements are 0 and the powers α^0 through α^254. We treat every 8-bit column of the H-matrix as an element of GF(2^8).
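To make the field concrete, the powers of α can be generated by repeated multiplication by α with reduction modulo a primitive polynomial. The sketch below assumes the polynomial 0x11D (the example polynomial named later in this section) and represents α as 0x02; the helper names are ours.

```python
POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1, assumed here for illustration

def gf_mul_alpha(x, poly=POLY):
    """Multiply a field element by alpha: shift left, reduce if bit 8 is set."""
    x <<= 1
    if x & 0x100:
        x ^= poly
    return x

def alpha_powers(poly=POLY):
    """Return [alpha^0, alpha^1, ..., alpha^254] as 8-bit integers."""
    powers = [1]
    for _ in range(254):
        powers.append(gf_mul_alpha(powers[-1], poly))
    return powers

POWERS = alpha_powers()
print("distinct nonzero elements:", len(set(POWERS)))
print("alpha^8 =", hex(POWERS[8]))
```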
The construction begins by populating the rightmost 8 columns of the H-matrix with descending powers of α, from α^7 down to α^0. This makes the code systematic, meaning the data portion of the codeword is an unmodified copy of the input data, and it simultaneously aligns with 8LC boundaries.
As a consequence, double or triple errors in the rightmost cell yield syndromes of α^0 + α^1 (1st DAE), α^1 + α^2 (2nd DAE), α^0 + α^2 (DNE), and α^0 + α^1 + α^2 (TAE). In a similar manner, the next rightmost cell produces syndromes of α^3 + α^4 = α^3 (α^0 + α^1), α^3 (α^1 + α^2), α^3 (α^0 + α^2), and α^3 (α^0 + α^1 + α^2). Generalizing this, assigning descending powers of α to n cells, ranging from α^(3n−1) to α^0, leads to syndromes of {α^(3(n−1)), … , α^3, α^0}(α^0 + α^1) for the 1st DAEs, {α^(3(n−1)), … , α^3, α^0}(α^1 + α^2) for the 2nd DAEs, and so forth.
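The factoring used in this step, where each cell's multi-bit syndromes are the rightmost cell's syndromes scaled by a power of α^3, can be confirmed numerically. The sketch below again assumes the polynomial 0x11D and α = 0x02; the table and function names are ours.

```python
POLY = 0x11D  # assumed polynomial, for illustration only

# EXP[i] = alpha^i with alpha = 0x02; LOG is the inverse mapping.
EXP = [1]
for _ in range(254):
    nxt = EXP[-1] << 1
    EXP.append(nxt ^ POLY if nxt & 0x100 else nxt)
LOG = {value: i for i, value in enumerate(EXP)}

def gf_mul(a, b):
    """Multiply two GF(2^8) elements via the log/antilog tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[(LOG[a] + LOG[b]) % 255]

# Multi-bit syndromes of the rightmost cell (columns alpha^2, alpha^1, alpha^0).
BASE = {
    "1st DAE": EXP[0] ^ EXP[1],
    "2nd DAE": EXP[1] ^ EXP[2],
    "DNE": EXP[0] ^ EXP[2],
    "TAE": EXP[0] ^ EXP[1] ^ EXP[2],
}

def multi_bit_syndromes(cell):
    """Multi-bit syndromes of the cell holding alpha^(3*cell) .. alpha^(3*cell+2)."""
    lo, mid, hi = (EXP[3 * cell + j] for j in range(3))
    return {"1st DAE": lo ^ mid, "2nd DAE": mid ^ hi,
            "DNE": lo ^ hi, "TAE": lo ^ mid ^ hi}

# Each syndrome equals alpha^(3*cell) times the matching base syndrome.
for cell in range(8):
    for name, value in multi_bit_syndromes(cell).items():
        assert value == gf_mul(EXP[3 * cell], BASE[name])
print("factoring holds for the first 8 cells")
```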
The next step is to allocate these sequences so that they do not overlap, thereby preserving the uniqueness of syndrome values. After iterating over the 16 primitive (generator) polynomials of GF(2^8) and considering different allocation lengths, we identified several H-matrices that meet all SELCC properties. An illustrative H-matrix, for instance, employs the generator polynomial 0x11D with a cell-run-length of 8, as visualized in Fig. 1 (in multiplicative form) and Fig. 2 (in binary form).
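One way to picture such a search is a greedy scan over start exponents, accepting a run of cells only if none of its syndromes has been used. The sketch below is our own simplification and is not the paper's construction algorithm; the function names, the greedy order, and the assumption that every run has the same length are all hypothetical, and the result need not match the matrix in Fig. 1.

```python
from itertools import combinations

def alpha_powers(poly):
    """Powers alpha^0..alpha^254 of alpha = 0x02 in GF(2^8) defined by poly."""
    powers = [1]
    for _ in range(254):
        nxt = powers[-1] << 1
        powers.append(nxt ^ poly if nxt & 0x100 else nxt)
    return powers

def run_syndromes(powers, start, run_cells):
    """All 7 * run_cells single-cell syndromes of a run of consecutive powers."""
    syndromes = []
    for k in range(run_cells):
        cell = [powers[(start + 3 * k + j) % 255] for j in range(3)]
        for size in (1, 2, 3):
            for subset in combinations(cell, size):
                s = 0
                for column in subset:
                    s ^= column
                syndromes.append(s)
    return syndromes

def search_runs(poly=0x11D, run_cells=8, total_cells=24):
    """Greedily pick run start exponents; return None if the scan falls short."""
    powers = alpha_powers(poly)
    used, starts = set(), []
    for start in range(255):
        if len(starts) * run_cells >= total_cells:
            break
        syndromes = run_syndromes(powers, start, run_cells)
        distinct = len(set(syndromes)) == len(syndromes) and 0 not in syndromes
        if distinct and used.isdisjoint(syndromes):
            used.update(syndromes)
            starts.append(start)
    return starts if len(starts) * run_cells >= total_cells else None

print(search_runs())
```

Repeating the call with other polynomials or run lengths mirrors, in spirit, the iteration described above; a return value of None means this particular greedy order found no allocation, not that none exists.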
Fig. 3. An implementation of (72,64) SELCC decoder.
C. Hardware Implementation
The encoding process of SELCC is the same as that of conventional SEC-DED. Multiplying a 64-bit data vector by a (64 × 72) G-matrix produces a (1 × 72) codeword. In the binary domain, addition and multiplication translate to XOR and AND operations, respectively. As a result, each redundancy bit is generated by an XOR gate tree whose inputs are determined by the G-matrix [1].
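In software terms, systematic encoding reduces to XOR-ing together the H-matrix columns of the data bits that are set. The sketch below illustrates this with a toy 4-bit data word; the data columns are placeholders chosen for the example and are not SELCC columns, and the function names are ours.

```python
# Check-bit columns alpha^7 .. alpha^0, i.e. the 8x8 identity, left to right.
CHECK_COLUMNS = [0x80 >> i for i in range(8)]

def encode(data_bits, data_columns):
    """Append 8 check bits so that the codeword's syndrome is zero."""
    check = 0
    for bit, column in zip(data_bits, data_columns):
        if bit:               # AND of the data bit with its column
            check ^= column   # XOR tree accumulating the check byte
    check_bits = [(check >> i) & 1 for i in range(7, -1, -1)]  # MSB first
    return list(data_bits) + check_bits

def syndrome(word, columns):
    """XOR of the columns at every position where the word has a 1."""
    s = 0
    for bit, column in zip(word, columns):
        if bit:
            s ^= column
    return s

toy_data_columns = [0x07, 0x0B, 0x0D, 0x0E]   # placeholder columns
codeword = encode([1, 0, 1, 1], toy_data_columns)
print(codeword)
print("syndrome:", syndrome(codeword, toy_data_columns + CHECK_COLUMNS))
```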
Decoding is slightly more intricate, as shown in Fig. 3. The decoder multiplies a 72-bit word by the (72 × 8) transposed H-matrix to yield a (1 × 8) syndrome, using XOR gate trees. This syndrome is then distributed to the output data bits, each of which compares it against its own set of correctable syndromes. An output bit in SEC-DED has only one correctable syndrome (an SE at the current position, P), whereas one in SELCC has up to 4 correctable syndromes: one SE at P, two DEs at (P, one of the two other positions within the cell), and one TAE (all positions within the cell). If the syndrome matches one of these correctable syndromes, the final stage flips the output bit using an XOR gate, thereby correcting the error.
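A software analogue of this decoder replaces the per-bit comparators with a lookup table from syndrome to the bit positions to flip. The sketch below is a behavioral model only. It assumes the polynomial 0x11D and three runs of 8 cells starting at exponents 0, 81, and 162, which is an allocation of our own choosing and not the matrix of Fig. 1; the table builder raises an error if that assumption ever produced a syndrome collision.

```python
POLY = 0x11D
RUN_STARTS = (162, 81, 0)   # assumed start exponents, leftmost run first

def alpha_powers(poly=POLY):
    powers = [1]
    for _ in range(254):
        nxt = powers[-1] << 1
        powers.append(nxt ^ poly if nxt & 0x100 else nxt)
    return powers

def build_columns():
    """72 H-matrix columns, left to right; the last 8 are alpha^7 .. alpha^0."""
    powers = alpha_powers()
    columns = []
    for start in RUN_STARTS:
        columns += [powers[start + e] for e in range(23, -1, -1)]
    return columns

def syndrome(word, columns):
    s = 0
    for bit, column in zip(word, columns):
        if bit:
            s ^= column
    return s

def build_table(columns):
    """Map each single-cell error syndrome to the bit positions it flips."""
    table = {}
    for base in range(0, len(columns), 3):
        for pattern in range(1, 8):
            positions = [base + j for j in range(3) if (pattern >> j) & 1]
            s = 0
            for pos in positions:
                s ^= columns[pos]
            if s == 0 or s in table:
                raise ValueError("columns violate the SELCC properties")
            table[s] = positions
    return table

def decode(word, columns, table):
    """Return (64 data bits, status) after correcting any single-cell error."""
    s = syndrome(word, columns)
    if s == 0:
        return list(word[:64]), "clean"
    if s not in table:
        return list(word[:64]), "uncorrectable"
    corrected = list(word)
    for pos in table[s]:
        corrected[pos] ^= 1     # the final XOR stage
    return corrected[:64], "corrected"

COLUMNS = build_columns()
TABLE = build_table(COLUMNS)
print("correctable syndromes:", len(TABLE))
```

In hardware, each output bit would check only the (up to four) table entries that include its own position, which is what the parallel comparators and OR gate implement.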
In comparison to SEC-DED, each output bit in SELCC requires three additional comparators and a 4-input OR gate to aggregate the comparison results. Because the comparisons run in parallel, the OR gate becomes the primary source of the latency increase. Section V-C presents a detailed evaluation of latency, area, and power consumption.
Evaluation results can be found in our paper.
Conclusion
We introduced a novel ECC, termed SELCC, tailored to enhance the reliability and longevity of 8LC SCMs. While MLC memories bring forth the benefits of non-volatility and higher capacity, their integration as SCMs raises concerns about reliability and endurance due to inevitable resistance drifts and cell wear-outs. Prior ECC solutions with similar overheads fell short of addressing all single-cell errors, mandating a separate mechanism to enhance SCM lifetimes. SELCC significantly improves both reliability and endurance (by 3.2 times) by correcting all single-cell errors. This enhancement is achieved without the need for additional redundancy.
References
[1] Y. Song, S. Park, M. B. Sullivan, and J. Kim, “SEC-BADAEC: An Efficient ECC with No Vacancy for Strong Memory Protection,” IEEE Access, vol. 10, 2022.