This project is aimed at implementing the construction of Locally-Testable-Codes with constant distance, rate and locality, proposed by Dinur et al .
You can find the code of this project here.
You can also find a link to a document about the choices of parameters here (coming soon)
Why Does the Code Have Rate In Unproven Regimes?
The construction proves a lower bound on the rate of the resulting code: 2(rate_A+rate_B)-3 (or, under certain conditions, 2(rate_A*rate_B)-1). However, when rate_A+rate_B<1.5 (or rate_A*rate_B<1/2), there is no guarantee on the rate of the code.
Empirically, we know that the code is non-empty (and so it has rate > 0). It is an interesting question to understand under which conditions the code still has rate.
The same question holds for expander codes when the smaller code's rate is too small to provide meaningful lower bounds.
Analyze Guarantees of the Decoding Algorithms along Vertices and Edges
The decoding algorithm for c3-LTC described in the original paper is quite inefficient. The main loop checks if there is a vertex g and a choice of tensor code word to put in g's local view, such that the number of disputes between local views of the vertices. This exhaustive search approach is inefficient, even for small sizes.
In our implementation, we choose to implement two "natural" decoding algorithms:
Iterate over all the edges, and decode the local view of each edge.
Iterate over all the vertices, and decode the local view of each vertex.
We observe that the vertex decoding is similar to Tanner code decoding approach, while the edge decoding is somewhat similar to tensor decoding (because we can think of iterating along the A-edges followed by iteration along the B-edge, akin to the decoding along the columns and then the rows in the "natural" algorithm for decoding tensor codes).
It remains to prove the concrete decoding bounds of these proposed algorithms.
Encoding by propagation
The construction of the generator matrix for the generated LDPC code is computationally expensive.
In our code, the parity matrix is generated by collecting the constraints forced on each edge. Afterward, we compute the kernel of the resulting (big) matrix, which has dimension: ([set of A edges] * [co-dimesion of C_B] + [set of B edges] * [co-dimesion of C_A]) X (squares). As the number of squares behaves like |G| * |A| * |B| / 4, this number is likely to be very big (similarly for the number of rows), and so performing Gaussian elimination or computing the kernel is quite a heavy computation.
An interesting open direction would be to find a linear encoding algorithm.
A suggestion for such an algorithm by prorogation -
Choose squares and assign values to them. If the value of some square is "forced" by the constraints posed by the tensor code, compute the value it should have according to these constraints and continue until all squares have values.
Find Small Tensor Codes that are k-Agreement Testable
The original construction calls for tensor-code (of the two inner codes), which is k-agreement testable.
However, finding concrete codes whose tensor is k-agreement testable for concrete value of k, even for tensor of small codes, turns out to be too computationally heavy.
An interesting open direction would be to construct a simple tensor code with a concrete value of k.
Additional Improvements
1) Use a more sophisticated tensor decoding, using decoding with erasures.