Our full implementation of AES encryption and decryption in verilog can be found in our Github repository linked here. We have included a few sections of our modules below to explain our process and structure.
Each row in this step gets shifted over by either 0, 1, 2, or 3 positions row wise. This is an example from the second row where the bytes get shifted one position to the left. In our implementation, 'state' is the unshifted state and the values of 'newstate' get assigned to the corresponding initial values in 'state'. The rest of the implementation of ShiftRows follows the same structure, just with different offsets for each row.
Rijndael's S-box is a 16x16 lookup table where the rows represent the first half of a byte and the rows make up the second half. To implement this 2D array in verilog, we used nested case blocks. First, we split 'address' in half where we assign 'column' the least significant half and 'row' gets assigned to the most significant half.
The row condition is evaluated first and once a condition has been matched, there is an internal case statement that checks for a matching column value.
In order to perform the modular multiplication required in the MixColumns step, each column can be treated as a vector that gets multiplied by a 4x4 matrix derived from Galois Field. To execute this operation, the bits are shifted one to the left and then XOR-ed with the signed bit.
In MixColumns, each 1x4 column is taken and matrix multiplication is implemented. Since Galois Field is finite, addition is performed by the XOR operation. The simplification of the new column is relatively easy since there are only three coefficients to work with. Nothing needs to happen when the coefficient is 1. We have a module that performs multiplication by 2 and to multiply by three, the initial value can be added to the multiplication by 2 result since 2x+x is equivalent to 3x.
In the key expansion step, a unique key is generated for each round based on the previous key used. The operation happens column wise, so first we had to concatenate the values from the initial state which makes referencing each column more efficient. This step in the process is made up of a shift, substitution, and addition transformation.
First, the "top" byte is moved to the bottom, shifting all of the other ones "up" an position. Then, the S-box is used again to replace the values in the column. And finally, the each column in the initial state is XORed with the previous, with the exception of the first column that also gets XORed with a constant that is relative to the number of round it is.
In the AddKey step, the new round key is added to the state. In order to implement this, we combined the state with the key by XORing each corresponding byte to make up the new state.
Each round (except for the last in encryption, and first in decryption) is made up of substituting, shifting rows, mixing columns, and adding a unique key. These steps are repeated, just with each subsequent interim states.
In 128-bit encryption, there are 10 rounds. Each round consists of the same operations with the exception of the last one where MixColumns is omitted. Between each round, the key and the state that was produced in the previous round are passed into the following round along with the a new round constant.
In order to properly decrypt, the same key must be used each round in the add key step. The way that we implemented KeyExpand allowed us to get all 10 round keys from the initial key independently from the other functions. Since each key is the same size, each index that is a multiple of 16 marks the start of a new round key. In our implementation, we choose to generate all of the keys ahead of time so that we could traverse through them backwards in the decryption process. This differs slightly from the way that we wrote KeyExpand in our encryption implementation. We embedded a single cycle of KeyExpand in each round as opposed to calculating them all at the same time. In each round, we passed in the previous key, applied the key expansion operations, and then added the key. (Theoretically, our encryption would have also worked if we had performed the entire key expansion process at the beginning like we did for decryption.) To the left is the code we used to generate all of the keys and save them in a bus.
Decrypting looks quite similar to encryption, and is implemented using the same principles. If you look back at the Implementation section on encryption, you'll notice that the order of the mini round and 10 main rounds are directly inversed for decryption. The only other big change from encryption is that to decrypt we first run the "wholekeyexpand" module to get a bus of all the round keys we needed to encrypt. After doing this we can easily pass the keys into each inverse round, starting with the last key used to encrypt and moving backward.
To test each operation used in AES, we started by making a test bench for each module. We ran predetermined examples through our test benches to make sure the operations did as expected.
Beyond this, because of the symmetry of AES, we were able to check our modules by running the encryption version, followed by the decryption model. For example, if we ran ShiftRows on the state object, and then took that result and ran InverseShiftRows, we should get the state object that we started with before running ShiftRows. All of AES is symmetrical, so the same can be done to easily check any operation.