The algorithm works by first expanding the given key using Rijndael’s key schedule and then using a “XOR” to add the round key to the message. Then it launches into rounds of the following steps: it takes the numerically represented data and substitutes in values from a substitution table; next, it shifts the rows of data, before next mixing the columns; finally a “XOR” is used on each column with a different part of the encryption key to again add the round keys. The shift rows and mix columns work to just add another layer of complication to the encryption by transforming both vertically and horizontally. The substitution changes the data in a non-linear way to add further complicate the relationship between the message and the ciphertext. Finally, the key expansion/xor step allows the algorithm to be harder to crack by generating a different round key to add during each round.
A flow chart of the AES encryption progress can be seen in the image below.
AES encryption structures data in the form of a "state," a 4-by-4 matrix of bytes. Expanded round keys are also held in this format to allow for the addition of keys to the current state object.
Byte substitution is done by finding the designated s-box value of the byte and replacing the original with the value from the table. The same table is used throughout AES encryption to standardize the substitution step and AES. Below is the table used for the substitution step, note this table was obtained from the FIPS 197 Advancement Standard as issued by the government.
This "S-Box" substitution table is from Rijndael's cipher which derives the table from finding the multiplicative inverses of the inputs and then transforming them with an affine transformation (a projection matrix).
"Shift Rows" works by quite literally cyclically shifting each row (4 bytes) by a different number. Row 1 is shifted by 0, row 2 is shifted by 1, row 3 is shifted by 2, and row 4 is shifted by 3. A visual of this can be seen in the diagram below, note this table was obtained from the FIPS 197 Advancement Standard:
This operation can be done quite simply by reassigning the bytes to new spaces in the state object.
In this step of the process, the transformations are applied to the columns. This adds another level of complexity to the encryption since the columns are made up of different blocks than in the original matrix after the ShiftRows step. Now the blocks are in both new rows and new columns in the new matrix.
The MixColumns takes each column and multiplies it by a 4x4 matrix. The values in the matrix are derived from Rijndael's Galois field. Galois' field is a finite field meaning it has a finite number of values. These are often used in cryptography since you can perform modular arithmetic with them. When the largest number is reached, the values circuit back to the beginning.
Galois' Field is given by the following:
Since we are multiplying a 4x4 matrix and a 4x1 vector column, we expect to get a 4x1 vector in return which we can return as the new column. One observation that can be made about this matrix is that it only has coefficients of 1, 2, and 3. This means we only have to computer modular multiplication with three different values, one of which is 1.
As expected, multiplying by 1 in a finite field has no effect. Since we are using binary, we can shift each bit over one to the left to multiply by two. We also have to take into account the most significant bit and XOR the shifted bits by 00000000 or 11111111. Lastly, multiplication by 3 can be thought of as (2 * X) + X. In finite arithmetic, addition is replaced with the XOR function. So here, when we need to do multiplication by 3, we can first do multiplication by 2 and then XOR that product with the original state.
The algorithm starts with a 4x4 key and generates a new one for each of the following rounds. Through this operation, a unique is generated for each round. The process of expanding the key happens in intervals of 16 bytes to match the 16 byte state object that it gets applied to.
First, if you imagine the 16 bytes arranged in a 4x4 matrix, the last column is manipulated first. The bytes are shifted in a similar fashion as the ShiftRows step expects it is done column-wise. Then there is a substitute step where Rijndael's s-box is referenced again. At this point, the transformed column that has been obtained is called the rotword. A 4 byte round constant, which is dependent on the number of rounds, is XORed with the rotword and the first column to produce the new first column. Each subsequent column is calculated by taking the old column in the corresponding position and XORing it with a new column that was most recently generated.
This process happens in multiples of 4 bytes with the indices just shifted over 16 bytes each round.
After the key has been generated, it needs to be added to the current state object. Because keys are the same size as state objects, the operation can be done with a simple byte-wise XOR operation.
XOR is a common replacement for addition in cryptography, because the operation essentially performs bit-wise addition (the visualization above was obtained from Wikipedia images).