Encoder-decoder is a type of sequence-to-sequence (seq2seq) model in NLP. It uses two RNN (e.g. LSTM) components: an encoder that captures the states of the input sequence, and a decoder that predicts the output sequence.
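To make this concrete, here is a minimal sketch of the two-component wiring, assuming PyTorch; the class and parameter names (Encoder, Decoder, hidden_size, etc.) are illustrative, not from any particular library.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab_size, embed_size, hidden_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)

    def forward(self, src):
        # src: (batch, src_len); only the final hidden/cell states are kept
        _, (h, c) = self.lstm(self.embed(src))
        return h, c

class Decoder(nn.Module):
    def __init__(self, vocab_size, embed_size, hidden_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, tgt, state):
        # tgt: (batch, tgt_len); state: the (h, c) handed over from the encoder
        output, state = self.lstm(self.embed(tgt), state)
        return self.out(output), state
```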
Extra notes about the input/output data:
The source input sequence is fed into the encoder.
There is no source output sequence; only the encoder's hidden states / cell states are passed on to the decoder.
The target input sequence is fed into the decoder, shifted by one position, so step k is used to predict step k+1.
The target output sequence is the ground truth that the decoder's actual output is compared against to calculate the loss (see the training-step sketch after this list).
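As a sketch of how these four sequences meet in a training step, assuming the illustrative Encoder/Decoder classes above and integer-encoded batches:

```python
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

def train_step(encoder, decoder, src, tgt_in, tgt_out):
    # src:     source input sequence  -> encoder
    # tgt_in:  target input sequence  (shifted right, starts with SOS)
    # tgt_out: target output sequence (ends with EOS), used only in the loss
    state = encoder(src)                # hidden/cell states only
    logits, _ = decoder(tgt_in, state)  # step k predicts step k+1
    return criterion(logits.reshape(-1, logits.size(-1)), tgt_out.reshape(-1))
```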
A few modifications are required:
Start of String (SOS), End of String (EOS)
The encoder input sequence has EOS appended, so the encoder knows the sequence has finished and should finalize its states.
The encoder input sequence doesn't need SOS, because it simply starts with its first character.
The decoder output sequence also has EOS appended, so we know the output has finished; before EOS is emitted, the decoder can output a string of any length.
The decoder input sequence doesn't need EOS: when the last character is fed into the decoder, the decoder is expected to emit EOS, which marks the end.
The decoder input sequence needs SOS, because that is the input character for predicting the first character of the output sequence (see the decoding sketch after this list).
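At inference time these rules translate into a loop like the following hypothetical greedy-decoding sketch; sos_id, eos_id, and max_len are assumed token ids and an assumed length cutoff, and the illustrative classes above are reused.

```python
import torch

@torch.no_grad()
def generate(encoder, decoder, src, sos_id, eos_id, max_len=50):
    state = encoder(src)              # src: (1, src_len)
    token = torch.tensor([[sos_id]])  # decoding starts from SOS
    result = []
    for _ in range(max_len):
        logits, state = decoder(token, state)
        token = logits.argmax(dim=-1)  # greedy pick, shape (1, 1)
        if token.item() == eos_id:     # EOS marks the end of the output
            break
        result.append(token.item())
    return result
```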
When preparing training data, both the source and target sequences have EOS appended.
The source sequence + EOS = encoder input sequence
The target sequence + EOS = decoder output sequence
SOS + The target sequence = decoder input sequence
In code, both the source and target sequences have EOS appended, and the target additionally has SOS prepended. The decoder input sequence is then the target sequence excluding its last character (EOS), and the decoder output sequence is the target sequence excluding its first character (SOS), as in the example below.
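For example (token ids are purely illustrative):

```python
SOS, EOS = 1, 2

src = [5, 6, 7]  # source sequence
tgt = [8, 9]     # target sequence

encoder_input = src + [EOS]          # [5, 6, 7, 2]
full_target   = [SOS] + tgt + [EOS]  # [1, 8, 9, 2]

decoder_input  = full_target[:-1]    # [1, 8, 9]  = SOS + target
decoder_output = full_target[1:]     # [8, 9, 2]  = target + EOS
```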