To map the words in our reviews to vector representations, we use the GloVe word-embedding database to provide the initial weights for the embedding layer. For this project, we use ‘glove.6B.50d.txt’, which contains 400,000 word vectors of dimension 50.
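A minimal sketch of loading these vectors is shown below; it assumes the GloVe file sits in the working directory and is parsed line by line into a word-to-vector dictionary (the file path and variable names here are illustrative, not from the original project code).

```python
import numpy as np

EMBEDDING_DIM = 50  # matches glove.6B.50d.txt

# Each line of the GloVe file is a word followed by its 50-dimensional vector.
embeddings_index = {}
with open("glove.6B.50d.txt", encoding="utf-8") as f:
    for line in f:
        values = line.split()
        word = values[0]
        vector = np.asarray(values[1:], dtype="float32")
        embeddings_index[word] = vector

print(f"Loaded {len(embeddings_index)} word vectors.")  # ~400,000
```

This dictionary can later be used to fill an embedding matrix, row by row, for the words that appear in the review vocabulary, which then serves as the initial weights of the embedding layer.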
Tokenizing the Data
Tokenization is the process of splitting a large body of text into smaller units called tokens; in our case, each token is a single word.
The tokenizer converts each word in a review into an integer token, and padding is then applied so that all sequences have the same length.
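The sketch below illustrates this step with the Keras Tokenizer and pad_sequences utilities; the vocabulary cap, sequence length, and the two sample reviews are assumed values for illustration, not the project's actual settings.

```python
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

MAX_WORDS = 10000   # assumed vocabulary size cap
MAX_LEN = 200       # assumed fixed review length

reviews = [
    "the movie was great",
    "terrible plot and poor acting",
]

# Fit the tokenizer on the review texts; each word is mapped to an integer index.
tokenizer = Tokenizer(num_words=MAX_WORDS)
tokenizer.fit_on_texts(reviews)
sequences = tokenizer.texts_to_sequences(reviews)

# Pad (or truncate) every sequence so all reviews have the same length.
padded = pad_sequences(sequences, maxlen=MAX_LEN)
print(padded.shape)  # (2, 200)
```

The resulting integer sequences, together with the tokenizer's word index, are what connect the reviews to the GloVe vectors loaded earlier.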