To map the words in our reviews to vector representations, we use the GloVe word-embedding database to provide the initial weights for the embedding layer. For this project, we use ‘glove.6B.50d.txt’, which contains 400,000 word vectors of dimension 50.
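A minimal sketch of loading these vectors is shown below; it assumes the GloVe file sits in the working directory and is parsed line by line into a word-to-vector dictionary (the file path and variable names here are illustrative, not from the original project code).

```python
import numpy as np

EMBEDDING_DIM = 50  # matches glove.6B.50d.txt

# Each line of the GloVe file is a word followed by its 50-dimensional vector.
embeddings_index = {}
with open("glove.6B.50d.txt", encoding="utf-8") as f:
    for line in f:
        values = line.split()
        word = values[0]
        vector = np.asarray(values[1:], dtype="float32")
        embeddings_index[word] = vector

print(f"Loaded {len(embeddings_index)} word vectors.")  # ~400,000
```

This dictionary can later be used to fill an embedding matrix, row by row, for the words that appear in the review vocabulary, which then serves as the initial weights of the embedding layer.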
Tokenizing the Data
Tokenization is the process of splitting a large body of text into smaller units called tokens; in our case, each token is a single word.
The tokenizer converts each word in a review into an integer token, and padding is then applied so that all sequences have the same length.
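The sketch below illustrates this step with the Keras Tokenizer and pad_sequences utilities; the vocabulary cap, sequence length, and the two sample reviews are assumed values for illustration, not the project's actual settings.

```python
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

MAX_WORDS = 10000   # assumed vocabulary size cap
MAX_LEN = 200       # assumed fixed review length

reviews = [
    "the movie was great",
    "terrible plot and poor acting",
]

# Fit the tokenizer on the review texts; each word is mapped to an integer index.
tokenizer = Tokenizer(num_words=MAX_WORDS)
tokenizer.fit_on_texts(reviews)
sequences = tokenizer.texts_to_sequences(reviews)

# Pad (or truncate) every sequence so all reviews have the same length.
padded = pad_sequences(sequences, maxlen=MAX_LEN)
print(padded.shape)  # (2, 200)
```

The resulting integer sequences, together with the tokenizer's word index, are what connect the reviews to the GloVe vectors loaded earlier.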