Do you know those online personality tests that ask you a few questions and then tell you something about yourself? They ask 30 multiple-choice questions and give you five scores along different axes, then tell you things about your personality that psychologists have been studying for decades. This grouping of a person into a handful of numeric scores is an example of embedding.
This document talks about the role of embeddings in the machine learning world.
Ideally, an embedding captures some of the semantics of the input by placing semantically similar inputs close together in the embedding space.
Embeddings enable tasks such as:
Finding similarities
Summarizing sentences
Word embeddings are a type of word representation that allows words with similar meanings to have similar representations. For example, with word embeddings, semantically similar words (e.g. “boat” and “ship”) or semantically related words (e.g. “boat” and “water”) end up close together, depending on the training method.
GloVe is a popular word embedding technique; a minimal usage sketch appears below.
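To make this concrete, here is a minimal sketch that queries pretrained GloVe vectors through the gensim library. The model name below is one of gensim's published downloads, chosen for illustration; any pretrained GloVe variant works the same way.

```python
# A minimal sketch of word-embedding similarity using pretrained GloVe
# vectors via gensim. The model name is an illustrative choice.
import gensim.downloader

# Load 50-dimensional GloVe vectors trained on Wikipedia + Gigaword.
glove = gensim.downloader.load("glove-wiki-gigaword-50")

# Semantically similar or related words end up close together.
print(glove.similarity("boat", "ship"))    # similar words: high score
print(glove.similarity("boat", "water"))   # related words also score well
print(glove.most_similar("boat", topn=3))  # nearest neighbours of "boat"
```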
Sentence embedding is similar to word embedding, but it operates on whole sentences. It can be used for purposes such as:
Sentiment analysis (whether a user review is positive, negative, or neutral)
Finding semantically similar sentences
Natural language translation
and more
BERT is a well-known NLP model that can produce sentence embeddings, as in the sketch below.
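As an illustration, this sketch uses the sentence-transformers library, which wraps BERT-style models, to embed a few sentences and compare them. The specific model name is an illustrative choice, not something prescribed here.

```python
# A minimal sketch of sentence embedding with sentence-transformers.
# The model name "all-MiniLM-L6-v2" is an illustrative choice.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "The hotel room was clean and the staff were friendly.",
    "Great service and a spotless room.",
    "The package never arrived.",
]
embeddings = model.encode(sentences)

# Semantically similar sentences get a high cosine similarity.
print(util.cos_sim(embeddings[0], embeddings[1]))  # high: similar reviews
print(util.cos_sim(embeddings[0], embeddings[2]))  # low: unrelated
```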
Character embedding operates at the character level rather than on whole words. A benefit is that it copes well with misspelled words, emoticons, and newly coined words, since any string can be built from known characters. A minimal sketch follows.
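Here is a minimal sketch of a character-level embedding layer, assuming PyTorch and a toy alphabet; both are illustrative choices, not the only way to build character embeddings.

```python
# A minimal sketch of a character-level embedding layer in PyTorch.
# Each character gets its own learned vector, so misspelled or brand-new
# words still map to vectors (no out-of-vocabulary problem).
import torch
import torch.nn as nn

chars = "abcdefghijklmnopqrstuvwxyz "
char_to_idx = {c: i for i, c in enumerate(chars)}

# One trainable 8-dimensional vector per character.
embedding = nn.Embedding(num_embeddings=len(chars), embedding_dim=8)

def embed_word(word: str) -> torch.Tensor:
    """Look up one vector per character; a word is a sequence of them."""
    idx = torch.tensor([char_to_idx[c] for c in word])
    return embedding(idx)  # shape: (len(word), 8)

# Even the misspelling "bowt" gets a representation built from the same
# character vectors as "boat" — no unknown-word token needed.
print(embed_word("boat").shape)  # torch.Size([4, 8])
print(embed_word("bowt").shape)  # torch.Size([4, 8])
```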
Embeddings are used in many applications:
Recommender systems
Machine translation
Natural language processing (NLP) in general
To compute an embedding, place each input as a point in an N-dimensional space; the coordinates of that point are the embedding of the input.
[Figure: a 2-D embedding example]
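Since the original figure is not reproduced here, the sketch below makes the idea concrete: it treats each row of made-up numbers as an embedding and projects the points down to 2-D with PCA so they could be plotted. All values and word labels are illustrative.

```python
# A minimal sketch of viewing embeddings as coordinates: project some
# made-up 4-D embeddings down to 2-D with PCA, as in a 2-D embedding plot.
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical 4-D embeddings for a handful of words (illustrative values).
embeddings = np.array([
    [0.9, 0.1, 0.8, 0.2],  # king
    [0.8, 0.2, 0.9, 0.1],  # queen
    [0.7, 0.1, 0.2, 0.8],  # man
    [0.6, 0.2, 0.3, 0.9],  # woman
])

# Each row becomes an (x, y) coordinate in a 2-D embedding space.
points_2d = PCA(n_components=2).fit_transform(embeddings)
for word, (x, y) in zip(["king", "queen", "man", "woman"], points_2d):
    print(f"{word}: ({x:.2f}, {y:.2f})")
```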
Encoding words is the process of representing each word as a vector. The simplest method is called one-hot or 1-of-K vector representation: each word is represented as a vector of all 0s with a single 1 at the index of that word in the sorted vocabulary.
For the vocabulary {King, Queen, Man, Woman, Child}, the word Queen is encoded as 01000: a 1 at Queen's position in the vocabulary and 0s everywhere else.
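A minimal sketch of this encoding in plain Python; the one_hot helper is just an illustrative name:

```python
# One-hot (1-of-K) encoding for the vocabulary above.
vocabulary = ["King", "Queen", "Man", "Woman", "Child"]

def one_hot(word: str) -> list[int]:
    """Return a vector of all 0s with a single 1 at the word's index."""
    return [1 if w == word else 0 for w in vocabulary]

print(one_hot("Queen"))  # [0, 1, 0, 0, 0], i.e. 01000
```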
The purpose of encoding is simply to give the machine a numeric representation of the input.
The purpose of embedding is to capture semantic similarity in that representation.
You now have embeddings for any pair of examples. A similarity measure takes these embeddings and returns a number measuring their similarity. Remember that embeddings are simply vectors of numbers. To find the similarity between two vectors, you have three similarity measures to choose from: Euclidean distance (smaller means more similar), cosine similarity (larger means more similar), and dot product (cosine scaled by the lengths of both vectors; larger means more similar).
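A quick sketch of all three measures with NumPy; the two vectors are made-up values:

```python
# The three similarity measures between two embedding vectors.
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 2.0, 2.0])

euclidean = np.linalg.norm(a - b)  # smaller = more similar
cosine = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))  # larger = more similar
dot = a @ b  # cosine scaled by both vector lengths; larger = more similar

print(euclidean, cosine, dot)
```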
References:
https://developers.google.com/machine-learning/crash-course/embeddings/video-lecture
https://subscription.packtpub.com/book/web_development/9781786465825/3/ch03lvl1sec32/encoding-and-embedding
https://machinelearningmastery.com/what-are-word-embeddings/
https://towardsdatascience.com/sentence-embedding-3053db22ea77
https://nlp.stanford.edu/projects/glove/
https://www.infoq.com/presentations/nlp-word-embedding/
https://developers.google.com/machine-learning/clustering/similarity/measuring-similarity
https://towardsdatascience.com/besides-word-embedding-why-you-need-to-know-character-embedding-6096a34a3b10