Do you know those online personality tests that ask you a few questions and then tell you something about yourself? They ask 30 multiple-choice questions and give you five scores along different axes, then tell you things about your personality that psychologists have been studying for decades. This grouping of a person into a handful of numeric scores is an example of embedding.
This document talks about the role of embeddings in the machine learning world.
Ideally, an embedding captures some of the semantics of the input by placing semantically similar inputs close together in the embedding space.
Embeddings enable tasks such as:
Finding similarities
Summarizing sentences
Word embeddings are a type of word representation that allows words with similar meanings to have similar representations. For example, with word embeddings, semantically similar words (e.g. “boat” and “ship”) or semantically related words (e.g. “boat” and “water”) end up close together, depending on the training method.
GloVe is a popular word embedding technique; a minimal usage sketch appears below.
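To make this concrete, here is a minimal sketch that queries pretrained GloVe vectors through the gensim library. The model name below is one of gensim's published downloads, chosen for illustration; any pretrained GloVe variant works the same way.

```python
# A minimal sketch of word-embedding similarity using pretrained GloVe
# vectors via gensim. The model name is an illustrative choice.
import gensim.downloader

# Load 50-dimensional GloVe vectors trained on Wikipedia + Gigaword.
glove = gensim.downloader.load("glove-wiki-gigaword-50")

# Semantically similar or related words end up close together.
print(glove.similarity("boat", "ship"))    # similar words: high score
print(glove.similarity("boat", "water"))   # related words also score well
print(glove.most_similar("boat", topn=3))  # nearest neighbours of "boat"
```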
Sentence embedding is similar to word embedding, but it operates on whole sentences. It can be used for purposes such as:
Sentiment analysis (whether a user review is positive, negative, or neutral)
Finding semantically similar sentences
Natural language translation
and more
BERT is a well-known NLP model that can produce sentence embeddings, as in the sketch below.
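As an illustration, this sketch uses the sentence-transformers library, which wraps BERT-style models, to embed a few sentences and compare them. The specific model name is an illustrative choice, not something prescribed here.

```python
# A minimal sketch of sentence embedding with sentence-transformers.
# The model name "all-MiniLM-L6-v2" is an illustrative choice.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "The hotel room was clean and the staff were friendly.",
    "Great service and a spotless room.",
    "The package never arrived.",
]
embeddings = model.encode(sentences)

# Semantically similar sentences get a high cosine similarity.
print(util.cos_sim(embeddings[0], embeddings[1]))  # high: similar reviews
print(util.cos_sim(embeddings[0], embeddings[2]))  # low: unrelated
```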
Character embedding operates at the character level rather than on whole words. A benefit is that it copes well with misspelled words, emoticons, and newly coined words, since any string can be built from known characters. A minimal sketch follows.
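Here is a minimal sketch of a character-level embedding layer, assuming PyTorch and a toy alphabet; both are illustrative choices, not the only way to build character embeddings.

```python
# A minimal sketch of a character-level embedding layer in PyTorch.
# Each character gets its own learned vector, so misspelled or brand-new
# words still map to vectors (no out-of-vocabulary problem).
import torch
import torch.nn as nn

chars = "abcdefghijklmnopqrstuvwxyz "
char_to_idx = {c: i for i, c in enumerate(chars)}

# One trainable 8-dimensional vector per character.
embedding = nn.Embedding(num_embeddings=len(chars), embedding_dim=8)

def embed_word(word: str) -> torch.Tensor:
    """Look up one vector per character; a word is a sequence of them."""
    idx = torch.tensor([char_to_idx[c] for c in word])
    return embedding(idx)  # shape: (len(word), 8)

# Even the misspelling "bowt" gets a representation built from the same
# character vectors as "boat" — no unknown-word token needed.
print(embed_word("boat").shape)  # torch.Size([4, 8])
print(embed_word("bowt").shape)  # torch.Size([4, 8])
```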
Embeddings are used in many applications:
Recommender systems
Machine translation
Natural language processing (NLP) in general
To compute an embedding, place each input as a point in an N-dimensional space; the coordinates of that point are the embedding of the input.
[Figure: a 2-D embedding example]
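Since the original figure is not reproduced here, the sketch below makes the idea concrete: it treats each row of made-up numbers as an embedding and projects the points down to 2-D with PCA so they could be plotted. All values and word labels are illustrative.

```python
# A minimal sketch of viewing embeddings as coordinates: project some
# made-up 4-D embeddings down to 2-D with PCA, as in a 2-D embedding plot.
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical 4-D embeddings for a handful of words (illustrative values).
embeddings = np.array([
    [0.9, 0.1, 0.8, 0.2],  # king
    [0.8, 0.2, 0.9, 0.1],  # queen
    [0.7, 0.1, 0.2, 0.8],  # man
    [0.6, 0.2, 0.3, 0.9],  # woman
])

# Each row becomes an (x, y) coordinate in a 2-D embedding space.
points_2d = PCA(n_components=2).fit_transform(embeddings)
for word, (x, y) in zip(["king", "queen", "man", "woman"], points_2d):
    print(f"{word}: ({x:.2f}, {y:.2f})")
```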
Encoding words is the process of representing each word as a vector. The simplest method is called one-hot or 1-of-K vector representation: each word is represented as a vector of all 0s with a single 1 at the index of that word in the sorted vocabulary.
For the vocabulary {King, Queen, Man, Woman, Child}, the word Queen is encoded as 01000: a 1 at Queen's position in the vocabulary and 0s everywhere else.
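A minimal sketch of this encoding in plain Python; the one_hot helper is just an illustrative name:

```python
# One-hot (1-of-K) encoding for the vocabulary above.
vocabulary = ["King", "Queen", "Man", "Woman", "Child"]

def one_hot(word: str) -> list[int]:
    """Return a vector of all 0s with a single 1 at the word's index."""
    return [1 if w == word else 0 for w in vocabulary]

print(one_hot("Queen"))  # [0, 1, 0, 0, 0], i.e. 01000
```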
The purpose of encoding is simply to give the machine a numeric representation of the input.
The purpose of embedding is to capture semantic similarity in that representation.
You now have embeddings for any pair of examples. A similarity measure takes these embeddings and returns a number measuring their similarity. Remember that embeddings are simply vectors of numbers. To find the similarity between two vectors, you have three similarity measures to choose from: Euclidean distance (smaller means more similar), cosine similarity (larger means more similar), and dot product (cosine scaled by the lengths of both vectors; larger means more similar).
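A quick sketch of all three measures with NumPy; the two vectors are made-up values:

```python
# The three similarity measures between two embedding vectors.
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 2.0, 2.0])

euclidean = np.linalg.norm(a - b)  # smaller = more similar
cosine = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))  # larger = more similar
dot = a @ b  # cosine scaled by both vector lengths; larger = more similar

print(euclidean, cosine, dot)
```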
References:
https://developers.google.com/machine-learning/crash-course/embeddings/video-lecture
https://subscription.packtpub.com/book/web_development/9781786465825/3/ch03lvl1sec32/encoding-and-embedding
https://machinelearningmastery.com/what-are-word-embeddings/
https://towardsdatascience.com/sentence-embedding-3053db22ea77
https://nlp.stanford.edu/projects/glove/
https://www.infoq.com/presentations/nlp-word-embedding/
https://developers.google.com/machine-learning/clustering/similarity/measuring-similarity
https://towardsdatascience.com/besides-word-embedding-why-you-need-to-know-character-embedding-6096a34a3b10