If you are looking for semantically similar documents and want NLP to surface them, you may wonder how a machine can identify similarity the way a human being can. Cosine similarity is an important tool in this regard.
Cosine similarity is the cosine of the angle between two n-dimensional vectors in an n-dimensional space. It is the dot product of the two vectors divided by the product of their lengths (or magnitudes); equivalently, it is the inner product of the same vectors after both have been normalized to length 1.
Independent of their magnitude, two vectors that:
- have the same orientation have a cosine similarity of 1,
- are oriented at 90° relative to each other have a similarity of 0, and
- are diametrically opposed have a similarity of -1.
The cosine of the angle between two non-zero vectors can be derived from the Euclidean dot product formula:

$$A \cdot B = \|A\|\,\|B\|\cos(\theta)$$

Given two vectors of attributes, A and B, the cosine similarity cos(θ) is represented using a dot product and magnitudes as

$$\cos(\theta) = \frac{A \cdot B}{\|A\|\,\|B\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}}$$

where $A_i$ and $B_i$ are the $i$-th components of vectors A and B, respectively.
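As a minimal sketch, this formula translates directly into numpy (the helper name is my own); it also checks the three orientation cases listed above:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two non-zero vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# The three orientation cases from above:
print(cosine_similarity(np.array([1.0, 2.0]), np.array([2.0, 4.0])))    # same direction -> 1.0
print(cosine_similarity(np.array([1.0, 0.0]), np.array([0.0, 1.0])))    # 90 degrees -> 0.0
print(cosine_similarity(np.array([1.0, 2.0]), np.array([-1.0, -2.0])))  # opposite -> -1.0
```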
Using the Cauchy–Schwarz inequality, it can be shown that its value always lies in [-1, 1].
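The step is short: Cauchy–Schwarz bounds the dot product by the product of the norms, and dividing through gives the range.

```latex
|A \cdot B| \le \|A\|\,\|B\|
\quad\Longrightarrow\quad
-1 \le \cos(\theta) = \frac{A \cdot B}{\|A\|\,\|B\|} \le 1
```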
Cosine similarity is particularly used in positive space, where the outcome is neatly bounded in [0, 1]. These bounds apply for any number of dimensions, and cosine similarity is most commonly used in high-dimensional positive spaces.
We can use the cosine similarity algorithm to work out how similar two things are, and then use the computed similarity as part of a recommendation query: for example, to get movie recommendations based on the preferences of users who have given similar ratings to other movies that you've seen.
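As a toy sketch of that idea (all user names, movies, and ratings below are made up for illustration), one could find the user whose rating vector is most cosine-similar to yours and recommend what they rated highly:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical user-by-movie rating vectors (0.0 = not rated).
movies = ["Inception", "Heat", "Amelie", "Alien"]
ratings = {
    "you":   np.array([5.0, 4.0, 0.0, 0.0]),
    "alice": np.array([5.0, 5.0, 4.0, 0.0]),
    "bob":   np.array([0.0, 1.0, 5.0, 4.0]),
}

# Find the user whose tastes are most similar to yours.
others = [(name, cosine_similarity(ratings["you"], vec))
          for name, vec in ratings.items() if name != "you"]
best, score = max(others, key=lambda pair: pair[1])

# Recommend movies that user rated highly but you have not seen.
recs = [m for i, m in enumerate(movies)
        if ratings["you"][i] == 0.0 and ratings[best][i] >= 4.0]
print(best, round(score, 3), recs)  # alice 0.865 ['Amelie']
```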
NLP models output embeddings, which are simply vectors. Pairs of such vectors are compared using cosine similarity, and the top few most similar documents are returned as the result.
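A minimal sketch of that retrieval step, assuming the document embeddings have already been computed and stacked into a matrix (the shapes and data below are placeholders):

```python
import numpy as np

def top_k_similar(query: np.ndarray, doc_embeddings: np.ndarray, k: int = 3):
    """Indices and scores of the k documents most cosine-similar to the query."""
    # Row-normalize once so a single matrix-vector product gives all cosines.
    docs = doc_embeddings / np.linalg.norm(doc_embeddings, axis=1, keepdims=True)
    scores = docs @ (query / np.linalg.norm(query))
    order = np.argsort(scores)[::-1][:k]
    return order, scores[order]

# Placeholder data: 5 "documents" with 4-dimensional embeddings.
rng = np.random.default_rng(0)
doc_embeddings = rng.normal(size=(5, 4))
query = rng.normal(size=4)
print(top_k_similar(query, doc_embeddings))
```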
Computing cosine similarity is low-complexity, especially for sparse vectors: only the non-zero dimensions need to be considered.
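For illustration, a sketch that stores sparse vectors as Python dicts so only non-zero entries are ever touched:

```python
import math

def sparse_cosine(a: dict, b: dict) -> float:
    """Cosine similarity of sparse vectors stored as {dimension: value} dicts."""
    # The dot product only needs dimensions that are non-zero in both vectors.
    small, large = (a, b) if len(a) <= len(b) else (b, a)
    dot = sum(v * large.get(i, 0.0) for i, v in small.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

print(sparse_cosine({0: 1.0, 7: 2.0}, {0: 2.0, 3: 1.0, 7: 4.0}))  # ~0.976
```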
In this idea, the authors propose to modify how similarity is calculated in the Vector Space Model by taking the similarity of features into account. Applying this idea to the cosine measure yields the "soft cosine measure", as opposed to the traditional "hard cosine", which ignores feature similarity.
In such cases, semantic meaning should be considered; that is, words similar in meaning should be treated as similar. To get the word vectors, you need a word embedding model.
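A hedged numpy sketch of the soft cosine measure, following the formulation in the paper linked below (the feature-similarity matrix S here is made up; in practice its entries could come from word-embedding similarities):

```python
import numpy as np

def soft_cosine(a: np.ndarray, b: np.ndarray, S: np.ndarray) -> float:
    """Soft cosine: weighs every pair of features (i, j) by their similarity S[i, j]."""
    num = a @ S @ b
    den = np.sqrt(a @ S @ a) * np.sqrt(b @ S @ b)
    return float(num / den)

# Made-up similarity matrix: features 0 and 1 are close in meaning (e.g. synonyms).
S = np.array([[1.0, 0.7, 0.0],
              [0.7, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
a = np.array([1.0, 0.0, 1.0])  # document using features 0 and 2
b = np.array([0.0, 1.0, 1.0])  # document using features 1 and 2
print(soft_cosine(a, b, S))          # 0.85 -- credit for the synonym pair
print(soft_cosine(a, b, np.eye(3)))  # 0.5  -- identity S recovers the hard cosine
```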
https://en.wikipedia.org/wiki/Cosine_similarity
https://neo4j.com/docs/graph-algorithms/current/labs-algorithms/cosine/
https://www.machinelearningplus.com/nlp/cosine-similarity/
http://www.scielo.org.mx/pdf/cys/v18n3/v18n3a7.pdf
https://youtu.be/bZUdS4W0mMk
https://en.wikipedia.org/wiki/Cauchy–Schwarz_inequality#ℝn_-_n-dimensional_Euclidean_space