Naive Bayes is a conditional probability model based on Bayes' theorem: given a problem instance to be classified, represented by a vector X = (x1, x2, …, xn) of n features (independent variables), it assigns to this instance the probability P(Y|X) for each possible class Y.
Using Bayes' theorem, we know that: P(Y|X) = P(Y)*P(X|Y)/P(X)
Now the "naive" conditional independence assumptions come into play: assume that each feature xi is conditionally independent of every other feature xj for i ≠j in the given category Y. That means,
Naive Bayesian classification can be applied to the document classification problem. Consider the problem of classifying documents by their content, for example into spam and non-spam e-mails. Imagine that documents are drawn from a number of classes that can be modeled as sets of words, where the (independent) probability that the ith word of a given document occurs in a document from class C can be written as p(wi|C).
Then the probability that a given document D contains all of the words wi, given a class C, is
P(D|C) = p(w1|C)*p(w2|C)*p(w3|C)*…*p(wn|C)
The question we want to answer is: "what is the probability that a given document D belongs to a given class C?" In other words, what is P(C|D)?
Applying Bayes' theorem again, P(C|D) = P(C)*P(D|C)/P(D). The Naive Bayes classifier assigns D to the class C that maximizes this quantity; since the denominator P(D) is the same for every class, it can be ignored when comparing classes.
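The following is a minimal sketch of such a classifier, assuming a bag-of-words representation and an invented four-document training corpus; Laplace smoothing is added (not mentioned in the text) so that unseen words do not zero out the product.

```python
import math
from collections import Counter

# Tiny invented training corpus: (document text, class label).
train = [
    ("win money now", "spam"),
    ("cheap money offer", "spam"),
    ("meeting schedule today", "ham"),
    ("project meeting notes", "ham"),
]

classes = {"spam", "ham"}
doc_count = Counter(c for _, c in train)              # counts for P(C)
word_count = {c: Counter() for c in classes}          # counts for p(w|C)
for text, c in train:
    word_count[c].update(text.split())
vocab = {w for c in classes for w in word_count[c]}

def classify(document):
    """Return argmax_C log P(C) + sum_i log p(wi|C); the constant P(D) is dropped."""
    words = document.split()
    best, best_score = None, float("-inf")
    for c in classes:
        score = math.log(doc_count[c] / len(train))   # log P(C)
        total = sum(word_count[c].values())
        for w in words:
            # Laplace smoothing so unseen words keep a small nonzero probability
            score += math.log((word_count[c][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = c, score
    return best

print(classify("cheap meeting money"))   # expected to lean towards 'spam'
```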
Latent Dirichlet Allocation (LDA)
LDA is a generative statistical model that allows sets of observations to be explained by unobserved groups that explain why some parts of the data are similar. For example, if observations are words collected into documents, it posits that each document is a mixture of a small number of topics and that each word's presence is attributable to one of the document's topics. LDA is an example of a topic model.
This model can be described as follows (a small simulation sketch appears after the steps):
Generative process:
1. Sample a topic distribution θi for document i from the Dirichlet distribution with parameter α
2. Sample the topic zij of the jth word of document i from the multinomial distribution θi
3. Sample a word distribution φ for topic zij from the Dirichlet distribution with parameter β
4. Sample the resulting word wij from that word distribution φ
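A minimal sketch of this generative process is given below, with invented sizes, vocabulary, and hyperparameters; following the usual smoothed LDA formulation, one word distribution φk is drawn once per topic rather than once per word occurrence.

```python
import numpy as np

# Toy LDA generative process (all sizes and hyperparameters are invented).
rng = np.random.default_rng(0)

K = 3                                   # number of topics
vocab = ["ball", "game", "vote", "law", "gene", "cell"]
V = len(vocab)
alpha = np.full(K, 0.5)                 # Dirichlet prior over topics per document
beta = np.full(V, 0.1)                  # Dirichlet prior over words per topic

# One word distribution phi_k per topic (step 3), drawn once per topic here.
phi = rng.dirichlet(beta, size=K)       # shape (K, V)

def generate_document(n_words):
    theta = rng.dirichlet(alpha)        # step 1: topic distribution of the document
    words = []
    for _ in range(n_words):
        z = rng.choice(K, p=theta)      # step 2: topic of this word (multinomial draw)
        w = rng.choice(V, p=phi[z])     # step 4: word drawn from the topic's distribution
        words.append(vocab[w])
    return words

print(generate_document(8))
```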
Word2vec
Word2vec is a group of related models that are used to produce word embeddings. These models are shallow, two-layer neural networks that are trained to reconstruct linguistic contexts of words. Word2vec takes as its input a large corpus of text and produces a vector space, typically of several hundred dimensions, with each unique word in the corpus being assigned a corresponding vector in the space. Word vectors are positioned in the vector space such that words that share common contexts in the corpus are located in close proximity to one another in the space.
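As an illustration, the sketch below trains embeddings on a toy corpus, assuming the gensim library (version 4.x API) is available; the corpus, parameter values, and probe word are invented for illustration.

```python
from gensim.models import Word2Vec

# Tiny invented corpus: a list of tokenized sentences.
sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
    ["cats", "and", "dogs", "are", "animals"],
]

model = Word2Vec(
    sentences=sentences,
    vector_size=100,    # dimensionality of the embedding space
    window=5,           # context window size
    min_count=1,        # keep every word in this tiny corpus
    sg=1,               # 1 = skip-gram, 0 = CBOW
    epochs=50,
)

vec = model.wv["cat"]                    # the 100-dimensional vector for "cat"
print(model.wv.most_similar("cat"))      # words closest to "cat" in the vector space
```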