Naive Bayes classifiers are a collection of classification algorithms based on Bayes’ Theorem. Naive Bayes is not a single algorithm but a family of algorithms that all share a common principle: every pair of features being classified is independent of each other, given the class.
Bayes’ Theorem gives the probability of an event occurring, given that another event has already occurred. Bayes’ theorem is stated mathematically as the following equation:
P(A|B) = [P(B|A) * P(A)] / P(B)
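As a quick numeric check, here is a minimal sketch in Python; the spam/word probabilities below are illustrative assumptions, not values from the text:

```python
# Hypothetical events: A = "email is spam", B = "email contains the word 'offer'".
p_a = 0.20          # P(A): prior probability that an email is spam (assumed)
p_b_given_a = 0.60  # P(B|A): probability 'offer' appears in a spam email (assumed)
p_b = 0.25          # P(B): probability 'offer' appears in any email (assumed)

# Bayes' theorem: P(A|B) = [P(B|A) * P(A)] / P(B)
p_a_given_b = (p_b_given_a * p_a) / p_b
print(p_a_given_b)  # 0.48: seeing 'offer' raises the spam probability from 0.20 to 0.48
```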
With regard to our dataset, we can apply Bayes’ theorem in the following way:
P(y|X) = [P(X|y) * P(y)] / P(X)
where y is the class variable and X is the feature vector (of size n):
X = (x_1, x_2, x_3, ..., x_n)
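Before adding any assumptions, the posterior can be computed directly from data when X is small. The sketch below uses a single feature and an assumed six-row toy table (illustrative, not from the text) to estimate each term of P(y|X) = [P(X|y) * P(y)] / P(X) by counting:

```python
# Assumed toy data: one feature x_1 in {"sunny", "rainy"}, class y in {"yes", "no"}.
rows = [("sunny", "no"), ("sunny", "no"), ("sunny", "yes"),
        ("rainy", "yes"), ("rainy", "yes"), ("rainy", "no")]

x_val, y_val = "rainy", "yes"
n = len(rows)
p_y = sum(1 for _, y in rows if y == y_val) / n                      # P(y)   = 3/6
p_x = sum(1 for x, _ in rows if x == x_val) / n                      # P(X)   = 3/6
p_x_given_y = (sum(1 for x, y in rows if x == x_val and y == y_val)
               / sum(1 for _, y in rows if y == y_val))              # P(X|y) = 2/3

# P(y|X) = [P(X|y) * P(y)] / P(X)
print((p_x_given_y * p_y) / p_x)  # 0.666..., matching a direct count of rainy rows
```

With many features, X takes far too many joint values to count reliably from a finite dataset, which is exactly the problem the naive assumption addresses.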
Now it is time to add the naive assumption to Bayes’ theorem: independence among the features. This lets us split the evidence into independent parts.
If any two events A and B are independent, then P(A, B) = P(A) P(B). Applying this assumption to the features, both in the likelihood P(X|y) and in the evidence P(X), we reach the result:

P(y|x_1, ..., x_n) = [P(x_1|y) P(x_2|y) ... P(x_n|y) P(y)] / [P(x_1) P(x_2) ... P(x_n)]
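To make the formula concrete, here is a minimal from-scratch sketch; the toy weather data, function names, and the absence of smoothing are assumptions for illustration. It estimates P(y) and each P(x_i|y) by counting, then scores classes using only the numerator, since the denominator P(x_1) ... P(x_n) is the same for every class:

```python
from collections import Counter, defaultdict

def fit_naive_bayes(X, y):
    """Estimate the prior P(y) and each likelihood P(x_i|y) by counting."""
    n = len(y)
    class_counts = Counter(y)
    priors = {c: cnt / n for c, cnt in class_counts.items()}
    # counts[c][i][v] = number of class-c rows whose i-th feature equals v
    counts = {c: defaultdict(Counter) for c in class_counts}
    for row, label in zip(X, y):
        for i, v in enumerate(row):
            counts[label][i][v] += 1
    likelihoods = {c: {i: {v: k / class_counts[c] for v, k in vals.items()}
                       for i, vals in feats.items()}
                   for c, feats in counts.items()}
    return priors, likelihoods

def predict(row, priors, likelihoods):
    """Score each class by P(y) * prod_i P(x_i|y); the denominator is dropped
    because it is identical for every class and does not change the argmax."""
    scores = {}
    for c, prior in priors.items():
        score = prior
        for i, v in enumerate(row):
            score *= likelihoods[c][i].get(v, 0.0)  # no smoothing in this sketch
        scores[c] = score
    return max(scores, key=scores.get)

# Assumed toy weather data (illustrative only)
X_train = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
y_train = ["no", "no", "yes", "yes"]

priors, likelihoods = fit_naive_bayes(X_train, y_train)
print(predict(("rainy", "mild"), priors, likelihoods))  # -> "yes"
```

A production classifier would typically add Laplace smoothing to the likelihood estimates so that a feature value unseen for some class does not zero out that class’s entire score.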