In machine learning, a classifier is an algorithm that is trained to make predictions or decisions based on input data. A classifier takes input data, processes it using a mathematical model that has been trained on previous data, and produces output predictions or decisions. The output of a classifier is typically a class label or a probability distribution over possible class labels.
For example, in a binary classification problem, a classifier might take an input consisting of a set of features that describe some object or phenomenon, and output a binary class label indicating which of two possible classes the object belongs to (e.g., "positive" or "negative," "spam" or "not spam," "fraudulent" or "legitimate," etc.). In a multi-class classification problem, a classifier might output a class label from a set of more than two possible classes.
Classifiers can be trained using a variety of machine-learning techniques, such as decision trees, random forests, support vector machines, logistic regression, and neural networks. The choice of classifier depends on the nature of the problem being solved, the amount and quality of available training data, and other factors.
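To make the input-to-output flow concrete, here is a minimal scikit-learn sketch; the dataset and model choice are arbitrary, purely for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# A standard multi-class dataset: 4 numeric features per flower,
# 3 possible class labels.
X, y = load_iris(return_X_y=True)

clf = LogisticRegression(max_iter=1000)
clf.fit(X, y)  # train the mathematical model on labeled examples

print(clf.predict(X[:1]))        # output: a class label
print(clf.predict_proba(X[:1]))  # output: a probability distribution over labels
```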
At a high level, Naive Bayes is a probabilistic classification algorithm that makes predictions based on the probability of an input belonging to a certain class. To understand how Naive Bayes works, let's break down the components of Bayes' theorem:

P(A|B) = P(B|A) × P(A) / P(B)

Where:
P(A|B) is the probability of A given B, i.e., the probability of an input belonging to a certain class given the observed features.
P(B|A) is the probability of B given A, i.e., the probability of observing the features given the input belongs to a certain class.
P(A) is the prior probability of A, i.e., the probability of an input belonging to a certain class before observing any features.
P(B) is the prior probability of B, i.e., the probability of observing the features in general.
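To make these quantities concrete, here is a small Python sketch that plugs numbers into Bayes' theorem for a spam-filtering scenario; all of the probability values are invented for illustration:

```python
# Hypothetical numbers: classify an email as spam based on whether it
# contains the word "free".
p_spam = 0.3             # P(A): prior probability that an email is spam
p_free_given_spam = 0.6  # P(B|A): probability "free" appears in a spam email
p_free = 0.25            # P(B): probability "free" appears in any email

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_spam_given_free = p_free_given_spam * p_spam / p_free
print(f"P(spam | 'free') = {p_spam_given_free:.2f}")  # 0.72
```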
In Naive Bayes, we make the simplifying assumption that the features are conditionally independent given the class label, which means that the presence or absence of one feature does not affect the probability of the presence or absence of another feature. This is often not true in practice, but it simplifies the calculation of the probabilities and can lead to good results in many cases.
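Concretely, for an input with features x1, x2, ..., xn and a class C, the independence assumption lets the likelihood factor into a product of per-feature terms:

P(x1, x2, ..., xn | C) = P(x1 | C) × P(x2 | C) × ... × P(xn | C)

Each P(xi | C) can then be estimated separately from the training data, which is what makes the computation tractable.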
To train a Naive Bayes classifier, the algorithm starts by calculating the prior probability of each class based on the frequency of examples of that class in the training data. It also calculates the conditional probability of each feature given each class, which tells us the probability of observing each feature in an input belonging to that class. These probabilities are then used to calculate the posterior probability of each class given a new input.
When the algorithm is given a new input, it calculates the posterior probability of each class given the observed features, using Bayes' theorem. It then selects the class with the highest posterior probability as the predicted class for that input.
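Here is a minimal from-scratch sketch of this train-and-predict loop, assuming binary features and a tiny invented dataset; a real implementation would work with log-probabilities to avoid numerical underflow:

```python
from collections import Counter, defaultdict

# Toy training data (invented): each row holds two binary features,
# e.g. (contains "free", contains "meeting"), with a class label.
X = [(1, 0), (1, 1), (0, 1), (0, 1), (1, 0)]
y = ["spam", "spam", "ham", "ham", "spam"]

# Training: priors P(class) from class frequencies, and conditionals
# P(feature = 1 | class) from per-class counts, with add-one smoothing.
priors = {c: n / len(y) for c, n in Counter(y).items()}
cond = defaultdict(dict)
for c in priors:
    rows = [x for x, label in zip(X, y) if label == c]
    for j in range(len(X[0])):
        ones = sum(row[j] for row in rows)
        cond[c][j] = (ones + 1) / (len(rows) + 2)

def predict(x):
    # Posterior up to a constant: P(class) * product of P(x_j | class).
    # P(B) is identical for every class, so it can be ignored for argmax.
    scores = {}
    for c in priors:
        score = priors[c]
        for j, v in enumerate(x):
            p = cond[c][j]
            score *= p if v == 1 else (1 - p)
        scores[c] = score
    return max(scores, key=scores.get)

print(predict((1, 0)))  # "spam" on this toy data
```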
Naive Bayes can be used for both binary and multi-class classification problems and can handle both discrete and continuous input features. It is often used in text classification tasks, such as spam filtering or sentiment analysis, where each input is a piece of text and the features are the frequency of each word in the text.
(Figure source: https://www.turing.com/kb/an-introduction-to-naive-bayes-algorithm-for-beginners)
There are three main types of Naive Bayes classifiers:
1. Bernoulli Naive Bayes:
The Bernoulli Naive Bayes classifier is used when the features are binary (i.e., they can take on only two values, such as 0 or 1). It is often used in text classification tasks, where the presence or absence of a certain word in a document is used as a feature. For example, if we are trying to classify emails as spam or not spam, we might use the presence or absence of certain words (such as "free", "money-back guarantee", etc.) as binary features.
The Bernoulli Naive Bayes classifier calculates the conditional probability of each feature given each class label and uses these probabilities to make predictions. Specifically, it assumes that each feature is independent of all other features and that the probability of each feature being present or absent is the same for all instances of a given class.
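Scikit-learn ships this model as BernoulliNB; a minimal sketch with invented binary features:

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Invented binary features: column 0 = contains "free",
# column 1 = contains "money-back guarantee"; label 1 = spam, 0 = not spam.
X = np.array([[1, 1], [1, 0], [0, 0], [0, 1], [0, 0]])
y = np.array([1, 1, 0, 0, 0])

clf = BernoulliNB()  # fits one Bernoulli parameter per feature per class
clf.fit(X, y)
print(clf.predict([[1, 1]]))  # a new email containing both phrases
```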
2. Multinomial Naive Bayes:
The Multinomial Naive Bayes classifier is used when the features represent counts or frequencies of occurrences of events (such as the frequency of each word in a document). It is also commonly used in text classification tasks. For example, if we are trying to classify documents into different categories based on the frequency of certain words, we might use the frequency of each word as a feature.
The Multinomial Naive Bayes classifier calculates the conditional probability of each feature given each class label and uses these probabilities to make predictions. Specifically, it assumes that the counts or frequencies of each feature are independent given the class label and that the probability of observing a certain count or frequency of each feature is given by a multinomial distribution.
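In scikit-learn this corresponds to MultinomialNB, typically paired with a word-count vectorizer; the corpus below is invented for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# A tiny invented corpus: per-document word counts become the features.
docs = ["free money offer", "limited offer free",
        "project meeting notes", "meeting agenda and notes"]
labels = ["spam", "spam", "work", "work"]

vec = CountVectorizer()  # maps each document to a vector of word counts
X = vec.fit_transform(docs)

clf = MultinomialNB()
clf.fit(X, labels)
print(clf.predict(vec.transform(["free offer today"])))  # likely 'spam'
```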
3. Gaussian Naive Bayes:
The Gaussian Naive Bayes classifier is used when the features are continuous values, such as measurements from sensors or physical properties of objects. It assumes that the probability distribution of each feature given each class label is Gaussian (i.e., normally distributed). For example, if we are trying to classify images of digits based on their pixel values, we might use the pixel values as continuous features.
The Gaussian Naive Bayes classifier calculates the conditional probability of each feature given each class label, assuming that each feature is normally distributed within each class. It then uses these probabilities to make predictions.
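Again assuming scikit-learn, GaussianNB provides this variant; the sensor-style measurements below are invented:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Invented continuous measurements, e.g. two sensor readings per object.
X = np.array([[5.1, 3.5], [4.9, 3.0], [6.7, 3.1], [6.3, 2.9]])
y = np.array([0, 0, 1, 1])  # two classes of object

clf = GaussianNB()  # fits a per-class mean and variance for each feature
clf.fit(X, y)
print(clf.predict([[6.5, 3.0]]))  # likely class 1
```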
In summary, each type of Naive Bayes classifier makes a different assumption about the distribution of the features and uses this assumption to calculate the conditional probabilities needed to make predictions. Despite their simplicity, Naive Bayes classifiers can be effective in practice and are widely used in many real-world applications.