1. Concepts & Definitions
1.1. Regression versus Classification
1.3. Parameter versus Hyperparameter
1.4. Training, Validation, and Test
2. Problem & Solution
2.1. Gaussian Mixture vs. K-means on HS6 Weight
2.2. Evaluation of classification method using ROC curve
2.3. Comparing logistic regression, neural network, and ensemble
2.4. Fruits or not, split or encode and scale first?
Regression algorithms are a type of machine learning algorithm used to predict numerical values from input data. They work by fitting a mathematical model that relates the input variables to the output variable. The goal of regression is to find a relationship between the input features and the target variable that can be used to make accurate predictions on new, unseen data [1].
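To make this concrete, here is a minimal sketch of a regression fit using scikit-learn; the synthetic data and the choice of a plain linear model are illustrative assumptions, not taken from [1]:

import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: a noisy linear relationship between one input feature and a numeric target.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))                 # input feature
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 1.0, 100)     # numerical target

# Fit a model of the form y ~ w*x + b and predict on new, unseen inputs.
model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)                  # learned relationship
print(model.predict([[4.0], [7.5]]))                  # numerical predictions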
Common regression models include the following (a short code sketch follows this list) [1]:
1. Logistic regression: Logistic regression is a popular algorithm used for binary classification problems, where the target variable has two possible outcomes (e.g., yes/no, true/false, 0/1).
2. Polynomial regression: Polynomial regression is an extension of linear regression where the relationship between the variables is modeled using a polynomial equation.
3. Ridge regression: Ridge regression is a regularization technique that addresses the issue of overfitting in linear regression.
4. Lasso regression: Lasso regression, similar to ridge regression, is a regularization technique used to combat overfitting.
5. Elastic Net Regression: ElasticNet regression combines both ridge and lasso regularization techniques.
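To show how the regularized variants differ in practice, the following sketch fits ridge, lasso, and elastic net on the same synthetic data; the dataset and the penalty strengths (alpha, l1_ratio) are illustrative assumptions:

import numpy as np
from sklearn.linear_model import Ridge, Lasso, ElasticNet

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                        # 10 features, only 2 of them informative
y = 4.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0, 0.5, 200)

# Ridge shrinks all coefficients, lasso drives some exactly to zero,
# and elastic net mixes both penalties.
for model in (Ridge(alpha=1.0), Lasso(alpha=0.1), ElasticNet(alpha=0.1, l1_ratio=0.5)):
    model.fit(X, y)
    print(type(model).__name__, np.round(model.coef_, 2))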
Despite its name, logistic regression is a classification algorithm, not a regression algorithm; the following discussion explains why.
Classification, on the other hand, is the task of finding a function that divides the dataset into classes based on various parameters. When using a classification algorithm, a computer program is trained on the training dataset and then categorizes new data into classes depending on what it learned [2].
Classification algorithms find a mapping function from the input "x" to a discrete output "y". They estimate discrete values (in other words, categorical values such as 0 and 1, yes and no, or true and false) based on a particular set of independent variables. Put more simply, many classification algorithms, logistic regression in particular, predict the probability of an event occurring by fitting the data to a logit function and then turning that probability into a class label.
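A minimal sketch of this behavior with logistic regression, which fits the data to a logit (sigmoid) function and converts the resulting probability into a discrete label; the toy data below is an illustrative assumption:

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                        # two independent variables
y = (X[:, 0] + X[:, 1] > 0).astype(int)              # binary target: 0 or 1

clf = LogisticRegression().fit(X, y)
print(clf.predict([[1.0, 1.0], [-1.0, -1.0]]))       # discrete class labels (0/1)
print(clf.predict_proba([[1.0, 1.0], [-1.0, -1.0]])) # event occurrence probabilities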
Classification algorithms are used for tasks such as email spam filtering, predicting whether bank customers will repay their loans, and identifying cancerous tumor cells.
Common classification models include the following (a short code sketch follows this list) [2]:
1. Decision Tree Classification: This type divides a dataset into segments based on particular feature variables. The divisions’ threshold values are typically the mean or mode of the feature variable in question if they happen to be numerical.
2. K-Nearest Neighbors: This classification type identifies the K nearest neighbors of a given observation point. It then uses those K points to evaluate the proportion of each class among the neighbors and predicts the class with the highest proportion.
3. Logistic Regression: This classification type isn't complex so it can be easily adopted with minimal training. It predicts the probability of Y being associated with the X input variable.
4. Naïve Bayes: This classifier is one of the most effective yet simplest algorithms. It’s based on Bayes’ theorem, which describes how event probability is evaluated based on the previous knowledge of conditions that could be related to the event.
5. Random Forest Classification: Random forest builds many decision trees, each one predicting a probability for the target variable. The final output is obtained by averaging these probabilities.
6. Support Vector Machines: This algorithm extends the support vector classifier so that it can handle non-linear decision boundaries. It does so by enlarging the feature space using special functions known as kernels.
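The sketch below fits several of the models listed above on one toy dataset to show that they share the same fit/predict interface; the generated dataset and the accuracy comparison are illustrative assumptions:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "K-Nearest Neighbors": KNeighborsClassifier(n_neighbors=5),
    "Naive Bayes": GaussianNB(),
    "Random Forest": RandomForestClassifier(random_state=0),
    "SVM (RBF kernel)": SVC(kernel="rbf"),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, round(model.score(X_test, y_test), 3))   # accuracy on held-out data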
The next figure, from [2], summarizes the difference between regression and classification algorithms.
In classification, data is categorized under different labels according to parameters given in the input, and the labels are then predicted for new data. A classification task is supposed to do two things (sketched in code after the figure reference below) [3]:
1. Predict discrete target variables (class labels) using independent features,
2. Find a decision boundary that can separate the different classes in the target variable.
The next figure illustrates the difference between predicting discrete target variables and finding a decision boundary [4].
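To make these two points concrete, the following sketch trains a classifier on labeled toy data and prints the line that separates the two classes; the data and the choice of logistic regression are illustrative assumptions:

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                 # independent features
y = (X[:, 0] - X[:, 1] > 0).astype(int)       # discrete target variable (class label)

clf = LogisticRegression().fit(X, y)

# For logistic regression the decision boundary is the line w1*x1 + w2*x2 + b = 0;
# points on opposite sides of it receive different class labels.
w, b = clf.coef_[0], clf.intercept_[0]
print(f"decision boundary: {w[0]:.2f}*x1 + {w[1]:.2f}*x2 + {b:.2f} = 0")
print(clf.predict([[2.0, -2.0], [-2.0, 2.0]]))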
Classification processes can also be grouped into two categories: in one, the data is divided using a binary variable (0/1); in the other, the data is divided into multiple discrete labels. The next figure illustrates this difference between binary and multi-class classification [3].
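A minimal sketch contrasting the two settings, using two of scikit-learn's built-in datasets; the choice of datasets and of logistic regression as the model is an illustrative assumption:

from sklearn.datasets import load_breast_cancer, load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Binary classification: the target takes only two values (malignant / benign).
X_bin, y_bin = load_breast_cancer(return_X_y=True)
print("binary labels:", sorted(set(y_bin)))          # [0, 1]
binary_clf = make_pipeline(StandardScaler(), LogisticRegression()).fit(X_bin, y_bin)

# Multi-class classification: the target takes several discrete values (three iris species).
X_multi, y_multi = load_iris(return_X_y=True)
print("multi-class labels:", sorted(set(y_multi)))   # [0, 1, 2]
multi_clf = make_pipeline(StandardScaler(), LogisticRegression()).fit(X_multi, y_multi)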
According to [5], regression and classification belong to the supervised learning subarea of machine learning, alongside subareas such as unsupervised and reinforcement learning, as organized in the following figure.
Supervised learning is the type of machine learning in which machines are trained using well-labeled training data and, on the basis of that data, predict the output [5].
[1] https://medium.com/@arunp77/regression-algorithms-29f112797724
[2] https://www.simplilearn.com/regression-vs-classification-in-machine-learning-article
[3] https://www.geeksforgeeks.org/ml-classification-vs-regression/
[5] https://medium.com/@arunp77/machine-learning-an-introductory-tutorial-for-beginners-1957475e6c0
[4] Machine Learning using Python, Print Replica Kindle Edition