One popular way to introduce classifiers is logistic regression. The idea of logistic regression is to stretch the interval (0,1) onto the real line, where 0 is associated with -∞ and 1 with +∞. This way one can map values in (0,1) to the real line and then use linear regression to model them. To make a forecast, we map the regression value back to the probability space (0,1) and read off the class. The function that does this job is called the logit function.
In linear regression, one finds a linear relationship between the label and the features (also called factors or attributes). However, if the label is not a real number but 0 or 1 (or, more generally, a probability), then linear regression cannot give a reasonable correspondence between the label and the features.
For logistic regression, one needs to map the real line to the interval (0,1), and vice versa. In other words, we set up a correspondence between the real numbers and the probability space. This is done by the logit function:
p ↦ logit(p) = log(p/(1-p)).
Its inverse, the logistic (sigmoid) function x ↦ 1/(1+e^(-x)), maps the real line back to (0,1).
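As a quick sketch of this correspondence (a minimal NumPy example, not tied to any particular dataset), the two functions are inverses of each other:

    import numpy as np

    def logit(p):
        """Map a probability in (0, 1) to the real line."""
        return np.log(p / (1 - p))

    def sigmoid(x):
        """Inverse of logit: map the real line back to (0, 1)."""
        return 1.0 / (1.0 + np.exp(-x))

    p = np.array([0.1, 0.5, 0.9])
    x = logit(p)                       # e.g. logit(0.5) = 0
    assert np.allclose(sigmoid(x), p)  # the round trip recovers p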
Once a probability is mapped to the real line, we can use linear regression to find the impact of the factors. After training is done and the model has been validated (and tested), one can use the logistic regression to predict a 0-1 label from the features (attributes). The way this is done is to set a threshold, say 0.5: if the output of the regression is greater than the threshold, the prediction is 1; otherwise it is 0.
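A minimal sketch of this decision rule for a single feature, assuming hypothetical fitted coefficients w and b (the numbers below are illustrative, not taken from the slides):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    w, b = 2.0, -1.0                 # hypothetical fitted slope and intercept
    x_new = np.array([0.2, 0.5, 1.5])

    p = sigmoid(w * x_new + b)       # map the regression value back to (0, 1)
    y_hat = (p >= 0.5).astype(int)   # threshold at 0.5: predict 1 above, 0 below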
Let us consider the following example. There are two classes: red balls, associated with the value 1, and blue balls, associated with the value 0. We want to find a logistic classifier that correctly separates the two classes. Here we have only one feature, namely the position on the real line, which means we are essentially looking for a mapping from the real numbers to {0,1}. As you can see, the dashed curve provides a mapping from the real numbers to (0,1). However, in order to have a classifier, we must map the real line to {0,1}, so we also have to set a threshold. Any value of the mapping below the threshold is classified as 0, and otherwise as 1. In the picture this is illustrated by a horizontal line that cuts the dashed curve: anything below it is classified as 0.
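To make the picture concrete, here is a small sketch that fits such a one-feature classifier with scikit-learn; the positions and labels are made up for illustration, not read off the figure:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Made-up 1-D data: blue balls (0) mostly on the left, red balls (1) on the right.
    X = np.array([[0.3], [0.8], [1.1], [1.9], [2.4], [3.0], [3.3], [3.9]])
    y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

    clf = LogisticRegression().fit(X, y)

    p = clf.predict_proba(X)[:, 1]   # the dashed S-shaped curve evaluated at the balls
    y_hat = (p >= 0.5).astype(int)   # the horizontal cut-off line at 0.5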
Here you can see a logistic classifier, where a logistic regression is trained on the data. In the following figure three different thresholds are used: one close to 1, one in the middle, and one close to 0. Each threshold gives a different classifier; the circles show the resulting outcomes: TP (light blue), FN (light red), FP (dark red), and TN (dark blue). As one can see, there is a trade-off between the TPR and the FPR for these three cases, which we have plotted in the Cartesian plane. If the model needs high specificity (as in the insurance example), then the first model is better, as its x-component (1 - specificity) is close to 0. This model identifies a good number of true negatives (TN), which helps the decision-maker in the insurance company flag claims that do not need to be accommodated. The downside of this model is that in many cases it misses true claims (i.e., false negatives). On the other hand, the third model is better for a decision-maker who needs high sensitivity (for instance, in the credit card example), since it identifies true positives at a higher rate. The downside is that it may raise many unnecessary warnings (i.e., false positives). However, if one needs a model with a good balance, one can look at the F1 score; in that case the model in the middle is ranked highest.
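The trade-off can be reproduced with a short sketch; the labels and scores below are hypothetical stand-ins for the slide's data:

    import numpy as np

    def rates(y_true, p, threshold):
        """Confusion-matrix rates of the classifier obtained by cutting p at threshold."""
        y_hat = (p >= threshold).astype(int)
        tp = np.sum((y_hat == 1) & (y_true == 1))
        fn = np.sum((y_hat == 0) & (y_true == 1))
        fp = np.sum((y_hat == 1) & (y_true == 0))
        tn = np.sum((y_hat == 0) & (y_true == 0))
        tpr = tp / (tp + fn)               # sensitivity (recall)
        fpr = fp / (fp + tn)               # 1 - specificity
        f1 = 2 * tp / (2 * tp + fp + fn)   # balance of precision and recall
        return tpr, fpr, f1

    y_true = np.array([0, 0, 1, 0, 1, 1, 0, 1])
    p = np.array([0.15, 0.30, 0.40, 0.45, 0.60, 0.70, 0.55, 0.90])

    for t in (0.9, 0.5, 0.1):              # close to 1, middle, close to 0
        print(t, rates(y_true, p, t))

With these made-up numbers the middle threshold indeed attains the highest F1, echoing the ranking described above.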
In this slide we have another fit of the logistic regression, which gives different confusion matrices and a different ROC curve.
Here we compare the ROC curves of the two fits of the logistic regression. As one can see, neither model outperforms the other everywhere: in some regions the blue curve (the first model) is closer to the point (0,1), and in others the red curve (the second model) is. For instance, when the threshold is equal to 0.5, the second model separates the classes better, as is also visible on the ROC curve.
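A sketch of such a comparison using scikit-learn's roc_curve; the two score vectors below are hypothetical, standing in for the outputs of the two fitted models:

    import numpy as np
    from sklearn.metrics import roc_curve, auc

    y_true = np.array([0, 0, 1, 0, 1, 1, 0, 1])
    p1 = np.array([0.20, 0.40, 0.55, 0.35, 0.70, 0.80, 0.10, 0.90])  # first model (blue)
    p2 = np.array([0.10, 0.30, 0.60, 0.50, 0.65, 0.90, 0.45, 0.85])  # second model (red)

    fpr1, tpr1, _ = roc_curve(y_true, p1)
    fpr2, tpr2, _ = roc_curve(y_true, p2)

    # The curve passing closer to (0, 1) is better in that region; the AUC
    # summarizes each curve by a single number.
    print(auc(fpr1, tpr1), auc(fpr2, tpr2))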