Logistic Regression (Gemini)
Despite its name, Logistic Regression is a classification algorithm, not a regression algorithm in the traditional sense. It is used to predict the probability of a binary outcome (e.g., yes/no, true/false, 0/1).
1. The Goal:
To model the probability that an instance belongs to a particular class.
2. The Core Idea: Sigmoid Function
Unlike linear regression, which outputs a continuous value, logistic regression passes the output of a linear equation through a sigmoid (or logistic) function, $ \sigma(z) = \frac{1}{1 + e^{-z}} $, to squeeze it into a probability between 0 and 1. Here $ z $ is the linear combination of the input features.
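A minimal sketch of this idea in Python (the weights, bias, and input values below are purely illustrative):

```python
import numpy as np

def sigmoid(z):
    """Squash a linear score z into the (0, 1) probability range."""
    return 1.0 / (1.0 + np.exp(-z))

# The linear part: z = w . x + b (illustrative values)
w = np.array([0.8, -0.4])   # feature weights
b = 0.1                     # bias term
x = np.array([2.0, 1.0])    # one input instance

z = np.dot(w, x) + b        # 0.8*2.0 - 0.4*1.0 + 0.1 = 1.3
p = sigmoid(z)              # probability that x belongs to class 1
```

Note that `sigmoid(0)` is exactly 0.5: a linear score of zero corresponds to complete uncertainty between the two classes.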
3. Decision Boundary:
Once you have the probability, you need a threshold (typically 0.5) to classify.
If $ P(\text{class=1}) \ge 0.5 $, predict Class 1.
If $ P(\text{class=1}) < 0.5 $, predict Class 0.
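The thresholding rule above can be sketched as a one-line helper (the function name and example probabilities are illustrative):

```python
import numpy as np

def classify(probs, threshold=0.5):
    """Turn predicted probabilities into hard 0/1 labels using a cutoff."""
    return (np.asarray(probs) >= threshold).astype(int)

labels = classify([0.2, 0.5, 0.91])  # probabilities at or above 0.5 become class 1
```

In practice the 0.5 cutoff can be moved, e.g. lowered when missing a positive case is costlier than a false alarm.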
4. How it Learns (Cost Function):
Logistic Regression doesn't use Mean Squared Error like linear regression. Instead, it uses a log-loss (or cross-entropy) cost function to penalize incorrect probabilistic predictions.
The goal is to minimize this cost function, typically using an optimization algorithm like Gradient Descent.
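The log-loss described above can be computed directly; a small sketch, with illustrative labels and predicted probabilities (the `eps` clipping guards against taking the log of zero):

```python
import numpy as np

def log_loss(y_true, y_prob, eps=1e-15):
    """Binary cross-entropy, averaged over all samples."""
    p = np.clip(y_prob, eps, 1 - eps)  # keep log() finite
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y_true = np.array([1, 0, 1, 1])          # true labels
y_prob = np.array([0.9, 0.1, 0.8, 0.6])  # model's predicted P(class=1)
loss = log_loss(y_true, y_prob)
```

Confident correct predictions (like 0.9 for a true 1) contribute little to the loss, while confident wrong ones are penalized heavily; gradient descent then adjusts the weights to drive this average down.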
5. When to Use It:
Binary Classification: Predicting whether an email is spam or not, whether a customer will churn, medical diagnosis (disease/no disease).
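For a binary classification task like those above, an end-to-end run with scikit-learn is only a few lines (the toy dataset below is illustrative: one feature, with class 1 roughly when the feature exceeds 2):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy training data (illustrative)
X = np.array([[0.5], [1.0], [1.5], [2.5], [3.0], [3.5]])
y = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression()
model.fit(X, y)

probs = model.predict_proba([[2.8]])  # [[P(class 0), P(class 1)]]
label = model.predict([[2.8]])        # hard label after the 0.5 threshold
```

`predict_proba` exposes the probabilities discussed in section 2, while `predict` applies the default 0.5 decision boundary from section 3.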
6. Key Advantages:
Simple and Interpretable: Easy to understand and implement.
Outputs Probabilities: Provides not just a classification but also a probability of belonging to a class.
Good Baseline: Often a good starting point for classification problems.
7. Key Disadvantages:
Assumes Linearity: The model requires a linear relationship between the features and the log-odds of the outcome.
Sensitive to Outliers: Can be affected by extreme values.
Not suitable for complex relationships: May not perform well with highly non-linear data.
Logistic Regression takes a linear combination of inputs, squashes it through a sigmoid function to get a probability, and then uses a threshold to classify. It learns by minimizing log-loss.