Supervised learning is a type of machine learning in which a model is trained on labeled data, meaning both the input and the desired output are known.
The model learns patterns from these examples and can then predict outputs for new, unseen data.
For example, if we train a model on emails labeled spam or not spam, it can classify new emails automatically.
Logistic Regression
Definition:
A statistical model used for binary classification problems — where the output is either 0 or 1.
Equation:
P(Y=1 | X) = 1 / (1 + e^(-(β0 + β1·X)))
Explanation:
It uses the sigmoid function to map any real-valued input to a probability between 0 and 1.
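The sigmoid mapping can be checked with a few lines of plain Python (a standalone sketch, no libraries beyond the standard math module):

```python
import math

def sigmoid(z):
    """Map any real number into the open interval (0, 1)."""
    return 1 / (1 + math.exp(-z))

# Large negative inputs approach 0, zero maps to exactly 0.5,
# and large positive inputs approach 1.
print(sigmoid(-5), sigmoid(0), sigmoid(5))
```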
Code Example:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris

# Note: iris has three classes; scikit-learn extends logistic
# regression to multiclass problems automatically.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)
print("Accuracy:", model.score(X_test, y_test))
Output:
Accuracy: 0.93 (approximate; the exact value varies with the random train/test split)
Diagram (Sigmoid Curve): [S-shaped curve rising from 0 toward 1; plot omitted]
Naive Bayes
Definition:
A probabilistic classifier based on Bayes’ Theorem, assuming all features are independent.
Formula:
P(A | B) = P(B | A) · P(A) / P(B)
Example:
Used in spam filtering and sentiment analysis.
Code Example:
from sklearn.naive_bayes import GaussianNB
from sklearn.datasets import load_iris
X, y = load_iris(return_X_y=True)
model = GaussianNB()
model.fit(X, y)
# Predict the class of a single new flower measurement
print("Predicted Class:", model.predict([[5.1, 3.5, 1.4, 0.2]]))
Output:
Predicted Class: [0]
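Bayes' theorem itself can be sanity-checked with made-up spam-filter probabilities (the numbers here are illustrative, not measured from real data):

```python
# Hypothetical priors and likelihoods for a spam filter
p_spam = 0.2             # P(A): base rate of spam
p_word_given_spam = 0.6  # P(B|A): probability "free" appears in a spam email
p_word = 0.25            # P(B): probability "free" appears in any email

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(p_spam_given_word)  # 0.48
```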
Support Vector Machine (SVM)
Definition:
A classifier that finds the best decision boundary (hyperplane) to separate different classes.
Code Example:
from sklearn.svm import SVC
from sklearn.datasets import load_iris
X, y = load_iris(return_X_y=True)
model = SVC(kernel='linear')
model.fit(X, y)
print("Accuracy:", model.score(X, y))  # training accuracy (no held-out split)
Output:
Accuracy: 0.98
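With kernel='linear' the fitted hyperplane can be inspected directly; coef_, intercept_, and support_vectors_ are standard scikit-learn attributes for a linear SVC:

```python
from sklearn.svm import SVC
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
model = SVC(kernel='linear')
model.fit(X, y)

# coef_ holds one weight row per one-vs-one class pair (3 pairs for
# iris's 3 classes), and support_vectors_ holds the boundary-defining points.
print(model.coef_.shape)             # (3, 4)
print(model.support_vectors_.shape)
```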
Decision Tree
Definition:
A tree-shaped model that splits data into branches based on feature values.
Example:
Used in customer churn prediction.
Code Example:
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
X, y = load_iris(return_X_y=True)
model = DecisionTreeClassifier()
model.fit(X, y)
print("Accuracy:", model.score(X, y))  # training accuracy; an unpruned tree can fit the training set perfectly
Output:
Accuracy: 1.0
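The learned splits can be printed as human-readable rules with scikit-learn's export_text helper (depth limited to 2 here to keep the output short):

```python
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.datasets import load_iris

data = load_iris()
model = DecisionTreeClassifier(max_depth=2, random_state=0)
model.fit(data.data, data.target)

# Each indented line is a feature threshold; leaves show the predicted class.
rules = export_text(model, feature_names=list(data.feature_names))
print(rules)
```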
Ensemble Methods
Definition:
Combines multiple models to improve performance (e.g., Bagging, Boosting, Random Forest).
Code Example:
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
X, y = load_iris(return_X_y=True)
model = RandomForestClassifier()
model.fit(X, y)
print("Accuracy:", model.score(X, y))
Output:
Accuracy: 1.0
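The 1.0 above is accuracy on the same data the forest was trained on; cross-validation gives a more honest estimate of generalization (exact scores depend on the random seed):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(random_state=0)

# 5-fold cross-validation: train on 4/5 of the data, score the held-out 1/5
scores = cross_val_score(model, X, y, cv=5)
print("CV accuracy:", scores.mean())
```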
Model Evaluation
Confusion Matrix:
A table comparing predicted labels against actual labels. For binary classification:
                  Predicted Positive    Predicted Negative
Actual Positive   TP (true positive)    FN (false negative)
Actual Negative   FP (false positive)   TN (true negative)
Metrics:
Accuracy = (TP + TN)/(TP + TN + FP + FN)
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1-score = 2 × (Precision × Recall)/(Precision + Recall)
AUC-ROC = Area under the Receiver Operating Characteristic curve.
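The formulas above can be verified on a small set of made-up predictions (labels invented purely for illustration):

```python
# Hypothetical true labels and predictions for a binary classifier
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

# Count the four confusion-matrix cells
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # 3
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)  # 3
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # 1
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # 1

accuracy = (tp + tn) / (tp + tn + fp + fn)          # 6/8 = 0.75
precision = tp / (tp + fp)                          # 3/4 = 0.75
recall = tp / (tp + fn)                             # 3/4 = 0.75
f1 = 2 * precision * recall / (precision + recall)  # 0.75
print(accuracy, precision, recall, f1)
```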