BCELoss: Binary Cross-Entropy Loss is a function used to measure the error of our model's predictions. It quantifies the difference between the predicted probability and the actual label (ground truth): the lower the loss, the better our model is performing.
Predicted Probability (p): The output of our model, representing the probability that the input belongs to the positive class (cat). It's a value between 0 and 1.
True Label (y): This is the actual label of the input. It's either 1 (positive: cat) or 0 (negative: non-cat).
BCELoss Formula
L = - (y * log(p) + (1 - y) * log(1 - p))
crit = nn.BCELoss()
In PyTorch, this creates a BCELoss object that can be used to calculate the loss during training.
Why it works well with Sigmoid: The sigmoid function is often used as the activation function in the output layer of binary classification models because it maps the output to a probability between 0 and 1. BCELoss is designed to work with probabilities, making it a good choice when using a sigmoid output layer.
Here we have an image from the dataset given to us.
Binary Classification: predict whether an image in the dataset belongs to one of two classes (cat or non-cat). Our model outputs a probability score between 0 and 1, representing the likelihood that the input belongs to the positive class (cat).
If y = 1 (input is a cat): The formula simplifies to -log(p). This means the loss is high if the predicted probability p is low (the model is less confident that it's a cat). The loss decreases as p approaches 1 (the model is more confident that it's a cat).
If y = 0 (input is not a cat): The formula simplifies to -log(1 - p). This means the loss is high if the predicted probability p is high (the model is confident that it's a cat, but it's wrong). The loss decreases as p approaches 0 (the model is more confident that it's not a cat).
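To make the two cases concrete, here is a minimal sketch (the probability 0.9 is made up purely for illustration) that checks nn.BCELoss against the formula by hand:

import math
import torch
import torch.nn as nn

crit = nn.BCELoss()
p = torch.tensor([0.9])  # the model is fairly confident the input is a cat

# Case y = 1: loss = -log(p) is small because p is close to 1
print(crit(p, torch.tensor([1.0])).item())  # ~0.105
print(-math.log(0.9))                       # same value computed by hand

# Case y = 0: loss = -log(1 - p) is large because the confident prediction is wrong
print(crit(p, torch.tensor([0.0])).item())  # ~2.303
print(-math.log(1 - 0.9))                   # same value computed by hand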
Adam: A popular optimization algorithm used in deep learning to update the parameters of a model during training. It's known for its efficiency and its ability to handle large datasets and sparse gradients.
Implementation in PyTorch:
opt = optim.Adam(model_cnn.parameters(), lr=0.001)
This creates an Adam optimizer object.
model_cnn.parameters() specifies the parameters to be optimized.
lr=0.001 sets the initial learning rate.
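As a hedged sketch of how this optimizer fits into one training step (the model, loss, and batch below are stand-ins mirroring the snippets in this write-up, not the project's actual training loop):

import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Sequential(nn.Linear(3*64*64, 1), nn.Sigmoid())  # stand-in model
crit = nn.BCELoss()
opt = optim.Adam(model.parameters(), lr=0.001)

x = torch.randn(8, 3*64*64)              # dummy batch of 8 flattened images
y = torch.randint(0, 2, (8, 1)).float()  # dummy 0/1 labels

opt.zero_grad()           # clear gradients from the previous step
loss = crit(model(x), y)  # forward pass + BCELoss
loss.backward()           # backpropagation computes the gradients
opt.step()                # Adam updates the parameters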
How it works...
Calculate the First Moment (Mean of Gradients):
This is like an average of the recent gradients for a parameter.
It's denoted by m_t and is calculated using a moving average:
m_t = β1 * m_(t-1) + (1 - β1) * gradient
where `β1` is a hyperparameter (usually 0.9) that controls how much weight is given to past gradients.
Calculate the Second Moment (Uncentered Variance of Gradients):
This is like an average of the squared recent gradients for a parameter.
It's denoted by v_t and is calculated using a moving average:
v_t = β2 * v_(t-1) + (1 - β2) * gradient^2
where `β2` is a hyperparameter (usually 0.999) that controls how much weight is given to past squared gradients.
Bias Correction: The initial values of m_t and v_t are biased towards zero. To correct for this, Adam applies bias correction:
m̂_t = m_t / (1 - β1^t)
v̂_t = v_t / (1 - β2^t)
where `t` is the current iteration number.
Update the Parameters: Finally, Adam updates the parameters using the following formula:
θ_(t+1) = θ_t - (learning_rate * m̂_t) / (sqrt(v̂_t) + epsilon)
where `learning_rate` is the step size and `epsilon` is a small constant that prevents division by zero (usually 1e-8).
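To make the update rule concrete, here is a minimal sketch of the Adam update written out in plain PyTorch, with a made-up parameter and a fixed gradient (real training would recompute the gradient at every step):

import torch

beta1, beta2, lr, eps = 0.9, 0.999, 0.001, 1e-8

theta = torch.tensor([0.5])  # hypothetical parameter
grad = torch.tensor([0.2])   # hypothetical gradient

m = torch.zeros_like(theta)  # first moment (mean of gradients)
v = torch.zeros_like(theta)  # second moment (uncentered variance)

for t in range(1, 4):  # a few illustrative iterations, starting at t = 1
    m = beta1 * m + (1 - beta1) * grad       # moving average of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2  # moving average of squared gradients
    m_hat = m / (1 - beta1 ** t)             # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (v_hat.sqrt() + eps)  # parameter update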
Logistic Regression is a simple but powerful algorithm used for binary classification. It predicts the probability of an input belonging to a specific class (e.g., cat or non-cat).
How it works:
Linear Transformation: Input features (x) are combined with weights (W) and a bias (b) to produce a score (z)
z = W⋅x + b
Sigmoid Function: The score (z) is then passed through a sigmoid function (σ) to get a probability between 0 and 1
probability = σ(z) = 1 / (1 + e^-z)
This probability represents how likely the input belongs to the positive class (cat).
Logistic Regression uses a linear equation to calculate a score, and then squashes that score into a probability using the sigmoid function. This probability is used to make the final classification decision (e.g., if probability > 0.5, classify as 'cat').
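A quick numeric sketch of the sigmoid and the 0.5 decision rule (the scores z are arbitrary values chosen for illustration):

import torch

z = torch.tensor([-2.0, 0.0, 3.0])  # arbitrary linear scores z = W·x + b
p = torch.sigmoid(z)                # probabilities in (0, 1)
print(p)                            # tensor([0.1192, 0.5000, 0.9526])
print(p > 0.5)                      # tensor([False, False, True]) -> classify as 'cat' where True

In PyTorch, the full model looks like this: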
class LogisticRegression(nn.Module):
    def __init__(self):
        super(LogisticRegression, self).__init__()
        self.linear = nn.Linear(3*64*64, 1)  # flattened 64x64 RGB image -> one score
        self.sigmoid = nn.Sigmoid()          # squashes the score into a probability

    def forward(self, x):
        x = x.reshape(x.size(0), -1)  # flatten each image in the batch into a vector
        x = self.linear(x)            # linear transformation: z = W·x + b
        x = self.sigmoid(x)           # probability = σ(z)
        return x

self.linear creates a linear layer (nn.Linear) that takes an input of size 3*64*64 (representing flattened images) and outputs a single value, and self.sigmoid creates a sigmoid activation function (nn.Sigmoid).
model_lr = LogisticRegression().to(device)
summary(model_lr, (3, 64, 64))
This creates an instance of the LogisticRegression class called model_lr.
to(device) moves the model to the specified device (e.g., GPU if available).
summary prints a summary of the model's architecture (here presumably the summary function from the torchsummary package).
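As a small usage sketch, a single prediction with this model might look like the following (the input is a random tensor, purely to show the expected shape; model_lr, device, and the torch import come from the lines above):

img = torch.randn(1, 3, 64, 64).to(device)  # one dummy 64x64 RGB image
with torch.no_grad():
    prob = model_lr(img)  # probability that the image is a cat
label = 'cat' if prob.item() > 0.5 else 'non-cat'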
Logistic Regression (LR)
Strengths: Simple, interpretable, computationally efficient, less prone to overfitting with limited data.
Weaknesses: Limited to linear relationships, may not perform well on complex datasets.
Convolutional Neural Networks (CNN)
Strengths: Can learn complex patterns, extract hierarchical features, high accuracy on large datasets.
Weaknesses: More complex, computationally expensive, requires more data, prone to overfitting, less interpretable.
Accuracy Test: Logistic Regression reached 66%, while the CNN reached 86%.
Training & Validation Loss: the LR model shows the higher loss:
train_losses, val_losses = train(model_lr, crit, opt, train_x, train_y, test_x, test_y, num_epochs=num_epochs)
while the CNN model shows the lower loss:
train_losses, val_losses = train(model_cnn, crit, opt, train_x, train_y, test_x, test_y, num_epochs=num_epochs)
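The train helper itself is not shown in this write-up; a minimal sketch consistent with its call signature might look like this (full-batch training for simplicity, assuming torch is imported as in the earlier snippets; the project's real implementation may differ):

def train(model, crit, opt, train_x, train_y, val_x, val_y, num_epochs=20):
    train_losses, val_losses = [], []
    for epoch in range(num_epochs):
        model.train()
        opt.zero_grad()
        loss = crit(model(train_x), train_y)  # forward pass on the training set
        loss.backward()
        opt.step()
        train_losses.append(loss.item())

        model.eval()
        with torch.no_grad():  # validation loss, no gradient tracking
            val_losses.append(crit(model(val_x), val_y).item())
    return train_losses, val_losses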
From these results, LR produces lower accuracy and higher training and validation loss, indicating that the CNN model is the stronger choice for this task.
AI models (CNN and LR): LR's lower accuracy shows in its failed prediction of the cat on the left, indicating that the CNN model significantly outperforms the Logistic Regression model in accuracy when classifying cat images.
The superior performance of the CNN model can be attributed to its ability to learn complex spatial features from images through convolutional layers. Logistic Regression, being a linear model, is limited in capturing these intricate features, resulting in lower accuracy.
Not purrfect
AI is not always correct: even the CNN model, with its higher accuracy, still predicted this cat image incorrectly as a non-cat.
This project reinforced the importance of model selection, careful training, and performance evaluation in achieving the desired results. Even then, AI has its flaws, just like us humans.