4th week: Using pre-trained models
A machine learning technique in which a model developed for one task is reused as the starting point for a model on a second, related task.
Advantages
Reduced Training Time: You'll need much less training time than training a CNN from scratch.
Less Data Required: Transfer learning works well even with smaller datasets.
Improved Performance: Often leads to higher accuracy and better generalization.
Implementing transfer learning
Alpaca Classifier: we use MobileNetV2, which has been trained on the ImageNet dataset, to classify whether an image is an Alpaca / Not Alpaca.
Ants vs. Bees Classifier: we use ResNet, known for its ability to train deep networks efficiently, to classify whether an image contains ants or bees.
Pre-trained models: neural networks that have undergone prior training on extensive datasets.
Fine-tuning MobileNetV2 to classify Alpacas
MobileNetV2: a class of lightweight convolutional neural network architectures designed for mobile and embedded vision applications. Trained on ImageNet (about 1.4 million images across 1,000 classes), it emphasizes efficiency and reduced computational cost.
MobileNetV2 employs a specific type of convolution called depthwise separable convolution as its fundamental building block. These specialized convolutions are designed to be more efficient compared to traditional convolutions, which often demand substantial computational resources.
Inside its convolutional building block:
Depthwise Convolution: Instead of processing all input channels simultaneously like traditional convolutions, the depthwise convolution handles each input channel individually. It applies a separate filter to each channel, generating an intermediate feature map for each channel independently. This initial step effectively isolates spatial information within each channel.
Pointwise Convolution: Combines the intermediate feature maps produced in the depthwise step. A 1x1 convolution merges the outputs of the depthwise convolution into the output feature map; applied across all channels, it lets the network learn relationships between channels and produce a final output with the desired number of channels.
Key features:
Depthwise Separable Convolutions: Instead of processing all input channels at once, they handle each channel separately and then merge the results. This significantly reduces computations and parameter count compared to standard convolutions. These convolutions also handle both spatial (height/width) and depth (number of channels) dimensions of the input.
Input/Output Bottlenecks: Thin layers at the start and end of the convolutional block act as information checkpoints, ensuring that important features are preserved and information is not lost during computation.
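To make the two-step idea concrete, here is a minimal PyTorch sketch of a depthwise separable convolution (an illustration only; the class name, 3x3 kernel size, and channel counts are assumptions, and the real MobileNetV2 block also adds an expansion layer, batch normalization, and ReLU6 activations):

import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_channels, out_channels):
        super().__init__()
        # Depthwise: one 3x3 filter per input channel (groups=in_channels)
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size=3,
                                   padding=1, groups=in_channels)
        # Pointwise: a 1x1 convolution mixes information across channels
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

block = DepthwiseSeparableConv(32, 64)
x = torch.randn(1, 32, 160, 160)   # one 32-channel feature map
print(block(x).shape)              # torch.Size([1, 64, 160, 160])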
Implementation
base_model = models.mobilenet_v2(pretrained=True)
Loads the MobileNetV2 model from the torchvision.models library.
The pretrained=True argument loads weights learned from pre-training on a large image dataset (ImageNet).
base_model.eval()
Puts the model in evaluation mode, which disables features like dropout (a regularization technique) that are used during training. Evaluation mode is used when you want to test the performance of the model on new data.
base_model = base_model.to(device)
Transfers the base_model to the chosen device ("cuda" or "cpu"). This is important because computations on a GPU are typically much faster than on a CPU, especially for deep learning models.
summary(base_model, (3, 160, 160))
Prints a summary of the model architecture, including the layers and parameter counts. It shows the structure of the model and helps in understanding its complexity. This line can be commented out, since it doesn't affect functionality; it only provides information about the model.
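Putting these steps together, a minimal runnable version might look like this (a sketch assuming the torchsummary package provides summary; note that newer torchvision releases prefer a weights= argument over the deprecated pretrained=True):

import torch
from torchvision import models
from torchsummary import summary

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

base_model = models.mobilenet_v2(pretrained=True)  # load ImageNet weights
base_model.eval()                                  # disable dropout, etc.
base_model = base_model.to(device)                 # move to GPU if available
# torchsummary defaults to "cuda", so pass the actual device type
summary(base_model, (3, 160, 160), device=device.type)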
Fine-Tuning
Original dataset loaded
The fine-tuned model has better accuracy but higher training and validation loss.
Fine-Tuned model
train_loader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True, num_workers=2, prefetch_factor=2)
validation_loader = DataLoader(validation_dataset, batch_size=BATCH_SIZE, shuffle=False, num_workers=2, prefetch_factor=2)
Efficient Data Loading with DataLoader: PyTorch's DataLoader class enables efficient data loading by spawning multiple worker processes (num_workers) that load and preprocess batches in parallel, while prefetch_factor controls how many batches each worker prepares in advance. This parallelism mitigates I/O bottlenecks, ensuring the GPU has a continuous flow of data during training.
data_augmentation = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(20),
    transforms.Resize(IMG_SIZE),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
])
Data Augmentation for Improved Generalization: To enhance our model's ability to generalize effectively to new, unseen data, it is standard practice to augment your training dataset. Data augmentation involves applying random transformations to your training data, such as flipping, rotating, cropping, or color adjustments. These transformations introduce diversity into the training set, exposing the model to variations of the original images or data, which in turn helps the model to learn more robust features and be less prone to overfitting.
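The train_dataset and validation_dataset fed to the DataLoaders above aren't shown being constructed; a plausible sketch, assuming an ImageFolder-style directory layout (the paths and constant values here are assumptions):

from torchvision import datasets, transforms
from torch.utils.data import DataLoader

IMG_SIZE = (160, 160)   # assumed input size, matching the summary call above
BATCH_SIZE = 32         # assumed batch size

# Hypothetical layout: data/train/<class_name>/*.jpg and data/val/<class_name>/*.jpg
train_dataset = datasets.ImageFolder('data/train', transform=data_augmentation)
validation_dataset = datasets.ImageFolder('data/val', transform=transforms.Compose([
    transforms.Resize(IMG_SIZE),   # no random augmentation for validation data
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
]))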
Using the fine-tuned model to predict the previous batch of images, we can see its increased accuracy: every prediction is now correct.
Here the model wrongly predicted the bottom-left image: its predicted label differs from the true label (the correct answer).
Upgrading our AI: after tuning some parameters and further training and validating the model, we are able to increase its training and validation accuracy and decrease its loss.
with torch.no_grad():
    feature_batch = base_model(image_batch)
Runs the model on a batch of training data; torch.no_grad() disables gradient tracking, which saves memory and computation since we are only predicting, not training.
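The plotting cell below reads preds, which the snippet above doesn't compute; presumably (an assumption, since that cell isn't shown in the notes) the class predictions come from the argmax of the fine-tuned model's outputs:

with torch.no_grad():
    outputs = model(image_batch.to(device))   # model: the fine-tuned classifier (assumed name)
    preds = outputs.argmax(dim=1)             # predicted class index for each image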
plt.figure(figsize=(10, 10))
for i in range(9):
    ax = plt.subplot(3, 3, i + 1)
    imshow(image_batch[i], f'Label: {label_batch[i].item()}, Pred: {preds[i].item()}')
plt.show()
A for loop iterates over nine images, visualizing each image alongside its true label and the model's prediction.
Fine-tuning ResNet18 to classify bees and ants
ResNet (Residual Network): a deep convolutional neural network architecture known for its ability to train very deep networks effectively.
It addresses the vanishing gradient problem, which can hinder the training of deep networks, by introducing skip connections (also called residual connections). These connections allow the network to learn residual functions, making it easier to optimize.
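To make the skip connection concrete, here is a minimal sketch of a residual block (an illustration only, assuming equal input and output channels; real ResNet blocks also use batch normalization and a projection shortcut when dimensions change):

import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        out = F.relu(self.conv1(x))
        out = self.conv2(out)
        # Skip connection: the block only needs to learn the residual H(x) - x,
        # and gradients can flow through the identity path unimpeded
        return F.relu(out + x)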
Feature Extraction: The pre-trained model is used as a feature extractor. Instead of training a CNN from scratch on our small ants/bees dataset, we utilize the learned features of the pre-trained model, taking the output of an intermediate layer as our image features.
New Classifier: we add a new classification layer on top of the extracted features, trained specifically to classify images as ants or bees (as sketched below). The idea is that the features learned by the pre-trained model (edges, textures, object parts, etc.) generalize to this new classification task.
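One common way to realize the feature-extraction idea in PyTorch (a sketch, assuming we simply chop off ResNet18's final fully connected layer; the variable names are illustrative):

import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet18(pretrained=True)
# Drop the final fully connected layer; what remains maps an image to features
feature_extractor = nn.Sequential(*list(backbone.children())[:-1])

with torch.no_grad():
    images = torch.randn(4, 3, 224, 224)    # a dummy batch of 4 images
    features = feature_extractor(images)     # shape (4, 512, 1, 1)
    features = features.flatten(1)           # (4, 512) image feature vectors
print(features.shape)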
Original Dataset
model = models.resnet18(pretrained=True)
Loads the ResNet18 model with its pre-trained ImageNet weights.
data_transforms = {
    'train': transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}
Preprocessing: Preparing the images for the model by resizing, cropping, and normalizing them.
Data Augmentation: Introducing variations into the training images to improve model generalization and robustness.
Fine-Tuning
model.fc = nn.Linear(512, 2)
Replaces the original classification layer of the pre-trained model with a new binary classification layer (ResNet18's final layer takes 512 input features).
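The freezing step mentioned in the notes below isn't shown explicitly; a common pattern, consistent with the params_to_update filter used later, would be (an assumption, not confirmed by the source):

# Freeze every pre-trained parameter so only the new head is trained
for param in model.parameters():
    param.requires_grad = False

# Replace the head afterwards: the new nn.Linear is trainable by default
model.fc = nn.Linear(512, 2)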
optimizer = optim.Adam(model.parameters(), lr=1e-4)
Initializes the Adam optimizer, responsible for updating the model's parameters based on the calculated gradients.
model.parameters(): This indicates that the optimizer will manage all the parameters of our model (although most of them have been frozen, so only the new head actually gets updated).
lr=1e-4: This sets the learning rate to 0.0001, controlling the step size taken by the optimizer during each update. A small learning rate is often used during fine-tuning to avoid disrupting the pre-trained weights.
history = train_and_validate_gpu(model, dataloaders['train'], dataloaders['val'], criterion, optimizer, num_epochs=8)
plot_training_history_gpu(history)
Trains and validates our model for 8 epochs (criterion, the loss function, is assumed to have been defined earlier, e.g. as nn.CrossEntropyLoss, as in the second run below).
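train_and_validate_gpu and plot_training_history_gpu are helper functions defined elsewhere in the notebook; their bodies aren't shown here, but a single training epoch inside such a helper typically looks roughly like this sketch:

def train_one_epoch(model, loader, criterion, optimizer, device):
    model.train()
    total_loss, correct = 0.0, 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(images)               # forward pass
        loss = criterion(outputs, labels)     # e.g. cross-entropy
        loss.backward()                       # backpropagate
        optimizer.step()                      # update trainable parameters
        total_loss += loss.item() * images.size(0)
        correct += (outputs.argmax(1) == labels).sum().item()
    n = len(loader.dataset)
    return total_loss / n, correct / n        # mean loss and accuracy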
params_to_update = [param for param in model.parameters() if param.requires_grad]
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(params_to_update, lr=1e-4)
history = train_and_validate_gpu(model, dataloaders['train'], dataloaders['val'], criterion, optimizer, num_epochs=2)
plot_training_history_gpu(history)
Trains and validates our model for 2 more epochs, this time passing the optimizer only the parameters that still require gradients.
From the numbers and graph above, we can see that after our second round of training and fine-tuning, the model is slightly more accurate, with lower training and validation loss.
Week 4 in summary: taking ready-made AI models and fine-tuning them to suit our needs. It's a great way to utilize available resources, much better than reinventing the wheel.