Advanced Applied Deep Learning
Practice Course
Sheng Yun Wu
In Week 10, students will explore advanced object detection techniques to improve model accuracy, efficiency, and performance. This includes multi-scale detection, feature pyramid networks (FPN), anchor box optimization, non-maximum suppression (NMS), and augmentation techniques specific to object detection tasks. By the end of the week, students will understand how to refine and extend standard object detection methods.
Description:
This example introduces multi-scale detection, a technique that allows the detection of objects of varying sizes by utilizing multiple resolutions during training and inference.
No Code for this example – Theoretical Explanation
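Use models such as SSD and Faster R-CNN, which handle multiple scales by design: SSD predicts from feature maps at several resolutions, and Faster R-CNN uses anchors of several sizes and, with an FPN backbone, several feature levels.
Multi-scale detection means detecting objects at different resolutions so that accuracy holds for small and large objects alike.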
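Although this example is theoretical, the bookkeeping behind a simple image-pyramid approach can be sketched in a few lines. The following is a minimal illustration, not part of the course material: the helper names `pyramid_sizes` and `to_original_coords` and the scale values are assumptions chosen for the sketch, and no detector is actually run.

```python
def pyramid_sizes(width, height, scales=(0.5, 1.0, 2.0)):
    """Return the (width, height) of the image at each pyramid scale."""
    return [(round(width * s), round(height * s)) for s in scales]


def to_original_coords(box, scale):
    """Map a box [x1, y1, x2, y2] found on a rescaled image back to the original image."""
    return tuple(coord / scale for coord in box)


# A 640x480 image would be fed to the detector at three resolutions
sizes = pyramid_sizes(640, 480)
print(sizes)

# A box detected on the half-resolution copy is rescaled before results are merged
box_at_half_scale = (50, 60, 100, 120)
print(to_original_coords(box_at_half_scale, 0.5))
```

Detections from all scales would then typically be merged and de-duplicated, for instance with non-maximum suppression.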
Description:
This example demonstrates how to use Feature Pyramid Networks (FPN) with Faster R-CNN to enhance object detection for objects of varying scales by combining low and high-level features.
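import torch
from torch.utils.data import DataLoader
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# Load pre-trained Faster R-CNN with FPN
# (torchvision >= 0.13; older releases use pretrained=True instead)
model = fasterrcnn_resnet50_fpn(weights="DEFAULT")

# Define the dataset.
# ImageFolder yields (image, class index) pairs with no boxes, so it cannot train a detector.
# train_dataset is assumed to be a detection dataset built from custom_dataset/train/
# whose items are (image, target): image is a float tensor of shape [C, H, W], and target
# is a dict with "boxes" (FloatTensor[N, 4] as x1, y1, x2, y2) and "labels" (Int64Tensor[N]).
# The collate function keeps each batch as tuples because image sizes can differ.
train_loader = DataLoader(train_dataset, batch_size=4, shuffle=True,
                          collate_fn=lambda batch: tuple(zip(*batch)))

# Train the model
optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9, weight_decay=0.0005)
model.train()
for epoch in range(10):
    for images, targets in train_loader:
        optimizer.zero_grad()
        # In training mode the model returns a dict of loss terms
        losses = model(list(images), list(targets))
        loss = sum(loss for loss in losses.values())
        loss.backward()
        optimizer.step()

# Save the model with FPN
torch.save(model.state_dict(), 'faster_rcnn_fpn.pth')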
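To make "combining low and high-level features" concrete, the core of the FPN top-down pathway can be sketched on tiny 2D feature maps. This is a simplified illustration under stated assumptions: feature maps are plain nested lists with a single channel, the helper names `upsample2x` and `top_down_merge` are hypothetical, and the 1x1 lateral and 3x3 output convolutions of a real FPN are omitted.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2D feature map given as a list of rows."""
    out = []
    for row in fmap:
        wide = [value for value in row for _ in range(2)]  # repeat each column twice
        out.append(wide)
        out.append(list(wide))  # repeat each row twice
    return out


def top_down_merge(coarse, fine):
    """Upsample the coarse (semantically strong) map and add it to the fine (high-resolution) map."""
    up = upsample2x(coarse)
    return [[u + f for u, f in zip(up_row, fine_row)]
            for up_row, fine_row in zip(up, fine)]


coarse = [[10]]            # 1x1 map from a deep layer
fine = [[1, 2], [3, 4]]    # 2x2 map from a shallower layer
print(top_down_merge(coarse, fine))
```

Each merged level keeps the spatial detail of the shallower map while inheriting context from the deeper one, which is why an FPN helps with objects of very different sizes.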
Description:
This example explains how to optimize anchor boxes in YOLOv5 to better fit custom objects and improve detection accuracy, especially for objects of different aspect ratios and sizes.
# Optimize anchor boxes for YOLOv5
!python train.py --img 640 --batch 16 --epochs 50 --data custom_dataset/data.yaml --weights yolov5s.pt --rect --cache --hyp hyp.custom.yaml
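Code explanation:
The --rect argument enables rectangular training, which batches images of similar aspect ratio with minimal padding, and the --cache argument caches the dataset to speed up training. Neither flag changes the anchors: YOLOv5 checks the anchors against the dataset when training starts (AutoAnchor) and recomputes them if they fit poorly, unless --noautoanchor is passed.
The hyp.custom.yaml file can be used to customize anchor settings, such as the anchor-matching threshold anchor_t.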
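The idea behind fitting anchors to a dataset can be sketched as k-means clustering over box widths and heights, using IoU as the similarity measure. This is a minimal sketch under stated assumptions, not YOLOv5's own AutoAnchor routine (which uses a different fitness metric and a genetic refinement step); the function names and the sample boxes are hypothetical.

```python
def wh_iou(a, b):
    """IoU of two boxes given as (width, height), aligned at a shared corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeans_anchors(boxes_wh, k, iters=50):
    """Cluster (width, height) pairs into k anchors; assumes k <= len(boxes_wh)."""
    # Deterministic initialisation: pick k boxes evenly spaced by area
    ordered = sorted(boxes_wh, key=lambda b: b[0] * b[1])
    step = len(ordered) / k
    anchors = [ordered[int(step * i + step / 2)] for i in range(k)]
    for _ in range(iters):
        # Assign each box to the anchor it overlaps most
        clusters = [[] for _ in range(k)]
        for box in boxes_wh:
            best = max(range(k), key=lambda j: wh_iou(box, anchors[j]))
            clusters[best].append(box)
        # Move each anchor to the mean width and height of its cluster
        new_anchors = []
        for j, members in enumerate(clusters):
            if members:
                new_anchors.append((sum(m[0] for m in members) / len(members),
                                    sum(m[1] for m in members) / len(members)))
            else:
                new_anchors.append(anchors[j])  # keep an anchor that attracted no boxes
        if new_anchors == anchors:
            break
        anchors = new_anchors
    return sorted(anchors, key=lambda a: a[0] * a[1])


# Hypothetical label sizes in pixels: three small square-ish boxes, three wide boxes
sample_wh = [(10, 10), (12, 12), (11, 9), (100, 50), (110, 55), (90, 45)]
print(kmeans_anchors(sample_wh, k=2))
```

On this toy data the two anchors settle near the small square boxes and the large 2:1 boxes, which is the behaviour one wants when the default anchors do not match the aspect ratios in a custom dataset.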
Description:
This example demonstrates how to implement Non-Maximum Suppression (NMS) to remove overlapping bounding boxes and retain the most confident detection for an object.
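import numpy as np

# Example predictions (bounding boxes as [x1, y1, x2, y2] and confidence scores)
boxes = np.array([[100, 100, 200, 200], [105, 105, 205, 205], [300, 300, 400, 400]])
scores = np.array([0.9, 0.85, 0.6])

# Compute IoU (Intersection over Union) between one box and an array of boxes
def compute_iou(box, others):
    x1 = np.maximum(box[0], others[:, 0])
    y1 = np.maximum(box[1], others[:, 1])
    x2 = np.minimum(box[2], others[:, 2])
    y2 = np.minimum(box[3], others[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    other_areas = (others[:, 2] - others[:, 0]) * (others[:, 3] - others[:, 1])
    return inter / (area + other_areas - inter)

# Define a function for non-maximum suppression (NMS)
def non_max_suppression(boxes, scores, threshold=0.5):
    idxs = np.argsort(scores)[::-1]  # Highest score first
    keep = []
    while len(idxs) > 0:
        i = idxs[0]
        keep.append(int(i))
        # Drop remaining boxes that overlap the kept box by more than the threshold
        ious = compute_iou(boxes[i], boxes[idxs[1:]])
        idxs = idxs[1:][ious <= threshold]
    return keep

# Perform NMS on predicted boxes
keep_indices = non_max_suppression(boxes, scores)
print(f"Indices of boxes kept after NMS: {keep_indices}")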
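The suppression decision rests entirely on the IoU value, so it helps to work one through by hand. The sketch below, a plain-Python illustration with a hypothetical helper `iou`, computes the overlap between the example boxes: the first two share a 95 x 95 intersection (area 9025) out of a union of 10000 + 10000 - 9025 = 10975, giving an IoU of about 0.82, which exceeds the 0.5 threshold, so the lower-scoring box is discarded. The third box does not overlap at all and is kept.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as [x1, y1, x2, y2]."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)  # zero when the boxes do not overlap
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


box0 = [100, 100, 200, 200]
box1 = [105, 105, 205, 205]
box2 = [300, 300, 400, 400]

print(f"IoU(box0, box1) = {iou(box0, box1):.4f}")  # heavy overlap
print(f"IoU(box0, box2) = {iou(box0, box2):.4f}")  # no overlap
```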
Description:
This example shows how to apply advanced data augmentation techniques specifically designed for object detection tasks, including flipping, rotation, scaling, and color jittering.
from albumentations import (BboxParams, Compose, HorizontalFlip,
                            RandomBrightnessContrast, ShiftScaleRotate)
import cv2

# Define augmentations for object detection.
# bbox_params declares the box format so that boxes are transformed together with the image.
augment = Compose([
    HorizontalFlip(p=0.5),
    ShiftScaleRotate(shift_limit=0.1, scale_limit=0.1, rotate_limit=15, p=0.5),
    RandomBrightnessContrast(p=0.5)
], bbox_params=BboxParams(format='pascal_voc', label_fields=['class_labels']))

# Load an image and apply augmentations
image = cv2.imread('image.jpg')
bboxes = [[100, 150, 200, 250]]  # Example bounding box as [x_min, y_min, x_max, y_max]
class_labels = ['object']  # One label per box (placeholder name)
augmented = augment(image=image, bboxes=bboxes, class_labels=class_labels)
aug_image = augmented['image']
aug_bboxes = augmented['bboxes']

# Visualize augmented image and bounding boxes
for box in aug_bboxes:
    startX, startY, endX, endY = map(int, box[:4])
    cv2.rectangle(aug_image, (startX, startY), (endX, endY), (255, 0, 0), 2)
cv2.imshow("Augmented Image", aug_image)
cv2.waitKey(0)
cv2.destroyAllWindows()
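What sets detection augmentation apart from classification augmentation is that every geometric transform must also be applied to the box coordinates. A horizontal flip is the simplest case and can be sketched by hand; the helper name `hflip_box` and the 640-pixel image width are assumptions made for this illustration.

```python
def hflip_box(box, img_w):
    """Mirror a box [x_min, y_min, x_max, y_max] across the vertical centre line of the image."""
    x_min, y_min, x_max, y_max = box
    # The old right edge becomes the new left edge, measured from the opposite side
    return (img_w - x_max, y_min, img_w - x_min, y_max)


box = (100, 150, 200, 250)
flipped = hflip_box(box, img_w=640)
print(flipped)
print(hflip_box(flipped, img_w=640))  # flipping twice restores the original box
```

Color transforms such as brightness and contrast changes leave the boxes untouched, while rotation and scaling need a similar coordinate update, which is the bookkeeping an augmentation library handles automatically.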
Description:
In this example, students will learn how to train YOLOv5 to detect multiple object classes simultaneously on a custom dataset.
# Define multiple classes in data.yaml
train: custom_dataset/train/images/
val: custom_dataset/val/images/
nc: 3 # Example: 3 classes (dog, cat, person)
names: ['dog', 'cat', 'person']
# Train YOLOv5 for multi-class detection
!python train.py --img 640 --batch 16 --epochs 50 --data custom_dataset/data.yaml --weights yolov5s.pt
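Besides data.yaml, YOLOv5 expects one text file of labels per image, with a row per object in the form `class x_center y_center width height`, where the coordinates are normalized to the range 0 to 1 and the class is an index into the names list. A minimal sketch of that conversion from pixel corner coordinates follows; the helper name `to_yolo_label` and the example image size are assumptions for illustration.

```python
def to_yolo_label(class_id, box, img_w, img_h):
    """Convert a pixel box [x1, y1, x2, y2] into a YOLO label row with normalized values."""
    x1, y1, x2, y2 = box
    x_center = (x1 + x2) / 2 / img_w
    y_center = (y1 + y2) / 2 / img_h
    width = (x2 - x1) / img_w
    height = (y2 - y1) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# Class index 1 corresponds to 'cat' in names: ['dog', 'cat', 'person']
print(to_yolo_label(1, (100, 150, 200, 250), img_w=400, img_h=500))
```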
Description:
This example demonstrates how to convert a TensorFlow object detection model to TensorFlow Lite (TFLite) for real-time inference on mobile devices.
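import numpy as np
import tensorflow as tf

# Convert a trained TensorFlow model to TensorFlow Lite format
converter = tf.lite.TFLiteConverter.from_saved_model('ssd_finetuned_custom')
tflite_model = converter.convert()

# Save the TFLite model
with open('model.tflite', 'wb') as f:
    f.write(tflite_model)

# Run inference with the lightweight TFLite runtime
# (the Python runtime suits Linux-based edge devices; Android and iOS apps use the native TFLite APIs)
import tflite_runtime.interpreter as tflite
interpreter = tflite.Interpreter(model_path='model.tflite')
interpreter.allocate_tensors()

# Perform inference on new data (a zero-filled placeholder stands in for a preprocessed frame)
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
input_data = np.zeros(input_details[0]['shape'], dtype=input_details[0]['dtype'])
interpreter.set_tensor(input_details[0]['index'], input_data)
interpreter.invoke()
detections = interpreter.get_tensor(output_details[0]['index'])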
Description:
In this example, students will learn how to use ensemble techniques to combine predictions from multiple object detection models to improve overall accuracy.
import numpy as np
# Example predictions from three different models
preds_model1 = np.array([[100, 150, 200, 250], [300, 350, 400, 450]])
preds_model2 = np.array([[105, 155, 205, 255], [295, 345, 395, 445]])
preds_model3 = np.array([[110, 160, 210, 260], [290, 340, 390, 440]])
# Combine predictions (simple averaging; assumes row i of every array refers to the same object)
combined_preds = (preds_model1 + preds_model2 + preds_model3) / 3
# Print the combined bounding boxes
print(f"Ensemble bounding boxes: {combined_preds}")
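Plain averaging treats every model as equally reliable. One common refinement is to weight each box by its confidence score, so that a confident model pulls the fused box toward its own prediction. The sketch below is a simplified illustration of that idea, not a full ensemble method: the function name `weighted_average_box` and the confidence scores are hypothetical (the example above has none), and the boxes are assumed to be already matched to the same object.

```python
def weighted_average_box(boxes, scores):
    """Fuse matched boxes [x1, y1, x2, y2] into one, weighting each by its confidence score."""
    total = sum(scores)
    return tuple(
        sum(box[i] * score for box, score in zip(boxes, scores)) / total
        for i in range(4)
    )


# The first object as predicted by the three models, with made-up confidence scores
boxes = [(100, 150, 200, 250), (105, 155, 205, 255), (110, 160, 210, 260)]
scores = [0.5, 0.25, 0.25]
print(weighted_average_box(boxes, scores))
```

In practice the matching step matters as much as the averaging: boxes from different models are usually grouped by IoU before being fused.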
Description:
This example introduces EfficientDet, a family of efficient object detection models that provide a good trade-off between speed and accuracy, and demonstrates how to train an EfficientDet model on a custom dataset.
# Train EfficientDet on custom dataset using TensorFlow Object Detection API
!python model_main_tf2.py --pipeline_config_path=efficientdet_d0_coco.config --model_dir=training/ --num_train_steps=10000 --sample_1_of_n_eval_examples=1 --alsologtostderr
Description:
This example shows how to deploy a trained object detection model using Intel’s OpenVINO toolkit for optimized inference on edge devices.
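# Convert the model to OpenVINO IR format. Model Optimizer cannot read a PyTorch .pth
# checkpoint directly, so first export the model to ONNX (for example with torch.onnx.export)
!mo.py --input_model faster_rcnn_fpn.onnx --framework onnx --output_dir openvino_model/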
# Load and run the model on edge device
from openvino.inference_engine import IECore
ie = IECore()
model = ie.read_network(model='openvino_model/faster_rcnn_fpn.xml')
exec_net = ie.load_network(network=model, device_name='CPU')
# Perform inference (additional code for loading data and running inference)
Objective: Explore advanced techniques in object detection to improve model performance, speed, and accuracy.
Skills Developed:
Implement multi-scale detection, Feature Pyramid Networks (FPN), and anchor box optimization.
Apply advanced data augmentation techniques for object detection tasks.
Use Non-Maximum Suppression (NMS) and ensemble techniques to refine detection results.
Deploy object detection models on mobile and edge devices using TFLite and OpenVINO.
Tools: PyTorch, TensorFlow, YOLOv5, EfficientDet, OpenVINO, TensorFlow Lite.
By the end of Week 10, students will have a deep understanding of advanced techniques to improve object detection models' accuracy and efficiency, making them capable of deploying models in real-world applications such as mobile and edge devices.