Advanced Applied Deep Learning
Practice Course
Sheng Yun Wu
In Week 7, students are introduced to the fundamentals of object detection and to three popular detection architectures: Faster R-CNN, SSD (Single Shot MultiBox Detector), and YOLO (You Only Look Once). They will learn how object detection differs from classification (a detector must locate each object with a bounding box as well as label it) and how these architectures approach the problem, including the speed constraints of real-time detection.
Description:
This example introduces the concept of object detection using bounding boxes and demonstrates how to draw bounding boxes around objects in an image.
import cv2
import matplotlib.pyplot as plt
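# Load an example image (cv2.imread returns None if the file cannot be read)
image = cv2.imread(r'C:\Users\asd01\tensorflow_datasets\cats_vs_dogs\train\cats\3.jpg')
if image is None:
    raise FileNotFoundError("Could not read the example image; check the path")
# OpenCV loads images in BGR order; convert to RGB for matplotlib
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)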
# Define bounding box coordinates: (startX, startY, endX, endY)
startX, startY, endX, endY = 100, 150, 300, 400
# Draw bounding box on the image
cv2.rectangle(image_rgb, (startX, startY), (endX, endY), (255, 0, 0), 2)
# Display the image
plt.imshow(image_rgb)
plt.axis('off')
plt.show()
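Once boxes are written as (startX, startY, endX, endY) corners, a common way to compare two of them is Intersection over Union (IoU), the overlap area divided by the combined area. The following is a minimal sketch in plain Python; the function name `iou` and the second box are illustrative assumptions, not part of the course material.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (startX, startY, endX, endY) boxes."""
    # Corners of the overlapping rectangle (if any)
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Width and height are clamped at 0 when the boxes do not overlap
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

box = (100, 150, 300, 400)       # the box drawn in the example above
shifted = (200, 150, 400, 400)   # hypothetical second box, shifted right by 100 px
print(iou(box, box))
print(iou(box, shifted))
```

An IoU of 1.0 means the boxes coincide and 0.0 means they do not overlap at all.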
Description:
This example provides a high-level overview of object detection architectures and explains how Faster R-CNN, SSD, and YOLO work differently.
No Code for this example, purely theoretical explanation
Description:
In this example, students will use a pre-trained Faster R-CNN model to perform object detection on an image using PyTorch.
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision import transforms
from PIL import Image
import matplotlib.pyplot as plt
import matplotlib.patches as patches
# Load the pre-trained Faster R-CNN model
# (torchvision >= 0.13; older versions use pretrained=True instead of weights)
model = fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()
# Load the input image as RGB and convert it to a tensor with a batch dimension
image = Image.open(r'C:\Users\asd01\tensorflow_datasets\cats_vs_dogs\train\cats\3.jpg').convert("RGB")
transform = transforms.Compose([transforms.ToTensor()])
input_image = transform(image).unsqueeze(0)
# Perform object detection
with torch.no_grad():
    predictions = model(input_image)
# Show the original image
plt.imshow(image)
plt.axis('off')
plt.show()
# Print bounding boxes for the detected objects
for element in predictions[0]['boxes']:
    print(f'Bounding box: {element}')
# Create a figure and axes
fig, ax = plt.subplots(1)
ax.imshow(image)
# Draw bounding boxes
for box in predictions[0]['boxes']:
    xmin, ymin, xmax, ymax = box.tolist()
    rect = patches.Rectangle((xmin, ymin), xmax - xmin, ymax - ymin, linewidth=2, edgecolor='r', facecolor='none')
    ax.add_patch(rect)
plt.axis('off')
plt.show()
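The Faster R-CNN output for each image is a dictionary with 'boxes', 'labels', and 'scores', and it usually includes many low-confidence boxes. A minimal sketch of filtering by score before drawing is shown here in plain Python, with lists standing in for the tensors; the helper name `filter_detections`, the threshold of 0.5, and the sample values are assumptions for illustration.

```python
def filter_detections(boxes, labels, scores, threshold=0.5):
    """Keep only detections whose confidence score is at least `threshold`."""
    return [
        (box, label, score)
        for box, label, score in zip(boxes, labels, scores)
        if score >= threshold
    ]

# Hypothetical values laid out like predictions[0] (17 and 18 are the COCO ids for cat and dog)
boxes = [[10.0, 20.0, 110.0, 220.0], [15.0, 25.0, 60.0, 80.0], [200.0, 40.0, 380.0, 300.0]]
labels = [17, 18, 17]
scores = [0.98, 0.12, 0.76]

kept = filter_detections(boxes, labels, scores, threshold=0.5)
for box, label, score in kept:
    print(f"label={label} score={score:.2f} box={box}")
```

With real model output, the same helper could be called with `predictions[0]['boxes'].tolist()`, `predictions[0]['labels'].tolist()`, and `predictions[0]['scores'].tolist()`.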
Description:
This example demonstrates how to use a pre-trained SSD model in TensorFlow to detect objects in an image.
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
from PIL import Image
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as viz_utils
# Load a pre-trained SSD model from a local SavedModel directory
model = tf.saved_model.load("ssd_mobilenet_v2_fpnlite_320x320/saved_model")
# Create the category index
category_index = label_map_util.create_category_index_from_labelmap(r"C:\Users\asd01\mscoco_label_map.pbtxt", use_display_name=True)
# Load the image and preprocess it
image_path = r'C:\Users\asd01\tensorflow_datasets\cats_vs_dogs\train\cats\3.jpg'
image = np.array(Image.open(image_path).convert("RGB"))
input_tensor = tf.convert_to_tensor(image)
input_tensor = tf.image.resize(input_tensor, [320, 320])
input_tensor = tf.cast(input_tensor, tf.uint8)
input_tensor = tf.expand_dims(input_tensor, 0)  # add the batch dimension
# Perform object detection
detections = model(input_tensor)
image_with_detections = image.copy()
# Visualize results (boxes are normalized, so they map onto the original image size)
viz_utils.visualize_boxes_and_labels_on_image_array(
    image_with_detections,
    detections['detection_boxes'][0].numpy(),
    detections['detection_classes'][0].numpy().astype(int),
    detections['detection_scores'][0].numpy(),
    category_index,
    use_normalized_coordinates=True,
    line_thickness=5)
plt.imshow(image_with_detections)
plt.axis('off')
plt.show()
Description:
This example demonstrates how to use a pre-trained YOLOv5 model, an architecture designed for real-time detection, to detect objects in an image using PyTorch.
import torch
image_path = r'C:\Users\asd01\tensorflow_datasets\cats_vs_dogs\train\cats\3.jpg'
# Load YOLOv5 model
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)
# Perform object detection on an image
results = model(image_path)
# Display the results
results.show()
# Print detections for the first image: each row is [x1, y1, x2, y2, confidence, class]
print(results.xyxy[0])
Description:
In this example, students will use OpenCV to capture video from a webcam and perform real-time object detection using a pre-trained YOLOv3 model.
import cv2
import numpy as np
# Load YOLO model and configuration files
net = cv2.dnn.readNet("yolov3.weights", "yolov3.cfg")
# Names of the output layers (works regardless of how the OpenCV version indexes layers)
output_layers = net.getUnconnectedOutLayersNames()
# Load the COCO class labels
with open("coco.names", "r") as f:
    classes = [line.strip() for line in f.readlines()]
# Capture video from webcam
cap = cv2.VideoCapture(0)
while True:
    ret, frame = cap.read()
    if not ret:
        # Stop if no frame could be read from the camera
        break
    height, width, channels = frame.shape
    # Prepare the frame for YOLO (0.00392 is about 1/255; True swaps BGR to RGB)
    blob = cv2.dnn.blobFromImage(frame, 0.00392, (416, 416), (0, 0, 0), True, crop=False)
    net.setInput(blob)
    outs = net.forward(output_layers)
    # Processing detections
    for out in outs:
        for detection in out:
            # detection = [center_x, center_y, w, h, objectness, class scores...]
            scores = detection[5:]
            class_id = np.argmax(scores)
            confidence = scores[class_id]
            if confidence > 0.5:
                # Object detected: scale normalized values to pixel coordinates
                center_x = int(detection[0] * width)
                center_y = int(detection[1] * height)
                w = int(detection[2] * width)
                h = int(detection[3] * height)
                # Draw bounding box
                x = int(center_x - w / 2)
                y = int(center_y - h / 2)
                cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
                # Add label
                label = str(classes[class_id])
                cv2.putText(frame, label, (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
    # Display the frame
    cv2.imshow("Real-Time Object Detection", frame)
    # Break the loop if 'q' is pressed
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()
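The loop above draws every box whose confidence exceeds 0.5, so one object can end up with several overlapping boxes. Non-maximum suppression (NMS) is the usual remedy: keep the highest-scoring box and drop others that overlap it heavily. The following is a minimal plain-Python sketch for (x, y, w, h) boxes as used in the loop; the function names, the IoU threshold of 0.4, and the sample boxes are assumptions. In practice OpenCV's `cv2.dnn.NMSBoxes` can do this step.

```python
def iou_xywh(a, b):
    """IoU of two boxes given as (x, y, w, h) with (x, y) the top-left corner."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2 = min(a[0] + a[2], b[0] + b[2])
    iy2 = min(a[1] + a[3], b[1] + b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def non_max_suppression(boxes, scores, iou_threshold=0.4):
    """Return indices of the boxes to keep, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep box i only if it does not overlap too much with any kept box
        if all(iou_xywh(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep

# Hypothetical detections: the first two boxes cover the same object
boxes = [(100, 100, 50, 50), (105, 102, 50, 50), (300, 200, 40, 60)]
scores = [0.9, 0.75, 0.8]
keep = non_max_suppression(boxes, scores)
print(keep)
```

To use this in the webcam loop, the boxes and confidences would be collected into lists for each frame and only the kept indices drawn.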
Description:
In this example, students will learn how to evaluate the performance of an object detection model using the mean Average Precision (mAP) metric.
# Assuming each detection has already been matched against the ground truth
# (for example by IoU), so it can be marked as correct (1) or incorrect (0)
from sklearn.metrics import average_precision_score
true_labels = [1, 0, 0, 1]  # 1 = detection matches a ground-truth object, 0 = false positive
pred_scores = [0.9, 0.3, 0.5, 0.8]  # Model predictions (confidence scores)
# Calculate Average Precision (AP) for a single class;
# mAP is the mean of the AP values over all classes
ap = average_precision_score(true_labels, pred_scores)
print(f"Average Precision: {ap}")
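For detection, AP is usually computed per class from detections ranked by confidence, and mAP is the mean over classes. The following is a minimal sketch under stated assumptions: each detection has already been flagged as a true or false positive (for instance by IoU matching at 0.5), the number of ground-truth objects per class is known, and the area under the precision-recall curve uses all-point interpolation, which is one common convention. The function names and sample data are hypothetical.

```python
def average_precision(scores, is_true_positive, num_ground_truth):
    """Area under the precision-recall curve for one class."""
    if num_ground_truth == 0:
        return 0.0
    # Rank detections by descending confidence
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    precisions, recalls = [], []
    for i in order:
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_ground_truth)
    # Make precision non-increasing as recall grows (interpolation step)
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # Sum rectangle areas between consecutive recall values
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap

def mean_average_precision(per_class):
    """per_class maps class name -> (scores, is_true_positive, num_ground_truth)."""
    aps = [average_precision(*args) for args in per_class.values()]
    return sum(aps) / len(aps)

# Hypothetical matched detections for two classes
per_class = {
    "cat": ([0.9, 0.8, 0.5, 0.3], [True, True, False, False], 2),
    "dog": ([0.9, 0.6, 0.4], [True, False, True], 2),
}
print(f"mAP: {mean_average_precision(per_class):.4f}")
```

Unlike a score computed only over the detections themselves, this version penalizes missed objects, because recall is measured against the number of ground-truth boxes.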
Description:
This example explains the concept of anchor boxes, used in Faster R-CNN and SSD, to handle objects of varying scales and aspect ratios.
No Code for this example, purely theoretical explanation
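Although this example is theoretical, the idea of anchor boxes can be illustrated briefly. The following is a minimal sketch that places one anchor per (scale, aspect ratio) pair at a single location; the function name, the choice of scales and ratios, and the convention that the ratio means width divided by height are assumptions, and real implementations differ in these details.

```python
import math

def generate_anchors(center_x, center_y, scales, aspect_ratios):
    """Return (xmin, ymin, xmax, ymax) anchors centred on one location.

    Each anchor has area scale**2 and width/height equal to the aspect ratio.
    """
    anchors = []
    for scale in scales:
        for ratio in aspect_ratios:
            w = scale * math.sqrt(ratio)
            h = scale / math.sqrt(ratio)
            anchors.append((center_x - w / 2, center_y - h / 2,
                            center_x + w / 2, center_y + h / 2))
    return anchors

# Two scales and three aspect ratios give six anchors at this location
anchors = generate_anchors(160, 160, scales=[64, 128], aspect_ratios=[0.5, 1.0, 2.0])
for a in anchors:
    print(tuple(round(v, 1) for v in a))
```

A detector repeats this at every position of a feature map and then predicts, for each anchor, a class score and an offset that adjusts the anchor toward the object.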
Description:
This example guides students on how to create a custom object detection dataset and annotate it using tools like LabelImg.
No Code for this example, practical tool-based explanation
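LabelImg can save annotations either as Pascal VOC XML (pixel corner coordinates) or as YOLO text files (class id followed by a normalized center, width, and height). As a small illustration of how the two relate, the following sketch converts one corner-format box into a YOLO label line; the function name and the image size are assumptions for illustration.

```python
def voc_to_yolo(box, image_width, image_height, class_id):
    """Convert a (xmin, ymin, xmax, ymax) pixel box to a YOLO label line."""
    xmin, ymin, xmax, ymax = box
    # YOLO stores the box center and size, normalized to the range 0..1
    x_center = (xmin + xmax) / 2 / image_width
    y_center = (ymin + ymax) / 2 / image_height
    width = (xmax - xmin) / image_width
    height = (ymax - ymin) / image_height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

# Hypothetical 640x480 image with one object of class 0
line = voc_to_yolo((100, 150, 300, 400), image_width=640, image_height=480, class_id=0)
print(line)
```

Each image gets one text file with one such line per object, which is the format the YOLOv5 training script expects.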
Description:
In this final example, students will learn how to train a YOLOv5 model on a custom dataset by fine-tuning the model weights.
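# Assuming the custom dataset is prepared and labeled in YOLO format
# Train YOLOv5 on the custom dataset (run from the root of the cloned yolov5 repository;
# the leading "!" is notebook syntax and is omitted in a regular shell)
!python train.py --img 640 --batch 16 --epochs 30 --data custom_data.yaml --weights yolov5s.pt
# Training progress and results are saved under runs/train/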
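The `--data custom_data.yaml` argument points to a small dataset description file. The following sketch builds one as text so its structure is visible; the directory layout, the class names, and the helper name are hypothetical, and the exact keys accepted can vary between YOLOv5 versions (this follows the `path`, `train`, `val`, `nc`, `names` layout).

```python
def build_dataset_yaml(root, class_names):
    """Return the text of a YOLOv5-style dataset description file."""
    lines = [
        f"path: {root}",            # dataset root directory
        "train: images/train",      # training images, relative to path
        "val: images/val",          # validation images, relative to path
        f"nc: {len(class_names)}",  # number of classes
        f"names: {class_names}",    # class names, indexed by class id
    ]
    return "\n".join(lines) + "\n"

# Hypothetical two-class dataset
yaml_text = build_dataset_yaml("../datasets/custom", ["cat", "dog"])
print(yaml_text)
# To use it, save the text as custom_data.yaml next to train.py
```

The class ids in the label files must match the positions in `names`, so class 0 is the first name in the list.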
Objective: Introduce object detection and understand popular architectures such as Faster R-CNN, SSD, and YOLO.
Skills Developed:
Understand object detection concepts like bounding boxes, anchor boxes, and confidence scores.
Use pre-trained object detection models (Faster R-CNN, SSD, YOLO) to detect objects in images and videos.
Implement real-time object detection using YOLO and OpenCV.
Evaluate object detection performance using mAP.
Tools: PyTorch, TensorFlow, YOLO, OpenCV, COCO dataset.
These 10 examples in Week 7 give students hands-on practice with object detection tasks and models. Students learn how to use pre-trained models, implement real-time object detection, evaluate results, and fine-tune YOLOv5 on a custom dataset, setting the stage for more advanced topics in subsequent weeks.