Advanced Applied Deep Learning

Practice Course

Sheng Yun Wu

Week 7: Object Detection Fundamentals and Architectures

In Week 7, students are introduced to the fundamentals of object detection and to popular detection architectures: Faster R-CNN, SSD (Single Shot MultiBox Detector), and YOLO (You Only Look Once). They will learn how object detection differs from classification (a detector must localize each object with a bounding box in addition to assigning it a class label) and how these architectures trade accuracy against speed, with single-stage detectors such as SSD and YOLO targeting real-time use.

Example 1: Introduction to Object Detection with Bounding Boxes

Description:
This example introduces the concept of object detection using bounding boxes and demonstrates how to draw bounding boxes around objects in an image.

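import cv2
import matplotlib.pyplot as plt

# Load an example image (cv2.imread returns None if the file cannot be read)
image = cv2.imread(r'C:\Users\asd01\tensorflow_datasets\cats_vs_dogs\train\cats\3.jpg')
if image is None:
    raise FileNotFoundError("Could not read the image; check the path.")

# OpenCV loads images in BGR order; convert to RGB for matplotlib
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# Define bounding box coordinates in pixels: (startX, startY, endX, endY)
startX, startY, endX, endY = 100, 150, 300, 400

# Draw the bounding box on the image (red in RGB order, 2 px thick)
cv2.rectangle(image_rgb, (startX, startY), (endX, endY), (255, 0, 0), 2)

# Display the image
plt.imshow(image_rgb)
plt.axis('off')
plt.show()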

Example 2: Understanding Object Detection Architectures (Faster R-CNN, SSD, YOLO)

Description:
This example provides a high-level overview of object detection architectures and explains how Faster R-CNN, SSD, and YOLO work differently.

No code for this example; the explanation is purely theoretical.

Example 3: Loading Pre-trained Faster R-CNN Model (PyTorch)

Description:
In this example, students will use a pre-trained Faster R-CNN model to perform object detection on an image using PyTorch.

import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision import transforms
from PIL import Image
import matplotlib.pyplot as plt
import matplotlib.patches as patches

# Load the pre-trained Faster R-CNN model.
# torchvision >= 0.13 uses the `weights` argument; older releases use pretrained=True.
model = fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

# Load the input image as RGB and convert it to a [1, 3, H, W] float tensor in [0, 1]
image = Image.open(r'C:\Users\asd01\tensorflow_datasets\cats_vs_dogs\train\cats\3.jpg').convert("RGB")
transform = transforms.Compose([transforms.ToTensor()])
input_image = transform(image).unsqueeze(0)

# Perform object detection (gradients are not needed for inference)
with torch.no_grad():
    predictions = model(input_image)

# Show the original image
plt.imshow(image)
plt.axis('off')
plt.show()

# The model returns one dict per input image with 'boxes', 'labels', and 'scores'
boxes = predictions[0]['boxes']
labels = predictions[0]['labels']
scores = predictions[0]['scores']

# Print bounding boxes, labels, and scores for the detected objects
for box, label, score in zip(boxes, labels, scores):
    print(f'Bounding box: {box.tolist()}, label: {label.item()}, score: {score.item():.2f}')

# Create a figure and axes
fig, ax = plt.subplots(1)
ax.imshow(image)

# Draw bounding boxes for confident detections only
score_threshold = 0.5  # illustrative cutoff; adjust as needed
for box, score in zip(boxes, scores):
    if score < score_threshold:
        continue
    xmin, ymin, xmax, ymax = box.tolist()
    rect = patches.Rectangle((xmin, ymin), xmax - xmin, ymax - ymin, linewidth=2, edgecolor='r', facecolor='none')
    ax.add_patch(rect)

plt.axis('off')
plt.show()

Example 4: Using Pre-trained SSD Model (TensorFlow)

Description:
This example demonstrates how to use a pre-trained SSD model in TensorFlow to detect objects in an image.

import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
from PIL import Image
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as viz_utils

# Load a pre-trained SSD model from a local SavedModel directory
# (the model must be downloaded and extracted beforehand)
model = tf.saved_model.load("ssd_mobilenet_v2_fpnlite_320x320/saved_model")

# Create the category index that maps class IDs to display names
category_index = label_map_util.create_category_index_from_labelmap(r"C:\Users\asd01\mscoco_label_map.pbtxt", use_display_name=True)

# Load the image as an RGB uint8 array
image_path = r'C:\Users\asd01\tensorflow_datasets\cats_vs_dogs\train\cats\3.jpg'
image = np.array(Image.open(image_path).convert("RGB"))

# Preprocess: resize to 320x320, cast back to uint8, and add a batch dimension.
# The boxes are returned in normalized coordinates, so they still map onto the original image.
input_tensor = tf.convert_to_tensor(image)
input_tensor = tf.image.resize(input_tensor, [320, 320])
input_tensor = tf.cast(input_tensor, tf.uint8)
input_tensor = tf.expand_dims(input_tensor, 0)

# Perform object detection
detections = model(input_tensor)

# Visualize results on a copy of the original image
image_with_detections = image.copy()
viz_utils.visualize_boxes_and_labels_on_image_array(
    image_with_detections,
    detections['detection_boxes'][0].numpy(),
    detections['detection_classes'][0].numpy().astype(int),
    detections['detection_scores'][0].numpy(),
    category_index,
    use_normalized_coordinates=True,
    line_thickness=5)

plt.imshow(image_with_detections)
plt.axis('off')
plt.show()
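
The SSD example relies on `use_normalized_coordinates=True` because the TensorFlow Object Detection API returns each box as `[ymin, xmin, ymax, xmax]` with values between 0 and 1. As a minimal sketch of what that conversion involves, the helper below maps one such box to pixel coordinates. The function name `normalized_to_pixels` and the 640x480 image size are hypothetical and only serve as illustration.

```python
def normalized_to_pixels(box, image_width, image_height):
    """Convert a normalized [ymin, xmin, ymax, xmax] box to pixel (xmin, ymin, xmax, ymax)."""
    ymin, xmin, ymax, xmax = box
    return (
        round(xmin * image_width),
        round(ymin * image_height),
        round(xmax * image_width),
        round(ymax * image_height),
    )


# Hypothetical detection on a 640x480 image
pixel_box = normalized_to_pixels([0.25, 0.1, 0.75, 0.9], image_width=640, image_height=480)
print(pixel_box)
```

Note the axis order: the normalized box lists y before x, while drawing functions such as `cv2.rectangle` expect x before y.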

Example 5: Implementing Object Detection with YOLO (YOLOv5 in PyTorch)

Description:
This example demonstrates how to use the YOLOv5 model, which is designed for real-time detection, to detect objects in an image using PyTorch.

import torch

image_path = r'C:\Users\asd01\tensorflow_datasets\cats_vs_dogs\train\cats\3.jpg'

# Load the YOLOv5 small model with pre-trained weights via PyTorch Hub
# (the first call downloads the repository and weights, so it needs internet access)
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)

# Perform object detection on an image
results = model(image_path)

# Display the results
results.show()

# Print detections for the first image: one row per object,
# with columns (x1, y1, x2, y2, confidence, class)
print(results.xyxy[0])

Example 6: Real-Time Object Detection with OpenCV and YOLO

Description:
In this example, students will use OpenCV to capture video from a webcam and perform real-time object detection using a pre-trained YOLOv3 model.

import cv2
import numpy as np

# Load YOLO model and configuration files
net = cv2.dnn.readNet("yolov3.weights", "yolov3.cfg")

# Names of the output layers. This helper avoids the indexing differences
# of getUnconnectedOutLayers() across OpenCV versions.
output_layers = net.getUnconnectedOutLayersNames()

# Load the COCO class labels
with open("coco.names", "r") as f:
    classes = [line.strip() for line in f.readlines()]

# Capture video from webcam
cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    if not ret:
        break  # Stop if no frame could be read from the webcam

    height, width, channels = frame.shape

    # Prepare the frame for YOLO: scale pixels to [0, 1], resize to 416x416, swap BGR to RGB
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), (0, 0, 0), swapRB=True, crop=False)
    net.setInput(blob)
    outs = net.forward(output_layers)

    # Processing detections
    for out in outs:
        for detection in out:
            # Each row is (center_x, center_y, w, h, objectness, class scores...)
            scores = detection[5:]
            class_id = np.argmax(scores)
            confidence = scores[class_id]

            if confidence > 0.5:
                # Object detected: scale the normalized coordinates to pixels
                center_x = int(detection[0] * width)
                center_y = int(detection[1] * height)
                w = int(detection[2] * width)
                h = int(detection[3] * height)

                # Draw bounding box (convert the center to the top-left corner)
                x = int(center_x - w / 2)
                y = int(center_y - h / 2)
                cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)

                # Add label
                label = str(classes[class_id])
                cv2.putText(frame, label, (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)

    # Display the frame
    cv2.imshow("Real-Time Object Detection", frame)

    # Break the loop if 'q' is pressed
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
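
The loop in Example 6 draws every detection whose confidence exceeds 0.5, so a single object often receives several overlapping boxes. Detection pipelines usually apply non-maximum suppression (NMS) to keep only the best box per object. The following is a minimal, class-agnostic sketch in plain Python, assuming boxes are collected as `(x, y, w, h)` tuples as in the loop above. The function names, the sample boxes, and the 0.4 IoU threshold are illustrative assumptions; OpenCV also provides `cv2.dnn.NMSBoxes` for the same purpose.

```python
def iou_xywh(a, b):
    """IoU of two boxes given as (x, y, w, h), where (x, y) is the top-left corner."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    inter_w = max(0, min(ax2, bx2) - max(a[0], b[0]))
    inter_h = max(0, min(ay2, by2) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.4):
    """Greedy NMS: visit boxes by descending score and keep a box only if it
    does not overlap an already kept box by more than iou_threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou_xywh(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two overlapping boxes on one object plus one separate box (made-up values)
boxes = [(100, 100, 50, 50), (105, 102, 50, 50), (300, 300, 40, 40)]
scores = [0.9, 0.75, 0.6]
kept_indices = non_max_suppression(boxes, scores)
print(kept_indices)
```

In the webcam loop, this would mean collecting the boxes and confidences for a frame first, and drawing only the indices returned by the suppression step.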

Example 7: Evaluating Object Detection Models (Faster R-CNN)

Description:
In this example, students will learn how to evaluate the performance of an object detection model using the mean Average Precision (mAP) metric.

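# Assuming predictions are stored in the following format: [boxes, labels, scores]
# Example: boxes = [[x1, y1, x2, y2], [x1, y1, x2, y2], ...]
from sklearn.metrics import average_precision_score

# Simplified single-class illustration: average_precision_score treats the task as
# binary classification. For detection, a label of 1 would typically mean that the
# detection matched a ground-truth box and 0 that it did not.
true_labels = [1, 0, 0, 1] # Ground truth labels
pred_scores = [0.9, 0.3, 0.5, 0.8] # Model predictions (confidence scores)
pred_labels = [1, 0, 0, 1] # Predicted labels (not used by average_precision_score)

# Calculate Average Precision (AP) for this class.
# mAP is the mean of the per-class AP values; with a single class, AP and mAP coincide.
average_precision = average_precision_score(true_labels, pred_scores)
print(f"Average Precision: {average_precision}")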
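
In Example 7 the 1/0 labels are given directly. For a detector they are normally derived by comparing each predicted box with the ground-truth boxes using Intersection over Union (IoU). The sketch below shows one simplified way to do this for a single class and a single image, assuming boxes in `[x1, y1, x2, y2]` format. The function names, the sample boxes, and the 0.5 IoU threshold are assumptions for illustration; benchmark protocols such as Pascal VOC and COCO differ in their details.

```python
def iou(box_a, box_b):
    """Intersection over Union for two boxes in [x1, y1, x2, y2] format."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_detections(pred_boxes, pred_scores, gt_boxes, iou_threshold=0.5):
    """Mark each prediction as correct (1) or incorrect (0).

    Predictions are visited from highest to lowest score, and each
    ground-truth box can be matched by at most one prediction.
    """
    order = sorted(range(len(pred_boxes)), key=lambda i: pred_scores[i], reverse=True)
    matched = set()
    labels = [0] * len(pred_boxes)
    for i in order:
        best_iou, best_gt = 0.0, None
        for g, gt_box in enumerate(gt_boxes):
            if g in matched:
                continue
            overlap = iou(pred_boxes[i], gt_box)
            if overlap > best_iou:
                best_iou, best_gt = overlap, g
        if best_gt is not None and best_iou >= iou_threshold:
            labels[i] = 1
            matched.add(best_gt)
    return labels


# Made-up ground truth and predictions for one image
gt_boxes = [[50, 50, 150, 150], [200, 200, 300, 300]]
pred_boxes = [[55, 60, 150, 160], [52, 48, 148, 152], [400, 400, 450, 450], [210, 190, 310, 290]]
pred_scores = [0.9, 0.3, 0.5, 0.8]
detection_labels = label_detections(pred_boxes, pred_scores, gt_boxes)
print(detection_labels)
```

The second prediction overlaps the first ground-truth box well, but that box has already been claimed by a higher-scoring prediction, so the duplicate counts as incorrect. A complete detection AP also accounts for ground-truth objects that no prediction matched, which this sketch does not cover.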

Example 8: Understanding the Role of Anchor Boxes in Object Detection

Description:
This example explains the concept of anchor boxes, used in Faster R-CNN and SSD, to handle objects of varying scales and aspect ratios.

No code for this example; the explanation is purely theoretical.
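
Although this example is discussed without code in the course, a short sketch may help make the idea concrete. It generates the anchor boxes for a single feature-map location from a set of scales and aspect ratios. The function name `generate_anchors`, the chosen scales and ratios, and the convention that the aspect ratio means width divided by height are all assumptions for illustration; libraries differ on these details.

```python
import math


def generate_anchors(center_x, center_y, scales, aspect_ratios):
    """Return anchors as (x1, y1, x2, y2) tuples centered on one location.

    Each anchor has area scale**2 and a width/height ratio equal to aspect_ratio.
    """
    anchors = []
    for scale in scales:
        for ratio in aspect_ratios:
            width = scale * math.sqrt(ratio)
            height = scale / math.sqrt(ratio)
            anchors.append((
                center_x - width / 2,
                center_y - height / 2,
                center_x + width / 2,
                center_y + height / 2,
            ))
    return anchors


# Two scales x three aspect ratios = six anchors at location (160, 160)
anchors = generate_anchors(160, 160, scales=[64, 128], aspect_ratios=[0.5, 1.0, 2.0])
for anchor in anchors:
    print([round(value, 1) for value in anchor])
```

A detector repeats this at every feature-map location and then predicts, for each anchor, a class score and small offsets that adjust the anchor to fit the object.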

Example 9: Creating a Custom Object Detection Dataset

Description:
This example guides students on how to create a custom object detection dataset and annotate it using tools like LabelImg.

No code for this example; the explanation is practical and tool-based.
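
LabelImg can save annotations as Pascal VOC XML (pixel corner coordinates) or directly in YOLO text format (class ID followed by a normalized center, width, and height). As a minimal sketch of how the two relate, the helper below converts one pixel-coordinate box into a YOLO label line. The function name `voc_to_yolo_line`, the class ID, and the 500x500 image size are hypothetical.

```python
def voc_to_yolo_line(class_id, xmin, ymin, xmax, ymax, image_width, image_height):
    """Convert a pixel-coordinate corner box to a YOLO label line.

    YOLO format: "<class_id> <center_x> <center_y> <width> <height>",
    with all four box values normalized to the range 0..1.
    """
    center_x = (xmin + xmax) / 2 / image_width
    center_y = (ymin + ymax) / 2 / image_height
    width = (xmax - xmin) / image_width
    height = (ymax - ymin) / image_height
    return f"{class_id} {center_x:.6f} {center_y:.6f} {width:.6f} {height:.6f}"


# The box from Example 1, assuming class ID 0 and a 500x500 image
yolo_line = voc_to_yolo_line(0, 100, 150, 300, 400, image_width=500, image_height=500)
print(yolo_line)
```

Each image gets one text file with one such line per annotated object.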

Example 10: Training a Custom Object Detection Model with YOLOv5

Description:
In this final example, students will learn how to train a YOLOv5 model on a custom dataset by fine-tuning the model weights.

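# Assuming custom dataset is prepared and labeled in YOLO format

# Train YOLOv5 on a custom dataset. Run this from inside the cloned YOLOv5 repository;
# the leading "!" is notebook syntax for running a shell command.
!python train.py --img 640 --batch 16 --epochs 30 --data custom_data.yaml --weights yolov5s.pt

# Visualize training progress and results: by default, YOLOv5 saves the training
# curves and the trained weights under a runs/train/exp* directory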
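
The training command above expects `custom_data.yaml` to describe where the images are and which classes exist. The following sketch writes such a file from Python. The directory layout, the class names, and the helper name `build_dataset_yaml` are hypothetical, and the keys shown (`path`, `train`, `val`, `nc`, `names`) reflect the commonly used YOLOv5 dataset format, so check them against the YOLOv5 version you are using.

```python
from pathlib import Path


def build_dataset_yaml(root, train_dir, val_dir, class_names):
    """Return the text of a YOLOv5-style dataset description."""
    lines = [
        f"path: {root}",            # dataset root directory
        f"train: {train_dir}",      # training images, relative to path
        f"val: {val_dir}",          # validation images, relative to path
        f"nc: {len(class_names)}",  # number of classes
        f"names: {class_names}",    # class names; list index = class ID
    ]
    return "\n".join(lines) + "\n"


# Hypothetical layout and classes; replace with your own dataset
yaml_text = build_dataset_yaml("../datasets/custom", "images/train", "images/val", ["cat", "dog"])
Path("custom_data.yaml").write_text(yaml_text)
print(yaml_text)
```

The class IDs used in the YOLO label files must correspond to the positions in `names`.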

Week 7 Summary

Objective: Introduce object detection and understand popular architectures such as Faster R-CNN, SSD, and YOLO.
Skills Developed:
- Understand object detection concepts like bounding boxes, anchor boxes, and confidence scores.
- Use pre-trained object detection models (Faster R-CNN, SSD, YOLO) to detect objects in images and videos.
- Implement real-time object detection using YOLO and OpenCV.
- Evaluate object detection performance using mAP.
Tools: PyTorch, TensorFlow, YOLO, OpenCV, COCO dataset.

These 10 examples in Week 7 provide students with a hands-on understanding of object detection tasks and models. Students will learn how to use pre-trained models, implement real-time object detection, and fine-tune YOLOv5 on a custom dataset, setting the stage for more advanced topics in subsequent weeks.
