Advanced Applied Deep Learning
Lecture Course
Sheng Yun Wu
Objective:
To introduce students to object tracking techniques combined with real-time object detection. Students will learn how to implement object detection and tracking systems, focusing on algorithms like SORT (Simple Online and Realtime Tracking) and DeepSORT. By the end of the week, students will understand how to build real-time systems that detect and track objects across video frames.
Lecture 1: Object Tracking Fundamentals
12.1 What is Object Tracking?
Definition:
Object tracking is the process of following a detected object as it moves across multiple frames in a video sequence. The goal is to maintain consistent identification of objects across time.
Difference Between Object Detection and Object Tracking:
Object Detection: Identifies and locates objects in individual frames but does not maintain continuity of the objects across frames.
Object Tracking: Extends object detection by assigning an ID to each detected object and tracking its movement across frames, maintaining consistent object identities over time.
Applications of Object Tracking:
Video surveillance (tracking people or vehicles).
Autonomous driving (tracking pedestrians and other vehicles).
Sports analytics (tracking players or equipment).
Robotics (tracking objects for manipulation and interaction).
Lecture 2: Tracking-by-Detection Approach
12.2 Tracking-by-Detection:
How it Works:
In tracking-by-detection, object detection is performed on each frame of a video sequence, and an object tracking algorithm assigns consistent IDs to detected objects over time.
It combines the strengths of object detection (accurate identification of objects in each frame) with tracking (maintaining object identities across frames).
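The idea above can be sketched as a minimal tracker that reuses an ID whenever a detection overlaps a box from the previous frame. This is an illustrative sketch only, not SORT: the class name GreedyIouTracker, the (x1, y1, x2, y2) box format, and the 0.3 IoU threshold are assumptions, and the detections are hand-written rather than produced by a detector.

```python
from itertools import count


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


class GreedyIouTracker:
    """Hypothetical minimal tracker: matches each detection to the best-overlapping track."""

    def __init__(self, iou_threshold=0.3):
        self.iou_threshold = iou_threshold  # assumed value, tune per dataset
        self.tracks = {}                    # track ID -> box from the previous frame
        self._next_id = count(1)

    def update(self, detections):
        """Return {track ID: box} for the detections of the current frame."""
        assigned = {}
        free_ids = set(self.tracks)
        for det in detections:
            best_id, best_iou = None, self.iou_threshold
            for tid in free_ids:
                overlap = iou(self.tracks[tid], det)
                if overlap >= best_iou:
                    best_id, best_iou = tid, overlap
            if best_id is None:
                best_id = next(self._next_id)   # no overlap: start a new track
            else:
                free_ids.discard(best_id)       # each track is used at most once
            assigned[best_id] = det
        self.tracks = assigned                  # tracks with no detection are dropped
        return assigned


tracker = GreedyIouTracker()
frame1 = tracker.update([(0, 0, 10, 10), (50, 50, 60, 60)])
frame2 = tracker.update([(51, 50, 61, 60), (1, 0, 11, 10)])
print(frame1)
print(frame2)
```

Because this sketch drops a track as soon as it misses a single frame and matches greedily, it loses identities under occlusion, which is the kind of failure listed in the challenges that follow.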
Challenges in Tracking-by-Detection:
Occlusion: Objects may become partially or fully occluded by other objects, making tracking more difficult.
Re-identification: When objects leave and re-enter the frame, the tracking system needs to correctly re-identify them.
Appearance Variations: Changes in lighting, scale, or orientation of objects across frames can make tracking challenging.
12.3 Kalman Filters for Object Tracking:
What is a Kalman Filter?
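A Kalman filter is a recursive algorithm that estimates the state of an object (for example, its position and velocity) from a sequence of noisy measurements, and uses a motion model to predict where the object will be in the next frame.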
It combines a prediction model (based on motion) and a measurement model (from detections) to update the estimated position of the object in each frame.
How Kalman Filters Work in Tracking:
The Kalman filter predicts the next position of an object in the video frame, and when the new position is detected, the prediction is corrected based on the detection.
This helps in maintaining object consistency even when objects are temporarily occluded or move rapidly between frames.
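The predict and correct cycle described above can be sketched with a constant-velocity model. This is a minimal sketch under stated assumptions: the state is a 2-D point [x, y, vx, vy] rather than a full bounding box, and the class name ConstantVelocityKalman and the noise values (process_var, meas_var, the initial covariance) are illustrative choices, not values from the course.

```python
import numpy as np


class ConstantVelocityKalman:
    """Tracks a 2-D point with state [x, y, vx, vy]; only (x, y) is measured."""

    def __init__(self, x, y, dt=1.0, process_var=1e-2, meas_var=1.0):
        self.x = np.array([x, y, 0.0, 0.0])   # state estimate, velocity unknown at start
        self.P = np.eye(4) * 10.0             # state covariance (assumed initial uncertainty)
        self.F = np.eye(4)                    # motion model: position += velocity * dt
        self.F[0, 2] = self.F[1, 3] = dt
        self.H = np.eye(2, 4)                 # measurement model: observe x and y only
        self.Q = np.eye(4) * process_var      # process noise (assumed)
        self.R = np.eye(2) * meas_var         # measurement noise (assumed)

    def predict(self):
        """Advance the state one frame and return the predicted (x, y)."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, z):
        """Correct the prediction with a detected position z = (x, y)."""
        innovation = np.asarray(z, dtype=float) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R        # innovation covariance
        K = self.P @ self.H.T @ np.linalg.inv(S)       # Kalman gain
        self.x = self.x + K @ innovation
        self.P = (np.eye(4) - K @ self.H) @ self.P


# Object moves 2 px/frame in x and 1 px/frame in y.
kf = ConstantVelocityKalman(0.0, 0.0)
for t in range(1, 20):
    kf.predict()
    kf.update((2.0 * t, 1.0 * t))

# Three frames with no detection (occlusion): only predict is called.
for _ in range(3):
    predicted = kf.predict()
print(predicted)
```

During the three frames without a detection, only predict is called, so the estimate keeps moving along the learned velocity; this is what lets a track survive a short occlusion.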
Lecture 3: Real-Time Object Tracking Algorithms – SORT and DeepSORT
12.4 SORT (Simple Online and Realtime Tracking):
Overview of SORT:
SORT is a fast and simple tracking algorithm that builds on the tracking-by-detection paradigm.
It uses Kalman filters to predict object trajectories and the Hungarian algorithm for data association (matching detected objects with predicted positions), with the overlap (IoU) between predicted and detected bounding boxes as the matching cost.
SORT Workflow:
Object Detection: Detect objects in each frame (e.g., using YOLO or SSD).
Prediction (Kalman Filter): Predict the future position of each tracked object using the Kalman filter.
Data Association (Hungarian Algorithm): Match the predicted positions with the detected objects using the Hungarian algorithm.
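Update: Correct the Kalman filter state of each matched track with its new detection, start new tracks for unmatched detections, and delete tracks that have gone unmatched for too long.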
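The data association step of this workflow can be sketched as follows. This is a hedged sketch, not the reference SORT code: it assumes boxes in (x1, y1, x2, y2) format, an IoU threshold of 0.3, and SciPy's linear_sum_assignment as the Hungarian-style solver. The function names iou_matrix and associate are illustrative.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def iou_matrix(tracks, detections):
    """Pairwise IoU between predicted track boxes and detections, shape (T, D)."""
    t = np.asarray(tracks, dtype=float)[:, None, :]      # (T, 1, 4)
    d = np.asarray(detections, dtype=float)[None, :, :]  # (1, D, 4)
    inter_w = np.clip(np.minimum(t[..., 2], d[..., 2]) - np.maximum(t[..., 0], d[..., 0]), 0, None)
    inter_h = np.clip(np.minimum(t[..., 3], d[..., 3]) - np.maximum(t[..., 1], d[..., 1]), 0, None)
    inter = inter_w * inter_h
    area_t = (t[..., 2] - t[..., 0]) * (t[..., 3] - t[..., 1])
    area_d = (d[..., 2] - d[..., 0]) * (d[..., 3] - d[..., 1])
    return inter / np.maximum(area_t + area_d - inter, 1e-9)


def associate(tracks, detections, iou_threshold=0.3):
    """Return (matches, unmatched track indices, unmatched detection indices)."""
    if len(tracks) == 0 or len(detections) == 0:
        return [], list(range(len(tracks))), list(range(len(detections)))
    iou = iou_matrix(tracks, detections)
    rows, cols = linear_sum_assignment(-iou)   # negate: the solver minimises cost
    # Reject assignments whose overlap is too small to be the same object.
    matches = [(int(r), int(c)) for r, c in zip(rows, cols) if iou[r, c] >= iou_threshold]
    matched_t = {r for r, _ in matches}
    matched_d = {c for _, c in matches}
    unmatched_tracks = [i for i in range(len(tracks)) if i not in matched_t]
    unmatched_dets = [j for j in range(len(detections)) if j not in matched_d]
    return matches, unmatched_tracks, unmatched_dets


predicted = [(0, 0, 10, 10), (50, 50, 60, 60)]                 # Kalman predictions
detected = [(52, 50, 62, 60), (1, 1, 11, 11), (200, 200, 210, 210)]
print(associate(predicted, detected))
```

Matched pairs would feed the Kalman update, unmatched detections would start new tracks, and unmatched tracks would age until they are deleted.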
Advantages of SORT:
Speed: SORT is very fast and can be easily integrated with real-time object detection models.
Efficiency: SORT works well for simple tracking tasks where there are no complex interactions between objects.
Limitations of SORT:
No Re-identification: If an object leaves the frame and returns, SORT assigns it a new ID, because it keeps no appearance information with which to recognize the object.
Handling Occlusion: SORT struggles when objects are occluded for long periods or when there are many overlapping objects.
12.5 DeepSORT (Simple Online and Realtime Tracking with a Deep Association Metric):
Overview of DeepSORT:
DeepSORT is an extension of SORT that adds deep learning-based appearance models to handle re-identification and occlusion challenges. It improves tracking accuracy by considering both motion and appearance information.
DeepSORT Workflow:
Object Detection: Detect objects in each frame.
Prediction (Kalman Filter): Predict the next position of each tracked object using a Kalman filter.
Appearance Feature Extraction: Extract appearance features from the detected objects using a deep CNN (e.g., a pre-trained ResNet).
Data Association (Hungarian Algorithm): Match detected objects with existing tracks using both motion (Mahalanobis distance between the Kalman-predicted state and each detection) and appearance (cosine distance between CNN feature vectors).
Re-identification: Re-identify objects based on their appearance when they re-enter the frame or after being occluded.
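The way motion and appearance are combined in this workflow can be sketched as a gated cost matrix. This is a simplified sketch under stated assumptions, not the DeepSORT implementation: a plain Euclidean distance between box centres stands in for the Mahalanobis gate, the embeddings are made-up 3-D vectors rather than CNN outputs, and the thresholds max_cosine and max_centre_dist are illustrative values.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

LARGE_COST = 1e5  # marks pairs that are not allowed to match


def cosine_distance(a, b):
    """Pairwise cosine distance between rows of a (T, F) and b (D, F)."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return 1.0 - a @ b.T


def associate(track_feats, det_feats, track_centres, det_centres,
              max_cosine=0.3, max_centre_dist=50.0):
    """Match tracks to detections by appearance, gated by motion plausibility."""
    track_feats = np.asarray(track_feats, dtype=float)
    det_feats = np.asarray(det_feats, dtype=float)
    track_centres = np.asarray(track_centres, dtype=float)
    det_centres = np.asarray(det_centres, dtype=float)

    appearance = cosine_distance(track_feats, det_feats)
    # Stand-in for the motion gate: distance from predicted centre to detection centre.
    motion = np.linalg.norm(track_centres[:, None, :] - det_centres[None, :, :], axis=2)

    cost = appearance.copy()
    cost[(appearance > max_cosine) | (motion > max_centre_dist)] = LARGE_COST
    rows, cols = linear_sum_assignment(cost)
    return [(int(r), int(c)) for r, c in zip(rows, cols) if cost[r, c] < LARGE_COST]


track_feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
track_centres = [[10.0, 10.0], [100.0, 100.0]]
det_feats = [[0.1, 0.99, 0.0], [0.98, 0.05, 0.0], [1.0, 0.0, 0.0]]
det_centres = [[105.0, 98.0], [14.0, 12.0], [400.0, 400.0]]
print(associate(track_feats, det_feats, track_centres, det_centres))
```

The third detection looks identical to the first track but is too far from its predicted position, so the gate rejects it. Re-identification after a long occlusion would need a looser motion gate, since the Kalman prediction becomes less reliable the longer a track goes unobserved.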
Advantages of DeepSORT:
Appearance-based Tracking: DeepSORT can track objects even when they are temporarily occluded or re-enter the frame by comparing their appearance features.
Improved Accuracy: By combining motion and appearance information, DeepSORT achieves better tracking accuracy, especially in crowded scenes.
Limitations of DeepSORT:
Computational Cost: DeepSORT requires more computational resources than SORT due to the deep learning-based appearance model.
Practical Session: Implementing Real-Time Object Detection and Tracking
Objective: Implement a real-time object detection and tracking system using YOLO or SSD for object detection and SORT or DeepSORT for tracking.
Dataset: Use a video dataset or live video feed (e.g., surveillance footage, autonomous driving footage, sports video).
Key Steps:
Step 1: Perform Object Detection
Use a pre-trained object detection model (e.g., YOLO or SSD) to detect objects in each frame of the video.
Step 2: Implement SORT for Tracking
Implement the SORT algorithm for tracking detected objects across frames.
Use Kalman filters to predict the future positions of objects, and apply the Hungarian algorithm for data association.
Step 3: Implement DeepSORT for Appearance-based Tracking (Optional)
Implement the DeepSORT algorithm to improve tracking accuracy using both motion and appearance features.
Use a pre-trained CNN (e.g., ResNet) to extract appearance features for re-identifying objects after occlusion.
Step 4: Evaluate Tracking Performance
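Evaluate the performance of the tracking system using metrics like MOTA (Multiple Object Tracking Accuracy), which penalizes false negatives, false positives, and ID switches relative to the number of ground-truth objects, and ID Switches (the number of times a tracked object’s assigned ID changes).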
Compare the performance of SORT and DeepSORT in terms of speed and accuracy.
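As a reference for the evaluation step, MOTA is commonly defined as 1 - (FN + FP + IDSW) / GT, with all four quantities summed over every frame of the sequence. The sketch below assumes the per-frame counts are already available, for example from matching tracker output against ground-truth annotations. The function name mota and the numbers in the usage lines are illustrative, not measured results.

```python
def mota(false_negatives, false_positives, id_switches, num_ground_truth):
    """Compute MOTA from per-frame counts (each argument is a sequence, one entry per frame)."""
    total_gt = sum(num_ground_truth)
    if total_gt == 0:
        raise ValueError("MOTA is undefined when there are no ground-truth objects")
    errors = sum(false_negatives) + sum(false_positives) + sum(id_switches)
    return 1.0 - errors / total_gt


# Made-up counts for a three-frame clip with 10 ground-truth objects per frame.
score = mota(
    false_negatives=[2, 1, 0],
    false_positives=[1, 0, 0],
    id_switches=[0, 1, 0],
    num_ground_truth=[10, 10, 10],
)
print(f"MOTA = {score:.3f}")
```

MOTA can be negative when the tracker makes more errors than there are ground-truth objects, so it should be reported alongside the raw ID switch count rather than instead of it.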
Assignment for Week 12:
Coding Assignment:
Implement a real-time object detection and tracking system using SORT or DeepSORT.
Experiment with different object detection models (e.g., YOLO, SSD) for detection and compare their performance in terms of detection and tracking accuracy.
Analyze how the tracking system handles occlusion and re-identification.
Analysis:
Compare the performance of SORT and DeepSORT in terms of tracking accuracy, re-identification, and computational speed.
Analyze how the Kalman filter and appearance-based re-identification improve the tracking of objects across video frames.
Reading Assignment:
Read Chapter 13 of "Advanced Applied Deep Learning" by Umberto Michelucci.
Focus on understanding the combination of object detection and tracking, and how different tracking algorithms like SORT and DeepSORT are implemented in real-time systems.
Summary of Key Concepts:
Object Tracking: The process of following detected objects across video frames, maintaining consistent object identities over time.
Tracking-by-Detection: A common approach that combines object detection and tracking algorithms to track objects across frames.
SORT (Simple Online and Realtime Tracking): A fast and simple tracking algorithm based on Kalman filters and the Hungarian algorithm for data association.
DeepSORT: An extension of SORT that uses deep learning-based appearance models for improved tracking accuracy and re-identification.
Kalman Filters: Estimate each object's state (such as position and velocity) from noisy detections and predict its position in the next frame, helping to maintain object consistency across frames.
Hungarian Algorithm: Used for data association, matching detected objects with predicted positions in each frame.
This week provides students with the skills to implement real-time object detection and tracking systems using modern algorithms like SORT and DeepSORT. Students will gain practical experience combining object detection with tracking, understanding the challenges of real-time systems, and exploring how deep learning improves object tracking accuracy through appearance-based re-identification.