Sachin - Project 2

Object Detection

Hybrid Multi-Attention Transformer for Robust Video Object Detection (HyMATOD)

Video object detection (VOD) is particularly challenging because of motion blur, occlusion, and changes in object appearance from frame to frame. This project developed HyMATOD, a transformer-based detector built around a Hybrid Multi-Attention (HyMAT) module that refines target and background embeddings to improve temporal consistency. By combining self-attention and cross-attention blocks, the model achieves higher detection accuracy across frames.
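The page does not include code for the HyMAT module, so the following is only a minimal sketch of the general idea of pairing self-attention with cross-attention. Self-attention runs over target (object) embeddings, and cross-attention then lets targets query background embeddings. It is not the published HyMATOD architecture. Learned query/key/value projections, multiple heads, normalization, and feed-forward layers are all omitted, and the names `attention` and `hybrid_attention` are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows are embedding vectors


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.
    Identity projections are assumed (no learned weights)."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Each output row is a convex combination of the value rows.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum, used here as a residual connection."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def hybrid_attention(target: Matrix, background: Matrix) -> Matrix:
    """Hypothetical hybrid block (not the paper's HyMAT module):
    1) self-attention among target embeddings,
    2) cross-attention with targets as queries and background as keys/values,
    each followed by a residual connection."""
    x = add(target, attention(target, target, target))
    x = add(x, attention(x, background, background))
    return x


if __name__ == "__main__":
    target = [[1.0, 0.0], [0.0, 1.0]]                    # e.g. two object embeddings
    background = [[0.5, 0.5], [1.0, -1.0], [0.0, 0.2]]  # e.g. three context embeddings
    refined = hybrid_attention(target, background)
    print(len(refined), len(refined[0]))  # → 2 2
```

The residual connections keep each target's original features while adding context from the other targets and from the background. That is one plausible reading of "enhancing target-background embeddings," but the paper's exact wiring may differ.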

Skills Learned:

Transformer-based object detection models
Attention mechanisms for refining object-background relations
Temporal consistency techniques for video-based detection

Key Highlights:

Achieved 86.7% mean Average Precision (mAP) on ImageNet VID, outperforming the existing methods it was compared against.
Integrated a lightweight transformer framework for real-time object detection.
Published in Engineering Applications of Artificial Intelligence (Elsevier).
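For readers unfamiliar with the mAP figure above, here is a minimal sketch of how a per-class Average Precision and its mean across classes can be computed. It assumes VOC-style all-point interpolation and assumes that detections have already been matched to ground truth at an IoU threshold (0.5 is the common choice for ImageNet VID). The exact evaluation code used for HyMATOD is not given on this page, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def average_precision(is_tp: List[bool], num_gt: int) -> float:
    """All-point interpolated AP for one class.
    is_tp: detections sorted by descending confidence; True if the detection
    matched a not-yet-matched ground-truth box (assumed done beforehand)."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    tp = fp = 0
    precisions, recalls = [], []
    for hit in is_tp:
        if hit:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_gt)
    # Interpolate: precision at each point becomes the max precision to its right.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum rectangle areas under the interpolated precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class: Dict[str, Tuple[List[bool], int]]) -> float:
    """mAP = unweighted mean of per-class AP values."""
    aps = [average_precision(hits, n) for hits, n in per_class.values()]
    return sum(aps) / len(aps)


if __name__ == "__main__":
    results = {
        "car": ([True, True], 2),         # perfect ranking -> AP = 1.0
        "dog": ([True, False, True], 2),  # one false positive in the middle -> AP = 5/6
    }
    print(round(mean_average_precision(results), 4))  # → 0.9167
```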

Overall framework of the proposed method

Comparison Results

Comparison of HyMATOD performance on benchmark datasets
