Object Detection
Video object detection (VOD) is particularly challenging because motion blur, occlusion, and appearance changes degrade the features available in individual frames. This project developed HyMATOD, a transformer-based detector built around a Hybrid Multi-Attention (HyMAT) module, which improves temporal consistency by enhancing target-background embeddings. By combining self-attention and cross-attention blocks, the model improves detection accuracy across frames.
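As a rough illustration of how self-attention and cross-attention can be combined in one block, the sketch below applies scaled dot-product attention first within a set of target tokens and then from those tokens to a set of context tokens (for example, background or neighbouring-frame features). This is not the HyMAT module itself: the function names, token shapes, and residual layout are assumptions made for illustration, and learned projections, multiple heads, and normalization are omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    return weights @ values, weights


def hybrid_attention(target_tokens, context_tokens):
    """Hypothetical hybrid block (an assumption, not the published design).

    1. Self-attention over the target tokens.
    2. Cross-attention from the refined target tokens to the context tokens.
    Each step is added back to its input as a residual connection.
    """
    self_out, _ = attention(target_tokens, target_tokens, target_tokens)
    x = target_tokens + self_out
    cross_out, cross_weights = attention(x, context_tokens, context_tokens)
    return x + cross_out, cross_weights


# Toy inputs: 4 target tokens and 6 context tokens, each 8-dimensional.
rng = np.random.default_rng(0)
target = rng.normal(size=(4, 8))
context = rng.normal(size=(6, 8))

fused, cross_weights = hybrid_attention(target, context)
print(fused.shape, cross_weights.shape)
```

Each row of `cross_weights` is a probability distribution over the context tokens, so it shows how strongly a given target token draws on each context token. A trainable version would add learned query, key, and value projections and would typically use several attention heads.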
Transformer-based object detection models
Attention mechanisms for refining object-background relations
Temporal consistency techniques for video-based detection
Achieved 86.7% mean Average Precision (mAP) on ImageNet VID, outperforming existing methods.
Integrated a lightweight transformer framework for real-time object detection.
Published in Engineering Applications of Artificial Intelligence (Elsevier).
Comparison of HyMATOD performance on benchmark datasets