Contextual Combination of Appearance and Motion for Intersection Videos with Vehicles and Pedestrians

Mohammad Shokrolah Shirazi and Brendan Morris

Proceedings of the International Symposium on Visual Computing (ISVC 2014), pp. 708-717, October 2014, Las Vegas, USA.

Abstract

Object detection and classification is a challenging problem for vision-based intersection monitoring, since traditional motion-based techniques work poorly when pedestrians or vehicles stop due to traffic signals. In this work, we present a method for vehicle and pedestrian recognition at intersections that benefits from both motion and appearance cues in video surveillance. Vehicle and pedestrian recognition performance is compared using motion, appearance, and combined cues in contextually relevant stop areas to improve recognition. Experimental evaluation shows a 5% average improvement for vehicle and pedestrian recognition at two Las Vegas intersections.

Goal

The goal of this work is to provide robust detection of vehicles and pedestrians in intersection videos. Most methods rely on motion-based techniques for vehicle and pedestrian detection, which fail in intersection scenarios because traffic signal phases force participants to stop. As a result, large datasets of vehicles and pedestrians are collected to train appearance-based classifiers that detect stopped pedestrians and vehicles at intersections.
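The failure of motion-based detection for stopped participants can be illustrated with a small sketch. It uses a single-pixel running-average background model with a fixed threshold, which is a deliberate simplification and not the GMM used in this work; the function name, `alpha`, `threshold`, and the intensity values are all assumptions made for illustration.

```python
def foreground_flags(intensities, alpha=0.05, threshold=30.0):
    """Flag each frame of one pixel as foreground (True) or background (False).

    Simplified running-average background model (an assumption for
    illustration, not the GMM from the paper): the background estimate
    drifts toward whatever intensity is currently observed.
    """
    background = float(intensities[0])
    flags = []
    for value in intensities:
        # A pixel is foreground when it differs enough from the background.
        flags.append(abs(value - background) > threshold)
        # The model keeps adapting, so a stationary object is slowly absorbed.
        background += alpha * (value - background)
    return flags


# Hypothetical pixel: empty road (100) for 10 frames, then a vehicle (200)
# arrives and waits at a red light for 60 frames.
frames = [100] * 10 + [200] * 60
flags = foreground_flags(frames)
```

In this toy sequence the vehicle is flagged when it arrives, but the flag drops once the background estimate has adapted to it, even though the vehicle is still present. That is the situation in which an appearance-based classifier is needed.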

System overview

A three-stage cascaded system is proposed for reliable vehicle/pedestrian detection and classification at intersections, as shown below. The main advantage of this system is that it uses both motion and appearance cues in a contextually meaningful manner for accurate classification.

Contextual combination

Contextual combination provides fusion at the decision level to combine the outputs of the GMM motion detector and the Haar appearance detector in mix areas, where both detectors are active. In this way, appearance detection is limited to smaller processing regions for speed and reliability. The contextual combination is defined so that it rejects many false appearance-based pedestrian/vehicle detections outside mix areas, since detection by motion is more reliable there. It also pools detection responses that have overlapping bounding boxes. Mix areas have been defined for the two intersections as shown below.
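The decision-level fusion described above can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it assumes that mix areas are axis-aligned rectangles, that an appearance detection belongs to a mix area when its center falls inside it, and that overlapping boxes are pooled by taking their enclosing box when their intersection-over-union exceeds a threshold. All function names and the `iou_thresh` parameter are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def center_in(box, area):
    """True when the center of `box` lies inside the rectangle `area`."""
    cx = (box[0] + box[2]) / 2
    cy = (box[1] + box[3]) / 2
    return area[0] <= cx <= area[2] and area[1] <= cy <= area[3]


def combine(motion_boxes, appearance_boxes, mix_areas, iou_thresh=0.5):
    """Fuse motion and appearance detections at the decision level."""
    # Motion detections are trusted everywhere; appearance detections are
    # kept only inside mix areas (assumed here to be rectangles).
    kept = list(motion_boxes)
    kept += [b for b in appearance_boxes
             if any(center_in(b, area) for area in mix_areas)]

    # Pool overlapping responses into a single enclosing box (assumed rule).
    pooled = []
    for box in kept:
        for i, p in enumerate(pooled):
            if iou(box, p) >= iou_thresh:
                pooled[i] = (min(p[0], box[0]), min(p[1], box[1]),
                             max(p[2], box[2]), max(p[3], box[3]))
                break
        else:
            pooled.append(box)
    return pooled


# Hypothetical example: one motion box, three appearance boxes, two mix areas.
motion = [(0, 0, 10, 10)]
appearance = [(1, 1, 11, 11), (100, 100, 110, 110), (50, 50, 60, 60)]
mix_areas = [(0, 0, 20, 20), (45, 45, 65, 65)]
result = combine(motion, appearance, mix_areas)
```

Here the appearance box at (100, 100, 110, 110) is rejected because it lies outside every mix area, the two overlapping boxes near the origin are pooled into one, and the stopped object at (50, 50, 60, 60), which motion detection missed, is recovered from appearance alone.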

Results

The performance results suggest some interesting points. Motion-based techniques work well for detecting vehicles, but Haar-like appearance features are surprisingly ineffective for them. However, the Haar detector significantly outperforms GMM motion detection for pedestrians. This is because pedestrians are small and tend to remain still on sidewalks. The combined method is able to use appearance-based detection while dramatically lowering false detections.

Video
