Player Classification

During a football match, cameras record the action across the entire football ground, capturing not only the players but also details that are irrelevant when the videos are processed to extract useful information. The audience and the non-players outside the football ground add no value to the application, so they need to be removed, or at least identified and ignored, during the processing that generates results within the application.

For each frame, the application expects to detect the 11 players of each team. Because non-players and audience members are also recorded, around 25-35 human bounding boxes (the person class in the COCO dataset) are detected per frame; processing all of them costs significant time and introduces errors during player tracking. This motivates an algorithm that filters out non-players and audience members from the detected person bounding boxes in each frame.
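As an illustration of the detection stage, the sketch below runs a pretrained COCO detector on a single frame and keeps only person-class boxes. It is a minimal sketch, not the detector used in this pipeline; the choice of torchvision's Faster R-CNN and the 0.5 score threshold are assumptions made for illustration.

    # Minimal sketch: detect person-class bounding boxes in one frame.
    # The detector (torchvision Faster R-CNN) and score threshold are
    # assumptions, not necessarily the model used in the original pipeline.
    import torch
    import torchvision
    from torchvision.transforms.functional import to_tensor

    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
    model.eval()

    PERSON_CLASS_ID = 1  # "person" in the COCO label map used by torchvision

    def detect_persons(frame_rgb, score_threshold=0.5):
        """Return (N, 4) person boxes [x1, y1, x2, y2] for a single RGB frame."""
        with torch.no_grad():
            pred = model([to_tensor(frame_rgb)])[0]
        keep = (pred["labels"] == PERSON_CLASS_ID) & (pred["scores"] >= score_threshold)
        return pred["boxes"][keep].cpu().numpy()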

Spatial Feature Filter Method (SFM)

In the spatial feature filter method, we make use of the spatial differences between players and audience members. We collected over 10K images of players and non-players and trained a deep learning model based on the ResNet-50 architecture to classify each image into one of two categories: player, denoted 'P', and audience, denoted 'A'. This filter accurately identifies audience members close to the camera, but produces low-confidence classifications for images far from the camera. This is mainly due to image resolution, since far-away crops are only about 15x40 pixels, and the spatial difference between player and audience also diminishes with distance from the camera. Hence the spatial feature filter alone cannot be employed for the task.
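A minimal sketch of such a classifier is shown below, assuming PyTorch/torchvision and an ImageNet-pretrained ResNet-50 with its final layer replaced by a two-way player/audience head; the optimizer, learning rate, and training-step structure are illustrative assumptions, not the exact setup used here.

    # Sketch of the player/audience classifier: ResNet-50 backbone, 2-class head.
    # Pretrained weights, optimizer, and learning rate are illustrative assumptions.
    import torch
    import torch.nn as nn
    import torchvision

    model = torchvision.models.resnet50(weights="IMAGENET1K_V1")
    model.fc = nn.Linear(model.fc.in_features, 2)  # 0 = player ('P'), 1 = audience ('A')

    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

    def train_step(images, labels):
        """One optimisation step on a batch of player/audience crops."""
        optimizer.zero_grad()
        logits = model(images)
        loss = criterion(logits, labels)
        loss.backward()
        optimizer.step()
        return loss.item()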

Bounding Box Filter Method (BBM)

The figure above outlines the algorithm used to filter out non-players and audience members per frame of a football match video using the bounding box filter method. The main idea behind the algorithm follows from the nature of the game: the movements of players and non-players differ markedly during a match. Audience members tend to stay in a roughly constant location, whereas players move continuously, and this difference in location displacement is the key used to separate players from all other detected person-class objects.

In the first stage, all frames recorded during a football game are processed to generate person-class bounding boxes. The detected bounding boxes are later converted to the centre location of each box. The bounding box locations of the entire match are then overlaid to generate an activity map, or heat map, of the whole game. This map indicates the activity of persons on the football ground in terms of displacement, as shown in the figure below.
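A minimal sketch of building this activity map is given below, assuming each frame's person boxes are available as [x1, y1, x2, y2] arrays (for example from the detection sketch above); accumulating box centres and normalising by the number of frames are illustrative choices.

    # Sketch: overlay box centres from every frame into a normalised activity map.
    import numpy as np

    def build_activity_map(per_frame_boxes, frame_height, frame_width):
        """Accumulate person-box centres over all frames.

        per_frame_boxes: iterable of (N_i, 4) arrays of [x1, y1, x2, y2] boxes.
        Returns a (H, W) map where each value is the fraction of frames in which
        a person centre fell on that pixel.
        """
        activity = np.zeros((frame_height, frame_width), dtype=np.float32)
        num_frames = 0
        for boxes in per_frame_boxes:
            num_frames += 1
            for x1, y1, x2, y2 in boxes:
                cx = int((x1 + x2) / 2)
                cy = int((y1 + y2) / 2)
                if 0 <= cy < frame_height and 0 <= cx < frame_width:
                    activity[cy, cx] += 1.0
        return activity / max(num_frames, 1)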

Activity and Mean activity maps

Mask of non-players

Audience and unwanted bounding boxes in red

Depending on the user's requirements, the threshold can be set to filter the bright spots. A default value of 0.5 can be used, meaning that a bounding box occupies the same location for 50% of the entire game, so that location can be considered a non-player bounding box.

A mask of non-players and audience members is generated by selecting all pixels whose value exceeds the threshold set by the user. Using the bounding box locations, the non-players are then filtered out: all bounding boxes that fall within the mask region are considered non-players, and the remaining detections are considered players. This drastically reduces the number of detections per frame that must be processed to obtain the tracking information. The performance is reported in the table below, and the overall result is shown in the image below.
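The sketch below shows how the thresholded mask could be built from the activity map and used to filter one frame's detections. Checking each box's centre against the mask and the 0.5 default threshold follow the description above; the function names are assumptions.

    # Sketch: build the non-player mask and filter one frame's detections with it.
    import numpy as np

    def build_non_player_mask(activity_map, threshold=0.5):
        """Pixels active in more than `threshold` of the frames are non-player locations."""
        return activity_map > threshold

    def filter_players(boxes, non_player_mask):
        """Keep only boxes whose centre falls outside the non-player mask.

        boxes: (N, 4) array of [x1, y1, x2, y2] detections for one frame.
        Returns the (M, 4) array of boxes classified as players.
        """
        players = []
        h, w = non_player_mask.shape
        for x1, y1, x2, y2 in boxes:
            cx = int(np.clip((x1 + x2) / 2, 0, w - 1))
            cy = int(np.clip((y1 + y2) / 2, 0, h - 1))
            if not non_player_mask[cy, cx]:
                players.append([x1, y1, x2, y2])
        return np.asarray(players).reshape(-1, 4)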

Overall result of BBM + SFM filter methods