The primary goal of our work is to track football players accurately so that teams can analyse the performance of individual players. Due to the nature of the game, tracking becomes highly difficult in crowded scenes and in regions far from the camera. In such situations, human annotators resolve the tracking issues during post-processing. Hence, the main goal of tracking is to produce high-confidence tracks and to reduce swaps as far as possible. MOT scores are the standard metrics used to compare tracking models and to select the best tracker for a given challenge. However, MOT metrics give events such as swaps and renames/switches the same weight. These metrics become unsuitable when the main requirement is to generate tracks with few Id swaps and an acceptable number of Id renames. To resolve an Id rename, a human annotator needs to focus only on the two frames where the new Id is created, whereas for an Id swap the annotator must inspect every individual frame to locate the swap frame and then correct the Id. Thus, generating confident short tracks is better than generating longer tracks with swaps. Some of the key terms used in our work are defined below.
· Id Swap: The Ids of two players are exchanged.
· Id Rename/Switch: A player's Id is changed to a new Id.
· Id Copy: A single Id is assigned to two different players.
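The three event types can be told apart by comparing the Id assignment in two consecutive frames. The following is a minimal sketch, not the evaluation code used in this work: the function name `classify_id_events`, the `{player: track_id}` dictionaries, and the event tuples are all hypothetical, and the sketch assumes that ground-truth player identities are available and that every Id in the previous frame belongs to exactly one player.

```python
from collections import defaultdict


def classify_id_events(prev, curr):
    """Compare two frames of {player: track_id} and label Id events.

    Assumes (for illustration only) that ground-truth player names are known
    and that each track Id in `prev` is held by exactly one player.
    """
    events = []

    # Id copy: one Id is held by more than one player in the current frame.
    holders = defaultdict(list)
    for player, track_id in curr.items():
        holders[track_id].append(player)
    for track_id, players in holders.items():
        if len(players) > 1:
            events.append(("copy", tuple(sorted(players)), track_id))

    # Reverse lookup: which player held each Id in the previous frame.
    prev_owner = {track_id: player for player, track_id in prev.items()}
    seen_swaps = set()

    for player in sorted(prev.keys() & curr.keys()):
        old_id, new_id = prev[player], curr[player]
        if old_id == new_id:
            continue
        other = prev_owner.get(new_id)
        if other is not None and other != player and curr.get(other) == old_id:
            # Id swap: the two players have exchanged their Ids.
            pair = tuple(sorted((player, other)))
            if pair not in seen_swaps:
                seen_swaps.add(pair)
                events.append(("swap", pair, (old_id, new_id)))
        elif new_id not in prev_owner:
            # Id rename/switch: the player received an Id not seen before.
            events.append(("rename", (player,), (old_id, new_id)))
    return events


prev_frame = {"A": 1, "B": 2, "C": 3, "D": 4}
curr_frame = {"A": 2, "B": 1, "C": 7, "D": 4}
print(classify_id_events(prev_frame, curr_frame))
# [('swap', ('A', 'B'), (1, 2)), ('rename', ('C',), (3, 7))]
```

In this toy example a rename is visible from the two frames alone, which matches the annotation effort described above. A swap is only easy to spot here because the ground truth is given; an annotator without it has to scan the frames to find where the exchange occurred.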
Tracking Algorithm
Tracking players is one of the most crucial tasks in our work, as all player statistics, such as running, scoring goals, and kicking, depend on correct tracking. The video recorded by the Statmetrix camera has a unique field of view (FOV): players are recorded at an angle that gives rise to two different overlap scenarios. In the first, one player completely occludes another, so the extracted feature represents only one player, as in other tracking problems. In the second, owing to the camera angle, an overlapping player occludes only part of another player's body rather than the entire player, and this kind of occlusion happens very frequently. The feature extracted in this scenario does not match any single player with high confidence. We employ a split-feature matching system to retain typical full-body feature extraction while gaining an advantage in overlap scenarios.
A split-feature matching model is trained to provide player matching results as a percentage, as shown below. The model uses a ResNet-50 backbone to extract spatial features and a Siamese-style architecture to match the temporal features. It includes a splitter that divides the test and target images into two parts, top and bottom; features are extracted from each part and matched individually to give a value between 0 and 1, which is the matching percentage between the test and target images. Since two matching results are available after matching, the best matching score is taken as the feature matching score, with equal weight assigned to both part scores. The split-feature matching technique also allows variable weights to be assigned to the parts, which is not possible with full-body matching.
Figure: Full body match and split body match.
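The split matching idea can be illustrated with a small, self-contained sketch. It is not the trained model: the ResNet-50 Siamese network is replaced by a placeholder feature (a normalised intensity histogram compared by cosine similarity), images are plain lists of grayscale pixel rows, and all names (`split_top_bottom`, `embed`, `split_match_score`, `w_top`, `w_bottom`) are hypothetical. The two part scores are combined here as a weighted average, which is one possible reading of the combination rule described above.

```python
import math


def split_top_bottom(image):
    """Split an image (a list of pixel rows) into top and bottom halves."""
    mid = len(image) // 2
    return image[:mid], image[mid:]


def embed(part, bins=8):
    """Placeholder feature: normalised histogram of 0-255 integer intensities.

    Stands in for the learned backbone features; an assumption for illustration.
    """
    hist = [0.0] * bins
    for row in part:
        for value in row:
            hist[min(value * bins // 256, bins - 1)] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]


def cosine_similarity(a, b):
    """Cosine similarity, clamped to [0, 1] for non-negative feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return min(1.0, dot / norm) if norm else 0.0


def split_match_score(test, target, w_top=0.5, w_bottom=0.5):
    """Match top and bottom halves separately, then take a weighted average."""
    if w_top + w_bottom <= 0:
        raise ValueError("weights must sum to a positive value")
    test_top, test_bottom = split_top_bottom(test)
    target_top, target_bottom = split_top_bottom(target)
    top = cosine_similarity(embed(test_top), embed(target_top))
    bottom = cosine_similarity(embed(test_bottom), embed(target_bottom))
    return (w_top * top + w_bottom * bottom) / (w_top + w_bottom)


# Toy 4x4 "images": bright upper body, dark lower body.
player = [[200] * 4, [200] * 4, [50] * 4, [50] * 4]
# Same player with the lower body hidden by another player.
occluded = [[200] * 4, [200] * 4, [120] * 4, [120] * 4]

equal = split_match_score(player, occluded)
top_heavy = split_match_score(player, occluded, w_top=0.8, w_bottom=0.2)
print(round(equal, 3), round(top_heavy, 3))
```

With equal weights, the occluded lower half pulls the score down. Giving more weight to the visible upper half raises the score for the same pair, which is the kind of adjustment a single full-body feature does not allow.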