We obtained footage of two full-length matches between the Mumbai Indians and the Chennai Super Kings and split it into ball-by-ball clips, which we then decomposed into individual frames. We used these frames to build models that classify the scene and extract images of the players.
This yielded about 490 video clips and more than 2.3 lakh (230,000) images. The data was divided into training and testing sets in an 80:20 ratio.
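The 80:20 split can be sketched as follows; the directory layout, seed, and helper name are illustrative assumptions, not details from the original pipeline.

```python
import random

def train_test_split_80_20(items, seed=42):
    """Shuffle the frame paths and split them 80:20 into train/test lists."""
    items = list(items)
    random.Random(seed).shuffle(items)  # fixed seed keeps the split reproducible
    cut = int(len(items) * 0.8)         # first 80% goes to training
    return items[:cut], items[cut:]

# Example with placeholder frame file names
frames = [f"frame_{i:06d}.jpg" for i in range(230000)]
train, test = train_test_split_80_20(frames)
print(len(train), len(test))  # 184000 46000
```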
The first model categorizes a scene by camera angle. The first camera view captures the batsman, bowler, wicket-keeper, runner, and umpire, while the second camera view shows the scene after the shot is played, including ball tracking, the other fielders, the ground, and the audience.
We created a CNN model for binary classification of Camera Views 1 and 2. For this model, we used five convolutional layers, five max-pooling layers, five dropout layers, a flatten layer, and two dense layers. We trained the network for 100 epochs, and the results were encouraging.
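The described stack (five convolution/max-pooling/dropout blocks, a flatten layer, and two dense layers) could be sketched in Keras roughly as below. The input shape, filter counts, kernel sizes, dropout rate, and optimizer are our assumptions, as the text does not specify them.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_camera_view_classifier(input_shape=(128, 128, 3)):
    """Five Conv2D + MaxPooling2D + Dropout blocks, then Flatten and two Dense
    layers. All hyperparameters here are illustrative assumptions."""
    model = keras.Sequential()
    model.add(keras.Input(shape=input_shape))
    for filters in (32, 32, 64, 64, 128):  # assumed filter progression
        model.add(layers.Conv2D(filters, 3, padding="same", activation="relu"))
        model.add(layers.MaxPooling2D(2))
        model.add(layers.Dropout(0.25))    # assumed dropout rate
    model.add(layers.Flatten())
    model.add(layers.Dense(128, activation="relu"))
    model.add(layers.Dense(1, activation="sigmoid"))  # binary: View 1 vs View 2
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_camera_view_classifier()
# model.fit(train_images, train_labels, epochs=100, validation_split=0.2)
```

The later multiclass models reuse the same block structure, with the final dense layer widened to the number of classes and a softmax activation.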
Camera View 1
Camera View 2
For the second model, we used transfer learning to detect the persons in each frame and crop them out.
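The text does not name the detector. Assuming a COCO-pretrained person detector (a common transfer-learning choice) that returns bounding boxes with class labels and confidence scores, the cropping step could look like this sketch; the label id, score threshold, and function name are our assumptions.

```python
import numpy as np

PERSON_LABEL = 1  # "person" class id in COCO-style label maps

def crop_persons(frame, boxes, labels, scores, min_score=0.8):
    """Crop every confidently detected person from an H x W x 3 frame.

    boxes: [x1, y1, x2, y2] pixel coordinates from the detector.
    labels/scores: the detector's class ids and confidences per box.
    """
    crops = []
    for (x1, y1, x2, y2), label, score in zip(boxes, labels, scores):
        if label == PERSON_LABEL and score >= min_score:
            crops.append(frame[int(y1):int(y2), int(x1):int(x2)])
    return crops

# Usage with dummy detector output on a blank 720p frame
frame = np.zeros((720, 1280, 3), dtype=np.uint8)
boxes = np.array([[100, 200, 180, 400], [600, 100, 700, 380]])
crops = crop_persons(frame, boxes, labels=[1, 1], scores=[0.95, 0.6])
print([c.shape for c in crops])  # only the first box passes the threshold
```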
For the third model, we manually sorted the cropped player photos into three categories: Batsman, Bowler, and Others. We then used these labelled images to train a network that performs multiclass classification of the players detected in Camera View 1 into these categories.
We created a CNN model for multiclass classification. We used five convolutional layers, five max-pooling layers, five dropout layers, a flatten layer, and two dense layers.
We created two models, one for batsman name recognition and the other for bowler name recognition.
The players classified by model 3 are then passed to models 4 and 5 for name identification. Both networks perform multiclass classification, with 18 classes for batsmen and 12 for bowlers. For both models, we used five convolutional layers, five max-pooling layers, five dropout layers, a flatten layer, and two dense layers.
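At inference time, each name model's softmax output maps back to a player name via a fixed class list. A minimal sketch of that decoding step follows; the names below are placeholders, not the actual 18-batsman and 12-bowler rosters.

```python
# Hypothetical class lists; the real models use 18 batsman and 12 bowler classes.
BATSMAN_CLASSES = ["Batsman_01", "Batsman_02", "Batsman_03"]
BOWLER_CLASSES = ["Bowler_01", "Bowler_02"]

def decode_prediction(probs, class_names):
    """Return the class name with the highest softmax probability."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs[best]

name, confidence = decode_prediction([0.1, 0.7, 0.2], BATSMAN_CLASSES)
print(name, confidence)  # Batsman_02 0.7
```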
For this model, we built a new dataset containing the batsman's poses for several types of shots. The dataset is divided into six categories: cover shot, cut shot, straight drive, pull shot, leg glance, and sweep shot.
For the model, we used seven convolutional layers, seven max-pooling layers, seven dropout layers, a flatten layer, and two dense layers.
In the final stage, we combined the results of all the models described above to create ball-by-ball commentary that includes the bowler's and batsman's names as well as a description of the shot played.
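Combining the model outputs into a commentary line can be sketched as a simple template fill; the wording and placeholder names are our illustration, not the system's exact output format.

```python
def generate_commentary(bowler, batsman, shot):
    """Compose a ball-by-ball commentary line from the three model predictions."""
    return f"{bowler} bowls to {batsman}, who plays a {shot}."

# Usage with placeholder predictions from models 4, 5, and the shot classifier
line = generate_commentary("Bowler_07", "Batsman_03", "cover shot")
print(line)  # Bowler_07 bowls to Batsman_03, who plays a cover shot.
```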