We evaluated the accuracy and run time of several neural network architectures for tool recognition; the results are shown above.
From these results, ResNet-152 achieves the highest accuracy of all the evaluated networks. Although its run time is higher than that of most of the other architectures, our primary goal is to improve accuracy, so we selected ResNet-152 for tool recognition and used the resulting trained model in the next part of the project, i.e., phase recognition.
The graph below shows the accuracy of the evaluated architectures along with their training times.
For tool recognition, the EndoNet architecture achieves an accuracy of around 81%, while our proposed architecture achieves 87%. Our architecture therefore outperforms the state-of-the-art EndoNet by about 6 percentage points in accuracy.
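To make the size of this gap concrete, the short sketch below expresses it as an absolute gain, a gain relative to the baseline, and a reduction in error rate. It uses only the two accuracies quoted above (about 81% for EndoNet and 87% for our model); the function name `compare_accuracy` is illustrative and not part of our code base.

```python
def compare_accuracy(baseline: float, proposed: float) -> dict:
    """Compare two accuracies given as fractions in [0, 1]."""
    absolute_gain = proposed - baseline          # gain in accuracy (fraction of samples)
    relative_gain = absolute_gain / baseline     # gain relative to the baseline accuracy
    baseline_error = 1.0 - baseline
    proposed_error = 1.0 - proposed
    # Fraction of the baseline's errors that the proposed model removes.
    error_reduction = (baseline_error - proposed_error) / baseline_error
    return {
        "absolute_gain": absolute_gain,
        "relative_gain": relative_gain,
        "error_reduction": error_reduction,
    }


stats = compare_accuracy(baseline=0.81, proposed=0.87)  # EndoNet vs. ours
for name, value in stats.items():
    print(f"{name}: {value:.2%}")
```

With these inputs it reports an absolute gain of 6.00%, a relative gain of 7.41%, and an error-rate reduction of 31.58% (the error rate falls from 19% to 13%). Because the EndoNet figure is approximate, these numbers should be read as rough.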
Here are a few example images from our model. Green marks the correct (ground-truth) label and red marks an incorrect prediction.
The phase-recognition part produces the final prediction of the surgical phase in cholecystectomy videos. In order, the phases of a cholecystectomy are Preparation, Calot triangle dissection, Clipping and cutting, Gallbladder dissection, Gallbladder packaging, Cleaning and coagulation, and Gallbladder retraction.
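For training, each phase name has to be mapped to an integer class index. A minimal sketch, assuming the indices simply follow the surgical order listed above; the names `PHASES`, `PHASE_TO_ID`, `encode_phases`, and `decode_phases` are hypothetical, and the exact label strings in the annotation files may differ.

```python
from typing import Dict, List, Sequence

# Hypothetical label order: the seven phases in the surgical order listed above.
PHASES = (
    "Preparation",
    "Calot triangle dissection",
    "Clipping and cutting",
    "Gallbladder dissection",
    "Gallbladder packaging",
    "Cleaning and coagulation",
    "Gallbladder retraction",
)
PHASE_TO_ID: Dict[str, int] = {name: idx for idx, name in enumerate(PHASES)}


def encode_phases(frame_labels: Sequence[str]) -> List[int]:
    """Map per-frame phase names to class indices (raises KeyError on unknown names)."""
    return [PHASE_TO_ID[label] for label in frame_labels]


def decode_phases(class_ids: Sequence[int]) -> List[str]:
    """Map class indices (e.g., model predictions) back to phase names."""
    return [PHASES[i] for i in class_ids]


labels = ["Preparation", "Calot triangle dissection", "Gallbladder retraction"]
ids = encode_phases(labels)
print(ids)
```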
We tested three recurrent neural network architectures: GRU, RNN, and LSTM. Each has its own strengths, and all three have been used in similar phase-detection tasks before, which motivated us to experiment with them. We varied two parameters: the number of hidden features (64 and 128) and the sequence length (100 and 200).
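The three cells differ mainly in how many gated transformations they compute per time step, which affects both model size and training time. The sketch below enumerates the 2 × 2 grid of hidden sizes and sequence lengths for each architecture and counts the parameters of a single-layer recurrent cell (excluding the output classifier) using PyTorch's layout: input-to-hidden and hidden-to-hidden weights plus two bias vectors per gate block. The input size of 2048 is an assumption (the dimension of ResNet-152's pooled feature vector), and the names `GATES` and `param_count` are illustrative.

```python
from itertools import product

# Gate blocks per cell: a vanilla RNN computes one transformation per step,
# a GRU three (reset, update, candidate), an LSTM four (input, forget, candidate, output).
GATES = {"rnn": 1, "gru": 3, "lstm": 4}

INPUT_SIZE = 2048  # assumption: pooled ResNet-152 feature vector per frame


def param_count(arch: str, input_size: int, hidden_size: int) -> int:
    """Parameters of a single-layer recurrent cell in PyTorch's layout:
    input-to-hidden weights, hidden-to-hidden weights, and two bias vectors."""
    g = GATES[arch]
    return g * (hidden_size * input_size + hidden_size * hidden_size + 2 * hidden_size)


# The 3 x 2 x 2 = 12 configurations explored in the experiments.
configs = list(product(GATES, (64, 128), (100, 200)))
for arch, hidden, seq_len in configs:
    n_params = param_count(arch, INPUT_SIZE, hidden)
    print(f"{arch:<4} hidden={hidden:<3} seq_len={seq_len}: {n_params:,} parameters")
```

At the same hidden size, a GRU has three times and an LSTM four times as many recurrent parameters as a vanilla RNN. The parameter count does not depend on sequence length, but the number of recurrent steps per sequence, and hence the compute per sequence, grows linearly with it.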
The tables and bar plots below show the training and test accuracy, along with the training time (in seconds), for each configuration of the three architectures.
Fig. 1: Sequence length = 100, hidden features = 128
Fig. 2: Sequence length = 100, hidden features = 64
Fig. 3: Sequence length = 200, hidden features = 128
Fig. 4: Sequence length = 200, hidden features = 64
The bar plots in the figures above show how the training and test accuracy vary across the three architectures, and the line plot compares their computation times.
Test accuracy for sequence lengths 100 and 200, hidden features = 64
Test accuracy for sequence lengths 100 and 200, hidden features = 128
We also compared test-set accuracy for the two sequence lengths (100 and 200), with the number of hidden features fixed first at 64 and then at 128; the results are plotted above. Of the three architectures we explored, GRU with 128 hidden features and an input sequence length of 100 achieves the best test accuracy, 65.34%.
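The sequence length is the number of consecutive frames the recurrent model processes as one training example. A minimal sketch of how a video's per-frame features and phase labels might be cut into such windows, assuming non-overlapping windows and keeping a shorter final window; the function name `make_windows` is hypothetical, and our actual pipeline may have handled overlap or the last window differently.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def make_windows(frames: Sequence[T], labels: Sequence[int],
                 seq_len: int) -> List[Tuple[List[T], List[int]]]:
    """Split one video's per-frame features and labels into non-overlapping
    windows of seq_len frames; the final window may be shorter."""
    if len(frames) != len(labels):
        raise ValueError("frames and labels must have the same length")
    if seq_len <= 0:
        raise ValueError("seq_len must be positive")
    return [
        (list(frames[start:start + seq_len]), list(labels[start:start + seq_len]))
        for start in range(0, len(frames), seq_len)
    ]


# Toy video: 250 frames; integers stand in for per-frame feature vectors.
frames = list(range(250))
labels = [0] * 100 + [1] * 150
for seq_len in (100, 200):
    windows = make_windows(frames, labels, seq_len)
    print(seq_len, [len(f) for f, _ in windows])
```

For this 250-frame toy video, `seq_len=100` yields three windows of 100, 100, and 50 frames, while `seq_len=200` yields two windows of 200 and 50 frames.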
To the best of our knowledge, EndoNet provides the best tool- and phase-recognition accuracy on cholecystectomy surgical videos. In our work, we obtained higher tool-recognition accuracy than EndoNet. Our architecture uses ResNet-152, whereas EndoNet uses AlexNet.
Our phase-recognition accuracy, on the other hand, is lower than that of the state of the art. We speculate that one reason is EndoNet's use of a hierarchical hidden Markov model (HHMM) for phase recognition. In contrast, we ran empirical experiments with RNN, GRU, and LSTM models and chose GRU for the final implementation. We plan to add the HHMM technique to our architecture and test whether it improves accuracy.
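To illustrate the kind of temporal reasoning an HMM adds on top of per-frame predictions, the sketch below runs Viterbi decoding with a plain, flat HMM. This is a deliberately simpler stand-in, not EndoNet's hierarchical HMM: it treats per-frame class probabilities (for example, GRU softmax outputs) directly as emission scores, which is a simplification, and the left-to-right transition matrix and all probabilities in the example are made up for illustration. The function name `viterbi` is hypothetical.

```python
import math
from typing import List, Sequence


def _log(p: float) -> float:
    """Natural log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(frame_probs: Sequence[Sequence[float]],
            transition: Sequence[Sequence[float]],
            initial: Sequence[float]) -> List[int]:
    """Return the most likely state sequence of a flat HMM.

    frame_probs[t][s]: score of state s at frame t (used as the emission term).
    transition[p][s]:  probability of moving from state p to state s.
    initial[s]:        probability of starting in state s.
    """
    n_states = len(initial)
    # Log-space score of the best path ending in each state at the current frame.
    scores = [_log(initial[s]) + _log(frame_probs[0][s]) for s in range(n_states)]
    backpointers: List[List[int]] = []
    for probs in frame_probs[1:]:
        new_scores, pointers = [], []
        for s in range(n_states):
            # Best predecessor state for arriving in state s at this frame.
            prev = max(range(n_states),
                       key=lambda p: scores[p] + _log(transition[p][s]))
            pointers.append(prev)
            new_scores.append(scores[prev] + _log(transition[prev][s]) + _log(probs[s]))
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Toy example with three phases: stay with probability 0.8, advance with 0.2.
transition = [[0.8, 0.2, 0.0],
              [0.0, 0.8, 0.2],
              [0.0, 0.0, 1.0]]
initial = [1.0, 0.0, 0.0]
frame_probs = [[0.8, 0.1, 0.1],
               [0.4, 0.5, 0.1],
               [0.8, 0.1, 0.1],
               [0.1, 0.8, 0.1],
               [0.1, 0.8, 0.1],
               [0.1, 0.1, 0.8],
               [0.1, 0.1, 0.8]]
per_frame = [max(range(3), key=lambda s: probs[s]) for probs in frame_probs]
smoothed = viterbi(frame_probs, transition, initial)
print("per-frame argmax:", per_frame)
print("Viterbi path:    ", smoothed)
```

In this toy example the per-frame argmax is [0, 1, 0, 1, 1, 2, 2], with a spurious jump to phase 1 at the second frame. Because the transition matrix only allows staying in a phase or advancing to the next one, decoding returns the smoother path [0, 0, 0, 1, 1, 2, 2].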