Our results on tool recognition, and in turn on phase recognition, are good, but they leave room for improvement. We used the publicly available Cholec80 dataset, which contains 80 videos of cholecystectomy surgeries performed by 13 surgeons.
At 80 videos, the dataset is relatively small, which limits what training can achieve. The dataset was also made public in 2016, and the technology used to record surgical procedures has changed significantly since then. Some of the frames are blurred, and higher-resolution footage would likely improve the results. The tools are at times not clearly visible and are therefore difficult to recognize visually. Some cholecystectomy datasets are curated by medical labs, but most of them are not publicly available. A more recent dataset would likely lead to better accuracy.
Phase recognition has its own challenges. One visual challenge is camera motion, which can be too fast for our proposed architecture to capture the relevant distinctive features. Variation in speed across different parts of a video also affects the final accuracy. Another challenge worth mentioning is the presence of articulated tools in our videos, which leads to some occlusion and hence affects quality.
Given the scope and the potential impact of this project in the medical field, we plan to extend the work further. The aim is to increase the accuracy of surgical-phase detection and to modify our proposed architecture accordingly. To check the generality of our work, we also plan to test the architecture on other surgical videos, such as cataract surgery.
So far, we have used the RGB frames extracted from the videos directly for feature extraction. In the future, we plan to use features extracted from the optical flow computed between frames. We also believe that techniques such as the Hierarchical Hidden Markov Model (HHMM) [1] can be explored to improve phase recognition accuracy. In addition, we plan to test our model on newer and larger cholecystectomy datasets once they are made publicly available.
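To illustrate how a temporal model could refine frame-wise phase predictions, the sketch below applies Viterbi decoding with a plain (non-hierarchical) HMM. This is a simpler stand-in for the HHMM mentioned above and is not the method used in this work. The function name, the transition probabilities, and the per-frame scores are hypothetical values chosen only for illustration.

```python
import math


def viterbi(log_emissions, log_transition, log_prior):
    """Return the most likely phase sequence under a flat HMM.

    log_emissions:  T lists of K log-scores (one per phase) for each frame
    log_transition: K x K nested list, log P(phase j at t | phase i at t-1)
    log_prior:      K log-probabilities for the phase of the first frame
    """
    num_phases = len(log_prior)
    # best[k]: best log-score of any path that ends in phase k at the current frame
    best = [log_prior[k] + log_emissions[0][k] for k in range(num_phases)]
    backpointers = []
    for frame_scores in log_emissions[1:]:
        new_best, pointers = [], []
        for j in range(num_phases):
            # Choose the best previous phase from which to enter phase j.
            prev = max(range(num_phases),
                       key=lambda i: best[i] + log_transition[i][j])
            pointers.append(prev)
            new_best.append(best[prev] + log_transition[prev][j] + frame_scores[j])
        best = new_best
        backpointers.append(pointers)
    # Trace back from the best final phase to recover the full path.
    state = max(range(num_phases), key=lambda k: best[k])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical per-frame phase probabilities for two phases (assumed values);
# frame 3 is a noisy prediction that briefly favors phase 1.
frame_probs = [[0.9, 0.1], [0.8, 0.2], [0.4, 0.6],
               [0.9, 0.1], [0.2, 0.8], [0.1, 0.9]]
# Assumed "sticky" transitions: a phase tends to persist between frames.
transition = [[0.9, 0.1], [0.1, 0.9]]
prior = [0.5, 0.5]

log_em = [[math.log(p) for p in row] for row in frame_probs]
log_tr = [[math.log(p) for p in row] for row in transition]
log_pr = [math.log(p) for p in prior]

framewise = [row.index(max(row)) for row in frame_probs]  # [0, 0, 1, 0, 1, 1]
smoothed = viterbi(log_em, log_tr, log_pr)                # [0, 0, 0, 0, 1, 1]
```

In this toy example the frame-wise argmax flickers to phase 1 at the third frame, while the decoded sequence stays in phase 0 until the evidence for a transition is sustained. A hierarchical model would additionally capture sub-structure within each phase, which this sketch does not attempt.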