Zhili Yuan, et al.
Abstract
Segmenting and recognizing surgical operation trajectories as distinct, meaningful gestures is a crucial preliminary step in surgical workflow analysis for robot-assisted minimally invasive surgery.
This step is necessary for facilitating learning from demonstrations for autonomous robotic surgery, evaluating surgical skills, and related applications. In this work, we develop a hierarchical semi-supervised learning approach for surgical gesture segmentation using multi-modality data (i.e., vision and kinematics data). More specifically, surgical tasks are first segmented by combining distance-based and variance-based characteristic profiles constructed from the kinematics data. A Transformer-based network with a pre-trained ResNet-18 backbone is then used to extract visual features from the surgical operation videos. By combining the candidate segmentation points obtained from the two modalities, the final segmentation points are determined, while gesture recognition is implemented based on supervised learning.
The proposed approach was evaluated using data from the publicly available JIGSAWS database, covering the Suturing, Needle Passing, and Knot Tying tasks. The results show an average F1 score of 0.623 for segmentation and an accuracy of 0.856 for recognition.
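As a hedged illustration of the visual branch described above (not the authors' exact architecture), the sketch below extracts per-frame features with a pre-trained ResNet-18 backbone and passes them through a small Transformer encoder; the feature dimension, frame size, and encoder depth are assumptions.

```python
# Sketch only: per-frame ResNet-18 features followed by a Transformer encoder
# over the frame sequence. Dimensions and hyperparameters are illustrative.
import torch
import torch.nn as nn
from torchvision import models

class VideoFeatureExtractor(nn.Module):
    def __init__(self, d_model=512, n_heads=8, n_layers=2):
        super().__init__()
        backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])  # drop the FC head
        encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                                   batch_first=True)
        self.temporal = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)

    def forward(self, frames):            # frames: (batch, time, 3, 224, 224)
        b, t = frames.shape[:2]
        x = frames.flatten(0, 1)          # (batch*time, 3, 224, 224)
        feats = self.cnn(x).flatten(1)    # (batch*time, 512)
        feats = feats.view(b, t, -1)      # (batch, time, 512)
        return self.temporal(feats)       # temporally contextualized features

# Example: features for a clip of 16 frames
clip = torch.randn(1, 16, 3, 224, 224)
features = VideoFeatureExtractor()(clip)  # -> torch.Size([1, 16, 512])
```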
Aim of this project:
By accurately decomposing complicated surgical tasks into multiple gestures, a robot-assisted surgical system could automatically label the surgical motion data according to the activities being performed by the surgeon.
The JIGSAWS database is used in this paper for experimental evaluation.
https://cirl.lcsr.jhu.edu/research/hmm/datasets/jigsaws_release/
Table I comes from this paper:
Gao, Yixin, et al. "JHU-ISI gesture and skill assessment working set (JIGSAWS): A surgical activity dataset for human motion modeling." MICCAI Workshop: M2CAI, Vol. 3, No. 3, 2014.
Fig. 1 and Fig. 2 illustrate the normalized Euclidean distance and the ground truth segmentation points for the left hand of Suturing files D004 and D002, respectively.
Fig. 1: Fixed-threshold method for Euclidean distance (Suturing File D004)
Fig. 2: CWT method for Euclidean distance (Suturing File D002)
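As a rough illustration of this kinematics-based step (not the paper's exact parameters), the sketch below builds a normalized Euclidean distance profile from consecutive tool-tip positions and extracts candidate segmentation points with a fixed threshold and with SciPy's CWT peak detector; the threshold value and wavelet widths are assumptions.

```python
# Sketch only: normalized Euclidean distance profile plus two candidate-point
# detectors (fixed threshold, CWT peaks). Trajectory data here is synthetic.
import numpy as np
from scipy.signal import find_peaks_cwt

def distance_profile(xyz):
    """Normalized Euclidean distance between consecutive kinematic samples."""
    d = np.linalg.norm(np.diff(xyz, axis=0), axis=1)
    return (d - d.min()) / (d.max() - d.min() + 1e-12)

rng = np.random.default_rng(0)
xyz = np.cumsum(rng.normal(scale=0.001, size=(2000, 3)), axis=0)  # stand-in trajectory
profile = distance_profile(xyz)

# Fixed-threshold candidates (threshold is an illustrative choice).
threshold_points = np.where(profile > 0.8)[0]

# CWT-based candidates: peaks matched against wavelets of several widths.
cwt_points = find_peaks_cwt(profile, widths=np.arange(5, 30))
```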
The variance values are reversed so that critical points at corners can be detected, as shown in Fig. 3.
Fig. 3: Reversed rotation distance with the Savitzky–Golay filter (Suturing File D002)
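A minimal sketch of this smoothing-and-reversal idea, assuming a 1-D rotation distance signal: the signal is smoothed with a Savitzky–Golay filter and negated so that corner-like minima appear as peaks for a standard peak detector. The window length, polynomial order, and peak-detection parameters are illustrative choices, not the paper's settings.

```python
# Sketch only: Savitzky-Golay smoothing, sign reversal, and peak picking.
import numpy as np
from scipy.signal import savgol_filter, find_peaks

def corner_candidates(rotation_distance, window=31, polyorder=3):
    smoothed = savgol_filter(rotation_distance, window_length=window,
                             polyorder=polyorder)
    reversed_signal = -smoothed              # corners (local minima) become peaks
    peaks, _ = find_peaks(reversed_signal, distance=50, prominence=0.05)
    return smoothed, peaks

t = np.linspace(0, 10, 1000)
rot_dist = np.abs(np.sin(2 * t)) + 0.05 * np.random.default_rng(1).normal(size=t.size)
smoothed, corners = corner_candidates(rot_dist)
```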
Fig. 4 displays an example of the raw trajectory distance.
Fig. 4: Raw translation trajectory distance (Needle Passing File E004)
As shown in Fig. 5, the segmentation points do not always occur at the peaks of the translation distance profile. Thus, to increase accuracy, stricter rules should be applied to avoid over-segmentation.
Fig. 5: Smoothed translation trajectory distance with Kalman filter (Needle Passing File E004)
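A minimal sketch of Kalman-filter smoothing for a 1-D translation distance signal; the random-walk model and noise variances below are assumptions made for illustration, not the paper's settings.

```python
# Sketch only: a simple random-walk Kalman filter applied to a 1-D distance signal.
import numpy as np

def kalman_smooth_1d(z, process_var=1e-5, meas_var=1e-2):
    x_hat = np.zeros_like(z)     # filtered estimates
    x, p = z[0], 1.0             # initial state and covariance
    for k, meas in enumerate(z):
        p = p + process_var                  # predict
        gain = p / (p + meas_var)            # update
        x = x + gain * (meas - x)
        p = (1.0 - gain) * p
        x_hat[k] = x
    return x_hat

noisy = np.abs(np.sin(np.linspace(0, 12, 1500))) + \
        0.1 * np.random.default_rng(2).normal(size=1500)
smoothed = kalman_smooth_1d(noisy)
```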
Fig. 6 and Fig. 7 show the rotation distance with detected critical points, without and with the Savitzky–Golay filter, respectively.
Fig. 6: Without the Savitzky–Golay filter (Knot Tying File C005)
Blue curve: normalized rotation distance; red dots: ground-truth start and end points; red 'x': critical points
Fig. 7: Rotation distance with the Savitzky–Golay filter (Knot Tying File C005)
An example of critical points extracted from the translation and rotation distances of the left-hand trajectory is shown in Fig. 8.
Fig. 8: Critical points after the spatial-temporal method, left hand (Suturing File F004)
Examples of critical points for the rotation variance are shown in Fig. 9, with the ground-truth points shown as red dots. The gesture changes mostly occur at peaks and corners.
Fig. 9: Rotation variance characteristic, right hand (Suturing File B005)
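A minimal sketch of the variance-based characteristic, assuming a 1-D rotation signal: a sliding-window variance is computed and its peaks are taken as candidate gesture-change points. The window length and peak spacing are illustrative assumptions.

```python
# Sketch only: sliding-window variance profile with peak-based candidates.
import numpy as np
from scipy.signal import find_peaks

def sliding_variance(signal, window=50):
    """Variance of `signal` inside a sliding window of fixed length."""
    out = np.zeros(signal.size - window + 1)
    for i in range(out.size):
        out[i] = np.var(signal[i:i + window])
    return out

rot = np.cumsum(np.random.default_rng(3).normal(scale=0.01, size=3000))
var_profile = sliding_variance(rot)
candidates, _ = find_peaks(var_profile, distance=100)
```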
An example of the translation variance characteristic for the right-hand trajectory is presented in Fig. 10, selected from Suturing file H003.
It shows that the segmentation points in the translation variance profile are not as pronounced as those in the rotation variance profile.
Fig. 10: Translation variance characteristic, right hand (Suturing File H003)
Combining all the critical points found by the distance-based and variance-based methods for each demonstration file, the estimated segmentation results using only the kinematics data for Knot Tying file I002 are presented in Fig. 11.
Fig. 11: Estimated segmentation results using only kinematics data (Knot Tying File I002)
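A minimal sketch of how candidate points from the distance-based and variance-based methods might be fused; the clustering rule and temporal tolerance are assumptions, not the paper's procedure.

```python
# Sketch only: merge candidate points that fall within a temporal tolerance
# of each other into a single estimated segmentation point.
import numpy as np

def merge_candidates(point_sets, tolerance=30):
    all_points = np.sort(np.concatenate(point_sets))
    merged, cluster = [], [all_points[0]]
    for p in all_points[1:]:
        if p - cluster[-1] <= tolerance:
            cluster.append(p)
        else:
            merged.append(int(np.mean(cluster)))
            cluster = [p]
    merged.append(int(np.mean(cluster)))
    return np.array(merged)

distance_pts = np.array([120, 410, 800, 1310])   # hypothetical candidates
variance_pts = np.array([125, 402, 955, 1295])   # hypothetical candidates
estimated = merge_candidates([distance_pts, variance_pts])
```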
The segmentation outcome for Knot Tying File I002 is shown in Fig. 12.
Fig. 12: Final segmentation result with visual data (Knot Tying File I002)
Fig. 13 shows the final result for Suturing file C003, with recall = 0.483, precision = 0.789, and F1 score = 0.599.
Fig. 13: Final segmentation result with visual data + processed kinematics data (Suturing File C003)
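For reference, recall, precision, and F1 for segmentation points can be computed by matching predicted points to ground-truth points within a temporal tolerance; the matching rule and tolerance below are assumptions, not the paper's exact evaluation protocol.

```python
# Sketch only: tolerance-based matching of predicted vs. ground-truth points.
def segmentation_scores(pred, truth, tolerance=30):
    matched = set()
    tp = 0
    for p in pred:
        hits = [i for i, t in enumerate(truth)
                if i not in matched and abs(p - t) <= tolerance]
        if hits:
            matched.add(hits[0])
            tp += 1
    precision = tp / max(len(pred), 1)
    recall = tp / max(len(truth), 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-12)
    return recall, precision, f1

recall, precision, f1 = segmentation_scores([118, 405, 990], [120, 400, 800, 1000])
```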
As shown in Fig. 14, the segmentation scores for Needle Passing file C002 are recall = 0.692 and precision = 0.947. For all Needle Passing files, the average recall, precision, and F1 scores are 0.510, 0.647, and 0.570, respectively.
Fig. 14: Final segmentation result with visual data + processed kinematics data (Needle Passing File C003)
Fig. 15: Confusion matrix
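A minimal sketch of how the recognition confusion matrix and accuracy could be computed with scikit-learn; the gesture labels shown are JIGSAWS-style IDs used purely as placeholders.

```python
# Sketch only: confusion matrix and accuracy for predicted gesture labels.
from sklearn.metrics import confusion_matrix, accuracy_score

y_true = ["G1", "G2", "G3", "G2", "G1", "G3"]   # placeholder ground-truth gestures
y_pred = ["G1", "G2", "G2", "G2", "G1", "G3"]   # placeholder predictions

cm = confusion_matrix(y_true, y_pred, labels=["G1", "G2", "G3"])
acc = accuracy_score(y_true, y_pred)
```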