Temporal Augmentations
Automatic recognition of surgical activities from intraoperative surgical videos is crucial for developing intelligent support systems for computer-assisted interventions. Current state-of-the-art recognition methods are based on deep learning, where data augmentation has shown the potential to improve the generalization of these methods. This has spurred work on automated and simplified augmentation strategies for image classification and object detection on datasets of still images. Extending such augmentation methods to videos is not straightforward, as the temporal dimension needs to be considered. Furthermore, surgical videos pose additional challenges as they are composed of multiple, interconnected, and long-duration activities. This work proposes a new simplified augmentation method, called TRandAugment, specifically designed for long surgical videos, that treats each video as an assembly of temporal segments and applies consistent but random transformations to each segment.
TRandAugment: Temporal Random Augmentation Strategy for Surgical Activity Recognition from Videos, Int J CARS (2023) | paper |
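Below is a minimal Python sketch of the segment-wise idea behind TRandAugment: the video is split into temporal segments, and within each segment the same randomly chosen transformations are applied to every frame. The operation pool, magnitude scaling, and parameter names are illustrative assumptions for this sketch, not the paper's actual configuration.

```python
import random
import torchvision.transforms.functional as TF

# Illustrative pool of frame-level operations; the actual TRandAugment
# operation set and magnitude ranges are defined in the paper.
def _rotate(img, m):     return TF.rotate(img, angle=10.0 * m)
def _brightness(img, m): return TF.adjust_brightness(img, 1.0 + 0.3 * m)
def _contrast(img, m):   return TF.adjust_contrast(img, 1.0 + 0.3 * m)

OPS = [_rotate, _brightness, _contrast]

def trand_augment(frames, n_segments=4, n_ops=2, magnitude=0.5):
    """Split a video (list of frames) into temporal segments and apply the
    same randomly sampled transforms within each segment (sketch only)."""
    seg_len = max(1, len(frames) // n_segments)
    out = []
    for start in range(0, len(frames), seg_len):
        segment = frames[start:start + seg_len]
        # Sample operations once per segment so all of its frames
        # are transformed consistently.
        ops = random.sample(OPS, k=min(n_ops, len(OPS)))
        signs = [random.choice([-1.0, 1.0]) for _ in ops]
        for frame in segment:
            for op, sign in zip(ops, signs):
                frame = op(frame, sign * magnitude)
            out.append(frame)
    return out
```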
Self-supervised Learning - Surgical Vision
The field of surgical computer vision has undergone considerable breakthroughs in recent years with the rising popularity of deep neural network-based methods. However, standard fully supervised approaches for training such models require vast amounts of annotated data, imposing a prohibitively high cost, especially in the clinical domain. Self-Supervised Learning (SSL) methods, which have begun to gain traction in the general computer vision community, represent a potential solution to these annotation costs, allowing useful representations to be learned from unlabeled data alone. Still, the effectiveness of SSL methods in more complex and impactful domains, such as medicine and surgery, remains limited and largely unexplored. In this work, we address this critical need by investigating four state-of-the-art SSL methods (MoCo v2, SimCLR, DINO, SwAV) in the context of surgical computer vision. We present an extensive analysis of the performance of these methods on the Cholec80 dataset for two fundamental and popular tasks in surgical context understanding: phase recognition and tool presence detection. We examine their parameterization, and then their behavior with respect to training data quantities in semi-supervised settings.
Dissecting Self-Supervised Learning Methods for Surgical Computer Vision, Medical Image Analysis | paper |
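As an illustration of the kind of objective these SSL methods optimize, here is a minimal PyTorch sketch of a SimCLR-style contrastive (NT-Xent) loss; the batch layout and temperature are assumptions for the example, and the actual training recipes evaluated in the paper differ per method.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.1):
    """SimCLR-style contrastive (NT-Xent) loss for two augmented views
    z1, z2 of shape (batch, dim); a sketch, not the paper's training code."""
    z = torch.cat([F.normalize(z1, dim=1), F.normalize(z2, dim=1)], dim=0)
    n = z.size(0)
    sim = z @ z.t() / temperature                     # pairwise cosine similarities
    sim = sim.masked_fill(torch.eye(n, dtype=torch.bool), float('-inf'))  # ignore self-pairs
    # The positive for sample i is its other augmented view at index (i + B) mod 2B.
    targets = torch.cat([torch.arange(n // 2, n), torch.arange(0, n // 2)])
    return F.cross_entropy(sim, targets)
```

In practice, z1 and z2 would come from a backbone applied to two augmentations of the same surgical frame, before the learned representations are evaluated on phase recognition and tool presence detection.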
Weakly Supervised Learning
Automatic recognition of fine-grained surgical activities, called steps, is a challenging but crucial task for intelligent intra-operative computer assistance. The development of current vision-based activity recognition methods relies heavily on large volumes of manually annotated data, which is difficult and time-consuming to generate and requires domain-specific knowledge. In this work, we propose to use coarser and easier-to-annotate activity labels, namely phases, as weak supervision to learn step recognition with fewer step-annotated videos.
Temporal Convolutional Networks for Fine-grained Surgical Activity Recognition, IEEE Transactions on Medical Imaging | paper |
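A minimal sketch of how coarse phase labels can act as weak supervision when fine-grained step labels are only partially available: phase labels supervise every frame, while the step loss is computed only on annotated frames. The two-head architecture, the class counts, and the -1 "unannotated" convention are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PhaseStepNet(nn.Module):
    """Illustrative two-head model over per-frame features: phases (coarse,
    always labeled) and steps (fine-grained, labeled only for some videos)."""
    def __init__(self, feat_dim=512, n_phases=11, n_steps=44):  # class counts are illustrative
        super().__init__()
        self.phase_head = nn.Linear(feat_dim, n_phases)
        self.step_head = nn.Linear(feat_dim, n_steps)

    def forward(self, feats):
        return self.phase_head(feats), self.step_head(feats)

def weak_supervision_loss(phase_logits, step_logits, phase_labels, step_labels):
    """Phase labels supervise every frame; the step loss is restricted to
    frames with step annotations (step_labels == -1 marks unannotated frames)."""
    loss = F.cross_entropy(phase_logits, phase_labels)
    labeled = step_labels != -1
    if labeled.any():
        loss = loss + F.cross_entropy(step_logits[labeled], step_labels[labeled])
    return loss
```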
Multi-Activity Recognition
Automatic segmentation and classification of surgical activity is crucial for providing advanced support in computer-assisted interventions and autonomous functionalities in robot-assisted surgeries. Prior works have focused on recognizing either coarse activities, such as phases, or fine-grained activities, such as gestures. This work aims to jointly recognize two complementary levels of granularity directly from videos, namely phases and steps, which we introduce as two correlated surgical activities for the laparoscopic gastric bypass procedure. We propose a multi-task multi-stage temporal convolutional network (MTMS-TCN), along with a multi-task convolutional neural network (CNN) training setup, to jointly predict the phases and steps and benefit from their complementarity to better evaluate the execution of the procedure. We evaluate the proposed method on a large video dataset consisting of 40 surgical procedures (Bypass40).
Multi-task temporal convolutional networks for joint recognition of surgical phases and steps in gastric bypass procedures, Int J CARS 16, 1111–1119 (2021) | paper |
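The following PyTorch sketch illustrates the shared-trunk, two-head idea behind joint phase and step prediction with dilated temporal convolutions, in the spirit of multi-stage TCNs; layer counts, channel widths, and class counts are illustrative and not the exact MTMS-TCN configuration from the paper.

```python
import torch
import torch.nn as nn

class DilatedStage(nn.Module):
    """One temporal stage of exponentially dilated 1D convolutions with
    residual connections (simplified from the multi-stage TCN design)."""
    def __init__(self, channels=64, n_layers=10):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv1d(channels, channels, kernel_size=3,
                      padding=2 ** i, dilation=2 ** i)
            for i in range(n_layers))

    def forward(self, x):                 # x: (batch, channels, time)
        for conv in self.convs:
            x = x + torch.relu(conv(x))   # residual; padding preserves sequence length
        return x

class MultiTaskTCN(nn.Module):
    """Shared temporal trunk with two per-frame heads for phases and steps."""
    def __init__(self, feat_dim=2048, channels=64, n_phases=11, n_steps=44, n_stages=2):
        super().__init__()
        self.proj = nn.Conv1d(feat_dim, channels, kernel_size=1)
        self.stages = nn.Sequential(*[DilatedStage(channels) for _ in range(n_stages)])
        self.phase_head = nn.Conv1d(channels, n_phases, kernel_size=1)
        self.step_head = nn.Conv1d(channels, n_steps, kernel_size=1)

    def forward(self, frame_feats):       # (batch, feat_dim, time) per-frame CNN features
        h = self.stages(self.proj(frame_feats))
        return self.phase_head(h), self.step_head(h)
```

Training would combine cross-entropy losses on both heads so that the correlated phase and step labels can benefit from each other; the MTMS-TCN in the paper is additionally multi-stage and is trained alongside a multi-task CNN feature extractor, which this sketch does not reproduce.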
Scalar Field Visualization - Color map, height map, and contouring using marching squares. (Github, Tech Stack: C++, OpenGL)
Vector Field Visualization - Hedgehog plot and streamline plot using the Runge-Kutta (RK2) method (see the sketch after this list). (Github, Tech Stack: C++, OpenGL)
Volume Visualization - Visualizing a slice of a volume by computing the plane-volume intersection and an isosurface using marching cubes. (Github, Tech Stack: C++, OpenGL)
2D Curve Rendering - Rendering 2D curves generated from a set of points using Bezier and Lagrange interpolation. (Github, Tech Stack: C++, OpenGL)
3D Model Rendering - Rendering 3D objects from a ply file. (Github, Tech Stack: C++, OpenGL)
Texture Mapping - Applying textures to 3D objects using spherical and cylindrical mapping with lights. (Github, Tech Stack: C++, OpenGL)
Animation - Scene animation using Scene graphs. (Github, Tech Stack: C++, OpenGL)
DreamWorks Challenge - Sparks animation. (Github, Tech Stack: JS, three.js, WebGL)
Computer Vision: CS131 - Implemented various computer vision techniques such as filtering, edge detection, image stitching, image resizing, and segmentation (using k-means) through the course assignments. (Tech Stack: Python, NumPy)
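For reference, a minimal NumPy sketch of the midpoint (RK2) integration step used to trace streamlines in the vector field visualization project above; the original implementation is in C++/OpenGL, and the function and parameter names here are illustrative.

```python
import numpy as np

def rk2_streamline(velocity, seed, step=0.5, n_steps=200):
    """Trace a streamline through a 2D vector field with the midpoint (RK2)
    method; `velocity(p)` returns the interpolated field vector at point p."""
    points = [np.asarray(seed, dtype=float)]
    for _ in range(n_steps):
        p = points[-1]
        v1 = velocity(p)
        if np.allclose(v1, 0.0):         # stop at critical points (zero velocity)
            break
        mid = p + 0.5 * step * v1        # half step using the local velocity
        v2 = velocity(mid)               # re-sample the field at the midpoint
        points.append(p + step * v2)     # full step using the midpoint velocity
    return np.array(points)
```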