This work was developed during the EndoMapper project at the University of Zaragoza.
Participants: O. León Barbed, Pablo Azagra, Juan Plo, Ana C. Murillo.
We aim to automate the initial analysis of complete endoscopy videos by identifying the sparse relevant content. This facilitates understanding of long procedure recordings, reduces clinicians' review time, and supports downstream tasks such as video summarization, event detection, and 3D reconstruction. Our approach extracts endoscopic video frame representations with a learned embedding model. These descriptors are clustered to find visual patterns in the procedure, identifying key scene types (surgery, clear-visibility frames, etc.) and enabling segmentation into informative and non-informative video parts.
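The sketch below illustrates the general pipeline described above (sample frames, embed them, cluster the descriptors, read off temporal segments). It is not the method from the publication: a generic pretrained ResNet-18 stands in for the learned embedding model, k-means stands in for the clustering step, and all parameters (frame stride, number of clusters, video filename) are illustrative assumptions.

```python
# Minimal sketch of a frame-embedding + clustering pipeline, under the
# assumptions stated above (placeholder backbone and clustering method).
import cv2
import numpy as np
import torch
import torchvision.models as models
import torchvision.transforms as T
from sklearn.cluster import KMeans


def extract_frames(video_path: str, stride: int = 30):
    """Sample one frame every `stride` frames from the video."""
    cap = cv2.VideoCapture(video_path)
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % stride == 0:
            frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        idx += 1
    cap.release()
    return frames


def embed_frames(frames):
    """Compute one descriptor per frame with a placeholder CNN backbone."""
    backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    backbone.fc = torch.nn.Identity()  # drop the classifier, keep features
    backbone.eval()
    preprocess = T.Compose([
        T.ToPILImage(), T.Resize(256), T.CenterCrop(224), T.ToTensor(),
        T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])
    with torch.no_grad():
        feats = [backbone(preprocess(f).unsqueeze(0)).squeeze(0).numpy()
                 for f in frames]
    return np.stack(feats)


def segment_video(descriptors, n_clusters: int = 5):
    """Cluster frame descriptors; consecutive frames sharing a cluster
    label form candidate video segments (scene types)."""
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(descriptors)


if __name__ == "__main__":
    frames = extract_frames("colonoscopy.mp4")   # hypothetical input file
    descriptors = embed_frames(frames)
    labels = segment_video(descriptors)
    print(labels)  # e.g. [0 0 0 2 2 1 1 ...] -> temporal scene segments
```

In practice, the cluster labels per sampled frame can be mapped back to time intervals and inspected to decide which clusters correspond to informative content (e.g., clear visibility or surgery) versus non-informative content.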
Evaluation on complete colonoscopy videos shows good performance in identifying surgery segments and different visibility conditions. The method produces structured overviews that separate useful segments from irrelevant ones, and we illustrate its suitability and benefits as preprocessing for downstream tasks such as 3D reconstruction or video summarization. Our approach enables automated generation of endoscopy overviews, helping clinicians focus on the relevant video content, such as good-visibility sections and surgery actions. The presented work enables faster review of recordings for clinicians and effective video preprocessing for downstream tasks.
Publication: Barbed, O.L., Azagra, P., Plo, J. & Murillo, A. C. (2025). Automated overview of complete endoscopies with unsupervised learned descriptors. International Journal of Computer Assisted Radiology and Surgery. https://doi.org/10.1007/s11548-025-03502-1
Supplementary Information: the online version of the publication contains supplementary material.