In the generative AI era, where even critical medical tasks are increasingly automated, radiology report generation (RRG) continues to rely on suboptimal metrics for quality assessment. Developing domain-specific metrics has therefore been an active area of research, yet it remains challenging due to the lack of a unified, well-defined framework for assessing their robustness and applicability in clinical contexts. To address this, we present CTest-Metric, the first unified metric-assessment framework, with three modules that determine the clinical feasibility of metrics for CT RRG. The modules test: (i) Writing Style Generalizability (WSG) via LLM-based rephrasing; (ii) Synthetic Error Injection (SEI) at graded severities; and (iii) Metrics-vs-Expert correlation (MvE) using clinician ratings on 175 "disagreement" cases. Eight widely used metrics (BLEU, ROUGE, METEOR, BERTScore-F1, F1-RadGraph, RaTEScore, GREEN Score, CRG) are studied across seven LLMs built on a CT-CLIP encoder. Using our novel framework, we find that lexical NLG metrics are highly sensitive to stylistic variations; GREEN Score aligns best with expert judgments (Spearman ρ ≈ 0.70), while CRG shows a negative correlation; and BERTScore-F1 is the least sensitive to factual error injection. We will release the framework, code, and the permissible portion of the anonymized evaluation data (rephrased and error-injected CT reports) to facilitate reproducible benchmarking and future metric development.
Link to paper: https://arxiv.org/abs/2601.11488
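A minimal sketch of the idea behind the MvE module: score candidate reports against references with a lexical metric and measure rank agreement with clinician ratings via Spearman correlation. The reports, ratings, and choice of BLEU here are invented placeholders for illustration, not data or code from the paper.

```python
# Illustrative sketch (not the paper's code) of a Metrics-vs-Expert correlation:
# compute a lexical metric per report pair, then correlate with expert ratings.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from scipy.stats import spearmanr

references = [
    "No focal lung lesion. Mild hepatic steatosis.",
    "Small right pleural effusion. No lymphadenopathy.",
    "Stable 4 mm left upper lobe nodule.",
]
candidates = [
    "Mild fatty liver, lungs clear of focal lesions.",
    "Trace pleural fluid on the right, nodes unremarkable.",
    "Left upper lobe nodule measuring 4 mm, unchanged.",
]
expert_ratings = [4, 3, 5]  # hypothetical clinician scores (1 = poor, 5 = excellent)

smooth = SmoothingFunction().method1
bleu_scores = [
    sentence_bleu([ref.split()], cand.split(), smoothing_function=smooth)
    for ref, cand in zip(references, candidates)
]

rho, p_value = spearmanr(bleu_scores, expert_ratings)
print(f"Spearman rho between BLEU and expert ratings: {rho:.2f} (p = {p_value:.2f})")
```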
Pathologic diagnosis is a critical phase in deciding the optimal treatment procedure for colorectal cancer (CRC). Colonic polyps, precursors to CRC, can be pathologically classified into two major types: adenomatous and hyperplastic. For precise classification and early diagnosis of such polyps, colonoscopy has been widely adopted, paired with various imaging techniques, including narrow band imaging and white light imaging. However, existing classification techniques mainly rely on a single imaging modality and show limited performance due to data scarcity. Recently, generative artificial intelligence has been gaining prominence in overcoming such issues. Additionally, various generation-controlling mechanisms using text prompts and images have been introduced to obtain visually appealing and desired outcomes. However, such mechanisms require class labels to make the model respond efficiently to the provided control input. In the colonoscopy domain, such controlling mechanisms are rarely explored; text prompts in particular are a completely uninvestigated area. Moreover, the unavailability of class-wise labels, which are expensive to obtain for diverse sets of images, limits such explorations. Therefore, we develop a novel model, PathoPolyp-Diff, that generates text-controlled synthetic images with diverse characteristics in terms of pathology, imaging modality, and quality. We introduce cross-class label learning to make the model learn features from other classes, reducing the burdensome task of data annotation. The experimental results show an improvement of up to 7.91% in balanced accuracy using a publicly available dataset. Moreover, cross-class label learning achieves a statistically significant improvement of up to 18.33% in balanced accuracy during video-level analysis. The code is available at https://github.com/Vanshali/PathoPolyp-Diff.
Link to paper: https://arxiv.org/abs/2502.05444
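For readers unfamiliar with text-controlled generation, the following is a generic text-to-image sketch using the Hugging Face diffusers API; the base model id, prompt wording, and sampling settings are assumptions standing in for a colonoscopy-tuned checkpoint such as PathoPolyp-Diff, not the authors' released model.

```python
# Generic Stable Diffusion text-to-image call; the prompt encodes pathology,
# imaging modality, and quality attributes as in text-controlled generation.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # placeholder base model, not PathoPolyp-Diff
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

prompt = "high-quality narrow band imaging colonoscopy frame with an adenomatous polyp"
image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
image.save("synthetic_polyp.png")
```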
In recent years, generative models have become very popular in medical imaging applications because they generate realistic-looking synthetic images, which is crucial for the medical domain. These generated images often complement hard-to-obtain annotated authentic medical data, because acquiring such data requires expensive manual effort from clinical experts and raises privacy concerns. Moreover, with recent diffusion models, the generated data can be controlled through a conditioning mechanism while simultaneously ensuring diversity within synthetic samples. This control allows experts to generate data for different scenarios that would otherwise be hard to obtain. However, how well these models perform for colonoscopy still needs to be explored. Do they preserve clinically significant information in generated frames? Do they help in downstream tasks such as polyp segmentation? Therefore, in this work, we propose ControlPolypNet, a novel Stable Diffusion-based framework. We control the generation process (polyp size, shape, and location) using a novel custom-masked input control, which generates images preserving important endoluminal information. Additionally, our model comprises a detection module, which discards generated images that lack lesion-characterizing features, ensuring clinically relevant data. We further utilize the generated polyp frames to improve performance in the downstream task of polyp segmentation. Using these generated images, we found average improvements of 6.84% and 1.3% (Jaccard index) on the CVC-ClinicDB and Kvasir-SEG datasets, respectively. The source code is available at https://github.com/Vanshali/ControlPolypNet.
Link to paper:
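The downstream gains above are reported as Jaccard index (intersection over union). Below is a minimal, self-contained computation of that score on binary masks; the random masks are placeholders, not the paper's evaluation pipeline.

```python
# Minimal Jaccard index (IoU) on binary segmentation masks.
import numpy as np

def jaccard_index(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Intersection over union of two binary masks."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return float((intersection + eps) / (union + eps))

rng = np.random.default_rng(0)
pred_mask = rng.integers(0, 2, size=(256, 256))  # placeholder predicted mask
true_mask = rng.integers(0, 2, size=(256, 256))  # placeholder ground-truth mask
print(f"Jaccard index: {jaccard_index(pred_mask, true_mask):.3f}")
```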
Colonoscopy video acquisition has increased tremendously for retrospective analysis, comprehensive inspection, and detection of polyps to diagnose colorectal cancer (CRC). However, extracting meaningful clinical information from colonoscopy videos requires an enormous amount of reviewing time, which burdens surgeons considerably. To reduce this manual effort, we propose the first end-to-end automated multi-stage deep learning framework to extract an adequate number of clinically significant frames, i.e., keyframes, from colonoscopy videos.
The proposed framework comprises multiple stages that employ different deep learning models to select keyframes, i.e., high-quality, non-redundant polyp frames capturing multiple views of polyps. In one of the stages, we also propose a novel multi-scale attention-based model, YcOLOn, for polyp localization, which generates ROIs and prediction scores crucial for obtaining keyframes. We further designed a GUI application to navigate through the different stages.
Extensive evaluation in real-world scenarios involving patient-wise and cross-dataset validations shows the efficacy of the proposed approach. The framework removes 96.3% and 94.02% of frames, reduces detection processing time by 38.28% and 59.99%, and increases mAP by 2% and 5% on the SUN database and CVC-VideoClinicDB, respectively. The source code is available at https://github.com/Vanshali/KeyframeExtraction.
Link to paper: https://ieeexplore.ieee.org/abstract/document/10268934
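A toy sketch of one idea behind keyframe selection, discarding near-duplicate frames by thresholding a similarity measure between consecutive kept frames; the histogram-correlation criterion and threshold are assumptions for illustration, not the multi-stage pipeline described above.

```python
# Drop near-duplicate frames by comparing color histograms of consecutive keepers.
import cv2

def extract_candidate_frames(video_path: str, sim_threshold: float = 0.95):
    cap = cv2.VideoCapture(video_path)
    keyframes, prev_hist = [], None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hist = cv2.calcHist([frame], [0, 1, 2], None, [8, 8, 8],
                            [0, 256, 0, 256, 0, 256])
        hist = cv2.normalize(hist, hist).flatten()
        if prev_hist is None or cv2.compareHist(prev_hist, hist,
                                                cv2.HISTCMP_CORREL) < sim_threshold:
            keyframes.append(frame)  # frame differs enough from the previous keeper
            prev_hist = hist
    cap.release()
    return keyframes
```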
Integrating real-time artificial intelligence (AI) systems into clinical practice faces challenges such as scalability and acceptance. These challenges include data availability, biased outcomes, data quality, lack of transparency, and underperformance on unseen datasets from different distributions. The scarcity of large-scale, precisely labeled, and diverse datasets is the major challenge for clinical integration. This scarcity is also due to the legal restrictions and extensive manual effort required for accurate annotation by clinicians. To address these challenges, we present GastroVision, a multi-center open-access gastrointestinal (GI) endoscopy dataset that includes different anatomical landmarks, pathological abnormalities, polyp removal cases, and normal findings (a total of 27 classes) from the GI tract. The dataset comprises 8,000 images acquired from Baerum Hospital in Norway and Karolinska University Hospital in Sweden and was annotated and verified by experienced GI endoscopists. Furthermore, we validate the significance of our dataset with extensive benchmarking based on popular deep learning-based baseline models. We believe our dataset can facilitate the development of AI-based algorithms for GI disease detection and classification. Our dataset is available at https://osf.io/84e7f/.
Link to paper: https://link.springer.com/chapter/10.1007/978-3-031-47679-2_10
arXiv: https://arxiv.org/abs/2307.08140
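A minimal sketch of the kind of baseline benchmarking mentioned above: fine-tuning an ImageNet-pretrained classifier on a 27-class GI image dataset. The directory layout, model choice, and hyperparameters are assumptions, not the paper's benchmarking protocol.

```python
# Fine-tune a ResNet-50 baseline on class-wise image folders (assumed layout).
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

tfms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
train_ds = datasets.ImageFolder("gastrovision/train", transform=tfms)  # hypothetical path
train_dl = torch.utils.data.DataLoader(train_ds, batch_size=32, shuffle=True)

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = nn.Linear(model.fc.in_features, len(train_ds.classes))  # 27 GI classes

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for images, labels in train_dl:  # single epoch shown for brevity
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```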
Link to paper: https://ojs.aaai.org/index.php/AAAI/article/view/27021
Specularity segmentation in colonoscopy images is a crucial pre-processing step for efficient computational diagnosis. The presence of specular highlights can mislead detectors intended for the precise identification of biomarkers. Conventional methods adopted so far do not provide satisfactory results, especially in overexposed regions, and the potential of deep learning methods is still unexplored in this problem domain. Our work aims to provide more accurate highlight segmentation to assist surgeons. In this paper, we propose a novel deep learning-based approach that performs segmentation following a multi-resolution analysis. This is achieved by introducing the discrete wavelet transform (DWT) into the proposed model: we replace the standard pooling layers with DWTs, which helps preserve information and circumvent the effect of overexposed regions. All experiments are performed on a publicly available benchmark dataset, and an F1-score of 83.10% is obtained on the test set. The experimental results show that this technique outperforms state-of-the-art methods and performs significantly better in overexposed regions. The proposed model also outperformed some deep learning models from other domains when tested under our problem specifications. Our method provides segmentation outcomes that are closer to the actual segmentation done by experts, ensuring improved pre-processed colonoscopy images that aid in better diagnosis of colorectal cancer.
Link to paper: https://link.springer.com/article/10.1007/s11042-023-14564-1
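To illustrate the pooling-replacement idea, here is a minimal single-level 2-D Haar DWT layer implemented with depthwise strided convolutions in PyTorch; it is a sketch of the general technique, not the architecture from the paper.

```python
# Single-level Haar DWT "pooling": downsample 2x while keeping all four sub-bands.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HaarDWTPool(nn.Module):
    """Replaces max/avg pooling with a Haar wavelet decomposition."""

    def __init__(self):
        super().__init__()
        ll = torch.tensor([[0.5, 0.5], [0.5, 0.5]])
        lh = torch.tensor([[0.5, 0.5], [-0.5, -0.5]])
        hl = torch.tensor([[0.5, -0.5], [0.5, -0.5]])
        hh = torch.tensor([[0.5, -0.5], [-0.5, 0.5]])
        # One 2x2 filter per sub-band, applied depthwise with stride 2.
        self.register_buffer("filters", torch.stack([ll, lh, hl, hh]).unsqueeze(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        weight = self.filters.repeat(c, 1, 1, 1)       # (4*c, 1, 2, 2)
        return F.conv2d(x, weight, stride=2, groups=c)  # (b, 4*c, h/2, w/2)

pool = HaarDWTPool()
features = torch.randn(1, 16, 64, 64)
print(pool(features).shape)  # torch.Size([1, 64, 32, 32])
```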
Lymph node (LN) detection is a crucial step that complements the diagnosis and treatment involved in cancer investigations. However, the low-contrast structures in CT scan images and the nodes' varied shapes, sizes, and poses, along with their sparsely distributed locations, make detection challenging and lead to many false positives. To overcome these issues, our work provides an automated framework for LN detection that achieves more accurate results with fewer false positives. The proposed work consists of two stages: candidate generation and false-positive reduction. The first stage generates volumes of interest (VOIs) of probable LN candidates using a modified U-Net with a ResNet architecture to obtain high sensitivity, at the cost of increased false positives. The second stage processes the obtained candidate LNs for false-positive reduction using a 3D convolutional neural network (CNN) classifier. Our proposed approach yields sensitivities of 87% at 2.75 false positives per volume (FP/vol.) and 79% at 1.74 FP/vol. on the mediastinal and abdominal datasets, respectively.
Link to paper: https://link.springer.com/article/10.1007/s11548-022-02822-w
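A minimal sketch of the second stage's role, classifying each candidate VOI as lymph node vs. false positive with a small 3D CNN; the VOI size and layer configuration are assumptions for illustration, not the classifier from the paper.

```python
# Toy 3-D CNN for false-positive reduction on candidate VOIs.
import torch
import torch.nn as nn

class VOIClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 8 * 8 * 8, 64), nn.ReLU(),
            nn.Linear(64, 2),  # lymph node vs. false positive
        )

    def forward(self, voi: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(voi))

model = VOIClassifier()
candidate_vois = torch.randn(4, 1, 32, 32, 32)  # batch of 32^3 candidate volumes
print(model(candidate_vois).shape)  # torch.Size([4, 2])
```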
Link to paper: https://openreview.net/pdf?id=-8mexJCWH_-