A deep learning system (DLS) with an online learning system achieves high sensitivity and specificity and provides a more transparent and interpretable diagnosis for detecting glaucomatous optic neuropathy (GON).
To establish a DLS with a high level of general utility for the detection of GON using retinal fundus images and convolutional neural networks (GD-CNN).
A DLS was developed for the automated classification of GON using retinal fundus images. To build and validate GD-CNN, a total of 355 339 fundus images were included; of those, 241 032 images were selected as the training dataset and 114 307 images as the validation dataset. The generalizability of the DLS was tested in several validation datasets. These allowed assessment of the DLS in a clinical setting without exclusions, testing against variable image quality using fundus photographs obtained from websites, evaluation in a population-based study that reflects the natural distribution of glaucoma patients within the cohort, and evaluation in an additional dataset with a diverse ethnic distribution. An online learning system was established to transfer the trained and validated DLS to fundus images from new sources and thereby generalize the results. To better understand the decision-making process of the DLS, a prediction visualization test was performed to identify the regions of the fundus images that the DLS used for diagnosis.
Area under the receiver operating characteristic curve (AUC), sensitivity and specificity of the DLS with reference to professional graders.
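Sensitivity and specificity are computed against the professional graders' labels once the model's output has been thresholded into a binary call. The following is a minimal sketch, assuming labels coded as 1 = GON and 0 = non-GON; the function name is hypothetical and not taken from the study's code.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity); y_true = graders' reference labels,
    y_pred = thresholded model calls, both coded 1 = GON, 0 = non-GON (assumed)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative reference label")
    # Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    graders = [1, 1, 1, 0, 0, 0, 0]  # toy reference labels
    model = [1, 1, 0, 0, 0, 0, 1]    # toy model calls: one false negative, one false positive
    print(sensitivity_specificity(graders, model))
```

In this toy example the model misses one of three GON cases and flags one of four non-GON cases, giving a sensitivity of 2/3 and a specificity of 3/4.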
The AUC of the GD-CNN model in the clinical-based, population-based, multi-image-quality and multi-ethnic validation datasets ranged from 0.823 to 0.996, with sensitivity and specificity ranging from 82.2-96.2% and 70.4.9-97.7%, respectively. The most common reason for both false-negative and false-positive grading, by GD-CNN and by manual grading alike, was pathologic or high myopia.
Figure 2. Receiver Operating Characteristic Curve and Area Under the Curve of the Deep Learning System for GD-CNN in the Local Validation Dataset. The curve is the receiver operating characteristic (ROC) curve, and the orange diamond marks the operating point at which sensitivity and specificity were measured. AUC represents the area under the receiver operating characteristic curve.
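An ROC curve such as the one in Figure 2 is traced by sweeping the decision threshold over the model's scores, and the AUC is the area beneath the resulting points. The sketch below illustrates this, assuming binary labels (1 = GON) and one score per image. The caption does not state how the operating point was chosen, so selecting it by Youden's J statistic (TPR minus FPR) is purely an illustrative assumption, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

RocPoint = Tuple[float, float, float]  # (false-positive rate, true-positive rate, threshold)


def roc_points(y_true: Sequence[int], scores: Sequence[float]) -> List[RocPoint]:
    """Sweep the threshold over every distinct score, highest first; an image is
    called positive when its score is >= the threshold."""
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both positive and negative reference labels")
    points: List[RocPoint] = [(0.0, 0.0, float("inf"))]  # nothing called positive
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if t == 1 and s >= thr)
        fp = sum(1 for t, s in zip(y_true, scores) if t == 0 and s >= thr)
        points.append((fp / neg, tp / pos, thr))
    return points


def auc_trapezoid(points: Sequence[RocPoint]) -> float:
    # Trapezoidal rule over consecutive ROC points (FPR never decreases along the sweep).
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0, _), (x1, y1, _) in zip(points, points[1:]))


def youden_operating_point(points: Sequence[RocPoint]) -> Tuple[float, float, float]:
    """Assumed selection rule: return (threshold, sensitivity, specificity)
    at the point that maximizes Youden's J = TPR - FPR."""
    fpr, tpr, thr = max(points, key=lambda p: p[1] - p[0])
    return thr, tpr, 1.0 - fpr


if __name__ == "__main__":
    pts = roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # toy labels and scores
    print(auc_trapezoid(pts), youden_operating_point(pts))
```

Tied scores share a single threshold, so they produce a diagonal ROC segment. The trapezoidal rule then credits such ties with half a correct ranking, which matches the rank-based (Mann-Whitney) definition of the AUC.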
Figure 3. Performance of the ODL system for glaucoma diagnosis with an increasing number of samples. In our experiments, four groups of samples were obtained sequentially from the tele-ophthalmology platform. The performance of the ODL system was evaluated in terms of sensitivity and specificity for each sample group in sequence. Panel A shows that sensitivity and specificity increase incrementally as the number of diagnosed groups grows. Panel B shows that the P values for sensitivity and specificity decrease consistently as online learning progresses, finally falling below 0.05. This verifies the effectiveness of the ODL system, in which humans (ophthalmologists) help the machine improve its predictive performance and the machine saves humans effort in diagnosing glaucoma-negative samples.
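The sequential protocol in Figure 3 works as follows: each incoming group is first scored by the current model, and the ophthalmologists' confirmed labels are then fed back so the model can update. The sketch below illustrates this loop, assuming the update is incremental and supervised. The source does not describe the ODL update rule. To keep the example self-contained, a small logistic regression trained by stochastic gradient descent on synthetic 2-D "features" stands in for the CNN. `OnlineLogistic`, `run_online_rounds` and `synthetic_group` are hypothetical names, and the P values in panel B are not modeled.

```python
import math
import random
from typing import List, Sequence, Tuple

Sample = Tuple[List[float], int]  # (feature vector, confirmed label: 1 = GON, 0 = non-GON)


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class OnlineLogistic:
    """Hypothetical stand-in for the ODL model: logistic regression updated by SGD."""

    def __init__(self, n_features: int, lr: float = 0.1) -> None:
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x: Sequence[float]) -> float:
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def update(self, samples: Sequence[Sample], epochs: int = 5) -> None:
        # Plain SGD on binary cross-entropy; (p - y) is the gradient w.r.t. the logit.
        for _ in range(epochs):
            for x, y in samples:
                err = self.predict_proba(x) - y
                self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
                self.b -= self.lr * err


def _ratio(num: int, den: int) -> float:
    return num / den if den else float("nan")


def run_online_rounds(model: OnlineLogistic, groups: Sequence[Sequence[Sample]],
                      threshold: float = 0.5) -> List[Tuple[float, float]]:
    """Score each group before learning from it; return [(sensitivity, specificity), ...]."""
    history = []
    for group in groups:
        tp = fn = tn = fp = 0
        for x, y in group:
            pred = 1 if model.predict_proba(x) >= threshold else 0
            if y == 1:
                tp, fn = tp + pred, fn + 1 - pred
            else:
                fp, tn = fp + pred, tn + 1 - pred
        history.append((_ratio(tp, tp + fn), _ratio(tn, tn + fp)))
        model.update(group)  # graders' confirmed labels are fed back to the model
    return history


def synthetic_group(rng: random.Random, n: int) -> List[Sample]:
    # Toy features: positives cluster near (1, 1), negatives near (-1, -1).
    group = []
    for i in range(n):
        y = i % 2
        centre = 1.0 if y else -1.0
        group.append(([rng.gauss(centre, 0.5), rng.gauss(centre, 0.5)], y))
    return group


if __name__ == "__main__":
    rng = random.Random(0)
    groups = [synthetic_group(rng, 50) for _ in range(4)]  # four sequential groups
    for k, (sens, spec) in enumerate(run_online_rounds(OnlineLogistic(2), groups), 1):
        print(f"group {k}: sensitivity={sens:.2f} specificity={spec:.2f}")
```

Evaluating each group before updating on it mirrors the prospective setting: the model never sees a group's confirmed labels until after its predictions for that group have been scored.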
Figure 4. Visualization maps generated from deep features. The visualization map is created after prediction by assigning to each occluded area the softmax probability of the correct label. The map can then be superimposed on the input image to highlight the areas the model considered important in making its diagnosis. (A) Original fundus images. (B) Fundus heatmap overlaid on a fundus image, highlighting pathologic regions.
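The occlusion procedure in Figure 4 slides a masking patch across the image and records, for each patch position, the softmax probability of the correct label. Areas whose occlusion lowers that probability the most are the ones the model relies on. The following is a minimal pure-Python sketch. The patch size, stride, fill value and the toy two-class `toy_model` are illustrative assumptions, not the study's settings.

```python
import math
from typing import Callable, List

Image = List[List[float]]


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def occlusion_map(image: Image, model: Callable[[Image], List[float]], label: int,
                  patch: int = 2, stride: int = 2, fill: float = 0.0) -> Image:
    """Per-pixel mean probability of `label` over all patches covering that pixel;
    low values mark regions the model depends on. Uncovered pixels are NaN."""
    h, w = len(image), len(image[0])
    total = [[0.0] * w for _ in range(h)]
    count = [[0] * w for _ in range(h)]
    for top in range(0, h - patch + 1, stride):
        for left in range(0, w - patch + 1, stride):
            occluded = [row[:] for row in image]  # copy, then mask one patch
            for r in range(top, top + patch):
                for c in range(left, left + patch):
                    occluded[r][c] = fill
            p = softmax(model(occluded))[label]
            for r in range(top, top + patch):
                for c in range(left, left + patch):
                    total[r][c] += p
                    count[r][c] += 1
    return [[total[r][c] / count[r][c] if count[r][c] else float("nan")
             for c in range(w)] for r in range(h)]


def toy_model(img: Image) -> List[float]:
    # Hypothetical classifier whose class-1 logit depends only on the top-left 2x2 block.
    return [0.0, sum(img[r][c] for r in range(2) for c in range(2)) - 2.0]


if __name__ == "__main__":
    heat = occlusion_map([[1.0] * 4 for _ in range(4)], toy_model, label=1)
    for row in heat:
        print(" ".join(f"{v:.2f}" for v in row))
```

In the toy run, only the top-left block receives a low probability, because occluding it removes the evidence the toy model uses. This is the pattern a heatmap overlay in panel B would highlight.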
Figure 5. Training loss and visualization of deep features at different training iterations. (A) Training loss with accuracy. The curve represents the value of the loss function, which decreases as training iterations increase. The training loss converges to small values after 20 × 10⁴ iterations. (B) Feature clustering with the progress of training. The blue dots are the deep features extracted from glaucoma-negative images, while the yellow dots are the deep features of glaucoma-positive images.
Application of GD-CNN to fundus images from different settings and of varying image quality demonstrated high sensitivity, specificity and generalizability for detecting GON. An automated DLS could enhance current screening programs in a cost-effective and time-efficient manner.