Recently, the attention mechanism has been successfully applied in convolutional neural networks (CNNs), significantly boosting the performance of many computer vision tasks. Unfortunately, few medical image recognition approaches incorporate the attention mechanism into CNNs. In particular, fundus images used for glaucoma detection contain high redundancy, so the attention mechanism has the potential to improve the performance of CNN-based glaucoma detection. This paper proposes an attention-based CNN for glaucoma detection (AG-CNN). Specifically, we first establish a large-scale attention-based glaucoma (LAG) database, which includes 5,824 fundus images labeled as either positive glaucoma (2,392) or negative glaucoma (3,432). The attention maps of ophthalmologists are also collected in the LAG database through a simulated eye-tracking experiment. Then, we design a new AG-CNN structure consisting of an attention prediction subnet, a pathological area localization subnet and a glaucoma classification subnet. Unlike other attention-based CNN methods, AG-CNN also visualizes its features as the localized pathological area, which can further improve glaucoma detection. Finally, the experimental results show that the proposed AG-CNN approach significantly advances the state of the art in glaucoma detection.
As shown in Figure 1, a CNN can correctly detect glaucoma when its visualized heat maps are consistent with the attention maps of ophthalmologists in glaucoma diagnosis; otherwise, the CNN model mislabels the image. Therefore, it is reasonable to incorporate the attention mechanism into the CNN model when using fundus images to detect ophthalmic diseases.
Figure 1. Examples of glaucoma fundus images, attention maps by ophthalmologists in glaucoma diagnosis and visualization results of a CNN model by an occlusion experiment.
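The occlusion experiment behind Figure 1 can be approximated by sliding a masking patch over the image and recording how much the predicted glaucoma probability drops at each position. This is a minimal sketch, assuming a hypothetical `predict_prob` callable that returns P(glaucoma) for an H x W x C array; the patch size, stride and fill value are illustrative assumptions, not the settings used for the figure.

```python
import numpy as np


def occlusion_map(image, predict_prob, patch=16, stride=8, fill=0.0):
    """Occlusion sensitivity: drop in predicted glaucoma probability when
    each patch of the image is masked out.

    image: H x W x C float array.
    predict_prob: hypothetical callable mapping an image to P(glaucoma).
    """
    h, w = image.shape[:2]
    base = predict_prob(image)
    heat = np.zeros((h, w))
    counts = np.zeros((h, w))
    for top in range(0, h - patch + 1, stride):
        for left in range(0, w - patch + 1, stride):
            occluded = image.copy()
            # Mask the patch in every colour channel.
            occluded[top:top + patch, left:left + patch] = fill
            drop = base - predict_prob(occluded)
            heat[top:top + patch, left:left + patch] += drop
            counts[top:top + patch, left:left + patch] += 1
    # Average overlapping contributions; pixels never covered stay 0.
    return np.divide(heat, counts, out=np.zeros_like(heat), where=counts > 0)
```

Large values mark regions the classifier depends on, so the resulting heat map can be compared against an ophthalmologist's attention map to check the consistency described above.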
The LAG database contains 11,760 fundus images, corresponding to 4,878 suspicious and 6,882 negative glaucoma samples. All samples are labeled with the diagnosis results (0 refers to negative glaucoma and 1 refers to suspicious glaucoma). Of these, 5,824 fundus images are further labeled with attention regions obtained through a simulated alternative to eye tracking; 2,392 of them are positive glaucoma and the remaining 3,432 are negative glaucoma. The database is available here.
Figure 2. Some samples from our LAG database.
Figure 3. An example of capturing fixations of an ophthalmologist in glaucoma diagnosis.
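This page does not specify how the captured fixations (Figure 3) are converted into continuous attention maps. A common approach in saliency research, assumed here rather than taken from the LAG pipeline, is to place a 2-D Gaussian at each fixation and normalize the sum. The `sigma` value below is an arbitrary illustrative choice.

```python
import numpy as np


def fixations_to_attention_map(fixations, height, width, sigma=20.0):
    """Turn (row, col) fixation points into an attention map in [0, 1]
    by summing an isotropic 2-D Gaussian centred on each fixation."""
    rows = np.arange(height)[:, None]
    cols = np.arange(width)[None, :]
    amap = np.zeros((height, width))
    for r, c in fixations:
        amap += np.exp(-((rows - r) ** 2 + (cols - c) ** 2) / (2.0 * sigma ** 2))
    peak = amap.max()
    # Scale to [0, 1] so maps from different images are comparable.
    return amap / peak if peak > 0 else amap
```

A larger `sigma` spreads each fixation over a wider neighbourhood, which smooths out small positioning errors at the cost of spatial precision.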
The framework of AG-CNN is shown in Figure 4. The input to AG-CNN is the RGB channels of a fundus image, and the outputs are (1) the located pathological area and (2) the binary glaucoma label. AG-CNN has two stages.
Figure 4. Architecture of our AG-CNN network for glaucoma detection and its components, with the sizes of the feature maps and convolutional kernels.
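One way a predicted attention map can be used to suppress redundant regions of the input, assumed here for illustration in the spirit of Figure 4 rather than taken from the AG-CNN implementation, is to upsample it to the image size and multiply it into each colour channel. The sketch assumes the image size is an integer multiple of the attention map size, uses nearest-neighbour upsampling, and introduces a hypothetical `floor` parameter so that unattended regions can be attenuated rather than erased.

```python
import numpy as np


def apply_attention(image, attention, floor=0.0):
    """Weight an H x W x C image by a low-resolution attention map in [0, 1]."""
    h, w = image.shape[:2]
    ah, aw = attention.shape
    if h % ah or w % aw:
        raise ValueError("image size must be a multiple of the attention map size")
    # Nearest-neighbour upsampling by repeating rows and columns.
    up = np.repeat(np.repeat(attention, h // ah, axis=0), w // aw, axis=1)
    # floor = 0 keeps the raw map; floor > 0 never fully zeroes a pixel.
    weights = floor + (1.0 - floor) * up
    # Broadcast the H x W weights over the colour channels.
    return image * weights[:, :, None]
```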
Table 1. Performance of three methods for glaucoma detection on our LAG validation set and the test set of the RIM-ONE database.
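Binary glaucoma detection performance is commonly summarized by accuracy, sensitivity and specificity computed from the predicted and true labels. A minimal plain-Python sketch, following the LAG label convention (1 = glaucoma, 0 = negative):

```python
def detection_metrics(labels, preds):
    """Accuracy, sensitivity and specificity for binary glaucoma labels."""
    pairs = list(zip(labels, preds))
    if not pairs:
        raise ValueError("no samples")
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Fraction of glaucoma cases that are detected.
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        # Fraction of negative cases that are correctly cleared.
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }
```

For example, labels `[1, 1, 1, 0]` with predictions `[1, 1, 0, 0]` give an accuracy of 0.75, a sensitivity of 2/3 and a specificity of 1.0.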
Figure 5. Comparison of ROC curves among different methods. (Left): Testing on our LAG validation set and the RIM-ONE database. (Right): Ablation results.
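An ROC curve such as those in Figure 5 plots the true positive rate against the false positive rate as the decision threshold on the predicted glaucoma score is lowered. A minimal sketch, assuming per-image scores where higher means more likely glaucoma; each distinct score is used as one threshold (so tied scores produce a diagonal segment), and the area under the curve is computed with the trapezoidal rule.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points, one per distinct score used as threshold."""
    num_pos = sum(1 for y in labels if y == 1)
    num_neg = len(labels) - num_pos
    if num_pos == 0 or num_neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    points = [(0.0, 0.0)]
    # An image is called positive when its score is >= the threshold t.
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= t)
        points.append((fp / num_neg, tp / num_pos))
    return points


def auc(points):
    """Trapezoidal area under a curve given as (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

This quadratic-time version is meant for readability; for large evaluation sets, a single pass over the scores sorted in descending order gives the same points.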
Table 2. Ablation results over the validation set of our LAG database.
Figure 6. Comparison of pathological area localization results for glaucoma detection. (1st row): The pathological areas located by ophthalmologists. Optic cup and disc are labeled in blue and the regions of retinal nerve fiber layer defect are labeled in green. (2nd row): The result of our method. (3rd row): The result of the CAM-based method. (4th row): The result of the ablation experiment.
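For reference, a class activation map (CAM) in its usual formulation weights the last convolutional feature maps $f_k$ by the fully connected weights $w_k^c$ of the target class $c$ and sums them, $M_c(x, y) = \sum_k w_k^c f_k(x, y)$. The sketch below assumes a network ending in global average pooling followed by one fully connected layer, and keeps only positive evidence before normalizing, which is a common visualization choice. The exact configuration of the CAM-based baseline in Figure 6 is not given here.

```python
import numpy as np


def class_activation_map(features, fc_weights, class_idx):
    """CAM for one class.

    features: K x h x w feature maps from the last convolutional layer.
    fc_weights: num_classes x K weights of the final fully connected layer.
    """
    # Weighted sum over the K channels -> h x w map.
    cam = np.tensordot(fc_weights[class_idx], features, axes=([0], [0]))
    # Keep positive evidence only, then scale to [0, 1].
    cam = np.maximum(cam, 0.0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam
```

The resulting h x w map is coarser than the input, so it is typically upsampled to the fundus image resolution before being overlaid as in Figure 6.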