A graph-based formulation of aerial images has the potential to describe remote sensing (RS) scenes better [1].
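The sketch below shows one way to build such a graph from an aerial image, a region adjacency graph. It assumes the image has already been partitioned into regions, for example by a superpixel method, and that the partition is given as an integer label map. Each region becomes a node whose features are the mean of its pixels. Two nodes share an edge when their regions touch horizontally or vertically. The function name `region_adjacency_graph` and these feature and connectivity choices are illustrative assumptions, not the construction used in [1].

```python
import numpy as np


def region_adjacency_graph(labels: np.ndarray, image: np.ndarray):
    """Hypothetical helper: build a region adjacency graph from a label map.

    labels: (H, W) int array of region ids 0..N-1 (e.g. from a superpixel method).
    image:  (H, W, C) float array of per-pixel features.
    Returns node features X (N, C) and a symmetric 0/1 adjacency matrix A (N, N).
    """
    n = int(labels.max()) + 1
    channels = image.shape[-1]

    # Node features: mean pixel feature of each region.
    flat_labels = labels.ravel()
    flat_pixels = image.reshape(-1, channels)
    counts = np.bincount(flat_labels, minlength=n).astype(float)
    x = np.zeros((n, channels))
    np.add.at(x, flat_labels, flat_pixels)
    x /= np.maximum(counts, 1.0)[:, None]

    # Edges: two regions are adjacent if any of their pixels touch
    # horizontally or vertically (4-connectivity).
    a = np.zeros((n, n), dtype=int)
    neighbour_pairs = [
        (labels[:, :-1], labels[:, 1:]),  # horizontal neighbours
        (labels[:-1, :], labels[1:, :]),  # vertical neighbours
    ]
    for u, v in neighbour_pairs:
        mask = u != v
        a[u[mask], v[mask]] = 1
        a[v[mask], u[mask]] = 1
    return x, a
```

For a 4x4 label map split into four quadrant regions, the result links each quadrant to its horizontal and vertical neighbours but not to the diagonally opposite quadrant.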
We establish that graph convolution-based networks outperform CNNs on both single-label and multi-label aerial image classification and retrieval tasks [2].
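A graph convolution propagates each node's features to its neighbours before projecting them. The following is a minimal sketch of the widely used symmetric-normalised propagation rule, ReLU(D^-1/2 (A + I) D^-1/2 X W). It is followed by mean pooling into a scene-level embedding and a linear head. The head uses a softmax for single-label classification and independent sigmoids for multi-label classification. The one-layer architecture, the function names, and the weight matrices `w1` and `w2` are hypothetical. This sketch does not reproduce the network evaluated in [2].

```python
import numpy as np


def normalize_adjacency(a: np.ndarray) -> np.ndarray:
    """Symmetric normalisation D^-1/2 (A + I) D^-1/2, with self-loops added."""
    a_hat = a + np.eye(a.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(a_norm: np.ndarray, x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """One graph convolution: aggregate neighbour features, project, apply ReLU."""
    return np.maximum(a_norm @ x @ w, 0.0)


def classify_graph(a, x, w1, w2, multi_label=False):
    """Hypothetical scene classifier: GCN layer -> mean pooling -> linear head."""
    h = gcn_layer(normalize_adjacency(a), x, w1)
    g = h.mean(axis=0)  # graph-level embedding of the whole scene
    logits = g @ w2
    if multi_label:
        # Independent per-class probabilities: several labels may be present.
        return 1.0 / (1.0 + np.exp(-logits))
    # Single-label: one probability distribution over the classes.
    e = np.exp(logits - logits.max())
    return e / e.sum()
```

In this sketch the single-label and multi-label settings differ only in the output nonlinearity. A softmax forces the classes to compete, whereas sigmoids allow several labels to be present in the same scene at once.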
Moreover, the class decision is driven mainly by the most important areas within a region and by their neighborhoods (edges), so the model should pay more attention to them. We propose a novel edge attention mechanism for this purpose [3].
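To illustrate attention over edges, the sketch below uses graph attention network (GAT)-style scoring. It stands in for the edge attention mechanism of [3], whose exact form is not described here. Each edge (i, j) is scored by applying a learned vector to the concatenated projected features of its two endpoints. The scores are normalised with a softmax over each node's neighbours, so informative neighbours contribute more to the aggregated features. The function name and the parameters `w`, `att`, and `slope` are assumptions.

```python
import numpy as np


def edge_attention(a, x, w, att, slope=0.2):
    """GAT-style attention over edges (a generic stand-in, not the method of [3]).

    a:   (N, N) 0/1 adjacency; self-loops are added so each node attends to itself.
    x:   (N, F) node features; w: (F, F') projection; att: (2 * F',) scoring vector.
    Returns attention weights alpha (N, N) and updated node features (N, F').
    """
    h = x @ w
    f = h.shape[1]
    # Score every ordered pair (i, j) as att . [h_i || h_j], then apply LeakyReLU.
    scores = (h @ att[:f])[:, None] + (h @ att[f:])[None, :]
    scores = np.where(scores > 0, scores, slope * scores)
    # Keep only real edges (plus self-loops) before the softmax.
    mask = (a + np.eye(a.shape[0])) > 0
    scores = np.where(mask, scores, -np.inf)
    scores -= scores.max(axis=1, keepdims=True)
    alpha = np.exp(scores)
    alpha /= alpha.sum(axis=1, keepdims=True)
    # Each node aggregates its neighbours in proportion to the edge weights.
    return alpha, alpha @ h
```

Setting `att` to zeros makes every neighbour equally important. This reduces the layer to plain mean aggregation over each node's neighbourhood, including the node itself. Learning `att` is what lets the model emphasise the edges that drive the class decision.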
We develop deep-learning-based models for cross-modal retrieval in RS. Some of the important cross-modal retrieval applications in RS are:
- SAR ↔ Multispectral
- RGB ↔ Depth
- Image ↔ Speech
- Panchromatic ↔ Multispectral
- Hyperspectral ↔ LiDAR
- Image ↔ Sketch
Cross-modal retrieval can include cross-sensor [4], cross-media [5], and cross-resolution [6] retrieval.
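A common way to set up cross-modal retrieval is to map each modality into a shared embedding space and rank gallery items by their similarity to the query. The sketch below assumes that modality-specific encoders have already produced feature vectors, for instance for a SAR query and a multispectral gallery. Plain matrices `proj_query` and `proj_gallery` stand in for the learned projection heads. These names, the cosine-similarity ranking, and the precision@k metric are illustrative choices, not the specific models developed here.

```python
import numpy as np


def embed(features: np.ndarray, projection: np.ndarray) -> np.ndarray:
    """Map modality-specific features into the shared space and L2-normalise."""
    z = features @ projection
    return z / np.linalg.norm(z, axis=1, keepdims=True)


def retrieve(query_feats, gallery_feats, proj_query, proj_gallery, k=5):
    """Rank gallery items of one modality for queries from another modality.

    proj_query / proj_gallery are hypothetical stand-ins for learned encoders' heads.
    Returns the indices of the top-k gallery items per query, by cosine similarity.
    """
    q = embed(query_feats, proj_query)
    g = embed(gallery_feats, proj_gallery)
    sims = q @ g.T  # cosine similarity matrix of shape (Q, G)
    return np.argsort(-sims, axis=1)[:, :k]


def precision_at_k(ranked, query_labels, gallery_labels):
    """Fraction of retrieved gallery items that share the query's class label."""
    hits = gallery_labels[ranked] == query_labels[:, None]
    return hits.mean()
```

L2-normalising both sides turns the dot product into a cosine similarity. This keeps queries and gallery items comparable even when the two sensors produce features on different scales.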
In practical applications, a model is trained on seen classes (e.g., cauliflower); however, it may encounter unseen classes (e.g., broccoli) upon deployment.
How can we handle such situations and deploy robust models? By developing zero-shot retrieval models!
To this end, we leverage the semantic information of classes [7, 8, 9].
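One simple way to use class semantics for zero-shot retrieval is to learn a mapping from visual features into a semantic space using the seen classes only. Gallery images are then ranked against the semantic vector of an unseen class. The sketch below uses a closed-form ridge-regression mapping, a classic zero-shot baseline substituted here for illustration. It assumes every class is described by a semantic vector, for example attributes or a word embedding of the class name. The function names and the regularisation weight `lam` are hypothetical, and the sketch does not reproduce the methods of [7, 8, 9].

```python
import numpy as np


def fit_semantic_projection(x_seen, y_seen, class_emb, lam=1.0):
    """Ridge regression from visual features to class semantic vectors.

    Fitted on seen classes only. x_seen: (N, D) features; y_seen: (N,) labels that
    index rows of class_emb (C, S). Solves min ||X W - S_y||^2 + lam ||W||^2.
    """
    s_target = class_emb[y_seen]
    d = x_seen.shape[1]
    return np.linalg.solve(x_seen.T @ x_seen + lam * np.eye(d), x_seen.T @ s_target)


def zero_shot_rank(gallery_x, w, class_vec):
    """Rank gallery images for a class known only by its semantic vector (cosine)."""
    z = gallery_x @ w
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    c = class_vec / np.linalg.norm(class_vec)
    return np.argsort(-(z @ c))
```

For example, suppose two seen classes are described by [1, 0] and [0, 1] and an unseen class by [1, 1]. After fitting on one sample of each seen class, ranking the gallery [[1, 0], [0, 1], [1, 1]] for the unseen class puts the third image first, even though no sample of that class was used to fit the mapping.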
The performance of a deep-learning-based model primarily relies on the diversity and size of the training dataset. However, obtaining such a large amount of labeled data for practical remote sensing (RS) applications is expensive and labor-intensive.
Training protocols have previously been proposed for few-shot learning (FSL) and zero-shot learning (ZSL). However, FSL cannot handle data from unobserved classes at the inference phase, while ZSL requires many training samples from the seen classes. In this work, we propose a novel training protocol for image retrieval, which we name label-deficit zero-shot learning (LDZSL), and apply it to the challenging task of cross-sensor data retrieval in RS. The protocol uses very few labeled samples from the seen classes during training and handles samples from unobserved classes at the inference phase. This is critical because some data modalities are hard to annotate without domain experts [10].
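The data side of the protocol can be sketched as a split over sample indices. Seen and unseen classes are disjoint, and only a handful of labeled samples per seen class are kept for training. Every unseen-class sample is held out for inference, where it can serve as a query or gallery item for cross-sensor retrieval. The helper `ldzsl_split` and its `shots_per_class` and `seed` parameters are hypothetical. The text does not specify how many labeled samples LDZSL keeps or how the classes are partitioned.

```python
import random
from collections import defaultdict


def ldzsl_split(labels, unseen_classes, shots_per_class, seed=0):
    """Hypothetical label-deficit zero-shot split over sample indices.

    labels: list of class labels, one per sample.
    unseen_classes: classes withheld entirely from training.
    shots_per_class: labeled samples kept per seen class (very few).
    Returns (train_idx, test_idx): training uses only a few labeled seen-class
    samples; the test set contains only unseen-class samples.
    """
    rng = random.Random(seed)
    unseen = set(unseen_classes)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    train_idx, test_idx = [], []
    for y, idx in sorted(by_class.items()):
        if y in unseen:
            # Unseen classes are never observed during training.
            test_idx.extend(idx)
        else:
            # Label deficit: keep only a few labeled samples of each seen class.
            train_idx.extend(rng.sample(idx, min(shots_per_class, len(idx))))
    return sorted(train_idx), sorted(test_idx)
```

For example, with ten samples in each of three classes and class `"c"` held out, `ldzsl_split(labels, ["c"], 2)` keeps two labeled samples each from `"a"` and `"b"` for training and returns all ten `"c"` samples for inference.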