This page lists our publicly available datasets for remote sensing image retrieval, scene classification, semantic segmentation, land use scene classification, and change detection.

PatternNet

PatternNet is a large-scale, high-resolution remote sensing dataset collected for remote sensing image retrieval. It contains 38 classes, and each class has 800 images of 256×256 pixels. The images are collected from Google Earth imagery or via the Google Map API for some US cities; the spatial resolution varies from class to class.
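For reference, here is a minimal sketch of iterating over the archive in Python. It assumes the archive unpacks into one folder per class containing JPEG images; the root path, folder layout, and file extension are assumptions, not part of the official specification:

```python
from pathlib import Path

# Assumed layout: one sub-folder per class under the archive root.
root = Path("PatternNet/images")  # hypothetical path

for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
    images = sorted(class_dir.glob("*.jpg"))
    # Expect 38 class folders with 800 images of 256x256 pixels each.
    print(f"{class_dir.name}: {len(images)} images")
```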

Note: One image (closedroad361.jpg) in the closed road class is mislabeled; it does not actually contain a closed road. It is an image tile adjacent to the closed-road tile in the larger imagery. This has very little effect on retrieval performance.

Download this dataset: PatternNet

If you use this dataset in your research, please cite the following work:

Zhou, W., Newsam, S., Li, C., & Shao, Z. (2018). PatternNet: A benchmark dataset for performance evaluation of remote sensing image retrieval. ISPRS Journal of Photogrammetry and Remote Sensing, 145, 197-209.

The following are some other publicly available datasets:

DLRSD

DLRSD [1] is a dense labeling dataset that can be used for multi-label tasks such as remote sensing image retrieval (RSIR) and classification, as well as for pixel-based tasks such as semantic segmentation (also called classification in remote sensing). DLRSD contains the same 21 broad categories as the UC Merced archive, with 100 images per class. We labeled the pixels of each image in the UC Merced archive with the following 17 class labels: airplane, bare-soil, buildings, cars, chaparral, court, dock, field, grass, mobile-home, pavement, sand, sea, ship, tanks, trees, and water. These 17 class labels were first constructed and defined for the multi-label RSIR archive [2], in which each image of the UC Merced archive is provided with a set of multiple labels. During the labeling of DLRSD, we corrected some multi-labels [2] that visual inspection showed to be inaccurate, and then used the revised multi-labels as a reference when manually labeling the pixels of each image. DLRSD is therefore an extension of the UC Merced archive and, in particular, of the multi-label RSIR archive.
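For convenience, the 17 labels can be mapped to integer indices for use with dense masks. The sketch below is only illustrative: the index order and the convention for unlabeled pixels are assumptions that should be checked against the dataset documentation.

```python
# The 17 DLRSD class labels as listed above. Mapping them to indices
# 1..17 (0 reserved for unlabeled pixels) is an assumption; verify it
# against the dataset's own documentation before use.
DLRSD_CLASSES = [
    "airplane", "bare-soil", "buildings", "cars", "chaparral", "court",
    "dock", "field", "grass", "mobile-home", "pavement", "sand", "sea",
    "ship", "tanks", "trees", "water",
]
LABEL_TO_INDEX = {name: i + 1 for i, name in enumerate(DLRSD_CLASSES)}
INDEX_TO_LABEL = {i: name for name, i in LABEL_TO_INDEX.items()}
```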

Note: 

The sizes of some images in the UC Merced archive are not exactly 256×256 pixels as stated; we resized those images to 256×256 before labeling the pixels.

Download this dataset: DLRSD

DLRSD is also available for multi-label tasks such as multi-label scene classification and multi-label image retrieval.

Download the multi-label version of DLRSD: DLRSD_multilabel

If you use DLRSD in any resulting publications, please cite our work [1] and the work that provides the 17 class labels [2].

WHDLD

WHDLD is a dense labeling dataset that can be used for multi-label tasks such as remote sensing image retrieval (RSIR) and classification, as well as for pixel-based tasks such as semantic segmentation (also called classification in remote sensing). We labeled the pixels of each image with the following six class labels: building, road, pavement, vegetation, bare soil, and water.
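As one example of a pixel-based use, the sketch below tallies per-class pixel counts from a dense-label mask. It assumes masks are stored as single-channel images whose pixel values are integer class indices; both the index order and the mask format are assumptions to verify against the released files.

```python
import numpy as np
from PIL import Image

# The six WHDLD classes as listed above; the index order and the
# single-channel mask format are assumptions.
WHDLD_CLASSES = ["building", "road", "pavement", "vegetation", "bare soil", "water"]

def class_pixel_counts(mask_path):
    """Count pixels per class value in one dense-label mask."""
    mask = np.asarray(Image.open(mask_path))
    values, counts = np.unique(mask, return_counts=True)
    return dict(zip(values.tolist(), counts.tolist()))
```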

Download this dataset: WHDLD

WHDLD is also available for multi-label tasks such as multi-label scene classification and multi-label image retrieval.

Download the multi-label version of WHDLD: WHDLD_multilabel

If you use WHDLD in any resulting publications, please cite the relevant works.

CVGD

CVGD (cross-view between ground and drone) is a dataset collected for cross-view geo-localization (also called cross-view image retrieval) between ground and drone images. The images in CVGD are collected from 100 locations at a university. Because a real cross-view task may involve more than one ground-drone image pair depicting the same location from different viewpoints, we collected 2-6 drone images and 2-7 ground images per location.

Download this dataset: CVGD

If you use CVGD in any resulting publications, please cite the following work:

W. Zhou, H. Guan, Z. Li, Z. Shao and M. R. Delavar, "Remote Sensing Image Retrieval in the Past Decade: Achievements, Challenges, and Future Directions," in IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 16, pp. 1447-1473, 2023, doi: 10.1109/JSTARS.2023.3236662.

MtSCCD

MtSCCD (Multi-temporal Scene Classification and Change Detection) is a large-scale dataset constructed for land use scene classification and change detection. To construct MtSCCD, bi-temporal imagery of the central areas of five Chinese cities (Hangzhou, Hefei, Nanjing, Shanghai, and Wuhan) was acquired from World Imagery. The spatial resolution is about 1.01 meters.

The large-size imagery of each city is cropped into image patches (i.e., land use scenes, hereafter called images) of 300×300 pixels. Each image in MtSCCD is named “xx_yyyymm_nx_ny_c”, where “xx” is an acronym for the city name, “yyyymm” is the year and month in which the large-size imagery was acquired, “nx” and “ny” are the row and column of the image in the large-size imagery, and “c” is the class label. The images in MtSCCD are categorized into 10 land use classes: residential land, public service and commercial land, educational land, industrial land, transportation land, agricultural land, water body, green space, woodland, and bare land. The classification standard follows the “Code for classification of urban land use and planning standards of development land (GB50137-2011)” [1], the PKU-USED dataset [2], the MtS-WH dataset [3], and the WH-MAVS dataset [4].
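A minimal sketch of parsing this naming convention in Python follows; the exact separators, field formats, and the example filename are assumptions based on the description above:

```python
import re
from pathlib import Path

# Pattern for the stated naming convention "xx_yyyymm_nx_ny_c".
# The field formats (e.g., whether the label "c" is numeric or textual)
# are assumptions; check them against the released files.
FILENAME_RE = re.compile(
    r"^(?P<city>[A-Za-z]+)_(?P<date>\d{6})_(?P<nx>\d+)_(?P<ny>\d+)_(?P<label>[^_]+)$"
)

def parse_mtsccd_name(path):
    """Split an MtSCCD filename, e.g. 'HZ_201503_12_34_7.png' (hypothetical)."""
    m = FILENAME_RE.match(Path(path).stem)
    if m is None:
        raise ValueError(f"unexpected filename: {path}")
    return {
        "city": m.group("city"),    # city acronym
        "date": m.group("date"),    # acquisition year and month, yyyymm
        "nx": int(m.group("nx")),   # row in the large-size imagery
        "ny": int(m.group("ny")),   # column in the large-size imagery
        "label": m.group("label"),  # class label
    }
```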

MtSCCD consists of two subsets: MtSCCD_LUSC (MtSCCD land use scene classification) and MtSCCD_LUCD (MtSCCD land use scene change detection).

Download this dataset: MtSCCD

The MtSCCD_LUSC subset is used for land use scene classification. To construct MtSCCD_LUSC, the bi-temporal images of each class of each city are combined. The images of each class from Hangzhou, Shanghai, and Wuhan are then split into training and validation sets at a ratio of 80% to 20%, respectively, and the images of Hefei and Nanjing are used as testing sets A and B, respectively.

The MtSCCD_LUCD subset is used for land use change detection. To construct MtSCCD_LUCD, for Hangzhou, Shanghai, and Wuhan, the first-temporal (T1) images of each class and the second-temporal (T2) images of each class are combined to obtain the T1 and T2 images of each city. The T1 and T2 images are then split into training and validation sets at a ratio of 80% to 20%, respectively. Note that each T1 image must correspond to a T2 image in the training and validation sets; in other words, every bi-temporal image pair must depict the same location. The images of Hefei and Nanjing are used as testing sets A and B, respectively. When using this dataset, the corresponding T1 and T2 scene pairs can be identified via “nx_ny” in the filename “xx_yyyymm_nx_ny_c”, as in the sketch below.
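Here is a minimal sketch of recovering (T1, T2) pairs from a folder of MtSCCD_LUCD images. It assumes filenames follow the convention above and that each city has exactly two acquisition dates, the earlier one being T1; both assumptions should be checked against the released files.

```python
from collections import defaultdict
from pathlib import Path

def pair_t1_t2(image_dir):
    """Group MtSCCD_LUCD images into (T1, T2) pairs by grid location.

    Assumes filenames follow "xx_yyyymm_nx_ny_c" and that each city has
    exactly two acquisition dates, the earlier being T1.
    """
    by_location = defaultdict(list)
    for path in sorted(Path(image_dir).glob("*")):
        if not path.is_file():
            continue
        parts = path.stem.split("_")
        if len(parts) != 5:
            continue  # skip files that don't match the naming scheme
        city, date, nx, ny, _label = parts
        by_location[(city, nx, ny)].append((date, path))

    pairs = []
    for key, items in sorted(by_location.items()):
        if len(items) == 2:
            items.sort()  # earlier acquisition date first
            pairs.append((items[0][1], items[1][1]))  # (T1 path, T2 path)
    return pairs
```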

If you use MtSCCD in any resulting publications, please cite the relevant works.

References:

[1] http://www.risn.org.cn/Xxbz/ShowForceStandard.aspx?Guid=61387

[2] http://geoscape.pku.edu.cn/

[3] http://sigma.whu.edu.cn/newspage.php?q=2019_03_26

[4] http://sigma.whu.edu.cn/newspage.php?q=2021_06_27