3. Classification

This chapter presents good practice recommendations for all stages of producing land cover and land use maps, from creating the satellite image mosaics to filters and post-classification processing, as well as for generating products derived from the final classifications, documentation and the process of improving collections.

1. Mosaic

1.1 Image mosaicing and pre-processing

1.2 Feature space

2. Samples

3. Algorithm and model parameters

4. Filters and post-classification processing

5. Maps integration

6. Modules and products derived from annual land cover and land use maps

7. Documentation

8. Improvement process

Figure 1: Main stages of the annual mapping of land cover and land use and the generation of other derived modules and products.

1. Mosaic

This section provides guidelines for the creation, storage and availability of image mosaics.

1.1 Image mosaicing and pre-processing

For each satellite image scene, masks must be applied to remove pixels contaminated by clouds and cloud shadows, and the scene edges must also be removed;
Annual mosaics can be created using all scenes of the year or by choosing a specific period of the year that maximizes the contrast between the classes;
Excessively noisy scenes can be excluded during the generation of annual mosaics;
The length of the time series will be determined by the quantity and quality of images available, but should aim to include the most comprehensive time interval possible;
Mosaic boundaries can follow the scenes or a cartographic division considered operational (e.g., the 1:250,000 systematic mapping grid);
Annual mosaics must be stored and made available as a product that platform users can use and visualize;
Some classes may need specific mosaics, which can be processed in memory during classification;
Images from different sensors (Landsat 5, 7, 8 and 9) can be combined to increase the availability of information.
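
The per-pixel logic behind an annual mosaic can be sketched in plain Python, assuming each observation already carries cloud, shadow and scene-edge flags produced by upstream masks. The `Observation` structure and the `annual_composite` function are hypothetical names for illustration. The sketch keeps only unmasked observations inside an optional month window and takes their median, pooling all sensors without any cross-sensor adjustment:

```python
from dataclasses import dataclass
from datetime import date
from statistics import median


@dataclass
class Observation:
    """One scene observation of a single pixel (hypothetical structure)."""
    acquired: date
    sensor: str          # e.g. "L5", "L7", "L8", "L9"
    reflectance: float   # value of one band
    cloud: bool          # True if flagged by the cloud mask
    shadow: bool         # True if flagged by the cloud-shadow mask
    scene_edge: bool     # True if the pixel lies on the scene edge


def annual_composite(observations, year, start_month=1, end_month=12):
    """Median of the valid observations of one pixel in one year.

    Only observations acquired between start_month and end_month (inclusive)
    are used, so the mosaic can be restricted to the period that maximizes
    the contrast between classes. Returns None when no valid observation
    remains, leaving the gap explicit for later gap filling.
    """
    valid = [
        obs.reflectance
        for obs in observations
        if obs.acquired.year == year
        and start_month <= obs.acquired.month <= end_month
        and not (obs.cloud or obs.shadow or obs.scene_edge)
    ]
    return median(valid) if valid else None
```

Returning None for pixels without valid observations, instead of an arbitrary value, keeps unobserved pixels distinguishable from real measurements.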

1.2 Feature space

It is recommended to use a wide range of variables in the land cover and land use classification process;
Each variable can be summarized by complementary statistics: central tendency (median) and intra-annual variability (amplitude, standard deviation, minimum and maximum values);
In addition to the reflectance bands, the following can be used:
- spectral indices (NDVI, PRI, CAI, NBR, among others);
- metrics computed over specific time intervals for regions or themes (e.g., for the Pantanal, the interval from the end of the flood (March) to the beginning of the rains (October); for agriculture, the harvest calendar);
- texture images;
- image fractions from the spectral mixture model and indices generated from the fractions;
- segmentation bands;
- terrain data (e.g. elevation, slope, slope orientation, vertical distance from the nearest drainage (HAND), etc.);
- other complementary data, such as annual burned area, time since the last fire and geolocation descriptors (latitude and longitude).
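
The complementary statistics listed above can be illustrated for a single pixel with the sketch below, which computes NDVI from red and near-infrared values and summarizes any intra-annual series by its median, minimum, maximum, amplitude and standard deviation. The function names and the `<prefix>_<statistic>` naming of the output bands are assumptions for illustration:

```python
from statistics import median, pstdev


def ndvi(nir, red):
    """NDVI = (NIR - red) / (NIR + red); returns 0.0 when both are zero."""
    total = nir + red
    return (nir - red) / total if total else 0.0


def temporal_metrics(values, prefix):
    """Median, minimum, maximum, amplitude and population standard deviation
    of one variable's valid observations for one pixel in one year."""
    if not values:
        return {}
    low, high = min(values), max(values)
    return {
        f"{prefix}_median": median(values),
        f"{prefix}_min": low,
        f"{prefix}_max": high,
        f"{prefix}_amp": high - low,
        f"{prefix}_stdDev": pstdev(values),
    }
```

For example, `temporal_metrics([ndvi(n, r) for n, r in zip(nir, red)], "ndvi")` would yield five NDVI bands for a pixel; repeating it for each reflectance band and index builds up the feature space.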

2. Samples

This section provides guidelines for the creation, filtering, balancing, and reviewing of samples.

Stable pixel mask: to facilitate the extraction of annual training samples, it is suggested to build a mask of stable pixels, i.e., pixels assigned to the same class in every year of the time series of a previous collection; training samples drawn from this mask are called “stable samples”. If no previous collection is available, a similar approach can be applied using other relevant available classifications or by creating a new classification based on reference maps, although additional processing and legend harmonization will likely be needed. This temporal consistency gives greater confidence that the same pixel can be used as a training sample for all years of the series, which increases the efficiency of the classifier and reduces large variations in the time series;
Stable pixel filter: it is recommended to filter stable pixels using reference maps (e.g., excluding stable pixels of native vegetation that coincide with deforestation reference maps) and outlier analysis (see an example here);
Balancing and drawing stable samples: the number of samples drawn for each class should take into account how much of the classification region the class occupies. Rare classes must be well represented while still respecting the proportion of area that each class occupies in the landscape;
Number of stable samples: beyond a certain number of samples, the quality of the classified map stabilizes (Figure 2A), so using an excessively large number of samples is not recommended;
Quality of stable samples: it is important to pay attention to the quality of the samples, preferably using random points, which can be drawn over the stable pixel mask in a regionalized approach or over a large number of small polygons delimited over selected areas, always seeking to ensure a good spatial distribution in the classification region, representing the spectral heterogeneity of the class;
Regionalization of the classification: working with smaller and more homogeneous regions allows better control in sample collection, balancing and evaluation of results;
It is recommended that samples extracted from the images of a given year be used only to classify that same year. This prevents, for example, samples collected in a drier year from being used to classify a wetter year;
Complementary samples: samples collected during the classification process. They can be based on field knowledge, higher-resolution images, literature or reference maps, and are important, for example, to complement the number of stable samples, to increase the spectral representation of a class or to add a new class;
It is important to observe whether the complementary samples are valid for the entire time series or for a specific period and use them only when classifying the corresponding years;
It is recommended to carry out sample reviews and balancing in a cyclical process of improvement, progressively generating classification versions, evaluating the results and making adjustments during this process;
Stable samples can be reviewed and generated again after each collection is published.
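
The stable-sample workflow described above (stable pixel mask, exclusion by reference maps or outlier analysis, then balanced random drawing) can be sketched in plain Python. Pixel ids, class ids, the minimum number of samples per class and the proportional-allocation rule are illustrative assumptions, not the project's exact procedure:

```python
import random
from collections import defaultdict


def stable_pixels(collection, excluded=frozenset()):
    """Map pixel id -> class for pixels with one class in every year.

    collection: {pixel_id: [class in year 1, class in year 2, ...]} taken
    from a previous collection. excluded: pixel ids rejected by reference
    maps (e.g. deforestation polygons) or by an outlier analysis.
    """
    return {
        pid: trajectory[0]
        for pid, trajectory in collection.items()
        if trajectory and len(set(trajectory)) == 1 and pid not in excluded
    }


def allocate_samples(class_area, total, min_per_class):
    """Samples per class proportional to class area, with a floor for rare classes.

    Because of the floor, the sum can slightly exceed `total`.
    """
    total_area = sum(class_area.values())
    return {
        cls: max(min_per_class, round(total * area / total_area))
        for cls, area in class_area.items()
    }


def draw_samples(stable, allocation, seed=42):
    """Randomly draw up to the allocated number of stable pixels per class."""
    rng = random.Random(seed)  # fixed seed keeps the draw reproducible
    by_class = defaultdict(list)
    for pid, cls in stable.items():
        by_class[cls].append(pid)
    return {
        cls: rng.sample(sorted(by_class[cls]), min(n, len(by_class[cls])))
        for cls, n in allocation.items()
    }
```

In a real workflow `class_area` would be the area each class occupies in the classification region, and the draw would be repeated per region to keep a good spatial distribution.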

Figure 2: Expected relationships between map quality and (A) the number of samples, (B) the number of bands and (C) the number of trees.
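
Figure 2 describes quality curves that level off. Given a curve measured by the analyst (for example, accuracy on independent validation points for increasing numbers of samples, bands or trees), one hedged way to locate the plateau is to take the smallest value beyond which no later point improves quality by the tolerance or more. The function name and default tolerance are assumptions:

```python
def plateau_point(curve, tolerance=0.005):
    """Smallest x after which no later point improves quality by >= tolerance.

    curve: list of (x, quality) pairs sorted by x, where x is the number of
    samples, bands or trees and quality is an accuracy measure computed by
    the analyst on independent validation data.
    """
    for i, (x, quality) in enumerate(curve):
        later = [q for _, q in curve[i + 1:]]
        # all() is True for the last point, so a non-empty curve always returns here.
        if all(q - quality < tolerance for q in later):
            return x
    return None  # only reached for an empty curve
```

The tolerance expresses how much extra quality is worth the additional processing cost, so it should be chosen per project rather than taken from this sketch.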

3. Algorithm and model parameters

This section provides guidelines for the choice of algorithms, the analysis of variable importance and the number of trees (in the case of Random Forest).

Given the size of the territory, the heterogeneity of the landscape and of the mapped classes, and the amount of information in the feature space, artificial intelligence algorithms present better results. Two of them stand out: Random Forest, when the spectral information of the pixel is more important, and Neural Networks, when context information is more important;
Feature Space: it is recommended to carry out an analysis of the importance of the variables used in the classification (reflectance bands, indices, etc.) to select those that optimize the classification in terms of quality (e.g. accuracy, spatial and temporal consistency) and make computational processing more efficient. In addition to the variables indicated in this analysis, others considered important for the classification of rare classes can be included (it is recommended to use between 20 and 60 bands - Figure 2B);
Number of trees: for the Random Forest classifier, after a certain number of trees the quality of the classified map stabilizes (Figure 2C). Between 50 and 200 trees are recommended.
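
The variable importance analysis can feed a simple selection rule such as the one sketched below: rank variables by importance, keep them until a chosen share of the total importance is reached, clamp the count to the recommended 20 to 60 bands, and then add variables judged important for rare classes. The cumulative-share criterion and the parameter names are assumptions, not a prescribed method:

```python
def select_features(importance, must_keep=(), share=0.9, min_bands=20, max_bands=60):
    """Select variables by cumulative importance, clamped to [min_bands, max_bands].

    importance: {variable_name: importance score}, e.g. from a Random Forest
    variable-importance report. must_keep: variables added regardless of
    rank (e.g. useful for rare classes); they may push the total above
    max_bands.
    """
    ranked = sorted(importance, key=importance.get, reverse=True)
    total = sum(importance.values())
    selected, cumulative = [], 0.0
    for name in ranked:
        enough = cumulative >= share * total and len(selected) >= min_bands
        if enough or len(selected) >= max_bands:
            break
        selected.append(name)
        cumulative += importance[name]
    for name in must_keep:
        if name in importance and name not in selected:
            selected.append(name)
    return selected
```

Keeping the rare-class variables outside the ranking avoids losing bands that matter for small classes but contribute little to overall importance.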

4. Filters and post-classification processing

This section describes guidelines for the use of filters and other procedures after the classification.

Gap Fill: the gap fill filter fills in information in regions that were not observed in certain years (lack of images). It is important to apply this filter so that the quantitative data (e.g., areas) for each class are comparable over the period. The first option for filling a region without information in a given year is to repeat the class from the previous year; areas not observed in the first years of the series can be filled with the class from the subsequent year with available information;
Spatial Filter: the spatial filter removes the “salt & pepper” effect from the map and influences the definition of the minimum mapped area. In maps produced from 30 m resolution images, patches smaller than a threshold typically between 0.5 ha (6 pixels) and 1 ha (11 pixels) are replaced by the mode of the surrounding pixels;
Temporal Filter: used to remove transitions between classes that are considered very unlikely along the time series. It analyzes the temporal trajectory of each pixel over the years to remove errors and ensure the temporal consistency of the classification (e.g., moving windows of 3, 4 or 5 years, consistency analyses for regeneration or deforestation, frequency of a class throughout the time series, number of changes between classes, etc.);
The filter application sequence can be adapted according to the needs of the class or region;
Reference Data: Reference data can be used as a mask for adding new classes (e.g. forest is reclassified as “arboreal restinga” based on the existing coastal sandy soil class on a detailed soil map);
It is essential to carry out the classification with a buffer of 100 m to 2 km (or an overlapping area) with neighboring regions, border strips and coastal areas, so that no information is missing when boundaries at different scales are used;
Clipping by administrative or natural territories (countries, basins, etc.) is carried out when the statistics are generated.
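
Gap fill and a 3-year temporal window can be sketched on a single pixel's yearly class trajectory, with None marking unobserved years. The 3-year rule shown (a one-year deviation between two agreeing neighboring years is reverted) is only one example of the moving windows mentioned above; the function names are hypothetical:

```python
def gap_fill(series):
    """Fill unobserved years (None) in one pixel's class trajectory.

    Each gap takes the class of the previous observed year; leading gaps
    (before the first observation) take the first observed class.
    """
    filled = list(series)
    last = None
    for i, value in enumerate(filled):
        if value is None:
            filled[i] = last
        else:
            last = value
    first = next((v for v in filled if v is not None), None)
    return [first if v is None else v for v in filled]


def temporal_filter_3yr(series):
    """3-year moving window: if the years before and after agree, the
    middle year takes their class. Neighbors are read from the input,
    so earlier corrections do not cascade."""
    out = list(series)
    for t in range(1, len(series) - 1):
        if series[t - 1] == series[t + 1] != series[t]:
            out[t] = series[t - 1]
    return out
```

Applying `gap_fill` before the temporal window means the window never compares against missing years; as noted above, the filter sequence can be adapted to the class or region.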

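The spatial filter can be sketched on a small in-memory grid: patches (8-connected groups of same-class pixels, an assumption) smaller than the minimum mapping unit take the most frequent class among the pixels bordering them. The helper `min_pixels` also shows how the 0.5 ha and 1 ha thresholds translate to 6 and 11 pixels at 30 m resolution, since one pixel covers 0.09 ha. All names are hypothetical:

```python
from collections import Counter, deque


def min_pixels(area_ha, pixel_size_m=30):
    """Minimum mapping unit in pixels: a 30 m pixel covers 0.09 ha."""
    return round(area_ha * 10_000 / pixel_size_m ** 2)


# 8-connectivity: the eight surrounding pixels count as neighbors (an assumption).
NEIGHBORS = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]


def spatial_filter(grid, min_size):
    """Replace patches smaller than min_size pixels by the mode of their border pixels."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    seen = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            cls = grid[r][c]
            patch, border = [], set()
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:  # breadth-first search over the same-class patch
                pr, pc = queue.popleft()
                patch.append((pr, pc))
                for dr, dc in NEIGHBORS:
                    nr, nc = pr + dr, pc + dc
                    if not (0 <= nr < rows and 0 <= nc < cols):
                        continue
                    if grid[nr][nc] != cls:
                        border.add((nr, nc))
                    elif not seen[nr][nc]:
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            if len(patch) < min_size and border:
                # Mode of the surrounding pixels, read from the unfiltered grid.
                mode = Counter(grid[br][bc] for br, bc in border).most_common(1)[0][0]
                for pr, pc in patch:
                    out[pr][pc] = mode
    return out
```

Reading neighbor classes from the unfiltered grid makes each replacement independent of the order in which patches are visited.
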
5. Maps integration

This section describes the procedures to combine maps when generated by different teams and the harmonization with border territories.

The generation of annual land cover and land use maps can result from the combination of maps generated by different teams focusing on specific classes. In these cases, it is important to define integration rules so that the maps can be combined into a single final map. A suggested arrangement is for one team to manage the map with an emphasis on the natural classes of the biome while other teams specialize in the anthropic classes (e.g., agriculture, forestry, urbanized areas, mining);
Prioritization: when there are different classifications applicable to the same region (e.g. urbanized areas on the mapping of the more generic natural and/or anthropic classes generated for the biome) it is essential to define the priority of spatial overlap of each class, so that the integration of maps generates a consistent final map. Typically, rare classes and those with lower commission errors have higher priority (i.e. are placed on top of others);
Post-Integration Filter: Post-integration filters can include spatial and temporal filters. Integration can result in some undesirable situations such as isolated pixels or false transitions (e.g. when integrating forestry into the classified map, the years immediately preceding the identification of forestry, and which correspond to the growth period of this plantation, can be confused with native forest).
Each classification by region or country should be harmonized along its borders with neighboring regions or countries. The harmonization process allows the border areas between territories to be discussed and identifies discrepancies that need to be resolved (class definitions, sample balance, etc.).
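
The prioritization rule can be sketched as a per-pixel choice among the candidate classes proposed by each team's map, where the class appearing earliest in the priority list wins. Class ids and the priority order are illustrative assumptions, and a real integration would still be followed by the post-integration filters described above:

```python
def integrate(layers, priority):
    """Combine per-theme class maps into one map using a class priority order.

    layers: list of maps (lists of rows) in which None means "no class here".
    priority: class ids from highest to lowest priority; classes missing
    from the list are treated as lowest priority.
    """
    rank = {cls: i for i, cls in enumerate(priority)}
    rows, cols = len(layers[0]), len(layers[0][0])
    result = []
    for r in range(rows):
        row = []
        for c in range(cols):
            candidates = [layer[r][c] for layer in layers if layer[r][c] is not None]
            best = min(candidates, key=lambda cls: rank.get(cls, len(rank))) if candidates else None
            row.append(best)
        result.append(row)
    return result
```

Placing rare classes and classes with low commission error at the top of `priority` reproduces the ordering recommended above.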

6. Modules and products derived from annual land cover and land use maps

This section provides guidelines for the generation of data on class transitions, vegetation suppression and regeneration, and area estimates.

Transitions: transitions between classes over time are calculated for intervals of interest for each country/region. Predefined intervals can be used (first and last years of the mapping, or periods of every 5 or 10 years), as well as intervals with a specific meaning (e.g., changes in protection laws). The transition data is generated by the formula:

Transition code = (Year 1 class ID × 100) + Year 2 class ID

After generating the transition asset, a spatial filter is applied to remove areas smaller than the minimum mapped area.
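
The transition formula can be checked with a small example: a pixel of class 3 in the first year and class 15 in the second receives code 3 × 100 + 15 = 315, and integer division by 100 recovers both classes. Decoding is only unambiguous while class ids stay in the range 0 to 99, which the sketch below enforces; the function names are hypothetical:

```python
def transition_code(class_year1, class_year2):
    """(Year 1 class ID x 100) + Year 2 class ID."""
    for cls in (class_year1, class_year2):
        if not 0 <= cls < 100:
            raise ValueError("class ids must be in 0..99 for the code to be decodable")
    return class_year1 * 100 + class_year2


def decode_transition(code):
    """Return (year 1 class ID, year 2 class ID)."""
    return divmod(code, 100)
```

Codes whose two halves are equal (e.g., 1515) indicate pixels that kept the same class in both years.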

Natural vegetation suppression and regeneration: The calculation of natural vegetation suppression and regeneration derived from annual maps uses spatial and temporal trajectory filters to remove noise and to derive a series consistent with the annual dynamics of native vegetation cover. Usually, the first and last years of land cover and land use mapping do not have data on vegetation suppression and regeneration because it is not possible to confirm the change trajectory;
Area estimates: the areas of classes or of transitions between classes are calculated for the different territories of interest in the country/region. It is recommended to consider at least political boundaries, protected areas, biomes/ecoregions and river basins. Each team should organize the shapefiles that will be the basis for calculating the platform's statistics in a standardized way (see more details in the Data Storage and GEE Organization chapter). Areas are calculated in a standardized way from raster data, following the script “mapbiomas-user-toolkit-calculate-area.js” available at “users/mapbiomas/user-toolkit”.
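
Why the first and last years lack suppression and regeneration data can be illustrated with a hypothetical confirmation rule: a change is accepted at year t only if the class in year t-1 differs in type (natural or not) from the classes in years t and t+1. This rule and the function name are assumptions for illustration, not the project's actual filter:

```python
def suppression_regeneration(trajectory, natural):
    """Label each year of a filtered class trajectory.

    Returns a list with "suppression", "regeneration" or None per year.
    Hypothetical rule: the new state must still hold one year later, so
    the first year (no t-1) and the last year (no t+1) stay unlabeled.
    """
    labels = [None] * len(trajectory)
    for t in range(1, len(trajectory) - 1):
        before = trajectory[t - 1] in natural
        now = trajectory[t] in natural
        after = trajectory[t + 1] in natural
        if before and not now and not after:
            labels[t] = "suppression"
        elif not before and now and after:
            labels[t] = "regeneration"
    return labels
```

Any rule that needs to see the years on both sides of a change to confirm it will leave the ends of the series without data, which is the situation described above.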

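Area estimation from rasters reduces to summing pixel areas per territory and class (or transition code). The sketch below assumes a projected 30 m grid in which every pixel covers 0.09 ha; on a geographic (latitude/longitude) grid the pixel area varies with latitude and must be computed per pixel, which Earth Engine provides through `ee.Image.pixelArea()`. Function and variable names are hypothetical:

```python
from collections import defaultdict

# 0.09 ha per 30 m pixel; assumes an equal-area projected grid.
PIXEL_AREA_HA = 30 * 30 / 10_000


def area_by_territory(class_map, territory_map, pixel_area_ha=PIXEL_AREA_HA):
    """Sum area in hectares per (territory, class) pair.

    class_map and territory_map are aligned rasters (lists of rows);
    None marks pixels without a class or outside any territory.
    """
    areas = defaultdict(float)
    for class_row, territory_row in zip(class_map, territory_map):
        for cls, territory in zip(class_row, territory_row):
            if cls is not None and territory is not None:
                areas[(territory, cls)] += pixel_area_ha
    return dict(areas)
```

Passing a raster of transition codes instead of classes gives transition areas per territory with the same function.
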
7. Documentation

This section provides guidelines for the ATBD and source code availability.

Algorithm Theoretical Basis Document (ATBD, see an example here): it is mandatory to produce a technical report, written in English, that documents all adopted methods and procedures in as much detail as possible and is updated with each new collection;
If there are methodological specificities between regions or transversal themes, there may be a general ATBD and other detailed ones;
Source code: following open-source principles, all scripts used in the classification, together with their documentation, must be deposited in the project's GitHub repository.

8. Improvement process

This section provides guidelines for the continuous improvement of MapBiomas data, including a guide for the internal evaluation of collections.

Research and development projects to improve the classification can be carried out at any time and are incorporated into the classification methods once consolidated;
It is important to consider user feedback about incorrectly mapped regions or classes for continuous data improvement;
During the generation of a new classification and after the publication of each collection, a comprehensive critical analysis must be carried out to evaluate the results obtained in accordance with the Spatio-temporal Assessment Guide for Collections. This collection assessment is the starting point for planning and improving the next map collection;
It is recommended to improve the classification considering contiguous territories, standardizing legend, interpretation criteria and classes to generate consistent regional maps.
