Lab 5: Mapping Benthic Habitats in GEE
from Earth Engine Fundamentals and Applications - EEFA (2021), Chapter A2.2
Eidan W. Willis
Introduction
Section 1: Input Data
Section 2: Preprocessing Functions
2.1: Masking Land Areas
2.2: Correcting sunglint in an image
2.3: Developing a Depth Invariant Index (DII) Image
Section 3: Supervised Classification
Section 4: Bathymetry (RF Regression)
Conclusion
Introduction
Lab 5 was designed as a "pick-your-poison" sort of activity. Instead of following a tutorial provided by Professor Cardille, students were given a list of possible tutorials to execute and base a lab report on. Having a deep personal interest in the marine sciences, I chose to execute a tutorial explaining how to map benthic habitats in Google Earth Engine (GEE). This lab report explains how to:
Import satellite imagery of a marine environment somewhere in Greece
Perform various preprocessing methods, including masking land areas, correcting for sunglint, and producing a Depth Invariant Index (DII)
Perform supervised classification of marine imagery using predetermined classification types
Compute and analyze bathymetric data using machine learning (the Random Forest (RF) regression method) with the help of field-collected sonar data
Section 1: Input Data
We'll be using a raster dataset and a set of geometry data. Start by importing the raster dataset, which covers a coastal region somewhere in Greece, using the ee.Image() function. Most satellite data are stored with values scaled by 10000 in order to occupy less memory, so we need to rescale them back to physical values before we can begin processing them for analysis. This is done using the .divide() function to divide our imagery by 10000 and obtain the correct scale. We then add the imagery to the map using the Map.addLayer() function, specifying the following inputs for the function (a short code sketch follows the parameter list below):
eeObject, required: the object we want to visualize (in this case, it's our imported imagery).
visParams (object), optional: the visualization parameters used when displaying the image. This object contains the following parameters:
bands (list of strings): the names of the bands to be visualized in the displayed image; these can be found by printing our imagery and examining its metadata.
min (number): the minimum band value to display.
max (number): the maximum band value to display.
gamma (number): the gamma value of the displayed image.
name (string), optional: the name under which the layer will appear in the Layers menu on the map.
shown (boolean), optional: true or false; indicates whether or not the new layer should be displayed on the map immediately upon running the script.
Lastly, we can avoid having to manually navigate to the geographical location of our imagery by centering the map with Map.centerObject(). This method takes an object to center on (required) and a zoom level (optional), numbered from 0 to 24.
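To make these steps concrete, here is a minimal sketch of the import-and-display step; the asset ID, band names, and visualization values are placeholders rather than the tutorial's exact ones:

```javascript
// Hypothetical asset ID; substitute the raster dataset used in the tutorial.
var rawImage = ee.Image('users/your_account/greece_coastal_scene');

// Rescale the stored integer values (scaled by 10000) back to physical reflectance.
var image = rawImage.divide(10000);

// Visualization parameters: bands, min, max, and gamma (values are placeholders).
var visParams = {bands: ['b3', 'b2', 'b1'], min: 0, max: 0.2, gamma: 1};

// Add the layer under the name 'RGB', display it immediately (shown = true),
// then center the map on the imagery at zoom level 12.
Map.addLayer(image, visParams, 'RGB', true);
Map.centerObject(image, 12);
```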
Looks like we're somewhere in Greece...
Section 2: Preprocessing Functions
In order to prepare our imagery for classification and bathymetric analysis, we need to perform some preprocessing functions. In this case, preprocessing means correcting or minimizing spectral noise and distortions caused by physical conditions at and below the water surface, such as sunglint, waves, and water column effects.
Masking land areas
To begin, use the geometry tools in the top-left corner of the map in the GEE API to draw two multi-line FeatureCollections over the two types of surface cover: land and water. These FeatureCollections will be used to train the Random Forest (RF) classifier – a supervised machine learning algorithm that uses randomly selected subsets of a given dataset to classify image features. The RF classifier creates a set of decision trees that can be used to classify individual pixels that are similar to the training data provided; it then aggregates these pixels into one of the classification fields specified by the user. In our example, the two classification fields are land and water, which are distinguished by the value of their 'class' property, where land has the value 0 and water the value 1. In our code, we will use these property values to selectively mask the land area class from the image.
Once we've drawn our multi-line FeatureCollections and configured our geometry import settings, we will create and use a function to perform three actions: 1) display our image using the Normalized Difference Water Index (NDWI), which will allow us to garner deeper information on what's happening beneath the ocean's surface, 2) classify land and water features into two distinct classes, and 3) remove the classified land area from the image. This will help limit our focus to the marine environment and prevent land reflectance values from biasing our data. The code displayed on the right performs these actions and applies a land mask to our imagery (a hedged sketch of this step is included below). The resulting image is visualized below.
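Since the original code is shown only as a screenshot, here is a hedged sketch of what such a land-masking function could look like. It assumes the hand-drawn FeatureCollections are imported as land (class = 0) and water (class = 1), that the descaled image is stored in image, and that b2 and b4 are the green and NIR bands; all of these names, along with the tree count and scale, are assumptions.

```javascript
// Rough sketch of the land-masking step (names and scale are assumptions).
function landMask(img) {
  // NDWI = (green - NIR) / (green + NIR); band names are placeholders.
  var ndwi = img.normalizedDifference(['b2', 'b4']).rename('ndwi');

  // Sample NDWI values under the two hand-drawn training FeatureCollections.
  var training = ndwi.sampleRegions({
    collection: land.merge(water),
    properties: ['class'],
    scale: 10
  });

  // Train a Random Forest classifier to separate land (0) from water (1).
  var classifier = ee.Classifier.smileRandomForest(10).train({
    features: training,
    classProperty: 'class',
    inputProperties: ['ndwi']
  });

  // Classify every pixel and keep only those labelled as water.
  var classified = ndwi.classify(classifier);
  return img.updateMask(classified.eq(1));
}

var maskedImage = landMask(image);
Map.addLayer(maskedImage, {bands: ['b3', 'b2', 'b1'], min: 0, max: 0.2}, 'Land-masked image');
```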
Drawing multi-line features to train the Random Forest classifier.
Correcting sunglint in the image
We then need to account for the effect of sunglint on the image. Sunglint is a natural phenomenon that occurs when the sun and the satellite sensor are positioned in such a way that sunlight reflected off the water surface reaches the sensor and interferes with its ability to detect reflectance from the ocean floor. This is accounted for by manually adding training polygons over regions of the image where sunglint is observed; the glint in these regions is then removed using linear modelling. The Red, Green, and Blue bands are each individually paired with the Near Infrared (NIR) band and a linear fit is computed. Unfortunately, the following steps in the script did not come with any explanation, so this is what I believe is happening under the hood:
The range of values that each of these linear fits spans is mapped to an image called the slope image.
A second image containing the minimum NIR value in each pixel within the sunglint polygons (i.e., the polygons I manually added to the map) is created.
The input image of the function is set to display an RGB composite.
Each pixel in this RGB image is subtracted from each pixel in the slope image and multiplied by the NIR value in that same pixel.
Each pixel in this image is then subtracted from each pixel in the minimum NIR image.
Lastly, the NIR band is added back to the corrected image.
The function is applied to the newly landmasked image, creating a sunglint-corrected image called SGImage.
This image is added to the map (right) and displayed next to the original uncorrected image (left).
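The steps above appear to describe a Hedley-style deglinting, in which each visible band is reduced by its regression slope against NIR multiplied by the local NIR excess over the minimum NIR found in the glint polygons. The sketch below is an approximation of that idea, not a copy of the tutorial's function; glintPolygons (an ee.Geometry covering the hand-drawn sunglint polygons), the band names, and the scale are assumptions.

```javascript
// Hedged sketch of a Hedley-style sunglint correction (names are assumptions).
function sunglintCorrect(img) {
  var visible = ['b1', 'b2', 'b3'];  // assumed blue, green, red band names

  // Minimum NIR value observed inside the sunglint polygons.
  var minNIR = ee.Number(img.select('b4').reduceRegion({
    reducer: ee.Reducer.min(),
    geometry: glintPolygons,
    scale: 10
  }).get('b4'));

  // For each visible band, fit a line against NIR inside the polygons and
  // remove the NIR-predicted glint:  band - slope * (NIR - minNIR).
  var corrected = visible.map(function(band) {
    var fit = img.select(['b4', band]).reduceRegion({
      reducer: ee.Reducer.linearFit(),
      geometry: glintPolygons,
      scale: 10
    });
    var slope = ee.Image.constant(ee.Number(fit.get('scale')));
    return img.select([band])
        .subtract(slope.multiply(img.select('b4').subtract(minNIR)));
  });

  // Reassemble the corrected visible bands and add the original NIR band back.
  return ee.Image.cat(corrected).addBands(img.select('b4'));
}

var SGImage = sunglintCorrect(maskedImage);
Map.addLayer(SGImage, {bands: ['b3', 'b2', 'b1'], min: 0, max: 0.2}, 'Sunglint corrected');
```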
The image prior to sunglint correction.
The image post-correction. Training polygons used to locate regions affected by sunglint can be seen near the Eastern coast.
Developing a Depth Invariant Index (DII) image
Performing satellite-based remote sensing analyses on aquatic environments inherently comes with its own unique challenges in comparison to terrestrial analyses. In particular, electromagnetic waves attenuate and travel differently through air than through water depending on the wavelength of the signal. It is not uncommon for variations in depth and the inconsistent presence of suspended materials in the water column to bias the spectral values of aquatic imagery and hamper efforts to perform classification or bathymetric procedures. As a result, the spectral signatures of completely different aquatic environments – for instance, that of shallow seagrass beds and deep sandy sea floors – can erroneously occupy the same classes. To prevent this bias, a Depth Invariant Index (DII) can be computed by correlating depth measurements with logged blue and green band values.
We begin by importing some manually drawn training polygons that will be used as training data by the machine learning algorithm that will calculate our DII. These polygons were placed in what are believed to be sandy regions, though the actual composition of the benthos at these locations cannot be determined with 100% accuracy. It's important to keep this in mind when developing your training data, as the potential presence of other benthic structures (e.g., seagrass beds, coral reefs) will affect the spectral signature and, therefore, the final DII output.
Placing training polygons at believed-to-be sandy regions.
In a body of water, an electromagnetic wave will attenuate (i.e., lose intensity as it travels through the water column) differently based on its wavelength. For instance, red light (λ = 620-750nm) will be absorbed more quickly and, therefore, will not penetrate as deeply as green light (λ = 495-570nm), which in turn will not penetrate as deeply as blue light (λ = 450-495nm). This is why it is important to preprocess aquatic imagery using a DII in order to be able to visualize benthic points of interest and accurately compute bathymetric data. The code segment below loops through the Red (b3), Green (b2), and Blue (b1) bands to produce covariance matrices for each pair of bands. The covariance values of the logged bands in relation to one another are used by the DII to produce an Attenuation Coefficient Ratio (ACR), which weights the image in favor of those bands capable of reaching the deepest depth (i.e., green and blue).
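As an illustration of that loop body, here is a hedged sketch of the computation for a single band pair (blue b1 and green b2), assuming sandPolygons holds the sandy training areas and SGImage is the sunglint-corrected image; the tutorial iterates over all band pairs, and the formula shown is the standard Lyzenga-style depth-invariant index rather than a copy of the tutorial's code.

```javascript
// Depth-invariant index for one band pair (assumed names b1 = blue, b2 = green).
var logPair = SGImage.select(['b1', 'b2']).log();

// Variances and covariance of the log-transformed bands over the sand polygons.
var covDict = logPair.toArray().reduceRegion({
  reducer: ee.Reducer.covariance(),
  geometry: sandPolygons,
  scale: 10
});
var covMatrix = ee.Array(covDict.get('array'));
var varB1 = ee.Number(covMatrix.get([0, 0]));
var varB2 = ee.Number(covMatrix.get([1, 1]));
var covB1B2 = ee.Number(covMatrix.get([0, 1]));

// Attenuation coefficient ratio:  a = (var1 - var2) / (2 * cov);  k1/k2 = a + sqrt(a^2 + 1).
var a = varB1.subtract(varB2).divide(covB1B2.multiply(2));
var acr = a.add(a.pow(2).add(1).sqrt());
print('Attenuation coefficient ratio (b1/b2)', acr);

// Depth-invariant index image:  ln(b1) - (k1/k2) * ln(b2).
var DIIimage = logPair.select('b1')
    .subtract(logPair.select('b2').multiply(ee.Image.constant(acr)))
    .rename('DII_b1b2');
Map.addLayer(DIIimage, {min: -3, max: 0}, 'Depth Invariant Index (b1/b2)');  // display range is a placeholder
```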
A diagram explaining light attenuation/absorption in the ocean.
(source: https://scubadiverlife.com/review-backscatter-flip-3-1-part-i/)
The console output for the first iteration of the above loop. The two bands in the pair are listed, a covariance matrix is generated, and an Attenuation Coefficient Ratio is computed. The variance of the first band can be found in the top-left (0, 0) entry, that of the second band in the bottom-right (1, 1), and their covariance in both the bottom-left (1, 0) and top-right (0, 1).
DII Image
Section 3: Supervised Classification
Classification FeatureCollections visualized on the mapped region. Regions with soft, sandy benthos are represented in yellow, regions with hard, rocky benthos are in red, and high concentrations of the marine plant Posidonia oceanica are displayed in green. Training and validation data are displayed in the same color palette.
Running our code (a hedged sketch of this step follows the class list below) creates a supervised classification of the marine region in the image where:
yellow represents sandy, softer benthos composition
red represents rocky, harder benthos composition
green represents collections of Posidonia oceanica, also known as Neptune grass or Mediterranean tapeweed.
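The classification code itself is not reproduced in this report, so the following is a hedged sketch of how such a supervised classification could be set up. It assumes the three hand-drawn FeatureCollections are merged into benthicTraining with a 'class' property (0 = sand, 1 = rock, 2 = P. oceanica) and that DIIimage holds the depth-invariant band(s); the names, class codes, tree count, and split ratio are all assumptions.

```javascript
// Sample the DII band(s) under the merged benthic training FeatureCollection.
var samples = DIIimage.sampleRegions({
  collection: benthicTraining,
  properties: ['class'],
  scale: 10
});

// Split the samples into training and validation subsets (70/30).
var withRandom = samples.randomColumn('random');
var trainingSet = withRandom.filter(ee.Filter.lt('random', 0.7));
var validationSet = withRandom.filter(ee.Filter.gte('random', 0.7));

// Train a Random Forest classifier and classify the image.
var benthicClassifier = ee.Classifier.smileRandomForest(15).train({
  features: trainingSet,
  classProperty: 'class',
  inputProperties: DIIimage.bandNames()
});
var classified = DIIimage.classify(benthicClassifier);

// Sand in yellow, rock in red, P. oceanica in green.
Map.addLayer(classified, {min: 0, max: 2, palette: ['yellow', 'red', 'green']}, 'Benthic habitats');

// Simple accuracy check against the held-out validation samples.
var validated = validationSet.classify(benthicClassifier);
print('Validation error matrix', validated.errorMatrix('class', 'classification'));
```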
Zoomed out, the image is predominantly composed of sandy regions, with some rocky regions near shore and sporadic patches of P. oceanica spread throughout. Zooming in, rocky regions remain concentrated almost entirely along the shoreline of Greece and sandy regions still dominate, though many clusters of P. oceanica that are not visible from 2km out (38m/pixel) become visible from 500m out (10m/pixel).
A classified image representing sandy (yellow), rocky (red) regions, as well as regions concentrated with the marine plant Posidonia oceanica (green). This image is visualized at 5km out (38m/pixel).
A zoomed in look at the same classified image, where sandy (yellow) and rocky (red) regions are displayed alongside regions with high concentrations of the marine plant Posidonia oceanica (green). The image is visualized at 500m out (10m/pixel).
Section 4: Bathymetry (RF Regression)
In order to obtain measurements of the water depth (i.e., bathymetry) in this marine region, we will perform a Random Forest regression using the ee.Classifier.smileRandomForest() method. A regression can be executed from within this method simply by appending the statement .setOutputMode('REGRESSION'). For this section, reference data came from bathymetric observations that were taken in the field using a sonar device mounted on a boat. Unlike the classification accuracy assessment, however, the bathymetric accuracy assessment is based on two statistical outputs: the R^2 value and the Root Mean Square Error (RMSE).
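As a minimal illustration of that statement (the tree count of 15 is an arbitrary placeholder, not the tutorial's value):

```javascript
// A Random Forest classifier switched into regression mode.
var rfRegressor = ee.Classifier.smileRandomForest(15).setOutputMode('REGRESSION');
```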
As with any map visualization, it is important to use an appropriate color palette for this bathymetric profile; this may seem somewhat obvious, but those unfamiliar with your data can easily become confused if you use an unconventional or counterintuitive palette for your image (Tip: colder, darker colors will give the perception of depth on a 2D plane). Similar to the classification section, we begin by specifying the imported data and the variables that GEE will use to compute the Random Forest regression.
Then, we will create a new function that will perform the following actions:
Create an empty Random Forest classifier, which takes the number of decision trees to be created as an input.
Train our new RF classifier using our field-collected depth data.
According to the GEE Docs, the .train() method "trains the classifier on a collection of features, using the specified numeric properties of each feature as training data. The geometry of the features is ignored."
Set the output of the classifier to the desired computation using the .setOutputMode() method.
In our case, we want to specify the 'REGRESSION' input, since we want to perform a regression.
Concatenate our newly created Satellite Derived Bathymetry (SDB) estimate image and our in-situ, field-collected depth data (depthV) into one image. This allows for inter-comparison between the two images once we run the script.
Note that computations are being performed with the understanding that these variables are images, meaning that the following calculations will be derived at each pixel of the concatenated image.
Calculate the covariance of the SDB estimate and the in-situ depth data derived from the concatenated image.
Create a covariance matrix to house our new covariance data.
Compute the R^2 statistic in each pixel using the covariance matrix we just made.
Compute the squared deviation in each pixel by subtracting the mean depth (from the in-situ depth data) from the SDB image and squaring the result.
Compute the Root Mean Square Error (RMSE) of each pixel by taking the square root of the mean of the squared deviation.
Print both statistical values so that they appear in the same console output.
Save the output as a dictionary for easier access in future coding.
The code that performs all of the above actions is displayed below:
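Since the original screenshot is not reproduced here, the block below is a rough sketch of a function carrying out the listed steps rather than the tutorial's actual code. depthData (a FeatureCollection of sonar points with a 'depth' property), the tree count, the scale, and the return format are all assumptions; note that in this sketch the output mode is set to 'REGRESSION' before training.

```javascript
// Hedged sketch of a Random Forest regression bathymetry + accuracy assessment.
function rfBathymetry(img) {
  // Sample the image bands at the field-collected sonar points.
  var trainingPoints = img.sampleRegions({
    collection: depthData,
    properties: ['depth'],
    scale: 10
  });

  // Random Forest classifier in regression mode, trained on the depth samples.
  var regressor = ee.Classifier.smileRandomForest(15)
      .setOutputMode('REGRESSION')
      .train({
        features: trainingPoints,
        classProperty: 'depth',
        inputProperties: img.bandNames()
      });

  // Satellite Derived Bathymetry (SDB) estimate.
  var SDB = img.classify(regressor).rename('SDB');

  // Rasterize the in-situ depths (depthV) and concatenate them with the SDB
  // estimate so the two can be compared pixel by pixel.
  var depthV = depthData.reduceToImage(['depth'], ee.Reducer.first()).rename('depthV');
  var paired = ee.Image.cat([SDB, depthV]);

  // Covariance matrix of SDB vs. in-situ depth, and the R^2 derived from it.
  var covDict = paired.toArray().reduceRegion({
    reducer: ee.Reducer.covariance(),
    geometry: depthData.geometry(),
    scale: 10
  });
  var covMatrix = ee.Array(covDict.get('array'));
  var rSquared = ee.Number(covMatrix.get([0, 1])).pow(2)
      .divide(ee.Number(covMatrix.get([0, 0])).multiply(covMatrix.get([1, 1])));

  // RMSE: square root of the mean squared difference between SDB and in-situ depth.
  var squaredError = SDB.subtract(depthV).pow(2);
  var rmse = ee.Number(squaredError.reduceRegion({
    reducer: ee.Reducer.mean(),
    geometry: depthData.geometry(),
    scale: 10
  }).get('SDB')).sqrt();

  // Print both statistics together and return everything for later use.
  print('R^2 and RMSE', ee.Dictionary({R2: rSquared, RMSE: rmse}));
  return {image: SDB, R2: rSquared, RMSE: rmse};
}
```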
Lastly, we run our new function on the DII image we created in Section 2 and add the new layer to the map:
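Under the same assumptions as the sketch above, this final step might look like the following; the display range and palette are placeholders, running from white (shallow) to navy (deep).

```javascript
// Apply the regression sketch to the DII image and display the bathymetry layer.
var bathymetry = rfBathymetry(DIIimage);
Map.addLayer(bathymetry.image,
    {min: 0, max: 25, palette: ['FFFFFF', '87CEEB', '0000FF', '000080']},
    'Satellite Derived Bathymetry');
```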
Running all of the above script produces a bathymetry layer detailing the depth of the water in the study region, with the deepest waters displayed in darker (navy blue) pixels and the shallowest waters in brighter (white) pixels. We can also create geometry points at some of the deeper/shallower points of interest on the map in order to compute the depth at those pixels. This depth can be found in the Inspector window after re-running the script with the new geometries and clicking on a given geometry point on the map.
Output of our Random Forest Regression to compute bathymetric data based on the covariance of the Random Forest Satellite Derived Bathymetry (SDB) output and the in-situ depth training and validation data collected in the field via boat-mounted sonar.
By this point, your Earth Engine Script should look something like this:
https://code.earthengine.google.com/78b3547c8beba45d3eb4882a602bb12c
Conclusion
In this lab, we explored ways in which Google Earth Engine (GEE) can be employed to perform analyses on benthic ecosystems. We learned how to import and preprocess our data, how to develop a Depth Invariant Index (DII) image, how to perform supervised classification of an image using predetermined classification types, and how to determine and visualize bathymetric data using a function employing the Random Forest (RF) Regression method. In conjunction, these methods can be used to garner a deeper understanding of our world's coastal waters using satellite imagery collected miles above the Earth's surface.