A Remote Sensing Approach to Boreal Lake Carbon Content

Andrew Kittredge 

Using Remote Sensing Classification to Estimate Carbon Content of Quebec's Northern Lakes

Landsat, Colored Dissolved Organic Matter (CDOM), Classification



Since shortly after the launch of the first Landsat satellite in 1972, software techniques known as classification have been used to group pixels with similar spectral characteristics. Broadly, spectral-based classification is separated into two distinct methods: unsupervised and supervised classification. In unsupervised classification, a remote sensing software package analyses the spectral signature of each pixel in a satellite image. The pixels are then grouped into classes based on a number of user-supplied parameters, and the user manually maps each class to a land cover type. In supervised classification, the user provides training data to the classification algorithm; that is, each class is defined by the user through a spectrally similar subset of the image. From this training data, the program deduces the land cover type for the rest of the image.

As presented in class by Dr. Cardille, the carbon content of lakes is correlated with their color, or their spectral signature in remote sensing terms. It was also shown that a simple ratio of reflectance values in two satellite bands can represent color, and thus carbon content. However, this approach does not use all of the spectral data available from newer satellite sensors such as Landsat 7 and the current Landsat 8, which is free of the SLC-off error. It follows that a more sophisticated classification technique may be better suited to producing a measure of carbon content in lakes.
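The band-ratio idea can be sketched in a few lines. This is only an illustration: the choice of bands, the reflectance values, and the function name `band_ratio` are assumptions, not the specific ratio presented in class.

```python
import numpy as np

def band_ratio(band_a, band_b):
    """Per-pixel ratio band_a / band_b; pixels where band_b is 0 become NaN."""
    a = np.asarray(band_a, dtype=float)
    b = np.asarray(band_b, dtype=float)
    ratio = np.full(a.shape, np.nan)
    # Divide only where the denominator is non-zero; other pixels keep NaN.
    np.divide(a, b, out=ratio, where=(b != 0))
    return ratio

# Hypothetical 2x2 reflectance values for two visible bands.
green = np.array([[0.04, 0.06], [0.05, 0.00]])
red = np.array([[0.02, 0.03], [0.00, 0.01]])
ratio = band_ratio(green, red)
```

A single ratio reduces each pixel to one number, which is why it cannot use the information in the remaining bands.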


Context for the project

In a scientific era marked by austerity, funding is hard to come by. This reality has a large impact on studies that involve substantial field work and lab analysis. Using freely available Landsat imagery for colored dissolved organic matter (CDOM) analyses is therefore appealing: deriving CDOM measurements from satellite imagery could save considerable money and time.

Project Objectives and Questions

This project proposes to use existing ground truth data collected by Dr. Cardille's lab to assess the merit of a full-spectrum supervised classification of carbon content in boreal lakes. If successful, the supervised classification route may be able to answer questions like "what is the average carbon content across lakes in this image?" and "when is the program most accurate in predicting carbon content?". Performing the classification locally can also serve as a baseline against which to compare Google Earth Engine's supervised classification techniques. This would give insight into whether a time-averaged composite negatively affects carbon prediction due to loss of temporal specificity.


CDOM data was made available for 44 Quebec lakes. These lakes fell within Landsat path/row positions 18/26, 19/25, and 19/26. The classification was extended to the full extent of the combined area of these three Landsat scenes, covering approximately 90,000 square kilometers.


In situ data collected by Dr. Cardille's lab served as ground truth data for the supervised classification; it was used both to train the classes and to assess accuracy. The three Landsat scenes used were collected within a week of each other, on the 17th and 24th of September 2013.


Lake CDOM data was provided via a Google Fusion table. It was exported to Excel, then exported again as a Windows-formatted .txt file so that it could be read by ArcMap. The three downloaded Landsat images were combined in ENVI using the Seamless Mosaic tool. The data-ignore value was set to (0,0,0) and a mosaic was exported. Comparing lake spectra with those of other land cover types in the mosaic made it clear that the two could be separated using the NIR channel. The gap between lakes (uniformly below 7500 DN) and other land cover types (above ~10,000 DN) made masking relatively straightforward: a decision tree was used to mask out pixels with DNs higher than 7500 in NIR channel 5.
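The masking step amounts to a single threshold test on the NIR band. The sketch below assumes the mosaic is available as a NumPy array shaped (bands, rows, columns) and that a DN of 0 marks no-data pixels; the function name, array layout, and sample values are illustrative and do not reproduce the ENVI decision tree itself.

```python
import numpy as np

NIR_WATER_MAX_DN = 7500  # threshold reported in the text

def mask_non_water(image, nir_index, threshold=NIR_WATER_MAX_DN, nodata=0):
    """Keep pixels whose NIR DN is at or below the threshold.

    Returns a float copy of `image` with all other pixels set to NaN in
    every band, plus the boolean water mask. No-data pixels (assumed to
    be DN == nodata in the NIR band) are also excluded.
    """
    image = np.asarray(image, dtype=float)
    nir = image[nir_index]
    water = (nir <= threshold) & (nir != nodata)
    # The (rows, cols) mask broadcasts across the band axis.
    masked = np.where(water, image, np.nan)
    return masked, water

# Hypothetical two-band, 2x2 mosaic; band index 1 stands in for the NIR band.
mosaic = np.array([
    [[6000, 9000], [5000, 0]],
    [[7000, 12000], [7500, 0]],
])
masked, water = mask_non_water(mosaic, nir_index=1)
```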

In ArcMap, the .txt file containing CDOM values at specific lat/lon points was loaded, then displayed using the "Display XY Data" function. The X field was set to the longitude field in the table and the Y field to the latitude field. The geographic coordinate system was set to NAD 1983. A buffer of 90 m was then added to the points to aid ENVI classification. The result was exported as a shapefile so that it could be read by ENVI.
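To get a feel for how many Landsat pixels a 90 m buffer contributes per sample point, the following sketch counts the 30 m pixels whose centers fall within 90 m of a central pixel. It assumes the sample point sits at a pixel center on a projected grid; the function name is hypothetical, and ArcMap's geometric buffer is not reproduced here.

```python
import math

PIXEL_SIZE_M = 30.0  # Landsat 8 multispectral pixel size

def buffer_pixel_offsets(radius_m=90.0, pixel_size_m=PIXEL_SIZE_M):
    """Return (d_row, d_col) offsets of pixels whose centers lie within
    radius_m of the central pixel's center."""
    reach = int(math.floor(radius_m / pixel_size_m))
    offsets = []
    for d_row in range(-reach, reach + 1):
        for d_col in range(-reach, reach + 1):
            if math.hypot(d_row, d_col) * pixel_size_m <= radius_m:
                offsets.append((d_row, d_col))
    return offsets

offsets = buffer_pixel_offsets()
```

Under these assumptions each buffered point covers a few dozen pixels rather than one, which gives the classifier more training spectra per lake.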

The CDOM shapefile was added to ENVI as a vector file, then converted into separate Regions of Interest (ROIs) based on CDOM. The ROIs were then manually sorted into 5 equal-width categories, from low CDOM (0-2.138) to high CDOM (8.552-10.69). These ROIs were split using the Generate Random Sample Using Ground Truth ROIs tool: half were used as training data and the other half as assessment data. The Minimum Distance classification was then run with 5 classes and a maximum distance from the computed class means of 1 standard deviation.
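The binning and classification steps can be sketched as follows. This is a simplified stand-in for the ENVI tools, not their implementation: the function names and sample spectra are hypothetical, and the 1 standard deviation limit is interpreted here as a per-band test against each class's training spread, which may differ from ENVI's exact rule.

```python
import numpy as np

CDOM_MAX = 10.69  # upper bound of the High class in the text
N_CLASSES = 5
CLASS_NAMES = ["Low", "Med-Low", "Medium", "Med-High", "High"]

def cdom_class(value, cdom_max=CDOM_MAX, n_classes=N_CLASSES):
    """Index (0..n_classes-1) of the equal-width bin containing value."""
    width = cdom_max / n_classes  # 2.138 for the values in the text
    return min(int(value // width), n_classes - 1)

def minimum_distance_classify(pixels, train_pixels, train_labels, max_std=1.0):
    """Assign each pixel to the nearest class mean (Euclidean distance).

    A class is only a candidate if the pixel lies within max_std standard
    deviations of the class mean in every band (an assumed reading of the
    threshold). Pixels with no candidate class are labelled -1.
    """
    pixels = np.asarray(pixels, dtype=float)
    train_pixels = np.asarray(train_pixels, dtype=float)
    train_labels = np.asarray(train_labels)
    classes = np.unique(train_labels)
    means = np.array([train_pixels[train_labels == c].mean(axis=0) for c in classes])
    stds = np.array([train_pixels[train_labels == c].std(axis=0) for c in classes])
    diff = pixels[:, None, :] - means[None, :, :]  # (pixels, classes, bands)
    dist = np.sqrt((diff ** 2).sum(axis=2))
    allowed = (np.abs(diff) <= max_std * stds[None, :, :]).all(axis=2)
    dist = np.where(allowed, dist, np.inf)
    labels = classes[dist.argmin(axis=1)]
    labels[~allowed.any(axis=1)] = -1  # unclassified
    return labels

# Hypothetical two-band training spectra for two classes.
train_pixels = [[100, 200], [120, 220], [300, 400], [320, 420]]
train_labels = [0, 0, 1, 1]
labels = minimum_distance_classify(
    [[105, 215], [315, 405], [200, 300]], train_pixels, train_labels
)
```

With a tight 1 standard deviation limit, pixels that fall between class means are left unclassified rather than forced into the nearest class.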

A confusion matrix was then calculated to assess the results.


Overall, the calculated accuracy for the classification was 26%. Per-class accuracy was also calculated: Low 71%, Med-Low 54%, Medium 77%, Med-High 35%, High 0%. The accuracy of the Low and Medium classes is consistent with the literature (Kutser et al., 2005). This suggests that the methodology is valid for correlating spectral signature with CDOM.
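For reference, overall and per-class accuracy are both read off the confusion matrix. The sketch below uses made-up labels for three classes, not the project's data, and assumes every predicted label is a valid class index (unclassified pixels would need their own column). Per-class accuracy is computed here as producer's accuracy, which is an assumption about how the figures above were derived.

```python
def confusion_matrix(truth, predicted, n_classes):
    """Rows are reference (ground truth) classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(truth, predicted):
        matrix[t][p] += 1
    return matrix

def overall_accuracy(matrix):
    """Correctly classified pixels divided by all assessed pixels."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else float("nan")

def per_class_accuracy(matrix):
    """Producer's accuracy: correct pixels / reference pixels, per class."""
    return [
        row[i] / sum(row) if sum(row) else float("nan")
        for i, row in enumerate(matrix)
    ]

# Hypothetical assessment labels for three classes.
truth = [0, 0, 0, 1, 1, 2, 2, 2, 2, 2]
predicted = [0, 0, 1, 1, 0, 2, 2, 2, 2, 1]
matrix = confusion_matrix(truth, predicted, n_classes=3)
overall = overall_accuracy(matrix)
per_class = per_class_accuracy(matrix)
```

Overall accuracy weights each class by its number of assessment pixels, so it generally differs from the simple average of the per-class values.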



The classes for which the majority of ground truth data was provided had accuracies consistent with the literature. This suggests that the low accuracy found in the higher CDOM classes is not inherent to the methodology but rather a function of limited data availability. Limited data availability also affects the statistical significance of the findings, making conclusions difficult even with an accuracy assessment. It is worth noting that the various unsupervised classification methods tried were unsuccessful, which may limit more automated approaches to the methodology in the future.

However, the limited data provided prevents further conclusions from being drawn about how accuracy varies with area and seasonality. Further analysis is needed to produce a baseline against which to compare the temporally composited Google Earth Engine Landsat mosaics for this kind of analysis. Further study of the biological implications of CDOM could also help with class definition: as it stands, the definition of the 5 classes used is arbitrary, and a more ecologically sound division likely exists.

Ironically, while this methodology suggests the possibility of an automated approach to calculating CDOM levels in lakes, more ground truth data is needed to fully assess the approach. Though this would likely come at great cost, a statistically sound approach to deriving CDOM content from water bodies could have great benefits, especially as a warning system for eutrophication, a problem better addressed proactively than reactively.

Limitations and Challenges

While the Landsat imagery used for this study is free, the tools required are not; both ArcMap and ENVI cost many thousands of dollars. Furthermore, the large amounts of spatial data involved make the analysis processor-intensive: many of the steps used here had processing times of 5 to 10 minutes. These logistical considerations must be taken into account when the approach is applied on a larger scale.



While it is hard to draw solid conclusions about the statistical validity of the approach (partly due to a lack of knowledge about the biological implications of CDOM), the results indicate that a spectral approach to CDOM measurement holds merit. Further research is needed to determine the broader validity of the methodology, and is justified by the potential time and cost savings.


Kutser, T., Pierson, D. C., Kallio, K. Y., Reinart, A., & Sobek, S. (2005). Mapping lake CDOM by satellite remote sensing. Remote Sensing of Environment, 94(4), 535-540.

Initial proposal

Here's a link to the proposal of this project