deciduous: literally "falling off at maturity"; refers to trees or shrubs that shed their leaves seasonally.
shrub (bush): distinguished from a tree by its multiple stems and shorter height (usually under 6 m).
Many species can grow in either tree or shrub form depending on conditions.
NDVI = (CH2 - CH1) / (CH2 + CH1), where CH1 is the reflectance in the visible (red) wavelengths (0.58-0.68 µm) and CH2 is the reflectance in the near-infrared wavelengths (0.725-1.1 µm). Band choices seen in sources: CH1 = 660 nm, CH2 = 730 nm; or (R798 - R679) / (R798 + R679). General form: (NIR - RED) / (NIR + RED).
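A minimal Python sketch of the index, assuming reflectances are supplied as fractions between 0 and 1 (the function name and sample values are illustrative, not taken from any of the papers):

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index: (NIR - RED) / (NIR + RED).

    For non-negative reflectances the result lies in [-1, 1].
    """
    total = nir + red
    if total == 0:
        raise ValueError("NIR + RED must be non-zero")
    return (nir - red) / total


# Hypothetical vegetated pixel: low red reflectance, high NIR reflectance.
print(round(ndvi(nir=0.45, red=0.05), 3))  # 0.8
```

Dense green vegetation gives values near the top of the range; the NDVI ≥ 0.5 filter used in the Colgan et al. notes is applied to exactly this quantity.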

 Year  Title  Citation  Where  Authors  Description

1. data
    - number of images? resolution?
    - lidar? hyperspectral? others?
    - additional knowledge?
    - data source? private? MODIS? NEON?
2. ecological question addressed
    - output property? (e.g., species?)
    - value or distribution?
3. machine learning methods:
    - classification vs. clustering?
    - features used? feature reduction?
    - SVM? logistic regression?
4. results
    - accuracy / recall / F1 measure
5. can we improve the methodology using additional knowledge?

 frf: in-memory array database.

Check out Java and GPUs for better distributed computation.
 2012  Mapping Savanna Tree Species at Ecosystem Scales Using Support Vector Machine Classification and BRDF Correction on Airborne Hyperspectral and LiDAR Data  5  Remote Sensing Colgan, Baldeck, Féret, Asner  Dataset: eastern South Africa, collected by the Carnegie Airborne Observatory (CAO).
Resolution 1.12m, flight overlap 100%
point density averaged two points per spot.
Spatial error less than 0.20m vertically, 0.36m horizontally
72 bands: 385 to 1,054 nm
Resolution 1.12m (co-aligned with the LiDAR)
Flights: April-May 2008.
Ground data: collected in 2009
729 individual tree crowns
Species info
basal diameter
crown diameter
An additional 124 circular field plots of 30 m diameter were used to measure the abundance of each species.
These plots were not used in calibration or validation of the species classifiers, since trees were not individually located within the plots.

  • Filter

    • NDVI ≥ 0.5 : exclude soil and grass

    • NIR reflectance ≥ 20% : exclude heavily shaded samples

    • Only crowns with three pixels passing both filters were used as model calibration data

      • smallest detectable crown in the LiDAR data was 2.2 m in diameter (two pixels wide)

    • minimum of 25 crowns per species

    • 15 major species + an "other" category.

  • Software Packages

    • Crown Segmentation: eCognition

    • SVM: e1071 package in R

  • Separate BRDF Correction per type:

    • well-lit vegetation

    • shaded veg

    • grass

    • soil

    • other (e.g., water)

    • and clouds.

  • One fifth of the crowns held out as test data
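The calibration-data filters above can be sketched in Python as follows. This is a hypothetical helper, not the authors' code: it assumes each pixel arrives as a `(crown_id, species, ndvi, nir)` tuple with NIR reflectance as a fraction, reads "three pixels" as "at least three", and relabels species with fewer than 25 retained crowns as `"other"` (one possible reading of the "15 major species + other" scheme).

```python
from collections import Counter

NDVI_MIN = 0.5     # exclude soil and grass
NIR_MIN = 0.20     # exclude heavily shaded samples (20% NIR reflectance)
MIN_PIXELS = 3     # assumed "at least three" passing pixels per crown
MIN_CROWNS = 25    # crowns needed to keep a species as its own class


def calibration_crowns(pixels):
    """pixels: iterable of (crown_id, species, ndvi, nir) tuples (hypothetical layout).

    Returns {crown_id: label} for crowns usable as calibration data;
    species with fewer than MIN_CROWNS retained crowns are relabelled "other".
    """
    passing = Counter()
    species_of = {}
    for crown_id, species, ndvi, nir in pixels:
        species_of[crown_id] = species
        if ndvi >= NDVI_MIN and nir >= NIR_MIN:
            passing[crown_id] += 1

    kept = [c for c, n in passing.items() if n >= MIN_PIXELS]
    crowns_per_species = Counter(species_of[c] for c in kept)
    return {
        c: species_of[c] if crowns_per_species[species_of[c]] >= MIN_CROWNS else "other"
        for c in kept
    }
```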

Species classification:

    • A two-level SVM stack

      1. Pixel-level SVM classification on the hyperspectral bands to obtain per-species class probabilities

      2. Crown-level SVM on the segmented crowns using the averaged species class probabilities, maximum height, and crown area.

    • SVM Kernel: radial basis function (RBF) kernel
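The step that links the two SVM levels is building one feature vector per crown. A sketch, assuming the first-level pixel SVM has already produced per-class probabilities (the paper used e1071 in R; in Python, `sklearn.svm.SVC(kernel="rbf", probability=True)` is one option). The function name and the 1.12 m pixel size used for crown area are illustrative assumptions.

```python
from statistics import mean


def crown_features(pixel_probs, heights, pixel_area=1.12 ** 2):
    """Feature vector for one crown, to be fed to the second-level (crown) SVM.

    pixel_probs: per-pixel class-probability lists from the first-level pixel SVM
    heights:     LiDAR canopy height (m) of the same pixels
    pixel_area:  area of one pixel in m^2 (assumes 1.12 m pixels)
    Returns the averaged class probabilities followed by [max height, crown area].
    """
    n_classes = len(pixel_probs[0])
    avg_probs = [mean(p[k] for p in pixel_probs) for k in range(n_classes)]
    return avg_probs + [max(heights), len(pixel_probs) * pixel_area]


# Hypothetical crown of two pixels and two species classes.
print(crown_features([[0.8, 0.2], [0.6, 0.4]], heights=[5.0, 7.5], pixel_area=1.0))
```

Using the crown's maximum height rather than a per-pixel height matches the design choice recorded in the results below.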

In confusion matrix:
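Producer's accuracy = # pixels of a class correctly classified / # ground-truth (reference) pixels of that class (equivalent to recall)
User's accuracy = # pixels of a class correctly classified / # pixels classified as that class (equivalent to precision)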
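A small Python sketch of both measures, assuming the common convention that rows of the confusion matrix are reference (ground) classes and columns are predicted classes:

```python
def producer_user_accuracy(cm):
    """cm[i][j] = number of pixels with reference class i and predicted class j.

    Returns (producer's accuracy per class, user's accuracy per class, overall accuracy).
    """
    n = len(cm)
    row_tot = [sum(cm[i]) for i in range(n)]                       # reference pixels per class
    col_tot = [sum(cm[i][j] for i in range(n)) for j in range(n)]  # predicted pixels per class
    producer = [cm[i][i] / row_tot[i] if row_tot[i] else float("nan") for i in range(n)]
    user = [cm[j][j] / col_tot[j] if col_tot[j] else float("nan") for j in range(n)]
    overall = sum(cm[i][i] for i in range(n)) / sum(row_tot)
    return producer, user, overall


cm = [[40, 10],   # reference class 0: 40 correct, 10 labelled as class 1
      [5, 45]]    # reference class 1: 5 labelled as class 0, 45 correct
print(producer_user_accuracy(cm))
```

Producer's accuracy divides by row totals (what is really on the ground); user's accuracy divides by column totals (what the map claims).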

BRDF correction
Reduced NIR variation from 40% to 5%; roughly a 1% increase in accuracy (precision) due to BRDF correction.

  • Accuracy rose from 65% to 71% when ground data increased from 290 to 729 crowns.
  • Incorporating maximum height (rather than pixel height) in the second-level SVM increased accuracy from 73.8% to 76.5%.
  • Given the additional cost and complexity of having both LiDAR and hyperspectral data for a given study area, we note the relative improvement over hyperspectral data alone was relatively minor (from 73.0% to 76.5%).
  • Using the mean spectra over all pixels in a crown as input into a crown-level SVM had poor performance (approximately 54% overall accuracy)—much worse than predicting the pixels individually
  • Chose maximum height over mean height: it is less dependent on canopy shape and shows less variance than mean height among species.
 2013 Estimating Vegetation Beta Diversity from Airborne Imaging Spectroscopy and Unsupervised Clustering
 0  Remote Sensing
Baldeck, Asner
Goal: estimate the beta diversity among sites from high-spatial-resolution airborne data (i.e., how similar two regions are in terms of species composition).
Approaches: 1. unsupervised, using the mean Euclidean distance among pixels; 2. k-means clustering; 3. supervised species classification.
Notes: Kruger National Park (KNP), South Africa. The multiple clustering model allows a rapid assessment of the spatial arrangement of the biodiversity of a region. It outperformed the mean Euclidean distance among pixels but did not reach accuracies as high as a supervised species classification approach. In the supervised scenario, 50% of pixels were classified as "other" due to a lack of ground measurements, which made it unhelpful.
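A sketch of the clustering-based idea: once pixels have been grouped into spectral clusters (e.g., by k-means, not shown), each site becomes a vector of cluster proportions, and beta diversity can be read as the dissimilarity between those vectors. Bray-Curtis dissimilarity is an assumption chosen for illustration; these notes do not say which dissimilarity the paper used, and the cluster labels are made up.

```python
from collections import Counter


def cluster_proportions(labels, k):
    """Fraction of a site's pixels assigned to each of k spectral clusters."""
    counts = Counter(labels)
    n = len(labels)
    return [counts.get(c, 0) / n for c in range(k)]


def bray_curtis(p, q):
    """Bray-Curtis dissimilarity: 0 = identical composition, 1 = nothing shared."""
    num = sum(abs(a - b) for a, b in zip(p, q))
    den = sum(a + b for a, b in zip(p, q))
    return num / den


# Hypothetical cluster labels (e.g. from k-means) for the pixels of two sites.
site_a = [0, 0, 1, 1, 2, 2]
site_b = [0, 0, 0, 1, 1, 1]
k = 3
print(bray_curtis(cluster_proportions(site_a, k), cluster_proportions(site_b, k)))
```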
 2014 Landscape-scale variation in plant community composition of an African savanna from airborne species mapping.  1  Ecological Applications  Baldeck, Colgan, Féret, Levick, Martin, Asner Goal: Maps of community compositional variation were produced by ordination and clustering, and the importance of hillslope-scale topo-edaphic variation in shaping community structure was evaluated with redundancy analysis.
Approaches: two-layer stacked SVM as before. The analysis focuses on hierarchical species clustering, community determination, and the proportion of species in each community.
Notes: they use the R² metric (squared correlation), abundance per unit area, and an indicator value (0-100) for each dominant species per cluster in the hierarchy.
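These notes do not define the indicator value. Assuming it is the Dufrêne-Legendre IndVal (specificity × fidelity × 100, which gives the 0-100 range), it can be computed per species and cluster as below; the function name and data layout are hypothetical.

```python
def indval(abundance, groups, species):
    """Dufrêne-Legendre indicator value (0-100) of one species for each group.

    abundance: list of dicts, one per site, mapping species -> abundance
    groups:    group (cluster) label of each site, in the same order
    """
    labels = sorted(set(groups))
    mean_ab, fidelity = {}, {}
    for g in labels:
        vals = [a.get(species, 0) for a, lab in zip(abundance, groups) if lab == g]
        mean_ab[g] = sum(vals) / len(vals)                     # mean abundance in group g
        fidelity[g] = sum(v > 0 for v in vals) / len(vals)     # B: fraction of sites occupied
    total = sum(mean_ab.values())
    # A (specificity) = this group's mean abundance / sum of group means.
    return {g: 100 * (mean_ab[g] / total if total else 0.0) * fidelity[g] for g in labels}


sites = [{"A": 4}, {"A": 0}, {"A": 1}, {"B": 3}]
print(indval(sites, groups=[1, 1, 2, 2], species="A"))
```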

 2012 Mapping tree species composition in South African savannas using an integrated airborne spectral and LiDAR system  10 Elsevier
Remote Sensing of Environment
Cho, Mathieu, Asner, Naidoo, van Aardt, Ramoelo, Debba, Wessels, Main, Smit, Izak PJ
1. Compare the classification performance of CAO hyperspectral data with (simulated) WorldView-2 and QuickBird multispectral data
2. Assess the effect of adding tree height (LiDAR) to spectral data
3. Matching Expert Knowledge:
whether the species diversity maps generated from the classified species maps corroborate conventional knowledge on species diversity in the region. For example, we assumed that the maps produced would show that granite soils are richer in tree species than gabbro, and that Acacia nigrescens is more abundant on gabbro than on granite.

maximum likelihood classifier, pixel level

Kruger National Park (KNP), South Africa
Simulated WorldView-2 and QuickBird images by resampling the CAO data (1.2 m resolution)
WorldView-2: satellite, 8 bands, 1.85 m multispectral resolution, revisit time ~1 day, collecting up to 1 million km² of imagery per day
WorldView-2 spectral bands are centred at
425 nm (absorbed by chlorophyll),
480 nm (absorbed by chlorophyll),
545 nm (sensitive to plant health),
605 nm (absorbed by carotenoids — detects ‘yellowness’ of vegetation),
660 nm (absorbed by chlorophyll),
725 nm (sensitive to vegetation health),
835 nm (sensitive to leaf mass and moisture content)
and 950 nm (sensitive to leaf mass and moisture content) (see review by Ustin et al., 2009).
QuickBird: satellite, 4 bands, sub-meter resolution. Panchromatic (black and white) imagery at 60 cm resolution and multispectral imagery at 2.4 and 2.8 m resolutions.
485 nm (blue), 560 nm (green), 660 nm (red) and 830 nm (near-infrared)
WorldView-2 data performed as well as or better than CAO data, attributed to its specific band selection. However, LiDAR is not available from the satellite.

Hyperspectral + LiDAR was only 2% better than hyperspectral alone, but the difference is statistically significant (unlikely to be due to chance, so the null hypothesis of equal accuracy can be rejected).
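These notes do not say which significance test was used. McNemar's test is one common choice for comparing two classifiers evaluated on the same validation samples; a sketch, assuming only the two discordant counts are known:

```python
import math


def mcnemar(b, c):
    """McNemar's test with continuity correction for two classifiers on the same samples.

    b: samples classifier 1 got right and classifier 2 got wrong
    c: samples classifier 1 got wrong and classifier 2 got right
    Returns (chi-square statistic, p-value) with 1 degree of freedom.
    """
    if b + c == 0:
        return 0.0, 1.0
    chi2 = max(0, abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


print(mcnemar(30, 12))  # hypothetical discordant counts
```

With these hypothetical counts the statistic is 289/42 ≈ 6.88, above the 5% critical value of 3.84 for 1 degree of freedom. Only the samples on which the classifiers disagree enter the test, which is why a small overall accuracy gap can still be significant.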

Possible Approaches
Parametric methods: maximum likelihood, discriminant analysis
first order variations (e.g. mean values)
second order variations (e.g., covariance matrices): accounting for within-species variability in the classification.
Cons: high dimensionality of hyperspectral data: the number of training spectra per species must exceed the number of spectral bands (e.g., at least 221 training spectra per species for 220 spectral bands), otherwise the species covariance matrix is singular. Non-parametric classifiers might be more useful than maximum likelihood when within-species variability is high.
Non-parametric Methods (make no assumption of the data distribution)
• Spectral similarity measures e.g. spectral angle mapper
• Sub-pixel classification techniques e.g. spectral mixture analysis
• Machine learning methods e.g. ANN and SVM, decision tree classification techniques e.g. Random forests
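The sample-size constraint on parametric methods follows from linear algebra: a covariance matrix estimated from n spectra has rank at most n - 1, so with d bands it can only be inverted (as maximum likelihood requires) when n ≥ d + 1. A quick NumPy check on synthetic random spectra (not real data); the helper name is illustrative:

```python
import numpy as np


def covariance_rank(n_spectra, n_bands=220, seed=0):
    """Numerical rank of the sample covariance of n_spectra random 'spectra'."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_spectra, n_bands))    # synthetic training spectra for one species
    cov = np.cov(X, rowvar=False)                # n_bands x n_bands sample covariance
    return int(np.linalg.matrix_rank(cov))


for n in (100, 220, 221, 400):
    r = covariance_rank(n)
    print(n, r, "invertible" if r == 220 else "singular")
```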

Trees at different phenological stages (stages of the seasonal leaf cycle, e.g., leaf flush vs. senescence, rather than tree age) cause confusion.
Filter: height > 2 m (average minimum tree height).

Tools used: to obtain WorldView-2-quality data from the CAO data, the CAO data were resampled in ENVI. First, we converted the classified species raster map into a vector map of the species polygons in the ENVI software. The resulting species polygon image was exported into ArcGIS software, where the polygon centroids were converted into point data (point shapefile), representing the species. Subsequently, the point shapefile was exported to DIVA GIS software (Hijmans et al., 2005), where the tree species diversity maps were generated on a per hectare (ha) basis.
 2005 Hyperspectral discrimination of tropical rain forest tree species at leaf to crown scales  237 Remote Sensing of Environment Clark, Roberts, Clark  161 bands, 7 species, Costa Rica
Scale: near-infrared (700–1327 nm) bands were consistently important across all scales. Bands in the visible region (437–700 nm) and shortwave infrared (1994–2435 nm) were more important at pixel and crown scales.
leaf: laboratory
pixel: flight
crown: flight; class = majority class of its pixels
Approach: classifiers were applied to combinations of bands from a stepwise-selection procedure:
linear discriminant analysis (LDA) using MASS package in R
maximum likelihood (ML) in ENVI v4.1 and IDL v6.1
spectral angle mapper (SAM) in ENVI v4.1 and IDL v6.1
The SAM classifier performed poorly at all scales and spectral regions of analysis

  • Determine if spectral variation among tree species (interspecific) is greater than spectral variation within species (intraspecific), thereby permitting spectral-based species discrimination.
  • Identify the spatial scale (leaf/pixel/tree) and spectral regions that provide optimal discrimination among TRF emergent tree species.
  • Develop an analytical procedure for the species-level (floristic) classification of individual tree crowns using their reflectance spectra.
  • Assess the relative importance of narrowband hyperspectral versus broadband multispectral information for species identification of TRF trees.
Ground data from a previous 1998 paper by D.B. Clark (544 trees of 27 species; 7 species selected):
Edaphic variation and the mesoscale distribution of tree species in a neotropical rain forest.
Flight: The U.S. Naval Research Laboratory flew the airborne HYperspectral Digital Imagery Collection Experiment (HYDICE) sensor over LSBS in 1998

 the spectral angle is a metric used for comparing the degree of similarity between two spectra. Unlike Euclidean distance, the spectral angle is insensitive to linearly-scaled differences among spectra such as those caused by illumination.
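A minimal spectral angle mapper sketch in Python. The 4-band spectra are invented, and the classifier simply assigns the reference class with the smallest angle; scaling a spectrum by a constant (as with illumination) leaves the angle unchanged.

```python
import math


def spectral_angle(t, r):
    """Angle (radians) between spectra t and r: arccos(t.r / (|t| |r|))."""
    dot = sum(a * b for a, b in zip(t, r))
    norm = math.sqrt(sum(a * a for a in t)) * math.sqrt(sum(b * b for b in r))
    return math.acos(max(-1.0, min(1.0, dot / norm)))   # clamp guards against rounding


def sam_classify(pixel, references):
    """Return the reference class whose spectrum makes the smallest angle with pixel."""
    return min(references, key=lambda name: spectral_angle(pixel, references[name]))


references = {"veg": [0.05, 0.08, 0.45, 0.40], "soil": [0.20, 0.25, 0.30, 0.35]}
shaded_leaf = [0.5 * x for x in references["veg"]]      # same shape, half the brightness
print(sam_classify(shaded_leaf, references))             # veg
```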

Feature Selection:
  • Full spectra: 161 bands
  • Sub-sampled spectral regions (VIS, NIR, SWIR1, SWIR2) with 10 bands per region
    • These bands were evenly-spaced with an average spacing of 23 nm (VIS), 55 nm (NIR), 25 nm (SWIR1), and 47 nm (SWIR2).
  • A forward stepwise selection method based on discriminant analysis, implemented using the SAS STEPDISC procedure.
 2012  Species-Level Differences in Hyperspectral Metrics among Tropical Rainforest Trees as Determined by a Tree-Based Classifier  4  Remote Sensing  Clark, Roberts
 Costa Rica, 1.6 m resolution, 210 bands

Metrics that respond to vegetation chemistry and structure were derived using narrowband indices, derivative- and absorption- based techniques, and spectral mixture analysis.

Random Forest Classifier in R

RF classification was performed with hyperspectral metrics from
tissue- (bark, leaf), pixel- and
crown-scale spectra,

with the following sets of metrics:
(1). indices;
(2). absorption-based;
(3). derivative;
(4). SMA fractions (pixel and crown scale only);
(5). all available metrics.
Spectral mixture analysis (SMA) models reflectance spectra as a linear combination of dominant spectral components, or endmembers, producing per-pixel fractional abundance of each endmember and a root-mean-square error (RMSE) model fit.

Tropical forest canopies are typically modeled as a mixture of the following end members:
  • “green” photosynthetic vegetation (GV),
  • non-photosynthetic vegetation (NPV),
  • shade
  • and possibly soil substrate endmember
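 Linear mixing: each pixel spectrum is modeled as the sum of the endmember spectra weighted by their fractions, plus a residual; the RMSE of the residual measures model fit.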

Pixel- and crown-scale spectra were unmixed using a three-endmember SMA model [69] composed of GV, NPV and photometric shade
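A sketch of three-endmember linear unmixing with NumPy. Assumptions not taken from the paper: the sum-to-one constraint is enforced softly by appending a heavily weighted row of ones, fractions are not forced to be non-negative, and the endmember spectra are invented 4-band values with photometric shade modeled as zero reflectance.

```python
import numpy as np


def unmix(pixel, endmembers, weight=1e3):
    """Linear spectral mixture analysis with a soft sum-to-one constraint.

    pixel:      (n_bands,) reflectance spectrum
    endmembers: (n_bands, n_end) matrix, one endmember spectrum per column
    weight:     how strongly sum(fractions) == 1 is enforced (extra equation)
    Returns (fractions, rmse of the spectral residual).
    """
    n_end = endmembers.shape[1]
    A = np.vstack([endmembers, weight * np.ones((1, n_end))])
    y = np.append(pixel, weight)
    fractions, *_ = np.linalg.lstsq(A, y, rcond=None)
    residual = pixel - endmembers @ fractions
    return fractions, float(np.sqrt(np.mean(residual ** 2)))


# Columns: GV, NPV, photometric shade (zero reflectance). Rows: 4 hypothetical bands.
E = np.array([[0.04, 0.20, 0.0],
              [0.08, 0.25, 0.0],
              [0.45, 0.30, 0.0],
              [0.40, 0.35, 0.0]])
pixel = E @ np.array([0.5, 0.2, 0.3])   # synthetic mixture: 50% GV, 20% NPV, 30% shade
fractions, rmse = unmix(pixel, E)
print(fractions.round(3), rmse)
```

Because shade contributes no reflectance, its fraction is determined entirely by the sum-to-one row, which is how a shade endmember absorbs overall brightness differences.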

 2014  Tree crown delineation and tree species classification in boreal forests using hyperspectral and ALS data  0 Elsevier Remote Sensing of Environment Dalponte, Ørka, Ene, Gobakken, Næsset
 Location: Norway. ALS (Airborne Laser Scanning, i.e., LiDAR)

 In total, 2363 trees were recorded in the 23 plots, with a dominant species distribution of 57% spruce, 28% pine, and 15% broadleaves. Hyperspectral data (2008): 160 bands, spectral resolution 3.7 nm.

Classifier: SVM (R package kernlab), with all the hyperspectral bands acquired by the sensor as input features.

The classification at the ITC (individual tree crown) level was obtained by aggregating the classified pixels inside each ITC according to a majority rule.
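The majority rule itself is short. A sketch, assuming per-pixel class labels and crown IDs are already available (the function name is illustrative; ties go to the class seen first in that crown, an arbitrary choice):

```python
from collections import Counter, defaultdict


def crown_majority(pixel_crowns, pixel_classes):
    """Assign each individual tree crown (ITC) the most frequent class among its pixels."""
    votes = defaultdict(Counter)
    for crown, cls in zip(pixel_crowns, pixel_classes):
        votes[crown][cls] += 1
    return {crown: counter.most_common(1)[0][0] for crown, counter in votes.items()}


print(crown_majority([1, 1, 1, 2, 2], ["spruce", "pine", "spruce", "pine", "pine"]))
# {1: 'spruce', 2: 'pine'}
```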

five classification cases were analyzed:
i) a fully manual case based on manually delineated ITCs (the M–M case),
ii–iii) two fully automatic cases based on ITCs automatically delineated on hyperspectral data (the H–H case) and on ALS data (the L–L case), and
iv–v) two semi-automatic cases that consider manually delineated ITCs in the training phase and ITCs automatically delineated on hyperspectral data (the M–H case) and on ALS data (the M–L case) in the validation phase.

 Two thresholding methods were tested:
i) the automatic Otsu thresholding method (OTM; Otsu, 1979),
ii) a percentile-based thresholding (PTM).
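A pure-Python sketch of Otsu's method on a histogram. The bin count and the mapping of the best bin back to data units are assumptions, and these notes do not state which image band or layer was thresholded.

```python
def otsu_threshold(values, n_bins=256):
    """Otsu (1979): choose the threshold maximizing between-class variance
    of a histogram of `values` (e.g. brightness or canopy height)."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return lo
    width = (hi - lo) / n_bins
    hist = [0] * n_bins
    for v in values:
        hist[min(int((v - lo) / width), n_bins - 1)] += 1

    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w0 = sum0 = 0
    best_score, best_bin = -1.0, 0
    for i, h in enumerate(hist):
        w0 += h                    # number of values in class 0 (bins 0..i)
        sum0 += i * h
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        score = w0 * w1 * (mu0 - mu1) ** 2   # proportional to between-class variance
        if score > best_score:
            best_score, best_bin = score, i
    return lo + (best_bin + 1) * width       # upper edge of the last class-0 bin


print(otsu_threshold([1.0] * 50 + [10.0] * 50))
```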

k-fold cross-validation (one fold per plot, i.e., leave-one-plot-out)
the classifier was trained with the trees belonging to k − 1 plots (where k is the number of plots; k = 23 in our study) and validated on the left-out plot. This process was repeated k = 23 times.
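The leave-one-plot-out loop can be sketched as below (scikit-learn's `LeaveOneGroupOut` implements the same splitting); the function name and plot IDs are hypothetical. The inner five-fold model selection would run on each `train` set only.

```python
def leave_one_plot_out(plot_ids):
    """Yield (held_out_plot, train_indices, test_indices): train on the trees of
    k - 1 plots and test on the trees of the remaining plot, once per plot."""
    for p in sorted(set(plot_ids)):
        train = [i for i, q in enumerate(plot_ids) if q != p]
        test = [i for i, q in enumerate(plot_ids) if q == p]
        yield p, train, test


for plot, train, test in leave_one_plot_out(["p1", "p1", "p2", "p3", "p3"]):
    print(plot, train, test)
```

Splitting by plot rather than by tree keeps spatially correlated trees from the same plot out of both training and validation at once.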

The model selection of the SVM was performed using a five-fold cross validation on the training dataset of size N − 1.

The relationship between the classification accuracy and the distributions of
i) the DBH of the trees (measured on the ground), and
ii) the crown area provided by the two delineation methods,
was investigated using analysis of variance (ANOVA) and additional multiple comparison test of the differences in means (the Tukey's “Honest Significant Difference”) implemented in the stats-package of R.
Tukey's HSD is more conservative than, for example, a standard t-test.

 2012  Semi-Supervised Methods to Identify Individual Crowns of Lowland Tropical Canopy Species Using Imaging Spectroscopy and LiDAR  6  Remote Sensing  Féret,  Asner  
Nine tree species, Hawaiian lowland forest
hyperspectral imagery, LiDAR intensities, and LiDAR height
semi-supervised Support Vector Machine classification using tensor summation kernel was superior to supervised classification
combination of hyperspectral imagery and LiDAR data usually improved species classification

 Both LiDAR intensity and LiDAR canopy height proved useful for classification

Recent work has shown that multiple species can be detected in tropical forests, yet accuracies and the potential for automation remain highly uncertain

Image size: 1,980 × 1,420 pixels

resolution of 0.56 m

24 spectral bands of 28 nm in width, evenly spaced between 390 nm and 1,044 nm
LiDAR spot spacing was 0.56 m both across- and down-track, with 50% overlap between adjacent flightlines, resulting in two laser shots per 0.56 m. A physically based model was used to estimate top-of-canopy and ground surfaces using the REALM and TerraScan/TerraMatch software packages.
3 intensity values

791 individual tree crowns (ITCs) from 17 species

we reduced the dataset by discarding ITCs smaller than 50 pixels and species with less than 12 ITCs. The final dataset used for this study encompassed 333 ITCs from nine different species

3-year time lag between the acquisition of the image (September 2007) and the collection of the ground truth (November 2010)

Both the supervised and semi-supervised classifications performed in this study were based on the SVM
SVMs generate linear boundaries between classes by maximizing the margin between the hyperplane and the closest training samples (i.e., the support vectors) while minimizing the error on training samples that cannot be separated. As the classes are rarely linearly separable in the original feature space, SVM projects the training dataset into a kernel feature space of higher dimensionality; this projection is nonlinear. Linear (L-) SVM and radial basis function (RBF-) SVM both outperform other non-parametric classifiers such as k-nearest neighbor or artificial neural network approaches, and have comparable or better performance than discriminant analysis. All classification tasks were performed using the MATLAB interface of the LIBSVM package. The RBF function is written as follows: ...
training data, by semi-supervised methods. The semi-supervised classification takes advantage of complementary data corresponding to unlabeled samples in order to improve the estimation of the marginal data distribution during the training stage. To start the semi-supervised approach, we first randomly selected 500 unlabeled pixels from the total dataset. We then implemented the semi-supervised approach proposed by Tuia and Camps-Valls, which is based on the local regularization of the training kernel. The information contained in these unlabeled samples is used to create a bagged kernel, combined with the training kernel in order to deform its base structure through a cluster-based method. This bagged kernel is obtained after successive k-means clusterings (with different initializations but the same number of clusters) are performed on the combined training/unlabeled samples. The bagged kernel accounts for the number of times two samples i and j have been assigned to the same cluster. Here we compared two different kernels: the tensor product kernel, which deforms the training kernel by multiplying it with the bagged kernel, and the tensor summation kernel, which deforms the training kernel by adding the bagged kernel to it. A package including MATLAB source code for this method is publicly available [37] (

These two types of LiDAR variables complement one another, and we recommend combining them with hyperspectral data whenever possible as the full combination of hyperspectral imagery, LiDAR intensity and canopy height outperformed any other combination tested here when averaged on the nine species studied, and showed significant improvements compared to hyperspectral data only or combined with one of the two LiDAR data types studied for six of these species.

 2013  Tree species discrimination in tropical forests using airborne imaging spectroscopy  14 IEEE Transactions on Geoscience and Remote Sensing  Féret, Asner  Location: Hawaii

supervised classification
Nonparametric methods:
  • linear and radial basis function support vector machine
  • artificial neural network
  • k-nearest neighbor
Parametric methods:
  • linear, quadratic, and regularized discriminant analysis
  • a clear advantage in using regularized discriminant analysis, linear discriminant analysis, and support vector machines.
combine segmentation and species classification from regularized discriminant analysis to produce a map of the 17 discriminated species.

mixed crown is defined as one in which two or more species occupy the same canopy space at a scale of 1–2 m spatial resolution.

A total of 920 ITCs were identified and located, corresponding to 17 “pure” species and 12 types of mixed crowns, resulting in 29 different classes to be discriminated.

majority-class rule classification

The mean shift clustering algorithm implemented in the Edge Detection and Image SegmentatiON system (EDISON) [57] gave satisfying results for automatic segmentation of tree crowns with a subset of three visible bands of our data (R = 646 nm; G = 560.7 nm; B = 447 nm).
The tree segments do not exactly correspond to ITCs. However, [28] found that even polygons produced through automated methods that were only partially in agreement with detailed ground mapping improved tree species classification
After this segmentation, a pixel-wise classification is performed using the classifier showing the best performance for pixel-wise classification, with 50 pixels per species for training, and a majority vote rule is applied to decide about the class assigned to each region.

 2014 A framework for mapping tree species combining hyperspectral and LiDAR data: Role of selected classifiers and sensor across three spatial scales  1  Elsevier International Journal of Applied Earth Observation and Geoinformation Ghosh, Fassnacht, Joshi, Koch location: Germany
 Goal: study the scale effect in imaging spectroscopy for tree species mapping when moving from 4 to 30 m pixel size.

Two airborne (HyMAP) and one spaceborne (Hyperion) imaging spectroscopy dataset with pixel sizes of 4, 8 and 30 m, respectively were available to examine the effect of scale over a central European forest.
managed forest with relatively homogeneous stands featuring mostly two canopy layers.

Supervised kernel-based (Support Vector Machines) and ensemble-based (Random Forest) classifiers

8 m pixels were slightly better than 4 m, and 30 m still produced sound results.


Hyperspectral; LiDAR (12 points per square meter), collected by NASA

Six different sets of predictor variables (reflectance value of all bands, selected components of a Minimum Noise Fraction (MNF), Vegetation Indices (VI) and each of these sets combined with LiDAR derived height) were explored at each scale

For processing the point clouds and generating an nDSM, we used the TreesVis.fitting procedure, which is based on a force minimization algorithm. The nDSM is obtained by subtracting the terrain model from the surface model.

 Natural conditions such as tree age, forest structure, and density should be considered.

There is no common usage of the terms digital elevation model (DEM), digital terrain model (DTM) and digital surface model (DSM) in scientific literature. In most cases the term digital surface model represents the earth's surface and includes all objects on it. In contrast to a DSM, the digital terrain model represents the bare ground surface without any objects like plants and buildings
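Following these definitions, a normalized DSM (canopy height) is the surface model minus the terrain model. A NumPy sketch on two co-registered grids; the function name is illustrative, and clipping negative heights to zero is an assumption, not something the notes specify.

```python
import numpy as np


def ndsm(dsm, dtm):
    """Normalized DSM: surface elevation minus bare-ground elevation, per cell.

    Both grids must be co-registered (same extent and cell size).
    Negative values (noise, interpolation artefacts) are clipped to 0.
    """
    return np.clip(np.asarray(dsm, float) - np.asarray(dtm, float), 0.0, None)


dsm = [[105.0, 100.2], [120.5, 99.0]]   # hypothetical surface elevations (m)
dtm = [[100.0, 100.0], [101.0, 99.5]]   # hypothetical terrain elevations (m)
print(ndsm(dsm, dtm))
```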

 2012  Tree Species Classification with Random Forest Using Very High Spatial Resolution 8-Band WorldView-2 Satellite Data
 Remote Sensing Immitzer, Atzberger, Koukal
 WorldView-2 satellite, 8 bands, Austria
Random Forest (RF) classification (object-based and pixel-based)
sub-mountain zone
For the object-based approach, we calculated the mean band values for each crown polygon using its within-crown pixel spectra.

At nadir the ground resolution (GSD) is 50 cm for the panchromatic band (0.46–0.80 μm) and 200 cm for the multispectral bands.
Some tree species were much better separated with 8 bands than with only the 4 standard bands.
 Classification of savanna tree species, in the Greater Kruger National Park region, by integrating hyperspectral and LiDAR data in a Random Forest datamining environment  14 ISPRS Journal of Photogrammetry and Remote Sensing
Naidoo, Cho, Mathieu, Asner
Kruger National Park region, South Africa,
seven predictor datasets
Random Forest
Important predictors: height; NDVI; the chlorophyll b wavelength (466 nm); and a selection of raw, continuum-removed, and Spectral Angle Mapper (SAM) bands.

Concluded that the hybrid predictor dataset Random Forest model yielded the highest classification accuracy

72 bands

LiDAR: about one laser shot per pixel (~1.3 points per 1.1 m pixel)

 2009 Retrieval of foliar information about plant pigment systems from high resolution spectroscopy  116  Remote Sensing of Environment
 Ustin, Gitelson,  Jacquemoud,  Schaepman,  Asner, Gamon,  Zarco-Tejada.
Pigment: the natural coloring matter of animal or plant tissue.
Pigment color differs from structural color in that it is the same for all viewing angles, whereas structural color is the result of selective reflection or iridescence, usually because of multilayer structures. For example, butterfly wings typically contain structural color, although many butterflies have cells that contain pigment as well.

All biological pigments selectively absorb certain wavelengths of light while reflecting others. The light that is absorbed may be used by the plant to power chemical reactions, while the reflected wavelengths of light determine the color the pigment will appear to the eye.

Plants contain a green pigment (chlorophyll) along with several red/yellow pigments
that help capture as much light energy as possible.

 We review recent advances in detecting plant pigments at the leaf level and discuss successes and reasons why challenges remain for robust remote observation and quantification.


New methods to identify and quantify individual pigments in the presence of overlapping absorption features would provide a major advance in understanding their biological functions, quantifying net carbon exchange, and identifying plant stresses. We focus primarily on reflectance measurements, at the leaf level, emphasizing advances in the past 15–20 years, and examining two types of quantitative approaches: (1) empirical methods and (2) physically based radiative transfer models and quantitative methods.

It has been noted that extracted chlorophyll absorption peaks are shifted about 20 nm to shorter wavelengths than observed in reflectance from intact leaves.

-- Which indices or wavelengths to use for various detections.
   Mapping a priori defined plant associations using remotely sensed vegetation characteristics      Roelofsen, Lammert Kooistra, Peter M. van Bodegom, Jochem Verrelst, Johan Krol, and Jan-Philip M. Witte.  
   Linguistic Regularities in Continuous Space Word Representations     Tomas Mikolov, Wen-tau Yih, Geoffrey Zweig  Note:
Each word is represented as a vector, and each relationship (e.g., clothes to shirt) can be represented as the vector offset between the two words.
           Vector-oriented reasoning based on the offset between words: for "better is to best as rougher is to ?", the answer would be "roughest".
           The only contribution is applying it to a relation comparison dataset. The real implementation was done by Tomas Mikolov.
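The offset method can be illustrated with toy vectors. The 3-dimensional vectors below are invented so the arithmetic is easy to follow; the computation (y = x_best - x_better + x_rougher, then the nearest word by cosine similarity) follows the vector-offset idea, and excluding the three query words from the candidates is a common convention assumed here.

```python
import math

# Toy "embeddings" invented for illustration; real vectors come from a trained model.
vectors = {
    "better":   [1.0, 0.0, 0.2],
    "best":     [1.0, 1.0, 0.2],
    "rougher":  [0.0, 0.0, 1.0],
    "roughest": [0.0, 1.0, 1.0],
    "rough":    [0.0, -1.0, 1.0],
}


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def analogy(a, b, c):
    """Solve 'a is to b as c is to ?' with y = x_b - x_a + x_c and cosine similarity."""
    y = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], y))


print(analogy("better", "best", "rougher"))  # roughest
```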

  Global Biodiversity Information Facility

Geotagged species occurrence locations; can be combined with MODIS NDVI to study the change in NDVI over time per species.

 2012  JuliaLang
 12  arXiv  Jeff Bezanson, Stefan Karpinski, Viral B. Shah, Alan Edelman  JuliaLang: parallel/distributed scientific programming language vs. R, MATLAB parallel toolbox

 Julia is a high-level, high-performance dynamic programming language for technical computing, with syntax that is familiar to users of other technical computing environments.

We want a language that’s open source, with a liberal license. We want the speed of C with the dynamism of Ruby

Con: modern data analytics systems are data-oriented. Data movement, not parallel computation or yet another parallel programming language, is what determines efficiency... Process data right where it resides (in-database computation).
  D4M: Dynamic Distributed Dimensional Data Model
           R-tree is a data structure that splits space into nested rectangles. It is good for object hierarchies: each object (a polygon, line, or road) is enclosed in a minimum bounding rectangle, which allows quick spatial tests when maintaining polygons. It is like a B-tree, but instead of a node just containing a value, each node contains the extent of its rectangle.

K-D tree: given a set of points, answers queries such as the n nearest neighbors or all nodes within range r; it splits space horizontally or vertically at chosen nodes.
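A minimal k-d tree sketch in Python: the build splits at the median point on alternating axes, and the nearest-neighbor search only visits the far side of a split when the splitting plane is closer than the best point found so far. Names are illustrative; for real LiDAR data a library implementation such as `scipy.spatial.cKDTree` would be the practical choice.

```python
import math


def build(points, depth=0):
    """Build a k-d tree as nested tuples (point, axis, left, right)."""
    if not points:
        return None
    axis = depth % len(points[0])              # cycle through x, y, ...
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2                        # median point becomes the splitting node
    return (pts[mid], axis, build(pts[:mid], depth + 1), build(pts[mid + 1:], depth + 1))


def nearest(node, target, best=None):
    """Return the stored point closest to `target` (Euclidean distance)."""
    if node is None:
        return best
    point, axis, left, right = node
    if best is None or math.dist(point, target) < math.dist(best, target):
        best = point
    diff = target[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, target, best)
    if abs(diff) < math.dist(best, target):    # splitting plane closer than current best
        best = nearest(far, target, best)
    return best


tree = build([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
print(nearest(tree, (9, 2)))  # (8, 1)
```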

Quadtree is like a k-d tree, but instead of each node splitting space vertically or horizontally, each node splits space into NE, SE, NW, SW quadrants (hence "quad"); newer points go into the proper quadrant and have their own children. It can find all points contained within a range.
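A point-quadtree sketch matching this description, where each stored point splits space into NE, NW, SE, SW. The class name and the convention that points lying on a split line go north/east are assumptions.

```python
class QuadNode:
    """Point quadtree node: the stored point splits space into four quadrants."""

    def __init__(self, x, y):
        self.x, self.y = x, y
        self.children = {}            # quadrant name ("NE", "NW", "SE", "SW") -> QuadNode

    def quadrant(self, x, y):
        return ("N" if y >= self.y else "S") + ("E" if x >= self.x else "W")

    def insert(self, x, y):
        q = self.quadrant(x, y)
        if q in self.children:
            self.children[q].insert(x, y)
        else:
            self.children[q] = QuadNode(x, y)

    def range_query(self, xmin, xmax, ymin, ymax, out=None):
        """Collect all points with xmin <= x <= xmax and ymin <= y <= ymax."""
        if out is None:
            out = []
        if xmin <= self.x <= xmax and ymin <= self.y <= ymax:
            out.append((self.x, self.y))
        for q, child in self.children.items():
            # Skip quadrants that cannot intersect the query rectangle.
            if (q[0] == "N" and ymax < self.y) or (q[0] == "S" and ymin >= self.y):
                continue
            if (q[1] == "E" and xmax < self.x) or (q[1] == "W" and xmin >= self.x):
                continue
            child.range_query(xmin, xmax, ymin, ymax, out)
        return out


root = QuadNode(50, 50)
for p in [(10, 10), (80, 20), (60, 70), (20, 90), (55, 55)]:
    root.insert(*p)
print(sorted(root.range_query(40, 70, 40, 80)))  # [(50, 50), (55, 55), (60, 70)]
```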

alpha shapes/concave hull

 2011  A Kd-tree-based Outlier Detection Method for Airborne LiDAR Point Clouds  2   IEEE Image and Data Fusion   Notes:
The average of the distances between the central point and its k nearest neighbors is calculated.
If the average distance is larger than an adaptively preset value, the point is regarded as an outlier.
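A sketch of the k-nearest-neighbor outlier rule. Assumptions: brute-force neighbor search stands in for the kd-tree (ANN library) used in the paper, and the "adaptively preset value" is taken to be the mean plus two standard deviations of the per-point average distances; the function name is hypothetical.

```python
import math
from statistics import mean, pstdev


def knn_outliers(points, k=3, n_std=2.0):
    """Return indices of points whose mean distance to their k nearest neighbours
    exceeds mean + n_std * std of those mean distances (assumed adaptive rule)."""
    avg = []
    for i, p in enumerate(points):
        d = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        avg.append(sum(d[:k]) / k)
    threshold = mean(avg) + n_std * pstdev(avg)
    return [i for i, a in enumerate(avg) if a > threshold]


# Hypothetical 5 x 5 grid of ground returns plus one spurious high point.
cloud = [(x, y, 0.0) for x in range(5) for y in range(5)] + [(2, 2, 50.0)]
print(knn_outliers(cloud))  # [25]
```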

crown segmentation using kd-tree and expanding on nearest neighbors graph

Lidar data structures: TIN, octree, k-d tree
k-d tree
 k-d tree (short for k-dimensional tree) is a space-partitioning data structure for organizing points in a k-dimensional space.
range searches
nearest neighbor searches
If for a particular split the "x" axis is chosen, all points in the subtree with a smaller "x" value than the node appear in the left subtree, and all points with a larger "x" value appear in the right subtree.
 ANN [7] is employed to realize the kd-tree. ANN is a library written in the C++ programming language to support both exact and approximate nearest neighbor searching in spaces of various dimensions.
kd-tree is more efficient than the TIN

A multi-resolution approach for filtering LiDAR altimetry data
An optimal algorithm for approximate nearest neighbor searching in fixed dimensions

 2012  A New Method for Segmenting Individual Trees from the Lidar Point Cloud  29  Photogrammetric Engineering and Remote Sensing  Wenkai Li,
Qinghua Guo,
Marek Jakubowski, Maggi Kelly

TerraScan software used to classify raw lidar point data to ground / above-ground
Ordinary kriging(point interpolation) used to interpolate the ground points and generate the digital elevation model (DEM) at 1 m resolution
Normalized the vegetation point cloud by subtracting the ground elevation (DEM) from each lidar point's elevation
After normalization, the elevation value of a point indicates the height from the ground
taking advantage of the relative spacing between trees
the spacing at the top of a tree is larger than the spacing at the bottom
 Starting from a tree top, we can identify and “grow” a target tree by including nearby points and excluding points of other trees based on their relative spacing.
points are classified sequentially, from the highest to the lowest.
Starting from the seed point A, we classify other lower points sequentially.
The threshold should be approximately equal to the crown radius. If the threshold is too small, trees with elongated branches may be over-segmented;
adaptive threshold can be used, assuming that taller trees have larger crown diameters
 projected into 2D Euclidean space, most of the points fall into the left sector
 a spacing threshold
 a minimum spacing rule, and a horizontal profile of the tree shape. Under-segmentations can be reduced by using a relatively small threshold, and over-segmentations can be reduced based on the shape and distribution of the points.
During each iteration, only one tree (target) is segmented,
points corresponding to this target tree are removed from the point cloud.
 First, find the highest point (global maximum)
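A simplified sketch of the top-down growing loop described above. It keeps only the spacing-threshold and minimum-spacing rules (a point joins the target tree if it lies within the threshold of it and is no closer to already-rejected points) and omits the local-maximum and horizontal-profile refinements; the function name and threshold value are hypothetical.

```python
import math


def segment_trees(points, threshold=2.0):
    """Simplified top-down segmentation in the spirit of Li et al. (2012).

    points: list of (x, y, z) with z = normalized height above ground.
    Returns a list of trees, each a list of points.
    """
    remaining = sorted(points, key=lambda p: p[2], reverse=True)   # highest first
    trees = []
    while remaining:
        target, others = [remaining[0]], []        # seed = current global maximum
        for u in remaining[1:]:
            d_target = min(math.dist(u[:2], t[:2]) for t in target)
            d_other = min((math.dist(u[:2], o[:2]) for o in others), default=math.inf)
            if d_target <= threshold and d_target <= d_other:
                target.append(u)
            else:
                others.append(u)
        trees.append(target)
        remaining = others                          # remove segmented tree, repeat
    return trees


tree_a = [(0, 0, 10), (0.5, 0, 9), (1, 0, 8), (-1, 0, 8)]
tree_b = [(5, 0, 12), (5.5, 0, 11), (4.5, 0, 10), (4, 0, 9)]
print([len(t) for t in segment_trees(tree_a + tree_b)])  # [4, 4]
```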

  Delineating Individual Trees from Lidar Data: A Comparison of Vector- and Raster-based Segmentation Approaches
  Remote Sensing    
Good categorization of approaches for tree segmentation using lidar and/or images

  Mining lidar data with spatial clustering algorithms        out of various clustering approaches used, DBSCAN outperformed others.

Data clustering algorithms
Partitioning methods: 
partition a data set into a given number k of clusters (e.g., k-means, k-medoids)
Hierarchical methods: 
single link (SL) algorithm emphasizes the connectedness of the patterns in a cluster
Prototype-based optimization methods:
use an optimization procedure tuned for a particular shape of the cluster.
Spectral clustering and kernel-based: 
 number of parameters is quadratic with respect to the number of samples.
Density-based methods: 
assume that clusters are high-density regions.
Manifold-based methods: 
Evolutionary algorithms for clustering:
genetic algorithms (GA); combinations such as fuzzy clustering with a minimum spanning tree

Three Approaches
the neighbourhood of a given radius ε needs to contain at least a minimum number of objects,
 the density in the neighbourhood has to exceed a certain threshold.
map each point to the distance of the k-th nearest neighbour.
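The density criteria above (an ε-neighbourhood that must contain a minimum number of points, and the k-th-nearest-neighbour distance) are the ingredients of DBSCAN, the method the paper found best. Below is a minimal from-scratch sketch with NumPy, assuming small point sets (it builds the full O(n²) distance matrix); the function names and the `eps`/`min_pts` values in the test are hypothetical, not the paper's settings.

```python
import numpy as np

NOISE = -1

def pairwise_distances(points):
    return np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)

def k_distance(points, k):
    """Distance from each point to its k-th nearest neighbour, sorted descending
    (the 'k-dist' curve often used to choose eps)."""
    d = np.sort(pairwise_distances(points), axis=1)  # column 0 is the point itself
    return np.sort(d[:, k])[::-1]

def dbscan(points, eps, min_pts):
    d = pairwise_distances(points)  # O(n^2) memory: fine for a sketch only
    neighbours = [np.flatnonzero(row <= eps) for row in d]  # includes the point itself
    is_core = np.array([len(nb) >= min_pts for nb in neighbours])
    labels = np.full(len(points), NOISE)
    cluster = 0
    for i in range(len(points)):
        if labels[i] != NOISE or not is_core[i]:
            continue
        labels[i] = cluster  # start a new cluster at an unassigned core point
        stack = [i]
        while stack:
            j = stack.pop()
            if not is_core[j]:
                continue  # border points join the cluster but do not expand it
            for k in neighbours[j]:
                if labels[k] == NOISE:
                    labels[k] = cluster
                    stack.append(k)
        cluster += 1
    return labels
```

For lidar points, a spatial index (KD-tree, octree) would replace the dense distance matrix.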

ordering points based on ε
minimum number of points required to call a circular area ‘dense’
uses perimeters of triangles instead of circular areas.
(1) prepare an ordered ‘live edge’ database;
(2) create ‘sparse cluster’ and ‘noise’ data;
(3) test the clusters and noise for mergeability and output final clusters

Manual clustering was performed using TerraScan software

outputs of the clustering algorithms are compared with the manually generated clusters, using ARI as a measure
qualitatively visualized by overlaying the points

adjusted rand index (ARI)
measures the similarity between two clusterings of the same points; it equals 1 for identical clusterings (up to relabelling), has an expected value of 0 for random labellings, and can be slightly negative
A value close to zero implies that the two clustering approaches are dissimilar (no better than chance agreement), whereas values close to one indicate that the approaches have yielded similar results.
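The ARI can be computed from the contingency table of the two labellings with the standard pair-counting formula. A minimal sketch using only the standard library; the function name is hypothetical.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """ARI between two clusterings of the same points (pair-counting form)."""
    n = len(labels_a)
    if n < 2:
        return 1.0
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)  # expected index under random labelling
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both put every point in one cluster
    return (sum_ij - expected) / (max_index - expected)
```

For example, `[0, 0, 1, 1]` vs `[1, 1, 0, 0]` gives 1.0 (same partition, different label names), while `[0, 0, 1, 1]` vs `[0, 1, 0, 1]` gives about -0.5, worse than chance.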

   lidar Tools
lidar tree segmentation
FUSION and LAStools

A framework for non-photorealistic rendering (NPR) Quadtree
GeoTools - The Open Source Java GIS Toolkit Quadtree
R-Tree UIUC Mini Lecture

Quadtree nearest neighbor algorithm
Automatic Plane Extraction from LIDAR Data Based on Octree Splitting and Merging Segmentation
LiDAR visualization using octree and only keeping nodes relevant to rendering in memory. UC Davis


Tools for Lidar Classification / Filtering:

  • MCC-LIDAR - An open source command-line tool for processing discrete-return LIDAR data in forested environments. It classifies data points as ground or non-ground using the Multiscale Curvature Classification algorithm developed by Evans and Hudak, 2007

  • GRASS GIS - Open source GIS software.  Includes a suite of tools related to lidar data processing as discussed in this GRASS lidar wiki entry.  Currently doesn’t offer direct support for point cloud data in LAS format.

  • BCAL Lidar Tools - Open source tools developed by the Idaho State University Boise Center Aerospace Lab in IDL as a plugin for the ENVI software package.  Includes a Height Filtering tool optimized for open rangeland (sagebrush) vegetation developed by Streutker and Glenn, 2006.

  • SAGA GIS - Open source GIS package.  “SAGA includes several tools to manipulate the point cloud, e.g. an attribute calculator, reclassifier, subset extractor and many other methods like gridding and interpolation. There is also (a grid based) bare earth filter, adapted from Vosselman (2001).”
  • Point Cloud Library (PCL)

  • PRocess TOol LIdar DAta in R

HOW TO: Install latest geospatial & scientific software on Linux

 2011  Point Cloud Classification for Water Surface Identification in Lidar Datasets    UT Austin Master's Thesis
   TINs are constructed by triangulating a set of vertices; the vertices are connected by a series of edges to form a network of triangles. The resulting triangulation is a Delaunay triangulation, which ensures that no vertex lies inside the circumcircle of any triangle in the network.
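The empty-circumcircle property that defines a Delaunay TIN can be checked with the standard in-circle determinant. This sketch only verifies a given triangulation (it does not build one); the function names are hypothetical.

```python
def orient(a, b, c):
    """Twice the signed area of triangle abc (> 0 if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def in_circumcircle(a, b, c, p):
    """True if p lies strictly inside the circumcircle of the CCW triangle abc."""
    ax, ay = a[0] - p[0], a[1] - p[1]
    bx, by = b[0] - p[0], b[1] - p[1]
    cx, cy = c[0] - p[0], c[1] - p[1]
    det = ((ax * ax + ay * ay) * (bx * cy - cx * by)
           - (bx * bx + by * by) * (ax * cy - cx * ay)
           + (cx * cx + cy * cy) * (ax * by - bx * ay))
    return det > 0

def is_delaunay(points, triangles):
    """Check that no vertex lies inside the circumcircle of any triangle."""
    for i, j, k in triangles:
        a, b, c = points[i], points[j], points[k]
        if orient(a, b, c) < 0:
            b, c = c, b  # make the triangle counter-clockwise
        for m, p in enumerate(points):
            if m not in (i, j, k) and in_circumcircle(a, b, c, p):
                return False
    return True
```

For the kite `(0,0), (2,-1), (4,0), (2,1)`, splitting along the short diagonal passes the check, and splitting along the long diagonal fails it.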

 Finding K nearest Neighbors in KD-tree
1. Starting with the root node, the control moves down the tree recursively: it goes left or right depending on whether the search point's coordinate in the node's split dimension is less or greater than the node's (2D coordinates are used here; 3D can also be used).
2. Once control reaches the leaf node, it saves the current node point as the current best fit.
3. The control unwinds the recursion of the tree, performing the following steps at each node:
a. If the current node is closer than the current best, then it becomes the current best fit.
b. The control checks whether there could be any points on the other side of the splitting plane that are closer to the search point than the current best. This is done by intersecting the splitting hyperplane with a hypersphere around the search point that has a radius equal to the nearest distance. This is implemented as a simple comparison to see whether the difference between the splitting coordinate of the search point and current node is less than the distance from the search point to the current best.
i. If the hypersphere crosses the plane, there could be nearer points on the other side, so the control must move down the other branch of the tree from the current location or node, searching for closer points in space by following the same recursive technique.
ii. If the hypersphere does not intersect the splitting plane, then the control moves up the tree and the entire branch on the other side is not inspected any more.
4. Once the process has finished at the root node, the search returns the indices of the nearest points and their distances from the point of interest.
Finding Neighbors within a radius r
The search is similar to finding the K nearest neighbors; the only difference is that the Euclidean distance from each visited point is calculated and the point is kept if the distance is less than r (and the other side of a splitting plane is searched whenever the plane is closer than r).
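The k-nearest-neighbour and radius searches described in the steps above can be sketched in plain Python. This is a minimal sketch, assuming a median-split KD-tree and a max-heap to hold the current k best; all names are hypothetical.

```python
import heapq
import math

class Node:
    def __init__(self, index, axis, left, right):
        self.index, self.axis, self.left, self.right = index, axis, left, right

def build_kdtree(points, indices=None, depth=0):
    """Split on the median along alternating axes."""
    if indices is None:
        indices = list(range(len(points)))
    if not indices:
        return None
    axis = depth % len(points[0])
    indices = sorted(indices, key=lambda i: points[i][axis])
    mid = len(indices) // 2
    return Node(indices[mid], axis,
                build_kdtree(points, indices[:mid], depth + 1),
                build_kdtree(points, indices[mid + 1:], depth + 1))

def knn_search(root, points, query, k):
    """Return [(distance, index)] of the k nearest points, closest first."""
    best = []  # max-heap of (-distance, index); best[0] is the current k-th best
    def visit(node):
        if node is None:
            return
        p = points[node.index]
        diff = query[node.axis] - p[node.axis]
        near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
        visit(near)                # step 1: descend toward the query's side
        d = math.dist(p, query)    # step 3a: test the current node while unwinding
        if len(best) < k:
            heapq.heappush(best, (-d, node.index))
        elif d < -best[0][0]:
            heapq.heapreplace(best, (-d, node.index))
        # step 3b: search the other side only if the hypersphere crosses the split plane
        if len(best) < k or abs(diff) < -best[0][0]:
            visit(far)
    visit(root)
    return sorted((-neg_d, i) for neg_d, i in best)

def radius_search(root, points, query, r):
    """Return [(distance, index)] of all points within distance r of the query."""
    found = []
    def visit(node):
        if node is None:
            return
        p = points[node.index]
        diff = query[node.axis] - p[node.axis]
        near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
        visit(near)
        d = math.dist(p, query)
        if d <= r:
            found.append((d, node.index))
        if abs(diff) <= r:  # the radius-r ball crosses the split plane
            visit(far)
    visit(root)
    return sorted(found)
```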

 2007 3D LiDAR point-cloud segmentation
        Kittipat's Homepage
The MATLAB toolbox, a brief manual, and brief slides are available from the homepage.

 2011 Individual Tree Crowns Delineation Using Local Maxima Approach And Seeded Region Growing Technique    GIS Ostrava
   Seeded region growing is an iterative process started in a pixel from the set of seeds. Pixels from the seed neighborhood are subsequently classified whether or not they are part of the same crown as the seed.
1) Absolute distance from the seed.
2) Brightness agreement.
3) Spectral agreement
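Local-maxima seeding followed by seeded region growing can be sketched on a single-band raster (for example a brightness or canopy-height image). This is a minimal sketch, assuming only criteria 1 (absolute distance from the seed) and 2 (brightness agreement, here "at least `min_ratio` times the seed value"). Criterion 3 (spectral agreement) is omitted, and `max_dist`, `min_ratio`, and `min_value` are hypothetical parameters.

```python
from collections import deque
import numpy as np

def local_maxima(img, min_value):
    """Seeds: pixels that are the maximum of their 3x3 window and at least min_value."""
    seeds = []
    rows, cols = img.shape
    for r in range(rows):
        for c in range(cols):
            window = img[max(r - 1, 0):r + 2, max(c - 1, 0):c + 2]
            if img[r, c] >= min_value and img[r, c] == window.max():
                seeds.append((r, c))
    return seeds

def grow_crowns(img, seeds, max_dist=3.0, min_ratio=0.5):
    """Seeded region growing; returns a label image (0 = not part of any crown)."""
    labels = np.zeros(img.shape, dtype=int)
    # process the brightest seeds first so they claim contested pixels
    for label, (sr, sc) in enumerate(sorted(seeds, key=lambda s: -img[s]), start=1):
        if labels[sr, sc]:
            continue  # seed already absorbed by a brighter crown
        labels[sr, sc] = label
        queue = deque([(sr, sc)])
        while queue:
            r, c = queue.popleft()
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if not (0 <= nr < img.shape[0] and 0 <= nc < img.shape[1]) or labels[nr, nc]:
                    continue
                close = np.hypot(nr - sr, nc - sc) <= max_dist      # 1) distance from seed
                bright = img[nr, nc] >= min_ratio * img[sr, sc]     # 2) brightness agreement
                if close and bright:
                    labels[nr, nc] = label
                    queue.append((nr, nc))
    return labels
```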
   Algorithms for Nearest Neighbor Search        Slides, KD-tree R-Tree ...
 29  Linguistic Regularities in Continuous Space Word Representations    NAACL-HLT    each word is represented as a vector and each relationship e.g. clothes to shirt can be represented as their vector offset
vector-oriented reasoning based on the offsets between words: for ‘better is to best as rougher is to ?’, the answer would be ‘roughest’
the only contribution is applying it to a relation-comparison dataset; the actual implementation (the RNN toolkit) is by Tomas Mikolov

Continuous space language models
 vector-space word representations
implicitly learned by the input-layer weights.
 capturing syntactic and semantic regularities in language

each relationship is characterized by a relation-specific vector offset.

allows vector-oriented reasoning based on the offsets between words
 male/female relationship is automatically learned with the induced vector representations “King - Man + Woman”  results in a vector very close to “Queen.”
word vectors capture syntactic regularities by means of syntactic analogy questions
word vectors capture semantic regularities by using the vector offset method
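The vector offset method answers "a is to b as c is to ?" by forming y = x_b − x_a + x_c and returning the word whose vector has the highest cosine similarity to y. Below is a minimal sketch with hypothetical 3-d toy vectors; real vectors would come from a trained model such as the RNN toolkit. Excluding the three question words from the candidates is an assumption, a common convention.

```python
import numpy as np

def solve_analogy(vectors, a, b, c):
    """Return the word whose vector is most cosine-similar to x_b - x_a + x_c."""
    y = vectors[b] - vectors[a] + vectors[c]
    best_word, best_sim = None, -np.inf
    for word, x in vectors.items():
        if word in (a, b, c):
            continue  # assumption: question words are excluded from the answers
        sim = float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))
        if sim > best_sim:
            best_word, best_sim = word, sim
    return best_word

# hypothetical toy vectors; dimensions loosely mean (royalty, femaleness, person)
toy = {
    "man":   np.array([0.0, 0.0, 1.0]),
    "woman": np.array([0.0, 1.0, 1.0]),
    "king":  np.array([1.0, 0.0, 1.0]),
    "queen": np.array([1.0, 1.0, 1.0]),
    "apple": np.array([0.1, 0.0, 0.5]),
}
print(solve_analogy(toy, "man", "king", "woman"))  # queen
```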
 neural network language models
 representation of words as high dimensional real valued vectors.
words are converted via a learned lookup table into real-valued vectors, which are used as the inputs to a neural network

 n-gram model works in terms of discrete units that have no inherent relationship to one another,
continuous space model works in terms of word vectors where similar words are likely to have similar vectors.
Thus, when the model parameters are adjusted in response to a particular word or word-sequence, the improvements will carry over to occurrences of similar words and sequences.

By training a neural network language model, one obtains not just the model itself but also the learned word representations, which may be used for other, potentially unrelated, tasks.

predicting a probability of  the “next” word, given some preceding words. 

such models were first studied in the context of feed-forward networks, and later in the context of recurrent neural network models (Mikolov et al., 2010; Mikolov et al., 2011b)

Recurrent Neural Network
input layer, a hidden layer with recurrent connections
input vector w(t) represents input word at time t encoded using 1-of-N coding, and the output layer y(t) produces a probability distribution over words. The hidden layer s(t) maintains a representation of the sentence history.

 input vector w(t) and the output vector y(t) have dimensionality of the vocabulary.
 word representations are found in the columns of U, with each column representing a word.
The RNN is trained with backpropagation to maximize the data log-likelihood
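One forward step of the RNN described above can be sketched as s(t) = f(U w(t) + W s(t−1)) and y(t) = g(V s(t)), assuming f is the sigmoid and g the softmax. U follows the notes; the names W and V and all sizes are assumptions for illustration, and training is not shown. Because w(t) is 1-of-N, U w(t) simply selects one column of U, which is why the columns of U serve as the word representations.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

def rnn_step(word_id, s_prev, U, W, V):
    """One time step: s(t) = f(U w(t) + W s(t-1)), y(t) = g(V s(t))."""
    w_t = np.zeros(U.shape[1])
    w_t[word_id] = 1.0             # 1-of-N encoding of the input word
    s_t = sigmoid(U @ w_t + W @ s_prev)
    y_t = softmax(V @ s_t)         # probability distribution over the next word
    return s_t, y_t
```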
we tagged 267M words of newspaper text with Penn Treebank POS tags
 similarity between members of the word pairs (xb, xd), (xc, xd) and dissimilarity for (xa, xd).

we used vectors generated by the RNN toolkit of Mikolov   
RNN generates vector for each word: taken from Mikolov
Training on Mikolov’s dataset
 We present a new dataset for measuring syntactic performance, and achieve almost 40% correct.

Surprisingly, both results are the byproducts of an unsupervised maximum likelihood training criterion that simply operates on a large amount of text data.

 2013   - Endmember Detection Using Graph Theory    IGARSS  Rohani
 image is segmented with a minimum segment size of 20 and k = 0.001,
using spectral angle distance as the distance metric on the original image
each superpixel's spectrum is the average of all its pixels
a kNN graph of the superpixels is built
the sum of distances from each superpixel to its k nearest neighbours is computed
superpixels are sorted by this sum
superpixels with the largest sums are endmembers
it is observed that isolated superpixels are not endmembers and always have a high sum for all choices of k, so they can be discarded
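The superpixel averaging and the SSD ranking steps above can be sketched with NumPy. This is a minimal sketch, assuming the segmentation label map is already available (the segmentation itself and the discarding of isolated superpixels are not implemented). The score uses spectral Euclidean distance, as in the ranking description, and the function names are hypothetical.

```python
import numpy as np

def superpixel_means(cube, segments):
    """Mean spectrum per segment. cube: (rows, cols, bands); segments: (rows, cols) ids."""
    ids = np.unique(segments)
    return ids, np.array([cube[segments == s].mean(axis=0) for s in ids])

def ssd_scores(spectra, k):
    """Sum of spectral Euclidean distances from each superpixel to its k nearest neighbours."""
    d = np.linalg.norm(spectra[:, None, :] - spectra[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a node is not its own neighbour
    return np.sort(d, axis=1)[:, :k].sum(axis=1)

def rank_candidates(spectra, k):
    """Indices sorted by decreasing SSD score (endmember candidates first)."""
    scores = ssd_scores(spectra, k)
    return np.argsort(-scores), scores
```

Repeating `rank_candidates` for several values of k is one way to spot isolated superpixels, which stay at the top for every k.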

Some authors proposed approaches based on piecewise convex models for nonlinear unmixing (e.g. [3], [4]) which assume only local geometrical structure
choice of number of convex regions and the number of the endmembers in each convex region
explores the topological relationships between data points regardless of the geometry of the data cloud.

“centrality” from graph theory to identify boundary points of the data cloud
Multi Dimensional Pixel Purity Index algorithm

V is the set of vertices (the centroids of the image segments), and E is the set of edges, which represent the connectivity between the nodes. In our approach, we build an edge between two nodes if their Euclidean distance is below a threshold ε.
we calculate the spectral Euclidean distances between each pair of minerals available in spectral libraries and set the value of ε equal to the smallest spectral Euclidean distance found among these pairs.

In the linear mixing model, finding the endmembers is equivalent to identifying the vertices of the simplex containing (most of) the data cloud. Even if the mixing is not linear, our simulations for intimate mixing confirm that endmembers all lie on the boundary of the data.
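$C_B(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$
where $\sigma_{st}$ is the number of shortest geodesic paths from s to t, and $\sigma_{st}(v)$ is the number of shortest geodesic paths from node s to node t that pass through a vertex v.
$C_B(v)$ is the betweenness centrality of v, which is a measure of a node’s centrality in a network.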
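Betweenness centrality on an unweighted graph (such as the ε-graph of segment centroids described above) can be computed with Brandes' algorithm. This is a standard method shown as a sketch, not necessarily what the authors used. It counts each unordered pair {s, t} once, and the adjacency-dict format is an assumption.

```python
from collections import deque

def betweenness(adj):
    """Brandes' algorithm for an undirected, unweighted graph.

    adj: dict mapping each node to a list of its neighbours.
    Returns C_B(v) with each unordered pair {s, t} counted once.
    """
    cb = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        pred = {v: [] for v in adj}
        sigma[s], dist[s] = 1, 0
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        # accumulate pair dependencies in order of decreasing distance from s
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                cb[w] += delta[w]
    return {v: c / 2 for v, c in cb.items()}  # undirected: each pair was seen twice
```

Nodes on the boundary of the data cloud lie on few shortest paths, so they get the lowest scores, which is the property the endmember search relies on.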

From the points with the lowest betweenness centrality, we have to choose those that we consider endmember points. Since we are working with a segmented image, we can assume that none of the segments are affected by local distortions, which occur in only a few pixels. The segmentation algorithm used discards segments below a given minimum size, and the pixels in them are grouped with the spectrally most similar segment in their neighborhood. The signature for this new segment is recalculated by averaging the spectra of all the pixels in the segment. This averaging helps mitigate the effect of local distortions. Thus, the set of boundary points can now be thought of as the set of spectral signatures present in the image that are not overly affected by local distortions (such as atmospheric effects and instrument noise).

we present a ranking scheme that attempts to maximize the spectral variability at the top of the list.

score for each node is the sum of spectral Euclidean distances between the node and its k nearest neighbors. The points with the largest scores (sum of spectral Euclidean distances, SSD) are placed at the top of the list. The points with the largest spectral Euclidean distances will be those on the boundary and isolated. These are followed by points on the boundary in low-density areas and, lastly, by points on the boundary in high-density areas.

The isolated boundary points would be unaffected by the choice of k and would always be placed at the top of the list.
Our observations show that even in the presence of intimate mixing the purest points (endmembers) are located in regions of higher curvature of the boundary.
endmembers in dense areas

We perform a segmentation with a minimum segment size of 20, k = 0.001, and spectral angle distance as the distance metric on the original image

The scientists have ratioed the unique spectra with some dark points spectra in order to have better signal quality (dark points are the points with no distinguishable spectral features and consequently their spectra would be more flat in comparison with others). The spectra identified by the algorithm have been ratioed with the average dark points spectra.

identified by scientists manually (Fig.2 (a))

VCA: the algorithm iteratively projects the data onto a direction orthogonal to the subspace spanned by the endmembers already determined. The new endmember signature corresponds to the extreme point of the projection.

N-FINDR: finds the endmembers as the set of points defining the largest volume by inflating a simplex inside the data.
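The simplex-inflation idea of N-FINDR can be sketched as: reduce the data to p − 1 dimensions, start from p random pixels, and replace each vertex by any pixel that increases the simplex volume. This is a minimal sketch, assuming PCA for the dimension reduction and a fixed number of sweeps. It is not an optimized implementation, and the function names are hypothetical.

```python
import numpy as np

def simplex_volume(E):
    """Volume (up to the constant 1/(p-1)!) of the simplex with vertices E: (p, p-1)."""
    return abs(np.linalg.det(np.vstack([np.ones(len(E)), E.T])))

def nfindr(X, p, n_iter=3, seed=0):
    """Indices of p pixels (rows of X) spanning a maximum-volume simplex."""
    # reduce to p-1 dimensions with PCA so the volume is a square determinant
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Y = Xc @ Vt[:p - 1].T
    rng = np.random.default_rng(seed)
    idx = list(rng.choice(len(X), size=p, replace=False))  # random initial endmembers
    best = simplex_volume(Y[idx])
    for _ in range(n_iter):
        for j in range(p):            # try replacing endmember j ...
            for i in range(len(X)):   # ... with every pixel i
                trial = idx.copy()
                trial[j] = i
                vol = simplex_volume(Y[trial])
                if vol > best:        # keep the replacement if the simplex inflates
                    best, idx = vol, trial
    return sorted(int(i) for i in idx)
```

On synthetic linear mixtures that include the pure pixels, the pure pixels are the simplex vertices and are recovered. When no pure pixels are present, N-FINDR can only return the most extreme observed pixels.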

VCA finds two endmembers (red and blue) and cannot find all the endmembers as the most extreme points in the projections. Also for N-FINDR (Fig. 2 (d)), the endmembers are not extracted accurately.

graphs for modeling the image.
Each node represents the centroid of the superpixels of the image and it is connected to the adjacent (spectrally) nodes.

betweenness centrality and sum of the Euclidean distances from nearest neighbors can be employed. We applied this approach to some CRISM images and reported the results of one image. As mentioned in the previous sections, our approach detects the endmembers found by the scientists properly and it can be applied to the images with least assumptions on the type of mixing or the shape of the data cloud

 2010  Superpixel Endmember Detection  32


   A superpixel contains roughly 19 to 100 pixels.
Superpixel representations can reduce noise in hyperspectral images by exploiting the spatial contiguity of scene features.

Applied to Mars satellite images (each pixel about 20 m)

First, a graph-based agglomerative algorithm oversegments the image. We then use segments’ mean spectra as input to existing statistical endmember detection algorithms such as sequential maximum angle convex cone (SMACC) and N-FINDR.

superpixel representations significantly reduce the computational complexity of later processing while improving endmembers’ match to the target spectra.

 2007  Google News Personalization: Scalable Online Collaborative Filtering  713  WWW   Summary: two major schemes: 1) MinHash, which is a locality-sensitive hash, and 2) Probabilistic Latent Semantic Indexing (PLSI), which
introduces a latent variable z that makes users and content conditionally independent, then uses EM to estimate the
distributions involving the latent variable z

Content Retrieval
- Collaborative filtering: uses the item ratings given by users (user preferences); typically content-agnostic
- Content-based filtering (e.g. keyword-based searching): the content of items rated highly by the user is used to recommend new items.

a user’s past shopping history is used to make recommendations for new products.
Churn (insertions and deletions) happens every few minutes; any model older than a few hours may no longer be of interest, and partial updates will not work.

Treating clicks as a positive vote is noisier than accepting explicit 1-5 star ratings; clicks don’t say anything about a user’s negative interest.

   Memory-based algorithms -  “similarity” between users. Pearson correlation coefficient,  cosine similarity.
   Model-based algorithms -  Latent Dirichlet Allocation. Most of the model-based algorithms are computationally expensive
- The similarity between two users ui, uj is defined as the overlap between their item sets. Jaccard coefficient,  it is well known that the corresponding distance function is a metric
- we would like to compute the similarity of this user, S (ui , uj ), to all other users uj , and recommend to user ui stories voted by uj with weight equal to S(ui, uj ).

sublinear time near-neighbor search technique
Locality Sensitive Hashing (LSH)
hash the data points using several hash functions so as to ensure that, for each function, the probability of collision is much higher for objects that are close. LSH schemes are known to exist for the following distance or similarity measures: Hamming norm, Lp norms, Jaccard coefficient, cosine distance, and the earth mover’s distance (EMD)

    randomly permute the set of items (S) and for each user u_i compute its hash value h(u_i) as the index of the first item under the permutation that belongs to the user’s item set
    for a random permutation, the probability that two users will have the same hash value is exactly equal to their similarity (Jaccard coefficient); min-hashing can be viewed as probabilistic clustering.
    we can always concatenate p hash keys for users, where p ≥ 1, so the probability that any two users u_i, u_j agree on the concatenated hash key is S(u_i, u_j)^p.
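Min-hashing as described above can be sketched with explicit random permutations. That is practical only for a toy item universe; large systems would use hash functions instead. The function names are hypothetical, and `cluster_key` shows the concatenation of p min-hash values.

```python
import random

def random_permutations(items, n, seed=0):
    """n random permutations of the item universe, each stored as item -> position."""
    rng = random.Random(seed)
    perms = []
    for _ in range(n):
        order = list(items)
        rng.shuffle(order)
        perms.append({item: pos for pos, item in enumerate(order)})
    return perms

def minhash(item_set, perm):
    """Position of the first item, under the permutation, that is in the user's set."""
    return min(perm[item] for item in item_set)

def cluster_key(item_set, perms):
    """Concatenate p min-hash values; two users collide with probability S^p."""
    return tuple(minhash(item_set, perm) for perm in perms)

def estimated_similarity(a, b, perms):
    """Fraction of permutations on which the min-hashes agree (estimates Jaccard)."""
    return sum(minhash(a, p) == minhash(b, p) for p in perms) / len(perms)
```

For `a = {0, ..., 5}` and `b = {3, ..., 8}` the Jaccard coefficient is 3/9, and the estimate over a few thousand permutations lands close to 1/3.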

these refined clusters have high precision but low recall. We can improve the recall by repeating this step in parallel multiple times,

By choosing the range of the hash value to be 0 ... 2^64 − 1 (unsigned 64-bit integer), we ensure that we do not encounter the “birthday paradox”.

     PLSI   probabilistic latent semantic indexing
The relationship between users and items is learned by modeling the joint distribution of users and items as a mixture distribution. A hidden variable Z (taking values from z ∈ Z, and ∥Z∥ = L) is introduced to capture this relationship, which can be thought of as representing user communities (like-minded users) and item communities (genres).

key contribution of the model is the introduction of the latent variable Z, which makes users and items conditionally independent. The model can also be thought of as a generative model in which state z of the latent variable Z is chosen for an arbitrary user u based on the CPD p(z|u). Next, an item s is sampled based on the chosen z from the CPD p(s|z).

conditional likelihood over all data points is maximized. Expectation Maximization (EM) is used to learn the maximum likelihood parameters of this model.
E-Step involves the computation of Q variables (i.e. the a-posteriori latent class probabilities)
           M-step uses the above computed Q function to compute the distributions p(z|u) and p(s|z).

        The insight into using mapreduce for the EM
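A single-machine sketch of the PLSI EM iterations helps make the E- and M-steps concrete. It is not the paper's MapReduce version. It assumes a dense user × item click-count matrix R, and the function names and `n_iter` are hypothetical. The E-step computes q(z; u, s) ∝ p(s|z) p(z|u); the M-step re-estimates p(s|z) and p(z|u) from the click-weighted q.

```python
import numpy as np

def _normalize(a, axis):
    s = a.sum(axis=axis, keepdims=True)
    return a / np.where(s > 0, s, 1.0)  # all-zero rows stay zero instead of NaN

def plsi_em(R, L, n_iter=30, seed=0):
    """Fit p(z|u) (users, L) and p(s|z) (L, items) to click counts R with EM."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    p_z_u = _normalize(rng.random((n_users, L)), axis=1)
    p_s_z = _normalize(rng.random((L, n_items)), axis=1)
    history = []
    for _ in range(n_iter):
        # E-step: q(z; u, s) = p(s|z) p(z|u) / sum_z' p(s|z') p(z'|u)
        joint = p_z_u[:, :, None] * p_s_z[None, :, :]  # (users, L, items)
        q = _normalize(joint, axis=1)
        counts = R[:, None, :] * q                      # expected counts per latent class
        # M-step: re-estimate both conditional distributions
        p_s_z = _normalize(counts.sum(axis=0), axis=1)
        p_z_u = _normalize(counts.sum(axis=2), axis=1)
        # log-likelihood of the clicks under p(s|u) = sum_z p(z|u) p(s|z)
        p_s_u = p_z_u @ p_s_z
        history.append(float(np.sum(R * np.log(np.where(R > 0, p_s_u, 1.0)))))
    return p_z_u, p_s_z, history
```

Recommendation scores for user u are the row `p_z_u[u] @ p_s_z`. The recorded log-likelihood never decreases, which is a handy sanity check on any EM implementation.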

        generative model [joint distribution: p(x,y)]  vs   discriminative model [conditional: p(y|x)]
          What is the difference between a Generative and Discriminative Algorithm?

Let's say you have input data x and you want to classify the data into labels y. A generative model learns the joint probability distribution
p(x,y) and a discriminative model learns the conditional probability distribution p(y|x) - which you should read as "the probability of y given x".

Here's a really simple example. Suppose you have the following data in the form (x,y):

(1,0), (1,0), (2,0), (2, 1)

p(x,y) is

      y=0   y=1
x=1 | 1/2   0
x=2 | 1/4   1/4

p(y|x) is

      y=0   y=1
x=1 | 1     0
x=2 | 1/2   1/2

If you take a few minutes to stare at those two matrices, you will understand the difference between the two probability distributions.

The distribution p(y|x) is the natural distribution for classifying a given example x into a class y, which is why algorithms that model this directly
 are called discriminative algorithms. Generative algorithms model p(x,y), which can be transformed into p(y|x) by applying Bayes' rule and then used for
 classification. However, the distribution p(x,y) can also be used for other purposes. For example you could use p(x,y) to generate likely (x,y) pairs.
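The two tables above can be reproduced from the toy data: estimate the joint p(x, y) by counting, then derive p(y|x) = p(x, y) / p(x), which is the step a generative classifier uses. A minimal sketch with exact fractions; the variable names are hypothetical.

```python
from collections import Counter
from fractions import Fraction

data = [(1, 0), (1, 0), (2, 0), (2, 1)]
n = len(data)

# generative view: joint distribution p(x, y) estimated by counting
joint = {xy: Fraction(c, n) for xy, c in Counter(data).items()}

# marginal p(x), then the conditional p(y|x) = p(x, y) / p(x)
p_x = Counter()
for (x, y), p in joint.items():
    p_x[x] += p
conditional = {(x, y): p / p_x[x] for (x, y), p in joint.items()}

print(joint)
print(conditional)
```

Pairs that never occur, such as (1, 1), are simply absent from the dictionaries, which corresponds to the zero entries in the tables.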

From the description above you might be thinking that generative models are more generally useful and therefore better, but it's not as simple as that.
 This paper is a very popular reference on the subject of discriminative vs. generative classifiers, but it's pretty heavy going. The overall gist is
 that discriminative models generally outperform generative models in classification tasks.

          Locality-sensitive hashing (LSH)

 is a method of performing probabilistic dimension reduction of high-dimensional data.
The basic idea is to hash the input items so that similar items are mapped to the same buckets with high probability
 (the number of buckets being much smaller than the universe of possible input items). The hashing used in LSH is
different from conventional hash functions, such as those used in cryptography, as in the LSH case the goal is to
maximize probability of "collision" of similar items rather than avoid collisions. [1] Note how locality-sensitive
hashing, in many ways, mirrors data clustering and Nearest neighbor search.

Bit sampling for Hamming distance

One of the easiest ways to construct an LSH family is by bit sampling.[3] This approach works for the Hamming
distance over d-dimensional vectors $\{0,1\}^d$. Here, the family $\mathcal{F}$ of hash functions is simply the
family of all the projections of points on one of the d coordinates, i.e.,
 $\mathcal{F} = \{ h : \{0,1\}^d \to \{0,1\} \mid h(x) = x_i,\ i = 1, \dots, d \}$, where $x_i$ is the i-th coordinate of x.
A random function h from $\mathcal{F}$ simply selects a random bit from the input point. This family has the
following parameters: $P_1 = 1 - R/d$, $P_2 = 1 - cR/d$.
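For bit sampling, the collision probability of two points is exactly the fraction of coordinates on which they agree, 1 − Hamming(x, y)/d. That is where P_1 = 1 − R/d comes from for points within distance R. A minimal sketch comparing the exact value with an empirical estimate; the function names and trial count are hypothetical.

```python
import random

def sample_bit_hash(d, rng):
    """Draw h(x) = x_i for a uniformly random coordinate i (one member of F)."""
    i = rng.randrange(d)
    return lambda x: x[i]

def collision_probability(x, y):
    """Exact Pr[h(x) = h(y)] for h drawn uniformly from F: 1 - Hamming(x, y) / d."""
    return sum(xi == yi for xi, yi in zip(x, y)) / len(x)

def empirical_collision_rate(x, y, trials=10000, seed=0):
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        h = sample_bit_hash(len(x), rng)
        hits += h(x) == h(y)
    return hits / trials
```

In practice several sampled bits are concatenated into one bucket key to push the collision probability of far points (P_2) down.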

   Streaming Similarity Search over one Billion Tweets using Parallel Locality-Sensitive Hashing         cache-conscious hash table layout, using a 2-level merge algorithm for hash table construction; an efficient algorithm for duplicate elimination during hash-table querying; an insert-optimized hash table structure and efficient data expiration algorithm for streaming data

 for any given query q, reports the points within the radius R from q. We refer to those points as R-near neighbors of q in P. The data structure is randomized: each R-near neighbor is reported with probability 1 − δ, where δ > 0. A hash family is locality-sensitive if, for any two points p and q, the probability that p and q collide under a random choice of hash function depends only on the distance between p and q.

 angle between vectors p and q: $\theta(p, q) = \arccos\left(\frac{p \cdot q}{\|p\| \, \|q\|}\right)$ (for unit vectors, simply $\arccos(p \cdot q)$). The hash functions in the family are parametrized by a unit vector a. Each such function $h_a$, when applied to a vector v, returns either −1 or 1, depending on the sign of the dot product between a and v. Specifically, $h_a(v) = \operatorname{sign}(a \cdot v)$
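For random-hyperplane hashing, two vectors receive the same sign under a random direction a with probability 1 − θ(p, q)/π. A minimal NumPy sketch that draws a from a Gaussian (so its direction is uniform; normalizing a does not change the sign). The function name and number of hashes are hypothetical.

```python
import numpy as np

def hyperplane_hashes(vectors, n_hashes, seed=0):
    """h_a(v) = sign(a . v) for n_hashes random Gaussian directions a; returns +/-1."""
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(n_hashes, vectors.shape[1]))
    return np.where(vectors @ A.T >= 0, 1, -1)  # shape (n_vectors, n_hashes)
```

For p = (1, 0, 0) and q at 45° to it, the expected agreement rate is 1 − (π/4)/π = 0.75.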

apply hash functions iteratively on previous hash function results. two level. u1,u2 -- u1,u3 -- u1,u4
  Vegetation and Its Reflectance Properties      ENVI The optical spectrum is partitioned into four distinct wavelength ranges:
  • Visible: 400 nm to 700 nm
  • Near-infrared: 700 nm to 1300 nm
  • Shortwave infrared 1 (SWIR-1): 1300 nm to 1900 nm
  • Shortwave infrared 2 (SWIR-2): 1900 nm to 2500 nm
The transition from near-infrared to SWIR-1 is marked by the 1400 nm atmospheric water absorption region in which satellites and aircraft cannot acquire measurements. Similarly, the SWIR-1 and SWIR-2 transition is marked by the 1900 nm atmospheric water absorption region.

Other components (such as phosphorus, calcium, and so forth) are significant to plant function, but they do not directly contribute to the spectral properties of leaves and therefore cannot be directly measured using remotely sensed data. The leaf components that most affect spectral properties are:
• Pigments
chlorophyll (-a and -b)
a high concentration of chlorophyll generally indicates very healthy vegetation, as chlorophyll is linked to greater light use efficiency or photosynthetic rates
carotenoids and anthocyanins
higher concentrations in vegetation that is less healthy, typically because it is stressed (seen in drought or nutrient depletion), senescent (dormant or dying vegetation that appears red, yellow, or brown), or dead

• Water
Plants of different species inherently contain different amounts of water based on their leaf geometry, canopy architecture, and water requirements. Among plants of one species, there is still significant variation, depending upon leaf thickness, water availability, and plant health. Water is critical for many plant processes, in particular, photosynthesis. Generally, vegetation of the same type with greater water content is more productive and less prone to burn.
Leaf water affects plant reflectance in the near-infrared and shortwave infrared regions of the spectrum (see the following figure). Water has maximum absorptions centered near 1400 and 1900 nm, but these spectral regions usually cannot be observed from airborne or space-based sensors due to atmospheric water absorption, preventing their practical use in the creation of VIs. Water features centered around 970 nm and 1190 nm are pronounced and can be readily measured from hyperspectral sensors. These spectral regions are generally not sampled by multispectral sensors.
• Carbon
Cellulose and lignin display spectral features in the shortwave infrared range of the optical spectrum, as shown in the figure.
• Nitrogen
VIs sensitive to chlorophyll content (which is approximately 6% nitrogen) are often broadly sensitive to nitrogen content as well. Some proteins that contain nitrogen affect the spectral properties of leaves in the 1500 nm to 1720 nm range.

The variation in reflectance caused by different canopy structures, much like individual leaf reflectance, is highly variable with wavelength.

The LAI is the green leaf area per unit ground area, which represents the total amount of green vegetation present in the canopy. The MLA (mean leaf angle) is the average of the differences between the angle of each leaf in a canopy and the horizontal. The higher the LAI, the higher the reflectance.
While vegetation strongly reflects light in the near-infrared portion of the spectrum, canopies strongly absorb photons in the visible and SWIR-2 ranges. This results in a much shallower penetration of photons into the canopy at these wavelengths. As such, VIs using spectral data from the visible and SWIR-2 are very sensitive to upper-canopy conditions.

Non-Photosynthetic Vegetation
Senescent or dead vegetation is also known as non-photosynthetic vegetation (NPV). It could be truly dead or simply dormant (such as some grasses between rainfall events). Also included in the NPV category are woody structures in many plants, including tree trunks, stems, and branches.

NPV is composed largely of the carbon-based molecules lignin, cellulose, and starch. As such, it has a similar reflectance signature to these materials, with most of the variation in the shortwave infrared range. In many canopies, much of the NPV is obscured below a potentially closed leaf canopy; the wavelengths used to measure NPV (shortwave infrared) are often unable to penetrate through the upper canopy to interact with this NPV. As such, only exposed NPV has a significant effect on the spectral reflectance of vegetated ecosystems. When exposed, NPV scatters photons very efficiently in the shortwave infrared range, in direct contrast to green vegetation which absorbs strongly in the shortwave infrared range.

In general, photons in the visible wavelength region are efficiently absorbed by live, green vegetation. Likewise, photons in the SWIR-2 region of the spectrum are efficiently absorbed by water. In contrast to live vegetation, dead, dry, or senescent vegetation scatters photons very efficiently throughout the spectrum, with the most scattering occurring in the SWIR-1 and SWIR-2 ranges. The change in canopy reflectance due to increasing amounts of NPV is shown in the following figure.

 Dry or Senescent Carbon
Normalized Difference Lignin Index
Cellulose Absorption Index

Maximum Likelihood vs. Maximum A Posteriori (ML vs. MAP)
Maximum likelihood is a parametric estimation of a parameter μ (assuming the probability distribution is, e.g., Bernoulli: which mean μ generates the observed sequence with maximum probability?)
Thus far, we have considered p ( x ; μ) as a function of x , parametrized by μ. If we view p ( x ; μ) as a function of μ, then it is called the likelihood function.
Maximum likelihood estimation basically chooses a value of μ that maximizes the likelihood function given the observed data

Maximum A Posteriori
Take μ as a random variable
p(μ | X) = p(X|μ) p(μ) / p(X)

Thus, Bayes' law converts our prior belief about the parameter μ (before seeing data) into a posterior probability, p(μ | X), by using the likelihood function p(X | μ). The maximum a-posteriori (MAP) estimate is defined as μ^MAP = argmax_μ p(μ | X).

To take a simple example of a situation in which MAP estimation might produce better results than ML estimation, let us consider a statistician who wants to predict the outcome of the next election in the USA.

The statistician is able to gather data on party preferences by asking people he meets at the Wall Street Golf Club which party they plan on voting for in the next election. The statistician asks 100 people, seven of whom answer "Democrats". This can be modeled as a series of Bernoullis, just like the coin tosses. In this case, the maximum likelihood estimate of the proportion of voters in the USA who will vote democratic is μ^ML = 0.07.

Somehow, the estimate of μ^ML = 0.07 doesn't seem quite right given our previous experience that about half of the electorate votes democratic, and half votes republican. But how should the statistician incorporate this prior knowledge into his prediction for the next election? The MAP estimation procedure allows us to inject our prior beliefs about parameter values into the new estimate

posterior ~ likelihood * prior distribution (e.g. beta distribution)

The beta distribution

has been applied to model the behavior of random variables limited to intervals of finite length in a wide variety of disciplines. For example, it has been used as a statistical description of allele frequencies in population genetics;[1] time allocation in project management / control systems;[2] sunshine data;[3] variability of soil properties;[4] proportions of the minerals in rocks in stratigraphy;[5] and heterogeneity in the probability of HIV transmission.[6]

In Bayesian inference, the beta distribution is the conjugate prior probability distribution for the Bernoulli, binomial and geometric distributions. For example, the beta distribution can be used in Bayesian analysis to describe initial knowledge concerning probability of success such as the probability that a space vehicle will successfully complete a specified mission. The beta distribution is a suitable model for the random behavior of percentages and proportions.
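For the election example, a Beta(α, β) prior combined with k "Democrats" answers out of n gives a Beta(α + k, β + n − k) posterior, whose mode is the MAP estimate (k + α − 1)/(n + α + β − 2). A minimal sketch with exact fractions. The Beta(50, 50) prior, expressing "about half vote democratic", is a hypothetical choice.

```python
from fractions import Fraction

def ml_estimate(successes, n):
    return Fraction(successes, n)

def map_estimate(successes, n, alpha, beta):
    """Mode of the Beta(alpha + k, beta + n - k) posterior
    (valid when both posterior parameters exceed 1)."""
    return Fraction(successes + alpha - 1, n + alpha + beta - 2)

print(float(ml_estimate(7, 100)))           # 0.07
print(float(map_estimate(7, 100, 50, 50)))  # about 0.283
```

With a uniform Beta(1, 1) prior the MAP estimate equals the ML estimate. A stronger prior (larger α = β) pulls the estimate further toward 0.5.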

Principal Components:
The number of principal components is less than or equal to the number of original variables.

PCA is an orthogonal linear transformation that maps the data to a new coordinate system such that the greatest variance under any projection of the data lies along the first coordinate (called the first principal component), the second greatest variance along the second coordinate, and so on.

PCA can be done by eigenvalue decomposition of a data covariance (or correlation) matrix or singular value decomposition of a data matrix, usually after mean centering (and normalizing or using Z-scores) the data matrix for each attribute.[4] The results of a PCA are usually discussed in terms of component scores, sometimes called factor scores (the transformed variable values corresponding to a particular data point), and loadings (the weight by which each standardized original variable should be multiplied to get the component score).
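A minimal sketch of PCA by eigendecomposition of the covariance matrix, using NumPy. It only mean-centers the data; standardizing to Z-scores (the correlation-matrix variant) is left out for brevity. The function name `pca` is illustrative.

```python
import numpy as np


def pca(X, n_components):
    """X: (n_samples, n_features). Returns (scores, loadings, eigenvalues)."""
    Xc = X - X.mean(axis=0)                  # mean-center each attribute
    cov = np.cov(Xc, rowvar=False)           # (n_features, n_features) covariance
    eigvals, eigvecs = np.linalg.eigh(cov)   # symmetric matrix: eigenvalues ascending
    order = np.argsort(eigvals)[::-1][:n_components]
    loadings = eigvecs[:, order]             # columns = principal axes (orthonormal)
    scores = Xc @ loadings                   # component (factor) scores
    return scores, loadings, eigvals[order]


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))  # correlated toy data
scores, loadings, ev = pca(X, 3)
```

Each eigenvalue equals the variance of its component's scores, which is why the components come out in decreasing-variance order. The SVD route gives the same axes: the right singular vectors of the centered data matrix, with eigenvalues equal to the squared singular values divided by (n − 1).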

 2003  A Survey of Spectral Unmixing Algorithms  220  Lincoln Laboratory Journal  Nirmal Keshava
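Spectral unmixing decomposes the measured spectrum of a mixed pixel into a collection of constituent spectra, or endmembers, and a set of corresponding fractions, or abundances, that indicate the proportion of each endmember present in the pixel.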

Statistical modeling often fails to reflect the high degree of physical detail that guarantees precise and physically plausible answers for individual pixels.
Statistical algorithms introduce the aggregate behavior of a larger population of data into the processing of an individual pixel, and they do so having no knowledge of the probabilistic nature of the data.

Non-statistical (e.g. geometrical/physical) algorithms: the distinction becomes important in target detection, where statistical characterizations of nontarget behavior (background or clutter) can complicate the detection of low-probability targets.

Parametric algorithms: assume that the received data originate from a parameterized probability density function, e.g. whenever an algorithm incorporates Gaussian probability density functions in its derivation. They typically yield maximum likelihood or maximum a posteriori solutions. Note that a statistical algorithm is not always parametric.

Non-parametric algorithms: use other cost functions, e.g. minimization of squared error.

Algorithms are deemed optimal if they optimize an objective function, so the choice of the objective function is key.

linear mixing model (LMM)
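A compact statement of the LMM as it is usually written (symbols chosen here for illustration): a pixel spectrum x with L bands is a nonnegative, sum-to-one combination of M endmember spectra s_i, the columns of S, plus additive noise w.

$$
\mathbf{x} = \sum_{i=1}^{M} a_i\,\mathbf{s}_i + \mathbf{w} = \mathbf{S}\mathbf{a} + \mathbf{w}, \qquad a_i \ge 0, \quad \sum_{i=1}^{M} a_i = 1
$$

Dropping both constraints gives the unconstrained least-squares abundance estimate, which requires S to have full column rank (M ≤ L linearly independent endmembers):

$$
\hat{\mathbf{a}} = \left(\mathbf{S}^{T}\mathbf{S}\right)^{-1}\mathbf{S}^{T}\mathbf{x}
$$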

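Stages of Unmixing: dimension reduction, endmember determination, and inversion.
Dimension reduction is ideally designed with consideration of how the unmixing procedures will perform in the lower dimension.
Because the three dimension-reduction algorithms covered here (PCA, MNF and the exemplar-based ORASIS method) do not presume any probability density function for the data, they are all non-parametric.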

Principal-component analysis (PCA)
The magnitude of an eigenvalue indicates the energy residing in the data along the component of the data parallel to the associated eigenvector.

- orthogonal axes
Maximum noise fraction (MNF)
- non-orthogonal axes, with components ordered by decreasing SNR

 As in PCA, the ordering of components can estimate one type of effective signal dimensionality, and the set of random variables obtained after the MNF transform can be truncated to retain only those components possessing a minimum SNR.
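A minimal MNF sketch in NumPy: whiten the noise, then run PCA in the whitened space. It assumes a noise estimate is already available as a separate array (in practice it is often derived from differences between neighboring pixels, a heuristic not covered here). The function name `mnf` is illustrative.

```python
import numpy as np


def mnf(X, noise):
    """X: (n_pixels, n_bands) data; noise: (n_samples, n_bands) noise estimate.
    Returns (components, eigenvalues, transform) ordered by decreasing SNR."""
    Xc = X - X.mean(axis=0)
    cov_data = np.cov(Xc, rowvar=False)
    cov_noise = np.cov(noise, rowvar=False)
    # Noise whitening: W.T @ cov_noise @ W == identity
    d, V = np.linalg.eigh(cov_noise)
    W = V / np.sqrt(d)                       # scale column j of V by 1/sqrt(d[j])
    # PCA of the data covariance in the noise-whitened space
    lam, U = np.linalg.eigh(W.T @ cov_data @ W)
    order = np.argsort(lam)[::-1]            # largest signal-to-noise first
    A = W @ U[:, order]                      # MNF axes, generally non-orthogonal
    return Xc @ A, lam[order], A


rng = np.random.default_rng(1)
noise_sd = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6])
signal = rng.normal(size=(500, 2)) @ rng.normal(size=(2, 6)) * 5  # rank-2 toy signal
X = signal + rng.normal(size=(500, 6)) * noise_sd
noise = rng.normal(size=(500, 6)) * noise_sd
comps, lam, A = mnf(X, noise)
```

The transform vectors are orthonormal with respect to the noise covariance rather than in the ordinary sense, which is why the MNF axes are non-orthogonal. When signal and noise are uncorrelated, each eigenvalue is approximately SNR + 1, so truncating at a minimum SNR amounts to a threshold on `lam`.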
ORASIS: non-statistical, non-parametric, orthogonal axes, optimizes squared error

 dimension reduction is achieved by identifying a subset of representative, or exemplar, pixels that convey the variability in a scene.  If the new pixel is sufficiently different from each of the existing exemplars, it is added to the exemplar set. An orthogonal basis is periodically created from the current set of exemplars by using a modified Gram-Schmidt process, which adds new dimensions until every exemplar can be approximated within a prescribed tolerance.
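A simplified sketch of the exemplar idea. It assumes "sufficiently different" means a spectral angle above a threshold, and it builds the basis once at the end instead of periodically. Both choices, the threshold values, and the function names are illustrative, not taken from the source.

```python
import numpy as np


def select_exemplars(pixels, max_angle):
    """Keep a pixel as a new exemplar if its spectral angle (radians) to every
    existing exemplar exceeds max_angle. Exemplars are stored as unit vectors."""
    exemplars = []
    for p in pixels:
        u = p / np.linalg.norm(p)
        angles = [np.arccos(np.clip(u @ e, -1.0, 1.0)) for e in exemplars]
        if not angles or min(angles) > max_angle:
            exemplars.append(u)
    return np.array(exemplars)


def gram_schmidt_basis(exemplars, tol):
    """Modified Gram-Schmidt: add a new basis vector only when an exemplar
    cannot be approximated by the current basis within tol."""
    basis = []
    for e in exemplars:
        r = np.array(e, dtype=float)
        for q in basis:
            r = r - (q @ r) * q      # remove one basis direction at a time (MGS)
        norm = np.linalg.norm(r)
        if norm > tol:
            basis.append(r / norm)
    return np.array(basis)


pixels = np.array([[1, 0, 0], [2, 0, 0], [0, 1, 0], [1, 1, 0]], dtype=float)
ex = select_exemplars(pixels, max_angle=0.1)
B = gram_schmidt_basis(ex, tol=1e-6)
```

Here `[2, 0, 0]` is rejected as a scaled copy of `[1, 0, 0]`, so three exemplars survive. `[1, 1, 0]` is a distinct exemplar but lies in the span of the first two, so the basis needs only two dimensions.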

Endmember determination: extracting spectra that are physically meaningful. Non-statistical algorithms essentially assume the endmembers are deterministic quantities, whereas statistical approaches view endmembers as either deterministic, with an associated degree of uncertainty, or as fully stochastic, with random variables having probability density functions.

Fuzzy k-means
Maximum likelihood: optimize the likelihood function (equivalent to optimizing the posterior density only under a flat prior)

; NAME: viper_mesma
; AUTHOR: Kerry Halligan
;   This routine performs Multiple Endmember Spectral Mixture Analysis (MESMA) on an input
;   image or images using endmembers contained in one or more ENVI spectral libraries. It
;   allows simple SMA or MESMA, photometric shade or non-photometric shade and a range of
;   constraints and outputs.
;   SMA
;   Reflectance image
;     Input image can be in any format (byte, integer, unsigned integer, floating point) or
;     interleave (bip, bil, bsq).  Integer or unsigned integer data should be in reflectance
;     times 10,000 (0 - 10,000). If data are in byte format, they should be in reflectance times
;     250 (0 - 250).  If data are in floating point they should be in reflectance (0 - 1).
;     Utilizes ENVI's bad bands list (if present) to spectrally subset both image and spectral
;     library.
;   Spectral libraries
;     Up to 3 spectral libraries allowed.  Need to be the same number of bands and same data
;     type (e.g. floating point, integer, etc.) as image.  Libraries should not contain shade.
;     Most common libraries would be 1) all spectra (used for 2em mode), 2) vegetation (used
;     for 3em and 4em mode), 3) npv + soil (used for 3em case), 4) npv (used for 4em case) and
;     5) soil (used for 4em case).  Note that all combinations of spectra are used, so a
;     4em run with 10 green veg spectra, 5 npv and 6 soil spectra runs a total of 300 models.
;   None
;   None
;   Minimum RMS image: Non-shade and shade fractions (1st bands) plus the rms and model number
;     of the minimum rms model.
;   Classification image: Classified image with the minimum RMS model for each pixel.
;   None
;   The procedure builds a lookup table for the endmembers for each model, reads in all spectral
;   libraries, then begins a line-by-line loop.
For each image line it reads in the data, and then loops through each model. For each model:
- the spectra are selected and used to build an endmember array, which is passed to viper_mesma_fracCalc;
- fracCalc returns the fractional abundances of all non-shade and shade endmembers, the model RMSE and, optionally, the residuals;
- if the current model produced a lower RMSE for any given pixel, the selected fraction, RMSE, and residual constraints are tested, and all pixels that meet the constraints are considered valid;
- all pixels that produce a lower RMSE than the stored best value AND are valid are updated with the new model number, fractions and RMSE.
After all models have been run, results for that line of image data are saved to disk and the next line is read in.

This procedure selects the single model for each pixel that meets all constraints AND has the lowest RMS error.

If no model meets the constraints, the pixel is left with all zero values and appears as 'unclassified' in the output image.

When the image is complete, the file is closed, headers are written, and a classification image is produced.
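A simplified NumPy sketch of the per-pixel model selection described above. It is not the VIPER code: it uses photometric shade only, a fraction range and a maximum RMSE as the only constraints (residual constraints are omitted), and the threshold values, function name and argument layout are hypothetical.

```python
import itertools

import numpy as np


def mesma_select(pixels, libraries, frac_min=-0.05, frac_max=1.05, max_rmse=0.025):
    """pixels: (n_bands, n_pixels); libraries: list of (n_bands, n_spectra) arrays.
    One spectrum is drawn from each library per model (all combinations).
    Returns (best model number per pixel, its RMSE); model 0 = unclassified."""
    n_pix = pixels.shape[1]
    best_model = np.zeros(n_pix, dtype=int)
    best_rmse = np.full(n_pix, np.inf)
    picks = itertools.product(*[range(lib.shape[1]) for lib in libraries])
    for model_no, idx in enumerate(picks, start=1):
        em = np.column_stack([lib[:, i] for lib, i in zip(libraries, idx)])
        frac = np.linalg.pinv(em) @ pixels       # SVD-based inversion
        shade = 1.0 - frac.sum(axis=0)           # photometric shade fraction
        rmse = np.sqrt(((pixels - em @ frac) ** 2).mean(axis=0))
        allf = np.vstack([frac, shade])
        valid = ((allf >= frac_min).all(axis=0) & (allf <= frac_max).all(axis=0)
                 & (rmse <= max_rmse))
        better = valid & (rmse < best_rmse)      # valid AND lower than stored best
        best_model[better] = model_no
        best_rmse[better] = rmse[better]
    best_rmse[best_model == 0] = 0.0             # unclassified pixels stay all zero
    return best_model, best_rmse


veg = np.array([[0.05, 0.04], [0.10, 0.07], [0.50, 0.45], [0.45, 0.30]])
soil = np.array([[0.20, 0.10], [0.25, 0.15], [0.30, 0.25], [0.35, 0.40]])
mixed = 0.5 * veg[:, 1] + 0.3 * soil[:, 0]       # 20% photometric shade
odd = np.array([0.9, 0.0, 0.9, 0.0])             # no model should fit this
model, rmse = mesma_select(np.column_stack([mixed, odd]), [veg, soil])
```

Because every combination is tried, the model count is the product of the library sizes; with 10 green veg, 5 npv and 6 soil spectra, `itertools.product` yields 10 × 5 × 6 = 300 models, matching the header above.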

Calculates the non-shade fractions first, using Singular Value Decomposition to invert the endmember matrix, then calculates shade as 1 minus the sum of the non-shade fractions.

If a non-photometric shade endmember is used, this endmember is first subtracted from each endmember and then from the image spectrum, and the fractions are calculated as above.
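A minimal NumPy sketch of that fraction calculation; it is not the IDL `viper_mesma_fraccalc` itself, and the function and argument names are hypothetical. `np.linalg.pinv` computes the pseudo-inverse via SVD. The subtraction trick works because, with fractions summing to one, x = Σ fᵢeᵢ + f_s·s rearranges to x − s = Σ fᵢ(eᵢ − s).

```python
import numpy as np


def mesma_fractions(em, pixels, shade_spectrum=None):
    """em: (n_bands, n_em) non-shade endmembers as columns.
    pixels: (n_bands, n_pixels) image spectra.
    shade_spectrum: optional (n_bands,) non-photometric shade; None means
    photometric shade (zero reflectance).
    Returns (fractions (n_em, n_pixels), shade fraction, RMSE per pixel)."""
    if shade_spectrum is not None:
        # Subtract the shade spectrum from each endmember and from the image
        em = em - shade_spectrum[:, None]
        pixels = pixels - shade_spectrum[:, None]
    frac = np.linalg.pinv(em) @ pixels          # SVD-based inversion
    shade_frac = 1.0 - frac.sum(axis=0)         # shade = 1 - sum of non-shade
    resid = pixels - em @ frac                  # same residual as in original space
    rmse = np.sqrt((resid ** 2).mean(axis=0))
    return frac, shade_frac, rmse


em = np.array([[0.05, 0.20], [0.10, 0.25], [0.50, 0.30], [0.45, 0.35]])
f_true = np.array([0.5, 0.3])                   # shade fraction is 0.2
s = np.full(4, 0.05)                            # hypothetical non-photometric shade
frac, sh, rmse = mesma_fractions(em, (em @ f_true)[:, None])
frac2, sh2, rmse2 = mesma_fractions(em, (em @ f_true + 0.2 * s)[:, None], s)
```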

;   sel_from_list
;   cmapply
;   6-30-05 Written by Kerry Halligan from, with earlier versions of this code
;       dating back to 2001 and used for Master's thesis work
;   9-29-05 Kerry Halligan modified including
;       significant debugging effort to address a range of problems and to add
;       batch mode capabilities.  Also removed some of the cmapply calls which
;       were causing an unknown error during the constraint process
;   6-30-06 to fix problems with dropdown list for non-photometric shade selection
;   9-6-06 Changed counters from integer to float datatype to allow for greater
;       than 2^16/2 models (to accommodate Ted Eckman's fire temp mapping work)
;   11-1-06  Made various changes to fix two problems with the residual images
;       removed r_fid capture if/when residual image is opened
;       added a case statement for scaling residual images such that byte, integer and floating
;       point are handled explicitly, all others are treated as floating point
;   11-21-06 Kerry Halligan modified to do the following:
;       added error handler to _run routine to report error message
;       activated the save and restore control files after major re-writing of these routines
;   12-6-06 Kerry Halligan modified to first check all input files to make sure they
;     exist before trying to load them in ENVI - this prevents crashes when files not found.
;     Also added format statements to printf calls for filenames of image and spectral
;     libraries that were causing new lines when long filenames/paths were used.
;   12-20-06 Kerry Halligan modified to add back in the thres_resids function that had been
;     inadvertently removed from previous version when cleaning up code.  Also fixed bug
;     in the 'run' routine which was preventing the read of the text widgets for the
;     max RMSE and max residual text widgets.  Now it is no longer necessary to hit
;     enter to update these values.  Changed output residual image to now be:
;      same number of bands as input image, regardless of bad bands list
;      bands that are denoted as 'bad' in bad bands list will be zeros in residual image
;      floating point data with no scale factor (e.g. DN,radiance, reflectance)
;          regardless of scale factor of input image
;      renamed fracCalc2 to viper_mesma_fraccalc
;      renamed thresh_resids to viper_mesma_thresh_resids
;   12-21-06 Kerry Halligan modified to fix output residuals.  Bug had been resulting in the
;       output residuals being from just the last model, not the best model.
;   1-1-07 Kerry Halligan made the following changes
;     renamed to viper_mesma from eMESMA_viper_batch to maintain consistency
;     fixed a bug in residual calculation that was causing it to crash when output residuals not selected
;     made significant modification to the handling of input datasets with regards to bad bands lists - copied code
;      from recently updated CRES module.  Now uses bad bands common to all spectral libraries and input image
;     warns user if bad bands are different between spectral libraries and image, and asks user if new bad bands
;      list should be output as an ASCII file.  This file can be imported in ENVI
;   1-7-07 Kerry Halligan fixed bug in the restore control file routine (viper_mesma_restore) which was causing
;     the index for the scale factor (0-3) to be stored rather than the scale factor itself (0,1,1000,10000)
;   2-27-07 Kerry Halligan fixed bug in the residual constraint.  Dropped code which automatically set number
;     of bands to the number of good bands, and changed so that it only checks to see if number of good bands
;     is less than number of residual bands IF residual constraint is used.
;   7-3-07 Kerry Halligan made minor change to default file names in the 'browse' and 'save' routines to speed
;     selection of input and output filenames.

; This function is called by the MESMA code and operates on a single image line.
; It gets passed the endmember array that is n x m where n is the number of endmembers and m is the number of bands per spectrum.  The resids keyword can be set to a named variable that will contain the residuals for all bands for all spectra in the input image_line.  viper_mesma_fraccalc makes several improvements over the previously used fracCalc, including much less looping.

Returns a structure with the following tags: frac, shade, rmse. 
Frac is an n x m array where n is the number of non-shade endmembers and m is the number of pixels in the input image line, and the values are the fractional abundance of each endmember for each pixel, in reflectance units.
Shade and RMSE are m element vectors where m is the number of pixels in the input image line, and the values are the fractional abundance of shade and the RMSE in reflectance units.

;   See main viper_mesma routine for an
; example of using viper_mesma_fraccalc.
function viper_mesma_fraccalc, em_array, image_line, resids=resids

From UofT:

From Baidu/UCLA:

From Google:

From Stanford (our work):

From Berkeley

Amir (Stanford): label generation for images. This is going to fit into DeepDive so that it can understand what's inside an image. They tried doing it inside a DB, but that didn't work out, so they do it in C++. They tried GPUs, but Chris Ré believes there is a trade-off: some parts are better done on the CPU and some on the GPU. They have taken a look at it, but for now they are staying with the CPU. Their performance is 10x better than others in benchmarks (speed? precision?)
The actual knowledge base of DeepDive is trained and performs knowledge extraction using their KBP (Knowledge Base Population) system; they won the KBP contest by such a large margin over the others that they removed KBP from the task.

