deciduous: "falling off at maturity"; refers to trees or shrubs that lose their leaves.
shrub (bush): distinguished from a tree by its multiple stems and shorter height (usually under 6 m)
Many species can grow in either tree or shrub form depending on conditions.
NDVI
NDVI = (CH2 − CH1) / (CH2 + CH1), where CH1 is the reflectance in the visible red wavelengths (0.58–0.68 µm) and CH2 is the reflectance in the near-infrared wavelengths (0.725–1.1 µm). Other band pairings appear in the literature, e.g. CH1 = 660 nm and CH2 = 730 nm, or (798 − 679)/(798 + 679); in general NDVI = (NIR − RED)/(NIR + RED).
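In code, with reflectance assumed already scaled to 0–1:

```python
# Minimal NDVI sketch (assumption: reflectance values are already
# atmospherically corrected and scaled to 0-1).
def ndvi(nir, red):
    """NDVI = (NIR - RED) / (NIR + RED); returns 0.0 for a flat zero spectrum."""
    denom = nir + red
    return (nir - red) / denom if denom else 0.0

# Dense green vegetation reflects strongly in NIR and absorbs red light,
# so NDVI approaches +1; bare soil or water sits near 0 or below.
print(ndvi(0.50, 0.08))
```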
Year 
Title 
Citation 
Where 
Authors 
Description 





Tags:
Problem:
Idea:
Pros:
Cons:
1. dataset
 number of images? resolution?
 lidar? hyperspectral? others?
 additional knowledge?
 data source? private? MODIS? NEON?
2. ecological question addressed
 output property? (e.g., species?)
 value or distribution?
3. machine learning methods:
 classification vs. clustering?
 features used? feature reduction?
 SVM? logistic regression?
4. results
 accuracy/recall/f1 measure
5. can we improve the methodology using additional knowledge?





in-memory array database.
Check out Java and GPUs for better distributed computations.

2012 
Mapping Savanna Tree Species at Ecosystem Scales Using Support Vector Machine Classification and BRDF Correction on Airborne Hyperspectral and LiDAR Data 
5 
Remote Sensing 
Colgan, Baldeck, Féret, Asner 
Dataset: eastern South Africa, collected by the CAO (Carnegie Airborne Observatory).
Lidar:
Resolution 1.12m, flight overlap 100%
point density averaged two points per spot.
Spatial error less than 0.20m vertically, 0.36m horizontally
Hyperspectral:
72 bands: 385 to 1,054 nm
Resolution 1.12m (coaligned with the LiDAR)
Flight in April–May 2008.
Ground data: collected 2009
729 individual tree crowns
Species info
basal diameter
crown diameter
height
an additional 124 circular field plots of 30 m diameter were used to measure the abundance of each species
these plots were not used in calibration or validation of the species classifiers since trees were not individually located within the plots
Classification of specie for each pixel:
In confusion matrix:
Producer's accuracy = # correctly classified pixels of a class / total # of ground-reference pixels of that class (recall)
User's accuracy = # correctly classified pixels of a class / total # of pixels classified as that class (precision)
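The two accuracies in code, under the usual convention that confusion-matrix rows are ground-reference classes and columns are predicted classes:

```python
# Hedged sketch of producer's vs. user's accuracy from a confusion matrix.
# Convention assumed here: cm[i][j] counts reference-class-i pixels that
# were mapped to predicted class j.
def producer_accuracy(cm, i):
    """Recall of class i: correct / total reference pixels of class i."""
    row_total = sum(cm[i])
    return cm[i][i] / row_total if row_total else 0.0

def user_accuracy(cm, j):
    """Precision of class j: correct / total pixels classified as class j."""
    col_total = sum(row[j] for row in cm)
    return cm[j][j] / col_total if col_total else 0.0

cm = [[50, 10],   # reference class 0
      [ 5, 35]]   # reference class 1
print(producer_accuracy(cm, 0), user_accuracy(cm, 0))
```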
BRDF correction
BRDF correction reduced NIR variation from 40% to 5%, giving a ~1% increase in accuracy.
 Accuracy increased from 65% to 71% when ground data increased from 290 to 729 crowns.
 A second-level SVM incorporating maximum crown height (rather than per-pixel height) increased accuracy from 73.8% to 76.5%.
 Given the additional cost and complexity of having both LiDAR and hyperspectral data for a given study area, we note the relative improvement over hyperspectral data alone was relatively minor (from 73.0% to 76.5%).
 Using the mean spectra over all pixels in a crown as input into a crownlevel SVM had poor performance (approximately 54% overall accuracy)—much worse than predicting the pixels individually
 Chose maximum height over average height: it is less dependent on canopy shape, and there is less variance in maximum height than in mean height among species.

2013 
Estimating Vegetation Beta Diversity from Airborne Imaging
Spectroscopy and Unsupervised Clustering

0 
Remote Sensing

Baldeck, Asner 
Goal:
estimating the beta diversity among sites from high spatial resolution airborne data (how similar two regions are in terms of species diversity)
Approaches:
1. unsupervised with Euclidean distance 2. k-means clustering
3. supervised
Notes:
KNP (Kruger National Park), South Africa. The multiple-clustering model allows a rapid assessment of the spatial arrangement of the biodiversity of a region. It outperformed the mean Euclidean distance among pixels, but did not reach accuracies as high as a supervised species classification approach. 50% of pixels in the supervised scenario were classified as "other" due to a lack of ground measures, and were hence unhelpful.
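One common way to turn per-site composition (cluster or species proportions) into a beta-diversity score is the Bray-Curtis dissimilarity; whether this exact measure matches the paper's choice is an assumption on my part, but the sketch illustrates the idea of comparing two sites by their abundance vectors:

```python
# Bray-Curtis dissimilarity between two abundance vectors
# (0 = identical composition, 1 = no shared abundance).
def bray_curtis(p, q):
    num = sum(abs(a - b) for a, b in zip(p, q))
    den = sum(p) + sum(q)
    return num / den if den else 0.0

# Toy per-site counts per cluster (or species); values are made up.
site_a = [10, 5, 0]
site_b = [ 6, 5, 4]
print(bray_curtis(site_a, site_b))
```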
2014 
Landscape-scale variation in plant community composition of an African savanna from airborne species mapping. 
1 
Ecological Applications 
Baldeck, Colgan, Féret, Levick, Martin, Asner 
Goal:
Maps of community compositional variation were produced by ordination and clustering, and the importance of hillslope-scale topo-edaphic variation in shaping community structure was evaluated with redundancy analysis.
Approaches: 2-layer stacked SVM as before. The analysis is on hierarchical species clustering, community determination, and the proportion of species in each.
Notes: They use an R^2 metric (squared correlation), abundance per unit area, and an indicator value (0 < indicator < 100) for each dominant species per clustered hierarchy
Tool: http://www.sagagis.org/en/index.html

2012 
Mapping tree species composition in South African savannas using an integrated airborne spectral and LiDAR system 
10 
Elsevier Remote Sensing of Environment 
Cho, Mathieu, Asner, Naidoo, van Aardt, Ramoelo, Debba, Wessels, Main, Smit

Goal:
1. Compare CAO hyperspectral, WorldView-2, and QuickBird efficiencies for classification
2. Check the effect of spectral + tree height
3. Matching expert knowledge:
whether the species diversity maps generated from the classified species maps corroborate conventional knowledge on species diversity in the region. For example, we assumed that the maps produced would show that granite soils are richer in tree species than gabbro, or whether Acacia nigrescens is more abundant on gabbro than on granite
Approach: maximum likelihood classifier, pixel level
Notes: KNP (Kruger National Park), South Africa
Obtains WorldView-2- and QuickBird-quality images by resampling from CAO (1.2 m resolution)
WorldView-2: satellite, 8 bands, 1.85 m resolution, ~1-day revisit, collecting up to 1 million km^2 of imagery per day
WorldView-2 spectral bands are centred at:
425 nm (absorbed by chlorophyll),
480 nm (absorbed by chlorophyll),
545 nm (sensitive to plant health),
605 nm (absorbed by carotenoids; detects 'yellowness' of vegetation),
660 nm (absorbed by chlorophyll),
725 nm (sensitive to vegetation health),
835 nm (sensitive to leaf mass and moisture content),
and 950 nm (sensitive to leaf mass and moisture content) (see review by Ustin et al., 2009).
QuickBird: satellite, 4 bands, sub-meter resolution. Panchromatic (black and white) imagery at 60 cm resolution and multispectral at 2.4–2.8 m resolution.
485 nm (blue), 560 nm (green), 660 nm (red) and 830 nm (near-infrared)
WorldView-2 satellite data performs equal to or better than CAO due to its specific band selection, but LiDAR cannot be collected from a satellite.
Hyperspectral + LiDAR is 2% better than hyperspectral alone, but this is statistically significant (not just chance; the null hypothesis can be rejected).
Possible Approaches
Parametric methods: maximum likelihood, discriminant analysis
consider
first order variations (e.g. mean values)
second-order variations (e.g., covariance matrices): accounting for within-species variability in the classification.
Cons: high dimensionality of hyperspectral data: the number of training spectra per species must be >= the number of spectral bands, e.g. at least 220 training spectra per species for 220 spectral bands. Nonparametric classifiers might be more useful than maximum likelihood when within-species variability is high.
Nonparametric Methods (make no assumption of the data distribution)
• Spectral similarity measures e.g. spectral angle mapper
• Subpixel classification techniques e.g. spectral mixture analysis
• Machine learning methods e.g. ANN and SVM, decision tree classification techniques e.g. Random forests
Trees at various phenological stages (how old they are) cause confusion.
Filter: height > 2 m (average minimum height of trees).
Tools used: to derive WorldView-2-quality data from CAO, the data was resampled in ENVI. First, the classified species raster map was converted into a vector map of species polygons in the ENVI software. The resulting species polygon image was exported into ArcGIS, where the polygon centroids were converted into point data (a point shapefile) representing the species. The point shapefile was then exported to DIVA-GIS software (Hijmans et al., 2005), where the tree species diversity maps were generated on a per-hectare (ha) basis.
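A rough sketch of simulating a broad satellite band from narrow hyperspectral bands. Real resampling (e.g. in ENVI) weights bands by the sensor's spectral response function; this toy version just averages the narrow bands whose centres fall inside the broad band, and all wavelengths and reflectances are made up:

```python
# Simulate a broad multispectral band from narrow hyperspectral bands by
# plain averaging (a simplification of spectral-response-weighted resampling).
def simulate_broad_band(wavelengths, reflectance, lo, hi):
    """Mean reflectance of all narrow bands whose centre falls in [lo, hi] nm."""
    vals = [r for w, r in zip(wavelengths, reflectance) if lo <= w <= hi]
    return sum(vals) / len(vals) if vals else None

wl  = [650, 660, 670, 680, 830, 840]          # narrow band centres (nm)
ref = [0.06, 0.05, 0.05, 0.04, 0.45, 0.47]    # toy reflectances
print(simulate_broad_band(wl, ref, 640, 690))  # simulated broad red band
```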

2005 
Hyperspectral discrimination of tropical rain forest tree species at leaf to crown scales 
237 
Remote Sensing of Environment 
Clark, Roberts, Clark 
161 bands, 7 species, Costa Rica
Scale: near-infrared (700–1327 nm) bands were consistently important across all scales. Bands in the visible region (437–700 nm) and shortwave infrared (1994–2435 nm) were more important at pixel and crown scales.
leaf: laboratory
pixel: flight
crown: flight: majority class over the crown's pixels
Approach: classifications applied to combinations of bands from a stepwise-selection procedure.
linear discriminant analysis (LDA) using MASS package in R
maximum likelihood (ML) in ENVI v4.1 and IDL v6.1
spectral angle mapper (SAM) in ENVI v4.1 and IDL v6.1
The SAM classifier performed poorly at all scales and spectral regions of analysis
Goal:
 Determine if spectral variation among tree species (interspecific) is greater than spectral variation within species (intraspecific), thereby permitting spectralbased species discrimination.
 Identify the spatial scale(leaf/pixel/tree) and spectral regions that provide optimal discrimination among TRF emergent tree species.
 Develop an analytical procedure for the specieslevel (floristic) classification of individual tree crowns using their reflectance spectra.
 Assess the relative importance of narrowband hyper spectral versus broadband multispectral information for species identification of TRF trees.
Ground data from a previous 1998 paper by D.B. Clark: 544 trees, 27 species, of which 7 species were selected
Edaphic variation and the mesoscale distribution of tree species in a neotropical rain forest.
Flight: The U.S. Naval Research Laboratory flew the airborne HYperspectral Digital Imagery Collection Experiment (HYDICE) sensor over LSBS in 1998
The spectral angle is a metric used for comparing the degree of similarity between two spectra. Unlike Euclidean distance, the spectral angle is insensitive to linearly-scaled differences among spectra, such as those caused by illumination.
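The spectral angle in code; scaling one spectrum by a constant (e.g. a brightness change) leaves the angle at zero, which is exactly the illumination-invariance described above:

```python
import math

def spectral_angle(a, b):
    """Angle (radians) between two spectra; invariant to scaling a -> c*a."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    # clamp for floating-point safety before acos
    return math.acos(max(-1.0, min(1.0, dot / (na * nb))))

s = [0.1, 0.3, 0.5]
print(spectral_angle(s, [2 * x for x in s]))  # brightness-scaled copy: angle ~0
```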
Feature Selection:
 Full spectra: 161 bands
 Subsampled spectral regions (VIS, NIR, SWIR1, SWIR2) with 10 bands per region
 These bands were evenly spaced, with an average spacing of 23 nm (VIS), 55 nm (NIR), 25 nm (SWIR1), and 47 nm (SWIR2).
 A forward stepwise selection method based on discriminant analysis, implemented with the SAS STEPDISC procedure.

2012 
SpeciesLevel Differences in Hyperspectral Metrics among Tropical Rainforest Trees as Determined by a TreeBased Classifier 
4 
Remote Sensing 
Clark, Roberts

Costa Rica, 1.6 m resolution, 210 bands
Metrics that respond to vegetation chemistry and structure were derived using narrowband indices, derivative and absorption based techniques, and spectral mixture analysis.
Random Forest Classifier in R
RF classification was performed with hyperspectral metrics from
tissue (bark, leaf),
pixel
crownscale spectra,
with the following sets of metrics:
(1). indices;
(2). absorptionbased;
(3). derivative;
(4). SMA fractions (pixel and crown scale only);
Spectral mixture analysis (SMA) models reflectance spectra as a linear combination of dominant spectral components, or endmembers, producing per-pixel fractional abundances of each endmember and a root-mean-square error (RMSE) model fit
(5). all available metrics.
Tropical forest canopies are typically modeled as a mixture of the following end members:
 “green” photosynthetic vegetation (GV),
 nonphotosynthetic vegetation (NPV),
 shade
 and possibly soil substrate endmember
Pixel- and crown-scale spectra were unmixed using a three-endmember SMA model [69] composed of GV, NPV and photometric shade
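A sketch of the SMA idea with three assumed endmembers, solving the unconstrained least-squares fractions via the normal equations. Real SMA implementations often add sum-to-one and non-negativity constraints; all spectra here are toy values:

```python
import math

def solve3(A, b):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with pivoting."""
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(3):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][3] / M[i][i] for i in range(3)]

def unmix(pixel, endmembers):
    """Fractions f minimising ||E f - pixel||, plus RMSE of the fit."""
    E = list(zip(*endmembers))                      # bands x 3 endmember matrix
    AtA = [[sum(e[i] * e[j] for e in E) for j in range(3)] for i in range(3)]
    Atb = [sum(e[i] * p for e, p in zip(E, pixel)) for i in range(3)]
    f = solve3(AtA, Atb)                            # normal equations E'E f = E'p
    resid = [sum(e[i] * f[i] for i in range(3)) - p for e, p in zip(E, pixel)]
    rmse = math.sqrt(sum(r * r for r in resid) / len(pixel))
    return f, rmse

gv    = [0.05, 0.08, 0.50, 0.45]   # toy 4-band endmember spectra (assumed values)
npv   = [0.20, 0.25, 0.30, 0.35]
shade = [0.01, 0.01, 0.02, 0.02]
pixel = [0.6 * a + 0.3 * b + 0.1 * c for a, b, c in zip(gv, npv, shade)]
fracs, rmse = unmix(pixel, [gv, npv, shade])
print([round(x, 3) for x in fracs], rmse)
```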

2014 
Tree crown delineation and tree species classification in boreal forests using hyperspectral and ALS data 
0 
Elsevier Remote Sensing of Environment 
Dalponte, Ørka, Ene, Gobakken, Næsset

Location: Norway. ALS (Airborne Laser Scanning, i.e. LiDAR).
In total, 2,363 trees were recorded in the 23 plots, with a dominant species distribution of 57% spruce, 28% pine, and 15% broadleaves. Hyperspectral data from 2008: 160 bands, spectral resolution 3.7 nm.
SVM, with all the hyperspectral bands acquired by the sensor as input features. R package kernlab.
The classification at ITC (individual tree crown) level was obtained by aggregating the classified pixels inside each ITC according to a majority rule.
five classification cases were analyzed:
i) a fully manual case based on manually delineated ITCs (the M–M case),
ii–iii) two fully automatic cases based on ITCs automatically delineated on hyperspectral data (the H–H case) and on ALS data (the L–L case), and
iv–v) two semi-automatic cases that consider manually delineated ITCs in the training phase and ITCs automatically delineated on hyperspectral data (the M–H case) and on ALS data (the M–L case) in the validation phase.
Two thresholding methods were tested:
i) the automatic Otsu thresholding method (OTM; Otsu, 1979),
ii) a percentile-based thresholding method (PTM).
k-fold cross-validation
The classifier was trained with the trees belonging to k − 1 plots (where k is the number of plots; k = 23 in our study) and validated on the left-out plot. This process was repeated k = 23 times.
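The leave-one-plot-out scheme above can be sketched as a plain fold generator; plot IDs, features, and labels here are placeholders. Grouping folds by plot rather than by tree avoids spatially autocorrelated leakage between training and validation:

```python
# Leave-one-plot-out cross-validation: with k = number of plots, each fold
# trains on trees from k-1 plots and validates on the held-out plot.
def leave_one_plot_out(samples):
    """samples: list of (plot_id, features, label); yields (train, test) folds."""
    plots = sorted({p for p, _, _ in samples})
    for held_out in plots:
        train = [s for s in samples if s[0] != held_out]
        test = [s for s in samples if s[0] == held_out]
        yield train, test

data = [(1, "a", "spruce"), (1, "b", "pine"), (2, "c", "spruce"), (3, "d", "pine")]
folds = list(leave_one_plot_out(data))
print(len(folds))  # one fold per plot
```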
The model selection of the SVM was performed using a fivefold cross validation on the training dataset of size N − 1.
The relationship between the classification accuracy and the distributions of
i) the DBH of the trees (measured on the ground), and
ii) the crown area provided by the two delineation methods,
was investigated using analysis of variance (ANOVA) and an additional multiple-comparison test of the differences in means (Tukey's "Honest Significant Difference") implemented in the stats package of R.
It is more conservative than, for example, a standard t-test.

2012 
Semi-Supervised Methods to Identify Individual Crowns of Lowland Tropical Canopy Species Using Imaging Spectroscopy and LiDAR 
6 
Remote Sensing 
Féret, Asner 
Nine tree species, Hawaiian lowland
hyperspectral imagery, LiDAR intensities, and LiDAR height
a semi-supervised Support Vector Machine classification using a tensor summation kernel was superior to supervised classification
combination of hyperspectral imagery and LiDAR data usually improved species classification
Both LiDAR intensity and LiDAR canopy height proved useful for classification
Recent work has shown that multiple species can be detected in tropical forests, yet accuracies and the potential for automation remain highly uncertain
Image size: 1,980-by-1,420 pixels
Resolution of 0.56 m
24 spectral bands, 28 nm in width, evenly spaced between 390 nm and 1,044 nm
LiDAR spot spacing was 0.56 m both across- and down-track; 50% overlap between adjacent flightlines, resulting in two laser shots per 0.56 m. A physically-based model was used to estimate top-of-canopy and ground surfaces using the REALM and Terrascan/Terramatch software packages.
3 intensity values
791 individual tree crowns (ITCs) from 17 species
we reduced the dataset by discarding ITCs smaller than 50 pixels and species with less than 12 ITCs. The final dataset used for this study encompassed 333 ITCs from nine different species
3year time lag between the acquisition of the image (September 2007) and the collection of the ground truth (November 2010)
Both the supervised and semisupervised classifications performed in this study were based on the SVM
These linear boundaries between classes are generated by maximizing the margins between the hyperplane and the closest training samples (i.e., the support vectors) while minimizing the error of the training samples that cannot be separated. As the classes are rarely linearly separable in the original feature space, SVM projects the training dataset nonlinearly into a kernel feature space of higher dimensionality. Linear (L) SVM and radial basis function (RBF) SVM both outperform other nonparametric classifiers such as the k-nearest neighbor or artificial neural network approaches, and have comparable or better performance than discriminant analysis. All classification tasks were performed using the MATLAB interface of the LIBSVM package. The RBF function is written as follows: ...
The semi-supervised classification takes advantage of complementary unlabeled samples in order to improve the estimation of the marginal data distribution during the training stage. To start the semi-supervised approach, 500 unlabeled pixels were first randomly selected from the total dataset. The semi-supervised approach proposed by Tuia and Camps-Valls was then implemented, which is based on local regularization of the training kernel. The information contained in these unlabeled samples is used to create a bagged kernel that is combined with the training kernel in order to deform its base structure through a cluster-based method. This bagged kernel is obtained after successive k-means clusterings (with different initializations but the same number of clusters) are performed on the combined training/unlabeled samples; it accounts for the number of times two samples i and j have been assigned to the same cluster. Two different kernels were compared: the tensor product kernel, which deforms the training kernel by multiplying it with the bagged kernel, and the tensor summation kernel, which deforms the training kernel by adding the bagged kernel to it. A package including MATLAB source code for this method is publicly available [37] (http://www.uv.es/gcamps/code/bagsvm.htm).
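The bagged-kernel construction described above can be sketched as follows. The cluster assignments are hard-coded stand-ins (in the paper they come from repeated k-means runs with different initializations), and the base kernel values are toy numbers:

```python
# Sketch of the bagged-kernel idea (after Tuia & Camps-Valls): count how often
# two samples land in the same cluster across clustering runs, then deform the
# base kernel with that co-assignment matrix.
def bagged_kernel(assignments_per_run, n):
    """bag[i][j] = fraction of runs in which samples i and j co-clustered."""
    runs = len(assignments_per_run)
    bag = [[0.0] * n for _ in range(n)]
    for labels in assignments_per_run:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    bag[i][j] += 1.0 / runs
    return bag

def tensor_sum(K, bag):
    """Tensor summation kernel: element-wise K + bag (the product variant multiplies)."""
    return [[k + b for k, b in zip(kr, br)] for kr, br in zip(K, bag)]

runs = [[0, 0, 1], [0, 1, 1]]        # two clustering runs over 3 samples
K = [[1.0, 0.5, 0.2], [0.5, 1.0, 0.5], [0.2, 0.5, 1.0]]
Kd = tensor_sum(K, bagged_kernel(runs, 3))
print(Kd[0][1])  # base 0.5 + co-clustered in one of two runs
```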
These two types of LiDAR variables complement one another, and we recommend combining them with hyperspectral data whenever possible as the full combination of hyperspectral imagery, LiDAR intensity and canopy height outperformed any other combination tested here when averaged on the nine species studied, and showed significant improvements compared to hyperspectral data only or combined with one of the two LiDAR data types studied for six of these species.

2013 
Tree species discrimination in tropical forests using airborne imaging spectroscopy 
14 
Geoscience and Remote Sensing, IEEE Transactions on 
Féret, Asner 
Hawaiian
supervised classification
Nonparametric methods
linear and radial basis function
support vector machine,
artificial neural network,
k-nearest neighbor
parametric methods
linear, quadratic, and regularized discriminant analysis
 a clear advantage in using regularized discriminant analysis, linear discriminant analysis, and support vector machines.
combine segmentation and species classification from regularized discriminant analysis to produce a map of the 17 discriminated species.
mixed crown is defined as one in which two or more species occupy the same canopy space at a scale of 1–2 m spatial resolution.
A total of 920 ITCs were identified and located, corresponding to 17 “pure” species and 12 types of mixed crowns, resulting in 29 different classes to be discriminated.
majority class rule classification
The mean shift clustering algorithm implemented in the Edge Detection and Image SegmentatiON system (EDISON) [57] gave satisfying results for automatic segmentation of tree crowns with a subset of three visible bands of our data (R = 646 nm; G = 560.7 nm; B = 447 nm).
The tree segments do not exactly correspond to ITCs. However, [28] found that even polygons produced through automated methods that were only partially in agreement with detailed ground mapping improved tree species classification
After this segmentation, a pixel-wise classification is performed using the classifier showing the best performance for pixel-wise classification, with 50 pixels per species for training, and a majority vote rule is applied to decide the class assigned to each region.

2014 
A framework for mapping tree species combining hyperspectral and LiDAR data: Role of selected classifiers and sensor across three spatial scales 
1 
Elsevier International Journal of Applied Earth Observation and Geoinformation 
Ghosh, Fassnacht, Joshi, Koch 
location: Germany
Goal: scale effect in imaging spectroscopy when moving from 4 to 30 m pixel size for tree species mapping,
Two airborne (HyMAP) and one spaceborne (Hyperion) imaging spectroscopy dataset with pixel sizes of 4, 8 and 30 m, respectively were available to examine the effect of scale over a central European forest.
managed forest with relatively homogeneous stands featuring mostly two canopy layers.
Supervised kernel based (Support Vector Machines) and ensemble based (Random Forest)
8 m was slightly better than 4 m, and 30 m still produced sound results.
Hyperspectral + LiDAR (12 points per square meter), collected by NASA.
Six different sets of predictor variables (reflectance value of all bands, selected components of a Minimum Noise Fraction (MNF), Vegetation Indices (VI) and each of these sets combined with LiDAR derived height) were explored at each scale
For processing the point clouds and generating an nDSM, we used TreesVis; the fitting procedure is based on a force-minimization algorithm. The nDSM is obtained by subtracting the terrain model from the surface model.
Natural conditions like tree age, forest structure, and density should be considered.
There is no common usage of the terms digital elevation model (DEM), digital terrain model (DTM) and digital surface model (DSM) in scientific literature. In most cases the term digital surface model represents the earth's surface and includes all objects on it. In contrast to a DSM, the digital terrain model represents the bare ground surface without any objects like plants and buildings
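In code, the normalized DSM used for canopy heights is simply the surface model minus the terrain model (toy 1-D grid; the elevations are made-up values):

```python
# nDSM = DSM - DTM: surface heights with the bare-ground elevation removed,
# leaving above-ground object heights (trees, buildings).
def ndsm(dsm, dtm):
    return [s - t for s, t in zip(dsm, dtm)]

dsm = [102.0, 118.5, 104.0]   # surface elevations incl. canopy (m)
dtm = [100.0, 101.5, 103.0]   # bare-earth elevations (m)
print(ndsm(dsm, dtm))         # above-ground heights
```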

2012 
Tree Species Classification with Random Forest Using Very High Spatial Resolution 8-Band WorldView-2 Satellite Data

11

Remote Sensing 
Immitzer, Atzberger, Koukal

worldview2 satellite 8 band, Austria
Random Forest (RF) classification (objectbased and pixelbased)
submontane zone
For the object-based approach, we calculated the mean band values for each crown polygon using its within-crown pixel spectra.
At nadir, the ground sample distance (GSD) is 50 cm for the panchromatic band (0.46–0.80 μm) and 200 cm for the multispectral bands.
Some tree species were much better separated with 8 bands, compared to the sole use of the 4 standard bands.

2012

Classification of savanna tree species, in the Greater Kruger National Park region, by integrating hyperspectral and LiDAR data in a Random Forest datamining environment 
14 
ISPRS Journal of Photogrammetry and Remote Sensing

Naidoo, Cho, Mathieu, Asner

Kruger National Park region, South Africa,
seven predictor datasets
Random Forest
Important predictors: height; NDVI; the chlorophyll b wavelength (466 nm); and a selection of raw, continuum-removed, and Spectral Angle Mapper (SAM) bands.
Concluded that the hybrid predictor dataset Random Forest model yielded the highest classification accuracy
72 bands
LiDAR: there is one laser shot per pixel, i.e. ~1.3 points per 1.1 m pixel.

2009 
Retrieval of foliar information about plant pigment systems from high resolution spectroscopy 
116 
Remote Sensing of Environment

Ustin, Gitelson, Jacquemoud, Schaepman, Asner, Gamon, ZarcoTejada.

Pigment: the natural coloring matter of animal or plant tissue.
Pigment color differs from structural color in that it is the same for all viewing angles, whereas structural color is the result of selective reflection or iridescence, usually because of multilayer structures. For example, butterfly wings typically contain structural color, although many butterflies have cells that contain pigment as well.
All biological pigments selectively absorb certain wavelengths of light while reflecting others. The light that is absorbed may be used by the plant to power chemical reactions, while the reflected wavelengths of light determine the color the pigment will appear to the eye.
green pigment(Chlorophyll) along with several red/yellow pigments that help to capture as much light energy as possible.
We review recent advances in detecting plant pigments at the leaf level and discuss successes and reasons why challenges remain for robust remote observation and quantification.
New methods to identify and quantify individual pigments in the presence of overlapping absorption features would provide a major advance in understanding their biological functions, quantifying net carbon exchange, and identifying plant stresses. We focus primarily on reflectance measurements at the leaf level, emphasizing advances in the past 15–20 years and examining two types of quantitative approaches: (1) empirical methods and (2) physically based radiative transfer models and quantitative methods.
It has been noted that extracted chlorophyll absorption peaks are shifted about 20 nm to shorter wavelengths than observed in reflectance from intact leaves.
Which indices or wavelengths to use for various detections.


Mapping a priori defined plant associations using remotely sensed vegetation characteristics 


Roelofsen, Lammert Kooistra, Peter M. van Bodegom, Jochem Verrelst, Johan Krol, and Jan-Philip M. Witte. 


Linguistic Regularities in Continuous Space Word Representations 


Tomas Mikolov, Wen-tau Yih, Geoffrey Zweig 
Note:
Each word is represented as a vector, and each relationship, e.g. clothes to shirt, can be represented as a vector offset
Vector-oriented reasoning based on the offset between words: 'better : best :: rougher : ?' would answer roughest
The main contribution is applying this to a relation-comparison dataset; the underlying implementation is by Tomas Mikolov
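The offset reasoning in miniature, with hand-made 2-D vectors standing in for learned embeddings (real systems use high-dimensional word2vec vectors and pick the nearest vocabulary word by cosine similarity rather than exact equality):

```python
# Toy illustration of vector-offset analogy reasoning.
def analogy(a, b, c):
    """Return the vector for 'a is to b as c is to ?': b - a + c."""
    return tuple(bi - ai + ci for ai, bi, ci in zip(a, b, c))

# Hand-made "embeddings" chosen so the superlative offset is (0, +1).
vec = {"better": (1.0, 2.0), "best": (1.0, 3.0),
       "rougher": (5.0, 2.0), "roughest": (5.0, 3.0)}
guess = analogy(vec["better"], vec["best"], vec["rougher"])
print(guess == vec["roughest"])
```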


Global Biodiversity Information Facility 



http://www.gbif.org/occurrence/search?datasetKey=db6cd9d77be54cd08b3cfb6dd7446472
http://scholar.google.com/citations?user=UKjJKUIAAAAJ&hl=en&oi=ao
Geotagged locations of species; can be used with MODIS NDVI to study the change of NDVI over time per species.

2012 
JuliaLang

12 
arxiv 
Jeff Bezanson, Stefan Karpinski, Viral B. Shah, Alan Edelman 
Julia: a parallel/distributed scientific programming language, vs. R and the Matlab parallel toolbox
Julia is a highlevel, highperformance dynamic programming language for technical computing, with syntax that is familiar to users of other technical computing environments.
We want a language that’s open source, with a liberal license.
We want the speed of C with the dynamism of Ruby
Con: modern data analytics systems are data-oriented; data migration is what determines efficiency, not whether computation is parallel. Rather than yet another parallel programming language, process data right where it resides (in-database computation).


D4M: Dynamic Distributed Dimensional Data Model 



http://www.mit.edu/~kepner/D4M/
http://www.mit.edu/~kepner/#here
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6289129
http://istcbigdata.org/wpcontent/uploads/2013/03/SIAMCSE2013D4MJeremyKepnerEtAl.pdf
http://www.mit.edu/~kepner/pubs/Kepner_2012_D4M_Slides.pdf






R-tree is a data structure that splits space into nested rectangles. It is good for object hierarchies: a minimum bounding rectangle for each polygon, line, or road, allowing quick containment tests for polygons. It is like a B-tree, but instead of a node just containing a value, each node contains the extent of its rectangle.
k-d tree: given a set of points, answers queries such as nearest neighbors or nodes within range r; it splits the space horizontally or vertically at chosen nodes.
Quadtree is like a k-d tree, but instead of each node splitting space vertically or horizontally, each node splits space into NE, SE, NW, and SW quadrants (hence "quad"); newer points each go to the proper quadrant and have their own children. It can find all points contained within a range.
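A minimal 2-D k-d tree with nearest-neighbour search, sketching the structure described above (alternating x/y splits; a production version would balance the build once and use a bounded priority queue for k-NN):

```python
# Minimal 2-D k-d tree: build by median split on alternating axes,
# then search with branch-and-bound pruning.
def build(points, depth=0):
    if not points:
        return None
    axis = depth % 2
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {"pt": points[mid], "axis": axis,
            "lo": build(points[:mid], depth + 1),
            "hi": build(points[mid + 1:], depth + 1)}

def nearest(node, q, best=None):
    """Return (squared_distance, point) of the nearest stored point to q."""
    if node is None:
        return best
    d2 = sum((a - b) ** 2 for a, b in zip(node["pt"], q))
    if best is None or d2 < best[0]:
        best = (d2, node["pt"])
    diff = q[node["axis"]] - node["pt"][node["axis"]]
    near, far = (node["lo"], node["hi"]) if diff < 0 else (node["hi"], node["lo"])
    best = nearest(near, q, best)
    if diff * diff < best[0]:          # other side could still hold a closer point
        best = nearest(far, q, best)
    return best

tree = build([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
print(nearest(tree, (9, 2))[1])
```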
alpha shapes/concave hull
http://www.openjump.org/
http://www.rotefabrik.free.fr/concave_hull/

2011 
A Kd-tree-based Outlier Detection Method for Airborne LiDAR Point Clouds 
2 
IEEE Image and Data Fusion 

Notes:
The average of the distances between the central point and its k-neighborhood points is calculated.
If the average distance is larger than an adaptively preset value, the point is regarded as an outlier.
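The outlier rule above in miniature, using brute-force neighbour search (the paper accelerates this step with the k-d tree), with made-up 2-D points:

```python
import math

# Flag points whose mean distance to their k nearest neighbours exceeds
# a preset threshold (the paper chooses this threshold adaptively).
def knn_outliers(points, k, threshold):
    flagged = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        if sum(dists[:k]) / k > threshold:
            flagged.append(p)
    return flagged

pts = [(0, 0), (1, 0), (0, 1), (1, 1), (50, 50)]   # one obvious stray return
print(knn_outliers(pts, k=2, threshold=5.0))
```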
Crown segmentation using a k-d tree and expanding on a nearest-neighbors graph
LiDAR data structures: TIN, octree, k-d tree
k-d tree
A k-d tree (short for k-dimensional tree) is a space-partitioning data structure for organizing points in a k-dimensional space.
range searches
nearest neighbor searches
if for a particular split the "x" axis is chosen, all points in the subtree with a smaller "x" value than the node will appear in the left subtree and all points with larger "x" value will be in the right sub tree.
ANN [7] is employed to realize the k-d tree. ANN is a library written in C++ that supports both exact and approximate nearest neighbor searching in spaces of various dimensions.
The k-d tree is more efficient than the TIN.
references
A multiresolution approach for filtering LiDAR altimetry data
An optimal algorithm for approximate nearest neighbor searching in fixed dimensions

2012 
A New Method for Segmenting Individual Trees from the Lidar Point Cloud 
29 
Photogrammetric Engineering and Remote Sensing 
Wenkai Li, Qinghua Guo, Marek Jakubowski, Maggi Kelly

Preprocessing
TerraScan software (http://terrasolid.fi) was used to classify raw lidar points into ground / above-ground
Ordinary kriging (point interpolation) was used to interpolate the ground points and generate the digital elevation model (DEM) at 1 m resolution
The vegetation point cloud was normalized by subtracting the ground elevation (DEM) from each lidar point
After normalization, the elevation value of a point indicates the height from the ground
Algorithm
taking advantage of the relative spacing between trees
the spacing at the top of a tree is larger than the spacing at the bottom
Starting from a tree top, we can identify and "grow" a target tree by including nearby points and excluding points of other trees based on their relative spacing.
points are classified sequentially, from the highest to the lowest.
Starting from the seed point A, we classify other lower points sequentially.
The threshold should be approximately equal to the crown radius. If the threshold is too small, trees with elongated branches may be oversegmented;
adaptive threshold can be used, assuming that taller trees have larger crown diameters
projected into 2D Euclidean space, most of the points fall into the left sector
a spacing threshold
a minimum spacing rule, and a horizontal profile of the tree shape. Undersegmentations can be reduced by using a relatively small threshold, and oversegmentations can be reduced based on the shape and distribution of the points.
During each iteration, only one tree (target) is segmented,
points corresponding to this target tree are removed from the point cloud.
Start from the highest tree:
First, find the highest point (global maximum)
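A much-simplified sketch of this highest-point-first idea. The published algorithm also maintains an "other trees" set, a minimum-spacing rule, and a horizontal profile test; here a single fixed spacing threshold decides membership, and the points are made up:

```python
import math

# Highest-point-first segmentation sketch: seed at the global maximum,
# sweep remaining points from highest to lowest, and assign each to the
# target tree if it lies within a horizontal spacing threshold of an
# already-assigned target point; repeat on the leftovers.
def segment(points, threshold):
    """points: (x, y, z) tuples; returns a list of trees (lists of points)."""
    remaining = sorted(points, key=lambda p: -p[2])   # highest first
    trees = []
    while remaining:
        target = [remaining[0]]                       # seed = highest point
        rest = []
        for p in remaining[1:]:
            near = min(math.dist(p[:2], t[:2]) for t in target)
            (target if near <= threshold else rest).append(p)
        trees.append(target)
        remaining = rest                              # segment the next tree
    return trees

pts = [(0, 0, 20), (1, 0, 15), (0, 1, 12), (10, 10, 18), (11, 10, 14)]
print(len(segment(pts, threshold=3.0)))  # two crowns
```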


Delineating Individual Trees from Lidar Data: A Comparison of Vector- and Raster-based Segmentation Approaches


Remote Sensing 

good categorization of approaches for tree segmentation using either lidar and/or images


Mining lidar data with spatial clustering algorithms 



out of various clustering approaches used, DBSCAN outperformed others.
Data clustering algorithms
Partitioning methods:
partition a data set into a given number of clusters: k-means, k-medoid
Hierarchical methods:
single link (SL) algorithm emphasizes the connectedness of the patterns in a cluster
Prototypebased optimization methods:
use an optimization procedure tuned for a particular shape of the cluster.
Spectral clustering and kernelbased:
number of parameters is quadratic with respect to the number of samples.
Densitybased methods:
assume that clusters are high density regions.
Manifoldbased methods:
Evolutionary algorithms for clustering:
genetic algorithm (GA)
Combination:
fuzzy clustering and minimum spanning tree
Three Approaches
a) DBSCAN
the neighbourhood of a given radius ε needs to contain at least a minimum number of objects,
the density in the neighbourhood has to exceed a certain threshold.
map each point to the distance of its k-th nearest neighbour (a common heuristic for choosing ε).
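The density definition above (an ε-neighbourhood must contain at least a minimum number of points) can be illustrated with a minimal from-scratch DBSCAN sketch. This is not the paper's implementation; `eps` and `min_pts` stand in for ε and the minimum point count, and the toy coordinates are invented.

```python
import numpy as np

def dbscan(pts, eps, min_pts):
    """Minimal DBSCAN sketch: clusters grow from 'core' points whose
    eps-neighbourhood contains at least min_pts points; the rest is noise (-1)."""
    n = len(pts)
    d = np.linalg.norm(pts[:, None] - pts[None, :], axis=2)   # pairwise distances
    neigh = [np.flatnonzero(d[i] <= eps) for i in range(n)]
    core = [len(neigh[i]) >= min_pts for i in range(n)]
    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not core[i]:
            continue
        labels[i] = cluster
        stack = [i]                       # grow a new cluster from core point i
        while stack:
            j = stack.pop()
            if not core[j]:
                continue                  # border points do not expand the cluster
            for k in neigh[j]:
                if labels[k] == -1:
                    labels[k] = cluster
                    stack.append(k)
        cluster += 1
    return labels
```

Library implementations (e.g. scikit-learn's `DBSCAN`) follow the same definition with far better neighbourhood indexing.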
b) OPTICS
ordering points based on ε
minimum number of points required to call a circular area ‘dense’
a variant of OPTICS (BOPT) uses perimeters of triangles instead of circular areas.
steps:
(1) prepare an ordered ‘live edge’ database;
(2) create ‘sparse cluster’ and ‘noise’ data;
(3) test the clusters and noise for mergeability and output final clusters
outputs of the clustering algorithms are compared with the manually generated clusters, using ARI as a measure
qualitatively visualized by overlaying the points
adjusted rand index (ARI)
a chance-adjusted measure of the similarity between two different clustering approaches.
A value close to zero implies that the two clustering approaches agree no better than chance (the adjusted index can dip slightly below zero), whereas values closer to one indicate that the approaches have yielded similar results.
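The ARI can be computed directly from its pair-counting definition; the sketch below is a from-scratch illustration (not the code used in the paper), with invented labelings in place of manual and algorithmic tree clusters.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(a, b):
    """ARI = (index - expected index) / (max index - expected index).
    1 means identical partitions; ~0 means chance-level agreement."""
    n = len(a)
    pairs = lambda labels: sum(comb(c, 2) for c in Counter(labels).values())
    index = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    expected = pairs(a) * pairs(b) / comb(n, 2)
    max_index = (pairs(a) + pairs(b)) / 2
    return (index - expected) / (max_index - expected)

manual = [0, 0, 0, 1, 1, 1, 2, 2]   # manually delineated clusters
auto   = [1, 1, 1, 0, 0, 0, 2, 2]   # algorithm output: same partition, renamed
```

Note that the score depends only on the partitions, not the label names, so the renamed clusters above still score 1.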





Lidar Tools




http://home.iitk.ac.in/~blohani/
lidar tree segmentation
FUSION and LAStools
Quadtree
People
Tools for Lidar Classification / Filtering:
- MCC-LIDAR: an open source command-line tool for processing discrete-return lidar data in forested environments. It classifies data points as ground or non-ground using the Multiscale Curvature Classification algorithm developed by Evans and Hudak, 2007.
- GRASS GIS: open source GIS software. Includes a suite of tools related to lidar data processing, as discussed in the GRASS lidar wiki entry. Currently doesn’t offer direct support for point cloud data in LAS format.
- BCAL Lidar Tools: open source tools developed by the Idaho State University Boise Center Aerospace Lab in IDL as a plug-in for the ENVI software package. Includes a Height Filtering tool optimized for open rangeland (sagebrush) vegetation, developed by Streutker and Glenn, 2006.
- SAGA GIS: open source GIS package. “SAGA includes several tools to manipulate the point cloud, e.g. an attribute calculator, reclassifier, subset extractor and many other methods like gridding and interpolation. There is also a (grid based) bare earth filter, adapted from Vosselman (2001).”
- Point Cloud Library: http://pointclouds.org/documentation/
- PRocess TOol LIdar DAta in R
- HOW TO: Install latest geospatial & scientific software on Linux




2011 
Point Cloud Classification for Water Surface Identification in Lidar Datasets 

UT Austin Master's Thesis


TINs are constructed by triangulating a set of vertices; the vertices are connected with a series of edges to form a network of triangles. The resulting triangulation satisfies the Delaunay criterion, which ensures that no vertex lies within the interior of the circumcircle of any triangle in the network.
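Such a triangulation is available off the shelf, e.g. in SciPy; the sketch below uses made-up (x, y) vertices (a real TIN would carry an elevation z with each vertex).

```python
import numpy as np
from scipy.spatial import Delaunay

# made-up vertices: four square corners plus a centre point
xy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
tin = Delaunay(xy)

# each row of tin.simplices holds the vertex indices of one triangle;
# the Delaunay property guarantees an empty circumcircle for every triangle
triangles = tin.simplices
```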
Finding K nearest neighbors in a KD-tree
1. Starting with the root node, the control moves down the tree recursively, i.e. it goes to the right or the left depending on whether the query point’s coordinate is greater or less than the current node’s in the specified split dimension (2D splitting is used here; 3D can also be used).
2. Once control reaches the leaf node, it saves the current node point as the current best fit.
3. The control unwinds the recursion of the tree, performing the following steps at each node:
a. If the current node is closer than the current best, then it becomes the current best fit.
b. The control checks whether there could be any points on the other side of the splitting plane that are closer to the search point than the current best. This is done by intersecting the splitting hyperplane with a hypersphere around the search point that has a radius equal to the nearest distance. This is implemented as a simple comparison to see whether the difference between the splitting coordinate of the search point and current node is less than the distance from the search point to the current best.
i. If the hypersphere crosses the plane, there could be nearer points on the other side, so the control must move down the other branch of the tree from the current location or node, searching for closer points in space by following the same recursive technique.
ii. If the hypersphere does not intersect the splitting plane, then the control moves up the tree and the entire branch on the other side is not inspected any more.
4. Once the process is finished for the root node the search returns the index and the distances of the points from the point of interest.
Finding neighbors within radius K
The search is similar to finding the K nearest neighbors; the only difference is that the Euclidean distance to each candidate point is computed at each step and compared against the radius K.
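Both query types described above are available off the shelf, e.g. in SciPy's `cKDTree`; the coordinates below are made up for illustration.

```python
import numpy as np
from scipy.spatial import cKDTree

# made-up 2-D points; lidar applications would use (x, y) or (x, y, z)
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
tree = cKDTree(pts)

# K nearest neighbours of a query point: distances and indices
dist, idx = tree.query([0.1, 0.1], k=2)

# all neighbours within a fixed radius of the query point
within = tree.query_ball_point([0.1, 0.1], r=1.5)
```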




2007 
3D LiDAR point-cloud segmentation




Kittipat's Homepage
https://sites.google.com/site/kittipat/vegfilterlidar
The MATLAB toolbox is available here. The brief manual can be found here.
Brief slides can be found here.




2011 
Individual Tree Crowns Delineation Using Local Maxima Approach And Seeded Region Growing Technique 

GIS Ostrava


Seeded region growing is an iterative process starting from a pixel in the set of seeds. Pixels in the seed’s neighborhood are subsequently classified as to whether or not they are part of the same crown as the seed, based on:
1) Absolute distance from the seed.
2) Brightness agreement.
3) Spectral agreement





Algorithms for Nearest Neighbor Search 



Slides: KD-tree, R-tree, ...




2013 
Linguistic Regularities in Continuous Space Word Representations 
29 
NAACL-HLT 

each word is represented as a vector, and each relationship (e.g. clothes to shirt) can be represented as the vector offset between the two words
vector-oriented reasoning based on the offset between words: for ‘better is to best as rougher is to ?’, the answer would be roughest
the paper’s main contribution is applying this to a relation-comparison dataset; the underlying implementation is by Tomas Mikolov
Continuous space language models
vectorspace word representations
implicitly learned by the input-layer weights.
capturing syntactic and semantic regularities in language
each relationship is characterized by a relationspecific vector offset.
allows vectororiented reasoning based on the offsets between words
the male/female relationship is automatically learned with the induced vector representations: “King − Man + Woman” results in a vector very close to “Queen.”
word vectors capture syntactic regularities by means of syntactic analogy questions
word vectors capture semantic regularities by using the vector offset method
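The vector-offset method can be sketched with toy vectors. The embeddings below are invented purely for illustration; real vectors would come from a trained model such as Mikolov's RNN language model.

```python
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# invented toy embeddings, chosen so the analogy works out
vec = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "man":   np.array([0.8, 0.1, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9]),
    "queen": np.array([0.2, 0.8, 0.9]),
    "apple": np.array([0.5, 0.5, 0.5]),
}

# vector-offset reasoning: y = king - man + woman; answer = nearest word to y
target = vec["king"] - vec["man"] + vec["woman"]
answer = max((w for w in vec if w not in {"king", "man", "woman"}),
             key=lambda w: cosine(vec[w], target))
```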
neural network language models
representation of words as high dimensional real valued vectors.
words are converted via a learned lookup table into real valued vectors used, as the inputs to a neural network
ngram model works in terms of discrete units that have no inherent relationship to one another,
continuous space model works in terms of word vectors where similar words are likely to have similar vectors.
Thus, when the model parameters are adjusted in response to a particular word or word sequence, the improvements will carry over to occurrences of similar words and sequences.
By training a neural network language model, one obtains not just the model itself, but also the learned word representations, which may be used for other, potentially unrelated, tasks.
predicting a probability of the “next” word, given some preceding words.
were first studied in the context of feedforward networks
later in the context of recurrent neural network models (Mikolov et al., 2010; Mikolov et al., 2011b)
Recurrent Neural Network
input layer, a hidden layer with recurrent connections
the input vector w(t) represents the input word at time t encoded using 1-of-N coding, and the output layer y(t) produces a probability distribution over words. The hidden layer s(t) maintains a representation of the sentence history.
the input vector w(t) and the output vector y(t) have the dimensionality of the vocabulary.
word representations are found in the columns of U, with each column representing a word.
The RNN is trained with backpropagation to maximize the data log-likelihood
we tagged 267M words of newspaper text with Penn Treebank POS tags
similarity between members of the word pairs (xb, xd), (xc, xd) and dissimilarity for (xa, xd).
we used vectors generated by the RNN toolkit of Mikolov
RNN generates vector for each word: taken from Mikolov
Training on Mikolov’s dataset
We present a new dataset for measuring syntactic performance, and achieve almost 40% correct.
Surprisingly, both results are the byproducts of an unsupervised maximum likelihood training criterion that simply operates on a large amount of text data.




2013 
Endmember Detection Using Graph Theory 

IGARSS 
Rohani
Parente 
Summary
the image is segmented, with minimum segment size = 20 and k = 0.001;
spectral angle distance is used as the distance metric on the original image
each superpixel is the average of all points in its segment
a k-NN graph of the superpixels is drawn
the sum of distances from each superpixel to its k nearest neighbors is computed
the superpixels are sorted based on the computed sum
the superpixels with the largest sums are endmembers
it is observed that isolated superpixels are not endmembers, yet they always have a high sum
for all k selections, so they can be discarded
Some authors proposed approaches based on piecewise convex models for nonlinear unmixing (e.g. [3], [4]) which assume only local geometrical structure
choice of number of convex regions and the number of the endmembers in each convex region
explores the topological relationships between data points regardless of the geometry of the data cloud.
”centrality” from graph theory to identify boundary points of the data cloud
Multi Dimensional Pixel Purity Index algorithm
V is the set of vertices, the centroid of each segment of the image, and E is the set of edges, which represent the connectivity between the nodes. In our approach, we build an edge between two nodes if the Euclidean distance is below a threshold.
we calculate the spectral Euclidean distances between each pair of the minerals available in spectral libraries, and we set the value of ε equal to the least spectral Euclidean distance found among these pairs.
In the linear mixing model, finding the endmembers is equivalent to identifying the vertices of the simplex containing (most of) the data cloud. Even if the mixing is not linear, our simulations for intimate mixing confirm that endmembers all lie on the boundary of the data.
g(v) = Σ_{s≠v≠t} σ_st(v) / σ_st, where σ_st is the number of shortest geodesic paths from s to t, and σ_st(v) is the number of shortest geodesic paths from node s to node t that pass through vertex v.
which is a measure of a node’s centrality in a network.
From the points with lowest betweenness centrality, we have to choose those that we consider endmember points. Since we are working with a segmented image, we can assume that none of the segments are affected by local distortions which occur in only a few pixels. The segmentation algorithm used discards segments below a given minimum size, and the pixels in them are grouped with the spectrally most similar segment in their neighborhood. The signature for this new segment is recalculated by averaging the spectra of all the pixels in the segment. This averaging helps mitigate the effect of the local distortions. Thus, the set of boundary points can now be thought of as the set of spectral signatures present in the image that are not overly affected by local distortions (such as atmospheric effects and instrument noise).
we present a ranking scheme that attempts to maximize the spectral variability at the top of the list.
The score for each node is the sum of the spectral Euclidean distances of the node to its k-nearest neighbors. The points with the largest values of the score (sum of spectral Euclidean distances, SSD) are placed at the top of the list. The points with the largest spectral Euclidean distance will be those points that are on the boundary and isolated. These will be followed by some points that are on the boundary in low-density areas, and lastly those on the boundary that are in high-density areas.
The isolated boundary points would be unaffected by the choice of k and would always be placed at the top of the list.
Our observations show that even in the presence of intimate mixing the purest points (endmembers) are located in regions of higher curvature of the boundary.
endmembers in dense areas
We perform a segmentation with the minimum segment size equal to 20, k = 0.001, and spectral angle distance as the distance metric on the original image
The scientists have ratioed the unique spectra with some dark points spectra in order to have better signal quality (dark points are the points with no distinguishable spectral features and consequently their spectra would be more flat in comparison with others). The spectra identified by the algorithm have been ratioed with the average dark points spectra.
identified by scientists manually (Fig.2 (a))
VCA
algorithm iteratively projects data onto a direction orthogonal to the subspace spanned by the endmembers already determined. The new endmember signature corresponds to the extreme point in the projection.
N-FINDR
finds the endmembers as the set of points defining the largest volume by inflating a simplex inside the data.
VCA finds two endmembers (red and blue) and cannot find all the endmembers as the most extreme points in the projections. Also for N-FINDR (Fig. 2 (d)), the endmembers are not extracted accurately.
graphs for modeling the image.
Each node represents the centroid of the superpixels of the image and it is connected to the adjacent (spectrally) nodes.
betweenness centrality and the sum of Euclidean distances from nearest neighbors can be employed. We applied this approach to several CRISM images and report the results for one image. As mentioned in the previous sections, our approach properly detects the endmembers found by the scientists, and it can be applied to images with the fewest assumptions on the type of mixing or the shape of the data cloud




2010 
Superpixel Endmember Detection 
32 
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING


A superpixel ranges from roughly 19 to 100 pixels in size.
Superpixel representations can reduce noise in hyperspectral images by exploiting the spatial contiguity of scene features.
Used for Mars satellite images, with each pixel covering about 20 m
First, a graph-based agglomerative algorithm oversegments the image. We then use segments’ mean spectra as input to existing statistical endmember detection algorithms such as sequential maximum angle convex cone (SMACC) and N-FINDR.
superpixel representations significantly reduce the computational complexity of later processing
while improving endmembers’ match to the target spectra.




2007 
Google News Personalization: Scalable Online Collaborative Filtering 
713 
WWW 

Summary: two major schemes: 1) MinHash, a locality-sensitive hash; 2) Probabilistic Latent Semantic Indexing, which introduces a latent variable z that makes users and items conditionally independent, then runs EM to estimate the distributions involving z.
Content Retrieval
- Collaborative filtering: uses the item ratings by users to capture user preferences; typically content-agnostic.
- Content-based filtering (e.g. keyword-based searching): content similar to items rated highly by the user is used to recommend new items, e.g. a user’s past shopping history is used to make recommendations for new products.
churn (insertions and deletions) every few minutes: any model older than a few hours may no longer be of interest, and partial updates will not work.
Treating clicks as positive votes is noisier than accepting explicit 1-5 star ratings; clicks don’t say anything about a user’s negative interests.
Algorithms
Memory-based algorithms: “similarity” between users, e.g. Pearson correlation coefficient, cosine similarity.
Model-based algorithms: e.g. Latent Dirichlet Allocation. Most of the model-based algorithms are computationally expensive.
MinHash
- The similarity between two users ui, uj is defined as the overlap between their item sets (Jaccard coefficient); it is well known that the corresponding distance function is a metric.
- We would like to compute the similarity of this user, S(ui, uj), to all other users uj, and recommend to user ui stories voted on by uj with weight equal to S(ui, uj).
sublinear-time near-neighbor search technique
Locality Sensitive Hashing (LSH)
hash the data points using several hash functions so as to ensure that, for each function, the probability of collision is much higher for objects which are close. LSH schemes are known to exist for the following distance or similarity measures: Hamming norm, Lp norms, Jaccard coefficient, cosine distance and the earth mover’s distance (EMD)
MinHashing
- randomly permute the set of items S and, for each user ui, compute its hash value h(ui) as the index of the first item under the permutation that belongs to the user’s item set
- for a random permutation, the probability that two users will have the same hash value is exactly equal to their similarity (Jaccard coefficient); minhashing can be viewed as a probabilistic clustering
- we can always concatenate p hash-keys for users, where p ≥ 1, so the probability that any two users ui, uj will agree on the concatenated hash-key is equal to S(ui, uj)^p
- these refined clusters have high precision but low recall; we can improve the recall by repeating this step in parallel multiple times
- by choosing the range of the hash value to be 0 … 2^64 − 1 (unsigned 64-bit integer) we ensure that we do not encounter the “birthday paradox”
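The MinHash scheme above can be sketched in a few lines; the permutations are represented as item-to-position maps, and the user item sets are invented for illustration.

```python
import random

def minhash_signature(item_set, perms):
    """One hash per permutation: the set's item with the smallest position."""
    return tuple(min(item_set, key=perm.__getitem__) for perm in perms)

random.seed(0)
items = list(range(100))
perms = []
for _ in range(6):                       # p = 6 concatenated hash-keys
    order = items[:]
    random.shuffle(order)
    perms.append({item: pos for pos, item in enumerate(order)})

u1 = set(range(0, 50))                   # user 1's item set (invented)
u2 = set(range(10, 60))                  # user 2's item set; Jaccard = 40/60

sig1 = minhash_signature(u1, perms)
sig2 = minhash_signature(u2, perms)
# each component matches with probability S(u1, u2) = 2/3;
# the full 6-key signature matches with probability (2/3)**6
```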
PLSI probabilistic latent semantic indexing
The relationship between users and items is learned by modeling the joint distribution of users and items as a mixture distribution. A hidden variable Z (taking values from z ∈ Z, with ∥Z∥ = L) is introduced to capture this relationship; it can be thought of as representing user communities (like-minded users) and item communities (genres).
The key contribution of the model is the introduction of the latent variable Z, which makes users and items conditionally independent. The model can also be thought of as a generative model in which a state z of the latent variable Z is chosen for an arbitrary user u based on the CPD p(z|u). Next, an item s is sampled based on the chosen z from the CPD p(s|z).
the conditional likelihood over all data points is maximized; Expectation Maximization (EM) is used to learn the maximum likelihood parameters of this model.
The E-step involves the computation of the Q variables (i.e. the a-posteriori latent class probabilities).
The M-step uses the computed Q values to update the model distributions.
The paper’s insight is implementing the EM steps with MapReduce.
generative model [joint distribution: p(x,y)] vs. discriminative model [conditional: p(y|x)]









What is the difference between a Generative and Discriminative Algorithm?
Let's say you have input data x and you want to classify the data into labels y. A generative model learns the joint probability distribution p(x,y), and a discriminative model learns the conditional probability distribution p(y|x), which you should read as "the probability of y given x".
Here's a really simple example. Suppose you have the following data in the form (x,y):
(1,0), (1,0), (2,0), (2, 1)
p(x,y) is:

        y=0    y=1
x=1     1/2    0
x=2     1/4    1/4

p(y|x) is:

        y=0    y=1
x=1     1      0
x=2     1/2    1/2
If you take a few minutes to stare at those two matrices, you will understand the difference between the two probability distributions.
The distribution p(y|x) is the natural distribution for classifying a given example x into a class y, which is why algorithms that model this directly are called discriminative algorithms. Generative algorithms model p(x,y), which can be transformed into p(y|x) by applying Bayes' rule and then used for classification. However, the distribution p(x,y) can also be used for other purposes. For example, you could use p(x,y) to generate likely (x,y) pairs.
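The two tables above can be reproduced mechanically from the four data points:

```python
import numpy as np

data = [(1, 0), (1, 0), (2, 0), (2, 1)]

# joint p(x, y): normalized co-occurrence counts
joint = np.zeros((2, 2))          # rows: x in {1, 2}; columns: y in {0, 1}
for x, y in data:
    joint[x - 1, y] += 1
joint /= joint.sum()              # [[1/2, 0], [1/4, 1/4]]

# conditional p(y | x): each row of the joint renormalized to sum to 1
conditional = joint / joint.sum(axis=1, keepdims=True)   # [[1, 0], [1/2, 1/2]]
```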
From the description above you might be thinking that generative models are more generally useful and therefore better, but it's not as simple as that.
This paper is a very popular reference on the subject of discriminative vs. generative classifiers, but it's pretty heavy going. The overall gist is
that discriminative models generally outperform generative models in classification tasks.









Locality-sensitive hashing (LSH)
is a method of performing probabilistic dimension reduction of high-dimensional data.
The basic idea is to hash the input items so that similar items are mapped to the same buckets with high probability (the number of buckets being much smaller than the universe of possible input items). The hashing used in LSH differs from conventional hash functions, such as those used in cryptography: in the LSH case the goal is to maximize the probability of "collision" of similar items rather than to avoid collisions. Note how locality-sensitive hashing, in many ways, mirrors data clustering and nearest neighbor search.
http://en.wikipedia.org/wiki/Localitysensitive_hashing
Bit sampling for Hamming distance
One of the easiest ways to construct an LSH family is by bit sampling. This approach works for the Hamming distance over d-dimensional vectors {0,1}^d. Here, the family F of hash functions is simply the family of all projections of points onto one of the d coordinates, i.e., F = {h : {0,1}^d → {0,1} | h(x) = x_i, i = 1 … d}, where x_i is the i-th coordinate of x. A random function h from F simply selects a random bit from the input point. This family has the following parameters: P_1 = 1 − R/d, P_2 = 1 − cR/d.
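A minimal sketch of bit sampling; the vectors and the number of hash functions below are arbitrary choices for illustration.

```python
import random

def bit_sampling_hash(d, seed):
    """One LSH function for Hamming distance: h(x) = x_i for a random i."""
    i = random.Random(seed).randrange(d)
    return lambda x: x[i]

x = [1, 0, 1, 1, 0, 0, 1, 0]
y = [1, 0, 1, 0, 0, 0, 1, 0]     # Hamming distance R = 1, d = 8

hashes = [bit_sampling_hash(len(x), seed=s) for s in range(100)]
collisions = sum(h(x) == h(y) for h in hashes)
# the observed collision rate approximates P_1 = 1 - R/d = 7/8
```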





Streaming Similarity Search over One Billion Tweets Using Parallel Locality-Sensitive Hashing 



cache-conscious hash table layout, using a 2-level merge algorithm for hash table construction; an efficient algorithm for duplicate elimination during hash-table querying; an insert-optimized hash table structure and an efficient data expiration algorithm for streaming data
for any given query q, reports the points within radius R of q. We refer to those points as R-near neighbors of q in P. The data structure is randomized: each R-near neighbor is reported with probability 1 − δ, where δ > 0. A family of hash functions is locality-sensitive if, for any two points p and q, the probability that p and q collide under a random choice of hash function depends only on the distance between p and q.
the angle θ between unit vectors p and q can be calculated as arccos(p·q). The hash functions in the family are parameterized by a unit vector a. Each such function h_a, when applied to a vector v, returns either −1 or 1, depending on the sign of the dot product between a and v. Specifically, we have h_a(v) = sign(a·v).
apply hash functions iteratively on previous hash function results: a two-level scheme concatenating pairs such as (u1,u2), (u1,u3), (u1,u4).
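The sign-of-dot-product hash can be sketched as follows; the dimension, perturbation size, and sample counts are arbitrary illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sign_hash(a):
    """h_a(v) = sign(a . v): +1 or -1 depending on the side of hyperplane a."""
    return lambda v: 1 if a @ v >= 0 else -1

d = 16
v1 = rng.normal(size=d); v1 /= np.linalg.norm(v1)
v2 = v1 + 0.1 * rng.normal(size=d); v2 /= np.linalg.norm(v2)   # nearby vector

hashes = [sign_hash(rng.normal(size=d)) for _ in range(200)]
agree = sum(h(v1) == h(v2) for h in hashes)
theta = np.arccos(np.clip(v1 @ v2, -1.0, 1.0))
# agree / 200 approximates the collision probability 1 - theta / pi

# two-level idea: concatenate several sign bits into one composite hash key
key = tuple(h(v1) for h in hashes[:10])
```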





Vegetation and Its Reflectance Properties 


ENVI 
The optical spectrum is partitioned into four distinct wavelength ranges:
 Visible: 400 nm to 700 nm
 Nearinfrared: 700 nm to 1300 nm
 Shortwave infrared 1 (SWIR1): 1300 nm to 1900 nm
 Shortwave infrared 2 (SWIR2): 1900 nm to 2500 nm
The transition from near-infrared to SWIR1 is marked by the 1400 nm atmospheric water absorption region, in which satellites and aircraft cannot acquire measurements. Similarly, the SWIR1 to SWIR2 transition is marked by the 1900 nm atmospheric water absorption region.
The most important leaf components that affect leaf spectral properties are listed below. Other components (such as phosphorus, calcium, and so forth) are significant to plant function, but they do not directly contribute to the spectral properties of leaves and therefore cannot be directly measured using remotely sensed data:
• Pigments
chlorophyll (a and b)
vegetation with a high concentration of chlorophyll is generally very healthy, as chlorophyll is linked to greater light use efficiency or photosynthetic rates
carotenoids, and anthocyanins
found in higher concentrations in vegetation that is less healthy, typically due to stress (seen in drought or nutrient depletion), senescence (dormant or dying vegetation that appears red, yellow, or brown), or death
• Water
Plants of different species inherently contain different amounts of water based on their leaf geometry, canopy architecture, and water requirements. Among plants of one species, there is still significant variation, depending upon leaf thickness, water availability, and plant health. Water is critical for many plant processes, in particular, photosynthesis. Generally, vegetation of the same type with greater water content is more productive and less prone to burn.
Leaf water affects plant reflectance in the nearinfrared and shortwave infrared regions of the spectrum (see the following figure). Water has maximum absorptions centered near 1400 and 1900 nm, but these spectral regions usually cannot be observed from airborne or spacebased sensors due to atmospheric water absorption, preventing their practical use in the creation of VIs. Water features centered around 970 nm and 1190 nm are pronounced and can be readily measured from hyperspectral sensors. These spectral regions are generally not sampled by multispectral sensors.
• Carbon
Cellulose and lignin display spectral features in the shortwave infrared range of the optical spectrum, as shown in the figure.
• Nitrogen
VIs sensitive to chlorophyll content (which is approximately 6% nitrogen) are often broadly sensitive to nitrogen content as well. Some proteins that contain nitrogen affect the spectral properties of leaves in the 1500 nm to 1720 nm range.
The variation in reflectance caused by different canopy structures, much like individual leaf reflectance, is highly variable with wavelength.
The LAI is the green leaf area per unit ground area, which represents the total amount of green vegetation present in the canopy. The MLA is the average of the differences between the angle of each leaf in a canopy and horizontal. The higher the LAI, the higher the reflectance.
vegetation strongly reflects light in the nearinfrared portion of the spectrum, canopies strongly absorb photons in the visible and SWIR2 ranges. This results in a much shallower penetration of photons into the canopy in these wavelengths. As such, VIs using spectral data from the visible and SWIR2 are very sensitive to uppercanopy conditions.
NonPhotosynthetic Vegetation
senescent or dead vegetation is also known as non-photosynthetic vegetation, or NPV. It could be truly dead or simply dormant (such as some grasses between rainfall events). Also included in the NPV category are woody structures in many plants, including tree trunks, stems, and branches.
NPV is composed largely of the carbonbased molecules lignin, cellulose, and starch. As such, it has a similar reflectance signature to these materials, with most of the variation in the shortwave infrared range. In many canopies, much of the NPV is obscured below a potentially closed leaf canopy; the wavelengths used to measure NPV (shortwave infrared) are often unable to penetrate through the upper canopy to interact with this NPV. As such, only exposed NPV has a significant effect on the spectral reflectance of vegetated ecosystems. When exposed, NPV scatters photons very efficiently in the shortwave infrared range, in direct contrast to green vegetation which absorbs strongly in the shortwave infrared range.
In general, photons in the visible wavelength region are efficiently absorbed by live, green vegetation. Likewise, photons in the SWIR2 region of the spectrum are efficiently absorbed by water. In contrast to live vegetation, dead, dry, or senescent vegetation scatters photons very efficiently throughout the spectrum, with the most scattering occurring in the SWIR1 and SWIR2 ranges. The change in canopy reflectance due to increasing amounts of NPV is shown in the following figure.
Dry or Senescent Carbon
Normalized Difference Lignin Index
Cellulose Absorption Index









Maximum Likelihood vs. Maximum A Posteriori (ML vs. MAP)
Maximum likelihood is a parametric estimation of the parameter μ (assuming the probability distribution is, e.g., Bernoulli: what is the mean that generates the given sequence of interest with maximum probability?)
Thus far, we have considered p(x; μ) as a function of x, parametrized by μ. If we view p(x; μ) as a function of μ, then it is called the likelihood function.
Maximum likelihood estimation basically chooses the value of μ that maximizes the likelihood function given the observed data
Maximum A Posteriori
Take μ as a random variable: p(μ|X) = p(X|μ) p(μ) / p(X)
Thus, Bayes' law converts our prior belief about the parameter μ (before seeing data) into a posterior probability, p(μ|X), by using the likelihood function p(X|μ). The maximum a posteriori (MAP) estimate is defined as the value of μ that maximizes the posterior.
Example
To take a simple example of a situation in which MAP estimation might produce better results than ML estimation, let us consider a statistician who wants to predict the outcome of the next election in the USA. The statistician is able to gather data on party preferences by asking people he meets at the Wall Street Golf Club which party they plan on voting for in the next election. The statistician asks 100 people, seven of whom answer "Democrats". This can be modeled as a series of Bernoullis, just like coin tosses. In this case, the maximum likelihood estimate of the proportion of voters in the USA who will vote Democratic is μ̂_ML = 0.07. Somehow, the estimate μ̂_ML = 0.07 doesn't seem quite right, given our previous experience that about half of the electorate votes Democratic, and half Republican. But how should the statistician incorporate this prior knowledge into his prediction for the next election? The MAP estimation procedure allows us to inject our prior beliefs about parameter values into the new estimate: posterior ∝ likelihood × prior distribution (e.g. a beta distribution).
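The polling numbers can be worked through directly. The Beta(50, 50) prior below is an illustrative assumption standing in for the "about half votes each way" belief; the prior strength is a modelling choice, not a value from the text.

```python
# polling example: k = 7 "Democrat" answers out of n = 100 Bernoulli trials
n, k = 100, 7

mu_ml = k / n                            # maximum likelihood estimate: 0.07

# Beta(a, b) prior: 50 pseudo-votes per party (an assumed, illustrative prior)
a, b = 50, 50
# the posterior is Beta(k + a, n - k + b); its mode is the MAP estimate
mu_map = (k + a - 1) / (n + a + b - 2)   # pulled toward 0.5 by the prior
```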
The beta distribution
has been applied to model the behavior of random variables
limited to intervals of finite length in a wide variety of disciplines.
For example, it has been used as a statistical description of allele frequencies in population genetics; time allocation in project management / control systems; sunshine data; variability of soil properties; proportions of minerals in rocks in stratigraphy; and heterogeneity in the probability of HIV transmission.
In Bayesian inference, the beta distribution is the conjugate prior probability distribution for the Bernoulli, binomial and geometric distributions.
For example, the beta distribution can be used in Bayesian analysis to describe initial knowledge concerning the probability of success, such as the probability that a space vehicle will successfully complete a specified mission. The beta distribution is a suitable model for the random behavior of percentages and proportions.
Principal Components:
The number of principal components is less than or equal to the number of original variables.
An orthogonal linear transformation that transforms the data to a new coordinate system such that the greatest variance by some projection of the data comes to lie on the first coordinate (called the first principal component), the second greatest variance on the second coordinate, and so on
PCA can be done by eigenvalue decomposition of a data covariance (or correlation) matrix, or by singular value decomposition of a data matrix, usually after mean centering (and normalizing or using Z-scores) the data matrix for each attribute.
The results of a PCA are usually discussed in terms of component scores, sometimes called factor scores (the transformed variable values corresponding to a particular data point), and loadings (the weight by which each standardized original variable should be multiplied to get the component score).
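A minimal PCA-via-SVD sketch showing the scores, loadings, and eigenvalues described above; the data matrix is synthetic, generated with assumed anisotropic variances.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ np.diag([5.0, 1.0, 0.1])   # synthetic data

Xc = X - X.mean(axis=0)                  # mean-centre each attribute
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

loadings = Vt                            # rows: principal directions
scores = Xc @ Vt.T                       # component scores per data point
explained_var = s**2 / (len(X) - 1)      # eigenvalues of the covariance matrix
```

The singular values come out in decreasing order, so the first coordinate of the transformed data carries the greatest variance, the second the next greatest, and so on.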




2003 
A Survey of Spectral Unmixing Algorithms 
220 
Lincoln Laboratory Journal 
Nirmal Keshava 
endmembers, and a set of corresponding fractions, or abundances, that indicate the proportion of each endmember present in the pixel
Algorithm taxonomy
Statistical:
statistical modeling often fails to reflect the high degree of physical detail that guarantees precision and physically plausible answers for individual pixels.
Nonstatistical (e.g. geometrical/physical):
introduce the aggregate behavior of a larger population of data into the processing of an individual pixel, but do so having no knowledge of the probabilistic nature of the data. Becomes important in target detection, where statistical characterizations of nontarget behavior (background or clutter) can complicate the detection of low-probability targets.
Parametric:
assumes the received data originate from a parameterized probability density function, e.g. whenever algorithms incorporate Gaussian probability density functions in their derivation (maximum likelihood or maximum a posteriori solutions). Note that a statistical algorithm is not always parametric.
Nonparametric:
optimize different cost functions instead, e.g. minimization of squared error. Algorithms are deemed optimal if they optimize an objective function; the choice of the objective function is key.
Linear mixing model (LMM): each observed pixel spectrum is modeled as a linear combination of endmember spectra weighted by their fractional abundances, plus noise.
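A minimal numpy sketch of the LMM and its unconstrained least-squares inversion; the endmember spectra below are invented for illustration (e.g. a soil-like and a vegetation-like reflectance), not taken from any dataset in these notes:

```python
import numpy as np

# Hypothetical endmember matrix E: 4 bands x 2 endmembers.
E = np.array([[0.30, 0.05],
              [0.35, 0.08],
              [0.40, 0.45],
              [0.45, 0.50]])

true_abundances = np.array([0.6, 0.4])  # fractions summing to one
x = E @ true_abundances                 # noise-free mixed pixel: x = E a

# Unconstrained least-squares inversion: a_hat = argmin ||x - E a||^2
a_hat, *_ = np.linalg.lstsq(E, x, rcond=None)
print(a_hat)  # recovers the true abundances in the noise-free case
```

Practical unmixing usually adds sum-to-one and non-negativity constraints on the abundances; this sketch shows only the plain least-squares step.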
Stages of Unmixing
1. Dimension reduction: ideally designed with consideration of the performance of the unmixing procedures operating in the lower dimension. The three algorithms below do not presume any probability density function for the data, so they are all nonparametric. Examples:
Principal-component analysis (PCA): orthogonal axes; the magnitude of an eigenvalue indicates the energy residing in the data along the component of the data parallel to the associated eigenvector.
Maximum noise fraction (MNF): nonorthogonal axes, ordered by decreasing SNR. As in PCA, the ordering of components can estimate one type of effective signal dimensionality, and the set of random variables obtained after the MNF transform can be truncated to retain only those components possessing a minimum SNR.
ORASIS: nonstatistical, nonparametric, orthogonal axes, optimizes squared error. Dimension reduction is achieved by identifying a subset of representative, or exemplar, pixels that convey the variability in a scene. If a new pixel is sufficiently different from each of the existing exemplars, it is added to the exemplar set. An orthogonal basis is periodically created from the current set of exemplars by using a modified Gram-Schmidt process, which adds new dimensions until every exemplar can be approximated within a prescribed tolerance.
2. Endmember determination: extracting spectra that are physically meaningful. Nonstatistical algorithms essentially assume the endmembers are deterministic quantities, whereas statistical approaches view endmembers as either deterministic, with an associated degree of uncertainty, or as fully stochastic, with random variables having probability density functions. Examples: fuzzy k-means; maximum likelihood (optimize the posterior density function).
3. Inversion: given the endmembers, solve for the fractional abundances in each pixel.

viper_mesma (IDL source header):
; NAME: viper_mesma
; AUTHOR: Kerry Halligan halligan@vipertools.org www.vipertools.org
;
; PURPOSE:
; This routine performs Multiple Endmember Spectral Mixture Analysis (MESMA) on an input
; image or images using endmembers contained in one or more ENVI spectral libraries. It
; allows simple SMA or MESMA, photometric shade or non-photometric shade, and a range of
; constraints and outputs.
;
; CATEGORY:
; MESMA
; SMA
;
; INPUTS:
; Reflectance image
; Input image can be in any format (byte, integer, unsigned integer, floating point) or
; interleave (BIP, BIL, BSQ). Integer or unsigned integer data should be in reflectance
; times 10,000 (0 - 10,000). If data are in byte format, they should be in reflectance
; times 250 (0 - 250). If data are in floating point they should be in reflectance (0 - 1).
; Utilizes ENVI's bad bands list (if present) to spectrally subset both image and spectral
; library.
; Spectral libraries
; Up to 3 spectral libraries allowed.
; Need to be the same number of bands and same data
; type (e.g. floating point, integer, etc.) as the image. Libraries should not contain shade.
; Most common libraries would be 1) all spectra (used for 2-em mode), 2) vegetation (used
; for 3-em and 4-em mode), 3) npv + soil (used for the 3-em case), 4) npv (used for the 4-em
; case) and 5) soil (used for the 4-em case). Note that all combinations of spectra are used,
; so a 4-em run with 10 green veg spectra, 5 npv and 6 soil spectra runs a total of 300 models.
;
; OPTIONAL INPUTS:
; None
;
; KEYWORD PARAMETERS:
; None
;
; OUTPUTS:
; Minimum RMS image: non-shade and shade fractions (first bands) plus the RMS and model
; number of the minimum RMS model.
; Classification image: classified image with the minimum RMS model for each pixel.
;
; OPTIONAL OUTPUTS:
; None
;
; PROCEDURE:
; The procedure builds a lookup table for the endmembers for each model, reads in all
; spectral libraries, then begins a line by line loop.
; For each image line it reads in the data, and then loops through each model. For each
; model the spectra are selected and used to build an endmember array, which is passed to
; viper_mesma_fracCalc. fraccalc returns the fractional abundances of all non-shade and
; shade endmembers, the model RMSE and optionally the residuals. If the current model
; produced a lower RMSE for any given pixel, the selected fraction, RMSE, and residual
; constraints are tested. All pixels that meet the constraints are considered valid. All
; pixels that produce a lower RMSE than the stored best value AND are valid are updated
; with the new model number, fractions and RMSE. After all models have been run, results
; for that line of image data are saved to disk and the next line is read in.
; This procedure selects the single model for each pixel that meets all constraints AND has
; the lowest RMS error.
; If no model meets the constraints then the pixel is left with all zero values and appears
; as 'unclassified' in the output image.
; When the image is complete, the file is closed, headers are written, and a classification
; image is produced. Fractions are calculated first, using Singular Value Decomposition to
; invert the endmember matrix, then shade is calculated as 1 - sum of non-shade fractions.
; If a non-photometric shade endmember is used, this endmember is first subtracted from
; each endmember and from the image spectrum, then the fractions are calculated as above.
; REQUIRES:
; sel_from_list
; cmapply
;
; EXAMPLE:
;
; MODIFICATION HISTORY:
; 6-30-05 Written by Kerry Halligan from eMESMA.pro, with earlier versions of this code
;   dating back to 2001 and used for Master's thesis work
; 9-29-05 Kerry Halligan modified, including significant debugging effort to address a
;   range of problems and to add batch mode capabilities. Also removed some of the cmapply
;   calls which were causing an unknown error during the constraint process
; 6-30-06 Fixed problems with the drop-down list for non-photometric shade selection
; 9-6-06 Changed counters from integer to float datatype to allow for greater than
;   2^16/2 models (to accommodate Ted Eckman's fire temp mapping work)
; 11-1-06 Made various changes to fix two problems with the residual images:
;   removed r_fid capture if/when residual image is opened;
;   added a case statement for scaling residual images such that byte, integer and
;   floating point are handled explicitly, all others are treated as floating point
; 11-21-06 Kerry Halligan modified to do the following:
;   added error handler to _run routine to report error message;
;   activated the save and restore control files after major rewriting of these routines
; 12-6-06 Kerry Halligan modified to first check all input files to make sure they exist
;   before trying to load them in ENVI - this prevents crashes when files are not found.
;   Also added format statements to printf calls for filenames of image and spectral
;   libraries that were causing new lines when long filenames/paths were used.
; 12-20-06 Kerry Halligan modified to add back in the thresh_resids function that had been
;   inadvertently removed from a previous version when cleaning up code. Also fixed a bug
;   in the 'run' routine which was preventing the read of the text widgets for the max
;   RMSE and max residual values. Now it is no longer necessary to hit enter to update
;   these values.
;   Changed the output residual image to now be: same number of bands as the input image,
;   regardless of bad bands list; bands that are denoted as 'bad' in the bad bands list
;   will be zeros in the residual image; floating point data with no scale factor
;   (e.g. DN, radiance, reflectance) regardless of the scale factor of the input image.
;   Renamed fracCalc2 to viper_mesma_fraccalc; renamed thresh_resids to
;   viper_mesma_thresh_resids
; 12-21-06 Kerry Halligan modified to fix output residuals. Bug had been resulting in the
;   output residuals being from just the last model, not the best model.
; 1-1-07 Kerry Halligan made the following changes:
;   renamed to viper_mesma from eMESMA_viper_batch to maintain consistency;
;   fixed a bug in residual calculation that was causing a crash when output residuals
;   were not selected;
;   made significant modifications to the handling of input datasets with regards to bad
;   bands lists - copied code from the recently updated CRES module. Now uses bad bands
;   common to all spectral libraries and the input image, warns the user if bad bands
;   differ between spectral libraries and image, and asks the user if a new bad bands
;   list should be output as an ASCII file. This file can be imported in ENVI.
; 1-7-07 Kerry Halligan fixed a bug in the restore control file routine
;   (viper_mesma_restore) which was causing the index for the scale factor (0-3) to be
;   stored rather than the scale factor itself (0, 1, 1000, 10000)
; 2-27-07 Kerry Halligan fixed a bug in the residual constraint. Dropped code which
;   automatically set the number of bands to the number of good bands, and changed it to
;   only check whether the number of good bands is less than the number of residual bands
;   IF the residual constraint is used.
; 7-3-07 Kerry Halligan made a minor change to default file names in the 'browse' and
;   'save' routines to speed selection of input and output filenames.
;
;+
; This function is called by the MESMA code and processes a single image line.
; It gets passed the endmember array, which is n x m where n is the number of endmembers
; and m is the number of bands per spectrum. The resids keyword can be set to a named
; variable that will contain the residuals for all bands for all spectra in the input
; image_line. viper_mesma_fraccalc makes several improvements over the previously used
; fracCalc, including much reduced looping.
; Returns a structure with the following tags: frac, shade, rmse.
; Frac is an n x m array where n is the number of non-shade endmembers and m is the number
; of pixels in the input image line; the values are the fractional abundance of each
; endmember for each pixel, in reflectance units. Shade and RMSE are m-element vectors
; where m is the number of pixels in the input image line; the values are the fractional
; abundance of shade and the RMSE in reflectance units.
; See the main viper_mesma routine for an example of using viper_mesma_fraccalc.
function viper_mesma_fraccalc, em_array, image_line, resids=resids
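A rough numpy analogue of what the description above says fraccalc computes (pseudoinverse of the endmember matrix, which numpy implements via SVD; shade as one minus the sum of non-shade fractions; per-pixel RMSE). This is a hedged sketch with made-up spectra, not the actual IDL routine:

```python
import numpy as np

def fraccalc(em_array, image_line):
    """Sketch of a MESMA fraction calculation for one image line.

    em_array: (bands, n_em) non-shade endmember spectra.
    image_line: (bands, n_pixels) reflectance for one image line.
    Returns frac (n_em, n_pixels), shade and rmse (n_pixels,).
    """
    frac = np.linalg.pinv(em_array) @ image_line  # SVD-based inversion
    shade = 1.0 - frac.sum(axis=0)                # photometric shade fraction
    resid = image_line - em_array @ frac          # per-band residuals
    rmse = np.sqrt((resid ** 2).mean(axis=0))     # model fit per pixel
    return frac, shade, rmse

# Two synthetic pixels mixed from two invented endmembers (4 bands).
em = np.array([[0.30, 0.05],
               [0.35, 0.08],
               [0.40, 0.45],
               [0.45, 0.50]])
line = em @ np.array([[0.5, 0.2],
                      [0.3, 0.6]])
frac, shade, rmse = fraccalc(em, line)
```

In real MESMA this calculation is repeated for every endmember combination (model), and the model with the lowest RMSE that passes the fraction/RMSE/residual constraints is kept per pixel, as the procedure section describes.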
http://cs.stanford.edu/people/karpathy/
From UofT: http://arxiv.org/pdf/1411.2539v1.pdf
From Baidu/UCLA: http://arxiv.org/pdf/1410.1090v1.pdf
From Google: http://googleresearch.blogspot.com/2014/11/apictureiswort...
From Stanford (our work): http://cs.stanford.edu/people/karpathy/deepimagesent/
From Berkeley: http://arxiv.org/pdf/1411.4389v1.pdf
Amir, Stanford: label generation for images. This is going to fit into DeepDive to be able to understand what's inside an image. They tried it inside a DB, but that didn't work out, so they do it in C++. They tried GPU, but Chris Ré believes there is a trade-off: some parts are better done on CPU, some on GPU; they have looked at it, but for now they are sticking with CPU. Their performance is 10x better than others in benchmarks (speed? precision?). The actual knowledge base of DeepDive is trained and does knowledge extraction using their KBP system; they won the contest by such a large margin that KBP was removed from the task.