X-ray-based Detection
Under construction...
Explosive Detection and Classification in X-ray Machines
Support Vector Machines:
The SVM classifier finds a hyperplane which separates two-class data with the maximal margin;
It is a statistical learning method that realizes structural risk minimization rather than empirical risk minimization alone, taking the VC dimension (structural complexity) into account;
For given observations X and corresponding labels Y taking values +/-1, one finds a classification function f(x) = sign(w^T x + b), where w and b are the parameters of the hyperplane;
The margin is defined as the distance from the separating hyperplane to the closest training point; its maximization can be formulated as a constrained optimization problem solved with Lagrange multipliers;
The problem can be transformed into a dual formulation in terms of those Lagrange multipliers;
The support vectors are those feature vectors lying nearest to the separating hyperplane, whose Lagrange multipliers are greater than zero;
Data sets are not always linearly separable; the SVM takes two approaches to cope with this problem:
Firstly, it introduces an error-weighting constant C which penalizes misclassification of samples in proportion to their distance from the classification boundary (slack variables); this is called the "soft margin";
Secondly, a mapping F is made from the original data space of X to another feature space; this second feature space may have a high or even infinite dimension; one advantage of the SVM is that it can be formulated entirely in terms of scalar products in the second feature space by introducing the kernel K(u, v) = F(u)·F(v), known as the "kernel trick".
Both the kernel K and penalty C are problem dependent and need to be determined by the user.
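As a concrete illustration, here is a minimal sketch of a soft-margin SVM with an RBF kernel; the use of scikit-learn and the synthetic two-class data are assumptions for illustration only:

    import numpy as np
    from sklearn.svm import SVC

    # Synthetic two-class data (illustrative only): two Gaussian blobs.
    rng = np.random.RandomState(0)
    X = np.vstack([rng.randn(100, 2) + [2, 2], rng.randn(100, 2) - [2, 2]])
    y = np.hstack([np.ones(100), -np.ones(100)])  # labels in {+1, -1}

    # Soft-margin SVM with RBF kernel K(u, v) = exp(-gamma * ||u - v||^2);
    # C is the misclassification penalty, gamma the kernel width parameter.
    clf = SVC(C=1.0, kernel="rbf", gamma=0.5)
    clf.fit(X, y)

    # The support vectors are the training points with nonzero multipliers.
    print("number of support vectors:", len(clf.support_vectors_))
    print("prediction for (1.5, 1.5):", clf.predict([[1.5, 1.5]]))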
Support Vector Machines for Multi-Class Problems:
To apply the SVM to multi-class problems, take the one-against-all approach, where each subproblem discriminates a given class from all the other classes; (Note: alternatives include the all-against-all method, which compares each class with every other class; the error-correcting output-coding method, which assigns each class a codeword; generalized coding; hierarchical classification; etc.)
Given an m-class problem, train m SVMs, each distinguishing the data of some category i from the data of all the other m-1 categories j ≠ i; given an unknown sample, assign it to the class with the largest SVM output.
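A minimal one-against-all sketch under the same assumptions (scikit-learn, synthetic three-class data): it trains one SVM per class and assigns an unknown sample to the class with the largest decision output:

    import numpy as np
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.svm import SVC

    # Synthetic 3-class data (illustrative only): three shifted blobs.
    rng = np.random.RandomState(1)
    X = np.vstack([rng.randn(50, 2) + c for c in ([0, 4], [4, 0], [-4, -4])])
    y = np.repeat([0, 1, 2], 50)

    # One-against-all: one SVM per class, separating class i from the rest.
    ovr = OneVsRestClassifier(SVC(kernel="linear", C=1.0))
    ovr.fit(X, y)

    # decision_function yields one score per class; the largest output wins.
    print("per-class SVM outputs:", ovr.decision_function([[3.5, 0.5]]))
    print("assigned class:", ovr.predict([[3.5, 0.5]]))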
Performance Optimization in Support Vector Machines:
For the SVM there are several parameters to choose; cross validation provides a procedure to find the best ones; in v-fold cross validation, the training data are divided into v subsets of equal size (the special case where v equals the number of samples is called "leave-one-out" cross validation); sequentially, each subset is tested using the classifier trained on the remaining v-1 subsets;
The cross-validation accuracy is the percentage of data that are correctly classified; the main goal of cross validation is to prevent overfitting (i.e., learning irrelevant details of the data or its noise); overfitting implies poor generalization when classifying new data;
Additionally, sensitivity to noise and computational complexity may increase with the dimension of the feature space, a problem known as the curse of dimensionality; grid search is a straightforward, if somewhat naive, way to search the parameter space with cross validation;
The settings for an SVM classifier are the penalty parameter C and, for a nonlinear kernel, its width sigma; if C is set too large, few samples are permitted to fall inside the margin and overfitting may result; conversely, underfitting may occur if C is set too small.
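A grid-search sketch with v = 5 folds (scikit-learn on synthetic data; the grids over C and gamma are arbitrary placeholders, not recommended values):

    import numpy as np
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    # Synthetic two-class data (illustrative only).
    rng = np.random.RandomState(2)
    X = np.vstack([rng.randn(100, 2) + [1, 1], rng.randn(100, 2) - [1, 1]])
    y = np.hstack([np.ones(100), -np.ones(100)])

    # Naive grid over penalty C and RBF kernel width gamma; each setting is
    # scored by 5-fold cross-validation accuracy, and the best one is kept.
    grid = GridSearchCV(
        SVC(kernel="rbf"),
        param_grid={"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1]},
        cv=5,
    )
    grid.fit(X, y)
    print("best parameters:", grid.best_params_)
    print("cross-validation accuracy:", grid.best_score_)

In practice the grid is often refined around the best coarse setting to save computation.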
Decision Tree:
The tree models can be either regression trees or classification trees;
There is a popular ensemble of decision trees called the random forest;
Decision trees are efficient at processing large amounts of training data;
To avoid overfitting in a decision tree, either pre-pruning or post-pruning is used;
A decision tree works like a flowchart, which makes it easier to understand the underlying nature of the analyzed data; it is easy to see how some initial variable divides the data into two categories and then other variables split the resulting child groups;
Classification trees are well suited to modeling binary variables; they can also model variables with multiple values and handle variable interactions.
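A brief sketch of a pre-pruned classification tree and a random-forest ensemble (scikit-learn on synthetic data; max_depth serves as the pre-pruning control):

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier, export_text
    from sklearn.ensemble import RandomForestClassifier

    # Synthetic data (illustrative only): the target depends on f0 + f1.
    rng = np.random.RandomState(3)
    X = rng.randn(200, 4)
    y = (X[:, 0] + X[:, 1] > 0).astype(int)

    # Pre-pruning via max_depth keeps the flowchart small and readable.
    tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
    print(export_text(tree, feature_names=["f0", "f1", "f2", "f3"]))

    # Random forest: an ensemble of trees, trading readability for accuracy.
    forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
    print("training accuracy of the forest:", forest.score(X, y))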
Naive Bayes Classifier:
Naive Bayes is a simple classifier used often in text categorization.
It can be viewed as the maximum a posteriori (MAP) probability classifier for a generative model (by contrast, logistic regression is a discriminative model) in which:
1) a document category is selected according to class prior probabilities;
2) each word in the document is drawn independently from a multinomial distribution over words specific to that class.
While independence is a naive assumption, the accuracy of Naive Bayes classification is typically high;
In applying Naive Bayes classifiers, the "zero frequency" problem is solved by a smoothing technique such as Laplacian estimation.
In training a Naive Bayes classifier, the task is to estimate the class prior probabilities and the probabilities of the data given each class;
The settings to choose are the degree of smoothing and the number of bins to use when discretizing continuous features.
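A sketch of a multinomial Naive Bayes text classifier with Laplacian (add-one) smoothing, using scikit-learn; the tiny corpus below is invented purely for illustration:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB

    # Invented toy corpus: two classes of short "documents".
    docs = ["detonator wire charge", "charge explosive wire",
            "shoe clothing bottle", "bottle laptop clothing"]
    labels = [1, 1, 0, 0]

    vec = CountVectorizer()
    X = vec.fit_transform(docs)  # word-count features

    # alpha=1.0 is Laplacian (add-one) smoothing: it avoids zero-frequency
    # probabilities for words unseen in a class during training.
    nb = MultinomialNB(alpha=1.0).fit(X, labels)
    print("predicted class:", nb.predict(vec.transform(["wire bottle charge"])))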
Logistic Regression:
Logistic regression, as a discriminative classifier, directly estimates the probability of the class given the data, whereas Naive Bayes, as a generative classifier, models the probability of the data given each class and derives the posterior from it;
When a large amount of training data is available, logistic regression performs better; with limited data, Naive Bayes outperforms it;
If the conditional independence assumption actually holds, a Naive Bayes classifier converges more quickly and therefore needs much less training data; even if the assumption does not hold, a Naive Bayes classifier still often performs surprisingly well in practice;
Logistic regression is suggested if a probabilistic framework is wanted (e.g., to easily adjust classification thresholds) or if more training data are expected in the future and need to be incorporated into the classifier quickly;
Logistic regression does not easily handle categorical variables, nor is it good at modeling interactions between variables;
Like the SVM, logistic regression can also apply regularization and the kernel trick;
The SVM requires fewer variables to achieve an equivalent misclassification rate;
The SVM's loss function is different (the hinge loss), related to the maximal-margin theory.
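A sketch of regularized logistic regression with its probabilistic outputs (scikit-learn on synthetic data; note that in this library's API, C is the inverse regularization strength, and the 0.8 alarm threshold is an arbitrary example):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Synthetic two-class data (illustrative only).
    rng = np.random.RandomState(4)
    X = np.vstack([rng.randn(100, 2) + [1, 1], rng.randn(100, 2) - [1, 1]])
    y = np.hstack([np.ones(100), np.zeros(100)])

    # L2-regularized logistic regression; C is the inverse regularization
    # strength in scikit-learn's parameterization.
    lr = LogisticRegression(C=1.0, penalty="l2").fit(X, y)

    # Posterior probabilities allow a custom decision threshold, e.g. 0.8,
    # to trade false alarms against missed detections.
    proba = lr.predict_proba([[0.5, 0.5]])[:, 1]
    print("P(class=1):", proba, "-> alarm:", proba > 0.8)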
Neural Network (Multi-Layer Perceptron):
A neural network is more of a "black box"; it is very hard to know how it makes its classification decisions;
For a decision made by an NN, it is very difficult to explain and justify to non-technical people how the decision was reached;
Binary categorical input for NNs can be handled by using 0/1 (off/on) inputs, but categorical variables with multiple classes are awkward to handle;
If the goal is to produce a program that can be distributed with a built-in predictive model, it is usually necessary to ship an additional module or library just to evaluate the NN.
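For the multi-class categorical inputs mentioned above, a common workaround is one-hot (0/1) encoding; a brief sketch with a small MLP, where the material categories and labels are invented:

    import numpy as np
    from sklearn.preprocessing import OneHotEncoder
    from sklearn.neural_network import MLPClassifier

    # An invented categorical variable with multiple values (material type).
    material = np.array([["organic"], ["metal"], ["ceramic"], ["metal"],
                         ["organic"], ["ceramic"], ["metal"], ["organic"]])
    y = np.array([1, 0, 0, 0, 1, 0, 0, 1])  # invented labels

    # One-hot encoding turns each category into its own 0/1 input unit.
    enc = OneHotEncoder()
    X = enc.fit_transform(material).toarray()

    mlp = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
    mlp.fit(X, y)
    print("prediction for 'organic':",
          mlp.predict(enc.transform([["organic"]]).toarray()))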
X-ray Spectra:
X-rays are a form of electromagnetic radiation;
Photon energies: 120 eV ~ 1.2 MeV (electronvolts);
Wavelength range: 0.001 nm ~ 10 nm (nanometers);
Visible light: 400-700 nm.
Radar: 10^5 nm ~ 10^9 nm (microwave).
Fig. 1, X-ray wavelength range within the electromagnetic spectrum
X-ray Generator:
Discovered by Roentgen in 1895;
Main components:
A source of charged particles (electrons), i.e. a filament used as the cathode;
An accelerating path (circuit voltage: 100 V ~ 100 kV);
A solid target (anode), typically tungsten, to stop them.
Bremsstrahlung effect (continuous spectrum);
Line spectra (from ionization), e.g. the K-edge;
The X-ray tube is a popularly used generator;
Fig. 2, Schematic description of an X-ray tube
Transmission, Absorption and Scattering of X-rays:
Photoelectric effect (absorption): 1 ~ 100 keV;
Scattering (back/forward scattering):
Coherent Scattering (without energy change);
Compton scattering (with energy change);
Pair production (positron-electron pair): > 1 MeV;
Not considered in Explosive Detection System (EDS).
Transmission: neither absorbed nor scattered;
X-ray range used for EDS: 10 keV ~ 150 keV;
X-ray Detectors:
X-ray film (photographic effect);
Channel electron multipliers;
Gas detectors (gaseous ions);
Silicon detectors: scatter data (energy-dispersive analysis);
Scintillation detectors: transmission data.
Scintillation Detector:
Absorption of the incident radiation by the scintillator;
Luminescent conversion of the energy dissipated in the scintillator;
Emission of light photons;
Impingement of the light photons on photodiodes;
Conversion of the photons into electrical currents.
Fig. 3, Schematic Illustration of Scintillation Detectors
X-ray Transmission Sensing with Dual-Energy Detectors:
Attenuation: I = I0 exp(-μd), where I0 is the incident intensity, μ the linear attenuation coefficient, and d the material thickness;
Absorption: I = I0 {1 - exp(-μd)};
Dual-energy data can avoid the beam hardening seen with single-energy detectors;
Objects of the same atomic number trace a "banana" curve in the dual-energy space as their thickness varies;
Interpolation of those banana curves builds a LUT between atomic number and dual-energy detection pairs.
Fig. 4, Dual energy (DE) transmission
(Note: There is a relationship among thickness, atomic number, and density, expressed as a linear transform of the high- and low-energy measurements.)
Fig. 5, Dual energy (DE) transmission detector
(Note: Adjust the geometry to make the overlapping shadow region as small as possible.)
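A sketch of building the atomic-number LUT by interpolating between banana curves; the calibration numbers below are invented placeholders, not real attenuation data:

    import numpy as np
    from scipy.interpolate import griddata

    # Invented calibration: (high-energy, low-energy) attenuation pairs for
    # materials of known effective atomic number at several thicknesses;
    # each group of three points sketches one "banana" curve (C, Al, Fe).
    calib_hi = np.array([0.2, 0.5, 0.9, 0.3, 0.7, 1.2, 0.4, 0.9, 1.5])
    calib_lo = np.array([0.3, 0.8, 1.4, 0.6, 1.3, 2.1, 0.9, 1.9, 3.0])
    calib_z = np.array([6, 6, 6, 13, 13, 13, 26, 26, 26])

    def atomic_number(att_hi, att_lo):
        # Interpolate between the banana curves to look up the effective
        # atomic number for a measured dual-energy attenuation pair.
        return griddata((calib_hi, calib_lo), calib_z, (att_hi, att_lo),
                        method="linear")

    print("estimated Z for (0.6, 1.0):", atomic_number(0.6, 1.0))

Outside the convex hull of the calibration points, linear interpolation returns NaN, which usefully flags measurements that the calibration does not cover.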
Threat Detection in Dual-Energy X-ray Images:
Image Segmentation by mean shift/watershed;
Thresholding and grouping segments of density images by connected component analysis.
Final classification by atomic number range to reduce false alarms.
Fig. 6, Colored dual-energy X-ray images and threat detection by image processing and the atomic-number LUT.
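A sketch of the thresholding and connected-component grouping step (scipy.ndimage); the density image, atomic-number map, and the 6-10 "organic" Z window are all illustrative assumptions:

    import numpy as np
    from scipy import ndimage

    # Placeholder images: a random "density" image and an atomic-number map
    # as would come from the dual-energy LUT (both invented).
    rng = np.random.RandomState(5)
    density = rng.rand(64, 64)
    z_map = 5.0 + 10.0 * rng.rand(64, 64)

    # Threshold the density image, then group pixels into components.
    mask = density > 0.7
    components, n = ndimage.label(mask)

    # Keep only sizable components whose mean atomic number falls in a
    # threat range; the 6-10 "organic" window is an assumption here.
    for i in range(1, n + 1):
        region = components == i
        if region.sum() > 50 and 6.0 <= z_map[region].mean() <= 10.0:
            print("potential threat: component", i,
                  "of", region.sum(), "pixels")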
Substance Identification Based on Small-Angle X-ray Diffraction:
X-ray Coherent Scattered Spectrum from Small Angle Diffraction;
Normalize the scatter spectrum with the transmission spectrum;
Compensation of partial volume effect;
Train a substance classifier with those spectra as the signature;
Identify the substance with the trained classifier;
Candidate classifiers include the SVM, decision tree, Naive Bayes, logistic regression, and NN.
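A sketch of the identification chain with random stand-ins for the measured spectra: normalize each scatter spectrum by the transmission spectrum, then train one of the classifiers above on the resulting signatures:

    import numpy as np
    from sklearn.svm import SVC

    # Random stand-ins for measured spectra (illustrative only):
    # 120 samples, 64 energy channels, 3 substance classes.
    rng = np.random.RandomState(6)
    scatter = rng.rand(120, 64)
    transmission = 0.5 + rng.rand(120, 64)
    labels = rng.randint(0, 3, 120)

    # Normalize the scatter spectrum with the transmission spectrum to
    # correct for attenuation; the result is the substance signature.
    signature = scatter / transmission

    clf = SVC(kernel="rbf", C=10.0).fit(signature, labels)
    print("training accuracy:", clf.score(signature, labels))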
Fig. 7, A typical Silicon detector
(Note: Incident X-rays cause ionization in the Si; the resulting characteristic X-rays may escape or be re-absorbed, creating electron-hole pairs in the intrinsic region of the semiconductor; these charge carriers then migrate to the electrodes under the influence of an applied bias voltage.)
Fig. 8, A Substance Identification Detector Array
(Note: The inspection region is a thin ring whose volume has a "diamond"-shaped cross section. The dual-energy transmission spectrum from the X-ray "ring detector" is used to "normalize" the measured scatter spectrum from the X-ray energy-dispersive detector.)