X-ray-based Detection
Under construction...
Explosive Detection and Classification in X-ray Machines
Support Vector Machines:
The SVM classifier finds a hyperplane which separates two-class data with the maximal margin;
It is a statistical learning method that realizes structural risk minimization rather than empirical risk minimization alone, taking the VC dimension (structural complexity) into account;
For given observations X and corresponding labels Y taking values +/-1, one finds a classification function f(x) = sign(w^T x + b), where w and b are the parameters of the hyperplane;
The margin is defined as the distance from the separating hyperplane to the closest training point; its maximization can be formulated as a constrained optimization problem solved with Lagrange multipliers;
The problem can be transformed into a dual formulation in terms of those Lagrange multipliers;
The support vectors are those feature vectors lying nearest to the separating hyperplane, whose Lagrange multipliers are greater than zero;
Data sets are not always linearly separable; the SVM takes two approaches to cope with this problem:
Firstly, it introduces an error-weighting constant C which penalizes misclassification of samples in proportion to their distance from the classification boundary (slack variables); this is called the "soft margin";
Secondly, a mapping F is made from the original data space of X to another feature space; this second feature space may have a high or even infinite dimension; one advantage of the SVM is that it can be formulated entirely in terms of scalar products in the second feature space by introducing the kernel K(u, v) = F(u)·F(v), known as the "kernel trick".
Both the kernel K and penalty C are problem dependent and need to be determined by the user.
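As a concrete illustration, here is a minimal sketch of a soft-margin SVM with an RBF kernel; the use of scikit-learn and the synthetic two-class data are assumptions for illustration only:

    import numpy as np
    from sklearn.svm import SVC

    # Synthetic two-class data (illustrative only): two Gaussian blobs.
    rng = np.random.RandomState(0)
    X = np.vstack([rng.randn(100, 2) + [2, 2], rng.randn(100, 2) - [2, 2]])
    y = np.hstack([np.ones(100), -np.ones(100)])  # labels in {+1, -1}

    # Soft-margin SVM with RBF kernel K(u, v) = exp(-gamma * ||u - v||^2);
    # C is the misclassification penalty, gamma the kernel width parameter.
    clf = SVC(C=1.0, kernel="rbf", gamma=0.5)
    clf.fit(X, y)

    # The support vectors are the training points with nonzero multipliers.
    print("number of support vectors:", len(clf.support_vectors_))
    print("prediction for (1.5, 1.5):", clf.predict([[1.5, 1.5]]))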
Support Vector Machines for Multi-Class Problems:
To apply the SVM to multi-class problems, take the one-against-all approach, where each subproblem discriminates a given class from all the other classes; (Note: alternatives include the all-against-all method, which compares each class with every other class; the error-correcting output-coding method, which assigns each class a codeword; generalized coding; hierarchical classification; etc.)
Given an m-class problem, train m SVMs, each distinguishing the data of some category i from the data of all the other m-1 categories j ≠ i; given an unknown sample, assign it to the class with the largest SVM output.
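A minimal one-against-all sketch under the same assumptions (scikit-learn, synthetic three-class data): it trains one SVM per class and assigns an unknown sample to the class with the largest decision output:

    import numpy as np
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.svm import SVC

    # Synthetic 3-class data (illustrative only): three shifted blobs.
    rng = np.random.RandomState(1)
    X = np.vstack([rng.randn(50, 2) + c for c in ([0, 4], [4, 0], [-4, -4])])
    y = np.repeat([0, 1, 2], 50)

    # One-against-all: one SVM per class, separating class i from the rest.
    ovr = OneVsRestClassifier(SVC(kernel="linear", C=1.0))
    ovr.fit(X, y)

    # decision_function yields one score per class; the largest output wins.
    print("per-class SVM outputs:", ovr.decision_function([[3.5, 0.5]]))
    print("assigned class:", ovr.predict([[3.5, 0.5]]))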
Performance Optimization in Support Vector Machines:
For the SVM there are several parameters to choose; cross validation provides a procedure to find the best ones; in v-fold cross validation, the training data are divided into v subsets of equal size (the special case where v equals the number of samples is called "leave-one-out" cross validation); sequentially, each subset is tested using the classifier trained on the remaining v-1 subsets;
The cross-validation accuracy is the percentage of data that are correctly classified; the main goal of cross validation is to prevent overfitting (i.e., learning irrelevant details of the data or its noise); overfitting implies poor generalization when classifying new data;
Additionally, sensitivity to noise and computational complexity may increase with the dimension of the feature space, a problem known as the curse of dimensionality; grid search is a straightforward, if somewhat naive, way to search the parameter space with cross validation;
The settings for an SVM classifier are the penalty parameter C and, for a nonlinear kernel, its width sigma; if C is set too large, few samples are permitted to fall inside the margin and overfitting may result; conversely, underfitting may occur if C is set too small.
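A grid-search sketch with v = 5 folds (scikit-learn on synthetic data; the grids over C and gamma are arbitrary placeholders, not recommended values):

    import numpy as np
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    # Synthetic two-class data (illustrative only).
    rng = np.random.RandomState(2)
    X = np.vstack([rng.randn(100, 2) + [1, 1], rng.randn(100, 2) - [1, 1]])
    y = np.hstack([np.ones(100), -np.ones(100)])

    # Naive grid over penalty C and RBF kernel width gamma; each setting is
    # scored by 5-fold cross-validation accuracy, and the best one is kept.
    grid = GridSearchCV(
        SVC(kernel="rbf"),
        param_grid={"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1]},
        cv=5,
    )
    grid.fit(X, y)
    print("best parameters:", grid.best_params_)
    print("cross-validation accuracy:", grid.best_score_)

In practice the grid is often refined around the best coarse setting to save computation.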
Decision Tree:
The tree models can be either regression trees or classification trees;
There is a popular ensemble of decision trees called the random forest;
Decision trees are efficient at processing large amounts of training data;
To avoid overfitting in a decision tree, either pre-pruning or post-pruning is used;
A decision tree works like a flowchart, which makes it easier to understand the underlying nature of the analyzed data; it is easy to see how some initial variable divides the data into two categories and then other variables split the resulting child groups;
Classification trees are well suited to modeling binary variables; they can also model variables with multiple values and handle variable interactions.
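A brief sketch of a pre-pruned classification tree and a random-forest ensemble (scikit-learn on synthetic data; max_depth serves as the pre-pruning control):

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier, export_text
    from sklearn.ensemble import RandomForestClassifier

    # Synthetic data (illustrative only): the target depends on f0 + f1.
    rng = np.random.RandomState(3)
    X = rng.randn(200, 4)
    y = (X[:, 0] + X[:, 1] > 0).astype(int)

    # Pre-pruning via max_depth keeps the flowchart small and readable.
    tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
    print(export_text(tree, feature_names=["f0", "f1", "f2", "f3"]))

    # Random forest: an ensemble of trees, trading readability for accuracy.
    forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
    print("training accuracy of the forest:", forest.score(X, y))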
Naive Bayes Classifier:
Naive Bayes is a simple classifier used often in text categorization.
It can be viewed as the maximum a posteriori (MAP) probability classifier for a generative model (by contrast, logistic regression is a discriminative model) in which:
1) a document category is selected according to class prior probabilities;
2) each word in the document is drawn independently from a multinomial distribution over words specific to that class.
While independence is a naive assumption, the accuracy of Naive Bayes classification is typically high;
In applying Naive Bayes classifiers, the "zero frequency" problem is solved by a smoothing technique such as Laplacian estimation.
In training a Naive Bayes classifier, the task is to estimate the class prior probabilities and the probabilities of the data given each class;
The settings to choose are the degree of smoothing and the number of bins to use when discretizing continuous features.
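A sketch of a multinomial Naive Bayes text classifier with Laplacian (add-one) smoothing, using scikit-learn; the tiny corpus below is invented purely for illustration:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB

    # Invented toy corpus: two classes of short "documents".
    docs = ["detonator wire charge", "charge explosive wire",
            "shoe clothing bottle", "bottle laptop clothing"]
    labels = [1, 1, 0, 0]

    vec = CountVectorizer()
    X = vec.fit_transform(docs)  # word-count features

    # alpha=1.0 is Laplacian (add-one) smoothing: it avoids zero-frequency
    # probabilities for words unseen in a class during training.
    nb = MultinomialNB(alpha=1.0).fit(X, labels)
    print("predicted class:", nb.predict(vec.transform(["wire bottle charge"])))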
Logistic Regression:
Logistic regression, as a discriminative classifier, directly estimates the probability of the class given the data, whereas Naive Bayes, as a generative classifier, models the probability of the data given each class and derives the posterior from it;
When a large amount of training data is available, logistic regression performs better; with limited data, Naive Bayes outperforms it;
If the conditional independence assumption actually holds, a Naive Bayes classifier converges more quickly and therefore needs much less training data; even if the assumption does not hold, a Naive Bayes classifier still often performs surprisingly well in practice;
Logistic regression is suggested if a probabilistic framework is wanted (e.g., to easily adjust classification thresholds) or if more training data are expected in the future and need to be incorporated into the classifier quickly;
Logistic regression does not easily handle categorical variables, nor is it good at modeling interactions between variables;
Like the SVM, logistic regression can also apply regularization and the kernel trick;
The SVM requires fewer variables to achieve an equivalent misclassification rate;
The SVM's loss function is different (the hinge loss), related to the maximal-margin theory.
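A sketch of regularized logistic regression with its probabilistic outputs (scikit-learn on synthetic data; note that in this library's API, C is the inverse regularization strength, and the 0.8 alarm threshold is an arbitrary example):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Synthetic two-class data (illustrative only).
    rng = np.random.RandomState(4)
    X = np.vstack([rng.randn(100, 2) + [1, 1], rng.randn(100, 2) - [1, 1]])
    y = np.hstack([np.ones(100), np.zeros(100)])

    # L2-regularized logistic regression; C is the inverse regularization
    # strength in scikit-learn's parameterization.
    lr = LogisticRegression(C=1.0, penalty="l2").fit(X, y)

    # Posterior probabilities allow a custom decision threshold, e.g. 0.8,
    # to trade false alarms against missed detections.
    proba = lr.predict_proba([[0.5, 0.5]])[:, 1]
    print("P(class=1):", proba, "-> alarm:", proba > 0.8)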
Neural Network (Multi-Layer Perceptron):
A neural network is more of a "black box"; it is very hard to know how it makes its classification decisions;
For a decision made by an NN, it is very difficult to explain and justify to non-technical people how the decision was reached;
Binary categorical input for NNs can be handled by using 0/1 (off/on) inputs, but categorical variables with multiple classes are awkward to handle;
If the goal is to produce a program that can be distributed with a built-in predictive model, it is usually necessary to ship an additional module or library just to evaluate the NN.
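For the multi-class categorical inputs mentioned above, a common workaround is one-hot (0/1) encoding; a brief sketch with a small MLP, where the material categories and labels are invented:

    import numpy as np
    from sklearn.preprocessing import OneHotEncoder
    from sklearn.neural_network import MLPClassifier

    # An invented categorical variable with multiple values (material type).
    material = np.array([["organic"], ["metal"], ["ceramic"], ["metal"],
                         ["organic"], ["ceramic"], ["metal"], ["organic"]])
    y = np.array([1, 0, 0, 0, 1, 0, 0, 1])  # invented labels

    # One-hot encoding turns each category into its own 0/1 input unit.
    enc = OneHotEncoder()
    X = enc.fit_transform(material).toarray()

    mlp = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
    mlp.fit(X, y)
    print("prediction for 'organic':",
          mlp.predict(enc.transform([["organic"]]).toarray()))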
X-ray Spectra:
X-rays are a form of electromagnetic radiation;
Photon energies: 120 eV ~ 1.2 MeV (electronvolts);
Wavelength range: 0.001 nm ~ 10 nm (nanometers);
Visible light: 400-700 nm.
Radar: 10^5 nm ~ 10^9 nm (microwave).
Fig. 1, X-ray wavelength range within the electromagnetic spectrum
X-ray Generator:
Discovered by Roentgen in 1895;
Main components:
A source of charged particles (electrons), i.e. a filament used as the cathode;
An accelerating path (circuit voltage: 100 V ~ 100 kV);
A solid target (anode), typically tungsten, to stop them.
Bremsstrahlung effect (continuous spectrum);
Line spectra (from ionization), e.g. the K-edge;
The X-ray tube is a popularly used generator;
Fig. 2, Schematic description of an X-ray tube
Transmission, Absorption and Scattering of X-rays:
Photoelectric effect (absorption): 1 ~ 100 keV;
Scattering (back/forward scattering):
Coherent Scattering (without energy change);
Compton scattering (with energy change);
Pair production (positron-electron pair): > 1 MeV;
Not considered in Explosive Detection System (EDS).
Transmission: neither absorbed nor scattered;
X-ray range used for EDS: 10 keV ~ 150 keV;
X-ray Detectors:
X-ray film (photographic effect);
Channel electron multipliers;
Gas detectors (gaseous ions);
Silicon detectors: scatter data (energy-dispersive analysis);
Scintillation detectors: transmission data.
Scintillation Detector:
Absorption of the incident radiation by the scintillator;
Luminescent conversion of the energy dissipated in the scintillator;
Emission of light photons;
Impingement of the light photons on photodiodes;
Conversion of the photons into electrical currents.
Fig. 3, Schematic Illustration of Scintillation Detectors
X-ray Transmission Sensing with Dual-Energy Detectors:
Attenuation: I = I0 exp(-μd), where I0 is the incident intensity, μ the linear attenuation coefficient, and d the material thickness;
Absorption: I = I0 {1 - exp(-μd)};
Dual-energy data can avoid the beam hardening seen with single-energy detectors;
Objects of the same atomic number trace a "banana" curve in the dual-energy space as their thickness varies;
Interpolation of those banana curves builds a LUT between atomic number and dual-energy detection pairs.
Fig. 4, Dual energy (DE) transmission
(Note: There is a relationship among thickness, atomic number, and density, expressed as a linear transform of the high- and low-energy measurements.)
Fig. 5, Dual energy (DE) transmission detector
(Note: Adjust the geometry to make the overlapping shadow region as small as possible.)
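A sketch of building the atomic-number LUT by interpolating between banana curves; the calibration numbers below are invented placeholders, not real attenuation data:

    import numpy as np
    from scipy.interpolate import griddata

    # Invented calibration: (high-energy, low-energy) attenuation pairs for
    # materials of known effective atomic number at several thicknesses;
    # each group of three points sketches one "banana" curve (C, Al, Fe).
    calib_hi = np.array([0.2, 0.5, 0.9, 0.3, 0.7, 1.2, 0.4, 0.9, 1.5])
    calib_lo = np.array([0.3, 0.8, 1.4, 0.6, 1.3, 2.1, 0.9, 1.9, 3.0])
    calib_z = np.array([6, 6, 6, 13, 13, 13, 26, 26, 26])

    def atomic_number(att_hi, att_lo):
        # Interpolate between the banana curves to look up the effective
        # atomic number for a measured dual-energy attenuation pair.
        return griddata((calib_hi, calib_lo), calib_z, (att_hi, att_lo),
                        method="linear")

    print("estimated Z for (0.6, 1.0):", atomic_number(0.6, 1.0))

Outside the convex hull of the calibration points, linear interpolation returns NaN, which usefully flags measurements that the calibration does not cover.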
Threat Detection in Dual-Energy X-ray Images:
Image Segmentation by mean shift/watershed;
Thresholding and grouping segments of density images by connected component analysis.
Final classification by atomic number range to reduce false alarms.
Fig. 6, Colored dual-energy X-ray images and threat detection by image processing and the atomic-number LUT.
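A sketch of the thresholding and connected-component grouping step (scipy.ndimage); the density image, atomic-number map, and the 6-10 "organic" Z window are all illustrative assumptions:

    import numpy as np
    from scipy import ndimage

    # Placeholder images: a random "density" image and an atomic-number map
    # as would come from the dual-energy LUT (both invented).
    rng = np.random.RandomState(5)
    density = rng.rand(64, 64)
    z_map = 5.0 + 10.0 * rng.rand(64, 64)

    # Threshold the density image, then group pixels into components.
    mask = density > 0.7
    components, n = ndimage.label(mask)

    # Keep only sizable components whose mean atomic number falls in a
    # threat range; the 6-10 "organic" window is an assumption here.
    for i in range(1, n + 1):
        region = components == i
        if region.sum() > 50 and 6.0 <= z_map[region].mean() <= 10.0:
            print("potential threat: component", i,
                  "of", region.sum(), "pixels")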
Substance Identification Based on Small-Angle X-ray Diffraction:
X-ray Coherent Scattered Spectrum from Small Angle Diffraction;
Normalize the scatter spectrum with the transmission spectrum;
Compensation of partial volume effect;
Train a substance classifier with those spectra as the signature;
Identify the substance with the trained classifier;
Candidate classifiers include the SVM, decision tree, Naive Bayes, logistic regression, and NN.
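A sketch of the identification chain with random stand-ins for the measured spectra: normalize each scatter spectrum by the transmission spectrum, then train one of the classifiers above on the resulting signatures:

    import numpy as np
    from sklearn.svm import SVC

    # Random stand-ins for measured spectra (illustrative only):
    # 120 samples, 64 energy channels, 3 substance classes.
    rng = np.random.RandomState(6)
    scatter = rng.rand(120, 64)
    transmission = 0.5 + rng.rand(120, 64)
    labels = rng.randint(0, 3, 120)

    # Normalize the scatter spectrum with the transmission spectrum to
    # correct for attenuation; the result is the substance signature.
    signature = scatter / transmission

    clf = SVC(kernel="rbf", C=10.0).fit(signature, labels)
    print("training accuracy:", clf.score(signature, labels))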
Fig. 7, A typical Silicon detector
(Note: Incident X-rays cause ionization in the Si; the resulting characteristic X-rays may escape or be re-absorbed, creating electron-hole pairs in the intrinsic region of the semiconductor; these charge carriers then migrate to the electrodes under the influence of an applied bias voltage.)
Fig. 8, A Substance Identification Detector Array
(Note: The inspection region is a thin ring whose volume has a "diamond"-shaped cross section. The dual-energy transmission spectrum from the X-ray "ring detector" is used to "normalize" the measured scatter spectrum from the X-ray energy-dispersive detector.)