X-ray-based Detection


Explosive Detection and Classification in X-ray Machines

Support Vector Machines:

  • The SVM classifier finds a hyperplane which separates two-class data with the maximal margin;

  • It is a statistical learning method that realizes structural risk minimization rather than empirical risk minimization alone, taking into account the VC dimension (a measure of structural complexity);

    • For given observations X and corresponding labels Y taking values +/-1, one finds a classification function f(x) = sign(w^T x + b), where w and b are the parameters of the hyperplane;

  • The margin is defined as the distance of the closest training point to the separating hyperplane and its maximization can be formulated as a constrained optimization problem solved by Lagrange multipliers;

    • This can be transformed into a dual formulation in terms of those Lagrange multipliers;

  • The support vectors are those feature vectors lying nearest to the separating hyperplane, whose Lagrange multipliers are greater than zero;

  • Data sets are not always linearly separable; the SVM takes two approaches to cope with this problem:

  • Firstly, it introduces an error-weighting constant C which penalizes misclassification of samples in proportion to their distance from the classification boundary (via slack variables), known as the "soft margin";

  • Secondly, a mapping F is made from the original data space of X to another feature space; this second feature space may have a high or even infinite dimension; one of the advantages of the SVM is that it can be formulated entirely in terms of scalar products in the second feature space by introducing the kernel K(u, v) = F(u)·F(v), known as the "kernel trick".

Both the kernel K and penalty C are problem dependent and need to be determined by the user.
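
As a concrete illustration, below is a minimal sketch of training a soft-margin SVM with an RBF kernel, assuming scikit-learn is available; the feature matrix, labels, and the chosen C and gamma values are hypothetical placeholders, not values from this work.

```python
# Minimal sketch (assumption: scikit-learn) of a soft-margin SVM with an RBF kernel;
# X, y, C and gamma below are hypothetical placeholders.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))             # hypothetical feature vectors
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # hypothetical binary labels

# C is the soft-margin penalty; gamma sets the RBF kernel K(u, v) = exp(-gamma*||u - v||^2).
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X, y)

print("support vectors per class:", clf.n_support_)
print("decision value for a new sample:", clf.decision_function(X[:1]))
```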

Support Vector Machines for Multi-Class Problems:

  • In order to apply the SVM to multi-class problems, take the one-against-all approach, where each binary problem discriminates a given class from all the other classes; (Note: alternatives include the all-against-all method that compares each class with each other class, the error-correcting output-coding method that gives each class a codeword, the generalized coding method, hierarchical classification, etc.)

  • Given an m-class problem, train m SVMs, each distinguishing data of category i from data of all the other m-1 categories j ≠ i. Given an unknown sample, assign it to the class with the largest SVM output (see the sketch below).
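
A minimal sketch of the one-against-all strategy, assuming scikit-learn's OneVsRestClassifier with a linear SVM; the three-class data here are synthetic placeholders.

```python
# Minimal sketch of one-against-all multi-class SVM classification with
# scikit-learn; the 3-class data here are synthetic placeholders.
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
y = rng.integers(0, 3, size=300)          # hypothetical labels for m = 3 classes

ovr = OneVsRestClassifier(LinearSVC())    # trains m binary SVMs, one per class
ovr.fit(X, y)

scores = ovr.decision_function(X[:1])     # one SVM output per class
print("per-class SVM outputs:", scores)
print("predicted class:", ovr.classes_[scores.argmax()])
```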

Performance Optimization in Support Vector Machines:

  • For the SVM there are several parameters to choose; cross validation provides a procedure to find the best ones. In v-fold cross validation the training data is divided into v subsets of equal size and, sequentially, each subset is tested using the classifier trained on the remaining v-1 subsets (when v equals the number of training samples this is called leave-one-out cross validation);

  • The cross-validation accuracy is the percentage of data that are correctly classified; the main goal of cross validation is to prevent overfitting (i.e., learning the irrelevant details of the data or its noise), since overfitting implies poor generalization to new data;

  • Additionally, sensitivity to noise and computational complexity may increase with the dimension of the feature space; this problem is known as the curse of dimensionality. Grid search is a straightforward, if somewhat naive, method for selecting parameters via cross validation (see the sketch after this list);

  • The settings for an SVM classifier are the penalty parameter C and the kernel width sigma (for a nonlinear kernel); when C is set too large the classifier tolerates few samples inside the margin and may overfit, whereas setting C too small permits many margin violations and may cause underfitting.
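
A minimal sketch of choosing C and gamma by 5-fold cross validation with a naive grid search, assuming scikit-learn's GridSearchCV; the parameter grid and data are hypothetical placeholders.

```python
# Minimal sketch of choosing C and gamma by 5-fold cross validation with a
# naive grid search; the grid values and data are hypothetical placeholders.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = rng.normal(size=(150, 6))
y = (X[:, 0] > 0).astype(int)

param_grid = {"C": [0.1, 1, 10, 100], "gamma": [1e-3, 1e-2, 1e-1, 1]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)   # 5-fold CV over the grid
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best cross-validation accuracy:", search.best_score_)
```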

Decision Tree:

  • Tree models can be either regression trees or classification trees;

  • There is a popular ensemble of decision trees, called random forest;

  • Decision trees are efficient at processing large amounts of training data;

  • To avoid overfitting in a decision tree, either pre-pruning or post-pruning is used (see the sketch after this list);

  • A decision tree works like a flowchart, which makes it easier to understand the underlying nature of the analyzed data; it is easy to see that some initial variable divides the data into two categories and then other variables split the resulting child groups;

  • Classification trees are well suited to modeling binary variables; they can also model variables with multiple values, and handle variable interactions.
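
A minimal sketch of a classification tree with pre-pruning (limiting depth and leaf size) in scikit-learn; the data and pruning values are hypothetical placeholders. Printing the tree exposes the flowchart-like structure mentioned above.

```python
# Minimal sketch of a classification tree with pre-pruning (max_depth,
# min_samples_leaf) to limit overfitting; data and settings are placeholders.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 3))
y = (X[:, 0] * X[:, 1] > 0).astype(int)

tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=10)  # pre-pruning
tree.fit(X, y)

# Print the flowchart-like decision rules.
print(export_text(tree, feature_names=["f0", "f1", "f2"]))
```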

Naive Bayes Classifier:

  • Naive Bayes is a simple classifier used often in text categorization.

  • It can be viewed as the maximum a posteriori (MAP) probability classifier for a generative model (in contrast, logistic regression is a discriminative model);

    • 1) a document category is selected according to class prior probabilities;

    • 2) each word in the document is drawn independently from a multinomial distribution over words specific to that class.

  • While independence is a naive assumption, the accuracy of Naive Bayes classification is typically high;

  • In applying Naive Bayes classifiers, the "zero frequency" problem is solved by a smoothing technique:

    • Laplace smoothing (add-one estimation).

  • In training a Naive Bayes classifier, the task is to estimate class prior probabilities and probabilities of the data given the class;

    • Degree of smoothing and the number of bins to use when discretizing continuous features.
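
A minimal sketch of a multinomial Naive Bayes text classifier with Laplace (add-one) smoothing, assuming scikit-learn; the tiny document set and labels are made-up placeholders.

```python
# Minimal sketch of a multinomial Naive Bayes text classifier; alpha=1.0 is
# Laplace (add-one) smoothing, which avoids the zero-frequency problem.
# The documents and labels are made-up placeholders.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

docs = ["detected nitrate traces", "harmless clothing items",
        "explosive residue found", "books and clothing"]
labels = [1, 0, 1, 0]                    # 1 = threat, 0 = benign (toy labels)

vectorizer = CountVectorizer().fit(docs)
X = vectorizer.transform(docs)           # word counts (multinomial model over words)

nb = MultinomialNB(alpha=1.0)            # alpha = 1.0 -> Laplace smoothing
nb.fit(X, labels)
print(nb.predict(vectorizer.transform(["nitrate residue"])))
```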

Logistic Regression:

  • Logistic regression, as a discriminative classifier, directly estimates the posterior probability of the class given the data, whereas Naive Bayes, as a generative classifier, models the class prior and the likelihood of the data given the class;

  • When a large amount of training data is available, logistic regression tends to be better; with little training data, Naive Bayes often outperforms it;

    • If the conditional independence assumption actually holds, a Naive Bayes classifier converges faster and therefore needs much less training data; even if the assumption doesn't hold, a Naive Bayes classifier still often performs surprisingly well in practice;

    • Logistic regression is preferable if you want a probabilistic framework (e.g., to easily adjust classification thresholds) or if you expect to receive more training data in the future that should be quickly incorporated into the classifier;

  • Logistic regression does not easily handle categorical variables, nor is it good at modeling interactions between variables;

  • Like SVM, logistic regression can apply regularization and kernel trick too;

    • The SVM requires fewer variables to achieve an equivalent misclassification rate;

    • The SVM's loss function is different (the hinge loss), related to maximal-margin theory.
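
A minimal sketch contrasting the discriminative and generative views by cross-validating logistic regression against Gaussian Naive Bayes on the same data; the synthetic data and the particular models chosen here are placeholders for illustration only.

```python
# Minimal sketch: logistic regression (models the posterior directly) vs.
# Gaussian Naive Bayes, compared by 5-fold cross validation on placeholder data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(4)
X = rng.normal(size=(500, 4))
y = (X @ np.array([1.0, -0.5, 0.2, 0.0]) > 0).astype(int)

for name, model in [("logistic regression", LogisticRegression(max_iter=1000)),
                    ("naive Bayes", GaussianNB())]:
    acc = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: mean CV accuracy = {acc:.3f}")
```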

Neural Network (Multi-Layer-Perceptron):

  • A neural network is more of a "black box": it is very hard to know how it makes its classification decisions;

  • For a decision made by an NN, it is very difficult to explain and justify to non-technical people how the decision was reached;

  • Binary categorical input for NNs can be handled by using 0/1 (off/on) inputs, but categorical variables with multiple classes are awkward to handle (one-hot encoding is a common workaround; see the sketch after this list);

  • If the goal is to produce a program that can be distributed with a built-in predictive model, it is usually necessary to ship an additional module or library just to evaluate the NN.
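
A minimal sketch of feeding a multi-valued categorical variable to an MLP by one-hot encoding it into 0/1 inputs, assuming scikit-learn's OneHotEncoder and MLPClassifier; the category names and labels are made-up placeholders.

```python
# Minimal sketch: one-hot encode a multi-valued categorical variable into 0/1
# inputs for an MLP; category names and labels are made-up placeholders.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(5)
material = rng.choice(["organic", "inorganic", "metal"], size=(200, 1))
y = (material[:, 0] == "metal").astype(int)              # toy binary labels

X = OneHotEncoder().fit_transform(material).toarray()    # three 0/1 input columns
mlp = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
mlp.fit(X, y)
print("training accuracy:", mlp.score(X, y))
```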

X-ray Spectra:

    • X-rays are electromagnetic radiation;

    • Photon energies: 120 eV ~ 1.2 MeV (electronvolt);

    • Wavelength range: 0.001 nm ~ 10 nm (nanometer);

    • Visible light: 400-700 nm.

    • Radar: 10^5 nm ~ 10^9 nm (microwave).
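
As a quick consistency check of the quoted ranges, photon energy and wavelength are related by E = hc/λ, i.e. E[keV] ≈ 1.2398 / λ[nm]; the short sketch below evaluates this at the two wavelength limits.

```python
# Quick consistency check of the quoted ranges using E = h*c / lambda,
# i.e. E[keV] ~= 1.2398 / lambda[nm].
H_C_KEV_NM = 1.2398  # h*c expressed in keV*nm (approximate)

for wavelength_nm in (0.001, 10.0):
    energy_kev = H_C_KEV_NM / wavelength_nm
    print(f"lambda = {wavelength_nm:g} nm  ->  E = {energy_kev:g} keV")
# 0.001 nm -> ~1240 keV (~1.2 MeV); 10 nm -> ~0.124 keV (~124 eV)
```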

Fig. 1, X-ray wavelength range in the electromagnetic spectrum

X-ray Generator:

    • X-rays were discovered by Roentgen in 1895;

    • Main components:

      • Source of Charged Particles (electrons), i.e. a filament used as a cathode;

      • Accelerating path (tube voltage: 100 V ~ 100 kV);

      • Solid target (anode) to stop them (Tungsten).

    • Bremsstrahlung (continuous spectra) effect;

    • Characteristic line spectra (from inner-shell ionization), e.g. K lines;

    • X-ray Tube is a popularly used generator;

Fig. 2, Schematic description of X-ray Tube

Transmission, Absorption and Scattering of X-ray:

    • Photoelectric effect (absorption): 1 ~ 100 keV;

    • Scattering (back/forward scattering):

      • Coherent Scattering (without energy change);

      • Compton scattering (with energy change);

    • Pair production (positron-electron pair): > 1 MeV;

      • Not considered in Explosive Detection System (EDS).

    • Transmission: neither absorbed nor scattered;

    • X-ray range used for EDS: 10 keV ~ 150 keV;

X-ray Detectors:

    • X-ray film (photographic effect);

    • Channel electron multipliers;

    • Gas detectors (gaseous ions);


    • Silicon detectors;

      • Scatter data (energy dispersive analysis);

    • Scintillation detectors.

      • Transmission data.

Scintillation Detector:

    • Absorption of the incident radiation by the scintillator;

    • Luminescent conversion of the energy dissipated in the scintillator;

    • Emission of light photons;

    • Impingement of the light photons on photodiodes;

    • Conversion of the photons to electrical currents.

Fig. 3, Schematic Illustration of Scintillation Detectors

X-ray Transmission Sensing with dual energy detectors:

    • Attenuation: I = I0 exp(-μd);

    • Absorption: I = I0 [1 - exp(-μd)];

    • Dual-energy data can avoid the beam-hardening problem seen with a single-energy detector;

    • Materials with the same atomic number trace a "banana curve" in dual-energy space;

    • Interpolating those banana curves builds a LUT between atomic number and dual-energy detection pairs.
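
A minimal sketch of the LUT idea under simplifying assumptions: since I = I0 exp(-μd), the ratio of low- to high-energy log-attenuations equals μ_L/μ_H and is independent of thickness, so it can be interpolated against atomic number. The attenuation-coefficient model below is a made-up placeholder, not real physics data.

```python
# Minimal sketch of the dual-energy LUT idea. Since I = I0*exp(-mu*d), the ratio
# of low- to high-energy log-attenuations equals mu_L/mu_H and is independent of
# the thickness d, so it can be interpolated against atomic number Z.
# NOTE: mu() below is a made-up placeholder model, not real attenuation data.
import numpy as np

def mu(E_keV, Z):
    # hypothetical attenuation coefficient (placeholder only)
    return 0.02 * Z**3 / E_keV**3 + 0.001 * Z

Z_table = np.arange(5, 31)                               # candidate atomic numbers
ratio_table = mu(60.0, Z_table) / mu(140.0, Z_table)     # mu_L / mu_H for each Z (increasing)

# Simulated measurement for an unknown material of unknown thickness d.
Z_true, d = 13, 2.5
log_att_low = mu(60.0, Z_true) * d                       # -ln(I_L / I0)
log_att_high = mu(140.0, Z_true) * d                     # -ln(I_H / I0)

measured_ratio = log_att_low / log_att_high
Z_est = np.interp(measured_ratio, ratio_table, Z_table)  # LUT lookup by interpolation
print("estimated atomic number:", Z_est)
```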

Fig. 4, Dual energy (DE) transmission

(Note: There is a relationship between thickness, atomic number and density, expressed as a linear transform of the high- and low-energy measurements.)

Fig. 5, Dual energy (DE) transmission detector

(Note: Adjust to make the overlapped shadow region as small as possible.)

Threat Detection in Dual Energy X-ray Images:

    • Image Segmentation by mean shift/watershed;

    • Thresholding and grouping segments of density images by connected component analysis.

    • Final classification by atomic number range to reduce false alarms.
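
A minimal sketch of the thresholding / connected-component / atomic-number-range steps, assuming scipy.ndimage for labeling; the density image, effective-Z map, threshold, and alarm range are all made-up placeholders.

```python
# Minimal sketch: threshold a density image, group pixels by connected component
# analysis (scipy.ndimage), and flag components whose effective atomic number
# falls in a suspicious range; all values are made-up placeholders.
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(6)
density = rng.random((64, 64)) * 0.5
density[10:20, 10:25] = 0.95               # hypothetical dense object
z_eff = np.full((64, 64), 6.0)
z_eff[10:20, 10:25] = 7.5                  # hypothetical effective-Z map

mask = density > 0.9                       # threshold the density image
labels, n = ndimage.label(mask)            # connected component analysis

for comp in range(1, n + 1):
    z_mean = z_eff[labels == comp].mean()
    if 7.0 <= z_mean <= 8.0:               # placeholder atomic-number alarm range
        size = int(np.sum(labels == comp))
        print(f"component {comp}: size = {size}, mean Z = {z_mean:.2f} -> alarm")
```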


Fig. 6, X-ray Colored Dual Energy Images and Threat Detection by Image Processing and Atomic Number LUT.

Substance Identification based on Small angle X-ray diffraction:

    • X-ray Coherent Scattered Spectrum from Small Angle Diffraction;

    • Normalize the scatter spectrum with the transmission spectrum;

    • Compensation of partial volume effect;

    • Train a substance classifier with those spectra as the signature;

    • Identify the substance with the trained classifier;

      • SVM, decision tree, Naive Bayes classifier, logistic regression or NN.
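
A minimal sketch of this pipeline on toy data: normalize synthetic scatter spectra by synthetic transmission spectra, apply a crude unit-area normalization as a stand-in for partial-volume compensation, and train an SVM on the result. Every spectrum, label, and parameter here is a placeholder.

```python
# Minimal sketch of the identification pipeline on toy data: divide the scatter
# spectrum by the transmission spectrum, apply a crude unit-area normalization
# as a stand-in for partial-volume compensation, and train an SVM on the result.
# All spectra, labels and parameters are synthetic placeholders.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(7)
n_bins = 64

def synth_spectrum(peak_bin):
    """Hypothetical scatter spectrum with a substance-specific diffraction peak."""
    bins = np.arange(n_bins)
    return np.exp(-0.5 * ((bins - peak_bin) / 3.0) ** 2) + 0.05 * rng.random(n_bins)

scatter = np.array([synth_spectrum(20 if c == 0 else 40) for c in (0, 1) for _ in range(30)])
labels = np.array([c for c in (0, 1) for _ in range(30)])       # two hypothetical substances
transmission = 0.5 + 0.5 * rng.random(scatter.shape)            # hypothetical transmission spectra

X = scatter / transmission                  # normalize scatter by transmission
X /= X.sum(axis=1, keepdims=True)           # crude partial-volume compensation (placeholder)

clf = SVC(kernel="rbf").fit(X, labels)      # any of the classifiers above could be used
print("predicted substance:", clf.predict(X[:1]))
```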

Fig. 7, A typical Silicon detector

(Note: Incident X-rays may cause ionization in the Si. The resulting characteristic X-rays may escape or be absorbed, creating electron-hole pairs in the intrinsic region of the semiconductor; these charge carriers then migrate to the electrodes under the influence of an applied bias voltage.)

Fig. 8, A Substance Identification Detector Array

(Note: The inspection region consists of an infinitesimally thin ring; its volume takes the shape of a ring with a "diamond"-shaped cross section. The dual-energy transmission spectrum from the X-ray "Ring Detector" is used to "normalize" the measured scatter spectrum from the X-ray Energy Dispersive Detector.)