Pattern recognition toolbox
TOOLDIAG is a collection of methods for statistical pattern recognition. Its main area of application is classification. Input is limited to multidimensional continuous features without missing values; symbolic features (attributes) are not allowed. The program is implemented in the 'C' programming language and was tested in several computing environments. The user interface is simple and command-line oriented, but the methods behind it are efficient and fast. You can add or customize methods at the application programming level with relatively little effort. If you would like a presentation of the theory behind the program at your university, feel free to contact me. A more detailed description of the possibilities of TOOLDIAG follows.

CLASSIFIER PARADIGM
Different classifier types are provided:
 K-Nearest Neighbor
 Linear Machines, using the following learning rules:
   Delta-Rule (a.k.a. Widrow-Hoff rule, LMS rule)
   Deterministic Least-Mean-Square rule (a.k.a. Pseudoinverse)
   Perceptron learning rule
 Quadratic Gaussian Classifier
 Radial Basis Function Network, with training algorithms:
   Regularization
   Error-Correction Learning
 Parzen window with kernel types: Hypercubic, Hypertriangle, Hyperspheric, Gaussian, Exponential, Lorenz
 Q* algorithm
 Multilayer Perceptron (1 hidden layer)
   Learning rules: Stochastic & Batch (with/without momentum)
   Activation function: Sigmoid & Hyperbolic tangent
 Support Vector Machine, using LIBSVM
 Probabilistic Neural Network
 "Your own classifier" (framework to implement your own classification method)

FEATURE SELECTION
A strong part of the program. Several search strategies are provided:
 Best Features
 Sequential Forward Selection
 Sequential Backward Selection
 Plus L - Take away R
 Sequential Floating Forward Selection
 Sequential Floating Backward Selection
 Branch and Bound
 Exhaustive Search
The search strategies can be combined with several selection criteria. The main groups of selection criteria are:
 Estimated minimal error probability: an arbitrary classifier model can be combined with an arbitrary cross-validation technique to estimate the error (wrapper method).
 Interclass distance: Minkowski, City block, Euclidean, Chebychev, Nonlinear (Parzen & hyperspheric kernel)
 Probabilistic distance, assuming multivariate Gaussian distribution: Chernoff, Bhattacharyya distance, Matusita distance, Divergence, Mahalanobis, Patrick-Fisher
 Confusion-matrix based
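
The combination of a search strategy with an exchangeable selection criterion can be sketched as follows for Sequential Forward Selection. This is an illustration only, not TOOLDIAG's implementation: the callback type criterion_fn, the function sfs_select, and the toy criterion additive_criterion are hypothetical names introduced for this example, and the criterion is assumed to be "higher is better".

```c
#include <stddef.h>

/* Hypothetical criterion callback: rates a feature subset, higher is better.
 * ctx carries whatever data the criterion needs. */
typedef double (*criterion_fn)(const int *subset, size_t size, void *ctx);

/* Greedy Sequential Forward Selection.
 * Starting from the empty set, repeatedly adds the single feature whose
 * inclusion maximizes the criterion J, until `target` features are chosen.
 * selected must have room for `target` entries; they are filled in the
 * order of selection. Returns the number of features selected. */
size_t sfs_select(size_t n_features, size_t target,
                  criterion_fn J, void *ctx, int *selected)
{
    size_t size = 0;

    if (target > n_features)
        target = n_features;

    while (size < target) {
        int    best_f = -1;
        double best_j = 0.0;

        for (size_t f = 0; f < n_features; f++) {
            int used = 0;
            double j;

            for (size_t i = 0; i < size; i++)
                if (selected[i] == (int)f)
                    used = 1;
            if (used)
                continue;

            selected[size] = (int)f;          /* tentatively add feature f */
            j = J(selected, size + 1, ctx);
            if (best_f < 0 || j > best_j) {
                best_j = j;
                best_f = (int)f;
            }
        }
        selected[size++] = best_f;            /* keep the best candidate */
    }
    return size;
}

/* Toy criterion for demonstration only: the sum of per-feature scores
 * passed in through ctx as an array of doubles. */
double additive_criterion(const int *subset, size_t size, void *ctx)
{
    const double *score = (const double *)ctx;
    double sum = 0.0;
    for (size_t i = 0; i < size; i++)
        sum += score[subset[i]];
    return sum;
}
```

With an additive criterion the result coincides with simply picking the individually best features; the strategy only differs from that when the criterion takes interactions between features into account, as a wrapper criterion or a probabilistic distance does.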

FEATURE EXTRACTION
All available features are combined into new features of lower dimension. The methods:
 Linear discriminant analysis
 Principal Component Analysis alias Karhunen-Loève Expansion
 Sammon mapping (a nonlinear method)
 Higher-order combinations of existing features (polynomials)
 Time Series: Fourier transform
 Time Series: Regression by Orthogonal Polynomials
 Time Series: Regression by 'Usual' Polynomials

PERFORMANCE ESTIMATION
Several performance estimation methods can be combined with all available classifier paradigms, thus allowing easy comparison of results. The cross-validation methods:
 Resubstitution
 Holdout
 Leave-One-Out
 Rotation alias K-fold cross validation
 Bootstrap
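
As a rough illustration of the rotation (K-fold) scheme, the sketch below assigns every sample to one of k folds; in round r the samples of fold r form the test set and all others the training set. The function names are hypothetical, and the round-robin assignment assumes that the samples are already in random order. How TOOLDIAG itself partitions the data is not described here.

```c
#include <stddef.h>

/* Assign each of n samples to one of k folds in round-robin order.
 * Assumes the samples are already shuffled. With k == n this degenerates
 * to Leave-One-Out. Returns 0 on success, -1 if k is out of range. */
int kfold_assign(size_t n, size_t k, size_t *fold)
{
    if (k < 2 || k > n)
        return -1;
    for (size_t i = 0; i < n; i++)
        fold[i] = i % k;
    return 0;
}

/* Number of samples in the test set of rotation `round`. */
size_t kfold_test_size(const size_t *fold, size_t n, size_t round)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (fold[i] == round)
            count++;
    return count;
}
```

Every sample is used for testing exactly once, so the error estimate is the total number of misclassified test samples over all k rounds divided by n.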
The performance measures:
 Accuracy
 Sensitivity a.k.a. Recall
 Precision
 Geometric mean of Sensitivity and Precision
 F-measure of Sensitivity and Precision
 ROC analysis: Area under ROC curve
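
For a two-class problem, most of these measures follow directly from the four counts of the confusion matrix. The sketch below shows the usual definitions; the struct and function names are made up for this example, and the F-measure is assumed to be the balanced variant (harmonic mean of sensitivity and precision), which may differ from the weighting TOOLDIAG uses. The geometric mean would simply be the square root of sensitivity times precision.

```c
/* Counts of a two-class confusion matrix. */
typedef struct {
    unsigned tp;   /* positives classified as positive */
    unsigned fp;   /* negatives classified as positive */
    unsigned tn;   /* negatives classified as negative */
    unsigned fn;   /* positives classified as negative */
} confusion2;

/* Division that yields 0 instead of dividing by zero. */
static double safe_div(double num, double den)
{
    return den > 0.0 ? num / den : 0.0;
}

/* Fraction of all samples that were classified correctly. */
double cm_accuracy(const confusion2 *c)
{
    return safe_div((double)c->tp + c->tn,
                    (double)c->tp + c->fp + c->tn + c->fn);
}

/* Sensitivity (recall): fraction of true positives that were found. */
double cm_sensitivity(const confusion2 *c)
{
    return safe_div((double)c->tp, (double)c->tp + c->fn);
}

/* Precision: fraction of positive decisions that were correct. */
double cm_precision(const confusion2 *c)
{
    return safe_div((double)c->tp, (double)c->tp + c->fp);
}

/* Balanced F-measure: harmonic mean of sensitivity and precision. */
double cm_f_measure(const confusion2 *c)
{
    double r = cm_sensitivity(c);
    double p = cm_precision(c);
    return safe_div(2.0 * p * r, p + r);
}
```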

SAMMON PLOT
A graphical interface to the GNUPLOT program is provided, which allows the data points to be plotted in 2D or 3D. Higher-dimensional data can be mapped by a structure-conserving algorithm, the Sammon mapping.

INTERFACING
The analyzed data can be passed to other programs or can be split into several training and test data sets. Two different feature families which describe the same samples can be merged together. Interfaces exist to:

NORMALIZATION
The data samples can be normalized:
 Linear to [0,1]
 Zero mean, unit variance
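
The first of the two options, linear scaling to [0,1], can be sketched as below. The function name and the row-major layout are assumptions for this example, as is the choice to map a constant feature to 0. The second option works analogously, subtracting the mean of each feature and dividing by its standard deviation.

```c
#include <stddef.h>

/* Rescale every feature (column) of an n x dim row-major data matrix
 * linearly to the interval [0,1], in place:
 *     x' = (x - min) / (max - min)
 * A feature with max == min carries no information and is set to 0. */
void normalize_minmax(double *data, size_t n, size_t dim)
{
    if (n == 0)
        return;

    for (size_t j = 0; j < dim; j++) {
        double lo = data[j];
        double hi = data[j];
        double range;

        /* Find the extrema of feature j over all samples. */
        for (size_t i = 1; i < n; i++) {
            double v = data[i * dim + j];
            if (v < lo) lo = v;
            if (v > hi) hi = v;
        }

        range = hi - lo;
        for (size_t i = 0; i < n; i++) {
            double *v = &data[i * dim + j];
            *v = range > 0.0 ? (*v - lo) / range : 0.0;
        }
    }
}
```

When separate training and test sets are used, the extrema would normally be taken from the training set only and then applied to both sets.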

STATISTICS
Statistical parameters of the data can be generated, globally and for each particular class:
 Extrema, mean and standard deviation
 Covariance matrix
 Correlation matrix
 Inertia
 Dispersion and overlapping

In addition, a set of functions of minor importance is available, such as loading and saving, adding noise, or a demonstration run.
Envisaged future extensions of the program:
 Regression
 Estimation of unknown feature values
 Symbolic features allowed
The following WWW resources contain databases which are processable by TOOLDIAG. Only databases with continuous features (attributes) and no missing values are allowed as input to TOOLDIAG.
Windows: A precompiled executable is included, together with a project file for the Dev-C++ development environment ( http://www.bloodshed.net/dev/devcpp.html ) with which you can recompile the source code. Sorry, no fancy interface or installers are available, just pure methods.
