discriminant.analysis.fnc

Copy, Paste and Adapt

model.ad= discriminant.analysis.fnc(dat, stepwise=F, group='diagnostico', variables=2:6, graphic=T)

dat = discriminant.score.fnc(dat, model=model.ad)

Objective

Performs linear and quadratic discriminant analysis, with simultaneous or stepwise estimation, from a grouping factor (group) on a set of p user-defined quantitative variables.

Discriminant Analysis

Starting from the iris database (available in the MASS library), we will carry out a discriminant analysis in order to find linear combinations of the 4 quantitative variables that maximize the distance between the three groups of plant species.

head(iris)

We request the frequency distribution of the grouping variable Species.

frequency.fnc(iris, variable='Species')
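
For reference, the same frequency table can be obtained directly in base R:

table(iris$Species)   # 50 observations per species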

model1= discriminant.analysis.fnc(iris, group='Species')

Note that we have assigned the output of the function to the object model1. This is particularly useful if we later want to use the estimated model both to extract the discriminant scores for each subject and to predict group membership for subjects not included in the estimation process.

By default, if only the mandatory argument group is included, the function analyzes all the variables of the database (in this example iris) and you will get the following (a minimal sketch of an equivalent pipeline in base MASS follows the list):

  1. Stepwise estimation of the model.

  2. Table of means of the p selected variables at the J levels of the factor given in the group argument.

  3. Omnibus MANOVA.

  4. Estimated discriminant functions.

  5. Wilks' lambda for the drop of each variable from the functions.

  6. Box's M test of homogeneity of the variance-covariance matrices.

  7. Centroids, and standardized, unstandardized and structure coefficients for all variables in each function.

  8. Cross-tabulation of absolute and relative frequencies of the classification of the groups by the J discriminant functions.
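
For reference, a minimal sketch of items 3, 4 and 8 using base R and the MASS library directly (the stepwise selection, Box's M test and the coefficient tables are produced internally by discriminant.analysis.fnc and are not reproduced here):

library(MASS)

# Item 4: linear discriminant functions on the four quantitative variables
fit = lda(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
          data = iris)
fit

# Item 3: omnibus MANOVA with Wilks' lambda
summary(manova(cbind(Sepal.Length, Sepal.Width, Petal.Length,
                     Petal.Width) ~ Species, data = iris), test = 'Wilks')

# Item 8: cross-tabulation of observed and predicted groups
table(Observado = iris$Species, Predicho = predict(fit)$class)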

Let us repeat the previous call, now including the new argument graphic=T.

model1 = discriminant.analysis.fnc(iris, variables=1:4,
            stepwise=T, group='Species', graphic=T)

In the output you will see the graphs described below. On the left you can see that the first function separates the species setosa very clearly from the other two (virginica and versicolor), while the second function adds almost nothing to the separation.

With the graphic=T argument we get:

  1. A histogram of the discriminant scores on each discriminant function for the J groups.

  2. A correlation plot of the p discriminant variables.

  3. A territorial map of the p discriminant variables.

The second figure shows the discriminative potential of all pairwise (2 by 2) combinations of the discriminant variables. Each panel divides the space between the J groups using different colors, and each observation is plotted in that territorial space. A red point indicates an incorrectly classified observation, and the proportion of misclassified cases heads each panel. The combination of Sepal.Length and Petal.Length produces the smallest error (0.03), while the crossing of Sepal.Length and Sepal.Width produces the largest (0.2). This is no surprise: the variables with the smallest error are precisely those with the highest loadings on the first discriminant function, the one with the greatest discriminative potential, while the largest error comes from the combination of the two variables that contribute least to that function.


Cross-validation (jackknife)

With the argument VC=T the function carries out a cross-validation using the leave-one-out method.

model1 = discriminant.analysis.fnc(iris, group='Species', VC=T)

A jackknife cross-validation table is added to the discriminant output.

$validacion.cruzada.jacknife
            Predicho
Observado    setosa versicolor virginica
  setosa         50          0         0
  versicolor      0         48         2
  virginica       0          1        49

$prop.validacion.cruzada
            Predicho
Observado    setosa versicolor virginica
  setosa       1.00       0.00      0.00
  versicolor   0.00       0.96      0.04
  virginica    0.00       0.02      0.98
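
The same leave-one-out procedure is available in MASS itself through the CV=TRUE argument of lda; a minimal sketch whose table labels mirror the output above:

library(MASS)

# With CV=TRUE each case is classified by a model estimated without it
fit.cv = lda(Species ~ ., data = iris, CV = TRUE)

tab = table(Observado = iris$Species, Predicho = fit.cv$class)
tab                  # jackknife classification table
prop.table(tab, 1)   # row proportions, as in $prop.validacion.cruzada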

Quadratic Discriminant Analysis

Classical (linear) discriminant analysis has an important requirement: homogeneity of the variance-covariance matrices of the discriminant variables across the J groups. If this assumption is violated, the user can add the argument ADC=T to request, in addition, the estimation of a quadratic discriminant analysis model, for which this assumption is not required. When this argument is true (T), the function adds the quadratic model to the linear one, and the quadratic model can then be used instead of the linear one to predict the group membership of new observations.

discriminant.analysis.fnc(iris, group='Species', ADC=T)
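
For reference, the quadratic model itself can be fitted with qda from MASS; a minimal sketch:

library(MASS)

# qda estimates one covariance matrix per group, so homogeneity of the
# variance-covariance matrices is not required
fit.q = qda(Species ~ ., data = iris)
head(predict(fit.q)$class)   # quadratic classification of the observations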

Discriminant scores

When we estimate a linear discriminant model, the discriminant scores for each subject and the predicted group can be extracted with the function discriminant.score.fnc, whose only arguments are the name of the database (iris) and the estimated linear model (model1). It is very important to note that the user can use this function to apply the model to a database different from the one used to estimate it. Obviously, these new data must contain the discriminant variables used in the estimation.

iris.2 = discriminant.score.fnc(iris, model=model1)
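
Assuming that discriminant.score.fnc wraps predict() from MASS, the following sketch shows roughly what it adds to the database. The copy is called iris.2b here so as not to overwrite iris.2; the column names LD1/LD2 and grupo.predicho match the output used below:

library(MASS)

fit = lda(Species ~ ., data = iris)
sc = predict(fit)       # $x holds the scores, $class the predicted group
iris.2b = cbind(iris, sc$x, grupo.predicho = sc$class)
head(iris.2b)           # iris plus LD1, LD2 and grupo.predicho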

The user can now carry out an ANOVA and post hoc contrasts with these discriminant scores as the dependent variable.

Anova.fnc(iris.2, between.factor='Species', dv='LD1',
          poshoc='Species', graphic=F)

We create a family of orthogonal contrasts: the first compares the species setosa versus the other two, and the second, orthogonal to the first, compares versicolor versus virginica.

my.contrasts = list(set.vs.vv = c(-2,1,1),
                    ver.vs.vir = c(0,1,-1))

Anova.fnc(iris.2, between.factor='Species', dv='LD1',
          poshoc='Species', contrasts=my.contrasts)

(Output: pairwise comparisons and orthogonal contrasts.)
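
Anova.fnc is the tutorial's own helper; as a sketch, the same family of orthogonal contrasts can be tested on the scores with base R alone (assuming iris.2 built as above):

# Attach the contrast matrix to the factor and split the ANOVA term
contrasts(iris.2$Species) = cbind(set.vs.vv = c(-2, 1, 1),
                                  ver.vs.vir = c(0, 1, -1))
fit.aov = aov(LD1 ~ Species, data = iris.2)
summary(fit.aov,
        split = list(Species = list(set.vs.vv = 1, ver.vs.vir = 2)))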

Effectiveness of a predictive model

Starting from the same iris database, we will estimate the discriminant model on half of the data and use the other half to test its goodness of fit.

  1. We will create a new variable, which we will call record, indicating the number of each case or record.

  2. We will draw a random sample within each level of the grouping variable Species.

  3. We will estimate the discriminant model on that sample.

  4. We will extract from the iris database the records that did not take part in the previous random sample. We will arbitrarily call this set remainder.

  5. We will predict the group membership of each record in remainder using the model estimated from the sample.

  6. We will check the goodness of fit of the model through the contingency table of the variable Species and its predicted version.

frequency.fnc(iris, 'Species')

As we saw in the introduction above, there are 50 records per level of Species.

1.- We create the variable record.

iris$record= 1:150

2.- We draw a random sample of size 25 per level of Species.

my.sample = random.sample.fnc(iris, which.factor='Species',
                              n=25, ID='record')

3.- We estimate the discriminant model on this sample and save it in ad.2.

ad.2 = discriminant.analysis.fnc(my.sample,
                                 variables=1:4, group='Species')

4.- We extract from iris the records that are not part of the sample (logical operators).

index = iris$record %in% my.sample$record

remainder= iris[!index, ]

5.- We apply the predictive model to the rest of the data.

prediction=discriminant.score.fnc(remainder, model=ad.2)

6.- We check the predictive accuracy of the model by requesting the contingency table crossing the observed variable Species with its version predicted by the model, grupo.predicho.

frequency.fnc(prediction, 'Species:grupo.predicho')
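
The whole train/test check can also be written with base R and MASS alone; a minimal sketch of steps 1 to 6 (random.sample.fnc and frequency.fnc are the tutorial's own helpers):

library(MASS)

set.seed(1)   # reproducible split
train.idx = unlist(lapply(split(1:150, iris$Species), sample, size = 25))
train = iris[train.idx, ]    # 25 records per species
test  = iris[-train.idx, ]   # the remaining 25 per species

# Estimate on the sample, then classify the records left out
fit  = lda(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
           data = train)
pred = predict(fit, newdata = test)$class
table(Observado = test$Species, Predicho = pred)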