What can LDA do?
Dimension Reduction
Classification
Contributions of predictors
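A minimal sketch of all three uses, assuming scikit-learn is available (the iris data is just a convenient stand-in, not from the notes):

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)  # 150 samples, 4 predictors, 3 classes

lda = LinearDiscriminantAnalysis(n_components=2)
X_reduced = lda.fit_transform(X, y)  # dimension reduction: 4 predictors -> 2 axes
pred = lda.predict(X)                # classification: one label per sample
weights = lda.coef_                  # per-class predictor weights (contributions)
```

The same fitted model serves all three purposes: `transform` gives the reduced axes, `predict` the labels, and `coef_` shows how each predictor contributes to each class's score.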
Idea behind LDA
Maximize the distance between class means
Minimize the variation within each class
What distance do we use here?
Usually Euclidean distance, which has three cons:
it depends on the units of measurement
it does not account for each predictor's variation
it does not account for correlation between predictors
So, we use statistical (Mahalanobis) distance here
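A small numpy sketch of the difference (the covariance values are made up for illustration): with correlated predictors, the statistical distance of a point from the center differs from its Euclidean distance because it is measured in units of the data's own spread.

```python
import numpy as np

rng = np.random.default_rng(0)
# two correlated predictors; correlation makes Euclidean distance misleading
cov = np.array([[4.0, 1.8], [1.8, 1.0]])
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=500)

mu = X.mean(axis=0)
S_inv = np.linalg.inv(np.cov(X, rowvar=False))

point = np.array([2.0, 0.0])
diff = point - mu

euclidean = np.sqrt(diff @ diff)             # unit-dependent, ignores correlation
mahalanobis = np.sqrt(diff @ S_inv @ diff)   # statistical distance
```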
Let's take a 2-d example:
In the picture on the left, we have predictors X and Y, and color represents the class.
It is hard to identify the class using X or Y alone.
But when we use both X and Y together, LDA finds new axes along which a single line can separate the classes.
So, we define a ratio: the distance between the two class means divided by the sum of their standard deviations.
Goal: maximize this ratio (separation between classes relative to the spread within them)
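This ratio can be computed directly for a 1-d projection; the two toy samples below are made up for illustration:

```python
import numpy as np

# two classes' values after projecting onto one candidate axis (toy numbers)
a = np.array([1.0, 1.2, 0.8, 1.1, 0.9])
b = np.array([3.0, 3.3, 2.7, 3.1, 2.9])

# ratio: distance between class means over the sum of their standard deviations;
# LDA picks the axis that makes this ratio as large as possible
ratio = abs(a.mean() - b.mean()) / (a.std(ddof=1) + b.std(ddof=1))
```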
Method:
use the classification functions to compute a classification score for each class
compute each class's probability, then assign the label with the larger probability
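A sketch of the two steps with numpy, under the equal-covariance assumption (the simulated data and the point `x_new` are made up): each class gets a linear score, and the larger score wins.

```python
import numpy as np

rng = np.random.default_rng(1)
X0 = rng.multivariate_normal([0, 0], np.eye(2), size=100)  # class 0 sample
X1 = rng.multivariate_normal([3, 3], np.eye(2), size=100)  # class 1 sample

mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
# pooled covariance, shared across classes per the LDA assumption
S = ((len(X0) - 1) * np.cov(X0, rowvar=False)
     + (len(X1) - 1) * np.cov(X1, rowvar=False)) / (len(X0) + len(X1) - 2)
S_inv = np.linalg.inv(S)
prior0 = prior1 = 0.5

def score(x, mu, prior):
    # linear classification score: x' S^-1 mu - 0.5 mu' S^-1 mu + log(prior)
    return x @ S_inv @ mu - 0.5 * mu @ S_inv @ mu + np.log(prior)

x_new = np.array([2.5, 2.8])
s0, s1 = score(x_new, mu0, prior0), score(x_new, mu1, prior1)
label = 0 if s0 > s1 else 1  # assign the class with the larger score
```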
Assumptions:
Multivariate normal distribution
the correlation structure among the predictors within a class is the same across classes (a common covariance matrix)
Use EDA to check whether the data fits these assumptions.
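One simple EDA check (a sketch on simulated data, not a formal test): compare the per-class correlation matrices; if the equal-covariance assumption holds, they should be close.

```python
import numpy as np

rng = np.random.default_rng(2)
# simulated data where both classes share the same correlation structure
X0 = rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], size=300)
X1 = rng.multivariate_normal([2, 2], [[1, 0.5], [0.5, 1]], size=300)

corr0 = np.corrcoef(X0, rowvar=False)
corr1 = np.corrcoef(X1, rowvar=False)
# a large entry here would suggest the common-correlation assumption fails
max_diff = np.abs(corr0 - corr1).max()
```

Histograms or Q-Q plots per predictor and class are the analogous check for the normality assumption.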
Performance: as with any classifier, we use accuracy, ROC, and AUC to evaluate it.
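A sketch of the evaluation, assuming scikit-learn (the synthetic dataset is just a placeholder):

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

lda = LinearDiscriminantAnalysis().fit(X_tr, y_tr)
acc = accuracy_score(y_te, lda.predict(X_te))                  # hard labels
auc = roc_auc_score(y_te, lda.predict_proba(X_te)[:, 1])       # ranking quality
```

Accuracy scores the hard labels, while AUC scores the ranking induced by the predicted probabilities; `roc_curve` from the same module gives the points for the ROC plot.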
Prior Probability
Situation: the probabilities of encountering records of the different classes in the future are not equal, so pass these prior probabilities to the model.
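A sketch with scikit-learn's `priors` parameter (the simulated data and the 90/10 prior are made up): if a class will be rare in the future, saying so lowers its posterior probability for every record.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0.0, 1, size=(50, 2)),    # class 0
               rng.normal(1.5, 1, size=(50, 2))])   # class 1
y = np.array([0] * 50 + [1] * 50)

# default: priors estimated from the (balanced) training frequencies
lda_default = LinearDiscriminantAnalysis().fit(X, y)
# if class 1 will only be 10% of future records, pass that prior explicitly
lda_prior = LinearDiscriminantAnalysis(priors=[0.9, 0.1]).fit(X, y)

x_new = np.array([[0.8, 0.8]])
p_default = lda_default.predict_proba(x_new)[0, 1]
p_prior = lda_prior.predict_proba(x_new)[0, 1]  # lower, due to the smaller prior
```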
Cost
Situation: misclassification costs are not symmetrical
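With asymmetric costs, one common adjustment (a sketch; the probabilities and the 4:1 cost ratio are made up) is to move the classification cutoff away from 0.5: predict the positive class whenever its expected misclassification cost is lower.

```python
import numpy as np

# posterior probabilities of the "positive" class for four records (toy values)
p_pos = np.array([0.30, 0.45, 0.60, 0.85])

# symmetric costs: predict positive when p > 0.5
pred_symmetric = (p_pos > 0.5).astype(int)

# asymmetric: a missed positive costs 4x a false alarm; predicting positive is
# cheaper when (1 - p) * cost_fp < p * cost_fn, i.e. p > cost_fp / (cost_fp + cost_fn)
cost_fn, cost_fp = 4.0, 1.0
threshold = cost_fp / (cost_fp + cost_fn)   # 0.2 instead of 0.5
pred_costed = (p_pos > threshold).astype(int)
```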
Pros:
Easy to compute
Compared to many other classification models, LDA can tell the contribution of each predictor
Cons:
Need to check the two assumptions (multivariate normality and a common correlation structure)
LDA is sensitive to outliers, so remove them first
Predictors need transformation (e.g., standardization) because of the distance calculation
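The transformation in the last point is typically a z-score standardization, sketched here with numpy (the two made-up predictors sit on very different scales, which would otherwise dominate the distance):

```python
import numpy as np

rng = np.random.default_rng(4)
# two predictors on very different scales (e.g., meters vs. dollars)
X = np.column_stack([rng.normal(0, 1, 200), rng.normal(0, 1000, 200)])

# z-score transform so distance calculations are not dominated by one unit
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)
```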