## SMOTE (Synthetic Minority Over-Sampling Technique) by Manohar
29 Oct 2012 — The SMOTE function takes the feature vectors, of dimension (r, n), and the corresponding target class.
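The core SMOTE idea is simple: for each synthetic point, pick a minority-class sample, pick one of its k nearest minority-class neighbours, and interpolate at a random fraction of the segment joining them. The submission above is MATLAB code; the following is a minimal language-agnostic sketch in Python, and `smote_sample` and its parameters are illustrative names, not the submission's API.

```python
import numpy as np

def smote_sample(X_min, k=3, n_new=10, rng=None):
    """Sketch of SMOTE's synthesis step (not the File Exchange code).

    For each new point: pick a random minority sample, pick one of its
    k nearest minority neighbours, and interpolate between the two.
    """
    rng = np.random.default_rng(rng)
    # pairwise distances within the minority class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                # exclude self-matches
    nn = np.argsort(d, axis=1)[:, :k]          # k nearest-neighbour indices
    new = np.empty((n_new, X_min.shape[1]))
    for i in range(n_new):
        j = rng.integers(len(X_min))           # random minority sample
        nb = X_min[rng.choice(nn[j])]          # one of its k neighbours
        gap = rng.random()                     # interpolation fraction in [0, 1)
        new[i] = X_min[j] + gap * (nb - X_min[j])
    return new
```

Because every synthetic point is a convex combination of two existing minority points, the new samples always lie inside the minority class's bounding box.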
## ADASYN (improves class balance, extension of SMOTE)
17 Apr 2015 (Updated 23 Apr 2015) — An implementation of the ADASYN algorithm, which reduces class imbalance by adaptively synthesizing minority-class examples.
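What makes ADASYN "adaptive" is its allocation step: minority points surrounded by more majority-class neighbours (i.e. harder to learn) receive more synthetic samples. Below is a sketch of that allocation only, in Python rather than MATLAB; `adasyn_allocation`, `beta`, and the interpolation-free focus are assumptions for illustration, not the submission's interface.

```python
import numpy as np

def adasyn_allocation(X_min, X_maj, k=5, beta=1.0):
    """Sketch of ADASYN's adaptive allocation (not the File Exchange code).

    Returns how many synthetic points to generate per minority sample.
    """
    G = int(beta * (len(X_maj) - len(X_min)))  # total synthetics to create
    X_all = np.vstack([X_min, X_maj])
    # rows 0..len(X_min)-1 of X_all are minority, the rest majority
    maj_mask = np.arange(len(X_all)) >= len(X_min)
    d = np.linalg.norm(X_min[:, None, :] - X_all[None, :, :], axis=2)
    d[np.arange(len(X_min)), np.arange(len(X_min))] = np.inf  # drop self
    nn = np.argsort(d, axis=1)[:, :k]          # k nearest among ALL points
    r = maj_mask[nn].mean(axis=1)              # majority fraction around each point
    r = r / r.sum() if r.sum() > 0 else np.full(len(X_min), 1 / len(X_min))
    # rounding means the total may differ slightly from G
    return np.rint(r * G).astype(int)
```

A minority point deep inside its own class gets r = 0 and no synthetics; a point whose neighbourhood is all majority gets the largest share. The synthesis itself then proceeds as in SMOTE.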
## SMOTEBoost by Barnan Das
26 Jun 2012 — Implementation of the SMOTEBoost algorithm, used to handle the class-imbalance problem in data.
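SMOTEBoost combines the two ideas above: on every boosting round, the minority class is augmented with fresh synthetic points before the weak learner is fit. The sketch below is a heavily simplified binary version, not Barnan Das's MATLAB submission: the published SMOTEBoost (Chawla et al.) modifies AdaBoost.M2, whereas this sketch uses plain AdaBoost with a decision stump, a crude pair-interpolation stand-in for SMOTE, and ±1 labels, purely to show where the oversampling sits in the loop.

```python
import numpy as np

def smote_like(X, n_new, rng):
    """Crude SMOTE stand-in: interpolate random pairs of minority points."""
    i = rng.integers(len(X), size=n_new)
    j = rng.integers(len(X), size=n_new)
    g = rng.random((n_new, 1))
    return X[i] + g * (X[j] - X[i])

def stump_fit(X, y, w):
    """Best weighted one-feature threshold stump; returns (err, feat, thr, sign)."""
    best = (np.inf, 0, 0.0, 1)
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f]):
            for s in (1, -1):
                pred = np.where(X[:, f] <= t, s, -s)
                err = w[pred != y].sum()
                if err < best[0]:
                    best = (err, f, t, s)
    return best

def smoteboost(X, y, T=10, n_new=20, seed=0):
    """SMOTEBoost sketch: augment the minority class each round, fit a weak
    learner on the augmented set, but update AdaBoost weights on the
    ORIGINAL data only (the synthetics are discarded after the round)."""
    rng = np.random.default_rng(seed)
    w = np.full(len(y), 1 / len(y))
    minority = 1 if (y == 1).sum() < (y == -1).sum() else -1
    ensemble = []
    for _ in range(T):
        S = smote_like(X[y == minority], n_new, rng)
        Xa = np.vstack([X, S])
        ya = np.concatenate([y, np.full(n_new, minority)])
        wa = np.concatenate([w, np.full(n_new, w.mean())])
        _, f, t, s = stump_fit(Xa, ya, wa / wa.sum())
        pred = np.where(X[:, f] <= t, s, -s)
        eps = max(w[pred != y].sum() / w.sum(), 1e-10)
        if eps >= 0.5:
            break
        alpha = 0.5 * np.log((1 - eps) / eps)
        w = w * np.exp(-alpha * y * pred)
        w /= w.sum()
        ensemble.append((alpha, f, t, s))
    return ensemble

def sb_predict(ensemble, X):
    agg = sum(a * np.where(X[:, f] <= t, s, -s) for a, f, t, s in ensemble)
    return np.where(agg >= 0, 1, -1)
```

The key design point the sketch preserves: oversampling happens inside the boosting loop, so each weak learner sees a different rebalanced view of the data.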
## Algorithms for imbalanced multi-class classification in MATLAB?

Answer by Ilya on 13 Oct 2012 (accepted answer): I described approaches for learning on imbalanced data here: http://www.mathworks.com/matlabcentral/answers/11549-leraning-classification-with-most-training-samples-in-one-category. This advice is applicable to any number of classes. If you have the Statistics Toolbox in R2012b, I recommend the RUSBoost algorithm.

4 Comments

Ilya on 14 Oct 2012: I'll make sure to pass your joy on to the doc writer who worked on that page. RUSBoost undersamples the majority class(es) for every weak learner in the ensemble (most usually a decision tree). For example, if the majority class has 10 times as many observations as the minority class, it is undersampled to 1/10. If the ensemble has, say, 100 trees, every observation in the majority class is used 100/10 = 10 times by the ensemble on average, while every observation in the minority class is used 100 times, once for every tree. The MATLAB implementation follows the paper by Seiffert et al. If you are not certain about a specific detail, post your question to Answers or call our Tech Support.

Take a look at the doc. Assigning a large misclassification cost to a class tells the classifier that it is OK to skew your data, making it not representative of the real world, if that gives you a better confusion matrix for the classes you care to classify correctly. If you assign a uniform prior, the accuracy for the rare classes will likely improve and the accuracy for the popular classes will likely go down.

I typed 'cross-validate ensemble' in the online doc search box, and the second hit was this page: http://www.mathworks.com/help/stats/classificationensemble.crossval.html. There is a short example at the bottom. Does this suffice?

Carlos Andrade on 14 Oct 2012: Hi Ilya, yes, those were all great answers; thanks for covering all my questions. Please forward the compliment, it is well deserved.
I learn a lot from them, including things that are sometimes overcomplicated in my textbooks. I did not know that cross-validation in classification was stratified by default; this just makes me happier :-)

I have one more question and one last concern. The question is about the paper you pointed me to: it refers to binary imbalanced problems, yet to my understanding you, and also the documentation of the method, suggest it can be used for 2 classes or more. From your comment, I understood that this is what some authors have been calling a one-versus-all approach: the algorithm creates a weak classifier that only sees two classes, one class taken as positive and all the remaining classes grouped together as a single negative class. So we would have k classifiers, where k is the number of classes in the dataset I want to predict, and the final class label would be judged based on the agreement of all the weak classifiers of the ensemble. Is that so? I just want to make sure I am following how MATLAB extended binary classification to multi-class classification, and whether it is an extension I have already seen (but only in theory).

My last concern is about licensing; since it relates to this problem, I will post it here, and I hope that is not an issue. I have a package associated with my institution (Stevens Institute of Technology), which is currently on MATLAB R2012a. What is my best option for obtaining this algorithm? Is it possible to buy only a toolbox and plug it into my institution's MATLAB R2012a, or, since this is a student version, must I buy a completely separate R2012b? In any case, what minimum licenses would I need to run RUSBoost: MATLAB R2012b plus the Statistics Toolbox? And lastly, is it possible to run a trial version of this algorithm to see how it behaves with our datasets, if requested by an institution or a professor from academia?
Thank you, Carlos

Ilya on 14 Oct 2012: RUSBoost uses the AdaBoost.M2 algorithm underneath. This is a multiclass algorithm proposed by Freund and Schapire, and it is not reducible to a one-vs-all strategy. I don't remember a published reference off the top of my head, but a Google search finds this: http://users.eecs.northwestern.edu/~yingwu/teaching/EECS510/Reading/Freund_ICML96.pdf. An observation is assigned to the class with the largest score. You need the Statistics Toolbox in R2012b. For licensing and trial questions, please call our customer support.

Answer by Walter Roberson on 13 Oct 2012: Usually multi-class problems are handled by doing pairwise discrimination: class 1 vs. everything else, to pull out class 1; then take the "everything else" and run it against class 2, to get class 2 and a new "everything else"; and so on.

1 Comment

Carlos Andrade on 13 Oct 2012: Hi Walter, thank you for your reply. By pairwise, are you referring to what they call the one-versus-all approach? I found some papers on it, especially on combining it with AdaBoost and ensemble methods, but I only found one implementation, in R. That implementation requires splitting the data, while I found MATLAB's stratified k-fold to be more appropriate for validation in such a case. Could you point out any MATLAB implementation of this that already takes the ensemble method into account? The ones I have found so far do not address it as multi-class. Thank you, Carlos
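Ilya's arithmetic above (each majority observation used about 100/10 = 10 times across a 100-tree ensemble, each minority observation used in every round) can be checked with a small sketch of the per-round random undersampling step. This is a language-agnostic illustration in Python, not MATLAB's RUSBoost implementation; `rus_sample` is a hypothetical helper name.

```python
import numpy as np

def rus_sample(y, rng=None):
    """One round of RUSBoost-style random undersampling (sketch):
    keep an equal-sized random subset of every class, without
    replacement, sized to the rarest class."""
    rng = np.random.default_rng(rng)
    classes, counts = np.unique(y, return_counts=True)
    n_keep = counts.min()                      # size of the rarest class
    idx = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_keep, replace=False)
        for c in classes
    ])
    rng.shuffle(idx)
    return idx
```

With a 10:1 imbalance, each majority observation is drawn in a given round with probability 1/10, so over T = 100 rounds it is used about 100/10 = 10 times on average, while every minority observation appears in all 100 rounds — exactly the numbers quoted in the thread. (The actual MATLAB RUSBoost additionally reweights observations AdaBoost.M2-style between rounds.)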