W. Dhifli, A. B. Diallo. Toward an Efficient Multi-class Classification in an Open Universe. 12th International Conference on Machine Learning and Data Mining (MLDM), New York City, NY, USA, 2016 (Best paper award).
W. Dhifli, A. B. Diallo. Face Recognition in the Wild. 20th International Conference on Knowledge-Based and Intelligent Information & Engineering Systems (KES), York, UK, 2016.
Standard classification methods are designed to assign unknown instances to one of a set of classes seen during training, so prediction takes place within a closed set. A more realistic scenario for real-world applications, however, must account for instances that belong to none of the classes seen in training, i.e., open-set classification. In such a situation, closed-set classifiers still assign one of the training labels to these instances, resulting in a misclassification.
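The cost of that forced assignment can be made concrete with a little arithmetic. The sketch below (a hypothetical helper, not part of the authors' code) computes the best accuracy any closed-set classifier can reach when part of the test set comes from unseen classes: even if every instance of a known class is classified correctly, every unknown-class instance is an error.

```python
def closed_set_accuracy_ceiling(n_known, n_unknown):
    """Best accuracy a closed-set classifier can reach on an open-set test set.

    n_known:   number of test instances whose class was seen in training
    n_unknown: number of test instances from classes never seen in training

    A closed-set classifier always outputs one of the training labels, so
    every unknown-class instance is misclassified even if all known ones
    are predicted correctly.
    """
    total = n_known + n_unknown
    if total == 0:
        raise ValueError("empty test set")
    return n_known / total


# With equally sized classes, 5 seen classes plus k unseen ones give a
# ceiling of 5 / (5 + k), so the class counts can stand in for instance counts.
for k in range(6):
    print(k, round(closed_set_accuracy_ceiling(5, k), 3))
```

Under the equal-size assumption, five seen and five unseen classes already cap closed-set accuracy at 0.5, regardless of how good the underlying classifier is.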
We introduce Galaxy-X, a novel multi-class classification method for open-set problems. Our method distinguishes instances that resemble previously seen classes from instances of unseen classes. Galaxy-X is highly flexible thanks to a softening parameter δ that extends or shrinks class boundaries, adding more generalization or specialization to the classification models.
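To illustrate what a softening parameter on class boundaries does, here is a deliberately simple stand-in. It is NOT the Galaxy-X algorithm: we assume, purely for illustration, that each class is enclosed by a ball around its centroid whose radius is the largest training distance, scaled by (1 + δ). The names `fit_balls` and `predict` are hypothetical, and the sign convention (δ < 0 shrinks, δ > 0 extends) is our assumption.

```python
import math


def fit_balls(X, y):
    """Hypothetical boundary model (not Galaxy-X): one ball per class.

    Returns {label: (centroid, radius)}, where radius is the largest
    distance from a training point of that class to its centroid.
    """
    groups = {}
    for x, label in zip(X, y):
        groups.setdefault(label, []).append(x)
    model = {}
    for label, pts in groups.items():
        dim = len(pts[0])
        centroid = tuple(sum(p[i] for p in pts) / len(pts) for i in range(dim))
        radius = max(math.dist(p, centroid) for p in pts)
        model[label] = (centroid, radius)
    return model


def predict(model, x, delta=0.0):
    """Assign x to the nearest class whose softened ball contains it, else "unknown".

    delta > 0 extends every boundary (more generalization);
    delta < 0 shrinks it (more specialization).
    """
    best, best_d = "unknown", math.inf
    for label, (centroid, radius) in model.items():
        d = math.dist(x, centroid)
        if d <= radius * (1.0 + delta) and d < best_d:
            best, best_d = label, d
    return best


X = [(0, 0), (0, 2), (2, 0), (2, 2), (10, 10), (10, 12), (12, 10), (12, 12)]
y = ["a"] * 4 + ["b"] * 4
model = fit_balls(X, y)
print(predict(model, (5, 5)))                 # far from both classes
print(predict(model, (2.2, 1)))               # inside class "a"'s ball
print(predict(model, (2.2, 1), delta=-0.3))   # shrunken ball rejects it
```

The same query point flips between a known label and "unknown" as δ moves, which is the generalization/specialization trade-off the softening parameter controls.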
This figure illustrates how a closed-set classification problem turns into a challenging multi-class open-set recognition problem. It reports classification accuracy (Figure (a)) and weighted f-measure (Figure (b)) on the classification of handwritten digits for a standard closed-set classifier (SVM) and an open-set classifier (our approach: Galaxy-SVM); both use SVM with a linear kernel. In the experiment, only five digits were used in training, and instances of the remaining digits were held out. Each classification scenario was repeated for five runs; in each run, the instances of the five training classes were divided with a stratified shuffle split, 75% for training and 25% for testing. Successive scenarios progressively added to the test set the instances of one more held-out digit. Instances whose labels were unseen in training should be predicted as "unknown". As the number of unknown classes in the test set grows, the accuracy and f-measure of SVM drop sharply, whereas our approach maintains higher performance, especially in the open-set scenarios.
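The evaluation protocol described above can be sketched in plain Python. This is our reading of the protocol, not the authors' script: the helper names are hypothetical, the choice of which five digits are seen is arbitrary here, and we assume each added held-out digit contributes all of its instances to the test set. Changing `seed` per run reproduces the "five runs" of stratified shuffle splits.

```python
import random


def stratified_shuffle_split(labels, train_frac=0.75, seed=0):
    """Split indices of `labels` per class so each class keeps ~train_frac in training."""
    rng = random.Random(seed)
    by_class = {}
    for i, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        idx = idx[:]  # copy before shuffling
        rng.shuffle(idx)
        cut = round(train_frac * len(idx))
        train_idx.extend(idx[:cut])
        test_idx.extend(idx[cut:])
    return sorted(train_idx), sorted(test_idx)


def open_set_scenarios(seen, held_out):
    """Scenario k tests on the seen classes plus the first k held-out classes."""
    for k in range(len(held_out) + 1):
        yield k, list(seen) + list(held_out[:k])


def to_open_set_targets(y_true, seen, unknown="unknown"):
    """Evaluation ground truth: labels unseen in training become "unknown"."""
    seen = set(seen)
    return [lab if lab in seen else unknown for lab in y_true]


# Toy usage: 10 digit labels with 8 instances each; digits 0-4 are seen (an assumption).
labels = [d for d in range(10) for _ in range(8)]
seen, held_out = [0, 1, 2, 3, 4], [5, 6, 7, 8, 9]
known_labels = [lab for lab in labels if lab in seen]
for run in range(5):
    train_idx, test_idx = stratified_shuffle_split(known_labels, seed=run)
    print(run, len(train_idx), len(test_idx))
for k, test_classes in open_set_scenarios(seen, held_out):
    print(k, test_classes)
```

For an actual experiment, scikit-learn's `StratifiedShuffleSplit` provides the same per-class splitting.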
Accuracy (a) and weighted f-measure (b) of a standard closed-set classifier (SVM) versus our approach (Galaxy-SVM) in simulated closed-set and open-set classification scenarios. For each scenario, the x-axis shows the number of classes seen in training and the number of classes present in testing.
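Panel (b) reports a weighted f-measure. Under its common definition (for instance scikit-learn's `f1_score(..., average="weighted")`), per-class F1 scores are averaged with weights equal to each class's number of true instances. The pure-Python sketch below assumes that definition and treats "unknown" as an ordinary label; whether the original evaluation did exactly this is an assumption on our part.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores ("unknown" is just another label)."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / n
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += f1 * n / total
    return score


# A closed-set classifier forced to label an unknown instance as "0":
print(weighted_f1(["0", "unknown"], ["0", "0"]))
```

Labels that occur only in the predictions have zero support, so they receive zero weight, which is why they are not iterated over.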
How Open is an Open-set Classification?
We propose openness as a measure of how open a classification scenario S is, defined as the fraction of the labels of dataset D that are unseen in training:
openness(S) = |UnseenLabels(S)| / |Labels(D)|
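The formula translates directly into code. In this sketch (function name hypothetical), the dataset's label set is passed explicitly and the training label set determines which labels count as unseen.

```python
def openness(train_labels, dataset_labels):
    """Fraction of the dataset's labels that never appear in training."""
    dataset = set(dataset_labels)
    if not dataset:
        raise ValueError("dataset has no labels")
    unseen = dataset - set(train_labels)
    return len(unseen) / len(dataset)


# Ten digit classes, five of them used in training:
print(openness(range(5), range(10)))
# The Olivetti faces dataset has 40 subjects; training on 20 of them:
print(openness(range(20), range(40)))
```

A closed-set scenario, where every label of D is seen in training, has openness 0.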
- We compare our approach with the multi-class classification strategy One-vs.-Rest (OvR-SVM), the open-set multi-class classifier One-vs.-Set SVM (OvS-SVM), and OCSVM+OvR-SVM. OCSVM+OvR-SVM is a two-step open-set multi-class classifier that we built from a one-class SVM (OCSVM) with an RBF kernel, trained on all training instances treated as a single super-class, followed by an OvR-SVM. Instances rejected by the OCSVM are labeled as "unknown"; the remaining instances are classified by the OvR-SVM.
- For our approach, we use SVM as the closed-set classifier. We report results for Galaxy-SVM with a fixed softening value δ = -0.3 and for H(yper)-Galaxy-SVM, which uses the optimal δ value for each openness value.
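The two-step OCSVM+OvR-SVM baseline reduces to "gate, then classify". The sketch below keeps the models abstract: `is_inlier` stands for the one-class model trained on all training instances as one super-class, and `closed_set_predict` stands for the OvR classifier. The function name and the toy stand-in models are hypothetical.

```python
from typing import Callable, Hashable, Iterable, List


def two_step_predict(
    X: Iterable,
    is_inlier: Callable[[object], bool],
    closed_set_predict: Callable[[object], Hashable],
    unknown: Hashable = "unknown",
) -> List[Hashable]:
    """Label instances rejected by the one-class gate as unknown; classify the rest."""
    return [closed_set_predict(x) if is_inlier(x) else unknown for x in X]


# Toy stand-ins: the "gate" accepts values in [-10, 10], the "classifier" uses the sign.
preds = two_step_predict(
    [-3, 4, 50],
    is_inlier=lambda x: abs(x) <= 10,
    closed_set_predict=lambda x: "neg" if x < 0 else "pos",
)
print(preds)
```

With scikit-learn, `is_inlier` could wrap a fitted `OneClassSVM(kernel="rbf")`, whose `predict` returns +1 for inliers and -1 for outliers, and `closed_set_predict` could wrap `OneVsRestClassifier(SVC(...))`. In practice, one would call both models once on the whole batch rather than per instance. The remaining hyperparameters of the original baseline are not given here.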
Visualization of the handwritten digits dataset. Colors in (b) indicate ground-truth class membership.
F-measure (a) and rejection f-measure (b) of H-Galaxy-SVM, Galaxy-SVM (δ = -0.3), OvS-SVM, OCSVM+OvR-SVM, and OvR-SVM in open-set classification of the handwritten digits dataset at different openness values.
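The rejection f-measure is not defined in this text. One natural reading, which we assume here, is the F1 score of the binary "reject as unknown" decision: a true positive is an unseen-class instance predicted as "unknown". A minimal sketch under that assumption:

```python
def rejection_f1(y_true, y_pred, unknown="unknown"):
    """F1 of predicting `unknown` (assumed definition of the rejection f-measure)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == unknown and p == unknown)
    fp = sum(1 for t, p in pairs if t != unknown and p == unknown)
    fn = sum(1 for t, p in pairs if t == unknown and p != unknown)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


print(rejection_f1(["0", "1", "unknown", "unknown"], ["0", "unknown", "unknown", "1"]))
```

Under this definition, a classifier that never outputs "unknown", such as a plain closed-set classifier, always scores 0.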
F-measure of OvR-SVM and Galaxy-SVM in open-set classification of the handwritten digits dataset with openness 0.5 and different softening (δ) values.
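A δ sweep like the one in this figure, and the per-openness "optimal δ" used by H-Galaxy-SVM, can be expressed as a simple grid search. How the optimum is chosen is not stated here, so the sketch assumes the δ with the highest f-measure is kept. `evaluate` is a hypothetical callback that would train and score the classifier for one δ; the lambda below is a placeholder curve, not real results.

```python
def sweep_delta(deltas, evaluate):
    """Score every candidate softening value and return (best_delta, scores).

    evaluate(delta) is assumed to return an f-measure for that delta;
    on ties, the first delta in `deltas` is kept.
    """
    scores = {d: evaluate(d) for d in deltas}
    best = max(scores, key=scores.get)
    return best, scores


# Placeholder scoring curve standing in for a real train/evaluate loop:
best, scores = sweep_delta([-0.3, -0.1, 0.1, 0.3], lambda d: 1.0 - abs(d - 0.1))
print(best)
```

If δ is selected on the same data used for reporting, the resulting score reflects an oracle choice of δ. Selecting it on a separate validation split gives an estimate that does not depend on the test labels.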
Visualization of the Olivetti faces dataset. Colors in (b) indicate ground-truth class membership.
F-measure (a) and rejection f-measure (b) of H-Galaxy-SVM, Galaxy-SVM (δ = -0.3), OvS-SVM, OCSVM+OvR-SVM, and OvR-SVM in open-set classification of the Olivetti faces dataset at different openness values.
F-measure of Galaxy-SVM and OvR-SVM in open-set classification of the Olivetti faces dataset with openness 0.5 and different softening (δ) values.