SEMSUP-XC
SEMANTIC SUPERVISION
FOR ZERO & FEW-SHOT
EXTREME CLASSIFICATION
Pranjal Aggarwal1, Ameet Deshpande2, Karthik Narasimhan2
1 Indian Institute of Technology, Delhi | 2 Department of Computer Science, Princeton University
Motivation
Extreme classification (XC) considers the scenario of predicting over a very large number of classes (thousands to millions), with real-world applications including serving search engine results, e-commerce product tagging, and news article classification.
A practical requirement in this domain is predicting labels that were unseen during training (zero-shot), yet there has been very little success on this problem. To this end, we propose SemSup-XC, a model that achieves state-of-the-art zero-shot (ZS) and few-shot (FS) performance on three extreme classification benchmarks spanning law, e-commerce, and Wikipedia. Instead of treating labels as class IDs, our model learns from diverse descriptions of them, thereby attaining a better understanding of the label space, as is evident from both qualitative and quantitative results.
Abstract
Extreme classification (XC) considers the scenario of predicting over a very large number of classes (thousands to millions), with real-world applications including serving search engine results, e-commerce product tagging, and news article classification. The zero-shot version of this task involves the addition of new categories at test time, requiring models to generalize to novel classes without additional training data (e.g. one may add a new class “fidget spinner” for e-commerce product tagging). In this paper, we develop SEMSUP-XC, a model that achieves state-of-the-art zero-shot (ZS) and few-shot (FS) performance on three extreme classification benchmarks spanning the domains of law, e-commerce, and Wikipedia. SEMSUP-XC builds upon the recently proposed framework of semantic supervision that uses semantic label descriptions to represent and generalize to classes (e.g., “fidget spinner” described as “A popular spinning toy intended as a stress reliever”). Specifically, we use a combination of contrastive learning, a hybrid lexico-semantic similarity module and automated description collection to train SEMSUP-XC efficiently over extremely large class spaces. SEMSUP-XC significantly outperforms baselines and state-of-the-art models on all three datasets, by up to 6-10 precision@1 points on zero-shot classification and >10 precision points on few-shot classification, with similar gains for recall@10 (3 for zero-shot and 2 for few-shot). Our ablation studies show the relative importance of various components and conclude the combined importance of the proposed architecture and automatically scraped descriptions with improvements up to 33 precision@1 points. Furthermore, qualitative analyses demonstrate SEMSUP-XC’s better understanding of label space than other state-of-the-art models.
Working of SemSup-XC
SemSup-XC uses an automated approach to extract high-quality descriptions for each label. It first queries a search engine with the label and extracts the top results. A series of heuristics is then applied to remove unwanted results such as advertisements, spam, and uninformative or explicit content. The filtered content serves as the set of descriptions that provide semantic information about the label. Example: for the label "Video Surveillance", the corresponding description is: 'It is a surveillance system capable of capturing images and videos that can be compressed, stored or sent over communication networks.'
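The filtering step can be illustrated with a minimal sketch. The page does not list the actual heuristics, so the marker lists, the word-count threshold, and the function names below are all hypothetical, and the search-engine query itself is replaced by a hard-coded list of snippets.

```python
import re

# Hypothetical blocklists and threshold; the real heuristics are not given here.
AD_MARKERS = ("sponsored", "buy now", "click here", "free shipping")
EXPLICIT_TERMS = ("xxx",)
MIN_WORDS = 6  # assumed cut-off below which a snippet counts as uninformative


def is_informative(snippet: str) -> bool:
    """Return True if a scraped snippet passes all of the assumed filters."""
    text = snippet.lower()
    if any(marker in text for marker in AD_MARKERS):
        return False  # looks like an advertisement or spam
    if any(term in text for term in EXPLICIT_TERMS):
        return False  # explicit content
    words = re.findall(r"[a-z0-9]+", text)
    return len(words) >= MIN_WORDS  # very short snippets carry little meaning


def filter_descriptions(snippets: list[str]) -> list[str]:
    """Keep informative snippets, dropping case-insensitive duplicates."""
    seen, kept = set(), []
    for snippet in snippets:
        key = " ".join(snippet.lower().split())
        if key not in seen and is_informative(snippet):
            seen.add(key)
            kept.append(snippet.strip())
    return kept


# Stand-in for the top search results for the label "Video Surveillance".
snippets = [
    "Buy now! Video surveillance cameras with free shipping.",
    "It is a surveillance system capable of capturing images and videos "
    "that can be compressed, stored or sent over communication networks.",
    "Video surveillance.",
]
descriptions = filter_descriptions(snippets)
```

In this toy run only the second snippet survives: the first is caught by the advertisement markers and the third is too short to describe the label.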
Before training, SemSup-XC creates a shortlist of labels for each document using TF-IDF scores between documents and label descriptions. SemSup-XC is then trained with contrastive learning and hard negative mining, where the hard negative labels are chosen based on these TF-IDF scores. This design makes SemSup-XC almost 1000 times faster, as well as memory efficient, on datasets containing millions of labels.
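A rough sketch of TF-IDF shortlisting and hard negative selection is shown below. It is an illustration under simplifying assumptions, not the authors' pipeline: the tokenizer is a plain whitespace split, IDF is fitted on the document and label descriptions together, and the names `shortlist` and `hard_negatives` are made up for this example.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def tfidf_vectors(texts):
    """Build sparse, L2-normalised TF-IDF vectors (dict: token -> weight)."""
    tokenized = [tokenize(t) for t in texts]
    n = len(tokenized)
    df = Counter(tok for toks in tokenized for tok in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed IDF, so tokens present in every text keep a positive weight.
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity of two already-normalised sparse vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def shortlist(doc, label_descriptions, k):
    """Return the k labels whose descriptions are most similar to the document."""
    labels = list(label_descriptions)
    vecs = tfidf_vectors([doc] + [label_descriptions[l] for l in labels])
    doc_vec, label_vecs = vecs[0], vecs[1:]
    ranked = sorted(zip(labels, label_vecs),
                    key=lambda pair: cosine(doc_vec, pair[1]), reverse=True)
    return [label for label, _ in ranked[:k]]


def hard_negatives(shortlisted, gold):
    """Shortlisted labels that are not gold labels: lexically close but wrong."""
    return [label for label in shortlisted if label not in gold]


label_descriptions = {
    "video surveillance": "a surveillance system capturing images and videos sent over networks",
    "powered mixers": "an audio mixer with a built in power amplifier",
    "textbooks": "a book used for the study of a subject in schools",
}
doc = "wireless camera system that records videos and images over home networks"
shortlisted = shortlist(doc, label_descriptions, k=2)
negatives = hard_negatives(shortlisted, gold={"video surveillance"})
```

Scoring only the shortlisted labels during training, instead of every label, is what keeps the cost manageable when the label space runs into the millions.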
The input document and the label descriptions are passed through two separate transformer encoders (BERT). However, instead of relying only on the [CLS] embeddings for classification, we propose a hybrid lexico-semantic matching module, which ensures that the model uses both the semantic information and the fine-grained lexical information present in the document and the label descriptions.
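One way to picture hybrid lexico-semantic matching is a score that adds a [CLS]-level similarity to a term computed over tokens shared by the document and the label description. The sketch below assumes an exact-match lexical term and uses toy 2-dimensional vectors in place of BERT outputs; the function name, the `lexical_weight` parameter, and the precise formula are assumptions for illustration and may differ from the module used in SemSup-XC.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def hybrid_score(doc_tokens, doc_vecs, doc_cls,
                 label_tokens, label_vecs, label_cls,
                 lexical_weight=1.0):
    """Semantic [CLS] similarity plus an exact-match lexical term (assumed form)."""
    semantic = dot(doc_cls, label_cls)
    lexical = 0.0
    for l_tok, l_vec in zip(label_tokens, label_vecs):
        # For each label-description token, use the best-matching occurrence
        # of the same token in the document, if there is one.
        matches = [dot(l_vec, d_vec)
                   for d_tok, d_vec in zip(doc_tokens, doc_vecs)
                   if d_tok == l_tok]
        if matches:
            lexical += max(matches)
    return semantic + lexical_weight * lexical


# Toy contextual embeddings standing in for the outputs of the two encoders.
doc_tokens = ["audio", "mixer"]
doc_vecs = [[1.0, 0.0], [0.0, 1.0]]
doc_cls = [0.5, 0.5]

label_tokens = ["mixer", "studio"]
label_vecs = [[0.0, 2.0], [1.0, 1.0]]
label_cls = [1.0, 1.0]

score = hybrid_score(doc_tokens, doc_vecs, doc_cls,
                     label_tokens, label_vecs, label_cls)
semantic_only = hybrid_score(doc_tokens, doc_vecs, doc_cls,
                             label_tokens, label_vecs, label_cls,
                             lexical_weight=0.0)
```

Here the shared token "mixer" contributes to the lexical term while "studio" has no match in the document, so it can only influence the score through the [CLS] embeddings.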
Results
Zero-Shot Classification
SemSup-XC (blue) outperforms all other baselines by significant margins on zero-shot classification, where the task is to classify documents over labels unseen during training.
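The abstract reports results in terms of precision@1 and recall@10. For reference, a minimal sketch of the standard definitions of these ranking metrics for a single document is given below; the function names and the toy ranking are illustrative only and are not taken from the paper's evaluation code.

```python
def precision_at_k(ranked_labels, gold_labels, k):
    """Fraction of the top-k predicted labels that are correct."""
    hits = sum(1 for label in ranked_labels[:k] if label in gold_labels)
    return hits / k


def recall_at_k(ranked_labels, gold_labels, k):
    """Fraction of the gold labels that appear among the top-k predictions."""
    hits = sum(1 for label in ranked_labels[:k] if label in gold_labels)
    return hits / len(gold_labels)


# Toy ranking for one document; labels are ordered by model score.
ranked = ["textbooks", "literature & fiction", "powered mixers", "studio recording equipment"]
gold = {"textbooks", "studio recording equipment", "video surveillance"}

p_at_1 = precision_at_k(ranked, gold, k=1)
r_at_4 = recall_at_k(ranked, gold, k=4)
```

Benchmark numbers are obtained by averaging these per-document values over the test set.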
Generalized Zero-Shot Classification
SemSup-XC (blue) outperforms all other baselines by significant margins on generalized zero-shot classification, where the task is to classify documents over both seen and unseen labels.
Few Shot Classification
Figure panels: Eurlex (Legal) Dataset | Amazon Dataset
SemSup-XC consistently outperforms all other baselines from 1-shot to 20-shot. Interestingly, even our zero-shot performance is almost equivalent to 20-shot performance of state-of-the-art zero-shot methods.
Qualitative Analysis
Qualitatively, SemSup-XC's predictions are much better than those of the next best-performing baseline (MACLR). Examples:
In the first example, even from the short text in the document, SemSup-XC is able to figure out that the item is not just a book but a textbook. While MACLR predicts five labels that are all similar, SemSup-XC predicts diverse labels and still includes the correct label among its five predictions.
In the second example, SemSup-XC recognizes that the content of the document is a story and hence predicts literature & fiction, whereas MACLR tries to predict labels for the contents of the story instead. This shows the nuanced understanding of the label space that SemSup-XC has learned.
The third example illustrates SemSup-XC's semantic understanding of the label space. While MACLR predicts labels like powered mixers because of the presence of the word mixer, SemSup-XC is able to understand the text at a high level and predict labels like studio recording equipment, even though the document has no explicit mention of the words studio, recording or equipment.