SEMSUP-XC

SEMANTIC SUPERVISION FOR ZERO & FEW-SHOT EXTREME CLASSIFICATION

Pranjal Aggarwal¹, Ameet Deshpande², Karthik Narasimhan²

¹ Indian Institute of Technology, Delhi | ² Department of Computer Science, Princeton University


Motivation

Extreme classification (XC) considers the scenario of predicting over a very large number of classes (thousands to millions), with real-world applications including serving search engine results, e-commerce product tagging, and news article classification. 

A real-life requirement in this domain is to predict labels unseen during training (zero-shot); however, there has been very little success on this problem so far. To this end, we propose SemSup-XC, a model that achieves state-of-the-art zero-shot (ZS) and few-shot (FS) performance on three extreme classification benchmarks spanning various domains. Instead of treating labels as class ids, our model learns from diverse descriptions of them, thereby attaining a better understanding of the label space, as is evident from both qualitative and quantitative results.

Video: Interactive demo of SemSup-XC (semsupxc_demo.mp4)

Abstract

Extreme classification (XC) considers the scenario of predicting over a very large number of classes (thousands to millions), with real-world applications including serving search engine results, e-commerce product tagging, and news article classification. The zero-shot version of this task involves the addition of new categories at test time, requiring models to generalize to novel classes without additional training data (e.g., one may add a new class “fidget spinner” for e-commerce product tagging). In this paper, we develop SemSup-XC, a model that achieves state-of-the-art zero-shot (ZS) and few-shot (FS) performance on three extreme classification benchmarks spanning the domains of law, e-commerce, and Wikipedia. SemSup-XC builds upon the recently proposed framework of semantic supervision, which uses semantic label descriptions to represent and generalize to classes (e.g., “fidget spinner” described as “A popular spinning toy intended as a stress reliever”). Specifically, we use a combination of contrastive learning, a hybrid lexico-semantic similarity module, and automated description collection to train SemSup-XC efficiently over extremely large class spaces. SemSup-XC significantly outperforms baselines and state-of-the-art models on all three datasets, by up to 6-10 precision@1 points on zero-shot classification and >10 precision points on few-shot classification, with similar gains for recall@10 (3 for zero-shot and 2 for few-shot). Our ablation studies show the relative importance of the various components and establish the combined importance of the proposed architecture and automatically scraped descriptions, with improvements of up to 33 precision@1 points. Furthermore, qualitative analyses demonstrate that SemSup-XC understands the label space better than other state-of-the-art models.
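The idea of representing classes by descriptions rather than ids can be illustrated with a toy sketch. This is not the SemSup-XC implementation: the bag-of-words `embed` function is only a stand-in for the learned encoders and the hybrid lexico-semantic module, and the "board game" and "phone case" labels and their descriptions are hypothetical. Only the "fidget spinner" description comes from the abstract. The sketch shows that a new class can be added at test time by supplying a description, with no retraining.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a learned text encoder: a bag-of-words count vector."""
    cleaned = text.lower().replace(",", " ").replace(".", " ")
    return Counter(cleaned.split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_labels(document, label_descriptions):
    """Rank labels by similarity between the document and each label description."""
    doc_vec = embed(document)
    scores = {
        label: cosine(doc_vec, embed(description))
        for label, description in label_descriptions.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical labels assumed to be available during training.
label_descriptions = {
    "board game": "A tabletop game played with pieces on a marked board",
    "phone case": "A protective cover for a mobile phone",
}

# A class added at test time: only its description is needed.
label_descriptions["fidget spinner"] = (
    "A popular spinning toy intended as a stress reliever"
)

document = "Small spinning toy that works as a stress reliever for kids and adults"
ranking = rank_labels(document, label_descriptions)
print(ranking[0])
```

In the actual model the similarity function is learned with contrastive training; the point here is only that the label set is an input to scoring rather than a fixed output layer.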

How SemSup-XC Works

Results

Zero-Shot Classification

SemSup-XC (blue) outperforms all other baselines by significant margins on zero-shot classification, where the task is to classify documents into labels that were unseen during training.

Generalized Zero-Shot Classification

SemSup-XC (blue) outperforms all other baselines by significant margins on generalized zero-shot classification, where the task is to classify documents into both seen and unseen labels.
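The difference between the two settings comes down to which labels a prediction may be drawn from. The following is a minimal sketch under stated assumptions: the label names, scores, and the `precision_at_k` helper are all hypothetical, and the exact evaluation protocol used in the paper may differ. It shows why the generalized setting is harder, since a confidently scored seen label can outrank the correct unseen one.

```python
def precision_at_k(scores, gold, candidates, k=1):
    """Fraction of the top-k candidate labels (by score) that are gold labels."""
    ranked = sorted(candidates, key=lambda label: scores[label], reverse=True)[:k]
    return sum(label in gold for label in ranked) / k


# Hypothetical label split: seen during training vs. introduced at test time.
seen_labels = {"contract law", "tax"}
unseen_labels = {"data protection", "fisheries"}

# Hypothetical model scores for one document whose gold label is unseen.
scores = {
    "contract law": 0.9,
    "tax": 0.2,
    "data protection": 0.7,
    "fisheries": 0.1,
}
gold = {"data protection"}

# Zero-shot: rank only the unseen labels.
zs_p1 = precision_at_k(scores, gold, unseen_labels)

# Generalized zero-shot: rank seen and unseen labels together.
gzs_p1 = precision_at_k(scores, gold, seen_labels | unseen_labels)

print(zs_p1, gzs_p1)
```

Here the document is classified correctly in the zero-shot setting, but in the generalized setting the seen label "contract law" takes the top rank, so precision@1 drops to zero for this example.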

Few-Shot Classification

Figure panels: Eurlex (Legal) Dataset | Amazon Dataset

SemSup-XC consistently outperforms all other baselines from 1-shot to 20-shot. Interestingly, even our zero-shot performance is almost equivalent to the 20-shot performance of state-of-the-art zero-shot methods.

Qualitative Analysis

Qualitatively, SemSup-XC's predictions are much better than those of the next-best-performing baseline (MACLR). Examples: