Workshop at EMNLP 2020
Online (due to Covid-19 pandemic)
Friday, 20 November 2020
The Natural Language Processing (NLP) community has, in recent years, demonstrated a notable focus on achieving higher scores on standard benchmarks and taking the lead on community-wide leaderboards (e.g., GLUE, SentEval). While this aspiration has led to improvements in benchmark performance of (predominantly neural) models, it has also resulted in a worrisome increase in model complexity and in the amount of computational resources required for training and using the current state-of-the-art models. Moreover, recent research efforts have, for the most part, failed to identify the sources of empirical gains in models, often failing to empirically justify the model complexity beyond benchmark performance.
Because of these easily observable trends, we propose a workshop that promotes simpler and more sustainable NLP research and practices, with two main objectives: (1) encouraging development of more efficient NLP models; and (2) providing simpler architectures and empirical justification of model complexity. For both aspects, we will encourage submissions from all topical areas of NLP.
Concerning efficiency, we encourage submissions covering models that yield competitive performance but are more efficient in either of the following aspects:
Data and training efficiency: models requiring less training data, fewer computational resources, and/or less time;
Inference efficiency: models with lower computational complexity of prediction/inference.
With respect to justifiability of model complexity, we encourage submissions that:
Justify the complexity of existing or newly proposed NLP models, e.g., by showing that meaningful simplifications of the model lead to significant deterioration in performance, interpretability, and/or robustness;
Introduce a conceptual or practical simplification of an existing model, yielding (1) comparable performance, while (2) offering advantages such as interpretability, inference time, or robustness.
The workshop will encourage novel ways of evaluating and reporting research besides the currently prevalent focus on comparison (using established metrics) with state-of-the-art models on known benchmarks. Concretely, we aim to (1) promote best practices in reporting experimental results, and encourage work that (2) critically analyzes existing evaluation protocols and (3) encourages the development and usage of novel evaluation procedures (see the shared task below).
With the SustaiNLP workshop, we wish to complement existing related events on reproducibility and interpretability (e.g., 4REAL@LREC18, BlackboxNLP) and further encourage the community to justify the complexity of models and design simpler solutions yielding competitive results. Furthermore, our focus on efficiency and justifiability has the potential to stimulate conceptual creativity and novelty in model design, as opposed to current trends where empirical progress is predominantly achieved by increasing model complexity, computational resources, and training data.
We plan to hold a shared task, based on the language understanding benchmark SuperGLUE, that aims to stimulate the development of more efficient models. More precisely, the shared task will focus on evaluating and encouraging an optimal trade-off between model performance and efficiency during inference. We decided to focus the shared task on inference efficiency because it can be difficult to fairly evaluate training efficiency in the most general setting. Moreover, as large-scale pretrained models reach production, the cumulative lifetime environmental cost of these models will likely be dominated by the computational cost of inference, which calls for particular attention to this stage.
Evaluation of the shared task will comprise ranking models according to (inference) efficiency under model performance constraints and, vice versa, according to performance under efficiency constraints. This shared task bears similarities to the Efficient NMT shared task held as part of WNMT, but will target the wider NLP community, as opposed to the narrower MT community.
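As an illustration only, the two ranking directions described above could be sketched as follows. This is a minimal sketch under stated assumptions, not the official evaluation procedure: the `Submission` fields, the thresholds, the tie-breaking rules, and all numbers are hypothetical, and the actual efficiency measure used by the shared task is not specified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Submission:
    """A hypothetical shared-task entry (all fields are illustrative)."""
    name: str
    score: float  # assumed benchmark score, higher is better
    cost: float   # assumed inference cost in arbitrary units, lower is better


def rank_by_efficiency(subs, min_score):
    """Rank by efficiency under a performance constraint.

    Keeps only systems whose score reaches `min_score`, then orders them
    from cheapest to most expensive (ties broken by higher score).
    """
    eligible = [s for s in subs if s.score >= min_score]
    return sorted(eligible, key=lambda s: (s.cost, -s.score))


def rank_by_performance(subs, max_cost):
    """Rank by performance under an efficiency constraint.

    Keeps only systems whose cost stays within `max_cost`, then orders them
    from highest to lowest score (ties broken by lower cost).
    """
    eligible = [s for s in subs if s.cost <= max_cost]
    return sorted(eligible, key=lambda s: (-s.score, s.cost))


# Made-up entries, purely to exercise the two rankings.
subs = [
    Submission("large", score=80.0, cost=100.0),
    Submission("distilled", score=75.0, cost=30.0),
    Submission("tiny", score=60.0, cost=5.0),
]

efficiency_ranking = [s.name for s in rank_by_efficiency(subs, min_score=70.0)]
performance_ranking = [s.name for s in rank_by_performance(subs, max_cost=50.0)]
```

In this toy setup, a system that misses the constraint is simply excluded from the corresponding ranking; whether the shared task treats constraint violations this way is an assumption of the sketch.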