HAT19 - Quality Estimation in Practice: from Implementation to State-of-the-Art

Abstract:

Neural Machine Translation (NMT) models have gone one step further than previous statistical (SMT) approaches by providing highly fluent translations. However, it is not uncommon for them to produce hallucinated or inadequate translations. An increasingly common scenario is to evaluate an NMT translation to detect whether it is good or flawed and to act accordingly, i.e., either deliver the translation or request human post-editing. A major complicating factor in this scenario is that neural models are generally overconfident, meaning that even very wrong translations are generated with a high internal probability or confidence. The Quality Estimation (QE) task aims at evaluating machine translations using models that are fully decoupled from the translation system. The WMT shared task on QE has been the main venue for models and resources. Although the task has been held for longer, evaluating neural model translations only started in 2017, and it posed a much harder challenge than the previous SMT data. In this talk I will first present how Unbabel reproduced the winning systems of 2017 and 2018 with comparable or better numbers, and how they were implemented in an easy-to-use, open-source QE framework called OpenKiwi. I will give examples of how to download the available pretrained models and replicate the published results with a few lines of code or a single command. OpenKiwi is being used successfully inside Unbabel for exactly the scenario described above. I will then talk about how Unbabel built several new models on top of OpenKiwi to ultimately win the WMT19 QE shared task across the board. Finally, I will talk about less desirable features of current QE systems and the challenges in making QE more useful.

Bio:

Fabio Kepler is a senior research scientist on the fundamental AI team at Unbabel, leading and participating in projects on machine translation and Quality Estimation (QE), whose most recent outcomes were an open-source state-of-the-art framework for QE and the top rank in the WMT QE competition across all tasks. Prior to his current role, Fabio was a full-time professor for almost seven years at the Federal University of Pampa in Brazil, where, among other things, he created the university's Patent and Technology Transfer Office and became its first director. He has a Ph.D. from the University of Sao Paulo (2010, with an interim period at the University of Pennsylvania), is the author of more than 50 scientific publications, and actively reviews for top NLP conferences.
