SCOPE
Statistical and neural machine translation (SMT/NMT) methods have been used successfully over the last two decades to build MT systems for many widely used languages, with significant improvements in the quality of automatic translation. However, these methods still rely on a number of natural language processing (NLP) tools to pre-process human-generated texts into the forms required as input, and/or to post-process the output into proper textual form in the target languages.
In many MT systems, the performance of these tools has a great impact on the quality of the resulting translation. However, there has been little discussion of these NLP tools: their methods, their roles in MT systems built with different approaches, and the range of the world's languages they support. In this workshop, we would like to bring together researchers who work on these topics and review which tasks these tools will most need to support for MT in the coming years.
These NLP tools include, but are not limited to, word tokenizers/de-tokenizers, word segmenters, and morphology analysers. We solicit papers dedicated to such supplementary tools for any language, and especially for low-resource languages, and we would like to gather an overview of these tools from our community. Evaluations of these tools in submitted papers should report how they improve the quality of MT output. The workshop also solicits papers on MT systems and methods for low-resource languages in general. The scope of the workshop is not limited to NLP tools for MT pre-processing and post-processing.