Carla Parra Escartín and Manuel Arcedillo
Abstract:
This paper studies the minimum score at which machine translation (MT) evaluation metrics report productivity gains in a machine translation post-editing (MTPE) task. We ran an experiment with 10 professional in-house translators from our company, who were asked to carry out a real translation task involving MTPE, translation from scratch, and fuzzy-match editing. We then analyzed the results and evaluated the MT output with traditional MT evaluation metrics such as BLEU and TER, as well as with the standard used in the translation industry to measure text similarity in translation memory (TM) matches: the fuzzy score. We report where the threshold between productivity gain and productivity loss lies and contrast it with past experience in our company. We also compare the productivity of segments from MT and TM that require similar editing, in order to gain further insight into MTPE cognitive effort and its impact on a fuzzy-based pricing scheme.