Evaluating the quality of machine translation output with HTER in domain-specific textual environment

Additional information

About the author:

Olena Oleksandrivna Karpina, Candidate of Philological Sciences (Germanic Languages), Senior Lecturer at the Applied Linguistics Department, Lesya Ukrainka Volyn National University (Lutsk, Ukraine).

Correspondence: karpina@vnu.edu.ua

Citation:

Karpina O. Evaluating the quality of machine translation output with HTER in domain-specific textual environment [Text] // Linhvistychni Studiyi / Linguistic Studies : collection of scientific papers / Vasyl' Stus Donetsk National University; Ed. by Anatoliy Zahnitko. Vinnytsia : Vasyl' Stus DonNU, 2023. Vol. 46. Pp. 85-99. ISBN 966-7277-88-7

DOI: https://doi.org/10.31558/1815-3070.2023.46

Publication history:

Issue first published online: 1 November 2023

Article received: 4 September 2023; accepted: 1 October 2023; first published online: 1 November 2023

Abstract.

The article presents a thorough analysis of machine translation quality based on two popular MT systems, Google Translate and DeepL, for the English-Ukrainian language pair. The study focuses on evaluating MT output with the HTER metric in three subject domains: journalism, technical documentation, and legal documents. The evaluation classifies the edits made by humans into insertions, deletions, substitutions, and shifts, each of which showed characteristic rates. The study examines the underlying causes of these edits, which stem from grammatical, stylistic, cultural, and terminological difficulties. Although the results showed fairly high performance for both systems, with the difference between the machine translation and the human-edited version below 1 %, the study highlighted the continuing need for human intervention in the machine translation process.

Keywords: HTER, machine translation quality evaluation, post-editing, edit distance, machine translation error, Google Translate, DeepL.

EVALUATING THE QUALITY OF MACHINE TRANSLATION OUTPUT WITH HTER IN DOMAIN-SPECIFIC TEXTUAL ENVIRONMENT

Olena Karpina

Applied Linguistics Department, Lesya Ukrainka Volyn National University, Lutsk, Ukraine.

Abstract

Background: The adoption of neural networks in MT system design has seriously called into question the role of human translation. The emergence of translation models whose mechanisms imitate the workings of the human brain raised high expectations of an immediate breakthrough. However, despite significant improvements in the accuracy and fluency of AI-powered MT systems, human assistance remains essential in the translation process.

Purpose: The purpose of the research is to evaluate and compare the effectiveness and limitations of the free online services Google Translate and DeepL for the English-Ukrainian language pair across three subject domains using the HTER metric.

Results: Google Translate and DeepL demonstrated a fairly high level of performance, with an edit distance below 1 % in each of the three domains. Nonetheless, it is still too early to speak of self-sufficient MT systems that can operate entirely without human assistance. The main causes of MT errors were identified as terminological issues, including wrong translation equivalents and inconsistent terminology; contextual issues, stemming from the inability to interpret a wider context; accuracy errors due to the gap between the grammatical systems of the source and target languages; fluency concerns; and various cultural and stylistic discrepancies.

Discussion: Journalistic writing proved the most challenging input for both MT systems, with HTER scores of 0.7 % for Google Translate and 0.5 % for DeepL (the percentage indicates the edit distance between the MT output and the human post-edited translation). The errors made by the MT systems are rooted in the stylistic features of this genre, which bears traits of the author's individual style, including idioms, phrasal verbs, and stylistic figures. DeepL performed considerably better in technical writing, with an edit distance of just 0.2 %, while Google Translate performed best in the legal textual environment, with an edit distance of 0.5 %; its result in technical writing was slightly worse, at 0.6 %. DeepL, which outperformed Google Translate in all experimental domains, showed an edit distance of 0.3 % in legal writing.
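For readers unfamiliar with the metric, the usual formulation of HTER follows the TER definition of Snover et al. (2006), with the human post-edited translation serving as the single targeted reference. The abstract does not spell out the exact normalization used in the study, so the formula below is the standard definition rather than a confirmed description of the author's computation:

```latex
\mathrm{HTER} = \frac{I + D + S + Sh}{N_{\mathrm{PE}}} \times 100\,\%
```

Here I, D, S and Sh are the numbers of insertions, deletions, substitutions and shifts needed to turn the MT output into the post-edited version, and N_PE is the number of words in the post-edited version. As a purely hypothetical illustration, 3 edits against a 500-word post-edited text give 3 / 500 = 0.6 %.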

Concerning the types of edits, which HTER categorizes as insertion, deletion, substitution, and shift, the most frequent edit made by human post-editors was substitution, accounting for just over half of all edits made during post-editing. Notably, in legal writing its share rose to 79 % for Google Translate and 80 % for DeepL, which can be explained by inappropriate terminology and by structural challenges arising from the distinct syntactic rules of the source and target languages. The least frequent edit was shift, which did not exceed 4 % in any experimental domain.

Keywords: HTER, MT quality evaluation, post-editing, edit distance, MT error, Google Translate, DeepL.

Vitae

Olena Karpina, PhD in Philology (Germanic Languages), Associate Professor of the Applied Linguistics Department, Lesya Ukrainka Volyn National University.

Her research interests cover translation studies, linguistics of emotion, lexical semantics, and communicative linguistics.

Linguistic Studies

Issue 46, 2023, pp. 85-99

Article first published online: 20 November 2023
