The quality of machine translation by neural machine translation (NMT) systems and large language models (LLMs) has improved dramatically; depending on the language pair and domain, it can even surpass human translation. However, there is currently no universal method for accurately evaluating machine translation. Even widely used metrics such as COMET have been reported to yield unstable or inaccurate results when applied to texts from domains outside COMET's training data.
The same applies to patent translation. While average translation quality has improved significantly, it remains difficult to accurately evaluate aspects such as appropriate use of terminology and terminological consistency. Patent claims pose additional challenges because of their length and distinctive writing style, which make accurate evaluation even more difficult.
We are therefore organizing a shared task on Japanese-English patent claim translation. Its goal is not only to compare translation quality but, ultimately, to develop an automatic evaluation method that can accurately assess translation results.
Test Period: September 29 - October 6, 2025
System Description Paper Submission Deadline (Shared Tasks): October 27, 2025
Review Feedback on System Description Papers: November 3, 2025
Camera-ready Deadline: November 11, 2025
Workshop Dates: December 24, 2025
Japanese-to-English Patent Claims Translation
English-to-Japanese Patent Claims Translation
JaParaPat, an English-Japanese parallel corpus of patent applications, is provided as training data.
You may also use training data or models other than JaParaPat, but you must describe all training data and models used by your system in your system description paper.
Sentence pairs automatically extracted from patent document families filed between 2016 and 2020.
About 107M sentence pairs
Includes sentences from the TITLE, ABSTRACT, DESCRIPTION, and CLAIM sections
Includes labels indicating whether each sentence comes from 1) Patent Cooperation Treaty (PCT) international applications, 2) patents originally filed in Japan, 3) patents originally filed in the US, or 4) patents originally filed in neither Japan nor the US
Includes document IDs, paragraph IDs, and sentence IDs (however, the original documents cannot be reconstructed)
May contain one-to-many and many-to-many sentence pairs
You can download the development data here.
NOTE: The file 10_US7328829B2.txt is missing from the ref_1 directory of the en-ja development data.
You can download the test data here.
Each file contains one or more patent claims extracted from a single patent document.
Patent claims are separated by blank lines.
Each patent claim may contain line breaks.
When submitting your translations, give each output file the same name as its input file and use the same format described above.
Be sure to put a blank line between consecutive patent claims.
Automatic evaluation will be conducted claim by claim, and line breaks within a claim will be removed before scoring.
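Because claims are delimited only by blank lines, a small script can help avoid misaligned submissions. The following is a minimal Python sketch, assuming UTF-8 text files and treating whitespace-only lines as blank lines. All function names are hypothetical, and `flatten_claim` only approximates the organizers' line-break removal: whether line breaks are replaced by a space or by nothing (which matters for Japanese) is not stated, so the joiner is an assumed parameter.

```python
import re
from pathlib import Path

# Assumption: a "blank line" may contain only whitespace.
_BLANK_LINE = re.compile(r"\n\s*\n")


def split_claims(text: str) -> list[str]:
    """Split one file's contents into claims separated by blank lines.

    Line breaks inside a claim are preserved; CRLF endings are normalized.
    """
    text = text.replace("\r\n", "\n").strip()
    if not text:
        return []
    return [block.strip() for block in _BLANK_LINE.split(text)]


def flatten_claim(claim: str, joiner: str = " ") -> str:
    """Remove line breaks inside a claim (approximation of the evaluation step).

    Hypothetical parameter: `joiner` is " " here for English output; for
    Japanese output, "" is probably more appropriate.
    """
    return joiner.join(line.strip() for line in claim.splitlines() if line.strip())


def format_submission(claims: list[str]) -> str:
    """Serialize translated claims with exactly one blank line between claims."""
    for claim in claims:
        # A blank line inside a claim would silently split it into two claims.
        if _BLANK_LINE.search(claim.strip()):
            raise ValueError("a translated claim contains a blank line")
    return "\n\n".join(claim.strip() for claim in claims) + "\n"


def check_submission(src_dir: Path, out_dir: Path) -> list[str]:
    """List missing output files and claim-count mismatches (assumes *.txt inputs)."""
    problems = []
    for src in sorted(src_dir.glob("*.txt")):
        out = out_dir / src.name  # the output must reuse the input filename
        if not out.exists():
            problems.append(f"missing output: {src.name}")
            continue
        n_src = len(split_claims(src.read_text(encoding="utf-8")))
        n_out = len(split_claims(out.read_text(encoding="utf-8")))
        if n_src != n_out:
            problems.append(f"{src.name}: {n_src} claims in input, {n_out} in output")
    return problems


if __name__ == "__main__":
    sample = (
        "1. A device comprising:\na sensor; and\na controller.\n\n"
        "2. The device of claim 1, wherein the sensor is optical.\n"
    )
    claims = split_claims(sample)
    print(len(claims))  # 2
    print(flatten_claim(claims[0]))  # 1. A device comprising: a sensor; and a controller.
    print(format_submission(claims) == sample)  # True
```

Running `check_submission` over the test input directory and your output directory before sending results is one way to catch a dropped blank line, which would otherwise merge two claims and shift every subsequent claim-level score.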
Please submit your translation results to pat-claim-mt .at. googlegroups.com.
BLEU
COMET
etc.
Budget permitting, we will also conduct human evaluation based on the ESA (Error Span Annotation) protocol, following WMT.