Public Studies

Download public TPR-DB data

The CRITT TPR-DB consists of studies that were conducted with Translog-II, Trados, or the CASMACAT workbench. Compiled summary tables can be downloaded via the CRITT TPR-DB management tool:

The raw logging and aligned data for all public sessions are available on SourceForge at https://sourceforge.net/projects/tprdb/ and can be checked out via svn:

svn checkout --username=tprdb https://svn.code.sf.net/p/tprdb/svn/tprdb-svn
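
For scripted downloads, the checkout above can be wrapped in a few lines of Python. This is only a minimal sketch: the function names and the default target directory `tprdb-svn` are illustrative assumptions, not part of the TPR-DB tooling, and it presumes a command-line `svn` client is installed. The URL and username are the ones given in the command above.

```python
import shutil
import subprocess

# URL and username are taken from the svn command shown above.
SVN_URL = "https://svn.code.sf.net/p/tprdb/svn/tprdb-svn"
SVN_USER = "tprdb"


def build_checkout_command(target_dir="tprdb-svn", url=SVN_URL, username=SVN_USER):
    """Return the argument list for `svn checkout` into target_dir.

    target_dir is a hypothetical local folder name chosen for illustration.
    """
    return ["svn", "checkout", f"--username={username}", url, target_dir]


def checkout(target_dir="tprdb-svn"):
    """Run the checkout; raises if svn is missing or the command fails."""
    if shutil.which("svn") is None:
        raise RuntimeError("svn client not found on PATH")
    subprocess.run(build_checkout_command(target_dir), check=True)
```

Calling `checkout("my-tprdb-copy")` would then run the same command as above, placing the working copy in the named folder.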

Post your technical, methodological, and theoretical questions and comments here.

Visualize Translation Progression Graphs

Translation Progression Graphs for public studies can be interactively scrutinized by following this link:

http://critt.as.kent.edu:3838/mcarl6/ProgGraph/
Note: this link works with http, not with https. It works with the MS Edge browser.

The following studies were conducted with Translog-II. Each study name links to the raw data; primary references that make use of the resource are given in brackets. The full references should be listed on the CRITT publication website.

The TPR-DB contains several collections of studies that use the same English source texts. These source texts are identically tokenized and segmented across all studies, but they are used in different text production (or reception) modes and for text production into different languages.

multiLing

The multiLing data set is based on six English source texts which are translated into various languages. Four of them (Texts 1-4) are news articles and the other two (Texts 5-6) are sociological texts from an encyclopedia. The data can be downloaded from here (User: TPRBD, passwd: tprdb). The source text data (project files for Translog-II, tokenized *src files, and texts) can be downloaded from this link. Publications that refer to the data are given in brackets.

Into Arabic

Into Chinese

Into Danish

Into Dutch

Into English

Into German

Into Hindi

Into Japanese

Into Spanish

Into Spanish with permuted segments

missionStatements

13 mission statements in English from different companies, each of which contains approximately 160-190 words. The data can be downloaded from here.

ministerSpeech

This is a speech of 44:50 minutes given by the Minister for Foreign Affairs of Australia during her visit to Japan in 2014. The speech was delivered in English at the National Press Club, Tokyo, Japan. The beginning of the speech is segmented into six short texts (approx. 1 minute each). The source text data (Translog-II project files, videos, WAV files, and some background information) can be downloaded from this link.

Into Chinese

Into German

diverse

Studies that use different source texts: 

Brazilian Portuguese, English

Chinese, English

Chinese, Portuguese

Danish, English

English, Dutch (nl)

Estonian, English

Polish, French

German, English

Spanish, English

Trados Data

Sessions recorded with Trados. Follow the instructions here to download the data. For a description of the Trados logging tool and the conversion into the TPR-DB, see Zou, L., et al. (2023), Yamada et al. (2022), or Zou, L., & Carl, M. (2022).

Data used in Vieira et al. (2023), Translating science fiction in a CAT tool: machine translation and segmentation settings. Translation & Interpreting, Vol. 15, No. 1.

Data used in Zou et al. (2022)

Data used in Gilbert (2022), PhD thesis:

Studies conducted with the CASMACAT workbench include the following. Follow the instructions here to download the data:

Data from the CASMACAT field trial 2014 (CFT14):

Seven post-editors each post-edited two texts, in plain post-editing mode and under active learning conditions. The post-edited texts were revised with handwriting recognition.

Data from the CASMACAT longitudinal study 2014 (LS14):

Five post-editors each post-edited 24 files over a period of 6 weeks between May and June 2014, in plain post-editing and interactive mode. In total there are 120 translation sessions, 35 of which include recorded gaze data.

Data from the CASMACAT field trial 2013 (CFT13):

Logging data of 81 post-editing and revision sessions, more than 120 hours of user activity data, recorded with CASMACAT workbench v.2.0:

Data from the CASMACAT pre-field trial 2013 (PFT13):

This data was recorded with CASMACAT workbench v.2.0:

Data from the CASMACAT field trial 2012 (CFT12):

Logging data of 89 English -> Spanish translation sessions recorded with the CASMACAT workbench v.1.0.