CRITT TPR-DB

The CRITT Translation Process Database (CRITT TPR-DB) is a publicly available database of recorded text processing sessions (mostly translation). It is available under a Creative Commons license (see License below). The TPR-DB consists of a data lake (raw data) of user activity data (UAD) from more than 40 translation (and other text processing) studies recorded with Translog, Translog-II and the CASMACAT workbench. These data acquisition tools log keystrokes and gaze data during text perception and text production. The data currently amount to more than 500 hours of text production gathered in more than 3000 sessions.

In addition to the raw logging data, a post-processed version of the database (TPR-DB) can be downloaded. It consists of several tab-separated summary tables that are easier to process with visualization and (statistical) analysis tools.
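Because the summary tables are tab-separated, they can be read with standard tooling. The following is a minimal sketch in Python, not part of the TPR-DB scripts. The column names (Session, Segment, Duration) and the sample rows are hypothetical placeholders chosen for illustration; check the header row of the table you downloaded for the actual column names.

```python
import csv
import io
from collections import defaultdict


def read_table(handle):
    """Read a tab-separated table with a header row into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def mean_by(rows, key, value):
    """Average the numeric column `value`, grouped by the column `key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(float(row[value]))
    return {k: sum(v) / len(v) for k, v in groups.items()}


# Hypothetical sample data: the column names and values are assumptions,
# not taken from an actual TPR-DB table.
sample = (
    "Session\tSegment\tDuration\n"
    "P01_T1\t1\t1200\n"
    "P01_T1\t2\t1800\n"
    "P02_T1\t1\t900\n"
)

rows = read_table(io.StringIO(sample))
means = mean_by(rows, "Session", "Duration")
print(means)
```

To read a downloaded table instead of the in-memory sample, open the file with open(path, newline="", encoding="utf-8") and pass the handle to read_table, where path points to one of the tab-separated tables.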

More detailed information is available below and under these links:


Post your technical, methodological, and theoretical questions and comments here

Download public studies via the TPR-DB management tool

Go to: http://dighum1.ftsk.uni-mainz.de/cgi-bin/yawat/yawat.cgi

user: TPRDB

password: tprdb

Then switch to the CRITT TPR-DB management tool:

https://dighum1.ftsk.uni-mainz.de/cgi-bin/yawat/tpd.cgi

From here you can download the raw data (log and alignment files) as well as the newest versions of the TPR-DB tables for all publicly available studies.

Download raw TPR-DB data from SourceForge

Alternatively, the raw logging and aligned data for all sessions are available on SourceForge at https://sourceforge.net/projects/tprdb/ and can be checked out via SVN (approx. 50 GB!).

On Linux (or Cygwin):

On Windows:

Earlier versions of post-processed and zipped TPR-DB tables can be downloaded from here: https://sourceforge.net/projects/tprdb/files/

Newer versions of the tables should be downloaded via the TPR-DB management tool (https://dighum1.ftsk.uni-mainz.de/cgi-bin/yawat/tpd.cgi).

Generating a TPR-DB

Visualizing TPR-DB data

Documentation

For documentation on how to extract and convert the raw logging data into the TPR-DB format, read this document. The document describes how to run the scripts in the "bin" folder of the TPR study folder. The database compilation process also requires external tools and resources:

  • YAWAT: Yet Another Word Alignment Tool is a browser-based tool for manual word alignment. The YAWAT website visualizes the segment and word alignments of the entire TPR-DB and requires a password, which can be obtained from mc.ibc@cbs.dk. The following paper explains what YAWAT is and how to use it.
  • JDTAG is a Java-based tool for manual word alignment and alignment correction, similar in function to YAWAT. JDTAG can read the atag file format used in the Alignment folder of the TPR-DB (version 1.0) and in the raw data. To get access to JDTAG, please send an e-mail to mc.ibc@cbs.dk.

License

The CRITT Translation Process Research Database (TPR-DB) by the Center for Research and Innovation in Translation and Translation Technology is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

We would like to thank all contributors and participants for their work.
