The CRITT Translation Process Research Database (CRITT TPR-DB) is a publicly available database of recorded text processing sessions (mostly translation). It is available under a Creative Commons license (see license). The TPR-DB consists of a data lake (raw data) of user activity data (UAD) from more than 40 translation (and other text processing) studies recorded with Translog, Translog-II and the CASMACAT workbench. These data acquisition tools log keystrokes and gaze data during text perception and text production. The data currently amounts to more than 500 hours of text production gathered in more than 3000 sessions.
In addition to the raw logging data, a post-processed version of the database can be downloaded. It consists of several tab-separated summary tables that are easier to process with visualization and (statistical) analysis tools.
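Because the summary tables are plain tab-separated text, standard tooling can read them directly. The sketch below uses only Python's standard library. The column names `Session` and `Dur` and the session label `P01_T1` are hypothetical placeholders, not necessarily the headers or identifiers used in the actual TPR-DB tables, so check the header row of the table you download. Quoting is switched off because token columns may contain literal quote characters.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, List, TextIO


def read_table(handle: TextIO) -> List[Dict[str, str]]:
    """Read a tab-separated table with a header row into a list of dicts.

    QUOTE_NONE keeps characters such as '"' as literal cell content.
    """
    return list(csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE))


def mean_by(rows: Iterable[Dict[str, str]], key: str, value: str) -> Dict[str, float]:
    """Average a numeric column per distinct value of a grouping column.

    Rows whose value cell is empty, missing or non-numeric are skipped.
    """
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for row in rows:
        try:
            x = float(row[value])
        except (TypeError, ValueError):
            continue
        sums[row[key]] += x
        counts[row[key]] += 1
    return {k: sums[k] / counts[k] for k in sums}


if __name__ == "__main__":
    # Hypothetical two-column excerpt; real TPR-DB tables have many more columns.
    sample = "Session\tDur\nP01_T1\t1200\nP01_T1\t800\nP02_T3\t1500\n"
    rows = read_table(io.StringIO(sample))
    print(mean_by(rows, "Session", "Dur"))  # {'P01_T1': 1000.0, 'P02_T3': 1500.0}
```

The same rows can be handed to any analysis library. The standard-library version simply avoids extra dependencies.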
More detailed information is available below and under these links:
- Public studies
- Upload data to the CRITT TPR-DB
- Word and segment alignment
- Features of the CRITT TPR-DB tables
- Visualization and progression graphs in R
Download public studies via the TPR-DB management tool
Then switch to the CRITT TPR-DB management tool:
From here you can download the raw data (log and alignment files) as well as the newest versions of the TPR-DB tables for all publicly available studies.
Download raw TPR-DB data from sourceforge
Alternatively, the raw logging and alignment data for all sessions are available on SourceForge (https://sourceforge.net/projects/tprdb/) and can be checked out via SVN (approx. 50 GB!).
On Linux (or Cygwin):
- svn checkout --username=tprdb https://svn.code.sf.net/p/tprdb/svn/ tprdb-svn
On Windows:
- Install TortoiseSVN, e.g. from https://tortoisesvn.net/downloads.html
- Right-click in the directory where you want to install the TPR-DB, select "SVN Checkout", paste https://svn.code.sf.net/p/tprdb/svn/ into the URL field and click OK.
- You can also check out individual folders only, e.g. https://svn.code.sf.net/p/tprdb/svn/bin
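Given the size of the repository, a sparse checkout is one way to fetch only the folders you need while keeping a single working copy. The sketch below builds standard Subversion commands (`svn checkout --depth immediates` followed by `svn update --set-depth infinity`) and, by default, only prints them. It assumes a command-line `svn` client on the PATH (on Windows, TortoiseSVN can optionally install one) and that the requested folders sit at the top level of the repository. Apart from `bin`, it makes no assumption about the repository layout.

```python
import shlex
import subprocess
from typing import List, Sequence

REPO_URL = "https://svn.code.sf.net/p/tprdb/svn/"


def sparse_checkout_commands(folders: Sequence[str], target: str = "tprdb-svn",
                             username: str = "tprdb") -> List[List[str]]:
    """Build svn commands that check out only the repository's top level,
    then fetch each selected top-level folder in full."""
    commands = [[
        "svn", "checkout", f"--username={username}",
        "--depth", "immediates",  # top-level files plus empty child folders
        REPO_URL, target,
    ]]
    for folder in folders:
        commands.append([
            "svn", "update", "--set-depth", "infinity",  # expand this folder fully
            f"{target}/{folder.strip('/')}",
        ])
    return commands


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: prints the commands for fetching only the "bin" folder.
    run(sparse_checkout_commands(["bin"]))
```

Calling `run(..., dry_run=False)` executes the commands. Folders can be added later by running another `svn update --set-depth infinity` on them.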
Earlier versions of the post-processed TPR-DB tables can be downloaded as zip archives from https://sourceforge.net/projects/tprdb/files/
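A downloaded archive can be inspected without unpacking it to disk. The sketch below assumes only that the archive holds tab-separated, UTF-8 encoded tables with one header row each. It makes no assumption about folder names. Grouping by file extension is simply one convenient way to see which kinds of tables an archive contains, and the extensions in the comment are illustrative.

```python
import csv
import io
import zipfile
from collections import Counter
from pathlib import PurePosixPath
from typing import Dict


def count_rows_by_extension(zip_source) -> Dict[str, int]:
    """Count data rows (header excluded) per file extension in a zip archive
    of tab-separated tables, reading members directly from the archive.

    zip_source may be a path or a binary file-like object.
    """
    totals: Counter = Counter()
    with zipfile.ZipFile(zip_source) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            ext = PurePosixPath(info.filename).suffix or "(none)"  # e.g. ".xyz"
            with archive.open(info) as raw, io.TextIOWrapper(
                raw, encoding="utf-8", errors="replace", newline=""
            ) as text:
                rows = sum(1 for _ in csv.reader(text, delimiter="\t",
                                                 quoting=csv.QUOTE_NONE))
            totals[ext] += max(rows - 1, 0)  # subtract the header row
    return dict(totals)


if __name__ == "__main__":
    import sys
    # Usage: python count_rows.py <downloaded-archive.zip>
    if len(sys.argv) > 1:
        print(count_rows_by_extension(sys.argv[1]))
```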
- A paper describing the features of the TPR-DB can be downloaded here.
- The book New Directions in Empirical Translation Process Research gives an in-depth introduction to the TPR-DB.
Newer versions of the tables should be downloaded via the TPR-DB management tool.
For documentation on how to extract and convert the raw logging data into the TPR-DB format, read this document. It describes how to run the scripts in the "bin" folder of the TPR study folder. The database compilation process also requires external tools and resources:
- YAWAT (Yet Another Word Alignment Tool) is a browser-based tool for manual word alignment. The YAWAT website visualizes the segment and word alignments of the entire TPR-DB. Access requires a password, which can be obtained from email@example.com. The following paper explains what YAWAT is and how to use it.
- JDTAG is a Java-based tool for manual word alignment and alignment correction, similar in function to YAWAT. JDTAG can read the atag files contained in the Alignment folder of the TPR-DB (version 1.0) and in the raw data (please send an e-mail to firstname.lastname@example.org if you want access to JDTAG).
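Word alignments produced with tools like these are many-to-many: several source words can be linked to one target word and vice versa. The sketch below shows one generic way to merge individual (source, target) links into alignment groups. Two links belong to the same group if they share a token, and union-find is used to compute these connected components. It does not parse the atag or YAWAT file formats. The integer token indices are hypothetical input, and unaligned tokens, which have no links, do not appear in the output.

```python
from typing import Dict, Iterable, List, Set, Tuple

Link = Tuple[int, int]      # (source token index, target token index)
Node = Tuple[str, int]      # ("src", i) or ("tgt", j)


def alignment_groups(links: Iterable[Link]) -> List[Tuple[List[int], List[int]]]:
    """Merge word-alignment links into many-to-many alignment groups
    (connected components of the bipartite link graph, via union-find)."""
    parent: Dict[Node, Node] = {}

    def find(node: Node) -> Node:
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a: Node, b: Node) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for s, t in links:
        union(("src", s), ("tgt", t))

    # Collect source and target indices per component root.
    groups: Dict[Node, Tuple[Set[int], Set[int]]] = {}
    for node in list(parent):
        src, tgt = groups.setdefault(find(node), (set(), set()))
        (src if node[0] == "src" else tgt).add(node[1])
    return sorted((sorted(s), sorted(t)) for s, t in groups.values())


if __name__ == "__main__":
    # Source tokens 0 and 1 both align to target token 0.
    print(alignment_groups([(0, 0), (1, 0), (2, 2), (3, 1)]))
    # [([0, 1], [0]), ([2], [2]), ([3], [1])]
```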
The CRITT Translation Process Research Database (TPR-DB) by the Center for Research and Innovation in Translation and Translation Technology is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
We would like to thank all contributors and participants for their work.