Consider publishing your next paper in the "Special Issue on Human and Machine Translation: Recent Trends and Foundations".

CRITT TPR-DB

The CRITT Translation Process Research Database (CRITT TPR-DB) consists of two sections. In the private section, researchers can upload, process, analyze, download, and delete their Translog-II compatible data. Private data can also be added to the public section of the TPR-DB, if requested. A license for a private account can be obtained upon request.

The second section of the TPR-DB is a publicly available database of recorded text processing sessions (mostly translation). It is available under a Creative Commons license (see License). The TPR-DB consists of a data lake (raw data) of user activity data (UAD) from translation (and other text processing) studies recorded with Translog 2006, Translog-II, and the CASMACAT workbench. These data acquisition programs log keystrokes and gaze data during text perception and text production.

In addition to the raw logging data, a post-processed version of the database (TPR-DB) can be downloaded. It consists of several tab-separated summary tables that are easier to process with visualization and (statistical) analysis tools.
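
Because the summary tables are plain tab-separated text, they can be inspected with standard command-line tools before loading them into an analysis package. The following is a minimal sketch, assuming only that a table has one header row followed by one data row per line. The function names are illustrative, and the file and column names in the usage comments are hypothetical placeholders, not actual TPR-DB names.

```shell
#!/bin/sh
# Sketch: inspect a tab-separated summary table with POSIX tools.
# Assumption: the first line is a header row, all other lines are data rows.

# Print each column name from the header row with its 1-based index.
tsv_columns() {
    head -n 1 "$1" | tr '\t' '\n' | awk '{ printf "%d\t%s\n", NR, $0 }'
}

# Count the data rows (every line after the header).
tsv_row_count() {
    awk 'NR > 1 { n++ } END { print n + 0 }' "$1"
}

# Print the values of one column, selected by its header name.
# Prints nothing if no column has that name.
tsv_column() {
    awk -F '\t' -v name="$2" '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == name) col = i; next }
        col     { print $col }
    ' "$1"
}

# Usage (hypothetical file and column names):
#   tsv_columns   mytable.tsv
#   tsv_row_count mytable.tsv
#   tsv_column    mytable.tsv Token
```

Selecting a column by its header name rather than by position keeps such a script usable if the column order differs between tables.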

More detailed information is available below and under these links:

Post your technical, methodological, and theoretical questions and comments here

Download public studies via the TPR-DB management tool

Pre-compiled summary tables can be downloaded from the CRITT TPR-DB management tool:

1. Go to the login page: https://critt.as.kent.edu/cgi-bin/yawat/yawat.cgi
2. Log in as the public user (username: TPRDB, password: tprdb).
3. Change to the management page: https://critt.as.kent.edu/cgi-bin/yawat/tpd.cgi
4. Click on the Download buttons for the Tables, Alignment, or logging data of the studies you are interested in.

Download raw TPR-DB data from SourceForge

Alternatively, the raw logging and aligned data for all sessions are available on SourceForge (https://sourceforge.net/projects/tprdb/) and can be checked out via svn (approx. 50 GB!).

On Linux (or cygwin):

svn checkout --username=tprdb https://svn.code.sf.net/p/tprdb/svn/ tprdb-svn

On Windows:

Install TortoiseSVN (e.g., from https://tortoisesvn.net/downloads.html).
Right-click in the directory where you want to install the TPR-DB, select "SVN checkout", paste https://svn.code.sf.net/p/tprdb/svn/ into the URL field, and click OK.
You can also specify which folders to download (e.g., https://svn.code.sf.net/p/tprdb/svn/bin).
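
The same folder-level selection should also work from the command line on Linux or cygwin, which avoids fetching the full 50 GB. The sketch below wraps the svn checkout command shown earlier so that only one folder below the repository root is checked out. The helper names, the tprdb-<folder> target naming, and the DRY_RUN switch are assumptions made for illustration and are not part of the TPR-DB.

```shell
#!/bin/sh
# Sketch: check out a single folder of the repository instead of everything.
# Helper names, target directory naming, and DRY_RUN are illustrative only.

TPRDB_SVN_ROOT="https://svn.code.sf.net/p/tprdb/svn"

# Build the URL of one folder below the repository root.
tprdb_url() {
    printf '%s/%s\n' "$TPRDB_SVN_ROOT" "$1"
}

# Check out one top-level folder into a local directory named tprdb-<folder>.
# With DRY_RUN=1 the svn command is only printed, not executed.
tprdb_checkout() {
    folder=$1
    target="tprdb-$folder"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "svn checkout --username=tprdb $(tprdb_url "$folder") $target"
    else
        svn checkout --username=tprdb "$(tprdb_url "$folder")" "$target"
    fi
}

# Usage:
#   DRY_RUN=1; tprdb_checkout bin   # print the command without contacting the server
#   DRY_RUN=0; tprdb_checkout bin   # run the checkout
```

To see which folders exist before choosing one, `svn list https://svn.code.sf.net/p/tprdb/svn/` prints the top-level entries without downloading their contents.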

The book New Directions in Empirical Translation Process Research gives an in-depth introduction to the TPR-DB.

Versions of the tables should be downloaded from the TPR-DB management tool:

https://critt.as.kent.edu/cgi-bin/yawat/yawat.cgi

Generating a TPR-DB

Generate a TPR-DB from logging data using the Firefox browser (or follow the instructions under TPR-DB management tool)
Generate a TPR-DB from logging data using Perl scripts

Visualizing TPR-DB data

Word alignments in YAWAT
Progression Graphs in R

Documentation

For documentation on how to extract and convert the raw logging data into the TPR-DB format, read this document. The document describes how to run the scripts in the "bin" folder of the TPR study folder. The database compilation process also requires external tools and resources:

YAWAT: Yet Another Word Alignment Tool is a browser-based tool for manual word alignment. The YAWAT website visualizes the segment and word alignments of the entire TPR-DB and requires a password, which can be obtained from mc.ibc@cbs.dk. The following paper explains what YAWAT is and how to use it.
JDTAG is a Java-based tool for manual word alignment and alignment correction, with functions similar to YAWAT. JDTAG can read the atag file format used in the Alignment folder of the TPR-DB (version 1.0) and in the raw data (please send an e-mail to mc.ibc@cbs.dk if you want access to JDTAG).

License

The CRITT Translation Process Research Database (TPR-DB) by the Center for Research and Innovation in Translation and Translation Technology is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

We would like to thank all contributors and participants for their work.
