Uploading Data

The TPR-DB management tool is a browser interface through which one can upload and download raw logging data and generate TPR-DB tables. Data can be downloaded from the public section. A 'private' account is required for uploading and processing data in the CRITT TPR-DB. Interested researchers can request a CRITT membership.

TPR-DB tables facilitate the analysis of the logging data through a large set of pre-defined features. This management tool allows you to:

To upload your study, it is crucial that you prepare your files according to the TPR-DB conventions. Follow the steps below and post your technical, methodological, and theoretical questions and comments here if something is unclear.

Prepare TPR-DB session files 

TPR-DB naming conventions

The TPR-DB is an anonymized repository of logged translation sessions and contains a number of different modes, such as post-editing of machine translation and interactive post-editing (e.g. CASMACAT project), in addition to human translations.

The logged user activity data (UAD) of a session is contained in a single file, and the file name encodes important variables of the translation study. For example:

That is, Participant, Task, Text, in the following format:   

Adding Language tag

Before uploading a Translog-II log file through the TPR-DB management tool you must insert a language tag, so that the server knows the source and target languages and can process the files accordingly. Open the XML file and insert a line <Languages … /> with the source and target languages of the session, at the position indicated below. IMPORTANT: Make sure to use the correct quotes and no additional spaces between attributes and values!

    ...
    <fullScreen>false</fullScreen>
    <lockWindows>false</lockWindows>
    <Languages source="en" target="es" task="translating" />
    <Plugins>
       <Key_Logger />
       ...
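
For a study with many log files, editing each one by hand is error-prone. The following is a minimal Python sketch of the manual step described above, not an official TPR-DB tool. It assumes the log file contains a <Plugins> line as in the excerpt and places the tag directly before it; the function name insert_language_tag is hypothetical.

```python
def insert_language_tag(xml_text, source, target, task="translating"):
    """Return xml_text with a <Languages ... /> line placed before <Plugins>.

    Assumption: the log contains a line starting with <Plugins>, as in the
    excerpt above. Text that already has a <Languages tag is returned as-is.
    """
    if "<Languages" in xml_text:
        return xml_text  # already tagged, nothing to do
    # Straight double quotes, no spaces around '=' (see the warning above).
    tag = f'<Languages source="{source}" target="{target}" task="{task}" />'
    lines = xml_text.splitlines(keepends=True)
    for i, line in enumerate(lines):
        stripped = line.lstrip()
        if stripped.startswith("<Plugins"):
            indent = line[: len(line) - len(stripped)]  # reuse the indentation
            newline = "\r\n" if line.endswith("\r\n") else "\n"
            lines.insert(i, f"{indent}{tag}{newline}")
            return "".join(lines)
    raise ValueError("no <Plugins> line found; insert the tag manually")
```

The function works on the file content as text rather than through an XML parser, so the rest of the log stays byte-for-byte unchanged. Read the file, pass its content through the function, and write the result back before uploading.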

Upload and process Studies

You need an account to generate a TPR-DB from the management tool with your own study. Contact m.gummiball[at]gmail.com to obtain an account that allows you to upload your studies and to change the word alignments. You can also generate a TPR-DB from logging data using Perl scripts.

Log in to the TPR-DB management tool

The appearance of the TPR-DB management tool

Upload a new study

The appearance of the screen after successful upload

Add data to available collections

When collecting and adding data to available ST data collections (these are currently "multiLing", "missionStatement", "ministerSpeech"), the source text files must be identical with respect to text numbering, segmentation and tokenization. Perhaps the best way to ensure this is to use the available Translog-II *project files, which can be downloaded from this link. Once the Translog-II data is collected and uploaded to the TPR-DB as described above, the *src files (in the Alignment folder) should be replaced by the corresponding *src files of previous studies, which can also be downloaded from this same link. For ST-TT alignment and all further automatic processing, make sure that the src files in your new study are identical with the src files in the already available studies in that collection.
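
One way to verify that a src file in a new study matches its counterpart from an earlier study is a byte-for-byte comparison. The sketch below uses Python's standard filecmp module; the function name and the file paths in the usage note are hypothetical. Byte equality is a stricter test than the text requires, so a mismatch only indicates that the two files should be inspected.

```python
import filecmp


def src_files_identical(new_src, reference_src):
    """Return True if the two files have exactly the same content.

    shallow=False forces a comparison of the file contents instead of
    relying on size and modification time alone.
    """
    return filecmp.cmp(new_src, reference_src, shallow=False)
```

For example, src_files_identical("myStudy/Alignment/P01_T1.src", "reference/Alignment/P01_T1.src") would return False if the two files differ in even one token or line break.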

Uploading Trados data to the TPR-DB

Keylogging data collected with Trados Studio (i.e. with the Qualitivity plugin) can be uploaded to the CRITT TPR-DB. The upload option can also synchronize the keylogging data with data from various eye-trackers (Tobii, Eyelink and Gazepoint) recorded during the translation sessions. More details are provided here, and video instructions are available on YouTube.

Uploading PET data to the TPR-DB

PET is a tool to facilitate the post-editing of translations from MT systems. PET produces *per files which contain the keylogging data collected in PET sessions. These *per files can be uploaded to the TPR-DB.


The PET-to-Translog conversion produces a separate Translog-II session for each PET unit (the PET term for a segment). That is, if the PET file name is P33_T1.per and the file has 21 units (segments), the conversion yields 21 Translog sessions: P33_T1001 ... P33_T1021. The conversion assumes that all *per files that end in the same number (e.g., 1 in P33_T1) have an identical number of units and identical ST content. TPR-DB computes word translation entropy on this basis.
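
The naming scheme can be illustrated with a short Python sketch. It is inferred solely from the P33_T1 example above and assumes that the unit number is appended to the file stem as three zero-padded digits; the function name pet_session_names is hypothetical and not part of the TPR-DB tools.

```python
def pet_session_names(per_filename, n_units):
    """List the session names expected for one PET file.

    Assumption (inferred from the P33_T1 example): the unit number is
    appended to the file stem as three zero-padded digits, starting at 001.
    """
    stem = per_filename.rsplit(".", 1)[0]  # "P33_T1.per" -> "P33_T1"
    return [f"{stem}{unit:03d}" for unit in range(1, n_units + 1)]


names = pet_session_names("P33_T1.per", 21)
print(names[0], names[-1])  # P33_T1001 P33_T1021
```

Such a list can be used to check that an upload produced the expected number of sessions for each *per file.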

Speech data and the CRITT TPRDB

The speech signal should be transcribed in such a way that each word comes with a time-stamp indicating its production time. You can use an automatic speech recognition (ASR) system, such as IBM Watson (see tutorial here and here) or Speechmatics (see documentation and some scripts), which in our experience produces better output and punctuation marks. The automatic transcription is then revised, e.g. with ELAN or with a spreadsheet, converted into a Translog-II compatible XML file and uploaded to the CRITT TPR-DB.

Depending on the type of spoken data (interpretation, sight translation, reading aloud, etc.) the voice data can also be synchronized with the gaze data (e.g. during reading or sight translation) and/or with the audio input (e.g. for interpreting, simultaneous interpreting with text).

Generate a TPR-DB with Perl scripts

You can also generate TPR-DB tables locally on your computer:

Preparing Windows


Tokenize and align texts

In Cygwin: cd into your tprdb/bin folder and run:

    ./StudyAnalysis.pl -C tokenize -S <study> -U <user>

This will produce three files inside the folder tprdb/<study>/Alignment with suffixes *.src, *.tgt and *.atag (a special tokenizer must be installed for Japanese and Chinese).

Open the Jdtag program and load the *.src, *.tgt and *.atag files. Manually align words in the source and target texts and save the file as <study>/Alignment/*.atag

Or use one of the other options for word alignment

Generate tables

In Cygwin: cd into your tprdb/bin folder and run:

    ./StudyAnalysis.pl -C tables -S <study> -U <user>

This will produce two folders <user>/<study>/Events and <study>/Tables. The folder <study>/Tables contains a number of files which are helpful for further data analysis.