TPR-DB and R

CRITT TPR-DB tables are suited for statistical analysis. To work with them in R, install R and RStudio:

Install R from http://www.r-project.org/
Install RStudio from https://www.rstudio.com/products/rstudio/download/

Below are some commands to visualize translation progression graphs in R and to concatenate several tables of the same type, which is necessary for statistical analysis.

Visualize a translation progression graph with R

Start RStudio and proceed as follows:

1. Set the working directory under Session -> Set Working Directory -> Choose Directory
2. Load the visualization script: source("tprdb/bin/proGra.R")
3. Read the data of one session: ReadData("<Study>/Tables/<session_to_be_plotted>")
4. Call ProgGraph() to visualize the translation progression graph

Example:

source("D:/tprdb/bin/proGra.R")

ReadData("D:/tprdb/KTHJ08/Tables/P08_T1")

ProgGraph()

Reading for comprehension

The progression graph below was produced with the following commands:

source("D:/tprdb/bin/proGra.R")

ReadData("D:/tprdb/SJM16/Tables/P03_L1")

ProgGraph()

It shows how reading for comprehension unfolds over time. The time in ms is shown on the horizontal axis. The source text in its original order is shown on the left vertical axis. For reading, source and target text are, of course, identical. Fixations on words are shown in blue.

The participant reads in a linear fashion, with occasional regressions to earlier words. Reading is relatively smooth, although segments 1 and 4 seem more demanding than the other segments. Segment 11 also seems difficult to a certain extent, but in general, reading progresses without major setbacks.

Translation with orientation and revision

The progression graph below was produced with the following commands:

source("D:/tprdb/bin/proGra.R")

ReadData("D:/tprdb/KTHJ08/Tables/P03_T1")

ProgGraph()

The target text is shown on the right vertical axis in the order of the source text. Fixations on the source text are shown in blue and fixations on the target text are shown in green. The longer the fixations, the larger the green diamonds / blue circles.

The graph shows a complete orientation phase (marked with a yellow oval): the translator reads the complete source text before starting to type. Segments 2 and 8 seem to be difficult for this participant in comparison to the other segments, which are translated in a fairly smooth manner. Insertions are shown in black and deletions in red. Finally, there is a revision phase (marked with a red rectangle).

Translation without orientation or revision

The progression graph below was produced with the following commands:

source("D:/tprdb/bin/proGra.R")

ReadData("D:/tprdb/KTHJ08/Tables/P08_T1")

ProgGraph()

This translator neither reads the source text before starting to type nor revises the final target text.

Zooming in

The progression graph below was produced with the following commands:

source("D:/tprdb/bin/proGra.R")

ReadData("D:/tprdb/KTHJ08/Tables/P08_T1")

ProgGraph(X1=190000, X2=310000, Y1=100, Y2=130)

X1 refers to the left limit of the horizontal axis (time in ms), while X2 refers to its right limit. Y1 and Y2 are the limits on the vertical axes. In other words, zooming in on a particular stretch of time is done by adjusting the axis limits.

The translator reads and re-reads segment 8 more or less completely before starting to type. Subsequently, reading and typing activities follow each other. While typing, the participant reads the target text, monitoring production. The final words of segment 8 are revised after the first 10 words of segment 9 are typed. In other words, the translator revises the translation of [to him an to the killings] just after typing the translation of [killings] in segment 9, returning to segment 8 before continuing to translate segment 9.

Zooming in further

Here, the X and Y axes were adjusted with the following code, zooming in further for an even closer inspection of the data:

ProgGraph(X1=190000, X2=240000, Y1=100, Y2=117, seg=0)

The argument seg, if set to 0, makes the segment boundaries disappear.

Visualising Fixation and Production Units

For this image, the X and Y axes have been slightly adjusted. The arguments pu and fu, if set to 1, visualise production units and fixation units, respectively. The argument label, if set to 1, shows the type of each fixation unit (type 1 for source text reading and type 2 for target text reading) together with the duration of the unit.

The progression graph below was produced with the following command:

ProgGraph(X1=194000, X2=234700, Y1=100, Y2=117, seg = 0, pu=1, fu=1, label=0)

Visualising activity units

ProgGraph(X1=225000, X2=234700, Y1=100, Y2=117, au=1, seg=0)

Concatenate tables of same type

All tables of a kind for at least one study should be concatenated when doing statistical analysis, e.g. in R. The challenge is to keep the header of the first file and to concatenate all other files without their headers. Here are three ways to do this:

In the Windows command prompt (cmd)

Concatenate all *st files from a study (e.g. BML12) and keep only one header line:

cd into the study folder, e.g. BML12
use "findstr" to extract the header line from any one *st file (e.g. Tables\P01_T1.st) and store it in a file BML12.st
use "type" to concatenate all *st files, pipe the output into "findstr" to suppress the header lines, and append the result to BML12.st

findstr /b Id Tables\P01_T1.st > BML12.st

type Tables\*.st | findstr /v /b Id >> BML12.st

In Linux (or e.g. cygwin)

Generate a file "ml.st" from six multiLing studies:

cd into the tprdb folder
write the header line of one table to the file "ml.st"
cat all *st files in the multiLing studies, suppress the header lines with "grep -v ^Id", and append the result to "ml.st"

head -1 KTHJ08/Tables/P02_T1.st > ml.st

cat {KTHJ08,BML12,SG12,NJ12,RUC17,ENJA15}/Tables/*st | grep -v ^Id >> ml.st
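An alternative that does not depend on the header line starting with "Id" is to let awk skip the first line of every file except the first. The following is only a minimal sketch on toy data: the folder DEMO, the file demo.st and the two columns are made up for illustration and are not part of the TPR-DB.

```shell
# Toy stand-in for a study folder (hypothetical names and columns).
tmp=$(mktemp -d)
mkdir -p "$tmp/DEMO/Tables"
printf 'Id\tSToken\n1\tthe\n2\tcat\n' > "$tmp/DEMO/Tables/P01_T1.st"
printf 'Id\tSToken\n1\ta\n2\tdog\n' > "$tmp/DEMO/Tables/P02_T1.st"

# FNR is the line number within the current file, NR the line number overall:
# skip line 1 of every file except the very first one, print everything else.
awk 'FNR == 1 && NR != 1 { next } { print }' "$tmp"/DEMO/Tables/*.st > "$tmp/demo.st"

# Sanity check: exactly one header line should remain in the result.
header_count=$(grep -c '^Id' "$tmp/demo.st")
line_count=$(wc -l < "$tmp/demo.st" | tr -d ' ')
echo "header lines: $header_count, total lines: $line_count"
```

The output file is written outside the Tables folder on purpose, so that the *.st glob cannot pick it up as one of its own inputs.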

In R

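Generate the function "readCRITTables()" (it requires the readr package):

library(readr)  # provides read_delim(), cols() and col_character()

# Read every file in the folders 'paths' whose name matches 'tab'
# and stack all of them into one data frame.
readCRITTables <- function(paths, tab) {
  D <- NULL
  for (path in paths) {
    # 'pattern' is interpreted as a regular expression by list.files()
    files <- list.files(path, full.names = TRUE, pattern = tab)
    for (i in files) {
      # "---" marks missing values; Task is always read as text
      D <- rbind(D, read_delim(i, "\t", escape_double = FALSE,
                               col_types = cols(Task = col_character()),
                               na = "---", trim_ws = TRUE))
    }
  }
  return(D)
}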

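The function can be called with a vector of study folders and a pattern for the extension (i.e. type) of the table. The pattern is passed to list.files() and is therefore a regular expression, e.g.

BML12.st <- readCRITTables(c("BML12/Tables"), "\\.st$")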

Chinese character encoding problems on Chinese Windows

The TPR-DB data is generated in UTF-8 encoding. If you happen to work on a Chinese Windows system, you might need to convert the tables containing Chinese characters into GB18030 (check the Wikipedia entry), for instance with the following Linux command:

iconv -f UTF8 -t GB18030 fileNameIn > fileNameOut

e.g. iconv -f UTF8 -t GB18030 ml.st > ml_GB18030.st
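To check that a conversion lost nothing, the converted file can be converted back and compared with the original. The sketch below assumes that the local iconv supports GB18030 and uses a made-up two-line table (in.st) rather than real TPR-DB data.

```shell
tmp=$(mktemp -d)
# Toy table containing the UTF-8 bytes of two Chinese characters (octal escapes).
printf 'Id\tSToken\n1\t\344\270\255\346\226\207\n' > "$tmp/in.st"

# Convert to GB18030, as in the command above.
iconv -f UTF-8 -t GB18030 "$tmp/in.st" > "$tmp/out_GB18030.st"

# Round trip: converting back should reproduce the original bytes exactly.
iconv -f GB18030 -t UTF-8 "$tmp/out_GB18030.st" > "$tmp/back.st"
if cmp -s "$tmp/in.st" "$tmp/back.st"; then roundtrip=ok; else roundtrip=failed; fi
echo "round trip: $roundtrip"
```

If the round trip fails, the input was probably not valid UTF-8 to begin with.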
