During summer 2013, The Bridge (CRITT + danCAST) plans to conduct an implementation workshop on computer-assisted translation (CAT) in which a translator reads a source text on a computer screen and speaks the translation aloud in the target language, a process called sight translation. The sight translation process is supported by an Automatic Speech Recognition (ASR) system, which transcribes the speech signal into target text, and a Machine Translation (MT) system, which assists the translator with partial translation proposals, predictions and completions on the computer monitor. An eye-tracking device follows the translator's gaze path on the screen, detects where he or she faces translation problems and triggers reactive assistance.

The project will extend the CASMACAT workbench, transforming it into a Speech and Eye-tracking Enabled Computer-Assisted Translation (SEECAT) platform, which will be implemented and tested experimentally.

    • Objective: Use speech input as a post-editing tool for translators in order to enhance their efficiency. Use an eye tracker to synchronize reading and speaking with the MT output and to position the input cursor.
    • Success Metric: Demonstrate an increase in translation throughput when speech input is used for post-editing, compared with a system without speech input.
    • Why: Currently, human post-editors use keystrokes to improve the quality of the translation output. The project will investigate how efficiency changes when they correct the translation output using speech input instead.
    • How: Within the framework of the current CASMACAT workbench, we will integrate an automatic speech recognizer (ASR) that accepts spoken translations from human translators/post-editors as input, in order to improve the quality of the output generated by a machine translation system. The speech recognition system will be partly constrained by the gaze data and the output of the machine translation system, but will also be flexible enough to accept broader language. Strategies for balancing these approaches will be investigated.
    • Technology and Languages: The Sphinx and AT&T (Watson) ASR systems will be trained for Danish and Hindi, and the Moses open-source decoder will be used for English > Danish and English > Hindi translation. More language combinations (e.g. Bengali > English) are being investigated.
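
As a rough illustration of the balance described under "How", the sketch below rescores a list of ASR hypotheses with a word model that mixes probabilities estimated from the MT output with probabilities from a general background text. It is a toy example under stated assumptions, not the SEECAT implementation: the function names, the unigram models, the add-one smoothing and the weight `lam` are all hypothetical, and gaze data is left out entirely.

```python
import math
from collections import Counter


def unigram_model(tokens, vocab_size, alpha=1.0):
    """Return a smoothed unigram probability function (hypothetical helper)."""
    counts = Counter(tokens)
    total = len(tokens)

    def prob(word):
        # Add-alpha smoothing so unseen words keep a non-zero probability.
        return (counts[word] + alpha) / (total + alpha * vocab_size)

    return prob


def interpolated_logprob(hypothesis, p_mt, p_general, lam):
    """Log-probability of a hypothesis under a linear mix of two models."""
    return sum(
        math.log(lam * p_mt(word) + (1.0 - lam) * p_general(word))
        for word in hypothesis
    )


def rescore(nbest, mt_output, background, lam=0.7):
    """Pick the hypothesis that scores best under the mixed model.

    lam close to 1 constrains recognition towards the MT output;
    lam close to 0 lets the general model dominate (broader language).
    """
    vocab = set(mt_output) | set(background) | {w for h in nbest for w in h}
    p_mt = unigram_model(mt_output, len(vocab))
    p_general = unigram_model(background, len(vocab))
    return max(
        nbest,
        key=lambda h: interpolated_logprob(h, p_mt, p_general, lam),
    )


# Illustrative data only: two acoustically similar hypotheses.
mt_output = "the house is red".split()
background = "the a is was house mouse red read".split()
nbest = ["the house is read".split(), "the house is red".split()]
best = rescore(nbest, mt_output, background, lam=0.7)
```

Varying `lam` is one simple way to explore the trade-off between staying close to the MT proposal and accepting wording the MT system did not produce.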


The SEECAT summer project is divided into three workshops of approximately three weeks each:

  • May 21 to June 7: introduction to ASR, MT, eye-tracking and GUI technology;
  • June 8 to June 30: work in smaller groups on sub-projects;
  • July 1 to July 21: integration of sub-projects into a SEECAT prototype.

The introductory workshop (May 21 to June 7) will take place at Copenhagen Business School, Dalgas Have 15, room 2Ø091.

We'll have lectures and practical sessions in the mornings and afternoons. According to the interests of each participant, we will define small tasks and sub-groups, each working on a focussed sub-project, and discuss the relation of each sub-project to the SEECAT project on a regular basis.

For the group work (June 8 to June 30) we have rented a spacious summer house in Nykøbing, Falster. The third workshop, on integration, will again take place at Copenhagen Business School.

The programme for the SEECAT course can be found here (updated as of May 15, 2013).


Presentations and hand-outs will be made available in this section:

  • May 21:
    • Statistical Machine Translation - Overview (Jakob Elming): (PDF)
    • CasMaCat meets SEECAT (Ragnar Bonk)
    • CasMaCat GUI: upload - replay - list (Ragnar Bonk & Mercedes García): (PDF)
  • May 22:
    • Introduction to Post-editing / On-line workbenches (Bartolomé Mesa-Lao): (PDF)
    • Fundamentals of ASR 1 (Richard Rose): (PDF)
    • Language modelling: when, why, how (Jeevanthi Liyanapathirana): (PDF)
  • May 23:
    • Interactive MT (Philipp Koehn): (PDF)
    • Fundamentals of ASR 2 (Richard Rose): (PDF)
  • May 24:
    • Some thoughts about the conceptual/procedural distinction in translation (Fabio Alves): (PDF)
    • Employing speech recognition software in human translation (Barbara Dragsted): (PDF)
    • Psycholinguistics meets translation studies (Laura W. Balling): (PDF)
  • June 3:
    • Web Technologies in CASMACAT (Vicent Alabau): (PDF)
  • June 4:
    • Interactive machine translation (Vicent Alabau): (PDF)
  • June 5:
    • Multimodal post-editing (Vicent Alabau): (PDF)
  • July 9:
    • ETAP-3 Linguistic Processor: an NLP Implementation of the Meaning/Text Theory (Leonid Iomdin): (PDF)



Lecturers and presenters:

  • Alexandra Birch (UEDIN, UK)
  • Andreas Søeborg Kirkedal (CBS, Denmark)
  • Anusuya Mallavalli Andanigowda (Mysore, India)
  • Arnt Lykke Jakobsen (CBS, Denmark)
  • Barbara Dragsted (CBS, Denmark)
  • Bartolomé Mesa-Lao (CBS, Denmark)
  • Fabio Alves (UFMG, Brazil)
  • Jakob Elming (KU, Denmark)
  • Jeevanthi Liyanapathirana (CBS, Denmark)
  • Joris Driesen (UEDIN, UK)
  • Leonid Iomdin (IPPI, Moscow)
  • Mercedes García Martínez (CBS, Denmark)
  • Michael Carl (CBS, Denmark)
  • Pascual Martínez (Tokyo, Japan)
  • Peter Juel Henrichsen (CBS, Denmark)
  • Philipp Koehn (UEDIN, UK)
  • Richard Rose (McGill, Canada)
  • Ragnar Bonk (CBS, Denmark)
  • Silvia Hansen-Schirra (Germersheim, Germany)
  • Srinivas Bangalore (AT&T, USA)
  • Syam Agrawal (KIIT, India)
  • Titus von der Malsburg (Potsdam, Germany)
  • Ulrich Germann (UEDIN, UK)
  • Vicent Alabau (UPV, Spain)



Some of these papers can be retrieved from this link.

Literature on Translation Process Research in Copenhagen Studies in Language (CSL):

Sponsored by:

The Danish Agency for Science, Technology and Innovation and