Labelling Guidelines

Transcribing and labelling are simple in themselves, but please follow these rules so that your transcriptions are useful for MONK:

  1. Write the words exactly as you see them in the original document, including the use of capitals. Do not interpret or correct them. Abbreviated words are also labelled as they are written in the document.
  2. Do not label a word if you are not sure what it says (in other words, do not guess). Skip the word and move on to the next one. This is important because MONK only “unlearns” a wrong label once enough correct labels have been added on top of it, and that takes time.
  3. Do not try to be hypercorrect by entering diacritics in the main label; MONK cannot read them. Write each umlaut out as the base letter followed by an 'e' instead, for example:
    • 'Änderung' is labelled as 'Aenderung',
    • 'über' as 'ueber',
    • 'Öffnung' as 'Oeffnung'.

     The umlauts will be added later.

  4. If a word is split by a hyphen at a line break, e.g. 'ge-' <line break> 'legen', label it as two labels: the first is 'ge' or 'ge-', the second is 'legen'.

  7. Mark underlined text wherever it appears in the document. The example below is labelled as “@Rana_underlined”.

     [Example image: underlined text, labelled @Rana_underlined]
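If you prepare or check labels with a script, the umlaut rule (3) and the underline marker (7) can be sketched in Python as below. This is a minimal, hypothetical helper and not part of MONK: the names `to_main_label` and `underlined_label` are invented for illustration, and the `@<word>_underlined` pattern is an assumption inferred from the single “@Rana_underlined” example. Only ä, ö and ü (upper and lower case) are converted; any other character is left exactly as written, in line with rule 1.

```python
import unicodedata

# Assumption: only the umlauts named in rule 3 are transliterated.
UMLAUT_MAP = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
}


def to_main_label(word: str) -> str:
    """Write umlauts out as base letter + 'e'; keep everything else as written."""
    # NFC turns a decomposed 'A' + combining diaeresis into a single 'Ä',
    # so both encodings of the same letter are handled.
    word = unicodedata.normalize("NFC", word)
    return "".join(UMLAUT_MAP.get(ch, ch) for ch in word)


def underlined_label(word: str) -> str:
    """Hypothetical marker for underlined text, modelled on '@Rana_underlined'."""
    return f"@{word}_underlined"


if __name__ == "__main__":
    for w in ["Änderung", "über", "Öffnung"]:
        print(w, "->", to_main_label(w))
    print(underlined_label("Rana"))
```

A script like this cannot judge legibility, so rules 1 and 2 (no corrections, no guessing) still depend on your own reading of the document.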