Informatics Reading Corpus

Uploading texts to the IRC

Choose texts to add to our corpus that:

  • are not already in it

  • are fairly recent

  • were written by a native speaker or have been proofread

Check if the article you want to upload is already in the corpus:

  • Click on Concordances in the left panel.

  • Click on Text Types if it is not already visible.

  • In the Title field, type a keyword from the title of the text you want to upload. If the text is already in the corpus, it will appear in this box.

Convert the PDF (or other format) to plain text (.txt).
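If you convert from the command line, the Poppler pdftotext tool is one option. The Python sketch below only wraps that command; it assumes pdftotext is installed and on your PATH, and the helper names pdftotext_command and convert_pdf are hypothetical. The -enc UTF-8 flag requests UTF-8 output, and -nopgbrk leaves out the page-break characters that would otherwise end up in the corpus text.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def pdftotext_command(pdf_path: str, txt_path: Optional[str] = None) -> List[str]:
    """Build a pdftotext (Poppler) command line producing UTF-8 plain text."""
    if txt_path is None:
        # Default: same name as the PDF, with a .txt suffix.
        txt_path = str(Path(pdf_path).with_suffix(".txt"))
    # -enc UTF-8: output encoding; -nopgbrk: no form-feed page breaks.
    return ["pdftotext", "-enc", "UTF-8", "-nopgbrk", pdf_path, txt_path]


def convert_pdf(pdf_path: str, txt_path: Optional[str] = None) -> None:
    """Run the conversion; needs pdftotext installed and on PATH."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found; install the Poppler utilities")
    subprocess.run(pdftotext_command(pdf_path, txt_path), check=True)
```

For example, convert_pdf("paper.pdf") would write paper.txt next to the PDF. Other formats (e.g. .docx) need a different converter.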

Rejoin any words that were hyphenated only because they broke across a line; keep hyphens that belong to the word itself.
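Line-end hyphens can be rejoined semi-automatically. The sketch below is one possible heuristic, not part of the corpus tools: it joins the two halves only if the joined word also occurs elsewhere in the text, and otherwise leaves the break in place and reports it for manual checking. The function name join_hyphenated is hypothetical.

```python
import re
from collections import Counter
from typing import List, Tuple

# A word broken across a line: "classifi-" at line end, "cation" on the next line.
LINE_END_HYPHEN = re.compile(r"(\w+)-[ \t]*\n[ \t]*(\w+)")


def join_hyphenated(text: str) -> Tuple[str, List[str]]:
    """Rejoin line-end hyphenations that the rest of the text confirms.

    Returns the cleaned text and a list of breaks left for manual review.
    """
    # Frequencies of every word form that occurs in the text.
    seen = Counter(w.lower() for w in re.findall(r"\w+", text))
    unresolved: List[str] = []

    def repl(m):
        first, second = m.group(1), m.group(2)
        if seen[(first + second).lower()]:
            return first + second  # e.g. "classification" occurs elsewhere
        unresolved.append(f"{first}-{second}")
        return m.group(0)  # keep the hyphen and line break unchanged

    return LINE_END_HYPHEN.sub(repl, text), unresolved
```

Every break in the returned list (e.g. feature-based vs. featurebased) still needs a human decision.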

Divide the text into sections according to its headings.

Do not include the References section.
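If the headings are regular, the splitting can be scripted. This Python sketch assumes that top-level headings sit on their own line and are either numbered (e.g. "1 Introduction", "3. Results") or one of a few common unnumbered names. It stops at the References heading, so that section and anything after it are dropped. The patterns and the name split_sections are assumptions; check the output against the paper, since a short body line starting with a number could be mistaken for a heading.

```python
import re
from typing import List, Tuple

# Assumed heading shapes: top-level numbered headings such as "1 Introduction"
# or "3. Results", plus a few common unnumbered headings.
NUMBERED = re.compile(r"^\d+\.?\s+[A-Z][^.]{0,60}$")
UNNUMBERED = {"abstract", "acknowledgements", "acknowledgments", "references", "bibliography"}
STOP_AT = {"references", "bibliography"}


def is_heading(line: str) -> bool:
    s = line.strip()
    return bool(NUMBERED.match(s)) or s.lower() in UNNUMBERED


def split_sections(text: str) -> List[Tuple[str, str]]:
    """Split text into (heading, body) pairs; stop at the reference list."""
    sections: List[Tuple[str, str]] = []
    heading, body = None, []

    def flush():
        # Save the section collected so far (text before any heading is "(untitled)").
        if heading is not None or any(b.strip() for b in body):
            sections.append((heading or "(untitled)", "\n".join(body).strip()))

    for line in text.splitlines():
        if not is_heading(line):
            body.append(line)
            continue
        flush()
        heading, body = line.strip(), []
        bare = re.sub(r"^\d+\.?\s+", "", heading).lower()
        if bare in STOP_AT:
            return sections  # drop References and everything after it
    flush()
    return sections
```

Each returned body can then be pasted into Upload as one section.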

Paste one section of the paper into the Paste text field in Upload.

Click Next. Then, in the left panel, choose Edit metadata.

Click the green plus sign and add each attribute and its value.

Add the following attributes with your own values:

  • uco (your own)

For the following attributes, use one of the listed values:

section id:

Do not use any value listed below Future Work unless the paper really has whole sections devoted to something else.


Then click OK; this section of the paper is now uploaded. Repeat for the other sections of each paper.

When you have finished uploading, click Compile Corpus.

The following notes concern the older system. They may suit XML aficionados who can insert correct XML markup throughout a full paper, rather than breaking it into sections as the metadata tool requires.

Prefix each section with XML tags, following this example:


    <doc id="majtner08" title="Radon Representation-Based Feature Descriptor for Texture Classification" uco="172786" pubyear="2009" lab="Laboratory of Optical Microscopy [LOM]">
    <section id="abstract">
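When you tag many papers, generating the opening tags avoids quoting mistakes, for example a title that contains & or a double quote. The sketch below uses quoteattr from Python's xml.sax.saxutils module; the helper names doc_open_tag and section_open_tag are hypothetical, and the attribute set simply mirrors the example above.

```python
from xml.sax.saxutils import quoteattr


def doc_open_tag(doc_id: str, title: str, uco: str, pubyear: int, lab: str) -> str:
    """Build a <doc ...> opening tag with safely quoted attribute values."""
    attrs = {"id": doc_id, "title": title, "uco": uco, "pubyear": pubyear, "lab": lab}
    # quoteattr adds the surrounding quotes and escapes &, < and >.
    parts = " ".join(f"{name}={quoteattr(str(value))}" for name, value in attrs.items())
    return f"<doc {parts}>"


def section_open_tag(section_id: str) -> str:
    """Build a <section id="..."> opening tag."""
    return f"<section id={quoteattr(section_id)}>"
```

Called with the values from the example, doc_open_tag reproduces the <doc ...> line above exactly. Only opening tags are generated here; well-formed XML would also need matching </section> and </doc> closing tags.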

Then paste the tagged text into the Paste text field in Upload.

Click Next and continue with the remaining steps.

When the whole paper has been uploaded, click Compile Corpus.