Informatics Reading Corpus
http://bit.ly/maelt_irc
Uploading texts to the IRC
Choose texts to add to our corpus that
are not already there
are fairly recent
are by a native speaker or have been proofread
Check if the article you want to upload is already in the corpus:
Click on Concordances in the left panel.
Click on Text Types if it is not already visible.
In the Title field, type a keyword from the title of the text you want to upload. If the text is already in the corpus, its title will appear in this box.
Convert PDF and other formats to plain text (.txt):
Rejoin any words that were hyphenated only because of a line break.
Divide the text into sections according to its headings.
Do not include the References section.
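The clean-up steps above can be partly automated. The following is a minimal Python sketch, not part of the IRC workflow itself: the function names and the HEADINGS list are hypothetical, and it assumes each heading sits on a line of its own in the converted text. Check the result by hand, because the hyphen rule also joins genuine compounds that happened to break at a line end.

```python
import re

# Hypothetical heading list; replace it with the headings your paper uses.
HEADINGS = ["Abstract", "Introduction", "Related Work", "Methods",
            "Results", "Discussion", "Conclusion", "Future Work", "References"]


def join_hyphenated(text):
    """Join words split by a hyphen at a line break, e.g. 'classifi-' + 'cation'.

    Caution: a real compound broken at a line end loses its hyphen too,
    so proofread the output.
    """
    return re.sub(r"(\w)-\n(\w)", r"\1\2", text)


def split_sections(text, headings=HEADINGS):
    """Return {heading: body}, dropping the References section.

    A line counts as a heading if it consists only of one of the known
    headings, optionally preceded by a number such as '1' or '2.1'.
    """
    pattern = re.compile(
        r"^[ \t]*(?:\d+(?:\.\d+)*\.?\s+)?("
        + "|".join(map(re.escape, headings))
        + r")[ \t]*$",
        re.IGNORECASE | re.MULTILINE,
    )
    matches = list(pattern.finditer(text))
    sections = {}
    for i, m in enumerate(matches):
        # The body runs from the end of this heading to the start of the next.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections[m.group(1).title()] = text[m.end():end].strip()
    sections.pop("References", None)
    return sections


if __name__ == "__main__":
    raw = ("Abstract\nWe study texture classifi-\ncation.\n"
           "1 Introduction\nTextures matter.\n"
           "References\n[1] Someone.\n")
    for name, body in split_sections(join_hyphenated(raw)).items():
        print(name, "->", body)
```

Each value in the returned dictionary is one section, ready to be pasted separately.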
Paste one section of the paper into the Paste text field in Upload:
https://ske.fi.muni.cz/auth/corpus/1078/
Click Next. Then, in the left panel, choose Edit metadata.
Click the green plus sign to add an attribute and its value.
Add the following attributes and your own values:
title
author
pubyear
uco (your own)
For the following attributes, choose your values from the lists linked here:
section id: http://bit.ly/irc_section_id
Please don't use any value listed below Future Work unless the paper really has whole sections devoted to something else.
doc.lab: http://bit.ly/irc_lab
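Before entering the metadata, it can help to check that no attribute has been forgotten. The sketch below is only an illustration in Python: the function name and the record layout are hypothetical, the attribute names are copied from the list above, and the example values come from the XML example in the older-system notes. The author value is a placeholder.

```python
# Attribute names as listed in these instructions.
REQUIRED = ("title", "author", "pubyear", "uco", "section id", "doc.lab")


def missing_metadata(record):
    """Return the required attributes that are absent or blank in record."""
    return [key for key in REQUIRED if not str(record.get(key) or "").strip()]


if __name__ == "__main__":
    record = {
        "title": "Radon Representation-Based Feature Descriptor for Texture Classification",
        "author": "",  # placeholder: left blank here to show the check
        "pubyear": "2009",
        "uco": "172786",
        "section id": "abstract",
        "doc.lab": "Laboratory of Optical Microscopy [LOM]",
    }
    print("Missing:", missing_metadata(record))
```

The check does not validate the section id or doc.lab values against the linked lists; that still has to be done by eye.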
Then click OK; this section of your paper is now uploaded. Repeat for the other sections of each paper.
When you have finished uploading, click Compile Corpus.
The following notes concern the older system. XML aficionados may prefer it, because it lets them insert correct XML markup throughout a full paper rather than breaking it into sections as required by the metadata tool.
Prefix each section with XML tags, following the example here:
http://ske.fi.muni.cz/auth/file/139946/setup/
<doc id="majtner08" title="Radon Representation-Based Feature Descriptor for
Texture Classification" uco="172786" pubyear="2009" lab="Laboratory of Optical
Microscopy [LOM]">
<section id="abstract">
Then paste the tagged text into the Paste text field in Upload:
https://ske.fi.muni.cz/auth/corpus/1078/
Click Next and continue through the remaining steps.
When the whole text is done, click Compile Corpus.
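For the older XML method, typing the opening tags by hand makes it easy to leave an unescaped & or quotation mark in a title. The following Python sketch builds the two opening lines shown in the example above. The function names are hypothetical, and it produces only the opening tags that the example shows. It uses quoteattr from the standard library to quote and escape attribute values.

```python
from xml.sax.saxutils import quoteattr


def doc_header(doc_id, title, uco, pubyear, lab):
    """Build the opening <doc ...> tag with safely quoted attribute values."""
    attrs = [("id", doc_id), ("title", title), ("uco", uco),
             ("pubyear", pubyear), ("lab", lab)]
    # quoteattr adds the surrounding quotes and escapes &, < and >.
    return "<doc " + " ".join(f"{k}={quoteattr(str(v))}" for k, v in attrs) + ">"


def section_tag(section_id):
    """Build the opening <section ...> tag for one section of the paper."""
    return f"<section id={quoteattr(section_id)}>"


if __name__ == "__main__":
    print(doc_header(
        "majtner08",
        "Radon Representation-Based Feature Descriptor for Texture Classification",
        "172786",
        "2009",
        "Laboratory of Optical Microscopy [LOM]",
    ))
    print(section_tag("abstract"))
```

Unlike the example above, the generated doc tag sits on a single line. XML parsers treat a line break inside an attribute value as a space, so the two forms are equivalent.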