Text Importation

By Miller Prosser, October 2013

Updated January 2017

OCHRE can import text copied from other sources, such as text editors and word processors.

There are many options to help import the text correctly. 

The Pending Content Tab

Type or paste your text here.

Text entered in square and half brackets will be marked with the appropriate level of metadata upon import. 

Entering Special Codes

Broken Sections

Use the following codes to indicate broken sections.

NOTE: the resulting display depends on the configuration of the epigraphic sigla at the project level.


A section of approximately 3 broken signs: [x$3$] will result in something like [x x x]

A section of an indeterminate number of broken signs: [x$~$] will result in something like [ ]

A section of an indeterminate number of broken signs equivalent to about 6 signs: [x$%6$] will result in a blank space six signs long.

A section of 3 broken lines: [X$3$]

NOTE: If you are providing line numbers in your transliteration, do not indicate a line number when using the missing lines code X$.


A section of an indeterminate number of broken lines: [X$~$]

A section represented as both blank and missing: [x$3$]

NOTE: You may use this last code to indicate blank space as well. Simply omit the brackets and use the code x$9$, replacing the number 9 with the approximate amount of space needed. However, keep in mind that the blank space will be assigned the "missing signs" metadata property even if the view of the text looks like simple blank space.

NOTES:

Examples of broken sections:

ml[k x$~$] ytn will be imported as ml[k ...] ytn with ytn being a separate word.

ml[k x$~$]ytn will interpret -ytn as the latter half of a word that began in the break.

mlk [x$9$] ytn will be imported as mlk [ . . . . . . . . .] ytn.

NOTE: the blank and missing space is usually represented simply by spaces in the text view; however, users can override this display and choose another character in their project-level preferences.

NOTE: If you import blank space in the middle of a word, this blank space will be added to the discourse unit. In most cases, this will cause the text analysis wizard to fail when looking up this attested form. You will need to remove the blank epigraphic unit from the discourse unit before it will match against an attested form in the glossary. 
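The codes above follow a regular pattern, so it can help to check a transliteration for them before pasting it into the Pending Content tab. The following Python sketch is only an illustration and is not OCHRE's own parser. The names CODE, describe, and find_codes are hypothetical, and the sketch assumes that the % form may also follow X, which this article does not confirm.

```python
import re

# Pattern for the broken-section codes described in this article:
# x (signs) or X (lines), then $...$ holding a count (3),
# the indeterminate marker (~), or a blank width (%6).
# Assumption: the % form is accepted after X as well as x.
CODE = re.compile(r"(?P<unit>[xX])\$(?P<arg>~|%?\d+)\$")


def describe(code):
    """Classify a single code such as 'x$3$', 'X$~$', or 'x$%6$'."""
    m = CODE.fullmatch(code)
    if m is None:
        raise ValueError(f"not a broken-section code: {code!r}")
    unit = "lines" if m["unit"] == "X" else "signs"
    arg = m["arg"]
    if arg == "~":
        # Indeterminate number of broken signs or lines.
        return {"unit": unit, "count": None, "blank": False}
    if arg.startswith("%"):
        # Indeterminate break shown as blank space of about this width.
        return {"unit": unit, "count": int(arg[1:]), "blank": True}
    return {"unit": unit, "count": int(arg), "blank": False}


def find_codes(text):
    """Return every broken-section code found in a line of transliteration."""
    return [m.group(0) for m in CODE.finditer(text)]
```

For example, find_codes("ml[k x$~$] ytn") finds the single code x$~$, and describe("x$3$") reports three broken signs.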

Here is a sample of Pending Content, followed by the resulting import:

Recto

[X$5$]

6) k [x$14$]

7) w [x$14$]

8) a͗⸢ḥ⸣[x$11$]

9) k . [x$13$]

10) w . ⸢ṯ⸣[x$12$]

11) w . y[x$13$]

Recto

[5 lines missing]

(6) k [--------------]

(7) w [--------------]

(8) a͗⸢ḥ⸣[-----------]

(9) k . [-------------]

(10) w . ⸢ṯ⸣[------------]

(11) w . y[-------------]
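To make the relationship between the sample Pending Content and its imported view concrete, here is a small Python sketch that imitates the display shown in the sample. It is not how OCHRE performs the import. The function name render and the filler parameter are hypothetical, the dash filler simply mirrors this sample (the actual display depends on the project's epigraphic sigla), and the sketch handles only numeric codes written in the x$N$ and X$N$ form.

```python
import re

SIGNS = re.compile(r"x\$(\d+)\$")       # broken signs, e.g. x$14$
LINES = re.compile(r"\[X\$(\d+)\$\]")   # broken lines, e.g. [X$5$]
LINE_NO = re.compile(r"^(\d+)\)")       # leading line number, e.g. 6)


def render(line, filler="-"):
    """Imitate the imported view of one Pending Content line."""
    # [X$5$] becomes a note about missing lines.
    line = LINES.sub(lambda m: f"[{m.group(1)} lines missing]", line)
    # x$14$ becomes fourteen filler characters inside its brackets.
    line = SIGNS.sub(lambda m: filler * int(m.group(1)), line)
    # A line number written as 6) is shown as (6).
    return LINE_NO.sub(r"(\1)", line)


pending = ["[X$5$]", "6) k [x$14$]", "8) a͗⸢ḥ⸣[x$11$]"]
rendered = [render(line) for line in pending]
```

Passing a different filler, such as a space, approximates a project that displays missing signs as blank space.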

Ruled Lines

A similar strategy is used for noting the presence of ruled lines on the epigraphic representation of a Text. Use "s" to represent a single line; "S" to represent a double line. Include the length of the line as shown in the following examples:

The s- or S- notation marks the resulting epigraphic unit's Type as a "separator" and, in effect, assigns the following settings which are made explicit on the epigraphic unit's pane. If auto-line-numbering is on, the ruled line is not assigned a number. 


NOTE: after import, all of these options can be changed in edit mode. See the wiki article on using metadata to indicate damage and missingness. 

Other Codes

OCHRE provides two codes for creating compound discourse units during the import process. These compound units could be words with clitic elements (as is common in certain alphabetic languages), compound proper names, or entire phrases. Each project may decide for itself which character codes to use when marking up import text. The codes should be entered at the project level on the epigraphic sigla tab. We suggest the following two codes because they are relatively safe in that they do not communicate any inherent processing meaning for the database.

For compounds combined without intervening spaces:

For compounds separated by spaces:

NOTE: In the case where you want the compound items to be separated by a specific character, like a dash, configure the epigraphic sigla tab with your chosen character in the compound field. Be careful not to use a character that has other meaning on import.

The Import Tab

Link in the appropriate dictionaries and writing systems. The import will try to match against these, if the user so chooses.

On the Information Tab of the Text, it is helpful (and sometimes necessary) to select the type of writing and the language of the text.


Enable epigraphic-unit processing: this will atomize the text down to individual signs or letters.

Lookup script-units/readings in available writing systems: links each sign to a sign in a writing system.

Automatically number each line of text: select this option if you do not intend to provide line numbers in the Pending Content pane.

Expect section headers in the text: check this box if you intend to provide headers like Recto, Section, etc. In the Pending Content tab, each header must follow a blank line.

Leave 'x' and 'X' unformatted: check this box if you include 'x' or 'X' as indications of broken signs. This will tell OCHRE not to treat these as real letters.

Parse words into individual alphabetic characters: this is only available for texts of the Type: alphabetic. If unchecked, OCHRE will not fully atomize words into letters.

Apply language tags to numerals: if checked, OCHRE will define logographic numerals (e.g. 5, 1/2, etc.) as being in the selected language.

Remove format: these check boxes allow the user to provide italicized text on import to be used for language recognition and to remove this formatting upon import. These are useful if you are importing a bilingual text and you want the import to recognize Latin apart from Punic, for example.


There are some helpful notes at the bottom of the page explaining some of the assumptions OCHRE makes based on the input it receives in the Pending Content pane. 
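Two of the conventions mentioned in this article can be checked before import: a section header must follow a blank line, and the missing lines code X$ should not carry a line number. The Python sketch below is an optional pre-check written for illustration. It is not part of OCHRE, and the function name check_pending, the header list, and the assumption that a header on the very first line is acceptable are all hypothetical.

```python
import re

# A line number as written in the sample Pending Content, e.g. "6)".
NUMBERED = re.compile(r"^\d+\)")


def check_pending(lines, headers=("Recto", "Section")):
    """Return (index, message) pairs for lines that break either convention."""
    problems = []
    for i, line in enumerate(lines):
        text = line.strip()
        # A header must follow a blank line (assumed fine on the first line).
        if text in headers and i > 0 and lines[i - 1].strip() != "":
            problems.append((i, "header does not follow a blank line"))
        # The missing-lines code X$ should not be given a line number.
        if "X$" in text and NUMBERED.match(text):
            problems.append((i, "missing-lines code has a line number"))
    return problems
```

An empty result means neither problem was found. It does not guarantee that the import will succeed.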

Project Specifications

Each project must specify its transcription system in the project specifications. OCHRE will use these specifications to recognize languages and writing systems. A project can choose which cuneiform transcription tradition it wishes to follow, and whether signs are uppercase, lowercase, italicized, or superscript.

Additional Assumptions

Workflow

Other Notes

On rare occasions, you may need to classify a sign as a specific type even though it is not a typical sign. Perhaps it does not match against a sign in a writing system. Perhaps you can tell the sign in question is a number, but you cannot determine the value. In these cases, you can use the sign type pick list to specify manually the type of sign. 

This will allow OCHRE to handle and view the sign as needed. This manual choice should not be used in most cases.