Managing parallel corpora in CasualMultiPConc

CasualMultiPConc has two modes: File and Database. You can switch modes in Preferences.

When you first run CasualMultiPConc, you can also specify the number of corpora in Preferences.

File Mode

Adding files to corpus file tables

In File Mode, you add files to file list tables.

    1. Select the corpus tab you want to add files to.

    2. Click the Add button and select the file(s) or folder to add to the list. You can also drag and drop file(s) or a folder onto the file list table.

    3. CasualMultiPConc can read the following file types:

        • Plain Text (.txt)

        • Microsoft Word (.doc/.docx)

        • Rich Text (.rtf)

        • OpenOffice (.odt)

    4. For Plain Text files, you can specify an encoding before you add the files to the table.

    5. The added file(s) appear in the file table.

    6. If you added plain text files, you can change the encoding of each file individually in the table.

    7. You can assign a label to the selected corpus. The label is used throughout the application.

    8. Repeat the process until you have added files to all the corpora. Make sure you add the same number of files to each file table.

    9. To remove the selected file(s), click Delete; to empty the table, click Clear.

    10. You can also add plain text (.txt) files to the file lists using Text Aligner.
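Because every file table must hold the same number of files, it can help to count the files before adding whole folders. The following is a minimal sketch, assuming each corpus is kept in its own folder; the function names and the folder layout are hypothetical and not part of CasualMultiPConc. The extension set mirrors the file types listed above.

```python
from pathlib import Path

# Extensions taken from the list of file types CasualMultiPConc can read.
SUPPORTED = {".txt", ".doc", ".docx", ".rtf", ".odt"}


def count_corpus_files(folders):
    """Return {folder: number of supported files} for each corpus folder."""
    counts = {}
    for folder in folders:
        files = [p for p in Path(folder).iterdir()
                 if p.is_file() and p.suffix.lower() in SUPPORTED]
        counts[str(folder)] = len(files)
    return counts


def same_file_count(folders):
    """Return True if every corpus folder holds the same number of supported files."""
    return len(set(count_corpus_files(folders).values())) <= 1
```

If `same_file_count` returns False, the per-folder counts from `count_corpus_files` show which corpus is missing files.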

Previewing file content

You can preview the content of a file in a file table to verify that CasualMultiPConc reads it correctly (especially the text file encoding).

    1. Check File Content Preview to enable the preview.

    2. On a file list table, click the file you want to preview.

    3. The content of the selected file appears in the preview box.

    4. To open a file with a specified application, click Open. The Open button works for the selected file in the table even if file preview is not enabled.
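An encoding mismatch can also be spotted outside the application before a file is added. The following is a minimal sketch, assuming you want to test whether a plain text file decodes without errors under a candidate encoding; the function name is illustrative and not part of CasualMultiPConc.

```python
def decodes_cleanly(path, encoding):
    """Return True if every byte of the file decodes under the given encoding."""
    with open(path, "rb") as f:
        raw = f.read()
    try:
        raw.decode(encoding)
    except UnicodeError:
        # The file contains byte sequences that are invalid in this encoding.
        return False
    return True
```

A clean decode does not prove the encoding is the right one, since some encodings accept almost any byte sequence, so the preview box remains the final check.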

Checking text alignment

File Match Preview

You can check the text alignment of the files across corpora.

    1. Select one of the files in a file table. Make sure every table contains files.

    2. Click Preview Match under the table.

    3. The File Match Preview window appears.

    4. Click any line to check the matching texts.

    5. To edit a file, select a corpus and click Open with Editor; the file for that corpus opens in the specified application.

    6. Once you fix the alignment, save the file and click Update Preview to check the change(s) you made.

    7. When you are satisfied, close the File Match Preview window.

Database Mode

In Database Mode, the text contents of the corpus files are stored in a database file. You can save a database file, import aligned texts, or export the database content.

Creating a new database

    1. Add files to the file tables by following the File Mode instructions above.

    2. Make sure you have the same number of files in each table.

    3. The files in the file tables are added to a database named temp.

    4. You can start using the corpus now.

    5. You can also add text files to the database using Text Aligner.

    6. To start over with a new blank database (temp), click the New DB button.

Saving/opening a database file

Save

You can save the temp database file for later use. Simply click the Save as... button to save the file. The extension is .cmdb.

Open

Click the Open DB File button to open an existing database file. Opening a database file discards the contents of the unsaved temp database (a warning message appears).

You can open both CasualMultiPConc (.cmdb) and CasualPConc (.cpdb) database files.

To add the files in a database file to the current database, use the Import function (see below).

Importing text file(s) with aligned texts

You can import the following files into the current database.

    • CasualPConc database file - use this to merge an existing database file into the current database

    • Aligned text file (.txt) - matched texts from two or more corpora are in a single file; the texts within a matched group are separated by a single line break, and groups are separated by two line breaks (a blank line), as in this example:

        text from corpus 1
        text from corpus 2

        text from corpus 1
        text from corpus 2

    • CSV (.csv) separated by comma (UTF-8) - a matched pair of texts is on one line, separated by a comma

    • CSV (.csv) separated by tab (UTF-8) - a matched pair of texts is on one line, separated by a tab character (a tab-delimited text file with the .csv extension)

    • Note: CSV files should be encoded in ASCII or UTF-8.

When you import a .csv file, you will be asked whether the file is comma-separated or tab-delimited.

This process might be re-worked in the future.
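The import formats described in this section can be converted into one another with a short script. The following is a minimal sketch, assuming an aligned text file in which each matched group sits on consecutive lines and groups are separated by a blank line; the function names are hypothetical. Python's csv module quotes any field that contains the delimiter, and whether CasualMultiPConc accepts quoted fields is an assumption not confirmed by this manual.

```python
import csv


def parse_aligned_text(text):
    """Split aligned text into groups; each group holds one line per corpus."""
    groups = []
    current = []
    for line in text.splitlines():
        if line.strip():
            current.append(line.strip())
        elif current:
            # A blank line closes the current matched group.
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups


def write_csv(groups, path, delimiter=","):
    """Write one matched group per row as UTF-8; pass a tab as delimiter for tab-delimited output."""
    with open(path, "w", encoding="utf-8", newline="") as f:
        csv.writer(f, delimiter=delimiter).writerows(groups)
```

For example, `write_csv(parse_aligned_text(text), "pairs.csv", delimiter="\t")` would produce a tab-delimited file with the .csv extension, where `pairs.csv` is a placeholder file name.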

Exporting database content

To export the text in the database, click the Export button.

The available formats are CSV (.csv), separated by commas or by tabs (tab-delimited), and Parallel Aligned Text (.txt). You can also swap the order of two corpora.

Parallel Aligned Text is the format CasualPConc can import (see above). Each pair of matched sentences or paragraphs from two corpora is kept together and is separated from the previous and next pair by a blank line.

text from corpus 1 - 1
text from corpus 2 - 1

text from corpus 1 - 2
text from corpus 2 - 2

After you import text files into your database, check that the texts were imported properly. If not, select a different File Type or edit the original file.
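The Parallel Aligned Text layout can also be produced and checked by a script. The following is a minimal sketch, assuming two corpora and pairs held as tuples; both function names are hypothetical, and the `swap` flag only imitates the option to swap the order of two corpora rather than reproducing the application's own code.

```python
def to_parallel_aligned_text(pairs, swap=False):
    """Render (corpus 1, corpus 2) pairs with a blank line between pairs."""
    blocks = []
    for first, second in pairs:
        if swap:
            first, second = second, first
        blocks.append(f"{first}\n{second}")
    return "\n\n".join(blocks) + "\n"


def find_bad_groups(text, n_corpora=2):
    """Return 1-based indices of groups that do not have exactly n_corpora lines."""
    normalized = text.replace("\r\n", "\n")
    groups = [g for g in normalized.split("\n\n") if g.strip()]
    bad = []
    for index, group in enumerate(groups, start=1):
        lines = [ln for ln in group.splitlines() if ln.strip()]
        if len(lines) != n_corpora:
            bad.append(index)
    return bad
```

Running `find_bad_groups` on a file before importing it points to groups with a missing or extra line, which are a likely cause of texts that do not import properly.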