Before you start CasualMultiPConc

CasualMultiPConc is my second attempt at a parallel concordancer, and it is based on CasualPConc. The feature set of CasualMultiPConc is very simple at the moment. Its only advantage over CasualPConc is that it can handle 2 to 5 parallel corpora. Because I don't personally use parallel concordancers, I wrote this with my very limited knowledge of them. So whether this project proceeds depends on your feedback. If enough people use it and give me feedback (or tell me what should be included or how things should be handled), I might spend more time developing this program further. In fact, this project started when I got feedback on CasualPConc, which can only handle 2 corpora at a time.

To use CasualMultiPConc, you need corpus files that are aligned. This means you need a separate file for each of two or more corpora, and the matched files should be aligned with each other. "Aligned" here means that all matched files have the same number of paragraphs (text separated by a line break character) and that the paragraphs match one to one. In the current implementation, CasualMultiPConc ignores blank lines, so as long as matched sentences/paragraphs appear in the same position (nth sentence/paragraph), the number of blank lines does not matter. You can also open unaligned files and check the alignment in this application.
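
If you want a rough check before opening your files, you can count the non-blank lines in each file and compare the counts. The following is a minimal Python sketch and is not part of CasualMultiPConc. The function names are hypothetical, and it assumes UTF-8 plain text files with one paragraph per line. It only compares counts, so it cannot tell you whether the nth paragraphs actually match in content.

```python
from pathlib import Path


def count_paragraphs(text):
    """Count non-blank lines; blank lines are ignored, as described above."""
    return sum(1 for line in text.splitlines() if line.strip())


def check_alignment(paths, encoding="utf-8"):
    """Return (aligned, counts) for a list of corpus file paths.

    aligned is True when every file has the same number of non-blank lines.
    counts maps each path (as a string) to its paragraph count.
    """
    counts = {
        str(p): count_paragraphs(Path(p).read_text(encoding=encoding))
        for p in paths
    }
    aligned = len(set(counts.values())) <= 1
    return aligned, counts
```

For example, `check_alignment(["english.txt", "japanese.txt"])` (hypothetical file names) returns whether the counts agree, together with the count for each file, so you can see which file is off.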

The supported file formats are plain text (.txt), Rich Text (.rtf), MS Word (.doc/.docx), and OpenOffice (.odt/.sxw). However, to be safe (and for faster processing), plain text files (.txt) are recommended. For plain text files, various text encodings are supported, but UTF-8 is the default and is recommended.
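
If your plain text files are in another encoding, one way to convert them to UTF-8 beforehand is a short script like the one below. This is only a sketch and is not part of CasualMultiPConc. The function name is hypothetical, and you need to know the source encoding yourself (for example "shift_jis" or "cp1252"); the script does not try to guess it. Line endings may be normalized to your platform's default when the file is rewritten.

```python
from pathlib import Path


def convert_to_utf8(src, dst, source_encoding):
    """Read src using source_encoding and write its text to dst as UTF-8."""
    text = Path(src).read_text(encoding=source_encoding)
    Path(dst).write_text(text, encoding="utf-8")
    return dst
```

Writing to a new file rather than overwriting the original keeps the source intact in case the encoding you specified turns out to be wrong.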

CasualMultiPConc can work directly with aligned text files, or you can create a database file from your corpus files for faster analysis. You can also create a database by importing CSV/tab-delimited text files or text files in a certain format (there might still be bugs in this feature). The database file can be saved for future use.
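
If you want to prepare a tab-delimited file from already aligned text files, something like the following Python sketch could be a starting point. It is not part of CasualMultiPConc, and the function names are hypothetical. The column layout that the import feature expects is not described here, so "one row per paragraph, one column per corpus" is only an assumption; check it against the import dialog before relying on it.

```python
import csv
from pathlib import Path


def read_paragraphs(path, encoding="utf-8"):
    """Return the non-blank lines of a file, stripped of surrounding whitespace."""
    text = Path(path).read_text(encoding=encoding)
    return [line.strip() for line in text.splitlines() if line.strip()]


def write_tab_delimited(paths, out_path):
    """Write one row per aligned paragraph, one column per corpus file.

    The layout is an assumption, not the documented import format.
    Returns the number of rows written.
    """
    columns = [read_paragraphs(p) for p in paths]
    lengths = {len(col) for col in columns}
    if len(lengths) > 1:
        raise ValueError(f"files are not aligned: paragraph counts {sorted(lengths)}")
    with open(out_path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerows(zip(*columns))
    return len(columns[0]) if columns else 0
```

The script refuses to write anything when the paragraph counts differ, because a misaligned row would silently pair the wrong sentences.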