CasualConc

© 2008-2009 Yasu Imao

Before You Start

CasualConc is a set of tools for analyzing your own collection of texts.  You need text files on your local hard drive or on a network drive (the latter is slower).  CasualConc works best with plain text files (.txt).  Other file formats (MS Word Doc, PDF, HTML, etc.) are supported, but they take more time to process, so you might want to convert them to plain text files before using them with CasualConc.  CasualTextractor (a utility program) can extract text data from other file formats and save it as plain text files.

The Concord tool was originally designed to work with single-byte character languages with no accent marks.  The current version is Unicode compatible and can handle most European languages (with appropriate settings).  Double-byte character languages (East Asian languages) are also supported, but processing is much slower and the functions are limited.  Unfortunately, right-to-left languages are not supported.

The basic unit of analysis in CasualConc is the paragraph; paragraphs are separated by line feed/break characters (\n or \r\n).  This means context words, sort words, clusters, etc. are handled within this unit.  There are options to force the analysis by sentence or by file, but processing in these modes is much slower.  If you want to analyze your corpus by sentence, process your text files before the analysis (insert the line break character \n after each sentence).  If your texts have carriage returns in the middle of sentences/paragraphs (like the Brown Corpus or text extracted from PDF files), you need to delete them before using CasualConc or set the scope of analysis to File.  I want CasualConc to be able to handle such files in the future (i.e. the current File scope mode + deleting characters from each line, etc.), but not for now.
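The preprocessing described above could be done with a short script before loading files into CasualConc.  The following is only a minimal Python sketch, not part of CasualConc: the function names are made up for illustration, the first function assumes that a blank line marks a real paragraph boundary, and the sentence splitter is naive (it will also split after abbreviations such as "Dr.").

```python
import re


def unwrap_hard_breaks(text: str) -> str:
    """Join lines that were broken mid-paragraph.

    Assumption: blank lines mark real paragraph boundaries.
    Returns one paragraph per line, separated by \n.
    """
    # Normalize Windows (\r\n) and old Mac (\r) line endings to \n.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    paragraphs = re.split(r"\n\s*\n", text)
    joined = [
        " ".join(line.strip() for line in p.split("\n") if line.strip())
        for p in paragraphs
    ]
    return "\n".join(p for p in joined if p)


def one_sentence_per_line(text: str) -> str:
    """Put each sentence on its own line (naive rule).

    A sentence is assumed to end with '.', '!' or '?' followed by whitespace.
    """
    lines = []
    for paragraph in text.split("\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        lines.extend(re.split(r"(?<=[.!?])\s+", paragraph))
    return "\n".join(lines)
```

For a file extracted from a PDF, you would typically run unwrap_hard_breaks first and, if you want sentence-level analysis, pass its result to one_sentence_per_line before saving the text as a new .txt file.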

When you run CasualConc for the first time, it creates a folder named CasualConc in the ~/Library/Application Support folder.  In this folder, a file named CasualConcData.ccdb will be created.  CasualConc stores the corpus/database information used in Advanced Corpus Handling Mode in this file.  If you use Advanced Corpus Handling Mode, do not delete this file (only do so when you have a problem).  If you delete this file, all the corpus/database information will be gone (the database files themselves will not be deleted).
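Because deleting CasualConcData.ccdb discards all corpus/database information, you may want to keep a copy before removing it.  The following is a minimal Python sketch of one way to do that; it is not a CasualConc feature.  The helper name backup_file and the timestamped naming scheme are illustrative assumptions, and the default path simply follows the location described above.  It is probably safest to quit CasualConc before copying.

```python
import shutil
from datetime import datetime
from pathlib import Path

# Location as described in the text above; adjust if your setup differs.
DEFAULT_DB = (
    Path.home() / "Library" / "Application Support"
    / "CasualConc" / "CasualConcData.ccdb"
)


def backup_file(src: Path, dest_dir: Path) -> Path:
    """Copy src into dest_dir under a timestamped name and return the new path."""
    if not src.is_file():
        raise FileNotFoundError(src)
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    dest = dest_dir / f"{src.stem}-{stamp}{src.suffix}"
    shutil.copy2(src, dest)  # copy2 also preserves file timestamps
    return dest
```

For example, backup_file(DEFAULT_DB, Path.home() / "Desktop") would place a dated copy on the Desktop (the destination folder here is only an example).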

If you want to completely delete CasualConc from your HDD, delete CasualConc.app, CasualConcApp.plist in ~/Library/Preferences, and the CasualConc folder in ~/Library/Application Support.  If you leave the latter two in place, they will not do any harm to your Mac (they are just a standard property list file and a simple SQLite data file, respectively).