CasualConc - Beta features - Corpus File Information

In addition to the Basic file information available in version 1.0, word frequency information and TF-IDF have been added. TF-IDF (term frequency-inverse document frequency) is a measure of how prominent or important a word is in a file relative to the other files in the corpus. For more information, read the Wikipedia entry on tf-idf.
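To make the idea concrete, here is a minimal Python sketch of one common TF-IDF variant (relative term frequency multiplied by log(N / document frequency)). This page does not state which formula CasualConc uses, so the weighting below, the function name tf_idf, and the sample file names are assumptions for illustration only.

```python
import math
from collections import Counter

def tf_idf(files, words):
    """Return {file name: {word: score}} for the words in the word list.

    Assumed variant: TF = count / file length, IDF = log(N / df).
    """
    n_files = len(files)
    counts = {name: Counter(tokens) for name, tokens in files.items()}
    # Document frequency: the number of files that contain each word.
    df = {w: sum(1 for c in counts.values() if c[w] > 0) for w in words}
    scores = {}
    for name, tokens in files.items():
        total = len(tokens) or 1  # avoid division by zero for empty files
        scores[name] = {
            w: (counts[name][w] / total) * math.log(n_files / df[w]) if df[w] else 0.0
            for w in words
        }
    return scores

# Hypothetical three-file corpus, already split into words.
corpus = {
    "a.txt": "the cat sat on the mat".split(),
    "b.txt": "the dog chased the cat".split(),
    "c.txt": "the bird sang".split(),
}
scores = tf_idf(corpus, ["the", "cat", "bird"])
```

In this sketch a word that occurs in every file (here "the") scores zero everywhere, while a word concentrated in one file (here "bird" in c.txt) scores highest in that file.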

Basic Operations

You need to import a word list to count words or calculate TF-IDF in each file.

Select one of the three sources to read a word list from.

Word Count - import a word list from the left table of Word Count. You need to create or import a word list in Word Count.

You can limit the number of words imported from the Word Count table. This is to avoid accidentally importing a very long list. Go to Preferences -> File Info. If you want to create a word frequency matrix of all the words in the corpus, uncheck this option, but processing may take a very long time and a lot of memory depending on the size of your corpus.

File - import a word list from a file (comma-separated or tab-delimited plain text file).

You can select a text encoding and rows and columns to ignore when importing.
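If you want to prepare or verify a word list file outside CasualConc, the following Python sketch reads a delimited word list while ignoring leading rows and columns. It assumes that "rows and columns to ignore" means skipping a number of rows from the top and columns from the left; the function read_word_list and the sample data are hypothetical.

```python
import csv
import io

def read_word_list(text, delimiter=",", skip_rows=0, skip_cols=0):
    """Return the first kept column of each kept row as a list of words."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    words = []
    for i, row in enumerate(reader):
        if i < skip_rows or len(row) <= skip_cols:
            continue  # ignored header row, or row too short to hold a word
        word = row[skip_cols].strip()
        if word:
            words.append(word)
    return words

# Hypothetical tab-delimited list: one header row, rank in the first column.
sample = "rank\tword\tfreq\n1\tthe\t120\n2\tof\t75\n3\tand\t60\n"
words = read_word_list(sample, delimiter="\t", skip_rows=1, skip_cols=1)
```

For a file on disk, the text can be read first with open(path, encoding="utf-8", newline="") using whichever encoding the file was saved in.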

You can also check what will be imported by selecting a file and clicking Check.

Import Panel - import a word list that you copy/paste from another list or type in.

A panel appears, so just copy/paste a word list or type words and click Read.

You can check the words that are imported by clicking Check.

A panel appears. Check if the words on the table are the ones you want to count.

Once you import words, set the span, that is, the range of positions on the list to use. For example, a span from 1 to 20 reads the first word through the 20th word on the list.
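The span can be thought of as a 1-based, inclusive range over the imported list. A minimal Python sketch, in which the function apply_span and the generated word list are hypothetical:

```python
def apply_span(words, start, end):
    """Return the words from position start to position end (1-based, inclusive)."""
    if start < 1 or end < start:
        raise ValueError("span must satisfy 1 <= start <= end")
    return words[start - 1:end]

# Hypothetical imported list of 50 words: word1, word2, ..., word50.
word_list = [f"word{i}" for i in range(1, 51)]
selected = apply_span(word_list, 1, 20)
```

Here selected holds the first 20 words; only these would be counted.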

When you are ready to count words or calculate TF-IDF, click Get File Info.

The results can be exported as a CSV file (.csv) or a tab-delimited file.

You can select rows and copy/paste the selected results or copy all the results on the table.

Options

Word Frequency Information

You can limit the number of words displayed on the result table. Go to Preferences -> File Info -> Word Frequency Information, check Limit result table columns to, and enter a number. Even with this checked, all the words within the span (see above) are counted; the words beyond the limit are simply not displayed on the table. If you export or copy/paste the results, all the words appear in the output.

If Only copy the results in the range displayed on the table is checked, the above limit applies to the copied results.

You can normalize the frequency. To set this, go to Preferences -> File Info -> Word Frequency Information and check Normalize word freq and select how you want to normalize the result.

You can select % or per words. If the latter is selected, specify the number of words to normalize to.
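The two normalization options correspond to simple arithmetic on the raw count and the total number of words in the file. A minimal Python sketch, in which the function normalize and the sample numbers are hypothetical:

```python
def normalize(count, total_words, per=None):
    """Return count as a percentage, or per `per` words when per is given."""
    if total_words <= 0:
        return 0.0
    scale = 100 if per is None else per
    return count / total_words * scale

# A word that occurs 30 times in a 1,500-word file.
pct = normalize(30, 1500)             # as a percentage of the file's words
per_1000 = normalize(30, 1500, 1000)  # as occurrences per 1,000 words
```

With these numbers the word accounts for 2 percent of the file, or 20 occurrences per 1,000 words. Normalizing makes counts comparable across files of different lengths.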

Check Sort frequency list of each file by word frequency to sort the results.
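Sorting by frequency in each file means every file gets its own word order. A minimal Python sketch, assuming descending frequency with alphabetical order for ties (the tie-breaking rule is an assumption, as are the function name and sample counts):

```python
def sort_counts_each_file(counts):
    """Return {file name: [(word, count), ...]} sorted by descending count."""
    return {
        name: sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0]))
        for name, freqs in counts.items()
    }

# Hypothetical raw counts for two files.
counts = {
    "a.txt": {"the": 12, "of": 5, "and": 8},
    "b.txt": {"the": 7, "of": 9, "and": 7},
}
sorted_counts = sort_counts_each_file(counts)
```

In this sample, a.txt is ordered the, and, of, while b.txt is ordered of, and, the.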

Normal

Normalized to %

Sorted by frequency in each file

TF-IDF

You can sort TF-IDF results. To enable sorting, go to Preferences -> File Info -> TF-IDF and select either Sum of all files or Each file.
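The difference between the two sort options can be sketched in Python as follows. The scores, file names, and function names are made up for illustration, and descending order is an assumption:

```python
def sort_by_total(scores):
    """Order words by the sum of their scores over all files (one shared order)."""
    totals = {}
    for per_file in scores.values():
        for word, value in per_file.items():
            totals[word] = totals.get(word, 0.0) + value
    return sorted(totals, key=lambda w: -totals[w])

def sort_tfidf_each_file(scores):
    """Order words separately within each file by that file's own scores."""
    return {
        name: sorted(per_file, key=lambda w: -per_file[w])
        for name, per_file in scores.items()
    }

# Hypothetical TF-IDF scores for three words in two files.
tfidf_scores = {
    "a.txt": {"cat": 0.10, "dog": 0.00, "bird": 0.05},
    "b.txt": {"cat": 0.02, "dog": 0.30, "bird": 0.00},
}
shared_order = sort_by_total(tfidf_scores)
per_file_order = sort_tfidf_each_file(tfidf_scores)
```

Sum of all files yields a single word order shared by every file, which keeps the columns aligned for comparison; Each file yields a different order per file, which makes the most distinctive words of each file easy to spot.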

No sort

Sorted by Total (Sum of all files)

Sorted by the values in each file