Text Mining

Text Mining allows you to break down and analyse your text data, "mining" it for information. Currently CSV Easy has just one mode of mining, Phrase Occurrence:

First, specify the column containing the phrases you wish to analyse. Then, optionally, select a numeric aggregated count column. This column should contain the number of occurrences of each phrase. If you omit this option, each phrase is assumed to occur only once.
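The effect of the optional count column can be sketched as follows. This is a minimal illustration, not CSV Easy's implementation: the function name `count_phrases`, the column names `phrase` and `hits`, and the sample rows are all hypothetical, and the sketch counts whole cell values only.

```python
from collections import Counter


def count_phrases(rows, phrase_col, count_col=None):
    """Sum occurrences per phrase; a row counts once when count_col is None."""
    totals = Counter()
    for row in rows:
        # Use the aggregated count when a column is given, otherwise 1.
        weight = int(row[count_col]) if count_col is not None else 1
        totals[row[phrase_col]] += weight
    return totals


# Hypothetical sample data: "hits" plays the role of the count column.
rows = [
    {"phrase": "happy girl", "hits": "3"},
    {"phrase": "sad boy", "hits": "2"},
    {"phrase": "happy girl", "hits": "4"},
]

weighted = count_phrases(rows, "phrase", "hits")
unweighted = count_phrases(rows, "phrase")
```

With the count column selected, "happy girl" totals 7; without it, each of its two rows counts once, giving 2.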

Clicking 'Run' will mine the data. The results are returned in the main view as a two-column file of Phrase and Count. Phrase contains every permutation of each phrase from the original data, and Count is the number of occurrences of that phrase.

What do we mean by a permutation of a phrase? Let's take the following source row:

The girl is happy

After mining we could get the following results:
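The exact splitting rule is not spelled out on this page. As a rough sketch only, assuming a "permutation" means any contiguous run of words from the row (an assumption, not confirmed behaviour of CSV Easy), the candidate phrases could be generated like this. The function name `contiguous_phrases` is hypothetical.

```python
def contiguous_phrases(text):
    """Return every contiguous run of words in text, shortest start first."""
    words = text.split()
    return [
        " ".join(words[start:end])
        for start in range(len(words))
        for end in range(start + 1, len(words) + 1)
    ]


phrases = contiguous_phrases("The girl is happy")
```

Under this assumption a four-word row yields ten phrases, from single words such as "girl" up to the full row, but never reordered or non-adjacent combinations such as "The happy".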

Similarity phrases

This optional feature allows you to group phrases together into one collective phrase. As in the screenshot above, we might only be interested in knowing that a phrase was an animal or a person. We don't wish to count cat, dog, fish, man, woman, male or female as individual phrases. The format is one comma-separated line per similarity list. The first phrase is the replacement, and all subsequent values are the similar phrases to replace.
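The similarity list format described above can be sketched as a simple lookup. This is an illustration under assumptions: the helper names `parse_similarity_lists` and `apply_similarity` are hypothetical, the replacement labels `animal` and `person` are chosen to match the example, and matching here is exact and case sensitive.

```python
def parse_similarity_lists(text):
    """Map each similar phrase to the first phrase on its line."""
    mapping = {}
    for line in text.splitlines():
        parts = [part.strip() for part in line.split(",") if part.strip()]
        if len(parts) < 2:
            continue  # a line needs a replacement plus at least one phrase
        replacement, *similar = parts
        for phrase in similar:
            mapping[phrase] = replacement
    return mapping


def apply_similarity(phrase, mapping):
    """Return the collective phrase, or the phrase itself if unlisted."""
    return mapping.get(phrase, phrase)


similarity_text = """animal,cat,dog,fish
person,man,woman,male,female"""

mapping = parse_similarity_lists(similarity_text)
grouped = [apply_similarity(p, mapping) for p in ["cat", "woman", "tree"]]
```

Here "cat" is counted as "animal" and "woman" as "person", while "tree" appears in no list and is left unchanged.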

Additional options

Case sensitive matching ensures that only phrases with exactly the same letters and casing are considered the same.

Apply Stem Words: When a phrase is extracted, it is converted to its Stem Word if possible. For example, Fish, Fishing and Fishs all have the Stem Word Fish.

Apply Homophone Words: As with Stem Words, each phrase is converted to its Homophone group word. Homophones are words that sound the same but are spelt differently, so too, two and to will all become to.
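The three options can be pictured as a normalisation step applied to each word before counting. The sketch below is illustrative only: `toy_stem` is a deliberately naive suffix stripper standing in for a real stemming algorithm, the `HOMOPHONES` table is a hypothetical two-entry example, and lower-casing when case sensitivity is off is an assumption about how matching works.

```python
# Hypothetical homophone table: each word maps to its group word.
HOMOPHONES = {"too": "to", "two": "to"}


def toy_stem(word):
    """Naive stemmer: strip one common suffix, keeping at least 3 letters."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalise(word, case_sensitive=False, stem=False, homophones=False):
    """Apply the selected options in order: casing, stemming, homophones."""
    if not case_sensitive:
        word = word.lower()  # assumption: case-insensitive means lower-cased
    if stem:
        word = toy_stem(word)
    if homophones:
        word = HOMOPHONES.get(word, word)
    return word


stems = [normalise(w, stem=True) for w in ["Fish", "Fishing", "Fishs"]]
groups = [normalise(w, homophones=True) for w in ["too", "two", "to"]]
```

With stemming on, Fish, Fishing and Fishs collapse to one entry, and with homophones on, too, two and to are counted together. A real stemmer handles far more cases than this suffix list.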
