3. Occurrence-Based Word Categorization

CHAPTER 3

OCCURRENCE-BASED WORD CATEGORIZATION

We have developed a method of word categorization that we call "occurrence-based." Following the observations we made in Chapter 2, we scan a sample of text, recording the contexts associated with each of its words. We then use a similarity measure to group the words that have the "most similar" contexts. The result is a set of word groups that represent the natural categories present in the text. In this chapter, we describe our technique in detail and illustrate it using Elman's "toy" language.
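
The three steps just outlined (record contexts, measure similarity, group words) can be sketched in a few lines of Python. This is only an illustrative sketch under assumptions of our own choosing, not the technique as specified later in the chapter: a "context" is taken to be the immediately preceding and following word, similarity is taken to be the cosine between context-count vectors, and grouping is a simple greedy pass with a hypothetical threshold of 0.5. The six-sentence corpus is likewise made up for the example and is not Elman's language.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(sentences):
    """Scan the text and count, for each word, its left and right neighbors.

    Assumption: a context is the adjacent word on either side, with
    "<s>" and "</s>" marking the sentence boundaries.
    """
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for i in range(1, len(tokens) - 1):
            vectors[tokens[i]][("left", tokens[i - 1])] += 1
            vectors[tokens[i]][("right", tokens[i + 1])] += 1
    return dict(vectors)


def cosine(a, b):
    """Cosine similarity between two context-count vectors (Counters)."""
    dot = sum(count * b[key] for key, count in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def group_words(vectors, threshold=0.5):
    """Greedy grouping: a word joins the first group whose members are all
    at least `threshold` similar to it; otherwise it starts a new group.

    Assumption: both the greedy strategy and the threshold are placeholders.
    """
    groups = []
    for word in sorted(vectors):
        for group in groups:
            if all(cosine(vectors[word], vectors[m]) >= threshold for m in group):
                group.append(word)
                break
        else:
            groups.append([word])
    return groups


# Hypothetical miniature corpus: three nouns and two verbs.
sentences = [
    "boy sees dog",
    "dog sees cat",
    "cat sees boy",
    "boy chases cat",
    "cat chases dog",
    "dog chases boy",
]

vectors = context_vectors(sentences)
groups = group_words(vectors)
print(groups)  # [['boy', 'cat', 'dog'], ['chases', 'sees']]
```

In this deliberately balanced corpus, every noun occurs in exactly the same contexts as every other noun, and likewise for the verbs, so the two groups fall out cleanly. Real text is far less tidy, which is why the choice of context and similarity measure matters.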