2.2 ELMAN'S NEURAL-NET MODEL OF WORD CATEGORIZATION
Elman (1990) approaches the problem of learning lexical categories with a connectionist model. The model is fed a continuous stream of words from a sentence generator (see Figure 2.1); its task is, given the current word, to predict the next word. The task is complicated by the absence of any end-of-sentence marker, but eased by the strong regularities present in the sentences.
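Although the details are not repeated here, the architecture Elman used is a simple recurrent network: at each time step the hidden layer receives the current input word together with a copy of its own previous activations (the context units), and the context units are treated as ordinary inputs during training, so error is not propagated back through earlier time steps. The NumPy sketch below illustrates that scheme; the class name, layer sizes, and learning rate are illustrative assumptions, not Elman's exact configuration.

    import numpy as np

    class SimpleRecurrentNet:
        """Minimal Elman-style network: the hidden state is copied back as a
        'context' input at the next time step.  Words are one-hot vectors; the
        output is a softmax over the vocabulary, trained to predict the next
        word.  All sizes here are illustrative."""

        def __init__(self, vocab_size, hidden_size, lr=0.1, seed=0):
            rng = np.random.default_rng(seed)
            self.W_in  = rng.normal(0, 0.1, (hidden_size, vocab_size))   # input -> hidden
            self.W_ctx = rng.normal(0, 0.1, (hidden_size, hidden_size))  # context -> hidden
            self.W_out = rng.normal(0, 0.1, (vocab_size, hidden_size))   # hidden -> output
            self.context = np.zeros(hidden_size)
            self.lr = lr

        def step(self, x, target=None):
            """One forward step; if a target (one-hot next word) is given, do
            one gradient update.  The context units are treated as ordinary
            inputs, so there is no backpropagation through time."""
            h = np.tanh(self.W_in @ x + self.W_ctx @ self.context)
            logits = self.W_out @ h
            p = np.exp(logits - logits.max())
            p /= p.sum()                                    # softmax over the vocabulary
            if target is not None:
                d_out = p - target                          # cross-entropy gradient at the output
                d_h = (self.W_out.T @ d_out) * (1 - h * h)  # back through the tanh
                self.W_out -= self.lr * np.outer(d_out, h)
                self.W_in  -= self.lr * np.outer(d_h, x)
                self.W_ctx -= self.lr * np.outer(d_h, self.context)
            self.context = h                # copy hidden activations into the context units
            return p, h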
Sentences are generated from the lexicon using the sentence templates listed in Figure 2.1. Note the sentences that involve the verbs in category AGPAT: these verbs take optional objects, and one of them, BREAK, is lexically ambiguous. Thus, although this model has an extremely small lexicon, it raises a number of nontrivial issues.
Sentence generation was accomplished as follows:
1. Randomly generate a list of 10,000 sentence templates; that is, about 625 occurrences of each template.
2. Fill each slot in a selected template by randomly selecting words from the appropriate category; for example, the second template (HUM-PERCEPT-INANIM) would generate about 313 sentences with verb SMELL and 313 sentences with verb SEE.
3. Assemble the 10,000 sentences into a stream of words with NO indication of sentence boundaries. Thus, the model would see something like the following:
woman, smash, plate, cat, move, man, break, car, boy, ...
The input stream will contain about 27,500 words (since 75% of the sentence templates have three words, and 25% have two words).
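A minimal sketch of this generation procedure is given below. The lexicon and templates are hypothetical stand-ins for a few entries of Figure 2.1, which is not reproduced here; the essential points are that template slots are filled at random within a category and that the resulting sentences are concatenated into one unbroken stream.

    import random

    # Hypothetical stand-ins for part of the Figure 2.1 lexicon and templates.
    LEXICON = {
        "HUM":     ["man", "woman", "boy", "girl"],
        "ANIM":    ["cat", "mouse", "dog"],
        "INANIM":  ["book", "rock", "car"],
        "FRAG":    ["glass", "plate"],
        "PERCEPT": ["see", "smell"],
        "AGPAT":   ["move", "break"],
        "DESTROY": ["smash", "break"],
    }
    TEMPLATES = [
        ("HUM", "PERCEPT", "INANIM"),    # e.g. "woman smell car"
        ("HUM", "DESTROY", "FRAG"),      # e.g. "man smash plate"
        ("HUM", "AGPAT", "INANIM"),
        ("ANIM", "AGPAT"),               # AGPAT verbs may drop their object
    ]

    def generate_stream(n_sentences=10_000, seed=0):
        """Generate sentences from randomly chosen templates and concatenate
        them into one word stream with no end-of-sentence markers."""
        rng = random.Random(seed)
        stream = []
        for _ in range(n_sentences):
            template = rng.choice(TEMPLATES)
            stream.extend(rng.choice(LEXICON[cat]) for cat in template)
        return stream

    stream = generate_stream()
    print(stream[:9])   # e.g. ['woman', 'see', 'car', 'cat', 'move', ...]
    print(len(stream))  # about 27,500: three 3-word templates, one 2-word template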
The actual data set reported in Elman (1990) had 27,534 words. The model was trained to predict the next word in the input stream. It was given six training passes through the data set, and its performance was then evaluated. The net was very bad at its assigned task of predicting the actual next word (RMS error = 0.88). However, it was very good at predicting next-word likelihoods, that is, CONTEXT SENSITIVE probabilities for next words (RMS error = 0.053). Further, when Elman examined the model's word representations (the average hidden-unit, or internal, activations over all occurrences of a word), he obtained the word clustering shown in Figure 2.2.
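The second error measure compares the network's output vector not with the actual next word but with a vector of likelihoods for each possible next word given the context. The sketch below shows one way to make that comparison; note that Elman derived the likelihood vectors from the sentence generator itself, whereas this version, as a simplifying assumption, estimates them empirically from the word stream using a short preceding context.

    from collections import Counter, defaultdict
    import numpy as np

    def likelihood_vectors(stream, vocab, context_len=2):
        """Estimate context-sensitive next-word probabilities from the corpus,
        keyed on the preceding context_len words (a simplification of the
        generator-derived likelihoods Elman used)."""
        word_to_idx = {w: i for i, w in enumerate(vocab)}
        counts = defaultdict(Counter)
        for t in range(context_len, len(stream)):
            ctx = tuple(stream[t - context_len:t])
            counts[ctx][stream[t]] += 1
        vectors = {}
        for ctx, ctr in counts.items():
            v = np.zeros(len(vocab))
            total = sum(ctr.values())
            for w, c in ctr.items():
                v[word_to_idx[w]] = c / total
            vectors[ctx] = v
        return vectors

    def rms_error(outputs, targets):
        """Root-mean-square error between network output vectors and target
        vectors (one-hot next words, or likelihood vectors)."""
        diffs = np.asarray(outputs) - np.asarray(targets)
        return float(np.sqrt(np.mean(diffs ** 2)))

Scoring the same network outputs against one-hot next words and against the likelihood vectors yields the two kinds of RMS error contrasted above.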
Note that the NOUN categories occur in nice groups: HUM, ANIM, AGR, INANIM, FRAG, and FOOD. This is anticipated because of the regular use of these categories in the sentence templates. On the other hand, the VERB categories are quite messy. However, INTRAN and DO-REQD (that is, CHASE and LIKE) are used in a consistent fashion, and they do appear as neat clusters. All the other verbs have unique behaviors. Thus, their clustering is hard to predict.
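The representations behind Figure 2.2 are the hidden-unit activation vectors averaged over every occurrence of each word, grouped by hierarchical clustering. A sketch of that analysis follows, assuming a trained SimpleRecurrentNet from the earlier sketch and using SciPy's average-link clustering; the particular clustering procedure is an assumption, since the text does not specify it.

    from collections import defaultdict
    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.cluster.hierarchy import linkage, dendrogram

    def average_hidden_activations(net, stream, vocab):
        """Run the trained net over the word stream and average the hidden-unit
        activations over all occurrences of each word."""
        word_to_idx = {w: i for i, w in enumerate(vocab)}
        sums = defaultdict(lambda: np.zeros(len(net.context)))
        counts = defaultdict(int)
        net.context[:] = 0.0
        for word in stream:
            x = np.zeros(len(vocab))
            x[word_to_idx[word]] = 1.0
            _, h = net.step(x)            # forward pass only; no weight update
            sums[word] += h
            counts[word] += 1
        return {w: sums[w] / counts[w] for w in sums}

    def cluster_word_representations(net, stream, vocab):
        """Hierarchically cluster the per-word average activations
        (the analysis behind a diagram like Figure 2.2)."""
        reps = average_hidden_activations(net, stream, vocab)
        words = sorted(reps)
        Z = linkage(np.array([reps[w] for w in words]), method="average")
        dendrogram(Z, labels=words)       # draws the cluster tree
        plt.show()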