2.3 ANALYSIS OF THE OUTPUTS
Elman's model is trained on the task of predicting the next word in the input stream. Although a powerful statistical machine like Elman's PDP net was obviously doing more than first-order statistics, we wondered whether these simple statistics alone carried enough information to meet our needs. So the first question we asked was: How much of this internal structure is inherent in the training data? To answer this question, we duplicated Elman's experiment, tallying next-word probabilities (i.e., for each word in the lexicon, we counted the words that followed it and normalized each word's counts to sum to one). Using the resulting next-word transition vectors as the internal representations for the words of the language, hierarchical clustering produced the diagram in Figure 2.3.
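The tally itself is straightforward. The sketch below is a minimal reconstruction in Python rather than the original tooling; it assumes the training corpus is available as a flat list of word tokens (tokens) over a fixed lexicon, and SciPy's hierarchical clustering stands in for whatever clustering package was actually used.

    import numpy as np
    from scipy.cluster.hierarchy import dendrogram, linkage

    def transition_vectors(tokens, lexicon):
        """Row w holds the probabilities of each word following word w."""
        index = {w: i for i, w in enumerate(lexicon)}
        counts = np.zeros((len(lexicon), len(lexicon)))
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[index[prev], index[nxt]] += 1
        # Normalize each row of raw counts into a probability vector.
        return counts / np.maximum(counts.sum(axis=1, keepdims=True), 1)

    # vectors = transition_vectors(tokens, lexicon)
    # dendrogram(linkage(vectors, method='average'), labels=lexicon)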
As the cluster diagram indicates, most of the primary word clusters that Elman found are present. However, the FOOD, FRAG, and INTRANS groups have been consolidated into a single primary cluster. (This is because all three of these categories end sentences; every word in them is therefore always followed by a subject noun.) Further, Elman's higher-level word clusters are not duplicated: the first-order statistics do not group the primary clusters into neat categories such as ANIMATES/INANIMATES or NOUNS/VERBS.
Thus, simple transition probabilities are not sufficient to duplicate Elman's results. However, they are sufficient to duplicate the primary word groups: AGR, ANIM, HUM, INANIM, FOOD, FRAG, INTRANS, and DO-REQD. In subsequent cluster charts, we will treat these primary groups as single entries. This will allow us to present both Elman's and our cluster diagrams in the same figure.
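How the primary groups are collapsed into single entries is not spelled out here; one plausible reading, sketched below under that assumption, is to replace each group's member vectors with their mean before clustering. The groups mapping is hypothetical.

    import numpy as np

    def collapse_groups(vectors, lexicon, groups):
        """groups: e.g. {'ANIM': ['cat', 'mouse', ...], ...} (hypothetical).
        Returns one mean vector per primary group, plus the group labels."""
        index = {w: i for i, w in enumerate(lexicon)}
        labels = sorted(groups)
        rows = [np.mean([vectors[index[w]] for w in groups[g]], axis=0)
                for g in labels]
        return np.vstack(rows), labels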
When Elman evaluated his net's performance, he compared the output against context-sensitive likelihood vectors. As a next step in this analysis, we investigated whether using context-sensitive word representations could replicate Elman's word categorization pattern. (That is, we represented each word by a pair of probability vectors. The first was the transition vector used in the previous experiment. The second was a similar vector representing the prior word. Thus we modified the transition vector to include some reflection of the prior context.¹) The cluster diagram for this word representation is in Figure 2.4.
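In the same spirit, the paired representation can be sketched as the next-word tally from before concatenated with an analogous prior-word tally; again this is our own reconstruction, not the original code, under the same tokens/lexicon assumptions.

    import numpy as np

    def context_vectors(tokens, lexicon):
        """Concatenate next-word and prior-word probability vectors."""
        index = {w: i for i, w in enumerate(lexicon)}
        n = len(lexicon)
        after, before = np.zeros((n, n)), np.zeros((n, n))
        for prev, nxt in zip(tokens, tokens[1:]):
            after[index[prev], index[nxt]] += 1    # what follows prev
            before[index[nxt], index[prev]] += 1   # what precedes nxt
        after /= np.maximum(after.sum(axis=1, keepdims=True), 1)
        before /= np.maximum(before.sum(axis=1, keepdims=True), 1)
        return np.hstack([after, before])          # 2n-dimensional rows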
Note that INANIM has joined ANIM and HUM to form a word cluster that includes all words that can serve as both SUBJECTs and OBJECTs. However, this is very minor progress toward the desired word categorization. Increasing the "depth" of sensitivity (for example, using three probability vectors: the next-word transition vector plus similar vectors for the two preceding words) did not improve the clustering.
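The deeper variant generalizes the same tally to earlier offsets. A sketch, assuming the same tokens/lexicon setup as above:

    import numpy as np

    def deep_context_vectors(tokens, lexicon, depth=2):
        """Next-word vector plus one vector per preceding offset 1..depth."""
        index = {w: i for i, w in enumerate(lexicon)}
        n = len(lexicon)
        parts = [np.zeros((n, n)) for _ in range(depth + 1)]
        for i, w in enumerate(tokens):
            if i + 1 < len(tokens):                # next word
                parts[0][index[w], index[tokens[i + 1]]] += 1
            for d in range(1, depth + 1):          # d-th preceding word
                if i >= d:
                    parts[d][index[w], index[tokens[i - d]]] += 1
        parts = [p / np.maximum(p.sum(axis=1, keepdims=True), 1) for p in parts]
        return np.hstack(parts)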
Finally, we decided to investigate whether adding an end-of-sentence marker (<EOS>) would aid us in duplicating Elman's results. Notice how the two noun groups ANIMATES and INANIMATES are used within sentences.
It appears that a key characteristic of ANIMATES is that they are predominately SUBJECTs. Likewise, INANIMATES can be characterized as predominately OBJECTs. Thus, Elman's word categorizations might be based on word usage within the sentence. For this to occur, it seems likely that the network must be learning to detect sentence boundaries. The next question we investigated was: Can first-order statistics plus an end-of-sentence marker (<EOS>) explain Elman's word categorization?
Again, we duplicated Elman's experimental data, but this time we added <EOS> at the end of each sentence. Again, the context-sensitive word representations were used. Cluster analysis produced the diagram in Figure 2.5. The addition of an end-of-sentence marker produced a higher-order cluster for NOUNS-W/O-AGR. Further, the three INANIMATE word groups form a separate higher-order cluster. Unfortunately, the FOOD and FRAG groups have been brought much closer together and should be considered merged into a single OBJECT-ONLY primary group. Finally, only seven of the eleven verbs are grouped together before the nouns and verbs become mixed. Thus we still lack the clean division of words into NOUNS and VERBS that is so prominent in Elman's word clustering.
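Reconstructing this variant only requires inserting the marker before the statistics are gathered. A sketch, assuming the corpus is available sentence by sentence (sentences, a list of word lists) and reusing context_vectors from the earlier listing:

    def add_eos(sentences, marker='<EOS>'):
        """Flatten sentences into one token stream, appending <EOS> to each."""
        tokens = []
        for sentence in sentences:
            tokens.extend(sentence)
            tokens.append(marker)
        return tokens

    # tokens = add_eos(sentences)
    # vectors = context_vectors(tokens, lexicon + ['<EOS>'])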