2.6 A NEED FOR FREQUENCY AS WELL
The preceding section motivated an occurrence-based word representation. Such a representation seems well suited to syntactic situations, where what matters is which continuations are possible at all. Frequency-based representations, on the other hand, seem to be more closely related to semantic situations. In essence, semantics helps us select the "most appropriate" of the possibilities that syntax provides (or, equivalently, eliminate the "inappropriate" ones).
As a simple example of where we might need frequency information in the Elman corpus, consider the word "move". It can be used as either an intransitive or a transitive verb. When it is used intransitively, the next word is the subject of the following sentence, so it may be any of the possible subject nouns. In this usage, its transition probabilities are similar to those of the INTRAN word group. When it is used transitively, however, it is always followed by a noun from the INANIM group. Unfortunately, INANIM nouns are also possible subjects, so the set of noun groups that can follow "move" is the same as for a purely intransitive verb. Our occurrence-based representation therefore holds no information that distinguishes the two usages.
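The point can be illustrated with a minimal sketch. The counts below are hypothetical and are not taken from the Elman corpus; the four group names are assumed to be the possible subject noun groups, and the helper names `occurrence_vector` and `frequency_vector` are introduced only for this illustration. The occurrence-based vectors for the INTRAN group and for "move" come out identical, while the frequency-based vectors do not.

```python
# Noun groups assumed to be the possible successors (an assumption for this sketch).
GROUPS = ["AGR", "ANIM", "HUM", "INANIM"]

# Hypothetical successor counts per noun group; NOT measured from the Elman corpus.
intran_counts = {"AGR": 10, "ANIM": 30, "HUM": 40, "INANIM": 20}
move_counts = {"AGR": 4, "ANIM": 12, "HUM": 16, "INANIM": 68}


def occurrence_vector(counts):
    """Return 1 for each group ever observed as a successor, else 0."""
    return [1 if counts.get(group, 0) > 0 else 0 for group in GROUPS]


def frequency_vector(counts):
    """Return the relative frequency of each group as a successor."""
    total = sum(counts.values())
    return [counts.get(group, 0) / total for group in GROUPS]


# Occurrence information alone cannot tell the two apart.
same_occurrence = occurrence_vector(intran_counts) == occurrence_vector(move_counts)
# Frequency information can.
same_frequency = frequency_vector(intran_counts) == frequency_vector(move_counts)

print("identical occurrence vectors:", same_occurrence)
print("identical frequency vectors: ", same_frequency)
```

Under these assumed counts, every group occurs at least once after both the INTRAN group and "move", so only the relative frequencies carry the difference between them.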
Figure 2.9 shows the transition probabilities for the INTRAN verb group and for "move". Each bar represents the probability that the next word will come from a particular noun group. The probabilities for the INTRAN group simply reflect the probability that the next sentence template will have a subject from that noun group. The probabilities for "move" reflect a mix of intransitive usage (40%) and transitive usage (60%). Relative to the INTRAN group, we therefore see a downward shift in the likelihood that a noun from AGR, ANIM, or HUM might occur next, and an upward shift in the likelihood that a noun from INANIM might occur next.
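The mixing described above can be sketched as a weighted sum: 40% of the subject distribution plus 60% of a distribution that places all of its mass on INANIM. The subject probabilities in `p_subject` are hypothetical placeholders, not the values plotted in Figure 2.9, and the function name `mix_move_distribution` is introduced only for this illustration.

```python
GROUPS = ["AGR", "ANIM", "HUM", "INANIM"]

# Hypothetical probability that the next sentence's subject comes from each
# group (the INTRAN case). These are NOT the values from Figure 2.9.
p_subject = {"AGR": 0.10, "ANIM": 0.30, "HUM": 0.40, "INANIM": 0.20}

# Usage proportions for "move" as stated in the text.
P_INTRANSITIVE = 0.40
P_TRANSITIVE = 0.60


def mix_move_distribution(subject_probs, p_intransitive, p_transitive):
    """Mix the subject distribution with an all-INANIM object distribution."""
    mixed = {}
    for group, prob in subject_probs.items():
        # In transitive use, the next word is always an INANIM noun.
        object_prob = 1.0 if group == "INANIM" else 0.0
        mixed[group] = p_intransitive * prob + p_transitive * object_prob
    return mixed


p_move = mix_move_distribution(p_subject, P_INTRANSITIVE, P_TRANSITIVE)

for group in GROUPS:
    shift = p_move[group] - p_subject[group]
    print(f"{group:7s} INTRAN={p_subject[group]:.2f} "
          f"move={p_move[group]:.2f} shift={shift:+.2f}")
```

Whatever subject distribution is assumed, the AGR, ANIM, and HUM probabilities are scaled down by the factor 0.4, while the INANIM probability rises to at least 0.6, which matches the direction of the shifts described for the figure.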