Although lexical information plays an important role in language, it actually accounts for only a small range of facts. Words are processed in the contexts of other words; they inherit properties from the specific grammatical structure in which they occur. [ELMAN89, pg.8]
Up to now we have been looking at low-level relationships between Harris' word categories. At the next higher level, however, we can examine words that yield the same "fuzzy" sets of next words. These words can be considered "equivalent" to the extent that their word groups establish the same "context" for the next word. Thus, we can identify groups of subjects that are associated with the same "likelihood" space of verbs, or groups of verbs that are associated with the same "likelihood" space of objects. This equivalence relation between words, when correlated with their lexical definitions, can be used to identify word sequences that are paraphrases of each other. In fact, a necessary condition for Harris to consider two word sequences "paraphrastic" is that they have the same next-word "likelihood" space.
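To make this concrete, here is a minimal sketch (in Python, over an invented toy corpus) of how such next-word equivalences might be detected: two words are grouped when their empirical next-word distributions overlap heavily. The corpus, the overlap measure, and the 0.9 threshold are illustrative assumptions, not Harris's actual procedure.

    from collections import Counter, defaultdict
    from itertools import combinations

    def next_word_distributions(sentences):
        """Estimate each word's next-word "likelihood" space from a corpus."""
        counts = defaultdict(Counter)
        for sent in sentences:
            words = sent.split()
            for w, nxt in zip(words, words[1:]):
                counts[w][nxt] += 1
        # Normalize raw counts into probability distributions.
        return {w: {nxt: n / sum(c.values()) for nxt, n in c.items()}
                for w, c in counts.items()}

    def overlap(p, q):
        """Crude similarity: shared probability mass over common next words."""
        return sum(min(p[w], q[w]) for w in set(p) & set(q))

    corpus = [
        "the doctor examined the patient",
        "the physician examined the patient",
        "the doctor treated the patient",
        "the physician treated the patient",
    ]

    dists = next_word_distributions(corpus)

    # Words whose next-word distributions overlap heavily fall into the
    # same "fuzzy" equivalence class.
    THRESHOLD = 0.9  # arbitrary cutoff for this sketch
    for a, b in combinations(dists, 2):
        if overlap(dists[a], dists[b]) >= THRESHOLD:
            print(f"{a!r} ~ {b!r}")

On this toy corpus the sketch groups "doctor" with "physician" (the same verb space) and "examined" with "treated" (the same object space), which is exactly the kind of subject and verb grouping described above.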
Harris considers word groups belonging to the same paraphrastic equivalence class to be related by linguistic transformations. He attempts to locate the core of a language by finding the one "kernel" word group for each class; this "kernel" must generate the whole class using as few transformations as possible. The analysis at this level yields a set of transformation domains, where each domain comprises the words that terminate the word groups on which the transformation can act. It should be noted that most of the transformations are "reductions" -- that is, eliminations of redundant or low-information words -- and that these reductions are driven by the "likelihood properties" of the component words of each word group.
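Under the same toy assumptions, the likelihood basis of these reductions can also be sketched: a word that is nearly certain given its context carries little information, and so is a candidate for reduction. The bigram surprisal measure and the half-bit cutoff below are stand-ins of convenience, not Harris's actual machinery.

    import math
    from collections import Counter, defaultdict

    corpus = [
        "the doctor examined the patient",
        "the physician examined the patient",
        "the doctor treated the patient",
        "the physician treated the patient",
    ]

    # Bigram counts: how often each word follows each other word.
    bigrams = defaultdict(Counter)
    for sent in corpus:
        words = sent.split()
        for prev, w in zip(words, words[1:]):
            bigrams[prev][w] += 1

    def reduction_candidates(sentence, max_bits=0.5):
        """Flag words that are nearly certain given the preceding word.

        A word with near-zero bigram surprisal is low-information, and
        hence a candidate for reduction in Harris's sense.
        """
        flagged = []
        words = sentence.split()
        for prev, w in zip(words, words[1:]):
            total = sum(bigrams[prev].values())
            p = bigrams[prev][w] / total if total else 0.0
            if p > 0 and -math.log2(p) <= max_bits:
                flagged.append((prev, w))
        return flagged

    print(reduction_candidates("the doctor examined the patient"))
    # -> [('examined', 'the')]: "the" is fully predictable after "examined"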
A key point here is the need to include a "semantic" component to guide the network's search for transformations. Elman voices a similar sentiment:
The network has no information available which would "ground" the structural information in the real world. In this respect, the network has much less information to work with than is available to real language learners. In a more realistic model of acquisition, one might imagine that the utterance provides one source of information about the nature of lexical categories; the world itself provides another source. One might model this by embedding the "linguistic" task in an environment; the network would have the dual task of extracting structural information contained in the utterance, and structural information about the environment. Lexical meaning would grow out of the associations of these two types of input. [ELMAN90, pg.201]
The appeal of a Chomskyan-style formal system is the ability to isolate syntax from semantics. However, as Chomsky himself has argued, such a formal system cannot emerge by induction from the actual sentences of the language. Harris offers a theory that allows grammar to arise from actual linguistic data, but it requires mixing semantics with syntax.