Elman also clustered the hidden unit activation patterns for each word in the training data set. This "context-sensitive" clustering of hidden unit patterns created groupings similar to those obtained for the "mean vector" analysis.
In this simulation, the context makes up an important part of the internal representation of a word. ... [I]t is literally the case that every occurrence of a lexical item has a separate internal representation. ... The fact that these are all tokens of the same type is not lost ... These tokens have representations which are extremely close in space -- closer to each other by far than to any other entity. Even more interesting is that the spatial organization within the token space is not random but reflects differences in context which are also found among tokens of other items. The tokens of boy which occur in subject position tend to cluster together, as distinct from the tokens of boy which occur in object position. This distinction is marked in the same way for tokens of other nouns. Thus, the network has learned not only about types and tokens, and categories and category members; it also has learned a grammatical-role distinction which cuts across lexical items. [ELMAN89, pp.7-8]
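The contrast between the two analyses can be sketched in a few lines of Python. The words, contexts, and hidden-unit vectors below are invented for illustration (they are not Elman's data), and the scipy hierarchical clustering merely stands in for whatever clustering procedure Elman actually used.

    import numpy as np
    from scipy.cluster.hierarchy import linkage, dendrogram

    # Hypothetical per-token hidden-unit vectors from a trained simple recurrent
    # network: one vector per occurrence of a word in a particular context.
    token_vectors = {
        ("boy",  "subject-1"): np.array([0.81, 0.12, 0.55, 0.30]),
        ("boy",  "object-1"):  np.array([0.78, 0.15, 0.20, 0.65]),
        ("girl", "subject-1"): np.array([0.80, 0.10, 0.58, 0.28]),
        ("girl", "object-1"):  np.array([0.76, 0.14, 0.22, 0.63]),
        ("eats", "verb-1"):    np.array([0.10, 0.90, 0.40, 0.40]),
    }
    token_labels = ["%s/%s" % key for key in token_vectors]
    token_data = np.stack(list(token_vectors.values()))

    # "Mean vector" analysis: average all tokens of a type, then cluster the types.
    types = sorted({word for word, _ in token_vectors})
    mean_data = np.stack([
        np.mean([v for (w, _), v in token_vectors.items() if w == t], axis=0)
        for t in types
    ])

    # Context-sensitive analysis: cluster the individual token vectors directly.
    token_tree = linkage(token_data, method="average")
    type_tree = linkage(mean_data, method="average")

    # Leaf orderings show similar groupings; tokens of the same type sit closest.
    print(dendrogram(token_tree, labels=token_labels, no_plot=True)["ivl"])
    print(dendrogram(type_tree, labels=types, no_plot=True)["ivl"])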
Although Harris does not directly address this type-token distinction, he does address the emergence of grammatical roles from co-occurring words. The "fuzzy" sets of next words tend to establish grammatical roles. In essence, the likelihood relationship between a word and its possible successors partitions the appropriate operator space in a very specific manner. In the context of a PDP schema model [RUMEL86b], each word adjusts the "goodness-of-fit" landscape for the next possible word. This distortion places more likely words at very high points and less likely words at lower points.
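A minimal sketch of this distortion, assuming invented successor counts for two nouns (neither the vocabulary nor the numbers come from any real corpus): the preceding word reshapes a likelihood surface over the candidate next words.

    # Hypothetical counts of how often each word followed a given word in a
    # small corpus; the vocabulary and values are invented for illustration.
    successor_counts = {
        "boy":  {"eats": 12, "sleeps": 9, "breaks": 1, "shatters": 0},
        "rock": {"eats": 0, "sleeps": 0, "breaks": 7, "shatters": 10},
    }

    def goodness_of_fit(previous_word, candidates, smoothing=0.1):
        """Likelihood landscape over candidate next words: the preceding word
        raises the 'height' of likely successors and lowers unlikely ones."""
        counts = successor_counts.get(previous_word, {})
        raw = [counts.get(w, 0) + smoothing for w in candidates]
        total = sum(raw)
        return {w: r / total for w, r in zip(candidates, raw)}

    candidates = ["eats", "sleeps", "breaks", "shatters"]
    print(goodness_of_fit("boy", candidates))   # peaks at "eats" and "sleeps"
    print(goodness_of_fit("rock", candidates))  # peaks at "shatters" and "breaks"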
I believe that Elman's type-token distinction may well correspond to a word's shifting position in "likelihood" space based on the word(s) that precede it. Note that subjects, which precede their verbs, would occupy a distinctly different position in "likelihood" space from objects, which follow the verb. Thus, Elman's type-token distinction also appears consistent with Harris' language theory.
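One way to picture this correspondence is to let each token's position in "likelihood" space be its predicted distribution over next words. The distributions below are invented for illustration; the point is only that subject tokens of "boy", which anticipate a verb, land near one another and far from object tokens of the very same word.

    import numpy as np

    # Hypothetical next-word likelihood vectors over a tiny vocabulary
    # ["eats", "sleeps", "boy", "rock", "."]; the values are invented.
    tokens = {
        "boy/subject-1": np.array([0.45, 0.40, 0.05, 0.05, 0.05]),  # verb likely next
        "boy/subject-2": np.array([0.50, 0.35, 0.05, 0.05, 0.05]),
        "boy/object-1":  np.array([0.02, 0.03, 0.05, 0.05, 0.85]),  # clause is ending
        "boy/object-2":  np.array([0.03, 0.02, 0.05, 0.05, 0.85]),
    }

    def distance(a, b):
        """Euclidean distance between two token positions in 'likelihood' space."""
        return float(np.linalg.norm(tokens[a] - tokens[b]))

    # Tokens in the same grammatical role sit close together, while subject and
    # object tokens of the same lexical type are well separated.
    print(distance("boy/subject-1", "boy/subject-2"))  # small
    print(distance("boy/object-1",  "boy/object-2"))   # small
    print(distance("boy/subject-1", "boy/object-1"))   # large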