In the Elman model mentioned above, the corpus of sentences was generated by a very simple sentence generator. The generator had a set of simple two- and three-word sentence "templates" that it filled at random with words from the lexicon. This corpus was so constrained that it would easily satisfy Harris' criterion for forming a sublanguage [HARRIS89]. A sublanguage is a very restricted subset of the language as a whole. The key restriction is that the words assigned to the sublanguage have only a "standard" usage. That is, their co-occurrence patterns are sufficiently regular that sentence "formulas" can be identified for those words. These sentence formulas perform the function of a grammar in Harris' theory of language.
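To make the generator concrete, here is a minimal Python sketch of the scheme described above; the particular word categories, words, and templates are illustrative stand-ins, not Elman's actual lexicon:

    import random

    # Illustrative lexicon, grouped by word category. The grouping idea
    # follows the description above; the specific entries are made up.
    lexicon = {
        "NOUN-HUM":  ["man", "woman"],
        "NOUN-ANIM": ["cat", "dog"],
        "VERB-TRAN": ["chase", "see"],
        "VERB-EAT":  ["eat"],
        "NOUN-FOOD": ["cookie", "bread"],
    }

    # Hypothetical two- and three-word sentence templates.
    templates = [
        ("NOUN-HUM", "VERB-EAT", "NOUN-FOOD"),
        ("NOUN-ANIM", "VERB-TRAN", "NOUN-HUM"),
        ("NOUN-HUM", "VERB-TRAN"),            # a two-word template
    ]

    def generate_sentence():
        """Pick a template at random and fill each slot from the lexicon."""
        template = random.choice(templates)
        return [random.choice(lexicon[category]) for category in template]

    corpus = [generate_sentence() for _ in range(10000)]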
Thus, Harris would predict that Elman should be able to identify specific sentence formulas in his model. Elman does not address this point in either of the two papers [ELMAN89,90] covering the first model. However, analysis using principal components (see below) identifies patterns in the hidden units that might qualify as sentence formulas.
Elman's second sentence processing model was a giant step beyond the one mentioned above. Its primary goal was to investigate a connectionist model's representation of grammatical structures. To pursue that goal, he set up the following training data set:
The stimuli in this simulation were based on a lexicon of 23 items. These included 8 nouns, 12 verbs, the relative pronoun who, and an end of sentence indicator, ".". Each item was represented by a randomly assigned 26-bit vector in which a single bit was set to 1 (3 bits were reserved for another purpose). A phrase structure grammar ... was used to generate sentences. [ELMAN89, pg.9]
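The input encoding Elman describes is what would now be called a one-hot scheme. A minimal sketch, assuming an abbreviated word list (the full lexicon had 23 items) and the 26-bit width from the quote:

    import numpy as np

    # Only a few of the 23 lexical items are listed here for brevity.
    words = ["boy", "girl", "cat", "who", "."]

    def one_hot(word, width=26):
        """26-bit vector with a single bit set to 1; the 3 unused bits
        correspond to the 'reserved for another purpose' remark above."""
        vec = np.zeros(width)
        vec[words.index(word)] = 1.0
        return vec

    one_hot("boy")   # array([1., 0., 0., ..., 0.])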
As mentioned above, when a modeller uses grammar-generated sentences to train his/her model, the subject of study becomes the grammar, not the language. In this case, the goal was to show that a connectionist model could implement the rather complex system represented by the grammar. In particular, the grammar allowed the nesting of relative clauses. This made the tasks of subject/verb agreement and verb argument determination far more complex: any of the words filling these roles might be separated from its companion word by one or more relative clauses. Elman found:
... the network was unable to learn the task when given the full range of complex data from the beginning of training. However, when the network was permitted to focus on the simpler data first, it was able to learn the task quickly and then move on successfully to more complex patterns. The important aspect to this was that the earlier training constrained later learning in a useful way; the early training forced the network to focus on canonical versions of the problems which apparently created a good basis for then solving the more difficult forms of the same problem. [ELMAN89, pp.11-12]
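This staged regimen amounts to what is now called curriculum learning. The following sketch shows the general shape of such a schedule; the mixing proportions, corpus size, and the train_one_epoch hook are placeholders, not Elman's actual settings:

    import random

    def mix(simple, complex_, frac_simple, size=10000):
        """Draw a training corpus with the given proportion of simple sentences."""
        n_simple = int(size * frac_simple)
        return (random.choices(simple, k=n_simple)
                + random.choices(complex_, k=size - n_simple))

    def staged_training(network, simple, complex_, train_one_epoch, epochs=5):
        """Train on progressively more complex mixtures: simple data first,
        then gradually increase the share of complex sentences."""
        for frac_simple in (1.0, 0.75, 0.5, 0.25):
            corpus = mix(simple, complex_, frac_simple)
            for _ in range(epochs):
                train_one_epoch(network, corpus)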
Since we are not talking about a corpus from actual language here, Harris is not really applicable. However, the idea that the model would learn the simpler patterns first is compatible with Harris. He would hold that the complex sentences are "paraphrastically" equivalent to simpler sentences in the "kernel" language. Since the simpler sentences are all "kernel" sentences themselves, they would be easier to learn. Learning a complex sentence would require the language learner to: (1) first acquire the "kernel" sentences considered equivalent to the complex sentence, and (2) then learn the transformation(s) that relate the "kernel" and complex sentences [see NOTE 2]. If the corpus were drawn from actual language, the frequency of complex sentences would probably be low enough that the task could be accomplished without resorting to the "staged learning" strategy Elman used.
The corpus used for training this model was sufficiently simple that the network could, in fact, learn its regularities without resorting to transformations. Thus, Harris would anticipate that sentence formulas should be stored within the statistical information coded by the hidden units. Elman likewise anticipated that the grammatical structure must be coded in the hidden units. Since the cluster analysis yielded only categorical information, it was necessary to devise a different analysis technique to look for the grammatical relations. The technique that located this information was principal component analysis:
This involved passing the training set through the trained network (with weights frozen) and saving the hidden unit pattern produced in response to each new input. The covariance matrix of the set of hidden unit vectors is calculated, and then the eigenvectors for the covariance matrix are found. The eigenvectors are ordered by the magnitude of their eigenvalues, and are used as the new basis for describing the original hidden unit vectors. This new set of dimensions has the effect of giving a somewhat more localized description to the hidden unit patterns, because the new dimensions now correspond to the location of meaningful activity (defined in terms of variance) in the hyperspace. Furthermore, since the dimensions are ordered in terms of variance accounted for, we can now look at phase state portraits of selected dimensions, starting with those with the largest eigenvalues. [ELMAN89, pg.15]
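In modern terms, this is principal component analysis applied to the saved hidden-state vectors. A minimal NumPy sketch of the procedure as quoted, where hidden_states is an (n_inputs, n_hidden) array holding one saved hidden unit vector per input word:

    import numpy as np

    def principal_components(hidden_states):
        """Eigen-decompose the covariance of the hidden unit vectors and
        re-express each vector in the eigenvector basis, ordered by
        variance accounted for (largest eigenvalue first)."""
        centered = hidden_states - hidden_states.mean(axis=0)
        cov = np.cov(centered, rowvar=False)         # covariance matrix
        eigvals, eigvecs = np.linalg.eigh(cov)       # eigh: cov is symmetric
        order = np.argsort(eigvals)[::-1]            # largest variance first
        eigvecs = eigvecs[:, order]
        projections = centered @ eigvecs             # new basis description
        return eigvals[order], eigvecs, projections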
In particular, Elman found that principal components 1 and 11 appear to identify the sentence formulas for the four test sentences he examined:
The trajectories through state space for these four sentences ... are shown in Figure 10 [pg.18]. Panel (10a) shows the basic pattern associated with what is in fact the matrix sentence for all four sentences. ... [T]he matrix subject noun is in the lower left region of state space, the matrix verb appears above it and to the left, and the matrix object noun is near the upper middle region. ... The relative clause appears to involve a replication of this basic pattern, but displaced toward the left and moved slightly downward, relative to the matrix constituents. Moreover, the exact position of the relative clause elements indicates which of the matrix nouns are modified. ... This trajectory pattern was found for all sentences with the same grammatical form; the pattern is thus systematic. [ELMAN89, pp.17-18]
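A sentence's "trajectory through state space" is simply its sequence of hidden states projected onto the chosen components. Assuming the principal_components sketch above, a plotting helper might look like this; the example word labels are hypothetical:

    import matplotlib.pyplot as plt

    def plot_trajectory(projections, start, length, labels):
        """Trace one sentence's path through principal components 1 and 11
        (0-indexed columns 0 and 10 of the projected hidden states)."""
        xs = projections[start:start + length, 0]
        ys = projections[start:start + length, 10]
        plt.plot(xs, ys, marker="o")
        for x, y, word in zip(xs, ys, labels):
            plt.annotate(word, (x, y))
        plt.xlabel("principal component 1")
        plt.ylabel("principal component 11")
        plt.show()

    # e.g., with `projections` from the PCA sketch above:
    # plot_trajectory(projections, start=0, length=3,
    #                 labels=["boy", "chases", "boy"])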
Thus, it appears that another of Harris' predictions is being fulfilled. It is possible to identify the underlying grammatical structure for a simple corpus by induction from the empirical data.