Among the problems facing modellers who follow Elman's research program is the selection of the appropriate kind of data for training their models. There are two possible choices for training data: (1) actual examples from a human language, or (2) an artificial sample generated from a standard grammar. The problem with the latter is that it presupposes the deductive base principles underlying the artificial grammar.
Assume a connectionist modeller begins an experiment by generating training sentences from a grammar. These sentences are used to train the model, which is then tested on actual sentences from the language. This seems like a perfectly valid research program, but it is fatally flawed: the model will learn the formal system, not the language. Even if the model performs perfectly on the test sentences given to it, this says nothing about the validity of the model as a whole; at any time, we might encounter a series of sentences that the model could not process correctly. The only value of such a model would lie in refuting the formal system under study, and since this kind of negative evidence could easily be generated without constructing the model, the enterprise seems fruitless.
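To make the objection concrete, the pipeline just described might look like the following minimal sketch. The toy grammar, corpus sizes, and model interface here are my own illustrative assumptions, not details drawn from any actual simulation.

    import random

    # A toy context-free grammar; the rules are illustrative only.
    GRAMMAR = {
        "S":  [["NP", "VP"]],
        "NP": [["the", "N"], ["a", "N"]],
        "VP": [["V", "NP"], ["V"]],
        "N":  [["dog"], ["cat"], ["linguist"]],
        "V":  [["sees"], ["chases"], ["sleeps"]],
    }

    def generate(symbol="S"):
        """Expand a symbol into a flat list of words by random rule choice."""
        if symbol not in GRAMMAR:        # terminal word
            return [symbol]
        rule = random.choice(GRAMMAR[symbol])
        return [w for part in rule for w in generate(part)]

    # Step 1: artificial training corpus drawn from the formal grammar.
    train_corpus = [" ".join(generate()) for _ in range(1000)]

    # Step 2: train some connectionist model on the artificial corpus.
    # model.fit(train_corpus)           # hypothetical model interface

    # Step 3: test on actual sentences from the language.
    test_corpus = ["the dog chases a cat"]  # real sentences would go here

Whatever model is dropped into step 2, its training signal comes entirely from GRAMMAR; a perfect score in step 3 therefore certifies the model against the formal system, not against the language.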
To be fair, I should note possible alternative motives for creating such a model. For classical (i.e., Chomskyan) linguists, such a model would show that the formal system under study could in fact be created using "neuron-like" machinery. This could aid in establishing the psychological reality of the model. Further, such a simulation could be used to show that the behavioral consequences of the formal system parallel human behavior with the target language. But, again, the object of study is the formal system, not the language itself. On the other hand, this kind of model could be used by a connectionist to show that his/her model can do at least as much as a classical model can.
The other alternative is to use actual sentences from the language to train the connectionist model. The problem here is that the classical Chomskyan linguists have nothing to offer our modeller in this case. One of their key tenets is that language principles are not induced from the empirical language data. Thus, our modeller is left without any theoretical underpinnings to aid his/her research. He/she will be like Thomas Kuhn's pre-paradigm scientists:
In the absence of a paradigm or some candidate for paradigm, all of the facts that could possibly pertain to the development of a given science are likely to seem equally relevant. As a result, early fact-gathering is usually restricted to the wealth of data that lie ready to hand. The resulting pool of facts contains those accessible to casual observation and experiment together with some of the more esoteric data retrievable from established crafts ... [KUHN, pg.15]
In this case the craft will be connectionist modelling, and the data that have emerged thus far from sentence-processing models have been diverse and difficult to analyze.
But we must recall that linguistics is not a pre-paradigm science. It is, in fact, a science with its paradigm -- generative grammar -- in crisis. But paradigms under attack can prove to be quite resilient. Kuhn says the following about scientists struggling through such a crisis:
Though they may begin to lose faith and then to consider alternatives, they do not renounce the paradigm that has led them into crisis. They do not, that is, treat anomalies as counterinstances, though in the vocabulary of philosophy of science that is what they are. ... [O]nce it has achieved the status of paradigm, a scientific theory is declared invalid only if an alternative candidate is available to take its place. ... The decision to reject one paradigm is always simultaneously the decision to accept another, and the judgement leading to that decision involves the comparison of both paradigms with nature and with each other. [KUHN, pg.77]
If the generative grammar paradigm has no assistance for our connectionist modellers, is there any other source that might help? The answer is yes. As Kuhn has noted, when paradigms begin to lose their dominance, the research they guide increasingly resembles "that conducted under the competing schools of the pre-paradigm period" [KUHN, pg.72]. The school of linguistics that preceded the Chomskyan paradigm was post-Bloomfieldian structuralism, and one of its foremost practitioners was Zellig Harris. In fact, Harris has continued to practice and refine structural linguistics throughout the Chomskyan revolution. S.-Y. Kuroda notes:
... the difference between Harris and Chomsky turns on the notion of grammar. Harris was one of the foremost methodologists in post-Bloomfieldian taxonomic structuralism; he brought it to a completion by his work Methods in Structural Linguistics in 1947. Harris attempted to extend the taxonomic methodology of descriptive linguistics to discourse analysis around 1950, but by 1960 he had virtually returned to the study of grammar by developing [his] transformational theory, without explicitly dissociating himself from his past methodological stance. Chomsky, in the meantime, abandoned taxonomic methods of structural linguistics in the early 50's and launched into the construction of the theory of transformational generative grammar under a "realist" and psychological interpretation of linguistic theory. [KURODA, pg.45]
Expounding on the differences between Harris and Chomsky, Kuroda says:
Harris's [transformational] theory is directed to the structure of correspondence that underlies the syntactic design of language. ... Correspondence and derivation are two dynamic forces that shape the formal design of human language, and it is a major task imposed on linguistic theory how to determine the sphere of influence of these contending forces. Harris' transformational theory took the form it did to respond primarily to the former, and Chomsky's initial formulation of transformational generative grammar, to the latter. The later development of transformational generative grammar may to a large measure be looked upon as testimony to the linguist's response to a tension produced by two contending forces. [KURODA, pg.6]
In further examining the history of generative grammar, Kuroda notes:
Chomsky is reported to have ... expressed the opinion that "the history of transformational grammar would have been more 'rational' if generative semantics had been the original position ..." ... [A] development from generative semantics through the Standard Theory and then to the Government and Binding Theory is easy to imagine as a rational history of transformational grammar ... If what interests us is a conceivable ideal history, ... one might be able to imagine a path from Harris' ... conception of transformational theory to the present [i.e., Government and Binding Theory] and to the future, without going through the idea of transformational generative grammar ... [KURODA, pg.47]
Thus, it appears that Chomsky's theory, emphasizing derivation, and Harris' theory, emphasizing correspondence, are two possible trails leading to the same end. What is important here for connectionist modellers is that Harris' theory reaches this common goal via the study of actual language performance, and so may provide the appropriate guidance for successfully developing their grammar models. Further, Harris' theory offers specific guidance as to what types of internal representation might be expected to emerge from these models, which should help the modellers as they attempt to analyze their models' performance.
Below, I will examine two specific Elman models. Both of these models take sentences, represented by a stream of words, as their input. The "simulations address problems in the distinction between type and token, the representation of lexical categories, and the representation of grammatical structure." [ELMAN89, pg.1]
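Architecturally, these simulations are built on Elman's simple recurrent network, in which the hidden layer's activations are copied back and presented as "context" input at the next time step. The sketch below shows only the forward pass of such a network under my own illustrative assumptions: one-hot word encodings and arbitrary layer sizes, which need not match the published models.

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative sizes; the dimensions in [ELMAN89] may differ.
    vocab_size, hidden_size = 10, 20

    # Weights: input-to-hidden, context-to-hidden, hidden-to-output.
    W_xh = rng.normal(scale=0.1, size=(hidden_size, vocab_size))
    W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
    W_hy = rng.normal(scale=0.1, size=(vocab_size, hidden_size))

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def forward(word_ids):
        """Process a sentence one word at a time, predicting the next word.

        Copying the hidden state back as 'context' for the next step is
        the defining feature of the simple recurrent network.
        """
        context = np.zeros(hidden_size)
        predictions = []
        for w in word_ids:
            x = np.zeros(vocab_size)
            x[w] = 1.0                   # one-hot encoding of the word
            hidden = sigmoid(W_xh @ x + W_hh @ context)
            predictions.append(softmax(W_hy @ hidden))
            context = hidden             # context units <- hidden units
        return predictions

    # A sentence as a stream of word indices (illustrative).
    print(forward([1, 4, 2])[-1])        # distribution over the next word

Because the context units carry a trace of everything seen so far, the network's response to each word is conditioned by the words around it, which is what allows distributional regularities to shape its internal representations.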
At the core of both simulations is the way words interact in the sentences of a language. Harris' theory is also built on a foundation of word interactions, and it postulates the emergence of "grammar-like" behavior from these low-level interactions. This, too, appears to be happening in the models under review. I will have more to say about each model in turn.