Dual-path model

Readings:

Chang, F. (2002). Symbolically speaking: A connectionist model of sentence production. Cognitive Science, 26(5), 609-651.

You will need to save the day3.tar.gz file and decompress it as before (tar -zxvf day3.tar.gz).

Here are the files:

dualpath3.in # model file for dual-path model

train.ex # same training and test files as before, but with messages.

test.ex

decode9.perl # translates model's activations into word sequences

syncode.perl # codes word sequences with syntactic categories

arcthis.perl # saves everything in a separate directory, cleans up the main directory

Here we will try to better understand how the Dual-path model works by testing novel verb-structure pairings and then by examining the code in more detail.

First, train the model by typing:

lens -b dualpath3.in 'trainSave;exit' &

The test set here is a set of sentences with the novel verb glorp. Look at the model's testing output and compare it with the overall accuracy for the training set. In the test output file (sum40000test), there is a section at the bottom that records the sentence accuracy broken down by the different target structures in the model.

##results

## struct DET NOUN AUX BEING VERB MOD BY DET NOUN PER PER corr c 27 t 36 perc 75%

## struct DET NOUN AUX VERB ING DET NOUN DET NOUN PER PER corr c 23 t 41 perc 56%

## struct DET NOUN AUX VERB ING DET NOUN PER PER corr c 43 t 43 perc 100%

## struct DET NOUN AUX VERB ING DET NOUN TO DET NOUN PER PER corr c 29 t 40 perc 72%

## struct DET NOUN AUX VERB ING PER PER corr c 45 t 46 perc 97%

## struct DET NOUN AUX VERB MOD BY DET NOUN PER PER corr c 44 t 44 perc 100%

## struct DET NOUN VERB ED DET NOUN DET NOUN PER PER corr c 19 t 19 perc 100%

## struct DET NOUN VERB ED DET NOUN PER PER corr c 8 t 18 perc 44%

## struct DET NOUN VERB ED DET NOUN TO DET NOUN PER PER corr c 23 t 26 perc 88%

## struct DET NOUN VERB ED PER PER corr c 1 t 12 perc 8%

## struct DET NOUN VERB SS DET NOUN DET NOUN PER PER corr c 14 t 14 perc 100%

## struct DET NOUN VERB SS DET NOUN PER PER corr c 17 t 24 perc 70%

## struct DET NOUN VERB SS DET NOUN TO DET NOUN PER PER corr c 13 t 19 perc 68%

## struct DET NOUN VERB SS PER PER corr c 18 t 18 perc 100%

##gram=_ corr c 342 t 400 perc 85%

##sent=_ corr c 324 t 400 perc 81%

##word=_ corr c 3730 t 3817 perc 97%
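Each struct line uses the same format: c is the number of test sentences with that target structure that were produced correctly, t is the total number of test sentences with that structure, and perc is the resulting percentage. If you want to check a percentage yourself, the lens console accepts ordinary Tcl, e.g. for the first struct line above:

expr {100.0 * 27 / 36}   ;# gives 75.0, the perc value reported for that structure

The gram, sent, and word lines at the bottom use the same c/t/perc format, but summed over the whole test set.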

Examine novel word prediction

Open the GUI and load the final saved weight set: lens -c dualpath3.in 'loadWeight comp40000.wt.gz'

Open up the unit viewer and examine how the model processes different training sentences. Now change the example set to the testing set and examine how testing sentences are processed.
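If you prefer the console to the menus, the same thing can be done by typing commands at the lens prompt (a sketch: loadWeight is the procedure defined in dualpath3.in that was used above, and viewUnits is assumed to be the Lens command that opens the unit viewer):

loadWeight comp40000.wt.gz
viewUnits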

You will notice that the model predicts most of the words correctly, except for the novel word glorp. You will also see that the GLORP semantic unit is activated in the what layer at the position where the verb should be predicted, since the action where unit (A) is activated at that point.

To predict a novel word, one has to learn to map from the word's semantics to the word's form. In experiments, children are taught the meaning of a novel verb before the experiment begins. So they might see the action, and then hear "Look! Glorping!" To simulate this novel word training, we create a link between the glorp word and the glorp semantics by typing this:

setObj word.unit(31).incoming(31).weight 10;

Now if you go back to the unit viewer and click on the sentence again, you should see that the model has correctly activated the word glorp at the appropriate position in the sentence. This allows the model to predict novel verbs in structures that it has learned previously.
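To confirm that the change took effect, you can inspect the same connection with getObj before and after setting it (a quick sketch; the index 31 picks out the glorp word unit and its incoming link from the glorp semantics, as in the command above):

getObj word.unit(31).incoming(31).weight   ;# the untrained weight before the change
setObj word.unit(31).incoming(31).weight 10
getObj word.unit(31).incoming(31).weight   ;# should now report a weight of 10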

Production vs. Prediction

So far, all of the tests that we have done have supplied the previous word as external input (i:). In the real world, this approximates the situation where one is passively listening in on another person's utterance and making implicit predictions about it. This is in contrast to production, where the speaker has to produce word sequences that depend on the words that he or she has already said. Assume that a speaker is trying to say "The dog chased the cat", but starts the sentence with "The cat" instead. To convey the same initial meaning, production needs to continue with "was chased by the dog". The cword->cwhat->cwhere links support this behavior by telling the model the role of the previous word that was produced.

You can examine how the model does this by looking at the testprod.ex file as well as the sum*testprod files that are produced after training. Also, start up the model (type: lens -c dualpath3.in 'loadWeight comp40000.wt.gz') and examine how it produces the testprod examples (open the unit viewer and change to the example set testprod). In the figure below, the model has just produced "The kite" as the subject. Since the word kite activates the semantics KITE, and KITE is the patient in this particular sentence (Y is activated in the cwhere layer), the system knows that it needs to produce a passive.
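If you want to run the whole production test set from the console rather than stepping through it in the unit viewer, something like the following should work (a sketch, assuming dualpath3.in loads testprod.ex under the example-set name testprod and that loadWeight is the procedure used above; the set name in your copy may differ):

loadWeight comp40000.wt.gz   ;# restore the trained weights
useTestingSet testprod       ;# make the production examples the current testing set
test                         ;# run the network over the set and report the error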

Vary Layer Sizes in Architecture

We will be changing the dualpath3.in file, so we should first archive the complete directory by typing arcthis.perl.

Now you might want to save a copy of the original dualpath3.in by typing: cp dualpath3.in dualpath3.orig

Open the file "dualpath3.in" in a text editor.

At the top are parameters that define the size of different layers. You can modify the hidden layer sizes.

## hidden layers

set hiddenSize 20

set contextSize $hiddenSize

set compressSize 10

set ccompressSize 10

Try changing the compressSize and ccompressSize to 5 or to 15. Or change the hiddenSize value to 40 or 10.
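For example, to try a larger hidden layer together with smaller compression layers, the block could be edited to look like this (these values are just one of the variants suggested above, not recommended settings):

## hidden layers
set hiddenSize 40
set contextSize $hiddenSize
set compressSize 5
set ccompressSize 5

Leaving contextSize set to $hiddenSize keeps the context (copy) layer the same size as the hidden layer, as in the original file.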

Then train the model: lens -b dualpath3.in 'trainSave;exit' &

Finally, examine the accuracy for both the training and the test files, and compare it with the results you got before changing the layer sizes.
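A quick way to compare runs is to pull the summary lines out of the sum files for each run. The commands below are a sketch typed at the lens prompt (which accepts ordinary Tcl); sum40000test is the test summary file mentioned above, and the copies saved by arcthis.perl will be in the archived directory:

puts [exec grep sent= sum40000test]   ;# sentence accuracy line
puts [exec grep word= sum40000test]   ;# word accuracy line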