Becoming syntactic
Chang, F., Dell, G. S., and Bock, K. (2006). Becoming syntactic. Psychological Review, 113(2), 234–272.
Psycholinguistic research has shown that the influence of abstract syntactic knowledge on performance is shaped by particular sentences that have been experienced. To explore this idea, a connectionist model of sentence production was applied to the development and use of abstract syntax. The model makes use of (a) error-based learning to acquire and adapt sequencing mechanisms and (b) meaning-form mappings to derive syntactic representations. The model is able to account for most of what is known about structural priming in adult speakers, as well as key findings in preferential looking and elicited production studies of language acquisition. The model suggests how abstract knowledge and concrete experience are balanced in the development and use of syntax.
Contents:
1) Software
2) Architecture
3) Representing the Message in Weights
4) Input
5) Structural Priming
6) Downloads
7) Simple Message-Sentence Generator
8) Tutorial
Required Software:
LENS - the program that implements the model (version 2.6 or higher)
Perl - creates the input and parses the output (version 5.8 or higher)
R - statistics (optional)
LENS is scripted in Tcl/Tk, so some knowledge of Tcl is useful for reading the code below.
Architecture of the model:
Single arrows are learned weights. Dashed arrows are fixed copy connections. Thick gray lines are the fast-changing message weights. Double arrows mark target comparisons (the copy of the what units' output that serves as the target for the cwhat units is not shown).
The LENS code is below:
addNet dualPath -i 20
## these are the input layers; their sizes are a bit bigger than the number of
## units in the paper, because the actual mapping from linguistic units to
## model units was slightly different (e.g., sometimes unit 0 was not used)
set semSize 147
set lexSize 165
set eventsemSize 19
set whereSize 20
## hidden layers (as in paper)
set hiddenSize 40
set contextSize $hiddenSize
set compressSize 20
set ccompressSize 20
## create layers
addGroup cword $lexSize ELMAN ELMAN_CLAMP ELMAN_CLAMP -BIASED OUT_NORM
addGroup ccompress $ccompressSize -BIASED
addGroup cwhat $semSize OUTPUT TARGET_COPY -BIASED -WRITE_OUTPUTS
addGroup cwhere2 $whereSize ELMAN ELMAN_CLAMP ELMAN_CLAMP -BIASED
addGroup cwhere $whereSize SOFT_MAX -BIASED
addGroup eventsem $eventsemSize LINEAR -BIASED
addGroup context $contextSize ELMAN OUT_INTEGR -BIASED
addGroup hidden $hiddenSize -BIASED
addGroup where $whereSize -BIASED
addGroup what $semSize -BIASED
addGroup compress $compressSize -BIASED
addGroup targ $lexSize INPUT
addGroup word $lexSize OUTPUT SOFT_MAX STANDARD_CRIT -BIASED
## parameters for connections
## hysteresis: 1 = copy, 0 = no change
setObj context.dtScale 1
## connect layers
connectGroups cword cwhat -type cwordcwhat
connectGroups cwhat cwhere -type ww
connectGroups where what -type ww
connectGroups what word -type whatword
connectGroups hidden where -type hidwhere
connectGroups context hidden -type conthid
connectGroups cwhere hidden -type prehid
connectGroups cwhere2 hidden -type prehid
connectGroups eventsem hidden -type esemhid
connectGroups hidden compress word -type hidword
connectGroups cword ccompress hidden -type cwordhid
## connect bias
connectGroups bias eventsem -type bt
connectGroups bias what -type low
connectGroups bias cwhat -type low
## copy output of what units as training signal for cwhat units
copyConnect what cwhat outputs
## create elman unit connections and initial states
elmanConnect targ cword -r 1 -init 0.0
elmanConnect word cword -r 1 -init 0.0
elmanConnect cwhere cwhere2 -r 1 -init 0.0
elmanConnect cwhere2 cwhere2 -r 1 -init 0.0
elmanConnect hidden context -r 1 -init 0.5
## turn off learning for what-where cwhat-cwhere message weights
setLinkValues learningRate 0 -t ww
setLinkValues randMean 0 -t ww
setLinkValues randRange 0 -t ww
## turn off learning for event-semantic weights
setLinkValues learningRate 0 -t bt
setLinkValues randMean 0 -t bt
setLinkValues randRange 0 -t bt
## set bias of what units so that normal activation is low
setLinkValues learningRate 0 -t low
setLinkValues randMean -3 -t low
setLinkValues randRange 0 -t low
## randomize the what/cwhat bias weights (mean -3) and freeze them
randWeights -t low
freezeWeights -t low
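The elmanConnect lines above set up fixed copy connections: at each time step the context layer receives a copy of the previous hidden activations (starting at 0.5), and cword receives a copy of the previous word or target. A minimal Python sketch of one sequencing step under this copy scheme (illustrative only; the tiny weights and the plain logistic/softmax units stand in for the LENS groups, not the model's actual layers):

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def srn_step(cword, context, W_in, W_ctx, W_out):
    """One step of a simple recurrent network with Elman copy connections."""
    hidden = []
    for i in range(len(W_in)):
        net = (sum(w * x for w, x in zip(W_in[i], cword))
               + sum(w * c for w, c in zip(W_ctx[i], context)))
        hidden.append(logistic(net))
    # word output is a softmax over the lexicon (cf. SOFT_MAX on word)
    word = softmax([sum(w * h for w, h in zip(row, hidden)) for row in W_out])
    # the Elman copy: the next step's context is a verbatim copy of hidden
    next_context = list(hidden)
    return word, next_context

# tiny illustrative network: 3-word lexicon, 2 hidden units
W_in = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.4]]     # cword -> hidden
W_ctx = [[0.2, -0.1], [0.0, 0.4]]               # context -> hidden
W_out = [[1.0, -1.0], [-0.5, 0.5], [0.2, 0.3]]  # hidden -> word

context = [0.5, 0.5]        # initial context state (init 0.5 above)
cword = [1.0, 0.0, 0.0]     # previous word, one-hot
word, context = srn_step(cword, context, W_in, W_ctx, W_out)
```

The key point is that next_context is a verbatim copy of hidden rather than a learned connection, which is exactly what elmanConnect provides.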
Representing the Message in Weights
One novel aspect of the Dual-path model is the use of weights to instantiate the message. This is done with Tcl functions that are called from the environment files (*.ex).
Here is an example pattern for the sentence "a dog sleeps".
name:{ a dog sleep -ss }
proc: { clear;link 1 62;link 5 19;plink .33 5 120;targlink 1 5;}
#mes: AINTRANS 1A=SLEEP,PRES,PERF 5X=DOG,INDEF TLINK AINTRANS 1A= 5X=
6
t:{word 1.0} 121
t:{word 1.0} 19
t:{word 1.0} 62
t:{word 1.0} 138
t:{word 1.0} 156
t:{word 1.0} 156 ;
In the proc field, four functions can appear:
clear - resets the message weights to 0.
link(A,B,...) - links the where unit (A) to the listed what units (B,...) at full strength, in both directions (where-to-what and cwhat-to-cwhere).
plink(V,A,B,...) - like link, but at proportion V of full strength.
targlink(A,...) - sets the listed event-semantic units to a value that starts at 0.5. Each -1 in the list multiplies that value by the alternation parameter (0.95) for the units that follow. So targlink 1 5 -1 8 means that the activation of unit 8 is 0.95 times that of units 1 and 5 (0.475 rather than 0.5).
In normal training, tlink is used instead of targlink. It works the same way, except that its alternation parameter is randomly either 0.5 or 0.75.
Here are the functions in the prodc.in file. The proc line above calls these functions before a pattern is processed.
proc clear {} {
randWeights -t ww
randWeights -t bt
}
#strength = 6
proc link {input args} {
global strength
foreach j $args {
setObj what.unit($j).incoming($input).weight $strength;
setObj cwhere.unit($input).incoming($j).weight $strength;
}
}
# plink scales strength by its first argument, prop.
proc plink {prop input args} {
global strength
set prostrength [expr $prop * $strength]
foreach j $args {
setObj what.unit($j).incoming($input).weight $prostrength;
setObj cwhere.unit($input).incoming($j).weight $prostrength;
}
}
# reducer is the alternation parameter, inittlink is the starting value of 0.5
# randlevel is a random value that makes tred either 0.5 or 0.75.
proc tlink {args} {
global reducer
global inittlink
set tstrength $inittlink
set randlevel [randInt 2]
set tred [expr $reducer + $randlevel * 0.25]
foreach j $args {
if {$j < 0} {
set tstrength [expr $tstrength * $tred]
} else {
setObj eventsem.unit($j).incoming(0).weight $tstrength;
}
}
}
# treduce is the alternation parameter for targets (0.95)
# inittargtlink is the starting value of 0.5
proc targlink {args} {
global treduce
global inittargtlink
set tstrength $inittargtlink
foreach j $args {
if {$j < 0} {
set tstrength [expr $tstrength * $treduce]
} else {
setObj eventsem.unit($j).incoming(0).weight $tstrength;
}
}
}
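To make the weight-setting concrete, here is an illustrative Python sketch of the message functions with the message stored in a plain dict (the constants follow the comments in the Tcl code above; this is a sketch, not the model's own code, which sets LENS link weights):

```python
# Illustrative sketch of clear/link/plink/targlink from prodc.in.

STRENGTH = 6          # "strength" in the Tcl code
INIT_TARGLINK = 0.5   # starting value for event-semantic units
TREDUCE = 0.95        # alternation parameter for targets

def clear(msg):
    """Reset all where-what and event-semantic weights to 0."""
    msg["ww"] = {}        # (where unit, what unit) -> weight
    msg["eventsem"] = {}  # event-semantic unit -> weight

def link(msg, where_unit, *what_units):
    """Tie a where (role) unit to what (concept) units at full strength.
    The real proc sets the weight in both directions (where->what and
    cwhat->cwhere)."""
    for j in what_units:
        msg["ww"][(where_unit, j)] = STRENGTH

def plink(msg, prop, where_unit, *what_units):
    """Like link, but scaled by the proportion prop."""
    for j in what_units:
        msg["ww"][(where_unit, j)] = prop * STRENGTH

def targlink(msg, *units):
    """Set event-semantic units; each -1 scales later units by TREDUCE."""
    strength = INIT_TARGLINK
    for j in units:
        if j < 0:
            strength *= TREDUCE
        else:
            msg["eventsem"][j] = strength

# The message for "a dog sleep -ss" from the example pattern:
msg = {}
clear(msg)
link(msg, 1, 62)          # role 1 (action) -> SLEEP
link(msg, 5, 19)          # role 5 (X) -> DOG
plink(msg, 0.33, 5, 120)  # graded link for the definiteness feature (INDEF)
targlink(msg, 1, 5)
```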
Input for the Model
The model is trained using message-sentence pairs. Here is an example. The first line is the name of the pattern and shows the sentence. The proc line is the Tcl code that sets the message. The third line is a comment that explains, in terms of the input grammar, what the Tcl code is doing. Next is the number of words in the sentence (including the two end-of-sentence periods). Then come the input (i:) and target (t:) pairs. The input word is placed into a buffer that is copied to the cword layer on the next time step.
name:{ mary was hurt -par by the bird #c}
proc: { clear;link 1 77;link 5 3;link 3 22;plink .66 3 120;tlink 1 15 17 5 -1 3;}
#cmes: THEMEEXP 1A=HURT,PAST,INPERF 5Y=MARY, :CAUSE 3X=BIRD,DEF TLINK THEMEEXP 1A= PAST INPERF 5Y= -1 3X=
9
i:{targ 1.0} 3
t:{word 1.0} 3
i:{targ 1.0} 160
t:{word 1.0} 160
i:{targ 1.0} 77
t:{word 1.0} 77
i:{targ 1.0} 155
t:{word 1.0} 155
i:{targ 1.0} 134
t:{word 1.0} 134
i:{targ 1.0} 120
t:{word 1.0} 120
i:{targ 1.0} 22
t:{word 1.0} 22
i:{targ 1.0} 156
t:{word 1.0} 156
i:{targ 1.0} 156
t:{word 1.0} 156 ;
A messageless pattern would be similar except the proc: field would be just "{ clear; }".
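For working with these pattern files outside LENS, a small parser can pull the fields apart. The field layout below is taken from the examples on this page; the parser itself is hypothetical and not part of the model's code:

```python
# Hypothetical parser for the message-sentence pattern format shown above.

def parse_pattern(text):
    pat = {"inputs": [], "targets": []}
    for raw in text.strip().splitlines():
        line = raw.strip().rstrip(";").strip()
        if line.startswith("name:"):
            pat["name"] = line[len("name:"):].strip(" {}")
        elif line.startswith("proc:"):
            pat["proc"] = line[len("proc:"):].strip(" {}")
        elif line.startswith("#"):
            pat["comment"] = line.lstrip("#")
        elif line.startswith("i:"):
            pat["inputs"].append(int(line.split()[-1]))   # input unit number
        elif line.startswith("t:"):
            pat["targets"].append(int(line.split()[-1]))  # target unit number
        elif line.isdigit():
            pat["nwords"] = int(line)                     # word count line
    return pat

# the "a dog sleeps" pattern from above
EXAMPLE = """name:{ a dog sleep -ss }
proc: { clear;link 1 62;link 5 19;plink .33 5 120;targlink 1 5;}
#mes: AINTRANS 1A=SLEEP,PRES,PERF 5X=DOG,INDEF TLINK AINTRANS 1A= 5X=
6
t:{word 1.0} 121
t:{word 1.0} 19
t:{word 1.0} 62
t:{word 1.0} 138
t:{word 1.0} 156
t:{word 1.0} 156 ;"""

pat = parse_pattern(EXAMPLE)
# a quick consistency check: the word count matches the number of targets
assert pat["nwords"] == len(pat["targets"]) == 6
```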
Structural Priming
The model is tested for structural priming with the code below. The prime is "trained" with the same error-based learning that the model used to learn the language.
proc structuralPrimingExp {wtfile set {lag 0} {fillerlist spfillers}} {
setObj batchSize 1
## run all the pairs in the structural priming experiment
repeat [expr [getObj $set.numExamples] / 2] {
## reset network and weights to final state
resetNet
loadWeights $wtfile
## train prime
doExample -train
updateWeights -algorithm steepest
## train fillers if testing lag
if {$lag > 0} {
for {set i 0} {$i < $lag} {incr i} {
doExample $i -train -s $fillerlist
updateWeights -algorithm steepest
}
}
## do target
doExample -train
}
}
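The logic of the experiment above, one error-based weight update on the prime followed by production of the target, can be illustrated with a toy softmax choice between two structures. This is a sketch of the priming principle only, not the Dual-path model:

```python
import math

# Toy illustration of error-based structural priming: a single small
# gradient step on a "heard" passive makes a later passive more likely.

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

logits = [1.0, 0.0]   # structure preferences: [active, passive]
lr = 0.1              # small learning rate, as in a single priming trial

def train_on(structure):
    """One gradient step on the cross-entropy error for a heard sentence."""
    p = softmax(logits)
    for k in range(len(logits)):
        target = 1.0 if k == structure else 0.0
        logits[k] += lr * (target - p[k])

p_before = softmax(logits)[1]   # probability of producing a passive
train_on(1)                     # "train" on a passive prime
p_after = softmax(logits)[1]
assert p_after > p_before       # the passive has become more likely
```

Because the weight change persists, the preference shift survives intervening filler trials, which is what the lag manipulation in structuralPrimingExp tests.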
Here is a passive prime pattern with a transitive target. Notice that the prime is messageless (its proc field only calls clear).
name:{ the beer is smash -par by marty #p0}
proc: { clear;}
#pmes: TRANS 1A=SMASH,PRES,PERF 5Y=BEERZZ,DEF :CAUSE 3X=MARTY, TLINK TRANS 1A= 5Y= -1 3X=
9
i:{targ 1.0} 120
t:{word 1.0} 120
i:{targ 1.0} 35
t:{word 1.0} 35
i:{targ 1.0} 153
t:{word 1.0} 153
i:{targ 1.0} 73
t:{word 1.0} 73
i:{targ 1.0} 155
t:{word 1.0} 155
i:{targ 1.0} 134
t:{word 1.0} 134
i:{targ 1.0} 5
t:{word 1.0} 5
i:{targ 1.0} 156
t:{word 1.0} 156
i:{targ 1.0} 156
t:{word 1.0} 156 ;
The target below has event semantics biased toward the active (hence the active form in its name), but there are also versions of this target whose event semantics bias it toward the passive.
name:{ the boy hurt -ss a sister #t0}
proc: { clear;link 1 77;link 3 7;plink .66 3 120;link 5 13;plink .33 5 120;targlink 1 3 -1 5;}
#tmes: THEMEEXP 1A=HURT,PRES,PERF 3X=BOY,DEF :CAUSE 5Y=SISTER,INDEF TLINK THEMEEXP 1A= 3X= -1 5Y=
8
t:{word 1.0} 120
t:{word 1.0} 7
t:{word 1.0} 77
t:{word 1.0} 138
t:{word 1.0} 121
t:{word 1.0} 13
t:{word 1.0} 156
t:{word 1.0} 156 ;
Downloads
April 28, 2007: Cleaned up directories: XYZ (52MB) and Trad Roles
Oct 30, 2005: Added tar file of final version of the model: XYZ (102MB) and Trad Roles (92MB)
May 11, 2005: Added download of the Simple Message-Sentence Generator.
May 3, 2005: Added a small compressed tar file with a copy of the model.