Simple Message-Sentence Generator

To make it easier to generate simple languages for training and testing the model, I've created a simple version of the system that I use to generate the input environment files. This is not the same system that generated the language in the Becoming Syntactic paper.

The file that specifies the language is the environment grammar file (envgram). It has three parts: categories, constructions, and sent-rewrite. The categories section specifies the words that belong to particular semantic categories (e.g., man is in the LIVING category). On each line, the first word is the semantic category name, the second word is the syntactic category name (used later to parse the output of the model), and the remaining words are the members of the category.

categories{ # these are the semantic categories that are used for selecting words for messages

LIVING NOUN man woman cat dog

NONLIVING NOUN ball stick toy kite

INTRANSVERB VERB sleep jump

TRANSVERB VERB throw hit kick

DET DET the a

AUX AUX is are was were

BY BY by

THAT THAT that

PER PER .

PAR PAR -par

ING ING -ing

ED ED -ed

BEING BEING being

The next part of the categories section specifies the event-semantic units. Here each line has only the category label followed by the list of event-semantic units that make up the category.

#event semantics

TENSE PRES PAST

ASPECT SIMP PROG

XX XX

YY YY

ZZ ZZ

CC CC

DD DD

Finally, the role units in the model are specified on a special line.

#roles: A X Y Z B C

}
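As a rough illustration of the line format described above, here is a minimal Python sketch that reads a categories section into dictionaries. It is not part of the generator (which is written in Perl). The function name parse_categories is hypothetical, and treating the "#event semantics" comment as the switch between word categories and event-semantic categories is an assumption made for this sketch.

```python
# Hypothetical helper, not part of generate.perl: parse a categories section.
SAMPLE = """\
categories{ # semantic categories used for selecting words
LIVING NOUN man woman cat dog
INTRANSVERB VERB sleep jump
DET DET the a
#event semantics
TENSE PRES PAST
ASPECT SIMP PROG
#roles: A X Y Z B C
}
"""

def parse_categories(text):
    words = {}    # semantic category -> list of words
    syntax = {}   # semantic category -> syntactic category
    events = {}   # event-semantic category -> list of units
    roles = []    # role units from the "#roles:" line
    in_events = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("categories{") or line == "}":
            continue
        if line.startswith("#roles:"):
            roles = line[len("#roles:"):].split()
            continue
        if line.startswith("#event semantics"):
            # Assumption: this comment marks the start of the event semantics.
            in_events = True
            continue
        if line.startswith("#"):
            continue  # any other comment line
        fields = line.split()
        if in_events:
            events[fields[0]] = fields[1:]
        else:
            syntax[fields[0]] = fields[1]
            words[fields[0]] = fields[2:]
    return words, syntax, events, roles

words, syntax, events, roles = parse_categories(SAMPLE)
print(words["LIVING"], syntax["LIVING"], events["TENSE"], roles)
```

Keeping the syntactic label in a separate dictionary mirrors the description above: the semantic category drives word selection, while the syntactic category is only needed later when parsing the model's output.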

The constructions section specifies the message-sentence pairs that make up the syntax of the language. The mess line is made up of a list of role=semantics pairs (e.g., A=INTRANSVERB means that the action is a verb in the INTRANSVERB category). During language generation, the system randomly picks a word from the semantic category and links it to the role. The E= role is a special role representing event semantics; the same random replacement happens here, except when -1 is seen (this changes the activation level of event semantic units).

The sent line represents the mapping of role information into sentence positions. X1 is the element at position 1 (counting from 0) in the list of concepts for the X role. In the case of X=LIVING,DET, X0 is the LIVING argument (noun) and X1 is the DET argument (determiner). Lowercase elements are left as words in the sentence (make sure that they are in the categories section, otherwise the model will not have word units for these elements).

constructions{ ## these are the message-sentence pairs.

mess: A=INTRANSVERB X=LIVING,DET E=TENSE,ASPECT,XX

sent: X1 X0 A0 E0 E1 .

# the cat sleep -s

mess: A=TRANSVERB X=LIVING,DET Y=NONLIVING,DET E=TENSE,ASPECT,XX,YY

sent: X1 X0 A0 E0 E1 Y1 Y0 .

# the dog throw -s the stick

mess: A=TRANSVERB X=LIVING,DET Y=NONLIVING,DET E=TENSE,ASPECT,YY,-1,XX

sent: Y1 Y0 is E0 E1 A0 by X1 X0 .

# the stick is throw -par by the dog

## embedded clause example

mess: A=TRANSVERB X=LIVING,DET Y=NONLIVING,DET B=TRANSVERB C=NONLIVING,DET E=TENSE,ASPECT,XX,CC,YY

sent: X1 X0 that B0 C1 C0 A0 E0 E1 Y1 Y0 .

# the dog that throw the toy hit -ed the ball

}
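The way a mess/sent pair turns into a word string can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the actual generate.perl logic: the function fill_construction and the CATEGORIES dictionary are hypothetical, and the -1 marker is simply carried through unchanged because its effect on activation levels is not modelled here.

```python
import random

# Hypothetical, hand-copied subset of the categories section.
CATEGORIES = {
    "LIVING": ["man", "woman", "cat", "dog"],
    "NONLIVING": ["ball", "stick", "toy", "kite"],
    "TRANSVERB": ["throw", "hit", "kick"],
    "DET": ["the", "a"],
    "TENSE": ["PRES", "PAST"],
    "ASPECT": ["SIMP", "PROG"],
    "XX": ["XX"],
    "YY": ["YY"],
}

def fill_construction(mess, sent, rng):
    """Pick a random member for each role slot, then fill the sent template."""
    bindings = {}
    for pair in mess.split():
        role, cats = pair.split("=")
        # Unknown labels such as "-1" are kept as-is so positions stay aligned.
        bindings[role] = [
            rng.choice(CATEGORIES[c]) if c in CATEGORIES else c
            for c in cats.split(",")
        ]
    out = []
    for tok in sent.split():
        role, idx = tok[:-1], tok[-1]
        if role in bindings and idx.isdigit():
            out.append(bindings[role][int(idx)])  # e.g. X1 -> determiner
        else:
            out.append(tok)  # lowercase words and "." are copied unchanged
    return bindings, " ".join(out)

rng = random.Random(10)  # fixed seed, so the same choices are repeated each run
bindings, sentence = fill_construction(
    "A=TRANSVERB X=LIVING,DET Y=NONLIVING,DET E=TENSE,ASPECT,XX,YY",
    "X1 X0 A0 E0 E1 Y1 Y0 .",
    rng,
)
print(bindings)
print(sentence)
```

At this stage the tense and aspect labels (e.g., PRES SIMP) are still present in the sentence; turning them into surface forms such as -s is the job of the rewrite rules.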

Languages often have rules that are not simply mappings from the semantics. For example, tense and aspect interact in different ways with main verbs and auxiliary verbs. These interactions are specified with rewrite rules that change the surface structure of the sentence.

sent-rewrite{ ## these rewrite rules are applied only to the sentence and can be used to implement some of the language-specific changes that are not represented in the message.

s/is PRES PROG (\S+)/is being $1 -par/;

s/is PAST PROG (\S+)/was being $1 -par/;

s/is PRES SIMP (\S+)/is $1 -par/;

s/is PAST SIMP (\S+)/was $1 -par/;

s/(\S+) PRES PROG/is $1 -ing/;

s/(\S+) PAST PROG/was $1 -ing/;

s/PRES SIMP/-s/;

s/PAST SIMP/-ed/;

}
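To see what these rules do to a sentence, the following Python sketch re-expresses them with the re module. It is an illustration only: the generator applies the Perl substitutions directly, and the names REWRITE_RULES and apply_rewrites are hypothetical. Each rule is applied once, in order, which matches a Perl s/// without the /g flag.

```python
import re

# The sent-rewrite rules above as (pattern, replacement) pairs.
# Perl's $1 becomes \1 in Python replacement strings.
REWRITE_RULES = [
    (r"is PRES PROG (\S+)", r"is being \1 -par"),
    (r"is PAST PROG (\S+)", r"was being \1 -par"),
    (r"is PRES SIMP (\S+)", r"is \1 -par"),
    (r"is PAST SIMP (\S+)", r"was \1 -par"),
    (r"(\S+) PRES PROG", r"is \1 -ing"),
    (r"(\S+) PAST PROG", r"was \1 -ing"),
    (r"PRES SIMP", "-s"),
    (r"PAST SIMP", "-ed"),
]

def apply_rewrites(sentence):
    """Apply every rule in order, replacing only the first match of each."""
    for pattern, replacement in REWRITE_RULES:
        sentence = re.sub(pattern, replacement, sentence, count=1)
    return sentence

print(apply_rewrites("the cat sleep PRES SIMP ."))
# the cat sleep -s .
print(apply_rewrites("the stick is PRES SIMP throw by the dog ."))
# the stick is throw -par by the dog .
print(apply_rewrites("the dog throw PAST PROG the stick ."))
# the dog was throw -ing the stick .
```

The order of the rules matters: the passive rules (those beginning with "is") come first, so that a passive sentence is rewritten before the more general active rules can match the same tense and aspect labels.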

Running the generator:

Normal usage:

generate.perl -s 10 -n 100 | translate.perl > train.ex

generate.perl -s 20 -n 100 | translate.perl > test.ex

The -s argument sets the random number generator seed (use the same seed to reproduce a set of utterances, or a different seed to get different ones). The -n argument specifies the number of patterns to generate. The translate.perl script converts the output of generate.perl into LENS input files. To copy the finished files into the model directory, just type the copy command "cp train.ex test.ex ..".