Child/adult sentence generation in typologically-different languages

Bag-of-Words Incremental Generation using Sentence Prediction Accuracy (BIG SPA)

Human syntax acquisition involves a system that can learn constraints on possible word sequences in typologically-different human languages. Evaluation of computational syntax acquisition systems typically involves theory-specific or language-specific assumptions that make it difficult to compare results across languages. To address this problem, a bag-of-words incremental generation (BIG) task with an automatic sentence prediction accuracy (SPA) evaluation measure was developed. The BIG–SPA task was used to test several learners that incorporated n-gram statistics, which are commonly found in statistical approaches to syntax acquisition. In addition, a novel Adjacency–Prominence learner, based on psycholinguistic work in sentence production and syntax acquisition, was tested and yielded the best results in this task on these languages. In general, the BIG–SPA task is argued to be a useful platform for comparing explicit theories of syntax acquisition in multiple languages.

Chang, F., Lieven, E., & Tomasello, M. (2008). Automatic evaluation of syntactic learners in typologically-different languages. Cognitive Systems Research, 9 (3), 198-213.
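To make the task concrete, here is a minimal sketch of BIG–SPA in Java. It is not the code from the paper or from Corpus10: it assumes a plain bigram-count learner, a greedy choice of the next word from the bag, and alphabetical tie-breaking. The class and method names (BigSpaSketch, train, producesTarget, spa) are made up for this illustration. Each test utterance is turned into an unordered bag of words, the learner picks the next word given the previous one, and an utterance counts as correct only if every pick matches the actual next word. SPA is the percentage of utterances that are fully correct.

```java
import java.util.*;

public class BigSpaSketch {
    // Count bigrams: previous word -> (next word -> count). "<s>" marks the utterance start.
    static Map<String, Map<String, Integer>> train(List<String[]> utterances) {
        Map<String, Map<String, Integer>> bigrams = new HashMap<>();
        for (String[] utterance : utterances) {
            String prev = "<s>";
            for (String word : utterance) {
                bigrams.computeIfAbsent(prev, k -> new HashMap<>()).merge(word, 1, Integer::sum);
                prev = word;
            }
        }
        return bigrams;
    }

    static int count(Map<String, Map<String, Integer>> bigrams, String prev, String next) {
        return bigrams.getOrDefault(prev, Collections.emptyMap()).getOrDefault(next, 0);
    }

    // BIG step: order the bag incrementally; true only if every prediction is right.
    static boolean producesTarget(Map<String, Map<String, Integer>> bigrams, String[] target) {
        List<String> bag = new ArrayList<>(Arrays.asList(target));
        Collections.sort(bag); // throw away word order; also makes tie-breaking deterministic
        String prev = "<s>";
        for (String actual : target) {
            String best = bag.get(0);
            for (String candidate : bag) {
                if (count(bigrams, prev, candidate) > count(bigrams, prev, best)) {
                    best = candidate;
                }
            }
            if (!best.equals(actual)) {
                return false; // one wrong word makes the whole utterance wrong
            }
            bag.remove(actual); // remove one copy of the word just produced
            prev = actual;
        }
        return true;
    }

    // SPA: percentage of test utterances reproduced exactly.
    static double spa(Map<String, Map<String, Integer>> bigrams, List<String[]> test) {
        int correct = 0;
        for (String[] utterance : test) {
            if (producesTarget(bigrams, utterance)) {
                correct++;
            }
        }
        return 100.0 * correct / test.size();
    }

    public static void main(String[] args) {
        // Toy data, invented for this sketch (not from CHILDES).
        List<String[]> training = Arrays.asList(
            "the dog chased the cat".split(" "),
            "the cat sleeps".split(" "));
        List<String[]> test = Arrays.asList(
            "the cat sleeps".split(" "),
            "the dog chased the cat".split(" "));
        System.out.println("SPA = " + spa(train(training), test)); // prints SPA = 50.0
    }
}
```

In this toy run the learner gets "the cat sleeps" right but fails on "the dog chased the cat", because after "the" the bigram counts favour "cat" over "dog". Because the measure only needs the utterances themselves, the same code can be pointed at a corpus in any language.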

To try out the algorithms, download Processing from the link below and open the program.

https://processing.org/download/?processing

You should now have a folder called "Processing" in your Documents folder.

Download Corpus10.zip here.

Copy Corpus10.zip into that folder and decompress it.

Double-click on the Corpus10.pde file and the script will open in Processing.

Change the first line of the script so that the path points to the Corpus10 data folder inside your own Processing folder. In the original script it reads:

String userdir = "/Users/chang/Documents/Processing/Corpus10/data/";

Press the play button to run the algorithms against the Anne corpus from CHILDES. The Anne corpus is included in the data folder as a file named "corenganne".
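If the script cannot find the corpus, the path in userdir is the most likely cause. The following is a small stand-alone Java check, not part of Corpus10; the class name CorpusPathCheck and the method corpusFileExists are made up for this illustration, and the folder layout under the home directory is an assumption based on the steps above. It only tests whether a file named "corenganne" exists in the folder you give it.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class CorpusPathCheck {
    // Returns true if a regular file called corpusName exists inside userdir.
    static boolean corpusFileExists(String userdir, String corpusName) {
        Path corpusPath = Paths.get(userdir, corpusName);
        return Files.isRegularFile(corpusPath);
    }

    public static void main(String[] args) {
        // Assumed layout: <home>/Documents/Processing/Corpus10/data/ ; adjust for your machine.
        String userdir = System.getProperty("user.home")
            + "/Documents/Processing/Corpus10/data/";
        System.out.println(userdir + " contains corenganne: "
            + corpusFileExists(userdir, "corenganne"));
    }
}
```

If this reports false, check where Corpus10.zip was decompressed and update the userdir line to match.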