Infant Word (Segmentation) DataBase Home

One of the first things infants seem to extract and encode in memory is recurring words. This database serves as a starting point for meta-analyses bearing on how infants encode words. We have started with an analysis of studies on infants' extraction of words from native, natural speech, as evidenced in paradigms built on listening times.

How to get the Database?

The most recent version of the database is always accessible and freely downloadable from Metalab (click on Visualization, select "word segmentation" and download the raw data). The download is a .csv (comma-separated values) file, a format that many different programs can process. You can open the file in Excel to browse its contents or read it into R.
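Because .csv is plain text, the file can also be read with a few lines of code. The following is a minimal sketch in Python, using only the standard library. The column names and values are invented placeholders, not the actual fields or data of the database, and the file name mentioned in the comment is hypothetical as well.

```python
import csv
import io


def load_rows(source):
    """Read comma-separated text from an open file-like object.

    Returns a list of dictionaries, one per data row, keyed by the
    column names found in the header line.
    """
    return list(csv.DictReader(source))


# Stand-in for the downloaded file. The columns and values below are
# made-up placeholders, NOT the real contents of the database.
sample = io.StringIO(
    "study_id,mean_age_days,effect_size\n"
    "example_a,240,0.25\n"
    "example_b,300,0.10\n"
)

rows = load_rows(sample)

# csv reads every field as text, so numeric columns need converting.
effect_sizes = [float(row["effect_size"]) for row in rows]
mean_effect = sum(effect_sizes) / len(effect_sizes)

print(len(rows), "rows; columns:", list(rows[0].keys()))

# For a real download, pass an open file instead (hypothetical file name):
# with open("word_segmentation.csv", newline="", encoding="utf-8") as f:
#     rows = load_rows(f)
```

Reading each row as a dictionary keeps the code independent of column order, which helps if column names or positions change between versions of the data file.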

Along with the database, we provide scripts for preprocessing and for conducting different analyses. This primer explains the scripts step by step. (Note: in the most recent versions, names in the data file have changed to ensure compatibility with the Metalab platform.)

Frequently Asked Questions

We have responded to a few questions that we have received. Feel free to submit your own questions to

Right now, the FAQ includes the following questions:

1) How can I create my own database and meta-analysis?
2) Why do you (for now) only include studies with natural speech? 
3) Can I add my data, even if they are unpublished?
    3b) Can I still publish my data in a paper after they have been added to the database?
4) How can I use the database to determine how many participants I want to test?
5) Can I check for publication biases in the field using this database?
    5b) How can I find out if there are lab-specific effects?

What is a Community-Augmented Meta-analysis?

InWordDB is one of several community-augmented meta-analyses. The concept of such an open database and meta-analysis, shortened to CAMA, is explained in Tsuji, Bergmann, & Cristia (2014).

You can find more information at

The CAMA idea has led us to conceive of Metalab, where multiple CAMAs on language development are collected in a compatible format.

I want to expand this Database!

We welcome volunteers to add other pieces of the puzzle (e.g., infants' extraction of words from artificial mini-languages using statistical cues). 

Just get in touch with us via
