Frequently Asked Questions

Here are answers to questions that we encountered repeatedly.

1) How can I create my own database and meta-analysis?

InWordDB is a Community-Augmented Meta-Analysis (CAMA). We have made all our data and analysis scripts public and are open to updates (see point 3). If you want to conduct your own meta-analysis, or even publish it in the same format, you can find detailed instructions on this website.


Tsuji, S., Bergmann, C., & Cristia, A. (2014). Community-Augmented Meta-Analyses: Toward Cumulative Data Assessment. Perspectives on Psychological Science, 9(6), 661-665. doi: 10.1177/1745691614552498

PRISMA statement on systematic reviews and meta-analyses, which comes with a checklist and a flowchart

Excel sheet demonstrating how to conduct a literature search; this sheet can be copied and used for new CAMAs

Template spreadsheet for creating your own database. Note that it needs to be adapted to your topic.

2) Why do you (for now) only include studies with natural speech?

Infants' developing word segmentation skills have also been assessed in a related paradigm using artificial speech (e.g., Saffran, Aslin, & Newport, 1996). The present meta-analysis does not include these studies because they differ in important ways from studies using natural speech in the infants' native language (e.g., Jusczyk & Aslin, 1995). Since a meta-analysis aims to aggregate data on one specific phenomenon, a joint analysis would be counter-productive. First, the target items in studies using real speech are (potential) words, whereas in artificial speech studies they are sequences of syllables that can usually be freely combined. The former can be monosyllabic (most in fact are), whereas the latter must contain multiple syllables. Second, grammar (the presence of different word classes, phrasal prosody, etc.) is present only in the studies in focus here; in artificial language studies, target words are surrounded by other target words. Third, the cues available for segmentation do not overlap across study types: co-articulation, prosodic strengthening, word stress, and other natural cues are usually eliminated in artificial language studies, whereas statistical cues are much less prominent in most studies using real speech. For now, we limit ourselves to studies using real speech in infants' native language to avoid comparing potentially distinct phenomena, a comparison that future work (read: expansions of this database) should examine in detail.

If you want to collaborate on such a database or even have conducted a meta-analysis on the topic, we would be thrilled if you could get in touch via (or see point 1).


Jusczyk, P. W., & Aslin, R. N. (1995). Infants' detection of the sound patterns of words in fluent speech. Cognitive Psychology, 29(1), 1-23. doi: 10.1006/cogp.1995.1010

Saffran, J. R., Aslin, R. N., & Newport, E. L. (1996). Statistical learning by 8-month-old infants. Science, 274(5294), 1926-1928.

3) Can I add my data, even if they are unpublished?

Yes, of course. We record the publication status of all contributed data in the database, so you can also see whether someone else has shared their unpublished data with us. In fact, thanks to unpublished lab reports, the database has always contained information that did not stem from journal papers, proceedings contributions, or theses.

If you have data you wish to contribute, simply fill out the questionnaire or contact us at

3b) Can I still publish my data in a paper after they have been added to the database?

In short: Yes! Sharing your data here does not count as prior publication, so if you decide to write a paper about your experiment, you can use your data, even if they are already part of the public database. Just notify us if the publication status changes via

4) How can I use the database to determine how many participants I should test?

You can either check what other people have done or run a prospective power analysis based on the effect sizes in the database.

In both cases, you first have to download the database from the files site (in CSV format, which you can open with Excel or Google Sheets, for example). To see what others have done, use the filter function of Excel or Google Sheets to select the studies that best match your plans. For example, you can select only studies on American English, or only those with the same familiarization-test order (words-to-passage versus passage-to-words).

If you want to conduct a prospective power analysis, we recommend using R.

Again, create a subset of the database that fits your study parameters, then compute the effect size (the post hoc analyses R script on the files site demonstrates how to read in the database and how to compute effect sizes).
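A minimal R sketch of this step, assuming the CSV has one row per experiment. The column names native_lang, fam_order, t, and n, and the function names, are hypothetical placeholders; check the post hoc analyses script for the real ones. The sketch also assumes the studies report a paired (within-participant) t-value, for which one common standardized effect size is d_z = t / sqrt(n); whether the database uses this or another measure should be verified against the script.

```r
# Sketch with HYPOTHETICAL column names (native_lang, fam_order, t, n);
# replace them with the column names used in the database file.
# db <- read.csv("path/to/database.csv")  # placeholder path

# Keep only experiments that match the planned design.
# which() drops rows where a condition is NA instead of returning NA rows.
subset_studies <- function(db, language, fam_order) {
  db[which(db$native_lang == language & db$fam_order == fam_order), , drop = FALSE]
}

# Standardized within-participant effect size from a paired t-value:
# d_z = t / sqrt(n)
dz_from_t <- function(t, n) {
  t / sqrt(n)
}

# Median effect size of a subset, skipping rows without a reported t-value.
median_effect <- function(sub) {
  median(dz_from_t(sub$t, sub$n), na.rm = TRUE)
}

# Example call (arguments are placeholders):
# sub <- subset_studies(db, "American English", "words-to-passage")
# median_effect(sub)
```

The median is used here because the FAQ's power command asks for the median effect size; it is also less sensitive than the mean to a single unusually large effect.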

Then, set median_es to the median effect size of your subset and expected_sd to the expected pooled standard deviation, and run this R command:

power.t.test(delta = median_es, sd = expected_sd, power = 0.8, sig.level = 0.05, type = "paired", alternative = "two.sided")
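The same calculation can be wrapped in a small helper. This is a sketch that assumes the effect sizes are standardized (e.g., Cohen's d), in which case setting sd = 1 makes delta equal to the effect size itself; required_n is a hypothetical name, while power.t.test is the function from R's built-in stats package.

```r
# Participants needed for a paired, two-sided t-test, given a vector of
# standardized effect sizes from your subset of the database.
# Assumes standardized effect sizes, hence sd = 1.
required_n <- function(effect_sizes, power = 0.8, sig.level = 0.05) {
  d <- median(effect_sizes, na.rm = TRUE)  # median effect size of the subset
  res <- power.t.test(delta = d, sd = 1, power = power,
                      sig.level = sig.level, type = "paired",
                      alternative = "two.sided")
  ceiling(res$n)  # power.t.test returns a fractional n; round up
}

# Example: required_n(c(0.4, 0.5, 0.6))
```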

Warning! The effect sizes are very small, so the required samples can be large! For an alternative approach to determining sample size that allows you to stop testing early, try sequential sampling (an explanation with concrete steps for researchers planning either a replication or a novel study, along with example R scripts, can be found here).
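To illustrate why small effects are a problem, the sketch below tabulates the sample size a paired, two-sided t-test needs for 80% power at a few effect sizes. The values are arbitrary examples, not estimates from the database, and sd = 1 again assumes standardized effect sizes.

```r
# Illustrative standardized effect sizes (NOT taken from the database).
effect_sizes <- c(0.2, 0.3, 0.5, 0.8)

# Required sample size for each, rounded up to whole participants.
n_needed <- sapply(effect_sizes, function(d) {
  ceiling(power.t.test(delta = d, sd = 1, power = 0.8, sig.level = 0.05,
                       type = "paired", alternative = "two.sided")$n)
})

print(data.frame(effect_size = effect_sizes, n_needed = n_needed))
```

Because the required sample grows roughly with 1 / d^2, halving the effect size roughly quadruples the number of infants you need to test.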

Our colleagues from the sister database InPhonDB have written a small guide to determining sample size a priori.

5) Can I check for publication biases in the field using this database?

Publication biases are an important issue across the sciences. There are several ways this database can help assess whether such biases are present in the studies included here:

    • Inspect the funnel plot (see the files site, which contains an R script to generate figures). A funnel plot depicts effect sizes against their standard errors. The smaller the error, the closer an effect size is expected to lie to the true population mean; the larger the error, the further from the population mean it can fall. If all results are published, studies will deviate from the population mean symmetrically in both directions. If a field of research systematically ignores a certain kind of outcome (for example, non-significant results or effects of one sign, such as negative effects in studies on treatment-outcome links), the plot can become asymmetrical.

    • Inspect the p-curve. The database lists t- and F-values whenever they are reported in the original paper, and these can be entered directly into the p-curve online app. For convenience, we have also added a document with all relevant values to the files site, along with the output of our own p-curve analysis.
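Both checks can also be sketched in base R. The first function implements Egger's regression test, a common formal check for funnel plot asymmetry that is not necessarily what the scripts on the files site use: the standardized effect (effect size divided by its standard error) is regressed on precision (1 / standard error), and an intercept far from zero suggests asymmetry. The second converts reported t-values and degrees of freedom into two-sided p-values and keeps the significant ones, which are the values a p-curve is built from. Function and argument names are hypothetical.

```r
# Egger's regression test for funnel plot asymmetry (a swapped-in technique,
# not necessarily the one used in the files-site scripts).
# es: effect sizes; se: their standard errors (needs at least 3 studies).
egger_test <- function(es, se) {
  fit <- lm(I(es / se) ~ I(1 / se))
  # The intercept row holds the asymmetry estimate, its SE, t, and p-value.
  coef(summary(fit))["(Intercept)", ]
}

# Two-sided p-values from reported t-values and degrees of freedom,
# keeping only significant results, since p-curve uses only p < .05.
significant_p <- function(t, df, alpha = 0.05) {
  p <- 2 * pt(-abs(t), df)
  p[!is.na(p) & p < alpha]
}
```

F-values with one numerator degree of freedom can be handled the same way, since F(1, df) equals t(df) squared.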

5b) How can I find out if there are lab-specific effects?

The lab where a study was (most likely) conducted has been coded in the database. You can introduce "Lab" as a moderator in your own analyses, following the examples given in the scripts on the files site, or limit your dataset to that of your favorite lab. However, some labs are very prolific while others have published less, so the database contains many data points from a few labs and only single data points from others. This imbalance makes it difficult to compare labs.
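Before adding "Lab" as a moderator, it can help to check how unevenly the data are spread across labs. The base R sketch below counts effect sizes per lab and computes an inverse-variance weighted mean effect for each. The column names lab, es, and var and the function name lab_summary are hypothetical, and a proper moderator analysis would follow the examples in the scripts on the files site rather than this descriptive summary.

```r
# Per-lab summary of effect sizes.
# db: data frame with HYPOTHETICAL columns lab (lab name), es (effect size),
#     and var (sampling variance of each effect size).
lab_summary <- function(db) {
  db <- db[!is.na(db$lab) & !is.na(db$es) & !is.na(db$var), ]
  w <- 1 / db$var  # inverse-variance weights
  labs <- sort(unique(db$lab))
  data.frame(
    lab = labs,
    # Number of effect sizes contributed by each lab.
    n_effects = vapply(labs, function(l) sum(db$lab == l), integer(1),
                       USE.NAMES = FALSE),
    # Inverse-variance weighted mean effect size within each lab.
    weighted_mean = vapply(labs, function(l) {
      idx <- db$lab == l
      sum(w[idx] * db$es[idx]) / sum(w[idx])
    }, numeric(1), USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}
```

Labs with an n_effects of 1 or 2 will have very uncertain means, which is exactly why comparing labs is difficult.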

NB: If you have corrections for the current lab coding, please email us at