Create your own CAMA

Questions, comments? Don't hesitate to get in touch with us: {tsujish | chbergma | alecristia}
Please also check out the accompanying article.

Community-augmented meta-analyses (CAMAs) like the one introduced on this site are a simple tool that significantly facilitates the accumulation and evaluation of previous studies within a specific scientific field. Based on our own experience, we have created a step-by-step tutorial on how to create a CAMA. Please note that we followed the PRISMA statement on systematic reviews and meta-analyses for the meta-analytic part of our CAMA. We strongly recommend referring to the PRISMA checklist and flowchart in addition to our tutorial.

This tutorial is rather lengthy, so you may want to look at specific sections depending on your interests:

  • Part A: Assembling a set of relevant studies: steps 1-3
  • Part B: Create a database of relevant variables: steps 4-6
  • Part C: Share your database online: step 7

Part A: Assembling a set of relevant studies

1. Narrow down the topic

The topic should be narrow enough that methods and outcome variables are comparable, but broad enough to attract other researchers who pursue different questions with comparable techniques, and who can profit from and enlarge the CAMA.
Narrow enough: InPhonDB and InWordDB both focus on infant speech perception. However, the former focuses on the ability to discriminate speech sounds and the latter on the ability to segment words out of fluent speech, two different abilities that are assessed in distinct basic paradigms.
Broad enough: All studies in InPhonDB have in common that (at least) two speech sounds are presented to infants and their discrimination abilities are assessed. However, these studies use diverse methods ranging in manipulation from habituation to conditioning, and dependent measures ranging from looking times to hemodynamic responses. 

2. Conduct a literature search

Conduct a systematic literature search in one or multiple search engines such as Google Scholar*, Science Direct, Web of Science, or PubMed. Use a keyword combination that is as inclusive as possible (this means you will get a large number of false positives you will need to weed through, but it reduces the chances of failing to detect a study). We also set up search alerts to notify us via email whenever new research matching our keyword combination was published.
We used the keyword combination "ALL (vocabulary infant speech perception longitudinal AND LIMIT-TO (topics, "infant,language development,language acquisition,speech perception"))" for InVarInf (on Science Direct, yielding 567 results), and "{infant|infancy} & {vowel|speech sound|syllable} & discrimination" (on Google Scholar, yielding 1340 results) for InPhonDB.

*Note that Google Scholar lists both research that is published in journals and unpublished research (which is available and identifiable online), which is relevant for assessing the possibility of bias in reporting.

3. Converge on final sample

This step includes both excluding irrelevant studies and adding further relevant studies. In order to select the relevant studies out of those retrieved in step 2, we read through the title/abstract of each study and accessed all that were potentially relevant*, excluding further studies after reading the full articles.

Further relevant studies can be identified by looking through the reference lists of relevant studies, as well as by contacting experts. We defined experts as researchers who are first or last authors on at least two relevant studies, and who are still active in the field or can be contacted otherwise. Of course, you might also know of additional potential experts who are active in a closely related field.

The PRISMA flowchart is helpful for keeping an overview of exclusions and additions to the search sample. In addition, we kept Excel sheets (click for an example you can copy and use) to keep track of all potentially relevant studies.

*Fellow researchers/authors can be contacted where there is no access, and if accessing the articles is still not possible this should be noted in the search protocol.

Part B: Create a database of relevant variables

4. Create coding form and enter study information

Here we code potentially relevant moderator variables (e.g., participant characteristics, experimental manipulation) and outcome variables (means, outcomes of statistical analyses). While the relevant variables might be fixed a priori for a regular, narrowly focused meta-analysis, the primary goal of a CAMA is to provide an exhaustive assembly of data that can serve multiple research interests and purposes. Therefore, it is important to code as many variables as possible. Nevertheless, too many variables can be difficult for users to navigate and demotivating to new contributors, so uninformative variables should be skipped.
We adopted a two-step procedure to determine the relevant columns to code. In a first step, we coded all moderator and outcome variables per experiment. In a second step, we removed predictor variables that were only reported in a small subset of studies (for instance, size of presentation screen, signal-to-noise ratio), or that turned out to be non-significant predictors of effect size.
We entered studies in spreadsheet format, where each row represented one unique measurement sample (e.g., one group of participants measured in one condition), and each column represented a moderator or outcome variable. Each topic requires a different number of coded variables, as you can see by comparing the spreadsheets for InWordDB, InVarInf and InPhonDB. You may want to start your own spreadsheet from scratch, or take a look at our template spreadsheet containing suggestions for basic columns we consider useful for a wide range of topics.
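As a concrete illustration, the row/column layout described above can be sketched in R as a small data frame. All column names and values here are invented examples, not the actual template:

```r
# One row per unique measurement sample, one column per moderator or
# outcome variable. Names and numbers are purely illustrative.
db <- data.frame(
  study_ID      = c("smith2010_1", "smith2010_2"),
  n             = c(24, 20),                        # sample size
  mean_age_days = c(180, 360),                      # moderator: participant age
  method        = c("habituation", "conditioning"), # moderator: paradigm
  x_1  = c(7.2, 6.8), SD_1 = c(2.1, 1.9),           # outcome: condition 1
  x_2  = c(5.9, 6.1), SD_2 = c(2.3, 2.0)            # outcome: condition 2
)
```

Each new experiment (or condition within an experiment) then simply becomes another row.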

5. Calculate effect sizes and weights

With all the study variables coded, the last step to obtaining a 'meta-analyzable' CAMA is the transformation of all coded outcome variables into effect sizes (which provide a comparable metric to quantitatively express the strength of your effect). You will also need a weight for each of these effect sizes (which indicates how strongly this particular entry should be weighted in relation to other entries). What kind of effect size best fits your data format, and how exactly you calculate it, varies by the type of data you are looking at. We recommend that new CAMA creators consult textbooks1 and primers2 for a general overview, and scientific articles3 for more specific questions.

You can access our R4 (R Core Team, 2014) analysis scripts (for InVarInf, InWordDB, InPhonDB) in order to get the code for calculating the effect sizes and accompanying weights we use in our CAMAs (Cohen's d, Hedges' g, Pearson's r). We recommend going through the scripts (using our CAMAs or your own brand-new database) in order to get a hands-on understanding of how this works.

Complementary to that, we provide some descriptive examples here to give you an idea of how to get from specific outcome variables to an effect size.

Example 1: correlational
Most correlational studies contained in InVarInf already report the critical effect size, Pearson's r. All that needs to be done is thus to calculate a weight. The weight is related to the number of observations (i.e., if vocabulary was measured at 12, 16, and 24 months, but fewer babies came back at each later age, then the earlier effect sizes will weigh most).
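A minimal R sketch of this weighting step, with made-up numbers: one common approach is to work with the Fisher's z transform of r, whose standard error depends only on the number of observations, so larger samples automatically receive more weight:

```r
# Illustrative values, not taken from any actual study
r <- 0.35   # reported Pearson's r
n <- 42     # number of infants contributing to this correlation

z    <- 0.5 * log((1 + r) / (1 - r))  # Fisher's z transform of r
z_se <- 1 / sqrt(n - 3)               # its standard error: depends only on n
```

The smaller `z_se` is, the more this entry counts in the meta-analytic average.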

Example 2: mean difference, between-participants
Some studies in InPhonDB were between-participant studies. Means and SDs for each group are all that is required to calculate Cohen's d. Sometimes, means and SDs are not available as numbers but are depicted in a figure. If it is not possible to retrieve the numbers from the authors, they can be eyeballed by using a graphical program to precisely relate the axis values to the graphs. Finally, t or F values in combination with sample sizes can be used to calculate Cohen's d. In most cases we transformed Cohen's d into Hedges' g, which entails the application of a correction factor for small samples (small samples get a lower effect size estimate).
One common measure for effect size weighting is the standard error of the effect size, the formula for which relates sample size and effect size (larger samples get higher weight).

Example 3: mean difference, within-participants
The effect sizes in InWordDB are based on mean differences in within-participant studies. Effect sizes for this type of study are calculated the same way as in between-participant studies, but in order to calculate the weight of these studies the correlation between the first and second measurements is required (to account for the amount of within-participant variation). This measure is usually not reported, so we contacted authors to ask for this piece of information. Since not all authors replied, we needed to estimate the remaining correlations. Missing values can be replaced by the median correlation of available values, or by using their distribution to impute correlation values, for instance with the impute function in the Hmisc package (Harrell, 2013), which samples randomly from the available observations.
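A sketch of the within-participant case in R, again with invented numbers; the variance formula follows Morris & DeShon (2002), and the commented lines show the two imputation options mentioned above:

```r
# Illustrative within-participant statistics (not from any actual study)
m1 <- 7.2; m2 <- 5.9; sd1 <- 2.1; sd2 <- 2.3
n  <- 24
r  <- 0.6   # correlation between first and second measurement

# d is computed the same way as in the between-participant case
sd_pooled <- sqrt((sd1^2 + sd2^2) / 2)
d <- (m1 - m2) / sd_pooled

# the weight, however, reflects within-participant variation via r
d_var <- 2 * (1 - r) / n + d^2 / (2 * n)
d_se  <- sqrt(d_var)

# If r is unreported, replace it with the median of available values:
#   r_est <- median(db$r, na.rm = TRUE)
# or impute from the distribution of observed values with Hmisc:
#   r_est <- Hmisc::impute(db$r, "random")
```

The higher the correlation between measurements, the smaller the variance, and hence the larger the weight of the entry.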
1 Textbooks are great to get a basic overview of how to calculate effect sizes. We consulted: Lipsey, M. W. & Wilson, D. B. (2001). Practical meta-analysis. Thousand Oaks, CA: Sage.
2 Lakens, D. (2013). Calculating and reporting effect sizes to facilitate cumulative science: A practical primer for t-tests and ANOVAs. Frontiers in Psychology, 4:863.
3 Since textbooks do not cover every possible question that different meta-analysts may encounter, we turned to articles for more specific questions. We found this article useful for considering the comparability of effect sizes from within- and between-participant designs: Morris, S. B., & DeShon, R. P. (2002). Combining effect size estimates in meta-analysis with repeated measures and independent-groups designs. Psychological Methods, 7(1), 105-125. doi: 10.1037/1082-989X.7.1.105
4 If you are not familiar with R, you can perform the same calculations in programs like Excel as well.

6. Meta-analysis

Having put together a spreadsheet that contains moderator variables, effect sizes and their weights, all ingredients for a meta-analysis are in place. The main pieces of information we want a meta-analysis to convey are the size of the overall effect, whether there is heterogeneity in the sample, and, if so, which moderator variables can explain parts of this heterogeneity.
We refer to the same literature and CAMA scripts as in step 5 to get background knowledge and hands-on experience with conducting meta-analyses. In addition, we again provide a short descriptive illustration.

Meta-analytic regressions can easily be performed with the metafor (Viechtbauer, 2010) package in R. The rma function is constructed similarly to a regular regression function, for instance: rma(EffectSize, sei=EffectSize.SE, data=database, weighted=TRUE).
Here, EffectSize is the dependent measure, and it is weighted by the standard error of the effect size (one common way of weighting). Performing this calculation yields a value and significance test for the mean weighted effect size, as well as measures of sample heterogeneity. These results tell us whether there is an overall significant effect, and whether the inclusion of moderator variables is justified (yes, if there is significant heterogeneity). If it is, the rma function can be used again, now including hypothesized moderator variables: 
rma(EffectSize, sei=EffectSize.SE, data=database, mods=~m1*m2, weighted=TRUE). 
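Putting the two calls together, here is a self-contained sketch, assuming the metafor package is installed; the data frame `database` is filled with invented effect sizes and moderators purely for illustration:

```r
# Illustrative meta-analytic regression with metafor; all data are made up.
library(metafor)

database <- data.frame(
  EffectSize    = c(0.42, 0.31, 0.58, 0.12, 0.75, 0.28, 0.50, 0.05),
  EffectSize.SE = c(0.20, 0.15, 0.25, 0.18, 0.30, 0.22, 0.19, 0.16),
  m1            = c(6, 9, 12, 6, 12, 9, 6, 12),  # e.g., age in months
  m2            = factor(rep(c("A", "B"), 4))    # e.g., experimental method
)

# Overall mean weighted effect size + heterogeneity statistics (Q, I^2, tau^2)
res <- rma(EffectSize, sei = EffectSize.SE, data = database, weighted = TRUE)
summary(res)

# If heterogeneity is significant, add the hypothesized moderators
res_mod <- rma(EffectSize, sei = EffectSize.SE, data = database,
               mods = ~ m1 * m2, weighted = TRUE)
summary(res_mod)
```

The summary of the moderated model then shows, per moderator, whether it explains part of the between-study heterogeneity.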

Part C: Share your database online

7. Open your database to the public

Finally, you want to make your database available to the research community. We used Google Sites, since it allows setting up a multitude of online tools (questionnaires, forms, downloadable material) quickly and easily.*
With a Google account, you can easily create a site -- you can even use our CAMA template (further instructions are found there). We strongly recommend you include the following information:
  • A general description of the CAMA
  • A link to the downloadable database and descriptions of database fields
    • Our own CAMAs are hosted on Google Drive so that several people can have access to them and changes are immediately public. 
  • A form for new contributions
    • For two of our CAMAs, we created online forms to contribute new data (InWordDB form, InVarInf form). New contributions will be stored in a spreadsheet in your Google account, with each entry being a row, and each element within an entry being stored in a different column. You can choose whether the entries are directly stored in the downloadable CAMA (we do this for InVarInf and in the sample template), or whether they are stored in a separate spreadsheet and added only after a CAMA moderator has checked them (we do this for InWordDB).
    • For the most complex of our CAMAs, we created downloadable spreadsheets that contributors can send back to our CAMA email address after filling them in (contribution page for InPhonDB). You can also create additional subpages to describe some possible uses of your CAMA or post important additional documents.
* Note that a webpage created with Google Sites will, however, not be accessible in some countries.

Additional material mentioned in this tutorial: