LitReviewPackage DataSubmission

Please note that the pages on this website do not reflect the methods and definitions that were used to generate the final set of results for the GBD-2010 study, which was published in December 2012.

-- If you are interested in the methods or results of GBD-2010 and subsequent revisions, please visit the official website of IHME's GBD project (http://www.healthdata.org/gbd). The materials described on the website are now obsolete but these webpages have been retained as an internet archive of the work of the group.

-- Please visit the website www.globalburdenofinjuries.org to find out more about other closely related collaborations of our group members.

This document can form the basis of a table used within the literature review to indicate the incidence and prevalence rates of each GBD region. This document can also be used to record data whilst reading shortlisted articles to help keep track of regions for which data is available.

This document also outlines the specifications for data analysis and submission. The main features, as described by Theo Vos, are:

  • Each data point should have its own row. For example, when reporting age- and sex-specific data from a region, each age-sex estimate gets its own separate row, even if the estimates come from the same data set. As much detail by age and sex as possible should be given.
  • A minimum set of variables is required for each row.
  • Each row needs enough information to allow estimation of uncertainty (mean and SE; or % and 95% CI; or % with SE; etc.).
  • Additional variables are added for each study (and the population the study was conducted in) that will be used to derive missing values; these may range from socio-economic/demographic characteristics to study design/quality characteristics.
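
The one-row-per-estimate layout described in the list above can be sketched in Python as follows. This is a minimal illustration only: the field names (`study_id`, `age_start`, `numerator`, and so on) and all numbers are hypothetical and are not taken from the GBD specification.

```python
# Hypothetical age- and sex-specific results from a single study.
# Field names and numbers are invented for illustration.
study = {
    "study_id": "example_study_1",
    "parameter": "prevalence",
    "strata": [
        {"sex": "male", "age_start": 15, "age_end": 24, "cases": 12, "sample_size": 400},
        {"sex": "male", "age_start": 25, "age_end": 34, "cases": 18, "sample_size": 450},
        {"sex": "female", "age_start": 15, "age_end": 24, "cases": 9, "sample_size": 420},
        {"sex": "female", "age_start": 25, "age_end": 34, "cases": 21, "sample_size": 480},
    ],
}


def study_to_rows(study):
    """Return one flat row per age-sex stratum of a study."""
    rows = []
    for stratum in study["strata"]:
        rows.append({
            # Study-level fields are repeated on every row.
            "study_id": study["study_id"],
            "parameter": study["parameter"],
            "sex": stratum["sex"],
            "age_start": stratum["age_start"],
            "age_end": stratum["age_end"],
            "value": stratum["cases"] / stratum["sample_size"],
            # Numerator and denominator are kept so that uncertainty
            # can be estimated later.
            "numerator": stratum["cases"],
            "denominator": stratum["sample_size"],
        })
    return rows


rows = study_to_rows(study)
```

Although all four strata come from the same study, each becomes its own row, and the study-level fields are simply repeated.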

Proposed GBD2005 Data Submission File Specifications

Written by Theo Vos t.vos@sph.uq.edu

Goal: To provide a specification/template for GBD2005 data submission files that expert groups can use. If data have been collated in other databases with relatively similar structure, please continue using the same as long as the basic information below is available.

Purpose – The GBD2005 project aims to estimate the disease burden associated with more than 200 diseases and their disabling sequelae, as well as the size of the disease burden that can be attributed to major risk factors. Expert groups are currently collecting as much information as possible on: (a) the occurrence of diseases and sequelae (prevalence and incidence) and other epidemiological disease parameters (remission, mortality risk, average duration, severity distribution); (b) prevalence of exposure to risk factors; and (c) the risk of disease by level of exposure. From the ‘raw’ data collected during the systematic reviews, estimates will be derived for 21 world regions by age and sex, with uncertainty intervals. This requires a number of imputations to deal with missing values, and checks of the available estimates for internal consistency, especially for diseases. An updated version of DisMod is imperative in this process for disease estimation. DisMod will have a ‘data pre-processing’ function to: (a) determine common age patterns; and (b) impute regional estimates from the data supplied by the Expert Groups from their systematic reviews. Next, it will allow checks of internal consistency between the available disease parameters and, finally, produce an internally consistent full set of epidemiological parameters describing a disease and its disabling sequelae.

The Comparative Risk Assessment will use similar imputation techniques to derive regional estimates of risk factor exposure; that will be outlined in a similar but separate data submission file specification.

Even for diseases that do not use DisMod, we propose this format for all submissions of data, e.g. to deal with age patterns and to estimate uncertainty.

Minimal Required Information – For DisMod and other analysis purposes, we will need certain minimal information about the parameter of interest (e.g. incidence, remission, case fatality, etc.) or proportion (e.g. prevalence). Currently these are:

  • GBD Cause and/or GBD Cause Code or Sequela
  • Case Definition
  • Parameter
  • Sex
  • Country
  • Coverage (defined as being national, regional, or community representative; urban/rural/both)
  • Age Start & Age End for the estimates provided
  • Estimate Year Start & Estimate Year End
  • Parameter Value
  • Units
  • Parameter uncertainty, expressed as: (a) parameter numerator & denominator; (b) Lower & Upper Value of e.g. a 95% Confidence Interval; (c) a mean parameter value and a standard error; (d) any other way of quantifying sampling error; and (e) if applicable, whether the sampling design of the study influences the calculation of sampling error, e.g. the design factor applied in a cluster sampling survey
  • Citation
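
As a rough sketch of why any of the uncertainty forms listed above is sufficient, the Python function below derives a standard error from whichever form a row carries. The field names (`se`, `lower`, `upper`, `numerator`, `denominator`, `design_factor`) are hypothetical. The sketch also assumes that a reported interval is a symmetric, normal-approximation 95% interval, that counts describe a simple proportion, and that the design factor multiplies the standard error computed from counts; none of these assumptions comes from the specification itself.

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% interval


def standard_error(row):
    """Return a standard error from whichever uncertainty fields a row has."""
    if row.get("se") is not None:
        # (c) A standard error was reported directly.
        return row["se"]
    if row.get("lower") is not None and row.get("upper") is not None:
        # (b) Assumes a symmetric, normal-approximation 95% interval.
        return (row["upper"] - row["lower"]) / (2 * Z_95)
    if row.get("numerator") is not None and row.get("denominator"):
        # (a) Binomial standard error of a proportion.
        p = row["numerator"] / row["denominator"]
        se = math.sqrt(p * (1 - p) / row["denominator"])
        # (e) Assumption: the design factor inflates the standard error
        # that simple random sampling would give.
        return se * row.get("design_factor", 1.0)
    raise ValueError("row has no usable uncertainty information")
```

For example, `standard_error({"lower": 0.02, "upper": 0.04})` divides the interval width by 2 × 1.96, while a row with only a numerator and denominator falls back to the binomial formula.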

Additional Information - There is disease-specific additional information that expert groups will want to include in their models, which may be used to determine the plausible estimates for GBD Regions without sufficient data available or to control for excess variance. This information may vary depending on the particular parameter or disease in question or describe the population from which the information is derived, but some examples are: GDP/level of education/urbanicity etc of study population (these data are available from the GBD Core group for national level and can be accessed from the GBD website or through the cluster leaders), study type (cross-sectional vs cohort vs ...), measurement technique (e.g. various diagnostics for diabetes), and quality of study (representativeness, case definition, confounding, etc. or some aggregate quality score).

Most disease parameters will have an age pattern that is consistent between studies from the same region or globally. Obviously, rates from different age ranges in the same study would suggest an age pattern much more strongly than if they come from different studies. Therefore, ideally, we want information on each parameter in the greatest detail possible by age and sex.

Proposal:

For each cause, the expert group collects the parameter information as rows in a csv file (or a format that can be converted to a csv file, such as Access or Excel) with certain required columns and any additional columns that they think might be important. When in doubt, the advice is to leave variables in as additional optional columns. It is easier to ignore some variables than to add additional variables later.
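
A minimal Python sketch of such a file is given below. The column names are hypothetical, chosen to mirror the minimal required information listed earlier, and uncertainty is recorded here as a lower and upper bound, which is only one of the permitted forms. Any extra keys found in the rows are kept as additional optional columns, in line with the advice to leave variables in when in doubt.

```python
import csv
import io

# Hypothetical column names mirroring the minimal required information;
# the actual specification may name them differently.
REQUIRED_COLUMNS = [
    "gbd_cause", "case_definition", "parameter", "sex", "country",
    "coverage", "age_start", "age_end", "year_start", "year_end",
    "parameter_value", "units", "lower_ci", "upper_ci", "citation",
]


def write_submission(rows):
    """Serialise rows to CSV text: required columns first, then optional ones."""
    optional = []
    for i, row in enumerate(rows):
        missing = [c for c in REQUIRED_COLUMNS if c not in row]
        if missing:
            raise ValueError(f"row {i} is missing required columns: {missing}")
        # Keep every extra variable as an optional column.
        for key in row:
            if key not in REQUIRED_COLUMNS and key not in optional:
                optional.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=REQUIRED_COLUMNS + optional,
        restval="",  # blank cell when a row lacks an optional value
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# All values below are invented placeholders, not real data.
example_row = {
    "gbd_cause": "example cause", "case_definition": "hypothetical definition",
    "parameter": "prevalence", "sex": "female", "country": "Exampleland",
    "coverage": "national", "age_start": 15, "age_end": 24,
    "year_start": 2000, "year_end": 2001, "parameter_value": 0.03,
    "units": "proportion", "lower_ci": 0.02, "upper_ci": 0.04,
    "citation": "hypothetical citation", "study_type": "cross-sectional",
}
csv_text = write_submission([example_row])
```

Here `study_type` is not a required column, so it is appended after the required ones rather than dropped.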

Proposed required columns (with example data):

Further explanation of the allowed values in the columns:

Example of optional columns (from Bipolar Prevalence Dataset):
