Variable Definition Sheets

Variable Definition Sheets (VDSs) are single-document summaries of how raw data are transformed into analytic variables. VDSs can form an important component of study code books and play an important role in supplying tidy data. The prescribed process for generating a VDS, described below, ensures good communication between data analysts and investigators and ensures that investigators are aware of all assumptions and challenges involved in coding the analytic variables in a research study.

The principles for drafting a VDS are as follows. The data analyst is responsible for creating the VDS and for ensuring that all components are complete and accurate. The investigator is responsible for providing essential information (the description of the variable, the original citation) and for reading and approving all components of the VDS before it can move out of the "draft" stage and be considered "approved".

The components of a VDS include:

  1. A description of the process by which the VDS was created (the investigator initiating the variable request, the analyst who coded the variable, the process by which the VDS was approved by the investigator(s)).
  2. Citations of original sources (e.g., published papers) for the coded variable, as appropriate.
  3. The name of the variable defined. We prepend "vd" to the names of all variables defined and documented using a VDS. This makes tidy variables easy to identify.
  4. A description of the variable and how it was coded. This description contains minimal technical information and should be interpretable by a non-scientist. This description should allow a person to interpret the values or quantity of the variable.
  5. Pseudo code. This is slightly more technical than the description, and attempts to represent, using simple rules of logic, how the variable is constructed.
  6. An explicit statement of how missing data are handled in the creation of the VDS variable.
  7. The exact computer syntax used to create the variable.
  8. Descriptive information on missing data.
  9. Descriptive information on the derived variable.
  10. A snapshot or copy of the data collection form used to collect the information used in constructing the VDS variable.
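
As an illustration of how components #3 and #6 through #9 might look in practice, here is a minimal Python sketch. The variable (a body mass index called "vdbmi"), the raw field names weight_kg and height_m, the helper functions, and the missing-data rule are all hypothetical and chosen only for illustration; a real VDS would record whatever syntax and rules the analyst and investigator agreed on.

```python
from statistics import mean

VD_PREFIX = "vd"  # prefix marking variables documented with a VDS (component #3)


def vd_name(base_name):
    """Return the VDS-style name for a base variable name."""
    return f"{VD_PREFIX}{base_name}"


def derive_bmi(weight_kg, height_m):
    """Hypothetical derived variable: body mass index in kg/m^2 (component #7).

    Missing-data rule, assumed for illustration (component #6): if either
    input is missing (None), the derived value is missing (None).
    """
    if weight_kg is None or height_m is None:
        return None
    return weight_kg / height_m ** 2


def describe(values):
    """Summarize missingness and distribution (components #8 and #9)."""
    observed = [v for v in values if v is not None]
    return {
        "n": len(values),
        "n_missing": len(values) - len(observed),
        "mean": mean(observed) if observed else None,
        "min": min(observed) if observed else None,
        "max": max(observed) if observed else None,
    }


# Hypothetical raw records; None marks a missing value.
raw = [
    {"weight_kg": 70.0, "height_m": 1.75},
    {"weight_kg": None, "height_m": 1.60},
    {"weight_kg": 81.0, "height_m": 1.80},
]

name = vd_name("bmi")
derived = [derive_bmi(r["weight_kg"], r["height_m"]) for r in raw]
summary = describe(derived)
print(name, summary)
```

Keeping the missing-data rule inside the same function that creates the variable means the explicit statement in component #6 and the syntax in component #7 cannot drift apart unnoticed.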

The steps in generating a VDS are:

  1. An investigator makes a request for the development of a constructed variable. This request includes:
    • Citation to original source (and copy of original source if feasible to obtain)
    • Description of the variable (this is the first draft of component #4)
    • Instructions for how missing data are to be handled (this is the first draft of component #6)
  2. The data analyst starts to prepare the VDS document, revises the contributions provided by the investigator as appropriate, and completes components #1-#6.
  3. The data analyst and investigator(s) meet to discuss the plan for the variable and review components #1-#6 and #10. Revisions are made as appropriate.
  4. The data analyst creates the code, and generates the entries for components #7-#9.
  5. The data analyst and investigator(s) meet and discuss the complete VDS draft. Revisions are made as appropriate, and this process is repeated until the VDS is approved by the investigator.
  6. The data analyst makes final edits to component #1 of the VDS, and posts or archives the VDS.
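
The steps above could be tracked with a small status record. The following Python sketch is only one possible way to do so: the VDS class, its method names, and the variable name "vdexample" are hypothetical, and the assumption that component #10 is attached at the first review meeting is ours, since the steps say only that it is reviewed then.

```python
from dataclasses import dataclass, field

# Component numbers follow the list of VDS components given earlier.
COMPONENTS = {
    1: "creation process",
    2: "citations",
    3: "variable name",
    4: "description",
    5: "pseudo code",
    6: "missing-data handling",
    7: "computer syntax",
    8: "missing-data descriptives",
    9: "derived-variable descriptives",
    10: "data collection form",
}


@dataclass
class VDS:
    """Hypothetical tracker for one Variable Definition Sheet."""
    variable: str
    status: str = "draft"
    completed: set = field(default_factory=set)

    def complete(self, *numbers):
        """Mark one or more numbered components as drafted."""
        for n in numbers:
            if n not in COMPONENTS:
                raise ValueError(f"unknown component #{n}")
            self.completed.add(n)

    def missing_components(self):
        """Return the component numbers not yet drafted, in order."""
        return sorted(set(COMPONENTS) - self.completed)

    def approve(self, investigator_approved):
        """Move from "draft" to "approved" only when all components are
        present and the investigator has signed off."""
        if self.missing_components():
            raise ValueError(f"incomplete: {self.missing_components()}")
        if investigator_approved:
            self.status = "approved"
        return self.status


vds = VDS("vdexample")
vds.complete(1, 2, 3, 4, 5, 6)   # step 2: analyst drafts components #1-#6
vds.complete(10)                 # step 3: form reviewed (assumed attached here)
vds.complete(7, 8, 9)            # step 4: code and descriptive output
vds.approve(investigator_approved=False)  # step 5: revisions still needed
still_draft = vds.status
vds.approve(investigator_approved=True)   # step 5 repeated until approved
print(vds.variable, vds.status)
```

The approve method refuses to run on an incomplete sheet, which mirrors the principle that the investigator approves all components before the VDS leaves the "draft" stage.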

Rich Jones

7 May 2017