Call for data - Mexico hospital example

Process for Contributing ICD 10-Coded Hospital Morbidity Data:

The following text describes the process by which hospital data is transformed into non-fatal nature of injury/external cause-linked tabulations for injuries by age and sex, according to Global Burden of Disease Project external cause and nature of injury sequelae groupings. Mexican hospital data is used as an example.

*For generic script for your use see attachments [ICD10_GBD_Mapping_Part1_Generic_Script.do & ICD10_GBD_Mapping_Part2_Generic_Script.do].

*For full example script and output (.csv or STATA) see attachments [ICD10_GBD_Mapping_Part1_Mexico_Script.do, ICD10_GBD_Mapping_Part2_Mexico_Script.do, & Mexico_Injury_Tabs.csv or Mexico_Injury_Tabs.dta].

Process steps:

[Note: Stata code is shown in italics.]

(1) Identify variables of interest (age, sex, external cause, first three diagnosis codes, fatal/non-fatal, and patient type [inpatient or admitted vs. outpatient or ambulatory care]):

In the Mexican Hospital dataset, we identified the relevant variables (edad, cveedad, sexo, motegre, afecprin, afec01, afec02, afec03, and causaext) and translated Spanish names into names used by our script. These variables are age, age_key, sex, reason for exit (i.e. fatal or non-fatal), principle diagnosis code, first diagnosis code, second diagnosis code, third diagnosis code, and external cause. There is no patient type variable, as the dataset is an admitted registry only. Age, sex, first, second, and third diagnosis codes, and external cause are obviously important for tabulation. Age_key is important for this dataset because the values of age are unitless. Age_key is a categorical variable with values equal to hours, days, months and years (see ICD10_GBD_Mapping_Part1_Mexico_Script.do). Thus, age_key = 1 / age = 8 corresponds to 8 days, whereas age_key = 2 / age = 8 corresponds to 8 weeks. Check to see if an age-key variable exists in your hospital dataset and adjust the script accordingly. Additionally, the Mexico dataset includes a variable for principal diagnosis, which equals the first diagnosis. However, upon dataset exploration it was discovered that, even if a principal diagnosis was listed, in some cases, first diagnosis was missing. In these cases, first diagnosis was replaced with the value from principal diagnosis. Check to see if this is the case with your hospital dataset.

NOTE: You may choose to do these steps differently, but you must document your methodology and communicate it to us (kavi_bhalla@harvard.edu).

rename edad age

label var age "Age"

rename cveedad age_key

label var age_key "Key of age"

rename sexo sex

label var sex "Sex"

rename motegre discharge_reason

label var discharge_reason "Reason of exit"

rename afecprin cause_main

label var cause_main "Primary diagnosis: nature of injury code (S-T)"

rename afec01 cause_x1

label var cause_x1 "First diagnosis: nature of injury code (S-T)"

rename afec02 cause_x2

label var cause_x2 "Second diagnosis: nature of injury code (S-T)"

rename afec03 cause_x3

label var cause_x3 "Third diagnosis: nature of injury code (S-T)"

rename causaext cause_ext

label var cause_ext "External cause code (V-Y)"

.........................................................................................................................................................................................................

(2) Apply inclusion criteria

The inclusion criteria for GBD Injury hospital data requires:

(a) Keeping only non-fatal cases

(b) Keeping only cases presenting at first point of contact with the healthcare system (e.g. "emergencias" for Mexico)

(c) Keeping only cases with S&T codes (nature of injury) within first three diagnoses (icludes pre-processing of external cause and nature of injury codes)

(d) Assign cases to "ambulatory care" or "admitted"

We implemented application of inclusion criteria for the Mexico hospital dataset by eliminating all fatal cases, only keeping "emergencias," which represent cases at first point of contact with the health system, and only keeping cases with S&T codes. The Mexico dataset only contains admitted patients. Code is below. For complete code see ICD10_GBD_Mapping_Part1_Mexico_Script.do.

NOTE: You may choose to do these steps differently, but you must document your methodology and communicate it to us (kavi_bhalla@harvard.edu).

(a) & (b)

drop if discharge_reason == 5

* 5= death

keep if proced == 2

* 2 = emergencies

*The next line replaces the value of the second diagnosis variable (cause_x1) with the value of the main diagnosis variable (cause_main) when the value of cause_x1 is missing

replace cause_x1 = cause_main if cause_x1 == ""

*The next line keeps only the relevant variables

keep age age_key sex cause_x1 cause_x2 cause_x3 cause_ext

(c)

*NATURE OF INJURY

*The following "foreach" syntax says, for each variable beginning with cause_x (includes cause_x1 to x3), generate a temporary variable called "temp_cause_x*"; where * = 1, 2, or 3). This temporary variable will only have the first value of the cause_x* variables. (i.e., for cause_x1 = T21, temp_cause_x1 = T)

foreach var of varlist cause_x*{

gen temp_`var' = substr(`var',1,1)

}

*The following syntax says keep the observations only where temp_cause_x* = S or T (i.e., only keep observations where there is nature of injury information)

keep if temp_cause_x1 == "S" | temp_cause_x1 == "T" | temp_cause_x2 == "S" | temp_cause_x2 == "T" | temp_cause_x3 == "S" | temp_cause_x3 == "T"

*The following syntax deals with missing nature of injury information. It says to replace cause_x* with a blank (equivalent to missing) if temp_cause_x* does not equal S and does not equal T

replace cause_x1 = "" if temp_cause_x1 !="S" & temp_cause_x1 !="T"

replace cause_x2 = "" if temp_cause_x2 !="S" & temp_cause_x2 !="T"

replace cause_x3 = "" if temp_cause_x3 !="S" & temp_cause_x3 !="T"

drop temp_*

*The following foreach syntax relates specifically to the Mexican Hospital data, whereby some ICD codes were given an "X" or "x" at the end of the code (i.e., T21x). This syntax will strip the code of that "x".

foreach var of varlist cause_x*{

replace `var' =substr(`var',1,3) if substr(`var',4,4)=="x" |substr(`var',4,4)=="X"

}

*The following syntax deals with nature of injury codes that are incomplete (i.e., less than 3 characters (i.e., T2 instead of T21).

*It says for each variable of cause_x*, replace the value with "unspecified" if the code is less than three characters.

foreach var of varlist cause_x*{

gen `var'_length=length(`var')

replace `var'="Unspecified" if `var'_length<3

}

drop *_length

*EXTERNAL CAUSE

*The following syntax codes external causes that are not V, W, X or Y (but have associated S and T codes) as "Unspecified"

gen cause_ext_temp = substr(cause_ext,1,1)

replace cause_ext = "Unspecified" if cause_ext_temp != "V" & cause_ext_temp != "W" & cause_ext_temp != "X" & cause_ext_temp != "Y"

replace cause_ext_temp = "Unspecified" if cause_ext_temp != "V" & cause_ext_temp != "W" & cause_ext_temp != "X" & cause_ext_temp != "Y"

keep if cause_ext_temp == "V" | cause_ext_temp == "W" | cause_ext_temp == "X" | cause_ext_temp == "Y" | cause_ext_temp == "" | cause_ext_temp == "Unspecified"

drop cause_ext_temp

*The following syntax is specific to the Mexican hospital data in which there are "x" and "X" added to the end of the external cause code. This syntax deletes the x.

replace cause_ext=substr(cause_ext,1,3) if substr(cause_ext,4,4)=="x" |substr(cause_ext,4,4)=="X"

gen cause_ext_length=length(cause_ext)

*The following syntax replaces external cause codes that are less than 3 characters in length with "Unspecified"

replace cause_ext="Unspecified" if cause_ext_length<3

drop cause_ext_length

.........................................................................................................................................................................................................

(3) Map data (age and ICD codes) to GBD groupings

(a) Recode age to GBD age groupings for mortality and cause of death compilation:

Early neonatal (completed days of life 0 to 6)

Late neonatal (completed days of life 7 to 27)

28 days - 5 completed months

6-11 months

1-4 years

5-9 years

10-14 years

15-19 years

20-24 years

25-29 years

30-34 years

35-39 years

40-44 years

45-49 years

50-54 years

55-59 years

60-64 years

65-69 years

70-74 years

75-79 years

80-84 years

85+ years

(b) use ICD =>GBD mappings for external cause groups [GBD_ICD_10_ExternalCause_Mapping Worksheet.csv]

(c) use ICD =>GBD mappings for nature of injury groups [GBD_ICD_10_NatureInjury_Mapping Worksheet.csv]

(d) apply code key [You are NOT responsible for this step. This will be conducted by the Core Injury Team. To see what codes correspond with what GBD groupings consult External Cause Definition Table and Nature of Inj. Definition Table.)

NOTE: You may choose to do these steps differently, but you must document your methodology and communicate it to us (kavi_bhalla@harvard.edu).

(a) From ICD10_GBD_Mapping_Part1_Mexico_Script.do

*AGE

/*Codebook for age_key:

0 = hours

1 = days

2 = months

3 = years

9 = unknown

*/

*The next part of this script deals with age that has been miscoded.

*replace observations with values out of range specified for each age_key (hours: 0-23; days: 1-30; months:1-11; years: 1-120) with 999 (=Unknown)

replace age = 999 if age>23 & age_key == 0

replace age = 999 if age >30 & age_key == 1

replace age = 999 if age > 11 & age_key == 2

replace age = 999 if age > 120 & age_key == 3

replace age = 999 if age_key == 9

/*Codebook for GBD age grouping:

0 = Antepartum stillbirth (28 weeks or later, but before start of labor)

1 = Intrapartum stillbirth (after start of labor)

2 = Early neonatal (completed days of life 0 to 6)

3 = Late neonatal (completed days of life 7 to 27)

4 = 28 days - 5 completed months

5 = 6-11 months

6 = 1-4 years

7 = 5-9 years

8 = 10-14 years

9 = 15-19 years

10 = 20-24 years

11 = 25-29 years

12 = 30-34 years

13 = 35-39 years

14 = 40-44 years

15 = 45-49 years

16 = 50-54 years

17 = 55-59 years

18 = 60-64 years

19 = 65-69 years

20 = 70-74 years

21 = 75-79 years

22 = 80-84 years

23 = 85+ years

999 = unknown

*/

*replace age with GBD age group

*For example, the following line replaces the value of age with the value for GBD age group of <1month ("0"), if age is coded in hours (age_key==0) and does not equal "unknown" (age != 999)

gen age_gbd = age

replace age_gbd = 2 if age_key == 0 & age != 999

replace age_gbd = 2 if age_key == 1 & age <= 6 & age != 999

replace age_gbd = 3 if age_key == 1 & age >= 7 & age <= 27 & age != 999

replace age_gbd = 4 if age_key == 1 & age >= 28 & age != 999

replace age_gbd = 4 if age_key == 2 & age <= 5 & age != 999

replace age_gbd = 5 if age_key == 2 & age >= 6 & age != 999

replace age_gbd = 6 if age_key == 3 & age <= 4 & age != 999

replace age_gbd = 7 if age >=4 & age<=9 & age_key == 3 & age != 999

replace age_gbd = 8 if age >=10 & age<=14 & age_key == 3 & age != 999

replace age_gbd = 9 if age >=15 & age<=19 & age_key == 3 & age != 999

replace age_gbd = 10 if age >=20 & age<=24 & age_key == 3 & age != 999

replace age_gbd = 11 if age >=25 & age<=29 & age_key == 3 & age != 999

replace age_gbd = 12 if age >=30 & age<=34 & age_key == 3 & age != 999

replace age_gbd = 13 if age >=35 & age<=39 & age_key == 3 & age != 999

replace age_gbd = 14 if age >=40 & age<=44 & age_key == 3 & age != 999

replace age_gbd = 15 if age >=45 & age<=49 & age_key == 3 & age != 999

replace age_gbd = 16 if age >=50 & age<=54 & age_key == 3 & age != 999

replace age_gbd = 17 if age >=55 & age<=59 & age_key == 3 & age != 999

replace age_gbd = 18 if age >=60 & age<=64 & age_key == 3 & age != 999

replace age_gbd = 19 if age >=65 & age<=69 & age_key == 3 & age != 999

replace age_gbd = 20 if age >=70 & age<=74 & age_key == 3 & age != 999

replace age_gbd = 21 if age >=75 & age<=79 & age_key == 3 & age != 999

replace age_gbd = 22 if age >=80 & age<=84 & age_key == 3 & age != 999

replace age_gbd = 23 if age >=85 & age_key == 3 & age != 999

drop age

label var age_gbd "Age by GBD age group, see script and documentation for codebook"

drop age_key

order sex age_gbd cause_ext cause_x1 cause_x2 cause_x3

(b) From ICD10_GBD_Mapping_Part2_Mexico_Script.do

*Merge ICD-10/GBD external cause mapping file

*The following syntax maps ICD10 external cause codes to GBD external categories

rename cause_ext icd_10_external

sort icd_10_external

tempfile temp7

save "`temp7'", replace

clear

insheet using "`directory'\GBD_ICD_10_ExternalCause_Mapping Worksheet.csv", name

sort icd_10_external

tempfile temp8

save "`temp8'", replace

clear

use "`temp7'"

merge icd_10_external using "`temp8'", nokeep

drop _merge

*The following syntax does the same thing as above, except for ICD10 3 digit codes

rename code code_3digit

rename icd_10_external icd_10_external_4digit

gen icd_10_external = substr(icd_10_external_4digit,1,3)

sort icd_10_external

merge icd_10_external using "`temp8'", nokeep

drop _merge

*The following syntax says if value of code is missing, replace with the value for code_3digit

replace code = code_3digit if code == .

drop icd_10_external_4digit code_3digit icd_10_external

(c) From ICD10_GBD_Mapping_Part2_Mexico_Script.do

*Merge ICD-10/GBD nature of injury mapping file (2x in order to manage 3 and 4 digit ICD codes (e.g. S10 and S101))

*The following syntax merges the observations in the hospital data file (coded in ICD10) with the GBD nature of injury mapping worksheet (to get ICD10 codes into GBD categories).

rename cause_x1 icd_10_cause

sort icd_10_cause

tempfile temp1

save "`temp1'", replace

clear

insheet using "`directory'\GBD_ICD_10_NatureInjury_Mapping Worksheet.csv", name

sort icd_10_cause

tempfile temp2

save "`temp2'", replace

clear

use "`temp1'"

merge icd_10_cause using "`temp2'", nokeep

rename icd_10_cause icd_10_cause_4digit

rename gbd_seq_code gbd_seq_code_4digit

drop _merge

*The following syntax generates a variable called "icd_10_cause" which will only have the values of the first three characters of "icd_10_cause_4digit".

*This then is merged with the temp2 variable (ICD-GBD map) to get the nature of injury ICD 10 3digit codes to GBD nature of injury categories (need to match ICD10 cause 3 digits, and ICD10 cause 4 digits to hospital data file and then get GBD category values for each).

gen icd_10_cause = substr(icd_10_cause_4digit,1,3)

sort icd_10_cause

merge icd_10_cause using "`temp2'", nokeep

drop _merge

*If the value for gbd_seq_code is missing, replace with gbd_seq_code_4digit, and rename to gbd_seq_code_1, then delete gbd_seq_code_4digit

replace gbd_seq_code = gbd_seq_code_4digit if gbd_seq_code == ""

rename gbd_seq_code gbd_seq_code_1

drop gbd_seq_code_4digit

*The following syntax generates new variables called cause_x2_short and cause_x3_short (values are only first three characters)

gen cause_x2_short = substr(cause_x2,1,3)

gen cause_x3_short = substr(cause_x3,1,3)

*The following syntax deals with recoding ICD10 nature of injury T311 and T321, which are burns that are between 10-19% of body surface area.

*The first line of the following syntax says replace gbd_seq_code_1 with "A5a" if icd_10_cause_4digit equals "T311" or "T321" and gbd_seq_code_1 equals "Recode" and cause_x2_short equals "T20" or "T23" or "T26" or cause_x3_short equals "T20" or "T23" or "T26"

replace gbd_seq_code_1 = "A5a" if icd_10_cause_4digit == "T311" & gbd_seq_code_1 == "Recode" & (cause_x2_short == "T20" | cause_x2_short == "T23" | cause_x3_short == "T20" | cause_x3_short == "T23")

replace gbd_seq_code_1 = "A5a" if icd_10_cause_4digit == "T321" & gbd_seq_code_1 == "Recode" & (cause_x2_short == "T20" | cause_x2_short == "T23" | cause_x3_short == "T20" | cause_x3_short == "T23")

***The above syntax is then repeated for cause_x2 and cause_x3.

rename icd_10_cause_4digit icd_10_cause_4digit_1

rename icd_10_cause icd_10_cause_1

drop cause_x2_short

rename cause_x2 icd_10_cause

sort icd_10_cause

tempfile temp3

save "`temp3'", replace

clear

insheet using "`directory'\GBD_ICD_10_NatureInjury_Mapping Worksheet.csv", name

sort icd_10_cause

tempfile temp4

save "`temp4'", replace

clear

use "`temp3'"

merge icd_10_cause using "`temp4'", nokeep

rename icd_10_cause icd_10_cause_4digit

rename gbd_seq_code gbd_seq_code_4digit

drop _merge

gen icd_10_cause = substr(icd_10_cause_4digit,1,3)

sort icd_10_cause

merge icd_10_cause using "`temp4'", nokeep

drop _merge

replace gbd_seq_code = gbd_seq_code_4digit if gbd_seq_code == ""

rename gbd_seq_code gbd_seq_code_2

drop gbd_seq_code_4digit

replace gbd_seq_code_2 = "A5a" if icd_10_cause_4digit == "T311" & gbd_seq_code_2 == "Recode" & (icd_10_cause_1 == "T20" | icd_10_cause_1 == "T23" | cause_x3_short == "T20" | cause_x3_short == "T23")

replace gbd_seq_code_2 = "A5a" if icd_10_cause_4digit == "T321" & gbd_seq_code_2 == "Recode" & (icd_10_cause_1 == "T20" | icd_10_cause_1 == "T23" | cause_x3_short == "T20" | cause_x3_short == "T23")

*As above, the syntax is repeated for cause_x3

rename icd_10_cause_4digit icd_10_cause_4digit_2

rename icd_10_cause icd_10_cause_2

drop cause_x3_short

rename cause_x3 icd_10_cause

sort icd_10_cause

tempfile temp5

save "`temp5'", replace

clear

insheet using "`directory'\GBD_ICD_10_NatureInjury_Mapping Worksheet.csv", name

sort icd_10_cause

tempfile temp6

save "`temp6'", replace

clear

use "`temp5'"

merge icd_10_cause using "`temp6'", nokeep

rename icd_10_cause icd_10_cause_4digit

rename gbd_seq_code gbd_seq_code_4digit

drop _merge

gen icd_10_cause = substr(icd_10_cause_4digit,1,3)

sort icd_10_cause

merge icd_10_cause using "`temp6'", nokeep

drop _merge

replace gbd_seq_code = gbd_seq_code_4digit if gbd_seq_code == ""

rename gbd_seq_code gbd_seq_code_3

drop gbd_seq_code_4digit

replace gbd_seq_code_3 = "A5a" if icd_10_cause_4digit == "T311" & gbd_seq_code_3 == "Recode" & (icd_10_cause_1 == "T20" | icd_10_cause_1 == "T23" | icd_10_cause_2 == "T20" | icd_10_cause_2 == "T23")

replace gbd_seq_code_3 = "A5a" if icd_10_cause_4digit == "T321" & gbd_seq_code_3 == "Recode" & (icd_10_cause_1 == "T20" | icd_10_cause_1 == "T23" | icd_10_cause_2 == "T20" | icd_10_cause_2 == "T23")

rename icd_10_cause_4digit icd_10_cause_4digit_3

rename icd_10_cause icd_10_cause_3

drop icd_10_cause_4digit_1 icd_10_cause_4digit_2 icd_10_cause_4digit_3 icd_10_cause_1 icd_10_cause_2 icd_10_cause_3

(d) From ICD10_GBD_Mapping_Part2_Mexico_Script.do

rename code code_ext

label var code_ext "External cause"

label var cause "Nature of injury"

*Nature of injury code merge

*The following syntax is matching the nature of injury code (i.e., A1) with the gbd_seq (injured spinal cord) and gbd_seq_specific (Neck)

*The following line says rename variable "cause" (which is nature of injury), to "code".

rename cause code

sort code

tempfile temp9

save "`temp9'", replace

clear

insheet using "`directory'\Nature_Injury_Code_Key.csv", name

sort code

tempfile temp10

save "`temp10'", replace

clear

use "`temp9'"

merge code using "`temp10'", nokeep

drop _merge code

label var gbd_seq "GBD nature of injury sequlae (e.g fracture)"

label var gbd_seq_specific "GBD nature of injury specific sequlae (e.g. ankle)"

*External cause code merge

*The following syntax is matching the external cause code (i.e., U1), with the gbd intent (Unintentional), and gbd external cause, broad (Road Traffic Incident)

rename code_ext code

sort code

tempfile temp11

save "`temp11'", replace

clear

insheet using "`directory'\External_Cause_Code_Key.csv", name

sort code

tempfile temp12

save "`temp12'", replace

clear

use "`temp11'"

merge code using "`temp12'", nokeep

drop _merge code

rename intent gbd_intent

label var gbd_intent "GBD external cause intent (e.g. self-inflicted or unintentional)"

rename broad gbd_broad

label var gbd_broad "GBD external cause broad (e.g. RTI)"

rename short_name gbd_short_name

label var gbd_short_name "GBD external cause specific (e.g. pedestrian)"

rename gbd_category gbd_full_name

label var gbd_full_name "GBD full name (e.g. Unintentional RTI Pedestrian)"

.........................................................................................................................................................................................................

(4) Generate separate tabulations for outpatient (ambulatory care) and inpatient (admitted care) for external cause by nature of injury, age, and sex [see Mexico_Injury_Tabs.csv or Mexico_Injury_Tabs.dta for Mexican output - Mexico dataset only has admitted care]. The GBD Injury Team is primarily concerned with tabulations for 2005; however, depending on the size of the dataset mapping may best be conducted as an aggregate over several years. For example, the Mexican dataset here discussed is only tabulated for 2005, whereas Mauritius hospital data is tabulated for 2003-2007. The cut-off for dataset size for 2005-only analysis versus aggregated analysis is subject to ongoing discussion. For now, contact Kavi at kavi_bhalla@harvard.edu.

In the final tabulation, each individual is counted only once regardless of the number of injuries. Mapping codes are used to assign priority to one of multiple injuries. For the theoretical basis for the priority ranking of injuries, please refer to Discussion Document 3- Health state definitions.. All injuries coded as "A" are considered priority injuries (e.g. fractured femur). "A-coded" injuries trump "B-coded" injuries. If an individual is diagnosed with multiple priority injuries, numeric ranking of "A-coded" injuries determines the priority ranking. For example, an individual with "A1," "A2," and "A5" would be considered as "A1" only. For an individual with only non-priority injuries ("B-coded"), the first diagnosed "B-code" is used as the only injury code for that individual.

*The following syntax yields tabulations of the total numbers of cases in each age and sex GBD group, for gbd_seq, gbd_seq_specific, gbd_intent, gbd_broad, and gbd_specific

gen total = 1

label var total "Number of observations by category"

order gbd_intent gbd_broad gbd_short_name gbd_full_name gbd_seq gbd_seq_specific gbd_title_state sex age_gbd

collapse (rawsum) total, by (gbd_intent gbd_broad gbd_short_name gbd_full_name gbd_seq gbd_seq_specific gbd_title_state sex age_gbd)

*replace CountryName with name of country being analyzed

save "`directory'\Output\Mexico_Injury_Tabs.dta", replace

outsheet using "`directory'\Output\Mexico_Injury_Tabs.csv", comma replace

capture log close

This is an example of how to process a hospital data set for use in the GBD project. If the steps described above are not clear or if you have questions, please contact kavi_bhalla@harvard.edu, and we will provide technical support.