Call for data - Mexico hospital example
Process for Contributing ICD 10-Coded Hospital Morbidity Data:
The following text describes the process by which hospital data is transformed into non-fatal nature of injury/external cause-linked tabulations for injuries by age and sex, according to Global Burden of Disease Project external cause and nature of injury sequelae groupings. Mexican hospital data is used as an example.
*For generic script for your use see attachments [ICD10_GBD_Mapping_Part1_Generic_Script.do & ICD10_GBD_Mapping_Part2_Generic_Script.do].
*For full example script and output (.csv or STATA) see attachments [ICD10_GBD_Mapping_Part1_Mexico_Script.do, ICD10_GBD_Mapping_Part2_Mexico_Script.do, & Mexico_Injury_Tabs.csv or Mexico_Injury_Tabs.dta].
Process steps:
[Note: Stata code is shown in italics.]
(1) Identify variables of interest (age, sex, external cause, first three diagnosis codes, fatal/non-fatal, and patient type [inpatient or admitted vs. outpatient or ambulatory care]):
In the Mexican Hospital dataset, we identified the relevant variables (edad, cveedad, sexo, motegre, afecprin, afec01, afec02, afec03, and causaext) and translated Spanish names into names used by our script. These variables are age, age_key, sex, reason for exit (i.e. fatal or non-fatal), principle diagnosis code, first diagnosis code, second diagnosis code, third diagnosis code, and external cause. There is no patient type variable, as the dataset is an admitted registry only. Age, sex, first, second, and third diagnosis codes, and external cause are obviously important for tabulation. Age_key is important for this dataset because the values of age are unitless. Age_key is a categorical variable with values equal to hours, days, months and years (see ICD10_GBD_Mapping_Part1_Mexico_Script.do). Thus, age_key = 1 / age = 8 corresponds to 8 days, whereas age_key = 2 / age = 8 corresponds to 8 weeks. Check to see if an age-key variable exists in your hospital dataset and adjust the script accordingly. Additionally, the Mexico dataset includes a variable for principal diagnosis, which equals the first diagnosis. However, upon dataset exploration it was discovered that, even if a principal diagnosis was listed, in some cases, first diagnosis was missing. In these cases, first diagnosis was replaced with the value from principal diagnosis. Check to see if this is the case with your hospital dataset.
NOTE: You may choose to do these steps differently, but you must document your methodology and communicate it to us (kavi_bhalla@harvard.edu).
rename edad age
label var age "Age"
rename cveedad age_key
label var age_key "Key of age"
rename sexo sex
label var sex "Sex"
rename motegre discharge_reason
label var discharge_reason "Reason of exit"
rename afecprin cause_main
label var cause_main "Primary diagnosis: nature of injury code (S-T)"
rename afec01 cause_x1
label var cause_x1 "First diagnosis: nature of injury code (S-T)"
rename afec02 cause_x2
label var cause_x2 "Second diagnosis: nature of injury code (S-T)"
rename afec03 cause_x3
label var cause_x3 "Third diagnosis: nature of injury code (S-T)"
rename causaext cause_ext
label var cause_ext "External cause code (V-Y)"
.........................................................................................................................................................................................................
(2) Apply inclusion criteria
The inclusion criteria for GBD Injury hospital data requires:
(a) Keeping only non-fatal cases
(b) Keeping only cases presenting at first point of contact with the healthcare system (e.g. "emergencias" for Mexico)
(c) Keeping only cases with S&T codes (nature of injury) within first three diagnoses (icludes pre-processing of external cause and nature of injury codes)
(d) Assign cases to "ambulatory care" or "admitted"
We implemented application of inclusion criteria for the Mexico hospital dataset by eliminating all fatal cases, only keeping "emergencias," which represent cases at first point of contact with the health system, and only keeping cases with S&T codes. The Mexico dataset only contains admitted patients. Code is below. For complete code see ICD10_GBD_Mapping_Part1_Mexico_Script.do.
NOTE: You may choose to do these steps differently, but you must document your methodology and communicate it to us (kavi_bhalla@harvard.edu).
(a) & (b)
drop if discharge_reason == 5
* 5= death
keep if proced == 2
* 2 = emergencies
*The next line replaces the value of the second diagnosis variable (cause_x1) with the value of the main diagnosis variable (cause_main) when the value of cause_x1 is missing
replace cause_x1 = cause_main if cause_x1 == ""
*The next line keeps only the relevant variables
keep age age_key sex cause_x1 cause_x2 cause_x3 cause_ext
(c)
*NATURE OF INJURY
*The following "foreach" syntax says, for each variable beginning with cause_x (includes cause_x1 to x3), generate a temporary variable called "temp_cause_x*"; where * = 1, 2, or 3). This temporary variable will only have the first value of the cause_x* variables. (i.e., for cause_x1 = T21, temp_cause_x1 = T)
foreach var of varlist cause_x*{
gen temp_`var' = substr(`var',1,1)
}
*The following syntax says keep the observations only where temp_cause_x* = S or T (i.e., only keep observations where there is nature of injury information)
keep if temp_cause_x1 == "S" | temp_cause_x1 == "T" | temp_cause_x2 == "S" | temp_cause_x2 == "T" | temp_cause_x3 == "S" | temp_cause_x3 == "T"
*The following syntax deals with missing nature of injury information. It says to replace cause_x* with a blank (equivalent to missing) if temp_cause_x* does not equal S and does not equal T
replace cause_x1 = "" if temp_cause_x1 !="S" & temp_cause_x1 !="T"
replace cause_x2 = "" if temp_cause_x2 !="S" & temp_cause_x2 !="T"
replace cause_x3 = "" if temp_cause_x3 !="S" & temp_cause_x3 !="T"
drop temp_*
*The following foreach syntax relates specifically to the Mexican Hospital data, whereby some ICD codes were given an "X" or "x" at the end of the code (i.e., T21x). This syntax will strip the code of that "x".
foreach var of varlist cause_x*{
replace `var' =substr(`var',1,3) if substr(`var',4,4)=="x" |substr(`var',4,4)=="X"
}
*The following syntax deals with nature of injury codes that are incomplete (i.e., less than 3 characters (i.e., T2 instead of T21).
*It says for each variable of cause_x*, replace the value with "unspecified" if the code is less than three characters.
foreach var of varlist cause_x*{
gen `var'_length=length(`var')
replace `var'="Unspecified" if `var'_length<3
}
drop *_length
*EXTERNAL CAUSE
*The following syntax codes external causes that are not V, W, X or Y (but have associated S and T codes) as "Unspecified"
gen cause_ext_temp = substr(cause_ext,1,1)
replace cause_ext = "Unspecified" if cause_ext_temp != "V" & cause_ext_temp != "W" & cause_ext_temp != "X" & cause_ext_temp != "Y"
replace cause_ext_temp = "Unspecified" if cause_ext_temp != "V" & cause_ext_temp != "W" & cause_ext_temp != "X" & cause_ext_temp != "Y"
keep if cause_ext_temp == "V" | cause_ext_temp == "W" | cause_ext_temp == "X" | cause_ext_temp == "Y" | cause_ext_temp == "" | cause_ext_temp == "Unspecified"
drop cause_ext_temp
*The following syntax is specific to the Mexican hospital data in which there are "x" and "X" added to the end of the external cause code. This syntax deletes the x.
replace cause_ext=substr(cause_ext,1,3) if substr(cause_ext,4,4)=="x" |substr(cause_ext,4,4)=="X"
gen cause_ext_length=length(cause_ext)
*The following syntax replaces external cause codes that are less than 3 characters in length with "Unspecified"
replace cause_ext="Unspecified" if cause_ext_length<3
drop cause_ext_length
.........................................................................................................................................................................................................
(3) Map data (age and ICD codes) to GBD groupings
(a) Recode age to GBD age groupings for mortality and cause of death compilation:
Early neonatal (completed days of life 0 to 6)
Late neonatal (completed days of life 7 to 27)
28 days - 5 completed months
6-11 months
1-4 years
5-9 years
10-14 years
15-19 years
20-24 years
25-29 years
30-34 years
35-39 years
40-44 years
45-49 years
50-54 years
55-59 years
60-64 years
65-69 years
70-74 years
75-79 years
80-84 years
85+ years
(b) use ICD =>GBD mappings for external cause groups [GBD_ICD_10_ExternalCause_Mapping Worksheet.csv]
(c) use ICD =>GBD mappings for nature of injury groups [GBD_ICD_10_NatureInjury_Mapping Worksheet.csv]
(d) apply code key [You are NOT responsible for this step. This will be conducted by the Core Injury Team. To see what codes correspond with what GBD groupings consult External Cause Definition Table and Nature of Inj. Definition Table.)
NOTE: You may choose to do these steps differently, but you must document your methodology and communicate it to us (kavi_bhalla@harvard.edu).
(a) From ICD10_GBD_Mapping_Part1_Mexico_Script.do
*AGE
/*Codebook for age_key:
0 = hours
1 = days
2 = months
3 = years
9 = unknown
*/
*The next part of this script deals with age that has been miscoded.
*replace observations with values out of range specified for each age_key (hours: 0-23; days: 1-30; months:1-11; years: 1-120) with 999 (=Unknown)
replace age = 999 if age>23 & age_key == 0
replace age = 999 if age >30 & age_key == 1
replace age = 999 if age > 11 & age_key == 2
replace age = 999 if age > 120 & age_key == 3
replace age = 999 if age_key == 9
/*Codebook for GBD age grouping:
0 = Antepartum stillbirth (28 weeks or later, but before start of labor)
1 = Intrapartum stillbirth (after start of labor)
2 = Early neonatal (completed days of life 0 to 6)
3 = Late neonatal (completed days of life 7 to 27)
4 = 28 days - 5 completed months
5 = 6-11 months
6 = 1-4 years
7 = 5-9 years
8 = 10-14 years
9 = 15-19 years
10 = 20-24 years
11 = 25-29 years
12 = 30-34 years
13 = 35-39 years
14 = 40-44 years
15 = 45-49 years
16 = 50-54 years
17 = 55-59 years
18 = 60-64 years
19 = 65-69 years
20 = 70-74 years
21 = 75-79 years
22 = 80-84 years
23 = 85+ years
999 = unknown
*/
*replace age with GBD age group
*For example, the following line replaces the value of age with the value for GBD age group of <1month ("0"), if age is coded in hours (age_key==0) and does not equal "unknown" (age != 999)
gen age_gbd = age
replace age_gbd = 2 if age_key == 0 & age != 999
replace age_gbd = 2 if age_key == 1 & age <= 6 & age != 999
replace age_gbd = 3 if age_key == 1 & age >= 7 & age <= 27 & age != 999
replace age_gbd = 4 if age_key == 1 & age >= 28 & age != 999
replace age_gbd = 4 if age_key == 2 & age <= 5 & age != 999
replace age_gbd = 5 if age_key == 2 & age >= 6 & age != 999
replace age_gbd = 6 if age_key == 3 & age <= 4 & age != 999
replace age_gbd = 7 if age >=4 & age<=9 & age_key == 3 & age != 999
replace age_gbd = 8 if age >=10 & age<=14 & age_key == 3 & age != 999
replace age_gbd = 9 if age >=15 & age<=19 & age_key == 3 & age != 999
replace age_gbd = 10 if age >=20 & age<=24 & age_key == 3 & age != 999
replace age_gbd = 11 if age >=25 & age<=29 & age_key == 3 & age != 999
replace age_gbd = 12 if age >=30 & age<=34 & age_key == 3 & age != 999
replace age_gbd = 13 if age >=35 & age<=39 & age_key == 3 & age != 999
replace age_gbd = 14 if age >=40 & age<=44 & age_key == 3 & age != 999
replace age_gbd = 15 if age >=45 & age<=49 & age_key == 3 & age != 999
replace age_gbd = 16 if age >=50 & age<=54 & age_key == 3 & age != 999
replace age_gbd = 17 if age >=55 & age<=59 & age_key == 3 & age != 999
replace age_gbd = 18 if age >=60 & age<=64 & age_key == 3 & age != 999
replace age_gbd = 19 if age >=65 & age<=69 & age_key == 3 & age != 999
replace age_gbd = 20 if age >=70 & age<=74 & age_key == 3 & age != 999
replace age_gbd = 21 if age >=75 & age<=79 & age_key == 3 & age != 999
replace age_gbd = 22 if age >=80 & age<=84 & age_key == 3 & age != 999
replace age_gbd = 23 if age >=85 & age_key == 3 & age != 999
drop age
label var age_gbd "Age by GBD age group, see script and documentation for codebook"
drop age_key
order sex age_gbd cause_ext cause_x1 cause_x2 cause_x3
(b) From ICD10_GBD_Mapping_Part2_Mexico_Script.do
*Merge ICD-10/GBD external cause mapping file
*The following syntax maps ICD10 external cause codes to GBD external categories
rename cause_ext icd_10_external
sort icd_10_external
tempfile temp7
save "`temp7'", replace
clear
insheet using "`directory'\GBD_ICD_10_ExternalCause_Mapping Worksheet.csv", name
sort icd_10_external
tempfile temp8
save "`temp8'", replace
clear
use "`temp7'"
merge icd_10_external using "`temp8'", nokeep
drop _merge
*The following syntax does the same thing as above, except for ICD10 3 digit codes
rename code code_3digit
rename icd_10_external icd_10_external_4digit
gen icd_10_external = substr(icd_10_external_4digit,1,3)
sort icd_10_external
merge icd_10_external using "`temp8'", nokeep
drop _merge
*The following syntax says if value of code is missing, replace with the value for code_3digit
replace code = code_3digit if code == .
drop icd_10_external_4digit code_3digit icd_10_external
(c) From ICD10_GBD_Mapping_Part2_Mexico_Script.do
*Merge ICD-10/GBD nature of injury mapping file (2x in order to manage 3 and 4 digit ICD codes (e.g. S10 and S101))
*The following syntax merges the observations in the hospital data file (coded in ICD10) with the GBD nature of injury mapping worksheet (to get ICD10 codes into GBD categories).
rename cause_x1 icd_10_cause
sort icd_10_cause
tempfile temp1
save "`temp1'", replace
clear
insheet using "`directory'\GBD_ICD_10_NatureInjury_Mapping Worksheet.csv", name
sort icd_10_cause
tempfile temp2
save "`temp2'", replace
clear
use "`temp1'"
merge icd_10_cause using "`temp2'", nokeep
rename icd_10_cause icd_10_cause_4digit
rename gbd_seq_code gbd_seq_code_4digit
drop _merge
*The following syntax generates a variable called "icd_10_cause" which will only have the values of the first three characters of "icd_10_cause_4digit".
*This then is merged with the temp2 variable (ICD-GBD map) to get the nature of injury ICD 10 3digit codes to GBD nature of injury categories (need to match ICD10 cause 3 digits, and ICD10 cause 4 digits to hospital data file and then get GBD category values for each).
gen icd_10_cause = substr(icd_10_cause_4digit,1,3)
sort icd_10_cause
merge icd_10_cause using "`temp2'", nokeep
drop _merge
*If the value for gbd_seq_code is missing, replace with gbd_seq_code_4digit, and rename to gbd_seq_code_1, then delete gbd_seq_code_4digit
replace gbd_seq_code = gbd_seq_code_4digit if gbd_seq_code == ""
rename gbd_seq_code gbd_seq_code_1
drop gbd_seq_code_4digit
*The following syntax generates new variables called cause_x2_short and cause_x3_short (values are only first three characters)
gen cause_x2_short = substr(cause_x2,1,3)
gen cause_x3_short = substr(cause_x3,1,3)
*The following syntax deals with recoding ICD10 nature of injury T311 and T321, which are burns that are between 10-19% of body surface area.
*The first line of the following syntax says replace gbd_seq_code_1 with "A5a" if icd_10_cause_4digit equals "T311" or "T321" and gbd_seq_code_1 equals "Recode" and cause_x2_short equals "T20" or "T23" or "T26" or cause_x3_short equals "T20" or "T23" or "T26"
replace gbd_seq_code_1 = "A5a" if icd_10_cause_4digit == "T311" & gbd_seq_code_1 == "Recode" & (cause_x2_short == "T20" | cause_x2_short == "T23" | cause_x3_short == "T20" | cause_x3_short == "T23")
replace gbd_seq_code_1 = "A5a" if icd_10_cause_4digit == "T321" & gbd_seq_code_1 == "Recode" & (cause_x2_short == "T20" | cause_x2_short == "T23" | cause_x3_short == "T20" | cause_x3_short == "T23")
***The above syntax is then repeated for cause_x2 and cause_x3.
rename icd_10_cause_4digit icd_10_cause_4digit_1
rename icd_10_cause icd_10_cause_1
drop cause_x2_short
rename cause_x2 icd_10_cause
sort icd_10_cause
tempfile temp3
save "`temp3'", replace
clear
insheet using "`directory'\GBD_ICD_10_NatureInjury_Mapping Worksheet.csv", name
sort icd_10_cause
tempfile temp4
save "`temp4'", replace
clear
use "`temp3'"
merge icd_10_cause using "`temp4'", nokeep
rename icd_10_cause icd_10_cause_4digit
rename gbd_seq_code gbd_seq_code_4digit
drop _merge
gen icd_10_cause = substr(icd_10_cause_4digit,1,3)
sort icd_10_cause
merge icd_10_cause using "`temp4'", nokeep
drop _merge
replace gbd_seq_code = gbd_seq_code_4digit if gbd_seq_code == ""
rename gbd_seq_code gbd_seq_code_2
drop gbd_seq_code_4digit
replace gbd_seq_code_2 = "A5a" if icd_10_cause_4digit == "T311" & gbd_seq_code_2 == "Recode" & (icd_10_cause_1 == "T20" | icd_10_cause_1 == "T23" | cause_x3_short == "T20" | cause_x3_short == "T23")
replace gbd_seq_code_2 = "A5a" if icd_10_cause_4digit == "T321" & gbd_seq_code_2 == "Recode" & (icd_10_cause_1 == "T20" | icd_10_cause_1 == "T23" | cause_x3_short == "T20" | cause_x3_short == "T23")
*As above, the syntax is repeated for cause_x3
rename icd_10_cause_4digit icd_10_cause_4digit_2
rename icd_10_cause icd_10_cause_2
drop cause_x3_short
rename cause_x3 icd_10_cause
sort icd_10_cause
tempfile temp5
save "`temp5'", replace
clear
insheet using "`directory'\GBD_ICD_10_NatureInjury_Mapping Worksheet.csv", name
sort icd_10_cause
tempfile temp6
save "`temp6'", replace
clear
use "`temp5'"
merge icd_10_cause using "`temp6'", nokeep
rename icd_10_cause icd_10_cause_4digit
rename gbd_seq_code gbd_seq_code_4digit
drop _merge
gen icd_10_cause = substr(icd_10_cause_4digit,1,3)
sort icd_10_cause
merge icd_10_cause using "`temp6'", nokeep
drop _merge
replace gbd_seq_code = gbd_seq_code_4digit if gbd_seq_code == ""
rename gbd_seq_code gbd_seq_code_3
drop gbd_seq_code_4digit
replace gbd_seq_code_3 = "A5a" if icd_10_cause_4digit == "T311" & gbd_seq_code_3 == "Recode" & (icd_10_cause_1 == "T20" | icd_10_cause_1 == "T23" | icd_10_cause_2 == "T20" | icd_10_cause_2 == "T23")
replace gbd_seq_code_3 = "A5a" if icd_10_cause_4digit == "T321" & gbd_seq_code_3 == "Recode" & (icd_10_cause_1 == "T20" | icd_10_cause_1 == "T23" | icd_10_cause_2 == "T20" | icd_10_cause_2 == "T23")
rename icd_10_cause_4digit icd_10_cause_4digit_3
rename icd_10_cause icd_10_cause_3
drop icd_10_cause_4digit_1 icd_10_cause_4digit_2 icd_10_cause_4digit_3 icd_10_cause_1 icd_10_cause_2 icd_10_cause_3
(d) From ICD10_GBD_Mapping_Part2_Mexico_Script.do
rename code code_ext
label var code_ext "External cause"
label var cause "Nature of injury"
*Nature of injury code merge
*The following syntax is matching the nature of injury code (i.e., A1) with the gbd_seq (injured spinal cord) and gbd_seq_specific (Neck)
*The following line says rename variable "cause" (which is nature of injury), to "code".
rename cause code
sort code
tempfile temp9
save "`temp9'", replace
clear
insheet using "`directory'\Nature_Injury_Code_Key.csv", name
sort code
tempfile temp10
save "`temp10'", replace
clear
use "`temp9'"
merge code using "`temp10'", nokeep
drop _merge code
label var gbd_seq "GBD nature of injury sequlae (e.g fracture)"
label var gbd_seq_specific "GBD nature of injury specific sequlae (e.g. ankle)"
*External cause code merge
*The following syntax is matching the external cause code (i.e., U1), with the gbd intent (Unintentional), and gbd external cause, broad (Road Traffic Incident)
rename code_ext code
sort code
tempfile temp11
save "`temp11'", replace
clear
insheet using "`directory'\External_Cause_Code_Key.csv", name
sort code
tempfile temp12
save "`temp12'", replace
clear
use "`temp11'"
merge code using "`temp12'", nokeep
drop _merge code
rename intent gbd_intent
label var gbd_intent "GBD external cause intent (e.g. self-inflicted or unintentional)"
rename broad gbd_broad
label var gbd_broad "GBD external cause broad (e.g. RTI)"
rename short_name gbd_short_name
label var gbd_short_name "GBD external cause specific (e.g. pedestrian)"
rename gbd_category gbd_full_name
label var gbd_full_name "GBD full name (e.g. Unintentional RTI Pedestrian)"
.........................................................................................................................................................................................................
(4) Generate separate tabulations for outpatient (ambulatory care) and inpatient (admitted care) for external cause by nature of injury, age, and sex [see Mexico_Injury_Tabs.csv or Mexico_Injury_Tabs.dta for Mexican output - Mexico dataset only has admitted care]. The GBD Injury Team is primarily concerned with tabulations for 2005; however, depending on the size of the dataset mapping may best be conducted as an aggregate over several years. For example, the Mexican dataset here discussed is only tabulated for 2005, whereas Mauritius hospital data is tabulated for 2003-2007. The cut-off for dataset size for 2005-only analysis versus aggregated analysis is subject to ongoing discussion. For now, contact Kavi at kavi_bhalla@harvard.edu.
In the final tabulation, each individual is counted only once regardless of the number of injuries. Mapping codes are used to assign priority to one of multiple injuries. For the theoretical basis for the priority ranking of injuries, please refer to Discussion Document 3- Health state definitions.. All injuries coded as "A" are considered priority injuries (e.g. fractured femur). "A-coded" injuries trump "B-coded" injuries. If an individual is diagnosed with multiple priority injuries, numeric ranking of "A-coded" injuries determines the priority ranking. For example, an individual with "A1," "A2," and "A5" would be considered as "A1" only. For an individual with only non-priority injuries ("B-coded"), the first diagnosed "B-code" is used as the only injury code for that individual.
*The following syntax yields tabulations of the total numbers of cases in each age and sex GBD group, for gbd_seq, gbd_seq_specific, gbd_intent, gbd_broad, and gbd_specific
gen total = 1
label var total "Number of observations by category"
order gbd_intent gbd_broad gbd_short_name gbd_full_name gbd_seq gbd_seq_specific gbd_title_state sex age_gbd
collapse (rawsum) total, by (gbd_intent gbd_broad gbd_short_name gbd_full_name gbd_seq gbd_seq_specific gbd_title_state sex age_gbd)
*replace CountryName with name of country being analyzed
save "`directory'\Output\Mexico_Injury_Tabs.dta", replace
outsheet using "`directory'\Output\Mexico_Injury_Tabs.csv", comma replace
capture log close
This is an example of how to process a hospital data set for use in the GBD project. If the steps described above are not clear or if you have questions, please contact kavi_bhalla@harvard.edu, and we will provide technical support.