These pages provide the documentation associated with each of the sources available through the DataCore. Because each data source within the DataCore has idiosyncratic components involving issues such as licensing and access, we provide this information here for individual researchers. This website uses terms that have specific meanings in data science, and a short tutorial video is available to walk users through how to read a data source page.
Purveyor The Purveyor is the organization from which we acquire the data. Further information about them is often listed below.
Purveyor Agency for Healthcare Research and Quality
Years in the DataCore Not all datasets are fully uploaded and accessible; these are the years currently available in the DataCore.
Years in the DataCore 1988-2016
Years of Data owned This is the up-to-date list of years that we hold for a given dataset.
Years of Data owned 1988-2016
Unit of Data Different datasets are organized around different units of observation. This field describes what a single row of the dataset represents.
Unit of Data Hospital Discharge
Dataset Website Often a dataset will have a website associated with it; this is that website.
Purveyor Website The website of the Purveyor.
Purveyor Website https://www.hcup-us.ahrq.gov/
Public Facing Data Dictionary The website that contains the online data dictionary, if one exists.
Public Facing Data Dictionary https://www.hcup-us.ahrq.gov/db/nation/nis/nisdde.jsp
General Description
This section is a short description of the dataset and what the purveyor thinks it might be useful for. It is often taken directly from the purveyor's website.
General Description
The NIS is the largest publicly available all-payer inpatient healthcare database in the United States, yielding national estimates of hospital inpatient stays. Unweighted, it contains data from more than 7 million hospital stays each year. Weighted, it estimates more than 35 million hospitalizations nationally.
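As a rough illustration of the difference between the unweighted and weighted counts, the sketch below sums a per-discharge weight to produce a national estimate. The records and weight values are invented for illustration, and the weight column name DISCWT is an assumption that does not appear on this page; check the data dictionary for the actual variable.

```python
# Hypothetical sample of discharge records. Each record carries a weight
# saying how many national discharges that one sampled record represents.
# The values and the DISCWT column name are assumptions for illustration.
sample = [
    {"KEY_NIS": 1, "DISCWT": 5.0},
    {"KEY_NIS": 2, "DISCWT": 4.5},
    {"KEY_NIS": 3, "DISCWT": 5.5},
]


def unweighted_count(records):
    """Number of sampled discharges actually present in the data."""
    return len(records)


def weighted_estimate(records, weight_field="DISCWT"):
    """Estimated national discharges: the sum of the record weights."""
    return sum(record[weight_field] for record in records)


print(unweighted_count(sample))   # 3
print(weighted_estimate(sample))  # 15.0
```

With an average weight near 5, roughly 7 million sampled stays would correspond to roughly 35 million estimated hospitalizations, which matches the scale described above.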
Common Key Linking Variables
Different databases can be connected to one another in various ways. This section is still being developed; however, we currently expect to be able to link directly to other sources by hospital, patient, or geography.
There will likely be other ways of identifying cohorts (diagnoses, medications, procedures, etc.), and we will add those methods of grouping as they are used.
Common Key Linking Variables
Hospital Linking:
Patient Linking:
Geographic Linking:
Data Dictionary
A data dictionary is a list of all variables for a given dataset, a description of those variables, and category value information about those variables if it is available.
Data Dictionary
Licensing and Access
Licensing terms and costs vary by dataset and may change over time. Accessing a whole dataset carries a cost and a set of requirements; accessing subsets or aggregates that are prepared for you carries separate costs.
Licensing and Access:
All users of HCUP data must complete the HCUP Data Use Agreement (DUA) Training Course and sign an HCUP DUA before receipt of the data. See this website for further information: https://www.hcup-us.ahrq.gov/tech_assist/dua.jsp.
Database Structure
Sometimes a given dataset will have multiple tables. In this section we detail all of the tables, their primary keys, and how they key off of each other. We also provide an entity relationship diagram (ERD) to help illustrate the structure.
NIS Structure
Core [1998-2016], A B C [1993-1997], Q1 Q2 Q3 Q4 [1988-1992]
Every row of the Core dataset is a hospital discharge.
The Primary Key of the Core table is:
Sequence Number (SEQ) 1988-1997
Unique Record Identifier (KEY) 1998-2011
NIS Record Number (KEY_NIS) 2012-2016
Hospital [1988-2016]
Every row of NIS Hospital is a hospital.
The Primary Key of the Hospital table is:
Data source hospital number (DSHOSPID) 1988-2011
NIS Hospital Number (HOSP_NIS) 2012-2016
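Because the primary key of the Core and Hospital tables changes with the data year, queries that span years need to pick the right column. A minimal sketch of that lookup follows; the year ranges and column names come from the lists above, while the function and dictionary names are hypothetical.

```python
# Year ranges and key columns as listed above for the Core and Hospital tables.
# Each entry is (first year, last year, primary key column).
PRIMARY_KEYS = {
    "Core": [
        (1988, 1997, "SEQ"),
        (1998, 2011, "KEY"),
        (2012, 2016, "KEY_NIS"),
    ],
    "Hospital": [
        (1988, 2011, "DSHOSPID"),
        (2012, 2016, "HOSP_NIS"),
    ],
}


def primary_key(table, year):
    """Return the primary key column for a table in a given data year."""
    if table not in PRIMARY_KEYS:
        raise KeyError(f"unknown table: {table}")
    for first, last, column in PRIMARY_KEYS[table]:
        if first <= year <= last:
            return column
    raise ValueError(f"no {table} data for {year}")


print(primary_key("Core", 2005))      # KEY
print(primary_key("Hospital", 2014))  # HOSP_NIS
```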
Severity [2002-2016]
From 2015Q4 through 2016, every row of Severity holds All Patient Refined DRG (APR-DRG) codes for a discharge; before that, Severity also contains comorbidity data. The Severity table can be mapped directly to the Core or Hospital tables.
The Primary Key of the Severity table is KEY_NIS.
DX_PR_GRPS [2005-2015], Dx Pr [1993-1997]
Diagnosis and Procedure information contains data about diagnoses made and procedures performed during an admission. It can also be mapped to the Core or Hospital tables.
The Primary Key of the DX_PR_GRPS table is KEY_NIS.
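Since the Severity and DX_PR_GRPS tables share KEY_NIS with Core, they can be joined to Core row for row. The sketch below shows such a join using an in-memory SQLite database as a stand-in for the DataCore SQL environment. It assumes the 2012-2016 data years, where Core is keyed on KEY_NIS; the lowercase table names, the APRDRG column, the presence of HOSP_NIS on Core, and all row values are assumptions for illustration only.

```python
import sqlite3

# In-memory stand-in for the real database; schema and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE core (KEY_NIS INTEGER PRIMARY KEY, HOSP_NIS INTEGER);
CREATE TABLE severity (KEY_NIS INTEGER PRIMARY KEY, APRDRG INTEGER);
INSERT INTO core VALUES (1, 100), (2, 100), (3, 200);
INSERT INTO severity VALUES (1, 139), (3, 194);
""")

# LEFT JOIN keeps every Core discharge, even when no Severity row exists.
rows = conn.execute("""
    SELECT c.KEY_NIS, c.HOSP_NIS, s.APRDRG
    FROM core AS c
    LEFT JOIN severity AS s ON s.KEY_NIS = c.KEY_NIS
    ORDER BY c.KEY_NIS
""").fetchall()

print(rows)  # [(1, 100, 139), (2, 100, None), (3, 200, 194)]
```

A LEFT JOIN is used here so that discharges without a matching Severity record are not silently dropped; an INNER JOIN would be the choice if only matched records are wanted.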
DataCore Staff Errata
As the data is loaded or used, we may find errors or inconsistencies in it. As these are found, we will detail them in this section.
DataCore Staff Errata
5/28/2019: No data errata, data exceptions or data corrections have been issued.
DataCore Purveyor Errata
Over time, the purveyor might issue corrections or changes to the data structure. As these corrections are implemented, we will post them here.
DataCore Purveyor Errata
5/28/2019: No data errata, data exceptions or data corrections have been implemented.
Provenance
In this section, we detail the process that we executed to convert the data as it was received from the purveyor into the final SQL structure that is used by the DataCore.
Provenance
The data from HCUP was sent as ASCII files (.asc) with associated file specification files. These file specifications were found to describe the data accurately.
For the code used for these processes, email datacore@osumc.edu.
Stata .do files provided by HCUP were used to load the .asc files into Stata. These files were then exported as tab-separated value files (.tsv). For 1988-2003 data, Stata load files were created following the same structure as the provided .do files in order to load the data into Stata and export it as .tsv.
The provided file specification files were used to create SQL tables to fit the data.
The bulk copy program (BCP) utility was used to load the .tsv files into SQL.
The website https://www.hcup-us.ahrq.gov/db/nation/nis/nisdde.jsp was used to generate metadata about the dataset fields and, from that, the data dictionary.
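The general shape of this pipeline (fixed-width ASCII, a file specification, tab-separated output, and a matching SQL table) can be sketched in a few lines. This is not the DataCore's code, which used Stata and BCP and is available by email as noted above; it is a Python illustration under stated assumptions. The specification entries, column positions, sample records, and the mapping from specification types to SQL types are all hypothetical.

```python
import csv
import io

# Hypothetical file specification: (column name, type, start position, width).
# Start positions are 1-based, as is common in fixed-width layouts.
SPEC = [
    ("KEY_NIS", "Num", 1, 10),
    ("AGE", "Num", 11, 3),
    ("DX1", "Char", 14, 7),
]


def parse_line(line, spec):
    """Slice one fixed-width record into stripped field values."""
    return [line[start - 1:start - 1 + width].strip()
            for _, _, start, width in spec]


def asc_to_tsv(asc_lines, spec):
    """Convert fixed-width records to tab-separated text with no header row."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for line in asc_lines:
        writer.writerow(parse_line(line.rstrip("\n"), spec))
    return out.getvalue()


def create_table_sql(table, spec):
    """Build a CREATE TABLE statement whose columns follow the specification."""
    def sql_type(kind, width):
        # Assumed mapping: character fields keep their width, the rest are numeric.
        return f"VARCHAR({width})" if kind == "Char" else "NUMERIC"

    columns = ",\n".join(f"    {name} {sql_type(kind, width)}"
                         for name, kind, _, width in spec)
    return f"CREATE TABLE {table} (\n{columns}\n);"


# Two invented records, built field by field so the widths are easy to check.
asc_lines = [
    "0000000001" + " 45" + "I10    ",
    "0000000002" + "  7" + "J189   ",
]

tsv_text = asc_to_tsv(asc_lines, SPEC)
print(tsv_text)
print(create_table_sql("core", SPEC))
```

The header row is left out of the tab-separated output so that every line is a data record, which keeps the file simple to hand to a bulk loader.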