We retrieve “as-filed” financial statement data from the Financial Statement and Notes Data Sets compiled by the SEC. We also retrieve the annual U.S. GAAP taxonomies, starting with the 2009 release, from the FASB’s website. (For an archive of the taxonomies, please see this page.)
Each filing references the specific taxonomy (consisting of schema files and relationship files) used in its preparation. The taxonomy defines a hierarchical graph whose nodes are financial statement items and whose arcs identify constituents that may be aggregated (via addition and subtraction) to arrive at the value of that item. We first populate the nodes with values from the filing. Beginning at the bottom of the hierarchy, we then iteratively aggregate values at one level to arrive at values at the next higher level where these are not already provided.
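The bottom-up aggregation described above can be illustrated with a minimal Python sketch. The representation of the calculation arcs as a dictionary, the function name aggregate_bottom_up, and the treatment of partially available constituents (a parent is imputed whenever at least one constituent has a value) are illustrative assumptions rather than details of the implementation.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

# Hypothetical representation of the calculation arcs: parent -> [(child, weight)],
# where weight is +1.0 for addition and -1.0 for subtraction.
CalcArcs = Dict[str, List[Tuple[str, float]]]


def aggregate_bottom_up(
    arcs: CalcArcs, reported: Dict[str, float]
) -> Tuple[Dict[str, float], Set[str]]:
    """Fill in missing parent values from their constituents.

    Returns the completed values and the set of tags that were imputed.
    Values reported in the filing are never overwritten.
    """
    values = dict(reported)
    imputed: Set[str] = set()

    def resolve(tag: str, visiting: FrozenSet[str]) -> Optional[float]:
        if tag in values:          # reported in the filing or already imputed
            return values[tag]
        if tag in visiting:        # guard against a cyclic arc definition
            return None
        visiting = visiting | {tag}
        total, found = 0.0, False
        for child, weight in arcs.get(tag, []):
            child_value = resolve(child, visiting)
            if child_value is not None:
                total += weight * child_value
                found = True
        if not found:              # no constituent available: leave the node empty
            return None
        # Assumption: impute the parent even if only some constituents are available.
        values[tag] = total
        imputed.add(tag)
        return total

    for parent in arcs:
        resolve(parent, frozenset())
    return values, imputed
```

Resolving each node recursively from its constituents is equivalent to working upward level by level, and it keeps track of which values were imputed so that later steps can distinguish them from values tagged in the filing.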
After incorporating custom tags (as described in Appendix C.2 below), our algorithm next considers the following accounting identities:
Assets = AssetsCurrent + AssetsNoncurrent;
Liabilities = LiabilitiesCurrent + LiabilitiesNoncurrent;
CashCashEquivalentsAndShortTermInvestments = CashAndCashEquivalentsAtCarryingValue + ShortTermInvestments;
PropertyPlantAndEquipmentNet = PropertyPlantAndEquipmentGross – AccumulatedDepreciationDepletionAndAmortizationPropertyPlantAndEquipment;
IntangibleAssetsNetIncludingGoodwill = IntangibleAssetsNetExcludingGoodwill + Goodwill;
DebtCurrent = ShortTermBorrowings + LongTermDebtAndCapitalLeaseObligationsCurrent.
If any two items in any of these identities are tagged in the filing, and the third is imputed via the bottom-up aggregation, the algorithm replaces the imputed value with the value implied by the accounting identity.
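The replacement rule can be sketched as follows. Each identity is written as a total and its signed components; when exactly two of the three items are tagged in the filing and the third was imputed, the imputed value is overwritten with the value that makes the identity hold. The names IDENTITIES and apply_identities, and the convention of passing the set of imputed tags alongside the values, are hypothetical choices made for this illustration.

```python
from typing import Dict, List, Set, Tuple

# (total, [(component, sign)]) means: total = sum(sign * component).
Identity = Tuple[str, List[Tuple[str, float]]]

IDENTITIES: List[Identity] = [
    ("Assets", [("AssetsCurrent", 1.0), ("AssetsNoncurrent", 1.0)]),
    ("Liabilities", [("LiabilitiesCurrent", 1.0), ("LiabilitiesNoncurrent", 1.0)]),
    ("CashCashEquivalentsAndShortTermInvestments",
     [("CashAndCashEquivalentsAtCarryingValue", 1.0), ("ShortTermInvestments", 1.0)]),
    ("PropertyPlantAndEquipmentNet",
     [("PropertyPlantAndEquipmentGross", 1.0),
      ("AccumulatedDepreciationDepletionAndAmortizationPropertyPlantAndEquipment", -1.0)]),
    ("IntangibleAssetsNetIncludingGoodwill",
     [("IntangibleAssetsNetExcludingGoodwill", 1.0), ("Goodwill", 1.0)]),
    ("DebtCurrent",
     [("ShortTermBorrowings", 1.0), ("LongTermDebtAndCapitalLeaseObligationsCurrent", 1.0)]),
]


def apply_identities(
    values: Dict[str, float],
    imputed: Set[str],
    identities: List[Identity] = IDENTITIES,
) -> Dict[str, float]:
    """Replace an imputed item with the value implied by an accounting identity."""
    out = dict(values)
    for total, components in identities:
        # Rewrite the identity as sum(sign * value) == 0; the total carries sign -1.
        terms = [(total, -1.0)] + list(components)
        tagged = [(t, s) for t, s in terms if t in values and t not in imputed]
        target = [(t, s) for t, s in terms if t in values and t in imputed]
        if len(tagged) == 2 and len(target) == 1:
            tag, sign = target[0]
            # Solve the identity for the single imputed item.
            out[tag] = -sum(s * values[t] for t, s in tagged) / sign
    return out
```

Because no item appears in more than one of the six identities, the order in which they are processed does not affect the result in this sketch.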
Then, we map Compustat data items to standard taxonomy items by comparing the reporting taxonomy and Compustat’s balancing model of financial items. To validate this mapping, we retrieve, for each Compustat data item, all firm-year observations with a non-zero value. For each of those observations, we identify the XBRL standard tag whose value is closest to the Compustat value. We then verify that the most frequently selected tag is indeed the one in the mapping.
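The validation step for a single Compustat data item can be sketched as below. The function names, the input layout (a list of pairs holding the Compustat value and a dictionary of XBRL tag values for the same firm-year), and the assumption that both sources are expressed in the same units are hypothetical; ties are broken by the first tag encountered, which is also an assumption.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# One firm-year observation: (Compustat value, {XBRL tag: value}).
# Assumption: both values have already been converted to the same units.
Observation = Tuple[float, Dict[str, float]]


def closest_tag(compustat_value: float, xbrl_values: Dict[str, float]) -> Optional[str]:
    """Return the XBRL tag whose value is closest to the Compustat value."""
    if not xbrl_values:
        return None
    return min(xbrl_values, key=lambda tag: abs(xbrl_values[tag] - compustat_value))


def validate_mapping(
    observations: List[Observation], mapped_tag: str
) -> Tuple[bool, Optional[str]]:
    """Check that the most frequently selected closest tag equals the mapped tag."""
    counts: Counter = Counter()
    for compustat_value, xbrl_values in observations:
        if compustat_value == 0:   # only non-zero Compustat observations are used
            continue
        tag = closest_tag(compustat_value, xbrl_values)
        if tag is not None:
            counts[tag] += 1
    if not counts:
        return False, None
    top_tag, _ = counts.most_common(1)[0]
    return top_tag == mapped_tag, top_tag
```

Returning the modal tag alongside the boolean makes it easy to inspect which tag would have been chosen when the check fails.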
In DHJ (2023), Table IA.3 describes the sample selection process. Table IA.4 presents the Compustat-XBRL mapping for data items examined.