Our “as-filed” data are based on the Financial Statement and Notes Data Sets compiled by the SEC, which contain financial statement information extracted directly from periodic corporate XBRL filings. For some firms, data are available from 2009; however, we focus on the period 2012–2019, during which all public firms were required to submit periodic filings in XBRL format.
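As an illustration of restricting the SEC data sets to the sample window, the sketch below reads tiny in-memory stand-ins for the tab-delimited `sub.txt` (one row per filing) and `num.txt` (one row per numeric fact) files. It keeps only facts from 10-K filings with fiscal years 2012–2019. The field names (`adsh`, `form`, `fy`, `tag`, `ddate`, `qtrs`, `value`) reflect our understanding of the SEC's published layouts and should be checked against the data set documentation. The sample rows are hypothetical, and limiting to form "10-K" (ignoring amendments) is a simplifying assumption, not necessarily the filter used in our study.

```python
import csv
import io
from typing import List, Set, Tuple

# Hypothetical miniature extracts of sub.txt and num.txt (tab-delimited).
SUB_TXT = (
    "adsh\tcik\tform\tfy\n"
    "0000000001-13-000001\t1001\t10-K\t2012\n"
    "0000000001-10-000001\t1001\t10-K\t2009\n"
    "0000000002-16-000001\t1002\t10-Q\t2016\n"
)
NUM_TXT = (
    "adsh\ttag\tversion\tddate\tqtrs\tuom\tvalue\n"
    "0000000001-13-000001\tAssets\tus-gaap/2012\t20121231\t0\tUSD\t5000\n"
    "0000000001-10-000001\tAssets\tus-gaap/2009\t20091231\t0\tUSD\t4000\n"
    "0000000002-16-000001\tAssets\tus-gaap/2016\t20160630\t0\tUSD\t900\n"
)


def annual_filings(sub_text: str, first_fy: int = 2012, last_fy: int = 2019) -> Set[str]:
    """Accession numbers (adsh) of 10-K filings whose fiscal year is in the window."""
    reader = csv.DictReader(io.StringIO(sub_text), delimiter="\t")
    return {
        row["adsh"]
        for row in reader
        if row["form"] == "10-K"  # assumption: amendments (10-K/A) are ignored
        and row["fy"].isdigit()  # skip filings with a blank fiscal year
        and first_fy <= int(row["fy"]) <= last_fy
    }


def facts_for(num_text: str, keep: Set[str]) -> List[Tuple[str, str, str, int, float]]:
    """Numeric facts (adsh, tag, ddate, qtrs, value) belonging to the kept filings."""
    reader = csv.DictReader(io.StringIO(num_text), delimiter="\t")
    return [
        (row["adsh"], row["tag"], row["ddate"], int(row["qtrs"]), float(row["value"]))
        for row in reader
        if row["adsh"] in keep and row["value"] != ""
    ]


keep = annual_filings(SUB_TXT)
facts = facts_for(NUM_TXT, keep)
print(facts)  # only the fiscal-2012 10-K fact survives
```

The 2009 10-K falls outside the window and the 10-Q fails the form filter, so a single fact remains.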
A major step in our data preparation process is constructing an as-filed data set comparable to the annual fundamentals file compiled by Compustat. We focus on 73 Compustat data items that are either (i) frequently used in accounting and finance research or (ii) key elements of one of the three financial statements. Our list covers most of the data items frequently used in prior research. In "List of Data Items," we tabulate the data items included in our study with more than 100 Google Scholar citations. These 73 data items include, but are not limited to, those used in our later analyses of specific research settings. In the Compustat-like data posted on this website, we also include four additional data items to facilitate checks of certain accounting identities.
For each data item, we identify the highest-level tag(s) in the taxonomy that correspond to the Compustat data item, based on Compustat’s balancing model for financial statement items (S&P Global, 2018). The mapping is unambiguous and not subject to researcher discretion. Nevertheless, we validate it by verifying, using the procedure detailed in "Matching As-filed Data with Compustat," that the selected tag (or combination of tags) dominates all other tags.
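The validation step can be pictured with the following sketch. It assumes, as our own illustrative reading, that "dominates" means the selected tag reproduces the Compustat value in a strictly larger share of firm-years than any other candidate. The candidate names, the firm-year keys, the tolerance `tol`, and all values are hypothetical; a combination of tags is represented by its pre-summed value, and values are assumed to be already converted to Compustat's scale.

```python
from typing import Dict, Optional, Tuple

# Hypothetical per-firm-year values, keyed "cik-fiscal year"; None = not reported.
FirmYearValues = Dict[str, Optional[float]]


def match_rate(values: FirmYearValues, compustat: Dict[str, float], tol: float = 0.5) -> float:
    """Share of Compustat firm-years that the candidate reproduces within `tol`."""
    hits = 0
    for key, target in compustat.items():
        value = values.get(key)
        if value is not None and abs(value - target) <= tol:
            hits += 1
    return hits / len(compustat)


def selected_tag_dominates(
    selected: str, candidates: Dict[str, FirmYearValues], compustat: Dict[str, float]
) -> Tuple[bool, Dict[str, float]]:
    """True if the selected candidate's match rate strictly exceeds every other's."""
    rates = {name: match_rate(vals, compustat) for name, vals in candidates.items()}
    best_other = max((r for name, r in rates.items() if name != selected), default=0.0)
    return rates[selected] > best_other, rates


# Illustrative data for total assets (Compustat AT).
compustat_at = {"1001-2012": 5000.0, "1002-2012": 900.0, "1003-2012": 120.0}
candidates = {
    "Assets": {"1001-2012": 5000.0, "1002-2012": 900.0, "1003-2012": 120.0},
    "AssetsCurrent+AssetsNoncurrent": {"1001-2012": 5000.0, "1002-2012": None, "1003-2012": 118.0},
    "LiabilitiesAndStockholdersEquity": {"1001-2012": 5000.0, "1002-2012": 900.0, "1003-2012": None},
}
ok, rates = selected_tag_dominates("Assets", candidates, compustat_at)
print(ok, rates)
```

In this toy example `Assets` matches every firm-year, so the check passes; with real data, a failing check would flag a mapping worth re-examining.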
Occasionally, however, a filing does not report a value for a high-level tag. In such cases, we use the hierarchical relations specified in the calculation linkbase to impute the high-level tag's value from the values of the appropriate child tags. We then validate the imputed values using several basic accounting identities. Appendix C.1 contains further details on these steps. We emphasize that these steps involve no subjective judgment on our part, because the linkbase encodes the relationships among all standard tags as determined by the taxonomy.
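A minimal sketch of linkbase-based imputation followed by one identity check is shown below. The calculation fragment (`CALC`) is hypothetical: it maps each parent tag to `(child, weight)` pairs, with weights of +1/-1 mirroring the XBRL calculation-arc convention. Leaving a parent missing whenever any child is unavailable, and the tolerance `tol`, are our own simplifying assumptions; the actual treatment is described in Appendix C.1.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical calculation-linkbase fragment: parent -> [(child, weight)].
CALC: Dict[str, List[Tuple[str, float]]] = {
    "Assets": [("AssetsCurrent", 1.0), ("AssetsNoncurrent", 1.0)],
    "AssetsCurrent": [
        ("CashAndCashEquivalentsAtCarryingValue", 1.0),
        ("AccountsReceivableNetCurrent", 1.0),
    ],
    "LiabilitiesAndStockholdersEquity": [("Liabilities", 1.0), ("StockholdersEquity", 1.0)],
}


def impute(tag: str, reported: Dict[str, float], calc=CALC) -> Optional[float]:
    """Return the reported value of `tag`, or impute it recursively as the
    weighted sum of its children. Assumption: if any child is unavailable,
    the parent stays missing instead of treating that child as zero."""
    if tag in reported:
        return reported[tag]
    children = calc.get(tag)
    if not children:
        return None
    total = 0.0
    for child, weight in children:
        value = impute(child, reported, calc)
        if value is None:
            return None
        total += weight * value
    return total


def balances(values: Dict[str, float], tol: float = 0.5) -> bool:
    """Basic identity check: total assets equal total liabilities and equity."""
    assets = impute("Assets", values)
    total_le = impute("LiabilitiesAndStockholdersEquity", values)
    return assets is not None and total_le is not None and abs(assets - total_le) <= tol


# Hypothetical filing that omits both Assets and LiabilitiesAndStockholdersEquity.
reported = {
    "CashAndCashEquivalentsAtCarryingValue": 300.0,
    "AccountsReceivableNetCurrent": 200.0,
    "AssetsNoncurrent": 1500.0,
    "Liabilities": 1200.0,
    "StockholdersEquity": 800.0,
}
print(impute("Assets", reported))  # 2000.0
print(balances(reported))  # True
```

Here `AssetsCurrent` is first imputed as 300 + 200, then `Assets` as 500 + 1500; the imputed total matches the imputed liabilities-and-equity total, so the identity holds.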
When the standard taxonomy does not accommodate unique circumstances in a filer’s disclosure, the filer may extend the taxonomy with custom tags in addition to the standard tags it prescribes. We use an algorithm, detailed in "Treatment of Custom Tags," to search for the nearest equivalent standard tag for every custom tag. In essence, the algorithm seeks the best match for each custom tag, through fuzzy matching of tag labels, among the standard tags that are descendants of the same parent as the custom tag.
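The sketch below illustrates the idea of restricting candidates to standard tags under the custom tag's parent and then picking the closest label. It uses Python's `difflib.SequenceMatcher` ratio on lowercased labels as a stand-in fuzzy-matching score; the scoring method actually used in "Treatment of Custom Tags" may differ. The hierarchy, the labels, and the `abc:` extension prefix are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional, Set, Tuple


def descendants(parent: str, tree: Dict[str, List[str]]) -> List[str]:
    """All tags below `parent` in a parent -> children hierarchy (depth-first)."""
    found: List[str] = []
    stack = list(tree.get(parent, []))
    while stack:
        tag = stack.pop()
        if tag not in found:
            found.append(tag)
            stack.extend(tree.get(tag, []))
    return found


def nearest_standard_tag(
    custom_tag: str,
    parent: str,
    tree: Dict[str, List[str]],
    labels: Dict[str, str],
    standard: Set[str],
) -> Tuple[Optional[str], float]:
    """Standard descendant of `parent` whose label is most similar to the custom tag's."""
    target = labels[custom_tag].lower()
    best_tag, best_score = None, 0.0
    for tag in descendants(parent, tree):
        if tag not in standard:
            continue  # skip the custom tag itself and any other extension tags
        score = SequenceMatcher(None, target, labels[tag].lower()).ratio()
        if score > best_score:
            best_tag, best_score = tag, score
    return best_tag, best_score


# Hypothetical hierarchy in which a filer adds a custom tag under AssetsCurrent.
tree = {
    "AssetsCurrent": [
        "CashAndCashEquivalentsAtCarryingValue",
        "InventoryNet",
        "PrepaidExpenseAndOtherAssetsCurrent",
        "abc:PrepaidAndOtherCurrentAssets",
    ]
}
labels = {
    "CashAndCashEquivalentsAtCarryingValue": "Cash and Cash Equivalents, at Carrying Value",
    "InventoryNet": "Inventory, Net",
    "PrepaidExpenseAndOtherAssetsCurrent": "Prepaid Expense and Other Assets, Current",
    "abc:PrepaidAndOtherCurrentAssets": "Prepaid expenses and other current assets",
}
standard = {
    "CashAndCashEquivalentsAtCarryingValue",
    "InventoryNet",
    "PrepaidExpenseAndOtherAssetsCurrent",
}
tag, score = nearest_standard_tag(
    "abc:PrepaidAndOtherCurrentAssets", "AssetsCurrent", tree, labels, standard
)
print(tag, round(score, 2))
```

Restricting the search to the same parent keeps a similarly worded tag from a different statement section from being selected merely because its label happens to be close.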
The as-filed disclosure quality data are compiled following the procedure detailed here.
Reference: Du, Huddart, and Jiang (DHJ), 2023, "Lost in Standardization: Effects of Financial Statement Database Discrepancies on Inference," Journal of Accounting and Economics (JAE). Available on SSRN.