For archival researchers relying on financial statement data from commercial aggregators, the findings of DHJ (2023) provide a cautionary note: discrepancies between commercial data and originally filed accounting numbers are both common and large. The discrepancies are greater when the financial statement is more complex and in certain industries (e.g., Energy, Shops, and Chemicals). Moreover, different aggregators (e.g., Compustat and FactSet) exhibit non-overlapping discrepancies with as-filed data, implying that their standardization practices diverge.
In light of these discrepancies, we recommend using as-filed data in research that relies on the data items for which we document them. Because the cost, time, and effort of preparing such data can be a barrier to use, we make the as-filed analogs of the Compustat data items used in this study publicly available on our website and through Wharton Research Data Services (WRDS). Alongside these counterparts to the Compustat data, we provide a more granular set of as-filed line items prepared using our methodology, which may facilitate research requiring data items defined more flexibly than those in conventional databases. We also make available a data set on the structure of as-filed financial statements.
The data are available in three different formats: SAS, Stata, and CSV.
Disclaimer: Data may be used for non-commercial purposes free of charge. For all other uses, please contact us.
A Compustat-like data set of 77 data items, including those examined in DHJ. Data items are named following the Compustat convention.
A more granular data set that allows a researcher to prepare accounting data items that are not in the list of Compustat data items.
Data on the hierarchical structure of XBRL 10-K filings, disclosure quality, the number of tags, and the depth of each of the three financial statements.
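As an illustration of how the Compustat-like file might be used, the sketch below compares one as-filed data item with its commercial counterpart and reports the share of matched firm-years that differ. It is only a sketch under stated assumptions: the key columns `gvkey` and `fyear`, the item `at`, the 0.1% relative tolerance, and the sample numbers are all hypothetical stand-ins, not the actual file layout or the discrepancy definition used in DHJ.

```python
import csv
import io


def load_items(handle, item, keys=("gvkey", "fyear")):
    """Read one data item from a CSV file into a dict keyed by firm-year.

    The key column names are assumptions; adjust them to the real file.
    """
    values = {}
    for row in csv.DictReader(handle):
        raw = row.get(item, "")
        if raw is None or raw.strip() == "":
            continue  # skip missing values
        values[tuple(row[k] for k in keys)] = float(raw)
    return values


def discrepancy_rate(as_filed, commercial, rel_tol=0.001):
    """Share of matched firm-years whose values differ by more than rel_tol.

    rel_tol is an illustrative threshold, not the one used in DHJ.
    Returns None when the two sources share no firm-years.
    """
    matched = as_filed.keys() & commercial.keys()
    if not matched:
        return None
    differing = 0
    for key in matched:
        a, c = as_filed[key], commercial[key]
        scale = max(abs(a), abs(c))
        if scale > 0 and abs(a - c) / scale > rel_tol:
            differing += 1
    return differing / len(matched)


# Tiny in-memory stand-ins for the two CSV files (made-up numbers).
AS_FILED_CSV = """gvkey,fyear,at
001,2015,100.0
001,2016,120.0
002,2015,50.0
"""
COMMERCIAL_CSV = """gvkey,fyear,at
001,2015,100.0
001,2016,125.0
002,2015,50.0
003,2015,10.0
"""

as_filed = load_items(io.StringIO(AS_FILED_CSV), "at")
commercial = load_items(io.StringIO(COMMERCIAL_CSV), "at")
rate = discrepancy_rate(as_filed, commercial)
print(f"Share of matched firm-years with a discrepancy: {rate:.3f}")
```

With real files, the `io.StringIO` objects would be replaced by `open(path, newline="")` handles. Only firm-years present in both sources are compared, so coverage differences between the two sources do not count as discrepancies.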