Common Ownership Data

Introduction

While researching the common ownership hypothesis, we found a number of issues with the Thomson Reuters (TR) "S34" dataset, which many researchers use and frequently access via Wharton Research Data Services (WRDS). WRDS has done extensive work to improve the database, working with other researchers who have uncovered problems, and in particular has fixed missing records of BlackRock holdings. However, even with the updated dataset posted in the summer of 2018, we discovered a number of discrepancies when accessing data for constituent firms of the S&P 500 Index. We therefore set out to build a separate dataset of 13(f) holdings directly from the source documents, all of which are public and available electronically from the Securities and Exchange Commission (SEC) website. Coverage is good starting in 1999, when electronic filing became mandatory. Note, however, that the SEC's Inspector General issued a critical report in 2010 about the information contained in 13(f) filings.

The process:

The resulting dataset contains around 48 million reported holdings (CIK-CUSIP pairs) across all 76 quarters, with between 4,000 and 7,000 CUSIPs and between 1,000 and 4,000 investors per quarter. We do not claim that our dataset is perfect; there are undoubtedly errors. As documented elsewhere, the source documents themselves often contain errors as well. However, our method appears to produce more reliable data than the TR dataset in several cases, as shown in Online Appendix B of the related paper linked above.
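As a rough sanity check on per-quarter counts like those quoted above, one could tally distinct investors and securities by quarter. The sketch below is illustrative only: the (quarter, CIK, CUSIP) tuple layout, the function name summarize_by_quarter, and the sample rows are assumptions, not the schema or contents of the released files.

```python
from collections import defaultdict


def summarize_by_quarter(holdings):
    """Count distinct investors (CIKs) and securities (CUSIPs) per quarter.

    `holdings` is an iterable of (quarter, cik, cusip) tuples. This layout
    is a hypothetical one chosen for illustration.
    """
    ciks = defaultdict(set)
    cusips = defaultdict(set)
    for quarter, cik, cusip in holdings:
        ciks[quarter].add(cik)
        cusips[quarter].add(cusip)
    return {
        q: {"investors": len(ciks[q]), "cusips": len(cusips[q])}
        for q in sorted(ciks)
    }


# Invented rows, not real filings.
sample = [
    ("2005Q1", "0000000001", "111111111"),
    ("2005Q1", "0000000001", "222222222"),
    ("2005Q1", "0000000002", "111111111"),
    ("2005Q2", "0000000001", "111111111"),
]

summary = summarize_by_quarter(sample)
for quarter, counts in summary.items():
    print(quarter, counts)
```

Using sets means a holding reported more than once in a quarter is counted a single time, which matters if amended filings are loaded alongside the originals.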

Downloads

Examples of the Parsing Challenge:

Before the XML era, filings were far from uniform, which makes parsing them for holdings a notable challenge.
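To give a feel for the problem, here is a minimal sketch of pulling one holding out of a line of a plain-text information table. It is not the parser used to build this dataset. It assumes a single hypothetical layout in which a CUSIP is followed directly by the value and share columns, and the names HOLDING_RE and parse_holding_line, as well as the sample line, are invented for illustration. Real pre-XML filings vary in column order, spacing, and line wrapping, so a pattern like this would miss many of them.

```python
import re

# A CUSIP is 9 characters: 6 (issuer) + 2 (issue) + 1 check digit.
# Some filers write it with spaces or dashes between the parts.
HOLDING_RE = re.compile(
    r"(?P<cusip>[0-9A-Z]{6}[ -]?[0-9A-Z]{2}[ -]?[0-9])"
    r"\s+(?P<value>[\d,]+)"    # value column, as reported by the filer
    r"\s+(?P<shares>[\d,]+)"   # shares or principal amount
)


def parse_holding_line(line):
    """Return a dict for one holding, or None if the line does not match.

    Assumes the (hypothetical) column order: ... CUSIP, value, shares ...
    """
    match = HOLDING_RE.search(line.upper())
    if match is None:
        return None
    return {
        "cusip": re.sub(r"[ -]", "", match.group("cusip")),
        "value": int(match.group("value").replace(",", "")),
        "shares": int(match.group("shares").replace(",", "")),
    }


# Invented example line, not taken from a real filing.
line = "EXAMPLE CORP   COM   123456 10 9   1,234   56,700   SH   SOLE"
parsed = parse_holding_line(line)
print(parsed)
```

Lines that do not match return None rather than raising, so a caller can log them for manual review instead of silently dropping holdings.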