Many mutual fund SEC filings, including the semi-annual N-CSR filings, are made at the investment management company level, and each filing may cover one or many mutual funds. To match those filings to conventional mutual fund datasets, such as CRSP, you need a mapping between each investment management company and the mutual funds it includes. Using N-SAR filings, I have collected all management company filings and their included mutual funds for 2003-2016. NOBS: 299,825. Size: 161 MB. If you download and use the file, please let me know by email. You can also email me to check whether I have updated the dataset.
Simple Code for Collecting Data from N-SAR [github link]
N-SAR filings are among the most well-organized filings on the SEC EDGAR system, so automated scraping of them is very convenient. This very simple code collects the information reported in items 7, 7a, and 7b of the N-SAR filings. You may change the code to get your desired result.
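To give an idea of what such a scraper can look like, here is a minimal sketch rather than the linked code itself. It assumes the text version of an N-SAR filing stores each answer on its own line as a three-digit item number, a sub-item code, and a value. That layout, the function name `extract_item`, and the sample text are all assumptions made for illustration.

```python
import re

# Assumed line layout: "<3-digit item> <sub-item code> <value>",
# for example "007 A000000 Y". Check this against real filings before use.
LINE_RE = re.compile(r"^(\d{3})\s+([A-Z]\d{6})\s+(.*\S)\s*$")


def extract_item(text, item="007"):
    """Return (sub_item_code, value) pairs for every line of the given item."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m and m.group(1) == item:
            rows.append((m.group(2), m.group(3)))
    return rows


# Hypothetical sample text, made up for this sketch.
SAMPLE = """\
000 A000000 06/30/2010
007 A000000 Y
007 B000000  2
007 C010100  1
007 C020100 EXAMPLE GROWTH FUND
008 A000101 EXAMPLE ADVISERS
"""

rows = extract_item(SAMPLE)
for code, value in rows:
    print(code, value)
```

Changing the `item` argument pulls a different item from the same filing, which is the kind of adjustment the note above has in mind.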
Simple code for running web-scraping code on parallel computers [github link: here and here]
In my paper, I searched for over 500,000 names across more than 45,000 SEC filings. Given the size of the data, I needed to run the code on the supercomputers at the GSU computing center. This sample code gives you an idea of how to chunk your data and run the chunks simultaneously on parallel computers. The code itself collects all the names in 10-K filings that follow an "s/", which marks a signature in a 10-K filing. As a result, this code is a starting point for collecting directors' names from 10-K filings. After this step, more cleaning is still required because of the nature of 10-K filings. I thank Dr. Suranga Edirisinghe for kindly helping me run the code on the GSU DICE system.
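The two ideas in this paragraph, splitting the work into chunks and pulling the text that follows a signature marker, can be sketched as below. This is an illustration under stated assumptions, not the linked code: the helper names `chunk` and `signature_names` are hypothetical, the worker index is passed in as a plain argument (on a cluster it would typically come from the job scheduler or the command line), and the name filter is a deliberately crude rule of thumb.

```python
import re

# Text after an "s/" marker, up to the end of the line.
SIG_RE = re.compile(r"s/[ \t]*([^\n]*)")
# Crude filter (an assumption): two to four capitalized tokens, single-spaced.
NAME_RE = re.compile(r"[A-Z][A-Za-z.'-]*(?: [A-Z][A-Za-z.'-]*){1,3}")


def chunk(items, n_chunks, index):
    """Return the share of `items` for worker `index` (0-based) of `n_chunks`."""
    return items[index::n_chunks]


def signature_names(text):
    """Collect name-like strings that follow an "s/" marker on the same line."""
    names = []
    for m in SIG_RE.finditer(text):
        candidate = m.group(1).strip()
        if NAME_RE.fullmatch(candidate):
            names.append(candidate)
    return names


# Hypothetical filing excerpt, made up for this sketch.
SAMPLE = """\
SIGNATURES
/s/ John A. Smith
John A. Smith, Director
/s/ Mary O'Neil
his/her designee
"""

filings = ["filing_%d.txt" % i for i in range(10)]  # placeholder file names
my_share = chunk(filings, n_chunks=3, index=0)       # what worker 0 would process
names = signature_names(SAMPLE)
print(my_share)
print(names)
```

Each worker runs the same script with a different `index`, so no coordination between workers is needed. The "his/her" line in the sample shows why the raw "s/" match needs filtering and why further cleaning is still required afterwards.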
An index of all SEC filings from 1993 to 2016, collected from the SEC EDGAR idx files. This file includes the web address of every filing, of every type and from every company over the period, for web-scraping purposes. NOBS: 17,922,689. Size: 200 MB (compressed in RAR format).
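For readers who want to build a similar index themselves, the sketch below shows one way to turn an EDGAR index file into filing web addresses. It assumes the pipe-delimited `master.idx` layout (CIK, company name, form type, date filed, file name) and that the file name is a path relative to `https://www.sec.gov/Archives/`. The function name and the sample rows are made up for illustration, and this is not necessarily how the file above was built.

```python
BASE_URL = "https://www.sec.gov/Archives/"


def parse_master_idx(text):
    """Parse pipe-delimited index lines into dicts, skipping header lines."""
    records = []
    for line in text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        # Data rows have five fields and a numeric CIK; headers do not.
        if len(parts) != 5 or not parts[0].isdigit():
            continue
        cik, company, form_type, date_filed, filename = parts
        records.append({
            "cik": cik,
            "company": company,
            "form_type": form_type,
            "date_filed": date_filed,
            "url": BASE_URL + filename,
        })
    return records


# Hypothetical index excerpt; the rows are invented for this sketch.
SAMPLE = """\
Description:           Master Index of EDGAR Dissemination Feed
CIK|Company Name|Form Type|Date Filed|Filename
--------------------------------------------------------------------------------
1000045|EXAMPLE CORP|10-K|2016-06-14|edgar/data/1000045/0000000000-16-000001.txt
1000046|EXAMPLE FUNDS TRUST|NSAR-B|2016-02-26|edgar/data/1000046/0000000000-16-000002.txt
"""

records = parse_master_idx(SAMPLE)
nsar = [r for r in records if r["form_type"].startswith("NSAR")]
for r in nsar:
    print(r["company"], r["url"])
```

Filtering on the form type, as in the last lines, is how the index can be narrowed to a single filing type before scraping.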