Data/Programming

Stata guide
For the latest version of my Stata guide for PhD/MSc students, please contact me by email (current version: Dec '13).

Perl code for downloading SEC filings (with thanks to Joost Impink, see wrds.us)

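#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;   # https:// URLs also need LWP::Protocol::https installed

# The SEC asks automated clients to identify themselves; replace the
# placeholder below with your own name and email address.
my $ua = LWP::UserAgent->new(agent => 'Your Name your.email@example.com');

open my $log,   '>', 'download_log.txt' or die $!;
open my $dlist, '<', 'downloadlist.txt' or die $!;

while (my $line = <$dlist>) {
    chomp $line;
    my ($nr, $get_file) = split /,/, $line, 2;

    # Only process rows that point to a filing; this also skips the header row
    next unless defined $get_file and $get_file =~ m/[0-9-]+\.txt/;

    my $url      = "https://www.sec.gov/Archives/" . $get_file;
    my $filename = $nr . ".txt";
    print "file $nr \n";

    my $response = $ua->get($url);
    if ($response->is_success) {
        # Create the output file only after a successful download
        open my $out, '>', $filename or die $!;
        print $out $response->content;
        close $out;
    } else {
        print $log "Error in $filename - $nr \n";
    }
}

close $dlist;
close $log;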

This code assumes an input file ("downloadlist.txt") that looks like this (the list can be created from the SEC's quarterly EDGAR index files). The header row is skipped because it does not match the filing pattern:

downloadId,url
1,edgar/data/864270/0001564590-15-005265.txt
2,edgar/data/864509/0001144204-15-039213.txt
3,edgar/data/864683/0000864683-15-000000.txt
4,edgar/data/864683/0000864683-15-000037.txt
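
As a rough sketch of how such a list could be built from a quarterly index, the snippet below filters index lines by form type and numbers the matching paths. It assumes the pipe-delimited layout CIK|Company Name|Form Type|Date Filed|Filename; check this against the index file you actually download. The helper name index_to_download_list is hypothetical, and the sample lines reuse paths from the list above with placeholder company names, form types and dates.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn pipe-delimited index lines into "downloadId,url" rows.
# Assumed column layout: CIK|Company Name|Form Type|Date Filed|Filename
sub index_to_download_list {
    my ($form_type, @index_lines) = @_;
    my @rows = ("downloadId,url");
    my $id   = 0;
    foreach my $line (@index_lines) {
        chomp $line;
        my @fields = split /\|/, $line;
        # Skip the dashed rule and any line without exactly five columns
        next unless @fields == 5;
        my ($cik, $name, $type, $date, $path) = @fields;
        # Keep only the requested form type (this also drops the column header)
        next unless $type eq $form_type;
        next unless $path =~ m/^edgar\/data\/.+\.txt$/;
        $id++;
        push @rows, "$id,$path";
    }
    return @rows;
}

# Illustrative input only: the names, form types and dates are placeholders.
my @sample = (
    "CIK|Company Name|Form Type|Date Filed|Filename",
    "--------------------------------------------------",
    "864270|PLACEHOLDER CO A|10-Q|2015-01-01|edgar/data/864270/0001564590-15-005265.txt",
    "864509|PLACEHOLDER CO B|8-K|2015-01-01|edgar/data/864509/0001144204-15-039213.txt",
    "864683|PLACEHOLDER CO C|10-Q|2015-01-01|edgar/data/864683/0000864683-15-000037.txt",
);

print "$_\n" for index_to_download_list("10-Q", @sample);
```

In practice the sample array would be replaced by the lines of a downloaded index file, and the printed rows redirected to downloadlist.txt.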


Alternatively, HTTP::Tiny can be used instead of LWP for the download:

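#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Tiny;   # https:// URLs also need IO::Socket::SSL and Net::SSLeay

# The SEC asks automated clients to identify themselves; replace the
# placeholder below with your own name and email address.
my $ht = HTTP::Tiny->new(agent => 'Your Name your.email@example.com');

open my $log,   '>', 'download_log.txt' or die $!;
open my $dlist, '<', 'downloadlist.txt' or die $!;

while (my $line = <$dlist>) {
    chomp $line;
    my ($nr, $get_file) = split /,/, $line, 2;

    # Only process rows that point to a filing; this also skips the header row
    next unless defined $get_file and $get_file =~ m/[0-9-]+\.txt/;

    my $url      = "https://www.sec.gov/Archives/" . $get_file;
    my $filename = $nr . ".txt";
    print "file $nr \n";

    my $response = $ht->get($url);
    if ($response->{success}) {
        # Create the output file only after a successful download
        open my $out, '>', $filename or die $!;
        print $out $response->{content};
        close $out;
    } else {
        print $log "Error in $filename - $nr \n";
    }
}

close $dlist;
close $log;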
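
Both scripts match the accession number in the path but name the output file after the downloadId. If files named after the accession number are preferred, something like the following could be used. This is a minimal sketch: accession_from_path is a hypothetical helper name, and it assumes paths end in an accession number of the form 10 digits, 2 digits, 6 digits, as in the example list.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the accession number found at the end of an
# EDGAR path (e.g. 0001564590-15-005265), or undef if there is none.
sub accession_from_path {
    my ($path) = @_;
    return $path =~ m/(\d{10}-\d{2}-\d{6})\.txt$/ ? $1 : undef;
}

my $path      = "edgar/data/864270/0001564590-15-005265.txt";
my $accession = accession_from_path($path);

# The output file name would then be built from the accession number
print "$accession.txt\n" if defined $accession;
```

Inside the download loop, $filename could then be set from this value instead of from $nr, falling back to $nr when the helper returns undef.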