02 - Slurp a file

Problem:

Sometimes you need to read the whole file into memory before processing it. In the Perl community this is usually called slurping.

Solution:

use File::Slurp qw(slurp);
my $body = slurp($filepath);

# read all the file at once with sysread, bypassing PerlIO buffering and layers
open( my $FH, '<', $filepath ) or die "can't open $filepath: $!";
sysread( $FH, my $body, -s $FH );
close $FH;

# read all the file at once using the IO stack
open( my $FH, '<', $filepath ) or die "can't open $filepath: $!";
my $body = do { local $/; readline $FH };
close $FH;

# read all the file at once using the IO stack; if the file doesn't exist, it only warns instead of dying
my $body = do { local ( @ARGV, $/ ) = $filepath; readline };

Description:

You can read the whole file into an array, one line per element, with minimal code:

my @lines = readline $FH;

If you need the file split into lines, it is usually better to read it inside a while loop, which holds only one line in memory at a time.
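A minimal sketch of that while loop, assuming a throwaway sample file created with the core File::Temp module; the line and character counting is only a stand-in for real per-line processing.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small sample file to read (illustration only).
my ( $tmp_fh, $filepath ) = tempfile( UNLINK => 1 );
print {$tmp_fh} "first\nsecond\nthird\n";
close $tmp_fh;

# Process the file line by line: only the current line is held in memory.
open( my $FH, '<', $filepath ) or die "can't open $filepath: $!";
my ( $count, $chars ) = ( 0, 0 );
while ( my $line = readline $FH ) {    # Perl adds an implicit defined() test here
    chomp $line;                       # remove the trailing "\n"
    $count++;
    $chars += length $line;
}
close $FH;

print "$count lines, $chars characters\n";    # prints "3 lines, 16 characters"
```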

Slurping a file usually means reading its content into a scalar, and one of the worst ways to do it is to read the file into an array first and then join the lines:

my $body = join( '', readline $FH);

There are several better solutions. Let's start with the slower ones.

When $/ ($INPUT_RECORD_SEPARATOR under use English) is set to undef, each call to readline reads the rest of the file at once (<$FH> is just another way of writing readline($FH)). A commonly used solution is:

open( my $FH, '<', $filepath ) or die "can't open $filepath: $!";
my $body = do { local $/; readline $FH };
close $FH;

Be careful to localize $/, since it is a global variable. If you leave it set to undef, every other read in the program will also read whole files.
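The following sketch, assuming a temporary sample file created with File::Temp, shows what local buys you: $/ is undef only inside the do block, and its previous value ("\n") comes back as soon as the block ends.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Sample file for the illustration.
my ( $tmp_fh, $filepath ) = tempfile( UNLINK => 1 );
print {$tmp_fh} "line 1\nline 2\n";
close $tmp_fh;

open( my $FH, '<', $filepath ) or die "can't open $filepath: $!";
my $body = do {
    local $/;        # $/ is undef only inside this block
    readline $FH;    # reads everything up to EOF in one call
};
close $FH;

# Outside the block, $/ has its previous value again,
# so line-oriented reads elsewhere keep working.
my $separator_restored = defined $/ && $/ eq "\n";
print $separator_restored ? "restored\n" : "still undef\n";    # prints "restored"
```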

Another solution is to localize @ARGV as well and set it to the file path, so that readline with no arguments reads the file through the magic ARGV filehandle:

my $body = do { local ( @ARGV, $/ ) = $filepath; readline };

The drawback here is that you lose control over errors: if the file doesn't exist or can't be opened, Perl doesn't die. It only prints a "Can't open" warning to STDERR and carries on.
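If you still want the short @ARGV form but need it to fail loudly, one option is to check the file before reading. This is a sketch with a hypothetical helper name, slurp_via_argv, not part of the recipe; note that a check followed by an open is still racy if the file can disappear in between.

```perl
use strict;
use warnings;

# Hypothetical helper: the @ARGV trick plus an explicit readability check,
# so a missing or unreadable file dies instead of only warning on STDERR.
sub slurp_via_argv {
    my ($filepath) = @_;
    die "can't read $filepath\n" unless -f $filepath && -r _;    # _ reuses the stat from -f
    return do { local ( @ARGV, $/ ) = $filepath; readline };
}
```

For anything beyond quick scripts, the explicit open ... or die version above gives clearer error messages (including $!).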

Now let's look at the faster solutions. You can use sysread, which bypasses PerlIO buffering, passing the file size (from the -s file test) as the length:

open( my $FH, '<', $filepath ) or die "can't open $filepath: $!";
sysread( $FH, my $body, -s $FH );
close $FH;
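sysread is allowed to return fewer bytes than requested, and it returns undef on error, which the three-line version above does not check. On a regular local file a single call normally gets everything, but a more defensive sketch (hypothetical helper name sysread_all) loops until EOF and opens the file in :raw mode, since sysread ignores PerlIO layers such as :crlf or :encoding anyway:

```perl
use strict;
use warnings;

# Hypothetical helper: slurp with sysread, looping until EOF
# in case a single call returns fewer bytes than requested.
sub sysread_all {
    my ($filepath) = @_;
    open( my $FH, '<:raw', $filepath ) or die "can't open $filepath: $!";
    my $body = '';
    my $size = -s $FH || 0;    # -s is false for an empty file
    while (1) {
        # append at the end of $body (4th argument is the offset)
        my $read = sysread( $FH, $body, $size || 8192, length $body );
        die "can't read $filepath: $!" unless defined $read;
        last if $read == 0;    # 0 bytes read means EOF
    }
    close $FH;
    return $body;
}
```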

There is also the CPAN module File::Slurp. Although slurp isn't exported by default, so you have to import it explicitly, it is the same name Perl 6 uses for this operation:

use File::Slurp qw(slurp);
my $body = slurp($filepath);

I suggest the last one as the most readable solution.

See Also:

Unit Test:

use strict;
use warnings;
use Test::More;
use Benchmark qw(cmpthese);
use File::Slurp qw(slurp);

# slower solution, concatenating every line of the file 
sub concatenate_file {
    my ($filepath) = @_;
    my $body = '';
    open( my $FH, '<', $filepath ) or die "can't open file";
    while ( my $line = readline $FH ) {
        $body .= $line;
    }
    close $FH;
    return $body;
}

# read all the file at once using the IO stack
sub read_filehandle_in_do {
    my ($filepath) = @_;

    open( my $FH, '<', $filepath ) or die "can't open file";
    my $body = do { local $/; readline $FH };
    close $FH;

    return $body;
}

# read all the file at once using the IO stack; only warns if the file doesn't exist
sub read_file_in_do {
    my ($filepath) = @_;
    return do { local ( @ARGV, $/ ) = $filepath; readline };
}

# read all the file at once, the faster solution
sub sysread_filehandle {
    my ($filepath) = @_;

    open( my $FH, '<', $filepath ) or die "can't open file";
    my $body;
    sysread( $FH, $body, -s $FH );
    close $FH;

    return $body;
}

my $filepath = __FILE__;
is( concatenate_file($filepath),      read_filehandle_in_do($filepath), 'concatenation matches local $/' );
is( read_filehandle_in_do($filepath), read_file_in_do($filepath),       'local $/ matches @ARGV trick' );
is( read_file_in_do($filepath),       sysread_filehandle($filepath),    '@ARGV trick matches sysread' );
is( sysread_filehandle($filepath),    scalar( slurp($filepath) ),       'sysread matches File::Slurp' );

cmpthese(
    1000,
    {
        'concatenate_file'      => sub { concatenate_file($filepath) },
        'read_filehandle_in_do' => sub { read_filehandle_in_do($filepath) },
        'read_file_in_do'       => sub { read_file_in_do($filepath) },
        'sysread_filehandle'    => sub { sysread_filehandle($filepath) },
        'slurp'                 => sub { scalar( slurp($filepath) ) },
    } );

done_testing();