pigz (Parallel gzip)

pigz [1], which stands for parallel implementation of gzip, is a fully functional replacement for gzip that exploits multiple processors and multiple cores to the hilt when compressing data. pigz was written by Mark Adler, and uses the zlib and pthread libraries.
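For a quick sense of the interface before moving to the batch example, here is a minimal interactive sketch (the file name largefile.txt and the thread count are placeholders): -p sets the number of threads, --best selects maximum compression, and -k keeps the original file.

# Compress largefile.txt with 8 threads at maximum compression, keeping the original file
pigz -p 8 --best -k largefile.txt    # produces largefile.txt.gz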

Running the pigz Utility on the HPC Cluster

Batch Job Submission:

Copy the following content into a Slurm script file named "pigz.slurm". In this example, the "test" directory is archived and compressed into the file temp.gz. Also check the pigz manual [2] for details on the available options.

pigz.slurm:

#!/bin/bash

#SBATCH --mail-user=abc123@case.edu
#SBATCH --mail-type=ALL
#SBATCH --nodes=1
#SBATCH --cpus-per-task=4
#SBATCH --time=2:00:00
#SBATCH --mem=500m
#SBATCH -o pigz.out

# Copy the input directory to the scratch space and work there
cp -r test $PFSDIR
cd $PFSDIR

module load pigz

# Total number of cores allocated to the job
NPROCS=$(( $SLURM_JOB_CPUS_PER_NODE * $SLURM_NNODES ))

# Convert the folder into a single tar file
tar -cf temp test

# Compress the file employing multiple threads/cores
pigz -p $NPROCS --best temp

# Copy the results (temp.gz) back to the submission directory
cp * $SLURM_SUBMIT_DIR
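As a side note, a sufficiently recent GNU tar can pipe the archive straight through pigz, so the tar and pigz steps above could be combined into a single command. This is an alternative to, not part of, the script above, and older tar versions need a wrapper script to pass the arguments:

# Archive and compress in one step (GNU tar)
tar --use-compress-program="pigz -p 4 --best" -cf temp.tar.gz test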

Submit the job:

sbatch pigz.slurm

Check the status using the top command:

ssh -t <node-your-job-is-running> top
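If the node name is not known, it can be looked up in the NODELIST column of the job's queue entry (replace <caseID> with your own username):

squeue -u <caseID>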

output:

PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND

3594 <caseID>    20   0  319m 5512  616 S 300.1  0.0   0:27.00 pigz

Since 4 cores have been requested using "#SBATCH --cpus-per-task=4", the %CPU value can go up to 400%.
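Once the job finishes, temp.gz is copied back to the submission directory. To recover the original directory, decompress it and unpack the tar archive (unpigz is equivalent to pigz -d):

unpigz temp.gz
tar -xf temp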

References:

[1] pigz Home

[2] pigz Manual