pigz (Parallel gzip)
pigz [1], which stands for "parallel implementation of gzip", is a fully functional replacement for gzip that exploits multiple processors and multiple cores to the hilt when compressing data. pigz was written by Mark Adler and uses the zlib and pthread libraries.
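As a quick illustration of the drop-in behavior, the following sketch compresses a sample file with gzip-style flags (it assumes pigz and unpigz are on the PATH, e.g. after `module load pigz` on the cluster):

```shell
# Create a sample file and compress it with pigz using gzip-style flags
seq 1 100000 > sample.txt
pigz -k -9 sample.txt          # -k keeps the original; produces sample.txt.gz
gzip -t sample.txt.gz          # the output is standard gzip format
unpigz -c sample.txt.gz | tail -n 1
rm -f sample.txt sample.txt.gz
```

Because the output is ordinary gzip format, files compressed with pigz can be decompressed with plain gzip, and vice versa.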
Running the pigz Utility in HPC
Batch Job Submission:
Copy the following content of the SLURM script "pigz.slurm" into a script file. In this example, the "test" directory is archived and compressed into the file temp.gz. Also check the pigz manual [2] for details.
pigz.slurm:
#!/bin/bash
#SBATCH --mail-user=abc123@case.edu
#SBATCH --mail-type=ALL
#SBATCH --nodes=1
#SBATCH --cpus-per-task=4
#SBATCH --time=2:00:00
#SBATCH --mem=500m
#SBATCH -o pigz.out
#Copy the input directory to the scratch space and work there
cp -r test $PFSDIR
cd $PFSDIR
module load pigz
#Total number of cores allocated to the job
NPROCS=$(( $SLURM_JOB_CPUS_PER_NODE * $SLURM_NNODES ))
#Convert a folder into a file
tar -cf temp test
#Compress the file employing multiple threads/cores
pigz -p $NPROCS --best temp
#Copy the compressed result back to the submission directory
cp temp.gz $SLURM_SUBMIT_DIR
Submit the job:
sbatch pigz.slurm
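To find the node the job is running on, squeue can be used. A quick sketch (the format string is optional; a bare `squeue -u $USER` also works):

```shell
# List your queued and running jobs; the last column (nodelist) shows
# the compute node(s) assigned to each running job
squeue -u $USER -o "%i %j %T %N"
```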
Check the status using the top command:
ssh -t <node-your-job-is-running> top
Output:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3594 <caseID> 20 0 319m 5512 616 S 300.1 0.0 0:27.00 pigz
Since 4 cores have been requested using "#SBATCH --cpus-per-task=4", the %CPU for pigz can go up to 400%.
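After the job finishes, the original directory can be recovered from temp.gz in the submission directory. A minimal sketch (unpigz ships with pigz; plain `gunzip -c` would work as well, since the file is standard gzip format):

```shell
# Decompress and unpack in one pipeline; this recreates the "test" directory.
# Note that pigz decompression is largely single-threaded by design.
unpigz -c temp.gz | tar -x
```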
References:
[1] pigz Home
[2] pigz Manual