15 - Working with compressed files saves disk space
Bioinformatic processing of high-throughput sequencing data takes batches of large files as input and creates more batches of large output files at nearly every processing step. That can quickly consume a lot of disk space!
Step 1 - hundreds of compressed sequence files
Step 2 - pair reads, creating hundreds of paired read files
Step 3 - trim adapters/primers, creating hundreds more read files, etc.
Work with the compressed *.fastq.gz files whenever you can. Many programs accept them as input and can write compressed output files as well. I do this routinely with SEQPREP for pairing R1 and R2 files, as well as with CUTADAPT when trimming primers.
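When a tool only reads or writes plain text, you can usually still avoid an uncompressed file on disk by piping through zcat and gzip. A minimal sketch, assuming gzip and zcat are installed; the file names demo.fastq.gz and demo.first2.fastq.gz and the three reads are made up for illustration:

```shell
# Build a tiny demo file (invented reads, hypothetical file name)
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n@r3\nGGCC\n+\nIIII\n' | gzip > demo.fastq.gz

# Keep the first 2 reads (8 lines) and recompress on the fly;
# no uncompressed copy is ever written to disk
zcat demo.fastq.gz | head -n 8 | gzip > demo.first2.fastq.gz

# Check the result: print the line count of the new compressed file
zcat demo.first2.fastq.gz | wc -l
```

Here head stands in for whatever plain-text step you need; the pattern of zcat in front and gzip at the end stays the same.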
To peek into a compressed file use zcat:
$ zcat file.fastq.gz | head
or
$ zcat file.fastq.gz | tail
To count lines in a compressed file:
$ zcat file.fastq.gz | wc -l
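The line count can be turned into a read count, assuming the usual FASTQ layout of exactly four lines per read (this would be wrong for files with wrapped sequence lines). A small sketch with a made-up demo file:

```shell
# Build a tiny demo file with two invented reads (hypothetical file name)
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n' | gzip > demo.fastq.gz

# Count lines in the compressed file, then divide by 4 lines per read
lines=$(zcat demo.fastq.gz | wc -l)
reads=$(( lines / 4 ))
echo "$reads"
```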
To decompress to another file (if you can't use gunzip for some reason):
$ zcat file.fastq.gz > file.fastq
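One practical difference worth knowing: plain gunzip replaces the .gz file with the decompressed one, while redirecting zcat (or the equivalent gunzip -c) leaves the compressed original in place. A short sketch showing that the two give the same output; the demo file names are made up:

```shell
# Build a tiny demo file with one invented read (hypothetical file name)
printf '@r1\nACGT\n+\nIIII\n' | gzip > demo.fastq.gz

# Two equivalent ways to decompress to a new file
zcat demo.fastq.gz > demo.zcat.fastq
gunzip -c demo.fastq.gz > demo.gunzip.fastq

# Confirm both outputs match, and that the compressed original is still there
cmp demo.zcat.fastq demo.gunzip.fastq && echo "identical"
ls demo.fastq.gz
```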