Grep

The grep command prints the lines of a file that match a given pattern.

grep PATTERN file

grep -E PATTERN file (PATTERN is interpreted as an extended regular expression)
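
As a rough sketch of how the pattern options differ, the snippet below runs the same kind of search with the default (basic) syntax, with -E, and with several -e patterns. The file demo.txt and its contents are made up for illustration.

```shell
# Throwaway input file (hypothetical contents).
tmpdir=$(mktemp -d)
printf 'ACGT\nACCGT\nAGT\n' > "$tmpdir/demo.txt"

# Basic syntax: "+" is an ordinary character, so nothing matches.
# "|| true" hides grep's non-zero exit status when there is no match.
basic=$(grep -c 'AC+GT' "$tmpdir/demo.txt" || true)

# Extended syntax (-E): "+" means "one or more of the previous character".
extended=$(grep -E -c 'AC+GT' "$tmpdir/demo.txt")

# -e introduces a pattern explicitly; repeating it matches any of them.
either=$(grep -c -e 'ACCGT' -e 'AGT' "$tmpdir/demo.txt")

echo "basic=$basic extended=$extended either=$either"
rm -rf "$tmpdir"
```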

The flags of grep change what is reported about the matches:

Counting the lines that contain a string (a line with several occurrences is counted once):

grep -c 'string' file

Printing only the lines not containing the string:

grep -v 'string' file

Printing each matching line prefixed with its line number:

grep -n 'string' file
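
The three flags above can be tried on a small throwaway file. This is only a sketch; the file name demo.txt and its contents are invented for the example.

```shell
# Throwaway input file (hypothetical contents).
tmpdir=$(mktemp -d)
printf 'gene A\nintron\ngene B gene C\n' > "$tmpdir/demo.txt"

# -c counts matching lines; line 3 has two occurrences but counts once.
count=$(grep -c 'gene' "$tmpdir/demo.txt")

# -v keeps only the lines that do not contain the string.
nonmatching=$(grep -v 'gene' "$tmpdir/demo.txt")

# -n prefixes every matching line with "<line number>:".
numbered=$(grep -n 'gene' "$tmpdir/demo.txt")

echo "$count"
echo "$nonmatching"
echo "$numbered"
rm -rf "$tmpdir"
```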


We can also restrict what is printed and what counts as a match:

Printing only the matched string, not the whole line:

grep -o 'string' file

String must match the entire line:

grep -x 'string' file
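
A small sketch of both flags, assuming a made-up file demo.txt. Because -o prints each match on its own line, piping it to wc -l is one way to count occurrences rather than lines.

```shell
# Throwaway input file (hypothetical contents).
tmpdir=$(mktemp -d)
printf 'ACGT\nACGTACGT\nTTTT\n' > "$tmpdir/demo.txt"

# -o prints one match per output line: 1 from line 1, 2 from line 2.
# tr strips the padding some wc implementations add.
occurrences=$(grep -o 'ACGT' "$tmpdir/demo.txt" | wc -l | tr -d ' ')

# -x accepts a line only if the pattern matches all of it.
exact=$(grep -x 'ACGT' "$tmpdir/demo.txt")

echo "occurrences=$occurrences exact=$exact"
rm -rf "$tmpdir"
```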


We can also use grep on multiple files. The -l and -L options print file names instead of matching lines.

Lists files containing the string:

grep -l 'string' /home/genomics/*txt

Lists files not containing the string:

grep -L 'string' /home/genomics/*txt
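
To see both options side by side, the sketch below builds two hypothetical files in a temporary directory instead of using /home/genomics. Note that -L is an extension offered by GNU and BSD grep rather than part of POSIX.

```shell
# Two throwaway files (hypothetical names and contents).
tmpdir=$(mktemp -d)
printf 'sample with variant\n' > "$tmpdir/a.txt"
printf 'nothing here\n' > "$tmpdir/b.txt"

# -l lists the files that contain the string.
with=$(grep -l 'variant' "$tmpdir"/*.txt)

# -L lists the files that do not contain it.
without=$(grep -L 'variant' "$tmpdir"/*.txt)

echo "with:    $with"
echo "without: $without"
rm -rf "$tmpdir"
```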



Example of grep on a FASTQ file. Find the reads that contain a specific sequence:

grep "ACGT" file.fastq

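If the sequence identifiers share a common prefix (check the first lines with head; in this file they start with @A), we can print them all by anchoring the pattern to the start of the line:

grep "^@A" file.fastq

We can use grep with the flag -c to count how many reads we have. Quality lines may also begin with @, so treat this count as an upper bound:

grep -c "^@A" file.fastq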
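
The sketch below shows why a header count from grep can be too high. It uses a made-up two-read file in which one quality line happens to start with @A, and compares the grep count with a line count divided by four. The second method assumes every record takes exactly four lines (no wrapped sequences).

```shell
# Two hypothetical reads; the first quality line starts with "@A".
tmpdir=$(mktemp -d)
printf '@A001\nACGT\n+\n@AII\n@A002\nTTGA\n+\nIIII\n' > "$tmpdir/demo.fastq"

# Counts both headers plus the quality line that looks like a header.
by_grep=$(grep -c '^@A' "$tmpdir/demo.fastq")

# Assumption: strictly four lines per record, so reads = lines / 4.
by_lines=$(awk 'END { print NR / 4 }' "$tmpdir/demo.fastq")

echo "grep count: $by_grep, lines/4: $by_lines"
rm -rf "$tmpdir"
```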


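Example of grep on a FASTA file. Print all the sequence headers (the quotes are required, because an unquoted > would be treated by the shell as output redirection):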

grep ">" file.fasta

We can also count the number of sequences by piping the output to wc -l:

grep ">" file.fasta | wc -l
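
As a quick check, the pipeline above and grep -c should agree. The sketch uses an invented two-sequence file and anchors the pattern with ^ so that only lines starting with > are counted.

```shell
# Hypothetical FASTA file with two sequences.
tmpdir=$(mktemp -d)
printf '>seq1 sample\nACGT\n>seq2 sample\nTTGA\n' > "$tmpdir/demo.fasta"

# Header lines piped to wc -l (tr strips padding added by some wc versions).
piped=$(grep '>' "$tmpdir/demo.fasta" | wc -l | tr -d ' ')

# The same count from grep alone, anchored to the start of the line.
direct=$(grep -c '^>' "$tmpdir/demo.fasta")

echo "piped=$piped direct=$direct"
rm -rf "$tmpdir"
```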


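Example of grep on a VCF file. Print every line that is not a header or comment line (these start with #). The ^ anchor keeps records that merely contain a # somewhere in a field:

grep -v "^#" file.vcf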
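
The effect of anchoring the pattern can be seen on a tiny made-up VCF in which one record has a # inside its ID field. The file contents, including the ID rs#1, are hypothetical.

```shell
# Two header lines and two records; one record contains "#" in a field.
tmpdir=$(mktemp -d)
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\nchr1\t100\trs#1\nchr1\t200\t.\n' > "$tmpdir/demo.vcf"

# Unanchored: drops every line containing "#", including a real record.
unanchored=$(grep -v '#' "$tmpdir/demo.vcf" | wc -l | tr -d ' ')

# Anchored: drops only the lines that start with "#".
anchored=$(grep -v -c '^#' "$tmpdir/demo.vcf")

echo "unanchored=$unanchored anchored=$anchored"
rm -rf "$tmpdir"
```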